Sitemap.xml: what it is, why it matters, and how to build one
A sitemap helps search engines find every page on your site. It's a plain XML file that hands them a clean, complete list of every URL worth indexing. Every site should have one. For new sites, big sites, and sites with content that's hard to reach through internal links, a sitemap can mean getting indexed in days rather than months.
What a sitemap is
A sitemap.xml is a file, usually at the root of your website (https://yoursite.com/sitemap.xml), that lists every URL you want search engines to discover, along with optional metadata about when each page was last updated and how often it changes. Search engines fetch the sitemap periodically and use it as a shortcut to find new and updated pages instead of relying on internal links alone.
The minimal format
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
  </url>
  <url>
    <loc>https://example.com/about/</loc>
  </url>
</urlset>
That's a valid sitemap. The only required field per entry is <loc>, the URL itself. Everything else is optional metadata.
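If you'd rather generate this file than type it, here is a minimal Python sketch that uses only the standard library's xml.etree.ElementTree. The function name build_minimal_sitemap is hypothetical. Setting each URL as element text lets ElementTree escape characters like & for you, which hand-written sitemaps often get wrong.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_minimal_sitemap(urls):
    """Return a sitemap as a string, with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # Assigning .text lets ElementTree escape &, <, > in the URL.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_minimal_sitemap(["https://example.com/", "https://example.com/about/"]))
```

This call produces the same two entries as the file above, without the indentation. Search engines don't need the indentation.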
The optional fields (and which ones actually matter)
<url>
  <loc>https://example.com/blog/post</loc>
  <lastmod>2026-05-25</lastmod>
  <changefreq>monthly</changefreq>
  <priority>0.8</priority>
</url>
- lastmod: useful. Search engines use it to decide when to re-crawl. Use ISO 8601 format (YYYY-MM-DD, or a full timestamp with a timezone offset such as 2026-05-25T14:30:00+00:00).
- changefreq: mostly ignored. Major search engines have publicly said they don't use this field, though some other crawlers may still read it.
- priority: mostly ignored. Major search engines have said they don't use this either. Some smaller crawlers might.
So in practice: include loc and lastmod. The other two can be included for completeness or skipped without consequence.
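Because lastmod is the optional field worth including, here is a minimal Python sketch for formatting it correctly. It assumes your CMS or build step can give you either a date or a timezone-aware datetime for each page. The names format_lastmod and url_entry are hypothetical. A timestamp without a timezone is rejected, because the sitemap date format expects an offset whenever a time of day is given.

```python
import xml.etree.ElementTree as ET
from datetime import date, datetime, timezone


def format_lastmod(value):
    """Format a date or a timezone-aware datetime for <lastmod>."""
    # datetime is a subclass of date, so check for it first.
    if isinstance(value, datetime):
        if value.tzinfo is None:
            raise ValueError("lastmod timestamps need a timezone offset")
        return value.isoformat(timespec="seconds")  # e.g. 2026-05-25T14:30:00+00:00
    if isinstance(value, date):
        return value.isoformat()  # e.g. 2026-05-25
    raise TypeError(f"expected date or datetime, got {type(value).__name__}")


def url_entry(loc, lastmod=None):
    """Build one <url> element with <loc> and, if known, <lastmod>."""
    entry = ET.Element("url")
    ET.SubElement(entry, "loc").text = loc
    if lastmod is not None:
        ET.SubElement(entry, "lastmod").text = format_lastmod(lastmod)
    return entry


print(ET.tostring(url_entry("https://example.com/blog/post", date(2026, 5, 25)), encoding="unicode"))
print(format_lastmod(datetime(2026, 5, 25, 14, 30, tzinfo=timezone.utc)))
```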
Why every site needs a sitemap
Without a sitemap, search engines discover content by following links, which is slow and often incomplete. A sitemap hands them the full list directly. The benefit is biggest for:
- New sites. Search engines have no history with you. A sitemap accelerates initial discovery.
- Large sites. 500+ pages. Crawlers have a budget; a sitemap helps them spend it well.
- Sites with orphan pages. Pages that no other page on your site links to. Without a sitemap, search engines may never find them.
- Content that updates often. News, e-commerce, blogs. lastmod signals what to re-crawl.
- Multilingual / multi-region sites. The sitemap can carry hreflang annotations for all your language versions (see our hreflang guide).
Even small sites benefit. Adding a sitemap takes minutes, costs nothing, and gives every page a direct path to discovery.
After generating the file, test that it's reachable at https://yoursite.com/sitemap.xml and submit it through your search-engine webmaster tools dashboard. Submitting nudges search engines to fetch it sooner instead of waiting for them to find it on their own. The "How to submit it" section below walks through the steps.
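You can also script the reachability check. The minimal Python sketch below uses only the standard library to fetch the file and confirm that it parses as a sitemap. The names check_sitemap_url and inspect_sitemap_bytes are hypothetical helpers, not part of any tool. urlopen follows redirects silently, so a sitemap URL that redirects will still report as reachable. The parsing step is split out so you can run it against a local file without a network request.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Clark-notation prefix that ElementTree uses for sitemap-namespace elements.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def inspect_sitemap_bytes(data):
    """Return (root_kind, entry_count) for sitemap XML bytes.

    Raises ET.ParseError for malformed XML and ValueError if the root
    element is not a sitemap <urlset> or <sitemapindex>.
    """
    root = ET.fromstring(data)
    if root.tag == NS + "urlset":
        return "urlset", len(root.findall(NS + "url"))
    if root.tag == NS + "sitemapindex":
        return "sitemapindex", len(root.findall(NS + "sitemap"))
    raise ValueError(f"not a sitemap root element: {root.tag}")


def check_sitemap_url(url, timeout=10):
    """Fetch a sitemap and inspect it.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses. A gzipped
    .xml.gz file would need gzip.decompress() before parsing.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        if resp.status != 200:
            raise ValueError(f"{url} returned HTTP {resp.status}")
        return inspect_sitemap_bytes(resp.read())


# Usage (makes a real network request):
# print(check_sitemap_url("https://yoursite.com/sitemap.xml"))
```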
What a sitemap does NOT do
A sitemap doesn't guarantee that listed pages get indexed. Search engines may still skip URLs they consider low-quality, duplicate, or blocked by other signals (noindex, canonical pointing elsewhere, robots.txt). The sitemap is an invitation, not a guarantee.
Where to put it
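Standard location: https://yoursite.com/sitemap.xml. Put it in the root of your site, not in a subdirectory. A sitemap's location limits which URLs it can list, so a file at /blog/sitemap.xml can only cover URLs under /blog/.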
If your sitemap is somewhere else, declare it in your robots.txt:
Sitemap: https://yoursite.com/path/to/sitemap.xml
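To confirm which sitemap locations your robots.txt actually declares, you can use Python's standard urllib.robotparser. It collects Sitemap: lines wherever they appear in the file and exposes them through RobotFileParser.site_maps(), which is available in Python 3.8 and later. This minimal sketch parses robots.txt text you already have, so it makes no network request. sitemaps_from_robots is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_from_robots(robots_txt):
    """Return the Sitemap: URLs declared in robots.txt text (Python 3.8+)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None, not an empty list, when no Sitemap: line exists.
    return parser.site_maps() or []


robots = """User-agent: *
Disallow: /admin/

Sitemap: https://yoursite.com/path/to/sitemap.xml
"""
print(sitemaps_from_robots(robots))
```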
Splitting big sitemaps
A single sitemap can hold up to 50,000 URLs OR 50 MB uncompressed, whichever comes first. Past that, you need to split into multiple sitemaps and link them from a sitemap index file:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-blog.xml</loc>
    <lastmod>2026-05-25</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2026-05-25</lastmod>
  </sitemap>
</sitemapindex>
Most sites never hit these limits.
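If you do hit the limits, the split can be automated. The minimal Python sketch below assumes you already have the full list of canonical URLs and splits it by count into numbered child files. The index example above splits by section instead. The file names (sitemap-1.xml, sitemap-2.xml and so on, plus sitemap-index.xml) and the function split_sitemaps are illustrative, not required by the protocol. The sketch enforces only the 50,000-URL limit, not the 50 MB size limit, and it omits the optional <lastmod> in the index.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit; the 50 MB size limit is NOT checked here


def _to_xml(root):
    """Serialize an element with the standard XML declaration."""
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(root, encoding="unicode")


def split_sitemaps(urls, base_url, max_urls=MAX_URLS):
    """Split URLs into numbered child sitemaps plus one sitemap index.

    Returns {filename: xml_text}. File names are illustrative.
    """
    files = {}
    index = ET.Element("sitemapindex", xmlns=NS)
    for start in range(0, len(urls), max_urls):
        name = f"sitemap-{start // max_urls + 1}.xml"
        urlset = ET.Element("urlset", xmlns=NS)
        for url in urls[start:start + max_urls]:
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
        files[name] = _to_xml(urlset)
        # The index points at each child sitemap by absolute URL.
        child = ET.SubElement(index, "sitemap")
        ET.SubElement(child, "loc").text = f"{base_url.rstrip('/')}/{name}"
    files["sitemap-index.xml"] = _to_xml(index)
    return files


pages = [f"https://example.com/product/{i}" for i in range(120_000)]
print(sorted(split_sitemaps(pages, "https://example.com")))
# ['sitemap-1.xml', 'sitemap-2.xml', 'sitemap-3.xml', 'sitemap-index.xml']
```

You would then write each string to its own file, upload the files, and submit only the index.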
How to submit it to search engines
- Open your search-engine webmaster tools dashboard for the property.
- Find the Sitemaps section in the menu.
- Paste the path (typically sitemap.xml) and submit.
- Wait a few hours. The dashboard shows the status (success / errors / warnings) plus how many URLs were discovered.
Most major search engines have a similar webmaster-tools dashboard. Others typically pick up the sitemap automatically via the Sitemap: line in your robots.txt.
Common mistakes
- Including URLs that return 404 or redirect. Every URL in your sitemap must be the final 200-status canonical URL. Sitemaps full of 404s look like a broken site.
- Including URLs blocked by robots.txt. Confusing signal: "find this page" combined with "don't crawl this page." Pick one.
- Including URLs with noindex. Same issue. If you don't want it indexed, don't list it.
- Including non-canonical URLs. If /page and /page?ref=email show the same content, only list the canonical (/page).
- Including non-HTML files like PDFs without thought. They can be listed, but only if you want them in search results. A 200-page PDF on the same topic as an HTML page can compete with it, and sometimes outrank it, if both are indexed.
- Inaccurate lastmod dates. Don't set lastmod to today on every URL just to get re-crawled. Search engines can catch on and start ignoring the field.
- Never updating it. A sitemap from 2022 that's missing your newer posts teaches search engines to stop trusting it. Regenerate it when content changes; most CMS plugins do this automatically.
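Several of these mistakes can be caught offline before you upload. The sketch below is a hypothetical Python linter (lint_sitemap is not a real tool) that flags duplicate entries, relative URLs, query strings that often indicate a non-canonical URL, and lastmod dates in the future. It deliberately skips HTTP status codes, robots.txt rules, and noindex tags, because checking those requires live requests to your site.

```python
import xml.etree.ElementTree as ET
from datetime import date
from urllib.parse import urlsplit

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def lint_sitemap(xml_bytes, today=None):
    """Return (url, problem) pairs found by simple offline checks.

    Does not check HTTP status, robots.txt, or noindex; those need live requests.
    """
    today = today or date.today()
    problems = []
    seen = set()
    for entry in ET.fromstring(xml_bytes).findall("sm:url", NS):
        loc = entry.findtext("sm:loc", default="", namespaces=NS).strip()
        parts = urlsplit(loc)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append((loc, "not an absolute http(s) URL"))
        if parts.query:
            # Often a tracking or session variant of a canonical URL.
            problems.append((loc, "has a query string; is it the canonical URL?"))
        if loc in seen:
            problems.append((loc, "duplicate entry"))
        seen.add(loc)
        lastmod = entry.findtext("sm:lastmod", default="", namespaces=NS).strip()
        if len(lastmod) >= 10:  # only full dates or timestamps are checked
            try:
                if date.fromisoformat(lastmod[:10]) > today:
                    problems.append((loc, "lastmod is in the future"))
            except ValueError:
                problems.append((loc, f"unparseable lastmod: {lastmod}"))
    return problems


sample = b"""<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/page</loc><lastmod>2024-01-15</lastmod></url>
  <url><loc>https://example.com/page?ref=email</loc></url>
  <url><loc>https://example.com/page</loc></url>
</urlset>"""
for url, problem in lint_sitemap(sample):
    print(url, "->", problem)
```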
How to keep it updated
Three approaches, in order of effort:
- Plugin / built-in feature: most CMSes, e-commerce platforms, and static-site hosts include sitemap auto-generation either built in or via a free plugin. Set-and-forget.
- Static site generator: most modern static site generators build sitemaps at compile time. Updated every deploy.
- Manual: fine for tiny sites or one-time launches. Regenerate with a tool when content changes and upload the new file.
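For the manual route on a small static site, a short script can serve as the regeneration tool. This Python sketch rests on three assumptions: your built HTML lives in one directory that mirrors your URL structure, folder/index.html is served as folder/, and each file's modification time is a fair stand-in for lastmod. The names sitemap_from_directory and page_url, and the default exclusion of 404.html, are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from pathlib import Path

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def page_url(base_url, rel_path):
    """Map a file path relative to the site root to its public URL.

    Assumes folder/index.html is served as folder/ and other files as-is.
    """
    posix = rel_path.as_posix()
    if posix == "index.html":
        posix = ""
    elif posix.endswith("/index.html"):
        posix = posix[: -len("index.html")]
    return base_url.rstrip("/") + "/" + posix


def sitemap_from_directory(site_dir, base_url, exclude=("404.html",)):
    """Build sitemap XML for every .html file under site_dir.

    lastmod comes from each file's modification time, as a UTC date.
    Files whose name is in `exclude` (e.g. error pages) are skipped.
    """
    root_dir = Path(site_dir)
    urlset = ET.Element("urlset", xmlns=NS)
    for path in sorted(root_dir.rglob("*.html")):
        if path.name in exclude:
            continue
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = page_url(base_url, path.relative_to(root_dir))
        modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        ET.SubElement(entry, "lastmod").text = modified.date().isoformat()
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


# Usage, assuming the built site is in ./public:
# Path("public/sitemap.xml").write_text(
#     sitemap_from_directory("public", "https://yoursite.com"), encoding="utf-8")
```

Be careful with the modification-time assumption. A fresh git checkout or a copy step can reset every file's mtime, which would recreate the "lastmod set to today on every URL" mistake. If your content has real publish or edit dates, use those instead.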