Sitemap submissions can fail without an obvious error. This checklist covers XML validation, URL inclusion rules, priority tags, and lastmod accuracy so your sitemap meets Google's requirements on the first try.
Submitting a sitemap to Google is not a fire-and-forget operation. When you hit 'submit' in Search Console, Google fetches and parses the file. If your XML has a stray closing tag, a missing namespace, or a URL containing characters that are not properly escaped, the file can be rejected with a parse error or have entries skipped. A common situation we see is teams uploading a 50,000-URL sitemap that passes XML validation but still underperforms because the lastmod field contains a future date or a static string like '2024-01-01' for every URL. Google has said it uses lastmod only when the values are consistently and verifiably accurate, so dates like these are likely to be ignored.
To understand how Google actually ingests sitemaps, read the official Google Search fundamentals. The key takeaway: sitemaps influence discovery, not ranking. Yet a broken sitemap can delay indexing by weeks. This checklist walks you through pre-flight validation, URL inclusion rules, priority tag best practices, and lastmod accuracy. Use it before every submission, not just the first.
| Element | Google Requirement | Common Failure | Your Action |
|---|---|---|---|
| XML Namespace | Declare xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" | Missing or incorrect namespace causes parse rejection | Validate with xmllint --noout or online validator before upload |
| URL Count | Max 50,000 URLs per sitemap, file size max 50 MB uncompressed | File exceeds a limit and is rejected or only partly processed | Split via sitemap index file if you have more than 45,000 URLs |
| URL Encoding | All non-ASCII chars must be percent-encoded; &, <, >, ' and " must be entity-escaped in the XML | Spaces, umlauts, or Cyrillic characters in raw form can cause parse or crawl errors | Percent-encode with a URL library or a bulk URL encoder before generating XML |
| lastmod Format | W3C Datetime: YYYY-MM-DDThh:mm:ssZ or YYYY-MM-DD | Static date (e.g., '2024-01-01' for all URLs) or future date | Use dynamic generation; set lastmod to actual content change timestamp |
| changefreq | Optional; one of 'always', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'never'. Google's documentation says it ignores this value | Overuse of 'always' or 'hourly' on stable pages makes the field meaningless for crawlers that do read it | Set 'weekly' for blog posts, 'monthly' for evergreen, omit for trivial pages |
| priority | Optional; 0.0 to 1.0, default 0.5. Google's documentation says it ignores this value | All pages set to 1.0, which carries no information | Use 0.8 for homepage, 0.6 for category pages, 0.3 for thin content; avoid 1.0 except for critical pages |
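The structural rows of the table (namespace, URL count, file size) can be checked locally before upload. The following is a minimal sketch using only the Python standard library; the function name `check_sitemap` and the wording of the messages are illustrative assumptions, not part of any Google tool.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, measured on the uncompressed file


def check_sitemap(xml_bytes: bytes) -> list:
    """Return a list of problems found; an empty list means these checks passed."""
    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append("file is larger than 50 MB uncompressed")
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        problems.append(f"XML is not well-formed: {exc}")
        return problems
    # ElementTree expands namespaces into the tag name as {namespace}tag.
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        problems.append("root is not <urlset> in the sitemaps.org namespace")
    url_count = len(root.findall(f"{{{SITEMAP_NS}}}url"))
    if url_count > MAX_URLS:
        problems.append(f"{url_count} URLs exceeds the 50,000 limit")
    return problems
```

This only covers well-formedness, the namespace, and the two size limits. It does not validate against the sitemap XSD, so a schema validator is still worth running.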
Use CMS plugin or custom script with dynamic lastmod and correct encoding
Run through W3C XML validator or xmllint; check namespace and closing tags
Cross-check against your crawl or database; remove blocked, noindex, or redirect URLs
Ensure timestamps are not static; use content update date, not generation date
Submit to a test property first; inspect coverage report for parse errors
Wait 24-48 hours; re-check coverage for indexed vs. discovered URLs
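For teams going the custom-script route in the first item above, here is a rough sketch of a generator that percent-encodes URLs and writes a per-page lastmod. It assumes your data source yields `(url, last_modified)` pairs with timezone-aware datetimes and ASCII hostnames; `encode_url` and `build_sitemap` are hypothetical names chosen for this example.

```python
from datetime import datetime, timezone
from urllib.parse import quote, urlsplit, urlunsplit
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def encode_url(url: str) -> str:
    """Percent-encode path and query; '%' stays safe so encoded URLs are not double-encoded."""
    parts = urlsplit(url)
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    # The fragment is dropped: it is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))


def build_sitemap(pages) -> bytes:
    """pages: iterable of (url, last_modified) with timezone-aware datetimes."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = encode_url(url)
        stamp = last_modified.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        ET.SubElement(entry, "lastmod").text = stamp
    # ElementTree escapes &, < and > in text nodes when serializing.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
```

The important design choice is that `last_modified` comes from the content record, not from `datetime.now()` at generation time, which would stamp every URL with the same value.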
Your sitemap should only contain URLs that are indexable, canonical, and valuable. Including pages with noindex tags, redirect chains, or 4xx status codes wastes crawl budget and pollutes Google's index. A common situation we see is agencies dumping all blog tags and archive pages into the sitemap — 15,000 URLs where 60% return 404 or soft 404. That signals poor site health.
Use a noindex tag checker before generating your sitemap to filter out pages that robots directives or meta tags block. Listing noindex URLs sends Google conflicting signals and makes the file a less reliable guide to what you want indexed. After you submit, use a bulk URL index checker to confirm which URLs actually got indexed. As a rule of thumb, if more than 20% of submitted URLs are not indexed, your sitemap likely contains low-quality or blocked URLs.
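If you would rather script the noindex check than rely on a tool, the core test is small. The sketch below assumes you have already fetched each page's HTML and, optionally, its `X-Robots-Tag` response header; `is_noindex` is a hypothetical helper, and it deliberately ignores user-agent-scoped header values such as `googlebot: noindex`.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Sets self.noindex when a robots or googlebot meta tag blocks indexing."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            directives = {d.strip().lower() for d in attributes.get("content", "").split(",")}
            # "none" is shorthand for "noindex, nofollow".
            if directives & {"noindex", "none"}:
                self.noindex = True


def is_noindex(html: str, x_robots_tag: str = "") -> bool:
    """True if the header value or any robots meta tag contains noindex."""
    header_directives = {d.strip().lower() for d in x_robots_tag.split(",")}
    if header_directives & {"noindex", "none"}:
        return True
    parser = _RobotsMetaParser()
    parser.feed(html)
    return parser.noindex
```

Any URL for which this returns true would be dropped before the sitemap is generated.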
You have an e-commerce site with 10,000 product URLs. You generate a sitemap via a plugin. Before submission, you run these steps:
1. Count URLs: The file contains 10,423 URLs (8,200 products, 1,500 categories, 423 tags, 300 blog posts). You remove all 423 tag pages because they duplicate product listings.
2. Check lastmod: You grep the XML and find 4,200 URLs have lastmod set to '2024-01-01' — a static date from initial import. You regenerate with actual product update timestamps from the database. Now 9,800 URLs have unique dates.
3. Test for noindex: You run the noindex checker and find 380 URLs still set to noindex (old archived products). You remove them. Final count: 9,620 URLs.
4. Validate encoding: You find 12 URLs with unencoded umlauts (e.g., 'f%C3%BCr' is fine, but 'für' is not). You fix the encoding with a batch script that percent-encodes each URL.
5. Submit: After 48 hours, you check the bulk URL index checker: 9,100 out of 9,620 are indexed (94.6% success). The missing 520 are mostly category pages with thin content — you decide to improve those pages rather than keep them in the sitemap.
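The lastmod check in step 2 does not have to be a manual grep. A minimal sketch, assuming the sitemap is already loaded as bytes: it reports how dominant the most frequent lastmod value is and which values lie in the future. The function name `audit_lastmod` and the report keys are made up for this example.

```python
from collections import Counter
from datetime import date
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def audit_lastmod(xml_bytes: bytes, today: date) -> dict:
    """Summarize lastmod values: most frequent value, its share, future and malformed dates."""
    root = ET.fromstring(xml_bytes)
    values = [el.text.strip() for el in root.findall("sm:url/sm:lastmod", NS) if el.text]
    future, malformed = [], []
    for value in values:
        try:
            # Compare on the date part only; both W3C forms start with YYYY-MM-DD.
            if date.fromisoformat(value[:10]) > today:
                future.append(value)
        except ValueError:
            malformed.append(value)
    top_value, top_count = Counter(values).most_common(1)[0] if values else (None, 0)
    return {
        "total": len(values),
        "top_value": top_value,
        "top_share": top_count / len(values) if values else 0.0,
        "future": future,
        "malformed": malformed,
    }
```

A high `top_share` is only a hint: a bulk import or a site-wide template change can legitimately give many pages the same date, so treat it as a prompt to look at the data source.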
Validate XML syntax with an online validator or xmllint
Ensure URL count does not exceed 50,000 and file size does not exceed 50 MB uncompressed (gzip reduces transfer size but does not raise the limit)
Remove all noindex, 4xx, 5xx, and redirect-chain URLs
Set lastmod to actual content update dates, not static values or generation timestamps
Use priority 0.5 as default; reserve 0.8-1.0 for high-value pages only
Percent-encode all non-ASCII characters in URLs
Test with a Google Search Console property before submitting to production
Monitor coverage report for 24-48 hours after submission for parse errors
The most common error is a missing or incorrect XML namespace declaration. Google's parser will reject the entire file if xmlns is missing. Second is static lastmod dates: Google uses lastmod only when it is consistently accurate, so uniform dates lead it to disregard the field.
Use a dedicated noindex tag checker tool that scans the actual HTML of each URL in your sitemap. Alternatively, run a crawl with Screaming Frog and filter on 'noindex' meta tags. Remove those URLs from your sitemap before generating the XML.
For blog posts, a priority of 0.6 to 0.7 is reasonable if the content is timely and high-quality. For evergreen blog content, 0.5 is sufficient. Avoid setting all blog posts to 1.0: Google ignores the priority field, and uniform values give other crawlers nothing to work with.
A single sitemap cannot exceed Google's limit of 50,000 URLs or 50 MB uncompressed. If you have more, create a sitemap index file that lists multiple sitemaps. Each sitemap in the index must adhere to the same limits.
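Splitting a large URL list and writing the index file is mechanical. The sketch below assumes each chunk is later written out as its own sitemap file; `chunk_urls`, `build_index`, and the `sitemap-N.xml` naming in the usage note are hypothetical choices.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000


def chunk_urls(urls: list, size: int = MAX_URLS) -> list:
    """Split a URL list into consecutive chunks of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(sitemap_locations) -> bytes:
    """Build a <sitemapindex> document listing the given sitemap file URLs."""
    index = ET.Element("sitemapindex", {"xmlns": SITEMAP_NS})
    for location in sitemap_locations:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = location
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)
```

For 120,001 URLs, `chunk_urls` yields three chunks, and `build_index` would then be given three locations such as `https://example.com/sitemap-1.xml` through `sitemap-3.xml`. Passing a smaller `size`, for example 45,000, leaves headroom below the limit.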
Wait at least 24 to 48 hours. Google typically processes a newly submitted sitemap within that window, though the timing is not guaranteed. After 48 hours, check the Search Console coverage report for parse errors, and run a bulk URL index checker to compare submitted vs. indexed counts.
A static lastmod date (e.g., all URLs set to '2024-01-01') tells Google that the content has not changed. Google may ignore your sitemap for fresh content discovery. Always use dynamic lastmod values reflecting the actual last modification time of each page.
Not every category page belongs in the sitemap. Category pages with little to no unique content, especially those that just list products, are considered thin. Including them can lower your sitemap's quality signal. As a rule of thumb, only include category pages that have at least 200-300 words of unique editorial content.
Use a bulk URL index checker that accepts a list of URLs and returns indexed status. Compare the results to your sitemap. If more than 20% of URLs are not indexed, remove low-quality or blocked URLs and resubmit.
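The comparison itself is a set difference. A minimal sketch, assuming you have the submitted URLs from the sitemap and the list of URLs your checker reported as indexed; `index_gap` is a hypothetical helper and the 20% threshold is the rule of thumb from the text, not a Google-defined value.

```python
GAP_THRESHOLD = 0.20  # rule of thumb from this checklist


def index_gap(submitted, indexed):
    """Return (share of submitted URLs not indexed, sorted list of those URLs)."""
    submitted_set = set(submitted)
    missing = sorted(submitted_set - set(indexed))
    gap = len(missing) / len(submitted_set) if submitted_set else 0.0
    return gap, missing


def needs_cleanup(submitted, indexed) -> bool:
    """True when the gap is larger than the threshold."""
    gap, _ = index_gap(submitted, indexed)
    return gap > GAP_THRESHOLD
```

The `missing` list is the useful output: it is the set of URLs to review for thin content, noindex directives, or bad status codes before resubmitting.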
If a sitemap lists URLs that return 404, Google will crawl them, see the 404 status, and eventually remove them from the index. However, repeatedly submitting 404 URLs can signal poor site maintenance and reduce crawl trust. Always verify URL status codes before inclusion.
Setting changefreq or priority will not make Google crawl faster. Google's documentation states that it ignores both values and uses its own algorithms to determine crawl frequency, so setting changefreq to 'hourly' or priority to 1.0 on all pages has no effect on crawl speed.