
Sitemap Submission Checklist: Validate, Audit, Then Submit

Most submissions fail silently. This checklist covers XML validation, URL inclusion rules, priority tags, and lastmod accuracy so your sitemap passes Google's requirements on the first try.


Why Most Sitemap Submissions Fail Before They Start

Submitting a sitemap to Google is not a fire-and-forget operation. When you hit 'submit' in Search Console, Google fetches and parses the file before it uses any of the URLs. If your XML has a stray closing tag, a missing namespace, or a URL containing characters that should have been percent-encoded, the file can be rejected with a parse error or individual entries can be skipped. A common situation we see is teams uploading a 50,000-URL sitemap that passes W3C validation but whose lastmod field contains a future date or the same static string, such as '2024-01-01', for every URL. Google's documentation says it uses lastmod only when the value is consistently and verifiably accurate, so dates like these give it a reason to disregard the field.

To understand how Google actually ingests sitemaps, read the official Google Search fundamentals. The key takeaway: sitemaps influence discovery, not ranking. Yet a broken sitemap can delay indexing by weeks. This checklist walks you through pre-flight validation, URL inclusion rules, priority tag best practices, and lastmod accuracy. Use it before every submission, not just the first.


Sitemap Element Validation: What Google Checks vs. What You Should Check

| Element | Google Requirement | Common Failure | Your Action |
|---|---|---|---|
| XML Namespace | Declare xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" | Missing or incorrect namespace causes parse rejection | Validate with xmllint --noout or an online validator before upload |
| URL Count | Max 50,000 URLs per sitemap; max file size 50 MB uncompressed | File exceeds a limit and is reported as an error instead of being processed in full | Split via a sitemap index file if you have more than 45,000 URLs |
| URL Encoding | All non-ASCII characters percent-encoded; XML special characters such as & entity-escaped | Spaces, umlauts, or Cyrillic characters in raw form cause fetch or parse errors | Run an encoding script or a bulk URL encoder before generating XML |
| lastmod Format | W3C Datetime: YYYY-MM-DDThh:mm:ssZ or YYYY-MM-DD | Static date (e.g., '2024-01-01' for all URLs) or future date | Use dynamic generation; set lastmod to the actual content change timestamp |
| changefreq | Optional; one of 'always', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'never'. Google's documentation states that it ignores this value | 'always' or 'hourly' on stable pages, which misleads any consumer that does read the field | Set 'weekly' for blog posts, 'monthly' for evergreen; omit for trivial pages |
| priority | Optional; 0.0 to 1.0, default 0.5. Google's documentation states that it ignores this value | All pages set to 1.0, which carries no relative information | For consumers that read it: 0.8 for homepage, 0.6 for category pages, 0.3 for thin content; avoid 1.0 except for critical pages |
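
The structural rows of the table (namespace, URL count, file size) can be checked locally before upload. The following is a minimal sketch using only the Python standard library; the function name `check_sitemap` and the wording of the messages are illustrative assumptions, not part of any Google tooling, and it covers a single urlset file rather than a sitemap index.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, measured on the uncompressed file


def check_sitemap(xml_bytes: bytes) -> list[str]:
    """Return a list of problems; an empty list means these basic checks passed."""
    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append("file exceeds 50 MB uncompressed")
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        # Stray or missing closing tags end up here.
        return problems + [f"not well-formed XML: {exc}"]
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        problems.append("root element is not <urlset> in the sitemaps.org 0.9 namespace")
        return problems
    urls = root.findall(f"{{{SITEMAP_NS}}}url")
    if len(urls) > MAX_URLS:
        problems.append(f"{len(urls)} URLs exceeds the 50,000 limit")
    for position, url in enumerate(urls, start=1):
        loc = url.find(f"{{{SITEMAP_NS}}}loc")
        if loc is None or not (loc.text or "").strip():
            problems.append(f"<url> entry {position} has no <loc>")
    return problems
```

Passing the raw bytes of the file (for example `check_sitemap(open("sitemap.xml", "rb").read())`, with a hypothetical file name) keeps the size check honest, because the limit applies to the uncompressed bytes rather than to the decoded text.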

Pre-Submission Sitemap Validation Flow

1. Generate sitemap XML: Use a CMS plugin or custom script with dynamic lastmod and correct encoding.

2. Validate XML structure: Run the file through a W3C XML validator or xmllint; check the namespace and closing tags.

3. Verify URL list: Cross-check against your crawl or database; remove blocked, noindex, or redirect URLs.

4. Check lastmod precision: Ensure timestamps are not static; use the content update date, not the generation date.

5. Test with Google Search Console: Submit to a test property first; inspect the Sitemaps report for parse errors.

6. Submit and monitor: Wait 24-48 hours; re-check the Page indexing (formerly Coverage) report for indexed vs. discovered URLs.
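
The lastmod check in this flow is easy to automate. Here is a rough sketch, assuming you have already extracted the lastmod strings into a list; `audit_lastmod` and the `static_share` threshold of 40% are illustrative choices rather than Google rules, and the pattern accepts only the two formats this guide lists.

```python
import re
from collections import Counter
from datetime import date, datetime, timezone

# Date only, or date plus time with a timezone designator (Z or +hh:mm).
W3C_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2}"
    r"(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?$"
)


def audit_lastmod(values, today=None, static_share=0.4):
    """Flag malformed values, future dates, and one date dominating the file."""
    today = today or datetime.now(timezone.utc).date()
    malformed, future, days = [], [], []
    for value in values:
        if not W3C_RE.match(value):
            malformed.append(value)
            continue
        try:
            day = date.fromisoformat(value[:10])
        except ValueError:
            # Matches the pattern but is not a real calendar date, e.g. 2024-13-45.
            malformed.append(value)
            continue
        days.append(day)
        if day > today:  # compares the calendar date only, not the time of day
            future.append(value)
    dominant = None
    if days:
        day, count = Counter(days).most_common(1)[0]
        if count / len(days) >= static_share:
            dominant = (day.isoformat(), count)
    return {"malformed": malformed, "future": future, "dominant": dominant}
```

A non-empty `dominant` entry does not prove the dates are wrong (a site-wide template change can touch every page on one day), so treat it as a prompt to check where the timestamps come from.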


URL Inclusion Rules: What Belongs and What Hurts

Your sitemap should only contain URLs that are indexable, canonical, and valuable. Including pages with noindex tags, redirect chains, or 4xx status codes wastes crawl budget and sends Google contradictory signals about what you want indexed. A common situation we see is agencies dumping all blog tags and archive pages into the sitemap: 15,000 URLs where 60% return 404 or soft 404. That signals poor site health.

Use a noindex tag checker before generating your sitemap to filter out pages that carry a noindex directive (in a meta robots tag or an X-Robots-Tag header) or that robots.txt blocks from crawling. Every noindex URL in the sitemap contradicts the request to index it, and a large share of them makes the whole file a less reliable guide to the site. After you submit, use a bulk URL index checker to confirm which URLs actually got indexed. If the gap between submitted and indexed is larger than 20%, your sitemap likely contains low-quality or blocked URLs.
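
If you would rather script the filter than rely on a hosted checker, the decision logic might look like the sketch below. It assumes you have already fetched each URL without following redirects and have its status code, response headers, and HTML; `exclusion_reason` is a hypothetical helper name, and the fetching itself is deliberately left out.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            content = attributes.get("content", "")
            self.directives.extend(d.strip().lower() for d in content.split(","))


def exclusion_reason(status, headers, html):
    """Return why a URL should be left out of the sitemap, or None to keep it."""
    if 300 <= status < 400:
        return "redirect"
    if status >= 400:
        return f"http {status}"
    normalized = {key.lower(): value for key, value in headers.items()}
    if "noindex" in normalized.get("x-robots-tag", "").lower():
        return "noindex (X-Robots-Tag header)"
    parser = _RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    if "noindex" in parser.directives or "none" in parser.directives:
        return "noindex (meta robots)"
    return None
```

Keeping the function free of network calls makes it easy to run over a saved crawl export; it does not detect soft 404s or robots.txt blocks, which need separate checks.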


Worked Example: Validating a 10,000-URL Sitemap

You have an e-commerce site with roughly 10,000 URLs. You generate a sitemap via a plugin. Before submission, you run these steps:

1. Count URLs: The file contains 10,423 URLs (8,200 products, 1,500 categories, 423 tags, 300 blog posts). You remove all 423 tag pages because they duplicate product listings.

2. Check lastmod: You grep the XML and find 4,200 URLs have lastmod set to '2024-01-01' — a static date from initial import. You regenerate with actual product update timestamps from the database. Now 9,800 URLs have unique dates.

3. Test for noindex: You run the noindex checker and find 180 URLs still set to noindex (old archived products). You remove them. Final count: 9,820 URLs (10,423 minus 423 tag pages minus 180 noindex pages).

4. Validate encoding: You find 12 URLs with unencoded umlauts (e.g., 'f%C3%BCr' is fine, but 'für' is not). You fix encoding via a batch sed command.

5. Submit: After 48 hours, you check the bulk URL index checker: 9,100 out of 9,820 are indexed (92.7% success). The missing 720 are mostly category pages with thin content, so you take them out of the sitemap until you have improved them.
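
The encoding fix in step 4 can also be done with a URL library, which avoids maintaining a substitution per character. The sketch below uses Python's standard `urllib.parse` and `xml.sax.saxutils`; the function names and the example URLs are illustrative assumptions. It encodes only the path and query, so internationalized hostnames (which need IDNA conversion) are out of scope.

```python
from urllib.parse import quote, urlsplit, urlunsplit
from xml.sax.saxutils import escape

# "%" is kept safe so already-encoded sequences are not encoded a second time.
PATH_SAFE = "/%:@!$&'()*+,;=-._~"
QUERY_SAFE = "=&%/?:@!$'()*+,;-._~"


def encode_for_sitemap(url: str) -> str:
    """Percent-encode non-ASCII characters and spaces in the path and query."""
    parts = urlsplit(url)
    path = quote(parts.path, safe=PATH_SAFE)
    query = quote(parts.query, safe=QUERY_SAFE)
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


def loc_element(url: str) -> str:
    """Build a <loc> element: percent-encode first, then escape XML entities."""
    return f"<loc>{escape(encode_for_sitemap(url))}</loc>"


print(encode_for_sitemap("https://example.com/geschenke/für-kinder"))
# https://example.com/geschenke/f%C3%BCr-kinder
print(loc_element("https://example.com/a b?q=x y&z=1"))
# <loc>https://example.com/a%20b?q=x%20y&amp;z=1</loc>
```

The order matters: percent-encoding handles the URL itself, while entity escaping (turning & into &amp;) is required by the XML container, so it is applied last.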

Final Pre-Submission Checklist

1. Validate XML syntax with an online validator or xmllint

2. Ensure URL count is under 50,000 and file size is under 50 MB uncompressed (gzip compression does not raise the limit)

3. Remove all noindex, 4xx, 5xx, and redirect-chain URLs

4. Set lastmod to actual content update dates, not static values or generation timestamps

5. Use priority 0.5 as default and reserve 0.8-1.0 for high-value pages only; Google ignores the field, so this matters only for other consumers of the sitemap

6. Percent-encode all non-ASCII characters in URLs

7. Test with a Google Search Console property before submitting to production

8. Monitor the Sitemaps and Page indexing (formerly Coverage) reports for 24-48 hours after submission for parse errors

FAQ

What is the most common error when validating a sitemap for Google?

The most common error is a missing or incorrect XML namespace declaration. Google's parser will reject the file if xmlns is missing. Second is static lastmod dates: Google uses lastmod only when it is consistently and verifiably accurate, so uniform dates lead it to disregard the field.

How do I check if my sitemap has noindex URLs before submission?

Use a dedicated noindex tag checker tool that scans the actual HTML of each URL in your sitemap. Alternatively, run a crawl with Screaming Frog and filter on 'noindex' meta tags. Remove those URLs from your sitemap before generating the XML.

What is the recommended priority value for blog posts in a sitemap?

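Google's documentation states that it ignores the priority value, so no setting changes how Google treats a blog post. For other consumers of the sitemap, 0.6 to 0.7 is a reasonable convention if the content is timely and high-quality, and 0.5 is sufficient for evergreen blog content. Avoid setting all blog posts to 1.0, because a uniform value carries no relative information.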

Can I submit a sitemap with more than 50,000 URLs?

No, Google's limit is 50,000 URLs per individual sitemap and 50 MB uncompressed file size. If you have more, create a sitemap index file that lists multiple sitemaps. Each sitemap in the index must adhere to the same limits.
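
As a rough illustration of the split, the sketch below chunks a URL list and builds the index XML as a string. The 45,000 chunk size follows the safety margin suggested in the table above; the function names and the example.com file names are assumptions for illustration only.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunk_urls(urls, size=45_000):
    """Yield consecutive slices of at most `size` URLs, one per child sitemap."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_index(sitemap_urls):
    """Return sitemap index XML listing the given child sitemap URLs."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<sitemapindex xmlns="{SITEMAP_NS}">',
    ]
    for url in sitemap_urls:
        lines.append(f"  <sitemap><loc>{escape(url)}</loc></sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines)


# Hypothetical site with 100,001 URLs: three child sitemaps are needed.
urls = [f"https://example.com/p/{i}" for i in range(100_001)]
chunks = list(chunk_urls(urls))
index_xml = build_index(
    f"https://example.com/sitemap-{n}.xml" for n in range(1, len(chunks) + 1)
)
print([len(chunk) for chunk in chunks])  # [45000, 45000, 10001]
```

You submit only the index file; each child sitemap still has to satisfy the per-file limits on its own.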

How long after submitting a sitemap to Google should I wait to check results?

Wait at least 24 to 48 hours. Sitemaps are typically fetched and parsed within that window, although indexing of the listed URLs can take longer. After 48 hours, check the Search Console Sitemaps report for parse errors, and run a bulk URL index checker to compare submitted vs. indexed counts.

What does a static lastmod date do to my sitemap's effectiveness?

A static lastmod date (e.g., all URLs set to '2024-01-01') tells Google that the content has not changed. Google may ignore your sitemap for fresh content discovery. Always use dynamic lastmod values reflecting the actual last modification time of each page.

Should I include category pages with thin content in my sitemap?

No. Category pages with little to no unique content, especially those that just list products, are considered thin. Including them can lower your sitemap's quality signal. Only include category pages that have at least 200-300 words of unique editorial content.

How do I bulk verify that my submitted sitemap URLs are actually indexed?

Use a bulk URL index checker that accepts a list of URLs and returns indexed status. Compare the results to your sitemap. If more than 20% of URLs are not indexed, remove low-quality or blocked URLs and resubmit.
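
The 20% rule is simple arithmetic on two counts. A minimal sketch follows; the function names and the sample numbers are hypothetical, and the threshold is this guide's rule of thumb rather than a Google figure.

```python
def index_gap(submitted: int, indexed: int) -> float:
    """Share of submitted URLs that are not indexed, between 0.0 and 1.0."""
    if submitted <= 0:
        raise ValueError("submitted must be a positive count")
    return (submitted - indexed) / submitted


def needs_cleanup(submitted: int, indexed: int, threshold: float = 0.20) -> bool:
    """True when the not-indexed share exceeds the threshold."""
    return index_gap(submitted, indexed) > threshold


print(index_gap(1_000, 750))      # 0.25
print(needs_cleanup(1_000, 750))  # True
print(needs_cleanup(1_000, 900))  # False
```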

What happens if I submit a sitemap with URLs that return 404 errors?

Google will crawl those URLs, see the 404 status, and eventually remove them from the index. However, repeatedly submitting 404 URLs can signal poor site maintenance and reduce crawl trust. Always verify URL status codes before inclusion.

Can I use changefreq and priority to force Google to crawl my pages faster?

No. Google's documentation states that it ignores both changefreq and priority, and it uses its own algorithms to determine crawl frequency. Setting changefreq to 'hourly' or priority to 1.0 on all pages will not increase crawl speed. An accurate lastmod is the sitemap field Google does use.
