August 2, 2010

How to use Sitemap index files (to group multiple sitemap files)

This article explains how to use Sitemap index files to group multiple Sitemap files for a large website or for multiple sites.
You can provide multiple Sitemap files, but each Sitemap file that you provide must have no more than 50,000 URLs and must be no larger than 10MB (10,485,760 bytes). If you would like, you may compress your Sitemap files using gzip to reduce your bandwidth requirement; however, the Sitemap file, once uncompressed, must still be no larger than 10MB. If you want to list more than 50,000 URLs, you must create multiple Sitemap files.
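As a rough illustration of the limits above, the following Python sketch splits a URL list into batches of at most 50,000 entries and checks the uncompressed size of a generated file. The function names (`split_urls`, `fits_size_limit`) and the sample URLs are hypothetical, chosen only for this example; the two constants are the figures quoted in the text.

```python
from typing import Iterator, List

MAX_URLS_PER_SITEMAP = 50_000          # per-file URL limit quoted in the text
MAX_SITEMAP_BYTES = 10 * 1024 * 1024   # 10,485,760 bytes, as quoted in the text


def split_urls(urls: List[str], limit: int = MAX_URLS_PER_SITEMAP) -> Iterator[List[str]]:
    """Yield consecutive slices of `urls`, each holding at most `limit` entries."""
    for start in range(0, len(urls), limit):
        yield urls[start:start + limit]


def fits_size_limit(sitemap_xml: str, max_bytes: int = MAX_SITEMAP_BYTES) -> bool:
    """Check the uncompressed UTF-8 size, which is what the limit applies to."""
    return len(sitemap_xml.encode("utf-8")) <= max_bytes


# Hypothetical site with 120,000 pages: three Sitemap files are needed.
urls = [f"http://www.example.com/page{i}.html" for i in range(120_000)]
batches = list(split_urls(urls))
print([len(batch) for batch in batches])  # [50000, 50000, 20000]
```

Each batch would then be written to its own Sitemap file, and the size check would run on the XML before it is gzipped.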

If you do provide multiple Sitemaps, you should then list each Sitemap file in a Sitemap index file. A Sitemap index file may list no more than 50,000 Sitemaps, must be no larger than 10MB (10,485,760 bytes), and can be compressed. You can have more than one Sitemap index file. The XML format of a Sitemap index file is very similar to the XML format of a Sitemap file.

The Sitemap index file must:

* Begin with an opening <sitemapindex> tag and end with a closing </sitemapindex> tag.
* Include a <sitemap> entry for each Sitemap as a parent XML tag.
* Include a <loc> child entry for each <sitemap> parent tag.



The optional <lastmod> tag is also available for Sitemap index files.

Note: A Sitemap index file can only specify Sitemaps that are found on the same site as the Sitemap index file. For example, http://www.yoursite.com/sitemap_index.xml can include Sitemaps on http://www.yoursite.com but not on http://www.example.com or http://yourhost.yoursite.com. As with Sitemaps, your Sitemap index file must be UTF-8 encoded.
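The same-site rule can be checked before a Sitemap is added to an index. The sketch below is one possible check, assuming that "same site" means the same scheme and host; the helper name `same_site` is hypothetical and not part of any Sitemap tooling.

```python
from urllib.parse import urlsplit


def same_site(index_url: str, sitemap_url: str) -> bool:
    """True when both URLs share scheme and host (compared case-insensitively)."""
    index, sitemap = urlsplit(index_url), urlsplit(sitemap_url)
    return (index.scheme.lower(), index.netloc.lower()) == (
        sitemap.scheme.lower(), sitemap.netloc.lower())


index_url = "http://www.yoursite.com/sitemap_index.xml"
print(same_site(index_url, "http://www.yoursite.com/sitemap1.xml.gz"))    # True
print(same_site(index_url, "http://www.example.com/sitemap1.xml.gz"))     # False
print(same_site(index_url, "http://yourhost.yoursite.com/sitemap1.xml"))  # False
```

Comparing the host exactly is what makes the subdomain case fail, which matches the yourhost.yoursite.com example in the note.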
Sample XML Sitemap Index

The following example shows a Sitemap index that lists two Sitemaps:




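<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <sitemap>
      <loc>http://www.example.com/sitemap1.xml.gz</loc>
      <lastmod>2004-10-01T18:23:17+00:00</lastmod>
   </sitemap>
   <sitemap>
      <loc>http://www.example.com/sitemap2.xml.gz</loc>
      <lastmod>2005-01-01</lastmod>
   </sitemap>
</sitemapindex>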



Note: Sitemap URLs, like all values in your XML files, must be entity escaped.
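One way to avoid escaping mistakes is to let an XML library write the index. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` (the `xml_declaration` argument needs Python 3.8 or later); the function name `build_sitemap_index` and the second sample URL are made up for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(entries):
    """Serialize (loc, lastmod) pairs into a UTF-8 encoded Sitemap index.

    `lastmod` may be None, since the tag is optional.
    """
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        sitemap = ET.SubElement(root, "sitemap")
        # Text is entity escaped by ElementTree when the tree is serialized.
        ET.SubElement(sitemap, "loc").text = loc
        if lastmod is not None:
            ET.SubElement(sitemap, "lastmod").text = lastmod
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


entries = [
    ("http://www.example.com/sitemap1.xml.gz", "2004-10-01T18:23:17+00:00"),
    ("http://www.example.com/sitemap.php?id=2&format=xml", None),  # hypothetical URL
]
print(build_sitemap_index(entries).decode("utf-8"))
```

In the output, the ampersand in the second URL appears as `&amp;`, and the entry without a date simply has no <lastmod> tag.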

Sitemap Index XML Tag Definitions
<sitemapindex> required - Encapsulates information about all of the Sitemaps in the file.
<sitemap> required - Encapsulates information about an individual Sitemap.
<loc> required - Identifies the location of the Sitemap. This location can be a Sitemap, an Atom file, an RSS file or a simple text file.
<lastmod> optional - Identifies the time that the corresponding Sitemap file was modified. It does not correspond to the time that any of the pages listed in that Sitemap were changed. The value for the lastmod tag should be in W3C Datetime format.
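For reference, both lastmod values in the sample are W3C Datetime forms: a full timestamp with a UTC offset and a date-only value. A small Python sketch that produces each form, assuming the modification time is available as a timezone-aware `datetime` (the helper name `w3c_datetime` is hypothetical):

```python
from datetime import datetime, timezone


def w3c_datetime(moment: datetime) -> str:
    """Format an aware datetime as YYYY-MM-DDThh:mm:ss+hh:mm (seconds precision)."""
    if moment.tzinfo is None:
        raise ValueError("lastmod needs a timezone-aware datetime")
    return moment.isoformat(timespec="seconds")


modified = datetime(2004, 10, 1, 18, 23, 17, tzinfo=timezone.utc)
print(w3c_datetime(modified))        # 2004-10-01T18:23:17+00:00
print(modified.date().isoformat())   # 2004-10-01
```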

By providing the last modification timestamp, you enable search engine crawlers to retrieve only a subset of the Sitemaps in the index; that is, a crawler may retrieve only the Sitemaps that were modified since a certain date. This incremental Sitemap fetching mechanism allows for the rapid discovery of new URLs on very large sites.
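The incremental fetching idea can be sketched from the crawler's side. The Python example below is only an illustration of the selection step, not how any particular search engine works. It assumes that date-only values mean midnight UTC and that entries without a lastmod are always fetched; it handles only the date-only and full-timestamp forms of W3C Datetime. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_lastmod(value: str) -> datetime:
    """Parse a date-only or full W3C Datetime value into an aware datetime."""
    value = value.strip()
    if value.endswith("Z"):                 # "Z" is shorthand for +00:00
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:               # assumption: no offset means UTC
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def sitemaps_modified_since(index_xml: bytes, since: datetime) -> list:
    """Return <loc> values whose <lastmod> is later than `since`.

    Entries without <lastmod> are returned too, since nothing marks them unchanged.
    """
    selected = []
    for sitemap in ET.fromstring(index_xml).findall("sm:sitemap", NS):
        loc = sitemap.findtext("sm:loc", namespaces=NS)
        if loc is None:
            continue                        # skip malformed entries
        lastmod = sitemap.findtext("sm:lastmod", namespaces=NS)
        if lastmod is None or parse_lastmod(lastmod) > since:
            selected.append(loc.strip())
    return selected


INDEX = b"""<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <sitemap>
      <loc>http://www.example.com/sitemap1.xml.gz</loc>
      <lastmod>2004-10-01T18:23:17+00:00</lastmod>
   </sitemap>
   <sitemap>
      <loc>http://www.example.com/sitemap2.xml.gz</loc>
      <lastmod>2005-01-01</lastmod>
   </sitemap>
</sitemapindex>"""

since = datetime(2004, 12, 1, tzinfo=timezone.utc)
print(sitemaps_modified_since(INDEX, since))  # ['http://www.example.com/sitemap2.xml.gz']
```

With a cutoff of 1 December 2004, only the second Sitemap from the sample index is selected, because the first was last modified in October 2004.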
 


About the Author

Tomboy

Author & Editor



 
Iwebslog Blog © 2015 - Designed by Templateism.com