A sitemap is a page or file on a website that lists links to the site's other pages, organized so that it is useful to humans (HTML sitemaps) or to search engines (XML sitemaps). An XML sitemap can also carry metadata about each URL, such as how often the page is updated, its relative importance and when it was last updated.
While any page that links to every other page on your website may be useful to some extent, a sitemap intended for search engines must be written in XML and must follow a set pattern so that it gives them precise information. A sitemap is useful because it provides a single location from which a search engine can discover all the web pages on a site without crawling the whole site page by page, possibly through several layers of links. When you want to create a sitemap, here is what you need to do:
Every sitemap should be enclosed in the tags <urlset xmlns="[namespace]"> and </urlset>, and every page listed in the sitemap should have, at a minimum, a <url> tag containing a <loc> tag, both of which must be closed. Optional tags that can be included are <lastmod>, <changefreq>, and <priority>.
Here is an example of a sitemap that has two entries using all of the required as well as optional tags:
<?xml version="1.0" encoding="UTF-8"?>
In the sitemap above, there are two pages: the main page and the “Blog” page. Each has its last modification date set, along with its relative priority and change frequency. The change frequency does not have to be exact, and the priority is entirely subjective; it shows how you, as the site owner, feel the page relates to the other pages on the website.
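A two-entry sitemap like the one described above can also be generated from code rather than typed by hand. The following is only a minimal sketch: the function name `build_sitemap`, the URLs, the dates and the priority values are all made up for illustration, and the namespace is the one published by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML as a string for a list of page dicts (hypothetical helper)."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for page in pages:
        url = ET.SubElement(urlset, "url")
        # <loc> is required; the other three tags are optional.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Illustrative entries only: a main page and a blog page.
example_pages = [
    {"loc": "http://yoursite.com/", "lastmod": "2010-07-29",
     "changefreq": "weekly", "priority": "1.0"},
    {"loc": "http://yoursite.com/blog/", "lastmod": "2010-07-15",
     "changefreq": "daily", "priority": "0.8"},
]

if __name__ == "__main__":
    print(build_sitemap(example_pages))
```

ElementTree escapes special characters in the text of each tag when it serializes, so a URL containing an ampersand does not need to be escaped by hand before it is passed in.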
What do Sitemap tags mean?
- <urlset> and </urlset> tags: These tags tell search engines such as Google where the sitemap begins and ends. Begin the list of pages immediately after the <urlset> tag and finish it before the closing </urlset> tag.
- <url> and </url> tags: These tags tell the search engine where the details of each page in the sitemap start and end. They enclose all the other tags used for each page.
- <loc> and </loc> tags: These tags are very important, as they tell a search engine the location of the page. Without the page location, all the other details are useless.
- <lastmod> and </lastmod> tags (Last Modified Date): These tags state when the page was last modified. They use the format YYYY-MM-DD. For any month or day number less than 10, always remember to include a leading zero. For instance, write 2010-07-29 instead of 2010-7-29.
- <changefreq> and </changefreq> tags (Change Frequency): These tags tell search engines how likely the page is to change. This information does not have to be exact, and the search engine may decide to visit the page more or less often than indicated. For instance, if the change frequency is set to hourly but your page does not rank highly, Google may settle on visiting every few days or even less often. The possible values of this tag include:
  - always: This shows that a page is dynamic and changes every time it is accessed, for example on a weather monitoring site.
  - hourly, daily, weekly, monthly, yearly and never: These are the remaining values defined by the sitemap protocol.
- <priority> and </priority> tags: These tags tell search engines how important the site owner considers the page in comparison to the other pages on the website. This is probably the least useful tag because, for the majority of websites, search engines work out for themselves which pages are most relevant to a specific search.
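The rules in the list above can be checked mechanically before an entry goes into a sitemap. The sketch below is one possible way to do it, not an official validator: `check_entry` is a hypothetical helper, the set of change-frequency values is taken from the sitemaps.org protocol, and the 0.0 to 1.0 range for priority also comes from that protocol rather than from this article.

```python
import re
from datetime import datetime

# <changefreq> values defined by the sitemaps.org protocol.
CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly",
                     "monthly", "yearly", "never"}


def _is_date(text):
    """True only for a real calendar date written as YYYY-MM-DD."""
    # The regex enforces the leading zeros, which strptime alone would not.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def _is_priority(text):
    """True if text parses as a number between 0.0 and 1.0."""
    try:
        return 0.0 <= float(text) <= 1.0
    except ValueError:
        return False


def check_entry(entry):
    """Return a list of problems found in one sitemap entry (a dict)."""
    problems = []
    if not entry.get("loc"):
        problems.append("missing <loc>")
    lastmod = entry.get("lastmod")
    if lastmod is not None and not _is_date(lastmod):
        problems.append("lastmod must be YYYY-MM-DD")
    changefreq = entry.get("changefreq")
    if changefreq is not None and changefreq not in CHANGEFREQ_VALUES:
        problems.append("unknown changefreq")
    priority = entry.get("priority")
    if priority is not None and not _is_priority(priority):
        problems.append("priority must be between 0.0 and 1.0")
    return problems
```

An entry with no problems yields an empty list, so the result can be used directly in an `if` test.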
An ideal sitemap escapes all entities and contains only ASCII letters, numbers and certain symbols. Entities here are characters that have a special meaning in XML. The ones that need to be escaped are:
- & (ampersand) must always be written as &amp;
- ' (single quote) must always be written as &apos;
- " (double quote) must always be written as &quot;
- < (less than) must always be written as &lt;
- > (greater than) must always be written as &gt;
If your URLs contain non-English letters such as ü, then you need to use the percent-encoded form of such characters (for example, %C3%BC for ü in UTF-8).
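Both steps, encoding non-ASCII letters and escaping the special characters, can be done with Python's standard library. This is a rough sketch under a few assumptions: `xml_escape` and `sitemap_safe` are names invented here, the set of characters left untouched by `quote` is a choice made for this example, and non-ASCII host names (which need separate handling) are not covered.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; quotes are added explicitly.
_QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def xml_escape(text):
    """Replace the five special characters with their XML entities."""
    return escape(text, _QUOTE_ENTITIES)


def sitemap_safe(url):
    """Percent-encode non-ASCII characters, then entity-escape the result."""
    # Keep common URL punctuation (and existing %XX escapes) as they are.
    encoded = quote(url, safe=":/?&=#%'")
    return xml_escape(encoded)


if __name__ == "__main__":
    print(sitemap_safe("http://yoursite.com/über?a=1&b='x'"))
```

The order matters: percent-encoding first and entity-escaping second means the ampersands in the query string are escaped exactly once.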
Where a Sitemap File Should Be Placed
Under normal circumstances, a sitemap can only catalog URLs that are in or under the directory where the sitemap itself is located. Therefore, if you have an online store in a store directory and want to catalog both the store's pages and your regular site pages, you should put your Sitemap file in the topmost directory you want to catalog, which is usually the site's “root web directory”. This directory may go by different names, such as httpdocs, public_html, www, wwwroot or something else.
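The "in or under the sitemap's directory" rule can be expressed as a short check. The sketch below is an illustration only: `in_scope` is a hypothetical helper and the URLs are placeholders.

```python
import posixpath
from urllib.parse import urlsplit


def in_scope(sitemap_url, page_url):
    """True if page_url lives in or under the directory holding the sitemap."""
    sitemap, page = urlsplit(sitemap_url), urlsplit(page_url)
    # The page must be on the same scheme and host as the sitemap.
    if (sitemap.scheme, sitemap.netloc) != (page.scheme, page.netloc):
        return False
    # "/store/sitemap.xml" -> "/store/", "/sitemap.xml" -> "/"
    base_dir = posixpath.dirname(sitemap.path)
    if not base_dir.endswith("/"):
        base_dir += "/"
    return (page.path or "/").startswith(base_dir)


if __name__ == "__main__":
    # A sitemap inside /store/ cannot list a page outside /store/.
    print(in_scope("http://yoursite.com/store/sitemap.xml",
                   "http://yoursite.com/about.html"))
```

A sitemap placed in the root web directory passes this check for every page on the same host, which is why the root is the usual choice.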
Validating your Sitemap before Submission
It is important to validate your Sitemap before submitting it. Errors such as unclosed tags or unescaped characters can cause a search engine to reject the file, so catching them early saves you trouble in the long run.
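A basic validation pass can be scripted. The sketch below only checks that the file is well-formed XML, that the root element is <urlset> in the sitemaps.org namespace, and that every entry has a <loc>; it is not a full schema validation, and `validate_sitemap` is a name chosen for this example.

```python
import xml.etree.ElementTree as ET

# ElementTree writes namespaced tag names as "{namespace}tag".
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def validate_sitemap(xml_text):
    """Return a list of error messages; an empty list means no errors found."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    errors = []
    if root.tag != NS + "urlset":
        errors.append("root element must be <urlset> in the sitemap namespace")
    for number, url in enumerate(root.findall(NS + "url"), start=1):
        loc = url.find(NS + "loc")
        if loc is None or not (loc.text or "").strip():
            errors.append(f"entry {number}: missing <loc>")
    return errors
```

To check a file on disk, read it and pass its contents in, for example `validate_sitemap(open("sitemap.xml", encoding="utf-8").read())`, where the file name is whatever you called your Sitemap.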
Submitting Your Sitemap
There are three methods you can use to submit your Sitemap to search engines:
- robots.txt: This method is the simplest of the three, though it only works if search engines already visit your site from time to time. If your site is new or is struggling to get noticed by search engines, consider using the Direct Search Engine Submission method instead.
Add the line Sitemap: http://yoursite.com/sitemap.xml to your robots.txt file and the search engine will pick up the Sitemap automatically the next time it crawls your site.
- HTTP Request: This method allows you to submit a Sitemap, or notify a search engine of an update to it, by typing one of the following URLs directly into your browser, using your own domain name in place of “yoursite.com”. (Note that some search engines may require you to have an existing account with them, and that these endpoints have changed over time, so check each engine's current documentation.)
  - Bing: http://www.bing.com/webmaster/ping.aspx?siteMap=yoursite.com/sitemap.xml
  - Yahoo: http://search.yahooapis.com/SiteExplorerService/V1/updateNotification?appid=YahooDemo&url=http://yoursite.com/sitemap.xml (change YahooDemo to your App ID)
- Direct Search Engine Submission: This is the most time-consuming of the three methods, as you must visit each search engine individually. However, you do not necessarily need to do this, as the main search engines can read your robots.txt file to locate your Sitemap file.
  - Google: Ensure that you have a Google Webmaster Tools account (the service is now called Google Search Console) and submit your Sitemap through your account's interface.
  - Yahoo: Just as with Google, submit your Sitemap through your Yahoo Site Explorer account (create one if you do not already have it).
  - Bing: Create an MSN account (if you already have one, you don't need to create a new one), then submit your Sitemap via the Bing Webmaster Center.
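For the robots.txt method, it can be worth confirming that the Sitemap line is written in a form a crawler can read. The sketch below uses Python's built-in robots.txt parser (its `site_maps()` method needs Python 3.8 or later); `sitemaps_in_robots` is a helper name invented here and the robots.txt content is a made-up example.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_in_robots(robots_txt):
    """Return the Sitemap URLs declared in the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


# Hypothetical robots.txt content for illustration.
example_robots = """\
User-agent: *
Disallow: /private/
Sitemap: http://yoursite.com/sitemap.xml
"""

if __name__ == "__main__":
    print(sitemaps_in_robots(example_robots))
```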
An individual sitemap file should be no larger than 50MB uncompressed (older versions of the protocol set the limit at 10MB) and may contain no more than 50,000 URLs. If your website exceeds these limits, then you should use multiple Sitemap files.
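Splitting a long list of URLs across several files is straightforward to script. The sketch below shows one way to do it, under these assumptions: `chunk_urls` and `build_index` are names made up for this example, the file URLs are placeholders, and the <sitemapindex> format used to tie the files together comes from the sitemaps.org protocol rather than from this article.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit


def chunk_urls(urls, size=MAX_URLS):
    """Split a list of URLs into lists of at most `size` entries each."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(sitemap_urls):
    """Return a sitemap index document that points at each Sitemap file."""
    index = ET.Element("sitemapindex", {"xmlns": SITEMAP_NS})
    for sitemap_url in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = sitemap_url
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # 120,001 placeholder URLs need three files.
    urls = [f"http://yoursite.com/page{i}" for i in range(120_001)]
    chunks = chunk_urls(urls)
    names = [f"http://yoursite.com/sitemap{n}.xml"
             for n in range(1, len(chunks) + 1)]
    print(build_index(names))
```

Each chunk would then be written out as its own Sitemap file, and the index file is the one you submit.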
Today, many webmasters use sitemap generators such as DYNO Mapper, which are faster and less error-prone than creating sitemaps by hand. With such a tool, you don't have to worry about writing the code yourself.