What is a robots.txt file?
A robots.txt file is a simple text file that can be created with any plain-text editor, such as Notepad. It gives search engines instructions on how to crawl your website. It is supported by all major search engines.
A robots.txt file contains allow and disallow statements that define which sections of your website search engines should or shouldn't crawl. An XML Sitemap declaration can also be added to give search engines an additional signal about your XML Sitemaps or Sitemap Index file.
Make sure the robots.txt file is placed in the root directory of your website, so that it is accessible at /robots.txt directly under your domain (for example, http://www.example.com/robots.txt).
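Since crawlers request the file only from the root of the host, its location can be derived from any page URL on the site. The following is a minimal Python sketch; the helper name robots_url and the sample page URL are assumptions made for illustration, not part of any library.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port), replace the path
    # with /robots.txt, and drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page URL, used only for illustration.
print(robots_url("http://www.example.com/category/shoes?page=2"))
# prints http://www.example.com/robots.txt
```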
A user agent is the name by which browsers and search engine bots identify themselves to web servers. For example, googlebot is the user agent of Google's web crawler. By using User-agent statements, we can give specific allow and disallow statements to a particular search engine. It's important to note that if you add a section for a specific user agent, such as a googlebot section, googlebot will ignore all other sections in the robots.txt file.
User-agent: *
Disallow: /user_login
Disallow: /cart/
Allow: /index.html

# googlebot section
User-agent: googlebot
Disallow: /catalogs
Disallow: /cart/
Allow: /index.html
Two user-agent sections are defined in the above example. The first applies to all search engines, as indicated by the wildcard (*). The second applies only to googlebot. In this case googlebot ignores every statement outside the googlebot section.
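The section selection described above can be checked with Python's standard urllib.robotparser module. This is only a sketch: the crawler name bingbot stands in for "any other search engine" and is an assumption, and the rules are parsed from a string instead of being fetched from a live site.

```python
from urllib.robotparser import RobotFileParser

# The example rules, with one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /user_login
Disallow: /cart/
Allow: /index.html

# googlebot section
User-agent: googlebot
Disallow: /catalogs
Disallow: /cart/
Allow: /index.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# googlebot is matched by its own section, so the * rules do not apply to it.
print(parser.can_fetch("googlebot", "/user_login"))  # True
print(parser.can_fetch("googlebot", "/catalogs"))    # False

# Any other crawler (bingbot is an illustrative name) uses the * section.
print(parser.can_fetch("bingbot", "/user_login"))    # False
print(parser.can_fetch("bingbot", "/catalogs"))      # True
```

Note that /user_login is open to googlebot here only because the googlebot section does not repeat that rule. Anything that should stay blocked for every crawler has to be listed in each section.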
Disallow statements are used to tell search engines not to crawl certain parts of your website.
User-agent: *
Disallow: /cart/
Disallow: /user_login/
Allow statements are used to tell search engines that they may crawl certain parts of your website.
User-agent: *
Allow: /index.html
Allow: /category/
If a folder is covered by a disallow statement, you can still write an allow statement for a specific file in that folder. With allow and disallow statements, the more specific statement wins, so search bots will respect the allow statement even though the file is covered by the disallow.
User-agent: *
Disallow: /folder1/
Allow: /folder1/demo.html
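The "more specific statement wins" rule can be sketched as a longest-prefix comparison. The function below is a simplified illustration, not a full robots.txt matcher: the name is_allowed is hypothetical, wildcards inside paths are not handled, and preferring allow when two prefixes have equal length is an assumption. It is written out by hand because Python's built-in urllib.robotparser applies rules in file order, not by specificity.

```python
def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Decide whether path may be crawled, using longest-prefix matching.

    rules is a list of (directive, prefix) pairs,
    for example ("disallow", "/folder1/").
    """
    best_len = -1
    allowed = True  # a path that no rule matches may be crawled
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        # A longer prefix is more specific; on a tie, prefer allow (assumption).
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


rules = [("disallow", "/folder1/"), ("allow", "/folder1/demo.html")]
print(is_allowed("/folder1/demo.html", rules))   # True: the allow prefix is longer
print(is_allowed("/folder1/other.html", rules))  # False: only the disallow matches
print(is_allowed("/index.html", rules))          # True: no rule matches
```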
XML Sitemap Declarations
This is an additional feature of the robots.txt file. Since search engine bots start crawling a site by checking the robots.txt file, it gives you an opportunity to notify them of your XML Sitemap(s).
Sitemap: http://www.example.com/sitemap-categories.xml
Sitemap: http://www.example.com/sitemap-blogposts.xml
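To see how a crawler might pick up these declarations, the sketch below reads them back with urllib.robotparser from the Python standard library. It assumes Python 3.8 or later, where the site_maps() method is available, and parses the two lines from a string without fetching anything from a live site.

```python
from urllib.robotparser import RobotFileParser

# The two Sitemap declarations from the example, one per line.
ROBOTS_TXT = """\
Sitemap: http://www.example.com/sitemap-categories.xml
Sitemap: http://www.example.com/sitemap-blogposts.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns None when no Sitemap line is present,
# so fall back to an empty list.
sitemaps = parser.site_maps() or []
for url in sitemaps:
    print(url)
```

Sitemap lines are independent of the User-agent sections, so they are found wherever they appear in the file.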
Here is a sample robots.txt file:
User-agent: *
Disallow: /search/
Disallow: /private/

Sitemap: http://www.example.com/sitemap.xml
Sitemap: http://www.example.com/sitemap1.xml
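If the rules are kept in code and not edited by hand, a file like the sample can be generated from a small data structure. This is one possible sketch; the function name render_robots_txt and the shape of its arguments are assumptions made for illustration.

```python
def render_robots_txt(
    groups: dict[str, list[tuple[str, str]]],
    sitemaps: list[str],
) -> str:
    """Build robots.txt text from user-agent groups and sitemap URLs."""
    lines = []
    for agent, rules in groups.items():
        lines.append(f"User-agent: {agent}")
        for directive, path in rules:
            # Directive names are conventionally written as Allow / Disallow.
            lines.append(f"{directive.capitalize()}: {path}")
        lines.append("")  # a blank line separates each group from what follows
    for url in sitemaps:
        lines.append(f"Sitemap: {url}")
    return "\n".join(lines) + "\n"


sample = render_robots_txt(
    {"*": [("disallow", "/search/"), ("disallow", "/private/")]},
    [
        "http://www.example.com/sitemap.xml",
        "http://www.example.com/sitemap1.xml",
    ],
)
print(sample, end="")
```

The resulting text would then be saved as robots.txt in the root directory of the site.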