What is a robots.txt file?
A robots.txt file is a plain text file that can be created in any text editor, such as Notepad. It gives search engines instructions on how to crawl your website, and it is supported by all major search engines.
A robots.txt file contains Allow and Disallow statements that define which sections of a website search engines should or shouldn't crawl. You can also add an XML Sitemap declaration to point search engines to your XML Sitemap(s) or Sitemap Index file.
Make sure the robots.txt file is placed in the root directory of your website, so that it is accessible at a URL like this: http://www.example.com/robots.txt
A user-agent is the name by which browsers and search engine bots identify themselves to web servers; for example, Googlebot is Google's web crawler user agent. By using User-agent statements, you can provide specific Allow and Disallow statements to a particular search engine. It's important to note that if you include a user-agent-specific section, such as a Googlebot section, Googlebot will ignore all other sections in the robots.txt file.
User-agent: *
Disallow: /user_login
Disallow: /cart/
Allow: /index.html

# Googlebot section
User-agent: googlebot
Disallow: /catalogs
Disallow: /cart/
Allow: /index.html
Two User-agent statements are defined in the example above. The first applies to all search engines, as indicated by the * wildcard; the second applies only to Googlebot. In this case, Googlebot ignores every statement except those in the Googlebot section. (Note that comments in robots.txt start with #.)
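You can check how a crawler interprets these rules with Python's standard-library urllib.robotparser module. This is just a quick sketch; the bot name SomeOtherBot is an illustrative stand-in, not a real crawler.

```python
from urllib import robotparser

# Same rules as the example above.
robots_txt = """\
User-agent: *
Disallow: /user_login
Disallow: /cart/
Allow: /index.html

# Googlebot section
User-agent: googlebot
Disallow: /catalogs
Disallow: /cart/
Allow: /index.html
"""

rp = robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# Googlebot only obeys its own section, so /user_login is crawlable for it...
print(rp.can_fetch("googlebot", "/user_login"))     # True
# ...but /catalogs is blocked for Googlebot by that section.
print(rp.can_fetch("googlebot", "/catalogs"))       # False
# Any other bot falls back to the "User-agent: *" section.
print(rp.can_fetch("SomeOtherBot", "/user_login"))  # False
```

This mirrors the behavior described above: once a bot finds a section addressed to it by name, the generic * section no longer applies to that bot.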
Disallow statements tell search engines not to crawl certain parts of your website.
User-agent: *
Disallow: /cart/
Disallow: /user_login/
Allow statements tell search engines that they may crawl certain parts of your website.
User-agent: *
Allow: /index.html
Allow: /category/
Suppose a folder is covered by a Disallow statement, and you write an Allow statement for a specific file in that folder. With Allow and Disallow statements, the more specific statement wins, so search bots will respect the Allow statement even though the file is also covered by the Disallow.
User-agent: *
Disallow: /folder1/
Allow: /folder1/demo.html
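Google documents this precedence as a longest-match rule: among all Allow and Disallow rules whose path matches the URL, the one with the longest path wins, and a tie goes to Allow. Here is a minimal Python sketch of that rule; the function and variable names (is_allowed, rules) are illustrative, not part of any library.

```python
def is_allowed(path, rules):
    """Longest-match precedence sketch.

    rules: list of (directive, prefix) tuples, e.g. ("Disallow", "/folder1/").
    Among all rules whose prefix matches the path, the longest prefix wins;
    on equal length, Allow wins.
    """
    best = None  # (prefix_length, allowed) for the strongest match so far
    for directive, prefix in rules:
        if path.startswith(prefix):
            allowed = directive.lower() == "allow"
            candidate = (len(prefix), allowed)
            # Tuple comparison: longer prefix wins; at equal length, True (Allow) wins.
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]

rules = [("Disallow", "/folder1/"), ("Allow", "/folder1/demo.html")]
print(is_allowed("/folder1/demo.html", rules))   # True: the Allow is more specific
print(is_allowed("/folder1/other.html", rules))  # False: only the Disallow matches
```

Real crawlers differ in the details (for example, older parsers used first-match rather than longest-match), so treat this as a model of Google's documented behavior, not a universal one.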
XML Sitemap Declarations
This is an additional feature of the robots.txt file. Since search engine bots start crawling a site by checking the robots.txt file, it gives you an opportunity to notify them of your XML Sitemap(s).
Sitemap: http://www.example.com/sitemap-categories.xml
Sitemap: http://www.example.com/sitemap-blogposts.xml
Here is a sample robots.txt file:
User-agent: *
Disallow: /search/
Disallow: /private/
Sitemap: http://www.example.com/sitemap.xml
Sitemap: http://www.example.com/sitemap1.xml
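If you want to sanity-check a file like this before deploying it, urllib.robotparser can parse it and, on Python 3.8+, list the declared sitemaps via site_maps(). The bot name anybot below is just a placeholder.

```python
from urllib import robotparser

# The sample file from above.
robots_txt = """\
User-agent: *
Disallow: /search/
Disallow: /private/
Sitemap: http://www.example.com/sitemap.xml
Sitemap: http://www.example.com/sitemap1.xml
"""

rp = robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# /search/ and /private/ are blocked for every bot; other paths are fine.
print(rp.can_fetch("anybot", "/search/results"))  # False
print(rp.can_fetch("anybot", "/blog/post"))       # True

# The sitemap URLs declared in the file (Python 3.8+).
print(rp.site_maps())
```

In production you would normally call set_url() and read() to fetch the live file from your site's root instead of parsing a local string.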