What is a robots.txt file?


Tags:- robots_txt


A robots.txt file is a simple text file that can be created with any plain-text editor, such as Notepad. It provides instructions that tell search engines how to crawl your website, and it is supported by all major search engines.

A robots.txt file contains allow and disallow statements that define which sections of a website search engines should or shouldn't crawl. Additionally, an XML Sitemap declaration can be added to point search engines to your XML Sitemaps or Sitemap Index file.

Make sure that the robots.txt file is placed in the root directory of your website so that it is accessible like this:

http://www.example.com/robots.txt

User-agent

A user-agent is the way browsers and search engine bots identify themselves to web servers; for example, Googlebot is Google's web crawler user agent. Using User-agent statements, we can provide specific allow and disallow statements to a particular search engine. It's important to note that if you include a user-agent-specific section, such as a Googlebot section, Googlebot will ignore all other sections in the robots.txt file.

User-agent: *
Disallow: /user_login
Disallow: /cart/
Allow: /index.html


# googlebot section

User-agent: googlebot
Disallow: /catalogs
Disallow: /cart/
Allow: /index.html

Two user-agent sections are defined in the above example. The first applies to all search engines, as indicated by the * wildcard. The second applies only to Googlebot. In this case, Googlebot ignores all statements except those in its own section.
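To sanity-check this behavior, you can feed the same rules to Python's built-in urllib.robotparser (the example.com URLs are just the placeholder domain used above):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /user_login
Disallow: /cart/
Allow: /index.html

User-agent: googlebot
Disallow: /catalogs
Disallow: /cart/
Allow: /index.html
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Googlebot follows only its own section, so /user_login is not blocked for it...
print(parser.can_fetch("Googlebot", "http://www.example.com/user_login"))     # True
# ...but /catalogs is.
print(parser.can_fetch("Googlebot", "http://www.example.com/catalogs"))       # False
# Any other bot falls back to the * section, which disallows /user_login.
print(parser.can_fetch("SomeOtherBot", "http://www.example.com/user_login"))  # False
```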

Disallow Statements

Disallow statements tell search engines not to crawl certain parts of your website.

User-agent: *
Disallow: /cart/
Disallow: /user_login/
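As a quick illustration, urllib.robotparser shows the effect of these disallow statements (the bot name AnyBot is just a placeholder for a crawler that matches the * section):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /cart/
Disallow: /user_login/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Anything under a disallowed folder is blocked...
print(parser.can_fetch("AnyBot", "http://www.example.com/cart/item1"))  # False
# ...while paths not covered by any statement remain crawlable.
print(parser.can_fetch("AnyBot", "http://www.example.com/products/"))   # True
```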

Allow Statements

Allow statements tell search engines that they may crawl certain parts of your website.

User-agent: *
Allow: /index.html
Allow: /category/

If a folder is covered by a disallow statement, you can still write an allow statement for a specific file inside that folder. With allow and disallow statements, the more specific statement wins, so search bots will respect the allow statement even though the file is covered by the disallow.

User-agent: *
Disallow: /folder1/
Allow: /folder1/demo.html
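The "more specific statement wins" logic can be sketched in a few lines of Python. This is only an illustrative sketch of the longest-matching-rule behavior as Google documents it, not a full robots.txt parser (real implementations vary, and the is_allowed helper is a made-up name for this example):

```python
def is_allowed(path, rules):
    """rules: list of (directive, prefix) tuples, e.g. ("Disallow", "/folder1/")."""
    best_len = -1
    allowed = True  # if no rule matches, crawling is allowed
    for directive, prefix in rules:
        if path.startswith(prefix):
            # A longer (more specific) prefix wins; Allow wins a length tie.
            if len(prefix) > best_len or (len(prefix) == best_len and directive == "Allow"):
                best_len = len(prefix)
                allowed = (directive == "Allow")
    return allowed

rules = [("Disallow", "/folder1/"), ("Allow", "/folder1/demo.html")]
print(is_allowed("/folder1/demo.html", rules))   # True  (the longer Allow wins)
print(is_allowed("/folder1/other.html", rules))  # False (only the Disallow matches)
```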

XML Sitemap Declarations

This is an additional feature of the robots.txt file. Since search engine bots start crawling a site by checking the robots.txt file, it gives you an opportunity to notify them of your XML Sitemap(s).

Sitemap: http://www.example.com/sitemap-categories.xml
Sitemap: http://www.example.com/sitemap-blogposts.xml
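Python's urllib.robotparser (3.8+) can read these declarations back via its site_maps() method, which is a handy way to verify the file parses as expected:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow:

Sitemap: http://www.example.com/sitemap-categories.xml
Sitemap: http://www.example.com/sitemap-blogposts.xml
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# site_maps() returns the declared sitemap URLs (or None if there are none).
print(parser.site_maps())
# ['http://www.example.com/sitemap-categories.xml', 'http://www.example.com/sitemap-blogposts.xml']
```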

Here is a sample robots.txt file:

User-agent: *
Disallow: /search/
Disallow: /private/

Sitemap: http://www.example.com/sitemap.xml
Sitemap: http://www.example.com/sitemap1.xml