Tutorial on Robots.txt

What is Robots.txt?

The Robots Exclusion Standard, also called the Robots Exclusion Protocol, tells search engine spiders which directories of your website should be skipped, or disallowed. The robots.txt file is as important to SEO as site structure, site content, search engine friendliness and meta descriptions. If it is implemented incorrectly, it can easily trip up a website: small errors in the robots.txt file can prevent your website from being crawled by search engines at all, or change the way search engines index your site, either of which can undermine your SEO strategy. If you are interested in learning more about the Robots Exclusion Protocol, see http://en.wikipedia.org/wiki/Robots_exclusion_standard.

The robots.txt file lives in the root of the domain. If you open it in a text editor, you will find a list of directories that the site's webmaster asks search engines to skip. It is therefore important to make sure the file does not tell search engines to skip important directories on your site. You can also use the robots.txt file to prevent 'bad bots' from indexing your site.
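If you want to check what a site's robots.txt actually permits, you can do so programmatically. The following is a minimal sketch using Python's standard-library urllib.robotparser module; www.example.com and the page path are placeholders for your own domain and URLs.

from urllib.robotparser import RobotFileParser

# www.example.com is a placeholder; substitute your own domain.
rp = RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()  # download and parse the live robots.txt

# True if the named user agent may fetch the given URL
print(rp.can_fetch("*", "https://www.example.com/tmp/page.html"))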

General Robots.txt format

The robots.txt file must be placed in the root of your domain (for example, domain.com/robots.txt). The general format used to exclude all robots from certain parts of a website is given below.

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/

When the above rules are used, search engine robots are told not to index the /cgi-bin/, /tmp/ and /junk/ directories of the website.

Some examples of Robots.txt

Example #1: Allow indexing of everything

User-agent: *
Disallow:

Example #2: Disallow indexing of everything

User-agent: *
Disallow: /

Example #3: Disallow indexing of a specific folder

User-agent: *
Disallow: /folder/

Example #4: Disallow Googlebot from indexing a folder, while allowing one file in that folder to be indexed

User-agent: Googlebot
Disallow: /folder1/
Allow: /folder1/myfile.html

Example #5: Allow access to only one specific robot

User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:

Example #6: Exclude a single robot

User-agent: BadBot
Disallow: /
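Before deploying rules like these, you can sanity-check them with the same urllib.robotparser module, which can parse a rule set directly from a string. The sketch below combines Examples #4 and #5; note that the Allow line is placed before the Disallow line because Python's parser applies rules in first-match order, whereas Googlebot itself applies the most specific matching rule.

from urllib.robotparser import RobotFileParser

# Rules combining Examples #4 and #5. Allow is listed before Disallow
# because urllib.robotparser uses first-match-wins ordering, unlike
# Googlebot's longest-match precedence.
rules = """\
User-agent: Googlebot
Allow: /folder1/myfile.html
Disallow: /folder1/

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "/folder1/myfile.html"))  # True
print(rp.can_fetch("Googlebot", "/folder1/other.html"))   # False
print(rp.can_fetch("SomeOtherBot", "/index.html"))        # False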

Why it is beneficial to use Robots.txt

  • Using robots.txt, you can disallow directories that you do not want search engine robots to index, such as /cgi-bin/, /scripts/, /cart/, /wp-admin/ and other directories that may contain sensitive data (a consolidated example follows this list).
  • Certain directories of your website may contain duplicate content, such as print versions of articles or web pages. You can use robots.txt to let search engine robots index only one version of the duplicated content.
  • You can ensure that search engine bots index the main content of your website.
  • You can prevent search engines from indexing files in a directory that contain scripts, personal data or other kinds of sensitive information.
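Putting those points together, a robots.txt file along the following lines keeps crawlers focused on your main content. The directory names are illustrative (here /print/ stands for a directory of duplicate print versions); substitute your own paths.

User-agent: *
Disallow: /cgi-bin/
Disallow: /scripts/
Disallow: /cart/
Disallow: /wp-admin/
Disallow: /print/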

What to avoid in Robots.txt

  • Avoid the use of comments in the robots.txt file.
  • The original robots.txt standard does not define a "/allow" command, so avoid using such commands in the file. (The plain Allow line in Example #4 works because Googlebot and some other major crawlers support it as an extension, but not all robots do.)
  • Do not list every file you want to hide, as doing so tells others exactly which files those are. Instead, put all such files in one directory and disallow that directory, as shown below.
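For instance, instead of disallowing /report.html and /draft.html individually, which would publish the names of the very files you want hidden, move them into a single directory and disallow only that. The /private/ directory name below is purely illustrative.

User-agent: *
Disallow: /private/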