Control search engine indexing with robots.txt

by Ryan on September 26, 2008

If you want to keep search engine robots from indexing all or part of your website, you can use a robots.txt file.

For it to work properly, it should be a simple ASCII text file named exactly “robots.txt” and placed in the root directory of the domain. A well-behaved robot will look at this location for instructions before indexing anything on the website.

You will also need a separate robots.txt in the root directory of every subdomain you have. A robots.txt file in any other location, such as a subdirectory, will be ignored.
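To make the location rule concrete, here is a minimal Python sketch that works out which robots.txt applies to a given page. The function name robots_url and the example.com addresses are made up for illustration; the only assumption is that the file lives at the root of whatever host serves the page.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; the path is always /robots.txt at the root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical pages on a main domain and on a subdomain.
for page in (
    "http://www.example.com/blog/post.htm",
    "http://shop.example.com/sample_directory/sample_file.htm",
):
    print(page, "->", robots_url(page))
```

The two pages map to two different files, one per host, which is why each subdomain needs its own copy.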

The basic syntax involves two lines.

  • User-agent: the robot the following rule applies to
  • Disallow: the pages you want to block

Here is a robots.txt that will block an entire site. The asterisk means the rule applies to all robots.

User-agent: *
Disallow: /

This will allow robots to index the entire site. An empty Disallow value blocks nothing, so you can achieve the same thing by removing the robots.txt file.

User-agent: *
Disallow:

You can block a specific robot.

User-agent: googlebot
Disallow: /

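Block a specific directory. Make sure you include the trailing forward slash. Rules match by prefix, so without it the rule would also block any path that merely starts with the same characters, such as /sample_directory.htm.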

User-agent: googlebot
Disallow: /sample_directory/
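If you want to see what the trailing slash changes, the sketch below runs both variants through Python's standard urllib.robotparser module. Treat that module as a stand-in for a well-behaved robot, since real crawlers may interpret rules slightly differently. The helper name allowed and the test paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules: str, agent: str, path: str) -> bool:
    """Parse robots.txt text and report whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


WITH_SLASH = "User-agent: googlebot\nDisallow: /sample_directory/\n"
WITHOUT_SLASH = "User-agent: googlebot\nDisallow: /sample_directory\n"

# Hypothetical paths: one inside the directory, one that only shares the prefix.
for path in ("/sample_directory/page.htm", "/sample_directory.htm"):
    print(
        path,
        "with slash:", allowed(WITH_SLASH, "googlebot", path),
        "without slash:", allowed(WITHOUT_SLASH, "googlebot", path),
    )
```

With the slash, only the page inside the directory is blocked. Without it, both paths are blocked because both begin with /sample_directory.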

Block a specific file.

User-agent: googlebot
Disallow: /sample_file.htm

Block multiple directories and files.

User-agent: *
Disallow: /sample_directory1/
Disallow: /sample_directory2/
Disallow: /sample_file1.htm
Disallow: /sample_file2.htm

Block everything for every robot except Googlebot, which can index everything. A robot that finds a group naming it specifically follows that group instead of the * group.

User-agent: *
Disallow: /

User-agent: googlebot
Disallow:
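
As a quick sanity check of this last file, here is a small Python sketch that feeds it to the standard urllib.robotparser module and asks what two robots may fetch. The parser only approximates how a real crawler reads the file, and the names may_index and examplebot, along with the default path, are invented for the example.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /

User-agent: googlebot
Disallow:
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def may_index(agent: str, path: str = "/sample_file.htm") -> bool:
    """Report whether the parsed rules let agent fetch path."""
    return parser.can_fetch(agent, path)


# "examplebot" is a made-up robot that falls under the * group.
for agent in ("googlebot", "examplebot"):
    print(agent, may_index(agent))
```

Here googlebot is allowed because its own group has an empty Disallow, while examplebot falls back to the * group and is blocked.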
Copyright 2008-2010 WiredRevolution.com. All rights reserved.