<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>WiredRevolution.com &#187; web development</title>
	<atom:link href="http://www.wiredrevolution.com/category/web-development/feed" rel="self" type="application/rss+xml" />
	<link>http://www.wiredrevolution.com</link>
	<description>A Bit of Linux Wisdom</description>
	<lastBuildDate>Wed, 18 Jan 2012 22:45:50 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Control search engine indexing with robots.txt</title>
		<link>http://www.wiredrevolution.com/web-development/control-search-engine-indexing-with-robotstxt?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=control-search-engine-indexing-with-robotstxt</link>
		<comments>http://www.wiredrevolution.com/web-development/control-search-engine-indexing-with-robotstxt#comments</comments>
		<pubDate>Sat, 27 Sep 2008 04:22:35 +0000</pubDate>
		<dc:creator>Ryan</dc:creator>
				<category><![CDATA[web development]]></category>
		<category><![CDATA[robots.txt]]></category>

		<guid isPermaLink="false">http://www.wiredrevolution.com/?p=104</guid>
		<description><![CDATA[<img src="http://www.wiredrevolution.com/wp-content/uploads/webdevel_logo.png" width="80" height="76" alt="" title="web development" /><br/>If you wish to restrict all or part of your website from being indexed by various search engine robots you can use a robots.txt file. For it to work properly it should be a simple ASCII text file named exactly &#8220;robots.txt&#8221; and it should be placed in the domain root directory. The well behaved robot [...]


Related posts<ol><li><a href='http://www.wiredrevolution.com/commands/search-for-files-with-the-find-command' rel='bookmark' title='Permanent Link: Search for files with the find command'>Search for files with the find command</a></li>
<li><a href='http://www.wiredrevolution.com/system-administration/free-ext3-reserved-blocks-with-tune2fs' rel='bookmark' title='Permanent Link: Free ext3 reserved blocks with tune2fs'>Free ext3 reserved blocks with tune2fs</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<img src="http://www.wiredrevolution.com/wp-content/uploads/webdevel_logo.png" width="80" height="76" alt="" title="web development" /><br/><p>If you want to keep search engine robots from indexing all or part of your website, you can use a robots.txt file.</p>
<p>For it to work properly, it must be a plain ASCII text file named exactly &#8220;robots.txt&#8221; and placed in the root directory of the domain. A well-behaved robot will check this location for instructions before indexing anything on the website.</p>
<p>You also need a separate robots.txt in the root directory of every subdomain. A robots.txt file anywhere other than the root directory, such as a subdirectory, will be ignored.</p>
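<p>To make the location rule concrete, here is a minimal Python sketch that works out where a robot would look for robots.txt given any page URL. The function name <code>robots_url</code> and the example.com hostnames are assumptions made for illustration only.</p>

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL for the host that serves page_url.

    Only the scheme and host are kept; the path is always /robots.txt,
    so each subdomain gets its own file.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # Hypothetical URLs: two different hosts give two different files.
    print(robots_url("http://www.example.com/sample_directory/page.htm"))
    print(robots_url("http://blog.example.com/archive/post.htm?id=1"))
```

<p>The path and query string of the page never matter, only the host does, which is why a robots.txt stored in a subdirectory is never consulted.</p>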
<p>The basic syntax involves two lines.</p>
<ul>
<li><strong>User-agent</strong>: the robot the following rule applies to</li>
<li><strong>Disallow</strong>: the pages you want to block</li>
</ul>
<p>Here is a robots.txt that will block an entire site. The asterisk means the rule applies to all robots.</p>
<pre>
User-agent: *
Disallow: /
</pre>
<p>This allows the entire domain to be indexed. You can achieve the same thing by removing the robots.txt file altogether.</p>
<pre>
User-agent: *
Disallow:
</pre>
<p>You can block a specific robot.</p>
<pre>
User-agent: googlebot
Disallow: /
</pre>
<p>Block a specific directory. Make sure you include both the leading and the trailing forward slash.</p>
<pre>
User-agent: googlebot
Disallow: /sample_directory/
</pre>
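<p>Rules are matched as path prefixes, so the trailing slash changes what gets blocked. The sketch below checks this with Python's standard <code>urllib.robotparser</code> module. The helper <code>is_allowed</code>, the example.com host, and the file name <code>sample_directory_old.htm</code> are assumptions for illustration, and real search engines may interpret rules slightly differently from this parser.</p>

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The rule from the article, and the same rule without the trailing slash.
WITH_SLASH = "User-agent: googlebot\nDisallow: /sample_directory/\n"
WITHOUT_SLASH = "User-agent: googlebot\nDisallow: /sample_directory\n"

# Hypothetical URLs used only to compare the two rules.
IN_DIR = "http://www.example.com/sample_directory/page.htm"
LOOKALIKE = "http://www.example.com/sample_directory_old.htm"

if __name__ == "__main__":
    for name, rules in (("with slash", WITH_SLASH), ("without slash", WITHOUT_SLASH)):
        print(name, is_allowed(rules, "googlebot", IN_DIR),
              is_allowed(rules, "googlebot", LOOKALIKE))
```

<p>With the trailing slash only the directory's contents are blocked. Without it, any path that merely starts with the same characters is blocked as well.</p>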
<p>Block a specific file.</p>
<pre>
User-agent: googlebot
Disallow: /sample_file.htm
</pre>
<p>Block multiple directories and files.</p>
<pre>
User-agent: *
Disallow: /sample_directory1/
Disallow: /sample_directory2/
Disallow: /sample_file1.htm
Disallow: /sample_file2.htm
</pre>
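<p>If the list of blocked paths grows, the file can be generated rather than written by hand. This is only a sketch: the function name <code>build_robots_txt</code> is made up for illustration, and it emits one record with one Disallow line per path, mirroring the example above.</p>

```python
def build_robots_txt(user_agent, disallowed_paths):
    """Build a single robots.txt record as text.

    user_agent is the robot name (or "*" for all robots) and
    disallowed_paths is a list of paths, each starting with "/".
    """
    lines = ["User-agent: " + user_agent]
    for path in disallowed_paths:
        lines.append("Disallow: " + path)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_robots_txt("*", [
        "/sample_directory1/",
        "/sample_directory2/",
        "/sample_file1.htm",
        "/sample_file2.htm",
    ]), end="")
```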
<p>Block everything for every robot except Googlebot, which can index everything.</p>
<pre>
User-agent: *
Disallow: /

User-agent: googlebot
Disallow:
</pre>
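<p>A file with more than one record is worth testing before you deploy it. The sketch below feeds the rules above to Python's standard <code>urllib.robotparser</code> and asks what two robots may fetch. The helper <code>allowed</code>, the example.com host, and the robot name <code>somebot</code> are assumptions for illustration.</p>

```python
from urllib.robotparser import RobotFileParser

# The same two records as in the example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: googlebot
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(user_agent, path="/"):
    """Report whether user_agent may fetch path on a hypothetical host."""
    return parser.can_fetch(user_agent, "http://www.example.com" + path)


if __name__ == "__main__":
    print("googlebot:", allowed("googlebot", "/index.htm"))
    print("somebot:", allowed("somebot", "/index.htm"))
```

<p>The record that names a robot takes precedence over the catch-all record for that robot, so googlebot is allowed while every other robot falls back to the asterisk rule.</p>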


<p>Related posts<ol><li><a href='http://www.wiredrevolution.com/commands/search-for-files-with-the-find-command' rel='bookmark' title='Permanent Link: Search for files with the find command'>Search for files with the find command</a></li>
<li><a href='http://www.wiredrevolution.com/system-administration/free-ext3-reserved-blocks-with-tune2fs' rel='bookmark' title='Permanent Link: Free ext3 reserved blocks with tune2fs'>Free ext3 reserved blocks with tune2fs</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.wiredrevolution.com/web-development/control-search-engine-indexing-with-robotstxt/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

