Ever wanted to keep part of your website out of the search engines? A simple robots.txt file asks search engine spiders not to crawl the files you list. Well-behaved spiders honor it, but it is a request rather than a lock. Here’s a quick little guide on setting one up for your website.
Start off by opening up a text editor like Notepad. Enter the following lines:
User-agent: *
Disallow: /
Save it as robots.txt and upload it to the root of your website, so that it can be reached at /robots.txt. Most of the time, the root is the /public_html/ folder. In the example above, ‘User-agent: *’ addresses every spider and ‘Disallow: /’ tells them you don’t want any part of your site crawled. If you only want to keep spiders out of specific directories, use the following instead:
User-agent: *
Disallow: /test/
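The example above tells the spider not to crawl the folder called ‘test’. Don’t forget the trailing slash ( / ). Rules match the start of the path, so without it, /test would also block anything whose name merely begins with ‘test’, such as /testing.html. If you want to keep the spider out of multiple directories, just enter each one on its own line like this: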
User-agent: *
Disallow: /test/
Disallow: /blog/
Disallow: /forum/
Note that each rule also keeps the spider out of any files or sub-directories inside the directory you specify. So in the last example above, if I had a folder called ‘research’ underneath the folder called ‘test’, it would not get spidered either.
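If you want to double-check your rules before uploading them, here is a minimal sketch using Python's standard urllib.robotparser module. It feeds the three Disallow lines from the last example to the parser and asks whether a few paths may be fetched. The is_crawlable helper, the ‘ExampleBot’ spider name, and the sample paths are made up for illustration, and real search engines may interpret edge cases a little differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# The multi-directory rules from the last example.
RULES = """\
User-agent: *
Disallow: /test/
Disallow: /blog/
Disallow: /forum/
"""


def is_crawlable(rules: str, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text.

    `agent` defaults to a hypothetical spider name; any name falls under
    the `User-agent: *` group here.
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical paths: one outside the rules, one nested inside /test/,
    # and one that only shares the "test" prefix.
    for path in ("/about.html", "/test/research/notes.html", "/testing.html"):
        verdict = "allowed" if is_crawlable(RULES, path) else "blocked"
        print(f"{path}: {verdict}")
```

With these rules, the nested /test/research/ page should come back blocked, while /about.html and /testing.html stay allowed, because the trailing slash keeps the rule from matching names that merely start with ‘test’.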
Lastly, if you’re running a program like a blog or forum that you don’t want spidered, make sure you also turn off its RSS feeds and update ping functions. Otherwise it will keep pushing your content out to other sites and services.