If you wish to restrict all or part of your website from being indexed by search engine robots, you can use a robots.txt file.
For it to work properly, it must be a plain ASCII text file named exactly “robots.txt”, placed in the root directory of the domain. Well-behaved robots look at this location for instructions before indexing anything on the website.
You will also need a separate robots.txt file in the root directory of every subdomain you have. A robots.txt file placed anywhere other than the root directory, such as in a subdirectory, will be ignored.
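To illustrate how this works from the crawler's side, here is a minimal sketch using Python's standard urllib.robotparser module; the example.com addresses are placeholders for your own domain.

from urllib.robotparser import RobotFileParser

# A well-behaved robot only ever fetches robots.txt from the domain root.
rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# Ask whether a given robot may fetch a given path.
print(rp.can_fetch("googlebot", "/sample_directory/"))

# A copy at https://example.com/sample_directory/robots.txt is never
# consulted, and https://blog.example.com needs its own robots.txt
# in its own root.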
The basic syntax involves two directives, each on its own line:
- User-agent: the robot the following rule applies to
- Disallow: the pages you want to block
Here is a robots.txt file that blocks an entire site. The asterisk means the rule applies to all robots, and the forward slash blocks every path.
User-agent: *
Disallow: /
This will allow access to an entire domain; an empty Disallow value blocks nothing. You can achieve the same thing by removing the robots.txt file entirely.
User-agent: *
Disallow:
You can block a specific robot.
User-agent: googlebot
Disallow: /
Block a specific directory. Make sure you include the trailing forward slash; Disallow values are matched as path prefixes, so /sample_directory on its own would also block files such as /sample_directory.html.
User-agent: googlebot
Disallow: /sample_directory/
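To see the difference the trailing slash makes, here is a small sketch using Python's standard urllib.robotparser module; the paths are just the sample ones from above.

from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: googlebot",
    "Disallow: /sample_directory/",
]

rp = RobotFileParser()
rp.parse(rules)

# Everything under the directory is blocked.
print(rp.can_fetch("googlebot", "/sample_directory/page.htm"))  # False

# A file that merely shares the prefix is not.
print(rp.can_fetch("googlebot", "/sample_directory.html"))      # True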
Block a specific file.
User-agent: googlebot
Disallow: /sample_file.htm
Block multiple directories and files.
User-agent: *
Disallow: /sample_directory1/
Disallow: /sample_directory2/
Disallow: /sample_file1.htm
Disallow: /sample_file2.htm
Block everything for every robot except Google, which can index everything. A robot follows the most specific User-agent group that matches it, so googlebot obeys its own empty Disallow rather than the general block.
User-agent: *
Disallow: /

User-agent: googlebot
Disallow:
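If you want to sanity-check a combined file like this before publishing it, one option is the short sketch below, again using Python's standard urllib.robotparser module; bingbot here is just an arbitrary stand-in for any robot other than googlebot.

from urllib.robotparser import RobotFileParser

# The rules from the example above, one directive per line.
rules = [
    "User-agent: *",
    "Disallow: /",
    "",
    "User-agent: googlebot",
    "Disallow:",
]

rp = RobotFileParser()
rp.parse(rules)

# googlebot matches its own group, whose empty Disallow allows everything.
print(rp.can_fetch("googlebot", "/sample_file.htm"))  # True

# Every other robot falls back to the * group and is blocked everywhere.
print(rp.can_fetch("bingbot", "/sample_file.htm"))    # False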