Using Robots.txt Can Prevent Effective Inbound Linking
The thing about using robots.txt to block search engine indexing is that not only is it quite inefficient, it can also cut off the flow of inbound link value. When a page is blocked with robots.txt, search engines do not index the content (or the links!) on that page. This means that if you have inbound links pointing to the page, the link juice cannot flow on to other pages. It creates a dead end.
While inbound links to the blocked page probably provide some benefit to the domain as a whole, the value of those inbound links is not used to its full potential. You miss the opportunity to pass link value from the blocked page on to important internal pages.
3 Big Sites with Blocked Opportunity in the Robots.txt File
1. Digg.com
2. Blogger.com or Blogspot.com
3. IBM
Super Solutions to Robots.txt Problems
The big sites in the examples above illustrate robots.txt misuse, and they don't cover every scenario. The following is a list of effective solutions for keeping content out of search engine indexes without losing link juice.
Noindex
In most cases, the robots exclusion meta tag is a better alternative to robots.txt. By adding "noindex" and making sure you do not add "nofollow", your pages will stay out of search engine results but will still pass link value.
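A minimal sketch of the tag, placed in the <head> of the page you want kept out of the index:
-----------------
<meta name="robots" content="noindex">
-----------------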
301 Redirect
The robots.txt file is no place to list old, worn-out pages. If a page has expired (been deleted, moved, etc.), don't simply block it. Redirect that page with a 301 to its most relevant replacement. See the Knowledge Centre for more information on redirects.
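As an illustration, assuming an Apache server and hypothetical page names, a 301 can be set up in an .htaccess file like this:
-----------------
# Permanently redirect the expired page to its replacement
Redirect 301 /old-page.html http://www.yoursite.com/new-page.html
-----------------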
Canonical Tag
Do not block your duplicate content versions in robots.txt. Use the canonical tag to keep the extra versions out of the index and to consolidate link value, wherever possible. See the Information Centre for more on canonicalization and use of the rel="canonical" tag.
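A minimal sketch, using a hypothetical URL for the preferred version; this goes in the <head> of each duplicate page:
-----------------
<link rel="canonical" href="http://www.yoursite.com/preferred-page.html">
-----------------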
Password Protection
Robots.txt is not an effective way to keep confidential information out of the hands of others. If you have confidential information on the Internet, password protect it. If you have a login screen, go ahead and add the "noindex" meta tag to that page. If you expect a lot of inbound links to the login page, be sure to link from it to some of your most important internal pages. That way, the login page passes its link juice along.
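One way to do this, assuming an Apache server (the realm name and file path here are placeholders), is HTTP basic authentication via .htaccess:
-----------------
AuthType Basic
AuthName "Private Area"
# Points to a password file created with the htpasswd utility
AuthUserFile /home/yoursite/.htpasswd
Require valid-user
-----------------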
Effective Robots.txt Usage
The best way to use a robots.txt file is to barely use it at all. Use it to declare that robots have full access to all files on the site and to point robots to your sitemap.xml file. That's it.
Your robots.txt file should look like this:
-----------------
User-agent: *
Disallow:
Sitemap: http://www.yoursite.com/sitemap.xml
-----------------
Bad Bots
"Robots and instructions for the robots.txt file," which means that there are robots that do not follow the robots.txt at all. So when you do a good job of keep away with a good, you are doing a horrible job to keep away from "bad" against. In addition to filtering to allow access only to the Google bot Bing is not recommended for three reasons:
1. The engines change/update bot names frequently.
2. Engines employ multiple types of bots for different types of content.
3. New engines and content discovery technologies trying to get off the ground stand even less of a chance when sites institutionalize preferences for existing user agents only, and search competition is good for the industry.
Competitors
If your competitors are at all SEO-savvy, they will look at your robots.txt file to see what they can discover. Say you are working on a new design, or an entirely new product, and you have a line in your robots.txt file that disallows bots from indexing it. If a competitor checks the file and sees a folder called "/newproducttest", they've just hit the jackpot! Better to keep it on a staging server, or behind a login. Don't give away all your secrets in one small file.
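For example, a single line like this in a public robots.txt (using the hypothetical folder name from above) is all a competitor needs to see:
-----------------
User-agent: *
Disallow: /newproducttest
-----------------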
Handling Non-HTML & System Content
* It isn't necessary to block .js and .css files in your robots.txt. The search engines won't index them, but they sometimes like the ability to analyze them, so it is good to keep access open.
* To restrict robot access to non-HTML documents like PDF files, you can use the X-Robots-Tag in the HTTP header (see the sketch after this list).
* Images! Every website has background images or images used for styling that you don't want indexed. Make sure these images are displayed through CSS rather than the <img> tag wherever possible. This keeps them out of the index without having to disallow a "/style/images" folder in robots.txt.
* A good way to determine whether the search engines are even trying to access your non-HTML files is to check your log files for bot activity; a sample command follows this list.
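As a sketch of the X-Robots-Tag approach, assuming an Apache server with mod_headers enabled (the file pattern is just an example):
-----------------
# Send a noindex header with every PDF the server delivers
<FilesMatch "\.pdf$">
Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
-----------------
And for the log check, assuming a standard Apache access log location (adjust the path and bot name to your setup):
-----------------
grep -i "googlebot" /var/log/apache2/access.log
-----------------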