Robots.txt set to Disallow?!

Perhaps it’s just coincidence, but we’ve been noticing a rather alarming trend of late: robots.txt files being used to block all search engine robots from accessing what should otherwise be a fully accessible web site. In each case, the discovery has come about after a client has asked us why their web site has suddenly dropped in ranking, or why they can no longer find it on Google.

In one case we suspected sabotage; in another it was probably something forgotten when the site went live (most likely the developer used robots.txt to restrict search access instead of keeping a separate development environment, which is still no excuse). Whatever the cause, the impact on a site or business is severe, perhaps even catastrophic.
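For those unfamiliar with the syntax, it takes just two lines in a robots.txt file to shut every well-behaved crawler out of an entire site, which is why an oversight like this is so easy to make and so damaging:

    User-agent: *
    Disallow: /

The wildcard applies the rule to every robot, and the single forward slash blocks everything from the root of the site down.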

Used correctly, the robots file can aid search engine optimisation efforts, or smooth the transition from an old site with its outdated architecture and navigation to a new site with a new structure. For example:

URL Removal via Webmaster Tools

You can request the removal of old URLs or directories from Google’s index by disallowing them in the robots file, then submitting a removal request via the Webmaster Tools console. This is particularly useful when whole directories become redundant and need to be dropped from search engine indexes.
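As an illustration (the directory name here is entirely hypothetical), blocking a retired section of a site before filing the removal request looks like this:

    User-agent: *
    Disallow: /old-catalogue/

Note that the rule on its own only stops compliant crawlers from fetching those pages; it is the removal request that actually clears them out of the index.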

Sculpting Link Juice

This is one of those SEO techniques that attracts praise and criticism in roughly equal measure, but it is one way of controlling how internal links flow around your web site, and in turn how much ‘equity’ or ‘link juice’ you pass from one page to another. The theory says that you can focus link juice from, say, your home page onto a select few sub-pages and in doing so raise their ranking.
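Applied via the robots file, sculpting usually means walling off pages that would otherwise soak up link equity to no benefit. A minimal sketch (the paths are invented purely for illustration) might look like this:

    User-agent: *
    Disallow: /search/
    Disallow: /print/
    Disallow: /login/

With internal search results, printer-friendly duplicates and login pages out of the picture, the theory goes, more of the home page’s equity flows to the sub-pages you actually want to rank.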

Google’s recent announcement of the canonical tag does, to a certain degree, render this technique redundant, but until we see conclusive proof that it works, we’ll remain on the slightly sceptical side of the fence.