Thursday, November 22, 2007

Unvalidated Robots.Txt Risks Google Banishment

The web-crawling Googlebot may find a forgotten line in robots.txt that causes it to de-index a site from the search engine.

Webmasters welcome being dropped from Google about as much as they enjoy flossing with barbed wire. Making it easier for Google to do so would be anathema to any webmaster. Why would anyone willingly exclude a site from Google?

That could happen with an unvalidated robots.txt file. Robots.txt allows webmasters to provide standing instructions to visiting spiders, which contributes to having a site indexed faster and more accurately.
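
For illustration, a minimal robots.txt is just a User-agent line followed by the rules that apply to it; the directory and sitemap URL below are hypothetical placeholders, not anything recommended in the original post:

    # Rules for all crawlers
    User-agent: *
    Disallow: /private/

    # Sitemap location (applies to the whole file)
    Sitemap: http://www.example.com/sitemap.xml

Because compliant crawlers read this file before fetching anything else, a single stray directive in it applies to the entire site.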

Google has been considering new syntax to recognize within robots.txt. The Sebastians-Pamphlets blog reported that Google confirmed recognizing experimental syntax, such as Noindex, in the robots.txt file.

This poses a danger to webmasters who have not validated their robots.txt. A line reading Noindex: / could lead to one's site being completely de-indexed.
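
To make the risk concrete, a leftover entry like the following sketch, perhaps added during testing and then forgotten, would tell Googlebot to drop every page on the site if Google honors the experimental directive:

    # Experimental, unofficial directive - dangerous if left in place
    User-agent: Googlebot
    Noindex: /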

The surname-less Sebastian recommended validating with Google's robots.txt analyzer, part of Google's Webmaster Tools, and using only the Disallow, Allow, and Sitemap crawler directives in the Googlebot section of robots.txt.
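
Following that advice, a Googlebot section that sticks to the documented directives might look like this sketch (the paths and sitemap URL are placeholders):

    # Rules for Google's crawler only - documented directives
    User-agent: Googlebot
    Disallow: /cgi-bin/
    Allow: /

    # Sitemap lines sit outside any User-agent group
    Sitemap: http://www.example.com/sitemap.xml

Running the finished file through the robots.txt analyzer in Webmaster Tools shows how Googlebot will interpret each line before the changes go live.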
