I was recently sent a newsletter from a client about search engines and valid HTML. The author points out that valid HTML is hard to find among the results of his searches.
MANY of his results with invalid HTML (or XHTML) were ‘Official’ sites such as FIFA.COM or the like. Why would a search engine rank the official site highly? Is it because of valid HTML, or because of the CONTENT of the site?
Several times Wikipedia came up as a result, and while it wasn’t valid, it failed only due to a single problem (a duplicated ID attribute). That is NOT really the biggest of issues, and most indexers ignore it entirely anyway.
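For the curious, a duplicated ID is easy enough to spot yourself. Here is a minimal sketch in Python using only the standard library’s `html.parser`. The names `IdCollector` and `find_duplicate_ids` and the sample markup are mine, made up for illustration; this is not how any particular validator does it, and it checks nothing else about the page.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Counts every id attribute value seen in the tags of a document."""

    def __init__(self):
        super().__init__()
        self.ids = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        for name, value in attrs:
            if name == "id" and value:
                self.ids[value] += 1


def find_duplicate_ids(html_text):
    """Return a sorted list of id values that appear more than once."""
    collector = IdCollector()
    collector.feed(html_text)
    collector.close()
    return sorted(value for value, count in collector.ids.items() if count > 1)


# Hypothetical sample page: "nav" is used twice, "main" only once.
sample = '<div id="nav"></div><p id="main">Hello</p><ul id="nav"></ul>'
print(find_duplicate_ids(sample))  # prints ['nav']
```

One repeated ID like this is the kind of thing a validator flags as an error even though the page renders fine in a browser.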
He also brings up the point that ESPN.com, IMDB.com and MySpace.com don’t have valid HTML. What do those sites have that most sites don’t? Content, Visitors and Links APLENTY! As for the search engines themselves being valid, why should they care? Often they are automatically added and the end user doesn’t care at all.
Think of things this way. A website is like a restaurant. The Health Code is Valid HTML. Nobody follows it to the letter (I know; my wife used to run a restaurant and I’ve talked with inspectors). HOWEVER, as long as most of the place is clean and safe, you can serve food. Now, some of the NASTIEST places I’ve been in have been MAJOR fast food restaurants. Do they need to care about each and every health code infraction? No… they have the traffic and sales. Does the little corner pizza shop need to care? A bit more, but they had better have good pizza (CONTENT) or else people won’t come. Make crappy food and odds are people won’t come no matter how clean your floor is… HOWEVER, have a dirty floor and contaminated food, and not only will people not come, but the board of health will shut you down.
The author does conclude that valid HTML is a VERY GOOD THING and encourages people to use it, but that its impact on search engines is minimal. Still, I don’t think he’s really seen HOW bad invalid HTML can be.
On a side note, the links to the online version of the article and the printable version both ended up returning 404 errors… a bit more serious an error than invalid HTML…