Google reveals new version of its search engine
Google today unveiled "Caffeine", the codename for a secret project to develop the next generation of the company's search-engine infrastructure.
Web developers have been invited to test the new version of the search engine and to provide feedback on any differences they observe between results from the current and new systems.
The preview was announced today by the technical lead, Matt Cutts, on the company's official Webmaster Central Blog:
For the last several months, a large team of Googlers has been working on a secret project: a next-generation architecture for Google's web search. It's the first step in a process that will let us push the envelope on size, indexing speed, accuracy, comprehensiveness and other dimensions. The new infrastructure sits "under the hood" of Google's search engine, which means that most users won't notice a difference in search results. But web developers and power searchers might notice a few differences, so we're opening up a web developer preview to collect feedback.
I welcome this opportunity to preview the new service and report back with my observations. Preliminary data from my own tests so far (covering key metrics including search speed, the number of results returned, and relevance) don't indicate any big differences between the current and new versions of the search engine. Qualitative and quantitative differences, whether perceived or measured, are either not big enough to be readily apparent or not consistent enough to support general conclusions. There are obvious differences in the layout of the results page; for example, news and video sections now both appear at the top for applicable queries. Comparisons with Bing are inevitable, since the Caffeine sandbox seems to be copying Microsoft's search engine.
The test site is not yet finished and currently has a number of issues, such as broken links and other errors, on which Google's engineers are not currently seeking feedback. [Note: on 13 August Google's test service was down, with an error message saying the data centre was being updated.]
10 August 2009
Tags: google search engine new version beta preview test secret project caffeine webmasters faster relevant results next-generation architecture sandbox engineer matt cutts blog search competitive intelligence technology news
Predictably, yesterday's news is generating a lot of hype and hysteria. Everybody wants a piece of the action. People are so keen to have their say that many of them seem to do so without reading Google's announcement; a number of commentators appear to have missed the point of this test.
The Daily Telegraph in the UK provides a typical example:
The excitement is more infectious than swine flu. That single quote turned into the headline: "Google reveals caffeine: a new faster search engine".
In reality, this test is not about the speed with which results are returned. Nor would I suggest it's faster; if it were, we could safely assume that's because the new version carries far less traffic. The sandbox test probably handles a tiny fraction of one percent of the load on the existing search engine.
As Google points out, the change only affects the infrastructure, and the only mention of speed is "indexing speed", meaning the speed with which Google captures useful data as it crawls the web.
It's an endless source of entertainment to see how the power of suggestion affects the psychology of other commentators. Google's reputation is literally intoxicating: intelligent adults abandon rational thought under its influence. Is there an implicit assumption that anything Google does must be impressive in every imaginable way? "Wow, Google did something, so it must work faster now!"
I noticed that the BBC jumped on the same bandwagon as everybody else, quoting the same person as the Telegraph, and basing their headline on the same unscientific claim: "New Google 'puts Bing in shade'". This is often how mainstream "news" is generated. A spurious sensationalist quote is published somewhere, then the same thing is regurgitated elsewhere with minor adaptations, in a process of dissemination and legitimisation that can be extremely rapid. This is a good example of contemporary mass-hysteria.
Querystring Spam in Google's new search engine
Among the issues I've identified and reported so far in Google's Caffeine sandbox is an interesting form of URL manipulation Spam that I haven't seen before, which could indicate a new vulnerability.
For example, in a search for "open directory project" the first result listed is dmoz.org. But notice that the domain name of a Turkish search engine is appended to the URL: "www.dmoz.org/?id=www.interturknet.com/".
Presumably somebody is deliberately linking to dmoz.org using a URL with this rogue query-string parameter. The problem doesn't affect the current version of Google search. I've observed this phenomenon with minor web sites before, but never with such a major URL, for which there must be millions of correct links and an extremely high ratio of correct links to links with querystring Spam.
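To make the pattern concrete, here is a minimal Python sketch of how such a URL variant might be flagged and collapsed back to its canonical address. It assumes that a "rogue" parameter is simply one whose value looks like a hostname other than the page's own; the helper names `foreign_host_params` and `strip_foreign_host_params` and the hostname regex are hypothetical illustrations, not anything Google has described.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical heuristic: a value "looks like a host" if it is a dotted name
# ending in a 2+ letter TLD, optionally followed by a path (e.g. "www.interturknet.com/").
HOST_LIKE = re.compile(r"^(?:[a-z0-9-]+\.)+[a-z]{2,}(?:/.*)?$", re.IGNORECASE)


def _split(url):
    # Accept scheme-less URLs such as "www.dmoz.org/?id=..." by assuming http.
    return urlsplit(url if "://" in url else "http://" + url)


def foreign_host_params(url):
    """Return (name, value) query pairs whose value looks like a different host."""
    parts = _split(url)
    own_host = (parts.hostname or "").lower()
    suspicious = []
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        if HOST_LIKE.match(value):
            host = value.split("/", 1)[0].lower()
            if host != own_host:
                suspicious.append((name, value))
    return suspicious


def strip_foreign_host_params(url):
    """Rebuild the URL without the flagged pairs, keeping all other parameters."""
    parts = _split(url)
    flagged = set(foreign_host_params(url))
    kept = [pair for pair in parse_qsl(parts.query, keep_blank_values=True)
            if pair not in flagged]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))
```

Applied to the dmoz.org example, `foreign_host_params` flags `("id", "www.interturknet.com/")` and `strip_foreign_host_params` returns `http://www.dmoz.org/`. A regex like this is deliberately crude; a real indexer would presumably weigh link counts and other signals rather than rely on the shape of a parameter value.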
Update: this change to Google's indexing system looks likely to go live soon, and the preview is now offline.