Googler 13

Tuesday, January 25, 2011

The 3 most important things I learned from Google (part 1)

none of which are likely to be endorsed at Business School

This is the first of three posts.

1) Reinvent the wheel

When I decided to go to work for Google in May of 1999, I heard time and again: “the world doesn’t need another search engine”. The search game was already in high gear. Alta Vista, Lycos, Excite, Infoseek, Inktomi, all the meta search guys... why did we need another? There was an answer, but it was one few could relate to at the time. The Internet was cool and all, but it was not quite indispensable. In 1999 we had a tolerance for digging through pages upon pages of results to find what we were looking for, if we found it at all. The world did need a "better" search engine. It just didn't know it yet.

For those who don't know, when you go to a search engine and type in your query, the search engine doesn’t look at the entire web and bring you back results from it. What actually happens is that, periodically, the search engine sends out little computer programs called “spiders” which go out and look at pages on the internet and store a copy of what they see in their own database (this is called their "index"). When you search, the results you see are from that old database, not the web that exists at the time of your search. As long as the page in the index hasn’t been moved or deleted in between the time the index is built and your search, you end up at the correct page. Remember “Error 404 File Not Found”? That was from all the changes that occurred in between indexing and searching.
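
To make the "old database" point concrete, here is a minimal sketch in Python. It is a toy model, not how any real search engine is built: the URLs, the page text, and the function names (`crawl`, `search`, `fetch`) are all made up for illustration.

```python
# Toy model: the "web" is a dict of URL -> page text.
# All URLs, text, and function names here are hypothetical.
live_web = {
    "example.com/a": "cheap flights to paris",
    "example.com/b": "paris travel guide",
}

def crawl(web):
    """The 'spider': copy every page it sees into an index snapshot."""
    return dict(web)

def search(index, word):
    """Answer a query from the stored snapshot, never from the live web."""
    return sorted(url for url, text in index.items() if word in text.split())

def fetch(web, url):
    """Follow a result link; a page removed since the crawl yields a 404."""
    return web.get(url, "Error 404 File Not Found")

index = crawl(live_web)            # the index is built at crawl time
del live_web["example.com/b"]      # a page is deleted after the crawl

results = search(index, "paris")   # the stale index still lists both pages
print(results)
print([fetch(live_web, url) for url in results])
```

The search still returns the deleted page because it only ever reads the snapshot. The longer the gap between crawls, the more results end in a 404.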

In 1999, indexes were compiled once per month and the largest search index was 100 million pages. Even then, that was only a fraction of all the pages on the web. And here was the real problem: millions of pages were being added to the web daily! When Google came on the scene, it matched Alta Vista’s index size. Alta Vista tried to increase its index to 250 million pages, but the quality of results (as measured by the number of pages you had to dig through before finding what you wanted) got so bad that it had to bring it back to 100 million. This happened because the algorithm (the logic for deciding what to return as search results) that Alta Vista and all the other search engines of the day used wasn’t able to scale. In addition, those engines were susceptible to unscrupulous webpage designers who tricked them into showing pages that weren’t related to the user's search. One way this happened is that a designer would put a popular word on the page 1000 times in white text on a white background. The search engines didn’t know the word was “invisible”, only that it occurred a lot on the page, so they concluded the page must be really relevant to that word. It wasn't. Google did things completely differently. It looked at a dizzying array of variables when analyzing each page, not just word counts. This made it much more resilient.

If 1st generation search engines couldn’t scale their indexes without a loss in result quality, they were eventually going to be rendered obsolete. Google went on to launch an “index size” war, since it believed that its mission was to make the world's information universally usable and accessible. To do that, it had to keep up with the growth of the web (as well as expand beyond just finding traditional web pages to serve as answers to queries). Google’s technology not only scaled well, but because of the specific way it looked at webpages, the results actually got better as the index grew. It also began to index more frequently so the freshest data was in its index, vastly improving the user experience, which is a key, fanatical focus for the company. Ultimately, Google knew that the web would keep growing, that search would one day be akin to a utility, and that without it the web was going to be useless. Google took a completely fresh look at the problem. In doing so, the quality increased (not to mention the speed gained by uncluttering the page), people searched more and relied on the web more to find things, and the entire market grew.
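
The white-on-white trick is easy to reproduce in a few lines of Python. This is only a sketch under made-up assumptions: the pages, the colors, and the `visible_only` switch are hypothetical, and skipping invisible text is just one example of an extra signal, not a description of what Google actually did.

```python
# Each page is a list of (text, text_color, background_color) spans.
# Hypothetical data and names, for illustration only.
pages = {
    "honest.example": [("a guide to buying a used car", "black", "white")],
    "spam.example": [
        ("win a prize today", "black", "white"),
        ("car " * 1000, "white", "white"),   # invisible to a human reader
    ],
}

def count_word(spans, word, visible_only=False):
    """Count occurrences of word, optionally skipping invisible text."""
    total = 0
    for text, color, background in spans:
        if visible_only and color == background:
            continue  # text drawn in the background color cannot be seen
        total += text.split().count(word)
    return total

def rank(pages, word, visible_only=False):
    """Return matching URLs, highest word count first."""
    scores = {url: count_word(spans, word, visible_only)
              for url, spans in pages.items()}
    matches = [url for url in scores if scores[url] > 0]
    return sorted(matches, key=lambda url: scores[url], reverse=True)

naive = rank(pages, "car")                        # word count only
filtered = rank(pages, "car", visible_only=True)  # one extra signal
print(naive)
print(filtered)
```

Ranking on word count alone puts the stuffed page first. Adding even a single extra signal changes the outcome, which is the resilience that comes from looking at more than one variable.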

So, my take-away here is:
Just because things are the way they are now doesn’t mean that it's optimal or has to stay that way. Often today’s standard is the legacy of something that has long since changed or no longer serves the needs of those it was meant to serve. Regardless of whether your market is dominated by large established players, if you look at how the world is evolving and at the needs of tomorrow, you may discover an opportunity to grow market share and the market itself by COMPLETELY rethinking the way things are done.

2 comments:

Maometto said...

This is so true for our business: Internet Radio. The deal is not done. Check out http://www.soundtrckr.com

landing page said...

Steve, thanks for your blog posts and fascinating insights about the early days of Google. I personally find the actual stories about the early days at Google the most interesting parts of your post.