One of the myths that I can dispel about developing search engines is that there is a massive barrier to entry when it comes to hardware.
Google have hundreds of thousands of servers running their search engine, so that’s what it takes to compete and enter the market, or so the argument goes. This couldn’t be further from the truth.
This forgets that Google only need that many servers because a vast number of people search on the site. In any start-up, traffic begins small and builds along a growth curve, which usually lets the start-up add PCs as the traffic grows. Each individual index is actually fairly small and sits across only a few hard drives.
Another popular myth is that the volume of data is so high that hundreds of PCs are required to begin a new engine. I challenge people to do the calculation: how many bytes does it take to store each web page? How many web pages are required for a useful service? You may be surprised.
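For anyone who wants to try that calculation, here is a minimal back-of-envelope sketch. Every figure in it (the page count, the bytes stored per page, the drive capacity) is a placeholder assumption for illustration, not a measurement from any real engine, and the function names are made up for this example. Plug in your own estimates.

```python
def index_size_bytes(num_pages: int, bytes_per_page: int) -> int:
    """Total storage needed if every page costs the same number of bytes."""
    return num_pages * bytes_per_page


def drives_needed(total_bytes: int, drive_capacity_bytes: int) -> int:
    """Number of drives required, rounded up to a whole drive."""
    return -(-total_bytes // drive_capacity_bytes)  # ceiling division


if __name__ == "__main__":
    # All three values below are assumptions chosen purely for illustration.
    pages = 100_000_000          # assumed number of web pages in the index
    per_page = 10 * 1024         # assumed 10 KiB stored per page
    drive = 500 * 10**9          # assumed 500 GB per hard drive

    total = index_size_bytes(pages, per_page)
    print(f"{total / 10**12:.2f} TB across {drives_needed(total, drive)} drive(s)")
```

The sketch ignores replication, index overhead and compression, so treat the result as an order-of-magnitude estimate only.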
What I can personally confirm is that the hardest part about bringing a search engine to market is not the hardware, but the software. 90% of the software can be finished in a short space of time, but the other 10% can take years to complete to a satisfactory level. This is the real barrier to entry.
The story behind our local shopping search engine goes back several years. You can keep in touch with the story on this blog.