Real-time search - the problem


Both Google and Microsoft’s new search service Bing have partnered with Twitter to provide real-time search results for queries. This is great news for finding valuable information, but it also creates new problems to overcome: filtering out the irrelevant data. Search today is based on relevance derived from counting the links to and from a site, with each link weighted by the rank of the site it comes from. This is the basic idea behind Google’s PageRank system. But it’s fundamentally flawed: the older a site is, the more links, and thus the more weight, it can accumulate. Google has of course tried to minimize this effect, but it’s still visible when searching for certain topics. Google “next apple event”, for example. The search result is completely useless.
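
To make the mechanism concrete, here’s a minimal sketch of the PageRank idea in Python. The graph, damping factor, and iteration count are made up for illustration; they’re not Google’s real values. The point is that the long-lived, heavily linked page keeps accumulating weight.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page keeps a small base score...
        new_rank = {p: (1.0 - damping) / n for p in pages}
        # ...and passes the rest of its current score along its links.
        for page, outgoing in links.items():
            if not outgoing:  # dangling page: spread its score evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                share = rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += damping * share
        rank = new_rank
    return rank

# An old site with many inbound links outranks a newer one,
# even if the newer one has fresher content.
graph = {
    "old": ["new"],
    "a": ["old"], "b": ["old"], "c": ["old"],
    "new": ["old"],
}
print(pagerank(graph))  # "old" ends up with by far the highest score
```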

Twitter, however, has the opposite problem. Without a system like PageRank to rank the posts, much of the relevance comes from time: the latest posts are the most relevant. But this also means that topics that aren’t current might not yield any relevant information at all. So Twitter’s time problem is the reverse of Google’s.
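
Here’s a small sketch of what purely time-based relevance looks like: a score that decays with age, so anything more than a few hours old is effectively invisible. The exponential form and the one-hour half-life are my own assumptions for illustration, not how Twitter actually ranks posts.

```python
import time

def recency_score(posted_at, half_life_hours=1.0, now=None):
    """Exponential decay: a post loses half its relevance per half-life."""
    now = time.time() if now is None else now
    age_hours = (now - posted_at) / 3600.0
    return 0.5 ** (age_hours / half_life_hours)

now = time.time()
print(recency_score(now - 10 * 60, now=now))    # ~0.89: ten minutes old
print(recency_score(now - 24 * 3600, now=now))  # ~6e-8: a day old, effectively gone
```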

So how will we solve this? Well, I don’t have a definitive answer, of course. But I’ve more and more come to believe in crowdsourcing as a means to get accurate data. Perhaps relevance can be calculated not from the content itself but from how we interact with it. If real users can be separated from bots (usage patterns are really hard for bots to mask over time, since the cloud could potentially remember every mouse move they make), relevance could be weighted by the number of users who actually read or view the content.
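
As a sketch of that idea: count, per piece of content, the distinct users whose interaction patterns look human. The Session fields and the looks_human thresholds below are hypothetical stand-ins; real bot detection over long-term behavior is much harder than this.

```python
from dataclasses import dataclass

@dataclass
class Session:
    user_id: str
    dwell_seconds: float  # time spent on the content
    mouse_moves: int      # interaction signals recorded client-side

def looks_human(session: Session) -> bool:
    # Hypothetical heuristic: bots tend to show near-zero dwell time
    # and no pointer movement at all.
    return session.dwell_seconds > 5 and session.mouse_moves > 0

def crowd_relevance(sessions: list[Session]) -> int:
    """Relevance = distinct users who plausibly read the content."""
    readers = {s.user_id for s in sessions if looks_human(s)}
    return len(readers)

sessions = [
    Session("alice", dwell_seconds=42.0, mouse_moves=120),
    Session("bob", dwell_seconds=0.3, mouse_moves=0),      # likely a bot
    Session("alice", dwell_seconds=15.0, mouse_moves=30),  # repeat visit
]
print(crowd_relevance(sessions))  # 1 distinct human reader
```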

No doubt Google has teams working on this. And no doubt they will eventually buy some small startup doing it a lot smarter than they are. It’s an interesting problem nevertheless.


Category: web