A Bit of History
I’ve been wondering about what is next for web search. From its humble beginnings, the internet has grown into a colossus which can potentially connect entities from across the planet in seconds. Anyone could start a website. This power to the masses meant there was an explosion of content.
From the point of view of search, this was a huge challenge. How could all this data be effectively indexed so that people could easily find what they’re looking for? And more importantly, could people find the most relevant information on the internet when they wanted to? The huge amount of content available meant that it was very difficult to rank stand-alone web pages effectively on their own merits. The fact that web pages started linking to each other became the signal used to judge relevance. The idea that “if a relevant website thinks of you as relevant enough to link to you, then you get some deflected relevancy too” is the basis of the PageRank algorithm, and it caught on.
But there are possibly two main reasons for the exponential growth of the internet: the opportunity for users to easily add value to the network through weblogs, and the increasing access to technology in the two major new markets of India and China. Combine this with the social networking phenomenon, where people spend time online with the sole purpose of keeping in touch and growing their online friend networks, and we have a complex problem on our hands from a search and retrieval point of view.
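To make the “deflected relevancy” idea concrete, here is a minimal sketch of the textbook power-iteration form of PageRank. It is only an illustration, not how any search engine actually implements it: the function name, the damping factor of 0.85, the fixed iteration count and the tiny three-page link graph are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to.

    The damping value and iteration count are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score everywhere.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical mini-web: "a" and "b" both link to "c", and "c" links to "a".
example_ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

In this toy graph, "c" ends up with the highest score because two pages link to it, and "a" comes second because the well-ranked "c" links to it.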
The Human Problem
Users have multiple, disconnected profiles in multiple online locations. The fragmented nature of the internet is affecting the relevancy of content and impeding users’ contributions. A user now thinks twice before jumping into a new online location that he or she fancies, because of the difficulty of re-establishing a reputation in the new domain. This is especially relevant on sites which rank users on the basis of credibility.
Many have noticed this and have tried to bring in some form of standardization. The bigger social networks are trying to push their existing validation, as can be seen from Facebook’s attempts. If successful, acceptance of its standard would render the company irreplaceable, so to speak. As the gatekeeper to the social world, it would also have a potentially steady income.
Whoever the connector turns out to be, whether it is something like Facebook, a more open implementation like the Diaspora project, or something completely different, this kind of connection is eventually going to happen. Once it does, what happens then?
In an ideal internet future, anonymity would become a thing of the past. Already, on many blogs you need to sign in to post a comment. A system by which we can aggregate all the activities of each unique user, rather than just building a huge web of inanimate documents, would be interesting. Ranking of content would then be only the first step. From those rankings, we could judge sources (not sites, but people) on the basis of their expertise in a given field.
Hence, if for example a new forum is launched which discusses open source technology, a user could easily switch, taking his or her reputation along. Also, from a search and retrieval point of view, a post by this user in the new forum would rank as highly as one made on the old forum, which is not the case now.
Also, a user who links to another’s work passes on deflected credit, much like the system of citations in academic publications.
So we have HumanRank. (Self-coined.)
HumanRank is closer to the concept of citations in academic papers because we are including not just documents (papers) but very human reputations along with them. It is the sum total of your contributions to the internet, ranked according to topic. It crosses domains and sites, and it is open to the websites you sign in to. It is also available to search engines, which can index your data based on your overall personality online and not just your activities on one online property.
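As a rough sketch of what “the sum total of your contributions, ranked according to topic” could look like in code, the snippet below adds up a user’s scores across sites per topic and then passes on a fraction of that score as deflected credit when one user links to another. Everything here is hypothetical: the function name, the tuple layout, the 10% credit share and the sample data are assumptions for illustration, and how an individual contribution gets its score is left open.

```python
from collections import defaultdict


def human_rank(contributions, citations, credit_share=0.1):
    """Toy HumanRank: per-user, per-topic reputation across all sites.

    contributions: iterable of (user, site, topic, score) tuples
    citations:     iterable of (citing_user, cited_user, topic) tuples
    credit_share:  fraction of the citing user's score that is passed on
                   (an arbitrary illustrative value)
    """
    # Step 1: sum each user's contributions per topic, ignoring site borders.
    base = defaultdict(float)
    for user, _site, topic, score in contributions:
        base[(user, topic)] += score

    # Step 2: linking to someone's work passes on some deflected credit.
    total = defaultdict(float, base)
    for citing, cited, topic in citations:
        total[(cited, topic)] += credit_share * base.get((citing, topic), 0.0)

    return dict(total)


# Hypothetical data: alice is active on two forums, bob on one.
contributions = [
    ("alice", "forum-a", "open source", 8.0),
    ("alice", "forum-b", "open source", 4.0),
    ("bob", "forum-a", "open source", 5.0),
]
citations = [("alice", "bob", "open source")]
scores = human_rank(contributions, citations)
```

Here alice’s reputation on the topic is the same whichever forum she posts on, and bob’s score rises slightly because alice, who has a good score of her own, linked to his work.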
To start, it’s possible to use the Google Social Graph API (which, sadly, hasn’t developed much, it would seem) to find properties on the internet to which a particular user contributes. It uses XFN and FOAF to relate the Twitter accounts, blogs and other sites which you own. Crawling through sites person by person and analysing the content to get an idea of where you stand among your peers could be a great way to extend search.
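The snippet below does not call the Social Graph API itself. As a hedged sketch of the underlying idea, it uses only Python’s standard library to pick out XFN-style rel="me" links from a page, which is the markup a person uses to say “this other profile is also mine”. The class name, function name and sample HTML are made up for the example, and a real crawler would also need to fetch pages and verify that the links point back.

```python
from html.parser import HTMLParser


class RelMeParser(HTMLParser):
    """Collects the href of every <a> or <link> tag marked rel="me"."""

    def __init__(self):
        super().__init__()
        self.profiles = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        # rel can hold several space-separated values, e.g. "me nofollow".
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "me" in rel_values and href:
            self.profiles.append(href)


def find_own_profiles(html):
    """Return the URLs a page claims as belonging to the same person."""
    parser = RelMeParser()
    parser.feed(html)
    return parser.profiles


# Hypothetical blog sidebar with one self-link and one link to a friend.
sample_html = """
<a rel="me" href="https://twitter.com/example">My Twitter</a>
<a rel="friend met" href="https://example.org/friend">A friend</a>
"""
own_profiles = find_own_profiles(sample_html)
```

Following such links from one profile to the next gives a list of the properties a single person contributes to, which is the starting point for crawling person by person.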
Just a thought.