research

Who is the searcher in real life? How does search behaviour depend on who you are?

Search engines have certainly modeled these questions endlessly, both to personalize the user experience (in whatever slight way they can) and to potentially increase advertising revenue through better targeting. But for understandably selfish reasons, the amount of publicly (or academically) available user information is very limited.

Ingmar Weber and Carlos Castillo of Yahoo! Research Barcelona published (or are about to publish) a paper titled The Demographics of Web Search at SIGIR ’10.

To show a demographic breakdown of who the users are, they used three sets of data:

1. Yahoo! USA query log data
2. Yahoo! user profile data (year of birth, gender, ZIP)
3. Demographics by ZIP code from the census of the year 2000. (How different is that from Claritas Prizm?)
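
To picture how the three sources above might fit together, here is a minimal Python sketch. It is only my guess at the general shape, not the authors' pipeline: the function name `attach_demographics`, the field names and all the values are made up for illustration.

```python
def attach_demographics(query_log, profiles, census_by_zip):
    """Join each query to its user's profile, then to census data by ZIP.

    Hypothetical schema (an assumption, not from the paper):
      query_log     : list of (user_id, query) pairs
      profiles      : {user_id: {"birth_year", "gender", "zip"}}
      census_by_zip : {zip: {census field: value}}
    """
    rows = []
    for user_id, query in query_log:
        profile = profiles.get(user_id)
        if profile is None:
            continue  # no profile for this user, so nothing to join on
        area = census_by_zip.get(profile["zip"])
        if area is None:
            continue  # ZIP missing from the census table
        rows.append({
            "query": query,
            "birth_year": profile["birth_year"],
            "gender": profile["gender"],
            **area,
        })
    return rows


# Made-up toy data, purely for illustration.
query_log = [("u1", "weather"), ("u2", "news"), ("u3", "maps")]
profiles = {
    "u1": {"birth_year": 1980, "gender": "f", "zip": "00001"},
    "u2": {"birth_year": 1990, "gender": "m", "zip": "00002"},
}
census_by_zip = {"00001": {"income_band": "placeholder"}}

joined = attach_demographics(query_log, profiles, census_by_zip)
```

In this toy run only the first query survives the join: the second user's ZIP has no census entry and the third user has no profile. Wrong or missing profile data shrinks or skews the result in exactly this way.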

There will definitely be some discrepancies. In fact there may be discrepancies on both sides of the data: a number of user profiles may have wrong information, and the Census is 10 years old. (How often is the US Census? When is the next one?)

Being (I think) the first paper of its kind, it has a lot of results. One was the “discriminating queries”: queries that were distinctive to a particular broad population group, with “young” people searching for “free teen chatrooms” and rich fellas searching for Chris Jordan.

Another interesting result apparently shows that more educated people tend to search with longer keywords and click deeper into the results.

And lastly, an interesting line from the paper:

Finally, we are looking at a possibility to share our data on a per-query basis for high-volume queries if privacy guarantees such as k-anonymity can be given. Releasing query log information aggregated for demographic groups is similar in spirit to releasing census information for a particular ZIP code.
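
To make the idea in that quote a bit more concrete, here is a rough Python sketch of what releasing aggregated counts could look like. It is a simple count threshold in the spirit of k-anonymity (a query/group cell is published only if at least k distinct users contributed to it), not the authors' method. The function name, the record layout and the toy data are all my own assumptions.

```python
from collections import defaultdict


def aggregate_with_threshold(records, k):
    """Count distinct users per (query, group) cell and keep cells with >= k users.

    records: iterable of (user_id, query, group) tuples (hypothetical schema).
    Returns {(query, group): number_of_distinct_users}.
    """
    users_per_cell = defaultdict(set)
    for user_id, query, group in records:
        users_per_cell[(query, group)].add(user_id)
    return {
        cell: len(users)
        for cell, users in users_per_cell.items()
        if len(users) >= k  # suppress low-volume cells entirely
    }


# Made-up toy log, purely for illustration.
toy_log = [
    ("u1", "weather", "age 20-29"),
    ("u2", "weather", "age 20-29"),
    ("u3", "weather", "age 20-29"),
    ("u1", "weather", "age 20-29"),     # a repeat by the same user counts once
    ("u4", "rare query", "age 20-29"),  # only one user, so it gets suppressed
]

released = aggregate_with_threshold(toy_log, k=3)
```

With k=3, only the "weather" cell is released, with a count of 3 distinct users. The rare query disappears, which is why such a release would only ever cover high-volume queries.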

Is aggregated query log data any use?

Will it help in any way to Simulate User Interactions?

PS: I get this weird feeling sometimes that some of what I write is taken from what I have read somewhere else. But I can’t re-find those links. How do bloggers manage to remember links which they read weeks (if not months) ago?!

I feel the main difference between the year 1800 and 2000+ is the ability to compete on equal terms, whoever you might be and wherever you may be from. In just over 200 years, I think free access to information has been our biggest step ahead in terms of civilization.

And in this respect we’ve taken tremendous strides thanks to the Internet. I am part of the Google generation and to me content is something that is freely available. With this freedom came tremendous innovation, creativity and empowerment. The common man is no longer powerless against anything. Everything can be brought down with a bit of time and mental effort.

Pay or Leave

It was a few months back that I started seriously looking online through research papers published in the Information Retrieval field. What I was taught or had read in books was, ever so slightly, outdated. And that is when I came across the pay-walls. Any link which read ‘Springer’ or ‘ACM’ was out of bounds. In many cases I ended up trying to find ways around this, scouring the net for a free e-version of a published academic paper. (Thank you DBLP)

I always believed that it was the duty of people in academia to spread information in general, and any good work that they do themselves, to as wide an audience as possible. This is obviously not happening, since I personally know a lot of students who are turned away by the pay-walls. I would like to bring to your notice that Not Everyone Has Money. Not Everyone Has a Credit Card Either.

How about an overhaul of the system, such that the papers are stored on servers at many universities around the world? If the real reason for charging subscription fees is maintenance, then it is the duty of the researchers to do that maintenance. Why does it require money? It requires only effort. Effort from people who will gain something from the process. I know I am willing to put in a few hours a week to keep something useful up and running.

This pay-wall concept has run its course. In today’s world it is just Unacceptable. If it is possible for so many developers to maintain their pet open-source projects over so many years, I don’t see why academics the world over can’t keep something like ACM open and free. It is high time research papers in all fields became open to all Joes everywhere – as long as they have an internet connection.

Update: The whole paper publishing thing is generally paid for by the government (directly or indirectly), so shouldn’t papers be free for students wanting to read them, anywhere in the world?


Academic papers should be Free