Who is the searcher in real life? How does search behaviour depend on who you are?
Search engines have surely modeled these questions endlessly, both to personalize the user experience (in whatever slight way they can) and to potentially increase their advertising revenue through better targeting. But for understandably selfish reasons, the amount of publicly (or academically) available user information is very limited.
1. Yahoo! USA query log data
2. Yahoo! user profile data (year of birth, gender, ZIP)
3. Demographics based on ZIP code, from the census of the year 2000. (How different is that from Claritas Prizm?)
There will definitely be some discrepancies, and possibly on both sides of the data: a number of user profiles may contain wrong information, and the Census is 10 years old. (How often is the US Census taken? When is the next one?)
Being (I think) the first paper of its kind, it has a lot of results. One is the “discriminating queries”: queries that are distinctive of a certain generalized population, with “young” people searching for “free teen chatrooms” and rich fellas searching for Chris Jordan.
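To make the idea of a “discriminating query” concrete for myself, here is a minimal sketch. It is not the paper's method. I am assuming a toy log of (group, query) pairs and scoring each query by a simple lift ratio, P(query | group) / P(query | everyone). The function name, the `min_count` cutoff and the toy data are all made up for illustration.

```python
from collections import Counter


def discriminating_queries(log, group, min_count=2, top_n=3):
    """Rank queries by how over-represented they are in `group`.

    `log` is an iterable of (group_label, query) pairs (a hypothetical format).
    Score = P(query | group) / P(query | everyone); higher means more distinctive.
    """
    log = list(log)
    overall = Counter(q for _, q in log)
    in_group = Counter(q for g, q in log if g == group)
    total_all = sum(overall.values())
    total_group = sum(in_group.values())
    if total_group == 0:
        return []

    scored = []
    for query, count in in_group.items():
        # Ignore queries seen too rarely in the group to say anything about.
        if count < min_count:
            continue
        lift = (count / total_group) / (overall[query] / total_all)
        scored.append((query, lift))

    # Highest lift first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


# Invented toy data, only to exercise the function.
toy_log = [
    ("young", "free teen chatrooms"),
    ("young", "free teen chatrooms"),
    ("young", "weather"),
    ("older", "weather"),
    ("older", "weather"),
    ("older", "retirement planning"),
    ("older", "retirement planning"),
]
young_top = discriminating_queries(toy_log, "young")
older_top = discriminating_queries(toy_log, "older")
```

On this toy log, a query like “weather” that everyone issues gets a low score or is dropped, while a query concentrated in one group floats to the top. A real log would need a much higher `min_count` to avoid noise from rare queries.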
Another interesting result apparently shows that more educated people tend to search with longer keywords and click deeper results.
And lastly, an interesting line:
“Finally, we are looking at a possibility to share our data on a per-query basis for high-volume queries if privacy guarantees such as k-anonymity can be given. Releasing query log information aggregated for demographic groups is similar in spirit to releasing census information for a particular ZIP code.”
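What such a release might look like, as a rough sketch: count distinct users per (query, demographic group) cell and publish only the cells with at least k users. This is a simple suppression threshold in the spirit of k-anonymity, not a formal privacy guarantee, and certainly not whatever the authors would actually do. The record format, the function name and the toy data are my own assumptions.

```python
from collections import defaultdict


def aggregate_with_threshold(records, k=3):
    """Aggregate a query log per (query, group), suppressing small cells.

    `records` is an iterable of (user_id, group, query) triples
    (a hypothetical format). A cell is released only if at least
    `k` distinct users contributed to it.
    """
    users_per_cell = defaultdict(set)
    for user_id, group, query in records:
        # A set counts each user once, however often they repeat the query.
        users_per_cell[(query, group)].add(user_id)

    return {
        cell: len(user_ids)
        for cell, user_ids in users_per_cell.items()
        if len(user_ids) >= k
    }


# Invented toy data, only to exercise the function.
toy_records = [
    ("u1", "zip-A", "weather"),
    ("u2", "zip-A", "weather"),
    ("u3", "zip-A", "weather"),
    ("u3", "zip-A", "weather"),  # repeat by the same user
    ("u4", "zip-B", "weather"),
    ("u5", "zip-B", "rare medical condition"),
]
released = aggregate_with_threshold(toy_records, k=3)
```

With k=3 only the (“weather”, “zip-A”) cell survives here. That hints at why the quote restricts itself to high-volume queries: anything rare gets suppressed.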
Is aggregated query log data of any use?
Will it help in any way to Simulate User Interactions?
PS: I get this weird feeling sometimes that some of what I write is taken from what I have read somewhere else. But I can’t re-find those links. How do bloggers manage to remember links which they read weeks (if not months) ago?!