Much of the data that we have, or potentially might have, represents choices: choices made by users, or choices made by a service provider to meet user needs.
Holdings data. Circulation data. Database usage data.
In its production services, OCLC tends to use holdings data quite a bit to rank results. We have also been doing some research to see whether we can mine some other intelligence from holdings. For example, we are working to see whether holdings are a good indication of audience level. Does the pattern of holding institutions (research, other academic, public, school) say something about the audience that would find a book useful? How would one use this? A query filter would be one example.
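To make the audience-level idea a little more concrete, here is a minimal sketch in Python. It is not OCLC's method: the weights, the type labels, and the function names (`audience_level`, `filter_by_audience`) are all assumptions made up for illustration. It scores a title by the mix of institution types that hold it, then uses that score as a query filter.

```python
from collections import Counter

# Hypothetical weights, invented for this sketch: a higher value stands
# for a more specialised (research-oriented) audience.
TYPE_WEIGHTS = {
    "school": 0.0,
    "public": 0.25,
    "other_academic": 0.75,
    "research": 1.0,
}

def audience_level(holding_types):
    """Return a 0..1 score from the types of institutions holding a title.

    holding_types holds one type label per holding institution.
    Unknown labels are ignored; returns None if nothing usable is left.
    """
    counts = Counter(t for t in holding_types if t in TYPE_WEIGHTS)
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(TYPE_WEIGHTS[t] * n for t, n in counts.items()) / total

def filter_by_audience(records, low, high):
    """Keep the titles whose audience score falls within [low, high].

    records maps a title to its list of holding institution types.
    """
    kept = []
    for title, holding_types in records.items():
        score = audience_level(holding_types)
        if score is not None and low <= score <= high:
            kept.append(title)
    return kept
```

A search for specialist material might then call `filter_by_audience(records, 0.7, 1.0)`, while a search for general readers would use a lower band. Whether a simple weighted average like this tracks real audience level is exactly the open research question.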
There is also growing interest in circulation data. Of course, libraries use circ data internally as they look at their collections. However, what if one pooled circ data to develop recommender systems? An interesting issue here is the proportion of items in different libraries that circulate above certain levels. One sees a 20% figure quoted for research libraries. And the COUNTER initiative provides a basis for sharing database usage data, although we have yet to see services built upon aggregations of this data.
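The circulation proportion mentioned above is easy to compute once checkout counts are in hand. The sketch below is an illustration only: it assumes each library can supply a mapping from an item identifier to its checkout count, and that identifiers are shared across libraries so counts can be pooled. The names `share_circulating` and `pool` are hypothetical.

```python
def share_circulating(checkouts_by_item, min_checkouts=1):
    """Return the fraction of items with at least min_checkouts checkouts.

    checkouts_by_item maps an item identifier to its checkout count.
    Returns None for an empty collection.
    """
    if not checkouts_by_item:
        return None
    above = sum(1 for n in checkouts_by_item.values() if n >= min_checkouts)
    return above / len(checkouts_by_item)

def pool(*libraries):
    """Sum checkout counts across libraries, keyed by a shared identifier.

    Assumes the same identifier means the same item in every library.
    """
    pooled = {}
    for library in libraries:
        for item, count in library.items():
            pooled[item] = pooled.get(item, 0) + count
    return pooled
```

Comparing `share_circulating` for each library with the figure for the pooled data would show how far circulation patterns differ between institutions, which matters if the pooled counts are to drive recommendations.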
In his recent book, The Search, John Battelle talks about ‘the database of intentions’, as manifest in the collective usage data of AOL, MSN, Google and Yahoo:
Taken together, this information represents a real-time history of post-Web culture – a massive clickstream database of desires, needs, wants, and preferences that can be discovered, subpoenaed, archived, tracked, and exploited for all sorts of ends. [p. 6. The Search]
Mining the collective library ‘database of intentions’ to refine and improve service will become of much greater interest in coming years.