ISKO Italia. Documents

Folksonomies: power to the people

by Emanuele Quintarelli

<info @ infospaces.it>

paper presented at the ISKO Italy-UniMIB meeting : Milan : June 24, 2005


the author during the presentation / photo by Antonella Pastore


Introduction

In recent times, an unprecedented amount of Web content has begun to be generated through weblogs, wikis and other social tools, thanks to lower technology and cost barriers. A new host of content creators is emerging, often individuals with the will to participate in discussions and share their ideas with like-minded people. In other words, this increasing amount of varied, valuable content is generated by people who are not trained information professionals: they are at the same time users and producers of information.

We have gone past a critical mass of connectivity between people that has introduced a new revolutionary ability to communicate, collaborate and share goods online.

To respond to these increased informational and exchange needs, new communication models are emerging and producing an incredible amount of distributed information that information management professionals, information architects, librarians and knowledge workers at large need to link, aggregate, and organize in order to extract knowledge.

The issue is whether the traditional organizational schemes used so far are suitable to address the classification needs of fast-proliferating, new information sources or if, to achieve this goal, better aggregation and concept matching tools are required.

Folksonomies attempt to provide a solution to this issue, by introducing an innovative distributed approach based on social classification.

Overview of classification schemes

For centuries, classification has been used to provide context and direction in every area of human knowledge. Our minds seem to define, understand and describe the external world by tracing boundaries and fitting things into classes, containers with a name. As human beings, we need to know clearly where we are and which routes lead to other places. In the physical world, we design and use maps, coordinates, graphs, diagrams and signposts. Equivalent tools are needed to find our way in the virtual world.

Much work has been done by librarians and information scientists to create appropriate and powerful classification systems. Classification requires the design and consistent use of a scheme for a systematic organization of knowledge. See [1].

Traditionally, there are two different approaches to classification:

Hierarchical-enumerative schemes are basically trees of containers connected by parent-child relationships, with only one path from the "root" to each of the "leaves". We are very familiar with this kind of knowledge organization which, however, has a number of drawbacks:

Unlike hierarchical-enumerative schemes, analytico-synthetic schemes give up enumerating classes and instead describe items through a combination of aspects (facets). In a faceted classification scheme, the facets may be considered dimensions in a Cartesian n-dimensional space, and the value of a facet is the position of the object along that dimension. Instead of being fitted into a predetermined hierarchy, items can be placed on the fly, by evaluating their inherent characteristics, and users can retrieve them through the same properties, either one at a time or in combination.

Faceted classification can be applied to large homogeneous datasets and suggests an explorative approach, whereby a large dataset is progressively filtered through the user's choices. Users can restrict the resulting dataset at each step, until they arrive at a group of items that meets their needs. See [2].
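The explorative, progressive filtering just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the items, the facet names (`format`, `language`, `topic`) and their values are all hypothetical. Each item is a set of facet values, its coordinates in the facet space, and each user choice narrows the current result set.

```python
from typing import Dict, List

# Hypothetical items: each one is described by values on several facets
# (its "coordinates" in the n-dimensional facet space).
Item = Dict[str, str]

ITEMS: List[Item] = [
    {"title": "Intro to wikis", "format": "article", "language": "en", "topic": "social software"},
    {"title": "Italian blogging guide", "format": "book", "language": "it", "topic": "weblogs"},
    {"title": "Tagging photos", "format": "article", "language": "en", "topic": "photo sharing"},
    {"title": "Blogging basics", "format": "article", "language": "en", "topic": "weblogs"},
]


def refine(items: List[Item], facet: str, value: str) -> List[Item]:
    """Keep only the items whose value on `facet` equals `value`."""
    return [item for item in items if item.get(facet) == value]


# Progressive filtering: each user choice restricts the previous result set.
step1 = refine(ITEMS, "format", "article")
step2 = refine(step1, "language", "en")
step3 = refine(step2, "topic", "weblogs")
print([item["title"] for item in step3])  # ['Blogging basics']
```

Because every facet acts as an independent filter, the same result is reached whatever order the choices are made in, and a new facet only requires adding a new key to the items it applies to.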

In this flexible and scalable approach, an item can be associated with, or rather described by, more than one facet, and new facets can be introduced freely and quite painlessly to express new concepts.

New information sources and mass amateurization of Web publishing

Ten years after the first homepages, today's mass phenomenon is weblogs, online diaries or journals that either individuals or groups publish and share with others on the Web. We now have hundreds of thousands of active weblogs, most of them powered by simple content management systems. Thousands of weblogs are created and die every day, in every country of the world. Weblog authors, or bloggers, are entering the realm of politics and large corporations at an incredibly fast pace, linking, posting, trackbacking and commenting in an enormous living network. Now, everyone can have their own blog to express opinions, create communities, collect links and keep an online diary.

The Web publishing process has come to the masses thanks to lower technology and cost barriers.

Blogging and content management software provides anyone interested with extremely simple and accessible tools to update a website every day, almost effortlessly and at no cost. See [3].

Blogging is just one component of the emerging, more general concept of social software, a technology «which supports, extends, or derives added value from human social behavior -- message-boards, musical taste-sharing, photo-sharing, instant messaging, mailing lists, social networking». See [4].

The point here is that we have gone past a critical mass of connectivity between people that introduced a new revolutionary ability to communicate, collaborate and share goods online.

Besides blogs and wikis, other tools of social connection are emerging, such as photo sharing, social bookmarking and to-do list sharing.

These tools are producing an incredible amount of distributed information that we need to link, aggregate and organize in order to extract knowledge. To achieve this goal, better aggregation and concept matching tools are required.

Limits of taxonomies

From what we saw in the previous paragraphs, traditional schemes of classification work better when a domain presents these properties:

In other words, to classify cleanly, one should identify formal categories that do not change over time, contain homogeneous entities and are capable of describing all the items in a corpus. DDC (Dewey Decimal Classification) is an example of a traditional classification scheme often applied in libraries. Library collections appear to be homogeneous sets of items (books) that can be grouped into hierarchical, formal, predefined classes, and to which new items are added at a reasonable pace.
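A hierarchical-enumerative scheme of this kind can be modelled as a tree in which every class has exactly one parent, so that each item, filed under a single class, is reachable by a single path from the root. In the Python sketch below, the class numbers and labels are only loosely modelled on the DDC's decimal notation, and the root class is invented, so they should not be read as an excerpt from the actual schedules.

```python
from typing import Dict, List, Optional, Tuple

# Illustrative classes only, loosely modelled on decimal notation:
# class number -> (label, parent class number). Only the root has no parent.
CLASSES: Dict[str, Tuple[str, Optional[str]]] = {
    "ROOT": ("Root", None),
    "500": ("Science", "ROOT"),
    "510": ("Mathematics", "500"),
    "516": ("Geometry", "510"),
    "530": ("Physics", "500"),
}


def path_to_root(code: str) -> List[str]:
    """Return the labels on the single path from the root down to `code`."""
    labels: List[str] = []
    current: Optional[str] = code
    while current is not None:
        label, parent = CLASSES[current]
        labels.append(label)
        current = parent
    return list(reversed(labels))


# Each item is filed under exactly one class number, hence one location.
catalogue: Dict[str, str] = {"A book on triangles": "516"}
print(" > ".join(path_to_root(catalogue["A book on triangles"])))
# Root > Science > Mathematics > Geometry
```

The structure makes the constraint visible: a book that would fit equally well under Geometry and under Physics must still receive one class number, and the path leading to that number becomes its only route through the hierarchy.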

Another fundamental point to take into account is the target audience of the classification strategy. Taxonomies and controlled vocabularies work by establishing a clear view and organization of the corpus on which users have to agree in order to use the classification scheme properly.

Traditional classification schemes also require:

Using a sound and complete classification scheme requires professionals to do the job, a common, clear view of the domain, and skilled users who understand the categories and the structure of the classification well enough to use it without problems. See [5].

On the other hand, sprawling, heterogeneous information sources make up an enormous, ever-changing, time-sensitive, ill-defined corpus of items to classify without a central authority, targeted at a heterogeneous and growing group of users. This situation requires new and different classification strategies.

The Web today fits this description neatly. On the Web, the priorities are scalability, flexibility, fluidity and simplicity, to satisfy the demanding needs of millions of people with different cultural and social backgrounds all over the world. Under these circumstances, traditional, precise classification schemes become expensive (to create and maintain) and probably lose the ability to match users' ways of thinking about and organizing the world.

Folksonomies provide an approach to address Web-specific classification issues.

Folksonomies: an emerging approach to distributed classification

A folksonomy is a user-generated classification, emerging through bottom-up consensus (see [6]). The first use of the term folksonomy, a blend of the words folk and taxonomy, has been attributed to Thomas Vander Wal. Taxonomy comes from the Greek taxis and nomos: taxis means arrangement or order, and nomos (or -nomia) means law or management. Folk means people.

The term was coined on the AIfIA mailing list to describe the widespread practice of collaborative categorization using freely chosen keywords by a group of people cooperating spontaneously. See [7].

Folksonomies are not a theory or a top-down strategy: they were born out of a feature (folk classification tools) introduced by software like Del.icio.us <http://del.icio.us>, Flickr <http://www.flickr.com>, 43things <http://www.43things.com>, Furl <http://www.furl.net>, Technorati <http://www.technorati.com>, etc., and from people using these platforms to tag their content (links, photos, etc.).

Folksonomies require people to associate keywords with content. Using popular keywords brings the reward of visibility: users see their own content rise to prominence in the system (for example, on the homepage).

In a bottom-up, distributed and collaborative grassroots approach, tagging (or folksonomy) is a manifestation of people moving away from hierarchical, authoritative schemes. Rather than learning yet another externally imposed scheme that classifies items and, to some extent, restricts the user's thinking, people started to associate their own tags with the items they wanted to collect and share. In a social, distributed environment, sharing one's own tags opens innovative ways to map meaning and lets relationships emerge naturally. See [8].

Folksonomies are not simply visitors tagging something for personal use: they are also an aggregation of the information that visitors provide. The power of folksonomies lies in the act of aggregating, not simply in the creation of tags. Without a social, distributed environment that encourages aggregation, tags are just flat keywords, meaningful only to the user who chose them. Here, the power lies with the people. The relationship between terms and their meaning emerges through an implicit contract between users.
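The point that aggregation, not the individual tag, carries the value can be illustrated with a small Python sketch. Tagging events are modelled here as hypothetical (user, tag, item) triples; the users, tags and example.org URLs are made up, and lowercasing tags is an assumption of this sketch. Taken one by one, each triple is just a personal keyword; counted across users, the triples show which terms the group has converged on for each item.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical tagging events: (user, tag, item).
TAGGINGS: List[Tuple[str, str, str]] = [
    ("anna", "folksonomy", "http://example.org/tags"),
    ("bob", "folksonomy", "http://example.org/tags"),
    ("carla", "tagging", "http://example.org/tags"),
    ("dario", "folksonomy", "http://example.org/tags"),
    ("bob", "ia", "http://example.org/tags"),
    ("anna", "photos", "http://example.org/pics"),
]


def aggregate(taggings: List[Tuple[str, str, str]]) -> Dict[str, Counter]:
    """Count, for each item, how many distinct users applied each tag."""
    seen = set()
    per_item: Dict[str, Counter] = defaultdict(Counter)
    for user, tag, item in taggings:
        key = (user, tag.lower(), item)  # tags are lowercased (an assumption)
        if key in seen:  # ignore a user repeating the same tag on the same item
            continue
        seen.add(key)
        per_item[item][tag.lower()] += 1
    return per_item


cloud = aggregate(TAGGINGS)
print(cloud["http://example.org/tags"].most_common(1))  # [('folksonomy', 3)]
```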

The concept on which folksonomies are based can be applied to anything that can be aggregated. The key point is to have an activity to observe that:

Though it works through a different mechanism, an example of aggregation based on user activity and interest is the recommendations feature on Amazon.com: here the aggregated activity, instead of tagging, is users reading a product page. This activity is explicit, can be aggregated, is meaningful for users and, by transparently tracing user behavior, produces useful insights for the company. See [9].
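The same aggregation logic works for activities other than tagging. The Python sketch below counts how often two products are viewed by the same user and recommends the most frequent companions. It only illustrates the general idea of item-to-item co-occurrence; it is not a description of Amazon's actual recommendation algorithm, and the users and product names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set

# Hypothetical browsing data: which product pages each user read.
VIEWS: Dict[str, Set[str]] = {
    "u1": {"book-a", "book-b", "book-c"},
    "u2": {"book-a", "book-b"},
    "u3": {"book-b", "book-d"},
}


def co_views(views: Dict[str, Set[str]]) -> Dict[str, Counter]:
    """For every product, count the other products viewed by the same users."""
    result: Dict[str, Counter] = {}
    for products in views.values():
        for a, b in combinations(sorted(products), 2):
            result.setdefault(a, Counter())[b] += 1
            result.setdefault(b, Counter())[a] += 1
    return result


def recommend(product: str, views: Dict[str, Set[str]], k: int = 2) -> List[str]:
    """Return up to k products most often viewed together with `product`."""
    counts = co_views(views).get(product, Counter())
    return [p for p, _ in counts.most_common(k)]


print(recommend("book-a", VIEWS))  # ['book-b', 'book-c']
```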

Folksonomies at work

Two of the best known examples of social software using folksonomies are probably Flickr and Del.icio.us. They are aimed at different user needs and profiles, but the basic idea is simply to make people share items annotated with tags.

Flickr is a social photo management and sharing tool that allows users to easily upload and share digital photos.

Del.icio.us is a social bookmarks manager. It allows its users to easily add web pages to personal collections, to tag them, and to share their collections with others.

As Jon Udell put it more formally (see [10]), both Del.icio.us and Flickr are collaborative systems for:

We have always believed that nobody (except for professional indexers) would assign metadata or classify content, and that even if someone tried, they would produce useless, inconsistent taxonomies. The main reasons for this were the lack of benefits for users in classifying things and the complexity of the operation. Given this belief, nobody could have imagined that users of Flickr and Del.icio.us would assign tags to content every day, and that these tags could become the gateway to a new experience of the Web, opening fascinating and innovative possibilities for navigation and search.

Put simply, this is how this software works: users tag an item with a list of keywords, either new ones they come up with at the moment or existing ones that others have already provided. Keywords are certainly not new on the Web, but these tools add some relevant new properties:

Broad and narrow folksonomies

Flickr, Del.icio.us and other social tools that leverage the power of folksonomies target different user profiles and show different usage patterns. As a result, the folksonomies they produce are quite different in nature.

As explained by Thomas Vander Wal (see [11]), we can distinguish two types of folksonomy, each with specific properties and suggested uses:

A broad folksonomy (such as that of Del.icio.us) is the result of many people tagging the same item. Every user can tag the object differently, following their own mental model, vocabulary and language. The resulting tag distribution tends to follow a power law curve, with a long tail effect.

(Many natural phenomena follow a bell curve, a curve with a marked peak (a Gaussian curve), in which events deviating strongly from the average are rare. Power law distributions are very different from Gaussian curves: they have no peak, no characteristic value, but look like continuously decreasing curves in which a large number of tiny events (the long tail) coexist with a few anomalously large ones. See "Long tail" on Wikipedia.)
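In symbols, the contrast drawn in the parenthetical note is between a Gaussian density, concentrated around a characteristic mean μ, and a power law, which has no such peak. For tags, the power law is usually observed as a rank-frequency (Zipf-like) relation, where f(r) is the number of uses of the r-th most popular tag. The symbols α, s, K and x_min are introduced here for illustration; they are empirical parameters, not values taken from this paper.

```latex
% Gaussian (bell curve): values concentrate around the mean \mu
p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

% Power law: no characteristic value, a heavy (long) tail
p(x) = \frac{\alpha - 1}{x_{\min}} \left(\frac{x}{x_{\min}}\right)^{-\alpha}, \qquad x \ge x_{\min},\; \alpha > 1

% Rank-frequency form for tags
f(r) = K\, r^{-s} \quad\Longrightarrow\quad \log f(r) = \log K - s \log r
```

On log-log axes the rank-frequency relation is therefore a straight line of slope -s: a handful of popular tags at the head, followed by a long tail of tags used only a few times.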

In a broad folksonomy, the power law reveals that many people agree on using a few popular tags but also that smaller groups often prefer less known terms to describe their items of interest.

Therefore, a broad folksonomy provides a tool to investigate trends in large groups of people describing a corpus of items and can be used to select preferred terms or extract a controlled vocabulary.
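One simple way to use a broad folksonomy for selecting preferred terms, as suggested above, is to take the smallest set of most popular tags that accounts for a given share of all taggings and to treat the rest as the long tail. In the Python sketch below, the 80% threshold and the tag counts are arbitrary assumptions chosen for illustration; a real controlled vocabulary extracted this way would still need editorial review.

```python
from collections import Counter
from typing import List, Tuple


def split_head_tail(counts: Counter, share: float = 0.8) -> Tuple[List[str], List[str]]:
    """Split tags into a 'head' covering `share` of all taggings and a long tail."""
    total = sum(counts.values())
    head: List[str] = []
    covered = 0
    ranked = counts.most_common()  # most popular first
    for i, (tag, n) in enumerate(ranked):
        if covered >= share * total:  # head already covers the requested share
            return head, [t for t, _ in ranked[i:]]
        head.append(tag)
        covered += n
    return head, []


# Hypothetical tag counts for one heavily bookmarked page.
tags = Counter({"folksonomy": 50, "tagging": 30, "ia": 8, "web2": 5,
                "metadata": 4, "toread": 2, "cool": 1})
head, tail = split_head_tail(tags)
print(head)  # ['folksonomy', 'tagging']
print(tail)  # ['ia', 'web2', 'metadata', 'toread', 'cool']
```

The head terms are candidates for a preferred vocabulary; the tail is where smaller groups' less common terms live, which is exactly the information a controlled vocabulary would otherwise discard.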

The real power of broad folksonomies lies in the richness of the mass, in people explicitly exposing their own ways of defining and describing things, which is what produces the long tail and the power law curve. These effects are simply absent in personomies, i.e. individuals tagging their own self-produced or uploaded content.

A narrow folksonomy (such as that of Flickr), on the other hand, is the result of a smaller number of individuals tagging items (with one or more tags) for later personal retrieval or for their own convenience.

Narrow folksonomies lose the richness of the mass, but provide benefits in tagging objects that are not easily findable with traditional tools (full-text search or other text-related tools) or that cannot be simply described in current text-based software on the Web.

A narrow folksonomy gives various target audiences (possibly sharing a rather specific vocabulary) the means to add tags in their own language. This property makes later retrieval fast, efficient and enjoyable.
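Vander Wal's distinction can be roughly approximated from the tagging data itself. The heuristic in the Python sketch below is an assumption made for illustration, not his definition: an item is labelled "broad" when several distinct users have tagged it and at least one tag has been applied by more than one of them, so that tag frequencies (and hence a power law) can emerge; otherwise it is labelled "narrow". The `min_users` parameter and the data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def folksonomy_type(taggings: List[Tuple[str, str, str]], min_users: int = 3) -> Dict[str, str]:
    """Label each item 'broad' or 'narrow' from (user, tag, item) triples (heuristic)."""
    users: Dict[str, Set[str]] = defaultdict(set)
    tag_users: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for user, tag, item in taggings:
        users[item].add(user)
        tag_users[item][tag].add(user)
    labels: Dict[str, str] = {}
    for item, taggers in users.items():
        # Is at least one tag shared by two or more users?
        shared = any(len(u) > 1 for u in tag_users[item].values())
        labels[item] = "broad" if len(taggers) >= min_users and shared else "narrow"
    return labels


# Hypothetical data: a bookmarked page tagged by many people,
# and a photo tagged only by its owner.
events = [
    ("anna", "folksonomy", "page"), ("bob", "folksonomy", "page"),
    ("carla", "tagging", "page"), ("anna", "sunset", "photo"),
    ("anna", "milan", "photo"),
]
print(folksonomy_type(events))  # {'page': 'broad', 'photo': 'narrow'}
```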

Properties of folksonomies

Much debate is currently going on about folksonomies. From this discussion, a number of properties have emerged.

Detractors of folksonomies highlight the following drawbacks:

On a positive note, supporters of folksonomies underline that:

In brief, using the words of Timo Hannay (see [20]), a folksonomy is «liberating, not restrictive; bottom-up, not imposed; relational, not hierarchical. It also cleverly harnesses selfish acts and directs them towards the common good. But most of all, it just seems to fit the way our brains work».

Enterprise folksonomies

Folksonomies are not limited to the geek world or to the blogosphere. Enterprises have also started blogging and experimenting with folksonomies. An example is IBM's intranet, which serves 315,000 IBM employees worldwide in different languages and with multiple roles and information needs. While it currently uses a controlled taxonomy, IBM has announced that it will start experimenting with folksonomies to keep information up to date and organized according to its users' personal ways of accessing the system. See [21].

To address the intrinsic loss of precision of folksonomies, Jess McMullin proposes complementing social classification with other classification approaches: «automated keyword extraction, tag suggestions built into the tagging tool as the tag is typed [see Google Suggest and Ajax technology], mapping ad-hoc tags to structured facets, and top-down classification oversight by information professionals». See [19].
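Of McMullin's proposals, tag suggestion while typing is the easiest to sketch. The hypothetical, in-memory Python function below ranks previously used tags that start with the typed prefix by popularity, nudging users towards shared terms without forbidding new ones. In a web tool, a function like this could be called on each keystroke; the tag counts are made up.

```python
from collections import Counter
from typing import List

# Hypothetical tag usage counts collected from the whole system.
TAG_COUNTS = Counter({"folksonomy": 120, "folk": 15, "food": 40,
                      "photography": 75, "photo": 90, "php": 30})


def suggest(prefix: str, counts: Counter, limit: int = 3) -> List[str]:
    """Return up to `limit` known tags starting with `prefix`, most popular first."""
    prefix = prefix.strip().lower()
    if not prefix:
        return []
    matches = [(tag, n) for tag, n in counts.items() if tag.startswith(prefix)]
    matches.sort(key=lambda pair: (-pair[1], pair[0]))  # popularity, then alphabetical
    return [tag for tag, _ in matches[:limit]]


print(suggest("fo", TAG_COUNTS))  # ['folksonomy', 'food', 'folk']
print(suggest("ph", TAG_COUNTS))  # ['photo', 'photography', 'php']
```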

Large corporations are often made up of independent silos that are unable to communicate with each other and do not share a common vocabulary: the same thing can have different names in different silos. A typical argument against introducing folksonomies in a corporate environment is that using them to retrieve documents from corporate archives would still require a common language, a shared vocabulary spoken by the entire company, so that every article could be given a well-defined label or set of labels. This is not true: although the vocabularies differ, people are classifying the same real things underlying the terms used to name them. This knowledge allows the creation of a mapped folksonomy, a sort of synonym ring, between the language of individuals and the corporate language. Each user can then retrieve documents using the terms of their own vocabulary, which the system maps to the corporate vocabulary. See [22].
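A mapped folksonomy of this kind can be sketched in Python as a table of synonym rings: each ring groups the terms that different silos use for the same underlying thing and points them to one corporate term. The silo terms, corporate terms and documents below are all hypothetical, and building and maintaining the rings, which is the hard part, is not shown.

```python
from typing import Dict, List

# Hypothetical synonym rings: corporate term -> terms used in individual silos
# ("fattura" stands for a term from an Italian-speaking office).
SYNONYM_RINGS: Dict[str, List[str]] = {
    "customer": ["client", "account", "buyer"],
    "invoice": ["bill", "fattura"],
}

# Reverse index: any known term -> its corporate term.
TO_CORPORATE: Dict[str, str] = {
    term: corporate
    for corporate, synonyms in SYNONYM_RINGS.items()
    for term in [corporate, *synonyms]
}

# Hypothetical documents indexed under corporate terms.
DOCUMENTS: Dict[str, List[str]] = {
    "customer": ["CRM guidelines", "Key accounts report"],
    "invoice": ["Billing procedure"],
}


def search(user_term: str) -> List[str]:
    """Map a user's own term to the corporate vocabulary, then retrieve."""
    term = user_term.lower()
    corporate = TO_CORPORATE.get(term, term)  # unknown terms are used as-is
    return DOCUMENTS.get(corporate, [])


print(search("Client"))  # ['CRM guidelines', 'Key accounts report']
print(search("bill"))    # ['Billing procedure']
```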

Side benefits of folksonomies

Leaving aside classification as a goal in itself for a moment, folksonomies appear as a means of self-expression within a group and, in a more general context, they can suggest useful possibilities for aggregation and analysis. The aggregation of the tags that people assign to items reveals their personal ways of expressing concepts and the means by which they communicate within their group.

This analysis of people's behavior and perceptions can be accelerated by sharing folksonomies. Bud Gibson has proposed a new XHTML microformat for this purpose, named xFolk. See [23].
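As far as we understand the proposal, xFolk marks tags with the rel-tag microformat, in which a link carrying `rel="tag"` declares that the last segment of its URL path is a tag for the page (the link text is not used). The Python sketch below uses the standard library's `html.parser` to pull such tags out of a snippet. The `xfolkentry` and `taggedlink` class names in the sample markup reflect our reading of xFolk and should be checked against the proposal; tolerating a trailing slash is our own assumption, and the URLs are made up.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import unquote, urlparse

SAMPLE = """
<div class="xfolkentry">
  <a class="taggedlink" href="http://example.org/article">An article</a>
  <a rel="tag" href="http://example.org/tags/folksonomy">folksonomy</a>
  <a rel="tag nofollow" href="http://example.org/tags/social%20software">social software</a>
</div>
"""


class RelTagParser(HTMLParser):
    """Collect tags from <a rel="tag"> links: the tag is the last URL path segment."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "a":
            return
        attributes = dict(attrs)
        rel = attributes.get("rel")
        href = attributes.get("href")
        # rel may hold several space-separated values, e.g. "tag nofollow".
        if rel and href and "tag" in rel.lower().split():
            # Strip a trailing slash (assumption), then take the last path segment.
            segment = urlparse(href).path.rstrip("/").split("/")[-1]
            if segment:
                self.tags.append(unquote(segment))


parser = RelTagParser()
parser.feed(SAMPLE)
print(parser.tags)  # ['folksonomy', 'social software']
```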

As a side benefit, tagging fosters the creation of communities around classification: people using the same keywords have a common interest. Therefore, folksonomy can be a «ridiculously low-cost kind of community that's nothing more than a beneficial side effect of people tagging documents for their own future recall», as Gene Smith writes in his post after IA Summit 2005. See [24].
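The observation that people using the same keywords share an interest suggests a simple way to surface such implicit communities: compare users' tag vocabularies. The Python sketch below uses Jaccard similarity (shared tags divided by all tags used by either user); the measure, the 0.3 threshold and the users are illustrative choices, not something proposed in the sources cited here.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical tag vocabularies of four users.
USER_TAGS: Dict[str, Set[str]] = {
    "anna": {"folksonomy", "ia", "tagging", "usability"},
    "bob": {"folksonomy", "tagging", "metadata"},
    "carla": {"photography", "milan", "travel"},
    "dario": {"travel", "photography", "food"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of tags two users have in common (0.0 = none, 1.0 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_users(user_tags: Dict[str, Set[str]],
                  threshold: float = 0.3) -> List[Tuple[str, str, float]]:
    """List user pairs whose tag vocabularies overlap at least `threshold`."""
    pairs = []
    for u, v in combinations(sorted(user_tags), 2):
        score = jaccard(user_tags[u], user_tags[v])
        if score >= threshold:
            pairs.append((u, v, round(score, 2)))
    return pairs


print(related_users(USER_TAGS))
# [('anna', 'bob', 0.4), ('carla', 'dario', 0.5)]
```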

From trees to leaves: a comparison of taxonomies, facets and folksonomies

Folksonomies are not the solution to every modern classification problem, and they are not an alternative to the traditional classification schemes librarians have designed over the years. They are simply a powerful and innovative tool that should be applied only in the right circumstances, taking into account their specific properties and the ways they differ from other classification schemes such as taxonomies and faceted classification.

Here are some of the major differences between folksonomies and traditional classification:

In David Weinberger's words, «Trees are neat; piles of leaves are messy».

Because of these differences, taxonomies, facets and folksonomies have different potential areas of application:

For more information see [25].

Conclusions

Folksonomies are a new, rapidly evolving approach to the classification of digital objects. Much remains to be discovered and tested. What we have not created yet is probably «a middle ground, somewhere between the pure democracy of bottom-up tagging and the empirical determinism of top-down controlled vocabularies». In this scenario, «users could freely create, adopt or reject terms stored in a distributed repository that gets administered by a representative authority that "owns" the vocabulary». See [6, 26].

What we have to do is merge and leverage emerging and traditional tools to improve findability. Somewhere at the intersection of these two models lies a more powerful framework for identifying, sharing and finding information.

The goal is a metadata ecology, where the best tools we have bend towards a real user-centred design. See [13].

The increasing interest in folksonomies is confirmed by new projects like Freetag. Freetag is an API written in PHP for setting up a folksonomy on a website. With such tools, in the near future, we should be able to leverage the power of folksonomies outside the original environments that introduced them, such as Flickr. See [27].

Traditional hierarchies for organizing information (or reality) will not be replaced by tags, but through tagging we are finding new ways of thinking about classification and new applications for organizing and sharing knowledge. See [28].

Acknowledgments

This paper has been written with the help of some clever and supportive friends. First of all, I want to thank Antonella Pastore (information architect currently working on user research, strategic consulting and evaluation of information systems for international organizations) for the invaluable editorial and linguistic help and for the fundamental advice on some central parts of this paper. Another thankful mention goes to Claudio Gnoli (librarian, teacher and chair of the Italian chapter of the International Society for Knowledge Organization) for the inspiration and the enlightening explanations. A special thought goes to Peter Van Dijck and Luca Rosati for the encouragement and the discussions about all things information architecture. Moreover, I would like to thank James Weinheimer (information management specialist at the FAO of the UN, and moderator of the ASIS&T SIGIA mailing list) for the kind and insightful review. Finally, thanks to everyone who kept on reading until the end of this article.

Bibliography

1: Content classification -- <http://encyclozine.com/Reference/Library/Classification/>

2: Innovation in classification / Peter Merholz -- <http://www.peterme.com/archives/00000063.html> : September 23, 2001

3: (Weblogs and) The mass amateurisation of (nearly) everything... / Tom Coates -- <http://www.plasticbag.org/archives/2003/09/weblogs_and_the_mass_amateurisation_of_nearly_everything.shtml> : September 3, 2003

4: An addendum to a definition of social software / Tom Coates -- <http://www.plasticbag.org/archives/2005/01/an_addendum_to_a_definition_of_social_software.shtml> : January 5, 2005

5: Ontology is overrated: categories, links, tags / Clay Shirky -- <http://shirky.com/writings/ontology_overrated.html>

6: Folksonomy / Alex Wright -- <http://www.agwright.com/blog/archives/000900.html> : January 5, 2005

7: Folksonomy (Wikipedia) -- <http://en.wikipedia.org/wiki/Folksonomy>

8: Introduction: Jon Lebkowsky / Jon Lebkowsky -- <http://tagsonomy.com/index.php/introduction-jon-lebkowsky/> : January 5, 2005

9: I've heard of folksonomies. Now how do I apply them to my site? / Joshua Porter -- <http://www.bokardo.com/archives/applying_folksonomies/> : January 5, 2005

10: Collaborative knowledge gardening / Jon Udell -- <http://www.infoworld.com/article/04/08/20/34OPstrategic_1.html> : August 20, 2004

11: Explaining and showing broad and narrow folksonomies / Thomas Vander Wal -- <http://www.personalinfocloud.com/2005/02/explaining_and_.html> : February 21, 2005

12: Ethnoclassification and vernacular vocabularies / Peter Merholz -- <http://www.peterme.com/archives/000387.html> : August 30, 2004

13: Folksonomies? How about metadata ecologies? / Louis Rosenfeld -- <http://louisrosenfeld.com/home/bloug_archive/000330.html> : January 6, 2005

14: Folksonomy / Clay Shirky -- <http://www.corante.com/many/archives/2004/08/25/folksonomy.php> : August 25, 2004

15: Controlled vocabularies cut off the long tail / Joshua Porter -- <http://bokardo.com/archives/controlled_vocabularies_long_tail/> : March 9, 2005

16: Findability vs discoverability / Donna Maurer -- <http://www.maadmob.net/donna/blog/archives/000609.html> : March 8, 2005

17: Folksonomies are a forced move: a response to Liz / Clay Shirky -- <http://www.corante.com/many/archives/2005/01/22/folksonomies_are_a_forced_move_a_response_to_liz.php> : January 22, 2005

18: Folksonomies + controlled vocabularies / Clay Shirky -- <http://www.corante.com/many/archives/2005/01/07/folksonomies_controlled_vocabularies.php> : January 7, 2005

19: The cognitive cost of classification / Jess McMullin -- <http://www.interactionary.com/index.php?cat=7> : August 19, 2004

20: Introduction: Timo Hannay / Timo Hannay -- <http://tagsonomy.com/index.php/introduction-timo-hannay/> : August 19, 2004

21: IBM's Intranet and Folksonomy / Bud Gibson -- <http://thecommunityengine.com/home/archives/2005/03/ibms_intranet_a.html> : August 19, 2004

22: Using mapped folksonomy to break corporate silos / Bud Gibson -- <http://thecommunityengine.com/home/archives/2005/02/using_mapped_fo.html> : February 16, 2005

23: Folksonomy : practical application and xFolk / Bud Gibson -- <http://thecommunityengine.com/home/archives/2005/03/folksonomy_prac.html> : March 29, 2005

24: IA summit Folksonomies panel / Gene Smith -- <http://atomiq.org/archives/2005/03/ia_summit_folksonomies_panel.html> : March 8, 2005

25: Taxonomies and tags: from trees to piles of leaves / David Weinberger -- <http://www.hyperorg.com/blogger/misc/taxonomies_and_tags.html>

26: Bridging the gap: folksonomy and taxonomy / James Melzer -- <http://www.jamesmelzer.com/bearings/archives/2005/02/bridging_the_ga.html#more> : February 11, 2005

27: Freetag : an open source tagging : Folksonomy module for PHP/MySQL applications -- <http://getluky.net/freetag/>

28: Introduction: Jon Lebkowsky / Jon Lebkowsky -- <http://tagsonomy.com/index.php/introduction-jon-lebkowsky/> : May 3, 2005

Further readings

1: Faceted classification of information (The Knowledge management connection) -- <http://kmconnection.com/DOC100100.htm>

2: Metacrap: putting the torch to seven straw-men of the meta-utopia / Cory Doctorow -- <http://www.well.com/~doctorow/metacrap.htm> : August 26, 2001

3: Social bookmarking tools / T Hammond, T Hannay, B Lund, J Scott -- <http://www.dlib.org/dlib/april05/hammond/04hammond.html/> : April 2005

4: Folksonomies : cooperative classification and communication through shared metadata / Adam Mathes -- <http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.html> : December 2004

5: Bookmark, classify and share: a mini-ethnography of social practices in a distributed classification community -- <http://ideant.typepad.com/ideant/2004/12/a_delicious_stu.html>
