Well, it finally happened. I got too lazy to comb through the relationships in the taxonomy I've been using on this site over the years and I've decided to stop creating parent/child relationships. So now I only have the tag clouds. I still separate the facets for subject and people.
I believe taxonomies and thesauri to be useful when describing content for the purposes of browsing, especially for first-time users of a system. It makes sense when classifying content in business information systems, reference or documentation content, newspaper and magazine sites, etc. It's useful for CMS with granular levels of description. But the level of effort to maintain it over time is significant, and I never really know what the right methods are in Drupal to do things like show links to related tags when browsing taxonomy, or show related links on nodes in this site.
As I see it, I've got 2 problems:
- Creation Issue: I want to continue to organically tag as I create nodes. But I also want to create the relationships for each tag I create while I'm creating the tag in Node>Add mode, rather than having to go find it afterward in my non-searchable taxonomy controls.
- Relationship Display Issues: I want to better show relationships on both taxonomy pages (see also: synonyms, navigate to: parents) and in nodes (more entries like this, i.e. an algorithmically generated list of nodes weighted to show those containing most of the terms used in this entry).
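The "more entries like this" idea in the second item can be sketched in a few lines. This is only an illustration in Python, not a Drupal module: the `related_nodes` function and the `node_terms` sample data are hypothetical, and the weighting is the simplest one possible (count of shared terms).

```python
from collections import Counter

def related_nodes(current_id, node_terms, limit=5):
    """Rank other nodes by how many terms they share with the current node."""
    current_terms = set(node_terms[current_id])
    scores = Counter()
    for node_id, terms in node_terms.items():
        if node_id == current_id:
            continue
        shared = current_terms & set(terms)
        if shared:
            scores[node_id] = len(shared)
    # Highest overlap first; ties are broken by node id for a stable order.
    return sorted(scores, key=lambda n: (-scores[n], n))[:limit]

# Hypothetical sample data: node id -> terms applied to that node.
node_terms = {
    "node/1": ["taxonomy", "drupal", "tagging"],
    "node/2": ["drupal", "tagging", "views"],
    "node/3": ["taxonomy"],
    "node/4": ["wordle"],
}
print(related_nodes("node/1", node_terms))
```

Nodes that share no terms are left out entirely, so a sparsely tagged entry simply gets a shorter list.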
The relationship creation is painful, though, because by freetagging, I put off the task until some later time--which seems to never come. I don't yet know how I'm going to deal with this growing problem. I've just decided to stop caring for this blog. I'm sure others couldn't care less, but I used to use the hierarchical list of my taxonomies occasionally to survey what I've been writing about. I just haven't found the proper way to do what I want in Drupal yet.
For now, all I know is that I have this big-ass tag cloud that becomes more and more difficult to maintain and explore in a meaningful way. I'm not sure how to make better use of it, without knowing what Drupal modules work best for my needs. It's been a while since I've looked at the contributed taxonomy modules. This might be the kick in the pants I need to go see what good stuff people have come up with for problems like mine.
My del.icio.us tag cloud in Jonathan Feinberg's Wordle:
About Wordle:
Wordle is a toy for generating “word clouds” from text that you provide. The clouds give greater prominence to words that appear more frequently in the source text. You can tweak your clouds with different fonts, layouts, and color schemes. The images you create with Wordle are yours to use however you like.
Gmail updates labels to add coloring.
1) On the left side, you see the label panel. Clicking on the boxes to the left provides a drop down allowing you to provide a color for the label and now allows you to edit the label directly via the panel.
2) On the right side you see the label with coloring in the messages panel.
I found an interesting post on the Drupal support list about finding the right method for using taxonomy in Drupal.
I am using the taxonomy module and have several vocabularies added , several look like this: 'Politics' which has Single Hierarchy and Multiple select, and has several Terms such as 'Countries', 'Groups', 'Ideas', and Item such as countries has 'sub' Items such as 'USA', 'England', 'Germany' etc. Now.. whenever I post a new article or blog entry and tag it, say.. USA, I still need to tag Countries and Politics as well or otherwise when clicking Countries it will show an empty list. Is this the right practice?
I think this is probably the biggest issue when approaching description problems with a content management system. In large enterprise CMS, the problem might require maintaining taxonomies using the sophisticated tools for finding and grouping content based on rules created by the taxonomy tools included with or added to the CMS. But people dealing with Open Source CMS have been left to the task of figuring out how to deal with these same problems with a less sophisticated set of tools and plugins.
Fortunately, Drupal has recently added the Views module to at least help admins deal with display issues more easily. Taxonomy management is an area that still needs more attention. But for most people, the basic question about what practices are best suited to taxonomy use in the more full-featured systems like Drupal persists. Do you use hierarchy in taxonomies? Do you just free tag everything? How do you set up taxonomies for the outcome you expect or desire?
Of course it depends on the type of description/categorization you intend to do. To pick the right method for creating taxonomies you need to focus on how you want to extract or display that information and on how much effort you want to expend on maintaining it. The Views module seems like it will give you the flexibility you need in most cases for creating display rules. But what you want to consider is whether or not the structure you are creating will require you to spend time maintaining the views over time. For instance, given the Drupal Support example above, say the admin starts with the following structure in his Politics taxonomy:
- Countries
- Afghanistan
- Algeria
- Andorra
- Groups
- Al-Qaeda
- Amnesty International
Now, say that he wants to display all of the content that comes in under the Groups category, including all of its subcategories. He can create a Groups View and select each descendent term in the taxonomy filter. But I don't believe there is a way to simply say "give me every descendent of the Groups term and show that in my view". If there is a way to do that with Views, I'd love to see it. So instead, he has to select the terms in the filter's select field.
But, say he adds "Animal Rights Advocates" under Groups. Now he wants to update his View. What I think he would need to do to update the view is go back to the Views filter and now select "Animal Rights Advocates" as well. He would need to do that for every term added to Groups.
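To make the "give me every descendant of the Groups term" idea concrete, here is a minimal sketch in Python. It is not Views or Drupal code; the `children` mapping and the `descendants` function are hypothetical, and the hierarchy is assumed to contain no cycles.

```python
def descendants(term, children):
    """Return every term below `term` in the hierarchy, depth-first."""
    found = []
    stack = list(reversed(children.get(term, [])))
    while stack:
        current = stack.pop()
        found.append(current)
        # Push this term's own children so they are visited next.
        stack.extend(reversed(children.get(current, [])))
    return found

# Hypothetical parent -> children mapping for the Politics example.
children = {
    "Politics": ["Countries", "Groups"],
    "Countries": ["Afghanistan", "Algeria", "Andorra"],
    "Groups": ["Al-Qaeda", "Amnesty International"],
}

# A new term filed under Groups is picked up without touching any filter.
children["Groups"].append("Animal Rights Advocates")
print(descendants("Groups", children))
```

A filter built this way is computed from the hierarchy each time, so adding a term under Groups needs no manual update.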
In the example above, Countries and Groups are facets of description. And while he can choose to create a taxonomy for each facet, he's chosen to create a single taxonomy for the Politics domain and insert facets under each one.
In the past -- prior to the release of the Views module -- I've worked around the display maintenance issues by creating separate taxonomies for each facet. See the taxonomies in use on my blog for instance. Yikes, right? I've used that separation to break up the display of my metadata below each entry so I can show terms broken up by taxonomy, e.g. Subjects, People, File formats. The output looks like this:
Works for that display need under each blog entry. But you'll notice when you look at the taxonomies for each facet that I also added some complexity by using hierarchy within some of the taxonomies, e.g. Subjects and People. Taking the People example, if I wanted a view that only showed Groups of people rather than individuals, I'd run into the same maintenance problem you see in the Politics taxonomy of having to manually update the Views filter again.
To make matters more complex, I am starting to use and love the new free tagging capabilities of Taxonomy for selecting my terms. I started to do this because my Subjects taxonomy is rather large and choosing multiple options in a form Select can be problematic. The free tagging options allow me to select terms much more quickly. The problem is that I've also become accustomed to adding terms using this tool. This change in behavior of "tagging as you go" conflicts with my previous behavior of first adding terms to the hierarchical taxonomies before I blog because as I now add free tags, they end up in the root of the Subject Taxonomy, requiring me to still go back and file each term I add under a parent for the hierarchy to continue to make sense. And because the Taxonomy module doesn't presently provide a way to filter terms in the admin view, e.g. to find the term I just added without paging through the list, this task has become cumbersome as an after-the-fact activity.
Here's what I see in the Categories section when I create entries in Drupal:
This is a tricky issue, because I'm at a point where I could decide to go two different ways: to dump the hierarchy within taxonomies because of the level of effort it requires to maintain; or to continue to maintain the hierarchy within taxonomies because it provides some value in terms of browsing. I'm sure the browsing of hierarchies is only valuable to me, but when I think about the amount of energy expended compared with value, my gut tells me that it's not worth the effort to maintain.
Doing this kind of hierarchical taxonomy management on a blog is probably unheard of, but in certain applications, e.g. enterprise content management it can be absolutely warranted and/or required. As an information worker I tend to try out different methods for describing and organizing information to help me understand which practices work in different contexts. The last several years of using taxonomies on this blog and now introducing free tagging have helped me see that each method holds utility in different circumstances, but getting both to work together nicely is a bit tricky.
So, this may not be a very clear answer to the question about best practices for taxonomy. It does serve as a cautionary tale about how being very descriptive and maintaining relationships within one hierarchy can leave you with maintenance concerns as you scale up. In terms of ease of entry using Drupal, creating facets within one taxonomy might make it easier to select terms when you are creating new content and doing free tagging using one field in Drupal. But organizationally speaking, that approach seems to present you with some challenging display and maintenance issues, for now at least.
[Warning, this is a sort of a brain dump/thought wander as I put together my thoughts about this topic.]
Someone at work pointed out this discussion of OPAC as tag clouds on The Gordian Knot. OPAC stands for Online Public Access Catalog, the database you would use in a library to search for titles and manage your transactions.
The exploration of different methods for displaying terms is interesting, but what I point out is that a tag cloud serves a different purpose than a vertically arranged list -- usually to display frequency of use of user-supplied keywords (freetags). That's why it's called a TAG cloud not a SUBJECT HEADING cloud, the difference being that tags are created and applied when the item being tagged is examined whereas the application of a subject heading involves the selection of a term from an authorized list that's already been developed and is thus usually more or less static (e.g. the worst case scenario, Library of Congress Subject Headings).
It's not particularly clear to me what systems would display from an OPAC if they're not dealing with user-supplied freetags. Perhaps we'd be dealing with either a) frequency of occurrences of the keywords or subject headings in the corpus (the OPAC) or b) number of items with a keyword or subject heading applied? If we're visualizing occurrences of keywords within the corpus that only tells us about how librarians catalog and about what the library collection contains rather than about what the patrons are using. I suppose, however, that you can correlate this frequency with popularity of loans to make it more meaningful. Tag clouds would be most interesting if they could show us how actual users perceive the collection, and unless I'm missing something here I don't think that's possible unless you allow freetagging.
The cloud display might not be particularly good for people who are interested in skimming lengthy lists of controlled vocabulary terms for known items, and that's the distinction I'd like to make, lest we start putting tag clouds all over information systems or using them in place of vertically-arranged lists. I think, however, when people read the entry on Gordian Knot or Shifted Librarian they might get caught up in the demonstrations without really exploring the proofs of concept and how to make them useful. I'm curious to know if there are any OPACs out there that have some form of free tagging functionality built in. Clearly, those systems could do something with the cloud displays right away. Since the demonstrations in these threads do not deal with free tagging, I wonder if cloud displays of terms are appropriate. I think you can make the case for cloud displays, however, if you execute on the right set of data. In our organization, for example, a programmer is doing some proof of concept for frequency of occurrence of terms in our News source (Factiva), splitting up by facet (companies, industries, etc.), and that makes sense. We're visualizing the hot topics for the period being indexed and that seems to work. For example, showing the news tagged by company name and displaying the most frequently discussed companies can indicate which companies are being discussed most in the media, like a buzz metric. It would be really cool to apply this type of analysis against other sources as well, e.g. against weblog data using Moreover's service.
But to get back to the discussion on the blogs, take a look at the example from Davey P's library blog. It shows a large list of subject headings from a database where a subject contains more than 10 items. Again, it tells you more about the collection and the cataloging than about usage. It is interesting as a visualization, but the thing about clouds is that they force you to work really hard if you are looking for known items, because vertical scanning for first letter occurrences is quicker than horizontal scanning. There's no reason why you couldn't do this as an alternative version for visualization of certain types of lists. But, the question really is, should you? I'm not offering any answers, I'm just playing devil's advocate.
Just to make the comparison of the two methods of display, take a look at this site's (urlgreyhot's) categories displayed as tag clouds and as vertical lists:
My blog tag cloud:
http://urlgreyhot.com/personal/tagadelic
Versus browsing the hierarchical lists of categories:
http://urlgreyhot.com/personal/sitemenu
The cloud is good for showing you which terms I applied most frequently, while the hierarchical list excels at being exhaustive and supporting skimming for known items. Granted the first example is simply a cloud representation of my controlled vocabulary rather than being a cloud of user-supplied freetags or keywords. But my point here is that each display serves different purposes and different types of information seeking tasks.
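For readers who haven't looked at how a cloud turns "applied most frequently" into a display, here is a rough sketch in Python. It assumes simple linear scaling between a minimum and maximum font size; the `cloud_sizes` function and the sample counts are hypothetical, and real tag cloud tools may scale differently (for example, logarithmically or in fixed buckets).

```python
def cloud_sizes(counts, min_size=10, max_size=30):
    """Map each term's usage count to a font size, scaled linearly."""
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = high - low
    sizes = {}
    for term, count in counts.items():
        if span == 0:
            # Every term is used equally often, so show them all the same size.
            sizes[term] = min_size
        else:
            scaled = (count - low) * (max_size - min_size) / span
            sizes[term] = round(min_size + scaled)
    return sizes

# Hypothetical usage counts: term -> number of entries tagged with it.
counts = {"drupal": 11, "taxonomy": 6, "wordle": 1}
print(cloud_sizes(counts))
```

Note that the output carries only the frequency. Alphabetical order and hierarchy, the things the vertical list is good at, play no part in it.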
The appropriateness of the display should be determined by the nature of the information need or question the user demands of the system. I can't imagine that very many people coming to an OPAC would wonder, "Hmm, I wonder what subjects this library has the most books on?" Or alternatively, "What subject headings hold the most books in this library? Let me browse the subject headings by number of books in the collection." Or maybe someone would want to know this. I don't know.
It could be very appropriate and meaningful to know, on the other hand what books are most popular (i.e. most borrowed) when browsing, especially when narrowing within a subject heading. I suppose that's more the realm of recommender systems, however, but still meaningful in this context. User-generated keywords would be a welcome addition to most OPACs and to other types of databases for that matter (see Headshift's BBC tagging demo, for example). But the problem with a freetagging cloud is that you'd have to have enough tags added to the system to make the display meaningful. I don't think there'd be a way to bootstrap that. You'd really have to put the tagging functionality in and wait for patrons to use it. I wonder how many people would tag an OPAC. I would guess very few, but it depends on the user group and how you incentivize freetagging.
These are all good things to explore, but before I advocate putting up tag clouds everywhere in our OPAC, I have to emphasize our focus on user needs/goals so we come up with the solutions that are most appropriate for meeting those needs rather than just throwing up more and more features into our information systems just because we can. The point is to design features to anticipate needs and information seeking behaviors. If a tag cloud anticipates a certain type of information seeking behavior, then it's appropriate. But you have to know and understand those behaviors and needs first. That's the part of the design process that's missing in these OPAC discussions.
For the past several months in my group at Lucent we've been testing out a system developed to be a simple self-service publishing application. You might recognize the interface. It follows the model other social bookmarking services have made common.

Identifying the needs
The idea to take the concept of social bookmarking and turn it into more than just a bookmark saving service came as the result of several different types of requests we've gotten in the past. One type of request was for a way to clip or save articles found on our digital library site. We aggregate a wide variety of diverse sources. The most relevant databases include vendor news (e.g. feeds from Factiva for newspapers and journals) and internal databases (e.g. internal news publications, technical documents repository).
A second and more urgent request we got was to provide a way for users to save articles found on our site and publish them on portlets within the corporate portal. Portlets are small windows of html content that act like little building blocks or modules in a portal page.
Several things we had done in the past helped us to add on to or evolve our existing database system and develop a new and separate system that would handle these specific bookmarking needs. We had already RSS-ified our databases, providing very complete feeds of our data as XML and partial feeds (bibliographic data) of our data as RSS. Prior to that, the primary method for doing something with database results was to set up an email or web-based alert. But the new set of requirements dealt with two issues:
- Tagging of individual records
- Re-use of records off site
Social bookmarking to the rescue
So I began developing the concept for using the social bookmarking model we've been seeing on sites like del.icio.us and furl. The first requirement was to provide a means for flagging records. The second was to provide a way to re-use that data elsewhere.
Our first releases did pretty much everything that del.icio.us does. We provided a bookmarklet/favelet for saving, tagging and commenting on a web page. The default view for bookmarks showed all users' tagged bookmark entries, and you could navigate to view all bookmarks under a single tag, the bookmarks of one user, etc.
The screenshots below show the bookmarks main page and the pop-up window for saving/modifying a bookmark.


The application was shaping up to be pretty decent, utilizing all of the commonplace features on social bookmarking sites. We integrated the XML and RSS feed feature that we already used on our other databases. Feeds are available for any view the application can generate, e.g. Michael's bookmarks, Michael's bookmarks on tag "searchengines", All users' bookmarks, All users' bookmarks on "searchengines".
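The four example views differ only in which of two optional filters is applied, user and tag. A minimal sketch of that selection in Python follows; the `Bookmark` record, the `select_bookmarks` function, and the sample data are all hypothetical, not the application's actual code.

```python
from dataclasses import dataclass, field

@dataclass
class Bookmark:
    user: str
    title: str
    url: str
    tags: list = field(default_factory=list)

def select_bookmarks(bookmarks, user=None, tag=None):
    """Return the bookmarks for a view; a filter left as None matches everything."""
    return [
        b for b in bookmarks
        if (user is None or b.user == user) and (tag is None or tag in b.tags)
    ]

# Hypothetical sample data.
bookmarks = [
    Bookmark("michael", "Search engine news", "http://example.com/a", ["searchengines"]),
    Bookmark("michael", "Portal design notes", "http://example.com/b", ["portals"]),
    Bookmark("anna", "Crawler comparison", "http://example.com/c", ["searchengines"]),
]

print(len(select_bookmarks(bookmarks)))                             # all users
print(len(select_bookmarks(bookmarks, user="michael")))             # one user
print(len(select_bookmarks(bookmarks, tag="searchengines")))        # one tag
print(len(select_bookmarks(bookmarks, "michael", "searchengines"))) # user + tag
```

Whatever this function returns can then be handed to any serializer (XML, RSS, and so on), which is why a feed can exist for every view.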
Self service publishing
Now the reason I thought we could try to use this model for self-service portal publishing is the free-tagging model. The idea was to allow individuals or groups to start bookmarking articles from our News databases, e.g. any of the Factiva sources such as newspaper and magazine articles. They could use a common tag, e.g. Mobility-Portal-Hot-News. Then they could get an aggregation of all of the articles saved with that tag and somehow display them in a portlet. Of course, controlled vocabularies would have worked as well, but the free-tagging model allows them to define the use. The portlet idea is just one applicable use. There are others we could think of, including ad-hoc reporting.
Feeds and exporting
This was shaping up to be a pretty decent way to do self-service publishing, but the obstacle of knowing what to do with RSS stood in the way. The concept of a feed is still pretty foreign to most business users. Savvy users can install RSS readers, but re-using that content on web sites would be time consuming. The next step was to provide a means for doing this more simply.
We first provided an HTML output along with RSS, thinking that portlets could display this content as HTML, but that necessitated using iframes. The second idea I came up with was to use JavaScript to put the bookmark entries in a JS feed with the latest entries stored in an array. Then portal owners could insert a JavaScript in a portlet that referenced the JS feed and the recent entries would be displayed on the site as HTML. If you're familiar with how Google Adsense ads work, you know how simple this is.
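The JS feed idea can be sketched from the server side. The Python below is only an illustration of the general technique (serialize the latest entries into a JavaScript array that a page can load with a script tag); the `js_feed` function, the `bookmarkFeed` variable name, and the sample entries are assumptions, not the application's actual output format.

```python
import json

def js_feed(entries, limit=10, variable="bookmarkFeed"):
    """Serialize the latest entries as a JavaScript array assignment."""
    latest = entries[:limit]
    # A JSON array is also a valid JavaScript array literal.
    return "var %s = %s;" % (variable, json.dumps(latest))

# Hypothetical sample entries, newest first.
entries = [
    {"title": "802.11 roundup", "url": "http://example.com/1"},
    {"title": "Portal news", "url": "http://example.com/2"},
    {"title": "Older item", "url": "http://example.com/3"},
]
print(js_feed(entries, limit=2))
```

A portlet would then reference the generated file in a script tag and loop over the array to write out links, so the portal owner never has to deal with RSS directly.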
The screenshots below show the process of preparing scripts for display on a portlet:



As always with the type of evolutionary design we do where I work, these proofs of concept helped drive the design of other functionalities we could think of. One of the nice things about working in-house somewhere is that you can continue to improve applications over time.
A common request we've gotten in the past was to provide a way to create reports for things. We commonly do output of some data for Excel, for instance. For this tool, it made sense to provide a way to generate bibliographies of bookmarks. So I began creating a tool to transform the data into APA-style bibliographies at first, with plans to also provide RTF export of bookmark lists.
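As a rough sketch of that kind of transformation, the Python below turns bookmark records into a sorted, loosely APA-like list. The field names and the simplified "Author (Year). Title. Source." pattern are assumptions for illustration. Full APA style has many more rules than this, and this is not the tool described above.

```python
def apa_like_entry(record):
    """Format one record as 'Author (Year). Title. Source.'"""
    return "{author} ({year}). {title}. {source}.".format(**record)

def bibliography(records):
    """Sort records by author, then year, and join them one per line."""
    ordered = sorted(records, key=lambda r: (r["author"].lower(), r["year"]))
    return "\n".join(apa_like_entry(r) for r in ordered)

# Hypothetical bookmark records.
records = [
    {"author": "Smith, J.", "year": 2005, "title": "Wireless trends", "source": "Example Journal"},
    {"author": "Adams, R.", "year": 2004, "title": "Tagging at work", "source": "Example Weekly"},
]
print(bibliography(records))
```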


Controlling the sprawl
The set of steps we took up until now took each function and divided them into atoms or pieces of functionality that we added to our existing systems. I'm very interested in the organic approach to solving the problems. The programmer I tend to work with likes to work this way. I document the needs and the concept for the application, he makes the prototype and we evolve it together. It's actually a pretty nice approach, and we have the freedom and flexibility to do things this way.
All of these features make the system serviceable, but as we conceived of different functionalities to add, it became clear that this system was becoming more and more complex from a user perspective and could do with some simplification. I liken this to getting control of a garden that has become overgrown. At some point all of those aggressive plants start dominating and stifle the smaller ones. What do we do so we can see the parts more clearly again?
At this point, I'm trying to get some traction behind removing all those little XML, RSS, HTML, and JS buttons and replacing them with one button for viewing "Export options". I'm presently trying to design the interaction and interface for this clean up.
It's been an interesting several months testing out this application. It's nice to work on such a small application that suits very narrowly defined needs. Smaller, well defined scenarios are much easier to design for than broader scenarios and rules. In the end, these small scenarios fit into the larger business rules we've established for the site and if done right, will feed back into the way we design other aspects of the site. In this instance, the self-service functionalities created for the bookmark application will be added to our other databases so that people can, for instance, create a search on a news database and generate a JavaScript to display the feed from that source on a portlet. The common example is to search Factiva News, for instance, on a topic like 802.11 and automatically display the links to the news items on your portal site.
This application still has a bit further to go. We're still talking about issues such as making some bookmarks private. That is possibly the last system feature we'll add. The remaining work is just refining the interface for exporting. I'm interested in seeing how other library systems are approaching the need to re-use data. Clearly enterprise information systems should be thinking about these types of issues. I'm constantly thinking of how aspects of our system can be made more useful to people throughout the company.
This piece is based on two talks Clay Shirky gave in the spring of 2005 -- one at the O'Reilly ETech conference in March, entitled "Ontology Is Overrated", and one at the IMCExpo in April entitled "Folksonomies & Tags: The rise of user-developed classification." The written version is a heavily edited concatenation of those two talks.
Rashmi Sinha discusses freetagging as a form of data freelisting for card sorting.
Dan Brown puts a finer point on the folksonomy buzz, which is already getting too loud for my ears. Dan makes the point of clarifying that the process of freetagging is not the same as creating folksonomies. The notion of a freetag, or a user-supplied index term as I know it to be historically called in IR, is not the same as building what Thomas Vander Wal calls a folksonomy. Folksonomies are like (analogous to) thesauri or taxonomies (without the important aspect of control, of course).
But folksonomies, unlike taxonomies, aren't built, they emerge organically through the accretion of freetags. It's probably a good point to make the analogy that freetag is like index term and folksonomy is like taxonomy (or controlled vocabulary) in order to help people understand what these terms mean. There's no doubt that there are information workers outside of the world of del.icio.us and flickr who have no idea what these terms mean and why they should matter. They will need to pay attention sooner or later.
The distinctions Dan and Thomas are making are probably minor to most people who use freetagging sites. I think Dan and Thomas are navel gazing at the minutiae because terms are being coined left and right in industry mags, on discussion boards and in blogs, and knowledgeable IAs and content people are trying to hone the terms so the meaning matches the use of the vocabulary. This is sort of important at this stage, because soon applications will be released that throw the terms around and as the applications start getting recognition, the meaning of terms will become modified with the use. This is sort of what happened with the term "taxonomy", which a lot of information workers hated because it wasn't quite correct. A few key business people start using a term one way and boom, it's accepted jargon.
This idea of freetagging isn't new by the way, but the bubbling up of tags into large shared lists in heavily used sites is. If only del.icio.us or flickr would think about applying synonym rings to make clustering more usable then we'd have something special. Even I'd be willing to work on that in order to use it, especially on image databases.
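To show what applying synonym rings to a tag list might look like, here is a minimal sketch in Python. The rings, the sample counts, and the function names are hypothetical. A ring has no preferred term in the strict sense, so the sketch simply uses the first member as the display label for the cluster.

```python
def build_lookup(rings):
    """Map every variant in a ring to that ring's first member."""
    lookup = {}
    for ring in rings:
        label = ring[0]
        for variant in ring:
            lookup[variant.lower()] = label
    return lookup

def cluster(tag_counts, rings):
    """Merge counts for tags that belong to the same synonym ring."""
    lookup = build_lookup(rings)
    merged = {}
    for tag, count in tag_counts.items():
        # Tags outside every ring are kept, lowercased, under their own name.
        key = lookup.get(tag.lower(), tag.lower())
        merged[key] = merged.get(key, 0) + count
    return merged

# Hypothetical rings and tag counts.
rings = [["nyc", "newyork", "new-york"]]
tag_counts = {"nyc": 3, "NewYork": 2, "cats": 1}
print(cluster(tag_counts, rings))
```

The hard part is not the merge but the editorial work of building and maintaining the rings.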
Figuring out how to make images findable was the main reason I studied library and information science. I even proposed freetags (user-supplied keywords I called them) in a hypothetical visual resources database I wrote a spec for in 1997. After grad school I turned down job offers related to image indexing at StockObjects and TMS and interviews at Corbis because I wanted to be a web designer instead. I didn't like the idea of going into those places to actually work as an indexer massaging the thesauri. I don't even do that where I work now because we have someone who dedicates about 75% of his time doing just that. I couldn't do that.
Anyway, all interesting stuff. I just hope the recent rash of arbitrary technology acronyms and neologisms ends soon.
"There’s been some excellent IA discussion lately on the concept of “social classification” (aka “folksonomy” aka “ethnoclassification”). The concept pretty much is that if you have a bunch of people independently classifying a selection of resources, responding to other people’s classifications and perhaps altering their own classifications as a result—in aggregate, you might have something really useful."



