Classification

Well, it finally happened. I got too lazy to comb through the relationships in the taxonomy I've been using on this site over the years and I've decided to stop creating parent/child relationships. So now I only have the tag clouds. I still separate the facets for subject and people.

I believe taxonomies and thesauri are useful when describing content for the purposes of browsing, especially for first-time users of a system. They make sense when classifying content in business information systems, reference or documentation content, newspaper and magazine sites, etc. They're useful for CMSs with granular levels of description. But the level of effort to maintain them over time is significant, and I never really know what the right methods are in Drupal to do things like show links to related tags when browsing taxonomy, or show related links on nodes in this site.

As I see it, I've got 2 problems:

  1. Creation Issue: I want to continue to organically tag as I create nodes. But I also want to create the relationships for each tag I create while I'm creating the tag in Node>Add mode, rather than having to go find it afterward in my non-searchable taxonomy controls.
  2. Relationship Display Issues: I want to better show relationships on both taxonomy pages (see also: synonyms, navigate to: parents) and in nodes (more entries like this, i.e. an algorithmically generated list of nodes weighted to show those containing most of the terms used in this entry).
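To make the "more entries like this" idea from the second item concrete, here is a minimal sketch in Python. It is not Drupal code: the `node_terms` mapping (node id to the set of terms on that node) and the `related_nodes` name are assumptions made for illustration. It simply ranks other nodes by how many terms they share with the current one.

```python
def related_nodes(node_terms, current, limit=5):
    """Rank other nodes by the number of terms they share with `current`.

    `node_terms` is a hypothetical mapping of node id -> set of term names.
    """
    current_terms = node_terms[current]
    scores = {}
    for node, terms in node_terms.items():
        if node == current:
            continue
        shared = len(current_terms & terms)
        if shared:
            scores[node] = shared
    # Most shared terms first; ties broken by node id so the order is stable.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [node for node, _ in ranked[:limit]]


if __name__ == "__main__":
    sample = {
        "entry-1": {"drupal", "taxonomy", "tagging"},
        "entry-2": {"drupal", "taxonomy"},
        "entry-3": {"tagging"},
        "entry-4": {"cooking"},
    }
    print(related_nodes(sample, "entry-1"))
```

A real version would probably also weight rare terms more heavily than common ones, but a plain overlap count is enough to show the shape of the idea.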

The relationship creation is painful, though, because by freetagging, I put off the task until some later time--which seems to never come. I don't yet know how I'm going to deal with this growing problem. For this blog, I've just decided to stop caring. I'm sure others couldn't care less, but I used to use the hierarchical list of my taxonomies occasionally to survey what I've been writing about. I just haven't found the proper way to do what I want in Drupal yet.

For now, all I know is that I have this big-ass tag cloud that becomes more and more difficult to maintain and explore in a meaningful way. I'm not sure how to make better use of it without knowing what Drupal modules work best for my needs. It's been a while since I've looked at the contributed taxonomy modules. This might be the kick in the pants I need to go see what good stuff people have come up with for problems like mine.


New videos from Michael Wesch, presented during the opening keynote at IDEA:

This video explores the changes in the way we find, store, create, critique, and share information. This video was created as a conversation starter, and works especially well when brainstorming with people about the near future and the skills needed in order to harness, evaluate, and create information effectively.

I found an interesting post on the Drupal support list about finding the right method for using taxonomy in Drupal.

I am using the taxonomy module and have several vocabularies added , several look like this: 'Politics' which has Single Hierarchy and Multiple select, and has several Terms such as 'Countries', 'Groups', 'Ideas', and Item such as countries has 'sub' Items such as 'USA', 'England', 'Germany' etc. Now.. whenever I post a new article or blog entry and tag it, say.. USA, I still need to tag Countries and Politics as well or otherwise when clicking Countries it will show an empty list. Is this the right practice?

I think this is probably the biggest issue when approaching description problems with a content management system. In large enterprise CMS, the problem might require maintaining taxonomies using the sophisticated tools for finding and grouping content based on rules created by the taxonomy tools included with or added to the CMS. But people dealing with Open Source CMS have been left to the task of figuring out how to deal with these same problems with a less sophisticated set of tools and plugins.

Fortunately, Drupal has recently added the Views module to at least help admins deal with display issues more easily. Taxonomy management is an area that still needs more attention. But for most people, the basic question about what practices are best suited to taxonomy use in the more full-featured systems like Drupal persists. Do you use hierarchy in taxonomies? Do you just free tag everything? How do you set up taxonomies for the outcome you expect or desire?

Of course it depends on the type of description/categorization you intend to do. To pick the right method for creating taxonomies you need to focus on how you want to extract or display that information and on how much effort you want to expend on maintaining it. The Views module seems like it will give you the flexibility you need in most cases for creating display rules. But what you want to consider is whether or not the structure you are creating will require you to spend time maintaining the views over time. For instance, given the Drupal Support example above, say the admin starts with the following structure in his Politics taxonomy:

  • Countries
    • Afghanistan
    • Algeria
    • Andorra
  • Groups
    • Al-Qaeda
    • Amnesty International

Now, say that he wants to display all of the content that comes in under the Groups category, including all of its subcategories. He can create a Groups View and select each descendant term in the taxonomy filter. But I don't believe there is a way to simply say "give me every descendant of the Groups term and show that in my view". If there is a way to do that with Views, I'd love to see it. So instead, he has to select the terms in the filter's select field.

But, say he adds "Animal Rights Advocates" under Groups. Now he wants to update his View. What I think he would need to do to update the view is go back to the Views filter and now select "Animal Rights Advocates" as well. He would need to do that for every term added to Groups.
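What I'd want instead is a rule that resolves the terms at display time. The sketch below is plain Python, not the Views API; the `PARENT_OF` mapping and the `descendants` function are hypothetical names, and the hierarchy is assumed to be a tree with no cycles. It shows how "every descendant of Groups" could be computed from parent/child relationships, so that a newly added term is picked up without touching the filter.

```python
# Hypothetical term -> parent mapping for the Politics example above.
# None marks a top-level term.
PARENT_OF = {
    "Countries": None,
    "Afghanistan": "Countries",
    "Algeria": "Countries",
    "Andorra": "Countries",
    "Groups": None,
    "Al-Qaeda": "Groups",
    "Amnesty International": "Groups",
}


def descendants(parent_of, root):
    """Return every term below `root`, at any depth (assumes no cycles)."""
    children = {}
    for term, parent in parent_of.items():
        children.setdefault(parent, []).append(term)
    found, stack = [], [root]
    while stack:
        for child in children.get(stack.pop(), []):
            found.append(child)
            stack.append(child)
    return found


if __name__ == "__main__":
    # Adding a term under Groups needs no change to the "filter" itself.
    PARENT_OF["Animal Rights Advocates"] = "Groups"
    print(sorted(descendants(PARENT_OF, "Groups")))
```

The same lookup run in the other direction (term to ancestors) would address the support-list question about having to tag Countries and Politics by hand.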

In the example above, Countries and Groups are facets of description. And while he can choose to create a taxonomy for each facet, he's chosen to create a single taxonomy for the Politics domain and insert facets under each one.

In the past -- prior to the release of the Views module -- I've worked around the display maintenance issues by creating separate taxonomies for each facet. See the taxonomies in use on my blog for instance. Yikes, right? I've used that separation to break up the display of my metadata below each entry so I can show terms broken up by taxonomy, e.g. Subjects, People, File formats. The output looks like this:

[Screenshot: faceted metadata]

Works for that display need under each blog entry. But you'll notice when you look at the taxonomies for each facet that I also added some complexity by using hierarchy within some of the taxonomies, e.g. Subjects and People. Taking the People example, if I wanted a view that only showed Groups of people rather than individuals, I'd run into the same maintenance problem you see in the Politics taxonomy of having to manually update the Views filter again.

To make matters more complex, I am starting to use and love the new free tagging capabilities of Taxonomy for selecting my terms. I started to do this because my Subjects taxonomy is rather large and choosing multiple options in a form Select can be problematic. The free tagging options allow me to select terms much more quickly. The problem is that I've also become accustomed to adding terms using this tool. This change in behavior of "tagging as you go" conflicts with my previous behavior of first adding terms to the hierarchical taxonomies before I blog because as I now add free tags, they end up in the root of the Subject Taxonomy, requiring me to still go back and file each term I add under a parent for the hierarchy to continue to make sense. And because the Taxonomy module doesn't presently provide a way to filter terms in the admin view, e.g. to find the term I just added without paging through the list, this task has become cumbersome as an after-the-fact activity.

Here's what I see in the Categories section when I create entries in Drupal:

Taxonomies in Drupal's

This is a tricky issue, because I'm at a point where I could go one of two ways: dump the hierarchy within taxonomies because of the level of effort it requires to maintain; or continue to maintain the hierarchy within taxonomies because it provides some value in terms of browsing. I'm sure the browsing of hierarchies is only valuable to me, but when I think about the amount of energy expended compared with value, my gut tells me that it's not worth the effort to maintain.

Doing this kind of hierarchical taxonomy management on a blog is probably unheard of, but in certain applications, e.g. enterprise content management, it can be absolutely warranted and/or required. As an information worker I tend to try out different methods for describing and organizing information to help me understand which practices work in different contexts. The last several years of using taxonomies on this blog and now introducing free tagging have helped me see that each method holds utility in different circumstances, but getting both to work together nicely is a bit tricky.

So, this may not be a very clear answer to the question about best practices for taxonomy. It does serve as a cautionary tale about how being very descriptive and maintaining relationships within one hierarchy can leave you with maintenance concerns as you scale up. In terms of ease of entry using Drupal, creating facets within one taxonomy might make it easier to select terms when you are creating new content and doing free tagging using one field in Drupal. But organizationally speaking, that approach seems to present you with some challenging display and maintenance issues, for now at least.

[Warning, this is a sort of a brain dump/thought wander as I put together my thoughts about this topic.]

Someone at work pointed out this discussion of OPAC as tag clouds on The Gordian Knot. OPAC stands for Online Public Access Catalog, the database you would use in a library to search for titles and manage your transactions.

The exploration of different methods for displaying terms is interesting, but what I point out is that a tag cloud serves a different purpose than a vertically arranged list -- usually to display frequency of use of user-supplied keywords (freetags). That's why it's called a TAG cloud not a SUBJECT HEADING cloud, the difference being that tags are created and applied when the item being tagged is examined whereas the application of a subject heading involves the selection of a term from an authorized list that's already been developed and is thus usually more or less static (e.g. the worst case scenario, Library of Congress Subject Headings).

It's not particularly clear to me what systems would display from an OPAC if they're not dealing with user-supplied freetags. Perhaps we'd be dealing with either a) frequency of occurrences of the keywords or subject headings in the corpus (the OPAC) or b) number of items with a keyword or subject heading applied? If we're visualizing occurrences of keywords within the corpus that only tells us about how librarians catalog and about what the library collection contains rather than about what the patrons are using. I suppose, however, that you can correlate this frequency with popularity of loans to make it more meaningful. Tag clouds would be most interesting if they could show us how actual users perceive the collection, and unless I'm missing something here I don't think that's possible unless you allow freetagging.

The cloud display might not be particularly good for people who are interested in skimming lengthy lists of controlled vocabulary terms for known items, and that's the distinction I'd like to make, lest we start putting tag clouds all over information systems or using them in place of vertically-arranged lists. I think, however, when people read the entry on Gordian Knot or Shifted Librarian they might get caught up in the demonstrations without really exploring the proofs of concept and how to make them useful. I'm curious to know if there are any OPACs out there that have some form of free tagging functionality built in. Clearly, those systems could do something with the cloud displays right away. Since the demonstrations in these threads do not deal with free tagging, I wonder if cloud displays of terms are appropriate. I think you can make the case for cloud displays, however, if you execute on the right set of data. In our organization, for example, a programmer is doing some proof of concept for frequency of occurrence of terms in our News source (Factiva), splitting up by facet (companies, industries, etc.), and that makes sense. We're visualizing the hot topics for the period being indexed and that seems to work. For example, showing the news tagged by company name and displaying the most frequently discussed companies can indicate which companies are being discussed most in the media, like a buzz metric. It would be really cool to apply this type of analysis against other sources as well, e.g. against weblog data using Moreover's service.

But to get back to the discussion on the blogs, take a look at the example from Davey P's library blog. It shows a large list of subject headings from a database where a subject contains more than 10 items. Again, it tells you more about the collection and the cataloging than about usage. It is interesting as a visualization, but the thing about clouds is that they force you to work really hard if you are looking for known items, because vertical scanning for first letter occurrences is quicker than horizontal scanning. There's no reason why you couldn't do this as an alternative version for visualization of certain types of lists. But, the question really is, should you? I'm not offering any answers, I'm just playing devil's advocate.

Just to make the comparison of the two methods of display, take a look at this site's (urlgreyhot's) categories displayed as tag clouds and as vertical lists:

My blog tag cloud:
http://urlgreyhot.com/personal/tagadelic

Versus browsing the hierarchical lists of categories:
http://urlgreyhot.com/personal/sitemenu

The cloud is good for showing you which terms I applied most frequently, while the hierarchical list excels at being exhaustive and supporting skimming for known items. Granted the first example is simply a cloud representation of my controlled vocabulary rather than being a cloud of user-supplied freetags or keywords. But my point here is that each display serves different purposes and different types of information seeking tasks.
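For anyone who hasn't looked at how a cloud gets its sizes, the mechanics are simple: count how often each term is applied, then map the counts onto a handful of font-size levels. The sketch below is a generic linear bucketing in Python and is an assumption on my part; it is not taken from the tagadelic module, and `cloud_sizes` and its `levels` parameter are hypothetical names.

```python
def cloud_sizes(counts, levels=6):
    """Map term usage counts onto size levels 1..levels (linear scale).

    `counts` is a mapping of term -> number of items carrying that term.
    """
    if not counts:
        return {}
    lowest, highest = min(counts.values()), max(counts.values())
    span = highest - lowest
    sizes = {}
    for term, count in counts.items():
        if span == 0:
            # Every term is used equally often, so nothing stands out.
            sizes[term] = 1
        else:
            sizes[term] = 1 + (count - lowest) * (levels - 1) // span
    return sizes


if __name__ == "__main__":
    usage = {"drupal": 41, "taxonomy": 21, "opac": 1}
    for term, size in sorted(cloud_sizes(usage).items()):
        print(term, size)
```

Notice that the only input is frequency of application, which is exactly why the display says more about how the cataloger applied terms than about what readers are looking for.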

The appropriateness of the display should be determined by the nature of the information need or question the user demands of the system. I can't imagine that very many people coming to an OPAC would wonder, "Hmm, I wonder what subjects this library has the most books on?" Or alternatively, "What subject headings hold the most books in this library? Let me browse the subject headings by number of books in the collection." Or maybe someone would want to know this. I don't know.

It could be very appropriate and meaningful to know, on the other hand, what books are most popular (i.e. most borrowed) when browsing, especially when narrowing within a subject heading. I suppose that's more the realm of recommender systems, however, but still meaningful in this context. User-generated keywords would be a welcome addition to most OPACs and to other types of databases for that matter (see Headshift's BBC tagging demo, for example). But the problem with a freetagging cloud is that you'd have to have enough tags added to the system to make the display meaningful. I don't think there'd be a way to bootstrap that. You'd really have to put the tagging functionality in and wait for patrons to use it. I wonder how many people would tag an OPAC. I would guess very few, but it depends on the user group and how you incentivize freetagging.

These are all good things to explore, but before I advocate putting up tag clouds everywhere in our OPAC, I have to emphasize our focus on user needs/goals so we come up with the solutions that are most appropriate for meeting those needs rather than just throwing up more and more features into our information systems just because we can. The point is to design features to anticipate needs and information seeking behaviors. If a tag cloud anticipates a certain type of information seeking behavior, then it's appropriate. But you have to know and understand those behaviors and needs first. That's the part of the design process that's missing in these OPAC discussions.

A native Mac OS X application for thesaurus construction.

This piece is based on two talks Clay Shirky gave in the spring of 2005 -- one at the O'Reilly ETech conference in March, entitled "Ontology Is Overrated", and one at the IMCExpo in April entitled "Folksonomies & Tags: The rise of user-developed classification." The written version is a heavily edited concatenation of those two talks.

Open source software for Social Bookmarking, Tagging, Blogging & Notes.

Technorati's tag indexing has gotten me interested in having a full Atom feed for this site, including categories. So I downloaded the latest Atom module for Drupal 4.5 and then used Walkah's patch to modify the module to generate a full Atom feed. Then http://urlgreyhot.com/personal/atom/feed stopped working. Oddly enough, when I commented out the cache portion it worked again. So now I have an atom feed with categories here: http://urlgreyhot.com/personal/atom.xml

I'm recording the steps I took to do this because this module is not documented (as is sometimes the case with Drupal modules). Good luck:

1) Downloaded the atom module.

2) Downloaded atom.diff and saved to modules/atom/ directory.

3) Executed command in Unix terminal window: patch < atom.diff

4) Checked the module's output at http://urlgreyhot.com/personal/atom/feed

5) Created an alias via the menu "Administer > Url aliases" from atom/feed to atom.xml.

Thanks to Kika for providing the module and to Walkah for the patch. This is why I now send people who ask me if I freelance over to people like those Bryght guys.

I've been looking a lot at bookmark managers. We've been doing some development of our own bookmark managers at work. One of our programmers had a URL manager that has seen some use as a proof of concept. It worked much like Drupal's cloud system or a lot of blogrolling apps by checking pages for updates.

With the buzz around folksonomies and tagging, I've found it particularly attractive to bring the idea of user tagging into a project we're developing to meet one of our customers' needs. So we've begun building a new program from the ground up that will serve as our bookmark manager. I'll probably be posting more about this in the coming days and will likely include some wireframes and screenshots in my portfolio. The most interesting aspect of all of this is how we're developing sharing/publishing services around this application to drive content selecting and publishing on corporate portals.

I've been meaning to update the portfolio with the work I've done at Lucent in the last 3 years. I just haven't had a reason to update it. I hope to add some of this stuff soon.

Some related sites of interest:

A client was looking at different collaboration and information sharing software such as Groove and OnFolio. I've tried both and they're interesting client software to consider if you're looking for these functionalities.

Feedmarker is also a newish bookmarks manager to watch.

Of interest to blog users is the new tag indexing service on Technorati. Technorati has started gathering tagged results from Flickr and del.icio.us and includes them in their index so you can search, for instance, tag:nyc and retrieve all entries using that user-supplied tag on those systems.

Blog systems that support categories can use the following elements, which Technorati will index as tags:

<category>[tagname]</category>
<dc:subject>[tagname]</dc:subject>

I'm not sure what Technorati will do with categories that have spaces in them, however, so that makes the use of existing categories questionable. It would be nice if they removed the spaces in those categories or something.

If your web publishing software doesn't support categories or if you don't know how to configure them to add categories to your RSS feed, you can include a link that identifies your entry with a tag and Technorati will index it, e.g. <a href="http://technorati.com/tag/[tagname]" rel="tag">[tagname]</a>. This also seems particularly useful for tagging wikis or manually created html pages.
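If you are generating these by hand or from a script, the markup is easy to produce. The following Python sketch builds the category element and the rel="tag" link described above; the function names are hypothetical, and percent-encoding spaces in the link URL is my assumption, since I don't know how Technorati treats them.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape


def category_element(tag):
    """Return a feed <category> element for `tag`, with &, < and > escaped."""
    return "<category>{}</category>".format(escape(tag))


def tag_link(tag):
    """Return an HTML link that marks an entry with `tag` via rel="tag"."""
    # Assumption: spaces and other special characters are percent-encoded.
    href = "http://technorati.com/tag/" + quote(tag, safe="")
    return '<a href="{}" rel="tag">{}</a>'.format(href, escape(tag))


if __name__ == "__main__":
    for name in ("nyc", "information architecture"):
        print(category_element(name))
        print(tag_link(name))
```

The link version is the one to reach for on wikis or static pages, where there is no feed to carry a category element.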

If this indexing takes off as a system for aggregating topics across different sites, I can imagine that a lot of people might start to become interested in creating controlled tag lists. If the tags are unique enough then this might be a usable system until someone creates a service that is solely dedicated to this type of cross-site topic mapping.

Here's an example of how you might hack together a process to use on Technorati. Create a tag set that prepends each tag with a long string followed by the tag, e.g. drupal-bloggers-[tag], and get everyone in your community to start using that tag as their category. Make sure your blog system puts out the category in RSS or Atom feeds. Then every entry that uses that tag will be aggregated in Technorati. Problem is that Technorati is not providing an XML feed for the results, so you can't actually aggregate them in your own RSS reader. They do have an API, however, so I'm sure someone could create an application that takes advantage of this tag indexing. I wonder how long it will take before Feedster does.
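A community could keep its prefixed tags consistent with a tiny helper like the one sketched here in Python. The drupal-bloggers prefix comes from the example above; the `community_tag` name and the rule of lowercasing and collapsing spaces and punctuation into hyphens are my own assumptions, chosen to sidestep the question of how Technorati handles spaces.

```python
import re


def community_tag(tag, prefix="drupal-bloggers"):
    """Build a prefixed community tag such as drupal-bloggers-[tag].

    Assumption: the tag is lowercased and every run of characters other
    than a-z and 0-9 becomes a single hyphen.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", tag.lower()).strip("-")
    if not slug:
        raise ValueError("tag has no usable characters: {!r}".format(tag))
    return "{}-{}".format(prefix, slug)


if __name__ == "__main__":
    for name in ("Taxonomy", "Free Tagging"):
        print(community_tag(name))
```

Everyone would still have to agree on the master list of tags; the helper only guarantees that two people who pick the same term end up with the same string.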

So here's the open question to Drupal developers. Can we get categories in our RSS feeds? We Drupal users already have RSS feeds by taxonomy (e.g. at the bottom of any of my category pages there is an XML feed button), but it would be nice if we can get our categories indexed by Technorati as well. I'd love to do a proof of concept for community tagging around a topic. The participants could create a community-editable page to manage the master list of tags to use for something like Drupal related topics. Then we get our blogs to output tags in RSS.
