I am a big fan of Clay Shirky. I’ve written a hub reviewing his book Here Comes Everybody. It’s one of my favorite books of 2008.
When HubPages announced the support of categories in addition to tags, many people emailed me links to Clay’s essay: “Ontology is Overrated: Categories, Links, and Tags”. Clay Shirky first presented this talk at the O’Reilly Emerging Technology Conference in 2005. Since then, it has had a tremendous impact on discussions about categories. As shown by the emails I received in 2009, it carries strong weight with some very smart people. Why would HubPages invest its time and resources to add top-down, hierarchical categories? Isn’t everyone now convinced that categories are overrated and worthless?
Well, before someone sends me another email with a link to this essay, I will try to explain why categories are a very good idea for HubPages, even in light of Clay Shirky’s fascinating analysis.
In 2005, the essay had an important purpose. Categories were often being misused, and there were some exciting trends in search engines and tagging that promised greater usefulness than categories alone. Clay argues very persuasively that tags are on the rise and that categorization, as practiced by the Yahoo directory and libraries around the world, has major flaws. If you haven’t read this paper yet, I encourage you to take a look or a listen.
Here is a summary of Clay Shirky’s main points:
- Category schemes are inevitably flawed.
- Search using key terms is significantly higher value than browsing the web via a categorized directory of web sites.
- Categorization fails because it involves mind reading and fortune telling.
- Tags have many benefits over categories.
- Binary categorization is on its way out.
If you have just read Clay’s article, one major point sticks out: a lot has changed since 2005. The article was written when categories were everywhere and tags were an upcoming trend. This was before Twitter, before Yahoo’s purchase of Del.icio.us (in December 2005), and before HubPages was started (in 2006). While Clay Shirky does a great job of showing that categories have problems, he is less successful in showing that top-down categories, in general, have no value. Others have commented on the problems with his argument: he is a bit unfair to librarians, doesn’t talk about areas where the Yahoo directory does a decent job, and is overly soft on tagging. None of this disproves that ontologies in 2005 were overrated; in my view, Clay’s main point stands strong. Rather, it shows that Clay, for the purposes of his main argument, ignores the benefits of categories and ontological systems.
Even if category schemes are flawed, they can provide significant value when they are offered to users (instead of imposed on them), when they have a nice fit to the topics people are interested in, and especially when they succeed in organizing similar and related items.
The Library Revisited
One of the most entertaining parts of Clay’s paper is his review of library categorization schemes. The Dewey categorization system organizes the world’s religions into nine categories (Natural theology, Bible, Christian theology, Christian moral & devotional theology, Christian orders & local church, Christian social theology, Christian church history, Christian sects & denominations, Other religions). So, the punchline is: “How much is this not the categorization you want in the 21st century.”
But is this really proof of a broken library system? As anyone who goes to the library knows, the call number is the least important part of a book search (it’s what you use after you have selected your books). The main search consists of a subject search, an author search, or a title search. The set of subjects is itself a set of tags. The category, as revealed through the call number, is often invisible to the person looking for a book. It could just as well be a random code (as long as it maps well to a book’s location on a shelf). Even so, there are benefits to having a category system in addition to the tagging system: it is easier to find similar books and easier to find similar topics. For most people, the benefit comes when you are looking for one book and find another book on the same subject or on a related subject.
Yahoo Directory has flaws but for many topics is quite valuable
The Yahoo directory is easy to attack these days. Who, after all, can argue that the Yahoo directory is better than Google’s search engine? But that is not a valid measure of the worth of the Yahoo directory. The better question is: are we better off having a Yahoo directory of the web, or would it be better if it went away completely? The Yahoo directory clearly provides great value for certain topics. One of my personal favorites is the Yahoo directory of mathematics (no, my math blog did not make the Yahoo cut).
Directories have their place. I find that they are especially useful for very specific topics. Often, a search excerpt does not provide enough information to judge the value and relevance of a URL. In these circumstances, I find myself going to Wikipedia, Amazon categories (great for browsing books), or Yahoo’s directory.
At HubPages, we have spent a lot of time making sure that our category system is usable. We have a category tool to make it as easy as possible for authors to identify the relevant category for their hub. They can do a search, get recommendations based on their tags, look up the categories they have used previously, or browse through the full list. We will have category landing pages that showcase featured hubs of that category. We have an excellent tool so that the HubPages staff can add new categories, remove categories, or move categories around. From my experience, usability of a feature is far more important than any theory behind it. Categories are useful because they are one more way for a reader to discover new content in a specific area of interest. It is one of many navigation tools available on HubPages. If you prefer, you can filter hubs by related tags, do a full-text, site-wide search, or browse hubs by popularity, quality, or recency. I think it’s clear that categories provide value not offered by these other choices.
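To give a feel for what "recommendations based on their tags" could look like, here is a minimal sketch. It is not the actual HubPages category tool: the CATEGORY_KEYWORDS table, the category names, and the suggest_categories function are all made up for illustration, and the scoring is a simple count of overlapping tags.

```python
# Hypothetical mapping from category names to keywords; not real HubPages data.
CATEGORY_KEYWORDS = {
    "Mathematics": {"math", "algebra", "geometry"},
    "Books": {"book", "review", "author"},
}


def suggest_categories(tags, limit=3):
    """Rank categories by how many of the hub's tags match their keywords."""
    tag_set = {t.strip().lower() for t in tags}
    scored = [
        (len(tag_set & keywords), name)
        for name, keywords in CATEGORY_KEYWORDS.items()
    ]
    # Keep only categories with at least one matching tag;
    # sort by score (highest first), then by name for a stable order.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [name for _, name in ranked[:limit]]


print(suggest_categories(["Math", "algebra", "book"]))
```

A real tool would need a much richer signal than keyword overlap, but even this simple version shows how tags and categories can work together rather than compete.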
Tags are not the perfect end-all categorization technology
Tags are not the end-all classification technology. There, I said it. I feel better. Tags are great. We will continue using them at HubPages, but they have significant limitations. Their identity depends on their form more than their meaning. If one person adds a space, another person adds a hyphen, and a third person removes the space, then you have three different tags that should really be the same thing. Tagging is also a skill. While the experienced HubPages author may get a sense of the popular tags, the newbie and the occasional writer often have real trouble figuring out the optimal tags. A few months ago, we added a tag suggestion feature, which has really helped people pick better tags. Tools such as this can do a lot to improve the effectiveness of tagging.
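The space/hyphen problem is easy to see in a few lines of code. The sketch below is only an illustration, not how HubPages stores or compares tags; normalize_tag is a hypothetical helper that lowercases a tag and strips spaces, hyphens, and underscores so that the three variants collapse to one form.

```python
import re


def normalize_tag(tag: str) -> str:
    """Lowercase the tag and drop whitespace, hyphens, and underscores."""
    return re.sub(r"[\s\-_]+", "", tag.strip().lower())


variants = ["tag cloud", "tag-cloud", "tagcloud"]

# Compared as raw strings, these are three distinct tags.
print(len(set(variants)))

# After normalization, they collapse into a single tag.
print({normalize_tag(t) for t in variants})
```

Note the trade-off: stripping separators this aggressively can also merge tags that were meant to be different, which is one more reason tags are defined by their form rather than their meaning.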
Tags also suffer from the same problems that Clay outlines in his essay about categories: they quickly get dated and they assume some level of mind reading and fortune telling. Tag clouds are found everywhere on the web seemingly more for show than for use. In 2005, Jeffrey Zeldman asked if Tag Clouds were not becoming the web equivalent of the mullet haircut.
Categorization does not have to rely on mind reading and fortune telling
Clay argues that categorization schemes inevitably require mind reading and fortune telling. You need to mind-read the would-be users of the hierarchy scheme and then you need to predict the interests of future users as well as the relevant topics that will organize future web links. Clay writes:
“One of the biggest problems with categorizing things in advance is that it forces the categorizers to take on two jobs that have historically been quite hard: mind reading and fortune telling. It forces categorizers to guess what their users are thinking, and to make predictions about the future.”
He talks here about the problems of using the Soviet Union as a category or East Germany as a category. In both of these cases, the category is no longer relevant. This is one of the dangers of categories (and tags for that matter). They get dated over time and even well chosen categories may not be relevant to future users.
At HubPages, we provide a set of categories which our authors decide to use or not use. It is true that we are doing limited mind reading and fortune telling in the sense that we came up with our set of categories before the author writes a hub. In practice, this is not as tough as it sounds. If no one signs up for a category, we can remove it. If someone requests a category, we can add it. Additionally, our writers tend to have favorite topics so past writing behavior is a reasonably good predictor of the future. For example, the user mattressguru is excited that there is a mattress subcategory.
While each hub has to be mapped to a single category, we give authors many choices: they can use a top-level category, a second-level category, or, in some cases, a fourth-level category. It’s up to them. This really takes away the burden of mind reading, since if all else fails, writers can pick a more general category. You can see our complete list of categories here.
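Here is a rough sketch of why letting authors pick any level of the hierarchy still works for readers. The category names, the PARENT table, and both functions are hypothetical and are not taken from the HubPages category list; the point is only that a hub filed under a specific subcategory is still found when someone browses a more general one.

```python
from typing import Optional

# Hypothetical category tree: each category maps to its parent (None = top level).
PARENT = {
    "Home": None,
    "Home Furnishings": "Home",
    "Mattresses": "Home Furnishings",
}


def category_path(category: str) -> list[str]:
    """Return the chain of categories from the top level down to `category`."""
    path = []
    current: Optional[str] = category
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return list(reversed(path))


def hubs_under(category: str, hub_categories: dict[str, str]) -> list[str]:
    """List hubs filed under `category` or any of its subcategories."""
    return sorted(
        hub for hub, cat in hub_categories.items()
        if category in category_path(cat)
    )


# Each hub is mapped to exactly one category, at whatever level the author chose.
hubs = {
    "Choosing a memory foam mattress": "Mattresses",
    "Decorating a small apartment": "Home",
}

print(category_path("Mattresses"))
print(hubs_under("Home", hubs))
print(hubs_under("Mattresses", hubs))
```

An author who is unsure can file a hub under the general category, and a more specific choice never hides the hub from readers browsing higher up.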
Our design of categories came about from a discussion between the staff and the HubPages community. The staff got the ball rolling by coming up with an initial set of top-level and second-level categories which we presented through forum announcements. I am proud to say that most of what was initially proposed was well received and the discussion that followed really helped to improve the overall set of categories.
Binary Categorization didn’t leave completely
Visit any of your favorite sites and most likely you will find some level of categorization there. Clay Shirky’s web site, for example, has the following categories:
- New Book
- Napster and Peer-to-Peer
- WAP and the Wireless Internet
- Globalization and the Internet
- Network Economics
- Media and Community
- Open Source and Software Design
Clay was surprised to find that he could organize his writings around common “themes”. He writes:
“As I have gathered these writings together and tried to organize them, however, I have been surprised to see that there are things here that organize these writings, not so much by category as by theme.”
Recently, Google has introduced categories to its Knol product.
Clay’s arguments in his essay focus on an older way of doing categories: having a small group of category experts create a hierarchy of boxes which they impose on the world. The HubPages approach is very different. We recognize from the beginning that categories are imperfect, and we offer categories more as “themes” than as “boxes”. Authors are able to switch categories at any time.
We moved to categories as a result of our experience with tags. Tags alone were too imprecise to help users find the specific content they wanted. Because authors, in a sense, get to vote for the categories they like, category ordering at HubPages is significantly more organic than the categorization schemes that Clay critiques.
I am very excited about this week’s release of categories to HubPages. I look forward to hearing the feedback as people try it out.