A few weeks back, I was lucky enough to have That Damn Canadian(tm) as a houseguest. Karl Fast recently finished his master’s in library and information science, and this fall he’ll be starting his PhD (his research area will be information visualization). Dude!
As part of his master’s he took a class on thesaurus construction using facet analysis, in which he had to develop a small thesaurus. So while he was in Palo Alto I took the opportunity to pump him for information on facets, the hot IA topic these days. I make everyone sing for their supper.
While I stirred some risotto, we talked about why facets are harder than most folks think.
Since we were there in my kitchen, Karl used the example of cooking equipment. “That pot you’re fussing over: what are its characteristics?”
“Well, it’s a pan. Flat-bottomed. Non-stick. Calphalon. Metal handle.”
“So you have type, material, shape, brand… those are potential facets. What about knives?”
“What have knives got to do with it?”
“Well, a knife isn’t a pan so it might have different facets.”
“Like sharpness or length.”
“Yeah, those are potential facets for the knife that aren’t facets for the pan. So some facets would be shared by most cooking items, like material and brand, and other facets would be unique to certain items, like sharpness was to the knife.”
“I think I got it.”
“Do you? All the knives in a kitchen store are sharp, but they all have different handles. It’s an important distinguishing characteristic. So how do you distinguish between the blade of the knife and the handle?”
“I don’t know. Maybe material, but also color, edge and…hmmm. I don’t know.”
“Neither do I. It’s starting to get fuzzy here, isn’t it? How *do* you describe the handle of a knife? I’m sure it can be done, but it’ll require some research and analysis. And we should remember that most cookware has handles, but handles aren’t always an important characteristic. It’s probably important for describing a knife, but probably not important for a blender.”
“Blenders! Shoot! What about strainers, toasters, lemon zesters… our classification needs to describe facets for those too, right?”
“This is getting a bit harder.”
“It could be worse. What if we decided to tackle not just cookware, but the whole subject of cooking?”
“Well, we’d want to hit techniques, history, recipes… um (looking around) interior design? Counters and shelving? And ingredients… oh my god! Ingredients. Canned, fresh, dried, fruits, vegetables… vegetables! Peas, beans, root veggies…”
“And what about something like the history of cooking? Famous chefs like Julia Child? Geographic differences? How do we handle that?”
“I don’t know either. Not yet. But with enough time one could develop a faceted scheme to handle all of this. That would take a lot of work.”
“I begin to see what you mean about facets. Not being simple.” (I opened the fridge for a beer at this point. God, the fridge was full of facets too; a French husband means a shelf dedicated to cheese. There was French, Italian, Spanish; goat’s, sheep’s and cow’s milk; soft and hard; herbed and plain. And what about the crème fraîche? Where the hell would that go? My reverie was broken by Karl reaching past the cheese for an iced tea.)
“Most writers use simple examples to describe facets. Like the cheese there.”
(simple?!?! I thought to myself)
“This is effective at introducing the concept but there is a dark side. Simple examples mislead readers into thinking facets are simple, or worse, that they understand facets. Life in facet-land *is* relatively simple when you’re dealing with narrow subject areas or physical objects. Life is far more complicated when you expand your scope or when you start dealing with concepts (like history) instead of just physical characteristics. This is true of any sort of classification or indexing scheme, not just facets. And library and information scientists have done a lot of investigation into these things.”
“So coming up with a faceted scheme to describe cooking in general would take years of work, but even doing facets for cookware would take months.”
“Not necessarily. It depends on the scope of the project. The broader the subject area, the more work. It also depends on how exhaustive, how detailed, you want to be.
“Cooking would take a lot of time, probably months, depending on how many people are involved and how exhaustive you want to be. Cookware would be a lot easier, and wouldn’t necessarily take months.”
So, mind reeling, I filed facets in the back of my head. Use cautiously. In a limited way. Watch out for scope.
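For the code-minded, here is a rough sketch of the point Karl made with the pan and the knife: some facets are shared by most cooking items, and others only make sense for one type. It is a toy under stated assumptions, not a real thesaurus. The facet names and the `allowed_facets` and `undefined_facets` helpers are all made up for illustration.

```python
# Hypothetical facets for the kitchen example; none of these come from a real scheme.
SHARED_FACETS = {"material", "brand"}          # apply to most cooking items
TYPE_FACETS = {
    "pan": {"shape", "coating"},               # only meaningful for pans
    "knife": {"sharpness", "length", "handle"},  # only meaningful for knives
}


def allowed_facets(item_type):
    """Facets the toy scheme defines for this item type: shared plus type-specific."""
    return SHARED_FACETS | TYPE_FACETS.get(item_type, set())


def undefined_facets(item_type, description):
    """Facet names used in `description` that the scheme does not define for the type."""
    return sorted(set(description) - allowed_facets(item_type))


pan = {"brand": "Calphalon", "shape": "flat-bottomed", "coating": "non-stick"}
print(undefined_facets("pan", pan))                       # nothing to complain about
print(undefined_facets("pan", {"sharpness": "very"}))     # sharpness is a knife facet
print(undefined_facets("blender", {"handle": "rubber"}))  # blenders have no handle facet here
```

Even this toy shows where the work hides: every new item type (blenders, strainers, lemon zesters) forces a decision about which facets it gets, and "handle" is still a single field rather than something you can properly describe.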
Some months later, I saw Adaptive Path’s article on facets, and immediately forwarded it to Karl to tease him.
“After reading this article, I’m going to put in my book how easy and fun facets are, and how everyone should do them.”
“I see you have chosen the way of pain. (I just saw Lord of the Rings again. Christopher Lee is soooo evil.)
Anyhow, this is not — IMHO — a particularly helpful article about facets.”
“What’s wrong with it? Is it the problem of scale we discussed?”
“It’s a tease. It tells you what facets are, sort of, but not really. Check closely and you’ll see there is only one paragraph that describes what facets are. One paragraph? Not enough.
It has other problems too:
1. It covers what facets are, not how to develop your own faceted classification scheme.
2. It doesn’t tell you how (a) difficult and (b) time-consuming it is to design a faceted thesaurus, which is to say how expensive.
3. The terms “thesaurus” and “classification” are never used, but that’s what you’re building. No mention of controlled vocabularies either.
4. A faceted thesaurus is a dynamic thing and requires a lot of time and energy to *maintain*. This costs more money.
The best thing about the article is the discussion about the interface issues. It correctly points out that these issues are enormous and much harder than they first appear.
Jeff also makes the excellent point that browsers aren’t well suited to an iterative query interface, which is the direction most faceted interfaces are headed. The idea is to use a point-n-click interface and make lots of little adjustments to your query until you’ve whittled the dataset down. Each iteration involves a request to the server and the relatively slow network response time makes this problematic (information visualization, my research area starting this fall, faces a similar problem).”
“So it’s not inaccurate, its sin is that of omission? Personally, I think any kind of thesaurus or even controlled vocabulary design is incredibly difficult and time-consuming.”
“There might be a few niggles, accuracy-wise, but it seems basically correct.”
“So what’s the problem?”
“The article is too thin for my liking. In my view this is beyond omission. It’s like saying that the engine in your car is simply a metal container into which you inject gasoline vapor and then light it on fire. It’s far more complicated than that, and anyone trying to duplicate an engine with this information is going to fail miserably.
Now the article isn’t going to cause anyone to set themselves on fire, and its purpose is not to teach anyone how to develop a faceted classification scheme. That should be clear to anyone who reads it. Nothing wrong with that.
My complaint is that there is a lot of talk about facets, but little of any substance. Most of it won’t help you build your own faceted classification scheme. It amounts to saying the grass is greener on the other (faceted) side, but fails to give you a map explaining how to get there and what obstacles you’ll face along the way. And the academic literature doesn’t help much either. It’s too dense and I can’t recommend it to the practitioner (not the stuff I’ve seen).”
“So where’s the article that will explain all of this in a language we can understand?”
“Well, I’d thought about writing it this summer but things have happened and I think I’m too busy (I’m going backpacking in the mountains for four weeks). More importantly, the answer is probably a series of articles. We need something to fill the gap between the enthusiastic but simplified articles we’ve been getting and the rigorous, dense explanations in the academic LIS literature.”
“So, B&A is waiting, Karl….”
Or Amy! Or anyone!
Draw your own conclusions. But I didn’t want to wait until an article was written to get the word out: it’s complex. So Karl agreed to let me put up our conversations.
Amy Warner’s talk at the summit (http://www.asis.org/Conferences/Summit2002/IA_Summit_031602.ppt) made many wonderful points about when and why to choose what degree of controlled vocabulary you want to use (and faceted thesauri are the Cadillacs of controlled vocabularies; see slide 6).
She also pointed out that often a company cannot technologically support a thesaurus, and designing one would be a waste of time and money (which Karl agrees with 100%). In fact, if you are excited by Jeff’s article, definitely go through Amy’s slides. They illustrate what it takes to make a thesaurus.
Personally, while I fear faceted classification in all its majesty, I think adding limited facets to your navigation is just fine. There is nothing bad about “shop by occasion, recipient, lifestyle and shops (brand)” as seen on www.redenvelope.com. It’s important to be careful when you open that genie’s bottle. Facets are like wishing: they may seem simple, but the consequences can be unpleasantly surprising.
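For what it’s worth, the limited kind of faceting I mean, and the click-and-narrow loop Karl described, can be sketched in a few lines. This is only an illustration under my own assumptions: the catalog is invented (the facet names borrow the “shop by” labels above, but the items and values are not from any real site), and `narrow` and `facet_counts` are hypothetical helpers.

```python
from collections import Counter

# Invented catalog; items and facet values are made up for illustration.
CATALOG = [
    {"name": "picnic basket", "occasion": "wedding", "recipient": "couple"},
    {"name": "cheese knives", "occasion": "wedding", "recipient": "couple"},
    {"name": "silver rattle", "occasion": "new baby", "recipient": "baby"},
    {"name": "desk clock", "occasion": "graduation", "recipient": "him"},
    {"name": "leather journal", "occasion": "graduation", "recipient": "her"},
]


def narrow(items, facet, value):
    """One click: keep only the items whose `facet` has the chosen `value`."""
    return [item for item in items if item.get(facet) == value]


def facet_counts(items, facet):
    """How many of the remaining items fall under each value of `facet`."""
    return Counter(item[facet] for item in items if facet in item)


# Each call to narrow() stands in for one small adjustment to the query.
step1 = narrow(CATALOG, "occasion", "graduation")
print(facet_counts(step1, "recipient"))  # choices to offer for the next click
step2 = narrow(step1, "recipient", "her")
print([item["name"] for item in step2])
```

In a browser, each of those little adjustments would be a request to the server, which is the slowness Karl was talking about. And notice the code is the easy part. Deciding what the facets and their values should be is where the months go.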