So I started a conversation with the clever and oh-so helpful Mike Steckel from International SEMATECH about thesauri and their kinfolk. It seems he learned a ton from the Argus seminar, and was kind enough to share some of that learning with me.
It proved to be tremendously helpful. You would not believe how little about organization tools is written in plain English for ordinary people. Kudos to Mike and the former Argonauts!
I reproduce it below in hopes it helps some other poor lost fool.
Despite having read this, and having followed multiple links from Google, I can’t really sort out how a controlled vocabulary is so different from a thesaurus (they seem to be used almost interchangeably) and why it is useful. It seems to me (please, please correct me) that a controlled vocabulary could hinder information retrieval if used without a thesaurus... so is it just the basis for one? Or?
Use “CONGO, Democratic Republic of” rather than ZAIRE
A thesaurus is a very advanced way of controlling vocabulary and in general shows:
1. Equivalence – variants and preferred terms
2. Hierarchical – Broader and Narrower
3. Associative – “see also” references
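For the code-minded, here is a minimal Python sketch of how one thesaurus record might carry those three relationship types. The `Term` class and its field names are hypothetical, not any standard format. The ZAIRE variant comes from the example above; the broader and related terms are made up purely for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Term:
    """One preferred term in a hypothetical thesaurus."""
    preferred: str
    variants: set = field(default_factory=set)   # 1. equivalence
    broader: set = field(default_factory=set)    # 2. hierarchical, upward
    narrower: set = field(default_factory=set)   # 2. hierarchical, downward
    related: set = field(default_factory=set)    # 3. associative, "see also"

congo = Term(
    preferred="CONGO, Democratic Republic of",
    variants={"ZAIRE"},               # from the example above
    broader={"AFRICA"},               # assumed, for illustration only
    related={"CONGO, Republic of"},   # assumed, for illustration only
)

print(sorted(congo.variants))
```

A plain authority list would only need the `preferred` and `variants` fields; the other three are what a fuller thesaurus adds.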
Hope that helps.
I find it interesting that a thesaurus is a kind of controlled vocabulary... what are other sorts of controlled vocabularies? Would a misspelling list be one?
An article: http://www.dlib.org/dlib/november98/11batty.html
Peter and Lou have written a lot on this.
A huge series of links on the subject.
I’m still trying to figure out relationships between controlled vocabs, thesauri, taxonomies, and keywords…
sigh. messy, innit?
Normally a taxonomy does add in hierarchy, but it does not attempt to be a complete representation of something.
Check out the ASIS thesaurus:
http://www.asis.org/Publications/Thesaurus/isframe.htm
or the Art and Architecture Thesaurus (faceted! Cool!):
http://www.getty.edu/research/tools/vocabulary/aat/hierarchies.html
These attempt to be fully descriptive of the activities of their field. People often think of Roget’s when they think of a thesaurus; this is basically something different.
A taxonomy is smaller, but usually does contain hierarchies. “Taxonomy” is often thrown around in KM circles and is the most abused word in your list.
Both of these (taxonomy and thesaurus) are controlled vocabularies. In my case, I have a thesaurus of semiconductor manufacturing terms that I assign to documents for information retrieval. When I take a term from the thesaurus and put it on a document, I am giving the document a keyword. When the user searches for a variant of the keyword (say he calls it a reticle when we use the term MASK; they generally mean the same thing), we can pull the documents with MASK assigned and give them to him. The thesaurus tells us “when you see reticle, pretend it is MASK.” A taxonomy would do the same thing.
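The reticle/MASK lookup can be sketched in a few lines of Python. This is only an illustration of the idea, not Mike's actual system: the `USE` table, the `INDEX` table, the `search` function, and the document ids are all invented for the example.

```python
# Hypothetical equivalence table: variant (lowercased) -> preferred term.
USE = {
    "reticle": "MASK",
    "mask": "MASK",
}

# Hypothetical index: preferred term -> documents it was assigned to.
INDEX = {
    "MASK": ["doc-101", "doc-205"],
}

def search(query):
    """Resolve the user's word to the preferred term, then pull its documents."""
    preferred = USE.get(query.strip().lower())
    if preferred is None:
        return []  # the word is not in the controlled vocabulary
    return INDEX.get(preferred, [])

print(search("reticle"))  # ['doc-101', 'doc-205']
```

The point of the design is that documents are only ever tagged with MASK; the variants live in one table, so adding a new synonym never means re-tagging content.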
I would say that this is a taxonomy that would be familiar:
http://www.usableweb.com/
Keith assigns keywords to the documents he puts here. The thing he draws from is the taxonomy. The terms are taken more or less from the material and organized; it is hard to do this without some hierarchy involved.
Is Yahoo a place where we can start to talk about relationships? Like, I search, right? Well, my keyword is matched against what...? I get pages, but I also get categories... what’s going on here?
Equivalence – variants and preferred terms
Hierarchical – Broader and Narrower
Associative – “see also” references
I don’t know what you mean by “taxonomy of organization tools.”
          controlled vocabulary
                    ^
          thesaurus   taxonomy
              ^           ^
spelling synonyms   category keywords
or some such... you know, a diagram showing the hierarchical relations among the organization tools...
Authority list (lowest level): no hierarchy, just preferred terms, a way to tell the system “CA” is the same as “California.”
Taxonomy (middle level): hierarchy, pulled from material, may have gaps if there is no content. You would be able to tell that San Francisco is a narrower term and California is a broader term. If there is no content relating to Santa Clara, then Santa Clara would not be a term. This is the highest level necessary for most websites.
Thesaurus (highest level): Peter Morville called this the “Rolls-Royce of controlled vocabularies” at a seminar I went to. It would attempt to include all California cities as a subset of California. In other words, each city would have California as a broader term. It would also show related terms such as cities that are near each other, or something like that. Generally useful only to very large sites. By the way, there are two kinds, pre-enumerative and post-enumerative (faceted), but don’t worry about that yet.
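Here is a rough Python sketch of the broader/narrower idea using the California example. The `BROADER` table and both function names are hypothetical, and “United States” is an extra level I added just to show more than one step of hierarchy. It also assumes the hierarchy has no loops.

```python
# Hypothetical broader-term links: narrower term -> its broader term.
BROADER = {
    "San Francisco": "California",
    "Santa Clara": "California",
    "California": "United States",  # assumed extra level, for illustration
}

def broader_chain(term):
    """Walk upward from a term, collecting each broader term in order."""
    chain = []
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain

def narrower_terms(term):
    """Collect every term below `term` in the hierarchy, at any depth."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for child, parent in BROADER.items():
            if parent == current and child not in found:
                found.add(child)
                frontier.append(child)
    return found

print(broader_chain("San Francisco"))
print(sorted(narrower_terms("California")))
```

In a taxonomy built only from existing content, the Santa Clara row would simply be missing if nothing on the site was about Santa Clara; a thesaurus would try to include it regardless.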
correctly!):
“Big Blue” is the same as IBM. This says “when you see ‘Big Blue’ it means IBM”
If you had a medical site used by both doctors who might use “Nyctalopia” and consumers who might look for “Night Blindness,” this would be a way to link them together.
My understanding is that “Thesaurus” was used for what we now call “Dictionary” and that Roget used it in 1852 as a way to give the user a choice among several terms. In the early 1950s people started to use it in the opposite way, as a prescriptive limitation on the terms used.
I have a book here in my office that says “thesaurus” is a Latin form of a Greek word meaning “Treasure Store.” I like that!
Also, FYI…for me, the process of finding a keyword from my thesaurus and applying it to some content is “indexing.”
At which point I asked if I could reproduce this in the blog, and he graciously agreed. Thanks Mike!