Free software download for Thesaurus construction on a Windows PC only (!)
Subject Thesaurus Construction
Aspects in Need of Consideration
The subject field
Does your subject have easily defined boundaries?
What are the core areas?
How often do new developments take place?
What are the peripheral areas?
Are there any existing schemes in the subject field?
The Collection
What sort of documents/items/information objects do you have?
Books?
Electronic documents?
Reports or continuing resources?
This may seem a strange thing to consider, but thinking about it will help you decide on the depth of indexing required. A full-text database, for example, will need greater specificity in its indexing terms.
Language Considerations
Do the intended users of the thesaurus have any special language requirements?
For example, will it be used by scientists who might prefer scientific terms, or by users who are more familiar with everyday language?
Thesaurus Users
Is the thesaurus intended for use by end users or by information professionals? If it is aimed at end users, it should be user-friendly and the controlled language should be as unobtrusive as possible. Use as many natural language terms and forms as you can.
Questions, Searches and Profiles
Ask yourself what sort of queries users will be making.
Will they be of a general nature or will they be very specific?
Your answer will affect the design of the thesaurus.
Resources
The really big question to consider is the amount of resources you have at your disposal. Thesaurus construction is a costly exercise.
Can your organisation afford it?
What about staff? Can you free up enough staff time for them to do the job properly?
What about your access to thesaurus software?
Adaptation
Is there a thesaurus available in the field? If so, can it be adapted? This is a less costly option than producing one from scratch. It may not be an ideal solution, but it may be a compromise you can make.
Once the Issues Have Been Considered, What’s Next?
If you have considered the above issues and decided that constructing a new thesaurus is necessary, remember that it cannot be done in a quick and dirty fashion. A professional thesaurus should conform to the standards set down by the International Organization for Standardization (ISO). There are also standards in individual countries which need to be met. These standards cover all aspects of thesaurus construction, such as word control, grammatical form, ambiguous terms and the use of explanatory notes.
Once you have your facet analysis, you are ready for the next stage in the process: turning it into a thesaurus.
Vocabulary Control
Indexing terms
Let's begin with the indexing terms themselves: preferred and non-preferred.
In the ISO standards, an indexing term is described as the representation of a concept. This is not a new idea; we have discussed it before in relation to subject indexing in general. The representation can be made using one word or a combination of words.
A preferred term is the term that is consistently used to represent a concept. The non-preferred form is usually a synonym (equivalent term); in the literature it is also referred to as a non-descriptor.
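If the vocabulary is kept in software, the preferred/non-preferred distinction can be held as a simple lookup from each non-preferred term to its preferred term. The sketch below is a minimal illustration assuming a plain in-memory Python dictionary; the names `USE`, `resolve` and `used_for` are hypothetical and do not come from any particular thesaurus package. The example pairs are taken from the APAIS appendix at the end of this post.

```python
# Hypothetical in-memory vocabulary: each non-preferred term (non-descriptor)
# points to the preferred term that should be used instead (USE).
USE = {
    "Currency": "Money",
    "Fauna": "Wildlife",
}

def resolve(term: str) -> str:
    """Return the preferred term for `term`; preferred terms map to themselves."""
    return USE.get(term, term)

def used_for(preferred: str) -> list[str]:
    """Derive the reciprocal UF list: every non-preferred term pointing at `preferred`."""
    return sorted(np for np, pt in USE.items() if pt == preferred)

if __name__ == "__main__":
    print(resolve("Currency"))   # Money
    print(resolve("Money"))      # Money
    print(used_for("Wildlife"))  # ['Fauna']
```

Storing only the USE direction and deriving UF from it is one way to keep the two relationships reciprocal without maintaining them by hand.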
Indexing terms are usually broken down into two types:
· Concrete entities
· Abstract concepts
Knowing which category a term belongs to is important. Concrete entities are usually things and their parts, e.g. cars and gear levers, or materials such as steel or plastic.
Abstract concepts cover actions and events, abstract entities, and properties of things, materials and actions. Examples might be strength, durability or management. They also include disciplines and sciences such as law or physics.
At this point you might still be asking yourself why knowing which category a term belongs to is important. Well, knowing the category helps you decide whether a term is going to be plural or singular in the thesaurus, and it also helps to verify the validity of the facet analysis. For example, in the English language concrete entities are usually nouns, and if you can ask how many of the item you can have, they are usually recorded in the plural form. An exception to this, according to Aitchison, is body parts, where we have to modify our thinking and use singular forms such as 'mouth' or 'renal system'. If you have concrete entities such as 'mercury' or 'water', you can't ask how many of them you can have, so they are always recorded in the singular.
Complex, isn't it!
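The rule of thumb above can be written down as a small decision function. This is a simplified sketch assuming each concrete entity has already been classified by hand; the `Kind` categories and function names are hypothetical labels, and the plural form is supplied by the indexer rather than generated, because English pluralisation is irregular. Real cases beyond these three categories still need human judgement.

```python
from enum import Enum

class Kind(Enum):
    COUNTABLE = "countable"   # you can ask "how many?" (cars, gear levers)
    MASS = "mass"             # you cannot ask "how many?" (mercury, water)
    BODY_PART = "body part"   # Aitchison's exception (mouth, renal system)

def grammatical_number(kind: Kind) -> str:
    """Return 'plural' or 'singular' for a concrete entity, following the rule above."""
    if kind is Kind.COUNTABLE:
        return "plural"
    # Mass nouns and body parts are both recorded in the singular.
    return "singular"

def recorded_form(singular: str, plural: str, kind: Kind) -> str:
    """Choose which form of the term goes into the thesaurus."""
    return plural if grammatical_number(kind) == "plural" else singular

if __name__ == "__main__":
    print(recorded_form("car", "cars", Kind.COUNTABLE))      # cars
    print(recorded_form("water", "water", Kind.MASS))        # water
    print(recorded_form("mouth", "mouths", Kind.BODY_PART))  # mouth
```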
Spelling
You must decide which version of the English language to adopt, for example American English or Australian English.
Punctuation
Punctuation should be avoided as much as possible because it can cause retrieval problems. The hyphen is probably the cause of most of the difficulties that crop up. If you leave it out, what are you going to replace it with? You must decide whether to leave a space or join the two words together. Whatever your decision, make sure you are consistent in your practice.
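One way to enforce that consistency is to pass every candidate term through a single normalisation function before it enters the thesaurus. This is a minimal sketch assuming the hyphen policy is chosen once for the whole vocabulary; the function name, the two policy labels and the example terms are hypothetical illustrations.

```python
import re

def normalise_hyphens(term: str, policy: str) -> str:
    """Remove hyphens from a term according to one vocabulary-wide policy.

    policy="space": replace each hyphen with a space ("decision-making" -> "decision making")
    policy="join":  close up the words              ("e-mail" -> "email")
    """
    if policy == "space":
        result = term.replace("-", " ")
    elif policy == "join":
        result = term.replace("-", "")
    else:
        raise ValueError(f"unknown hyphen policy: {policy!r}")
    # Collapse any doubled spaces left behind and trim the ends.
    return re.sub(r"\s+", " ", result).strip()

if __name__ == "__main__":
    print(normalise_hyphens("e-mail", "join"))            # email
    print(normalise_hyphens("decision-making", "space"))  # decision making
```

Because every term goes through the same function with the same policy, the choice is made once rather than term by term.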
Homographs
These are words which have the same spelling but different meanings, for example Cell (Biology) and Cell (Battery). Qualifiers in brackets, as in these two examples, can help overcome problems of meaning.
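In software, the qualifier can be stored as a separate field and displayed in brackets, so that both senses of 'Cell' remain distinct entries. This is a hypothetical sketch; the `Term` class, its field names and the `senses` helper are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Term:
    label: str
    qualifier: Optional[str] = None  # e.g. "Biology" or "Battery" for homographs

    def display(self) -> str:
        """Render the term with its qualifier in brackets, if it has one."""
        return f"{self.label} ({self.qualifier})" if self.qualifier else self.label

def senses(terms: list[Term], label: str) -> list[str]:
    """List every entry sharing a spelling, so an indexer can pick the right sense."""
    return [t.display() for t in terms if t.label.lower() == label.lower()]

if __name__ == "__main__":
    vocabulary = [Term("Cell", "Biology"), Term("Cell", "Battery"), Term("Money")]
    print(senses(vocabulary, "cell"))  # ['Cell (Biology)', 'Cell (Battery)']
```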
Scope Notes
Scope notes should be used only when absolutely necessary. They explain how you want the preferred term to be used or how the term is to be interpreted. Scope notes should not be used to define terms as a matter of routine. If your facet analysis has been carried out correctly, you should not need a large number of scope notes.
Finishing Touches
When you have reached this point in the construction process, you should have a series of hierarchically structured facets ready to turn into a thesaurus. We recommend using all the conventional thesaurus relationships and their abbreviations:
· USE (Use)
· UF (Used for)
· BT (Broader term)
· NT (Narrower term)
· RT (Related term)
NB: RTs never come from the same facet!
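In software, each relationship is usually stored together with its reciprocal, so that adding a BT automatically creates the matching NT, and RT links run both ways. The sketch below assumes each term records the facet it came from, so the NB rule about RTs can be checked when a link is added. The class and method names are hypothetical; the example terms come from the APAIS appendix, but the facet names attached to them are invented for illustration.

```python
from collections import defaultdict

class Thesaurus:
    """Hypothetical minimal store that keeps BT/NT and RT relationships reciprocal."""

    def __init__(self) -> None:
        self.facet: dict[str, str] = {}
        self.bt: dict[str, set[str]] = defaultdict(set)
        self.nt: dict[str, set[str]] = defaultdict(set)
        self.rt: dict[str, set[str]] = defaultdict(set)

    def add_term(self, term: str, facet: str) -> None:
        self.facet[term] = facet

    def add_bt(self, narrower: str, broader: str) -> None:
        """Record 'narrower BT broader' and, automatically, 'broader NT narrower'."""
        self.bt[narrower].add(broader)
        self.nt[broader].add(narrower)

    def add_rt(self, a: str, b: str) -> None:
        """Record a symmetric RT link, refusing links within one facet (the NB rule)."""
        if self.facet[a] == self.facet[b]:
            raise ValueError(f"RT between {a!r} and {b!r}: both in facet {self.facet[a]!r}")
        self.rt[a].add(b)
        self.rt[b].add(a)

    def entry(self, term: str) -> list[str]:
        """Render a term's entry in the conventional BT/NT/RT layout."""
        lines = [term]
        for code, rel in (("BT", self.bt), ("NT", self.nt), ("RT", self.rt)):
            lines += [f"  {code} {other}" for other in sorted(rel.get(term, ()))]
        return lines

if __name__ == "__main__":
    t = Thesaurus()
    # Facet names below are invented for this example.
    t.add_term("Taxation", "Economic activities")
    t.add_term("Consumption tax", "Economic activities")
    t.add_term("Counselling", "Services")
    t.add_term("Social work", "Professions")
    t.add_bt("Consumption tax", "Taxation")
    t.add_rt("Counselling", "Social work")
    print("\n".join(t.entry("Taxation")))
    print("\n".join(t.entry("Social work")))
```

Enforcing reciprocity at the point of entry means a BT can never exist without its NT, which is one of the checks thesaurus software typically saves you from doing by hand.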
Following through on turning the facets into a thesaurus is the easy bit! And with a bit of luck, you will have the software to do it for you.
You might like to look at the demonstrations available at the following site:
This is quite an interesting way of testing your facet analysis. At Curtin, we used this software in focus groups to assess reactions to terms in the new enterprise-wide classification scheme, which will be used with the electronic document management system. The groups found it quite fun to use, and the added bonus was that we could save the groups' responses for further analysis, which made our scheme more accurate and user-friendly.
Some Extra Hints on Creating an Indexing Language
DO consult colleagues at all stages and seek expert help in areas of your subject which are technical.
DO record all important decisions so that it is not necessary to go back and reinvent the wheel.
DO keep a sense of proportion. This is a job which cannot be done perfectly. Indexing languages need to be continually fine-tuned.
DO be prepared to make the final decisions as the information specialist. The information professional is the one who knows about indexing, not the office expert on widgets. Find out about widgets from the expert for your widget manufacturing thesaurus, then make decisions based on best indexing practice.
DON'T keep too many previous drafts. Learn to distinguish a draft that contains a genuinely alternative analysis, which might prove worth going back to, from one that is a dead end.
DON'T get too hung up on the hard bits. Have a place to note them, but concentrate on getting the basic structure right. Perhaps when that is achieved all will become clear.
DON'T expect the job to be done in a short time, and don't let the boss labour under the delusion that it can be. You will both be disappointed.
DON'T get downhearted.
Appendix
Retrieval from the APAIS Thesaurus explanation:
Non-preferred term: ND (non-descriptor)
Preferred term: PT
| Reference | Abbrev | Meaning |
| --- | --- | --- |
| USE | | Indicates the preferred term, e.g. Currency USE Money |
| Note | | Scope notes are used to indicate the meaning or application of certain descriptors. |
| USED FOR | UF | Indicates the non-preferred terms which the synonymous preferred term encompasses, e.g. Wildlife UF Fauna |
| BROADER TERM | BT | Indicates the name of the class of which the term is a member, e.g. Consumption tax BT Taxation |
| NARROWER TERM | NT | Indicates members of the class represented by the term, e.g. Women NT Aboriginal Women; Women NT Professional women |
| RELATED TERM | RT | Indicates concepts associated with the term but not related in a class-membership way, e.g. Counselling RT Crisis centres; Counselling RT Social work |