ISKO Italia
research presented at the ISKO Italy-UniMIB meeting : Milan : June 24, 2005;
adaptation of a master thesis in Cooperation and development at Pavia university;
awarded with the EADI prize 2006;
later version published in European journal of developmental research, 19: 2007, n 2, p 327-251
The methodology applied during the design of a thesaurus can determine its variety in terms of structure. The differences are usually related to the formal part of the structure (external), or the intrinsic part (semantic, classificatory). Many other elements can play a key role in defining the variety of ways in which the thesauri can be presented: practical choices, material constraints, planning choices, variables peculiar to a specific culture.
In this work we intend to apply conceptual tools drawn from philosophy and anthropology in order to elucidate the cultural specificity of thesauri. Our claim is that the thesaurus, as a knowledge organisation system (KOS), is the outcome of a culture, specifically western culture. For this reason it is critical, following Foucault's invitation, to examine the thesaurus as a precise modality of the arrangement of knowledge, based on regulating codes that are the result of a historical period and cultural pattern rather than a single definitive assertion. The most important indication of this philosopher is therefore to scrutinize the conditions of possibility of our "discourse"[5]. Which kind of epistemological rules lie at the origin of a conscious classification claiming, for instance, that a cat and a dog resemble each other less than two greyhounds do? On the basis of what kind of table, what space of identity, similitude or analogy have we chosen this categorisation? Why have we fallen into this habit? It is not a matter of a-priori coherence, obtained by linking together simple consequences, but an operation connected with the combination and isolation, analysis and adaptation of empirical contents. In this way a culture selects and determines a "system of elements", a definition of segments in which affinities and differences can appear and change over time. This order represents the internal law of things and, at the same time, the secret grid in which the empirical contents can look at each other in a coherent way. With our objectives in mind, it is important to underline that the fundamental codes of a culture are the basis on which language, perceptive schemes, techniques and values are constructed. More importantly, they define from the outset the empirical orders in which a human being will act and live.
On the other hand, scientific theories or philosophical interpretations explain the reason for the existing order, the underlying law that governs it and the motivation for this type of choice.
Foucault states that a culture can observe itself by exploring the space between its codes and the underlying scientific theories. Between these two areas of knowledge there exists, as Foucault supposes, a more confused intermediate space, difficult to analyse, in which a culture "straying from the empirical orders that its codes prescribes, creating a distance between them, ceases to be passively crossed […] in order to understand that these empirical orders aren't the only possibilities or that they are any better" [6]. Taking these suggestions into account, we have tried to identify a term, deep-rooted in western epistemology, that represents an ambiguous concept in terms of cultural relativism. In view of our field of research [7], we have chosen the term development. Following Gilbert Rist's analysis, our motivation was to "scrutinize the aura of self-evidence surrounding a concept which is supposed to command universal acceptance but which, as many have doubtless forgotten, was constructed within a particular history and culture" [8]. The investigation was directed towards some of the most important thesauri located on the border between different cultures. Chapter II of this article presents the core of our research and the results that we have obtained. It explores the use of the term development in five online thesauri of international organisations: AGROVOC (Food and Agriculture Organisation), EUROVOC (European Union), OECD MACROTHESAURUS (Organisation for Economic Co-operation and Development), UNBIS (United Nations), UNESCO (United Nations Educational, Scientific and Cultural Organisation). The first important result to underline is that, by including or omitting a term, a thesaurus determines the direction of a piece of research. Often "development" is not used as a preferred term but as a non-preferred term (i.e. one that refers the user to another term more specific to the search).
In order to understand which kind of implications this methodological choice has on our navigation we have constructed five "relations' tables" of the analysed thesauri.
Chapter III presents our final considerations. If we imagine arranging knowledge within a documentation centre concerned with relationships between western and non-western countries, for instance Europe and the ACP [9] countries, what kind of syndetic structure (i.e. cross-reference connections between terms) would be appropriate?
To address this question we can attempt to view a thesaurus in anthropological terms. Admitting that a thesaurus is the result of a particular history and culture, what credence is accorded to other points of view? What kind of result can be useful to non-western users (e.g. African or Caribbean) so that they can find their own semantic constellation of the term development, bearing in mind that a further guarantee of consistency between the thesaurus and the documents is the so-called user warrant (the questions that users most frequently put to the knowledge system)? How can we define the most precise classification criteria? A thesaurus can be likened to a litmus paper of the western discourse on development. Because this KOS represents the documentation that exists in archives, it is crucial, in our opinion, to show a non-western user that the term development is a product of western culture.
This article does not try to solve the problem of translatability between different cultures but attempts to underline a simple supposition: the western concept of development (and its semantic constellation) has been used over time as a powerful tool to map the world into "developed" and "underdeveloped" (i.e. western/non-western) areas. When this division was established, the underdeveloped world missed the opportunity to express what development is in other epistemological terms. We believe that this division is obsolete. It is now possible to observe that the conditions of truth, which have constituted what was or was not acceptable for western culture, are changing. The process of thesaurus evaluation can enhance the value of the thesaurus in terms of usability, scope, precision and recall. Structural, formative, observational and trans-cultural comparative evaluation must be applied in the assessment of an existing thesaurus or the construction of a new one.
In this work special attention is paid to the results obtained by a generic external user (or "end user") when browsing a thesaurus. End users are unlikely to be familiar with the jargon and complexities of online information retrieval; for this reason a few points need explaining before we approach the online thesauri of the international organizations. This section first provides a definition of thesaurus. The following paragraphs, 1.2 and 1.3, clarify the purposes and structure of a multilingual thesaurus and the practical operations that an end user carries out when using this tool.
A thesaurus, in general terms, can be defined as a classification tool that helps libraries, archives or other documentation centres to manage their records and other information. This tool is designed to help users identify preferred (or authorized) terms for classifying and titling records and to provide a range of paths to reach these terms. The thesaurus also supports strategies for retrieving documents and reduces the probability of an unsuccessful search, or of one that produces a confusing or irrelevant outcome. This functionality is achieved by establishing paths between terms. The rest of this chapter is dedicated to this topic.
The establishment and development of a thesaurus generally follows the standards of ISO (International Organization for Standardization), which are officially recognised at international level [10]. The guidelines for the construction and management of a multilingual thesaurus, ISO 5964:1985, cover the following contents: definitions, abbreviations and symbols, vocabulary control, establishment of a multilingual thesaurus (general problems, management decisions, language problems), establishment of equivalent terms in different languages, other language problems, relationships between terms, display of terms and relationships, form and contents of a multilingual thesaurus, organization of work. The definition of thesaurus supplied by these guidelines is the following: a thesaurus is the vocabulary of a controlled indexing language, formally organized so that the a-priori relationships between concepts (for example as "broader" and "narrower") are made explicit. Alongside this we find another definition, specific to the multilingual thesaurus: a thesaurus containing terms selected from more than one natural language. It displays not only the interrelationships between terms, but also equivalent terms in each of the languages covered.
Alongside these international standards we find various national standards that offer other guidelines for the construction of a thesaurus [11]. The differences between these national standards generally relate to aspects peculiar to a language (e.g. its semantic structure) and consequently reflect different choices in displaying the thesaurus (e.g. using the singular or the plural form of a term). Each of these national standards provides its own definition of thesaurus. For instance, the American National Standards Institute standard ANSI Z39.19 characterizes a thesaurus as a controlled vocabulary arranged in a known order in which equivalence, homographic, hierarchical, and associative relationships among terms are clearly displayed and identified by standardized relationship indicators, which must be employed reciprocally. The Australian Standard for Records Management defines a thesaurus as an alphabetical presentation of a controlled list of terms linked together by semantic, hierarchical, associative or equivalence relationships. Such a tool acts as a guide to allocating classification terms to individual records. The Association Française de Normalisation describes the thesaurus as a controlled and dynamic vocabulary of terms (descriptors and non-descriptors), obeying its own terminological rules and linked to one another by semantic relationships (in the original: vocabulaire contrôlé et dynamique de termes (descripteurs et non-descripteurs), obéissant à des règles terminologiques propres et reliés entre eux par des relations sémantiques). And so on.
It is now important to focus on one definition of thesaurus, the one provided by the international standards, and to analyse more closely the concepts it contains. A thesaurus is, first of all, "the vocabulary of a controlled indexing language". This statement requires an understanding of: 1- what it means to index in the field of library science, and 2- what an indexing language is.
The first point can be explained by bearing in mind the material (documents or resources) contained in a library or an archive. The substantial number of items that make up, for instance, a library needs to be organized. It is therefore essential to create indexes that follow precise criteria (e.g. author, subject, shelf location) and that provide synthetic representations of the documents. Catalogues represent the indexing of documents that actually exist. The indexing of a library can be descriptive (taking into account the form of the existing documents, i.e. author/title/pages) or semantic (taking into account the contents of documents). Semantic indexing, properly called subject indexing, can be carried out by terms (as in the case of a thesaurus), by subjects, or by classes (using the Dewey Decimal Classification, the Universal Decimal Classification, the Library of Congress Classification or the BC2 Bliss Classification). The process of indexing documents requires a prior definition of preferred terms, which constitute a controlled vocabulary. An indexing language is used for the representation of concepts dealt with in documents and for the retrieval of such documents from an information storage and retrieval system.
A thesaurus is "formally organized so that the a-priori relationships between concepts (for example as "broader" and "narrower") are made explicit". In the following section of this article we will explain the structure of a thesaurus in detail. By way of introduction, we can say that thesauri are lists of terms that define single concepts. The terms are connected by cross-references that define a syndetic structure internal to the thesaurus. Cross-references are of three types: (a) equivalence relationship, beginning with the word see or USE, which leads to one or more descriptors that are to be used instead of the term from which the cross-reference is made; (b) associative relationship, beginning with the words see also, or related term (RT), which leads from one descriptor to other descriptors that are related to or associated with it in the context of a thesaurus; (c) hierarchical relationship, beginning with the words broader term (BT) or narrower term (NT), which represent generic and specific relationships, respectively. It is also possible to display hierarchical relationships without using cross-references.
Concerning the definition of a multilingual thesaurus, a thesaurus containing terms selected from more than one natural language, we must underline a few points. The first relates to the meaning of "natural language": the expression refers to a language used by human beings for verbal communication. The thesaurus is used world-wide, hence it is necessary to translate this tool into as many languages as possible in order to make it easier for users to search information sources in their own language. All the languages that make up a multilingual thesaurus have equal status: each descriptor in one language necessarily matches a descriptor in each of the other languages. In this way, the thesaurus displays not only the interrelationships between terms, but also equivalent terms in each of the languages covered. What is important to underline here is that constructing a multilingual thesaurus always implies the choice of a starting language. This decision has several implications for the "semantic field" covered by the thesaurus. In this article we cannot examine in depth the problems of translatability between different languages; nevertheless, we can stress one point. Indexing operations always imply a standardization of the forms and contents of the existing documents. We can therefore hypothesize that the choice of a starting language is of fundamental importance in defining the choice of preferred terms and, furthermore, determines the semantic reference for all the languages that make up a thesaurus. In this sense, the need to establish similarities between different languages can lead to an impoverishment of the semantic constellation peculiar to each language.
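The equal-status requirement described above (each descriptor in one language matches a descriptor in every other language) can be illustrated with a minimal Python sketch. The concept identifiers, language codes and terms below are invented for illustration and are not taken from any of the thesauri discussed; the function name is likewise hypothetical.

```python
# Hypothetical multilingual records: one entry per concept,
# one descriptor per language. All data here is illustrative.
LANGUAGES = ("en", "fr", "es")

concepts = {
    "c1": {"en": "development", "fr": "développement", "es": "desarrollo"},
    "c2": {"en": "rural areas", "fr": "zones rurales"},  # no Spanish descriptor
}

def missing_equivalents(concepts, languages):
    """Return {concept_id: [languages lacking a descriptor]} for incomplete concepts."""
    gaps = {}
    for concept_id, labels in concepts.items():
        absent = [lang for lang in languages if lang not in labels]
        if absent:
            gaps[concept_id] = absent
    return gaps

print(missing_equivalents(concepts, LANGUAGES))
```

A check of this kind only verifies that an equivalent exists in each language; it says nothing about whether the equivalents cover the same semantic field, which is the problem raised in the paragraph above.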
Purposes
We have already anticipated, in the introduction to this paper, the main purposes served by a thesaurus. To summarize, we can now state that this tool provides at least four functions:
Structure
A thesaurus displays through its structure, properly called syndetic structure, the cross-references among terms. The cross-references are of three types: 1- equivalence (synonymous) relationship, 2- associative relationship, 3- hierarchical relationship.
It is also possible to consider the Microthesaurus relationship as another relationship among descriptors. This section provides a general explanation of these types of relations.
The equivalence relationship
The equivalence relationship [12] (also called synonymous or preferential) concerns the relations among terms that are considered equivalent inside the thesaurus, i.e. that represent the same concept. When two or more terms express the same concept, one of them is selected as the preferred term, i.e. the descriptor. The descriptor in effect substitutes for other terms expressing equivalent or nearly equivalent concepts. A cross-reference to the descriptor should be made from any synonym or quasi-synonym that may function as an entry term for the user.
The equivalence relationship between descriptors and non-descriptors is expressed by the following conventions:
U or USE, which leads from a non-preferred (entry) term to the descriptor, and its reciprocal UF or USED FOR, which records the entry terms leading to the descriptor.
The equivalence relationship covers three basic types of terms:
a- synonyms, b- lexical variants, c- quasi-synonyms.
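As a minimal sketch of the USE/UF convention, assuming the equivalence relationship is stored as a simple mapping from entry terms to descriptors, the reciprocal UF lists can be derived from the USE references. The terms and the function names `used_for` and `resolve` are hypothetical and chosen only for illustration.

```python
# Hypothetical USE references: non-preferred entry term -> descriptor.
use = {
    "Third World": "developing countries",
    "developing nations": "developing countries",
}

def used_for(use_refs):
    """Derive the reciprocal UF lists (descriptor -> sorted entry terms)."""
    uf = {}
    for entry_term, descriptor in use_refs.items():
        uf.setdefault(descriptor, []).append(entry_term)
    return {descriptor: sorted(terms) for descriptor, terms in uf.items()}

def resolve(term, use_refs):
    """Follow a USE reference if one exists; otherwise keep the term as given."""
    return use_refs.get(term, term)

print(resolve("Third World", use))
print(used_for(use))
```

Deriving UF from USE, rather than maintaining both by hand, is one way to keep the two indicators reciprocal.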
The hierarchical relationship
This basic relationship[13] is the primary feature that distinguishes a systematic thesaurus from an unstructured list of terms, such as a glossary. It is based on a ranking from a superior to a subordinate position. The terms that are superior in a category are at a superordinate level and terms that fall under or below a category are at a subordinate level. The broader term represents a class or a whole, while the narrower term refers to components or parts of the broader concept.
The following indicators show the hierarchical relationship between descriptors:
BT (Broader Term), a label for the superordinate descriptor (i.e. it leads from a specific descriptor to a more generic descriptor).
NT (Narrower Term), a label for the subordinate descriptor (i.e. it leads from a generic descriptor to a more specific descriptor).
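The reciprocity of BT and NT can be sketched in Python as follows, assuming each descriptor has at most one broader term and the hierarchy contains no cycles. The sample terms and the function names are illustrative assumptions, not part of any standard.

```python
# Hypothetical BT links: descriptor -> its broader term.
# Top terms simply have no entry.
broader = {
    "greyhounds": "dogs",
    "dogs": "mammals",
    "cats": "mammals",
}

def narrower(bt_links):
    """Derive the reciprocal NT lists (broader term -> sorted narrower terms)."""
    nt = {}
    for term, bt in bt_links.items():
        nt.setdefault(bt, []).append(term)
    return {bt: sorted(terms) for bt, terms in nt.items()}

def broader_chain(term, bt_links):
    """List the broader terms from a descriptor up to its top term (assumes no cycles)."""
    chain = []
    while term in bt_links:
        term = bt_links[term]
        chain.append(term)
    return chain

print(narrower(broader))
print(broader_chain("greyhounds", broader))
```

Real thesauri may allow a descriptor to have several broader terms (polyhierarchy); the single-parent mapping here is a simplification.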
The associative relationship
An associative relationship[14] is established to indicate that a term has similarities with other concepts. A related term relationship alerts users to the fact that other information of interest may be classified under a different, but related, set of terms. Related terms relationships are only established between preferred terms.
The associative relationship is not hierarchical but symmetrical; by thesaurus convention, terms linked by this kind of relationship are at the same hierarchical level. This relationship is generally indicated by the abbreviation RT (related term).
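Because the associative relationship is symmetrical, an RT link recorded in one direction should also appear in the other. A minimal Python sketch of this constraint, with invented terms and hypothetical function names, might look like this:

```python
from collections import defaultdict

# Hypothetical RT store: descriptor -> set of related descriptors.
related = defaultdict(set)

def add_related(rt, a, b):
    """Record an RT link in both directions so that it stays symmetrical."""
    rt[a].add(b)
    rt[b].add(a)

def is_symmetric(rt):
    """Check that every RT link a -> b has its reciprocal b -> a."""
    return all(a in rt.get(b, ()) for a, targets in rt.items() for b in targets)

add_related(related, "development", "development aid")
print(is_symmetric(related))
```

The check uses `rt.get` rather than indexing so that it works on plain dictionaries and does not create empty entries as a side effect.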
The microthesaurus relationship[15]
A microthesaurus is a subset of a thesaurus covering a limited range of topics within the domain of the thesaurus. A microthesaurus may contain highly specialized descriptors that are not in the broad thesaurus. The descriptors are accompanied by a reference to a microthesaurus, introduced by the abbreviation MT, to show to which microthesaurus or microthesauri they belong. Such descriptors should map to the hierarchical structure of the broad thesaurus. A microthesaurus is internally consistent with respect to relationships among terms.
In the following tables, extracted from the Australian Standard for Records Management, AS ISO 15489 – 2002, we summarize and explain the three most important relationships that are established between terms in a thesaurus.
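As a rough illustration of the MT reference, assuming each descriptor carries a list of the microthesauri it belongs to, the members of each microthesaurus can be collected as below. The descriptors, microthesaurus names and the function name are invented for the example.

```python
# Hypothetical MT references: descriptor -> microthesauri it belongs to.
mt_refs = {
    "development aid": ["International cooperation"],
    "rural development": ["Agriculture", "Economic policy"],
    "land reform": ["Agriculture"],
}

def microthesaurus_members(refs):
    """Group descriptors by microthesaurus (name -> sorted descriptors)."""
    members = {}
    for descriptor, microthesauri in refs.items():
        for name in microthesauri:
            members.setdefault(name, []).append(descriptor)
    return {name: sorted(terms) for name, terms in members.items()}

print(microthesaurus_members(mt_refs))
```

A descriptor listed under several microthesauri, as "rural development" is here, appears in each of the corresponding groups.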
Table 1- Relationships established in a thesaurus
1. The thesaurus as a particular form of Knowledge Organisation System
1.1. Definition of thesaurus
1.2. Purposes and structure of a multilingual thesaurus