Australian Library and Information Association

Volume 35 Nº 2, June 2004

Australian Academic & Research Libraries

Natural language thesaurus: a survey of student research skills and research tool preferences

Victoria Redfern

Abstract This paper reports the results of a University of Canberra Library survey of student research knowledge, skills, tools and resources. Students are experiencing difficulties interrogating databases, the internet and library catalogues because of the lack of consistency in terminology and various methods of interrogation. This research was an investigation of the need for an online contemporary natural language thesaurus. This report provides evidence that students interviewed in this survey support the idea and it is recommended that further research on a semantic contemporary natural language thesaurus be conducted and focused on higher education in Australia.

Over the past ten years there has been an increase in research tools such as databases, online journals and the internet. Students experience difficulties using databases, library catalogues, online journals and the internet because of the lack of consistency in terminology, the difficulty of recognising appropriate search terms and the various methods of interrogation. This makes research inefficient and time-wasting. In order to find better ways of helping students, the University of Canberra Library conducted a pilot study to investigate whether an online contemporary natural language thesaurus would assist students with their research. 'Natural language' is 'any naturally evolved language as opposed to artificial languages constructed.' [1] Accordingly, in order to evaluate the need for the contemporary natural language thesaurus, it was first necessary to evaluate student knowledge of library and database terminologies and research tool usage and skill.

Student research strategies

The keys to successful research are effective research strategies and a willingness to learn to explore all available possibilities in order to obtain reliable and comprehensive results.

Before commencing research it is essential for students to understand the objective, decide what subject and search headings cover the topic and to use a variety of resources effectively. [2] Accordingly, to find material in electronic databases, online journals, library catalogues and the internet, it is imperative to formulate appropriate search terms using key words and phrases. Controlled vocabulary searching and free-text searching are different methods of interrogation. A controlled vocabulary search interrogates a database by using words that exist in the built-in taxonomy of that database. A free-text search interrogates a database by using any word to find incidences of that word occurring in the database.
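The difference between the two methods can be illustrated with a minimal sketch. The records, subject terms and function names below are hypothetical and do not describe any particular database; the sketch only assumes that a controlled vocabulary search accepts terms from a fixed list, while a free-text search matches any word or phrase that occurs in the record.

```python
# Hypothetical mini-database: each record has free text and indexer-assigned subject terms.
RECORDS = [
    {"id": 1, "text": "Airlines report longer flight delay times in winter",
     "subjects": {"air travel"}},
    {"id": 2, "text": "Managing household debt after graduation",
     "subjects": {"debt management"}},
    {"id": 3, "text": "Text messages and student study habits",
     "subjects": {"text messages"}},
]

# The controlled vocabulary is the fixed set of terms indexers may assign (assumed here).
CONTROLLED_VOCABULARY = {"air travel", "debt management", "text messages"}


def controlled_search(term):
    """Return ids of records indexed under `term`; reject terms outside the taxonomy."""
    term = term.lower()
    if term not in CONTROLLED_VOCABULARY:
        raise KeyError(f"'{term}' is not in the controlled vocabulary")
    return [r["id"] for r in RECORDS if term in r["subjects"]]


def free_text_search(phrase):
    """Return ids of records whose text contains `phrase` anywhere."""
    phrase = phrase.lower()
    return [r["id"] for r in RECORDS if phrase in r["text"].lower()]
```

In this sketch the everyday phrase 'flight delay' finds record 1 through free-text searching but is rejected by the controlled search, because the record is indexed under a different term. Conversely, 'SMS' finds nothing by free-text searching because record 3 uses the words 'text messages'.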

Research technologies such as natural language thesauri and semantic searching have recently been under development. A natural language thesaurus interrogates on terms drawn from commonly accepted speech vocabularies. Semantic searching interrogates on semantically linked terms to find associated materials on the internet as well as in databases and other electronically authored resources.

The Library of Congress, database developers, and web page authors use different terminology. Consequently, students have to become conversant with different interrogation methods such as controlled vocabulary searching or free-text searching and the varying terminologies of research tools and databases.

Defining the research question, establishing search terms, definitions and descriptors and using these terms for interrogating the library catalogue, electronic databases, online journals and the internet is complicated by the differences in the thesauri, truncation, Boolean operators and wild cards as well as subject headings. Therefore, to research one topic researchers have to try different thesaurus terms and keywords according to the design of the database or library catalogue. Consequently, because of the differences between databases and their content, the same search term will produce varying outcomes.

Feitzer [3] says that traditional description and indexing is stretched to the extent that the concept of the 'main entry' has been replaced by a multitude of contributors such as database developers and authors who share responsibility for information, and this in turn results in inconsistencies in terminology and in the interrogation techniques required. Therefore, for consistency, a semantic webbed thesaurus using natural language is needed to enable interrogation across boundaries previously constructed by cataloguers, database developers and web page authors. This would help researchers by providing a simpler interrogation tool, enhance research methodology and, consequently, allow the required literature and information to be obtained quickly. In order to evaluate whether further research should be undertaken, it was essential to conduct a survey of current student research skills and the tools used.

A similar study to evaluate student research skills and whether the internet was found to be beneficial or frustrating was conducted by Lindsay and McLaren [4] at the University of Strathclyde in Scotland. The survey focused on 35 participants who were studying for their Bachelor of Education in Design and Technology.

Before the University of Canberra Library survey, two major problems that students had experienced were identified. These were the inconsistency of library and database terminology, and the differing interrogation techniques required by the design of library online catalogues, electronic databases and the internet.

Language, speech, terminology and ontology are constantly evolving. Database thesauri, research descriptors and Library of Congress classifications and subject headings are constantly added to, altered and deleted according to language and vocabulary accepted as natural language and being in common use. Consequently, because of the differences between Library of Congress subject headings, database designs, search terms, keywords and interrogation methodologies, there is a lack of consistency.

An example of inconsistency is shown in research term variations. The example is the contemporary natural language term, 'flight delay'. 'Flight delay' is not a Library of Congress subject heading. However, Academic Search Elite refers the user to 'avigation easements' whilst also having a spelling error. Additionally, acronyms such as 'SARS' are accepted as contemporary natural language but alterations and additions to the Library of Congress subject headings can take a long time to appear in directories.

Table 1
Research term variations

Research term | Library of Congress classifications 'subject headings' | Academic Search Elite 'subject terms'
Flight delay | no | See avigation (sic) easements
Day dreaming | no | Use fantasy
Debt management | no | yes
Fire back-drafts | no | no
Flexible workplace practices | no | no
SARS | no | yes
SMS | no | Use text messages

There are research terms such as 'fire back-drafts' and 'flexible workplace practices' in existence as contemporary natural language that are not classified in the Library of Congress subject headings or as subject terms in databases such as Academic Search Elite.
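The Academic Search Elite column of Table 1 can be read as a small lookup table from everyday terms to the terms the database expects. The sketch below is illustrative only: the dictionary is transcribed from Table 1, while the function name `resolve` and the encoding of 'yes' and 'no' as `True` and `None` are assumptions made for the example, not part of any database's interface.

```python
# Entries transcribed from Table 1 (Academic Search Elite column).
# A string is the term the database redirects to, True means the everyday term
# is itself a subject term, and None means the database has no entry for it.
ACADEMIC_SEARCH_ELITE = {
    "flight delay": "avigation easements",
    "day dreaming": "fantasy",
    "debt management": True,
    "fire back-drafts": None,
    "flexible workplace practices": None,
    "sars": True,
    "sms": "text messages",
}


def resolve(term, thesaurus=ACADEMIC_SEARCH_ELITE):
    """Map an everyday term to the database's subject term, or None if there is none."""
    key = term.strip().lower()
    entry = thesaurus.get(key)
    if entry is True:
        return key      # the everyday term is already a subject term
    return entry        # a redirect target, or None when nothing matches
```

A contemporary natural language thesaurus of the kind discussed here would hold one such mapping per database or catalogue, so that a student could enter 'SMS' once and have it translated to 'text messages' where required.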

Besides typographical errors such as 'avigation', other examples of differences are: Ingenta links to Merriam Webster Collegiate Dictionary for keywords; Expanded Academic has no subject heading list but matches words in articles (free-text); and some databases have Boolean operators while others operate on wild cards.

Natural language thesaurus

The Library of Congress designs its data elements in a standardised subject heading format. Database developers such as Expanded Academic design their elements to be interrogated by free-text querying. Field [5] supported natural language research results because adult educators are paying more attention to those students who are from varying ethnic backgrounds and those for whom English is not their natural tongue. Field sees tools such as the Online Thesaurus of Natural Language as one way of helping these students through new forms of communication media.

Knapp, Cohen and Juedes [6] conducted research on the use of natural language interrogation to determine the potential value to the humanities and social sciences. This research developed because the humanities cross many subjects and many synonyms may be used to cover one concept. They found that even though databases may be interrogated using controlled vocabularies and free-text, there was significant evidence that free-text produced unsuccessful search results because of 'the inability of the searcher to think of all the terms an author may have used'. Knapp, Cohen and Juedes [7] also found that the use of natural language querying did increase the number of relevant citations and this included databases with controlled vocabularies. Additionally, they add that as more full-text databases have become available and there is increased growth in the internet, natural language searching has become 'increasingly vital to ensure retrieval of relevant information'. This being so, to develop a contemporary natural language thesaurus that covers all academic disciplines would be time and resource expensive. Naturally, it would be difficult to develop the thesaurus to cover all resources. However, with the increasing use of digital tools, by concentrating only on electronic formats and constructing the thesaurus one discipline at a time, such as the humanities and social sciences initiative of Knapp, Cohen and Juedes, [8] it may be possible to develop an online contemporary natural language thesaurus.

Methodology

Academic research not only enhances and promotes the knowledge base of available information, it also provides opportunities to learn and develop research methodologies, skills and excellence. The University of Canberra Library survey was designed firstly to evaluate student knowledge and skills in using different search terms, keywords, descriptors and subject headings for the library catalogue and databases. Secondly, it was to evaluate student use of research tools and research methodologies. The final and major question of the survey was to establish whether students thought that a contemporary natural language thesaurus, if available and linked to library catalogues, databases and web pages, would assist their research, and why.

Before the survey commenced, the following limitations were noted: the survey was not expected to represent all students studying at the university as there are a number who are studying by distance education, flexible delivery and online via WebCT and additionally, some students do not attend the library.

The purpose of the University of Canberra College is to prepare students for university studies. Their students were excluded as there was the assumption that they were not yet studying at the same level as undergraduate, postgraduate or masters level students and therefore may not represent the overall cross section of library users. It was expected that some students would not be interested in the survey. It was hoped that the survey would obtain a sample representative of cultural background, gender and age, and from both undergraduate and postgraduate respondents.

The survey was held in the University of Canberra Library foyer during weeks 7 and 10 of Semester one of 2004. The study was both quantitative and qualitative and data were collected through questioning and through numerical sequencing.

The interview used ethnographic semi-structured questions and questions that utilised numeric ranking. This was to better understand student knowledge, to extrapolate further information and to allow participants to add any additional comment or opinion. Participants approached were of any nationality, gender or age. The survey was conducted during weekdays, weekends and evenings in order to gain a representative sample. Participants interviewed were undergraduate, postgraduate or masters level students. Twenty students were interviewed. Initially it was intended that at least 35 participants would be interviewed but because a consistent and definite pattern was found it was decided the number would be reduced to 20. Each interview was planned for ten minutes but this was flexible and often increased for participants who wanted to provide additional input for the contemporary natural language thesaurus question.

The survey instruments were in four sections: student demographics, research terminology skills, research tool preferences and online thesaurus of natural language search terms.

Student demographics

Student demographic questions were designed to establish the student level of experience. There were four questions. The first was to confirm that they were enrolled at the University of Canberra, the second was their full time or part time status, and the third and fourth were to establish how long they had studied at UC and to establish past research experience.

Result Although the participants were randomly selected, fewer undergraduate students were represented than had been expected. Those who were included and in their first semester at the University of Canberra had previous experience at CIT, technical college, and the University of Canberra College or had previously dropped out during their first undergraduate semester at another university.

Research terminology skills

Research terminology skill questions were designed to establish student understanding of basic library and database terminology as well as gaining information on their research methodology.

In order to ease the participant into the interview three questions were asked. These questions were: 'How do you go about starting an assignment?' 'Why do you do it this way?' and, 'What have you learned by doing it this way?' These three questions provided the primary answers that they first look for keywords and phrases and then follow on with searching on the internet and conferring with fellow students. Because participants being interviewed were now thinking about search terms, this was used as an introduction to the next set of questions on research terminology skills.

Research terminology skills comprised four questions that required the participant to correctly identify the terms 'search term', 'subject heading', 'descriptor' and 'keyword'.

'Search term' was defined as 'a word or phrase input by the user to find those records on the database that contain that term'; [9] 'subject heading' was defined as 'the word or group of words under which books and other material on a subject are entered in a catalogue in which the entries are arranged in alphabetical order'; [10] 'descriptor' was defined as 'an elementary term used to identify a subject'; [11] and 'keyword' [12] was defined as 'grammatical element which conveys the significant meaning in a document'. Although these terms were used as a basis to define meaning, the participants were not expected to produce these definitions as their answer. The intention of the question was to gain an indication of whether they knew where the term is used and how they can relate that term to their research.

The percentage figures when averaged mean that, of the total 20 participants, 52 per cent understood the terms and 48 per cent did not understand the terms.

Table 2
Student knowledge of terminology

Term | Did not know term | Unsure | Vaguely sure | Pretty sure | Knew term
(each cell shows number/per cent)
Search term | 8/40 | 4/20 | 2/10 | 2/10 | 4/20
Subject heading | 4/20 | 2/10 | 3/15 | 5/25 | 6/30
Descriptor | 16/80 | 1/5 | 1/5 | 2/10 | 0/0
Keyword | 0/0 | 0/0 | 1/5 | 2/10 | 17/85

Note: The percentages in each row are percentages of all 20 participants, so each row totals 100 per cent.
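Because every row of Table 2 covers the same 20 participants, the counts and percentages can be cross-checked against each other. The sketch below is only an arithmetic check, assuming the percentages as printed in Table 2 and a sample of 20; the variable and function names are illustrative.

```python
N = 20  # participants interviewed

# Percentages as printed in Table 2, in column order:
# did not know, unsure, vaguely sure, pretty sure, knew term.
TABLE_2_PER_CENT = {
    "search term": [40, 20, 10, 10, 20],
    "subject heading": [20, 10, 15, 25, 30],
    "descriptor": [80, 5, 5, 10, 0],
    "keyword": [0, 0, 5, 10, 85],
}


def counts_from_per_cent(row, n=N):
    """Convert a row of percentages into participant counts."""
    return [round(p * n / 100) for p in row]


TABLE_2_COUNTS = {term: counts_from_per_cent(row)
                  for term, row in TABLE_2_PER_CENT.items()}

# Each row should account for every participant exactly once.
ROW_TOTALS = {term: sum(counts) for term, counts in TABLE_2_COUNTS.items()}
```

Under these assumptions every row totals 20 participants, and 85 per cent in the 'keyword' row corresponds to 17 participants who knew the term.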

Locating search terms

Locating search terms consisted of a list of twelve items from which the participant selected those they use. Participants were asked to nominate the tool or tools that they have used for locating search terms, and the question was put in the same way for each tool. The question for print format tools was: 'Do you look in a thesaurus on the library shelves or somewhere else to find your search terms?' The question for digital format tools was: 'Do you use the electronic databases on computers such as here in the library, at home or work to find your search terms?' The answers were noted on a survey sheet by the interviewer.

The three tools most frequently mentioned by participants for finding search terms are:

  • Internet (Google or similar web search tool): 70 per cent
  • Ask other students: 60 per cent
  • Electronic databases: 50 per cent

Equal fourth are:

  • Dictionary of research terms: 45 per cent
  • Online journals and serials: 45 per cent
  • Browse the library shelves: 45 per cent

Additionally, in ascending order, the six least used for finding search terms are:

  • Newspapers: 1 student
  • Printed journals: 1 student
  • Television documentaries: 1 student
  • Index of text books: 3 students
  • Directory of research terms: 3 students
  • Thesaurus: 3 students

Table 3
Locating search terms

Tool | Total n=20 (number/per cent)
Thesaurus | 3/15
Directory of research terms | 3/15
Dictionary of research terms | 9/45
Electronic databases | 10/50
Online journals and serials | 9/45
Online library catalogue | 5/25
Browse the library shelves for ideas | 9/45
Internet (Google or similar web search tool) | 14/70
Other university web sites | 6/30
Ask library staff | 8/40
Ask lecturer/tutor | 9/45
Ask other students | 12/60
Newspapers | 1/5
Index of text books | 2/10
Printed journals | 1/5
Television documentaries | 1/5

Note: The tools are listed in the order in which they were put to participants during the interview.

Research tool preferences

Research tool preferences investigated what research tools students use.

The analysis showed that all of the participants surveyed begin their research in the same way. After looking for keywords or phrases in their research question they then refer to their prescribed texts and other books for related materials on their topic. They then search the library catalogue, databases and the internet and gather the materials together. This was a consistent theme of all participants. A second recurring theme the participants revealed was that their main research tools are digitally based, such as databases and the internet. Additionally, before undertaking their research they consistently attempted to obtain textbooks in the library; however, because these were often not available, participants would then turn to digital resources.

The question was asked: 'What tools do you use most frequently for your research?' Participants were shown the research tool list and asked to place them in order of importance and advised they could add additional tools to the list.

Table 4
Research tool preferences

Tool | Importance order | Total respondents n=20 (per cent)
Internet | 1 | 65
Library books | 2 | 50
Online serials | 3 | 45
Electronic databases | 4 | 35
Others - Web CT | 5 | 10
Printed newspapers and journals | 6 | 5
Other libraries | 7 | 5

The tool most often used is the internet (65 per cent) and 'Other Libraries' came last (5 per cent).

Two participants who were studying law said they only accessed books in the library and the law databases. Two were studying landscape architecture and said they did not access databases because the majority of their work was visual. Conversely, the Lindsay and McLaren [13] study revealed that although their participants were studying design they still used the internet for benchmarking, market research and consumer appraisal. Therefore, the discipline being studied may not have a bearing on the tools that participants use because many academic disciplines cross over into other areas.

Some participants said that the major benefit of online tools such as databases was that information was up-to-date. Because it takes longer for books to be published, and there was no guarantee that libraries would add them to their collection, digitally available research findings were more readily available than library books. Conversely, some participants preferred to use books because they were more comprehensive than electronic materials, and they also perceived that to do research without books was non-academic. Two participants said that because they are tactile people, they learn better from books and find them easier to handle than computer keyboards. Overall, most participants felt that obtaining books is often difficult and that when they are available they are often out of date. One participant said that obtaining a pass grade would be difficult without tools such as databases, the library catalogue and the internet because these provide a broader knowledge base. A second participant preferred the internet because of its flexibility and greater amount of information. A third participant preferred the internet and databases for their 'time saving and resource convenience'.

Table 5
Hours per week spent using research tools

Tool | Total users | Total hours
In the library using books and journals | 18 | 91 hours
Electronic databases | 16 | 37 hours
Internet | 19 | 200 hours

Two students did not use the library at all, four did not use electronic databases and one did not use the internet.
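The totals in Table 5 are easier to compare when expressed per user. The following sketch is an arithmetic aid only, assuming that each 'Total hours' figure is the combined weekly total for the users in that row and that the sample is the 20 participants interviewed; the names used are illustrative.

```python
N = 20  # participants interviewed

# (total users, total hours per week) as printed in Table 5.
TABLE_5 = {
    "library books and journals": (18, 91),
    "electronic databases": (16, 37),
    "internet": (19, 200),
}


def mean_hours_per_user(users, hours):
    """Average weekly hours among the participants who use the tool."""
    return round(hours / users, 1)


def non_users(users, n=N):
    """Number of participants who do not use the tool at all."""
    return n - users


SUMMARY = {
    tool: {"mean_hours": mean_hours_per_user(users, hours),
           "non_users": non_users(users)}
    for tool, (users, hours) in TABLE_5.items()
}
```

On these assumptions the averages are about 5.1 hours per week for the library, 2.3 for electronic databases and 10.5 for the internet, and the non-user counts of two, four and one agree with the figures stated above.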

The online thesaurus of contemporary natural language search terms

The final question of the survey was to establish if students felt an online thesaurus of contemporary natural language search terms would assist their research. This question was designed to find out whether further research into this was warranted. The question was asked using terms that are listed neither in the Library of Congress subject headings [14] nor in The Contemporary Thesaurus of Search Terms and Synonyms. [15]

The concept of an online contemporary natural language thesaurus linked to research materials on the library web page was explained to each participant.

Participants were asked: 'If search terms were more like the words we use in everyday conversation, such as "day dreaming", "debt management", "fire back-drafts", "flexible workplace practices" and "flight delays", and these were available in an online contemporary natural language thesaurus, do you think this would make your research task easier, and why?'

Eighty per cent of the participants said 'yes' and 20 per cent said 'no'.

Of the four who said 'no', two said that if students study in an academic environment they must use academic language, as the rigour and discipline of academic research could otherwise be compromised. The other two said that the cost of establishing the thesaurus and putting it to use could not be justified. However, these same four also said that for students for whom English was their second language, a contemporary natural language thesaurus would be a good idea because it would not only assist them with finding search terms but would also help them with their English. Of the 80 per cent who said 'yes', 37 per cent offered the opinion that even though they may not find it useful themselves, it could benefit students whose English language skill is not strong. Eighty per cent of participants said that contemporary natural language was preferable to library, database and internet 'jargon'.

Discussion

The University of Canberra Library survey has produced some interesting results in all three sections of student demographics, research terminology skills and research tool preferences. The results of the final question about the online thesaurus of contemporary natural language search terms are surprising.

Student demographic results were unexpected in that there was a very strong representation of participants who had studied previously at tertiary level and a number of these were currently studying at postgraduate level. It is therefore of concern that given the sample and the amount of experience in research, a large number of participants observed during the interview appeared uncomfortable and also stated that they were embarrassed that they were unfamiliar with basic terminologies and had trouble establishing search terms. However, even though a number of participants did not know the library and database 'jargon', they still believed they knew how to commence research. Most participants said that frequently their search of databases, the library catalogue and the internet produced many results that were unrelated and useless for their studies.

Research terminology skills analysis has revealed some interesting statistics that display student lack of understanding of library and database search terms and it is surprising that knowledge of terminology was not higher. This suggests that library research skills training should focus more on terminology and teach students how to develop or find search terms before they learn how to interrogate databases, the internet and the library catalogue.

Interestingly, all participants used a variety of tools for locating search terms. It is also noted that one student nominated newspapers and a second student nominated television documentaries for locating search terms. Of further interest is that the usage of the traditional methodology of using Library of Congress subject headings and discipline-based directories to locate search terms was not popular.

Even though the majority of students use digital tools for locating search terms, when they do begin searching for materials their preferences turn firstly to the internet and then library books. Whilst students consider library books are often not available or out of date, they still see them as being more comprehensive than the electronic databases. The most interesting statistic is that 65 per cent of participants prefer to use the internet for their research. Do we really know if the internet provides students with the materials they need, and that they find it accurate, authentic and easy to use?

The amount of time students spend using the library, electronic databases and the internet is not important as long as that time is spent locating quality, accurate and authentic results. However, it is also possible that students are not finding the materials they need. This may be because of their interrogation techniques. Could this be because they are familiar with one search methodology rather than another, and therefore use only one database or search engine and limit their results? It is possible that they are using contemporary natural language for their interrogation, and after not obtaining successful results, they then revert to the internet where contemporary natural language is prominent.

The online thesaurus of contemporary natural language search terms question produced interesting results. The participants' opinions showed they were largely in favour of the idea, especially if it linked to databases and other resource materials. Eighty per cent of participants are in favour, and 20 per cent are partially in favour. Further research is needed to investigate the possible realisation of the contemporary natural language thesaurus as it would be highly beneficial not only to domestic students, but also to international students by enhancing academic skills and knowledge of the English language, which in turn will assist research methodology and lifelong learning.

Recommendations

There are three recommendations.

  • The first recommendation is that this survey should be conducted at the University of Canberra Library again in twelve months' time with additional questions pertaining to the use of the AARLIN portal and the interrogation methods that students employ in using databases and the internet.
  • The second recommendation is that an investigation should be conducted on upgrading library catalogues, databases and web pages within the context of the viability of implementing semantic web technologies.
  • The third recommendation is to conduct research on the educational impact of semantic web technology and investigate the possibility of developing a contemporary natural language thesaurus.

Programming that represents the semantics of documents within web applications will provide a means of intelligent research that will more closely parallel the 'natural language' research processes of humans. [16] Developing a contemporary natural language thesaurus that interrogates all databases, thus saving researchers having to use different interrogation methodologies in different databases, should promote efficiency and enhance search success.

Conclusion

This paper has reported the results of a University of Canberra Library survey of student research knowledge, skills, tools and resources. It has also investigated and evaluated usage patterns and abilities and has made recommendations for further investigations and research. There is sufficient evidence in the survey of the University of Canberra Library and the literature that higher education students need to have digital research tools and methodologies made more consistent and complementary. A semantic contemporary natural language thesaurus could assist in this.

This research has suggested that students lack knowledge of terminologies and of research resources and use tools that they are comfortable with in order to obtain information. Although the Lindsay and McLaren [17] research came to the same conclusion, their survey results also showed that even though participants valued the internet as a search facility, they also felt it was too easy for the researcher to be side-tracked. Therefore, with semantic technology used for academic research on the internet, researchers should obtain more focused results.

Finally, because the great majority of students interviewed in the University of Canberra Library survey supported the idea of an online contemporary natural language thesaurus, the implementation of a semantic contemporary natural language thesaurus linked to databases and other resource tools is worthy of further investigation.

Notes

  1. 'Oxford Dictionary' 2nd ed Oxford Clarendon Press
  2. S Irvine, 'Essential information skills in a busy library' School libraries in Canada, pp33-34, vol 22, no 4, 2003
  3. W Feitzer, 'Integrating metadata frameworks into library description' in C F Thomas (ed) Libraries, the internet and scholarship: tools and trends converging, New York, Marcel Dekker
  4. W Lindsay and S McLaren 'The internet: an aid to student research or a source of frustration?' Journal of educational media, pp115-128, vol 25, no 2, 2000
  5. J Field 'The adult learner as listener, viewer and cybersurfer' in J Field (ed) Electronic pathways: adult learning and the new communication technologies, Leicester, NIACE
  6. S D Knapp, L B Cohen, D R Juedes 'A natural language thesaurus for the humanities: the need for a database search aid', Library quarterly, pp406-430, vol 68, 1998
  7. Ibid
  8. Ibid
  9. 'Harrod's librarians' glossary and reference handbook', 9th ed, R Prytherch (comp) Aldershot Gower
  10. Ibid
  11. Ibid
  12. Ibid
  13. Lindsay and McLaren op cit
  14. 'Library of Congress subject headings', 26th ed, Washington Library of Congress
  15. 'The contemporary thesaurus of search terms and synonyms: a guide for natural language computer searching', 2nd ed, Phoenix Arizona, Oryx Press, 2000
  16. F Cervone 'W3C delivers standards for the semantic web', Information today, http://www.infotoday.com/newsbreaks/nb040216-1.shtml, referenced 4 May 2004
  17. Lindsay and McLaren op cit
