Search results
1 – 10 of over 8000
Koraljka Golub, Osma Suominen, Ahmed Taiye Mohammed, Harriet Aagaard and Olof Osterman
Abstract
Purpose
In order to estimate the value of semi-automated subject indexing in operative library catalogues, the study aimed to investigate five different automated implementations of an open source software package on a large set of Swedish union catalogue metadata records, with Dewey Decimal Classification (DDC) as the target classification system. It also aimed to contribute to the body of research on aboutness and related challenges in automated subject indexing and evaluation.
Design/methodology/approach
On a sample of over 230,000 records with close to 12,000 distinct DDC classes, the open source tool Annif, developed by the National Library of Finland, was applied in the following implementations: lexical algorithm, support vector classifier, fastText, Omikuji Bonsai and an ensemble approach combining the former four. A qualitative study involving two senior catalogue librarians and three students of library and information studies was also conducted to investigate the value and inter-rater agreement of automatically assigned classes, on a sample of 60 records.
Findings
The best results were achieved using the ensemble approach, which reached 66.82% accuracy on the three-digit DDC classification task. The qualitative study confirmed earlier studies reporting low inter-rater agreement but also pointed to the potential value of automatically assigned classes as additional access points in information retrieval.
Originality/value
The paper presents an extensive study of automated classification in an operative library catalogue, accompanied by a qualitative study of automated classes. It demonstrates the value of applying semi-automated indexing in operative information retrieval systems.
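The ensemble and the three-digit evaluation described in this abstract can be illustrated with a minimal Python sketch. It assumes the ensemble simply averages per-class scores across backends, which may differ from how the study configured it. The function names, the example scores and the DDC notations are hypothetical and are not taken from the paper or from Annif's API.

```python
from collections import defaultdict


def ensemble_scores(backend_scores):
    """Average per-class scores over several backends.

    backend_scores is a list of dicts mapping a DDC notation to a score.
    A class missing from a backend counts as a score of zero there.
    Plain averaging is an assumption made for illustration.
    """
    totals = defaultdict(float)
    for scores in backend_scores:
        for ddc_class, score in scores.items():
            totals[ddc_class] += score
    n = len(backend_scores)
    return {ddc_class: total / n for ddc_class, total in totals.items()}


def three_digit(ddc_class):
    """Truncate a DDC notation such as '839.738' to its three-digit class."""
    return ddc_class.split(".")[0][:3]


def three_digit_accuracy(predicted, gold):
    """Share of records whose predicted and gold classes agree at three digits."""
    hits = sum(three_digit(p) == three_digit(g) for p, g in zip(predicted, gold))
    return hits / len(gold)


# Hypothetical scores from two backends for a single record.
averaged = ensemble_scores([
    {"839.7": 0.6, "948.5": 0.2},
    {"839.7": 0.4, "948.5": 0.6},
])
best_class = max(averaged, key=averaged.get)  # class with the highest mean score

# Hypothetical predictions and gold classes for three records.
accuracy = three_digit_accuracy(
    ["839.7", "004.6", "320"],
    ["839.738", "005.1", "320.9"],
)
```

Evaluating at three digits credits a prediction that lands in the right broad class even when the full notation differs, which is why the reported figure applies to the three-digit task only.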
David Roberts and Clive Souter
Abstract
This article discusses the possibility of the automation of sophisticated subject indexing of medical journal articles. Approaches to subject descriptor assignment in information retrieval research are usually either based upon the manual descriptors in the database or generation of search parameters from the text of the article. The principles of the Medline indexing system are described, followed by a summary of a pilot project, based upon the Amed database. The results suggest that a more extended study, based upon Medline, should encompass various components: (1) extraction of ‘concept strings’ from titles and abstracts of records, based upon linguistic features characteristic of medical literature; (2) use of the Unified Medical Language System (UMLS) for identification of controlled vocabulary descriptors; and (3) coordination of descriptors, utilising features of the Medline indexing system. The emphasis should be on system manipulation of data, based upon input, available resources and specifically designed rules.
Judit Gárdos, Julia Egyed-Gergely, Anna Horváth, Balázs Pataki, Roza Vajda and András Micsik
Abstract
Purpose
The present study is about generating metadata to enhance thematic transparency and facilitate research on interview collections at the Research Documentation Centre, Centre for Social Sciences (TK KDK) in Budapest. It explores the use of artificial intelligence (AI) in producing, managing and processing social science data and its potential to generate useful metadata to describe the contents of such archives on a large scale.
Design/methodology/approach
The authors combined manual and automated/semi-automated methods of metadata development and curation. The authors developed a suitable domain-oriented taxonomy to classify a large text corpus of semi-structured interviews. To this end, the authors adapted the European Language Social Science Thesaurus (ELSST) to produce a concise, hierarchical structure of topics relevant in social sciences. The authors identified and tested the most promising natural language processing (NLP) tools supporting the Hungarian language. The results of manual and machine coding will be presented in a user interface.
Findings
The study describes how an international social scientific taxonomy can be adapted to a specific local setting and tailored to be used by automated NLP tools. The authors show the potential and limitations of existing and new NLP methods for thematic assignment. The current possibilities of multi-label classification in social scientific metadata assignment are discussed, i.e. the problem of automated selection of relevant labels from a large pool.
Originality/value
Interview materials have not yet been used for building manually annotated training datasets for automated indexing of scientifically relevant topics in a data repository. Comparing various automated-indexing methods, this study shows a possible implementation of a researcher tool supporting custom visualizations and the faceted search of interview collections.
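The multi-label selection problem mentioned in this abstract, choosing the relevant labels from a large pool, can be sketched as follows. This is only an illustration under stated assumptions: the score threshold, the label limit and the topic labels are hypothetical, and the sketch is not the method used by the authors.

```python
def select_labels(scores, threshold=0.5, limit=3):
    """Keep the highest-scoring labels that reach the threshold.

    scores maps a topic label to a model confidence between 0 and 1.
    Both threshold and limit are illustrative defaults, not values
    reported in the study.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [label for label, score in ranked if score >= threshold][:limit]


# Hypothetical confidences for one interview transcript.
interview_scores = {
    "family": 0.9,
    "migration": 0.7,
    "employment": 0.55,
    "education": 0.52,
    "health": 0.1,
}

selected = select_labels(interview_scores)
# selected == ["family", "migration", "employment"]
```

Combining a threshold with a cap is one common way to keep the number of assigned topics manageable when the label pool is large; the right values depend on the taxonomy and the collection.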
Koraljka Golub, Marianne Lykke and Douglas Tudhope
Abstract
Purpose
The purpose of this paper is to explore the potential of applying the Dewey Decimal Classification (DDC) as an established knowledge organization system (KOS) for enhancing social tagging, with the ultimate purpose of improving subject indexing and information retrieval.
Design/methodology/approach
Over 11,000 Intute metadata records in politics were used. In total, 28 politics students were each given four tasks, in which a total of 60 resources were tagged in two different configurations, one with uncontrolled social tags only and another with uncontrolled social tags as well as suggestions from a controlled vocabulary. The controlled vocabulary was DDC, which also included mappings from the Library of Congress Subject Headings.
Findings
The results demonstrate the importance of controlled vocabulary suggestions for indexing and retrieval: to help produce ideas of which tags to use, to make it easier to find focus for the tagging, to ensure consistency and to increase the number of access points in retrieval. The value and usefulness of the suggestions proved to be dependent on the quality of the suggestions, both as to conceptual relevance to the user and as to appropriateness of the terminology.
Originality/value
No research has investigated the enhancement of social tagging with suggestions from the DDC, an established KOS, in a user trial, comparing social tagging only and social tagging enhanced with the suggestions. This paper is a final reflection on all aspects of the study.
Abstract
This article reports the findings of a survey of Scottish university, central institution and college of education libraries, which assessed present and planned subject access to their catalogues and whether online catalogues are likely to improve subject access. The results are analysed and the findings discussed in relation to published studies of subject access in online catalogues. It is concluded that greater attention needs to be paid to subject access both by librarians in specifying automated systems and by system suppliers in responding to specifications.
Abstract
States that as use of networks becomes more innovative and widespread in higher education libraries, current approaches to the organization of network‐accessible resources reveal flaws. Moving forward from the recommendations of the Follett Report, and adopting an approach which seeks to redefine conceptually conventional practices and standards, the study examines, from a technical services perspective, issues and approaches relating to the development of existing cataloguing rules and practices, and machine‐readable standards, and proposes these standards as the most effective means of enhancing accessibility to electronic resources. Characterizes the current period as one of organizational, technological and conceptual transition, and addresses the broader issue of academic network‐accessibility in the local, regional, national and international context. Additionally, identifies the challenges to and implications for conventional, and future, technical services operations of these trends.
Abstract
VINE is a Very Informal NEwsletter produced three or four times a year by the Information Officer for Library Automation and financed by the British Library Research and Development Department. It is issued free of charge on request to interested librarians, systems staff and library college lecturers. VINE's objective is to provide an up‐to‐date picture of work being done in U.K. library automation projects which has not been reported elsewhere.
Abstract
This article describes the design, use and evolution over four years of INFO, a Cardbox‐plus based file containing bibliographic and other data relating to the computer and telecommunications industries. INFO was designed and implemented by the author, who provides an information service to the UK‐based consultancy PA Computers and Telecommunications Ltd. What began as a listing of articles in current periodicals is now an index to potentially useful sources of information. These sources may be:
Abstract
Purpose
To provide an integrated perspective to similarities and differences between approaches to automated classification in different research communities (machine learning, information retrieval and library science), and point to problems with the approaches and automated classification as such.
Design/methodology/approach
A range of works dealing with automated classification of full‐text web documents are discussed. Explorations of individual approaches are given in the following sections: special features (description, differences, evaluation), application and characteristics of web pages.
Findings
Provides major similarities and differences between the three approaches: document pre‐processing and utilization of web‐specific document characteristics are common to all the approaches; major differences are in applied algorithms, employment or not of the vector space model and of controlled vocabularies. Problems of automated classification are recognized.
Research limitations/implications
The paper does not attempt to provide an exhaustive bibliography of related resources.
Practical implications
As an integrated overview of approaches from different research communities with application examples, it is very useful for students in library and information science and computer science, as well as for practitioners. Researchers from one community have the information on how similar tasks are conducted in different communities.
Originality/value
To the author's knowledge, no review paper on automated text classification has attempted to discuss more than one community's approach from an integrated perspective.
Abstract
This article discusses some of the forces at work within today's online database environment which could lead to the emergence of a distributed information network. Several important modules in such a network are identified, including an automated subject switching module. Switching options include: exact matching, equivalency matching, and word‐ and phrase‐stem matching. Research investigations, critical issues, and preliminary findings with regard to switching options and strategies are reported. It is anticipated that one of the primary benefits from automated subject switching will be much greater utilization of the online STI resource.
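The three switching options named in this abstract can be sketched in Python as a simple fallback chain. This is a rough illustration under stated assumptions: the order in which the options are tried, the crude suffix-stripping stemmer, the vocabulary terms and the equivalence table are all hypothetical and do not come from the article.

```python
def crude_stem(word):
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_phrase(term):
    """Lowercase a term and stem each of its words."""
    return " ".join(crude_stem(word) for word in term.lower().split())


def switch_term(term, target_vocab, equivalents):
    """Map a source term to a target vocabulary term.

    Tries exact matching first, then a table of equivalent terms,
    then word- and phrase-stem matching. Returns the matched target
    term and the option that produced it, or (None, "none").
    """
    key = term.lower()
    exact = {target.lower(): target for target in target_vocab}
    if key in exact:
        return exact[key], "exact"
    if key in equivalents:
        return equivalents[key], "equivalency"
    stems = {stem_phrase(target): target for target in target_vocab}
    stemmed = stem_phrase(term)
    if stemmed in stems:
        return stems[stemmed], "stem"
    return None, "none"


# Hypothetical target vocabulary and equivalence table.
target_vocab = ["Computers", "Heart attack", "Information retrieval"]
equivalents = {"myocardial infarction": "Heart attack"}

print(switch_term("computers", target_vocab, equivalents))
print(switch_term("Myocardial infarction", target_vocab, equivalents))
print(switch_term("computer", target_vocab, equivalents))
```

Trying the strictest option first keeps precise matches from being overridden by looser stem matches; a real switching module would also need to handle terms that stem to the same form.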