Search results

1 – 10 of over 8000
Open Access
Article
Publication date: 2 April 2024

Koraljka Golub, Osma Suominen, Ahmed Taiye Mohammed, Harriet Aagaard and Olof Osterman

Abstract

Purpose

In order to estimate the value of semi-automated subject indexing in operative library catalogues, the study aimed to investigate five different automated implementations of an open source software package on a large set of Swedish union catalogue metadata records, with Dewey Decimal Classification (DDC) as the target classification system. It also aimed to contribute to the body of research on aboutness and related challenges in automated subject indexing and evaluation.

Design/methodology/approach

The open source tool Annif, developed by the National Library of Finland, was applied to a sample of over 230,000 records covering close to 12,000 distinct DDC classes, in five implementations: a lexical algorithm, a support vector classifier, fastText, Omikuji Bonsai and an ensemble approach combining the first four. A qualitative study involving two senior catalogue librarians and three students of library and information studies was also conducted on a sample of 60 records, to investigate the value and inter-rater agreement of automatically assigned classes.

Findings

The best results were achieved with the ensemble approach, which reached 66.82% accuracy on the three-digit DDC classification task. The qualitative study confirmed earlier studies reporting low inter-rater agreement, but also pointed to the potential value of automatically assigned classes as additional access points in information retrieval.

Originality/value

The paper presents an extensive study of automated classification in an operative library catalogue, accompanied by a qualitative study of automated classes. It demonstrates the value of applying semi-automated indexing in operative information retrieval systems.

Details

Journal of Documentation, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 December 2000

David Roberts and Clive Souter

Abstract

This article discusses the possibility of automating sophisticated subject indexing of medical journal articles. Approaches to subject descriptor assignment in information retrieval research are usually based either upon the manual descriptors in the database or upon the generation of search parameters from the text of the article. The principles of the Medline indexing system are described, followed by a summary of a pilot project based upon the Amed database. The results suggest that a more extended study, based upon Medline, should encompass the following components: extraction of ‘concept strings’ from the titles and abstracts of records, based upon linguistic features characteristic of medical literature; use of the Unified Medical Language System (UMLS) to identify controlled vocabulary descriptors; and coordination of descriptors, utilising features of the Medline indexing system. The emphasis should be on system manipulation of data, based upon input, available resources and specifically designed rules.

Details

Aslib Proceedings, vol. 52 no. 10
Type: Research Article
ISSN: 0001-253X

Article
Publication date: 13 October 2023

Judit Gárdos, Julia Egyed-Gergely, Anna Horváth, Balázs Pataki, Roza Vajda and András Micsik

Abstract

Purpose

The present study is about generating metadata to enhance thematic transparency and facilitate research on interview collections at the Research Documentation Centre, Centre for Social Sciences (TK KDK) in Budapest. It explores the use of artificial intelligence (AI) in producing, managing and processing social science data and its potential to generate useful metadata to describe the contents of such archives on a large scale.

Design/methodology/approach

The authors combined manual and automated/semi-automated methods of metadata development and curation. The authors developed a suitable domain-oriented taxonomy to classify a large text corpus of semi-structured interviews. To this end, the authors adapted the European Language Social Science Thesaurus (ELSST) to produce a concise, hierarchical structure of topics relevant in social sciences. The authors identified and tested the most promising natural language processing (NLP) tools supporting the Hungarian language. The results of manual and machine coding will be presented in a user interface.

Findings

The study describes how an international social scientific taxonomy can be adapted to a specific local setting and tailored to be used by automated NLP tools. The authors show the potential and limitations of existing and new NLP methods for thematic assignment. The current possibilities of multi-label classification in social scientific metadata assignment are discussed, i.e. the problem of automated selection of relevant labels from a large pool.

Originality/value

Interview materials have not yet been used for building manually annotated training datasets for automated indexing of scientifically relevant topics in a data repository. Comparing various automated-indexing methods, this study shows a possible implementation of a researcher tool supporting custom visualizations and the faceted search of interview collections.

Article
Publication date: 2 September 2014

Koraljka Golub, Marianne Lykke and Douglas Tudhope

Abstract

Purpose

The purpose of this paper is to explore the potential of applying the Dewey Decimal Classification (DDC) as an established knowledge organization system (KOS) for enhancing social tagging, with the ultimate purpose of improving subject indexing and information retrieval.

Design/methodology/approach

Over 11,000 Intute metadata records in politics were used. In total, 28 politics students were each given four tasks, in which a total of 60 resources were tagged in two different configurations: one with uncontrolled social tags only, and another with uncontrolled social tags as well as suggestions from a controlled vocabulary. The controlled vocabulary was the DDC, which also included mappings from the Library of Congress Subject Headings.

Findings

The results demonstrate the importance of controlled vocabulary suggestions for indexing and retrieval: to help produce ideas of which tags to use, to make it easier to find focus for the tagging, to ensure consistency and to increase the number of access points in retrieval. The value and usefulness of the suggestions proved to be dependent on the quality of the suggestions, both as to conceptual relevance to the user and as to appropriateness of the terminology.

Originality/value

No previous research has investigated the enhancement of social tagging with suggestions from the DDC, an established KOS, in a user trial comparing social tagging alone with social tagging enhanced by such suggestions. This paper is a final reflection on all aspects of the study.

Details

Journal of Documentation, vol. 70 no. 5
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 March 1988

John C. Crawford

Abstract

This article reports the findings of a survey of Scottish university, central institution and college of education libraries, which assessed present and planned subject access to their catalogues and whether online catalogues are likely to improve subject access. The results are analysed and the findings discussed in relation to published studies of subject access in online catalogues. It is concluded that greater attention needs to be paid to subject access, both by librarians in specifying automated systems and by system suppliers in responding to specifications.

Details

Library Review, vol. 37 no. 3
Type: Research Article
ISSN: 0024-2535

Article
Publication date: 1 September 1997

Neil Jones

Abstract

States that as use of networks becomes more innovative and widespread in higher education libraries, current approaches to the organization of network‐accessible resources reveal flaws. Moving forward from the recommendations of the Follett Report, and adopting an approach which seeks to conceptually redefine conventional practices and standards, the study examines, from a technical services perspective, issues and approaches relating to the development of existing cataloguing rules and practices and of machine‐readable standards, and proposes these standards as the most effective means of enhancing accessibility to electronic resources. Characterizes the current period as one of organizational, technological and conceptual transition, and addresses the broader issue of academic network accessibility in the local, regional, national and international context. Additionally, identifies the challenges these trends pose to conventional and future technical services operations, and their implications.

Details

New Library World, vol. 98 no. 5
Type: Research Article
ISSN: 0307-4803

Article
Publication date: 1 February 1974

Abstract

VINE is a Very Informal NEwsletter produced three or four times a year by the Information Officer for Library Automation and financed by the British Library Research and Development Department. It is issued free of charge on request to interested librarians, systems staff and library college lecturers. VINE's objective is to provide an up‐to‐date picture of work being done in U.K. library automation projects that has not been reported elsewhere.

Details

VINE, vol. 4 no. 2
Type: Research Article
ISSN: 0305-5728

Article
Publication date: 1 February 1988

Julia M. Johnson

Abstract

This article describes the design, use and evolution over four years of INFO, a Cardbox‐plus based file containing bibliographic and other data relating to the computer and telecommunications industries. INFO was designed and implemented by the author, who provides an information service to the UK‐based consultancy PA Computers and Telecommunications Ltd. What began as a listing of articles in current periodicals is now an index to potentially useful sources of information. These sources may be:

Details

Program, vol. 22 no. 2
Type: Research Article
ISSN: 0033-0337

Article
Publication date: 1 May 2006

Koraljka Golub

Abstract

Purpose

To provide an integrated perspective on similarities and differences between approaches to automated classification in different research communities (machine learning, information retrieval and library science), and to point to problems with these approaches and with automated classification as such.

Design/methodology/approach

A range of works dealing with automated classification of full‐text web documents are discussed. Explorations of individual approaches are given in the following sections: special features (description, differences, evaluation), application and characteristics of web pages.

Findings

Identifies the major similarities and differences between the three approaches: document pre‐processing and the use of web‐specific document characteristics are common to all of them, while the major differences lie in the algorithms applied and in whether the vector space model and controlled vocabularies are employed. Problems of automated classification are recognized.

Research limitations/implications

The paper does not attempt to provide an exhaustive bibliography of related resources.

Practical implications

As an integrated overview of approaches from different research communities with application examples, it is very useful for students in library and information science and computer science, as well as for practitioners. It also shows researchers from one community how similar tasks are conducted in the other communities.

Originality/value

To the author's knowledge, no review paper on automated text classification has attempted to discuss more than one community's approach from an integrated perspective.

Details

Journal of Documentation, vol. 62 no. 3
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 February 1979

R.T. Niehoff and S. Kwasny

Abstract

This article discusses some of the forces at work within today's online database environment which could lead to the emergence of a distributed information network. Several important modules in such a network are identified, including an automated subject switching module. Switching options include exact matching, equivalency matching, and word‐ and phrase‐stem matching. Research investigations, critical issues and preliminary findings with regard to switching options and strategies are reported. It is anticipated that one of the primary benefits of automated subject switching will be much greater utilization of the online STI resource.

Details

Online Review, vol. 3 no. 2
Type: Research Article
ISSN: 0309-314X
