Search results

1 – 3 of 3
Article
Publication date: 14 April 2014

Faisal Alkhateeb and Jerome Euzenat

Abstract

Purpose

The paper aims to discuss extensions of SPARQL that use regular expressions to navigate RDF graphs and may be used to answer queries considering RDFS semantics (in particular, nSPARQL and our proposal CPSPARQL).

Design/methodology/approach

The paper is based upon a theoretical comparison of the expressiveness and complexity of both nSPARQL and the corresponding fragment of CPSPARQL, that we call cpSPARQL.

Findings

The paper shows that nSPARQL and cpSPARQL (the fragment of CPSPARQL) have the same complexity, though cpSPARQL, being a proper extension of SPARQL graph patterns, is more expressive than nSPARQL.

Research limitations/implications

It has not been possible for the authors to compare the performance of their CPSPARQL implementation with other proposals. However, the experimentation has allowed them to make interesting observations.

Practical implications

The paper includes implications for implementing the SPARQL RDFS entailment regime.

Originality/value

The paper demonstrates the usefulness of the cpSPARQL language. In particular, cpSPARQL, which is sufficient for capturing RDFS semantics, admits an efficient evaluation algorithm, while the whole CPSPARQL language is in theory as efficient as SPARQL. Moreover, using such a path language within the SPARQL structure allows for properly extending SPARQL.
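
As a rough illustration of how path navigation can stand in for RDFS reasoning, the sketch below evaluates the path "rdf:type followed by zero or more rdfs:subClassOf" over a small in-memory set of triples. It is written in plain Python, not in nSPARQL or cpSPARQL syntax, and it is not the authors' evaluation algorithm. The graph, the `ex:` names and the helper functions `step`, `star` and `types_of` are all hypothetical, and only the subclass rule of RDFS is covered.

```python
# Hypothetical toy graph; prefixes and resource names are illustrative only.
TRIPLES = {
    ("ex:rex", "rdf:type", "ex:Dog"),
    ("ex:Dog", "rdfs:subClassOf", "ex:Mammal"),
    ("ex:Mammal", "rdfs:subClassOf", "ex:Animal"),
}


def step(graph, nodes, predicate):
    """Follow one edge labelled `predicate` from every node in `nodes`."""
    return {o for (s, p, o) in graph if p == predicate and s in nodes}


def star(graph, nodes, predicate):
    """Follow `predicate` zero or more times (reflexive-transitive closure)."""
    reached = set(nodes)
    frontier = set(nodes)
    while frontier:
        # Only keep nodes not seen before, so cycles cannot loop forever.
        frontier = step(graph, frontier, predicate) - reached
        reached |= frontier
    return reached


def types_of(graph, resource):
    """Evaluate the path rdf:type / rdfs:subClassOf* starting at `resource`."""
    direct_types = step(graph, {resource}, "rdf:type")
    return star(graph, direct_types, "rdfs:subClassOf")
```

Calling `types_of(TRIPLES, "ex:rex")` returns the asserted class together with its superclasses, without materializing any inferred triples in the graph.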

Details

International Journal of Web Information Systems, vol. 10 no. 1
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 8 May 2017

Amed Leiva-Mederos, Jose A. Senso, Yusniel Hidalgo-Delgado and Pedro Hipola

Abstract

Purpose

Information from Current Research Information Systems (CRIS) is stored in different formats, in platforms that are not compatible, or even in independent networks. It would be helpful to have a well-defined methodology to allow for management data processing from a single site, so as to take advantage of the capacity to link disperse data found in different systems, platforms, sources and/or formats. Based on functionalities and materials of the VLIR project, the purpose of this paper is to present a model that provides for interoperability by means of semantic alignment techniques and metadata crosswalks, and facilitates the fusion of information stored in diverse sources.

Design/methodology/approach

After reviewing the state of the art regarding the diverse mechanisms for achieving semantic interoperability, the paper analyzes the following: the specific coverage of the data sets (type of data, thematic coverage and geographic coverage); the technical specifications needed to retrieve and analyze a distribution of the data set (format, protocol, etc.); the conditions of reuse (copyright and licenses); and the “dimensions” included in the data set as well as the semantics of these dimensions (the syntax and the taxonomies of reference). The semantic interoperability framework presented here implements semantic alignment and metadata crosswalks to convert information from three different systems (ABCD, Moodle and DSpace) and integrate all the databases in a single RDF file.

Findings

The paper also includes an evaluation that compares, by means of recall and precision calculations, the proposed model with identical queries made on Open Archives Initiative and SQL, in order to estimate its efficiency. The results were satisfactory, because semantic interoperability facilitates the exact retrieval of information.
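
For readers unfamiliar with the two measures, the following minimal Python sketch shows how precision and recall are conventionally computed for a single query. The function name `precision_recall` and the example identifiers are hypothetical; the paper's actual evaluation data and procedure are not reproduced here.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant items retrieved / all items retrieved
    recall    = relevant items retrieved / all relevant items
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Guard against empty sets to avoid division by zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 records returned, 3 records actually relevant.
p, r = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
```

In this made-up example two of the four returned records are relevant, so precision is 2/4, and two of the three relevant records were found, so recall is 2/3.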

Originality/value

The proposed model enhances management of the syntactic and semantic interoperability of the CRIS system designed. In a real setting of use it achieves very positive results.

Article
Publication date: 19 June 2019

Prafulla Bafna, Dhanya Pramod, Shailaja Shrwaikar and Atiya Hassan

Abstract

Purpose

Document management is growing in importance in proportion to the growth of unstructured data, and its applications range from process benchmarking to customer relationship management and beyond. The purpose of this paper is to improve two important components of document management, namely keyword extraction and document clustering. This is achieved through knowledge extraction by updating the phrase document matrix. The objective is to manage documents by extending the phrase document matrix and to achieve refined clusters. The study achieves consistency in cluster quality in spite of the increasing size of the data set. Domain independence of the proposed method is tested and compared with other methods.

Design/methodology/approach

In this paper, a synset-based phrase document matrix construction method is proposed in which semantically similar phrases are grouped to mitigate the curse of dimensionality. A large collection of documents to be processed includes some documents that are closely related to the topic of interest, known as model documents, as well as documents that deviate from the topic of interest. These non-relevant documents may affect the cluster quality. The first step in knowledge extraction from unstructured textual data is converting it into structured form, either as a term frequency-inverse document frequency matrix or as a phrase document matrix. Once the data is in structured form, a range of mining algorithms from classification to clustering can be applied.
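
The effect of grouping semantically similar phrases can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' construction method: single whitespace-separated tokens stand in for phrases, the synonym map `SYNSETS` is hand-written and hypothetical (a real system might derive it from a thesaurus), and raw counts are used instead of any weighting scheme.

```python
from collections import Counter

# Hypothetical synonym groups: each term maps to one group label.
SYNSETS = {
    "car": "vehicle",
    "automobile": "vehicle",
    "engine": "motor",
}


def phrase_document_matrix(docs, synsets):
    """Build a count matrix with one row per document and one column per group.

    Terms that share a synset label are merged into a single column,
    which reduces the number of dimensions.
    """
    counts = [
        Counter(synsets.get(tok, tok) for tok in doc.lower().split())
        for doc in docs
    ]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


docs = ["car engine", "automobile motor"]
vocab, matrix = phrase_document_matrix(docs, SYNSETS)
plain_vocab, _ = phrase_document_matrix(docs, {})
```

With the synonym map, the two example documents share the same two columns; with an empty map, the same documents spread over four columns and have no column in common.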

Findings

In the enhanced approach, the model documents are used to extract key phrases with synset groups, whereas the other documents participate in the construction of the feature matrix. It gives a better feature vector representation and improved cluster quality.

Research limitations/implications

Various applications that require managing of unstructured documents can use this approach by specifically incorporating the domain knowledge with a thesaurus.

Practical implications

An experiment pertaining to the academic domain is presented that categorizes research papers according to context and topic, which will help academicians to organize and build knowledge in a better way. The grouping and feature extraction for resume data can facilitate the candidate selection process.

Social implications

Applications such as knowledge management, clustering of search engine results and various recommender systems (hotel recommenders, task recommenders and so on) will benefit from this study. Hence, the study contributes to improving document management in business domains or areas of interest of its users from various strata of society.

Originality/value

The study proposes an improvement to the document management approach that can be applied in various domains. The efficacy of the proposed approach and its enhancement is validated on three different data sets of well-articulated documents, such as biographies, resumes and research papers. These results can be used for benchmarking further work carried out in these areas.

Details

Benchmarking: An International Journal, vol. 26 no. 6
Type: Research Article
ISSN: 1463-5771
