Search results

1 – 10 of 418
Article
Publication date: 22 February 2024

Ranjeet Kumar Singh

Abstract

Purpose

Although the challenges associated with big data are increasing, the question of the most suitable big data analytics (BDA) platform in libraries is always significant. The purpose of this study is to propose a solution to this problem.

Design/methodology/approach

The current study identifies relevant literature and provides a review of big data adoption in libraries. It also presents a step-by-step guide for developing a BDA platform using the Apache Hadoop Ecosystem. To test the system, library big data were analyzed with Apache Pig, a tool from the Apache Hadoop Ecosystem, which establishes the effectiveness of the Apache Hadoop Ecosystem as a powerful BDA solution in libraries.

Findings

It can be inferred from the literature that libraries and librarians have not taken the possibility of big data services in libraries very seriously. The literature also suggests that no significant effort has been made to establish any BDA architecture in libraries. This study establishes the Apache Hadoop Ecosystem as a possible solution for delivering BDA services in libraries.

Research limitations/implications

The present work suggests adopting the idea of providing various big data services in a library by developing a BDA platform: for instance, assisting researchers in understanding big data, cleaning and curating big data through skilled and experienced data managers, and providing the infrastructural support to store, process, manage, analyze and visualize big data.

Practical implications

The study concludes that Apache Hadoop's Hadoop Distributed File System (HDFS) and MapReduce components significantly reduce the complexities of big data storage and processing, respectively, and that Apache Pig, using the Pig Latin scripting language, is very efficient in processing big data and responds to queries with a quick response time.
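
The abstract does not reproduce the study's Pig Latin scripts. As a rough illustration only, the following sketch expresses the kind of aggregation such a platform might run, written as a Hadoop Streaming map/reduce pair in Python; the tab-separated circulation-log layout (user_id, title, date) is a hypothetical example, not the paper's dataset.

```python
# mapper.py -- emit one (title, 1) pair per circulation record.
# Hypothetical tab-separated input: user_id, title, date.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 3:              # skip malformed records
        _user, title, _date = fields
        print(f"{title}\t1")

# reducer.py -- Hadoop Streaming delivers mapper output sorted by key,
# so all counts for one title arrive contiguously.
import sys

current, count = None, 0
for line in sys.stdin:
    title, n = line.rstrip("\n").split("\t")
    if title != current:
        if current is not None:
            print(f"{current}\t{count}")
        current, count = title, 0
    count += int(n)
if current is not None:
    print(f"{current}\t{count}")
```

A pair like this would be submitted with the hadoop-streaming jar's -mapper and -reducer options; in the study itself the equivalent query is expressed far more compactly in Pig Latin.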

Originality/value

According to the study, notably little effort has been made to analyze big data from libraries. Furthermore, it has been discovered that acceptance of the Apache Hadoop Ecosystem as a solution to big data problems in libraries is not widely discussed in the literature, although Apache Hadoop is regarded as one of the best frameworks for big data handling.

Details

Digital Library Perspectives, vol. 40 no. 2
Type: Research Article
ISSN: 2059-5816

Keywords

Article
Publication date: 26 April 2022

Jianpeng Zhang and Mingwei Lin

Abstract

Purpose

The purpose of this paper is to provide an overview of 6,618 publications on Apache Hadoop from 2008 to 2020 in order to offer a conclusive and comprehensive analysis for researchers in this field, as well as preliminary knowledge of Apache Hadoop for interested researchers.

Design/methodology/approach

This paper employs bibliometric analysis and visual analysis approaches, with the aid of visualization applications, to systematically study publications about Apache Hadoop in the Web of Science database. Through bibliometric analysis of the collected documents, this paper analyzes the main statistical characteristics and cooperation networks. Research themes, research hotspots and future development trends are also investigated through keyword analysis.

Findings

Research on Apache Hadoop will remain a top priority in the future, and how to improve the performance of Apache Hadoop in the era of big data is one of the research hotspots.

Research limitations/implications

This paper provides a comprehensive bibliometric analysis of Apache Hadoop, which is valuable for researchers who want to quickly grasp the hot topics in this area.

Originality/value

This paper outlines the structural characteristics of the publications in this field and summarizes its research hotspots and trends in recent years, aiming to capture the field's development status and to inspire new ideas for researchers.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 16 no. 1
Type: Research Article
ISSN: 1756-378X

Keywords

Article
Publication date: 6 August 2021

Alexander Döschl, Max-Emanuel Keller and Peter Mandl

Abstract

Purpose

This paper aims to evaluate different approaches for the parallelization of compute-intensive tasks. The study compares a Java multi-threaded algorithm, distributed computing solutions with MapReduce (Apache Hadoop) and resilient distributed data set (RDD) (Apache Spark) paradigms and a graphics processing unit (GPU) approach with Numba for compute unified device architecture (CUDA).

Design/methodology/approach

The paper uses a simple but computationally intensive puzzle as a case study for experiments. To find all solutions using brute force search, 15! permutations had to be computed and tested against the solution rules. The experimental application comprises a Java multi-threaded algorithm, distributed computing solutions with MapReduce (Apache Hadoop) and RDD (Apache Spark) paradigms and a GPU approach with Numba for CUDA. The implementations were benchmarked on Amazon EC2 instances for performance and scalability measurements.
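
As a hedged illustration of the RDD-based variant, the PySpark sketch below shows the brute-force search pattern on a deliberately tiny space; the is_solution() rule and the 8-symbol alphabet are placeholders, since the paper's actual puzzle requires testing 15! permutations.

```python
# Toy sketch of the RDD-based brute-force pattern (placeholder rules).
from itertools import permutations

from pyspark import SparkContext

def is_solution(p):
    # Hypothetical stand-in for the puzzle's solution rules.
    return p[0] == 0 and p[-1] == 7

sc = SparkContext(appName="brute-force-sketch")

# Split the search space by leading symbol so each Spark task
# enumerates only the permutations of the remaining symbols.
heads = sc.parallelize(range(8), numSlices=8)
n_solutions = (heads
               .flatMap(lambda h: ((h,) + tail for tail in
                                   permutations([x for x in range(8) if x != h])))
               .filter(is_solution)
               .count())
print(n_solutions)
sc.stop()
```

Scaling the same pattern to 15 symbols only changes the range; the partition-by-prefix trick is what lets the enumeration spread across executors.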

Findings

The comparison of the solutions with Apache Hadoop and Apache Spark under Amazon EMR showed that the processing time measured in CPU minutes with Spark was up to 30% lower, while the performance of Spark especially benefits from an increasing number of tasks. With the CUDA implementation, more than 16 times faster execution is achievable for the same price compared to the Spark solution. Apart from the multi-threaded implementation, the processing times of all solutions scale approximately linearly. Finally, several application suggestions for the different parallelization approaches are derived from the insights of this study.

Originality/value

There are numerous studies that have examined the performance of parallelization approaches. Most of these studies deal with processing large amounts of data or mathematical problems. This work, in contrast, compares these technologies on their ability to implement computationally intensive distributed algorithms.

Details

International Journal of Web Information Systems, vol. 17 no. 4
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 8 February 2016

Zhihua Li, Zianfei Tang and Yihua Yang

Abstract

Purpose

The highly efficient processing of mass data is a primary issue in building and maintaining a security video surveillance system. This paper aims to focus on the architecture of a security video surveillance system based on Hadoop parallel processing technology in a big data environment.

Design/methodology/approach

A hardware framework of a security video surveillance network cascaded system (SVSNCS) was constructed on the basis of the Internet of Things, network cascade technology and the Hadoop platform. Then, the architecture model of SVSNCS was proposed using Hadoop and a big data processing platform.

Findings

Finally, we suggest a video processing procedure that follows the characteristics of the cascade network.

Originality/value

Our paper, which focuses on the architecture of a security video surveillance system in a big data environment on the basis of Hadoop parallel processing technology, provides high-quality video surveillance services for the security domain.

Details

World Journal of Engineering, vol. 13 no. 1
Type: Research Article
ISSN: 1708-5284

Keywords

Article
Publication date: 14 October 2019

Priyadarshini R., Latha Tamilselvan and Rajendran N.

Abstract

Purpose

The purpose of this paper is to propose a fourfold semantic similarity that results in more accuracy compared to the existing literature. Change detection in the URL and recommendation of the source documents are facilitated by means of a framework in which the fourfold semantic similarity is applied. The latest trends in technology emerge with the continuous growth of resources on the collaborative web. This interactive and collaborative web poses big challenges for recent technologies like cloud and big data.

Design/methodology/approach

The enormous growth of resources demands that they be accessed in a more efficient manner, and this requires clustering and classification techniques. The resources on the web must therefore be described in a more meaningful manner.

Findings

Resources can be described in the form of metadata constituted by the resource description framework (RDF). A fourfold similarity is proposed, compared to the threefold similarity proposed in the existing literature. The fourfold similarity includes semantic annotation based on named entity recognition in the user interface, domain-based concept matching with an improvised score-based classification based on ontology, a sequence-based word sensing algorithm and RDF-based updating of triples. All these similarity measures are aggregated across the components of the system: the semantic user interface, semantic clustering, sequence-based classification and a semantic recommendation system with RDF updating for change detection.
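
As an illustration of the RDF-based updating of triples (the paper's own implementation is not shown in the abstract), the sketch below uses the rdflib library to replace a stale dcterms:modified triple once a change in a source document is detected; the URIs and dates are invented for the example.

```python
# Minimal sketch of triple updating with rdflib (invented URIs/dates).
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCTERMS

g = Graph()
doc = URIRef("http://example.org/blog/post-42")
g.add((doc, DCTERMS.modified, Literal("2019-01-10")))

def record_change(graph, subject, new_date):
    # None is a wildcard: drop every stale dcterms:modified triple,
    # then assert the updated one.
    graph.remove((subject, DCTERMS.modified, None))
    graph.add((subject, DCTERMS.modified, Literal(new_date)))

record_change(g, doc, "2019-06-15")
print(g.serialize(format="nt"))
```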

Research limitations/implications

The existing work suggests that linking resources semantically increases the retrieving and searching ability. Previous literature shows that keywords can be used to retrieve linked information from the article to determine the similarity between the documents using semantic analysis.

Practical implications

Traditional systems also suffer from scalability and efficiency issues. The proposed study designs a model that pulls and prioritizes knowledge-based content from the Hadoop distributed framework. This study also proposes a Hadoop-based pruning system and recommendation system.

Social implications

The pruning system gives an alert about the dynamic changes in the article (virtual document). The changes in the document are automatically updated in the RDF document. This helps in semantic matching and retrieval of the most relevant source with the virtual document.

Originality/value

The recommendation and detection of changes in the blogs are performed semantically using n-triples and automated data structures. The user-focussed and choice-based crawling proposed in this system also assists collaborative filtering, which in turn recommends user-focussed source documents. The entire clustering and retrieval system is deployed on multi-node Hadoop in the Amazon AWS environment, and graphs are plotted and analyzed.

Details

International Journal of Intelligent Unmanned Systems, vol. 7 no. 4
Type: Research Article
ISSN: 2049-6427

Keywords

Article
Publication date: 1 June 2012

James Powell, Linn Collins, Ariane Eberhardt, David Izraelevitz, Jorge Roman, Thomas Dufresne, Mark Scott, Miriam Blake and Gary Grider

Abstract

Purpose

The purpose of this paper is to describe a process for extracting and matching author names from large collections of bibliographic metadata using the Hadoop implementation of MapReduce. It considers the challenges and risks associated with name matching at such a large scale and proposes simple matching heuristics for the reduce process. The resulting semantic graphs of authors link names to publications and include additional features such as phonetic representations of author last names. The authors believe that this achieves an appropriate level of matching at scale and enables further matching to be performed with graph analysis tools.

Design/methodology/approach

A topically‐focused collection of metadata records describing peer‐reviewed papers was generated based upon a search. The matching records were harvested and stored in the Hadoop Distributed File System (HDFS) for processing by Hadoop. A MapReduce job was written to perform coarse‐grain author name matching, and multiple papers were matched with authors when the names were very similar or identical. Semantic graphs were generated so that they could be analyzed to perform finer‐grained matching, for example by using other metadata such as subject headings.
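
The job itself is not listed in the abstract; the sketch below shows one plausible shape for the coarse‐grain map step, keying each record on a phonetic (Soundex) code of the last name plus the first initial, so that the reduce phase sees candidate variants of a name together. The tab-separated record layout is hypothetical.

```python
# Hypothetical Hadoop Streaming mapper for coarse-grain name matching.
import sys

def soundex(name):
    """Classic 4-character Soundex code of a surname."""
    codes = {c: str(d) for d, group in enumerate(
        ["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1) for c in group}
    name = "".join(c for c in name.upper() if c.isalpha())
    if not name:
        return "0000"
    out, prev = name[0], codes.get(name[0])
    for c in name[1:]:
        d = codes.get(c)
        if d and d != prev:
            out += d
        if c not in "HW":      # H and W do not reset the previous code
            prev = d
    return (out + "000")[:4]

for line in sys.stdin:
    # Hypothetical tab-separated input: last_name, first_name, paper_id
    last, first, paper_id = line.rstrip("\n").split("\t")
    key = soundex(last) + "_" + (first[:1].upper() if first else "")
    print(f"{key}\t{last},{first}\t{paper_id}")
```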

Findings

When performing author name matching at scale using MapReduce, the heuristics that determine whether names match should be limited to the rules that yield the most reliable results. Bad rules produce large numbers of errors at scale. MapReduce can also be used to generate or extract other data that might help resolve similar names when stricter rules fail to do so. The authors also found that matching is more reliable within a well‐defined topic domain.

Originality/value

Libraries have some of the same big data challenges as are found in data‐driven science. Big data tools such as Hadoop can be used to explore large metadata collections, and these collections can be used as surrogates for other real‐world big data problems. MapReduce activities need to be appropriately scoped to yield good results, while keeping an eye out for problems in code, which can be magnified in the output of a MapReduce job.

Details

Library Hi Tech News, vol. 29 no. 4
Type: Research Article
ISSN: 0741-9058

Keywords

Article
Publication date: 19 November 2018

I-Cheng Chen and I-Ching Hsu

Abstract

Purpose

In recent years, governments around the world have been actively promoting Open Government Data (OGD) to facilitate the reuse of open data and the development of information applications. Currently, there are more than 35,000 data sets available on the Taiwan OGD website. However, the existing Taiwan OGD website only provides keyword queries and lacks a friendly query interface. This study aims to address these issues by defining a DBpedia cloud computing framework (DCCF) for integrating DBpedia with Semantic Web technologies into a Spark cluster cloud computing environment.

Design/methodology/approach

The proposed DCCF is used to develop a Taiwan OGD recommendation platform (TOGDRP) that provides a friendly query interface to automatically filter the relevant data sets and visualize the relationships between them.

Findings

To demonstrate the feasibility of TOGDRP, the experimental results illustrate the efficiency of the different cloud computing models, including the Hadoop YARN cluster model, the Spark standalone cluster model and the Spark YARN cluster model.
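
For readers unfamiliar with the distinction, the PySpark fragment below shows how the two Spark-based models are selected at submission time; the host name and app name are placeholders. The plain Hadoop YARN model runs MapReduce jobs under YARN rather than Spark, so it is exercised through Hadoop itself rather than a SparkSession.

```python
# Illustrative only: choosing the cluster model via the Spark master URL.
from pyspark.sql import SparkSession

def build_session(master_url):
    return (SparkSession.builder
            .appName("togdrp-sketch")   # hypothetical app name
            .master(master_url)
            .getOrCreate())

# Spark standalone cluster model: Spark's own cluster manager.
spark = build_session("spark://master-host:7077")

# Spark YARN cluster model: resources managed by Hadoop YARN
# (requires HADOOP_CONF_DIR to point at the cluster configuration).
# spark = build_session("yarn")
```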

Originality/value

The novel solution proposed in this study is a hybrid approach that integrates Semantic Web technologies into a Hadoop and Spark cloud computing environment to provide OGD data set recommendations.

Details

International Journal of Web Information Systems, vol. 15 no. 2
Type: Research Article
ISSN: 1744-0084

Keywords

Content available
Article
Publication date: 25 November 2013

Heidi Hanson and Zoe Stewart-Marshall

Abstract

Details

Library Hi Tech News, vol. 30 no. 10
Type: Research Article
ISSN: 0741-9058

Article
Publication date: 19 April 2018

Aleksandar Simović

Abstract

Purpose

With the exponential growth of the amount of data, the most sophisticated systems of traditional libraries are not able to fulfill the demands of modern business and user needs. The purpose of this paper is to present the possibility of creating a Big Data smart library as an integral and enhanced part of the educational system that will improve user service and increase motivation in the continuous learning process through content-aware recommendations.

Design/methodology/approach

This paper presents an approach to the design of a Big Data system for collecting, analyzing, processing and visualizing data from different sources to a smart library specifically suitable for application in educational institutions.

Findings

As an integrated recommender system for the educational institution, the practical application of the Big Data smart library meets user needs and assists in finding personalized content from several sources, resulting in economic benefits for the institution and long-term user satisfaction.

Social implications

The need for continuous education alters business processes in libraries with requirements to adopt new technologies, business demands, and interactions with users. To be able to engage in a new era of business in the Big Data environment, librarians need to modernize their infrastructure for data collection, data analysis, and data visualization.

Originality/value

A unique value of this paper is its perspective of the implementation of a Big Data solution for smart libraries as a part of a continuous learning process, with the aim to improve the results of library operations by integrating traditional systems with Big Data technology. The paper presents a Big Data smart library system that has the potential to create new values and data-driven decisions by incorporating multiple sources of differential data.

Details

Library Hi Tech, vol. 36 no. 3
Type: Research Article
ISSN: 0737-8831

Keywords

Book part
Publication date: 19 July 2022

Ayesha Banu

Abstract

Introduction: The Internet has tremendously transformed the computer and networking world. Information reaches our fingertips and adds data to our repository within a second. Big data was initially defined by three Vs: data come with greater variety, increasing volumes and extra velocity. Big data is a collection of structured, unstructured and semi-structured data gathered from different sources and applications. It has become the most powerful buzzword in almost all business sectors. The real success of any industry can be measured by how big data are analysed, potential knowledge is discovered and productive business decisions are made. New technologies such as artificial intelligence and machine learning have added more efficiency to storing and analysing data. Big data analytics (BDA) is most valuable to companies focusing on gaining insight into customer behaviour, trends and patterns. This popularity of big data has inspired insurance companies to utilise big data in their core systems to advance financial operations, improve customer service, construct a personalised environment and take all possible measures to increase revenue and profits.

Purpose: This study aims to recognise what big data stands for in the insurance sector and how the application of BDA has opened the door for new and innovative changes in the insurance industry.

Methodology: This study describes the field of BDA in the insurance sector; discusses its benefits; outlines tools, an architectural framework and the method; describes applications both in general and in specific contexts; and briefly discusses the opportunities and challenges.

Findings: The study concludes that BDA in insurance is evolving into a promising field for providing insight from very large data sets and improving outcomes while reducing costs. Its potential is great; however, there remain challenges to overcome.

Details

Big Data: A Game Changer for Insurance Industry
Type: Book
ISBN: 978-1-80262-606-3

Keywords

1 – 10 of 418