Search results
1–10 of over 287,000 results

This work can be used as a building block in other settings such as GPUs, MapReduce, Spark or any other. Also, DDPML can be deployed on other distributed systems such as P2P…
Abstract
Purpose
This work can be used as a building block in other settings such as GPUs, MapReduce, Spark or any other framework. DDPML can also be deployed on other distributed systems such as P2P networks, clusters, cloud computing platforms or other technologies.
Design/methodology/approach
In the age of Big Data, all companies want to benefit from large amounts of data. These data can help them understand their internal and external environment and anticipate associated phenomena, as the data turn into knowledge that can be used for prediction later. This knowledge thus becomes a great asset in companies' hands, and extracting it is precisely the objective of data mining. With data and knowledge now produced in large amounts and at a faster pace, the field is referred to as Big Data mining. For this reason, the authors' proposed work mainly aims at solving the problems of volume, veracity, validity and velocity when classifying Big Data using distributed and parallel processing techniques. The problem the authors raise in this work is how to make machine learning algorithms work in a distributed and parallel way at the same time without losing the accuracy of the classification results. To solve this problem, the authors propose a system called Dynamic Distributed and Parallel Machine Learning (DDPML). To build it, the authors divided their work into two parts. In the first, the authors propose a distributed architecture controlled by a MapReduce algorithm, which in turn depends on a random sampling technique. The distributed architecture they designed is specially directed at big data processing and operates coherently and efficiently with the sampling strategy proposed in this work. This architecture also helps the authors verify the classification results obtained using the representative learning base (RLB). In the second part, the authors extract the representative learning base by sampling at two levels using the stratified random sampling method. This sampling method is also applied to extract the shared learning base (SLB) and the partial learning bases for the first level (PLBL1) and the second level (PLBL2).
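The two-level stratified sampling the abstract describes can be illustrated with a short sketch. This is not the paper's implementation; the record fields (`label`, `site`), fractions and strata are hypothetical stand-ins for the paper's class-based and node-based levels.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, fraction, seed=0):
    """Stratified random sampling: draw the same fraction from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))  # at least one record per stratum
        sample.extend(rng.sample(group, k))
    return sample

# Two hypothetical levels: first stratify by class label (PLBL1-like),
# then stratify the result again by a second attribute (PLBL2-like).
data = [{"label": l, "site": s, "x": i}
        for i, (l, s) in enumerate((l, s) for l in "ab" for s in "xy" for _ in range(50))]
level1 = stratified_sample(data, key=lambda r: r["label"], fraction=0.5)
level2 = stratified_sample(level1, key=lambda r: r["site"], fraction=0.5)
```

Because each stratum is sampled at the same rate, the class proportions of the full data set are preserved in the reduced learning base, which is what lets a representative learning base stand in for the whole.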
The experimental results show the efficiency of the proposed solution without significant loss in the classification results. In practical terms, the DDPML system is generally dedicated to big data mining processing and works effectively in distributed systems with a simple structure, such as client–server networks.
Findings
The authors obtained very satisfactory classification results.
Originality/value
DDPML system is specially designed to smoothly handle big data mining classification.
Details
Keywords
Adli Hamdam, Ruzita Jusoh, Yazkhiruni Yahya, Azlina Abdul Jalil and Nor Hafizah Zainal Abidin
The role of big data and data analytics in the audit engagement process is evident. Notwithstanding, understanding how big data influences cognitive processes and, consequently…
Abstract
Purpose
The role of big data and data analytics in the audit engagement process is evident. Notwithstanding, understanding of how big data influences cognitive processes and, consequently, the auditors’ judgment decision-making process is limited. The purpose of this paper is to present a conceptual framework on the cognitive process that may influence auditors’ judgment decision-making in the big data environment. The proposed framework predicts the relationships among data visualization integration, data processing modes, task complexity and auditors’ judgment decision-making.
Design/methodology/approach
The methodology to accomplish the conceptual framework is based on a thorough literature review that consists of theoretical discussions and comparative studies of other authors’ works and thinking. It also involves summarizing and interpreting previous contributions subjectively and narratively, and extending the work in some fashion. Based on this approach, this paper formulates four propositions about data visualization integration, data processing modes, task complexity and auditors’ judgment decision-making. The proposed framework was built from cognitive theory, addressing how auditors process data into useful information for judgment decision-making.
Findings
The proposed framework expects that the cognitive process of data visualization integration and intuitive data processing mode will improve auditors’ judgment decision-making. This paper also contends that task complexity may influence the cognitive process of data visualization integration and processing modes because of the voluminous nature of data and the complexity of business processes. Hence, it is also expected that the relationships between data visualization integration and audit judgment decision-making and between processing mode and audit judgment decision-making will be moderated by task complexity.
Research limitations/implications
There is a dearth of studies examining how big data and big data analytics affect auditors’ cognitive processes in making decisions. This paper will help researchers and auditors understand the behavioral consequences of data visualization integration and data processing mode in judgment decision-making, given a certain level of task complexity.
Originality/value
With the advent of big data and the evolution of innovative audit procedures, the constructed framework can be used as a theoretical foundation for future empirical studies concerning auditors’ judgment decision-making. It highlights the potential of big data to transform the nature and practice of accounting and auditing.
Details
Keywords
Yanchao Rao and Ken Huijin Guo
The US Securities and Exchange Commission (SEC) requires public companies to file structured data in eXtensible Business Reporting Language (XBRL). One of the key arguments behind…
Abstract
Purpose
The US Securities and Exchange Commission (SEC) requires public companies to file structured data in eXtensible Business Reporting Language (XBRL). One of the key arguments behind the XBRL mandate is that the technical standard can help improve processing efficiency for data aggregators. This paper aims to empirically test the data processing efficiency hypothesis.
Design/methodology/approach
To test the data processing efficiency hypothesis, the authors adopt a two-sample research design by using data from Compustat: a pooled sample (N = 61,898) and a quasi-experimental sample (N = 564). The authors measure data processing efficiency as the time lag between the dates of 10-K filings on the SEC’s EDGAR system and the dates of related data finalized in the Compustat database.
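The paper's efficiency measure is a simple date difference, which can be sketched as follows. The function name and example dates are illustrative, not taken from the study.

```python
from datetime import date

def processing_lag_days(filing_date: date, finalized_date: date) -> int:
    """Data processing efficiency proxy: days between a 10-K filing on
    EDGAR and the related data being finalized in the aggregator's database."""
    return (finalized_date - filing_date).days

# Hypothetical dates for illustration only:
lag = processing_lag_days(date(2019, 2, 28), date(2019, 3, 21))
# lag == 21
```

A shorter lag would indicate faster data processing by the aggregator; the hypothesis is that XBRL-tagged filings shrink this lag.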
Findings
The statistical results show that after controlling for potential effects of firm size, age, fiscal year and industry, XBRL has a non-significant impact on data processing efficiency. This suggests that the data processing efficiency benefit may have been overestimated.
Originality/value
This study provides some timely empirical evidence to the debate as to whether XBRL can improve data processing efficiency. The non-significant results suggest that it may be necessary to revisit the mandate of XBRL reporting in the USA and many other countries.
Details
Keywords
“One picture says more than 1000 words”. This saying offers us information about one of the most important features of human beings. Human beings mostly relate their actions to…
Abstract
“One picture says more than 1000 words”. This saying offers us information about one of the most important features of human beings. Human beings mostly relate their actions to their surroundings by optical means. No other information channel is as well developed as the optical channel, and only the optical channel is able to process very large quantities of data at one time, bearing in mind the large number of steps that lie between the image received by the eyes and the understanding of the contents of the picture by the brain.
Bojan Božić and Werner Winiwarter
The purpose of this paper is to present a showcase of semantic time series processing which demonstrates how this technology can improve time series processing and community…
Abstract
Purpose
The purpose of this paper is to present a showcase of semantic time series processing which demonstrates how this technology can improve time series processing and community building by the use of a dedicated language.
Design/methodology/approach
The authors have developed a new semantic time series processing language and prepared showcases to demonstrate its functionality. The assumption is an environmental setting with data measurements from different sensors to be distributed to different groups of interest. The data are represented as time series for water and air quality, while the user groups are, among others, the environmental agency, companies from the industrial sector and legal authorities.
Findings
A language for time series processing and several tools to enrich the time series with metadata and for community building have been implemented in Python and Java. A GUI for demonstration purposes has also been developed in PyQt4. In addition, an ontology for validation has been designed, and a knowledge base for data storage and inference has been set up. Some important features are dynamic integration of ontologies, time series annotation and semantic filtering.
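The combination of time series annotation and semantic filtering that the findings describe can be sketched in a few lines. The class and tag names below are illustrative assumptions, not TSSL's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class AnnotatedSeries:
    name: str
    points: list                              # (timestamp, value) pairs
    tags: set = field(default_factory=set)    # semantic annotations

def semantic_filter(series_list, required_tags):
    """Keep only the series whose annotations include all required tags,
    e.g. to route water-quality data to the environmental agency."""
    required = set(required_tags)
    return [s for s in series_list if required <= s.tags]

# Hypothetical measurements in the paper's environmental setting:
water = AnnotatedSeries("river_ph", [(0, 7.1), (1, 7.3)], {"water", "quality"})
air = AnnotatedSeries("pm10", [(0, 18.0)], {"air", "quality"})
matches = semantic_filter([water, air], {"water"})
# matches contains only the river_ph series
```

In the paper's setting, an ontology would supply and validate the tag vocabulary instead of the free-form strings used here.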
Research limitations/implications
This paper focuses on the showcases of time series semantic language (TSSL), but also covers technical aspects and user interface issues. The authors are planning to develop TSSL further and evaluate it within further research projects and validation scenarios.
Practical implications
The research has a high practical impact on time series processing and provides new data sources for semantic web applications. It can also be used in social web platforms (especially for researchers) to provide a time series centric tagging and processing framework.
Originality/value
The paper presents an extended version of the paper presented at iiWAS2012.
Details
Keywords
This paper presents the resilient distributed processing technique (RDPT), in which the mapper and reducer are simplified with Spark contexts to support distributed parallel query processing.
Abstract
Purpose
This paper presents the resilient distributed processing technique (RDPT), in which the mapper and reducer are simplified with Spark contexts to support distributed parallel query processing.
Design/methodology/approach
The proposed work is implemented with Pig Latin with Spark contexts to develop query processing in a distributed environment.
Findings
Query processing in Hadoop relies on distributed processing with the MapReduce model. MapReduce distributes the work to different nodes through the implementation of complex mappers and reducers. Its results are valid only up to a certain size of data.
Originality/value
Pig supports the required parallel processing framework with the following constructs during query processing: FOREACH, FLATTEN and COGROUP.
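The semantics of those Pig constructs can be mimicked in plain Python: COGROUP groups two relations by a shared key, FLATTEN expands the grouped bags back into flat tuples, and FOREACH corresponds to an ordinary map over the result. The relations and field names below are hypothetical.

```python
from collections import defaultdict

def cogroup(left, right, key):
    """Pig-style COGROUP: group two relations by a shared key field."""
    groups = defaultdict(lambda: ([], []))
    for rec in left:
        groups[rec[key]][0].append(rec)
    for rec in right:
        groups[rec[key]][1].append(rec)
    return dict(groups)

def flatten(grouped):
    """Pig-style FLATTEN: expand the grouped bags into flat (key, l, r) tuples."""
    for k, (lbag, rbag) in grouped.items():
        for l in lbag:
            for r in rbag:
                yield (k, l, r)

orders = [{"cust": 1, "item": "a"}, {"cust": 2, "item": "b"}]
names = [{"cust": 1, "name": "Ada"}]
joined = list(flatten(cogroup(orders, names, "cust")))
# only cust 1 appears in both relations, so one joined tuple survives
```

Spark offers the same operations natively (`cogroup`, `flatMap`, `map` on RDDs), which is what lets RDPT replace Pig's mappers and reducers with Spark contexts.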
Details
Keywords
Dawn M. Russell and David Swanson
The purpose of this paper is to investigate the mediators that occupy the gap between information processing theory and supply chain agility. In today’s Mach speed business…
Abstract
Purpose
The purpose of this paper is to investigate the mediators that occupy the gap between information processing theory and supply chain agility. In today’s Mach-speed business environment, managers often install new technology and expect an agile supply chain when they press <Enter>. This study reveals the naivety of such an approach, which has allowed new technology to be governed by old processes.
Design/methodology/approach
This work takes a qualitative approach to the dynamic conditions surrounding information processing and its connection to supply chain agility through the assessment of 60 exemplar cases. The situational conditions that have created the divide between information processing and supply chain agility are studied.
Findings
The agility adaptation typology (AAT) defining three types of adaptations and their mediating constructs is presented. Type 1: information processing, is generally an exercise in synchronization that can be used to support assimilation. Type 2: demand sensing, is where companies are able to incorporate real-time data into everyday processes to better understand demand and move toward a real-time environment. Type 3: supply chain agility, requires fundamentally new thinking in the areas of transformation, mindset and culture.
Originality/value
This work describes the reality of today’s struggle to achieve supply chain agility, providing guidelines and testable propositions, and at the same time, avoids “ivory tower prescriptions,” which exclude the real world details from the research process (Meredith, 1993). By including the messy real world details, while difficult to understand and explain, the authors are able to make strides in the AAT toward theory that explains and guides the manager’s everyday reality with all of its messy real world details.
Details
Keywords
This monograph defines distributed intelligence and discusses the relationship of distributed intelligence to data base, justifications for using the technique, and the approach…
Abstract
This monograph defines distributed intelligence and discusses the relationship of distributed intelligence to data base, justifications for using the technique, and the approach to successful implementation of the technique. The approach is then illustrated by reference to a case study of experience in Birds Eye Foods. The planning process by which computing strategy for the company was decided is described, and the planning conclusions reached to date are given. The current state of development in the company is outlined and the very real savings so far achieved are specified. Finally, the main conclusions of the monograph are brought together. In essence, these conclusions are that major savings are achievable using distributed intelligence, and that the implementation of a company data processing plan can be made quicker and simpler by its use. However, careful central control must be maintained so as to avoid fragmentation of machines, language skills and applications.
Sabrina Lechler, Angelo Canzaniello, Bernhard Roßmann, Heiko A. von der Gracht and Evi Hartmann
Particularly in volatile, uncertain, complex and ambiguous (VUCA) business conditions, staff in supply chain management (SCM) look to real-time (RT) data processing to reduce…
Abstract
Purpose
Particularly in volatile, uncertain, complex and ambiguous (VUCA) business conditions, staff in supply chain management (SCM) look to real-time (RT) data processing to reduce uncertainties. However, such expectations rest on the premise that data processing can be perfectly mastered, and so do not reflect reality. The purpose of this paper is to investigate whether RT data processing reduces SCM uncertainties under real-world conditions.
Design/methodology/approach
Aiming to facilitate communication on the research question, a Delphi expert survey was conducted to identify challenges of RT data processing in SCM operations and to assess whether it does influence the reduction of SCM uncertainty. In total, 14 prospective statements concerning RT data processing in SCM operations were developed and evaluated by 68 SCM and data-science experts.
Findings
RT data processing was found to have an ambivalent influence on the reduction of SCM complexity and associated uncertainty. Analysis of the data collected from the study participants revealed a new type of uncertainty related to SCM data itself.
Originality/value
This paper discusses the challenges of gathering relevant, timely and accurate data sets in VUCA environments and creates awareness of the relationship between data-related uncertainty and SCM uncertainty. Thus, it provides valuable insights for practitioners and the basis for further research on this subject.
Details
Keywords
Francesco Ciclosi, Paolo Ceravolo, Ernesto Damiani and Donato De Ieso
This chapter analyzes the compliance of some category of Open Data in Politics with EU General Data Protection Regulation (GDPR) requirements. After clarifying the legal basis of…
Abstract
This chapter analyzes the compliance of some categories of Open Data in Politics with EU General Data Protection Regulation (GDPR) requirements. After clarifying the legal basis of this framework, it pays specific attention to processing procedures that conform to the legitimate interests pursued by the data controller, including open data licenses and anonymization techniques. These can result in partial application of the GDPR, but there is no generic guarantee; as a consequence, an appropriate process of analysis and management of risks is required.
Details