An Innovative Web Intelligence Data Clustering Algorithm for Human Resources Based on Sustainability

a Universidad Nacional Mayor de San Marcos, Perú
b Universidad Nacional Santiago Antunez de Mayolo, Peru
c Kristu Jayanti College, Autonomous, India

Technological Innovations for Business, Education and Sustainability

ISBN: 978-1-83753-107-3, eISBN: 978-1-83753-106-6

Publication date: 23 April 2024

Abstract

The development of information technologies has driven a considerable transformation in human resource management, from conventional personnel management to a modern, data-driven approach. Clustering algorithms are a key component of data mining technology, which has been widely used in several applications, including those that operate on the web. Web intelligence is a recent academic field that calls for sophisticated analytics and machine learning techniques to facilitate information discovery, particularly on the web. Human resource data gathered from the web are typically enormous, highly complex, dynamic, and unstructured, and traditional clustering methods are ineffective on such data and need to be upgraded. To address this difficulty, swarm intelligence, a subset of nature-inspired computing, enhances and extends standard clustering algorithms with optimization capabilities. We collect the initial raw human resource data and preprocess them through data cleaning, data normalization, and data integration. The proposed K-C-means-data driven cuckoo bat optimization algorithm (KCM-DCBOA) is used for clustering the human resource data, feature extraction is done using principal component analysis (PCA), and classification of the human resource data is done using support vector machine (SVM). The suggested approach was contrasted with other approaches from the literature. According to the experimental findings, the suggested technique has extremely promising features in terms of clustering quality and execution time.


Citation

Norabuena-Figueroa, E., Rurush-Asencio, R., Jaheer Mukthar, K.P., Sifuentes-Stratti, J. and Ramírez-Asís, E. (2024), "An Innovative Web Intelligence Data Clustering Algorithm for Human Resources Based on Sustainability", Hamdan, A. (Ed.) Technological Innovations for Business, Education and Sustainability (Technological Innovation and Sustainability for Business Competitive Advantage), Emerald Publishing Limited, Leeds, pp. 47-67. https://doi.org/10.1108/978-1-83753-106-620241004

Publisher: Emerald Publishing Limited

Copyright © 2024 Emerson Norabuena-Figueroa, Roger Rurush-Asencio, Jaheer Mukthar K. P., Jose Sifuentes-Stratti and Elia Ramírez-Asís. Published under exclusive licence by Emerald Publishing Limited


1 Introduction

Human resource management (HRM) has seen a dramatic transformation as a result of the widespread adoption of digital technologies in all facets of the field. HRM has figured out how to use technologies like the computer and the internet to boost efficiency, cut costs, and strengthen the company's competitive edge. Web intelligence has been included in many tactical HR operations due to the large amount of organizational, human, and task-oriented data for which HR is responsible; this improves the viability of business models (Castillo, Fernández, Camones, & Guerra, 2022; Votto, Valecha, Najafirad, & Rao, 2021). The HR department now typically presents itself through an online portal rather than a live employee. In today's economy, companies cannot afford to lose ground, so they need to invest in their workforce if they want to succeed. There is a widespread quest among the nation's largest corporations for the most qualified local workers. Staffing is a crucial part of HR development, and recruitment is one of the HR responsibilities included in the selection process. Changes in our economy, society, and culture have been substantial as a result of technological progress (Veluchamy, Sanchari, & Gupta, 2021). There have been numerous influences on HRM since the field's inception. Pressure was put on HR development to yield desirable results as work shifted with the introduction of new, diverse technologies such as web intelligence solutions. Web intelligence (WI) represents the future of intelligent systems and will have a significant impact on human life (Hmoud & Laszlo, 2019). "Nature-inspired computing" (NIC) is a family of meta-heuristic algorithms that draws its inspiration from the workings of the natural world, encompassing a wide range of living things, including humans and animals. NIC is important for translating natural or biological processes into machine intelligence. Understanding natural processes, designing patterns for natural processes, identifying the problem, and modeling it technologically are just a few of the steps involved in creating intelligent systems. Maintaining stability requires efficient management of scarce resources, and nature acts as a self-optimizing system in this regard (Asís, Figueroa, Quiñones, & Márquez Mázmela, 2022).

One of the most popular techniques for resolving practically all problems internationally is to apply algorithms that are inspired by nature. The dispersion of cloud services is analyzed and configured as part of cloud management to maximize the effectiveness of power applications, infrastructures, or workloads and to minimize loss due to oversupply. The introduction of DevOps resulted in rapid distribution and frequent new code modifications. These algorithms draw their inspiration from nature and make use of how the world works to solve problems (Yahia et al., 2021).

Wang (2022) applied WI and data analysis (DA) to the main parts of HR and then used experimental simulation to obtain the data findings of the model; that work aims to construct and explain a WI and deep DA model for HR within the context of the currently available theoretical framework. Mellal (2022) explained the fundamental ideas underlying several NIC methods and their applications, such as particle swarm optimization (PSO), the grey wolf optimization method, ant colony optimization (ACO), plant propagation techniques, the cuckoo optimization algorithm, and artificial neural networks. Oral and Turgut (2018) evaluated the efficacy of the Flower Pollination Algorithm ("FPA"), the Forest Optimization Algorithm ("FOA"), and the Artificial Algae Algorithm ("AAA"), three relatively new nature-inspired computing ("NIC") algorithms; ten widely used benchmark test functions, split into multimodal and unimodal categories, are used for the comparisons. Rui, Fong, Yang, and Deb (2019) explored the feasibility of using alternative optimization methods inspired by nature to carry out clustering on WI data; that research claims that the newly developed clustering algorithms are superior to the industry standard, cuckoo particle swarm optimization (C-PSO). Dey et al. (2020) provided a state-of-the-art research methodology in the field of NIC, introducing readers to a wide variety of algorithms, including multi-agent systems, genetic methods, particle optimization, the firefly method, flower pollination algorithms, collision-based optimization techniques, and the bat algorithm. Soto, Asis, Figueroa, and Plasencia (2023) and Shaikh et al. (2022) provided an overview of algorithms that are based on natural phenomena, including biologically inspired algorithms and swarm intelligence techniques. Optimization plays a crucial part in the success of many algorithms that take their inspiration from nature and are used for the solution of practical problems. Applications of nature-inspired computing for the wireless sensor network (WSN) are presented in that work. Even though WSN is becoming increasingly popular, it does have certain drawbacks, such as battery life, distraction, slow communication, and security, and new forms of clever algorithms are required to solve these problems (Huerta-Soto et al., 2022).

Bharti, Biswas, and Shukla (2020) provided a high-level survey of recent developments and applications in the field of nature-inspired computation, focusing on their relevance to deep learning. Pallathadka et al. (2023) shed light on contemporary research trends; their work gives a brief list of algorithms, followed by various variants of nature-inspired algorithms, before addressing the classification of the algorithms. New optimization algorithms are occasionally created and altered using nature as their primary source of inspiration. HRM aims to optimize planned HR allocation according to the organization's development requirements. Employee enthusiasm can be harnessed through recruitment, training, assessment, incentives, and other staff-related areas to maximize benefits for the organization (Castro, Castillo, Camones, & Cochachin, 2022; Zhao, 2020). Sohrabi, Vanani, and Abedin (2018) employed text mining techniques, applied to a comprehensive search of scholarly literature from across the world, to examine emerging trends in the field of HRM in tandem with information systems. Herrera and de las Heras-Rosas (2020) examined developments in corporate social responsibility (CSR) and HRM-related scientific output. The connection between CSR, HRM, and economic, environmental, and social sustainability has been the subject of numerous case studies; however, the groundwork for addressing the emerging competencies of CSR, HRM, and sustainable company management has yet to be laid. Ramirez-Asis, Maguina, Infantes, and Naranjo-Toro (2020) provided a comprehensive literature review of HR analytics to identify present research trends and to establish future research agendas in this area. The purpose of this chapter is to provide a comprehensive overview of the topic, including its historical context, underlying theoretical principles, and cutting-edge advancements, with an emphasis on HRM and the importance of web intelligence. As a result of globalization, traditional approaches to managing businesses are being put to the test; as the world becomes smaller because of technological advancements, businesses no longer have to compete solely with companies in their immediate geographic area (Vasantham, 2021). Li and Zhou (2022) investigated and enhanced a deep learning-based recommendation algorithm and applied it to the HR recommendation domain. The current recommendation system relies on a single, time-tested algorithm, and its efficacy in the field of HRM may benefit from a revamp. Venusamy, Rajagopal, and Yousoof (2020) identified the effects of chatbots in the modern day. This cutting-edge method orients present HR executives toward a graphical viewpoint for analyzing candidates. Using a designated examination approach, the benefits and drawbacks of online intelligence have been analyzed, and the outcome of the investigation is conclusive. Statistical and analytical abilities are often lacking in HR departments, making it difficult to work with massive datasets (Yating et al., 2022). Inadequate infrastructure can also prevent some businesses from gaining access to high-quality data.

Hence, in this chapter, a novel clustering algorithm for HR WI data based on NIC is described. The remainder of this chapter is organized as follows: Part 2 presents the methods, Part 3 describes the experiments and results, and Part 4 presents the conclusion.

2 Methods

This section details the procedure followed by the proposed approach. HR data collection; preprocessing, including data cleaning, data normalization, and data integration; the proposed K-C-means-data driven cuckoo bat optimization algorithm (KCM-DCBOA); feature extraction using principal component analysis (PCA); and classification using support vector machines (SVMs) are all depicted in the diagram. Fig. 4.1 depicts the flow of the suggested methodology.

Fig. 4.1. Flow of the Suggested Methodology.

2.1 HR Data Collection

Executives were recruited by approaching HR-focused students in a large institution's part-time MBA program. Since taking part in the study was entirely optional, only about 83% of executives (n = 481) were present for the first phase; the remainder may have been away on vacation, personal leave, or business trips. The second portion of the study, which lasted around 14 weeks, involved the author hand-administering a questionnaire to each subject to collect data on the density of job experience. Nearly 79% of this sample identified their managers and gave contact information for follow-up (n = 289), and approximately 76% of respondents took part in the second phase of data collection (n = 366).

2.2 Preprocessing

Unstructured data are converted into a more understandable format during data preparation. Before using machine learning and data mining methods, it's important to make sure the data are of sufficient quality. Preprocessing consists of the following steps: data cleansing, data normalization, and data integration.

2.2.1 Data Cleaning

In computing, “data cleaning” refers to the process of correcting or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data from a dataset. When data from many sources are combined, there is a risk of duplicate or incorrectly labeled information.
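For illustration, the following is a minimal pandas sketch of such a cleaning pass; the column names, example records, and cleaning rules are assumptions chosen for the example rather than the chapter's actual HR dataset.

```python
import pandas as pd

# Hypothetical raw HR export; column names and values are illustrative only.
raw = pd.DataFrame({
    "employee_id": [101, 102, 102, 103, 104],
    "tenure_years": [3.5, 2.0, 2.0, None, 7.0],
    "department": ["HR", "IT", "it", "Sales", "Sales "],
})

cleaned = (
    raw
    .assign(department=lambda d: d["department"].str.strip().str.lower())  # harmonize labels
    .drop_duplicates(subset="employee_id", keep="first")                   # remove duplicate records
    .dropna(subset=["tenure_years"])                                       # drop incomplete rows
)
print(cleaned)
```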

2.2.2 Data Normalization

The three data normalization techniques most frequently employed in the literature, namely min–max, decimal scaling, and z-score, are briefly discussed in this section. Additionally, we discuss the sliding window method, which is typically used to standardize time series data.

Attribute A's values are normalized using the min–max technique, which takes into account both the minimum and maximum values of A. Given an A value, $a$, it maps $a$ into the range (low, high), producing the normalized value $a'$ according to Eq. (4.1):

(4.1) $a' = (\mathrm{high} - \mathrm{low}) \times \dfrac{a - \min_A}{\max_A - \min_A} + \mathrm{low}$

Since the minimum and maximum values of the out-of-sample datasets are unknown, the min–max normalization approach cannot be applied directly to time series forecasting. If we take the minimum ($\min_A$) and maximum ($\max_A$) values of the in-sample datasets, we can work around this issue by assigning low and high to all out-of-sample values that fall beyond those ranges. This method, however, causes a dramatic drop in the quality of learning procedures and an over-concentration of values in a small portion of the normalized range.

Decimal scaling normalization is another popular normalization technique in which the decimal point of the values of attribute A is shifted according to its greatest absolute value. Thus, we convert an A value, $a$, to its normalized form, $a'$, by Eq. (4.2):

(4.2) $a' = \dfrac{a}{10^{d}}$
where $d$ is the smallest positive integer such that $\max(|a'|) < 1$. Similar to min–max, this approach requires knowing the largest values in a time series and suffers from the same limitations when working with time series data.

Last but not least, the z-score normalization averages and standardizes the values of attribute A. To convert an A value, a, to a more universal value, a′, we compute Eq. (4.3).

(4.3) $a' = \dfrac{a - \mu(A)}{\sigma(A)}$

When the real minimum and maximum values for attribute A are unknown, this approach excels in stationary situations, but it struggles with nonstationary time series because the mean and standard deviations change with time.
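As a concrete illustration of Eqs. (4.1)-(4.3), the following is a minimal NumPy sketch of the three normalization rules; the in-sample values and the [low, high] target range are assumptions made for the example.

```python
import numpy as np

a = np.array([12.0, 48.0, 36.0, 95.0, 7.0])   # illustrative in-sample attribute values

# Min-max normalization into a target range [low, high], Eq. (4.1)
low, high = 0.0, 1.0
minmax = (high - low) * (a - a.min()) / (a.max() - a.min()) + low

# Decimal scaling, Eq. (4.2): divide by 10^d, with d the smallest integer giving max|a'| < 1
d = int(np.ceil(np.log10(np.abs(a).max())))
decimal_scaled = a / (10 ** d)

# Z-score normalization, Eq. (4.3)
zscore = (a - a.mean()) / a.std()

print(minmax, decimal_scaled, zscore, sep="\n")
```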

2.2.3 Data Integration

Data integration is a preprocessing method that unifies disparate datasets from several sources into a cohesive database. Multiple data cubes, databases, or flat files could be among these resources.
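A small pandas sketch of this integration step follows; the two source tables and the join key are hypothetical and stand in for the disparate HR sources described above.

```python
import pandas as pd

# Two hypothetical sources describing the same employees
profiles = pd.DataFrame({"employee_id": [101, 102, 103],
                         "department": ["HR", "IT", "Sales"]})
appraisals = pd.DataFrame({"employee_id": [101, 103],
                           "rating": [4.2, 3.8]})

# Integrate on the shared key; a left join keeps every profile record
integrated = profiles.merge(appraisals, on="employee_id", how="left")
print(integrated)
```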

2.3 K-C-Means-Data Driven Cuckoo Bat Optimization Algorithm

According to the K-C-means algorithm, data can be split into two or more groups with varying membership coefficients. Clustering with K-C-means is a recursive procedure. The first step is to build the initial partition matrix and determine the starting points for the clusters. At each iteration, an objective function is reduced, and the cluster centers and membership degrees are recalculated to improve the solution. The procedure terminates when a predetermined maximum number of iterations has been reached, or when the improvement of the objective function between two successive iterations falls below a user-specified threshold. The two parameters updated at each iteration are the membership degree and the cluster center. Additionally, a fuzziness coefficient, denoted by "m," is selected; "m" can be any real number greater than 1.

To divide a set of N vectors into C groups, the KC-means clustering (or hard KC-means clustering) algorithm is available. This algorithm determines the centroid of each cluster and minimizes a dissimilarity function. Initially, the read function is used to import the image into the MATLAB workspace. Clustering is an approach to classifying collections of objects. By default, KC-means clustering assumes that each item has a discrete spatial location. It finds clusterings in which items are as close to each other as feasible within their group, and as far from other groups as possible. To use KC-means clustering, you must choose the distance metric that measures how close items are to one another and the number of clusters into which they will be divided. Additionally, pixels are the building blocks of any image, and the KC-means clustering algorithm can be used to assign labels to these pixels. KC-means provides an index representing a cluster for each object in the input, and the KC-means cluster center output is utilized in a subsequent demonstration stage. In Fuzzy K-C-Means, the goal is to achieve a best-case result with a reduced iteration count; this means that even with fewer iterations, a reliable result is obtained.
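To make the iterative centroid-update loop described above concrete, the following is a minimal NumPy sketch of a hard KC-means-style clustering step with the threshold-based stopping rule; it is only an illustrative baseline under assumed data and parameters, not the authors' full KCM-DCBOA.

```python
import numpy as np

def kc_means(X, k, max_iter=100, tol=1e-6, seed=0):
    """Iteratively refine k cluster centers until the objective improvement falls below tol."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # initial cluster centers
    prev_obj = np.inf
    for _ in range(max_iter):
        # Assign each point to its nearest center (hard membership)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Objective: sum of squared distances to the assigned centers
        obj = (dists[np.arange(len(X)), labels] ** 2).sum()
        if prev_obj - obj < tol:          # stop when improvement drops below the threshold
            break
        prev_obj = obj
        # Recompute each center as the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

X = np.random.default_rng(1).normal(size=(200, 4))   # illustrative feature matrix
centers, labels = kc_means(X, k=3)
```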

A cuckoo algorithm (CA) has come a long way since its inception and has finally earned a place alongside other optimization techniques. This algorithm is inspired by observations of obligate brood parasitism in certain species of cuckoos and the Levy flight behavior observed in certain species of birds and fruit flies. The CA is a swarm-based metaheuristic algorithm that quickly finds a middle ground between exploiting the immediate environment and exploring the entire search space. A distinctive trait of the cuckoo is how it lays its eggs. Here are three simplified and idealized principles that characterize and explain the typical cuckoo search:

  1. Every time a cuckoo lays an egg, she chooses a nest at random and deposits it there.

  2. High-quality egg-laying nests will be passed down from generation to generation.

  3. There is a constant supply of host nests, and the cuckoo's egg has a probability $P_a \in (0, 1)$ of being discovered by the host bird. A host bird can either abandon the nest and start over, or it can remove the egg. In addition, a fraction $P_a$ of the $n$ host nests can be replaced by new nests.

The CA begins with a population of $n$ host nests, similarly to other swarm-based algorithms. The cuckoos bearing eggs, as well as the random Levy flights, will attract these initial host nests at random. Following this, the quality of the nest is assessed and compared to that of a different host nest selected at random. The current host nest may be replaced by the new host nest if the latter proves to be superior; the egg laid by the cuckoo represents this new solution. Host birds will either discard an egg or abandon the nest if they find it, with probability $P_a \in (0, 1)$. In this step, new random solutions are substituted for many of the existing ones. Since the cuckoo only lays a single egg, it can only signify a single answer. The goal is to get rid of the poorest solutions and bring in more novel ones that will hopefully be better. The CA, on the other hand, can be made more intricate by having numerous eggs in each nest to stand in for a group of answers.

Like the bat algorithm, the CA strikes a balance between exploring and exploiting the environment. The CA integrates a local random walk, which can be written as Eq. (4.4):

(4.4) $x_i^{t+1} = x_i^{t} + \alpha s \, H(P_a - \varepsilon)\,(x_j^{t} - x_k^{t})$

Here $x_j^{t}$ and $x_k^{t}$ are two solutions chosen at random by permutation, $H(u)$ is the Heaviside function, $\varepsilon$ is a random number drawn uniformly between 0 and 1, and $s$ is the step size. An expression for the global random walk of exploration can also be given using Levy flights, according to Eqs. (4.5) and (4.6):

(4.5) $x_i^{t+1} = x_i^{t} + \alpha L(s, \lambda)$,
and
(4.6) $L(s, \lambda) = \dfrac{\lambda\,\Gamma(\lambda)\,\sin(\lambda\pi/2)}{\pi} \cdot \dfrac{1}{s^{1+\lambda}}, \quad s \gg s_0 > 0$
where $L$ is the characteristic scale of the problem of interest and $\alpha > 0$ is a step-size scaling factor; typically $\alpha = O(L/10)$, although $\alpha = O(L/100)$ can be more effective in some cases. In the equation above, the next location $x^{t+1}$ depends only on the current location $x^{t}$ (the first term) and on the transition probability (the second term, $\alpha L(s, \lambda)$). Two well-known models for this situation are the random walk and the Markov chain. Nevertheless, the new solution should be created far enough from the current best solution, and some random elements should be introduced, to avoid premature convergence and increase diversity (so the search is not restricted to a local optimum). The search process of a CA is illustrated in Fig. 4.2.

Fig. 4.2. Flowchart of Cuckoo Algorithm.
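To illustrate the two walks just described, the following is a minimal NumPy sketch: a Levy-flight exploration step as in Eqs. (4.5)-(4.6), realized here with Mantegna's method (a common implementation choice, not something prescribed by the chapter), and the local random walk of Eq. (4.4). The population size, bounds, and parameter values are assumptions.

```python
import numpy as np
from math import gamma, sin, pi

def levy_step(dim, lam=1.5, rng=np.random.default_rng()):
    """Draw a Levy-distributed step (Mantegna's method), one common realization of Eq. (4.6)."""
    sigma = (gamma(1 + lam) * sin(pi * lam / 2) /
             (gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    u = rng.normal(0.0, sigma, size=dim)
    v = rng.normal(0.0, 1.0, size=dim)
    return u / np.abs(v) ** (1 / lam)

def cuckoo_moves(nests, alpha, p_a, rng=np.random.default_rng()):
    """One round of the two walks: global Levy flights (Eq. 4.5) and the local walk (Eq. 4.4)."""
    n, dim = nests.shape
    # Global exploration: every nest takes a Levy flight scaled by alpha
    explored = nests + alpha * np.array([levy_step(dim, rng=rng) for _ in range(n)])
    # Local walk: with probability p_a an egg is discovered and rebuilt near two random nests
    j, k = rng.permutation(n), rng.permutation(n)
    heaviside = (rng.random((n, dim)) < p_a).astype(float)   # H(P_a - eps)
    local = nests + rng.random((n, dim)) * heaviside * (nests[j] - nests[k])
    return explored, local

nests = np.random.default_rng(0).uniform(-5, 5, size=(15, 4))   # illustrative population
explored, local = cuckoo_moves(nests, alpha=0.01, p_a=0.25)
```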

The bat algorithm (BA) takes its cues from the way bats use echolocation. In the wild, you can find a wide variety of bat species. Their navigational and hunting behaviors are similar, yet their physical characteristics vary greatly. Microbats rely heavily on echolocation, which helps them find food and avoid dangers even in total darkness. This cutting-edge optimization method can be applied to the study of microbat behavior.

Throughout the iterative process of the BA, the artificial bat's location vector, velocity vector, and frequency vector are modified to reflect the current state of play. The BA's position and speed vectors are useful for sifting through the search space.

In a d-dimensional search space, each bat occupies a certain position (represented by $X_i$), has a specific frequency (represented by $F_i$), and travels at a specific velocity (represented by $V_i$). The velocities, positions, and frequencies are updated according to Eqs. (4.7), (4.8), and (4.9).

(4.7) $V_i(t+1) = V_i(t) + (X_i(t) - \mathrm{Gbest}) \times F_i$
(4.8) $X_i(t+1) = X_i(t) + V_i(t+1)$
where Gbest is the current best solution and $F_i$ is the updated frequency of the $i$th bat after each iteration.
(4.9) $F_i = F_{\min} + (F_{\max} - F_{\min}) \times \beta$
where $\beta$ is a uniformly distributed random number in [0, 1]. To enhance its exploitative capacity, the BA uses a random walk, as described in Eq. (4.10):
(4.10) $x_{\mathrm{new}} = x_{\mathrm{old}} + \varepsilon A^{t}$
where $\varepsilon$ is a random value in the range [−1, 1] and $A^{t}$ is the output sound pressure level (loudness) at iteration $t$. The loudness $A_i$ and pulse emission rate $r_i$ are revised at each iteration according to Eqs. (4.11) and (4.12):
(4.11) $A_i(t+1) = \alpha A_i(t)$
(4.12) $r_i(t+1) = r_i(0)\,(1 - e^{-\gamma t})$
where $\alpha$ and $\gamma$ are two constants between 0 and 1 that influence the rates of change of $A_i$ and $r_i$, respectively.

Algorithm 1 Basic Bat-Inspired Algorithm

  • 1: Initialize the bat population with positions $X_i$ $(i = 1, 2, \ldots, n)$ and velocities $V_i$.

  • 2: Define the pulse frequencies $F_i$.

  • 3: Initialize the pulse rates $r_i$ and the loudness values $A_i$.

  • 4: while $t <$ maximum number of iterations do

  • 5: Generate new solutions by adjusting the frequencies,

  • 6: updating velocities and positions.

  • 7: if rand $> r_i$ then

  • 8: Select one of the best solutions at random.

  • 9: Generate a local solution around the selected best solution.

  • 10: end if

  • 11: Generate a new solution by flying randomly.

  • 12: if rand $< A_i$ and $f(X_i) < f(x_*)$ then

  • 13: Accept the new solution.

  • 14: Increase $r_i$ and reduce $A_i$.

  • 15: end if

  • 16: Rank the bats and determine the current Gbest.

  • 17: end while
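As a concrete companion to Algorithm 1 and Eqs. (4.7)-(4.12), the following is a minimal NumPy sketch of the basic bat algorithm applied to an assumed sphere objective; the bounds, population size, and parameter values are illustrative choices, not the chapter's settings.

```python
import numpy as np

def bat_algorithm(objective, dim, n_bats=20, iters=200,
                  f_min=0.0, f_max=2.0, alpha=0.9, gamma_=0.9, seed=0):
    """Minimal sketch of Algorithm 1 using the update rules in Eqs. (4.7)-(4.12)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-5, 5, size=(n_bats, dim))     # positions X_i
    V = np.zeros((n_bats, dim))                    # velocities V_i
    A = np.ones(n_bats)                            # loudness A_i
    r0 = rng.uniform(0.3, 0.7, size=n_bats)        # initial pulse rates r_i(0)
    r = r0.copy()
    fitness = np.apply_along_axis(objective, 1, X)
    gbest = X[fitness.argmin()].copy()

    for t in range(1, iters + 1):
        beta = rng.random(n_bats)
        F = f_min + (f_max - f_min) * beta                     # Eq. (4.9)
        V = V + (X - gbest) * F[:, None]                       # Eq. (4.7)
        X_new = X + V                                          # Eq. (4.8)
        # Local random walk around the current best, in the spirit of Eq. (4.10)
        local = rng.random(n_bats) > r
        X_new[local] = gbest + 0.01 * rng.uniform(-1.0, 1.0,
                                                  size=(local.sum(), dim)) * A.mean()
        new_fit = np.apply_along_axis(objective, 1, X_new)
        # Accept improving moves with a probability tied to loudness
        accept = (rng.random(n_bats) < A) & (new_fit < fitness)
        X[accept], fitness[accept] = X_new[accept], new_fit[accept]
        A[accept] *= alpha                                     # Eq. (4.11)
        r[accept] = r0[accept] * (1 - np.exp(-gamma_ * t))     # Eq. (4.12)
        gbest = X[fitness.argmin()].copy()
    return gbest, fitness.min()

best, best_val = bat_algorithm(lambda x: np.sum(x ** 2), dim=4)   # illustrative sphere objective
```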

The combined algorithm of the BA and CA prioritizes exploitation to counter slow convergence and discards low-fitness solutions to boost solution quality. Levy flight is used in optimization and optimal search, and it is effective at these tasks, with positive outcomes that point to a promising start. In this way, CA strikes a good balance between exploration and exploitation. There are, however, cases in which solutions are not fully exploited, such as when a large step yields a new solution that is either too distant from the original solution or falls outside the boundary. When the step size is too small, on the other hand, the effect is negligible. To overcome this shortcoming of CA, the cooperative bat searching algorithm (CBA) is developed, which makes use of the benefits of BA; BA can give fast convergence in the first stage by transitioning from exploration to exploitation. Consequently, the CBA's benefits include raising solution quality, improving performance, and avoiding becoming stuck in local maxima.

The CBA flowchart, which is broken up into three sections, is depicted in Fig. 4.3. The first part shows the initialization and the comparison of the Levy flight and tournament selection solutions before moving on to the second. A crimson band, which stands for the BA component, surrounds the second part. On the basis of solution $i$ from the first phase, a new solution is constructed in this section. Eqs. (4.9), (4.11), and (4.12) determine the pulse frequency $F_i$, loudness $A_i$, and pulse rate $r_i$, respectively. The updated positions and velocities (Eqs. 4.7 and 4.8) enable the search for feasible solutions around the optimum solution.

Fig. 4.3. Flowchart of CBA.

2.4 Feature Extraction Using PCA

PCA is a method for easy identification and classification that can be used to deal with datasets built from a large number of noisy and highly correlated process observations. The idea behind PCA is to map the dataset onto a lower-dimensional space. Collinearity between variables is eliminated or considerably reduced in this compact representation. To do this, PCA re-explains the variance in the original data matrix $X_{m \times n}$, which holds $m$ observations of $n$ variables $(m > n)$, in terms of a new set of independent factors, as in Eq. (4.13):

(4.13) $X = TP^{T} + E = t_1 p_1^{T} + t_2 p_2^{T} + \cdots + t_a p_a^{T} + E$
where $P$ is the loading matrix, $E$ is the residual matrix, and $T$ is the matrix of principal component scores. The dimension $a \le n$ should ideally be selected so that no substantial process information remains in $E$. Since the matrix $E$ represents random error, adding another PC would only fit the noise, raising the prediction errors.

The loading vectors satisfy $\|p_i\| = 1$ and are mutually orthogonal, i.e., $p_i^{T} p_j = 0$ for $i \ne j$ and $p_i^{T} p_j = 1$ for $i = j$. The PCs establish a new orthogonal basis for the observation space of $X$. The score vectors $t_i$ $(i = 1, \ldots, a)$ locate each observation $x_i$ on the PC subspace; the elements of a score vector are the distances from the subspace's origin along each PC. The principal component scores $t_i$ are obtained by multiplying the actual observations by the loading vectors $p_i$, as described in Eq. (4.14):

(4.14) $t_i = X p_i \quad (i = 1, \ldots, a)$

PCA identifies the largest source of variability in the data, and successive components account for progressively less variation. Significant variations are assumed to be related to the structure of the feature space, and redundant features are eliminated with minimal loss of accuracy.

Singular value decomposition is a popular method for determining PCs. It is straightforward because it reduces to an eigenvalue problem of the covariance matrix, which can be easily solved. The covariance matrix is given by Eq. (4.15):

(4.15) $C = \dfrac{1}{m} \sum_{i=1}^{m} (x_i - \bar{x})(x_i - \bar{x})^{T}$
where $\bar{x}$ is the mean value of the $x_i$.

To accomplish this, an eigenvalue equation must be solved, Eq. (4.16):

(4.16) $\lambda v = C v$
where $C$ is the covariance matrix and $v$ is an eigenvector associated with eigenvalue $\lambda \ge 0$. The orthogonal projections onto the eigenvectors, referred to here as "new features," are the new coordinates on the eigenvector basis and are represented by the PCs. PC modeling is used to summarize the time-varying and inter-feature correlation structure in the feature datasets in a single, digestible model. Three-dimensional and two-dimensional charts, often described as "windows into the multidimensional feature space," can be used for condition monitoring. Online, a PC model can be used to view the current process state against a historical background of known process states.
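The following is a minimal NumPy sketch of this eigen-decomposition route to the PC scores of Eqs. (4.13)-(4.16); the synthetic data matrix and the number of retained components are assumptions for illustration.

```python
import numpy as np

def pca_scores(X, a):
    """Project observations onto the first a principal components (Eqs. 4.13-4.16)."""
    X_centered = X - X.mean(axis=0)                 # remove the mean, x - x_bar
    C = np.cov(X_centered, rowvar=False)            # covariance matrix, Eq. (4.15)
    eigvals, eigvecs = np.linalg.eigh(C)            # solve lambda v = C v, Eq. (4.16)
    order = np.argsort(eigvals)[::-1]               # largest variance first
    P = eigvecs[:, order[:a]]                       # loading matrix P (columns p_i)
    T = X_centered @ P                              # scores t_i = X p_i, Eq. (4.14)
    explained = eigvals[order[:a]] / eigvals.sum()  # fraction of variance retained
    return T, P, explained

X = np.random.default_rng(0).normal(size=(120, 8))  # illustrative HR feature matrix
T, P, explained = pca_scores(X, a=3)
```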

2.5 Classification Using SVM

In the domain of supervised learning, SVMs are a relatively recent tool for performing tasks such as binary classification, regression, and outlier detection. SVM is superior to other classification algorithms because its structure is straightforward and it uses a modest set of features. According to statistical learning theory, SVM is the best classifier algorithm for minimizing structural risk. Pattern regression and classification problems were the original motivation for developing SVMs.

The task is to assign a new data point to one of two groups, each of which is represented by a set of existing points. In SVMs, a data point is seen as a vector in an n-dimensional space $\mathbb{R}^n$, and we want to know whether we can separate these points with an $(n - 1)$-dimensional hyperplane (the canonical plane). A linear classifier is a term for this type of system. The data could be separated by any number of hyperplanes. Because a larger margin is associated with a lower generalization error, the hyperplane with the largest gap between the two classes is a strong candidate for the optimal hyperplane. The hyperplane can be determined by utilizing the margins and support vectors. After "pushing against" the two datasets, the canonical planes are used to construct two parallel supporting hyperplanes, one on either side of the separating plane, to estimate the margin. To do this, we pick the hyperplane that is farthest away from the nearest data point on both sides. If such a hyperplane exists, we refer to it as a maximum-margin hyperplane, and we refer to the linear classifier it defines as a maximum-margin classifier, also known as a perceptron of optimal stability because it achieves the highest level of stability.

During its training phase, an SVM constructs spatial models of data points so that there is a sharp, maximally broad separation between the data points of different categories. New instances are then predicted to belong to one group or the other depending on which side of the divide they fall. Through the use of the kernel method, which implicitly maps inputs into high-dimensional feature spaces, SVMs can easily perform nonlinear classification in addition to their more well-known linear classification capabilities. In their most formal form, SVMs construct a hyperplane, or a set of hyperplanes, in a high- or infinite-dimensional space for classification, regression, and other tasks.
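As an illustration of such a maximum-margin classifier, the following is a minimal sketch using scikit-learn's SVC; the library choice and the synthetic two-class data are assumptions rather than the chapter's setup, and the RBF kernel plays the role of the implicit nonlinear mapping described above.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Illustrative two-class data standing in for extracted HR feature vectors
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 3)),
               rng.normal(2.5, 1.0, size=(100, 3))])
y = np.array([0] * 100 + [1] * 100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# A maximum-margin classifier; the RBF kernel supplies the nonlinear feature mapping
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```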

3 Experiments and Results

In this section, we discuss the recommended framework and its overall behavior. Figs. 4.4, 4.5, 4.6, 4.7, and 4.8 compare parameters such as accuracy, precision, specificity, recall, and F1 measure for the existing and proposed methods. The compared approaches are the back propagation neural network (BPNN), PSO, ACO, confirmatory factor analysis (CFA), and KCM-DCBOA.

Fig. 4.4. Accuracy Results of Proposed and Existing Methodology.

Fig. 4.5. Precision Results of Proposed and Existing Methodology.

Fig. 4.6. Results of Proposed and Existing Methodologies' Specificity.

Fig. 4.7. Recall Results of Proposed and Existing Methodology.

Fig. 4.8. F1-Measure Results of Proposed and Existing Methodology.

Neural network training relies fundamentally on a process called backpropagation, which fine-tunes a neural network's weights according to the error rate (loss) measured in the previous epoch (i.e., iteration). Reduced error rates and improved generalization are two benefits of fine-tuning the weights. The backpropagation algorithm has the potential downside of being very sensitive to irregularities and noisy data; the training data have a significant impact on backpropagation's final result, and training with backpropagation requires a significant amount of time. PSO is a method for optimizing a problem computationally by repeatedly searching for and testing improved versions of candidate solutions to the problem. Many benefits, including quick convergence, are associated with the basic PSO approach; one notable drawback, however, is that PSO algorithms frequently converge to local optima. ACO is a population-based metaheuristic that can help with finding approximate solutions to tough optimization problems. Artificial ACO is a technique in which a colony of software agents, nicknamed "artificial ants," works to find optimal solutions to an optimization problem. To use ACO, the optimization problem must first be transformed into one of finding the optimal path through a weighted graph. To verify an observed dataset's factor structure, statisticians employ a technique called CFA. With CFA, the researcher can investigate the hypothesis that the observed variables are linked to latent constructs. One possible drawback of this often-used method is that factors differing in number and substance from the test scales may be generated because of excessive incidental item intercorrelations or the over- or under-weighting of certain items. Therefore, in this work, we applied the KCM-DCBOA to address these challenges.

A test's accuracy is its ability to correctly distinguish between positive and negative samples. Calculating the proportion of correctly classified results across all instances is a good way to get a sense of a test's reliability. Accuracy is described by Eq. (4.17):

(4.17) $\mathrm{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$

The accuracy results of the suggested and existing approaches are shown in Fig. 4.4. According to the graph, the proposed KCM-DCBOA approach achieves a higher accuracy level (94%) than the existing methods.

Precision is obtained by dividing the number of true positives by the total number of true positives and false positives in the imbalanced classification problem, according to Eq. (4.18):

(4.18) $\mathrm{Precision} = \dfrac{TP}{TP + FP}$

Fig. 4.5 shows the precision of the proposed and existing methodologies. As shown in Fig. 4.5, the suggested KCM-DCBOA approach has a higher precision than existing methods such as BPNN, PSO, ACO, and CFA.

A test's specificity is measured by how well it identifies healthy (negative) instances. It can be estimated by calculating the proportion of true negatives among healthy cases. Specificity is calculated as in Eq. (4.19):

(4.19) $\mathrm{Specificity} = \dfrac{TN}{TN + FP}$

Specificity findings for the suggested and existing methods are shown in Fig. 4.6. In comparison to the suggested KCM-DCBOA approach (see Fig. 4.6), existing techniques such as BPNN, PSO, ACO, and CFA have poor specificity.

Recall (also known as the true positive rate) is obtained by dividing the true positives by everything that should have been predicted as positive. As shown in Fig. 4.7, the suggested KCM-DCBOA approach has a higher recall (90%) than the existing methods. Recall is given by Eq. (4.20):

(4.20) $\mathrm{Recall} = \dfrac{TP}{TP + FN}$

The F1 measure represents a balance between recall and precision; it is a statistic for measuring success, computed as the harmonic mean of the precision and recall scores. Fig. 4.8 presents the F1-measure results of the proposed and existing methodologies. From Fig. 4.8, the proposed KCM-DCBOA approach has a higher F1 measure than the existing methods. The F1 measure is described by Eq. (4.21):

(4.21) $F_1\text{-}\mathrm{measure} = \dfrac{TP}{TP + \frac{1}{2}(FP + FN)}$
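For reference, the following is a minimal sketch that computes the five measures of Eqs. (4.17)-(4.21) from a binary confusion matrix; the example labels are invented for illustration and do not reproduce the chapter's reported figures.

```python
import numpy as np

def clustering_quality(y_true, y_pred):
    """Confusion-matrix metrics from Eqs. (4.17)-(4.21) for a binary labeling."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),      # Eq. (4.17)
        "precision":   tp / (tp + fp),                       # Eq. (4.18)
        "specificity": tn / (tn + fp),                       # Eq. (4.19)
        "recall":      tp / (tp + fn),                       # Eq. (4.20)
        "f1":          tp / (tp + 0.5 * (fp + fn)),          # Eq. (4.21)
    }

# Illustrative labels only
print(clustering_quality([1, 0, 1, 1, 0, 1, 0, 0], [1, 0, 1, 0, 0, 1, 1, 0]))
```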

4 Conclusion

The vitality of the business and the morale of its workers can both benefit from a well-organized HR department. Across a wide range of sectors, data derived from WI applications have become the norm. Some of the newest optimization algorithms, adapted from their natural-world inspirations for clustering, are presented and tested in the context of a WI scenario in this study. Comparisons are made to state-of-the-art methods in terms of accuracy, precision, specificity, recall, and the F1 measure, and the suggested method outperforms the other approaches now in use. Our future studies will incorporate a broader range of data types for the tests as well as more evaluation measures.

References

Asís et al., 2022 Asís, E. H. R. , Figueroa, R. P. N. , Quiñones, R. E. T. , & Márquez Mázmela, P. R. H. (2022). Validation of a cybercrime awareness scale in Peruvian university students. Revista Científica General José María Córdova, 20(37), 208-224. doi:10.21830/19006586.791

Bharti et al., 2020, Bharti, V. , Biswas, B. , & Shukla, K. K. (2020, January). Recent trends in nature inspired computation with applications to deep learning. In 2020 10th International Conference on Cloud Computing, Data Science & Engineering (Confluence) (pp. 294-299). IEEE.

Castillo et al., 2022 Castillo, A. , Fernández, C. E. , Camones, O. G. , & Guerra, M. E. (2022). Digitalización de la cadena de suministro y la competitividad de las empresas peruanas del sector minorista. Revista Cientifica Epistemia, 6(2), 77-95. doi:10.26495/re.v6i2.2297

Castro et al., 2022 Castro, J. A. , Castillo, A. , Camones, O. G. , & Cochachin, L. F. (2022). Mejoramiento del servicio al cliente y ampliación de las oportunidades de venta mediante el uso de tecnología de punta en el comercio textil peruano. Revista Cientifica Epistemia, 6(2), 35-49. doi:10.26495/re.v6i2.2294

Dey et al., 2020 Dey, N. , Ashour, A. S. , & Bhattacharyya, S. (Eds.). (2020). Applied nature-inspired computing: algorithms and case studies. Springer Singapore.

Herrera and de las Heras-Rosas, 2020 Herrera, J. , & de las Heras-Rosas, C. (2020). Corporate social responsibility and human resource management: Towards sustainable business organizations. Sustainability, 12(3), 841.

Hmoud and Laszlo, 2019 Hmoud, B. , & Laszlo, V. (2019). Will artificial intelligence take over human resources recruitment and selection. Network Intelligence Studies, 7(13), 21-30.

Huerta-Soto et al., 2022 Huerta-Soto, R. , Ramirez-Asis, H. , Mukthar, K. J. , Rurush-Asencio, R. , Villanueva-Calderón, J. , & Zarzosa-Marquez, E. (2022). Purchase intention based on the brand value of pharmacies in a locality of the Peruvian highlands. In International Conference on Business and Technology (pp. 67-78). Cham: Springer International Publishing. doi:10.1007/978-3-031-26956-1_7

Li and Zhou, 2022 Li, J. , & Zhou, Z. (2022). Design of human resource management system based on deep learning. Computational Intelligence and Neuroscience, 2022.

Mellal, 2022 Mellal, M. A. (2022). Some words about nature-inspired computing. In Applications of nature-inspired computing in renewable energy systems (pp. 1-9). Hershey, PA: IGI Global.

Oral and Turgut, 2018, Oral, M. , & Turgut, S. S. (2018, October). Performance analysis of relatively new nature inspired computing algorithms. In 2018 Innovations in Intelligent Systems and Applications Conference (ASYU) (pp. 1-6). IEEE.

Pallathadka et al., 2023 Pallathadka, H. , Wenda, A. , Ramirez-Asís, E. , Asís-López, M. , Flores-Albornoz, J. , & Phasinam, K. (2023). Classification and prediction of student performance data using various machine learning algorithms. Materials Today Proceedings, 80, 3782-3785. doi:10.1016/j.matpr.2021.07.382

Ramirez-Asis et al., 2020 Ramirez-Asis, E. , Maguina, M. E. , Infantes, S. E. , & Naranjo-Toro, M. (2020). Emotional intelligence, competencies and performance of the university professor: Using the SEM-PLS partial least squares technique. Rev. Electron. Interuniv. Form. Profr, 23, 99-114. doi:10.6018/reifop.428261

Rui et al., 2019, Rui, T. , Fong, S. , Yang, X. S. , & Deb, S. (2019, December). Nature-inspired clustering algorithms for web intelligence data. In 2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (Vol. 3, pp. 147-153). IEEE.

Shaikh et al., 2022 Shaikh, A. A. , Lakshmi, K. S. , Tongkachok, K. , Alanya-Beltran, J. , Ramirez-Asis, E. , & Perez-Falcon, J. (2022). Empirical analysis in analysing the major factors of machine learning in enhancing the e-business through structural equation modelling (SEM) approach. International Journal of System Assurance Engineering and Management, 13(Suppl 1), 681-689. doi:10.1007/s13198-021-01590-1

Sohrabi et al., 2018 Sohrabi, B. , Vanani, I. R. , & Abedin, E. (2018). Human resources management and information systems trend analysis using text clustering. International Journal of Human Capital and Information Technology Professionals, 9(3), 1-24.

Soto et al., 2023 Soto, R. H. , Asis, E. H. , Figueroa, R. P. , & Plasencia, L. (2023). Autoeficacia emprendedora y desempeño de micro y pequeñas empresas peruanas. Revista Venezolana de Gerencia: RVG, 28(102), 751-768. doi:10.52080/rvgluz.28.102.19

Vasantham, 2021 Vasantham, S. T. (2021). The role of artificial intelligence in human resource management. doi:10.30726/ESIJ/V8.I2.2021.82013

Veluchamy et al., 2021 Veluchamy, R. , Sanchari, C. , & Gupta, S. (2021). Artificial intelligence within recruitment: Eliminating biases in human resource management. Artificial Intelligence, 8(3).

Venusamy et al., 2020, Venusamy, K. , Rajagopal, N. K. , & Yousoof, M. (2020, December). A study of human resources development through chatbots using artificial intelligence. In 2020 3rd International Conference on Intelligent Sustainable Systems (ICISS) (pp. 94-99). IEEE.

Votto et al., 2021 Votto, A. M. , Valecha, R. , Najafirad, P. , & Rao, H. R. (2021). Artificial intelligence in tactical human resource management: A systematic literature review. International Journal of Information Management Data Insights, 1(2), 100047.

Wang, 2022 Wang, W. (2022). Design and simulation of human resource allocation model based on artificial intelligence and in-depth data analysis. In International Conference on Multi-modal Information Analytics (pp. 192-200). Cham: Springer.

Yahia et al., 2021 Yahia, H. S. , Zeebaree, S. R. , Sadeeq, M. A. , Salim, N. O. , Kak, S. F. , Adel, A. Z. , … Hussein, H. A. (2021). Comprehensive survey for cloud computing based on nature-inspired algorithms optimization scheduling. Asian Journal of Research in Computer Science, 8(2), 1-16.

Yating et al., 2022 Yating, Y. , Mughal, N. , Wen, J. , Ngan, T. T. , Ramirez-Asis, E. , & Maneengam, A. (2022). Economic performance and natural resources commodity prices volatility: Evidence from global data. Resources Policy, 78, 102879. doi:10.1016/j.resourpol.2022.102879

Zhao, 2020, Zhao, Y. (2020, December). Application of K-means clustering algorithm in human resource data informatization. In Proceedings of the 2020 International Conference on Cyberspace Innovation of Advanced Technologies (pp. 12-16).
