Qizi Huangpeng, Wenwei Huang, Hanyi Shi and Jun Fan
Abstract
Purpose
Vehicle estimation can be used to evaluate traffic conditions and facilitate traffic control, an important task in intelligent transportation systems. The paper aims to propose a vehicle-counting method based on the analysis of surveillance videos.
Design/methodology/approach
The paper proposes a novel two-step method that uses low-rank representation (LRR) detection and locality-constrained linear coding (LLC) classification to count the number of vehicles in traffic video sequences automatically. The method first trains an LLC-based classifier offline on extracted features to distinguish vehicles from pedestrians, and then applies an online counting algorithm to count the vehicles detected in the image sequence.
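The LLC coding step referenced above can be sketched as follows. This is a generic approximate LLC solver (restrict to the k nearest codebook words, then solve a small sum-to-one constrained least-squares system), not the authors' exact implementation; the codebook and the `k` and `beta` values are illustrative assumptions.

```python
import numpy as np

def llc_code(x, codebook, k=5, beta=1e-4):
    """Approximate LLC coding of one descriptor against a codebook.

    x: (d,) descriptor; codebook: (M, d), rows are visual words.
    Returns a sparse (M,) code whose nonzeros lie on the k nearest words.
    """
    # 1. Locality: keep only the k nearest codebook entries.
    dists = np.linalg.norm(codebook - x, axis=1)
    idx = np.argsort(dists)[:k]
    B = codebook[idx]                      # (k, d) local base

    # 2. Solve the small constrained least-squares system.
    z = B - x                              # shift words to the descriptor
    C = z @ z.T                            # (k, k) local covariance
    C += beta * np.trace(C) * np.eye(k)    # regularize for numerical stability
    w = np.linalg.solve(C, np.ones(k))
    w /= w.sum()                           # enforce the sum-to-one constraint

    code = np.zeros(len(codebook))
    code[idx] = w
    return code
```

Vehicle/pedestrian classification would then operate on pooled codes of this form rather than on raw descriptors.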
Findings
The proposed method provides both frame-level estimation (counting the number of vehicles visible in each frame) and estimation of the total number of vehicles appearing in the scene. The paper compares the proposed method with similar methods on three public data sets. The experimental results show that the proposed method is competitive and effective in terms of computational speed and evaluation accuracy.
Research limitations/implications
The proposed method does not model illumination, so results may be unsatisfactory under low-lighting conditions. Researchers are therefore encouraged to add a term that accounts for illumination changes to the energy function of the vehicle detection step in future work.
Originality/value
The paper bridges the gap between LRR detection and vehicle counting by taking advantage of existing LLC classification algorithm to distinguish different moving objects.
Abstract
Purpose
Single-shot multi-category clothing recognition and retrieval play a crucial role in online searching and offline settlement scenarios. Existing clothing recognition methods based on RGBD clothing images often suffer from high-dimensional feature representations, leading to compromised performance and efficiency.
Design/methodology/approach
To address this issue, this paper proposes a novel method called Manifold Embedded Discriminative Feature Selection (MEDFS) to select global and local features, thereby reducing the dimensionality of the feature representation and improving performance. Specifically, by combining three global features and three local features, a low-dimensional embedding is constructed to capture the correlations between features and categories. The MEDFS method designs an optimization framework utilizing manifold mapping and sparse regularization to achieve feature selection. The optimization objective is solved using an alternating iterative strategy, ensuring convergence.
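The alternating iterative optimization is not specified in detail in the abstract. A common sketch, under the assumption of an l2,1-regularized sparse-regression objective solved by iterative reweighting (a standard scheme for this kind of feature selection, not necessarily the authors' exact formulation):

```python
import numpy as np

def l21_feature_select(X, Y, lam=0.1, iters=50):
    """Feature scoring via l2,1-regularized regression, solved by the
    usual alternating (iteratively reweighted) scheme:

        min_W ||X W - Y||_F^2 + lam * ||W||_{2,1}

    X: (n, d) features; Y: (n, c) one-hot labels.
    Returns per-feature scores (row norms of W); high score = keep.
    """
    n, d = X.shape
    D = np.eye(d)                                    # reweighting matrix
    for _ in range(iters):
        # W-step: closed-form ridge-like solve with the current D
        W = np.linalg.solve(X.T @ X + lam * D, X.T @ Y)
        # D-step: update weights from the new row norms of W
        row_norms = np.linalg.norm(W, axis=1) + 1e-8
        D = np.diag(1.0 / (2.0 * row_norms))
    return np.linalg.norm(W, axis=1)
```

Rows of W belonging to uninformative features shrink toward zero across iterations, so thresholding the returned scores performs the selection.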
Findings
Empirical studies conducted on a publicly available RGBD clothing image dataset demonstrate that the proposed MEDFS method achieves highly competitive clothing classification performance while maintaining efficiency in clothing recognition and retrieval.
Originality/value
This paper introduces a novel approach for multi-category clothing recognition and retrieval, incorporating the selection of global and local features. The proposed method holds potential for practical applications in real-world clothing scenarios.
Padmavati Shrivastava, K.K. Bhoyar and A.S. Zadgaonkar
Abstract
Purpose
The purpose of this paper is to build a classification system that mimics the perceptual ability of human vision in gathering knowledge about the structure, content and surrounding environment of a real-world natural scene accurately and at a quick glance. This paper proposes a set of novel features to determine the gist of a given scene based on dominant color, dominant direction, openness and roughness.
Design/methodology/approach
The classification system is designed at two different levels. At the first level, a set of low level features are extracted for each semantic feature. At the second level the extracted features are subjected to the process of feature evaluation, based on inter-class and intra-class distances. The most discriminating features are retained and used for training the support vector machine (SVM) classifier for two different data sets.
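The second-level feature evaluation can be sketched as below, assuming a Fisher-style criterion that scores each feature by its ratio of inter-class to intra-class scatter (the paper's exact distance measure may differ):

```python
import numpy as np

def discriminability(X, y):
    """Score each feature by inter-class / intra-class scatter.

    X: (n, d) feature matrix; y: (n,) integer class labels.
    Higher score = more discriminating feature (candidate to retain).
    """
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    inter = np.zeros(X.shape[1])
    intra = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        inter += len(Xc) * (mc - overall_mean) ** 2   # between-class scatter
        intra += ((Xc - mc) ** 2).sum(axis=0)          # within-class scatter
    return inter / (intra + 1e-12)
```

The top-scoring features would then be retained and passed to the SVM training stage.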
Findings
Accuracy of the proposed system has been evaluated on two data sets: the well-known Oliva-Torralba data set and a customized data set comprising high-resolution images of natural landscapes. Experimentation on these two data sets with the proposed novel feature set and SVM classifier yielded an average classification accuracy of 92.68 percent using ten-fold cross-validation. The proposed features efficiently represent visual information and are therefore capable of narrowing the semantic gap between low-level image representation and high-level human perception.
Originality/value
The method presented in this paper represents a new approach for extracting low-level features of reduced dimensionality that is able to model human perception for the task of scene classification. The methods of mapping primitive features to high-level features are intuitive to the user and are capable of reducing the semantic gap. The proposed feature evaluation technique is general and can be applied across any domain.
Vanessa El‐Khoury, Martin Jergler, Getnet Abebe Bayou, David Coquil and Harald Kosch
Abstract
Purpose
Fine‐grained video content indexing, retrieval and adaptation require accurate metadata describing the video structure and semantics down to the lowest granularity, i.e. the object level. The authors address these requirements by proposing the semantic video content annotation tool (SVCAT) for structural and high‐level semantic video annotation. SVCAT is a semi‐automatic, MPEG‐7 standard-compliant annotation tool that produces metadata according to a new object‐based video content model introduced in this work. Videos are temporally segmented into shots, and shot-level concepts are detected automatically using ImageNet as background knowledge. These concepts guide the annotator in locating and selecting objects of interest, which are then tracked automatically to generate object-level metadata. The integration of shot-based concept detection with object localization and tracking substantially reduces the annotator's workload.
Design/methodology/approach
A systematic classification of keyframes into ImageNet categories serves as the basis for automatic concept detection in temporal units. An object-tracking algorithm then recovers exact spatial information about the objects.
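The abstract does not specify the tracking algorithm. As an illustrative stand-in only, a minimal greedy frame-to-frame tracker that associates detected bounding boxes by intersection-over-union (IoU) looks like this; real SVCAT tracking is presumably more sophisticated:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def track(frames, thresh=0.3):
    """Greedy frame-to-frame association by IoU.

    frames: list of per-frame box lists. Returns a list of tracks,
    each a list of (frame_index, box) pairs.
    """
    tracks = []
    for t, boxes in enumerate(frames):
        unmatched = list(boxes)
        for tr in tracks:
            last_t, last_box = tr[-1]
            if last_t != t - 1 or not unmatched:
                continue                      # track ended or nothing left
            best = max(unmatched, key=lambda b: iou(last_box, b))
            if iou(last_box, best) >= thresh:
                tr.append((t, best))          # extend the track
                unmatched.remove(best)
        for b in unmatched:
            tracks.append([(t, b)])           # start a new track
    return tracks
```

Each resulting track gives the per-frame spatial extent of one object, which is the kind of object-level metadata the model records.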
Findings
Experimental results showed that SVCAT provides accurate object-level video metadata.
Originality/value
The new contribution in this paper introduces an approach of using ImageNet to get shot level annotations automatically. This approach assists video annotators significantly by minimizing the effort required to locate salient objects in the video.
Ushapreethi P and Lakshmi Priya G G
Abstract
Purpose
To develop a successful human action recognition (HAR) system for unmanned environments.
Design/methodology/approach
This paper describes the key technologies of an efficient HAR system. It presents improvements to three key steps of the HAR pipeline (feature extraction, feature description and action classification) to raise the accuracy of existing HAR systems; each step is implemented and analyzed. The use of the implemented HAR system in self-driving cars is summarized. Finally, the results of the HAR system are compared with those of other existing action recognition systems.
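The three-step pipeline can be sketched as a toy skeleton. The functions below are hypothetical stand-ins (a per-frame raw descriptor, nearest-word histogram pooling, and a linear classifier), not the paper's skeleton-based STIP feature or its discriminative sparse descriptor:

```python
import numpy as np

def extract_features(clip):
    # Stand-in for feature extraction: one flattened descriptor per frame.
    return [frame.ravel() for frame in clip]

def describe(descriptors, codebook):
    # Stand-in for the feature descriptor: nearest-word histogram pooling
    # over a (M, d) codebook, normalized to sum to one.
    hist = np.zeros(len(codebook))
    for d in descriptors:
        hist[np.argmin(np.linalg.norm(codebook - d, axis=1))] += 1
    return hist / max(hist.sum(), 1)

def classify(hist, W, b):
    # Linear action classification: argmax of the class scores.
    return int(np.argmax(W @ hist + b))
```

A real system would replace each stand-in while keeping the same extract-describe-classify contract between stages.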
Findings
This paper presents the proposed modifications and improvements to the HAR system, namely a skeleton-based spatiotemporal interest point (STIP) feature, an improved discriminative sparse descriptor for that feature, and linear action classification.
Research limitations/implications
The experiments are carried out on captured benchmark data sets and need to be analyzed in a real-time environment.
Practical implications
The middleware support between the proposed HAR system and the self-driven car system provides several other challenging opportunities in research.
Social implications
The authors’ work offers a step forward for machine vision, especially in self-driving cars.
Originality/value
A method for extracting the new feature and constructing an improved discriminative sparse feature descriptor is introduced.
Zhe Jing, Yan Luo, Xiaotong Li and Xin Xu
Abstract
Purpose
A smart city is a potential solution to the problems caused by the unprecedented speed of urbanization. However, the increasing availability of big data is a challenge for transforming a city into a smart one. Conventional statistics and econometric methods may not work well with big data. One promising direction is to leverage advanced machine learning tools in analyzing big data about cities. In this paper, the authors propose a model to learn region embedding. The learned embedding can be used for more accurate prediction by representing discrete variables as continuous vectors that encode the meaning of a region.
Design/methodology/approach
The authors use the random-walk and skip-gram methods to learn embeddings and to update the preliminary embeddings generated by a graph convolutional network (GCN). They apply this model to a real-world dataset from Manhattan, New York, and use the learned embeddings for crime event prediction.
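The random-walk stage and the (center, context) pairs consumed by a skip-gram model can be sketched as below. The region graph, walk lengths and window size are illustrative assumptions; the GCN initialization and the skip-gram training itself are omitted:

```python
import numpy as np

def random_walks(adj, num_walks=10, walk_len=5, seed=0):
    """Generate uniform random walks over a region graph.

    adj: (n, n) adjacency matrix of regions. Returns a list of walks,
    each a list of node indices, started from every node num_walks times.
    """
    rng = np.random.default_rng(seed)
    walks = []
    for _ in range(num_walks):
        for start in range(adj.shape[0]):
            walk = [start]
            for _ in range(walk_len - 1):
                nbrs = np.flatnonzero(adj[walk[-1]])
                if nbrs.size == 0:
                    break                      # dead end: stop this walk
                walk.append(int(rng.choice(nbrs)))
            walks.append(walk)
    return walks

def skipgram_pairs(walks, window=2):
    """(center, context) training pairs for a skip-gram embedding model."""
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            lo, hi = max(0, i - window), min(len(walk), i + window + 1)
            pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs
```

Regions that co-occur on walks share contexts, so their learned vectors end up close, which is what makes the embeddings useful for downstream prediction such as crime events.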
Findings
This study’s results show that the proposed model can learn multi-dimensional city data more accurately. Thus, it facilitates cities to transform themselves into smarter ones that are more sustainable and efficient.
Originality/value
The authors propose an embedding model that can learn multi-dimensional city data for improving predictive analytics and urban operations. This model can learn more dimensions of city data, reduce the amount of computation and leverage distributed computing for smart city development and transformation.