An ensemble of visual features for Gaussians of local descriptors and non-binary coding for texture descriptors
[abstract] This paper presents an improved version of a recent state-of-the-art texture descriptor called Gaussians of Local Descriptors (GOLD), which uses a multivariate Gaussian to model the distribution of the local features that describe the original image. The full-rank covariance matrix, which lies on a Riemannian manifold, is projected onto the tangent Euclidean space and concatenated with the mean vector to represent a given image. In this paper, we test the following features for describing the original image: the scale-invariant feature transform (SIFT), the histogram of oriented gradients (HOG), and the Weber's law descriptor (WLD). To improve the baseline version of GOLD, we describe the covariance matrix using a set of visual features that are fed into a set of Support Vector Machines (SVMs), which are combined by the sum rule. The scores obtained by an SVM trained on the original GOLD representation and those of the SVMs trained on the visual features are then combined, again by the sum rule. Experiments on a large set of datasets show that the proposed variant outperforms the original GOLD approach. Particularly interesting is the performance obtained on two widely used person re-identification datasets, CAVIAR4REID and IAS, where coupling the proposed GOLD variant with a state-of-the-art ensemble improves performance on both datasets. Moreover, we performed further tests that combine GOLD with non-binary features (local ternary/quinary patterns) and deep transfer learning. The fusion of the SVMs trained with deep features and the SVMs trained with the ternary/quinary coding ensemble is shown to obtain very high performance across datasets.
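The two core operations summarized above, building a GOLD-style vector from local descriptors and fusing classifier scores by the sum rule, can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' MATLAB code. It assumes a log-Euclidean projection (the matrix logarithm, i.e. the tangent space at the identity), a small ridge `eps` added to the covariance, the common √2 weighting of off-diagonal entries, and z-score normalization of each classifier's scores before summing. The function names `gold_embedding` and `sum_rule` are hypothetical.

```python
import numpy as np


def gold_embedding(local_feats: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Map an (n_points, d) array of local descriptors to a GOLD-style vector.

    Fits a Gaussian (mean, full covariance), projects the covariance onto the
    tangent space via the matrix logarithm (assumed log-Euclidean mapping),
    and concatenates the vectorized result with the mean.
    """
    mu = local_feats.mean(axis=0)
    cov = np.atleast_2d(np.cov(local_feats, rowvar=False))
    # Assumed ridge so the covariance is strictly positive definite.
    cov = cov + eps * np.eye(cov.shape[0])
    # Matrix logarithm of an SPD matrix: V diag(log w) V^T.
    w, v = np.linalg.eigh(cov)
    log_cov = (v * np.log(w)) @ v.T
    # Upper triangle; off-diagonal entries scaled by sqrt(2) so the vector's
    # Euclidean norm equals the Frobenius norm of log_cov.
    rows, cols = np.triu_indices(log_cov.shape[0])
    scale = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return np.concatenate([mu, scale * log_cov[rows, cols]])


def sum_rule(score_list: list) -> np.ndarray:
    """Fuse (n_samples, n_classes) score matrices by the sum rule.

    Each classifier's scores are z-normalized first (an assumed choice) so
    that no single SVM dominates the sum.
    """
    fused = np.zeros_like(np.asarray(score_list[0], dtype=float))
    for scores in score_list:
        s = np.asarray(scores, dtype=float)
        fused += (s - s.mean()) / (s.std() + 1e-12)
    return fused


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(200, 3))  # 200 synthetic 3-D local descriptors
    vec = gold_embedding(feats)
    print(vec.shape)  # 3 (mean) + 6 (upper triangle) = 9 entries
```

For d-dimensional local descriptors the vector has d + d(d+1)/2 entries, which is why the covariance part quickly dominates the representation for high-dimensional features such as SIFT. In practice the fused scores would come from SVM decision values (for example one SVM on the GOLD vector and one per visual feature), and the predicted class is the argmax of the fused row.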
The MATLAB code for the ensemble of classifiers and for the feature extraction will be made publicly available to other researchers for future comparisons.
Keywords: Image classification; texture; image processing; ensemble of descriptors; person re-identification