TY  - JOUR
TI  - Wrapper for building classification models using Covering Arrays
T2  - IEEE Access
SP  - 148297
EP  - 148312
AU  - H. Dorado
AU  - C. Cobos
AU  - J. Torres-Jimenez
AU  - D. D. Burra
AU  - M. Mendoza
AU  - D. Jimenez
PY  - 2019
KW  - Feature extraction
KW  - Genetic algorithms
KW  - Arrays
KW  - Search problems
KW  - Buildings
KW  - Testing
KW  - Particle swarm optimization
KW  - Classification algorithms
KW  - Covering arrays
KW  - Random forest
KW  - Support vector machines
DO  - 10.1109/ACCESS.2019.2944641
JO  - IEEE Access
IS  - 1
SN  - 2169-3536
VL  - 7
JA  - IEEE Access
AB  - Wrapper methods are feature selection methods that search for a subset of variables that improves the performance of a classifier by removing redundant and irrelevant variables. Using a wrapper implies that each time a candidate solution is explored, the classifier is evaluated on the selected quality measures (e.g., accuracy or precision). Though robust, this iteration across many candidate solutions can become computationally intensive and time-consuming. In this paper we propose a wrapper based on binary Covering Arrays (CAs) and binary Incremental Covering Arrays (ICAs), which have been widely used for experimental design and for fault detection in software and hardware testing. The new wrapper was evaluated with six classifiers on seven data sets. The results show that CAs and ICAs of strength 6 significantly improve performance and reduce the number of variables required by the classifier. A comparative analysis of the proposed method against wrappers based on other search approaches, such as genetic algorithms (GA) and particle swarm optimization (PSO), shows that the proposed method yields results similar to those of GA but not to those of PSO; the difference in accuracy with respect to PSO is below 0.04 in the majority of cases. This shortfall in accuracy relative to PSO is offset by the fact that the user does not need to fine-tune algorithm parameters such as velocity ranges, timing, cognitive coefficient, and social coefficient, and the proposed wrapper is also much easier to program in parallel.
ER  -