TY  - CHAP
PY  - 2014
SN  - 978-3-319-13646-2
T2  - Human-Inspired Computing and Its Applications
SE  - 9
VL  - 8856
T3  - Lecture Notes in Computer Science
A2  - Gelbukh, Alexander
A2  - Espinoza, Félix Castro
A2  - Galicia-Haro, Sofía N.
DO  - 10.1007/978-3-319-13647-9_9
TI  - Feature Selection Based on Sampling and C4.5 Algorithm to Improve the Quality of Text Classification Using Naïve Bayes
UR  - http://dx.doi.org/10.1007/978-3-319-13647-9_9
PB  - Springer International Publishing
DA  - 2014/01/01
AU  - Molano, Viviana
AU  - Cobos, Carlos
AU  - Mendoza, Martha
AU  - Herrera-Viedma, Enrique
AU  - Manic, Milos
SP  - 80
EP  - 91
LA  - English
AB  - Automatic text classification into predefined categories is an increasingly important task given the vast number of electronic documents available on the Internet and enterprise servers. Successful text classification relies heavily on the vital task of dimensionality reduction, which aims to improve classification accuracy, give greater expression to the classification process, and improve classification computational efficiency. In this paper, two algorithms for feature selection are presented, based on sampling and weighted sampling that build on the C4.5 algorithm. The results demonstrate considerable improvements with regard to classification accuracy - up to 10% - compared to traditional algorithms such as C4.5, Naïve Bayes and Support Vector Machines. The classification process is performed using the Naïve Bayes model in the space of reduced dimensionality. Experiments were carried out using data sets based on the Reuters-21578 collection.
ER  - 