An Empirical Study on Initializing Centroid in K-Means Clustering for Feature Selection

Amit Saxena, John Wang, Wutiphol Sintunavarat
Copyright: © 2021 | Volume: 13 | Issue: 1 | Pages: 16
ISSN: 1942-9045 | EISSN: 1942-9037 | EISBN13: 9781799860648 | DOI: 10.4018/IJSSCI.2021010101
MLA

Saxena, Amit, et al. "An Empirical Study on Initializing Centroid in K-Means Clustering for Feature Selection." IJSSCI vol.13, no.1 2021: pp.1-16. http://doi.org/10.4018/IJSSCI.2021010101

APA

Saxena, A., Wang, J., & Sintunavarat, W. (2021). An Empirical Study on Initializing Centroid in K-Means Clustering for Feature Selection. International Journal of Software Science and Computational Intelligence (IJSSCI), 13(1), 1-16. http://doi.org/10.4018/IJSSCI.2021010101

Chicago

Saxena, Amit, John Wang, and Wutiphol Sintunavarat. "An Empirical Study on Initializing Centroid in K-Means Clustering for Feature Selection," International Journal of Software Science and Computational Intelligence (IJSSCI) 13, no.1: 1-16. http://doi.org/10.4018/IJSSCI.2021010101

Abstract

A central problem in K-means clustering is the choice of initial centroids: a poor choice can cause patterns to be misclustered, which lowers clustering accuracy. Recently, a density- and distance-based technique for determining initial centroids was reported to make clusters converge faster. Motivated by this idea, the authors study the impact of initial centroids on clustering accuracy for unsupervised feature selection. Three metrics are used to rank the features of a dataset. The cluster centroids used in K-means clustering are initialized both randomly and by density- and distance-based approaches. Extensive experiments are performed on 15 datasets. The main finding is that K-means clustering yields higher accuracy on the majority of these datasets when the proposed density- and distance-based approach is used. A practical consequence is that good clustering accuracy can be achieved with fewer features, which can be useful in data mining of datasets with thousands of features.
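
The contrast between random and density- and distance-based initialization can be illustrated with a short sketch. The abstract does not give the paper's exact procedure, so everything below is an assumption made for illustration: the density proxy (inverse mean distance to a few nearest neighbors), the seed score (density times distance to the closest seed already chosen), and the function names `density_distance_init`, `random_init`, and `kmeans` are hypothetical and are not taken from the article.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def density_distance_init(points, k, n_neighbors=3):
    """Hypothetical seeding: prefer points that are dense and far from chosen seeds."""
    # Density proxy (assumption): inverse of the mean distance to the nearest neighbors.
    density = []
    for i, p in enumerate(points):
        d = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        density.append(1.0 / (sum(d[:n_neighbors]) / n_neighbors + 1e-12))
    # First seed: the densest point (ties resolve to the lowest index).
    seeds = [max(range(len(points)), key=lambda i: density[i])]
    while len(seeds) < k:
        # Score (assumption): density times distance to the closest existing seed.
        def score(i):
            return density[i] * min(dist(points[i], points[s]) for s in seeds)
        seeds.append(max(range(len(points)), key=score))
    return [list(points[s]) for s in seeds]


def random_init(points, k, seed=0):
    """Baseline: pick k distinct data points uniformly at random."""
    rng = random.Random(seed)
    return [list(p) for p in rng.sample(points, k)]


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd iterations; returns (labels, centroids, iterations used)."""
    centroids = [list(c) for c in centroids]  # do not mutate the caller's seeds
    labels = None
    for it in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            return labels, centroids, it
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids, max_iter
```

Calling `kmeans(points, density_distance_init(points, k))` and `kmeans(points, random_init(points, k))` on the same data makes it possible to compare the resulting labels and the iteration counts of the two initializations. The sketch covers only the initialization step; the three feature-ranking metrics mentioned in the abstract are not specified there and are not modeled here.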
