Version 1
Received: 28 August 2024 / Approved: 28 August 2024 / Online: 28 August 2024 (09:06:01 CEST)
How to cite:
Truong, N. T.; Nguyen, S. D.; Choi, S.-B. A Strategy of Weak-Connected Grid Search for Noise Filtering and Density Grid-Based Data Clustering. Preprints 2024, 2024082023. https://doi.org/10.20944/preprints202408.2023.v1
APA Style
Truong, N. T., Nguyen, S. D., & Choi, S. B. (2024). A Strategy of Weak-Connected Grid Search for Noise Filtering and Density Grid-Based Data Clustering. Preprints. https://doi.org/10.20944/preprints202408.2023.v1
Chicago/Turabian Style
Truong, N. T., Sy Dzung Nguyen and Seung-Bok Choi. 2024. "A Strategy of Weak-Connected Grid Search for Noise Filtering and Density Grid-Based Data Clustering." Preprints. https://doi.org/10.20944/preprints202408.2023.v1
Abstract
Density-based clustering, including density grid-based clustering, is an efficient data mining tool. However, clusters produced by the density grid-based method commonly contain weakly connected grids that derive mainly from noise. The frequent appearance of such unwanted connections reduces the accuracy of the obtained cluster data space (CDS) and its application efficiency. Here, we present an improvement to overcome this problem. First, we describe the concept of the weak-connected grid cell (WCG) and present a fuzzy-type approximation to depict the density-based distribution of data points at grid nodes. Then, we propose a strategy of searching WCG for density grid-based clustering (SWCG-DGB) to set up a CDS, filter noise, and tune the created CDS. A buffer is deployed during this phase to collect border points and filter noise, which significantly reduces the computational time, especially for noisy datasets. Results from numerical surveys reflect the comparative efficiency of this method in clustering validity, including the accuracy of the number of clusters.
Keywords
Clustering; Density-based clustering; Density grid-based clustering; Fuzzy approximation
Subject
Engineering, Mechanical Engineering
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.