Maxmin Data Range Heuristic-Based Initial Centroid Method of Partitional Clustering for Big Data Mining

OAI: oai:igi-global.com:289954 DOI: 10.4018/IJIRR.289954
Published by: IGI Global

Abstract

Centroid-based clustering algorithms depend on the number of clusters, the initial centroids, the distance measure, and the statistical measure of central tendency. The centroid initialization algorithm determines convergence speed, computing efficiency, execution time, scalability, memory utilization, and overall performance in big data clustering. Researchers have proposed various cluster initialization techniques: some reduce the number of iterations at the cost of lower cluster quality, while others improve cluster quality at the cost of more iterations. For these reasons, this study proposes the Maxmin Data Range Heuristic (MDRH) centroid initialization method for K-Means (KM) clustering, which reduces execution time and the number of iterations and improves cluster quality for big data clustering. The proposed MDRH method is compared against the classical KM and KM++ algorithms on four real datasets. The MDRH method achieves better effectiveness and efficiency on the RS, DB, CH, SC, IS, and CT quantitative measures.
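The abstract does not spell out the MDRH procedure, so the following is only a minimal sketch of the general idea of seeding K-Means from the data's min-max range rather than at random. The function `range_based_init` and its even-spacing rule are hypothetical illustrations, not the method from the paper; `k_means` is a plain Lloyd's iteration that also reports how many iterations it used, which is the quantity an initialization scheme tries to reduce.

```python
import random


def range_based_init(points, k):
    """Place k seeds evenly along each dimension's [min, max] range.

    Illustrative assumption only; this is NOT the paper's MDRH procedure.
    """
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    return [
        tuple(lows[d] + (i + 0.5) * (highs[d] - lows[d]) / k for d in range(dims))
        for i in range(k)
    ]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, centroids, max_iter=100):
    """Lloyd's algorithm; returns (centroids, labels, iterations used)."""
    centroids = list(centroids)
    labels = []
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep the seed of an empty cluster
        if new_centroids == centroids:  # no centroid moved: converged
            return centroids, labels, iteration
        centroids = new_centroids
    return centroids, labels, max_iter


# Usage on synthetic data (three well-separated 2-D blobs).
rng = random.Random(0)
points = [
    (rng.gauss(c, 0.3), rng.gauss(c, 0.3))
    for c in (0.0, 5.0, 10.0)
    for _ in range(50)
]
centroids, labels, iterations = k_means(points, range_based_init(points, 3))
print("iterations:", iterations)
```

Passing `rng.sample(points, 3)` as the seeds instead gives the classical random initialization, so the two iteration counts can be compared on the same data.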