kmodes.Rd
Implement three unsupervised clustering algorithms on categorical datasets.
kmodes(K = 1, datafile = NULL, n_init = 1, algorithm = "KMODES_HUANG", init_method = "KMODES_INIT_RANDOM_SEEDS", seed = 1, shuffle = FALSE)
Argument | Description |
---|---|
K | Number of clusters. Default is 1. |
datafile | Path to a data file. Default is NULL. |
n_init | Number of initializations. Default is 1. |
algorithm | Clustering algorithm to use. Default is "KMODES_HUANG". See details for the options available. |
init_method | Initialization method. Default is "KMODES_INIT_RANDOM_SEEDS". See details for the options available. |
seed | Random number seed. Default is 1. |
shuffle | Indicates whether to shuffle the input order. Default is FALSE. |
Returns a list of clustering results.
Algorithms available:
"KMODES_HUANG"
: MacQueen's algorithm
"KMODES_HARTIGAN_WONG"
: Hartigan and Wong algorithm
"KMODES_LLOYD"
: Lloyd's algorithm
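As a rough illustration of what a Lloyd-style iteration looks like on categorical data, the base R sketch below alternates between assigning every observation to its nearest mode and recomputing each mode from its members. It is not the package's implementation: the helper names `col_mode` and `lloyd_kmodes`, the toy matrix, the 1-based cluster labels, and the use of the simple matching (Hamming) dissimilarity are all assumptions made for the example.

```r
# Toy data: 6 observations, 3 integer-coded categorical attributes (hypothetical).
x <- matrix(c(1, 1, 2,
              1, 1, 2,
              1, 2, 2,
              3, 3, 1,
              3, 3, 1,
              3, 1, 1), ncol = 3, byrow = TRUE)

# Most frequent category of a vector (ties go to the smallest value).
col_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[which.max(counts)])
}

# Lloyd-style k-modes sketch: batch assignment, then batch mode update.
lloyd_kmodes <- function(x, modes, max_iter = 100) {
  k <- nrow(modes)
  cluster_id <- rep(0L, nrow(x))
  for (iter in seq_len(max_iter)) {
    # Assignment step: count mismatches between each observation and each mode.
    d <- sapply(seq_len(k), function(j) {
      rowSums(x != matrix(modes[j, ], nrow(x), ncol(x), byrow = TRUE))
    })
    new_id <- max.col(-d, ties.method = "first")
    if (identical(new_id, cluster_id)) break  # assignments stopped changing
    cluster_id <- new_id
    # Update step: recompute each mode from its current members.
    for (j in seq_len(k)) {
      members <- x[cluster_id == j, , drop = FALSE]
      if (nrow(members) > 0) modes[j, ] <- apply(members, 2, col_mode)
    }
  }
  list(cluster_id = cluster_id, modes = modes)
}

# Start from observations 1 and 4 as initial modes.
fit <- lloyd_kmodes(x, modes = x[c(1, 4), , drop = FALSE])
```

MacQueen-style variants differ mainly in updating a mode as soon as an observation moves, rather than once per full pass over the data.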
Initialization methods available:
"KMODES_INIT_RANDOM_SEEDS"
: Random sampling.
"KMODES_INIT_H97_RANDOM"
: Huang1997, randomized version.
"KMODES_INIT_HD17"
: Huang1997, as interpreted by Python author de Vos.
"KMODES_INIT_CLB09_RANDOM"
: Cao2009, randomized version.
"KMODES_INIT_AV07"
: K-means++ adapted.
"KMODES_INIT_AV07_GREEDY"
: K-means++ greedy adapted.
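The simplest of these options, random sampling, can be pictured as drawing K observations and using them as the initial modes. The base R sketch below shows that idea only; the toy matrix and variable names are hypothetical, and whether the package samples without replacement or handles duplicate observations in some other way is an assumption not confirmed here.

```r
# Toy data: 6 observations, 3 integer-coded categorical attributes (hypothetical).
x <- matrix(c(1, 1, 2,
              1, 1, 2,
              1, 2, 2,
              3, 3, 1,
              3, 3, 1,
              3, 1, 1), ncol = 3, byrow = TRUE)

K <- 2
set.seed(1)                       # fixes the draw, in the spirit of the seed argument

# Draw K distinct row indices without replacement and use those rows as modes.
seed_rows <- sample(nrow(x), K)
init_modes <- x[seed_rows, , drop = FALSE]

# Distinct rows can still carry identical values (rows 1 and 2 above),
# so a careful implementation may need to check for duplicate modes.
```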
Value:
"best_cluster_size"
: Number of observations in each cluster of the best initialization.
"best_criterion"
: Optimized criterion in each cluster of the best initialization.
"best_cluster_id"
: Cluster assignment of each observation of the best initialization.
"best_modes"
: Estimated modes for each cluster of the best initialization.
"best_seed_index"
: Seed index of the best initialization.
"total_best_criterion"
: Total optimized criterion of the best initialization.
"clsuter_size"
: Number of clusters.
"data_dim"
: Dimension of input data.
"data"
: The input data.
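To make the returned quantities more concrete, the base R sketch below recomputes cluster sizes, modes, and a per-cluster criterion from a data matrix and a cluster assignment. It assumes the criterion is the total simple matching (Hamming) dissimilarity between observations and their cluster mode, as in Huang (1998); the toy data, the 1-based labels, and the helper `col_mode` are hypothetical and may not match the package's internal conventions.

```r
# Toy data: 6 observations, 3 integer-coded categorical attributes (hypothetical).
x <- matrix(c(1, 1, 2,
              1, 1, 2,
              1, 2, 2,
              3, 3, 1,
              3, 3, 1,
              3, 1, 1), ncol = 3, byrow = TRUE)
cluster_id <- c(1, 1, 1, 2, 2, 2)   # hypothetical assignment, 1-based
clusters <- sort(unique(cluster_id))

# Most frequent category of a vector (ties go to the smallest value).
col_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[which.max(counts)])
}

# Analogue of best_modes: one row of per-attribute modes for each cluster.
modes <- t(sapply(clusters, function(k) {
  apply(x[cluster_id == k, , drop = FALSE], 2, col_mode)
}))

# Analogue of best_criterion: mismatches between members and their mode.
criterion <- sapply(clusters, function(k) {
  members <- x[cluster_id == k, , drop = FALSE]
  sum(t(members) != modes[k, ])
})

# Analogues of best_cluster_size and total_best_criterion.
cluster_size <- as.vector(table(cluster_id))
total_criterion <- sum(criterion)
```

Under this assumption, comparing the total criterion across initializations is what identifies the best one: a lower total means fewer attribute mismatches overall.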
Lloyd S (1982). “Least squares quantization in PCM.” Information Theory, IEEE Transactions on, 28(2), 129-137.
MacQueen J (1967). “Some methods for classification and analysis of multivariate observations.” In Cam LML, Neyman J (eds.), Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, volume 1, 281-297.
Huang Z (1998). “Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values.” Data Min. Knowl. Discov., 2, 283-304.
Huang Z (1997). “A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining.” Proceedings of the SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, 28, 1-8.
Hartigan JA (1975). Clustering Algorithms. John Wiley & Sons.
# Clustering with three initializations using the default algorithm ("KMODES_HUANG")
datFile <- system.file("extdata", "zoo.int.data", package = "CClust")
res_kmodes <- kmodes(K = 5, datafile = datFile, n_init = 3, shuffle = TRUE)

# Clustering with the Hartigan and Wong algorithm and the K-means++ greedy adapted initialization method
res_kmodes <- kmodes(K = 5, datafile = datFile, algorithm = "KMODES_HARTIGAN_WONG", init_method = "KMODES_INIT_AV07_GREEDY")