Simulate clustered categorical datasets using a continuous-time Markov chain (CTMC).

simulator(simK, n_coordinates, n_observations,
n_categories, sim_between_t, sim_within_t, use_dirichlet = FALSE, sim_pi)

Arguments

simK

Number of clusters.

n_coordinates

Number of coordinates in the simulated dataset.

n_observations

Number of observations in the simulated dataset.

n_categories

Number of categories in the simulated dataset.

sim_between_t

Between-cluster variation.

sim_within_t

Within-cluster variation.

use_dirichlet

Logical; whether to simulate datasets with a Dirichlet prior. Default is FALSE.

sim_pi

Mixing proportions: a numeric vector whose length equals the number of clusters (simK) and whose values sum to 1.
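
The constraints on sim_pi can be checked before calling simulator. The following is a minimal sketch; check_sim_pi and its tol argument are hypothetical helpers for illustration and are not part of the package.

```r
# Hypothetical helper (not part of the package): validate mixing proportions.
check_sim_pi <- function(sim_pi, simK, tol = 1e-8) {
  # One proportion per cluster.
  stopifnot(is.numeric(sim_pi), length(sim_pi) == simK)
  # Proportions must be non-negative and sum to 1 (up to floating-point error).
  stopifnot(all(sim_pi >= 0), abs(sum(sim_pi) - 1) < tol)
  invisible(TRUE)
}

sim_pi <- c(0.1, 0.1, 0.2, 0.3, 0.3)
check_sim_pi(sim_pi, simK = 5)
```

A tolerance is used rather than an exact comparison because sums of decimal fractions are not always exactly 1 in floating-point arithmetic.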

Value

A list of simulation results.

Details

The returned list contains the following elements:

  • "CTMC_probabilities": Number of observations in each cluster of the best initialization.

  • "modes": Simulated modes.

  • "cluster_assignments": Simulated cluster assignments.

  • "cluster_sizes": Simulated cluster sizes.

  • "data": Simulated data.
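
As background, the sketch below shows how an evolution time can control variation in a CTMC over categories. It assumes an equal-rate model in which every category switches to every other category at rate mu; this model, the function ctmc_transition, and the parameter mu are assumptions made for illustration and may differ from what simulator does internally.

```r
# Assumed model: equal-rate CTMC on n_categories states (not confirmed by the package).
# For this rate matrix, the transition matrix P(t) = exp(Q t) has a closed form.
ctmc_transition <- function(n_categories, t, mu = 1) {
  decay <- exp(-n_categories * mu * t)
  # Probability of ending in any specific other category.
  P <- matrix((1 - decay) / n_categories, n_categories, n_categories)
  # Probability of ending in the starting category.
  diag(P) <- 1 / n_categories + (n_categories - 1) / n_categories * decay
  P
}

set.seed(1)
P <- ctmc_transition(n_categories = 4, t = 1)

# A hypothetical cluster mode over 10 coordinates, and one observation evolved from it.
cluster_mode <- sample(4, size = 10, replace = TRUE)
obs <- vapply(cluster_mode, function(s) sample(4, 1, prob = P[s, ]), integer(1))
```

Under this assumed model, a time near 0 leaves the observation identical to the mode, while a large time makes every category equally likely, so a longer time means more variation.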

Examples

# Simulate a 100 x 10 dataset with 4 categories and 5 true clusters.
data <- simulator(simK = 5, n_coordinates = 10, n_observations = 100,
                  n_categories = 4, sim_between_t = 2, sim_within_t = 1,
                  use_dirichlet = TRUE, sim_pi = c(0.1, 0.1, 0.2, 0.3, 0.3))