| Title: | Proximity Measure Based Diagnostics for Standard, Soft, and Multi-Way Clustering |
|---|---|
| Description: | Quantifies clustering quality by measuring both cohesion within clusters and separation between clusters. Implements advanced silhouette width computations for diverse clustering structures, including: simplified silhouette (Van der Laan et al., 2003) <doi:10.1080/0094965031000136012>, Probability of Alternative Cluster normalization methods (Raymaekers & Rousseeuw, 2022) <doi:10.1080/10618600.2022.2050249>, fuzzy clustering and silhouette diagnostics using membership probabilities (Campello & Hruschka, 2006; Menardi, 2011; Bhat & Kiruthika, 2024) <doi:10.1016/j.fss.2006.07.006>, <doi:10.1007/s11222-010-9169-0>, <doi:10.1080/23737484.2024.2408534>, and multi-way clustering extensions such as block and tensor clustering (Schepers et al., 2008; Bhat & Kiruthika, 2025) <doi:10.1007/s00357-008-9005-9>, <doi:10.21203/rs.3.rs-6973596/v1>. Provides tools for computation and visualization (Rousseeuw, 1987) <doi:10.1016/0377-0427(87)90125-7> to support robust and reproducible cluster diagnostics across standard, soft, and multi-way clustering settings. |
| Authors: | Shrikrishna Bhat K [aut, cre, cph] (ORCID: <https://orcid.org/0009-0000-6180-5783>), Kiruthika C [aut] (ORCID: <https://orcid.org/0000-0001-9655-702X>) |
| Maintainer: | Shrikrishna Bhat K <[email protected]> |
| License: | GPL-2 |
| Version: | 0.9.6 |
| Built: | 2026-05-21 09:44:14 UTC |
| Source: | https://github.com/kskbhat/silhouette |
Computes all possible silhouette indices from available functions in the package and returns a summary data frame comparing crisp, fuzzy, and median silhouette values across different methods.
calSilhouette(
  prox_matrix = NULL,
  proximity_type = c("dissimilarity", "similarity"),
  prob_matrix = NULL,
  a = 2,
  print.summary = FALSE,
  clust_fun = NULL,
  ...
)
prox_matrix |
A numeric matrix where rows represent observations and columns represent proximity measures (e.g., distances or similarities) to clusters. Typically, this is a membership or dissimilarity matrix from clustering results. If |
proximity_type |
Character string specifying the type of proximity measure in |
prob_matrix |
A numeric matrix of cluster membership probabilities, where rows represent
observations and columns represent clusters (depending on |
a |
Numeric value controlling the fuzzifier or weight scaling in fuzzy silhouette averaging. Higher values increase the emphasis on strong membership differences. Must be positive. Defaults to 2. |
print.summary |
Logical; if |
clust_fun |
Optional S3 or S4 function object or function as character string specifying a clustering function that produces the proximity measure matrix. For example, |
... |
Additional arguments passed to |
This function computes all available silhouette methods from the package and returns a comparative summary. The methods included depend on the available input matrices:
If prox_matrix is available:
medoid - Medoid-based silhouette using Silhouette
pac - PAC-based silhouette using Silhouette
If prob_matrix is available:
pp_pac - Posterior probabilities silhouette with PAC method using softSilhouette
pp_medoid - Posterior probabilities silhouette with Medoid method using softSilhouette
nlpp_pac - Negative log posterior probabilities silhouette with PAC method using softSilhouette
nlpp_medoid - Negative log posterior probabilities silhouette with Medoid method using softSilhouette
pd_pac - Probability distribution silhouette with PAC method using softSilhouette
pd_medoid - Probability distribution silhouette with Medoid method using softSilhouette
cer - Certainty-based silhouette using cerSilhouette
db - Density-based silhouette using dbSilhouette
At least one of prox_matrix or prob_matrix must be provided.
A data frame with the following columns:
Character vector of method names
Numeric vector of crisp (unweighted) average silhouette values
Numeric vector of fuzzy (weighted) average silhouette values (NA if prob_matrix is not available for the method)
Numeric vector of median silhouette values
Silhouette, softSilhouette, dbSilhouette, cerSilhouette
if (requireNamespace("ppclust", quietly = TRUE)) {
  # Example with FCM clustering
  library(ppclust)
  data(iris)
  fcm_result <- fcm(iris[, -5], centers = 3)

  # Using matrices directly
  summary_result <- calSilhouette(
    prox_matrix = fcm_result$d,
    prob_matrix = fcm_result$u,
    proximity_type = "dissimilarity",
    print.summary = TRUE
  )
}

if (requireNamespace("ppclust", quietly = TRUE)) {
  # Using clustering function
  summary_result2 <- calSilhouette(
    prox_matrix = "d",
    prob_matrix = "u",
    proximity_type = "dissimilarity",
    clust_fun = ppclust::fcm,
    x = iris[, -5],
    centers = 3,
    print.summary = TRUE
  )
}
Computes silhouette widths using the maximum posterior probability of each observation as its silhouette width.
cerSilhouette(
  prob_matrix,
  average = c("crisp", "fuzzy", "median"),
  a = 2,
  sort = FALSE,
  print.summary = FALSE,
  clust_fun = NULL,
  ...
)
prob_matrix |
A numeric matrix of posterior probabilities where rows represent observations and columns represent clusters. Must sum to 1 by row. If |
average |
Character string specifying the method for computing the average silhouette width. Options are "crisp", "fuzzy", or "median".
Defaults to "crisp". |
a |
Numeric value controlling the fuzzifier or weight scaling in fuzzy silhouette averaging. Higher values increase the emphasis on strong membership differences. Must be positive. Defaults to 2. |
sort |
Logical; if |
print.summary |
Logical; if |
clust_fun |
Optional S3 or S4 function object or function as character string specifying a clustering function that produces the proximity matrix. For example, |
... |
Additional arguments passed to |
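Let the posterior probability (cluster membership) matrix be
\[ \Gamma = [\gamma_{ik}]_{n \times K}, \]
with \(\gamma_{ik}\) representing the probability of observation \(i\) belonging to cluster \(k\). For observation \(i\), let \(k_i\) denote the cluster with the largest posterior probability and \(k'_i\) the cluster with the second largest.
The certainty silhouette width for observation \(i\) is:
\[ S(x_i) = \gamma_{i k_i} = \max_{k} \gamma_{ik} \]
If average = "crisp", the crisp silhouette index (\(CS\)) is calculated as:
\[ CS = \frac{1}{n} \sum_{i=1}^{n} S(x_i), \]
summarizing overall clustering quality.
If average = "fuzzy", the fuzzy silhouette index (\(FS\)) is calculated from prob_matrix (\(\Gamma\)) as:
\[ FS = \frac{\sum_{i=1}^{n} w_i \, S(x_i)}{\sum_{i=1}^{n} w_i} \]
where \(w_i = (\gamma_{i k_i} - \gamma_{i k'_i})^{\alpha}\) is the weight and \(\alpha\) (the a argument) controls the emphasis on confident assignments.
If average = "median", the median of the silhouette widths is reported.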
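As a rough illustration of the three averaging options, the base R sketch below works on a made-up 4 x 3 probability matrix. It is not the package's implementation. It assumes that the certainty width of an observation is its largest membership probability, and that the fuzzy weight is the gap between the two largest memberships raised to the power a; the object names (prob, cer_width, w) are chosen for this example only.

```r
# Toy posterior probability matrix: 4 observations, 3 clusters (rows sum to 1).
prob <- matrix(c(0.80, 0.15, 0.05,
                 0.40, 0.35, 0.25,
                 0.10, 0.70, 0.20,
                 0.30, 0.30, 0.40),
               ncol = 3, byrow = TRUE)

# Certainty width (assumed form): the largest posterior probability per row.
cer_width <- apply(prob, 1, max)

# Fuzzy weights (assumed form): gap between the two largest memberships,
# raised to the power a.
a <- 2
top_two <- t(apply(prob, 1, function(p) sort(p, decreasing = TRUE)[1:2]))
w <- (top_two[, 1] - top_two[, 2])^a

crisp_avg  <- mean(cer_width)                  # unweighted mean
fuzzy_avg  <- sum(w * cer_width) / sum(w)      # weighted mean
median_avg <- median(cer_width)                # median width

print(c(crisp = crisp_avg, fuzzy = fuzzy_avg, median = median_avg))
```

In this toy case the fuzzy average exceeds the crisp one, because observations with an ambiguous assignment (rows 2 and 4) receive weights close to zero.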
A data frame of class "Silhouette" containing cluster assignments, nearest neighbor clusters, silhouette widths for each observation, and weights (for fuzzy clustering). The object includes the following attributes:
proximity_type: The proximity type used, i.e., "similarity".
method: The silhouette calculation method used, i.e., "certainty".
average: Character; the averaging method: "crisp", "fuzzy", or "median".
Bhat Kapu, S., & Kiruthika. (2024). Some density-based silhouette diagnostics for soft clustering algorithms. Communications in Statistics: Case Studies, Data Analysis and Applications, 10(3-4), 221-238. doi:10.1080/23737484.2024.2408534
Silhouette, softSilhouette, dbSilhouette, getSilhouette, is.Silhouette, plotSilhouette
# Compare two soft clustering algorithms using cerSilhouette
# Example: FCM vs. FCM2 on iris data, using average silhouette width as a criterion
data(iris)
if (requireNamespace("ppclust", quietly = TRUE)) {
  fcm_result <- ppclust::fcm(iris[, 1:4], 3)
  out_fcm <- cerSilhouette(prob_matrix = fcm_result$u, print.summary = TRUE)
  plot(out_fcm)
  sfcm <- summary(out_fcm, print.summary = FALSE)
} else {
  message("Install 'ppclust' to run this example: install.packages('ppclust')")
}

if (requireNamespace("ppclust", quietly = TRUE)) {
  fcm2_result <- ppclust::fcm2(iris[, 1:4], 3)
  out_fcm2 <- cerSilhouette(prob_matrix = fcm2_result$u, print.summary = TRUE)
  plot(out_fcm2)
  sfcm2 <- summary(out_fcm2, print.summary = FALSE)
} else {
  message("Install 'ppclust' to run this example: install.packages('ppclust')")
}

# Compare average silhouette widths of fcm and fcm2
if (requireNamespace("ppclust", quietly = TRUE)) {
  cat("FCM average silhouette width:", sfcm$avg.width, "\n",
      "FCM2 average silhouette width:", sfcm2$avg.width, "\n")
}
Computes silhouette widths based on Menardi (2011) density-based method using log-ratios of posterior probabilities.
dbSilhouette(
  prob_matrix,
  average = c("median", "crisp", "fuzzy"),
  a = 2,
  sort = FALSE,
  print.summary = FALSE,
  clust_fun = NULL,
  ...
)
prob_matrix |
A numeric matrix of posterior probabilities where rows represent observations and columns represent clusters. Must sum to 1 by row. If |
average |
Character string specifying the method for computing the average silhouette width. Options are "median", "crisp", or "fuzzy".
Defaults to "median". |
a |
Numeric value controlling the fuzzifier or weight scaling in fuzzy silhouette averaging. Higher values increase the emphasis on strong membership differences. Must be positive. Defaults to 2. |
sort |
Logical; if |
print.summary |
Logical; if |
clust_fun |
Optional S3 or S4 function object or function as character string specifying a clustering function that produces the proximity matrix. For example, |
... |
Additional arguments passed to |
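Let the posterior probability (cluster membership) matrix be
\[ \Gamma = [\gamma_{ik}]_{n \times K}, \]
with \(\gamma_{ik}\) representing the probability of observation \(i\) belonging to cluster \(k\). For observation \(i\), let \(k_i\) denote the cluster with the largest posterior probability and \(k'_i\) the cluster with the second largest.
The density-based silhouette width for observation \(i\) is:
\[ dbs_i = \frac{\log\left(\gamma_{i k_i} / \gamma_{i k'_i}\right)}{\max_{j = 1, \dots, n} \left| \log\left(\gamma_{j k_j} / \gamma_{j k'_j}\right) \right|} \]
If average = "crisp", the crisp silhouette index (\(CS\)) is calculated as:
\[ CS = \frac{1}{n} \sum_{i=1}^{n} dbs_i, \]
summarizing overall clustering quality.
If average = "fuzzy", the fuzzy silhouette index (\(FS\)) is calculated from prob_matrix (\(\Gamma\)) as:
\[ FS = \frac{\sum_{i=1}^{n} w_i \, dbs_i}{\sum_{i=1}^{n} w_i} \]
where \(w_i = (\gamma_{i k_i} - \gamma_{i k'_i})^{\alpha}\) is the weight and \(\alpha\) (the a argument) controls the emphasis on confident assignments.
If average = "median", the median of the silhouette widths is reported.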
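The log-ratio construction can be sketched in base R as follows. This is an illustration of the general idea in Menardi (2011), not the package's code. It assumes the width is the log of the ratio between the largest and second-largest posterior probability of an observation, scaled by the largest absolute log-ratio in the data; the matrix and object names are invented for the example.

```r
# Toy posterior probability matrix: 4 observations, 3 clusters (rows sum to 1).
prob <- matrix(c(0.80, 0.15, 0.05,
                 0.40, 0.35, 0.25,
                 0.10, 0.70, 0.20,
                 0.30, 0.30, 0.40),
               ncol = 3, byrow = TRUE)

# Largest and second-largest posterior probability for each observation.
top_two <- t(apply(prob, 1, function(p) sort(p, decreasing = TRUE)[1:2]))

# Log-ratio of the two, then scale by the largest absolute log-ratio
# so that the widths are comparable across observations.
log_ratio <- log(top_two[, 1] / top_two[, 2])
db_width  <- log_ratio / max(abs(log_ratio))

print(round(db_width, 3))
print(median(db_width))  # the "median" summary of the widths
```

A probability of exactly zero in the second-largest position would make the log-ratio infinite, so real inputs may need a small floor on the probabilities before this step.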
A data frame of class "Silhouette" containing cluster assignments, nearest neighbor clusters, silhouette widths for each observation, and weights (for fuzzy clustering). The object includes the following attributes:
proximity_type: The proximity type used, i.e., "similarity".
method: The silhouette calculation method used, i.e., "db".
average: Character; the averaging method: "crisp", "fuzzy", or "median".
Menardi, G. (2011). Density-based silhouette diagnostics for clustering methods. Statistics and Computing, 21(3), 295–308. doi:10.1007/s11222-010-9169-0
Silhouette, softSilhouette, cerSilhouette, getSilhouette, is.Silhouette, plotSilhouette
# Compare two soft clustering algorithms using dbSilhouette
# Example: FCM vs. FCM2 on iris data, using average silhouette width as a criterion
data(iris)
if (requireNamespace("ppclust", quietly = TRUE)) {
  fcm_result <- ppclust::fcm(iris[, 1:4], 3)
  out_fcm <- dbSilhouette(prob_matrix = fcm_result$u, print.summary = TRUE)
  plot(out_fcm)
  sfcm <- summary(out_fcm, print.summary = FALSE)
} else {
  message("Install 'ppclust' to run this example: install.packages('ppclust')")
}

if (requireNamespace("ppclust", quietly = TRUE)) {
  fcm2_result <- ppclust::fcm2(iris[, 1:4], 3)
  out_fcm2 <- dbSilhouette(prob_matrix = fcm2_result$u, print.summary = TRUE)
  plot(out_fcm2)
  sfcm2 <- summary(out_fcm2, print.summary = FALSE)
} else {
  message("Install 'ppclust' to run this example: install.packages('ppclust')")
}

# Compare average silhouette widths of fcm and fcm2
if (requireNamespace("ppclust", quietly = TRUE)) {
  cat("FCM average silhouette width:", sfcm$avg.width, "\n",
      "FCM2 average silhouette width:", sfcm2$avg.width, "\n")
}
Computes an extended silhouette width for multi-way clustering (e.g., biclustering, triclustering, or n-mode tensor clustering) by combining silhouette widths from a list of Silhouette objects, each representing one mode of clustering. The extended silhouette width is the weighted average of the average silhouette widths from each mode, weighted by the number of observations in each mode's silhouette analysis. The output is an object of class extSilhouette.
extSilhouette(sil_list, dim_names = NULL, print.summary = FALSE)
sil_list |
A list of objects of class |
dim_names |
An optional character vector of dimension names (e.g., |
print.summary |
Logical; if |
The extended silhouette width is computed as:
\[ ExtS = \frac{\sum_{m=1}^{M} n_m \, \bar{S}_m}{\sum_{m=1}^{M} n_m} \]
where \(M\) is the number of modes, \(n_m\) is the number of observations in mode \(m\) (derived from nrow(x$widths)), and \(\bar{S}_m\) is the average silhouette width for that mode (from x$avg.width).
Each Silhouette object in sil_list must contain a non-empty widths data frame and a numeric avg.width value. Modes with zero observations (\(n_m = 0\)) are not allowed, as they would result in an undefined weighted average. For consistency, make sure all Silhouette objects are derived from the same method and arguments.
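The weighted average described above amounts to a single line of arithmetic. The sketch below uses invented numbers for a two-mode case (150 rows and 4 columns, as with a co-clustering of the iris measurements); the average widths 0.60 and 0.30 are placeholders, not results from any real fit.

```r
# Hypothetical per-mode summaries: number of observations and average width.
n_obs     <- c(rows = 150, cols = 4)
avg_width <- c(rows = 0.60, cols = 0.30)

# A mode with zero observations would make the weighted average undefined.
stopifnot(all(n_obs > 0))

# Extended silhouette width: average widths weighted by mode size.
ext_width <- sum(n_obs * avg_width) / sum(n_obs)

print(ext_width)
```

Because the row mode has far more observations than the column mode, the result stays close to the row-mode average; base R's weighted.mean(avg_width, n_obs) gives the same value.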
A list of class "extSilhouette" with the following components:
A numeric scalar representing the extended silhouette width.
A data frame with columns dimension (e.g., "Mode 1", "Mode 2"), n_obs (number of observations), and avg_sil_width (average silhouette width for each mode).
Schepers, J., Ceulemans, E., & Van Mechelen, I. (2008). Selecting among multi-mode partitioning models of different complexities: A comparison of four model selection criteria. Journal of Classification, 25(1), 67–85. doi:10.1007/s00357-008-9005-9
Bhat Kapu, S., & Kiruthika, C. (2025). Block Probabilistic Distance Clustering: A Unified Framework and Evaluation. PREPRINT (Version 1) available at Research Square. doi:10.21203/rs.3.rs-6973596/v1
Silhouette, softSilhouette, dbSilhouette, cerSilhouette, getSilhouette, is.Silhouette
# Example using iris dataset with two modes
data(iris)
if (requireNamespace("blockcluster", quietly = TRUE)) {
  library(blockcluster)
  result <- coclusterContinuous(
    as.matrix(iris[, -5]),
    nbcocluster = c(3, 2)
  )
} else {
  message("Install 'blockcluster': install.packages('blockcluster')")
}

if (requireNamespace("blockcluster", quietly = TRUE)) {
  sil_mode1 <- softSilhouette(
    prob_matrix = result@rowposteriorprob,
    method = "pac"
  )
  sil_mode2 <- softSilhouette(
    prob_matrix = result@colposteriorprob,
    method = "pac"
  )

  # Extended silhouette
  ext_sil <- extSilhouette(list(sil_mode1, sil_mode2), print.summary = TRUE)
}
Constructs a Silhouette class object directly from user-provided components without performing silhouette calculations. This function allows users to build a Silhouette object when they already have the necessary components.
getSilhouette(
  cluster,
  neighbor,
  sil_width,
  weight = NULL,
  proximity_type = c("dissimilarity", "similarity"),
  method = NA,
  average = c("crisp", "fuzzy", "median")
)
cluster |
Numeric or integer vector of cluster assignments for each observation |
neighbor |
Numeric or integer vector of nearest neighbor cluster assignments for each observation |
sil_width |
Numeric vector of silhouette widths for each observation (must be between -1 and +1) |
weight |
Numeric vector of weights for each observation (must be between 0 and 1, only used when average = "fuzzy") |
proximity_type |
Character; the proximity type used. Options: "similarity" or "dissimilarity" |
method |
Character; the silhouette calculation method used (default: NA, can be any custom name) |
average |
Character; the averaging method. Options: "crisp", "fuzzy", or "median" |
A data frame of class "Silhouette" containing cluster assignments, nearest neighbor clusters, silhouette widths for each observation, and weights (for fuzzy clustering). The object includes the following attributes:
proximity_type: The proximity type used ("similarity" or "dissimilarity").
method: The silhouette calculation method used (for example "medoid" or "pac", or any custom name supplied by the user).
average: Character; the averaging method: "crisp", "fuzzy", or "median".
Silhouette, softSilhouette, dbSilhouette, cerSilhouette, is.Silhouette, plotSilhouette
# Create a simple crisp Silhouette object (3 columns)
cluster_assignments <- c(1, 1, 2, 2, 3, 3)
neighbor_clusters <- c(2, 2, 1, 1, 1, 1)
silhouette_widths <- c(0.8, 0.7, 0.6, 0.9, 0.5, 0.4)

sil_obj <- getSilhouette(
  cluster = cluster_assignments,
  neighbor = neighbor_clusters,
  sil_width = silhouette_widths,
  proximity_type = "dissimilarity",
  method = "medoid",
  average = "crisp"
)
sil_obj

# Create a fuzzy Silhouette object with weights (4 columns)
weights <- c(0.9, 0.8, 0.7, 0.95, 0.6, 0.5)
sil_fuzzy <- getSilhouette(
  cluster = cluster_assignments,
  neighbor = neighbor_clusters,
  sil_width = silhouette_widths,
  weight = weights,
  proximity_type = "similarity",
  method = "pac",
  average = "fuzzy"
)
sil_fuzzy

# Custom method name
sil_custom <- getSilhouette(
  cluster = cluster_assignments,
  neighbor = neighbor_clusters,
  sil_width = silhouette_widths,
  proximity_type = "dissimilarity",
  method = "my_custom_method",
  average = "crisp"
)
sil_custom
Tests whether an object is of class "Silhouette". This function checks both the class inheritance and the expected structure of a Silhouette object.
is.Silhouette(x, strict = FALSE)
x |
An object to test |
strict |
Logical; if TRUE, performs additional structural validation beyond just class checking (default: FALSE) |
When strict = FALSE, the function only checks if the object inherits
from the "Silhouette" class.
When strict = TRUE, the function additionally validates:
Object is a data frame
Has required columns: cluster, neighbor, sil_width
Has required attributes: proximity_type, method, average
Column types are appropriate (integer for cluster/neighbor, numeric for sil_width)
The Silhouette object attributes are validated as follows:
proximity_type: Must be one of "dissimilarity" or "similarity"
average: Must be one of "crisp", "fuzzy", or "median"
method: Can be NULL or any string
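The kind of structural check listed above can be approximated in base R. The helper below, looks_like_silhouette, is a hypothetical name introduced for this sketch and is not part of the package; it also relaxes the column-type rule to "numeric" for all three columns, which is an assumption rather than the documented behaviour.

```r
# Hypothetical helper: a loose imitation of a strict structural check.
looks_like_silhouette <- function(x) {
  inherits(x, "Silhouette") &&
    is.data.frame(x) &&
    all(c("cluster", "neighbor", "sil_width") %in% names(x)) &&
    is.numeric(x$cluster) && is.numeric(x$neighbor) &&
    is.numeric(x$sil_width) &&
    isTRUE(attr(x, "proximity_type") %in% c("dissimilarity", "similarity")) &&
    isTRUE(attr(x, "average") %in% c("crisp", "fuzzy", "median"))
}

# A hand-built object carrying the expected class, columns, and attributes.
good <- structure(
  data.frame(cluster = c(1L, 2L), neighbor = c(2L, 1L), sil_width = c(0.5, 0.7)),
  class = c("Silhouette", "data.frame"),
  proximity_type = "dissimilarity",
  method = "medoid",
  average = "crisp"
)

# A plain data frame without the class or attributes.
bad <- data.frame(a = 1, b = 2)

print(looks_like_silhouette(good))
print(looks_like_silhouette(bad))
```

The checks are chained with && so that later tests, such as the column-type tests, are only evaluated once the earlier ones have passed.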
Logical; TRUE if the object is of class "Silhouette", FALSE otherwise
Silhouette, softSilhouette, dbSilhouette, cerSilhouette, getSilhouette, plotSilhouette
# Create a Silhouette object
cluster_assignments <- c(1, 1, 2, 2, 3, 3)
neighbor_clusters <- c(2, 2, 1, 1, 1, 1)
silhouette_widths <- c(0.8, 0.7, 0.6, 0.9, 0.5, 0.4)

sil_obj <- getSilhouette(
  cluster = cluster_assignments,
  neighbor = neighbor_clusters,
  sil_width = silhouette_widths,
  proximity_type = "dissimilarity",
  method = "medoid",
  average = "crisp"
)

# Test if object is Silhouette
is.Silhouette(sil_obj)                  # TRUE
is.Silhouette(sil_obj, strict = TRUE)   # TRUE

# Test with non-Silhouette objects
is.Silhouette(data.frame(a = 1, b = 2)) # FALSE
is.Silhouette(matrix(1:10, ncol = 2))   # FALSE
is.Silhouette(list(a = 1, b = 2))       # FALSE
is.Silhouette(NULL)                     # FALSE
Creates a silhouette plot for visualizing the silhouette widths of clustering results, with bars colored by cluster and an optional summary of cluster statistics in the legend.
plotSilhouette(
  x,
  label = FALSE,
  summary.legend = TRUE,
  grayscale = FALSE,
  linetype = c("dashed", "solid", "dotted", "dotdash", "longdash", "twodash"),
  ...
)
x |
An object of class |
label |
Logical; if |
summary.legend |
Logical; if |
grayscale |
Logical; if |
linetype |
Character or numeric value specifying the type of line to be used for the horizontal reference line indicating the average silhouette width. Accepts standard ggplot2 linetype values, such as "dashed", "solid", "dotted", "dotdash", "longdash", and "twodash".
Defaults to "dashed". |
... |
Additional arguments passed to |
The Silhouette plot displays the silhouette width (sil_width) for each observation, grouped by cluster, with bars sorted by cluster and descending silhouette width. The summary.legend option adds cluster sizes and average silhouette widths to the legend.
This function is a replica of the S3 plot method for objects of class "Silhouette", typically produced by the Silhouette, softSilhouette, dbSilhouette, or cerSilhouette functions in this package. It also supports objects of the following classes, with silhouette information extracted from their respective components:
"eclust": Produced by eclust from the factoextra package.
"hcut": Produced by hcut from the factoextra package.
"pam": Produced by pam from the cluster package.
"clara": Produced by clara from the cluster package.
"fanny": Produced by fanny from the cluster package.
"silhouette": Produced by silhouette from the cluster package or silhouette from the drclust package.
For these classes ("eclust", "hcut", "pam", "clara", "fanny", "silhouette"), users should explicitly call plotSilhouette() (e.g., plotSilhouette(pam_result)) to ensure the correct method is used, as the generic plot() may not dispatch to this function for these objects.
A ggplot2 object representing the Silhouette plot.
Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53–65. doi:10.1016/0377-0427(87)90125-7
Silhouette, softSilhouette, dbSilhouette, cerSilhouette, getSilhouette, is.Silhouette
data(iris)

# Crisp Silhouette with k-means
out <- kmeans(iris[, -5], 3)
if (requireNamespace("proxy", quietly = TRUE)) {
  library(proxy)
  # Distances from each observation to each k-means centre
  dist_mat <- dist(iris[, -5], out$centers)
  plot(Silhouette(dist_mat))
}

# Fuzzy Silhouette with ppclust::fcm
if (requireNamespace("ppclust", quietly = TRUE)) {
  library(ppclust)
  out_fuzzy <- Silhouette(
    prox_matrix = "d",
    proximity_type = "dissimilarity",
    prob_matrix = "u",
    clust_fun = ppclust::fcm,
    x = iris[, 1:4],
    centers = 3,
    sort = TRUE
  )
  plot(out_fuzzy, summary.legend = FALSE, grayscale = TRUE)
} else {
  message("Install 'ppclust': install.packages('ppclust')")
}

# Silhouette plot for pam clustering
if (requireNamespace("cluster", quietly = TRUE)) {
  library(cluster)
  pam_result <- pam(iris[, 1:4], k = 3)
  plotSilhouette(pam_result)
}

# Silhouette plot for clara clustering
if (requireNamespace("cluster", quietly = TRUE)) {
  clara_result <- clara(iris[, 1:4], k = 3)
  plotSilhouette(clara_result)
}

# Silhouette plot for fanny clustering
if (requireNamespace("cluster", quietly = TRUE)) {
  fanny_result <- fanny(iris[, 1:4], k = 3)
  plotSilhouette(fanny_result)
}

# Example using a cluster::silhouette() object
if (requireNamespace("cluster", quietly = TRUE)) {
  sil <- silhouette(pam_result)
  plotSilhouette(sil)
}

# Silhouette plot for eclust clustering
if (requireNamespace("factoextra", quietly = TRUE)) {
  library(factoextra)
  eclust_result <- eclust(iris[, 1:4], "kmeans", k = 3, graph = FALSE)
  plotSilhouette(eclust_result)
}

# Silhouette plot for hcut clustering
if (requireNamespace("factoextra", quietly = TRUE)) {
  hcut_result <- hcut(iris[, 1:4], k = 3)
  plotSilhouette(hcut_result)
}

# Silhouette plot for drclust clustering
if (requireNamespace("drclust", quietly = TRUE)) {
  library(drclust)
  iris_mat <- as.matrix(iris[, -5])
  drclust_out <- dpcakm(iris_mat, 20, 3)
  d <- silhouette(iris_mat, drclust_out)
  plotSilhouette(d$cl.silhouette)
}
Computes the silhouette width for each observation based on clustering results, measuring how similar an observation is to its own cluster compared to the nearest neighboring cluster. The silhouette width ranges from -1 to 1, where higher values indicate better cluster cohesion and separation.
Silhouette(
  prox_matrix,
  proximity_type = c("dissimilarity", "similarity"),
  method = c("medoid", "pac"),
  average = c("crisp", "fuzzy", "median"),
  prob_matrix = NULL,
  a = 2,
  sort = FALSE,
  print.summary = FALSE,
  clust_fun = NULL,
  ...
)

## S3 method for class 'Silhouette'
plot(
  x,
  label = FALSE,
  summary.legend = TRUE,
  grayscale = FALSE,
  linetype = c("dashed", "solid", "dotted", "dotdash", "longdash", "twodash"),
  ...
)

## S3 method for class 'Silhouette'
summary(object, print.summary = TRUE, ...)
prox_matrix |
A numeric matrix where rows represent observations and columns represent proximity measures (e.g., distances or similarities) to clusters. Typically, this is a membership or dissimilarity matrix from clustering results. If |
proximity_type |
Character string specifying the type of proximity measure in |
method |
Character string specifying the silhouette calculation method. Options are |
average |
Character string specifying the method for computing the average silhouette width. Options are "crisp", "fuzzy", or "median".
Defaults to "crisp". |
prob_matrix |
A numeric matrix of cluster membership probabilities, where rows represent
observations and columns represent clusters (depending on |
a |
Numeric value controlling the fuzzifier or weight scaling in fuzzy silhouette averaging. Higher values increase the emphasis on strong membership differences. Must be positive. Defaults to 2. |
sort |
Logical; if |
print.summary |
Logical; if |
clust_fun |
Optional S3 or S4 function object or function as character string specifying a clustering function that produces the proximity measure matrix. For example, |
... |
Additional arguments passed to |
x |
An object of class |
label |
Logical; if |
summary.legend |
Logical; if |
grayscale |
Logical; if |
linetype |
Character or numeric value specifying the type of line to be used for the horizontal reference line indicating the average silhouette width. Accepts standard ggplot2 linetype values, such as:
Defaults to |
object |
An object of class |
The Silhouette function implements the Simplified Silhouette method introduced by Van der Laan, Pollard, & Bryan (2003), which adapts and generalizes the classic silhouette method of Rousseeuw (1987).
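Clustering quality is evaluated using a proximity matrix, denoted as $\Delta = [\delta_{ik}]_{n \times K}$ for dissimilarity measures or $\Delta' = [\delta'_{ik}]_{n \times K}$ for similarity measures.
Here, $i = 1, \ldots, n$ indexes observations, and $k = 1, \ldots, K$ indexes clusters.
$\delta_{ik}$ represents the dissimilarity (e.g., distance) between observation $i$ and cluster $k$, while $\delta'_{ik}$ represents similarity values.
The silhouette width $S(x_i)$ for observation $i$ depends on the proximity type:
For dissimilarity measures:
$$S(x_i) = \frac{\min_{k' \neq k} \delta_{ik'} - \delta_{ik}}{N(x_i)}$$
For similarity measures:
$$S(x_i) = \frac{\delta'_{ik} - \max_{k' \neq k} \delta'_{ik'}}{N(x_i)}$$
where $k$ is the cluster to which observation $i$ is assigned, $k'$ ranges over the remaining clusters, and $N(x_i)$ is a normalizing factor defined by the method.
Choice of method:
The normalizer $N(x_i)$ is selected according to the method argument. The method names reference their origins but may be used with any proximity matrix, not exclusively with certain clustering algorithms:
For medoid (Van der Laan et al., 2003):
Dissimilarity: $N(x_i) = \max\left(\delta_{ik}, \min_{k' \neq k} \delta_{ik'}\right)$
Similarity: $N(x_i) = \max\left(\delta'_{ik}, \max_{k' \neq k} \delta'_{ik'}\right)$
For pac (Raymaekers & Rousseeuw, 2022):
Dissimilarity: $N(x_i) = \delta_{ik} + \min_{k' \neq k} \delta_{ik'}$
Similarity: $N(x_i) = \delta'_{ik} + \max_{k' \neq k} \delta'_{ik'}$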
Note:
The "medoid" and "pac" options name the normalization formula only. They do not require the PAM algorithm or posterior/ensemble methods, and can be applied to any suitable proximity matrix, including similarity or dissimilarity matrices derived from classification algorithms. Silhouette indices can therefore be used to assess group separation when groups are formed from classification-derived proximities, not only from unsupervised clustering.
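The width and normalizer definitions above can be illustrated in base R. This is a minimal sketch, not the package's implementation: `sketch_sil_width` is a hypothetical helper, and it assumes a dissimilarity matrix with at least two cluster columns in which each observation is assigned to the column with the smallest dissimilarity.

```r
# Hypothetical helper (not part of the package): simplified silhouette widths
# from an n x K dissimilarity matrix, with "medoid" or "pac" normalization.
sketch_sil_width <- function(d, method = c("medoid", "pac")) {
  method <- match.arg(method)
  d <- as.matrix(d)
  n <- nrow(d)
  idx <- seq_len(n)

  # Assumption: the assigned cluster is the column with the smallest dissimilarity
  cluster <- apply(d, 1, which.min)
  own <- d[cbind(idx, cluster)]

  # Nearest other cluster: mask the assigned column, then take the row minimum
  masked <- d
  masked[cbind(idx, cluster)] <- Inf
  neighbor <- apply(masked, 1, which.min)
  other <- masked[cbind(idx, neighbor)]

  # Normalizing factor chosen by the method
  norm <- switch(method,
    medoid = pmax(own, other),
    pac    = own + other
  )

  data.frame(
    cluster   = cluster,
    neighbor  = neighbor,
    sil_width = (other - own) / norm
  )
}

# Toy dissimilarities of 3 observations to 3 clusters
d <- rbind(c(1, 3, 4),
           c(2, 1, 5),
           c(6, 4, 2))

sketch_sil_width(d, "medoid")
sketch_sil_width(d, "pac")
```

For the same numerator, the pac normalizer is never smaller than the medoid normalizer, so pac widths are shrunk toward zero relative to medoid widths. The sketch does not guard against a zero normalizer, which occurs when both dissimilarities are zero.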
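If average = "crisp", the crisp silhouette index ($CS$) is calculated as:
$$CS = \frac{1}{n} \sum_{i=1}^{n} S(x_i)$$
summarizing overall clustering quality, where $S(x_i)$ is the silhouette width of observation $i$ and $n$ is the number of observations.
If average = "fuzzy" and prob_matrix is provided, denoted as $P = [p_{ik}]_{n \times K}$,
with $p_{ik}$ representing the probability of observation $i$ belonging to cluster $k$,
the fuzzy silhouette index ($FS$) is calculated as:
$$FS = \frac{\sum_{i=1}^{n} w_i \, S(x_i)}{\sum_{i=1}^{n} w_i}$$
where $w_i = \left(p_{ik} - \max_{k' \neq k} p_{ik'}\right)^{\alpha}$, with $k$ the cluster assigned to observation $i$, is the weight and $\alpha$ (the a argument) controls the emphasis on confident assignments.
If average = "median", the median of the silhouette widths $S(x_i)$ over all observations is calculated.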
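The three averaging options can be sketched in base R as follows. `sketch_sil_average` is a hypothetical helper, not a package function, and it assumes the fuzzy weight is the difference between the largest and second-largest membership probability of each observation raised to the power `a`.

```r
# Hypothetical helper (not part of the package): crisp, fuzzy, and median
# summaries of a vector of silhouette widths.
sketch_sil_average <- function(sil_width, prob = NULL, a = 2,
                               average = c("crisp", "fuzzy", "median")) {
  average <- match.arg(average)
  if (average == "crisp") return(mean(sil_width))
  if (average == "median") return(stats::median(sil_width))

  if (is.null(prob)) stop("'prob' is required for the fuzzy average")
  # Assumed weight: (largest - second largest membership)^a for each row
  w <- apply(as.matrix(prob), 1, function(p) {
    s <- sort(p, decreasing = TRUE)
    (s[1] - s[2])^a
  })
  # Weighted mean of the silhouette widths
  sum(w * sil_width) / sum(w)
}

# Toy widths and membership probabilities for 3 observations, 2 clusters
sil_width <- c(0.8, 0.2, 0.5)
prob <- rbind(c(0.9, 0.1),
              c(0.6, 0.4),
              c(0.7, 0.3))

sketch_sil_average(sil_width, average = "crisp")
sketch_sil_average(sil_width, prob, a = 2, average = "fuzzy")
sketch_sil_average(sil_width, average = "median")
```

In this toy input the confidently assigned first observation carries most of the weight, so the fuzzy index sits above the crisp mean.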
A data frame of class "Silhouette" containing cluster assignments, nearest neighbor clusters, silhouette widths for each observation, and weights (for fuzzy clustering). The object includes the following attributes:
The proximity type used ("similarity" or "dissimilarity").
The silhouette calculation method used ("medoid" or "pac").
The averaging method used ("crisp", "fuzzy", or "median").
Further, summary returns a list containing:
clus.avg.widths: A named numeric vector of average silhouette widths per cluster.
avg.width: The overall average silhouette width.
sil.sum: A data frame with columns cluster, size, and avg.sil.width summarizing cluster sizes and average silhouette widths.
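To make the three summary components concrete, the base R sketch below derives analogous quantities from a toy table of per-observation widths. The input column names `cluster` and `sil_width` are hypothetical, and the overall value shown corresponds to the crisp average only.

```r
# Toy per-observation table; the column names are hypothetical
widths <- data.frame(
  cluster   = c(1, 1, 2, 2, 2),
  sil_width = c(0.6, 0.4, 0.9, 0.3, 0.6)
)

# Average silhouette width per cluster (named by cluster label)
clus_avg_widths <- tapply(widths$sil_width, widths$cluster, mean)

# Overall (crisp) average silhouette width
avg_width <- mean(widths$sil_width)

# Per-cluster summary table: label, size, and average width
sil_sum <- data.frame(
  cluster       = names(clus_avg_widths),
  size          = as.vector(table(widths$cluster)),
  avg.sil.width = as.vector(clus_avg_widths)
)

sil_sum
avg_width
```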
Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53–65. doi:10.1016/0377-0427(87)90125-7
Van der Laan, M., Pollard, K., & Bryan, J. (2003). A new partitioning around medoids algorithm. Journal of Statistical Computation and Simulation, 73(8), 575–584. doi:10.1080/0094965031000136012
Campello, R. J., & Hruschka, E. R. (2006). A fuzzy extension of the silhouette width criterion for cluster analysis. Fuzzy Sets and Systems, 157(21), 2858–2875. doi:10.1016/j.fss.2006.07.006
Raymaekers, J., & Rousseeuw, P. J. (2022). Silhouettes and quasi residual plots for neural nets and tree-based classifiers. Journal of Computational and Graphical Statistics, 31(4), 1332–1343. doi:10.1080/10618600.2022.2050249
Bhat Kapu, S., & Kiruthika. (2024). Some density-based silhouette diagnostics for soft clustering algorithms. Communications in Statistics: Case Studies, Data Analysis and Applications, 10(3–4), 221–238. doi:10.1080/23737484.2024.2408534
softSilhouette, dbSilhouette, cerSilhouette, getSilhouette, is.Silhouette, plotSilhouette
# Standard silhouette with k-means on iris dataset
data(iris)

# Crisp Silhouette with k-means
out <- kmeans(iris[, -5], 3)
if (requireNamespace("proxy", quietly = TRUE)) {
  library(proxy)
  # Distances from every observation to every k-means center (n x K matrix)
  dist <- proxy::dist(iris[, -5], out$centers)
  silh_out <- Silhouette(dist, print.summary = TRUE)
  plot(silh_out)
} else {
  message("Install 'proxy': install.packages('proxy')")
}

# Scree plot for optimal clusters (2 to 7)
if (requireNamespace("ppclust", quietly = TRUE)) {
  library(ppclust)
  avg_sil_width <- rep(NA, 7)
  for (k in 2:7) {
    # "d" and "u" name the components of the fcm() result to use
    out <- Silhouette(
      prox_matrix = "d",
      proximity_type = "dissimilarity",
      prob_matrix = "u",
      clust_fun = ppclust::fcm,
      x = iris[, 1:4],
      centers = k,
      average = "fuzzy"
    )
    # Extract the overall average silhouette width for this k
    avg_sil_width[k] <- summary(out, print.summary = FALSE)$avg.width
  }
  plot(avg_sil_width,
    type = "o",
    ylab = "Overall Silhouette Width",
    xlab = "Number of Clusters",
    main = "Scree Plot"
  )
} else {
  message("Install 'ppclust': install.packages('ppclust')")
}
Computes silhouette widths for soft clustering results by interpreting cluster membership probabilities (or their transformations) as proximity measures. Although silhouettes were originally designed to evaluate clustering quality within a single method, this adaptation allows heuristic comparison across soft clustering algorithms using average silhouette widths.
softSilhouette( prob_matrix, prob_type = c("pp", "nlpp", "pd"), method = c("pac", "medoid"), average = c("crisp", "fuzzy", "median"), a = 2, sort = FALSE, print.summary = FALSE, clust_fun = NULL, ... )
prob_matrix |
A numeric matrix where rows represent observations and columns represent cluster membership probabilities (or transformed probabilities, depending on |
prob_type |
Character string specifying the type transformation of membership matrix considered as proximity matrix in
Defaults to |
method |
Character string specifying the silhouette calculation method. Options are |
average |
Character string specifying the method for computing the average silhouette width. Options are:
Defaults to |
a |
Numeric value controlling the fuzzifier or weight scaling in fuzzy silhouette averaging. Higher values increase the emphasis on strong membership differences. Must be positive. Defaults to |
sort |
Logical; if |
print.summary |
Logical; if |
clust_fun |
Optional S3 or S4 function object or function as character string specifying a clustering function that produces the proximity measure matrix. For example, |
... |
Additional arguments passed to |
Although the silhouette method was originally developed for evaluating clustering structure within a single result, this implementation allows leveraging cluster membership probabilities from soft clustering methods to construct proximity-based silhouettes. These silhouette widths can be compared heuristically across different algorithms to assess clustering quality.
See doi:10.1080/23737484.2024.2408534 for more details.
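As a rough illustration of treating membership probabilities as proximities, the base R sketch below uses the probabilities directly as similarities and applies a pac-style normalization. `sketch_soft_sil` is a hypothetical helper, and the choice to use untransformed probabilities is an assumption made for illustration; the package's prob_type options may transform the matrix first.

```r
# Hypothetical helper (not part of the package): silhouette widths computed
# directly from an n x K matrix of membership probabilities.
sketch_soft_sil <- function(prob) {
  prob <- as.matrix(prob)
  n <- nrow(prob)
  idx <- seq_len(n)

  # Assumption: each observation is assigned to its most probable cluster
  cluster <- apply(prob, 1, which.max)
  own <- prob[cbind(idx, cluster)]

  # Largest probability among the remaining clusters
  masked <- prob
  masked[cbind(idx, cluster)] <- -Inf
  other <- apply(masked, 1, max)

  # Similarity-type width with pac-style normalizer (own + other)
  (own - other) / (own + other)
}

# Toy membership probabilities for 3 observations and 3 clusters
prob <- rbind(c(0.8, 0.1, 0.1),
              c(0.5, 0.3, 0.2),
              c(0.2, 0.2, 0.6))

sketch_soft_sil(prob)
```

Observations whose top two probabilities are close receive widths near zero, which is what makes the average width usable as a heuristic for how decisively an algorithm separates the data.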
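If average = "crisp", the crisp silhouette index ($CS$) is calculated as:
$$CS = \frac{1}{n} \sum_{i=1}^{n} S(x_i)$$
summarizing overall clustering quality, where $S(x_i)$ is the silhouette width of observation $i$ and $n$ is the number of observations.
If average = "fuzzy" and prob_matrix is provided, denoted as $P = [p_{ik}]_{n \times K}$,
with $p_{ik}$ representing the probability of observation $i$ belonging to cluster $k$,
the fuzzy silhouette index ($FS$) is calculated as:
$$FS = \frac{\sum_{i=1}^{n} w_i \, S(x_i)}{\sum_{i=1}^{n} w_i}$$
where $w_i = \left(p_{ik} - \max_{k' \neq k} p_{ik'}\right)^{\alpha}$, with $k$ the cluster assigned to observation $i$, is the weight and $\alpha$ (the a argument) controls the emphasis on confident assignments.
If average = "median", the median of the silhouette widths $S(x_i)$ over all observations is calculated.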
A data frame of class "Silhouette" containing cluster assignments, nearest neighbor clusters, silhouette widths for each observation, and weights (for fuzzy clustering). The object includes the following attributes:
The proximity type used ("similarity" or "dissimilarity").
The silhouette calculation method used ("medoid" or "pac").
The averaging method used ("crisp", "fuzzy", or "median").
Raymaekers, J., & Rousseeuw, P. J. (2022). Silhouettes and quasi residual plots for neural nets and tree-based classifiers. Journal of Computational and Graphical Statistics, 31(4), 1332–1343. doi:10.1080/10618600.2022.2050249
Bhat Kapu, S., & Kiruthika. (2024). Some density-based silhouette diagnostics for soft clustering algorithms. Communications in Statistics: Case Studies, Data Analysis and Applications, 10(3–4), 221–238. doi:10.1080/23737484.2024.2408534
Silhouette, dbSilhouette, cerSilhouette, getSilhouette, is.Silhouette, plotSilhouette
# Compare two soft clustering algorithms using softSilhouette
# Example: FCM vs. FCM2 on iris data, using average silhouette width as a criterion
data(iris)

if (requireNamespace("ppclust", quietly = TRUE)) {
  fcm_result <- ppclust::fcm(iris[, 1:4], 3)
  out_fcm <- softSilhouette(prob_matrix = fcm_result$u, print.summary = TRUE)
  plot(out_fcm)
  sfcm <- summary(out_fcm, print.summary = FALSE)
} else {
  message("Install 'ppclust' to run this example: install.packages('ppclust')")
}

if (requireNamespace("ppclust", quietly = TRUE)) {
  fcm2_result <- ppclust::fcm2(iris[, 1:4], 3)
  out_fcm2 <- softSilhouette(prob_matrix = fcm2_result$u, print.summary = TRUE)
  plot(out_fcm2)
  sfcm2 <- summary(out_fcm2, print.summary = FALSE)
} else {
  message("Install 'ppclust' to run this example: install.packages('ppclust')")
}

# Compare average silhouette widths of fcm and fcm2
if (requireNamespace("ppclust", quietly = TRUE)) {
  cat(
    "FCM average silhouette width:", sfcm$avg.width, "\n",
    "FCM2 average silhouette width:", sfcm2$avg.width, "\n"
  )
}