Michal Moshkovitz

Explainable k-Means and k-Medians Clustering

University of California, San Diego

Bio

Michal is a postdoctoral fellow at the Qualcomm Institute of the University of California, San Diego. Her interests lie in the foundations of AI, exploring how different constraints affect learning. She works on explainable machine learning, bounded memory learning, and online no-substitution clustering. Michal received her Ph.D. from the Hebrew University and an MSc from Tel-Aviv University. She was the recipient of an Anita Borg scholarship from Google and a Hoffman scholarship from the Hebrew University.

Abstract

Many clustering algorithms lead to cluster assignments that are hard to explain, partially because they depend on all the features of the data in a complicated way. To improve interpretability, we consider using a small decision tree to define the clustering. We show that popular top-down decision tree algorithms may lead to clusterings with arbitrarily large cost. On the positive side, we design a new algorithm whose clustering cost is at most O(k^2) times the optimal k-means cost. Prior to our work, no algorithms were known to have provable guarantees independent of the dimension and the input size. Empirically, we validate that the new algorithm produces a low-cost clustering, outperforming both standard decision tree methods and other algorithms for explainable clustering. An implementation is available at https://github.com/navefr/ExKMC.
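
For illustration, below is a minimal usage sketch in Python. It assumes the interface documented in the README of the linked ExKMC repository (a Tree class with a fit_predict method); those names are an assumption about the implementation, not part of the abstract, and may differ between versions.

    # A minimal sketch, assuming the Tree/fit_predict interface documented
    # in the ExKMC repository README; names may differ between versions.
    from sklearn.datasets import make_blobs
    from ExKMC.Tree import Tree

    # Toy data: 3 well-separated clusters in 4 dimensions.
    X, _ = make_blobs(n_samples=1000, centers=3, n_features=4, random_state=0)

    # A tree with k leaves yields one threshold-based decision path per
    # cluster, so each assignment is explained by a few feature comparisons.
    tree = Tree(k=3)
    labels = tree.fit_predict(X)

Here each of the k clusters corresponds to a leaf of the tree, so the rule assigning any point to its cluster can be read off as a short sequence of single-feature threshold tests.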
