Clustering models in epidemiology

In epidemiology, the word ‘clustering’ often brings to mind infectious disease outbreaks and transmission chains, spatial patterns in environmental epidemiology, or high-dimensional analysis in genetic epidemiology. Clustering methods, however, have a much broader application in general epidemiology: identifying patterns and groups within populations that share exposures, risk factors and outcomes. There are many types of clustering approaches that can be used in epidemiological studies. As the diagram below shows, these models were traditionally categorized broadly into heuristic, model-based and density-based approaches, with a range of other expanded models (adapted from Jain et al. 2004). I’ve included examples of some of the prototype modelling approaches under each category. I should note that these models and approaches can go by many names depending on the field, and fall under various classifications. For example, k-means clustering is also described as centroid-based, partitional, or distance-based clustering, depending on the literature and field. To learn more about the traditional approaches to clustering, the article by Jain et al. (2004) is an excellent review.

[Diagram: categories of clustering approaches, adapted from Jain et al. (2004)]

Jain, A. K., Topchy, A., Law, M. H., & Buhmann, J. M. (2004, August). Landscape of clustering algorithms. In Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004. (Vol. 1, pp. 260-263). IEEE.
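To make the naming point about k-means concrete, here is a minimal sketch of the standard Lloyd's-style k-means loop in plain Python. It is an illustration under stated assumptions, not a recommended analysis pipeline: the `kmeans` function, the `patients` list and its (age, systolic blood pressure) values are all hypothetical and invented for this example. The two commented steps show why the same method gets called both distance-based and centroid-based.

```python
import math
import random


def kmeans(points, k, n_iter=50, seed=0):
    """Minimal k-means sketch: alternate assignment and centroid updates."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct observations
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        # (this is the "distance-based" part).
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members
        # (this is the "centroid-based" part).
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments have stabilised
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy data: (age in years, systolic blood pressure in mmHg).
patients = [
    (25, 110), (30, 115), (28, 112), (32, 118),
    (65, 150), (70, 155), (68, 160), (72, 158),
]

labels, centroids = kmeans(patients, k=2)
print(labels)
print(centroids)
```

Every observation ends up in exactly one group, which is what makes the method partitional. With real data, variables on different scales would usually be standardised before computing distances, and an established library implementation would be preferable to a hand-written loop.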

The advent of machine learning and deep learning approaches in the past decade has not only advanced existing approaches but also introduced a host of new methods, such as deep clustering. This article provides the historical timeline for some of these developments. Some of these methods are also increasingly being used with health and epidemiological data; for example, deep embedded clustering (DEC) was applied to critical care data in this paper by de Kok et al. (2024). I’ve provided some examples below. A good review of recent clustering methods in this space is provided by Ezugwu et al. (2022).

*

Ezugwu, A. E., Ikotun, A. M., Oyelade, O. O., Abualigah, L., Agushaka, J. O., Eke, C. I., & Akinyelu, A. A. (2022). A comprehensive survey of clustering algorithms: State-of-the-art machine learning applications, taxonomy, challenges, and future research prospects. Engineering Applications of Artificial Intelligence, 110, 104743.


Marzieh Ghiasi (@ntds), MD PhD MSc, trained in epidemiology at McGill University and Michigan State University. She is greatly interested in epidemiological methods, particularly clustering techniques and genetic epidemiology. She is passionate about promoting stronger medical education, with a particular focus on epidemiology, biostatistics and clinical research skills.
