Example: Inside Out
In the movie, the main character, Riley, has five archetypal emotions: Joy, Sadness, Fear, Anger, and Disgust.
![Archetypal emotions in the Inside Out movie]()
New emotions
However, as seen in Inside Out 2, not only do these emotions exist, but new emotions can also emerge by combining them:
![Mixed emotions in Inside Out 2]()
Combining archetypal emotions
Therefore, a wide range of new emotions can be obtained by combining the archetypal emotions:
![New emotions obtained from combining the archetypal emotions]()
Archetypal Analysis in Machine Learning
Motivation
![Number of publications containing the “archetypal analysis” keyword over the years (data collected from Google Scholar)]()
Machine Learning
![Machine learning overview]()
Archetypal Analysis (AA)
- Unsupervised machine learning technique introduced by Cutler and Breiman in 1994.
- Identifies extreme points that are representative of the underlying patterns or structures within the data set.
- Represents each data point as a mixture of these archetypes.
Let \mathfrak{X}=\{\bm{x}^{(1)}, \bm{x}^{(2)}, \ldots, \bm{x}^{(N)}\} be a dataset where each \bm{x}^{(n)} \in \mathbb{R}^M.
Objective
Find a set of archetypes \mathfrak{Z} = \{\bm{z}^{(1)}, \bm{z}^{(2)}, \ldots, \bm{z}^{(K)}\}, where each \bm{z}^{(k)} \in \mathbb{R}^M, which are convex combinations of the data points in \mathfrak{X}.
Simultaneously, each data point \bm{x}^{(n)} is also approximated as a convex combination of these archetypes \mathfrak{Z}.
Archetypes as convex combinations of the data points
Expanding the previous definition, each archetype \bm{Z}_k, i.e. \bm{z}^{(k)}, can be represented as a convex combination of the data points: \bm{Z}_k = \sum_{n=1}^{N} \bm{C}_{k,n} \bm{X}_n
subject to \|{\bm{c}_k}\|_1 = 1 and \bm{C}_{k,n} \in [0, 1], where \bm{c}_k denotes the k-th row of \bm{C}.
![Archetypes as convex combinations of the data points]()
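As a minimal NumPy illustration (toy data and a random, purely hypothetical coefficient matrix): since each row of \bm{C} lies on the probability simplex, every archetype is a weighted average of data points.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 200, 5, 3                 # data points, features, archetypes
X = rng.normal(size=(N, M))         # toy dataset
C = rng.random((K, N))
C /= C.sum(axis=1, keepdims=True)   # rows sum to 1, entries in [0, 1]
Z = C @ X                           # archetypes as convex combinations of X
```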
Data points as convex combinations of the archetypes
Simultaneously, each data point \bm{X}_n, i.e. \bm{x}^{(n)}, is approximated as a convex combination of these archetypes: \bm{X}_n \approx \sum_{k=1}^{K} \bm{S}_{n,k} \bm{Z}_k, similarly subject to \|{\bm{s}_n}\|_1 = 1 and \bm{S}_{n,k} \in [0,1].
![Data points as convex combinations of the archetypes]()
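Conversely, continuing the toy example above, a mixture matrix \bm{S} with simplex rows reconstructs each data point from the archetypes:

```python
S = rng.random((N, K))
S /= S.sum(axis=1, keepdims=True)   # convex weights per data point
X_hat = S @ Z                       # each row of X approximated as a mixture
print(np.linalg.norm(X - X_hat))    # large for random S and C; AA minimizes this
```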
Matrix notation
This formulation thus involves two matrices of coefficients: \bm{{C}} which defines the archetypes as combinations of the data points, and \bm{{S}} which represents each data point as a mixture of archetypes.
In matrix notation, archetypal analysis can be expressed as: \bm{X} \approx \bm{S} \bm{Z} = \bm{S} (\bm{C} \bm{X})
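For the dimensions to be consistent, with N data points, M features, and K archetypes, the factors satisfy \mathbf{X} \in \mathbb{R}^{N \times M}, \mathbf{S} \in \mathbb{R}^{N \times K}, \mathbf{C} \in \mathbb{R}^{K \times N}, and hence \mathbf{Z} = \mathbf{C}\mathbf{X} \in \mathbb{R}^{K \times M}.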
Computation
To compute the archetypes, we need to minimize the following objective function: \mathop{\mathrm{arg\,min}}_{\mathbf{S}, \mathbf{C}} \|\mathbf{X} - \mathbf{SCX}\|_F^2 subject to \|{\mathbf{s}_n}\|_1 = 1, \mathbf{s}_n \geq 0, \|{\mathbf{c}_k}\|_1 = 1 and \mathbf{c}_k \geq 0.
As proposed by Mørup & Hansen in 2012, the objective function can be rewritten in terms of the trace operator as E = \|{\mathbf{X} - \mathbf{SCX}}\|_F^2 = \text{tr} \left( \mathbf{X} \mathbf{X}^{\mkern-1.5mu\mathsf{T}}- 2 \mathbf{SCX}\mathbf{X}^{\mkern-1.5mu\mathsf{T}}+ \mathbf{SCX}\mathbf{X}^{\mkern-1.5mu\mathsf{T}}\mathbf{C}^{\mkern-1.5mu\mathsf{T}}\mathbf{S}^{\mkern-1.5mu\mathsf{T}}\right), which allows us to derive the gradients with respect to \mathbf{S} and \mathbf{C}: \nabla_{\mathbf{S}} E = 2(\mathbf{SCX} \mathbf{X}^{\mkern-1.5mu\mathsf{T}}\mathbf{C}^{\mkern-1.5mu\mathsf{T}}-\mathbf{X} \mathbf{X}^{\mkern-1.5mu\mathsf{T}}\mathbf{C}^{\mkern-1.5mu\mathsf{T}}) and \nabla_{\mathbf{C}} E = 2(\mathbf{S}^{\mkern-1.5mu\mathsf{T}}\mathbf{S}\mathbf{C}\mathbf{X}\mathbf{X}^{\mkern-1.5mu\mathsf{T}}- \mathbf{S}^{\mkern-1.5mu\mathsf{T}}\mathbf{X}\mathbf{X}^{\mkern-1.5mu\mathsf{T}}).
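These gradients lend themselves to a simple alternating projected-gradient scheme. Below is a minimal NumPy sketch under assumed settings (fixed step size, and a crude clip-and-renormalize in place of an exact simplex projection); it illustrates the update structure rather than reproducing Mørup & Hansen's exact algorithm:

```python
import numpy as np

def project_rows_to_simplex(A):
    # Crude surrogate for simplex projection: clip negatives,
    # then renormalize each row to sum to 1.
    A = np.clip(A, 0.0, None)
    return A / (A.sum(axis=1, keepdims=True) + 1e-12)

def archetypal_analysis(X, K, n_iter=500, lr=1e-3, rng=None):
    rng = np.random.default_rng(rng)
    N = X.shape[0]
    S = project_rows_to_simplex(rng.random((N, K)))  # mixtures, N x K
    C = project_rows_to_simplex(rng.random((K, N)))  # archetype weights, K x N
    XXt = X @ X.T  # Gram matrix, precomputed once
    for _ in range(n_iter):
        grad_S = 2 * (S @ C @ XXt @ C.T - XXt @ C.T)
        S = project_rows_to_simplex(S - lr * grad_S)
        grad_C = 2 * (S.T @ S @ C @ XXt - S.T @ XXt)
        C = project_rows_to_simplex(C - lr * grad_C)
    return S, C
```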
Selecting the optimal number of archetypes
- Achieving a balance between complexity and elucidation of data patterns is crucial.
- Archetypes are not necessarily nested; they can change as K increases to better capture data structure.
- The Elbow Method is a preferred approach due to its simplicity and intuitive visual clarity (a sketch follows the figure below).
![Archetypal analysis with different numbers of archetypes]()
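A hypothetical sweep for the Elbow Method, reusing the archetypal_analysis sketch above on toy data; in practice one plots the residual sum of squares against K and picks the K at the bend:

```python
import numpy as np

X = np.random.default_rng(0).normal(size=(200, 5))  # toy data
rss = {}
for K in range(1, 9):
    S, C = archetypal_analysis(X, K, rng=0)
    rss[K] = np.linalg.norm(X - S @ C @ X) ** 2  # residual sum of squares
# Plot K against rss[K]; the "elbow" is where adding archetypes
# stops reducing the error substantially.
```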
Fair Archetypal Analysis (FairAA)
Definition
Following the work of Kleindessner et al. in 2023 on fair PCA, our objective in fair archetypal analysis (FairAA) is to obscure critical information when projecting the dataset onto the archetypal space.
Let z_i \in \{0, 1\} represent a critical attribute of the data point x_i, indicating membership in one of two groups. Ideally, no classifier should be able to infer z_i from the projection of x_i onto the K-archetypal space.
Analogously to the definition of AA, we can define FairAA as follows:
\mathop{\mathrm{arg\,min}}_{\mathbf{S}, \mathbf{C}} \|{\mathbf{X} - \mathbf{SCX}}\|_F^2
subject to \|{\mathbf{s}_n}\|_1 = 1, \mathbf{s}_n \geq 0, \|{\mathbf{c}_k}\|_1 = 1, \mathbf{c}_k \geq 0 and \forall h:\mathbb{R}^k\rightarrow\mathbb{R}, h(\mathbf{s}_i) and z_i are statistically independent.
If we define \bar{z} = \frac{1}{n}\sum_{i=1}^n z_i and \mathbf{z} = (z_1 - \bar{z}, \dots, z_n - \bar{z}) \in \mathbb{R}^n and consider only linear classifiers, we can reformulate the fairness requirement as:
\begin{gather*}
\forall \mathbf{w} \in \mathbb{R}^k, b\in \mathbb{R}: \mathbf{w} \mathbf{s}_i^{\mkern-1.5mu\mathsf{T}}+ b \text{ and } z_i \text{ are uncorrelated} \Leftrightarrow \\
\forall \mathbf{w} \in \mathbb{R}^k, b\in \mathbb{R}: \sum_{i=1}^n (z_i - \bar{z})(\mathbf{w} \mathbf{s}_i^{\mkern-1.5mu\mathsf{T}}+ b) = 0 \Leftrightarrow \\
\forall \mathbf{w} \in \mathbb{R}^k: \mathbf{z}\mathbf{S}\mathbf{w}^{\mkern-1.5mu\mathsf{T}} = 0 \Leftrightarrow \\
\mathbf{z}\mathbf{S} = 0
\end{gather*}
Note that the b term drops out in the second step because the centered z_i sum to zero.
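A quick numerical check of this equivalence (ignoring the simplex constraints on \mathbf{S}, which play no role in the algebra): if \mathbf{z}\mathbf{S} = 0 for the centered \mathbf{z}, then any linear score of \mathbf{s}_i is empirically uncorrelated with z_i.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 100, 4
z = rng.integers(0, 2, N).astype(float)       # binary group labels
zc = (z - z.mean()).reshape(1, N)             # centered z as a row vector
S = rng.random((N, K))
S -= zc.T @ (zc @ S) / (zc @ zc.T)            # force zc @ S == 0
assert np.allclose(zc @ S, 0)
w, b = rng.normal(size=K), rng.normal()       # arbitrary linear classifier
scores = S @ w + b
print(np.cov(z, scores)[0, 1])                # ~0: label and score uncorrelated
```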
Thus, we can define FairAA by introducing an additional term in the optimization problem:
\mathop{\mathrm{arg\,min}}_{\mathbf{S}, \mathbf{C}} \|{\mathbf{X} - \mathbf{SCX}}\|_F^2 + \lambda \|{\mathbf{z}\mathbf{S}}\|_F^2
subject to \|{\mathbf{s}_n}\|_1 = 1, \mathbf{s}_n \geq 0, \|{\mathbf{c}_k}\|_1 = 1, \mathbf{c}_k \geq 0, and \lambda \geq 0, which acts as a regularization parameter.
Computation
In this case, the objective function can also be written in terms of the trace operator as: E = \text{tr} \left( \mathbf{X} \mathbf{X}^{\mkern-1.5mu\mathsf{T}}- 2 \mathbf{SCX}\mathbf{X}^{\mkern-1.5mu\mathsf{T}}+ \mathbf{SCX}\mathbf{X}^{\mkern-1.5mu\mathsf{T}}\mathbf{C}^{\mkern-1.5mu\mathsf{T}}\mathbf{S}^{\mkern-1.5mu\mathsf{T}}\right) + \lambda \text{tr}\left( \mathbf{z}\mathbf{S}\mathbf{S}^{\mkern-1.5mu\mathsf{T}}\mathbf{z}^{\mkern-1.5mu\mathsf{T}}\right)
and the resulting gradients are: \nabla_{\mathbf{S}} E = 2(\mathbf{SCX} \mathbf{X}^{\mkern-1.5mu\mathsf{T}}\mathbf{C}^{\mkern-1.5mu\mathsf{T}}-\mathbf{X} \mathbf{X}^{\mkern-1.5mu\mathsf{T}}\mathbf{C}^{\mkern-1.5mu\mathsf{T}}+ \lambda\mathbf{z}^{\mkern-1.5mu\mathsf{T}}\mathbf{z} \mathbf{S}), \nabla_{\mathbf{C}} E = 2(\mathbf{S}^{\mkern-1.5mu\mathsf{T}}\mathbf{S}\mathbf{C}\mathbf{X}\mathbf{X}^{\mkern-1.5mu\mathsf{T}}- \mathbf{S}^{\mkern-1.5mu\mathsf{T}}\mathbf{X}\mathbf{X}^{\mkern-1.5mu\mathsf{T}}).
To optimize the objective function, we can use the same alternating optimization scheme as in AA.
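A minimal sketch of this alternating scheme, extending the AA sketch above with the \lambda \mathbf{z}^{\mkern-1.5mu\mathsf{T}}\mathbf{z}\mathbf{S} term (same caveats: fixed step size and a crude simplex projection):

```python
import numpy as np

def fair_archetypal_analysis(X, z, K, lam=1.0, n_iter=500, lr=1e-3, rng=None):
    rng = np.random.default_rng(rng)
    N = X.shape[0]
    zc = (z - z.mean()).reshape(1, N)     # centered critical attribute
    S = project_rows_to_simplex(rng.random((N, K)))
    C = project_rows_to_simplex(rng.random((K, N)))
    XXt = X @ X.T
    ztz = lam * (zc.T @ zc)               # precomputed penalty matrix
    for _ in range(n_iter):
        grad_S = 2 * (S @ C @ XXt @ C.T - XXt @ C.T + ztz @ S)
        S = project_rows_to_simplex(S - lr * grad_S)
        grad_C = 2 * (S.T @ S @ C @ XXt - S.T @ XXt)
        C = project_rows_to_simplex(C - lr * grad_C)
    return S, C
```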
Complexity order
Regarding the complexity order, the matrix \lambda \mathbf{z}^{\mkern-1.5mu\mathsf{T}}\mathbf{z} can be precomputed, so the complexity order remains the same as that of AA.
Kernelizing fair AA
The computed gradients depend solely on the pairwise relationships, which are represented by the kernel matrix of inner products, \mathbf{K} = \mathbf{X}\mathbf{X}^{\mkern-1.5mu\mathsf{T}}.
Therefore, FairAA can be easily extended to kernel-based representations that rely on other pairwise data point relationships (FairKernelAA).
In this new framework, the gradients are given by the following expressions: \nabla_{\mathbf{S}} E = 2(\mathbf{SCK}\mathbf{C}^{\mkern-1.5mu\mathsf{T}}-\mathbf{K}\mathbf{C}^{\mkern-1.5mu\mathsf{T}}+ \lambda\mathbf{z}^{\mkern-1.5mu\mathsf{T}}\mathbf{z} \mathbf{S}), \nabla_{\mathbf{C}} E = 2(\mathbf{S}^{\mkern-1.5mu\mathsf{T}}\mathbf{S}\mathbf{C}\mathbf{K} - \mathbf{S}^{\mkern-1.5mu\mathsf{T}}\mathbf{K}),
where \mathbf{K} = \mathbf{K}(\mathbf{X}) \in \mathbb{R}^{n \times n} represents the kernel matrix that encodes the pairwise relationships between the elements of \mathbf{X}.
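Concretely, the sketches above touch \mathbf{X} only through \mathbf{X}\mathbf{X}^{\mkern-1.5mu\mathsf{T}}, so FairKernelAA amounts to substituting a precomputed kernel matrix for the Gram matrix; the RBF kernel below is one assumed choice, not one prescribed here:

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Gaussian (RBF) kernel matrix: one possible pairwise relationship
    sq = np.sum(X**2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

# In fair_archetypal_analysis above, replace
#     XXt = X @ X.T
# with
#     XXt = rbf_kernel(X, gamma=0.5)
# The gradient expressions are otherwise unchanged.
```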
Hiding multiple groups
To extend FairAA to cases where the critical attribute consists of m disjoint groups, we adopt the one-vs-all approach.
We define m one-hot encoded critical attributes \mathbf{z}^{(1)}, \dots, \mathbf{z}^{(m)}, where z_i^{(l)} = 1 if \mathbf{x}_i belongs to group l, and z_i^{(l)} = 0 if it does not.
The fairness requirement then becomes \mathbf{ZS} = 0, where the l-th row of \mathbf{Z} \in \mathbb{R}^{m \times n} is given by (z_1^{(l)} - \bar{z}^{(l)}, \dots, z_n^{(l)} - \bar{z}^{(l)}), with \bar{z}^{(l)} being the mean of the l-th critical attribute across all data points.
As a result, the optimization problem can be solved in the same manner as the two-group FairAA approach:
\mathop{\mathrm{arg\,min}}_{\mathbf{S}, \mathbf{C}} \|{\mathbf{X} - \mathbf{SCX}}\|_F^2 + \lambda \|{\mathbf{Z}\mathbf{S}}\|_F^2
subject to \|{\mathbf{s}_n}\|_1 = 1, \mathbf{s}_n \geq 0, \|{\mathbf{c}_k}\|_1 = 1, \mathbf{c}_k \geq 0, and \lambda \geq 0, which acts as a regularization parameter.
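A small helper sketch (hypothetical name) for building the centered matrix \mathbf{Z} from integer group labels; since the gradient of \lambda\|\mathbf{Z}\mathbf{S}\|_F^2 with respect to \mathbf{S} is 2\lambda\mathbf{Z}^{\mkern-1.5mu\mathsf{T}}\mathbf{Z}\mathbf{S}, the matrix \mathbf{Z}^{\mkern-1.5mu\mathsf{T}}\mathbf{Z} can again be precomputed exactly as in the two-group case:

```python
import numpy as np

def centered_group_matrix(labels, m):
    # labels: integer group ids in {0, ..., m-1}, one per data point
    N = labels.shape[0]
    Z = np.zeros((m, N))
    Z[labels, np.arange(N)] = 1.0               # one-hot row per group
    return Z - Z.mean(axis=1, keepdims=True)    # center each row

# In fair_archetypal_analysis above, replace zc.T @ zc with Z.T @ Z, where
# Z = centered_group_matrix(labels, m).
```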