Biarchetype analysis

Simultaneous learning of samples and features based on extremes

archetypal analysis
matrix factorization
Published

November 16, 2022

Introduction

Archetypal analysis was introduced by Cutler & Breiman (1994). Let \(X\) be a real-valued data matrix whose rows represent the samples and whose columns represent the features. They defined the archetypes as convex combinations of the data samples, i.e. \(Z = BX\) where \(B\) is a stochastic matrix. In turn, the data samples are approximated by convex combinations of the archetypes, i.e. \(X \simeq AZ\) where \(A\) is also a stochastic matrix. This is equivalent to solving the following optimization problem:

\[ \begin{aligned} \mathop{\mathrm{arg\,min\,}}_{A,B} \quad & \|X - ABX \|^2 \\ \textrm{s.t.} \quad & \\ & \sum\nolimits_{k=1}^K A_{mk} = 1 \text{ with } A_{mk} \in [0, 1] \text{ for each } m=1,\dots, M \\ & \sum\nolimits_{m=1}^M B_{km} = 1 \text{ with } B_{km} \in [0, 1] \text{ for each } k=1,\dots, K \\ \end{aligned} \tag{1}\]
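To make the factorization concrete, here is a minimal NumPy sketch (an illustration under random stochastic matrices, not an actual fitting algorithm) that builds the archetypes \(Z = BX\) and evaluates the objective of Equation 1:

Code
import numpy as np

rng = np.random.default_rng(0)

M, N, K = 100, 2, 3          # samples, features, archetypes
X = rng.normal(size=(M, N))  # data matrix

# Dirichlet draws give row-stochastic matrices: rows sum to one.
A = rng.dirichlet(np.ones(K), size=M)  # (M, K): sample weights over archetypes
B = rng.dirichlet(np.ones(M), size=K)  # (K, M): archetype weights over samples

Z = B @ X                              # archetypes as convex combinations of samples
rss = np.linalg.norm(X - A @ Z) ** 2   # objective of Equation 1
print(rss)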

Intuitively, the archetypes are extreme points: they lie on the convex hull of the data, and each sample is described as a mixture of these extremes. The rows of \(A\) can therefore be read as soft memberships of the samples to the archetypes.

Biarchetype analysis

In biarchetype analysis (BiAA), the archetypes are assumed to be convex combinations of the data in both dimensions, i.e. \(Z = BXC\) where \(B\) and \(C\) are stochastic matrices. At the same time, the data is approximated by convex combinations of the archetypes, i.e. \(X \simeq AZD\) where \(A\) and \(D\) are also stochastic matrices.

This is equivalent to solving the following optimization problem:

\[ \begin{aligned} \mathop{\mathrm{arg\,min\,}}_{A,B,C,D} \quad & \ell(X|ABXCD) \\ \textrm{s.t.} \quad & \\ & \sum\nolimits_{k=1}^K A_{mk} = 1 \text{ with } A_{mk} \in [0, 1] \text{ for each } m=1,\dots, M \\ & \sum\nolimits_{m=1}^M B_{km} = 1 \text{ with } B_{km} \in [0, 1] \text{ for each } k=1,\dots, K \\ & \sum\nolimits_{n=1}^N C_{nl} = 1 \text{ with } C_{nl} \in [0, 1] \text{ for each } l=1,\dots, L \\ & \sum\nolimits_{l=1}^L D_{ln} = 1 \text{ with } D_{ln} \in [0, 1] \text{ for each } n=1,\dots, N \\ \end{aligned} \tag{2}\]

where \(\ell(X | \tilde{X})\) is a loss function.
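The only change with respect to Equation 1 is the set of shapes involved: \(B\) and \(C\) mix samples and features into a \(K \times L\) grid of biarchetypes, and \(A\) and \(D\) mix them back. A minimal NumPy sketch (again with random stochastic matrices, only to illustrate the shapes and the constraints of Equation 2):

Code
import numpy as np

rng = np.random.default_rng(0)

M, N = 100, 20  # samples, features
K, L = 3, 4     # archetypes along each dimension
X = rng.normal(size=(M, N))

# A and B are row-stochastic; C and D are column-stochastic (Equation 2).
A = rng.dirichlet(np.ones(K), size=M)    # (M, K), rows sum to one
B = rng.dirichlet(np.ones(M), size=K)    # (K, M), rows sum to one
C = rng.dirichlet(np.ones(N), size=L).T  # (N, L), columns sum to one
D = rng.dirichlet(np.ones(L), size=N).T  # (L, N), columns sum to one

Z = B @ X @ C        # (K, L) biarchetypes
X_tilde = A @ Z @ D  # (M, N) reconstruction passed to the loss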

In Equation 2, just as Seth & Eugster (2016) proposed for archetypal analysis, \(\ell\) could be a negative log-likelihood function (both cases are sketched in code after this list). Therefore,

  • for Bernoulli distributions, \(\ell\) is defined as \[ \ell(X | \tilde{X}) = -\sum_{m=1}^M \sum_{n=1}^N \left[ X_{mn}\ln (\tilde{X}_{mn}) + (1 - X_{mn}) \ln (1 - \tilde{X}_{mn}) \right] \tag{3}\]

  • and for normal distributions, \[ \ell(X | \tilde{X}) = MN \ln \left(\sigma {\sqrt {2\pi }}\right) + {\frac {1}{2\sigma^2}}\sum_{m=1}^M \sum_{n=1}^N \left( {X_{mn}-\tilde{X}_{mn} }\right)^{2} \tag{4}\]
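Both losses translate directly into NumPy. The following sketch (the function names are mine, and \(\sigma\) is treated as a fixed parameter) clips \(\tilde{X}\) to keep the Bernoulli logarithms finite:

Code
import numpy as np

def bernoulli_nll(X, X_tilde, eps=1e-12):
    # Equation 3: X is binary, X_tilde has entries in (0, 1).
    X_tilde = np.clip(X_tilde, eps, 1 - eps)  # guard the logarithms
    return -np.sum(X * np.log(X_tilde) + (1 - X) * np.log(1 - X_tilde))

def normal_nll(X, X_tilde, sigma=1.0):
    # Equation 4: squared error plus the normalizing constant, fixed sigma.
    M, N = X.shape
    return (M * N * np.log(sigma * np.sqrt(2 * np.pi))
            + np.sum((X - X_tilde) ** 2) / (2 * sigma ** 2))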

Example

Code
import numpy as np
import matplotlib.pyplot as plt

# Spiral in polar coordinates: the radius grows linearly with the angle.
r = np.arange(0, 2, 0.01)
theta = 2 * np.pi * r

fig, ax = plt.subplots(subplot_kw={'projection': 'polar'})
ax.plot(theta, r)
ax.set_rticks([0.5, 1, 1.5, 2])  # fewer radial ticks
ax.grid(True)
plt.show()

Figure 1: A line plot on a polar axis.

References

Cutler, A., & Breiman, L. (1994). Archetypal analysis. Technometrics, 36, 338–347. https://doi.org/10.1080/00401706.1994.10485840
Mørup, M., & Hansen, L. K. (2012). Archetypal analysis for machine learning and data mining. Neurocomputing, 80, 54–63. https://doi.org/10.1016/j.neucom.2011.06.033
Seth, S., & Eugster, M. J. A. (2016). Probabilistic archetypal analysis. Machine Learning, 102, 85–113. https://doi.org/10.1007/s10994-015-5498-8