# Biarchetype analysis

*Simultaneous learning of samples and features based on extremes*

Categories: archetypal analysis, matrix factorization

Published November 16, 2022

# Introduction

Archetypal analysis was introduced by Cutler & Breiman (1994). Let $$X$$ be a (real-valued) data matrix whose rows represent the samples and whose columns represent the features. They defined the archetypes as convex combinations of the data samples, i.e. $$Z = BX$$, where $$B$$ is a stochastic matrix. In addition, the data samples are approximated by convex combinations of the archetypes, i.e. $$X \simeq AZ$$, where $$A$$ is also a stochastic matrix. This is equivalent to solving the following optimization problem:

\begin{aligned} \mathop{\mathrm{arg\,min\,}}_{A,B} \quad & \|X - ABX \|^2 \\ \textrm{s.t.} \quad & \\ & \sum\nolimits_{k=1}^K A_{mk} = 1 \text{ with } A_{mk} \in [0, 1] \text{ for each } m=1,\dots, M \\ & \sum\nolimits_{m=1}^M B_{km} = 1 \text{ with } B_{km} \in [0, 1] \text{ for each } k=1,\dots, K \\ \end{aligned} \tag{1}
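As a rough illustration of Equation 1 (not the alternating constrained least-squares scheme of Cutler & Breiman; all sizes and names below are made up), one can run projected gradient descent on $$A$$ and $$B$$, re-projecting the rows of each matrix onto the probability simplex after every step:

```python
import numpy as np

rng = np.random.default_rng(0)

def project_rows_to_simplex(V):
    """Project each row of V onto the probability simplex
    (sorting algorithm of Duchi et al., 2008)."""
    n = V.shape[1]
    U = -np.sort(-V, axis=1)                      # rows sorted descending
    css = np.cumsum(U, axis=1)
    j = np.arange(1, n + 1)
    rho = np.sum(U - (css - 1) / j > 0, axis=1)   # last index with a positive gap
    theta = (css[np.arange(len(V)), rho - 1] - 1) / rho
    return np.maximum(V - theta[:, None], 0.0)

# Toy data: M samples with N features, K archetypes (illustrative sizes)
M, N, K = 60, 2, 3
X = rng.random((M, N))

A = project_rows_to_simplex(rng.random((M, K)))   # mixing weights, M x K
B = project_rows_to_simplex(rng.random((K, M)))   # archetype weights, K x M

loss = lambda A, B: np.linalg.norm(X - A @ B @ X) ** 2
loss_init = loss(A, B)

lr = 1e-3  # step size chosen by hand for this toy problem
for _ in range(500):
    R = A @ B @ X - X                             # residual of the reconstruction
    A = project_rows_to_simplex(A - lr * 2 * R @ (B @ X).T)
    B = project_rows_to_simplex(B - lr * 2 * A.T @ R @ X.T)

Z = B @ X                                         # the K archetypes
loss_final = loss(A, B)
```

After the loop, the rows of $$A$$ and $$B$$ remain stochastic by construction, and the reconstruction error $$\|X - ABX\|^2$$ has dropped from its initial value.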

Intuitively, the archetypes lie on the boundary of the convex hull of the data, so they correspond to extreme, easily interpretable profiles, and each sample is described as a mixture of these extremes.

# BiArchetype Analysis

In BiAA, the archetypes are assumed to be convex combinations of the data in both dimensions, i.e. $$Z = BXC$$ where $$B$$ and $$C$$ are stochastic matrices. At the same time, the data is approximated by convex combinations of the archetypes, i.e. $$X \simeq AZD$$ where $$A$$ and $$D$$ are also stochastic matrices.

This is equivalent to solving the following optimization problem:

\begin{aligned} \mathop{\mathrm{arg\,min\,}}_{A,B,C,D} \quad & \ell(X|ABXCD) \\ \textrm{s.t.} \quad & \\ & \ell(X | \tilde{X}) \text{ should be a loss function} \\ & \sum\nolimits_{k=1}^K A_{mk} = 1 \text{ with } A_{mk} \in [0, 1] \text{ for each } m=1,\dots, M \\ & \sum\nolimits_{m=1}^M B_{km} = 1 \text{ with } B_{km} \in [0, 1] \text{ for each } k=1,\dots, K \\ & \sum\nolimits_{n=1}^N C_{nl} = 1 \text{ with } C_{nl} \in [0, 1] \text{ for each } l=1,\dots, L \\ & \sum\nolimits_{l=1}^L D_{ln} = 1 \text{ with } D_{ln} \in [0, 1] \text{ for each } n=1,\dots, N \\ \end{aligned} \tag{2}

In Equation 2, just as Seth & Eugster (2016) proposed for archetypal analysis, $$\ell$$ could be a negative log-likelihood function. Therefore,

• for Bernoulli distributions $$\ell$$ is defined as $\ell(X | \tilde{X}) = -\sum_{m=1}^M \sum_{n=1}^N \left[ X_{mn}\ln (\tilde{X}_{mn}) + (1 - X_{mn}) \ln (1 - \tilde{X}_{mn}) \right] \tag{3}$

• and for normal distributions, $\ell(X | \tilde{X}) = MN \ln \left(\sigma {\sqrt {2\pi }}\right) + {\frac {1}{2\sigma^2}}\sum_{m=1}^M \sum_{n=1}^N \left( {X_{mn}-\tilde{X}_{mn} }\right)^{2} \tag{4}$
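To make the dimensions in Equation 2 and the two losses concrete, the sketch below builds a random feasible point $$(A, B, C, D)$$ and evaluates Equations 3 and 4 at the reconstruction $$\tilde{X} = ABXCD$$. This is an evaluation-only illustration with made-up sizes, not a fitting procedure; the clipping constant `eps` is an assumption added to keep the logarithms finite:

```python
import numpy as np

rng = np.random.default_rng(1)

def row_stochastic(shape):
    """Random matrix whose rows sum to 1."""
    W = rng.random(shape)
    return W / W.sum(axis=1, keepdims=True)

# Illustrative sizes: M x N data, K sample archetypes, L feature archetypes
M, N, K, L = 8, 6, 3, 2
X = rng.integers(0, 2, size=(M, N)).astype(float)  # binary data for the Bernoulli case

A = row_stochastic((M, K))      # rows sum to 1 over k
B = row_stochastic((K, M))      # rows sum to 1 over m
C = row_stochastic((L, N)).T    # N x L, columns sum to 1 over n
D = row_stochastic((L, N))      # rows sum to 1 over n

X_tilde = A @ B @ X @ C @ D     # reconstruction through the K x L biarchetypes

# Equation 3: Bernoulli negative log-likelihood (clipped to avoid log(0))
eps = 1e-9
Xc = np.clip(X_tilde, eps, 1 - eps)
nll_bernoulli = -np.sum(X * np.log(Xc) + (1 - X) * np.log(1 - Xc))

# Equation 4: normal negative log-likelihood with sigma = 1
sigma = 1.0
nll_normal = (M * N * np.log(sigma * np.sqrt(2 * np.pi))
              + ((X - X_tilde) ** 2).sum() / (2 * sigma ** 2))
```

Because every factor is stochastic in the required direction, the entries of `X_tilde` are convex combinations of the entries of `X` and hence stay in $$[0, 1]$$ for binary data, which is what makes the Bernoulli loss well defined.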

# Example

```python
import numpy as np
import matplotlib.pyplot as plt

# Archimedean spiral in polar coordinates
r = np.arange(0, 2, 0.01)
theta = 2 * np.pi * r

fig, ax = plt.subplots(subplot_kw={'projection': 'polar'})
ax.plot(theta, r)
ax.set_rticks([0.5, 1, 1.5, 2])  # reduce the number of radial ticks
ax.grid(True)
plt.show()
```