Method: Gower similarity measure

  • Pro: It can be used for vectors with both categorical and continuous variables.
  • Con: It takes into account neither correlation between variables nor variable importance.

\[\begin{equation} d_{gower}(\underline{x}_i, \underline{x}_j) = \frac{1}{p} \sum_{k=1}^p d^k(x_i^k, x_j^k) \end{equation}\]

For a continuous variable:

\[ d^k(x_i^k, x_j^k)=\frac{|x_i^k-x_j^k|}{\max(x_1^k,\ldots,x_n^k)-\min(x_1^k,\ldots,x_n^k)}, \]

For a categorical variable:

\[ d^k(x_i^k, x_j^k)=1_{x_i^k \neq x_j^k} \]