Wolfram Research

Function Repository Resource:

MultidimensionalScaling


Reduce a matrix of real values to low dimension using the principal coordinates analysis method

Contributed by: Daniel Lichtblau

ResourceFunction["MultidimensionalScaling"][vecs,dim]

uses principal coordinates analysis to find a "best projection" of vecs to dimension dim.

ResourceFunction["MultidimensionalScaling"][vecs]

projects vecs to two dimensions.

Details and Options

Multidimensional scaling (MDS) is a standard method for reducing the dimension of a set of numerical vectors. It is related to the "LatentSemanticAnalysis" and "PrincipalComponentsAnalysis" methods of DimensionReduce.
ResourceFunction["MultidimensionalScaling"] gives an optimal reduction according to a certain Euclidean measure.
Given a set of n vectors, ResourceFunction["MultidimensionalScaling"] computes all pairwise distances, forming a dense n×n matrix of real values. It is therefore not recommended when n is large.
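
The underlying principal coordinates computation is short. Here is a minimal sketch, assuming Euclidean distances and nonnegative eigenvalues; the name classicalMDS is hypothetical, and the resource function's actual implementation may differ in details:

classicalMDS[vecs_, dim_] := Module[{n = Length[vecs], d2, c, b, vals, evecs},
  d2 = DistanceMatrix[N[vecs]]^2;                       (* squared pairwise distances *)
  c = IdentityMatrix[n] - ConstantArray[1./n, {n, n}];  (* centering matrix *)
  b = -c.d2.c/2;                                        (* double-centered Gram matrix *)
  {vals, evecs} = Eigensystem[b, dim];                  (* dim largest eigenpairs *)
  Transpose[Sqrt[Clip[vals, {0, Infinity}]]*evecs]      (* Clip guards roundoff negatives *)
  ]

Applied to the vectors in the basic example below, this should agree with the resource function's output up to reflection and rotation of the configuration.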

Examples

Basic Examples

Reduce the dimension of some vectors:

In[1]:=
ResourceFunction[
 "MultidimensionalScaling"][{{1, 2, 3}, {2, 3, 5}, {3, 5, 8}, {4, 5, 8.5}}]
Out[1]=

Scope

Create and visualize random 3D vectors:

In[2]:=
vectors = Join[RandomReal[{0, 3}, {500, 3}], RandomReal[{2, 5}, {500, 3}], RandomReal[{4, 7}, {500, 3}]];
ListPointPlot3D[vectors]
Out[3]=

Visualize this dataset reduced to two dimensions:

In[4]:=
ListPlot[ResourceFunction["MultidimensionalScaling"][vectors]]
Out[4]=

MultidimensionalScaling will reduce to any dimension that is no larger than the input dimension. Here we create data in ten-dimensional space and visualize it reduced to three dimensions:

In[5]:=
dim = 10;
num = 500;
vectors = Join[RandomReal[{0, 3}, {num, dim}], RandomReal[{2, 5}, {num, dim}],
    RandomReal[{4, 7}, {num, dim}]];
In[6]:=
ListPointPlot3D[
 ResourceFunction["MultidimensionalScaling"][vectors, 3]]
Out[6]=

Properties and Relations

As on the reference page for DimensionReduce, load the Fisher iris dataset from ExampleData:

In[7]:=
iris = ExampleData[{"MachineLearning", "FisherIris"}, "Data"];

Reduce the dimension of the features:

In[8]:=
featuresMDS = ResourceFunction["MultidimensionalScaling"][iris[[All, 1]], 2];

Group the examples by their species:

In[9]:=
byspeciesMDS = GroupBy[Thread[featuresMDS -> iris[[All, 2]]], Last -> First];

Visualize the reduced dataset:

In[10]:=
ListPlot[Values[byspeciesMDS], PlotLegends -> Keys[byspeciesMDS]]
Out[10]=

Now compare with some DimensionReduce methods on the same dataset. First, use the "PrincipalComponentsAnalysis" method:

In[11]:=
featuresPCA = DimensionReduce[iris[[All, 1]], 2, Method -> "PrincipalComponentsAnalysis"];
byspeciesPCA = GroupBy[Thread[featuresPCA -> iris[[All, 2]]], Last -> First];
ListPlot[Values[byspeciesPCA], PlotLegends -> Keys[byspeciesPCA]]
Out[12]=

Use the "TSNE" method:

In[13]:=
featuresTSNE = DimensionReduce[iris[[All, 1]], 2, Method -> "TSNE"];
byspeciesTSNE = GroupBy[Thread[featuresTSNE -> iris[[All, 2]]], Last -> First];
ListPlot[Values[byspeciesTSNE], PlotLegends -> Keys[byspeciesTSNE]]
Out[14]=

Visualize with the "LatentSemanticAnalysis" method:

In[15]:=
featuresLSA = DimensionReduce[iris[[All, 1]], 2, Method -> "LatentSemanticAnalysis"];
byspeciesLSA = GroupBy[Thread[featuresLSA -> iris[[All, 2]]], Last -> First];
ListPlot[Values[byspeciesLSA], PlotLegends -> Keys[byspeciesLSA]]
Out[16]=

The "LatentSemanticAnalysis" method can be attained directly using SingularValueDecomposition:

In[17]:=
LSA[vecs_, dim_] := With[{svd = SingularValueDecomposition[vecs, dim]},
  svd[[1]].Sqrt[svd[[2]]]] (* project onto the dim leading singular vectors, scaled by square roots of the singular values *)
In[18]:=
featuresLSA2 = LSA[iris[[All, 1]], 2];
byspeciesLSA2 = GroupBy[Thread[featuresLSA2 -> iris[[All, 2]]], Last -> First];
ListPlot[Values[byspeciesLSA2], PlotLegends -> Keys[byspeciesLSA2]]
Out[19]=

Neat Examples

Illustrate multidimensional scaling on textual data, using several well-known literary texts from ExampleData:

In[20]:=
textnames = {{"Text", "AeneidEnglish"}, {"Text", "AliceInWonderland"}, {"Text", "BeowulfModern"}, {"Text", "DonQuixoteIEnglish"}, {"Text", "Hamlet"}, {"Text", "PrideAndPrejudice"}};
texts = Map[ExampleData, textnames];

Break each text into chunks of equal string length:

In[21]:=
seglen = 5000;
(* assumed helper, not shown in the original: strip punctuation characters from a word *)
deletePunctuation[word_String] := StringDelete[word, PunctuationCharacter];
chunks = Map[Take[StringPartition[#, seglen], UpTo[20]] &, texts];
chunkwordlists = Map[StringSplit, chunks, {2}];
chunkwordlistsB = Map[deletePunctuation, chunkwordlists, {3}];

Find the most common words across all texts:

In[22]:=
pwordcount = 500;
talliedwords = Tally[Flatten[chunkwordlistsB]];
popularwordsandcounts = TakeLargestBy[talliedwords, Last, pwordcount];
popularwords = popularwordsandcounts[[All, 1]];
popwPatterns = Apply[Alternatives, popularwords];

Create common word frequency vectors for each chunk:

In[23]:=
popchunkwords = Map[Cases[#, popwPatterns] &, chunkwordlistsB, {2}];
chunkTallies = Map[Tally, popchunkwords, {2}]; (* per-chunk counts of the popular words *)
pwordposns = Dispatch[Thread[popularwords -> Range[pwordcount]]];
wordvecs = Map[Normal[SparseArray[#, pwordcount]] &, Map[(#[[1]] /. pwordposns) -> #[[2]] &, chunkTallies, {3}], {2}];
allwordvecs = Flatten[wordvecs, 1];

Weight the frequency vectors using the log-entropy method:

In[24]:=
globalWordVec = popularwordsandcounts[[All, 2]]; (* global counts of the popular words *)
labels = Flatten[Table[ConstantArray[j, Length[chunks[[j]]]], {j, Length[texts]}]]; (* text index for each chunk *)
frequencyMatrix = Map[1./globalWordVec*# &, N@allwordvecs]; (* per-chunk counts as fractions of the global counts *)
(* the inert symbol log keeps zero-frequency terms from evaluating to Indeterminate; see below *)
globalEntropyVector = (1. + Total[frequencyMatrix*Map[log, frequencyMatrix, {2}]]/Log[Length[allwordvecs]]) /. log -> Log;
weightedWordsByChunkMatrix = Map[globalEntropyVector*# &, Log[1. + allwordvecs]];
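
The inert symbol log above is a device for handling zero frequencies: 0.*log[0.] immediately simplifies to 0., so zero-frequency terms vanish before log is replaced by Log, whereas Log[0.] is -Infinity and 0.*Log[0.] gives Indeterminate. A quick illustration of the device (not from the original notebook):

0.*Log[0.]                  (* Indeterminate *)
0.*log[0.] /. log -> Log    (* 0. *)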

Show the result of multidimensional scaling in two dimensions, grouping the text chunks by the position of their title in the list of text names:

In[25]:=
wdim2 = ResourceFunction["MultidimensionalScaling"][weightedWordsByChunkMatrix, 2];
byAuthorW = GroupBy[Thread[labels -> wdim2], First -> Last];
ListPlot[Values[byAuthorW], PlotLegends -> Keys[byAuthorW]]
Out[27]=
