Wolfram Research

Function Repository Resource:

MultidimensionalScaling

Reduce a matrix of real values to low dimension using the principal coordinates analysis method

Contributed by: Daniel Lichtblau

ResourceFunction["MultidimensionalScaling"][vecs,dim]

uses principal coordinates analysis to find a "best projection" of vecs to dimension dim.

ResourceFunction["MultidimensionalScaling"][vecs]

projects vecs to two dimensions.

Details

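Multidimensional scaling (MDS) is a standard method for reducing the dimension of a set of numerical vectors. It is related to the "LatentSemanticAnalysis" and "PrincipalComponentsAnalysis" methods of DimensionReduce.
ResourceFunction["MultidimensionalScaling"] gives a reduction that is optimal with respect to a Euclidean measure. In classical principal coordinates analysis, this means that the centered inner products of the reduced vectors are the best least-squares approximation to those of the input vectors.
Given a set of n vectors, ResourceFunction["MultidimensionalScaling"] creates all pairwise distances, that is, a dense n×n matrix of real values. Memory use therefore grows as n², so the function is not recommended when n is large: at 8 bytes per machine real, n = 100,000 would already need about 80 GB for the distance matrix alone.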
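
The distance-matrix construction described above can be sketched as follows. This is a minimal textbook version of principal coordinates analysis (classical MDS), not the source of the repository function, whose implementation may differ. The name classicalMDSSketch is hypothetical, and the sketch assumes plain Euclidean distances between the rows of a numeric matrix.

```wolfram
(* Hypothetical helper, for illustration only; not the repository function's source. *)
classicalMDSSketch[vecs_?MatrixQ, dim_Integer?Positive] :=
 Module[{n = Length[vecs], d2, j, b, vals, evecs, scales},
  (* dense n x n matrix of squared Euclidean distances between rows *)
  d2 = DistanceMatrix[N[vecs], DistanceFunction -> SquaredEuclideanDistance];
  (* centering matrix J = I - (1/n) 11^T *)
  j = IdentityMatrix[n] - ConstantArray[1./n, {n, n}];
  (* double centering turns squared distances into centered inner products *)
  b = -(j . d2 . j)/2;
  (* symmetrize against rounding error, then take the dim leading eigenpairs;
     for Euclidean distances b is positive semidefinite up to rounding *)
  {vals, evecs} = Eigensystem[(b + Transpose[b])/2, dim];
  (* guard against tiny negative eigenvalues before taking square roots *)
  scales = Sqrt[Max[#, 0.] & /@ vals];
  (* scale each eigenvector by its factor; the result has one row per input vector *)
  Transpose[scales*evecs]
 ]

classicalMDSSketch[{{1, 2, 3}, {2, 3, 5}, {3, 5, 8}, {4, 5, 8.5}}, 2]
```

The result is determined only up to reflection of the coordinate axes, so individual coordinates may differ in sign from those returned by other implementations, while pairwise distances between the reduced vectors are unaffected.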

Examples

Basic Examples (1) 

Reduce the dimension of some vectors:

In[1]:=
ResourceFunction[
 "MultidimensionalScaling"][{{1, 2, 3}, {2, 3, 5}, {3, 5, 8}, {4, 5, 8.5}}]
Out[1]=

Scope (2) 

Create and visualize random 3D vectors:

In[2]:=
vectors = Join[RandomReal[{0, 3}, {500, 3}], RandomReal[{2, 5}, {500, 3}], RandomReal[{4, 7}, {500, 3}]];
ListPointPlot3D[vectors]
Out[3]=

Visualize this dataset reduced to two dimensions:

In[4]:=
ListPlot[ResourceFunction["MultidimensionalScaling"][vectors]]
Out[4]=

MultidimensionalScaling can reduce to any dimension that is no larger than the input dimension. Here we create data in ten-dimensional space and visualize it in three dimensions:

In[5]:=
dim = 10;
num = 500;
vectors = Join[RandomReal[{0, 3}, {num, dim}], RandomReal[{2, 5}, {num, dim}],
    RandomReal[{4, 7}, {num, dim}]];
In[6]:=
ListPointPlot3D[
 ResourceFunction["MultidimensionalScaling"][vectors, 3]]
Out[6]=
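
One rough way to judge how much structure survives such a reduction is to compare pairwise distances before and after. The following is a sketch only: the seed, the sample size and the use of correlation as a summary are arbitrary choices made here, not part of the resource function.

```wolfram
SeedRandom[1234]; (* arbitrary seed, chosen only for reproducibility *)
sample = RandomReal[{0, 3}, {200, 10}];
reduced = ResourceFunction["MultidimensionalScaling"][sample, 3];

(* flatten both distance matrices and compare them entry by entry *)
dOriginal = Flatten[DistanceMatrix[sample]];
dReduced = Flatten[DistanceMatrix[reduced]];
Correlation[dOriginal, dReduced]
```

A value close to 1 suggests that the reduced vectors preserve the relative distances well; lower values indicate that more target dimensions may be needed.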

Properties and Relations (8) 

As is done in the reference page for DimensionReduce, load the Fisher iris dataset from ExampleData:

In[7]:=
iris = ExampleData[{"MachineLearning", "FisherIris"}, "Data"];

Reduce the dimension of the features:

In[8]:=
featuresMDS = ResourceFunction["MultidimensionalScaling"][iris[[All, 1]], 2];

Group the examples by their species:

In[9]:=
byspeciesMDS = GroupBy[Thread[featuresMDS -> iris[[All, 2]]], Last -> First];

Visualize the reduced dataset:

In[10]:=
ListPlot[Values[byspeciesMDS], PlotLegends -> Keys[byspeciesMDS]]
Out[10]=

Now apply some DimensionReduce methods to the same dataset. First, use the "PrincipalComponentsAnalysis" method:

In[11]:=
featuresPCA = DimensionReduce[iris[[All, 1]], 2, Method -> "PrincipalComponentsAnalysis"];
byspeciesPCA = GroupBy[Thread[featuresPCA -> iris[[All, 2]]], Last -> First];
ListPlot[Values[byspeciesPCA], PlotLegends -> Keys[byspeciesPCA]]
Out[12]=
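
For Euclidean distances, textbook principal coordinates analysis yields the same configuration as the leading principal component scores, up to reflection of the axes. The check below is a sketch under that assumption. It uses the built-in PrincipalComponents rather than DimensionReduce, since DimensionReduce may preprocess the features, and it compares distance matrices because they are unaffected by reflections. Whether the repository function agrees to rounding error is an assumption here, not a documented guarantee.

```wolfram
iris = ExampleData[{"MachineLearning", "FisherIris"}, "Data"];
features = N[iris[[All, 1]]];

mds = ResourceFunction["MultidimensionalScaling"][features, 2];
(* scores on the first two principal components *)
pca = PrincipalComponents[features][[All, 1 ;; 2]];

(* largest discrepancy between the two sets of pairwise distances *)
Max[Abs[DistanceMatrix[mds] - DistanceMatrix[pca]]]
```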

Use the "TSNE" method:

In[13]:=
featuresTSNE = DimensionReduce[iris[[All, 1]], 2, Method -> "TSNE"];
byspeciesTSNE = GroupBy[Thread[featuresTSNE -> iris[[All, 2]]], Last -> First];
ListPlot[Values[byspeciesTSNE], PlotLegends -> Keys[byspeciesTSNE]]
Out[14]=

Visualize with the "LatentSemanticAnalysis" method:

In[15]:=
featuresLSA = DimensionReduce[iris[[All, 1]], 2, Method -> "LatentSemanticAnalysis"];
byspeciesLSA = GroupBy[Thread[featuresLSA -> iris[[All, 2]]], Last -> First];
ListPlot[Values[byspeciesLSA], PlotLegends -> Keys[byspeciesLSA]]
Out[16]=

The result of the "LatentSemanticAnalysis" method can be reproduced directly using SingularValueDecomposition:

In[17]:=
LSA[vecs_, dim_] := With[{svd = SingularValueDecomposition[vecs, dim]}, svd[[1]] . Sqrt[svd[[2]]]]
In[18]:=
featuresLSA2 = LSA[iris[[All, 1]], 2];
byspeciesLSA2 = GroupBy[Thread[featuresLSA2 -> iris[[All, 2]]], Last -> First];
ListPlot[Values[byspeciesLSA2], PlotLegends -> Keys[byspeciesLSA2]]
Out[19]=

Neat Examples (6) 

Illustrate multidimensional scaling on textual data using several well-known literary texts from ExampleData:

Break each text into chunks of equal string length:

Find the most common words across all texts:

Create common word frequency vectors for each chunk:

Weight the frequency vectors using the log-entropy method:
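
The code for this step is not reproduced on this page. As a rough sketch of the standard log-entropy scheme, assuming a count matrix whose rows are chunks and whose columns are words, each of which occurs at least once, the weighting could look like this. The name logEntropyWeightSketch and the row and column layout are assumptions made here, not taken from the original notebook.

```wolfram
(* Hypothetical helper: log-entropy weighting of a chunks-by-words count matrix. *)
logEntropyWeightSketch[counts_?MatrixQ] :=
 Module[{n = Length[counts], totals, p, plogp, global},
  (* total count of each word over all chunks; assumed positive *)
  totals = Total[counts];
  (* p[[i, j]]: share of word i's occurrences that fall in chunk j *)
  p = N[Transpose[counts]]/totals;
  (* treat 0 Log[0] as 0 *)
  plogp = Map[If[# > 0, # Log[#], 0.] &, p, {2}];
  (* global weight per word: 1 + Sum_j p_ij Log[p_ij] / Log[n] *)
  global = 1 + Total[plogp, {2}]/Log[n];
  (* local weight Log[1 + count], scaled column by column by the global weight *)
  (global*#) & /@ Log[1. + counts]
 ]

logEntropyWeightSketch[{{2, 0, 1}, {2, 3, 0}, {2, 1, 1}}]
```

In this small example the first word occurs equally often in every chunk, so its global weight, and hence its whole column, is zero up to rounding. Words concentrated in a few chunks receive weights closer to their local Log[1 + count] values.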

Show the result of multidimensional scaling in two dimensions, grouping the text chunks by the position of their title in the list of text names:

In[20]:=
wdim2 = ResourceFunction["MultidimensionalScaling"][
   weightedWordsByChunkMatrix, 2];
byAuthorW = GatherBy[Thread[labels -> wdim2], First];
ListPlot[Values[byAuthorW], PlotLegends -> Apply[Union, Keys[byAuthorW]]]
Out[22]=
