Wolfram Research

Function Repository Resource:

MultidimensionalScaling


Reduce a matrix of real values to low dimension using the principal coordinates analysis method

Contributed by: Daniel Lichtblau

ResourceFunction["MultidimensionalScaling"][vecs,dim]

uses principal coordinates analysis to find a "best projection" of vecs to dimension dim.

ResourceFunction["MultidimensionalScaling"][vecs]

projects vecs to two dimensions.

Details and Options

Multidimensional scaling (MDS) is a standard method for reducing the dimension of a set of numerical vectors. It is related to the "LatentSemanticAnalysis" and "PrincipalComponentsAnalysis" methods of DimensionReduce.
ResourceFunction["MultidimensionalScaling"] gives an optimal reduction according to a certain Euclidean measure.
Given a set of n vectors, ResourceFunction["MultidimensionalScaling"] computes all pairwise distances, forming a dense n×n matrix of real values. It is therefore not recommended when n is large.
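
The underlying principal coordinates computation is short. Here is a minimal sketch, assuming Euclidean distances and nonnegative eigenvalues; the name classicalMDS is hypothetical, and the resource function's actual implementation may differ in details:

classicalMDS[vecs_, dim_] := Module[{n = Length[vecs], d2, c, b, vals, evecs},
  d2 = DistanceMatrix[N[vecs]]^2;                       (* squared pairwise distances *)
  c = IdentityMatrix[n] - ConstantArray[1./n, {n, n}];  (* centering matrix *)
  b = -c.d2.c/2;                                        (* double-centered Gram matrix *)
  {vals, evecs} = Eigensystem[b, dim];                  (* dim largest eigenpairs *)
  Transpose[Sqrt[Clip[vals, {0, Infinity}]]*evecs]      (* Clip guards roundoff negatives *)
  ]

Applied to the vectors in the basic example below, this should agree with the resource function's output up to reflection and rotation of the configuration.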

Examples

Basic Examples

Reduce the dimension of some vectors:

In[1]:=
ResourceFunction[
 "MultidimensionalScaling"][{{1, 2, 3}, {2, 3, 5}, {3, 5, 8}, {4, 5, 8.5}}]
Out[1]=

Scope

Create and visualize random 3D vectors:

In[2]:=
vectors = Join[RandomReal[{0, 3}, {500, 3}], RandomReal[{2, 5}, {500, 3}], RandomReal[{4, 7}, {500, 3}]];
ListPointPlot3D[vectors]
Out[3]=

Visualize this dataset reduced to two dimensions:

In[4]:=
ListPlot[ResourceFunction["MultidimensionalScaling"][vectors]]
Out[4]=

MultidimensionalScaling will reduce to any dimension that is no larger than the input dimension. Here we create data in ten-dimensional space and visualize it reduced to three dimensions:

In[5]:=
dim = 10;
num = 500;
vectors = Join[RandomReal[{0, 3}, {num, dim}], RandomReal[{2, 5}, {num, dim}],
    RandomReal[{4, 7}, {num, dim}]];
In[6]:=
ListPointPlot3D[
 ResourceFunction["MultidimensionalScaling"][vectors, 3]]
Out[6]=

Properties and Relations

As on the reference page for DimensionReduce, load the Fisher iris dataset from ExampleData:

In[7]:=
iris = ExampleData[{"MachineLearning", "FisherIris"}, "Data"];

Reduce the dimension of the features:

In[8]:=
featuresMDS = ResourceFunction["MultidimensionalScaling"][iris[[All, 1]], 2];

Group the examples by their species:

In[9]:=
byspeciesMDS = GroupBy[Thread[featuresMDS -> iris[[All, 2]]], Last -> First];

Visualize the reduced dataset:

In[10]:=
ListPlot[Values[byspeciesMDS], PlotLegends -> Keys[byspeciesMDS]]
Out[10]=

Now compare with some DimensionReduce methods on the same dataset. First, use the "PrincipalComponentsAnalysis" method:

In[11]:=
featuresPCA = DimensionReduce[iris[[All, 1]], 2, Method -> "PrincipalComponentsAnalysis"];
byspeciesPCA = GroupBy[Thread[featuresPCA -> iris[[All, 2]]], Last -> First];
ListPlot[Values[byspeciesPCA], PlotLegends -> Keys[byspeciesPCA]]
Out[12]=

Use the "TSNE" method:

In[13]:=
featuresTSNE = DimensionReduce[iris[[All, 1]], 2, Method -> "TSNE"];
byspeciesTSNE = GroupBy[Thread[featuresTSNE -> iris[[All, 2]]], Last -> First];
ListPlot[Values[byspeciesTSNE], PlotLegends -> Keys[byspeciesTSNE]]
Out[14]=

Visualize with the "LatentSemanticAnalysis" method:

In[15]:=
featuresLSA = DimensionReduce[iris[[All, 1]], 2, Method -> "LatentSemanticAnalysis"];
byspeciesLSA = GroupBy[Thread[featuresLSA -> iris[[All, 2]]], Last -> First];
ListPlot[Values[byspeciesLSA], PlotLegends -> Keys[byspeciesLSA]]
Out[16]=

The "LatentSemanticAnalysis" method can be attained directly using SingularValueDecomposition:

In[17]:=
LSA[vecs_, dim_] := With[{svd = SingularValueDecomposition[vecs, dim]},
  svd[[1]].Sqrt[svd[[2]]]] (* project onto the dim leading singular vectors, scaled by square roots of the singular values *)
In[18]:=
featuresLSA2 = LSA[iris[[All, 1]], 2];
byspeciesLSA2 = GroupBy[Thread[featuresLSA2 -> iris[[All, 2]]], Last -> First];
ListPlot[Values[byspeciesLSA2], PlotLegends -> Keys[byspeciesLSA2]]
Out[19]=

Neat Examples

Illustrate multidimensional scaling on textual data, using several well-known literary texts from ExampleData:

In[20]:=
textnames = {{"Text", "AeneidEnglish"}, {"Text", "AliceInWonderland"}, {"Text", "BeowulfModern"}, {"Text", "DonQuixoteIEnglish"}, {"Text", "Hamlet"}, {"Text", "PrideAndPrejudice"}};
texts = Map[ExampleData, textnames];

Break each text into chunks of equal string length:

In[21]:=
seglen = 5000;
(* assumed helper, not shown in the original: strip punctuation characters from a word *)
deletePunctuation[word_String] := StringDelete[word, PunctuationCharacter];
chunks = Map[Take[StringPartition[#, seglen], UpTo[20]] &, texts];
chunkwordlists = Map[StringSplit, chunks, {2}];
chunkwordlistsB = Map[deletePunctuation, chunkwordlists, {3}];

Find the most common words across all texts:

In[22]:=
pwordcount = 500;
talliedwords = Tally[Flatten[chunkwordlistsB]];
popularwordsandcounts = TakeLargestBy[talliedwords, Last, pwordcount];
popularwords = popularwordsandcounts[[All, 1]];
popwPatterns = Apply[Alternatives, popularwords];

Create common word frequency vectors for each chunk:

In[23]:=
popchunkwords = Map[Cases[#, popwPatterns] &, chunkwordlistsB, {2}];
chunkTallies = Map[Tally, popchunkwords, {2}]; (* per-chunk counts of the popular words *)
pwordposns = Dispatch[Thread[popularwords -> Range[pwordcount]]];
wordvecs = Map[Normal[SparseArray[#, pwordcount]] &, Map[(#[[1]] /. pwordposns) -> #[[2]] &, chunkTallies, {3}], {2}];
allwordvecs = Flatten[wordvecs, 1];

Weight the frequency vectors using the log-entropy method:

In[24]:=
globalWordVec = popularwordsandcounts[[All, 2]]; (* global counts of the popular words *)
labels = Flatten[Table[ConstantArray[j, Length[chunks[[j]]]], {j, Length[texts]}]]; (* text index for each chunk *)
frequencyMatrix = Map[1./globalWordVec*# &, N@allwordvecs]; (* per-chunk counts as fractions of the global counts *)
(* the inert symbol log keeps zero-frequency terms from evaluating to Indeterminate; see below *)
globalEntropyVector = (1. + Total[frequencyMatrix*Map[log, frequencyMatrix, {2}]]/Log[Length[allwordvecs]]) /. log -> Log;
weightedWordsByChunkMatrix = Map[globalEntropyVector*# &, Log[1. + allwordvecs]];
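
The inert symbol log above is a device for handling zero frequencies: 0.*log[0.] immediately simplifies to 0., so zero-frequency terms vanish before log is replaced by Log, whereas Log[0.] is -Infinity and 0.*Log[0.] gives Indeterminate. A quick illustration of the device (not from the original notebook):

0.*Log[0.]                  (* Indeterminate *)
0.*log[0.] /. log -> Log    (* 0. *)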

Show the result of multidimensional scaling in two dimensions, grouping the text chunks by the position of their title in the list of text names:

In[25]:=
wdim2 = ResourceFunction["MultidimensionalScaling"][weightedWordsByChunkMatrix, 2];
byAuthorW = GroupBy[Thread[labels -> wdim2], First -> Last];
ListPlot[Values[byAuthorW], PlotLegends -> Keys[byAuthorW]]
Out[27]=
