Wolfram Research

Function Repository Resource:

NonNegativeMatrixFactorization

Source Notebook

Decompose a matrix into two non-negative matrix factors

Contributed by: Anton Antonov

ResourceFunction["NonNegativeMatrixFactorization"][mat,k]

finds non-negative matrix factors for the matrix mat using k dimensions.

Details and Options

Non-negative matrix factorization (NNMF) is a matrix factorization method that reduces the dimensionality of a given matrix.
NNMF has two factors: the left one is the reduced dimensionality representation matrix and the right one is the corresponding new-basis matrix.
The argument matrix need not be square.
NNMF allows easier interpretation of extracted topics in the framework of latent semantic analysis.
When using NNMF over collections of images and documents, often more than one execution is required in order to build confidence in the topics interpretation.
In comparison with the thin singular value decomposition, NNMF provides a non-orthogonal basis with vectors that have non-negative coordinates.
Here are the options taken by ResourceFunction["NonNegativeMatrixFactorization"]:
"Epsilon" 10-9 denominator (regularization) offset
MaxSteps 200 maximum iteration steps
"NonNegative" True should the non-negativity be enforced at each step
"Normalization" Left which normalization is applied to the factors
PrecisionGoal Automatic precision goal
"ProfilingPrints" False should profiling times be printed out during execution
"RegularizationParameter" 0.01 regularization (multiplier) parameter

Examples

Basic Examples

Create a random integer matrix:

In[1]:=
SeedRandom[7]
mat = RandomInteger[10, {4, 3}];
MatrixForm[mat]
Out[3]=

Compute the NNMF factors:

In[4]:=
{W, H} = ResourceFunction["NonNegativeMatrixFactorization"][mat, 2];
Row[{MatrixForm[W], MatrixForm[H]}]
Out[5]=

Here is the matrix product of the obtained factors:

In[6]:=
MatrixForm[W.H]
Out[6]=

Note that elementwise relative errors between the original matrix and reconstructed matrix are small:

In[7]:=
MatrixForm[Round[(W.H - mat)/mat, 0.001]]
Out[7]=

Scope

Here is a random matrix with its first two columns having much larger magnitudes than the rest:

In[8]:=
SeedRandom[278];
mat = RandomReal[{20, 100}, {10, 2}];
mat2 = RandomReal[{10, 30}, {10, 7}];
mat = ArrayPad[mat, {{0, 0}, {0, 5}}] + mat2;
ListLinePlot[Transpose[mat], PlotRange -> All, ImageSize -> Medium, PlotTheme -> "Detailed"]
Out[12]=

Here we compute NNMF factors:

In[13]:=
SeedRandom[32];
{W, H} = ResourceFunction["NonNegativeMatrixFactorization"][mat, 3, MaxSteps -> 100, PrecisionGoal -> 8];

Find the relative error of the approximation by the matrix factorization:

In[14]:=
Norm[mat - W.H]/Norm[mat]
Out[14]=

Here is the relative error for the first three columns:

In[15]:=
Norm[mat[[All, 1 ;; 3]] - (W.H)[[All, 1 ;; 3]]]/
 Norm[mat[[All, 1 ;; 3]]]
Out[15]=

Here are comparison plots:

In[16]:=
Block[{opts = {PlotRange -> All, ImageSize -> 250, PlotTheme -> "Detailed"}}, Row[{ListLinePlot[Transpose[mat], opts, PlotLabel -> "Orginal"], ListLinePlot[Transpose[W.H], opts, PlotLabel -> "Reconstructed"]}]]
Out[16]=

Options

Normalization

NNMF factors can be normalized in two ways: (i) the Euclidean norms of the columns of the left factor are all equal to 1; or (ii) the Euclidean norms of the rows of the right factor are all equal to 1. Here is a table that shows NNMF factors for different normalization specifications:

In[17]:=
Block[{mat},
 SeedRandom[7];
 mat = RandomInteger[10, {4, 3}];
 Multicolumn[
  Table[
   SeedRandom[121];
   {W, H} = ResourceFunction["NonNegativeMatrixFactorization"][mat, 2, "Normalization" -> nspec];
   Grid[{
     {Style[nspec, Bold, Blue], SpanFromLeft},
     {Labeled[MatrixForm[W], "W", Top], Labeled[MatrixForm[H], "H", Top]},
     {"Norm/@Transpose[W]:", Norm /@ Transpose[W]},
     {"Norm/@H:", Norm /@ H}},
    Dividers -> All, FrameStyle -> GrayLevel[0.8], Alignment -> Left
    ], {nspec, {Automatic, Left, Right, True, False, None}}],
  2, Dividers -> All]
 ]
Out[17]=

RegularizationParameter

The implemented NNMF algorithm uses the gradient descent algorithm. The regularization parameter controls the “learning rate” and can have dramatic influence on the number of iteration steps and approximation precision. Compute NNMF with different regularization multiplier parameters:

In[18]:=
res = Association[# -> BlockRandom[SeedRandom[22];
       ResourceFunction["NonNegativeMatrixFactorization"][mat, 3, "RegularizationParameter" -> #, MaxSteps -> 10, PrecisionGoal -> 3]] & /@ {0.01, 2}];

Plot the results:

In[19]:=
Grid[KeyValueMap[{#1, ListLinePlot[Transpose[#2[[1]]], PlotRange -> All, ImageSize -> 250, PlotTheme -> "Detailed"]} &, res], Alignment -> {Center, Left}]
Out[19]=

Here are the corresponding norms:

In[20]:=
Map[Norm[# - mat]/Norm[mat] &, Dot @@@ res]
Out[20]=

Applications

One of the main motivations for developing NNMF algorithms is the easier interpretation of extracted topics in the framework of latent semantic analysis. The following code illustrates the extraction of topics over a dataset of movie reviews.

Start with a movie reviews dataset:

In[21]:=
movieReviews = ExampleData[{"MachineLearning", "MovieReview"}, "Data"];
Dimensions[movieReviews]
Out[22]=

Change the labels “positive” and “negative” to have the prefix “tag:”:

In[23]:=
movieReviews[[All, 2]] = "tag:" <> # & /@ movieReviews[[All, 2]];

Concatenate to each review the corresponding label and create an Association:

In[24]:=
aMovieReviews = AssociationThread[
   Range[Length[movieReviews]] -> Map[StringRiffle[List @@ #, " "] &, movieReviews]];
RandomSample[aMovieReviews, 2]
Out[25]=

Select movie reviews that are nontrivial:

In[26]:=
aMovieReviews = Select[aMovieReviews, StringQ[#] && StringLength[#] > 10 &];
RandomSample[aMovieReviews, 2]
Out[27]=

Split each review into words and delete stopwords:

In[28]:=
aMovieReviews2 = DeleteStopwords[Select[StringSplit[#], StringLength[#] > 0 &]] & /@ aMovieReviews;

Convert the movie reviews association into a list of ID-word pairs (“long form”):

In[29]:=
lsLongForm = Join @@ MapThread[Thread[{##}] &, Transpose[List @@@ Normal[aMovieReviews2]]];
RandomSample[lsLongForm, 4]
Out[30]=

Replace each word with its Porter stem:

In[31]:=
AbsoluteTiming[
 aStemRules = Dispatch[Thread[Rule[#, WordData[#, "PorterStem"] & /@ #]] &@
    Union[lsLongForm[[All, 2]]]];
 lsLongForm[[All, 2]] = lsLongForm[[All, 2]] /. aStemRules;
 ]
Out[31]=

Find the frequency of appearance of all unique word stems and pick words that appear in more than 30 reviews:

In[32]:=
aTallies = Association[Rule @@@ Tally[lsLongForm[[All, 2]]]];
aTallies = Select[aTallies, # > 20 &];
Length[aTallies]
Out[34]=

Filter the ID-word pairs list to contain only words that are sufficiently popular:

In[35]:=
lsLongForm = Select[lsLongForm, KeyExistsQ[aTallies, #[[2]]] && StringLength[#[[2]]] > 2 &];

Compute an ID-word contingency matrix:

In[36]:=
ctObj = ResourceFunction["CrossTabulate"][lsLongForm, "Sparse" -> True];

Here is a sample of the contingency matrix as a dataset:

In[37]:=
SeedRandom[9938];
ResourceFunction["CrossTabulate"][RandomSample[lsLongForm, 12]]
Out[38]=

Visualize the contingency matrix:

In[39]:=
CTMatrixPlot[x_Association /; KeyExistsQ[x, "SparseMatrix"], opts___] := MatrixPlot[x["SparseMatrix"], Append[{opts}, FrameLabel -> {{Keys[x][[2]], None}, {Keys[x][[3]], None}}]];
CTMatrixPlot[ctObj]
Out[40]=

Take the contingency sparse matrix:

In[41]:=
matCT = N[ctObj["SparseMatrix"]];

Here is a plot that shows the Pareto principle adherence of the selected popular words:

In[42]:=
ResourceFunction["ParetoPrinciplePlot"][Total[matCT, {1}]]
Out[42]=

Judging from the Pareto plot we should apply the inverse document frequency (IDF) formula:

In[43]:=
matCT = matCT.SparseArray[
    DiagonalMatrix[Log[Dimensions[matCT][[1]]/Total[matCT, {1}]]]];

Normalize each row with the Euclidean norm:

In[44]:=
matCT = matCT/Sqrt[Total[matCT*matCT, {2}]];

In order to get computations faster, we take a sample submatrix:

In[45]:=
SeedRandom[8966]
matCT2 = matCT[[RandomSample[Range[Dimensions[matCT][[1]]], 4000], All]]
Out[46]=

Apply NNMF in order to extract 24 topics using 12 maximum iteration steps, normalizing the right factor:

In[47]:=
SeedRandom[23];
AbsoluteTiming[
 {W, H} = ResourceFunction["NonNegativeMatrixFactorization"][matCT2, 24, MaxSteps -> 12, "Normalization" -> Right];
 ]
Out[48]=

Show the extracted topics using the right factor (H):

In[49]:=
Multicolumn[
 Table[
  Column[{Style[ind, Blue, Bold], ColumnForm[
     Keys[TakeLargest[
       AssociationThread[
        ctObj["ColumnNames"] -> Normal[H[[ind, All]]]], 10]]]}],
  {ind, Dimensions[H][[1]]}
  ], 8, Dividers -> All]
Out[49]=

Here we show statistical thesaurus entries for random words, selected words and the labels (“tag:positive”,”tag:negative”):

In[50]:=
SeedRandom[898];
rinds = Flatten[
   Position[ctObj["ColumnNames"], #] & /@ Map[WordData[#, "PorterStem"] &, {"tag:positive", "tag:negative", "book", "amusing", "actor", "plot", "culture", "comedy", "director", "thoughtful", "epic", "film", "bad", "good"}]];
rinds = Sort@
   Join[rinds, RandomSample[Range[Dimensions[H][[2]]], 16 - Length[rinds]]];
Multicolumn[
 Table[
  Column[{Style[ctObj["ColumnNames"][[ind]], Blue, Bold], ColumnForm[
     ctObj["ColumnNames"][[
      Flatten@Nearest[Normal[Transpose[H]] -> "Index", H[[All, ind]], 12]]]]}],
  {ind, rinds}
  ], 8, Dividers -> All]
Out[53]=

Properties and Relations

NNMF should be compared with the singular value decomposition (SVD) and independent component analysis (ICA).

Generate 3D points:

In[54]:=
n = 200;
c = 12;
SeedRandom[232];
points = Transpose[
   RandomVariate[NormalDistribution[0, #], n] & /@ {2, 4, 0.1}];
points = points.RotationMatrix[{{0, 0, 1}, {-1, 0, 1}}];
points = Map[{2, 8, 2} + # &, points];
points = Clip[points, {0, Max[points]}];
opts = {BoxRatios -> {1, 1, 1}, PlotRange -> Table[{c, -c}, 3], FaceGrids -> {{0, 0, -1}, {0, 1, 0}, {-1, 0, 0}}, ViewPoint -> {-1.1, -2.43, 2.09}};
gr0 = ListPointPlot3D[points, opts];

Compute matrix factorizations using NNMF, SVD and ICA and make a comparison plots grid (rotate the graphics boxes for better perception):

In[55]:=
SeedRandom[232];
{W, H} = ResourceFunction["NonNegativeMatrixFactorization"][points, 2,
    "Normalization" -> Right];
grNNMF = Show[{ListPointPlot3D[points], Graphics3D[{Red, Arrow[{{0, 0, 0}, #}] & /@ (c*H/Norm[H])}]}, opts, PlotLabel -> "NNMF"];
{U, S, V} = SingularValueDecomposition[points, 2];
grSVD = Show[{ListPointPlot3D[points], Graphics3D[{Red, Arrow[{{0, 0, 0}, #}] & /@ (c*Transpose[V])}]}, opts, PlotLabel -> "SVD"];
{A, S} = ResourceFunction["IndependentComponentAnalysis"][points, 2];
grICA = Show[{ListPointPlot3D[points], Graphics3D[{Red, Arrow[{{0, 0, 0}, #}] & /@ (c*S/Norm[S])}]}, opts, PlotLabel -> "ICA"];
Grid[{{gr0, grSVD}, {grICA, grNNMF}}]
Out[62]=

Resource History

Source Metadata

Related Resources

License Information