Obtain a set of complexity measures for an English sentence

Contributed by:
Mark Greenberg

ResourceFunction["SentenceComplexityMeasures"][*s*] gives a report listing five measures of the complexity of the sentence *s*.

ResourceFunction["SentenceComplexityMeasures"][*s*] works only when *s* is an English sentence.

ResourceFunction["SentenceComplexityMeasures"][*s*] combines five separately rescaled measurements of the graph and tree of *s*:

"leaf count" | number of surface-level structures (words, contracted words) |

"vertex count" | number of grammatical structures (words, phrases, clauses, etc.) |

"tree depth" | maximum depth of nested grammatical structures |

"graph diameter" | longest distance between grammatical structures |

"distance" | average distance between grammatical structures |

The author determined the scaling ranges from empirical tests.

The five rescaled measurements are averaged to produce a single, empirically defined metric.

ResourceFunction["SentenceComplexityMeasures"][*s*] uses this formula to determine the overall complexity score: (Rescale[*leafCt*,{2,120}]+Rescale[*vertCt*,{6,360}]+Rescale[*depth*,{3,15}]+Rescale[*diam*,{6,22}]+Rescale[*dist*,{2.5,10}])/5
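The formula can be tried on its own with a minimal sketch such as the one below. The helper name `overallScore` and the sample inputs are illustrative assumptions, not part of the resource function. The built-in Rescale does not clip values that fall outside the given range, and whether the resource function does so is not stated here.

```wl
(* Hypothetical helper: mean of the five rescaled measurements *)
overallScore[leafCt_, vertCt_, depth_, diam_, dist_] :=
  (Rescale[leafCt, {2, 120}] + Rescale[vertCt, {6, 360}] +
     Rescale[depth, {3, 15}] + Rescale[diam, {6, 22}] +
     Rescale[dist, {2.5, 10}])/5

(* Every measurement at the bottom of its range gives a score of zero *)
overallScore[2, 6, 3, 6, 2.5]

(* Every measurement at the top of its range gives a score of one *)
overallScore[120, 360, 15, 22, 10]
```

Because each term is rescaled to its own range before averaging, the five measurements contribute equally to the final score.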

The "Format" option gives different kinds of output and can take the following values:

"Association" | an association for all five measures and their complexity scores |

"Number" | a real number from 0 (simplest possible sentence) to 1 (extremely complex sentence) |

"Dataset" | (default) an easily read report with the same information as "Association" |

Get complexity measures for a sentence:

In[1]:= |

Out[1]= |

Measure a more complex sentence:

In[2]:= |

Out[2]= |

Get the complexity measure as a real number:

In[3]:= |

Out[4]= |

Get an association with more detail for the same sentence:

In[5]:= |

Out[5]= |
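The three output formats used above can be requested along the following lines. This is a sketch only: the sample sentence is an arbitrary illustration, and it assumes the option is passed in the usual `"Format" -> value` form.

```wl
(* The sentence is an arbitrary illustration *)
sentence = "The quick brown fox jumps over the lazy dog.";

(* Default output: a Dataset report *)
ResourceFunction["SentenceComplexityMeasures"][sentence]

(* A single real number, 0 for the simplest sentences and 1 for extremely complex ones *)
ResourceFunction["SentenceComplexityMeasures"][sentence, "Format" -> "Number"]

(* An association holding the five measures and their complexity scores *)
ResourceFunction["SentenceComplexityMeasures"][sentence, "Format" -> "Association"]
```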

Demonstrate the effects of different sentence enhancements:

In[6]:= |

Out[11]= |

Measure the mean and standard deviation of sentence complexity across an entire piece of literature (this took 110 seconds on the development computer):

In[12]:= |

Out[14]= |
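A computation of this kind might be sketched as follows. The short placeholder text and the variable names are assumptions made for illustration. A full literary work would be substituted for `text`, and the run time grows with the number of sentences.

```wl
(* Placeholder text; substitute the full text of a literary work *)
text = "The rain stopped. We walked to the station, which was nearly empty. Nobody spoke until the train arrived.";

(* Split the text into sentences with the built-in TextSentences *)
sentences = TextSentences[text];

(* Score each sentence as a single real number *)
scm = ResourceFunction["SentenceComplexityMeasures"];
scores = scm[#, "Format" -> "Number"] & /@ sentences;

(* Keep only numeric results, dropping any Failure objects *)
numeric = Select[scores, NumericQ];

{Mean[numeric], StandardDeviation[numeric]}
```

Filtering with NumericQ is a design choice for this sketch, so that a sentence the parser cannot handle does not prevent the statistics from being computed.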

Demonstrate the difference between a run-on sentence and one with nested clauses:

In[15]:= |

Out[17]= |

There is no limit to how long or complex an English sentence can be, and the parser fails on extremely complex sentences. In those cases, SentenceComplexityMeasures returns a Failure object:

In[18]:= |

Out[18]= |
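Code that calls the function on arbitrary input may want to check for this case. The sketch below uses the built-in FailureQ. The wrapper name `safeScore` and the `Missing["NotParsed"]` placeholder are hypothetical choices, not part of the resource function.

```wl
scm = ResourceFunction["SentenceComplexityMeasures"];

(* Hypothetical wrapper: return Missing[...] instead of a Failure object *)
safeScore[s_String] :=
  With[{result = scm[s, "Format" -> "Number"]},
    If[FailureQ[result], Missing["NotParsed"], result]]

(* Arbitrary sample sentence *)
safeScore["Birds sing."]
```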

Wolfram Language 13.0 (December 2021) or above

- 1.0.0 – 22 January 2024

