NuToxicity Text Feature Extractor

Represent text as a sequence of vectors

Released in 2023, NuToxicity is a BERT-based transformer encoder from NuMind designed for content-moderation feature extraction. It is publicly distributed as a feature-extraction model and is built on top of the E5-base-v2 text-embedding architecture. The public configuration indicates a 12-layer encoder with a hidden size of 768 and a maximum sequence length of 512. The model can be used to obtain contextual token representations and sentence-level embeddings for downstream moderation applications.

Examples

Resource retrieval

Get the pre-trained net:

In[1]:=
NetModel["NuToxicity Text Feature Extractor"]
Out[1]=

Evaluation function

Get the tokenizer to process text inputs into tokens:

In[2]:=
tokenizer = NetModel["NuToxicity Text Feature Extractor", "Tokenizer"]
Out[2]=

Write a function that preprocesses a list of input sentences:

In[3]:=
prepareBatch[inputStrings_?ListQ] := Block[
   {tokens, attentionMask},
   tokens = tokenizer[inputStrings] - 1;
   attentionMask = PadRight[ConstantArray[1, Length[#]] & /@ tokens, Automatic];
   tokens = PadRight[tokens, Automatic, 1];
   <|
    "input_ids" -> tokens, "attention_mask" -> attentionMask
    |>
   ];
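
The padding step can be illustrated without the tokenizer. The following is a minimal sketch on a made-up ragged list of token indices (toyTokens is a hypothetical name and the index values are arbitrary, not real vocabulary entries). It shows how PadRight with Automatic fills the shorter rows, using 0 for the mask and 1 for the token indices, as in prepareBatch:

In[ ]:=
(* toy data: two "sentences" of different lengths *)
toyTokens = {{101, 7, 8, 102}, {101, 9, 102}};
(* mask: 1 for real tokens, padded with 0; gives {{1, 1, 1, 1}, {1, 1, 1, 0}} *)
toyMask = PadRight[ConstantArray[1, Length[#]] & /@ toyTokens, Automatic]
(* token indices padded with 1; gives {{101, 7, 8, 102}, {101, 9, 102, 1}} *)
toyPadded = PadRight[toyTokens, Automatic, 1]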

Write a function that applies mean pooling to the hidden states:

In[4]:=
meanPooler[vectors_?MatrixQ, weights_?VectorQ] := Divide[weights . vectors, Total[weights]]
meanPooler[vectors_?ArrayQ, weights_?ArrayQ] := MapThread[meanPooler, {vectors, weights}]
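
As a small check of the pooling formula, the following sketch applies the same weighted mean to a made-up 3×2 matrix of token vectors whose last row stands for a padding token. The names toyVectors and toyWeights are illustrative and not part of the model. Because the weight of the padded row is zero, that row does not contribute to the average:

In[ ]:=
(* toy data: two real token vectors and one padding row *)
toyVectors = {{1., 2.}, {3., 4.}, {100., 100.}};
toyWeights = {1, 1, 0};
(* same expression as in the matrix case of meanPooler; gives {2., 3.} *)
Divide[toyWeights . toyVectors, Total[toyWeights]]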

Write a function that returns the requested output from the NuToxicity encoder: the last hidden state, the sentence embedding, the normalized embedding, the normalized mean-pooled embedding (the default) or all net outputs. When the option "ApplyMask" is set to True, the "attention_mask" is used to trim padding tokens from the last hidden state:

In[5]:=
Options[netevaluate] = {"ApplyMask" -> False};

(* single string: evaluate it as a one-element batch and unwrap the result *)
netevaluate[input_?StringQ, output : ("LastHiddenState" | "SentenceEmbedding" | "NormalizedEmbedding" | "MeanPooling" | "NetOutputs") : "MeanPooling", opts : OptionsPattern[]] := If[output === "NetOutputs", First /@ netevaluate[{input}, output, opts], First@netevaluate[{input}, output, opts]];

(* list of strings: evaluate the whole batch *)
netevaluate[inputStrings_?ListQ, output : ("LastHiddenState" | "SentenceEmbedding" | "NormalizedEmbedding" | "MeanPooling" | "NetOutputs") : "MeanPooling", opts : OptionsPattern[]] := Module[
   {assoc, out, h, mask, pooled},
   assoc = prepareBatch[inputStrings];
   mask = assoc["attention_mask"];
   out = NetModel["NuToxicity Text Feature Extractor"][assoc];
   Switch[output,
    "LastHiddenState",
    h = out["last_hidden_state"];
    (* optionally keep only the non-padded tokens of each sentence *)
    If[TrueQ@OptionValue["ApplyMask"], MapThread[Take, {h, Total /@ mask}], h],
    "SentenceEmbedding",
    out["sentence_embedding"],
    "NormalizedEmbedding",
    out["normalized_embedding"],
    "MeanPooling",
    h = out["last_hidden_state"];
    pooled = meanPooler[h, mask];
    Normalize /@ pooled,
    "NetOutputs",
    out
    ]
   ];

Basic usage

Get the sentence embedding:

In[6]:=
output = netevaluate["query: The air in the city is very polluted."];

Get the dimensions of the output:

In[7]:=
Dimensions@output
Out[7]=

Define a list of query and passage sentences:

In[8]:=
sentences = {
   "query: rude and insulting message", "query: normal message",
   "passage: your tone is disrespectful and unnecessarily hostile", "passage: the package arrived this morning."};

Get the sentence embeddings using "NormalizedEmbedding":

In[9]:=
output = netevaluate[sentences, "NormalizedEmbedding"];

Get the dimensions of the output:

In[10]:=
Dimensions[output]
Out[10]=

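Compute the similarity scores between the two query embeddings and the two passage embeddings. Because the embeddings are normalized, each dot product is a cosine similarity: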

In[11]:=
scores = output[[1 ;; 2]] . Transpose[output[[3 ;; 4]]]
Out[11]=
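
The layout of such a score matrix can be checked on toy data. The following sketch uses made-up two-dimensional unit vectors (toyQueries and toyPassages are hypothetical names, not model outputs). Rows correspond to queries and columns to passages, and each entry is the cosine similarity of the corresponding pair:

In[ ]:=
(* toy unit vectors standing in for normalized embeddings *)
toyQueries = Normalize /@ {{1., 0.}, {0., 2.}};
toyPassages = Normalize /@ {{3., 0.}, {1., 1.}};
(* gives {{1., 0.707107}, {0., 0.707107}} *)
toyQueries . Transpose[toyPassages]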

Input preprocessing

Preprocess a batch of sentences into inputs expected by the model. The result is an association:

"input_ids": integer token indices

"attention_mask": a binary mask indicating valid tokens vs. padding tokens

In[12]:=
inputs = prepareBatch[sentences];

Get the dimensions of the preprocessed sentences:

In[13]:=
Map[Dimensions, inputs]
Out[13]=

Visualize the preprocessed sentences:

In[14]:=
ArrayPlot /@ inputs
Out[14]=

Evaluate the net on the preprocessed inputs:

In[15]:=
outputs = NetModel["NuToxicity Text Feature Extractor"][inputs];

Get the dimensions of the outputs:

In[16]:=
Dimensions /@ outputs
Out[16]=

Visualize the first sentence embedding:

In[17]:=
MatrixPlot@outputs[[1]][[1]]
Out[17]=

The sentence embedding is the normalized average of all non-padded token representations:

In[18]:=
Normalize@Mean@outputs[[1]][[1]] // Short
Out[18]=

DistanceMatrix

Define a list of sentences ranging from polite to toxic:

In[19]:=
sentences = {
   "query: thank you for your help today", "query: I really appreciate your support", "query: the package arrived this morning", "query: the report was uploaded yesterday", "query: I disagree with your opinion", "query: I do not think this idea will work", "query: that comment was rude", "query: your tone is disrespectful",
    "query: stop being so annoying", "query: you are acting like a jerk", "query: this message is abusive and hateful", "query: your behavior is toxic and hostile"};

Get the embeddings of the sentences by taking the mean of the features of the tokens for each sentence:

In[20]:=
embeddings = netevaluate[sentences];

Compute the pairwise distance matrix between the sentence embeddings:

In[21]:=
distances = DistanceMatrix[embeddings, DistanceFunction -> SquaredEuclideanDistance];

Define shorter labels for the plot axes:

In[22]:=
shortLabels = {"help today", "appreciate support", "package arrived", "report uploaded", "disagree", "idea won't work", "comment rude", "tone disrespectful", "stop annoying", "acting like a jerk", "abusive hateful", "toxic hostile"};

Use numbered labels on the horizontal axis and numbered short labels on the vertical axis:

In[23]:=
xTicks = Transpose[{Range[Length[shortLabels]], Range[Length[shortLabels]]}];
yTicks = Transpose[{Range[Length[shortLabels]], MapThread[
     Row[{#1, ". ", #2}] &, {Range[Length[shortLabels]], shortLabels}]}];

Visualize the pairwise distances between the sentence embeddings:

In[24]:=
MatrixPlot[distances, FrameTicks -> {{yTicks, None}, {xTicks, None}}, PlotLegends -> Automatic, FrameLabel -> {"Sentence index", "Sentence"},
  PlotLabel -> "Pairwise distance matrix between sentence embeddings",
  ImageSize -> Large]
Out[24]=

Advanced usage

One-shot learning

Get a list of classes with one example sentence for each:

In[25]:=
labelSentences = {
   "query: Thank you for taking the time to help me with this." -> "Benign",
   "query: The package arrived this morning." -> "Neutral",
   "query: Your reply was rude and unnecessarily disrespectful." -> "Insulting",
   "query: This message is hostile, abusive, and clearly inappropriate." -> "Toxic"
   };

Get a set of sentences to classify and their correct labels:

In[26]:=
testSentences = {
   "query: I really appreciate how quickly you responded." -> "Benign",
   "query: Thanks for explaining everything so clearly." -> "Benign",
   "query: The report was uploaded this evening." -> "Neutral",
   "query: I received the update this morning." -> "Neutral",
   "query: That was an extremely disrespectful way to speak to someone." -> "Insulting",
   "query: Your tone is rude and offensive." -> "Insulting",
   "query: The message was openly abusive and threatening." -> "Toxic",
   "query: This kind of hostile language is unacceptable." -> "Toxic"
   };

Get the embeddings of the labels and test sentences:

In[27]:=
labelEmb = netevaluate[Keys@labelSentences];
inputEmb = netevaluate[Keys@testSentences];

Get the predictions. Since all of the embeddings are normalized, SquaredEuclideanDistance, which is equivalent (up to a constant factor) to cosine distance, is used here:

In[28]:=
results = Flatten@Nearest[Thread[labelEmb -> Values@labelSentences], DistanceFunction -> SquaredEuclideanDistance][inputEmb];
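
The relation between the two distances can be verified numerically. For unit vectors a and b, the squared Euclidean distance equals 2 (1 - a . b), which is twice the cosine distance, so both give the same nearest-neighbor ranking. The following sketch checks this on two made-up vectors (a and b are illustrative, not model outputs):

In[ ]:=
(* toy unit vectors *)
a = Normalize[{1., 2., 2.}];
b = Normalize[{2., 1., 2.}];
(* the difference is zero up to rounding; Chop gives 0 *)
Chop[SquaredEuclideanDistance[a, b] - 2 CosineDistance[a, b]]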

Create a table to visualize the correct and predicted label for each sentence:

In[29]:=
Grid[Prepend[
  Transpose[{Keys@testSentences, Values@testSentences, results}], {"Text", "True Label", "Predicted Label"}], Frame -> All, Background -> {None, {LightGray}}, Alignment -> Left]
Out[29]=

Transfer learning

Content moderation classification

Perform content moderation classification on the Jigsaw Toxic Comment Classification Challenge dataset, which contains Wikipedia comments labeled for toxic behavior. For this example, the original annotations are reduced to three classes, and the texts are encoded as sentence embeddings with the NuToxicity Text Feature Extractor. A simple classifier is then trained on top of these embeddings.

Get the Jigsaw Toxic Comment Classification Challenge dataset (via a Hugging Face mirror of the original Kaggle release). The underlying text is derived from Wikipedia talk pages and is licensed under the Creative Commons Attribution-ShareAlike 3.0 (CC BY-SA 3.0) license:

In[30]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/581eeb20-6bdc-4e23-8f43-0dc06830b3a0"]
Out[30]=

Preprocess the dataset:

In[31]:=
i = 0; Monitor[
 encodeddata = Select[TransformColumns[data, "Input" -> Function[i++; Quiet@Check[
        Normal@netevaluate[("query: " <> #text)], $Failed]]], #Input =!= $Failed &], ProgressIndicator[i/Length[data]]]
Out[31]=

Define the classifier model for content moderation classification, which accepts the embeddings as input and outputs the probabilities for each class (benign, toxic, severe):

In[32]:=
numClasses = 3;
In[33]:=
classifier = NetChain[{LinearLayer[numClasses], SoftmaxLayer[] }]
Out[33]=

Split the encoded data into training, validation and test sets:

In[34]:=
trainData = Take[encodeddata, 600];
{validationData, testData} = TakeDrop[Drop[encodeddata, 600], 100];
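
The way Take, Drop and TakeDrop partition the rows can be seen on a short list. The following sketch uses the integers 1 to 10 as stand-ins for rows and smaller split sizes (6 and 2) chosen only for illustration:

In[ ]:=
toyRows = Range[10];
(* first 6 elements for training; gives {1, 2, 3, 4, 5, 6} *)
toyTrain = Take[toyRows, 6]
(* of the rest, the first 2 for validation and the remainder for testing; gives {{7, 8}, {9, 10}} *)
{toyValidation, toyTest} = TakeDrop[Drop[toyRows, 6], 2]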

Train the classifier:

In[35]:=
trainedClassifier = NetTrain[classifier, trainData, ValidationSet -> Dataset@validationData]
Out[35]=

Run the classifier on the NuToxicity embeddings of the test sentences and mark each prediction as "Correct" or "Incorrect":

In[36]:=
resultsData = TransformColumns[testData, "Prediction" -> Function[trainedClassifier[#Input]]] // TransformColumns[{
    "Correct" -> (Boole[#Prediction == #Output] &),
    "Incorrect" -> (Boole[#Prediction != #Output] &)
    }]
Out[36]=

Compute the accuracy:

In[37]:=
AggregateRows[resultsData, {
  "Accuracy" -> Function[N@Total[#Correct]/(Total[#Correct] + Total[#Incorrect])]}]
Out[37]=

Create a unified pipeline by merging the classifier and NuToxicity:

In[38]:=
toxicityModel = NetReplacePart[
  trainedClassifier, {"Input" -> NetEncoder[{"Function", netevaluate[#] &, 768, SaveDefinitions -> False}], "Output" -> NetDecoder[{"Class", {"Benign", "Toxic", "Severe"}}]}]
Out[38]=

Show the results:

In[39]:=
toxicityModel /@ {
  "The team showed such a good performance in the last match",
  "you little bustard get out of here" }
Out[39]=

Reference

  • A. Constantin, S. Bogdanov, E. Bernard, "Creating Task-Specific Foundation Models with GPT-4," NuMind Blog, https://about.nuextract.ai/blog/creating-task-specific-foundation-models-with-gpt-4 (2023)
  • L. Wang, N. Yang, X. Huang, B. Jiao, L. Yang, D. Jiang, R. Majumder, F. Wei, "Text Embeddings by Weakly-Supervised Contrastive Pre-training," arXiv:2212.03533v1 (2022)
  • Available from: https://huggingface.co/numind/NuToxicity
  • Rights: MIT License