Wolfram Function Repository
Instant-use add-on functions for the Wolfram Language
Function Repository Resource: GPTTokenizer
Tokenize an input string into a list of integers from a vocabulary that was originally used to train GPT nets
ResourceFunction["GPTTokenizer"][] returns a GPT NetEncoder.
ResourceFunction["GPTTokenizer"]["string"] tokenizes an input "string" into a list of integers from the GPT neural net vocabulary.
Encode a string of characters:
Get the GPT NetEncoder:
Check that tokenization is the same:
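The three examples above can be sketched as follows. This is a minimal illustration, not the page's original notebook input: the sample string and the variable names text, tokens and encoder are assumptions chosen here, and the only calls used are the two usages documented above.

```wolfram
(* Assumption: an arbitrary sample string; any text would do. *)
text = "Hello world! This is a test.";

(* Encode a string of characters: the one-argument form
   returns a list of integers from the GPT vocabulary. *)
tokens = ResourceFunction["GPTTokenizer"][text]

(* Get the GPT NetEncoder: the zero-argument form returns the encoder itself. *)
encoder = ResourceFunction["GPTTokenizer"][]

(* Check that tokenization is the same: apply the encoder to the
   same string and compare with the directly computed tokens. *)
encoder[text] === tokens
```

Going by the page's description, the final comparison is expected to confirm that both routes give the same token list. Retrieving the encoder once and reusing it may be convenient when tokenizing many strings, or when the encoder is needed as part of a net.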
This work is licensed under a Creative Commons Attribution 4.0 International License