Basic Examples (11)
This functionality requires an OpenAI API account and an API key:
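Store the key once in the operating system's credential store (this is the setup described in the Scope notes below):
SystemCredential["OPENAI_API_KEY"] = "XXX" (* replace "XXX" with your OpenAI API key *)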
Convert all equations from the 1st through the 3rd sections of the provided arXiv preprint from TeX to Wolfram Language, and output them as a dataset:
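A sketch of such a call; the argument order shown here (paper ID, prompt recipe, section range, output format) is assumed from the surrounding documentation rather than confirmed:
ResourceFunction["ConvertArXivTeXSource"][
 "2404.11685",                              (* arXiv paper ID *)
 {"ConvertArXivTeXSource", "EquationList"}, (* named LLM prompt recipe *)
 {1, 3},                                    (* sections 1 through 3 *)
 "Dataset"                                  (* return the result as a Dataset *)
]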
Instead of returning a dataset, return the result as a notebook, which opens automatically:
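Under the same assumed argument order, request "Notebook" output:
ResourceFunction["ConvertArXivTeXSource"]["2404.11685",
 {"ConvertArXivTeXSource", "EquationList"}, {1, 3}, "Notebook"]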
Also save the output notebook:
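Adding "SaveFileQ" → True (see the options table below) writes the notebook to disk as well:
ResourceFunction["ConvertArXivTeXSource"]["2404.11685",
 {"ConvertArXivTeXSource", "EquationList"}, {1, 3}, "Notebook",
 "SaveFileQ" -> True]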
By default, the notebook is saved in the directory FileNameJoin[{OptionValue["ProjectBaseDirectory"], "paper"}], where "ProjectBaseDirectory" defaults to FileNameJoin[{URL[$LocalBase], "ConvertArXivTeXSource"}]:
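These default locations can be evaluated directly:
base = FileNameJoin[{URL[$LocalBase], "ConvertArXivTeXSource"}]; (* default "ProjectBaseDirectory" *)
FileNameJoin[{base, "paper"}] (* directory for downloaded TeX files and saved notebooks *)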
Instead of saving it to the default file path, save it to a specified one:
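A sketch using a hypothetical directory under $HomeDirectory; any writable directory works:
ResourceFunction["ConvertArXivTeXSource"]["2404.11685",
 {"ConvertArXivTeXSource", "EquationList"}, {1, 3}, "Notebook",
 "SaveFileQ" -> True,
 "ProjectBaseDirectory" -> FileNameJoin[{$HomeDirectory, "arxiv-conversions"}] (* hypothetical path *)]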
Convert all equations in sections 2 through 4, then publish the result to a Wolfram Cloud notebook:
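Assuming the same argument order, enable publication with "PublishToWolframCloudQ" (the target cloud can be changed with "CloudBase"):
ResourceFunction["ConvertArXivTeXSource"]["2404.11685",
 {"ConvertArXivTeXSource", "EquationList"}, {2, 4}, "Notebook",
 "PublishToWolframCloudQ" -> True]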
Convert all equations from the given arXiv preprint using multiple parallel kernels to speed up the process:
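Enable parallel processing of the LLM API calls with "ParallelQ":
ResourceFunction["ConvertArXivTeXSource"]["2404.11685",
 {"ConvertArXivTeXSource", "EquationList"}, {1, 3}, "Dataset",
 "ParallelQ" -> True]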
A pre-generated result for reference:
The LLM model used for most of the development and testing is OpenAI's "gpt-4-turbo", which has the long (128k) context window needed for typical arXiv preprints. But the option "LLMModel" can be used to specify a different model:
Use a different LLM model:
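For example, select "gpt-4-turbo" explicitly (the argument order is assumed as above):
ResourceFunction["ConvertArXivTeXSource"]["2404.11685",
 {"ConvertArXivTeXSource", "EquationList"}, {1, 3}, "Dataset",
 "LLMModel" -> <|"Service" -> "OpenAI", "Name" -> "gpt-4-turbo"|>]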
Scope (2)
The function ConvertArXivTeXSource requires access to the OpenAI API. Users need to set the system credential with SystemCredential["OPENAI_API_KEY"] = "XXX" (replace "XXX" with your OpenAI API key).
The TeX source file for the targeted paper ID is downloaded from arXiv.org and saved to the subdirectory "paper/" inside OptionValue["ProjectBaseDirectory"].
OptionValue["ProjectBaseDirectory"] specifies the project base directory which includes the sub-directory "paper/" for saving downloaded TEX files and output notebooks. Its default value is FileNameJoin[{URL[$LocalBase], "ConvertArXivTeXSource"}], where $LocalBase has a default value $DefaultLocalBase.
The option "LLMModel" can be used to specify a the LLM model used for the converting: "LLMModel" -> <|"Service" -> "OpenAI", "Name" -> "gpt-4-turbo"|>. The current default is "gpt-4o". The model "gpt-4-turbo" was used a lot during the development and testing. Both models have a long (128k) context length limit needed for a typical arXiv preprint. The behavior with other models aren't tested extensively.
When the output format is "Notebook", use the "SaveFileQ" → True option to save the output notebook; the notebook file is saved in the same "paper/" directory alongside the TeX file. Currently, the named LLM prompt recipe argument promptRecipe can only be {"ConvertArXivTeXSource", "EquationList"}, which collects and converts a list of equation-like objects in the specified portion of the paper.
The option "AdditionalLLMPrompts" -> {p1, p2,…}, can be used to add additional prompts at the end of the built-in one. Each pi should be a string.
Set the option "ParallelQ" → True to run the potentially slow LLM API calls in parallel.
To use a local TeX source file instead of having the function automatically download it from arXiv.org, specify the base directory with "ProjectBaseDirectory" → dir and save the TeX file as, e.g., FileNameJoin[{dir, "paper", "2404.11685.tex"}] before running ResourceFunction["ConvertArXivTeXSource"].
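A sketch of this workflow with a hypothetical base directory; the source-file location and the final call's argument order are assumptions:
dir = FileNameJoin[{$HomeDirectory, "my-arxiv-project"}]; (* hypothetical base directory *)
CreateDirectory[FileNameJoin[{dir, "paper"}], CreateIntermediateDirectories -> True];
CopyFile[
 FileNameJoin[{$HomeDirectory, "Downloads", "2404.11685.tex"}], (* assumed location of the local copy *)
 FileNameJoin[{dir, "paper", "2404.11685.tex"}]];
ResourceFunction["ConvertArXivTeXSource"]["2404.11685",
 {"ConvertArXivTeXSource", "EquationList"}, {1, 3}, "Dataset",
 "ProjectBaseDirectory" -> dir]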
The ConvertArXivTeXSource function takes the following options:
"ProjectBaseDirectory" | FileNameJoin[{URL[$LocalBase], "ConvertArXivTeXSource"}] | project base directory for saving downloaded TEX file and output notebooks |
"LLMModel" | <|"Service" -> "OpenAI", "Name" -> "gpt-4o"|> | LLM model |
"SaveFileQ" | False | set to True to save the output notebook as FileNameJoin[{OptionValue["ProjectBaseDirectory"],"paper", "YYMM.xxxxx.nb"}] |
"PublishToWolframCloudQ" | False | set to True to publish the output notebook to the cloud specified by OptionValue["CloudBase"] |
"CloudBase" | $CloudBase | the cloud base for publishing notebooks |
"ParallelQ" | False | set to True to process LLM API calls parallelly |
"AdditionalLLMPrompts" | {} | optionally provide a list of prompt strings to be appended after the built-in prompts of the specified prompt recipe |
Screenshot of an example output notebook: