LLMPromptAssessment

Use an LLM to assess the quality of another LLM's response for a given prompt

LLMResourceFunction["LLMPromptAssessment"]["prompt"]

generates a response from an LLM for the given prompt, then has another LLM assess the quality of that response.

LLMResourceFunction["LLMPromptAssessment"]["prompt",config]

uses the LLMConfiguration specified by config to generate the initial response.

LLMResourceFunction["LLMPromptAssessment"]["prompt",config,"extra"]

includes the given extra instructions in the prompting for the assessment LLM.

Details

Using LLMPromptAssessment involves making two LLM calls. One generates a response to the given prompt, and the other evaluates how well that response followed instructions.
The two LLM calls can be for different models.
In LLMResourceFunction["LLMPromptAssessment",LLMEvaluator→<|"Model"→"outer"|>]["prompt",<|"Model"→"inner"|>], the models "inner" and "outer" represent the following:
"inner"the model that generates the initial response for the given "prompt"
"outer"the model that looks at the response and evaluates how well it followed instructions specified in "prompt"
For best results, the "outer" model should be at least as capable as the "inner" model.
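For example, the following sketch uses two different models, following the call pattern above. The model names "gpt-4o" (outer) and "gpt-4o-mini" (inner) are illustrative assumptions; substitute any models available through your LLM service connection.

LLMResourceFunction["LLMPromptAssessment",
  LLMEvaluator -> <|"Model" -> "gpt-4o"|> (* "outer": assesses the response *)
][
  "Explain why the sky is blue in one paragraph", (* the prompt *)
  <|"Model" -> "gpt-4o-mini"|> (* "inner": generates the initial response *)
]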
The extra instructions argument can be used to provide additional prompting for the assessment model.
The prompting for the assessment model instructs it to format its response as JSON, so the extra instructions can be used to request additional fields in that JSON result, which are then included in the parsed response.
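The following sketch passes extra instructions asking the assessment model to add a field to its JSON output. The configuration association, the model name and the "Sentiment" field name are illustrative assumptions, not part of the function's specification:

LLMResourceFunction["LLMPromptAssessment"][
  "Write an upbeat product description for a reusable water bottle",
  <|"Model" -> "gpt-4o-mini"|>, (* configuration for the generating model; illustrative *)
  "Also include a \"Sentiment\" key in the JSON describing the overall sentiment of the response." (* extra instructions for the assessment model *)
]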

Programmatic Examples
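A minimal sketch of a basic call, assuming a default LLM service connection is available; the prompt text is illustrative:

LLMResourceFunction["LLMPromptAssessment"][
  "Write a haiku about the Wolfram Language"
]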
