This guide shares strategies and tactics for getting better results from GPTs. The methods described here can sometimes be deployed in combination for greater effect. We encourage experimentation to find the methods that work best for you. Some of the examples demonstrated here currently work only with our most capable
This guide shares strategies and tactics for getting better results from GPTs. The methods described here can sometimes be deployed in combination for greater effect. We encourage experimentation to find the methods that work best for you. Some of the examples demonstrated here currently work only with our most capable model, gpt-4. lf you don't yet have access to gpt-4 consider joining the waitlist. ln general, if you find that a GPT model fails at a task and a more capable model is available, it's often worth trying again with the more capable model.
1 What is Prompt Engineering?
Prompt engineering is a relatively new discipline that involves developing and optimizing prompts to effectively use various language models (LMs) for a wide range of applications and research topics. It helps to leverage the characteristics and limitations of large language models (LLMs). Prompt engineering is used to enhance the capabilities of LLMs in performing a wide array of common and complex tasks, such as question answering and arithmetic reasoning.
2 Why Do We Need Prompts?
We need prompts because ChatGPT and other LLMs have certain limitations and cannot always meet our needs as is. The purpose of using prompts is to help the model overcome these limitations (this applies to all LLMs). To give a clear idea, we can categorize the limitations of ChatGPT into semantic, design, and content-based issues. Semantic issues, which also pertain to language as a means of communication, include: ChatGPT fails to understand me: I spend a lot of time formulating and expressing my ideas. I don’t like or understand the way ChatGPT expresses things. ChatGPT’s responses are too long/short, etc.
3 Six strategies for getting better results
3.1 Write clear instructions GPTs can't read your mind. lf outputs are too long, ask for brief replies. If outputs are too simple, ask for expert-level writing.If you dislike the format, demonstrate the format you'd like to see. The less GPTs have to guess at what you want, the more likely you'll get it. Content Expansion:
-Include details in your query to get more relevant answers -Ask the model to adopt a persona -Use delimiters to clearly indicate distinct parts oft he input -Specify the steps required to complete a task -Provide examples 1.6 Specify the desired length of the output
3.2 Provide reference text GPTs can confidently invent fake answers,especially when asked about esoteric topics or for citations and URLs. In the same way that a sheet of notes can help a student do better on a test, providing reference text to GPTs can help in answering with fewer fabrications.
-Instruct the model to answer using a reference text -Instruct the model to answer with citations from a reference text
3.3 Split complex tasks into simpler sub tasks Just as it is good practice in software engineering to decompose a complex system into a set of modular components, the same is true of tasks submitted to GPTs. Complex tasks tend to have higher error rates than simpler tasks. Furthermore, complex tasks can often be re-defined as a workflow of simpler tasks in which the outputs of earlier tasks are used to construct the inputs to later tasks.
-Use intent classification to identify the most relevant instructions for a user query -For dialogue applications that require very long conversations, summarize or filter previous dialogue. -Summarize long documents piece wise and construct a full summary recursively
3.4 Give GPTs time to think lf asked to multiply 17 by 28, you might not know it instantly, but can still work it out with time. Similarly, GPTs make more reasoning errors when trying to answer right away, rather than taking time to work out an answer.
-Instruct the model to work out its own solution before rushing to a conclusion. -Use inner monologue or a sequence of queries to hide the model's reasoning process. -Ask the model if it missed anything on previous passes.
3.5 Use external tools Compensate for the weaknesses of GPTs by feeding them the outputs of other tools. For example, a text retrieval system can tell GPTs about relevant documents.A code execution engine can help GPTs do math and run code. lfa task can be done more reliably or efficiently by a tool rather than by a GPT, offload it to get the best of both.
-Use embeddings-based search to implement efficient knowledge retrieval. -Use code execution to perform more accurate calculations or calle xternal APIs. -Give the model access to specific functions
3.6 Test changes systematically lmproving performance is easier if you can measure it.In some cases a modification to a prompt will achieve better performance on a few isolated examples but lead to worse overall performance on a more representative set of examples. Therefore to be sure that a change is net positive to performance it may be necessary to define a comprehensive test suite(also known an as an "eval") Evaluate model outputs with reference to gold standard answers.
4 Tactics
Each of the strategies listed above can be instantiated with specific tactics. These tactics are meant to provide ideas for things to try. They are by no means fully comprehensive, and you should feel free to try creative ideas not represented here.
4.1 Write clear instructions
1nclude details in your query to get more relevant answers In order to get a highly relevant response, make sure that requests provide any important details or context. Otherwise you are leaving it up to the model to guess what you mean.
Not so good: How do l add numbers in Excel? Summarize the meeting notes.
Better: How do I add up a row of dollar amounts in Excel? I want to do this automatically for a whole sheet of rows with all the totals ending up on the right in a column called "Total" Ask details:
Have no detalis:
Ask the model to adopt a persona The system message can be used to specify the persona used by the model in its replies.
You will get different answers:
You are a mischievous courtier with the intent ofcausing confusion and not being helpful at all.
How to Bake a Chocolate Cake?
Use delimiters to clearly indicate distinct parts of the input Delimiters like triple quotation marks, XML tags, section titles, etc, can help demarcate sections of text to be treated differently.
Summarize the text delimited by triple quotes with a haiku. “““insert text here”””
For straightforward tasks such as these, using delimiters might not make a difference in the output quality.However, the more complex a task is the more important it is to disambiguate task details. Don't make GPTs work to understand exactly what you are asking of them.
Specify the steps required to complete a task Some tasks are best specified as a sequence of steps. Writing the steps out explicitly can make it easier for the model to follow them.
Provide examples Providing general instructions that apply to all examples is generally more efficient than demonstrating all permutations of a task by example, but in some cases providing examples may be easier. For example, if you intend for the model to copy a particular style of responding to user queries which is difficult to describe explicitly.This is known as "few-shot" prompting. (已编辑)
Specify the desired length of the output You can ask the model to produce outputs that are of a given target length, The targeted output length can be specified in terms of the count of words, sentences, paragraphs, bullet points, etc. Note however that instructing the model to generate a specific number of words does not work with high precision. The model can more reliably generate outputs with a specific number of paragraphs or bullet points.
Summarize the text delimited by triple quotes in about 30 words.
“““insert text here”””
4.2 Tactic2 Provide reference text
Instruct the Model to Answer Using a Reference Text
When a model is given specific, trusted information that is relevant to a current query, it can be instructed to use that information to form its response.
SYSTEM Use the provided articles delimited by triple quotes to answer questions. If the answer cannot be found in the articles, write "I could not find an answer."
USER <insert articles, each delimited by triple quotes>
Question: <insert question here>
Instruct the Model to Answer with Citations from a Reference Text
If the input includes supplemental relevant knowledge, it's feasible to instruct the model to enhance its answers with citations. These citations should refer to passages from the provided documents. The validity of these citations can then be programmatically verified by matching strings within the given documents.
SYSTEM You will be provided with a document delimited by triple quotes and a question. Your task is to answer the question using only the provided document and to cite the passage(s) of the document used to answer the question. If the document does not contain the information needed to answer this question then simply write: "Insufficient information." If an answer to the question is provided, it must be annotated with a citation. Use the following format for to cite relevant passages ({"citation": …}).
USER """<insert document here>"""
Question: <insert question here>
4.3 Break Down Complex Tasks into Simpler Steps
Determine Relevant Instructions Using Intent Classification
In situations where multiple independent instructions are needed to address different scenarios, it's useful to first identify the type of query. This classification helps in selecting the most appropriate instructions for that particular case. By categorizing queries into fixed groups and assigning specific instructions to each group, tasks can be efficiently managed. This method can also be applied in stages, breaking down a complex task into simpler, sequential steps. This approach ensures that each query is provided with only the necessary instructions for the next part of the task, which can reduce errors and lower costs, as complex prompts are more expensive. Suppose for example that for a customer service application, queries could be usefully classified as follows:
SYSTEM You will be provided with customer service queries. Classify each query into a primary category and a secondary category. Provide your output in json format with the keys: primary and secondary.
Primary categories: Billing, Technical Support, Account Management, or General Inquiry.
Billing secondary categories: -Unsubscribe or upgrade - Add a payment method - Explanation for charge - Dispute a charge
Account Management secondary categories: - Password reset - Update personal information - Close account - Account security
General Inquiry secondary categories: - Product information - Pricing - Feedback - Speak to a human
USER I need to get my internet working again.
Based on the classification of the customer query, a set of more specific instructions can be provided to a model for it to handle next steps. For example, suppose the customer requires help with "troubleshooting".
SYSTEM You will be provided with customer service inquiries that require troubleshooting in a technical support context. Help the user by:Ask them to check that all cables to/from the router are connected. Note that it is common for cables to come loose over time.If all cables are connected and the issue persists, ask them which router model they are usingNow you will advise them how to restart their device:-- If the model number is MTD-327J, advise them to push the red button and hold it for 5 seconds, then wait 5 minutes before testing the connection. -- If the model number is MTD-327S, advise them to unplug and replug it, then wait 5 minutes before testing the connection.If the customer's issue persists after restarting the device and waiting 5 minutes, connect them to IT support by outputting {"IT support requested"}.If the user starts asking questions that are unrelated to this topic then confirm if they would like to end the current chat about troubleshooting and classify their request according to the following scheme:<insert primary/secondary classification scheme from above here>
USER I need to get my internet working again.
Summarize or Filter Prolonged Dialogue in Applications
Given that GPT models have a limited context window, they cannot maintain an indefinitely long conversation including the entire dialogue history.
One solution is to periodically summarize the conversation once it reaches a certain length. This summary can then be used as part of the system's response. Another method is to continuously summarize the dialogue in the background throughout the conversation. Alternatively, selecting and focusing on the most relevant parts of the previous conversation for the current query can be effective. This approach aligns with the strategy of using embedding-based search for efficient knowledge retrieval.
Piece-wise Summarization of Long Documents for Comprehensive Overview
Since GPT models have a limit on the length of text they can process at once, they are unable to summarize texts that exceed this limit in a single query.
For summarizing lengthy texts like books, a sequential approach can be used. Each section of the text is summarized individually, and these summaries are then combined and summarized again. This recursive process continues until a complete summary of the document is achieved. If understanding later sections requires knowledge of earlier parts, a running summary of the preceding content can be included while summarizing subsequent sections. This method has been researched and validated by OpenAI in their studies on book summarization.
4.4 Allowing Time for GPT Models to Process Information
Implement Inner Monologue or Sequential Queries for Deeper Reasoning
To achieve more accurate results, it can be beneficial to guide the model through a step-by-step reasoning process, especially in tasks requiring logical deduction, like math problem evaluations. Instead of directly asking the model to validate a student's solution, prompt it to first work through the problem independently. This method can lead the model to identify errors in the student's solution more effectively by comparing it against its own calculated answer.
SYSTEM Determine if the student's solution is correct or not.
USER Problem Statement: I'm building a solar power installation and I need help working out the financials.Land costs $100 / square footI can buy solar panels for $250 / square footI negotiated a contract for maintenance that will cost me a flat $100k per year, and an additional $10 / square footWhat is the total cost for the first year of operations as a function of the number of square feet. Student's Solution: Let x be the size of the installation in square feet.Land cost: 100xSolar panel cost: 250xMaintenance cost: 100,000 + 100xTotal cost: 100x + 250x + 100,000 + 100x = 450x + 100,000
ASSISTANT The student's solution is correct.
But the student's solution is actually not correct! We can get the model to successfully notice this by prompting the model to generate its own solution first.
SYSTEM First work out your own solution to the problem. Then compare your solution to the student's solution and evaluate if the student's solution is correct or not. Don't decide if the student's solution is correct until you have done the problem yourself.
USER Problem Statement: I'm building a solar power installation and I need help working out the financials.Land costs $100 / square footI can buy solar panels for $250 / square footI negotiated a contract for maintenance that will cost me a flat $100k per year, and an additional $10 / square footWhat is the total cost for the first year of operations as a function of the number of square feet.
Student's Solution: Let x be the size of the installation in square feet.Land cost: 100xSolar panel cost: 250xMaintenance cost: 100,000 + 100xTotal cost: 100x + 250x + 100,000 + 100x = 450x + 100,000
ASSISTANT Let x be the size of the installation in square feet.Land cost: 100xSolar panel cost: 250xMaintenance cost: 100,000 + 10xTotal cost: 100x + 250x + 100,000 + 10x = 360x + 100,000 The student's solution is incorrect. They made an error in the maintenance cost calculation, using 100x instead of 10x. The correct total cost for the first year of operations is 360x + 100,000.
Use Follow-Up Queries to Ensure Comprehensive Analysis
When using a model to extract relevant excerpts from a large source in response to a question, there's a risk of the model not identifying all pertinent sections in one go. If the initial attempt seems incomplete, improve the outcome by asking the model additional questions. These follow-up queries encourage the model to revisit the source material, helping it uncover any relevant excerpts that might have been overlooked initially. This iterative approach enhances the thoroughness of the extraction process.
SYSTEM Follow these steps to answer the user queries.
Step 1 - First work out your own solution to the problem. Don't rely on the student's solution since it may be incorrect. Enclose all your work for this step within triple quotes (""").
Step 2 - Compare your solution to the student's solution and evaluate if the student's solution is correct or not. Enclose all your work for this step within triple quotes (""").
Step 3 - If the student made a mistake, determine what hint you could give the student without giving away the answer. Enclose all your work for this step within triple quotes (""").
Step 4 - If the student made a mistake, provide the hint from the previous step to the student (outside of triple quotes). Instead of writing "Step 4 - ..." write "Hint:". USER Problem Statement: <insert problem statement>
Student Solution: <insert student solution>
Alternatively, this can be achieved with a sequence of queries in which all except the last have their output hidden from the end user.
First, we can ask the model to solve the problem on its own. Since this initial query doesn't require the student’s solution, it can be omitted. This provides the additional advantage that there is no chance that the model’s solution will be biased by the student’s attempted solution.
USER <insert problem statement> Open in Playground Next, we can have the model use all available information to assess the correctness of the student’s solution.
SYSTEM Compare your solution to the student's solution and evaluate if the student's solution is correct or not. USER Problem statement: """<insert problem statement>""" Your solution: """<insert model generated solution>""" Student’s solution: """<insert student's solution>"""
Finally, we can let the model use its own analysis to construct a reply in the persona of a helpful tutor.
SYSTEM You are a math tutor. If the student made an error, offer a hint to the student in a way that does not reveal the answer. If the student did not make an error, simply offer them an encouraging comment. USER Problem statement: """<insert problem statement>""" Your solution: """<insert model generated solution>""" Student’s solution: """<insert student's solution>""" Analysis: """<insert model generated analysis from previous step>"""
4.5 Use external tools
Implement Efficient Knowledge Retrieval with Embeddings-Based Search
Models can greatly benefit from external information, especially when answering queries that need up-to-date or specific details. For instance, in responding to a question about a particular movie, incorporating comprehensive data about the movie (like cast and director) into the model's input can enhance the response's accuracy. Embeddings-based search can be utilized for this efficient knowledge retrieval. Text embeddings, which are vectors representing the similarity between text strings, enable the model to find and integrate the most relevant information from a text corpus dynamically, ensuring more informed responses. This technique is detailed in the OpenAI Cookbook and is crucial in minimizing the likelihood of the model generating incorrect information.
Execute Code for Accurate Calculations or to Utilize External APIs
For tasks requiring precise arithmetic or complex calculations, the model can be instructed to write and execute code, rather than relying on its own computational abilities. This process involves the model formatting the code (e.g., in triple backticks), which is then executed in a controlled environment, like a Python interpreter. This approach is also applicable for using external APIs, where the model can be guided with appropriate documentation and code samples. However, it's essential to exercise caution and ensure a secure, sandboxed environment for code execution to prevent potential risks.
SYSTEM You can write and execute Python code by enclosing it in triple backticks, e.g. code goes here . Use this to perform calculations. USER Find all real-valued roots of the following polynomial: 3x**5 - 5x4 - 3*x3 - 7*x - 10.
Another good use case for code execution is calling external APIs. If a model is instructed in the proper use of an API, it can write code that makes use of it. A model can be instructed in how to use an API by providing it with documentation and/or code samples showing how to use the API.
SYSTEM You can write and execute Python code by enclosing it in triple backticks. Also note that you have access to the following module to help users send messages to their friends:
import message message.write(to="John", message="Hey, want to meetup after work?")
WARNING: Executing code produced by a model is not inherently safe and precautions should be taken in any application that seeks to do this. In particular, a sandboxed code execution environment is needed to limit the harm that untrusted code could cause.
Enable Model Access to Specific Functions The Chat completions API offers the capability to include function descriptions in requests. This allows models to tailor function arguments as per the provided schemas, with the arguments returned in JSON format. These can then be used to execute function calls, and the resulting outputs can be fed back into the model for subsequent queries. This method, as recommended in the introductory GPT guide and further exemplified in the OpenAI Cookbook, optimizes the use of GPT models in calling external functions.
4.6 Systematic Testing of Modifications
Determining the effectiveness of a change, such as a new instruction or design, in a system can be challenging. While examining a few instances might suggest some improvements, small sample sizes can blur the line between actual enhancement and mere chance. A modification might improve performance in certain cases but deteriorate it in others.
To optimize system designs, evaluation procedures (often referred to as "evals") are crucial. Effective evals should be:
Closely mirroring real-world scenarios or, at the very least, be diverse. Inclusive of a large number of test cases to increase statistical significance (refer to the provided table for guidance). Designed for ease of automation or replication.
Evals can be conducted by computers, humans, or a combination of both. For criteria that are objective (like questions with definitive answers) or somewhat subjective, computers can automate the evaluation process. OpenAI Evals, an open-source framework, offers tools for setting up such automated evaluations.
Model-based evaluations become pertinent when there's a variety of potential outputs that are all considered high quality (such as in responses to questions with detailed answers). The line between what can be effectively judged through model-based evaluations and what necessitates human assessment is not clear-cut and is continually evolving as models advance. Experimentation is encouraged to determine the efficacy of model-based evaluations for specific applications.
Assessing Model Outputs Against Established Correct Answers
In situations where the accurate response to a question is expected to incorporate certain established facts, a model query can be employed to tally the number of these essential facts present in the answer.
For instance, consider this approach using the system message provided:
SYSTEM You will be provided with text delimited by triple quotes that is supposed to be the answer to a question. Check if the following pieces of information are directly contained in the answer:Neil Armstrong was the first person to walk on the moon.The date Neil Armstrong first walked on the moon was July 21, 1969.For each of these points perform the following steps: 1 - Restate the point. 2 - Provide a citation from the answer which is closest to this point. 3 - Consider if someone reading the citation who doesn't know the topic could directly infer the point. Explain why or why not before making up your mind. 4 - Write "yes" if the answer to 3 was yes, otherwise write "no".
Finally, provide a count of how many "yes" answers there are. Provide this count as {"count": <insert count here>}. Here's an example input where both points are satisfied:
SYSTEM <insert system message above> USER """Neil Armstrong is famous for being the first human to set foot on the Moon. This historic event took place on July 21, 1969, during the Apollo 11 mission."""
Here's an example input where only one point is satisfied:
SYSTEM <insert system message above> USER """Neil Armstrong made history when he stepped off the lunar module, becoming the first person to walk on the moon.""" Open in Playground Here's an example input where none are satisfied:
SYSTEM <insert system message above> USER """In the summer of '69, a voyage grand, Apollo 11, bold as legend's hand. Armstrong took a step, history unfurled, "One small step," he said, for a new world."""
There are many possible variants on this type of model-based eval. Consider the following variation which tracks the kind of overlap between the candidate answer and the gold-standard answer, and also tracks whether the candidate answer contradicts any part of the gold-standard answer.
SYSTEM Use the following steps to respond to user inputs. Fully restate each step before proceeding. i.e. "Step 1: Reason...".
Step 1: Reason step-by-step about whether the information in the submitted answer compared to the expert answer is either: disjoint, equal, a subset, a superset, or overlapping (i.e. some intersection but not subset/superset).
Step 2: Reason step-by-step about whether the submitted answer contradicts any aspect of the expert answer.
Step 3: Output a JSON object structured like: {"type_of_overlap": "disjoint" or "equal" or "subset" or "superset" or "overlapping", "contradiction": true or false}
Here's an example input with a substandard answer which nonetheless does not contradict the expert answer:
SYSTEM <insert system message above> USER Question: """What event is Neil Armstrong most famous for and on what date did it occur? Assume UTC time."""
Submitted Answer: """Didn't he walk on the moon or something?"""
Expert Answer: """Neil Armstrong is most famous for being the first person to walk on the moon. This historic event occurred on July 21, 1969."""
Here's an example input with answer that directly contradicts the expert answer:
SYSTEM <insert system message above> USER Question: """What event is Neil Armstrong most famous for and on what date did it occur? Assume UTC time."""
Submitted Answer: """On the 21st of July 1969, Neil Armstrong became the second person to walk on the moon, following after Buzz Aldrin."""
Expert Answer: """Neil Armstrong is most famous for being the first person to walk on the moon. This historic event occurred on July 21, 1969."""
Here's an example input with a correct answer that also provides a bit more detail than is necessary:
SYSTEM <insert system message above> USER Question: """What event is Neil Armstrong most famous for and on what date did it occur? Assume UTC time."""
Submitted Answer: """At approximately 02:56 UTC on July 21st 1969, Neil Armstrong became the first human to set foot on the lunar surface, marking a monumental achievement in human history."""
Expert Answer: """Neil Armstrong is most famous for being the first person to walk on the moon. This historic event occurred on July 21, 1969."""
OpenTaskAI is a global marketplace that connects AI freelancers and business needs. Dedicated to enabling more people to achieve self-worth through AI tools.
We are excited to announce the official launch of the OpenTaskAI test network on the BNB Smart Chain Testnet! This significant milestone not only showcases our technical capabilities but also highlights our team’s dedication to long-term research, continuous optimization, and technological innovation.
In this test phase, key platform features
This article includes details freelancers need to pay attention to when receiving an order, how to handle disputed orders and the order status!
How to Start Your Order:
1. Login in & Post a Gig: Make sure you have registered with OpenTaskAI and have posted a Gig! You can learn
This article includes: how to place and pay for an order, how to cancel an order, how to request order modifications, how to request a refund, how to seek customer service help, and other order statuses.
1. Place Order
After successfully logging in and selecting the Gig service you need,
In today’s competitive business sector, scalability remains one of the most discussed topics as every startup owner looks to increase their revenue and grow their business. But, according to a study by the Kauffman Foundation, only 10% of startups will scale successfully. This explains why modern startups need AI
This guide shares strategies and tactics for getting better results from GPTs. The methods described here can sometimes be deployed in combination for greater effect. We encourage experimentation to find the methods that work best for you. Some of the examples demonstrated here currently work only with our most capable model, gpt-4. lf you don't yet have access to gpt-4 consider joining the waitlist. ln general, if you find that a GPT model fails at a task and a more capable model is available, it's often worth trying again with the more capable model.
1 What is Prompt Engineering?
Prompt engineering is a relatively new discipline that involves developing and optimizing prompts to effectively use various language models (LMs) for a wide range of applications and research topics. It helps to leverage the characteristics and limitations of large language models (LLMs). Prompt engineering is used to enhance the capabilities of LLMs in performing a wide array of common and complex tasks, such as question answering and arithmetic reasoning.
2 Why Do We Need Prompts?
We need prompts because ChatGPT and other LLMs have certain limitations and cannot always meet our needs as is. The purpose of using prompts is to help the model overcome these limitations (this applies to all LLMs). To give a clear idea, we can categorize the limitations of ChatGPT into semantic, design, and content-based issues. Semantic issues, which also pertain to language as a means of communication, include: ChatGPT fails to understand me: I spend a lot of time formulating and expressing my ideas. I don’t like or understand the way ChatGPT expresses things. ChatGPT’s responses are too long/short, etc.
3 Six strategies for getting better results
3.1 Write clear instructions
GPTs can't read your mind. lf outputs are too long, ask for brief replies. If outputs are too simple, ask for expert-level writing.If you dislike the format, demonstrate the format you'd like to see. The less GPTs have to guess at what you want, the more likely you'll get it. Content Expansion:
3.2 Provide reference text
GPTs can confidently invent fake answers,especially when asked about esoteric topics or for citations and URLs. In the same way that a sheet of notes can help a student do better on a test, providing reference text to GPTs can help in answering with fewer fabrications.
3.3 Split complex tasks into simpler sub tasks
Just as it is good practice in software engineering to decompose a complex system into a set of modular components, the same is true of tasks submitted to GPTs. Complex tasks tend to have higher error rates than simpler tasks. Furthermore, complex tasks can often be re-defined as a workflow of simpler tasks in which the outputs of earlier tasks are used to construct the inputs to later tasks.
3.4 Give GPTs time to think
lf asked to multiply 17 by 28, you might not know it instantly, but can still work it out with time. Similarly, GPTs make more reasoning errors when trying to answer right away, rather than taking time to work out an answer.
3.5 Use external tools
Compensate for the weaknesses of GPTs by feeding them the outputs of other tools. For example, a text retrieval system can tell GPTs about relevant documents.A code execution engine can help GPTs do math and run code. lfa task can be done more reliably or efficiently by a tool rather than by a GPT, offload it to get the best of both.
3.6 Test changes systematically
lmproving performance is easier if you can measure it.In some cases a modification to a prompt will achieve better performance on a few isolated examples but lead to worse overall performance on a more representative set of examples. Therefore to be sure that a change is net positive to performance it may be necessary to define a comprehensive test suite(also known an as an "eval") Evaluate model outputs with reference to gold standard answers.
4 Tactics
Each of the strategies listed above can be instantiated with specific tactics. These tactics are meant to provide ideas for things to try. They are by no means fully comprehensive, and you should feel free to try creative ideas not represented here.
4.1 Write clear instructions
Not so good:
How do l add numbers in Excel? Summarize the meeting notes.
Better:
How do I add up a row of dollar amounts in Excel? I want to do this automatically for a whole sheet of rows with all the totals ending up on the right in a column called "Total" Ask details:
Have no detalis:
The system message can be used to specify the persona used by the model in its replies.
You will get different answers:
You are a mischievous courtier with the intent ofcausing confusion and not being helpful at all.
How to Bake a Chocolate Cake?
Delimiters like triple quotation marks, XML tags, section titles, etc, can help demarcate sections of text to be treated differently.
Summarize the text delimited by triple quotes with a haiku.
“““insert text here”””
For straightforward tasks such as these, using delimiters might not make a difference in the output quality.However, the more complex a task is the more important it is to disambiguate task details. Don't make GPTs work to understand exactly what you are asking of them.
Summarize the text delimited by triple quotes in about 30 words.
“““insert text here”””
4.2 Tactic2 Provide reference text
When a model is given specific, trusted information that is relevant to a current query, it can be instructed to use that information to form its response.
If the input includes supplemental relevant knowledge, it's feasible to instruct the model to enhance its answers with citations. These citations should refer to passages from the provided documents. The validity of these citations can then be programmatically verified by matching strings within the given documents.
4.3 Break Down Complex Tasks into Simpler Steps
In situations where multiple independent instructions are needed to address different scenarios, it's useful to first identify the type of query. This classification helps in selecting the most appropriate instructions for that particular case. By categorizing queries into fixed groups and assigning specific instructions to each group, tasks can be efficiently managed. This method can also be applied in stages, breaking down a complex task into simpler, sequential steps. This approach ensures that each query is provided with only the necessary instructions for the next part of the task, which can reduce errors and lower costs, as complex prompts are more expensive. Suppose for example that for a customer service application, queries could be usefully classified as follows:
Based on the classification of the customer query, a set of more specific instructions can be provided to a model for it to handle next steps. For example, suppose the customer requires help with "troubleshooting".
Given that GPT models have a limited context window, they cannot maintain an indefinitely long conversation including the entire dialogue history.
One solution is to periodically summarize the conversation once it reaches a certain length. This summary can then be used as part of the system's response. Another method is to continuously summarize the dialogue in the background throughout the conversation. Alternatively, selecting and focusing on the most relevant parts of the previous conversation for the current query can be effective. This approach aligns with the strategy of using embedding-based search for efficient knowledge retrieval.
Since GPT models have a limit on the length of text they can process at once, they are unable to summarize texts that exceed this limit in a single query.
For summarizing lengthy texts like books, a sequential approach can be used. Each section of the text is summarized individually, and these summaries are then combined and summarized again. This recursive process continues until a complete summary of the document is achieved. If understanding later sections requires knowledge of earlier parts, a running summary of the preceding content can be included while summarizing subsequent sections. This method has been researched and validated by OpenAI in their studies on book summarization.
4.4 Allowing Time for GPT Models to Process Information
To achieve more accurate results, it can be beneficial to guide the model through a step-by-step reasoning process, especially in tasks requiring logical deduction, like math problem evaluations. Instead of directly asking the model to validate a student's solution, prompt it to first work through the problem independently. This method can lead the model to identify errors in the student's solution more effectively by comparing it against its own calculated answer.
But the student's solution is actually not correct! We can get the model to successfully notice this by prompting the model to generate its own solution first.
Use Follow-Up Queries to Ensure Comprehensive Analysis
When using a model to extract relevant excerpts from a large source in response to a question, there's a risk of the model not identifying all pertinent sections in one go. If the initial attempt seems incomplete, improve the outcome by asking the model additional questions. These follow-up queries encourage the model to revisit the source material, helping it uncover any relevant excerpts that might have been overlooked initially. This iterative approach enhances the thoroughness of the extraction process.
Alternatively, this can be achieved with a sequence of queries in which all except the last have their output hidden from the end user.
First, we can ask the model to solve the problem on its own. Since this initial query doesn't require the student’s solution, it can be omitted. This provides the additional advantage that there is no chance that the model’s solution will be biased by the student’s attempted solution.
Finally, we can let the model use its own analysis to construct a reply in the persona of a helpful tutor.
4.5 Use external tools
Models can greatly benefit from external information, especially when answering queries that need up-to-date or specific details. For instance, in responding to a question about a particular movie, incorporating comprehensive data about the movie (like cast and director) into the model's input can enhance the response's accuracy. Embeddings-based search can be utilized for this efficient knowledge retrieval. Text embeddings, which are vectors representing the similarity between text strings, enable the model to find and integrate the most relevant information from a text corpus dynamically, ensuring more informed responses. This technique is detailed in the OpenAI Cookbook and is crucial in minimizing the likelihood of the model generating incorrect information.
For tasks requiring precise arithmetic or complex calculations, the model can be instructed to write and execute code, rather than relying on its own computational abilities. This process involves the model formatting the code (e.g., in triple backticks), which is then executed in a controlled environment, like a Python interpreter. This approach is also applicable for using external APIs, where the model can be guided with appropriate documentation and code samples. However, it's essential to exercise caution and ensure a secure, sandboxed environment for code execution to prevent potential risks.
Another good use case for code execution is calling external APIs. If a model is instructed in the proper use of an API, it can write code that makes use of it. A model can be instructed in how to use an API by providing it with documentation and/or code samples showing how to use the API.
SYSTEM You can write and execute Python code by enclosing it in triple backticks. Also note that you have access to the following module to help users send messages to their friends:
WARNING: Executing code produced by a model is not inherently safe and precautions should be taken in any application that seeks to do this. In particular, a sandboxed code execution environment is needed to limit the harm that untrusted code could cause.
4.6 Systematic Testing of Modifications
Determining the effectiveness of a change, such as a new instruction or design, in a system can be challenging. While examining a few instances might suggest some improvements, small sample sizes can blur the line between actual enhancement and mere chance. A modification might improve performance in certain cases but deteriorate it in others.
To optimize system designs, evaluation procedures (often referred to as "evals") are crucial. Effective evals should be:
Evals can be conducted by computers, humans, or a combination of both. For criteria that are objective (like questions with definitive answers) or somewhat subjective, computers can automate the evaluation process. OpenAI Evals, an open-source framework, offers tools for setting up such automated evaluations.
Model-based evaluations become pertinent when there's a variety of potential outputs that are all considered high quality (such as in responses to questions with detailed answers). The line between what can be effectively judged through model-based evaluations and what necessitates human assessment is not clear-cut and is continually evolving as models advance. Experimentation is encouraged to determine the efficacy of model-based evaluations for specific applications.
Assessing Model Outputs Against Established Correct Answers
In situations where the accurate response to a question is expected to incorporate certain established facts, a model query can be employed to tally the number of these essential facts present in the answer.
For instance, consider this approach using the system message provided:
Finally, provide a count of how many "yes" answers there are. Provide this count as {"count": <insert count here>}. Here's an example input where both points are satisfied:
Here's an example input where only one point is satisfied:
There are many possible variants on this type of model-based eval. Consider the following variation which tracks the kind of overlap between the candidate answer and the gold-standard answer, and also tracks whether the candidate answer contradicts any part of the gold-standard answer.
Here's an example input with a substandard answer which nonetheless does not contradict the expert answer:
Here's an example input with answer that directly contradicts the expert answer:
Here's an example input with a correct answer that also provides a bit more detail than is necessary:
OpenTaskAI is a global marketplace that connects AI freelancers and business needs. Dedicated to enabling more people to achieve self-worth through AI tools.
WebSite|Twitter|Medium|TikTok|Instagram|Discord|Linkedin
Read Next
OpenTaskAI Test Network Officially Launches on BNB Smart Chain Testnet!
We are excited to announce the official launch of the OpenTaskAI test network on the BNB Smart Chain Testnet! This significant milestone not only showcases our technical capabilities but also highlights our team’s dedication to long-term research, continuous optimization, and technological innovation. In this test phase, key platform features
How Freelancers Accept Orders and Corresponding Order Status
This article includes details freelancers need to pay attention to when receiving an order, how to handle disputed orders and the order status! How to Start Your Order: 1. Login in & Post a Gig: Make sure you have registered with OpenTaskAI and have posted a Gig! You can learn
How Clients Place Orders and Order Status!
This article includes: how to place and pay for an order, how to cancel an order, how to request order modifications, how to request a refund, how to seek customer service help, and other order statuses. 1. Place Order After successfully logging in and selecting the Gig service you need,
Scale your Startup Faster: The AI Talent Advantage
In today’s competitive business sector, scalability remains one of the most discussed topics as every startup owner looks to increase their revenue and grow their business. But, according to a study by the Kauffman Foundation, only 10% of startups will scale successfully. This explains why modern startups need AI