---
stage: AI-powered
group: AI Framework
info: Any user with at least the Maintainer role can merge updates to this content. For details, see https://docs.gitlab.com/development/development_processes/#development-guidelines-review.
title: Prompt Engineering Guide
---

This guide outlines the key aspects of prompt engineering when working with Large Language Models (LLMs),
including prompt design, optimization, evaluation, and monitoring.

## Understanding prompt engineering

<i class="fa fa-youtube-play youtube" aria-hidden="true"></i>
For an overview, see [this video](https://youtu.be/bOA6BtBaMTQ).

Most important takeaways:

- **Definition of a prompt:**
  - An instruction sent to a language model to solve a task
  - Forms the core of AI features in user interfaces
- **Importance of prompt quality:**
  - Greatly influences the quality of the language model's response
  - Iterating on prompts is crucial for optimal results
- **Key considerations when crafting prompts:**
  - Understand the task you're asking the model to perform
  - Know what kind of response you're expecting
  - Prepare a dataset to test the prompts
  - Be specific - provide lots of details and context to help the AI understand
  - Give examples of potential questions and desired answers
- **Prompt universality:**
  - Prompts are not universal across different language models
  - When changing models, prompts need to be adjusted
  - Consult the language model provider's documentation for specific tips
  - Test new models before fully switching
- **Tools for working with prompts:**
  - Anthropic Console: A platform for writing and testing prompts
  - Prompt generator: A tool that creates crafted prompts based on task descriptions
- **Prompt structure:**
  - Typically includes a general task description
  - Contains placeholders for input text
  - May include specific instructions and suggested output formats
  - Consider wrapping inputs in XML tags for better understanding and data extraction (see the sketch after this list)
- **System prompts:**
  - Set the general tone and role for the language model
  - Can improve the model's performance
  - Usually placed at the beginning of the prompt
- **Best practices:**
  - Invest time in understanding the assignment
  - Use prompt generation tools as a starting point
  - Test and iterate on prompts to improve results
  - Use proper English grammar and syntax to help the AI understand
  - Allow uncertainty - tell the AI to say "I don't know" if it is unsure
  - Use positive phrasing - say what the AI should do, not what it shouldn't do

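For example, here is a minimal sketch of how a prompt with this structure could be assembled in Python. The system prompt, XML tag names, and example task are illustrative assumptions, not an existing GitLab prompt:

```python
# Hypothetical prompt following the structure above: a system prompt that sets
# the role, a task description, specific instructions, a suggested output
# format, and the input wrapped in XML tags.
SYSTEM_PROMPT = "You are an experienced Ruby developer reviewing merge request diffs."

PROMPT_TEMPLATE = """Summarize the change below for a release note.

Instructions:
- Keep the summary under two sentences.
- If you are unsure what the change does, say "I don't know".

Respond in this format:
<summary>Your summary here</summary>

<diff>
{diff}
</diff>
"""


def build_prompt(diff: str) -> str:
    """Fill the input placeholder with the user-provided diff."""
    return PROMPT_TEMPLATE.format(diff=diff)


if __name__ == "__main__":
    print(build_prompt("+ def hello\n+   puts 'hello'\n+ end"))
```
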
### Best practices for writing effective prompts

<i class="fa fa-youtube-play youtube" aria-hidden="true"></i>
For an overview, see [this video about writing effective prompts](https://youtu.be/xL-zj-Z4Mh4).

Here are the key takeaways from this video:

- **No universal "good" prompt:**
  - The effectiveness of a prompt depends on the specific task.
  - There's no one-size-fits-all approach to prompt writing.
- **Characteristics of effective prompts:**
  - Clear about the task and the expected outcomes.
  - Direct and detailed.
  - Specific about the desired output.
- **Key elements to consider:**
  - Understand the task, audience, and end goal.
  - Explain these elements clearly in the prompt.
- **Strategies for improving prompt performance:**
  - Add instructions in sequential steps.
  - Include relevant examples.
  - Ask the model to think in steps (chain of thought).
  - Request reasoning before providing answers.
  - Guide the input - use delimiters to clearly indicate where the user's input starts and ends.
- **Adapting to model preferences:**
  - Adjust prompts to suit the preferred data structure of the model.
  - For example, Anthropic models work well with XML tags.
- **Importance of system prompts:**
  - Set the role for the language model.
  - Placed at the beginning of the interaction.
  - Can include awareness of tools or long context.
- **Iteration is crucial:**
  - Emphasized as the most important part of working with prompts.
  - Continual refinement leads to better results.
  - Build quality control - automate prompt testing with RSpec or Rake tasks to catch differences.
- **Use traditional code:**
  - If a task can be done efficiently outside of calling an LLM, use code for more reliable and deterministic outputs (see the sketch after this list).

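As a small illustration of the last point, a task like extracting issue references from a commit message does not need an LLM at all; plain code is cheaper and fully deterministic. This is a generic sketch, not code from the GitLab codebase:

```python
import re

# Deterministic alternative to an LLM call: extract issue references such as
# "#123" from a commit message with a regular expression.
ISSUE_REF = re.compile(r"#(\d+)")


def extract_issue_refs(commit_message: str) -> list[int]:
    """Return the issue IDs referenced in a commit message."""
    return [int(ref) for ref in ISSUE_REF.findall(commit_message)]


print(extract_issue_refs("Fix login redirect (closes #42, relates to #1337)"))
# => [42, 1337]
```
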
## Tuning and optimizing workflows for prompts

### Prompt tuning for LLMs using LangSmith and Anthropic Workbench together + CEF

#### Iterating on the prompt using Anthropic console

<i class="fa fa-youtube-play youtube" aria-hidden="true"></i>
For an overview, see [this video](https://youtu.be/03nOKxr8BS4).

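Once a prompt variant looks promising in the console, it can be convenient to replay it from a small script so the same prompt can be rerun against different inputs. Below is a minimal sketch using the Anthropic Python SDK; the model name, system prompt, and user prompt are placeholders to substitute with whatever you are iterating on:

```python
import anthropic

# Requires ANTHROPIC_API_KEY in the environment.
client = anthropic.Anthropic()

SYSTEM_PROMPT = "You are a concise technical writer."  # placeholder role
USER_PROMPT = "Summarize this diff in one sentence:\n<diff>\n+ puts 'hi'\n</diff>"

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder; use the model you are testing
    max_tokens=512,
    system=SYSTEM_PROMPT,
    messages=[{"role": "user", "content": USER_PROMPT}],
)

print(response.content[0].text)
```
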
#### Iterating on the prompt using LangSmith

<i class="fa fa-youtube-play youtube" aria-hidden="true"></i>
For an overview, see [this video](https://youtu.be/9WXT0licAdg).

#### Using Datasets for prompt tuning with LangSmith

<i class="fa fa-youtube-play youtube" aria-hidden="true"></i>
For an overview, see [this video](https://www.youtube.com/watch?v=kUnm0c2LMlQ).

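As a rough sketch of what this looks like in code, the LangSmith Python client can create a dataset of inputs and reference outputs to tune against. The dataset name and examples below are invented for illustration:

```python
from langsmith import Client

# Requires LANGSMITH_API_KEY in the environment.
client = Client()

# Hypothetical dataset of inputs and reference outputs for prompt tuning.
dataset = client.create_dataset(
    dataset_name="duo-chat-docs-questions",  # illustrative name
    description="Questions about GitLab docs with expected answers.",
)

client.create_examples(
    inputs=[
        {"question": "How do I create a merge request?"},
        {"question": "What is a pipeline schedule?"},
    ],
    outputs=[
        {"answer": "Use the Merge requests page or the Web IDE to open one."},
        {"answer": "A way to run a pipeline automatically at a set interval."},
    ],
    dataset_id=dataset.id,
)
```
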
#### Using automated evaluation in LangSmith

<i class="fa fa-youtube-play youtube" aria-hidden="true"></i>
For an overview, see [this video](https://youtu.be/MT6SK4y47Zw).

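Here is a minimal sketch of an automated evaluation run with the LangSmith SDK, assuming a dataset like the one from the previous section exists. The target function and evaluator are simplified placeholders; a real run would call the actual feature code and use a more meaningful scorer:

```python
from langsmith.evaluation import evaluate


def answer_question(inputs: dict) -> dict:
    """Placeholder target: call your prompt/LLM here and return its answer."""
    return {"answer": f"Stub answer for: {inputs['question']}"}


def has_answer(run, example) -> dict:
    """Trivial example evaluator: score 1 if the model returned any text."""
    answer = (run.outputs or {}).get("answer", "")
    return {"key": "has_answer", "score": int(bool(answer.strip()))}


# Results are stored as an experiment and can be inspected in the LangSmith UI.
results = evaluate(
    answer_question,
    data="duo-chat-docs-questions",  # dataset name from the previous sketch
    evaluators=[has_answer],
    experiment_prefix="prompt-v2",   # makes runs easy to compare across prompt versions
)
```
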
#### Using pairwise experiments in LangSmith

<i class="fa fa-youtube-play youtube" aria-hidden="true"></i>
For an overview, see [this video](https://youtu.be/mhpY7ddjXqc).

[View the CEF documentation](https://gitlab.com/gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/prompt-library/-/blob/main/doc/running_evaluation_locally/pairwise_evaluation.md).

#### When to use LangSmith and when to use CEF

<i class="fa fa-youtube-play youtube" aria-hidden="true"></i>
For an overview, see [this video](https://youtu.be/-DK-XFFllwg).

##### Key Points on CEF (Centralized Evaluation Framework) Project

1. Initial Development
   - Start with pure LangSmith for prompt iteration
   - Easier and quicker to set up
   - More cost-effective for early stages
1. When to Transition to CEF
   - When investing more in the feature
   - For working with larger datasets
   - For repeated, long-term use
1. CEF Setup Considerations
   - Requires upfront time investment
   - Need to adjust evaluations for specific features
   - Set up input data (for example, local GDK for chat features)
1. Challenges
   - Ensuring consistent data across different users
   - Exploring options like seeds and imports for data sharing
1. Current CEF Capabilities
   - Supports chat questions about code
   - Handles documentation-related queries
   - Includes evaluations for code suggestions
1. Advantages of CEF
   - Allows running evaluations on local GDK
   - Results viewable in the LangSmith UI
   - Enables use of larger datasets
1. Flexibility
   - Requires customization for specific use cases
   - Not a one-size-fits-all solution
1. Documentation
   - CEF has extensive documentation available
1. Adoption
   - Already in use by some teams, including the Code Suggestions and Create teams

## Further resources

For more comprehensive prompt engineering guides, see:

- [Prompt Engineering Guide](https://www.promptingguide.ai/)
- [ChatGPT Prompt Engineering for Developers](https://www.deeplearning.ai/short-courses/chatgpt-prompt-engineering-for-developers/)