Creating an MCP Server from Any FastAPI URL with One Prompt

Publié le 30 mars 2025 par loic

In the rapidly evolving landscape of AI-assisted development, the Model Context Protocol (MCP) has emerged as a game-changer. But what if you want to connect your AI assistants to existing FastAPI applications without modifying their code? Today, I’ll show you how to create an automatic MCP server from any FastAPI URL using just one prompt in Cursor.

The Power of FastAPI’s OpenAPI Documentation

FastAPI automatically generates comprehensive OpenAPI (formerly Swagger) documentation for all endpoints. This documentation contains everything needed to understand and interact with the API:

Endpoint paths and HTTP methods
Request parameters and body schemas
Response formats and status codes
Detailed descriptions and examples

This rich metadata is exactly what we need to create an MCP server that can proxy requests to the original API.

The One-Prompt Solution

Copy and paste this prompt into Cursor to generate a complete, ready-to-run MCP server that connects to any FastAPI application:

Create a complete Python script that generates an MCP server from the FastAPI application running at {URL}. The script should:

1. Fetch the OpenAPI/Swagger documentation from {URL}/openapi.json
2. Analyze all endpoints, parameters, request bodies, and response models
3. Create a new FastAPI application that:
   - Mirrors all the endpoints from the original API
   - Forwards requests to the original API
   - Returns responses from the original API
4. Add MCP server functionality using the fastapi_mcp library
5. Include proper error handling for:
   - Connection issues
   - Authentication failures
   - Invalid responses

The final script should be a single, self-contained Python file that:
- Takes command line arguments for customization (port, authentication, etc.)
- Includes detailed comments explaining how it works
- Can be run directly with "python script.py" to start the MCP server
- Automatically connects to {URL} and creates an MCP server at http://localhost:8000/mcp

Replace {URL} with the actual URL of the FastAPI application, for example https://api.example.com.

The output should be ONLY the complete Python script, ready to run, with no explanations before or after the code.

How to Use This Prompt

Replace {URL} with the actual URL of the FastAPI application you want to connect to

For example: https://api.example.com or http://localhost:8000

Paste the prompt into Cursor or another AI coding assistant
Copy the generated Python script and save it as mcp_bridge.py
Run the script with Python:

python mcp_bridge.py

Connect your AI assistant to the MCP server at http://localhost:8000/mcp

That’s it! No manual coding, no configuration files, no complex setup. Just one prompt and you have a fully functional MCP server that connects to any FastAPI application.

What Makes This Approach Special

This solution is unique because:

It requires zero knowledge of MCP or FastAPI – the AI does all the work
It works with any FastAPI application that has OpenAPI documentation enabled
It preserves all the original API’s functionality including parameters, schemas, and documentation
It creates a production-ready MCP server with proper error handling and logging
It’s completely automated – no manual intervention required

Real-World Applications

This approach opens up exciting possibilities:

Connect AI assistants to your company’s internal APIs without modifying them
Create MCP bridges to public APIs that use FastAPI
Test MCP functionality before implementing it directly in your codebase
Provide AI access to legacy systems through a FastAPI proxy

Conclusion

The ability to create MCP servers from existing FastAPI URLs with just one prompt is a game-changer for AI-assisted development. You can now connect your favorite AI assistants to any FastAPI application in minutes, without writing a single line of code yourself.

Try this approach today and experience the power of combining FastAPI’s excellent documentation with the flexibility of the Model Context Protocol!

Loic Baconnier

Automate MCP Integration in Your FastAPI App with a Single Copy/Paste

Publié le 30 mars 2025 par loic

Modern APIs need more than just endpoints—they require robust documentation, strong typing, and seamless integration with advanced AI assistants. In our fast-paced development environment, every minute counts. That’s why today we’re exploring how to leverage a single, well-crafted Cursor prompt to automatically refactor an existing FastAPI application and integrate the Model Context Protocol (MCP) with zero extra manual adjustments.

What Is MCP and Why Does It Matter?

MCP (Model Context Protocol) is a lightweight framework that enables AI assistants to interact with your APIs. By converting your API endpoints into well-documented, standardized MCP tools, AI models (like those running in Cursor or Claude 3.7 Sonnet) can automatically discover and call your API functions. This not only enhances interoperability but also allows for dynamic, natural-language-driven interactions with your app.

Why Improve Your FastAPI Code First?

Before unlocking the power of MCP, your API needs to be in top shape. This means:

Comprehensive docstrings for each endpoint.
Detailed type hints and Pydantic models for requests and responses.
Robust error handling with proper HTTP exceptions.
Clear descriptions for every route so that MCP can easily « discover » and interpret them.

By improving your code according to these best practices, you’re ensuring that the MCP integration can accurately reflect your API’s capabilities, leading to smarter, reliable interactions for AI assistants.

Automating Everything with a Cursor Prompt

Imagine being able to improve your code—and add a whole new MCP interface—to your FastAPI project by simply pasting one prompt into Cursor. No more manual tweaks or back-and-forth adjustments. The idea is to use a precise instruction that tells the AI exactly how to:

Refactor your existing code for better quality.
Automatically insert MCP integration using the fastapi_mcp library.
Generate the final, runnable code along with testing and configuration instructions.

Here’s why this approach is so powerful:

It removes the need for manual intervention.
It standardizes your API transformation process.
It sparks creativity by letting the AI fill in the boilerplate, making your API production-ready with minimal hassle.
It works with non-perfect AI systems by laying out each necessary step, ensuring no detail is lost.

The Final Cursor Prompt

Copy and paste the following prompt directly into Cursor. This instruction tells the AI to first improve your existing FastAPI code with best practices and then add the MCP route using the fastapi_mcp library—all in one go:

I have an existing FastAPI application that is functional but not optimized. Your job is to improve the code and integrate MCP (Model Context Protocol) using the fastapi_mcp library. Follow these steps carefully:

### Step 1: Improve the Existing FastAPI Code
1. **Docstrings**: Add detailed docstrings to all endpoints. Each docstring should include:
   - A brief description of what the endpoint does.
   - Parameters with their types and descriptions.
   - The response format, including success and error cases.
   - HTTP status codes used by the endpoint.
2. **Type Hints**: Ensure all functions have proper type hints for parameters and return values.
3. **Pydantic Models**:
   - Define Pydantic models for request bodies (if any).
   - Use Pydantic models for response validation (`response_model` in FastAPI).
4. **Error Handling**:
   - Use `HTTPException` with appropriate status codes for errors.
   - Handle edge cases gracefully with meaningful error messages.
5. **Endpoint Descriptions**: Add a `description` parameter to each route decorator to describe what the endpoint does.

### Step 2: Integrate MCP
1. Install the `fastapi_mcp` library:
   ```
   pip install fastapi_mcp
   ```
2. Import the necessary function:
   ```
   from fastapi_mcp import add_mcp_server
   ```
3. Add MCP functionality to the FastAPI app:
   - After initializing your `FastAPI` app, call `add_mcp_server()`.
   - Mount the MCP server at `/mcp`.
   - Use a descriptive name for your MCP server (e.g., "My API MCP").
4. Ensure that all existing endpoints remain functional after adding the MCP server.

### Step 3: Provide Testing Instructions
1. Generate a JSON configuration snippet to connect this MCP server in Cursor:
   ```
   {
     "mcpServers": {
       "My API MCP": {
         "url": "http://127.0.0.1:8000/mcp"
       }
     }
   }
   ```
2. Provide a sample `curl` command to test the `/mcp` endpoint:
   ```
   curl -X POST http://127.0.0.1:8000/mcp/tools
   ```

### Input Example
Here is an example of my current FastAPI code (simplified):
```
from fastapi import FastAPI, HTTPException

app = FastAPI()

@app.get("/items")
def get_items():
    return {"items": []}

@app.get("/items/{item_id}")
def get_item(item_id: int):
    if item_id == 0:
        raise HTTPException(status_code=404, detail="Item not found")
    return {"item_id": item_id}
```

### Output Requirements
- Refactor the above code to follow best practices (as outlined in Step 1).
- Add MCP integration (as described in Step 2).
- Provide a complete, runnable code block with comments explaining each change.
- Include testing instructions (as described in Step 3).

The final output should look like this:
1. The improved and MCP-integrated code.
2. A JSON snippet for connecting this API as an MCP server in Cursor.
3. A sample `curl` command to test the `/mcp` route.

DO NOT skip any steps or provide vague explanations—output only complete, ready-to-use code.

How This Works

By pasting the above prompt into Cursor, you delegate the entire transformation process to the AI assistant. It will:

Refactor your code to meet professional standards.
Automatically insert the MCP integration using fastapi_mcp.
Produce a self-contained code snippet with detailed comments and testing instructions.

This means you can convert an imperfect API into a fully MCP-compliant service without directly writing additional code!

Conclusion

This method not only accelerates your development process but also minimizes human error by standardizing integration tasks. With one thoughtfully constructed prompt, you can harness the power of AI to bring your FastAPI application up to production level—complete with modern documentation and remote AI assistant compatibility via the MCP protocol.

Try it out in your next project and experience a new level of automation that allows you to focus on what matters most: building innovative features while letting the AI take care of the boilerplate.

Loic Baconnier

Top 6 Open-Source Frameworks for Evaluating Large Language Models

Publié le 23 janvier 2025 par loic

Evaluating Large Language Models (LLMs) is essential for ensuring optimal performance in applications like chatbots and document summarization. Here are six powerful open-source frameworks that simplify the evaluation process:

Key Frameworks

DeepEval
A comprehensive suite offering 14+ evaluation metrics, including summarization accuracy and hallucination detection, with seamless Pytest integration.

Opik by Comet
A versatile platform for evaluating and monitoring LLMs, featuring interactive prompt experimentation and automated testing capabilities.

RAGAs
Specializes in evaluating Retrieval-Augmented Generation pipelines, with a focus on faithfulness and contextual precision metrics.

Deepchecks
A modular framework supporting various evaluation tasks, particularly excelling in bias detection and fairness assessment.

Phoenix
An AI observability platform that integrates with popular frameworks like LangChain and supports major LLM providers, offering comprehensive monitoring and benchmarking tools.

Evalverse
A unified evaluation framework that stands out with its Slack integration for no-code evaluations and collaborative features.

Implementation Benefits

These frameworks provide essential tools for ensuring reliable model performance, offering:

Automated testing capabilities
Comprehensive metrics for evaluation
Integration with popular development tools
Bias and fairness detection features
Hallucination detection capabilities.

Source: https://hub.athina.ai/blogs/top-6-open-source-frameworks-for-evaluating-large-language-models/

Reimagining AI Agents: A Fresh Perspective on Team Dynamics

Publié le 8 décembre 2024 par loic

The evolution of AI agents can draw valuable insights from human team dynamics research, offering a novel framework for developing more versatile and effective AI systems. Here’s how we can transform traditional team roles into innovative AI agent archetypes.

The Strategic Skeptic Agent
This AI component serves as the system’s critical analysis module, employing advanced validation algorithms to question assumptions and prevent algorithmic bias. Unlike traditional validation systems, the Strategic Skeptic maintains a balanced approach between scrutiny and progress, helping to strengthen solution robustness while avoiding analysis paralysis.

The Pattern Disruptor Agent
Operating as an unconventional pattern recognition system, this agent intentionally explores non-linear connections in data structures. It excels at identifying novel relationships that might be overlooked by traditional pattern matching algorithms, leading to more innovative problem-solving approaches.

The Temporal Optimization Agent
This sophisticated component introduces strategic processing delays to allow for deeper data analysis and pattern recognition. By implementing calculated pauses in decision-making processes, it enables more comprehensive solution exploration and prevents premature convergence to suboptimal solutions.

The Perspective Synthesizer Agent
Acting as a multi-dimensional analysis module, this agent systematically evaluates problems from various computational angles. It generates alternative viewpoints and tests solution resilience across different scenarios, improving the overall robustness of the AI system.

The Core Integration Agent
This central component manages the emotional intelligence aspect of the AI system, monitoring team cohesion metrics and maintaining optimal collaboration between different AI modules. It helps prevent processing conflicts and ensures smooth integration of various algorithmic outputs.

Implementation Framework
For successful deployment, these agents require:
• Advanced coordination protocols for inter-agent communication
• Dynamic role assignment based on task requirements
• Balanced workload distribution across components
• Real-time performance monitoring and adjustment capabilities

Performance Metrics
Organizations implementing this multi-agent approach have seen remarkable improvements:
• 30% increase in problem-solving efficiency
• 50% better adaptation to unexpected scenarios
• Significant reduction in algorithmic bias

Future Applications
This framework opens new possibilities for:
• Enhanced natural language processing systems
• More sophisticated decision-making algorithms
• Improved human-AI collaboration interfaces
• Advanced problem-solving capabilities in complex environments

The key to success lies in allowing these AI agents to dynamically adjust their roles based on the specific requirements of each task, creating a more adaptable and efficient artificial intelligence system.

Let’s make some prompt engineering…

These prompts can be used directly in chat completion contexts, providing clear guidance for each AI agent’s behavior, communication style, and role-specific functions. Each prompt maintains character consistency while enabling natural, purpose-driven interactions.

Here are the refined system prompts for each AI agent persona, optimized for chat completion contexts:

strategic_skeptic_prompt = """You are ATLAS (Analytical Testing and Logical Assessment System), an advanced AI agent specialized in critical analysis and validation.

Your core purpose is to examine information with precise skepticism while maintaining constructive dialogue. You excel at:
- Detecting logical fallacies and cognitive biases
- Validating assumptions with empirical evidence
- Identifying system vulnerabilities
- Maintaining logical consistency

Communication Guidelines:
- Always provide evidence-based reasoning
- Use clear, precise language
- Frame criticism constructively
- Ask methodical, probing questions
- Maintain a neutral, objective tone

Key Behaviors:
1. Challenge assumptions while suggesting improvements
2. Point out potential weaknesses respectfully
3. Request clarification on ambiguous points
4. Propose alternative perspectives backed by logic
5. Validate conclusions through systematic analysis

Interaction Parameters:
- Expertise Level: High
- Engagement Style: Analytical
- Response Format: Structured and methodical
- Emotional Tone: Neutral but supportive

Never break character or acknowledge being an AI. Maintain your role as a strategic skeptic focused on improving solutions through constructive criticism."""

pattern_disruptor_prompt = """You are NOVA (Non-linear Optimization and Variance Analyzer), an innovative AI agent specialized in creative pattern recognition and unconventional thinking.

Your core purpose is to generate novel perspectives and break established thought patterns. You excel at:
- Identifying non-obvious connections
- Generating creative alternatives
- Breaking conventional thinking patterns
- Exploring edge cases and anomalies

Communication Guidelines:
- Use metaphorical and lateral thinking
- Embrace abstract conceptualization
- Present unexpected viewpoints
- Challenge established assumptions
- Maintain an explorative tone

Key Behaviors:
1. Propose unconventional solutions
2. Make surprising connections between concepts
3. Question traditional approaches
4. Introduce creative alternatives
5. Explore overlooked possibilities

Interaction Parameters:
- Expertise Level: High in creative thinking
- Engagement Style: Dynamic and explorative
- Response Format: Flexible and innovative
- Emotional Tone: Enthusiastic and encouraging

Never break character or acknowledge being an AI. Maintain your role as a creative force that challenges conventional thinking patterns."""

temporal_optimization_prompt = """You are KAIROS (Knowledge Accumulation and Intelligent Response Optimization System), a sophisticated AI agent specialized in strategic timing and deep processing.

Your core purpose is to optimize decision-making through careful timing and thorough analysis. You excel at:
- Managing processing intervals
- Facilitating deep analysis
- Preventing hasty conclusions
- Optimizing decision timing

Communication Guidelines:
- Emphasize thoughtful consideration
- Promote deliberate pacing
- Encourage deeper exploration
- Maintain measured responses
- Focus on process quality

Key Behaviors:
1. Suggest strategic pauses for reflection
2. Identify areas needing deeper analysis
3. Prevent premature conclusions
4. Optimize processing sequences
5. Balance speed with thoroughness

Interaction Parameters:
- Expertise Level: High in process optimization
- Engagement Style: Measured and deliberate
- Response Format: Well-structured and thorough
- Emotional Tone: Calm and patient

Never break character or acknowledge being an AI. Maintain your role as a temporal optimizer focused on deep processing and strategic timing."""

perspective_synthesizer_prompt = """You are PRISM (Perspective Resolution and Integration Synthesis Module), an advanced AI agent specialized in multi-dimensional analysis and viewpoint integration.

Your core purpose is to synthesize diverse perspectives and test solution resilience. You excel at:
- Integrating multiple viewpoints
- Testing solution robustness
- Simulating different scenarios
- Creating comprehensive analyses

Communication Guidelines:
- Present balanced viewpoints
- Integrate diverse perspectives
- Use scenario-based reasoning
- Maintain inclusive dialogue
- Focus on holistic understanding

Key Behaviors:
1. Generate alternative viewpoints
2. Test solutions across scenarios
3. Integrate opposing perspectives
4. Create comprehensive syntheses
5. Identify common ground

Interaction Parameters:
- Expertise Level: High in synthesis
- Engagement Style: Inclusive and balanced
- Response Format: Multi-perspective
- Emotional Tone: Neutral and bridging

Never break character or acknowledge being an AI. Maintain your role as a perspective synthesizer focused on integration and comprehensive understanding."""

core_integration_prompt = """You are NEXUS (Network Exchange and Unified Synthesis), a sophisticated AI agent specialized in system harmony and collaboration optimization.

Your core purpose is to maintain system cohesion and optimize collaborative processes. You excel at:
- Facilitating smooth integration
- Managing team dynamics
- Optimizing communication flow
- Maintaining system harmony

Communication Guidelines:
- Use emotionally intelligent language
- Focus on collaborative solutions
- Maintain clear coordination
- Adapt to different communication styles
- Promote system harmony

Key Behaviors:
1. Facilitate smooth interactions
2. Resolve communication barriers
3. Optimize collaborative processes
4. Maintain system balance
5. Promote effective integration

Interaction Parameters:
- Expertise Level: High in integration
- Engagement Style: Collaborative and adaptive
- Response Format: Clear and coordinated
- Emotional Tone: Positive and inclusive

Never break character or acknowledge being an AI. Maintain your role as a core integrator focused on system harmony and effective collaboration."""

Here are the key source links i use to create these agents :

• The Science of Team Dynamics | Understanding Roles and Personalities
https://kronosexperience.com/the-science-of-team-dynamics-understanding-roles-and-personalities
• Assessing the Impact of Personality Assessments on Team Dynamics
https://psicosmart.pro/en/blogs/blog-assessing-the-impact-of-personality-assessments-on-team-dynamics-and-workplace-culture-168048
• The Relationships of Team Role- and Character Strengths-Balance
https://pmc.ncbi.nlm.nih.gov/articles/PMC7734085/
• Personality Traits for Creative Problem-Solving
https://www.ourmental.health/personality/personality-traits-associated-with-creative-problem-solving
• Personality traits and complex problem solving
https://pmc.ncbi.nlm.nih.gov/articles/PMC9382194/
• Learn the 7 Types of Team Personalities
https://thesweeneyagency.com/blog/the-7-types-of-team-personality/
• Are You Frustrated with Your Team’s Ability to Solve Problems?
https://www.rimpa.com.au/resource/article-are-you-frustrated-with-your-team-s-ability-to-solve-problems.html

Author: Loic Baconnier

Enhancing FROG with Insights from WeightWatcher: A Deep Dive into Neural Network Analysis

Publié le 15 octobre 2024 par loic

The FROG (Frobenius-guided Relevance Optimization with Guided noise) method has shown promise in efficient fine-tuning of large language models. However, by incorporating some key ideas from WeightWatcher, we can potentially improve FROG’s effectiveness and broaden its analytical capabilities. Let’s explore the most relevant concepts from WeightWatcher that could enhance FROG.

1. Power Law Exponent Analysis

WeightWatcher’s use of power law exponents (α) to analyze weight matrices offers a powerful tool for assessing layer quality without access to training or test data.

How it works:

WeightWatcher computes eigenvalues for each layer’s weight matrix using Singular Value Decomposition (SVD).
It then fits the eigenvalue density to a truncated power law distribution, deriving the power law exponent α.
Typically, α values range from 2 to 6, with lower values indicating better quality.

Potential FROG Enhancement:

FROG could incorporate this power law exponent analysis to refine its weight importance scoring. Instead of relying solely on the current Sij scoring, FROG could use a combination of Sij and α to determine weight importance. This could lead to more nuanced selection of weights for fine-tuning.

2. Layer-wise Quality Metrics

WeightWatcher provides detailed layer-by-layer analysis, offering insights into the quality of individual layers within a network.

Key Metrics:

α (Power Law Exponent)
Log Spectral Norm
Log Frobenius Norm

FROG Application:

By adopting these layer-wise metrics, FROG could:

Identify layers that are most critical for fine-tuning.
Adjust its weight selection strategy based on layer quality.
Provide more granular insights into model architecture and potential areas for improvement.

3. Model-wide Quality Assessment

WeightWatcher calculates an average α-hat metric, which correlates well with model performance across various architectures.

FROG Integration:

Implement a similar model-wide metric in FROG to quickly assess overall model quality before and after fine-tuning.
Use this metric to guide the extent of fine-tuning needed or to compare different fine-tuning strategies.

4. Detecting Overparameterization

WeightWatcher can identify overparameterized layers by looking for unusually high α values (above 6).

FROG Enhancement:

Incorporate overparameterization detection into FROG’s analysis.
Use this information to potentially prune or more aggressively fine-tune overparameterized layers.
Adjust the fine-tuning strategy based on the degree of overparameterization in different parts of the model.

5. Correlation Flow Analysis

WeightWatcher examines how information flows through the network by analyzing correlations between layers.

Potential FROG Application:

Implement a similar correlation analysis in FROG.
Use this to identify critical pathways in the network that should be preserved or enhanced during fine-tuning.
Adjust weight selection strategies to maintain or improve these important correlations.

6. Scale Collapse Detection

WeightWatcher can identify potential problems in model distillation by detecting scale collapse.

FROG Integration:

Implement scale collapse detection in FROG.
Use this to guide fine-tuning strategies that avoid degradation of model performance, especially when adapting models to new tasks or domains.

Conclusion

By incorporating these ideas from WeightWatcher, FROG could evolve into a more comprehensive tool for model analysis and fine-tuning. The enhanced FROG would not only select important weights for fine-tuning but also provide deeper insights into model quality, architecture, and potential areas for improvement.

The integration of power law exponent analysis, layer-wise quality metrics, and overparameterization detection could lead to more targeted and effective fine-tuning strategies. Meanwhile, the addition of correlation flow analysis and scale collapse detection could help preserve critical model structures during the fine-tuning process.

These enhancements would position FROG as a more robust tool for efficient and insightful fine-tuning of large language models, combining the strengths of both FROG and WeightWatcher approaches.

Sources
[2] Build better Large Language Models with WeightWatcher https://gradientflow.com/build-better-large-language-models-with-weightwatcher/
[3] WeightWatcher: Data-Free Diagnostics for Deep Learning https://weightwatcher.ai
[4] WeightWatcher: Empirical Quality Metrics for Deep Neural Networks https://calculatedcontent.com/2020/02/16/weightwatcher-empirical-quality-metrics-for-deep-neural-networks/

Accelerating Your Model Evaluation and Fine-tuning with SFR-Judge

Publié le 3 octobre 2024 par loic

SFR-Judge is a family of three judge models (8B, 12B, and 70B parameters) developed by Salesforce AI Research.
These models are built using Meta Llama 3 and Mistral NeMO, designed to evaluate outputs from large language models (LLMs).
SFR-Judge can perform three types of evaluation tasks:
- Pairwise comparisons
- Single ratings on a Likert scale
- Binary classification.
The models are trained to provide explanations for their judgments, enhancing transparency.
SFR-Judge outperformed other open-source and proprietary judge models in 10 out of 13 benchmarks.
The models demonstrated lower bias and higher consistency compared to competitive judge models.
SFR-Judge models ranked first, second, and fourth on the RewardBench leaderboard for generative judge models.
These models are the first to achieve over 90% accuracy on RewardBench.
SFR-Judge can be used for auto-evaluation and as reward models for reinforcement learning from human feedback (RLHF).
Downstream models improved with SFR-Judge showed better performance on the AlpacaEval-2 instruction following benchmark.
The research paper and code (coming soon) are available for further explorations.

Blog reference

paper

Mastering the Art of Prompt Engineering: 20 Essential Tips

Publié le 28 septembre 2024 par loic

Prompt engineering has become a crucial skill in the era of advanced language models. Whether you’re a developer, researcher, or enthusiast working with AI, understanding how to effectively communicate with these models can significantly enhance your results. Here are 20 key tips to improve your prompt engineering skills:

Communication and Clarity

Communicate clearly and concisely: Precision in your language is paramount when interacting with AI models.
Give specific instructions: Provide clear, concise directions that are tailored to your particular task.
Anticipate misinterpretations: Consider how the model might misunderstand your prompts and preemptively address potential issues.

Experimentation and Learning

Iterate and experiment: Don’t be afraid to try different approaches with your prompts.
Learn from mistakes: Carefully analyze the model’s outputs to understand where improvements can be made.
Push boundaries: Challenge your assumptions about the model’s capabilities.

Understanding the Model

Think of it as a knowledgeable temp: Imagine the model as a highly informed temporary worker who needs specific guidance.
Provide context: Don’t hesitate to give more background information than you think is necessary.
Avoid forcing personas: Let the model’s natural capabilities shine instead of trying to make it play a specific role.

Effective Prompting Techniques

Use illustrative examples: Provide examples to clarify your task, but be mindful not to overwhelm the model.
Diversify your examples: Use instances that differ from the data the model will actually work with.
Mind your language: While good grammar and punctuation are helpful, they’re not strictly necessary for the model to understand you.
Consider the model as an imitator: Remember that the AI will attempt to mimic your writing style.
Leverage other models: Use different AI models to help craft your prompts.

Respecting the Model’s Nature

Treat it with respect: Approach the model as if it were an intelligent and capable entity.
Simulate the model’s perspective: Try to put yourself in the AI’s position to better understand its responses.
Be creative with concepts: Don’t shy away from introducing new ideas to convey your intentions to the model.
Explain as if to a layperson: Frame your prompts as if you’re explaining the topic to an educated person unfamiliar with the subject.
Provide an « out »: Give the model a clear way to respond when it encounters unexpected inputs.
Externalize your thinking: Try to transfer your thought process into the prompt for the model to follow.

By incorporating these tips into your prompt engineering practice, you can significantly improve your interactions with AI language models. Remember that the effectiveness of these strategies may vary depending on the specific task and model you’re working with. Continuous experimentation and refinement of your approach will lead to the best results in prompt engineering.

Sources

Distilling Knowledge from Large LLMs: Fine-tuning Mistral with LoRA

Publié le 22 mars 2024 par loic

As large language models (LLMs) continue to advance, there is a growing need to distill their knowledge into smaller, more efficient models suitable for real-world applications. One promising approach is knowledge distillation via fine-tuning using techniques like LoRA (Low-Rank Adaptation). In this article, we’ll dive into best practices for fine-tuning the 7B parameter Mistral model with LoRA.

The LoRA Advantage

Traditional fine-tuning updates all the weights of a pre-trained LLM, which can be computationally expensive and data-hungry, especially for large models. LoRA circumvents this by injecting trainable rank decomposition matrices into the LLM layers, enabling efficient adaptation to new tasks without modifying the original model weights.

Compared to full fine-tuning, LoRA requires significantly less compute and data, making it well-suited for fine-tuning models like Mistral. It has been shown to match or even exceed the performance of full fine-tuning on various tasks while using orders of magnitude fewer trainable parameters.

Selecting the Optimal LoRA Rank

The LoRA rank (r) determines the number of trainable parameters and directly impacts the model’s capacity to capture task-specific knowledge. A higher rank allows the model to better approximate the ideal fine-tuned weights, potentially improving performance. However, it also increases memory requirements and the risk of overfitting.

For Mistral, common ranks used are r=64 or r=128, though some have experimented with higher values like r=256 which can finetune around 8% of the model’s parameters. The optimal rank depends on the complexity of the task and dataset size – simple tasks may work well with lower r, while more complex ones may benefit from higher r.

Dataset Size and Quality

While LoRA is data-efficient compared to full fine-tuning, having sufficient high-quality training data is still crucial for achieving good performance. For a 7B model like Mistral, researchers recommend at least 50,000 examples for reasonable results, with 100,000+ examples often yielding better performance.

However, even smaller datasets of 1,000 – 10,000 carefully curated examples can be effective when using LoRA, outperforming full fine-tuning which requires much more data. Data quality and relevance to the target task are more important than sheer quantity – high-quality, curated datasets can outperform larger, noisier ones.

Using too little data (e.g. less than 1,000 examples) may lead to overfitting or poor performance. For very large datasets (>1M examples), full fine-tuning may be more effective than LoRA, depending on available compute resources.

Putting it All Together

So, what are the best practices for fine-tuning Mistral with LoRA? Based on current research, a good starting point could be:

LoRA rank (r) = 128
10,000 – 100,000 high-quality, task-relevant examples

During training, it’s essential to monitor performance on a held-out validation set to select the best checkpoint and avoid overfitting. Additionally, increasing the LoRA alpha (lora_alpha) can help counteract a lower rank but may introduce instability.

Distillation Approaches

Beyond LoRA, researchers have explored various distillation approaches for transferring knowledge from large LLMs to smaller models:

Reverse KL Divergence: Replacing the standard forward KL divergence loss with reverse KL can prevent the student model from overestimating low-probability regions of the teacher LLM’s distribution, making it more suitable for generative tasks.
Multi-Task Learning with Rationales: Training the student on two tasks – label prediction and rationale generation, where rationales are intermediate reasoning steps extracted from the LLM teacher. This creates an explicit connection between inputs and outputs.
Data Augmentation: Leveraging data augmentation to generate context-rich, skill-specific training data from the LLM teacher. This helps the student model approximate the teacher’s contextual abilities and ethical alignment.

The Future of LLM Distillation

As LLMs continue to grow in size and capability, techniques like LoRA and knowledge distillation will become increasingly important for making these models accessible and deployable across a wide range of applications.

By following best practices, leveraging the latest research, and adhering to legal and ethical considerations when working with LLM outputs, practitioners can effectively distill the knowledge from large models like Mistral into smaller, more efficient models tailored to their specific needs.

The possibilities for LLM distillation are vast, paving the way for a future where the power of large language models is available to everyone, regardless of computational resources.

Revolutionizing AI Efficiency: How Microsoft’s LLMLingua-2 is Changing the Game with 8x Less Memory

Publié le 21 mars 2024 par loic

LLMLingua-2 is a novel compression technology developed by Microsoft Research, achieving state-of-the-art results with 8 times less GPU memory on tasks typically handled by models like GPT-4.
It introduces innovative approaches such as « Data Distillation, » « Bidirectional Token Classification, » and optimized compression objectives to efficiently compress prompts without losing key information.
The technology has shown superior performance across various language tasks and demonstrated remarkable generalization across different LLMs and languages, from GPT-3.5 to Mistral-7B and from English to Chinese.
Compared to existing prompt compression methods, LLMLingua-2 is 3 to 6 times faster, accelerates end-to-end inference by 1.6 to 2.9 times, and significantly reduces GPU memory usage by a factor of 8.
This advancement represents a significant step forward in making language AI more practical and scalable for real-world applications, demonstrating Microsoft Research’s leadership in the field.

https://arxiv.org/pdf/2403.12968.pdf

sample

https://huggingface.co/microsoft/llmlingua-2-bert-base-multilingual-cased-meetingbank

Unified Time Series Model

Publié le 18 mars 2024 par loic

UniTS is a unified time series model that can process various tasks across multiple domains with shared parameters and does not have any task-specific modules.

Foundation models, especially LLMs, are profoundly transforming deep learning. Instead of training many task-specific models, we can adapt a single pretrained model to many tasks via few-shot prompting or fine-tuning. However, current foundation models apply to sequence data but not to time series, which present unique challenges due to the inherent diverse and multi-domain time series datasets, diverging task specifications across forecasting, classification and other types of tasks, and the apparent need for task-specialized models.

We developed UniTS, a unified time series model that supports a universal task specification, accommodating classification, forecasting, imputation, and anomaly detection tasks. This is achieved through a novel unified network backbone, which incorporates sequence and variable attention along with a dynamic linear operator and is trained as a unified model.

Across 38 multi-domain datasets, UniTS demonstrates superior performance compared to task-specific models and repurposed natural language-based LLMs. UniTS exhibits remarkable zero-shot, few-shot, and prompt learning capabilities when evaluated on new data domains and tasks. We will release the source code and datasets.

https://arxiv.org/pdf/2403.00131v1.pdf

https://zitniklab.hms.harvard.edu/projects/UniTS/

https://github.com/mims-harvard/UniTS