The Three Spheres Method: An Integrated Approach to AI-Driven Application Development

A Structured Methodology for the AI Era

In today’s application development landscape, a methodical, structured approach is essential for turning ideas into working products. The Three Spheres Method provides a complete framework that guides each stage of development, from the initial concept to the detailed technical implementation.

Sphere 1: Product Definition & Architectural Foundation

This first phase lays the foundation on which the whole project rests:

Required inputs:

  • Initial application concept
  • Target audience and the problem to solve
  • Business and technical constraints

Process:

  1. Clearly define the project’s main objective
  2. Create detailed user personas
  3. Develop a business pitch
  4. Establish the core business requirements
  5. Identify the key features and their objectives
  6. Sketch the overall technical architecture
  7. Define measurable sub-objectives

Outputs:

  • Product vision document
  • Business requirements specification
  • Preliminary architecture with feature mapping
  • Project success criteria

Sphere 2: UX Design & Feature Expansion

This phase develops the user experience and fleshes out each feature:

Required inputs:

  • Sphere 1 documents
  • Design references and brand constraints
  • Expected user behaviors

Process:

  1. Develop several design options for the application
  2. Design specifically for the identified personas
  3. Detail each feature with its objectives, relationships, and dependencies
  4. Specify the API needs of each feature
  5. Document the user experience workflows
  6. Create a component and interaction structure
  7. Detail the data and security requirements of each feature

Outputs:

  • Complete UI/UX documentation
  • Conceptual mockups or wireframes
  • Detailed feature specifications
  • Interaction models and design patterns
  • User flow documentation

Sphere 3: Technical Planning & Implementation Specifications

This final phase turns the vision into a concrete action plan:

Required inputs:

  • All documents from the previous spheres
  • Technical constraints and preferred technology stack
  • Resources available for development

Process:

  1. Define the detailed software architecture
  2. Establish the architectural patterns to use
  3. Specify the API routes and endpoints
  4. Design the database structure
  5. Break each feature down into granular tasks
  6. Specify which files to create or modify, and how
  7. Create a step-by-step implementation plan
  8. Document development best practices
  9. Establish the testing and deployment strategies

Outputs:

  • Complete technical specifications
  • Detailed API documentation
  • Actionable development plan
  • Prioritized task list for implementation
  • Documentation of the tech stack and data flow

Advantages of the Three Spheres Method

This structured approach offers several significant advantages:

  1. Full coverage of the development cycle, from initial design to technical implementation
  2. Balance between business vision, user experience, and technical feasibility
  3. Scalable structure suited to projects of any size
  4. Progressive documentation in which each phase feeds the next
  5. Proactive prevention of problems before coding begins
  6. Compatibility with different AI models such as Claude, GPT, O3 Mini High, or DeepSeek
  7. Clear communication among all project stakeholders

Implementation Tips

  • Stay in the design phase for as long as necessary before starting to code
  • Keep all documentation in Markdown for easy reference during development
  • For your first projects, let the AI suggest recommendations rather than being too directive
  • Treat these documents as living resources to refine throughout development
  • Use tools such as Cursor AI, Windsurf, or GitHub Copilot to implement the detailed plan
  • Systematically review each sphere before moving on to the next

The Three Spheres methodology is a complete approach that turns an initial idea into a detailed action plan, providing a clear structure while leaving enough flexibility to adapt to the specifics of each project.

AI Coding Assistant Rules for Windsurf and Cursor

These optimized rules will transform how Windsurf and Cursor AI work with your Python backend and Next.js frontend projects. By adding these configurations, you’ll get more accurate, consistent code suggestions that follow best practices and avoid common AI-generated code issues.

How to Implement in Windsurf

  1. Option 1 – File Method:
     • Create a file named .windsurfrules in your project’s root directory
     • Copy and paste the entire code block below into this file
     • Save the file
  2. Option 2 – Settings Method:
     • Open Windsurf AI
     • Navigate to Settings > Set Workspace AI Rules > Edit Rules
     • Paste the entire code block below
     • Save your settings

How to Implement in Cursor

  1. Option 1 – File Method:
     • Create a file named .cursorrules in your project’s root directory
     • Copy and paste the same code block below (it works for both platforms)
     • Save the file
  2. Option 2 – Settings Method:
     • Open Cursor AI
     • Click on your profile picture in the bottom left
     • Select “Settings”
     • Navigate to the “AI” section
     • Find “Custom Instructions” and click “Edit”
     • Paste the entire code block below
     • Click “Save”

After Implementation

  • Restart your AI coding assistant or reload your workspace
  • The AI will now follow these comprehensive rules in all your coding sessions
  • You should immediately notice more relevant, project-specific code suggestions

These rules will significantly improve how your AI coding assistant understands your project requirements, coding standards, and technical preferences. You’ll get more relevant suggestions, fewer hallucinated functions, and code that better integrates with your existing codebase.

# Cursor Rules and Workflow Guide

## Core Configuration

- **Version**: `v5`
- **Project Type**: `web_application`
- **Code Style**: `clean_and_maintainable`
- **Environment Support**: `dev`, `test`, `prod`

---

## Test-Driven Development (TDD) Rules

1. Write tests **first** before any production code.
2. Run tests before implementing new functionality.
3. Write the **minimal code** required to pass tests.
4. Refactor only after all tests pass.
5. Do not start new tasks until all tests are passing.
6. Place all tests in a dedicated `/tests` directory.
7. Explain why tests will initially fail before implementation.
8. Propose an implementation strategy before writing code.
9. Check for existing functionality before creating new features.

---

## Code Quality Standards

- Maximum file length: **300 lines** (split into modules if needed).
- Follow existing patterns and project structure.
- Write modular, reusable, and maintainable code.
- Implement proper error handling mechanisms.
- Use type hints and annotations where applicable.
- Add explanatory comments when necessary.
- Avoid code duplication; reuse existing functionality if possible.
- Prefer simple solutions over complex ones.
- Keep the codebase clean and organized.

---

## AI Assistant Behavior

1. Explain understanding of requirements before proceeding with tasks.
2. Ask clarifying questions when requirements are ambiguous or unclear.
3. Provide complete, working solutions for each task or bug fix.
4. Focus only on relevant areas of the codebase for each task.
5. Debug failing tests with clear explanations and reasoning.

---

## Things to Avoid

- Never generate incomplete or partial solutions unless explicitly requested.
- Never invent nonexistent functions, APIs, or libraries.
- Never ignore explicit requirements or provided contexts.
- Never overcomplicate simple tasks or solutions.
- Never overwrite `.env` files without explicit confirmation.

---

## Implementation Guidelines

### General Rules

1. Always check for existing code before creating new functionality.
2. Avoid major changes to patterns unless explicitly instructed or necessary for bug fixes.

### Environment-Specific Rules

- Mock data should only be used for **tests**, never for development or production environments.


### File Management

- Avoid placing scripts in files if they are intended to be run only once.


### Refactoring Rules

1. Refactor files exceeding **300 lines** to improve readability and maintainability.

### Bug Fixing Rules

1. Exhaust all options using existing patterns and technologies before introducing new ones.
2. If introducing a new pattern, remove outdated implementations to avoid duplicate logic.

---

## Workflow Best Practices

### Planning & Task Management

1. Use Markdown files (`PLANNING.md`, `TASK.md`) to manage project scope and tasks:
    - **PLANNING.md**: High-level vision, architecture, constraints, tech stack, tools, etc.
    - **TASK.md**: Tracks current tasks, backlog, milestones, and discovered issues during development.
2. Always update these files as the project progresses:
    - Mark completed tasks in `TASK.md`.
    - Add new sub-tasks or TODOs discovered during development.

### Code Structure & Modularity

1. Organize code into clearly separated modules grouped by feature or responsibility.
2. Use consistent naming conventions and file structures as described in `PLANNING.md`.
3. Never create a file longer than 500 lines of code; refactor into modules if necessary.

### Testing & Reliability

1. Create unit tests for all new features (functions, classes, routes, etc.).
2. Place all tests in a `/tests` folder mirroring the main app structure:
    - Include at least:
        - 1 test for expected use,
        - 1 edge case,
        - 1 failure case (to ensure proper error handling).
3. Mock external services (e.g., databases) in tests to avoid real-world interactions.

### Documentation & Explainability

1. Update `README.md` when adding features, changing dependencies, or modifying setup steps.
2. Write docstrings for every function using Google-style formatting:

```python
def example(param1: int) -> str:
    """
    Brief summary of the function.

    Args:
        param1 (int): Description of the parameter.

    Returns:
        str: Description of the return value.
    """
```

3. Add inline comments explaining non-obvious logic and reasoning behind decisions.

---

## Verification Rule

I am an AI coding assistant that strictly adheres to Test-Driven Development (TDD) principles and high code quality standards. I will:

1. Write tests **first** before any production code.
2. Place all tests in a dedicated `/tests` directory.
3. Explain why tests initially fail before implementation begins.
4. Write minimal production code to pass the tests.
5. Refactor while maintaining passing tests at all times.
6. Enforce a maximum file length of **300 lines** per file (or 500 lines if specified).
7. Check for existing functionality before writing new code or features.
8. Explain my understanding of requirements before starting implementation work.
9. Ask clarifying questions when requirements are ambiguous or unclear.
10. Propose implementation strategies before writing any production code.
11. Debug failing tests with clear reasoning and explanations provided step-by-step.

---

## Server Management Best Practices

1. Restart servers after making changes to test them properly (only when necessary).
2. Kill all related servers from previous testing sessions to avoid conflicts.

---

## Modular Prompting Process After Initial Prompt

When interacting with the AI assistant:

1. Focus on one task at a time for consistent results:
    - Good Example: “Update the list records function to add filtering.”
    - Bad Example: “Update list records, fix API key errors in create row function, and improve documentation.”
2. Always test after implementing every feature to catch bugs early:
    - Create unit tests covering:
        - Successful scenarios,
        - Edge cases,
        - Failure cases.

---


These rules combine best practices for Python backend and Next.js frontend development with your specific coding patterns, workflow preferences, and technical stack requirements. The configuration instructs Windsurf and Cursor to maintain clean, modular code that follows established patterns while avoiding common pitfalls in AI-assisted development.
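
As an illustration of the testing rule in the configuration above (one expected-use test, one edge case, one failure case), here is a minimal sketch in plain Python. The `filter_records` function and its `status` field are hypothetical, invented only for this example. The tests use bare `assert` statements, so they should run under pytest or on their own.

```python
"""Hypothetical example: three tests for a small record-filtering helper."""


def filter_records(records: list[dict], status: str) -> list[dict]:
    """Return the records whose "status" field equals `status`.

    Args:
        records (list[dict]): Records to filter.
        status (str): Status value to keep.

    Returns:
        list[dict]: Matching records, in their original order.

    Raises:
        ValueError: If `status` is empty.
    """
    if not status:
        raise ValueError("status must be a non-empty string")
    return [r for r in records if r.get("status") == status]


def test_expected_use():
    records = [{"id": 1, "status": "open"}, {"id": 2, "status": "closed"}]
    assert filter_records(records, "open") == [{"id": 1, "status": "open"}]


def test_edge_case_empty_input():
    assert filter_records([], "open") == []


def test_failure_case_empty_status():
    try:
        filter_records([{"id": 1, "status": "open"}], "")
    except ValueError:
        return  # expected: invalid input is rejected
    raise AssertionError("expected ValueError for an empty status")
```

Under the TDD rules, the three test functions would be written first and fail until `filter_records` exists.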

Loic Baconnier

See also https://github.com/bacoco/awesome-cursorrules (from PatrickJS/awesome-cursorrules)

Text Extract API: A Powerful Tool for Document Conversion and OCR

Converting documents to structured formats like Markdown or JSON can be challenging, especially when dealing with PDFs, images, or Office files. The Text Extract API offers a robust solution to this common problem, providing high-accuracy conversion with advanced features.

Key Features

Document Processing
The API excels at converting various document types to Markdown or JSON, handling complex elements like tables, numbers, and mathematical formulas with remarkable accuracy. It utilizes a combination of PyTorch-based OCR (EasyOCR) and Ollama for processing.

Privacy-First Architecture
All processing occurs locally within your environment, with no external cloud dependencies. The system ships with Docker Compose configurations, ensuring your sensitive data never leaves your control.

Advanced Processing Capabilities

  • OCR enhancement through LLM technology
  • PII (Personally Identifiable Information) removal
  • Distributed queue processing with Celery
  • Redis-based caching for OCR results
  • Flexible storage options including local filesystem, Google Drive, and AWS S3
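
To make the caching idea concrete, here is a minimal sketch of content-addressed caching for OCR results. It is not the project’s implementation: an in-memory dict stands in for Redis, and `OcrCache`, `run_ocr`, and the `ocr:` key prefix are assumptions made for illustration.

```python
import hashlib
from typing import Callable


class OcrCache:
    """Cache OCR output keyed by a hash of the file contents (a dict stands in for Redis)."""

    def __init__(self, run_ocr: Callable[[bytes], str]):
        self._run_ocr = run_ocr  # hypothetical OCR function: bytes -> text
        self._store: dict[str, str] = {}
        self.misses = 0

    @staticmethod
    def key_for(data: bytes) -> str:
        # Identical bytes always map to the same key, whatever the filename
        return "ocr:" + hashlib.sha256(data).hexdigest()

    def extract(self, data: bytes) -> str:
        key = self.key_for(data)
        if key not in self._store:
            # Cache miss: run the expensive OCR step once and remember the result
            self.misses += 1
            self._store[key] = self._run_ocr(data)
        return self._store[key]


if __name__ == "__main__":
    cache = OcrCache(run_ocr=lambda data: data.decode("utf-8").upper())
    print(cache.extract(b"invoice 42"))  # runs the fake OCR
    print(cache.extract(b"invoice 42"))  # served from the cache
    print("OCR runs:", cache.misses)
```

Keying on the file contents rather than the filename means a re-uploaded or renamed document does not trigger a second OCR pass.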

Technical Implementation

Core Components
The system is built using FastAPI for the API layer and Celery for handling asynchronous tasks. This architecture ensures efficient processing of multiple documents simultaneously while maintaining responsiveness.

Storage Options
The API supports multiple storage strategies:

  • Local filesystem with customizable paths
  • Google Drive integration
  • Amazon S3 compatibility

Getting Started

Prerequisites

  • Docker and Docker Compose for containerized deployment
  • Ollama for LLM processing
  • Python environment for local development

Installation

git clone https://github.com/CatchTheTornado/text-extract-api
cd text-extract-api
make install

Use Cases

Document Processing
Perfect for organizations needing to:

  • Convert legacy documents to modern formats
  • Extract structured data from PDFs
  • Process large volumes of documents efficiently
  • Remove sensitive information from documents

Integration Options

The API offers multiple integration methods:

  • RESTful API endpoints
  • Command-line interface
  • TypeScript client library
  • Custom storage profile configurations

Conclusion

Text Extract API represents a significant advancement in document processing technology, offering a self-hosted solution that combines accuracy with privacy. Whether you’re dealing with document conversion, data extraction, or PII removal, this tool provides the necessary capabilities while keeping your data secure and under your control.

Sources:

https://github.com/CatchTheTornado/text-extract-api

Building Your Private AI Stack: A 2024 Guide to Self-Hosted Solutions

Are you concerned about data privacy and AI costs? The latest self-hosted AI tools offer powerful alternatives to cloud services. Let’s explore how to build a complete private AI stack using open-source solutions.

Why Self-Host Your AI Stack?

Private AI deployment brings multiple benefits:

  • Complete data privacy and control
  • No per-token or API costs
  • Customizable to your needs
  • Independence from cloud providers

The Essential Components

Let’s break down the key players in a self-hosted AI stack and how they work together.

LlamaIndex: Your Data Foundation

Think of LlamaIndex as your data’s brain. It processes and organizes your information, making it readily available for AI applications. With support for over 160 data sources and lightning-fast response times, it’s the perfect foundation for private AI deployments.

Flowise: Your Visual AI Builder

Flowise transforms complex AI workflows into visual puzzles. Drag, drop, and connect components to create sophisticated AI applications without diving deep into code. It’s particularly powerful for:

  • Building RAG pipelines
  • Creating custom chatbots
  • Designing knowledge bases
  • Developing AI agents

Ollama: Your Model Runner

Running AI models locally has never been easier. Ollama manages your models like a skilled librarian, supporting popular options like:

  • Mistral
  • Llama 2
  • CodeLlama
  • And many others

OpenWebUI: Your Interface Layer

Think of OpenWebUI as your AI’s front desk. It provides:

  • Clean chat interfaces
  • Multi-user support
  • Custom pipeline configurations
  • Local data storage

n8n: Your Automation Hub

n8n connects everything together, automating workflows and integrating with your existing tools. With over 350 pre-built integrations, it’s the glue that holds your AI stack together.

Real-World Applications

Document Processing System

Imagine a system where documents flow seamlessly from upload to intelligent responses:

  1. Documents enter through OpenWebUI
  2. LlamaIndex processes and indexes them
  3. Flowise manages the RAG pipeline
  4. Ollama provides local inference
  5. n8n automates the entire workflow

Knowledge Management Solution

Create a private alternative to ChatGPT trained on your data:

  1. LlamaIndex manages your knowledge base
  2. Flowise designs the interaction flows
  3. OpenWebUI provides the interface
  4. Ollama serves responses locally
  5. n8n handles integrations

Making It Work Together

The magic happens when these tools collaborate:

LlamaIndex + Flowise:

  • Seamless data processing
  • Visual RAG pipeline creation
  • Efficient knowledge retrieval

Flowise + OpenWebUI:

  • User-friendly interfaces
  • Custom interaction flows
  • Real-time responses

n8n + Everything:

  • Automated workflows
  • System integrations
  • Process orchestration

Looking Ahead

The self-hosted AI landscape continues to evolve. These tools receive regular updates, adding features and improving performance. By building your stack now, you’re investing in a future of AI independence.

Final Thoughts

Building a private AI stack isn’t just about privacy or cost savings—it’s about taking control of your AI future. With these tools, you can create sophisticated AI solutions while keeping your data secure and your costs predictable.

Ready to start building your private AI stack? Begin with one component and gradually expand. The journey to AI independence starts with a single step.

Building a Complete Self-hosted AI Development Environment

Introduction

In today’s AI landscape, having a secure, efficient, and self-contained development environment is crucial. This guide presents a comprehensive solution that combines best-in-class open-source tools for AI development, all running locally on your infrastructure.

Key Components

  • Ollama: Run state-of-the-art language models locally
  • n8n: Create automated AI workflows
  • Qdrant: Vector database for semantic search
  • Unstructured: Advanced document processing
  • Argilla: Data labeling and validation
  • Opik: Model evaluation and monitoring
  • JupyterLab: Interactive development environment

Benefits

  • Complete data privacy and control
  • No cloud dependencies
  • Cost-effective solution
  • Customizable infrastructure
  • Seamless tool integration

Prerequisites

Hardware Requirements

  • CPU: 4+ cores recommended
  • RAM: 16GB minimum, 32GB recommended
  • Storage: 50GB+ free space
  • GPU: NVIDIA GPU with 8GB+ VRAM (optional)

Software Requirements

# Check Docker version
docker --version
# Should be 20.10.0 or higher

# Check Docker Compose version
docker compose version
# Should be 2.0.0 or higher

# Check Git version
git --version
# Should be 2.0.0 or higher
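
The same checks can be scripted. The sketch below runs the three commands above and compares the reported version with the stated minimum; `parse_version` and `check_tools` are names chosen for this example, and the parsing assumes each tool prints a version of the form X.Y.Z.

```python
import re
import subprocess

# Minimum versions listed above, keyed by the command that reports the version
MINIMUMS = {
    ("docker", "--version"): (20, 10, 0),
    ("docker", "compose", "version"): (2, 0, 0),
    ("git", "--version"): (2, 0, 0),
}


def parse_version(output: str) -> tuple[int, int, int]:
    """Extract the first X.Y.Z version number from a tool's version output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"no version found in {output!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def check_tools() -> dict[str, bool]:
    """Run each version command and report whether it meets its minimum."""
    results = {}
    for command, minimum in MINIMUMS.items():
        name = " ".join(command)
        try:
            completed = subprocess.run(
                command, capture_output=True, text=True, check=True
            )
            results[name] = parse_version(completed.stdout) >= minimum
        except (OSError, subprocess.CalledProcessError, ValueError):
            results[name] = False  # tool missing or output not understood
    return results


if __name__ == "__main__":
    for tool, ok in check_tools().items():
        print(f"{tool}: {'OK' if ok else 'missing or too old'}")
```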

System Preparation

# Create project directory
mkdir -p ai-development-environment
cd ai-development-environment

# Create required subdirectories
mkdir -p notebooks
mkdir -p shared
mkdir -p n8n/backup
mkdir -p data/documents
mkdir -p data/processed
mkdir -p data/vectors

Directory Structure

ai-development-environment/
├── docker-compose.yml
├── .env
├── notebooks/
│   ├── examples/
│   └── templates/
├── shared/
│   ├── documents/
│   └── processed/
├── n8n/
│   └── backup/
└── data/
    ├── documents/
    ├── processed/
    └── vectors/
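
The mkdir commands in System Preparation create only part of this tree (notebooks/examples and shared/documents, for instance, are not covered). A minimal Python sketch that creates every directory shown follows; `create_layout` is a name chosen for this example.

```python
from pathlib import Path

# Every directory from the tree above, relative to the project root
DIRECTORIES = [
    "notebooks/examples",
    "notebooks/templates",
    "shared/documents",
    "shared/processed",
    "n8n/backup",
    "data/documents",
    "data/processed",
    "data/vectors",
]


def create_layout(root: str) -> list[Path]:
    """Create the project directories under `root` and return their paths."""
    base = Path(root)
    created = []
    for relative in DIRECTORIES:
        path = base / relative
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_layout("ai-development-environment")` from the parent folder builds the whole layout, and running it again changes nothing.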

Configuration Files

Environment Variables (.env)

# Database Configuration
POSTGRES_USER=n8n
POSTGRES_PASSWORD=n8n
POSTGRES_DB=n8n

# n8n Security
N8N_ENCRYPTION_KEY=1234567890
N8N_USER_MANAGEMENT_JWT_SECRET=1234567890

# Service Configuration
JUPYTER_TOKEN=masterclass
ARGILLA_PASSWORD=masterclass

# Resource Limits
POSTGRES_MAX_CONNECTIONS=100
ELASTICSEARCH_HEAP_SIZE=1g
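
The passwords and keys above are placeholders. A minimal sketch for generating random replacements with Python’s standard `secrets` module follows; the 32-byte length and the list of variables to regenerate are assumptions, not requirements stated by n8n or the other services.

```python
import secrets

# Variables from the .env file above that should not keep their placeholder values
SECRET_VARIABLES = [
    "POSTGRES_PASSWORD",
    "N8N_ENCRYPTION_KEY",
    "N8N_USER_MANAGEMENT_JWT_SECRET",
    "JUPYTER_TOKEN",
    "ARGILLA_PASSWORD",
]


def generate_env_secrets(num_bytes: int = 32) -> dict[str, str]:
    """Return a random hex string for each secret variable."""
    return {name: secrets.token_hex(num_bytes) for name in SECRET_VARIABLES}


def render_env(values: dict[str, str]) -> str:
    """Format the values as KEY=value lines ready to paste into .env."""
    return "\n".join(f"{name}={value}" for name, value in values.items())


if __name__ == "__main__":
    print(render_env(generate_env_secrets()))
```

Generate the values once, before the first start, and keep them: services that have already stored data under a key or password will generally not accept a new one.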

Docker Compose Configuration

Create docker-compose.yml:

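version: '3.8'

# Shared definitions referenced below as *service-n8n, *service-ollama and *init-ollama.
# They are missing from the original listing; the values here are assumptions modeled on
# the n8n self-hosted AI starter kit and should be checked against your copy of it.
x-n8n: &service-n8n
  image: n8nio/n8n:latest
  networks: ['demo']
  environment:
    - DB_TYPE=postgresdb
    - DB_POSTGRESDB_HOST=postgres
    - DB_POSTGRESDB_USER=${POSTGRES_USER}
    - DB_POSTGRESDB_PASSWORD=${POSTGRES_PASSWORD}
    - N8N_ENCRYPTION_KEY
    - N8N_USER_MANAGEMENT_JWT_SECRET

x-ollama: &service-ollama
  image: ollama/ollama:latest
  container_name: ollama
  networks: ['demo']
  restart: unless-stopped
  ports:
    - 11434:11434
  volumes:
    - ollama_storage:/root/.ollama

x-init-ollama: &init-ollama
  image: ollama/ollama:latest
  container_name: ollama-pull-llama
  networks: ['demo']
  volumes:
    - ollama_storage:/root/.ollama
  entrypoint: /bin/sh
  command:
    - "-c"
    - "sleep 3; OLLAMA_HOST=ollama:11434 ollama pull llama3.1"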

volumes:
  n8n_storage:
    driver: local
  postgres_storage:
    driver: local
  ollama_storage:
    driver: local
  qdrant_storage:
    driver: local
  open-webui:
    driver: local
  jupyter_data:
    driver: local
  opik_data:
    driver: local
  elasticsearch_data:
    driver: local

networks:
  demo:
    driver: bridge
    ipam:
      config:
        - subnet: 172.28.0.0/16

services:
  jupyter:
    image: jupyter/datascience-notebook:lab-4.0.6
    networks: ['demo']
    ports:
      - "8888:8888"
    volumes:
      - jupyter_data:/home/jovyan
      - ./notebooks:/home/jovyan/work
      - ./shared:/home/jovyan/shared
    environment:
      - JUPYTER_ENABLE_LAB=yes
      - JUPYTER_TOKEN=${JUPYTER_TOKEN}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    command: start-notebook.py --NotebookApp.token='${JUPYTER_TOKEN}'
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8888/api"]
      interval: 30s
      timeout: 10s
      retries: 3

  unstructured:
    image: quay.io/unstructured-io/unstructured-api:latest
    networks: ['demo']
    ports:
      - "8000:8000"
    volumes:
      - ./shared:/home/unstructured/shared
    command: --port 8000 --host 0.0.0.0
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  opik:
    image: comet/opik:latest
    networks: ['demo']
    ports:
      - "5173:5173"
    volumes:
      - opik_data:/root/opik
      - ./shared:/root/shared
    environment:
      - OPIK_BASE_URL=http://localhost:5173/api
    restart: unless-stopped

  argilla:
    image: argilla/argilla-server:latest
    networks: ['demo']
    ports:
      - "6900:6900"
    environment:
      - ARGILLA_ELASTICSEARCH=http://elasticsearch:9200
      - DEFAULT_USER_PASSWORD=${ARGILLA_PASSWORD}
    depends_on:
      elasticsearch:
        condition: service_healthy
    restart: unless-stopped

  elasticsearch:
    image: elasticsearch:8.11.0
    networks: ['demo']
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      - ES_JAVA_OPTS=-Xms512m -Xmx${ELASTICSEARCH_HEAP_SIZE}
    volumes:
      - elasticsearch_data:/usr/share/elasticsearch/data
    healthcheck:
      test: ["CMD-SHELL", "curl -s http://localhost:9200/_cluster/health | grep -vq '\"status\":\"red\"'"]
      interval: 20s
      timeout: 10s
      retries: 5

  # Database and vector store. The n8n services depend on `postgres`, and the Python
  # examples connect to `qdrant`, but neither is defined in the original listing.
  # These definitions are assumptions modeled on the n8n self-hosted AI starter kit.
  postgres:
    image: postgres:16-alpine
    networks: ['demo']
    restart: unless-stopped
    environment:
      - POSTGRES_USER
      - POSTGRES_PASSWORD
      - POSTGRES_DB
    volumes:
      - postgres_storage:/var/lib/postgresql/data
    healthcheck:
      test: ['CMD-SHELL', 'pg_isready -h localhost -U ${POSTGRES_USER} -d ${POSTGRES_DB}']
      interval: 5s
      timeout: 5s
      retries: 10

  qdrant:
    image: qdrant/qdrant
    container_name: qdrant
    networks: ['demo']
    restart: unless-stopped
    ports:
      - 6333:6333
    volumes:
      - qdrant_storage:/qdrant/storage

  # Workflow Automation
  n8n:
    <<: *service-n8n
    container_name: n8n
    restart: unless-stopped
    ports:
      - 5678:5678
    volumes:
      - n8n_storage:/home/node/.n8n
      - ./n8n/backup:/backup
      - ./shared:/data/shared
    depends_on:
      postgres:
        condition: service_healthy
      n8n-import:
        condition: service_completed_successfully

  n8n-import:
    <<: *service-n8n
    container_name: n8n-import
    entrypoint: /bin/sh
    command:
      - "-c"
      - "n8n import:credentials --separate --input=/backup/credentials && n8n import:workflow --separate --input=/backup/workflows"
    volumes:
      - ./n8n/backup:/backup
    depends_on:
      postgres:
        condition: service_healthy

  # Chat Interface
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    networks: ['demo']
    restart: unless-stopped
    container_name: open-webui
    ports:
      - "3000:8080"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      - open-webui:/app/backend/data

  # Language Models
  ollama-cpu:
    profiles: ["cpu"]
    <<: *service-ollama

  ollama-gpu:
    profiles: ["gpu-nvidia"]
    <<: *service-ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

  ollama-pull-llama-cpu:
    profiles: ["cpu"]
    <<: *init-ollama
    depends_on:
      - ollama-cpu

  ollama-pull-llama-gpu:
    profiles: ["gpu-nvidia"]
    <<: *init-ollama
    depends_on:
      - ollama-gpu

This completes the docker-compose.yml configuration, combining the original starter kit services with our additional AI development tools. The setup provides a complete environment for AI development, document processing, and workflow automation.

Service Integration Examples

Python Code Examples

Create a new notebook in JupyterLab with these integration examples:

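# Document Processing Pipeline
import uuid

import requests
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

# Unstructured API Integration
def process_document(file_path):
    """Send a file to the Unstructured API and return the parsed elements."""
    with open(file_path, 'rb') as f:
        response = requests.post(
            'http://unstructured:8000/general/v0/general',
            files={'files': f}
        )
    response.raise_for_status()
    return response.json()

# Ollama Integration
def query_llm(prompt):
    """Send a prompt to the local model and return the JSON reply."""
    response = requests.post(
        'http://ollama:11434/api/generate',
        # stream=False returns one JSON object instead of one JSON line per token
        json={'model': 'llama3.1', 'prompt': prompt, 'stream': False}
    )
    response.raise_for_status()
    return response.json()

# Qdrant Integration
def store_embeddings(vectors, metadata):
    """Upsert one point per vector; metadata is a list of payload dicts."""
    client = QdrantClient(host='qdrant', port=6333)
    # The "documents" collection must already exist with a matching vector size
    points = [
        PointStruct(id=str(uuid.uuid4()), vector=vector, payload=payload)
        for vector, payload in zip(vectors, metadata)
    ]
    client.upsert(collection_name="documents", points=points)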

AI Templates and Workflows

Document Processing Workflow

  1. Upload documents to shared directory
  2. Process with Unstructured API
  3. Generate embeddings with Ollama
  4. Store in Qdrant
  5. Query through n8n workflows
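
Between steps 2 and 3, the elements returned by the Unstructured API usually need to be grouped into chunks before embedding. The sketch below assumes each element is a dict with a "text" field, as in the Unstructured API’s JSON output; `build_chunks` and the 1,000-character default are illustrative choices, not part of any of the tools above.

```python
def build_chunks(elements: list[dict], max_chars: int = 1000) -> list[str]:
    """Group consecutive element texts into chunks of at most `max_chars` characters.

    An element longer than `max_chars` becomes its own oversized chunk
    rather than being split mid-sentence.
    """
    chunks: list[str] = []
    current = ""
    for element in elements:
        text = (element.get("text") or "").strip()
        if not text:
            continue  # skip elements without text, such as page breaks
        candidate = f"{current}\n{text}" if current else text
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The chunk is full: store it and start a new one with this element
            if current:
                chunks.append(current)
            current = text
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = [
        {"type": "Title", "text": "Quarterly report"},
        {"type": "NarrativeText", "text": "Revenue grew in the third quarter."},
        {"type": "PageBreak", "text": ""},
    ]
    print(build_chunks(sample, max_chars=40))
```

Each chunk can then be embedded and stored together with its source filename as payload.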

Docker Compose Profiles

The project uses different Docker Compose profiles to accommodate various hardware configurations:

For NVIDIA GPU Users

docker compose --profile gpu-nvidia pull
docker compose create && docker compose --profile gpu-nvidia up

This profile enables GPU acceleration for Ollama, providing faster inference times for language models[1].

For Apple Silicon (M1/M2)

docker compose pull
docker compose create && docker compose up

Since GPU access isn’t available in Docker on Apple Silicon, this profile runs without GPU specifications[1].

For CPU-only Systems

docker compose --profile cpu pull
docker compose create && docker compose --profile cpu up

This profile configures services to run on CPU only, suitable for systems without dedicated GPUs[1].
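
The choice among the three command sets can be automated. The helper below is a sketch, not part of the project: it treats the presence of `nvidia-smi` on the PATH as a sign of a usable NVIDIA GPU, and `compose_up_command` is a name invented here.

```python
import platform
import shutil


def compose_up_command(system: str, machine: str, has_nvidia: bool) -> list[str]:
    """Return the `docker compose ... up` command for the given hardware."""
    if has_nvidia:
        profile = "gpu-nvidia"
    elif system == "Darwin" and machine == "arm64":
        profile = None  # Apple Silicon: run without a profile, as described above
    else:
        profile = "cpu"
    command = ["docker", "compose"]
    if profile is not None:
        command += ["--profile", profile]
    return command + ["up"]


def detect_command() -> list[str]:
    """Inspect the current machine and pick the matching command."""
    return compose_up_command(
        system=platform.system(),
        machine=platform.machine(),
        # Heuristic: the NVIDIA driver tools are installed and on the PATH
        has_nvidia=shutil.which("nvidia-smi") is not None,
    )


if __name__ == "__main__":
    print(" ".join(detect_command()))
```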

Service Configurations

Core Services

  • n8n: Workflow automation platform with AI capabilities
  • Ollama: Local LLM service with configurable GPU/CPU profiles
  • Qdrant: Vector database for embeddings
  • PostgreSQL: Database backend for n8n
  • Open WebUI: Chat interface for model interaction

Additional Services

  • Unstructured: Document processing service
  • Argilla: Data labeling platform
  • Opik: Model evaluation tools
  • JupyterLab: Development environment

Volume Management

Each service has dedicated persistent storage:

  • n8n_storage
  • postgres_storage
  • ollama_storage
  • qdrant_storage
  • elasticsearch_data
  • jupyter_data

Networking

All services communicate through a shared ‘demo’ network, allowing internal service discovery and communication.

Browser Use Agent

Make websites accessible for AI agents 🤖.

Browser Use is the easiest way to connect your AI agents with the browser, and it’s free.

https://github.com/gregpr07/browser-use

import asyncio

from browser_use import Agent
from langchain_openai import ChatOpenAI


async def main():
    agent = Agent(
        task="Find a one-way flight from Bali to Oman on 12 January 2025 on Google Flights. Return me the cheapest option.",
        llm=ChatOpenAI(model="gpt-4o"),
    )
    # agent.run() is a coroutine, so it has to be awaited inside an async function
    result = await agent.run()
    print(result)


asyncio.run(main())

PROMPT++ Automatic prompt engineering

Your Ultimate AI Prompt Rewriting Assistant! 🤖✍️


Are you struggling to get the right responses from AI?
Say hello to Prompt++, the game-changing tool that’s revolutionizing how we interact with AI!

🔑 Key Features:
• FREE Intelligent Prompt Rewriting
• Real-Time Optimization
• Detailed Explanation of Improvements
• User-Friendly Interface

💡 How it works:
1. Input your original prompt
2. Watch it transform instantly
3. Understand the improvements
4. Learn and enhance your skills

🏆 Benefits:
• Better AI outputs
• Time-saving
• Educational
• Increased productivity

Whether you’re a seasoned AI user or just getting started, Prompt++ is your personal prompt engineering expert. It rewrites and optimizes your prompts, ensuring every AI interaction is as effective as possible.

https://baconnier-prompt-plus-plus.hf.space

Singular Value-Rotation Adaptation with Full Rank (SVRA-FR)

A Novel Approach for Efficient Fine-Tuning of Large Language Models

Abstract

We present Singular Value-Rotation Adaptation with Full Rank (SVRA-FR), a novel method for efficient fine-tuning of large language models. SVRA-FR leverages the full singular value decomposition (SVD) of weight matrices, allowing for comprehensive adjustments through singular value modification and singular vector rotation. This approach offers a parameter-efficient, interpretable, and potentially more effective alternative to existing fine-tuning methods, particularly Low-Rank Adaptation (LoRA).

1. Introduction

Large language models have demonstrated remarkable performance across various natural language processing tasks. However, fine-tuning these models for specific tasks remains computationally expensive and often requires significant amounts of data. Recent work on parameter-efficient fine-tuning methods, such as LoRA, has shown promise in reducing these costs. Our work builds upon these approaches by introducing a method that directly manipulates the full SVD components of weight matrices.

2. Method

SVRA-FR consists of the following key components:

2.1 Singular Value Decomposition

We begin by performing SVD on the original weight matrix W:

W = UΣV^T

where U and V are orthogonal matrices containing left and right singular vectors, respectively, and Σ is a diagonal matrix of singular values. This decomposition allows us to represent the weight matrix in terms of its principal components, with singular values indicating the importance of each component.

2.2 Trainable Parameters

SVRA-FR introduces three sets of trainable parameters:

a) Δσ: A vector for adjusting all singular values
b) θ_U: A vector for rotating all left singular vectors
c) θ_V: A vector for rotating all right singular vectors

These parameters allow for fine-grained control over the matrix’s structure and information content.

2.3 Singular Value Adjustment

We modify all singular values:

σ’_i = σ_i + Δσ_i

This adjustment allows us to amplify or attenuate the importance of different components in the weight matrix. By modifying singular values, we can control the “strength” of different features or directions in the weight space.
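The adjustment is an element-wise addition, which can be sketched as follows. The helper name adjust_singular_values and the sample values are assumptions made for illustration. The text does not say whether adjusted values may become negative, so this sketch applies no clamping.

```python
import numpy as np

def adjust_singular_values(s, delta_sigma):
    """Return sigma' = sigma + delta_sigma, element by element."""
    s = np.asarray(s, dtype=float)
    delta_sigma = np.asarray(delta_sigma, dtype=float)
    if s.shape != delta_sigma.shape:
        raise ValueError("delta_sigma must have one entry per singular value")
    # Assumption: no constraint keeps the result non-negative.
    return s + delta_sigma

s = np.array([3.0, 2.0, 0.5])
delta_sigma = np.array([0.5, 0.0, -0.25])   # amplify, keep, attenuate
s_adjusted = adjust_singular_values(s, delta_sigma)
```

Here the first component is amplified from 3.0 to 3.5, the second is left unchanged, and the third is attenuated from 0.5 to 0.25.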

2.4 Singular Vector Rotation

We apply rotation to all left and right singular vectors:

u’_i = R(θ_{U,i}) u_i
v’_i = R(θ_{V,i}) v_i

where R(θ) is a 2D rotation matrix:

R(θ) = [cos(θ) -sin(θ); sin(θ) cos(θ)]

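Rotation of singular vectors allows us to adjust the directions of the principal components in the weight space. This can be particularly useful for aligning the model’s features with task-specific requirements without drastically changing the overall structure of the weight matrix. Because R(θ) is a 2 × 2 matrix while the singular vectors have m or n entries, an implementation must also fix the plane or coordinate pairs in which each rotation acts.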
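The text does not specify how the 2 x 2 matrix R(θ) acts on a higher-dimensional singular vector. The sketch below adopts one possible reading, stated here as an assumption: the same rotation R(θ_i) is applied to each consecutive pair of coordinates of vector i, and a trailing unpaired coordinate is left unchanged. The names rotate_pairs and rotate_columns are hypothetical.

```python
import numpy as np

def rotate_pairs(vec, theta):
    """Rotate consecutive coordinate pairs of vec by angle theta (assumed reading)."""
    out = np.array(vec, dtype=float)          # copy, so the input is not modified
    even = (len(out) // 2) * 2                # number of coordinates that form full pairs
    x = out[0:even:2].copy()
    y = out[1:even:2].copy()
    c, s = np.cos(theta), np.sin(theta)
    out[0:even:2] = c * x - s * y             # first row of R(theta)
    out[1:even:2] = s * x + c * y             # second row of R(theta)
    return out

def rotate_columns(M, thetas):
    """Apply rotate_pairs to column i of M with angle thetas[i]."""
    cols = [rotate_pairs(M[:, i], thetas[i]) for i in range(M.shape[1])]
    return np.column_stack(cols)

v = rotate_pairs([1.0, 0.0, 2.0], np.pi / 2)  # third coordinate has no partner
```

Under this reading each vector keeps its length, but columns rotated by different angles are in general no longer mutually orthogonal. Other readings, such as rotating within the plane spanned by two singular vectors, would behave differently.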

2.5 Matrix Reconstruction

We reconstruct the adaptation matrix:

W_adapt = U’Σ’V’^T

where U’ and V’ contain the rotated singular vectors and Σ’ is the diagonal matrix of adjusted singular values. This reconstruction combines the effects of singular value adjustments and vector rotations into a single adaptation matrix.
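A minimal sketch of the reconstruction for a rectangular matrix follows. The function name reconstruct and the test matrix are assumptions. For brevity the example passes unrotated factors, so the two results can be checked against known values.

```python
import numpy as np

def reconstruct(U_rot, s_adj, V_rot):
    """Build W_adapt = U' Sigma' V'^T from (m x m) U', (n x n) V' and a vector of singular values."""
    m, n = U_rot.shape[0], V_rot.shape[0]
    k = len(s_adj)
    Sigma = np.zeros((m, n))
    Sigma[np.arange(k), np.arange(k)] = s_adj   # place adjusted values on the diagonal
    return U_rot @ Sigma @ V_rot.T

rng = np.random.default_rng(0)
W = rng.standard_normal((5, 3))
U, s, Vt = np.linalg.svd(W, full_matrices=True)

W_same = reconstruct(U, s, Vt.T)           # unchanged factors recover W
W_scaled = reconstruct(U, 2.0 * s, Vt.T)   # doubling every singular value doubles W
```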

2.6 Weight Update

The final weight update is applied additively:

W_new = W + αW_adapt

where α is a scaling factor. This additive update allows us to preserve the original pre-trained weights while incorporating task-specific adaptations.
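The update can be sketched as below. The name apply_update and the value α = 0.1 are assumptions. The example also illustrates a consequence of the formulas as written: with zero adjustments and zero rotation angles, W_adapt equals W, so the update rescales the original weights by (1 + α) rather than leaving them unchanged.

```python
import numpy as np

def apply_update(W, W_adapt, alpha):
    """Return W_new = W + alpha * W_adapt."""
    return W + alpha * W_adapt

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
U, s, Vt = np.linalg.svd(W, full_matrices=False)

# Zero delta-sigma and zero angles leave the SVD factors untouched,
# so the adaptation matrix is W itself.
W_adapt_init = (U * s) @ Vt
W_new = apply_update(W, W_adapt_init, alpha=0.1)
```

An implementation that wants the update to start at zero would need a different initialization or formulation; the text does not address this.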

3. Comparison with LoRA

SVRA-FR differs from LoRA in several key aspects:

3.1 Parameter Efficiency

For a weight matrix of size m × n, SVRA-FR introduces min(m, n) + m + n trainable parameters: min(m, n) for Δσ, m for θ_U, and n for θ_V. LoRA, which learns factors of size m × r and r × n, introduces r(m + n), where r is the LoRA rank. Since min(m, n) + m + n ≤ 1.5(m + n), SVRA-FR uses fewer trainable parameters than LoRA for any rank r ≥ 2. This efficiency stems from directly modifying the SVD components rather than introducing separate low-rank matrices.
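A worked count makes the comparison concrete. The sketch assumes the standard LoRA factorization, with one factor of shape m x r and one of shape r x n, which gives r(m + n) trainable parameters. The function names and the 4096 x 4096 layer with rank 8 are illustrative choices.

```python
def svra_fr_param_count(m, n):
    """min(m, n) singular-value offsets plus one angle per left and right singular vector."""
    return min(m, n) + m + n

def lora_param_count(m, n, r):
    """Entries of an (m x r) factor plus an (r x n) factor."""
    return r * (m + n)

m, n, r = 4096, 4096, 8
svra = svra_fr_param_count(m, n)   # 4096 + 4096 + 4096
lora = lora_param_count(m, n, r)   # 8 * 8192
full = m * n                       # full fine-tuning of the same matrix
```

For this layer the counts are 12,288 for SVRA-FR, 65,536 for LoRA at rank 8, and 16,777,216 for the full matrix.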

3.2 Full Rank Adaptation

Unlike LoRA, which uses low-rank matrices, SVRA-FR works with the full SVD, potentially allowing for more comprehensive adaptations. This full-rank approach enables adjustments across the entire weight space, which may be beneficial for tasks requiring fine-grained modifications.
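The rank difference can be checked numerically. In this sketch, which uses assumed sizes and a constant shift of 0.1 on every singular value, a LoRA-style product of two thin factors has rank at most r, while a matrix rebuilt from all adjusted singular values keeps the full rank min(m, n).

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 6, 5, 2

# LoRA-style update: product of an (m x r) and an (r x n) factor.
B = rng.standard_normal((m, r))
A = rng.standard_normal((r, n))
lora_update = B @ A

# SVD-based matrix: every singular value of W is shifted, none is dropped.
W = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(W, full_matrices=False)
svd_update = (U * (s + 0.1)) @ Vt

lora_rank = np.linalg.matrix_rank(lora_update)
svd_rank = np.linalg.matrix_rank(svd_update)
```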

3.3 Direct Manipulation of Matrix Structure

SVRA-FR directly modifies the singular values and vectors of the original matrix, potentially preserving more of the pre-trained structure. This direct manipulation allows for more interpretable changes and may lead to better preservation of the model’s original capabilities.

4. Advantages

  1. Parameter Efficiency: SVRA-FR introduces min(m, n) + m + n trainable parameters for an m × n matrix, far fewer than the mn entries of the matrix itself, enabling efficient fine-tuning even for very large models.
  2. Comprehensive Adaptation: By working with the full SVD, SVRA-FR allows for adjustments across the entire weight space, potentially capturing complex task-specific requirements.
  3. Interpretability: Changes to singular values and singular vector rotations have clear mathematical interpretations, providing insights into how the model adapts to new tasks.
  4. Preservation of Pre-trained Knowledge: By manipulating the existing SVD structure, SVRA-FR potentially preserves more of the pre-trained model’s knowledge while allowing for task-specific adaptations.
  5. Flexibility: The method allows for both global (singular value adjustments) and targeted (rotations) modifications to the weight matrices, providing a versatile approach to fine-tuning.

5. Potential Challenges

  1. Computational Cost: Computing the full SVD for large matrices can be computationally expensive during initialization. This could be mitigated by using approximate or iterative SVD algorithms.
  2. Optimization Complexity: Training rotations might require careful optimization strategies, as the parameter space for rotations can be more complex than standard linear transformations.
  3. Overfitting Risk: The flexibility of full-rank adaptation might lead to overfitting on smaller datasets. Regularization techniques specific to SVD components might need to be developed.
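As one example of the approximate SVD algorithms mentioned in the first challenge, the sketch below uses a randomized range finder. This is a swapped-in illustration, not a technique the text prescribes, and the function name and oversample parameter are assumptions. It returns only the top k components, so using it would trade the full decomposition for a truncated one.

```python
import numpy as np

def randomized_svd(W, k, oversample=10, seed=0):
    """Approximate the top-k SVD of W with a randomized range finder."""
    rng = np.random.default_rng(seed)
    m, n = W.shape
    ell = min(k + oversample, min(m, n))     # sketch width, capped at the matrix size
    Omega = rng.standard_normal((n, ell))    # random test matrix
    Q, _ = np.linalg.qr(W @ Omega)           # orthonormal basis for the sampled range of W
    B = Q.T @ W                              # small (ell x n) projection of W
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small                          # lift left singular vectors back to m dimensions
    return U[:, :k], s[:k], Vt[:k, :]

# Illustrative matrix of exact rank 3, so a rank-3 approximation should be near exact.
rng = np.random.default_rng(1)
W = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))
U, s, Vt = randomized_svd(W, k=3)
W_approx = (U * s) @ Vt
```

The expensive decomposition is performed on the small matrix B rather than on W, which is where the savings come from when k is much smaller than min(m, n).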

6. Discussion

SVRA-FR offers a novel approach to fine-tuning large language models by directly manipulating their SVD structure. This method combines the efficiency of parameter-efficient fine-tuning techniques with the comprehensiveness of full-rank adaptations. By allowing for targeted adjustments to singular values and rotations of singular vectors, SVRA-FR provides a flexible framework for adapting pre-trained models to specific tasks.

The full-rank nature of SVRA-FR is a key differentiator from methods like LoRA. While this could potentially lead to more comprehensive adaptations, it also raises questions about the trade-off between flexibility and the risk of overfitting. Empirical studies will be crucial to understand these trade-offs across various tasks and model sizes.

7. Future Work

Future research directions include:

  • Empirical evaluation of SVRA-FR across various NLP tasks and model sizes
  • Comparison with other parameter-efficient fine-tuning methods, including LoRA and adapter-based approaches
  • Investigation of fast SVD techniques to reduce initialization time
  • Exploration of regularization techniques specific to SVD components to mitigate potential overfitting
  • Analysis of the interplay between singular value adjustments and singular vector rotations
  • Development of visualization tools to interpret the changes made by SVRA-FR during fine-tuning

8. Conclusion

SVRA-FR represents a promising new direction in efficient fine-tuning of large language models. By leveraging the full SVD structure of weight matrices, it offers a parameter-efficient, interpretable, and flexible approach to model adaptation. While further empirical validation is needed, SVRA-FR has the potential to significantly improve the efficiency and effectiveness of fine-tuning large language models for specific tasks, particularly in scenarios where comprehensive adaptations are beneficial. The method’s ability to directly manipulate the core structure of weight matrices opens up new possibilities for understanding and controlling the adaptation process in deep learning models.

Sources: Loic Baconnier

Scalable. Interactive. Interpretable Data Science

Safe, interpretable, trustworthy AI, through interactive intelligent visualization, with applications in adversarial machine learning (protecting AI from harm and doing harm), scalable discoveries of deep learning models, and inclusive AI for everyone.

A Must.

https://poloclub.github.io/

Transformer Explainer

Learn How Transformer Models Work with Interactive Visualization

https://poloclub.github.io/transformer-explainer/

Diffusion Explainer

Learn how Stable Diffusion transforms your text prompt into an image.

https://poloclub.github.io/diffusion-explainer/