Deeplearning.fr

You have to learn the rules of the game. And then you have to play better than anyone else.

Published on October 22, 2023 by loic

Summary:

As NLP's real-world applications expand, they run into a practical hurdle.
Most NLP tasks assume clean, raw text data.
In practice, many documents, especially legal ones, come as visually structured files such as PDFs.
Visually Structured Documents (VSDs) make content extraction challenging.
The discussion focuses primarily on PDFs that carry a text layer.
Although extracting text from such PDFs is often considered a solved problem, they still pose challenges for NLP.
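The summary does not list the specific challenges, but one familiar example is that text pulled from a PDF's text layer keeps layout artifacts: words hyphenated across line breaks and hard line wraps inside paragraphs. The sketch below is illustrative only and is not the post's method. It assumes the raw text has already been extracted by some tool, and `clean_extracted_text` is a hypothetical helper whose heuristics are deliberately simple.

```python
import re


def clean_extracted_text(raw: str) -> str:
    """Hypothetical heuristic cleanup of text taken from a PDF text layer."""
    # Rejoin words hyphenated across a line break: "implemen-\ntation" -> "implementation".
    # Caveat: this also merges genuine compounds that happen to break at their hyphen.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Treat one or more blank lines as a paragraph boundary.
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = []
    for para in paragraphs:
        # Inside a paragraph, single line breaks come from layout wrapping, so join them.
        joined = " ".join(line.strip() for line in para.splitlines() if line.strip())
        # Collapse runs of spaces or tabs left over from column alignment.
        joined = re.sub(r"[ \t]+", " ", joined)
        if joined:
            cleaned.append(joined)
    return "\n\n".join(cleaned)


sample = "The parties agree to the implemen-\ntation of this\nagreement.\n\nArticle 2   applies."
print(clean_extracted_text(sample))
# The parties agree to the implementation of this agreement.
#
# Article 2 applies.
```

The hyphen rule shows why such PDFs stay tricky: a legitimate compound like "state-owned" split at its hyphen would be merged into "stateowned", and telling the two cases apart needs more than layout information.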

Published on October 22, 2023 by loic

The Multi-Document Agents guide explains how to set up an agent that can answer different types of questions over a larger set of documents.
Supported question types include QA over a specific document, QA comparing different documents, summaries of a specific document, and comparisons of summaries across documents.
The architecture sets up a "document agent" over each document, which can do QA and summarization within that document, plus a top-level agent over the set of document agents, which first retrieves the relevant tools and then applies chain-of-thought (CoT) reasoning over them to answer a question.
The guide provides code examples using the LlamaIndex and OpenAI libraries.
The document agent can dynamically choose to perform semantic search or summarization within a given document.
In the guide's example, each document covers a city, so a separate document agent is created for each city.
The top-level agent can then orchestrate across the different document agents to answer user queries that span several documents.
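The two-level architecture described above can be sketched in plain Python to show how the pieces fit together. This is a toy illustration under loud assumptions: `DocumentAgent` and `TopLevelAgent` are hypothetical names, not LlamaIndex classes, and no LLM is involved. Word overlap stands in for semantic search, "first sentence" stands in for summarization, matching document names stands in for tool retrieval, and querying every retrieved agent stands in for chain-of-thought over the tools.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


def _words(text: str) -> set[str]:
    # Lowercased word tokens, ignoring punctuation.
    return set(re.findall(r"\w+", text.lower()))


@dataclass
class DocumentAgent:
    """Hypothetical per-document agent: two tools over one document, no LLM."""
    name: str
    text: str

    def _sentences(self) -> list[str]:
        return [s.strip() for s in self.text.split(".") if s.strip()]

    def qa(self, question: str) -> str:
        # Stand-in for semantic search: the sentence sharing the most words with the question.
        q = _words(question)
        return max(self._sentences(), key=lambda s: len(q & _words(s)))

    def summarize(self) -> str:
        # Stand-in for summarization: the document's first sentence.
        return self._sentences()[0]

    def run(self, query: str) -> str:
        # Stand-in for the agent's dynamic choice between its two tools.
        if "summar" in query.lower():
            return self.summarize()
        return self.qa(query)


class TopLevelAgent:
    """Hypothetical top-level agent that treats each DocumentAgent as a tool."""

    def __init__(self, agents: list[DocumentAgent]) -> None:
        self.agents = agents

    def retrieve_tools(self, query: str) -> list[DocumentAgent]:
        # Stand-in for tool retrieval: keep agents whose (single-word) name appears in the query.
        q = _words(query)
        return [a for a in self.agents if a.name.lower() in q]

    def run(self, query: str) -> dict[str, str]:
        # Stand-in for CoT over the retrieved tools: ask each one and collect the answers.
        return {a.name: a.run(query) for a in self.retrieve_tools(query)}


docs = [
    DocumentAgent("Toronto", "Toronto is the largest city in Canada. It lies on the shore of Lake Ontario."),
    DocumentAgent("Seattle", "Seattle is a city in the US state of Washington. It lies on Puget Sound."),
]
top = TopLevelAgent(docs)
print(top.run("Which lake does Toronto lie on?"))
# {'Toronto': 'It lies on the shore of Lake Ontario'}
print(top.run("Compare the summaries of Toronto and Seattle"))
# {'Toronto': 'Toronto is the largest city in Canada', 'Seattle': 'Seattle is a city in the US state of Washington'}
```

The point of the split is that each document agent only has to reason about its own document, while the top-level agent only has to decide which document agents to call. In the guide, each stand-in is replaced by LLM-driven components built with LlamaIndex and OpenAI, whose actual APIs are not reproduced here.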