Challenges of NLP in Dealing with Structured Documents: The Case of PDFs

Summary:

  • As NLP moves into more real-world applications, it runs into a practical hurdle: the format of the documents themselves.
  • Most NLP tasks assume clean, raw text data.
  • In practice, many documents, especially legal ones, are visually structured, like PDFs.
  • Such Visually Structured Documents (VSDs) make content extraction difficult.
  • The discussion primarily focuses on text-only layered PDFs.
  • These PDFs are often regarded as a solved problem, yet they still pose challenges for NLP.

https://blog.llamaindex.ai/mastering-pdfs-extracting-sections-headings-paragraphs-and-tables-with-cutting-edge-parser-faea18870125

RAG: Multi-Document Agents

  • The Multi-Document Agents guide explains how to set up an agent that can answer different types of questions over a larger set of documents.
  • The question types are QA over a specific document, QA comparing different documents, summaries of a specific document, and comparisons of summaries across documents.
  • The architecture sets up a "document agent" over each document, which handles QA and summarization within that document, plus a top-level agent over the set of document agents, which performs tool retrieval and then chain-of-thought (CoT) reasoning over the retrieved tools to answer a question.
  • The guide provides code examples using the LlamaIndex and OpenAI libraries.
  • The document agent can dynamically choose to perform semantic search or summarization within a given document.
  • In the guide's example, each document is an article about a city, and a separate document agent is created for each city.
  • The top-level agent can orchestrate across the different document agents to answer any user query.
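
The two-level architecture described above can be sketched in plain Python. This is a toy illustration and not the guide's code: it uses neither LlamaIndex nor OpenAI, and the names `DocumentAgent`, `TopLevelAgent`, and `tokenize` are hypothetical. Word overlap stands in for semantic search, the first sentence stands in for a summary, name matching stands in for tool retrieval, and the chain-of-thought step is reduced to collecting each agent's answer.

```python
from dataclasses import dataclass


def tokenize(text: str) -> set[str]:
    """Lowercase a string and return its set of alphanumeric words."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return set(cleaned.split())


@dataclass
class DocumentAgent:
    """Hypothetical per-document agent with two tools: search and summarize."""
    name: str
    text: str

    def sentences(self) -> list[str]:
        return [s.strip() for s in self.text.split(".") if s.strip()]

    def search(self, query: str) -> str:
        # Stand-in for semantic search: sentence with the most shared words.
        q = tokenize(query)
        return max(self.sentences(), key=lambda s: len(q & tokenize(s)))

    def summarize(self) -> str:
        # Stand-in for summarization: the first sentence of the document.
        return self.sentences()[0]

    def answer(self, query: str) -> str:
        # Choose a tool dynamically based on the wording of the query.
        if "summar" in query.lower():
            return self.summarize()
        return self.search(query)


class TopLevelAgent:
    """Hypothetical orchestrator over a set of document agents."""

    def __init__(self, agents: list[DocumentAgent]):
        self.agents = agents

    def retrieve(self, query: str) -> list[DocumentAgent]:
        # Stand-in for tool retrieval: keep agents whose name is in the query.
        q = tokenize(query)
        hits = [a for a in self.agents if tokenize(a.name) <= q]
        return hits or self.agents  # fall back to all agents if none match

    def answer(self, query: str) -> dict[str, str]:
        # Stand-in for the reasoning step: collect one answer per agent.
        return {a.name: a.answer(query) for a in self.retrieve(query)}


# Toy documents, one per city, mirroring the one-agent-per-document setup.
top = TopLevelAgent([
    DocumentAgent("Toronto", "Toronto is the capital of Ontario. "
                             "It hosts a large film festival every September."),
    DocumentAgent("Seattle", "Seattle is a seaport city in Washington. "
                             "It is home to the Space Needle."),
])

print(top.answer("Give me a summary of Toronto"))
print(top.answer("What is Seattle home to?"))
print(top.answer("Compare Toronto and Seattle"))
```

In a real system each stand-in would be replaced by an LLM-backed component; the point of the sketch is only the routing: a query first selects the relevant document agents, and each selected agent then decides between search and summarization.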

https://docs.llamaindex.ai/en/stable/examples/agent/multi_document_agents.html
