User-Centric RAG

Transforming RAG with LlamaIndex Multi-Agent System and Qdrant

Retrieval-Augmented Generation (RAG) has evolved significantly over time. Early RAG systems faced numerous limitations, but more sophisticated RAG applications have since emerged. Techniques such as Self-RAG, Hybrid Search RAG, experimentation with different prompting and chunking strategies, and Agentic RAG have addressed many of those initial limitations.

https://medium.com/@pavannagula76/user-centric-rag-transforming-rag-with-llamaindex-multi-agent-system-and-qdrant-cf3c32cfe6f3

PDF-Extract-Kit

PDF-Extract-Kit is a comprehensive toolkit for high-quality PDF content extraction, including layout detection, formula detection, formula recognition, and OCR.

PDF documents contain a wealth of knowledge, yet extracting high-quality content from PDFs is not an easy task. To address this, we have broken down the task of PDF content extraction into several components:

  • Layout Detection: Using the LayoutLMv3 model for region detection, such as images, tables, titles, text, etc.;
  • Formula Detection: Using YOLOv8 for detecting formulas, including inline formulas and isolated formulas;
  • Formula Recognition: Using UniMERNet for formula recognition;
  • Table Recognition: Using StructEqTable for table recognition;
  • Optical Character Recognition: Using PaddleOCR for text recognition.
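
The component breakdown above suggests a dispatch pattern: layout detection yields typed regions, and each region is handed to the recognizer for its type. The following is a minimal sketch of that idea only, not PDF-Extract-Kit's actual API. The names Region, run_pipeline, and the placeholder recognizers are hypothetical, and the lambdas stand in for the real models (UniMERNet, StructEqTable, PaddleOCR).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Region:
    """A detected page region (hypothetical type, for illustration only)."""
    kind: str                          # e.g. "title", "text", "table", "formula", "image"
    bbox: Tuple[int, int, int, int]    # (x0, y0, x1, y1) in page coordinates
    content: str = ""                  # filled in by a recognizer


Recognizer = Callable[[Region], str]


def run_pipeline(regions: List[Region],
                 recognizers: Dict[str, Recognizer]) -> List[Region]:
    """Send each detected region to the recognizer registered for its kind.

    Regions with no registered recognizer (e.g. images) keep empty content.
    The input list is not modified; new Region objects are returned.
    """
    results = []
    for region in regions:
        recognizer = recognizers.get(region.kind)
        content = recognizer(region) if recognizer else ""
        results.append(Region(region.kind, region.bbox, content))
    return results


# Placeholder recognizers standing in for the real models (assumption).
recognizers: Dict[str, Recognizer] = {
    "formula": lambda r: "<latex>",     # formula recognition step
    "table": lambda r: "<table>",       # table recognition step
    "text": lambda r: "<ocr text>",     # OCR step
    "title": lambda r: "<ocr text>",    # OCR step
}

# Pretend output of the layout detection step (assumption).
detected = [
    Region("title", (0, 0, 100, 20)),
    Region("formula", (0, 30, 100, 50)),
    Region("image", (0, 60, 100, 120)),
]

extracted = run_pipeline(detected, recognizers)
for region in extracted:
    print(region.kind, region.bbox, repr(region.content))
```

Keeping the recognizers in a dictionary keyed by region kind means a model can be swapped or a new region type added without touching the dispatch loop.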

https://github.com/opendatalab/PDF-Extract-Kit

https://www.perplexity.ai/search/look-at-this-github-https-gith-8ZVtYO.2SA6_q5Vg.VXy.g