RAG Techniques Notebook

This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and contextually rich responses.

https://github.com/NirDiamant/RAG_Techniques

https://github.com/NirDiamant/RAG_Techniques/tree/main/all_rag_techniques

Perplexica – An AI-powered search engine 

Perplexica is an open-source, AI-powered search engine that digs deep into the internet to find answers. Inspired by Perplexity AI, it not only searches the web but also understands your questions. It uses advanced machine learning techniques such as similarity search and embeddings to refine results and provides clear answers with cited sources.

Using SearxNG to stay current and fully open source, Perplexica ensures you always get the most up-to-date information without compromising your privacy.

https://github.com/ItzCrazyKns/Perplexica

RAG Foundry

RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation

RAG Foundry is a library designed to improve LLMs' ability to use external information by fine-tuning models on specially created RAG-augmented datasets. Given a RAG technique, the library helps create the training data, makes it easy to train models with parameter-efficient fine-tuning (PEFT), and helps users measure the improved performance using various RAG-specific metrics. The library is modular, and workflows are customizable via configuration files.

https://github.com/IntelLabs/RAGFoundry

DeepEval: evaluating the performance of an LLM

In deepeval, a metric serves as a standard of measurement for evaluating the performance of an LLM output against a specific criterion of interest. Essentially, the metric acts as the ruler, while a test case represents the thing you're trying to measure. deepeval offers a range of default metrics for you to get started with quickly, such as:

  • G-Eval
  • Summarization
  • Faithfulness
  • Answer Relevancy
  • Contextual Relevancy
  • Contextual Precision
  • Contextual Recall
  • Ragas
  • Hallucination
  • Toxicity
  • Bias
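
For example, a single-turn metric such as Answer Relevancy can be applied to a test case in a few lines. This is a minimal sketch: the threshold and the example strings are placeholders, and running the metric assumes an evaluation model is configured as described in the docs.

```python
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# The test case is the "thing being measured": the input, the LLM's output,
# and (for RAG metrics) the retrieval context.
test_case = LLMTestCase(
    input="What if these shoes don't fit?",
    actual_output="You can return them within 30 days for a full refund.",
    retrieval_context=["All customers may return unworn shoes within 30 days."],
)

# The metric is the "ruler": here, how relevant the answer is to the input.
metric = AnswerRelevancyMetric(threshold=0.7)  # placeholder threshold
evaluate([test_case], [metric])
```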

deepeval also offers conversational metrics, which are metrics used to evaluate conversations instead of individual, granular LLM interactions. These include:

  • Conversation Completeness
  • Conversation Relevancy
  • Knowledge Retention

https://docs.confident-ai.com/docs/metrics-introduction

BANKER++ Embedding for RAG

Fine-tuning an embedding model is a powerful technique for optimizing retrieval augmented generation (RAG) systems in finance. By training a smaller open-source embedding model like BAAI/bge-small-en on a domain-specific dataset, the model learns more meaningful vector representations that capture the nuances and semantics of financial language. This leads to significantly improved retrieval performance compared to using generic pre-trained embeddings.

https://huggingface.co/baconnier/Finance_embedding_large_en-V1.5

Fine-tuned financial embedding models, such as Banker++ RAG, demonstrate superior accuracy on tasks like semantic search, text similarity, and clustering. They enable RAG systems to better understand complex financial jargon and retrieve the most relevant information given a query.

Integrating these specialized embeddings is straightforward using libraries like LlamaIndex or Sentence-Transformers.
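
For instance, with Sentence-Transformers the model linked above can be loaded by its Hugging Face ID and used for semantic search over financial text. A minimal sketch follows; the model ID comes from the link above, and loading it directly as a SentenceTransformer (plus the toy documents and query) is an assumption for illustration.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("baconnier/Finance_embedding_large_en-V1.5")

docs = [
    "The issuer's credit spread widened after the downgrade.",
    "Quarterly revenue grew 12% driven by subscription sales.",
]
query = "How did the downgrade affect credit risk pricing?"

# Normalized embeddings so cosine similarity is a simple dot product.
doc_emb = model.encode(docs, convert_to_tensor=True, normalize_embeddings=True)
q_emb = model.encode(query, convert_to_tensor=True, normalize_embeddings=True)

scores = util.cos_sim(q_emb, doc_emb)[0]  # cosine similarity per document
best = int(scores.argmax())
print(docs[best], float(scores[best]))
```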

As the financial industry increasingly adopts AI, fine-tuned embedding models will play a crucial role in powering domain-specific NLP applications. From analyzing market sentiment to personalizing investment recommendations, these optimized embeddings unlock new possibilities for harnessing unstructured financial data. By combining the power of open-source models with the domain expertise embedded in financial corpora, fine-tuning paves the way for more intelligent and impactful RAG systems in finance.

BANKER++

Banker++ is trained to act like a Senior Banker.

I’m excited to share a resource that could potentially be a valuable addition to your financial toolkit. It’s a Financial LLM Model tailored for corporate entities and financial institutions.

In today’s complex financial landscape, having access to reliable analytical tools is crucial. The model is available at the following links:

https://huggingface.co/spaces/baconnier/Finance

https://huggingface.co/baconnier/Finance_dolphin-2.9.1-yi-1.5-9b

It offers a framework for assessing various financial scenarios with a level of precision and insight that could benefit your decision-making processes.

However, it’s important to note that this content is strictly for educational purposes and should not be construed as financial advice. Please exercise caution when applying any information provided.

While I’m not suggesting it’s a one-size-fits-all solution or a replacement for professional financial advice, it may offer valuable insights into areas such as risk management, investment strategies, and portfolio optimization.

To illustrate, let’s consider a common question in finance:

Question: What is CDS and how does it compare to a swap?

Answer:

  • CDS: Credit Default Swap (CDS) is a financial derivative contract between two parties (buyer and seller) for insurance against default or credit risk associated with a bond or loan. The protection buyer pays a premium to the protection seller in exchange for the right to receive payment if a credit event occurs. Typically, the protection seller is a financial institution, while the protection buyer can be an investor or a bank.
  • Swap: A swap is an agreement between two parties to exchange cash flows, typically involving interest payments or principal at a future date. Common types of swaps include interest rate swaps, currency swaps, and commodity swaps.

CDS differs from swaps as it focuses specifically on credit risk protection, providing insurance against default events. In contrast, swaps involve the exchange of cash flows, often related to interest rates, currencies, or commodities, without directly addressing credit risk.

Feel free to explore the model and see if it aligns with your organization’s needs and objectives. Remember to approach its use with caution and consider consulting with financial experts when making significant decisions.

As we navigate the complexities of the financial world together, let’s remain humble in our pursuit of knowledge and improvement.

AutoGluon: Time Series Forecasting

AutoGluon can forecast the future values of multiple time series given the historical data and other related covariates. A single call to AutoGluon TimeSeriesPredictor’s fit() method trains multiple models to generate accurate probabilistic forecasts, and does not require you to manually deal with cumbersome issues like model selection and hyperparameter tuning.
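
A minimal sketch of that workflow is below; the CSV path is a placeholder, and the data is assumed to be in long format with item_id, timestamp, and target columns.

```python
from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor

# Load long-format data (item_id, timestamp, target); the path is a placeholder.
train_data = TimeSeriesDataFrame.from_path("train.csv")

# A single fit() call trains and ensembles multiple forecasting models.
predictor = TimeSeriesPredictor(
    prediction_length=48,   # forecast horizon, in time steps
    target="target",
    eval_metric="MASE",
)
predictor.fit(train_data, presets="medium_quality", time_limit=600)

# Probabilistic forecasts (mean plus quantiles) for each series.
predictions = predictor.predict(train_data)
print(predictions.head())
```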

https://auto.gluon.ai/stable/tutorials/timeseries/index.html

https://raw.githubusercontent.com/Innixma/autogluon-doc-utils/main/docs/cheatsheets/stable/autogluon-cheat-sheet.jpeg

Efficient Training Techniques for Transformers on a Single GPU

When training Transformer models on a single GPU, it’s important to optimize for both speed and memory efficiency to make the most of limited resources. Here are some key parameters and techniques to consider:

Mixed Precision Training

  • Use fp16 or bf16 mixed precision to reduce memory usage while maintaining most of the fp32 precision. Set fp16=True or bf16=True in TrainingArguments.
  • torch.bfloat16 or torch.float16 can reduce memory usage by 2-4x with minimal impact on model quality, allowing training with larger batch sizes.
  • bfloat16 is more numerically stable than float16 and recommended if supported by your GPU (e.g., Nvidia A100).
  • Mixed precision speeds up math-heavy operations like linear layers and convolutions.

Optimizer Choice

  • AdamW is the most common optimizer but has high memory usage.
  • Alternatives selectable via the optim argument include adamw_hf, adamw_torch, adamw_apex_fused, adamw_anyprecision, and adafactor; adafactor in particular uses less memory.
  • Fused optimizers like adamw_apex_fused from Apex or adamw_anyprecision fuse multiple operations for faster speed.
  • Fused Adam combines the Adam update’s elementwise operations into a single kernel, reducing memory access.

Gradient Accumulation

  • Accumulate gradients over multiple smaller batches before doing an optimizer step to emulate training with larger batch sizes. Set gradient_accumulation_steps in TrainingArguments.
  • Allows training with large effective batch sizes that don’t fit in GPU memory.
  • In multi-GPU setups it can also increase throughput by reducing communication overhead, but very large effective batch sizes can slow convergence.
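
To illustrate the mechanics, here is a minimal manual-loop sketch (a Hugging Face-style model, a dataloader, and an optimizer are assumed to already exist); the Trainer does the equivalent internally when gradient_accumulation_steps is set.

```python
accumulation_steps = 8  # effective batch size = per-step batch size * 8

optimizer.zero_grad()
for step, (inputs, labels) in enumerate(dataloader):
    loss = model(inputs, labels=labels).loss
    (loss / accumulation_steps).backward()   # scale so gradients average correctly
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                     # one update per 8 micro-batches
        optimizer.zero_grad()
```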

Gradient Checkpointing

  • Trade off compute for memory by recomputing activations in the backward pass instead of storing them.
  • Speeds up training by allowing larger batch sizes but slows down each iteration.
  • Enable with gradient_checkpointing=True in TrainingArguments.

Efficient Data Loading

  • Use DataLoader(pin_memory=True) to speed up data transfer from CPU to GPU memory.
  • Set DataLoader(num_workers=N) to preload data with multiple worker processes.
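
A minimal sketch (the dataset object is assumed to already exist):

```python
from torch.utils.data import DataLoader

train_loader = DataLoader(
    dataset,            # assumed to be an existing torch Dataset
    batch_size=32,
    shuffle=True,
    pin_memory=True,    # page-locked host memory -> faster CPU-to-GPU copies
    num_workers=4,      # preload batches in background worker processes
)
```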

Offload to CPU

  • Offload the optimizer state and model parameters to CPU when not in use to free up GPU memory, e.g. via DeepSpeed ZeRO-Offload (offload_optimizer and offload_param in the DeepSpeed config).
  • Enables larger models and batch sizes than would otherwise fit, at the cost of slower steps due to CPU-GPU transfers.

TF32 on Ampere GPUs

  • Enable TF32 mode on Ampere or newer GPUs to automatically use the TF32 data type, which has the range of fp32 but precision of fp16. Set tf32=True in TrainingArguments.
  • Speeds up linear layers, convolutions, and matmuls with minimal accuracy impact.
  • Set torch.backends.cuda.matmul.allow_tf32 = True.

Flash Attention 2

  • Use Flash Attention 2 kernels integrated in the Transformers library to speed up attention computation and reduce memory usage.
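
A minimal sketch of enabling it at load time (the model ID is a placeholder; the flash-attn package must be installed and the GPU must support it):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",              # placeholder model ID
    torch_dtype=torch.bfloat16,               # FA2 requires fp16 or bf16
    attn_implementation="flash_attention_2",  # use Flash Attention 2 kernels
)
```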

The Trainer API in 🤗 Transformers supports most of these techniques via the TrainingArguments class. Experiment with combinations of these approaches to find the optimal tradeoff between speed, memory efficiency, and model quality for your specific model and hardware setup.
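
As a reference point, a TrainingArguments configuration combining several of the options above might look like the following. This is a hedged sketch: the values are illustrative placeholders rather than recommendations, and not every combination makes sense on every GPU.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,   # effective batch size = 4 * 8 = 32
    gradient_checkpointing=True,     # recompute activations to save memory
    bf16=True,                       # or fp16=True on GPUs without bf16 support
    tf32=True,                       # Ampere or newer GPUs only
    optim="adamw_torch_fused",       # fused AdamW; or "adafactor" to save memory
    dataloader_pin_memory=True,
    dataloader_num_workers=4,
    learning_rate=2e-5,
    num_train_epochs=3,
)
```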

Source: https://huggingface.co/docs/transformers/en/perf_train_gpu_one

TinyTimeMixers

TinyTimeMixers (TTMs) are compact pre-trained models for multivariate time-series forecasting, open-sourced by IBM Research. With fewer than 1 million parameters, TTM introduces the notion of the first-ever “tiny” pre-trained models for time-series forecasting.

https://huggingface.co/ibm/TTM

TTM outperforms several popular benchmarks that demand billions of parameters in zero-shot and few-shot forecasting. TTMs are lightweight forecasters, pre-trained on publicly available time-series data with various augmentations. TTM provides state-of-the-art zero-shot forecasts and can easily be fine-tuned for multivariate forecasting with just 5% of the training data while remaining competitive.

Distilling Knowledge from Large LLMs: Fine-tuning Mistral with LoRA

As large language models (LLMs) continue to advance, there is a growing need to distill their knowledge into smaller, more efficient models suitable for real-world applications. One promising approach is knowledge distillation via fine-tuning using techniques like LoRA (Low-Rank Adaptation). In this article, we’ll dive into best practices for fine-tuning the 7B parameter Mistral model with LoRA.

The LoRA Advantage

Traditional fine-tuning updates all the weights of a pre-trained LLM, which can be computationally expensive and data-hungry, especially for large models. LoRA circumvents this by injecting trainable rank decomposition matrices into the LLM layers, enabling efficient adaptation to new tasks without modifying the original model weights.

Compared to full fine-tuning, LoRA requires significantly less compute and data, making it well-suited for fine-tuning models like Mistral. It has been shown to match or even exceed the performance of full fine-tuning on various tasks while using orders of magnitude fewer trainable parameters.

Selecting the Optimal LoRA Rank

The LoRA rank (r) determines the number of trainable parameters and directly impacts the model’s capacity to capture task-specific knowledge. A higher rank allows the model to better approximate the ideal fine-tuned weights, potentially improving performance. However, it also increases memory requirements and the risk of overfitting.

For Mistral, common ranks are r=64 or r=128, though some practitioners have experimented with higher values like r=256, which makes roughly 8% of the model’s parameters trainable. The optimal rank depends on the complexity of the task and the dataset size – simple tasks may work well with a lower r, while more complex ones may benefit from a higher r.

Dataset Size and Quality

While LoRA is data-efficient compared to full fine-tuning, having sufficient high-quality training data is still crucial for achieving good performance. For a 7B model like Mistral, researchers recommend at least 50,000 examples for reasonable results, with 100,000+ examples often yielding better performance.

However, even smaller datasets of 1,000 – 10,000 carefully curated examples can be effective with LoRA, whereas full fine-tuning typically needs far more data. Data quality and relevance to the target task matter more than sheer quantity – high-quality, curated datasets can outperform larger, noisier ones.

Using too little data (e.g. less than 1,000 examples) may lead to overfitting or poor performance. For very large datasets (>1M examples), full fine-tuning may be more effective than LoRA, depending on available compute resources.

Putting it All Together

So, what are the best practices for fine-tuning Mistral with LoRA? Based on current research, a good starting point could be:

  • LoRA rank (r) = 128
  • 10,000 – 100,000 high-quality, task-relevant examples
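
A minimal PEFT sketch of this starting point is below; the target modules, lora_alpha, and dropout values are illustrative assumptions rather than prescriptions from the text.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=128,                         # LoRA rank, per the starting point above
    lora_alpha=256,                # scaling factor (illustrative choice)
    lora_dropout=0.05,             # illustrative choice
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # shows how small the trainable fraction is
```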

During training, it’s essential to monitor performance on a held-out validation set to select the best checkpoint and avoid overfitting. Additionally, increasing the LoRA alpha (lora_alpha) can help counteract a lower rank but may introduce instability.

Distillation Approaches

Beyond LoRA, researchers have explored various distillation approaches for transferring knowledge from large LLMs to smaller models:

  1. Reverse KL Divergence: Replacing the standard forward KL divergence loss with reverse KL can prevent the student model from overestimating low-probability regions of the teacher LLM’s distribution, making it more suitable for generative tasks (a minimal loss sketch follows this list).
  2. Multi-Task Learning with Rationales: Training the student on two tasks – label prediction and rationale generation, where rationales are intermediate reasoning steps extracted from the LLM teacher. This creates an explicit connection between inputs and outputs.
  3. Data Augmentation: Leveraging data augmentation to generate context-rich, skill-specific training data from the LLM teacher. This helps the student model approximate the teacher’s contextual abilities and ethical alignment.
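
For the first approach, here is a minimal sketch of a reverse KL distillation loss computed from student and teacher logits; the temperature and the function interface are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def reverse_kl_loss(student_logits, teacher_logits, temperature=1.0):
    """Reverse KL, i.e. KL(student || teacher), averaged over positions."""
    s_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    t_log_probs = F.log_softmax(teacher_logits / temperature, dim=-1)
    s_probs = s_log_probs.exp()
    # sum_x p_student(x) * (log p_student(x) - log p_teacher(x))
    return (s_probs * (s_log_probs - t_log_probs)).sum(dim=-1).mean()
```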

The Future of LLM Distillation

As LLMs continue to grow in size and capability, techniques like LoRA and knowledge distillation will become increasingly important for making these models accessible and deployable across a wide range of applications.

By following best practices, leveraging the latest research, and adhering to legal and ethical considerations when working with LLM outputs, practitioners can effectively distill the knowledge from large models like Mistral into smaller, more efficient models tailored to their specific needs.

The possibilities for LLM distillation are vast, paving the way for a future where the power of large language models is available to everyone, regardless of computational resources.