In deepeval, a metric serves as a standard of measurement for evaluating the performance of an LLM output against a specific criterion of interest. Essentially, the metric acts as the ruler, while a test case represents the thing you're trying to measure. deepeval offers a range of default metrics for you to quickly get started with, such as:
- G-Eval
- Summarization
- Faithfulness
- Answer Relevancy
- Contextual Relevancy
- Contextual Precision
- Contextual Recall
- Ragas
- Hallucination
- Toxicity
- Bias
deepeval also offers conversational metrics, which evaluate entire conversations rather than individual, granular LLM interactions. These include:
- Conversation Completeness
- Conversation Relevancy
- Knowledge Retention
https://docs.confident-ai.com/docs/metrics-introduction