Meet Chonkie, a lightweight Python library for text chunking in RAG (Retrieval-Augmented Generation) applications. It pairs a simple API with fast, low-overhead chunking, which makes it a handy tool for AI developers[3].
Key Features
Core Capabilities
- A broad set of chunking methods, from fixed-size token splits to semantic chunking
- Fast chunking with a small default install footprint
- Broad tokenizer support, so you can chunk with the tokenizer your model already uses[3]
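Tokenizer-agnostic support usually comes down to duck typing: the chunker only needs something that turns text into token ids. The sketch below illustrates that idea under stated assumptions. `SupportsEncode`, `count_tokens`, `WhitespaceTokenizer`, and `char_tokenizer` are hypothetical names for illustration, not Chonkie's API, and the two toy tokenizers stand in for real ones such as GPT-2's.

```python
from typing import Callable, Dict, List, Protocol, Union


class SupportsEncode(Protocol):
    """Any tokenizer object exposing encode(text) -> list of token ids."""

    def encode(self, text: str) -> List[int]: ...


# Hypothetical alias: an encode()-style object or a plain callable.
TokenizerLike = Union[SupportsEncode, Callable[[str], List[int]]]


def count_tokens(text: str, tokenizer: TokenizerLike) -> int:
    """Count tokens using whichever tokenizer the caller passes in."""
    if hasattr(tokenizer, "encode"):
        return len(tokenizer.encode(text))  # object with an encode() method
    return len(tokenizer(text))  # bare callable


class WhitespaceTokenizer:
    """Toy tokenizer object: one token id per whitespace-separated word."""

    def __init__(self) -> None:
        self.vocab: Dict[str, int] = {}

    def encode(self, text: str) -> List[int]:
        # New words get the next free id; repeated words reuse their id.
        return [self.vocab.setdefault(w, len(self.vocab)) for w in text.split()]


def char_tokenizer(text: str) -> List[int]:
    """Toy callable tokenizer: one token id (the code point) per character."""
    return [ord(c) for c in text]


print(count_tokens("Woah! Chonkie is cool", WhitespaceTokenizer()))  # 4
print(count_tokens("Woah!", char_tokenizer))  # 5
```

The counting logic never inspects the tokenizer's internals, which is what lets one chunker work with many tokenizer families.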
Chunking Methods
The library offers multiple specialized chunkers:
- TokenChunker for fixed-size token splits
- WordChunker for word-based divisions
- SentenceChunker for sentence-level processing
- RecursiveChunker for hierarchical text splitting
- SemanticChunker for similarity-based chunking
- SDPMChunker utilizing Semantic Double-Pass Merge[3]
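To make "similarity-based chunking" concrete, here is a minimal sketch of the general technique, not Chonkie's SemanticChunker: split the text into sentences, embed each one, and start a new chunk whenever the cosine similarity between neighbouring sentences falls below a threshold. As an assumption for a self-contained example, the "embedding" is a bag-of-words count vector instead of a neural model, and `semantic_chunks` and its `threshold` parameter are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def embed(sentence: str) -> Counter:
    """Toy 'embedding': bag-of-words counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def semantic_chunks(text: str, threshold: float = 0.2) -> List[str]:
    """Group consecutive sentences; open a new chunk when similarity drops."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(cur)) >= threshold:
            chunks[-1].append(cur)  # similar enough: keep in current chunk
        else:
            chunks.append([cur])  # topic shift: start a new chunk
    return [" ".join(c) for c in chunks]


text = "Cats purr softly. Cats sleep all day. Stocks fell sharply today. Stocks may rebound."
print(semantic_chunks(text))
```

With this input the sentences about cats and the sentences about stocks land in separate chunks. A double-pass approach such as SDPM would additionally revisit nearby chunks and merge similar ones; that second pass is not shown here.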
Implementation
Getting started with Chonkie is straightforward. Here’s a basic example:
```python
from chonkie import TokenChunker
from tokenizers import Tokenizer

# Initialize tokenizer
tokenizer = Tokenizer.from_pretrained("gpt2")

# Create chunker
chunker = TokenChunker(tokenizer)

# Process text
chunks = chunker("Woah! Chonkie, the chunking library is so cool!")

# Access results
for chunk in chunks:
    print(f"Chunk: {chunk.text}")
    print(f"Tokens: {chunk.token_count}")
```
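Conceptually, a token chunker slides a fixed-size window over the token sequence, optionally overlapping consecutive windows so context is not lost at chunk boundaries. The sketch below is an illustration of that idea, not Chonkie's implementation: it assumes whitespace-separated words as tokens (GPT-2's BPE would give different counts), and `token_window_chunks`, `chunk_size`, and `chunk_overlap` are hypothetical names. The `Chunk` dataclass only mirrors the `text` and `token_count` fields read in the example above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    """Mirrors the two fields the example above reads."""
    text: str
    token_count: int


def token_window_chunks(
    text: str, chunk_size: int = 4, chunk_overlap: int = 1
) -> List[Chunk]:
    """Fixed-size windows of tokens; neighbours share chunk_overlap tokens.

    Tokens are whitespace-separated words here, a stand-in for a real
    subword tokenizer.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    tokens = text.split()
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[Chunk] = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append(Chunk(" ".join(window), len(window)))
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


for chunk in token_window_chunks("Woah! Chonkie, the chunking library is so cool!"):
    print(f"Chunk: {chunk.text}")
    print(f"Tokens: {chunk.token_count}")
```

With 8 word tokens, a window of 4, and an overlap of 1, the windows start at tokens 0, 3, and 6, producing chunks of 4, 4, and 2 tokens.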
Performance Metrics
The project's README reports the following benchmark results:
- Default installation size: 11.2 MB
- Token chunking: 33x faster than the slowest alternative tested
- Sentence chunking: almost 2x faster than competing libraries
- Semantic chunking: up to 2.5x faster than competing libraries[3]
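Speedup figures like these are ratios of runtimes measured on the same input. If you want to check such ratios on your own corpus, a minimal harness might look like this. `best_time` and `speedup` are hypothetical helpers, and the two toy splitters merely stand in for whichever chunkers you want to compare; any results depend on your machine and data.

```python
import re
import time
from typing import Callable, List

Chunker = Callable[[str], List[str]]


def best_time(fn: Chunker, text: str, repeats: int = 5) -> float:
    """Best-of-N wall-clock time in seconds for a single call fn(text)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(text)
        best = min(best, time.perf_counter() - start)
    return max(best, 1e-9)  # guard against a zero reading on coarse clocks


def speedup(baseline: Chunker, candidate: Chunker, text: str) -> float:
    """baseline_time / candidate_time; a value of 33.0 would read as '33x'."""
    return best_time(baseline, text) / best_time(candidate, text)


def regex_words(text: str) -> List[str]:
    """Toy baseline: split on whitespace with a regular expression."""
    return re.findall(r"\S+", text)


def builtin_words(text: str) -> List[str]:
    """Toy candidate: split on whitespace with str.split()."""
    return text.split()


corpus = "Woah! Chonkie, the chunking library is so cool! " * 10_000
ratio = speedup(regex_words, builtin_words, corpus)
print(f"speedup of str.split over the regex split: {ratio:.1f}x")
```

Taking the best of several runs reduces noise from other processes, which matters when comparing tools whose runtimes differ by large factors.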
Installation Options
Chonkie can be installed in two ways:
```bash
# Minimal installation
pip install chonkie

# Full installation with all optional features
# (quotes keep shells such as zsh from expanding the brackets)
pip install "chonkie[all]"
```
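After installing, a quick way to confirm which version Python actually sees is to query the package metadata with the standard library. This is a small sketch; `installed_version` is a hypothetical helper name, and it only checks the base `chonkie` distribution, not which optional extras are present.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "chonkie") -> Optional[str]:
    """Return the installed version of package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version:
        print(f"chonkie {version} is installed")
    else:
        print("chonkie is not installed; run: pip install chonkie")
```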
See Also: Semantic Chunkers
Semantic Chunkers, from Aurelio Labs, is a multi-modal library for semantically chunking text, video, and audio, aiming to make AI and data-processing pipelines more efficient and accurate.
https://github.com/aurelio-labs/semantic-chunkers?tab=readme-ov-file
Sources
[1] Activity · chonkie-ai/chonkie https://github.com/chonkie-ai/chonkie/activity
[2] Activity · chonkie-ai/chonkie https://github.com/chonkie-ai/chonkie/activity
[3] chonkie/README.md at main · chonkie-ai/chonkie https://github.com/chonkie-ai/chonkie/blob/main/README.md