How It Works
Tchunky processes your documents through several stages to create LLM-optimized chunks with rich context.
1. Document Analysis
When you submit a document, Tchunky first analyzes it to extract key metadata. This includes:
- Document title and citation detection
- Category classification (e.g., book, report, email thread)
- Length analysis and token estimation
- Generation of a concise summary using Qwen 2.5-72B (the analysis model; see Technical Specifications)
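The metadata this stage produces can be pictured as a simple record. The sketch below is illustrative only; the field names are assumptions, not Tchunky's actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DocumentAnalysis:
    """Hypothetical shape of the stage-1 analysis output."""
    title: str                 # detected document title
    citation: Optional[str]    # detected citation, if any
    category: str              # e.g. "book", "report", "email_thread"
    token_estimate: int        # rough token count for the full text
    summary: str               # concise LLM-generated summary

analysis = DocumentAnalysis(
    title="Quarterly Report Q3",
    citation=None,
    category="report",
    token_estimate=12_400,
    summary="A review of Q3 revenue and headcount changes.",
)
```

Every downstream stage can then carry this record alongside the document text.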
2. Section Detection
Next, Tchunky identifies logical sections using advanced language models. The process involves:
- Semantic boundary detection using transformer models
- Hierarchical structure preservation
- Context window optimization (typically 8k-32k tokens)
- Section-level metadata extraction
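In spirit, sectioning means finding semantic boundaries while keeping each section inside a token budget. Tchunky uses language models for the boundary detection itself; the sketch below substitutes a crude heading-based heuristic just to show the budgeted-grouping idea:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def split_into_sections(lines: list[str], max_tokens: int = 8_000) -> list[list[str]]:
    """Group lines into sections, starting a new section at each heading
    or whenever the running section would exceed the token budget."""
    sections: list[list[str]] = []
    current: list[str] = []
    budget = 0
    for line in lines:
        tokens = estimate_tokens(line)
        is_boundary = line.startswith("#")  # stand-in for model-detected boundaries
        if current and (is_boundary or budget + tokens > max_tokens):
            sections.append(current)
            current, budget = [], 0
        current.append(line)
        budget += tokens
    if current:
        sections.append(current)
    return sections
```

A transformer-based detector would replace `is_boundary` with a semantic decision, but the budget check and hierarchical grouping work the same way.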
3. Chunk Generation
Finally, each section is split into smaller chunks optimized for LLM context windows. Each chunk is enhanced with:
- Hierarchical context (document → section → chunk)
- Semantic keywords for improved retrieval
- Temporal references and timeframes
- Source citations and references
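Putting those enhancements together, a single enriched chunk might look like the JSON below. This is a hypothetical example; the actual schema is defined by Tchunky's output format:

```python
import json

chunk = {
    "context": {  # hierarchical context: document -> section -> chunk
        "document": "Quarterly Report Q3",
        "section": "Revenue Overview",
        "chunk_index": 2,
    },
    "text": "Revenue grew 12% quarter over quarter, driven by new enterprise deals.",
    "keywords": ["revenue", "growth", "Q3"],      # semantic keywords for retrieval
    "timeframe": "2024-Q3",                        # temporal reference
    "citations": ["Quarterly Report Q3, p. 4"],    # source citations
}
print(json.dumps(chunk, indent=2))
```

Because the context travels with the chunk, a retrieval system can show an LLM where the passage sits in the original document.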
Technical Specifications
- Analysis model: Qwen 2.5-72B
- Sectioning and chunk-context model: Claude 3.5 Sonnet
- Chunk size: approximately 500 tokens
- Processing: asynchronous with webhook notifications
- Output format: JSON with rich metadata
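Since processing is asynchronous, a client submits a document and later receives the results at a webhook endpoint it controls. A minimal receiver sketch using Python's standard library follows; the payload shape (`document_id`, `chunks`) and port are assumptions, not Tchunky's documented contract:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class TchunkyWebhookHandler(BaseHTTPRequestHandler):
    """Accepts POSTed JSON results and acknowledges with 200 OK."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        # Hypothetical payload: {"document_id": "...", "chunks": [...]}
        chunks = payload.get("chunks", [])
        print(f"received {len(chunks)} chunks for {payload.get('document_id')}")
        self.send_response(200)
        self.end_headers()

# To run the receiver:
# HTTPServer(("", 8080), TchunkyWebhookHandler).serve_forever()
```

Returning 200 promptly matters with webhook senders in general, since many retry deliveries that do not get a success response.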