Learn More

How It Works

Tchunky processes your documents through several stages to create LLM-optimized chunks with rich context.

[Pipeline diagram: Document Analysis → Section Detection → Chunk Generation, annotated with per-stage outputs: title, citation, summary, category, length, logical parts, hierarchy, keywords, timeframes]

1. Document Analysis

When you submit a document, Tchunky first analyzes it to extract key metadata. This includes:

  • Document title and citation detection
  • Category classification (e.g., book, report, email thread)
  • Length analysis and token estimation
  • Generation of a concise summary using Claude 3.5 Sonnet
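The analysis stage above can be sketched in a few lines. The field names and the characters-per-token heuristic here are illustrative assumptions, not Tchunky's actual API schema:

```python
# Hypothetical shape of the metadata produced by document analysis.
# Field names are assumptions for illustration only.

def estimate_tokens(text: str) -> int:
    """Rough token estimate using the common ~4 characters/token heuristic."""
    return max(1, len(text) // 4)

def analyze(document: str) -> dict:
    return {
        "title": document.splitlines()[0].strip() if document else "",
        "category": "report",          # e.g. book, report, email thread
        "length_chars": len(document),
        "estimated_tokens": estimate_tokens(document),
        "summary": None,               # filled in by the language model
    }

meta = analyze("Q3 Sales Report\nRevenue grew 12% quarter over quarter.")
print(meta["title"], meta["estimated_tokens"])
```

In the real pipeline the summary and category come from the model; the sketch only shows where those fields would live in the output.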

2. Section Detection

Next, Tchunky identifies logical sections using Claude 3.5 Sonnet. The process involves:

  • Semantic boundary detection using transformer models
  • Hierarchical structure preservation
  • Context window optimization (typically 8k-32k tokens)
  • Section-level metadata extraction
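To make the output of this stage concrete, here is a deliberately naive stand-in that splits on Markdown-style headings and records each section's hierarchy level. Tchunky's actual boundary detection is model-based and semantic; this only illustrates a plausible section record (field names are assumptions):

```python
import re

# Naive heading-based splitter, standing in for semantic boundary detection.
# Each section keeps its heading, hierarchy level, and body lines.

def detect_sections(text: str) -> list[dict]:
    sections = []
    current = {"heading": None, "level": 0, "body": []}
    for line in text.splitlines():
        m = re.match(r"^(#{1,6})\s+(.*)", line)
        if m:
            # Close out the previous section before starting a new one.
            if current["heading"] is not None or current["body"]:
                sections.append(current)
            current = {"heading": m.group(2), "level": len(m.group(1)), "body": []}
        else:
            current["body"].append(line)
    sections.append(current)
    return sections

doc = "# Intro\nHello.\n## Details\nMore text."
for s in detect_sections(doc):
    print(s["level"], s["heading"])
```

The hierarchy levels recorded here are what lets the later chunking stage reconstruct the document → section → chunk context path.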

3. Chunk Generation

Finally, each section is split into smaller chunks optimized for LLM context windows. Each chunk is enhanced with:

  • Hierarchical context (document → section → chunk)
  • Semantic keywords for improved retrieval
  • Temporal references and timeframes
  • Source citations and references
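A minimal sketch of this stage, splitting a section into roughly 500-token chunks (matching the chunk size in the specifications below) and attaching hierarchical context. Word count stands in for token count, and the field names are illustrative assumptions, not the actual output schema:

```python
# Split a section into ~500-token chunks and attach the document → section
# context path. Words are used as a rough token proxy for illustration.

TARGET_TOKENS = 500

def chunk_section(doc_title: str, section_title: str, text: str) -> list[dict]:
    words = text.split()
    chunks = []
    for i in range(0, len(words), TARGET_TOKENS):
        chunks.append({
            "context": f"{doc_title} → {section_title}",
            "text": " ".join(words[i:i + TARGET_TOKENS]),
            "keywords": [],     # filled in by the model for retrieval
            "timeframes": [],   # e.g. detected dates or reporting periods
        })
    return chunks

chunks = chunk_section("Annual Report", "Q3 Results", "revenue " * 1200)
print(len(chunks))  # 1200 words -> 3 chunks (500, 500, 200)
```

In production the keywords, timeframes, and citations would be generated by the model per chunk rather than left empty.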

Technical Specifications

  • Analysis model: Qwen 2.5-72B
  • Sectioning and chunk context: Claude 3.5 Sonnet
  • Chunk size: approximately 500 tokens
  • Processing: asynchronous with webhook notifications
  • Output format: JSON with rich metadata
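Since processing is asynchronous with webhook notifications, a client would parse a JSON payload when the job completes. The payload below is a hypothetical example; the job ID, status values, and chunk fields are assumptions, not the documented schema:

```python
import json

# Hypothetical webhook payload for a completed Tchunky job.
# All field names here are assumptions for illustration.

payload = json.loads("""
{
  "job_id": "job_123",
  "status": "complete",
  "chunks": [
    {
      "context": "Annual Report → Q3 Results",
      "text": "Revenue grew 12% quarter over quarter.",
      "keywords": ["revenue", "growth"],
      "timeframes": ["Q3 2024"]
    }
  ]
}
""")

if payload["status"] == "complete":
    for chunk in payload["chunks"]:
        print(chunk["context"], "-", len(chunk["keywords"]), "keywords")
```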