Learn More

How It Works

Tchunky processes your documents through several stages to create LLM-optimized chunks with rich context.

[Pipeline diagram: Document Analysis → Section Detection → Chunk Generation, annotated with per-stage outputs: title, citation, summary, category, length, logical parts, hierarchy, keywords, timeframes]

1. Document Analysis

When you submit a document, Tchunky first analyzes it to extract key metadata. This includes:

  • Document title and citation detection
  • Category classification (e.g., book, report, email thread)
  • Length analysis and token estimation
  • Generation of a concise summary using Claude 3.5 Sonnet
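The analysis stage above can be sketched in a few lines. The field names and the characters-per-token heuristic here are illustrative assumptions, not Tchunky's actual API schema:

```python
# Hypothetical shape of the metadata produced by document analysis.
# Field names are assumptions for illustration only.

def estimate_tokens(text: str) -> int:
    """Rough token estimate using the common ~4 characters/token heuristic."""
    return max(1, len(text) // 4)

def analyze(document: str) -> dict:
    return {
        "title": document.splitlines()[0].strip() if document else "",
        "category": "report",          # e.g. book, report, email thread
        "length_chars": len(document),
        "estimated_tokens": estimate_tokens(document),
        "summary": None,               # filled in by the language model
    }

meta = analyze("Q3 Sales Report\nRevenue grew 12% quarter over quarter.")
print(meta["title"], meta["estimated_tokens"])
```

In the real pipeline the summary and category come from the model; the sketch only shows where those fields would live in the output.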

2. Section Detection

Next, Tchunky identifies logical sections using Claude 3.5 Sonnet. The process involves:

  • Semantic boundary detection using transformer models
  • Hierarchical structure preservation
  • Context window optimization (typically 8k-32k tokens)
  • Section-level metadata extraction
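To make the output of this stage concrete, here is a deliberately naive stand-in that splits on Markdown-style headings and records each section's hierarchy level. Tchunky's actual boundary detection is model-based and semantic; this only illustrates a plausible section record (field names are assumptions):

```python
import re

# Naive heading-based splitter, standing in for semantic boundary detection.
# Each section keeps its heading, hierarchy level, and body lines.

def detect_sections(text: str) -> list[dict]:
    sections = []
    current = {"heading": None, "level": 0, "body": []}
    for line in text.splitlines():
        m = re.match(r"^(#{1,6})\s+(.*)", line)
        if m:
            # Close out the previous section before starting a new one.
            if current["heading"] is not None or current["body"]:
                sections.append(current)
            current = {"heading": m.group(2), "level": len(m.group(1)), "body": []}
        else:
            current["body"].append(line)
    sections.append(current)
    return sections

doc = "# Intro\nHello.\n## Details\nMore text."
for s in detect_sections(doc):
    print(s["level"], s["heading"])
```

The hierarchy levels recorded here are what lets the later chunking stage reconstruct the document → section → chunk context path.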

3. Chunk Generation

Finally, each section is split into smaller chunks optimized for LLM context windows. Each chunk is enhanced with:

  • Hierarchical context (document → section → chunk)
  • Semantic keywords for improved retrieval
  • Temporal references and timeframes
  • Source citations and references
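A minimal sketch of this stage, splitting a section into roughly 500-token chunks (matching the chunk size in the specifications below) and attaching hierarchical context. Word count stands in for token count, and the field names are illustrative assumptions, not the actual output schema:

```python
# Split a section into ~500-token chunks and attach the document → section
# context path. Words are used as a rough token proxy for illustration.

TARGET_TOKENS = 500

def chunk_section(doc_title: str, section_title: str, text: str) -> list[dict]:
    words = text.split()
    chunks = []
    for i in range(0, len(words), TARGET_TOKENS):
        chunks.append({
            "context": f"{doc_title} → {section_title}",
            "text": " ".join(words[i:i + TARGET_TOKENS]),
            "keywords": [],     # filled in by the model for retrieval
            "timeframes": [],   # e.g. detected dates or reporting periods
        })
    return chunks

chunks = chunk_section("Annual Report", "Q3 Results", "revenue " * 1200)
print(len(chunks))  # 1200 words -> 3 chunks (500, 500, 200)
```

In production the keywords, timeframes, and citations would be generated by the model per chunk rather than left empty.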

Technical Specifications

  • Analysis model: Qwen 2.5-72B
  • Sectioning and chunk context: Claude 3.5 Sonnet
  • Chunk size: approximately 500 tokens
  • Processing: asynchronous with webhook notifications
  • Output format: JSON with rich metadata
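Since processing is asynchronous with webhook notifications, a client would parse a JSON payload when the job completes. The payload below is a hypothetical example; the job ID, status values, and chunk fields are assumptions, not the documented schema:

```python
import json

# Hypothetical webhook payload for a completed Tchunky job.
# All field names here are assumptions for illustration.

payload = json.loads("""
{
  "job_id": "job_123",
  "status": "complete",
  "chunks": [
    {
      "context": "Annual Report → Q3 Results",
      "text": "Revenue grew 12% quarter over quarter.",
      "keywords": ["revenue", "growth"],
      "timeframes": ["Q3 2024"]
    }
  ]
}
""")

if payload["status"] == "complete":
    for chunk in payload["chunks"]:
        print(chunk["context"], "-", len(chunk["keywords"]), "keywords")
```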