Configure the Chunk Settings
Learn how to split documents into appropriate chunks to optimize information retrieval.
Table of Contents
· [What is Chunking?](#what-is-chunking)
· [Choose a Chunk Mode](#choose-a-chunk-mode)
· [Pre-process Text Before Chunking](#pre-process-text-before-chunking)
· [Enable Summary Auto-Gen](#enable-summary-auto-gen)
· [Preview Chunks](#preview-chunks)
What is Chunking?
Chunking is the process of splitting long documents into shorter text segments (called "chunks"). This is a critical step in building a Knowledge Base because:
· Chunks that are too long may contain irrelevant information, causing noise during retrieval
· Chunks that are too short may lack context, leading to incomplete answers
· Properly sized chunks result in more accurate retrieval
Two key concepts:
· Delimiter: The character or sequence at which text is split. For example, `\n\n` splits at paragraph breaks and `\n` splits at line breaks.
NOTE: Delimiters are removed during chunking. For example, using `A` as the delimiter splits `CBACD` into `CB` and `CD`. To avoid information loss, use non-content characters that don't naturally appear in your documents.
· Maximum Chunk Length: The maximum size of each chunk in characters. Text exceeding this limit is force-split regardless of delimiter settings.
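The two concepts above can be illustrated with a minimal Python sketch. It is not ClickAI's implementation: the function name `split_into_chunks` and its defaults are hypothetical, and it assumes the simplest possible behavior (split on the delimiter, drop the delimiter, then cut any oversized piece at the character limit).

```python
def split_into_chunks(text: str, delimiter: str = "\n\n", max_length: int = 500) -> list[str]:
    """Split on the delimiter (which is dropped), then force-split oversized pieces.

    Hypothetical helper for illustration only; the real chunker may behave differently.
    """
    chunks = []
    for piece in text.split(delimiter):
        if not piece:
            continue  # skip empty pieces produced by adjacent delimiters
        # Force-split any piece longer than max_length, ignoring the delimiter.
        for start in range(0, len(piece), max_length):
            chunks.append(piece[start:start + max_length])
    return chunks


print(split_into_chunks("CBACD", delimiter="A"))      # ['CB', 'CD']
print(split_into_chunks("abcdefgh", max_length=3))    # ['abc', 'def', 'gh']
```

The second call shows why a forced split can cut through a sentence: the limit is applied by character count, with no regard for meaning.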
Choose a Chunk Mode
ClickAI provides 4 chunking modes:
Mode Overview
| Mode | Description | When to use |
| --- | --- | --- |
| General | Splits by delimiter and max size. Flexible and fits most cases. | General documents, FAQs, guides |
| Parent-Child | Creates large chunks (parent) containing smaller chunks (child). Retrieval targets the child but returns the fuller parent context. | Technical docs, docs needing broader context |
| Paragraph | Splits by natural paragraphs. | Documents with clear paragraph structure |
| Full Doc | Keeps the entire document as a single chunk. | Short documents, policy documents |
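To make the Parent-Child idea concrete, here is a small Python sketch of "match on the child, return the parent". Everything in it is an assumption made for illustration: the names `ChildChunk`, `build_parent_child`, and `retrieve_parent` are hypothetical, parents are split on blank lines and children on single newlines, and the word-overlap scoring is a toy stand-in for real retrieval.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChildChunk:
    text: str
    parent_index: int  # position of the parent chunk this child belongs to


def build_parent_child(text: str, parent_delim: str = "\n\n", child_delim: str = "\n"):
    """Split text into parent chunks, then split each parent into child chunks."""
    parents = [p for p in text.split(parent_delim) if p.strip()]
    children = [
        ChildChunk(text=c, parent_index=i)
        for i, parent in enumerate(parents)
        for c in parent.split(child_delim)
        if c.strip()
    ]
    return parents, children


def retrieve_parent(query: str, parents: list, children: list) -> Optional[str]:
    """Score the query against children, then return the best child's parent."""
    query_words = set(query.lower().split())
    # Toy scoring: count of lowercase words shared between query and child.
    best = max(
        children,
        key=lambda c: len(query_words & set(c.text.lower().split())),
        default=None,
    )
    return parents[best.parent_index] if best else None


doc = "Install the CLI.\nRun the setup command.\n\nReset your password.\nContact support if locked out."
parents, children = build_parent_child(doc)
print(retrieve_parent("reset password", parents, children))
```

The query matches the short child "Reset your password.", but the caller receives the whole parent, including the follow-up line about contacting support.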
Quick Comparison
| Criteria | General | Parent-Child | Paragraph | Full Doc |
| --- | --- | --- | --- | --- |
| Flexibility | High | Medium | Low | Low |
| Broad context | Medium | High | Medium | Very High |
| Retrieval accuracy | High | Very High | High | Low |
Suits long docs
β
β
β
β
Suits short docs
β
β
β
β
Notes on Parent-Child Mode
· Only the first 10,000 tokens are processed. Content beyond this limit will be truncated.
· The parent chunk cannot be edited once created. To modify it, you must upload a new document.
IMPORTANT: Choosing the right chunk mode is a critical step that directly affects retrieval quality. Experiment with different modes and use the Test Retrieval feature to evaluate results.
Pre-process Text Before Chunking
ClickAI provides pre-processing options to clean text before chunking:
Replace consecutive spaces, newlines, and tabs
Automatically normalizes whitespace:
· Three or more consecutive newlines → two newlines
· Multiple spaces → single space
· Tabs, form feeds, and special Unicode spaces → regular space
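The three normalization rules can be approximated with regular expressions. This is a sketch under stated assumptions, not the product's code: the function name `normalize_whitespace` is hypothetical, the set of "special Unicode spaces" is a guess at the common ones, and the order of the steps is chosen so that converted tabs are collapsed along with ordinary spaces.

```python
import re


def normalize_whitespace(text: str) -> str:
    """Approximate the whitespace rules above (hypothetical helper)."""
    # Tabs, form feeds, and common non-ASCII space characters -> regular space.
    # The exact character set is an assumption.
    text = re.sub(r"[\t\f\u00a0\u2000-\u200a\u202f\u205f\u3000]", " ", text)
    # Runs of two or more spaces -> single space.
    text = re.sub(r" {2,}", " ", text)
    # Three or more consecutive newlines -> two newlines.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text


print(repr(normalize_whitespace("a\t\tb  c\n\n\n\nd")))  # 'a b c\n\nd'
```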
Remove all URLs and email addresses
Strips all URLs and email addresses from text content.
NOTE: This setting is ignored in Full Doc mode.
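A rough Python sketch of this option is shown below. The patterns are deliberately simple assumptions and will not cover every URL or address format, and the `full_doc_mode` flag is a hypothetical way to model the note above, not a documented parameter.

```python
import re

# Simplified patterns; real-world URL and email matching has many more edge cases.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def strip_urls_and_emails(text: str, full_doc_mode: bool = False) -> str:
    """Remove URLs and email addresses (hypothetical helper)."""
    if full_doc_mode:
        return text  # the setting is ignored in Full Doc mode
    text = URL_PATTERN.sub("", text)
    text = EMAIL_PATTERN.sub("", text)
    return text


print(strip_urls_and_emails("Mail help@example.com or see https://example.com/docs now"))
```

Removal can leave doubled spaces behind, so in a pipeline like this one it would make sense to run whitespace normalization afterwards. Note also that the `\S+` in the URL pattern swallows any punctuation attached to the end of a link.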
Enable Summary Auto-Gen
When Summary Auto-Gen is enabled, ClickAI uses an LLM to generate a summary for each chunk automatically. Summaries help:
· Improve retrieval when user queries differ from document language
· Add high-level information for chunks containing technical content (code, tables, logs)
· Create "semantic glue": apply identical summaries to related chunks for grouped retrieval
TIP: Summary Auto-Gen is especially useful when source documents use specialized jargon but users ask questions in everyday natural language.
Preview Chunks
After configuring chunk settings, click Preview to review results:
· See how documents are split into chunks
· Inspect content of each chunk
· Adjust configuration if results are unsatisfactory
Check chunk quality:
· Chunks too short → May lack sufficient context, leading to semantic loss and inaccurate answers
· Chunks too long → May include irrelevant information, introducing semantic noise and lowering retrieval precision
· Semantically incomplete chunks → Caused by forced chunking that cuts through sentences or paragraphs, resulting in missing or misleading content
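If you export or copy the previewed chunks, the same three checks can be roughed out in a few lines of Python. This is only a heuristic sketch: the function name `check_chunk_quality` and the length thresholds are hypothetical, and "ends without closing punctuation" is a crude proxy for a chunk that was cut mid-sentence.

```python
def check_chunk_quality(
    chunks: list[str], min_length: int = 50, max_length: int = 1000
) -> list[tuple[int, str]]:
    """Flag chunks that look too short, too long, or cut mid-sentence.

    Thresholds are illustrative assumptions; tune them for your documents.
    """
    issues = []
    for index, chunk in enumerate(chunks):
        stripped = chunk.strip()
        if len(stripped) < min_length:
            issues.append((index, "too short"))
        elif len(stripped) > max_length:
            issues.append((index, "too long"))
        # A chunk that does not end with closing punctuation may have been force-split.
        if stripped and stripped[-1] not in ".!?:;\"')]":
            issues.append((index, "may end mid-sentence"))
    return issues


sample = [
    "Short.",
    "x" * 1200 + ".",
    "This chunk is long enough to pass the length check but it ends without",
]
for index, problem in check_chunk_quality(sample):
    print(f"chunk {index}: {problem}")
```

A flagged chunk is a prompt to look again at the delimiter or Maximum Chunk Length, not proof of a problem; headings and list items, for example, legitimately end without punctuation.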
IMPORTANT: Always preview and check chunk quality before proceeding with indexing. Re-indexing later costs additional time and resources.