# Configure the Chunk Settings

## Table of Contents

* [What is Chunking?](#what-is-chunking)
* [Choose a Chunk Mode](#choose-a-chunk-mode)
* [Pre-process Text Before Chunking](#pre-process-text-before-chunking)
* [Enable Summary Auto-Gen](#enable-summary-auto-gen)
* [Preview Chunks](#preview-chunks)

## What is Chunking?

Chunking is the process of splitting long documents into shorter text segments (called "chunks"). This is a critical step in building a Knowledge Base because:

* Chunks that are too long may contain irrelevant information, causing noise during retrieval
* Chunks that are too short may lack context, leading to incomplete answers
* Properly sized chunks result in more accurate retrieval

Two key concepts:

* Delimiter: The character or sequence at which text is split. For example, `\n\n` splits at paragraph breaks and `\n` splits at line breaks.

📝 NOTE: Delimiters are removed during chunking. For example, using `A` as the delimiter splits `CBACD` into `CB` and `CD`. To avoid information loss, choose a delimiter made of characters that do not appear in your document content.

* Maximum Chunk Length: The maximum size of each chunk, in characters. Text exceeding this limit is force-split regardless of delimiter settings.
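
The interaction between the delimiter and the maximum chunk length can be pictured with a small Python sketch. It is an illustration only, not ClickAI's implementation: the function name `chunk_text` and the choice to skip empty pieces are assumptions.

```python
def chunk_text(text: str, delimiter: str, max_length: int) -> list[str]:
    """Split on the delimiter (which is dropped), then force-split long pieces."""
    chunks = []
    for piece in text.split(delimiter):
        if not piece:
            continue  # assumption: empty pieces from adjacent delimiters are skipped
        # Any piece longer than max_length is cut regardless of the delimiter.
        for start in range(0, len(piece), max_length):
            chunks.append(piece[start:start + max_length])
    return chunks


print(chunk_text("CBACD", "A", 100))         # ['CB', 'CD']
print(chunk_text("abcdefg\n\nhi", "\n\n", 3))  # ['abc', 'def', 'g', 'hi']
```

In the second call the first paragraph is longer than the limit, so it is cut into fixed-size pieces that ignore word and sentence boundaries. This is the kind of forced split to look for when previewing chunks.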

## Choose a Chunk Mode

ClickAI provides four chunking modes:

### Mode Overview

<table><thead><tr><th valign="top">Mode</th><th valign="top">Description</th><th valign="top">When to use</th></tr></thead><tbody><tr><td valign="top">General</td><td valign="top">Splits by delimiter and maximum chunk length. Flexible and fits most cases.</td><td valign="top">General documents, FAQs, guides</td></tr><tr><td valign="top">Parent-Child</td><td valign="top">Creates large chunks (parents) that contain smaller chunks (children). Retrieval matches against child chunks but returns the fuller parent chunk as context.</td><td valign="top">Technical docs, docs needing broader context</td></tr><tr><td valign="top">Paragraph</td><td valign="top">Splits by natural paragraphs.</td><td valign="top">Documents with clear paragraph structure</td></tr><tr><td valign="top">Full Doc</td><td valign="top">Keeps the entire document as a single chunk.</td><td valign="top">Short documents, policy documents</td></tr></tbody></table>
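
As a rough illustration of the Parent-Child row in the table, the following Python sketch matches a query against small child chunks and returns the larger parent chunk that contains the best match. The delimiters, the word-overlap scoring, and the names `build_parent_child` and `retrieve` are assumptions made for illustration; they do not describe how ClickAI indexes or scores chunks.

```python
from dataclasses import dataclass


@dataclass
class ChildChunk:
    text: str
    parent_index: int  # position of the parent chunk this child came from


def build_parent_child(text: str, parent_delimiter: str = "\n\n",
                       child_delimiter: str = "\n"):
    """Split text into parent chunks, then split each parent into children."""
    parents = [p for p in text.split(parent_delimiter) if p.strip()]
    children = [
        ChildChunk(child, index)
        for index, parent in enumerate(parents)
        for child in parent.split(child_delimiter)
        if child.strip()
    ]
    return parents, children


def retrieve(query: str, parents: list[str], children: list[ChildChunk]) -> str:
    """Match on children (toy word-overlap score), return the parent."""
    query_words = set(query.lower().split())
    best = max(children,
               key=lambda c: len(query_words & set(c.text.lower().split())))
    return parents[best.parent_index]


doc = ("Install the CLI.\nRun the setup command.\n\n"
       "Billing is monthly.\nRefunds take five days.")
parents, children = build_parent_child(doc)
print(retrieve("refunds", parents, children))
```

The query matches only the short "Refunds take five days." child, yet the whole billing paragraph is returned, which is the broader context the mode is meant to provide.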

### Quick Comparison

<table><thead><tr><th valign="top">Criteria</th><th valign="top">General</th><th valign="top">Parent-Child</th><th valign="top">Paragraph</th><th valign="top">Full Doc</th></tr></thead><tbody><tr><td valign="top">Flexibility</td><td valign="top">High</td><td valign="top">Medium</td><td valign="top">Low</td><td valign="top">Low</td></tr><tr><td valign="top">Broad context</td><td valign="top">Medium</td><td valign="top">High</td><td valign="top">Medium</td><td valign="top">Very High</td></tr><tr><td valign="top">Retrieval accuracy</td><td valign="top">High</td><td valign="top">Very High</td><td valign="top">High</td><td valign="top">Low</td></tr><tr><td valign="top">Suits long docs</td><td valign="top">✅</td><td valign="top">✅</td><td valign="top">✅</td><td valign="top">❌</td></tr><tr><td valign="top">Suits short docs</td><td valign="top">✅</td><td valign="top">❌</td><td valign="top">✅</td><td valign="top">✅</td></tr></tbody></table>

### Notes on Parent-Child Mode

* Only the first 10,000 tokens are processed. Content beyond this limit will be truncated.
* The parent chunk cannot be edited once created. To modify it, you must upload a new document.

⚠️ IMPORTANT: Choosing the right chunk mode is a critical step that directly affects retrieval quality. Experiment with different modes and use the Test Retrieval feature to evaluate results.

## Pre-process Text Before Chunking

ClickAI provides pre-processing options to clean text before chunking:

### Replace consecutive spaces, newlines, and tabs

Automatically normalizes whitespace:

* Three or more consecutive newlines → two newlines
* Multiple spaces → single space
* Tabs, form feeds, and special Unicode spaces → regular space
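
The three rules above can be approximated with regular expressions. This is a minimal sketch, not ClickAI's code: the function name `normalize_whitespace`, the order of the substitutions, and the exact set of Unicode space characters are assumptions.

```python
import re


def normalize_whitespace(text: str) -> str:
    # Tabs, form feeds, and a (assumed) set of Unicode spaces -> regular space.
    text = re.sub(r"[\t\f\u00a0\u1680\u2000-\u200a\u202f\u205f\u3000]", " ", text)
    # Two or more consecutive spaces -> a single space.
    text = re.sub(r" {2,}", " ", text)
    # Three or more consecutive newlines -> exactly two newlines.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text


print(repr(normalize_whitespace("a\t\tb  c\n\n\n\nd\u00a0e")))
# 'a b c\n\nd e'
```

Replacing tabs and special spaces first means the runs they leave behind are collapsed by the second rule.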

### Remove all URLs and email addresses

Strips all URLs and email addresses from text content.

📝 NOTE: This setting is ignored in Full Doc mode.
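
To preview what this option might remove from a document, a simple pattern-based sketch is shown here. The patterns and the name `strip_urls_and_emails` are assumptions; ClickAI's own matching rules may be stricter or looser.

```python
import re

# Assumed patterns: http(s) URLs up to the next whitespace, and common email shapes.
URL_PATTERN = re.compile(r"https?://\S+")
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def strip_urls_and_emails(text: str) -> str:
    text = URL_PATTERN.sub("", text)
    return EMAIL_PATTERN.sub("", text)


sample = "Contact support@example.com or visit https://example.com/help today"
print(repr(strip_urls_and_emails(sample)))
# 'Contact  or visit  today'
```

The output keeps the spaces that surrounded each removed item, so sentences that depend on a link or address can read oddly afterwards. Check such passages in the chunk preview.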

## Enable Summary Auto-Gen

When Summary Auto-Gen is enabled, ClickAI uses an LLM to generate a summary for each chunk automatically. Summaries help:

* Improve retrieval when the wording of user queries differs from the language of the document
* Add high-level information to chunks containing technical content (code, tables, logs)
* Create "semantic glue": related chunks that share identical summaries can be retrieved as a group

💡 TIP: Summary Auto-Gen is especially useful when source documents use specialized jargon but users ask questions in everyday natural language.

## Preview Chunks

After configuring chunk settings, click Preview to review results:

* See how documents are split into chunks
* Inspect the content of each chunk
* Adjust the configuration if results are unsatisfactory

Check chunk quality:

* Chunks too short: may lack sufficient context, leading to semantic loss and inaccurate answers
* Chunks too long: may include irrelevant information, introducing semantic noise and lowering retrieval precision
* Semantically incomplete chunks: caused by forced chunking that cuts through sentences or paragraphs, resulting in missing or misleading content
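
If you copy previewed chunks out for review, the three checks above can be roughly automated. The sketch below is hypothetical: the name `check_chunk`, the default thresholds, and the end-of-sentence heuristic are assumptions to adjust for your own documents, not values taken from ClickAI.

```python
def check_chunk(chunk: str, min_chars: int = 100, max_chars: int = 1500) -> list[str]:
    """Return a list of possible quality issues for one chunk."""
    issues = []
    stripped = chunk.strip()
    if len(stripped) < min_chars:
        issues.append("too short")
    if len(stripped) > max_chars:
        issues.append("too long")
    # Heuristic: a chunk that does not end with closing punctuation
    # may have been force-split in the middle of a sentence.
    if stripped and stripped[-1] not in ".!?:\"')":
        issues.append("possibly cut mid-sentence")
    return issues


print(check_chunk("Short text", min_chars=20))
# ['too short', 'possibly cut mid-sentence']
print(check_chunk("This sentence is complete.", min_chars=5, max_chars=100))
# []
```

A check like this only flags candidates. Whether a chunk carries enough context still needs a human read in the preview.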

⚠️ IMPORTANT: Always preview and check chunk quality before proceeding with indexing. Re-indexing later costs additional time and resources.



