
# Create a Knowledge Pipeline

## Table of Contents

- [Overview](#overview)
- [Step 1: Create Knowledge Pipeline](#step-1-create-knowledge-pipeline)
- [Step 2: Orchestrate Knowledge Pipeline](#step-2-orchestrate-knowledge-pipeline)
- [Step 3: Publish Knowledge Pipeline](#step-3-publish-knowledge-pipeline)
- [Step 4: Upload Files](#step-4-upload-files)
- [Step 5: Manage and Use Knowledge Base](#step-5-manage-and-use-knowledge-base)
- [Authorize Data Source](#authorize-data-source)


## Overview

Knowledge Pipeline is an advanced method for creating Knowledge Bases, allowing you to design custom data processing workflows. Compared to Quick Create, Knowledge Pipeline provides:

- Flexible workflows: Customize each data processing step
- Plugin support: Use plugins to extract, transform, and load data
- Reusability: Created pipelines can be reused for multiple Knowledge Bases
- Complex processing: Suitable for data requiring special preprocessing (OCR, table extraction, etc.)

💡 TIP: If you're just getting started, use Quick Create first. Knowledge Pipeline is best when you need more granular control over data processing.


## Step 1: Create Knowledge Pipeline

1. Open Knowledge from the sidebar
2. Click Create Knowledge, then select Create from Knowledge Pipeline
3. Enter a name and description for the pipeline
4. Click Create to continue


## Step 2: Orchestrate Knowledge Pipeline

In the pipeline editor, you add processing steps and then connect them into a workflow.

### Add Processing Steps

- Data Source: Choose the input data source (local files, cloud storage, etc.)
- Document Extractor: Plugin to extract content from documents (PDF parser, OCR, etc.)
- Text Splitter: Configure how text is split into chunks
- Embedder: Select the embedding model used to create vectors
- Custom Processors: Add custom processing steps
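To make the Text Splitter step concrete, here is a minimal Python sketch of one common chunking strategy: fixed-size character windows with overlap. The function name `split_text` and its `chunk_size` and `chunk_overlap` parameters are illustrative assumptions, not ClickAI settings; the options actually available depend on the splitter node or plugin you configure.

```python
from typing import List


def split_text(text: str, chunk_size: int = 200, chunk_overlap: int = 20) -> List[str]:
    """Split text into character windows of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters. The parameter names are
    illustrative assumptions, not ClickAI's Text Splitter settings.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end of the text
            break
    return chunks


if __name__ == "__main__":
    # Example: 10 characters, windows of 4 that overlap by 1.
    print(split_text("abcdefghij", chunk_size=4, chunk_overlap=1))
    # ['abcd', 'defg', 'ghij']
```

With overlap, text that falls on a chunk boundary appears in both neighboring chunks, which can help retrieval when a sentence is cut in two.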

### Connect Steps

- Drag and drop to connect nodes
- Configure parameters for each step
- Preview results at each step

⚠️ IMPORTANT: Ensure steps are connected in the correct order: Data Source → Extractor → Splitter → Embedder.
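The ordering rule above can be expressed as a simple check. This minimal Python sketch assumes the pipeline is represented as an ordered list of node type names; those names and the `check_stage_order` helper are hypothetical, and the sketch also assumes other nodes (for example custom processors) may appear between the required stages, which the editor itself may or may not allow.

```python
from typing import List

# Hypothetical node type names for the four required stages, in the documented order:
# Data Source -> Extractor -> Splitter -> Embedder.
REQUIRED_ORDER = ["data_source", "extractor", "splitter", "embedder"]


def check_stage_order(nodes: List[str]) -> bool:
    """Return True if each required stage appears exactly once, in the required order.

    Node types not in REQUIRED_ORDER (for example, hypothetical custom
    processors) are ignored, so this sketch lets them sit between stages.
    """
    required = [node for node in nodes if node in REQUIRED_ORDER]
    return required == REQUIRED_ORDER


if __name__ == "__main__":
    print(check_stage_order(["data_source", "extractor", "splitter", "embedder"]))  # True
    print(check_stage_order(["data_source", "splitter", "extractor", "embedder"]))  # False
```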


## Step 3: Publish Knowledge Pipeline

1. After orchestration is complete, click Publish
2. The pipeline is saved and ready to use
3. You can edit and re-publish it at any time


## Step 4: Upload Files

1. In the Knowledge Base you created, click Add Documents
2. Upload the files you want to process
3. The pipeline automatically applies the configured workflow to the uploaded files
4. Wait for processing to complete
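Conceptually, applying the configured workflow means running each uploaded file through the same ordered stages, with each stage consuming the previous stage's output. The Python sketch below assumes that model only; `extract_text`, `split_paragraphs`, and `apply_workflow` are hypothetical stand-ins, not ClickAI components, and the embedding step is left out.

```python
from typing import Any, Callable, Dict, List

# A stage takes the previous stage's output and returns its own output.
Stage = Callable[[Any], Any]


def extract_text(raw: bytes) -> str:
    # Hypothetical stand-in for a Document Extractor plugin: plain UTF-8 decoding only.
    return raw.decode("utf-8")


def split_paragraphs(text: str) -> List[str]:
    # Hypothetical stand-in for a Text Splitter: one chunk per non-empty paragraph.
    return [part.strip() for part in text.split("\n\n") if part.strip()]


def apply_workflow(stages: List[Stage], files: Dict[str, bytes]) -> Dict[str, Any]:
    """Run every uploaded file through the same stages, in order."""
    results: Dict[str, Any] = {}
    for name, raw in files.items():
        data: Any = raw
        for stage in stages:
            data = stage(data)  # each stage consumes the previous stage's output
        results[name] = data
    return results


if __name__ == "__main__":
    uploads = {"notes.txt": b"First paragraph.\n\nSecond paragraph."}
    print(apply_workflow([extract_text, split_paragraphs], uploads))
    # {'notes.txt': ['First paragraph.', 'Second paragraph.']}
```

Because every file goes through the same stage list, changing one stage and re-publishing changes how all subsequently uploaded files are processed.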


## Step 5: Manage and Use Knowledge Base

After the pipeline finishes processing, the Knowledge Base works the same as one created via Quick Create:

- View and manage documents & chunks
- Test retrieval
- Integrate into applications


## Authorize Data Source

If the pipeline needs access to external data sources (Google Drive, Dropbox, S3, etc.), you must authorize them first:

1. Go to Pipeline Settings
2. Click Authorize Data Source
3. Select the data source and complete the authorization
4. Once authorized, the pipeline can automatically pull data from the source

📝 NOTE: Authorization tokens expire after a set period. Check them and renew them periodically so the pipeline keeps working.
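One way to follow this advice is to compare each token's expiry time against a safety margin on a regular schedule. This is a minimal Python sketch: the `needs_renewal` helper is hypothetical, the three-day margin is an arbitrary assumption, and how you read a token's expiry time from ClickAI or the data source is not covered here.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def needs_renewal(
    expires_at: datetime,
    now: Optional[datetime] = None,
    margin: timedelta = timedelta(days=3),  # arbitrary safety margin (assumption)
) -> bool:
    """Return True if the token has expired or will expire within `margin`.

    `expires_at` must be timezone-aware; how you obtain it is outside the
    scope of this hypothetical helper.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return expires_at - now <= margin


if __name__ == "__main__":
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(needs_renewal(now + timedelta(days=10), now=now))  # False
    print(needs_renewal(now + timedelta(days=2), now=now))   # True
```

Renewing slightly before expiry, rather than exactly at it, avoids a window in which a scheduled sync fails because the token lapsed mid-run.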


*📖 Previous: [Index Method & Retrieval Settings](./04-index-retrieval-settings.md) · Next: [Connect to External Knowledge](./06-external-knowledge.md)*


