Create a Knowledge Pipeline
Orchestrate complex data processing workflows with custom steps and plugins for advanced needs.
Table of Contents
· [Overview](#overview)
· [Step 1: Create Knowledge Pipeline](#step-1-create-knowledge-pipeline)
· [Step 2: Orchestrate Knowledge Pipeline](#step-2-orchestrate-knowledge-pipeline)
· [Step 3: Publish Knowledge Pipeline](#step-3-publish-knowledge-pipeline)
· [Step 4: Upload Files](#step-4-upload-files)
· [Step 5: Manage and Use Knowledge Base](#step-5-manage-and-use-knowledge-base)
· [Authorize Data Source](#authorize-data-source)
Overview
Knowledge Pipeline is an advanced method for creating Knowledge Bases, allowing you to design custom data processing workflows. Compared to Quick Create, Knowledge Pipeline provides:
· Flexible workflows: Customize each data processing step
· Plugin support: Use plugins to extract, transform, and load data
· Reusability: A published pipeline can be reused across multiple Knowledge Bases
· Complex processing: Suitable for data requiring special preprocessing (OCR, table extraction, etc.)
💡 TIP: If you're just getting started, use Quick Create first. Knowledge Pipeline is best when you need more granular control over data processing.
Step 1: Create Knowledge Pipeline
1. Go to Knowledge from the sidebar
2. Click Create Knowledge, then select Create from Knowledge Pipeline
3. Enter a name and description for the pipeline
4. Click Create to continue
Step 2: Orchestrate Knowledge Pipeline
In the pipeline editor, you can:
Add Processing Steps
· Data Source: Choose input data source (local files, cloud storage, etc.)
· Document Extractor: Plugin to extract content from documents (PDF parser, OCR, etc.)
· Text Splitter: Configure how text is split into chunks
· Embedder: Select the embedding model to create vectors
· Custom Processors: Add custom processing steps
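To make the Text Splitter step more concrete, here is a minimal sketch of fixed-size chunking with overlap. It is an illustration only: the function name `split_text` and the parameters `chunk_size` and `overlap` are hypothetical and are not the product's actual settings or splitting strategy.

```python
from typing import List


def split_text(text: str, chunk_size: int = 200, overlap: int = 20) -> List[str]:
    """Split text into character chunks of at most chunk_size.

    Consecutive chunks share `overlap` characters so that content cut at a
    boundary still appears whole in one of the neighbouring chunks.
    Hypothetical helper; not the product's splitter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

For example, `split_text("abcdefghij", chunk_size=4, overlap=1)` returns `["abcd", "defg", "ghij"]`. Larger overlap values tend to preserve more context per chunk at the cost of producing more chunks to embed.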
Connect Steps
· Drag and drop to connect nodes
· Configure parameters for each step
· Preview results at each step
⚠️ IMPORTANT: Ensure steps are connected in the correct order: Data Source → Extractor → Splitter → Embedder.
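The ordering rule above can be expressed as a small check. This is a sketch under stated assumptions: the step names in `REQUIRED_ORDER` are hypothetical labels for the four core steps, and the pipeline editor may enforce or represent the order differently.

```python
from typing import List

# Hypothetical labels for the four core steps, in the required order.
# The editor's real node identifiers may differ.
REQUIRED_ORDER = ["data_source", "extractor", "splitter", "embedder"]


def check_step_order(steps: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the order is valid.

    Steps that are not in REQUIRED_ORDER (for example custom processors)
    are ignored, so they may sit anywhere in the chain.
    """
    problems = []
    core = [s for s in steps if s in REQUIRED_ORDER]

    # Every core step must be present at least once.
    for name in REQUIRED_ORDER:
        if name not in core:
            problems.append(f"missing step: {name}")

    # The core steps that are present must appear in the required order.
    ranks = [REQUIRED_ORDER.index(s) for s in core]
    if ranks != sorted(ranks):
        problems.append("steps are out of order")
    return problems
```

A chain such as `["data_source", "extractor", "clean_text", "splitter", "embedder"]` passes with no problems, while placing the splitter before the extractor is reported as out of order.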
Step 3: Publish Knowledge Pipeline
1. After orchestration is complete, click Publish
2. The pipeline is saved and ready to use
3. You can edit and re-publish at any time
Step 4: Upload Files
1. In the created Knowledge Base, click Add Documents
2. Upload the files you want to process
3. The pipeline automatically applies the configured workflow
4. Wait for processing to complete
Step 5: Manage and Use Knowledge Base
After the pipeline finishes processing, the Knowledge Base works the same as one created via Quick Create:
· View and manage documents & chunks
· Test retrieval
· Integrate into applications
Authorize Data Source
If the pipeline needs access to external data sources (Google Drive, Dropbox, S3, etc.), you need to grant access:
1. Go to Pipeline Settings
2. Click Authorize Data Source
3. Select the data source and complete authorization
4. Once authorized, the pipeline can automatically pull data from the source
📝 NOTE: Authorization tokens expire. Check them periodically and renew them before they lapse so the pipeline keeps working.
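If you track token expiry times yourself, a periodic check could look like the sketch below. It assumes you already know each token's expiry timestamp; how that timestamp is exposed depends on the data source and is not specified here. The function name `needs_renewal` and the three-day `margin` default are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def needs_renewal(
    expires_at: datetime,
    now: datetime,
    margin: timedelta = timedelta(days=3),
) -> bool:
    """Return True if the token has expired or will expire within `margin`.

    Both datetimes should be timezone-aware so the subtraction is valid.
    Hypothetical helper; the expiry value must come from your data source.
    """
    return expires_at - now <= margin


# Example: a token expiring in two days falls inside the three-day margin.
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
soon = datetime(2024, 1, 12, tzinfo=timezone.utc)
print(needs_renewal(soon, now))
```

Running a check like this on a schedule gives you time to re-authorize before the pipeline loses access to the source.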
📖 Previous: [Index Method & Retrieval Settings](./04-index-retrieval-settings.md) · Next: [Connect to External Knowledge](./06-external-knowledge.md)