Knowledgebase

The knowledgebase lets you upload, process, and search your documents using vector search and intelligent chunking. Documents of various types are automatically processed, split into chunks, and embedded so they can be searched. You can then view and edit the chunks and search them with natural language queries.

Upload and Processing

Simply upload your documents to get started. Sim automatically processes them in the background, extracting text, creating embeddings, and breaking them into searchable chunks.

The system handles the entire processing pipeline for you:

Text Extraction: Content is extracted from your documents using specialized parsers for each file type
Intelligent Chunking: Documents are broken into meaningful chunks with configurable size and overlap
Embedding Generation: Vector embeddings are created for semantic search capabilities
Processing Status: Track the progress as your documents are processed
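The pipeline above can be pictured as a sequence of stages with a status recorded at each step. The sketch below is illustrative only: the status names, the `Document` class, and the `extract`, `chunk`, and `embed` callables are hypothetical stand-ins, not Sim's internals.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Status(Enum):
    # Hypothetical status names; Sim's actual status values may differ.
    PENDING = "pending"
    EXTRACTING = "extracting"
    CHUNKING = "chunking"
    EMBEDDING = "embedding"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class Document:
    name: str
    raw: bytes
    status: Status = Status.PENDING
    history: List[Status] = field(default_factory=list)
    chunks: List[str] = field(default_factory=list)
    embeddings: List[List[float]] = field(default_factory=list)

    def set_status(self, status: Status) -> None:
        # Record every transition so progress can be reported to the user.
        self.status = status
        self.history.append(status)


def process(
    doc: Document,
    extract: Callable[[bytes], str],
    chunk: Callable[[str], List[str]],
    embed: Callable[[str], List[float]],
) -> Document:
    """Run extract -> chunk -> embed, updating the status before each stage."""
    try:
        doc.set_status(Status.EXTRACTING)
        text = extract(doc.raw)
        doc.set_status(Status.CHUNKING)
        doc.chunks = chunk(text)
        doc.set_status(Status.EMBEDDING)
        doc.embeddings = [embed(c) for c in doc.chunks]
        doc.set_status(Status.COMPLETED)
    except Exception:
        # Any stage error marks the document as failed; a real pipeline would also log it.
        doc.set_status(Status.FAILED)
    return doc


# Usage with toy stand-ins: split on periods, "embed" each chunk as its length.
doc = process(
    Document("notes.txt", b"hello world. second sentence."),
    extract=lambda raw: raw.decode("utf-8"),
    chunk=lambda text: [s.strip() for s in text.split(".") if s.strip()],
    embed=lambda c: [float(len(c))],
)
```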

Supported File Types

Sim supports PDF, Word (DOC/DOCX), plain text (TXT), Markdown (MD), HTML, Excel (XLS/XLSX), PowerPoint (PPT/PPTX), and CSV files. Each file can be up to 100 MB, and performance is best for files under 50 MB. You can upload multiple documents at once, and scanned PDFs can be processed with OCR when an OCR provider is configured (see Advanced PDF Processing).
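A client-side pre-check against the limits above might look like the following minimal sketch. It assumes the file extension identifies the type and treats 1 MB as 1024 * 1024 bytes; Sim may measure sizes differently and performs its own validation on upload. The function name `check_upload` is hypothetical.

```python
from pathlib import Path
from typing import List

# Extensions listed in this section; the check itself is an illustrative sketch.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".doc", ".docx", ".txt", ".md", ".html",
    ".xls", ".xlsx", ".ppt", ".pptx", ".csv",
}
MAX_BYTES = 100 * 1024 * 1024          # hard limit: 100 MB per file (assumes binary MB)
RECOMMENDED_BYTES = 50 * 1024 * 1024   # best performance below 50 MB


def check_upload(filename: str, size_bytes: int) -> List[str]:
    """Return problems and warnings for a file; an empty list means it looks fine."""
    issues = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        issues.append(f"unsupported file type: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        issues.append("file exceeds the 100 MB limit")
    elif size_bytes > RECOMMENDED_BYTES:
        issues.append("warning: files over 50 MB may process more slowly")
    return issues
```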

Viewing and Editing Chunks

Once your documents are processed, you can view and edit the individual chunks. This gives you full control over how your content is organized and searched.

Document chunks view showing processed content

Chunk Configuration

Default chunk size: 1,024 characters
Configurable range: 100-4,000 characters per chunk
Smart overlap: 200 characters by default for context preservation
Hierarchical splitting: Respects document structure (sections, paragraphs, sentences)
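To illustrate how chunk size, overlap, and hierarchical splitting interact, here is a minimal recursive splitter. It tries the coarsest separator first (blank lines, then line breaks, then sentence ends, then spaces), packs the resulting pieces into chunks of at most `chunk_size` characters, and starts each new chunk with the last `overlap` characters of the previous one. This is one common technique, not necessarily the algorithm Sim uses; the separator list and function names are assumptions.

```python
from typing import List

# Coarsest to finest: sections/paragraphs, lines, sentences, words (assumed order).
SEPARATORS = ["\n\n", "\n", ". ", " "]


def split_units(text: str, max_len: int, seps: List[str] = SEPARATORS) -> List[str]:
    """Recursively split text into units no longer than max_len,
    preferring the coarsest separator that works."""
    if len(text) <= max_len:
        return [text]
    if not seps:
        # No separator left: fall back to a hard cut.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    sep, rest = seps[0], seps[1:]
    parts = text.split(sep)
    units = []
    for i, part in enumerate(parts):
        piece = part + sep if i < len(parts) - 1 else part  # keep the separator
        units.extend(split_units(piece, max_len, rest))
    return [u for u in units if u]


def chunk_text(text: str, chunk_size: int = 1024, overlap: int = 200) -> List[str]:
    """Pack units into chunks of at most chunk_size characters with trailing overlap."""
    if not 100 <= chunk_size <= 4000:
        raise ValueError("chunk_size must be between 100 and 4000 characters")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    chunks, current = [], ""
    for unit in split_units(text, chunk_size - overlap):
        if current and len(current) + len(unit) > chunk_size:
            chunks.append(current)
            current = current[-overlap:] if overlap else ""  # carry trailing context
        current += unit
    if current:
        chunks.append(current)
    return chunks
```

Limiting units to `chunk_size - overlap` characters guarantees that the carried-over context plus one new unit never exceeds `chunk_size`.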

Editing Capabilities

Edit chunk content: Modify the text content of individual chunks
Adjust chunk boundaries: Merge or split chunks as needed
Add metadata: Enhance chunks with additional context
Bulk operations: Manage multiple chunks efficiently
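Merging and splitting can be thought of as list operations over ordered chunks. The sketch below uses a hypothetical `Chunk` record with free-form metadata; Sim's data model is not documented here. Because an embedding describes the text it was computed from, an edited chunk would presumably need to be re-embedded before search reflects the change; that step is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Chunk:
    # Hypothetical chunk record; field names are illustrative only.
    content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def merge_chunks(chunks: List[Chunk], i: int) -> List[Chunk]:
    """Merge chunk i with chunk i + 1, combining metadata (later keys win).
    Overlapping text shared by the two chunks, if any, is not deduplicated."""
    if not 0 <= i < len(chunks) - 1:
        raise IndexError("need two adjacent chunks to merge")
    a, b = chunks[i], chunks[i + 1]
    merged = Chunk(a.content + b.content, {**a.metadata, **b.metadata})
    return chunks[:i] + [merged] + chunks[i + 2:]


def split_chunk(chunks: List[Chunk], i: int, at: int) -> List[Chunk]:
    """Split chunk i at character offset `at`; both halves copy its metadata."""
    c = chunks[i]
    if not 0 < at < len(c.content):
        raise ValueError("split offset must fall inside the chunk")
    left = Chunk(c.content[:at], dict(c.metadata))
    right = Chunk(c.content[at:], dict(c.metadata))
    return chunks[:i] + [left, right] + chunks[i + 1:]
```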

Advanced PDF Processing

For PDF documents, Sim offers enhanced processing capabilities:

OCR Support

When configured with Azure or Mistral OCR:

Scanned document processing: Extract text from image-based PDFs
Mixed content handling: Process PDFs with both text and images
High accuracy: The providers' AI-based OCR models are designed for accurate text extraction
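One plausible way to handle mixed PDFs is to decide page by page whether the embedded text layer is usable and to send only the remaining pages to OCR. The sketch below assumes that approach; the `ocr` callable stands in for a configured provider such as Azure or Mistral and is hypothetical, as is the 20-character threshold. It does not call either provider's real API.

```python
from typing import Callable, List, Optional


def extract_pdf_text(
    page_texts: List[str],
    page_images: List[bytes],
    ocr: Optional[Callable[[bytes], str]] = None,
    min_chars: int = 20,
) -> List[str]:
    """Per page: keep the embedded text layer when it has enough text,
    otherwise fall back to OCR if a provider is configured."""
    if len(page_texts) != len(page_images):
        raise ValueError("need one text layer and one image per page")
    results = []
    for text, image in zip(page_texts, page_images):
        if len(text.strip()) >= min_chars:
            results.append(text)          # usable text layer
        elif ocr is not None:
            results.append(ocr(image))    # scanned or image-only page
        else:
            results.append(text)          # no OCR configured: keep what was found
    return results
```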

Using the Knowledge Block in Workflows

Once your documents are processed, you can use them in your AI workflows through the Knowledge block. This enables Retrieval-Augmented Generation (RAG), allowing your AI agents to access and reason over your document content to provide more accurate, contextual responses.

Knowledge Block Features

Semantic search: Find relevant content using natural language queries
Context integration: Automatically include relevant chunks in agent prompts
Dynamic retrieval: Search happens in real-time during workflow execution
Relevance scoring: Results ranked by semantic similarity
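Relevance scoring ranks chunks by how close their embeddings are to the query's embedding. This minimal sketch uses cosine similarity, a common choice (pgvector supports cosine distance, L2 distance, and inner product); which metric Sim actually uses is an assumption here, and real embeddings have hundreds or thousands of dimensions rather than two.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_chunks(query_vec: Sequence[float], chunk_vecs: List[Sequence[float]],
                top_k: int = 3) -> List[Tuple[int, float]]:
    """Return (chunk index, similarity) pairs, most similar first."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```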

Integration Options

System prompts: Provide context to your AI agents
Dynamic context: Search and include relevant information during conversations
Multi-document search: Query across your entire knowledgebase
Filtered search: Combine with tags for precise content retrieval
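Filtered search and context integration can be combined roughly as follows: keep only results whose tags match, then append the best-scoring chunks to the agent's system prompt. The `ScoredChunk` shape, the prompt wording, and the numbering format are hypothetical; in practice the tag filter would more likely be applied inside the search query than afterwards.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ScoredChunk:
    # Hypothetical shape of a search result; not Sim's actual schema.
    text: str
    score: float
    tags: Dict[str, str] = field(default_factory=dict)


def build_system_prompt(base: str, results: List[ScoredChunk],
                        required_tags: Dict[str, str], max_chunks: int = 3) -> str:
    """Keep results matching every required tag, then append the
    highest-scoring ones to the system prompt as numbered context."""
    matching = [r for r in results
                if all(r.tags.get(k) == v for k, v in required_tags.items())]
    matching.sort(key=lambda r: r.score, reverse=True)
    context = "\n\n".join(f"[{i + 1}] {r.text}" for i, r in enumerate(matching[:max_chunks]))
    return f"{base}\n\nUse the following context when answering:\n{context}"


results = [
    ScoredChunk("Refunds are issued within 30 days.", 0.90, {"dept": "billing"}),
    ScoredChunk("Reset your password in settings.", 0.95, {"dept": "support"}),
    ScoredChunk("Invoices are emailed monthly.", 0.80, {"dept": "billing"}),
]
prompt = build_system_prompt("You are a helpful assistant.", results, {"dept": "billing"})
```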

Vector Search Technology

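Sim uses vector search powered by pgvector, a PostgreSQL extension for storing and querying embeddings, so searches match the meaning and context of your content rather than only its exact wording: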

Semantic Understanding

Contextual search: Finds relevant content even when exact keywords don't match
Concept-based retrieval: Understands relationships between ideas
Multi-language support: Works across different languages
Synonym recognition: Finds related terms and concepts

Search Capabilities

Natural language queries: Ask questions in plain language
Similarity search: Find conceptually similar content
Hybrid search: Combines vector and traditional keyword search
Configurable results: Control the number and relevance threshold of results
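Hybrid search blends a semantic score with a keyword score. The sketch below uses a simple weighted sum with a configurable weight `alpha`, result count `top_k`, and minimum score `threshold`; the keyword score is just the fraction of query terms present. This is one illustrative fusion method, not Sim's documented formula, and a production system would more likely use a full-text ranking function (for example PostgreSQL's `ts_rank`) for the keyword side.

```python
import re
from typing import Dict, List, Tuple


def keyword_score(query: str, text: str) -> float:
    """Fraction of distinct query terms that appear in the text."""
    terms = set(re.findall(r"\w+", query.lower()))
    words = set(re.findall(r"\w+", text.lower()))
    return len(terms & words) / len(terms) if terms else 0.0


def hybrid_search(query: str, docs: Dict[str, str], vector_scores: Dict[str, float],
                  alpha: float = 0.7, top_k: int = 5, threshold: float = 0.0
                  ) -> List[Tuple[str, float]]:
    """Blend vector similarity (weight alpha) with keyword overlap (weight 1 - alpha),
    drop results below threshold, and return the top_k by combined score."""
    combined = []
    for doc_id, text in docs.items():
        score = alpha * vector_scores.get(doc_id, 0.0) + (1 - alpha) * keyword_score(query, text)
        if score >= threshold:
            combined.append((doc_id, score))
    combined.sort(key=lambda pair: pair[1], reverse=True)
    return combined[:top_k]
```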

Document Management

Organization Features

Bulk upload: Upload multiple files at once via the asynchronous API
Processing status: Real-time updates on document processing
Search and filter: Find documents quickly in large collections
Metadata tracking: Automatic capture of file information and processing details
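Bulk uploads are a natural fit for asynchronous requests with bounded concurrency. In this sketch `upload_one` only simulates a network call with `asyncio.sleep(0)`; it is a hypothetical stand-in, not Sim's API, and the `UploadRecord` fields and the "queued" status are illustrative examples of captured metadata.

```python
import asyncio
import hashlib
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class UploadRecord:
    # Metadata captured per file; field names are illustrative.
    filename: str
    size_bytes: int
    sha256: str
    status: str


async def upload_one(name: str, data: bytes, sem: asyncio.Semaphore) -> UploadRecord:
    async with sem:                  # cap the number of concurrent uploads
        await asyncio.sleep(0)       # stand-in for a real network call (hypothetical)
        return UploadRecord(name, len(data), hashlib.sha256(data).hexdigest(), "queued")


async def bulk_upload(files: List[Tuple[str, bytes]], max_concurrency: int = 4) -> List[UploadRecord]:
    """Upload all files concurrently; results come back in input order."""
    sem = asyncio.Semaphore(max_concurrency)
    return await asyncio.gather(*(upload_one(n, d, sem) for n, d in files))


records = asyncio.run(bulk_upload([("a.txt", b"hello"), ("b.md", b"# Title")]))
```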

Security and Privacy

Secure storage: Documents stored with enterprise-grade security
Access control: Workspace-based permissions
Processing isolation: Each workspace has isolated document processing
Data retention: Configure document retention policies
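Workspace isolation and retention policies can be expressed together as a filter: only documents in the caller's workspace are considered, and those older than the retention window are flagged. The `StoredDoc` type and the day-based policy are assumptions for illustration; how Sim stores and enforces retention settings is not described here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class StoredDoc:
    # Hypothetical stored-document record.
    doc_id: str
    workspace_id: str
    uploaded_at: datetime


def expired_documents(docs: List[StoredDoc], workspace_id: str,
                      retention_days: int, now: datetime) -> List[str]:
    """Return IDs of documents in one workspace older than the retention window."""
    cutoff = now - timedelta(days=retention_days)
    return [d.doc_id for d in docs
            if d.workspace_id == workspace_id and d.uploaded_at < cutoff]


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
docs = [
    StoredDoc("old", "ws1", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    StoredDoc("new", "ws1", datetime(2024, 5, 20, tzinfo=timezone.utc)),
    StoredDoc("other", "ws2", datetime(2023, 1, 1, tzinfo=timezone.utc)),
]
expired = expired_documents(docs, "ws1", retention_days=90, now=now)
```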

Getting Started

Navigate to your knowledgebase: Access from your workspace sidebar
Upload documents: Drag and drop or select files to upload
Monitor processing: Watch as documents are processed and chunked
Explore chunks: View and edit the processed content
Add to workflows: Use the Knowledge block to integrate with your AI agents

The knowledgebase transforms your static documents into an intelligent, searchable resource that your AI workflows can leverage for more informed and contextual responses.
