Jina AI is a content extraction tool that integrates with Sim to turn web content into clean, readable text. The integration lets developers add web content processing to their agentic workflows.
Jina AI Reader specializes in extracting the most relevant content from web pages, removing clutter, advertisements, and formatting issues to produce clean, structured text that's optimized for language models and other text processing tasks.
With the Jina AI integration in Sim, you can:
- Extract clean content from any web page by simply providing a URL
- Process complex web layouts into structured, readable text
- Maintain important context while removing unnecessary elements
- Prepare web content for further processing in your agent workflows
- Streamline research tasks by quickly converting web information into usable data
This integration is particularly valuable for building agents that need to gather and process information from the web, conduct research, or analyze online content as part of their workflow.
Usage Instructions
Integrate Jina AI into the workflow. Search the web and get LLM-friendly results, or extract clean content from specific URLs with advanced parsing options.
Tools
jina_read_url
Extract and process web content into clean, LLM-friendly text using Jina AI Reader. Supports advanced content parsing, link gathering, and multiple output formats with configurable processing options.
Input
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The URL to read and convert to markdown |
| useReaderLMv2 | boolean | No | Whether to use ReaderLM-v2 for better quality (3x token cost) |
| gatherLinks | boolean | No | Whether to gather all links at the end |
| jsonResponse | boolean | No | Whether to return response in JSON format |
| apiKey | string | Yes | Your Jina AI API key |
| withImagesummary | boolean | No | Gather all images from the page with metadata |
| retainImages | string | No | Control image inclusion: "none" removes all, "all" keeps all |
| returnFormat | string | No | Output format: markdown, html, text, screenshot, or pageshot |
| withIframe | boolean | No | Include iframe content in extraction |
| withShadowDom | boolean | No | Extract Shadow DOM content |
| noCache | boolean | No | Bypass cached content for real-time retrieval |
| withGeneratedAlt | boolean | No | Generate alt text for images using VLM |
| robotsTxt | string | No | Bot User-Agent for robots.txt checking |
| dnt | boolean | No | Do Not Track - prevents caching/tracking |
| noGfm | boolean | No | Disable GitHub Flavored Markdown |
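
The input table above can be read as a simple schema. The following is a minimal sketch, assuming you want to assemble and check a `jina_read_url` input payload before handing it to a workflow. The helper `build_read_url_input` is hypothetical and is not part of Sim or any Jina SDK; only the parameter names, types, and `returnFormat` values come from the table.

```python
# Hypothetical helper: not part of Sim or any Jina SDK.
# Option names and types are copied from the jina_read_url input table.
READ_URL_OPTIONS = {
    "useReaderLMv2": bool,
    "gatherLinks": bool,
    "jsonResponse": bool,
    "withImagesummary": bool,
    "retainImages": str,
    "returnFormat": str,
    "withIframe": bool,
    "withShadowDom": bool,
    "noCache": bool,
    "withGeneratedAlt": bool,
    "robotsTxt": str,
    "dnt": bool,
    "noGfm": bool,
}
RETURN_FORMATS = {"markdown", "html", "text", "screenshot", "pageshot"}


def build_read_url_input(url: str, api_key: str, **options) -> dict:
    """Assemble and validate an input payload for jina_read_url."""
    if not url or not api_key:
        raise ValueError("url and apiKey are required")
    payload = {"url": url, "apiKey": api_key}
    for name, value in options.items():
        expected = READ_URL_OPTIONS.get(name)
        if expected is None:
            raise KeyError(f"unknown option: {name}")
        if not isinstance(value, expected):
            raise TypeError(f"{name} must be {expected.__name__}")
        payload[name] = value
    fmt = payload.get("returnFormat")
    if fmt is not None and fmt not in RETURN_FORMATS:
        raise ValueError(f"unsupported returnFormat: {fmt}")
    return payload


# Placeholder URL and key, for illustration only.
example = build_read_url_input(
    "https://example.com/article",
    "YOUR_JINA_API_KEY",
    gatherLinks=True,
    returnFormat="markdown",
)
```

Validating option names up front catches typos such as a miscased parameter before a workflow run spends any tokens.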
Output
| Parameter | Type | Description |
|---|---|---|
| content | string | The extracted content from the URL, processed into clean, LLM-friendly text |
| links | array | List of links found on the page (when gatherLinks or withLinksummary is enabled) |
| images | array | List of images found on the page (when withImagesummary is enabled) |
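
Because `links` and `images` are only populated when the matching options are enabled, downstream code should treat them as optional. Here is a small sketch of that, assuming the tool output arrives as a dictionary with the three fields from the table. The function name and the sample data are made up for illustration, and the shape of each entry in `links` is an assumption.

```python
def summarize_read_output(output: dict) -> dict:
    """Summarize a jina_read_url output, tolerating missing optional fields."""
    content = output.get("content", "")
    links = output.get("links") or []    # absent unless link gathering is enabled
    images = output.get("images") or []  # absent unless withImagesummary is enabled
    return {
        "characters": len(content),
        "link_count": len(links),
        "image_count": len(images),
    }


# Illustrative sample; plain-string link entries are an assumption.
sample_output = {
    "content": "# Example Domain\n\nThis domain is for use in examples.",
    "links": ["https://www.iana.org/domains/example"],
}
summary = summarize_read_output(sample_output)
```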
jina_search
Search the web and return the top 5 results with LLM-friendly content. Each result is automatically processed through the Jina Reader API. Supports geographic filtering, site restrictions, and pagination.
Input
| Parameter | Type | Required | Description |
|---|---|---|---|
| q | string | Yes | Search query string |
| apiKey | string | Yes | Your Jina AI API key |
| num | number | No | Maximum number of results per page (default: 5) |
| site | string | No | Restrict results to specific domain(s). Can be comma-separated for multiple sites (e.g., "jina.ai,github.com") |
| withFavicon | boolean | No | Include website favicons in results |
| withImagesummary | boolean | No | Gather all images from result pages with metadata |
| withLinksummary | boolean | No | Gather all links from result pages |
| retainImages | string | No | Control image inclusion: "none" removes all, "all" keeps all |
| noCache | boolean | No | Bypass cached content for real-time retrieval |
| withGeneratedAlt | boolean | No | Generate alt text for images using VLM |
| respondWith | string | No | Set to "no-content" to get only metadata without page content |
| returnFormat | string | No | Output format: markdown, html, text, screenshot, or pageshot |
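
Two of the search parameters are easy to get wrong: `site` takes a single comma-separated string rather than a list, and `respondWith` expects the literal value "no-content". The sketch below shows one way to build a `jina_search` input payload, assuming a hypothetical helper `build_search_input` that is not part of Sim or any Jina SDK.

```python
from typing import Optional, Sequence


def build_search_input(
    q: str,
    api_key: str,
    sites: Optional[Sequence[str]] = None,
    num: Optional[int] = None,
    metadata_only: bool = False,
) -> dict:
    """Assemble an input payload for jina_search (hypothetical helper)."""
    if not q or not api_key:
        raise ValueError("q and apiKey are required")
    payload = {"q": q, "apiKey": api_key}
    if sites:
        # The table documents `site` as one comma-separated string.
        payload["site"] = ",".join(sites)
    if num is not None:
        payload["num"] = num
    if metadata_only:
        # "no-content" returns only metadata, without page content.
        payload["respondWith"] = "no-content"
    return payload


# Placeholder query and key, for illustration only.
search_input = build_search_input(
    "vector database benchmarks",
    "YOUR_JINA_API_KEY",
    sites=["jina.ai", "github.com"],
    metadata_only=True,
)
```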
Output
| Parameter | Type | Description |
|---|---|---|
| results | array | Array of search results, each containing title, description, url, and LLM-friendly content |
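
A common next step is to fold the search results into a single context string for a downstream agent block. The following is a rough sketch under stated assumptions: each result is taken to be a dictionary with the keys `title`, `description`, `url`, and `content` (the key name `content` is inferred from the description above), and the sample data is invented.

```python
def results_to_context(results: list, max_chars: int = 500) -> str:
    """Format search results as numbered blocks suitable for an LLM prompt."""
    blocks = []
    for index, result in enumerate(results, start=1):
        # Fall back to the description when no page content is present,
        # for example when respondWith is set to "no-content".
        body = result.get("content") or result.get("description", "")
        blocks.append(f"[{index}] {result['title']}\n{result['url']}\n{body[:max_chars]}")
    return "\n\n".join(blocks)


# Invented sample results, for illustration only.
sample_results = [
    {
        "title": "First page",
        "description": "Short description of the first page",
        "url": "https://example.com/first",
        "content": "Full extracted text of the first page.",
    },
    {
        "title": "Second page",
        "description": "Short description of the second page",
        "url": "https://example.com/second",
        "content": "",
    },
]
context = results_to_context(sample_results)
```

Truncating each body with `max_chars` keeps the combined prompt bounded when several long pages are returned.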
Notes
- Category: tools
- Type: jina