Jina

Jina AI is a content extraction tool that integrates with Sim to transform web content into clean, readable text. The integration lets developers add web content processing to their agentic workflows.

Jina AI Reader specializes in extracting the most relevant content from web pages, removing clutter, advertisements, and formatting issues to produce clean, structured text that's optimized for language models and other text processing tasks.

With the Jina AI integration in Sim, you can:

Extract clean content from any web page by simply providing a URL
Process complex web layouts into structured, readable text
Maintain important context while removing unnecessary elements
Prepare web content for further processing in your agent workflows
Streamline research tasks by quickly converting web information into usable data

This integration is particularly valuable for building agents that need to gather and process information from the web, conduct research, or analyze online content as part of their workflow.

`jina_read_url`

Input

Parameter	Type	Required	Description
`url`	string	Yes	The URL to read and convert to markdown (e.g., "https://example.com/page")
`useReaderLMv2`	boolean	No	Whether to use ReaderLM-v2 for better quality (3x token cost)
`gatherLinks`	boolean	No	Whether to gather all links at the end
`jsonResponse`	boolean	No	Whether to return response in JSON format
`apiKey`	string	Yes	Your Jina AI API key
`withImagesummary`	boolean	No	Gather all images from the page with metadata
`retainImages`	string	No	Control image inclusion: "none" removes all, "all" keeps all
`returnFormat`	string	No	Output format: markdown, html, text, screenshot, or pageshot
`withIframe`	boolean	No	Include iframe content in extraction
`withShadowDom`	boolean	No	Extract Shadow DOM content
`noCache`	boolean	No	Bypass cached content for real-time retrieval
`withGeneratedAlt`	boolean	No	Generate alt text for images using VLM
`robotsTxt`	string	No	Bot User-Agent for robots.txt checking
`dnt`	boolean	No	Do Not Track - prevents caching/tracking
`noGfm`	boolean	No	Disable GitHub Flavored Markdown

Output

Parameter	Type	Description
`content`	string	The extracted content from the URL, processed into clean, LLM-friendly text
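
As a rough illustration of how a few of the Reader tool's input parameters above might translate into a direct call to the Jina Reader HTTP endpoint, the sketch below builds a request without sending it. The helper name `build_read_request` and the parameter-to-header mapping are assumptions made for illustration; how Sim maps these parameters internally is not described here and may differ.

```python
from typing import Dict, Optional

# Jina Reader endpoint; the target URL is appended to this prefix.
READER_BASE = "https://r.jina.ai/"

# Output formats listed for `returnFormat` in the table above.
ALLOWED_FORMATS = {"markdown", "html", "text", "screenshot", "pageshot"}


def build_read_request(
    url: str,
    api_key: str,
    return_format: Optional[str] = None,
    no_cache: bool = False,
    json_response: bool = False,
    gather_links: bool = False,
) -> Dict[str, object]:
    """Hypothetical helper: build (but do not send) a Reader request."""
    if not url or not api_key:
        raise ValueError("url and apiKey are required")
    if return_format is not None and return_format not in ALLOWED_FORMATS:
        raise ValueError(f"returnFormat must be one of {sorted(ALLOWED_FORMATS)}")

    # Assumed mapping from tool parameters to Reader request headers.
    headers = {"Authorization": f"Bearer {api_key}"}
    if json_response:
        headers["Accept"] = "application/json"
    if return_format:
        headers["X-Return-Format"] = return_format
    if no_cache:
        headers["X-No-Cache"] = "true"
    if gather_links:
        headers["X-With-Links-Summary"] = "true"

    return {"method": "GET", "url": READER_BASE + url, "headers": headers}


request = build_read_request(
    "https://example.com/page",
    "YOUR_API_KEY",
    return_format="markdown",
    no_cache=True,
)
print(request["url"])  # https://r.jina.ai/https://example.com/page
```

The returned dictionary can be handed to any HTTP client. Keeping request construction separate from sending makes the mapping easy to inspect and test without network access.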

`jina_search`

Search the web and return the top results (5 by default) with LLM-friendly content. Each result is automatically processed through the Jina Reader API. Supports geographic filtering, site restrictions, and pagination.

Input

Parameter	Type	Required	Description
`q`	string	Yes	Search query string (e.g., "machine learning tutorials")
`apiKey`	string	Yes	Your Jina AI API key
`num`	number	No	Maximum number of results per page (default: 5)
`site`	string	No	Restrict results to specific domain(s). Can be comma-separated for multiple sites (e.g., "jina.ai,github.com")
`withFavicon`	boolean	No	Include website favicons in results
`withImagesummary`	boolean	No	Gather all images from result pages with metadata
`withLinksummary`	boolean	No	Gather all links from result pages
`retainImages`	string	No	Control image inclusion: "none" removes all, "all" keeps all
`noCache`	boolean	No	Bypass cached content for real-time retrieval
`withGeneratedAlt`	boolean	No	Generate alt text for images using VLM
`respondWith`	string	No	Set to "no-content" to get only metadata without page content
`returnFormat`	string	No	Output format: markdown, html, text, screenshot, or pageshot
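
The input table above can be turned into a small parameter builder. The sketch below is a hypothetical helper (`build_search_params` is not part of Sim or Jina) that assembles a `jina_search` input using the parameter names from the table, joins multiple domains into the comma-separated `site` form, and omits optional values that were not set.

```python
from typing import Any, Dict, Iterable, Optional, Union


def build_search_params(
    q: str,
    api_key: str,
    num: Optional[int] = None,
    site: Optional[Union[str, Iterable[str]]] = None,
    metadata_only: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble a jina_search input dictionary."""
    if not q or not q.strip():
        raise ValueError("q is required")
    if not api_key:
        raise ValueError("apiKey is required")

    params: Dict[str, Any] = {"q": q, "apiKey": api_key}

    if num is not None:
        if num < 1:
            raise ValueError("num must be a positive integer")
        params["num"] = num

    if site:
        # Accept a single domain or several; several are comma-separated.
        domains = [site] if isinstance(site, str) else list(site)
        params["site"] = ",".join(d.strip() for d in domains)

    if metadata_only:
        # "no-content" returns only metadata without page content.
        params["respondWith"] = "no-content"

    return params


params = build_search_params(
    "machine learning tutorials",
    "YOUR_API_KEY",
    site=["jina.ai", "github.com"],
    metadata_only=True,
)
print(params["site"])  # jina.ai,github.com
```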

Output

Parameter	Type	Description
`results`	array	Array of search results, each containing title, description, url, and LLM-friendly content
↳ `title`	string	Page title
↳ `description`	string	Page description or meta description
↳ `url`	string	Page URL
↳ `content`	string	LLM-friendly extracted content
↳ `usage`	object	Token usage information
↳ `tokens`	number	Number of tokens consumed by this request
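
A downstream block will often need to condense the `results` array into prompt context. The following sketch shows one way to do that, using only the `title`, `url`, and `content` fields from the output table. The helper name `format_results` and the sample data are illustrative assumptions, not real search output.

```python
from typing import Dict, List


def format_results(results: List[Dict[str, str]], max_chars: int = 200) -> str:
    """Hypothetical helper: render results as a compact, numbered context block."""
    blocks = []
    for index, result in enumerate(results, start=1):
        content = result.get("content", "")
        # Truncate long page content and mark the cut with an ellipsis.
        snippet = content[:max_chars] + ("..." if len(content) > max_chars else "")
        blocks.append(
            f"[{index}] {result.get('title', '')}\n{result.get('url', '')}\n{snippet}"
        )
    return "\n\n".join(blocks)


# Illustrative data only, shaped like the documented output.
sample = [
    {
        "title": "Example A",
        "description": "First page",
        "url": "https://example.com/a",
        "content": "Alpha " * 50,
    },
    {
        "title": "Example B",
        "description": "Second page",
        "url": "https://example.com/b",
        "content": "Short text",
    },
]
print(format_results(sample, max_chars=20))
```

Numbering each entry and keeping the URL next to the snippet lets an agent cite its sources, and the `max_chars` cap keeps token usage predictable when pages are long.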
