
Firecrawl

Scrape, search, crawl, map, and extract web data

Firecrawl is a web scraping and content extraction API that integrates with Sim, letting you pull clean, structured content from websites. It converts web pages into usable formats such as Markdown and HTML while preserving their essential content.

With Firecrawl in Sim, you can:

  • Extract clean content: Remove ads, navigation elements, and other distractions to get just the main content
  • Convert to structured formats: Transform web pages into Markdown, HTML, or JSON
  • Capture metadata: Extract SEO metadata, Open Graph tags, and other page information
  • Handle JavaScript-heavy sites: Process content from modern web applications that rely on JavaScript
  • Filter content: Focus on specific parts of a page using CSS selectors
  • Process at scale: Handle high-volume scraping needs with a reliable API
  • Search the web: Perform intelligent web searches and retrieve structured results
  • Crawl entire sites: Crawl multiple pages from a website and aggregate their content

In Sim, the Firecrawl integration enables your agents to access and process web content programmatically as part of their workflows. Supported operations include:

  • Scrape: Extract structured content (Markdown, HTML, metadata) from a single web page.
  • Search: Search the web for information using Firecrawl's intelligent search capabilities.
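  • Crawl: Crawl multiple pages from a website, returning structured content and metadata for each page.
  • Map: Discover the URLs on a website without scraping their content.
  • Extract: Pull structured data from one or more pages using a natural language prompt, a JSON Schema, or both.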

This lets your agents gather information from websites, extract structured data, and use it to make decisions or generate insights, without dealing with raw HTML parsing or browser automation. To use it, configure the Firecrawl block with your API key, select the operation (Scrape, Search, Crawl, Map, or Extract), and provide the relevant parameters. Your agents can then work with web content in a clean, structured format.
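Before running a block, it can help to check that every parameter the tool tables mark as required is actually present. The minimal Python sketch below does that. The operation keys (`scrape`, `search`, `crawl`, `map`, `extract`) and the `missing_params` helper are hypothetical names chosen for illustration; they are not part of Sim's API.

```python
from typing import Any, Dict, List

# Required inputs per operation, copied from the "Required: Yes" rows of the
# tool tables on this page. The operation keys are illustrative names only.
REQUIRED_PARAMS = {
    "scrape": {"url", "apiKey"},
    "search": {"query", "apiKey"},
    "crawl": {"url", "apiKey"},
    "map": {"url", "apiKey"},
    "extract": {"urls", "apiKey"},
}


def missing_params(operation: str, params: Dict[str, Any]) -> List[str]:
    """Return the sorted names of required parameters that are absent or empty."""
    if operation not in REQUIRED_PARAMS:
        raise ValueError(f"unknown operation: {operation!r}")
    return sorted(
        name
        for name in REQUIRED_PARAMS[operation]
        if params.get(name) in (None, "", [])
    )


# Example: a Search block configured without a query.
print(missing_params("search", {"apiKey": "fc-YOUR-KEY"}))  # ['query']
```

Treating empty strings and empty lists as missing catches the common case of a block field that was left blank rather than omitted.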

Usage Instructions

Integrate Firecrawl into your workflow. Scrape pages, search the web, crawl entire sites, map URL structures, and extract structured data with AI.

Tools

firecrawl_scrape

Extract structured content from web pages with comprehensive metadata support. Converts content to markdown or HTML while capturing SEO metadata, Open Graph tags, and page information.

Input

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | The URL to scrape content from |
| scrapeOptions | json | No | Options for content scraping |
| apiKey | string | Yes | Firecrawl API key |

Output

| Parameter | Type | Description |
| --- | --- | --- |
| markdown | string | Page content in markdown format |
| html | string | Raw HTML content of the page |
| metadata | object | Page metadata including SEO and Open Graph information |
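The Scrape tool wraps a single HTTP call. As a rough sketch, assume Firecrawl's public v1 REST endpoint (`https://api.firecrawl.dev/v1/scrape`), a bearer-token `Authorization` header, and a response shaped like `{"success": ..., "data": {...}}`. Under those assumptions, building the request and mapping the response onto the three outputs above could look like this. The helper names are hypothetical, and Sim's block may call a different API version or pass `scrapeOptions` differently.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Assumption: Firecrawl's public v1 scrape endpoint; Sim may target another version.
SCRAPE_ENDPOINT = "https://api.firecrawl.dev/v1/scrape"


def build_scrape_request(
    url: str, api_key: str, scrape_options: Optional[Dict[str, Any]] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for Markdown and HTML."""
    body: Dict[str, Any] = {"url": url, "formats": ["markdown", "html"]}
    # Assumption: extra scrape options are sent as top-level body fields.
    body.update(scrape_options or {})
    return urllib.request.Request(
        SCRAPE_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_scrape_response(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Map an assumed {"success", "data"} response onto markdown, html, metadata."""
    data = payload.get("data") or {}
    return {
        "markdown": data.get("markdown", ""),
        "html": data.get("html", ""),
        "metadata": data.get("metadata", {}),
    }
```

Sending the request with `urllib.request.urlopen(req)` and passing `json.load(response)` to `parse_scrape_response` would complete the round trip. Error handling and retries are left out for brevity.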

Search for information on the web using Firecrawl

Input

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | Yes | The search query to use |
| apiKey | string | Yes | Firecrawl API key |

Output

| Parameter | Type | Description |
| --- | --- | --- |
| data | array | Search results data |
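Search returns its results as a single `data` array. A common next step is condensing that array into text an agent can reason over. The sketch below assumes each result is an object with a `url` field and an optional `title` field. Those field names are an assumption about Firecrawl's response, and `summarize_results` is a hypothetical helper.

```python
from typing import Any, Dict, List


def summarize_results(data: List[Dict[str, Any]], max_items: int = 3) -> str:
    """Render the first few search results as a numbered list of titles and URLs."""
    lines = []
    for i, item in enumerate(data[:max_items], start=1):
        url = item.get("url", "")
        # Fall back to the URL when a result has no title (assumed field names).
        title = item.get("title") or url or "(untitled)"
        lines.append(f"{i}. {title} <{url}>")
    return "\n".join(lines)


results = [
    {"url": "https://example.com/a", "title": "Page A"},
    {"url": "https://example.com/b"},
]
print(summarize_results(results))
```

Capping the list with `max_items` keeps the text passed to a downstream agent short, which matters when search results feed into a prompt.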

firecrawl_crawl

Crawl entire websites and extract structured content from all accessible pages

Input

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | The website URL to crawl |
| limit | number | No | Maximum number of pages to crawl (default: 100) |
| onlyMainContent | boolean | No | Extract only main content from pages |
| apiKey | string | Yes | Firecrawl API key |

Output

| Parameter | Type | Description |
| --- | --- | --- |
| pages | array | Array of crawled pages with their content and metadata |
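Crawling many pages takes time, so Firecrawl's REST API runs crawls asynchronously as jobs: a start request returns a job ID, and the client polls the job's status until the pages are ready. This page does not say whether Sim's block polls this way internally, so treat the following as a sketch under that assumption. `get_status` is a hypothetical callable standing in for one status request, and the `"status"`, `"completed"`, `"failed"`, and `"data"` names are assumed response fields.

```python
import time
from typing import Any, Callable, Dict, List


def wait_for_crawl(
    get_status: Callable[[], Dict[str, Any]],
    poll_interval: float = 2.0,
    max_polls: int = 60,
) -> List[Dict[str, Any]]:
    """Poll a crawl job until it completes, then return its list of pages.

    get_status is a hypothetical callable that performs one status request and
    returns the decoded JSON, assumed to look like {"status": ..., "data": [...]}.
    """
    for _ in range(max_polls):
        job = get_status()
        state = job.get("status")
        if state == "completed":
            return job.get("data", [])
        if state == "failed":
            raise RuntimeError("crawl job failed")
        # Any other state (e.g. still scraping): wait before asking again.
        time.sleep(poll_interval)
    raise TimeoutError(f"crawl not finished after {max_polls} status checks")


# Usage with canned responses in place of real HTTP calls.
canned = iter([
    {"status": "scraping"},
    {"status": "completed", "data": [{"markdown": "# Home", "metadata": {}}]},
])
pages = wait_for_crawl(lambda: next(canned), poll_interval=0)
print(len(pages))  # 1
```

Passing the status request in as a callable keeps the polling loop testable without network access, and `max_polls` bounds how long a workflow step can wait.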

firecrawl_map

Get a list of the URLs on a website, discovered from its sitemap and links. Useful for finding the pages on a site without crawling them.

Input

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | The base URL to map and discover links from |
| search | string | No | Filter results by relevance to a search term (e.g., "blog") |
| sitemap | string | No | Controls sitemap usage: "skip", "include" (default), or "only" |
| includeSubdomains | boolean | No | Whether to include URLs from subdomains (default: true) |
| ignoreQueryParameters | boolean | No | Exclude URLs containing query strings (default: true) |
| limit | number | No | Maximum number of links to return (max: 100,000, default: 5,000) |
| timeout | number | No | Request timeout in milliseconds |
| location | json | No | Geographic context for proxying (country, languages) |
| apiKey | string | Yes | Firecrawl API key |

Output

| Parameter | Type | Description |
| --- | --- | --- |
| success | boolean | Whether the mapping operation was successful |
| links | array | Array of discovered URLs from the website |
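The defaults and bounds in the Map input table can be captured in a small request builder. This is a minimal sketch. `build_map_body` is a hypothetical helper, and it assumes that the request body uses the same parameter names as the table and that `limit` must be at least 1. It only builds the body and does not send anything.

```python
from typing import Any, Dict, Optional

VALID_SITEMAP_MODES = {"skip", "include", "only"}
MAX_MAP_LIMIT = 100_000


def build_map_body(
    url: str,
    search: Optional[str] = None,
    sitemap: str = "include",
    include_subdomains: bool = True,
    ignore_query_parameters: bool = True,
    limit: int = 5_000,
) -> Dict[str, Any]:
    """Assemble a map request body, applying the documented defaults and bounds."""
    if sitemap not in VALID_SITEMAP_MODES:
        raise ValueError(f"sitemap must be one of {sorted(VALID_SITEMAP_MODES)}")
    if not 1 <= limit <= MAX_MAP_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_MAP_LIMIT}")
    body: Dict[str, Any] = {
        "url": url,
        "sitemap": sitemap,
        "includeSubdomains": include_subdomains,
        "ignoreQueryParameters": ignore_query_parameters,
        "limit": limit,
    }
    # Only send the optional relevance filter when one was given.
    if search:
        body["search"] = search
    return body


print(build_map_body("https://docs.example.com", search="blog", sitemap="only"))
```

Rejecting an unknown `sitemap` mode or an out-of-range `limit` up front gives a clearer error than waiting for the API to refuse the request.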

firecrawl_extract

Extract structured data from entire webpages using a natural language prompt, a JSON Schema, or both. An AI model interprets the pages and returns data in the requested structure.

Input

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| urls | json | Yes | Array of URLs to extract data from (supports glob format) |
| prompt | string | No | Natural language guidance for the extraction process |
| schema | json | No | JSON Schema defining the structure of data to extract |
| enableWebSearch | boolean | No | Enable web search to find supplementary information (default: false) |
| ignoreSitemap | boolean | No | Ignore sitemap.xml files during scanning (default: false) |
| includeSubdomains | boolean | No | Extend scanning to subdomains (default: true) |
| showSources | boolean | No | Return data sources in the response (default: false) |
| ignoreInvalidURLs | boolean | No | Skip invalid URLs in the array (default: true) |
| scrapeOptions | json | No | Advanced scraping configuration options |
| apiKey | string | Yes | Firecrawl API key |

Output

| Parameter | Type | Description |
| --- | --- | --- |
| success | boolean | Whether the extraction operation was successful |
| data | object | Extracted structured data according to the schema or prompt |
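A JSON Schema gives Extract a precise target. The sketch below pairs a hypothetical product schema with a request-body builder and a shallow check that the returned `data` contains the schema's required keys. The helper names, the schema, and the URL are illustrative. Requiring at least one of `prompt` or `schema` is this sketch's own choice, because the table marks both as optional.

```python
from typing import Any, Dict, List, Optional

# Illustrative schema: the fields we want from each product page.
PRODUCT_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "inStock": {"type": "boolean"},
    },
    "required": ["name", "price"],
}


def build_extract_body(
    urls: List[str],
    prompt: Optional[str] = None,
    schema: Optional[Dict[str, Any]] = None,
    show_sources: bool = False,
) -> Dict[str, Any]:
    """Assemble an extract request body using the table's parameter names."""
    if not urls:
        raise ValueError("urls must contain at least one URL or glob pattern")
    if not prompt and not schema:
        # Sketch-specific rule: give the extractor some guidance.
        raise ValueError("provide a prompt, a schema, or both")
    body: Dict[str, Any] = {"urls": urls, "showSources": show_sources}
    if prompt:
        body["prompt"] = prompt
    if schema:
        body["schema"] = schema
    return body


def missing_required(data: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return schema-required top-level keys absent from the extracted data."""
    return [key for key in schema.get("required", []) if key not in data]


body = build_extract_body(["https://shop.example.com/products/*"], schema=PRODUCT_SCHEMA)
print(missing_required({"name": "Mug"}, PRODUCT_SCHEMA))  # ['price']
```

`missing_required` only inspects top-level keys. A full validator, such as one implementing the JSON Schema specification, would also check types and nested objects.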

Notes

  • Category: tools
  • Type: firecrawl