Context.dev

Context.dev is a web data API that scrapes, crawls, searches, and extracts data from the web, and resolves brand and company data from a domain, name, email, ticker, or transaction descriptor.

With Context.dev, you can:

Scrape and crawl pages: Convert URLs to clean markdown or HTML, capture screenshots, discover images, crawl entire sites, and map sitemaps
Search the web: Run natural language searches with domain filters and optional markdown scraping of results
Extract structured data: Pull data matching a JSON schema, or detect and extract product details and catalogs from a page or domain
Analyze brand and design data: Extract a domain's fonts and design system, classify a brand into NAICS/SIC industry codes, and resolve brand data (logos, colors, socials, address) from a domain, company name, email, ticker, or transaction descriptor

In Sim, the Context.dev integration lets your agents scrape and crawl web pages into markdown or HTML, capture screenshots, search the web, extract structured data and product information, pull a site's fonts and style guide, classify a brand's industry, and look up brand assets and company details by domain, name, email, ticker, or transaction descriptor. All of these operations are available as tools in your workflow and share one API key.

Usage Instructions

Integrate Context.dev into the workflow. Scrape pages to markdown or HTML, capture screenshots, list images, crawl entire sites, map sitemaps, search the web, extract structured data and products, pull design systems, classify industries, and retrieve brand assets by domain, name, email, ticker, or transaction descriptor, all from one API.

Parameter	Type	Required	Description
`url`	string	Yes	The full URL to scrape (must include http:// or https://)
`useMainContentOnly`	boolean	No	Return only main content, excluding headers, footers, and navigation
`includeLinks`	boolean	No	Preserve hyperlinks in the markdown output (default: true)
`includeImages`	boolean	No	Include image references in the markdown output (default: false)
`includeFrames`	boolean	No	Render iframe contents inline (default: false)
`maxAgeMs`	number	No	Cache duration in milliseconds (0-2592000000, default: 86400000)
`waitForMs`	number	No	Browser wait time after page load in milliseconds (0-30000)
`timeoutMS`	number	No	Request timeout in milliseconds (1000-300000)
`apiKey`	string	Yes	Context.dev API key
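
The ranges and required fields in the table above can be checked before a request is sent. The following is a minimal sketch, assuming a hypothetical helper named `build_scrape_params` that only validates and assembles the inputs; it does not call Context.dev, and the helper, its error messages, and the placeholder key are illustrative rather than part of the integration.

```python
from urllib.parse import urlparse

# Ranges copied from the parameter table above; everything else is illustrative.
RANGES_MS = {
    "maxAgeMs": (0, 2_592_000_000),
    "waitForMs": (0, 30_000),
    "timeoutMS": (1_000, 300_000),
}
BOOLEAN_FLAGS = ("useMainContentOnly", "includeLinks", "includeImages", "includeFrames")


def build_scrape_params(url, api_key, **options):
    """Hypothetical helper: validate inputs and return a plain dict of parameters."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("url must include http:// or https://")
    if not api_key:
        raise ValueError("apiKey is required")

    params = {"url": url, "apiKey": api_key}
    for name, value in options.items():
        if name in RANGES_MS:
            low, high = RANGES_MS[name]
            # bool is a subclass of int in Python, so reject it explicitly
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError(f"{name} must be a number")
            if not low <= value <= high:
                raise ValueError(f"{name} must be between {low} and {high}")
        elif name in BOOLEAN_FLAGS:
            if not isinstance(value, bool):
                raise ValueError(f"{name} must be a boolean")
        else:
            raise ValueError(f"unknown parameter: {name}")
        params[name] = value
    return params


params = build_scrape_params(
    "https://example.com/pricing",
    "YOUR_API_KEY",  # placeholder, not a real key
    useMainContentOnly=True,
    maxAgeMs=0,
    timeoutMS=60_000,
)
```

Optional parameters that are not passed are left out of the dict, so the documented defaults (for example `includeLinks` defaulting to true) still apply on the service side.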

Parameter	Type	Description
`html`	string	Raw HTML content of the page
`url`	string	The scraped URL
`type`	string	Detected content type (html, xml, json, text, csv, markdown, svg, pdf, doc, docx)

Parameter	Type	Description
`success`	boolean	Whether the scrape succeeded
`images`	array	Discovered image assets with source, element, type, and optional enrichment
↳ `src`	string	Image source URL or data
↳ `element`	string	Source element (img, svg, link, source, video, css, object, meta, background)
↳ `type`	string	Image representation (url, html, base64)
↳ `alt`	string	Alt text
↳ `enrichment`	json	Optional enrichment (width, height, mimetype, url, type) when requested
`url`	string	The scraped URL
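
Because each entry in `images` carries both an `element` and a `type`, downstream steps often need to filter before using the results. The sketch below assumes a hypothetical sample shaped like the table above (the URLs, alt text, and enrichment values are invented) and shows one way to keep only images represented as URLs and to count entries by source element.

```python
from collections import Counter

# Hypothetical sample shaped like the output table; all values are invented.
sample_output = {
    "success": True,
    "url": "https://example.com",
    "images": [
        {"src": "https://example.com/logo.png", "element": "img", "type": "url", "alt": "Logo"},
        {"src": "<svg viewBox='0 0 1 1'></svg>", "element": "svg", "type": "html", "alt": ""},
        {
            "src": "https://example.com/hero.jpg",
            "element": "css",
            "type": "url",
            "alt": "",
            "enrichment": {"width": 1600, "height": 900, "mimetype": "image/jpeg"},
        },
    ],
}


def downloadable_images(output):
    """Return images whose `type` is "url"; an unsuccessful scrape yields an empty list."""
    if not output.get("success"):
        return []
    return [img for img in output.get("images", []) if img.get("type") == "url"]


def count_by_element(output):
    """Count images per source element (img, svg, css, ...)."""
    return Counter(img.get("element") for img in output.get("images", []))


urls = [img["src"] for img in downloadable_images(sample_output)]
by_element = count_by_element(sample_output)
```

Entries with `type` set to `html` or `base64` hold inline data in `src` rather than a fetchable address, which is why this sketch filters on `type` instead of inspecting `src`.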

Parameter	Type	Description
`file`	file	Stored screenshot image file
`screenshotUrl`	string	Public URL of the captured screenshot
`screenshotType`	string	Screenshot type (viewport or fullPage)
`domain`	string	Domain that was captured
`width`	number	Screenshot width in pixels
`height`	number	Screenshot height in pixels

Parameter	Type	Description
`results`	array	Crawled pages with markdown content and per-page metadata
↳ `markdown`	string	Page content as markdown
↳ `metadata`	json	Page metadata (url, title, crawlDepth, statusCode)
`metadata`	object	Crawl summary (numUrls, maxCrawlDepth, numSucceeded, numFailed, numSkipped)
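
The crawl output pairs a per-page `metadata` object with a top-level summary of the same name. As a rough illustration, assuming a hypothetical response shaped like the table above (page contents and counts are invented), the following groups crawled URLs by `crawlDepth` and formats the summary counters.

```python
# Hypothetical sample shaped like the output table; all values are invented.
sample_crawl = {
    "results": [
        {
            "markdown": "# Home",
            "metadata": {"url": "https://example.com/", "title": "Home",
                         "crawlDepth": 0, "statusCode": 200},
        },
        {
            "markdown": "# Docs",
            "metadata": {"url": "https://example.com/docs", "title": "Docs",
                         "crawlDepth": 1, "statusCode": 200},
        },
    ],
    "metadata": {"numUrls": 3, "maxCrawlDepth": 1, "numSucceeded": 2,
                 "numFailed": 1, "numSkipped": 0},
}


def pages_by_depth(crawl):
    """Group page URLs by the crawlDepth reported in each page's metadata."""
    grouped = {}
    for page in crawl.get("results", []):
        meta = page.get("metadata", {})
        grouped.setdefault(meta.get("crawlDepth"), []).append(meta.get("url"))
    return grouped


def describe(crawl):
    """Format the top-level summary counters as one line."""
    m = crawl.get("metadata", {})
    return (
        f"{m.get('numSucceeded', 0)} succeeded, {m.get('numFailed', 0)} failed, "
        f"{m.get('numSkipped', 0)} skipped of {m.get('numUrls', 0)} URLs"
    )


depths = pages_by_depth(sample_crawl)
summary = describe(sample_crawl)
```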
