Databricks

Run SQL queries and manage jobs on Databricks

Databricks is a unified data analytics platform built on Apache Spark, providing a collaborative environment for data engineering, data science, and machine learning. Databricks combines data warehousing, ETL, and AI workloads into a single lakehouse architecture, with support for SQL analytics, job orchestration, and cluster management across major cloud providers.

With the Databricks integration in Sim, you can:

  • Execute SQL queries: Run SQL statements against Databricks SQL warehouses with support for parameterized queries and Unity Catalog
  • Manage jobs: List, trigger, and monitor Databricks job runs programmatically
  • Track run status: Get detailed run information including timing, state, and output results
  • Control clusters: List and inspect cluster configurations, states, and resource details
  • Retrieve run outputs: Access notebook results, error messages, and logs from completed job runs

In Sim, the Databricks integration enables your agents to interact with your data lakehouse as part of automated workflows. Agents can query large-scale datasets, orchestrate ETL pipelines by triggering jobs, monitor job execution, and retrieve results—all without leaving the workflow canvas. This is ideal for automated reporting, data pipeline management, scheduled analytics, and building AI-driven data workflows that react to query results or job outcomes.

Usage Instructions

Connect to Databricks to execute SQL queries against SQL warehouses, trigger and monitor job runs, manage clusters, and retrieve run outputs. Requires a Personal Access Token and workspace host URL.

Tools

databricks_execute_sql

Execute a SQL statement against a Databricks SQL warehouse and return results inline. Supports parameterized queries and Unity Catalog.

Input

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| host | string | Yes | Databricks workspace host (e.g., dbc-abc123.cloud.databricks.com) |
| apiKey | string | Yes | Databricks Personal Access Token |
| warehouseId | string | Yes | The ID of the SQL warehouse to execute against |
| statement | string | Yes | The SQL statement to execute (max 16 MiB) |
| catalog | string | No | Unity Catalog name (equivalent to USE CATALOG) |
| schema | string | No | Schema name (equivalent to USE SCHEMA) |
| rowLimit | number | No | Maximum number of rows to return |
| waitTimeout | string | No | How long to wait for results (e.g., "50s"). Range: "0s" or "5s" to "50s". Default: "50s" |

Output

| Parameter | Type | Description |
| --- | --- | --- |
| statementId | string | Unique identifier for the executed statement |
| status | string | Execution status (SUCCEEDED, PENDING, RUNNING, FAILED, CANCELED, CLOSED) |
| columns | array | Column schema of the result set |
| columns[].name | string | Column name |
| columns[].position | number | Column position (0-based) |
| columns[].typeName | string | Column type (STRING, INT, LONG, DOUBLE, BOOLEAN, TIMESTAMP, DATE, DECIMAL, etc.) |
| data | array | Result rows as a 2D array of strings, where each inner array is one row of column values |
| totalRows | number | Total number of rows in the result |
| truncated | boolean | Whether the result set was truncated due to row_limit or byte_limit |
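Internally this corresponds to the Databricks SQL Statement Execution API (POST /api/2.0/sql/statements), whose JSON body uses snake_case field names. A minimal sketch of how such a request could be assembled — the host, token, and warehouse ID below are placeholders:

```python
def build_execute_sql_request(host, api_key, warehouse_id, statement,
                              catalog=None, schema=None,
                              row_limit=None, wait_timeout="50s"):
    """Build the HTTP request for the SQL Statement Execution API.

    Returns (url, headers, body); optional fields are omitted when unset.
    """
    body = {
        "warehouse_id": warehouse_id,
        "statement": statement,
        "wait_timeout": wait_timeout,   # "0s", or "5s" through "50s"
    }
    if catalog:
        body["catalog"] = catalog       # equivalent to USE CATALOG
    if schema:
        body["schema"] = schema         # equivalent to USE SCHEMA
    if row_limit:
        body["row_limit"] = row_limit
    url = f"https://{host}/api/2.0/sql/statements"
    headers = {"Authorization": f"Bearer {api_key}",
               "Content-Type": "application/json"}
    return url, headers, body

# Example: a query scoped to a Unity Catalog schema, capped at 100 rows
url, headers, body = build_execute_sql_request(
    "dbc-abc123.cloud.databricks.com", "<token>", "abc123def456",
    "SELECT * FROM sales LIMIT 10",
    catalog="main", schema="analytics", row_limit=100)
```

Sending the request (e.g., with requests.post) and reading data/columns from the response then follows the output schema above.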

databricks_list_jobs

List all jobs in a Databricks workspace with optional filtering by name.

Input

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| host | string | Yes | Databricks workspace host (e.g., dbc-abc123.cloud.databricks.com) |
| apiKey | string | Yes | Databricks Personal Access Token |
| limit | number | No | Maximum number of jobs to return (range 1-100, default 20) |
| offset | number | No | Offset for pagination |
| name | string | No | Filter jobs by exact name (case-insensitive) |
| expandTasks | boolean | No | Include task and cluster details in the response (max 100 elements) |

Output

| Parameter | Type | Description |
| --- | --- | --- |
| jobs | array | List of jobs in the workspace |
| jobs[].jobId | number | Unique job identifier |
| jobs[].name | string | Job name |
| jobs[].createdTime | number | Job creation timestamp (epoch ms) |
| jobs[].creatorUserName | string | Email of the job creator |
| jobs[].maxConcurrentRuns | number | Maximum number of concurrent runs |
| jobs[].format | string | Job format (SINGLE_TASK or MULTI_TASK) |
| hasMore | boolean | Whether more jobs are available for pagination |
| nextPageToken | string | Token for fetching the next page of results |
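The hasMore/nextPageToken pair supports a standard pagination loop. A hedged sketch with an injectable fetch function — the fake_fetch stub below stands in for the real tool call:

```python
def list_all_jobs(fetch_page, limit=20):
    """Collect all jobs by following nextPageToken until hasMore is false.

    `fetch_page(limit, page_token)` performs one databricks_list_jobs call
    and returns its output dict: {"jobs": [...], "hasMore": ..., "nextPageToken": ...}.
    """
    jobs, token = [], None
    while True:
        page = fetch_page(limit=limit, page_token=token)
        jobs.extend(page.get("jobs", []))
        if not page.get("hasMore"):
            return jobs
        token = page.get("nextPageToken")

# Stand-in for the real tool call: two pages of fake jobs
def fake_fetch(limit, page_token):
    if page_token is None:
        return {"jobs": [{"jobId": 1, "name": "etl"}],
                "hasMore": True, "nextPageToken": "p2"}
    return {"jobs": [{"jobId": 2, "name": "report"}], "hasMore": False}

all_jobs = list_all_jobs(fake_fetch)  # → jobs 1 and 2
```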

databricks_run_job

Trigger an existing Databricks job to run immediately with optional job-level or notebook parameters.

Input

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| host | string | Yes | Databricks workspace host (e.g., dbc-abc123.cloud.databricks.com) |
| apiKey | string | Yes | Databricks Personal Access Token |
| jobId | number | Yes | The ID of the job to trigger |
| jobParameters | string | No | Job-level parameter overrides as a JSON object (e.g., {"key": "value"}) |
| notebookParams | string | No | Notebook task parameters as a JSON object (e.g., {"param1": "value1"}) |
| idempotencyToken | string | No | Idempotency token to prevent duplicate runs (max 64 characters) |

Output

| Parameter | Type | Description |
| --- | --- | --- |
| runId | number | The globally unique ID of the triggered run |
| numberInJob | number | The sequence number of this run among all runs of the job |
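Because triggering is not idempotent by default, passing an idempotencyToken makes retries safe: Databricks returns the existing run instead of starting a duplicate. A sketch of how a run-now request could be assembled (field names follow the Jobs REST API's POST /api/2.1/jobs/run-now; the host and job ID are placeholders):

```python
import json
import uuid

def build_run_now_request(host, job_id, job_parameters=None,
                          notebook_params=None, idempotency_token=None):
    """Build the body for POST /api/2.1/jobs/run-now.

    The tool accepts parameters as JSON strings, so they are parsed here.
    """
    body = {"job_id": job_id}
    if job_parameters:
        body["job_parameters"] = json.loads(job_parameters)
    if notebook_params:
        body["notebook_params"] = json.loads(notebook_params)
    # Reusing the same token across retries prevents duplicate runs.
    body["idempotency_token"] = idempotency_token or uuid.uuid4().hex[:64]
    return f"https://{host}/api/2.1/jobs/run-now", body

url, body = build_run_now_request(
    "dbc-abc123.cloud.databricks.com", 123,
    notebook_params='{"date": "2024-01-01"}',
    idempotency_token="nightly-2024-01-01")
```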

databricks_get_run

Get the status, timing, and details of a Databricks job run by its run ID.

Input

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| host | string | Yes | Databricks workspace host (e.g., dbc-abc123.cloud.databricks.com) |
| apiKey | string | Yes | Databricks Personal Access Token |
| runId | number | Yes | The canonical identifier of the run |
| includeHistory | boolean | No | Include repair history in the response |
| includeResolvedValues | boolean | No | Include resolved parameter values in the response |

Output

| Parameter | Type | Description |
| --- | --- | --- |
| runId | number | The run ID |
| jobId | number | The job ID this run belongs to |
| runName | string | Name of the run |
| runType | string | Type of run (JOB_RUN, WORKFLOW_RUN, SUBMIT_RUN) |
| attemptNumber | number | Retry attempt number (0 for the initial attempt) |
| state | object | Run state information |
| state.lifeCycleState | string | Lifecycle state (QUEUED, PENDING, RUNNING, TERMINATING, TERMINATED, SKIPPED, INTERNAL_ERROR, BLOCKED, WAITING_FOR_RETRY) |
| state.resultState | string | Result state (SUCCESS, FAILED, TIMEDOUT, CANCELED, SUCCESS_WITH_FAILURES, UPSTREAM_FAILED, UPSTREAM_CANCELED, EXCLUDED) |
| state.stateMessage | string | Descriptive message for the current state |
| state.userCancelledOrTimedout | boolean | Whether the run was cancelled by a user or timed out |
| startTime | number | Run start timestamp (epoch ms) |
| endTime | number | Run end timestamp (epoch ms, 0 if still running) |
| setupDuration | number | Cluster setup duration (ms) |
| executionDuration | number | Execution duration (ms) |
| cleanupDuration | number | Cleanup duration (ms) |
| queueDuration | number | Time spent in queue before execution (ms) |
| runPageUrl | string | URL to the run detail page in the Databricks UI |
| creatorUserName | string | Email of the user who triggered the run |
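A common pattern is to trigger a job and then poll this tool until the run reaches a terminal lifecycle state. A sketch with an injectable get_run callable — the stub below stands in for the real tool call:

```python
import time

# Lifecycle states after which the run will not change again
TERMINAL = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}

def wait_for_run(get_run, run_id, poll_seconds=10,
                 timeout_seconds=3600, sleep=time.sleep):
    """Poll databricks_get_run until the run reaches a terminal state.

    `get_run(run_id)` returns the tool's output dict; `sleep` is
    injectable so tests can skip the real delay.
    """
    waited = 0
    while True:
        run = get_run(run_id)
        if run["state"]["lifeCycleState"] in TERMINAL:
            return run
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"run {run_id} still {run['state']['lifeCycleState']}")
        sleep(poll_seconds)
        waited += poll_seconds

# Stand-in responses: RUNNING once, then TERMINATED with SUCCESS
states = iter([
    {"state": {"lifeCycleState": "RUNNING"}},
    {"state": {"lifeCycleState": "TERMINATED", "resultState": "SUCCESS"}},
])
final = wait_for_run(lambda rid: next(states), 42, sleep=lambda s: None)
```

Note that resultState is only meaningful once lifeCycleState is terminal.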

databricks_list_runs

List job runs in a Databricks workspace with optional filtering by job, status, and time range.

Input

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| host | string | Yes | Databricks workspace host (e.g., dbc-abc123.cloud.databricks.com) |
| apiKey | string | Yes | Databricks Personal Access Token |
| jobId | number | No | Filter runs by job ID. Omit to list runs across all jobs |
| activeOnly | boolean | No | Only include active runs (PENDING, RUNNING, or TERMINATING) |
| completedOnly | boolean | No | Only include completed runs |
| limit | number | No | Maximum number of runs to return (range 1-24, default 20) |
| offset | number | No | Offset for pagination |
| runType | string | No | Filter by run type (JOB_RUN, WORKFLOW_RUN, SUBMIT_RUN) |
| startTimeFrom | number | No | Filter runs started at or after this timestamp (epoch ms) |
| startTimeTo | number | No | Filter runs started at or before this timestamp (epoch ms) |

Output

| Parameter | Type | Description |
| --- | --- | --- |
| runs | array | List of job runs |
| runs[].runId | number | Unique run identifier |
| runs[].jobId | number | Job this run belongs to |
| runs[].runName | string | Run name |
| runs[].runType | string | Run type (JOB_RUN, WORKFLOW_RUN, SUBMIT_RUN) |
| runs[].state | object | Run state information |
| runs[].state.lifeCycleState | string | Lifecycle state (QUEUED, PENDING, RUNNING, TERMINATING, TERMINATED, SKIPPED, INTERNAL_ERROR, BLOCKED, WAITING_FOR_RETRY) |
| runs[].state.resultState | string | Result state (SUCCESS, FAILED, TIMEDOUT, CANCELED, SUCCESS_WITH_FAILURES, UPSTREAM_FAILED, UPSTREAM_CANCELED, EXCLUDED) |
| runs[].state.stateMessage | string | Descriptive state message |
| runs[].state.userCancelledOrTimedout | boolean | Whether the run was cancelled by a user or timed out |
| runs[].startTime | number | Run start timestamp (epoch ms) |
| runs[].endTime | number | Run end timestamp (epoch ms) |
| hasMore | boolean | Whether more runs are available for pagination |
| nextPageToken | string | Token for fetching the next page of results |
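A sketch of assembling the filter parameters for a typical query — completed runs of one job within the last 24 hours. Parameter names follow the Jobs runs/list REST API (snake_case); the job ID is a placeholder:

```python
from datetime import datetime, timedelta, timezone

def build_list_runs_params(job_id=None, active_only=False,
                           completed_only=False, limit=20, run_type=None,
                           start_time_from=None, start_time_to=None):
    """Build query parameters for GET /api/2.1/jobs/runs/list.

    Only filters that are actually set are included.
    """
    params = {"limit": limit}
    if job_id is not None:
        params["job_id"] = job_id
    if active_only:
        params["active_only"] = "true"
    if completed_only:
        params["completed_only"] = "true"
    if run_type:
        params["run_type"] = run_type
    if start_time_from is not None:
        params["start_time_from"] = start_time_from  # epoch ms
    if start_time_to is not None:
        params["start_time_to"] = start_time_to      # epoch ms
    return params

# Completed runs of job 123 started in the last 24 hours
now = datetime.now(timezone.utc)
params = build_list_runs_params(
    job_id=123, completed_only=True,
    start_time_from=int((now - timedelta(days=1)).timestamp() * 1000),
    start_time_to=int(now.timestamp() * 1000))
```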

databricks_cancel_run

Cancel a running or pending Databricks job run. Cancellation is asynchronous; poll the run status to confirm termination.

Input

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| host | string | Yes | Databricks workspace host (e.g., dbc-abc123.cloud.databricks.com) |
| apiKey | string | Yes | Databricks Personal Access Token |
| runId | number | Yes | The canonical identifier of the run to cancel |

Output

| Parameter | Type | Description |
| --- | --- | --- |
| success | boolean | Whether the cancel request was accepted |
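Since cancellation is asynchronous, success: true only means the request was accepted; the run may remain RUNNING or TERMINATING briefly. A sketch of a cancel-and-confirm pattern with injectable callables — the stubs below stand in for the real tool calls:

```python
def cancel_and_confirm(cancel_run, get_run, run_id,
                       attempts=5, sleep=lambda s: None):
    """Cancel a run, then poll until its lifecycle state is terminal.

    `cancel_run(run_id)` and `get_run(run_id)` return the respective
    tools' output dicts. Returns True once termination is confirmed.
    """
    if not cancel_run(run_id).get("success"):
        return False
    for _ in range(attempts):
        state = get_run(run_id)["state"]["lifeCycleState"]
        if state in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
            return True
        sleep(2)  # back off between polls
    return False

# Stand-ins: cancel is accepted, run terminates on the second poll
states = iter(["TERMINATING", "TERMINATED"])
done = cancel_and_confirm(
    lambda rid: {"success": True},
    lambda rid: {"state": {"lifeCycleState": next(states)}},
    42)
```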

databricks_get_run_output

Get the output of a completed Databricks job run, including notebook results, error messages, and logs. For multi-task jobs, use the task run ID (not the parent run ID).

Input

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| host | string | Yes | Databricks workspace host (e.g., dbc-abc123.cloud.databricks.com) |
| apiKey | string | Yes | Databricks Personal Access Token |
| runId | number | Yes | The run ID to get output for. For multi-task jobs, use the task run ID |

Output

| Parameter | Type | Description |
| --- | --- | --- |
| notebookOutput | object | Notebook task output (from dbutils.notebook.exit()) |
| notebookOutput.result | string | Value passed to dbutils.notebook.exit() (max 5 MB) |
| notebookOutput.truncated | boolean | Whether the result was truncated |
| error | string | Error message if the run failed or output is unavailable |
| errorTrace | string | Error stack trace if available |
| logs | string | Log output (last 5 MB) from spark_jar, spark_python, or python_wheel tasks |
| logsTruncated | boolean | Whether the log output was truncated |
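A sketch of reducing this tool's output to a simple (status, payload) pair, using the field names documented above:

```python
def summarize_run_output(output):
    """Reduce a databricks_get_run_output result to (status, payload).

    Prefers the error fields on failure, then the notebook exit value,
    then raw task logs.
    """
    if output.get("error"):
        trace = output.get("errorTrace", "")
        return "error", f"{output['error']}\n{trace}".strip()
    nb = output.get("notebookOutput") or {}
    if nb.get("result") is not None:
        note = " (truncated)" if nb.get("truncated") else ""
        return "ok", nb["result"] + note
    return "ok", output.get("logs", "")

# Example: a notebook that exited with a JSON string
status, payload = summarize_run_output(
    {"notebookOutput": {"result": '{"rows": 42}', "truncated": False}})
```

An agent can branch on the status value and feed the payload (often JSON from dbutils.notebook.exit()) into downstream workflow steps.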

databricks_list_clusters

List all clusters in a Databricks workspace including their state, configuration, and resource details.

Input

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| host | string | Yes | Databricks workspace host (e.g., dbc-abc123.cloud.databricks.com) |
| apiKey | string | Yes | Databricks Personal Access Token |

Output

| Parameter | Type | Description |
| --- | --- | --- |
| clusters | array | List of clusters in the workspace |
| clusters[].clusterId | string | Unique cluster identifier |
| clusters[].clusterName | string | Cluster display name |
| clusters[].state | string | Current state (PENDING, RUNNING, RESTARTING, RESIZING, TERMINATING, TERMINATED, ERROR, UNKNOWN) |
| clusters[].stateMessage | string | Human-readable state description |
| clusters[].creatorUserName | string | Email of the cluster creator |
| clusters[].sparkVersion | string | Spark runtime version (e.g., 13.3.x-scala2.12) |
| clusters[].nodeTypeId | string | Worker node type identifier |
| clusters[].driverNodeTypeId | string | Driver node type identifier |
| clusters[].numWorkers | number | Number of worker nodes (for fixed-size clusters) |
| clusters[].autoscale | object | Autoscaling configuration (null for fixed-size clusters) |
| clusters[].autoscale.minWorkers | number | Minimum number of workers |
| clusters[].autoscale.maxWorkers | number | Maximum number of workers |
| clusters[].clusterSource | string | Origin (API, UI, JOB, MODELS, PIPELINE, PIPELINE_MAINTENANCE, SQL) |
| clusters[].autoterminationMinutes | number | Minutes of inactivity before auto-termination (0 = disabled) |
| clusters[].startTime | number | Cluster start timestamp (epoch ms) |
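A sketch of a small helper over this tool's output that lists currently running clusters with their worker counts, using the field names documented above:

```python
def running_clusters(clusters):
    """Return (clusterName, workers) pairs for clusters in state RUNNING.

    Worker count comes from numWorkers for fixed-size clusters, or the
    autoscale maxWorkers bound for autoscaling ones.
    """
    out = []
    for c in clusters:
        if c.get("state") != "RUNNING":
            continue
        workers = c.get("numWorkers")
        if workers is None and c.get("autoscale"):
            workers = c["autoscale"].get("maxWorkers")
        out.append((c.get("clusterName"), workers))
    return out

# Sample shaped like the output schema above: one fixed-size cluster,
# one autoscaling cluster, one terminated cluster
sample = [
    {"clusterName": "etl", "state": "RUNNING",
     "numWorkers": 4, "autoscale": None},
    {"clusterName": "adhoc", "state": "RUNNING", "numWorkers": None,
     "autoscale": {"minWorkers": 1, "maxWorkers": 8}},
    {"clusterName": "old", "state": "TERMINATED", "numWorkers": 2},
]
active = running_clusters(sample)  # → [("etl", 4), ("adhoc", 8)]
```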
