Databricks Permissions for AI Data Engineer
Overview
The Osmos AI Data Engineer for Databricks connects securely to your Databricks workspace through a service principal that you create and manage. All compute runs on your own Databricks clusters, and code can be version-controlled in a Git-backed Databricks Repo, giving you complete transparency and control.
Quickstart Guide
1. Create a Service Principal in your Databricks workspace.
2. Grant Permissions:
   - Repos: `Read`/`Write` on the target repo
   - Workspace: `Read`/`Write` on a folder for artifacts
   - Clusters: `Can Attach To` on your target classic (non-serverless) cluster
   - Data: `SELECT` on source tables, and `INSERT`/`UPDATE` on the desired output schema
3. Provide Resources to Osmos: your Databricks service principal name and your Databricks workspace URL.
4. Networking Check: if your workspace uses a private VNet or VPC, contact Osmos Support for onboarding options.
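Before onboarding, it is worth confirming that the workspace URL you hand over is the full https host for your cloud. The sketch below is purely illustrative (the helper name is ours, not part of any Osmos or Databricks tooling) and checks the common AWS, Azure, and GCP host shapes:

```python
import re

def looks_like_workspace_url(url: str) -> bool:
    """Rough shape check for a Databricks workspace URL (illustrative only)."""
    # Accepts AWS (*.cloud.databricks.com), Azure (*.azuredatabricks.net),
    # and GCP (*.gcp.databricks.com) workspace hosts.
    pattern = r"https://[\w.-]+\.(?:azuredatabricks\.net|cloud\.databricks\.com|gcp\.databricks\.com)/?"
    return re.fullmatch(pattern, url) is not None

print(looks_like_workspace_url("https://my-team.cloud.databricks.com"))  # True
```

A check like this catches the most common mistake of passing a bare hostname without the `https://` scheme.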
Required Permissions
The service principal needs these baseline permissions:
- Databricks Repos – `Read` & `Write` access to the designated repo path
- Workspace – `Read` & `Write` access to a folder for storing artifacts
- Clusters – `Can Attach To` on a designated, active classic (non-serverless) cluster
- Unity Catalog Data – `SELECT` on source schemas/tables and `INSERT`/`UPDATE` on the schema where outputs are written
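The baseline above can be treated as a checklist during setup. The sketch below is illustrative only (the structure and names are ours, not an Osmos or Databricks API) and reports which grants are still missing:

```python
# Baseline grants from the list above, keyed by resource type.
REQUIRED_GRANTS = {
    "repos": {"Read", "Write"},
    "workspace": {"Read", "Write"},
    "clusters": {"Can Attach To"},
    "data": {"SELECT", "INSERT", "UPDATE"},
}

def missing_grants(granted: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, per resource, any baseline permissions not yet granted."""
    return {
        resource: needed - granted.get(resource, set())
        for resource, needed in REQUIRED_GRANTS.items()
        if needed - granted.get(resource, set())
    }

# Example: everything granted except cluster access.
gaps = missing_grants({
    "repos": {"Read", "Write"},
    "workspace": {"Read", "Write"},
    "data": {"SELECT", "INSERT", "UPDATE"},
})
print(gaps)  # {'clusters': {'Can Attach To'}}
```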
Configuration in Context
Code Management in Databricks Repos
Use a Git-backed Databricks Repo so the agent can create feature branches and commits. From there, push changes to your remote Git provider and create pull requests for review. Alternatively, grant write access to a Unity Catalog Volume if you prefer direct file storage.
Workspace for Artifacts
Provide a folder with `Read` & `Write` access (e.g., /Users/[email protected]/) for logs and temporary files. You may also grant `Read` access to other workspace artifacts you want the agent to reference.
Cluster Usage
Provision a Classic (Non-Serverless) cluster and grant the service principal `Can Attach To`.
The agent will not create, start, or stop clusters—it simply uses the cluster you specify and inherits its policies and libraries.
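`Can Attach To` is the lowest of the cluster permission levels, below `Can Restart` and `Can Manage`, so granting a higher level also allows attaching. As a sketch, a check against an access-control list shaped like a Permissions API response might look like this (the dict keys mirror that response shape but are assumptions here, not a client library):

```python
def can_attach(acl: list[dict], principal: str) -> bool:
    """True if the service principal holds Can Attach To (or a higher level)."""
    sufficient = {"CAN_ATTACH_TO", "CAN_RESTART", "CAN_MANAGE"}
    for entry in acl:
        if entry.get("service_principal_name") != principal:
            continue
        for perm in entry.get("all_permissions", []):
            if perm.get("permission_level") in sufficient:
                return True
    return False

# Hypothetical ACL for a service principal named "osmos-agent".
acl = [{"service_principal_name": "osmos-agent",
        "all_permissions": [{"permission_level": "CAN_ATTACH_TO"}]}]
print(can_attach(acl, "osmos-agent"))  # True
```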
Data Governance with Unity Catalog
Grant `SELECT` permissions on the source data and `INSERT`/`UPDATE` on a designated schema for outputs.
A common best practice is to provide write access only to a development schema to isolate outputs from production tables.
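As a concrete illustration, these data-layer grants can be expressed as SQL. The principal and object names below (`osmos-agent`, `main.sales.orders`, `main.dev_outputs`) are placeholders, and the principal also needs `USE CATALOG`/`USE SCHEMA` on the parent objects to resolve them; note that Unity Catalog expresses table write access as the `MODIFY` privilege, which covers `INSERT` and `UPDATE`:

```python
def grant_statements(principal: str, source_tables: list[str], output_schema: str) -> list[str]:
    """Build GRANT statements for the baseline data permissions.

    In Unity Catalog, write access is granted as MODIFY, which covers
    INSERT, UPDATE, and DELETE on the schema's tables.
    """
    stmts = [f"GRANT SELECT ON TABLE {table} TO `{principal}`;" for table in source_tables]
    stmts.append(f"GRANT USE SCHEMA ON SCHEMA {output_schema} TO `{principal}`;")
    stmts.append(f"GRANT MODIFY ON SCHEMA {output_schema} TO `{principal}`;")
    return stmts

for stmt in grant_statements("osmos-agent", ["main.sales.orders"], "main.dev_outputs"):
    print(stmt)
```

Scoping the `MODIFY` grant to a development schema, as suggested above, keeps the agent's writes isolated from production tables.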
Summary
The Osmos AI Data Engineer for Databricks operates entirely within the permissions you define. Think of it as a new engineer on your team: it can only read or write where you allow and will follow the instructions you provide. By configuring service principal access carefully—covering Repos, Workspace, Clusters, and Unity Catalog Data—you maintain full control and governance while enabling powerful autonomous data engineering capabilities.