Databricks Permissions for AI Data Engineer

Overview

The Osmos AI Data Engineer for Databricks connects securely to your Databricks workspace through a service principal that you create and manage. All compute runs on your own Databricks clusters, and code can be version-controlled in a Git-backed Databricks Repo, giving you complete transparency and control.

Quickstart Guide

  1. Create a Service Principal in your Databricks workspace.

  2. Grant Permissions:

    • Repos: Read/Write on the target repo

    • Workspace: Read/Write on a folder for artifacts

    • Clusters: Can Attach To on your target classic (non-serverless) cluster

    • Data: SELECT on source tables, and INSERT/UPDATE on the desired output schema

  3. Provide Resources to Osmos: your Databricks service principal name and your Databricks workspace URL.

  4. Networking Check: If your workspace uses a private VNet or VPC, contact Osmos Support for onboarding options.
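The grants in step 2 can be summarized as a simple checklist in code. This is a minimal sketch, not an official Osmos or Databricks schema — the resource names and permission-level strings below are illustrative placeholders:

```python
# Hypothetical checklist mirroring the grants in step 2 of the quickstart.
REQUIRED_GRANTS = {
    "repo": "CAN_EDIT",                # Repos: Read/Write on the target repo
    "workspace_folder": "CAN_EDIT",    # Workspace: Read/Write on an artifacts folder
    "cluster": "CAN_ATTACH_TO",        # Clusters: attach-only, classic cluster
    "source_tables": "SELECT",         # Data: read the source tables
    "output_schema": "INSERT/UPDATE",  # Data: write to the output schema
}

def missing_grants(granted: dict) -> list:
    """Return checklist entries whose granted level differs from what is required."""
    return [name for name, level in REQUIRED_GRANTS.items()
            if granted.get(name) != level]
```

Running `missing_grants` against what you have actually configured gives a quick way to spot a forgotten grant before handing the service principal to Osmos.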

Required Permissions

The service principal needs these baseline permissions:

  • Databricks Repos: Read & Write access to the designated repo path

  • Workspace: Read & Write access to a folder for storing artifacts

  • Clusters: Can Attach To on a designated, active classic (non-serverless) cluster

  • Unity Catalog Data: SELECT on source schemas/tables and INSERT/UPDATE on the schema where outputs are written

Configuration in Context

Code Management in Databricks Repos

Use a Git-backed Databricks Repo so the agent can create feature branches and commits. From there, push changes to your remote Git provider and create pull requests for review. Alternatively, grant write access to a Unity Catalog Volume if you prefer direct file storage.
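In the Databricks Permissions API, Read/Write on a repo corresponds to the `CAN_EDIT` permission level, set via `PATCH /api/2.0/permissions/repos/{repo_id}`. A minimal sketch of the request body, assuming `sp_app_id` is your service principal's application ID (a placeholder here):

```python
def repo_acl(sp_app_id: str) -> dict:
    """Request body for PATCH /api/2.0/permissions/repos/{repo_id} that grants
    the service principal (identified by application ID) edit access."""
    return {
        "access_control_list": [
            {
                "service_principal_name": sp_app_id,
                "permission_level": "CAN_EDIT",
            }
        ]
    }
```

The same result can be achieved from the repo's Permissions dialog in the workspace UI if you prefer not to call the API directly.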

Workspace for Artifacts

Provide a folder with Read & Write access (e.g., /Users/[email protected]/) for logs and temporary files. You may also grant Read access to other workspace artifacts you want the agent to reference.

Cluster Usage

Provision a Classic (Non-Serverless) cluster and grant the service principal Can Attach To. The agent will not create, start, or stop clusters—it simply uses the cluster you specify and inherits its policies and libraries.
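Can Attach To maps to the `CAN_ATTACH_TO` permission level on the cluster in the Databricks Permissions API. The sketch below builds (but does not send) the request using only the standard library; the host, token, cluster ID, and service principal ID are all placeholders:

```python
import json
import urllib.request

def can_attach_request(host: str, token: str, cluster_id: str, sp_app_id: str):
    """Build a PATCH to the Databricks Permissions API granting attach-only
    access. With this level the agent can run code on the cluster but cannot
    start, stop, restart, or reconfigure it."""
    body = {
        "access_control_list": [
            {"service_principal_name": sp_app_id,
             "permission_level": "CAN_ATTACH_TO"}
        ]
    }
    return urllib.request.Request(
        f"{host}/api/2.0/permissions/clusters/{cluster_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="PATCH",
    )

# To apply it against a live workspace:
# urllib.request.urlopen(can_attach_request(host, token, cluster_id, sp_app_id))
```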

Data Governance with Unity Catalog

Grant SELECT permissions on the source data and write privileges (INSERT/UPDATE, expressed as the MODIFY privilege in Unity Catalog) on a designated schema for outputs. A common best practice is to provide write access only to a development schema to isolate outputs from production tables.
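In Unity Catalog SQL, the read grant is `SELECT`, insert/update rights are conveyed by the `MODIFY` privilege, and `USE CATALOG`/`USE SCHEMA` are also required for the objects to resolve. A sketch that emits the statements for the dev-schema setup described above — all catalog, schema, and principal names are placeholders:

```python
def uc_grant_statements(sp_app_id: str, source_schema: str, dev_schema: str) -> list:
    """GRANT statements for read access on a source schema and write access on
    a development schema, keeping production tables out of reach. Run the
    output in a SQL editor as a user who owns or administers these objects."""
    sp = f"`{sp_app_id}`"
    catalogs = {source_schema.split(".")[0], dev_schema.split(".")[0]}
    stmts = [f"GRANT USE CATALOG ON CATALOG {c} TO {sp};" for c in sorted(catalogs)]
    stmts += [
        f"GRANT USE SCHEMA ON SCHEMA {source_schema} TO {sp};",
        f"GRANT SELECT ON SCHEMA {source_schema} TO {sp};",      # read sources
        f"GRANT USE SCHEMA ON SCHEMA {dev_schema} TO {sp};",
        f"GRANT MODIFY ON SCHEMA {dev_schema} TO {sp};",          # write outputs
    ]
    return stmts
```

Granting `SELECT` and `MODIFY` at the schema level applies them to every table in that schema, which is what keeps the dev-schema isolation simple to reason about.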

Summary

The Osmos AI Data Engineer for Databricks operates entirely within the permissions you define. Think of it as a new engineer on your team: it can only read or write where you allow and will follow the instructions you provide. By configuring service principal access carefully—covering Repos, Workspace, Clusters, and Unity Catalog Data—you maintain full control and governance while enabling powerful autonomous data engineering capabilities.
