Databricks API Token

A Databricks API Token is a credential used to authenticate and authorize access to Databricks services, which are primarily used for big data processing and analytics. These tokens allow users to interact with Databricks REST APIs, enabling operations such as job management, cluster configuration, and data access. Exposure of an API token can lead to unauthorized access to sensitive data and resources, posing a significant security risk.


What Does It Look Like

Databricks API Tokens can appear in various contexts, such as:

  • Environment variables

    export DATABRICKS_TOKEN="dapiXXXXXXXXXXXXXXXX"
  • Configuration files (JSON, YAML, .env)

    {
      "databricks": {
        "token": "dapiXXXXXXXXXXXXXXXX"
      }
    }
  • Code snippets

    import requests

    headers = {"Authorization": "Bearer dapiXXXXXXXXXXXXXXXX"}
    response = requests.get("https://<databricks-instance>/api/2.0/clusters/list", headers=headers)
  • Connection strings

    databricks://token:dapiXXXXXXXXXXXXXXXX@<databricks-instance>

Severity

  • 🔴 Critical

The severity is critical because a Databricks API Token provides access to potentially sensitive data and computational resources. An attacker with this token can perform actions such as managing clusters, running jobs, and accessing data stored in Databricks, leading to a wide blast radius.


What Can an Attacker Do?

With a valid Databricks API Token, an attacker can immediately:

  • Access and exfiltrate data (if the token has read permissions to data stored in Databricks)
  • Modify or delete data (if the token has write permissions)
  • Spin up clusters and run unauthorized jobs (if compute permissions are granted)
  • Access sensitive configuration and metadata (if the token allows access to administrative APIs)

The attacker can potentially escalate their access or move laterally within the environment by exploiting other vulnerabilities or misconfigurations, leading to broader access across the organization's infrastructure.


Real-World Impact

The primary impact of an exposed Databricks API Token is unauthorized access to sensitive data and resources. Potential business consequences include:

  • Data Exposure: Access to proprietary datasets and analytics results (if the token has read access to sensitive data)
  • Financial Loss: Increased cloud costs due to unauthorized resource usage (if billing/resource creation is permitted)
  • Operational Disruption: Interruption of data processing workflows (if the attacker has delete/modify permissions)
  • Reputational Damage: Loss of trust from clients and partners due to data breaches

In the worst-case scenario, the exposure could lead to cascading effects, such as further breaches of connected systems or regulatory scrutiny due to data mishandling.


Prerequisites for Exploitation

To exploit a Databricks API Token, an attacker needs:

  • Network access to the Databricks instance
  • Knowledge of the Databricks instance URL and relevant API endpoints
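  • No IP access restrictions blocking the attacker's source address (the token is a bearer credential, so MFA on the owner's interactive login does not protect API calls made with it)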

How to Verify If It's Active

To verify if a Databricks API Token is active, use the following command:

curl -X GET https://<databricks-instance>/api/2.0/clusters/list \
-H "Authorization: Bearer [TOKEN]"

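Valid credential response: HTTP 200 with a JSON body listing the clusters (the list may be empty if the workspace has none), indicating the token is active.

Invalid/expired credential response: An HTTP error status, typically 401 or 403, with a message indicating unauthorized access or an invalid or expired token.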
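
The same check can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the function names `classify_status` and `check_token` and the `host` parameter are hypothetical, and treating 401/403 as "invalid or expired" is an assumption based on common HTTP semantics rather than documented Databricks behavior.

```python
import urllib.error
import urllib.request


def classify_status(status_code: int) -> str:
    """Map an HTTP status code to a rough verdict (assumed mapping)."""
    if status_code == 200:
        return "active"
    if status_code in (401, 403):
        return "invalid-or-expired"
    return "inconclusive"


def check_token(host: str, token: str, timeout: float = 10.0) -> str:
    """Call the clusters/list endpoint shown above and classify the result.

    `host` stands in for the <databricks-instance> placeholder.
    """
    request = urllib.request.Request(
        f"https://{host}/api/2.0/clusters/list",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is on the exception.
        return classify_status(err.code)
    except urllib.error.URLError:
        # DNS failure, refused connection, timeout, and similar network errors.
        return "inconclusive"
```

Keeping the status classification separate from the network call makes the verdict logic easy to test without contacting a workspace.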


Detection Patterns

Common Variable Names:

  • DATABRICKS_TOKEN
  • DATABRICKS_API_KEY
  • DATABRICKS_SECRET
  • TOKEN
  • API_KEY
  • SECRET_KEY
  • DBRICKS_TOKEN
  • DBRICKS_API_KEY

File Locations:

  • .env
  • config.json
  • settings.yaml
  • credentials.txt
  • databricks_config.py

Regex Pattern:

dapi[a-zA-Z0-9]{24}
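
As an illustration, the pattern above could be applied to file contents roughly as follows. This is a sketch under stated assumptions: the helper name `find_tokens` and the redaction format are hypothetical, and the regex is copied from this section as-is rather than taken from an official token specification.

```python
import re

# Pattern copied from this section: the "dapi" prefix plus 24 alphanumerics.
TOKEN_PATTERN = re.compile(r"dapi[a-zA-Z0-9]{24}")


def find_tokens(text: str) -> list[tuple[int, str]]:
    """Return (line_number, redacted_match) pairs for each candidate token."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in TOKEN_PATTERN.finditer(line):
            token = match.group(0)
            # Keep only the prefix and the last 4 characters so that
            # scan reports do not leak the secret a second time.
            hits.append((line_number, f"{token[:4]}...{token[-4:]}"))
    return hits
```

Redacting matches before reporting them avoids copying a live credential into scanner logs or tickets.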

Remediation Steps

  1. Revoke immediately - Go to Databricks > User Settings > Access Tokens and delete the compromised token.
  2. Audit access logs - Review Databricks audit logs for unauthorized API calls or data access during the exposure window.
  3. Assess blast radius - Identify all systems, applications, and environments that used the exposed token.
  4. Rotate credential - Generate a new API token in Databricks with least-privilege permissions.
  5. Update dependent systems - Deploy the new token to all applications and update CI/CD pipelines securely.
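  6. Harden access controls - Enable IP allowlisting in Databricks and require multi-factor authentication (MFA) for interactive logins. MFA alone does not protect API calls made with a leaked token, so pair it with network restrictions.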
  7. Implement secrets management - Migrate credentials to a secrets manager (HashiCorp Vault, AWS Secrets Manager) to prevent hardcoding.
  8. Add detection controls - Set up pre-commit hooks and repository scanning to catch credential leaks before they reach production.
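
To illustrate the move away from hardcoded credentials in step 7, application code can read the token at runtime from its environment, which a secrets manager or CI/CD system would populate. This is a minimal sketch: `load_token`, `auth_headers`, and `MissingTokenError` are hypothetical names, and the default variable name `DATABRICKS_TOKEN` is taken from the examples earlier in this document.

```python
import os


class MissingTokenError(RuntimeError):
    """Raised when the expected environment variable is unset or empty."""


def load_token(env_var: str = "DATABRICKS_TOKEN") -> str:
    """Read the token from the environment instead of hardcoding it in source."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Fail fast with a message that names the variable, never the value.
        raise MissingTokenError(f"{env_var} is not set")
    return token


def auth_headers(token: str) -> dict[str, str]:
    """Build the Bearer header used by the REST examples in this document."""
    return {"Authorization": f"Bearer {token}"}
```

A caller would then use `auth_headers(load_token())` when building requests, so the secret never appears in the repository.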

Credential exposures often go undetected for extended periods, increasing the window for exploitation. As a long-term strategy, plan to establish an internal process or engage an external vendor for continuous external exposure monitoring. This helps identify leaked secrets across public repositories, paste sites, dark web forums, and other external sources before attackers can leverage them. Proactive detection and rapid response are essential to minimizing the impact of credential leaks.

