Databricks • DCDAA
Validates the ability to perform data analysis tasks using Databricks SQL and the Data Intelligence Platform, covering data management with Unity Catalog, query development and optimization, dashboards and visualizations, AI/BI Genie spaces, and data modeling.
Questions
627
Duration
90 minutes
Passing Score
70%
Difficulty
Associate
Last Updated
Feb 2026
The Databricks Certified Data Analyst Associate certification validates a candidate's ability to perform data analysis tasks using Databricks SQL and the broader Databricks Data Intelligence Platform. The exam assesses proficiency across five core domains: Databricks SQL (22%), Data Management (20%), SQL (29%), Data Visualization and Dashboards (18%), and Analytics Applications (11%). Candidates must demonstrate the ability to write and optimize ANSI SQL-compliant queries, manage data using Unity Catalog, ingest data through multiple methods including UI uploads, S3 ingestion, Delta Sharing, Auto Loader, and the Databricks Marketplace, and build production-grade dashboards with AI/BI Genie spaces.
The certification was updated in 2025 to reflect Databricks' evolution from a SQL analytics tool to a comprehensive Data Intelligence Platform. The updated exam places greater emphasis on Unity Catalog governance, AI/BI dashboard capabilities, query federation for cross-system analytics, and Attribute-Based Access Control (ABAC). Topics such as discrete/continuous statistics and third-party BI tool integrations (Tableau, Power BI, Looker specifics) were removed in the 2025 version. The credential remains valid for two years, after which recertification via the current exam version is required.
This certification is designed for data analysts, business intelligence professionals, SQL practitioners, and business users who work with or plan to work with the Databricks Data Intelligence Platform. It is well-suited for individuals in roles such as Data Analyst, BI Analyst, Analytics Engineer, or SQL Developer who need to demonstrate hands-on proficiency with Databricks SQL for querying, visualization, and insight generation.
Candidates are expected to have approximately 6 months of hands-on experience performing data analysis tasks within the Databricks environment. The associate-level designation makes it an appropriate starting point for professionals transitioning into the Lakehouse ecosystem or those looking to formalize their existing Databricks SQL skills with a vendor-recognized credential.
There are no mandatory formal prerequisites to register for this exam. However, Databricks recommends that candidates have at least 6 months of practical, hands-on experience working with Databricks SQL and the Data Intelligence Platform before attempting the exam. Familiarity with ANSI SQL standards is essential, as all SQL in the exam conforms to that specification.
Databricks also recommends completing the Lakehouse Fundamentals Accreditation as a foundational step before pursuing this certification. Prior experience with Unity Catalog for data governance, Delta Lake for data management, and the Databricks SQL editor will be highly beneficial. Candidates without Databricks-specific experience but with strong SQL backgrounds and data warehouse or analytics tool experience may still be competitive after targeted hands-on preparation.
The exam consists of 45 scored questions delivered in a 90-minute time window. Questions are in multiple-choice and multi-select formats. The exam may also include a small number of unscored survey or pilot items used for statistical calibration of future exams. These items are not identified and do not affect the final score, and additional time is factored in to account for them.
The passing score is 70%. The exam costs USD $200 (plus applicable local taxes) and is delivered online through Databricks' exam delivery platform, which requires account creation or login prior to registration. All SQL tested on the exam adheres to ANSI SQL standards. Recertification is required every two years by retaking the current version of the exam.
Earning the Databricks Certified Data Analyst Associate credential signals verified proficiency on one of the fastest-growing data platforms in the enterprise market. Databricks is widely adopted by companies building Lakehouse architectures, and certified analysts are well-positioned for roles such as Data Analyst, BI Analyst, Analytics Engineer, and SQL Developer at organizations using Databricks. The certification is particularly valuable for professionals looking to differentiate themselves as Databricks skills become a standard hiring requirement across data teams.
Data analysts with Databricks certification report average salaries in the range of $115,000–$148,000 annually in the United States, meaningfully above the general data analyst average. The certification is an associate-level entry point into the Databricks certification ecosystem, which also includes Data Engineer Associate/Professional and Machine Learning tracks, giving certified analysts a clear pathway for continued credential advancement. As enterprises continue to consolidate their data and AI workloads on unified Lakehouse platforms, demand for analysts with validated Databricks SQL and governance skills is expected to remain strong.
Five sample questions with explanations, drawn from the 627-question practice bank.
1. A retail company's data analyst creates the following constraint on the orders table: ALTER TABLE orders ADD CONSTRAINT valid_status CHECK (status IN ('pending', 'shipped', 'delivered', 'cancelled')). An application attempts to insert a row with status = 'SHIPPED' (uppercase). What happens? (Select one!)
Explanation
CHECK constraints in Delta Lake evaluate the specified condition exactly as written. String comparisons using IN are case-sensitive by default in Databricks SQL, and Databricks does not automatically convert case during constraint validation. Because the uppercase 'SHIPPED' does not exactly match the lowercase 'shipped' in the constraint list, the CHECK constraint fails and the insert is rejected. CHECK constraints are enforced on all data modification operations, including INSERT, UPDATE, and MERGE. To allow case-insensitive validation, the constraint should normalize the value with UPPER or LOWER before comparing.
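A minimal sketch of the case-insensitive variant mentioned in the explanation, assuming the existing valid_status constraint is dropped and re-created. Normalizing with lower() is one option; upper() with an uppercase value list would behave the same way.

```sql
-- Remove the case-sensitive constraint from the question.
ALTER TABLE orders DROP CONSTRAINT valid_status;

-- Re-add it so the value is normalized before the comparison.
-- 'SHIPPED', 'Shipped', and 'shipped' all satisfy this check.
ALTER TABLE orders ADD CONSTRAINT valid_status
  CHECK (lower(status) IN ('pending', 'shipped', 'delivered', 'cancelled'));
```

Note that adding a CHECK constraint validates the rows already in the table, so the second statement fails if any existing row has a status outside the list.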
2. A media company stores 50 TB of video metadata in a Delta table partitioned by upload_year with 5 years of historical data. Queries filtering by video_category and content_rating show poor performance despite Z-ordering. The company wants to migrate to liquid clustering. Which approach correctly transitions from partitioning and Z-ordering to liquid clustering? (Select one!)
Explanation
Creating a new table with CLUSTER BY from the source table using CTAS correctly migrates from partitioned tables with Z-ordering to liquid clustering, as partitioning and liquid clustering cannot coexist on the same table. After creation, swap table names or update references. ALTER TABLE CLUSTER BY cannot be applied to partitioned tables because liquid clustering is incompatible with traditional partitioning. OPTIMIZE ZORDER BY does not convert Z-ordering to liquid clustering as these are separate optimization techniques. Delta Lake does not support DROP PARTITION syntax to remove partitioning from a table; partitioning is a table property defined at creation time.
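The CTAS migration can be sketched as follows. The question does not name the table, so video_metadata, video_metadata_clustered, and video_metadata_partitioned are hypothetical names used only for illustration.

```sql
-- Create a new, unpartitioned table that uses liquid clustering
-- on the columns the slow queries filter by.
CREATE TABLE video_metadata_clustered
  CLUSTER BY (video_category, content_rating)
AS SELECT * FROM video_metadata;

-- Swap names so existing references resolve to the clustered table.
-- The old partitioned table is kept under a new name as a fallback.
ALTER TABLE video_metadata RENAME TO video_metadata_partitioned;
ALTER TABLE video_metadata_clustered RENAME TO video_metadata;
```

For a 50 TB table the copy itself is a substantial job, so in practice the swap would likely be scheduled during a window when writes to the source table are paused.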
3. A data analyst runs OPTIMIZE events WHERE event_date >= '2025-01-01' on a partitioned table to compact recent data only. The table contains 2 TB of data spanning 3 years, with 50 GB of new data added in January 2025. How does this filtered OPTIMIZE operation behave compared to OPTIMIZE events without a WHERE clause? (Select one!)
Explanation
OPTIMIZE supports an optional WHERE clause that limits the operation to specific partitions. When a partition-aware predicate is used, OPTIMIZE only processes files in the matching partitions, dramatically reducing the amount of data rewritten and the processing time required. For a table with date-based partitions, optimizing only January 2025 data processes approximately 50 GB instead of 2 TB. OPTIMIZE does not create a separate copy of the table; it compacts small files into larger ones using bin-packing. WHERE clause filtering is a supported and recommended optimization technique.
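For comparison, the two forms discussed in the question are shown side by side. This sketch assumes event_date is the partition column of events, which the question implies but does not state outright.

```sql
-- Filtered: only files in partitions with event_date on or after
-- 2025-01-01 are candidates for compaction (about 50 GB here).
OPTIMIZE events WHERE event_date >= '2025-01-01';

-- Unfiltered: every partition in the 2 TB table is considered.
OPTIMIZE events;
```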
4. A healthcare organization uses Unity Catalog to manage patient data across multiple workspaces. A data governance team needs to identify all downstream dashboards and jobs that depend on the patients table before implementing schema changes. They also need to programmatically query this information for automated impact reports. Which two approaches will provide this information? (Select two!)
Multiple correct answers
Explanation
Catalog Explorer's Lineage tab provides interactive visualization of downstream dependencies including jobs and dashboards that consume the patients table, enabling manual impact analysis. The lineage system tables allow programmatic SQL queries to retrieve the same downstream dependency information for automated reporting and integration with other tools. This dual approach satisfies both manual exploration and automated analysis requirements. The audit log tracks access events but does not explicitly identify downstream dependencies or data flow relationships. DESCRIBE HISTORY shows table version history and operations performed on the table itself, not downstream consumers. SHOW GRANTS reveals which users and groups have permissions on the table but does not identify which dashboards, jobs, or downstream tables depend on it for their data pipelines.
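The programmatic half of the answer could look roughly like the query below. It is a sketch under several assumptions: that lineage system tables are enabled and readable in the account, that the table is exposed as system.access.table_lineage with the column names shown, and that the patients table lives at the hypothetical path main.clinical.patients. The 90-day window is likewise an arbitrary choice for illustration.

```sql
-- List the entities (jobs, dashboards, notebooks, and so on) that read
-- from the patients table, plus any tables they wrote to.
SELECT DISTINCT
  entity_type,             -- kind of consumer, e.g. a job or a dashboard
  entity_id,               -- identifier of that consumer
  target_table_full_name   -- downstream table, NULL for read-only consumers
FROM system.access.table_lineage
WHERE source_table_full_name = 'main.clinical.patients'
  AND event_time >= current_timestamp() - INTERVAL 90 DAYS
ORDER BY entity_type, entity_id;
```

The result can feed a scheduled impact report, while the Lineage tab in Catalog Explorer remains the quicker route for one-off manual checks.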
5. A data analyst creates a Genie space for the finance department with 18 tables from the finance and accounting catalogs. The space uses a Pro SQL warehouse. After two weeks, the finance director requests that external auditors receive read-only access to review questions and answers without the ability to ask new questions or modify the space. What permission level should the analyst grant to the auditors group? (Select one!)
Explanation
CAN VIEW permission allows users to view the Genie space configuration, existing conversations, and historical questions and answers without the ability to ask new questions or modify the space. This is exactly what auditors need for read-only review access. CAN RUN permission allows users to ask questions and generate new queries, which exceeds the requirement. CAN EDIT allows modifying the space configuration, including sample queries and instructions. Granting SELECT on the underlying Unity Catalog tables provides data-level access but does not give access to the Genie space interface or its historical conversations, because permissions on the Genie space object are separate from data permissions.