Microsoft • DP-750
Validates expertise in implementing data engineering solutions using Azure Databricks, including integrating and modeling data, building and deploying optimized pipelines, and applying data quality and governance best practices with Unity Catalog.
Questions
593
Duration
120 minutes
Passing Score
700/1000
Difficulty
Associate
Last Updated
May 2026
The Microsoft Certified: Azure Databricks Data Engineer Associate (Exam DP-750) validates subject matter expertise in implementing end-to-end data engineering solutions on the Azure Databricks platform. The certification covers the full lakehouse engineering lifecycle, from configuring workspaces and compute resources to ingesting, transforming, and modeling data using Delta Lake, then deploying and maintaining production-grade pipelines with Lakeflow Jobs and Lakeflow Spark Declarative Pipelines. A core emphasis is placed on Unity Catalog, the unified governance layer for Azure Databricks, which candidates must know how to use for securing objects, tracking data lineage, and enforcing row- and column-level access controls. Candidates must also know how to apply data quality expectations in pipelines.
This certification was introduced in beta in March 2026 and reached general availability in May 2026, reflecting the rapid enterprise adoption of Azure Databricks as a foundational data and AI platform. Certified engineers are expected to work proficiently in both SQL and Python, apply software development lifecycle (SDLC) practices including Git-based version control and Databricks Asset Bundles, and integrate Azure services such as Microsoft Entra for identity management, Azure Data Factory for orchestration, and Azure Monitor for observability. The exam tests not only implementation skills but also the ability to troubleshoot Spark jobs, resolve performance bottlenecks such as data skew and disk spill, and optimize Delta tables using techniques like liquid clustering and the OPTIMIZE and VACUUM commands.
This certification is designed for data engineers who design, build, and maintain data pipelines and lakehouse architectures on Azure Databricks in production environments. Ideal candidates hold roles such as Azure Databricks Data Engineer, Cloud Data Engineer, or Analytics Engineer, and collaborate closely with platform architects, solution architects, data scientists, and data analysts. The certification is positioned at the associate (intermediate) level, making it appropriate for professionals who have hands-on experience building data solutions in the cloud but are not yet operating at an expert or architect level.
Candidates should be comfortable writing data transformation logic in both SQL and Python, managing version control with Git, and working within the Azure ecosystem. Engineers currently using Azure Synapse Analytics, Azure Data Factory, or other cloud data platforms who are transitioning to or expanding into Azure Databricks will find this certification a strong validation of their upskilled capabilities.
Microsoft does not enforce formal prerequisites for Exam DP-750, but the official study guide makes clear that candidates should arrive with meaningful hands-on experience. Specifically, candidates are expected to know how to ingest and transform data using SQL and Python, apply SDLC practices including Git branching and pull request workflows, and be familiar with Microsoft Entra (for authentication via service principals and managed identities), Azure Data Factory, and Azure Monitor. A solid understanding of Apache Spark concepts—including DataFrames, Structured Streaming, and the Spark execution model (DAGs, shuffle, caching)—is essential for the performance troubleshooting and optimization portions of the exam.
Practical familiarity with Unity Catalog concepts (catalogs, schemas, volumes, managed vs. external tables, privileges, and data lineage) is strongly recommended, as governance topics account for 15–20% of the exam. Candidates who have completed the official instructor-led course DP-750T00-A or equivalent self-paced Microsoft Learn paths will be well-positioned. Prior experience with the Databricks Certified Data Engineer Associate exam from Databricks itself provides useful conceptual overlap, though the DP-750 places greater emphasis on Azure-native integrations and Unity Catalog governance.
Exam DP-750 is a proctored assessment delivered through Pearson VUE, available online (at-home proctoring) or at a testing center. Candidates have 120 minutes to complete the assessment. A passing score of 700 out of 1000 is required; Microsoft uses a scaled scoring system where question difficulty factors into the final score, so the passing threshold does not correspond directly to a fixed percentage of correct answers. The exam is currently offered in English only, though candidates who take the exam in a non-primary language can request an additional 30 minutes.
The exam may include a variety of question types such as multiple choice, multiple select, drag-and-drop, and interactive lab-style components (as noted in the official exam policy). Microsoft does not publish an exact question count for DP-750. The certification must be renewed annually; renewal is free and requires passing an online assessment on Microsoft Learn, typically available within eight weeks of the exam reaching general availability.
Azure Databricks data engineers in the US command average salaries of approximately $137,000 per year, with senior and lead roles on the Azure platform typically ranging from $150,000 to $190,000. Databricks appeared in 16.8% of data engineering job postings in 2026, and the broader data engineering field has added over 20,000 new roles in the past year with projected growth of 34% through 2034 according to U.S. Bureau of Labor Statistics data. The DP-750 targets the intersection of Microsoft Azure infrastructure and the Databricks lakehouse platform, making it directly relevant for roles such as Azure Databricks Data Engineer, Cloud Data Engineer, Analytics Engineer, and Data Platform Engineer at organizations running Azure-native data stacks.
Compared to the cloud-agnostic Databricks Certified Data Engineer Associate exam, the DP-750 provides stronger validation of Azure-specific integrations (Microsoft Entra, Azure Monitor, and Azure Data Factory) alongside Unity Catalog capabilities such as Delta Sharing, making it the more compelling choice for engineers working within Microsoft-centric enterprise environments. The certification renews annually via a free online assessment, keeping credentialed professionals current as the platform evolves. Microsoft has positioned DP-750 as part of a broader wave of AI- and data-focused credentials, signaling continued investment in the Azure Databricks certification path.
5 sample questions with correct answers and explanations.
1. Contoso's data governance team has hired a new data steward who must audit Unity Catalog metadata — including table schemas, column definitions, tags, and data lineage — across all catalogs in the workspace. The data steward must not be able to query or read any actual table data. Which privilege should the administrator grant to the data steward at the catalog level? (Select one!)
Explanation
The BROWSE privilege in Unity Catalog allows a principal to view object metadata — including table schemas, column definitions, comments, tags, and lineage information — without requiring USE CATALOG or USE SCHEMA on the parent objects and without granting any data access. This makes it the precise privilege for a data steward role that needs metadata visibility without data access. USE CATALOG only enables namespace navigation and does not provide governance metadata visibility in isolation. SELECT grants full data read access, which violates the requirement of no data access. DATA_READ is not a standard Unity Catalog privilege.
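As a rough illustration, the grant described above can be written as a single SQL statement. The sketch below only builds the statement text. The catalog name `main` and the principal `data-stewards` are hypothetical, and executing the statement would require a Unity Catalog-enabled session (for example through `spark.sql` in a notebook).

```python
# Builds a Unity Catalog GRANT statement for metadata-only access.
# The catalog and principal names used below are hypothetical placeholders.

def build_browse_grant(catalog: str, principal: str) -> str:
    """Return a GRANT BROWSE statement with backtick-quoted identifiers."""
    def quote(identifier: str) -> str:
        # Escape embedded backticks by doubling them, then wrap in backticks.
        return "`" + identifier.replace("`", "``") + "`"

    return f"GRANT BROWSE ON CATALOG {quote(catalog)} TO {quote(principal)}"


statement = build_browse_grant("main", "data-stewards")
print(statement)
# In a Databricks notebook this could be run with: spark.sql(statement)
```

Granting to a group rather than an individual user is a design choice here; it keeps the steward role reusable when more stewards join.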
2. Contoso's data engineering team configures an Auto Loader stream to ingest JSON files from ADLS Gen2 into a Delta table. They provide an explicit schema definition in the readStream call but do not configure any schema evolution options. A new batch of JSON files arrives containing an additional field not defined in the schema. What happens to the additional field during ingestion? (Select one!)
Explanation
When an explicit schema is provided to Auto Loader and no schema evolution mode is configured, the default evolution mode is none. In this mode, Auto Loader does not fail the stream when it encounters unexpected fields, and it does not add them to the table schema; adding columns automatically requires setting cloudFiles.schemaEvolutionMode to addNewColumns. Data that does not conform to the defined schema, including additional fields not listed in it, can be captured in a special column, conventionally named _rescued_data, so the stream keeps processing all valid fields while unmatched data is preserved for later investigation. One caveat from the Databricks documentation: with a user-supplied schema in none mode, data is rescued only when the rescuedDataColumn option is set; without it, fields outside the schema are ignored.
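The idea of a rescued-data column can be pictured with a small toy model. This is only a sketch of the concept in plain Python, not Auto Loader's implementation; the real column is a JSON document that also records the source file path. The function name `apply_schema` and the sample record are made up for illustration.

```python
import json

RESCUED_COLUMN = "_rescued_data"


def apply_schema(record: dict, schema_fields: list[str]) -> dict:
    """Keep fields named in the schema; park everything else as a JSON string."""
    # Fields in the schema are copied over (missing ones become None).
    row = {name: record.get(name) for name in schema_fields}
    # Anything the schema does not know about is preserved, not dropped.
    extras = {k: v for k, v in record.items() if k not in schema_fields}
    row[RESCUED_COLUMN] = json.dumps(extras, sort_keys=True) if extras else None
    return row


schema = ["event_id", "user_id"]
incoming = {"event_id": 1, "user_id": "u42", "session_duration": 318}
row = apply_schema(incoming, schema)
print(row)
```

The table schema stays fixed at `event_id` and `user_id`, while the unexpected `session_duration` value remains recoverable from the extra column.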
3. A data analytics team at Northwind Traders shares a single all-purpose cluster among six analysts. The cluster must enforce Unity Catalog fine-grained access controls for each analyst individually, including column masks and row filters, so that each analyst sees only the data they are authorized to access. Analysts use a combination of Python and SQL in their notebooks. Which cluster access mode should the workspace administrator configure? (Select one!)
Explanation
Shared access mode (labeled Standard in current Azure Databricks workspaces) is the correct choice when multiple users must share a cluster while maintaining per-user Unity Catalog security enforcement, including column masks and row filters. In Shared mode, each query is evaluated against the identity of the individual user who submitted it, ensuring that column masks and row filters are applied correctly for every analyst on the cluster. Shared mode supports both Python and SQL workloads. No Isolation Shared mode allows multiple users on the same cluster but does not enforce Unity Catalog fine-grained security controls, so column masks and row filters are not applied and all users see unfiltered data. Single User (Dedicated) mode enforces Unity Catalog security but is restricted to a single user at a time, making it unsuitable for a shared team environment. High Concurrency mode is a legacy cluster configuration predating Unity Catalog and does not support modern fine-grained access controls such as column masks and row filters.
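For teams that define clusters as code, the access mode appears as a single field in the cluster definition. The sketch below assumes the Clusters API field `data_security_mode`, with `USER_ISOLATION` as the value corresponding to Shared access mode; the cluster name, runtime version, and node type are placeholders and should be checked against the workspace before use.

```python
import json

# Hypothetical cluster definition in the shape accepted by the Databricks
# Clusters API. Name, runtime version, and node type are placeholders.
shared_cluster_spec = {
    "cluster_name": "analyst-shared",        # hypothetical name
    "spark_version": "15.4.x-scala2.12",     # placeholder runtime version
    "node_type_id": "Standard_DS3_v2",       # placeholder Azure VM size
    "autoscale": {"min_workers": 1, "max_workers": 4},
    # Assumed mapping: USER_ISOLATION -> Shared access mode,
    # SINGLE_USER -> Single User, NONE -> No Isolation Shared.
    "data_security_mode": "USER_ISOLATION",
}

print(json.dumps(shared_cluster_spec, indent=2))
```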
4. Contoso's data engineering team configures an Auto Loader stream to ingest JSON event files into a Bronze Delta table without specifying an explicit schema. They configure a schemaLocation to persist the inferred schema state between runs. After several weeks, the upstream system begins including a new field called session_duration in the JSON payloads. What will Auto Loader do when it processes files containing this new field? (Select one!)
Explanation
When no explicit schema is provided to Auto Loader, the default schema evolution mode is addNewColumns. Under this mode, new columns discovered in incoming data are added to the schema. Per the Databricks documentation, Auto Loader first records the updated schema in the schemaLocation and then stops the stream with an UnknownFieldException; once the stream restarts, processing continues with the new column included, which is why such streams are usually run in jobs configured to restart automatically. Writing the new column to the target Delta table also requires schema merging to be enabled on the write (mergeSchema). The schemaLocation setting controls where Auto Loader persists schema inference state between runs; it does not change the default evolution mode. This is a documented gotcha: many engineers assume specifying a schemaLocation locks the schema, but it does not. The _rescued_data column captures data that cannot conform to the current schema, such as type mismatches. The none evolution mode is the default only when an explicit user-provided schema is supplied, not when Auto Loader is inferring the schema on its own.
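One way to avoid relying on these defaults is to state the evolution mode explicitly when building the reader options. The helper below is a minimal sketch: the option keys (`cloudFiles.format`, `cloudFiles.schemaLocation`, `cloudFiles.schemaEvolutionMode`) are Auto Loader options, while the function name and the volume path are hypothetical.

```python
from typing import Optional


def auto_loader_options(file_format: str,
                        schema_location: Optional[str],
                        explicit_schema: bool) -> dict:
    """Return Auto Loader reader options with the evolution mode spelled out."""
    if not explicit_schema and not schema_location:
        # Schema inference needs somewhere to persist the inferred schema.
        raise ValueError("schema inference requires cloudFiles.schemaLocation")
    options = {
        "cloudFiles.format": file_format,
        # Mirrors the documented defaults: 'none' with a user-supplied
        # schema, 'addNewColumns' when the schema is inferred.
        "cloudFiles.schemaEvolutionMode":
            "none" if explicit_schema else "addNewColumns",
    }
    if schema_location:
        options["cloudFiles.schemaLocation"] = schema_location
    return options


inferred = auto_loader_options(
    "json", "/Volumes/main/bronze/_schemas/events", explicit_schema=False)
fixed = auto_loader_options("json", None, explicit_schema=True)
print(inferred)
print(fixed)
# On Databricks the dict could be passed along as:
#   spark.readStream.format("cloudFiles").options(**inferred).load(source_path)
```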
5. VanArsdel Ltd's data engineering team is troubleshooting a slow Lakeflow Job stage using the Spark UI. The Stage view shows 200 tasks in the stage. Nearly all tasks complete in under 4 seconds, but 3 tasks have been running for over 10 minutes, blocking the entire stage from completing. Executor memory metrics show no disk spills and all executors have identical configurations. What condition does this pattern most likely indicate, and what should the team investigate first? (Select one!)
Explanation
The described pattern — the vast majority of tasks completing quickly while a very small subset of tasks runs orders of magnitude longer — is the definitive Spark UI signature of data skew. Data skew occurs when certain key values in a join or aggregation column appear far more frequently than others, causing those specific partitions to receive and process disproportionately more data. The Spark UI Stage view with a small number of extreme outlier task durations alongside a large number of fast-completing tasks is the primary diagnostic indicator. The team should examine the distribution of the shuffle key and consider salting skewed keys, repartitioning on a higher-cardinality key, or enabling Adaptive Query Execution skew join optimization. Insufficient shuffle partitions would cause all tasks to be slow proportionally, not a tiny subset. Driver memory pressure affects the post-task collection phase, not individual task execution. Broadcast variable deserialization delays occur at task startup and affect all tasks receiving the broadcast, not a small outlier subset.
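The effect of salting a skewed key can be seen in a toy partitioner written in plain Python. This is a sketch under simplifying assumptions: `zlib.crc32` stands in for Spark's hash partitioning, eight partitions stand in for the shuffle partitions, and the key names are invented. It shows only how rows are distributed, not a full Spark job.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 8   # toy stand-in for the number of shuffle partitions
SALT_BUCKETS = 8     # number of sub-keys each original key is split into


def partition_of(key: str) -> int:
    """Deterministic hash partitioner (crc32 here, not Spark's own hash)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def largest_partition_share(keys: list[str]) -> float:
    """Fraction of all rows that land in the busiest partition."""
    counts = Counter(partition_of(k) for k in keys)
    return max(counts.values()) / len(keys)


# 90% of rows share one hot key; the rest are spread over many keys.
keys = ["hot_customer"] * 9000 + [f"customer_{i}" for i in range(1000)]

# Append a salt suffix. Round-robin keeps this demo deterministic; in Spark
# the salt is usually a random integer column.
salted_keys = [f"{k}#{i % SALT_BUCKETS}" for i, k in enumerate(keys)]

unsalted_share = largest_partition_share(keys)
salted_share = largest_partition_share(salted_keys)
print(f"busiest partition without salt: {unsalted_share:.0%}")
print(f"busiest partition with salt:    {salted_share:.0%}")
```

Without the salt, every `hot_customer` row hashes to the same partition, which mirrors the handful of long-running tasks in the Stage view. When salting a join, the other side has to be replicated once per salt value so that matches still line up, and a salted aggregation needs a second pass to combine the partial results.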