Databricks · DCMLEP

Databricks Certified Machine Learning Professional Practice Test

Validates advanced expertise in designing and managing enterprise-scale machine learning solutions on Databricks, covering scalable model development with distributed training, MLOps practices including testing and deployment with Databricks Asset Bundles, and model monitoring with Lakehouse Monitoring.

Exam Details

Questions: 622 (practice bank)

Duration: 120 minutes

Passing Score: 70%

Difficulty: Professional

Last Updated: Feb 2026

Databricks Certified Machine Learning Professional Practice Exam Preparation

Use this DCMLEP practice exam to prepare for Databricks Certified Machine Learning Professional with realistic questions, detailed explanations, and focused study modes. The practice bank includes 622 questions for Databricks DCMLEP, so you can review the exam steadily instead of relying on one long cram session.

As you practice, pay extra attention to patterns in your missed answers. Start with short sessions to identify weak areas, then move into timed quizzes once your accuracy is consistent.

The explanations are especially useful when you want to connect exam wording to the responsibilities and scenarios described in the official certification guidance. Use the free preview first, then unlock the full question bank when you are ready to build a complete study routine.

Exam Domain Breakdown

Model Development: 47%

MLOps: 43%

Model Deployment: 10%

Exam Overview

The Databricks Certified Machine Learning Professional certification validates advanced expertise in designing, implementing, and managing enterprise-scale machine learning solutions on the Databricks Lakehouse Platform. It covers the full spectrum of production ML engineering: building scalable pipelines with SparkML, implementing distributed training and hyperparameter tuning using Ray and Optuna, and leveraging advanced MLflow capabilities such as nested runs, custom metrics, model flavors, PyFunc custom models, and Model Registry webhooks.

The certification also emphasizes modern MLOps practices, including ML pipeline testing strategies (unit and integration tests), environment management via Databricks Asset Bundles (DABs) for infrastructure-as-code, automated retraining workflows, and production monitoring with Lakehouse Monitoring for detecting feature drift, label drift, prediction drift, and concept drift. Model deployment topics include blue-green and canary deployment strategies, custom model serving endpoints, and rollout management through Databricks Model Serving and the MLflow Deployments SDK. The exam was updated in September 2025 to consolidate its structure into three core domains.

Who Should Take This Exam

This certification is designed for senior ML engineers, MLOps engineers, and data scientists with at least one year of hands-on experience building and operationalizing machine learning workflows on Databricks. It is appropriate for professionals who work at enterprise scale—managing multi-environment ML deployments, automated retraining pipelines, and production monitoring—rather than those focused solely on model experimentation.

Typical candidates hold roles such as Machine Learning Engineer, MLOps Engineer, Senior Data Scientist, or ML Platform Engineer. Those already holding the Databricks Certified Machine Learning Associate credential who want to demonstrate deeper production-level expertise are also a natural fit for this exam.

Prerequisites

There are no formal prerequisites required to register for this exam. However, Databricks strongly recommends at least one year of hands-on experience performing the advanced ML engineering tasks outlined in the official exam guide. Candidates are expected to have practical familiarity with the Databricks platform, Apache Spark and SparkML, MLflow experiment tracking and model registry, and Python-based ML workflows.

Databricks recommends completing the instructor-led courses 'Machine Learning at Scale' and 'Advanced MLOps on Databricks' before attempting the exam. Candidates who hold the Databricks Certified Machine Learning Associate credential will find that foundational knowledge helpful, though it is not a required prerequisite.

Exam Format

The exam consists of 59 scored multiple-choice questions to be completed within 120 minutes. There are no hands-on labs or interactive coding tasks. Many questions are scenario-based, presenting real-world Databricks ML workflows and asking candidates to select the most appropriate approach. The exam may also include a small number of unscored items used for statistical research. These items are not identified on the exam form and do not affect the final score, and additional time is factored in to accommodate them.

The exam is delivered online through Databricks' exam delivery platform and costs USD $200. A passing score of 70% is required. Certification is valid for two years, after which recertification requires retaking the current version of the exam. The current version of the exam is the September 2025 edition.

Skills Measured

1. Domain 1: Model Development (~47%, 22 objectives). Covers building scalable ML pipelines using SparkML estimators, transformers, and pipeline APIs; distributed training with pandas Function APIs and UDFs; hyperparameter tuning at scale using Ray Tune and Optuna; advanced MLflow usage including nested runs, custom metrics, experiment tracking, and model flavors; and Feature Store implementation including automated feature pipelines and point-in-time lookups.
2. Domain 2: MLOps (~43%, 20 objectives). Covers MLOps architecture patterns and multi-environment setups; ML pipeline testing including unit and integration test strategies; Databricks Asset Bundles (DABs) for managing ML assets as infrastructure-as-code; automated retraining workflows with triggers and orchestration; and production monitoring with Lakehouse Monitoring, including configuring drift detection for feature, label, prediction, and concept drift, and setting up alerting on metric degradation.
3. Domain 3: Model Deployment (~10%, 5 objectives). Covers deployment strategies including blue-green and canary rollouts; deploying models via Databricks Model Serving endpoints; building custom model serving solutions with MLflow PyFunc; managing model versions and traffic splitting; and using the REST API and MLflow Deployments SDK for programmatic deployment management.
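
The traffic-splitting objective in Domain 3 can be pictured with a small helper. This is a minimal sketch rather than an official client: `canary_routes` and the model names are hypothetical, and the dictionary keys (`routes`, `served_model_name`, `traffic_percentage`) are assumed to match the Model Serving route format, so check them against the current API reference before relying on them.

```python
def canary_routes(stable_name: str, candidate_name: str, candidate_pct: int) -> dict:
    """Build a two-route traffic config sending candidate_pct percent to the candidate."""
    if not 0 <= candidate_pct <= 100:
        raise ValueError("candidate_pct must be between 0 and 100")
    return {
        "routes": [
            # Key names are an assumption about the serving API payload.
            {"served_model_name": stable_name, "traffic_percentage": 100 - candidate_pct},
            {"served_model_name": candidate_name, "traffic_percentage": candidate_pct},
        ]
    }


# Hypothetical rollout schedule: widen the canary in steps.
rollout = [canary_routes("churn-v1", "churn-v2", pct) for pct in (5, 25, 50, 100)]

for config in rollout:
    total = sum(route["traffic_percentage"] for route in config["routes"])
    assert total == 100  # every step must still account for all traffic
```

A blue-green switch is the special case that moves from 0 to 100 percent in a single step, while a canary rollout walks through the intermediate percentages and watches metrics at each one.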

Study Tips

Download and study the official Databricks Certified Machine Learning Professional Exam Guide (available at files.training.databricks.com) to understand the exact objectives and their distribution across the three domains before building your study plan.
Study Lakehouse Monitoring in depth, since it appears in approximately 10 exam objectives. Understand all four drift types (feature, label, prediction, concept), the statistical tests used to detect each, and how to configure monitor profiles and alerting in the Databricks UI and via the API.
Complete the official Databricks instructor-led courses 'Machine Learning at Scale' and 'Advanced MLOps on Databricks', then work through the end-to-end MLOps demo notebooks in the Databricks documentation to see how Asset Bundles, model serving, and monitoring connect in a production workflow.
Practice with Databricks Asset Bundles (DABs) hands-on using the Databricks Free Edition. Know how to define ML assets (jobs, model serving endpoints, MLflow experiments) in YAML bundle configuration files and deploy them across dev/staging/prod environments.
Master MLflow's advanced features beyond basic experiment tracking: nested runs for hyperparameter search logging, custom PyFunc model classes, model signatures and input examples, Model Registry stage transitions, and webhook configurations for CI/CD triggers.
Build fluency with distributed hyperparameter tuning: understand when to use Ray Tune vs. Optuna, how to integrate them with MLflow for run logging, and how SparkML's CrossValidator and TrainValidationSplit compare to these libraries for tuning at scale.
Use scenario-based practice questions that mirror real Databricks workflows — the exam is heavily applied. Focus on recognizing the correct tool or API for a given production situation rather than memorizing definitions.
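
To make the drift-detection tip above more concrete, here is a minimal sketch of one common drift statistic for a numeric feature, the two-sample Kolmogorov-Smirnov statistic, written with the standard library only. The function name `ks_statistic` and the sample data are hypothetical, and this is an illustration of the general idea, not a description of how Lakehouse Monitoring computes its metrics internally.

```python
from bisect import bisect_right


def ks_statistic(baseline, current):
    """Largest gap between the empirical CDFs of two numeric samples (0.0 to 1.0)."""
    a, b = sorted(baseline), sorted(current)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # The gap between two step functions can only change at observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical feature values: training baseline vs. two production windows.
baseline = [1, 2, 3, 4]
same_distribution = [1, 2, 3, 4]
shifted = [3, 4, 5, 6]

print(ks_statistic(baseline, same_distribution))
print(ks_statistic(baseline, shifted))
```

A value near 0 suggests the two windows look alike, while a value near 1 suggests the feature distribution has moved. In practice the statistic is paired with a threshold or p-value before it triggers an alert.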

Career Benefits

Earning this certification positions ML engineers and data scientists for senior and staff-level roles that require end-to-end ownership of production ML systems. Job titles commonly associated with this credential include Senior Machine Learning Engineer, MLOps Engineer, ML Platform Engineer, and AI/ML Architect. Organizations adopting the Databricks Lakehouse Platform at scale—particularly in finance, healthcare, retail, and technology sectors—actively seek professionals who can demonstrate validated expertise in production ML workflows rather than just model development.

Sample Questions

5 sample questions with answers and explanations. Start a practice session to test yourself across all 622 questions.

1. A data science team is implementing point-in-time feature lookups to prevent data leakage in a credit scoring model. They want to exclude feature values older than 14 days from the training set. Which parameter and data type should they use in the FeatureLookup configuration? (Select one!)

A. lookback_window=14 using integer type

B. max_age_days=14 using integer type

C. lookback_window=timedelta(days=14) using datetime.timedelta type

D. timestamp_threshold='14d' using string type

Explanation

The lookback_window parameter in FeatureLookup requires a datetime.timedelta object to specify the time window for excluding older feature values. This parameter is available in Feature Store client v0.13.0+ and all versions of Feature Engineering in Unity Catalog. The lookback window is applied during training and batch inference to prevent using stale features. Using an integer, string, or non-existent parameter like max_age_days will result in an error. The timedelta type ensures precise time calculations across different timestamp formats.
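
The effect of a lookback window on a point-in-time lookup can be sketched in plain Python. This is a simplified illustration, not the Feature Engineering client: `latest_feature_as_of` and the sample rows are hypothetical, and treating a value exactly 14 days old as still eligible is an assumption made for this sketch.

```python
from datetime import datetime, timedelta
from typing import Optional


def latest_feature_as_of(feature_rows, event_time: datetime,
                         lookback_window: Optional[timedelta] = None):
    """Return the newest feature value at or before event_time, or None.

    feature_rows is an iterable of (timestamp, value) pairs. Values older
    than lookback_window (measured back from event_time) are excluded.
    """
    eligible = [
        (ts, value)
        for ts, value in feature_rows
        if ts <= event_time  # never look into the future (prevents leakage)
        and (lookback_window is None or event_time - ts <= lookback_window)
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda row: row[0])[1]


# Hypothetical feature history for one customer.
rows = [(datetime(2024, 1, 1), 0.2), (datetime(2024, 1, 20), 0.7)]
window = timedelta(days=14)

print(latest_feature_as_of(rows, datetime(2024, 1, 25), window))  # value is 5 days old
print(latest_feature_as_of(rows, datetime(2024, 1, 18), window))  # only a 17-day-old value exists
```

The second call returns no value because the only feature recorded before the event is older than the window, which is the behavior the question describes.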

2. An ML engineer is debugging an inference table that captures requests to a Model Serving endpoint. They observe that some large prediction requests show null values in the request and response columns, while smaller requests are logged correctly. The endpoint is configured with auto_capture_config enabled. What is the most likely cause of this behavior? (Select one!)

A. The endpoint workload_size is insufficient to handle large requests

B. The sampling_fraction parameter is set too low for large requests

C. The auto_capture_config schema_name is not correctly configured

D. Inference tables have a 1MB payload size limit and drop larger payloads

Explanation

Databricks inference tables have a documented 1MB maximum payload size limit. When request or response payloads exceed this limit, they are logged as null values instead of being truncated or rejected. This explains why smaller requests log correctly while larger ones show null. The sampling_fraction parameter controls what percentage of requests are logged, not the size threshold for null values. Insufficient workload_size would cause request failures or timeouts, not selective null logging. Incorrect schema_name configuration would prevent all logging or cause permission errors, not size-based null values.
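
A quick client-side check can flag requests that are likely to be logged as null. This is a rough sketch under stated assumptions: the helper names are hypothetical, "1MB" is interpreted here as 1,048,576 bytes, and the size is measured on a local JSON serialization, which may differ from how the service measures the payload.

```python
import json

# Assumption: the documented "1MB" limit is treated as 1 MiB for this sketch.
PAYLOAD_LIMIT_BYTES = 1024 * 1024


def payload_size_bytes(payload) -> int:
    """Size of the payload when serialized as UTF-8 encoded JSON."""
    return len(json.dumps(payload).encode("utf-8"))


def would_be_logged(payload) -> bool:
    """True if the payload fits under the assumed inference table limit."""
    return payload_size_bytes(payload) <= PAYLOAD_LIMIT_BYTES


# Hypothetical requests: a small numeric batch and an oversized text input.
small_request = {"inputs": [[0.1, 0.2, 0.3]]}
large_request = {"inputs": ["x" * 2_000_000]}

print(would_be_logged(small_request))
print(would_be_logged(large_request))
```

Logging the computed size alongside the request ID on the client makes it easier to confirm that null rows in the inference table line up with oversized payloads.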

3. A data science team is implementing batch inference using a registered MLflow model with Spark UDF. They need to specify the Python environment manager for dependency isolation. Which env_manager parameter value should they use for the most reliable dependency management in Databricks? (Select one!)

A. virtualenv

B. conda

C. docker

D. local

Explanation

The virtualenv environment manager is recommended for Spark UDF batch inference in Databricks as it provides reliable dependency isolation and works well with Databricks Runtime. The conda environment manager can be slower to create environments. The local environment manager uses the current Python environment without isolation, risking dependency conflicts. Docker environment manager is not supported for spark_udf in Databricks.
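
The setting discussed above is passed when the model is wrapped as a Spark UDF. The following is a minimal sketch that assumes MLflow and PySpark are available on the cluster; `build_scoring_udf`, `score`, and the `prediction` column name are hypothetical helpers chosen for illustration, and the imports are deferred so the module itself loads without either library.

```python
ENV_MANAGER = "virtualenv"  # the value recommended in the explanation above


def build_scoring_udf(spark, model_uri: str):
    """Wrap a registered model as a Spark UDF with an isolated Python environment."""
    import mlflow.pyfunc  # deferred import: only needed when actually scoring

    return mlflow.pyfunc.spark_udf(spark, model_uri=model_uri, env_manager=ENV_MANAGER)


def score(df, scoring_udf, feature_cols):
    """Add a prediction column by applying the UDF to the given feature columns."""
    from pyspark.sql.functions import struct  # deferred import, as above

    return df.withColumn("prediction", scoring_udf(struct(*feature_cols)))
```

On a cluster this might be called as `score(df, build_scoring_udf(spark, model_uri), feature_cols)`, where `model_uri` points at the registered model version to use. The first call is typically slower because the isolated environment has to be created from the model's logged dependencies.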

4. An ML engineer is comparing model performance metrics from multiple MLflow experiments. They need to find all runs where the F1 score exceeds 0.85 and the model type parameter equals Random Forest, then sort results by F1 score in descending order. Which filter string and order_by syntax is correct? (Select one!)

A. filter_string="metrics.f1 > 0.85 OR params.model_type = 'Random Forest'", order_by=["metrics.f1 DESC"]

B. filter_string="metrics.f1 > 0.85 AND params.model_type = 'Random Forest'", order_by=["metrics.f1 DESC"]

C. filter_string="metrics['f1'] > 0.85 AND params['model_type'] = 'Random Forest'", order_by=["metrics.f1 DESC"]

D. filter_string="metrics.f1 > 0.85 AND params.model_type = Random Forest", order_by=["metrics.f1 DESC"]

Explanation

MLflow search filter syntax requires quotes around parameter string values but not around metric names. The correct syntax is params.model_type = 'Random Forest' with single quotes. Omitting quotes around the parameter value causes a syntax error. Using OR instead of AND returns runs meeting either condition rather than both. Using bracket notation like metrics['f1'] is invalid in MLflow filter strings; dot notation is required.
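
The syntax rules above can be captured in a small query builder. This is a sketch, not an official utility: `build_run_query` and `find_runs` are hypothetical helpers, the metric and parameter names come from the question, and rejecting values that contain a single quote is a simplification chosen here to avoid guessing at escaping rules.

```python
def build_run_query(min_f1: float, model_type: str) -> dict:
    """Build filter_string and order_by arguments for an MLflow run search."""
    if "'" in model_type:
        raise ValueError("model_type must not contain a single quote")
    return {
        # Dot notation for keys; single quotes around the string parameter value.
        "filter_string": f"metrics.f1 > {min_f1} AND params.model_type = '{model_type}'",
        "order_by": ["metrics.f1 DESC"],
    }


def find_runs(experiment_ids, min_f1: float, model_type: str):
    """Run the search; requires MLflow, so the import is deferred."""
    import mlflow

    return mlflow.search_runs(
        experiment_ids=experiment_ids, **build_run_query(min_f1, model_type)
    )


query = build_run_query(0.85, "Random Forest")
print(query["filter_string"])
```

Building the string in one place keeps the quoting consistent across notebooks and makes the filter easy to unit test without a tracking server.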

5. A data engineering team is building a SparkML pipeline to process customer reviews. They need to convert text into numeric vectors for model training. The pipeline should handle variable-length text, create consistent feature vectors of size 1000, and handle unknown words encountered during prediction. Which transformer should they use? (Select one!)

A. HashingTF with numFeatures set to 1000

B. CountVectorizer with vocabSize set to 1000

C. VectorAssembler with numFeatures set to 1000

D. Word2Vec with vectorSize set to 1000

Explanation

HashingTF applies the hashing trick to convert tokenized text into fixed-size feature vectors, and it handles unknown words at prediction time by hashing them into the same feature space. CountVectorizer learns a fixed vocabulary during training and ignores words outside that vocabulary at prediction time, so unseen words contribute nothing to the vector. Word2Vec produces dense embeddings sized by its vectorSize parameter, but it learns word representations during training and likewise has no vector for out-of-vocabulary words. VectorAssembler combines existing numeric and vector columns into a single vector; it does not process text and has no numFeatures parameter.
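
The hashing trick itself fits in a few lines of plain Python. This sketch is an illustration of the idea rather than Spark's implementation: `hashing_tf` is a hypothetical function, MD5 is swapped in as a deterministic standard-library hash (Spark uses a different hash function), and the result is a dense list instead of a sparse vector.

```python
import hashlib


def hashing_tf(tokens, num_features: int = 1000):
    """Map tokens to a fixed-length term-frequency vector via hashing."""
    vector = [0] * num_features
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        # Any token, seen before or not, lands in one of num_features buckets.
        index = int.from_bytes(digest[:8], "big") % num_features
        vector[index] += 1
    return vector


# Hypothetical reviews: the second contains a word never seen during training.
train_review = "great product fast shipping".split()
new_review = "great product zzyzx".split()

train_vec = hashing_tf(train_review)
new_vec = hashing_tf(new_review)
print(len(train_vec), sum(train_vec))
print(len(new_vec), sum(new_vec))
```

No vocabulary is stored, so every token still contributes to the vector at prediction time. The trade-off is that distinct words can collide in the same bucket, which becomes more likely as num_features shrinks.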

More Databricks Practice Exams

Databricks Certified Machine Learning Associate

DCMLEA · 630 questions

Databricks Certified Data Engineer Associate

DCDEA · 628 questions

Databricks Certified Data Engineer Professional

DCDEP · 628 questions

Databricks Certified Data Analyst Associate

DCDAA · 627 questions

Databricks Certified Generative AI Engineer Associate

DCGAE · 620 questions

Databricks Certified Associate Developer for Apache Spark

DCASD · 604 questions

$17.99

One-time access to this exam

Full access to all 622 questions

Or $15/mo for all 253 exams

Detailed explanations

Free preview stays available