Databricks • DCMLEP
Validates advanced expertise in designing and managing enterprise-scale machine learning solutions on Databricks, covering scalable model development with distributed training, MLOps practices including testing and deployment with Databricks Asset Bundles, and model monitoring with Lakehouse Monitoring.
Questions
622
Duration
120 minutes
Passing Score
70%
Difficulty
Professional
Last Updated
Feb 2026
The Databricks Certified Machine Learning Professional certification validates advanced expertise in designing, implementing, and managing enterprise-scale machine learning solutions on the Databricks Lakehouse Platform. It covers the full spectrum of production ML engineering: building scalable pipelines with SparkML, implementing distributed training and hyperparameter tuning using Ray and Optuna, and leveraging advanced MLflow capabilities such as nested runs, custom metrics, model flavors, PyFunc custom models, and Model Registry webhooks.
The certification also emphasizes modern MLOps practices, including ML pipeline testing strategies (unit and integration tests), environment management via Databricks Asset Bundles (DABs) for infrastructure-as-code, automated retraining workflows, and production monitoring with Lakehouse Monitoring for detecting feature drift, label drift, prediction drift, and concept drift. Model deployment topics include blue-green and canary deployment strategies, custom model serving endpoints, and rollout management through Databricks Model Serving and the MLflow Deployments SDK. The exam was updated in September 2025 to consolidate its structure into three core domains.
This certification is designed for senior ML engineers, MLOps engineers, and data scientists with at least one year of hands-on experience building and operationalizing machine learning workflows on Databricks. It is appropriate for professionals who work at enterprise scale—managing multi-environment ML deployments, automated retraining pipelines, and production monitoring—rather than those focused solely on model experimentation.
Typical candidates hold roles such as Machine Learning Engineer, MLOps Engineer, Senior Data Scientist, or ML Platform Engineer. Those already holding the Databricks Certified Machine Learning Associate credential who want to demonstrate deeper production-level expertise are also a natural fit for this exam.
There are no formal prerequisites required to register for this exam. However, Databricks strongly recommends at least one year of hands-on experience performing the advanced ML engineering tasks outlined in the official exam guide. Candidates are expected to have practical familiarity with the Databricks platform, Apache Spark and SparkML, MLflow experiment tracking and model registry, and Python-based ML workflows.
Databricks recommends completing the instructor-led courses 'Machine Learning at Scale' and 'Advanced MLOps on Databricks' before attempting the exam. Candidates who hold the Databricks Certified Machine Learning Associate credential will find that foundational knowledge helpful, though it is not a required prerequisite.
The exam consists of 59 scored multiple-choice questions to be completed within 120 minutes. All questions are multiple-choice; there are no hands-on labs or interactive coding tasks. Many questions are scenario-based, presenting real-world Databricks ML workflows and asking candidates to select the most appropriate approach. The exam may also include a small number of unscored items collected for statistical research purposes; these are not identified on the exam form and do not affect the final score, with additional time factored in to accommodate them.
The exam is delivered online through Databricks' exam delivery platform and costs USD $200. A passing score of 70% is required. Certification is valid for two years, after which recertification requires retaking the current version of the exam. The current version of the exam is the September 2025 edition.
Earning this certification positions ML engineers and data scientists for senior and staff-level roles that require end-to-end ownership of production ML systems. Job titles commonly associated with this credential include Senior Machine Learning Engineer, MLOps Engineer, ML Platform Engineer, and AI/ML Architect. Organizations adopting the Databricks Lakehouse Platform at scale—particularly in finance, healthcare, retail, and technology sectors—actively seek professionals who can demonstrate validated expertise in production ML workflows rather than just model development.
5 sample questions with correct answers and explanations.
1. A machine learning team is implementing distributed PyTorch training for a computer vision model across 4 worker nodes, each with 2 GPUs. They need the training to utilize all 8 GPUs across the cluster. How should they configure TorchDistributor? (Select one!)
Explanation
The correct configuration is TorchDistributor with num_processes=8, local_mode=False, and use_gpu=True. Setting local_mode=False distributes the training processes across the worker nodes in the cluster, and num_processes=8 matches the total GPU count (4 nodes × 2 GPUs each). With local_mode=True, all 8 processes would run on the driver node, which does not have 8 GPUs. With num_processes=4, only 4 of the 8 available GPUs would be used. With use_gpu=False, training would run on CPUs and leave the GPUs idle.
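The configuration in the explanation above can be sketched as follows. This is a minimal illustration, not a complete training job: distributor_kwargs, launch, and train_fn are hypothetical names, and the TorchDistributor import is deferred because it assumes PySpark 3.4 or later with an active Spark session on a GPU cluster.

```python
def distributor_kwargs(num_nodes: int, gpus_per_node: int) -> dict:
    """Build TorchDistributor arguments that use every GPU on the workers."""
    return {
        "num_processes": num_nodes * gpus_per_node,  # one process per GPU
        "local_mode": False,  # run on the worker nodes, not only the driver
        "use_gpu": True,      # assign a GPU to each process
    }


def launch(train_fn, *args):
    """Run train_fn across the cluster; needs an active Spark session."""
    # Imported here so the helper above works without PySpark installed.
    from pyspark.ml.torch.distributor import TorchDistributor

    distributor = TorchDistributor(**distributor_kwargs(num_nodes=4, gpus_per_node=2))
    return distributor.run(train_fn, *args)


kwargs = distributor_kwargs(num_nodes=4, gpus_per_node=2)
print(kwargs)
```

Deriving num_processes from the cluster shape, rather than hard-coding 8, keeps the setting correct if the number of workers or GPUs per worker changes.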
2. A machine learning team is monitoring a regression model using Lakehouse Monitoring. The model predicts customer lifetime value. During the review of drift metrics, they observe that the Wasserstein distance for the predicted_value column has increased from 0.05 to 0.35 over the past week while input feature distributions remain stable. Which type of drift does this indicate? (Select one!)
Explanation
Prediction drift occurs when the distribution of model outputs changes. Wasserstein distance measures the difference between two probability distributions, so an increase from 0.05 to 0.35 on the predicted_value column, with stable input features, indicates prediction drift. Feature drift would show up as changes in the input feature distributions, which the question states are stable. Label drift cannot be read from this metric, because detecting it requires comparing actual target values. Concept drift, a change in the relationship between features and target, may be an underlying cause, but it is not what the metric observes directly; prediction drift is.
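To make the metric concrete, here is a small pure-Python sketch of the 1-D Wasserstein distance for two samples of equal size, plus a toy rule that names the drift directly observed. It is not how Lakehouse Monitoring computes its metrics internally; the function names, the 0.1 threshold, and the 0.02 feature distance are assumptions chosen for illustration.

```python
def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-sized 1-D samples."""
    if len(a) != len(b):
        raise ValueError("this simplified version needs equal sample sizes")
    # For equal-sized samples, the distance is the mean gap between
    # the sorted values of the two samples.
    pairs = zip(sorted(a), sorted(b))
    return sum(abs(x - y) for x, y in pairs) / len(a)


def diagnose(feature_distance, prediction_distance, threshold=0.1):
    """Name the drift that is directly observed (threshold is illustrative)."""
    if feature_distance > threshold:
        return "feature drift"
    if prediction_distance > threshold:
        return "prediction drift"
    return "no drift detected"


baseline = [0.0, 1.0, 2.0]
current = [1.0, 2.0, 3.0]  # every value shifted up by 1
shift = wasserstein_1d(baseline, current)

# Stable features (hypothetical 0.02) with predictions at 0.35, as in the question.
result = diagnose(feature_distance=0.02, prediction_distance=0.35)
print(shift, result)
```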
3. A data engineering team is evaluating a multi-class classification model using SparkML. They need to select an evaluation metric that balances precision and recall across all classes while accounting for class imbalance. Which metric should they use with MulticlassClassificationEvaluator? (Select one!)
Explanation
The f1 metric is the harmonic mean of precision and recall. It is the default metric for MulticlassClassificationEvaluator, where it is computed as a weighted average of the per-class F1 scores, so it reflects both false positives and false negatives for every class. The accuracy metric can be misleading on imbalanced datasets, because it can stay high even when minority classes are poorly predicted. The weightedPrecision metric accounts for false positives but ignores false negatives, giving an incomplete picture. The logLoss metric measures the quality of the predicted probabilities rather than the balance between precision and recall.
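The difference between accuracy and F1 on imbalanced data can be checked with a small worked example. The sketch below computes a support-weighted F1 in plain Python, on the assumption that this mirrors the weighted averaging behind the evaluator's f1 metric. The data and function name are made up for illustration.

```python
from collections import Counter


def weighted_f1(labels, predictions):
    """Per-class F1 averaged with weights equal to each class's label share."""
    support = Counter(labels)
    total = 0.0
    for cls, count in support.items():
        tp = sum(1 for y, p in zip(labels, predictions) if y == cls and p == cls)
        fp = sum(1 for y, p in zip(labels, predictions) if y != cls and p == cls)
        fn = count - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += f1 * count / len(labels)
    return total


# Nine majority-class rows, one minority-class row; the model always predicts 0.
labels = [0] * 9 + [1]
predictions = [0] * 10
accuracy = sum(y == p for y, p in zip(labels, predictions)) / len(labels)
score = weighted_f1(labels, predictions)
print(accuracy, round(score, 4))
```

Here the minority class contributes an F1 of zero, which pulls the weighted score below the accuracy. On a Spark DataFrame of predictions, the corresponding call would be MulticlassClassificationEvaluator(metricName="f1").evaluate(predictions_df), where predictions_df is a hypothetical name.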
4. A data science team is implementing batch inference using a Spark UDF to apply a registered Unity Catalog model across 100 million records. They need to specify the Python environment manager for dependency isolation. Which env_manager option provides virtual environment isolation for the UDF? (Select one!)
Explanation
Setting env_manager="virtualenv" creates an isolated Python virtual environment for Spark UDF execution, so the model's dependencies do not conflict with cluster libraries. The conda environment manager is deprecated for Spark UDFs in recent MLflow versions. The local environment manager uses the cluster's existing Python environment without isolation, which risks dependency conflicts. "container" is not a valid env_manager option for mlflow.pyfunc.spark_udf.
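A minimal sketch of the batch-inference setup described above, assuming MLflow is installed and a Spark session is available. The helper names and the catalog, schema, and model names are hypothetical; the MLflow import is deferred so the URI helper can be used on its own.

```python
def uc_model_uri(catalog: str, schema: str, model: str, version: int) -> str:
    """Build a models:/ URI for a Unity Catalog registered model version."""
    return f"models:/{catalog}.{schema}.{model}/{version}"


def build_scoring_udf(spark, model_uri: str):
    """Create a Spark UDF that runs the model in its own virtualenv."""
    # Imported lazily so the URI helper works without MLflow installed.
    import mlflow

    mlflow.set_registry_uri("databricks-uc")  # resolve models from Unity Catalog
    return mlflow.pyfunc.spark_udf(
        spark,
        model_uri=model_uri,
        env_manager="virtualenv",  # isolate the model's dependencies
    )


# Hypothetical model coordinates, for illustration only.
uri = uc_model_uri("main", "ml", "clv_model", 3)
print(uri)
```

The returned UDF would then be applied to the feature columns of the input DataFrame, for example with withColumn, to score all records in parallel.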
5. An ML team is optimizing feature lookup performance for their production feature tables. They have databricks-feature-engineering version 0.7.0 installed and Photon is enabled on their cluster. The feature tables contain millions of rows with frequent updates. Which combination of optimizations will provide the best performance? (Select one!)
Explanation
Liquid Clustering is the recommended optimization for databricks-feature-engineering version 0.6.0 and above, replacing Z-Ordering as the preferred approach for feature tables. Setting use_spark_native_join=True enables Photon acceleration for feature joins, significantly improving lookup performance when Photon is enabled on the cluster. Z-Ordering is the legacy optimization technique that was recommended for versions below 0.6.0 and provides less efficient write operations compared to Liquid Clustering. Traditional partitioning requires manual maintenance and does not provide the same adaptive optimization benefits. Disabling Photon reduces performance rather than improving it, as Photon is specifically designed to accelerate Spark operations including feature engineering joins.
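The two optimizations named above can be sketched as follows. The sketch assumes that Liquid Clustering is enabled with an ALTER TABLE ... CLUSTER BY statement and that create_training_set accepts use_spark_native_join, as the explanation indicates. The table name, key column, and helper names are hypothetical, and the client import is deferred because it requires databricks-feature-engineering 0.6.0 or above.

```python
def liquid_clustering_sql(table: str, keys: list) -> str:
    """Return the ALTER TABLE statement that sets Liquid Clustering keys."""
    return f"ALTER TABLE {table} CLUSTER BY ({', '.join(keys)})"


def build_training_set(df, feature_lookups, label: str):
    """Join features using the Spark-native join path."""
    # Imported lazily; needs databricks-feature-engineering 0.6.0 or above.
    from databricks.feature_engineering import FeatureEngineeringClient

    fe = FeatureEngineeringClient()
    return fe.create_training_set(
        df=df,
        feature_lookups=feature_lookups,
        label=label,
        use_spark_native_join=True,  # lets Photon accelerate the feature join
    )


# Hypothetical feature table clustered on its lookup key.
statement = liquid_clustering_sql("main.ml.customer_features", ["customer_id"])
print(statement)
```

On a cluster, the generated statement would be run with spark.sql(statement) before the training set is built.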