Databricks • DCMLEA
Validates foundational knowledge of machine learning on the Databricks platform, covering AutoML, Feature Store, ML workflows and experiment tracking with MLflow, model development with Spark ML, and model deployment and serving.
Practice questions: 630
Duration: 90 minutes
Passing Score: 70%
Difficulty: Associate
Last Updated: Feb 2026
The Databricks Certified Machine Learning Associate certification validates foundational knowledge and practical ability to perform core machine learning tasks on the Databricks Lakehouse Platform. The exam covers the full ML lifecycle, including exploratory data analysis, feature engineering, model training, hyperparameter tuning, evaluation, and deployment using Databricks-native tooling such as AutoML, the Feature Store, Unity Catalog integration, and Managed MLflow for experiment tracking and model registry. Candidates are expected to demonstrate proficiency with both single-node and distributed machine learning approaches, including Spark ML APIs, Hyperopt with SparkTrials, and Pandas UDFs.
The certification was updated on October 28, 2024, to reflect current platform capabilities, including real-time, batch, and streaming inference patterns as well as MLOps best practices such as model metadata tagging. All machine learning code on the exam is in Python; data manipulation code outside ML-specific tasks may appear in SQL. The exam is administered online through Databricks' exam delivery platform and costs $200 USD, plus any applicable local taxes.
This certification is designed for data scientists, machine learning engineers, and ML-adjacent data engineers who perform machine learning workflows on Databricks and want to validate their skills at an associate level. Candidates are typically early-to-mid career practitioners with approximately 6 or more months of hands-on experience using Databricks for machine learning tasks including model training, tuning, and deployment.
The exam is also well-suited for analytics consultants and data engineers who collaborate closely with ML teams and want to deepen their understanding of the Databricks ML platform. It serves as a natural stepping stone toward the Databricks Certified Machine Learning Professional certification.
There are no formal prerequisites required to sit for this exam. However, Databricks recommends at least 6 months of hands-on experience performing machine learning tasks on the Databricks platform as outlined in the official exam guide. Candidates should have practical familiarity with Databricks workspaces, clusters, Repos, and Jobs, as well as the Databricks Runtime for Machine Learning and its bundled libraries.
A foundational understanding of machine learning concepts, including supervised learning, feature engineering, model evaluation metrics, and hyperparameter tuning, is expected. Familiarity with Python and a working knowledge of Apache Spark concepts (DataFrames, distributed computation) are strongly recommended, as Spark ML and distributed workflows feature prominently in the exam content.
The Databricks Certified Machine Learning Associate exam consists of 48 scored multiple-choice and multiple-response questions to be completed within 90 minutes. The passing score is 70%. The exam may include a small number of unscored items used to gather statistical data for future exam development; these items are not identified on the form, do not count toward the final score, and are accounted for in the total allotted time.
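As a rough planning aid, the figures above can be turned into a concrete pass mark and time budget. The Python sketch below assumes that the 70% threshold is a simple percentage of the 48 scored items, rounded up to a whole question. Databricks does not publish its scoring rule in this form, and the function names are hypothetical. Because unscored items share the same 90 minutes, the per-question time is an upper bound.

```python
# Figures from the exam description above.
SCORED_QUESTIONS = 48
PASSING_PERCENT = 70
DURATION_MINUTES = 90


def min_correct_to_pass(scored: int, percent: int) -> int:
    """Fewest correct answers that reach `percent` of `scored` items.

    Assumption (hypothetical rule): the pass mark is a plain percentage of
    scored items, rounded up to a whole question. Integer ceiling division
    avoids floating-point rounding (48 * 70 / 100 = 33.6 -> 34).
    """
    return (scored * percent + 99) // 100


def seconds_per_question(minutes: int, questions: int) -> float:
    """Average time per question; an upper bound if unscored items are added."""
    return minutes * 60 / questions


if __name__ == "__main__":
    print(min_correct_to_pass(SCORED_QUESTIONS, PASSING_PERCENT))   # 34
    print(seconds_per_question(DURATION_MINUTES, SCORED_QUESTIONS))  # 112.5
```

Under these assumptions, a candidate needs roughly 34 correct answers and has a little under two minutes per question.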
The exam is delivered online through Databricks' exam delivery platform and can be taken remotely. All ML code presented in questions is written in Python; SQL may appear for non-ML data manipulation scenarios. The certification is valid for two years from the date of passing, after which recertification is required to maintain certified status. The exam fee is $200 USD (local taxes may apply).
Holding the Databricks Certified Machine Learning Associate credential signals verified proficiency with the Databricks Lakehouse Platform for ML, a platform widely adopted by enterprises on Azure, AWS, and Google Cloud. It is recognized by employers hiring for data scientist, ML engineer, and MLOps roles where Databricks is part of the production stack. The certification is particularly valuable at organizations that have standardized on Databricks for unified data and AI workloads, as it demonstrates readiness to contribute to ML pipelines without extensive onboarding.
While Databricks does not publish official salary data tied to this specific credential, practitioners with Databricks ML certifications and associated skills (Spark, MLflow, cloud ML platforms) command salaries broadly in the $110,000–$160,000+ USD range for ML engineer and data scientist roles in the US market, depending on seniority and location. The Associate-level certification serves as a recognized stepping stone to the Databricks Certified Machine Learning Professional exam, which tests advanced topics such as model monitoring, feature engineering at scale, and custom MLflow integrations.
1. A machine learning team evaluates multiple classification models using CrossValidator in Spark MLlib. They configure CrossValidator with numFolds set to 5 and a parameter grid with 12 combinations. After training completes, they want to analyze the average performance metrics across all folds for each parameter combination to identify the best configuration. Which attribute of the fitted CrossValidator model should they access? (Select one!)
2. A machine learning team configures an MLflow experiment to track model development. Over six months, they log 600,000 runs across multiple projects. They now need to create a new run but receive an error stating the experiment has reached its limit. What is the cause of this error? (Select one!)
3. A machine learning engineer creates a model serving endpoint and configures it with two served entities for A/B testing. After deploying, they realize they need to update the traffic distribution from 50/50 to 80/20. Which API endpoint should they use to modify the traffic configuration? (Select one!)
4. An ML engineer creates a Spark MLlib pipeline for a multi-class classification problem with the following stages: StringIndexer for a categorical feature, OneHotEncoder for the indexed feature, VectorAssembler to combine features, and a classifier. They want to evaluate the model using weighted precision. Which evaluator configuration should they use? (Select one!)
5. A company runs AutoML experiments to classify customer churn with a dataset containing 100,000 records. The data science team wants to limit the experiment duration while ensuring thorough model exploration. They previously used max_trials=50 but discovered this parameter is no longer supported in Databricks Runtime 11.0 ML and above. Which AutoML configuration should they use instead? (Select one!)
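Two of the calculations behind questions 1 and 4 can be reproduced without a cluster. The plain-Python sketch below is not Spark ML itself, and all function names and scores in it are hypothetical. It shows three things:

* how averaging a validation metric over k folds ranks each parameter combination;
* how many models a 12-combination, 5-fold grid trains, assuming one final refit of the winning combination on the full data (as Spark's CrossValidator does);
* how weighted precision weights each class's precision by that class's share of the true labels, which matches our understanding of Spark's multiclass `weightedPrecision` metric.

```python
from collections import Counter
from statistics import mean


def average_over_folds(fold_metrics: dict[str, list[float]]) -> dict[str, float]:
    """Average each parameter combination's validation metric over its k folds."""
    return {params: mean(scores) for params, scores in fold_metrics.items()}


def best_combination(avg_metrics: dict[str, float], larger_is_better: bool = True) -> str:
    """Pick the combination whose fold-averaged metric is best."""
    chooser = max if larger_is_better else min
    return chooser(avg_metrics, key=avg_metrics.get)


def models_trained(n_combinations: int, n_folds: int) -> int:
    """One fit per (combination, fold) pair, plus one refit of the winner on all data."""
    return n_combinations * n_folds + 1


def weighted_precision(y_true: list, y_pred: list) -> float:
    """Per-class precision, weighted by each class's share of the true labels.

    A class that is never predicted contributes precision 0.0.
    """
    n = len(y_true)
    total = 0.0
    for label, support in Counter(y_true).items():
        predicted = sum(1 for p in y_pred if p == label)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == p == label)
        precision = correct / predicted if predicted else 0.0
        total += (support / n) * precision
    return total


# Hypothetical 5-fold scores for two grid points (larger is better, e.g. F1).
scores = {
    "maxDepth=3": [0.80, 0.82, 0.78, 0.81, 0.79],
    "maxDepth=5": [0.84, 0.83, 0.85, 0.82, 0.86],
}
avg = average_over_folds(scores)
print(best_combination(avg))  # maxDepth=5
print(models_trained(12, 5))  # 61

# Class "a": precision 1/1, weight 2/4; class "b": precision 2/3, weight 2/4.
print(weighted_precision(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```

Weighting by true-label support means that a rare class with poor precision pulls the score down less than a common one. This is the main way the metric differs from a plain (macro) average of per-class precision.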