Databricks • DCGAE
Validates the ability to design, develop, and deploy LLM-powered solutions on Databricks, covering RAG application design and data preparation, prompt engineering and retrieval chains, model serving and deployment, evaluation and monitoring for quality and safety, and governance with Unity Catalog.
Questions
620
Duration
90 minutes
Passing Score
70%
Difficulty
AssociateLast Updated
Feb 2026
The Databricks Certified Generative AI Engineer Associate certification validates an individual's ability to design, develop, and deploy large language model (LLM)-powered solutions on the Databricks platform. The exam tests practical competency across the full generative AI engineering lifecycle, including decomposing complex requirements into multi-stage reasoning pipelines, selecting appropriate models from both open-source and proprietary ecosystems, and implementing retrieval-augmented generation (RAG) applications using Databricks-native tooling.
Certified professionals are expected to demonstrate hands-on proficiency with key Databricks technologies: Vector Search for semantic similarity and document retrieval, Model Serving for scalable endpoint deployment, MLflow for experiment tracking and lifecycle management, and Unity Catalog for data governance and access control. All machine learning code on the exam is in Python; SQL may appear for non-ML data manipulation tasks. The certification is valid for two years, after which recertification requires retaking the current version of the exam.
This certification is designed for practitioners actively building and deploying AI systems in enterprise environments, including AI Engineers, Generative AI Engineers, LLM Engineers, AI Solution Architects, MLOps Engineers, and Data Scientists with a focus on LLM or RAG workflows. It is particularly well-suited for engineers who work within the Databricks ecosystem and need to demonstrate production-level competency in generative AI solution development.
Candidates should have at least six months of hands-on experience developing generative AI solutions, practical familiarity with Python-based ML pipelines, and working knowledge of frameworks such as LangChain or LangGraph. Experience with Databricks-specific tools—MLflow, Unity Catalog, Vector Search, and Model Serving—is strongly recommended before attempting the exam.
There are no formal prerequisites required to register for this exam; any candidate may attempt it. However, Databricks strongly recommends at least six months of hands-on experience in generative AI solution development before sitting for the certification.
Recommended technical knowledge includes Python proficiency (especially for model pipelines and application orchestration), familiarity with LLM concepts such as context windows, tokenization, and prompt engineering techniques (zero-shot, few-shot, chain-of-thought), and practical experience with LangChain or similar orchestration frameworks. Candidates should also be comfortable using Databricks-native tools including MLflow for experiment tracking, Unity Catalog for governance, Vector Search for embedding-based retrieval, and Model Serving for endpoint deployment.
The exam consists of approximately 45 scored multiple-choice and multiple-select questions to be completed within 90 minutes. It is delivered as a proctored online exam, meaning candidates complete it remotely under live or automated proctoring; no external aids are permitted. The exam is available in English, Japanese, Brazilian Portuguese, and Korean. The registration fee is $200 USD (local taxes may apply).
The passing score is 70%. Databricks notes that exams may include additional unscored items used to gather statistical data for future exam development; these items are not identified and do not affect the final score, meaning the total number of questions delivered may be slightly higher than the 45 scored items. The certification remains valid for two years, after which candidates must retake the current exam version to recertify.
Professionals holding this certification are positioned for roles at the intersection of software engineering and applied AI, including Generative AI Engineer, LLM Engineer, AI Solution Architect, and MLOps Engineer. Generative AI engineering roles command some of the highest compensation in the technology sector, with average salaries reported around $214,000 annually in the United States; the certification directly signals enterprise-grade deployment skills that go beyond prototyping or research experience.
The generative AI applications market is projected to grow at a CAGR exceeding 46% through 2030, and employer demand for engineers who can bridge the gap between experimental LLM work and production-ready Databricks deployments continues to outpace supply. As Databricks is widely adopted across Fortune 500 companies for data and AI workloads, this certification carries strong recognition among employers already invested in the Databricks ecosystem. It complements other Databricks credentials (such as the Data Engineer Associate or ML Professional certifications) for practitioners building a comprehensive Databricks certification portfolio.
5 sample questions with correct answers and explanations. Start a practice session to test yourself across all 620 questions.
1. An ML team is fine-tuning a Meta-Llama-3.1-70B-Instruct model for customer support using conversation logs. The training data includes a system message defining the assistant personality. When they test with Mistral-7B-Instruct using the same data, the training fails. What is the most likely cause? (Select one!)
Explanation
Mistral models do not accept system roles in their chat data formatting, which is explicitly documented as a model-specific constraint. The system message must be removed or incorporated differently for Mistral fine-tuning. Both Mistral and Llama models support CHAT_COMPLETION task type for conversation data. Context length is a configuration parameter that can be adjusted but is not the root cause of validation failures related to system messages. Mistral models do not require starting with user messages - the issue is specifically the system role not being supported.
2. A company is creating a Vector Search index with Databricks-managed embeddings using the databricks-gte-large-en model. After querying the index, they notice inconsistent similarity score ranges across different queries. They want to use cosine similarity for ranking. What preprocessing step is required? (Select one!)
Explanation
The databricks-gte-large-en model does NOT generate normalized embeddings by default. To use cosine similarity with L2 distance ranking equivalence, embeddings must be manually normalized before indexing. This is a critical distinction from some other embedding models that produce pre-normalized outputs. There is no distance_metric='cosine' parameter in Vector Search index configuration as the system uses L2 distance. There is no cosine_normalization query parameter as normalization must happen at indexing time. Databricks-managed embeddings do not automatically normalize vectors for cosine similarity compatibility.
3. A team has deployed an LLM-powered application and needs to analyze request patterns, identify the most expensive queries by token usage, and attribute costs to different internal projects. The endpoint processes 10,000 requests daily. Which monitoring approach should they implement? (Select one!)
Explanation
Enabling inference tables is the correct approach because they automatically log all requests and responses with detailed metadata including token counts, execution duration, requester information, and usage_context fields for cost attribution. The tables are queryable via SQL to analyze patterns and aggregate costs by project. Logs are delivered within approximately 1 hour. CloudWatch metrics export is not a native Databricks feature; endpoint metrics use Prometheus format. Custom logging in application code creates operational overhead and can miss requests. MLflow Tracking is designed for experiment tracking, not high-volume production request logging with 10,000 daily requests, which would create excessive runs.
4. A data scientist is comparing retrieval evaluation metrics for a medical question-answering system. They have a test set with ground truth relevant documents for each query. Which metric specifically measures whether all relevant documents were retrieved in the top-k results? (Select one!)
Explanation
Context Recall measures the completeness of retrieved context by assessing whether all relevant documents from the ground truth set were retrieved. This metric requires ground truth and evaluates retrieval completeness. Context Precision measures the relevance of retrieved documents but not completeness. Mean Reciprocal Rank focuses on the position of the first relevant document. NDCG@k measures ranking quality with graded relevance but is not specifically designed to measure complete retrieval of all relevant documents.
5. A healthcare company is deploying an AI agent that must comply with strict data retention policies. All inference requests and responses must be logged for audit purposes for 7 years. The endpoint serves a fine-tuned model registered in Unity Catalog. The compliance team needs to verify that the logging infrastructure can capture all interactions. Which feature should be enabled to meet this requirement? (Select one!)
Explanation
Inference tables automatically log all requests and responses to the model serving endpoint in a Delta table with detailed information including timestamps, request/response payloads, and requester identity. The Delta table can be configured with appropriate retention policies to meet the 7-year requirement. MLflow automatic logging tracks training and experiments but not production serving inference. Unity Catalog audit logs capture governance events but not detailed inference request/response data. Custom logging in the predict() method adds complexity and may miss requests if the endpoint handles routing and authentication before reaching the model code.
One-time access to this exam