Microsoft • DP-800
Validates expertise in designing and developing AI-enabled database solutions across Microsoft SQL platforms including SQL Server, Azure SQL, and SQL databases in Microsoft Fabric. Covers T-SQL development, CI/CD practices, security, performance optimization, and implementing AI capabilities such as vector search and RAG.
Questions
600
Duration
100 minutes
Passing Score
700/1000
Difficulty
AssociateLast Updated
May 2026
The Microsoft Certified: SQL AI Developer Associate certification, earned by passing Exam DP-800 (Developing AI-Enabled Database Solutions), validates expertise in designing and building AI-enabled database solutions across the full breadth of Microsoft SQL platforms—Microsoft SQL Server, Azure SQL, and SQL databases in Microsoft Fabric. The credential covers the complete lifecycle of modern database development: schema design, advanced T-SQL programming, performance optimization, data security, CI/CD automation via SQL Database Projects, and deep integration with Azure services such as Data API builder (DAB), Azure Monitor, and Azure Functions.
Released in early 2026, the certification reflects the industry shift toward embedding AI directly inside the database tier rather than relying solely on external AI services. Candidates must demonstrate practical knowledge of AI-assisted development tooling (GitHub Copilot, Microsoft Copilot in Fabric, Model Context Protocol), as well as core AI concepts—vector embeddings, semantic search, hybrid search, and Retrieval-Augmented Generation (RAG) using native T-SQL functions and the sp_invoke_external_rest_endpoint stored procedure.
This certification targets mid-level database developers—typically with two or more years of hands-on T-SQL experience—who are expanding into AI-integrated and cloud-scale architectures. Ideal candidates hold roles such as SQL/database developer, Azure SQL developer, data engineer, or backend application developer who owns the database layer. Professionals working with Microsoft Fabric, Azure SQL Database, or SQL Server who need to expose intelligent search or natural-language interfaces to applications are a natural fit.
Candidates collaborate daily with application developers, DBAs, architects, AI engineers, DevSecOps engineers, and security administrators. Some exposure to GitHub-based CI/CD workflows and foundational AI concepts (embeddings, vectors, language models) is expected before attempting the exam.
Microsoft does not enforce formal prerequisites for DP-800, but the exam assumes solid practical experience. Candidates should be comfortable writing complex T-SQL including CTEs, window functions, JSON functions, and stored procedures before studying the AI-specific content. Familiarity with database design fundamentals—indexes, constraints, partitioning, row-level security, Always Encrypted, and Dynamic Data Masking—is assumed throughout.
On the tooling side, experience with GitHub (branching, pull requests, Actions) and a working knowledge of Azure services such as Azure Functions, Logic Apps, and Azure Monitor will reduce the learning curve significantly. Exposure to AI/ML concepts—particularly what embeddings are, how vector similarity works, and what a language model prompt looks like—is recommended. Microsoft's free self-paced learning path 'Become a SQL AI Developer: Prepare for Certification Exam DP-800' on Microsoft Learn can fill gaps in any of these areas.
Exam DP-800 is a proctored assessment delivered through Pearson VUE, available in English. Candidates have 120 minutes to complete the exam (the official certification page specifies 120 minutes; allow approximately 100 minutes of active test time). The exam may include interactive lab or scenario-based components in addition to standard multiple-choice and multi-select question types. Candidates can explore the user interface in advance using Microsoft's free Exam Sandbox at aka.ms/examdemo.
A scaled score of 700 or higher (on a 1–1000 scale) is required to pass. If a candidate fails the first attempt, they must wait 24 hours before retaking; subsequent retake intervals vary per Microsoft's standard retake policy. The certification is valid for one year and can be renewed at no cost via an online assessment on Microsoft Learn. Exam price varies by country/region as set by Pearson VUE.
The SQL AI Developer Associate credential positions holders at the intersection of two high-demand skill sets—enterprise SQL development and applied AI engineering—making them valuable to organizations adopting Microsoft Fabric, Azure SQL, or SQL Server 2022+ for intelligent application backends. Roles commonly associated with this certification include SQL/Database Developer, Azure Data Engineer, Backend Developer, and AI Integration Engineer. Because the certification is new (2026), early adopters gain a differentiation advantage as enterprises ramp up AI-enabled data architectures across regulated industries such as finance, healthcare, and retail.
While Microsoft does not publish salary data tied to specific certifications, data engineers and database developers with Azure AI skills command salaries in the $110,000–$145,000 range in the US market (2025–2026 surveys), with premiums for Fabric and AI integration experience. The DP-800 complements adjacent certifications such as DP-300 (Azure Database Administrator Associate) and DP-700 (Fabric Data Engineer Associate), and serves as a natural progression for SQL professionals who have outgrown purely administrative or ETL-focused roles and want to build AI-powered data products.
5 sample questions with correct answers and explanations. Start a practice session to test yourself across all 600 questions.
1. Contoso is developing a real-time trading platform session cache using memory-optimized tables in SQL Server 2025. The SessionTrades table uses TradeID (bigint) as the sole lookup key, with analysts projecting approximately 600,000 unique active trades at peak load. A junior developer created a hash index on TradeID with BUCKET_COUNT = 32768, arguing it satisfies the power-of-2 requirement. The senior DBA observes severely degraded lookup performance under load. What is the root cause and the recommended BUCKET_COUNT? (Select one!)
Explanation
Hash index performance degrades severely when BUCKET_COUNT is significantly smaller than the number of unique key values. The recommended bucket count is 1 to 2 times the expected unique value count, meaning 600,000 to 1,200,000 for this scenario. SQL Server internally rounds the specified value up to the next power of 2, so specifying 1,000,000 results in an effective bucket count of 1,048,576. With BUCKET_COUNT = 32,768, approximately 18 trades would hash to each bucket on average, creating long collision chains that cause measurable performance degradation — chains exceeding 100 entries produce observable slowdowns. Hash indexes fully support bigint columns. SQL Server does not dynamically resize hash index bucket counts after table creation; the value must be chosen carefully at design time based on expected cardinality.
2. A data quality engineer at Northwind Analytics is building a customer deduplication pipeline in Microsoft Fabric SQL Database. For one deduplication pass they need a normalized edit-distance score between 0 and 100. For a second pass they need prefix-weighted similarity scoring to compare customer names where the beginning of the string carries more discriminating weight. Which functions should they use for each pass respectively? (Select one!)
Explanation
EDIT_DISTANCE_SIMILARITY returns a normalized integer percentage from 0 to 100 based on the Levenshtein edit distance between two strings, measuring the minimum number of single-character insertions, deletions, or substitutions required to transform one string into the other. JARO_WINKLER_DISTANCE is specifically designed for prefix-weighted similarity scoring, assigning greater importance to characters at the beginning of the string, which makes it well-suited for name matching scenarios where the opening characters are more discriminating. JARO_WINKLER_DISTANCE does not return a Levenshtein-based percentage. REGEXP_LIKE is a Boolean pattern-matching predicate, not a numeric similarity function. EDIT_DISTANCE_SIMILARITY does not accept a prefix-weight parameter and does not incorporate Jaro-Winkler weighting logic.
3. A developer at Contoso Search is implementing a semantic product similarity feature in SQL Server 2025. The product catalog contains 35,000 product embeddings stored in a VECTOR(1536) column named ProductEmbedding. A colleague recommends creating a DiskANN vector index on the column to improve search latency. Based on Microsoft's guidance for vector search in SQL Server 2025, what approach should the developer take? (Select one!)
Explanation
Microsoft recommends using exact nearest neighbor search without a vector index for datasets containing fewer than approximately 50,000 vectors. At this scale, an exhaustive distance calculation across all vectors completes fast enough that the overhead of maintaining and querying an approximate nearest neighbor index exceeds the latency benefit. Exact nearest neighbor search also provides a recall of 1.0, meaning it always returns the true closest vectors, whereas DiskANN targets approximately 0.95 recall by trading a small amount of accuracy for significantly reduced latency on very large datasets. DiskANN vector indexes become meaningful at scales of 100,000 vectors or more where exhaustive search latency grows to unacceptable levels. SQL Server 2025 uses the DiskANN algorithm based on Microsoft Research's Vamana graph structure for its vector indexing capability; HNSW is the algorithm used by pgvector for PostgreSQL and other vector databases and is not available in SQL Server. Full-text search indexes operate on tokenized character data and have no connection to floating-point vector operations on VECTOR typed columns.
4. A developer at Wide World Importers is implementing a customer deduplication workflow in Microsoft Fabric SQL. Customer last names were entered manually and contain character-level variations such as 'Smithe' instead of 'Smith' and 'Jonson' instead of 'Johnson'. The developer needs a function that returns a normalized integer percentage indicating how similar two strings are, based on the minimum number of single-character insertions, deletions, and substitutions required to transform one string into the other. Which function should the developer use? (Select one!)
Explanation
EDIT_DISTANCE_SIMILARITY returns a normalized integer percentage from 0 to 100 derived from the Levenshtein edit distance algorithm. The Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to transform one string into another, and normalizing this count produces a percentage score suitable for threshold comparisons such as WHERE EDIT_DISTANCE_SIMILARITY(LastName, @inputName) >= 75. JARO_WINKLER_DISTANCE returns a decimal similarity score between 0 and 1 with additional prefix weighting, making it well suited for short strings and phonetic name matching, but it does not return a normalized 0-to-100 integer percentage and measures similarity through a different algorithm than edit distance. REGEXP_LIKE performs pattern matching against a regular expression and returns a Boolean result, which cannot express degrees of similarity between two arbitrary strings with character-level variations. CHARINDEX locates exact substrings and returns a position integer, providing no fuzzy similarity measurement whatsoever.
5. A developer at Fabrikam AI is building a semantic product search feature in Azure SQL Database. The Products table will initially contain 80,000 rows with a vector column storing 1,536-dimension embeddings. The dataset is expected to grow to 3 million rows within one year. The developer is deciding between exact kNN and approximate nearest neighbor search using a VECTOR INDEX with DiskANN. Based on Microsoft recommendations, when should the developer switch from exact kNN to a VECTOR INDEX with DiskANN? (Select one!)
Explanation
Microsoft recommends using exact kNN (exhaustive distance calculation scanning all rows) for datasets with fewer than approximately 50,000 vectors, where the full table scan remains fast enough for acceptable query latency. For larger datasets, creating a VECTOR INDEX using the DiskANN algorithm provides significantly better query performance through approximate nearest neighbor search. DiskANN uses a Vamana graph structure developed by Microsoft Research and achieves approximately 0.95 recall (95%) while delivering sub-10ms query latency even on billion-scale datasets, with roughly 90% memory reduction compared to in-memory ANN algorithms like HNSW. Since the Fabrikam dataset already exceeds 50,000 vectors at 80,000 rows and will grow to 3 million, creating a VECTOR INDEX immediately is the appropriate architectural decision. The threshold for switching is based on vector count rather than query latency, vector dimensionality, or a billion-row milestone.
One-time access to this exam