AWS • DEA-C01
Validates ability to implement data pipelines and to monitor, troubleshoot, and optimize cost and performance issues in accordance with best practices.
Questions: 1124
Duration: 130 minutes
Passing Score: 720/1000
Difficulty: Associate
Last Updated: Jan 2026
The AWS Certified Data Engineer - Associate (DEA-C01) is an associate-level credential that validates a practitioner's ability to implement, monitor, and optimize data pipelines on AWS. Launched in 2023, it is the first AWS certification designed specifically for data engineers, replacing the need to cobble together credentials from the Solutions Architect or Data Analytics Specialty exams. The exam assesses proficiency across the full data engineering lifecycle: ingesting and transforming data, selecting and managing appropriate data stores, orchestrating pipelines using programming concepts, and enforcing data security and governance policies using AWS-native tooling.
Key AWS services in scope include Amazon S3, AWS Glue, Amazon Redshift, Amazon Kinesis, Amazon EMR, AWS Lake Formation, Amazon DynamoDB, AWS Database Migration Service, and Amazon Athena, among others. Candidates are evaluated on their ability to compare cost and performance trade-offs between services, apply SQL on AWS platforms, implement encryption and access controls, and validate data quality and consistency. Out-of-scope topics include ML model training and inference, programming-language-specific syntax, and deriving business conclusions from data analysis.
The target candidate is a data engineer or data architect with roughly 2–3 years of experience in data engineering and at least 1–2 years of hands-on AWS experience. This includes professionals who design and maintain ETL/ELT pipelines, manage data lakes and warehouses, or work with real-time streaming architectures. Adjacent roles transitioning into cloud data engineering, such as database administrators, backend developers, or traditional ETL developers, will also find this certification a clear roadmap for bridging legacy skills with AWS-native approaches.
The exam suits those who regularly work with concepts such as volume, variety, and velocity of data; data modeling and schema design; data lifecycle management; and cloud security and governance. It is not aimed at data scientists, ML engineers, or business analysts, as those domains fall outside the exam's scope.
AWS does not enforce formal prerequisites for the DEA-C01, but the official exam guide recommends 2–3 years of data engineering or data architecture experience and 1–2 years of hands-on work with AWS services. Candidates should be comfortable setting up and maintaining ETL pipelines from ingestion to destination, writing and executing SQL queries, using Git-based source control workflows, and applying language-agnostic programming concepts (loops, conditionals, data structures).
On the AWS side, recommended knowledge includes familiarity with data pipeline orchestration services (AWS Glue, AWS Step Functions), storage systems (Amazon S3, Amazon Redshift, Amazon DynamoDB), streaming platforms (Amazon Kinesis), and security/governance services (AWS IAM, AWS KMS, AWS Lake Formation). An understanding of data lakes, networking fundamentals (VPC, subnets, connectivity), compute options (Amazon EMR, AWS Lambda), and vector/embedding concepts is also beneficial. While no prior AWS certification is required, a background in AWS Cloud Practitioner or AWS Solutions Architect - Associate provides a useful foundation.
The DEA-C01 exam consists of 65 total questions: 50 scored questions that contribute to the final result and 15 unscored pilot questions that AWS uses to evaluate future content. Unscored questions are not identified, so candidates should treat all questions equally. Question types are multiple choice (one correct answer from four options) and multiple response (two or more correct answers from five or more options). The time limit is 130 minutes, and the exam is delivered via Pearson VUE at a testing center or through an online proctored session. The exam is available in English, Japanese, Korean, and Simplified Chinese, and costs $150 USD.
Scores are reported on a scaled range of 100–1,000, and the minimum passing score is 720. AWS uses a compensatory scoring model, meaning candidates do not need to achieve a passing threshold in each individual domain; only the overall scaled score matters. Unanswered questions are treated as incorrect; there is no penalty for guessing. The certification is valid for three years, after which recertification requires passing the current version of the exam.
The DEA-C01 certification targets one of the fastest-growing roles in cloud computing. AWS-certified data engineers in the US report average salaries around $141,000 per year according to Glassdoor data, with entry-level positions starting near $124,000–$130,000 and senior roles exceeding $175,000. Research from the Jefferson Frank Careers and Hiring Guide found that 73% of AWS professionals saw a salary increase after certification, averaging approximately 27%. Job roles accessible with this credential include Data Engineer, Cloud Data Architect, ETL/ELT Developer, Data Platform Engineer, and Analytics Engineer.
AWS certifications appear in cloud job postings more than any other vendor credential, and the DEA-C01 specifically validates the services (Glue, Redshift, Kinesis, S3) that dominate real-world data engineering job requirements. For professionals transitioning from database administration, backend development, or traditional ETL roles, the certification provides a structured path into cloud-native data engineering. Many candidates report role transitions or salary increases within 3–6 months of earning the credential. Pairing DEA-C01 with the Databricks Data Engineer Associate certification is widely considered the most job-market-relevant two-certification combination in the data engineering space.
5 sample questions with correct answers and explanations. Start a practice session to test yourself across all 1124 questions.
1. Adatum Analytics uses Amazon QuickSight Enterprise Edition with SPICE for dashboarding. The security team requires that sales managers can only see data for their own region in all dashboards, while the finance team can see data for all regions but should not have access to the employee_salary column. Which combination of QuickSight features should the team implement? (Select two!)
Multiple correct answers
Explanation
Row-level security in QuickSight restricts which rows each user or group can see based on defined rules, ensuring sales managers only see their own region's data. Column-level security, available in Enterprise Edition, restricts access to specific columns, preventing the finance team from seeing the employee_salary column. Creating separate SPICE datasets per region increases maintenance overhead and does not scale well as regions are added. Dashboard parameters with dynamic defaults can be overridden by users and do not enforce security restrictions. Separate QuickSight accounts are unnecessary and create significant administrative complexity when a single account with RLS and CLS provides the required granular access control.
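As a rough illustration of the two features, the sketch below builds a user-based row-level security rules file and a column-level permission structure in plain Python. The group names, region values, field name `region`, and the ARN are hypothetical. The layout assumes QuickSight's user-based rules format (a `GroupName` column followed by dataset field columns, where a blank value means no filter on that field) and the `ColumnLevelPermissionRules` shape used by the dataset API, where the listed principals are the ones allowed to see the restricted columns.

```python
import csv
import io

# Hypothetical group names and regions, used only for illustration.
SALES_GROUPS = {
    "sales-managers-emea": "EMEA",
    "sales-managers-apac": "APAC",
}
FINANCE_GROUP = "finance"


def build_rls_rules(sales_groups, unrestricted_groups):
    """Return CSV text for a user-based row-level security rules dataset.

    Each row maps a group to the region it may see. A blank region is
    assumed to mean "no filter on this field", i.e. all regions.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["GroupName", "region"])
    for group, region in sorted(sales_groups.items()):
        writer.writerow([group, region])
    for group in sorted(unrestricted_groups):
        writer.writerow([group, ""])
    return buffer.getvalue()


def build_cls_rules(allowed_principal_arns, restricted_columns):
    """Return a ColumnLevelPermissionRules-style list.

    Only the listed principals can read the restricted columns, so the
    finance group is kept out by leaving its ARN off the list.
    """
    return [
        {
            "Principals": list(allowed_principal_arns),
            "ColumnNames": list(restricted_columns),
        }
    ]


rls_csv = build_rls_rules(SALES_GROUPS, [FINANCE_GROUP])
cls_rules = build_cls_rules(
    # Hypothetical ARN for a group that is allowed to see salaries.
    ["arn:aws:quicksight:us-east-1:111122223333:group/default/hr-admins"],
    ["employee_salary"],
)
print(rls_csv)
```

The finance group gets a blank region so that it sees every row, while its absence from the column rule is what hides `employee_salary`.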
2. A data engineer at Fabrikam Corp needs to transfer 500 GB of incremental data nightly from an on-premises NFS file server to Amazon S3. The transfer must complete within a 2-hour window over a 1 Gbps AWS Direct Connect link. The data engineer also needs to exclude temporary log files matching *.tmp patterns and limit bandwidth usage to 70% of the link capacity so other business applications are not impacted. Which service should the data engineer use? (Select one!)
Explanation
AWS DataSync provides high-speed online data transfer from on-premises NFS to Amazon S3, capable of fully utilizing up to 10 Gbps network links using a purpose-built transfer protocol with multi-threaded connections and in-line compression. At 70% of a 1 Gbps link (approximately 700 Mbps effective throughput), 500 GB can be transferred in roughly 95 minutes, which fits inside the 2-hour window. DataSync natively supports include/exclude file filters to skip temporary files matching patterns like *.tmp and provides granular bandwidth throttling to cap usage at a specified rate, protecting other applications on the network. An on-premises agent connects to the NFS server and handles scheduling, incremental transfers, error handling, and data integrity verification automatically. Transfer Family provides SFTP/FTPS endpoints but lacks built-in file filtering by pattern and bandwidth throttling, and it is designed for individual file transfers rather than bulk migration workloads. AWS CLI s3 sync can use the --exclude flag but requires custom scripting for bandwidth throttling, scheduling, retry logic, and data integrity verification, increasing operational overhead. Snow Family devices require a multi-day shipping turnaround for each transfer cycle, making them entirely unsuitable for a nightly transfer window.
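The throughput arithmetic in this explanation can be checked with a short sketch. It assumes decimal units (1 GB = 8 gigabits) and ignores protocol overhead and compression. The second function builds the `Excludes` and `Options` arguments that the DataSync CreateTask API accepts for filtering and throttling; the function names are illustrative, and no AWS call is made here.

```python
def transfer_minutes(size_gb, link_gbps, utilization):
    """Minutes needed to move size_gb (decimal gigabytes) over a capped link."""
    gigabits = size_gb * 8
    effective_gbps = link_gbps * utilization
    return gigabits / effective_gbps / 60


def datasync_task_settings(link_gbps, utilization, exclude_pattern):
    """Build Excludes and Options values for a DataSync CreateTask request."""
    # DataSync throttles in bytes per second, so convert from bits.
    bytes_per_second = round(link_gbps * 1_000_000_000 * utilization / 8)
    return {
        "Excludes": [{"FilterType": "SIMPLE_PATTERN", "Value": exclude_pattern}],
        "Options": {"BytesPerSecond": bytes_per_second},
    }


minutes = transfer_minutes(500, 1.0, 0.70)
settings = datasync_task_settings(1.0, 0.70, "*.tmp")
print(f"Estimated transfer time: {minutes:.0f} minutes")
print(f"Bandwidth cap: {settings['Options']['BytesPerSecond']} bytes/s")
```

With these inputs the estimate is about 95 minutes against a 120-minute window, so the margin is real but not large; a slower effective rate would narrow it further.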
3. A data engineer at Adatum Analytics is building a real-time fraud detection pipeline. Transaction events arrive in Amazon Kinesis Data Streams at variable rates ranging from 1,000 to 50,000 events per second. The pipeline must detect fraudulent patterns within sliding windows of 5 minutes and produce alerts within 1 second of pattern detection. Late-arriving events up to 2 minutes should still be processed correctly. Which processing architecture meets these requirements? (Select one!)
Explanation
Amazon Managed Service for Apache Flink provides exactly-once stateful stream processing with native support for sliding windows, event-time semantics, and watermarks for handling late-arriving data. Watermarks can be configured to tolerate 2-minute late arrivals, and sliding windows enable continuous pattern detection over 5-minute intervals. Flink's sub-second processing latency meets the 1-second alerting requirement. Data Firehose is a delivery service, not a stream processing engine: it buffers records and delivers them to a destination, and it cannot evaluate sliding windows or event-time patterns. Glue Spark Streaming operates in micro-batch mode, and its batch intervals typically add seconds or more of latency, which cannot reliably meet the 1-second requirement. A KCL consumer with tumbling windows would miss patterns that span window boundaries, and building custom sliding window logic with late-arrival handling requires significant development effort compared to Flink's native support.
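To make the windowing terms concrete, here is a minimal pure-Python sketch of event-time sliding windows with a watermark. It is not Flink API code and is far simpler than what Flink does internally. The 60-second slide interval is an assumption (the question does not state one), and the watermark is modeled as the largest event time seen minus the allowed lateness.

```python
WINDOW_SIZE = 300   # 5-minute window, in seconds
WINDOW_SLIDE = 60   # assumed slide interval; not given in the question
MAX_LATENESS = 120  # tolerate events that arrive up to 2 minutes late


def window_starts(event_time, size=WINDOW_SIZE, slide=WINDOW_SLIDE):
    """Return the start time of every sliding window that contains event_time."""
    start = event_time - (event_time % slide)
    starts = []
    while start > event_time - size:
        starts.append(start)
        start -= slide
    return starts


class SlidingWindowCounter:
    """Counts events per window, skipping windows the watermark has closed."""

    def __init__(self):
        self.counts = {}         # window start -> number of events
        self.max_event_time = 0  # largest event time seen so far

    @property
    def watermark(self):
        return self.max_event_time - MAX_LATENESS

    def add(self, event_time):
        """Add the event to each still-open window; return how many it joined."""
        self.max_event_time = max(self.max_event_time, event_time)
        joined = 0
        for start in window_starts(event_time):
            if start + WINDOW_SIZE > self.watermark:  # window not closed yet
                self.counts[start] = self.counts.get(start, 0) + 1
                joined += 1
        return joined


counter = SlidingWindowCounter()
on_time = counter.add(1000)
newer = counter.add(1100)
late_but_ok = counter.add(990)  # 110 s behind the newest event
too_late = counter.add(600)     # all of its windows are already closed
print(on_time, newer, late_but_ok, too_late)
```

Because each event belongs to several overlapping windows, a pattern that straddles a tumbling-window boundary still falls entirely inside at least one sliding window.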
4. A data engineer needs to schedule daily AWS Glue jobs that do not require execution or completion at a specific time. Which solution runs the Glue jobs most cost-effectively?
Explanation
The FLEX execution class in AWS Glue runs jobs on spare compute capacity in the AWS cloud at a discounted rate compared to the STANDARD execution class. Because spare capacity availability can fluctuate, FLEX jobs may experience variable startup times and can be interrupted and restarted, but for daily workloads with no strict execution window or deadline, this variability is an acceptable trade-off for the cost reduction. FLEX is specifically designed for non-time-sensitive batch workloads. Spot Instances are an EC2 purchasing option and are not a configuration option available in AWS Glue job properties; Glue abstracts the underlying compute infrastructure. The STANDARD execution class uses dedicated capacity with predictable startup and execution times, which is appropriate for time-sensitive production jobs but costs more than FLEX for workloads where timing flexibility is available. The GlueVersion field specifies the Glue runtime version that determines available features and engine improvements but has no direct bearing on execution cost or job scheduling behavior.
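As a hedged sketch of how the execution class is selected in practice, the helper below builds the keyword arguments for a Glue StartJobRun request with `ExecutionClass` set to `FLEX`. The function names and the job name are hypothetical. `start_flex_run` expects a boto3 Glue client (or any object with a compatible `start_job_run` method), so the argument-building part can be exercised without AWS access.

```python
def flex_job_run_args(job_name, timeout_minutes=None):
    """Build keyword arguments for Glue StartJobRun using the FLEX class."""
    args = {"JobName": job_name, "ExecutionClass": "FLEX"}
    if timeout_minutes is not None:
        args["Timeout"] = timeout_minutes  # job run timeout, in minutes
    return args


def start_flex_run(glue_client, job_name, timeout_minutes=None):
    """Start the job run and return its run ID."""
    response = glue_client.start_job_run(
        **flex_job_run_args(job_name, timeout_minutes)
    )
    return response["JobRunId"]


# Hypothetical job name; with real credentials this could be called as:
#   import boto3
#   start_flex_run(boto3.client("glue"), "nightly-sales-etl")
print(flex_job_run_args("nightly-sales-etl", timeout_minutes=480))
```

The execution class can also be set on the job definition itself, so that scheduled runs pick it up without passing it on every invocation.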
5. A data engineer works on an ML project requiring both low-latency real-time feature access for model predictions and batch feature access for model training. Which Amazon SageMaker Feature Store mode should be used?
Explanation
Amazon SageMaker Feature Store's Online and Offline mode simultaneously maintains both an online store optimized for low-latency millisecond reads needed for real-time model inference and an offline store backed by S3 that supports large-scale batch retrieval for model training and batch inference. Enabling both modes ensures features are available for real-time predictions and for training pipelines without needing to maintain separate systems or synchronize data manually. Online-only mode provides fast real-time access but cannot serve the large batch queries required for model training at scale. Offline-only mode supports batch training workloads but cannot provide the low-latency feature retrieval needed for real-time predictions. There is no standalone Batch mode in SageMaker Feature Store; batch ingestion is a method of populating the offline store, not a mode of the feature store itself.
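The difference between the modes comes down to which store configurations are present when the feature group is created. The sketch below builds a request body in the shape of the SageMaker CreateFeatureGroup API, with both `OnlineStoreConfig` and `OfflineStoreConfig` set. The feature group name, feature names, S3 URI, and role ARN are all hypothetical, and nothing is sent to AWS.

```python
def feature_group_request(name, s3_uri, role_arn, online=True, offline=True):
    """Build a CreateFeatureGroup-style request for the chosen store modes."""
    request = {
        "FeatureGroupName": name,
        "RecordIdentifierFeatureName": "customer_id",
        "EventTimeFeatureName": "event_time",
        "FeatureDefinitions": [
            {"FeatureName": "customer_id", "FeatureType": "String"},
            {"FeatureName": "event_time", "FeatureType": "String"},
            {"FeatureName": "avg_order_value", "FeatureType": "Fractional"},
        ],
    }
    if online:
        # Online store: low-latency reads for real-time inference.
        request["OnlineStoreConfig"] = {"EnableOnlineStore": True}
    if offline:
        # Offline store: S3-backed history for training and batch inference.
        request["OfflineStoreConfig"] = {"S3StorageConfig": {"S3Uri": s3_uri}}
        request["RoleArn"] = role_arn
    return request


both = feature_group_request(
    "customer-features",                                  # hypothetical name
    "s3://example-bucket/feature-store/",                 # hypothetical bucket
    "arn:aws:iam::111122223333:role/FeatureStoreRole",    # hypothetical role
)
print(sorted(both))
```

With real credentials, a dictionary like `both` could be passed to a boto3 SageMaker client as `create_feature_group(**both)`.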