AWS Certified Data Engineer - Associate (DEA-C01) Practice Test

Validates ability to implement data pipelines and to monitor, troubleshoot, and optimize cost and performance issues in accordance with best practices.

Exam Details

Questions (practice bank): 1124

Duration: 130 minutes

Passing Score: 720/1000

Difficulty: Associate

Last Updated: Jan 2026

Topics Covered

Data Ingestion and Transformation

Data Store Management

Data Operations and Support

Data Security and Governance

Exam Domain Breakdown

Data Ingestion and Transformation: 34%

Data Store Management: 26%

Data Operations and Support: 22%

Data Security and Governance: 18%

Exam Overview

The AWS Certified Data Engineer – Associate (DEA-C01) is an associate-level credential that validates a practitioner's ability to implement, monitor, and optimize data pipelines on AWS. Introduced as a beta exam in late 2023 and generally available since March 2024, it is the first AWS certification designed specifically for data engineers, so candidates no longer need to cobble together credentials from the Solutions Architect or Data Analytics Specialty exams. The exam assesses proficiency across the full data engineering lifecycle: ingesting and transforming data, selecting and managing appropriate data stores, orchestrating pipelines using programming concepts, and enforcing data security and governance policies using AWS-native tooling.

Key AWS services in scope include Amazon S3, AWS Glue, Amazon Redshift, Amazon Kinesis, Amazon EMR, AWS Lake Formation, Amazon DynamoDB, AWS Database Migration Service, and Amazon Athena, among others. Candidates are evaluated on their ability to compare cost and performance trade-offs between services, apply SQL on AWS platforms, implement encryption and access controls, and validate data quality and consistency. Out-of-scope topics include ML model training and inference, programming-language-specific syntax, and deriving business conclusions from data analysis.

Who Should Take This Exam

The target candidate is a data engineer or data architect with roughly 2–3 years of experience in data engineering and at least 1–2 years of hands-on AWS experience. This includes professionals who design and maintain ETL/ELT pipelines, manage data lakes and warehouses, or work with real-time streaming architectures. Adjacent roles transitioning into cloud data engineering — such as database administrators, backend developers, or traditional ETL developers — will also find this certification a clear roadmap for bridging legacy skills with AWS-native approaches.

The exam suits those who regularly work with concepts such as volume, variety, and velocity of data; data modeling and schema design; data lifecycle management; and cloud security and governance. It is not aimed at data scientists, ML engineers, or business analysts, as those domains fall outside the exam's scope.

Prerequisites

AWS does not enforce formal prerequisites for the DEA-C01, but the official exam guide recommends 2–3 years of data engineering or data architecture experience and 1–2 years of hands-on work with AWS services. Candidates should be comfortable setting up and maintaining ETL pipelines from ingestion to destination, writing and executing SQL queries, using Git-based source control workflows, and applying language-agnostic programming concepts (loops, conditionals, data structures).

On the AWS side, recommended knowledge includes familiarity with data pipeline orchestration services (AWS Glue, AWS Step Functions), storage systems (Amazon S3, Amazon Redshift, Amazon DynamoDB), streaming platforms (Amazon Kinesis), and security/governance services (AWS IAM, AWS KMS, AWS Lake Formation). An understanding of data lakes, networking fundamentals (VPC, subnets, connectivity), compute options (Amazon EMR, AWS Lambda), and vector/embedding concepts is also beneficial. No prior AWS certification is required, but a background equivalent to AWS Certified Cloud Practitioner or AWS Certified Solutions Architect – Associate provides a useful foundation.

Exam Format

The DEA-C01 exam consists of 65 total questions: 50 scored questions that contribute to the final result and 15 unscored pilot questions that AWS uses to evaluate future content. Unscored questions are not identified, so candidates should treat all questions equally. Question types are multiple choice (one correct answer from four options) and multiple response (two or more correct answers from five or more options). The time limit is 130 minutes, and the exam is delivered via Pearson VUE at a testing center or through an online proctored session. The exam is available in English, Japanese, Korean, and Simplified Chinese, and costs $150 USD.

Scores are reported on a scaled range of 100–1,000, and the minimum passing score is 720. AWS uses a compensatory scoring model, meaning candidates do not need to achieve a passing threshold in each individual domain — only the overall scaled score matters. Unanswered questions are treated as incorrect; there is no penalty for guessing. The certification is valid for three years, after which recertification requires passing the current version of the exam.

Skills Measured

1. Domain 1 – Data Ingestion and Transformation (34%): Covers reading from and writing to diverse data sources (batch and streaming), using AWS Glue for ETL jobs and crawlers, ingesting data with Amazon Kinesis Data Streams and Amazon Data Firehose (formerly Kinesis Data Firehose), applying transformation logic with AWS Lambda and Amazon EMR, and orchestrating pipelines with AWS Step Functions and Amazon MWAA. Also includes applying language-agnostic programming concepts to automate pipeline tasks.
2. Domain 2 – Data Store Management (26%): Covers selecting appropriate storage services (Amazon S3, Amazon Redshift, Amazon DynamoDB, Amazon RDS, Amazon OpenSearch Service) based on access patterns, latency, cost, and scale. Includes designing data models, implementing partitioning and compression strategies, managing schema evolution with AWS Glue Data Catalog, and enforcing data lifecycle policies (for example, S3 lifecycle rules and DynamoDB TTL).
3. Domain 3 – Data Operations and Support (22%): Covers operationalizing and monitoring data pipelines using Amazon CloudWatch, AWS CloudTrail, and AWS Glue job metrics. Includes automating pipeline deployments with AWS CDK or CloudFormation, troubleshooting pipeline failures, performing data quality checks and validation, and analyzing data using Amazon Athena and Amazon Redshift queries.
4. Domain 4 – Data Security and Governance (18%): Covers implementing authentication and authorization using AWS IAM, resource-based policies, and AWS Lake Formation column- and row-level security. Includes encrypting data at rest and in transit with AWS KMS and SSL/TLS, managing data privacy and compliance requirements, configuring audit logging via AWS CloudTrail, and applying data governance frameworks using AWS Glue Data Catalog and Lake Formation.
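
The partitioning strategies named under Domain 2 often come down to how object keys are laid out in S3. The following is a minimal sketch, assuming a Hive-style `key=value` prefix layout, which Athena and Glue crawlers can interpret as partitions. The bucket prefix and dataset name are hypothetical and chosen only for illustration.

```python
from datetime import date


def partition_prefix(base: str, dataset: str, day: date) -> str:
    """Build a Hive-style partition prefix (year=/month=/day=) for one day of data."""
    return (
        f"{base.rstrip('/')}/{dataset}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
    )


# Hypothetical bucket and dataset names, for illustration only.
prefix = partition_prefix("s3://example-data-lake/raw", "orders", date(2026, 1, 15))
print(prefix)
```

A query that filters on `year`, `month`, and `day` can then skip prefixes outside the requested range instead of scanning the whole dataset.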

Study Tips

Start with the official AWS Exam Guide (PDF from awsstatic.com) and map every task statement to your existing knowledge gaps before studying anything else — prioritize Domain 1 (34%) and Domain 2 (26%) since they together account for 60% of the scored content.
Use AWS Skill Builder's official 'AWS Certified Data Engineer – Associate' exam prep course, which includes video modules aligned to each domain and the Official Practice Question Set with rationales explaining both correct and incorrect answers.
Build hands-on labs around the core data pipeline services: create an end-to-end pipeline using AWS Glue crawlers and ETL jobs, write to Amazon Redshift, and stream real-time data through Amazon Data Firehose (formerly Amazon Kinesis Data Firehose) into S3. Together these cover the three most heavily tested service categories.
Take the AWS Certification Official Pretest on Skill Builder early in your study cycle to identify weak domains, then revisit it after studying to measure progress before scheduling the real exam.
For Domain 4 (Security and Governance), specifically practice configuring AWS Lake Formation permissions, column-level security, and the difference between IAM-based vs. Lake Formation–based access control, as these are frequently tested and commonly misunderstood.
Review the AWS Well-Architected Framework's Data Analytics Lens and the AWS whitepapers on data lake architecture and streaming data solutions — these provide the architectural reasoning behind best-practice questions on the exam.
Practice with multiple-response question formats specifically, since these are harder to guess and require knowing all correct answers. Use timed practice sets to simulate the 130-minute constraint across 65 questions (roughly 2 minutes per question).
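
The weighting and pacing arithmetic in the tips above can be sketched in a few lines. This assumes study time is split in proportion to the published domain weights, which is only a heuristic, and the 40-hour budget is a hypothetical figure.

```python
# Domain weights as listed in the exam domain breakdown (percent).
DOMAIN_WEIGHTS = {
    "Data Ingestion and Transformation": 34,
    "Data Store Management": 26,
    "Data Operations and Support": 22,
    "Data Security and Governance": 18,
}


def allocate_hours(total_hours: float) -> dict[str, float]:
    """Split a study budget across domains in proportion to their exam weight."""
    return {d: round(total_hours * w / 100, 1) for d, w in DOMAIN_WEIGHTS.items()}


def minutes_per_question(duration_min: int = 130, questions: int = 65) -> float:
    """Average time available per question for the whole exam."""
    return duration_min / questions


plan = allocate_hours(40)  # 40 hours is a hypothetical budget
for domain, hours in plan.items():
    print(f"{domain}: {hours} h")
print(f"Pace: {minutes_per_question()} min per question")
```

Adjust the split toward whichever domains the pretest shows as weakest; the weights only describe the exam, not your own gaps.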

Career Benefits

The DEA-C01 certification targets one of the fastest-growing roles in cloud computing. AWS-certified data engineers in the US report average salaries around $141,000 per year according to Glassdoor data, with entry-level positions starting near $124,000–$130,000 and senior roles exceeding $175,000. Research from the Jefferson Frank Careers and Hiring Guide found that 73% of AWS professionals saw a salary increase after certification, averaging approximately 27%. Job roles accessible with this credential include Data Engineer, Cloud Data Architect, ETL/ELT Developer, Data Platform Engineer, and Analytics Engineer.

AWS certifications appear frequently in cloud job postings, and the DEA-C01 specifically validates the services (Glue, Redshift, Kinesis, S3) that dominate real-world data engineering job requirements. For professionals transitioning from database administration, backend development, or traditional ETL roles, the certification provides a structured path into cloud-native data engineering. Many candidates report role transitions or salary increases within 3–6 months of earning the credential. Pairing DEA-C01 with the Databricks Data Engineer Associate certification is often recommended as a complementary two-certification combination in the data engineering space.

Sample Questions

5 sample questions with correct answers and explanations, drawn from the full bank of 1124 practice questions.

1. Adatum Analytics uses Amazon QuickSight Enterprise Edition with SPICE for dashboarding. The security team requires that sales managers can only see data for their own region in all dashboards, while the finance team can see data for all regions but should not have access to the employee_salary column. Which combination of QuickSight features should the team implement? (Select two.)

A. Configure row-level security (RLS) rules based on the user's region assignment

B. Configure column-level security (CLS) to restrict access to the employee_salary column for the finance team

C. Create separate SPICE datasets for each region and assign them to the corresponding sales managers

D. Use QuickSight parameters with dynamic default values to filter data by region at the dashboard level

E. Create separate QuickSight accounts for each region to enforce data isolation

Explanation

Row-level security in QuickSight restricts which rows each user or group can see based on defined rules, ensuring sales managers only see their own region's data. Column-level security, available in Enterprise Edition, restricts access to specific columns, preventing the finance team from seeing the employee_salary column. Creating separate SPICE datasets per region increases maintenance overhead and does not scale well as regions are added. Dashboard parameters with dynamic defaults can be overridden by users and do not enforce security restrictions. Separate QuickSight accounts are unnecessary and create significant administrative complexity when a single account with RLS and CLS provides the required granular access control.
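
The combined effect of RLS and CLS described above can be illustrated with a small simulation. This is a conceptual sketch in plain Python, not QuickSight's actual configuration format or API; the group names, rule structures, and sample rows are all hypothetical.

```python
# Sample dataset rows (hypothetical values).
ROWS = [
    {"region": "EMEA", "revenue": 120, "employee_salary": 70},
    {"region": "APAC", "revenue": 95, "employee_salary": 65},
]

# Row rules: the set of regions a group may see; None means all regions.
ROW_RULES = {"sales_emea": {"EMEA"}, "finance": None}

# Column rules: columns hidden from a group.
HIDDEN_COLUMNS = {"finance": {"employee_salary"}}


def visible_rows(group: str) -> list[dict]:
    """Apply the row filter first, then drop any columns hidden from the group."""
    allowed = ROW_RULES[group]
    hidden = HIDDEN_COLUMNS.get(group, set())
    result = []
    for row in ROWS:
        if allowed is not None and row["region"] not in allowed:
            continue  # row-level restriction
        result.append({k: v for k, v in row.items() if k not in hidden})
    return result


print(visible_rows("sales_emea"))
print(visible_rows("finance"))
```

Both restrictions are enforced on the dataset itself, which is why they hold across every dashboard built on it, unlike a dashboard parameter that a user can change.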

2. A data engineer at Fabrikam Corp needs to transfer 500 GB of incremental data nightly from an on-premises NFS file server to Amazon S3. The transfer must complete within a 2-hour window over a 1 Gbps AWS Direct Connect link. The data engineer also needs to exclude temporary log files matching *.tmp patterns and limit bandwidth usage to 70% of the link capacity so other business applications are not impacted. Which service should the data engineer use? (Select one.)

A. A custom script using the AWS CLI s3 sync command with the --exclude flag running on an on-premises server

B. AWS Snow Family Snowball Edge device for nightly offline data transfer

C. AWS DataSync with an on-premises agent, configured with include/exclude filters and bandwidth throttling

D. AWS Transfer Family with an SFTP server endpoint mapped to the S3 bucket

Explanation

AWS DataSync provides high-speed online data transfer from on-premises NFS to Amazon S3 and can fully utilize network links of up to 10 Gbps by using a purpose-built transfer protocol with multi-threaded connections and in-line compression. At 70% of a 1 Gbps link (approximately 700 Mbps), 500 GB (about 4,000 gigabits) takes roughly 95 minutes before protocol overhead, which fits inside the 2-hour window. DataSync natively supports include/exclude file filters to skip temporary files matching patterns like *.tmp and provides granular bandwidth throttling to cap usage at a specified rate, protecting other applications on the network. An on-premises agent connects to the NFS server and handles scheduling, incremental transfers, error handling, and data integrity verification automatically. Transfer Family provides SFTP/FTPS endpoints but lacks built-in file filtering by pattern and bandwidth throttling capabilities and is designed for individual file transfers rather than bulk migration workloads. AWS CLI s3 sync can use the --exclude flag but requires custom scripting for bandwidth throttling, scheduling, retry logic, and data integrity verification, increasing operational overhead. Snow Family devices require multi-day shipping turnaround for each transfer cycle, making them entirely unsuitable for a nightly transfer window.
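
The throughput arithmetic behind this answer is worth checking by hand. The sketch below assumes decimal units (1 GB = 8 gigabits) and ignores protocol overhead and compression. It also shows the shape of the arguments that boto3's DataSync `start_task_execution` call accepts for throttling and exclusion; the task ARN is a hypothetical placeholder and no AWS call is made here.

```python
def transfer_minutes(size_gb: float, link_gbps: float, utilization: float) -> float:
    """Estimate transfer time in minutes at a given share of the link."""
    gigabits = size_gb * 8
    seconds = gigabits / (link_gbps * utilization)
    return seconds / 60


def throttle_bytes_per_second(link_bps: int, percent: int) -> int:
    """Convert a percentage of link capacity (bits/s) into a bytes/s cap."""
    return link_bps * percent // 100 // 8


minutes = transfer_minutes(500, 1.0, 0.70)
limit = throttle_bytes_per_second(1_000_000_000, 70)

# Arguments for boto3.client("datasync").start_task_execution(**task_execution_args).
# The TaskArn below is a made-up placeholder.
task_execution_args = {
    "TaskArn": "arn:aws:datasync:us-east-1:111122223333:task/task-EXAMPLE",
    "OverrideOptions": {"BytesPerSecond": limit},
    "Excludes": [{"FilterType": "SIMPLE_PATTERN", "Value": "*.tmp"}],
}

print(f"Estimated transfer time: {minutes:.1f} minutes")
print(f"Bandwidth cap: {limit} bytes/s")
```

The estimate leaves about 25 minutes of headroom in the 2-hour window, so a nightly volume much above 500 GB would need a higher cap or a longer window.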

3. A data engineer at Adatum Analytics is building a real-time fraud detection pipeline. Transaction events arrive in Amazon Kinesis Data Streams at variable rates ranging from 1,000 to 50,000 events per second. The pipeline must detect fraudulent patterns within sliding windows of 5 minutes and produce alerts within 1 second of pattern detection. Late-arriving events up to 2 minutes should still be processed correctly. Which processing architecture meets these requirements? (Select one.)

A. AWS Glue Spark Streaming job reading from Kinesis with a 30-second micro-batch interval and window aggregations

B. Amazon Kinesis Data Streams with a KCL consumer application performing in-memory aggregations with 5-minute tumbling windows

C. Amazon Managed Service for Apache Flink application with sliding windows, event-time processing, and watermarks configured for 2-minute late arrival tolerance

D. Amazon Data Firehose with a Lambda transformation function using a 60-second buffer interval to analyze transaction patterns

Explanation

Amazon Managed Service for Apache Flink provides exactly-once stateful stream processing with native support for sliding windows, event-time semantics, and watermarks for handling late-arriving data. Watermarks can be configured to tolerate 2-minute late arrivals, and sliding windows enable continuous pattern detection over 5-minute intervals. Flink's sub-second processing latency meets the 1-second alerting requirement. Data Firehose is a delivery service, not a stream processing engine, and the 60-second buffer interval in this option rules out 1-second alerting. Glue Spark Streaming operates in micro-batch mode, and the 30-second interval in this option cannot meet the 1-second latency requirement. A KCL consumer with tumbling windows would miss patterns that span window boundaries, and building custom sliding window logic with late-arrival handling requires significant development effort compared to Flink's native support.
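
To make the windowing terms concrete, here is a simplified model of sliding-window assignment and a watermark in plain Python. It is not Flink code and does not reproduce Flink's exact semantics. The 60-second slide interval is an assumption (the question does not specify one), and all function names are hypothetical.

```python
WINDOW_SIZE = 300       # 5-minute window, in seconds
SLIDE = 60              # assumed slide interval, in seconds
ALLOWED_LATENESS = 120  # 2-minute tolerance for late events, in seconds


def windows_for(event_time: int) -> list[tuple[int, int]]:
    """Return every [start, end) sliding window that contains event_time."""
    last_start = event_time - (event_time % SLIDE)
    starts = range(last_start, event_time - WINDOW_SIZE, -SLIDE)
    return [(s, s + WINDOW_SIZE) for s in starts]


def watermark(max_event_time_seen: int) -> int:
    """Simplified watermark: the newest event time seen minus the tolerance."""
    return max_event_time_seen - ALLOWED_LATENESS


def is_too_late(event_time: int, max_event_time_seen: int) -> bool:
    """In this model an event is dropped once it is older than the watermark."""
    return event_time < watermark(max_event_time_seen)


print(windows_for(310))
print(is_too_late(100, 300), is_too_late(200, 300))
```

Each event lands in five overlapping windows (300 / 60), which is what lets a pattern that straddles a tumbling-window boundary still appear whole in at least one sliding window.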

4. A data engineer needs to schedule daily AWS Glue jobs that do not require execution or completion at a specific time. Which solution runs the Glue jobs most cost-effectively?

A. Choose the FLEX execution class in the Glue job properties

B. Choose the STANDARD execution class in the Glue job properties

C. Choose the latest version in the GlueVersion field in the Glue job properties

D. Use the Spot Instance type in Glue job properties

Explanation

The FLEX execution class in AWS Glue runs jobs on spare compute capacity in the AWS cloud at a discounted rate compared to the STANDARD execution class. Because spare capacity availability can fluctuate, FLEX jobs may experience variable startup times and can be interrupted and restarted, but for daily workloads with no strict execution window or deadline, this variability is an acceptable trade-off for the cost reduction. FLEX is specifically designed for non-time-sensitive batch workloads. Spot Instances are an EC2 purchasing option and are not a configuration option available in AWS Glue job properties; Glue abstracts the underlying compute infrastructure. The STANDARD execution class uses dedicated capacity with predictable startup and execution times, which is appropriate for time-sensitive production jobs but costs more than FLEX for workloads where timing flexibility is available. The GlueVersion field specifies the Glue runtime version that determines available features and engine improvements but has no direct bearing on execution cost or job scheduling behavior.
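
As a rough illustration, the execution class is a single field in the job definition. The sketch below only builds the argument dictionary that boto3's Glue `create_job` call accepts and does not call AWS. The job name, role ARN, script location, Glue version, and worker settings are hypothetical placeholders.

```python
def glue_job_args(name: str, role_arn: str, script_s3_uri: str,
                  flexible_schedule: bool) -> dict:
    """Build create_job arguments, choosing FLEX when timing is not critical."""
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_s3_uri,
            "PythonVersion": "3",
        },
        "GlueVersion": "4.0",
        "WorkerType": "G.1X",
        "NumberOfWorkers": 2,
        "ExecutionClass": "FLEX" if flexible_schedule else "STANDARD",
    }


# Placeholder values for illustration only.
args = glue_job_args(
    "daily-orders-etl",
    "arn:aws:iam::111122223333:role/ExampleGlueRole",
    "s3://example-scripts/daily_orders.py",
    flexible_schedule=True,
)
print(args["ExecutionClass"])
```

With a real client this would be passed as `boto3.client("glue").create_job(**args)`; switching back to STANDARD for a time-sensitive job changes only the one field.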

5. A data engineer works on an ML project requiring both low-latency real-time feature access for model predictions and batch feature access for model training. Which Amazon SageMaker Feature Store mode should be used?

A. Batch mode

B. Online mode only

C. Offline mode only

D. Online and Offline mode

Explanation

Amazon SageMaker Feature Store's Online and Offline mode simultaneously maintains both an online store optimized for low-latency millisecond reads needed for real-time model inference and an offline store backed by S3 that supports large-scale batch retrieval for model training and batch inference. Enabling both modes ensures features are available for real-time predictions and for training pipelines without needing to maintain separate systems or synchronize data manually. Online-only mode provides fast real-time access but cannot serve the large batch queries required for model training at scale. Offline-only mode supports batch training workloads but cannot provide the low-latency feature retrieval needed for real-time predictions. There is no standalone Batch mode in SageMaker Feature Store; batch ingestion is a method of populating the offline store, not a mode of the feature store itself.
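
In practice the two stores are enabled as separate configuration blocks on one feature group. The sketch below builds arguments in the shape that boto3's SageMaker `create_feature_group` call accepts, without calling AWS. The feature group name, feature names, S3 URI, and role ARN are hypothetical placeholders.

```python
def feature_group_args(name: str, s3_uri: str, role_arn: str,
                       online: bool = True, offline: bool = True) -> dict:
    """Build create_feature_group arguments with the requested stores enabled."""
    args = {
        "FeatureGroupName": name,
        "RecordIdentifierFeatureName": "customer_id",
        "EventTimeFeatureName": "event_time",
        "FeatureDefinitions": [
            {"FeatureName": "customer_id", "FeatureType": "String"},
            {"FeatureName": "event_time", "FeatureType": "Fractional"},
            {"FeatureName": "txn_count_7d", "FeatureType": "Integral"},
        ],
        "RoleArn": role_arn,
    }
    if online:
        # Low-latency reads for real-time inference.
        args["OnlineStoreConfig"] = {"EnableOnlineStore": True}
    if offline:
        # S3-backed history for training and batch inference.
        args["OfflineStoreConfig"] = {"S3StorageConfig": {"S3Uri": s3_uri}}
    return args


# Placeholder values for illustration only.
both = feature_group_args(
    "example-customer-features",
    "s3://example-feature-store/offline",
    "arn:aws:iam::111122223333:role/ExampleFeatureStoreRole",
)
print(sorted(k for k in both if k.endswith("StoreConfig")))
```

Omitting either block yields the online-only or offline-only variants that the explanation rules out for this scenario.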
