Category: AWS Certification

Assuring Data Quality: How to Build a Serverless Data Quality Gate on AWS

Data is a vital element in business decision-making. Modern technologies and algorithms allow for processing and storage of huge amounts of data, converting it into useful predictions and insights. But they also require high-quality data to ensure prediction accuracy and insight value. In today’s world, the importance of data quality validation is hard to overestimate. …

Autonomous data observability and quality within AWS Glue Data Pipeline

Data operations and engineering teams spend 30-40% of their time firefighting data issues raised by business stakeholders. A large share of these errors can be attributed to errors present in the source system, or to errors that occurred, or could have been detected, in the data pipeline. Current data validation approaches for the data …

Amazon EMR introduces EMR runtime for Apache Spark

Amazon EMR is happy to announce Amazon EMR runtime for Apache Spark, a performance-optimized runtime environment for Apache Spark that is active by default on Amazon EMR clusters. EMR runtime for Spark is up to 32 times faster than EMR 5.16, with 100% API compatibility with open-source Spark. This means that your workloads run faster, …