Data's Blog


Capabilities of AWS Glue Data Quality

AWS Glue Data Quality accelerates your data quality journey with a set of built-in rule types, summarized below:

Rule Type | Description
AggregateMatch | Checks if two datasets match by comparing summary metrics such as total sales amount. Useful for verifying that all data was ingested from source systems.
ColumnCorrelation | Checks how well two columns are correlated.
ColumnCount | Checks if any columns are dropped.
ColumnDataType | Checks if a column is compliant with a data type.
ColumnExists | Checks if columns exist in a dataset. This allows customers building self-service data platforms to ensure certain columns are made available.
ColumnLength | Checks if the length of data is consistent.
ColumnNamesMatchPattern | Checks if column names match defined patterns. Useful for governance teams to enforce column name consistency.
ColumnValues | Checks if data is consistent with defined values. This rule supports regular expressions.
Completeness | Checks for any blanks or NULLs in data.
CustomSql | Lets customers implement almost any type of data quality check in SQL.
DataFreshness | Checks if data is fresh.
DatasetMatch | Compares two datasets and identifies if they are in sync.
DistinctValuesCount | Checks for duplicate values.
Entropy | Checks the entropy of the data.
IsComplete | Checks if 100% of the data is complete.
IsPrimaryKey | Checks if a column is a primary key (not NULL and unique).
IsUnique | Checks if 100% of the data is unique.
Mean | Checks if the mean matches a set threshold.
ReferentialIntegrity | Checks if two datasets have referential integrity.
RowCount | Checks if the record count matches a threshold.
RowCountMatch | Checks if record counts between two datasets match.
StandardDeviation | Checks if the standard deviation matches a set threshold.
SchemaMatch | Checks if the schemas of two datasets match.
Sum | Checks if the sum matches a set threshold.
Uniqueness | Checks if the uniqueness of a dataset matches a threshold.
UniqueValueRatio | Checks if the unique value ratio matches a threshold.
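
As a rough illustration of how a few of these rule types might be combined, the sketch below assembles a ruleset string in the Data Quality Definition Language (DQDL) that AWS Glue Data Quality uses. The helper `build_ruleset`, the column names (`order_id`, `customer_id`, `status`), and the thresholds are all hypothetical and chosen only for the example. Check the exact rule syntax against the DQDL reference before relying on it.

```python
def build_ruleset(rules):
    """Join individual DQDL rule strings into a 'Rules = [ ... ]' block.

    Hypothetical helper for illustration; it only formats text and does
    not call any AWS API.
    """
    body = ",\n".join(f"    {rule}" for rule in rules)
    return f"Rules = [\n{body}\n]"


# Example rules; column names and thresholds are assumptions.
rules = [
    'ColumnExists "order_id"',                                 # column must be present
    'IsPrimaryKey "order_id"',                                 # not NULL and unique
    'Completeness "customer_id" > 0.95',                       # at most 5% missing
    'ColumnValues "status" in ["OPEN", "SHIPPED", "CLOSED"]',  # allowed values
    'RowCount > 0',                                            # dataset is not empty
]

ruleset = build_ruleset(rules)
print(ruleset)
```

The resulting string is plain text, so it could be pasted into the ruleset editor or passed to whatever job evaluates the ruleset. Keeping each rule as a separate list entry makes it easy to add or drop checks per dataset.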