Customize an Expectation Class
Existing Expectation classes can be customized to include additional information such as business logic, more descriptive naming conventions, and specialized rendering for Data Docs. This is done by subclassing an existing Expectation class and populating the subclass with default values and customized attributes.
Advantages of subclassing an Expectation and providing customized attributes rather than creating an instance of the parent Expectation and passing in parameters include:
- All instances of the Expectation that use the default values will be updated if changes are made to the class definition.
- More descriptive Expectation names can be provided that indicate the business logic behind the Expectation.
- Customized text can be provided to describe the Expectation when Data Docs are generated from Validation Results.
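The first advantage can be illustrated without GX at all. The sketch below uses plain Python dataclasses as stand-ins (BetweenCheck and ValidPassengerCount are hypothetical names, not GX classes) to show how defaults declared once on a subclass are shared by every instance that does not override them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BetweenCheck:
    """Hypothetical stand-in for a parameterized base Expectation."""
    column: str
    min_value: Optional[float] = None
    max_value: Optional[float] = None


@dataclass
class ValidPassengerCount(BetweenCheck):
    """Hypothetical subclass that stores the business rule as defaults."""
    column: str = "passenger_count"
    min_value: Optional[float] = 1
    max_value: Optional[float] = 6


# Parent instances repeat the rule at every call site.
repeated = [
    BetweenCheck(column="passenger_count", min_value=1, max_value=6),
    BetweenCheck(column="passenger_count", min_value=1, max_value=6),
]

# Subclass instances take the rule from the class definition, so editing
# the defaults above changes all of them in one place.
shared = [ValidPassengerCount(), ValidPassengerCount()]

# A single attribute can still be overridden per instance.
other_column = ValidPassengerCount(column="occupied_seats")
```

With the parent class, changing the allowed range means finding and editing every call site. With the subclass, only the class definition changes.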
Prerequisites
- Python version 3.8 to 3.11.
- An installation of GX 1.0.
- A preconfigured Data Context.
- Recommended. A preconfigured Data Source and Data Asset connected to your data for testing your customized Expectation.
Procedure
- Choose and import a base Expectation class.
You can customize any of the core Expectation classes in GX. You can view the available Expectations and their functionality in the Expectation Gallery.
In this example, ExpectColumnValuesToBeBetween will be customized:
    from great_expectations.expectations import ExpectColumnValuesToBeBetween
- Create a new Expectation class that inherits the base Expectation class.
The core Expectations in GX have names describing their functionality. When you create a customized Expectation class you can provide a class name that is more indicative of your specific use case:
    class ExpectValidPassengerCount(ExpectColumnValuesToBeBetween):
        ...  # attributes are added in the following steps
- Override the Expectation's attributes with new default values.
The attributes that can be overridden correspond to the parameters of the base Expectation. These can be referenced from the Expectation Gallery.
In this example, the default column for ExpectValidPassengerCount is set to passenger_count and the default value range for the column is defined as between 1 and 6:
    class ExpectValidPassengerCount(ExpectColumnValuesToBeBetween):
        column: str = "passenger_count"
        min_value: int = 1
        max_value: int = 6
- Customize the rendering of the new Expectation when displayed in Data Docs.
The description attribute contains the text describing the customized Expectation when your results are rendered into Data Docs. It can be set when an Expectation class is defined or edited as an attribute of an Expectation instance. You can format the description string with Markdown syntax:
    class ExpectValidPassengerCount(ExpectColumnValuesToBeBetween):
        column: str = "passenger_count"
        min_value: int = 1
        max_value: int = 6
        description: str = "There should be between **1** and **6** passengers."
- Use the customized subclass as an Expectation.
Once a customized Expectation subclass has been defined, instances of it can be created, added to Expectation Suites, and validated just like any other Expectation class:
    # Uses the predefined default values.
    expectation1 = ExpectValidPassengerCount()
    # Uses a different column than the default, but keeps the default
    # min_value, max_value, and description.
    expectation2 = ExpectValidPassengerCount(column="occupied_seats")
It is best to use the predefined default values when a customized Expectation is created. This ensures that the description remains accurate to the values that the Expectation uses. It also allows you to update all instances of the customized Expectation by editing the default values in the customized Expectation's class definition rather than having to update each instance individually in their Expectation Suites.
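One way to keep the description in step with the default values is to build both from the same source. The sketch below is a suggestion rather than a GX requirement: MIN_PASSENGERS, MAX_PASSENGERS, and DESCRIPTION are hypothetical module-level names, and the commented-out class body only indicates where they would be used.

```python
# Hypothetical module-level constants that hold the business rule once.
MIN_PASSENGERS = 1
MAX_PASSENGERS = 6

# The description is derived from the constants, so changing a bound
# changes the text rendered in Data Docs as well.
DESCRIPTION = (
    f"There should be between **{MIN_PASSENGERS}** "
    f"and **{MAX_PASSENGERS}** passengers."
)

# In the customized Expectation, the attributes would then reference
# these names instead of repeating literal values:
#
# class ExpectValidPassengerCount(<base Expectation class>):
#     column: str = "passenger_count"
#     min_value: int = MIN_PASSENGERS
#     max_value: int = MAX_PASSENGERS
#     description: str = DESCRIPTION

print(DESCRIPTION)
```

This does not help when an instance overrides min_value or max_value directly, which is another reason to prefer the class defaults.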
Sample code
    import great_expectations as gx
    from great_expectations.expectations import ExpectColumnValuesToBeBetween

    # Customized Expectation with the business rule stored as defaults.
    class ExpectValidPassengerCount(ExpectColumnValuesToBeBetween):
        column: str = "passenger_count"
        min_value: int = 1
        max_value: int = 6
        description: str = "There should be between **1** and **6** passengers."

    context = gx.get_context()

    # Uses the predefined default values.
    expectation1 = ExpectValidPassengerCount()
    # Uses a different column than the default, but keeps the default
    # min_value, max_value, and description.
    expectation2 = ExpectValidPassengerCount(column="occupied_seats")

    # Names of a preconfigured Data Source, Data Asset, and Batch Definition.
    data_source_name = "my_taxi_data"
    asset_name = "2018_taxi_data"
    batch_definition_name = "all_records_in_asset"
    batch = (
        context.data_sources.get(data_source_name)
        .get_asset(asset_name)
        .get_batch_definition(batch_definition_name)
        .get_batch()
    )
    batch.validate(expectation1)
    batch.validate(expectation2)
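As a rough mental model of what these validations check, the sketch below reimplements the rule in plain Python. It is not how GX evaluates Expectations. The helper unexpected_values is hypothetical, and it assumes the base Expectation's default behavior of inclusive bounds and skipping missing values.

```python
from typing import Iterable, List, Optional

# Same bounds as the defaults on ExpectValidPassengerCount.
MIN_VALUE = 1
MAX_VALUE = 6


def unexpected_values(values: Iterable[Optional[int]]) -> List[int]:
    """Return the non-missing values that fall outside [MIN_VALUE, MAX_VALUE]."""
    return [
        v for v in values
        if v is not None and not (MIN_VALUE <= v <= MAX_VALUE)
    ]


# 1 and 6 sit on the bounds and pass; None is skipped; 0 and 7 are flagged.
sample = [1, 2, 6, None, 0, 7]
print(unexpected_values(sample))  # [0, 7]
```

An empty result corresponds to a passing validation under these assumptions. Any flagged values correspond to rows that would be reported as unexpected.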