Apply Conditional Expectations to specific rows within a Batch
Conditional Expectations are experimental, and they are available for the pandas, Spark, and SQLAlchemy backends.
By default, Expectations apply to the entire dataset retrieved in a Batch. However, an Expectation is sometimes relevant to only some rows, and validating every row could cause false positives or false negatives in the Validation Results.
For example, you may define an Expectation that a column specifying a product's country of origin should not be null. If that Expectation is only relevant when the product is a foreign import, applying it to every row in the Batch could produce a large number of false negatives, because the country of origin column is null for products made by local industry.
To solve this issue, GX lets you define Conditional Expectations that apply only to a subset of the data retrieved in a Batch.
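The following pandas sketch shows how a condition removes the misleading failures in this scenario. It assumes a hypothetical products table with `is_import` and `country_of_origin` columns and calls pandas directly, not GX.

```python
import pandas as pd

# Toy product data; the column names are hypothetical.
products = pd.DataFrame(
    {
        "product": ["A", "B", "C"],
        "is_import": [True, False, True],
        "country_of_origin": ["France", None, "Japan"],
    }
)

# Applied to every row, a not-null check flags the locally made product B.
all_rows_nulls = products["country_of_origin"].isna().sum()

# Restricted to imports, the same check finds no problems.
imports = products.query("is_import==True")
import_nulls = imports["country_of_origin"].isna().sum()

print(all_rows_nulls)  # 1
print(import_nulls)    # 0
```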
Create a Conditional Expectation
Great Expectations lets you express Conditional Expectations with a row_condition argument, which can be passed to all Expectations that evaluate rows within a Dataset. The row_condition argument must be a boolean expression string. The Conditional Expectation validates only the rows for which the row_condition evaluates to True. Rows for which it evaluates to False are not validated by the Expectation.
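The following plain-Python sketch models the row_condition semantics described above. Rows where the condition is True are checked, and rows where it is False are skipped rather than counted as failures. It is a conceptual illustration only and does not show how GX evaluates conditions internally. The function `validate_values_in_set` and the toy rows are hypothetical.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def validate_values_in_set(
    rows: List[Row],
    column: str,
    value_set: set,
    condition: Callable[[Row], bool] = lambda row: True,
) -> Dict[str, Any]:
    """Check column values against value_set, only on rows where condition is True."""
    # Rows where the condition is False are skipped, not reported as failures.
    checked = [row for row in rows if condition(row)]
    failures = [row for row in checked if row[column] not in value_set]
    return {
        "rows_checked": len(checked),
        "failures": len(failures),
        "success": not failures,
    }


# Toy rows for illustration; these are not real Titanic records.
rows = [
    {"PClass": "1st", "Survived": 1},
    {"PClass": "3rd", "Survived": 0},
    {"PClass": "1st", "Survived": 1},
]

# Unconditional: every row is checked, so the 3rd-class row fails.
print(validate_values_in_set(rows, "Survived", {1}))
# Conditional: only rows where PClass == "1st" are checked.
print(validate_values_in_set(rows, "Survived", {1}, lambda row: row["PClass"] == "1st"))
```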
Prerequisites
- Python version 3.8 to 3.11.
- An installation of GX 1.0.
- A preconfigured Data Context.
- Recommended. A preconfigured Data Source and Data Asset connected to your data for testing your customized Expectation.
In this procedure, your Data Context is assumed to be stored in the variable context and your Expectation Suite is assumed to be stored in the variable suite. suite can be a newly created and empty Expectation Suite, or an existing Expectation Suite retrieved from the Data Context.
The data used in the examples for this procedure is passenger data for the Titanic, including what class of ticket the passenger held and whether or not they survived the journey.
1. Determine the condition_parser for your row_condition.
The condition_parser defines the syntax of row_condition strings. When implementing Conditional Expectations with pandas, this argument must be set to "pandas". When implementing Conditional Expectations with Spark or SQLAlchemy, this argument must be set to "great_expectations__experimental__".
Conditional Expectations will fail if the Batch they are validating comes from a different type of Data Source than the one indicated by the condition_parser argument.
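As a sketch, the choice in this step is a lookup from backend to parser string. The helper `choose_condition_parser` and its backend keys are hypothetical and are not part of GX. Only the two parser strings come from this page.

```python
# Hypothetical helper (not part of GX): maps a backend name to the
# condition_parser value this page documents for it.
CONDITION_PARSERS = {
    "pandas": "pandas",
    "spark": "great_expectations__experimental__",
    "sqlalchemy": "great_expectations__experimental__",
}


def choose_condition_parser(backend: str) -> str:
    """Return the condition_parser string for a backend such as "pandas"."""
    try:
        return CONDITION_PARSERS[backend.lower()]
    except KeyError:
        # Mismatched or unsupported backends should fail loudly, not silently.
        raise ValueError(
            f"Conditional Expectations are not documented for backend {backend!r}"
        ) from None


print(choose_condition_parser("pandas"))  # pandas
print(choose_condition_parser("Spark"))   # great_expectations__experimental__
```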
2. Determine the row_condition expression.
The row_condition argument should be a boolean expression string that is evaluated for each row in the Batch the Expectation validates. When the row_condition evaluates as True, the row is included in the Expectation's validation. When the row_condition evaluates as False, the Expectation is skipped for that row.
The syntax of the row_condition argument depends on the condition_parser you specified in the previous step.
In pandas, the row_condition value is passed to pandas.DataFrame.query() before Expectation Validation, and the rows returned from the evaluated Batch are validated by the Conditional Expectation.
In Spark and SQLAlchemy, the row_condition value uses SQL syntax and is parsed as a data filter or a query before Expectation Validation.
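The pandas behavior can be approximated with pandas alone: filter the rows with DataFrame.query(), then check only the returned rows. This minimal sketch uses toy data with the PClass and Survived columns from the Titanic examples and does not call GX.

```python
import pandas as pd

# Toy data shaped like the Titanic examples; not real passenger records.
df = pd.DataFrame(
    {
        "PClass": ["1st", "2nd", "3rd", "1st"],
        "Survived": [1, 0, 0, 1],
    }
)

row_condition = 'PClass=="1st"'

# The row_condition string selects the rows to validate, as DataFrame.query() does.
subset = df.query(row_condition)

# Only the returned rows are checked, here against value_set=[1].
unexpected = subset[~subset["Survived"].isin([1])]

print(len(subset))       # 2
print(unexpected.empty)  # True
```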
3. Create a Conditional Expectation.
A Conditional Expectation is created exactly like a regular Expectation, except that the row_condition and condition_parser parameters are provided in addition to the Expectation's other arguments.
For pandas:
row_condition = 'PClass=="1st"'
For Spark and SQL:
row_condition = 'col("PClass")=="1st"'
Do not use single quotes, newlines, or \n inside the specified row_condition, as shown in the following examples:
row_condition = "PClass=='1st'"  # Don't do this. Single quotes aren't valid!

row_condition = """
PClass=="1st"
"""  # Don't do this. Newlines and \n aren't valid!

row_condition = 'PClass=="1st"'  # Do this instead.
With pandas, you can reference variables from the environment by prefacing them with @. You can also reference a column that has a space in its name by wrapping the name in backticks (`).
Some examples of valid row_condition values for pandas include:
row_condition = '`foo foo`=="bar bar"'  # The value of the column "foo foo" is "bar bar"
row_condition = 'foo==@bar'  # The value of the foo column equals the value of the variable bar
For more information on the syntax accepted by pandas row_condition values, see pandas.DataFrame.query.
For Spark and SQL, you also need to specify your columns using the col() function.
Some examples of valid row_condition values for Spark and SQL include:
row_condition = 'col("foo") == "Two Two"'  # foo is "Two Two"
row_condition = 'col("foo").notNull()'  # foo is not null
row_condition = 'col("foo") > 5'  # foo is greater than 5
row_condition = 'col("foo") != "a-b"'  # foo is not "a-b" (SQL only)
row_condition = 'col("foo") <= 3.14'  # foo is less than or equal to 3.14
row_condition = 'col("foo") <= date("2023-03-13")'  # foo is on or before 2023-03-13
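The sketch below has two parts. The first is a hypothetical pre-flight check for the quoting rules in this step. The second is a pandas-only demonstration of backtick-quoted column names and @ variables. `check_row_condition` is not a GX function, and the data frame is toy data. The two query strings are the pandas examples listed above.

```python
from typing import List

import pandas as pd


def check_row_condition(row_condition: str) -> List[str]:
    """Hypothetical check for the quoting rules: no single quotes, no newlines."""
    problems = []
    if "'" in row_condition:
        problems.append("uses single quotes; use double quotes for string literals")
    if "\n" in row_condition or "\\n" in row_condition:
        problems.append("contains a newline or a literal \\n")
    return problems


print(check_row_condition("PClass=='1st'"))  # flags the single quotes
print(check_row_condition('PClass=="1st"'))  # []

# Toy data: one column name contains a space.
df = pd.DataFrame({"foo foo": ["bar bar", "baz"], "foo": [1, 2]})

# Backticks let DataFrame.query() reference a column whose name has a space.
spaced = df.query('`foo foo`=="bar bar"')

# @bar refers to the Python variable named bar in the calling scope.
bar = 2
matched = df.query("foo==@bar")

print(spaced.index.tolist())   # [0]
print(matched.index.tolist())  # [1]
```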
4. Optional. Create additional Conditional Expectations.
Expectations with different conditions are treated as unique, even if they are of the same type and apply to the same column within an Expectation Suite. This allows you to create one unconditional Expectation and any number of Conditional Expectations, each with a different condition.
For example, the following code creates an unconditional Expectation that the value of the "Survived" column is either 0 or 1, and a Conditional Expectation that the value of the "Survived" column is 1 if the individual was a first class passenger:
expectation = suite.add_expectation(
    gxe.ExpectColumnValuesToBeInSet(
        column="Survived",
        value_set=[0, 1],
    )
)
conditional_expectation = suite.add_expectation(
    gxe.ExpectColumnValuesToBeInSet(
        column="Survived",
        value_set=[1],
        condition_parser="pandas",
        row_condition='PClass=="1st"',
    )
)
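The following conceptual model shows why Expectations that differ only in their condition can coexist. It treats a check's identity as its column, value set, and condition. `InSetCheck` is a hypothetical stand-in for ExpectColumnValuesToBeInSet, and a (column, value) equality pair stands in for a row_condition string. GX's own identity rules may differ. The model also shows that the two checks can reach different verdicts on the same rows.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class InSetCheck:
    """Hypothetical stand-in for ExpectColumnValuesToBeInSet (not the GX class)."""

    column: str
    value_set: Tuple[Any, ...]
    # A (column, value) equality pair stands in for a row_condition string.
    condition: Optional[Tuple[str, Any]] = None

    def run(self, rows: List[Dict[str, Any]]) -> bool:
        if self.condition is None:
            checked = rows
        else:
            cond_column, cond_value = self.condition
            checked = [r for r in rows if r[cond_column] == cond_value]
        return all(r[self.column] in self.value_set for r in checked)


unconditional = InSetCheck("Survived", (0, 1))
conditional = InSetCheck("Survived", (1,), condition=("PClass", "1st"))

# Toy rows; not real Titanic records.
rows = [
    {"PClass": "1st", "Survived": 1},
    {"PClass": "3rd", "Survived": 0},
    {"PClass": "1st", "Survived": 0},
]

# Same column, different condition: two distinct entries in the set.
suite = {unconditional, conditional}
print(len(suite))                # 2
print(unconditional.run(rows))   # True
print(conditional.run(rows))     # False
```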
Sample code
import great_expectations as gx
import great_expectations.expectations as gxe
from great_expectations.core.expectation_suite import ExpectationSuite

# Get the Data Context and create an empty Expectation Suite.
context = gx.get_context()
suite = context.suites.add(ExpectationSuite(name="my_expectation_suite"))

# Unconditional: every "Survived" value must be 0 or 1.
expectation = suite.add_expectation(
    gxe.ExpectColumnValuesToBeInSet(
        column="Survived",
        value_set=[0, 1],
    )
)

# Conditional: "Survived" must be 1 for first class passengers.
conditional_expectation = suite.add_expectation(
    gxe.ExpectColumnValuesToBeInSet(
        column="Survived",
        value_set=[1],
        condition_parser="pandas",
        row_condition='PClass=="1st"',
    )
)
Data Docs and Conditional Expectations
Conditional Expectations are displayed differently from standard Expectations in the Data Docs. Each Conditional Expectation is qualified with "if 'row_condition_string', then values must be...".

If 'row_condition_string' is a complex expression, it is split into several components to improve readability.
Scope and limitations
While conditions can be attached to most Expectations, the following Expectations cannot be conditioned and do not take the row_condition argument:
- expect_column_to_exist
- expect_table_columns_to_match_ordered_list
- expect_table_column_count_to_be_between
- expect_table_column_count_to_equal
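If you generate Expectations programmatically, a guard like the following can keep row_condition away from the Expectations listed above. `supports_row_condition` is a hypothetical helper, not a GX API. The set of names is copied from this section.

```python
# Hypothetical guard (not a GX API): Expectation types this page lists as
# unable to take a row_condition argument.
UNCONDITIONABLE = frozenset(
    {
        "expect_column_to_exist",
        "expect_table_columns_to_match_ordered_list",
        "expect_table_column_count_to_be_between",
        "expect_table_column_count_to_equal",
    }
)


def supports_row_condition(expectation_type: str) -> bool:
    """Return False for Expectation types that do not accept row_condition."""
    return expectation_type not in UNCONDITIONABLE


print(supports_row_condition("expect_column_values_to_be_in_set"))  # True
print(supports_row_condition("expect_column_to_exist"))             # False
```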