Version: 1.0 prerelease

Try Great Expectations

Start here to learn how to connect to sample data, build an Expectation, validate the sample data, and review the Validation Results. This is an ideal place to start if you're new to GX 1.0 and want to experiment with its features and see what it offers.

Prerequisites

Python version 3.8 to 3.11
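Before installing, it can help to confirm that your interpreter falls inside this range. The following is a minimal sketch using only the standard library. The helper name `is_supported_python` is hypothetical and not part of GX, and the bounds are simply the ones listed above.

```python
import sys

# Supported range as stated in the prerequisites above (inclusive).
MIN_SUPPORTED = (3, 8)
MAX_SUPPORTED = (3, 11)


def is_supported_python(version=None):
    """Return True if (major, minor) falls within the stated range.

    Hypothetical helper for illustration; not part of the GX library.
    """
    if version is None:
        version = sys.version_info[:2]
    major_minor = tuple(version[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


# Report on the interpreter that is running this script.
current = ".".join(str(part) for part in sys.version_info[:3])
status = "within" if is_supported_python() else "outside"
print(f"Python {current} is {status} the stated 3.8 to 3.11 range")
```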

Setup

GX 1.0 is a Python library you can install with the Python pip tool.

For more comprehensive guidance on setting up a Python environment, installing GX 1.0, and installing additional dependencies for specific data formats and storage environments, see Set up a GX environment.

Run the following terminal command to install the GX 1.0 library:
Terminal input
```
pip install great_expectations
```
Verify GX 1.0 installed successfully:
Terminal input
```
great_expectations --version
```
The following output appears when GX 1.0 is successfully installed:
Terminal output
```
great_expectations, version 1.0.0a4
```
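If you prefer to check the installation from Python rather than the terminal, one option is the standard library's `importlib.metadata`. This is a sketch, not an official GX verification step; the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(package_name="great_expectations"):
    """Return the installed version string, or None if the package is absent.

    Hypothetical helper for illustration; not part of the GX library.
    """
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("great_expectations is not installed in this environment")
else:
    print(f"great_expectations version: {version}")
```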

Test features and functionality

Import the great_expectations library and expectations module.

The great_expectations module is the root of the GX library and contains shortcuts and convenience methods for starting a GX project in a Python session.

The expectations module contains all the Expectation classes that are provided by the GX library.

Run the following code in a Python interpreter, IDE, or script:
Python input
```
import great_expectations as gx
import great_expectations.expectations as gxe
```
Create a temporary Data Context and connect to sample data.

In Python, a Data Context provides the API for interacting with many common GX objects.

Run the following code to initialize a Data Context and then use it to read the contents of a .csv file into a Batch of sample data:
Python input
```
context = gx.get_context()
batch = context.data_sources.pandas_default.read_csv(
    "https://raw.githubusercontent.com/great-expectations/gx_tutorials/main/data/yellow_tripdata_sample_2019-01.csv"
)
```
You'll use this sample data to test your Expectations.
Create an Expectation.

Expectations are a fundamental component of GX. They allow you to explicitly define the state to which your data should conform.

The sample data you're using is taxi trip record data. With this data, you can make certain assumptions. For example, the passenger count shouldn't be zero because at least one passenger needs to be present. Additionally, a taxi can accommodate a maximum of six passengers.

Run the following code to define an Expectation that the contents of the column passenger_count consist of values ranging from 1 to 6:
Python input
```
expectation = gxe.ExpectColumnValuesToBeBetween(
    column="passenger_count", min_value=1, max_value=6
)
```

Run the following code to validate the sample data against your Expectation and view the results:

Python input
```
validation_result = batch.validate(expectation)
print(validation_result.describe())
```

The sample data conforms to the defined Expectation and the following Validation Results are returned:

Python output
```
{
    "type": "expect_column_values_to_be_between",
    "success": true,
    "kwargs": {
        "batch_id": "default_pandas_datasource-#ephemeral_pandas_asset",
        "column": "passenger_count",
        "min_value": 1.0,
        "max_value": 6.0
    },
    "result": {
        "element_count": 10000,
        "unexpected_count": 0,
        "unexpected_percent": 0.0,
        "partial_unexpected_list": [],
        "missing_count": 0,
        "missing_percent": 0.0,
        "unexpected_percent_total": 0.0,
        "unexpected_percent_nonmissing": 0.0,
        "partial_unexpected_counts": [],
        "partial_unexpected_index_list": []
    }
}
```
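If you want to act on a result in code rather than read it, one option is to parse the printed description. The sketch below assumes the text printed by `describe()` is valid JSON, as the output above suggests. The `sample_output` string is an abridged copy of that output, and `summarize` is a hypothetical helper, not a GX function.

```python
import json

# Abridged copy of the output shown above, used in place of a live
# validation run so the sketch needs neither GX nor network access.
sample_output = """
{
    "type": "expect_column_values_to_be_between",
    "success": true,
    "kwargs": {"column": "passenger_count", "min_value": 1.0, "max_value": 6.0},
    "result": {"element_count": 10000, "unexpected_count": 0, "unexpected_percent": 0.0}
}
"""


def summarize(description_text):
    """Turn a JSON description into a one-line summary (hypothetical helper)."""
    data = json.loads(description_text)
    status = "passed" if data["success"] else "failed"
    result = data["result"]
    return (
        f'{data["type"]} on {data["kwargs"]["column"]} {status}: '
        f'{result["unexpected_count"]} of {result["element_count"]} values unexpected'
    )


print(summarize(sample_output))
```

With a live result, you would pass the string returned by `validation_result.describe()` in place of `sample_output`, under the same assumption about its format.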

Optional. Create an Expectation that will fail when validated against the provided data.

A failed Expectation tells you either that something is wrong with the data, such as missing or incorrect values, or that there is a misunderstanding about the data.

Run the following code to create an Expectation that fails because it assumes that a taxi can seat a maximum of three passengers:

Python input
```
failed_expectation = gxe.ExpectColumnValuesToBeBetween(
    column="passenger_count", min_value=1, max_value=3
)
failed_validation_result = batch.validate(failed_expectation)
print(failed_validation_result.describe())
```

When an Expectation fails, the Validation Results of the failed Expectation include metrics to help you assess the severity of the issue:

Python output
```
{
    "type": "expect_column_values_to_be_between",
    "success": false,
    "kwargs": {
        "batch_id": "default_pandas_datasource-#ephemeral_pandas_asset",
        "column": "passenger_count",
        "min_value": 1.0,
        "max_value": 3.0
    },
    "result": {
        "element_count": 10000,
        "unexpected_count": 853,
        "unexpected_percent": 8.53,
        "partial_unexpected_list": [
            4,
            4,
            4,
            4,
            4,
            4,
            4,
            4,
            4,
            4,
            4,
            4,
            4,
            4,
            4,
            4,
            4,
            4,
            4,
            4
        ],
        "missing_count": 0,
        "missing_percent": 0.0,
        "unexpected_percent_total": 8.53,
        "unexpected_percent_nonmissing": 8.53,
        "partial_unexpected_counts": [
            {
                "value": 4,
                "count": 20
            }
        ],
        "partial_unexpected_index_list": [
            9147,
            9148,
            9149,
            9150,
            9151,
            9152,
            9153,
            9154,
            9155,
            9156,
            9157,
            9158,
            9159,
            9160,
            9161,
            9162,
            9163,
            9164,
            9165,
            9166
        ]
    }
}
```

To reduce the size of the report and make it easier to review, only a portion of the failed values and record indexes are included in the Validation Results. The failed counts and percentages correspond to the failed records in the validated data.
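To make the relationship between the counts, percentages, and truncated lists concrete, here is a plain-Python sketch of a between check. In the output above, 853 unexpected values out of 10,000 records gives 8.53 percent, and the partial lists stop at 20 entries. The sketch is an illustration only and not how GX implements the Expectation. The function name `check_between`, the default cap of 20 (inferred from the output above), and the toy data are all assumptions.

```python
def check_between(values, min_value, max_value, partial_limit=20):
    """Illustrative between check; not the GX implementation.

    Assumes every value is present (no missing data), so the percentage
    is computed over all records.
    """
    # Collect (index, value) pairs that fall outside the inclusive range.
    unexpected = [
        (index, value)
        for index, value in enumerate(values)
        if not (min_value <= value <= max_value)
    ]
    element_count = len(values)
    unexpected_count = len(unexpected)
    percent = 100.0 * unexpected_count / element_count if element_count else 0.0
    return {
        "success": unexpected_count == 0,
        "element_count": element_count,
        "unexpected_count": unexpected_count,
        "unexpected_percent": percent,
        # Only the first few failures are reported; the counts cover all of them.
        "partial_unexpected_list": [value for _, value in unexpected[:partial_limit]],
        "partial_unexpected_index_list": [index for index, _ in unexpected[:partial_limit]],
    }


# Toy data: 100 rides, 30 of which carry 4 passengers.
passenger_counts = [1] * 70 + [4] * 30
report = check_between(passenger_counts, min_value=1, max_value=3)

print(report["unexpected_count"], report["unexpected_percent"])
print(len(report["partial_unexpected_list"]))
```

Here the counts describe all 30 failing records, while the partial lists hold only the first 20, mirroring the shape of the report above.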

Optional. Go to the Expectations Gallery and experiment with other Expectations.

Full example script
```
# Import required modules from the GX library
import great_expectations as gx
import great_expectations.expectations as gxe


# Create a temporary Data Context and connect to provided sample data.
context = gx.get_context()
batch = context.data_sources.pandas_default.read_csv(
    "https://raw.githubusercontent.com/great-expectations/gx_tutorials/main/data/yellow_tripdata_sample_2019-01.csv"
)

# Create an Expectation
expectation = gxe.ExpectColumnValuesToBeBetween(
    column="passenger_count", min_value=1, max_value=6
)

# Validate the sample data against your Expectation and view the results
validation_result = batch.validate(expectation)
print(validation_result.describe())
```

Next steps

Check out GX Cloud, our SaaS platform, which is now in public preview. Sign up and you could be validating your data in minutes. We also offer regular GX Cloud workshops that you can register for.
To learn more about GX 1.0, see Community resources.
If you're ready to start using GX 1.0 with your own data, the Set up a GX environment documentation provides a more comprehensive guide to setting up GX to work with specific data formats and environments.
