Version: 0.18.17

Configure Validation Result Stores

A Validation Results Store is a connector that is used to store and retrieve information about objects generated when data is Validated against an Expectation. By default, Validation Results are stored in JSON format in the uncommitted/validations/ subdirectory of your gx/ folder. Use the information provided here to configure a store for your Validation Results.

caution

Validation Results can include sensitive or regulated data that should not be committed to a source control system.

Amazon S3

Use the information provided here to configure a new storage location for Validation Results in Amazon S3.

Prerequisites

    Install boto3 in your local environment

    Python interacts with AWS through the boto3 library. Great Expectations makes use of this library in the background when working with AWS. Although you won't use boto3 directly, you'll need to install it in your virtual environment.

    Run one of the following pip commands to install boto3 in your virtual environment:

    Terminal command
    python -m pip install boto3

    or

    Terminal command
    python3 -m pip install boto3

    To set up boto3 with AWS, and use boto3 within Python, see the Boto3 documentation.

    Verify your AWS credentials are properly configured

    Run the following command in the AWS CLI to verify that your AWS credentials are properly configured:

    Terminal command
    aws sts get-caller-identity

    When your credentials are properly configured, your UserId, Account, and Arn are returned. If your credentials are not configured correctly, an error message appears. If you received an error message, or you couldn't verify your credentials, see Configuring the AWS CLI.
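The same check can be made from Python once boto3 is installed. The following is a minimal sketch rather than part of Great Expectations: the helper name `describe_caller` is hypothetical, and it assumes boto3 can locate your credentials through its normal lookup chain.

```python
def describe_caller(sts_client=None):
    """Return the UserId, Account, and Arn of the active AWS credentials."""
    if sts_client is None:
        # Imported lazily so the helper can also be exercised with a stub client.
        import boto3

        sts_client = boto3.client("sts")
    # The same STS operation that `aws sts get-caller-identity` calls.
    identity = sts_client.get_caller_identity()
    return {key: identity[key] for key in ("UserId", "Account", "Arn")}
```

Calling `describe_caller()` with no arguments uses your configured credentials and raises an error if none can be found.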

    Identify your Data Context Validation Results Store

    Your Validation Results Store configuration is in your Data Context (the primary entry point for a Great Expectations deployment, with configurations and methods for all supporting components).

    The following section in your Data Context great_expectations.yml file tells Great Expectations to look for Validation Results in a Store named validations_store. It also creates a ValidationsStore named validations_store that is backed by a Filesystem and stores Validation Results under the base_directory uncommitted/validations (the default).

    YAML
    stores:
      validations_store:
        class_name: ValidationsStore
        store_backend:
          class_name: TupleFilesystemStoreBackend
          base_directory: uncommitted/validations/

    validations_store_name: validations_store

    Update your configuration file to include a new Store for Validation Results

    To manually add a Validation Results Store, add the following configuration to the stores section of your great_expectations.yml file:

    YAML
    stores:
      validations_S3_store:
        class_name: ValidationsStore
        store_backend:
          class_name: TupleS3StoreBackend
          bucket: '<your>'
          prefix: '<your>' # Bucket and prefix in combination must be unique across all stores

    As shown in the previous example, you need to change the default store_backend settings to make the Store work with S3. Set class_name to TupleS3StoreBackend, bucket to the name of your S3 bucket, and prefix to the folder in your S3 bucket where Validation Results are stored.

    The following example shows the additional options that are available to customize TupleS3StoreBackend:

    File contents: great_expectations.yml
    class_name: ValidationsStore
    store_backend:
      class_name: TupleS3StoreBackend
      bucket: '<your_s3_bucket_name>'
      prefix: '<your_s3_bucket_folder_name>' # Bucket and prefix in combination must be unique across all stores
      boto3_options:
        endpoint_url: ${S3_ENDPOINT} # Uses the S3_ENDPOINT environment variable to determine which endpoint to use.
        region_name: '<your_aws_region_name>'

    In the previous example, the Store name is validations_S3_store. If you use a personalized Store name, you must also update the value of the validations_store_name key to match the Store name. For example:

    YAML
    validations_store_name: validations_S3_store

    When you update the validations_store_name key value, Great Expectations uses the new Store for Validation Results.
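A mismatch between validations_store_name and the entries under stores is easy to miss. The following is a rough sketch of a consistency check, assuming great_expectations.yml has already been parsed into a plain dictionary (for example with a YAML parser). The function name `check_validations_store` is hypothetical and is not a Great Expectations API.

```python
def check_validations_store(config):
    """Return a list of problems found in a parsed great_expectations.yml mapping."""
    problems = []
    stores = config.get("stores", {})
    name = config.get("validations_store_name")
    if name is None:
        problems.append("validations_store_name is not set")
    elif name not in stores:
        problems.append(
            f"validations_store_name {name!r} does not match any entry in stores"
        )
    elif stores[name].get("class_name") != "ValidationsStore":
        problems.append(f"store {name!r} is not a ValidationsStore")
    return problems


# The configuration from this guide, written as the dict a YAML parser would produce.
config = {
    "stores": {
        "validations_S3_store": {
            "class_name": "ValidationsStore",
            "store_backend": {
                "class_name": "TupleS3StoreBackend",
                "bucket": "<your_s3_bucket_name>",
                "prefix": "<your_s3_bucket_folder_name>",
            },
        }
    },
    "validations_store_name": "validations_S3_store",
}

print(check_validations_store(config))  # prints []

# Leaving the key at its default value no longer matches any Store.
config["validations_store_name"] = "validations_store"
print(check_validations_store(config))
# prints ["validations_store_name 'validations_store' does not match any entry in stores"]
```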

    Add the following code to great_expectations.yml to configure the IAM user:

    File contents: great_expectations.yml
    class_name: ValidationsStore
    store_backend:
      class_name: TupleS3StoreBackend
      bucket: '<your_s3_bucket_name>'
      prefix: '<your_s3_bucket_folder_name>' # Bucket and prefix in combination must be unique across all stores
      boto3_options:
        aws_access_key_id: ${AWS_ACCESS_KEY_ID} # Uses the AWS_ACCESS_KEY_ID environment variable to get aws_access_key_id.
        aws_secret_access_key: ${AWS_SECRET_ACCESS_KEY} # Uses the AWS_SECRET_ACCESS_KEY environment variable.
        aws_session_token: ${AWS_SESSION_TOKEN} # Uses the AWS_SESSION_TOKEN environment variable.
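Because these options are filled in from environment variables, an unset variable is a common cause of a Store that fails to connect. The following sketch scans configuration text for references written in the `${NAME}` form used above and reports any that are not set. The helper name `missing_env_variables` is hypothetical, and the check looks only at environment variables, so values supplied any other way are reported as missing.

```python
import os
import re

# Matches references written in the ${NAME} form used in the examples above.
_ENV_REFERENCE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def missing_env_variables(config_text, environ=None):
    """Return the names referenced as ${NAME} in config_text that are unset or empty."""
    environ = os.environ if environ is None else environ
    names = sorted(set(_ENV_REFERENCE.findall(config_text)))
    return [name for name in names if not environ.get(name)]


snippet = "endpoint_url: ${S3_ENDPOINT}\nregion_name: '<your_aws_region_name>'"
# An empty mapping stands in for an environment where nothing is set.
print(missing_env_variables(snippet, environ={}))  # prints ['S3_ENDPOINT']
```

To check a real file, pass the contents of great_expectations.yml as `config_text` and omit `environ` so that the current process environment is used.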

    Add the following code to great_expectations.yml to configure the IAM Assume Role:

    File contents: great_expectations.yml
    class_name: ValidationsStore
    store_backend:
      class_name: TupleS3StoreBackend
      bucket: '<your_s3_bucket_name>'
      prefix: '<your_s3_bucket_folder_name>' # Bucket and prefix in combination must be unique across all stores
      boto3_options:
        assume_role_arn: '<your_role_to_assume>'
        region_name: '<your_aws_region_name>'
        assume_role_duration: session_duration_in_seconds

    caution

    If you are also storing Expectations in S3 (see How to configure an Expectation store to use Amazon S3) or Data Docs in S3 (see How to host and share Data Docs), make sure the prefix values are disjoint and that one is not a substring of the other.
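The rule in the caution above can be checked mechanically. The following sketch assumes all of the Stores share one bucket and takes a mapping of Store name to prefix; the function name `overlapping_prefixes` and the example names and prefixes are hypothetical.

```python
from itertools import combinations


def overlapping_prefixes(prefixes):
    """Return pairs of Store names whose S3 prefixes overlap.

    Two prefixes overlap when they are equal or when one occurs as a
    substring of the other. An empty prefix overlaps every other prefix.
    """
    clashes = []
    for (name_a, prefix_a), (name_b, prefix_b) in combinations(
        sorted(prefixes.items()), 2
    ):
        if prefix_a in prefix_b or prefix_b in prefix_a:
            clashes.append((name_a, name_b))
    return clashes


print(
    overlapping_prefixes(
        {
            "expectations": "gx/expectations",
            "validations": "gx/validations",
            "data_docs": "gx",
        }
    )
)
# prints [('data_docs', 'expectations'), ('data_docs', 'validations')]
```

An empty list means every pair of prefixes is disjoint.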

    Copy existing Validation results to the S3 bucket (Optional)

    If you are converting an existing local Great Expectations deployment to one that works in AWS, you might have Validation Results saved that you want to transfer to your S3 bucket.

    To copy Validation Results into Amazon S3, use the aws s3 sync command as shown in the following example:

    Terminal input
    aws s3 sync '<base_directory>' s3://'<your_s3_bucket_name>'/'<your_s3_bucket_folder_name>'

    The base_directory is set to uncommitted/validations/ by default.

    In the following example, the Validation Results val1 and val2 are copied to Amazon S3 and a confirmation message is returned for each file:

    Terminal output
    upload: uncommitted/validations/val1/val1.json to s3://'<your_s3_bucket_name>'/'<your_s3_bucket_folder_name>'/val1/val1.json
    upload: uncommitted/validations/val2/val2.json to s3://'<your_s3_bucket_name>'/'<your_s3_bucket_folder_name>'/val2/val2.json
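If the AWS CLI is not available, a similar copy can be sketched with boto3. This is an illustration only, not a replacement for aws s3 sync: it uploads every JSON file unconditionally instead of comparing it with what is already in the bucket, and the function names `plan_uploads` and `upload_validation_results` are hypothetical. Each file keeps its path relative to base_directory so that the folder layout under the prefix matches the local one.

```python
from pathlib import Path


def plan_uploads(base_directory, prefix):
    """Map each JSON file under base_directory to the S3 key it would be stored at."""
    base = Path(base_directory)
    plan = {}
    for path in sorted(base.rglob("*.json")):
        relative = path.relative_to(base).as_posix()
        # Join the prefix and the relative path without doubling slashes.
        key = "/".join(part for part in (prefix.strip("/"), relative) if part)
        plan[str(path)] = key
    return plan


def upload_validation_results(base_directory, bucket, prefix, s3_client=None):
    """Upload every planned file and return the keys that were written."""
    if s3_client is None:
        # Imported lazily so the planning step works without boto3 installed.
        import boto3

        s3_client = boto3.client("s3")
    keys = []
    for filename, key in plan_uploads(base_directory, prefix).items():
        s3_client.upload_file(filename, bucket, key)
        keys.append(key)
    return keys
```

Running `plan_uploads("uncommitted/validations/", "<your_s3_bucket_folder_name>")` first shows the planned keys without touching the bucket.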

    Confirm the configuration

    Run a Checkpoint to store results in the new Validation Results Store on S3, then rebuild Data Docs to visualize the results.
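One way to confirm that results reached the bucket is to list the objects under the Store's prefix after the Checkpoint has run. The following sketch uses boto3 directly and is independent of Great Expectations; the function name `list_validation_result_keys` is hypothetical, and the bucket and prefix are assumed to be the values from your Store configuration.

```python
def list_validation_result_keys(bucket, prefix, s3_client=None):
    """Return the keys of all objects stored under prefix in bucket."""
    if s3_client is None:
        # Imported lazily so the helper can also be exercised with a stub client.
        import boto3

        s3_client = boto3.client("s3")
    keys = []
    # list_objects_v2 returns at most 1000 keys per call, so follow the pages.
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page has no "Contents" entry when nothing matches the prefix.
        keys.extend(item["Key"] for item in page.get("Contents", []))
    return keys
```

A new key appearing in the returned list after each Checkpoint run suggests that the Store is writing to S3 as configured.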