Connect to data using SQL
To connect to your SQL data, you first create a Data Source, which tells GX where your database resides and how to connect to it. You then configure Data Assets for your Data Source to tell GX which sets of records you want to be able to access. Finally, you define Batch Definitions, which let you request all the records retrieved from a Data Asset or further partition the returned records based on the contents of a date and time field.
GX supports the following SQL dialects:
- PostgreSQL
- SQLite
- Snowflake
- Databricks SQL
- BigQuery SQL
All other SQL dialects are handled through the Python module SQLAlchemy. You can find more information on the dialects supported by SQLAlchemy on their dialects page.
Configure credentials
To connect GX to your SQL data, you will first need your connection string and corresponding credentials. Because your connection string and credentials provide access to your data they should be stored securely outside of version control. GX 1.0 allows you to securely store credentials and connection strings as environment variables or in an uncommitted config file. These variables are then accessed through string substitution in your version controlled code.
Prerequisites
- The ability to set environment variables or a File Data Context.

GX 1.0 also supports referencing credentials stored in AWS Secrets Manager, Google Cloud Secret Manager, and Azure Key Vault. To set up GX 1.0 to access one of these secrets managers you will additionally need:
- The ability to install Python modules with `pip`.
Procedure

-
Determine your connection string format.

Different types of SQL database have different formats for their connection string. In the following table, the text in `<>` corresponds to the values specific to your credentials and connection string.

| Database type | Connection string |
| --- | --- |
| PostgreSQL | `postgresql+psycopg2://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>` |
| SQLite | `sqlite:///<PATH_TO_DB_FILE>` |
| Snowflake | `snowflake://<USER_NAME>:<PASSWORD>@<ACCOUNT_NAME>/<DATABASE_NAME>/<SCHEMA_NAME>?warehouse=<WAREHOUSE_NAME>&role=<ROLE_NAME>&application=great_expectations_oss` |
| Databricks SQL | `databricks://token:<TOKEN>@<HOST>:<PORT>/<DATABASE>?http_path=<HTTP_PATH>&catalog=<CATALOG>&schema=<SCHEMA>` |
| BigQuery SQL | `bigquery://<GCP_PROJECT>/<BIGQUERY_DATASET>?credentials_path=/path/to/your/credentials.json` |

Other connection string formats are valid provided they are for a SQL database that is supported by SQLAlchemy. You can find more information on the dialects supported by SQLAlchemy on their dialects page.

-
Store the credentials required for your connection string.

GX supports the following methods of securely storing credentials. Choose one to implement for your connection string:
- Environment Variables
- config.yml
Environment variables provide the quickest way to securely set up your credentials.
You can set environment variables by replacing the values in `<>` with your information and entering `export <VARIABLE_NAME>=<VALUE>` commands in the terminal, or by adding the commands to your `~/.bashrc` file. If you use the `export` command from the terminal, the environment variables will not persist beyond the current session. If you add them to the `~/.bashrc` file, the variables will be exported each time you log in.

You can export individual credentials or an entire connection string. For example:

Terminal or ~/.bashrc
```bash
export MY_POSTGRES_USERNAME=<USERNAME>
export MY_POSTGRES_PASSWORD=<PASSWORD>
```

or:

Terminal or ~/.bashrc
```bash
export POSTGRES_CONNECTION_STRING=postgresql+psycopg2://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>
```

You can also reference your stored credentials within a stored connection string by wrapping the corresponding variable in `${` and `}`. For example:

Terminal or ~/.bashrc
```bash
export MY_POSTGRES_USERNAME=<USERNAME>
export MY_POSTGRES_PASSWORD=<PASSWORD>
export POSTGRES_CONNECTION_STRING=postgresql+psycopg2://${MY_POSTGRES_USERNAME}:${MY_POSTGRES_PASSWORD}@<HOST>:<PORT>/<DATABASE>
```

Because the dollar sign character `$` is used to indicate the start of a string substitution, dollar signs should be escaped with a backslash `\` when they are part of your credentials. For example, if your password is `pa$$word`, you would use the command:

Terminal or ~/.bashrc
```bash
export MY_POSTGRES_PASSWORD=pa\$\$word
```
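If substitution fails at runtime, a common cause is that the variables were never exported in the session GX runs in. The following stdlib sketch shows how to check; the variable names are the hypothetical examples above, and the values are set in-process here to simulate the `export` (note that after `export MY_POSTGRES_PASSWORD=pa\$\$word`, the stored value is the literal `pa$$word` because the backslashes only protect the dollar signs from the shell):

```python
import os

# Hypothetical variable names from the examples above; substitute your own.
os.environ["MY_POSTGRES_USERNAME"] = "example_user"  # simulates `export`
os.environ["MY_POSTGRES_PASSWORD"] = "pa$$word"      # what pa\$\$word becomes

def missing_env_vars(names):
    """Return the subset of names not set in the current environment."""
    return [name for name in names if name not in os.environ]

print(missing_env_vars(["MY_POSTGRES_USERNAME", "MY_POSTGRES_PASSWORD"]))  # -> []
```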
YAML files make variables more visible, are easier to edit, and allow for modularization. For example, you can create a YAML file for development and testing and another for production.
A File Data Context is required before you can configure credentials in a YAML file. By default, the credentials file in a File Data Context is located at `/great_expectations/uncommitted/config_variables.yml`. The `uncommitted/` directory is included in the default `.gitignore` and will be excluded from version control.

Save your access credentials or the database connection string to `great_expectations/uncommitted/config_variables.yml`. For example:

config_variables.yml
```yaml
MY_POSTGRES_USERNAME: <USERNAME>
MY_POSTGRES_PASSWORD: <PASSWORD>
```

or:

config_variables.yml
```yaml
POSTGRES_CONNECTION_STRING: postgresql+psycopg2://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>
```

You can also reference your stored credentials within a stored connection string by wrapping the corresponding variable in `${` and `}`. For example:

config_variables.yml
```yaml
MY_POSTGRES_USERNAME: <USERNAME>
MY_POSTGRES_PASSWORD: <PASSWORD>
POSTGRES_CONNECTION_STRING: postgresql+psycopg2://${MY_POSTGRES_USERNAME}:${MY_POSTGRES_PASSWORD}@<HOST>:<PORT>/<DATABASE>
```

Because the dollar sign character `$` is used to indicate the start of a string substitution, dollar signs should be escaped with a backslash `\` when they are part of your credentials. For example, if your password is `pa$$word`, you would store it as:

config_variables.yml
```yaml
MY_POSTGRES_PASSWORD: pa\$\$word
```
-
Access your credentials in Python strings.

Securely stored credentials are accessed via string substitution. You can reference your credentials in a Python string by wrapping the variable name in `${` and `}`. Using individual credentials would look like:

Python
```python
connection_string = "postgresql+psycopg2://${MY_POSTGRES_USERNAME}:${MY_POSTGRES_PASSWORD}@${POSTGRES_HOST}:${POSTGRES_PORT}/${POSTGRES_DATABASE}"
```

Or you could reference a configured variable that contains the full connection string by providing a Python string that contains just a reference to that variable:

Python
```python
connection_string = "${POSTGRES_CONNECTION_STRING}"
```

When you pass a string that references your stored credentials to a GX 1.0 method that requires a connection string as a parameter, the referenced variable will be substituted with the corresponding stored value.
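As a mental model, the substitution behaves like shell-style variable expansion. The following is an illustrative stdlib sketch of that behavior, not GX's actual implementation: `${VAR}` references resolve from a mapping, unresolvable references are left alone, and an escaped `\$` survives as a literal dollar sign.

```python
import re

def substitute(value, env):
    """Illustrative stand-in for GX's config substitution (not the real code)."""
    def expand(match):
        # Unresolvable references are left as-is.
        return env.get(match.group(1), match.group(0))
    expanded = re.sub(r"(?<!\\)\$\{(\w+)\}", expand, value)
    # An escaped \$ becomes a literal dollar sign.
    return expanded.replace("\\$", "$")

env = {"MY_POSTGRES_USERNAME": "gx_user", "MY_POSTGRES_PASSWORD": "secret"}
template = "postgresql+psycopg2://${MY_POSTGRES_USERNAME}:${MY_POSTGRES_PASSWORD}@localhost:5432/db"
print(substitute(template, env))
# -> postgresql+psycopg2://gx_user:secret@localhost:5432/db
print(substitute(r"pa\$\$word", env))  # -> pa$$word
```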
Reference a Secrets Manager
GX 1.0 supports the AWS Secrets Manager, Google Cloud Secret Manager, and Azure Key Vault secrets managers.
Use of a secrets manager is optional. Credentials can be securely stored as environment variables or entries in a YAML file without referencing content stored in a secrets manager.
- AWS Secrets Manager
- GCP Secret Manager
- Azure Key Vault
Prerequisites
- An AWS Secrets Manager instance. See AWS Secrets Manager.
- The ability to install Python packages with `pip`.
- A preconfigured File Data Context.
Procedure

-
Set up AWS Secrets Manager support.

To use AWS Secrets Manager with GX 1.0 you will first need to install the `great_expectations` Python package with the `aws_secrets` requirement. To do this, run the following command:

Terminal
```bash
pip install 'great_expectations[aws_secrets]'
```
-
Reference AWS Secrets Manager variables in `config_variables.yml`.

By default, `config_variables.yml` is located at `gx/uncommitted/config_variables.yml` in your File Data Context.

Values in `config_variables.yml` that start with `secret|arn:aws:secretsmanager` will be substituted with the corresponding values from AWS Secrets Manager. However, if the keywords following `secret|arn:aws:secretsmanager` do not correspond to keywords in AWS Secrets Manager, no substitution will occur.

You can reference other stored credentials within the keywords by wrapping the corresponding variable in `${` and `}`. When multiple references are present in a value, the secrets manager substitution takes place after all other substitutions have occurred.

An entire connection string can be referenced from the secrets manager:

config_variables.yml
```yaml
my_aws_creds: secret|arn:aws:secretsmanager:${AWS_REGION}:${ACCOUNT_ID}:secret:dev_db_credentials|connection_string
```

Or each component of the connection string can be referenced separately:

config_variables.yml
```yaml
drivername: secret|arn:aws:secretsmanager:${AWS_REGION}:${ACCOUNT_ID}:secret:dev_db_credentials|drivername
host: secret|arn:aws:secretsmanager:${AWS_REGION}:${ACCOUNT_ID}:secret:dev_db_credentials|host
port: secret|arn:aws:secretsmanager:${AWS_REGION}:${ACCOUNT_ID}:secret:dev_db_credentials|port
username: secret|arn:aws:secretsmanager:${AWS_REGION}:${ACCOUNT_ID}:secret:dev_db_credentials|username
password: secret|arn:aws:secretsmanager:${AWS_REGION}:${ACCOUNT_ID}:secret:dev_db_credentials|password
database: secret|arn:aws:secretsmanager:${AWS_REGION}:${ACCOUNT_ID}:secret:dev_db_credentials|database
```

Note that the last seven characters of an AWS Secrets Manager ARN are automatically generated by AWS and are not required to retrieve the secret. For example, the following two values retrieve the same secret:

config_variables.yml
```yaml
secret1: secret|arn:aws:secretsmanager:${AWS_REGION}:${ACCOUNT_ID}:secret:my_secret-1zAyu6
secret2: secret|arn:aws:secretsmanager:${AWS_REGION}:${ACCOUNT_ID}:secret:my_secret
```

-
Optional. Reference versioned secrets.

Unless otherwise specified, the latest version of the secret is returned. To retrieve a specific version of a secret, specify its version UUID. For example:

config_variables.yml
```yaml
versioned_secret: secret|arn:aws:secretsmanager:${AWS_REGION}:${ACCOUNT_ID}:secret:my_secret:00000000-0000-0000-0000-000000000000
```

-
Optional. Retrieve specific secrets from a JSON string.

To retrieve a specific secret from a JSON string, include the JSON key after a pipe character `|` at the end of the secret's keywords. For example:

config_variables.yml
```yaml
json_secret: secret|arn:aws:secretsmanager:${AWS_REGION}:${ACCOUNT_ID}:secret:my_secret|<KEY>
versioned_json_secret: secret|arn:aws:secretsmanager:${AWS_REGION}:${ACCOUNT_ID}:secret:my_secret:00000000-0000-0000-0000-000000000000|<KEY>
```
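To make the structure of these references concrete, here is a hedged stdlib sketch that decomposes a `secret|arn:aws:secretsmanager` value into region, account, secret name, optional version, and optional JSON key. The parsing rules are inferred from the examples above rather than taken from GX's source.

```python
def parse_aws_secret_reference(value):
    """Split a secret|arn:aws:secretsmanager:... reference into its parts."""
    assert value.startswith("secret|arn:aws:secretsmanager:")
    body = value[len("secret|"):]
    # An optional |<KEY> suffix selects one key from a JSON secret.
    arn, _, json_key = body.partition("|")
    parts = arn.split(":")
    # arn:aws:secretsmanager:<region>:<account>:secret:<name>[:<version UUID>]
    region, account, name = parts[3], parts[4], parts[6]
    version = parts[7] if len(parts) > 7 else None
    return {"region": region, "account": account, "name": name,
            "version": version, "json_key": json_key or None}

ref = ("secret|arn:aws:secretsmanager:us-east-1:123456789012:"
       "secret:dev_db_credentials|password")
print(parse_aws_secret_reference(ref))
```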
Prerequisites
- A GCP Secret Manager instance with configured secrets.
- The ability to install Python packages with `pip`.
- A preconfigured File Data Context.
Procedure

-
Set up GCP Secret Manager support.

To use GCP Secret Manager with GX 1.0 you will first need to install the `great_expectations` Python package with the `gcp` requirement. To do this, run the following command:

Terminal
```bash
pip install 'great_expectations[gcp]'
```
-
Reference GCP Secret Manager variables in `config_variables.yml`.

By default, `config_variables.yml` is located at `gx/uncommitted/config_variables.yml` in your File Data Context.

Values in `config_variables.yml` that match the regex `^secret\|projects\/[a-z0-9\_\-]{6,30}\/secrets` will be substituted with the corresponding values from GCP Secret Manager. However, if the keywords in the matching regex do not correspond to keywords in GCP Secret Manager, no substitution will occur.

You can reference other stored credentials within the regex by wrapping the corresponding variable in `${` and `}`. When multiple references are present in a value, the secrets manager substitution takes place after all other substitutions have occurred.

An entire connection string can be referenced from the secrets manager:

config_variables.yml
```yaml
my_gcp_creds: secret|projects/${PROJECT_ID}/secrets/dev_db_credentials|connection_string
```

Or each component of the connection string can be referenced separately:

config_variables.yml
```yaml
drivername: secret|projects/${PROJECT_ID}/secrets/PROD_DB_CREDENTIALS_DRIVERNAME
host: secret|projects/${PROJECT_ID}/secrets/PROD_DB_CREDENTIALS_HOST
port: secret|projects/${PROJECT_ID}/secrets/PROD_DB_CREDENTIALS_PORT
username: secret|projects/${PROJECT_ID}/secrets/PROD_DB_CREDENTIALS_USERNAME
password: secret|projects/${PROJECT_ID}/secrets/PROD_DB_CREDENTIALS_PASSWORD
database: secret|projects/${PROJECT_ID}/secrets/PROD_DB_CREDENTIALS_DATABASE
```

-
Optional. Reference versioned secrets.

Unless otherwise specified, the latest version of the secret is returned. To retrieve a specific version of a secret, specify its version id. For example:

config_variables.yml
```yaml
versioned_secret: secret|projects/${PROJECT_ID}/secrets/my_secret/versions/1
```

-
Optional. Retrieve specific secrets from a JSON string.

To retrieve a specific secret from a JSON string, include the JSON key after a pipe character `|` at the end of the secret's regex. For example:

config_variables.yml
```yaml
json_secret: secret|projects/${PROJECT_ID}/secrets/my_secret|<KEY>
versioned_json_secret: secret|projects/${PROJECT_ID}/secrets/my_secret/versions/1|<KEY>
```
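You can check whether a candidate value would trigger GCP substitution by testing it against the documented regex yourself. A small sketch using Python's `re` module and the pattern quoted above:

```python
import re

# The pattern GX matches against, as documented above.
GCP_SECRET_PATTERN = re.compile(r"^secret\|projects\/[a-z0-9\_\-]{6,30}\/secrets")

def is_gcp_secret_reference(value):
    """True if the value would be treated as a GCP Secret Manager reference."""
    return GCP_SECRET_PATTERN.match(value) is not None

print(is_gcp_secret_reference("secret|projects/my-project-123/secrets/my_secret"))  # True
print(is_gcp_secret_reference("plain_password"))  # False
```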
Prerequisites
- An Azure Key Vault instance with configured secrets.
- The ability to install Python packages with `pip`.
- A preconfigured File Data Context.
Procedure

-
Set up Azure Key Vault support.

To use Azure Key Vault with GX 1.0 you will first need to install the `great_expectations` Python package with the `azure_secrets` requirement. To do this, run the following command:

Terminal
```bash
pip install 'great_expectations[azure_secrets]'
```
-
Reference Azure Key Vault variables in `config_variables.yml`.

By default, `config_variables.yml` is located at `gx/uncommitted/config_variables.yml` in your File Data Context.

Values in `config_variables.yml` that match the regex `^secret\|https:\/\/[a-zA-Z0-9\-]{3,24}\.vault\.azure\.net` will be substituted with the corresponding values from Azure Key Vault. However, if the keywords in the matching regex do not correspond to keywords in Azure Key Vault, no substitution will occur.

You can reference other stored credentials within the regex by wrapping the corresponding variable in `${` and `}`. When multiple references are present in a value, the secrets manager substitution takes place after all other substitutions have occurred.

An entire connection string can be referenced from the secrets manager:

config_variables.yml
```yaml
my_abs_creds: secret|https://${VAULT_NAME}.vault.azure.net/secrets/dev_db_credentials|connection_string
```

Or each component of the connection string can be referenced separately:

config_variables.yml
```yaml
drivername: secret|https://${VAULT_NAME}.vault.azure.net/secrets/dev_db_credentials|drivername
host: secret|https://${VAULT_NAME}.vault.azure.net/secrets/dev_db_credentials|host
port: secret|https://${VAULT_NAME}.vault.azure.net/secrets/dev_db_credentials|port
username: secret|https://${VAULT_NAME}.vault.azure.net/secrets/dev_db_credentials|username
password: secret|https://${VAULT_NAME}.vault.azure.net/secrets/dev_db_credentials|password
database: secret|https://${VAULT_NAME}.vault.azure.net/secrets/dev_db_credentials|database
```

-
Optional. Reference versioned secrets.

Unless otherwise specified, the latest version of the secret is returned. To retrieve a specific version of a secret, specify its version id (32 alphanumeric characters). For example:

config_variables.yml
```yaml
versioned_secret: secret|https://${VAULT_NAME}.vault.azure.net/secrets/my-secret/a0b00aba001aaab10b111001100a11ab
```

-
Optional. Retrieve specific secrets from a JSON string.

To retrieve a specific secret from a JSON string, include the JSON key after a pipe character `|` at the end of the secret's regex. For example:

config_variables.yml
```yaml
json_secret: secret|https://${VAULT_NAME}.vault.azure.net/secrets/my-secret|<KEY>
versioned_json_secret: secret|https://${VAULT_NAME}.vault.azure.net/secrets/my-secret/a0b00aba001aaab10b111001100a11ab|<KEY>
```
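Azure references follow the Key Vault URL layout: vault name, secret name, optional 32-character version, and an optional `|<KEY>` suffix. The following stdlib sketch decomposes such a reference; the rules are inferred from the examples above, not taken from GX's source.

```python
def parse_azure_secret_reference(value):
    """Split a secret|https://<vault>.vault.azure.net/... reference into parts."""
    assert value.startswith("secret|https://")
    url = value[len("secret|"):]
    # An optional |<KEY> suffix selects one key from a JSON secret.
    url, _, json_key = url.partition("|")
    host, _, path = url[len("https://"):].partition("/")
    vault_name = host.split(".")[0]      # <vault>.vault.azure.net
    segments = path.split("/")           # secrets/<name>[/<version>]
    name = segments[1]
    version = segments[2] if len(segments) > 2 else None
    return {"vault": vault_name, "name": name,
            "version": version, "json_key": json_key or None}

ref = "secret|https://my-vault.vault.azure.net/secrets/my-secret/a0b00aba001aaab10b111001100a11ab"
print(parse_azure_secret_reference(ref))
```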
Create a SQL Data Source
Data Sources tell GX where your data is located and how to connect to it. With SQL databases, this is done through a connection string you provide.
Prerequisites
- Python version 3.8 to 3.11.
- An installation of GX 1.0 with support for SQL dependencies.
- A preconfigured Data Context.
- Credentials stored securely outside of version control.
- Procedure
- Sample code
-
Import GX and instantiate a Data Context:

Python
```python
import great_expectations as gx

context = gx.get_context()
```

-
Define a name and connection string for your Data Source.
You can assign any name to a Data Source as long as it is unique within your Data Context.
Your connection string or credentials should not be saved in plain text in your code. Instead, you should reference a securely stored connection string or credentials through string substitution. The guidance on how to Configure your credentials covers how to determine the format of your connection string, securely store your connection string or credentials, and how to reference your connection string or credentials in Python.
The following code defines a Data Source name and references a PostgreSQL connection string that has been securely stored in its entirety:

Python
```python
datasource_name = "my_new_datasource"
my_connection_string = "${POSTGRESQL_CONNECTION_STRING}"
```

-
Create a Data Source.
GX 1.0 provides specific methods for creating Data Sources that correspond to supported SQL dialects. All of these methods are accessible from the `data_sources` attribute of your Data Context. Reference the following table to determine the method for your data's SQL dialect:

| Database type | Data Context method |
| --- | --- |
| PostgreSQL | `context.data_sources.add_postgres(name, connection_string)` |
| SQLite | `context.data_sources.add_sqlite(name, connection_string)` |
| Databricks SQL | `context.data_sources.add_databricks_sql(name, connection_string)` |
| Snowflake | `context.data_sources.add_snowflake(name, connection_string)` |
| Other SQL | `context.data_sources.add_sql(name, connection_string)` |

Once you have the method for your data's SQL dialect, call it with the previously defined Data Source name and connection string to create your Data Source. The following example creates a PostgreSQL Data Source:

Python
```python
data_source = context.data_sources.add_postgres(
    name=datasource_name, connection_string=my_connection_string
)
```

-
Optional. Verify the Data Source is connected:
Python
```python
print(context.get_datasource(datasource_name))
```
The details of your Data Source are retrieved from the Data Context and displayed.
Python
```python
import great_expectations as gx

context = gx.get_context()

datasource_name = "my_new_datasource"
my_connection_string = "${POSTGRESQL_CONNECTION_STRING}"

data_source = context.data_sources.add_postgres(
    name=datasource_name, connection_string=my_connection_string
)

print(context.get_datasource(datasource_name))
```
Create a Data Asset
Data Assets are collections of records within a Data Source. With SQL Data Sources, a Data Asset can consist of the records from a specific table or the records from a specified query.
Prerequisites
- A preconfigured Data Context. The variable `context` is used for your Data Context in the following example code.
- A Data Source connected to SQL data.
- Procedure
- Sample code
-
Retrieve your Data Source.

Replace the value of `data_source_name` with the name of your Data Source and execute the following code to retrieve an existing Data Source from your Data Context:

Python
```python
data_source_name = "my_new_datasource"
data_source = context.get_datasource(data_source_name)
```

-
Add a Data Asset to your Data Source.
- Table Data Asset
- Query Data Asset
A Table Data Asset consists of the records in a single table. It takes two required parameters:
- table_name: The name of the SQL table that the Table Data Asset will retrieve records from.
- name: The name used to reference the Table Data Asset within GX. You may assign this arbitrarily, but all Data Assets within the same Data Source must have unique names.
Replace the values of `asset_name` and `database_table_name` in the following code, then execute it to add a Table Data Asset to your Data Source:

Python
```python
asset_name = "MY_TABLE_ASSET"
database_table_name = "postgres_taxi_data"
table_data_asset = data_source.add_table_asset(
    table_name=database_table_name, name=asset_name
)
```

A Query Data Asset consists of the records returned by a SQL query. It takes two required parameters:
- query: The SQL query that the Data Asset will retrieve records from.
- name: The name used to reference the Query Data Asset within GX. You may assign this arbitrarily, but all Data Assets within the same Data Source must have unique names.
Replace the values of `asset_name` and `asset_query` in the following code, then execute it to add a Query Data Asset to your Data Source:

Python
```python
asset_name = "MY_QUERY_ASSET"
asset_query = "SELECT * from postgres_taxi_data"
query_data_asset = data_source.add_query_asset(query=asset_query, name=asset_name)
```

-
Optional. Verify that your Data Asset was added to your Data Source:
Python
```python
print(data_source.assets)
```
A list of Data Assets is printed. You can verify your Data Asset was created and added to the Data Source by checking for its name in the printed list.
-
Optional. Add additional Data Assets to your Data Source.
A Data Source can have multiple Data Assets. Repeat this procedure to add additional Table or Query Data Assets to your Data Source.
Python
```python
import great_expectations as gx

context = gx.get_context()

# Set up: Create a Data Source
datasource_name = "my_new_datasource"
my_connection_string = "${POSTGRESQL_CONNECTION_STRING}"
data_source = context.data_sources.add_postgres(
    name=datasource_name, connection_string=my_connection_string
)
print(context.get_datasource(datasource_name))

# Alternatively, fetch a Data Source from the Data Context.
data_source_name = "my_new_datasource"
data_source = context.get_datasource(data_source_name)

# Example of creating a Table Asset
asset_name = "MY_TABLE_ASSET"
database_table_name = "postgres_taxi_data"
table_data_asset = data_source.add_table_asset(
    table_name=database_table_name, name=asset_name
)

# Example of creating a Query Asset
asset_name = "MY_QUERY_ASSET"
asset_query = "SELECT * from postgres_taxi_data"
query_data_asset = data_source.add_query_asset(query=asset_query, name=asset_name)

# Verify that the Data Assets were created
print(data_source.assets)
```
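Conceptually, a Query Data Asset exposes exactly the records its SQL returns. Because SQLite ships with Python, the same idea can be seen with the stdlib alone; the `taxi` table and its rows below are invented for the demonstration.

```python
import sqlite3

# Build a throwaway in-memory database standing in for your SQL Data Source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxi (fare REAL, passengers INTEGER)")
conn.executemany("INSERT INTO taxi VALUES (?, ?)",
                 [(12.5, 1), (30.0, 3), (7.25, 1)])

# The records a Query Data Asset would expose are whatever the query returns.
asset_query = "SELECT * FROM taxi WHERE passengers = 1"
rows = conn.execute(asset_query).fetchall()
print(rows)  # -> [(12.5, 1), (7.25, 1)]
```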
Create a Batch Definition
A Batch Definition tells GX how to retrieve records from a Data Asset: either all of the records as a single Batch, or partitioned into Batches based on a date field.
Prerequisites
- A preconfigured Data Context. The variable `context` is used for your Data Context in the following example code.
- A Data Asset on a SQL Data Source.
- Procedure
- Sample code
-
Retrieve your Data Asset.

Replace the value of `datasource_name` with the name of your Data Source and the value of `asset_name` with the name of your Data Asset in the following code. Then execute it to retrieve an existing Data Source and Data Asset from your Data Context:

Python
```python
# Retrieve a Data Source
datasource_name = "my_datasource"
data_source = context.get_datasource(datasource_name)

# Get the Data Asset from the Data Source
asset_name = "MY_TABLE_ASSET"
data_asset = data_source.get_asset(asset_name)
```

-
Add a Batch Definition to the Data Asset.
- Full table
- Partitioned
A full table Batch Definition returns all of the records in your Data Asset as a single Batch. Therefore, to define a full table Batch Definition you only need to provide a name for the Batch Definition to be referenced by.
Update the `name` parameter and execute the following code to create a full table Batch Definition:

Python
```python
full_table_batch_definition = data_asset.add_batch_definition_whole_table(
    name="FULL_TABLE"
)
```

A partitioned Batch Definition subdivides the records in a Data Asset based on the values in a specified field. GX 1.0 currently supports partitioning Data Assets based on date fields. The records can be grouped by year, month, or day.

Update the `date_column` variable and `name` parameters in the following snippet, then execute it to create partitioned Batch Definitions:

Python
```python
date_column = "pickup_datetime"

daily_batch_definition = data_asset.add_batch_definition_daily(
    name="DAILY", column=date_column
)
monthly_batch_definition = data_asset.add_batch_definition_monthly(
    name="MONTHLY", column=date_column
)
yearly_batch_definition = data_asset.add_batch_definition_yearly(
    name="YEARLY", column=date_column
)
```

-
Optional. Verify the Batch Definition is valid.
When retrieving a Batch from a partitioned Batch Definition, you can specify the date of the data to retrieve as shown in the following examples. If you do not specify a date, the most recent date in the data is returned by default.
- Full table
- Partitioned
Python
```python
full_table_batch = full_table_batch_definition.get_batch()
full_table_batch.head()
```

Python
```python
daily_batch = daily_batch_definition.get_batch(
    batch_parameters={"year": 2020, "month": 1, "day": 14}
)
daily_batch.head()

monthly_batch = monthly_batch_definition.get_batch(
    batch_parameters={"year": 2020, "month": 1}
)
monthly_batch.head()

yearly_batch = yearly_batch_definition.get_batch(batch_parameters={"year": 2020})
yearly_batch.head()
```

-
Optional. Create additional Batch Definitions.
A Data Asset can have multiple Batch Definitions as long as each Batch Definition has a unique name within that Data Asset. Repeat this procedure to add additional full table or partitioned Batch Definitions to your Data Asset.
Python
```python
import great_expectations as gx

context = gx.get_context()

# Retrieve a Data Source
datasource_name = "my_datasource"
data_source = context.get_datasource(datasource_name)

# Get the Data Asset from the Data Source
asset_name = "MY_TABLE_ASSET"
data_asset = data_source.get_asset(asset_name)

# Example of a full table Batch Definition
full_table_batch_definition = data_asset.add_batch_definition_whole_table(
    name="FULL_TABLE"
)

# Verify that the Batch Definition is valid
full_table_batch = full_table_batch_definition.get_batch()
full_table_batch.head()

# Examples of partitioned Batch Definitions
date_column = "pickup_datetime"
daily_batch_definition = data_asset.add_batch_definition_daily(
    name="DAILY", column=date_column
)
monthly_batch_definition = data_asset.add_batch_definition_monthly(
    name="MONTHLY", column=date_column
)
yearly_batch_definition = data_asset.add_batch_definition_yearly(
    name="YEARLY", column=date_column
)

# Verify that the partitioned Batch Definitions are valid
daily_batch = daily_batch_definition.get_batch(
    batch_parameters={"year": 2020, "month": 1, "day": 14}
)
daily_batch.head()

monthly_batch = monthly_batch_definition.get_batch(
    batch_parameters={"year": 2020, "month": 1}
)
monthly_batch.head()

yearly_batch = yearly_batch_definition.get_batch(batch_parameters={"year": 2020})
yearly_batch.head()
```
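The daily, monthly, and yearly Batch Definitions above amount to grouping records by components of a datetime column. As a plain-Python illustration of the monthly case (the records and field name are invented for the example):

```python
from collections import defaultdict
from datetime import datetime

# Invented records with a pickup_datetime field, mirroring the examples above.
records = [
    {"pickup_datetime": datetime(2020, 1, 14, 8, 30), "fare": 12.5},
    {"pickup_datetime": datetime(2020, 1, 15, 9, 0), "fare": 30.0},
    {"pickup_datetime": datetime(2020, 2, 1, 18, 45), "fare": 7.25},
]

def partition_monthly(rows, column):
    """Group rows into batches keyed by (year, month) of the date column."""
    batches = defaultdict(list)
    for row in rows:
        dt = row[column]
        batches[(dt.year, dt.month)].append(row)
    return dict(batches)

batches = partition_monthly(records, "pickup_datetime")
print(sorted(batches))            # -> [(2020, 1), (2020, 2)]
print(len(batches[(2020, 1)]))    # -> 2
```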