Install additional dependencies
Some environments and Data Sources require additional Python libraries or third-party utilities that are not included in the base installation of GX Core. Use the information provided here to install the necessary dependencies for Amazon S3, Azure Blob Storage, Google Cloud Storage, and SQL databases.
- Amazon S3
- Microsoft Azure Blob Storage
- Google Cloud Storage
- SQL databases
GX Core uses the Python library boto3 to access objects stored in Amazon S3 buckets, but you must configure your Amazon S3 account and credentials through AWS and the AWS command line interface (CLI).
Prerequisites
- The AWS CLI. See Installing or updating the latest version of the AWS CLI.
- AWS credentials. See Configuring the AWS CLI.
Installation
Python interacts with AWS through the boto3 library. GX Core makes use of this library in the background when working with AWS. Although you won't use boto3 directly, you'll need to install it in your virtual environment.
To set up boto3 with AWS, and use boto3 within Python, see the Boto3 documentation.
- Run the following code to verify the AWS CLI version:
aws --version
If this command does not return AWS CLI version information, reinstall or update the AWS CLI. See Install or update to the latest version of the AWS CLI.
- Run one of the following pip commands to install boto3 in your virtual environment:
python -m pip install boto3
or
python3 -m pip install boto3
- Run the following code to verify your AWS credentials are properly configured:
aws sts get-caller-identity
When your credentials are properly configured, your UserId, Account, and Arn are returned. If your credentials are not configured correctly, an error message appears. If you receive an error message, or you can't verify your credentials, see Configure the AWS CLI.
- Install the Python dependencies for AWS S3 support.
If you installed GX in a virtual environment, your environment should be active when you install these dependencies.
Run the following code to install the optional dependencies required by GX to work with AWS S3:
python -m pip install 'great_expectations[s3]'
GX Core and the requirements for the boto3 Python library are installed.
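As a quick sanity check after the steps above, the following minimal sketch reports whether boto3 and great_expectations can be found in the active environment. The helper name `missing_modules` is hypothetical and not part of GX Core; the sketch only confirms that the packages are importable, not that your AWS credentials work.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["boto3", "great_expectations"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("boto3 and great_expectations are importable.")
```

Run the script with the same interpreter you used for the pip commands so that it inspects the correct virtual environment.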
Azure Blob Storage stores unstructured data on the Microsoft cloud data storage platform. To validate Azure Blob Storage data with GX Core you install additional Python libraries and define a connection string.
Prerequisites
- Python version 3.8 to 3.11
- (Recommended) A Python virtual environment.
- An Azure Storage account.
- Azure storage account access keys.
Installation
- Install the Python dependencies for Azure Blob Storage support.
If you installed GX in a virtual environment, your environment should be active when you install these dependencies.
Run the following code to install GX Core with the additional Python libraries needed to work with Azure Blob Storage:
python -m pip install 'great_expectations[azure]'
- Configure your Azure Blob Storage credentials.
You can manage your credentials by storing them as environment variables. To do this, enter export ENV_VARIABLE_NAME=env_var_value in the terminal or add the equivalent command to your ~/.bashrc file. For example:
export AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net;AccountName=<YOUR-STORAGE-ACCOUNT-NAME>;AccountKey=<YOUR-STORAGE-ACCOUNT-KEY>"
When entering this command, replace <YOUR-STORAGE-ACCOUNT-NAME> and <YOUR-STORAGE-ACCOUNT-KEY> with your Azure Blob Storage account values.
If you do not want to store your credentials as environment variables, you can store them in the file config_variables.yml after you have created a File Data Context.
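If you store the connection string in an environment variable, a small check like the one below can confirm that it contains the account name and key before you use it. This is a sketch that assumes the variable name AZURE_STORAGE_CONNECTION_STRING from the example above; the helper names `parse_connection_string` and `missing_keys` are hypothetical and not part of GX Core or the Azure SDK.

```python
import os

# Keys the example connection string above is expected to contain.
REQUIRED_KEYS = ("AccountName", "AccountKey")


def parse_connection_string(value):
    """Split 'Key=Value;Key=Value' pairs into a dict."""
    parts = {}
    for segment in value.split(";"):
        if not segment.strip():
            continue
        # Split on the first '=' only, because account keys can end with '='.
        key, sep, val = segment.partition("=")
        if sep:
            parts[key.strip()] = val.strip()
    return parts


def missing_keys(value):
    """Return the required keys that are absent or empty in the connection string."""
    parts = parse_connection_string(value)
    return [key for key in REQUIRED_KEYS if not parts.get(key)]


if __name__ == "__main__":
    conn = os.environ.get("AZURE_STORAGE_CONNECTION_STRING", "")
    missing = missing_keys(conn)
    # Report key names only; never print the secret values themselves.
    print("Missing keys:", missing if missing else "none")
```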
To validate Google Cloud Platform (GCP) data with GX Core, you create your GX Python environment, install GX Core locally, and then configure the necessary dependencies.
Prerequisites
- Python version 3.8 to 3.11
- pip. See Installation and downloads.
- A GCP service account with permissions to access GCP resources and storage Objects.
- The GOOGLE_APPLICATION_CREDENTIALS environment variable is set. See Set up Application Default Credentials.
- Google Cloud API authentication is set up. See Set up authentication.
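To confirm the GOOGLE_APPLICATION_CREDENTIALS prerequisite before you install anything, you could run a check along these lines. It is a minimal sketch that assumes the variable holds the path to a credentials file on disk; the function name `credentials_problem` is hypothetical, and the check does not validate the contents of the file or the permissions of the service account.

```python
import os


def credentials_problem(environ):
    """Return a description of the problem, or None if the variable points to a file."""
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return f"GOOGLE_APPLICATION_CREDENTIALS points to a missing file: {path}"
    return None


if __name__ == "__main__":
    problem = credentials_problem(os.environ)
    print(problem or "GOOGLE_APPLICATION_CREDENTIALS points to an existing file.")
```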
Installation
- Run the following code to confirm your Python version:
python --version
If your existing Python version is not 3.8 to 3.11, see Active Python Releases.
- Run the following code to create a virtual environment and a directory named my_venv:
python -m venv my_venv
Optional. Replace my_venv with another directory name. If you prefer, you can use virtualenv, pyenv, and similar tools to install GX in virtual environments.
- Run the following code to activate the virtual environment:
source my_venv/bin/activate
- Run the following code to install optional dependencies:
python -m pip install 'great_expectations[gcp]'
- Run the following code to confirm GX was installed successfully:
great_expectations --version
The output should be great_expectations, version <version_number>.
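The Python version requirement in the steps above can also be checked from inside the interpreter, which is useful when `python` and `python3` resolve to different installations. The sketch below assumes the supported range is 3.8 to 3.11 inclusive, as stated in the prerequisites; the name `is_supported` is hypothetical.

```python
import sys

# Supported range taken from the prerequisites above (inclusive on both ends).
MIN_VERSION = (3, 8)
MAX_VERSION = (3, 11)


def is_supported(version_info):
    """Return True when the major.minor version falls within the supported range."""
    return MIN_VERSION <= tuple(version_info[:2]) <= MAX_VERSION


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if is_supported(sys.version_info) else "not supported"
    print(f"Python {current} is {status}.")
```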
To validate data stored on SQL databases with GX Core, you create your GX Python environment, install GX Core locally, and then configure the necessary dependencies.
Prerequisites
- Python version 3.8 to 3.11
- pip. See Installation and downloads.
- The necessary environment variables are set to allow access to the SQL database. See Manage credentials.
Installation
- Run the following code to confirm your Python version:
python --version
If your existing Python version is not 3.8 to 3.11, see Active Python Releases.
- Run the following code to create a virtual environment and a directory named my_venv:
python -m venv my_venv
Optional. Replace my_venv with another directory name. If you prefer, you can use virtualenv, pyenv, and similar tools to install GX in virtual environments.
- Run the following code to activate the virtual environment:
source my_venv/bin/activate
- Run the following code to install optional dependencies for SQLAlchemy:
python -m pip install 'great_expectations[sqlalchemy]'
To install optional dependencies for a different SQL database, see SQL database dependency commands.
- Run the following code to confirm GX was installed successfully:
great_expectations --version
The output should be great_expectations, version <version_number>.
SQL database dependency commands
The following table lists the commands that install the GX Core dependencies for specific SQL databases. GX Core requires these dependencies to work with the corresponding database.
| SQL Database | Command |
|---|---|
| AWS Athena | pip install 'great_expectations[athena]' |
| BigQuery | pip install 'great_expectations[bigquery]' |
| MSSQL | pip install 'great_expectations[mssql]' |
| PostgreSQL | pip install 'great_expectations[postgresql]' |
| Redshift | pip install 'great_expectations[redshift]' |
| Snowflake | pip install 'great_expectations[snowflake]' |
| Trino | pip install 'great_expectations[trino]' |
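If you provision environments from a script, the table can be expressed as a lookup, as in the sketch below. The extras names are copied from the table; the `EXTRAS` mapping and the `install_command` helper are hypothetical conveniences, not part of GX Core.

```python
# Extras names copied from the table above, keyed by database name.
EXTRAS = {
    "AWS Athena": "athena",
    "BigQuery": "bigquery",
    "MSSQL": "mssql",
    "PostgreSQL": "postgresql",
    "Redshift": "redshift",
    "Snowflake": "snowflake",
    "Trino": "trino",
}


def install_command(database):
    """Build the pip command for a database listed in the table."""
    if database not in EXTRAS:
        raise ValueError(f"No install command listed for: {database}")
    return f"pip install 'great_expectations[{EXTRAS[database]}]'"


if __name__ == "__main__":
    print(install_command("PostgreSQL"))
```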