Skip to main content

Redshift interface library

Project description

Python Version PyPi

redshift_connector is the Amazon Redshift connector for Python. Easy integration with pandas and numpy, as well as support for numerous Amazon Redshift specific features help you get the most out of your data

Supported Amazon Redshift features include:

  • IAM authentication

  • Identity provider (IdP) authentication

  • Redshift specific data types

This pure Python connector implements Python Database API Specification 2.0.

Getting Started

Install from Binary

Package Manager

Downloads

Installation Command

PyPi

PyPi Downloads

pip install redshift_connector

Conda

Conda Downloads

conda install -c conda-forge redshift_connector

Install from Source

You may install from source by cloning this repository.

$ git clone https://github.com/aws/amazon-redshift-python-driver.git
$ cd redshift_connector
$ pip install .

Tutorials

We are working to add more documentation and would love your feedback. Please reach out to the team by opening an issue or starting a discussion to help us fill in the gaps in our documentation.

Integrations

redshift_connector integrates with various open source projects to provide an interface to Amazon Redshift. Please open an issue with our project to request new integrations or get support for a redshift_connector issue seen in an existing integration.

Basic Example

import redshift_connector

# Connects to Redshift cluster using AWS credentials
conn = redshift_connector.connect(
    host='examplecluster.abc123xyz789.us-west-1.redshift.amazonaws.com',
    database='dev',
    user='awsuser',
    password='my_password'
 )

cursor: redshift_connector.Cursor = conn.cursor()
cursor.execute("create Temp table book(bookname varchar,author varchar)")
cursor.executemany("insert into book (bookname, author) values (%s, %s)",
                    [
                        ('One Hundred Years of Solitude', 'Gabriel García Márquez'),
                        ('A Brief History of Time', 'Stephen Hawking')
                    ]
                  )
cursor.execute("select * from book")

result: tuple = cursor.fetchall()
print(result)
>> (['One Hundred Years of Solitude', 'Gabriel García Márquez'], ['A Brief History of Time', 'Stephen Hawking'])

Enabling autocommit

Following the DB-API specification, autocommit is off by default. It can be turned on by using the autocommit property of the connection.

# Make sure we're not in a transaction
conn.rollback()
conn.autocommit = True
conn.run("VACUUM")
conn.autocommit = False

Configuring paramstyle

Paramstyle can be set on both a module and cursor level. When paramstyle is set on a module level e.g. redshift_connector.paramstyle = 'qmark', the user specified paramstyle is used for all subsequent cursors unless set on the cursor. When paramstyle is set on the cursor e.g. `cursor.paramstyle = 'qmark' the user specified paramstyle is only used for that cursor object.

# setting paramstyle to qmark on a module level
redshift_connector.paramstyle = 'qmark'


with redshift_connector.connect() as conn1:
    with conn1.cursor() as cursor1: # this cursor will use qmark paramstyle as it's been set on the module level
        pass

    with conn1.cursor() as cursor2:
        # setting paramstyle to numeric on the cursor level only this cursor will use numeric paramstyle
        cursor.paramstyle = 'numeric'

    with conn1.cursor() as cursor3: # this cursor will use qmark paramstyle as it's been set on the module level
        pass

 with redshift_connector.connect() as conn2:
    with conn2.cursor() as cursor1: # this cursor will use qmark paramstyle as it's been set on the module level
        pass

The module level default paramstyle used is format. Valid values for paramstyle include qmark, numeric, named, format, pyformat. The below example shows how to use various paramstyles after the paramstyle is set on the cursor.

When paramstyle is set to named or pyformat, parameters must be passed as a Python dictionary to the execute() method. Other paramstyles require parameters to be passed as a Python tuple or list.

# qmark
cursor.paramstyle = 'qmark'
sql = 'insert into foo(bar, jar) VALUES(?, ?)'
cursor.execute(sql, (1, "hello world"))

# numeric
cursor.paramstyle = 'numeric'
sql = 'insert into foo(bar, jar) VALUES(:1, :2)'
cursor.execute(sql, (1, "hello world"))

# named
cursor.paramstyle = 'named'
sql = 'insert into foo(bar, jar) VALUES(:p1, :p2)'
cursor.execute(sql, {"p1":1, "p2":"hello world"})

# format
cursor.paramstyle = 'format'
sql = 'insert into foo(bar, jar) VALUES(%s, %s)'
cursor.execute(sql, (1, "hello world"))

# pyformat
cursor.paramstyle = 'pyformat'
sql = 'insert into foo(bar, jar) VALUES(%(bar)s, %(jar)s)'
cursor.execute(sql, {"bar": 1, "jar": "hello world"})

Exception Handling

redshift_connector uses the guideline for exception handling specified in the Python DB-API. For exception definitions, please see redshift_connector/error.py

Example using IAM Credentials

IAM Credentials can be supplied directly to connect(...) using an AWS profile as shown below:

import redshift_connector

# Connects to Redshift cluster using IAM credentials from default profile defined in ~/.aws/credentials
conn = redshift_connector.connect(
    iam=True,
    database='dev',
    db_user='awsuser',
    password='',
    user='',
    cluster_identifier='examplecluster',
    profile='default'
 )
# ~/.aws/credentials
[default]
aws_access_key_id="my_aws_access_key_id"
aws_secret_access_key="my_aws_secret_access_key"
aws_session_token="my_aws_session_token"

# ~/.aws/config
[default]
region=us-west-2

If a region is not provided in ~/.aws/config or you would like to override its value, region may be passed to connect(...).

Alternatively, IAM credentials can be supplied directly to connect(...) using AWS credentials as shown below:

import redshift_connector

# Connects to Redshift cluster using IAM credentials from default profile defined in ~/.aws/credentials
conn = redshift_connector.connect(
    iam=True,
    database='dev',
    db_user='awsuser',
    password='',
    user='',
    cluster_identifier='examplecluster',
    access_key_id="my_aws_access_key_id",
    secret_access_key="my_aws_secret_access_key",
    session_token="my_aws_session_token",
    region="us-east-2"
 )

Integration with pandas

Retrieving query results as a pandas.DataFrame

import pandas
cursor.execute("create Temp table book(bookname varchar,author varchar)")
cursor.executemany("insert into book (bookname, author) values (%s, %s)",
                   [
                       ('One Hundred Years of Solitude', 'Gabriel García Márquez'),
                       ('A Brief History of Time', 'Stephen Hawking')

                   ])
cursor.execute("select * from book")
result: pandas.DataFrame = cursor.fetch_dataframe()
print(result)
>>                         bookname                 author
>> 0  One Hundred Years of Solitude  Gabriel García Márquez
>> 1        A Brief History of Time         Stephen Hawking

Insert data stored in a pandas.DataFrame into an Amazon Redshift table

import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array(
        [
            ["One Hundred Years of Solitude", "Gabriel García Márquez"],
            ["A Brief History of Time", "Stephen Hawking"],
        ]
    ),
    columns=["bookname", "author‎"],
)
with conn.cursor() as cursor:
    cursor.write_dataframe(df, "book")
    cursor.execute("select * from book; ")
    result = cursor.fetchall()

Integration with numpy

import numpy
cursor.execute("select * from book")

result: numpy.ndarray = cursor.fetch_numpy_array()
print(result)
>> [['One Hundred Years of Solitude' 'Gabriel García Márquez']
>>  ['A Brief History of Time' 'Stephen Hawking']]

Query using functions

cursor.execute("SELECT CURRENT_TIMESTAMP")
print(cursor.fetchone())
>> [datetime.datetime(2020, 10, 26, 23, 3, 54, 756497, tzinfo=datetime.timezone.utc)]

Connection Parameters

Name

Type

Description

Default Value

Required

access_key_id

str

The access key for the IAM role or IAM user configured for IAM database authentication

None

No

allow_db_user_override

bool

True specifies the driver uses the DbUser value from the SAML assertion while False indicates the value in the DbUser connection parameter is used

FALSE

No

app_name

str

The name of the IdP application used for authentication

None

No

auth_profile

str

The name of an Amazon Redshift Authentication profile having connection properties as JSON. See the RedshiftProperty class to learn how connection parameters should be named.

None

No

auto_create

bool

Indicates whether the user should be created if they do not exist

FALSE

No

client_id

str

The client id from Azure IdP

None

No

client_secret

str

The client secret from Azure IdP

None

No

cluster_identifier

str

The cluster identifier of the Amazon Redshift Cluster

None

No

credentials_provider

str

The IdP that will be used for authenticating with Amazon Redshift.

None

No

database

str

The name of the database to connect to

None

No

database_metadata_current_db_only

bool

Indicates if application supports multi-database datashare catalogs. Default value of True indicates application does not support multi-database datashare catalogs for backwards compatibility

TRUE

No

db_groups

list

A comma-separated list of existing database group names that the DbUser joins for the current session

None

No

db_user

str

The user ID to use with Amazon Redshift

None

No

endpoint_url

str

The Amazon Redshift endpoint url. This option is only used by AWS internal teams.

None

No

group_federation

bool

Use the IdP Groups in the Redshift. Default value False.

False

No

host

str

The hostname of Amazon Redshift cluster

None

No

iam

bool

If IAM Authentication is enabled

FALSE

No

iam_disable_cache

bool

This option specifies whether the IAM credentials are cached. By default the IAM credentials are cached. This improves performance when requests to the API gateway are throttled.

FALSE

No

identity_namespace

str

The identity namespace to be used for the IdC browser auth plugin and IdP token auth plugin. It is an optional value if there is only one IdC instance existing or if default identity namespace is set on the cluster - else it is required.

None

No

idp_response_timeout

int

The timeout for retrieving SAML assertion from IdP

120

No

idp_tenant

str

The IdP tenant

None

No

listen_port

int

The listen port IdP will send the SAML assertion to

7890

No

login_to_rp

str

Only for AdfsCredentialsProvider. Used to specify the loginToRp when performing IdpInitiatedSignOn as apart of form based authentication.

urn:amazon:webservices

No

login_url

str

The SSO Url for the IdP

None

No

max_prepared_statements

int

The maximum number of prepared statements that can be open at once

1000

No

numeric_to_float

bool

Specifies if NUMERIC datatype values will be converted from decimal.Decimal to float. By default NUMERIC values are received as decimal.Decimal. Enabling this option is not recommended for use cases which prefer the most precision as results may be rounded. Please reference the Python docs on decimal.Decimal to see the tradeoffs between decimal.Decimal and float before enabling this option.

False

No

partner_sp_id

str

The Partner SP Id used for authentication with Ping

None

No

password

str

The password to use for authentication

None

No

port

Int

The port number of the Amazon Redshift cluster

5439

No

preferred_role

str

The IAM role preferred for the current connection

None

No

principal_arn

str

The ARN of the IAM entity (user or role) for which you are generating a policy

None

No

profile

str

The name of a profile in a AWS credentials file that contains AWS credentials.

None

No

provider_name

str

The name of the Redshift Native Auth Provider.

None

No

region

str

The AWS region where the cluster is located

None

No

role_arn

str

The Amazon Resource Name (ARN) of the role that the caller is assuming. This parameter is used by JwtCredentialsProvider. For this provider, this is a mandatory parameter.

None

No

role_session_name

str

An identifier for the assumed role session. Typically, you pass the name or identifier that is associated with the user who is using your application. That way, the temporary security credentials that your application will use are associated with that user. This parameter is used by JwtCredentialsProvider. For this provider, this is an optional parameter.

jwt_redshift_session

No

scope

str

Scope for BrowserAzureOauth2CredentialsProvider authentication.

“”

No

secret_access_key_id

str

The secret access key for the IAM role or IAM user configured for IAM database authentication

None

No

serverless_acct_id

str

The account ID of the serverless. Default value None

None

No

serverless_work_group

str

The name of work group for serverless end point. Default value None.

None

No

session_token

str

The access key for the IAM role or IAM user configured for IAM database authentication. Not required unless temporary AWS credentials are being used.

None

No

ssl

bool

If SSL is enabled

TRUE

No

ssl_insecure

bool

Specifies if IDP hosts server certificate will be verified

TRUE

No

sslmode

str

The security of the connection to Amazon Redshift. verify-ca and verify-full are supported.

verify_ca

No

timeout

int

The number of seconds before the connection to the server will timeout.

None

No

token

str

The access token required for the IdP token auth plugin.

None

No

token_type

str

The token type required for the IdP token auth plugin.

ACCESS_TOKEN

No

user

str

The username to use for authentication

None

No

web_identity_token

str

The OAuth 2.0 access token or OpenID Connect ID token that is provided by the identity provider. Your application must get this token by authenticating the user who is using your application with a web identity provider. This parameter is used by JwtCredentialsProvider. For this provider, this is a mandatory parameter.

None

No

Supported Datatypes

redshift_connector supports the following Amazon Redshift datatypes. redshift_connector will attempt to treat unsupported datatypes as strings. Incoming data from Amazon Redshift is treated as follows:

Amazon Redshift Datatype

Python Datatype

ACLITEM

str

BOOLEAN

bool

INT8

int

INT4

int

INT2

int

VARCHAR

str

OID

int

REGPROC

int

XID

int

FLOAT4

float

FLOAT8

float

TEXT

str

CHAR

str

DATE

datetime.date

TIME

datetime.time

TIMETZ

datetime.time

TIMESTAMP

datetime.datetime

TIMESTAMPTZ

datetime.datetime

NUMERIC

decimal.Decimal

GEOMETRY

str

SUPER

str

VARBYTE

bytes

GEOGRAPHY

str

Logging

redshift_connector uses logging for providing detailed error messages regarding IdP authentication. A do-nothing handler is enabled by default as to prevent logs from being output to sys.stderr.

Enable logging in your application to view logs output by redshift_connector as described in the documentation for Python logging module.

Client Transfer Protocol

redshift_connector requests the Amazon Redshift server use the highest transfer protocol version supported. As of v2.0.879 binary transfer protocol is requested by default. If necessary, the requested transfer protocol can be modified via the client_protocol_version parameter of redshift_connector.connect(...). Please see the Connection Parameters table for more details.

Getting Help

Contributing

We look forward to collaborating with you! Please read through CONTRIBUTING before submitting any issues or pull requests.

Changelog Generation

An entry in the changelog is generated upon release using gitchangelog. Please use the configuration file, .gitchangelog.rc when generating the changelog.

Running Tests

You can run tests by using pytest test/unit. This will run all unit tests. Integration tests require providing credentials for an Amazon Redshift cluster as well as IdP attributes in test/config.ini.

Additional Resources

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

redshift_connector-2.1.0-py3-none-any.whl (125.0 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page