# impyla

Python client for the Impala distributed query engine.


### Features

Fully supported:

* Lightweight, `pip`-installable package for connecting to Impala databases

* Fully [DB API 2.0 (PEP 249)][pep249]-compliant Python client (similar to
sqlite or MySQL clients)

* Converter to [pandas][pandas] `DataFrame`, allowing easy integration into the
Python data stack (including [scikit-learn][sklearn] and
[matplotlib][matplotlib])

Alpha-quality:

* Wrapper for [MADlib][madlib]-style prediction, allowing for large-scale,
distributed machine learning (see [the Impala port of MADlib][madlibport])

* Compiling UDFs written in Python into low-level machine code for execution by
Impala (see the [`udf`](https://github.com/cloudera/impyla/tree/udf) branch;
powered by [Numba][numba]/[LLVM][llvm])


### Dependencies

Required:

* `python2.6` or `python2.7`

* `thrift>=0.8` (Python package only; no need for code-gen)

Optional:

* `pandas`, needed only for the `as_pandas()` conversion function

This project is installed with `setuptools>=2`.

### Installation

Install the latest release (`0.8.1`) with `pip`:

```bash
pip install impyla
```

For the latest (dev) version, clone the repo:

```bash
git clone https://github.com/cloudera/impyla.git
cd impyla
python setup.py install
```


### Quickstart

Impyla implements the [Python DB API v2.0 (PEP 249)][pep249] database interface
(refer to it for API details):

```python
from impala.dbapi import connect

# Connect to the HiveServer2 port of an Impala daemon
conn = connect(host='my.host.com', port=21050)
cursor = conn.cursor()
cursor.execute('SELECT * FROM mytable LIMIT 100')
print(cursor.description)  # prints the result set's schema
results = cursor.fetchall()
```

**Note**: the port must be that of the *HiveServer2* service (21050 by default
in Cloudera Manager), not the Beeswax service (21000 by default) that the
Impala shell uses.
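
Because the interface is plain PEP 249, the same query pattern can be wrapped in
a small helper that works with any compliant connection. The sketch below is
illustrative only: `run_query` is a hypothetical helper, not part of impyla, and
the demo uses the standard-library `sqlite3` driver as a stand-in so it runs
without an Impala cluster. With impyla you would pass the object returned by
`impala.dbapi.connect` instead.

```python
import sqlite3  # stand-in PEP 249 driver so the sketch runs without Impala


def run_query(conn, sql):
    """Run `sql` on any PEP 249 connection; return (column_names, rows)."""
    cursor = conn.cursor()
    try:
        cursor.execute(sql)
        # Per PEP 249, each entry of cursor.description is a 7-item sequence
        # whose first item is the column name.
        columns = [col[0] for col in cursor.description]
        rows = cursor.fetchall()
    finally:
        cursor.close()
    return columns, rows


# Demo against an in-memory SQLite database (hypothetical table `mytable`).
demo_conn = sqlite3.connect(':memory:')
demo_conn.execute('CREATE TABLE mytable (id INTEGER, name TEXT)')
demo_conn.executemany('INSERT INTO mytable VALUES (?, ?)', [(1, 'a'), (2, 'b')])
columns, rows = run_query(demo_conn, 'SELECT * FROM mytable ORDER BY id LIMIT 100')
demo_conn.close()
```

Closing the cursor in a `finally` clause releases it even if the query raises.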

The `Cursor` object also supports the iterator interface, which is buffered
(controlled by `cursor.arraysize`):

```python
cursor.execute('SELECT * FROM mytable LIMIT 100')
for row in cursor:
    process(row)  # process() is a placeholder for your own row handler
```
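
To make the idea of buffered iteration concrete, here is a minimal sketch of how
rows can be yielded one at a time while being fetched in batches of
`cursor.arraysize`. It illustrates the general PEP 249 pattern and is not
impyla's actual implementation; `iter_rows` is a hypothetical helper, and
`sqlite3` again stands in for an Impala connection.

```python
import sqlite3  # stand-in PEP 249 driver; not impyla itself


def iter_rows(cursor, batch_size=None):
    """Yield rows one by one while fetching them from the driver in batches."""
    if batch_size is None:
        # PEP 249: arraysize is the default number of rows for fetchmany()
        batch_size = cursor.arraysize
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break  # an empty batch means the result set is exhausted
        for row in batch:
            yield row


demo_conn = sqlite3.connect(':memory:')
demo_conn.execute('CREATE TABLE mytable (n INTEGER)')
demo_conn.executemany('INSERT INTO mytable VALUES (?)', [(i,) for i in range(5)])
demo_cursor = demo_conn.cursor()
demo_cursor.execute('SELECT n FROM mytable ORDER BY n')
values = [row[0] for row in iter_rows(demo_cursor, batch_size=2)]
demo_conn.close()
```

A larger batch size means fewer round trips to the server at the cost of holding
more rows in memory at once.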

You can also get the results back as a pandas `DataFrame`:

```python
from impala.util import as_pandas

# `cursor` must already have executed a query (see above)
df = as_pandas(cursor)
# carry df through scikit-learn, for example
```
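
Conceptually, a conversion like this pairs the column names from
`cursor.description` with the fetched rows. The sketch below shows that pairing
without requiring pandas; it is an assumption about the general approach, not
impyla's code. `cursor_to_columns` is a hypothetical helper, and `sqlite3` is
used as a stand-in driver.

```python
import sqlite3  # stand-in PEP 249 driver; not impyla itself
from collections import OrderedDict


def cursor_to_columns(cursor):
    """Return an ordered mapping of column name -> list of values."""
    names = [col[0] for col in cursor.description]
    columns = OrderedDict((name, []) for name in names)
    for row in cursor.fetchall():
        for name, value in zip(names, row):
            columns[name].append(value)
    return columns


demo_conn = sqlite3.connect(':memory:')
demo_conn.execute('CREATE TABLE mytable (x INTEGER, y TEXT)')
demo_conn.executemany('INSERT INTO mytable VALUES (?, ?)', [(1, 'a'), (2, 'b')])
demo_cursor = demo_conn.cursor()
demo_cursor.execute('SELECT x, y FROM mytable ORDER BY x')
table = cursor_to_columns(demo_cursor)
demo_conn.close()
```

If pandas is installed, passing such a mapping to `pandas.DataFrame(table)`
yields a frame with one column per key.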


[pep249]: http://legacy.python.org/dev/peps/pep-0249/
[pandas]: http://pandas.pydata.org/
[sklearn]: http://scikit-learn.org/
[matplotlib]: http://matplotlib.org/
[madlib]: http://madlib.net/
[madlibport]: https://github.com/bitfort/madlibport
[numba]: http://numba.pydata.org/
[llvm]: http://llvm.org/
