# impyla
Python client for the Impala distributed query engine.
### Features
Fully supported:
* Lightweight, `pip`-installable package for connecting to Impala databases
* Fully [DB API 2.0 (PEP 249)][pep249]-compliant Python client (similar to
sqlite or MySQL clients)
* Converter to [pandas][pandas] `DataFrame`, allowing easy integration into the
Python data stack (including [scikit-learn][sklearn] and
[matplotlib][matplotlib])
Alpha-quality:
* Wrapper for [MADlib][madlib]-style prediction, allowing for large-scale,
distributed machine learning (see [the Impala port of MADlib][madlibport])
* Compiling UDFs written in Python into low-level machine code for execution by
Impala (see the [`udf`](https://github.com/cloudera/impyla/tree/udf) branch;
powered by [Numba][numba]/[LLVM][llvm])
### Dependencies
Required:
* `python2.6` or `python2.7`
* `thrift>=0.8` (Python package only; no need for code-gen)
Optional:
* `pandas`, needed only for the `as_pandas()` function in `impala.util`
This project is installed with `setuptools>=2`.
### Installation
Install the latest release (`0.8.0`) with `pip`:
```bash
pip install impyla
```
For the latest (dev) version, clone the repo:
```bash
git clone https://github.com/cloudera/impyla.git
cd impyla
python setup.py install
```
### Quickstart
Impyla implements the [Python DB API v2.0 (PEP 249)][pep249] database interface
(refer to it for API details):
```python
from impala.dbapi import connect
conn = connect(host='my.host.com', port=21050)
cursor = conn.cursor()
cursor.execute('SELECT * FROM mytable LIMIT 100')
print(cursor.description)  # prints the result set's schema
results = cursor.fetchall()
```
**Note**: the port number must be that of the *HiveServer2* service
(defaults to 21050 in CM), not the Beeswax service (defaults to 21000), which
is what the Impala shell uses.
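Because the interface follows PEP 249, the same calling pattern can be tried without a cluster. The sketch below uses the standard-library `sqlite3` module as a stand-in for `impala.dbapi`; the `mytable` schema and its rows are hypothetical and exist only so the query has something to return. It also shows how column names might be read out of `cursor.description`.

```python
import sqlite3

# sqlite3 stands in for impala.dbapi here; both expose the PEP 249 interface.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

# Hypothetical table and data, created only so the query has something to read.
cursor.execute("CREATE TABLE mytable (id INTEGER, name TEXT)")
cursor.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c")],
)

cursor.execute("SELECT * FROM mytable LIMIT 100")

# PEP 249: description holds one 7-item sequence per column; item 0 is the name.
column_names = [col[0] for col in cursor.description]
results = cursor.fetchall()

print(column_names)
print(results)
conn.close()
```

With impyla, only the `connect(...)` call should need to change; the table setup would normally already exist on the Impala side.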
The `Cursor` object also supports the iterator interface, which is buffered
(controlled by `cursor.arraysize`):
```python
cursor.execute('SELECT * FROM mytable LIMIT 100')
for row in cursor:
    process(row)
```
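To illustrate what buffered iteration means in PEP 249 terms, here is a minimal sketch of a generator that pulls rows in batches of `cursor.arraysize` and yields them one at a time. It is not impyla's internal implementation, which may differ; `iter_rows` is a hypothetical helper, and `sqlite3` is again used as a stand-in so the example runs without a cluster.

```python
import sqlite3


def iter_rows(cursor):
    """Yield rows one at a time, fetching them in batches of cursor.arraysize."""
    while True:
        batch = cursor.fetchmany(cursor.arraysize)
        if not batch:
            return
        for row in batch:
            yield row


conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

# Hypothetical table with ten rows.
cursor.execute("CREATE TABLE mytable (n INTEGER)")
cursor.executemany("INSERT INTO mytable VALUES (?)", [(i,) for i in range(10)])

cursor.arraysize = 4  # rows requested per fetchmany() call
cursor.execute("SELECT n FROM mytable ORDER BY n")
rows = list(iter_rows(cursor))

print(len(rows))
conn.close()
```

A larger `arraysize` generally means fewer round trips at the cost of more memory per batch.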
You can also get the results back as a pandas `DataFrame` object:
```python
from impala.util import as_pandas
df = as_pandas(cursor)
# carry df through scikit-learn, for example
```
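The idea behind such a conversion can be sketched with the PEP 249 interface alone: take the column names from `cursor.description` and pair them with the fetched rows. The helper below, `cursor_to_columns`, is hypothetical and is not impyla's `as_pandas`; it builds a dict of equal-length lists, which is one of the inputs the `pandas.DataFrame` constructor accepts. `sqlite3` and the `scores` table are stand-ins so the example runs without Impala or pandas.

```python
import sqlite3


def cursor_to_columns(cursor):
    """Return {column_name: [values, ...]} for the cursor's pending result set."""
    names = [col[0] for col in cursor.description]
    rows = cursor.fetchall()
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

# Hypothetical table and data.
cursor.execute("CREATE TABLE scores (id INTEGER, score REAL)")
cursor.executemany("INSERT INTO scores VALUES (?, ?)", [(1, 0.5), (2, 0.75)])

cursor.execute("SELECT id, score FROM scores ORDER BY id")
columns = cursor_to_columns(cursor)

print(columns)
# If pandas is installed, pandas.DataFrame(columns) would build a frame from this dict.
conn.close()
```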
[pep249]: http://legacy.python.org/dev/peps/pep-0249/
[pandas]: http://pandas.pydata.org/
[sklearn]: http://scikit-learn.org/
[matplotlib]: http://matplotlib.org/
[madlib]: http://madlib.net/
[madlibport]: https://github.com/bitfort/madlibport
[numba]: http://numba.pydata.org/
[llvm]: http://llvm.org/