A class to handle and process multiple files with identical structures within a directory.
data-harvest-reader
Features
- Reading Various File Formats
- Directory and ZIP File Handling
- Data Joining
- Deduplication
- Custom Filters
- Logging
Installation Requirements
pip install polars loguru
Usage
Initialization
from data_harvest_reader import DataReader
data_reader = DataReader(log_to_file=True, log_file="data_reader.log")
Reading Data
From Directory
data = data_reader.read_data('path/to/directory', join_similar=True)
From ZIP File
data = data_reader.read_data('path/to/zipfile.zip', join_similar=False)
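To make the directory, ZIP, and join_similar behavior concrete, here is a minimal standard-library sketch of the idea. It is not DataReader's implementation: the helper names read_csv_tables and join_similar_tables are hypothetical, only CSV files are handled, and "similar" is assumed to mean "same column names in the same order".

```python
import csv
import io
import zipfile
from pathlib import Path


def read_csv_tables(source):
    """Return {file stem: list of row dicts} for every CSV in a directory or ZIP."""
    source = Path(source)
    tables = {}
    if source.is_dir():
        for path in sorted(source.glob("*.csv")):
            with path.open(newline="", encoding="utf-8") as handle:
                tables[path.stem] = list(csv.DictReader(handle))
    elif zipfile.is_zipfile(source):
        with zipfile.ZipFile(source) as archive:
            for name in sorted(archive.namelist()):
                if name.lower().endswith(".csv"):
                    text = archive.read(name).decode("utf-8")
                    tables[Path(name).stem] = list(csv.DictReader(io.StringIO(text)))
    else:
        raise ValueError(f"not a directory or ZIP archive: {source}")
    return tables


def join_similar_tables(tables):
    """Concatenate tables whose column names match exactly (assumed meaning of 'similar')."""
    groups = {}
    for name, rows in tables.items():
        # Tables are grouped by their header; empty tables all share the key ().
        columns = tuple(rows[0].keys()) if rows else ()
        groups.setdefault(columns, []).append((name, rows))
    joined = {}
    for members in groups.values():
        # The combined key is just the member names joined with "+".
        key = "+".join(name for name, _ in members)
        joined[key] = [row for _, rows in members for row in rows]
    return joined
```

With this sketch, two files a.csv and b.csv that share a header end up under a single key "a+b", while a file with a different header keeps its own key. How DataReader names joined results is not specified here.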
Applying Deduplication
duplicated_subset_dict = {'file1': ['column1', 'column2']}
data = data_reader.read_data('path/to/source', duplicated_subset_dict=duplicated_subset_dict)
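The duplicated_subset_dict argument maps a file name to the columns that define a duplicate. A rough sketch of that idea in plain Python follows. The helper names are hypothetical, and keeping the first occurrence of each duplicate is an assumption, since the behavior DataReader uses is not stated here.

```python
def drop_duplicates(rows, subset):
    """Keep the first row for each distinct combination of the subset columns."""
    seen = set()
    unique_rows = []
    for row in rows:
        key = tuple(row[column] for column in subset)
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows


def apply_deduplication(tables, duplicated_subset_dict):
    """Deduplicate only the tables named in the mapping; leave the rest untouched."""
    result = {}
    for name, rows in tables.items():
        subset = duplicated_subset_dict.get(name)
        result[name] = drop_duplicates(rows, subset) if subset else rows
    return result
```

Rows that differ only in columns outside the subset count as duplicates, so only the first of them survives.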
Applying Filters
filter_subset = {
    'file1': [{'column': 'Col1', 'operation': '>', 'values': 100},
              {'column': 'Col2', 'operation': '==', 'values': 'Value'}]
}
data = data_reader.read_data('path/to/source', filter_subset=filter_subset)
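One way to read a filter specification like the one above is as a list of comparisons that a row must all satisfy. The sketch below illustrates that reading with the standard operator module. It rests on assumptions: the set of supported operations, the AND semantics between filters, and the names apply_filters and InvalidFilterSpec are all hypothetical and are not taken from DataReader.

```python
import operator

# Assumed set of operations; only '>' and '==' appear in the examples above.
OPERATIONS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}

REQUIRED_KEYS = {"column", "operation", "values"}


class InvalidFilterSpec(ValueError):
    """Local stand-in for a filter configuration error."""


def apply_filters(rows, specs):
    """Keep the rows that satisfy every filter spec (filters are ANDed)."""
    kept = list(rows)
    for spec in specs:
        missing = REQUIRED_KEYS - set(spec)
        if missing:
            raise InvalidFilterSpec(f"missing keys: {sorted(missing)}")
        if spec["operation"] not in OPERATIONS:
            raise InvalidFilterSpec(f"unknown operation: {spec['operation']!r}")
        compare = OPERATIONS[spec["operation"]]
        kept = [row for row in kept if compare(row[spec["column"]], spec["values"])]
    return kept
```

Validating each spec before applying it means a typo in 'operation' fails loudly instead of silently returning every row.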
Handling Exceptions
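# UnsupportedFormatError and FilterConfigurationError must be imported first;
# their module path is not shown in this README.
try:
    data = data_reader.read_data('path/to/source')
except UnsupportedFormatError:
    print("Unsupported file format provided")
except FilterConfigurationError:
    print("Error in filter configuration")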
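When a source mixes readable and unreadable files, it can help to check formats up front rather than fail on the first bad file. The sketch below shows that pattern with a local exception class. The extension set, the names check_supported and partition_by_support, and the UnsupportedSuffixError class are illustrative assumptions; the formats DataReader actually accepts are not listed in this README.

```python
from pathlib import Path

# Hypothetical set of extensions, chosen only for illustration.
SUPPORTED_SUFFIXES = {".csv", ".json", ".parquet"}


class UnsupportedSuffixError(ValueError):
    """Local stand-in for an unsupported-format error."""


def check_supported(path):
    """Return the lowercased suffix, or raise if it is not in the supported set."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise UnsupportedSuffixError(f"unsupported file format: {suffix or '<none>'}")
    return suffix


def partition_by_support(paths):
    """Split paths into (readable, rejected) instead of stopping at the first bad one."""
    readable, rejected = [], []
    for path in paths:
        try:
            check_supported(path)
        except UnsupportedSuffixError:
            rejected.append(path)
        else:
            readable.append(path)
    return readable, rejected
```

The rejected list can then be logged or reported to the user before the readable files are processed.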
Example
data_reader = DataReader()
data = data_reader.read_data(r'C:\path\to\data', join_similar=True,
                             filter_subset={'example_file': [{'column': 'Age', 'operation': '>', 'values': 30}]})
Contributing to DataReader
Getting Started
- Fork the Repository: Start by forking the main repository. This creates your own copy of the project where you can make changes.
- Clone the Forked Repository: Clone your fork to your local machine. This step allows you to work on the codebase directly.
- Set Up the Development Environment: Ensure you have all necessary dependencies installed. It's recommended to use a virtual environment.
- Create a New Branch: Always create a new branch for your changes. This keeps the main branch stable and makes reviewing changes easier.
Making Contributions
- Make Your Changes: Implement your feature, fix a bug, or make your proposed changes. Ensure your code adheres to the project's coding standards and guidelines.
- Test Your Changes: Before submitting, test your changes thoroughly. Write unit tests if applicable, and ensure all existing tests pass.
- Document Your Changes: Update the documentation to reflect your changes. If you're adding a new feature, include usage examples.
- Commit Your Changes: Make concise and clear commit messages, describing what each commit does.
- Push to Your Fork: Push your changes to your fork on GitHub.
- Create a Pull Request (PR): Go to the original DataReader repository and create a pull request from your fork. Ensure you describe your changes in detail and link any relevant issues.
Review Process
After submitting your PR, the maintainers will review your changes. Be responsive to feedback:
- Respond to Comments: If the reviewers ask for changes, make them promptly. Discuss any suggestions or concerns.
- Update Your PR: If needed, update your PR based on feedback. This may involve adding more tests or tweaking your approach.
Final Steps
Once your PR is approved:
- Merge: The maintainers will merge your changes into the main codebase.
- Stay Engaged: Continue to stay involved in the project. Look out for feedback from users on your new feature or fix.
Conclusion
Contributing to DataReader is a rewarding experience that benefits the entire user community. Your contributions help make DataReader a more robust and versatile tool. We welcome developers of all skill levels and appreciate every form of contribution, from code to documentation. Thank you for considering contributing to DataReader!