Speech Utils
su - Speech Utils
A comprehensive toolkit for speech recognition, text-to-speech generation, and audio processing with simple, intuitive interfaces.
Installation
pip install su
Quick Start
Speech Recognition
import su
# Quick recognition from microphone
text = su.recognize()
print(f"You said: {text}")
# Custom timeout and engine
text = su.recognize(timeout=10, engine='sphinx')
# Transcribe from various audio sources
text = su.transcribe("recording.wav") # File path
print(f"Audio contains: {text}")
# Transcribe from bytes
with open("audio.wav", "rb") as f:
    audio_bytes = f.read()
text = su.transcribe(audio_bytes)
# Transcribe from file-like object
from io import BytesIO
audio_stream = BytesIO(audio_bytes)
text = su.transcribe(audio_stream)
# Transcribe from live microphone using transcribe
text = su.transcribe({'type': 'microphone', 'timeout': 5})
# Use offline engine for transcription
text = su.transcribe("recording.wav", engine='sphinx')
# Advanced usage
recognizer = su.SpeechRecognizer(engine='google')
text = recognizer.listen_and_recognize(timeout=10)
Text-to-Speech
import su
# Quick speech
su.speak("Hello, world!")
# Custom voice settings
su.speak("Slow and quiet", rate=100, volume=0.5)
# Read text from file (path starting with /)
su.speak("/path/to/speech.txt")
# Save to file without hearing
su.speak("Save this", egress="output.wav", send_to_speakers=False)
# Get audio bytes for custom use
audio_bytes = su.speak("Test", egress=lambda x: x, send_to_speakers=False)
# Both save and hear
su.speak("Hello", egress="greeting.wav", send_to_speakers=True)
# Advanced usage
tts = su.TextToSpeech(rate=150, volume=0.8)
tts.speak("This is a test", save_to="output.wav")
# List available voices
voices = tts.get_voices()
for voice in voices:
    print(f"Voice: {voice['name']} ({voice['lang']})")
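The `egress` argument used above takes three shapes: `None`, a file path, or a callable. The sketch below shows one way such dispatch could be written. `deliver_audio` is a hypothetical helper made up for illustration; it is not part of `su`, and the library's actual behavior may differ.

```python
from pathlib import Path
from typing import Callable, Optional, Union

Egress = Optional[Union[str, Path, Callable[[bytes], object]]]

def deliver_audio(audio: bytes, egress: Egress = None):
    """Hypothetical helper: route synthesized audio bytes according to egress."""
    if egress is None:
        return None                      # no extra output requested
    if callable(egress):
        return egress(audio)             # hand the bytes to the caller's function
    path = Path(egress)                  # otherwise treat egress as a destination path
    path.write_bytes(audio)
    return path

# A callable egress returns whatever the callable returns.
print(deliver_audio(b"\x00\x01", egress=len))
```

Under this reading, `egress=lambda x: x` is simply the identity callable, which is why it yields the raw audio bytes.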
Partial Application for Custom Functions
import su
from functools import partial
# Create custom recognizer functions
fast_recognize = partial(su.recognize, timeout=2, engine='google')
offline_recognize = partial(su.recognize, engine='sphinx')
# Create custom speech functions
robot_voice = partial(su.speak, rate=300, volume=1.0)
quiet_voice = partial(su.speak, rate=150, volume=0.3)
# Create custom transcription functions
offline_transcribe = partial(su.transcribe, engine='sphinx')
google_transcribe = partial(su.transcribe, engine='google')
# Use them
text = fast_recognize() # Quick 2-second recognition
robot_voice("I am a robot") # Fast, loud speech
text = offline_transcribe("audio.wav") # Offline transcription
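A preset created with `functools.partial` is only a default: keywords passed at call time still override it. The sketch below demonstrates this with `speak_stub`, a hypothetical stand-in that mimics the keyword names of `su.speak` so the example runs without audio hardware.

```python
from functools import partial

def speak_stub(text, rate=200, volume=0.9):
    """Stand-in for su.speak (hypothetical): reports the settings it received."""
    return f"{text} [rate={rate}, volume={volume}]"

quiet_voice = partial(speak_stub, rate=150, volume=0.3)

preset = quiet_voice("Hello")                # uses the preset rate and volume
louder = quiet_voice("Hello", volume=0.6)    # call-time keyword overrides the preset
print(preset)
print(louder)
```

The preset values remain inspectable through `quiet_voice.keywords`, which can help when debugging a profile.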
Audio Processing
import su
# Load and analyze audio
audio, sample_rate = su.AudioProcessor.load_audio("speech.wav")
features = su.AudioProcessor.extract_features(audio, sample_rate)
print(f"MFCC shape: {features['mfcc'].shape}")
print(f"Tempo: {features['tempo']} BPM")
# Convert audio formats
su.AudioProcessor.convert_format("input.mp3", "output.wav")
Features
🎤 Speech Recognition
- Multiple Engines: Google, Sphinx, Wit.ai, Azure, Houndify
- Live Recognition: Real-time microphone input
- File Transcription: Support for various audio formats
- Noise Handling: Automatic ambient noise adjustment
🔊 Text-to-Speech
- Cross-Platform: Works on Windows, macOS, Linux
- Voice Control: Rate, volume, and voice selection
- File Export: Save speech to audio files
- Multiple Voices: Access to system voices
🎵 Audio Processing
- Format Conversion: MP3, WAV, FLAC, and more
- Feature Extraction: MFCC, spectral features, tempo
- ML Ready: Features suitable for machine learning
- Librosa Integration: Advanced audio analysis
API Reference
Convenience Functions
# Speech recognition with customizable settings
text = su.recognize(timeout=5, engine='google')
# Flexible text-to-speech with multiple input/output options
result = su.speak(text_src, rate=200, volume=0.9, egress=None, send_to_speakers=True)
# Where text_src can be:
# - "Hello world" (direct text)
# - "/path/to/file.txt" (file path - must start with / or drive letter)
# - Path("file.txt") (Path object)
# - StringIO("text") (file-like object)
# - text_iterator() (iterator yielding text chunks)
# Where egress can be:
# - None (default - no special output)
# - "output.wav" (save to file path)
# - lambda x: x (return audio bytes)
# - custom_function (process audio bytes)
# Flexible audio transcription with multiple source types
text = su.transcribe(audio_src, engine='google')
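The `text_src` rules listed above (plain text, a path starting with `/` or a drive letter, a `Path`, a file-like object, or an iterator of chunks) can be pictured as a small resolver. This is a sketch under those stated rules only; `resolve_text` is a hypothetical name and not the library's own code.

```python
import re
from io import StringIO
from pathlib import Path

def resolve_text(text_src) -> str:
    """Hypothetical helper: turn any supported text source into a plain string."""
    if isinstance(text_src, Path):
        return text_src.read_text()
    if isinstance(text_src, str):
        # Treat a string as a file path only if it starts with "/" or a drive letter.
        if text_src.startswith("/") or re.match(r"^[A-Za-z]:[\\/]", text_src):
            return Path(text_src).read_text()
        return text_src                  # otherwise it is the text itself
    if hasattr(text_src, "read"):
        return text_src.read()           # file-like object
    return "".join(text_src)             # iterator yielding text chunks

print(resolve_text("Hello world"))
print(resolve_text(StringIO("from a stream")))
print(resolve_text(iter(["chunk one, ", "chunk two"])))
```

One consequence of the prefix rule in this sketch: a relative path given as a plain string is treated as text to speak, so wrapping it in `Path(...)` is the unambiguous choice.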
SpeechRecognizer
recognizer = su.SpeechRecognizer(engine='google')
# Listen from microphone
text = recognizer.listen_and_recognize(timeout=5)
# Transcribe file
text = recognizer.recognize_file("audio.wav")
TextToSpeech
tts = su.TextToSpeech(rate=200, volume=0.9)
# Speak text
tts.speak("Hello world")
# Save to file
tts.speak("Save this", save_to="output.wav")
# Change voice
voices = tts.get_voices()
tts.set_voice(voices[0]['id'])
AudioProcessor
# Load audio
audio, sr = su.AudioProcessor.load_audio("file.wav")
# Extract ML features
features = su.AudioProcessor.extract_features(audio, sr)
# Convert format
su.AudioProcessor.convert_format("input.mp3", "output.wav")
Dependencies
- speech_recognition: Speech recognition functionality
- pyttsx3: Text-to-speech conversion
- librosa: Audio analysis and feature extraction
- pydub: Audio format conversion
- numpy: Numerical operations
- pyaudio: Audio I/O operations
System Requirements
For Speech Recognition:
- Windows: No additional requirements
- macOS: No additional requirements
- Linux: sudo apt-get install flac (for FLAC support)
For Audio Processing:
- FFmpeg (for format conversion): Download from https://ffmpeg.org/
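Before running format conversion or FLAC-based recognition, it can help to confirm that the external tools are actually on `PATH`. The sketch below uses only the standard library; `find_tools` is a hypothetical helper name, not something `su` provides.

```python
import shutil

def find_tools(names=("ffmpeg", "flac")):
    """Map each tool name to its location on PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in names}

for tool, location in find_tools().items():
    print(f"{tool}: {location or 'not found'}")
```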
Examples
Voice Assistant with Custom Settings
import su
from functools import partial
# Create optimized functions for the assistant
quick_listen = partial(su.recognize, timeout=3, engine='google')
assistant_voice = partial(su.speak, rate=180, volume=0.8)
while True:
    print("Listening...")
    text = quick_listen()
    if text:
        print(f"You said: {text}")
        response = f"You said: {text}"
        assistant_voice(response)
    if text and "goodbye" in text.lower():
        assistant_voice("Goodbye!")
        break
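The loop above is hard to test because it depends on a microphone and speakers. One possible restructuring, sketched here, passes the listen and speak functions in as arguments so scripted stand-ins can replace them. `run_assistant` and its `max_turns` cap are assumptions added for illustration.

```python
from typing import Callable, List, Optional

def run_assistant(listen: Callable[[], Optional[str]],
                  speak: Callable[[str], None],
                  exit_word: str = "goodbye",
                  max_turns: int = 100) -> int:
    """Echo what is heard until exit_word appears; return the number of turns taken."""
    for turn in range(1, max_turns + 1):
        text = listen()
        if not text:
            continue                     # nothing recognized this turn
        speak(f"You said: {text}")
        if exit_word in text.lower():
            speak("Goodbye!")
            return turn
    return max_turns

# Scripted stand-ins replace the microphone and speakers.
heard = iter(["hello there", None, "okay goodbye"])
spoken: List[str] = []
turns = run_assistant(lambda: next(heard), spoken.append)
print(turns, spoken)
```

With the real library, the same function would presumably be called with the `quick_listen` and `assistant_voice` partials defined above.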
Audio Analysis Pipeline
import su
import numpy as np
# Load audio file
audio, sr = su.AudioProcessor.load_audio("speech.wav")
# Extract features for ML
features = su.AudioProcessor.extract_features(audio, sr)
# Use MFCC features (common for speech recognition)
mfcc_features = features['mfcc']
mfcc_mean = np.mean(mfcc_features, axis=1)
print(f"MFCC feature vector shape: {mfcc_mean.shape}")
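To make the `axis=1` reduction concrete, the sketch below repeats it on a tiny hand-written matrix using only the standard library. It assumes the MFCC matrix is laid out as coefficients by frames, which is what averaging over `axis=1` in the example implies; the numbers are synthetic, not real audio features.

```python
from statistics import fmean, pstdev

# Tiny synthetic stand-in for features['mfcc']: 3 coefficients x 4 frames.
mfcc = [
    [1.0, 2.0, 3.0, 4.0],
    [0.0, 0.0, 0.0, 0.0],
    [-2.0, 2.0, -2.0, 2.0],
]

# Averaging along the frame axis leaves one value per coefficient.
means = [fmean(row) for row in mfcc]
# Population standard deviation per coefficient (same convention as np.std's default).
stds = [pstdev(row) for row in mfcc]

# Concatenating both gives a fixed-length vector regardless of clip duration.
feature_vector = means + stds
print(len(feature_vector))
```

Adding the per-coefficient spread alongside the mean is a common way to keep some information about variation over time that the mean alone discards.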
Batch Processing with Different Engines
import su
from functools import partial
from pathlib import Path
# Create specialized transcription functions
google_transcribe = partial(su.transcribe, engine='google') # For online processing
sphinx_transcribe = partial(su.transcribe, engine='sphinx') # For offline processing
input_dir = Path("audio_files")
output_dir = Path("transcriptions")
output_dir.mkdir(exist_ok=True)
for audio_file in input_dir.glob("*.wav"):
    print(f"Processing {audio_file.name}...")
    # Try Google first (better accuracy), fallback to Sphinx
    text = google_transcribe(audio_file) or sphinx_transcribe(audio_file)
    # Save transcription
    output_file = output_dir / f"{audio_file.stem}.txt"
    with open(output_file, "w") as f:
        f.write(text or "Transcription failed")
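The `a(...) or b(...)` fallback above works only if a failed engine returns an empty or `None` result. Whether `su.transcribe` returns a falsy value or raises on failure is not stated here, so the sketch below handles both cases. `first_transcript` and the two stand-in engines are hypothetical.

```python
from typing import Callable, Iterable, Optional

def first_transcript(source,
                     engines: Iterable[Callable[[object], Optional[str]]]) -> Optional[str]:
    """Try each engine in order; return the first non-empty result, else None."""
    for engine in engines:
        try:
            text = engine(source)
        except Exception as exc:         # e.g. a network error from an online engine
            print(f"{getattr(engine, '__name__', engine)} failed: {exc}")
            continue
        if text:
            return text
    return None

# Stand-in engines (hypothetical) so the sketch runs without audio or network access.
def online_engine(source):
    raise ConnectionError("no network")

def offline_engine(source):
    return f"transcript of {source}"

print(first_transcript("clip.wav", [online_engine, offline_engine]))
```

In the batch loop, the engine list would presumably be `[google_transcribe, sphinx_transcribe]`.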
Voice Profile System
import su
from functools import partial
# Define different voice profiles
profiles = {
    'assistant': partial(su.speak, rate=180, volume=0.8),
    'narrator': partial(su.speak, rate=150, volume=0.7),
    'robot': partial(su.speak, rate=250, volume=1.0),
    'whisper': partial(su.speak, rate=120, volume=0.3),
}
# Use different voices for different purposes
profiles['assistant']("How can I help you today?")
profiles['narrator']("Once upon a time, in a land far away...")
profiles['robot']("SYSTEM INITIALIZED. READY FOR COMMANDS.")
profiles['whisper']("This is a secret message.")
# Save different voice outputs
for name, voice_func in profiles.items():
    # su.speak takes egress= for file output (save_to= belongs to TextToSpeech.speak)
    voice_func(f"This is the {name} voice.", egress=f"{name}_sample.wav")
Flexible Audio Sources
The transcribe() function accepts audio from multiple sources:
import su
from io import BytesIO
from pathlib import Path
# 1. File paths (strings or Path objects)
text = su.transcribe("recording.wav")
text = su.transcribe(Path("audio/speech.mp3"))
# 2. Raw audio bytes
with open("audio.wav", "rb") as f:
    audio_bytes = f.read()
text = su.transcribe(audio_bytes)
# 3. File-like objects (BytesIO, open files, etc.)
audio_stream = BytesIO(audio_bytes)
text = su.transcribe(audio_stream)
# 4. Open file handles
with open("recording.wav", "rb") as f:
    text = su.transcribe(f)
# 5. Audio chunk iterators
def audio_chunks():
    with open("large_audio.wav", "rb") as f:
        while True:
            chunk = f.read(8192)  # 8 KB chunks
            if not chunk:
                break
            yield chunk
text = su.transcribe(audio_chunks())
# 6. Live microphone via transcribe
text = su.transcribe({'type': 'microphone', 'timeout': 10})
# 7. Network streams or any file-like object
import requests
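# The whole body is read into memory here, so stream=True is not needed
response = requests.get("https://example.com/audio.wav", timeout=30)
response.raise_for_status()  # stop early on HTTP errors
text = su.transcribe(BytesIO(response.content))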
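To exercise the bytes and file-like paths without a recording on disk, a small WAV can be built in memory with the standard library. This is only a sketch: `make_tone_wav` is a hypothetical helper, and a pure tone contains no speech, so it is useful for checking that a source type is accepted rather than for getting a meaningful transcript.

```python
import io
import math
import struct
import wave

def make_tone_wav(seconds: float = 0.5, freq: float = 440.0, rate: int = 16000) -> bytes:
    """Return a mono 16-bit PCM WAV containing a sine tone, built entirely in memory."""
    n_frames = int(seconds * rate)
    samples = (int(12000 * math.sin(2 * math.pi * freq * i / rate))
               for i in range(n_frames))
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)              # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"".join(struct.pack("<h", s) for s in samples))
    return buffer.getvalue()

audio_bytes = make_tone_wav()
audio_stream = io.BytesIO(audio_bytes)   # file-like view of the same data
print(len(audio_bytes), audio_bytes[:4])
```

Both `audio_bytes` and `audio_stream` then match the shapes shown in items 2 and 3 above.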
Batch Processing with Different Sources
import su
from functools import partial
from pathlib import Path
from io import BytesIO
# Create specialized transcription functions
google_transcribe = partial(su.transcribe, engine='google')
sphinx_transcribe = partial(su.transcribe, engine='sphinx')
# Read raw bytes for the in-memory source below
with open("audio.wav", "rb") as f:
    audio_bytes = f.read()
# Process various audio sources
sources = [
    "local_file.wav",  # File path
    BytesIO(audio_bytes),  # Bytes stream
    {'type': 'microphone', 'timeout': 3},  # Live microphone
    Path("recordings/interview.mp3"),  # Path object
]
for i, source in enumerate(sources):
    print(f"Processing source {i+1}...")
    # Try Google first, fallback to Sphinx
    text = google_transcribe(source) or sphinx_transcribe(source)
    print(f"Result: {text or 'Transcription failed'}")
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
MIT License - see LICENSE file for details.