
su - Speech Utils

A comprehensive toolkit for speech recognition, text-to-speech generation, and audio processing with simple, intuitive interfaces.

Installation

pip install su

Quick Start

Speech Recognition

import su

# Quick recognition from microphone
text = su.recognize()
print(f"You said: {text}")

# Custom timeout and engine
text = su.recognize(timeout=10, engine='sphinx')

# Transcribe from various audio sources
text = su.transcribe("recording.wav")  # File path
print(f"Audio contains: {text}")

# Transcribe from bytes
with open("audio.wav", "rb") as f:
    audio_bytes = f.read()
text = su.transcribe(audio_bytes)

# Transcribe from file-like object
from io import BytesIO
audio_stream = BytesIO(audio_bytes)
text = su.transcribe(audio_stream)

# Transcribe live microphone input
text = su.transcribe({'type': 'microphone', 'timeout': 5})

# Use offline engine for transcription
text = su.transcribe("recording.wav", engine='sphinx')

# Advanced usage
recognizer = su.SpeechRecognizer(engine='google')
text = recognizer.listen_and_recognize(timeout=10)

Text-to-Speech

import su

# Quick speech
su.speak("Hello, world!")

# Custom voice settings
su.speak("Slow and quiet", rate=100, volume=0.5)

# Read text from file (path starting with /)
su.speak("/path/to/speech.txt")

# Save to file without hearing
su.speak("Save this", egress="output.wav", send_to_speakers=False)

# Get audio bytes for custom use
audio_bytes = su.speak("Test", egress=lambda x: x, send_to_speakers=False)

# Both save and hear
su.speak("Hello", egress="greeting.wav", send_to_speakers=True)

# Advanced usage
tts = su.TextToSpeech(rate=150, volume=0.8)
tts.speak("This is a test", save_to="output.wav")

# List available voices
voices = tts.get_voices()
for voice in voices:
    print(f"Voice: {voice['name']} ({voice['lang']})")

Partial Application for Custom Functions

import su
from functools import partial

# Create custom recognizer functions
fast_recognize = partial(su.recognize, timeout=2, engine='google')
offline_recognize = partial(su.recognize, engine='sphinx')

# Create custom speech functions
robot_voice = partial(su.speak, rate=300, volume=1.0)
quiet_voice = partial(su.speak, rate=150, volume=0.3)

# Create custom transcription functions
offline_transcribe = partial(su.transcribe, engine='sphinx')
google_transcribe = partial(su.transcribe, engine='google')

# Use them
text = fast_recognize()  # Quick 2-second recognition
robot_voice("I am a robot")  # Fast, loud speech
text = offline_transcribe("audio.wav")  # Offline transcription

Audio Processing

import su

# Load and analyze audio
audio, sample_rate = su.AudioProcessor.load_audio("speech.wav")
features = su.AudioProcessor.extract_features(audio, sample_rate)

print(f"MFCC shape: {features['mfcc'].shape}")
print(f"Tempo: {features['tempo']} BPM")

# Convert audio formats
su.AudioProcessor.convert_format("input.mp3", "output.wav")

Features

🎤 Speech Recognition

  • Multiple Engines: Google, Sphinx, Wit.ai, Azure, Houndify
  • Live Recognition: Real-time microphone input
  • File Transcription: Support for various audio formats
  • Noise Handling: Automatic ambient noise adjustment

🔊 Text-to-Speech

  • Cross-Platform: Works on Windows, macOS, Linux
  • Voice Control: Rate, volume, and voice selection
  • File Export: Save speech to audio files
  • Multiple Voices: Access to system voices

🎵 Audio Processing

  • Format Conversion: MP3, WAV, FLAC, and more
  • Feature Extraction: MFCC, spectral features, tempo
  • ML Ready: Features suitable for machine learning
  • Librosa Integration: Advanced audio analysis
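Frame-based features such as MFCC start by slicing the signal into short overlapping windows. As a rough illustration of that first step, here is a pure-Python sketch of the windowing idea (not how librosa actually implements it; `frame_len` and `hop` are hypothetical parameter names):

```python
def frame_signal(samples, frame_len, hop):
    """Split a sample sequence into overlapping frames.

    Pure-Python sketch of the windowing step behind frame-based
    features such as MFCC; real libraries vectorize this.
    """
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames
```

With `frame_len=4` and `hop=2`, a 10-sample signal yields 4 overlapping frames.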

API Reference

Convenience Functions

# Speech recognition with customizable settings
text = su.recognize(timeout=5, engine='google')

# Flexible text-to-speech with multiple input/output options
result = su.speak(text_src, rate=200, volume=0.9, egress=None, send_to_speakers=True)

# Where text_src can be:
# - "Hello world" (direct text)
# - "/path/to/file.txt" (file path - must start with / or drive letter)
# - Path("file.txt") (Path object)
# - StringIO("text") (file-like object)
# - text_iterator() (iterator yielding text chunks)

# Where egress can be:
# - None (default - no special output)
# - "output.wav" (save to file path)
# - lambda x: x (return audio bytes)
# - custom_function (process audio bytes)

# Flexible audio transcription with multiple source types
text = su.transcribe(audio_src, engine='google')
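The egress options above suggest a simple dispatch on the argument's type. A minimal sketch of how such a dispatcher could look (a hypothetical helper for illustration, not su's actual internals):

```python
from pathlib import Path

def dispatch_egress(audio_bytes, egress):
    """Route generated audio bytes according to an egress argument.

    Hypothetical sketch: None means no extra output, a callable
    receives the bytes, and anything else is treated as a file path.
    """
    if egress is None:
        return None
    if callable(egress):
        return egress(audio_bytes)
    Path(egress).write_bytes(audio_bytes)
    return None
```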

SpeechRecognizer

recognizer = su.SpeechRecognizer(engine='google')

# Listen from microphone
text = recognizer.listen_and_recognize(timeout=5)

# Transcribe file
text = recognizer.recognize_file("audio.wav")

TextToSpeech

tts = su.TextToSpeech(rate=200, volume=0.9)

# Speak text
tts.speak("Hello world")

# Save to file
tts.speak("Save this", save_to="output.wav")

# Change voice
voices = tts.get_voices()
tts.set_voice(voices[0]['id'])

AudioProcessor

# Load audio
audio, sr = su.AudioProcessor.load_audio("file.wav")

# Extract ML features
features = su.AudioProcessor.extract_features(audio, sr)

# Convert format
su.AudioProcessor.convert_format("input.mp3", "output.wav")

Dependencies

  • speech_recognition: Speech recognition functionality
  • pyttsx3: Text-to-speech conversion
  • librosa: Audio analysis and feature extraction
  • pydub: Audio format conversion
  • numpy: Numerical operations
  • pyaudio: Audio I/O operations

System Requirements

For Speech Recognition:

  • Windows: No additional requirements
  • macOS: No additional requirements
  • Linux: sudo apt-get install flac (for FLAC support) and portaudio19-dev (needed to build pyaudio)

For Audio Processing:

  • All platforms: ffmpeg (or libav) on the PATH, required by pydub for compressed formats such as MP3

Examples

Voice Assistant with Custom Settings

import su
from functools import partial

# Create optimized functions for the assistant
quick_listen = partial(su.recognize, timeout=3, engine='google')
assistant_voice = partial(su.speak, rate=180, volume=0.8)

while True:
    print("Listening...")
    text = quick_listen()

    if not text:
        continue

    print(f"You said: {text}")

    if "goodbye" in text.lower():
        assistant_voice("Goodbye!")
        break

    assistant_voice(f"You said: {text}")
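The single "goodbye" check scales poorly as commands grow. One way to generalize it is a small keyword-to-handler table. This is a sketch with hypothetical handlers, independent of su itself:

```python
def dispatch_command(text, commands, default=None):
    """Return the result of the first handler whose keyword appears in text."""
    lowered = text.lower()
    for keyword, handler in commands.items():
        if keyword in lowered:
            return handler()
    return default

# Hypothetical command table
commands = {
    "goodbye": lambda: "Goodbye!",
    "time": lambda: "It is noon.",
}
```

In the loop above, `assistant_voice(dispatch_command(text, commands, default="Sorry?"))` would replace the hard-coded check.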

Audio Analysis Pipeline

import su
import numpy as np

# Load audio file
audio, sr = su.AudioProcessor.load_audio("speech.wav")

# Extract features for ML
features = su.AudioProcessor.extract_features(audio, sr)

# Use MFCC features (common for speech recognition)
mfcc_features = features['mfcc']
mfcc_mean = np.mean(mfcc_features, axis=1)

print(f"MFCC feature vector shape: {mfcc_mean.shape}")

Batch Processing with Different Engines

import su
from functools import partial
from pathlib import Path

# Create specialized transcription functions
google_transcribe = partial(su.transcribe, engine='google')  # For online processing
sphinx_transcribe = partial(su.transcribe, engine='sphinx')  # For offline processing

input_dir = Path("audio_files")
output_dir = Path("transcriptions")
output_dir.mkdir(exist_ok=True)

for audio_file in input_dir.glob("*.wav"):
    print(f"Processing {audio_file.name}...")
    
    # Try Google first (better accuracy), fallback to Sphinx
    text = google_transcribe(audio_file) or sphinx_transcribe(audio_file)
    
    # Save transcription
    output_file = output_dir / f"{audio_file.stem}.txt"
    with open(output_file, "w") as f:
        f.write(text or "Transcription failed")

Voice Profile System

import su
from functools import partial

# Define different voice profiles
profiles = {
    'assistant': partial(su.speak, rate=180, volume=0.8),
    'narrator': partial(su.speak, rate=150, volume=0.7),
    'robot': partial(su.speak, rate=250, volume=1.0),
    'whisper': partial(su.speak, rate=120, volume=0.3),
}

# Use different voices for different purposes
profiles['assistant']("How can I help you today?")
profiles['narrator']("Once upon a time, in a land far away...")
profiles['robot']("SYSTEM INITIALIZED. READY FOR COMMANDS.")
profiles['whisper']("This is a secret message.")

# Save each voice sample to a file without playback
for name, voice_func in profiles.items():
    voice_func(f"This is the {name} voice.", egress=f"{name}_sample.wav", send_to_speakers=False)

Flexible Audio Sources

The transcribe() function accepts audio from multiple sources:

import su
from io import BytesIO
from pathlib import Path

# 1. File paths (strings or Path objects)
text = su.transcribe("recording.wav")
text = su.transcribe(Path("audio/speech.mp3"))

# 2. Raw audio bytes
with open("audio.wav", "rb") as f:
    audio_bytes = f.read()
text = su.transcribe(audio_bytes)

# 3. File-like objects (BytesIO, open files, etc.)
audio_stream = BytesIO(audio_bytes)
text = su.transcribe(audio_stream)

# 4. Open file handles
with open("recording.wav", "rb") as f:
    text = su.transcribe(f)

# 5. Audio chunk iterators
def audio_chunks():
    with open("large_audio.wav", "rb") as f:
        while True:
            chunk = f.read(8192)  # 8KB chunks
            if not chunk:
                break
            yield chunk

text = su.transcribe(audio_chunks())

# 6. Live microphone via transcribe
text = su.transcribe({'type': 'microphone', 'timeout': 10})

# 7. Network streams or any file-like object
import requests
response = requests.get("https://example.com/audio.wav", stream=True)
text = su.transcribe(BytesIO(response.content))
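Supporting this many source types usually comes down to normalizing everything to raw bytes. A sketch of what that normalization could look like (a hypothetical helper, not su's internals; the live-microphone dict case is omitted):

```python
from pathlib import Path

def to_audio_bytes(src):
    """Normalize common audio source types to raw bytes (sketch only)."""
    if isinstance(src, bytes):
        return src
    if isinstance(src, (str, Path)):
        return Path(src).read_bytes()
    if hasattr(src, "read"):        # file-like objects and open handles
        return src.read()
    if hasattr(src, "__iter__"):    # audio chunk iterators
        return b"".join(src)
    raise TypeError(f"unsupported audio source: {type(src)!r}")
```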

Batch Processing with Different Sources

import su
from functools import partial
from pathlib import Path
from io import BytesIO

# Create specialized transcription functions
google_transcribe = partial(su.transcribe, engine='google')
sphinx_transcribe = partial(su.transcribe, engine='sphinx')

# Load raw bytes for the in-memory example
with open("local_file.wav", "rb") as f:
    audio_bytes = f.read()

# Process various audio sources
sources = [
    "local_file.wav",                          # File path
    BytesIO(audio_bytes),                      # Bytes stream
    {'type': 'microphone', 'timeout': 3},      # Live microphone
    Path("recordings/interview.mp3"),          # Path object
]

for i, source in enumerate(sources):
    print(f"Processing source {i+1}...")
    
    # Try Google first, fallback to Sphinx
    text = google_transcribe(source) or sphinx_transcribe(source)
    
    print(f"Result: {text or 'Transcription failed'}")

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License - see LICENSE file for details.

Project details

Source distribution: su-0.0.8.tar.gz (14.4 kB)
Built distribution: su-0.0.8-py3-none-any.whl (11.9 kB, Python 3)
