Rhythm and Tonal Analysis Implementation
Overview
This document describes the rhythm and tonal analysis features implemented for audiomancer using Essentia.
Components
1. Rhythm Analysis (src/audiomancer/analyzers/rhythm.py)
Extracts tempo and beat positions using Essentia's RhythmExtractor2013, and flags whether the audio appears to be a loop.
Features Extracted:
- bpm: float or None (beats per minute, None for non-rhythmic content)
- bpm_confidence: float (0-1, confidence in BPM detection)
- beat_positions: list[float] (beat times in seconds)
- is_loop: bool (True if audio appears to be a rhythmic loop)
Key Algorithms:
- RhythmExtractor2013(method="multifeature") - BPM detection with confidence
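Essentia's beat tracker reference gives the multifeature confidence a range of roughly [0, 5.32], while bpm_confidence above is documented as 0-1. How audiomancer maps one onto the other is not stated here. The sketch below shows one plausible mapping (linear scaling with clipping); the helper name normalize_confidence and the choice of linear scaling are assumptions for illustration only.

```python
import math

# Upper bound of the multifeature confidence as given in Essentia's reference docs.
ESSENTIA_MAX_CONFIDENCE = 5.32


def normalize_confidence(raw: float) -> float:
    """Map a raw beat-tracker confidence onto [0, 1] (hypothetical helper)."""
    if not math.isfinite(raw):
        return 0.0  # treat NaN/inf as "no confidence"
    # Scale linearly, then clip so out-of-range inputs stay inside [0, 1].
    return min(max(raw / ESSENTIA_MAX_CONFIDENCE, 0.0), 1.0)
```

A raw value of 2.66 would map to about 0.5 under this assumption.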
Loop Detection Logic:
- Duration matches bar boundaries (within 10% tolerance)
- Duration is less than 30 seconds (typical loop length)
- At least 1 bar of audio
- Assumes 4/4 time signature
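The loop rules above can be sketched in a few lines of plain Python. This is not the module's actual code: the function name looks_like_loop is hypothetical, and the 10% tolerance is interpreted here as a fraction of one bar, which is an assumption.

```python
from typing import Optional


def looks_like_loop(
    duration_s: float,
    bpm: Optional[float],
    beats_per_bar: int = 4,        # 4/4 assumed, as in the text
    tolerance: float = 0.10,       # assumed to mean 10% of one bar
    max_duration_s: float = 30.0,  # loops are expected to be shorter than this
) -> bool:
    """Return True if the duration sits close to a whole number of bars."""
    if bpm is None or bpm <= 0 or duration_s <= 0:
        return False
    if duration_s >= max_duration_s:
        return False

    bar_s = beats_per_bar * 60.0 / bpm  # length of one bar in seconds
    n_bars = duration_s / bar_s
    nearest = round(n_bars)
    if nearest < 1:
        return False  # less than one bar of audio
    return abs(n_bars - nearest) <= tolerance
```

At 120 BPM a bar lasts 2 seconds, so an 8-second clip (4 bars) passes while a 9-second clip (4.5 bars) does not.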
Error Handling:
- Returns None for BPM on silence, NaN, or non-rhythmic content
- Returns empty beat_positions list for silence
- Raises AnalysisFailedError for invalid input (empty audio)
2. Tonal Analysis (src/audiomancer/analyzers/tonal.py)
Extracts key, tuning, and pitch salience using Essentia's KeyExtractor and spectral analysis.
Features Extracted:
- key: str or None (e.g., "C", "Dm", "F#m", None for percussion/noise)
- key_confidence: float (0-1, confidence in key detection)
- tuning_frequency: float (Hz, reference tuning, typically ~440)
- pitch_salience: float (0-1, how tonal vs percussive)
Key Algorithms:
- KeyExtractor() - Key and scale detection
- TuningFrequency() - Reference tuning detection (requires spectral peaks)
- PitchSalience() - Tonal vs percussive distinction
- SpectralPeaks() - Extract frequency peaks for tuning analysis
Key Formatting:
- Major keys: "C", "D", "F#", etc.
- Minor keys: "Am", "Dm", "F#m", etc.
- Returns None if confidence < 0.2
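Essentia's KeyExtractor returns a key name, a scale ("major" or "minor"), and a strength value. A minimal sketch of turning that triple into the strings described above might look like this; format_key is a hypothetical name, and treating the strength as the confidence is an assumption.

```python
from typing import Optional

MIN_KEY_CONFIDENCE = 0.2  # threshold quoted in the text


def format_key(key: str, scale: str, confidence: float) -> Optional[str]:
    """Format a (key, scale) pair as "C" or "Am"; None when confidence is too low."""
    # "not >=" also rejects NaN, which compares False against everything.
    if not key or not (confidence >= MIN_KEY_CONFIDENCE):
        return None
    return f"{key}m" if scale == "minor" else key
```

For example, format_key("F#", "minor", 0.5) gives "F#m", and any call with confidence below 0.2 gives None.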
Pitch Salience:
- Higher values (>0.5) indicate tonal/melodic content
- Lower values (<0.5) indicate percussive/noisy content
- Computed as mean across all audio frames
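The aggregation step can be illustrated without Essentia by assuming the per-frame salience values have already been computed. Both helper names below are hypothetical, and assigning a value of exactly 0.5 to the percussive side is an arbitrary choice, since the text leaves that case open.

```python
import math
from typing import Iterable


def mean_pitch_salience(frame_values: Iterable[float]) -> float:
    """Average per-frame salience, ignoring non-finite frames."""
    finite = [v for v in frame_values if math.isfinite(v)]
    if not finite:
        return 0.0  # matches the documented result for silence
    mean = sum(finite) / len(finite)
    return min(max(mean, 0.0), 1.0)  # keep the result inside [0, 1]


def tonal_label(salience: float) -> str:
    """Apply the 0.5 rule of thumb from the text."""
    return "tonal" if salience > 0.5 else "percussive"
```

Averaging means that a few transient frames with low salience do not flip the label of an otherwise melodic sample.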
Error Handling:
- Returns None for key on silence, NaN, or non-tonal content
- Defaults tuning_frequency to 440 Hz if detection fails
- Returns 0.0 for pitch_salience on silence
- Raises AnalysisFailedError for invalid input (empty audio)
Usage Examples
```python
from audiomancer.analyzers import extract_rhythm_features, extract_tonal_features
import librosa

# Load audio at its native sample rate (librosa downmixes to mono by default)
y, sr = librosa.load("sample.wav", sr=None)

# Extract rhythm features
rhythm = extract_rhythm_features(y, sr)
print(f"BPM: {rhythm['bpm']}")
print(f"Is loop: {rhythm['is_loop']}")
print(f"Beat positions: {rhythm['beat_positions'][:5]}...")

# Extract tonal features
tonal = extract_tonal_features(y, sr)
print(f"Key: {tonal['key']}")
print(f"Tuning: {tonal['tuning_frequency']} Hz")
print(f"Pitch salience: {tonal['pitch_salience']}")
```
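Once both result dictionaries are available, downstream code can combine them into a coarse description of the sample. The sketch below is purely illustrative: describe_sample is a hypothetical helper, not part of audiomancer, and the 0.5 salience cut-off simply reuses the rule of thumb from the Pitch Salience notes.

```python
from typing import Any, Mapping


def describe_sample(rhythm: Mapping[str, Any], tonal: Mapping[str, Any]) -> str:
    """Combine rhythm and tonal features into a short label (hypothetical)."""
    if rhythm["is_loop"]:
        kind = "loop"
    elif rhythm["bpm"] is not None:
        kind = "rhythmic"
    else:
        kind = "one-shot"

    # A sample counts as tonal only if a key was reported and salience is high.
    is_tonal = tonal["key"] is not None and tonal["pitch_salience"] > 0.5
    character = "tonal" if is_tonal else "percussive"
    return f"{character} {kind}"
```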
Test Coverage
Unit Tests (25 tests)
Rhythm Analyzer Tests:
- Silence handling (returns None for BPM)
- Sine wave (non-rhythmic, returns None)
- Impulse (low confidence)
- 4/4 loop detection
- Feature shape validation
- Stereo to mono conversion
- Empty audio error handling
- NaN audio handling
- Long audio (>30s) not marked as loop
- Beat positions are sorted
- No NaN/inf in output
Tonal Analyzer Tests:
- Silence handling (returns None for key)
- Sine wave (measurable pitch salience)
- Impulse (low pitch salience)
- Feature shape validation
- Major key format (e.g., "C")
- Minor key format (e.g., "Am")
- Stereo to mono conversion
- Empty audio error handling
- NaN audio handling
- Tuning frequency defaults
- Pitch salience range validation
- No NaN/inf in output
- Percussion handling
- Tonal vs percussive distinction
Integration Tests (5 tests)
- Complete analysis on silence
- Complete analysis on tonal content (C major arpeggio)
- Complete analysis on rhythmic percussion
- Output consistency across different audio types
- Combined features for sample classification
Dependencies
- essentia-tensorflow>=2.1b6.dev1110,<3 - Audio analysis algorithms
- numpy>=1.24.0,<2 - Array operations
- librosa>=0.10.0,<0.11 - Audio loading (optional, for examples)
Design Decisions
Why None instead of 0 for BPM?
Using None makes it clear that BPM could not be determined, rather than implying a BPM of 0. This is important for:
- Distinguishing audio with no detectable tempo (silence, non-rhythmic content) from audio with a measured tempo
- Avoiding division by zero in downstream calculations
- Making API contracts clearer (Optional[float] vs float)
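The division-by-zero point is easiest to see in a downstream helper. In the sketch below, beat_duration_s is a hypothetical consumer of the bpm field; with an Optional[float] contract the missing case has to be handled explicitly instead of surfacing as a ZeroDivisionError.

```python
from typing import Optional


def beat_duration_s(bpm: Optional[float]) -> Optional[float]:
    """Seconds per beat, or None when no tempo was detected."""
    if bpm is None:
        return None  # no tempo: propagate "unknown" rather than dividing by 0
    return 60.0 / bpm
```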
Why threshold key confidence at 0.2?
Key detection on noise or percussion can produce spurious results with low confidence. Reporting only keys whose confidence clears the 0.2 threshold reduces false positives while still allowing detection on real tonal content.
Why default tuning to 440 Hz?
When tuning detection fails (e.g., on pure noise), defaulting to 440 Hz provides a sensible reference value that won't break downstream calculations. 440 Hz is the standard concert pitch (A4).
Why frame-based pitch salience?
Computing pitch salience across multiple frames and averaging provides a more robust measure of overall tonality than analyzing the entire signal at once. This handles:
- Varying tonal content over time
- Mixed percussive and tonal elements
- Transients and noise
Validation
All values returned by the analyzers are validated to ensure:
- No NaN or inf values
- Confidence values in [0, 1] range
- Tuning frequency in reasonable range (400-480 Hz)
- BPM is positive when detected
- Beat positions are sorted in ascending order
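The checks listed above can be expressed as a single function over the two result dictionaries. This is a sketch under assumptions: validation_errors is a hypothetical name, the dictionary keys follow the feature lists earlier in this document, and whether the real analyzers raise, clamp, or log on a failed check is not specified here.

```python
import math
from typing import Any, List, Mapping


def validation_errors(rhythm: Mapping[str, Any], tonal: Mapping[str, Any]) -> List[str]:
    """Return a list of violated invariants (empty when everything is valid)."""
    errors: List[str] = []

    numeric = {
        "bpm_confidence": rhythm["bpm_confidence"],
        "key_confidence": tonal["key_confidence"],
        "tuning_frequency": tonal["tuning_frequency"],
        "pitch_salience": tonal["pitch_salience"],
    }
    if rhythm["bpm"] is not None:
        numeric["bpm"] = rhythm["bpm"]

    # No NaN or inf values.
    for name, value in numeric.items():
        if not math.isfinite(value):
            errors.append(f"{name} is not finite")

    # Confidence values in [0, 1].
    for name in ("bpm_confidence", "key_confidence"):
        if not 0.0 <= numeric[name] <= 1.0:
            errors.append(f"{name} outside [0, 1]")

    # Tuning frequency in 400-480 Hz.
    if not 400.0 <= tonal["tuning_frequency"] <= 480.0:
        errors.append("tuning_frequency outside 400-480 Hz")

    # BPM is positive when detected.
    if rhythm["bpm"] is not None and not rhythm["bpm"] > 0:
        errors.append("bpm is not positive")

    # Beat positions are finite and sorted in ascending order.
    beats = list(rhythm["beat_positions"])
    if any(not math.isfinite(b) for b in beats):
        errors.append("beat_positions contains non-finite values")
    elif beats != sorted(beats):
        errors.append("beat_positions not sorted")

    return errors
```

Returning a list of messages instead of raising on the first failure makes it easy to report every violated invariant at once in a test.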
Future Improvements
Potential enhancements for future versions:
- Multi-tempo detection: Handle tempo changes within a single file
- Time signature detection: Beyond assumed 4/4
- Downbeat detection: Identify measure boundaries
- Harmonic analysis: Chord progressions, harmonic rhythm
- Melodic analysis: Contour, intervals, motifs
- Rhythmic pattern extraction: Quantized groove templates
- Genre classification: Using rhythm and tonal features
- Tempo stability: Measure tempo variance over time
References
- Essentia Documentation: https://essentia.upf.edu/documentation/
- RhythmExtractor2013 Algorithm: https://essentia.upf.edu/reference/std_RhythmExtractor2013.html
- KeyExtractor Algorithm: https://essentia.upf.edu/reference/std_KeyExtractor.html
- PitchSalience Algorithm: https://essentia.upf.edu/reference/std_PitchSalience.html