Analyzers API Reference¶
The audiomancer.analyzers module provides audio analysis functionality.
Overview¶
analyzers
¶
Audio analysis for audiomancer.
Provides interfaces for SynthDef parsing and audio feature extraction, including basic metadata, spectral, rhythm, tonal analysis, ML classification, and audio embeddings.
```python
__all__ = ['ControlSpec', 'SynthControl', 'SynthDefMetadata', 'SynthDefParser', 'SynthDefStore', 'parse_synthdef', 'categorize_synthdef', 'SynthDefInfo', 'SynthDefControl', 'get_basic_metadata', 'BasicMetadata', 'extract_spectral_features', 'SpectralFeatures', 'extract_rhythm_features', 'extract_tonal_features', 'classify_instrument', 'extract_mood_tags', 'extract_genre_tags', 'extract_audio_embedding', 'cosine_similarity', 'euclidean_distance', 'load_model', 'download_model', 'list_models', 'clear_cache']
```
module-attribute
¶
BasicMetadata
¶
Bases: TypedDict
Basic audio file metadata.
Attributes:

| Name | Type | Description |
|---|---|---|
| `duration_ms` | `float` | Audio duration in milliseconds |
| `sample_rate` | `int` | Sample rate in Hz |
| `channels` | `int` | Number of audio channels |
| `bit_depth` | `int` | Bit depth (16 assumed for librosa float32 conversion) |
| `file_size_bytes` | `int` | File size in bytes |
| `file_hash` | `str` | SHA256 hex digest of file contents |
ControlSpec
¶
Bases: TypedDict
Specification for a SynthDef control parameter.
Describes the valid range and characteristics of a synth control.
Example

```python
>>> spec = ControlSpec(
...     min=200.0,
...     max=4000.0,
...     default=1200.0,
...     warp="exp",  # Exponential scaling for frequency
...     step=1.0,
... )
```
SpectralFeatures
¶
Bases: TypedDict
Spectral audio features.
All frequency values are in Hz, energy values are linear (0-1 range).
Attributes:

| Name | Type | Description |
|---|---|---|
| `spectral_centroid` | `float` | Mean brightness/center of mass of spectrum (Hz) |
| `spectral_bandwidth` | `float` | Frequency spread around centroid (Hz proxy) |
| `spectral_rolloff` | `float` | High-frequency content cutoff point (Hz) |
| `zero_crossing_rate` | `float` | Measure of noisiness/percussiveness (0-1) |
| `rms_energy` | `float` | Root-mean-square energy level (0-1 linear) |
| `dynamic_range` | `float` | Peak-to-average ratio (dB) |
SynthControl
¶
Bases: TypedDict
A control parameter extracted from a SynthDef.
Represents a parameter that can be modified during synthesis.
Example

```python
>>> control = SynthControl(
...     name="cutoff",
...     default_value=1200.0,
...     spec=ControlSpec(
...         min=200.0,
...         max=4000.0,
...         default=1200.0,
...         warp="exp",
...         step=1.0,
...     ),
... )
```
SynthDefControl
dataclass
¶
A SynthDef control parameter.
Attributes:

| Name | Type | Description |
|---|---|---|
| `name` | `str` | Parameter name (e.g., `"freq"`, `"cutoff"`) |
| `default_value` | `float` | Default value for the parameter |
| `spec` | `Optional[str]` | Spec string if specified (e.g., `"\freq.asSpec"`) |
| `description` | `Optional[str]` | Human-readable description (if available) |

Example

```python
>>> ctrl = SynthDefControl(name="freq", default_value=440.0)
>>> ctrl.name
'freq'
>>> ctrl.default_value
440.0
```
SynthDefInfo
dataclass
¶
Parsed SynthDef metadata.
Attributes:

| Name | Type | Description |
|---|---|---|
| `name` | `str` | SynthDef name (e.g., `"tb303"`, `"simple_sine"`) |
| `file_path` | `str` | Absolute path to .scd file |
| `file_hash` | `str` | SHA256 hash of source code |
| `num_channels` | `int` | Output channel count |
| `has_gate` | `bool` | Whether synth has gate parameter for note-off |
| `has_envelope` | `bool` | Whether synth uses EnvGen |
| `ugens_used` | `list[str]` | List of UGen class names used |
| `controls` | `list[SynthControl]` | List of control parameters |
| `source_code` | `str` | Raw SuperCollider source code |
| `category` | `Optional[str]` | Inferred category (bass, lead, pad, drum, fx) |
| `tags` | `list[str]` | Additional tags for categorization |

Example

```python
>>> info = SynthDefInfo(
...     name="simple_sine",
...     file_path="/path/to/simple_sine.scd",
...     file_hash="abc123",
...     num_channels=2,
...     has_gate=True,
...     has_envelope=True,
...     ugens_used=["SinOsc", "EnvGen", "Out"],
...     controls=[SynthControl("freq", 440.0)],
...     source_code="SynthDef(...)",
...     category="lead",
... )
```
SynthDefMetadata
¶
Bases: TypedDict
Metadata extracted from a SynthDef file.
Contains all information parsed from a .scd file including controls, UGens used, and categorization.
Example

```python
>>> synthdef = SynthDefMetadata(
...     id="synt_tb303",
...     name="tb303",
...     file_path="/synths/tb303.scd",
...     file_hash="abc123",
...     num_channels=1,
...     has_gate=True,
...     has_envelope=True,
...     ugens_used=["SinOsc", "Resonz", "EnvGen", "Out"],
...     category="bass",
...     tags=["acid", "303", "classic"],
...     source_code="SynthDef(\tb303, { ... })",
...     controls=[
...         SynthControl(
...             name="cutoff",
...             default_value=1200.0,
...             spec=ControlSpec(
...                 min=200.0,
...                 max=4000.0,
...                 default=1200.0,
...                 warp="exp",
...                 step=1.0,
...             ),
...         ),
...         SynthControl(
...             name="resonance",
...             default_value=0.7,
...             spec=ControlSpec(
...                 min=0.0,
...                 max=1.0,
...                 default=0.7,
...                 warp="linear",
...                 step=0.01,
...             ),
...         ),
...     ],
... )
```
SynthDefParser
¶
Bases: Protocol
Interface for parsing SuperCollider SynthDef files.
Extracts controls, UGens, and metadata from .scd files using sclang.
extract_controls(source_code: str) -> list[SynthControl]
¶
Extract control parameters from SynthDef source code.
Parses arg declarations and infers specs from usage.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `source_code` | `str` | SuperCollider source code | *required* |

Returns:

| Type | Description |
|---|---|
| `list[SynthControl]` | List of extracted controls |

Example

```python
>>> code = '''
... SynthDef(\tb303, { |cutoff=1200, resonance=0.7|
...     var sig = Saw.ar(freq);
...     sig = Resonz.ar(sig, cutoff, 1/resonance);
...     Out.ar(0, sig);
... })
... '''
>>> controls = parser.extract_controls(code)
>>> controls
[
    SynthControl(name="cutoff", default_value=1200.0, ...),
    SynthControl(name="resonance", default_value=0.7, ...),
]
```
extract_ugens(source_code: str) -> list[str]
¶
Extract UGen class names from source code.
Finds all UGen.ar() and UGen.kr() calls.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `source_code` | `str` | SuperCollider source code | *required* |

Returns:

| Type | Description |
|---|---|
| `list[str]` | List of unique UGen class names |

Example

```python
>>> code = '''
... SynthDef(\tb303, {
...     var sig = Saw.ar(440);
...     sig = Resonz.ar(sig, 1200);
...     Out.ar(0, sig);
... })
... '''
>>> ugens = parser.extract_ugens(code)
>>> ugens
["Saw", "Resonz", "Out"]
```
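The idea behind this extraction can be sketched with a regular expression; this is only an illustration of the fallback approach, not the library's actual parser (which uses sclang where available):

```python
import re

def extract_ugens_sketch(source_code: str) -> list[str]:
    # Find ClassName.ar(...) / ClassName.kr(...) calls,
    # keeping unique names in first-seen order
    names = re.findall(r"\b([A-Z]\w*)\.(?:ar|kr)\b", source_code)
    seen: dict[str, None] = {}
    for name in names:
        seen.setdefault(name)
    return list(seen)
```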
infer_category(metadata: SynthDefMetadata) -> str
¶
Infer synth category from UGens and controls.
Categorization rules:

- bass: Low-pass filter + low frequency range
- lead: High resonance + envelope
- pad: Long envelope + multiple oscillators
- drum: Noise + short envelope
- fx: No oscillators, effect UGens only
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `metadata` | `SynthDefMetadata` | Parsed SynthDef metadata | *required* |

Returns:

| Type | Description |
|---|---|
| `str` | Category string |

Example

```python
>>> metadata = SynthDefMetadata(
...     ugens_used=["Saw", "Resonz", "EnvGen"],
...     controls=[
...         SynthControl(name="cutoff", default_value=1200, ...),
...     ],
...     ...
... )
>>> category = parser.infer_category(metadata)
>>> category
"bass"
```
parse(file_path: str, timeout: int = 5) -> SynthDefMetadata
¶
Parse a SynthDef file and extract metadata.
Uses subprocess to run sclang with shell=False and timeout for safety. Falls back to binary parser if sclang fails.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `file_path` | `str` | Absolute path to .scd file | *required* |
| `timeout` | `int` | Maximum time to wait for sclang (seconds) | `5` |

Returns:

| Type | Description |
|---|---|
| `SynthDefMetadata` | Complete SynthDef metadata |

Raises:

| Type | Description |
|---|---|
| `FileNotFoundError` | If file_path does not exist |
| `ParseError` | If file cannot be parsed (invalid syntax) |
| `SubprocessTimeoutError` | If sclang exceeds timeout |

Example

```python
>>> parser = SynthDefParser()
>>> metadata = parser.parse("/synths/tb303.scd", timeout=5)
>>> metadata['name']
"tb303"
>>> len(metadata['controls'])
7
>>> metadata['controls'][0]['name']
"cutoff"
```
parse_batch(file_paths: list[str], timeout: int = 5) -> list[SynthDefMetadata]
¶
Parse multiple SynthDef files in batch.
More efficient than individual parse() calls for many files.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `file_paths` | `list[str]` | List of absolute paths to .scd files | *required* |
| `timeout` | `int` | Maximum time per file (seconds) | `5` |

Returns:

| Type | Description |
|---|---|
| `list[SynthDefMetadata]` | List of SynthDef metadata in same order as input |

Raises:

| Type | Description |
|---|---|
| `FileNotFoundError` | If any file does not exist (fails fast) |
| `ParseError` | On first parse failure (no partial results) |

Example

```python
>>> parser = SynthDefParser()
>>> files = ["/synths/tb303.scd", "/synths/juno.scd"]
>>> results = parser.parse_batch(files, timeout=5)
>>> len(results)
2
```
validate_path(file_path: str) -> bool
¶
Validate that file path is safe and exists.
Checks for:

- File existence
- .scd extension
- No path traversal (`../`)
- Readable permissions
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `file_path` | `str` | Path to validate | *required* |

Returns:

| Type | Description |
|---|---|
| `bool` | True if valid, False otherwise |

Example

```python
>>> parser = SynthDefParser()
>>> parser.validate_path("/synths/tb303.scd")
True
>>> parser.validate_path("/etc/passwd")  # Wrong extension
False
>>> parser.validate_path("../../etc/passwd.scd")  # Traversal
False
```
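The checks listed above can be sketched in plain Python. This is an illustrative implementation of the documented rules, not the library's actual code (the `validate_scd_path` name is hypothetical):

```python
import os
from pathlib import Path

def validate_scd_path(file_path: str) -> bool:
    """Sketch of the documented validation rules."""
    p = Path(file_path)
    # Reject path traversal before touching the filesystem
    if ".." in p.parts:
        return False
    # Require the .scd extension
    if p.suffix != ".scd":
        return False
    # Require an existing, readable file
    return p.is_file() and os.access(p, os.R_OK)
```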
SynthDefStore
¶
Bases: Protocol
Interface for SynthDef storage operations.
Similar to SampleStore but for synthesizer definitions.
add(synthdef: SynthDefMetadata) -> str
¶
Add SynthDef to database.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `synthdef` | `SynthDefMetadata` | Complete SynthDef metadata | *required* |

Returns:

| Type | Description |
|---|---|
| `str` | Synth ID (format: `"synt_{hash[:8]}"`) |

Raises:

| Type | Description |
|---|---|
| `DuplicateSynthError` | If SynthDef with same name already exists |

Example

```python
>>> synthdef = SynthDefMetadata(
...     name="tb303",
...     file_path="/synths/tb303.scd",
...     file_hash="abc123",
...     num_channels=1,
...     has_gate=True,
...     has_envelope=True,
...     ugens_used=["Saw", "Resonz", "EnvGen", "Out"],
...     source_code="SynthDef(...)",
...     controls=[...],
... )
>>> synth_id = store.add(synthdef)
>>> synth_id
"synt_abc123"
```
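The `"synt_{hash[:8]}"` ID format can be illustrated with a small sketch. The `make_synth_id` helper is hypothetical and assumes the hash is a SHA256 hex digest of the source code, consistent with the `file_hash` field documented above:

```python
import hashlib

def make_synth_id(source_code: str) -> str:
    # ID is "synt_" plus the first 8 hex characters of the SHA256 digest
    digest = hashlib.sha256(source_code.encode("utf-8")).hexdigest()
    return f"synt_{digest[:8]}"
```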
delete(synth_id: str) -> bool
¶
Delete SynthDef from database.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `synth_id` | `str` | Synth ID to delete | *required* |

Returns:

| Type | Description |
|---|---|
| `bool` | True if deleted, False if not found |

Example

```python
>>> success = store.delete("synt_abc123")
>>> success
True
```
get(synth_id: str) -> Optional[SynthDefMetadata]
¶
Retrieve SynthDef by ID.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `synth_id` | `str` | Synth ID (format: `"synt_{hash[:8]}"`) | *required* |

Returns:

| Type | Description |
|---|---|
| `Optional[SynthDefMetadata]` | SynthDef metadata if found, None otherwise |

Example

```python
>>> synthdef = store.get("synt_abc123")
>>> synthdef['name']
"tb303"
>>> store.get("synt_nonexistent")
None
```
get_by_name(name: str) -> Optional[SynthDefMetadata]
¶
Retrieve SynthDef by name.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | SynthDef name | *required* |

Returns:

| Type | Description |
|---|---|
| `Optional[SynthDefMetadata]` | SynthDef metadata if found, None otherwise |

Example

```python
>>> synthdef = store.get_by_name("tb303")
>>> synthdef['id']
"synt_abc123"
```
search(category: Optional[str] = None, has_gate: Optional[bool] = None, tags: Optional[list[str]] = None, limit: int = 50, offset: int = 0) -> list[SynthDefMetadata]
¶
Search SynthDefs with filters.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `category` | `Optional[str]` | Filter by category (bass, lead, pad, drum, fx) | `None` |
| `has_gate` | `Optional[bool]` | Filter by gate presence | `None` |
| `tags` | `Optional[list[str]]` | Filter by tags (matches if ANY tag present) | `None` |
| `limit` | `int` | Maximum results to return | `50` |
| `offset` | `int` | Number of results to skip (for pagination) | `0` |

Returns:

| Type | Description |
|---|---|
| `list[SynthDefMetadata]` | List of matching SynthDefs |

Example

```python
>>> # Find bass synths with gate
>>> results = store.search(
...     category="bass",
...     has_gate=True,
...     limit=10,
... )
>>> len(results) <= 10
True
```
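The filter semantics (ANY-tag matching, limit/offset pagination) can be sketched over plain dicts. This illustrates the documented behavior only; the actual store queries a database:

```python
from typing import Any, Optional

def search_sketch(
    synths: list[dict[str, Any]],
    category: Optional[str] = None,
    has_gate: Optional[bool] = None,
    tags: Optional[list[str]] = None,
    limit: int = 50,
    offset: int = 0,
) -> list[dict[str, Any]]:
    def matches(s: dict[str, Any]) -> bool:
        if category is not None and s.get("category") != category:
            return False
        if has_gate is not None and s.get("has_gate") != has_gate:
            return False
        # Tag filter matches if ANY requested tag is present
        if tags is not None and not set(tags) & set(s.get("tags", [])):
            return False
        return True

    hits = [s for s in synths if matches(s)]
    return hits[offset:offset + limit]
```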
update(synth_id: str, updates: dict[str, Any]) -> bool
¶
Update SynthDef fields.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `synth_id` | `str` | Synth ID to update | *required* |
| `updates` | `dict[str, Any]` | Dictionary of field names and new values | *required* |

Returns:

| Type | Description |
|---|---|
| `bool` | True if updated, False if not found |

Example

```python
>>> success = store.update(
...     "synt_abc123",
...     {"category": "lead", "tags": ["acid", "303"]},
... )
>>> success
True
```
categorize_synthdef(info: SynthDefInfo) -> str
¶
Infer category from UGens and controls.
Categories:

- bass: Low-frequency synths with filters (MoogFF, RLPF)
- lead: Pitched synths with envelopes and gate
- pad: Long sustained synths with gate
- drum: Percussive synths without gate or noise-based
- fx: Effect processors, noise generators
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `info` | `SynthDefInfo` | SynthDefInfo to categorize | *required* |

Returns:

| Type | Description |
|---|---|
| `str` | Category string (bass, lead, pad, drum, fx) |

Example

```python
>>> info = SynthDefInfo(...)
>>> categorize_synthdef(info)
'bass'
```
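A minimal sketch of this rule-based approach, using only the UGen list and gate flag (the real function also inspects controls and envelope length; the pad rule is omitted here for brevity):

```python
def categorize_sketch(ugens_used: list[str], has_gate: bool) -> str:
    """Illustrative categorization loosely following the rules above."""
    filters = {"MoogFF", "RLPF"}
    noise = {"WhiteNoise", "PinkNoise"}
    if filters & set(ugens_used):
        return "bass"   # low-frequency synths with filters
    if noise & set(ugens_used) or not has_gate:
        return "drum"   # percussive: noise-based or no gate
    if "EnvGen" in ugens_used:
        return "lead"   # pitched synth with envelope and gate
    return "fx"         # no oscillator/envelope pattern matched
```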
classify_instrument(audio: np.ndarray, sr: int, model_path: Optional[str] = None, top_k: int = 3) -> dict[str, Any]
¶
Classify audio into instrument categories using Essentia's pre-trained models.
Uses the MTG-Jamendo instrument classification model to detect instrument presence. This requires a two-stage pipeline:

1. Extract embeddings using the discogs-effnet base model
2. Pass embeddings to the classification head
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `audio` | `ndarray` | Audio samples as numpy array (mono or stereo) | *required* |
| `sr` | `int` | Sample rate in Hz | *required* |
| `model_path` | `Optional[str]` | Path to custom model file, or None to use default | `None` |
| `top_k` | `int` | Number of top predictions to return | `3` |

Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | Dictionary with keys `instrument_type`, `instrument_confidence`, and `top_predictions` |

Raises:

| Type | Description |
|---|---|
| `ModelLoadError` | If model file not found |
| `AnalysisFailedError` | If classification fails |

Example

```python
>>> result = classify_instrument(audio, 44100)
>>> result['instrument_type']
'drums'
>>> result['instrument_confidence']
0.92
>>> result['top_predictions']
[('drums', 0.92), ('percussion', 0.78), ('beat', 0.45)]
```
clear_cache(model_type: Optional[ModelType] = None) -> None
¶
Clear model cache.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model_type` | `Optional[ModelType]` | Specific model to clear, or None to clear all | `None` |

Example

```python
>>> clear_cache("musicnn")  # Clear specific model
>>> clear_cache()  # Clear all models
```
cosine_similarity(embedding1: list[float], embedding2: list[float]) -> float
¶
Compute cosine similarity between two embeddings.
Since embeddings are L2-normalized, cosine similarity is just the dot product.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `embedding1` | `list[float]` | First embedding (128-dim) | *required* |
| `embedding2` | `list[float]` | Second embedding (128-dim) | *required* |

Returns:

| Type | Description |
|---|---|
| `float` | Cosine similarity in range [-1, 1] (1 = identical, 0 = orthogonal, -1 = opposite) |

Example

```python
>>> emb1 = extract_audio_embedding(audio1, 44100)
>>> emb2 = extract_audio_embedding(audio2, 44100)
>>> similarity = cosine_similarity(emb1, emb2)
>>> 0 <= similarity <= 1  # For typical audio
True
```
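The "just the dot product" claim can be made concrete with a small sketch (hypothetical helper names; pure Python for clarity, where the real function likely uses numpy):

```python
import math

def l2_normalize(v: list[float]) -> list[float]:
    # Scale the vector to unit L2 norm
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_similarity_sketch(a: list[float], b: list[float]) -> float:
    # For L2-normalized vectors, cosine similarity is the plain dot product
    return sum(x * y for x, y in zip(a, b))
```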
download_model(model_type: ModelType, force: bool = False, verify_checksum: bool = True) -> Path
¶
Download an Essentia model from the model zoo.
Downloads to ~/.local/share/audiomancer/models/ and verifies checksum.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model_type` | `ModelType` | Model type to download | *required* |
| `force` | `bool` | Force re-download even if file exists | `False` |
| `verify_checksum` | `bool` | Verify SHA256 checksum after download | `True` |

Returns:

| Type | Description |
|---|---|
| `Path` | Path to downloaded model file |

Raises:

| Type | Description |
|---|---|
| `ModelLoadError` | If download fails or checksum mismatch |

Example

```python
>>> path = download_model("musicnn")
>>> path.exists()
True
>>> path.stat().st_size > 0
True
```
euclidean_distance(embedding1: list[float], embedding2: list[float]) -> float
¶
Compute Euclidean distance between two embeddings.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `embedding1` | `list[float]` | First embedding (128-dim) | *required* |
| `embedding2` | `list[float]` | Second embedding (128-dim) | *required* |

Returns:

| Type | Description |
|---|---|
| `float` | Euclidean distance (lower = more similar) |

Example

```python
>>> emb1 = extract_audio_embedding(audio1, 44100)
>>> emb2 = extract_audio_embedding(audio2, 44100)
>>> distance = euclidean_distance(emb1, emb2)
>>> distance >= 0
True
```
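A minimal sketch of the distance computation (hypothetical helper name; equivalent to the standard library's `math.dist`):

```python
import math

def euclidean_distance_sketch(a: list[float], b: list[float]) -> float:
    # Square root of the sum of squared component differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

For L2-normalized embeddings, distance and cosine similarity carry the same information: the squared distance equals `2 - 2 * cosine_similarity`.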
extract_audio_embedding(audio: np.ndarray, sr: int, model: ModelType = 'musicnn') -> list[float]
¶
Extract 128-dimensional audio embedding for similarity search.
Embeddings are L2-normalized fixed-size vectors that encode audio content. Similar-sounding audio will have similar embeddings (high cosine similarity).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `audio` | `ndarray` | Audio samples as numpy array (mono or stereo) | *required* |
| `sr` | `int` | Sample rate in Hz | *required* |
| `model` | `ModelType` | Embedding model to use: `"musicnn"` (MusiCNN, recommended for music), `"vggish"` (VGGish, general audio), `"openl3"` (OpenL3, environmental sounds) | `'musicnn'` |

Returns:

| Type | Description |
|---|---|
| `list[float]` | List of 128 floats (L2-normalized embedding vector) |

Raises:

| Type | Description |
|---|---|
| `ModelLoadError` | If model file not found |
| `AnalysisFailedError` | If embedding extraction fails |

Example

```python
>>> embedding = extract_audio_embedding(audio, 44100)
>>> len(embedding)
128
>>> # Verify L2 normalization
>>> import math
>>> math.isclose(sum(x**2 for x in embedding), 1.0, abs_tol=1e-6)
True
```
extract_genre_tags(audio: np.ndarray, sr: int, top_k: int = 3, threshold: float = 0.1) -> list[str]
¶
Extract genre tags using Essentia's genre classifiers.
Uses the MTG-Jamendo genre classification model. This requires a two-stage pipeline:

1. Extract embeddings using the discogs-effnet base model
2. Pass embeddings to the classification head
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `audio` | `ndarray` | Audio samples as numpy array (mono or stereo) | *required* |
| `sr` | `int` | Sample rate in Hz | *required* |
| `top_k` | `int` | Maximum number of genre tags to return | `3` |
| `threshold` | `float` | Minimum confidence threshold (0-1) | `0.1` |

Returns:

| Type | Description |
|---|---|
| `list[str]` | List of genre tags sorted by confidence |

Raises:

| Type | Description |
|---|---|
| `ModelLoadError` | If model file not found |
| `AnalysisFailedError` | If classification fails |

Example

```python
>>> genres = extract_genre_tags(audio, 44100)
>>> genres
['techno', 'electronic', 'house']
```
extract_mood_tags(audio: np.ndarray, sr: int, top_k: int = 3, threshold: float = 0.1) -> list[str]
¶
Extract mood/theme tags using Essentia's mood classifiers.
Uses the MTG-Jamendo mood/theme classification model. This requires a two-stage pipeline:

1. Extract embeddings using the discogs-effnet base model
2. Pass embeddings to the classification head
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `audio` | `ndarray` | Audio samples as numpy array (mono or stereo) | *required* |
| `sr` | `int` | Sample rate in Hz | *required* |
| `top_k` | `int` | Maximum number of mood tags to return | `3` |
| `threshold` | `float` | Minimum confidence threshold (0-1) | `0.1` |

Returns:

| Type | Description |
|---|---|
| `list[str]` | List of mood tags sorted by confidence |

Raises:

| Type | Description |
|---|---|
| `ModelLoadError` | If model file not found |
| `AnalysisFailedError` | If classification fails |

Example

```python
>>> moods = extract_mood_tags(audio, 44100)
>>> moods
['dark', 'electronic', 'energetic']
```
extract_rhythm_features(audio: np.ndarray, sr: int) -> dict[str, Any]
¶
Extract rhythm/tempo features using Essentia.
Algorithms:

- RhythmExtractor2013: BPM detection with confidence
- BeatTrackerDegara: Beat positions
- OnsetDetection: Transient detection
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `audio` | `ndarray` | Audio samples (mono or stereo) | *required* |
| `sr` | `int` | Sample rate in Hz | *required* |

Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | Dictionary of rhythm features; keys include `bpm`, `bpm_confidence`, and `is_loop` (see Example) |

Raises:

| Type | Description |
|---|---|
| `AnalysisFailedError` | If extraction fails due to invalid audio data |

Example

```python
>>> y, sr = librosa.load("loop_125bpm.wav", sr=None)
>>> features = extract_rhythm_features(y, sr)
>>> features['bpm']
125.0
>>> features['bpm_confidence']
0.95
>>> features['is_loop']
True
```
extract_spectral_features(audio: np.ndarray, sr: int) -> SpectralFeatures
¶
Extract spectral features using Essentia.
Performs frame-based spectral analysis to extract features that describe the frequency content and energy distribution of the audio signal.
Algorithms used:

- Centroid: Spectral center of mass, indicates brightness (Hz)
- Bandwidth: Frequency spread via 2nd central moment (Hz proxy)
- Rolloff: Frequency below which 85% of energy is contained (Hz)
- ZeroCrossingRate: Time-domain noisiness measure (0-1)
- RMS: Root-mean-square energy level (0-1 linear)
- DynamicRange: Peak-to-RMS ratio in dB
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `audio` | `ndarray` | Audio data as numpy array (mono or stereo) | *required* |
| `sr` | `int` | Sample rate in Hz | *required* |

Returns:

| Type | Description |
|---|---|
| `SpectralFeatures` | Dictionary containing spectral features with units documented |

Raises:

| Type | Description |
|---|---|
| `AnalysisFailedError` | If audio is too short, empty, or analysis fails |

Example

```python
>>> import librosa
>>> y, sr = librosa.load("kick.wav", sr=None)
>>> features = extract_spectral_features(y, sr)
>>> features['spectral_centroid']
1523.5  # Hz - indicates a bright sound
>>> features['rms_energy']
0.45  # Linear energy level
>>> features['dynamic_range']
18.2  # dB peak-to-average ratio
```
extract_tonal_features(audio: np.ndarray, sr: int) -> dict[str, Any]
¶
Extract tonal/pitch features using Essentia.
Algorithms:

- KeyExtractor: Key and scale detection
- PitchYin: Pitch tracking for salience
- TuningFrequency: Reference tuning detection
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `audio` | `ndarray` | Audio samples (mono or stereo) | *required* |
| `sr` | `int` | Sample rate in Hz | *required* |

Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | Dictionary of tonal features; keys include `key` and `pitch_salience` (see Example) |

Raises:

| Type | Description |
|---|---|
| `AnalysisFailedError` | If extraction fails due to invalid audio data |

Example

```python
>>> y, sr = librosa.load("bass_c.wav", sr=None)
>>> features = extract_tonal_features(y, sr)
>>> features['key']
'C'
>>> features['pitch_salience']
0.85
```
get_basic_metadata(path: Path) -> BasicMetadata
¶
Extract basic audio metadata.
Loads audio file using librosa and computes fundamental properties. The file is loaded with its native sample rate to preserve metadata accuracy.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `Path` | Path to audio file | *required* |

Returns:

| Type | Description |
|---|---|
| `BasicMetadata` | Dictionary containing basic metadata (fields listed in the BasicMetadata attributes above) |

Raises:

| Type | Description |
|---|---|
| `UnsupportedFormatError` | If file cannot be loaded by librosa |
| `AnalysisFailedError` | If file is too short or contains invalid audio |

Example

```python
>>> meta = get_basic_metadata(Path("kick.wav"))
>>> meta['duration_ms']
250.5
>>> meta['sample_rate']
44100
>>> meta['channels']
1
```
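Two of the computed fields are simple enough to sketch directly. These helper names are hypothetical illustrations of the documented fields, not the library's internals; note the hash is taken over the raw file bytes, not the decoded audio:

```python
import hashlib

def duration_ms(num_samples: int, sample_rate: int) -> float:
    # Duration in milliseconds from sample count and sample rate
    return num_samples / sample_rate * 1000.0

def file_sha256(data: bytes) -> str:
    # SHA256 hex digest of raw file contents
    return hashlib.sha256(data).hexdigest()
```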
list_models(include_cached_only: bool = False) -> dict[str, dict[str, Any]]
¶
List available models and their status.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `include_cached_only` | `bool` | Only show cached models | `False` |

Returns:

| Type | Description |
|---|---|
| `dict[str, dict[str, Any]]` | Dictionary mapping model type to an info dict with keys such as `cached` (see Example) |

Example

```python
>>> models = list_models()
>>> "musicnn" in models
True
>>> models["musicnn"]["cached"]
True
```
load_model(model_type: ModelType, auto_download: bool = True) -> Path
¶
Load an Essentia model, downloading if necessary.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model_type` | `ModelType` | Model type to load | *required* |
| `auto_download` | `bool` | Automatically download if not cached | `True` |

Returns:

| Type | Description |
|---|---|
| `Path` | Path to model file |

Raises:

| Type | Description |
|---|---|
| `ModelLoadError` | If model not found and `auto_download=False` |

Example

```python
>>> model_path = load_model("musicnn")
>>> model_path.exists()
True
```
parse_synthdef(path: Path, timeout: float = 10.0) -> SynthDefInfo
¶
Parse a SuperCollider SynthDef file using sclang.
Uses sclang subprocess to extract SynthDesc metadata. Falls back to regex parsing if sclang is unavailable or fails.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `Path` | Path to .scd file containing SynthDef | *required* |
| `timeout` | `float` | Maximum time to wait for sclang (seconds) | `10.0` |

Returns:

| Type | Description |
|---|---|
| `SynthDefInfo` | SynthDefInfo with extracted metadata |

Raises:

| Type | Description |
|---|---|
| `SynthDefError` | If SynthDef is invalid or cannot be parsed |
| `SubprocessTimeoutError` | If sclang takes too long |

Example

```python
>>> info = parse_synthdef(Path("synths/tb303.scd"))
>>> info.name
'tb303'
>>> info.controls[0].name
'out'
>>> info.ugens_used
['Saw', 'Pulse', 'Select', 'MoogFF', 'EnvGen', 'Out', 'Lag']
```
Basic Metadata¶
basic
¶
Basic audio metadata extraction for audiomancer.
This module provides functions for extracting fundamental audio file metadata such as duration, sample rate, channel count, and file hash.
BasicMetadata
¶
Bases: TypedDict
Basic audio file metadata.
Attributes:

| Name | Type | Description |
|---|---|---|
| `duration_ms` | `float` | Audio duration in milliseconds |
| `sample_rate` | `int` | Sample rate in Hz |
| `channels` | `int` | Number of audio channels |
| `bit_depth` | `int` | Bit depth (16 assumed for librosa float32 conversion) |
| `file_size_bytes` | `int` | File size in bytes |
| `file_hash` | `str` | SHA256 hex digest of file contents |
Source code in src/audiomancer/analyzers/basic.py
get_basic_metadata(path: Path) -> BasicMetadata
¶
Extract basic audio metadata.
Loads audio file using librosa and computes fundamental properties. The file is loaded with its native sample rate to preserve metadata accuracy.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `Path` | Path to audio file | *required* |

Returns:

| Type | Description |
|---|---|
| `BasicMetadata` | Dictionary containing basic metadata (fields listed in the BasicMetadata attributes above) |

Raises:

| Type | Description |
|---|---|
| `UnsupportedFormatError` | If file cannot be loaded by librosa |
| `AnalysisFailedError` | If file is too short or contains invalid audio |

Example

```python
>>> meta = get_basic_metadata(Path("kick.wav"))
>>> meta['duration_ms']
250.5
>>> meta['sample_rate']
44100
>>> meta['channels']
1
```
Source code in src/audiomancer/analyzers/basic.py
Spectral Features¶
spectral
¶
Spectral feature extraction for audiomancer.
This module provides spectral analysis functions using Essentia for extracting features like spectral centroid, bandwidth, rolloff, and energy.
SpectralFeatures
¶
Bases: TypedDict
Spectral audio features.
All frequency values are in Hz, energy values are linear (0-1 range).
Attributes:

| Name | Type | Description |
|---|---|---|
| `spectral_centroid` | `float` | Mean brightness/center of mass of spectrum (Hz) |
| `spectral_bandwidth` | `float` | Frequency spread around centroid (Hz proxy) |
| `spectral_rolloff` | `float` | High-frequency content cutoff point (Hz) |
| `zero_crossing_rate` | `float` | Measure of noisiness/percussiveness (0-1) |
| `rms_energy` | `float` | Root-mean-square energy level (0-1 linear) |
| `dynamic_range` | `float` | Peak-to-average ratio (dB) |
Source code in src/audiomancer/analyzers/spectral.py
extract_spectral_features(audio: np.ndarray, sr: int) -> SpectralFeatures
¶
Extract spectral features using Essentia.
Performs frame-based spectral analysis to extract features that describe the frequency content and energy distribution of the audio signal.
Algorithms used:

- Centroid: Spectral center of mass, indicates brightness (Hz)
- Bandwidth: Frequency spread via 2nd central moment (Hz proxy)
- Rolloff: Frequency below which 85% of energy is contained (Hz)
- ZeroCrossingRate: Time-domain noisiness measure (0-1)
- RMS: Root-mean-square energy level (0-1 linear)
- DynamicRange: Peak-to-RMS ratio in dB
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `audio` | `ndarray` | Audio data as numpy array (mono or stereo) | *required* |
| `sr` | `int` | Sample rate in Hz | *required* |

Returns:

| Type | Description |
|---|---|
| `SpectralFeatures` | Dictionary containing spectral features with units documented |

Raises:

| Type | Description |
|---|---|
| `AnalysisFailedError` | If audio is too short, empty, or analysis fails |

Example

```python
>>> import librosa
>>> y, sr = librosa.load("kick.wav", sr=None)
>>> features = extract_spectral_features(y, sr)
>>> features['spectral_centroid']
1523.5  # Hz - indicates a bright sound
>>> features['rms_energy']
0.45  # Linear energy level
>>> features['dynamic_range']
18.2  # dB peak-to-average ratio
```
Source code in src/audiomancer/analyzers/spectral.py
Rhythm Features¶
rhythm
¶
Rhythm analysis for audiomancer.
Extracts tempo, beat positions, and loop detection using Essentia.
extract_rhythm_features(audio: np.ndarray, sr: int) -> dict[str, Any]
¶
Extract rhythm/tempo features using Essentia.
Algorithms:

- RhythmExtractor2013: BPM detection with confidence
- BeatTrackerDegara: Beat positions
- OnsetDetection: Transient detection
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `audio` | `ndarray` | Audio samples (mono or stereo) | *required* |
| `sr` | `int` | Sample rate in Hz | *required* |

Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | Dictionary of rhythm features; keys include `bpm`, `bpm_confidence`, and `is_loop` (see Example) |

Raises:

| Type | Description |
|---|---|
| `AnalysisFailedError` | If extraction fails due to invalid audio data |

Example

```python
>>> y, sr = librosa.load("loop_125bpm.wav", sr=None)
>>> features = extract_rhythm_features(y, sr)
>>> features['bpm']
125.0
>>> features['bpm_confidence']
0.95
>>> features['is_loop']
True
```
Source code in src/audiomancer/analyzers/rhythm.py
Audio Embeddings¶
embeddings
¶
Audio embedding extraction for audiomancer.
This module provides functions for extracting fixed-dimension audio embeddings for similarity search and clustering using pre-trained Essentia models.
All embeddings are L2-normalized 128-dimensional vectors.
Includes SimilarityIndex for fast nearest-neighbor search using FAISS.
SimilarityIndex
¶
Fast similarity search index using FAISS.
Enables efficient nearest-neighbor search across thousands of audio embeddings. Uses IndexFlatIP (inner product) which equals cosine similarity for L2-normalized vectors.
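IndexFlatIP works for this purpose because the inner product of two unit-length vectors is exactly their cosine similarity. A quick pure-Python check of that identity on toy 2-d vectors (no FAISS required):

```python
import math

def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a = l2_normalize([3.0, 4.0])  # -> [0.6, 0.8]
b = l2_normalize([4.0, 3.0])  # -> [0.8, 0.6]

# Cosine similarity = dot(a, b) / (|a| * |b|); for unit vectors the
# denominator is 1, so it collapses to the plain inner product.
cos = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
print(math.isclose(dot(a, b), cos))  # True
print(round(dot(a, b), 4))           # 0.96
```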
Example
```python
>>> # Build index from embeddings
>>> embeddings = [extract_audio_embedding(audio, sr) for audio in samples]
>>> index = SimilarityIndex()
>>> index.add(embeddings)

>>> # Search for similar samples
>>> query = extract_audio_embedding(query_audio, sr)
>>> similarities, indices = index.search(query, k=5)

>>> # Save/load for persistence
>>> index.save("samples.index")
>>> loaded = SimilarityIndex.load("samples.index")
```
Source code in src/audiomancer/analyzers/embeddings.py
dimension: int
property
¶
Get the embedding dimension.
ntotal: int
property
¶
Get the number of embeddings in the index.
__init__(dimension: int = 128)
¶
Create a new similarity index.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| dimension | int | Embedding dimension (default 128 for audiomancer) | 128 |
Raises:

| Type | Description |
|---|---|
| ImportError | If faiss-cpu is not installed |
Source code in src/audiomancer/analyzers/embeddings.py
add(embeddings: list[list[float]]) -> None
¶
Add embeddings to the index.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| embeddings | list[list[float]] | List of embedding vectors (each 128-dim, L2-normalized) | required |
Raises:

| Type | Description |
|---|---|
| ValueError | If embeddings have wrong dimension |
Source code in src/audiomancer/analyzers/embeddings.py
load(path: str | Path) -> SimilarityIndex
classmethod
¶
Load an index from disk.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| path | str \| Path | Path to the index file | required |
Returns:

| Type | Description |
|---|---|
| SimilarityIndex | Loaded SimilarityIndex |
Raises:

| Type | Description |
|---|---|
| ImportError | If faiss-cpu is not installed |
| FileNotFoundError | If index file doesn't exist |
Source code in src/audiomancer/analyzers/embeddings.py
save(path: str | Path) -> None
¶
Save the index to disk.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| path | str \| Path | Path to save the index file | required |
search(query: list[float], k: int = 5) -> tuple[list[float], list[int]]
¶
Search for k most similar embeddings.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| query | list[float] | Query embedding (128-dim, L2-normalized) | required |
| k | int | Number of nearest neighbors to return | 5 |
Returns:

| Type | Description |
|---|---|
| tuple[list[float], list[int]] | Tuple of (similarities, indices): similarity scores and index positions of the k nearest embeddings |
Raises:

| Type | Description |
|---|---|
| ValueError | If query has wrong dimension or index is empty |
Source code in src/audiomancer/analyzers/embeddings.py
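For a handful of vectors, the ranking that `search` delegates to FAISS can be sketched in pure Python: score each stored embedding by its inner product with the query and keep the top k. This illustrates the search semantics only, not the library's implementation:

```python
def brute_force_search(stored, query, k=5):
    """Rank stored embeddings by inner product with the query.

    Mirrors what an IndexFlatIP lookup computes, but in O(n * d)
    pure Python. Returns (similarities, indices) in the same shape
    as SimilarityIndex.search.
    """
    scores = [
        (sum(q * x for q, x in zip(query, emb)), i)
        for i, emb in enumerate(stored)
    ]
    scores.sort(key=lambda pair: pair[0], reverse=True)
    top = scores[:k]
    return [s for s, _ in top], [i for _, i in top]

# Three toy 2-d "embeddings", each already unit length.
stored = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
sims, idxs = brute_force_search(stored, [1.0, 0.0], k=2)
print(idxs)  # [0, 2] -- exact match first, then the nearest neighbor
```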
cosine_similarity(embedding1: list[float], embedding2: list[float]) -> float
¶
Compute cosine similarity between two embeddings.
Since embeddings are L2-normalized, cosine similarity is just the dot product.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| embedding1 | list[float] | First embedding (128-dim) | required |
| embedding2 | list[float] | Second embedding (128-dim) | required |
Returns:

| Type | Description |
|---|---|
| float | Cosine similarity in range [-1, 1] (1 = identical, 0 = orthogonal, -1 = opposite) |
Example

```python
>>> emb1 = extract_audio_embedding(audio1, 44100)
>>> emb2 = extract_audio_embedding(audio2, 44100)
>>> similarity = cosine_similarity(emb1, emb2)
>>> 0 <= similarity <= 1  # For typical audio
True
```
Source code in src/audiomancer/analyzers/embeddings.py
euclidean_distance(embedding1: list[float], embedding2: list[float]) -> float
¶
Compute Euclidean distance between two embeddings.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| embedding1 | list[float] | First embedding (128-dim) | required |
| embedding2 | list[float] | Second embedding (128-dim) | required |
Returns:

| Type | Description |
|---|---|
| float | Euclidean distance (lower = more similar) |
Example

```python
>>> emb1 = extract_audio_embedding(audio1, 44100)
>>> emb2 = extract_audio_embedding(audio2, 44100)
>>> distance = euclidean_distance(emb1, emb2)
>>> distance >= 0
True
```
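Because the embeddings are L2-normalized, Euclidean distance and cosine similarity rank neighbors identically: for unit vectors, dist**2 = 2 - 2*cos, so ascending distance gives the same order as descending similarity. A quick numeric check of that identity on toy unit-length vectors:

```python
import math

a = [0.6, 0.8]  # unit length: 0.36 + 0.64 = 1
b = [0.8, 0.6]

cos = sum(x * y for x, y in zip(a, b))                      # dot product = cosine for unit vectors
dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))   # Euclidean distance

# For unit vectors: dist**2 == 2 - 2 * cos
print(math.isclose(dist ** 2, 2 - 2 * cos))  # True
```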
Source code in src/audiomancer/analyzers/embeddings.py
extract_audio_embedding(audio: np.ndarray, sr: int, model: ModelType = 'musicnn') -> list[float]
¶
Extract 128-dimensional audio embedding for similarity search.
Embeddings are L2-normalized fixed-size vectors that encode audio content. Similar-sounding audio will have similar embeddings (high cosine similarity).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| audio | ndarray | Audio samples as numpy array (mono or stereo) | required |
| sr | int | Sample rate in Hz | required |
| model | ModelType | Embedding model to use: "musicnn" (MusiCNN, recommended for music), "vggish" (VGGish, general audio), or "openl3" (OpenL3, environmental sounds) | 'musicnn' |
Returns:

| Type | Description |
|---|---|
| list[float] | List of 128 floats (L2-normalized embedding vector) |
Raises:

| Type | Description |
|---|---|
| ModelLoadError | If model file not found |
| AnalysisFailedError | If embedding extraction fails |
Example

```python
>>> embedding = extract_audio_embedding(audio, 44100)
>>> len(embedding)
128

>>> # Verify L2 normalization
>>> import math
>>> math.isclose(sum(x**2 for x in embedding), 1.0, abs_tol=1e-6)
True
```
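The normalization property the example verifies can be reproduced on any vector. A minimal sketch of what L2 normalization does (illustrative helper, not the library's internal code):

```python
import math

def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector so its Euclidean length is exactly 1."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

raw = [2.0] * 128                      # stand-in for a raw 128-dim model output
emb = l2_normalize(raw)
# The squared components of a unit vector sum to 1.
print(math.isclose(sum(x * x for x in emb), 1.0, abs_tol=1e-6))  # True
```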
Source code in src/audiomancer/analyzers/embeddings.py