Analyzers API Reference

The audiomancer.analyzers module provides audio analysis functionality.

Overview

analyzers

Audio analysis for audiomancer.

Provides interfaces for SynthDef parsing and audio feature extraction, including basic metadata, spectral, rhythm, tonal analysis, ML classification, and audio embeddings.

__all__ = ['ControlSpec', 'SynthControl', 'SynthDefMetadata', 'SynthDefParser', 'SynthDefStore', 'parse_synthdef', 'categorize_synthdef', 'SynthDefInfo', 'SynthDefControl', 'get_basic_metadata', 'BasicMetadata', 'extract_spectral_features', 'SpectralFeatures', 'extract_rhythm_features', 'extract_tonal_features', 'classify_instrument', 'extract_mood_tags', 'extract_genre_tags', 'extract_audio_embedding', 'cosine_similarity', 'euclidean_distance', 'load_model', 'download_model', 'list_models', 'clear_cache'] module-attribute

BasicMetadata

Bases: TypedDict

Basic audio file metadata.

Attributes:

Name Type Description
duration_ms float

Audio duration in milliseconds

sample_rate int

Sample rate in Hz

channels int

Number of audio channels

bit_depth int

Bit depth (16 assumed for librosa float32 conversion)

file_size_bytes int

File size in bytes

file_hash str

SHA256 hex digest of file contents

ControlSpec

Bases: TypedDict

Specification for a SynthDef control parameter.

Describes the valid range and characteristics of a synth control.

Example

>>> spec = ControlSpec(
...     min=200.0,
...     max=4000.0,
...     default=1200.0,
...     warp="exp",  # Exponential scaling for frequency
...     step=1.0,
... )
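The warp field controls how a normalized knob position maps onto the min/max range. A minimal sketch of the usual interpretation of "linear" and "exp" warps; the map_warp helper below is illustrative, not part of the audiomancer API:

```python
import math

def map_warp(position: float, lo: float, hi: float, warp: str = "linear") -> float:
    """Map a normalized position (0-1) into [lo, hi] using the given warp."""
    if warp == "exp":
        # Exponential warp: equal knob fractions cover equal frequency
        # ratios, which suits pitch/cutoff parameters (requires lo > 0).
        return lo * math.exp(position * math.log(hi / lo))
    # Linear warp: equal knob fractions cover equal differences.
    return lo + position * (hi - lo)

print(map_warp(0.5, 200.0, 4000.0, "linear"))  # midpoint by difference: 2100.0
print(map_warp(0.5, 200.0, 4000.0, "exp"))     # midpoint by ratio: ~894.4
```

With an exponential warp, the half-way knob position lands near 894 Hz rather than 2100 Hz, matching how we hear pitch.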

SpectralFeatures

Bases: TypedDict

Spectral audio features.

All frequency values are in Hz, energy values are linear (0-1 range).

Attributes:

Name Type Description
spectral_centroid float

Mean brightness/center of mass of spectrum (Hz)

spectral_bandwidth float

Frequency spread around centroid (Hz proxy)

spectral_rolloff float

High-frequency content cutoff point (Hz)

zero_crossing_rate float

Measure of noisiness/percussiveness (0-1)

rms_energy float

Root-mean-square energy level (0-1 linear)

dynamic_range float

Peak-to-average ratio (dB)

SynthControl

Bases: TypedDict

A control parameter extracted from a SynthDef.

Represents a parameter that can be modified during synthesis.

Example

>>> control = SynthControl(
...     name="cutoff",
...     default_value=1200.0,
...     spec=ControlSpec(
...         min=200.0,
...         max=4000.0,
...         default=1200.0,
...         warp="exp",
...         step=1.0,
...     ),
... )

SynthDefControl dataclass

A SynthDef control parameter.

Attributes:

Name Type Description
name str

Parameter name (e.g., "freq", "cutoff")

default_value float

Default value for the parameter

spec Optional[str]

ControlSpec if specified (e.g., "\freq.asSpec")

description Optional[str]

Human-readable description (if available)

Example

>>> ctrl = SynthDefControl(name="freq", default_value=440.0)
>>> ctrl.name
'freq'
>>> ctrl.default_value
440.0

SynthDefInfo dataclass

Parsed SynthDef metadata.

Attributes:

Name Type Description
name str

SynthDef name (e.g., "tb303", "simple_sine")

file_path str

Absolute path to .scd file

file_hash str

SHA256 hash of source code

num_channels int

Output channel count

has_gate bool

Whether synth has gate parameter for note-off

has_envelope bool

Whether synth uses EnvGen

ugens_used list[str]

List of UGen class names used

controls list[SynthControl]

List of control parameters

source_code str

Raw SuperCollider source code

category Optional[str]

Inferred category (bass, lead, pad, drum, fx)

tags list[str]

Additional tags for categorization

Example

>>> info = SynthDefInfo(
...     name="simple_sine",
...     file_path="/path/to/simple_sine.scd",
...     file_hash="abc123",
...     num_channels=2,
...     has_gate=True,
...     has_envelope=True,
...     ugens_used=["SinOsc", "EnvGen", "Out"],
...     controls=[SynthControl("freq", 440.0)],
...     source_code="SynthDef(...)",
...     category="lead",
... )

SynthDefMetadata

Bases: TypedDict

Metadata extracted from a SynthDef file.

Contains all information parsed from a .scd file including controls, UGens used, and categorization.

Example

>>> synthdef = SynthDefMetadata(
...     id="synt_tb303",
...     name="tb303",
...     file_path="/synths/tb303.scd",
...     file_hash="abc123",
...     num_channels=1,
...     has_gate=True,
...     has_envelope=True,
...     ugens_used=["SinOsc", "Resonz", "EnvGen", "Out"],
...     category="bass",
...     tags=["acid", "303", "classic"],
...     source_code="SynthDef(\tb303, { ... })",
...     controls=[
...         SynthControl(
...             name="cutoff",
...             default_value=1200.0,
...             spec=ControlSpec(
...                 min=200.0,
...                 max=4000.0,
...                 default=1200.0,
...                 warp="exp",
...                 step=1.0,
...             ),
...         ),
...         SynthControl(
...             name="resonance",
...             default_value=0.7,
...             spec=ControlSpec(
...                 min=0.0,
...                 max=1.0,
...                 default=0.7,
...                 warp="linear",
...                 step=0.01,
...             ),
...         ),
...     ],
... )

SynthDefParser

Bases: Protocol

Interface for parsing SuperCollider SynthDef files.

Extracts controls, UGens, and metadata from .scd files using sclang.

extract_controls(source_code: str) -> list[SynthControl]

Extract control parameters from SynthDef source code.

Parses arg declarations and infers specs from usage.

Parameters:

Name Type Description Default
source_code str

SuperCollider source code

required

Returns:

Type Description
list[SynthControl]

List of extracted controls

Example

>>> code = '''
... SynthDef(\tb303, { |cutoff=1200, resonance=0.7|
...     var sig = Saw.ar(freq);
...     sig = Resonz.ar(sig, cutoff, 1/resonance);
...     Out.ar(0, sig);
... })
... '''
>>> controls = parser.extract_controls(code)
>>> controls
[
    SynthControl(name="cutoff", default_value=1200.0, ...),
    SynthControl(name="resonance", default_value=0.7, ...),
]
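The real parser runs the source through sclang, but the arg-declaration part can be approximated with a regex over the `|name=default, ...|` block. A toy sketch of that fallback idea (extract_controls_sketch is illustrative, not the library's implementation):

```python
import re

def extract_controls_sketch(source_code: str) -> list[dict]:
    """Pull |name=default, ...| declarations out of SynthDef source (rough sketch)."""
    controls: list[dict] = []
    # Match the argument block between the first pair of pipes:
    # |cutoff=1200, resonance=0.7|
    block = re.search(r"\|([^|]*)\|", source_code)
    if not block:
        return controls
    for part in block.group(1).split(","):
        m = re.match(r"\s*(\w+)\s*(?:=\s*([-\d.]+))?\s*$", part)
        if m:
            controls.append({
                "name": m.group(1),
                "default_value": float(m.group(2)) if m.group(2) else 0.0,
            })
    return controls

code = r"SynthDef(\tb303, { |cutoff=1200, resonance=0.7| ... })"
print(extract_controls_sketch(code))
# [{'name': 'cutoff', 'default_value': 1200.0}, {'name': 'resonance', 'default_value': 0.7}]
```

A regex cannot handle every SuperCollider declaration style (e.g. `arg` keyword syntax or expression defaults), which is why sclang is preferred when available.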

extract_ugens(source_code: str) -> list[str]

Extract UGen class names from source code.

Finds all UGen.ar() and UGen.kr() calls.

Parameters:

Name Type Description Default
source_code str

SuperCollider source code

required

Returns:

Type Description
list[str]

List of unique UGen class names

Example

>>> code = '''
... SynthDef(\tb303, {
...     var sig = Saw.ar(440);
...     sig = Resonz.ar(sig, 1200);
...     Out.ar(0, sig);
... })
... '''
>>> ugens = parser.extract_ugens(code)
>>> ugens
['Saw', 'Resonz', 'Out']
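Since UGens are invoked as capitalized class names followed by `.ar` or `.kr`, the scan reduces to a simple pattern match. A minimal sketch of that idea (extract_ugens_sketch is illustrative, not the library's implementation):

```python
import re

def extract_ugens_sketch(source_code: str) -> list[str]:
    """Collect unique UGen class names from .ar()/.kr() calls, in order of first use."""
    seen: list[str] = []
    # UGen classes are capitalized identifiers followed by .ar or .kr
    for name in re.findall(r"\b([A-Z]\w*)\.(?:ar|kr)\b", source_code):
        if name not in seen:
            seen.append(name)
    return seen

code = """
SynthDef(\\tb303, {
    var sig = Saw.ar(440);
    sig = Resonz.ar(sig, 1200);
    Out.ar(0, sig);
})
"""
print(extract_ugens_sketch(code))  # ['Saw', 'Resonz', 'Out']
```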

infer_category(metadata: SynthDefMetadata) -> str

Infer synth category from UGens and controls.

Categorization rules:

- bass: Low-pass filter + low frequency range
- lead: High resonance + envelope
- pad: Long envelope + multiple oscillators
- drum: Noise + short envelope
- fx: No oscillators, effect UGens only

Parameters:

Name Type Description Default
metadata SynthDefMetadata

Parsed SynthDef metadata

required

Returns:

Type Description
str

Category string

Example

>>> metadata = SynthDefMetadata(
...     ugens_used=["Saw", "Resonz", "EnvGen"],
...     controls=[
...         SynthControl(name="cutoff", default_value=1200, ...),
...     ],
...     ...
... )
>>> category = parser.infer_category(metadata)
>>> category
'bass'
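The rule table above amounts to an ordered cascade of UGen/control checks. A toy version of that cascade (the UGen sets and rule order here are illustrative assumptions; the real heuristics are more involved):

```python
def infer_category_sketch(ugens: list[str], control_names: list[str]) -> str:
    """Toy rule cascade over UGen names and control names."""
    oscillators = {"SinOsc", "Saw", "Pulse", "LFSaw", "LFTri"}
    noise = {"WhiteNoise", "PinkNoise", "BrownNoise"}
    lowpass = {"MoogFF", "RLPF", "LPF", "Resonz"}

    has_osc = any(u in oscillators for u in ugens)
    if not has_osc:
        return "fx"        # no oscillators: effect/noise processor
    if any(u in noise for u in ugens):
        return "drum"      # noise component suggests percussion
    if any(u in lowpass for u in ugens) and "cutoff" in control_names:
        return "bass"      # filtered synth with a cutoff control
    if "EnvGen" in ugens:
        return "lead"      # pitched synth with an envelope
    return "pad"

print(infer_category_sketch(["Saw", "Resonz", "EnvGen"], ["cutoff", "resonance"]))  # bass
```

Rule order matters: checks are applied most-specific first, and "pad" acts as the fallthrough.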

parse(file_path: str, timeout: int = 5) -> SynthDefMetadata

Parse a SynthDef file and extract metadata.

Uses subprocess to run sclang with shell=False and timeout for safety. Falls back to binary parser if sclang fails.

Parameters:

Name Type Description Default
file_path str

Absolute path to .scd file

required
timeout int

Maximum time to wait for sclang (seconds)

5

Returns:

Type Description
SynthDefMetadata

Complete SynthDef metadata

Raises:

Type Description
FileNotFoundError

If file_path does not exist

ParseError

If file cannot be parsed (invalid syntax)

SubprocessTimeoutError

If sclang exceeds timeout

Example

>>> parser = SynthDefParser()
>>> metadata = parser.parse("/synths/tb303.scd", timeout=5)
>>> metadata['name']
'tb303'
>>> len(metadata['controls'])
7
>>> metadata['controls'][0]['name']
'cutoff'

parse_batch(file_paths: list[str], timeout: int = 5) -> list[SynthDefMetadata]

Parse multiple SynthDef files in batch.

More efficient than individual parse() calls for many files.

Parameters:

Name Type Description Default
file_paths list[str]

List of absolute paths to .scd files

required
timeout int

Maximum time per file (seconds)

5

Returns:

Type Description
list[SynthDefMetadata]

List of SynthDef metadata in same order as input

Raises:

Type Description
FileNotFoundError

If any file does not exist (fails fast)

ParseError

On first parse failure (no partial results)

Example

>>> parser = SynthDefParser()
>>> files = ["/synths/tb303.scd", "/synths/juno.scd"]
>>> results = parser.parse_batch(files, timeout=5)
>>> len(results)
2

validate_path(file_path: str) -> bool

Validate that file path is safe and exists.

Checks for:

- File existence
- .scd extension
- No path traversal (../)
- Readable permissions

Parameters:

Name Type Description Default
file_path str

Path to validate

required

Returns:

Type Description
bool

True if valid, False otherwise

Example

>>> parser = SynthDefParser()
>>> parser.validate_path("/synths/tb303.scd")
True
>>> parser.validate_path("/etc/passwd")  # Wrong extension
False
>>> parser.validate_path("../../etc/passwd.scd")  # Traversal
False
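The four checks above translate directly into pathlib/os calls. A minimal sketch of that validation order (validate_path_sketch is illustrative, not the library's implementation):

```python
import os
from pathlib import Path

def validate_path_sketch(file_path: str) -> bool:
    """Illustrative path check: traversal, extension, existence, readability."""
    path = Path(file_path)
    if ".." in path.parts:           # reject explicit traversal segments
        return False
    if path.suffix != ".scd":        # only SuperCollider source files
        return False
    if not path.is_file():           # must exist and be a regular file
        return False
    return os.access(path, os.R_OK)  # must be readable

print(validate_path_sketch("../../etc/passwd.scd"))  # False (traversal)
print(validate_path_sketch("/etc/passwd"))           # False (wrong extension)
```

A hardened implementation would also resolve symlinks (`Path.resolve()`) and confirm the result stays inside an allowed base directory.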

SynthDefStore

Bases: Protocol

Interface for SynthDef storage operations.

Similar to SampleStore but for synthesizer definitions.

add(synthdef: SynthDefMetadata) -> str

Add SynthDef to database.

Parameters:

Name Type Description Default
synthdef SynthDefMetadata

Complete SynthDef metadata

required

Returns:

Type Description
str

Synth ID (format: "synt_{hash[:8]}")

Raises:

Type Description
DuplicateSynthError

If SynthDef with same name already exists

Example

>>> synthdef = SynthDefMetadata(
...     name="tb303",
...     file_path="/synths/tb303.scd",
...     file_hash="abc123",
...     num_channels=1,
...     has_gate=True,
...     has_envelope=True,
...     ugens_used=["Saw", "Resonz", "EnvGen", "Out"],
...     source_code="SynthDef(...)",
...     controls=[...],
... )
>>> synth_id = store.add(synthdef)
>>> synth_id
'synt_abc123'
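The documented ID format is "synt_{hash[:8]}". A minimal sketch of deriving such an ID, assuming the hash is the SHA256 of the SynthDef source (as used for file_hash elsewhere in this module); make_synth_id is illustrative, not part of the API:

```python
import hashlib

def make_synth_id(source_code: str) -> str:
    """Derive a stable 'synt_{hash[:8]}' ID from SynthDef source text."""
    digest = hashlib.sha256(source_code.encode("utf-8")).hexdigest()
    return f"synt_{digest[:8]}"

sid = make_synth_id("SynthDef(\\tb303, { ... })")
print(sid)  # e.g. 'synt_' followed by 8 hex characters
```

Content-derived IDs make add() naturally idempotent for identical sources, which is why a name collision raises DuplicateSynthError rather than silently overwriting.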

delete(synth_id: str) -> bool

Delete SynthDef from database.

Parameters:

Name Type Description Default
synth_id str

Synth ID to delete

required

Returns:

Type Description
bool

True if deleted, False if not found

Example

>>> success = store.delete("synt_abc123")
>>> success
True

get(synth_id: str) -> Optional[SynthDefMetadata]

Retrieve SynthDef by ID.

Parameters:

Name Type Description Default
synth_id str

Synth ID (format: "synt_{hash[:8]}")

required

Returns:

Type Description
Optional[SynthDefMetadata]

SynthDef metadata if found, None otherwise

Example

>>> synthdef = store.get("synt_abc123")
>>> synthdef['name']
'tb303'
>>> store.get("synt_nonexistent")
None

get_by_name(name: str) -> Optional[SynthDefMetadata]

Retrieve SynthDef by name.

Parameters:

Name Type Description Default
name str

SynthDef name

required

Returns:

Type Description
Optional[SynthDefMetadata]

SynthDef metadata if found, None otherwise

Example

>>> synthdef = store.get_by_name("tb303")
>>> synthdef['id']
'synt_abc123'

search(category: Optional[str] = None, has_gate: Optional[bool] = None, tags: Optional[list[str]] = None, limit: int = 50, offset: int = 0) -> list[SynthDefMetadata]

Search SynthDefs with filters.

Parameters:

Name Type Description Default
category Optional[str]

Filter by category (bass, lead, pad, drum, fx)

None
has_gate Optional[bool]

Filter by gate presence

None
tags Optional[list[str]]

Filter by tags (matches if ANY tag present)

None
limit int

Maximum results to return

50
offset int

Number of results to skip (for pagination)

0

Returns:

Type Description
list[SynthDefMetadata]

List of matching SynthDefs

Example
>>> # Find bass synths with gate
>>> results = store.search(
...     category="bass",
...     has_gate=True,
...     limit=10,
... )
>>> len(results) <= 10
True
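The filter semantics (exact category/gate match, ANY-tag match, then offset/limit pagination) can be sketched over an in-memory list; search_sketch below is an illustrative model of those semantics, not the store's actual query implementation:

```python
from typing import Any, Optional

def search_sketch(
    synths: list[dict[str, Any]],
    category: Optional[str] = None,
    has_gate: Optional[bool] = None,
    tags: Optional[list[str]] = None,
    limit: int = 50,
    offset: int = 0,
) -> list[dict[str, Any]]:
    """In-memory version of the documented filters: None means 'no filter'."""
    results = []
    for s in synths:
        if category is not None and s.get("category") != category:
            continue
        if has_gate is not None and s.get("has_gate") != has_gate:
            continue
        # Tag filter matches if ANY requested tag is present
        if tags is not None and not set(tags) & set(s.get("tags", [])):
            continue
        results.append(s)
    return results[offset:offset + limit]

synths = [
    {"name": "tb303", "category": "bass", "has_gate": True, "tags": ["acid"]},
    {"name": "kick", "category": "drum", "has_gate": False, "tags": []},
]
print([s["name"] for s in search_sketch(synths, category="bass", has_gate=True)])  # ['tb303']
```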

update(synth_id: str, updates: dict[str, Any]) -> bool

Update SynthDef fields.

Parameters:

Name Type Description Default
synth_id str

Synth ID to update

required
updates dict[str, Any]

Dictionary of field names and new values

required

Returns:

Type Description
bool

True if updated, False if not found

Example

>>> success = store.update(
...     "synt_abc123",
...     {"category": "lead", "tags": ["acid", "303"]},
... )
>>> success
True

categorize_synthdef(info: SynthDefInfo) -> str

Infer category from UGens and controls.

Categories:

- bass: Low-frequency synths with filters (MoogFF, RLPF)
- lead: Pitched synths with envelopes and gate
- pad: Long sustained synths with gate
- drum: Percussive synths without gate or noise-based
- fx: Effect processors, noise generators

Parameters:

Name Type Description Default
info SynthDefInfo

SynthDefInfo to categorize

required

Returns:

Type Description
str

Category string (bass, lead, pad, drum, fx)

Example

>>> info = SynthDefInfo(...)
>>> categorize_synthdef(info)
'bass'

classify_instrument(audio: np.ndarray, sr: int, model_path: Optional[str] = None, top_k: int = 3) -> dict[str, Any]

Classify audio into instrument categories using Essentia's pre-trained models.

Uses the MTG-Jamendo instrument classification model to detect instrument presence. This requires a two-stage pipeline:

1. Extract embeddings using the discogs-effnet base model
2. Pass embeddings to the classification head

Parameters:

Name Type Description Default
audio ndarray

Audio samples as numpy array (mono or stereo)

required
sr int

Sample rate in Hz

required
model_path Optional[str]

Path to custom model file, or None to use default

None
top_k int

Number of top predictions to return

3

Returns:

Type Description
dict[str, Any]

Dictionary with keys:
  • instrument_type: Most likely instrument (str)
  • instrument_confidence: Confidence score 0-1 (float)
  • top_predictions: List of (instrument, confidence) tuples

Raises:

Type Description
ModelLoadError

If model file not found

AnalysisFailedError

If classification fails

Example

>>> result = classify_instrument(audio, 44100)
>>> result['instrument_type']
'drums'
>>> result['instrument_confidence']
0.92
>>> result['top_predictions']
[('drums', 0.92), ('percussion', 0.78), ('beat', 0.45)]

clear_cache(model_type: Optional[ModelType] = None) -> None

Clear model cache.

Parameters:

Name Type Description Default
model_type Optional[ModelType]

Specific model to clear, or None to clear all

None
Example

>>> clear_cache("musicnn")  # Clear specific model
>>> clear_cache()  # Clear all models

cosine_similarity(embedding1: list[float], embedding2: list[float]) -> float

Compute cosine similarity between two embeddings.

Since embeddings are L2-normalized, cosine similarity is just the dot product.

Parameters:

Name Type Description Default
embedding1 list[float]

First embedding (128-dim)

required
embedding2 list[float]

Second embedding (128-dim)

required

Returns:

Type Description
float

Cosine similarity in range [-1, 1] (1 = identical, 0 = orthogonal, -1 = opposite)

Example

>>> emb1 = extract_audio_embedding(audio1, 44100)
>>> emb2 = extract_audio_embedding(audio2, 44100)
>>> similarity = cosine_similarity(emb1, emb2)
>>> 0 <= similarity <= 1  # For typical audio
True
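Because the embeddings are already unit-length, the similarity really is a single dot product. A self-contained sketch of that identity (the helper names are illustrative, not the audiomancer functions):

```python
import math

def l2_normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_similarity_sketch(a: list[float], b: list[float]) -> float:
    """For L2-normalized vectors, cosine similarity reduces to the dot product."""
    return sum(x * y for x, y in zip(a, b))

a = l2_normalize([1.0, 2.0, 3.0])
b = l2_normalize([3.0, 2.0, 1.0])
print(cosine_similarity_sketch(a, a))  # ~1.0 (identical direction)
print(cosine_similarity_sketch(a, b))  # < 1.0 (different direction)
```

Skipping the per-call norm computation matters when comparing one query embedding against thousands of stored vectors.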

download_model(model_type: ModelType, force: bool = False, verify_checksum: bool = True) -> Path

Download an Essentia model from the model zoo.

Downloads to ~/.local/share/audiomancer/models/ and verifies checksum.

Parameters:

Name Type Description Default
model_type ModelType

Model type to download

required
force bool

Force re-download even if file exists

False
verify_checksum bool

Verify SHA256 checksum after download

True

Returns:

Type Description
Path

Path to downloaded model file

Raises:

Type Description
ModelLoadError

If download fails or checksum mismatch

Example

>>> path = download_model("musicnn")
>>> path.exists()
True
>>> path.stat().st_size > 0
True

euclidean_distance(embedding1: list[float], embedding2: list[float]) -> float

Compute Euclidean distance between two embeddings.

Parameters:

Name Type Description Default
embedding1 list[float]

First embedding (128-dim)

required
embedding2 list[float]

Second embedding (128-dim)

required

Returns:

Type Description
float

Euclidean distance (lower = more similar)

Example

>>> emb1 = extract_audio_embedding(audio1, 44100)
>>> emb2 = extract_audio_embedding(audio2, 44100)
>>> distance = euclidean_distance(emb1, emb2)
>>> distance >= 0
True
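For unit-norm embeddings, Euclidean distance and cosine similarity are tied by d² = 2 − 2·cos, so ranking by distance simply inverts ranking by similarity. A small sketch of both the metric and the identity (euclidean_distance_sketch is illustrative, not the library function):

```python
import math

def euclidean_distance_sketch(a: list[float], b: list[float]) -> float:
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two orthogonal unit vectors: cosine similarity is 0,
# so the identity predicts d = sqrt(2 - 0) = sqrt(2).
a = [1.0, 0.0]
b = [0.0, 1.0]
cos = sum(x * y for x, y in zip(a, b))
print(euclidean_distance_sketch(a, b))  # 1.4142...
print(math.sqrt(2 - 2 * cos))           # same value via the identity
```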

extract_audio_embedding(audio: np.ndarray, sr: int, model: ModelType = 'musicnn') -> list[float]

Extract 128-dimensional audio embedding for similarity search.

Embeddings are L2-normalized fixed-size vectors that encode audio content. Similar-sounding audio will have similar embeddings (high cosine similarity).

Parameters:

Name Type Description Default
audio ndarray

Audio samples as numpy array (mono or stereo)

required
sr int

Sample rate in Hz

required
model ModelType

Embedding model to use:

- "musicnn": MusiCNN embeddings (recommended for music)
- "vggish": VGGish embeddings (general audio)
- "openl3": OpenL3 embeddings (environmental sounds)

'musicnn'

Returns:

Type Description
list[float]

List of 128 floats (L2-normalized embedding vector)

Raises:

Type Description
ModelLoadError

If model file not found

AnalysisFailedError

If embedding extraction fails

Example

>>> embedding = extract_audio_embedding(audio, 44100)
>>> len(embedding)
128
>>> # Verify L2 normalization
>>> import math
>>> math.isclose(sum(x**2 for x in embedding), 1.0, abs_tol=1e-6)
True

extract_genre_tags(audio: np.ndarray, sr: int, top_k: int = 3, threshold: float = 0.1) -> list[str]

Extract genre tags using Essentia's genre classifiers.

Uses the MTG-Jamendo genre classification model. This requires a two-stage pipeline:

1. Extract embeddings using the discogs-effnet base model
2. Pass embeddings to the classification head

Parameters:

Name Type Description Default
audio ndarray

Audio samples as numpy array (mono or stereo)

required
sr int

Sample rate in Hz

required
top_k int

Maximum number of genre tags to return

3
threshold float

Minimum confidence threshold (0-1)

0.1

Returns:

Type Description
list[str]

List of genre tags sorted by confidence

Raises:

Type Description
ModelLoadError

If model file not found

AnalysisFailedError

If classification fails

Example

>>> genres = extract_genre_tags(audio, 44100)
>>> genres
['techno', 'electronic', 'house']

extract_mood_tags(audio: np.ndarray, sr: int, top_k: int = 3, threshold: float = 0.1) -> list[str]

Extract mood/theme tags using Essentia's mood classifiers.

Uses the MTG-Jamendo mood/theme classification model. This requires a two-stage pipeline:

1. Extract embeddings using the discogs-effnet base model
2. Pass embeddings to the classification head

Parameters:

Name Type Description Default
audio ndarray

Audio samples as numpy array (mono or stereo)

required
sr int

Sample rate in Hz

required
top_k int

Maximum number of mood tags to return

3
threshold float

Minimum confidence threshold (0-1)

0.1

Returns:

Type Description
list[str]

List of mood tags sorted by confidence

Raises:

Type Description
ModelLoadError

If model file not found

AnalysisFailedError

If classification fails

Example

>>> moods = extract_mood_tags(audio, 44100)
>>> moods
['dark', 'electronic', 'energetic']

extract_rhythm_features(audio: np.ndarray, sr: int) -> dict[str, Any]

Extract rhythm/tempo features using Essentia.

Algorithms:

- RhythmExtractor2013: BPM detection with confidence
- BeatTrackerDegara: Beat positions
- OnsetDetection: Transient detection

Parameters:

Name Type Description Default
audio ndarray

Audio samples (mono or stereo)

required
sr int

Sample rate in Hz

required

Returns:

Type Description
dict[str, Any]

dict with keys:
  • bpm: float or None (tempo in beats per minute, None for non-rhythmic)
  • bpm_confidence: float (0-1)
  • beat_positions: list[float] (beat times in seconds)
  • is_loop: bool (True if audio appears to be a rhythmic loop)

Raises:

Type Description
AnalysisFailedError

If extraction fails due to invalid audio data

Example

>>> y, sr = librosa.load("loop_125bpm.wav", sr=None)
>>> features = extract_rhythm_features(y, sr)
>>> features['bpm']
125.0
>>> features['bpm_confidence']
0.95
>>> features['is_loop']
True

extract_spectral_features(audio: np.ndarray, sr: int) -> SpectralFeatures

Extract spectral features using Essentia.

Performs frame-based spectral analysis to extract features that describe the frequency content and energy distribution of the audio signal.

Algorithms used:

- Centroid: Spectral center of mass, indicates brightness (Hz)
- Bandwidth: Frequency spread via 2nd central moment (Hz proxy)
- Rolloff: Frequency below which 85% of energy is contained (Hz)
- ZeroCrossingRate: Time-domain noisiness measure (0-1)
- RMS: Root-mean-square energy level (0-1 linear)
- DynamicRange: Peak-to-RMS ratio in dB

Parameters:

Name Type Description Default
audio ndarray

Audio data as numpy array (mono or stereo)

required
sr int

Sample rate in Hz

required

Returns:

Type Description
SpectralFeatures

Dictionary containing spectral features with units documented

Raises:

Type Description
AnalysisFailedError

If audio is too short, empty, or analysis fails

Example

>>> import librosa
>>> y, sr = librosa.load("kick.wav", sr=None)
>>> features = extract_spectral_features(y, sr)
>>> features['spectral_centroid']  # Hz - indicates a bright sound
1523.5
>>> features['rms_energy']  # Linear energy level
0.45
>>> features['dynamic_range']  # dB peak-to-average ratio
18.2
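The spectral centroid in particular is just the magnitude-weighted mean frequency of the spectrum. A single-frame numpy sketch to build intuition (Essentia's implementation is frame-based with windowing; spectral_centroid_sketch below is illustrative only):

```python
import numpy as np

def spectral_centroid_sketch(audio: np.ndarray, sr: int) -> float:
    """Magnitude-weighted mean frequency over one whole-signal FFT."""
    spectrum = np.abs(np.fft.rfft(audio))
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / sr)
    return float(np.sum(freqs * spectrum) / np.sum(spectrum))

sr = 44100
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000.0 * t)  # pure 1 kHz sine, one second
print(spectral_centroid_sketch(tone, sr))  # close to 1000 Hz
```

For a pure tone the centroid sits at the tone's frequency; broadband or bright material pulls it upward, which is why it works as a brightness proxy.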

extract_tonal_features(audio: np.ndarray, sr: int) -> dict[str, Any]

Extract tonal/pitch features using Essentia.

Algorithms:

- KeyExtractor: Key and scale detection
- PitchYin: Pitch tracking for salience
- TuningFrequency: Reference tuning detection

Parameters:

Name Type Description Default
audio ndarray

Audio samples (mono or stereo)

required
sr int

Sample rate in Hz

required

Returns:

Type Description
dict[str, Any]

dict with keys:
  • key: str or None (e.g., "C", "Dm", "F#m", None for percussion/noise)
  • key_confidence: float (0-1)
  • tuning_frequency: float (Hz, typically ~440)
  • pitch_salience: float (0-1, how tonal vs percussive)

Raises:

Type Description
AnalysisFailedError

If extraction fails due to invalid audio data

Example

>>> y, sr = librosa.load("bass_c.wav", sr=None)
>>> features = extract_tonal_features(y, sr)
>>> features['key']
'C'
>>> features['pitch_salience']
0.85

get_basic_metadata(path: Path) -> BasicMetadata

Extract basic audio metadata.

Loads audio file using librosa and computes fundamental properties. The file is loaded with its native sample rate to preserve metadata accuracy.

Parameters:

Name Type Description Default
path Path

Path to audio file

required

Returns:

Type Description
BasicMetadata

Dictionary containing basic metadata:
  • duration_ms: Audio duration in milliseconds (float)
  • sample_rate: Native sample rate in Hz (int)
  • channels: Number of audio channels (int)
  • bit_depth: Bit depth, 16 assumed for librosa (int)
  • file_size_bytes: File size on disk in bytes (int)
  • file_hash: SHA256 hex digest for deduplication (str)

Raises:

Type Description
UnsupportedFormatError

If file cannot be loaded by librosa

AnalysisFailedError

If file is too short or contains invalid audio

Example

>>> meta = get_basic_metadata(Path("kick.wav"))
>>> meta['duration_ms']
250.5
>>> meta['sample_rate']
44100
>>> meta['channels']
1

list_models(include_cached_only: bool = False) -> dict[str, dict[str, Any]]

List available models and their status.

Parameters:

Name Type Description Default
include_cached_only bool

Only show cached models

False

Returns:

Type Description
dict[str, dict[str, Any]]

Dictionary mapping model type to info dict with keys:
  • cached: Whether model is cached locally (bool)
  • path: Path to cached model if exists (str | None)
  • description: Model description (str)
Example

>>> models = list_models()
>>> "musicnn" in models
True
>>> models["musicnn"]["cached"]
True

load_model(model_type: ModelType, auto_download: bool = True) -> Path

Load an Essentia model, downloading if necessary.

Parameters:

Name Type Description Default
model_type ModelType

Model type to load

required
auto_download bool

Automatically download if not cached

True

Returns:

Type Description
Path

Path to model file

Raises:

Type Description
ModelLoadError

If model not found and auto_download=False

Example

>>> model_path = load_model("musicnn")
>>> model_path.exists()
True

parse_synthdef(path: Path, timeout: float = 10.0) -> SynthDefInfo

Parse a SuperCollider SynthDef file using sclang.

Uses sclang subprocess to extract SynthDesc metadata. Falls back to regex parsing if sclang is unavailable or fails.

Parameters:

Name Type Description Default
path Path

Path to .scd file containing SynthDef

required
timeout float

Maximum time to wait for sclang (seconds)

10.0

Returns:

Type Description
SynthDefInfo

SynthDefInfo with extracted metadata

Raises:

Type Description
SynthDefError

If SynthDef is invalid or cannot be parsed

SubprocessTimeoutError

If sclang takes too long

Example

>>> info = parse_synthdef(Path("synths/tb303.scd"))
>>> info.name
'tb303'
>>> info.controls[0].name
'out'
>>> info.ugens_used
['Saw', 'Pulse', 'Select', 'MoogFF', 'EnvGen', 'Out', 'Lag']

Basic Metadata

basic

Basic audio metadata extraction for audiomancer.

This module provides functions for extracting fundamental audio file metadata such as duration, sample rate, channel count, and file hash.

BasicMetadata

Bases: TypedDict

Basic audio file metadata.

Attributes:

Name Type Description
duration_ms float

Audio duration in milliseconds

sample_rate int

Sample rate in Hz

channels int

Number of audio channels

bit_depth int

Bit depth (16 assumed for librosa float32 conversion)

file_size_bytes int

File size in bytes

file_hash str

SHA256 hex digest of file contents

Source code in src/audiomancer/analyzers/basic.py
class BasicMetadata(TypedDict):
    """Basic audio file metadata.

    Attributes:
        duration_ms: Audio duration in milliseconds
        sample_rate: Sample rate in Hz
        channels: Number of audio channels
        bit_depth: Bit depth (16 assumed for librosa float32 conversion)
        file_size_bytes: File size in bytes
        file_hash: SHA256 hex digest of file contents
    """
    duration_ms: float
    sample_rate: int
    channels: int
    bit_depth: int
    file_size_bytes: int
    file_hash: str

get_basic_metadata(path: Path) -> BasicMetadata

Extract basic audio metadata.

Loads audio file using librosa and computes fundamental properties. The file is loaded with its native sample rate to preserve metadata accuracy.

Parameters:

Name Type Description Default
path Path

Path to audio file

required

Returns:

Type Description
BasicMetadata

Dictionary containing basic metadata:
  • duration_ms: Audio duration in milliseconds (float)
  • sample_rate: Native sample rate in Hz (int)
  • channels: Number of audio channels (int)
  • bit_depth: Bit depth, 16 assumed for librosa (int)
  • file_size_bytes: File size on disk in bytes (int)
  • file_hash: SHA256 hex digest for deduplication (str)

Raises:

Type Description
UnsupportedFormatError

If file cannot be loaded by librosa

AnalysisFailedError

If file is too short or contains invalid audio

Example

>>> meta = get_basic_metadata(Path("kick.wav"))
>>> meta['duration_ms']
250.5
>>> meta['sample_rate']
44100
>>> meta['channels']
1

Source code in src/audiomancer/analyzers/basic.py
def get_basic_metadata(path: Path) -> BasicMetadata:
    """Extract basic audio metadata.

    Loads audio file using librosa and computes fundamental properties.
    The file is loaded with its native sample rate to preserve metadata accuracy.

    Args:
        path: Path to audio file

    Returns:
        Dictionary containing basic metadata:
        - duration_ms: Audio duration in milliseconds (float)
        - sample_rate: Native sample rate in Hz (int)
        - channels: Number of audio channels (int)
        - bit_depth: Bit depth, 16 assumed for librosa (int)
        - file_size_bytes: File size on disk in bytes (int)
        - file_hash: SHA256 hex digest for deduplication (str)

    Raises:
        UnsupportedFormatError: If file cannot be loaded by librosa
        AnalysisFailedError: If file is too short or contains invalid audio

    Example:
        >>> meta = get_basic_metadata(Path("kick.wav"))
        >>> meta['duration_ms']
        250.5
        >>> meta['sample_rate']
        44100
        >>> meta['channels']
        1
    """
    # Validate path exists
    if not path.exists():
        raise UnsupportedFormatError(
            f"File does not exist: {path}",
            details={"path": str(path), "error": "file not found"}
        )

    # Load audio with native sample rate
    try:
        y, sr = librosa.load(str(path), sr=None, mono=False)
    except Exception as e:
        raise UnsupportedFormatError(
            "Cannot load audio file",
            details={"path": str(path), "error": str(e)}
        )

    # Determine channel count and duration
    if y.ndim > 1:
        channels = y.shape[0]
        duration_samples = y.shape[1]
    else:
        channels = 1
        duration_samples = len(y)

    # Validate audio has content
    if duration_samples == 0:
        raise AnalysisFailedError(
            "Audio file is empty",
            details={
                "path": str(path),
                "reason": "zero samples",
                "stage": "metadata extraction"
            }
        )

    # Calculate duration in milliseconds
    duration_ms = (duration_samples / sr) * 1000.0

    # Compute SHA256 hash for deduplication
    try:
        with open(path, 'rb') as f:
            file_hash = hashlib.sha256(f.read()).hexdigest()
    except Exception as e:
        raise AnalysisFailedError(
            "Failed to compute file hash",
            details={
                "path": str(path),
                "error": str(e),
                "stage": "hash computation"
            }
        )

    # Get file size
    file_size_bytes = path.stat().st_size

    return BasicMetadata(
        duration_ms=float(duration_ms),
        sample_rate=int(sr),
        channels=int(channels),
        bit_depth=16,  # librosa loads as float32, assume 16-bit source
        file_size_bytes=int(file_size_bytes),
        file_hash=file_hash,
    )

Spectral Features

spectral

Spectral feature extraction for audiomancer.

This module provides spectral analysis functions using Essentia for extracting features like spectral centroid, bandwidth, rolloff, and energy.

SpectralFeatures

Bases: TypedDict

Spectral audio features.

All frequency values are in Hz, energy values are linear (0-1 range).

Attributes:

Name Type Description
spectral_centroid float

Mean brightness/center of mass of spectrum (Hz)

spectral_bandwidth float

Frequency spread around centroid (Hz proxy)

spectral_rolloff float

High-frequency content cutoff point (Hz)

zero_crossing_rate float

Measure of noisiness/percussiveness (0-1)

rms_energy float

Root-mean-square energy level (0-1 linear)

dynamic_range float

Peak-to-average ratio (dB)

Source code in src/audiomancer/analyzers/spectral.py
class SpectralFeatures(TypedDict):
    """Spectral audio features.

    All frequency values are in Hz, energy values are linear (0-1 range).

    Attributes:
        spectral_centroid: Mean brightness/center of mass of spectrum (Hz)
        spectral_bandwidth: Frequency spread around centroid (Hz proxy)
        spectral_rolloff: High-frequency content cutoff point (Hz)
        zero_crossing_rate: Measure of noisiness/percussiveness (0-1)
        rms_energy: Root-mean-square energy level (0-1 linear)
        dynamic_range: Peak-to-average ratio (dB)
    """
    spectral_centroid: float
    spectral_bandwidth: float
    spectral_rolloff: float
    zero_crossing_rate: float
    rms_energy: float
    dynamic_range: float

extract_spectral_features(audio: np.ndarray, sr: int) -> SpectralFeatures

Extract spectral features using Essentia.

Performs frame-based spectral analysis to extract features that describe the frequency content and energy distribution of the audio signal.

Algorithms used:

- Centroid: Spectral center of mass, indicates brightness (Hz)
- Bandwidth: Frequency spread via 2nd central moment (Hz proxy)
- Rolloff: Frequency below which 85% of energy is contained (Hz)
- ZeroCrossingRate: Time-domain noisiness measure (0-1)
- RMS: Root-mean-square energy level (0-1 linear)
- DynamicRange: Peak-to-RMS ratio in dB

Parameters:

Name Type Description Default
audio ndarray

Audio data as numpy array (mono or stereo)

required
sr int

Sample rate in Hz

required

Returns:

Type Description
SpectralFeatures

Dictionary containing spectral features with units documented

Raises:

Type Description
AnalysisFailedError

If audio is too short, empty, or analysis fails

Example

>>> import librosa
>>> y, sr = librosa.load("kick.wav", sr=None)
>>> features = extract_spectral_features(y, sr)
>>> features['spectral_centroid']
1523.5  # Hz - indicates a bright sound
>>> features['rms_energy']
0.45  # Linear energy level
>>> features['dynamic_range']
18.2  # dB peak-to-average ratio

Source code in src/audiomancer/analyzers/spectral.py
def extract_spectral_features(
    audio: np.ndarray,
    sr: int
) -> SpectralFeatures:
    """Extract spectral features using Essentia.

    Performs frame-based spectral analysis to extract features that describe
    the frequency content and energy distribution of the audio signal.

    Algorithms used:
    - Centroid: Spectral center of mass, indicates brightness (Hz)
    - Bandwidth: Frequency spread via 2nd central moment (Hz proxy)
    - Rolloff: Frequency below which 85% of energy is contained (Hz)
    - ZeroCrossingRate: Time-domain noisiness measure (0-1)
    - RMS: Root-mean-square energy level (0-1 linear)
    - DynamicRange: Peak-to-RMS ratio in dB

    Args:
        audio: Audio data as numpy array (mono or stereo)
        sr: Sample rate in Hz

    Returns:
        Dictionary containing spectral features with units documented

    Raises:
        AnalysisFailedError: If audio is too short, empty, or analysis fails

    Example:
        >>> import librosa
        >>> y, sr = librosa.load("kick.wav", sr=None)
        >>> features = extract_spectral_features(y, sr)
        >>> features['spectral_centroid']
        1523.5  # Hz - indicates a bright sound
        >>> features['rms_energy']
        0.45  # Linear energy level
        >>> features['dynamic_range']
        18.2  # dB peak-to-average ratio
    """
    # Ensure mono audio for analysis
    if audio.ndim > 1:
        audio = np.mean(audio, axis=0)

    # Ensure float32 for Essentia compatibility
    audio = audio.astype(np.float32)

    # Validate audio has content
    if len(audio) == 0:
        raise AnalysisFailedError(
            "Cannot analyze empty audio",
            details={
                "reason": "zero samples",
                "stage": "spectral analysis preparation"
            }
        )

    # Check for minimum length (need at least one frame)
    frame_size = 2048
    if len(audio) < frame_size:
        raise AnalysisFailedError(
            "Audio too short for spectral analysis",
            details={
                "reason": f"audio length {len(audio)} < frame size {frame_size}",
                "stage": "spectral analysis preparation",
                "min_samples": frame_size,
                "actual_samples": len(audio)
            }
        )

    # Initialize Essentia algorithms
    try:
        spectrum_extractor = es.Spectrum()
        windowing = es.Windowing(type='hann', size=frame_size)
        centroid_algo = es.Centroid(range=sr/2)
        central_moments = es.CentralMoments()
        rolloff_algo = es.RollOff()
        zcr_algo = es.ZeroCrossingRate()
        rms_algo = es.RMS()
    except Exception as e:
        raise AnalysisFailedError(
            "Failed to initialize Essentia algorithms",
            details={
                "error": str(e),
                "stage": "algorithm initialization"
            }
        ) from e

    # Frame-based analysis
    hop_size = 512
    centroids = []
    bandwidths = []
    rolloffs = []
    zcrs = []
    rms_values = []

    try:
        for i in range(0, len(audio) - frame_size, hop_size):
            frame = audio[i:i+frame_size]

            # Apply Hann window
            windowed = windowing(frame)

            # Compute spectrum
            spectrum = spectrum_extractor(windowed)

            # Extract features
            centroids.append(centroid_algo(spectrum))

            # Use 2nd central moment as bandwidth proxy
            moments = central_moments(spectrum)
            if len(moments) > 1:
                bandwidths.append(abs(moments[1]))
            else:
                bandwidths.append(0.0)

            rolloffs.append(rolloff_algo(spectrum))
            zcrs.append(zcr_algo(frame))
            rms_values.append(rms_algo(frame))

    except Exception as e:
        raise AnalysisFailedError(
            "Spectral feature extraction failed",
            details={
                "error": str(e),
                "stage": "frame-based analysis"
            }
        ) from e

    # Validate we got features
    if len(centroids) == 0:
        raise AnalysisFailedError(
            "No features extracted",
            details={
                "reason": "no valid frames",
                "stage": "spectral analysis"
            }
        )

    # Compute dynamic range
    peak = float(np.max(np.abs(audio)))
    rms = float(np.sqrt(np.mean(audio**2)))

    # Avoid log of zero
    if rms > 0 and peak > 0:
        dynamic_range = 20 * np.log10(peak / rms)
    else:
        dynamic_range = 0.0

    # Convert lists to arrays for mean calculation
    centroids_arr = np.array(centroids)
    bandwidths_arr = np.array(bandwidths)
    rolloffs_arr = np.array(rolloffs)
    zcrs_arr = np.array(zcrs)
    rms_arr = np.array(rms_values)

    # Validate no NaN or inf values
    def validate_value(value: float, name: str) -> float:
        if np.isnan(value) or np.isinf(value):
            raise AnalysisFailedError(
                f"Invalid {name} value",
                details={
                    "reason": f"{name} is NaN or inf",
                    "stage": "feature validation",
                    "value": str(value)
                }
            )
        return float(value)

    # Return mean values across all frames
    return SpectralFeatures(
        spectral_centroid=validate_value(float(np.mean(centroids_arr)), "spectral_centroid"),
        spectral_bandwidth=validate_value(float(np.mean(bandwidths_arr)), "spectral_bandwidth"),
        spectral_rolloff=validate_value(float(np.mean(rolloffs_arr)), "spectral_rolloff"),
        zero_crossing_rate=validate_value(float(np.mean(zcrs_arr)), "zero_crossing_rate"),
        rms_energy=validate_value(float(np.mean(rms_arr)), "rms_energy"),
        dynamic_range=validate_value(dynamic_range, "dynamic_range"),
    )
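The dynamic-range computation at the end of the function stands alone and can be sanity-checked without Essentia. A numpy-only sketch, assuming the same peak-to-RMS definition and silence guard (`dynamic_range_db` is an illustrative name):

```python
import numpy as np

def dynamic_range_db(audio: np.ndarray) -> float:
    """Peak-to-RMS ratio in dB, with the same guard against silent input."""
    peak = float(np.max(np.abs(audio)))
    rms = float(np.sqrt(np.mean(audio ** 2)))
    if rms > 0 and peak > 0:
        return float(20 * np.log10(peak / rms))
    return 0.0

# One second of a 440 Hz sine: peak/RMS = sqrt(2), i.e. roughly 3.01 dB
t = np.linspace(0, 1, 48000, endpoint=False)
sine = np.sin(2 * np.pi * 440 * t).astype(np.float32)
```

A pure sine gives ~3 dB; heavily compressed material approaches 0 dB, while percussive material with sharp transients reads much higher.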

Rhythm Features

rhythm

Rhythm analysis for audiomancer.

Extracts tempo, beat positions, and loop detection using Essentia.

extract_rhythm_features(audio: np.ndarray, sr: int) -> dict[str, Any]

Extract rhythm/tempo features using Essentia.

Algorithms:

- RhythmExtractor2013: BPM detection with confidence
- BeatTrackerDegara: Beat positions
- OnsetDetection: Transient detection

Parameters:

Name Type Description Default
audio ndarray

Audio samples (mono or stereo)

required
sr int

Sample rate in Hz

required

Returns:

Type Description
dict[str, Any]

dict with keys:

- bpm: float or None (tempo in beats per minute, None for non-rhythmic)
- bpm_confidence: float (0-1)
- beat_positions: list[float] (beat times in seconds)
- is_loop: bool (True if audio appears to be a rhythmic loop)

Raises:

Type Description
AnalysisFailedError

If extraction fails due to invalid audio data

Example

>>> y, sr = librosa.load("loop_125bpm.wav", sr=None)
>>> features = extract_rhythm_features(y, sr)
>>> features['bpm']
125.0
>>> features['bpm_confidence']
0.95
>>> features['is_loop']
True

Source code in src/audiomancer/analyzers/rhythm.py
def extract_rhythm_features(
    audio: np.ndarray,
    sr: int
) -> dict[str, Any]:
    """
    Extract rhythm/tempo features using Essentia.

    Algorithms:
    - RhythmExtractor2013: BPM detection with confidence
    - BeatTrackerDegara: Beat positions
    - OnsetDetection: Transient detection

    Args:
        audio: Audio samples (mono or stereo)
        sr: Sample rate in Hz

    Returns:
        dict with keys:
        - bpm: float or None (tempo in beats per minute, None for non-rhythmic)
        - bpm_confidence: float (0-1)
        - beat_positions: list[float] (beat times in seconds)
        - is_loop: bool (True if audio appears to be a rhythmic loop)

    Raises:
        AnalysisFailedError: If extraction fails due to invalid audio data

    Example:
        >>> y, sr = librosa.load("loop_125bpm.wav", sr=None)
        >>> features = extract_rhythm_features(y, sr)
        >>> features['bpm']
        125.0
        >>> features['bpm_confidence']
        0.95
        >>> features['is_loop']
        True
    """
    try:
        # Ensure mono float32
        if audio.ndim > 1:
            audio = np.mean(audio, axis=0)
        audio = audio.astype(np.float32)

        # Validate audio is not all zeros or NaN
        if len(audio) == 0:
            raise AnalysisFailedError(
                "Cannot analyze empty audio",
                details={"stage": "rhythm_extraction", "reason": "empty input"}
            )

        if np.all(audio == 0) or np.any(np.isnan(audio)) or np.any(np.isinf(audio)):
            # Silence or invalid data - return None for BPM
            return {
                'bpm': None,
                'bpm_confidence': 0.0,
                'beat_positions': [],
                'is_loop': False,
            }

        # BPM extraction
        rhythm_extractor = es.RhythmExtractor2013(method="multifeature")
        bpm, beats, beats_confidence, _, _ = rhythm_extractor(audio)

        # Beat positions from RhythmExtractor2013 are already in seconds
        beat_positions = [float(b) for b in beats]

        # Calculate average confidence (beats_confidence is a float, not array)
        avg_confidence = float(beats_confidence) if not np.isnan(beats_confidence) else 0.0

        # Only report BPM if:
        # 1. BPM > 0 (Essentia returns 0 for non-rhythmic content)
        # 2. Confidence is reasonable (> 0.2)
        # 3. We have some beats detected
        bpm_value = None
        if bpm > 0 and avg_confidence > 0.2 and len(beats) > 0:
            bpm_value = float(bpm)

        # Determine if loop (check if duration matches bar boundaries)
        duration_sec = len(audio) / sr
        is_loop = False

        if bpm_value is not None and bpm_value > 0:
            beat_duration = 60.0 / bpm_value
            bar_duration = beat_duration * 4  # 4/4 time signature
            bars = duration_sec / bar_duration

            # Is loop if:
            # 1. Close to whole number of bars (within 10%)
            # 2. Duration is less than 30 seconds (typical loop length)
            # 3. We have at least 1 bar
            if bars >= 1 and abs(bars - round(bars)) < 0.1 and duration_sec < 30:
                is_loop = True

        return {
            'bpm': bpm_value,
            'bpm_confidence': avg_confidence,
            'beat_positions': beat_positions,
            'is_loop': is_loop,
        }

    except Exception as e:
        if isinstance(e, AnalysisFailedError):
            raise
        raise AnalysisFailedError(
            "Rhythm extraction failed",
            details={
                "stage": "rhythm_extraction",
                "error": str(e),
                "audio_shape": audio.shape if hasattr(audio, 'shape') else None,
                "sample_rate": sr,
            }
        ) from e
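The `is_loop` heuristic depends only on BPM and duration, so it can be sketched and tested without Essentia. A standalone version, assuming the same 4/4 bar assumption, 10% tolerance, and 30-second cap (`looks_like_loop` is an illustrative name):

```python
def looks_like_loop(bpm: float, duration_sec: float) -> bool:
    """Bar-boundary heuristic: whole number of 4/4 bars, under 30 s, at least 1 bar."""
    bar_duration = (60.0 / bpm) * 4  # one 4/4 bar in seconds
    bars = duration_sec / bar_duration
    return bars >= 1 and abs(bars - round(bars)) < 0.1 and duration_sec < 30

# Four bars at 125 BPM last 4 * 4 * (60/125) = 7.68 s, so that file qualifies
```

The 30-second cap filters out full tracks that happen to end on a bar boundary; the 10% tolerance absorbs small BPM estimation errors.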

Audio Embeddings

embeddings

Audio embedding extraction for audiomancer.

This module provides functions for extracting fixed-dimension audio embeddings for similarity search and clustering using pre-trained Essentia models.

All embeddings are L2-normalized 128-dimensional vectors.

Includes SimilarityIndex for fast nearest-neighbor search using FAISS.

SimilarityIndex

Fast similarity search index using FAISS.

Enables efficient nearest-neighbor search across thousands of audio embeddings. Uses IndexFlatIP (inner product) which equals cosine similarity for L2-normalized vectors.

Example

>>> # Build index from embeddings
>>> embeddings = [extract_audio_embedding(audio, sr) for audio in samples]
>>> index = SimilarityIndex()
>>> index.add(embeddings)

>>> # Search for similar samples
>>> query = extract_audio_embedding(query_audio, sr)
>>> similarities, indices = index.search(query, k=5)

>>> # Save/load for persistence
>>> index.save("samples.index")
>>> loaded = SimilarityIndex.load("samples.index")

Source code in src/audiomancer/analyzers/embeddings.py
class SimilarityIndex:
    """Fast similarity search index using FAISS.

    Enables efficient nearest-neighbor search across thousands of audio embeddings.
    Uses IndexFlatIP (inner product) which equals cosine similarity for L2-normalized vectors.

    Example:
        >>> # Build index from embeddings
        >>> embeddings = [extract_audio_embedding(audio, sr) for audio in samples]
        >>> index = SimilarityIndex()
        >>> index.add(embeddings)
        >>>
        >>> # Search for similar samples
        >>> query = extract_audio_embedding(query_audio, sr)
        >>> similarities, indices = index.search(query, k=5)
        >>>
        >>> # Save/load for persistence
        >>> index.save("samples.index")
        >>> loaded = SimilarityIndex.load("samples.index")
    """

    def __init__(self, dimension: int = 128):
        """Create a new similarity index.

        Args:
            dimension: Embedding dimension (default 128 for audiomancer)

        Raises:
            ImportError: If faiss-cpu is not installed
        """
        if not FAISS_AVAILABLE:
            raise ImportError(
                "faiss-cpu is required for SimilarityIndex. "
                "Install with: pip install faiss-cpu"
            )

        self._dimension = dimension
        # IndexFlatIP uses inner product (dot product)
        # For L2-normalized vectors, this equals cosine similarity
        self._index = faiss.IndexFlatIP(dimension)

    @property
    def dimension(self) -> int:
        """Get the embedding dimension."""
        return self._dimension

    @property
    def ntotal(self) -> int:
        """Get the number of embeddings in the index."""
        return self._index.ntotal

    def add(self, embeddings: list[list[float]]) -> None:
        """Add embeddings to the index.

        Args:
            embeddings: List of embedding vectors (each 128-dim, L2-normalized)

        Raises:
            ValueError: If embeddings have wrong dimension
        """
        if not embeddings:
            return

        # Convert to numpy array with correct dtype
        arr = np.array(embeddings, dtype=np.float32)

        if arr.ndim == 1:
            arr = arr.reshape(1, -1)

        if arr.shape[1] != self._dimension:
            raise ValueError(
                f"Expected {self._dimension}-dimensional embeddings, "
                f"got {arr.shape[1]}"
            )

        self._index.add(arr)

    def search(
        self,
        query: list[float],
        k: int = 5,
    ) -> tuple[list[float], list[int]]:
        """Search for k most similar embeddings.

        Args:
            query: Query embedding (128-dim, L2-normalized)
            k: Number of nearest neighbors to return

        Returns:
            Tuple of (similarities, indices):
            - similarities: Cosine similarity scores (1.0 = identical)
            - indices: Indices of matching embeddings in add() order

        Raises:
            ValueError: If query has wrong dimension or index is empty
        """
        if self._index.ntotal == 0:
            raise ValueError("Index is empty. Add embeddings first.")

        # Convert to numpy array
        query_arr = np.array([query], dtype=np.float32)

        if query_arr.shape[1] != self._dimension:
            raise ValueError(
                f"Expected {self._dimension}-dimensional query, "
                f"got {query_arr.shape[1]}"
            )

        # Limit k to number of indexed vectors
        k = min(k, self._index.ntotal)

        # Search returns (distances, indices)
        # For IndexFlatIP, "distances" are actually similarity scores
        similarities, indices = self._index.search(query_arr, k)

        return similarities[0].tolist(), indices[0].tolist()

    def save(self, path: str | Path) -> None:
        """Save the index to disk.

        Args:
            path: Path to save the index file
        """
        faiss.write_index(self._index, str(path))

    @classmethod
    def load(cls, path: str | Path) -> "SimilarityIndex":
        """Load an index from disk.

        Args:
            path: Path to the index file

        Returns:
            Loaded SimilarityIndex

        Raises:
            ImportError: If faiss-cpu is not installed
            FileNotFoundError: If index file doesn't exist
        """
        if not FAISS_AVAILABLE:
            raise ImportError(
                "faiss-cpu is required for SimilarityIndex. "
                "Install with: pip install faiss-cpu"
            )

        path = Path(path)
        if not path.exists():
            raise FileNotFoundError(f"Index file not found: {path}")

        index = faiss.read_index(str(path))

        # Create instance and set the loaded index
        instance = cls.__new__(cls)
        instance._dimension = index.d
        instance._index = index

        return instance
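For small collections, the `IndexFlatIP` behaviour can be reproduced with a plain numpy matrix product, which is a useful fallback sketch when faiss-cpu is unavailable (the function name and shapes here are illustrative, not part of the module):

```python
import numpy as np

def brute_force_search(embeddings: np.ndarray, query: np.ndarray, k: int = 5):
    """Exact inner-product search; equals cosine similarity for L2-normalized rows."""
    sims = embeddings @ query              # (n,) inner products against every row
    k = min(k, len(embeddings))
    order = np.argsort(-sims)[:k]          # highest similarity first
    return sims[order].tolist(), order.tolist()

# Three orthonormal vectors; the query matches the first row exactly
vecs = np.eye(3, dtype=np.float32)
sims, idx = brute_force_search(vecs, vecs[0], k=2)
```

FAISS does the same computation but with SIMD-optimized kernels and without materializing Python lists, which is what makes it viable at thousands of vectors.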

dimension: int property

Get the embedding dimension.

ntotal: int property

Get the number of embeddings in the index.

__init__(dimension: int = 128)

Create a new similarity index.

Parameters:

Name Type Description Default
dimension int

Embedding dimension (default 128 for audiomancer)

128

Raises:

Type Description
ImportError

If faiss-cpu is not installed

Source code in src/audiomancer/analyzers/embeddings.py
def __init__(self, dimension: int = 128):
    """Create a new similarity index.

    Args:
        dimension: Embedding dimension (default 128 for audiomancer)

    Raises:
        ImportError: If faiss-cpu is not installed
    """
    if not FAISS_AVAILABLE:
        raise ImportError(
            "faiss-cpu is required for SimilarityIndex. "
            "Install with: pip install faiss-cpu"
        )

    self._dimension = dimension
    # IndexFlatIP uses inner product (dot product)
    # For L2-normalized vectors, this equals cosine similarity
    self._index = faiss.IndexFlatIP(dimension)

add(embeddings: list[list[float]]) -> None

Add embeddings to the index.

Parameters:

Name Type Description Default
embeddings list[list[float]]

List of embedding vectors (each 128-dim, L2-normalized)

required

Raises:

Type Description
ValueError

If embeddings have wrong dimension

Source code in src/audiomancer/analyzers/embeddings.py
def add(self, embeddings: list[list[float]]) -> None:
    """Add embeddings to the index.

    Args:
        embeddings: List of embedding vectors (each 128-dim, L2-normalized)

    Raises:
        ValueError: If embeddings have wrong dimension
    """
    if not embeddings:
        return

    # Convert to numpy array with correct dtype
    arr = np.array(embeddings, dtype=np.float32)

    if arr.ndim == 1:
        arr = arr.reshape(1, -1)

    if arr.shape[1] != self._dimension:
        raise ValueError(
            f"Expected {self._dimension}-dimensional embeddings, "
            f"got {arr.shape[1]}"
        )

    self._index.add(arr)

load(path: str | Path) -> SimilarityIndex classmethod

Load an index from disk.

Parameters:

Name Type Description Default
path str | Path

Path to the index file

required

Returns:

Type Description
SimilarityIndex

Loaded SimilarityIndex

Raises:

Type Description
ImportError

If faiss-cpu is not installed

FileNotFoundError

If index file doesn't exist

Source code in src/audiomancer/analyzers/embeddings.py
@classmethod
def load(cls, path: str | Path) -> "SimilarityIndex":
    """Load an index from disk.

    Args:
        path: Path to the index file

    Returns:
        Loaded SimilarityIndex

    Raises:
        ImportError: If faiss-cpu is not installed
        FileNotFoundError: If index file doesn't exist
    """
    if not FAISS_AVAILABLE:
        raise ImportError(
            "faiss-cpu is required for SimilarityIndex. "
            "Install with: pip install faiss-cpu"
        )

    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"Index file not found: {path}")

    index = faiss.read_index(str(path))

    # Create instance and set the loaded index
    instance = cls.__new__(cls)
    instance._dimension = index.d
    instance._index = index

    return instance

save(path: str | Path) -> None

Save the index to disk.

Parameters:

Name Type Description Default
path str | Path

Path to save the index file

required
Source code in src/audiomancer/analyzers/embeddings.py
def save(self, path: str | Path) -> None:
    """Save the index to disk.

    Args:
        path: Path to save the index file
    """
    faiss.write_index(self._index, str(path))

search(query: list[float], k: int = 5) -> tuple[list[float], list[int]]

Search for k most similar embeddings.

Parameters:

Name Type Description Default
query list[float]

Query embedding (128-dim, L2-normalized)

required
k int

Number of nearest neighbors to return

5

Returns:

Type Description
tuple[list[float], list[int]]

Tuple of (similarities, indices):

- similarities: Cosine similarity scores (1.0 = identical)
- indices: Indices of matching embeddings in add() order

Raises:

Type Description
ValueError

If query has wrong dimension or index is empty

Source code in src/audiomancer/analyzers/embeddings.py
def search(
    self,
    query: list[float],
    k: int = 5,
) -> tuple[list[float], list[int]]:
    """Search for k most similar embeddings.

    Args:
        query: Query embedding (128-dim, L2-normalized)
        k: Number of nearest neighbors to return

    Returns:
        Tuple of (similarities, indices):
        - similarities: Cosine similarity scores (1.0 = identical)
        - indices: Indices of matching embeddings in add() order

    Raises:
        ValueError: If query has wrong dimension or index is empty
    """
    if self._index.ntotal == 0:
        raise ValueError("Index is empty. Add embeddings first.")

    # Convert to numpy array
    query_arr = np.array([query], dtype=np.float32)

    if query_arr.shape[1] != self._dimension:
        raise ValueError(
            f"Expected {self._dimension}-dimensional query, "
            f"got {query_arr.shape[1]}"
        )

    # Limit k to number of indexed vectors
    k = min(k, self._index.ntotal)

    # Search returns (distances, indices)
    # For IndexFlatIP, "distances" are actually similarity scores
    similarities, indices = self._index.search(query_arr, k)

    return similarities[0].tolist(), indices[0].tolist()

cosine_similarity(embedding1: list[float], embedding2: list[float]) -> float

Compute cosine similarity between two embeddings.

Since embeddings are L2-normalized, cosine similarity is just the dot product.

Parameters:

Name Type Description Default
embedding1 list[float]

First embedding (128-dim)

required
embedding2 list[float]

Second embedding (128-dim)

required

Returns:

Type Description
float

Cosine similarity in range [-1, 1]

float

(1 = identical, 0 = orthogonal, -1 = opposite)

Example

>>> emb1 = extract_audio_embedding(audio1, 44100)
>>> emb2 = extract_audio_embedding(audio2, 44100)
>>> similarity = cosine_similarity(emb1, emb2)
>>> 0 <= similarity <= 1  # For typical audio
True

Source code in src/audiomancer/analyzers/embeddings.py
def cosine_similarity(embedding1: list[float], embedding2: list[float]) -> float:
    """Compute cosine similarity between two embeddings.

    Since embeddings are L2-normalized, cosine similarity is just the dot product.

    Args:
        embedding1: First embedding (128-dim)
        embedding2: Second embedding (128-dim)

    Returns:
        Cosine similarity in range [-1, 1]
        (1 = identical, 0 = orthogonal, -1 = opposite)

    Example:
        >>> emb1 = extract_audio_embedding(audio1, 44100)
        >>> emb2 = extract_audio_embedding(audio2, 44100)
        >>> similarity = cosine_similarity(emb1, emb2)
        >>> 0 <= similarity <= 1  # For typical audio
        True
    """
    arr1 = np.array(embedding1, dtype=np.float32)
    arr2 = np.array(embedding2, dtype=np.float32)

    # Dot product (since both are L2-normalized)
    return float(np.dot(arr1, arr2))

euclidean_distance(embedding1: list[float], embedding2: list[float]) -> float

Compute Euclidean distance between two embeddings.

Parameters:

Name Type Description Default
embedding1 list[float]

First embedding (128-dim)

required
embedding2 list[float]

Second embedding (128-dim)

required

Returns:

Type Description
float

Euclidean distance (lower = more similar)

Example

>>> emb1 = extract_audio_embedding(audio1, 44100)
>>> emb2 = extract_audio_embedding(audio2, 44100)
>>> distance = euclidean_distance(emb1, emb2)
>>> distance >= 0
True

Source code in src/audiomancer/analyzers/embeddings.py
def euclidean_distance(embedding1: list[float], embedding2: list[float]) -> float:
    """Compute Euclidean distance between two embeddings.

    Args:
        embedding1: First embedding (128-dim)
        embedding2: Second embedding (128-dim)

    Returns:
        Euclidean distance (lower = more similar)

    Example:
        >>> emb1 = extract_audio_embedding(audio1, 44100)
        >>> emb2 = extract_audio_embedding(audio2, 44100)
        >>> distance = euclidean_distance(emb1, emb2)
        >>> distance >= 0
        True
    """
    arr1 = np.array(embedding1, dtype=np.float32)
    arr2 = np.array(embedding2, dtype=np.float32)

    return float(np.linalg.norm(arr1 - arr2))
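For L2-normalized embeddings the two metrics are interchangeable: squared distance satisfies d² = 2 − 2·cos_sim, so ranking by Euclidean distance and ranking by cosine similarity always agree. A quick numpy check of the identity (the random vectors are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=128).astype(np.float32)
b = rng.normal(size=128).astype(np.float32)
a /= np.linalg.norm(a)  # unit length, as all audiomancer embeddings are
b /= np.linalg.norm(b)

cos_sim = float(np.dot(a, b))
dist = float(np.linalg.norm(a - b))

# Identity for unit vectors: ||a - b||^2 = 2 - 2 * <a, b>
assert abs(dist ** 2 - (2 - 2 * cos_sim)) < 1e-5
```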

extract_audio_embedding(audio: np.ndarray, sr: int, model: ModelType = 'musicnn') -> list[float]

Extract 128-dimensional audio embedding for similarity search.

Embeddings are L2-normalized fixed-size vectors that encode audio content. Similar-sounding audio will have similar embeddings (high cosine similarity).

Parameters:

Name Type Description Default
audio ndarray

Audio samples as numpy array (mono or stereo)

required
sr int

Sample rate in Hz

required
model ModelType

Embedding model to use:

- "musicnn": MusiCNN embeddings (recommended for music)
- "vggish": VGGish embeddings (general audio)
- "openl3": OpenL3 embeddings (environmental sounds)

'musicnn'

Returns:

Type Description
list[float]

List of 128 floats (L2-normalized embedding vector)

Raises:

Type Description
ModelLoadError

If model file not found

AnalysisFailedError

If embedding extraction fails

Example

>>> embedding = extract_audio_embedding(audio, 44100)
>>> len(embedding)
128
>>> # Verify L2 normalization
>>> import math
>>> math.isclose(sum(x**2 for x in embedding), 1.0, abs_tol=1e-6)
True

Source code in src/audiomancer/analyzers/embeddings.py
def extract_audio_embedding(
    audio: np.ndarray,
    sr: int,
    model: ModelType = "musicnn",
) -> list[float]:
    """Extract 128-dimensional audio embedding for similarity search.

    Embeddings are L2-normalized fixed-size vectors that encode audio content.
    Similar-sounding audio will have similar embeddings (high cosine similarity).

    Args:
        audio: Audio samples as numpy array (mono or stereo)
        sr: Sample rate in Hz
        model: Embedding model to use:
            - "musicnn": MusiCNN embeddings (recommended for music)
            - "vggish": VGGish embeddings (general audio)
            - "openl3": OpenL3 embeddings (environmental sounds)

    Returns:
        List of 128 floats (L2-normalized embedding vector)

    Raises:
        ModelLoadError: If model file not found
        AnalysisFailedError: If embedding extraction fails

    Example:
        >>> embedding = extract_audio_embedding(audio, 44100)
        >>> len(embedding)
        128
        >>> # Verify L2 normalization
        >>> import math
        >>> math.isclose(sum(x**2 for x in embedding), 1.0, abs_tol=1e-6)
        True
    """
    try:
        # Ensure mono audio
        if audio.ndim > 1:
            audio = np.mean(audio, axis=0)

        # Resample to model's expected sample rate
        # Most Essentia models expect 16kHz
        target_sr = 16000
        if sr != target_sr:
            import librosa
            audio = librosa.resample(audio, orig_sr=sr, target_sr=target_sr)

        # Get cached model and extract embedding
        embedding_extractor = get_embedding_model(model)

        # Extract embedding based on model type
        if model == "musicnn":
            embedding = _extract_musicnn_embedding_cached(audio, embedding_extractor)
        elif model == "vggish":
            embedding = _extract_vggish_embedding_cached(audio, embedding_extractor)
        elif model == "openl3":
            embedding = _extract_openl3_embedding_cached(audio, embedding_extractor)
        else:
            raise ModelLoadError(
                f"Unknown embedding model: {model}",
                details={"model": model}
            )

        # Ensure embedding is 128-dimensional
        if len(embedding) != 128:
            raise AnalysisFailedError(
                f"Expected 128-dimensional embedding, got {len(embedding)}",
                details={
                    "model": model,
                    "embedding_dim": len(embedding),
                }
            )

        # L2 normalize
        embedding_array = np.array(embedding, dtype=np.float32)
        norm = np.linalg.norm(embedding_array)

        if norm == 0:
            raise AnalysisFailedError(
                "Zero-norm embedding (silent audio?)",
                details={"model": model}
            )

        normalized_embedding = embedding_array / norm

        # Verify normalization
        verification_norm = np.linalg.norm(normalized_embedding)
        if not np.isclose(verification_norm, 1.0, atol=1e-6):
            raise AnalysisFailedError(
                f"L2 normalization failed: norm={verification_norm}",
                details={"model": model, "norm": float(verification_norm)}
            )

        return normalized_embedding.tolist()

    except ModelLoadError:
        raise
    except AnalysisFailedError:
        raise
    except Exception as e:
        raise AnalysisFailedError(
            "Embedding extraction failed",
            details={
                "error": str(e),
                "stage": "embedding extraction",
                "model": model,
            }
        ) from e
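The post-processing step (L2 normalization plus a zero-norm guard) is independent of the embedding model and easy to isolate. A minimal sketch of just that step, using `ValueError` in place of the module's `AnalysisFailedError` (`l2_normalize` is an illustrative name):

```python
import numpy as np

def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length; reject zero vectors as the function above does."""
    arr = np.asarray(vec, dtype=np.float32)
    norm = float(np.linalg.norm(arr))
    if norm == 0:
        # Silent audio can yield an all-zero embedding, which has no direction
        raise ValueError("Zero-norm embedding (silent audio?)")
    return (arr / norm).tolist()
```

Normalizing up front is what lets the rest of the module treat dot products as cosine similarities, both in `cosine_similarity` and in the FAISS `IndexFlatIP` index.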