Storage API Reference¶

The audiomancer.storage module provides database and vector storage functionality.

Overview¶

`storage` ¶

Storage layer for audiomancer.

Provides interfaces and implementations for sample and vector storage.

`__all__ = ['SampleMetadata', 'SampleStore', 'VectorStore', 'LanceDBVectorStore', 'SynthStore']` `module-attribute` ¶

`LanceDBVectorStore` ¶

LanceDB-backed vector storage for audio embeddings.

Provides efficient storage and similarity search for 128-dimensional audio embeddings using LanceDB's vector index capabilities.

The embeddings table has the following schema:

- id: string (sample ID)
- embedding: fixed_size_list[float32, 128] (audio embedding)
- created_at: timestamp (insertion time)

Attributes:

Name	Type	Description
`db_path`		Path to LanceDB database directory
`table_name`		Name of embeddings table (default: "embeddings")
`embedding_dim`		Required embedding dimension (always 128)

`__init__(db_path: Path) -> None` ¶

Initialize LanceDB at given path.

Creates database directory if it doesn't exist. Table is created lazily on first add operation.

Parameters:

Name	Type	Description	Default
`db_path`	`Path`	Path to LanceDB database directory	required

Example

>>> store = LanceDBVectorStore(Path("./embeddings"))
>>> store.db_path
PosixPath('./embeddings')

`add_embedding(sample_id: str, embedding: list[float]) -> None` ¶

Store 128-dim embedding for sample.

If sample_id already exists, replaces the existing embedding.

Parameters:

Name	Type	Description	Default
`sample_id`	`str`	Sample ID (format: "smpl_{hash[:8]}")	required
`embedding`	`list[float]`	Vector embedding (dimension=128)	required

Raises:

Type	Description
`ValueError`	If embedding dimension != 128

Example

>>> store = LanceDBVectorStore(Path("./embeddings"))
>>> embedding = [0.1, 0.2] + [0.0] * 126  # 128 dims
>>> store.add_embedding("smpl_abc123", embedding)
>>> retrieved = store.get_embedding("smpl_abc123")
>>> len(retrieved)
128

`add_embeddings_batch(items: list[tuple[str, list[float]]]) -> None` ¶

Add multiple embeddings efficiently.

Validates all dimensions first, then batch inserts. If any embedding has wrong dimension, entire batch fails atomically.

Parameters:

Name	Type	Description	Default
`items`	`list[tuple[str, list[float]]]`	List of (sample_id, embedding) tuples	required

Raises:

Type	Description
`ValueError`	If any embedding dimension != 128

Example

>>> store = LanceDBVectorStore(Path("./embeddings"))
>>> items = [
...     ("smpl_abc123", [0.1] * 128),
...     ("smpl_def456", [0.2] * 128),
...     ("smpl_ghi789", [0.3] * 128),
... ]
>>> store.add_embeddings_batch(items)
>>> len(store.search_similar([0.1] * 128, limit=10))
3
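The validate-first behaviour described for `add_embeddings_batch` can be pictured with a small in-memory sketch. `InMemoryVectorStore` is a hypothetical dict-backed stand-in written for this illustration; it is not the LanceDB implementation and only mirrors the documented contract (check every dimension, then write).

```python
EMBEDDING_DIM = 128  # dimension stated in this reference


class InMemoryVectorStore:
    """Hypothetical dict-backed stand-in, used only to illustrate batch validation."""

    def __init__(self) -> None:
        self._rows: dict[str, list[float]] = {}

    def add_embeddings_batch(self, items: list[tuple[str, list[float]]]) -> None:
        # Pass 1: validate every item before touching storage.
        for sample_id, embedding in items:
            if len(embedding) != EMBEDDING_DIM:
                raise ValueError(
                    f"Embedding dimension must be {EMBEDDING_DIM}, "
                    f"got {len(embedding)} for {sample_id}"
                )
        # Pass 2: only reached when the whole batch is valid.
        for sample_id, embedding in items:
            self._rows[sample_id] = list(embedding)

    def __len__(self) -> int:
        return len(self._rows)


store = InMemoryVectorStore()
try:
    # The second item has the wrong dimension, so the whole batch is rejected.
    store.add_embeddings_batch([("smpl_ok", [0.1] * 128), ("smpl_bad", [0.1] * 3)])
except ValueError:
    pass  # nothing was written, including the valid first item
```

Because validation happens in a separate pass, a bad item at the end of the batch cannot leave earlier items behind.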

`delete_embedding(sample_id: str) -> bool` ¶

Delete embedding. Returns True if deleted, False if not found.

Parameters:

Name	Type	Description	Default
`sample_id`	`str`	Sample ID to delete	required

Returns:

Type	Description
`bool`	True if embedding was deleted, False if not found

Example

>>> store = LanceDBVectorStore(Path("./embeddings"))
>>> store.add_embedding("smpl_abc123", [0.1] * 128)
>>> store.delete_embedding("smpl_abc123")
True
>>> store.delete_embedding("smpl_abc123")  # Already deleted
False

`get_embedding(sample_id: str) -> Optional[list[float]]` ¶

Retrieve embedding by sample ID.

Parameters:

Name	Type	Description	Default
`sample_id`	`str`	Sample ID	required

Returns:

Type	Description
`Optional[list[float]]`	Embedding vector if found, None otherwise

Example

>>> store = LanceDBVectorStore(Path("./embeddings"))
>>> store.add_embedding("smpl_abc123", [0.1] * 128)
>>> embedding = store.get_embedding("smpl_abc123")
>>> len(embedding)
128
>>> store.get_embedding("smpl_nonexistent")
None

`search_similar(embedding: list[float], limit: int = 10, offset: int = 0, exclude_ids: Optional[list[str]] = None, distance_metric: Literal['cosine', 'l2'] = 'cosine') -> list[tuple[str, float]]` ¶

Find similar samples by embedding distance.

Returns samples sorted by distance (ascending = most similar first). Uses ANN (Approximate Nearest Neighbors) for efficient search.

Distance metrics:

- cosine: Cosine distance (0 = identical, 2 = opposite)
- l2: Euclidean distance (lower = more similar)

Parameters:

Name	Type	Description	Default
`embedding`	`list[float]`	Query vector (dimension=128)	required
`limit`	`int`	Maximum results to return	`10`
`offset`	`int`	Number of results to skip (for pagination)	`0`
`exclude_ids`	`Optional[list[str]]`	Sample IDs to exclude from results	`None`
`distance_metric`	`Literal['cosine', 'l2']`	Distance calculation method	`'cosine'`

Returns:

Type	Description
`list[tuple[str, float]]`	List of (sample_id, distance) sorted by distance ascending

Raises:

Type	Description
`ValueError`	If embedding dimension != 128

Example

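>>> store = LanceDBVectorStore(Path("./embeddings"))
>>> store.add_embedding("smpl_abc123", [0.1] * 128)
>>> store.add_embedding("smpl_def456", [0.2] * 128)
>>> # l2 is used here because the two stored vectors are parallel,
>>> # so their cosine distances to the query would tie at 0
>>> results = store.search_similar([0.1] * 128, limit=2, distance_metric="l2")
>>> results[0][0]  # Most similar ID
'smpl_abc123'
>>> results[0][1] < results[1][1]  # Distance ascending
True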
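To make the two distance metrics concrete, here is a minimal pure-Python sketch of the standard definitions. The helper names `cosine_distance` and `l2_distance` are illustrative only; the store itself is documented as using LanceDB's index rather than code like this.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0 for the same direction, 2 for opposite directions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance: sensitive to magnitude as well as direction."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


same_direction = cosine_distance([0.1] * 128, [0.2] * 128)   # approximately 0
opposite = cosine_distance([0.1] * 128, [-0.1] * 128)        # approximately 2
euclidean = l2_distance([0.1] * 128, [0.2] * 128)            # sqrt(128 * 0.01)
```

Vectors that differ only in magnitude are indistinguishable under cosine distance but not under l2, which matters when choosing a metric for embeddings that are not normalized.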

`SampleMetadata` ¶

Bases: TypedDict

Metadata for an audio sample.

This TypedDict describes all fields stored for each sample. Fields marked as required (not NotRequired) must be present when creating a sample.

Example

>>> sample = SampleMetadata(
...     id="smpl_abc12345",
...     file_path="/path/to/kick.wav",
...     file_hash="abc123def456",
...     duration_ms=250.5,
...     sample_rate=44100,
...     channels=1,
...     bit_depth=16,
...     file_size_bytes=44100,
...     spectral_centroid=1500.0,
...     spectral_bandwidth=800.0,
...     spectral_rolloff=5000.0,
...     zero_crossing_rate=0.15,
...     rms_energy=0.7,
...     dynamic_range=40.0,
...     bpm=125.0,
...     bpm_confidence=0.95,
...     is_loop=True,
...     key="C",
...     key_confidence=0.88,
...     tuning_frequency=440.0,
...     pitch_salience=0.8,
...     instrument_type="kick",
...     instrument_confidence=0.92,
...     mood=["energetic", "dark"],
...     genre_tags=["techno", "industrial"],
...     created_at=datetime.now(),
...     updated_at=datetime.now(),
... )
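The split between required and optional fields can be sketched with a cut-down TypedDict. `SampleSketch` and its field subset are hypothetical, and the sketch uses the base-class-plus-`total=False` pattern (which behaves like `NotRequired` and also works on older Python versions); which fields the real `SampleMetadata` marks optional is not shown in this reference.

```python
from typing import TypedDict


class _RequiredFields(TypedDict):
    # Assumed-required subset, chosen for illustration only.
    file_path: str
    file_hash: str
    duration_ms: float


class SampleSketch(_RequiredFields, total=False):
    # Optional analysis fields; equivalent to marking them NotRequired.
    bpm: float
    key: str


def missing_required(sample: dict) -> list[str]:
    """List required keys absent from `sample` (TypedDicts are not checked at runtime)."""
    return sorted(SampleSketch.__required_keys__ - set(sample))


incomplete = missing_required({"file_path": "/samples/kick.wav", "bpm": 125.0})
```

A TypedDict is a plain `dict` at runtime, so a check like this (or a static type checker) is what actually enforces the required fields.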

`SampleStore` ¶

Bases: Protocol

Interface for sample storage operations.

This Protocol defines the contract for storing and retrieving sample metadata. Implementations must provide CRUD operations, search, and batch operations.

`add(sample: SampleMetadata) -> str` ¶

Add sample to database.

Parameters:

Name	Type	Description	Default
`sample`	`SampleMetadata`	Complete sample metadata to store	required

Returns:

Type	Description
`str`	Sample ID (format: "smpl_{hash[:8]}")

Raises:

Type	Description
`DuplicateSampleError`	If sample with same file_hash already exists

Example

>>> sample = SampleMetadata(
...     file_path="/samples/kick.wav",
...     file_hash="abc123",
...     duration_ms=250.5,
...     sample_rate=44100,
...     channels=1,
...     bit_depth=16,
...     file_size_bytes=44100,
... )
>>> sample_id = store.add(sample)
>>> sample_id
"smpl_abc123"

`add_batch(samples: list[SampleMetadata]) -> list[str]` ¶

Add multiple samples atomically in a single transaction.

All samples are added or none are (atomic operation). On duplicate, rolls back entire batch without partial commits.

Parameters:

Name	Type	Description	Default
`samples`	`list[SampleMetadata]`	List of sample metadata to store	required

Returns:

Type	Description
`list[str]`	List of sample IDs in same order as input

Raises:

Type	Description
`DuplicateSampleError`	On first duplicate (no partial commits)

Example

>>> samples = [sample1, sample2, sample3]
>>> ids = store.add_batch(samples)
>>> len(ids)
3
>>> ids[0]
"smpl_abc123"
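The all-or-nothing behaviour of `add_batch` can be sketched with the standard-library `sqlite3` module. The table layout, the local `DuplicateSampleError` class, and deriving the ID from the first eight characters of `file_hash` are assumptions made for this illustration, not the library's actual schema or code.

```python
import sqlite3


class DuplicateSampleError(Exception):
    """Local stand-in for the error named in this reference."""


def add_batch(conn: sqlite3.Connection, samples: list[dict]) -> list[str]:
    ids = []
    try:
        with conn:  # commits on success, rolls back if the block raises
            for sample in samples:
                sample_id = f"smpl_{sample['file_hash'][:8]}"  # assumed ID scheme
                conn.execute(
                    "INSERT INTO samples (id, file_path, file_hash) VALUES (?, ?, ?)",
                    (sample_id, sample["file_path"], sample["file_hash"]),
                )
                ids.append(sample_id)
    except sqlite3.IntegrityError as exc:
        raise DuplicateSampleError(str(exc)) from exc
    return ids


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE samples (id TEXT PRIMARY KEY, file_path TEXT, file_hash TEXT UNIQUE)"
)

batch = [
    {"file_path": "/samples/kick.wav", "file_hash": "abc123def456"},
    {"file_path": "/samples/kick_copy.wav", "file_hash": "abc123def456"},  # duplicate
]
try:
    add_batch(conn, batch)
except DuplicateSampleError:
    pass  # the first insert was rolled back together with the failing one

row_count = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
```

Running every insert inside one transaction is what prevents partial commits: the uniqueness violation on the second row undoes the first row as well.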

`count(query: Optional[str] = None, instrument_type: Optional[str] = None, bpm_min: Optional[float] = None, bpm_max: Optional[float] = None, key: Optional[str] = None, mood: Optional[list[str]] = None) -> int` ¶

Count samples matching filters.

Same filter logic as search(), but returns count instead of results. Useful for calculating pagination.

Parameters:

Name	Type	Description	Default
`query`	`Optional[str]`	Text search in file path	`None`
`instrument_type`	`Optional[str]`	Filter by instrument category	`None`
`bpm_min`	`Optional[float]`	Minimum BPM (inclusive)	`None`
`bpm_max`	`Optional[float]`	Maximum BPM (inclusive)	`None`
`key`	`Optional[str]`	Musical key filter	`None`
`mood`	`Optional[list[str]]`	Mood tags (matches if ANY tag present)	`None`

Returns:

Type	Description
`int`	Number of matching samples

Example

>>> total = store.count(instrument_type="kick")
>>> total
234
>>> pages = (total + 9) // 10  # Calculate pages (10 per page)
>>> pages
24
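The page arithmetic in the example generalizes to any page size. The helpers below are a small sketch (the names `page_count` and `page_offsets` are not part of the API) showing the rounding-up division and the `offset` values a caller would pass to `search()`.

```python
def page_count(total: int, page_size: int) -> int:
    """Number of pages needed to show `total` items, rounding up."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return (total + page_size - 1) // page_size


def page_offsets(total: int, page_size: int) -> list[int]:
    """Offsets to pass as `offset` for each page, in order."""
    return list(range(0, total, page_size))


pages = page_count(234, 10)      # same arithmetic as (234 + 9) // 10
offsets = page_offsets(25, 10)   # one offset per page
```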

`delete(sample_id: str) -> bool` ¶

Delete sample from database.

Parameters:

Name	Type	Description	Default
`sample_id`	`str`	Sample ID to delete	required

Returns:

Type	Description
`bool`	True if sample was deleted, False if not found

Example

>>> success = store.delete("smpl_abc123")
>>> success
True
>>> store.delete("smpl_nonexistent")
False

`get(sample_id: str) -> Optional[SampleMetadata]` ¶

Retrieve sample by ID.

Parameters:

Name	Type	Description	Default
`sample_id`	`str`	Sample ID (format: "smpl_{hash[:8]}")	required

Returns:

Type	Description
`Optional[SampleMetadata]`	Sample metadata if found, None otherwise

Example

>>> sample = store.get("smpl_abc123")
>>> sample['file_path']
"/samples/kick.wav"
>>> store.get("smpl_nonexistent")
None

`get_by_hash(file_hash: str) -> Optional[SampleMetadata]` ¶

Retrieve sample by file hash.

Used for deduplication - check if sample already exists before adding.

Parameters:

Name	Type	Description	Default
`file_hash`	`str`	SHA256 hash of audio file	required

Returns:

Type	Description
`Optional[SampleMetadata]`	Sample metadata if found, None otherwise

Example

>>> sample = store.get_by_hash("abc123")
>>> sample is not None
True
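A deduplication check built on a file hash might look like the sketch below. The `index` dictionary plays the role of `get_by_hash()`, and both `add_if_new` and the ID scheme (`smpl_` plus the first eight hex characters) are assumptions for illustration; only the use of SHA256 comes from this reference.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large audio files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def add_if_new(index: dict, path: Path) -> tuple[str, bool]:
    """Return (sample_id, was_added); `index` is a hypothetical hash -> metadata map."""
    file_hash = sha256_of_file(path)
    existing = index.get(file_hash)  # stands in for store.get_by_hash(file_hash)
    if existing is not None:
        return existing["id"], False
    sample_id = f"smpl_{file_hash[:8]}"
    index[file_hash] = {"id": sample_id, "file_path": str(path), "file_hash": file_hash}
    return sample_id, True


with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "kick.wav"
    copy = Path(tmp) / "kick_copy.wav"
    first.write_bytes(b"not real audio, just bytes")
    copy.write_bytes(b"not real audio, just bytes")  # identical content, new path
    index = {}
    first_result = add_if_new(index, first)
    copy_result = add_if_new(index, copy)
```

Checking the hash before calling `add()` avoids relying on `DuplicateSampleError` for control flow when re-scanning a library.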

`get_by_path(file_path: str) -> Optional[SampleMetadata]` ¶

Retrieve sample by file path.

Parameters:

Name	Type	Description	Default
`file_path`	`str`	Absolute path to audio file	required

Returns:

Type	Description
`Optional[SampleMetadata]`	Sample metadata if found, None otherwise

Example

>>> sample = store.get_by_path("/samples/kick.wav")
>>> sample['id']
"smpl_abc123"

`search(query: Optional[str] = None, instrument_type: Optional[str] = None, bpm_min: Optional[float] = None, bpm_max: Optional[float] = None, key: Optional[str] = None, mood: Optional[list[str]] = None, limit: int = 50, offset: int = 0) -> list[SampleMetadata]` ¶

Search samples with filters and pagination.

All filters are combined with AND logic. Text query searches file_path.

Parameters:

Name	Type	Description	Default
`query`	`Optional[str]`	Text search in file path	`None`
`instrument_type`	`Optional[str]`	Filter by instrument category	`None`
`bpm_min`	`Optional[float]`	Minimum BPM (inclusive)	`None`
`bpm_max`	`Optional[float]`	Maximum BPM (inclusive)	`None`
`key`	`Optional[str]`	Musical key filter	`None`
`mood`	`Optional[list[str]]`	Mood tags (matches if ANY tag present)	`None`
`limit`	`int`	Maximum results to return	`50`
`offset`	`int`	Number of results to skip (for pagination)	`0`

Returns:

Type	Description
`list[SampleMetadata]`	List of matching samples (up to limit)

Example

>>> # Search for kicks between 120-130 BPM
>>> results = store.search(
...     instrument_type="kick",
...     bpm_min=120.0,
...     bpm_max=130.0,
...     limit=10,
...     offset=0,
... )
>>> len(results) <= 10
True

>>> # Pagination - get second page
>>> page2 = store.search(
...     instrument_type="kick",
...     limit=10,
...     offset=10,
... )
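The filter semantics documented for `search()` (filters joined with AND, inclusive BPM bounds, mood matching on ANY tag) can be sketched over a plain list of dictionaries. This is an illustration only: the substring match on `file_path` is case-sensitive here and samples without a `bpm` value fail BPM filters, both of which are assumptions rather than documented behaviour.

```python
from typing import Optional


def matches(
    sample: dict,
    query: Optional[str] = None,
    instrument_type: Optional[str] = None,
    bpm_min: Optional[float] = None,
    bpm_max: Optional[float] = None,
    key: Optional[str] = None,
    mood: Optional[list[str]] = None,
) -> bool:
    """Return True only if the sample passes every filter that was supplied."""
    bpm = sample.get("bpm")
    checks = [
        query is None or query in sample["file_path"],
        instrument_type is None or sample.get("instrument_type") == instrument_type,
        bpm_min is None or (bpm is not None and bpm >= bpm_min),  # inclusive
        bpm_max is None or (bpm is not None and bpm <= bpm_max),  # inclusive
        key is None or sample.get("key") == key,
        mood is None or any(tag in sample.get("mood", []) for tag in mood),  # ANY tag
    ]
    return all(checks)  # AND across filters


def search(samples: list[dict], limit: int = 50, offset: int = 0, **filters) -> list[dict]:
    hits = [s for s in samples if matches(s, **filters)]
    return hits[offset : offset + limit]


library = [
    {"file_path": "/samples/kick_01.wav", "instrument_type": "kick", "bpm": 125.0, "mood": ["dark"]},
    {"file_path": "/samples/kick_02.wav", "instrument_type": "kick", "bpm": 140.0, "mood": ["energetic"]},
    {"file_path": "/samples/snare_01.wav", "instrument_type": "snare", "bpm": 125.0, "mood": ["dark", "energetic"]},
]
kicks_in_range = search(library, instrument_type="kick", bpm_min=120.0, bpm_max=130.0)
any_mood = search(library, mood=["dark", "calm"])
```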

`update(sample_id: str, updates: dict) -> bool` ¶

Update sample fields.

Only updates specified fields, leaving others unchanged.

Parameters:

Name	Type	Description	Default
`sample_id`	`str`	Sample ID to update	required
`updates`	`dict`	Dictionary of field names and new values	required

Returns:

Type	Description
`bool`	True if sample was updated, False if not found

Example

>>> success = store.update(
...     "smpl_abc123",
...     {"bpm": 128.0, "key": "C#"}
... )
>>> success
True
>>> store.update("smpl_nonexistent", {"bpm": 120})
False
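A partial update of this kind is often implemented by building the `SET` clause from the keys of `updates`. The sketch below uses the standard-library `sqlite3` module with a hypothetical three-column table and a hypothetical `ALLOWED_FIELDS` whitelist; column names are checked against the whitelist because they cannot be passed as SQL parameters.

```python
import sqlite3

ALLOWED_FIELDS = {"bpm", "key", "instrument_type"}  # hypothetical whitelist


def update(conn: sqlite3.Connection, sample_id: str, updates: dict) -> bool:
    """Update only the given fields; return False when the ID does not exist."""
    if not updates:
        raise ValueError("no fields to update")
    unknown = set(updates) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"Unknown fields: {sorted(unknown)}")
    # Field names come from the whitelist; values are bound as parameters.
    assignments = ", ".join(f'"{field}" = ?' for field in updates)
    with conn:
        cursor = conn.execute(
            f"UPDATE samples SET {assignments} WHERE id = ?",
            (*updates.values(), sample_id),
        )
    return cursor.rowcount > 0


conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE samples (id TEXT PRIMARY KEY, bpm REAL, "key" TEXT, instrument_type TEXT)'
)
conn.execute("INSERT INTO samples VALUES ('smpl_abc123', 120.0, 'C', 'kick')")
conn.commit()

updated = update(conn, "smpl_abc123", {"bpm": 128.0, "key": "C#"})
missing = update(conn, "smpl_nonexistent", {"bpm": 120.0})
row = conn.execute(
    'SELECT bpm, "key", instrument_type FROM samples WHERE id = ?', ("smpl_abc123",)
).fetchone()
```

The untouched `instrument_type` column keeps its value, which is the "leaving others unchanged" behaviour described above.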

`SynthStore` ¶

SQLite implementation of synth storage.

Provides atomic CRUD operations for synth metadata with fail-fast error handling.

Example

>>> store = SynthStore("~/.audiomancer/samples.db")
>>> synth = {
...     "id": "synth_abc123",
...     "name": "tb303",
...     "file_path": "/synths/tb303.scd",
...     "file_hash": "abc123",
...     "source_code": "SynthDef(...)",
...     "controls": [{"name": "freq", "default": 440.0}],
...     "characteristics": {"num_channels": 2, "has_gate": True},
... }
>>> synth_id = store.add(synth)
>>> retrieved = store.get(synth_id)
>>> retrieved['name']
'tb303'

`__init__(db_path: str)` ¶

Initialize store with database connection.

Parameters:

Name	Type	Description	Default
`db_path`	`str`	Path to SQLite database file (will be created if missing)	required

Example

>>> store = SynthStore("~/.audiomancer/samples.db")
>>> store = SynthStore(":memory:")  # In-memory for testing

`add(synth: dict) -> str` ¶

Add synth to database.

Parameters:

Name	Type	Description	Default
`synth`	`dict`	Complete synth metadata to store	required

Returns:

Type	Description
`str`	Synth ID (format: "synth_{hash[:8]}")

Raises:

Type	Description
`StorageError`	If synth with same name or hash already exists

Example

>>> synth = {
...     "id": "synth_abc123",
...     "name": "tb303",
...     "file_path": "/synths/tb303.scd",
...     "file_hash": "abc123",
...     "source_code": "SynthDef(...)",
...     "controls": [],
... }
>>> synth_id = store.add(synth)
>>> synth_id
"synth_abc123"

`add_lineage(synth_id: str, parent_synth_id: str, contribution_weight: float = 0.5) -> None` ¶

Record synth lineage (parent-child relationship).

Used to track synth evolution when one synth is derived from another.

Parameters:

Name	Type	Description	Default
`synth_id`	`str`	Child synth ID	required
`parent_synth_id`	`str`	Parent synth ID	required
`contribution_weight`	`float`	How much parent contributed (0-1)	`0.5`

Raises:

Type	Description
`StorageError`	If synths don't exist or lineage already recorded

Example

>>> store.add_lineage("synth_new", "synth_original", 0.8)

`count(query: Optional[str] = None, category: Optional[str] = None, has_gate: Optional[bool] = None) -> int` ¶

Count synths matching filters.

Same filter logic as search(), but returns count instead of results.

Parameters:

Name	Type	Description	Default
`query`	`Optional[str]`	Text search in name or file path	`None`
`category`	`Optional[str]`	Filter by category	`None`
`has_gate`	`Optional[bool]`	Filter by gate parameter presence	`None`

Returns:

Type	Description
`int`	Number of matching synths

Example

>>> total = store.count(category="bass")
>>> total
15

`delete(synth_id: str) -> bool` ¶

Delete synth from database.

Parameters:

Name	Type	Description	Default
`synth_id`	`str`	Synth ID to delete	required

Returns:

Type	Description
`bool`	True if synth was deleted, False if not found

Example

>>> success = store.delete("synth_abc123")
>>> success
True

`get(synth_id: str) -> Optional[dict]` ¶

Retrieve synth by ID.

Parameters:

Name	Type	Description	Default
`synth_id`	`str`	Synth ID (format: "synth_{hash[:8]}")	required

Returns:

Type	Description
`Optional[dict]`	Synth metadata if found, None otherwise

Example

>>> synth = store.get("synth_abc123")
>>> synth['name']
'tb303'
>>> store.get("synth_nonexistent")
None

`get_by_hash(file_hash: str) -> Optional[dict]` ¶

Retrieve synth by file hash.

Used for deduplication - check if synth already exists before adding.

Parameters:

Name	Type	Description	Default
`file_hash`	`str`	SHA256 hash of source code	required

Returns:

Type	Description
`Optional[dict]`	Synth metadata if found, None otherwise

Example

>>> synth = store.get_by_hash("abc123")
>>> synth is not None
True

`get_by_name(name: str) -> Optional[dict]` ¶

Retrieve synth by name.

Parameters:

Name	Type	Description	Default
`name`	`str`	Synth name (e.g., "tb303")	required

Returns:

Type	Description
`Optional[dict]`	Synth metadata if found, None otherwise

Example

>>> synth = store.get_by_name("tb303")
>>> synth['id']
"synth_abc123"

`get_by_path(file_path: str) -> Optional[dict]` ¶

Retrieve synth by file path.

Parameters:

Name	Type	Description	Default
`file_path`	`str`	Absolute path to .scd file	required

Returns:

Type	Description
`Optional[dict]`	Synth metadata if found, None otherwise

Example

>>> synth = store.get_by_path("/synths/tb303.scd")
>>> synth['name']
'tb303'

`get_lineage(synth_id: str) -> list[dict]` ¶

Get parent synths (lineage) for a synth.

Parameters:

Name	Type	Description	Default
`synth_id`	`str`	Synth ID	required

Returns:

Type	Description
`list[dict]`	List of parent synth records with contribution weights

Example

>>> parents = store.get_lineage("synth_new")
>>> parents[0]['parent_synth_id']
'synth_original'
>>> parents[0]['contribution_weight']
0.8
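One way to model the parent-child records behind `add_lineage` and `get_lineage` is a link table keyed on the (child, parent) pair. The schema below is a guess made for illustration (table and constraint names are hypothetical), and the sketch lets `sqlite3.IntegrityError` surface directly where the real store is documented to raise `StorageError`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row          # rows can be converted to dicts
conn.execute("PRAGMA foreign_keys = ON")  # reject links to synths that do not exist
conn.execute("CREATE TABLE synths (id TEXT PRIMARY KEY, name TEXT UNIQUE)")
conn.execute(
    """
    CREATE TABLE synth_lineage (
        synth_id TEXT NOT NULL REFERENCES synths(id),
        parent_synth_id TEXT NOT NULL REFERENCES synths(id),
        contribution_weight REAL NOT NULL CHECK (contribution_weight BETWEEN 0 AND 1),
        PRIMARY KEY (synth_id, parent_synth_id)  -- one record per child/parent pair
    )
    """
)


def add_lineage(conn, synth_id: str, parent_synth_id: str, contribution_weight: float = 0.5) -> None:
    with conn:
        conn.execute(
            "INSERT INTO synth_lineage VALUES (?, ?, ?)",
            (synth_id, parent_synth_id, contribution_weight),
        )


def get_lineage(conn, synth_id: str) -> list[dict]:
    rows = conn.execute(
        "SELECT parent_synth_id, contribution_weight FROM synth_lineage WHERE synth_id = ?",
        (synth_id,),
    ).fetchall()
    return [dict(row) for row in rows]


with conn:
    conn.executemany(
        "INSERT INTO synths VALUES (?, ?)",
        [("synth_original", "tb303"), ("synth_new", "tb303_acid")],
    )
add_lineage(conn, "synth_new", "synth_original", 0.8)
parents = get_lineage(conn, "synth_new")

try:
    add_lineage(conn, "synth_new", "synth_original", 0.8)  # same pair again
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```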

`list_all(limit: int = 100) -> list[dict]` ¶

List all synths with optional limit.

Convenience method that calls search() with no filters.

Parameters:

Name	Type	Description	Default
`limit`	`int`	Maximum number of synths to return	`100`

Returns:

Type	Description
`list[dict]`	List of synth metadata dictionaries

Example

>>> synths = store.list_all(limit=50)
>>> len(synths) <= 50
True

`search(query: Optional[str] = None, category: Optional[str] = None, has_gate: Optional[bool] = None, limit: int = 50, offset: int = 0) -> list[dict]` ¶

Search synths with filters and pagination.

All filters are combined with AND logic. Text query searches name and file_path.

Parameters:

Name	Type	Description	Default
`query`	`Optional[str]`	Text search in name or file path	`None`
`category`	`Optional[str]`	Filter by category (bass, lead, pad, drum, fx)	`None`
`has_gate`	`Optional[bool]`	Filter by gate parameter presence	`None`
`limit`	`int`	Maximum results to return	`50`
`offset`	`int`	Number of results to skip (for pagination)	`0`

Returns:

Type	Description
`list[dict]`	List of matching synths (up to limit)

Example

>>> # Search for bass synths
>>> results = store.search(category="bass", limit=10)
>>> len(results) <= 10
True

`update(synth_id: str, updates: dict) -> bool` ¶

Update synth fields.

Only updates specified fields, leaving others unchanged. Automatically updates the updated_at timestamp.

Parameters:

Name	Type	Description	Default
`synth_id`	`str`	Synth ID to update	required
`updates`	`dict`	Dictionary of field names and new values	required

Returns:

Type	Description
`bool`	True if synth was updated, False if not found

Example

>>> success = store.update(
...     "synth_abc123",
...     {"characteristics": {"num_channels": 2}}
... )
>>> success
True

`VectorStore` ¶

Bases: Protocol

Interface for embedding vector operations.

Stores and searches sample embeddings for semantic similarity search. Cosine distance (0 = identical, 2 = opposite) is the default metric; search_similar also accepts l2 (Euclidean) distance.
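Because the interface is a `Protocol`, any class with matching methods satisfies it without inheriting from it. The sketch below uses a trimmed-down, hypothetical `EmbeddingStore` protocol with two of the methods listed in this section and a dict-backed `DictStore`; neither name is part of audiomancer.

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class EmbeddingStore(Protocol):
    """Hypothetical, trimmed-down protocol with two of the documented methods."""

    def add_embedding(self, sample_id: str, embedding: list[float]) -> None: ...

    def get_embedding(self, sample_id: str) -> Optional[list[float]]: ...


class DictStore:
    """Satisfies the protocol structurally; note there is no inheritance."""

    def __init__(self) -> None:
        self._rows: dict[str, list[float]] = {}

    def add_embedding(self, sample_id: str, embedding: list[float]) -> None:
        if len(embedding) != 128:
            raise ValueError(f"Embedding dimension must be 128, got {len(embedding)}")
        self._rows[sample_id] = list(embedding)

    def get_embedding(self, sample_id: str) -> Optional[list[float]]:
        return self._rows.get(sample_id)


def index_sample(store: EmbeddingStore, sample_id: str, embedding: list[float]) -> None:
    """Code written against the protocol works with any conforming backend."""
    store.add_embedding(sample_id, embedding)


backend = DictStore()
index_sample(backend, "smpl_abc123", [0.1] * 128)
```

An in-memory backend like this can be handy in tests, while production code passes a LanceDB-backed store to the same functions.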

`add_embedding(sample_id: str, embedding: list[float]) -> None` ¶

Store embedding vector for a sample.

Parameters:

Name	Type	Description	Default
`sample_id`	`str`	Sample ID (must exist in SampleStore)	required
`embedding`	`list[float]`	Vector embedding (dimension=128)	required

Raises:

Type	Description
`ValueError`	If embedding dimension != 128

Example

>>> embedding = [0.1, 0.2, ..., 0.5]  # 128 dimensions
>>> store.add_embedding("smpl_abc123", embedding)

`add_embeddings_batch(items: list[tuple[str, list[float]]]) -> None` ¶

Add multiple embeddings efficiently in batch.

Optimized for bulk insertion (10-100x faster than individual adds).

Parameters:

Name	Type	Description	Default
`items`	`list[tuple[str, list[float]]]`	List of (sample_id, embedding) tuples	required

Raises:

Type	Description
`ValueError`	If any embedding dimension != 128

Example

>>> items = [
...     ("smpl_abc123", [0.1, 0.2, ..., 0.5]),
...     ("smpl_def456", [0.3, 0.4, ..., 0.6]),
...     ("smpl_ghi789", [0.2, 0.3, ..., 0.4]),
... ]
>>> store.add_embeddings_batch(items)

`delete_embedding(sample_id: str) -> bool` ¶

Delete embedding vector for a sample.

Parameters:

Name	Type	Description	Default
`sample_id`	`str`	Sample ID	required

Returns:

Type	Description
`bool`	True if embedding was deleted, False if not found

Example

>>> success = store.delete_embedding("smpl_abc123")
>>> success
True
>>> store.delete_embedding("smpl_nonexistent")
False

`get_embedding(sample_id: str) -> Optional[list[float]]` ¶

Retrieve embedding vector for a sample.

Parameters:

Name	Type	Description	Default
`sample_id`	`str`	Sample ID	required

Returns:

Type	Description
`Optional[list[float]]`	Embedding vector if found, None otherwise

Example

>>> embedding = store.get_embedding("smpl_abc123")
>>> len(embedding)
128
>>> store.get_embedding("smpl_nonexistent")
None

`search_similar(embedding: list[float], limit: int = 10, offset: int = 0, exclude_ids: Optional[list[str]] = None, distance_metric: Literal['cosine', 'l2'] = 'cosine') -> list[tuple[str, float]]` ¶

Find similar samples by embedding distance.

Returns samples sorted by distance (ascending = most similar first).

Distance metrics:

- cosine: Cosine distance (0 = identical, 2 = opposite)
- l2: Euclidean distance (lower = more similar)

Parameters:

Name	Type	Description	Default
`embedding`	`list[float]`	Query vector (dimension=128)	required
`limit`	`int`	Maximum results to return	`10`
`offset`	`int`	Number of results to skip (for pagination)	`0`
`exclude_ids`	`Optional[list[str]]`	Sample IDs to exclude from results	`None`
`distance_metric`	`Literal['cosine', 'l2']`	Distance calculation method	`'cosine'`

Returns:

Type	Description
`list[tuple[str, float]]`	List of (sample_id, distance) sorted by distance ascending

Raises:

Type	Description
`ValueError`	If embedding dimension != 128

Example

>>> query_emb = [0.3, 0.4, ..., 0.5]  # 128 dimensions
>>> results = store.search_similar(
...     query_emb,
...     limit=5,
...     offset=0,
...     distance_metric="cosine",
... )
>>> results
[
    ("smpl_abc123", 0.05),
    ("smpl_def456", 0.12),
    ("smpl_ghi789", 0.18),
    ("smpl_jkl012", 0.23),
    ("smpl_mno345", 0.29),
]

>>> # Pagination - get next page
>>> page2 = store.search_similar(query_emb, limit=5, offset=5)

>>> # Exclude already-used samples
>>> more = store.search_similar(
...     query_emb,
...     limit=5,
...     exclude_ids=["smpl_abc123", "smpl_def456"],
... )
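How `limit`, `offset`, and `exclude_ids` interact can be seen in a brute-force sketch over a dictionary of vectors. This is an exact scan rather than an index lookup, it uses 2-dimensional vectors for readability (the real stores require 128), and applying the exclusion before the offset is an assumption about ordering rather than documented behaviour.

```python
import math
from typing import Optional


def _distance(a: list[float], b: list[float], metric: str) -> float:
    if metric == "l2":
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms  # cosine distance


def search_similar(
    rows: dict,
    embedding: list[float],
    limit: int = 10,
    offset: int = 0,
    exclude_ids: Optional[list[str]] = None,
    distance_metric: str = "cosine",
) -> list[tuple[str, float]]:
    if distance_metric not in ("cosine", "l2"):
        raise ValueError(f"Unsupported metric: {distance_metric}")
    excluded = set(exclude_ids or ())
    scored = [
        (sample_id, _distance(embedding, vector, distance_metric))
        for sample_id, vector in rows.items()
        if sample_id not in excluded  # exclusion happens before paging
    ]
    scored.sort(key=lambda pair: (pair[1], pair[0]))  # ascending distance, ID breaks ties
    return scored[offset : offset + limit]


rows = {
    "smpl_a": [1.0, 0.0],
    "smpl_b": [0.0, 1.0],
    "smpl_c": [-1.0, 0.0],
    "smpl_d": [1.0, 1.0],
}
query = [1.0, 0.0]
page1 = search_similar(rows, query, limit=2)
page2 = search_similar(rows, query, limit=2, offset=2)
without_a = search_similar(rows, query, limit=2, exclude_ids=["smpl_a"])
```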

Unified Storage¶

`unified` ¶

Unified storage layer integrating SQLite and LanceDB.

This module provides atomic operations across both metadata (SQLite) and embedding (LanceDB) stores, ensuring data consistency. All operations follow fail-fast semantics with automatic rollback on partial failures.

Example

>>> from pathlib import Path
>>> storage = UnifiedSampleStorage(
...     db_path=Path("~/.audiomancer/samples.db"),
...     embeddings_path=Path("~/.audiomancer/embeddings")
... )
>>> sample_id = storage.add_sample_with_embedding(sample, embedding)
>>> similar = storage.find_similar(sample_id, limit=10)

`UnifiedSampleStorage` ¶

Unified interface for sample storage with metadata and embeddings.

Coordinates atomic operations across SQLite (metadata) and LanceDB (embeddings) to maintain data consistency. If either store fails, changes are rolled back.

Attributes:

Name	Type	Description
`sample_store`		SQLite metadata store
`vector_store`		LanceDB embedding store

Example

>>> storage = UnifiedSampleStorage(
...     db_path=Path("~/.audiomancer/samples.db"),
...     embeddings_path=Path("~/.audiomancer/embeddings")
... )
>>> sample = SampleMetadata(
...     id="smpl_abc123",
...     file_path="/samples/kick.wav",
...     file_hash="abc123",
...     duration_ms=250.5,
...     sample_rate=44100,
...     channels=1,
...     bit_depth=16,
...     file_size_bytes=44100,
... )
>>> embedding = [0.1] * 128
>>> sample_id = storage.add_sample_with_embedding(sample, embedding)

Source code in src/audiomancer/storage/unified.py

class UnifiedSampleStorage:
    """Unified interface for sample storage with metadata and embeddings.

    Coordinates atomic operations across SQLite (metadata) and LanceDB (embeddings)
    to maintain data consistency. If either store fails, changes are rolled back.

    Attributes:
        sample_store: SQLite metadata store
        vector_store: LanceDB embedding store

    Example:
        >>> storage = UnifiedSampleStorage(
        ...     db_path=Path("~/.audiomancer/samples.db"),
        ...     embeddings_path=Path("~/.audiomancer/embeddings")
        ... )
        >>> sample = SampleMetadata(
        ...     id="smpl_abc123",
        ...     file_path="/samples/kick.wav",
        ...     file_hash="abc123",
        ...     duration_ms=250.5,
        ...     sample_rate=44100,
        ...     channels=1,
        ...     bit_depth=16,
        ...     file_size_bytes=44100,
        ... )
        >>> embedding = [0.1] * 128
        >>> sample_id = storage.add_sample_with_embedding(sample, embedding)
    """

    def __init__(self, db_path: Path, embeddings_path: Path):
        """Initialize unified storage with both stores.

        Creates database and embedding directories if they don't exist.

        Args:
            db_path: Path to SQLite database file
            embeddings_path: Path to LanceDB embeddings directory

        Example:
            >>> storage = UnifiedSampleStorage(
            ...     db_path=Path("~/.audiomancer/samples.db"),
            ...     embeddings_path=Path("~/.audiomancer/embeddings")
            ... )
        """
        # Expand user paths
        db_path = db_path.expanduser().absolute()
        embeddings_path = embeddings_path.expanduser().absolute()

        # Create parent directories
        db_path.parent.mkdir(parents=True, exist_ok=True)
        embeddings_path.mkdir(parents=True, exist_ok=True)

        self.sample_store = SampleStore(str(db_path))
        self.vector_store = LanceDBVectorStore(embeddings_path)

    def add_sample_with_embedding(
        self,
        sample: SampleMetadata,
        embedding: list[float]
    ) -> str:
        """Add sample and embedding atomically.

        Both metadata and embedding are added together. If either operation fails,
        neither is persisted (atomic rollback).

        Args:
            sample: Complete sample metadata
            embedding: 128-dimensional embedding vector

        Returns:
            Sample ID (format: "smpl_{hash[:8]}")

        Raises:
            DuplicateSampleError: If sample hash already exists in database
            StorageError: If embedding dimension != 128 (the underlying
                ValueError is caught and re-raised as StorageError), or on
                unexpected storage errors

        Example:
            >>> sample = SampleMetadata(
            ...     id="smpl_abc123",
            ...     file_path="/samples/kick.wav",
            ...     file_hash="abc123",
            ...     duration_ms=250.5,
            ...     sample_rate=44100,
            ...     channels=1,
            ...     bit_depth=16,
            ...     file_size_bytes=44100,
            ... )
            >>> embedding = [0.1] * 128
            >>> sample_id = storage.add_sample_with_embedding(sample, embedding)
            >>> storage.sample_store.get(sample_id) is not None
            True
            >>> storage.vector_store.get_embedding(sample_id) is not None
            True
        """
        sample_id: Optional[str] = None

        try:
            # Add sample metadata first
            sample_id = self.sample_store.add(sample)

            # Add embedding (if this fails, rollback sample)
            self.vector_store.add_embedding(sample_id, embedding)

            return sample_id

        except DuplicateSampleError:
            # Sample already exists, propagate error
            raise
        except ValueError as e:
            # Embedding validation failed, rollback sample if added
            if sample_id:
                try:
                    self.sample_store.delete(sample_id)
                except Exception:
                    # Ignore rollback errors, original error is more important
                    pass
            raise StorageError(
                f"Invalid embedding: {str(e)}",
                details={"sample_id": sample.get("id"), "error": str(e)}
            )
        except Exception as e:
            # Unexpected error, rollback sample if added
            if sample_id:
                try:
                    self.sample_store.delete(sample_id)
                except Exception:
                    # Ignore rollback errors
                    pass
            raise StorageError(
                f"Failed to add sample with embedding: {str(e)}",
                details={"sample_id": sample.get("id"), "error": str(e)}
            )

    def add_samples_with_embeddings_batch(
        self,
        items: list[tuple[SampleMetadata, list[float]]]
    ) -> list[str]:
        """Add multiple samples and embeddings atomically.

        All samples and embeddings are added together or none are added
        (atomic batch operation). On any failure, rolls back all changes.

        Args:
            items: List of (sample, embedding) tuples

        Returns:
            List of sample IDs in same order as input

        Raises:
            DuplicateSampleError: If any sample hash already exists
            StorageError: If any embedding dimension != 128 (the underlying
                ValueError is caught and re-raised as StorageError), or on
                unexpected storage errors

        Example:
            >>> items = [
            ...     (sample1, [0.1] * 128),
            ...     (sample2, [0.2] * 128),
            ...     (sample3, [0.3] * 128),
            ... ]
            >>> sample_ids = storage.add_samples_with_embeddings_batch(items)
            >>> len(sample_ids)
            3
        """
        if not items:
            return []

        sample_ids: list[str] = []

        try:
            # Validate all embeddings first (fail fast)
            for sample, embedding in items:
                if len(embedding) != LanceDBVectorStore.EMBEDDING_DIM:
                    raise ValueError(
                        f"Embedding dimension must be {LanceDBVectorStore.EMBEDDING_DIM}, "
                        f"got {len(embedding)} for sample {sample.get('id')}"
                    )

            # Add all samples first
            samples = [sample for sample, _ in items]
            sample_ids = self.sample_store.add_batch(samples)

            # Add all embeddings
            embedding_items = [
                (sample_id, embedding)
                for sample_id, (_, embedding) in zip(sample_ids, items)
            ]
            self.vector_store.add_embeddings_batch(embedding_items)

            return sample_ids

        except DuplicateSampleError:
            # Sample already exists, propagate error
            raise
        except ValueError as e:
            # Embedding validation failed, rollback samples if added
            if sample_ids:
                for sample_id in sample_ids:
                    try:
                        self.sample_store.delete(sample_id)
                    except Exception:
                        # Ignore rollback errors
                        pass
            raise StorageError(
                f"Invalid embedding in batch: {str(e)}",
                details={"batch_size": len(items), "error": str(e)}
            )
        except Exception as e:
            # Unexpected error, rollback samples if added
            if sample_ids:
                for sample_id in sample_ids:
                    try:
                        self.sample_store.delete(sample_id)
                    except Exception:
                        # Ignore rollback errors
                        pass
            raise StorageError(
                f"Failed to add batch with embeddings: {str(e)}",
                details={"batch_size": len(items), "error": str(e)}
            )

    def delete_sample(self, sample_id: str) -> bool:
        """Delete sample and its embedding.

        Removes from both metadata and embedding stores. Errors raised while
        deleting the embedding are ignored (best-effort cleanup); errors from
        the metadata delete propagate to the caller.

        Args:
            sample_id: Sample ID to delete

        Returns:
            True if sample was deleted from metadata store, False if not found

        Example:
            >>> success = storage.delete_sample("smpl_abc123")
            >>> success
            True
            >>> storage.sample_store.get("smpl_abc123")
            None
            >>> storage.vector_store.get_embedding("smpl_abc123")
            None
        """
        # Delete from both stores (best effort)
        sample_deleted = self.sample_store.delete(sample_id)

        # Always try to delete embedding even if sample wasn't found
        # (orphaned embeddings should be cleaned up)
        try:
            self.vector_store.delete_embedding(sample_id)
        except Exception:
            # Ignore embedding deletion errors
            pass

        return sample_deleted

    def find_similar(
        self,
        sample_id: str,
        limit: int = 10,
        exclude_self: bool = True,
        distance_metric: Literal["cosine", "l2"] = "cosine",
    ) -> list[tuple[SampleMetadata, float]]:
        """Find samples similar to the given sample.

        Uses the sample's embedding to find nearest neighbors in vector space,
        then retrieves full metadata for each result.

        Args:
            sample_id: Sample ID to find similar samples for
            limit: Maximum number of results to return
            exclude_self: Whether to exclude the query sample from results
            distance_metric: Distance calculation method ("cosine" or "l2")

        Returns:
            List of (sample, distance) tuples sorted by distance ascending

        Raises:
            SampleNotFoundError: If sample_id not found in vector store
            StorageError: On unexpected storage errors

        Example:
            >>> similar = storage.find_similar("smpl_abc123", limit=5)
            >>> len(similar) <= 5
            True
            >>> # First result is most similar
            >>> similar[0][1] < similar[1][1]
            True
        """
        # Get embedding for query sample
        embedding = self.vector_store.get_embedding(sample_id)
        if embedding is None:
            raise SampleNotFoundError(
                sample_id,
                details={"reason": "No embedding found for sample"}
            )

        # Find similar embeddings
        exclude_ids = [sample_id] if exclude_self else None

        # Request extra results to account for potentially missing metadata
        search_limit = limit * 2 if exclude_self else limit + 1

        similar_ids = self.vector_store.search_similar(
            embedding,
            limit=search_limit,
            exclude_ids=exclude_ids,
            distance_metric=distance_metric
        )

        # Retrieve metadata for each result
        results = []
        for sid, distance in similar_ids:
            metadata = self.sample_store.get(sid)
            if metadata:
                results.append((metadata, distance))
                if len(results) >= limit:
                    break

        return results

    def search_by_text_and_similarity(
        self,
        query_embedding: Optional[list[float]] = None,
        text_query: Optional[str] = None,
        filters: Optional[dict] = None,
        limit: int = 20,
        distance_metric: Literal["cosine", "l2"] = "cosine",
    ) -> list[SampleMetadata]:
        """Combined text search and similarity search.

        Can use vector similarity, text search, or both. When both are provided,
        results are intersected (samples must match both criteria).

        Args:
            query_embedding: Optional embedding vector for similarity search
            text_query: Optional text query for metadata search
            filters: Optional filters (instrument_type, bpm_min, bpm_max, key, mood)
            limit: Maximum number of results
            distance_metric: Distance metric for similarity search

        Returns:
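            List of matching samples. Similarity-only results follow the order
            returned by vector_store.search_similar(); text-only and combined
            results follow the order returned by sample_store.search().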

        Raises:
            ValueError: If neither query_embedding nor text_query provided
            StorageError: On unexpected storage errors

        Example:
            >>> # Similarity search only
            >>> results = storage.search_by_text_and_similarity(
            ...     query_embedding=[0.1] * 128,
            ...     limit=10
            ... )

            >>> # Text search only
            >>> results = storage.search_by_text_and_similarity(
            ...     text_query="kick",
            ...     filters={"bpm_min": 120.0, "bpm_max": 130.0},
            ...     limit=10
            ... )

            >>> # Combined search
            >>> results = storage.search_by_text_and_similarity(
            ...     query_embedding=[0.1] * 128,
            ...     text_query="kick",
            ...     filters={"key": "C"},
            ...     limit=10
            ... )
        """
        if query_embedding is None and text_query is None:
            raise ValueError("Must provide query_embedding or text_query or both")

        filters = filters or {}

        # Case 1: Only text search
        if query_embedding is None:
            return self.sample_store.search(
                query=text_query,
                instrument_type=filters.get("instrument_type"),
                bpm_min=filters.get("bpm_min"),
                bpm_max=filters.get("bpm_max"),
                key=filters.get("key"),
                mood=filters.get("mood"),
                limit=limit
            )

        # Case 2: Only similarity search
        if text_query is None and not filters:
            similar_ids = self.vector_store.search_similar(
                query_embedding,
                limit=limit,
                distance_metric=distance_metric
            )

            results = []
            for sample_id, _ in similar_ids:
                metadata = self.sample_store.get(sample_id)
                if metadata:
                    results.append(metadata)

            return results

        # Case 3: Combined search - get candidates from similarity, filter by text
        # Request more candidates to account for filtering
        search_limit = limit * 5

        similar_ids = self.vector_store.search_similar(
            query_embedding,
            limit=search_limit,
            distance_metric=distance_metric
        )

        # Get sample IDs from similarity search
        candidate_ids = {sample_id for sample_id, _ in similar_ids}

        # Get samples matching text filters
        text_results = self.sample_store.search(
            query=text_query,
            instrument_type=filters.get("instrument_type"),
            bpm_min=filters.get("bpm_min"),
            bpm_max=filters.get("bpm_max"),
            key=filters.get("key"),
            mood=filters.get("mood"),
            limit=search_limit
        )

        # Intersect: only samples that match both criteria
        results = []
        for sample in text_results:
            sample_id = sample.get("id")
            if sample_id is not None and sample_id in candidate_ids:
                results.append(sample)
                if len(results) >= limit:
                    break

        return results

    def get_sample(self, sample_id: str) -> Optional[SampleMetadata]:
        """Retrieve sample metadata by ID.

        Convenience wrapper around sample_store.get().

        Args:
            sample_id: Sample ID

        Returns:
            Sample metadata if found, None otherwise

        Example:
            >>> sample = storage.get_sample("smpl_abc123")
            >>> sample['file_path']
            '/samples/kick.wav'
        """
        return self.sample_store.get(sample_id)

    def get_embedding(self, sample_id: str) -> Optional[list[float]]:
        """Retrieve embedding by sample ID.

        Convenience wrapper around vector_store.get_embedding().

        Args:
            sample_id: Sample ID

        Returns:
            Embedding vector if found, None otherwise

        Example:
            >>> embedding = storage.get_embedding("smpl_abc123")
            >>> len(embedding)
            128
        """
        return self.vector_store.get_embedding(sample_id)

    def update_sample(self, sample_id: str, updates: dict) -> bool:
        """Update sample metadata fields.

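        Only updates specified fields in metadata store. Does not affect embedding.
        To replace the embedding of an existing sample, call
        vector_store.add_embedding(), which replaces the stored embedding for
        that ID; add_sample_with_embedding() raises DuplicateSampleError when
        the sample hash already exists.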

        Args:
            sample_id: Sample ID to update
            updates: Dictionary of field names and new values

        Returns:
            True if sample was updated, False if not found

        Example:
            >>> success = storage.update_sample(
            ...     "smpl_abc123",
            ...     {"bpm": 128.0, "key": "C#"}
            ... )
            >>> success
            True
        """
        return self.sample_store.update(sample_id, updates)

`__init__(db_path: Path, embeddings_path: Path)` ¶

Initialize unified storage with both stores.

Creates database and embedding directories if they don't exist.

Parameters:

Name	Type	Description	Default
`db_path`	`Path`	Path to SQLite database file	required
`embeddings_path`	`Path`	Path to LanceDB embeddings directory	required

Example

>>> storage = UnifiedSampleStorage(
...     db_path=Path("~/.audiomancer/samples.db"),
...     embeddings_path=Path("~/.audiomancer/embeddings")
... )
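The constructor expands `~`, makes both paths absolute, and then creates the directories it needs. The sketch below reproduces only that path handling with `pathlib`; the helper name `prepare_storage_paths` and the temporary directory are illustrative assumptions, not part of audiomancer.

```python
import tempfile
from pathlib import Path


def prepare_storage_paths(db_path: Path, embeddings_path: Path) -> tuple[Path, Path]:
    """Hypothetical helper mirroring the path handling in __init__."""
    # Expand "~" and make both paths absolute
    db_path = db_path.expanduser().absolute()
    embeddings_path = embeddings_path.expanduser().absolute()

    # The database is a file, so only its parent directory is created;
    # the embeddings path is itself a directory.
    db_path.parent.mkdir(parents=True, exist_ok=True)
    embeddings_path.mkdir(parents=True, exist_ok=True)
    return db_path, embeddings_path


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    db, emb = prepare_storage_paths(
        root / "data" / "samples.db",
        root / "data" / "embeddings",
    )
    db_parent_exists = db.parent.is_dir()
    emb_exists = emb.is_dir()
    db_file_exists = db.exists()  # the database file itself is not created here
```

In this sketch the database file is left for the store to create; only the directory layout is prepared up front.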

Source code in src/audiomancer/storage/unified.py

def __init__(self, db_path: Path, embeddings_path: Path):
    """Initialize unified storage with both stores.

    Creates database and embedding directories if they don't exist.

    Args:
        db_path: Path to SQLite database file
        embeddings_path: Path to LanceDB embeddings directory

    Example:
        >>> storage = UnifiedSampleStorage(
        ...     db_path=Path("~/.audiomancer/samples.db"),
        ...     embeddings_path=Path("~/.audiomancer/embeddings")
        ... )
    """
    # Expand user paths
    db_path = db_path.expanduser().absolute()
    embeddings_path = embeddings_path.expanduser().absolute()

    # Create parent directories
    db_path.parent.mkdir(parents=True, exist_ok=True)
    embeddings_path.mkdir(parents=True, exist_ok=True)

    self.sample_store = SampleStore(str(db_path))
    self.vector_store = LanceDBVectorStore(embeddings_path)

`add_sample_with_embedding(sample: SampleMetadata, embedding: list[float]) -> str` ¶

Add sample and embedding atomically.

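Metadata is written first, then the embedding. If the embedding write fails, the metadata row is deleted again so that neither is persisted. The rollback is best effort: errors raised by the rollback delete are ignored.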

Parameters:

Name	Type	Description	Default
`sample`	`SampleMetadata`	Complete sample metadata	required
`embedding`	`list[float]`	128-dimensional embedding vector	required

Returns:

Type	Description
`str`	Sample ID (format: "smpl_{hash[:8]}")

Raises:

Type	Description
`DuplicateSampleError`	If sample hash already exists in database
`StorageError`	If embedding dimension != 128 (the underlying `ValueError` is caught and re-raised as `StorageError`), or on unexpected storage errors

Example

>>> sample = SampleMetadata(
...     id="smpl_abc123",
...     file_path="/samples/kick.wav",
...     file_hash="abc123",
...     duration_ms=250.5,
...     sample_rate=44100,
...     channels=1,
...     bit_depth=16,
...     file_size_bytes=44100,
... )
>>> embedding = [0.1] * 128
>>> sample_id = storage.add_sample_with_embedding(sample, embedding)
>>> storage.sample_store.get(sample_id) is not None
True
>>> storage.vector_store.get_embedding(sample_id) is not None
True
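The add-then-compensate pattern used by this method can be sketched with in-memory stand-ins. Everything below (`InMemorySampleStore`, `InMemoryVectorStore`, `add_with_rollback`) is hypothetical and dict-backed. Unlike the real method, this sketch re-raises the original exception instead of wrapping it in `StorageError`.

```python
class InMemorySampleStore:
    """Hypothetical stand-in for SampleStore, backed by a dict."""

    def __init__(self) -> None:
        self.rows: dict[str, dict] = {}

    def add(self, sample: dict) -> str:
        self.rows[sample["id"]] = sample
        return sample["id"]

    def delete(self, sample_id: str) -> bool:
        return self.rows.pop(sample_id, None) is not None


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector store with a fixed dimension."""

    def __init__(self, dim: int = 128) -> None:
        self.dim = dim
        self.vectors: dict[str, list[float]] = {}

    def add_embedding(self, sample_id: str, embedding: list[float]) -> None:
        if len(embedding) != self.dim:
            raise ValueError(
                f"Embedding dimension must be {self.dim}, got {len(embedding)}"
            )
        self.vectors[sample_id] = list(embedding)


def add_with_rollback(samples, vectors, sample: dict, embedding: list[float]) -> str:
    # Step 1: write metadata
    sample_id = samples.add(sample)
    try:
        # Step 2: write the embedding
        vectors.add_embedding(sample_id, embedding)
    except Exception:
        # Compensating delete so no metadata row is left without an embedding
        samples.delete(sample_id)
        raise
    return sample_id


samples, vectors = InMemorySampleStore(), InMemoryVectorStore()
ok_id = add_with_rollback(samples, vectors, {"id": "smpl_abc123"}, [0.1] * 128)

try:
    add_with_rollback(samples, vectors, {"id": "smpl_bad00000"}, [0.1] * 3)
    rolled_back = False
except ValueError:
    rolled_back = "smpl_bad00000" not in samples.rows
```

This is a compensating action rather than a true transaction: between the two steps a reader could briefly observe metadata without an embedding.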

Source code in src/audiomancer/storage/unified.py

def add_sample_with_embedding(
    self,
    sample: SampleMetadata,
    embedding: list[float]
) -> str:
    """Add sample and embedding atomically.

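    Metadata is written first, then the embedding. If the embedding write
    fails, the metadata row is deleted again so that neither is persisted.
    The rollback is best effort: errors raised by the rollback delete are
    ignored.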

    Args:
        sample: Complete sample metadata
        embedding: 128-dimensional embedding vector

    Returns:
        Sample ID (format: "smpl_{hash[:8]}")

    Raises:
        DuplicateSampleError: If sample hash already exists in database
        StorageError: If embedding dimension != 128 (the underlying
            ValueError is caught and re-raised as StorageError), or on
            unexpected storage errors

    Example:
        >>> sample = SampleMetadata(
        ...     id="smpl_abc123",
        ...     file_path="/samples/kick.wav",
        ...     file_hash="abc123",
        ...     duration_ms=250.5,
        ...     sample_rate=44100,
        ...     channels=1,
        ...     bit_depth=16,
        ...     file_size_bytes=44100,
        ... )
        >>> embedding = [0.1] * 128
        >>> sample_id = storage.add_sample_with_embedding(sample, embedding)
        >>> storage.sample_store.get(sample_id) is not None
        True
        >>> storage.vector_store.get_embedding(sample_id) is not None
        True
    """
    sample_id: Optional[str] = None

    try:
        # Add sample metadata first
        sample_id = self.sample_store.add(sample)

        # Add embedding (if this fails, rollback sample)
        self.vector_store.add_embedding(sample_id, embedding)

        return sample_id

    except DuplicateSampleError:
        # Sample already exists, propagate error
        raise
    except ValueError as e:
        # Embedding validation failed, rollback sample if added
        if sample_id:
            try:
                self.sample_store.delete(sample_id)
            except Exception:
                # Ignore rollback errors, original error is more important
                pass
        raise StorageError(
            f"Invalid embedding: {str(e)}",
            details={"sample_id": sample.get("id"), "error": str(e)}
        )
    except Exception as e:
        # Unexpected error, rollback sample if added
        if sample_id:
            try:
                self.sample_store.delete(sample_id)
            except Exception:
                # Ignore rollback errors
                pass
        raise StorageError(
            f"Failed to add sample with embedding: {str(e)}",
            details={"sample_id": sample.get("id"), "error": str(e)}
        )

`add_samples_with_embeddings_batch(items: list[tuple[SampleMetadata, list[float]]]) -> list[str]` ¶

Add multiple samples and embeddings atomically.

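Embedding dimensions are validated before anything is written. Samples are then added in one batch, followed by their embeddings. If a later step fails, the samples already added are deleted again (best-effort rollback; errors raised during rollback are ignored).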

Parameters:

Name	Type	Description	Default
`items`	`list[tuple[SampleMetadata, list[float]]]`	List of (sample, embedding) tuples	required

Returns:

Type	Description
`list[str]`	List of sample IDs in same order as input

Raises:

Type	Description
`DuplicateSampleError`	If any sample hash already exists
`StorageError`	If any embedding dimension != 128 (the underlying `ValueError` is caught and re-raised as `StorageError`), or on unexpected storage errors

Example

>>> items = [
...     (sample1, [0.1] * 128),
...     (sample2, [0.2] * 128),
...     (sample3, [0.3] * 128),
... ]
>>> sample_ids = storage.add_samples_with_embeddings_batch(items)
>>> len(sample_ids)
3
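Two details of the batch path are easy to miss: every embedding is checked before any write happens, and the returned IDs are paired with embeddings by position. A minimal sketch of both steps follows; the helper names `validate_batch` and `pair_ids_with_embeddings` are illustrative, and plain dicts stand in for `SampleMetadata`.

```python
EMBEDDING_DIM = 128  # dimension stated in this reference


def validate_batch(items: list[tuple[dict, list[float]]]) -> None:
    """Fail fast: raise before any store is touched if a dimension is wrong."""
    for sample, embedding in items:
        if len(embedding) != EMBEDDING_DIM:
            raise ValueError(
                f"Embedding dimension must be {EMBEDDING_DIM}, "
                f"got {len(embedding)} for sample {sample.get('id')}"
            )


def pair_ids_with_embeddings(
    sample_ids: list[str],
    items: list[tuple[dict, list[float]]],
) -> list[tuple[str, list[float]]]:
    """Pair each returned ID with the embedding at the same position."""
    return [(sid, emb) for sid, (_, emb) in zip(sample_ids, items)]


items = [
    ({"id": "smpl_a"}, [0.1] * 128),
    ({"id": "smpl_b"}, [0.2] * 128),
]
validate_batch(items)  # passes silently
paired = pair_ids_with_embeddings(["smpl_a", "smpl_b"], items)

bad_items = items + [({"id": "smpl_c"}, [0.3] * 64)]
try:
    validate_batch(bad_items)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

Because pairing is positional, this approach relies on the sample store returning IDs in the same order as the input list.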

Source code in src/audiomancer/storage/unified.py

def add_samples_with_embeddings_batch(
    self,
    items: list[tuple[SampleMetadata, list[float]]]
) -> list[str]:
    """Add multiple samples and embeddings atomically.

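    Embedding dimensions are validated before anything is written. Samples
    are then added in one batch, followed by their embeddings. If a later
    step fails, the samples already added are deleted again (best-effort
    rollback; errors raised during rollback are ignored).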

    Args:
        items: List of (sample, embedding) tuples

    Returns:
        List of sample IDs in same order as input

    Raises:
        DuplicateSampleError: If any sample hash already exists
        StorageError: If any embedding dimension != 128 (the underlying
            ValueError is caught and re-raised as StorageError), or on
            unexpected storage errors

    Example:
        >>> items = [
        ...     (sample1, [0.1] * 128),
        ...     (sample2, [0.2] * 128),
        ...     (sample3, [0.3] * 128),
        ... ]
        >>> sample_ids = storage.add_samples_with_embeddings_batch(items)
        >>> len(sample_ids)
        3
    """
    if not items:
        return []

    sample_ids: list[str] = []

    try:
        # Validate all embeddings first (fail fast)
        for sample, embedding in items:
            if len(embedding) != LanceDBVectorStore.EMBEDDING_DIM:
                raise ValueError(
                    f"Embedding dimension must be {LanceDBVectorStore.EMBEDDING_DIM}, "
                    f"got {len(embedding)} for sample {sample.get('id')}"
                )

        # Add all samples first
        samples = [sample for sample, _ in items]
        sample_ids = self.sample_store.add_batch(samples)

        # Add all embeddings
        embedding_items = [
            (sample_id, embedding)
            for sample_id, (_, embedding) in zip(sample_ids, items)
        ]
        self.vector_store.add_embeddings_batch(embedding_items)

        return sample_ids

    except DuplicateSampleError:
        # Sample already exists, propagate error
        raise
    except ValueError as e:
        # Embedding validation failed, rollback samples if added
        if sample_ids:
            for sample_id in sample_ids:
                try:
                    self.sample_store.delete(sample_id)
                except Exception:
                    # Ignore rollback errors
                    pass
        raise StorageError(
            f"Invalid embedding in batch: {str(e)}",
            details={"batch_size": len(items), "error": str(e)}
        )
    except Exception as e:
        # Unexpected error, rollback samples if added
        if sample_ids:
            for sample_id in sample_ids:
                try:
                    self.sample_store.delete(sample_id)
                except Exception:
                    # Ignore rollback errors
                    pass
        raise StorageError(
            f"Failed to add batch with embeddings: {str(e)}",
            details={"batch_size": len(items), "error": str(e)}
        )

`delete_sample(sample_id: str) -> bool` ¶

Delete sample and its embedding.

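Removes from both metadata and embedding stores. The embedding delete is always attempted, even if the sample was not found, and any error it raises is ignored (best-effort cleanup). Errors from the metadata delete propagate to the caller.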

Parameters:

Name	Type	Description	Default
`sample_id`	`str`	Sample ID to delete	required

Returns:

Type	Description
`bool`	True if sample was deleted from metadata store, False if not found

Example

>>> success = storage.delete_sample("smpl_abc123")
>>> success
True
>>> storage.sample_store.get("smpl_abc123") is None
True
>>> storage.vector_store.get_embedding("smpl_abc123") is None
True
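The return value reflects only the metadata store, which the following sketch makes concrete. `FlakyVectorStore` and `delete_best_effort` are hypothetical names, and a dict stands in for the metadata store.

```python
class FlakyVectorStore:
    """Hypothetical vector store whose delete always fails."""

    def delete_embedding(self, sample_id: str) -> bool:
        raise RuntimeError("vector backend unavailable")


def delete_best_effort(rows: dict, vector_store, sample_id: str) -> bool:
    # The metadata delete decides the return value
    sample_deleted = rows.pop(sample_id, None) is not None

    # The embedding delete is always attempted; its failures are swallowed
    try:
        vector_store.delete_embedding(sample_id)
    except Exception:
        pass

    return sample_deleted


rows = {"smpl_abc123": {"id": "smpl_abc123"}}
first = delete_best_effort(rows, FlakyVectorStore(), "smpl_abc123")
second = delete_best_effort(rows, FlakyVectorStore(), "smpl_abc123")
```

A `True` result therefore does not guarantee that the embedding is gone. Callers that need that guarantee can check `get_embedding()` afterwards.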

Source code in src/audiomancer/storage/unified.py

def delete_sample(self, sample_id: str) -> bool:
    """Delete sample and its embedding.

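    Removes from both metadata and embedding stores. The embedding delete
    is always attempted, even if the sample was not found, and any error it
    raises is ignored (best-effort cleanup). Errors from the metadata
    delete propagate to the caller.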

    Args:
        sample_id: Sample ID to delete

    Returns:
        True if sample was deleted from metadata store, False if not found

    Example:
        >>> success = storage.delete_sample("smpl_abc123")
        >>> success
        True
        >>> storage.sample_store.get("smpl_abc123") is None
        True
        >>> storage.vector_store.get_embedding("smpl_abc123") is None
        True
    """
    # Delete from both stores (best effort)
    sample_deleted = self.sample_store.delete(sample_id)

    # Always try to delete embedding even if sample wasn't found
    # (orphaned embeddings should be cleaned up)
    try:
        self.vector_store.delete_embedding(sample_id)
    except Exception:
        # Ignore embedding deletion errors
        pass

    return sample_deleted

`find_similar(sample_id: str, limit: int = 10, exclude_self: bool = True, distance_metric: Literal['cosine', 'l2'] = 'cosine') -> list[tuple[SampleMetadata, float]]` ¶

Find samples similar to the given sample.

Uses the sample's embedding to find nearest neighbors in vector space, then retrieves full metadata for each result.

Parameters:

Name	Type	Description	Default
`sample_id`	`str`	Sample ID to find similar samples for	required
`limit`	`int`	Maximum number of results to return	`10`
`exclude_self`	`bool`	Whether to exclude the query sample from results	`True`
`distance_metric`	`Literal['cosine', 'l2']`	Distance calculation method ("cosine" or "l2")	`'cosine'`

Returns:

Type	Description
`list[tuple[SampleMetadata, float]]`	List of (sample, distance) tuples sorted by distance ascending

Raises:

Type	Description
`SampleNotFoundError`	If sample_id not found in vector store
`StorageError`	On unexpected storage errors

Example

>>> similar = storage.find_similar("smpl_abc123", limit=5)
>>> len(similar) <= 5
True
>>> # First result is most similar
>>> similar[0][1] < similar[1][1]
True
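The lookup over-fetches neighbours and then drops any ID that has no metadata row. The sketch below imitates that flow with a brute-force ranking over plain dicts. It assumes cosine distance is defined as 1 minus cosine similarity and uses 2-dimensional vectors for brevity (real embeddings are 128-dimensional). The names `cosine_distance` and `find_similar_sketch` are illustrative only.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Assumed definition: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def find_similar_sketch(vectors, metadata, sample_id, limit=10, exclude_self=True):
    query = vectors[sample_id]

    # Brute-force ranking stands in for the vector store's index search
    ranked = sorted(
        (
            (sid, cosine_distance(query, vec))
            for sid, vec in vectors.items()
            if not (exclude_self and sid == sample_id)
        ),
        key=lambda pair: pair[1],
    )

    # Over-fetch so that IDs without metadata can be skipped
    search_limit = limit * 2 if exclude_self else limit + 1

    results = []
    for sid, distance in ranked[:search_limit]:
        if sid in metadata:
            results.append((metadata[sid], distance))
            if len(results) >= limit:
                break
    return results


vectors = {
    "smpl_q": [1.0, 0.0],
    "smpl_near": [0.9, 0.1],
    "smpl_orphan": [0.8, 0.2],  # has an embedding but no metadata row
    "smpl_far": [0.0, 1.0],
}
metadata = {sid: {"id": sid} for sid in ("smpl_q", "smpl_near", "smpl_far")}

similar = find_similar_sketch(vectors, metadata, "smpl_q", limit=2)
similar_ids = [sample["id"] for sample, _ in similar]
```

Here `smpl_orphan` ranks second by distance but is skipped, so the second result is `smpl_far`.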

Source code in src/audiomancer/storage/unified.py

def find_similar(
    self,
    sample_id: str,
    limit: int = 10,
    exclude_self: bool = True,
    distance_metric: Literal["cosine", "l2"] = "cosine",
) -> list[tuple[SampleMetadata, float]]:
    """Find samples similar to the given sample.

    Uses the sample's embedding to find nearest neighbors in vector space,
    then retrieves full metadata for each result.

    Args:
        sample_id: Sample ID to find similar samples for
        limit: Maximum number of results to return
        exclude_self: Whether to exclude the query sample from results
        distance_metric: Distance calculation method ("cosine" or "l2")

    Returns:
        List of (sample, distance) tuples sorted by distance ascending

    Raises:
        SampleNotFoundError: If sample_id not found in vector store
        StorageError: On unexpected storage errors

    Example:
        >>> similar = storage.find_similar("smpl_abc123", limit=5)
        >>> len(similar) <= 5
        True
        >>> # First result is most similar
        >>> similar[0][1] < similar[1][1]
        True
    """
    # Get embedding for query sample
    embedding = self.vector_store.get_embedding(sample_id)
    if embedding is None:
        raise SampleNotFoundError(
            sample_id,
            details={"reason": "No embedding found for sample"}
        )

    # Find similar embeddings
    exclude_ids = [sample_id] if exclude_self else None

    # Request extra results to account for potentially missing metadata
    search_limit = limit * 2 if exclude_self else limit + 1

    similar_ids = self.vector_store.search_similar(
        embedding,
        limit=search_limit,
        exclude_ids=exclude_ids,
        distance_metric=distance_metric
    )

    # Retrieve metadata for each result
    results = []
    for sid, distance in similar_ids:
        metadata = self.sample_store.get(sid)
        if metadata:
            results.append((metadata, distance))
            if len(results) >= limit:
                break

    return results

`get_embedding(sample_id: str) -> Optional[list[float]]` ¶

Retrieve embedding by sample ID.

Convenience wrapper around vector_store.get_embedding().

Parameters:

Name	Type	Description	Default
`sample_id`	`str`	Sample ID	required

Returns:

Type	Description
`Optional[list[float]]`	Embedding vector if found, None otherwise

Example

>>> embedding = storage.get_embedding("smpl_abc123")
>>> len(embedding)
128

Source code in src/audiomancer/storage/unified.py

def get_embedding(self, sample_id: str) -> Optional[list[float]]:
    """Retrieve embedding by sample ID.

    Convenience wrapper around vector_store.get_embedding().

    Args:
        sample_id: Sample ID

    Returns:
        Embedding vector if found, None otherwise

    Example:
        >>> embedding = storage.get_embedding("smpl_abc123")
        >>> len(embedding)
        128
    """
    return self.vector_store.get_embedding(sample_id)

`get_sample(sample_id: str) -> Optional[SampleMetadata]` ¶

Retrieve sample metadata by ID.

Convenience wrapper around sample_store.get().

Parameters:

Name	Type	Description	Default
`sample_id`	`str`	Sample ID	required

Returns:

Type	Description
`Optional[SampleMetadata]`	Sample metadata if found, None otherwise

Example

>>> sample = storage.get_sample("smpl_abc123")
>>> sample['file_path']
'/samples/kick.wav'

Source code in src/audiomancer/storage/unified.py

def get_sample(self, sample_id: str) -> Optional[SampleMetadata]:
    """Retrieve sample metadata by ID.

    Convenience wrapper around sample_store.get().

    Args:
        sample_id: Sample ID

    Returns:
        Sample metadata if found, None otherwise

    Example:
        >>> sample = storage.get_sample("smpl_abc123")
        >>> sample['file_path']
        '/samples/kick.wav'
    """
    return self.sample_store.get(sample_id)

`search_by_text_and_similarity(query_embedding: Optional[list[float]] = None, text_query: Optional[str] = None, filters: Optional[dict] = None, limit: int = 20, distance_metric: Literal['cosine', 'l2'] = 'cosine') -> list[SampleMetadata]` ¶

Combined text search and similarity search.

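Can use vector similarity, text search, or both. When an embedding is combined with a text query or filters, results are intersected (samples must match both criteria). Each side of the intersection considers at most `limit * 5` candidates, so fewer than `limit` results may be returned even when more matches exist.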

Parameters:

Name	Type	Description	Default
`query_embedding`	`Optional[list[float]]`	Optional embedding vector for similarity search	`None`
`text_query`	`Optional[str]`	Optional text query for metadata search	`None`
`filters`	`Optional[dict]`	Optional filters (instrument_type, bpm_min, bpm_max, key, mood)	`None`
`limit`	`int`	Maximum number of results	`20`
`distance_metric`	`Literal['cosine', 'l2']`	Distance metric for similarity search	`'cosine'`

Returns:

Type	Description
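`list[SampleMetadata]`	List of matching samples. Similarity-only results follow the order returned by vector_store.search_similar(); text-only and combined results follow the order returned by sample_store.search().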

Raises:

Type	Description
`ValueError`	If neither query_embedding nor text_query provided
`StorageError`	On unexpected storage errors

Example

>>> # Similarity search only
>>> results = storage.search_by_text_and_similarity(
...     query_embedding=[0.1] * 128,
...     limit=10
... )

>>> # Text search only
>>> results = storage.search_by_text_and_similarity(
...     text_query="kick",
...     filters={"bpm_min": 120.0, "bpm_max": 130.0},
...     limit=10
... )

>>> # Combined search
>>> results = storage.search_by_text_and_similarity(
...     query_embedding=[0.1] * 128,
...     text_query="kick",
...     filters={"key": "C"},
...     limit=10
... )
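In combined mode the similarity hits act only as a membership filter, and the text results determine the output order. A minimal sketch of that intersection step follows. The function name `intersect_results` and the sample data are illustrative, and dicts stand in for `SampleMetadata`.

```python
def intersect_results(
    similar_ids: list[tuple[str, float]],
    text_results: list[dict],
    limit: int,
) -> list[dict]:
    # Similarity hits become a set used only for membership checks
    candidate_ids = {sid for sid, _ in similar_ids}

    results = []
    for sample in text_results:  # output order follows the text search
        sid = sample.get("id")
        if sid is not None and sid in candidate_ids:
            results.append(sample)
            if len(results) >= limit:
                break
    return results


similar_ids = [("smpl_c", 0.05), ("smpl_a", 0.10), ("smpl_b", 0.40)]
text_results = [{"id": "smpl_a"}, {"id": "smpl_x"}, {"id": "smpl_c"}]

combined = intersect_results(similar_ids, text_results, limit=10)
combined_ids = [sample["id"] for sample in combined]
```

Although `smpl_c` is the nearest neighbour, it appears after `smpl_a` because the text search returned it later. Distances are discarded at this step.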

Source code in src/audiomancer/storage/unified.py

def search_by_text_and_similarity(
    self,
    query_embedding: Optional[list[float]] = None,
    text_query: Optional[str] = None,
    filters: Optional[dict] = None,
    limit: int = 20,
    distance_metric: Literal["cosine", "l2"] = "cosine",
) -> list[SampleMetadata]:
    """Combined text search and similarity search.

    Can use vector similarity, text search, or both. When both are provided,
    results are intersected (samples must match both criteria).

    Args:
        query_embedding: Optional embedding vector for similarity search
        text_query: Optional text query for metadata search
        filters: Optional filters (instrument_type, bpm_min, bpm_max, key, mood)
        limit: Maximum number of results
        distance_metric: Distance metric for similarity search

    Returns:
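        List of matching samples. Similarity-only results follow the order
        returned by vector_store.search_similar(); text-only and combined
        results follow the order returned by sample_store.search().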

    Raises:
        ValueError: If neither query_embedding nor text_query provided
        StorageError: On unexpected storage errors

    Example:
        >>> # Similarity search only
        >>> results = storage.search_by_text_and_similarity(
        ...     query_embedding=[0.1] * 128,
        ...     limit=10
        ... )

        >>> # Text search only
        >>> results = storage.search_by_text_and_similarity(
        ...     text_query="kick",
        ...     filters={"bpm_min": 120.0, "bpm_max": 130.0},
        ...     limit=10
        ... )

        >>> # Combined search
        >>> results = storage.search_by_text_and_similarity(
        ...     query_embedding=[0.1] * 128,
        ...     text_query="kick",
        ...     filters={"key": "C"},
        ...     limit=10
        ... )
    """
    if query_embedding is None and text_query is None:
        raise ValueError("Must provide query_embedding or text_query or both")

    filters = filters or {}

    # Case 1: Only text search
    if query_embedding is None:
        return self.sample_store.search(
            query=text_query,
            instrument_type=filters.get("instrument_type"),
            bpm_min=filters.get("bpm_min"),
            bpm_max=filters.get("bpm_max"),
            key=filters.get("key"),
            mood=filters.get("mood"),
            limit=limit
        )

    # Case 2: Only similarity search
    if text_query is None and not filters:
        similar_ids = self.vector_store.search_similar(
            query_embedding,
            limit=limit,
            distance_metric=distance_metric
        )

        results = []
        for sample_id, _ in similar_ids:
            metadata = self.sample_store.get(sample_id)
            if metadata:
                results.append(metadata)

        return results

    # Case 3: Combined search - get candidates from similarity, filter by text
    # Request more candidates to account for filtering
    search_limit = limit * 5

    similar_ids = self.vector_store.search_similar(
        query_embedding,
        limit=search_limit,
        distance_metric=distance_metric
    )

    # Get sample IDs from similarity search
    candidate_ids = {sample_id for sample_id, _ in similar_ids}

    # Get samples matching text filters
    text_results = self.sample_store.search(
        query=text_query,
        instrument_type=filters.get("instrument_type"),
        bpm_min=filters.get("bpm_min"),
        bpm_max=filters.get("bpm_max"),
        key=filters.get("key"),
        mood=filters.get("mood"),
        limit=search_limit
    )

    # Intersect: only samples that match both criteria
    results = []
    for sample in text_results:
        sample_id = sample.get("id")
        if sample_id is not None and sample_id in candidate_ids:
            results.append(sample)
            if len(results) >= limit:
                break

    return results

`update_sample(sample_id: str, updates: dict) -> bool` ¶

Update sample metadata fields.

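Only updates specified fields in metadata store. Does not affect embedding. To replace the embedding of an existing sample, call vector_store.add_embedding(), which replaces the stored embedding for that ID; add_sample_with_embedding() raises DuplicateSampleError when the sample hash already exists.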

Parameters:

Name	Type	Description	Default
`sample_id`	`str`	Sample ID to update	required
`updates`	`dict`	Dictionary of field names and new values	required

Returns:

Type	Description
`bool`	True if sample was updated, False if not found

Example

>>> success = storage.update_sample(
...     "smpl_abc123",
...     {"bpm": 128.0, "key": "C#"}
... )
>>> success
True

Source code in src/audiomancer/storage/unified.py

def update_sample(self, sample_id: str, updates: dict) -> bool:
    """Update sample metadata fields.

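    Only updates specified fields in metadata store. Does not affect embedding.
    To replace the embedding of an existing sample, call
    vector_store.add_embedding(), which replaces the stored embedding for
    that ID; add_sample_with_embedding() raises DuplicateSampleError when
    the sample hash already exists.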

    Args:
        sample_id: Sample ID to update
        updates: Dictionary of field names and new values

    Returns:
        True if sample was updated, False if not found

    Example:
        >>> success = storage.update_sample(
        ...     "smpl_abc123",
        ...     {"bpm": 128.0, "key": "C#"}
        ... )
        >>> success
        True
    """
    return self.sample_store.update(sample_id, updates)

