
Storage API Reference

The audiomancer.storage module provides database and vector storage functionality.

Overview

storage

Storage layer for audiomancer.

Provides interfaces and implementations for sample and vector storage.

__all__ = ['SampleMetadata', 'SampleStore', 'VectorStore', 'LanceDBVectorStore', 'SynthStore'] module-attribute

LanceDBVectorStore

LanceDB-backed vector storage for audio embeddings.

Provides efficient storage and similarity search for 128-dimensional audio embeddings using LanceDB's vector index capabilities.

The embeddings table has the following schema:

- id: string (sample ID)
- embedding: fixed_size_list[float32, 128] (audio embedding)
- created_at: timestamp (insertion time)

Attributes:

Name Type Description
db_path

Path to LanceDB database directory

table_name

Name of embeddings table (default: "embeddings")

embedding_dim

Required embedding dimension (always 128)

__init__(db_path: Path) -> None

Initialize LanceDB at given path.

Creates database directory if it doesn't exist. Table is created lazily on first add operation.

Parameters:

Name Type Description Default
db_path Path

Path to LanceDB database directory

required
Example

>>> store = LanceDBVectorStore(Path("./embeddings"))
>>> store.db_path
PosixPath('./embeddings')

add_embedding(sample_id: str, embedding: list[float]) -> None

Store 128-dim embedding for sample.

If sample_id already exists, replaces the existing embedding.

Parameters:

Name Type Description Default
sample_id str

Sample ID (format: "smpl_{hash[:8]}")

required
embedding list[float]

Vector embedding (dimension=128)

required

Raises:

Type Description
ValueError

If embedding dimension != 128

Example

>>> store = LanceDBVectorStore(Path("./embeddings"))
>>> embedding = [0.1, 0.2] + [0.0] * 126  # 128 dims
>>> store.add_embedding("smpl_abc123", embedding)
>>> retrieved = store.get_embedding("smpl_abc123")
>>> len(retrieved)
128

add_embeddings_batch(items: list[tuple[str, list[float]]]) -> None

Add multiple embeddings efficiently.

Validates all dimensions first, then batch inserts. If any embedding has wrong dimension, entire batch fails atomically.

Parameters:

Name Type Description Default
items list[tuple[str, list[float]]]

List of (sample_id, embedding) tuples

required

Raises:

Type Description
ValueError

If any embedding dimension != 128

Example

>>> store = LanceDBVectorStore(Path("./embeddings"))
>>> items = [
...     ("smpl_abc123", [0.1] * 128),
...     ("smpl_def456", [0.2] * 128),
...     ("smpl_ghi789", [0.3] * 128),
... ]
>>> store.add_embeddings_batch(items)
>>> len(store.search_similar([0.1] * 128, limit=10))
3
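The validate-first behaviour described for add_embeddings_batch can be sketched in plain Python. This is a minimal illustration that uses a dict in place of the LanceDB table; `validate_batch`, `add_batch` and `table` are hypothetical names, not part of audiomancer.

```python
EMBEDDING_DIM = 128  # dimension stated in this reference


def validate_batch(items: list[tuple[str, list[float]]]) -> None:
    """Raise ValueError if any embedding has the wrong dimension."""
    for sample_id, embedding in items:
        if len(embedding) != EMBEDDING_DIM:
            raise ValueError(
                f"Embedding dimension must be {EMBEDDING_DIM}, "
                f"got {len(embedding)} for sample {sample_id}"
            )


def add_batch(table: dict[str, list[float]], items: list[tuple[str, list[float]]]) -> None:
    """Validate every item first, then write; a bad item leaves `table` untouched."""
    validate_batch(items)      # nothing has been written yet if this raises
    table.update(dict(items))  # existing IDs are replaced


table: dict[str, list[float]] = {}
add_batch(table, [("smpl_a", [0.1] * 128), ("smpl_b", [0.2] * 128)])

try:
    # The second item has only 64 dimensions, so the whole batch is rejected.
    add_batch(table, [("smpl_c", [0.3] * 128), ("smpl_bad", [0.0] * 64)])
except ValueError as exc:
    print(exc)

print(sorted(table))  # smpl_c is absent because validation ran before any write
```

Because validation finishes before the first write, no cleanup step is needed when a batch is rejected.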

delete_embedding(sample_id: str) -> bool

Delete embedding. Returns True if deleted, False if not found.

Parameters:

Name Type Description Default
sample_id str

Sample ID to delete

required

Returns:

Type Description
bool

True if embedding was deleted, False if not found

Example

>>> store = LanceDBVectorStore(Path("./embeddings"))
>>> store.add_embedding("smpl_abc123", [0.1] * 128)
>>> store.delete_embedding("smpl_abc123")
True
>>> store.delete_embedding("smpl_abc123")  # Already deleted
False

get_embedding(sample_id: str) -> Optional[list[float]]

Retrieve embedding by sample ID.

Parameters:

Name Type Description Default
sample_id str

Sample ID

required

Returns:

Type Description
Optional[list[float]]

Embedding vector if found, None otherwise

Example

>>> store = LanceDBVectorStore(Path("./embeddings"))
>>> store.add_embedding("smpl_abc123", [0.1] * 128)
>>> embedding = store.get_embedding("smpl_abc123")
>>> len(embedding)
128
>>> store.get_embedding("smpl_nonexistent")
None

search_similar(embedding: list[float], limit: int = 10, offset: int = 0, exclude_ids: Optional[list[str]] = None, distance_metric: Literal['cosine', 'l2'] = 'cosine') -> list[tuple[str, float]]

Find similar samples by embedding distance.

Returns samples sorted by distance (ascending = most similar first). Uses ANN (Approximate Nearest Neighbors) for efficient search.

Distance metrics:

- cosine: Cosine distance (0 = identical, 2 = opposite)
- l2: Euclidean distance (lower = more similar)
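The two metrics can be written out in a few lines. These are the textbook definitions in plain Python, offered as an illustration and not as the code LanceDB runs; the function names are made up for this sketch.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0 for the same direction, 2 for opposite directions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


v = [1.0, 0.0]
print(cosine_distance(v, [1.0, 0.0]))   # same vector
print(cosine_distance(v, [-1.0, 0.0]))  # opposite direction
print(cosine_distance(v, [3.0, 0.0]))   # same direction, different length
print(l2_distance(v, [3.0, 0.0]))       # l2 does see the length difference
```

Cosine distance ignores magnitude, so a vector and any positive multiple of it are at distance 0. Use l2 when magnitude differences should affect the ranking.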

Parameters:

Name Type Description Default
embedding list[float]

Query vector (dimension=128)

required
limit int

Maximum results to return

10
offset int

Number of results to skip (for pagination)

0
exclude_ids Optional[list[str]]

Sample IDs to exclude from results

None
distance_metric Literal['cosine', 'l2']

Distance calculation method

'cosine'

Returns:

Type Description
list[tuple[str, float]]

List of (sample_id, distance) sorted by distance ascending

Raises:

Type Description
ValueError

If embedding dimension != 128

Example

>>> store = LanceDBVectorStore(Path("./embeddings"))
>>> store.add_embedding("smpl_abc123", [0.1] * 128)
>>> store.add_embedding("smpl_def456", [0.2] * 128)
>>> # The two stored vectors are parallel, so their cosine distance is 0; l2 separates them
>>> results = store.search_similar([0.1] * 128, limit=2, distance_metric="l2")
>>> results[0][0]  # Most similar ID
'smpl_abc123'
>>> results[0][1] < results[1][1]  # Distance ascending
True

SampleMetadata

Bases: TypedDict

Metadata for an audio sample.

This TypedDict describes all fields stored for each sample. Fields marked as required (not NotRequired) must be present when creating a sample.

Example

>>> sample = SampleMetadata(
...     id="smpl_abc12345",
...     file_path="/path/to/kick.wav",
...     file_hash="abc123def456",
...     duration_ms=250.5,
...     sample_rate=44100,
...     channels=1,
...     bit_depth=16,
...     file_size_bytes=44100,
...     spectral_centroid=1500.0,
...     spectral_bandwidth=800.0,
...     spectral_rolloff=5000.0,
...     zero_crossing_rate=0.15,
...     rms_energy=0.7,
...     dynamic_range=40.0,
...     bpm=125.0,
...     bpm_confidence=0.95,
...     is_loop=True,
...     key="C",
...     key_confidence=0.88,
...     tuning_frequency=440.0,
...     pitch_salience=0.8,
...     instrument_type="kick",
...     instrument_confidence=0.92,
...     mood=["energetic", "dark"],
...     genre_tags=["techno", "industrial"],
...     created_at=datetime.now(),
...     updated_at=datetime.now(),
... )
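The split between required and optional TypedDict fields can be illustrated with a small stand-in. The sketch below uses the older `total=False` inheritance pattern instead of `NotRequired` so that it runs on Python 3.9+; the two express the same thing. The field subset and the class names are chosen for illustration and are not the real SampleMetadata definition.

```python
from typing import TypedDict


class _RequiredFields(TypedDict):
    # Hypothetical subset of required fields
    file_path: str
    file_hash: str
    duration_ms: float


class SampleSketch(_RequiredFields, total=False):
    # Hypothetical subset of optional fields
    bpm: float
    key: str


# Only the required keys are supplied; a type checker accepts this.
sample: SampleSketch = {
    "file_path": "/samples/kick.wav",
    "file_hash": "abc123",
    "duration_ms": 250.5,
}

# TypedDict does no validation at runtime, but the key sets can be inspected.
missing = SampleSketch.__required_keys__ - set(sample)
print(sorted(SampleSketch.__required_keys__))
print(sorted(SampleSketch.__optional_keys__))
print(missing)
```

A store that wants to reject incomplete samples at runtime could run a check like the `missing` computation before inserting.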

SampleStore

Bases: Protocol

Interface for sample storage operations.

This Protocol defines the contract for storing and retrieving sample metadata. Implementations must provide CRUD operations, search, and batch operations.

add(sample: SampleMetadata) -> str

Add sample to database.

Parameters:

Name Type Description Default
sample SampleMetadata

Complete sample metadata to store

required

Returns:

Type Description
str

Sample ID (format: "smpl_{hash[:8]}")

Raises:

Type Description
DuplicateSampleError

If sample with same file_hash already exists

Example

>>> sample = SampleMetadata(
...     file_path="/samples/kick.wav",
...     file_hash="abc123",
...     duration_ms=250.5,
...     sample_rate=44100,
...     channels=1,
...     bit_depth=16,
...     file_size_bytes=44100,
... )
>>> sample_id = store.add(sample)
>>> sample_id
"smpl_abc123"

add_batch(samples: list[SampleMetadata]) -> list[str]

Add multiple samples atomically in a single transaction.

All samples are added or none are (atomic operation). On duplicate, rolls back entire batch without partial commits.

Parameters:

Name Type Description Default
samples list[SampleMetadata]

List of sample metadata to store

required

Returns:

Type Description
list[str]

List of sample IDs in same order as input

Raises:

Type Description
DuplicateSampleError

On first duplicate (no partial commits)

Example

>>> samples = [sample1, sample2, sample3]
>>> ids = store.add_batch(samples)
>>> len(ids)
3
>>> ids[0]
"smpl_abc123"
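One way to get the all-or-nothing behaviour described for add_batch is a single SQLite transaction. The sketch below uses the standard-library sqlite3 module; the table layout, the ID derivation from the first eight hash characters, and the local DuplicateSampleError class are assumptions made for the example, not the audiomancer schema.

```python
import sqlite3


class DuplicateSampleError(Exception):
    """Stand-in for the error named in this reference."""


def add_batch(conn: sqlite3.Connection, samples: list[dict]) -> list[str]:
    ids: list[str] = []
    try:
        with conn:  # commits on success, rolls back if the block raises
            for s in samples:
                sample_id = f"smpl_{s['file_hash'][:8]}"  # assumed ID derivation
                conn.execute(
                    "INSERT INTO samples (id, file_path, file_hash) VALUES (?, ?, ?)",
                    (sample_id, s["file_path"], s["file_hash"]),
                )
                ids.append(sample_id)
    except sqlite3.IntegrityError as exc:
        raise DuplicateSampleError(str(exc)) from exc
    return ids


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE samples (id TEXT PRIMARY KEY, file_path TEXT, file_hash TEXT UNIQUE)"
)
conn.commit()

add_batch(conn, [{"file_path": "/a.wav", "file_hash": "aaaa1111"}])

try:
    add_batch(conn, [
        {"file_path": "/b.wav", "file_hash": "bbbb2222"},
        {"file_path": "/dup.wav", "file_hash": "aaaa1111"},  # duplicate hash
    ])
except DuplicateSampleError:
    pass

count = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
print(count)  # /b.wav was rolled back together with the duplicate
```

The UNIQUE constraint on file_hash does the duplicate detection, and the connection's context manager undoes the rows inserted earlier in the same batch.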

count(query: Optional[str] = None, instrument_type: Optional[str] = None, bpm_min: Optional[float] = None, bpm_max: Optional[float] = None, key: Optional[str] = None, mood: Optional[list[str]] = None) -> int

Count samples matching filters.

Same filter logic as search(), but returns count instead of results. Useful for calculating pagination.

Parameters:

Name Type Description Default
query Optional[str]

Text search in file path

None
instrument_type Optional[str]

Filter by instrument category

None
bpm_min Optional[float]

Minimum BPM (inclusive)

None
bpm_max Optional[float]

Maximum BPM (inclusive)

None
key Optional[str]

Musical key filter

None
mood Optional[list[str]]

Mood tags (matches if ANY tag present)

None

Returns:

Type Description
int

Number of matching samples

Example

>>> total = store.count(instrument_type="kick")
>>> total
234
>>> pages = (total + 9) // 10  # Calculate pages (10 per page)
>>> pages
24
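The page arithmetic in the example generalises to any page size. The helpers below are hypothetical and only show how a count() result and a page number map to the `limit` and `offset` arguments of search().

```python
def page_count(total: int, page_size: int) -> int:
    """Number of pages needed to show `total` items, rounding up."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return (total + page_size - 1) // page_size


def page_offset(page: int, page_size: int) -> int:
    """Offset to pass to search() for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return (page - 1) * page_size


print(page_count(234, 10))  # 24, matching the example above
print(page_offset(2, 10))   # 10, the offset of the second page
```

Integer arithmetic is used instead of `math.ceil(total / page_size)` so that the result stays exact for very large counts.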

delete(sample_id: str) -> bool

Delete sample from database.

Parameters:

Name Type Description Default
sample_id str

Sample ID to delete

required

Returns:

Type Description
bool

True if sample was deleted, False if not found

Example

>>> success = store.delete("smpl_abc123")
>>> success
True
>>> store.delete("smpl_nonexistent")
False

get(sample_id: str) -> Optional[SampleMetadata]

Retrieve sample by ID.

Parameters:

Name Type Description Default
sample_id str

Sample ID (format: "smpl_{hash[:8]}")

required

Returns:

Type Description
Optional[SampleMetadata]

Sample metadata if found, None otherwise

Example

>>> sample = store.get("smpl_abc123")
>>> sample['file_path']
"/samples/kick.wav"
>>> store.get("smpl_nonexistent")
None

get_by_hash(file_hash: str) -> Optional[SampleMetadata]

Retrieve sample by file hash.

Used for deduplication - check if sample already exists before adding.

Parameters:

Name Type Description Default
file_hash str

SHA256 hash of audio file

required

Returns:

Type Description
Optional[SampleMetadata]

Sample metadata if found, None otherwise

Example

>>> sample = store.get_by_hash("abc123")
>>> sample is not None
True
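A deduplication check built on get_by_hash might look like the sketch below. `InMemorySampleStore` and `add_if_new` are hypothetical stand-ins, and deriving the ID from the first eight characters of the SHA256 digest is an assumption based on the "smpl_{hash[:8]}" format quoted in this reference.

```python
import hashlib
from typing import Optional


class InMemorySampleStore:
    """Hypothetical stand-in exposing only get_by_hash() and add()."""

    def __init__(self) -> None:
        self._by_hash: dict[str, dict] = {}

    def get_by_hash(self, file_hash: str) -> Optional[dict]:
        return self._by_hash.get(file_hash)

    def add(self, sample: dict) -> str:
        sample_id = f"smpl_{sample['file_hash'][:8]}"  # assumed ID derivation
        self._by_hash[sample["file_hash"]] = {**sample, "id": sample_id}
        return sample_id


def add_if_new(store: InMemorySampleStore, file_path: str, data: bytes) -> tuple[str, bool]:
    """Return (sample_id, created); skip the insert when the hash is already known."""
    file_hash = hashlib.sha256(data).hexdigest()
    existing = store.get_by_hash(file_hash)
    if existing is not None:
        return existing["id"], False
    return store.add({"file_path": file_path, "file_hash": file_hash}), True


store = InMemorySampleStore()
first = add_if_new(store, "/samples/kick.wav", b"fake audio bytes")
second = add_if_new(store, "/copies/kick_copy.wav", b"fake audio bytes")
print(first, second)  # same ID twice; only the first call creates a record
```

Checking first avoids relying on DuplicateSampleError for control flow when the same file is imported from two locations.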

get_by_path(file_path: str) -> Optional[SampleMetadata]

Retrieve sample by file path.

Parameters:

Name Type Description Default
file_path str

Absolute path to audio file

required

Returns:

Type Description
Optional[SampleMetadata]

Sample metadata if found, None otherwise

Example

>>> sample = store.get_by_path("/samples/kick.wav")
>>> sample['id']
"smpl_abc123"

search(query: Optional[str] = None, instrument_type: Optional[str] = None, bpm_min: Optional[float] = None, bpm_max: Optional[float] = None, key: Optional[str] = None, mood: Optional[list[str]] = None, limit: int = 50, offset: int = 0) -> list[SampleMetadata]

Search samples with filters and pagination.

All filters are combined with AND logic. Text query searches file_path.

Parameters:

Name Type Description Default
query Optional[str]

Text search in file path

None
instrument_type Optional[str]

Filter by instrument category

None
bpm_min Optional[float]

Minimum BPM (inclusive)

None
bpm_max Optional[float]

Maximum BPM (inclusive)

None
key Optional[str]

Musical key filter

None
mood Optional[list[str]]

Mood tags (matches if ANY tag present)

None
limit int

Maximum results to return

50
offset int

Number of results to skip (for pagination)

0

Returns:

Type Description
list[SampleMetadata]

List of matching samples (up to limit)

Example
>>> # Search for kicks between 120-130 BPM
>>> results = store.search(
...     instrument_type="kick",
...     bpm_min=120.0,
...     bpm_max=130.0,
...     limit=10,
...     offset=0,
... )
>>> len(results) <= 10
True

>>> # Pagination - get second page
>>> page2 = store.search(
...     instrument_type="kick",
...     limit=10,
...     offset=10,
... )
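The filter semantics of search() (every supplied filter must match, mood matches on any tag) can be sketched over a plain list of dicts. This is an in-memory illustration, not the SQLite query the store runs; `matches`, the module-level `search` function and the substring rule for `query` are assumptions.

```python
from typing import Optional


def matches(
    sample: dict,
    query: Optional[str] = None,
    instrument_type: Optional[str] = None,
    bpm_min: Optional[float] = None,
    bpm_max: Optional[float] = None,
    key: Optional[str] = None,
    mood: Optional[list[str]] = None,
) -> bool:
    """Return True only if the sample passes every filter that was supplied."""
    bpm = sample.get("bpm")
    if query is not None and query not in sample["file_path"]:
        return False  # assumption: plain substring match on the path
    if instrument_type is not None and sample.get("instrument_type") != instrument_type:
        return False
    if bpm_min is not None and (bpm is None or bpm < bpm_min):
        return False  # inclusive lower bound
    if bpm_max is not None and (bpm is None or bpm > bpm_max):
        return False  # inclusive upper bound
    if key is not None and sample.get("key") != key:
        return False
    if mood is not None and not set(mood) & set(sample.get("mood", [])):
        return False  # one shared tag is enough
    return True


def search(samples: list[dict], limit: int = 50, offset: int = 0, **filters) -> list[dict]:
    hits = [s for s in samples if matches(s, **filters)]
    return hits[offset:offset + limit]


library = [
    {"file_path": "/s/kick_a.wav", "instrument_type": "kick", "bpm": 125.0, "mood": ["dark"]},
    {"file_path": "/s/kick_b.wav", "instrument_type": "kick", "bpm": 140.0, "mood": ["bright"]},
    {"file_path": "/s/snare.wav", "instrument_type": "snare", "bpm": 125.0, "mood": ["dark"]},
]

hits = search(library, instrument_type="kick", bpm_min=120.0, bpm_max=130.0)
print([h["file_path"] for h in hits])  # only kick_a passes all three filters
```

Filters left as None are skipped, which is why calling search() with no filters returns everything up to `limit`.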

update(sample_id: str, updates: dict) -> bool

Update sample fields.

Only updates specified fields, leaving others unchanged.

Parameters:

Name Type Description Default
sample_id str

Sample ID to update

required
updates dict

Dictionary of field names and new values

required

Returns:

Type Description
bool

True if sample was updated, False if not found

Example

>>> success = store.update(
...     "smpl_abc123",
...     {"bpm": 128.0, "key": "C#"}
... )
>>> success
True
>>> store.update("smpl_nonexistent", {"bpm": 120})
False

SynthStore

SQLite implementation of synth storage.

Provides atomic CRUD operations for synth metadata with fail-fast error handling.

Example

>>> store = SynthStore("~/.audiomancer/samples.db")
>>> synth = {
...     "id": "synth_abc123",
...     "name": "tb303",
...     "file_path": "/synths/tb303.scd",
...     "file_hash": "abc123",
...     "source_code": "SynthDef(...)",
...     "controls": [{"name": "freq", "default": 440.0}],
...     "characteristics": {"num_channels": 2, "has_gate": True},
... }
>>> synth_id = store.add(synth)
>>> retrieved = store.get(synth_id)
>>> retrieved['name']
'tb303'

__init__(db_path: str)

Initialize store with database connection.

Parameters:

Name Type Description Default
db_path str

Path to SQLite database file (will be created if missing)

required
Example

>>> store = SynthStore("~/.audiomancer/samples.db")
>>> store = SynthStore(":memory:")  # In-memory for testing

add(synth: dict) -> str

Add synth to database.

Parameters:

Name Type Description Default
synth dict

Complete synth metadata to store

required

Returns:

Type Description
str

Synth ID (format: "synth_{hash[:8]}")

Raises:

Type Description
StorageError

If synth with same name or hash already exists

Example

>>> synth = {
...     "id": "synth_abc123",
...     "name": "tb303",
...     "file_path": "/synths/tb303.scd",
...     "file_hash": "abc123",
...     "source_code": "SynthDef(...)",
...     "controls": [],
... }
>>> synth_id = store.add(synth)
>>> synth_id
"synth_abc123"

add_lineage(synth_id: str, parent_synth_id: str, contribution_weight: float = 0.5) -> None

Record synth lineage (parent-child relationship).

Used to track synth evolution when one synth is derived from another.

Parameters:

Name Type Description Default
synth_id str

Child synth ID

required
parent_synth_id str

Parent synth ID

required
contribution_weight float

How much parent contributed (0-1)

0.5

Raises:

Type Description
StorageError

If synths don't exist or lineage already recorded

Example

>>> store.add_lineage("synth_new", "synth_original", 0.8)

count(query: Optional[str] = None, category: Optional[str] = None, has_gate: Optional[bool] = None) -> int

Count synths matching filters.

Same filter logic as search(), but returns count instead of results.

Parameters:

Name Type Description Default
query Optional[str]

Text search in name or file path

None
category Optional[str]

Filter by category

None
has_gate Optional[bool]

Filter by gate parameter presence

None

Returns:

Type Description
int

Number of matching synths

Example

>>> total = store.count(category="bass")
>>> total
15

delete(synth_id: str) -> bool

Delete synth from database.

Parameters:

Name Type Description Default
synth_id str

Synth ID to delete

required

Returns:

Type Description
bool

True if synth was deleted, False if not found

Example

>>> success = store.delete("synth_abc123")
>>> success
True

get(synth_id: str) -> Optional[dict]

Retrieve synth by ID.

Parameters:

Name Type Description Default
synth_id str

Synth ID (format: "synth_{hash[:8]}")

required

Returns:

Type Description
Optional[dict]

Synth metadata if found, None otherwise

Example

>>> synth = store.get("synth_abc123")
>>> synth['name']
'tb303'
>>> store.get("synth_nonexistent")
None

get_by_hash(file_hash: str) -> Optional[dict]

Retrieve synth by file hash.

Used for deduplication - check if synth already exists before adding.

Parameters:

Name Type Description Default
file_hash str

SHA256 hash of source code

required

Returns:

Type Description
Optional[dict]

Synth metadata if found, None otherwise

Example

>>> synth = store.get_by_hash("abc123")
>>> synth is not None
True

get_by_name(name: str) -> Optional[dict]

Retrieve synth by name.

Parameters:

Name Type Description Default
name str

Synth name (e.g., "tb303")

required

Returns:

Type Description
Optional[dict]

Synth metadata if found, None otherwise

Example

>>> synth = store.get_by_name("tb303")
>>> synth['id']
"synth_abc123"

get_by_path(file_path: str) -> Optional[dict]

Retrieve synth by file path.

Parameters:

Name Type Description Default
file_path str

Absolute path to .scd file

required

Returns:

Type Description
Optional[dict]

Synth metadata if found, None otherwise

Example

>>> synth = store.get_by_path("/synths/tb303.scd")
>>> synth['name']
'tb303'

get_lineage(synth_id: str) -> list[dict]

Get parent synths (lineage) for a synth.

Parameters:

Name Type Description Default
synth_id str

Synth ID

required

Returns:

Type Description
list[dict]

List of parent synth records with contribution weights

Example

>>> parents = store.get_lineage("synth_new")
>>> parents[0]['parent_synth_id']
'synth_original'
>>> parents[0]['contribution_weight']
0.8

list_all(limit: int = 100) -> list[dict]

List all synths with optional limit.

Convenience method that calls search() with no filters.

Parameters:

Name Type Description Default
limit int

Maximum number of synths to return

100

Returns:

Type Description
list[dict]

List of synth metadata dictionaries

Example

>>> synths = store.list_all(limit=50)
>>> len(synths) <= 50
True

search(query: Optional[str] = None, category: Optional[str] = None, has_gate: Optional[bool] = None, limit: int = 50, offset: int = 0) -> list[dict]

Search synths with filters and pagination.

All filters are combined with AND logic. Text query searches name and file_path.

Parameters:

Name Type Description Default
query Optional[str]

Text search in name or file path

None
category Optional[str]

Filter by category (bass, lead, pad, drum, fx)

None
has_gate Optional[bool]

Filter by gate parameter presence

None
limit int

Maximum results to return

50
offset int

Number of results to skip (for pagination)

0

Returns:

Type Description
list[dict]

List of matching synths (up to limit)

Example
>>> # Search for bass synths
>>> results = store.search(category="bass", limit=10)
>>> len(results) <= 10
True

update(synth_id: str, updates: dict) -> bool

Update synth fields.

Only updates specified fields, leaving others unchanged. Automatically updates the updated_at timestamp.

Parameters:

Name Type Description Default
synth_id str

Synth ID to update

required
updates dict

Dictionary of field names and new values

required

Returns:

Type Description
bool

True if synth was updated, False if not found

Example

>>> success = store.update(
...     "synth_abc123",
...     {"characteristics": {"num_channels": 2}}
... )
>>> success
True

VectorStore

Bases: Protocol

Interface for embedding vector operations.

Stores and searches sample embeddings for semantic similarity search. Cosine distance (0 = identical, 2 = opposite) is the default metric; search_similar also accepts l2.

add_embedding(sample_id: str, embedding: list[float]) -> None

Store embedding vector for a sample.

Parameters:

Name Type Description Default
sample_id str

Sample ID (must exist in SampleStore)

required
embedding list[float]

Vector embedding (dimension=128)

required

Raises:

Type Description
ValueError

If embedding dimension != 128

Example

>>> embedding = [0.1, 0.2, ..., 0.5]  # 128 dimensions
>>> store.add_embedding("smpl_abc123", embedding)

add_embeddings_batch(items: list[tuple[str, list[float]]]) -> None

Add multiple embeddings efficiently in batch.

Optimized for bulk insertion (10-100x faster than individual adds).

Parameters:

Name Type Description Default
items list[tuple[str, list[float]]]

List of (sample_id, embedding) tuples

required

Raises:

Type Description
ValueError

If any embedding dimension != 128

Example

>>> items = [
...     ("smpl_abc123", [0.1, 0.2, ..., 0.5]),
...     ("smpl_def456", [0.3, 0.4, ..., 0.6]),
...     ("smpl_ghi789", [0.2, 0.3, ..., 0.4]),
... ]
>>> store.add_embeddings_batch(items)

delete_embedding(sample_id: str) -> bool

Delete embedding vector for a sample.

Parameters:

Name Type Description Default
sample_id str

Sample ID

required

Returns:

Type Description
bool

True if embedding was deleted, False if not found

Example

>>> success = store.delete_embedding("smpl_abc123")
>>> success
True
>>> store.delete_embedding("smpl_nonexistent")
False

get_embedding(sample_id: str) -> Optional[list[float]]

Retrieve embedding vector for a sample.

Parameters:

Name Type Description Default
sample_id str

Sample ID

required

Returns:

Type Description
Optional[list[float]]

Embedding vector if found, None otherwise

Example

>>> embedding = store.get_embedding("smpl_abc123")
>>> len(embedding)
128
>>> store.get_embedding("smpl_nonexistent")
None

search_similar(embedding: list[float], limit: int = 10, offset: int = 0, exclude_ids: Optional[list[str]] = None, distance_metric: Literal['cosine', 'l2'] = 'cosine') -> list[tuple[str, float]]

Find similar samples by embedding distance.

Returns samples sorted by distance (ascending = most similar first).

Distance metrics:

- cosine: Cosine distance (0 = identical, 2 = opposite)
- l2: Euclidean distance (lower = more similar)

Parameters:

Name Type Description Default
embedding list[float]

Query vector (dimension=128)

required
limit int

Maximum results to return

10
offset int

Number of results to skip (for pagination)

0
exclude_ids Optional[list[str]]

Sample IDs to exclude from results

None
distance_metric Literal['cosine', 'l2']

Distance calculation method

'cosine'

Returns:

Type Description
list[tuple[str, float]]

List of (sample_id, distance) sorted by distance ascending

Raises:

Type Description
ValueError

If embedding dimension != 128

Example

>>> query_emb = [0.3, 0.4, ..., 0.5]  # 128 dimensions
>>> results = store.search_similar(
...     query_emb,
...     limit=5,
...     offset=0,
...     distance_metric="cosine",
... )
>>> results
[
    ("smpl_abc123", 0.05),
    ("smpl_def456", 0.12),
    ("smpl_ghi789", 0.18),
    ("smpl_jkl012", 0.23),
    ("smpl_mno345", 0.29),
]

>>> # Pagination - get next page
>>> page2 = store.search_similar(query_emb, limit=5, offset=5)

>>> # Exclude already-used samples
>>> more = store.search_similar(
...     query_emb,
...     limit=5,
...     exclude_ids=["smpl_abc123", "smpl_def456"],
... )
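How `limit`, `offset` and `exclude_ids` interact can be seen in a brute-force stand-in for search_similar. The sketch scans a dict of toy two-dimensional vectors instead of using an ANN index, and it applies the exclusion before the offset; that ordering is an assumption, since the protocol does not state it.

```python
import math
from typing import Optional


def cosine_distance(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def search_similar(
    vectors: dict[str, list[float]],
    embedding: list[float],
    limit: int = 10,
    offset: int = 0,
    exclude_ids: Optional[list[str]] = None,
) -> list[tuple[str, float]]:
    """Exact scan: score every vector, sort ascending, then slice one page."""
    excluded = set(exclude_ids or [])
    scored = [
        (sample_id, cosine_distance(embedding, vec))
        for sample_id, vec in vectors.items()
        if sample_id not in excluded  # assumption: exclusion happens before paging
    ]
    scored.sort(key=lambda pair: pair[1])
    return scored[offset:offset + limit]


vectors = {
    "smpl_a": [1.0, 0.0],
    "smpl_b": [1.0, 1.0],
    "smpl_c": [0.0, 1.0],
    "smpl_d": [-1.0, 0.0],
}
query = [1.0, 0.0]

page1 = search_similar(vectors, query, limit=2)
page2 = search_similar(vectors, query, limit=2, offset=2)
rest = search_similar(vectors, query, limit=2, exclude_ids=["smpl_a"])
print([sid for sid, _ in page1])  # closest two
print([sid for sid, _ in page2])  # next two
print([sid for sid, _ in rest])   # closest two once smpl_a is excluded
```

An exact scan like this costs one distance computation per stored vector, which is the cost an approximate index is meant to avoid on large tables.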

Unified Storage

unified

Unified storage layer integrating SQLite and LanceDB.

This module provides atomic operations across both metadata (SQLite) and embedding (LanceDB) stores, ensuring data consistency. All operations follow fail-fast semantics with automatic rollback on partial failures.

Example

>>> from pathlib import Path
>>> storage = UnifiedSampleStorage(
...     db_path=Path("~/.audiomancer/samples.db"),
...     embeddings_path=Path("~/.audiomancer/embeddings")
... )
>>> sample_id = storage.add_sample_with_embedding(sample, embedding)
>>> similar = storage.find_similar(sample_id, limit=10)
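The rollback behaviour described for this module can be reduced to a compensating delete: write the metadata, attempt the embedding, and undo the metadata write if the second step fails. The sketch below uses two in-memory stand-ins; the class names are hypothetical and the StorageError class is a local placeholder for the one named in this reference.

```python
class StorageError(Exception):
    """Stand-in for the StorageError named in this reference."""


class MetadataStore:
    def __init__(self) -> None:
        self.rows: dict[str, dict] = {}

    def add(self, sample: dict) -> str:
        self.rows[sample["id"]] = sample
        return sample["id"]

    def delete(self, sample_id: str) -> bool:
        return self.rows.pop(sample_id, None) is not None


class EmbeddingStore:
    def __init__(self, dim: int = 128) -> None:
        self.dim = dim
        self.vectors: dict[str, list[float]] = {}

    def add_embedding(self, sample_id: str, embedding: list[float]) -> None:
        if len(embedding) != self.dim:
            raise ValueError(f"expected {self.dim} dims, got {len(embedding)}")
        self.vectors[sample_id] = embedding


def add_sample_with_embedding(meta, vectors, sample: dict, embedding: list[float]) -> str:
    sample_id = meta.add(sample)                     # step 1: metadata
    try:
        vectors.add_embedding(sample_id, embedding)  # step 2: embedding
    except Exception as exc:
        meta.delete(sample_id)                       # compensate for step 1
        raise StorageError(f"Failed to add sample with embedding: {exc}") from exc
    return sample_id


meta, vectors = MetadataStore(), EmbeddingStore()
add_sample_with_embedding(meta, vectors, {"id": "smpl_ok"}, [0.1] * 128)
try:
    add_sample_with_embedding(meta, vectors, {"id": "smpl_bad"}, [0.1] * 64)
except StorageError as exc:
    print(exc)

print(sorted(meta.rows), sorted(vectors.vectors))  # smpl_bad appears in neither store
```

This is compensation, not a shared transaction: if the process stops between the two steps, the metadata row remains without an embedding until something cleans it up.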

UnifiedSampleStorage

Unified interface for sample storage with metadata and embeddings.

Coordinates atomic operations across SQLite (metadata) and LanceDB (embeddings) to maintain data consistency. If either store fails, changes are rolled back.

Attributes:

Name Type Description
sample_store

SQLite metadata store

vector_store

LanceDB embedding store

Example

>>> storage = UnifiedSampleStorage(
...     db_path=Path("~/.audiomancer/samples.db"),
...     embeddings_path=Path("~/.audiomancer/embeddings")
... )
>>> sample = SampleMetadata(
...     id="smpl_abc123",
...     file_path="/samples/kick.wav",
...     file_hash="abc123",
...     duration_ms=250.5,
...     sample_rate=44100,
...     channels=1,
...     bit_depth=16,
...     file_size_bytes=44100,
... )
>>> embedding = [0.1] * 128
>>> sample_id = storage.add_sample_with_embedding(sample, embedding)

Source code in src/audiomancer/storage/unified.py
class UnifiedSampleStorage:
    """Unified interface for sample storage with metadata and embeddings.

    Coordinates atomic operations across SQLite (metadata) and LanceDB (embeddings)
    to maintain data consistency. If either store fails, changes are rolled back.

    Attributes:
        sample_store: SQLite metadata store
        vector_store: LanceDB embedding store

    Example:
        >>> storage = UnifiedSampleStorage(
        ...     db_path=Path("~/.audiomancer/samples.db"),
        ...     embeddings_path=Path("~/.audiomancer/embeddings")
        ... )
        >>> sample = SampleMetadata(
        ...     id="smpl_abc123",
        ...     file_path="/samples/kick.wav",
        ...     file_hash="abc123",
        ...     duration_ms=250.5,
        ...     sample_rate=44100,
        ...     channels=1,
        ...     bit_depth=16,
        ...     file_size_bytes=44100,
        ... )
        >>> embedding = [0.1] * 128
        >>> sample_id = storage.add_sample_with_embedding(sample, embedding)
    """

    def __init__(self, db_path: Path, embeddings_path: Path):
        """Initialize unified storage with both stores.

        Creates database and embedding directories if they don't exist.

        Args:
            db_path: Path to SQLite database file
            embeddings_path: Path to LanceDB embeddings directory

        Example:
            >>> storage = UnifiedSampleStorage(
            ...     db_path=Path("~/.audiomancer/samples.db"),
            ...     embeddings_path=Path("~/.audiomancer/embeddings")
            ... )
        """
        # Expand user paths
        db_path = db_path.expanduser().absolute()
        embeddings_path = embeddings_path.expanduser().absolute()

        # Create parent directories
        db_path.parent.mkdir(parents=True, exist_ok=True)
        embeddings_path.mkdir(parents=True, exist_ok=True)

        self.sample_store = SampleStore(str(db_path))
        self.vector_store = LanceDBVectorStore(embeddings_path)

    def add_sample_with_embedding(
        self,
        sample: SampleMetadata,
        embedding: list[float]
    ) -> str:
        """Add sample and embedding atomically.

        Both metadata and embedding are added together. If either operation fails,
        neither is persisted (atomic rollback).

        Args:
            sample: Complete sample metadata
            embedding: 128-dimensional embedding vector

        Returns:
            Sample ID (format: "smpl_{hash[:8]}")

        Raises:
            DuplicateSampleError: If sample hash already exists in database
            StorageError: If embedding dimension != 128 (the underlying
                ValueError is caught and wrapped) or on unexpected storage errors

        Example:
            >>> sample = SampleMetadata(
            ...     id="smpl_abc123",
            ...     file_path="/samples/kick.wav",
            ...     file_hash="abc123",
            ...     duration_ms=250.5,
            ...     sample_rate=44100,
            ...     channels=1,
            ...     bit_depth=16,
            ...     file_size_bytes=44100,
            ... )
            >>> embedding = [0.1] * 128
            >>> sample_id = storage.add_sample_with_embedding(sample, embedding)
            >>> storage.sample_store.get(sample_id) is not None
            True
            >>> storage.vector_store.get_embedding(sample_id) is not None
            True
        """
        sample_id: Optional[str] = None

        try:
            # Add sample metadata first
            sample_id = self.sample_store.add(sample)

            # Add embedding (if this fails, rollback sample)
            self.vector_store.add_embedding(sample_id, embedding)

            return sample_id

        except DuplicateSampleError:
            # Sample already exists, propagate error
            raise
        except ValueError as e:
            # Embedding validation failed, rollback sample if added
            if sample_id:
                try:
                    self.sample_store.delete(sample_id)
                except Exception:
                    # Ignore rollback errors, original error is more important
                    pass
            raise StorageError(
                f"Invalid embedding: {str(e)}",
                details={"sample_id": sample.get("id"), "error": str(e)}
            )
        except Exception as e:
            # Unexpected error, rollback sample if added
            if sample_id:
                try:
                    self.sample_store.delete(sample_id)
                except Exception:
                    # Ignore rollback errors
                    pass
            raise StorageError(
                f"Failed to add sample with embedding: {str(e)}",
                details={"sample_id": sample.get("id"), "error": str(e)}
            )

    def add_samples_with_embeddings_batch(
        self,
        items: list[tuple[SampleMetadata, list[float]]]
    ) -> list[str]:
        """Add multiple samples and embeddings atomically.

        All samples and embeddings are added together or none are added
        (atomic batch operation). On any failure, rolls back all changes.

        Args:
            items: List of (sample, embedding) tuples

        Returns:
            List of sample IDs in same order as input

        Raises:
            DuplicateSampleError: If any sample hash already exists
            StorageError: If any embedding dimension != 128 (the underlying
                ValueError is caught and wrapped) or on unexpected storage errors

        Example:
            >>> items = [
            ...     (sample1, [0.1] * 128),
            ...     (sample2, [0.2] * 128),
            ...     (sample3, [0.3] * 128),
            ... ]
            >>> sample_ids = storage.add_samples_with_embeddings_batch(items)
            >>> len(sample_ids)
            3
        """
        if not items:
            return []

        sample_ids: list[str] = []

        try:
            # Validate all embeddings first (fail fast)
            for sample, embedding in items:
                if len(embedding) != LanceDBVectorStore.EMBEDDING_DIM:
                    raise ValueError(
                        f"Embedding dimension must be {LanceDBVectorStore.EMBEDDING_DIM}, "
                        f"got {len(embedding)} for sample {sample.get('id')}"
                    )

            # Add all samples first
            samples = [sample for sample, _ in items]
            sample_ids = self.sample_store.add_batch(samples)

            # Add all embeddings
            embedding_items = [
                (sample_id, embedding)
                for sample_id, (_, embedding) in zip(sample_ids, items)
            ]
            self.vector_store.add_embeddings_batch(embedding_items)

            return sample_ids

        except DuplicateSampleError:
            # Sample already exists, propagate error
            raise
        except ValueError as e:
            # Embedding validation failed, rollback samples if added
            if sample_ids:
                for sample_id in sample_ids:
                    try:
                        self.sample_store.delete(sample_id)
                    except Exception:
                        # Ignore rollback errors
                        pass
            raise StorageError(
                f"Invalid embedding in batch: {e}",
                details={"batch_size": len(items), "error": str(e)}
            ) from e
        except Exception as e:
            # Unexpected error, rollback samples if added
            if sample_ids:
                for sample_id in sample_ids:
                    try:
                        self.sample_store.delete(sample_id)
                    except Exception:
                        # Ignore rollback errors
                        pass
            raise StorageError(
                f"Failed to add batch with embeddings: {e}",
                details={"batch_size": len(items), "error": str(e)}
            ) from e

    def delete_sample(self, sample_id: str) -> bool:
        """Delete sample and its embedding.

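        Removes the sample from both the metadata and embedding stores. The
        embedding delete is best effort: it runs even when the sample is missing
        from the metadata store, and any error it raises is ignored. An error
        from the metadata delete propagates to the caller.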

        Args:
            sample_id: Sample ID to delete

        Returns:
            True if sample was deleted from metadata store, False if not found

        Example:
            >>> success = storage.delete_sample("smpl_abc123")
            >>> success
            True
            >>> storage.sample_store.get("smpl_abc123")
            None
            >>> storage.vector_store.get_embedding("smpl_abc123")
            None
        """
        # Delete from both stores (best effort)
        sample_deleted = self.sample_store.delete(sample_id)

        # Always try to delete embedding even if sample wasn't found
        # (orphaned embeddings should be cleaned up)
        try:
            self.vector_store.delete_embedding(sample_id)
        except Exception:
            # Ignore embedding deletion errors
            pass

        return sample_deleted

    def find_similar(
        self,
        sample_id: str,
        limit: int = 10,
        exclude_self: bool = True,
        distance_metric: Literal["cosine", "l2"] = "cosine",
    ) -> list[tuple[SampleMetadata, float]]:
        """Find samples similar to the given sample.

        Uses the sample's embedding to find nearest neighbors in vector space,
        then retrieves full metadata for each result.

        Args:
            sample_id: Sample ID to find similar samples for
            limit: Maximum number of results to return
            exclude_self: Whether to exclude the query sample from results
            distance_metric: Distance calculation method ("cosine" or "l2")

        Returns:
            List of (sample, distance) tuples sorted by distance ascending

        Raises:
            SampleNotFoundError: If sample_id not found in vector store
            StorageError: On unexpected storage errors

        Example:
            >>> similar = storage.find_similar("smpl_abc123", limit=5)
            >>> len(similar) <= 5
            True
            >>> # First result is most similar
            >>> similar[0][1] < similar[1][1]
            True
        """
        # Get embedding for query sample
        embedding = self.vector_store.get_embedding(sample_id)
        if embedding is None:
            raise SampleNotFoundError(
                sample_id,
                details={"reason": "No embedding found for sample"}
            )

        # Find similar embeddings
        exclude_ids = [sample_id] if exclude_self else None

        # Request extra results to account for potentially missing metadata
        search_limit = limit * 2 if exclude_self else limit + 1

        similar_ids = self.vector_store.search_similar(
            embedding,
            limit=search_limit,
            exclude_ids=exclude_ids,
            distance_metric=distance_metric
        )

        # Retrieve metadata for each result
        results = []
        for sid, distance in similar_ids:
            metadata = self.sample_store.get(sid)
            if metadata:
                results.append((metadata, distance))
                if len(results) >= limit:
                    break

        return results

    def search_by_text_and_similarity(
        self,
        query_embedding: Optional[list[float]] = None,
        text_query: Optional[str] = None,
        filters: Optional[dict] = None,
        limit: int = 20,
        distance_metric: Literal["cosine", "l2"] = "cosine",
    ) -> list[SampleMetadata]:
        """Combined text search and similarity search.

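        Runs a vector similarity search, a text search, or both. When
        query_embedding is combined with text_query or filters, the results are
        intersected: a sample must appear in both the similarity candidates and
        the text/filter matches. Combined results keep the order returned by the
        text search, and each side is capped at 5 * limit candidates.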

        Args:
            query_embedding: Optional embedding vector for similarity search
            text_query: Optional text query for metadata search
            filters: Optional filters (instrument_type, bpm_min, bpm_max, key, mood)
            limit: Maximum number of results
            distance_metric: Distance metric for similarity search

        Returns:
            List of matching samples sorted by relevance

        Raises:
            ValueError: If neither query_embedding nor text_query provided
            StorageError: On unexpected storage errors

        Example:
            >>> # Similarity search only
            >>> results = storage.search_by_text_and_similarity(
            ...     query_embedding=[0.1] * 128,
            ...     limit=10
            ... )

            >>> # Text search only
            >>> results = storage.search_by_text_and_similarity(
            ...     text_query="kick",
            ...     filters={"bpm_min": 120.0, "bpm_max": 130.0},
            ...     limit=10
            ... )

            >>> # Combined search
            >>> results = storage.search_by_text_and_similarity(
            ...     query_embedding=[0.1] * 128,
            ...     text_query="kick",
            ...     filters={"key": "C"},
            ...     limit=10
            ... )
        """
        if query_embedding is None and text_query is None:
            raise ValueError("Must provide query_embedding or text_query or both")

        filters = filters or {}

        # Case 1: Only text search
        if query_embedding is None:
            return self.sample_store.search(
                query=text_query,
                instrument_type=filters.get("instrument_type"),
                bpm_min=filters.get("bpm_min"),
                bpm_max=filters.get("bpm_max"),
                key=filters.get("key"),
                mood=filters.get("mood"),
                limit=limit
            )

        # Case 2: Only similarity search
        if text_query is None and not filters:
            similar_ids = self.vector_store.search_similar(
                query_embedding,
                limit=limit,
                distance_metric=distance_metric
            )

            results = []
            for sample_id, _ in similar_ids:
                metadata = self.sample_store.get(sample_id)
                if metadata:
                    results.append(metadata)

            return results

        # Case 3: Combined search - get candidates from similarity, filter by text
        # Request more candidates to account for filtering
        search_limit = limit * 5

        similar_ids = self.vector_store.search_similar(
            query_embedding,
            limit=search_limit,
            distance_metric=distance_metric
        )

        # Get sample IDs from similarity search
        candidate_ids = {sample_id for sample_id, _ in similar_ids}

        # Get samples matching text filters
        text_results = self.sample_store.search(
            query=text_query,
            instrument_type=filters.get("instrument_type"),
            bpm_min=filters.get("bpm_min"),
            bpm_max=filters.get("bpm_max"),
            key=filters.get("key"),
            mood=filters.get("mood"),
            limit=search_limit
        )

        # Intersect: only samples that match both criteria
        results = []
        for sample in text_results:
            sample_id = sample.get("id")
            if sample_id is not None and sample_id in candidate_ids:
                results.append(sample)
                if len(results) >= limit:
                    break

        return results

    def get_sample(self, sample_id: str) -> Optional[SampleMetadata]:
        """Retrieve sample metadata by ID.

        Convenience wrapper around sample_store.get().

        Args:
            sample_id: Sample ID

        Returns:
            Sample metadata if found, None otherwise

        Example:
            >>> sample = storage.get_sample("smpl_abc123")
            >>> sample['file_path']
            "/samples/kick.wav"
        """
        return self.sample_store.get(sample_id)

    def get_embedding(self, sample_id: str) -> Optional[list[float]]:
        """Retrieve embedding by sample ID.

        Convenience wrapper around vector_store.get_embedding().

        Args:
            sample_id: Sample ID

        Returns:
            Embedding vector if found, None otherwise

        Example:
            >>> embedding = storage.get_embedding("smpl_abc123")
            >>> len(embedding)
            128
        """
        return self.vector_store.get_embedding(sample_id)

    def update_sample(self, sample_id: str, updates: dict) -> bool:
        """Update sample metadata fields.

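        Updates only the specified fields in the metadata store; the embedding
        is not affected. To replace the embedding of an existing sample, call
        vector_store.add_embedding(), which overwrites the stored vector.
        add_sample_with_embedding() raises DuplicateSampleError when the sample
        hash already exists.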

        Args:
            sample_id: Sample ID to update
            updates: Dictionary of field names and new values

        Returns:
            True if sample was updated, False if not found

        Example:
            >>> success = storage.update_sample(
            ...     "smpl_abc123",
            ...     {"bpm": 128.0, "key": "C#"}
            ... )
            >>> success
            True
        """
        return self.sample_store.update(sample_id, updates)

__init__(db_path: Path, embeddings_path: Path)

Initialize unified storage with both stores.

Creates database and embedding directories if they don't exist.
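
The directory setup described above can be tried in isolation. The following is a minimal sketch using only the standard library; `prepare_storage_paths` is a hypothetical helper written for illustration and is not part of audiomancer.

```python
import tempfile
from pathlib import Path


def prepare_storage_paths(db_path: Path, embeddings_path: Path) -> tuple[Path, Path]:
    """Hypothetical helper mirroring the path handling in __init__."""
    # Expand "~" and make both paths absolute
    db_path = db_path.expanduser().absolute()
    embeddings_path = embeddings_path.expanduser().absolute()

    # The database is a file, so only its parent directory is created;
    # the embeddings path is itself a directory.
    db_path.parent.mkdir(parents=True, exist_ok=True)
    embeddings_path.mkdir(parents=True, exist_ok=True)
    return db_path, embeddings_path


with tempfile.TemporaryDirectory() as tmp:
    db, emb = prepare_storage_paths(
        Path(tmp) / "data" / "samples.db",
        Path(tmp) / "data" / "embeddings",
    )
    db_parent_exists = db.parent.is_dir()
    emb_exists = emb.is_dir()
    db_file_exists = db.exists()  # the file itself is not created here
```

Note that only directories are created at this stage; in the sketch the database file does not exist until something opens it.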

Parameters:

Name Type Description Default
db_path Path

Path to SQLite database file

required
embeddings_path Path

Path to LanceDB embeddings directory

required
Example

>>> storage = UnifiedSampleStorage(
...     db_path=Path("~/.audiomancer/samples.db"),
...     embeddings_path=Path("~/.audiomancer/embeddings")
... )

Source code in src/audiomancer/storage/unified.py
def __init__(self, db_path: Path, embeddings_path: Path):
    """Initialize unified storage with both stores.

    Creates database and embedding directories if they don't exist.

    Args:
        db_path: Path to SQLite database file
        embeddings_path: Path to LanceDB embeddings directory

    Example:
        >>> storage = UnifiedSampleStorage(
        ...     db_path=Path("~/.audiomancer/samples.db"),
        ...     embeddings_path=Path("~/.audiomancer/embeddings")
        ... )
    """
    # Expand user paths
    db_path = db_path.expanduser().absolute()
    embeddings_path = embeddings_path.expanduser().absolute()

    # Create parent directories
    db_path.parent.mkdir(parents=True, exist_ok=True)
    embeddings_path.mkdir(parents=True, exist_ok=True)

    self.sample_store = SampleStore(str(db_path))
    self.vector_store = LanceDBVectorStore(embeddings_path)

add_sample_with_embedding(sample: SampleMetadata, embedding: list[float]) -> str

Add sample and embedding atomically.

Both metadata and embedding are added together. If either operation fails, neither is persisted (atomic rollback).
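
The add-then-rollback pattern can be illustrated without the real stores. This is a minimal sketch, assuming two in-memory stand-ins (`InMemorySampleStore`, `InMemoryVectorStore`) and a hypothetical `add_with_rollback` function; none of these names exist in audiomancer. Unlike the real method, which wraps failures in StorageError, the sketch re-raises the original exception.

```python
class InMemorySampleStore:
    """Stand-in for the metadata store (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.rows: dict[str, dict] = {}

    def add(self, sample: dict) -> str:
        self.rows[sample["id"]] = sample
        return sample["id"]

    def delete(self, sample_id: str) -> bool:
        return self.rows.pop(sample_id, None) is not None


class InMemoryVectorStore:
    """Stand-in for the vector store; rejects anything that is not 128-dim."""

    def __init__(self) -> None:
        self.vectors: dict[str, list[float]] = {}

    def add_embedding(self, sample_id: str, embedding: list[float]) -> None:
        if len(embedding) != 128:
            raise ValueError(f"expected 128 dims, got {len(embedding)}")
        self.vectors[sample_id] = embedding


def add_with_rollback(samples, vectors, sample: dict, embedding: list[float]) -> str:
    # Write metadata first, then the embedding
    sample_id = samples.add(sample)
    try:
        vectors.add_embedding(sample_id, embedding)
    except Exception:
        # Compensating delete: undo the metadata write, then re-raise
        samples.delete(sample_id)
        raise
    return sample_id


samples, vectors = InMemorySampleStore(), InMemoryVectorStore()
ok_id = add_with_rollback(samples, vectors, {"id": "smpl_ok"}, [0.1] * 128)

try:
    add_with_rollback(samples, vectors, {"id": "smpl_bad"}, [0.1] * 3)
    failed = False
except ValueError:
    failed = True
```

Because the two stores do not share a transaction, the rollback is a compensating delete rather than a true atomic commit; if the delete itself fails, the metadata row can remain.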

Parameters:

Name Type Description Default
sample SampleMetadata

Complete sample metadata

required
embedding list[float]

128-dimensional embedding vector

required

Returns:

Type Description
str

Sample ID (format: "smpl_{hash[:8]}")

Raises:

Type Description
DuplicateSampleError

If sample hash already exists in database

ValueError

If embedding dimension != 128. The method catches this error and re-raises it as StorageError, so callers see StorageError.

StorageError

On unexpected storage errors

Example

>>> sample = SampleMetadata(
...     id="smpl_abc123",
...     file_path="/samples/kick.wav",
...     file_hash="abc123",
...     duration_ms=250.5,
...     sample_rate=44100,
...     channels=1,
...     bit_depth=16,
...     file_size_bytes=44100,
... )
>>> embedding = [0.1] * 128
>>> sample_id = storage.add_sample_with_embedding(sample, embedding)
>>> storage.sample_store.get(sample_id) is not None
True
>>> storage.vector_store.get_embedding(sample_id) is not None
True

Source code in src/audiomancer/storage/unified.py
def add_sample_with_embedding(
    self,
    sample: SampleMetadata,
    embedding: list[float]
) -> str:
    """Add sample and embedding atomically.

    Both metadata and embedding are added together. If either operation fails,
    neither is persisted (atomic rollback).

    Args:
        sample: Complete sample metadata
        embedding: 128-dimensional embedding vector

    Returns:
        Sample ID (format: "smpl_{hash[:8]}")

    Raises:
        DuplicateSampleError: If sample hash already exists in database
        ValueError: If embedding dimension != 128 (caught internally and
            re-raised as StorageError)
        StorageError: On unexpected storage errors

    Example:
        >>> sample = SampleMetadata(
        ...     id="smpl_abc123",
        ...     file_path="/samples/kick.wav",
        ...     file_hash="abc123",
        ...     duration_ms=250.5,
        ...     sample_rate=44100,
        ...     channels=1,
        ...     bit_depth=16,
        ...     file_size_bytes=44100,
        ... )
        >>> embedding = [0.1] * 128
        >>> sample_id = storage.add_sample_with_embedding(sample, embedding)
        >>> storage.sample_store.get(sample_id) is not None
        True
        >>> storage.vector_store.get_embedding(sample_id) is not None
        True
    """
    sample_id: Optional[str] = None

    try:
        # Add sample metadata first
        sample_id = self.sample_store.add(sample)

        # Add embedding (if this fails, rollback sample)
        self.vector_store.add_embedding(sample_id, embedding)

        return sample_id

    except DuplicateSampleError:
        # Sample already exists, propagate error
        raise
    except ValueError as e:
        # Embedding validation failed, rollback sample if added
        if sample_id:
            try:
                self.sample_store.delete(sample_id)
            except Exception:
                # Ignore rollback errors, original error is more important
                pass
        raise StorageError(
            f"Invalid embedding: {e}",
            details={"sample_id": sample.get("id"), "error": str(e)}
        ) from e
    except Exception as e:
        # Unexpected error, rollback sample if added
        if sample_id:
            try:
                self.sample_store.delete(sample_id)
            except Exception:
                # Ignore rollback errors
                pass
        raise StorageError(
            f"Failed to add sample with embedding: {e}",
            details={"sample_id": sample.get("id"), "error": str(e)}
        ) from e

add_samples_with_embeddings_batch(items: list[tuple[SampleMetadata, list[float]]]) -> list[str]

Add multiple samples and embeddings atomically.

All samples and embeddings are added together or none are added (atomic batch operation). On any failure, rolls back all changes.
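
The batch method validates every embedding before writing anything, which keeps most failures from needing a rollback at all. The sketch below shows that validate-first step on its own; `validate_batch` and `add_batch` are hypothetical names, and a plain dict stands in for both stores.

```python
EMBEDDING_DIM = 128  # the dimension stated throughout this reference


def validate_batch(items: list[tuple[dict, list[float]]]) -> None:
    """Raise ValueError on the first embedding with the wrong dimension."""
    for sample, embedding in items:
        if len(embedding) != EMBEDDING_DIM:
            raise ValueError(
                f"Embedding dimension must be {EMBEDDING_DIM}, "
                f"got {len(embedding)} for sample {sample.get('id')}"
            )


def add_batch(store: dict, items: list[tuple[dict, list[float]]]) -> list[str]:
    # Validate everything before touching the store
    validate_batch(items)
    ids = []
    for sample, embedding in items:
        store[sample["id"]] = embedding
        ids.append(sample["id"])
    return ids


store: dict[str, list[float]] = {}
good = [({"id": "smpl_a"}, [0.1] * 128), ({"id": "smpl_b"}, [0.2] * 128)]
bad = [({"id": "smpl_c"}, [0.3] * 128), ({"id": "smpl_d"}, [0.4] * 64)]

good_ids = add_batch(store, good)

try:
    add_batch(store, bad)
    message = ""
except ValueError as exc:
    message = str(exc)
```

In the failing batch, the valid item `smpl_c` is never written because validation runs over the whole batch first.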

Parameters:

Name Type Description Default
items list[tuple[SampleMetadata, list[float]]]

List of (sample, embedding) tuples

required

Returns:

Type Description
list[str]

List of sample IDs in same order as input

Raises:

Type Description
DuplicateSampleError

If any sample hash already exists

ValueError

If any embedding dimension != 128. The method catches this error and re-raises it as StorageError, so callers see StorageError.

StorageError

On unexpected storage errors

Example

>>> items = [
...     (sample1, [0.1] * 128),
...     (sample2, [0.2] * 128),
...     (sample3, [0.3] * 128),
... ]
>>> sample_ids = storage.add_samples_with_embeddings_batch(items)
>>> len(sample_ids)
3

Source code in src/audiomancer/storage/unified.py
def add_samples_with_embeddings_batch(
    self,
    items: list[tuple[SampleMetadata, list[float]]]
) -> list[str]:
    """Add multiple samples and embeddings atomically.

    All samples and embeddings are added together or none are added
    (atomic batch operation). On any failure, rolls back all changes.

    Args:
        items: List of (sample, embedding) tuples

    Returns:
        List of sample IDs in same order as input

    Raises:
        DuplicateSampleError: If any sample hash already exists
        ValueError: If any embedding dimension != 128 (caught internally and
            re-raised as StorageError)
        StorageError: On unexpected storage errors

    Example:
        >>> items = [
        ...     (sample1, [0.1] * 128),
        ...     (sample2, [0.2] * 128),
        ...     (sample3, [0.3] * 128),
        ... ]
        >>> sample_ids = storage.add_samples_with_embeddings_batch(items)
        >>> len(sample_ids)
        3
    """
    if not items:
        return []

    sample_ids: list[str] = []

    try:
        # Validate all embeddings first (fail fast)
        for sample, embedding in items:
            if len(embedding) != LanceDBVectorStore.EMBEDDING_DIM:
                raise ValueError(
                    f"Embedding dimension must be {LanceDBVectorStore.EMBEDDING_DIM}, "
                    f"got {len(embedding)} for sample {sample.get('id')}"
                )

        # Add all samples first
        samples = [sample for sample, _ in items]
        sample_ids = self.sample_store.add_batch(samples)

        # Add all embeddings
        embedding_items = [
            (sample_id, embedding)
            for sample_id, (_, embedding) in zip(sample_ids, items)
        ]
        self.vector_store.add_embeddings_batch(embedding_items)

        return sample_ids

    except DuplicateSampleError:
        # Sample already exists, propagate error
        raise
    except ValueError as e:
        # Embedding validation failed, rollback samples if added
        if sample_ids:
            for sample_id in sample_ids:
                try:
                    self.sample_store.delete(sample_id)
                except Exception:
                    # Ignore rollback errors
                    pass
        raise StorageError(
            f"Invalid embedding in batch: {e}",
            details={"batch_size": len(items), "error": str(e)}
        ) from e
    except Exception as e:
        # Unexpected error, rollback samples if added
        if sample_ids:
            for sample_id in sample_ids:
                try:
                    self.sample_store.delete(sample_id)
                except Exception:
                    # Ignore rollback errors
                    pass
        raise StorageError(
            f"Failed to add batch with embeddings: {e}",
            details={"batch_size": len(items), "error": str(e)}
        ) from e

delete_sample(sample_id: str) -> bool

Delete sample and its embedding.

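Removes the sample from both the metadata and embedding stores. The embedding delete is best effort: it runs even when the sample is missing from the metadata store, and any error it raises is ignored. An error from the metadata delete propagates to the caller.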
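
The best-effort behavior can be sketched with a vector store that always fails. `FlakyVectorStore` and `delete_best_effort` are hypothetical names used only for this illustration, and a dict stands in for the metadata store.

```python
class FlakyVectorStore:
    """Hypothetical vector store whose delete always fails."""

    def delete_embedding(self, sample_id: str) -> None:
        raise RuntimeError("vector backend unavailable")


def delete_best_effort(metadata: dict, vector_store, sample_id: str) -> bool:
    # The return value reflects the metadata store only
    deleted = metadata.pop(sample_id, None) is not None
    try:
        # Attempted even when the metadata row was missing, so orphaned
        # embeddings still get cleaned up
        vector_store.delete_embedding(sample_id)
    except Exception:
        pass  # embedding cleanup is best effort
    return deleted


metadata = {"smpl_abc123": {"file_path": "/samples/kick.wav"}}
first = delete_best_effort(metadata, FlakyVectorStore(), "smpl_abc123")
second = delete_best_effort(metadata, FlakyVectorStore(), "smpl_abc123")
```

A `True` result therefore says nothing about whether the embedding was removed; callers that need that guarantee would have to check the vector store separately.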

Parameters:

Name Type Description Default
sample_id str

Sample ID to delete

required

Returns:

Type Description
bool

True if sample was deleted from metadata store, False if not found

Example

>>> success = storage.delete_sample("smpl_abc123")
>>> success
True
>>> storage.sample_store.get("smpl_abc123")
None
>>> storage.vector_store.get_embedding("smpl_abc123")
None

Source code in src/audiomancer/storage/unified.py
def delete_sample(self, sample_id: str) -> bool:
    """Delete sample and its embedding.

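    Removes the sample from both the metadata and embedding stores. The
    embedding delete is best effort: it runs even when the sample is missing
    from the metadata store, and any error it raises is ignored. An error
    from the metadata delete propagates to the caller.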

    Args:
        sample_id: Sample ID to delete

    Returns:
        True if sample was deleted from metadata store, False if not found

    Example:
        >>> success = storage.delete_sample("smpl_abc123")
        >>> success
        True
        >>> storage.sample_store.get("smpl_abc123")
        None
        >>> storage.vector_store.get_embedding("smpl_abc123")
        None
    """
    # Delete from both stores (best effort)
    sample_deleted = self.sample_store.delete(sample_id)

    # Always try to delete embedding even if sample wasn't found
    # (orphaned embeddings should be cleaned up)
    try:
        self.vector_store.delete_embedding(sample_id)
    except Exception:
        # Ignore embedding deletion errors
        pass

    return sample_deleted

find_similar(sample_id: str, limit: int = 10, exclude_self: bool = True, distance_metric: Literal['cosine', 'l2'] = 'cosine') -> list[tuple[SampleMetadata, float]]

Find samples similar to the given sample.

Uses the sample's embedding to find nearest neighbors in vector space, then retrieves full metadata for each result.
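
The second half of this lookup, pairing neighbors with metadata, is why the method over-fetches from the vector store (`limit * 2` when `exclude_self` is set, otherwise `limit + 1`): some neighbor IDs may have no metadata row. The sketch below shows that step with plain Python data; `hydrate_neighbors` is a hypothetical name and the ranked list is made-up input, not real search output.

```python
def hydrate_neighbors(ranked, metadata, limit):
    """Pair each neighbor with its metadata, skipping IDs that have none."""
    results = []
    for sample_id, distance in ranked:
        row = metadata.get(sample_id)
        if row:
            results.append((row, distance))
            if len(results) >= limit:
                break
    return results


# Hypothetical search output, already sorted by ascending distance
ranked = [
    ("smpl_a", 0.05),
    ("smpl_orphan", 0.10),  # embedding exists, metadata does not
    ("smpl_b", 0.20),
    ("smpl_c", 0.30),
]
metadata = {
    "smpl_a": {"id": "smpl_a"},
    "smpl_b": {"id": "smpl_b"},
    "smpl_c": {"id": "smpl_c"},
}

top_two = hydrate_neighbors(ranked, metadata, limit=2)
```

The orphaned ID is skipped silently, and the loop stops as soon as `limit` results have been collected, so the distance ordering of the input is preserved.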

Parameters:

Name Type Description Default
sample_id str

Sample ID to find similar samples for

required
limit int

Maximum number of results to return

10
exclude_self bool

Whether to exclude the query sample from results

True
distance_metric Literal['cosine', 'l2']

Distance calculation method ("cosine" or "l2")

'cosine'

Returns:

Type Description
list[tuple[SampleMetadata, float]]

List of (sample, distance) tuples sorted by distance ascending

Raises:

Type Description
SampleNotFoundError

If sample_id not found in vector store

StorageError

On unexpected storage errors

Example

>>> similar = storage.find_similar("smpl_abc123", limit=5)
>>> len(similar) <= 5
True
>>> # First result is most similar
>>> similar[0][1] < similar[1][1]
True

Source code in src/audiomancer/storage/unified.py
def find_similar(
    self,
    sample_id: str,
    limit: int = 10,
    exclude_self: bool = True,
    distance_metric: Literal["cosine", "l2"] = "cosine",
) -> list[tuple[SampleMetadata, float]]:
    """Find samples similar to the given sample.

    Uses the sample's embedding to find nearest neighbors in vector space,
    then retrieves full metadata for each result.

    Args:
        sample_id: Sample ID to find similar samples for
        limit: Maximum number of results to return
        exclude_self: Whether to exclude the query sample from results
        distance_metric: Distance calculation method ("cosine" or "l2")

    Returns:
        List of (sample, distance) tuples sorted by distance ascending

    Raises:
        SampleNotFoundError: If sample_id not found in vector store
        StorageError: On unexpected storage errors

    Example:
        >>> similar = storage.find_similar("smpl_abc123", limit=5)
        >>> len(similar) <= 5
        True
        >>> # First result is most similar
        >>> similar[0][1] < similar[1][1]
        True
    """
    # Get embedding for query sample
    embedding = self.vector_store.get_embedding(sample_id)
    if embedding is None:
        raise SampleNotFoundError(
            sample_id,
            details={"reason": "No embedding found for sample"}
        )

    # Find similar embeddings
    exclude_ids = [sample_id] if exclude_self else None

    # Request extra results to account for potentially missing metadata
    search_limit = limit * 2 if exclude_self else limit + 1

    similar_ids = self.vector_store.search_similar(
        embedding,
        limit=search_limit,
        exclude_ids=exclude_ids,
        distance_metric=distance_metric
    )

    # Retrieve metadata for each result
    results = []
    for sid, distance in similar_ids:
        metadata = self.sample_store.get(sid)
        if metadata:
            results.append((metadata, distance))
            if len(results) >= limit:
                break

    return results

get_embedding(sample_id: str) -> Optional[list[float]]

Retrieve embedding by sample ID.

Convenience wrapper around vector_store.get_embedding().

Parameters:

Name Type Description Default
sample_id str

Sample ID

required

Returns:

Type Description
Optional[list[float]]

Embedding vector if found, None otherwise

Example

>>> embedding = storage.get_embedding("smpl_abc123")
>>> len(embedding)
128

Source code in src/audiomancer/storage/unified.py
def get_embedding(self, sample_id: str) -> Optional[list[float]]:
    """Retrieve embedding by sample ID.

    Convenience wrapper around vector_store.get_embedding().

    Args:
        sample_id: Sample ID

    Returns:
        Embedding vector if found, None otherwise

    Example:
        >>> embedding = storage.get_embedding("smpl_abc123")
        >>> len(embedding)
        128
    """
    return self.vector_store.get_embedding(sample_id)

get_sample(sample_id: str) -> Optional[SampleMetadata]

Retrieve sample metadata by ID.

Convenience wrapper around sample_store.get().

Parameters:

Name Type Description Default
sample_id str

Sample ID

required

Returns:

Type Description
Optional[SampleMetadata]

Sample metadata if found, None otherwise

Example

>>> sample = storage.get_sample("smpl_abc123")
>>> sample['file_path']
"/samples/kick.wav"

Source code in src/audiomancer/storage/unified.py
def get_sample(self, sample_id: str) -> Optional[SampleMetadata]:
    """Retrieve sample metadata by ID.

    Convenience wrapper around sample_store.get().

    Args:
        sample_id: Sample ID

    Returns:
        Sample metadata if found, None otherwise

    Example:
        >>> sample = storage.get_sample("smpl_abc123")
        >>> sample['file_path']
        "/samples/kick.wav"
    """
    return self.sample_store.get(sample_id)

search_by_text_and_similarity(query_embedding: Optional[list[float]] = None, text_query: Optional[str] = None, filters: Optional[dict] = None, limit: int = 20, distance_metric: Literal['cosine', 'l2'] = 'cosine') -> list[SampleMetadata]

Combined text search and similarity search.

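Runs a vector similarity search, a text search, or both. When query_embedding is combined with text_query or filters, the results are intersected: a sample must appear in both the similarity candidates and the text/filter matches. Combined results keep the order returned by the text search, and each side is capped at 5 * limit candidates, so fewer than limit results may be returned.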
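
The intersection step of the combined search can be sketched on plain data. `intersect_results` is a hypothetical name, and the two input lists are made-up stand-ins for the outputs of the text search and the similarity search.

```python
def intersect_results(text_results, similar_ids, limit):
    """Keep text matches whose ID also appears among the similarity candidates."""
    candidate_ids = {sample_id for sample_id, _ in similar_ids}
    results = []
    for sample in text_results:
        sample_id = sample.get("id")
        if sample_id is not None and sample_id in candidate_ids:
            results.append(sample)
            if len(results) >= limit:
                break
    return results


# Hypothetical inputs: similarity ranks smpl_c first, text search ranks smpl_a first
similar_ids = [("smpl_c", 0.1), ("smpl_a", 0.2)]
text_results = [{"id": "smpl_a"}, {"id": "smpl_b"}, {"id": "smpl_c"}]

matches = intersect_results(text_results, similar_ids, limit=10)
match_ids = [sample["id"] for sample in matches]
```

Because the loop walks the text results, the output follows the text search order (`smpl_a` before `smpl_c`) even though `smpl_c` is the closer vector; the distances are used only to decide membership.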

Parameters:

Name Type Description Default
query_embedding Optional[list[float]]

Optional embedding vector for similarity search

None
text_query Optional[str]

Optional text query for metadata search

None
filters Optional[dict]

Optional filters (instrument_type, bpm_min, bpm_max, key, mood)

None
limit int

Maximum number of results

20
distance_metric Literal['cosine', 'l2']

Distance metric for similarity search

'cosine'

Returns:

Type Description
list[SampleMetadata]

List of matching samples sorted by relevance

Raises:

Type Description
ValueError

If neither query_embedding nor text_query provided

StorageError

On unexpected storage errors

Example
>>> # Similarity search only
>>> results = storage.search_by_text_and_similarity(
...     query_embedding=[0.1] * 128,
...     limit=10
... )

>>> # Text search only
>>> results = storage.search_by_text_and_similarity(
...     text_query="kick",
...     filters={"bpm_min": 120.0, "bpm_max": 130.0},
...     limit=10
... )

>>> # Combined search
>>> results = storage.search_by_text_and_similarity(
...     query_embedding=[0.1] * 128,
...     text_query="kick",
...     filters={"key": "C"},
...     limit=10
... )

Source code in src/audiomancer/storage/unified.py
def search_by_text_and_similarity(
    self,
    query_embedding: Optional[list[float]] = None,
    text_query: Optional[str] = None,
    filters: Optional[dict] = None,
    limit: int = 20,
    distance_metric: Literal["cosine", "l2"] = "cosine",
) -> list[SampleMetadata]:
    """Combined text search and similarity search.

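    Runs a vector similarity search, a text search, or both. When
    query_embedding is combined with text_query or filters, the results are
    intersected: a sample must appear in both the similarity candidates and
    the text/filter matches. Combined results keep the order returned by the
    text search, and each side is capped at 5 * limit candidates.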

    Args:
        query_embedding: Optional embedding vector for similarity search
        text_query: Optional text query for metadata search
        filters: Optional filters (instrument_type, bpm_min, bpm_max, key, mood)
        limit: Maximum number of results
        distance_metric: Distance metric for similarity search

    Returns:
        List of matching samples sorted by relevance

    Raises:
        ValueError: If neither query_embedding nor text_query provided
        StorageError: On unexpected storage errors

    Example:
        >>> # Similarity search only
        >>> results = storage.search_by_text_and_similarity(
        ...     query_embedding=[0.1] * 128,
        ...     limit=10
        ... )

        >>> # Text search only
        >>> results = storage.search_by_text_and_similarity(
        ...     text_query="kick",
        ...     filters={"bpm_min": 120.0, "bpm_max": 130.0},
        ...     limit=10
        ... )

        >>> # Combined search
        >>> results = storage.search_by_text_and_similarity(
        ...     query_embedding=[0.1] * 128,
        ...     text_query="kick",
        ...     filters={"key": "C"},
        ...     limit=10
        ... )
    """
    if query_embedding is None and text_query is None:
        raise ValueError("Must provide query_embedding or text_query or both")

    filters = filters or {}

    # Case 1: Only text search
    if query_embedding is None:
        return self.sample_store.search(
            query=text_query,
            instrument_type=filters.get("instrument_type"),
            bpm_min=filters.get("bpm_min"),
            bpm_max=filters.get("bpm_max"),
            key=filters.get("key"),
            mood=filters.get("mood"),
            limit=limit
        )

    # Case 2: Only similarity search
    if text_query is None and not filters:
        similar_ids = self.vector_store.search_similar(
            query_embedding,
            limit=limit,
            distance_metric=distance_metric
        )

        results = []
        for sample_id, _ in similar_ids:
            metadata = self.sample_store.get(sample_id)
            if metadata:
                results.append(metadata)

        return results

    # Case 3: Combined search - get candidates from similarity, filter by text
    # Request more candidates to account for filtering
    search_limit = limit * 5

    similar_ids = self.vector_store.search_similar(
        query_embedding,
        limit=search_limit,
        distance_metric=distance_metric
    )

    # Get sample IDs from similarity search
    candidate_ids = {sample_id for sample_id, _ in similar_ids}

    # Get samples matching text filters
    text_results = self.sample_store.search(
        query=text_query,
        instrument_type=filters.get("instrument_type"),
        bpm_min=filters.get("bpm_min"),
        bpm_max=filters.get("bpm_max"),
        key=filters.get("key"),
        mood=filters.get("mood"),
        limit=search_limit
    )

    # Intersect: only samples that match both criteria
    results = []
    for sample in text_results:
        sample_id = sample.get("id")
        if sample_id is not None and sample_id in candidate_ids:
            results.append(sample)
            if len(results) >= limit:
                break

    return results

update_sample(sample_id: str, updates: dict) -> bool

Update sample metadata fields.

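Updates only the specified fields in the metadata store; the embedding is not affected. To replace the embedding of an existing sample, call vector_store.add_embedding(), which overwrites the stored vector. add_sample_with_embedding() raises DuplicateSampleError when the sample hash already exists.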
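
The partial-update contract (change only the named fields, report whether the sample existed) can be sketched with a dict of rows. How SampleStore.update() works internally is not shown in this reference, so `apply_updates` below is a hypothetical illustration of the contract only.

```python
def apply_updates(rows: dict[str, dict], sample_id: str, updates: dict) -> bool:
    """Merge `updates` into an existing row; report whether the row existed."""
    row = rows.get(sample_id)
    if row is None:
        return False
    row.update(updates)  # fields not named in `updates` are left untouched
    return True


rows = {"smpl_abc123": {"file_path": "/samples/kick.wav", "bpm": 120.0}}
updated = apply_updates(rows, "smpl_abc123", {"bpm": 128.0, "key": "C#"})
missing = apply_updates(rows, "smpl_missing", {"bpm": 90.0})
```

Here `file_path` survives the update unchanged, and the call for an unknown ID returns `False` without creating a row.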

Parameters:

Name Type Description Default
sample_id str

Sample ID to update

required
updates dict

Dictionary of field names and new values

required

Returns:

Type Description
bool

True if sample was updated, False if not found

Example

>>> success = storage.update_sample(
...     "smpl_abc123",
...     {"bpm": 128.0, "key": "C#"}
... )
>>> success
True

Source code in src/audiomancer/storage/unified.py
def update_sample(self, sample_id: str, updates: dict) -> bool:
    """Update sample metadata fields.

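    Updates only the specified fields in the metadata store; the embedding
    is not affected. To replace the embedding of an existing sample, call
    vector_store.add_embedding(), which overwrites the stored vector.
    add_sample_with_embedding() raises DuplicateSampleError when the sample
    hash already exists.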

    Args:
        sample_id: Sample ID to update
        updates: Dictionary of field names and new values

    Returns:
        True if sample was updated, False if not found

    Example:
        >>> success = storage.update_sample(
        ...     "smpl_abc123",
        ...     {"bpm": 128.0, "key": "C#"}
        ... )
        >>> success
        True
    """
    return self.sample_store.update(sample_id, updates)