Testing Guide¶
This guide covers testing practices, running tests, and writing new tests for the soong CLI.
Overview¶
The test suite uses pytest with coverage reporting, mocking, and HTTP request interception. Our goal is high coverage (95%+) with tests that verify actual behavior, not implementation details.
Running Tests¶
Basic Test Execution¶
# Run all tests
pytest
# Verbose output
pytest -v
# Run specific test file
pytest tests/test_models.py
# Run specific test
pytest tests/test_models.py::test_estimate_vram_llama_70b_int4
# Run tests matching pattern
pytest -k "vram" # Runs all tests with "vram" in name
Coverage Reporting¶
# Run with coverage
pytest --cov=soong
# Coverage with missing lines
pytest --cov=soong --cov-report=term-missing
# Generate HTML coverage report
pytest --cov=soong --cov-report=html
# Open htmlcov/index.html in browser
# Coverage for specific module
pytest --cov=soong.models tests/test_models.py
Test Output Options¶
# Show print statements
pytest -s
# Show logging output
pytest --log-cli-level=DEBUG
# Fail fast (stop on first failure)
pytest -x
# Run last failed tests
pytest --lf
# Run failed tests first, then all
pytest --ff
Test Organization¶
Directory Structure¶
cli/tests/
├── conftest.py                  # Shared fixtures
├── helpers/
│   ├── __init__.py
│   └── assertions.py            # Custom assertions
├── test_config.py               # Configuration tests
├── test_models.py               # Model registry tests
├── test_vram.py                 # VRAM estimation tests
├── test_gpu_recommendation.py   # GPU recommendation tests
├── test_lambda_api.py           # API client tests
├── test_instance.py             # Instance manager tests
├── test_ssh.py                  # SSH tunnel tests
├── test_history.py              # History tracking tests
├── test_cli_configure.py        # Configure command tests
├── test_cli_start.py            # Start command tests
├── test_cli_status.py           # Status command tests
├── test_cli_commands.py         # Other CLI commands
├── test_cli_models.py           # Models subcommand tests
├── test_cli_models_add.py       # Models add tests
├── test_cli_models_remove.py    # Models remove tests
└── test_cli_tunnel.py           # Tunnel subcommand tests
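The helpers package holds assertions shared across test files. As a rough sketch of what an entry in assertions.py could look like (the helper name and checks below are illustrative, not the actual contents of the module):
# tests/helpers/assertions.py (illustrative sketch)
def assert_valid_vram_estimate(result: dict) -> None:
    """Assert that a VRAM estimate dict has the keys the tests rely on."""
    for key in ("base_vram_gb", "kv_cache_gb", "total_estimated_gb", "min_vram_gb"):
        assert key in result, f"missing key: {key}"
    # The total should never be smaller than the base weights alone
    assert result["total_estimated_gb"] >= result["base_vram_gb"]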
Test Naming Conventions¶
From conftest.py:
"""
Test Naming Conventions:
- test_<function_name>_<scenario>() for unit tests
Example: test_estimate_vram_llama_70b_int4()
- test_<command>_<behavior>() for CLI command tests
Example: test_models_list_displays_all_known_models()
- test_<class_name>_<method>_<scenario>() for class method tests
Example: test_model_config_from_dict_valid()
- test_<integration_workflow>() for end-to-end tests
Example: test_full_workflow_configure_add_model_start()
Use descriptive scenario suffixes:
- _valid, _invalid for validation tests
- _missing_field, _negative_params for error cases
- _happy_path, _edge_case for behavior tests
"""
Writing Tests¶
Unit Tests¶
Test individual functions in isolation:
def test_estimate_vram_llama_70b_int4():
    """Test VRAM estimation for Llama 3.1 70B with INT4 quantization."""
    from soong.models import estimate_vram, Quantization

    result = estimate_vram(
        params_billions=70,
        quantization=Quantization.INT4,
        context_length=8192,
    )
    # Verify structure
    assert "base_vram_gb" in result
    assert "kv_cache_gb" in result
    assert "total_estimated_gb" in result
    assert "min_vram_gb" in result
    # Verify calculations
    # 70B * 0.5 bytes (INT4) = 35GB base
    assert result["base_vram_gb"] == 35.0
    # Total should be base + KV cache + overhead + activations
    assert result["total_estimated_gb"] > 35.0
    assert result["total_estimated_gb"] < 50.0  # Reasonable upper bound
    # Should fit on 48GB GPU
    assert result["min_vram_gb"] in [40, 48, 80]
CLI Command Tests¶
Test CLI commands using CliRunner:
def test_start_command_shows_cost_estimate(cli_runner, sample_config, mocker):
    """Test that start command shows cost estimate before launching."""
    # Setup mocks
    mocker.patch("soong.cli.get_config", return_value=sample_config)
    mock_api = mocker.Mock()
    mock_api.get_instance_type.return_value = mocker.Mock(
        description="1x A100 SXM4 (80 GB)",
        price_per_hour=1.29,
        estimate_cost=lambda h: 1.29 * h,
    )
    mocker.patch("soong.cli.LambdaAPI", return_value=mock_api)
    # Mock user declining
    mocker.patch("questionary.confirm", return_value=mocker.Mock(ask=lambda: False))
    # Run command
    result = cli_runner.invoke(app, ["start"])
    # Verify cost estimate shown
    assert "Cost Estimate" in result.output
    assert "$1.29/hr" in result.output
    assert "4 hours" in result.output  # Default lease
    # Verify launch cancelled
    assert "cancelled" in result.output.lower()
    assert result.exit_code == 0
API Client Tests¶
Test API calls with the responses library (rather than mocking requests directly):
def test_list_instances_success(mock_http, lambda_api_base_url):
    """Test listing instances with successful API response."""
    # Setup mock HTTP response
    mock_http.add(
        responses.GET,
        f"{lambda_api_base_url}/instances",
        json={
            "data": [
                {
                    "id": "inst_abc123",
                    "name": "test-instance",
                    "ip": "1.2.3.4",
                    "status": "active",
                    "instance_type": {"name": "gpu_1x_a100_sxm4_80gb"},
                    "region": {"name": "us-west-1"},
                    "created_at": "2025-01-01T12:00:00Z",
                }
            ]
        },
        status=200,
    )
    # Call API
    from soong.lambda_api import LambdaAPI
    api = LambdaAPI("test_key")
    instances = api.list_instances()
    # Verify results
    assert len(instances) == 1
    assert instances[0].id == "inst_abc123"
    assert instances[0].status == "active"
    # Verify HTTP call made correctly
    assert len(mock_http.calls) == 1
    assert mock_http.calls[0].request.headers["Authorization"] == "Bearer test_key"
Testing Error Handling¶
def test_launch_instance_api_error(mock_http, lambda_api_base_url):
    """Test launch_instance handles API errors gracefully."""
    # Mock API error response
    mock_http.add(
        responses.POST,
        f"{lambda_api_base_url}/instance-operations/launch",
        json={"error": "Insufficient quota"},
        status=403,
    )
    # Call API and expect error
    from soong.lambda_api import LambdaAPI, LambdaAPIError
    api = LambdaAPI("test_key")
    with pytest.raises(LambdaAPIError, match="API request failed"):
        api.launch_instance(
            region="us-west-1",
            instance_type="gpu_1x_a100_sxm4_80gb",
            ssh_key_names=["my-key"],
        )
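HTTP status errors are only one failure mode. The responses library can also raise an exception instead of returning a response, which is handy for simulating network failures. A sketch (whether LambdaAPI wraps connection errors into LambdaAPIError depends on the client, so adjust the expected exception accordingly):
import pytest
import requests
import responses

def test_list_instances_connection_error(mock_http, lambda_api_base_url):
    """Simulate the network dropping mid-request."""
    mock_http.add(
        responses.GET,
        f"{lambda_api_base_url}/instances",
        body=requests.exceptions.ConnectionError("connection reset"),
    )
    from soong.lambda_api import LambdaAPI, LambdaAPIError
    api = LambdaAPI("test_key")
    # Expect either the wrapped error or the raw ConnectionError, depending on the client
    with pytest.raises((LambdaAPIError, requests.exceptions.ConnectionError)):
        api.list_instances()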
Test Fixtures¶
Common Fixtures (from conftest.py)¶
Model configurations:
def test_using_sample_model(sample_model_config):
    """Use the sample 70B model fixture."""
    assert sample_model_config.params_billions == 70.0
    assert sample_model_config.default_quantization == Quantization.INT4

def test_using_small_model(small_model_config):
    """Use the sample 7B model fixture."""
    assert small_model_config.params_billions == 7.0
Configuration:
def test_using_config(sample_config):
    """Use the sample configuration fixture."""
    assert sample_config.lambda_config.api_key == "test_key_12345"
    assert sample_config.defaults.lease_hours == 4
HTTP mocking:
def test_api_call(mock_http, lambda_api_base_url):
    """Use HTTP mocking fixture."""
    mock_http.add(
        responses.GET,
        f"{lambda_api_base_url}/endpoint",
        json={"status": "ok"},
    )
    # Make request and verify
CLI runner:
def test_command(cli_runner):
    """Use CLI runner fixture."""
    from soong.cli import app
    result = cli_runner.invoke(app, ["command", "args"])
    assert result.exit_code == 0
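For reference, the cli_runner, mock_http, and lambda_api_base_url fixtures could be defined in conftest.py along these lines (a sketch assuming Click's CliRunner and the responses library; the base URL and the details of the real conftest.py may differ):
import pytest
import responses
from click.testing import CliRunner

@pytest.fixture
def cli_runner():
    """Invoke CLI commands in-process and capture their output."""
    return CliRunner()

@pytest.fixture
def mock_http():
    """Intercept all outgoing requests made via the requests library."""
    with responses.RequestsMock() as rsps:
        yield rsps

@pytest.fixture
def lambda_api_base_url():
    """Base URL the API client is expected to call (assumed here)."""
    return "https://cloud.lambdalabs.com/api/v1"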
Creating Custom Fixtures¶
# In test file or conftest.py
import pytest

@pytest.fixture
def mock_instance_ready(mocker):
    """Mock an instance that is ready."""
    instance = mocker.Mock()
    instance.id = "inst_ready123"
    instance.status = "active"
    instance.ip = "5.6.7.8"
    return instance

def test_with_custom_fixture(mock_instance_ready):
    """Use custom fixture."""
    assert mock_instance_ready.status == "active"
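Fixtures that need setup and teardown can yield instead of return. A sketch using pytest's built-in tmp_path fixture (the config contents here are made up for illustration):
import pytest

@pytest.fixture
def temp_config_path(tmp_path):
    """Write a throwaway config file and hand its path to the test."""
    config_file = tmp_path / "config.yaml"
    config_file.write_text("defaults:\n  lease_hours: 4\n")  # illustrative contents
    yield config_file
    # Files under tmp_path are cleaned up by pytest after the test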
Mocking Best Practices¶
Mock External Services, Not Internal Logic¶
Good:
def test_instance_launch(mocker):
    """Mock the Lambda API, not internal logic."""
    mock_api = mocker.Mock()
    mock_api.launch_instance.return_value = "inst_123"
    manager = InstanceManager(mock_api)
    result = manager.launch_with_config(...)
    # Verify API was called correctly
    mock_api.launch_instance.assert_called_once_with(
        region="us-west-1",
        instance_type="gpu_1x_a100_sxm4_80gb",
    )
Bad:
def test_instance_launch_bad(mocker):
    """Don't mock internal helper functions."""
    mocker.patch("soong.instance._validate_config")  # Internal detail
    # This test is brittle and doesn't verify behavior
Use responses for HTTP, Not mocker.patch("requests")¶
Good:
def test_http_request(mock_http):
    """Mock HTTP responses with responses library."""
    mock_http.add(responses.GET, "https://api.example.com", json={"ok": True})
    result = requests.get("https://api.example.com")
    assert result.json()["ok"] is True
Bad:
def test_http_request_bad(mocker):
    """Don't mock requests directly."""
    mocker.patch("requests.get", return_value=mocker.Mock(json=lambda: {"ok": True}))
    # Doesn't test actual HTTP logic
Use Realistic Test Data¶
Good:
def test_instance_parsing(mock_http):
    """Use realistic instance data from Lambda API."""
    mock_http.add(
        responses.GET,
        url,
        json={
            "data": [
                {
                    "id": "inst_unique_789xyz",  # Unique ID
                    "name": "my-test-instance",
                    "ip": "192.168.99.42",  # Unique IP
                    "status": "active",
                    # ... full realistic data
                }
            ]
        },
    )
Bad:
def test_instance_parsing_bad(mock_http):
    """Don't use minimal data that hides bugs."""
    mock_http.add(responses.GET, url, json={"data": [{"id": "123"}]})
    # Missing fields would cause AttributeError in real usage
Coverage Goals¶
Current Coverage¶
As of recent updates, the test suite achieves:
- Overall coverage: 95%+
- Core modules: 95-100%
- CLI commands: 90-95%
- API client: 95%+
Measuring Coverage¶
# Generate coverage report
pytest --cov=soong --cov-report=term-missing
# See which lines are not covered
pytest --cov=soong --cov-report=html
open htmlcov/index.html
Improving Coverage¶
Find untested code with the term-missing coverage report shown above, then add tests for the uncovered lines:
# Example: Testing error path
def test_config_validation_invalid_quantization():
"""Test that invalid quantization raises ValueError."""
from soong.config import validate_custom_model
invalid_model = {
"hf_path": "org/model",
"params_billions": 7,
"quantization": "invalid", # Invalid value
"context_length": 8192,
}
with pytest.raises(ValueError, match="Invalid quantization"):
validate_custom_model(invalid_model)
Continuous Integration¶
Tests run automatically on:
- Every push to feature branches
- Every pull request
- Merges to main branch
CI Configuration¶
GitHub Actions workflow (.github/workflows/test.yml):
name: Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12"]
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          cd cli
          pip install -e ".[test]"
      - name: Run tests with coverage
        run: |
          cd cli
          pytest --cov=soong --cov-report=xml
      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          file: ./cli/coverage.xml
Debugging Tests¶
Using pytest --pdb¶
# Drop into debugger on failure
pytest --pdb tests/test_models.py
# Drop into debugger on first failure
pytest -x --pdb
Adding Breakpoints¶
def test_complex_logic():
    """Test with debugging."""
    result = complex_function()
    import pdb; pdb.set_trace()  # Debugger stops here
    assert result == expected
Viewing Test Output¶
# Show all print statements
pytest -s
# Show logging
pytest --log-cli-level=DEBUG
# Capture turned off (shows all output)
pytest --capture=no
Test Performance¶
Running Tests Quickly¶
# Run tests in parallel (requires pytest-xdist)
pip install pytest-xdist
pytest -n auto
# Run only fast tests (mark slow tests with @pytest.mark.slow)
pytest -m "not slow"
# Run only changed tests (requires pytest-testmon)
pytest --testmon
Marking Slow Tests¶
import pytest

@pytest.mark.slow
def test_full_integration_workflow():
    """This test takes >5 seconds."""
    # Slow integration test
Then skip slow tests:
pytest -m "not slow"
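To avoid PytestUnknownMarkWarning, the slow marker should be registered, for example via the pytest_configure hook in conftest.py (a sketch; the project may register it in its pytest configuration instead):
# conftest.py
def pytest_configure(config):
    """Register the custom 'slow' marker so pytest doesn't warn about it."""
    config.addinivalue_line("markers", "slow: marks tests as slow integration tests")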
Testing Best Practices¶
1. Test Behavior, Not Implementation¶
Good:
def test_launch_instance_creates_active_instance():
    """Verify instance becomes active after launch."""
    # Test the outcome
    assert instance.status == "active"
Bad:
def test_launch_instance_calls_api_exactly_once():
    """Don't test implementation details."""
    # Too coupled to implementation
    mock_api.launch_instance.assert_called_once()
2. Use Descriptive Test Names¶
Good:
def test_estimate_vram_70b_int4_fits_on_a100_80gb():
    """Clear what is being tested and expected outcome."""
Bad:
def test_vram():
    """Unclear which scenario is tested or what outcome is expected."""
3. Arrange-Act-Assert Pattern¶
def test_something():
    # Arrange: Set up test data
    model = ModelConfig(...)
    # Act: Execute the code being tested
    result = model.estimated_vram_gb
    # Assert: Verify the outcome
    assert result > 0
    assert result < 100
4. One Logical Assertion Per Test¶
Good:
def test_vram_estimate_includes_base():
    """Test that base VRAM is calculated."""
    assert result["base_vram_gb"] == 35.0

def test_vram_estimate_includes_overhead():
    """Test that overhead is included."""
    assert result["total_estimated_gb"] > result["base_vram_gb"]
Bad:
def test_vram_estimate():
    """Test everything at once."""
    assert result["base_vram_gb"] == 35.0
    assert result["kv_cache_gb"] > 0
    assert result["overhead_gb"] == 2.0
    assert result["total_estimated_gb"] > 35.0
    # Hard to debug when one assertion fails
5. Test Edge Cases¶
def test_estimate_vram_zero_params():
    """Test VRAM estimation with zero parameters."""
    result = estimate_vram(0, Quantization.FP16, 8192)
    assert result["base_vram_gb"] == 0

def test_estimate_vram_huge_params():
    """Test VRAM estimation with very large model."""
    result = estimate_vram(1000, Quantization.FP32, 32768)
    assert result["min_vram_gb"] == 160  # Multi-GPU
Next Steps¶
- Read Development Setup to set up your environment
- Explore existing tests in cli/tests/
- Run tests and experiment with changes
- Write tests for new features before implementing them (TDD)