# GPU Types Reference
Complete reference for Lambda Labs GPU types available through soong.
## Overview
Lambda Labs offers GPUs from NVIDIA's professional and data center lines:
- A10/RTX 6000: Entry-level, 24 GB VRAM
- A6000: Workstation class, 48 GB VRAM
- A100: Data center standard, 40-80 GB VRAM
- H100: Hopper generation, 80 GB VRAM with faster performance
- GH200: Grace Hopper Superchip, 96 GB VRAM with ARM CPU
- B200: Blackwell (newest), up to 180 GB VRAM per GPU
## GPU Comparison Table

| GPU Type | GPU | VRAM | vCPUs | RAM | Storage | Est. Price/hr |
|---|---|---|---|---|---|---|
| `gpu_1x_rtx6000` | 1x RTX 6000 | 24 GB | 30 | 200 GB | 1.4 TB | $0.50 |
| `gpu_1x_a10` | 1x A10 | 24 GB | 30 | 200 GB | 1.4 TB | $0.75 |
| `gpu_1x_a6000` | 1x A6000 | 48 GB | 28 | 200 GB | 512 GB | $0.80 |
| `gpu_1x_a100` | 1x A100 PCIe | 40 GB | 30 | 200 GB | 512 GB | $1.29 |
| `gpu_1x_a100_sxm4` | 1x A100 SXM4 | 40 GB | 30 | 200 GB | 512 GB | $1.29 |
| `gpu_1x_gh200` | 1x GH200 | 96 GB | 72 | 480 GB | 2 TB | $1.49 |
| `gpu_1x_h100_pcie` | 1x H100 PCIe | 80 GB | 26 | 200 GB | 512 GB | $2.49 |
| `gpu_1x_h100_sxm5` | 1x H100 SXM5 | 80 GB | 52 | 1000 GB | 2 TB | $3.29 |
| `gpu_1x_b200_sxm6` | 1x B200 Blackwell | 180 GB | 72 | 480 GB | 2 TB | $5.29 |
| `gpu_2x_a100` | 2x A100 SXM4 | 80 GB (2×40) | 48 | 900 GB | 6 TB | $2.58 |
| `gpu_2x_h100_sxm5` | 2x H100 SXM5 | 160 GB (2×80) | 104 | 2000 GB | 4 TB | $6.38 |
| `gpu_2x_b200_sxm6` | 2x B200 Blackwell | 360 GB (2×180) | 144 | 960 GB | 4 TB | $10.38 |
| `gpu_4x_a100` | 4x A100 SXM4 | 160 GB (4×40) | 120 | 1800 GB | 14 TB | $5.16 |
| `gpu_4x_h100_sxm5` | 4x H100 SXM5 | 320 GB (4×80) | 208 | 4000 GB | 8 TB | $12.36 |
| `gpu_8x_a100` | 8x A100 SXM4 | 320 GB (8×40) | 240 | 1800 GB | 14 TB | $10.32 |
| `gpu_8x_a100_80gb_sxm4` | 8x A100 80GB SXM4 | 640 GB (8×80) | 240 | 1800 GB | 14 TB | $14.32 |
| `gpu_8x_h100_sxm5` | 8x H100 SXM5 | 640 GB (8×80) | 416 | 7800 GB | 30 TB | $23.92 |
| `gpu_8x_b200_sxm6` | 8x B200 Blackwell | 1440 GB (8×180) | 576 | 3840 GB | 16 TB | $39.92 |
> **Pricing:** Prices are approximate and may vary by region and availability. Check the Lambda Labs dashboard for current pricing.
## Single GPU Instances
### A10 (24 GB) - Budget Option

- **Type:** `gpu_1x_a10`
- **GPU:** 1x NVIDIA A10
- **VRAM:** 24 GB
- **Price:** ~$0.75/hr

Best For:

- Small models (7-8B parameters)
- Quantized 32B models (INT4)
- Development and testing
- Cost-conscious workloads

Compatible Models:

- ✅ Llama 3.1 8B (FP16)
- ✅ Mistral 7B (FP16)
- ✅ Qwen 2.5 Coder 32B (INT4)

Regions: Usually good availability in us-west-1, us-east-1
### A6000 (48 GB) - Mid-Range

- **Type:** `gpu_1x_a6000`
- **GPU:** 1x NVIDIA RTX A6000
- **VRAM:** 48 GB
- **Price:** ~$0.80/hr

Best For:

- Medium models (30-70B INT4)
- Small multi-GPU experiments
- Professional workstation workloads

Compatible Models:

- ✅ DeepSeek-R1 70B (INT4)
- ✅ Llama 3.1 70B (INT4)
- ✅ All smaller models

Regions: Limited availability - check `soong available`

Note: Despite the similar name, `gpu_1x_rtx6000` is the 24 GB RTX 6000 (see the comparison table), not a 48 GB card - treat it as an A10-class budget option.
### A100 40 GB - Standard Data Center

- **Type:** `gpu_1x_a100` or `gpu_1x_a100_sxm4`
- **GPU:** 1x NVIDIA A100 (PCIe or SXM4)
- **VRAM:** 40 GB
- **Price:** ~$1.10/hr

Best For:

- 70B models with tight VRAM (INT4)
- Production inference
- Training small models

Compatible Models:

- ✅ DeepSeek-R1 70B (INT4)
- ✅ Llama 3.1 70B (INT4)
- ⚠️ Qwen 2.5 Coder 32B (FP16) - tight fit

Notes:

- SXM4 has a faster interconnect (useful for multi-GPU)
- The PCIe version has slightly lower bandwidth
### A100 80 GB - Recommended ⭐

Best For:

- 70B models at INT4/INT8 (FP16 70B needs ~140 GB - use 2x A100)
- 32B models at full precision (FP16)
- Most flexible option
- Default for soong

Compatible Models:

- ✅ All models in registry
- ✅ Qwen 2.5 Coder 32B (FP16) - recommended
- ✅ DeepSeek-R1 70B (INT4)
- ✅ Code Llama 34B (FP16)

Regions: Best availability across all regions

Why Recommended:

- 2x the VRAM of the A100 40GB for only ~17% more cost
- Runs all models comfortably
- Good availability
### H100 PCIe (80 GB) - Latest Generation

- **Type:** `gpu_1x_h100_pcie`
- **GPU:** 1x NVIDIA H100 PCIe
- **VRAM:** 80 GB
- **Price:** ~$2.49/hr

Best For:

- Faster inference (2-3x vs A100)
- Production with tight latency requirements
- Latest Transformer Engine features

Compatible Models:

- ✅ Same as A100 80GB
- ⚡ 2-3x faster inference

Notes:

- Worth the premium for production workloads
- Not necessary for development
### H100 SXM5 (80 GB) - Maximum Performance

- **Type:** `gpu_1x_h100_sxm5`
- **GPU:** 1x NVIDIA H100 SXM5
- **VRAM:** 80 GB
- **Price:** ~$3.29/hr
- **CPU:** 52 vCPUs
- **RAM:** 1 TB

Best For:

- Maximum single-GPU performance
- Large batch sizes
- NVLink for multi-GPU scaling

Compatible Models:

- ✅ Same as A100 80GB
- ⚡ 3-4x faster than A100

Notes:

- SXM5 has a faster interconnect than PCIe
- 2× more CPU cores and RAM than the H100 PCIe
### GH200 (96 GB) - Grace Hopper Superchip ⭐

- **Type:** `gpu_1x_gh200`
- **GPU:** 1x NVIDIA GH200 Grace Hopper
- **VRAM:** 96 GB HBM3
- **Price:** ~$1.49/hr
- **CPU:** 72 ARM Neoverse cores
- **RAM:** 480 GB

Best For:

- Large models that need >80 GB VRAM
- Excellent price/VRAM ratio
- ARM-based workloads

Compatible Models:

- ✅ All 70B models quantized (INT4/INT8); FP16 70B (~140 GB) exceeds the HBM and relies on unified-memory offload
- ✅ Qwen 2.5 Coder 32B (FP16) with room to spare
- ✅ Models up to ~90 GB VRAM requirement

Notes:

- Uses an ARM CPU (Grace), not x86 - see the check below
- Unified memory architecture
- Best value for VRAM-hungry workloads
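Because the host CPU is ARM, x86-only wheels and binaries won't install. A quick way to confirm what you're running on (standard-library Python, nothing soong-specific):

```python
import platform

# Prints "aarch64" on GH200 (Grace), "x86_64" on the other instance types;
# packages without aarch64 wheels will build from source or fail to install.
print(platform.machine())
```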
### B200 Blackwell (180 GB) - Next Generation

- **Type:** `gpu_1x_b200_sxm6`
- **GPU:** 1x NVIDIA B200 Blackwell
- **VRAM:** 180 GB HBM3e
- **Price:** ~$5.29/hr
- **CPU:** 72 vCPUs
- **RAM:** 480 GB

Best For:

- Largest single-GPU VRAM
- Models up to ~175B at INT8 (FP16 at that size needs multi-GPU)
- Future-proof workloads

Compatible Models:

- ✅ All models up to ~170 GB VRAM
- ✅ 70B FP16 with massive headroom
- ✅ 175B INT8 models

Notes:

- Latest Blackwell architecture
- 2× faster than H100 for some workloads
- Highest single-GPU VRAM available
## Multi-GPU Instances
> **Multi-GPU Support:** Multi-GPU instances require a distributed inference setup (e.g., vLLM, Ray). This is not currently automated in soong - see the sketch below for a manual starting point.
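A minimal manual setup with vLLM's tensor parallelism might look like this (a sketch, not soong functionality; the model name is an example and must be downloadable on the instance):

```python
from vllm import LLM, SamplingParams

# Shard one model across both GPUs of a 2x instance;
# tensor_parallel_size must match the GPU count.
llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # example model
    tensor_parallel_size=2,
)

outputs = llm.generate(
    ["Write a haiku about GPUs."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```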
### 2x A100 (80 GB Total)

- **Type:** `gpu_2x_a100`
- **Price:** ~$2.58/hr

When to Use:

- Models that need 60-80 GB total
- Training with data parallelism

Note: For inference this is not cheaper than a single A100 80GB - prefer the single card unless you specifically need two GPUs.
### 4x A100 (160 GB Total)

- **Type:** `gpu_4x_a100`
- **Price:** ~$5.16/hr

When to Use:

- 70B models at FP16 precision (~140 GB)
- Large 175B+ models with quantization
- Multi-GPU training
### 8x A100 (320 GB Total)

- **Type:** `gpu_8x_a100` (8×40 GB) or `gpu_8x_a100_80gb_sxm4` (8×80 GB, 640 GB total)
- **Price:** ~$10.32/hr (40 GB) or ~$14.32/hr (80 GB)

When to Use:

- 175B models at FP16 (~350 GB - needs the 80 GB variant)
- Large-scale training
- Multi-user inference serving
### 8x H100 (640 GB Total)

- **Type:** `gpu_8x_h100_sxm5`
- **GPUs:** 8x H100 SXM5 (80 GB each)
- **Total VRAM:** 640 GB
- **Price:** ~$23.92/hr
- **CPU:** 416 vCPUs
- **RAM:** 7.8 TB

When to Use:

- Largest models (400B+)
- High-throughput production serving
- Multi-GPU training at scale
### 8x B200 Blackwell (1440 GB Total)

- **Type:** `gpu_8x_b200_sxm6`
- **GPUs:** 8x B200 SXM6 (180 GB each)
- **Total VRAM:** 1440 GB
- **Price:** ~$39.92/hr
- **CPU:** 576 vCPUs
- **RAM:** 3.8 TB

When to Use:

- The largest models (1T+ parameters, quantized)
- Maximum VRAM capacity
- Cutting-edge Blackwell architecture
## GPU Selection Guide
### By Model Size

| Model Size | Recommended GPU | Alternative |
|---|---|---|
| 7-8B FP16 | A10 (24 GB) | A6000 (48 GB) |
| 32B INT4 | A10 (24 GB) | A6000 (48 GB) |
| 32B FP16 | A100 (80 GB) | H100 (80 GB) |
| 70B INT4 | A100 (80 GB) | A6000 (48 GB) |
| 70B FP16 | 2x A100 80GB (160 GB total) | B200 (180 GB) |
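These recommendations follow from a rough weights-only estimate: parameters × bytes per parameter, plus overhead for the KV cache and activations. A minimal sketch (the flat 1.2 overhead factor is an assumption for illustration, not a soong constant):

```python
BYTES_PER_PARAM = {"FP32": 4.0, "FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

def estimate_vram_gb(params_billions: float, precision: str,
                     overhead: float = 1.2) -> float:
    """Weights times precision width, padded for KV cache/activations."""
    return params_billions * BYTES_PER_PARAM[precision] * overhead

# 70B FP16 -> ~168 GB (hence 2x A100 80GB); 70B INT4 -> ~42 GB (one A100 80GB)
for precision in ("FP16", "INT4"):
    print(precision, round(estimate_vram_gb(70, precision)), "GB")
```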
### By Budget
| Budget/hr | GPU Type | Models |
|---|---|---|
| < $1 | A10 (24 GB) | Small models, INT4 quantized |
| $1-2 | A100 80GB | All common models |
| $2-5 | H100 or 2-4x A100 | Large models, fast inference |
| $5+ | 8x A100/H100 | Multi-GPU production |
### By Use Case
| Use Case | Recommended | Why |
|---|---|---|
| Development/Testing | A10 (24 GB) | Cheapest, fast iteration |
| Production Inference | A100 80GB | Best price/performance |
| Low Latency | H100 PCIe | 2-3x faster inference |
| Training | Multi-GPU A100 | NVLink, large batches |
| Maximum Performance | H100 SXM5 | Latest tech, fastest |
## Availability
GPU availability varies by region and time. Check current availability:
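```bash
# Lists instance types with current per-region availability
soong available
```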
### Typical Availability (by region)

| Region | A10 | A100 80GB | H100 |
|---|---|---|---|
| us-west-1 | ✅ High | ✅ High | ⚠️ Limited |
| us-east-1 | ✅ High | ✅ High | ⚠️ Limited |
| us-south-1 | ✅ Medium | ✅ Medium | ❌ Rare |
| europe-central-1 | ⚠️ Limited | ✅ Medium | ❌ Rare |
> **Availability Strategy**
>
> - A100 80GB has the best availability across regions
> - H100s are often scarce - check multiple regions
> - A10 is usually available but may have waitlists
> - Multi-GPU instances (4x, 8x) often require scheduling
## Cost Examples
### 4-Hour Coding Session
| GPU | Cost | Suitable Models |
|---|---|---|
| A10 | $2.40 | Llama 8B, Qwen 32B INT4 |
| A100 80GB | $5.16 | All models |
| H100 PCIe | $7.96 | All models, faster |
### Full Day (8 hours)
| GPU | Cost | Suitable Models |
|---|---|---|
| A10 | $4.80 | Small models |
| A100 80GB | $10.32 | All models |
| H100 PCIe | $15.92 | All models, production |
### Weekly Development (40 hours)
| GPU | Cost | Suitable Models |
|---|---|---|
| A10 | $24 | Budget coding |
| A100 80GB | $51.60 | Professional use |
| H100 PCIe | $79.60 | High performance |
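Every figure above is just hours × hourly rate; a tiny helper makes it easy to budget other session lengths (the rates below are the approximate figures these tables use, not live prices):

```python
# Approximate hourly rates used in the tables above - check live pricing.
RATES = {"a10": 0.60, "a100_80gb": 1.29, "h100_pcie": 1.99}

def session_cost(gpu: str, hours: float) -> float:
    """Estimated cost of keeping one instance running for `hours`."""
    return round(RATES[gpu] * hours, 2)

print(session_cost("a100_80gb", 40))  # 51.6 - matches the weekly table
```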
## GPU Specifications Deep Dive
### Compute Architecture
| GPU | Architecture | CUDA Cores | Tensor Cores | TDP |
|---|---|---|---|---|
| A10 | Ampere | 9,216 | 288 (3rd gen) | 150W |
| A6000 | Ampere | 10,752 | 336 (3rd gen) | 300W |
| RTX 6000 Ada | Ada Lovelace | 18,176 | 568 (4th gen) | 300W |
| A100 | Ampere | 6,912 | 432 (3rd gen) | 400W |
| H100 | Hopper | 16,896 | 528 (4th gen) | 700W |
### Memory Bandwidth
| GPU | Memory Type | Bandwidth | ECC |
|---|---|---|---|
| A10 | GDDR6 | 600 GB/s | Yes |
| A6000 | GDDR6 | 768 GB/s | Yes |
| RTX 6000 Ada | GDDR6 | 960 GB/s | Yes |
| A100 40GB | HBM2e | 1,555 GB/s | Yes |
| A100 80GB | HBM2e | 2,039 GB/s | Yes |
| H100 | HBM3 | 3,350 GB/s | Yes |
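Single-stream LLM decoding is usually memory-bandwidth-bound: each generated token re-reads the model weights, so tokens/sec is bounded by bandwidth ÷ weight bytes. A back-of-envelope sketch (real throughput lands below this ceiling because of KV-cache reads and kernel overhead):

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, params_b: float,
                         bytes_per_param: float) -> float:
    """Upper bound on decode speed: each token streams all weights from HBM once."""
    return bandwidth_gb_s / (params_b * bytes_per_param)

# 70B INT4 (~35 GB of weights): A100 80GB vs H100
print(round(decode_ceiling_tok_s(2039, 70, 0.5)))  # ~58 tok/s ceiling
print(round(decode_ceiling_tok_s(3350, 70, 0.5)))  # ~96 tok/s ceiling
```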
### Inference Performance (relative)
| GPU | FP16 | INT8 | INT4 |
|---|---|---|---|
| A10 | 1.0× | 1.0× | 1.0× |
| A100 | 1.5× | 2.0× | - |
| H100 | 3.0× | 4.0× | 6.0× |
*Performance relative to an A10 baseline.*
## Multi-GPU Interconnects
### PCIe (Standard)
- Bandwidth: 64 GB/s (PCIe 4.0 x16)
- Use case: Single GPU or CPU-bound workloads
- GPUs: A10, A6000, RTX 6000, H100 PCIe
### NVLink (A100 SXM4)
- Bandwidth: 600 GB/s (12 NVLink lanes)
- Use case: Multi-GPU training, large models
- GPUs: A100 SXM4 in multi-GPU configs
### NVLink 4 (H100 SXM5)
- Bandwidth: 900 GB/s (18 NVLink lanes)
- Use case: Highest multi-GPU performance
- GPUs: H100 SXM5 in multi-GPU configs
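To see why the interconnect matters for tensor parallelism, compare how long an idealized 1 GB synchronization step takes over each link (the payload size is an arbitrary example; latency and protocol overhead are ignored):

```python
INTERCONNECT_GB_S = {"pcie_4_x16": 64, "nvlink_a100": 600, "nvlink4_h100": 900}

def transfer_ms(payload_gb: float, link: str) -> float:
    """Idealized transfer time over one link, ignoring latency."""
    return payload_gb / INTERCONNECT_GB_S[link] * 1000

for link in INTERCONNECT_GB_S:
    print(f"{link}: {transfer_ms(1.0, link):.2f} ms per GB")
# PCIe ~15.6 ms vs NVLink 4 ~1.1 ms - roughly a 14x gap per sync step
```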
## Choosing the Right GPU
### Decision Tree

```text
Need model > 70B?
├─ Yes → Multi-GPU or wait for larger models
└─ No
   ├─ Need FP16 precision for 32B model?
   │  └─ Yes → A100 80GB
   └─ No
      ├─ Budget < $1/hr?
      │  └─ Yes → A10 (use INT4 models)
      └─ No
         ├─ Need fastest inference?
         │  └─ Yes → H100
         └─ No → A100 80GB (best value)
```
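The same logic as a small function, if you prefer it executable (an illustration that mirrors the tree above; the returned type names are suggestions, not soong output):

```python
def choose_gpu(model_size_b: float, fp16_32b: bool = False,
               budget_per_hr: float = 2.0, need_fastest: bool = False) -> str:
    """Walk the decision tree above and suggest an instance type."""
    if model_size_b > 70:
        return "multi-GPU (e.g. gpu_4x_a100 or gpu_8x_h100_sxm5)"
    if fp16_32b:
        return "A100 80GB"
    if budget_per_hr < 1.0:
        return "gpu_1x_a10 (use INT4 models)"
    if need_fastest:
        return "gpu_1x_h100_pcie"
    return "A100 80GB (best value)"

print(choose_gpu(32, fp16_32b=True))  # -> A100 80GB
```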
## Common Mistakes

- ❌ **Using A100 40GB for Qwen 32B FP16** - too tight, may OOM; use A100 80GB instead
- ❌ **Using H100 for development** - 2× the cost for minimal benefit in dev; use A100 80GB instead
- ❌ **Using 2x A100 for 70B INT4** - doesn't need multi-GPU; use 1x A100 80GB instead
- ❌ **Using A10 for DeepSeek-R1 FP16** - won't fit (needs 140+ GB); use A100 80GB with INT4 instead
## Frequently Asked Questions
### Can I use multiple GPUs for one model?
Yes, but requires setup:
- vLLM supports tensor parallelism
- Ray supports pipeline parallelism
- Not currently automated in soong
### What's the difference between PCIe and SXM?
- PCIe: Plugs into motherboard slot, lower bandwidth
- SXM: Direct socket, higher bandwidth, NVLink support
- For single GPU, minimal difference
- For multi-GPU, SXM much faster
### Should I get H100 over A100?

Get H100 if:

- Production workload with a tight latency SLA
- You need maximum throughput
- Budget allows

Get A100 if:

- Development/testing
- Cost-conscious
- You don't need the H100's 2-3x inference speedup
### Why is A100 80GB only 17% more than 40GB?

Lambda Labs prices by total cost of ownership. The 80GB variant has:

- 2× VRAM
- 31% higher memory bandwidth
- Better availability
It's the best value in their lineup.
### What about RTX 4090 or consumer GPUs?

Lambda Labs only offers professional/data center GPUs, which bring:

- Better reliability (ECC memory)
- Official driver support
- Better multi-GPU scaling
- Data center warranties
## See Also

- Model Registry - which models fit on which GPUs
- CLI Commands - the `available` and `start` commands
- Configuration - setting the default GPU type
- Lambda Labs Instance Types - official specs