# GPU Types Reference
Complete reference for Lambda Labs GPU types available through soong.
## Overview
Lambda Labs offers GPUs from NVIDIA's professional and data center lines:
- A10/RTX 6000: Entry-level, 24 GB VRAM
- A6000: Workstation class, 48 GB VRAM
- A100: Data center standard, 40-80 GB VRAM
- H100: Hopper generation, 80 GB VRAM with faster performance
- GH200: Grace Hopper Superchip, 96 GB VRAM with ARM CPU
- B200: Blackwell (newest), up to 180 GB VRAM per GPU
## GPU Comparison Table

| GPU Type | GPU | VRAM | vCPUs | RAM | Storage | Est. Price/hr |
|---|---|---|---|---|---|---|
| `gpu_1x_rtx6000` | 1x RTX 6000 | 24 GB | 30 | 200 GB | 1.4 TB | $0.50 |
| `gpu_1x_a10` | 1x A10 | 24 GB | 30 | 200 GB | 1.4 TB | $0.75 |
| `gpu_1x_a6000` | 1x A6000 | 48 GB | 28 | 200 GB | 512 GB | $0.80 |
| `gpu_1x_a100` | 1x A100 PCIe | 40 GB | 30 | 200 GB | 512 GB | $1.29 |
| `gpu_1x_a100_sxm4` | 1x A100 SXM4 | 40 GB | 30 | 200 GB | 512 GB | $1.29 |
| `gpu_1x_gh200` | 1x GH200 | 96 GB | 72 | 480 GB | 2 TB | $1.49 |
| `gpu_1x_h100_pcie` | 1x H100 PCIe | 80 GB | 26 | 200 GB | 512 GB | $2.49 |
| `gpu_1x_h100_sxm5` | 1x H100 SXM5 | 80 GB | 52 | 1000 GB | 2 TB | $3.29 |
| `gpu_1x_b200_sxm6` | 1x B200 Blackwell | 180 GB | 72 | 480 GB | 2 TB | $5.29 |
| `gpu_2x_a100` | 2x A100 SXM4 | 80 GB (2×40) | 48 | 900 GB | 6 TB | $2.58 |
| `gpu_2x_h100_sxm5` | 2x H100 SXM5 | 160 GB (2×80) | 104 | 2000 GB | 4 TB | $6.38 |
| `gpu_2x_b200_sxm6` | 2x B200 Blackwell | 360 GB (2×180) | 144 | 960 GB | 4 TB | $10.38 |
| `gpu_4x_a100` | 4x A100 SXM4 | 160 GB (4×40) | 120 | 1800 GB | 14 TB | $5.16 |
| `gpu_4x_h100_sxm5` | 4x H100 SXM5 | 320 GB (4×80) | 208 | 4000 GB | 8 TB | $12.36 |
| `gpu_8x_a100` | 8x A100 SXM4 | 320 GB (8×40) | 240 | 1800 GB | 14 TB | $10.32 |
| `gpu_8x_a100_80gb_sxm4` | 8x A100 80GB SXM4 | 640 GB (8×80) | 240 | 1800 GB | 14 TB | $14.32 |
| `gpu_8x_h100_sxm5` | 8x H100 SXM5 | 640 GB (8×80) | 416 | 7800 GB | 30 TB | $23.92 |
| `gpu_8x_b200_sxm6` | 8x B200 Blackwell | 1440 GB (8×180) | 576 | 3840 GB | 16 TB | $39.92 |
> **Pricing:** Prices are approximate and may vary by region and availability. Check the Lambda Labs dashboard for current pricing.
## Single GPU Instances
### A10 (24 GB) - Budget Option

- **Type:** `gpu_1x_a10`
- **GPU:** 1x NVIDIA A10
- **VRAM:** 24 GB
- **Price:** ~$0.75/hr

Best For:

- Small models (7-8B parameters)
- Quantized 32B models (INT4)
- Development and testing
- Cost-conscious workloads

Compatible Models:

- ✅ Llama 3.1 8B (FP16)
- ✅ Mistral 7B (FP16)
- ✅ Qwen 2.5 Coder 32B (INT4)

Regions: Usually good availability in us-west-1, us-east-1
### A6000 (48 GB) - Mid-Range

- **Type:** `gpu_1x_a6000`
- **GPU:** 1x NVIDIA RTX A6000
- **VRAM:** 48 GB
- **Price:** ~$0.80/hr

Best For:

- Medium models (30-70B INT4)
- Small multi-GPU experiments
- Professional workstation workloads

Compatible Models:

- ✅ DeepSeek-R1 70B (INT4)
- ✅ Llama 3.1 70B (INT4)
- ✅ All smaller models

Regions: Limited availability - check `soong available`

Note: Despite the similar name, `gpu_1x_rtx6000` is the 24 GB RTX 6000 (see the comparison table), not a 48 GB card - treat it as an A10-class budget option.
### A100 40 GB - Standard Data Center

- **Type:** `gpu_1x_a100` or `gpu_1x_a100_sxm4`
- **GPU:** 1x NVIDIA A100 (PCIe or SXM4)
- **VRAM:** 40 GB
- **Price:** ~$1.10/hr

Best For:

- 70B models with tight VRAM (INT4)
- Production inference
- Training small models

Compatible Models:

- ✅ DeepSeek-R1 70B (INT4)
- ✅ Llama 3.1 70B (INT4)
- ⚠️ Qwen 2.5 Coder 32B (FP16) - tight fit

Notes:

- SXM4 has a faster interconnect (useful for multi-GPU)
- The PCIe version has slightly lower bandwidth
### A100 80 GB - Recommended ⭐

Best For:

- 70B models at INT4/INT8 (FP16 70B needs ~140 GB - use 2x A100)
- 32B models at full precision (FP16)
- Most flexible option
- Default for soong

Compatible Models:

- ✅ All models in registry
- ✅ Qwen 2.5 Coder 32B (FP16) - recommended
- ✅ DeepSeek-R1 70B (INT4)
- ✅ Code Llama 34B (FP16)

Regions: Best availability across all regions

Why Recommended:

- 2x the VRAM of the A100 40GB for only ~17% more cost
- Runs all models comfortably
- Good availability
### H100 PCIe (80 GB) - Latest Generation

- **Type:** `gpu_1x_h100_pcie`
- **GPU:** 1x NVIDIA H100 PCIe
- **VRAM:** 80 GB
- **Price:** ~$2.49/hr

Best For:

- Faster inference (2-3x vs A100)
- Production with tight latency requirements
- Latest Transformer Engine features

Compatible Models:

- ✅ Same as A100 80GB
- ⚡ 2-3x faster inference

Notes:

- Worth the premium for production workloads
- Not necessary for development
### H100 SXM5 (80 GB) - Maximum Performance

- **Type:** `gpu_1x_h100_sxm5`
- **GPU:** 1x NVIDIA H100 SXM5
- **VRAM:** 80 GB
- **Price:** ~$3.29/hr
- **CPU:** 52 vCPUs
- **RAM:** 1 TB

Best For:

- Maximum single-GPU performance
- Large batch sizes
- NVLink for multi-GPU scaling

Compatible Models:

- ✅ Same as A100 80GB
- ⚡ 3-4x faster than A100

Notes:

- SXM5 has a faster interconnect than PCIe
- 2× more CPU cores and RAM than the H100 PCIe
### GH200 (96 GB) - Grace Hopper Superchip ⭐

- **Type:** `gpu_1x_gh200`
- **GPU:** 1x NVIDIA GH200 Grace Hopper
- **VRAM:** 96 GB HBM3
- **Price:** ~$1.49/hr
- **CPU:** 72 ARM Neoverse cores
- **RAM:** 480 GB

Best For:

- Large models that need >80 GB VRAM
- Excellent price/VRAM ratio
- ARM-based workloads

Compatible Models:

- ✅ All 70B models quantized (INT4/INT8); FP16 70B (~140 GB) exceeds the HBM and relies on unified-memory offload
- ✅ Qwen 2.5 Coder 32B (FP16) with room to spare
- ✅ Models up to ~90 GB VRAM requirement

Notes:

- Uses an ARM CPU (Grace), not x86 - see the check below
- Unified memory architecture
- Best value for VRAM-hungry workloads
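Because the host CPU is ARM, x86-only wheels and binaries won't install. A quick way to confirm what you're running on (standard-library Python, nothing soong-specific):

```python
import platform

# Prints "aarch64" on GH200 (Grace), "x86_64" on the other instance types;
# packages without aarch64 wheels will build from source or fail to install.
print(platform.machine())
```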
### B200 Blackwell (180 GB) - Next Generation

- **Type:** `gpu_1x_b200_sxm6`
- **GPU:** 1x NVIDIA B200 Blackwell
- **VRAM:** 180 GB HBM3e
- **Price:** ~$5.29/hr
- **CPU:** 72 vCPUs
- **RAM:** 480 GB

Best For:

- Largest single-GPU VRAM
- Models up to ~175B at INT8 (FP16 at that size needs multi-GPU)
- Future-proof workloads

Compatible Models:

- ✅ All models up to ~170 GB VRAM
- ✅ 70B FP16 with massive headroom
- ✅ 175B INT8 models

Notes:

- Latest Blackwell architecture
- 2× faster than H100 for some workloads
- Highest single-GPU VRAM available
## Multi-GPU Instances
> **Multi-GPU Support:** Multi-GPU instances require a distributed inference setup (e.g., vLLM, Ray). This is not currently automated in soong - see the sketch below for a manual starting point.
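A minimal manual setup with vLLM's tensor parallelism might look like this (a sketch, not soong functionality; the model name is an example and must be downloadable on the instance):

```python
from vllm import LLM, SamplingParams

# Shard one model across both GPUs of a 2x instance;
# tensor_parallel_size must match the GPU count.
llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # example model
    tensor_parallel_size=2,
)

outputs = llm.generate(
    ["Write a haiku about GPUs."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```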
### 2x A100 (80 GB Total)

- **Type:** `gpu_2x_a100`
- **Price:** ~$2.58/hr

When to Use:

- Models that need 60-80 GB total
- Training with data parallelism

Note: For inference this is not cheaper than a single A100 80GB - prefer the single card unless you specifically need two GPUs.
### 4x A100 (160 GB Total)

- **Type:** `gpu_4x_a100`
- **Price:** ~$5.16/hr

When to Use:

- 70B models at FP16 precision (~140 GB)
- Large 175B+ models with quantization
- Multi-GPU training
### 8x A100 (320 GB Total)

- **Type:** `gpu_8x_a100` (8×40 GB) or `gpu_8x_a100_80gb_sxm4` (8×80 GB, 640 GB total)
- **Price:** ~$10.32/hr (40 GB) or ~$14.32/hr (80 GB)

When to Use:

- 175B models at FP16 (~350 GB - needs the 80 GB variant)
- Large-scale training
- Multi-user inference serving
### 8x H100 (640 GB Total)

- **Type:** `gpu_8x_h100_sxm5`
- **GPUs:** 8x H100 SXM5 (80 GB each)
- **Total VRAM:** 640 GB
- **Price:** ~$23.92/hr
- **CPU:** 416 vCPUs
- **RAM:** 7.8 TB

When to Use:

- Largest models (400B+)
- High-throughput production serving
- Multi-GPU training at scale
### 8x B200 Blackwell (1440 GB Total)

- **Type:** `gpu_8x_b200_sxm6`
- **GPUs:** 8x B200 SXM6 (180 GB each)
- **Total VRAM:** 1440 GB
- **Price:** ~$39.92/hr
- **CPU:** 576 vCPUs
- **RAM:** 3.8 TB

When to Use:

- The largest models (1T+ parameters, quantized)
- Maximum VRAM capacity
- Cutting-edge Blackwell architecture
## GPU Selection Guide
### By Model Size

| Model Size | Recommended GPU | Alternative |
|---|---|---|
| 7-8B FP16 | A10 (24 GB) | A6000 (48 GB) |
| 32B INT4 | A10 (24 GB) | A6000 (48 GB) |
| 32B FP16 | A100 (80 GB) | H100 (80 GB) |
| 70B INT4 | A100 (80 GB) | A6000 (48 GB) |
| 70B FP16 | 2x A100 80GB (160 GB total) | B200 (180 GB) |
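These recommendations follow from a rough weights-only estimate: parameters × bytes per parameter, plus overhead for the KV cache and activations. A minimal sketch (the flat 1.2 overhead factor is an assumption for illustration, not a soong constant):

```python
BYTES_PER_PARAM = {"FP32": 4.0, "FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

def estimate_vram_gb(params_billions: float, precision: str,
                     overhead: float = 1.2) -> float:
    """Weights times precision width, padded for KV cache/activations."""
    return params_billions * BYTES_PER_PARAM[precision] * overhead

# 70B FP16 -> ~168 GB (hence 2x A100 80GB); 70B INT4 -> ~42 GB (one A100 80GB)
for precision in ("FP16", "INT4"):
    print(precision, round(estimate_vram_gb(70, precision)), "GB")
```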
### By Budget
| Budget/hr | GPU Type | Models |
|---|---|---|
| < $1 | A10 (24 GB) | Small models, INT4 quantized |
| $1-2 | A100 80GB | All common models |
| $2-5 | H100 or 2-4x A100 | Large models, fast inference |
| $5+ | 8x A100/H100 | Multi-GPU production |
### By Use Case
| Use Case | Recommended | Why |
|---|---|---|
| Development/Testing | A10 (24 GB) | Cheapest, fast iteration |
| Production Inference | A100 80GB | Best price/performance |
| Low Latency | H100 PCIe | 2-3x faster inference |
| Training | Multi-GPU A100 | NVLink, large batches |
| Maximum Performance | H100 SXM5 | Latest tech, fastest |
## Availability
GPU availability varies by region and time. Check current availability:
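```bash
# Lists instance types with current per-region availability
soong available
```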
### Typical Availability (by region)

| Region | A10 | A100 80GB | H100 |
|---|---|---|---|
| us-west-1 | ✅ High | ✅ High | ⚠️ Limited |
| us-east-1 | ✅ High | ✅ High | ⚠️ Limited |
| us-south-1 | ✅ Medium | ✅ Medium | ❌ Rare |
| europe-central-1 | ⚠️ Limited | ✅ Medium | ❌ Rare |
> **Availability Strategy**
>
> - A100 80GB has the best availability across regions
> - H100s are often scarce - check multiple regions
> - A10 is usually available but may have waitlists
> - Multi-GPU instances (4x, 8x) often require scheduling
## Cost Examples
### 4-Hour Coding Session
| GPU | Cost | Suitable Models |
|---|---|---|
| A10 | $2.40 | Llama 8B, Qwen 32B INT4 |
| A100 80GB | $5.16 | All models |
| H100 PCIe | $7.96 | All models, faster |
### Full Day (8 hours)
| GPU | Cost | Suitable Models |
|---|---|---|
| A10 | $4.80 | Small models |
| A100 80GB | $10.32 | All models |
| H100 PCIe | $15.92 | All models, production |
### Weekly Development (40 hours)
| GPU | Cost | Suitable Models |
|---|---|---|
| A10 | $24 | Budget coding |
| A100 80GB | $51.60 | Professional use |
| H100 PCIe | $79.60 | High performance |
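Every figure above is just hours × hourly rate; a tiny helper makes it easy to budget other session lengths (the rates below are the approximate figures these tables use, not live prices):

```python
# Approximate hourly rates used in the tables above - check live pricing.
RATES = {"a10": 0.60, "a100_80gb": 1.29, "h100_pcie": 1.99}

def session_cost(gpu: str, hours: float) -> float:
    """Estimated cost of keeping one instance running for `hours`."""
    return round(RATES[gpu] * hours, 2)

print(session_cost("a100_80gb", 40))  # 51.6 - matches the weekly table
```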
## GPU Specifications Deep Dive
### Compute Architecture
| GPU | Architecture | CUDA Cores | Tensor Cores | TDP |
|---|---|---|---|---|
| A10 | Ampere | 9,216 | 288 (3rd gen) | 150W |
| A6000 | Ampere | 10,752 | 336 (3rd gen) | 300W |
| RTX 6000 Ada | Ada Lovelace | 18,176 | 568 (4th gen) | 300W |
| A100 | Ampere | 6,912 | 432 (3rd gen) | 400W |
| H100 | Hopper | 16,896 | 528 (4th gen) | 700W |
### Memory Bandwidth
| GPU | Memory Type | Bandwidth | ECC |
|---|---|---|---|
| A10 | GDDR6 | 600 GB/s | Yes |
| A6000 | GDDR6 | 768 GB/s | Yes |
| RTX 6000 Ada | GDDR6 | 960 GB/s | Yes |
| A100 40GB | HBM2e | 1,555 GB/s | Yes |
| A100 80GB | HBM2e | 2,039 GB/s | Yes |
| H100 | HBM3 | 3,350 GB/s | Yes |
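Single-stream LLM decoding is usually memory-bandwidth-bound: each generated token re-reads the model weights, so tokens/sec is bounded by bandwidth ÷ weight bytes. A back-of-envelope sketch (real throughput lands below this ceiling because of KV-cache reads and kernel overhead):

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, params_b: float,
                         bytes_per_param: float) -> float:
    """Upper bound on decode speed: each token streams all weights from HBM once."""
    return bandwidth_gb_s / (params_b * bytes_per_param)

# 70B INT4 (~35 GB of weights): A100 80GB vs H100
print(round(decode_ceiling_tok_s(2039, 70, 0.5)))  # ~58 tok/s ceiling
print(round(decode_ceiling_tok_s(3350, 70, 0.5)))  # ~96 tok/s ceiling
```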
### Inference Performance (relative)
| GPU | FP16 | INT8 | INT4 |
|---|---|---|---|
| A10 | 1.0× | 1.0× | 1.0× |
| A100 | 1.5× | 2.0× | - |
| H100 | 3.0× | 4.0× | 6.0× |
*Performance relative to an A10 baseline.*
## Multi-GPU Interconnects
### PCIe (Standard)
- Bandwidth: 64 GB/s (PCIe 4.0 x16)
- Use case: Single GPU or CPU-bound workloads
- GPUs: A10, A6000, RTX 6000, H100 PCIe
### NVLink (A100 SXM4)
- Bandwidth: 600 GB/s (12 NVLink lanes)
- Use case: Multi-GPU training, large models
- GPUs: A100 SXM4 in multi-GPU configs
### NVLink 4 (H100 SXM5)
- Bandwidth: 900 GB/s (18 NVLink lanes)
- Use case: Highest multi-GPU performance
- GPUs: H100 SXM5 in multi-GPU configs
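To see why the interconnect matters for tensor parallelism, compare how long an idealized 1 GB synchronization step takes over each link (the payload size is an arbitrary example; latency and protocol overhead are ignored):

```python
INTERCONNECT_GB_S = {"pcie_4_x16": 64, "nvlink_a100": 600, "nvlink4_h100": 900}

def transfer_ms(payload_gb: float, link: str) -> float:
    """Idealized transfer time over one link, ignoring latency."""
    return payload_gb / INTERCONNECT_GB_S[link] * 1000

for link in INTERCONNECT_GB_S:
    print(f"{link}: {transfer_ms(1.0, link):.2f} ms per GB")
# PCIe ~15.6 ms vs NVLink 4 ~1.1 ms - roughly a 14x gap per sync step
```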
## Choosing the Right GPU
### Decision Tree

```text
Need model > 70B?
├─ Yes → Multi-GPU or wait for larger models
└─ No
   ├─ Need FP16 precision for 32B model?
   │  └─ Yes → A100 80GB
   └─ No
      ├─ Budget < $1/hr?
      │  └─ Yes → A10 (use INT4 models)
      └─ No
         ├─ Need fastest inference?
         │  └─ Yes → H100
         └─ No → A100 80GB (best value)
```
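The same logic as a small function, if you prefer it executable (an illustration that mirrors the tree above; the returned type names are suggestions, not soong output):

```python
def choose_gpu(model_size_b: float, fp16_32b: bool = False,
               budget_per_hr: float = 2.0, need_fastest: bool = False) -> str:
    """Walk the decision tree above and suggest an instance type."""
    if model_size_b > 70:
        return "multi-GPU (e.g. gpu_4x_a100 or gpu_8x_h100_sxm5)"
    if fp16_32b:
        return "A100 80GB"
    if budget_per_hr < 1.0:
        return "gpu_1x_a10 (use INT4 models)"
    if need_fastest:
        return "gpu_1x_h100_pcie"
    return "A100 80GB (best value)"

print(choose_gpu(32, fp16_32b=True))  # -> A100 80GB
```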
## Common Mistakes

- ❌ **Using A100 40GB for Qwen 32B FP16** - too tight, may OOM; use A100 80GB instead
- ❌ **Using H100 for development** - 2× the cost for minimal benefit in dev; use A100 80GB instead
- ❌ **Using 2x A100 for 70B INT4** - doesn't need multi-GPU; use 1x A100 80GB instead
- ❌ **Using A10 for DeepSeek-R1 FP16** - won't fit (needs 140+ GB); use A100 80GB with INT4 instead
## Frequently Asked Questions
### Can I use multiple GPUs for one model?
Yes, but requires setup:
- vLLM supports tensor parallelism
- Ray supports pipeline parallelism
- Not currently automated in soong
### What's the difference between PCIe and SXM?
- PCIe: Plugs into motherboard slot, lower bandwidth
- SXM: Direct socket, higher bandwidth, NVLink support
- For single GPU, minimal difference
- For multi-GPU, SXM much faster
### Should I get H100 over A100?

Get H100 if:

- Production workload with a tight latency SLA
- You need maximum throughput
- Budget allows

Get A100 if:

- Development/testing
- Cost-conscious
- You don't need the H100's 2-3x inference speedup
### Why is A100 80GB only 17% more than 40GB?

Lambda Labs prices by total cost of ownership. The 80GB variant has:

- 2× VRAM
- 31% higher memory bandwidth
- Better availability
It's the best value in their lineup.
### What about RTX 4090 or consumer GPUs?

Lambda Labs only offers professional/data center GPUs, which bring:

- Better reliability (ECC memory)
- Official driver support
- Better multi-GPU scaling
- Data center warranties
## See Also

- Model Registry - which models fit on which GPUs
- CLI Commands - the `available` and `start` commands
- Configuration - setting the default GPU type
- Lambda Labs Instance Types - official specs