Long Context Test Plan (16K-32K Tokens)
Project: Issue #995 - ModernBERT-base-32k Integration
Required: NVIDIA A100 GPU (40GB+ VRAM)
Overview
This test plan covers validation of ModernBERT-base-32k on long context sequences (16K-32K tokens). These tests cannot be run in the current environment (NVIDIA L4 GPU, 23GB VRAM); they require an A100 GPU with 40GB+ VRAM.
Infrastructure Status: Ready - All tools and test frameworks are prepared
Test Requirements
Hardware Requirements
- GPU: NVIDIA A100 (40GB+ VRAM) - Required
- System RAM: 64GB+ recommended
- CUDA: Version 12.0+
- Driver: an NVIDIA driver recent enough to support CUDA 12.0+
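Before running the suite, it is worth verifying the VRAM requirement programmatically. The sketch below parses the kind of output produced by `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits` (one MiB value per GPU, per line); the sample string is illustrative, not captured from real hardware:

```rust
// Preflight check: does any detected GPU have at least `required_mib` of VRAM?
// In a real test harness, `nvidia_smi_output` would come from invoking
// nvidia-smi via std::process::Command; here a sample string stands in.
fn has_enough_vram(nvidia_smi_output: &str, required_mib: u64) -> bool {
    nvidia_smi_output
        .lines()
        .filter_map(|line| line.trim().parse::<u64>().ok())
        .any(|mib| mib >= required_mib)
}

fn main() {
    // An A100 40GB typically reports ~40960 MiB total memory.
    let sample = "40960";
    let ok = has_enough_vram(sample, 40 * 1024);
    println!("A100 VRAM check passed: {ok}"); // prints "A100 VRAM check passed: true"
}
```

An L4 (which reports roughly 23000 MiB) would fail the same check, matching the environment limitation noted in the overview.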
Software Requirements
- benchmark_concurrent.rs - Supports 16K/32K (currently commented out)
- benchmark_performance.rs - Performance profiling tool
- Flash Attention 2 enabled
- All dependencies installed
Test Cases
1. Basic Inference Testing
1.1 Single Request Latency (C=1)
| Context Length | Expected Latency | Success Criteria |
|---|---|---|
| 16384 tokens | < 10s | Latency < 10s |
| 24576 tokens | < 15s | Latency < 15s |
| 32768 tokens | < 20s | Latency < 20s |
Test Steps:
- Load ModernBERT-base-32k model
- Create test sequences of 16K, 24K, 32K tokens
- Measure inference latency for each
- Verify no out-of-memory (OOM) errors
- Document results
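The steps above can be sketched as a small latency harness. Here `run_inference` is a hypothetical stand-in for the actual ModernBERT-base-32k forward pass (the real test would call the model through the inference library or server), and the latency budgets mirror the success criteria table:

```rust
use std::time::{Duration, Instant};

// Hypothetical stand-in for a ModernBERT-base-32k forward pass.
// Replace with the real model call when running on the A100.
fn run_inference(num_tokens: usize) -> usize {
    // Simulate work proportional to sequence length.
    (0..num_tokens).map(|i| i % 7).sum()
}

fn main() {
    // (context length in tokens, latency budget) from the success criteria.
    let cases = [
        (16_384, Duration::from_secs(10)),
        (24_576, Duration::from_secs(15)),
        (32_768, Duration::from_secs(20)),
    ];

    for (tokens, budget) in cases {
        let start = Instant::now();
        let _ = run_inference(tokens);
        let elapsed = start.elapsed();
        let status = if elapsed <= budget { "PASS" } else { "FAIL" };
        println!("{tokens} tokens: {elapsed:?} (budget {budget:?}) -> {status}");
    }
}
```

Wrapping each case in a timing loop like this also makes it easy to emit the per-context-length latency measurements listed under Deliverables.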
Deliverables:
- Latency measurements for each context length
- Memory usage profiles
- Success/failure status