GPU Mining
High-throughput PoT-O mining using NVIDIA and AMD GPUs.
Overview
GPU mining significantly accelerates tensor computations through massively parallel processing. Modern GPUs can solve PoT-O challenges 10-100x faster than CPUs, making GPU mining the preferred method for serious miners.
Supported Hardware
NVIDIA GPUs (Recommended)
| GPU | VRAM | Throughput | Power | Notes |
|---|---|---|---|---|
| RTX 4090 | 24 GB | ~50 MH/s | 450W | Best performance/cost ratio |
| RTX 4080 | 16 GB | ~35 MH/s | 320W | Excellent balance |
| RTX 4070 Ti | 12 GB | ~28 MH/s | 285W | Mid-range sweet spot |
| RTX 3090 | 24 GB | ~35 MH/s | 420W | Previous gen, still viable |
| A100 | 40 GB | ~100 MH/s | 250W | Data center grade |
AMD GPUs
| GPU | VRAM | Throughput | Power | Notes |
|---|---|---|---|---|
| RX 7900 XTX | 24 GB | ~40 MH/s | 420W | Best AMD option |
| RX 7900 XT | 20 GB | ~35 MH/s | 380W | Good alternative |
| RX 6900 XT | 16 GB | ~25 MH/s | 405W | Previous gen |
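The Throughput and Power columns can be combined into a rough efficiency figure for comparing cards. The Python sketch below copies the approximate values from the two tables and ranks the GPUs by kH/s per watt. The function names are illustrative, and the result is only as accurate as the table figures.

```python
# Approximate (throughput in MH/s, power in W) pairs copied from the tables above.
GPUS = {
    "RTX 4090": (50, 450),
    "RTX 4080": (35, 320),
    "RTX 4070 Ti": (28, 285),
    "RTX 3090": (35, 420),
    "A100": (100, 250),
    "RX 7900 XTX": (40, 420),
    "RX 7900 XT": (35, 380),
    "RX 6900 XT": (25, 405),
}


def efficiency_khs_per_watt(throughput_mhs, power_w):
    """Throughput per watt in kH/s/W (1 MH/s = 1000 kH/s)."""
    return throughput_mhs * 1000 / power_w


def rank_by_efficiency(gpus):
    """Return GPU names sorted from most to least efficient."""
    return sorted(
        gpus,
        key=lambda name: efficiency_khs_per_watt(*gpus[name]),
        reverse=True,
    )


for name in rank_by_efficiency(GPUS):
    mhs, watts = GPUS[name]
    print(f"{name:12s} {efficiency_khs_per_watt(mhs, watts):6.1f} kH/s/W")
```

On these figures the A100 is well ahead on efficiency, while the RTX 4090 and RTX 4080 land close to each other.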
Getting Started
1. Install Dependencies
NVIDIA (CUDA):
bash
# Install CUDA Toolkit 12.0+
curl -fLO https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-repo-ubuntu2004_12.0.0_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004_12.0.0_amd64.deb
sudo apt-get update && sudo apt-get install -y cuda-toolkit-12-0
AMD (HIP):
bash
# Install HIP runtime
wget -q https://repo.radeon.com/rocm/rocm.gpg.key -O - | sudo apt-key add -
sudo apt-get update && sudo apt-get install -y rocm-dkms hip-runtime-amd
2. Build GPU Miner
Clone the PoT-O validator with GPU support:
bash
git clone --branch gpu-support https://github.com/TribeWarez/pot-o-validator.git
cd pot-o-validator
# Build with GPU support
cargo build --release --features gpu-mining,cuda
# or for AMD:
cargo build --release --features gpu-mining,hip
3. Configure GPU Settings
Create gpu-miner.toml:
toml
[mining]
validator_url = "https://pot.rpc.gateway.tribewarez.com"
miner_pubkey = "your_solana_pubkey"
[gpu]
device_id = 0 # GPU index (0 for first GPU)
compute_capability = "8.9" # RTX 4090 is 8.9, see docs for your GPU
max_block_size = 512 # Blocks per grid
shared_memory = 48000 # Bytes of shared memory per block
cache_mode = "L1" # Prefer L1 cache over shared memory
[performance]
batch_size = 100 # Challenges to batch process
num_streams = 4 # CUDA streams for pipelining
profile_interval = 60 # Profile GPU every 60s
4. Start Mining
bash
./target/release/pot-o-gpu-miner --config gpu-miner.toml
Monitor output:
[2026-05-19 14:32:45] GPU 0: RTX 4090 (24GB) ready
[2026-05-19 14:32:46] Fetching challenges...
[2026-05-19 14:32:47] Batch 1: 100 challenges received
[2026-05-19 14:32:52] ✓ 87 valid proofs found, MH/s: 48.3
[2026-05-19 14:32:57] ✓ 92 valid proofs found, MH/s: 51.7
Optimization Guide
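Before changing any of the settings below, it helps to record a baseline so that each change can be compared against it. The following Python sketch parses miner log lines and reports the total number of valid proofs and the mean MH/s. It assumes the log format matches the sample output shown under Start Mining; the `summarize` helper is illustrative and not part of the miner.

```python
import re
from statistics import mean

# Matches lines such as:
# [2026-05-19 14:32:52] ✓ 87 valid proofs found, MH/s: 48.3
PROOF_LINE = re.compile(r"(\d+) valid proofs found, MH/s: (\d+(?:\.\d+)?)")


def summarize(log_lines):
    """Return (total valid proofs, mean MH/s) over all matching lines."""
    proofs, rates = [], []
    for line in log_lines:
        match = PROOF_LINE.search(line)
        if match:
            proofs.append(int(match.group(1)))
            rates.append(float(match.group(2)))
    if not rates:
        return 0, 0.0
    return sum(proofs), mean(rates)


sample = [
    "[2026-05-19 14:32:46] Fetching challenges...",
    "[2026-05-19 14:32:52] ✓ 87 valid proofs found, MH/s: 48.3",
    "[2026-05-19 14:32:57] ✓ 92 valid proofs found, MH/s: 51.7",
]
total, avg = summarize(sample)
print(f"{total} proofs, {avg:.1f} MH/s average")
```

Running the same summary over a few minutes of log output before and after a configuration change gives a like-for-like comparison.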
Memory Tuning
GPU memory bandwidth is critical. Optimize allocation:
toml
[gpu.memory]
# Allocate 80-90% of GPU VRAM for tensor buffers
tensor_cache_mb = 20400 # For a 24 GB GPU: 24000 * 0.85
intermediate_buffers = 3 # Number of intermediate result buffers
persistent_kernels = true # Keep kernels compiled in memory
Kernel Optimization
Choose the right compute kernel for your tensor type:
toml
# Set exactly one kernel_type; the alternatives are commented out.
# For small tensors (< 4K elements)
kernel_type = "matmul_small"
# For large tensors (> 64K elements)
# kernel_type = "matmul_large"
# For convolution-heavy challenges
# kernel_type = "conv2d_strided"
Stream Pipeline
Use CUDA streams to overlap computation and memory transfer:
toml
[gpu.pipeline]
num_streams = 8 # More streams allow more overlap between transfers and compute
h2d_stream = 0 # Host-to-device transfers
compute_streams = [1,2,3,4,5,6,7] # Compute on different streams
d2h_stream = 0 # Device-to-host transfers (overlaps with compute)
Power Efficiency
Balance performance and power consumption:
toml
[gpu.power]
max_power_w = 350 # Limit power draw
dynamic_clock = true # Reduce clock during idle
fan_curve = "aggressive" # Maximize cooling
target_temp_c = 75 # Throttle if exceeding this
Multi-GPU Mining
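Scale across multiple GPUs by replacing the single [gpu] table with one [[gpu]] entry per device (TOML does not allow [gpu] and [[gpu]] in the same file):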
toml
[mining]
num_gpus = 4
[[gpu]]
device_id = 0
max_block_size = 512
[[gpu]]
device_id = 1
max_block_size = 512
[[gpu]]
device_id = 2
max_block_size = 512
[[gpu]]
device_id = 3
max_block_size = 512
Run with load balancer:
bash
./target/release/pot-o-gpu-load-balancer --config gpu-miner.toml
Performance Benchmarking
Benchmark your GPU setup:
bash
pot-o-benchmark --gpu 0 --duration 60 --batch-size 100
Expected output:
GPU 0: RTX 4090
Throughput: 48.2 MH/s
Latency (p50): 52.3 ms
Latency (p99): 85.1 ms
Memory: 18.2 / 24.0 GB (75.8%)
Power: 385W
Efficiency: 125 kH/s/W (full potential)
Troubleshooting
Low Throughput
- Check GPU clock speeds: nvidia-smi dmon
- Verify the GPU isn't throttled by temperature: nvidia-smi
- Increase batch size in config
- Check PCIe bandwidth: it may be a bottleneck with multiple GPUs
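The clock and temperature checks above can be scripted. The Python sketch below parses one line of `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` output and applies a simple heuristic: the GPU is flagged when it is at or above the target temperature and its SM clock is well below the maximum. The 75 °C default mirrors target_temp_c from the power settings; the 0.8 clock ratio and the helper names are assumptions chosen for illustration.

```python
import subprocess

# Standard nvidia-smi query fields: temperature, current and max SM clock, power draw.
QUERY_FIELDS = "temperature.gpu,clocks.sm,clocks.max.sm,power.draw"


def parse_gpu_line(line):
    """Parse one CSV line produced with --format=csv,noheader,nounits."""
    temp_c, sm_mhz, max_sm_mhz, power_w = (float(v) for v in line.split(","))
    return {
        "temp_c": temp_c,
        "sm_mhz": sm_mhz,
        "max_sm_mhz": max_sm_mhz,
        "power_w": power_w,
    }


def looks_throttled(stats, target_temp_c=75, min_clock_ratio=0.8):
    """Heuristic: hot and running well below the maximum SM clock."""
    too_hot = stats["temp_c"] >= target_temp_c
    slow = stats["sm_mhz"] < min_clock_ratio * stats["max_sm_mhz"]
    return too_hot and slow


def query_gpus():
    """Run nvidia-smi and return one stats dict per GPU (needs the NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [parse_gpu_line(line) for line in out.splitlines() if line.strip()]
```

On a machine with the NVIDIA driver installed, `[looks_throttled(s) for s in query_gpus()]` gives one flag per GPU. Some boards report a field as `[N/A]`, which this minimal parser does not handle.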
Out of Memory Errors
- Reduce tensor_cache_mb in config
- Decrease batch size
- Check other GPU processes: nvidia-smi
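When other processes already hold GPU memory, the 80-90% guideline from Memory Tuning is easier to apply to the VRAM that is actually free. The Python sketch below computes a candidate tensor_cache_mb under that assumption. Applying the percentage to free rather than total VRAM is an interpretation for illustration, not documented miner behavior, and `suggest_tensor_cache_mb` is a hypothetical helper.

```python
def suggest_tensor_cache_mb(total_vram_mb, other_used_mb=0, percent=85):
    """Suggest tensor_cache_mb as a percentage of VRAM not used by other processes."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    free_mb = max(total_vram_mb - other_used_mb, 0)
    # Integer arithmetic avoids floating-point rounding surprises.
    return free_mb * percent // 100


print(suggest_tensor_cache_mb(24000))        # idle 24 GB card
print(suggest_tensor_cache_mb(24000, 2000))  # 2 GB held by another process
```

If out-of-memory errors persist at the suggested value, lowering `percent` toward 80 keeps the result within the same guideline.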
Kernel Launch Failures
- Verify CUDA compute capability matches config
- Check GPU driver version: nvidia-smi
- Update drivers: sudo apt-get install --only-upgrade nvidia-driver-535
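A mismatched compute_capability value is easy to catch with a lookup. The Python sketch below lists the CUDA compute capabilities of the NVIDIA GPUs named in this guide and compares them against the configured value. The helper name is illustrative; on recent drivers the live value can also be read with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`.

```python
# CUDA compute capabilities for the NVIDIA GPUs listed in this guide.
COMPUTE_CAPABILITY = {
    "RTX 4090": "8.9",
    "RTX 4080": "8.9",
    "RTX 4070 Ti": "8.9",
    "RTX 3090": "8.6",
    "A100": "8.0",
}


def check_compute_capability(gpu_name, configured):
    """Return None if the configured value matches, else a message describing the problem."""
    expected = COMPUTE_CAPABILITY.get(gpu_name)
    if expected is None:
        return f"unknown GPU {gpu_name!r}; look up its compute capability"
    if configured != expected:
        return f"{gpu_name} is {expected}, but config has {configured}"
    return None


print(check_compute_capability("RTX 3090", "8.9"))
```

The example call reports a mismatch because the RTX 3090 is an Ampere card (8.6), while 8.9 belongs to the RTX 40 series.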
Power Throttling
- Ensure adequate PSU capacity (24GB GPU needs 750W+ PSU)
- Check power cable connections
- Reduce the max_power_w limit in config
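PSU sizing is simple arithmetic once the per-GPU power limits are known. The Python sketch below sums the GPU limits, adds a base load for the rest of the system, and applies a safety margin. The 150 W base load and 25% headroom are illustrative assumptions rather than values from the miner or from GPU vendors, so vendor PSU recommendations should take precedence.

```python
import math


def recommended_psu_watts(gpu_power_limits_w, system_base_w=150, headroom=0.25):
    """Sum GPU power limits plus a base system load, then add fractional headroom."""
    load_w = sum(gpu_power_limits_w) + system_base_w
    return math.ceil(load_w * (1 + headroom))


print(recommended_psu_watts([450]))      # one GPU at a 450 W limit
print(recommended_psu_watts([350] * 4))  # four GPUs capped at max_power_w = 350
```

Lowering max_power_w reduces the required capacity directly, which is why it appears in the list above as a throttling remedy.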
Profiling
Use NVIDIA Nsight to profile kernel execution:
bash
nsys profile -o gpu-mining ./target/release/pot-o-gpu-miner --config gpu-miner.toml
Analyze timeline:
bash
nsys-ui gpu-mining.nsys-rep
Next Steps
- CPU Mining - CPU-based mining for lower hardware requirements
- ESP32 Mining - Embedded IoT mining
- Pool Mining - Combine with others for consistent rewards
- Rewards - Understanding your mining payouts