GPU Mining

High-throughput PoT-O mining using NVIDIA and AMD GPUs.

Overview

GPU mining significantly accelerates tensor computations through massively parallel processing. Modern GPUs can solve PoT-O challenges 10-100x faster than CPUs, making GPU mining the preferred method for serious miners.

Supported Hardware

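NVIDIA GPUs

| GPU | VRAM | Throughput | Power | Notes |
| --- | --- | --- | --- | --- |
| RTX 4090 | 24 GB | ~50 MH/s | 450 W | Best performance/cost ratio |
| RTX 4080 | 16 GB | ~35 MH/s | 320 W | Excellent balance |
| RTX 4070 Ti | 12 GB | ~28 MH/s | 285 W | Mid-range sweet spot |
| RTX 3090 | 24 GB | ~35 MH/s | 420 W | Previous gen, still viable |
| A100 | 40 GB | ~100 MH/s | 250 W | Data center grade |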

AMD GPUs

| GPU | VRAM | Throughput | Power | Notes |
| --- | --- | --- | --- | --- |
| RX 7900 XTX | 24 GB | ~40 MH/s | 420 W | Best AMD option |
| RX 7900 XT | 20 GB | ~35 MH/s | 380 W | Good alternative |
| RX 6900 XT | 16 GB | ~25 MH/s | 405 W | Previous gen |
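
Dividing throughput by power gives a rough efficiency figure for comparing cards. The sketch below uses only the approximate values from the two tables above; the `GPUS` dictionary and both helper functions are illustrative names and not part of the miner.

```python
# Approximate (throughput in MH/s, power in W) pairs copied from the tables above.
GPUS = {
    "RTX 4090": (50, 450),
    "RTX 4080": (35, 320),
    "RTX 4070 Ti": (28, 285),
    "RTX 3090": (35, 420),
    "A100": (100, 250),
    "RX 7900 XTX": (40, 420),
    "RX 7900 XT": (35, 380),
    "RX 6900 XT": (25, 405),
}


def efficiency(throughput_mhs: float, power_w: float) -> float:
    """Return throughput per watt in MH/s/W."""
    return throughput_mhs / power_w


def rank_by_efficiency(gpus: dict) -> list:
    """Return (name, MH/s/W) pairs sorted from most to least efficient."""
    scored = [(name, efficiency(t, p)) for name, (t, p) in gpus.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, eff in rank_by_efficiency(GPUS):
        print(f"{name:12s} {eff:.3f} MH/s/W")
```

Because the table values are approximate, treat the ranking as a starting point and confirm it with a benchmark on your own hardware.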

Getting Started

1. Install Dependencies

NVIDIA (CUDA):

bash
# Install CUDA Toolkit 12.0+
curl -fLO https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-repo-ubuntu2004_12.0.0_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004_12.0.0_amd64.deb
sudo apt-get update && sudo apt-get install -y cuda-toolkit-12-0

AMD (HIP):

bash
# Install HIP runtime
wget -q https://repo.radeon.com/rocm/rocm.gpg.key -O - | sudo apt-key add -
sudo apt-get update && sudo apt-get install -y rocm-dkms hip-runtime-amd

2. Build GPU Miner

Clone the PoT-O validator with GPU support:

bash
git clone --branch gpu-support https://github.com/TribeWarez/pot-o-validator.git
cd pot-o-validator

# Build with GPU support
cargo build --release --features gpu-mining,cuda
# or for AMD:
cargo build --release --features gpu-mining,hip

3. Configure GPU Settings

Create gpu-miner.toml:

toml
[mining]
validator_url = "https://pot.rpc.gateway.tribewarez.com"
miner_pubkey = "your_solana_pubkey"

[gpu]
device_id = 0              # GPU index (0 for first GPU)
compute_capability = "8.9" # RTX 4090 is 8.9, see docs for your GPU
max_block_size = 512       # Blocks per grid
shared_memory = 48000      # Bytes of shared memory per block
cache_mode = "L1"          # Prefer L1 cache over shared memory

[performance]
batch_size = 100           # Challenges to batch process
num_streams = 4            # CUDA streams for pipelining
profile_interval = 60      # Profile GPU every 60s

4. Start Mining

bash
./target/release/pot-o-gpu-miner --config gpu-miner.toml

Monitor output:

[2026-05-19 14:32:45] GPU 0: RTX 4090 (24GB) ready
[2026-05-19 14:32:46] Fetching challenges...
[2026-05-19 14:32:47] Batch 1: 100 challenges received
[2026-05-19 14:32:52] ✓ 87 valid proofs found, MH/s: 48.3
[2026-05-19 14:32:57] ✓ 92 valid proofs found, MH/s: 51.7
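
To track throughput over time, the proof lines can be pulled out of the log with a regular expression. This sketch assumes the log format is exactly as shown in the sample above; the `summarize` function and `PROOF_LINE` pattern are illustrative names, not part of the miner.

```python
import re

# Matches lines such as:
# "[2026-05-19 14:32:52] ✓ 87 valid proofs found, MH/s: 48.3"
PROOF_LINE = re.compile(r"(\d+) valid proofs found, MH/s: ([\d.]+)")

SAMPLE = """\
[2026-05-19 14:32:47] Batch 1: 100 challenges received
[2026-05-19 14:32:52] ✓ 87 valid proofs found, MH/s: 48.3
[2026-05-19 14:32:57] ✓ 92 valid proofs found, MH/s: 51.7
"""


def summarize(log_text: str) -> dict:
    """Return batch count, total proofs, and mean MH/s over all proof lines."""
    matches = PROOF_LINE.findall(log_text)
    if not matches:
        return {"batches": 0, "proofs": 0, "mean_mhs": 0.0}
    proofs = sum(int(count) for count, _ in matches)
    mean_mhs = sum(float(rate) for _, rate in matches) / len(matches)
    return {"batches": len(matches), "proofs": proofs, "mean_mhs": mean_mhs}


if __name__ == "__main__":
    print(summarize(SAMPLE))
```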

Optimization Guide

Memory Tuning

GPU memory bandwidth is critical. Optimize allocation:

toml
[gpu.memory]
# Allocate about 75% of GPU VRAM for tensor buffers, leaving headroom for intermediates
tensor_cache_mb = 18000  # For 24GB GPU: 24000 * 0.75
intermediate_buffers = 3 # Number of intermediate result buffers
persistent_kernels = true # Keep kernels compiled in memory
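
The cache size is simple arithmetic on the card's VRAM. In the sketch below, a fraction of 0.75 reproduces the 18000 value used in the example for a 24 GB card. The `tensor_cache_mb` function is a hypothetical helper, and it assumes 1 GB is counted as 1000 MB, as the example comment does.

```python
def tensor_cache_mb(vram_gb: float, fraction: float = 0.75) -> int:
    """Return a tensor_cache_mb value as a fraction of VRAM (1 GB taken as 1000 MB)."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    return int(round(vram_gb * 1000 * fraction))


if __name__ == "__main__":
    for vram in (12, 16, 24, 40):
        print(f"{vram} GB card -> tensor_cache_mb = {tensor_cache_mb(vram)}")
```

If out-of-memory errors appear, lowering the fraction is the first thing to try.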

Kernel Optimization

Choose the right compute kernel for your tensor type:

toml
# Set exactly one kernel_type; the three values below are alternatives.
# For small tensors (< 4K elements)
kernel_type = "matmul_small"

# For large tensors (> 64K elements)
kernel_type = "matmul_large"

# For convolution-heavy challenges
kernel_type = "conv2d_strided"
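
The selection rule above can be written as a small function. This is a sketch under two assumptions: "4K" and "64K" are read as 4096 and 65536 elements, and the range in between, for which the guide gives no recommendation, returns `None`. The `pick_kernel` name is illustrative and not part of the miner.

```python
from typing import Optional

SMALL_LIMIT = 4 * 1024   # assumption: "4K" means 4096 elements
LARGE_LIMIT = 64 * 1024  # assumption: "64K" means 65536 elements


def pick_kernel(num_elements: int, conv_heavy: bool = False) -> Optional[str]:
    """Return a kernel_type per the guidance above, or None for the mid range."""
    if conv_heavy:
        return "conv2d_strided"
    if num_elements < SMALL_LIMIT:
        return "matmul_small"
    if num_elements > LARGE_LIMIT:
        return "matmul_large"
    # The guide gives no recommendation between 4K and 64K elements.
    return None


if __name__ == "__main__":
    for size in (1_000, 10_000, 100_000):
        print(size, "->", pick_kernel(size))
```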

Stream Pipeline

Use CUDA streams to overlap computation and memory transfer:

toml
[gpu.pipeline]
num_streams = 8            # More streams allow more overlap; gains taper off
h2d_stream = 0             # Host-to-device transfers
compute_streams = [1,2,3,4,5,6,7] # Compute on different streams
d2h_stream = 0             # Device-to-host transfers (shares stream 0 with H2D, overlaps with compute)
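
To see why overlapping transfers with compute helps, compare a fully serial schedule with an idealized pipeline. The model below is a back-of-the-envelope sketch, not a measurement. It assumes every batch has the same stage times and that host-to-device copies, kernels, and device-to-host copies can all run concurrently, which on real hardware depends on available copy engines and pinned host memory. The stage times in the example are hypothetical.

```python
def serial_ms(n: int, h2d: float, compute: float, d2h: float) -> float:
    """Total time when each batch copies in, computes, and copies out in sequence."""
    return n * (h2d + compute + d2h)


def pipelined_ms(n: int, h2d: float, compute: float, d2h: float) -> float:
    """Idealized total time when stages of different batches overlap fully.

    The first batch pays the full latency; each later batch adds only the
    duration of the slowest stage.
    """
    if n <= 0:
        return 0.0
    return (h2d + compute + d2h) + (n - 1) * max(h2d, compute, d2h)


if __name__ == "__main__":
    # Hypothetical per-batch stage times in milliseconds.
    n, h2d, compute, d2h = 10, 2.0, 10.0, 1.0
    print("serial:   ", serial_ms(n, h2d, compute, d2h), "ms")
    print("pipelined:", pipelined_ms(n, h2d, compute, d2h), "ms")
```

In this model the benefit is bounded by the slowest stage, so adding streams beyond what is needed to keep that stage busy yields no further gain.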

Power Efficiency

Balance performance and power consumption:

toml
[gpu.power]
max_power_w = 350          # Limit power draw
dynamic_clock = true       # Reduce clock during idle
fan_curve = "aggressive"   # Maximize cooling
target_temp_c = 75         # Throttle if exceeding this

Multi-GPU Mining

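Scale across multiple GPUs by adding num_gpus to the [mining] section and replacing the single [gpu] table with one [[gpu]] entry per device (TOML does not allow gpu to be both a table and an array of tables in the same file):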

toml
[mining]
num_gpus = 4

[[gpu]]
device_id = 0
max_block_size = 512

[[gpu]]
device_id = 1
max_block_size = 512

[[gpu]]
device_id = 2
max_block_size = 512

[[gpu]]
device_id = 3
max_block_size = 512

Run with load balancer:

bash
./target/release/pot-o-gpu-load-balancer --config gpu-miner.toml
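
How the load balancer divides work is not documented here. One plausible policy, shown purely as an illustration, is to split each batch in proportion to a per-GPU weight such as benchmarked throughput. The `split_batch` function is hypothetical and uses largest-remainder rounding so the shares always add up to the batch size.

```python
def split_batch(total: int, weights: list[float]) -> list[int]:
    """Split `total` items across devices in proportion to `weights`."""
    if total < 0 or not weights or any(w <= 0 for w in weights):
        raise ValueError("total must be >= 0 and all weights must be positive")
    scale = total / sum(weights)
    exact = [w * scale for w in weights]
    shares = [int(x) for x in exact]  # floor of each exact share
    leftover = total - sum(shares)
    # Give the remaining items to the devices with the largest fractional parts.
    order = sorted(
        range(len(weights)),
        key=lambda i: exact[i] - shares[i],
        reverse=True,
    )
    for i in order[:leftover]:
        shares[i] += 1
    return shares


if __name__ == "__main__":
    # Four identical GPUs share a batch of 100 challenges evenly.
    print(split_batch(100, [1, 1, 1, 1]))
    # Mixed cards weighted by approximate MH/s (assumed weights).
    print(split_batch(100, [50, 35, 28, 35]))
```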

Performance Benchmarking

Benchmark your GPU setup:

bash
pot-o-benchmark --gpu 0 --duration 60 --batch-size 100

Expected output:

GPU 0: RTX 4090
  Throughput: 48.2 MH/s
  Latency (p50): 52.3 ms
  Latency (p99): 85.1 ms
  Memory: 18.2 / 24.0 GB (75.8%)
  Power: 385W
  Efficiency: 0.125 MH/s/W (full potential)

Troubleshooting

Low Throughput

  • Check GPU clock speeds: nvidia-smi dmon
  • Verify the GPU isn't being throttled by temperature: nvidia-smi -q -d PERFORMANCE
  • Increase batch_size in the config
  • Check PCIe bandwidth, which can become a bottleneck with multiple GPUs
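
These checks can be scripted with `nvidia-smi`'s CSV query mode. The sketch below reads temperature, SM clock, and power draw per GPU and flags cards above a temperature limit. The function names are illustrative, the 75 °C default mirrors the `target_temp_c` example in this guide, and the parser assumes every field is numeric, which may not hold on cards that report a field as unsupported.

```python
import subprocess

QUERY = "index,temperature.gpu,clocks.sm,power.draw"


def parse_smi_csv(text: str) -> list[dict]:
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` output."""
    rows = []
    for line in text.strip().splitlines():
        index, temp_c, sm_mhz, power_w = [f.strip() for f in line.split(",")]
        rows.append({
            "index": int(index),
            "temp_c": float(temp_c),
            "sm_mhz": float(sm_mhz),
            "power_w": float(power_w),
        })
    return rows


def read_gpus() -> list[dict]:
    """Run nvidia-smi and return one dict per GPU (requires the NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_smi_csv(out)


def too_hot(rows: list[dict], limit_c: float = 75.0) -> list[int]:
    """Return the indices of GPUs whose temperature exceeds limit_c."""
    return [row["index"] for row in rows if row["temp_c"] > limit_c]


if __name__ == "__main__":
    gpus = read_gpus()
    print("GPUs above limit:", too_hot(gpus))
```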

Out of Memory Errors

  • Reduce tensor_cache_mb in config
  • Decrease batch size
  • Check other GPU processes: nvidia-smi

Kernel Launch Failures

  • Verify CUDA compute capability matches config
  • Check GPU driver version: nvidia-smi
  • Update drivers: sudo apt-get update && sudo apt-get install nvidia-driver-535

Power Throttling

  • Ensure adequate PSU capacity (a 24GB GPU needs a 750W+ PSU; check the card vendor's recommendation)
  • Check power cable connections
  • Reduce max_power_w limit in config

Profiling

Use NVIDIA Nsight Systems (nsys) to profile kernel execution:

bash
nsys profile -o gpu-mining ./target/release/pot-o-gpu-miner --config gpu-miner.toml

Open the resulting report in the Nsight Systems GUI to analyze the timeline:

bash
nsys-ui gpu-mining.nsys-rep
