GPU-Hot: The NVIDIA Dashboard Without SSH
Tired of constantly SSH-ing into servers just to check nvidia-smi? You're not alone. Every AI engineer, ML researcher, and DevOps professional knows the pain: you're training a model, rendering frames, or mining crypto, and you need to know what's happening on your GPUs right now. Traditional monitoring is clunky, delayed, and forces you into terminal windows that refresh every two seconds. The lag kills productivity. The context switching breaks your flow. And when you're managing multiple machines? Forget about it—tabs multiply like rabbits.
Enter GPU-Hot, the lightweight, web-based NVIDIA GPU dashboard that changes everything. This self-hosted solution delivers sub-second metric updates through a sleek browser interface, eliminating SSH tunnels forever. Monitor a single workstation or scale to 100+ GPUs across your entire cluster using the same Docker image. No complex setup. No enterprise bloat. Just pure, real-time visibility into utilization, temperature, memory, power draw, and process-level details.
In this deep dive, you'll discover how GPU-Hot transforms your workflow with its FastAPI backend, WebSocket architecture, and intelligent multi-node aggregation. We'll walk through real deployment scenarios, dissect actual code from the repository, and show you advanced configurations that turn this simple dashboard into a production-grade monitoring powerhouse. Whether you're fine-tuning LLMs or managing a render farm, GPU-Hot belongs in your toolkit.
What Is GPU-Hot and Why It's Gaining Traction
GPU-Hot is an open-source, real-time NVIDIA GPU monitoring dashboard created by psalias2006 that prioritizes speed, simplicity, and scalability. Built with modern Python and JavaScript, it provides a self-hosted alternative to cloud-based monitoring solutions that charge premium prices for basic metrics. The project has gained rapid traction in the AI/ML community because it solves a fundamental friction point: instant GPU visibility without infrastructure complexity.
At its core, GPU-Hot is a FastAPI application that interfaces directly with NVIDIA's Management Library (NVML) to extract hardware metrics at 500-millisecond intervals. This data streams to your browser via WebSocket connections, enabling live charts that update faster than you can blink. The entire stack runs in a single Docker container weighing under 200MB, making deployment trivial on any system with NVIDIA drivers and the Container Toolkit installed.
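To make the data flow concrete, here is a minimal polling sketch using the pynvml bindings. It is illustrative only; the field names and loop structure are not taken from GPU-Hot's source.

```python
# Minimal polling sketch with the pynvml bindings (pip install nvidia-ml-py).
# Illustrative only -- field names and structure are not GPU-Hot's own code.
import time
import pynvml

pynvml.nvmlInit()
try:
    handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
               for i in range(pynvml.nvmlDeviceGetCount())]
    while True:
        for i, handle in enumerate(handles):
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
            power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000  # NVML reports milliwatts
            print(f"GPU {i}: {util.gpu}% util, {temp}°C, "
                  f"{mem.used / 2**20:.0f} MiB used, {power_w:.0f} W")
        time.sleep(0.5)  # matches the 0.5-second UPDATE_INTERVAL
finally:
    pynvml.nvmlShutdown()
```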
What makes GPU-Hot genuinely revolutionary is its dual-mode architecture. Run it in single-node mode on individual GPU servers, or deploy a hub instance that aggregates data from dozens of worker nodes. This flexibility means you can start monitoring one machine today and seamlessly expand to a full cluster tomorrow without changing your tooling. The dashboard automatically detects every GPU in your system, from consumer RTX cards to data-center A100s, displaying per-device metrics in an intuitive card-based layout.
The project embraces modern DevOps practices: environment-based configuration, declarative Docker deployments, and a static frontend that works on any device. No database required. No external dependencies. Just pure, focused GPU monitoring that respects your time and resources.
Key Features That Make GPU-Hot Essential
Sub-Second Real-Time Metrics
GPU-Hot's defining feature is its aggressive 0.5-second polling interval, configurable in core/config.py. While nvidia-smi refreshes every second at best, GPU-Hot captures utilization spikes, thermal throttling events, and memory allocation bursts that happen in the blink of an eye. The backend uses NVML's asynchronous event system where possible, minimizing CPU overhead while maximizing data freshness.
Automatic Multi-GPU Detection and Process Monitoring
The dashboard doesn't just show GPUs—it shows what's running on them. Each GPU card displays active processes with PID, process name, and memory consumption. By adding --init --pid=host to your Docker run command, GPU-Hot maps GPU memory allocations to actual host processes, giving you actionable intelligence about which training job or rendering task is consuming resources. This is critical for multi-tenant environments where teams share GPU infrastructure.
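For a sense of how per-process data can be gathered, here is a hedged sketch combining pynvml and psutil. It is not GPU-Hot's actual implementation; the name lookup only succeeds when the container can see the host's process table, which is exactly what --pid=host provides.

```python
# Hypothetical sketch: attach process names to GPU memory allocations.
# Needs pynvml and psutil; the name lookup only works when the host's
# process table is visible (e.g. the container runs with --pid=host).
import pynvml
import psutil

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
for proc in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
    try:
        name = psutil.Process(proc.pid).name()
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        name = "unknown"  # PID not visible from this namespace
    mem_mib = (proc.usedGpuMemory or 0) / 2**20  # can be None on some drivers
    print(f"PID {proc.pid} ({name}): {mem_mib:.0f} MiB")
pynvml.nvmlShutdown()
```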
Historical Charting with Correlation Analysis
Beyond live numbers, GPU-Hot renders time-series charts for utilization, temperature, power draw, and clock speeds using Chart.js. The JavaScript frontend (static/js/chart-manager.js) maintains a rolling buffer of historical data, allowing you to spot trends and correlate performance anomalies. The innovative correlation drawer feature lets you overlay multiple metrics to see how temperature spikes affect clock speeds or how power limits impact utilization.
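The rolling-buffer idea is simple enough to show in a few lines. The real buffer lives in static/js/chart-manager.js; this Python sketch, with a made-up 300-sample cap, only illustrates the data structure.

```python
# Rolling-history sketch in Python; the 300-sample cap is made up.
from collections import deque

MAX_POINTS = 300  # 300 samples x 0.5 s = 2.5 minutes of history

history = {
    "utilization": deque(maxlen=MAX_POINTS),
    "temperature": deque(maxlen=MAX_POINTS),
}

def record(sample: dict) -> None:
    """Append the newest sample; the oldest points fall off automatically."""
    history["utilization"].append(sample["utilization"])
    history["temperature"].append(sample["temperature"])

record({"utilization": 87, "temperature": 71})
print(list(history["utilization"]))
```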
System-Wide Resource Context
GPU-Hot understands that GPUs don't operate in isolation. The dashboard includes host system metrics: CPU usage, RAM consumption, swap activity, disk I/O, and network throughput. This holistic view prevents the common mistake of diagnosing GPU issues when the bottleneck is actually CPU data preprocessing or storage I/O. The metrics/collector.py module gathers these stats using psutil, ensuring comprehensive system visibility.
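A rough idea of what a psutil-based snapshot can look like is shown below; the dictionary keys are invented for this example and may not match GPU-Hot's real payload.

```python
# Illustrative host snapshot built on psutil; the keys are invented for
# this example and may differ from GPU-Hot's real payload.
import psutil

def host_snapshot() -> dict:
    vm = psutil.virtual_memory()
    swap = psutil.swap_memory()
    disk = psutil.disk_io_counters()
    net = psutil.net_io_counters()
    return {
        "cpu_percent": psutil.cpu_percent(interval=None),
        "ram_percent": vm.percent,
        "swap_percent": swap.percent,
        "disk_read_mb": disk.read_bytes / 2**20,
        "disk_write_mb": disk.write_bytes / 2**20,
        "net_sent_mb": net.bytes_sent / 2**20,
        "net_recv_mb": net.bytes_recv / 2**20,
    }

print(host_snapshot())
```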
Massive Scalability with Hub Mode
The hub architecture is a masterclass in distributed systems simplicity. Worker nodes expose their metrics via the /api/gpu-data endpoint. The hub instance polls these endpoints and aggregates results into a unified dashboard. This design supports 100+ GPUs without centralizing data collection overhead. Each node remains autonomous, and the hub provides a single pane of glass. Environment variables like NODE_URLS make adding new workers a simple configuration change.
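Conceptually, hub polling boils down to fanning out concurrent HTTP requests and merging the responses. The sketch below uses aiohttp to illustrate the pattern; it is not a copy of core/hub.py.

```python
# Pattern sketch of hub-style aggregation with aiohttp -- not core/hub.py.
# Each worker's /api/gpu-data endpoint is polled concurrently.
import asyncio
import os

import aiohttp

NODE_URLS = os.environ.get("NODE_URLS", "http://localhost:1312").split(",")

async def fetch_node(session: aiohttp.ClientSession, url: str) -> dict:
    try:
        async with session.get(f"{url}/api/gpu-data",
                               timeout=aiohttp.ClientTimeout(total=2)) as resp:
            return {"url": url, "online": True, "data": await resp.json()}
    except Exception:
        return {"url": url, "online": False, "data": None}

async def poll_once() -> list:
    async with aiohttp.ClientSession() as session:
        return list(await asyncio.gather(*(fetch_node(session, u) for u in NODE_URLS)))

print(asyncio.run(poll_once()))
```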
Graceful Degradation for Legacy Hardware
Not everyone has the latest RTX 40-series cards. GPU-Hot includes an nvidia-smi fallback mode (nvidia_smi_fallback.py) for older GPUs that lack full NVML support. By setting NVIDIA_SMI=true, the dashboard parses command-line output, ensuring compatibility across diverse hardware estates. This backward compatibility makes GPU-Hot viable in academic labs and enterprise environments with mixed-generation hardware.
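As an illustration of the fallback approach, the sketch below shells out to nvidia-smi and parses its CSV query output. GPU-Hot's nvidia_smi_fallback.py does its own parsing, so treat this as the general idea rather than the project's code.

```python
# Simplified fallback sketch: shell out to nvidia-smi and parse CSV query
# output. GPU-Hot's nvidia_smi_fallback.py does its own parsing; this is
# just the general idea in its easiest-to-read form.
import subprocess

def query_gpus() -> list:
    fields = "index,utilization.gpu,temperature.gpu,memory.used,memory.total"
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={fields}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    gpus = []
    for line in out.strip().splitlines():
        idx, util, temp, used, total = (v.strip() for v in line.split(","))
        gpus.append({
            "index": int(idx),
            "utilization": int(util),
            "temperature": int(temp),
            "memory_used_mib": int(used),
            "memory_total_mib": int(total),
        })
    return gpus

print(query_gpus())
```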
Real-World Use Cases Where GPU-Hot Shines
AI/ML Training Pipeline Monitoring
You're training a diffusion model across eight A100 GPUs. With GPU-Hot's hub mode, you launch a worker container on each training node and a central hub on your laptop. The dashboard reveals that GPU 5 is thermal throttling at 87°C while others sit at 75°C—indicating a cooling issue. You notice process memory creeping up on GPU 3, predicting an out-of-memory crash 20 minutes before it happens. The encoder/decoder session metrics confirm your data pipeline is saturating NVENC, prompting you to adjust batch preprocessing. Without GPU-Hot, you'd discover these issues only after jobs failed.
Cryptocurrency Mining Farm Management
Managing a 50-GPU mining operation requires constant vigilance. GPU-Hot's multi-node aggregation lets you monitor all rigs from one tab. The power draw metrics help you identify cards with degraded power connectors drawing 20W less than spec. Fan speed and temperature correlation reveals which rigs need dust cleaning. When a rig goes offline, the hub's connection status shows immediate red indicators. The historical charts track hash rate stability over time, proving which overclock settings are truly stable versus superficially fast.
University Research Cluster Oversight
Academic clusters serve multiple research groups with competing priorities. GPU-Hot's process monitoring shows which student is monopolizing the V100s with inefficient code. The per-process memory breakdown helps you counsel users on memory optimization. When a job hangs, the P-State and throttle status reveal it's stuck in P2 state due to insufficient power. The PCIe info confirms that a GPU was accidentally plugged into a x4 slot instead of x16, explaining the 30% performance regression. The lightweight Docker deployment means you can install it without fighting the university's IT department.
Cloud GPU Cost Optimization
You're spending $50k/month on AWS EC2 P4 instances. GPU-Hot deployed on each instance reveals that average utilization is only 42% during business hours. The system metrics show CPU bottlenecks in data loading, not GPU saturation. This data justifies migrating to smaller, cheaper instances with fewer GPUs but faster storage. The historical utilization patterns prove that preemptible instances would save 70% of costs with minimal job interruption. The API endpoint (/api/gpu-data) integrates with your cost monitoring stack, automating instance right-sizing decisions.
Real-Time Rendering Farm Coordination
A VFX studio renders frames across 200 GPUs in multiple time zones. GPU-Hot's WebSocket architecture provides instant feedback when render nodes complete frames. The temperature monitoring prevents thermal shutdowns during all-night renders. Encoder metrics show which nodes handle video encoding efficiently. When a node drops frames, the process list reveals a competing background job that wasn't supposed to run. The hub's aggregated view lets producers in Los Angeles monitor render progress in London without VPN access—just a browser tab.
Step-by-Step Installation & Setup Guide
Prerequisites Checklist
Before starting, verify your system meets these requirements:
- NVIDIA Drivers: Install the latest drivers for your GPU generation
- Docker: Version 20.10 or newer with BuildKit enabled
- NVIDIA Container Toolkit: Follow the official installation guide
- Port 1312: Ensure this port is available or plan to map to another
Test your setup with:
nvidia-smi # Should show your GPUs
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi # Should work inside Docker
Single Machine Deployment
Deploy GPU-Hot on a single server in one command:
docker run -d \
--name gpu-hot \
--gpus all \
-p 1312:1312 \
--restart unless-stopped \
ghcr.io/psalias2006/gpu-hot:latest
Command breakdown:
- -d: Runs the container detached in the background
- --name gpu-hot: Names the container for easy management
- --gpus all: Exposes all NVIDIA GPUs to the container
- -p 1312:1312: Maps the container port to the host
- --restart unless-stopped: Auto-restarts on failure or reboot
Open http://localhost:1312 in your browser. You should see your GPUs immediately.
Multi-Node Cluster Setup
For monitoring multiple machines, deploy workers and a hub:
On each GPU server (worker mode):
docker run -d \
--name gpu-hot-worker \
--gpus all \
-p 1312:1312 \
-e NODE_NAME=$(hostname) \
--restart unless-stopped \
ghcr.io/psalias2006/gpu-hot:latest
On a hub machine (can be CPU-only):
docker run -d \
--name gpu-hot-hub \
-p 1312:1312 \
-e GPU_HOT_MODE=hub \
-e NODE_URLS=http://gpu-server-1:1312,http://gpu-server-2:1312,http://gpu-server-3:1312 \
--restart unless-stopped \
ghcr.io/psalias2006/gpu-hot:latest
Environment variables explained:
- NODE_NAME: Custom display name for each worker (defaults to hostname)
- GPU_HOT_MODE=hub: Transforms the container into the aggregation master
- NODE_URLS: Comma-separated list of worker endpoints
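Before pointing the hub at a long NODE_URLS list, a quick pre-flight check can save debugging time. The helper below is not part of GPU-Hot; it simply confirms each worker answers on /api/gpu-data (the hostnames are placeholders).

```python
# Optional pre-flight check (not part of GPU-Hot): confirm every worker in
# NODE_URLS answers on /api/gpu-data before starting the hub container.
import requests

node_urls = "http://gpu-server-1:1312,http://gpu-server-2:1312".split(",")

for url in node_urls:
    try:
        resp = requests.get(f"{url}/api/gpu-data", timeout=3)
        resp.raise_for_status()
        gpu_count = len(resp.json().get("gpus", []))
        print(f"{url}: OK ({gpu_count} GPUs reported)")
    except requests.RequestException as exc:
        print(f"{url}: UNREACHABLE ({exc})")
```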
Process Monitoring Enhancement
To see actual process names instead of PIDs:
docker run -d \
--name gpu-hot \
--gpus all \
-p 1312:1312 \
--init \
--pid=host \
ghcr.io/psalias2006/gpu-hot:latest
Security note: --pid=host grants container access to host process information. Use only in trusted environments.
Legacy GPU Support
For older GPUs (Kepler architecture and earlier):
docker run -d \
--name gpu-hot \
--gpus all \
-p 1312:1312 \
-e NVIDIA_SMI=true \
ghcr.io/psalias2006/gpu-hot:latest
This forces the dashboard to parse nvidia-smi XML output instead of using NVML directly.
Building from Source
For development or customization:
git clone https://github.com/psalias2006/gpu-hot
cd gpu-hot
docker-compose up --build
The docker-compose.yml handles build context and volume mounts automatically.
REAL Code Examples from the Repository
Example 1: Docker Deployment Commands
The README provides these production-ready Docker commands. Let's dissect them:
# Single machine deployment
docker run -d --gpus all -p 1312:1312 ghcr.io/psalias2006/gpu-hot:latest
This minimal command hides powerful functionality. The --gpus all flag uses the NVIDIA Container Toolkit to mount GPU device files and libraries into the container. The image ghcr.io/psalias2006/gpu-hot:latest is hosted on GitHub Container Registry, ensuring always-up-to-date builds. The -d daemonizes the process, while -p 1312:1312 exposes the FastAPI server running inside.
# Multi-machine worker node
docker run -d --gpus all -p 1312:1312 -e NODE_NAME=$(hostname) ghcr.io/psalias2006/gpu-hot:latest
The addition of -e NODE_NAME=$(hostname) injects the host's name into the container environment. Inside app.py, this variable is read and included in every metrics payload, allowing the hub to distinguish between workers. The $(hostname) shell substitution ensures each worker has a unique identifier without manual configuration.
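For illustration, a worker could fold that variable into its payload roughly like this; the code is hypothetical, not app.py's actual logic.

```python
# Hypothetical illustration of the worker side -- not app.py's actual code.
import os
import socket

NODE_NAME = os.environ.get("NODE_NAME", socket.gethostname())  # falls back to hostname

def build_payload(gpus: list, processes: list, system: dict) -> dict:
    """Tag every metrics payload with the node's display name."""
    return {"node": NODE_NAME, "gpus": gpus, "processes": processes, "system": system}

print(build_payload([], [], {}))
```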
# Hub aggregator node
docker run -d -p 1312:1312 -e GPU_HOT_MODE=hub -e NODE_URLS=http://server1:1312,http://server2:1312,http://server3:1312 ghcr.io/psalias2006/gpu-hot:latest
This command transforms GPU-Hot into a centralized aggregator. The -e GPU_HOT_MODE=hub environment variable triggers hub.py instead of the standard monitor. The NODE_URLS comma-separated list is parsed in core/hub.py, which spawns asynchronous HTTP clients to poll each worker's /api/gpu-data endpoint every 500ms. Notice this hub container doesn't need --gpus all—it can run on a CPU-only machine, making it perfect for lightweight monitoring stations.
Example 2: WebSocket Client Implementation
The README includes this JavaScript snippet for real-time data streaming:
const ws = new WebSocket('ws://localhost:1312/socket.io/');
ws.onmessage = (event) => {
const data = JSON.parse(event.data);
// data.gpus — per-GPU metrics
// data.processes — active GPU processes
// data.system — host CPU, RAM, swap, disk, network
};
This code connects to GPU-Hot's WebSocket endpoint, which uses Socket.IO for reliable bidirectional communication. The ws.onmessage handler fires every 500ms when the server broadcasts updated metrics. The payload structure is elegantly designed:
- data.gpus: Array of GPU objects containing utilization, temperature, memory, power, fan speed, clock speeds, PCIe info, P-State, and throttle status
- data.processes: Array of active GPU processes with PID, name, and memory usage
- data.system: Host metrics including CPU percentage, RAM usage, swap, disk I/O, and network stats
In static/js/socket-handlers.js, the frontend uses this exact pattern with batched rendering optimizations to prevent DOM thrashing at high update rates.
Example 3: Configuration File
The backend configuration is refreshingly simple:
# core/config.py
UPDATE_INTERVAL = 0.5 # Polling interval in seconds
PORT = 1312 # Server port
These two lines control the entire application's behavior. UPDATE_INTERVAL = 0.5 sets the NVML polling frequency. Reducing this to 0.1 gives you 10Hz updates for ultra-fine-grained profiling, while increasing to 2.0 reduces CPU overhead for large clusters. The PORT = 1312 is used throughout app.py to bind the FastAPI server and in all URL generation.
Example 4: Environment Variable Configuration
The README documents these runtime configurations:
NVIDIA_VISIBLE_DEVICES=0,1 # Specific GPUs (default: all)
NVIDIA_SMI=true # Force nvidia-smi mode for older GPUs
GPU_HOT_MODE=hub # Set to 'hub' for multi-node aggregation (default: single node)
NODE_NAME=gpu-server-1 # Node display name (default: hostname)
NODE_URLS=http://host:1312... # Comma-separated node URLs (required for hub mode)
These variables demonstrate GPU-Hot's 12-factor app design. NVIDIA_VISIBLE_DEVICES leverages NVIDIA's runtime to limit GPU visibility without code changes. NVIDIA_SMI=true switches the metrics source from NVML to XML parsing in nvidia_smi_fallback.py, ensuring compatibility with legacy hardware. The GPU_HOT_MODE toggle completely rearchitects the application from worker to hub, showcasing excellent software design.
Example 5: API Endpoint Usage
The HTTP API provides programmatic access:
GET / # Dashboard
GET /api/gpu-data # JSON metrics snapshot
GET /api/version # Version and update info
The /api/gpu-data endpoint returns the same JSON structure as the WebSocket payload, making it perfect for integration with monitoring stacks. A simple curl http://localhost:1312/api/gpu-data | jq gives you a complete system snapshot. The /api/version endpoint helps automate upgrades across large fleets by exposing the running version.
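A tiny client shows how little glue is needed. The field names ("gpus", "id", "utilization", "temperature") are assumed from the payload description above, so check them against a real response.

```python
# Tiny API client; "gpus", "id", "utilization" and "temperature" are assumed
# from the payload description above -- check them against a real response.
import requests

data = requests.get("http://localhost:1312/api/gpu-data", timeout=5).json()
for gpu in data.get("gpus", []):
    print(f'GPU {gpu.get("id")}: {gpu.get("utilization")}% util, '
          f'{gpu.get("temperature")}°C')
```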
Advanced Usage & Best Practices
Optimizing Update Intervals for Large Clusters
For clusters exceeding 50 GPUs, the default 0.5-second interval may strain the hub. Modify core/config.py:
UPDATE_INTERVAL = 1.0 # 1-second updates reduce hub CPU by 60%
Rebuild with docker-compose up --build for persistent changes.
Securing Your Dashboard
Never expose GPU-Hot directly to the internet. Use an Nginx reverse proxy with basic auth:
location / {
proxy_pass http://localhost:1312;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
auth_basic "GPU Monitoring";
auth_basic_user_file /etc/nginx/.htpasswd;
}
Integrating with Prometheus
While GPU-Hot doesn't natively export Prometheus metrics, you can use the API endpoint:
# prometheus_exporter.py
import requests
import time
while True:
    data = requests.get('http://gpu-hot:1312/api/gpu-data').json()
    for gpu in data['gpus']:
        # Emit Prometheus exposition format, e.g. gpu_utilization{id="0"} 87
        print(f'gpu_utilization{{id="{gpu["id"]}"}} {gpu["utilization"]}')
    time.sleep(15)
Setting Up Alerts
Create a watchdog script using the WebSocket API:
const ws = new WebSocket('ws://gpu-hot:1312/socket.io/');
ws.onmessage = (event) => {
const data = JSON.parse(event.data);
data.gpus.forEach(gpu => {
if (gpu.temperature > 85) {
fetch('https://hooks.slack.com/services/...', {
method: 'POST',
body: JSON.stringify({text: `GPU ${gpu.id} overheating: ${gpu.temperature}°C`})
});
}
});
};
Performance Tuning
For maximum performance:
- Run workers on GPU nodes with --network host to reduce latency
- Use SSD storage for the hub to handle historical data buffering
- Increase Docker memory limits if monitoring 100+ GPUs: --memory 2g
Comparison: GPU-Hot vs. Alternatives
| Feature | GPU-Hot | nvidia-smi | nvtop | NVIDIA DCGM |
|---|---|---|---|---|
| Real-time Updates | Sub-second (0.5s) | 1+ seconds | 1 second | 1 second |
| Web Dashboard | ✅ Yes, self-hosted | ❌ No | ❌ No | ✅ Yes, complex |
| Multi-Node | ✅ Hub mode (100+ GPUs) | ❌ No | ❌ No | ✅ Yes, heavy |
| Process Monitoring | ✅ PID + Name | ✅ PID only | ✅ PID only | ✅ Limited |
| Historical Charts | ✅ Built-in | ❌ No | ❌ No | ✅ With InfluxDB |
| Docker Deploy | ✅ Single command | ❌ Not applicable | ❌ Not applicable | ❌ Complex |
| Resource Usage | ~150MB RAM | ~10MB RAM | ~50MB RAM | ~1GB RAM |
| Setup Time | 30 seconds | Instant | 5 minutes | 2+ hours |
| API | ✅ HTTP + WebSocket | ❌ CLI only | ❌ No | ✅ C Library |
| License | MIT | NVIDIA EULA | GPLv3 | NVIDIA EULA |
Why GPU-Hot Wins: It combines the simplicity of nvidia-smi with the power of DCGM, packaged in a Docker container that deploys in seconds. The hub architecture provides enterprise scalability without enterprise complexity. The WebSocket API enables real-time integrations that DCGM makes difficult. For teams wanting immediate visibility without vendor lock-in, GPU-Hot is the clear choice.
Frequently Asked Questions
Q: Does GPU-Hot support AMD GPUs?
A: No, GPU-Hot is specifically designed for NVIDIA GPUs using NVML and nvidia-smi. For AMD monitoring, consider RadeonTop or ROCm tools.
Q: What's the minimum GPU architecture required?
A: GPU-Hot works with NVIDIA GPUs dating back to Kepler (2012). For pre-Kepler cards, enable NVIDIA_SMI=true mode for compatibility.
Q: How many GPUs can the hub mode handle?
A: The hub has been tested with 100+ GPUs across 20 nodes. Performance depends on hub CPU and network latency. Increase UPDATE_INTERVAL for larger deployments.
Q: Is using --pid=host secure?
A: It grants container access to host process information. Use it only in trusted environments. For production, consider running without it and mapping process names externally via the API.
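If you go that route, a small host-side script can join the API's PIDs with process names; the field names ("processes", "pid") are assumed from the payload description above.

```python
# Host-side sketch for the FAQ above: join the API's PIDs with process
# names using psutil. Field names ("processes", "pid") are assumed.
import psutil
import requests

data = requests.get("http://localhost:1312/api/gpu-data", timeout=5).json()
for proc in data.get("processes", []):
    pid = proc.get("pid")
    try:
        name = psutil.Process(int(pid)).name()
    except (psutil.NoSuchProcess, TypeError, ValueError):
        name = "unknown"
    print(f"PID {pid}: {name}")
```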
Q: Can I export metrics to Grafana?
A: Yes! Use the /api/gpu-data endpoint with a custom Prometheus exporter or Telegraf plugin. The JSON structure is stable and well-documented.
Q: Why aren't my metrics appearing?
A: First run nvidia-smi on the host. If it works, test Docker GPU access with docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi. For older GPUs, add -e NVIDIA_SMI=true.
Q: How do I update GPU-Hot?
A: Run docker pull ghcr.io/psalias2006/gpu-hot:latest then restart your container. For custom builds, git pull and docker-compose up --build.
Conclusion: Your GPU Monitoring Deserves an Upgrade
GPU-Hot represents a paradigm shift in NVIDIA GPU monitoring—SSH-free, real-time, and radically simple. It eliminates the friction that has plagued GPU operations for years, replacing terminal juggling with a single, beautiful dashboard that scales from your laptop to your data center. The sub-second updates catch problems before they cascade. The hub architecture turns cluster management from a chore into a pleasure. The Docker-native deployment means you're monitoring in minutes, not hours.
What impresses most is the thoughtful engineering: the graceful fallback for legacy hardware, the batched WebSocket rendering for performance, the environment-driven configuration for DevOps workflows. This isn't a weekend hack—it's production-ready software that respects your time and infrastructure.
The bottom line: If you're still SSH-ing into servers to run nvidia-smi, you're working too hard. GPU-Hot gives you superpowers. Deploy it today on a single machine, and within a week you'll wonder how you ever lived without it. Your GPUs are hot—your monitoring should be too.
Star the repository, try the live demo, and join the revolution in GPU monitoring: ⭐ https://github.com/psalias2006/gpu-hot ⭐
Your future self will thank you every time you catch a thermal issue before it costs you a 48-hour training run.