Stop Waiting Hours for Renders! PSRayTracing Cuts Time by 75%

What if your ray tracer could finish in the time it takes to grab coffee—instead of running overnight?

Every graphics programmer knows the soul-crushing wait. You hit "render" on your ray tracer, watch the progress bar crawl, and realize you've got 47 minutes to kill. Maybe you started Peter Shirley's legendary Ray Tracing in One Weekend series. Maybe you followed every line, typed every vec3, and finally reached that glorious final scene—only to discover it takes two hours on a single core.

Here's the dirty secret most tutorials won't tell you: the reference implementation is intentionally simple, not fast. It's pedagogical code, designed to teach concepts—not to squeeze every drop of performance from your CPU. But what if you could have both? What if clean, educational code could also scream through renders at 4x the speed?

Enter PSRayTracing—a modern C++17 reimagining that transforms Shirley's gentle introduction into a performance beast. With multi-core rendering, SIMD-friendly optimizations, and a gorgeous cross-platform GUI that runs on your phone, this isn't just a rewrite. It's a masterclass in extracting hidden performance without sacrificing clarity. And the best part? You can toggle every optimization on and off to see exactly what each change buys you.

Ready to stop watching progress bars and start shipping beautiful renders? Let's dive in.

What is PSRayTracing?

PSRayTracing is a ground-up modern C++ implementation of Peter Shirley's Ray Tracing in One Weekend mini-book series, created by developer Benjamin N. Summerton (known online as @DefPriPub). Born from a desire to learn modern C++ and push CPU rendering to its limits, this project evolved into something far more ambitious than a simple code translation.

The repository lives primarily on GitLab, with a GitHub mirror for community engagement. What started as a personal learning exercise in September 2020—based on v3.2.0 of the original book code—has matured into a production-quality renderer with real-world deployments on Android, iOS, macOS (Apple Silicon), Windows, and Linux.

Why it's trending now: Ray tracing has exploded from academic curiosity to industry standard, powering everything from NVIDIA's RTX cards to Pixar's RenderMan. Yet most developers still learn from Shirley's deliberately simple reference code, unaware that significant performance gains lurk just beneath the surface. PSRayTracing exposes these gains systematically, making it both an educational tool and a practical renderer. The project's Apache 2.0 license and active community contributions have further fueled its adoption.

The creator's journey is telling: after first implementing the books in Nim back in 2016, he returned four years later with deeper systems knowledge and a mission. The result? A renderer that proves modern C++ can be both beautiful and blisteringly fast.

Key Features That Make PSRayTracing Insane

4x Single-Core Speedup (Without Cheating)

The headline figure: PSRayTracing renders Book 2's final scene in approximately one-quarter the time of the original implementation. This isn't from throwing a GPU at the problem or radically restructuring the algorithm. It's from disciplined, measurable optimizations that you can inspect and toggle individually.

Multi-Core Rendering with Intelligent Threading

Gone are the days of single-threaded suffering. PSRayTracing implements a custom thread pool that distributes scanlines across available cores. The creator's first C++ thread pool implementation shows honest scaling: 1 core = 120 seconds, 2 cores = 72 seconds, 4 cores = 43 seconds, which works out to roughly 1.7x and 2.8x speedups. Not perfectly linear (pointer chasing in the scene graph likely hurts), but dramatically better than serial execution.
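To make the scanline idea concrete, here is a minimal sketch, not the project's actual thread pool. It assumes the simplest possible scheduling: a shared atomic counter that hands out one row at a time. The names render_scanlines and render_row are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: calls render_row(y) exactly once for every scanline y
// in [0, height), spread over num_threads workers. Rows are claimed through a
// shared atomic counter, so a worker that finishes early just takes more rows.
inline void render_scanlines(int height, unsigned num_threads,
                             const std::function<void(int)>& render_row) {
    assert(num_threads > 0);
    std::atomic<int> next_row{0};

    auto worker = [&]() {
        // fetch_add returns the previous value, so no row is claimed twice.
        for (int y = next_row.fetch_add(1); y < height; y = next_row.fetch_add(1)) {
            render_row(y);
        }
    };

    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (unsigned i = 0; i < num_threads; ++i) {
        threads.emplace_back(worker);
    }
    for (auto& t : threads) {
        t.join();
    }
}
```

Because every row is written by exactly one worker, render_row can write into its own slice of the image buffer without locks. On some toolchains std::thread needs the -pthread flag at compile and link time.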

Toggleable Optimizations via CMake

Every major change from the reference implementation can be switched on/off through WITH_* CMake variables. Want to see if the AABB hit optimization matters on your hardware? Toggle WITH_BOOK_AABB_HIT. Curious about trigonometric approximations? Try WITH_BOOK_ATAN2. As the WITH_BOOK_ prefix suggests, turning one of these options ON selects the book's original code path, and leaving it OFF keeps PSRayTracing's replacement. This transforms the project into a living benchmark suite.

Cross-Platform GUI (Including Mobile!)

The qt_ui/ subdirectory delivers a Qt/QML interface that runs natively on:

Android (Google Play Store)
iOS (Apple App Store)
macOS (Apple Silicon + Intel)
Windows, Linux (build from source)

Localized into Japanese and German, with the Earth texture embedded for seamless mobile deployment.

Production-Ready Output

No more embarrassing PPM files. PSRayTracing uses stb_image_write to generate proper PNGs directly. Configurable parameters include samples per pixel, max ray depth, resolution, and output filename.

Deep Copy Per Thread Architecture

A brilliant insight: copying the entire scene graph to each thread (via the IDeepCopyable interface) eliminates shared pointer contention. Result? 20-30% faster multi-core renders on many scenes. Counter-intuitive, but measurably real.
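A minimal sketch of what such an interface could look like follows. Only the name IDeepCopyable comes from the project; the deep_copy() signature, the Sphere and HittableList types, and make_per_thread_scenes are assumptions made for illustration.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical interface modelled on the IDeepCopyable idea: every node can
// produce a fully independent copy of itself and of everything it owns.
struct IDeepCopyable {
    virtual ~IDeepCopyable() = default;
    virtual std::shared_ptr<IDeepCopyable> deep_copy() const = 0;
};

struct Sphere : IDeepCopyable {
    double radius;
    explicit Sphere(double r) : radius(r) {}
    std::shared_ptr<IDeepCopyable> deep_copy() const override {
        return std::make_shared<Sphere>(radius);
    }
};

struct HittableList : IDeepCopyable {
    std::vector<std::shared_ptr<IDeepCopyable>> objects;
    std::shared_ptr<IDeepCopyable> deep_copy() const override {
        auto copy = std::make_shared<HittableList>();
        copy->objects.reserve(objects.size());
        for (const auto& obj : objects) {
            assert(obj != nullptr);
            copy->objects.push_back(obj->deep_copy());  // recurse into children
        }
        return copy;
    }
};

// One private scene per worker: no nodes or reference counts are shared.
inline std::vector<std::shared_ptr<IDeepCopyable>>
make_per_thread_scenes(const IDeepCopyable& scene, unsigned num_threads) {
    std::vector<std::shared_ptr<IDeepCopyable>> copies;
    copies.reserve(num_threads);
    for (unsigned i = 0; i < num_threads; ++i) {
        copies.push_back(scene.deep_copy());
    }
    return copies;
}
```

The trade-off is memory: each worker pays for a full copy of the scene, which is why the gain depends on scene structure.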

Real-World Use Cases Where PSRayTracing Dominates

1. Graphics Education at Scale

University courses teaching ray tracing often struggle with render times during labs. PSRayTracing's 4x speedup means students complete assignments in class, not overnight. The toggleable optimizations let instructors demonstrate exactly why each technique matters—turning a black box into a transparent learning tool.

2. Procedural Content Generation for Games

Indie developers generating lightmaps, environment probes, or promotional renders need CPU ray tracing that finishes before deadlines. The multi-core support and PNG output integrate cleanly into asset pipelines, while the modern C++ structure allows easy embedding into larger engines.

3. Mobile Graphics Research

The Android/iOS GUI makes PSRayTracing unique among Shirley implementations. Researchers studying energy-efficient rendering, adaptive sampling, or perceptual metrics can deploy experiments directly to phones without rewriting the core renderer.

4. Performance Engineering Training

The documented optimization journey—complete with blog posts on final, noexcept, random number generation, and trigonometric approximations—serves as a real-world case study for engineers learning to profile and optimize C++ code. Each WITH_* flag is a lesson in measurement-driven development.

Step-by-Step Installation & Setup Guide

Prerequisites

C++17 compiler: GCC 11/12+ (recommended), Clang 11+, or MSVC (community help needed)
CMake 3.x: ccmake or CMake GUI strongly recommended for toggling optimizations
Qt 6.2+: Only if building the GUI (qt_ui/)

Building the Core Renderer

# 1. Clone and enter the repository
git clone https://github.com/define-private-public/PSRayTracing.git
cd PSRayTracing

# 2. Create build directory
mkdir build && cd build/

# 3. Set compiler (adjust for your system)
export CC=gcc-12
export CXX=g++-12

# 4. Configure with Release optimizations
cmake ../ -DCMAKE_BUILD_TYPE=Release

# 5. Build
make -j$(nproc)

Exploring Optimization Flags (Critical Step!)

After initial configuration, run ccmake ../ to interactively toggle features:

WITH_BOOK_AABB_HIT          ON/OFF    # ON = book's AABB::hit(), OFF = min/max version
WITH_BOOK_ATAN2             ON/OFF    # ON = std::atan2(), OFF = fast approximation
WITH_BOOK_PERLIN            ON/OFF    # ON = book's Perlin noise, OFF = improved version
WITH_BOOK_SQRT              ON/OFF    # (Deprecated: std::sqrt wins)
WITH_BOOK_MAT_PTR           ON/OFF    # ON = book's shared_ptr in HitRecord, OFF = raw pointer
WITH_PDF_VARIANT            ON/OFF    # Stack-allocated PDFs (Book 3)
WITH_DEEP_COPY_PER_THREAD   ON/OFF    # Scene copy per thread

Pro tip: As the WITH_BOOK_ prefix suggests, those options select the book's original code when ON. Benchmark with them ON for a book-equivalent baseline, then switch them OFF one at a time to measure each optimization's impact on your specific hardware.

Building the Qt GUI

cd qt_ui/
# Follow README.rst for platform-specific Qt setup
# Generally: mkdir build && cd build && cmake .. && make

Code Examples: Real Commands and Conceptual Sketches

Example 1: Basic CLI Rendering

The simplest possible invocation—render Book 2's final scene with defaults:

./PSRayTracing

But where's the fun in defaults? Here's a production-quality command with explicit parameters:

# Render at 1080p with 250 samples/pixel on 4 cores
./PSRayTracing \
    --scene book2::final_scene \
    -n 250 \
    -j 4 \
    -s 1920x1080 \
    -o masterpiece.png

Parameter breakdown:

--scene book2::final_scene: The iconic final scene from Book 2
-n 250: 250 samples per pixel (vs. default 25—noticeably smoother)
-j 4: Utilize 4 CPU cores for parallel scanline rendering
-s 1920x1080: Full HD resolution (vs. default 960x540)
-o masterpiece.png: Explicit output filename

Example 2: Discovering Available Scenes

Before rendering, inspect what's implemented:

./PSRayTracing --list-scenes

This outputs all scene identifiers in book order, from book1::normal_sphere through book3::cornell_smoke and experimental scenes like fun::cornell_glass_boxes. The scene ID format follows book{1|2|3}::{scene_name} with additional fun:: variants for stress testing.

Example 3: Quick Preview Render

For rapid iteration during scene development:

# Low-quality preview: few samples, low res, single core
./PSRayTracing \
    --scene book1::diffuse_sphere \
    -n 10 \
    -j 1 \
    -s 480x270 \
    -o preview.png

This completes in seconds rather than minutes, letting you verify scene composition before committing to a full-quality render.

Example 4: The AABB Hit Optimization (Code Deep-Dive)

The AABB::hit() function exemplifies PSRayTracing's optimization philosophy. Original book code uses sequential if statements with swap() calls—branch-heavy and resistant to vectorization. PSRayTracing's version replaces this with parallelizable min()/max() operations:

// Simplified conceptual comparison
// BOOK VERSION (sequential branches):
// if (t0 > t1) swap(t0, t1);
// if (t0 > t_min) t_min = t0;
// if (t1 < t_max) t_max = t1;
// ... repeated for each axis ...

// PSRAYTRACING VERSION (SIMD-friendly):
// Compute all axes simultaneously using min/max
// Compiler can auto-vectorize this into SIMD instructions
// See src/AABB.cpp for full implementation with USE_BOOK_AABB_HIT guards

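The #ifdef USE_BOOK_AABB_HIT blocks (the compile-time definition that corresponds to the WITH_BOOK_AABB_HIT CMake option) let you compile either version and benchmark them against each other. On the creator's i5-7300U, this change contributed significantly to the overall 4x speedup.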
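For readers who want to see the two styles side by side, here is a self-contained sketch. The Ray and AABB structs and both function names are simplified stand-ins, not the repository's types, and whether the second form actually gets vectorized depends on your compiler and flags.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <utility>

struct Ray {
    std::array<double, 3> origin;
    std::array<double, 3> direction;
};

struct AABB {
    std::array<double, 3> lo;  // minimum corner
    std::array<double, 3> hi;  // maximum corner
};

// Book-style slab test: a conditional swap and an early exit on every axis.
inline bool hit_book_style(const AABB& box, const Ray& r, double t_min, double t_max) {
    for (int a = 0; a < 3; ++a) {
        const double inv_d = 1.0 / r.direction[a];
        double t0 = (box.lo[a] - r.origin[a]) * inv_d;
        double t1 = (box.hi[a] - r.origin[a]) * inv_d;
        if (inv_d < 0.0) std::swap(t0, t1);
        t_min = t0 > t_min ? t0 : t_min;
        t_max = t1 < t_max ? t1 : t_max;
        if (t_max <= t_min) return false;
    }
    return true;
}

// min/max formulation: no swap inside the loop, one comparison at the end.
inline bool hit_minmax_style(const AABB& box, const Ray& r, double t_min, double t_max) {
    for (int a = 0; a < 3; ++a) {
        const double inv_d = 1.0 / r.direction[a];
        const double ta = (box.lo[a] - r.origin[a]) * inv_d;
        const double tb = (box.hi[a] - r.origin[a]) * inv_d;
        t_min = std::max(t_min, std::min(ta, tb));  // latest entry so far
        t_max = std::min(t_max, std::max(ta, tb));  // earliest exit so far
    }
    return t_min < t_max;
}
```

Both functions give the same answer for finite inputs, because t_min only ever grows and t_max only ever shrinks: if the interval is empty after any axis, it is still empty at the end. The sketch does not handle rays that lie exactly on a slab plane with a zero direction component.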

Example 5: PCG Random & Per-Thread RNG Setup

The original random_* functions caused black speckle artifacts in multi-core renders due to thread-unsafe shared state. PSRayTracing solves this by creating a unique RandomGenerator per scanline, each seeded from a master RNG:

// Conceptual structure (see src/RandomGenerator.hpp)
#include <cstdint>
#include <random>
#include "pcg_random.hpp"  // pcg-cpp header from pcg-random.org

class RandomGenerator {
    pcg32 rng;  // PCG instead of Mersenne Twister: faster, better statistical quality
    std::uniform_real_distribution<double> dist{0.0, 1.0};
public:
    explicit RandomGenerator(std::uint64_t seed) : rng(seed) {}

    // pcg32 can drive the standard <random> distributions directly
    double random_double() { return dist(rng); }  // uniform in [0, 1)
    // ... other distribution methods ...
};

// In the render setup:
// master_rng generates unique seeds
// each scanline gets its own generator: RandomGenerator(master_rng())
// No shared state = no black speckles + better cache behavior

The PCG random number generator (from pcg-random.org) is a drop-in replacement that outperforms C++'s standard std::mt19937.
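The seeding step can be sketched on its own. To keep the example free of third-party headers it uses std::mt19937_64 as a stand-in for PCG, and the names ScanlineRng and make_scanline_seeds are hypothetical. Treat it as an illustration of the per-scanline idea, not the project's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Stand-in generator: std::mt19937_64 is used only so this sketch compiles
// without third-party headers. The project itself uses PCG.
class ScanlineRng {
public:
    explicit ScanlineRng(std::uint64_t seed) : engine_(seed) {}
    double random_double() { return dist_(engine_); }  // uniform in [0, 1)
private:
    std::mt19937_64 engine_;
    std::uniform_real_distribution<double> dist_{0.0, 1.0};
};

// Draw one seed per scanline from a single master generator, up front and on
// one thread, so the seeds do not depend on how rows are scheduled later.
inline std::vector<std::uint64_t> make_scanline_seeds(std::uint64_t master_seed, int height) {
    assert(height >= 0);
    std::mt19937_64 master(master_seed);
    std::vector<std::uint64_t> seeds(static_cast<std::size_t>(height));
    for (auto& s : seeds) {
        s = master();
    }
    return seeds;
}
```

A worker rendering row y would then construct ScanlineRng(seeds[y]) locally. A side benefit of this design is reproducibility: the same master seed gives the same per-row random streams regardless of thread count.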

Advanced Usage & Best Practices

Benchmarking Methodology

The creator's approach is exemplary: always use CMAKE_BUILD_TYPE=Release, run multiple trials, and report variance. When testing optimizations, disable CPU frequency scaling and close background applications. The WITH_* flags exist precisely so you can isolate variables.
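If you want to apply the same discipline to your own experiments, a tiny timing harness is enough. The sketch below is an assumption about how one might do it, not something taken from the repository; TimingStats, summarize, and time_trials are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <functional>
#include <vector>

struct TimingStats {
    double mean_seconds;
    double stddev_seconds;  // sample standard deviation (n - 1 in the denominator)
};

// Pure helper so the statistics can be checked independently of the clock.
inline TimingStats summarize(const std::vector<double>& samples) {
    assert(samples.size() >= 2);
    double sum = 0.0;
    for (double s : samples) sum += s;
    const double mean = sum / static_cast<double>(samples.size());
    double sq = 0.0;
    for (double s : samples) sq += (s - mean) * (s - mean);
    return {mean, std::sqrt(sq / static_cast<double>(samples.size() - 1))};
}

// Runs `work` `trials` times and times each run with a monotonic clock.
inline TimingStats time_trials(const std::function<void()>& work, int trials) {
    std::vector<double> samples;
    for (int i = 0; i < trials; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double>(stop - start).count());
    }
    return summarize(samples);
}
```

If the difference between two configurations is smaller than the standard deviation of either one, treat the result as noise and run more trials before drawing conclusions.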

Memory Layout Matters

The PDFVariant optimization for Book 3 scenes demonstrates a crucial principle: prefer stack allocation and std::variant over dynamic allocation and virtual dispatch when types are known at compile time. CosinePDF, HittablePDF, and MixturePDF all live on the stack, accessed via raw pointers—eliminating shared_ptr overhead entirely in the hot path.
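Here is a deliberately simplified sketch of the pattern. Only the names PDFVariant, CosinePDF, and MixturePDF echo the text above. The one-argument value() interface and the UniformSpherePDF type are assumptions made to keep the example short, and the real classes work on 3D directions.

```cpp
#include <cassert>
#include <cmath>
#include <variant>

constexpr double kPi = 3.14159265358979323846;

// Hypothetical, heavily simplified PDFs: each returns a density for a
// direction cosine instead of a full 3D direction.
struct CosinePDF {
    double value(double cos_theta) const {
        return cos_theta > 0.0 ? cos_theta / kPi : 0.0;
    }
};

struct UniformSpherePDF {
    double value(double /*cos_theta*/) const { return 1.0 / (4.0 * kPi); }
};

// Mixes two PDFs that live elsewhere (for example on the caller's stack),
// referenced through raw pointers instead of shared_ptr.
struct MixturePDF {
    const CosinePDF* a;
    const UniformSpherePDF* b;
    double value(double cos_theta) const {
        return 0.5 * a->value(cos_theta) + 0.5 * b->value(cos_theta);
    }
};

// The closed set of alternatives is known at compile time, so no heap
// allocation or virtual call is needed to hold "some PDF".
using PDFVariant = std::variant<CosinePDF, UniformSpherePDF, MixturePDF>;

inline double pdf_value(const PDFVariant& pdf, double cos_theta) {
    return std::visit([cos_theta](const auto& p) { return p.value(cos_theta); }, pdf);
}
```

The raw pointers are only safe because the pointed-to objects outlive the mixture. That holds naturally when everything lives in the same stack frame for the duration of one ray bounce.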

When Approximations Help (and When They Don't)

The trigonometric journey is instructive: a custom asin() approximation built on a formula from a 1960s calculus textbook yielded 10% gains, but Taylor-series replacements for std::sin()/std::cos() became indistinguishable in speed from the standard functions on newer hardware. Measure, don't assume. The atan2() bit-twiddling approximation remains valuable, however.
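"Measure" applies to accuracy as much as to speed. The sketch below uses a plain degree-7 Taylor polynomial, which is not necessarily the approximation the project tried, and checks how far it drifts from std::sin on [-pi/2, pi/2]. The names taylor_sin and max_abs_error are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Degree-7 Taylor polynomial for sin(x), intended for x in [-pi/2, pi/2].
// Written in Horner form so it needs only a handful of multiplies.
inline double taylor_sin(double x) {
    const double x2 = x * x;
    return x * (1.0 + x2 * (-1.0 / 6.0 + x2 * (1.0 / 120.0 + x2 * (-1.0 / 5040.0))));
}

// Largest absolute difference from std::sin over steps + 1 evenly spaced
// samples covering [-pi/2, pi/2].
inline double max_abs_error(int steps) {
    assert(steps > 0);
    const double half_pi = 1.57079632679489661923;
    double worst = 0.0;
    for (int i = 0; i <= steps; ++i) {
        const double x = -half_pi + (2.0 * half_pi) * i / steps;
        worst = std::max(worst, std::fabs(taylor_sin(x) - std::sin(x)));
    }
    return worst;
}
```

The error is largest at the ends of the interval, on the order of 1e-4. Whether that is acceptable, and whether the polynomial is actually faster than std::sin on your machine, are both things to measure rather than assume.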

The Deep Copy Secret Weapon

For multi-core renders, benchmark both ways: once with per-thread deep copying left on (the default) and once with it turned off via --no-copy-per-thread. The 20-30% improvement from deep copying isn't universal (scene structure matters), but it tends to pay off for complex hierarchies with many shared_ptr nodes.

Comparison with Alternatives

Feature	PSRayTracing	Original Shirley Code	rayrender (R)	Other C++ Ports
Language	Modern C++17	C++11	R	Varies
Single-core speed	~4x faster	Baseline	~5x faster (reported)	Typically 1-2x
Multi-core rendering	Built-in thread pool	None	Yes	Rare
Cross-platform GUI	Android, iOS, Desktop	None	None	Rare
Toggleable optimizations	CMake WITH_* flags	N/A	No	Rare
Mobile deployment	App Store / Play Store	No	No	No
PNG output	Native (stb_image_write)	PPM only	Varies	Varies
Code architecture	Interfaces, clean separation	Monolithic	Object-oriented	Mixed
Documentation depth	Extensive blog posts	Books only	README	Mixed

Why PSRayTracing wins: It's the only implementation that combines measurable educational value (toggleable optimizations) with production practicality (mobile GUI, multi-core, proper output formats). The rayrender project achieves similar speedups but targets R's ecosystem; PSRayTracing brings that performance culture to C++ where it can integrate with game engines, tools, and pipelines.

FAQ: Common Developer Concerns

Q: Is this suitable for beginners learning ray tracing? A: Absolutely, if you're comfortable with C++. The code follows Shirley's progression but adds modern practices. Start with the WITH_BOOK_* flags ON so the code paths match the book, then switch them OFF one at a time as you learn what each optimization does.

Q: Why C++17 specifically? A: std::variant (for PDFVariant), structured bindings, and improved constexpr support enable cleaner, faster code without the complexity of C++20 modules or concepts.

Q: Can I use this commercially? A: Yes—Apache 2.0 licensed. The only exception is src/third_party (check individual licenses). A shout-out to Benjamin is appreciated but not required.

Q: Why is multi-core scaling not perfectly linear? A: Scene graph pointer chasing and memory bandwidth limitations. The creator welcomes PRs improving the thread pool implementation.

Q: Does it support real-time rendering? A: No—this is a CPU path tracer. The creator notes GPU ports (CUDA/OpenCL/Vulkan) as future work that could enable real-time performance.

Q: How do I contribute the MSVC build? A: The creator explicitly requests help with MSVC Windows builds. Check the GitHub issues and submit PRs to the GitLab primary repository.

Q: What's the deal with rreal instead of float/double? A: A type alias defaulting to double for precision experiments. Interestingly, float performed worse in testing—possibly due to SIMD width or promotion overhead.
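A minimal sketch of how such an alias is typically wired up follows. The macro name USE_SINGLE_PRECISION_REAL and the mix() helper are assumptions for illustration. Only the rreal name comes from the project.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical macro name: define USE_SINGLE_PRECISION_REAL at compile time
// to switch every rreal in the program to float.
#ifdef USE_SINGLE_PRECISION_REAL
using rreal = float;
#else
using rreal = double;
#endif

// Code written against rreal never names float or double directly, so the
// precision experiment is a one-line change.
inline rreal mix(rreal a, rreal b, rreal t) {
    return (rreal(1) - t) * a + t * b;
}
```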

Conclusion: Your Renders Deserve Better

Peter Shirley's ray tracing series remains the gold standard for learning graphics programming—but the reference implementation was never meant to be your production renderer. PSRayTracing proves that with disciplined modern C++, thoughtful architecture, and measurement-driven optimization, you can preserve educational clarity while extracting genuine performance that matters.

The 4x single-core speedup isn't marketing fluff; it's documented, reproducible, and dissectable through CMake toggles. The multi-core rendering transforms overnight jobs into coffee-break renders. The cross-platform GUI puts a path tracer in your pocket. And every optimization—from AABB::hit() SIMD restructuring to stack-allocated PDF variants—comes with honest blog posts about what worked, what didn't, and why.

If you've been following Shirley's books and wondering "what's next," this is your answer. If you're teaching graphics and need students to see results in real time, this is your tool. If you're building a pipeline that needs reliable CPU rendering without the bloat of commercial renderers, this is your foundation.

Stop waiting. Start rendering. Grab the code, toggle some optimizations, and watch your CPU finally earn its keep.

👉 Star PSRayTracing on GitHub and join the community pushing CPU ray tracing forward.

Have you benchmarked PSRayTracing on your hardware? Found an optimization the creator missed? The project thrives on community contributions—share your results and help make CPU rendering faster for everyone.