AI Library — Red Deer Investments

The HRM Fine-Tuning Journey

June 5, 2026 · Failing, Debugging, and Finally Getting Modest Improvement on an 8GB Mac Mini

A detailed account of fine-tuning HRM-Text-1B, a 1B-parameter hierarchical reasoning model, via QLoRA on an 8GB Mac Mini: 13 iterations, a silent gradient bug, catastrophic forgetting, a v6 breakthrough, and an honest conclusion about the ceiling of local fine-tuning on low-end hardware.

Running Local Models at Home

Updated June 5, 2026 · A Split Toolkit for Local Inference on an 8GB Mac Mini

A rigorous benchmark of five small models across 1,350 scored runs. Gemma 4 E2B leads overall at 82.2%, but the top three are clustered within 3.3 points — the real finding is that architectures are diverging, not converging. Which model you should use depends on which failure mode you can tolerate.

ZAYA1-8B: Observations on the Efficiency Frontier

May 22, 2026 · Intelligence Density, Termination Failure, and the Diffusion Bet

A 42-prompt evaluation of ZAYA1-8B, Nemotron 30B-A3B, and Gemma 4 26B-A4B that separates raw observations from architectural implications. It reveals a proof-of-concept whose broken governor masks an intriguing signal about where inference is heading.

Hardware & Inference

From the Gate Up

May 22, 2026 · How Chips Actually Work, from Logic Gates to AI Silicon

A journey through the hidden physics of AI hardware — starting with a single AND gate and building up to why GPUs, TPUs, FPGAs, and the human brain each look the way they do. Based on the Dwarkesh Podcast conversation with Reiner Pope.

The Waiting Game

May 19, 2026 · How Inference Economics Shapes the Future of AI

A journey through the memory wall, KV caches, batch economics, speculative decoding, and custom silicon — explaining why everything in AI slows down before it gets faster.

Beyond the Waiting Game

May 19, 2026 · How AI is Learning to Work Around the Memory Wall

The sequel to The Waiting Game, exploring what comes after inference economics: architectures and techniques that reshape how models think.

The Model That Does Everything

May 19, 2026 · What NVIDIA's Diffusion LM Means for Inference

A critical analysis of NVIDIA's Nemotron-Labs-Diffusion — what the three-mode diffusion model means for hardware, where it breaks, and what to build now.

Model Design

Against Backprop

May 22, 2026 · Why the Brain Can't Use AI's Best Algorithm, and the Search for What Comes Next

A narrative exploration of biologically plausible alternatives to backpropagation — from equilibrium propagation to predictive coding — and why closing the gap between AI and the brain unlocks fundamentally more efficient hardware.

SlimQwen Reference Book

May 19, 2026 · A Guide to Compressing and Optimizing Large Language Models

Practical techniques for shrinking giant AI brains: compression, mixture of experts, pruning, merging, and recovery training.