Chris Royse field notes

Five Megabytes Of Shakespeare Became A Training System

The Shakespeare case shows how a small corpus can become SFT examples, DPO pairs, graph edges, style centroids, and verification checks.

Media / PAPER + video / 11:49


Audience

Post-training researchers, evals leads, language model teams

Core idea

A small text corpus can be transformed into multiple supervision surfaces when the system extracts structure instead of just counting tokens.
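As a rough illustration of extracting structure rather than counting tokens, the sketch below parses speaker turns from a play-like text and derives both SFT records and DPO pairs from the same turns. Everything here is an assumption made for illustration: the `SPEAKER. line` format, the function names `parse_turns`, `build_sft`, and `build_dpo`, and the use of a mismatched line as the rejected response. It is not the pipeline described in the video.

```python
import random
import re

# Assumed format: a speaker name in capitals, a period, then the spoken line.
SPEAKER_LINE = re.compile(r"^([A-Z][A-Z ]+)\.\s+(.*)$")


def parse_turns(text):
    """Return (speaker, line) tuples for every line matching the assumed format."""
    turns = []
    for raw in text.splitlines():
        match = SPEAKER_LINE.match(raw.strip())
        if match:
            turns.append((match.group(1), match.group(2)))
    return turns


def build_sft(turns):
    """Turn each consecutive pair of turns into one prompt/completion record."""
    return [
        {"prompt": f"{prev_s}: {prev_l}\n{next_s}:", "completion": " " + next_l}
        for (prev_s, prev_l), (next_s, next_l) in zip(turns, turns[1:])
    ]


def build_dpo(sft_records, seed=0):
    """Pair each true completion (chosen) with a completion taken from a
    different record (rejected). A mismatched line is only a stand-in for a
    real negative example."""
    rng = random.Random(seed)
    pairs = []
    for i, rec in enumerate(sft_records):
        others = [
            r["completion"]
            for j, r in enumerate(sft_records)
            if j != i and r["completion"] != rec["completion"]
        ]
        if not others:
            continue  # no usable negative for this record
        pairs.append(
            {
                "prompt": rec["prompt"],
                "chosen": rec["completion"],
                "rejected": rng.choice(others),
            }
        )
    return pairs


# Illustrative sample only; the lines are not adjacent in the play.
SAMPLE = """\
HAMLET. To be, or not to be, that is the question.
OPHELIA. Good my lord, how does your honour for this many a day?
HAMLET. I humbly thank you; well, well, well.
"""

turns = parse_turns(SAMPLE)
sft = build_sft(turns)
dpo = build_dpo(sft)
```

The same parsed turns feed both outputs, which is the point of the sketch: one structural pass, several supervision formats. A real system would need stronger negatives than a shuffled line.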

Founder source

Style LoRA



This is the most intuitive demo of meaning compression: same source, more structured training material, more checks.


What to take from it

The videos are raw build context. These notes translate them into the shortest useful frame for creators, companies, and AI lab readers.

SFT, DPO, and verification can all come from one corpus pipeline.

High style fidelity needs negative examples, not just imitation.

The audit has to catch memorized artifacts like headers.
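The audit point can be made concrete with a small check that scans generated samples for header-like artifacts. This is a minimal sketch under stated assumptions: the patterns in `ARTIFACT_PATTERNS` (a publisher banner, act and scene headings) are guesses at what such artifacts might look like, and `audit_sample` and `audit_batch` are hypothetical names, not part of any tool mentioned in the note.

```python
import re

# Assumed artifact patterns; the note names headers only as one example.
ARTIFACT_PATTERNS = [
    re.compile(r"project gutenberg", re.IGNORECASE),
    re.compile(r"^\s*ACT\s+[IVX]+\b", re.MULTILINE),
    re.compile(r"^\s*SCENE\s+[IVX]+\b", re.MULTILINE),
]


def audit_sample(text):
    """Return the pattern strings that match one generated sample."""
    return [p.pattern for p in ARTIFACT_PATTERNS if p.search(text)]


def audit_batch(samples):
    """Map sample index to matched patterns, keeping only flagged samples."""
    report = {}
    for index, sample in enumerate(samples):
        hits = audit_sample(sample)
        if hits:
            report[index] = hits
    return report


samples = [
    "Shall I compare thee to a summer's day?",
    "ACT III\nSCENE I. A room in the castle.",
    "The Project Gutenberg eBook of the Complete Works",
]
report = audit_batch(samples)
```

Here `report` flags the second and third samples and leaves the first alone. Pattern matching of this kind only catches artifacts someone thought to list, so it would complement, not replace, a broader memorization check.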
