synthStation: when the LLM plays the synths, not the sound · HandsOn

Most AI music tools generate audio. You type "hardstyle, driving, dark" and get a WAV file. synthStation does something different: it doesn't generate sound, it generates playing instructions — and sends them to five real synthesizers sitting in the studio, actually making the noise.

The flow has three stages:

1. I type a prompt.
2. A local language model (Qwen3 27B on Zuse) writes structured performance JSON: notes, parameters, timing.
3. A realtime scheduler plays that JSON over USB-MIDI to the synth zoo.

The LLM composes. The scheduler times. That separation is sacred: the language model never does realtime work, or the timing falls apart.
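To make the split concrete, here is a minimal sketch of the scheduler side. The JSON field names (`bpm`, `tracks`, `beat`, `length`) and the functions `build_schedule` and `play` are assumptions made up for this illustration, not the schema or code synthStation actually uses. The point it shows: the model only hands over a static description, and everything time-critical happens in a small loop that waits against one fixed start time, so delays do not accumulate.

```python
import json
import time

# Hypothetical performance JSON; the field names are assumptions for this sketch.
PERFORMANCE = """
{
  "bpm": 150,
  "tracks": [
    {"device": "drumbrute", "notes": [
      {"beat": 0.0, "note": 36, "velocity": 120, "length": 0.25},
      {"beat": 1.0, "note": 36, "velocity": 120, "length": 0.25}
    ]},
    {"device": "mininova", "notes": [
      {"beat": 0.5, "note": 45, "velocity": 100, "length": 0.5}
    ]}
  ]
}
"""


def build_schedule(performance):
    """Flatten all tracks into (seconds, device, kind, note, velocity) events."""
    seconds_per_beat = 60.0 / performance["bpm"]
    events = []
    for track in performance["tracks"]:
        for n in track["notes"]:
            start = n["beat"] * seconds_per_beat
            end = (n["beat"] + n["length"]) * seconds_per_beat
            events.append((start, track["device"], "note_on", n["note"], n["velocity"]))
            events.append((end, track["device"], "note_off", n["note"], 0))
    # Sort by time; at equal times a note_off is sent before a note_on.
    events.sort(key=lambda e: (e[0], e[2] == "note_on"))
    return events


def play(schedule, send, now=time.monotonic, sleep=time.sleep):
    """Send each event at its offset from a fixed start time."""
    start = now()
    for event in schedule:
        delay = event[0] - (now() - start)
        if delay > 0:
            sleep(delay)
        send(event)


if __name__ == "__main__":
    # print stands in for a real MIDI output here.
    play(build_schedule(json.loads(PERFORMANCE)), print)
```

Passing `send`, `now` and `sleep` in as arguments keeps the loop testable without hardware: a fake clock and a list's `append` are enough to check ordering and timing.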

The guiding principles

A hobby project drifts if you let it. To counter the drift, I wrote down a few rules, and the most important one is:

No hardcoded sound recipes. The LLM designs the sounds.

There's no template in the code for a "good lead." The model knows Da Tweekaz, it knows what hardstyle sounds like — so I let it set the roughly 540 synth parameters itself, instead of encoding my own biases. Sound-design knowledge lives in the model, not the program.

Second rule: verified by ear. Every claim about the hardware has to be checkable by listening before it gets a checkmark. A MIDI mapping that "should" be right doesn't count. Only when the DrumBrute actually fires the kick on note 36 is it true.

The pitfalls, all paid for

Hardware lies differently than software. It doesn't say "error," it just does the wrong thing.

NRPN timing on the Mininova. Send parameter changes too fast back to back and the synth swallows half of them. The fix is an artificial slowdown when sending — ugly, but the only one that reliably sounds right.
DrumBrute pads aren't General MIDI. The pads map linearly to MIDI notes 36 to 45 instead of following the General MIDI drum map. I didn't learn that from the manual but from a MIDI listener that shows me live which pad fires which note.
Cross-platform venvs corrupt themselves. The local coder agent (Linux) and my Windows box initially shared the same working directory, including the .venv. Every sync wrecked the environment. Fix: separate clones, sync only through Git.
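The first two pitfalls can be sketched in a few lines. This is an illustration under stated assumptions, not the project's code: `nrpn_messages` builds the generic 14-bit NRPN sequence from the MIDI specification (CC 99, CC 98, CC 6, CC 38), and whether the Mininova wants the value in exactly this form is not something the post states. The 5 ms default gap is a placeholder, not a measured value, and the function names are hypothetical. `drumbrute_note` simply encodes the linear 36 to 45 layout described above.

```python
import time


def nrpn_messages(channel, param, value):
    """Return the four Control Change messages for one generic 14-bit NRPN write."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0..15")
    if not 0 <= param <= 0x3FFF or not 0 <= value <= 0x3FFF:
        raise ValueError("param and value must fit in 14 bits")
    status = 0xB0 | channel  # Control Change on the given channel
    return [
        (status, 99, param >> 7),    # NRPN number, MSB
        (status, 98, param & 0x7F),  # NRPN number, LSB
        (status, 6, value >> 7),     # Data Entry, MSB
        (status, 38, value & 0x7F),  # Data Entry, LSB
    ]


def send_params_paced(channel, changes, send, gap_seconds=0.005, sleep=time.sleep):
    """Send (param, value) changes with a pause between them.

    gap_seconds is an assumed placeholder; the right value has to be
    found by listening to the actual synth.
    """
    for index, (param, value) in enumerate(changes):
        if index > 0:
            sleep(gap_seconds)
        for message in nrpn_messages(channel, param, value):
            send(message)


def drumbrute_note(pad_index):
    """Map pad index 0..9 to MIDI notes 36..45 (linear, not General MIDI)."""
    if not 0 <= pad_index <= 9:
        raise ValueError("pad_index must be 0..9")
    return 36 + pad_index
```

The pause sits between parameter changes, not between the four messages of a single change, so one NRPN write always goes out as a unit.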

From proof of concept to product

For a long time this was a throwaway experiment in the tinkering folder. Not anymore — I pulled it out into a standalone repo and set it up like a small product: clean src layout, a single MCP server for all devices (set_param(device, name, value)), over a hundred tests, a lean web frontend with no build step.
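A single `set_param(device, name, value)` entry point works when every device is described by data and nothing device-specific leaks into the call itself. The sketch below shows that idea in plain Python, without any MCP library. Only the three-argument signature comes from the post; the device names, parameter names, controller numbers and the `ParamSpec` type are placeholders invented for the example. The table describes how to address a parameter, not what value sounds good, so it does not conflict with the no-recipes rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamSpec:
    kind: str      # "cc" or "nrpn"
    number: int    # controller or NRPN number
    minimum: int
    maximum: int


# Hypothetical registry; names and numbers are placeholders, not from any manual.
PARAMS = {
    ("synth_a", "cutoff"): ParamSpec("cc", 74, 0, 127),
    ("synth_a", "resonance"): ParamSpec("cc", 71, 0, 127),
    ("synth_b", "osc_shape"): ParamSpec("nrpn", 130, 0, 127),
}


def set_param(device, name, value, params=PARAMS):
    """Validate one parameter change and describe the message to send."""
    spec = params.get((device, name))
    if spec is None:
        raise KeyError(f"unknown parameter {name!r} on device {device!r}")
    if not spec.minimum <= value <= spec.maximum:
        raise ValueError(
            f"{device}.{name} must be in {spec.minimum}..{spec.maximum}, got {value}"
        )
    return {"device": device, "kind": spec.kind, "number": spec.number, "value": value}
```

Rejecting unknown names and out-of-range values at this boundary means a model that invents a parameter gets a clear error back instead of silently sending the wrong controller.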

The build mode is the same as for aiDoom: I'm the architect and reviewer, the local agent is the coder, and work happens in small, signed-off increments. The specs and build history live in the repo, so a future session can pick up exactly where the last one left off.

The best moment so far: five synthesizers in sync on a hardstyle jam — pad textures, a unison lead across two devices, an octave-pumping bass, four-on-the-floor from the drum machine. All from one typed sentence. The loop runs, and for a moment I forget I built the thing myself.