I run roughly thirty live experiments and personalisations across six client brands at any given time, all on top of GrowthBook. GrowthBook itself is great — feature flags, a stats engine, a UI my product managers can navigate without me hovering. But out of the box it gives you an SDK and a hosted control plane, and that’s it. Everything around the edges — the per-client build pipelines, the way an experiment is structured on disk, the QA preview workflow, the scaffolding that turns a Jira ticket into a folder with the right shape — that’s stuff you build yourself if you want it. So I built it, kept building it, and the result is what I’m calling the Toolbox.

It’s a private mono-repo, and it’ll stay private — too much client-specific stuff in there to push public without a serious scrub. The TL;DR of where it sits today (2026):

  • Vite as the bundler, GrowthBook’s SDK on top, Biome for lint and format. Very few runtime dependencies on purpose.
  • Six active client brands — QLF, Toom, LAB, TSH, Plate, and one Mintminds-internal site — each with its own scoped build mode.
  • 33 live items in production right now; 173 archived across the same clients. (QLF on its own has racked up several hundred lifetime A/B tests across the legacy ra_framework era and the GrowthBook era combined.)

This didn’t appear from nowhere. I started CRO work properly in 2014, co-founded Mintminds in 2016, and spent a long stretch building ra_framework — a homegrown experimentation layer that did roughly what the Toolbox does today, only without GrowthBook underneath it. The migration to GrowthBook came later, on top of a rudimentary first version of this Toolbox plus a third-party script injector. The current shape is what’s left after several rounds of “make the next ten experiments cheaper than the last”.

The shape of an experiment, because the rest of this post makes more sense against that backdrop: each one lives in its own folder named for the Jira ticket, with a tiny entry file the platform calls into, a small module that does the actual DOM work, an about.md carrying the ticket brief, and a typed config describing targeting and tracking. Every experiment is structurally identical regardless of which client it belongs to — once you’ve read one of them, you’ve read all 200-odd.
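
For flavour, here’s that shape in sketch form. The file layout matches the description above; the field names and types are illustrative, not the literal schema:

```ts
// Hypothetical folder: src/experiments/qlf/QLF-1234/
//   index.ts    - entry file the platform calls into
//   changes.ts  - the actual DOM work
//   about.md    - ticket brief, pulled from Jira
//   config.ts   - targeting and tracking, typed roughly like this:

export interface TriggerSpec {
  type: "url" | "dataLayer" | "elementLoad" | "viewport" | "click" | "custom" | "hydration";
  options?: Record<string, unknown>; // e.g. a URL pattern or a CSS selector
}

export interface ExperimentConfig {
  key: string; // GrowthBook experiment key
  client: "QLF" | "Toom" | "LAB" | "TSH" | "Plate" | "Mintminds";
  triggers: TriggerSpec[];
  tracking: { goalEvents: string[] }; // events pushed on conversion
}

export const config: ExperimentConfig = {
  key: "qlf-1234-pdp-gallery",
  client: "QLF",
  triggers: [
    { type: "url", options: { pattern: "/product/" } },
    { type: "viewport", options: { selector: ".product-gallery" } },
  ],
  tracking: { goalEvents: ["add_to_cart"] },
};
```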

Triggers, and the hydration-timing handler I shouldn’t have had to write

GrowthBook’s native targeting evaluates against attributes you push into the SDK. That’s perfect for static facts (browser, country, B2B/B2C) but inadequate for “fire this experiment after the SPA finishes hydrating, on page X, only when Y is in the viewport”. So the Toolbox has its own trigger layer that sits above GrowthBook attributes: each trigger condition becomes an attribute the SDK can target on, once the condition is met.

The handlers cover the cases that come up over and over: URL and dataLayer predicates, element-load, viewport entry (via IntersectionObserver, never scroll listeners), click/hover events, custom events, and the painful one — framework hydration.
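
To make the mechanism concrete, the viewport handler looks roughly like this; `gb` is the GrowthBook SDK instance and the attribute naming convention is illustrative:

```ts
import { GrowthBook } from "@growthbook/growthbook";

// Once the element scrolls into view, flip an attribute the SDK can
// target on. An experiment whose targeting requires `trigger_gallery: true`
// therefore only evaluates after the condition has actually been met.
export function viewportTrigger(gb: GrowthBook, name: string, selector: string): void {
  const el = document.querySelector(selector);
  if (!el) return; // in the real stack, the element-load handler waits for it instead

  const observer = new IntersectionObserver((entries) => {
    if (entries.some((e) => e.isIntersecting)) {
      observer.disconnect();
      // Merge rather than overwrite, so the static attributes survive.
      gb.setAttributes({ ...gb.getAttributes(), [`trigger_${name}`]: true });
    }
  });
  observer.observe(el);
}
```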

The framework-hydration story is short and grim. Leaseabike’s storefront is Remix; any DOM mutation I made during the SSR-to-client handover was getting nuked by React’s reconciliation pass a few hundred milliseconds later. DOMContentLoaded was useless; load was both too late and not late enough; requestIdleCallback would happily fire mid-hydration on slow connections. After a couple of weeks of false starts, the handler now waits on three independent signals — one tied to React’s internal state, one to DOM stability, one to the browser actually being idle — and only continues when all three line up, with a safety timeout for the cases where they don’t. I’m being deliberately vague about the exact predicates: getting them right was hard-won and I’d rather not write the implementation manual for the next vendor.
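
The control flow, though, isn’t the secret part: three signals, all required, raced against a safety timeout. The predicates below are deliberately stubbed, and the timeout value is arbitrary:

```ts
// Control-flow sketch only. The three waitFor* predicates are stubs,
// because getting them right is the part that took two weeks.
declare function waitForFrameworkReady(): Promise<void>; // tied to React's internal state
declare function waitForDomStability(): Promise<void>;   // DOM has stopped churning
declare function waitForBrowserIdle(): Promise<void>;    // genuinely idle, not mid-hydration

export async function afterHydration(timeoutMs = 8000): Promise<"ready" | "timeout"> {
  const allThree = Promise.all([
    waitForFrameworkReady(),
    waitForDomStability(),
    waitForBrowserIdle(),
  ]).then(() => "ready" as const);

  // Safety valve: if the signals never line up, run anyway rather than
  // leaving the experiment stuck forever.
  const timeout = new Promise<"timeout">((resolve) =>
    setTimeout(() => resolve("timeout"), timeoutMs),
  );

  return Promise.race([allThree, timeout]);
}
```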

It’s not elegant. It is, however, the difference between an experiment that ships and an experiment that flickers and tracks zero conversions. Hydration timing is one of those things experimentation-platform vendors like to pretend they handle, and they don’t.

Triggers also serve a second purpose: gating enrolment. If an experiment shouldn’t apply to a user (wrong domain, prefers-reduced-motion, B2B flag, article number not in some lookup map), that predicate goes into the trigger stack, not into an early return inside the change file. The reason is statistical. A user who hits an early return inside the change function has still been counted as exposed by GrowthBook, and that pollutes the variant population. Encoding the predicate as a trigger keeps excluded users out of the stats entirely, which is the only honest way to do it.
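
The contrast in sketch form, with prefers-reduced-motion as the predicate:

```ts
import { GrowthBook } from "@growthbook/growthbook";

// BAD: GrowthBook has already logged an exposure by the time this runs,
// so the early return leaves an untreated user inside the variant stats.
function changeWithEarlyReturn(): void {
  if (window.matchMedia("(prefers-reduced-motion: reduce)").matches) return;
  // ...DOM work
}

// GOOD: the same predicate expressed as a trigger attribute. Targeting
// requires `motion_ok: true`, so excluded users are never enrolled and
// never appear in the analysis at all.
function registerMotionTrigger(gb: GrowthBook): void {
  const ok = !window.matchMedia("(prefers-reduced-motion: reduce)").matches;
  gb.setAttributes({ ...gb.getAttributes(), motion_ok: ok });
}
```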

The dev panel, the tracer, and the runtime profiler

When a stakeholder asks “can I see variant B without putting myself into an experiment cohort”, the answer in vanilla GrowthBook is “kind of, here’s a URL parameter, but it’ll bypass targeting and you’ll see the variant on every page regardless of the rules”. That’s not what most stakeholders actually want. What they want is “show me variant B exactly as it would render if I qualified for the experiment”.

So the Toolbox has an override layer that swaps the variant assignment but still respects every other targeting rule — triggers, audience, page conditions, the lot. You activate it from a dropdown in the dev panel (or via a URL flag), it persists in session storage until the tab closes, and a banner pinned to the bottom of the page reminds you which experiments are currently forced. It’s the single feature internal QA folks ask for by name.
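
Under the hood, the SDK’s setForcedVariations is one way to implement the swap, provided the trigger and condition checks stay in front of it, because forcing alone would skip the rules. A sketch, with the storage key and function names illustrative:

```ts
import { GrowthBook } from "@growthbook/growthbook";

const KEY = "toolbox.forcedVariants"; // hypothetical storage key

// Record a QA override. sessionStorage survives navigation within the
// tab and dies when the tab closes, which is exactly the lifetime we want.
export function forceVariant(experimentKey: string, variation: number): void {
  const forced: Record<string, number> = JSON.parse(sessionStorage.getItem(KEY) ?? "{}");
  forced[experimentKey] = variation;
  sessionStorage.setItem(KEY, JSON.stringify(forced));
}

// Called only for experiments whose triggers and page conditions have
// already passed. Only then is the assignment swapped, so the override
// behaves exactly like a qualified user who happened to land in variant B.
export function applyOverridesIfQualified(gb: GrowthBook, qualifiedKeys: string[]): void {
  const forced: Record<string, number> = JSON.parse(sessionStorage.getItem(KEY) ?? "{}");
  const active: Record<string, number> = {};
  for (const key of qualifiedKeys) {
    if (key in forced) active[key] = forced[key];
  }
  gb.setForcedVariations(active);
}
```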

The dev panel itself is more useful than I expected when I started writing it. It hosts the override dropdown, a debug-verbosity toggle, and a runtime profiler that captures DOM snapshots before and after the experiments run, records every trigger firing and every variant assignment with timestamps, and exports the lot as a JSON file. I keep the resulting profiles in the repo — they’re the closest thing I’ve got to a regression record for “did this experiment do what I thought it did on the actual production page?”.
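
The recording side is deliberately boring; roughly this shape, with the event model simplified:

```ts
interface ProfileEvent {
  t: number;                                   // ms since page start
  kind: "trigger" | "assignment" | "snapshot";
  detail: unknown;                             // trigger name, variant id, serialised DOM, etc.
}

const events: ProfileEvent[] = [];

export function record(kind: ProfileEvent["kind"], detail: unknown): void {
  events.push({ t: Math.round(performance.now()), kind, detail });
}

// Export everything captured so far as a downloadable JSON file.
export function exportProfile(filename = "profile.json"): void {
  const blob = new Blob([JSON.stringify(events, null, 2)], { type: "application/json" });
  const a = document.createElement("a");
  a.href = URL.createObjectURL(blob);
  a.download = filename;
  a.click();
  URL.revokeObjectURL(a.href);
}
```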

The trace logger that backs all of this is colour-themed against the operating system’s dark/light setting, prefixes every line with a relative timestamp from page start, and namespaces by module + function so it’s grep-able even in a console with two thousand messages from third-party scripts. I’ll happily admit I copied the prefix-style pattern from one of the older Mintminds projects, and it’s still my favourite small thing in the codebase.
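
The pattern itself is tiny, which is probably part of the appeal:

```ts
// Relative-timestamp, namespaced console prefix, colour-matched to the
// OS theme. The colours and the namespace format are illustrative.
const dark = window.matchMedia("(prefers-color-scheme: dark)").matches;
const style = `color: ${dark ? "#8be9fd" : "#005f87"}; font-weight: bold`;

export function trace(namespace: string) {
  return (...args: unknown[]): void => {
    const t = (performance.now() / 1000).toFixed(3); // seconds since page start
    console.log(`%c[+${t}s] [${namespace}]`, style, ...args);
  };
}

// Usage: const log = trace("triggers/viewport"); log("observer attached");
```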

Per-client adapters where they have to exist

The core of the Toolbox is framework-neutral. Trigger handlers, the event registry, storage utilities, declarative DOM helpers — none of those know what stack they’re running on top of. That’s deliberate. I run on Magento 2 + Hyva + Alpine.js for QLF, a React SPA for Toom, a Remix app for LAB, a React 16 SPA with react-helmet for TSH, plus more conventional setups for Plate and Mintminds. Wiring framework awareness into the core would have produced six versions of every utility within a year.

Some things have to be framework-aware, though, and those live either in the per-client folders or in the trigger handler that needs them:

  • LAB (Remix) gets the hydration-aware trigger described above. Without it, every DOM mutation was getting reconciled out from under us.
  • Toom’s market/locale system regenerates parts of the page on locale switch, so any DOM injection has to defend against React replacing its own children. The general shape: don’t fight the reconciler, watch for it, and re-apply (there’s a sketch of the pattern just after this list).
  • TSH uses react-helmet, so head-tag injection has to either route through helmet or get wiped on the next render. The TSH experiments have a small adapter for that.
  • QLF’s checkout runs on a server-driven hydration layer; cart-state mutations have to route through its input cycle to survive the next reconcile.
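
The Toom re-apply pattern, in sketch form. The real adapter watches a stable ancestor and throttles the re-injection, but the shape is this:

```ts
// Inject a node, then watch for React reconciling it away and put it back.
export function resilientInject(
  containerSelector: string,
  make: () => HTMLElement,
): () => void {
  const container = document.querySelector(containerSelector);
  if (!container) return () => {};

  let node = make();
  container.appendChild(node);

  const observer = new MutationObserver(() => {
    // React replaced its children and took ours with them: re-apply.
    if (!container.contains(node)) {
      node = make();
      container.appendChild(node);
    }
  });
  observer.observe(container, { childList: true, subtree: true });

  return () => observer.disconnect(); // cleanup hook for SPA navigation
}
```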

These adapters aren’t pretty and they don’t generalise across clients, because the failure modes don’t generalise. The job of the core is to give each adapter the smallest possible surface to hook into.

Scaffolding, MCP servers, and the live-reload loop

The most boring time-sink in CRO work is the gap between a Jira ticket landing and a developer actually writing the first line of code. Read the ticket, copy the variation description, find the design URL, create the folder, copy a previous experiment for shape, rename everything, register it. Twenty minutes of nothing.

A small CLI handles all of that. Pick the client, give it the ticket key, and it pulls the variation description and design URL straight off Jira, writes a templated brief into the experiment folder, registers the entry, and formats with Biome. By the time the CLI exits, the developer has a folder, a registered experiment, and a brief — straight from a Jira ticket to ready-to-code.
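
Whether it goes through the Jira MCP server or straight at the REST API, the fetch-and-write step is roughly this; the endpoint is Jira’s public REST API, while the paths and env var names are illustrative:

```ts
import { mkdir, writeFile } from "node:fs/promises";

// Pull the ticket, write the brief, leave the folder ready to code in.
export async function scaffold(site: string, client: string, ticket: string): Promise<void> {
  const auth = Buffer.from(`${process.env.JIRA_EMAIL}:${process.env.JIRA_TOKEN}`).toString("base64");
  const res = await fetch(
    `https://${site}.atlassian.net/rest/api/2/issue/${ticket}?fields=summary,description`,
    { headers: { Authorization: `Basic ${auth}` } },
  );
  const issue = await res.json();

  const dir = `src/experiments/${client}/${ticket}`;
  await mkdir(dir, { recursive: true });
  await writeFile(
    `${dir}/about.md`,
    `# ${issue.fields.summary}\n\n${issue.fields.description ?? ""}\n`,
  );
  // ...then copy the entry/change/config templates, register the entry, run Biome
}
```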

The CLI is the counterpart of the MCP-server side. I run six MCP servers: three Jira (one per active client) and three GrowthBook (same). The reason there are six and not two is that each one is isolated to a single client’s credentials and project — there is literally no path through the code that lets me accidentally create an experiment on the wrong account. I’ve written about why I run them this way in more detail. The dependency-update side has its own write-up — Dep Guardian is the weekly job that audits, fetches release notes, and uses an LLM to assess breaking-change risk before I get a notification.

There’s one piece of related infrastructure that doesn’t live in this repo at all but is essential to how I work: RequestLite, a Chrome extension I maintain separately. It rewrites production script URLs to localhost while I’m developing, which means I can edit a change file in WebStorm, hit save, and watch the change land on the actual production website with the actual production DOM in the time it takes the WebSocket reload to fire. That feedback loop is the single thing about this setup I’d defend the hardest. Experiments tested only in some staging environment lie to you about timing and lie to you about the tracker scripts, and this loop closes both gaps.
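
If you want to replicate the idea without the extension, an MV3 declarativeNetRequest redirect rule gets you the core of it. The URLs below are placeholders:

```ts
// declarativeNetRequest rule: swap the production bundle for the local
// dev server while the rule is active.
chrome.declarativeNetRequest.updateDynamicRules({
  removeRuleIds: [1],
  addRules: [
    {
      id: 1,
      priority: 1,
      action: {
        type: chrome.declarativeNetRequest.RuleActionType.REDIRECT,
        redirect: { url: "http://localhost:5173/toolbox.js" }, // Vite dev server
      },
      condition: {
        urlFilter: "||cdn.client-example.com/toolbox", // hypothetical production URL
        resourceTypes: [chrome.declarativeNetRequest.ResourceType.SCRIPT],
      },
    },
  ],
});
```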

What I haven’t sanded down yet

The honest open questions, since this isn’t a launch announcement and I’d rather not pretend:

  • Six per-client MCP processes are more than I want running idle on my laptop. The trade-off I keep going around on is whether to consolidate into a single router that switches credentials per request. The “literally cannot cross-mix accounts” guarantee is genuinely valuable — it has already caught one near-miss — and any router would have to recreate that guarantee at the application layer rather than at the OS process boundary. I haven’t written it yet.
  • A few architectural pieces are queued for the next major. Mostly about reducing observer overhead, making cleanup more robust under SPA navigation, and quieting the console once you’ve got thirty live experiments running. None of it is in the current version yet, and the first migration target will be recoveryarea.nl itself — which has zero existing experiments and therefore zero things I can break by getting it wrong.
  • The WebStorm Kotlin inspection I wrote to enforce GA4 tag-length limits at the IDE level no longer loads under WebStorm 2025.2 — JetBrains seems to have changed something about how custom Kotlin inspection scripts get compiled. The replacement Node lint script is on the TODO.
  • Test coverage on the core utilities is thin. I’m not proud of it. The things I touch most often have the most production exposure and the least automated coverage, and I’m running on the assumption that the production exposure does the work that unit tests would otherwise do. That’s a defensible position only as long as the next refactor doesn’t go wrong.

If you’re doing similar work and want to compare notes about hydration triggers, framework-neutral cores, or per-client MCP setups, I’m easy to reach.

Happy experimenting 🙂