Project
Automation orchestrator
Production FastAPI service at TrainWithMe that schedules and runs browser automations on top of WebGraph, with webhook intake and a review UI.
Live
- Python
- FastAPI
- WebGraph
- Cloudflare Workers
- AWS
Overview
A production automation server I designed and built at TrainWithMe. It’s the system that turns scheduled jobs and inbound events into actual browser work, composing the WebGraph framework underneath. This public writeup focuses on the shape of the system, not the workflows it runs.
The orchestrator is responsible for the parts WebGraph deliberately doesn’t own: triggering automations, retrying them, persisting their state, exposing them through an authenticated API, and giving operators a place to inspect what happened after the fact.
Shape of the system
- Authenticated FastAPI core. Every endpoint is gated; machine callers use an API-key header, and the service is the single front door for the automations.
- Scheduled job runner. A first-class scheduler component triggers recurring automations on top of WebGraph workers. Each run is tracked end-to-end, not fire-and-forget.
- Webhook intake fronted by a Cloudflare Worker. Inbound webhooks hit a Worker that authenticates them and forwards them to the origin through a queue. The origin only ever sees pre-validated traffic, and queue-side retries absorb origin outages instead of dropping events.
- Review UI mounted at runtime. A small operator UI is mounted by the server when present on disk. It’s the human surface for inspecting runs, rerunning failures, and reading captured artifacts.
- Shared infra layer. Logging, storage backends, retries, transactional email, and a session cache live in a separate Python package the server and the framework both consume.
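To make "tracked end-to-end, not fire-and-forget" concrete, here is a minimal stdlib-only sketch of a run record with retries. It is an illustration rather than the production code: `RunRecord`, `RunStatus`, `run_tracked`, and the `max_attempts` parameter are hypothetical names, and the job is a plain callable standing in for a WebGraph automation.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Callable


class RunStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class RunRecord:
    """Hypothetical per-run record; the real schema is not public."""

    automation: str
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    status: RunStatus = RunStatus.PENDING
    attempts: int = 0
    errors: list[str] = field(default_factory=list)
    started_at: datetime | None = None
    finished_at: datetime | None = None


def run_tracked(
    name: str, job: Callable[[], None], max_attempts: int = 3
) -> RunRecord:
    """Run `job`, recording every attempt instead of firing and forgetting."""
    record = RunRecord(automation=name)
    record.started_at = datetime.now(timezone.utc)
    record.status = RunStatus.RUNNING
    for _ in range(max_attempts):
        record.attempts += 1
        try:
            job()
        except Exception as exc:
            # Keep the failure with the run so an operator can read it later.
            record.errors.append(f"{type(exc).__name__}: {exc}")
        else:
            record.status = RunStatus.SUCCEEDED
            break
    else:
        # The loop finished without a successful attempt.
        record.status = RunStatus.FAILED
    record.finished_at = datetime.now(timezone.utc)
    return record
```

A real scheduler would persist the record before and after each attempt so that a crash mid-run still leaves a trace; this sketch keeps it in memory to stay short.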
Technical decisions worth talking about
- Framework / orchestrator split. WebGraph stays domain-agnostic, with no knowledge of schedules, customers, or persistence. The orchestrator owns all of that, so the framework can be reused without dragging product-specific state with it.
- Webhooks never hit origin raw. Putting a Worker in front kills two classes of pain at once: spoofed traffic and origin downtime. The cost is one extra hop; the benefit is that the origin can process events at its own pace while the queue smooths over restarts.
- Operator UI as a static mount, not a separate deploy. The review UI ships in the same image and is mounted by the server when present. One thing to deploy, one URL, no CORS dance, no extra auth surface.
- Per-run artifacts, not per-run logs alone. Captured pages, screenshots, and intermediate state are kept with the run record. The first question after “did it fail?” is always “what did it see?”, and the answer is one click away in the review UI.
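The "webhooks never hit origin raw" decision can be sketched as a validate-then-queue step. This is a simplified model under stated assumptions, not the deployed Worker: the writeup does not say how webhooks are authenticated, so the HMAC-SHA256 signature check is an assumption, the logic is shown in Python rather than Worker code to match the rest of the stack, and `EdgeIntake`, `sign`, `verify`, and `drain` are hypothetical names with an in-memory deque standing in for the queue.

```python
from __future__ import annotations

import hashlib
import hmac
from collections import deque
from typing import Callable


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body (assumed scheme, for illustration)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature: str) -> bool:
    """Constant-time comparison of the expected and supplied signatures."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected.encode(), signature.encode())


class EdgeIntake:
    """Stand-in for the edge layer: authenticate first, then queue."""

    def __init__(self, secret: bytes) -> None:
        self._secret = secret
        self.queue: deque[bytes] = deque()

    def receive(self, body: bytes, signature: str) -> int:
        """Return an HTTP-style status; unauthenticated events never queue."""
        if not verify(self._secret, body, signature):
            return 401
        self.queue.append(body)
        return 202

    def drain(self, deliver: Callable[[bytes], object]) -> int:
        """Deliver queued events in order; on origin failure, keep the rest."""
        delivered = 0
        while self.queue:
            body = self.queue[0]
            try:
                deliver(body)
            except ConnectionError:
                # Origin is down: leave the event queued for a later retry.
                break
            self.queue.popleft()
            delivered += 1
        return delivered
```

The point of the shape is that an event is only removed from the queue after the origin accepts it, so an origin restart delays delivery rather than losing the event.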
Status
In production at TrainWithMe.