Replicating My AI-Built Research Workstation

Created: June 13, 2026   Last Modified: June 13, 2026   Category: research, tools

Summary

Over time my research machine grew into a small fleet of AI coding agents — Claude Code, Codex, GitHub Copilot CLI, OpenCode, DeepSeek, Gemini, and a self-hosted bot called OpenClaw — wired together with a shared set of research skills: a Zotero library, a paper downloader, daily literature digests, multi-agent proof/manuscript reviews, and a SageMath sandbox. To make that setup reproducible (and survivable), the agents built their own backup-and-replication system: a public GitHub repo of sanitized configuration plus a single encrypted archive of secrets that, together, can rebuild the whole machine on a fresh Ubuntu box. This post is a short tour of what the system does, two or three real examples, how to try its basic functions for free in a GitHub Codespace, and which keys you would need to fully replicate it.

This is experimental. Everything below — and this very post — was generated by the multi-agent AI coding system it describes. It was built for one person’s workflow (combinatorics and graph theory, especially combinatorial reconfiguration), it assumes an arm64 Ubuntu host, and it may not work as expected on other machines, other agent versions, or research tasks outside those assumptions.

What this is

The system lives in three public repositories:

The motivation was never “back up dotfiles” — it was to keep a research workflow intact: ask a question, gather the literature, compute, verify, and write, with the agents doing the legwork. The rest of this post focuses on that workflow. Architecture details are in ARCHITECTURE.md.

What it does

The research workflow (the main motivation)

A non-trivial research question passes through visible gates instead of producing a single black-box answer. A short Research Brief scopes the question; a deep-research pass fans out web searches, fetches sources, and adversarially verifies each claim; and review/verification gates check the evidence before anything is called “done.”

Figure: vertical flowchart of the research workflow: research question → Research Brief → gather literature (Zotero, Calibre, getscipapers) → deep research with adversarial verification → compute and draw (SageMath, TikZ) → multi-agent review → verification gate → deliver.
The research workflow. The blue steps are the visible gates (the brief, the adversarial verification, the multi-agent review, and the final evidence check) that keep every claim sourced before anything is called “done”.

These flows live in the agent instructions and the multi-agent templates in the agent-group-discuss skill.
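As a rough illustration of the final evidence gate, here is a minimal Python sketch that refuses to deliver while any claim lacks a source. The `Claim` class and the function names are hypothetical and are not taken from the agent instructions or the agent-group-discuss templates.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str
    sources: list = field(default_factory=list)  # URLs or citations backing the claim


def verification_gate(claims):
    """Split claims into those with at least one source and those without."""
    sourced = [c for c in claims if c.sources]
    unsourced = [c for c in claims if not c.sources]
    return sourced, unsourced


def deliver(claims):
    """Refuse to call the work done while any claim lacks a source."""
    sourced, unsourced = verification_gate(claims)
    if unsourced:
        raise ValueError("unsourced claims: " + "; ".join(c.text for c in unsourced))
    return [c.text for c in sourced]
```

Raising an error, rather than silently dropping the unsourced claims, keeps the failure visible, which is the point of having a gate.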

Getting papers (Zotero first)

Document lookup follows a strict order — Zotero → Calibre → online — so a paper already in my 10,000-item library is never re-downloaded.
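The lookup order can be sketched as a simple fallback chain. This is an illustrative sketch only: the dictionaries stand in for the real Zotero and Calibre backends, and `find_document`, `fetch_online`, and the example DOIs are hypothetical names invented for the example.

```python
from typing import Callable, Optional

# A source maps an identifier (for example a DOI) to a local path, or None on a miss.
Source = Callable[[str], Optional[str]]


def find_document(identifier: str, sources: list) -> Optional[tuple]:
    """Try each (name, lookup) pair in order and stop at the first hit."""
    for name, lookup in sources:
        path = lookup(identifier)
        if path is not None:
            return name, path
    return None


# Hypothetical stand-ins for the real backends: plain dicts keyed by DOI.
zotero = {"10.1000/example.1": "/zotero/storage/example1.pdf"}
calibre = {"10.1000/example.2": "/calibre/example2.pdf"}
online_calls = []


def fetch_online(identifier: str) -> Optional[str]:
    online_calls.append(identifier)  # record that the network fallback was used
    return "/downloads/" + identifier.replace("/", "_") + ".pdf"


SOURCES = [("zotero", zotero.get), ("calibre", calibre.get), ("online", fetch_online)]
```

Because the loop returns at the first hit, `fetch_online` is never called for an identifier that Zotero or Calibre already holds.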

Ingestion is powered by a local Zotero Translation Server — a small Docker service (the same engine behind the Zotero browser connector) that turns a URL, DOI, or identifier into a fully-catalogued item with correct metadata. It’s what makes one-command “add this paper” work, and the rebuild restores it along with the library config (see INSTALL.md).
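For readers who have not used the Translation Server, the sketch below shows roughly what a request looks like. It assumes the server's documented `/web` and `/search` endpoints and its default port 1969; the port and host used on this machine are not stated here, and `build_request` and `translate` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:1969"  # assumption: translation-server's default port


def build_request(kind: str, value: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST for /web (a page URL) or /search (a DOI, ISBN, PMID, or arXiv ID)."""
    if kind not in ("web", "search"):
        raise ValueError(f"unknown endpoint: {kind}")
    return urllib.request.Request(
        f"{base_url}/{kind}",
        data=value.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def translate(kind: str, value: str) -> list:
    """Send the request and return the parsed JSON items (needs the server running)."""
    with urllib.request.urlopen(build_request(kind, value), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the Docker service up, `translate("search", "<a DOI>")` should return item metadata as JSON. Saving that item into a Zotero library is a separate step that this sketch does not cover.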

Daily arXiv / Semantic Scholar and RSS digests surface new papers on tracked topics. (Some download methods are a separate topic, discussed here.)

Multi-agent tasks

For work that benefits from independent perspectives, tasks are split across several agents that run in parallel within a round and hand off between rounds — the same idea as the proof example above, applied to other jobs:

There is also a SageMath sandbox for the small computations these tasks need — chromatic/Tutte polynomials, automorphism groups, exhaustive small-case checks — with ready-made templates such as reconfiguration_check.sage and counterexample_search.sage.
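To give a flavour of an exhaustive small-case check, here is a plain-Python sketch (not the contents of reconfiguration_check.sage, which are not shown in this post). It counts the connected components of the k-colouring reconfiguration graph of a small graph, where two proper colourings are adjacent when they differ at exactly one vertex. The function names are hypothetical.

```python
from itertools import product


def proper_colorings(n, edges, k):
    """All proper k-colourings of a graph on vertices 0..n-1, as tuples."""
    return [c for c in product(range(k), repeat=n)
            if all(c[u] != c[v] for u, v in edges)]


def reconfiguration_components(n, edges, k):
    """Return (number of proper colourings, number of reconfiguration components)."""
    colorings = set(proper_colorings(n, edges, k))
    seen, components = set(), 0
    for start in colorings:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:  # depth-first search over single-vertex recolourings
            cur = stack.pop()
            for v in range(n):
                for color in range(k):
                    if color == cur[v]:
                        continue
                    nxt = cur[:v] + (color,) + cur[v + 1:]
                    if nxt in colorings and nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
    return len(colorings), components


triangle = [(0, 1), (1, 2), (0, 2)]
path = [(0, 1), (1, 2)]
print(reconfiguration_components(3, triangle, 3))  # (6, 6): every colouring is frozen
print(reconfiguration_components(3, path, 3))      # (12, 1): one connected component
```

The brute-force enumeration grows as k^n, so this only suits very small cases. That is the kind of job where a Sage template or offloaded compute takes over.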

Figures (TikZ), Sage-assisted

Those computed structures usually have to become figures in a paper. The tikz-draw skill builds structural diagrams — finite graphs, gadgets, automata, trees, commutative diagrams — with a structure-first loop: figure brief → spec → render → verify-semantic → compile → review, so a diagram is checked against the structure it is meant to show rather than just compiled.
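The render and verify-semantic steps of that loop can be illustrated with a small Python sketch. The spec format, `render_tikz`, and `verify_semantic` are assumptions made up for this example; the actual tikz-draw skill may represent figures quite differently.

```python
import re


def render_tikz(spec):
    """Render a graph spec (node -> (x, y), plus an edge list) as a TikZ picture."""
    lines = [r"\begin{tikzpicture}"]
    for name, (x, y) in spec["nodes"].items():
        lines.append(rf"  \node[circle, draw] ({name}) at ({x}, {y}) {{}};")
    for u, v in spec["edges"]:
        lines.append(rf"  \draw ({u}) -- ({v});")
    lines.append(r"\end{tikzpicture}")
    return "\n".join(lines)


NODE_RE = re.compile(r"\\node\[[^\]]*\] \((\w+)\)")
EDGE_RE = re.compile(r"\\draw \((\w+)\) -- \((\w+)\);")


def verify_semantic(tikz, spec):
    """Check that the drawn nodes and undirected edges match the spec exactly."""
    drawn_nodes = set(NODE_RE.findall(tikz))
    drawn_edges = {frozenset(e) for e in EDGE_RE.findall(tikz)}
    want_edges = {frozenset(e) for e in spec["edges"]}
    return drawn_nodes == set(spec["nodes"]) and drawn_edges == want_edges


# A 4-cycle as a hypothetical spec.
c4 = {
    "nodes": {"a": (0, 0), "b": (1, 0), "c": (1, 1), "d": (0, 1)},
    "edges": [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")],
}
```

Reading the edges back out of the rendered source catches a figure that compiles cleanly but shows the wrong graph, for example one with a missing edge.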

Heavy compute, offloaded (Modal)

Some research steps are easy to parallelise but too heavy for a single box — enumerating all graphs up to some order to hunt for a counterexample, sweeping a parameter grid, or re-running a SageMath check over thousands of cases. The /research-compute skill routes those jobs to Modal through a small local broker, picking remote CPU, high-memory CPU, or GPU to fit the job: the agent packages the work, Modal spins up containers on demand, fans the work out, and streams results back. A search that would run for hours on the local box finishes in minutes across many workers, and you pay only for the seconds they actually run.
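The fan-out pattern itself is simple, and can be sketched locally without Modal. The example below is a stand-in, not the /research-compute skill or the Modal API: it uses Python's standard `concurrent.futures` with threads purely to stay self-contained, and the toy worker looks for counterexamples to the claim that n² + n + 41 is always prime. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def is_prime(m):
    """Trial division; fine for the small values used here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def check_chunk(chunk):
    """Worker: return every n in the chunk where n*n + n + 41 is not prime."""
    return [n for n in chunk if not is_prime(n * n + n + 41)]


def fan_out(cases, chunk_size, workers=4):
    """Split cases into chunks, run the worker on each in parallel, merge results."""
    chunks = [cases[i:i + chunk_size] for i in range(0, len(cases), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map yields results in chunk order, so the merged list stays sorted
        return [n for part in pool.map(check_chunk, chunks) for n in part]


counterexamples = fan_out(list(range(100)), chunk_size=10)
print(counterexamples[0])  # 40, since 40*40 + 40 + 41 = 1681 = 41 * 41
```

In a real offloaded run each chunk would go to a separate remote container rather than a local thread, but the shape is the same: package the cases, map a worker over chunks, and merge what comes back.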

Trying it (limited) in a GitHub Codespace

You can run a live, interactive replica in a GitHub Codespace without any of my secrets. Open the repo, choose Code → Codespaces → Create, and the container builds itself: it installs the software stack, renders all the configuration, and runs the health checks.

Full instructions and the honest caveats (a Codespace is amd64; my machine is arm64, so it’s a functional — not bit-identical — replica) are in CODESPACES.md.

Want a real arm64 box like mine? The machine this system runs on is an Oracle Cloud Ampere A1 (arm64) instance, and Oracle’s Always Free tier hands out one in the same family at no cost: up to 4 Arm cores and 24 GB of RAM (which you can split across as many as four small VMs) plus around 200 GB of block storage — enough to host the full arm64 replica rather than the amd64 Codespace approximation. Free-tier offerings change, so check Oracle’s current Always Free list before relying on the exact numbers.

Secrets and keys you’d need to replicate it

The public repos contain no secrets — only sanitized templates and the names of the keys. To replicate the system you would supply your own. The key thing to understand is that almost every secret exists only because the system talks to some external service I happen to use — so for several of them you would not need the same key at all, and might plug in a different service entirely. Here is why each category is needed, and where your choices would differ from mine:

Every key — where to obtain it, which file it lives in, and exactly what stops working without it — is documented key-by-key in SECRETS.md. Without them the system still installs and the Codespace still runs; it just degrades feature-by-feature, and make verify-secrets --degraded prints which feature each missing key disables.
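The degrade-by-feature idea can be sketched in a few lines of Python. This is not the project's verify-secrets implementation: the key names and feature labels below are invented placeholders (the real ones are listed in SECRETS.md), and `degraded_report` is a hypothetical function.

```python
import os

# Hypothetical key names and features, for illustration only.
KEY_FEATURES = {
    "EXAMPLE_ZOTERO_API_KEY": "library sync",
    "EXAMPLE_LLM_API_KEY": "agent CLI access",
    "EXAMPLE_COMPUTE_TOKEN": "offloaded compute",
}


def degraded_report(env=None):
    """Return {missing key: disabled feature} for every key that is unset or empty."""
    env = os.environ if env is None else env
    return {key: feature for key, feature in KEY_FEATURES.items() if not env.get(key)}


def main():
    for key, feature in degraded_report().items():
        print(f"{key} missing: {feature} disabled")
    # Degraded mode reports and carries on instead of failing the run.
    return 0
```

Returning success even when keys are missing is the design choice that lets a Codespace without secrets still pass its checks while stating which features are off.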

Caveats

This is a personal, experimental system, not a product. It targets a specific arm64 Ubuntu setup, pins specific tool versions, and bakes in assumptions about my research (reconfiguration problems, graph invariants, LaTeX manuscripts). Expect to adapt it. If you only want to look, the Codespace degraded mode is the safest way to poke around.

What’s actually worked

It is not all cautionary tales — the same logs record real wins, across several different agents:

When the agents get it wrong

Because the same agents wrote this system, it is only fair to show how they fail. The following are real incidents from this project’s own logs (spanning Claude Code, Codex, DeepSeek, and the self-hosted OpenClaw bot), not hypotheticals — and they are why the workflow above has so many explicit gates:

The pattern is the same across all of them: an agent sounded certain, or an automated check looked green, when it was not. That is exactly why the research workflow front-loads a visible brief, adversarial verification, independent multi-agent review, and a final evidence gate — and why this system is labelled experimental and meant to be supervised, not trusted blindly.

Tools it builds on

The system glues together a number of smaller tools, several of which have their own repositories:

It also relies on established third-party software — Zotero, SageMath, Calibre, Lean, and the AI agent CLIs themselves — installed and configured by the rebuild.

A note on how this was made

The three repositories, their documentation, the backup/restore machinery, the CI, the Codespace, and this blog post were all written by the same multi-agent AI coding system that the repositories back up and replicate — the agents building (and documenting) their own infrastructure. That self-referential loop is half the fun, and half the reason to treat it as experimental.