Debugging CI workflows with a miniature CI server

Tom Elliott

Version 0.3.0 of the Ocuroot SDK is taking its final shape, and I've been firmly in testing mode for the past few weeks. A key feature under test is Ocuroot's integration with GitHub Actions, which lets you run all your builds and deploys on GitHub Actions while Ocuroot manages state and orchestrates the non-linear parts of your release process.

Last week, I was working on a demo repo that created a Kubernetes cluster in staging and production environments, then started an app on top of them. All was going smoothly until I introduced an intent change, at which point the scheduling of Actions runs exploded.

A small sample of the unintentional Actions runs I created

With hundreds of runs created in just a few minutes, I quickly disabled Actions on the repo and considered my next move. Frustratingly, disabling Actions also meant I couldn't review the logs. In retrospect, a better fix would have been to revoke the personal access token (PAT) that was being used to automatically trigger these runs. In writing that sentence, I now realize why GitHub doesn't let events created with the autogenerated GITHUB_TOKEN (apart from workflow_dispatch and repository_dispatch) trigger new workflow runs, so automated triggering needs a PAT: it's a guard against exactly this kind of runaway recursion.

At this point, I had a few hunches about what might be going on, but to test them I'd need to recreate the sequence of events leading to the problem. Doing that manually would mean multiple pushes and waiting for the resulting Actions runs to complete. Even if I automated the sequence, it would be time-consuming. Plus, I'd need a safety mechanism to catch runaway scheduling before my account got flagged for abuse.

What I needed was something I could run locally, on demand, and ideally without waiting for jobs to reach the front of a cloud queue. I briefly considered setting up a local Jenkins instance, but quickly realized that integrating it with a local git server would be more involved than I'd like, not to mention the challenge of reproducing that setup for the delightfully meta scenario of testing CI workflows on CI.

All I really needed was a service that could check out a repo at a specific commit and run a shell command. That would be enough to simulate runs triggered by pushes to a repo (assuming I was the only one pushing), and with a REST endpoint I could use curl to trigger one run from another.
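The core of that idea fits in a few lines of shell. The sketch below is a hypothetical `run_job` helper, not minici's actual implementation: it clones a repository into a temporary directory, checks out a ref, runs a command from the checkout, and passes back the command's exit status. A real server would wrap this in an HTTP handler and record each job's status and logs.

```shell
#!/usr/bin/env bash
set -euo pipefail

# run_job REPO REF COMMAND
# Hypothetical helper (not part of minici): clone REPO into a fresh
# temporary directory, check out REF (a branch, tag, or commit hash),
# run COMMAND from the root of the checkout, then clean up.
# Returns COMMAND's exit status.
run_job() {
  local repo="$1" ref="$2" command="$3"
  local workdir status
  workdir="$(mktemp -d)"

  # Clone and check out quietly so stdout carries only the job's output.
  git clone -q "$repo" "$workdir/src"
  git -C "$workdir/src" checkout -q "$ref"

  # Run the command in a subshell so the caller's directory is unchanged.
  if (cd "$workdir/src" && sh -c "$command"); then
    status=0
  else
    status=$?
  fi

  rm -rf "$workdir"
  return "$status"
}
```

For example, `run_job /path/to/test/repo.git main ./run.sh` would run roughly the same job as the curl request further down, minus the HTTP layer and job bookkeeping.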

One really nice thing I discovered was that you don't even need a git server to test git workflows involving clones and pushes. You can create a "bare" repository in a local directory and use that directory's path as the remote when cloning.
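Here is what that looks like with plain git commands. This is a sketch of the general technique rather than Ocuroot's own test script; the directory names and the contents of `run.sh` are placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory for the experiment; all paths here are illustrative.
base="$(mktemp -d)"

# A bare repository has no working tree, so it can play the part of the
# git server. Setting the initial branch to main (needs git 2.28+) means
# later clones check out main by default.
git init -q --bare --initial-branch=main "$base/remote.git"

# Clone it using nothing more than its directory path as the remote URL.
git clone -q "$base/remote.git" "$base/work"

# Commit a trivial script and push it, exactly as with a hosted remote.
cd "$base/work"
printf '#!/bin/sh\necho "hello from CI"\n' > run.sh
chmod +x run.sh
git add run.sh
git -c user.name=Example -c user.email=example@example.com commit -q -m "Add run.sh"
git push -q origin HEAD:main
```

A path like `$base/remote.git` is the kind of value that presumably fits the `/path/to/test/repo.git` placeholder in the minici request further down.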

Finally, following on from my previous post, I wanted to keep scripting my end-to-end tests in bash.

So I set about writing an embarrassingly simple CI server in Go. After a couple of days, the result was minici, a teeny-tiny CI server that can be started locally to test workflows.

minici

This is an insanely niche use case, but I've open sourced the tool anyway for the heck of it!

In my bash-driven tests, I could start up minici with a single command:

go run github.com/ocuroot/minici/cmd/minici@latest -port "$CI_PORT" &
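Because the server runs in the background (and `go run` may first need to download and compile it), a script that calls curl straight away can race its startup. One way to guard against that, sketched here as a hypothetical `wait_for_url` helper, is to poll the job-list endpoint until it responds. This assumes a plain GET to `/api/jobs` succeeds once the server is up.

```shell
#!/usr/bin/env bash
set -euo pipefail

# wait_for_url URL [TIMEOUT_SECONDS]
# Hypothetical helper: poll URL until curl gets a successful response or
# the timeout (default 30 seconds) expires. Returns 0 on success, 1 on timeout.
wait_for_url() {
  local url="$1" timeout="${2:-30}"
  local deadline=$((SECONDS + timeout))
  # -s silences progress output; -f treats HTTP errors (4xx/5xx) as failures.
  until curl -sf -o /dev/null "$url"; do
    if (( SECONDS >= deadline )); then
      echo "timed out waiting for $url" >&2
      return 1
    fi
    sleep 0.5
  done
}

# Example (assumes CI_PORT is set and minici was started as above):
# wait_for_url "http://localhost:$CI_PORT/api/jobs" 120
```

A generous timeout is worth it on a cold module cache, where the first `go run` has to fetch and build minici before it can listen.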

With a background server in hand, I could schedule a run with curl:

curl -X POST http://localhost:$CI_PORT/api/jobs -H "Content-Type: application/json" -d '{"repo": "/path/to/test/repo.git", "branch": "main", "command": "./run.sh"}'

There are also endpoints to query the list of jobs (/api/jobs), get the status of a job (/api/jobs/{job_id}), and fetch a job's logs (/api/jobs/{job_id}/logs).

And when the tests were done, I could stop it by killing the process:

pkill -f "minici"
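`pkill -f` matches any process whose command line contains "minici", which can catch unrelated processes, and it is skipped entirely if the script aborts before reaching it. An alternative, sketched below under the assumption that the tests are a bash script, is to install the binary once, start it directly so that `$!` holds the server's own PID (with `go run`, `$!` is the PID of the `go` command, which runs the compiled program as a separate child process), and stop it from an `EXIT` trap. The `start_in_background` helper and the `BG_PID` variable are hypothetical names.

```shell
#!/usr/bin/env bash
set -euo pipefail

# start_in_background CMD [ARGS...]
# Hypothetical helper: run CMD in the background, store its PID in BG_PID,
# and register an EXIT trap that stops it when the script ends, whether the
# tests passed or failed. Supports one background process per script.
start_in_background() {
  "$@" &
  BG_PID=$!
  # Single quotes delay expanding BG_PID until the trap fires; wait reaps
  # the process so it is fully gone before the script exits.
  trap 'kill "$BG_PID" 2>/dev/null || true; wait "$BG_PID" 2>/dev/null || true' EXIT
}

# Example usage (needs network access and a Go toolchain):
# bindir="$(mktemp -d)"
# GOBIN="$bindir" go install github.com/ocuroot/minici/cmd/minici@latest
# start_in_background "$bindir/minici" -port "$CI_PORT"
```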

With these tools available, I was able to construct a test that reproduced the problems I'd seen in GitHub Actions. The test ran in seconds rather than minutes and gave me as much log output as I needed to get to the root of the problem.

And what was the problem? It turned out I needed to remove a * from a glob string. Yes, it was a one-character fix.

What's next?

I'm in the final stages of preparing the SDK v0.3 release, and my aim is to share a version you can try out very soon. In the meantime, you can follow Ocuroot on LinkedIn or Bluesky, or get in touch directly by booking a demo.