Notes on systemd Internals
systemd is the init system used by current versions of most major Linux distributions, including Debian, Red Hat, Ubuntu and Arch. One of its key features is reliable dependency management: it builds a dependency graph of services and sequences their startup so that unrelated services can start in parallel while dependent services start in the correct order.
Let’s walk through how systemd implements this. Of note - if you’re interested in what a modern C codebase that doesn’t care about backwards compatibility can look like, systemd is especially interesting.
[note: this ended up as more of a brain dump than anything else.]
At a high level, systemd does the following tasks in order:
1. loads unit files, potentially from disk (each unit type, such as service and mount, is asked to load all the files of that particular type, presumably so that logic specific to a unit type can run, such as reading the symlinks in a directory ending in .target.wants to update the Wants associated with a given target).
2. adds a job to run the default target, which recursively adds jobs for every dependency to be started. This is where the complexity lives.
3. enters an infinite loop, where the manager_loop function is invoked. From here, each work queue is dispatched, which results in dependencies getting started.
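To make the three steps above concrete, here is a toy Python model of the flow; it is not systemd’s actual C code. The unit table, add_start_jobs and run_loop are all hypothetical names invented for this sketch. For brevity it treats a wanted unit as also being ordered before the unit that wants it, which real systemd only does when an After= ordering is declared, and it assumes there are no dependency cycles.

```python
from collections import deque

# Hypothetical unit table: unit name -> units it wants started.
# In systemd this information comes from unit files and .wants directories.
UNITS = {
    "default.target": ["network.service", "sshd.service"],
    "sshd.service": ["network.service"],
    "network.service": [],
}

def add_start_jobs(unit, units, jobs):
    """Recursively record a start job for `unit` and everything it wants."""
    if unit in jobs:
        return
    jobs.add(unit)
    for dep in units[unit]:
        add_start_jobs(dep, units, jobs)

def run_loop(jobs, units):
    """Toy main loop: start a unit once everything it wants has started.

    Assumes the dependency graph has no cycles; otherwise this never ends.
    """
    queue = deque(sorted(jobs))
    started = []
    while queue:
        unit = queue.popleft()
        if all(dep in started for dep in units[unit]):
            started.append(unit)
        else:
            queue.append(unit)  # dependencies not ready yet, retry later
    return started

jobs = set()
add_start_jobs("default.target", UNITS, jobs)
order = run_loop(jobs, UNITS)
print(order)
```

The point of the sketch is only the shape of the flow: the set of jobs is computed up front from one root target, and the loop afterwards just drains a queue.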
manager_add_job
In manager_add_job, a transaction object is created and used as a staging ground to encapsulate all of the dependency management and ordering logic. If anything goes wrong (such as a dependency cycle that can’t be broken, or a dependency that is permanently masked), the transaction is aborted, an error is returned, and nothing is started or stopped.
Jobs are added recursively to the transaction by transaction_add_job_and_dependencies.
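The all-or-nothing behaviour is easier to see in a small sketch. The Python below is a simplified model of my own, not a translation of systemd’s code: TransactionError, add_job_and_dependencies and add_job are hypothetical names, and every dependency is treated as a hard requirement, so a masked dependency always aborts the transaction.

```python
class TransactionError(Exception):
    """Raised when the transaction cannot be built; nothing is applied."""

def add_job_and_dependencies(unit, units, masked, staged):
    """Recursively stage a start job for `unit` and its dependencies."""
    if unit in masked:
        raise TransactionError(f"{unit} is masked")
    if unit in staged:
        return
    staged.add(unit)
    for dep in units.get(unit, []):
        add_job_and_dependencies(dep, units, masked, staged)

def add_job(unit, units, masked, run_queue):
    """Build the whole transaction first; touch run_queue only on success."""
    staged = set()
    try:
        add_job_and_dependencies(unit, units, masked, staged)
    except TransactionError:
        return False  # transaction aborted: run_queue is left untouched
    run_queue.extend(sorted(staged))
    return True

units = {"web.service": ["db.service"], "db.service": []}

queue = []
ok = add_job("web.service", units, masked=set(), run_queue=queue)

failed_queue = []
failed = add_job("web.service", units, masked={"db.service"}, run_queue=failed_queue)
```

In the second call the masked dependency is discovered partway through the recursion, but because the jobs were only staged, the run queue never sees a half-built set of jobs.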
If the job mode is JOB_ISOLATE, then jobs are also added to stop all known units that aren’t already in the current transaction (which is how systemd effectively decreases the runlevel, for example going from graphical to console-based).
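As a rough illustration of the isolate step, the sketch below adds a stop job for every known unit that has no job in the transaction yet. The function add_isolate_jobs and the dictionary representation of a transaction are assumptions made for this example, and it ignores any exemptions the real code applies.

```python
def add_isolate_jobs(transaction, known_units):
    """Return a copy of `transaction` with a stop job for every known unit
    that does not already have a job in it."""
    result = dict(transaction)
    for unit in sorted(known_units):
        if unit not in result:
            result[unit] = "stop"
    return result

# Hypothetical example: switching from a graphical to a console-based system.
transaction = {"multi-user.target": "start", "sshd.service": "start"}
known = {"multi-user.target", "sshd.service", "graphical.target", "gdm.service"}

isolated = add_isolate_jobs(transaction, known)
```

Here graphical.target and gdm.service end up with stop jobs, while the units that were already part of the transaction keep their start jobs.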
transaction_activate is really the meat and potatoes of what we’re looking for. In this function, unnecessary jobs are removed (like jobs to start an already running service), an ordering is performed on all remaining jobs in the transaction, sanity checks are performed (for example: is this a transaction to put our computer to sleep, when we have jobs pending for a shutdown? We can’t do both!), and finally jobs are added to the queue by transaction_apply.
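Here is a minimal sketch of the prune, order and apply sequence described above, written with Python’s standard graphlib module. It is a stand-in technique, not systemd’s algorithm: the activate function and its arguments are hypothetical, the sanity checks against conflicting pending jobs are left out, and a cycle simply aborts the transaction, whereas systemd first tries to break the cycle.

```python
from graphlib import CycleError, TopologicalSorter

def activate(jobs, after, active_units, run_queue):
    """jobs: unit -> "start" or "stop".
    after: unit -> units that must be handled before it.
    """
    # 1. Drop jobs that would not change anything.
    needed = {
        unit: job
        for unit, job in jobs.items()
        if not (job == "start" and unit in active_units)
    }
    # 2. Order what is left; only edges inside the transaction matter.
    graph = {
        unit: [dep for dep in after.get(unit, []) if dep in needed]
        for unit in needed
    }
    try:
        ordered = list(TopologicalSorter(graph).static_order())
    except CycleError:
        return False  # ordering cycle: abort without queueing anything
    # 3. Apply: only now does anything reach the run queue.
    run_queue.extend((unit, needed[unit]) for unit in ordered)
    return True

jobs = {"web.service": "start", "db.service": "start", "network.service": "start"}
after = {"web.service": ["db.service"], "db.service": ["network.service"]}

queue = []
ok = activate(jobs, after, active_units={"network.service"}, run_queue=queue)

cycle_queue = []
cyclic = activate(
    {"a.service": "start", "b.service": "start"},
    {"a.service": ["b.service"], "b.service": ["a.service"]},
    active_units=set(),
    run_queue=cycle_queue,
)
```

In the first call network.service is already active, so its start job is dropped and only db.service and web.service are queued, in that order. In the second call the two units are ordered after each other, so nothing is queued at all.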
So, the gist of things is that all of the logic for figuring out what to start (or stop), and in what order, is done at once inside of a transaction. I can’t see how modeling each unit as an independent agent (a goroutine) that communicates over channels would make things less complex.