Notes on systemd Internals
June 9, 2016
systemd is the init system used by the current versions of most major Linux distributions, including Debian, Red Hat, Ubuntu and Arch. One of its key features is reliable dependency management: it builds a dependency graph of services and sequences their startup so that unrelated services can be started in parallel while dependencies are still started in the correct order.
Let’s walk through how systemd implements this. Of note - if you’re interested in what a modern C codebase that doesn’t care about backwards compatibility can look like, systemd is especially interesting.
[note: this ended up as more of a brain dump than anything else.]
At a high level, systemd does the following tasks in order:
1. Loads unit files, potentially from disk. Each unit type, such as mount, is asked to load all the files of that particular type, presumably so that it can apply logic specific to that unit type, such as reading the symlinks in a directory to find the Wants associated with a given target.
2. Adds a job to run the default target, which recursively adds jobs for every dependency to be started. This is where the complexity lives.
3. Enters an infinite loop in which the manager_loop function is invoked. From here, each work queue is dispatched, which results in dependencies getting started.
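The three steps above can be sketched roughly as follows. This is a toy model in Python, not systemd's C code: the unit table, `load_units`, `add_job` and `run_loop` are all made-up names, and the sketch assumes every dependency is also an ordering constraint, which is a simplification.

```python
from collections import deque

# Hypothetical in-memory stand-in for unit files: unit -> units it pulls in.
UNITS = {
    "default.target": ["network.service", "display.service"],
    "display.service": ["dbus.service"],
    "network.service": ["dbus.service"],
    "dbus.service": [],
}


def load_units():
    """Step 1: stand-in for reading unit files from disk."""
    return {name: list(deps) for name, deps in UNITS.items()}


def add_job(units, name, queue, seen):
    """Step 2: queue a start job for `name`, after jobs for its dependencies."""
    if name in seen:
        return
    seen.add(name)
    for dep in units[name]:
        add_job(units, dep, queue, seen)
    queue.append(name)


def run_loop(queue):
    """Step 3: dispatch the work queue.

    The real loop runs forever; this one stops when the queue is empty
    so that the example terminates.
    """
    started = []
    while queue:
        started.append(queue.popleft())
    return started


units = load_units()
queue = deque()
add_job(units, "default.target", queue, set())
started = run_loop(queue)
print(started)
```

The one property worth noticing is that a single request for the default target is enough to pull in everything else, with each dependency queued ahead of the unit that needs it.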
In manager_add_job, a transaction object is created and used as a
staging ground to encapsulate all of the dependency management and
ordering logic. If anything goes wrong (like a dependency cycle that
can’t be broken, or a dependency that is permanently masked), the
transaction is aborted, an error is returned, and nothing is started
or stopped.
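To make the all-or-nothing behaviour concrete, here is a minimal Python sketch of a staging transaction. The `Transaction` class, `add_jobs` and the masked-unit check are illustrative assumptions, not systemd's data structures; the point is only that the run queue is untouched unless every job is accepted.

```python
class TransactionError(Exception):
    """Raised when a job cannot be added to the transaction."""


class Transaction:
    """Staging area: jobs collect here and affect nothing until committed."""

    def __init__(self):
        self.jobs = []

    def add(self, unit, masked):
        if unit in masked:
            raise TransactionError(f"{unit} is masked")
        self.jobs.append(unit)


def add_jobs(units, masked, run_queue):
    """Stage every unit; extend run_queue only if all of them were accepted.

    Returns None on success or an error message if the transaction aborted.
    """
    tr = Transaction()
    try:
        for unit in units:
            tr.add(unit, masked)
    except TransactionError as exc:
        return str(exc)  # aborted: run_queue has not been modified
    run_queue.extend(tr.jobs)
    return None


run_queue = []
masked = {"b.service"}

err = add_jobs(["a.service", "b.service"], masked, run_queue)
queue_after_abort = list(run_queue)

ok = add_jobs(["a.service", "c.service"], masked, run_queue)
print(err, queue_after_abort, ok, run_queue)
```

Note that a.service was accepted before b.service failed in the first call, yet it never reaches the run queue. That is the property the transaction buys.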
Jobs are added recursively to the transaction. If the flag
JOB_ISOLATE is set, then jobs are also added to stop all
known units that aren’t already in the current transaction (which is
how systemd effectively decreases the runlevel, for example going from
graphical to console-based).
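A small Python sketch of the isolate behaviour described above, under the assumption that a transaction can be modelled as a dict from unit name to job type. `add_recursive`, `add_isolate_jobs` and the dependency table are hypothetical; the unit names are only sample data.

```python
def add_recursive(deps, unit, jobs):
    """Add a start job for `unit` and, recursively, for everything it pulls in."""
    if unit in jobs:
        return
    jobs[unit] = "start"
    for dep in deps.get(unit, []):
        add_recursive(deps, dep, jobs)


def add_isolate_jobs(known_units, jobs):
    """Add a stop job for every known unit that has no job in the transaction."""
    for unit in known_units:
        jobs.setdefault(unit, "stop")


# Sample dependency table: unit -> units it pulls in.
deps = {
    "multi-user.target": ["sshd.service", "dbus.service"],
    "graphical.target": ["multi-user.target", "display-manager.service"],
}
known = [
    "graphical.target",
    "multi-user.target",
    "sshd.service",
    "dbus.service",
    "display-manager.service",
]

# Isolate the console-based target while the graphical one is loaded.
jobs = {}
add_recursive(deps, "multi-user.target", jobs)
add_isolate_jobs(known, jobs)
print(jobs)
```

Everything reachable from the requested target gets a start job, and whatever is left over (here the graphical target and the display manager) gets a stop job.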
The function that runs next is really the meat and potatoes of what
we’re looking for. In this function, unnecessary jobs are removed (like
jobs to start an already running service), the remaining jobs in the
transaction are ordered, sanity checks are performed (for example: is
this a transaction to put our computer to sleep when we have jobs
pending for a shutdown? We can’t do both!), and finally the jobs are
queued to run.
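The sequence described here (prune, order, sanity-check, hand off) can be sketched in Python as below. This is an illustration under stated assumptions rather than systemd's algorithm: `activate` and its parameters are invented, the ordering uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+), and the conflict check is reduced to a lookup table.

```python
from graphlib import TopologicalSorter


def activate(jobs, after, running, pending, conflicts):
    """Turn a staged transaction into batches of jobs that can run in parallel.

    jobs:      unit -> "start" or "stop"
    after:     unit -> units that must be started before it
    running:   units that are already active
    pending:   units that already have a queued job elsewhere
    conflicts: unit -> units whose pending jobs it cannot coexist with
    """
    # 1. Drop jobs that would change nothing, such as starting a running unit.
    jobs = {u: kind for u, kind in jobs.items()
            if not (kind == "start" and u in running)}

    # 2. Order what is left. Only edges between units in this transaction
    #    matter; prepare() raises graphlib.CycleError if there is a cycle.
    graph = {u: [d for d in after.get(u, []) if d in jobs] for u in jobs}
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # mutually independent jobs
        batches.append(ready)
        sorter.done(*ready)

    # 3. Sanity check: refuse to fight with jobs that are already pending.
    for unit in jobs:
        clash = conflicts.get(unit, set()) & pending
        if clash:
            raise ValueError(f"{unit} conflicts with pending {sorted(clash)}")

    # 4. Hand the ordered batches back to the caller's run queue.
    return batches


jobs = {u: "start" for u in
        ["dbus.service", "network.service", "display.service", "default.target"]}
after = {
    "network.service": ["dbus.service"],
    "display.service": ["dbus.service"],
    "default.target": ["network.service", "display.service"],
}
batches = activate(jobs, after, running={"dbus.service"},
                   pending=set(), conflicts={})
print(batches)

try:
    activate({"sleep.target": "start"}, {}, set(),
             pending={"shutdown.target"},
             conflicts={"sleep.target": {"shutdown.target"}})
    rejected = None
except ValueError as exc:
    rejected = str(exc)
print(rejected)
```

Each inner list is a set of jobs with no ordering constraints between them, which is where the parallel startup comes from. The already-running dbus.service is pruned before ordering, so it does not hold anything up.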
So, the gist of things is that all of the logic for figuring out what to start (or stop) and in what order is done at once, inside of a transaction, and I can’t see a way where having each unit as an independent agent (goroutine) that communicates over channels makes things less complex.