Fix DNF silently crashing on Fedora Server 36

     

To be fair, this is most likely not specific to Fedora, and definitely not to Fedora Server 36 in particular. The symptom is that when you try to run

dnf update

the command crashes so hard that you’re either logged out of SSH, or, if you’re logged in directly on the console, dumped back to an empty login screen, so you don’t even get the chance to read any output it might’ve produced.

Then you try the usual things:

dnf clean dbcache
dnf clean all

Still no luck. Then you remember there’s PackageKit, mostly used by desktop environments to connect their software UI with the package manager backend. It also has a CLI frontend called pkcon, so it may be worth a shot, right?

So let’s try to clean up the mess in a different way:

pkcon refresh force

Now let’s check for updates!

pkcon get-updates

NOPE:

The daemon crashed mid-transaction!

Riiight, we’re getting somewhere. Let’s see the logs… how about EL’s generic trashbin, /var/log/messages?

Nov 27 00:11:44 noobient systemd[1]: Starting dnf-makecache.service - dnf makecache...
Nov 27 00:11:46 noobient dnf[544776]: Fedora 36 - x86_64                               42 kB/s |  20 kB     00:00
Nov 27 00:11:47 noobient dnf[544776]: Fedora 36 openh264 (From Cisco) - x86_64        1.9 kB/s | 989  B     00:00
Nov 27 00:11:48 noobient dnf[544776]: Fedora Modular 36 - x86_64                       29 kB/s |  19 kB     00:00
Nov 27 00:11:48 noobient dnf[544776]: Fedora 36 - x86_64 - Updates                     25 kB/s |  17 kB     00:00
Nov 27 00:11:53 noobient systemd-oomd[333539]: Killed /system.slice/dnf-makecache.service due to memory used (959856640) / total (1011499008) and swap used (955424768) / total (1010823168) being more than 90.00%
Nov 27 00:11:53 noobient systemd-oomd[333539]: Killed /system.slice/dnf-makecache.service due to memory used (926289920) / total (1011499008) and swap used (950833152) / total (1010823168) being more than 90.00%
Nov 27 00:11:54 noobient systemd[1]: dnf-makecache.service: Main process exited, code=killed, status=9/KILL
Nov 27 00:11:54 noobient systemd[1]: dnf-makecache.service: Failed with result 'signal'.
Nov 27 00:11:54 noobient systemd[1]: Failed to start dnf-makecache.service - dnf makecache.
Nov 27 00:11:54 noobient audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dnf-makecache comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Nov 27 00:11:54 noobient systemd[1]: dnf-makecache.service: Consumed 5.423s CPU time.
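
If your log is noisier than mine, a filter along these lines pulls out just the oomd kills. This is only a sketch: oomd_kills is a name I made up, and it assumes the "Killed <unit> due to ..." message format seen in the lines above.

```shell
# Count how often systemd-oomd killed each unit, according to a syslog-style file.
# Assumes the "Killed <unit> due to ..." message format shown in the log above.
oomd_kills() {
    log_file="${1:-/var/log/messages}"
    grep -E 'systemd-oomd\[[0-9]+\]: Killed ' "$log_file" \
        | sed -E 's/.*: Killed ([^ ]+) due to .*/\1/' \
        | sort | uniq -c
}
```

Run it as oomd_kills /var/log/messages (as root, most likely) and you get one line per killed unit, prefixed with how many times it got shot.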

Are you fucking kidding me? DNF fails because… it’s eating up too much memory? For real? My system has 1G RAM and 1G swap, but that’s still not enough even to… CHECK for updates? Not to apply the updates, just to CHECK for updates. We’re literally talking about downloading a couple of XML files, and comparing their contents with local package state files.
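
For the record, here is how far over the 90% threshold from the log line we were. A quick sketch using the byte counts from the first oomd message; pct_used is just a helper name I picked for this.

```shell
# Percentage of a resource in use, to one decimal place.
pct_used() {
    awk -v used="$1" -v total="$2" 'BEGIN { printf "%.1f\n", used / total * 100 }'
}

# Byte counts copied from the first oomd line in the log above.
pct_used 959856640 1011499008   # memory: 94.9
pct_used 955424768 1010823168   # swap:   94.5
```

So both RAM and swap were sitting at roughly 95%, which is why oomd pulled the trigger.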

The big irony of all this is that DNF is actually being ported to C/C++, so I can’t even blame this on new hipster languages, it cannot be anything else but bad coding and/or memory leaks somewhere…

In any case, the problem has existed at least since Fedora 33, and clearly, things haven’t improved; if anything, they’ve regressed. The suggested mitigation was zram. Too bad my Fedora system is already on zram, with “only” 1G of it.

So let’s increase the zram, shall we? Just create /etc/systemd/zram-generator.conf with the following:

[zram0]
zram-fraction=2.0
max-zram-size=2048
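
What do those two numbers actually give you? As far as I understand it, the generator takes your RAM size times zram-fraction and caps the result at max-zram-size (in MB). Here's a rough sketch of that arithmetic; zram_size_mb is a made-up helper, and the min(ram * fraction, max) rule is my reading of the options, not something I pulled from the generator's source.

```shell
# Size (in MB) a zram device would get from the two settings above,
# assuming the generator computes min(ram * zram-fraction, max-zram-size).
zram_size_mb() {
    ram_mb="$1"; fraction="$2"; max_mb="$3"
    awk -v r="$ram_mb" -v f="$fraction" -v m="$max_mb" 'BEGIN {
        size = r * f
        if (size > m) size = m
        printf "%d\n", size
    }'
}

# 964 MB is roughly the 1011499008 bytes oomd reported as total memory.
zram_size_mb 964 2.0 2048    # 1928
zram_size_mb 4096 2.0 2048   # 2048
```

In other words, on my 1G box the cap never kicks in and the device ends up a bit under 2G.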

Then

systemctl restart systemd-zram-setup@zram0.service

And bam, DNF suddenly starts working! Easy.
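
If you want to double-check that the bigger device is really in use, zramctl or swapon --show will tell you. For something scriptable, here's a small sketch that sums up the zram entries in /proc/swaps; zram_swap_kib is a hypothetical helper name, and the optional path argument is only there so it can be tried against a sample file.

```shell
# Sum the size (in KiB) of all zram-backed swap devices.
# Reads /proc/swaps by default; the path argument exists only to ease testing.
zram_swap_kib() {
    awk 'NR > 1 && $1 ~ /^\/dev\/zram/ { total += $3 } END { print total + 0 }' "${1:-/proc/swaps}"
}
```

Calling zram_swap_kib with no arguments on the fixed system should print something close to 2 million KiB instead of the previous 1 million or so.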