Yes, NixOS may have spotted this particular issue in the following two ways, in order:
1. Hydra’s CI via the ZFS integration test.
2. Evaluation of Nix code (assertions and errors before deployment) constraining what kernel modules are required in the initramfs.
If the issue is undetected and bypasses 1 or 2, each nixos-rebuild switch creates a new bootloader entry anyway, which allows you to boot from the previous configuration’s initrd/kernel. This means recovery from this scenario is as simple as selecting that old option from the bootloader after rebooting.
In short, issues have to bypass 1 and 2 first before they can affect the user, and when they do affect the user, they can often just roll back to the old configuration.
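The evaluation-time check in point 2 can be illustrated with a small NixOS module. This is only a sketch under stated assumptions: `maxKernel` is a hypothetical value chosen for illustration, and the real NixOS ZFS module may implement its compatibility checks quite differently. It relies only on the standard `assertions` option and `lib.versionOlder`.

```nix
{ config, lib, ... }:
let
  # Hypothetical value: the first kernel series the installed ZFS
  # release is NOT known to support. Adjust to your ZFS version.
  maxKernel = "6.7";
in
{
  assertions = [
    {
      # True only when the configured kernel is strictly older than maxKernel.
      assertion = lib.versionOlder config.boot.kernelPackages.kernel.version maxKernel;
      message = "Configured kernel ${config.boot.kernelPackages.kernel.version} may be too new for the ZFS module; refusing to build this configuration.";
    }
  ];
}
```

With a module like this imported, a failing assertion would abort the rebuild during evaluation, before any new boot entry is written.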
Something similar happened to me once. So I cobbled a solution which was an extra kernel+initramfs and grub entry for it.
It’d dump me in the initramfs with a shell and zfs so I could import my pool and roll to an earlier snapshot. Combine that with a pacman hook to snapshot on package transactions and I had a mostly bullet proof system.
These days zfsbootmenu exists and solves the problem much better than I did.
I’m wondering why you don’t run the zfs-linux package from the archzfs repository though. That would have prevented the issue and that is what is mostly recommended by the community.
https://wiki.archlinux.org/title/ZFS#General
I used to use zfs-linux, but since I only update/reboot about once a week, it was too annoying when the package and the kernel mismatched and thus blocked updating. archzfs shipping a suitable kernel itself could solve this. Or me just ignoring kernel/zfs updates in that case, but that doesn’t feel right. I kinda wanna look into keeping old kernel/initramfs around, since that seems generally useful if something breaks.
I personally separate out kernel updates and normal upgrades. It usually works fine even if it’s not recommended, but you need to be slightly more aware about changes and new entries in case there are changes to the initramfs generation.
#shuddausednix
no really.
Really not. Nix is not a silver bullet. And it’s not at all stable and established compared to even “unstable” distros like Arch. Yes, this was a stupid problem for Arch to have, but I’d rather have this problem than run Nix rebuilds for every other line of config I change, or figure out how to set up an unsupported config option the “right” way, or figure out which parts of the filesystem are even visible to whatever environment whatever broken thing is running and failing from. There are plenty of valid reasons not to use Nix.
That and most Linux distros support booting from the previous kernel version for exactly this reason. They have done so for years and years, long before Nix ever existed. You don’t need Nix to press the down arrow in grub before you boot.
NixOS does keep previous boot entries around, which the author specifically calls out as something that would have avoided the issue.
Yes so does almost every other Linux, as I said. That’s not an argument for Nix.
https://github.com/NixOS/nixpkgs/blob/master/nixos/tests/zfs.nix
Let me explain the #shuddausednix hashtag.
I should of mentioned, for those that have never seen a NixOS test, this is one of them.
Every time NixOS is built, it runs a series of integration tests. These are provided by NixOS itself, not tests that are part of the upstream package. NixOS does its own tests… Some things can only be spotted with full integration tests, rather like the post in question.
Upstream can’t run integration tests, because they don’t know where their code is going to be integrated. Often upstream does not even have tests :-(. The exception is the Linux kernel, but ZFS is out of tree… see the problem?
So, NixOS is not unbreakable; it breaks all the time because that is the nature of open source and software in general… more dependencies, more patches, more lines of code, just more… Software complexity increases. Breakages increase!
Nix can detect if a version bump of a lib, driver (if hardware is available), kernel, etc. breaks something… This catches a lot of stuff. NixOS does the testing, not our users.
NixOS is also ‘released’ and ‘rolling’ at the same time; you can pull packages from stable and unstable at the same time.
Pull from stable and unstable at the same time? You can’t do that… ‘WRONG’.
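Mixing stable and unstable packages, as claimed above, might look roughly like this in a configuration.nix. This is a sketch, not the only way to do it: it assumes a non-flake setup where the system’s own `pkgs` comes from a stable channel, and the binding name `unstable` and the two example packages are illustrative choices. The tarball URL is unpinned, so what it resolves to changes over time.

```nix
{ pkgs, ... }:
let
  # Assumption: fetch the current nixos-unstable branch as a second package set.
  # Unpinned on purpose for brevity; pin a commit for reproducibility.
  unstable = import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/nixos-unstable.tar.gz") { };
in
{
  environment.systemPackages = [
    pkgs.git          # from the system's (stable) package set
    unstable.ripgrep  # from nixos-unstable
  ];
}
```

Both package sets live side by side in the Nix store, so picking one package from unstable does not drag the rest of the system along with it.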
So, this test would of caught this problem, meaning the NixOS CI/CD would of spotted it.
Automated tests catch as much as possible; we don’t leave testing to the end user, like Microsoft, Arch and everyone else. We have so many bots running around, it’s like Terminator 2 and Wall-E combined.
Arch is fantastic, and I was a happy user for many years. But NixOS has the best end-to-end full integration tests of any operating system on earth right now.
It’s in the box, included with the operating system… :-)
If you’re interested in a CI/CD pipeline that builds an entire operating system, does full end-to-end integration testing, or any kind of testing, then take a look!
If you’re writing your own software, or deploying your own infrastructure, at home or at work, I strongly suggest you take a look. It could solve some of your problems, and give you new ones that are nicer to have.
Nix is kinda like Marmite: you either love the taste, or you hate it.
Also Nix/os comes with a money back guarantee.
I don’t like Marmite… it tastes like used tyres.
They can, though, run tests for a relevant subset of popular distributions. I believe the OpenZFS folks test their changes on a handful of Linux distributions and FreeBSD and so on as part of their CI environment.
They can, but can you? I can run the tests, because I have the CI/CD system on my system.
I don’t need their infrastructure to run the tests, because I can run the tests locally.
Does that make sense?
So I do the test where it is most needed, at the point of integration with the actual operating system.
Very good question, thanks for asking!!!
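To give a feel for what a locally runnable integration test looks like, here is a deliberately tiny sketch. It is not the upstream nixos/tests/zfs.nix linked earlier; the test name, the `hostId` value, and the checks are illustrative assumptions. It uses `pkgs.testers.runNixOSTest` from nixpkgs, which boots a VM and runs the Python test script against it.

```nix
{ pkgs ? import <nixpkgs> { } }:

pkgs.testers.runNixOSTest {
  # Hypothetical test name, chosen for this example.
  name = "zfs-module-loads";

  nodes.machine = { ... }: {
    boot.supportedFilesystems = [ "zfs" ];
    # NixOS requires a hostId when ZFS is enabled; this value is arbitrary.
    networking.hostId = "deadbeef";
  };

  testScript = ''
    machine.wait_for_unit("multi-user.target")
    # Fails the test if the ZFS module cannot be loaded for this kernel.
    machine.succeed("modprobe zfs")
    machine.succeed("lsmod | grep -qw zfs")
  '';
}
```

Saved as a file and built with nix-build, a test of this shape would run entirely on the local machine, which is the point being made above.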
Would have.
There is no such construction as would/should/could of in English. It’s meaningless. “Would of” is just a mishearing of “would’ve”, which is short for “would have”.
You may not consider this important. I do, and I really hate it. When I see it, it makes me think less of the person writing, and it makes me wonder that if they don’t understand this, maybe they don’t understand what they are writing about, and maybe what they are saying is wrong because they don’t know as much as they think that they do.
And such, in fact, this turned out to be. As later comments indicate, you were in fact wrong. But while I know Linux and I use ZFS and I am exploring Nix, and have written favourably about NixOS recently, I didn’t know enough to judge this comment for its content. Something else leapt out at me instead.
And it turns out, my feeling was right.
You might want to watch out for this.
Sorry, I have dyslexia.
I thought GRUB merged support for ZFS boot environments a while ago. These should be usable on any ZoL distro and let you roll back after an upgrade trivially.
Comments like this are inherently unhelpful, and considering a hashtag is what you have contributed in discussions beyond your own, it makes me question the intent of your engagement on this forum.
Suppose that I were to give the top-level comment, “Have you considered using ext4?” I’m not going to do this because the author indicated that this article is meant as a personal note, rather than an endorsement for ZFS.
To the community, this is a time for postmortem analysis. The author suffered a disastrous incident, and it’s worth reflecting on what happened so that it doesn’t happen again. It’s not facile or inherently unhelpful to point out that Nix can manage atomic kernel upgrades/downgrades, any more than it would be to point out that ext4 is maintained in-kernel and that it is a bad idea to depend on an out-of-tree kernel module for one’s root filesystem.
If that was the comment I was replying to, sure. However it literally just contains a meme.
Your replies are also memetic. In particular, you’ve replied to my analogy – a two-dimensional commuting structure which is well-pointed not just with the top-level example point (Nix) but with a second point (ext4) – with a thought-terminating dismissal. So, dilemma: are meme-only dismissive replies a reasonable mode of engagement? Hope this is interesting food for thought.
Hello, Your profile says ‘F/OSS Developer and Arch Linux Developer doing packaging and security. Interested in Golang, development and supply chain security.’
Nix can do a lot of great things related to software supply chain security…
I’ve given my full explanation why #shuddausednix is valid, so please take a look. I’d love your views.
Hopefully Arch can get an integration testing framework as good as Nix’s, so I can return to Arch one day. #shuddausedarch
What you forget is that, like Arch Linux, we have to deal with those and our maintainers are not very responsive (well, I became one and I don’t guarantee that I will be able to answer everything in a timely fashion, though I actively use ZFS on latest kernels).
As a NixOS maintainer (of ZFS) and developer, I don’t think it helps OP that much to hear these things.
Certainly, we have features that enable people to roll back, etc. In the end, ZFS is this hard to get right.
And stuff like https://github.com/NixOS/nixpkgs/pull/222946 is still open BTW.
Please do not belittle the difficulty of the task.
ZFS is out of tree; it’s hard.
Thanks for your hard work. I’m sorry that I misunderstood why Arch failed, and NixOS would of failed too. Tests don’t catch everything.
I reread the thread, I think I understand it now. I’m not sure who is belittling the task; it’s a complex task.
The fact you maintain both Arch and NixOS ZFS is a great undertaking. I reread the thread, I can’t see any belittling?
I’m thinking of porting Nix to illumos… ZFS is in tree there as far as I can tell.
And you get to use a Solaris kernel too, not the ‘kitchen sink’ Linux kernel.
Let’s get ZFS in the Linux kernel tree! :-)
The world would be much simpler if it was not for Oracle..
ZFS in the Linux kernel tree is called https://bcachefs.org/ IMHO :P.
NixOS hasn’t exactly been smooth sailing for ZFS for similar reasons though. When a kernel gets deprecated, Nixpkgs (rightfully) removes it a few weeks after. Since OpenZFS lags behind the kernels, this means that Nixpkgs will naturally drop back to the latest supported kernel for the latest stable ZFS modules (which ends up being LTS). In my case, prior to 6.1 LTS, the LTS would not even boot on my hardware. That said, Nix could at least allow you to do a rollback and troubleshoot this issue. But I would have been stuck on a very old Nixpkgs version had I not run off of a local fork that bumped the Unstable versions, since the listed ZFS maintainers for Nixpkgs aren’t responsive.
I took the maintenance of ZFS back. I am trying to keep up, but it’s definitely hard for every distribution that wants to have an up-to-date kernel.
That’s good to hear though. I’d prefer not to be a “please respond” guy when all I did was bump a commit, get a new hash, and saw that it built/ran.
I was thinking about doing that. What previously put me off was that nix is kinda slow (when I was trying nix as a replacement for AUR) and that the nixOS ISO didn’t even boot.
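As an Arch-to-Nix convert, I can say that switching worked wonders for me (I switched after I somehow lost my GPU drivers on Arch). To me, nix felt very much like the good parts of arch and then some.
Like Arch:
- you have a lot of control over the system: there’s no “by default, gnome is installed”; if you want something, you need to enable it via config yourself.
- you have a boatload of packages available. Graphs say nix has an insane lead here over everything else, including arch (https://repology.org/repositories/graphs). I don’t know how exact that count is, but it does seem believable.
- rolling release: nixos-unstable is a bit of a misnomer, it works perfectly as a desktop OS, and packages are updated very rapidly (and it’s also easy to submit updates yourself, or to use a fork of nixpkgs)
Unlike Arch:
- you can’t break nix, even if you handle it wrong. You can always just boot the previous system. Root is mostly read-only, so even Linux dummies like me can’t mess it up
- the state of the system is encoded in one short human-readable file; what’s not in that file is not on your system. NixOS is the first OS where I don’t need to re-install it from scratch once every couple of years just to get a clean state. Rather the opposite, my OS now outlives my hardware.
Of course, there are drawbacks as well:
- the logo is not so great
- there’s nothing close to the Arch wiki and the learning curve is steep
- NixOS is quite unlike other traditional distros, it’s very much “my way or the highway.”
- the tooling is not as polished or as fast as that of Arch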
I really liked all those aspects of Nix. I probably would have stuck with it if I could get it to run and build binaries that weren’t packaged for nix. I think it’s possible to do what I wanted with the fake filesystem hierarchy thing, but I could only get it to sorta work and I had no luck at all understanding the nix language.
Never heard that one before. Is it too complex compared to the Arch or Debian logos?
I flashed a USB with the NixOS live iso yesterday and it worked, if that matters.
Unless you’re getting into the weeds, the packages you need to build are probably in the binary cache, so they’re just downloaded instead of built.
As a NixOS user, I am inclined to believe reports of Nix being “kinda slow”: in my experience, not in downloading or building packages but in evaluating the Nix language and deciding what it needs to download or build. I suspect that package managers with less “Turing-complete” package definitions can be faster about that.
I don’t consider this a significant problem for me, but I understand it will bother some users.
A victim of its own success.
There are great efforts to improve this evaluation time. Hopefully evaluation time can be greatly reduced with Nix compiler optimisations.
There is a lot of software out there to build!!!! Let’s build all the things!
You could overlay + override the nix/nixUnstable packages to be recompiled using O3 and native architecture flags for its C++ compilation and you’d get an uptick in performance.
They mean actually evaluating the Nix code and generating the $out paths. If you know the path, you know what to fetch. This can take some time, but it has been optimised, and is being optimised even further for lightning-fast evaluations.