ZFS Documentation Is Bad for Beginners

I’m a bit annoyed at ZFS documentation, so I thought I’d write about it.

To say it a bit strongly, ZFS documentation is really bad for beginners.

Looking at this like a beginner:

Beginner questions:

  1. What’s a ZPOOL?
  2. What’s a vdev?
  3. How do you replace a drive that fails?
  4. What sort of maintenance do I have to do?

And that’s omitting the obvious but hard-to-answer question: how should I construct my system?

Let’s take that first question. As soon as a beginner asks it, they get introduced to the term vdev, which isn’t explained. Answer the question before introducing jargon. The official documentation is going to be our only reference because 3rd party documentation is hit-or-miss.

Here’s the official documentation answer:

A storage pool is a collection of devices that provides physical storage and data replication for ZFS datasets. All datasets within a storage pool share the same space. See zfs(8) for information on managing datasets.

Immediate questions:

  • What is a ZFS dataset?
  • I can infer that a ZPOOL is a ZFS storage pool. What’s the difference between physical storage and data replication – why are they listed separately?
  • What does it mean that datasets within a storage pool share the same space?

So, if I’m a beginner, I now have 3 new questions to replace the 1 that was answered.

So, now I’ll look at the concepts page, which explains what a vdev is, but generally does a poor job of explaining how a vdev relates to a zpool.

The first statement on the page about pools is:

A pool can have any number of virtual devices at the top of the configuration (known as “root vdevs”). Data is dynamically distributed across all top-level devices to balance data among devices. As new virtual devices are added, ZFS automatically places data on the newly available devices.

So, if I’m new, I just learned that disks are not, by default, directly part of zpools; a virtual device is used instead. That virtual device (as stated on the doc page) doesn’t have to be a single physical drive, but can be. I probably want a collection of disks, though, not just 1 disk. I’d now infer that data is striped across all vdevs in a pool. I’d also guess (without any real confirmation) that a pool is what a user would see when trying to access the storage in their OS.

However, then I read the text for mirror:

A mirror of two or more devices. Data is replicated in an identical fashion across all components of a mirror. A mirror with N disks of size X can hold X bytes and can withstand N-1 devices failing, without losing data.

So my question now is:

  • Can I change types in a vdev? If I have 2 drives, I probably want a mirror, but what if I get a 3rd drive and now RAIDZ is best?

I can understand that this document would assume knowledge of RAID. The RAIDZ documentation isn’t super clear here either:

A raidz group with N disks of size X with P parity disks can hold approximately (N-P)*X bytes and can withstand P devices failing without losing data.

That means, with 2 disks, I can have 0 parity disks meaning that raidz is a bad choice for 2 disks, but is good for 3. I would assume that means there’s some way to convert between them.
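To make the two quoted formulas concrete, here is a small Python sketch that only does the arithmetic from the mirror and raidz descriptions above. The function names and the 4 TB disk size are my own assumptions for illustration, and it does not check which layouts ZFS itself will actually accept.

```python
TB = 10**12  # decimal terabyte, as drive vendors count it


def mirror_capacity(n_disks, disk_size):
    """Mirror of N disks of size X: holds X bytes, survives N-1 failures."""
    if n_disks < 2:
        raise ValueError("the quoted text describes a mirror of two or more devices")
    return disk_size, n_disks - 1


def raidz_capacity(n_disks, disk_size, parity):
    """raidz of N disks of size X with P parity disks:
    holds approximately (N-P)*X bytes, survives P failures."""
    if not 0 <= parity < n_disks:
        raise ValueError("parity must be non-negative and smaller than the disk count")
    return (n_disks - parity) * disk_size, parity


if __name__ == "__main__":
    usable, failures = mirror_capacity(2, 4 * TB)
    print(f"2-disk mirror: {usable // TB} TB usable, survives {failures} failure(s)")

    usable, failures = raidz_capacity(3, 4 * TB, parity=1)
    print(f"3-disk raidz, 1 parity: about {usable // TB} TB usable, survives {failures} failure(s)")
```

With these numbers, going from a 2-disk mirror to a 3-disk raidz with 1 parity disk doubles the usable space while still surviving 1 failure, which is why the conversion question matters.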


Now let’s look at failure.

In order to take advantage of these features, a pool must make use of some form of redundancy, using either mirrored or raidz groups.

If I didn’t look carefully, I wouldn’t have known that 2 drives in a raidz grouping would have 0 redundancy. Seems like an easy correction to make: just say “raidz groups (with at least 1 parity drive)”.

So then we get to:

If a device is removed and later re-attached to the system, ZFS attempts to bring the device online automatically. Device attachment detection is hardware-dependent and might not be supported on all platforms.

I read that as: if I replace a drive, it’ll automatically get up to date. But the caveat I missed at first is that it probably means the same drive (like if a drive gets out of sync), not a replacement. The section on drive failure and recovery is incomplete and misleading. It should include a paragraph about replacing a drive that fails. So, I’ll do another search and get Oracle’s documentation. Oracle inherited ZFS when it acquired Sun, so I’d expect its documentation to be clear, and it is. So now I know there’s a command to trigger the rebuilding of the vdev, assuming it can be rebuilt.

So, I still have 5 open questions after answering 3, and I had to use 2 different sources of official documentation to get there. I think it’s fair to say it’s tricky. Here are the current questions:

  1. What’s a ZPOOL?
  2. What’s a vdev?
  3. How do you replace a drive that fails?
  4. What sort of maintenance do I have to do?
  5. What is a ZFS dataset?
  6. I can infer that a ZPOOL is a ZFS storage pool. What’s the difference between physical storage and data replication – why are they listed separately?
  7. What does it mean that datasets within a storage pool share the same space?
  8. Can I change types in a vdev? If I have 2 drives, I probably want a mirror, but what if I get a 3rd drive and now RAIDZ is best?


It was like having to take notes in a class to get these questions! Beginner-friendly means a single place that explains the concepts in a clear, jargon-free way. Then, when I want to see implementation, the jargon can come in. How hard is it to say something like:

Concept:

ZFS writes data across virtual devices. A virtual device has two elements: 1 or more disks or files, and a preset configuration of how those elements work together. A virtual device’s preset configuration [can be modified after creation for various benefits, including drive expansion / cannot be modified after creation, meaning that drive layout and other decisions need to be made carefully]. There are several commonly used preset configurations; see the table below for storage impact, how many drives can fail, etc. (assuming 4TB hard drives).
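The kind of table that paragraph has in mind could be generated straight from the mirror and raidz formulas quoted earlier. This is a rough Python sketch, not anything from the official docs: the list of layouts, the column names, and the helper names are all hypothetical, and the raidz figures are approximate, as the quoted text says.

```python
DISK_TB = 4  # follows the "4TB" assumption in the text

# Hypothetical example layouts: (kind, number of disks, parity disks).
# Parity is unused for mirrors, so it is None there.
LAYOUTS = [
    ("mirror", 2, None),
    ("mirror", 3, None),
    ("raidz", 3, 1),
    ("raidz", 4, 1),
    ("raidz", 4, 2),
]


def usable_and_failures(kind, disks, parity):
    """Apply the quoted formulas: a mirror holds X and survives N-1 failures;
    raidz holds approximately (N-P)*X and survives P failures."""
    if kind == "mirror":
        return DISK_TB, disks - 1
    return (disks - parity) * DISK_TB, parity


def format_table():
    """Return the table as text, one header line plus one line per layout."""
    lines = [f"{'layout':<8}{'disks':>6}{'usable TB':>11}{'can lose':>10}"]
    for kind, disks, parity in LAYOUTS:
        usable, failures = usable_and_failures(kind, disks, parity)
        lines.append(f"{kind:<8}{disks:>6}{usable:>11}{failures:>10}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_table())
```

Even a table this small would answer the storage and failure-tolerance questions in one glance, which is the point of the paragraph above.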

In 15 min, I can’t get even a high-level understanding of maintenance requirements from the official documentation.


If you look at the top 10 Google results for each of these questions, you’ll see my criticisms hold true.


I may choose to do something about this in future postings, but at minimum, I thought it reasonable to post this to hopefully comment on why things need to be better…and maybe how.

Yes, the header image is AI generated. It looks how I felt reading through the ZFS documentation, barraged with things that made no sense, so I thought it would be cute to include text that doesn’t exist.
