tl;dr
Consider playing with sysctl vfs.ffs.dirhash_maxmem to increase the maximum dirhash cache size.
OpenBSD filesystem foundation #
Under OpenBSD we have little to laugh about and not much fun when it comes to the filesystem subsystem. If we look into the kernel, we can divide the subsystem into three areas:
VFS (Virtual File System): The layer that makes all file systems look the same to programs; it is the kernel's interface to file systems. The focus of VFS activity is the vnode. A vnode is the in-memory representation of a file that VFS uses - a "virtual inode" that works for any file system.
UFS (Unix File System): The general Unix way of organizing files - the basic rules and structures. The focus of UFS activity is the inode. While the vnode lives in memory, the inode is an on-disk data structure that contains information about a file (size, permissions, where the data blocks are located).
FFS (Fast File System): The actual implementation that stores your files on disk, with OpenBSD-specific optimizations. FFS manages the physical disk layout by controlling inode placement and timing, while implementing storage structures like data blocks, superblocks, and cylinder groups for optimal performance (sigh).
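You can see the bottom of this stack on any stock install: mount(8) reports the filesystem type as ffs. The device name below is just an example from a typical single-disk setup:
$ mount
/dev/sd0a on / type ffs (local)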
Optimizations #
When it comes to optimizations, there is almost nothing that we as OpenBSD users
can do in this area. Since
g2k23 there is also no
more softdep
under OpenBSD. That was one of the last ways to squeeze some
performance out of the otherwise so defensive filesystem. So what can we do?
For a few days now I have been reading through the above-mentioned subsystem (I
have the greatest respect for anyone who can cope with this complexity; I
can't). Anyway, I stumbled across dirhash
. dirhash builds
an in-memory hash table for large directories, turning file lookups in
ufs_lookup
from a slow linear search (O(n)) into a fast constant-time (O(1)) operation.
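A rough way to see this in action, assuming /tmp sits on an FFS partition (the directory name and file count below are made up for the demo):
$ mkdir /tmp/dirhash-demo && cd /tmp/dirhash-demo
$ for i in $(jot 20000); do touch "f$i"; done
$ sysctl -n vfs.ffs.dirhash_mem      # dirhash memory use should have grown
$ time ls f19999                     # lookups here now go through the hash table
Once a directory crosses the minimum size (more on that below), every lookup in it is served by the hash table instead of a linear scan.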
That sounds great. Something like that is helpful, so how can we tune it? If
we look at the ufsdirhash_init
code, we see ufs_dirhashmaxmem
being set here, which
means that we initially start with a 5 MB cache. That sounds like a good idea,
doesn't it?
void
ufsdirhash_init(void)
{
	pool_init(&ufsdirhash_pool, DH_NBLKOFF * sizeof(doff_t), 0, IPL_NONE,
	    PR_WAITOK, "dirhash", NULL);
	rw_init(&ufsdirhash_mtx, "dirhash_list");
	arc4random_buf(&ufsdirhash_key, sizeof(ufsdirhash_key));
	TAILQ_INIT(&ufsdirhash_list);
	ufs_dirhashmaxmem = 5 * 1024 * 1024;	/* 5 MB cache limit */
	ufs_mindirhashsize = 5 * DIRBLKSIZ;	/* only hash dirs >= 2560 bytes */
}
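Given that initialization, an untuned system should report the 5 MB default. I would expect the following on a fresh install (my own systems below are already tuned):
$ sysctl vfs.ffs.dirhash_maxmem
vfs.ffs.dirhash_maxmem=5242880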
The good thing is that we can adjust this value using sysctl(8) and optimize it to our needs:
$ sysctl vfs.ffs
vfs.ffs.dirhash_dirsize=2560
vfs.ffs.dirhash_maxmem=52428800
vfs.ffs.dirhash_mem=4223611
I took a look at my OpenBSD systems, servers and desktops alike. On many of
them, dirhash_mem was very close to dirhash_maxmem
. On my desktop system, see above, I
have increased the value to 50 MB: I have a lot of big cvs/git repos and a lot
of files in my $HOME.
50MB? That’s actually far too much, isn’t it? How do I get to this value? I
think a good way is to increase the value considerably. For example 50MB (doas sysctl vfs.ffs.dirhash_maxmem=52428800
) and then run:
doas find / >/dev/null
That will visit every file on the system and execute ufs_lookup
internally
in the UFS layer, which should warm the cache used by ufsdirhash_lookup()
.
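If you want to watch the cache fill up while find(1) runs, a throwaway loop in a second terminal does the job:
$ while :; do sysctl -n vfs.ffs.dirhash_mem; sleep 5; done
The value climbs as directories get hashed and stops growing once it reaches dirhash_maxmem or the walk finishes.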
For one of my desktop systems:
$ sysctl vfs.ffs
vfs.ffs.dirhash_dirsize=2560
vfs.ffs.dirhash_maxmem=52428800
vfs.ffs.dirhash_mem=28483627
After a full find /
on my
gembox I can
see dirhash_mem=28483627
, about 27 MB. Now I could take this value plus a 10% buffer as the new maximum
value. Or I just leave it at 50 MB, because my system has 32 GB of RAM and even
reading this article in the browser requires more memory.
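If you go the measured route, the arithmetic is trivial. A minimal sketch, assuming a POSIX shell; the 10% headroom is just my rule of thumb, and appending the result to /etc/sysctl.conf makes it survive a reboot:
$ mem=$(sysctl -n vfs.ffs.dirhash_mem)
$ echo "vfs.ffs.dirhash_maxmem=$((mem + mem / 10))" | doas tee -a /etc/sysctl.conf
$ doas sysctl vfs.ffs.dirhash_maxmem=$((mem + mem / 10))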
Depending on your use case, something can be optimized here. You can also
consider running find /path
periodically to keep the cache warm, for example from cron(8). A minimal
sketch follows; the path and schedule are hypothetical:
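# hypothetical crontab(5) entry: re-walk the big repos hourly to keep the dirhash warm
0 * * * * find /home/user/src >/dev/null 2>&1
Again, it all depends on your use case. I hope you found this article interesting and helpful.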