By Andrew Bogott, Senior Site Reliability Engineer, Wikimedia Cloud Services
A couple of years ago I ran across a bug in OpenStack Neutron: I was trying to gather quota information for display in the WMCS tool “OpenStack Browser” and got the response “Only admin is authorized to access quotas for another tenant.”
Most API calls in OpenStack are governed by customizable role-based access controls, aka ‘RBAC.’ For some reason Neutron quotas weren’t; they were just hard-coded to require admin access. I wasn’t the first person to run into this. There was already a bug entered in the bug tracker and a pending fix.
The pending fix looked pretty good but was a bit out of date. I tuned it up, resubmitted an updated version, and waited for review. After a bit of back-and-forth, there was a delay while I waited for an answer to a question. In the meantime, EVERYTHING about RBAC was redesigned in OpenStack, leaving my patch totally broken. Months later (in response to prodding from another developer) I finally submitted an updated patch. There were some complications after that with people asking for tests, me adding tests, someone else deciding that maybe the tests were in the wrong place, the CI testing framework breaking, etc. etc. etc., but last week, my tiny patch was finally merged.
OpenStack has fairly rapid release cycles, so this patch will probably be included in an official release in the next couple of months. This will be OpenStack version ‘Victoria’. WMCS currently runs OpenStack version ‘Rocky,’ so we will need to upgrade our install 4 times (the release names are alphabetical) before we’re running the fixed version of Neutron in production.
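The count of four can be checked with a few lines of Python. This is only a sketch: the release list is limited to the names mentioned in this post, the helper name `upgrades_needed` is made up for illustration, and it assumes we step through one release at a time without skipping any.

```python
from typing import List

# Release names mentioned in this post, in alphabetical (and release) order.
RELEASES = ["Rocky", "Stein", "Train", "Ussuri", "Victoria"]


def upgrades_needed(current: str, target: str) -> List[str]:
    """Return the releases to step through, one at a time, to reach target."""
    start = RELEASES.index(current)
    end = RELEASES.index(target)
    if end < start:
        raise ValueError("target release is older than the current release")
    # Every release after the current one, up to and including the target.
    return RELEASES[start + 1 : end + 1]


path = upgrades_needed("Rocky", "Victoria")
print(len(path), path)  # 4 ['Stein', 'Train', 'Ussuri', 'Victoria']
```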
We install OpenStack using upstream Debian packages. Right now, our hypervisors are running Debian Stretch; the last version of OpenStack available for Debian Stretch is Rocky. Before we upgrade from Rocky to Stein we need to upgrade our virtualization hardware from Stretch to Buster. Typically, we don’t actually ‘upgrade’ hardware; instead, we wipe servers clean and reinstall with a fresh, empty OS. In the case of our hypervisors, though, there are many VMs (sometimes dozens) stored locally on each server. Wiping the servers would delete our users’ data, so a hypervisor upgrade is a huge, delicate pain in the neck.
Fortunately, we’re in the process of moving all VMs to distributed storage (Ceph), after which we can easily transfer VMs here and there to get them safely out of the way of OS upgrades.
To summarize the remaining steps:
- We need to finish moving VMs to Ceph
- So we can upgrade Hypervisors to Buster
- So we can upgrade our OpenStack to version Stein
- So we can upgrade our OpenStack version to Train
- At which point we’ll need to upgrade our OpenStack web interfaces to use version Train (Horizon is mercifully backwards-compatible so I don’t have to rebuild it with every release) and upgrade all our custom panels to handle whatever API changes have happened since Rocky…
- And then we can upgrade OpenStack to Ussuri
- So that we can upgrade OpenStack to Victoria
- At which point we can finally fix the Neutron quota issue in OpenStack Browser.
From the perspective of that one little bug, this is a terrible story! Of course, in reality, this chain of patches and upgrades intersects dozens of other issues and improvements, all stumbling forward together and occasionally knocking each other down or getting in each other’s way in the process. This Neutron issue is just the one lucky bug that got to be on the sidelines to watch it all.
I got to watch it too! The average tech worker keeps a given job for something like 18 months. It’s a genuine privilege to be in one place long enough to watch step after step of a plan come to life and sometimes get to close a bug that I opened years ago.