This is a short story about how I became aware of the dangers of “careless preloading”, while learning a bit about memcached internals along the way.
A few years ago, while working on a high-traffic app on the Facebook platform, I ran across a caching bug. All of a sudden, our memcached servers had stopped working — kind of. Stuff could be read, but nothing could be added to the cache: all set operations failed. Actually, it was stranger than that: a few sets did go through.
I had just fixed a broken preload script (which had never actually worked, but nobody had ever noticed!) shortly before caching hell broke loose, so that script was my primary suspect. I decided to try the app in a test environment with and without that script, and bingo: without the preload, memcached worked normally, whereas running it reproduced the erratic cache behaviour we were experiencing in production.
So, what’s a preload, anyway? It’s a script that fills the cache with data beforehand, to avoid the poor performance of a cold cache while it gradually gets fed objects after each miss. I hadn’t written any preloads, but back then I didn’t stop to think whether they were a good or bad idea, either. And it turned out that ours were a bad idea, because any preload is a bad idea if done without proper consideration. Ultimately, it boils down to something that has long been known to be… the root of all evil. Yep, I’m talking about premature optimization.
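For concreteness, a typical preload is little more than a loop that reads rows from the database and writes them into the cache. Here’s a minimal sketch of the idea, assuming a pymemcache client and a hypothetical products table (the names and schema are made up, this is not our actual script):

```python
# A minimal sketch of a typical preload script, assuming a pymemcache client
# and a hypothetical "products" table; all names are illustrative.
import json

import pymysql
from pymemcache.client.base import Client

cache = Client(("127.0.0.1", 11211))
db = pymysql.connect(host="127.0.0.1", user="app", password="secret", database="shop")

with db.cursor() as cursor:
    cursor.execute("SELECT id, name, price FROM products")
    for row_id, name, price in cursor:
        # One set per row: this is the "filling the cache beforehand" part.
        cache.set(f"product:{row_id}", json.dumps({"name": name, "price": float(price)}))
```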
So, when I fixed that preload script (a trivial edit), it… uhm, started working. And it turned out that its job was to select a whole freakin’ table — about 1.5M rows — and load it into memcached. But, hey, that would be, at most, inefficient, right? In the worst case, it might displace useful data, replacing it with useless data, but things would fix themselves with usage, right? Wrong! Enter memcached slabs.
For speed and efficiency reasons, memcached has a custom memory manager built around “slabs”. Each slab can be assigned any number of 1MB “pages”, which are in turn split into equally sized “chunks”, each of which may hold an individual object. Each slab holds objects within a specific size range, starting at 88 bytes (I think) and growing exponentially in steps of 1.25x. So, for instance, there may be a 1280-byte slab, which contains any number of 1MB pages split into many chunks of 1280 bytes, each holding (unless empty) an object whose size (including key and flags) is under 1280 bytes and above 1024 bytes (the maximum allowed for the previous slab). When you perform a set, memcached looks at the object’s size, determines which slab it belongs in, and looks for a free chunk in one of that slab’s pages, assigning the slab a new page if needed (i.e. if all its pages are full, or if it has no pages at all because no objects of this size were stored before). And once a page is assigned to a slab, it stays assigned forever; it can’t be reassigned.
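To get a feel for how those size classes play out, here’s a rough sketch of the progression, assuming the 88-byte minimum and 1.25 growth factor mentioned above (both are tunable, and the exact defaults vary between memcached versions):

```python
# A rough sketch of how memcached derives its slab classes, assuming an
# 88-byte minimum chunk and a 1.25 growth factor (both are tunable, and
# exact defaults vary between versions).
PAGE_SIZE = 1024 * 1024  # 1MB pages

def slab_chunk_sizes(min_chunk=88, growth_factor=1.25):
    sizes = []
    size = min_chunk
    while size <= PAGE_SIZE:
        sizes.append(size)
        # The next class holds chunks ~1.25x larger, rounded up to an
        # 8-byte boundary (memcached aligns its chunks similarly).
        size = -(-int(size * growth_factor) // 8) * 8
    return sizes

for i, size in enumerate(slab_chunk_sizes(), start=1):
    print(f"slab class {i:2d}: {size:>7}-byte chunks, {PAGE_SIZE // size:>5} per page")
```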
That explains the problem. Our preload scripts were executed after we had to resize our cache pool, restart servers after security updates, restart the daemon after a mysterious crash, or migrate to a different EC2 instance type, so they acted on an empty cache (no pages assigned). And this script was storing 1.5M objects with sizes in a rather specific range, causing all or most of the pages to be assigned to a few specific slabs and leaving none or too few available for the rest. After a short while, the result was that, unless an incoming object’s size happened to fall into one of the few slabs that actually had pages, it was discarded. Regardless of the amount of empty space or stale data elsewhere, those objects didn’t make it.
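This kind of skew is easy to spot once you know to look for it: memcached will report, per slab class, how many pages it owns and how many chunks are in use. A quick sketch, talking to a hypothetical local instance over a raw socket via the “stats slabs” command:

```python
# Quick-and-dirty look at how pages are distributed across slab classes,
# using memcached's "stats slabs" command over a raw socket.
# Host and port are hypothetical; adjust for your own setup.
import socket

def stats_slabs(host="127.0.0.1", port=11211):
    with socket.create_connection((host, port)) as sock:
        sock.sendall(b"stats slabs\r\n")
        data = b""
        while not data.endswith(b"END\r\n"):
            data += sock.recv(4096)
    slabs = {}
    for line in data.decode().splitlines():
        # Per-slab lines look like "STAT 3:total_pages 12".
        if line.startswith("STAT ") and ":" in line:
            key, value = line[5:].split(" ", 1)
            slab_id, field = key.split(":", 1)
            slabs.setdefault(slab_id, {})[field] = value
    return slabs

for slab_id, fields in sorted(stats_slabs().items(), key=lambda kv: int(kv[0])):
    print(f"slab {slab_id}: chunk_size={fields['chunk_size']} "
          f"total_pages={fields['total_pages']} used_chunks={fields['used_chunks']}")
```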
So the fix consisted in just removing that preload. The fact that we had never noticed that this particular preload was broken hinted that it wasn’t really necessary after all. And after some investigation and testing, it turned out that in normal operation — that is, caching objects on demand — only around 35k of the 1.5M objects were stored in the cache during the first hour, and there was little to no performance impact during this ramp-up period.
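For the record, the on-demand behaviour that replaced the preload is just the plain read-through pattern. A minimal sketch, assuming a pymemcache client and a hypothetical load_row() helper standing in for the real database query:

```python
# A minimal sketch of on-demand (read-through) caching, assuming a pymemcache
# client; load_row() is a hypothetical stand-in for the real database lookup.
from pymemcache.client.base import Client

cache = Client(("127.0.0.1", 11211))

def load_row(key):
    # Placeholder for a real SELECT against the database.
    return b"row-for-" + key.encode()

def get_object(key):
    value = cache.get(key)          # try the cache first
    if value is None:               # on a miss, hit the database...
        value = load_row(key)
        cache.set(key, value, expire=3600)  # ...and cache it for next time
    return value
```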
My point is not that preloading itself is inherently bad. Storing those 1.5M objects upfront could have been a good idea in some other situation, but it wasn’t in ours. My point is: before preloading data — before optimizing anything — make sure it’s necessary. If it is, make sure you’re being selective enough to preload only useful data. And keep in mind that careless preloading may not just be useless or inefficient, but actively harmful, as there may be unforeseen side effects.