The stupid-simple optimizations turned out to be effective. I also confirmed another bottleneck in the original code, one that caught my eye when I first saw it. I didn't change it then because I was less familiar with the code and performance wasn't yet a problem. Together, these changes let the latest version of the code run a lot faster than it used to, even with a heavier-weight algorithm running alongside the original.
I'll see how that does with today's roasting.