The new algorithm looks like the best approach I've tried yet, though there's still some room for improvement that I'd like to explore before attempting to upstream my changes. I'm also tempted to do a thorough refactoring, but the program is still <1kLOC, so *shrug*