But you can avoid this by only using word counts that are multiples of three, right? (12, 15, 18, ...)
Correct. BIP39 specifies 12, 15, 18, 21, or 24 words, corresponding to 128 + 4, 160 + 5, 192 + 6, 224 + 7, and 256 + 8 bits of entropy + checksum (132, 165, 198, 231, and 264 bits in total). Each total is exactly divisible by 11, so every 11-bit segment maps directly to one word on the BIP39 word list.
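To make the arithmetic concrete, here is a minimal Python sketch, assuming the BIP39 rule that the checksum is the first ENT/32 bits of SHA-256(entropy). It turns entropy into 11-bit word indices and prints the entropy + checksum totals for each allowed size. The function names are hypothetical, and the code stops at indices (0 to 2047) rather than words, since the word list itself is not included here.

```python
import hashlib


def checksum_bits(ent_bits: int) -> int:
    # BIP39: the checksum is ENT / 32 bits long (4 bits for 128, ..., 8 bits for 256)
    return ent_bits // 32


def mnemonic_indices(entropy: bytes) -> list[int]:
    """Return the 11-bit word indices for the given entropy (hypothetical helper)."""
    ent = len(entropy) * 8
    if ent not in (128, 160, 192, 224, 256):
        raise ValueError("BIP39 entropy must be 128 to 256 bits, in 32-bit steps")
    cs = checksum_bits(ent)
    # Checksum = the first cs bits of SHA-256(entropy)
    digest = int.from_bytes(hashlib.sha256(entropy).digest(), "big")
    checksum = digest >> (256 - cs)
    # Append the checksum bits after the entropy bits
    combined = (int.from_bytes(entropy, "big") << cs) | checksum
    total = ent + cs
    assert total % 11 == 0  # always true for the five allowed sizes
    n_words = total // 11
    # Slice into 11-bit indices, most significant bits first
    return [(combined >> (11 * (n_words - 1 - i))) & 0x7FF for i in range(n_words)]


for ent in (128, 160, 192, 224, 256):
    cs = checksum_bits(ent)
    print(ent, "+", cs, "=", ent + cs, "bits ->", (ent + cs) // 11, "words")
```

For 16 zero bytes this yields eleven 0s followed by a 3, which in the English word list corresponds to the well-known "abandon ... abandon about" phrase.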
We don't have to use only 11 bits of entropy per word and a 4-bit checksum, though, since that combination only fits a 12-word phrase exactly, without any padding.
Sure, you can split up your entropy any way you like. You could split it into 8-bit chunks and encode against a word list of 256 words, or split it into 15-bit chunks and encode against a word list of 32,768 words, for example. But if you want to use the BIP39 word list of 2048 words, then you need to split it into 11-bit chunks.
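A generic version of that idea, as a minimal Python sketch: a w-bit chunk indexes a word list of 2^w words, and the only hard requirement is that the total bit length divides evenly by w (otherwise you need padding). The helper `split_into_chunks` is hypothetical and not part of BIP39 or any library; any checksum scheme for non-BIP39 widths is left out.

```python
def split_into_chunks(value: int, total_bits: int, width: int) -> list[int]:
    """Split a total_bits-wide integer into width-bit indices, most significant first."""
    if total_bits % width != 0:
        raise ValueError(f"{total_bits} bits do not divide evenly into {width}-bit chunks")
    count = total_bits // width
    mask = (1 << width) - 1  # each index selects one of 2**width words
    return [(value >> (width * (count - 1 - i))) & mask for i in range(count)]


# 120 bits can become 15 words from a 256-word list (8-bit chunks)
# or 8 words from a 32,768-word list (15-bit chunks).
data = int.from_bytes(bytes(range(15)), "big")  # 15 bytes = 120 bits
print(len(split_into_chunks(data, 120, 8)))   # 15
print(len(split_into_chunks(data, 120, 15)))  # 8
```

Note that 132 bits (128 + 4) would be rejected for 8-bit chunks, which is exactly why the chunk width and the checksum length have to be chosen together.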