Substituting one word for another in a 12-word seed phrase would result in an invalid seed phrase 15 times out of 16 on average; only 1 time out of 16 would the result pass the checksum.
Sorry for going off-topic, but how did you arrive at this percentage? SHA256 of the entropy is supposed to produce output in which the first 4 bits, or all the bits for that matter, are pseudorandom. I suspect you arrived at 16 by dividing the 256 bits returned by SHA256 by the 4-bit length of the checksum.
That is probably true for arbitrary input bits, but when you're just substituting a single word, you can only change up to 11 bits at once. And because input bits can't simply be flipped to change some SHA256 output bits in a predictable way, I think the probability of a checksum collision from word substitution is much, much lower than 1/16 on average, especially if the last word is the one being changed (7 input bits changed plus the entire checksum, two moving targets at once).
In fact, for a given checksum there may not even be a substitution in any single word that produces an equal checksum. Multiple word substitutions are a different story, and I can imagine at least a few collisions being found if many words are allowed to be replaced at once.
I didn't follow your explanation, so sorry if I am arguing a moot point, but here is why it is 1/16:
The checksum value in a 12-word phrase is 4 bits and it is constant, assuming that the last word is not variable. There are 16 possible computed checksums but only one matches the expected value, and since computed checksums are random, then 1/16 of the computed checksums will match the expected value.
Now, if the last word is variable, then the expected checksum is not constant. In this case, there are 7 bits that can be changed and the other 4 are the expected checksum and are determined. If you try all possible 2048 words, then for each value of the first 7 bits, only 1 out of 16 values for the last 4 bits will match the computed checksum.
Or, in general, in a random string of bits, of which the last 4 are expected to match the computed checksum for the other bits, only 1 of 16 of these random strings will be valid.
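For anyone who wants to check the 1/16 figure rather than take it on faith, here is a rough Python sketch. It assumes the standard BIP39 layout for 12 words (128 bits of entropy followed by the first 4 bits of its SHA256, split into twelve 11-bit word indices). The function names are made up for this post, and it works with word indices instead of the actual wordlist to keep it short.

```python
import hashlib

def checksum4(entropy):
    # First 4 bits of SHA256(entropy), as used for a 12-word phrase.
    return hashlib.sha256(entropy).digest()[0] >> 4

def make_indices(entropy):
    # 128 entropy bits + 4 checksum bits = 132 bits = 12 indices of 11 bits.
    bits = (int.from_bytes(entropy, "big") << 4) | checksum4(entropy)
    return [(bits >> (11 * (11 - i))) & 0x7FF for i in range(12)]

def is_valid(indices):
    # Reassemble the 132 bits, then compare the stored and computed checksum.
    bits = 0
    for idx in indices:
        bits = (bits << 11) | idx
    entropy = (bits >> 4).to_bytes(16, "big")
    return (bits & 0xF) == checksum4(entropy)

def valid_substitutions(indices, position):
    # Count how many of the 2047 other words at `position` still validate.
    count = 0
    for candidate in range(2048):
        if candidate == indices[position]:
            continue
        trial = list(indices)
        trial[position] = candidate
        if is_valid(trial):
            count += 1
    return count

if __name__ == "__main__":
    phrase = make_indices(bytes(range(16)))  # arbitrary example entropy
    for pos in range(12):
        n = valid_substitutions(phrase, pos)
        print(f"word {pos + 1}: {n} of 2047 substitutions pass the checksum")
```

For the last word the count is always exactly 127 (128 valid words, one for each value of the 7 entropy bits, minus the original). For the other eleven positions it should hover around 2047/16, or roughly 128, with some variation from phrase to phrase.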