the only predictable inputs would be having the first byte (0x2 or 0x3)
No, there is much more to it than that. In secp256k1 we have a p-value and an n-value. Each of them is a prime number, and we work on numbers in the range from 0 to p-1 for public keys, and from 0 to n-1 for private keys. If we factor p-1 and n-1, we can see this:
https://sagecell.sagemath.org/
p=0xfffffffffffffffffffffffffffffffffffffffffffffffffffffffefffffc2f
n=0xfffffffffffffffffffffffffffffffebaaedce6af48a03bbfd25e8cd0364141
print(factor(p-1))
print(factor(n-1))
2 * 3 * 7 * 13441 * 205115282021455665897114700593932402728804164701536103180137503955397371
2^6 * 3 * 149 * 631 * 107361793816595537 * 174723607534414371449 * 341948486974166000522343609283189
Which means that not only can we use 02 and 03 as a prefix, but the number "three" can also be used, to get three different x-values leading to the same y-value. However, things don't stop there: by factoring n-1, we can use those factors to explore smaller circles, where the biggest one has only 341948486974166000522343609283189 elements. For secp256k1 this is hard to break (see the sketch after the secp160k1 example below), but for secp160k1 we have yet another interesting property:
p=0xfffffffffffffffffffffffffffffffeffffac73
n=0x0100000000000000000001b8fa16dfab9aca16b6b3
print(factor(p-1))
print(factor(n-1))
2 * 3 * 5 * 7 * 113 * 61588775277324185343602394973294691093621473
2 * 3 * 5 * 8837 * 42918291593381467397 * 128449012680369359431471
In this case, not only can we use two and three, but five as well, because it is a factor of p-1 and n-1 at the same time. For that reason, secp160k1 is a bit weaker than it would have been if that were not the case.
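To make the "three different x-values, same y-value" claim from above concrete, here is a minimal sketch in plain Python: because 3 divides p-1, there is a non-trivial cube root of unity beta mod p, and multiplying the x-coordinate of any curve point by beta (or beta squared) leaves the curve equation, and therefore the y-value, unchanged:
p = 0xfffffffffffffffffffffffffffffffffffffffffffffffffffffffefffffc2f
beta, g = 1, 2
while beta == 1:                      # find any non-trivial cube root of 1 mod p
    beta = pow(g, (p - 1) // 3, p)
    g += 1
assert beta != 1 and pow(beta, 3, p) == 1
# the generator of secp256k1, as an example point on y^2 = x^3 + 7
x = 0x79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798
y = 0x483ada7726a3c4655da4fbfc0e1108a8fd17b448a68554199c47d08ffb10d4b8
for xi in (x, beta * x % p, beta * beta * x % p):
    # all three x-values give the same right-hand side, so the same y fits all of them
    assert (pow(xi, 3, p) + 7) % p == y * y % p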
However, in general, what I am trying to say is that the private-to-public key mapping is not "random". There are specific algorithms behind it, just like there are different algorithms behind hash functions. Which means that if some public key is altered in a specific way, then bits are not shifted randomly from some values to some completely different values. Everything is deterministic, and if only some bits of the private key are altered, it is a very similar case to a hashed message that is only slightly changed here and there: we have many constants, and there are many computations which can be done once and reused. That is true in both cases: for hash functions, and for elliptic curve cryptography.
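As a rough illustration of that reuse, here is a minimal pure-Python sketch (affine coordinates, no libraries, no special-case handling, helper names are mine): since the mapping is k -> k*G, stepping the private key by one costs only a single point addition instead of a fresh scalar multiplication:
p = 0xfffffffffffffffffffffffffffffffffffffffffffffffffffffffefffffc2f
G = (0x79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798,
     0x483ada7726a3c4655da4fbfc0e1108a8fd17b448a68554199c47d08ffb10d4b8)

def add(P, Q):
    # affine point addition on y^2 = x^3 + 7 over F_p (no point-at-infinity handling)
    if P == Q:
        lam = 3 * P[0] * P[0] * pow(2 * P[1], -1, p) % p
    else:
        lam = (Q[1] - P[1]) * pow((Q[0] - P[0]) % p, -1, p) % p
    x = (lam * lam - P[0] - Q[0]) % p
    return (x, (lam * (P[0] - x) - P[1]) % p)

def mul(k, P=G):
    # plain double-and-add scalar multiplication, k > 0
    R = None
    while k:
        if k & 1:
            R = P if R is None else add(R, P)
        P = add(P, P)
        k >>= 1
    return R

k = 0x123456789abcdef
assert mul(k + 1) == add(mul(k), G)   # the next key is one point addition away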
And I am still surprised that we don't have a "modulo bias crowd" yet. Because if p-1 is divisible by 7, and n-1 is divisible by 8, then some "magic" can be done based on that fact. And instead of computing private and public keys modulo the p-value or n-value, different moduli can be used to approximate things, in a similar way to what the "prefix crowd" tries to do with hash functions.
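For completeness, a minimal sketch of the divisibility facts behind that idea (the helper below is just for illustration): because the multiplicative group mod p is cyclic of order p-1, every prime factor q of p-1 yields a subgroup of exactly q elements, and the same holds for n-1 mod n:
p = 0xfffffffffffffffffffffffffffffffffffffffffffffffffffffffefffffc2f
n = 0xfffffffffffffffffffffffffffffffebaaedce6af48a03bbfd25e8cd0364141
assert (p - 1) % 7 == 0       # 7 divides p-1
assert (n - 1) % 2**6 == 0    # 64 (and therefore 8) divides n-1

def element_of_order(q, modulus):
    # the multiplicative group mod a prime is cyclic, so g^((modulus-1)//q)
    # has order exactly q whenever it is not 1 (q prime)
    g = 2
    while True:
        h = pow(g, (modulus - 1) // q, modulus)
        if h != 1:
            return h
        g += 1

h = element_of_order(7, p)
assert h != 1 and pow(h, 7, p) == 1   # a "circle" of just 7 elements mod p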
If some useless dependency between these steps existed, the compiler would have removed it already.
Existing compiler optimizations are often great, but they are not perfect. In SHA-256, you have the ARX model, where three main operations are combined: Addition, Rotation, and Xor. If you use only two of these three, many compilers won't optimize it. By having addition and rotation alone, things can be trivially broken, but compilers will execute all steps anyway, and won't optimize them away automatically. The same is true for having only rotation and xor, where you only have to trace which bit is xored with which other bit to trivially break that combination.
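As a small sketch of why rotation plus xor alone is traceable: that combination is linear over GF(2), so every output bit is a fixed xor of specific input bits. Here is SHA-256's small sigma0 (two rotations, a shift, and xor) with the linearity checked in plain Python:
def rotr32(x, r):
    # 32-bit right rotation
    return ((x >> r) | (x << (32 - r))) & 0xFFFFFFFF

def sigma0(x):
    # SHA-256's small sigma0: nothing but rotations, a shift, and xor
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3)

import random
a, b = random.getrandbits(32), random.getrandbits(32)
# Linearity over GF(2): f(a ^ b) == f(a) ^ f(b), so the whole map is a fixed
# 32x32 bit-matrix and can be inverted (or traced bit by bit) with linear algebra.
assert sigma0(a ^ b) == sigma0(a) ^ sigma0(b)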
When it comes to addition and xor, it is a little more tricky, but it can also be broken if you notice that addition can be split into the f-functions from SHA-1, or the special functions from SHA-256 called "Ch" and "Maj": a choose function and a majority function. Basically, addition can be written as a sequence of xor operations, which give the single bits of the result, plus a majority operation, which sets the carry to the proper values; optionally it can be made faster by carefully injecting the choose function, where some things are executed while others are skipped, so you can focus only on the parts with non-zero bits.
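A minimal sketch of that decomposition in plain Python: 32-bit addition rebuilt from nothing but xor, SHA-256's Maj function, and a shift, where the xor produces the sum bits and the majority produces the carry bits:
def maj(a, b, c):
    # SHA-256's Maj: the bitwise majority of three words
    return (a & b) ^ (a & c) ^ (b & c)

def add32(x, y):
    # 32-bit modular addition built only from xor, majority, and a shift for the carry
    MASK = (1 << 32) - 1
    carry = 0
    for _ in range(32):   # a carry ripples across at most 32 bit positions
        x, y, carry = x ^ y ^ carry, 0, (maj(x, y, carry) << 1) & MASK
    return x

import random
a, b = random.getrandbits(32), random.getrandbits(32)
assert add32(a, b) == (a + b) & 0xFFFFFFFF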
So, of course, many compilers are great, and their optimizations can help a lot, but the human factor is also important when it comes to inventing new optimizations and making things faster. And there are many things to discover when it comes to hash functions and elliptic curves, because even currently, only some operations are weakened and optimized, and there are still many things to improve. Also, hardened SHA-1 can give us some hints, because before trying to fight with SHA-256, it is often a good idea to try to break a "weakened SHA-256", where some vectors are specially crafted to make some operations easier, while still preserving some features of the original SHA-256.
Another thing is the number of bits in RIPEMD-160, which is "only" 160. Which means that when people are trying to break 71-bit keys now, then by the time they get closer to 80-bit ones, they will also be much closer to address collisions. Because even now, if someone has enough computing power to potentially break a 70-bit key, then that person probably also has enough computing power to create two different addresses sharing around 140 identical bits (maybe a little less than that, because not storing computed keys also has some cost, when you keep partial results instead of keeping everything).
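As a toy illustration of that cost comparison, here is a birthday-style search in plain Python that collides just the first 32 bits of a digest after roughly 2^16 hashes; sha256 is used as a stand-in, because hashlib's ripemd160 support depends on the local OpenSSL build, but the square-root scaling is the same idea:
import hashlib

seen = {}
i = 0
while True:
    d = hashlib.sha256(str(i).encode()).digest()[:4]   # keep only the first 32 bits
    if d in seen:
        print("inputs", seen[d], "and", i, "collide on their first 32 bits")
        break
    seen[d] = i
    i += 1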
But the truth is disheartening to prefix theorists, since the resulting H160s that match in the middle have no pretty base58 similarities whatsoever.
Yes, they would be really surprised if they actually learned how RIPEMD-160 works internally, which parts are hashed and when, and so on.
Besides, who would start scanning from scratch and change their blessed rules just because things would run a bit faster?
Well, there is always a choice between keeping some brute-force solution and improving the algorithm. Wise people know that improving the software is what can lead to better results, but humans are lazy, and if they can use their old methods, which work fine but are slow, then they keep doing that and just hope they will somehow be lucky.