Are you really getting a 2.5x speedup without having the y coordinate sign, doing combinatorial recovery instead? If so, that's really surprising to me.
No, it assumes that R is known.
For an actual implementation, an odd/even bit for R's y coordinate would have to be provided.
The square root operation needed to recover y is an additional potential slowdown.
Signatures that don't include this info could be marked as non-standard and would have to be checked individually. If omitting the info were non-standard and the reference client included it, then most transactions would carry the extra info.
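To make the square root cost concrete, here is a rough sketch of recovering R's y coordinate from its x coordinate plus the odd/even bit. It relies on the secp256k1 field prime being congruent to 3 mod 4, so a square root is a single modular exponentiation. The function name recover_y is made up for illustration, and the sketch assumes R's x coordinate equals the signature's r value (it ignores the rare case where x = r + n).

```python
# secp256k1 field prime and curve constant: y^2 = x^3 + 7 over F_P
P = 2**256 - 2**32 - 977
B = 7

def recover_y(x, odd):
    """Hypothetical helper: return y with the requested parity such that
    (x, y) is on secp256k1, or None if x is not a valid x coordinate."""
    rhs = (pow(x, 3, P) + B) % P
    # Because P % 4 == 3, a square root (if one exists) is rhs^((P+1)/4) mod P.
    y = pow(rhs, (P + 1) // 4, P)
    if (y * y) % P != rhs:
        return None  # rhs is not a quadratic residue, so no point has this x
    if (y & 1) != odd:
        y = P - y  # the other root has the opposite parity
    return y
```

The exponentiation is roughly 256 field squarings per signature, which is the extra cost mentioned above. It is small next to a full scalar multiplication, but it is not free.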
Can you try implementing this on top of libsecp256k1? Getting a speedup on a very slow implementation, even though the speedup is purely algorithmic, is less interesting than getting it on a state-of-the-art implementation.
I could have a look. Is it worth it if R has to be provided?