The existence of the endomorphism is a roughly 20% speedup in a plain multi-exp due to halving the number of doublings. What it does is gives many algorithms which could be batched across multiple point the efficiency they'd have at twice the number of pubkeys. It's a pretty big speedup and AFAIK at an equivalent level of optimization it makes secp256k1 the fastest to verify of any widely deployed curve. So the motivation there is pretty clear, I think.
Could you give more info on this because I don't see how to have efficient decomposition k.P= (k1+k2.lambda).P using such lambda.
Thanks