Because I posted code that could be simplified using NAND/NOR optimizations.
Could it really be simplified? That's the main question.
The assembler version is very hard to read. I wasn't going to show an efficient SIMD implementation.
Yes, yes. We already know that you decided not to show it, but "close researches in this direction". Assuming that the assembler edition was actually efficient and even existed in the first place...
Look at my function named Salsa(). Each cell of the 16-element vector uiA is XORed 4*2=8 times. My phrase was:
"Imagine what will happen if someone get rid of redundant XORs of 16 temporary memory cells in SALSA..."
It's not about the posted code, it's about NAND/NOR-optimization.
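For readers who want to check the "4*2=8 XORs per cell" figure, here is a minimal sketch. It assumes that Salsa() is the standard Salsa20/8 core used by scrypt (4 double-rounds, each made of a column round and a row round); the posted function itself is not reproduced here, and the names salsa20_8_xor_counts, COLUMN_QR and ROW_QR are hypothetical. It only counts XORs and does not attempt any NAND/NOR optimization.

```python
from collections import Counter

MASK32 = 0xFFFFFFFF

def rotl32(v, n):
    """Rotate a 32-bit value left by n bits."""
    v &= MASK32
    return ((v << n) | (v >> (32 - n))) & MASK32

# Word indices (a, b, c, d) of each quarter-round in the standard Salsa20 core.
COLUMN_QR = [(0, 4, 8, 12), (5, 9, 13, 1), (10, 14, 2, 6), (15, 3, 7, 11)]
ROW_QR = [(0, 1, 2, 3), (5, 6, 7, 4), (10, 11, 8, 9), (15, 12, 13, 14)]

def salsa20_8_xor_counts(words):
    """Run the Salsa20/8 core on 16 words; return (output, XOR count per cell)."""
    words = list(words)
    x = list(words)
    xor_counts = Counter()

    def xor_into(dst, value):
        # Every state update in Salsa20 is exactly one XOR into one cell.
        x[dst] ^= value
        xor_counts[dst] += 1

    for _ in range(4):  # 4 double-rounds = 8 rounds
        for a, b, c, d in COLUMN_QR + ROW_QR:
            xor_into(b, rotl32(x[a] + x[d], 7))
            xor_into(c, rotl32(x[b] + x[a], 9))
            xor_into(d, rotl32(x[c] + x[b], 13))
            xor_into(a, rotl32(x[d] + x[c], 18))

    # Final feed-forward addition of the input words.
    out = [(x[i] + words[i]) & MASK32 for i in range(16)]
    return out, [xor_counts[i] for i in range(16)]

if __name__ == "__main__":
    _, counts = salsa20_8_xor_counts(range(16))
    print(counts)
```

Under that assumption every cell is XORed once per round, which gives 8 per cell and 128 per Salsa20/8 call. The sketch says nothing about whether any of them are redundant.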
So here we are again. Have you managed to do the alleged NAND/NOR-optimizations? If yes, then how many XORs does it actually eliminate?
I didn't say that the posted code eliminates any XORs.
You lost me here. What are you trying to prove by posting some piece of useless code?
And what makes you think that "Scrypt is good, but [1024, 1, 1] was a bad choice"?