Bitcoin Forum

Alternate cryptocurrencies => Altcoin Discussion => Topic started by: Come-from-Beyond on December 21, 2011, 04:58:05 PM



Title: Scrypt based coins: Any proof that this algorithm is realy good?
Post by: Come-from-Beyond on December 21, 2011, 04:58:05 PM
I'm talking about algorithm that is used between two SHA256 calculations to make hashing GPU unfriendly. Its name is "Scrypt".

It's supposed to do calculations that can't be done in parralel mode and require a lot of memory. Parameters used for LTC and similar forks (1024, 1 and 1) make us to use only 1 thread and at least 128 Kb of memory to calc 1 hash value. I rewrote the algorithm so I could track a path of every bit. After that I used math methods for simplifying integrated circuits (it's like when u have 1000 NAND or NOR elements and wish to use as less as possible of them, just 300 for example). I failed to build complete Scrypt this way, coz of memory constraint, but I managed to simplify small pieces of it. I'm afraid that with a lot of memory anyone could build complete scheme and pass thousand SALSA steps. This will lead to huge hashrate boost, i.e. 1 Mh/s vs 1 Kh/s with "usual" Scrypt. No need to tell why it's very bad for LTC and the others.

Well, now my question: is there any theory that prove quality of Scrypt?

PS: My modified Scrypt code do less calculations. It gives approx. 20% boost when compared to original code of Tarnsnap project due to sequential memory access. Imagine what will happen if someone get rid of redundant XORs of 16 temporary memory cells in SALSA...


Title: Re: Scrypt based coins: Any proof that this algorithm is realy good?
Post by: pooler on December 21, 2011, 05:04:31 PM
I'm talking about algorithm that is used between two SHA256 calculations to make hashing GPU unfriendly. Its name is "Scrypt".

It's supposed to do calculations that can't be done in parralel mode and require a lot of memory. Parameters used for LTC and similar forks (1024, 1 and 1) make us to use only 1 thread and at least 128 Kb of memory to calc 1 hash value. I rewrote the algorithm so I could track a path of every bit. After that I used math methods for simplifying integrated circuits (it's like when u have 1000 NAND or NOR elements and wish to use as less as possible of them, just 300 for example). I failed to build complete Scrypt this way, coz of memory constraint, but I managed to simplify small pieces of it. I'm afraid that with a lot of memory anyone could build complete scheme and pass thousand SALSA steps. This will lead to huge hashrate boost, i.e. 1 Mh/s vs 1 Kh/s with "usual" Scrypt. No need to tell why it's very bad for LTC and the others.

Well, now my question: is there any theory that prove quality of Scrypt?

PS: My modified Scrypt code do less calculations. It gives approx. 20% boost when compared to original code of Tarnsnap project due to sequential memory access. Imagine what will happen if someone get rid of redundant XORs of 16 temporary memory cells in SALSA...

I think it would be interesting to discuss this with the author of scrypt, Colin Percival. There is even a mailing list dedicated to scrypt here:
http://www.tarsnap.com/lists.html


Title: Re: Scrypt based coins: Any proof that this algorithm is realy good?
Post by: Come-from-Beyond on December 21, 2011, 05:25:58 PM
Good idea! thank you


Title: Re: Scrypt based coins: Any proof that this algorithm is realy good?
Post by: Come-from-Beyond on December 21, 2011, 05:43:48 PM
Scrypt looks OK, the problem comes from chosen parameters (1024, 1 and 1) I think. Maybe "problem" is too strong word, but realy good algorithm supposed to use every byte of scratchpad with equal probability. Unfortunately, usage of the scratchpad is not symmetrical. For example, if u divide memory into 64 byte chunks u'll notice that every even chunk used 2 times while odd one used 1 time. There are also evident patterns in jumps from chunk to chunk. So on.

I think no need to discuss this topic. We should just pray that noone will solve the problem how to simplify SALSA.


Title: Re: Scrypt based coins: Any proof that this algorithm is realy good?
Post by: grue on December 22, 2011, 12:59:16 AM
PS: My modified Scrypt code do less calculations. It gives approx. 20% boost when compared to original code of Tarnsnap project due to sequential memory access. Imagine what will happen if someone get rid of redundant XORs of 16 temporary memory cells in SALSA...
do share!


Title: Re: Scrypt based coins: Any proof that this algorithm is realy good?
Post by: Come-from-Beyond on December 22, 2011, 05:40:06 AM
I'm going to publish the miner when I create a "pool" for it.


Title: Re: Scrypt based coins: Any proof that this algorithm is realy good?
Post by: ssvb on January 19, 2012, 10:37:02 AM
I think no need to discuss this topic. We should just pray that noone will solve the problem how to simplify SALSA.
Praying is not very useful and obscurity is definitely not the right way to do things here :) Even without sharing the sources yet, do you have any statistics for the number of 32-bit XOR, ADD and ROTATE operations before and after your optimization? Or even better would be the statistics for 128-bit SIMD operations (XOR, ADD, ROTATE, SHUFFLE).


Title: Re: Scrypt based coins: Any proof that this algorithm is realy good?
Post by: Come-from-Beyond on January 19, 2012, 11:50:51 AM
I think no need to discuss this topic. We should just pray that noone will solve the problem how to simplify SALSA.
Praying is not very useful and obscurity is definitely not the right way to do things here :) Even without sharing the sources yet, do you have any statistics for the number of 32-bit XOR, ADD and ROTATE operations before and after your optimization? Or even better would be the statistics for 128-bit SIMD operations (XOR, ADD, ROTATE, SHUFFLE).

I didn't count the statistics. I tried to simplify the algo using usual NAND/NOR-logic optimization techniques. (Un)fortunatelly I ran out of memory (I had only 16 Gb) so optimization wasn't completed.

If u wish u could try to do it urself using my implementation of [1024, 1, 1]-Scrypt. When I wrote it using SSE2 (calc 4 hashes at once) I got 2500 h/s from each GHz of an Intel CPU.

Code:
#define ROTR(x, n) ((x>>n)|(x<<(32-n)))

void Salsa(unsigned __int32 uiData[16])
{
unsigned __int32 uiA[16], i, j;

for (i=0; i<16; i++) uiA[i]=uiData[i];

for (i=0; i<4; i++)
{
uiA[4]^=ROTR((uiA[0]+uiA[12]), 25);
uiA[9]^=ROTR((uiA[5]+uiA[1]), 25);
uiA[14]^=ROTR((uiA[10]+uiA[6]), 25);
uiA[3]^=ROTR((uiA[15]+uiA[11]), 25);

uiA[8]^=ROTR((uiA[4]+uiA[0]), 23);
uiA[13]^=ROTR((uiA[9]+uiA[5]), 23);
uiA[2]^=ROTR((uiA[14]+uiA[10]), 23);
uiA[7]^=ROTR((uiA[3]+uiA[15]), 23);

uiA[12]^=ROTR((uiA[8]+uiA[4]), 19);
uiA[1]^=ROTR((uiA[13]+uiA[9]), 19);
uiA[6]^=ROTR((uiA[2]+uiA[14]), 19);
uiA[11]^=ROTR((uiA[7]+uiA[3]), 19);

uiA[0]^=ROTR((uiA[12]+uiA[8]), 14);
uiA[5]^=ROTR((uiA[1]+uiA[13]), 14);
uiA[10]^=ROTR((uiA[6]+uiA[2]), 14);
uiA[15]^=ROTR((uiA[11]+uiA[7]), 14);

uiA[1]^=ROTR((uiA[0]+uiA[3]), 25);
uiA[6]^=ROTR((uiA[5]+uiA[4]), 25);
uiA[11]^=ROTR((uiA[10]+uiA[9]), 25);
uiA[12]^=ROTR((uiA[15]+uiA[14]), 25);

uiA[2]^=ROTR((uiA[1]+uiA[0]), 23);
uiA[7]^=ROTR((uiA[6]+uiA[5]), 23);
uiA[8]^=ROTR((uiA[11]+uiA[10]), 23);
uiA[13]^=ROTR((uiA[12]+uiA[15]), 23);

uiA[3]^=ROTR((uiA[2]+uiA[1]), 19);
uiA[4]^=ROTR((uiA[7]+uiA[6]), 19);
uiA[9]^=ROTR((uiA[8]+uiA[11]), 19);
uiA[14]^=ROTR((uiA[13]+uiA[12]), 19);

uiA[0]^=ROTR((uiA[3]+uiA[2]), 14);
uiA[5]^=ROTR((uiA[4]+uiA[7]), 14);
uiA[10]^=ROTR((uiA[9]+uiA[8]), 14);
uiA[15]^=ROTR((uiA[14]+uiA[13]), 14);
}

for (i=0; i<16; i++) uiData[i]+=uiA[i];
}

unsigned __int32 uiStorage[32768+32];
..........................
..........................
..........................

for (i=0; i<32768; i+=16)
{
for (j=0; j<16; j++) uiStorage[i+32+j]=uiStorage[i+j]^uiStorage[i+16+j];
Salsa(&uiStorage[i+32]);
}
for (i=0; i<32768; i+=32)
{
for (j=0; j<16; j++) uiStorage[i+j]^=uiStorage[i+16+j];
}
for (i=0; i<32768; i+=16)
{
j=((uiStorage[32784]&1023)<<5)+(i&31);
for (k=0; k<16; k++) uiStorage[32768+(i&31)+k]=uiStorage[32768+k]^uiStorage[32784+k]^uiStorage[j+k];
Salsa(&uiStorage[32768+(i&31)]);
}
..........................
..........................
..........................

I can't post here my semi-optimized solution. It used some precomputing and required a lot of memory. Nothing special, but I decided to keep it in a cold dark place.


Title: Re: Scrypt based coins: Any proof that this algorithm is realy good?
Post by: ArtForz on January 19, 2012, 12:45:40 PM
Let's play a game of counting XORs, shall we?

Code:
for (i=0; i<32768; i+=16)
{
for (j=0; j<16; j++)
uiStorage[i+32+j]=uiStorage[i+j]^uiStorage[i+16+j];
Salsa(&uiStorage[i+32]);
}
for (i=0; i<32768; i+=32)
{
for (j=0; j<16; j++)
uiStorage[i+j]^=uiStorage[i+16+j];
}
for (i=0; i<32768; i+=16)
{
j=((uiStorage[32784]&1023)<<5)+(i&31);
for (k=0; k<16; k++)
uiStorage[32768+(i&31)+k]=uiStorage[32768+k]^uiStorage[32784+k]^uiStorage[j+k];
Salsa(&uiStorage[32768+(i&31)]);
}
first loop is 2048 times 1 16-dword XOR
2nd loop 1024 times 1 16-dword XOR
3rd loop 2048 times 2 16-dword XORs (sorry, a 3-input XOR is still 2 XORs)
total # of 16-dword XORs ... 7168

vs. a simple stupid scrypt(1024,1,1) (using 16-dword vectors for X and V to keep things more readable):
Code:
for (i = 0; i < 1024; i++) {
V[i*2+0] = X[0];
V[i*2+1] = X[1];
X[0] ^= X[1];
X[0] = salsa20(X[0]);
X[1] ^= X[0];
X[1] = salsa20(X[1]);
}
for (i = 0; i < 1024; i++) {
j = X[1].vec_element[0] & 1023;
X[0] ^= V[j * 2 + 0];
X[1] ^= V[j * 2 + 1];
X[0] ^= X[1]
X[0] = salsa20(X[0]);
X[1] ^= X[0];
X[1] = salsa20(X[1]);
}
let's count again...
1st loop 1024 times 2 16-dword XORs
2nd loop 1024 times 4 16-dword XORs
total # of 16-dword XORs ... 6144

Congratulations. Your implementation *increased* the total # of 16-dword XORs vs. reference scrypt by 1024... ?!?


Title: Re: Scrypt based coins: Any proof that this algorithm is realy good?
Post by: fivebells on January 19, 2012, 01:03:05 PM
Your fear that someone might find a faster implementation by running one instance of it through some magical optimization software doesn't seem that plausible to me.  Roughly speaking, scrypt is based on randomly mixing up the results of a large number of SHA256 hashes, so any substantial optimization which didn't also optimize SHA256 would be pretty surprising.  Any accurate optimization would still need random access to all those hashes, which forces a very large circuit.


Title: Re: Scrypt based coins: Any proof that this algorithm is realy good?
Post by: Come-from-Beyond on January 19, 2012, 01:07:59 PM
Let's play a game of counting XORs, shall we?

...

Congratulations. Your implementation *increased* the total # of 16-dword XORs vs. reference scrypt by 1024... ?!?

Perhaps. I didn't check it by myself but I suppose ur counting is correct. So?


Title: Re: Scrypt based coins: Any proof that this algorithm is realy good?
Post by: Come-from-Beyond on January 19, 2012, 01:11:52 PM
Your fear that someone might find a faster implementation by running one instance of it through some magical optimization software doesn't seem that plausible to me.  Roughly speaking, scrypt is based on randomly mixing up the results of a large number of SHA256 hashes, so any substantial optimization which didn't also optimize SHA256 would be pretty surprising.  Any accurate optimization would still need random access to all those hashes, which forces a very large circuit.

I'm afraid that this circuit won't be large enough for modern FPGAs.


Title: Re: Scrypt based coins: Any proof that this algorithm is realy good?
Post by: ArtForz on January 19, 2012, 01:15:18 PM
Your fear that someone might find a faster implementation by running one instance of it through some magical optimization software doesn't seem that plausible to me.  Roughly speaking, scrypt is based on randomly mixing up the results of a large number of SHA256 hashes, so any substantial optimization which didn't also optimize SHA256 would be pretty surprising.  Any accurate optimization would still need random access to all those hashes, which forces a very large circuit.

I'm afraid that this circuit won't be large enough for modern FPGAs.
... says the guy claiming +1024 is negative.


Title: Re: Scrypt based coins: Any proof that this algorithm is realy good?
Post by: Come-from-Beyond on January 19, 2012, 01:32:18 PM
Your fear that someone might find a faster implementation by running one instance of it through some magical optimization software doesn't seem that plausible to me.  Roughly speaking, scrypt is based on randomly mixing up the results of a large number of SHA256 hashes, so any substantial optimization which didn't also optimize SHA256 would be pretty surprising.  Any accurate optimization would still need random access to all those hashes, which forces a very large circuit.

I'm afraid that this circuit won't be large enough for modern FPGAs.
... says the guy claiming +1024 is negative.

:) I got u.

U assumed that time of calculating of a Scrypt-based hash is proportionally to number of operations. It's wrong.

For example, i*4+i could be faster than i*5.

My implementation of Scrypt is optimized for an Intel cpu and runs faster than ur code. The 1st step I made was optimization of ur miner and I wasn't satisfied with results, so I moved to original Tarsnap code and then to implementation posted here. I took into account a REAL cpu that has limited cache for code and data, not very clever branch predictor, register stalls and that is able to process 3 instructions at once.


Title: Re: Scrypt based coins: Any proof that this algorithm is realy good?
Post by: ArtForz on January 19, 2012, 01:58:09 PM
My implementation of Scrypt is optimized for an Intel cpu and runs faster than ur code.
Reference C implementation slower than optimized asm, news at 11.

The 1st step I made was optimization of ur miner and I wasn't satisfied with results, so I moved to original Tarsnap code and then to implementation posted here. I took into account a REAL cpu that has limited cache for code and data, not very clever branch predictor, register stalls and that is able to process 3 instructions at once.
So... exactly what pooler did, except he managed it without big secrecy or talking about boolean simplification and factor 1000 speedups "if it only had worked".

Tip: Statements like that make me think "oh dear, another crypto kook".


Title: Re: Scrypt based coins: Any proof that this algorithm is realy good?
Post by: Come-from-Beyond on January 19, 2012, 02:30:58 PM
Reference C implementation slower than optimized asm, news at 11.

As I wrote:

"When I wrote it using SSE2 (calc 4 hashes at once) I got 2500 h/s from each GHz of an Intel CPU."

See?


Tip: Statements like that make me think "oh dear, another crypto kook".

I don't need tips from a guy who is so infantile. I don't know why u think that this topic has something personal against u.

Read https://bitcointalk.org/index.php?topic=55670.0 to see results of my implementation. Hashrate was proven by litecoinpool.org statistics and tests were made by other guys. Ur miner is so slow that even simple optimizations give good boost to performance (+200% bonus).

---
Starting from this point I put u into ignore list so u won't waste my time.


Title: Re: Scrypt based coins: Any proof that this algorithm is realy good?
Post by: ArtForz on January 19, 2012, 02:46:29 PM
Reference C implementation slower than optimized asm, news at 11.

As I wrote:

"When I wrote it using SSE2 (calc 4 hashes at once) I got 2500 h/s from each GHz of an Intel CPU."

See?


Tip: Statements like that make me think "oh dear, another crypto kook".

I don't need tips from a guy who is so infantile. I don't know why u think that this topic has something personal against u.

Read https://bitcointalk.org/index.php?topic=55670.0 to see results of my implementation. Hashrate was proven by litecoinpool.org statistics and tests were made by other guys. Ur miner is so slow that even simple optimizations give good boost to performance (+200% bonus).

---
Starting from this point I put u into ignore list so u won't waste my time.
*facepalm*
Again, completely avoiding the topic of outrageous claims without *any* proof.
Oh wait, the "proof" is writing a miner that's 20% slower than poolers. right.
Ignored.


Title: Re: Scrypt based coins: Any proof that this algorithm is realy good?
Post by: ssvb on January 19, 2012, 07:13:34 PM
OK, thanks for the clarifications. Now we can see that it was just scaremongering and/or handwaving. So much for "do less calculations" and "sequential memory access" promises. In reality the demonstrated code snippet has more XOR operations and does three separate passes over 128K buffer instead of two (which means more L1 cache misses unless the hardware prefetcher saves the day). The pointless replacement of left rotations with right rotations also looks amusing.


Title: Re: Scrypt based coins: Any proof that this algorithm is realy good?
Post by: Come-from-Beyond on January 19, 2012, 08:10:44 PM
OK, thanks for the clarifications. Now we can see that it was just scaremongering and/or handwaving. So much for "do less calculations" and "sequential memory access" promises. In reality the demonstrated code snippet has more XOR operations and does three separate passes over 128K buffer instead of two (which means more L1 cache misses unless the hardware prefetcher saves the day). The pointless replacement of left rotations with right rotations also looks amusing.

Well... There are some explanations:

Code I posted here is C++ edition of the algo (for better reading). CPU runs assembler code which looks completely different and u should know this unless u r just a coin trader. Ur post looks like an attempt to defend ArtForz' position. But everyone sees that he is just upset that there are other miners which are better then his one (pooler's, mine and I know 2 other guys who developed much better miners).

My implementation uses less calculations coz CPU process 3 instructions at once using a trick to avoid register stalls. So extra cycle between 2 others is necessary for "less calculations" and "sequential memory access". Btw, a CPU with 128K L1 per core is quite... uncommon and ur statement about 3 passes is wrong.

Regarding replacement of left rotations with right ones. I suspect that u r NOT serious. Even a young programmer won't belive that this was made for optimizations or something like that. I made it coz there was already defined function for calculating SHA-256 that used right rotation.

PS: Ask more if u have questions :)


Title: Re: Scrypt based coins: Any proof that this algorithm is realy good?
Post by: Roadhog2k5 on January 19, 2012, 08:23:25 PM
u, ur, r, coz.

https://i.imgur.com/vDfnh.jpg


Title: Re: Scrypt based coins: Any proof that this algorithm is realy good?
Post by: ssvb on January 19, 2012, 09:38:31 PM
Code I posted here is C++ edition of the algo (for better reading). CPU runs assembler code which looks completely different
So what was the purpose of dumping here this garbage C++ edition then? Why not using some pseudocode or intrinsics which would show the idea about the efficient SIMD implementation of scrypt?

Quote
My implementation uses less calculations coz CPU process 3 instructions at once using a trick to avoid register stalls. So extra cycle between 2 others is necessary for "less calculations" and "sequential memory access".
Better instructions scheduling on superscalar processors for making use of triple issue is not using "less calculations", it is more like just doing "the same calculations faster". That's an ordinary performance optimization and has nothing to do with the "quality of Scrypt" and does not help to "get rid of redundant XORs". I had a hope that you have something more impressive to show and was somewhat disappointed.

Quote
Btw, a CPU with 128K L1 per core is quite... uncommon and ur statement about 3 passes is wrong.
You have 3 loops looking like "for (i=0; i<32768; i+=16) { ... }" in your code. The first two of them are walking over the buffer sequentially one after another. And because the buffer is larger than L1 cache ("a CPU with 128K L1 per core is quite... uncommon" as you have noticed), you get L1 cache misses on each of these loops. Merging these first two loops into one would reduce the number of L1 cache misses and also save some load/store instructions. This would actually make the code look more like the reference implementation.

The thing that actually *might* make some sense is to have the data buffers for 4 hashes interleaved. So on the first loop you just run the SIMD code in exactly the same way as scalar code without the need for any shuffle instructions. However this data layout becomes problematic on the last loop because you need to access different memory areas for different hashes ('j' indexes are different for each of the simultaneously processed 4 hashes) and gather scattered data into different lanes of SIMD registers. Introducing the intermediate loop for deinterleaving the data in addition to XORing it would fix the problem. But the data needs to be interleaved again on the last loop. The disadvantage is extra interleaving/deinterleaving overhead. The advantage is no need for shuffles and possibly better instructions scheduling. Is this what your SSE2 code is actually trying to do?


Title: Re: Scrypt based coins: Any proof that this algorithm is realy good?
Post by: Come-from-Beyond on January 20, 2012, 06:13:43 AM
>> So what was the purpose of dumping here this garbage C++ edition then? Why not using some pseudocode or intrinsics which would show the idea about the efficient SIMD implementation of scrypt?

Coz I posted the code which could be simplified using NAND/NOR-optimizations. Assembler edition is very hard to read. I wasn't going to show efficient SIMD implementation.

>> Better instructions scheduling on superscalar processors for making use of triple issue is not using "less calculations", it is more like just doing "the same calculations faster".

English is not my mother tongue, sorry if I failed to explain my idea. In few words its like "CPU spends less time".

>> That's an ordinary performance optimization and has nothing to do with the "quality of Scrypt" and does not help to "get rid of redundant XORs". I had a hope that you have something more impressive to show and was somewhat disappointed.

Look at my function named Salsa(). Each cell from 16-vector uiA is XORed 4*2=8 times. My phrase was

"Imagine what will happen if someone get rid of redundant XORs of 16 temporary memory cells in SALSA..."

It's not about the posted code, it's about NAND/NOR-optimization. I didn't say that posted code eliminates any XORs.

>> You have 3 loops looking like "for (i=0; i<32768; i+=16) { ... }" in your code. The first two of them are walking over the buffer sequentially one after another. And because the buffer is larger than L1 cache ("a CPU with 128K L1 per core is quite... uncommon" as you have noticed), you get L1 cache misses on each of these loops.

I thought so before I tried both ways. CPU understands when code do sequential memory access and prefetches data into a cache before the code needs it.

>> Merging these first two loops into one would reduce the number of L1 cache misses and also save some load/store instructions. This would actually make the code look more like the reference implementation.

Merged loop accesses 3 different chunks of memory. This leads to more MOVDQA instructions and makes CPU to maintain 3 flows of prefetching. My edition needs only 2 flows which improve memory access.

>> The thing that actually *might* make some sense is to have the data buffers for 4 hashes interleaved. So on the first loop you just run the SIMD code in exactly the same way as scalar code without the need for any shuffle instructions. However this data layout becomes problematic on the last loop because you need to access different memory areas for different hashes ('j' indexes are different for each of the simultaneously processed 4 hashes) and gather scattered data into different lanes of SIMD registers. Introducing the intermediate loop for deinterleaving the data in addition to XORing it would fix the problem. But the data needs to be interleaved again on the last loop. The disadvantage is extra interleaving/deinterleaving overhead. The advantage is no need for shuffles and possibly better instructions scheduling. Is this what your SSE2 code is actually trying to do?

I got rid of PSHUFDs this way. I collect data from uiStorage to be XORed with 4 hashes into a register using 4 other registers as pointers (EAX, EBX, ECX, EDX). Disadvantage of PSHUFDs is much bigger than disadvantage of collecting data into a register. EAX, EBX, ECX and EDX r set in previous loop iteration, so when I realy need this data it's already in a cache.


Title: Re: Scrypt based coins: Any proof that this algorithm is realy good?
Post by: ssvb on January 20, 2012, 08:31:15 AM
Coz I posted the code which could be simplified using NAND/NOR-optimizations.
Could it really be simplified? That's the main question.

Quote
Assembler edition is very hard to read. I wasn't going to show efficient SIMD implementation.
Yes, yes. We already know that you decided not to show it, but "close researches in this direction (https://bitcointalk.org/index.php?topic=60026.msg701524#msg701524)". Assuming that the assembler edition was actually efficient and even existed in the first place...

Quote
Look at my function named Salsa(). Each cell from 16-vector uiA is XORed 4*2=8 times. My phrase was

"Imagine what will happen if someone get rid of redundant XORs of 16 temporary memory cells in SALSA..."

It's not about the posted code, it's about NAND/NOR-optimization.
So here we are again. Have you managed to do the alleged NAND/NOR-optimizations? If yes, then how many XORs does it actually eliminate?

Quote
I didn't say that posted code eliminates any XORs.
You lost me here. What are you trying to prove by posting some piece of useless code?

And what makes you think that "Scrypt is good, but [1024, 1, 1] was a bad choice"?


Title: Re: Scrypt based coins: Any proof that this algorithm is realy good?
Post by: Come-from-Beyond on January 20, 2012, 09:18:05 AM
>> Could it really be simplified? That's the main question.

Theoretically it could be. But like factorization of a very long number it's hard to do. And I hope noone will manage to do it at least while I'm holding a lot of LTC.

>> So here we are again. Have you managed to do the alleged NAND/NOR-optimizations? If yes, then how many XORs does it actually eliminate?

I failed to complete the optimization. Replaced it with modified salsa which does precached results lookup. Traded CPU for RAM. Qty of XORs were reduced to 50%.

>> You lost me here. What are you trying to prove by posting some piece of useless code?

It's usefull code. It's easier to optimize coz u could move it into 1st or 2nd cycle, or leave it alone. It also helps to access memory in a "symmetrical" way.

>> And what makes you think that "Scrypt is good, but [1024, 1, 1] was a bad choice"?

2nd part of Scrypt looks like jumps from one chunk of memory to other. If u count how often each chunk is accessed u'll notice that some of them rn't accessed at all and some - very often. Good parameters r supposed to make probability of access equal for all chunks. That's how GOOD random number generators work. With [1024, 1, 1] I always got BAD distribution of access probability. So [1024, 1, 1] is a BAD choice.