e1ghtSpace
Legendary
Offline
Activity: 1540
Merit: 1001
Crypto since 2014
|
|
November 20, 2015, 11:18:38 PM |
|
Ah so it might be easier to modify Mr. Spread's miner. Anyway Girino has some explaining to do.
|
|
|
|
chrysophylax
Legendary
Offline
Activity: 2912
Merit: 1091
--- ChainWorks Industries ---
|
|
November 21, 2015, 03:27:53 AM |
|
Looks like I oopsed, they're not hashing the same thing over 3k times, they're hashing an INSANE amount of data. What the fuck is all of this...?
i never used the amd miner wolf - ever ... it never compiled properly for me - and never worked properly for me ... thefarm ( nvidia based ) was the only thing that was working well based on sp's spreadminer ( which was obviously based on ccminer-tsiv ) and nonce-pool spreadcoin private pool ... so if you are building an amd miner - could you also see ( if its part of your scope with the build ) if it can be integrated with the same sgminer as x11 / quark / lyra2rev2 / neoscrypt? ... i am in the office again - so im back on irc ... i have some updates about farmamd ... i have a softspot for spreadcoin ( and ftc for that matter ) but have never been happy about the algo itself ... it has always been a difficult algo to implement on thefarm due to changing miners constantly whenever i wanted to accumulate ( spreadminer as opposed to ccminer-spmod ) - and im not the only one that has had this particular issue ... so an integration of the algo in the same sgminer would be a HUGE advantage ... #crysx
|
|
|
|
e1ghtSpace
Legendary
Offline
Activity: 1540
Merit: 1001
Crypto since 2014
|
|
November 21, 2015, 05:18:23 AM |
|
Ah so it might be easier to modify Mr. Spread's miner. Anyway Girino has some explaining to do.
Girino's IS better, and as I said, this is not a crippling of the miner, it's just WTF.
It's not crippling? Ok that's good, I thought it was. Does the hashing of all that data slow it down very much?
|
|
|
|
georgem (OP)
Legendary
Offline
Activity: 1484
Merit: 1007
spreadcoin.info
|
|
November 21, 2015, 10:05:02 AM Last edit: November 21, 2015, 10:27:50 AM by georgem |
|
Girino's IS better, and as I said, this is not a crippling of the miner, it's just WTF.
Can you be a little bit more specific please? You said words like: WTF, Insane, Barbaric and Ooops, so now I am not sure anymore that I understand the point you are trying to make. Nobody doubts that the miner's efficiency can be improved, and everybody knows that SpreadX11 is pretty exotic.* So... what gives? Are you just baffled by it? In a good/bad way?
*(It adds a MinerSignature to the block header, and constructs a MAX SIZE block by padding the previousBlockHash. This means many thousand iterations, depending on how "empty" the block is, but those are just simple multiplications and bit operations, no hashes. All of this is done so that a hashWholeBlock value can be added to the header, to prove that whoever found a block solution was also the same person who signed the coinbase and knew the whole block's content before any pool did.)
|
|
|
|
georgem (OP)
Legendary
Offline
Activity: 1484
Merit: 1007
spreadcoin.info
|
|
November 21, 2015, 06:15:31 PM Last edit: November 21, 2015, 06:33:53 PM by georgem |
|
I'm baffled as to why it's done the way it is. There's just a TON of data in there being hashed by SHA256, far larger than any block SPR has...
Yes, the idea is that to calculate hashWholeBlock we need to fill a complete MAX_BLOCK_SIZE block with data (that's why we have the "padding") and then Double-SHA256 it. MAX_BLOCK_SIZE for spreadcoin is 200 Kbyte. https://github.com/spreadcoin/spreadcoin/blob/master/src/main.h#L35
So you have the header (less than 100 bytes), followed by a few tx (<1 KByte or more), and then a giant padding (199 KBytes) to fill up the rest. (This large pseudoblock exists only temporarily in memory BTW, just to create the hashWholeBlock hash.)
From the whitepaper: Padding ensures that there is no incentive to mine empty blocks without transactions
So the way I understand this, it means that padding calculations are more numerous the fewer transactions you get into your block. Vice versa: The bigger the block already is (more txs), the less padding has to be added/calculated. So in a way, padding is horrible on efficiency since we are mostly having 1 transaction blocks these days. When this changes, then padding will not carry as much weight as it does now. I don't think that the padding calculations are that heavy anyway, but I need to do some benchmarks and look into OpenCL / Cuda.
And yes, we are ALWAYS doing a double SHA256 calculation on a MAX_BLOCK_SIZE block (200 KByte). This size doesn't change from block to block, it's always the maximum.
Why did Mr. Spread do it this way? I'd have to guess, but probably to deliberately give GPUs a hard time? Wasn't he playing the "CPU-Only-coin" card for some time?
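To put rough numbers on "a TON of data", here is a small back-of-the-envelope sketch. It is not taken from the spreadcoin source: the constant assumes that "200 Kbyte" means 200,000 bytes (check main.h for the real value), and the helper names are made up for illustration. It counts how many 64-byte SHA-256 compression calls one double-SHA256 over the full buffer costs, and how much of the buffer is padding for a given header + tx size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: "200 Kbyte" is read as 200,000 bytes; the real constant lives in main.h.
constexpr std::size_t kMaxBlockSize = 200000;

// SHA-256 compression calls for a message of `len` bytes:
// message + one 0x80 byte + 8-byte length field, rounded up to 64-byte blocks.
constexpr std::uint64_t sha256Compressions(std::uint64_t len) {
    return (len + 1 + 8 + 63) / 64;
}

// Double SHA-256: one pass over the message, one pass over the 32-byte digest.
constexpr std::uint64_t doubleSha256Compressions(std::uint64_t len) {
    return sha256Compressions(len) + sha256Compressions(32);
}

// Bytes of padding left after the header + txs have been aligned to 4 bytes.
// Hypothetical helper; assumes headerAndTxBytes <= maxSize.
constexpr std::size_t paddingBytes(std::size_t headerAndTxBytes,
                                   std::size_t maxSize = kMaxBlockSize) {
    return maxSize - (headerAndTxBytes + 3) / 4 * 4;
}
```

Under these assumptions, doubleSha256Compressions(kMaxBlockSize) comes to 3127 compression calls, which lines up with the "over 3k" figure mentioned earlier in the thread, and paddingBytes(1000) is 199000, so a nearly empty block is almost all padding.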
|
|
|
|
coins101
Legendary
Offline
Activity: 1456
Merit: 1000
|
|
November 21, 2015, 06:33:06 PM |
|
I'm baffled as to why it's done the way it is. There's just a TON of data in there being hashed by SHA256, far larger than any block SPR has...
Yes, the idea is that for hashWholeBlock we need to fill a complete MAX_BLOCK_SIZE block with data (that's why we have the "padding"). MAX_BLOCK_SIZE for spreadcoin is 200 Kbyte. https://github.com/spreadcoin/spreadcoin/blob/master/src/main.h#L35
So you have the header (less than 100 bytes), followed by a few tx (<1 KByte or more), and then a giant padding (199 KBytes) to fill up the rest. (This large pseudoblock is only existing temporarily in memory BTW, just to create the hashWholeBlock hash.)
From the whitepaper: Padding ensures that there is no incentive to mine empty blocks without transactions
So the way I understand this, it means that padding calculations are more numerous the fewer transactions you get into your block. Vice versa: The bigger the block already is (more txs), the less padding has to be added/calculated. So in a way, padding is horrible on efficiency since we are mostly having 1 transaction blocks these days. When this changes, then padding will not carry as much weight as it does now.
And yes, we are ALWAYS doing a double SHA256 calculation on a MAX_BLOCK_SIZE block (200 KByte). This size doesn't change from block to block, it's always the maximum.
Why Mr. Spread did it this way? I'd have to guess, but probably to deliberately give GPUs a hard time? Wasn't he playing the "CPU-Only-coin" - card for some time?
Possibly. But maybe it's to give everyone equal weight. If you think about it, everyone effectively processes a full block. There is no incentive to choose more efficient blocks to mine to move on to the next one quickly. I suppose if there is an incentive to fill blocks quickly, it sounds like it makes it more profitable to process blocks with transactions vs. padding.
What I don't get is why the AMD miner is so much worse than nvidia; and it has problems too.
|
|
|
|
georgem (OP)
Legendary
Offline
Activity: 1484
Merit: 1007
spreadcoin.info
|
|
November 21, 2015, 06:36:28 PM |
|
What I don't get is why the AMD miner is so much worse than nvidia; and it has problems too.
I can't really judge it (yet), but it's probably badly implemented.
|
|
|
|
georgem (OP)
Legendary
Offline
Activity: 1484
Merit: 1007
spreadcoin.info
|
|
November 21, 2015, 09:02:55 PM Last edit: November 21, 2015, 09:17:29 PM by georgem |
|
What I don't get is why the AMD miner is so much worse than nvidia; and it has problems too.
I can't really judge it (yet), but it's probably badly implemented.
It's largely SPH code... really, really bad. About the same as the original darkcoin-mod. EDIT: Idea! What's the padding made out of? Maybe I can shortcut the memory usage!
The padding is constructed starting with a seed/copy of the 32 bytes of previousHashBlock (or hashPrevBlock, as it is called in the code), and then there are a few bitshifts (if necessary) and then tens of thousands of multiplications while we are filling the space (moving backwards through the block). But we basically start with just those 32 bytes, and everything is derived from them.
Padding starts here: https://github.com/spreadcoin/spreadcoin/blob/master/src/main.cpp#L1511
    while (BlockData.size() % 4 != 0)
        BlockData << uint8_t(7);

    // Fill rest of the buffer to ensure that there is no incentive to mine small blocks without transactions.
    uint32_t *pFillBegin = (uint32_t*)&BlockData[BlockData.size()];
    uint32_t *pFillEnd = (uint32_t*)&BlockData[MAX_BLOCK_SIZE];
    uint32_t *pFillFooter = std::max(pFillBegin, pFillEnd - 8);

    memcpy(pFillFooter, &hashPrevBlock, (pFillEnd - pFillFooter)*4);
    for (uint32_t *pI = pFillFooter; pI < pFillEnd; pI++)
        *pI |= 1;

    for (uint32_t *pI = pFillFooter - 1; pI >= pFillBegin; pI--)
        pI[0] = pI[3]*pI[7];

    BlockData.forsed_resize(MAX_BLOCK_SIZE);
The first thing we do is fill up the block from the left (right after the tx section) with a few 0x07 bytes (only if necessary), just so that the current size of the block data (header + txs) is exactly divisible by 4. If the size is already divisible by 4, we don't need to add any such bytes.
Then we define the pointers pFillBegin, pFillEnd and pFillFooter, copy hashPrevBlock to the end (last 32 bytes) of this MAX_BLOCK_SIZE block, and then fill all the empty bytes in between moving backwards, 4 bytes per iteration.
Oh, and before we start we also turn these 32 bytes (or the 8 x 4-byte integers they consist of) "ODD" by doing this *pI |= 1 operation on them, so that they are not divisible by 2 anymore.
Then we just iterate backwards, in 4-byte steps, always taking pI[3]*pI[7] and writing the multiplication result into pI[0]. We do that until we reach the transaction section (or those 0x07 bytes we created earlier, if they were necessary)... That's about it.
So this large padding section doesn't have any regularity or repetition, if you were expecting that. It's pretty messy & chaotic data. Mr. Spread really wants your GPU to double-SHA256 a 200Kbyte data structure. All the time.
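For anyone who wants to play with the padding outside the wallet code, here is a minimal standalone sketch of the same steps, assuming nothing beyond what the snippet above shows. padBlock and kMaxBlockSize are hypothetical names, std::vector stands in for the wallet's BlockData stream, and the 200,000-byte default is an assumption about what "200 Kbyte" means. Words are read and written in host byte order, as the pointer casts in the original do.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Assumption: "200 Kbyte" read as 200,000 bytes; see main.h for the real constant.
constexpr std::size_t kMaxBlockSize = 200000;

// Hypothetical helper: returns header + txs padded out to maxSize bytes.
std::vector<std::uint8_t> padBlock(std::vector<std::uint8_t> data,
                                   const std::array<std::uint8_t, 32>& hashPrevBlock,
                                   std::size_t maxSize = kMaxBlockSize) {
    if (maxSize % 4 != 0 || data.size() > maxSize)
        throw std::invalid_argument("maxSize must be a multiple of 4 and >= data size");

    // Align the real block data to a 4-byte boundary with 0x07 bytes.
    while (data.size() % 4 != 0) data.push_back(7);

    const std::size_t beginWord = data.size() / 4;  // first padding word
    const std::size_t endWord = maxSize / 4;        // one past the last word
    std::vector<std::uint32_t> w(endWord, 0);
    if (!data.empty()) std::memcpy(w.data(), data.data(), data.size());

    // Footer: the last 8 words (fewer if there is less room), seeded from hashPrevBlock.
    const std::size_t footer =
        std::max(beginWord, endWord >= 8 ? endWord - 8 : std::size_t{0});
    const std::size_t footerBytes = (endWord - footer) * 4;
    if (footerBytes > 0) std::memcpy(w.data() + footer, hashPrevBlock.data(), footerBytes);

    // Force every footer word odd, so the products below are odd as well.
    for (std::size_t i = footer; i < endWord; ++i) w[i] |= 1u;

    // Walk backwards towards the tx section: w[i] = w[i+3] * w[i+7] (mod 2^32).
    for (std::size_t i = footer; i-- > beginWord;)
        w[i] = static_cast<std::uint32_t>(std::uint64_t{w[i + 3]} * w[i + 7]);

    std::vector<std::uint8_t> out(maxSize);
    if (maxSize > 0) std::memcpy(out.data(), w.data(), maxSize);
    return out;
}
```

One thing the sketch makes easy to see: because the eight seed words are forced odd and the product of two odd numbers is odd, every padding word ends up odd, so the chain of multiplications never collapses to zero. Each word also depends on two later words, which is why the fill has to be produced back to front.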
|
|
|
|
coins101
Legendary
Offline
Activity: 1456
Merit: 1000
|
|
November 21, 2015, 09:26:02 PM |
|
I think it's a good idea, on paper. Tx's are given priority, the rest is there to make mining equal. With near full blocks, mining should be easy. Perhaps we should cut the block sizes down to 1 tx 1tx blocks, moving up to 1mb blocks by next year. Maybe we should just organise hard forks every 6 months.
|
|
|
|
georgem (OP)
Legendary
Offline
Activity: 1484
Merit: 1007
spreadcoin.info
|
|
November 21, 2015, 09:36:12 PM Last edit: November 21, 2015, 10:49:09 PM by georgem |
|
Maybe I can shortcut the memory usage!
I told Mr. Spread about your idea. He LOL'ed and... Who is Mr. Spread?
|
|
|
|
coins101
Legendary
Offline
Activity: 1456
Merit: 1000
|
|
November 21, 2015, 09:39:11 PM |
|
That looks like he's standing in front of an ASICs with vents. Has Mr.Spread already created an ASICs for SpreadX11?
|
|
|
|
georgem (OP)
Legendary
Offline
Activity: 1484
Merit: 1007
spreadcoin.info
|
|
November 21, 2015, 09:41:45 PM |
|
That looks like he's standing in front of an ASICs with vents. Has Mr.Spread already created an ASICs for SpreadX11?
Yep! It's a hamster dung powered wooden ASIC alright. He created it himself, just with the stuff that was lying around...
|
|
|
|
coins101
Legendary
Offline
Activity: 1456
Merit: 1000
|
|
November 21, 2015, 09:51:13 PM |
|
That looks like he's standing in front of an ASICs with vents. Has Mr.Spread already created an ASICs for SpreadX11?
Yep! It's a hamster dung powered wooden ASIC alright. He created it himself, just with the stuff that was lying around... Dude...He's ripping you off, stealing your electricity. // Fill rest of the buffer to ensure that there is no incentive to mine small blocks without transactions. So, that's clear now....the padding is to make mining equal. Just needs a better implementation, but still with the padding, I guess.
|
|
|
|
georgem (OP)
Legendary
Offline
Activity: 1484
Merit: 1007
spreadcoin.info
|
|
November 21, 2015, 10:30:50 PM |
|
What I don't get is why the AMD miner is so much worse than nvidia; and it has problems too.
I can't really judge it (yet), but it's probably badly implemented.
It's largely SPH code... really, really bad. About the same as the original darkcoin-mod. EDIT: Idea! What's the padding made out of? Maybe I can shortcut the memory usage!
What I want to find out is how many times we actually need to calculate that double-SHA256 over the whole 200 KBytes. Maybe we can skip / hold a few iterations.
|
|
|
|
georgem (OP)
Legendary
Offline
Activity: 1484
Merit: 1007
spreadcoin.info
|
|
November 21, 2015, 10:37:08 PM |
|
I think it's a good idea, on paper. Tx's are given priority, the rest is there to make mining equal. With near full blocks, mining should be easy. Perhaps we should cut the block sizes down to 1 tx 1tx blocks, moving up to 1mb blocks by next year. Maybe we should just organise hard forks every 6 months.
One thing is for sure. If we keep the algo as it is, and increase the block size to say 2 Megabytes (10x), this will also make the padding / hashWholeBlock calculation 10x heavier on your GPU. I wonder if this algo can be reduced in complexity while still maintaining the same results. (But we can also introduce new complexity if it helps make everything MORE anti-pool and pro solo-mining.) After all, a solo-miner is always also a full node. Bingo!
|
|
|
|
georgem (OP)
Legendary
Offline
Activity: 1484
Merit: 1007
spreadcoin.info
|
|
November 23, 2015, 07:18:53 PM |
|
It's not really the SHA256d that's bad, it's the REST of X11.
You mean the way the whole SPH library has been ported to OpenCL, right?
|
|
|
|
georgem (OP)
Legendary
Offline
Activity: 1484
Merit: 1007
spreadcoin.info
|
|
November 23, 2015, 07:44:34 PM |
|
It's not really the SHA256d that's bad, it's the REST of X11.
You mean the way the whole SPH library has been ported to OpenCL, right?
If you wanna call it that. It hasn't been ported so much as copypasted, and on top of this, SPH is a BAD library to use for any kind of speed-critical application. Its main purpose is to be portable across a wide range of CPUs, not perform well.
I understand now, thanks. This means that the efficiency can probably be increased tenfold, I would guess... wow!
|
|
|
|
georgem (OP)
Legendary
Offline
Activity: 1484
Merit: 1007
spreadcoin.info
|
|
November 23, 2015, 08:02:32 PM |
|
Not quite - you need to take into consideration that the massive number of SHA256 hashes take MOST of the time. So the REMAINING code can probably be doubled in speed, and I'm not sure what I can do with the signature yet. Right now, signature2 is not looking good - https://ottrbutt.com/tmp/spreadx11-sig2-analysis.png -- it's bigger than the code cache by a lot (code cache on GCN is 32KiB), and uses enough registers to limit it to one wave in flight. Since the kernel also uses some memory, it probably would benefit from more waves in flight. Interesting, so SHA256d is THE problem, although SPH is the most obvious thing that can be improved. I need to look into this CodeXL thing to analyze kernels. PS: mr. spread asks if you have any NSFW pics of naked hamster girls.
|
|
|
|
georgem (OP)
Legendary
Offline
Activity: 1484
Merit: 1007
spreadcoin.info
|
|
November 23, 2015, 08:32:25 PM |
|
SHA256d, at least the code itself, probably isn't going to get too much faster. HOWEVER, it can be improved, I think, by improving the structure of the kernel. It's partially unrolled, possibly wasting space. There's a tradeoff in rolling it up - I'll have to branch, or use conditional moves - but I'm pretty sure it'll be WELL worth it to decrease register usage and shrink code size.
Thanks again for providing this insight. It doesn't work for mr. spread if she has no "cheek pouches"... thanks but no thanks.
|
|
|
|
chrysophylax
Legendary
Offline
Activity: 2912
Merit: 1091
--- ChainWorks Industries ---
|
|
November 25, 2015, 12:51:54 AM |
|
Can someone shed some light on this:
    uint64_t signature8[5];
    signature8[0] = psign[0];
    signature8[1] = psign[8];
    signature8[2] = psign[16];
    signature8[3] = psign[24];
    signature8[4] = psign[32];

    uint64_t signature[4];
    signature[0] = (DEC64LEng(psign + 0) >> 8) | (signature8[1] << 56);
    signature[1] = (DEC64LEng(psign + 8) >> 8) | (signature8[2] << 56);
    signature[2] = (DEC64LEng(psign + 16) >> 8) | (signature8[3] << 56);
    signature[3] = (DEC64LEng(psign + 24) >> 8) | (signature8[4] << 56);

    signature8[1] = signature[0] >> 56;
    signature8[2] = signature[1] >> 56;
    signature8[3] = signature[2] >> 56;
    signature8[4] = signature[3] >> 56;

    signbe[0] = SWAP8((signature[0] << 8) | signature8[0]);
    signbe[1] = SWAP8((signature[1] << 8) | signature8[1]);
    signbe[2] = SWAP8((signature[2] << 8) | signature8[2]);
    signbe[3] = SWAP8((signature[3] << 8) | signature8[3]);
    signbe[4] = (signature8[4] << 56) | 0x80000000000000;

Just... what is it even doing? Got it. Fun fact, that can be replaced by this:
    for(int i = 0; i < 4; ++i)
        signbe[i] = SWAP8(((ulong *)psign)[i]);

    signbe[4] = (((ulong)psign[32]) << 56) | 0x80000000000000;

Whoever wrote that needs to stay away from the alcohol... hahaha ... you will find the main drive for the creation of most crypto IS alcohol ... not innovation ... just kidding ... thats awesome stuff wolf ... #crysx
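If you want to convince yourself that the two versions really produce the same five words, here is a small host-side C++ sketch that runs both on the same 33 bytes. It rests on two assumptions that the thread does not spell out: that DEC64LEng decodes 8 bytes as a little-endian 64-bit value, and that SWAP8 reverses the byte order of a 64-bit value. loadLE64, swap64, signbeVerbose and signbeCompact are names made up for this sketch. The compact version loads byte by byte instead of casting the pointer, so it does not depend on host endianness or alignment.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assumed meaning of DEC64LEng: decode 8 bytes as a little-endian 64-bit value.
inline std::uint64_t loadLE64(const std::uint8_t* p) {
    std::uint64_t v = 0;
    for (int i = 7; i >= 0; --i) v = (v << 8) | p[i];
    return v;
}

// Assumed meaning of SWAP8: reverse the byte order of a 64-bit value.
inline std::uint64_t swap64(std::uint64_t v) {
    std::uint64_t r = 0;
    for (int i = 0; i < 8; ++i) { r = (r << 8) | (v & 0xff); v >>= 8; }
    return r;
}

// The long-winded version, step for step.
std::array<std::uint64_t, 5> signbeVerbose(const std::uint8_t* psign) {
    std::uint64_t s8[5];
    for (int i = 0; i < 5; ++i) s8[i] = psign[8 * i];

    std::uint64_t s[4];
    for (int i = 0; i < 4; ++i)
        s[i] = (loadLE64(psign + 8 * i) >> 8) | (s8[i + 1] << 56);
    for (int i = 0; i < 4; ++i) s8[i + 1] = s[i] >> 56;

    std::array<std::uint64_t, 5> out{};
    for (int i = 0; i < 4; ++i) out[i] = swap64((s[i] << 8) | s8[i]);
    out[4] = (s8[4] << 56) | 0x80000000000000ULL;
    return out;
}

// The short version: byte-swap four 64-bit loads, then append byte 32 and the 0x80 marker.
std::array<std::uint64_t, 5> signbeCompact(const std::uint8_t* psign) {
    std::array<std::uint64_t, 5> out{};
    for (int i = 0; i < 4; ++i) out[i] = swap64(loadLE64(psign + 8 * i));
    out[4] = (std::uint64_t{psign[32]} << 56) | 0x80000000000000ULL;
    return out;
}
```

All the shifting in the long version cancels out: it drops the low byte of each word, borrows the first byte of the next word, and then undoes exactly that before swapping. What is left is the first 32 signature bytes read as four big-endian words, followed by a word holding byte 32 and a 0x80 byte right after it.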
|
|
|
|
|