Last night my 3 cards mining Spread got ~100 coins. I saw them showing in the wallet in the morning, and then suddenly they all became "rejected"?! Any idea?! Are there newer wallets/miners? :S
No, most people are still mining with the GPU miners created by Mr. Spread / girino and tsiv. Can you provide us with a getinfo of the wallet you used for mining? BTW mining 100 coins with 3 cards in 1 day seems unbelievable... what's going on here? Could it be that you started the wallet in testnet mode?
Every day about 8000 SPR are mined. Right now nethash is around 300 MH/s, so you need about 3.75 MH/s to get 100 coins a day. That is about 2-3 GTX 750 Ti or one GTX 970.
Yes, this makes sense alright. I haven't done any GPU mining yet, so I only have second-hand opinions from other people about what can be achieved with which GPU model.
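A quick sanity check of the arithmetic above, as a minimal Python sketch. The function name is made up for illustration, and it assumes your expected share of the daily emission equals your share of the network hashrate:

```python
def required_hashrate_mhs(target_coins_per_day, network_coins_per_day, nethash_mhs):
    """Hashrate (MH/s) needed so the expected daily reward equals the target.

    Assumes reward share == hashrate share, i.e. no luck, no orphans.
    """
    share = target_coins_per_day / network_coins_per_day
    return share * nethash_mhs


# Figures quoted in the post: 8000 SPR/day emitted, ~300 MH/s nethash.
needed = required_hashrate_mhs(100, 8000, 300)
print(f"{needed:.2f} MH/s for 100 coins/day")
```

100 out of 8000 coins is 1.25% of the emission, and 1.25% of 300 MH/s is the 3.75 MH/s quoted in the post.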
|
|
|
Has there been any progress made on funding the project?
Yes, a few people are sponsoring further development. It's just a matter of time for servicenodes to reach testnet. Stay tuned.
|
|
|
Probably more, but I don't want to look that deeply into it. You've done enough. Thanks for your assessment. I will take over from here.
|
|
|
SHA256d, at least the code itself, probably isn't going to get too much faster. HOWEVER, it can be improved, I think, by improving the structure of the kernel. It's partially unrolled, possibly wasting space. There's a tradeoff in rolling it up - I'll have to branch, or use conditional moves - but I'm pretty sure it'll be WELL worth it to decrease register usage and shrink code size.
Thanks again for providing this insight. It doesn't work for mr. spread if she has no "cheek pouches"... thanks but no thanks.
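To make the "rolled up" idea from the quoted post concrete, here is a rough Python sketch of a fully rolled SHA-256 compression function. It is not the miner's kernel and says nothing about GPU performance; it only shows the structure being discussed: one 64-iteration loop and a 16-word sliding window for the message schedule instead of 64 expanded words. All helper names are illustrative.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF


def _first_primes(n):
    found, c = [], 2
    while len(found) < n:
        if all(c % p for p in found):
            found.append(c)
        c += 1
    return found


def _icbrt(n):
    """Integer cube root (floor) by binary search."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


PRIMES = _first_primes(64)
# Round constants: first 32 fractional bits of the cube roots of the primes.
K = [_icbrt(p << 96) & MASK for p in PRIMES]
# Initial state: first 32 fractional bits of the square roots of the primes.
H0 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def compress(state, block):
    """One SHA-256 compression, rolled: a single loop, 16-word schedule window."""
    w = list(struct.unpack(">16I", block))
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        if t >= 16:
            # w[t & 15] still holds W[t-16]; overwrite it in place with W[t].
            w15, w2 = w[(t - 15) & 15], w[(t - 2) & 15]
            s0 = rotr(w15, 7) ^ rotr(w15, 18) ^ (w15 >> 3)
            s1 = rotr(w2, 17) ^ rotr(w2, 19) ^ (w2 >> 10)
            w[t & 15] = (w[t & 15] + s0 + w[(t - 7) & 15] + s1) & MASK
        big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[t] + w[t & 15]) & MASK
        big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]


def sha256(data):
    bit_len = len(data) * 8
    data += b"\x80"
    data += b"\x00" * ((56 - len(data) % 64) % 64)
    data += struct.pack(">Q", bit_len)
    state = list(H0)
    for i in range(0, len(data), 64):
        state = compress(state, data[i:i + 64])
    return struct.pack(">8I", *state)


# Cross-check against the standard library.
print(sha256(b"abc") == hashlib.sha256(b"abc").digest())
```

The `if t >= 16` branch is the kind of conditional the quoted post mentions as the cost of rolling the loop; in exchange the schedule needs 16 live words rather than 64.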
|
|
|
Not quite - you need to take into consideration that the massive number of SHA256 hashes take MOST of the time. So the REMAINING code can probably be doubled in speed, and I'm not sure what I can do with the signature yet. Right now, signature2 is not looking good - https://ottrbutt.com/tmp/spreadx11-sig2-analysis.png - it's bigger than the code cache by a lot (code cache on GCN is 32KiB), and uses enough registers to limit it to one wave in flight. Since the kernel also uses some memory, it probably would benefit from more waves in flight.
Interesting, so SHA256d is THE problem, although SPH is the most obvious thing that can be improved. I need to look into this CodeXL thing to analyze kernels. PS: mr. spread asks if you have any NSFW pics of naked hamster girls.
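On the "one wave in flight" remark: a back-of-the-envelope way to see how register usage caps occupancy on GCN. The figures used here (256 vector registers per work-item per SIMD, at most 10 waves per SIMD, allocation in steps of 4) are my assumptions about GCN-class hardware, not numbers from the thread, and the sketch ignores scalar registers and local memory, which can limit occupancy too.

```python
def gcn_waves_per_simd(vgprs_per_work_item, vgpr_budget=256, max_waves=10, granularity=4):
    """Estimate wavefronts per SIMD as limited by vector register usage only.

    vgpr_budget, max_waves and granularity are assumed GCN figures.
    """
    if vgprs_per_work_item < 1:
        raise ValueError("a kernel uses at least one VGPR")
    # Registers are handed out in blocks, so round the request up.
    allocated = -(-vgprs_per_work_item // granularity) * granularity
    return min(max_waves, vgpr_budget // allocated)


for vgprs in (24, 64, 84, 128, 129):
    print(vgprs, "VGPRs ->", gcn_waves_per_simd(vgprs), "wave(s) per SIMD")
```

Under these assumptions a kernel drops to a single wave per SIMD once it needs more than 128 vector registers, which is consistent with the situation described for signature2.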
|
|
|
It's not really the SHA256d that's bad, it's the REST of X11.
You mean the way the whole SPH library has been ported to OpenCL, right?
If you wanna call it that. It hasn't been ported so much as copy-pasted, and on top of this, SPH is a BAD library to use for any kind of speed-critical application. Its main purpose is to be portable across a wide range of CPUs, not to perform well.
I understand now, thanks. This means that the efficiency can probably be increased tenfold, I would guess... wow!
|
|
|
I think it's a good idea, on paper. Txs are given priority, the rest is there to make mining equal. With near-full blocks, mining should be easy. Perhaps we should cut the block size down to 1-tx blocks, moving up to 1 MB blocks by next year. Maybe we should just organise hard forks every 6 months.
One thing is for sure: if we keep the algo as it is and increase the block size to, say, 2 megabytes (10x), this will also make the padding / hashWholeBlock calculation 10x heavier on your GPU. I wonder if this algo can be reduced in complexity while still maintaining the same results. (But we can also introduce new complexity if it helps make everything MORE anti-pool and pro solo-mining.) After all, a solo miner is always also a full node.
Bingo!
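The "10x heavier" estimate follows from how SHA-256 works: it processes the input in 64-byte chunks, one compression call per chunk, so the cost grows linearly with the buffer size. A small sketch that counts compression calls for a double SHA-256, assuming "200 KByte" means 200,000 bytes and "2 megabytes" means 2,000,000 bytes (the exact constants are not given in the thread):

```python
def sha256_compressions(message_len):
    """Number of 64-byte compression calls SHA-256 needs for a message.

    The padding adds one 0x80 byte plus an 8-byte length field, then rounds
    up to a multiple of 64 bytes.
    """
    return (message_len + 9 + 63) // 64


def sha256d_compressions(message_len):
    """Double SHA-256: hash the message, then hash the 32-byte digest."""
    return sha256_compressions(message_len) + sha256_compressions(32)


current = sha256d_compressions(200_000)     # assumed current MAX_BLOCK_SIZE
bigger = sha256d_compressions(2_000_000)    # hypothetical 10x block size
print(current, bigger, round(bigger / current, 2))
```

Almost all of the work is in the first pass over the big buffer; the second hash is a single compression call either way.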
|
|
|
What I don't get is why the AMD miner is so much worse than nvidia; and it has problems too.
I can't really judge it (yet), but it's probably badly implemented. It's largely SPH code... really, really bad. About the same as the original darkcoin-mod. EDIT: Idea! What's the padding made out of? Maybe I can shortcut the memory usage!
What I want to find out is how many times we actually need to calculate that double-SHA256 over the whole 200 KBytes. Maybe we can skip / hold a few iterations.
|
|
|
That looks like he's standing in front of an ASIC with vents. Has Mr. Spread already created an ASIC for SpreadX11?
Yep! It's a hamster dung powered wooden ASIC alright. He created it himself, just with the stuff that was lying around...
|
|
|
Maybe I can shortcut the memory usage!
I told Mr. Spread about your idea. He LOL'ed and... Who is Mr. Spread?
|
|
|
What I don't get is why the AMD miner is so much worse than nvidia; and it has problems too.
I can't really judge it (yet), but it's probably badly implemented. It's largely SPH code... really, really bad. About the same as the original darkcoin-mod. EDIT: Idea! What's the padding made out of? Maybe I can shortcut the memory usage!
The padding is constructed starting with a seed/copy of the 32 bytes of previousHashBlock (or hashPrevBlock as it is called in the code), then there are a few bit operations (if necessary) and then tens of thousands of multiplications while we are filling the space (moving backwards through the block). But we basically start with just those 32 bytes, and everything is derived from them. Padding starts here: https://github.com/spreadcoin/spreadcoin/blob/master/src/main.cpp#L1511

    while (BlockData.size() % 4 != 0)
        BlockData << uint8_t(7);

    // Fill rest of the buffer to ensure that there is no incentive to mine small blocks without transactions.
    uint32_t *pFillBegin = (uint32_t*)&BlockData[BlockData.size()];
    uint32_t *pFillEnd = (uint32_t*)&BlockData[MAX_BLOCK_SIZE];
    uint32_t *pFillFooter = std::max(pFillBegin, pFillEnd - 8);

    memcpy(pFillFooter, &hashPrevBlock, (pFillEnd - pFillFooter)*4);
    for (uint32_t *pI = pFillFooter; pI < pFillEnd; pI++)
        *pI |= 1;

    for (uint32_t *pI = pFillFooter - 1; pI >= pFillBegin; pI--)
        pI[0] = pI[3]*pI[7];

    BlockData.forsed_resize(MAX_BLOCK_SIZE);

First thing we do is fill up the block from the left (right after the tx section) with a few 0x07 bytes (only if necessary), just so that the current size of the block data (header + txs) is exactly divisible by 4. Maybe the size is already divisible by 4, in which case we don't need to add any such bytes.
Then we define the pointers pFillBegin, pFillEnd and pFillFooter, copy hashPrevBlock to the end (last 32 bytes) of this MAX_BLOCK_SIZE block, and fill all the empty bytes in between moving backwards, 4 bytes per iteration. Oh, and before we start we also turn these 32 bytes (or rather the 8 x 4-byte integers they consist of) "ODD" by doing the *pI |= 1 operation on them, so that they are not divisible by 2 anymore.
Then we just iterate backwards, in 4-byte steps, always taking pI[3]*pI[7] and writing the multiplication result into pI[0]. We do that until we reach the transaction section (or those 0x07 bytes we created earlier, if they were necessary)... That's about it.
So this large padding section doesn't have any regularity or repetition, if you were expecting that. It's pretty messy & chaotic data. Mr. Spread really wants your GPU to double-SHA256 a 200 KByte data structure. All the time.
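For anyone who finds the pointer arithmetic hard to follow, here is my reading of the quoted C++ as a small Python sketch. It is an illustration, not a verified re-implementation: the function name is made up, MAX_BLOCK_SIZE = 200,000 is an assumption (the thread only says "200 KByte"), and the words are treated as little-endian uint32, as they would be on an x86 host.

```python
import struct

MAX_BLOCK_SIZE = 200_000  # assumption: "200 KByte" taken as 200,000 bytes
MASK = 0xFFFFFFFF


def build_padded_block(block_data, hash_prev_block, max_block_size=MAX_BLOCK_SIZE):
    """Pad header+txs up to max_block_size, following the quoted C++ step by step."""
    assert len(hash_prev_block) == 32 and max_block_size % 4 == 0
    data = bytearray(block_data)
    while len(data) % 4 != 0:          # align to 4 bytes with 0x07 filler
        data.append(7)
    assert len(data) <= max_block_size

    begin = len(data) // 4             # pFillBegin, as a word index
    end = max_block_size // 4          # pFillEnd
    footer = max(begin, end - 8)       # pFillFooter
    words = [0] * end

    # memcpy of hashPrevBlock into the footer, then force every word odd.
    seed = struct.unpack("<8I", hash_prev_block)
    for i in range(footer, end):
        words[i] = seed[i - footer] | 1

    # Walk backwards; uint32 multiplication wraps modulo 2**32.
    for i in range(footer - 1, begin - 1, -1):
        words[i] = (words[i + 3] * words[i + 7]) & MASK

    fill = struct.pack("<%dI" % (end - begin), *words[begin:end])
    return bytes(data) + fill


padded = build_padded_block(b"\x01" * 81, bytes(range(32)))
print(len(padded))
```

One thing the sketch makes easy to check: since the eight seed words are forced odd and odd times odd is odd, every fill word stays odd, so the sequence can never collapse to zeros.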
|
|
|
I'm baffled as to why it's done the way it is. There's just a TON of data in there being hashed by SHA256, far larger than any block SPR has...
Yes, the idea is that to calculate hashWholeBlock we need to fill a complete MAX_BLOCK_SIZE block with data (that's why we have the "padding") and then double-SHA256 it. MAX_BLOCK_SIZE for spreadcoin is 200 KByte. https://github.com/spreadcoin/spreadcoin/blob/master/src/main.h#L35
So you have the header (less than 100 bytes), followed by a few txs (under 1 KByte, or more), and then a giant padding (199 KBytes) to fill up the rest. (This large pseudo-block exists only temporarily in memory BTW, just to create the hashWholeBlock hash.)
From the whitepaper: "Padding ensures that there is no incentive to mine empty blocks without transactions."
So the way I understand this, the fewer transactions you get into your block, the more padding has to be calculated. Vice versa: the bigger the block already is (more txs), the less padding has to be added/calculated. So in a way, padding is horrible on efficiency, since we mostly have 1-transaction blocks these days. When this changes, padding will not carry as much weight as it does now. I don't think the padding calculations are that heavy anyway, but I need to do some benchmarks and look into OpenCL / CUDA.
And yes, we are ALWAYS doing a double SHA256 calculation on a MAX_BLOCK_SIZE block (200 KByte). This size doesn't change from block to block, it's always the maximum.
Why did Mr. Spread do it this way? I'd have to guess, but probably to deliberately give GPUs a hard time? Wasn't he playing the "CPU-only coin" card for some time?
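To put rough numbers on "the fewer transactions, the more padding", here is a small sketch. MAX_BLOCK_SIZE = 200,000 bytes is an assumption (the post only says 200 KByte), the helper names are made up, and the buffer that gets hashed at the end is a zero-filled placeholder of the right size, not real padding.

```python
import hashlib

MAX_BLOCK_SIZE = 200_000  # assumption: "200 KByte" taken as 200,000 bytes


def padding_bytes(header_and_tx_len, max_block_size=MAX_BLOCK_SIZE):
    """Bytes of fill needed after header+txs are aligned to 4 bytes."""
    aligned = (header_and_tx_len + 3) // 4 * 4
    return max_block_size - aligned


def padding_share(header_and_tx_len, max_block_size=MAX_BLOCK_SIZE):
    return padding_bytes(header_and_tx_len, max_block_size) / max_block_size


def sha256d(data):
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


# Roughly today's case: ~100-byte header plus ~1 KByte of transactions.
print(f"{padding_share(1100):.2%} of the hashed buffer is padding")

# Whatever the tx load, the buffer that gets double-hashed has the same size.
placeholder_block = bytes(MAX_BLOCK_SIZE)   # stand-in, NOT real padding
print(sha256d(placeholder_block).hex())
```

So with near-empty blocks well over 99% of what gets hashed is padding, and a fuller block only changes how much of the fixed-size buffer has to be generated, not how much has to be hashed.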
|
|
|
Girino's IS better, and as I said, this is not a crippling of the miner, it's just WTF.
Can you be a little bit more specific please? You used words like WTF, Insane, Barbaric and Ooops, so now I am not sure I understand the point you are trying to make. Nobody doubts that the miner's efficiency can be improved, and everybody knows that SpreadX11 is pretty exotic.* So... what gives? Are you just baffled by it? In a good/bad way?
*(It adds a MinerSignature to the block header and constructs a MAX SIZE block by padding from the previousBlockHash. This means many thousands of iterations, depending on how "empty" the block is, but those are just simple multiplications and bit operations, no hashes. The point is that a hashWholeBlock value can then be added to the header, to prove that whoever found a block solution was also the person who signed the coinbase and knew the whole block's content before any pool did.)
|
|
|
What about the commit history on GitHub? It's on there, right?
Yep, exactly... that's why I had girino in my mind in the first place.
|
|
|
Looks like I oopsed, they're not hashing the same thing over 3k times, they're hashing an INSANE amount of data. What the fuck is all of this...?
I don't remember who put together the AMD miner, I'm not even sure if I was part of the spreadcoin community yet back then. Looks like whoever did this was pretty negligent.
I remember who it was, it was Mr. Spread. lol But didn't someone else (girino?) create an "improved" version after that? hm.... need to check the old thread's history.
|
|
|
|