smooth
Legendary
Offline
Activity: 2968
Merit: 1198
April 29, 2014, 11:52:10 PM Last edit: April 30, 2014, 12:44:20 AM by smooth
I may not understand what you are trying to accomplish.
If H1 is slow but does not require a lot of random access to memory, then you can run H1 on a GPU or ASIC, then deliver a set of indexes into the blockchain to the node. If the blockchain fits in memory then you are doing a handful of memory accesses and the other work may dominate. If the blockchain does not fit in memory, then you are giving a huge advantage to people with large solid state drives (flash or battery-backed DRAM) or, probably better, the ability to store the block chain in an in-memory KVS across multiple servers.
This may frustrate decentralization, because you are better off just maintaining a connection to a node/pool with such a device than running a node yourself.
If you want to use the block chain for PoW like Ethereum to require miners to run nodes (but see above), then you can probably do something simple like:
B = block
E = hash function, such as Keccak
B(i) = blockchain data at index i (mod len(blockchain) or some such)

H1 = E(B)
H2 = E(B+1)
PoW = E(B(H1) + H2)
Could be repeated, but not sure that adds much.
Maybe that is close to what you propose, but again I don't see the point of using a scratchpad at all. The blockchain is essentially your scratchpad.
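For concreteness, a minimal C++ sketch of the simple scheme above. The Keccak here is replaced by a toy stand-in hash, "B + 1" is modeled as the block data with a counter byte appended, and the chain is treated as a flat array of 32-byte entries; all of that is illustrative, not a real implementation.

#include <array>
#include <cstdint>
#include <cstring>
#include <cstddef>
#include <vector>

using hash_t = std::array<uint8_t, 32>;

// Toy stand-in for E (would be Keccak in practice); illustration only, not cryptographic.
hash_t E(const uint8_t* data, size_t len) {
  hash_t out{};
  uint64_t h = 1469598103934665603ull;                 // FNV-1a style mixing
  for (size_t i = 0; i < len; ++i) h = (h ^ data[i]) * 1099511628211ull;
  for (size_t i = 0; i < out.size(); ++i) {
    h ^= h >> 29; h *= 0x9e3779b97f4a7c15ull;          // cheap extra diffusion
    out[i] = static_cast<uint8_t>(h >> 56);
  }
  return out;
}

// B(i): blockchain data at index i, wrapped mod the chain length (assumes a non-empty chain).
hash_t B_at(const std::vector<hash_t>& chain, uint64_t i) {
  return chain[i % chain.size()];
}

// H1 = E(B), H2 = E(B + 1), PoW = E(B(H1) + H2), with '+' read as concatenation.
hash_t simple_chain_pow(const std::vector<uint8_t>& block_blob,
                        const std::vector<hash_t>& chain) {
  hash_t h1 = E(block_blob.data(), block_blob.size());

  std::vector<uint8_t> blob2(block_blob);
  blob2.push_back(1);                                  // "B + 1" modeled as block data plus a counter byte
  hash_t h2 = E(blob2.data(), blob2.size());

  uint64_t index = 0;
  std::memcpy(&index, h1.data(), sizeof index);        // low 64 bits of H1 pick the chain entry
  hash_t picked = B_at(chain, index);

  std::array<uint8_t, 64> buf{};
  std::memcpy(buf.data(), picked.data(), picked.size());
  std::memcpy(buf.data() + 32, h2.data(), h2.size());
  return E(buf.data(), buf.size());
}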
crypto_zoidberg (OP)
April 30, 2014, 10:19:39 AM Last edit: April 30, 2014, 11:07:12 AM by crypto_zoidberg
Quote from: smooth
I may not understand what you are trying to accomplish.
If H1 is slow but does not require a lot of random access to memory, then you can run H1 on a GPU or ASIC, then deliver a set of indexes into the blockchain to the node.

That's why H1 has to be fast, as you wrote in the shortcomings.

Quote from: smooth
If the blockchain fits in memory then you are doing a handful of memory accesses and the other work may dominate. If the blockchain does not fit in memory, then you are giving a huge advantage to people with large solid state drives (flash or battery-backed DRAM) or, probably better, the ability to store the block chain in an in-memory KVS across multiple servers.
This may frustrate decentralization, because you are better off just maintaining a connection to a node/pool with such a device than running a node yourself.

Let's do some calculations to see what we have. To get hashing data from a block we use:
1. Coinbase outs: usually 10*32 = 320 bytes.
2. Tx hashes: 32 * (from 1 to 80) (80 is the current Bitcoin transaction flow) = from 32 to 2560 bytes.
With 720 blocks per day, the scratchpad will grow by between 92 MB and 758 MB per year. Enough to keep ASICs away, but fine for normal miners. Even if we are very successful and reach Bitcoin's transaction flow within the next ten years, the scratchpad will be about 10 GB, which is not a problem even now.
The real problem, I think, is having an SPV client with this approach. What do you think?

Quote from: smooth
If you want to use the block chain for PoW like Ethereum to require miners to run nodes (but see above), then you can probably do something simple like:
B = block
E = hash function, such as Keccak
B(i) = blockchain data at index i (mod len(blockchain) or some such)
H1 = E(B)
H2 = E(B+1)
PoW = E(B(H1) + H2)
Could be repeated, but not sure that adds much.
Maybe that is close to what you propose, but again I don't see the point of using a scratchpad at all. The blockchain is essentially your scratchpad.

We do something very similar:
E' = first-phase hash
E = final-phase hash
H1' and H1'' = different parts (low and high) of the same hash, used to address a random block
H1 = E'(B)
PoW = E(H1 + E(B(H1')) + E(B(H1'')))
E' can be Keccak (at least it should be as fast as Keccak), but it is better to use a hash with a more complicated instruction set, as I said (64-bit multiplication, AES/SSE).
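To make the arithmetic above explicit, a quick check using the per-block figures from this post (320 bytes of coinbase outs plus 1 to 80 tx hashes of 32 bytes each, at 720 blocks per day):

#include <cstdio>

int main() {
  const double blocks_per_year = 720.0 * 365.0;                  // 262,800 blocks
  const double low_bytes  = (320.0 +  1 * 32.0) * blocks_per_year;
  const double high_bytes = (320.0 + 80 * 32.0) * blocks_per_year;
  std::printf("scratchpad growth: %.1f MB to %.1f MB per year\n",
              low_bytes / 1e6, high_bytes / 1e6);                // ~92.5 MB to ~756.9 MB
  return 0;
}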
smooth
Legendary
Offline
Activity: 2968
Merit: 1198
May 01, 2014, 02:03:20 AM
Quote from: crypto_zoidberg
The real problem, I think, is having an SPV client with this approach. What do you think?

You most certainly can't have SPV clients if verifying the block header pow requires the entire block chain.
As far as the pow itself, I'm still not quite sure what you are trying to accomplish.
As for the bitcoin transaction volume you mentioned, I consider that very low if you are designing for the long term.
But it seems you have thought this through a fair amount, and if you are launching in two days I doubt it makes sense to change the pow right before launch, so I guess we'll just see how it goes. You can always hard fork, but that gets much more difficult to accomplish over time.
digicoin
Legendary
Offline
Activity: 1106
Merit: 1000
May 01, 2014, 09:03:37 AM
Is it possible to extend the daemon to return the full list of block header hashes instead of the full blockchain? What are the security implications of this approach? E.g., could a malicious/compromised node respond with a purposefully modified hash list?
I believe that this coin can take CryptoNote as its core technology, but it must separate itself from Bytecoin to make room for improvement, at least for the medium term.
crypto_zoidberg (OP)
May 02, 2014, 09:56:24 PM
Quote from: smooth
You most certainly can't have SPV clients if verifying the block header pow requires the entire block chain.
As far as the pow itself, I'm still not quite sure what you are trying to accomplish.

Sorry if I'm not being clear. We are looking for a way to be memory-hard (at mining) on the one hand, and on the other to use a wider CPU instruction set (if possible). In our opinion this makes sense for ASIC protection. Please correct me if I'm wrong. At the same time we want a cheap PoW check operation (the reason why we are changing the CryptoNote PoW).

Quote from: smooth
As for the bitcoin transaction volume you mentioned, I consider that very low if you are designing for the long term.
But it seems you have thought this through a fair amount, and if you are launching in two days I doubt it makes sense to change the pow right before launch, so I guess we'll just see how it goes. You can always hard fork, but that gets much more difficult to accomplish over time.

It's probably better to spend one week on discussion before launch than to stress the network with a hard fork.
crypto_zoidberg (OP)
May 02, 2014, 11:10:08 PM
Quote from: digicoin
Is it possible to extend the daemon to return the full list of block header hashes instead of the full blockchain? What are the security implications of this approach? E.g., could a malicious/compromised node respond with a purposefully modified hash list?

Yes, for example we can make the daemon able to return randomly requested block headers, but I don't think it's necessary. An SPV client has to keep a vector of block ids, like our wallet does; that is about 8 MB per year. For an SPV client, each new block should be received together with the headers required to get its PoW. To check whether the supplied headers are valid, you just take the id-hash (cn_fast_hash, which is actually Keccak) of each header and verify that this id equals the id in the SPV client's vector at the corresponding height. Even if a compromised node builds a PoW with fake headers, the SPV client is able to detect it. So we probably need to extend the daemon to work with SPV clients by making it possible to send blocks coupled with the related PoW headers. I think an SPV client could be built from the wallet + the p2p layer + the PoW code.

Quote from: digicoin
I believe that this coin can take CryptoNote as its core technology, but it must separate itself from Bytecoin to make room for improvement, at least for the medium term.

I'm not sure I understand what you mean about separating from Bytecoin.
otila
May 08, 2014, 06:16:36 PM
Quote from: crypto_zoidberg
We are looking for a way to be memory-hard (at mining) on the one hand, and on the other to use a wider CPU instruction set (if possible). In our opinion this makes sense for ASIC protection. Please correct me if I'm wrong.
You can't make it memory-hard with 24-round non-optimized keccak, so why insist on using it?
superresistant
Legendary
Offline
Activity: 2156
Merit: 1131
May 10, 2014, 06:48:20 AM
Quote from: crypto_zoidberg
1. Wide CPU instruction set
2. Memory-oriented algo
3. Small work time.

Memorycoin failed on the first two points (I don't know about the last). It was AES-NI instructions only: if you didn't have a recent CPU, you were very slow or it didn't work at all. It was supposed to rely on the amount of RAM to counter bot farming, but it didn't. I think it is great to be memory dependent. What do you think about a minimum memory amount required to mine?
otila
May 10, 2014, 08:21:33 AM
Quote from: crypto_zoidberg
As you can see, working on a small amount of memory, 100000 hash operations take 3020 ms, while the same number of operations on a 100 MB scratchpad takes 8574 ms. Such a difference (caused by cache memory overflow) points to real memory hardness, we guess.
Compare memory read/written per second by the hash to memmove() speed. What do you get? Does each round of keccak read from different areas of scratchpad?
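One way to run the comparison otila asks for: time a large memmove() and turn it into GB/s, then compare that against the bytes per second the hash actually touches. A rough sketch, assuming a buffer large enough to defeat the caches; absolute numbers will of course vary by machine.

#include <chrono>
#include <cstdio>
#include <cstring>
#include <vector>

int main() {
  const size_t n = 512u * 1024 * 1024;               // 512 MB, well beyond any L3 cache
  std::vector<char> src(n, 1), dst(n);               // pre-touched so page faults don't skew timing

  auto t0 = std::chrono::steady_clock::now();
  std::memmove(dst.data(), src.data(), n);
  auto t1 = std::chrono::steady_clock::now();

  const double secs = std::chrono::duration<double>(t1 - t0).count();
  std::printf("memmove: %.2f GB/s\n", (n / 1e9) / secs);
  // Compare this figure against (bytes touched by the hash) / (hashing time) from the miner.
  return 0;
}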
crypto_zoidberg (OP)
May 10, 2014, 10:37:33 AM
Quote from: superresistant
I think it is great to be memory dependent. What do you think about a minimum memory amount required to mine?

Not very big. It's not about a huge amount of memory: the scratchpad is built from the blocks' pseudorandom data, such as hashes and tx keys, and will grow by about 90 MB/year. A huge scratchpad would make it almost impossible to have an SPV client.
crypto_zoidberg (OP)
May 10, 2014, 10:46:37 AM
Quote from: otila
Compare memory read/written per second by the hash to memmove() speed. What do you get? Does each round of keccak read from different areas of the scratchpad?

1. It gives the correlation between calculation time and memory wait time.
2. Yes, addressing is based on the state buffer.
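A rough illustration of what "addressing based on the state buffer" can mean; this is not the actual wild_keccak code, just a sketch of the idea that after each round the state words themselves select which scratchpad entries get mixed back in.

#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative only: every word of a Keccak-like 25-word state picks a scratchpad entry
// (index derived from the state itself), so each round touches data-dependent locations.
// Assumes a non-empty scratchpad; the '%' here is also where div instructions show up.
void mix_scratchpad(uint64_t (&state)[25], const std::vector<uint64_t>& scratchpad) {
  const size_t len = scratchpad.size();
  for (size_t i = 0; i < 25; ++i) {
    const uint64_t idx = state[i] % len;   // data-dependent index
    state[i] ^= scratchpad[idx];           // mix the fetched word back into the state
  }
}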
hirschhornsalz
Newbie
Offline
Activity: 16
Merit: 0
May 10, 2014, 11:52:43 AM
Now let's imagine, just for a moment, that this currency will be really successful. The blockchain of Bitcoin grows faster than linearly in time, so it makes sense to assume slow exponential growth for a successful altcoin too.
~/.bitcoin $ du -sh blocks
20G    blocks/
Now let's just assume you hit a 40 GB blockchain in 3 years. Are you sure there are enough nodes left with 64 GB of RAM? What about the distribution of this kind of workstation?
crypto_zoidberg (OP)
May 10, 2014, 02:46:26 PM
Quote from: hirschhornsalz
Now let's just assume you hit a 40 GB blockchain in 3 years. Are you sure there are enough nodes left with 64 GB of RAM? What about the distribution of this kind of workstation?

I feel that I need to give a clearer description. Each block's entry in the scratchpad does not depend on the number of transactions included in it. It is fixed at about 320 bytes and uses prev_id, the merkle root, the one-time coinbase key, and the hashed coinbase outs (usually 8). I'll post a more detailed description.
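A hedged sketch of what such a fixed-size entry could look like, given the fields listed above; the field names, the count of eight coinbase outs, and the resulting 352-byte size are assumptions for illustration, not the actual Boolberry layout.

#include <array>
#include <cstdint>

using hash_t = std::array<uint8_t, 32>;

// One scratchpad entry per block, independent of how many transactions the block carries.
struct scratchpad_entry {
  hash_t prev_id;                       // id of the previous block
  hash_t merkle_root;                   // tx merkle root
  hash_t coinbase_onetime_key;          // one-time key from the coinbase tx
  std::array<hash_t, 8> coinbase_outs;  // hashed coinbase outputs (count assumed)
};                                      // 11 * 32 = 352 bytes in this sketch, "about 320" as described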
otila
May 11, 2014, 06:24:50 AM
Quote from: crypto_zoidberg
Each block's entry in the scratchpad does not depend on the number of transactions included in it. It is fixed at about 320 bytes and uses prev_id, the merkle root, the one-time coinbase key, and the hashed coinbase outs (usually 8).

(With mixin_t) with width = 1600, rate = 1536, and capacity = 1600 - 1536 = 64, you get collision resistance = 2^32 and (second) preimage resistance = 2^32 (generic sponge security is about 2^(c/2)). However, data is mixed into the state each round and you use multiply instead of xor, so the security level of the construction is unknown.
otila
May 13, 2014, 08:50:46 AM
Now when mining on testnet, currency::get_blob_longhash takes 75% of CPU time, and 75% of the CPU time in that function is spent on div instructions due to size() and operator[]; not quite memory-hard.
smooth
Legendary
Offline
Activity: 2968
Merit: 1198
May 13, 2014, 09:00:45 AM
Quote from: otila
Now when mining on testnet, currency::get_blob_longhash takes 75% of CPU time, and 75% of the CPU time in that function is spent on div instructions due to size() and operator[]; not quite memory-hard.

The block chain is tiny, right? Probably all in a near cache.
otila
May 13, 2014, 10:22:37 AM Last edit: May 13, 2014, 10:41:03 AM by otila
Quote from: smooth
The block chain is tiny, right? Probably all in a near cache.

EDIT: div cycle count on Sandy Bridge depends on the divisor; bigger divisors take more time (latency: 30-94 cycles). As a comparison, L2 cache minimum latency is 11 cycles. The vector size could just as well be rounded up to the next power of two with some padding, avoiding the modulus with a bitwise AND, so instead of size() you use a shift count. I wouldn't like to uglify the code by implementing reciprocal-multiplication hacks. But who knows, maybe GPUs and ASICs have slow 64-bit divide.
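The change otila sketches, written out under the assumption that the accessor can pad the scratchpad up to the next power of two and repeat earlier entries in the tail; the real accessor in the code base may look different.

#include <cstddef>
#include <cstdint>
#include <vector>

// Before: one 64-bit div per access (the modulus needed to wrap the index).
inline uint64_t fetch_mod(const std::vector<uint64_t>& pad, uint64_t index) {
  return pad[index % pad.size()];
}

// After: pad once to the next power of two, then wrap with a single AND instead of a div.
struct padded_scratchpad {
  std::vector<uint64_t> data;  // size is a power of two; the tail repeats earlier entries
  uint64_t mask = 0;           // size - 1

  explicit padded_scratchpad(const std::vector<uint64_t>& src) {
    if (src.empty()) { data.assign(1, 0); return; }    // degenerate case, single zero entry
    size_t n = 1;
    while (n < src.size()) n <<= 1;                    // round up to the next power of two
    data.resize(n);
    for (size_t i = 0; i < n; ++i) data[i] = src[i % src.size()];
    mask = n - 1;
  }
  uint64_t fetch(uint64_t index) const { return data[index & mask]; }
};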
smooth
Legendary
Offline
Activity: 2968
Merit: 1198
May 13, 2014, 11:18:44 AM
Quote from: otila
EDIT: div cycle count on Sandy Bridge depends on the divisor; bigger divisors take more time (latency: 30-94 cycles). As a comparison, L2 cache minimum latency is 11 cycles.

Right, but memory has much higher latency. The idea is for the block chain data to (eventually) be in memory, not L2. I agree that replacing the div with something faster is probably a good idea, but I haven't looked at this code at all.
otila
May 14, 2014, 01:52:13 PM Last edit: May 15, 2014, 01:25:37 PM by otila
Seems this miner stuff is obfuscated on purpose; the dev is probably running a 10x faster miner himself. But I have two days to make a faster version.
#0 std::vector<crypto::hash, std::allocator<crypto::hash> >::operator[] (this=0x7f837d3fb950, __n=12302) at /usr/include/c++/4.9.0/bits/stl_vector.h:780
#1 0x0000000000c4e75c in currency::miner::<lambda(uint64_t)>::operator()(uint64_t) const (__closure=0x7f837d3fb4e0, index=9325784170990468272) at /c/boolberry/src/currency_core/miner.cpp:365
#2 0x0000000000c4f912 in currency::<lambda(uint64_t (&)[25], uint64_t (&)[24])>::operator()(crypto::state_t_m &, crypto::mixin_t &) const (__closure=0x7f837d3fb200, st=..., mix=...) at /c/boolberry/src/currency_core/currency_format_utils.h:189
#3 0x0000000000c4fdab in crypto::wild_keccak<crypto::mul_f, currency::get_blob_longhash(const blobdata&, crypto::hash&, uint64_t, callback_t) [with callback_t = currency::miner::worker_thread()::<lambda(uint64_t)>; currency::blobdata = std::basic_string<char>; uint64_t = long unsigned int]::<lambda(uint64_t (&)[25], uint64_t (&)[24])> >(const uint8_t *, size_t, uint8_t *, size_t, currency::<lambda(uint64_t (&)[25], uint64_t (&)[24])>) (in=0x7f837d3fb540 "\366+\307\351\330\b3pQ\264\067\061ǭVjf1\034s \237\224\233\327\016\226-\332xko", inlen=33, md=0x7f837d3fb540 "\366+\307\351\330\b3pQ\264\067\061ǭVjf1\034s \237\224\233\327\016\226-\332xko", mdlen=32, cb=...) at /c/boolberry/src/crypto/wild_keccak.h:134
#4 0x0000000000c4fb09 in crypto::wild_keccak_dbl<crypto::mul_f, currency::get_blob_longhash(const blobdata&, crypto::hash&, uint64_t, callback_t) [with callback_t = currency::miner::worker_thread()::<lambda(uint64_t)>; currency::blobdata = std::basic_string<char>; uint64_t = long unsigned int]::<lambda(uint64_t (&)[25], uint64_t (&)[24])> >(const uint8_t *, size_t, uint8_t *, size_t, currency::<lambda(uint64_t (&)[25], uint64_t (&)[24])>) (in=0x7f8381dba098 "\001-\261FVm\345O\330\061\021\237\257\210\200\204\364=\374\243\031.\023\254\350\233O\372\373\262\032~łz$i", inlen=76, md=0x7f837d3fb540 "\366+\307\351\330\b3pQ\264\067\061ǭVjf1\034s \237\224\233\327\016\226-\332xko", mdlen=32, cb=...) at /c/boolberry/src/crypto/wild_keccak.h:151
#5 0x0000000000c4fa8d in currency::get_blob_longhash<currency::miner::worker_thread()::<lambda(uint64_t)> >(const currency::blobdata &, crypto::hash &, uint64_t, currency::miner::<lambda(uint64_t)>) (bd="\001-\261FVm\345O\330\061\021\237\257\210\200\204\364=\374\243\031.\023\254\350\233O\372\373\262\032~łz$i\000\216\353қ\005QJ\034k0\023pq\024y\\\356\031\343\376\376\342\366\250{\340\327\363\344RSn\002c\f\277\236\001", res=..., height=2056, accessor=...) at /c/boolberry/src/currency_core/currency_format_utils.h:179
#6 0x0000000000c4ec65 in currency::miner::worker_thread (this=0x7fff82e7fdd8) at /c/boolberry/src/currency_core/miner.cpp:366
perl
Legendary
Offline
Activity: 1918
Merit: 1190
May 18, 2014, 10:48:44 PM
Do you have anything written about the hash algo, or did I not understand?
What is the interest of a pool that does not accept more than 20 people? The pool needs to validate the miners' best results in order to validate submissions; doing that validation for 20 people takes more mining resources directly.