hirschhornsalz
Newbie
Offline
Activity: 16
Merit: 0
|
|
July 22, 2014, 09:55:10 PM |
|
|
|
|
|
bsunau7
Member
Offline
Activity: 114
Merit: 10
|
|
July 22, 2014, 11:07:30 PM Last edit: July 23, 2014, 12:10:14 AM by bsunau7 |
|
With -m 100000000 on a Nexus 4:
Thanks! With that -m option you are sieving with the first 5.7 million primes, which is about 10 times more than I would suggest. I suspect using a tenth of that will more than double overall performance (reducing it even further should improve things more). While your output was truncated, the overall goal is to get the total time per thread to its minimum. Right now you'd be spending ~2 years per block (i.e. not worth it, but playing with -m should improve that significantly).
Some other notes: the sieve makes use of 128-bit wide NEON instructions (most premium phones will have those). All sieves are impacted by memory performance (and phone memory is slow). The Nexus 4 has a smaller cache than my system, so memory access will play a bigger role in determining your performance (which is why making the sieve even smaller might help). Also, I don't know enough about the Snapdragon processor, but most older ARM CPUs have very bad memory prefetch.
Right now the sieve time should be pretty constant, but the Fermat tests will vary based on difficulty. I was thinking about dynamically balancing the amount of sieving based on the relative performance of the Fermat tests: as difficulty drops, less sieving; as it rises, more sieving. It should also help balance architectural differences between CPUs in addition to difficulty levels.
Thanks again, -- bsunau7
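As a rough check of the "5.7 million primes" figure, here is a minimal Python sketch. It assumes that -m is an upper bound on the sieving primes (the post implies this but does not say so), and it counts primes in small segments so that the working set stays cache-sized. The function names and the 32 KiB segment size are illustrative choices, not taken from the miner.

```python
from math import isqrt

def small_primes(limit):
    """Plain sieve of Eratosthenes; returns all primes <= limit."""
    if limit < 2:
        return []
    flags = bytearray([1]) * (limit + 1)
    flags[0] = flags[1] = 0
    for p in range(2, isqrt(limit) + 1):
        if flags[p]:
            # Clear every multiple of p starting at p*p.
            flags[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [i for i, f in enumerate(flags) if f]

def count_primes(limit, segment_size=32 * 1024):
    """Count primes <= limit, sieving one segment at a time.

    segment_size is a hypothetical tuning knob standing in for
    "small enough to fit in cache".
    """
    if limit < 2:
        return 0
    base = small_primes(isqrt(limit))
    count = 0
    for low in range(2, limit + 1, segment_size):
        high = min(low + segment_size - 1, limit)
        seg = bytearray([1]) * (high - low + 1)
        for p in base:
            if p * p > high:
                break
            # First multiple of p inside [low, high], never p itself.
            start = max(p * p, ((low + p - 1) // p) * p)
            seg[start - low :: p] = bytearray(len(range(start, high + 1, p)))
        count += sum(seg)
    return count
```

Calling `count_primes(100_000_000)` takes a while in pure Python; the known value of the prime-counting function at 10^8 is 5,761,455, which is where the "5.7 million" comes from under the assumption above.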
|
|
|
|
aamarket
|
|
July 23, 2014, 07:59:52 PM |
|
hmmmm ... I'll try over the weekend, but 2 years - even improved by a factor 20 means more than a month for a block ... not very encouraging ...
|
|
|
|
bsunau7
Member
Offline
Activity: 114
Merit: 10
|
|
July 23, 2014, 09:37:08 PM |
|
hmmmm ... I'll try over the weekend, but 2 years - even improved by a factor 20 means more than a month for a block ... not very encouraging ...
My gut says his hardware should be about 4 times quicker with a better-sized (smaller) sieve. So only a factor of 5 left to go (and there are 3 or 4 things which should get me significantly closer if I can pull them off). Regards, -- bsunau7
|
|
|
|
northranger79510
Sr. Member
Offline
Activity: 308
Merit: 250
Riecoin and Huntercoin to rule all!
|
|
July 24, 2014, 05:30:59 PM |
|
Any update this week, Gatra?
|
|
|
|
hirschhornsalz
Newbie
Offline
Activity: 16
Merit: 0
|
|
July 24, 2014, 06:39:34 PM |
|
I suspect using a tenth will more than double overall performance (reducing it even more should improve it further).
Yes, I noticed that the "time to block" decreased to a third when using -m 5M. In my miner (for Intel hardware though) I use all primes up to 2^32, so I was used to some larger numbers :-)
Note that the time to block displayed may be severely off. You need to apply a correction depending on the primorial you use, which will increase the likelihood of finding a sextuplet, but OTOH you do miss some sextuplets if your primorial is bigger than 210 (and of course it is).
Mertens' third theorem fits this kind of sieve very well: it gives (asymptotically) the probability of a number not being sieved out if you sieve with all the primes up to n. You surely noticed the constant factor between p0/p1/p2... This factor can be calculated as the quotient of the Mertens' third theorem value, with n as the sieve size, and the prime density at the given difficulty, which is about 1/ln(2^(diff+256+8+1)). This factor raised to the sixth power (for p6) decreases significantly with the size of the primes used, so it might be worth using primes as big as possible if your sieve is fast enough (but on an ARM 2^32 is probably too big).
I was thinking about a dynamic balancing the amount of sieving based on the relative performance of the fermat tests
The problem I found with the sieve after I eliminated all the modulo operations was that it is limited by memory bandwidth. To make more than one core do useful work on it, I needed to divide it into small chunks which fit into the caches.
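The Mertens estimate described in this post can be sketched in a few lines of Python. This is only an illustration under assumptions: n is taken to be the largest sieving prime, `bits` stands for diff+256+8+1 as in the post, and the tuple-specific constants of a real sextuplet sieve are ignored. The function names are made up for this example.

```python
from math import exp, isqrt, log

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def primes_up_to(n):
    """All primes <= n via a simple sieve of Eratosthenes."""
    if n < 2:
        return []
    flags = bytearray([1]) * (n + 1)
    flags[0] = flags[1] = 0
    for p in range(2, isqrt(n) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i, f in enumerate(flags) if f]

def survival_exact(n):
    """Exact product of (1 - 1/p) over primes p <= n."""
    result = 1.0
    for p in primes_up_to(n):
        result *= 1.0 - 1.0 / p
    return result

def survival_mertens(n):
    """Mertens' third theorem: the product above is about e^-gamma / ln(n)."""
    return exp(-EULER_GAMMA) / log(n)

def prime_density(bits):
    """Approximate density of primes near 2^bits, i.e. 1 / ln(2^bits)."""
    return 1.0 / (bits * log(2))

def stage_factor(sieve_limit, bits):
    """Estimated ratio between consecutive p0/p1/p2... counts.

    Quotient of the Mertens survival probability and the prime density,
    as described in the post (an approximation, not the miner's code).
    """
    return survival_mertens(sieve_limit) / prime_density(bits)
```

With this model the factor is ln(2^bits) / (e^gamma * ln n), so it shrinks only logarithmically as the sieve limit n grows; its sixth power is what matters for a full sextuplet.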
|
|
|
|
bsunau7
Member
Offline
Activity: 114
Merit: 10
|
|
July 25, 2014, 12:19:24 AM |
|
Yes, I noticed that the "time to block" decreased to a third when using -m 5M. In my miner (for Intel hardware though) I use all primes up to 2^32, so I was used to some larger numbers :-) Note that the time to block displayed may be severely off. You need to apply a correction depending on the primorial you use, which will increase the likelihood of finding a sextuplet, but OTOH you do miss some sextuplets if your primorial is bigger than 210 (and of course it is).
The first choice I made was to use 32bit everywhere (64bit arithmetic is expensive). This means each thread only searches at most (nonce -> nonce+2^32), so sieving with larger primes has a low chance of pruning a candidate. It just becomes cheaper to stop sieving (diminishing returns) and start Fermat testing. I don't use the pnXX+97 step, but test all 5005 candidates per pn19, so I don't believe I should miss any, and I am not sure why a correction factor would be required.
Mertens' third theorem fits this kind of sieve very well: it gives (asymptotically) the probability of a number not being sieved out if you sieve with all the primes up to n. You surely noticed the constant factor between p0/p1/p2... This factor can be calculated as the quotient of the Mertens' third theorem value, with n as the sieve size, and the prime density at the given difficulty, which is about 1/ln(2^(diff+256+8+1)). This factor raised to the sixth power (for p6) decreases significantly with the size of the primes used, so it might be worth using primes as big as possible if your sieve is fast enough (but on an ARM 2^32 is probably too big).
Yes I did, ~34-35 is the observed factor (at the current difficulty and sieving with the first 550k primes) and I use it outside of the miner to estimate time to block. In the miner I still use the original gatra code for the time estimate; it is ~10% of what it should be, which isn't a big concern (i.e. low priority). I think the sieve code is pretty fast, but it is only fast because I stick to the ARM's word size. A sieve of a few hundred thousand primes will be a natural limit for a 32bit implementation. Until the difficulty increases (when Fermat testing gets significantly more expensive) I don't see any benefit in removing the 32bit design limit.
The problem I found with the sieve after I eliminated all the modulo operations was that it is limited by memory bandwidth. To make more than one core do useful work on it, I needed to divide it into small chunks which fit into the caches.
Yes, memory bandwidth is a big issue on ARM as well. Not only does it use slower memory, hardware prefetch is not as advanced as on x86, and reading and writing to the same cache line has a penalty, as does misalignment. My first few miners were all running at the same general speed no matter what I did; once I started minimizing memory access I was able to get some good gains. Regards, -- bsunau7
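The "5005 candidates per pn19" figure in this post can be reproduced with a short Python sketch. It assumes that pn19 means the primorial 19# = 9699690 and that the sextuplet pattern is n, n+4, n+6, n+10, n+12, n+16; neither is spelled out in the thread, and the function names are illustrative only.

```python
# Assumed sextuplet pattern (offsets from the first member).
SEXTUPLET_OFFSETS = (0, 4, 6, 10, 12, 16)

def admissible_residues(p, offsets=SEXTUPLET_OFFSETS):
    """Residues r mod p for which no member r + offset is divisible by p."""
    return [r for r in range(p) if all((r + o) % p for o in offsets)]

def candidates_per_primorial(primes, offsets=SEXTUPLET_OFFSETS):
    """Number of admissible starting residues modulo the product of `primes`.

    By the Chinese remainder theorem the per-prime counts multiply.
    """
    total = 1
    for p in primes:
        total *= len(admissible_residues(p, offsets))
    return total
```

Under these assumptions the primes 2, 3, 5 and 7 each leave a single admissible residue (together the class 97 mod 210), while 11, 13, 17 and 19 leave 5, 7, 11 and 13 residues, giving 5 * 7 * 11 * 13 = 5005 candidates per 19#.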
|
|
|
|
dga
|
|
July 25, 2014, 02:08:58 AM |
|
I'll need to look at dga's code to come up with a method to map relative performance; question is does dga's miner look for any 4 out of 6 or does it look for any chain (contiguous) of 4?
The formula dga's miner uses is the 1st plus any three of the remaining 5. So all other things being equal the scaling factor should be 10 (please check my numbers, red wine and all...). You'll need to multiply the p4's I report by 10 and divide by the time taken. Regards, -- bsunau7
Correct. I coordinated this definition with jh00 of ypool to ensure that the definition of a share was consistent, so if you need to change it for some reason, just be careful with the pool interaction whichever way you go.
|
|
|
|
gatra (OP)
|
|
July 25, 2014, 03:21:57 AM |
|
Hi people! I'm still short on time... other members of the community offered help, so I'll be working with them.
My problem debugging (the code I hacked based on) dga's miner is that using Eclipse + gdb, after the mining threads are created it stops responding for a while, making debugging very hard.
|
|
|
|
northranger79510
Sr. Member
Offline
Activity: 308
Merit: 250
Riecoin and Huntercoin to rule all!
|
|
July 25, 2014, 04:33:04 AM |
|
Hi people! I'm still short on time... other members of the community offered help, so I'll be working with them.
My problem debugging (the code I hacked based on) dga's miner is that using Eclipse + gdb, after the mining threads are created it stops responding for a while, making debugging very hard.
Great! Breaking Ypool's monopoly is a nice goal to have.
|
|
|
|
hirschhornsalz
Newbie
Offline
Activity: 16
Merit: 0
|
|
July 25, 2014, 07:32:25 AM |
|
The first choice I made was to use 32bit everywhere (64bit arithmetic is expensive).
Yes, of course. In the sieving code I use 32 bit too, except for the sieve setup, which requires long (256 bit) arithmetic.
This means each thread only searches at most (nonce -> nonce+2^32) so I don't believe I should miss any, so I am not sure why a correction factor would be required.
I search from nonce to nonce+2^255 with a sieve of 2^31 bits, and to do that, every bit in my sieve represents a number 16057 + 173# * n, where 16057 is the offset of the second sextuplet and 173# is the primorial I use. This of course leaves large gaps between the numbers represented by the sieve, and I wrongly assumed your code would be similar.
Yes, memory bandwidth is a big issue in ARM as well.
It's not only bandwidth (mainly reading a big precalculated table of primes and modular inverses, which the x86 does indeed handle pretty well) but also the latency of the pseudo-random accesses when the primes get bigger than the table, each one amounting to 60 ns (300 clock cycles!). Nevertheless I found it useful if in the end the blocks come faster :-)
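The bit-to-number mapping described in this post (each sieve bit n stands for 16057 + 173# * n) can be illustrated with a small Python sketch. The sextuplet offsets 0, 4, 6, 10, 12, 16 and all function names are assumptions made for this example; the real miner would also align the start to the block's nonce range, which is left out here.

```python
from math import prod

OFFSETS = (0, 4, 6, 10, 12, 16)  # assumed sextuplet pattern
BASE_OFFSET = 16057              # starting offset quoted in the post

def is_prime(n):
    """Trial division; fine for the small numbers used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def primorial(p_max):
    """Product of all primes <= p_max, e.g. primorial(7) == 210."""
    return prod(p for p in range(2, p_max + 1) if is_prime(p))

def candidate(bit_index, step, base=BASE_OFFSET):
    """Number represented by sieve bit `bit_index`: base + step * bit_index."""
    return base + step * bit_index

def survives_small_primes(bit_index, p_max=173):
    """True if no member of the tuple has a prime factor <= p_max."""
    start = candidate(bit_index, primorial(p_max))
    small = [p for p in range(2, p_max + 1) if is_prime(p)]
    return all((start + o) % p for o in OFFSETS for p in small)
```

Because 16057 + offset is prime for every offset in the assumed pattern, adding any multiple of 173# cannot introduce a factor of 173 or below, so every bit already represents a tuple that has passed the small primes "for free"; only primes above 173 need to be sieved.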
|
|
|
|
northranger79510
Sr. Member
Offline
Activity: 308
Merit: 250
Riecoin and Huntercoin to rule all!
|
|
July 26, 2014, 03:00:47 AM |
|
I know this is late, but riecoin.org has been updated to match the updated client and miner versions.
|
|
|
|
northranger79510
Sr. Member
Offline
Activity: 308
Merit: 250
Riecoin and Huntercoin to rule all!
|
|
July 29, 2014, 03:24:11 PM |
|
How many people are planning to buy Riecoin as soon as (if) it goes to .00001? I know I am.
|
|
|
|
primer-
Legendary
Offline
Activity: 1092
Merit: 1000
|
|
July 29, 2014, 03:35:31 PM |
|
How many people are planning to buy Riecoin as soon as (if) it goes to .00001? I know I am.
It's dead
|
|
|
|
northranger79510
Sr. Member
Offline
Activity: 308
Merit: 250
Riecoin and Huntercoin to rule all!
|
|
July 29, 2014, 03:40:51 PM |
|
How many people are planning to buy Riecoin as soon as (if) it goes to .00001? I know I am.
It's dead
o.o lol. It's far from dead if you read Gatra's announcement. Your account makes me lose any faith in your credibility.
|
|
|
|
primer-
Legendary
Offline
Activity: 1092
Merit: 1000
|
|
July 29, 2014, 03:48:37 PM |
|
How many people are planning to buy Riecoin as soon as (if) it goes to .00001? I know I am.
It's dead
o.o lol. It's far from dead if you read Gatra's announcement. Your account makes me lose any faith in your credibility.
Gatra is one competent developer but the coin is dead. I will be following his work; should he launch a new coin, I'll mine-support it.
|
|
|
|
northranger79510
Sr. Member
Offline
Activity: 308
Merit: 250
Riecoin and Huntercoin to rule all!
|
|
July 29, 2014, 03:54:22 PM |
|
How many people are planning to buy Riecoin as soon as (if) it goes to .00001? I know I am.
It's dead
o.o lol. It's far from dead if you read Gatra's announcement. Your account makes me lose any faith in your credibility.
Gatra is one competent developer but the coin is dead. I will be following his work; should he launch a new coin, I'll mine-support it.
I agree that the coin had more hype before and has lost traction since, but a dead coin is one like Kanye or Particle, where the dev actually abandoned it... continuous development = not dead.
|
|
|
|
c.figgis
Newbie
Offline
Activity: 14
Merit: 0
|
|
July 30, 2014, 09:57:10 PM |
|
Hi riecoin community, what service/services do you think this coin needs to make it more valuable?
the more the merrier! The most important service would be more pools. Then it could be online wallets, payment processors, blockchains with api like blockchain.info.... We still have no open source xpt pool or proxy?
|
|
|
|
northranger79510
Sr. Member
Offline
Activity: 308
Merit: 250
Riecoin and Huntercoin to rule all!
|
|
July 31, 2014, 02:22:29 PM |
|
Hi riecoin community, what service/services do you think this coin needs to make it more valuable?
the more the merrier! The most important service would be more pools. Then it could be online wallets, payment processors, blockchains with api like blockchain.info.... We still have no open source xpt pool or proxy?
I believe Gatra is still working on it. Unsurprisingly, it is difficult to do that. Fortunately, he says that he now has a team of developers, I believe?
|
|
|
|
c.figgis
Newbie
Offline
Activity: 14
Merit: 0
|
|
July 31, 2014, 06:44:49 PM |
|
Hi riecoin community, what service/services do you think this coin needs to make it more valuable?
the more the merrier! The most important service would be more pools. Then it could be online wallets, payment processors, blockchains with api like blockchain.info.... We still have no open source xpt pool or proxy?
I believe Gatra is still working on it. Unsurprisingly, it is difficult to do that. Fortunately, he says that he now has a team of developers, I believe?
Many months ago I set up a high-uptime distributed pool infrastructure because we were expecting that piece. I don't see how the lack-of-pools problem will be resolved without it. Oh well. If it ever gets built we'll turn that pool on.
|
|
|
|
|