Mike270
|
 |
July 10, 2013, 08:22:45 PM |
|
Did you measure the speedup of every change? The first rule of optimization is: measure! Otherwise you may be making it worse without knowing...
Hi gatra, with those few blocks/day, and primes/s being quite imprecise... I left optimizations in when they didn't appear to cause an immediately visible (~10 min) deterioration in primes/s. Just a few comments: those bool arrays are probably not much better than the vectors. Each bool takes 1 byte of RAM. If you want to make that faster you have to use an array of ints that is 1/32 of the size, and encode each bool in a single bit of the 32 bits. That will make the arrays shorter, speeding up the mallocs and making more bools fit in the cache.
Also, replacing the call to size() with a variable and splitting the loop look like they're not worth it. However, the upper_bound and lower_bound are definitely better.
Yeah, I'm aware of the bool issue, and I thought let's give it a try in the malloc vs. memory tradeoff. It actually seems to have been the single optimization that brought a rather direct improvement to primes/sec (apart from Sunny King's change). Still, take a look at a profiler: those mallocs are by far the most dominant cost in the whole program.

I'm thinking of merging the three arrays into one and saying e.g. one bit means vfCompositeCunningham1, another one means vfCompositeCunningham2, etc. Since they are usually addressed one after another at the same index, this optimizes cache usage, and something like if (!vfCompositeCunningham1[nMultiplier] || !vfCompositeCunningham2[nMultiplier] || !vfCompositeBiTwin[nMultiplier]) can be reduced to if (newarray[nMultiplier] != 0).

@tacotime: I did a CSieveOfEratosthenes(unsigned int nBits, uint256 hashBlockHeader, CBigNum& bnFixedMultiplier): vfCompositeCunningham1{0}, vfCompositeCunningham2{0}, vfCompositeBiTwin{0} { instead. The compiler gives a strange warning about me having to supply a switch but this behaviour being the default anyway, but all bools appear to be false, so it appears to be working. (I had a loop in the initializer for a while that checked whether all are false and would have output a warning if not; as I saw no warning after ~30 min I removed the check and am assuming that all is fine.)

prime.h:96:2: Warnung: erweiterte Initialisierungsliste nur mit -std=c++11 oder -std=gnu++11 verfügbar [standardmäßig aktiviert]

(Roughly translates to: extended initializer list only available with -std=c++11 or -std=gnu++11 [enabled by default].) But when I supply either, I get errors in other modules...
|
|
|
|
tacotime
Legendary
Offline
Activity: 1484
Merit: 1005
|
 |
July 10, 2013, 08:29:25 PM |
|
I think it'd be ideal to use a 2D array with the three composite values stored sequentially in memory as single bits each
Probably should be a std::bitset datatype too
|
XMR: 44GBHzv6ZyQdJkjqZje6KLZ3xSyN1hBSFAnLP6EAqJtCRVzMzZmeXTC2AHKDS9aEDTRKmo6a6o9r9j86pYfhCWDkKjbtcns
|
|
|
Mike270
|
 |
July 10, 2013, 08:31:54 PM |
|
Thanks Mike! I did some changes but I don't understand some of them, like these three:
-did the headerhash mod outlined earlier
https://bitcointalk.org/index.php?topic=251850.msg2689981#msg2689981 (But I didn't change the MaxSieveSize)
- after creating the vPrimes array in prime.cpp, store vPrimes.size() in a static variable and replace all calls to it with the variable:

in prime.cpp:

// Prime Table
std::vector<unsigned int> vPrimes;
unsigned int vPrimessz; // <--- add this

in GeneratePrimeTable(), add a last line:

vPrimessz = vPrimes.size();

Replace all other occurrences of vPrimes.size() with vPrimessz.

- in prime.cpp, CSieveOfEratosthenes::Weave(): split the for loop into two for loops, one going up to nChainLength, the other going up to 2*nChainLength. Thus, in the first loop you can keep the first for clause but leave the if away, and in the second loop leave the if+for away completely.
for (; nBiTwinSeq < nChainLength; nBiTwinSeq++)
{
    [...]
    //if (nBiTwinSeq < nChainLength)
    for (unsigned int nVariableMultiplier = nSolvedMultiplier; nVariableMultiplier < nMaxSieveSize; nVariableMultiplier += nPrime)
        vfCompositeBiTwin[nVariableMultiplier] = true;
    if (((nBiTwinSeq & 1u) == 0))
        for (unsigned int nVariableMultiplier = nSolvedMultiplier; nVariableMultiplier < nMaxSieveSize; nVariableMultiplier += nPrime)
            vfCompositeCunningham1[nVariableMultiplier] = true;
    else
        for (unsigned int nVariableMultiplier = nSolvedMultiplier; nVariableMultiplier < nMaxSieveSize; nVariableMultiplier += nPrime)
            vfCompositeCunningham2[nVariableMultiplier] = true;
    [...]
}
for (; nBiTwinSeq < 2 * nChainLength; nBiTwinSeq++)
{
    [...]
    //if (nBiTwinSeq < nChainLength)
    //    for (unsigned int nVariableMultiplier = nSolvedMultiplier; nVariableMultiplier < nMaxSieveSize; nVariableMultiplier += nPrime)
    //        vfCompositeBiTwin[nVariableMultiplier] = true;
    if (((nBiTwinSeq & 1u) == 0))
        for (unsigned int nVariableMultiplier = nSolvedMultiplier; nVariableMultiplier < nMaxSieveSize; nVariableMultiplier += nPrime)
            vfCompositeCunningham1[nVariableMultiplier] = true;
    else
        for (unsigned int nVariableMultiplier = nSolvedMultiplier; nVariableMultiplier < nMaxSieveSize; nVariableMultiplier += nPrime)
            vfCompositeCunningham2[nVariableMultiplier] = true;
}
|
|
|
|
gudmunsn
|
 |
July 10, 2013, 08:34:32 PM |
|
I think it'd be ideal to use a 2D array with the three composite values stored sequentially in memory as single bits each
Probably should be a std::bitset datatype too
+1
|
|
|
|
refer_2_me
|
 |
July 10, 2013, 08:35:50 PM |
|
I think it'd be ideal to use a 2D array with the three composite values stored sequentially in memory as single bits each
Probably should be a std::bitset datatype too
+1 When you guys get that figured out, pull requests would be much appreciated.
|
BTC: 1reFerkRnftob5YvbB112bbuwepC9XYLj XPM: APQpPZCfEz3kejrYTfyACY1J9HrjnRf34Y
|
|
|
Sunny King (OP)
Legendary
Offline
Activity: 1205
Merit: 1010
|
 |
July 10, 2013, 08:38:02 PM |
|
Sorry for my earlier error on market cap. The newest exchange rates and coin total (168,186.31) yield a cap of around 68k, which would be 15th, just behind CNC. The hourly mining rate is equivalent to 6 BTC (assuming 1-minute blocks and 20.41 coins per block, as the last few have been), which would be $500 at current exchange rates, still 3rd overall.
Going according to plan, isn't it? Looking forward to XPM's debut on the network mining income pie chart.
|
|
|
|
Mike270
|
 |
July 10, 2013, 08:39:08 PM |
|
I think it'd be ideal to use a 2D array with the three composite values stored sequentially in memory as single bits each
Probably should be a std::bitset datatype too
Hm, that would save memory but cost more in terms of computing (fiddling with all the bit-shift operations). I think I'll first go for a plain int array: when I want to set a bit for one of the former arrays I can just do a |=, and it will greatly simplify the ORing of all three arrays. So I'd guess it's only one malloc for the whole array, plus a saving in computational effort. (Although I'm aware that those CBigNum operations probably far outweigh all these optimizations, if one could save there.)
|
|
|
|
96redformula
|
 |
July 10, 2013, 08:39:23 PM |
|
Sorry for my earlier error on market cap. The newest exchange rates and coin total (168,186.31) yield a cap of around 68k, which would be 15th, just behind CNC. The hourly mining rate is equivalent to 6 BTC (assuming 1-minute blocks and 20.41 coins per block, as the last few have been), which would be $500 at current exchange rates, still 3rd overall.
Going according to plan, isn't it? Looking forward to XPM's debut on the network mining income pie chart. Are you doing the windows binary build?
|
|
|
|
RandyFolds
|
 |
July 10, 2013, 08:39:33 PM |
|
I think a lot of people are frustrated because they don't understand the difference between pooled and solo mining.
In pooled mining, (ie, what we all do with Bitcoin), the work is broken down into a ton of discrete chunks - each miner is compensated when a block is solved based on their proportion of work contributed - ie everyone gets a slow, steady and somewhat predictable payout.
Solo mining's different: blocks don't get shared, so each individual gets a much greater payout, but whether they get it or not is a LOT more dependent on whether they're lucky enough to solve a block before someone else does. There have only been 8000 or so blocks solved so far. I've got the miner running on various computers at varying pps rates; during the first 17 hours or so, I solved no blocks at all. Then they started getting solved. And the thing was, the difference between the number of blocks solved by a computer running at 10 pps and another running at 150 pps is much smaller than the difference in prime generation rate.
So, there are 8000 blocks awarded so far.
Let's say there are 500 people attempting to mine them and each person is using 3 computers to do so; on average, if they've all been mining since the beginning, each computer would have solved 5.333 blocks. But the distribution is a lot more random: some people will have gotten 5 blocks, others will have gotten zero, and others have gotten 20+. But that's the thing with solo mining. And if, in the beginning, there were only 50 people mining during the first hour, 200 mining 6 hours later, and 500 twenty-four hours later, of course there will be drop-offs in the rate at which people see blocks get solved...
Given the newness of the coin, the lack of markets for it, etc., I'd be shocked if someone had already figured out how to have a GPU solve the problems. It's not just rewriting the existing SHA algorithm that bitcoin uses and changing a couple of lines to make it primecoin-ready; it's revising everything about the miner to handle a completely different type of work. Optimizing the CPU miner? Sure. But even so, I don't think they'd be churning out coins in this environment, as, again, I think the payout has a lot more to do with luck at this point than anything else, including raw horsepower.
Putting together some charts to demonstrate the distribution of the rewards I've seen so far.... maybe someone will find it to be of interest, who knows?
But in the meantime, if people want to see more predictable coin generation, then maybe they should start a fund to pay a bounty to the first person to develop a mining pool? I assume it'll be a lot of work, because, again, the dynamics of this coin are much different from any of the others, including the time to "mature". Or, you could make an informal "pool" with someone you know and really trust: just have both of you mine to the same wallet.dat file, and, as you add machines, the randomness that a single miner sees will be evened out more and more...
That's my two cents. Utterly useless, because I'm not a programmer. But so far, I'm pretty excited about this coin, and I can't quite fathom why people are getting upset with it already.
People are getting upset because some people not sharing their compiled code are damaging their own reputation and the reputation of the coin. Also, it's just flat out unethical to steal blocks. 'Of course they are! This is the internetz where anything goes!' is the typical argument; however, I'd like to think the crypto community, and particularly XPM miners, are for the most part much better than the worst posters on the /b forums. Right now there are a lot of people proving me wrong. All you have to do is look at the statistics. Being ahead of the curve is now unethical? Ever hear of this thing called life? So you pretty much outed yourself as someone not sharing your optimizations with the community. Can people please realize this and use the reputation button the way it was intended? There's a rep system here? I see a WoT for TRADE, not bitching because you're butthurt.
|
|
|
|
fluffypony
Donator
Legendary
Offline
Activity: 1274
Merit: 1060
GetMonero.org / MyMonero.com
|
 |
July 10, 2013, 08:39:38 PM |
|
I think it'd be ideal to use a 2D array with the three composite values stored sequentially in memory as single bits each
Probably should be a std::bitset datatype too
+1 When you guys get that figured out, pull requests would be much appreciated. I was literally about to ask why there haven't been pull requests for this:) Much easier working off a common codebase than trying to implement scattered patches.
|
|
|
|
gateway
|
 |
July 10, 2013, 08:41:03 PM |
|
Difficult to say, as Sunny King's change brings a huge boost to the primes/sec measurement, and we already know that this is not a real measurement... I found three blocks in 1.5h whereas I only had 2 blocks in the 24h before that, but since we're talking luck this may well just be a statistical anomaly.
With all mentioned optimizations in place I now see:

2013-07-10 19:54:30 primemeter 1229628 prime/h 9519403 test/h
2013-07-10 19:56:30 primemeter 1200380 prime/h 9385149 test/h
2013-07-10 19:58:31 primemeter 1059123 prime/h 8281702 test/h
2013-07-10 20:00:32 primemeter 1132707 prime/h 8806710 test/h
Whereas before I had:

2013-07-09 16:01:06 primemeter 191133 prime/h 1230638 test/h
2013-07-09 16:03:07 primemeter 207187 prime/h 1333814 test/h
2013-07-09 16:05:07 primemeter 197670 prime/h 1302659 test/h
2013-07-09 16:07:08 primemeter 226844 prime/h 1538940 test/h
2013-07-09 16:08:09 primemeter 241089 prime/h 1653641 test/h
2013-07-09 16:10:09 primemeter 246644 prime/h 1574452 test/h
So if this helps, feel free to donate ;-) XPM ALLbe86QswwTRvx1LTVnKaVXCz6mcA6YUR BTC 135oF6hF9uUbzq9x1vTuaoqKPab2j513f2
I'm thinking the biggest improvements can be done by a) optimizing the algorithm itself, possible according to TacoTime's post b) minimizing malloc()s and I guess there's still lots of potential. If I had more time I'd try looking into using smaller sieves that fit into CPU cache but doing more sieves (so that the range sieved remains the same)
nice, you building on windows or *nix?
|
|
|
|
Mike270
|
 |
July 10, 2013, 08:48:30 PM |
|
Difficult to say, as Sunny King's change brings a huge boost to the primes/sec measurement, and we already know that this is not a real measurement... I found three blocks in 1.5h whereas I only had 2 blocks in the 24h before that, but since we're talking luck this may well just be a statistical anomaly.
With all mentioned optimizations in place I now see:

2013-07-10 19:54:30 primemeter 1229628 prime/h 9519403 test/h
2013-07-10 19:56:30 primemeter 1200380 prime/h 9385149 test/h
2013-07-10 19:58:31 primemeter 1059123 prime/h 8281702 test/h
2013-07-10 20:00:32 primemeter 1132707 prime/h 8806710 test/h
Whereas before I had:

2013-07-09 16:01:06 primemeter 191133 prime/h 1230638 test/h
2013-07-09 16:03:07 primemeter 207187 prime/h 1333814 test/h
2013-07-09 16:05:07 primemeter 197670 prime/h 1302659 test/h
2013-07-09 16:07:08 primemeter 226844 prime/h 1538940 test/h
2013-07-09 16:08:09 primemeter 241089 prime/h 1653641 test/h
2013-07-09 16:10:09 primemeter 246644 prime/h 1574452 test/h
So if this helps, feel free to donate ;-) XPM ALLbe86QswwTRvx1LTVnKaVXCz6mcA6YUR BTC 135oF6hF9uUbzq9x1vTuaoqKPab2j513f2
I'm thinking the biggest improvements can be done by a) optimizing the algorithm itself, possible according to TacoTime's post b) minimizing malloc()s and I guess there's still lots of potential. If I had more time I'd try looking into using smaller sieves that fit into CPU cache but doing more sieves (so that the range sieved remains the same)
nice, you building on windows or *nix? building on Ubuntu. Should I get phenomenal results (ahem) I'll try to follow the instructions on how to set up a windows build environment for my windows machines. Processor on that machine is an i7-920 at stock speeds, primecoind mining with 6 threads at nice 20 (the machine also has other stuff to do, like running bfgminer on my fpgas :-) )
|
|
|
|
mustyoshi
|
 |
July 10, 2013, 08:52:25 PM |
|
I just had a transaction confirmed in under a minute.
The effective mining rate of the network is waaaaay faster than it should be.
|
|
|
|
craslovell
Legendary
Offline
Activity: 1470
Merit: 1021
|
 |
July 10, 2013, 08:52:51 PM |
|
Thanks for sharing, Mike270! We are now onto the 4th day of mining (74 hours since launch)... Some rough observations: - Day 1 block spacing was in the range 40 - 110 seconds | difficulty 7 - 7.2
- Day 2 block spacing was in the range 10 - 40 seconds | difficulty 7.2 - 7.33
- Day 3 block spacing was in the range 15 - 25 seconds | difficulty 7.4 - 7.6
If you are running the original binary released by Sunny, then you have very little chance of finding a block in less than 15 seconds IMO. Yeah, I'm watching all the blocks I'm not mining right now.
|
|
|
|
Buffer Overflow
Legendary
Offline
Activity: 1652
Merit: 1015
|
 |
July 10, 2013, 08:58:04 PM |
|
I just had a transaction confirmed in under a minute.
The effective mining rate of the network is waaaaay faster than it should be.
The per-block difficulty adjustment seems to lag behind.
|
|
|
|
96redformula
|
 |
July 10, 2013, 08:58:16 PM |
|
Well it seems the little guys are out now. The big machines are up and running with the fixed code. BTW, seems like we have a 10-second gap between blocks now.
|
|
|
|
mustyoshi
|
 |
July 10, 2013, 09:00:44 PM |
|
Well it seems the little guys are out now. The big machines are up and running with the fixed code. BTW, seems like we have a 10-second gap between blocks now. My proposed p2pool client can't operate when the network speed is the same as the p2p chain speed is supposed to be.
|
|
|
|
craslovell
Legendary
Offline
Activity: 1470
Merit: 1021
|
 |
July 10, 2013, 09:03:54 PM |
|
Well it seems the little guys are out now. The big machines are up and running with the fixed code. BTW, seems like we have a 10-second gap between blocks now. My proposed p2pool client can't operate when the network speed is the same as the p2p chain speed is supposed to be. Ahh, that's disappointing. Too bad blocks are being found a full 6 times faster than they're supposed to be...
|
|
|
|
romerun
Legendary
Offline
Activity: 1078
Merit: 1001
Bitcoin is new, makes sense to hodl.
|
 |
July 10, 2013, 09:08:27 PM |
|
Where's the most efficient code? 8 cores in total, 24 hours, got nothing.
|
|
|
|
bidji29
|
 |
July 10, 2013, 09:11:13 PM |
|
For the two for loops, I get this:

// Weave the sieve for the prime
unsigned int nChainLength = TargetGetLength(nBits);
for (; nBiTwinSeq < nChainLength; nBiTwinSeq++)
{
    // Find the first number that's divisible by this prime
    int nDelta = ((nBiTwinSeq % 2 == 0)? (-1) : 1);
    unsigned int nSolvedMultiplier = ((bnFixedInverse * (p - nDelta)) % p).getuint();
    if (nBiTwinSeq % 2 == 1)
        bnFixedInverse *= bnTwoInverse; // for next number in chain

    unsigned int nPrime = vPrimes[nPrimeSeq];
    if (nBiTwinSeq < nChainLength)
        for (unsigned int nVariableMultiplier = nSolvedMultiplier; nVariableMultiplier < nMaxSieveSize; nVariableMultiplier += nPrime)
            vfCompositeBiTwin[nVariableMultiplier] = true;
    if (((nBiTwinSeq & 1u) == 0))
        for (unsigned int nVariableMultiplier = nSolvedMultiplier; nVariableMultiplier < nMaxSieveSize; nVariableMultiplier += nPrime)
            vfCompositeCunningham1[nVariableMultiplier] = true;
    else
        for (unsigned int nVariableMultiplier = nSolvedMultiplier; nVariableMultiplier < nMaxSieveSize; nVariableMultiplier += nPrime)
            vfCompositeCunningham2[nVariableMultiplier] = true;
}
for (; nBiTwinSeq < 2 * nChainLength; nBiTwinSeq++)
{
    // Find the first number that's divisible by this prime
    int nDelta = ((nBiTwinSeq % 2 == 0)? (-1) : 1);
    unsigned int nSolvedMultiplier = ((bnFixedInverse * (p - nDelta)) % p).getuint();
    if (nBiTwinSeq % 2 == 1)
        bnFixedInverse *= bnTwoInverse; // for next number in chain

    unsigned int nPrime = vPrimes[nPrimeSeq];
    if (nBiTwinSeq < nChainLength)
        for (unsigned int nVariableMultiplier = nSolvedMultiplier; nVariableMultiplier < nMaxSieveSize; nVariableMultiplier += nPrime)
            vfCompositeBiTwin[nVariableMultiplier] = true;
    if (((nBiTwinSeq & 1u) == 0))
        for (unsigned int nVariableMultiplier = nSolvedMultiplier; nVariableMultiplier < nMaxSieveSize; nVariableMultiplier += nPrime)
            vfCompositeCunningham1[nVariableMultiplier] = true;
    else
        for (unsigned int nVariableMultiplier = nSolvedMultiplier; nVariableMultiplier < nMaxSieveSize; nVariableMultiplier += nPrime)
            vfCompositeCunningham2[nVariableMultiplier] = true;
}

I'm not sure of myself on this one.
|
|
|
|
|