Bitcoin Forum
April 10, 2021, 09:59:03 PM *
Author Topic: [XPM] Primecoin Built-in Miner Sieve Performance Issue  (Read 69059 times)
redphlegm
Sr. Member
****
Offline Offline

Activity: 246
Merit: 250


My spoon is too big!


View Profile
July 12, 2013, 11:43:17 PM
 #601

I think he is using Chemisist's 2nd release, which he posted about on the last page.
But he is actually pointing out something more interesting.

With proclimit == number-of-cores the cpu utilization will be 100%.

With proclimit > number-of-cores the same amount of cpu is being used, but the reported pps is higher.
That's because less time is spent weaving with the threads fighting each other, and more false-positives are counted by the primespersec value.

So you say leaving setgenerate value on its default which is -1 is the best way to observe the real pps?

I think Chemisist was checking the solved block rate on testnet over a 10-minute period. Have those results from the overthreading been tallied?

Whiskey Fund: (BTC) 1whiSKeYMRevsJMAQwU8NY1YhvPPMjTbM | (Ψ) ALcoHoLsKUfdmGfHVXEShtqrEkasihVyqW
LazyOtto
Sr. Member
****
Offline Offline

Activity: 476
Merit: 250


View Profile
July 12, 2013, 11:44:03 PM
 #602

So you say leaving setgenerate value on its default which is -1 is the best way to observe the real pps?
yes
Chemisist
Member
**
Offline Offline

Activity: 99
Merit: 10



View Profile
July 12, 2013, 11:44:20 PM
 #603

Alright, so I just updated my version (currently on github) such that each thread has an independently evolving weave timing parameter.  To compare mine with Sunny's most recent update, I used the testnet, where my version found 30 confirmed blocks in 10 minutes while the original code found 16 confirmed blocks.  I feel that this is a legitimate comparison because there were no other nodes mining on the testnet at the time (I know this because my client found every consecutive block in both cases).  This comparison was performed on an IBM T61p laptop with a T9300 Core 2 Duo processor.  The current difficulty on the testnet is 5.4426.
Why make it a weave timing parameter and not just a weave count parameter? I think that would be a better metric, as a change in CPU load means the timing parameter's results will change a lot.

some of us on #eligius-prime were able with lukes help and others to get it running.. now im just waiting to see if i can actually get a block..

[image]

Can you share your source code?  Did you modify Sunny's algorithm at all?
I think the biggest change in Luke's miner is that it moves the bnTwoInverse calculation out of Weave() and just pre-calculates it for all of the primes in GeneratePrimeTable(). I didn't get much more performance out of porting that change to primecoin, though I didn't check too hard.
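To make the precalculation idea concrete, here is a minimal sketch, not Luke's actual code. It assumes bnTwoInverse means the inverse of 2 modulo each sieve prime, and it uses plain 32-bit integers instead of bignums. The names PrimeEntry and BuildPrimeTableWithTwoInverse are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical table entry: a prime together with the inverse of 2 modulo that prime.
struct PrimeEntry {
    uint32_t prime;
    uint32_t twoInverse;  // satisfies (2 * twoInverse) % prime == 1; 0 for prime == 2
};

// Builds all primes below `limit` with a plain sieve of Eratosthenes and stores
// 2^-1 mod p next to each one, so a later sieving loop does not have to recompute it.
std::vector<PrimeEntry> BuildPrimeTableWithTwoInverse(uint32_t limit)
{
    assert(limit >= 2);
    std::vector<bool> composite(limit, false);
    std::vector<PrimeEntry> table;
    for (uint32_t n = 2; n < limit; n++) {
        if (composite[n])
            continue;
        // For an odd prime p, (p + 1) / 2 is the inverse of 2,
        // because 2 * (p + 1) / 2 = p + 1, which is 1 modulo p.
        // 2 has no inverse modulo 2, so that entry is marked with 0.
        uint32_t inv = (n == 2) ? 0 : (n + 1) / 2;
        table.push_back({n, inv});
        // Mark multiples starting at n * n; 64-bit arithmetic avoids overflow.
        for (uint64_t m = static_cast<uint64_t>(n) * n; m < limit; m += n)
            composite[m] = true;
    }
    return table;
}
```

With a table like this, the weave loop would only read twoInverse per prime instead of calling a modular-inverse routine each time, which is the saving described above.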

Thanks for the update on Luke's code.

The rationale for using the weave timing parameter instead of the weave count parameter is that the weave timing parameter requires only a call to "GetTimeMicros()", whereas the weave count parameter is far more intensive to calculate.  Calculating it requires looping through all three arrays to find the values that are still false:

from prime.h:

Code:
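    // Count the multipliers that are still candidates after sieving.
    unsigned int GetCandidateCount()
    {
        unsigned int nCandidates = 0;
        for (unsigned int nMultiplier = 0; nMultiplier < nSieveSize; nMultiplier++)
        {
            // A multiplier counts if at least one of the three arrays has not marked it composite.
            if (!vfCompositeCunningham1[nMultiplier] ||
                !vfCompositeCunningham2[nMultiplier] ||
                !vfCompositeBiTwin[nMultiplier])
                nCandidates++;
        }
        return nCandidates;
    }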

The vfComposite arrays all start out as "0", and when a value is found that is not a prime compatible with Sunny's algorithm, its entry gets set to 1.  All vfComposite arrays are nMaxSieveSize in length:

Code:
static const unsigned int nMaxSieveSize = 1000000u;

So calculating a weave count parameter requires up to 3 million boolean tests (the || short-circuits, so fewer when an earlier array entry is still false), plus the cost of the loop, plus an increment each time the if statement is true.
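The cost difference described above can be sketched in a standalone form. This is only an illustration, not the code from prime.h: SieveFlags, CountCandidates and NowMicros are assumed names, and NowMicros is a stand-in for GetTimeMicros() built on std::chrono.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the three composite-flag arrays of the sieve.
struct SieveFlags {
    std::vector<bool> cunningham1, cunningham2, biTwin;
    explicit SieveFlags(std::size_t n) : cunningham1(n), cunningham2(n), biTwin(n) {}
};

// Same logic as the quoted GetCandidateCount(): one full pass over the sieve,
// so the cost grows linearly with the sieve size.
unsigned int CountCandidates(const SieveFlags& s)
{
    assert(s.cunningham1.size() == s.cunningham2.size());
    assert(s.cunningham1.size() == s.biTwin.size());
    unsigned int nCandidates = 0;
    for (std::size_t i = 0; i < s.cunningham1.size(); i++)
        if (!s.cunningham1[i] || !s.cunningham2[i] || !s.biTwin[i])
            nCandidates++;
    return nCandidates;
}

// Stand-in for GetTimeMicros(): a single clock read whose cost does not
// depend on the sieve size.
int64_t NowMicros()
{
    using namespace std::chrono;
    return duration_cast<microseconds>(steady_clock::now().time_since_epoch()).count();
}
```

Calling CountCandidates after every weave step would touch every sieve entry each time, while NowMicros stays constant-cost, which is the trade-off the post is pointing at.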

btc 1ChemaH12nRmd75M8BmPSiqd8x7B2wxFNF     ltc LaWX7jgJDyQ2oFaQYJvo5kqC1e1KYPoCfd     xpm Ab8NSgxHgGUJvHgSHYqMYBMWai6ZdsA91s
K1773R
Legendary
*
Offline Offline

Activity: 1792
Merit: 1008


/dev/null


View Profile
July 12, 2013, 11:45:20 PM
 #604

some of us on #eligius-prime were able with lukes help and others to get it running.. now im just waiting to see if i can actually get a block..
try testnet for tests!
./primecoind stop
./primecoind -testnet
I mined some blocks on -testnet within a few minutes:
http://pastebin.com/GN1fafrm

[GPG Public Key]  [Devcoin Builds]  [BBQCoin Builds]  [Multichain Blockexplorer]  [Multichain Blockexplorer - PoS Coins]  [Ufasoft Miner Linux Builds]
BTC/DVC/TRC/FRC: 1K1773RbXRZVRQSSXe9N6N2MUFERvrdu6y ANC/XPM AK1773RTmRKtvbKBCrUu95UQg5iegrqyeA NMC: NK1773Rzv8b4ugmCgX789PbjewA9fL9Dy1 LTC: LKi773RBuPepQH8E6Zb1ponoCvgbU7hHmd EMC: EK1773RxUes1HX1YAGMZ1xVYBBRUCqfDoF BQC: bK1773R1APJz4yTgRkmdKQhjhiMyQpJgfN
Chemisist
Member
**
Offline Offline

Activity: 99
Merit: 10



View Profile
July 12, 2013, 11:46:29 PM
 #605

I think he is using Chemisist's 2nd release, which he posted about on the last page.
But he is actually pointing out something more interesting.

With proclimit == number-of-cores the cpu utilization will be 100%.

With proclimit > number-of-cores the same amount of cpu is being used, but the reported pps is higher.
That's because less time is spent weaving with the threads fighting each other, and more false-positives are counted by the primespersec value.

So you say leaving setgenerate value on its default which is -1 is the best way to observe the real pps?

I think Chemisist was checking the solved block rate on testnet over a 10-minute period. Have those results from the overthreading been tallied?

I just ran one with 40 threads on 8 cores, which gave me 61/62 confirmations over 10 minutes.  There might be a maximum somewhere between 8 and 40 threads, but I don't have the time right now to figure it out.  Some friends just arrived so I am going to have to make an exit for the evening, unfortunately.  I'll check in later tonight (maybe) or definitely tomorrow.

btc 1ChemaH12nRmd75M8BmPSiqd8x7B2wxFNF     ltc LaWX7jgJDyQ2oFaQYJvo5kqC1e1KYPoCfd     xpm Ab8NSgxHgGUJvHgSHYqMYBMWai6ZdsA91s
AgentME
Member
**
Offline Offline

Activity: 84
Merit: 10


View Profile
July 12, 2013, 11:49:32 PM
 #606

The rationale for the weave timing parameter instead of the weave count parameter is because the weave timing parameter requires only a call to "GetTimeMicros()" whereas determining the weave count parameter is far more intensive to calculate.  To calculate it requires looping through all three arrays to find the values that are still false:

from prime.h:

Code:
   unsigned int GetCandidateCount()
    {
        unsigned int nCandidates = 0;
        for (unsigned int nMultiplier = 0; nMultiplier < nSieveSize; nMultiplier++)
        {
            if (!vfCompositeCunningham1[nMultiplier] ||
                !vfCompositeCunningham2[nMultiplier] ||
                !vfCompositeBiTwin[nMultiplier])
                nCandidates++;
        }
        return nCandidates;
    }

The vfComposite arrays all start out as "0", and when a value is found that is not a prime compatible with Sunny's algorithm, its entry gets set to 1.  All vfComposite arrays are nMaxSieveSize in length:

Code:
static const unsigned int nMaxSieveSize = 1000000u;

So to calculate a weave count parameter requires 3 million boolean tests, plus the costs of a loop plus the increment operator for each time the if statement returns true.
No, I meant only a counter of how many times the Weave() function is called, not related to GetCandidateCount().
urubu
Member
**
Offline Offline

Activity: 87
Merit: 10


View Profile
July 12, 2013, 11:50:04 PM
 #607

My latest Windows builds. From Chemisist source:

Tuned for Sandy and Ivy Intel Core processors (AVX), O3:

https://www.dropbox.com/s/18bgecwqzsmwsh2/primecoin0712v2-avx.zip


Ivy Bridge ONLY build:

https://www.dropbox.com/s/f7fu0u0yk4i09il/primecoin0712v2-ivyonly.zip

XPM: AR2BpBnitqXudN67Ncuc9FfYVT8u9jNe7a

Would your ivy bridge build be best for haswell?
Chemisist
Member
**
Offline Offline

Activity: 99
Merit: 10



View Profile
July 12, 2013, 11:53:22 PM
 #608

The rationale for the weave timing parameter instead of the weave count parameter is because the weave timing parameter requires only a call to "GetTimeMicros()" whereas determining the weave count parameter is far more intensive to calculate.  To calculate it requires looping through all three arrays to find the values that are still false:

from prime.h:

Code:
   unsigned int GetCandidateCount()
    {
        unsigned int nCandidates = 0;
        for (unsigned int nMultiplier = 0; nMultiplier < nSieveSize; nMultiplier++)
        {
            if (!vfCompositeCunningham1[nMultiplier] ||
                !vfCompositeCunningham2[nMultiplier] ||
                !vfCompositeBiTwin[nMultiplier])
                nCandidates++;
        }
        return nCandidates;
    }

The vfComposite arrays all start out as "0", and when a value is found that is not a prime compatible with Sunny's algorithm, its entry gets set to 1.  All vfComposite arrays are nMaxSieveSize in length:

Code:
static const unsigned int nMaxSieveSize = 1000000u;

So to calculate a weave count parameter requires 3 million boolean tests, plus the costs of a loop plus the increment operator for each time the if statement returns true.
No, I meant only a counter of how many times the Weave() function is called, not related to GetCandidateCount().

I don't call the Weave() function over and over like Sunny King's code does.  I call it once and have a for loop inside the function, to eliminate the overhead of repeated function calls.

btc 1ChemaH12nRmd75M8BmPSiqd8x7B2wxFNF     ltc LaWX7jgJDyQ2oFaQYJvo5kqC1e1KYPoCfd     xpm Ab8NSgxHgGUJvHgSHYqMYBMWai6ZdsA91s
anonppcoin
Newbie
*
Offline Offline

Activity: 48
Merit: 0


View Profile
July 12, 2013, 11:55:42 PM
 #609

My latest Windows builds. From Chemisist source:

Tuned for Sandy and Ivy Intel Core processors (AVX), O3:

https://www.dropbox.com/s/18bgecwqzsmwsh2/primecoin0712v2-avx.zip


Ivy Bridge ONLY build:

https://www.dropbox.com/s/f7fu0u0yk4i09il/primecoin0712v2-ivyonly.zip

XPM: AR2BpBnitqXudN67Ncuc9FfYVT8u9jNe7a

Would your ivy bridge build be best for haswell?

The Ivy Bridge build will work well on Haswell. It doesn't use every instruction set available on Haswell, but it uses most of them. I am probably done compiling for the night (yay, Friday!) but maybe another kind soul will build you a core-avx2 optimized daemon.
redphlegm
Sr. Member
****
Offline Offline

Activity: 246
Merit: 250


My spoon is too big!


View Profile
July 12, 2013, 11:56:37 PM
 #610

My latest Windows builds. From Chemisist source:

Tuned for Sandy and Ivy Intel Core processors (AVX), O3:

https://www.dropbox.com/s/18bgecwqzsmwsh2/primecoin0712v2-avx.zip


Ivy Bridge ONLY build:

https://www.dropbox.com/s/f7fu0u0yk4i09il/primecoin0712v2-ivyonly.zip

XPM: AR2BpBnitqXudN67Ncuc9FfYVT8u9jNe7a

Would your ivy bridge build be best for haswell?

The Ivy Bridge build will work well on Haswell. It doesn't have every instruction set available on Haswell, but most. I am probably done compiling for the night (yay, Friday!) but maybe another kind soul will build you a core-avx2 optimized daemon.

How about my outdated Nehalem?

Whiskey Fund: (BTC) 1whiSKeYMRevsJMAQwU8NY1YhvPPMjTbM | (Ψ) ALcoHoLsKUfdmGfHVXEShtqrEkasihVyqW
AgentME
Member
**
Offline Offline

Activity: 84
Merit: 10


View Profile
July 13, 2013, 12:00:09 AM
 #611

I don't call the Weave() function over and over and over like Sunny King's.  I call it once and then have a for loop inside the function, to eliminate the overhead of continuous function calls
A little refactoring shouldn't stop a counter from being used instead of a timer.
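A rough sketch of what that could look like, assuming the loop over primes lives inside a single call as Chemisist describes. WeaveWithCountLimit and its parameters are hypothetical, and the marking step is a simplified stand-in for the real per-prime work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-call weave: the loop over primes is inside the function,
// and the stop condition is a count of primes processed rather than elapsed time.
// Returns the number of primes actually woven.
std::size_t WeaveWithCountLimit(std::vector<bool>& composite,
                                const std::vector<unsigned int>& primes,
                                std::size_t maxPrimes)
{
    std::size_t woven = 0;
    for (unsigned int p : primes) {
        if (woven >= maxPrimes)
            break;  // deterministic budget: same amount of work regardless of CPU load
        assert(p > 0);
        // Simplified stand-in for the real marking step: flag every multiple of p.
        for (std::size_t m = p; m < composite.size(); m += p)
            composite[m] = true;
        woven++;
    }
    return woven;
}
```

The counter costs one comparison and one increment per prime, and unlike a time budget it gives the same result when other processes are competing for the CPU. The budget maxPrimes could still be tuned per thread in the same way the timing parameter is.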
Chemisist
Member
**
Offline Offline

Activity: 99
Merit: 10



View Profile
July 13, 2013, 12:04:59 AM
 #612

I don't call the Weave() function over and over and over like Sunny King's.  I call it once and then have a for loop inside the function, to eliminate the overhead of continuous function calls
A little refactoring shouldn't stop a counter from being used instead of a timer.

I haven't thought about doing it this way tbh.


btc 1ChemaH12nRmd75M8BmPSiqd8x7B2wxFNF     ltc LaWX7jgJDyQ2oFaQYJvo5kqC1e1KYPoCfd     xpm Ab8NSgxHgGUJvHgSHYqMYBMWai6ZdsA91s
tadakaluri
Hero Member
*****
Offline Offline

Activity: 616
Merit: 500



View Profile WWW
July 13, 2013, 12:07:32 AM
 #613

Updated Windows build using the new Chemisist source. Tuned for Intel Sandy Bridge and Ivy Bridge but compatible with other architectures.

https://www.dropbox.com/s/4k0xmuajxf5i4ly/primecoin0712v3-avx.zip

I'm seeing lower PPS than my v2 builds but I think that weaving will be better overall.

How do I use it?  Overwrite the installed files, or run it from the downloaded folder itself?
fabrizziop
Hero Member
*****
Offline Offline

Activity: 506
Merit: 500



View Profile
July 13, 2013, 12:17:37 AM
 #614

I don't call the Weave() function over and over and over like Sunny King's.  I call it once and then have a for loop inside the function, to eliminate the overhead of continuous function calls
A little refactoring shouldn't stop a counter from being used instead of a timer.

I haven't thought about doing it this way tbh.



I'm getting over 1600 PPS with the new version! Is that for real or what? I just compiled with -O2 -march=native.

https://www.dropbox.com/s/vx9wnzfws4zttg8/primecoin-chemisist-mod-v2-o2-amd.rar

it should run on most recent cpus.
Chemisist
Member
**
Offline Offline

Activity: 99
Merit: 10



View Profile
July 13, 2013, 12:25:14 AM
 #615

I don't call the Weave() function over and over and over like Sunny King's.  I call it once and then have a for loop inside the function, to eliminate the overhead of continuous function calls
A little refactoring shouldn't stop a counter from being used instead of a timer.

I haven't thought about doing it this way tbh.



I'm getting over 1600 PPS with the new version! Are they for real or what?. I just compiled with -O2 -march=native.

Running mine versus the original on testnet shows that I mine 30 blocks versus 16 with the original client in 10 minutes on a Core 2 Duo T9300.  Running on an i7-950 on testnet generates 97 blocks with mine and 81 with the original.
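For anyone who wants to turn those block counts into ratios, a small sketch follows. The helper names are made up, and the spread estimate rests on the assumption that block finds in a fixed window behave like a Poisson process, which is not something measured in this thread.

```cpp
#include <cassert>
#include <cmath>

// Ratio of blocks found by the modified client to blocks found by the original.
double SpeedupRatio(unsigned int modifiedBlocks, unsigned int originalBlocks)
{
    assert(originalBlocks > 0);
    return static_cast<double>(modifiedBlocks) / originalBlocks;
}

// Rough one-sigma spread of a block count, ASSUMING Poisson-distributed finds:
// the standard deviation of a Poisson count with mean n is sqrt(n).
double PoissonSigma(unsigned int blocks)
{
    return std::sqrt(static_cast<double>(blocks));
}
```

SpeedupRatio(30, 16) is 1.875 for the T9300 run and SpeedupRatio(97, 81) is about 1.20 for the i7-950 run. Under the Poisson assumption a count of 16 has a spread of about 4 blocks, so a single 10-minute run is a fairly noisy measurement and repeating it would tighten the comparison.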

btc 1ChemaH12nRmd75M8BmPSiqd8x7B2wxFNF     ltc LaWX7jgJDyQ2oFaQYJvo5kqC1e1KYPoCfd     xpm Ab8NSgxHgGUJvHgSHYqMYBMWai6ZdsA91s
romerun
Legendary
*
Offline Offline

Activity: 1078
Merit: 1001


Bitcoin is new, makes sense to hodl.


View Profile
July 13, 2013, 12:26:33 AM
 #616

Although I get more pps from Chemisist's build, I have yet to find a block since switching from 1.1. It's been about 8 hours on 18 cores...
PoolMinor
Legendary
*
Offline Offline

Activity: 1730
Merit: 1125


XXXVII Fnord is toast without bread


View Profile
July 13, 2013, 12:50:49 AM
 #617

This is my AMD Phenom II 710 X3 Unleashed to 4 cores.

Code:

13:48:22
"blocks" : 23683,
"generate" : true,
"genproclimit" : 3,
"primespersec" : 439,

16:39:46
"blocks" : 24634,
"generate" : true,
"genproclimit" : 3,
"primespersec" : 409,

16:39:58
setgenerate true 30

16:40:40
getmininginfo

16:40:40
{
"blocks" : 24639,
"currentblocksize" : 1000,
"currentblocktx" : 0,
"errors" : "",
"generate" : true,
"genproclimit" : 30,
"primespersec" : 624,
"pooledtx" : 0,
"testnet" : false
}

16:40:55
getmininginfo

16:40:55
{
"blocks" : 24641,
"currentblocksize" : 1000,
"currentblocktx" : 0,
"errors" : "",
"generate" : true,
"genproclimit" : 30,
"primespersec" : 624,
"pooledtx" : 0,
"testnet" : false
}

16:41:34
getprimespersec

16:41:34
903

17:20:25
getprimespersec

17:20:25
1043

17:21:35
getprimespersec

17:21:35
1073

Btc=C2MF    Go With FLO (public persistent storage)     Free BTC Poker
Being defeated is often a temporary condition. Giving up is what makes it permanent. -Marilyn vos Savant
tinnvec
Newbie
*
Offline Offline

Activity: 54
Merit: 0



View Profile
July 13, 2013, 12:58:38 AM
 #618

This is my AMD Phenom II 710 X3 Unleashed to 4 cores.

Looks right in line, my AMD Phenom II X4 920 is sitting around 1250 primes/sec

I also run on linux, so I thought I'd share my little bash startup script in case others can use it:
Code:
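#!/bin/sh
# Start the daemon, show a live readout, then kill the daemon once watch exits.
cd [INSERT PATH TO PRIMECOIND HERE]
./primecoind --daemon
# Refreshes balance and mining info every 2 seconds (watch's default interval).
watch './primecoind getbalance ; ./primecoind getmininginfo'
# Runs after you quit watch (Ctrl+C); kill -9 terminates the daemon immediately.
kill -9 $(pidof primecoind)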

This'll give you a little readout to watch your balance and miner info. When you quit (Ctrl+C), it will then kill the primecoind process for you.
drummerjdb666
Full Member
***
Offline Offline

Activity: 244
Merit: 101



View Profile
July 13, 2013, 01:10:09 AM
 #619

Should I try using more than 8 threads?  It seems my 3770k won't go higher than 1700 pps, which is nice considering that when we started I was originally getting 400.  I can't seem to get the 2k or 3k other people are showing from their 3770k's.  I'm using the Ivy-only build on Win7.  I tried to compile on Ubuntu through my VM, but it seems I either failed or am using the wrong distro.
PoolMinor
Legendary
*
Offline Offline

Activity: 1730
Merit: 1125


XXXVII Fnord is toast without bread


View Profile
July 13, 2013, 01:13:48 AM
 #620

My point in showing those high PPS was to confirm or deny whether they had any bearing on finding more blocks or not. I have not found any blocks in the last 18 hours, only 5 total since start anyway. I made the higher thread count change within the last 3 hours and have not seen any difference, other than a higher number to look at.

Btc=C2MF    Go With FLO (public persistent storage)     Free BTC Poker
Being defeated is often a temporary condition. Giving up is what makes it permanent. -Marilyn vos Savant