atc1
|
|
April 25, 2014, 11:29:40 AM |
|
Not too sure about the performance hit .. but I was running some of my miners on Windows using Cygwin.
Um... is there a calculator somewhere that estimates the number of coins/day? Is that even possible? Also, how does running on Cygwin affect the rate? Is it slower or the same, and do we need to install anything other than the preinstalled dependencies?
|
|
|
|
otila
|
|
April 25, 2014, 12:46:33 PM |
|
Dave ran out of good names? I get these when not using -std=gnu++11:
xptMiner/riecoinMiner.cpp: In function ‘void riecoin_process(minerRiecoinBlock_t*)’:
xptMiner/riecoinMiner.cpp:512:37: warning: name lookup of ‘i’ changed
    mpz_mul_ui(z_temp2, z_temp2, i);
    ^
xptMiner/riecoinMiner.cpp:438:17: warning: matches this ‘i’ under ISO standard rules
    for(uint32 i=1; i<riecoin_sieveSize; i++) {
    ^
xptMiner/riecoinMiner.cpp:486:17: warning: matches this ‘i’ under old rules
    for (int i = 1; i < 6; i++) {
    ^
|
|
|
|
gatra (OP)
|
|
April 25, 2014, 12:59:55 PM |
|
Hi Riecoiners!
I'm working on a new client version that will include the changes from Bitcoin 0.8.6 to 0.9.1, besides some Riecoin-specific speed improvements. It's a lot of changes, and Bitcoin has changed so much that cherry-picking or rebasing commits results in lots of conflicts that have to be merged manually. That's why it's taking longer than expected, but the good news is that we'll have a much better client soon. There will be a Windows installer, and also 32- and 64-bit versions for Windows. This means block verification will be much faster for Windows users with 64-bit CPUs and OSs.
Regarding the miners, keeping miners updated with all the latest optimizations is a lot of work so I'll focus on an xpt to stratum proxy.
gatra
Thanks Gatra! Would the 0.9.1 include the improvements from dga's xptminer? Also, would there be an independent solo miner that incorporates dga's improvements? Or does it make sense to have an xpt to stratum & GetBlockTemplate proxy?
I'd rather keep the miner embedded in the client as simple as possible, so it works as a reference for what the miner calculates. An xpt to GetBlockTemplate proxy is a nice idea; I'll look into it.
|
|
|
|
dga
|
|
April 25, 2014, 02:14:49 PM |
|
I'd rather keep the miner embedded in the client as simple as possible, so it works as a reference for what the miner calculates. An xpt to GetBlockTemplate proxy is a nice idea; I'll look into it.
For b15, I've stripped out everything but the Riecoin part of xptMiner as a first step towards simplifying the codebase and making it easier to port. There's another possibility - adding stratum support to my miner instead of having two different versions. Let me think about this a little - I'm not sure off the top of my head how hard it will be. Pulling in cgminer to get stratum would be horrible, but maybe there's another way these days?
|
|
|
|
northranger79510
Sr. Member
Offline
Activity: 308
Merit: 250
Riecoin and Huntercoin to rule all!
|
|
April 26, 2014, 03:04:49 AM |
|
This is excellent news. Getting Riecoin stable will be the first step towards the moon.
|
|
|
|
northranger79510
|
|
April 26, 2014, 03:48:34 AM |
|
I think what can help a lot right now are community members posting on this thread daily to keep it on the first page constantly.
|
|
|
|
northranger79510
|
|
April 26, 2014, 04:16:18 AM |
|
Update: As promised, I've updated the mini apps for Android on Google Play for Riecoin. I haven't published all of them yet because I ran across some problems, and I will wait until those are fixed before releasing them on Google Play with confidence.
Here is the first mini-app for Riecoin: https://play.google.com/store/apps/details?id=org.riecoinfoundation.paper
It is a paper wallet for Android, forked from the Bitcoin paper wallet. This way, you don't need to go to a browser to generate an address. For people using a phone to generate addresses, this is a great app to use for Riecoin! It has been tested but, as with all services, I cannot offer any warranty. I can only promise to fix a bug if one is found.
|
|
|
|
dga
|
|
April 26, 2014, 11:26:17 AM |
|
I've updated the optimized miner to b15. This version currently works only on Linux - I would greatly appreciate some help figuring out what I broke on windows/mingw! I've left the b14 binaries for both linux and windows online.
Source and binaries are in the usual spots:
ChangeLog: https://github.com/dave-andersen/fastrie/blob/master/ChangeLog
Source: https://github.com/dave-andersen/fastrie
Binaries: http://www.cs.cmu.edu/~dga/crypto/ric/
The basic summary of the below: It uses a lot less memory and is about 15% faster on most platforms. Single-core machines will be unchanged, and on huge machines (64 core) you'll want to run multiple copies, one per processor slot, for best performance. But for most of us on single or dual CPU platforms with 4-24 cores, this should produce a nice speedup. As always, test for yourself.
b15 (2013-04-26)
- Major internal architectural overhaul. Sieving and primality testing are now divided among all threads instead of having each do a single operation. The current consequence of this is a good speedup on modest-core architectures while using substantially less memory. 4-16 core machines should be particularly happy with this upgrade.
- Sieves can now be up to -s 4100000000 (4 billion) in size, though this does not appear to be a particularly useful setting from a performance perspective.
- Single-core machines may suffer a 5-10% slowdown. If this is prohibitive, let me know, but for now I plan to let it stay that way.
- Very large, slow core machines (e.g., 64 core AMD) are running MUCH slower. Please either continue to use b14 or run multiple copies of the miner, one per physical CPU, using taskset.
- Windows users must use at least Vista (2006, NT 6.0) or later. XP and Windows Server 2003 are no longer supported.
|
|
|
|
dga
|
|
April 26, 2014, 11:29:03 AM |
|
I've updated the optimized miner to b15. This version currently works only on Linux - I would greatly appreciate some help figuring out what I broke on windows/mingw! I've left the b14 binaries for both linux and windows online.
Btw - you want to run this with a larger sieve. Try various values, but I find that two billion works pretty well (-s 2000000000). I'm boosting the default from 500m to 900m to be conservative, but 2b is typically better.
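The advice above amounts to a small parameter sweep: run the miner at several -s values under otherwise identical conditions and keep whichever gives the highest 4ch/s. Here is a minimal Python sketch of that bookkeeping. The binary path and the helper names are placeholders of mine; only the -s and -t flags and the three sieve sizes come from this thread, and any pool or login arguments your setup needs are left out.

```python
import shlex

# Sieve sizes mentioned in the thread: old default, new default, suggested value.
CANDIDATE_SIEVES = [500_000_000, 900_000_000, 2_000_000_000]


def sweep_commands(binary, threads, sieve_sizes):
    """Build one miner command line per candidate sieve size."""
    return [
        f"{shlex.quote(binary)} -t {threads} -s {size}"
        for size in sieve_sizes
    ]


def best_sieve(four_chain_rates):
    """Return the sieve size with the highest measured 4ch/s.

    `four_chain_rates` maps sieve size -> 4ch/s that you measured yourself,
    ideally over a long run and at a similar difficulty.
    """
    return max(four_chain_rates, key=four_chain_rates.get)


# Print the commands to run by hand (the binary path is an assumption).
for command in sweep_commands("./xptminer", 4, CANDIDATE_SIEVES):
    print(command)
```

Since 4ch/s is a small, noisy number, short runs will not separate the candidates reliably; feed `best_sieve` only rates taken from runs of comparable length.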
|
|
|
|
northranger79510
|
|
April 26, 2014, 07:35:33 PM |
|
I've updated the optimized miner to b15. This version currently works only on Linux - I would greatly appreciate some help figuring out what I broke on windows/mingw! I've left the b14 binaries for both linux and windows online.
Excellent work and thanks for your contributions!
|
|
|
|
northranger79510
|
|
April 26, 2014, 07:39:01 PM |
|
Hey Gatra,
will the next client fix the OpenSSL problem?
|
|
|
|
coinfusion
|
|
April 26, 2014, 08:43:28 PM |
|
Just compiled b15 with msys, using 64bit gcc4.8.2 and gmp6 -- 2 shares found so far in the 5 minutes it's been running on win7sp1. I see that with b15 the startup message says "using 4+1 cpu threads". I'm wondering how the threads should be pinned on multi-socket machines. Does it now have 4 worker threads and 1 synchronizer/memory-handling thread? Or can the extra thread be run on a different processor from the workers without too much of a penalty? Will benchmark later to find out.
|
|
|
|
aamarket
|
|
April 26, 2014, 08:45:46 PM |
|
Hi Dave, thanks a lot for the updated miner! I tested b15 compiled on Linux, but it seems to be slower than b14. It says "using 4+1 cpu threads" - is that right? I tested different sieve sizes, but after a couple of minutes everything was slower than b14. Any idea what I should try?
|
IMPORTANT:http://bitcointalk.org/index.php?topic=177133.0,Tips welcome BTC:1AAMARKETmJvfjDwEFmhyYYwfre7ZFVseP RIC:RGnX6LcJrsVEuYeySDDxkmH7AjRqoprcKt
|
|
|
dga
|
|
April 26, 2014, 08:49:10 PM |
|
Hi Dave, thanks a lot for the updated miner! I tested b15 compiled on Linux, but it seems to be slower than b14. It says "using 4+1 cpu threads" - is that right? I tested different sieve sizes, but after a couple of minutes everything was slower than b14. Any idea what I should try?
Define slower. If you're running with the default sieve on both b14 and b15, you will see fewer 2ch/s and more 4ch/s because of the sieve size increase from 500m to 900m. Only 4ch matter. What CPU? Also, be sure you're comparing at the same diff. Yes, to both previous posts: it's using 5 threads if you say -t 4. One of the threads is a master thread that runs at about 15% load. It may be better or worse to use 1 fewer thread; I haven't compared too extensively.
|
|
|
|
dga
|
|
April 26, 2014, 08:51:44 PM |
|
Just compiled b15 with msys, using 64bit gcc4.8.2 and gmp6 -- 2 shares found so far in the 5 minutes it's been running on win7sp1. I see that with b15 the startup message says "using 4+1 cpu threads". I'm wondering how the threads should be pinned on multi-socket machines. Does it now have 4 worker threads and 1 synchronizer/memory-handling thread? Or can the extra thread be run on a different processor from the workers without too much of a penalty? Will benchmark later to find out.
On multi-socket, if you have the memory for it, I'd run one process per socket, with N+1 threads on each socket (where N is the number of cores). There's some coarse-grained sharing between threads. I haven't found it to be a problem on a 2 socket machine, but on a huge AMD with 4 sockets and 8 different NUMA domains, things got bad. Interesting that it works on windows. Gives me hope that it's a mingw or something bug, not a "my use of critical sections or cond vars is wrong" bug. This is *not* mingw, right? Is there some way I should be doing a compile for windows peeps other than mingw on linux? I'd love to provide an official windows binary.
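The one-process-per-socket layout described here can be sketched as a small Python helper that prints taskset-pinned command lines. This is only an illustration under stated assumptions: it assumes CPU ids are numbered contiguously within each socket (check with `lscpu`, since many machines interleave them), the function name and binary path are placeholders, and pool or login arguments are omitted. Passing -t N yields the N+1 threads mentioned above, because the miner adds its own master thread.

```python
def per_socket_commands(sockets, cores_per_socket, sieve_size, binary="./xptminer"):
    """Build one taskset-pinned miner command per CPU socket.

    Assumption: socket 0 owns CPU ids 0..N-1, socket 1 owns N..2N-1, and so
    on. Verify the real topology before relying on this mapping.
    """
    commands = []
    for socket in range(sockets):
        first = socket * cores_per_socket
        last = first + cores_per_socket - 1
        # -t N starts N worker threads; the miner adds 1 master thread itself.
        commands.append(
            f"taskset -c {first}-{last} {binary} "
            f"-t {cores_per_socket} -s {sieve_size}"
        )
    return commands


# Example: a 2-socket machine with 8 cores per socket.
for command in per_socket_commands(2, 8, 2_000_000_000):
    print(command)
```

Note that taskset only restricts which CPUs a process may run on. Each process also allocates its own sieve, hence the "if you have the memory for it" caveat.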
|
|
|
|
coinfusion
|
|
April 26, 2014, 09:23:54 PM |
|
On multi-socket, if you have the memory for it, I'd run one process per socket, with N+1 threads on each socket (where N is the number of cores).
There's some coarse-grained sharing between threads. I haven't found it to be a problem on a 2 socket machine, but on a huge AMD with 4 sockets and 8 different NUMA domains, things got bad.
Interesting that it works on windows. Gives me hope that it's a mingw or something bug, not a "my use of critical sections or cond vars is wrong" bug. This is *not* mingw, right? Is there some way I should be doing a compile for windows peeps other than mingw on linux? I'd love to provide an official windows binary.
Msys is a unix-like environment for windows that uses mingw compilers, so my build isn't a cross-compile. (Still working fine for over 40 minutes now, seems very slightly faster with 4cores). Hopefully the mingw linux cross-compile environment will be updated soon after gcc4.9 is released, which might fix whatever bug you're running into. It's using only 30% of the memory that b14 does with the same -s setting ??
|
|
|
|
|
coinfusion
|
|
April 26, 2014, 09:49:22 PM |
|
Point of information for my windows compile: with my normal -s value of 400000000 it works fine, but when I cranked it up to 800m it crashed in under 1 minute. Will run overnight with 400m and report back tomorrow.
|
|
|
|
aamarket
|
|
April 26, 2014, 09:53:38 PM |
|
Well, most people, including me, do not have access to university resources.
With my normal one-socket i5-4670 and 8GB RAM, running
fastrie/xptMiner/xptminer -s 700000000 -t 4
it says
[00:08:40] 2ch/s: 28.6247 3ch/s: 1.8826 4ch/s: 0.0630 Shares total: 8 / 8
with b15 - and
[00:08:40] 2ch/s: 29.2391 3ch/s: 1.8196 4ch/s: 0.0630 Shares total: 8 / 8
The difference seems small, but with bigger sieve sizes it is worse. I understand that only 4ch matters; I'll try letting it run for longer with sieve 19e8 and post the result.
Also, regarding diff: it is around 1700 now, but I can hardly force it to be the same. It may be nice to display a better metric than 2ch/s, one with a diff correction as well. E.g. if we have X [2ch/s], use something like X * math.e ** (1.0*diff / 1e4) [2ch*b/s]. Just a wild guess.
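For anyone who wants to compare runs this way, here is a rough Python sketch that pulls the rates out of a status line in the format quoted above and applies the proposed correction. The function names are mine, and the formula is only the guess from this post (rate times e raised to diff / 1e4), not an established metric.

```python
import math
import re

# Matches the rate fields of a status line such as:
# [00:08:40] 2ch/s: 28.6247 3ch/s: 1.8826 4ch/s: 0.0630 Shares total: 8 / 8
STATS_RE = re.compile(
    r"2ch/s:\s*(?P<ch2>[\d.]+)\s+"
    r"3ch/s:\s*(?P<ch3>[\d.]+)\s+"
    r"4ch/s:\s*(?P<ch4>[\d.]+)"
)


def parse_stats(line):
    """Extract the 2ch/3ch/4ch rates from one miner status line."""
    match = STATS_RE.search(line)
    if match is None:
        raise ValueError(f"not a stats line: {line!r}")
    return {name: float(value) for name, value in match.groupdict().items()}


def diff_corrected(rate, diff, scale=1e4):
    """Apply the guessed correction from the post: rate * e**(diff / scale)."""
    return rate * math.exp(diff / scale)


# The two status lines quoted in the post.
first = parse_stats("[00:08:40] 2ch/s: 28.6247 3ch/s: 1.8826 4ch/s: 0.0630 Shares total: 8 / 8")
second = parse_stats("[00:08:40] 2ch/s: 29.2391 3ch/s: 1.8196 4ch/s: 0.0630 Shares total: 8 / 8")
print("4ch/s:", first["ch4"], "vs", second["ch4"])
print("corrected 2ch/s at diff 1700:", diff_corrected(first["ch2"], 1700))
```

With only 8 shares in under nine minutes, the 4ch/s figures are too coarse to separate the two builds, which is why longer runs are the more useful comparison.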
|
|
|
|
|
|