dga
|
|
May 15, 2014, 03:21:40 PM |
|
ubuntu should work fine with ./buildAll
fedora required a bit of help (see history here).
Also a small progress with ARM miner, investigation led me to the part of code, I already marked with "?" long time ago when trying to understand some old miner ...
having
uint32* powHashU32 = (uint32*)powHash;
for(uint32 i=0; i<256; i++) { mpz_mul_2exp(z_target, z_target, 1); if( (powHashU32[i/32]>>(i))&1 ) z_target->_mp_d[0]++; }
It needs ">>(i%32))" I'd say - and I do not know why it works fine on x64, but the change helped to get seemingly proper targets on ARM. Still no share submitted, so I do not know if it helped.
Thank you. Testing and thinking hard about this - this was part of the code I inherited. I'm going to run it past jh as well, because it likely reflects a bug in the base xptMiner as well. I'll commit this fix tomorrow if all is good. -Dave
|
|
|
|
dga
|
|
May 15, 2014, 04:35:09 PM |
|
ubuntu should work fine with ./buildAll
fedora required a bit of help (see history here).
Also a small progress with ARM miner, investigation led me to the part of code, I already marked with "?" long time ago when trying to understand some old miner ...
having
uint32* powHashU32 = (uint32*)powHash;
for(uint32 i=0; i<256; i++) { mpz_mul_2exp(z_target, z_target, 1); if( (powHashU32[i/32]>>(i))&1 ) z_target->_mp_d[0]++; }
It needs ">>(i%32))" I'd say - and I do not know why it works fine on x64, but the change helped to get seemingly proper targets on ARM. Still no share submitted, so I do not know if it helped.
Thank you. Testing and thinking hard about this - this was part of the code I inherited. I'm going to run it past jh as well, because it likely reflects a bug in the base xptMiner as well. I'll commit this fix tomorrow if all is good. -Dave
Looks good from here. Committed now and will back it out if jh says it's wrong. Thanks again for spotting this.
|
|
|
|
gatra (OP)
|
|
May 15, 2014, 06:44:10 PM |
|
ubuntu should work fine with ./buildAll
fedora required a bit of help (see history here).
Also a small progress with ARM miner, investigation led me to the part of code, I already marked with "?" long time ago when trying to understand some old miner ...
having
uint32* powHashU32 = (uint32*)powHash;
for(uint32 i=0; i<256; i++) { mpz_mul_2exp(z_target, z_target, 1); if( (powHashU32[i/32]>>(i))&1 ) z_target->_mp_d[0]++; }
It needs ">>(i%32))" I'd say - and I do not know why it works fine on x64, but the change helped to get seemingly proper targets on ARM. Still no share submitted, so I do not know if it helped.
Thank you. Testing and thinking hard about this - this was part of the code I inherited. I'm going to run it past jh as well, because it likely reflects a bug in the base xptMiner as well. I'll commit this fix tomorrow if all is good. -Dave
Looks good from here. Committed now and will back it out if jh says it's wrong. Thanks again for spotting this.
I found this here: http://stackoverflow.com/questions/3394259/weird-behavior-of-right-shift-operator
The logical right shift (SHR) behaves like a >> (b % 32/64) on x86/x86-64 (Intel #253667, Page 4-404):
The destination operand can be a register or a memory location. The count operand can be an immediate value or the CL register. The count is masked to 5 bits (or 6 bits if in 64-bit mode and REX.W is used). The count range is limited to 0 to 31 (or 63 if 64-bit mode and REX.W is used). A special opcode encoding is provided for a count of 1.
However, on ARM (armv6&7, at least), the logical right-shift (LSR) is implemented as (ARMISA Page A2-6)
(bits(N), bit) LSR_C(bits(N) x, integer shift)
    assert shift > 0;
    extended_x = ZeroExtend(x, shift+N);
    result = extended_x<shift+N-1:shift>;
    carry_out = extended_x<shift-1>;
    return (result, carry_out);
where (ARMISA Page AppxB-13)
ZeroExtend(x,i) = Replicate('0', i-Len(x)) : x
This guarantees a right shift of ≥32 will produce zero. For example, when this code is run on the iPhone, foo(1,32) will give 0.
This shows that shifting a 32-bit integer by ≥32 is non-portable.
So ">> i" may run faster than ">> (i % 32)" on x86 or x86_64 because the % is optimized out, but it is not a good idea: it's not portable, and shifting by a count greater than or equal to the operand width is undefined according to the C standard. Since in the miner this loop is done only once for each search of the 256-bit nonce, you can do i%32 without any harm. Maybe a logical AND instead of the % would be faster? Of course you should profile instead of believing me, but I think optimizing this is not worth the trouble.
gatra
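For anyone following along, here is a minimal sketch of the portable bit test. It is not the actual miner code: `hash_bit` is a made-up helper name, and the eight-word layout is only assumed from the `powHashU32[i/32]` indexing in the loop quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of xptMiner: returns bit i (0..255) of a
   256-bit hash stored as eight 32-bit words. */
static unsigned hash_bit(const uint32_t words[8], unsigned i)
{
    assert(i < 256);
    /* Reduce the shift count to 0..31 so it never reaches the word width;
       shifting a 32-bit value by 32 or more is undefined in C. */
    return (unsigned)((words[i / 32] >> (i % 32)) & 1u);
}
```

Since i is unsigned, `i % 32` and `i & 31` give the same value, so either spelling should behave the same on x86 and ARM.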
|
|
|
|
northranger79510
|
|
May 15, 2014, 07:17:16 PM |
|
Update Thursday here once again. Any news Gatra?
|
|
|
|
jh00
|
|
May 15, 2014, 07:32:05 PM |
|
It needs ">>(i%32))" I'd say - and I do not know why it works fine on x64, but the change helped to get seemingly proper targets on ARM. Still no share submitted, so I do not know if it helped.
Thanks for spotting this. I have committed a fix to the xptMiner repository. The performance difference should be insignificant on x64/x86. I also know there are some other places in the xpt source where memory alignment can cause problems. In particular, all the xptPacketbuffer_* functions use unaligned read/write access, which is supported on x86/x64 but not on other platforms, but I don't know if it matters on ARM.
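As an illustration of the alignment point, a common way to avoid unaligned pointer casts is to copy through memcpy. This is only a sketch: `load_u32` and `store_u32` are invented names, not the real xptPacketbuffer_* functions, and native byte order is assumed because the xpt wire format is not described in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value from any byte offset.
   memcpy has no alignment requirement, unlike *(uint32_t *)p. */
static uint32_t load_u32(const uint8_t *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);   /* native byte order (assumption) */
    return v;
}

/* Hypothetical helper: write a 32-bit value at any byte offset. */
static void store_u32(uint8_t *p, uint32_t v)
{
    assert(p != NULL);
    memcpy(p, &v, sizeof v);
}
```

Compilers typically turn these small fixed-size copies into a single load or store where the target allows it, so the cost on x86/x64 should be negligible.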
|
|
|
|
dga
|
|
May 16, 2014, 01:36:53 AM |
|
(trimmed awesome find from the architecture manual)
So ">> i" may run faster than ">> (i % 32)" on x86 or x86_64 because the % is optimized out, but it is not a good idea: it's not portable, and shifting by a count greater than or equal to the operand width is undefined according to the C standard. Since in the miner this loop is done only once for each search of the 256-bit nonce, you can do i%32 without any harm. Maybe a logical AND instead of the % would be faster? Of course you should profile instead of believing me, but I think optimizing this is not worth the trouble.
gatra
That explains it - thanks. No optimization needed - the compiler will turn %32 into an AND mask anyway, and that part of the code isn't particularly performance critical right now. I'll start going through and slowly cleaning up a few more of these, in addition to any that any of you spot. Thanks again for the bug spotting!
|
|
|
|
gatra (OP)
|
|
May 16, 2014, 04:04:03 AM |
|
After the release of client version 0.9.1 with many new cool features, this was a slow week for me on Riecoin (not much news; I've been very busy with other work).
I didn't have time to look at the android wallet yet. Also, the work done by aamarket on the ARM miner looks promising. I plan to borrow a Mac this weekend and release an OSX version of the 0.9.1 client, and I'll keep working on stratum. I'll also write down the math for expected pool shares vs blocks.
Thanks and stay tuned, gatra
|
|
|
|
ryen123
|
|
May 16, 2014, 07:09:57 PM |
|
@Dga - Will there be a windows version of the b15 xptminer?
|
|
|
|
dga
|
|
May 16, 2014, 07:23:10 PM |
|
@Dga - Will there be a windows version of the b15 xptminer?
Yup. My semester just ended, so I'll have some time to start thinking about it and try to squish the bug. I think I need to get a working windows VM setup first, though, unless there's a way to get a newer gcc working on mingw under Linux. That's the first problem - and then there's clearly some kind of other bug with the sieve. Any linux/mingw experts have a solution to get something newer than 4.6.x running?
|
|
|
|
bsunau7
|
|
May 17, 2014, 03:17:22 AM |
|
I agree, dga, and I know, but ... just few lines below ifdef ... else ... *(uint32*)(nOffset+d*4) = z_temp2->_mp_d[d]; and it was in the code since the beginning. But to be sure, I'll double check. I know (u)intXX_t is the right way, I already burned myself doing data exchange x64<->mips32 long time ago. I think I replaced some in some previous miner version, doing cleanup as well, but your version is far more superior, so I ditched old changes.
Regarding win - that was exactly what I was thinking, but no results yet apart from "uninitialized values" that seem to be in the libraries and/or old kernel (so no apparent problem in the code), killed in valgrind soon because with sieve=9e8 it's close to my memory limit, and that I checked couple returned triplets, they seem to be primes (like p - 4, p - 6, p - 16 ; p - 4, p - 12, p - 16 ; p - 4, p - 10, p - 16 ; ....). Which points us to your original point.
I'll post some results soon, but the speed is horrible and it does not make sense to run on this ARM architecture.
I run through 32bits of candidates in ~28 seconds (odroid-xu lite). Best speed up trick on ARM is to avoid division (or mod); most ARM CPUs emulate div in SW. I've not spent much (any) time on mine for a few weeks, but the power profile is magic. Waiting for the minnow board MAX, should do wonderful things for embedded miners!
-- bsunau7
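To illustrate the "avoid division" point with a generic example: a sieve can pay for one modulo per prime and then step by addition only. This is a sketch, not code from any of the miners discussed here; `mark_multiples` and its parameters are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: mark every index k in [0, n) where (start + k)
   is divisible by p. One modulo finds the first hit; the loop itself
   uses additions only, so no per-candidate division is needed. */
static void mark_multiples(uint8_t *sieve, size_t n, uint64_t start, uint32_t p)
{
    assert(sieve != NULL && p != 0);
    uint64_t r = start % p;                      /* the only division */
    size_t k = (r == 0) ? 0 : (size_t)(p - r);   /* first multiple at or after start */
    for (; k < n; k += p)
        sieve[k] = 1;
}
```

The naive alternative tests `(start + k) % p == 0` for every k, which costs one division per candidate instead of one per prime.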
|
|
|
|
bsunau7
|
|
May 17, 2014, 03:22:43 AM |
|
I can confirm above mentioned change helped, ARM riecoin miner delivered first 4 primes share (after quite a long time) :
p4=0x803a9a745c512a58eaceaafd83f4259ffd0e9bcec9d306484a0e0a7944efc5762b * 2**1452 + 0x32e238469041bc6ef8edc6dcd5d872f664bcbd5ae7e76fd1385ec97334d4c0f
[00:54:07] Share found! (Blockheight: 54252)
====xptShare:: algo=7,ver=2,nTime=1399991211,nBits=33993728,userExtraNonceLength=4
xptShare->prevBlockHash :: dc 4c 98 04 9e ac af 34 2e a5 4a 7b 64 88 07 5a 64 1c 1b 0b c4 f8 2f c8 33 7f 0d 3b f0 21 87 75
xptShare->merkleRoot :: e9 e6 b4 09 ad 1c 4e 87 8b 10 7b 6e 48 08 23 1d 5f 1a 4e c4 e5 bb ad c2 22 f6 a5 fc 1b 9d 45 91
xptShare->merkleRootOriginal :: 7a 7c e8 0d 6d 07 91 7a e8 40 60 55 0f 46 09 e2 86 9a 49 ca ae 4e e6 28 29 13 d3 4d b9 a8 19 d2
xptShare->userExtraNonceData :: 12 00 00 00
xptShare->riecoin_nOffset :: 0f 4c 4d 33 97 ec 85 13 fd 76 7e ae d5 cb 4b 66 2f 87 5d cd 6d dc 8e ef c6 1b 04 69 84 23 2e 03
1716
[00:54:08] 2ch/s: 0.7239 3ch/s: 0.0454 4ch/s: 0.0013 Shares total: 1 / 1
"Bus error" is still present, so a watchdog had to be implemented, but hey, now everybody can mine coins with their android cell phone
Check for un-initialized (or reused) variables. I found that the optimizer in gcc would get horrendously confused unless everything was initialized, code which would work without optimization would crash with -O2... (and the re-ordering with -O2 is fun to debug).
Regards, -- bsunau7
|
|
|
|
bsunau7
|
|
May 17, 2014, 03:44:29 AM |
|
It needs ">>(i%32))" I'd say - and I do not know why it works fine on x64, but the change helped to get seemingly proper targets on ARM. Still no share submitted, so I do not know if it helped.
Thanks for spotting this. I have committed a fix to the xptMiner repository. The performance difference should be insignificant on x64/x86. I also know there are some other places in the xpt source where memory alignment can cause problems. In particular, all the xptPacketbuffer_* functions use unaligned read/write access, which is supported on x86/x64 but not on other platforms, but I don't know if it matters on ARM.
Yes it does, one fix I had to deploy is summarized in: https://bitcointalk.org/index.php?topic=424517.msg4883101#msg4883101
Regards, -- bsunau7
|
|
|
|
northranger79510
|
|
May 18, 2014, 07:45:25 AM |
|
bumpity
|
|
|
|
primer10
|
|
May 18, 2014, 03:03:41 PM |
|
Diff can't seem to go past 1761..
|
|
|
|
voingiappone
|
|
May 19, 2014, 01:26:14 AM |
|
Hey you guys!
Just wanted to drop a line to say I jumped on Riecoin! I find the concept very interesting and hope it will get some attention sooner or later.
I've been doing some mining with an 8 cores 2.6GHz Xeon and I'm getting really few coins. It's about 0.5 RIC per couple of days... is that normal or am I missing something big? Mining in ypool with the optimized miner found on the pool.
Another thing: I used the cli version of the wallet since I started mining but then I found the binary of the qt client and it is syncing now. It is taking ages... is that normal? Any upgraded node list?
Thx for the help and keep up the good work!
|
|
|
|
zhonghao110
|
|
May 19, 2014, 01:39:47 AM |
|
Interesting coins, looks cheap, look
|
|
|
|
Bigtruck45
|
|
May 19, 2014, 06:16:36 AM |
|
I've updated the optimized miner to b15. This version currently works only on Linux - I would greatly appreciate some help figuring out what I broke on windows/mingw! I've left the b14 binaries for both linux and windows online.
Source and binaries are in the usual spots:
ChangeLog: https://github.com/dave-andersen/fastrie/blob/master/ChangeLog
Source: https://github.com/dave-andersen/fastrie
Binaries: http://www.cs.cmu.edu/~dga/crypto/ric/
The basic summary of the below: It uses a lot less memory and is about 15% faster on most platforms. Single-core machines will be unchanged, and on huge machines (64 core) you'll want to run multiple copies, one per processor slot, for best performance. But for most of us on single or dual CPU platforms with 4-24 cores, this should produce a nice speedup. As always, test for yourself.
b15 (2013-04-26) - Major internal architectural overhaul.
Sieving and primality testing are now divided among all threads instead of having each do a single operation.
The current consequence of this is a good speedup on modest-core architectures while using substantially less memory. 4-16 core machines should be particularly happy with this upgrade.
Sieves can now be up to -s 4100000000 (4 billion) in size, though this does not appear to be a particularly useful setting from a performance perspective.
Single-core machines may suffer a 5-10% slowdown. If this is prohibitive, let me know, but for now I plan to let it stay that way.
Very large, slow core machines (e.g., 64 core AMD) are running MUCH slower. Please either continue to use b14 or run multiple copies of the miner, one per physical CPU, using taskset.
Windows users must use at least Vista (2006, NT 6.0) or later. XP and Windows Server 2003 are no longer supported.
|
|
|
|
Bigtruck45
|
|
May 19, 2014, 06:17:35 AM |
|
Hey you guys!
Just wanted to drop a line to say I jumped on Riecoin! I find the concept very interesting and hope it will get some attention sooner or later.
I've been doing some mining with an 8 cores 2.6GHz Xeon and I'm getting really few coins. It's about 0.5 RIC per couple of days... is that normal or am I missing something big? Mining in ypool with the optimized miner found on the pool.
Another thing: I used the cli version of the wallet since I started mining but then I found the binary of the qt client and it is syncing now. It is taking ages... is that normal? Any upgraded node list?
Thx for the help and keep up the good work!
Use the b14 or b15 code linked in this thread. It's much faster.
Source and binaries are in the usual spots:
ChangeLog: https://github.com/dave-andersen/fastrie/blob/master/ChangeLog
Source: https://github.com/dave-andersen/fastrie
Binaries: http://www.cs.cmu.edu/~dga/crypto/ric/
|
|
|
|
voingiappone
|
|
May 19, 2014, 01:32:55 PM |
|
Bigtruck45, thx for the reply. I think I'm using the b15 version of the dga miner.... So, as I was supposing, I have some other problem... Maybe something with the pool. I'm on ypool. Well, I'll keep mining this way for now. Mining Riecoin=it is evident that I don't do this for profit
|
|
|
|
northranger79510
|
|
May 19, 2014, 05:44:40 PM |
|
Hey you guys!
Just wanted to drop a line to say I jumped on Riecoin! I find the concept very interesting and hope it will get some attention sooner or later.
I've been doing some mining with an 8 cores 2.6GHz Xeon and I'm getting really few coins. It's about 0.5 RIC per couple of days... is that normal or am I missing something big? Mining in ypool with the optimized miner found on the pool.
Another thing: I used the cli version of the wallet since I started mining but then I found the binary of the qt client and it is syncing now. It is taking ages... is that normal? Any upgraded node list?
Thx for the help and keep up the good work!
Excellent choice. Riecoin's price right now is where Litecoin, Primecoin, and Namecoin once were
|
|
|
|
|