Riecoin appears to be a useless shitcoin, unless someone can prove to me why it isn't. I checked its exchange rate 2 days ago, and it'd take about 40,000 Riecoins to equal one BitCoin. That sucks.
I got signed up on ypool and am using the xptminer on several of my PCs to CPU mine Riecoin.
With a total of 5 CPU cores, split between i3, Athlon, and Celeron CPUs, I have gotten 3-1/2 Riecoins in a day and a half.
I am not impressed, to say the least. I thought it'd be cool to have my CPUs mine something, like the early days of BitCoin, but to have gained less than 1/10,000th of a BTC (about 6 cents) after a day and a half isn't good.
I'm not happy either that I've not made a fortune mining coins! However, just because this coin isn't your pathway to riches does not make it a bad coin. My definition of a shitcoin is a re-branded clone (how many scrypt coins are there, all using the same miner and code base?!); this is not one of them. The biggest issue (and I think this is the case with most CPU coins) is that people will find a way to either mine at zero cost or develop a private miner which remains profitable. Right now the best way to mine is for heating ( http://www.pugetsystems.com/labs/articles/Gaming-PC-vs-Space-Heater-Efficiency-511/ ) and all that does is offset your power bill somewhat (and this is what my GPU does when I am working from home). Sorry this coin isn't for you. Regards, -- bsunau7
|
|
|
(**) It took me about $50 of virtual server to mine about 100 ric over the weekend. People mining are either doing it at a loss, not paying for hardware/power, or using miners which are 40x more efficient than dga's. Only ypool.net can tell if there is a super miner.
Just picked 3 random(ish) blocks from ypool, 5 users are (consistently) claiming 35% of all coins, with the top earner claiming 10%. To get that 10% 1 person would need the equivalent of ~2000 Xeon v2 (@2.7GHz) cores. Regards, -- bsunau7
|
|
|
Hi! I've been a little sick (just a flu) and I didn't touch a keyboard in like 4 days (!) that's weird for me... anyway, I'm still working but I didn't make much progress lately.
One idea I've been fantasizing about: we could make Riecoin2, based on the new ring signature anonymous coins (Monero/Bytecoin). It would be a fork of their code but using our PoW. There should be a way to transition from RIC to RIC2. Maybe we could have a burn period where we would burn RIC to transform them into RIC2, or I could just premine a lot of RIC2, sell them at 1 RIC each on an exchange and after a while just burn the unsold ones. The latter would be easier than monitoring both chains.
What do you think? Those who don't like it would be allowed to continue in the current RIC, and the fact that you would be able to use it to buy RIC2 at a fixed price (for a while) would make the price (of RIC) go up. Or would it hold RIC2 down? Who knows....
I don't believe that 100% anonymous coins can exist(*) and re-tasking for a pipe dream probably isn't worth the risk. However, as a PR campaign it might have some merit. On other items: I don't think we can avoid a GPU miner for Riecoin; it just depends on the economics. Currently a GPU miner would need to be about 40 times more efficient(**) than dga's miner, a bit of an ask right now. I can't see any way of making this coin RAM dependent; larger memories optionally allow larger (and more efficient) sieves, but even then we are still well within a modern GPU's memory capacities. (*) At some point you will want to do "something" with your coins, at which point you are no longer anonymous. Also don't underestimate people's ingenuity. (**) It took me about $50 of virtual server to mine about 100 ric over the weekend. People mining are either doing it at a loss, not paying for hardware/power, or using miners which are 40x more efficient than dga's. Only ypool.net can tell if there is a super miner. Regards, -- bsunau7
|
|
|
Hi all Riecoiners !
@bsunau7 Android port of DGA's miner was just a side note, I prefer Linux to Android.
I checked some net info about FPGA modular exponentiation performance, nothing interesting or too expensive, but possibly somebody has better info.
I would tend to agree with you on the FPGA; the (highly) variable difficulty (integer length) of riecoin would make anything which operated on the whole integer very difficult/expensive on an FPGA. Another approach would be a (fixed integer length) sieve in the FPGA, keeping the difficult bits on a CPU. The problem with that (and the one I am having) is you wind up hand coding an optimized sieve which has scale-out issues both in terms of effort/reward and gates on the chip. Each tested prime in the sieve seems to knock off ~10% of the candidates, so quickly sieving the first few dozen primes should see the bulk of the gains. As I said, this is more of an idle fun project than earning coin. Regards, -- bsunau7
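A rough way to sanity-check the "~10% per prime" figure is sketched below. It assumes the candidates are prime sextuplets of the form n, n+4, n+6, n+10, n+12, n+16 (the offsets and both function names are assumptions made for this illustration, not taken from any miner). Under that assumption a prime p >= 7 removes 6 of every p residues, so the per-prime loss is 6/p: large for the first few primes and around 10% near p = 60.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sextuplet pattern: n, n+4, n+6, n+10, n+12, n+16. */
static const uint32_t kOffsets[6] = {0, 4, 6, 10, 12, 16};

/* Number of residues r (mod p) for which some r + offset is divisible by p,
 * i.e. how many of every p consecutive candidates the prime p eliminates. */
uint32_t residues_killed(uint32_t p)
{
    assert(p >= 2);
    uint32_t killed = 0;
    for (uint32_t r = 0; r < p; r++) {
        for (int k = 0; k < 6; k++) {
            if ((r + kOffsets[k]) % p == 0) {
                killed++;
                break;
            }
        }
    }
    return killed;
}

/* Fraction of candidates left after sieving by every prime in primes[]. */
double surviving_fraction(const uint32_t *primes, int count)
{
    double fraction = 1.0;
    for (int i = 0; i < count; i++)
        fraction *= 1.0 - (double)residues_killed(primes[i]) / primes[i];
    return fraction;
}
```

Calling surviving_fraction() on a growing list of primes shows how quickly the gains flatten out, which is the effort/reward trade-off described in the post.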
|
|
|
I've succeeded in compiling DGA's miner on the Parallella without any problem and the speed on the main dual-core ARM A9 CPU is 0.3655 2ch/s. Not impressive, but I didn't expect more. The problem is with the share submission. The moment it finds a share and tries to submit it, xptMiner crashes and I get the error message "Bus error". I haven't had the time to debug it yet. I think it's some alignment problem. When I have some free time I will also start to port the code to use the 16-core Epiphany chip, just for fun. I don't expect to see any dramatic speed improvement.
You're right, the problem is that the xpt data packets are not aligned; everything (except NEON) is 32bit aligned on ARM. To debug it I had to compile without -O3 (makes the code flow easier to follow but also slows you right down) and gdb it. Some ugly alignment code (using ARM load/stores or array assignments) should work, once you know what needs to be fixed. jh00 knows about this. On performance, compile and use gmp 6.0 if you aren't already using it. I saw a ~14% (from memory) improvement in my code moving to that version. Regards, -- bsunau7
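One portable way to do the "ugly alignment code" is to copy the bytes instead of dereferencing a cast pointer. The sketch below is a minimal illustration, not the fix that went into xptMiner; the helper names are hypothetical. memcpy() with a fixed small size is normally compiled down to whatever load/store sequence is legal on the target, and the value keeps the host byte order just as the original cast would.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from a buffer position that may not be 4-byte aligned.
 * Hypothetical helper; replaces patterns like *(uint32_t *)(buf + offset). */
uint32_t load_u32_unaligned(const uint8_t *src)
{
    uint32_t value;
    assert(src != NULL);
    memcpy(&value, src, sizeof value);   /* byte copy, no alignment requirement */
    return value;
}

/* Write a 32-bit value to a buffer position that may not be 4-byte aligned. */
void store_u32_unaligned(uint8_t *dst, uint32_t value)
{
    assert(dst != NULL);
    memcpy(dst, &value, sizeof value);
}
```

On x86/x64 this should cost nothing measurable, which matches the experience that the same code "works fine" there while ARM raises a bus error.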
|
|
|
I just got my Parallella 2 days ago. I don't even remember when I ordered it. Anyway, it's time to play with it and find some primes. If anybody has some links to start with, I will thank them. I'm a noob in this territory.
You lucky guy (or girl)! The 16 or 64 core Epiphany one? The cores should be perfect for prime number grinding, which is why I was looking at them in the first place... I'll assume you are comfortable with development (otherwise why get a Parallella?) so here is a quick high level road map... The riecoin wallet compiles pretty cleanly; I'd probably start with that as it gets most of your build dependencies in place (it also gets you used to the linaro/ubuntu package system if it is new to you). I used the latest dbm libraries (not the 4.8 ones recommended) as I don't need a transportable wallet. Following that, the cpu miner should compile (once again it has its own dependencies). Performance will suck, but you'll get something up and running. dga's miner should also work (I've not compiled it recently, however). Then tune/code/tune/code until you get a block. Ask if you need any more help. Regards, -- bsunau7
|
|
|
very interesting! I used to work with an ARM chip that had an "embedded cryptographic coprocessor", which basically means it had a few interesting features implemented by hardware, like sha2 and modular exponentiation (up to 2048bits in this case). Wish I could put my hands on one of those, but the development boards were expensive.
Not wanting to distract you from riecoin, but look at http://www.zedboard.org/product/microzed : dual core with FPGA (Parallella also sell one, but they've had big issues with their kickstarter campaign). Xilinx even give you access to a (cut down) version of their development SDK. With some free VHDL tools you should be able to generate a bit stream to define your own co-processor. Parallella even use this to interface with their 16/64 core coprocessor and to define their HDMI hardware (yes, HDMI is defined in SW, just add some hardware to support the physical interface). Also there are some OpenCL to VHDL converters; GPU coins with simple algorithms and some market liquidity are about to get very cheap/easy to move to FPGA based mining rigs. This is why I am mostly ignoring AES/SHA/CRYPT/X11 based coins... A few $$$$ and some time could all but destroy them. Regards, -- bsunau7
|
|
|
so you're really mining with ARM? cool! well, thermal throttling means it's not literally cool, but you get the point
Aside from the first 2 weeks when I had a dozen or so AWS instances, everything I've done has been ARM based. Not sure I'd call it mining yet, but it is fun. No hardware division, slower clock, really slow memory access and 32bit architecture are the downsides. On the upside: very low power, a superscalar architecture, and my Intel box isn't making noise. Anyway, I rewrote the code to make use of NEON instructions; parts of the chip which are normally in a powered-down state are powered up, drawing power and making heat. I replaced the stock fan with a 12v silent one and knocked 23c off the temperature (107c to 84c). If I can get the speed to something usable I was thinking about a binary release; I am amazed at how many people have little Raspberry Pis doing nothing (or very little). Might make the few 100 RIC I own a little more valuable (or might not). aamarket, with the android port of dga's miner, was probably thinking of the same general game plan. Regards, -- bsunau7
|
|
|
Damn my cooling solution!
Speed up the miner by 5%, lose 5% of CPU thanks to thermal throttling. Add 10%, lose 10%.
Time to externally power the little fan on the odroid...
-- bsunau7
|
|
|
Slow thread right now. Need posters!
Ok I'll post some memories this coin has dredged up from the past. A little trick for those 32bit CPU's without hardware division (like most ARM) for calculating mod 17 (correctness tested for all values 0 to 0xffffffff):
    // 32 = 2^5 ; (2^32 / 17) = 252645135.05882352941176470588 ~= 0x0f0f0f0f
    uint32_t mod17(uint32_t n)
    {
        static uint8_t table[32] = {
            0 ,1 ,2 ,2 ,3 ,3 ,4 ,4 ,
            5 ,5 ,6 ,6 ,7 ,7 ,8 ,8 ,
            9 ,9 ,10,10,11,11,12,12,
            13,13,14,14,15,15,16,16
        };
        n = (0x0f0f0f0f*n + (n >> 4)) >> 27;
        return table[n];
    }
While it is only marginally (a few %) quicker than the abi version gcc uses it was fun to rediscover. Strangely removing the table lookup makes it slower (which I didn't expect). Bump. -- bsunau7
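For anyone who wants to repeat the correctness check, here is a small harness sketch. mod17() is copied from the post; mod17_mismatches() is a hypothetical helper added for illustration. It compares the trick against the plain % operator over a strided range, so calling it with first = 0, stride = 1 over successive blocks covers the whole 32-bit range if you have the patience.

```c
#include <assert.h>
#include <stdint.h>

/* mod 17 without a hardware divide, as posted above.
 * 0x0f0f0f0f is floor(2^32 / 17); the top 5 bits of the sum index the table. */
uint32_t mod17(uint32_t n)
{
    static const uint8_t table[32] = {
        0, 1, 2, 2, 3, 3, 4, 4,
        5, 5, 6, 6, 7, 7, 8, 8,
        9, 9, 10, 10, 11, 11, 12, 12,
        13, 13, 14, 14, 15, 15, 16, 16
    };
    n = (0x0f0f0f0fu * n + (n >> 4)) >> 27;
    return table[n];
}

/* Compare mod17() with the % operator for n = first, first + stride, ...
 * (count values, wrapping modulo 2^32). Returns the number of mismatches. */
uint32_t mod17_mismatches(uint32_t first, uint32_t stride, uint32_t count)
{
    assert(stride != 0);
    uint32_t mismatches = 0;
    uint32_t n = first;
    for (uint32_t i = 0; i < count; i++, n += stride) {
        if (mod17(n) != n % 17u)
            mismatches++;
    }
    return mismatches;
}
```

The reason it works: 17 * 0x0f0f0f0f = 2^32 - 1, so the multiply leaves roughly (n mod 17) * 2^32 / 17 in the word, and the (n >> 4) term corrects the drift just enough that the top five bits never land in the wrong table slot.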
|
|
|
It needs ">>(i%32))" I'd say - and I do not know why it works fine on x64, but the change helped to get seemingly proper targets on ARM. Still no share submitted, so I do not know if it helped.
Thanks for spotting this. I have committed a fix to the xptMiner repository. The performance difference should be insignificant on x64/x86. I also know there are some other places in the xpt source where memory alignment can cause problems. In particular, all the xptPacketbuffer_* functions have unaligned read/write access, which is supported on x86/x64 but not on other platforms, but I don't know if it matters on ARM.
Yes it does, one fix I had to deploy is summarized in: https://bitcointalk.org/index.php?topic=424517.msg4883101#msg4883101
Regards, -- bsunau7
|
|
|
I can confirm above mentioned change helped, ARM riecoin miner delivered first 4 primes share (after quite a long time) :
p4=0x803a9a745c512a58eaceaafd83f4259ffd0e9bcec9d306484a0e0a7944efc5762b * 2**1452 + 0x32e238469041bc6ef8edc6dcd5d872f664bcbd5ae7e76fd1385ec97334d4c0f
[00:54:07] Share found! (Blockheight: 54252)
====xptShare:: algo=7,ver=2,nTime=1399991211,nBits=33993728,userExtraNonceLength=4
xptShare->prevBlockHash :: dc 4c 98 04 9e ac af 34 2e a5 4a 7b 64 88 07 5a 64 1c 1b 0b c4 f8 2f c8 33 7f 0d 3b f0 21 87 75
xptShare->merkleRoot :: e9 e6 b4 09 ad 1c 4e 87 8b 10 7b 6e 48 08 23 1d 5f 1a 4e c4 e5 bb ad c2 22 f6 a5 fc 1b 9d 45 91
xptShare->merkleRootOriginal :: 7a 7c e8 0d 6d 07 91 7a e8 40 60 55 0f 46 09 e2 86 9a 49 ca ae 4e e6 28 29 13 d3 4d b9 a8 19 d2
xptShare->userExtraNonceData :: 12 00 00 00
xptShare->riecoin_nOffset :: 0f 4c 4d 33 97 ec 85 13 fd 76 7e ae d5 cb 4b 66 2f 87 5d cd 6d dc 8e ef c6 1b 04 69 84 23 2e 03
1716[00:54:08] 2ch/s: 0.7239 3ch/s: 0.0454 4ch/s: 0.0013 Shares total: 1 / 1
"Bus error" is still present, so a watchdog had to be implemented, but hey, now everybody can mine coins with their android cell phone
Check for un-initialized (or reused) variables. I found that the optimizer in gcc would get horrendously confused unless everything was initialized, code which would work without optimization would crash with -O2... (and the re-ordering with -O2 is fun to debug). Regards, -- bsunau7
|
|
|
I agree, dga, and I know, but ... just a few lines below ifdef ... else ... *(uint32*)(nOffset+d*4) = z_temp2->_mp_d[d]; and it was in the code since the beginning. But to be sure, I'll double check. I know (u)intXX_t is the right way, I already burned myself doing data exchange x64<->mips32 a long time ago. I think I replaced some in some previous miner version, doing cleanup as well, but your version is far superior, so I ditched the old changes. Regarding win - that was exactly what I was thinking, but no results yet apart from "uninitialized values" that seem to be in the libraries and/or old kernel (so no apparent problem in the code); it was killed in valgrind soon because with sieve=9e8 it's close to my memory limit. I checked a couple of returned triplets and they seem to be primes (like p - 4, p - 6, p - 16 ; p - 4, p - 12, p - 16 ; p - 4, p - 10, p - 16 ; ....). Which points us to your original point. I'll post some results soon, but the speed is horrible and it does not make sense to run on this ARM architecture.
I run through 32bits of candidates in ~28 seconds (odroid-xu lite). The best speed-up trick on ARM is to avoid division (or mod); most ARM CPUs emulate div in SW. I've not spent much (any) time on mine for a few weeks, but the power profile is magic. Waiting for the minnow board MAX, should do wonderful things for embedded miners! -- bsunau7
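One common way to avoid division in a sieve loop, sketched below under the assumption that candidates are visited at a fixed stride (base, base + step, base + 2*step, ...): do the two % operations once up front, then keep the running remainder current with an add and a conditional subtract. The function name and the byte-per-candidate hit array are hypothetical and chosen only to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

/* For k = 0 .. count-1, set hit[k] = 1 whenever base + k*step is divisible
 * by p (entries that are not hit are left untouched). Only two divisions are
 * performed in total; the loop body uses add, compare and subtract.
 * Returns the number of hits. */
uint32_t mark_multiples(uint32_t base, uint32_t step, uint32_t p,
                        uint8_t *hit, uint32_t count)
{
    assert(p > 0 && p < 0x80000000u);   /* keeps r + s below 2^32 */
    uint32_t r = base % p;              /* remainder of the current candidate */
    uint32_t s = step % p;              /* how much the remainder moves per step */
    uint32_t hits = 0;
    for (uint32_t k = 0; k < count; k++) {
        if (r == 0) {
            hit[k] = 1;
            hits++;
        }
        r += s;
        if (r >= p)
            r -= p;
    }
    return hits;
}
```

On a core that emulates div in software the loop body stays at a handful of cycles per candidate, whereas a naive (base + k*step) % p pays for a library call every iteration.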
|
|
|
Perhaps I'm misunderstanding, but:
(a) Using the 40th primorial (plus or minus depending on which of the miners you're talking about) means that you never sieve factors that fit into a word anyway.
(b) The majority of time in my code is spent doing three things, in order:
- Fermat primality test (gmp)
- Calculating T_rounded_up % p (gmp)
- Sieving large primes that still occur multiple times in the maximum number of nonces (primes under 2^29). Most of this time is actually spent asking one thing: if (offset < sieve_size) which mostly fails with a sieve of 8M entries and a prime of, e.g., 100m.
My guess, though I might be wrong, is that a lot of the optimizations you're looking at start to become less dominant when you go for a really huge primorial. For example, almost _no_ time is spent in checking the actual sieve - as far as I can tell, there's basically zero benefit to trying to optimize finding candidates. The code spends somewhere between 1-2 seconds doing primality testing for each iteration through the sieve (8 million bits). The time to check each bit position is a few tens of microseconds of that 1-2 seconds.
Ah! Mine is a 32bit machine so I've kept everything 32bits or less, so I don't have the issue in (a). Likewise it becomes inefficient for me to sieve primes larger than a million, which mitigates some of (b). Regards, -- bsunau7
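For readers unfamiliar with the Fermat primality test named in (b), here is a toy sketch on 32-bit integers. The miners do this with gmp on numbers hundreds of digits long; the two function names below are hypothetical and the code only illustrates the idea: n is a base-2 probable prime when 2^(n-1) mod n equals 1.

```c
#include <assert.h>
#include <stdint.h>

/* Compute (base^exp) mod m by square-and-multiply. m must be non-zero;
 * 64-bit intermediates keep the products of 32-bit values from overflowing. */
uint32_t powmod32(uint32_t base, uint32_t exp, uint32_t m)
{
    assert(m != 0);
    uint64_t result = 1 % m;
    uint64_t b = base % m;
    while (exp > 0) {
        if (exp & 1u)
            result = (result * b) % m;
        b = (b * b) % m;
        exp >>= 1;
    }
    return (uint32_t)result;
}

/* Base-2 Fermat test: returns 1 if n passes (prime or base-2 pseudoprime),
 * 0 if n is definitely composite. */
int fermat_base2(uint32_t n)
{
    if (n < 3 || (n & 1u) == 0)
        return n == 2;
    return powmod32(2u, n - 1u, n) == 1u;
}
```

The test is probabilistic in one direction only: a composite such as 341 = 11 * 31 still passes, which is why sieving first and testing afterwards is the usual division of labour.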
|
|
|
Hi, all - I've posted b14 source and binaries for the fastrie xptMiner:
- Binaries: http://www.cs.cmu.edu/~dga/crypto/ric/
- Source: https://github.com/dave-andersen/fastrie
- ChangeLog: https://github.com/dave-andersen/fastrie/blob/master/ChangeLog
- README: http://www.cs.cmu.edu/~dga/crypto/ric/readme.html
This is a speed-boost release targeting larger sieves, and the binaries are now linked against gmp 6.0.0a, which provides faster code targeting avx and avx2 in particular. Sandy Bridge, Ivy Bridge, and Haswell machines should see a very reasonable 5-15% speedup. The larger sieve support comes from borrowing a trick from a00k's miner, which reduces memory consumption with sieves > 500m and slightly improves speed. Older, slower machines will probably not get much of a boost from this release, but newer, faster boxes may benefit from a larger sieve - I'm seeing 10-20% speedups on i7-4770 (Haswell) CPUs. Thanks to the folks on ypool for kicking the tires on this. As always, there are likely to be bugs, but hopefully not too many. Please do read the README before worrying about some of them, and before tweaking the sieve size too much.
Just had a very quick look over the code and I do my sieve in phases, which might help speed things up a little more (warning: my system does not have a hardware divide so I see the benefits very clearly, x86_64 might not see any speed-ups at all).
Phase 1. Primes smaller than 2*primorial (I use 210). A normal sieve with a fast exit, e.g. if(!(psieve[j>>5] & ( 1U << (j & 0x1f)))) break;
Phase 2. The next "few hundred primes". Add the "remainder too large" test. Doing this test early in the sieve slows the sieve as the test mostly fails, which is why I do it later, e.g. if(tmp & 0xffffffe0UL) continue; Only when the remainder has a greater than 50% chance of passing the test does it become time efficient to have this test.
Phase 3. The last few hundred thousand primes. I do this in line with the scanner, but the main difference is a bulk check of 32 candidates at a time, e.g. if(!psieve[j>>5]) { j += 31; offset += 210*31; continue; } This needs candidate density to be less than ~1 in 64 candidates, which is why you need to sieve the "first few hundred" before you get the benefit.
Regards and as always check my logic, -- bsunau7
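The Phase 3 bulk check can be sketched as a word-at-a-time scan. This assumes, as the fragments in the post suggest, that a set bit means "candidate still alive" and that bit j of the sieve lives in word j >> 5; the function name and the output array are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scan a bit sieve of nwords 32-bit words and store the indices of the set
 * bits (surviving candidates) in out[]. A word that is entirely zero is
 * skipped with a single test instead of 32 bit tests.
 * Stops when max_out indices have been stored; returns the number stored. */
uint32_t collect_survivors(const uint32_t *psieve, uint32_t nwords,
                           uint32_t *out, uint32_t max_out)
{
    assert(psieve != NULL && out != NULL);
    uint32_t found = 0;
    for (uint32_t w = 0; w < nwords; w++) {
        uint32_t word = psieve[w];
        if (!word)
            continue;                    /* bulk skip: 32 dead candidates */
        for (uint32_t b = 0; b < 32; b++) {
            if (word & (1u << b)) {
                if (found == max_out)
                    return found;
                out[found++] = (w << 5) | b;
            }
        }
    }
    return found;
}
```

The skip only pays off once most words are empty, which fits the remark that density must drop below roughly 1 in 64 before this phase is worth having.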
|
|
|
The folks on ypool just pointed out to me that gmp 6.0.0a has been released. (Thanks!)
fastrie is only 5% faster with 6.0.0a than with 5.1.3 (I have SandyBridge)
11% faster with a custom ARM based miner. -- bsunau7
|
|
|
YAY, more CRLF/CR/LF/Encoding/whatever bullshit... I give up. Please provide some unmangled sourcecode or at the very least, a 32 bit linux build.
I had faith in this coin from launch but it's just been a disappointment to me. Missed the boat due to, seemingly, dev's that take cross platform code, change a line or two (aka/ie, a new coin), and completely fuck up all code portability. Seems like there's only a handful of actual developers in the cryptocurrency community that know what cross-platform/portable means.
If you have sympathy or pity this fool, RAvAQ3TrUNWrG2DDgfuPvdhzaiXtg2wjEu
I'll pay with knowledge... Install dos2unix or 'sed -e "s/^M$//"' (you'll need to ^V ^M to get the right string). I understand that not everyone can code or can navigate a unix-ish CLI but is that the fault of a dev? PS. My Verilog sucks but I don't blame the *coin ASIC miners for that. -- bsunau7
|
|
|
Is your miner publicly available?
I can now say with absolute certainty, that if I had my current solo miner implementation on launch day, I would have solved the first 576 blocks in under 15 minutes using only 20 Intel E5-2697 v2 CPUs. Was the launch really all that fair? I guess we will never know. The implementations from gatra, dga, and jh00 will be just as fast if they used slightly larger primorials and AVX2 instructions for multiple-precision arithmetic. So in a few weeks from now, the performance advantage from all of the miners should be negligible. I will begin work on an Nvidia GPU (sm_20 - sm_35) version this weekend. Also, I plan on releasing that version in binary format with a small developer's fee if the performance gain against a high-end CPU is significant. Sorry, I do not know how to program AMD GPUs for HPC related to multiple-precision arithmetic (Karatsuba multiplication + Montgomery Reduction).
Credit where due - I really wanted to make an alternate, pre-computation and memory-heavy approach to this work, but I've given up. It's tough to beat straightforward sieving, and you (@supercomputing) were right - I ended up just going to a big primorial with a highly optimized sieve. I haven't rewritten any of the routines currently handled by gmp (and my GMP is probably compiled horribly), but my pool miner is now somewhere in that ballpark. I've solved 14 or so blocks in 24 hours using 14 machines, which is about in the same range you mentioned for your miner. Block 17920 is an example, if you're curious to extract my primorial. I'd still love to find a more satisfying way to crack this nut, though. But for now, the sieves and fast prime tests have it. Gatra, thank you for creating Riecoin. It's terribly fun, has one of the more intellectually interesting proof-of-work cores (as does XPM, in fairness), and I hope it starts producing record prime tuples one of these days.
I'm planning on proposing high-speed RIC sieving cores as a class project for our parallel computing class next week. Should be fun! -Dave
I've not had time to spend on this recently, but I was hoping that a large pre-sieved static file (i.e. network heavy) would address botnet mining. By making the payload too large to efficiently ship to victims it would reduce their effective return compared to legit miners and force them to look at another coin. -- bsunau7
|
|
|
Money on me being at fault, to save memory (make it fit!) I merge two smaller sieves... -- bsunau7
Well at least I won my money back! (My sieve was in error.) But I can't thank you all enough for the roundabout way this has helped me. Thanks! -- bsunau7
|
|
|
- Generate up to a certain size polynomial. I use 200560490130 or the next as my base primorial and store a vector of all 48923875 entries.
- Sieve *this* out up to the huge primorial in advance.
- Do your operations relative to the huge primorial.
But, as warned - the simple bitvector is still working better for me.
Cool, that is what I am doing, but looking at your numbers I also pre sieve the possible p6 chains, reducing my candidate count by ~128 times: const uint64_t primorial = 7420738134810; const uint32_t sexcount = 14243984; Then I run a second scan inline to catch the next 2 dozen or so primes (lets me avoid gmp and use simple 64bit math) before I hit the expensive code. The general idea was to get a list of candidates which could be fed into something else (GPU was the thought). It is much faster than reference but it is reaching the limit of how fast I can push it. I have probably made a horrendous error in my algorithm... but coding again was fun... Regards,
My implementation is a little different from both implementations mentioned above. In fact, the overhead is much less than that of jh00's implementation. My implementation is almost identical to Kim Walisch's primesieve implementation with a few minor exceptions. Please see Kim Walisch's description of wheel factorization if you would like to know exactly what I am doing: http://primesieve.org/
@bsunau7 - mine does the same. I kill any location that fails to produce a six-set. I wonder which of us has a bug? *grin* I'll check my sieving code again.
As one way to start comparing, the polynomials for the first few primorials are:
Generator at Pn7 (210)
97
Generator at Pn11 (2310)
97 937 1147 1357 2197
Generator at Pn13 (30030)
97 1357 2407 3457 4717 5557 5767 6817 7867 8077 8287 10177 10597 11647 12907 13747 13957 15007 16057 16267 17107 18367 19417 19837 21727 21937 22147 23197 24247 24457 25297 26557 27607 28657 29917
@Supercomputing - Did you figure out a way to combine wheel factorization with storing a dense bitvector div 2310 (or div 210)? Or do you just allow a large bitvector and handle it through segmentation? I liked the way the jh implementation saved a lot of sieve space that way, and a straightforward prime sieve achieves a less dense packing (3-4x).
Money on me being at fault, to save memory (make it fit!) I merge two smaller sieves...
I'm working on an alternate method to validate; I suspect my error only exists in numbers later than 10 million. For reference, my first few numbers:
$ od -t u8 sextuplet.bin | head
0000000 1 7
0000020 97 1357
0000040 3457 4717
0000060 5767 8077
0000100 10597 12907
0000120 19417 23197
0000140 29917 30127
0000160 32947 34417
0000200 35797 36847
0000220 37897 38107
A quick spot check picking 10177 as an example... 10177 is prime, (10177+6) is not prime (divisible by 17) so it shouldn't be a valid sextuplet. Regards, -- bsunau7
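A brute-force cross-check of the generator lists is easy to write for small primorials. The sketch below assumes the sextuplet offsets 0, 4, 6, 10, 12, 16 and uses hypothetical function names; it is nowhere near fast enough for the primorials the miners actually use, but it is handy for settling "which of us has a bug" on the first few.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sextuplet pattern: n, n+4, n+6, n+10, n+12, n+16. */
static const uint32_t kPattern[6] = {0, 4, 6, 10, 12, 16};

/* Returns 1 if none of the six numbers n + offset is divisible by any prime
 * in primes[], i.e. n survives sieving by those primes; 0 otherwise. */
int is_generator(uint64_t n, const uint32_t *primes, int nprimes)
{
    for (int i = 0; i < nprimes; i++) {
        assert(primes[i] > 1);
        for (int k = 0; k < 6; k++) {
            if ((n + kPattern[k]) % primes[i] == 0)
                return 0;
        }
    }
    return 1;
}

/* Count the generators in [0, primorial) by testing every residue. */
uint32_t count_generators(uint32_t primorial, const uint32_t *primes, int nprimes)
{
    uint32_t count = 0;
    for (uint32_t n = 0; n < primorial; n++)
        count += (uint32_t)is_generator(n, primes, nprimes);
    return count;
}
```

With the primes up to 7, 11 and 13 this gives 1, 5 and 35 generators for 210, 2310 and 30030, matching the lists quoted above; adding 17 to the prime list rejects 10177, in line with the spot check.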
|
|
|
|