nerdralph
|
|
December 04, 2016, 07:40:06 PM |
|
The implementation of the new algorithm turned out to be quite difficult, but it's coming along. It's not surprising, however, given the fact that I'm up against professional cryptographers. Just to let you guys know, the new algo involves a modified version of Wagner's algorithm, which I call "castrated Wagner's," with reduced memory bandwidth without compromising algorithm binding. My brain is fried thinking about this problem with so many gotchas day and night, but we are at the end of the tunnel.
You've piqued my curiosity. I never gave much thought to optimizing Wagner's algorithm, as I thought the algorithm binding limited what you could do. Maybe I'll take a break from AMD GCN assembler docs and look back at the Equihash paper.
p.s. You know your brain is working hard when you get a headache from just thinking!
After a day of racking my brain, every faster version of the algorithm I've come up with results in close to 0 solutions. I've had ideas that might improve only round 0 or round 8, but nothing that would make a material difference for the algorithm as a whole. I'm not quite confident enough to say it's impossible to significantly optimize the algorithm, but I'm going back to reading the AMD GCN architecture docs, which I know can help optimize the implementation.
|
|
|
|
laik2
|
|
December 04, 2016, 07:55:13 PM |
|
The implementation of the new algorithm turned out to be quite difficult, but it's coming along. It's not surprising, however, given the fact that I'm up against professional cryptographers. Just to let you guys know, the new algo involves a modified version of Wagner's algorithm, which I call "castrated Wagner's," with reduced memory bandwidth without compromising algorithm binding. My brain is fried thinking about this problem with so many gotchas day and night, but we are at the end of the tunnel.
You've piqued my curiosity. I never gave much thought to optimizing Wagner's algorithm, as I thought the algorithm binding limited what you could do. Maybe I'll take a break from AMD GCN assembler docs and look back at the Equihash paper. p.s. You know your brain is working hard when you get a headache from just thinking! After a day of racking my brain, every faster version of the algorithm I've come up with results in close to 0 solutions. I've had ideas that might improve only round 0 or round 8, but nothing that would make a material difference for the algorithm as a whole. I'm not quite confident enough to say it's impossible to significantly optimize the algorithm, but I'm going back to reading the AMD GCN architecture docs, which I know can help optimize the implementation.
It is possible. Claymore and Optiminer have done it. I don't see why the OSS community won't. I guess it will take more time, which is the barrier for all ZEC miners at the current trade value...
EDIT: The keyword in my post is "community". If you have an idea, just share it... I don't see any ideas popping up lately. I've been reading OpenCL and AMD GCN docs too, but my knowledge is pretty limited... If I see anything useful I think I can help, one way or another.
|
|
|
|
nerdralph
|
|
December 04, 2016, 09:23:44 PM Last edit: December 04, 2016, 09:40:08 PM by nerdralph |
|
After a day of racking my brain, every faster version of the algorithm I've come up with results in close to 0 solutions. I've had ideas that might improve only round 0 or round 8, but nothing that would make a material difference for the algorithm as a whole. I'm not quite confident enough to say it's impossible to significantly optimize the algorithm, but I'm going back to reading the AMD GCN architecture docs, which I know can help optimize the implementation.
It is possible. Claymore and Optiminer have done it. I don't see why the OSS community won't. I guess it will take more time, which is the barrier for all ZEC miners at the current trade value... EDIT: The keyword in my post is "community". If you have an idea, just share it... I don't see any ideas popping up lately. I've been reading OpenCL and AMD GCN docs too, but my knowledge is pretty limited... If I see anything useful I think I can help, one way or another.
No, they only optimized the implementation. The algorithm, i.e. 8 rounds of bin sorts to find collisions on 20 bits followed by a final 40-bit collision search, has not been optimized. I've already explained the optimizations that could be done in OpenCL, which would take ~5 million core clocks per round. That would push the performance up to ~200 sols/s on an RX 470 clocked at 1250/1750. With GCN assembler I believe I can get that down to ~3 million core clocks per round, with performance of over 300 sols/s on an RX 470.
p.s. When I finish I'll probably go the closed-source route too. While I don't expect to make thousands per day, I think with a 1-2% fee I could make a few hundred dollars per day as long as the ZEC price holds above $25. You can talk "community" and sing kumbaya till you are blue in the face, but I expect closed-source miners will have the best performance for the next few months. Claymore probably spends more than 40 hrs/wk on miner development, but he can be reasonably confident he'll get paid for his efforts. While the $10K prize money may have been some incentive for Marc, now that the contest is over, the people working on open-source miners are primarily doing it for fun.
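For readers who have not looked at the Equihash paper, the round structure described in this post (8 bin-sort rounds colliding on 20 bits each, then a final 40-bit collision search) can be sketched roughly as below. This is only a toy illustration, not how any of the miners discussed here are written: the names `collision_round` and `solve` are made up for the example, Python dicts stand in for the fixed-size bins a GPU miner would keep in memory, and the real algorithm's index-ordering and difficulty checks are left out.

```python
from collections import defaultdict
from itertools import combinations


def collision_round(items, total_bits, collide_bits):
    """One bin-sort round: pair items whose leading `collide_bits` bits match.

    items: list of (value, indices), where `value` has `total_bits` bits and
    `indices` is the tuple of original inputs already XORed into `value`.
    Returns (new_items, remaining_bits).
    """
    rest_bits = total_bits - collide_bits
    mask = (1 << rest_bits) - 1
    bins = defaultdict(list)
    for value, indices in items:
        bins[value >> rest_bits].append((value, indices))  # bin by leading bits

    out = []
    for bucket in bins.values():
        for (va, ia), (vb, ib) in combinations(bucket, 2):
            if set(ia) & set(ib):
                continue  # skip pairs that reuse the same input
            # The leading bits cancel in the XOR; keep only the remainder.
            out.append(((va ^ vb) & mask, ia + ib))
    return out, rest_bits


def solve(hashes, n_bits, k):
    """k-1 rounds on n/(k+1) bits each, then one round on all remaining bits.

    With n_bits=200 and k=9 this is 8 rounds of 20 bits plus a final
    40-bit collision, matching the description above.
    """
    step = n_bits // (k + 1)
    items = [(h, (i,)) for i, h in enumerate(hashes)]
    bits = n_bits
    for _ in range(k - 1):
        items, bits = collision_round(items, bits, step)
    items, bits = collision_round(items, bits, bits)  # final collision search
    return [indices for _, indices in items]


# Toy run: 12-bit hashes, k=2, so one 4-bit round and a final 8-bit collision.
toy_hashes = [0xABC, 0xA12, 0x3DE, 0x370]
solutions = solve(toy_hashes, n_bits=12, k=2)
```

In the toy run the four inputs XOR to zero, so `solutions` contains the single index tuple covering all of them. Every round has to write each surviving pair into a bin somewhere in memory, which is why the rest of the discussion focuses on memory bandwidth rather than arithmetic.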
|
|
|
|
laik2
|
|
December 04, 2016, 09:40:01 PM Last edit: December 04, 2016, 10:40:18 PM by laik2 |
|
OK... I don't really care about closed/open source as long as it works and is constantly maintained. Take Optiminer: truly crashing software with a 10% fee... ppl still use it for reasons unknown. If you make the optimizations that you claim you can reach, I will definitely use your work as long as you guarantee stability.
|
|
|
|
ghostfaceuk
|
|
December 05, 2016, 07:26:31 AM |
|
After a day of racking my brain, every faster version of the algorithm I've come up with results in close to 0 solutions. I've had ideas that might improve only round 0 or round 8, but nothing that would make a material difference for the algorithm as a whole. I'm not quite confident enough to say it's impossible to significantly optimize the algorithm, but I'm going back to reading the AMD GCN architecture docs, which I know can help optimize the implementation.
It is possible. Claymore and Optiminer have done it. I don't see why the OSS community won't. I guess it will take more time, which is the barrier for all ZEC miners at the current trade value... EDIT: The keyword in my post is "community". If you have an idea, just share it... I don't see any ideas popping up lately. I've been reading OpenCL and AMD GCN docs too, but my knowledge is pretty limited... If I see anything useful I think I can help, one way or another.
No, they only optimized the implementation. The algorithm, i.e. 8 rounds of bin sorts to find collisions on 20 bits followed by a final 40-bit collision search, has not been optimized. I've already explained the optimizations that could be done in OpenCL, which would take ~5 million core clocks per round. That would push the performance up to ~200 sols/s on an RX 470 clocked at 1250/1750. With GCN assembler I believe I can get that down to ~3 million core clocks per round, with performance of over 300 sols/s on an RX 470. p.s. When I finish I'll probably go the closed-source route too. While I don't expect to make thousands per day, I think with a 1-2% fee I could make a few hundred dollars per day as long as the ZEC price holds above $25. You can talk "community" and sing kumbaya till you are blue in the face, but I expect closed-source miners will have the best performance for the next few months. Claymore probably spends more than 40 hrs/wk on miner development, but he can be reasonably confident he'll get paid for his efforts. While the $10K prize money may have been some incentive for Marc, now that the contest is over, the people working on open-source miners are primarily doing it for fun.
Interesting to hear you think you may be able to squeeze that much from a 470.
I know there is probably a lot of work still required to fully optimise everything, but I was wondering: if you are going to proceed with the miner (be it open or closed source), will you be making a new thread for it, or will we be able to follow its progress here?
|
|
|
|
Amph
Legendary
Offline
Activity: 3248
Merit: 1070
|
|
December 05, 2016, 07:46:02 AM |
|
300 sol is impressive for a 470, especially after Claymore said that it's no longer possible to get 30-50%, only small boosts from now on.
But 300 sol is nearly 100% faster, for a 470...
|
|
|
|
qwep1
|
|
December 05, 2016, 08:19:40 AM |
|
300 sol is impressive for a 470, especially after Claymore said that it's no longer possible to get 30-50%, only small boosts from now on.
But 300 sol is nearly 100% faster, for a 470...
Is that for the miner?
|
|
|
|
ghostfaceuk
|
|
December 05, 2016, 08:38:57 AM |
|
300 sol is impressive for a 470, especially after Claymore said that it's no longer possible to get 30-50%, only small boosts from now on.
But 300 sol is nearly 100% faster, for a 470...
Is that for the miner?
It's what nerdralph thinks he can get from a 470 with GCN optimisation.
|
|
|
|
Genamant
Full Member
Offline
Activity: 730
Merit: 102
Trphy.io
|
|
December 05, 2016, 01:08:09 PM |
|
300 sol is impressive for a 470, especially after Claymore said that it's no longer possible to get 30-50%, only small boosts from now on.
But 300 sol is nearly 100% faster, for a 470...
Is that for the miner?
It's what nerdralph thinks he can get from a 470 with GCN optimisation.
He predicted the 470 could do only 160 H/s previously, but he might be right this time. But he said he would not share the fast miner.
|
|
|
|
laik2
|
|
December 05, 2016, 01:20:25 PM |
|
300 sol is impressive for a 470, especially after Claymore said that it's no longer possible to get 30-50%, only small boosts from now on.
But 300 sol is nearly 100% faster, for a 470...
Is that for the miner?
It's what nerdralph thinks he can get from a 470 with GCN optimisation.
He predicted the 470 could do only 160 H/s previously, but he might be right this time. But he said he would not share the fast miner.
He will share it in binary form with a devfee.
|
|
|
|
ghostfaceuk
|
|
December 05, 2016, 01:26:47 PM |
|
300 sol is impressive for a 470, especially after Claymore said that it's no longer possible to get 30-50%, only small boosts from now on.
But 300 sol is nearly 100% faster, for a 470...
Is that for the miner?
It's what nerdralph thinks he can get from a 470 with GCN optimisation.
He predicted the 470 could do only 160 H/s previously, but he might be right this time. But he said he would not share the fast miner.
He will share it in binary form with a devfee.
Yeah, he said above it will be closed source and will have a dev fee of 1-2% built in. I hope he does manage to get that kind of speed from the 470 cards; perhaps then he may be able to look at other algos and use his experience to boost the cards for them as well. By releasing a miner that can use the memory to its full potential, and thus increase speeds above what is available right now for the various algos, I think more people will use it and help him make money so he can continue to develop/improve his miners, just like other coders.
|
|
|
|
nerdralph
|
|
December 05, 2016, 02:11:35 PM |
|
He predicted the 470 could do only 160 H/s previously, but he might be right this time. But he said he would not share the fast miner.
The 300 target is based on Marc's observation that the performance counters go up by 32 (one GDDR5 channel) for a single-byte write, and not 64 (a full cache line) as I initially expected. This has not yet been measured in code that does sustained writes. In the short term I should have some test code that will stress the memory controller and allow me to confirm the practical performance limits instead of just relying on datasheet specs.
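To show why the 32 vs 64 figure matters so much, here is a back-of-the-envelope sketch of the ceiling on small scattered writes. It leans on assumptions that are not stated in this thread: a 256-bit memory bus on the RX 470, four transfers per memory clock for GDDR5, and a model in which every small write costs a whole 32- or 64-byte transaction. The function names are made up for the example, and these are datasheet-style upper bounds, not measurements.

```python
def peak_bandwidth_bytes(mem_clock_mhz, bus_width_bits=256, transfers_per_clock=4):
    """Theoretical peak memory bandwidth in bytes/s.

    bus_width_bits and transfers_per_clock are assumptions about the card
    (256-bit bus, GDDR5 moving 4 bits per pin per memory clock).
    """
    return mem_clock_mhz * 1e6 * transfers_per_clock * bus_width_bits / 8


def max_scattered_writes_per_sec(bandwidth_bytes, bytes_per_write):
    """Upper bound on small writes/s if each one costs a full transaction."""
    return bandwidth_bytes / bytes_per_write


bw = peak_bandwidth_bytes(1750)                      # memory clock from the 1250/1750 setting
writes_at_64 = max_scattered_writes_per_sec(bw, 64)  # full cache line per write
writes_at_32 = max_scattered_writes_per_sec(bw, 32)  # one channel burst per write
ratio = writes_at_32 / writes_at_64
```

Under these assumptions the 32-byte case allows twice as many scattered writes per second as the 64-byte case, which is the kind of headroom behind the higher target. Whether the memory controller actually sustains that is exactly what the planned stress test is meant to find out.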
|
|
|
|
nerdralph
|
|
December 05, 2016, 02:14:16 PM |
|
If you make the optimizations that you claim you can reach, I will definitely use your work as long as you guarantee stability.
After version 0.4, I had no problem getting Optiminer to run stable for days with safe overclock settings and good risers. If Optiminer v0.6 crashes on your rig, my miner probably will too.
|
|
|
|
nerdralph
|
|
December 05, 2016, 02:20:58 PM |
|
300 sol is impressive for a 470, especially after Claymore said that it's no longer possible to get 30-50%, only small boosts from now on.
Claymore may be limited to coding in OpenCL, which doesn't provide a means of using the GDS (global data share). OpenCL also doesn't expose the SLC or GLC bits to control the caching policy.
https://community.amd.com/thread/208471
It's also possible that he's already tried GCN assembler and found out that the memory controller on GCN chips does not perform the way Marc and I think it should.
|
|
|
|
laik2
|
|
December 05, 2016, 02:22:39 PM |
|
If you make the optimizations that you claim you can reach, I will definitely use your work as long as you guarantee stability.
After version 0.4, I had no problem getting Optiminer to run stable for days with safe overclock settings and good risers. If Optiminer v0.6 crashes on your rig, my miner probably will too.
Just to clarify, here is what I measured at the wall on 4x RX 480:
Optiminer 0.6.0 (tested last night): downclocked, without undervolt, 880/890 W, fans spinning constantly at 70+% to stay below 60 degrees.
Claymore v8 -i 2: undervolt -100 mV, 640/650 W, fans at 35/40%, temp 59 degrees.
Silentarmy v5 (eXtremal opts + memleak fix): ~700 W, fans at 25/30%, temps ~60 degrees.
I do not consider this normal. Do you? S/s on Optiminer and Claymore are almost equal. I added 2x R9 390 and Claymore beats Optiminer by ~100 S/s, so I'm just using Win10 for now. Release a binary miner with an acceptable devfee of 1-2% and Silentarmy's power and resource consumption, and I will use it.
|
|
|
|
Linit
Newbie
Offline
Activity: 13
Merit: 0
|
|
December 05, 2016, 05:26:49 PM |
|
Who is the developer of the new version of SA, zawawa or nerdralph?
|
|
|
|
qwep1
|
|
December 05, 2016, 05:35:52 PM |
|
Is there a new version of the miner?
|
|
|
|
zawawa
Sr. Member
Offline
Activity: 728
Merit: 304
Miner Developer
|
|
December 05, 2016, 05:56:29 PM |
|
I will start my own fork of SA once the current rewrite is done as I am taking a different path. It will be announced here once it is ready for public consumption. Thank you guys for the great work!
|
Gateless Gate Sharp, an open-source ETH/XMR miner: http://bit.ly/2rJ2x4V BTC: 1BHwDWVerUTiKxhHPf2ubqKKiBMiKQGomZ
|
|
|
ioglnx
Sr. Member
Offline
Activity: 574
Merit: 250
Fighting mob law and inquisition in this forum
|
|
December 05, 2016, 05:58:33 PM |
|
I will start my own fork of SA once the current rewrite is done as I am taking a different path. It will be announced here once it is ready for public consumption. Thank you guys for the great work!
Any idea when we'll see the first alpha, beta or omega? Somehow this all seems to be taking the wrong path: everything splitting up, going closed source... bad feeling.
|
GTX 1080Ti rocks da house... seriously... this card is a beast³ Owning by now 18x GTX1080Ti :-D @serious love of efficiency
|
|
|
zawawa
Sr. Member
Offline
Activity: 728
Merit: 304
Miner Developer
|
|
December 05, 2016, 06:33:27 PM |
|
Oh, my fork will be open-source, so no worries. I will probably GPL it so that all the derivative works will be open-source, too. I'm a "Free as in Freedom" kind of guy, so I am not particularly interested in closed-source miners. If other devs would like to monetize their great skills, good luck to them! They have every right to do so. It is just that I don't want to join the band for philosophical reasons.
All I can say about the time frame is, "It's done when it's done." I am working VERY HARD on the new version, though. I feel pretty bad about dragging you guys along, but you also need to understand that I have been working on Equihash for only two weeks while other devs have been doing so for months. We will see.
|
|
|
|
|