Kodaman
Jr. Member
Offline
Activity: 189
Merit: 2
|
|
August 23, 2019, 02:38:12 PM |
|
Nothing has changed here. Still milking the newbies.
|
|
|
|
sp_ (OP)
Legendary
Offline
Activity: 2912
Merit: 1087
Team Black developer
|
|
September 01, 2019, 03:37:52 PM Last edit: September 01, 2019, 05:30:53 PM by sp_ |
|
Ravencoin will hard fork on the 1st of October and change the mining algo. I have added support for x16rv2 in my open-source fork without any fee. My fork is the fastest open-source free miner for x16r and x16rv2. https://github.com/sp-hash/suprminer/commits/master
- Added support for X16RV2
- Improved speed on RTX cards and the 1660/1660 Ti (x16s, x16r, x16rv2, x17)
CUDA 9.2 32-bit binary (compute 6.1+): https://github.com/sp-hash/suprminer/releases
|
|
|
|
|
reb0rn21
Legendary
Offline
Activity: 1898
Merit: 1024
|
|
September 06, 2019, 07:32:12 PM |
|
X16rv2 is an FPGA shit algo, you better start optimizing the FPGA bitstream to 5 GHz or so for a fee
|
|
|
|
tbearhere
Legendary
Offline
Activity: 3164
Merit: 1003
|
|
September 06, 2019, 07:40:41 PM |
|
|
|
|
|
sp_ (OP)
Legendary
Offline
Activity: 2912
Merit: 1087
Team Black developer
|
|
September 06, 2019, 08:17:25 PM Last edit: September 06, 2019, 09:00:52 PM by sp_ |
|
X16rv2 is an FPGA shit algo, you better start optimizing the FPGA bitstream to 5 GHz or so for a fee
I'm just testing the new cards and the algo. I've got an RTX 2070 and an RTX 2060 SUPER. In x16rv2 I managed to remove the new Tiger192 completely from the SHA512, and partly from Luffa and Keccak. By merging the Tiger into the other kernels, the GPU can do the new multiplications and AES in the Tiger192 in parallel. So the new x16rv2 will perform close to the speed of the old x16r on the GPU, and FPGAs will slow down, mostly because of the multiplications. A better FPGA killer would be to generate PTX kernels at runtime for each block by permuting the assembly instructions. The instructions should include multiplications, logic, and scrambling. The GPU miner will need to compile the PTX on the fly for every block before warping. Then the FPGA implementation would need to have an ALU (CPU emulation), and this will slow it down a lot.
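The per-block permutation idea above can be sketched in a few lines: seed a PRNG with the block hash so every miner and verifier derives the same instruction sequence. This is a minimal Python simulation, not the miner's code; the mnemonics are PTX-style but the instruction pool and register layout are invented for illustration.

```python
import hashlib
import random

# Illustrative instruction pool mixing multiplications, logic, and
# scrambling, as the post suggests. PTX-style mnemonics, hypothetical mix.
OPS = ["mul.lo.u32", "xor.b32", "add.u32", "shf.l.wrap.b32", "and.b32"]

def kernel_body_for_block(block_hash: bytes, n_instructions: int = 16) -> str:
    """Derive a deterministic per-block instruction sequence by seeding
    a PRNG with the block hash. Same header -> same kernel source, so
    pools and wallets can regenerate it for verification."""
    seed = hashlib.sha256(block_hash).digest()
    rng = random.Random(seed)
    lines = []
    for _ in range(n_instructions):
        op = rng.choice(OPS)
        a, b = rng.randrange(8), rng.randrange(8)
        lines.append(f"{op} %r{a}, %r{a}, %r{b};")
    return "\n".join(lines)

src1 = kernel_body_for_block(b"block-header-1")
src2 = kernel_body_for_block(b"block-header-2")
assert src1 == kernel_body_for_block(b"block-header-1")  # deterministic
assert src1 != src2  # kernel changes with every block
```

A real miner would feed a string like this into the CUDA runtime compiler and launch the result; the point is only that the kernel source is a pure function of the block header.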
|
|
|
|
sp_ (OP)
Legendary
Offline
Activity: 2912
Merit: 1087
Team Black developer
|
|
September 06, 2019, 08:32:57 PM Last edit: September 06, 2019, 08:58:53 PM by sp_ |
|
What about older cards?
Some RTX optimizations don't work on GTX cards. In the code I have split execution on some of them (RTX-optimized kernel / GTX-optimized kernel), but not all of them. If I do the splitting, x17 will do around 24 MH/s on the GTX 1080 Ti; ccminer 1.0 alexis is around 20 MH/s. For example, reverting cubehash-shavite to the old version gains a megahash on the 1080 Ti. Getting the open source up to date with the latest fee miners is more work. The open-source SIMD is slow and needs to be rewritten. I have extracted the latest t-rex PTX code. PM me if you want to help reverse engineer it to CUDA and open source.
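The RTX/GTX split described above amounts to dispatching on the device's compute capability at startup. A minimal sketch, assuming hypothetical kernel names (a real miner would query the capability via the CUDA driver, e.g. `cudaGetDeviceProperties`):

```python
def pick_kernel(compute_capability: tuple) -> str:
    """Select a kernel variant by compute capability: Turing (7.5) and
    newer take the RTX-tuned path, Pascal-era GTX cards the older one.
    Kernel names are illustrative, not taken from the miner."""
    if compute_capability >= (7, 5):
        return "cubehash_shavite_rtx"
    return "cubehash_shavite_gtx"

print(pick_kernel((7, 5)))  # RTX 2070 / 2060 SUPER class
print(pick_kernel((6, 1)))  # GTX 1080 Ti class
```

Tuple comparison makes the version check read naturally; the cost of the split is maintaining two tuned code paths per kernel, which is why only some kernels were split.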
|
|
|
|
pallas
Legendary
Offline
Activity: 2716
Merit: 1094
Black Belt Developer
|
|
September 07, 2019, 07:08:37 AM |
|
Sorry to say, but the FPGA hashrate of x16rv2 will be the same as on the old x16r.
|
|
|
|
sp_ (OP)
Legendary
Offline
Activity: 2912
Merit: 1087
Team Black developer
|
|
September 07, 2019, 09:26:47 AM Last edit: September 07, 2019, 10:11:46 AM by sp_ |
|
Let's see what the result will be after the fork. x16rv2 will remove the ASICs, and an x16rv3 could remove the FPGAs. Ravencoin could hard fork again in 2 months to a random-hash variant with permuted instructions in the hash (x16rv3), and then the FPGAs will have to mine something else. An optimized x16rv2 could do around 35 MH/s at 65 watts on the RTX 2060 SUPER.
|
|
|
|
joblo
Legendary
Offline
Activity: 1470
Merit: 1114
|
|
September 07, 2019, 11:49:09 AM |
|
The problem is x16rv2 isn't any more ASIC resistant than v1. All it really needs is development of a Tiger kernel which shouldn't be too difficult. The only thing that would prevent it is lack of market demand. Lyra2v3 has a similar problem.
It's not easy to make an algo GPU friendly and ASIC resistant. It would have to target a resource that GPUs have in abundance that would be too expensive to implement on an ASIC. I'm not aware of any.
Permuted instructions can be worked around with a RAM code segment so it just increases RAM requirements. A bigger dataset has the same effect. Ironically Lyra2REv2 used a smaller dataset than Lyra2RE to give GPUs an advantage over CPUs. Lyra2REv3 did not change the size of the dataset.
In the end, coins that fork to new algos periodically as an anti-ASIC strategy do little more than create a planned-obsolescence environment driving demand for new ASICs.
|
|
|
|
sp_ (OP)
Legendary
Offline
Activity: 2912
Merit: 1087
Team Black developer
|
|
September 07, 2019, 01:10:00 PM |
|
The problem is x16rv2 isn't any more ASIC resistant than v1. All it really needs is development of a Tiger kernel which shouldn't be too difficult. The only thing that would prevent it is lack of market demand. Lyra2v3 has a similar problem.
I expect a decline in difficulty after the fork. Look what happened in the Beam II fork.
Permuted instructions can be worked around with a RAM code segment so it just increases RAM requirements.
You need to read the instruction from RAM, decode it, and execute it. It is difficult to make the FPGA run at full speed that way. You can create a superscalar version that executes more than one instruction per cycle, but that is still much slower than a static hash function. Or is there a faster way?
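The fetch-decode-execute loop being debated here is easy to make concrete. A toy interpreter, invented for illustration: each step reads an opcode from the instruction stream (the "RAM code segment"), decodes it, then executes it. The serial decode per instruction is exactly what a fixed-function hash pipeline on an FPGA avoids.

```python
def run(program, regs):
    """Minimal fetch-decode-execute loop over 32-bit registers.
    `program` is a list of (op, dst, src) tuples; `regs` maps register
    numbers to 32-bit values. Opcodes are hypothetical."""
    for op, dst, src in program:          # fetch
        if op == "mul":                   # decode + execute
            regs[dst] = (regs[dst] * regs[src]) & 0xFFFFFFFF
        elif op == "xor":
            regs[dst] ^= regs[src]
        elif op == "rotl":                # rotate left by src mod 32
            v, s = regs[dst], regs[src] & 31
            regs[dst] = ((v << s) | (v >> (32 - s))) & 0xFFFFFFFF
    return regs

result = run([("mul", 0, 1), ("xor", 0, 1)], {0: 3, 1: 4})
print(result[0])  # (3*4) ^ 4 = 8
```

A superscalar FPGA design could issue several of these per cycle when they touch disjoint registers, but the data dependencies in a hash-like chain limit how far that helps, which is the post's point.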
|
|
|
|
bensam1231
Legendary
Offline
Activity: 1750
Merit: 1024
|
|
September 12, 2019, 01:37:08 AM |
|
Depending on the market, there isn't demand for new ASICs (due to the price of development and emission); FPGAs are the new ASICs and are much more difficult to deal with. What further exacerbates the issue is that the bitstreams for them are generally relegated to smoky back rooms where most people don't have access. So even if you find and buy the really expensive hardware, you don't have access to the software to run on it.
This is an awful lot like the scrypt days, when people were making miners for algos and trading them in back rooms while ASICs were just starting to emerge. This entire last year or so has been dominated by this behavior, which, matched with the market decline and the bloat of hashrate (along with dark hash), has led to a relatively stark outlook.
GPUs DO have things that FPGAs don't have: a much lower price tag, and memory. So then it comes down to hash per $. If FPGAs give little to no advantage for the price you're paying for them, there is no reason to use them. Memory is only on a couple of FPGAs and, depending on which one you're talking about, it's not the best in the world, which further limits performance.
A lot of the current predicament, putting aside the market decline and the bloat of hashrate caused by the boom in '18, has to do with coin devs being lazy. There are a lot of algos that are much more ASIC-resistant than others; instead they do stupid shit like what RVN is doing by slightly altering their algo to maintain a brand name and appear progressive, without actually tackling the problem. It's not even about having a silver bullet either; they can just swap in already-available algos. MTP and ProgPoW, for instance, are very ASIC/FPGA-resistant (for now).
|
I buy private Nvidia miners. Send information and/or inquiries to my PM box.
|
|
|
pallas
Legendary
Offline
Activity: 2716
Merit: 1094
Black Belt Developer
|
|
September 12, 2019, 06:56:05 AM |
|
There are HBM equipped FPGAs already. Problem is, even with restricted bitstreams, their ROI is close to infinity. Just like with ASICs.
|
|
|
|
sp_ (OP)
Legendary
Offline
Activity: 2912
Merit: 1087
Team Black developer
|
|
September 12, 2019, 10:06:44 PM Last edit: September 12, 2019, 10:19:19 PM by sp_ |
|
There are HBM equipped FPGAs already. Problem is, even with restricted bitstreams, their ROI is close to infinity. Just like with ASICs.
So the next question is how many times you can access the HBM per cycle. In my algo proposal you will have a random stream of instructions for every new block (15000 PTX instructions / 15-second block time). On the GPU you just run the PTX (CUDA will compile and cache the code before execution, and it takes a few milliseconds). After the compilation is done, you have 14.xx seconds left to run the compiled kernel at full speed. On the FPGA you cannot generate the VHDL code, compile it, and flash it in 15 seconds, so you need to make a CPU emulator. This is because it would probably be difficult, slow, or impossible to generate VHDL out of random instructions and run it without timing bugs.
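The timing budget claimed above checks out with quick arithmetic: a few milliseconds of runtime compilation out of a 15-second block interval leaves essentially the whole interval for full-speed hashing. The numbers are the post's; the function is just a sketch.

```python
def hashing_fraction(blocktime_s: float = 15.0, compile_ms: float = 5.0) -> float:
    """Fraction of the block interval left for full-speed hashing after
    a one-time runtime compile of the per-block kernel (compiled code
    is cached, so the cost is paid once per block, not per hash)."""
    return (blocktime_s - compile_ms / 1000.0) / blocktime_s

print(f"{hashing_fraction():.4%}")  # well over 99.9% of the interval
```

An FPGA emulating a CPU pays its decode overhead on every instruction of every hash instead, which is where the asymmetry comes from.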
|
|
|
|
joblo
Legendary
Offline
Activity: 1470
Merit: 1114
|
|
September 13, 2019, 02:24:32 AM |
|
So the next question is how many times you can access the HBM per cycle. In my algo proposal you will have a random stream of instructions for every new block (15000 PTX instructions / 15-second block time). On the GPU you just run the PTX (CUDA will compile and cache the code before execution, and it takes a few milliseconds). After the compilation is done, you have 14.xx seconds left to run the compiled kernel at full speed. On the FPGA you cannot generate the VHDL code, compile it, and flash it in 15 seconds, so you need to make a CPU emulator. This is because it would probably be difficult, slow, or impossible to generate VHDL out of random instructions and run it without timing bugs.
By using PTX you're essentially using a proprietary language to prevent anything but an Nvidia product, or an Nvidia-licensed product, from mining your algo. That's one way to make an algo ASIC/FPGA-resistant.
|
|
|
|
bensam1231
Legendary
Offline
Activity: 1750
Merit: 1024
|
|
September 13, 2019, 03:51:50 AM |
|
There are HBM equipped FPGAs already. Problem is, even with restricted bitstreams, their ROI is close to infinity. Just like with ASICs.
If you're talking about FKs, they aren't even being shipped, and they don't even talk about what algos they'll support. Either way, as I mentioned, FPGAs with memory (specifically fast memory) are in the extreme minority; they aren't everywhere. Anti-FPGA effort is not a silver bullet. FPGAs are a lot more expensive to produce, so if you make something that is extremely expensive to produce, there has to be a huge reward on the other side or it's not worth it. Looking at a 2-3 year ROI on a lot of FPGAs, even if they produce a lot of hashrate, makes them very unpalatable. There is an opportunity cost associated with everything; a lot of people don't consider that. FPGAs also become obsolete, and obsolescence is something that has to be considered. So even if you have an FPGA that will ROI in 3 years, there can and more than likely will be newer ones out that will obsolete it. The Chinese don't respect licenses or IP rights unless it's some megacorp and you have millions to throw at it with lawyers.
|
I buy private Nvidia miners. Send information and/or inquiries to my PM box.
|
|
|
pallas
Legendary
Offline
Activity: 2716
Merit: 1094
Black Belt Developer
|
|
September 13, 2019, 06:19:24 AM |
|
There are HBM equipped FPGAs already. Problem is, even with restricted bitstreams, their ROI is close to infinity. Just like with ASICs.
So the next question is how many times can you access the HBM per cycle. In my algo proposal you will have a random stream of instructions for every new block. (15000 PTX instructions / 15 sec blocktime). On the GPU you will just run the ptx. (cuda will compile and cache the code before execution and it will take a few milliseconds). After the compilation has been done, you get 14.xx seconds left to run the compiled kernel in full speed. On the FPGA you cannot generate the VHDL code compile and flash in 15 seconds, so you need to make a CPU emulator. This is because it would probably difficult,slow or impossible to generate VHDL out of random instructions and run it without timing bugs. what will happen when cards compatible with the language are no longer produced? maybe you are planning a pump and dump coin so you don't care :-D
|
|
|
|
sp_ (OP)
Legendary
Offline
Activity: 2912
Merit: 1087
Team Black developer
|
|
September 13, 2019, 07:00:57 AM |
|
By using PTX you're esentially using a proprietary language to prevent anything but a Nvidia product or a Nvidia licensed product from mining your algo. That's one way to make an algo ASIC/FPGA resistant.
It doesn't need to be PTX. You need a pseudo-assembly language that can easily be translated to PTX before execution. The CPU miner would have to parse this language and create a proper native binary before execution (create the instructions in memory, flush the caches, then execute). CPU verification is important for the pools/wallets/exchanges.
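A toy illustration of that idea: the chain publishes programs in a neutral pseudo-assembly, and each miner lowers them to its own target (PTX for the GPU; a CPU verifier would interpret or JIT the same program natively). Both the pseudo mnemonics and the translation table here are hypothetical; only the right-hand sides are real PTX opcodes.

```python
# Hypothetical pseudo-assembly -> PTX lowering table. The left-hand
# names are invented; the right-hand sides are real PTX mnemonics.
PSEUDO_TO_PTX = {
    "MUL": "mul.lo.u32",
    "XOR": "xor.b32",
    "ADD": "add.u32",
}

def lower_to_ptx(program):
    """Translate (op, dst, src) pseudo-instructions into PTX-style text.
    Pools/wallets/exchanges would skip this step and run the same
    pseudo-program through a CPU interpreter for verification."""
    return "\n".join(
        f"{PSEUDO_TO_PTX[op]} %r{dst}, %r{dst}, %r{src};"
        for op, dst, src in program
    )

print(lower_to_ptx([("MUL", 0, 1), ("XOR", 2, 0)]))
```

Keeping the on-chain format vendor-neutral is what decouples the algo from PTX: if the GPU target changes, only the lowering table changes, not the consensus rules.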
|
|
|
|
sp_ (OP)
Legendary
Offline
Activity: 2912
Merit: 1087
Team Black developer
|
|
September 13, 2019, 07:03:54 AM |
|
what will happen when cards compatible with the language are no longer produced?
The point of PTX is that it's a unified language for all NVIDIA GPU architectures. The PTX is compiled to the native GPU language by the NVIDIA driver before execution. If NVIDIA decides to replace PTX with SPTX, you simply need your miner software to convert the random hashing function into SPTX.
|
|
|
|
pallas
Legendary
Offline
Activity: 2716
Merit: 1094
Black Belt Developer
|
|
September 13, 2019, 07:13:02 AM |
|
By using PTX you're essentially using a proprietary language to prevent anything but an Nvidia product, or an Nvidia-licensed product, from mining your algo. That's one way to make an algo ASIC/FPGA-resistant.
It doesn't need to be PTX. You need a pseudo-assembly language that can easily be translated to PTX before execution. The CPU miner would have to parse this language and create a proper native binary before execution (create the instructions in memory, flush the caches, then execute). CPU verification is important for the pools/wallets/exchanges.
Yeah, no PTX, that's what I was saying. ==> RandomX
|
|
|
|
|