Bitcoin Forum
May 10, 2024, 05:00:38 AM *
Author Topic: CCminer(SP-MOD) Modded GPU kernels.  (Read 2347498 times)
Kodaman
Jr. Member
*
Offline Offline

Activity: 189
Merit: 2


View Profile
August 23, 2019, 02:38:12 PM
 #24461

Nothing is changed here.
Still milking the newbies  Cool
sp_ (OP)
Legendary
*
Offline Offline

Activity: 2912
Merit: 1087

Team Black developer


View Profile
September 01, 2019, 03:37:52 PM
Last edit: September 01, 2019, 05:30:53 PM by sp_
 #24462

Ravencoin will hard fork on the 1st of October and change the mining algo.

I have added support for x16rv2 in my open-source fork, without any fee. My fork is the fastest free open-source miner for x16r and x16rv2.

https://github.com/sp-hash/suprminer/commits/master


-Added support for X16RV2
-Improved speed on RTX cards and the 1660/1660 Ti (x16s, x16r, x16rv2, x17)

cuda 9.2 32bit binary (compute 6.1+)

https://github.com/sp-hash/suprminer/releases

Team Black Miner (ETHB3 ETH ETC VTC KAWPOW FIROPOW MEOWPOW + dual mining + triple mining): https://github.com/sp-hash/TeamBlackMiner
sp_ (OP)
Legendary
*
Offline Offline

Activity: 2912
Merit: 1087

Team Black developer


View Profile
September 06, 2019, 04:10:48 PM
 #24463

Spmod-git #13 released.


changes from #12

-X16rv2 Around +30% faster on RTX 2060 super
-X17 +24% on RTX 2060 super

https://github.com/sp-hash/suprminer/commits/master

https://github.com/sp-hash/suprminer/releases


reb0rn21
Legendary
*
Offline Offline

Activity: 1898
Merit: 1024


View Profile
September 06, 2019, 07:32:12 PM
 #24464

X16rv2 is FPGA shit algo, you better start optimizing the FPGA bitstream to 5 GHz or so for a fee  Grin

tbearhere
Legendary
*
Offline Offline

Activity: 3136
Merit: 1003



View Profile
September 06, 2019, 07:40:41 PM
 #24465

Spmod-git #13 released.


changes from #12

-X16rv2 Around +30% faster on RTX 2060 super
-X17 +24% on RTX 2060 super

https://github.com/sp-hash/suprminer/commits/master

https://github.com/sp-hash/suprminer/releases


What about older cards?
sp_ (OP)
Legendary
*
Offline Offline

Activity: 2912
Merit: 1087

Team Black developer


View Profile
September 06, 2019, 08:17:25 PM
Last edit: September 06, 2019, 09:00:52 PM by sp_
 #24466

X16rv2 is FPGA shit algo, you better start optimizing the FPGA bitstream to 5 GHz or so for a fee  Grin

I'm just testing the new cards and algo. I've got an RTX 2070 and an RTX 2060 SUPER.
In x16rv2 I managed to remove the new tiger192 pass completely from SHA512, and partly from luffa and keccak. By merging tiger into the other kernels, the GPU can do the new multiplications and AES in tiger192 in parallel.

So the new x16rv2 will perform close to the speed of the old x16r on the GPU, and FPGAs will slow down, mostly because of the multiplications.
A better FPGA killer would be to generate PTX kernels at runtime for each block by permuting the assembly instructions. The instructions should include multiplications, logic, and scrambling. The GPU miner would need to compile the PTX on the fly for every block before launching the kernel. The FPGA implementation would then need an ALU (CPU emulation), and this would slow it down a lot.
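A minimal Python sketch of the per-block idea in this post, under stated assumptions: the instruction set (`OPS`), the register count, the program length, and the use of SHA-256 to seed the generator are all hypothetical choices for illustration and are not taken from any real miner or from sp_'s proposal. It only shows that every node can derive the same pseudo-random instruction sequence from the same block header.

```python
import hashlib
import random

# Hypothetical instruction set: the post only names multiplications,
# logic and scrambling, so these opcodes are an assumption.
OPS = ("mul", "add", "xor", "rotl")
NUM_REGS = 4  # assumed number of virtual registers


def generate_program(block_header: bytes, length: int = 16):
    """Derive a deterministic instruction list from a block header."""
    # Seed a PRNG from the header so every miner and verifier that sees
    # the same header builds exactly the same program.
    seed = int.from_bytes(hashlib.sha256(block_header).digest(), "big")
    rng = random.Random(seed)
    program = []
    for _ in range(length):
        op = rng.choice(OPS)
        dst = rng.randrange(NUM_REGS)
        src = rng.randrange(NUM_REGS)
        program.append((op, dst, src))
    return program


if __name__ == "__main__":
    for instruction in generate_program(b"example block header", length=4):
        print(instruction)
```

A real design would also have to pin down the generator exactly (Python's `random` module is used here only for brevity), since consensus code cannot depend on a language runtime's PRNG.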

sp_ (OP)
Legendary
*
Offline Offline

Activity: 2912
Merit: 1087

Team Black developer


View Profile
September 06, 2019, 08:32:57 PM
Last edit: September 06, 2019, 08:58:53 PM by sp_
 #24467

What about older cards?

Some RTX optimizations don't work on GTX cards. For some kernels, but not all, I have split execution in the code into an RTX-optimized and a GTX-optimized version. If I do the splitting for the rest, x17 will do around 24 MH/s on the GTX 1080 Ti. ccminer 1.0 alexis does around 20 MH/s.

For example, by reverting cubehash-shavite to the old version you gain a megahash on the 1080 Ti.

Getting the open-source code up to date with the latest fee miners is more work. The open-source SIMD is slow and needs to be rewritten. I have extracted the latest t-rex PTX code. PM me if you want to help reverse engineer it to CUDA and open-source it.

pallas
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
September 07, 2019, 07:08:37 AM
 #24468

sorry to say but FPGA hashrate of x16rv2 will be the same as the old x16r

sp_ (OP)
Legendary
*
Offline Offline

Activity: 2912
Merit: 1087

Team Black developer


View Profile
September 07, 2019, 09:26:47 AM
Last edit: September 07, 2019, 10:11:46 AM by sp_
 #24469

Let's see what the result will be after the fork. x16rv2 will remove the ASICs, and x16v3 can remove the FPGAs.
Ravencoin could hard fork again in 2 months to x16v3, a randomhash variant with permuted instructions in the hash, and then the FPGAs will have to mine something else. An optimized x16rv2 could do around 35 MH/s at 65 watts on the RTX 2060 SUPER.
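For comparison with other hardware, the estimate in this post can be turned into an efficiency figure. The helper name `hashrate_per_watt` below is made up for illustration; the only inputs are the two numbers from the post (35 megahashes per second, 65 watts).

```python
def hashrate_per_watt(hashrate_mhs: float, power_w: float) -> float:
    """Return mining efficiency in MH/s per watt."""
    if power_w <= 0:
        raise ValueError("power must be positive")
    return hashrate_mhs / power_w


if __name__ == "__main__":
    # Figures quoted in the post for an optimized x16rv2 on an RTX 2060 SUPER.
    efficiency = hashrate_per_watt(35.0, 65.0)
    print(f"{efficiency:.3f} MH/s per watt")  # prints "0.538 MH/s per watt"
```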

joblo
Legendary
*
Offline Offline

Activity: 1470
Merit: 1114


View Profile
September 07, 2019, 11:49:09 AM
 #24470

The problem is x16rv2 isn't any more ASIC resistant than v1. All it really needs is development
of a Tiger kernel which shouldn't be too difficult. The only thing that would prevent it is lack
of market demand. Lyra2v3 has a similar problem.

It's not easy to make an algo GPU friendly and ASIC resistant. It would have to target a resource
that GPUs have in abundance that would be too expensive to implement on an ASIC. I'm not aware of any.

Permuted instructions can be worked around with a RAM code segment so it just increases RAM requirements.
A bigger dataset has the same effect. Ironically Lyra2REv2 used a smaller dataset than Lyra2RE to give GPUs
an advantage over CPUs. Lyra2REv3 did not change the size of the dataset.

In the end coins that fork to new algos periodically as an anti-ASIC strategy do little more than create
a planned obsolescence environment driving demand for new ASICs.

AKA JayDDee, cpuminer-opt developer. https://github.com/JayDDee/cpuminer-opt
https://bitcointalk.org/index.php?topic=5226770.msg53865575#msg53865575
BTC: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT,
sp_ (OP)
Legendary
*
Offline Offline

Activity: 2912
Merit: 1087

Team Black developer


View Profile
September 07, 2019, 01:10:00 PM
 #24471

The problem is x16rv2 isn't any more ASIC resistant than v1. All it really needs is development
of a Tiger kernel which shouldn't be too difficult. The only thing that would prevent it is lack
of market demand. Lyra2v3 has a similar problem.

I expect a decline in the difficulty after the fork. Look what happened in the Beam II fork.

Quote
Permuted instructions can be worked around with a RAM code segment so it just increases RAM requirements.

You need to read the instruction from RAM, decode it, and execute it. That makes it difficult to run the FPGA at full speed. You can create a superscalar version that executes more than one instruction per cycle, but it is still much slower than a static hash function. Or is there a faster way?

bensam1231
Legendary
*
Offline Offline

Activity: 1750
Merit: 1024


View Profile
September 12, 2019, 01:37:08 AM
 #24472

Depending on the market there isn't demand for new ASICs (due to the price of development and emission). FPGAs are the new ASICs and much more difficult to deal with. What further exacerbates the issue is that the bitstreams for them are generally relegated to smoky back rooms where most people don't have access. So even if you find and buy the really expensive hardware, you don't have access to the software to run on it.

This is an awful lot like the scrypt days, when people were making miners for algos and trading them in back rooms while ASICs were just starting to emerge. This entire last year or so has been dominated by this behavior, which, matched with market decline and over-bloat of hash (along with dark hash), has led to a relatively stark outlook.

GPUs DO have things that FPGAs don't have: a much lower price tag, and memory. So then it comes down to hash per $. If FPGAs give little to no advantage for the price you're paying for them, there is no reason to use them. Memory is only on a couple of FPGAs and, depending on which one you're talking about, it's not the best in the world, which further limits performance.

A lot of the current predicament, putting aside the market decline and bloating of hashrate caused by the boom in '18, has to do with coin devs being lazy. There are a lot of algos that are much more ASIC resistant than others; instead they do stupid shit like what RVN is doing, slightly altering their algo to maintain a brand name and appear progressive, without actually tackling the problem. It's not even about having a silver bullet either; they can just swap to already available algos. MTP and Progpow, for instance, are very ASIC/FPGA resistant (for now).

I buy private Nvidia miners. Send information and/or inquiries to my PM box.
pallas
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
September 12, 2019, 06:56:05 AM
 #24473

There are HBM equipped FPGAs already.
Problem is, even with restricted bitstreams, their ROI is close to infinity. Just like with ASICs.

sp_ (OP)
Legendary
*
Offline Offline

Activity: 2912
Merit: 1087

Team Black developer


View Profile
September 12, 2019, 10:06:44 PM
Last edit: September 12, 2019, 10:19:19 PM by sp_
 #24474

There are HBM equipped FPGAs already.
Problem is, even with restricted bitstreams, their ROI is close to infinity. Just like with ASICs.

So the next question is how many times you can access the HBM per cycle.
In my algo proposal you will have a random stream of instructions for every new block (15000 PTX instructions / 15 sec blocktime).
On the GPU you will just run the PTX. (CUDA will compile and cache the code before execution, and it will take a few milliseconds.) After the compilation has been done, you have 14.xx seconds left to run the compiled kernel at full speed. On the FPGA you cannot generate the VHDL code, compile, and flash in 15 seconds, so you need to make a CPU emulator. This is because it would probably be difficult, slow, or impossible to generate VHDL out of random instructions and run it without timing bugs.
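The time budget in this post can be written out as simple arithmetic. In the sketch below, the 15 second block time and the 15000 instructions per block come from the post; the 50 ms compile time is an assumed placeholder (the post only says "a few milliseconds"), and the function names are made up for illustration.

```python
def kernel_time_budget(block_time_s: float = 15.0, compile_ms: float = 50.0):
    """Return (seconds left for hashing, fraction of the block time usable)."""
    compile_s = compile_ms / 1000.0
    if compile_s >= block_time_s:
        raise ValueError("compilation takes longer than the block time")
    remaining_s = block_time_s - compile_s
    return remaining_s, remaining_s / block_time_s


def instructions_per_second(instructions_per_block: int, block_time_s: float) -> float:
    """Average rate at which new random instructions must be absorbed."""
    return instructions_per_block / block_time_s


if __name__ == "__main__":
    remaining, fraction = kernel_time_budget(15.0, 50.0)  # 50 ms is an assumption
    print(f"{remaining:.2f} s left, {fraction:.1%} of the block time")
    print(instructions_per_second(15000, 15.0), "new instructions per second")
```

The same function applied to an FPGA flow would use the synthesis and flashing time in place of `compile_ms`; once that exceeds the block time, the budget is empty, which is the argument the post makes for needing an emulator.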

joblo
Legendary
*
Offline Offline

Activity: 1470
Merit: 1114


View Profile
September 13, 2019, 02:24:32 AM
 #24475

So the next question is how many times you can access the HBM per cycle.
In my algo proposal you will have a random stream of instructions for every new block (15000 PTX instructions / 15 sec blocktime).
On the GPU you will just run the PTX. (CUDA will compile and cache the code before execution, and it will take a few milliseconds.) After the compilation has been done, you have 14.xx seconds left to run the compiled kernel at full speed. On the FPGA you cannot generate the VHDL code, compile, and flash in 15 seconds, so you need to make a CPU emulator. This is because it would probably be difficult, slow, or impossible to generate VHDL out of random instructions and run it without timing bugs.

By using PTX you're essentially using a proprietary language to prevent anything but an Nvidia product
or an Nvidia-licensed product from mining your algo. That's one way to make an algo ASIC/FPGA resistant.

bensam1231
Legendary
*
Offline Offline

Activity: 1750
Merit: 1024


View Profile
September 13, 2019, 03:51:50 AM
 #24476

There are HBM equipped FPGAs already.
Problem is, even with restricted bitstreams, their ROI is close to infinity. Just like with ASICs.

If you're talking about FKs they aren't even being shipped and they don't even talk about what algos they'll support. Either way, as I mentioned FPGAs with memory (specifically fast memory) are in the extreme minority. They aren't everywhere.

Anti-FPGA effort is not a silver bullet. They're a lot more expensive to produce, so if you make something that makes it extremely expensive to produce, there has to be a huge reward on the other side or it's not worth it. Looking at a 2-3 year ROI on a lot of FPGAs, even if they produce a lot of hashrate, makes them very unpalatable. There is an opportunity cost associated with everything; a lot of people don't consider that. FPGAs also become obsolete, and obsolescence is something that has to be considered. So even if you have an FPGA that will ROI in 3 years, there can and more than likely will be newer ones out that will obsolete it.

Chinese don't respect licenses or IP rights unless it's some megacorp and you have millions to throw at it with lawyers.

pallas
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
September 13, 2019, 06:19:24 AM
 #24477

There are HBM equipped FPGAs already.
Problem is, even with restricted bitstreams, their ROI is close to infinity. Just like with ASICs.

So the next question is how many times you can access the HBM per cycle.
In my algo proposal you will have a random stream of instructions for every new block (15000 PTX instructions / 15 sec blocktime).
On the GPU you will just run the PTX. (CUDA will compile and cache the code before execution, and it will take a few milliseconds.) After the compilation has been done, you have 14.xx seconds left to run the compiled kernel at full speed. On the FPGA you cannot generate the VHDL code, compile, and flash in 15 seconds, so you need to make a CPU emulator. This is because it would probably be difficult, slow, or impossible to generate VHDL out of random instructions and run it without timing bugs.

what will happen when cards compatible with the language are no longer produced?
maybe you are planning a pump and dump coin so you don't care :-D

sp_ (OP)
Legendary
*
Offline Offline

Activity: 2912
Merit: 1087

Team Black developer


View Profile
September 13, 2019, 07:00:57 AM
 #24478

By using PTX you're essentially using a proprietary language to prevent anything but an Nvidia product
or an Nvidia-licensed product from mining your algo. That's one way to make an algo ASIC/FPGA resistant.

Doesn't need to be PTX. You need a pseudo Assembly language that can easily be translated to ptx before execution.
The CPU miner would have to parse this language and create proper native binary before execution. (Create instructions in memory, flush the caches, then execute ) CPU verification is important for the pool/wallet/exchanges.
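A rough Python sketch of what such a pseudo-assembly language could look like, assuming a made-up four-opcode, 32-bit register machine (the opcodes, the tuple format, and the names `run_program` and `to_ptx` are hypothetical, not part of any existing miner). `run_program` plays the role of the CPU verifier by interpreting the program directly rather than generating native code, and `to_ptx` shows a one-to-one textual translation into PTX integer instructions for the GPU path.

```python
MASK32 = 0xFFFFFFFF

# Assumed mapping from the made-up opcodes to PTX instruction text.
# shf.l.wrap.b32 with both inputs equal acts as a 32-bit rotate left.
PTX_TEMPLATES = {
    "add": "add.u32 %r{d}, %r{d}, %r{s};",
    "mul": "mul.lo.u32 %r{d}, %r{d}, %r{s};",
    "xor": "xor.b32 %r{d}, %r{d}, %r{s};",
    "rotl": "shf.l.wrap.b32 %r{d}, %r{d}, %r{d}, %r{s};",
}


def rotl32(value: int, count: int) -> int:
    """Rotate a 32-bit value left; only the low 5 bits of count are used."""
    count &= 31
    return ((value << count) | (value >> (32 - count))) & MASK32


def run_program(program, registers):
    """CPU reference: interpret (op, dst, src) tuples on 32-bit registers."""
    r = [x & MASK32 for x in registers]
    for op, dst, src in program:
        a, b = r[dst], r[src]
        if op == "add":
            r[dst] = (a + b) & MASK32
        elif op == "mul":
            r[dst] = (a * b) & MASK32
        elif op == "xor":
            r[dst] = a ^ b
        elif op == "rotl":
            r[dst] = rotl32(a, b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return r


def to_ptx(program):
    """Translate the same tuples into PTX instruction lines (body only)."""
    return [PTX_TEMPLATES[op].format(d=dst, s=src) for op, dst, src in program]


PROGRAM = [("add", 0, 1), ("mul", 0, 2), ("xor", 1, 0), ("rotl", 1, 3)]

if __name__ == "__main__":
    print(run_program(PROGRAM, [1, 2, 3, 4]))  # prints [9, 176, 3, 4]
    print("\n".join(to_ptx(PROGRAM)))
```

An interpreter like this is slower than the native-code approach the post describes, but it is enough for a pool or wallet that only has to verify a single hash per share or block.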

sp_ (OP)
Legendary
*
Offline Offline

Activity: 2912
Merit: 1087

Team Black developer


View Profile
September 13, 2019, 07:03:54 AM
 #24479

what will happen when cards compatible with the language are no longer produced?

The point with PTX is that it's a unified language for all NVIDIA GPU architectures. The PTX is compiled to the native GPU language by the NVIDIA driver before execution. If NVIDIA decides to replace PTX with SPTX, you simply need your miner software to convert the random hashing function into SPTX.

pallas
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
September 13, 2019, 07:13:02 AM
 #24480

By using PTX you're essentially using a proprietary language to prevent anything but an Nvidia product
or an Nvidia-licensed product from mining your algo. That's one way to make an algo ASIC/FPGA resistant.

Doesn't need to be PTX. You need a pseudo Assembly language that can easily be translated to ptx before execution.
The CPU miner would have to parse this language and create proper native binary before execution. (Create instructions in memory, flush the caches, then execute ) CPU verification is important for the pool/wallet/exchanges.

Yeah, no PTX, that's what I was saying.
==> RandomX
