Bitcoin Forum
December 14, 2019, 02:04:08 AM *
Topic: CCminer(SP-MOD) Modded NVIDIA Maxwell / Pascal kernels.
pallas
September 12, 2019, 06:56:05 AM
 #24501

There are HBM equipped FPGAs already.
Problem is, even with restricted bitstreams, their ROI is close to infinity. Just like with ASICs.

sp_
September 12, 2019, 10:06:44 PM
Last edit: September 12, 2019, 10:19:19 PM by sp_
 #24502

There are HBM equipped FPGAs already.
Problem is, even with restricted bitstreams, their ROI is close to infinity. Just like with ASICs.

So the next question is how many times you can access the HBM per cycle.
In my algo proposal you will have a random stream of instructions for every new block (15000 PTX instructions / 15 sec blocktime).
On the GPU you will just run the PTX. (CUDA will compile and cache the code before execution, which takes a few milliseconds.) After the compilation is done, you have 14.xx seconds left to run the compiled kernel at full speed. On the FPGA you cannot generate the VHDL code, compile it, and flash it in 15 seconds, so you need to build a CPU emulator. This is because it would probably be difficult, slow, or impossible to generate VHDL out of random instructions and run it without timing bugs.
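
A minimal sketch of the "random stream of instructions for every new block" idea, assuming a hypothetical three-opcode instruction set and SHA-256 in counter mode as the byte source. Neither the opcode names, the register count, nor the hashing scheme come from the proposal; they only illustrate that every node can derive the same program from the same block hash.

```python
import hashlib

# Hypothetical opcode set and register count, chosen only for illustration.
OPCODES = ("add", "mul", "xor")
NUM_REGS = 8

def generate_program(block_hash: bytes, length: int):
    """Expand a block hash into a deterministic list of (op, dst, src) tuples."""
    program = []
    counter = 0
    while len(program) < length:
        # SHA-256 in counter mode acts as a simple deterministic byte stream.
        digest = hashlib.sha256(block_hash + counter.to_bytes(4, "little")).digest()
        counter += 1
        # Consume three bytes per instruction: opcode, destination, source.
        for i in range(0, len(digest) - 2, 3):
            op = OPCODES[digest[i] % len(OPCODES)]
            dst = digest[i + 1] % NUM_REGS
            src = digest[i + 2] % NUM_REGS
            program.append((op, dst, src))
            if len(program) == length:
                break
    return program

# 15000 instructions per block, as in the proposal.
prog = generate_program(b"example block hash", 15000)
print(len(prog), prog[:3])
```

Because the program depends only on the block hash, a miner and a verifier that agree on the hash will always build the same instruction stream.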
joblo
September 13, 2019, 02:24:32 AM
 #24503

So the next question is how many times you can access the HBM per cycle.
In my algo proposal you will have a random stream of instructions for every new block (15000 PTX instructions / 15 sec blocktime).
On the GPU you will just run the PTX. (CUDA will compile and cache the code before execution, which takes a few milliseconds.) After the compilation is done, you have 14.xx seconds left to run the compiled kernel at full speed. On the FPGA you cannot generate the VHDL code, compile it, and flash it in 15 seconds, so you need to build a CPU emulator. This is because it would probably be difficult, slow, or impossible to generate VHDL out of random instructions and run it without timing bugs.

By using PTX you're essentially using a proprietary language to prevent anything but an Nvidia product
or an Nvidia-licensed product from mining your algo. That's one way to make an algo ASIC/FPGA resistant.

cpuminer-opt developer, https://bitcointalk.org/index.php?topic=1326803.0
BTC: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT,
bensam1231
September 13, 2019, 03:51:50 AM
 #24504

There are HBM equipped FPGAs already.
Problem is, even with restricted bitstreams, their ROI is close to infinity. Just like with ASICs.

If you're talking about FKs, they aren't even being shipped and they don't even say what algos they'll support. Either way, as I mentioned, FPGAs with memory (specifically fast memory) are in the extreme minority. They aren't everywhere.

Anti-FPGA effort is not a silver bullet. They're a lot more expensive to produce, so if you make something that makes it extremely expensive to produce, then there has to be a huge reward on the other side or it's not worth it. A 2-3 year ROI on a lot of FPGAs, even if they produce a lot of hashrate, makes them very unpalatable. There is an opportunity cost associated with everything, and a lot of people don't consider that. FPGAs also become obsolete, and obsolescence has to be considered. So even if you have an FPGA that will ROI in 3 years, there can and more than likely will be newer ones out that will obsolete it.

Chinese don't respect licenses or IP rights unless it's some megacorp and you have millions to throw at it with lawyers.

I buy private Nvidia miners. Send information and/or inquiries to my PM box.
pallas
September 13, 2019, 06:19:24 AM
 #24505

There are HBM equipped FPGAs already.
Problem is, even with restricted bitstreams, their ROI is close to infinity. Just like with ASICs.

So the next question is how many times you can access the HBM per cycle.
In my algo proposal you will have a random stream of instructions for every new block (15000 PTX instructions / 15 sec blocktime).
On the GPU you will just run the PTX. (CUDA will compile and cache the code before execution, which takes a few milliseconds.) After the compilation is done, you have 14.xx seconds left to run the compiled kernel at full speed. On the FPGA you cannot generate the VHDL code, compile it, and flash it in 15 seconds, so you need to build a CPU emulator. This is because it would probably be difficult, slow, or impossible to generate VHDL out of random instructions and run it without timing bugs.

what will happen when cards compatible with the language are no longer produced?
maybe you are planning a pump and dump coin so you don't care :-D

sp_
September 13, 2019, 07:00:57 AM
 #24506

By using PTX you're essentially using a proprietary language to prevent anything but an Nvidia product
or an Nvidia-licensed product from mining your algo. That's one way to make an algo ASIC/FPGA resistant.

Doesn't need to be PTX. You need a pseudo-assembly language that can easily be translated to PTX before execution.
The CPU miner would have to parse this language and create a proper native binary before execution (create instructions in memory, flush the caches, then execute). CPU verification is important for the pool/wallet/exchanges.
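
A rough sketch of the "translate the whole program first, then execute" step, with Python source standing in for native machine code. The instruction format `(op, dst, src)` and the three opcodes are assumptions for illustration only. A real CPU miner would emit machine code into executable memory instead of calling `exec`. The slow interpreter is kept as a reference to show that both paths must agree, which is what pool or wallet verification relies on.

```python
MASK = 0xFFFFFFFF  # keep registers at 32 bits

# Hypothetical three-operation instruction set: (op, dst, src).
PY_OPS = {"add": "+", "mul": "*", "xor": "^"}

def interpret(program, regs):
    """Reference path: decode and execute one instruction at a time."""
    r = list(regs)
    for op, dst, src in program:
        if op == "add":
            r[dst] = (r[dst] + r[src]) & MASK
        elif op == "mul":
            r[dst] = (r[dst] * r[src]) & MASK
        elif op == "xor":
            r[dst] = (r[dst] ^ r[src]) & MASK
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return r

def compile_program(program):
    """Translate the whole program to Python source once, then compile it."""
    lines = ["def kernel(regs):", "    r = list(regs)"]
    for op, dst, src in program:
        lines.append(f"    r[{dst}] = (r[{dst}] {PY_OPS[op]} r[{src}]) & {MASK}")
    lines.append("    return r")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["kernel"]

program = [("add", 0, 1), ("mul", 0, 2), ("xor", 1, 0)]
start = [3, 4, 5, 0]
kernel = compile_program(program)
print(kernel(start), interpret(program, start))
```

The compiled `kernel` can be called repeatedly with different inputs without decoding the instructions again, which is the advantage the post attributes to the GPU path.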
sp_
September 13, 2019, 07:03:54 AM
 #24507

what will happen when cards compatible with the language are no longer produced?

The point of PTX is that it's a unified language for all NVIDIA GPU architectures. The PTX is compiled to the native GPU language by the NVIDIA driver before execution. If NVIDIA decides to replace PTX with SPTX, you simply need your miner software to convert the random hashing function into SPTX.
pallas
September 13, 2019, 07:13:02 AM
 #24508

By using PTX you're essentially using a proprietary language to prevent anything but an Nvidia product
or an Nvidia-licensed product from mining your algo. That's one way to make an algo ASIC/FPGA resistant.

Doesn't need to be PTX. You need a pseudo-assembly language that can easily be translated to PTX before execution.
The CPU miner would have to parse this language and create a proper native binary before execution (create instructions in memory, flush the caches, then execute). CPU verification is important for the pool/wallet/exchanges.

Yeah, no PTX, that's what I was saying.
==> RandomX

sp_
September 13, 2019, 07:15:06 AM
Last edit: September 13, 2019, 07:45:47 AM by sp_
 #24509

Yeah, no PTX, that's what I was saying.
==> RandomX

So to make a fast RandomX miner on NVIDIA you can convert the RandomX code to PTX before execution (create a new PTX kernel for each block).

Without optimizations the NVIDIA cards are losing to the CPU.

RandomX benchmarks:

https://bitcointalk.org/index.php?topic=5176747.0

GPU                               Cryptonight-R   RandomX
AMD
Vega 64                           2200 H/s        1225 H/s
RX 480/580                        960-1000 H/s    400-410 H/s
RX 560 4GB (1400/2200 MHz)        495 H/s         260 H/s
NVIDIA/EVGA
RTX 2080 Ti (1915/13600 MHz)      960-1000 H/s    400-410 H/s
GTX 1080 Ti (2037/11800 MHz)      927 H/s         1122 H/s
GTX 1070 Ti (1900/7600 MHz)       625 H/s         769 H/s

For CPUs:
CPU                                          Cryptonight-R   RandomX
AMD 3900X (4.25GHZ ALL CORE, 3600MHZ RAM)    1335 H/s        13330 H/s
RYZEN 3700X                                  1018 H/s        6853 H/s
RYZEN 5 3600                                 803 H/s         6580 H/s
INTEL I9 9900K                               630 H/s         2102 H/s
2X XEON E5 2670 V2                           930 H/s         5815 H/s
INTEL I7 7700K                               350 H/s         2100 H/s
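
To put the "losing to the CPU" remark in numbers, here is a small sketch that computes the RandomX to Cryptonight-R ratio for a few rows copied from the benchmark figures above. The dictionary keys and the helper name `randomx_gain` are made up for this example; rows quoted as ranges are left out to avoid guessing a midpoint.

```python
# Hashrates in H/s copied from the benchmark figures: (Cryptonight-R, RandomX).
benchmarks = {
    "Vega 64": (2200, 1225),
    "GTX 1080 Ti": (927, 1122),
    "GTX 1070 Ti": (625, 769),
    "AMD 3900X": (1335, 13330),
    "Ryzen 3700X": (1018, 6853),
    "Intel i7 7700K": (350, 2100),
}

def randomx_gain(cn_r, rx):
    """Ratio of RandomX hashrate to Cryptonight-R hashrate on the same device."""
    return rx / cn_r

for name, (cn_r, rx) in benchmarks.items():
    print(f"{name:15s} RandomX / Cryptonight-R = {randomx_gain(cn_r, rx):.2f}x")
```

On these figures the listed CPUs gain several times their Cryptonight-R rate under RandomX, while the listed GPUs stay near or below 1.2x.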

joblo
September 13, 2019, 04:33:06 PM
 #24510

The point of PTX is that it's a unified language for all NVIDIA GPU architectures.

The point is that it's only Nvidia GPU architectures. No ASIC, no FPGA, no Radeon, no CPU.

sp_
September 13, 2019, 08:24:57 PM
 #24511

The point of PTX is that it's a unified language for all NVIDIA GPU architectures.
The point is that it's only Nvidia GPU architectures. No ASIC, no FPGA, no Radeon, no CPU.

Doesn't need to be PTX. If you run on NVIDIA hardware you convert the random stream of instructions to PTX. RandomX could be very profitable on NVIDIA hardware with a proper implementation...
joblo
September 13, 2019, 10:37:05 PM
 #24512

Doesn't need to be PTX. If you run on NVIDIA hardware you convert the random stream of instructions to PTX. RandomX could be very profitable on NVIDIA hardware with a proper implementation...

Precisely. You can build an Nvidia-only proof of concept, but a real product will need
its own pseudo language that can be compiled to PTX/CUDA, OpenCL, and x86 native instructions
producing identical functionality. The language would have to be complex enough (in the CISC sense)
that the FPGA can't decode it with a simple table lookup. That's a hell of a lot of work.
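
A small sketch of the "one pseudo language, several backends" idea: the same hypothetical `(op, dst, src)` instruction list is emitted once as PTX-style instruction lines and once as C-style statements. The instruction format and the function names are assumptions. Only the three PTX mnemonics (`add.u32`, `mul.lo.u32`, `xor.b32`) are real PTX, and the output is just a kernel body fragment, not a complete loadable module.

```python
# Hypothetical IR: (op, dst, src) over 32-bit registers.
# PTX mnemonics for 32-bit integer add, low-half multiply, and xor.
PTX_OPS = {"add": "add.u32", "mul": "mul.lo.u32", "xor": "xor.b32"}
C_OPS = {"add": "+", "mul": "*", "xor": "^"}

def to_ptx(program):
    """Emit one PTX-style instruction per IR instruction (body only, no kernel wrapper)."""
    return [f"{PTX_OPS[op]} %r{dst}, %r{dst}, %r{src};" for op, dst, src in program]

def to_c(program):
    """Emit equivalent C-style statements over an unsigned 32-bit array r[]."""
    return [f"r[{dst}] = r[{dst}] {C_OPS[op]} r[{src}];" for op, dst, src in program]

program = [("add", 0, 1), ("mul", 0, 2), ("xor", 1, 0)]
print("\n".join(to_ptx(program)))
print("\n".join(to_c(program)))
```

Both emitters walk the same list with a plain table lookup, which is exactly the kind of simple decoding the post says a real design would need to make harder.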

cpuminer-opt developer, https://bitcointalk.org/index.php?topic=1326803.0
BTC: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT,
sp_
September 14, 2019, 05:05:48 AM
 #24513

The language would have to be complex enough (in the CISC sense) that the FPGA can't decode it with a simple table lookup. That's a hell of a lot of work.

FPGAs have limits on memory access and multipliers. Let's say the FPGA can do 32 multiplications and 32 memory accesses per cycle; then you might be able to run 32 instructions per cycle at 500 MHz.

RandomX on the GPU doesn't need any memory access to fetch instructions because the code is compiled, and you can run with 1024 threads at 2000 MHz.

So the GPU can do 1024 instructions per cycle at 2000 MHz.
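
The arithmetic behind this comparison, as a quick sketch. The figures are the post's own assumed numbers (32 instructions per cycle at 500 MHz for the FPGA, 1024 per cycle at 2000 MHz for the GPU), not measurements. The function name is made up for the example.

```python
def instructions_per_second(per_cycle, clock_hz):
    """Peak instruction throughput under the post's simplified model."""
    return per_cycle * clock_hz

fpga = instructions_per_second(32, 500_000_000)      # 32 per cycle at 500 MHz
gpu = instructions_per_second(1024, 2_000_000_000)   # 1024 per cycle at 2000 MHz

print(f"FPGA: {fpga:.3e} instructions/s")
print(f"GPU:  {gpu:.3e} instructions/s")
print(f"GPU / FPGA = {gpu / fpga:.0f}x")
```

Under these assumptions the GPU comes out 128 times ahead. The result is only as good as the two per-cycle figures fed into it.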
pallas
September 14, 2019, 05:47:38 AM
 #24514

The FPGA doesn't make N multiplications per cycle. It does N hashes per cycle, with N integer > 0 or, in the case of complex algorithms, 1/N.

sp_
September 14, 2019, 05:53:53 AM
Last edit: September 14, 2019, 06:15:07 AM by sp_
 #24515

The FPGA doesn't make N multiplications per cycle. It does N hashes per cycle, with N integer > 0 or, in the case of complex algorithms, 1/N.

Yes, but in RandomX the FPGA needs to do a memory read per cycle to determine the instruction to be executed, so the N hashes per cycle model doesn't apply. The new limit is N instructions, where N is limited by the number of memory accesses the chip can do per cycle. In older FPGA designs it was normal to have ASIC multipliers you could use to speed up multiplications (e.g. Altera Cyclone IV). The multiplication could also be done in code.
pallas
September 14, 2019, 06:11:31 AM
 #24516

True, but it's also true that you can fill the FPGA with custom made cores each executing RandomX instructions. FPGAs are plenty flexible, much more than GPUs, it only takes much more time to optimise.

sp_
September 14, 2019, 06:30:33 AM
 #24517

With a compiled kernel, the GPU can execute 15000 RandomX instructions in 15 cycles per hash at 2000 MHz.
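
One way to arrive at the 15-cycle figure, assuming it comes from spreading 15000 instructions evenly over the 1024 threads mentioned earlier in the thread. That reading is a guess. It also assumes no dependencies between instructions, which the thread does not establish.

```python
import math

INSTRUCTIONS_PER_HASH = 15_000
THREADS = 1024               # one instruction per thread per cycle (assumed)
CLOCK_HZ = 2_000_000_000     # 2000 MHz

# Cycles needed if every thread retires one instruction per cycle.
cycles = math.ceil(INSTRUCTIONS_PER_HASH / THREADS)
seconds = cycles / CLOCK_HZ

print(f"{cycles} cycles, {seconds * 1e9:.1f} ns per hash")
```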
Kodaman
September 24, 2019, 07:07:09 PM
 #24518

Maybe this place has the info. There are rumors that x16rv2 already has ASICs.
Any feedback?
joblo
September 24, 2019, 07:52:41 PM
 #24519

If you don't produce a source it usually means you're starting the rumour.

Kodaman
September 24, 2019, 07:58:20 PM
 #24520

If you don't produce a source it usually means you're starting the rumour.

iBeLink in California is the ASIC provider. Is the source good enough? ;)