CCminer(SP-MOD) Modded GPU kernels.

Grim

Sr. Member

Offline

Activity: 504
Merit: 252

Re: CCminer(SP-MOD) Modded NVIDIA Maxwell kernels.

March 14, 2016, 09:19:59 AM

#10021

Quote from: Ayers on March 14, 2016, 07:28:33 AM

Also, if Ethereum goes PoS, another big coin will emerge, probably Decred, so a pump there is not so unexpected in the near future.
The money will always move one way or another, and diff will follow.

one HUGE flaw ...

Decred will have ASICs in a matter of months. Easy-to-implement, compute-only algo. (kindergarten)

And ETH actually has a VERY memory-hard algo, which has pretty much the best ASIC resistance in existence.
Yet exactly that coin goes PoS ... (strange world we live in, ain't it)

If the algos were the other way around you would be right ...

sp_ (OP)

Legendary

Offline

Activity: 2898
Merit: 1087

Team Black developer

Re: CCminer(SP-MOD) Modded NVIDIA Maxwell kernels.

March 14, 2016, 09:32:11 AM

#10022

Ethereum will hard fork, but Dwarfpool has 57% of the network hashrate. What if they refuse to upgrade the wallet? Then there will be no hard fork.

Ethereum

Nethashrate: 1 303 320 GHASH
Dwarfpool: 740 872 GHASH
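
As a quick check of the 57% figure, here is the arithmetic using only the two numbers above (the units cancel out, so the result does not depend on them). The variable names are just for illustration.

```python
# Share of the network hashrate held by one pool, from the figures quoted above.
net_hashrate = 1_303_320
dwarfpool_hashrate = 740_872

share = dwarfpool_hashrate / net_hashrate
print(f"Dwarfpool share: {share:.1%}")  # just under 57%

# More than half of the hashrate means the pool alone decides which chain is longest.
has_majority = share > 0.5
```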

In the early Dash days (Darkcoin), many pools refused to upgrade their wallets because the payouts to the miners were reduced.

Team Black Miner (ETHB3 ETH ETC VTC KAWPOW FIROPOW MEOWPOW + dual mining + triple mining): https://github.com/sp-hash/TeamBlackMiner

sp_ (OP)

Legendary

Offline

Activity: 2898
Merit: 1087

Team Black developer

Re: CCminer(SP-MOD) Modded NVIDIA Maxwell kernels.

March 14, 2016, 09:34:47 AM

#10023

Quote from: Grim on March 14, 2016, 09:19:59 AM

one HUGE flaw ...
Decred will have ASICs in a matter of months. Easy-to-implement, compute-only algo. (kindergarten)

The VHDL code for Blake-256 is already out there as open source. But the problem is the cost of producing the chips. Decred has a market cap of 2 MUSD. An altcoin mosquito.

sp_ (OP)

Legendary

Offline

Activity: 2898
Merit: 1087

Team Black developer

Re: CCminer(SP-MOD) Modded NVIDIA Maxwell kernels.

March 14, 2016, 09:43:50 AM

#10024

Quote from: Wolf0 on March 14, 2016, 09:39:13 AM

Dead on. But I looked at some Blake-256 code, Kramble's. Probably different from the code you're thinking of, but anyways, it wasn't all that awesome. So you'd probably want to do your own design if you're gonna commit to a manufacturing run.

To reduce the cost, there are co-ops between developers that print an ASIC with many kernels in one chip and split the cost. Since Blake-256 will take little chip space you might get a good deal, i.e. pay 1% of the cost of the chip. But then you need to draw the circuit board, print it (expensive), mount it, and write code for it. If Decred's market cap goes up to 20 MUSD it might be worth it...

I think FPGA is the way to go. Less investment and programmable, but also much slower than an ASIC.

pallas

Legendary

Offline

Activity: 2716
Merit: 1094

Black Belt Developer

Re: CCminer(SP-MOD) Modded NVIDIA Maxwell kernels.

March 14, 2016, 09:54:55 AM

#10025

On a side note, I've worked on hodlcoin algo (which is similar to memorycoin and has a 1GB scratchpad of "random" data).
It is a bit different than the dag file because it depends on the blockheader (including the nonce), still a similar "memory hard" algo.
As a test I tried generating the scratchpad slice I need on the fly, instead of doing it all in advance. That way you only need 8KB of data instead of 1GB.
Without specific optimisations, it was about half the speed (on CPU). On GPU, it probably would be faster than keeping the full buffer.
It is very interesting because generating on the fly means an order of magnitude more calculations (for the sha512 part), yet it is only 2 times slower because of the much better cache usage.
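
The trade-off described above can be sketched in a few lines. This is only an illustration, not hodlcoin's actual scratchpad construction: it assumes a hypothetical rule where each 64-byte chunk is SHA-512 of the seed plus the chunk index, so any chunk can be derived independently. The names `chunk`, `full_scratchpad` and `slice_on_the_fly` are made up for the example, and the buffer is scaled down from 1GB to 64 KiB.

```python
import hashlib

CHUNK = 64  # bytes produced by one SHA-512 call


def chunk(seed, index):
    """Hypothetical scratchpad rule: chunk i = SHA-512(seed || i)."""
    return hashlib.sha512(seed + index.to_bytes(8, "little")).digest()


def full_scratchpad(seed, n_chunks):
    """Precompute the whole buffer up front (the big-buffer approach)."""
    return b"".join(chunk(seed, i) for i in range(n_chunks))


def slice_on_the_fly(seed, offset, length):
    """Regenerate only the chunks that cover [offset, offset + length)."""
    first = offset // CHUNK
    last = (offset + length - 1) // CHUNK
    data = b"".join(chunk(seed, i) for i in range(first, last + 1))
    start = offset - first * CHUNK
    return data[start:start + length]


seed = b"blockheader+nonce"
pad = full_scratchpad(seed, 1024)            # 64 KiB stand-in for the full buffer
piece = slice_on_the_fly(seed, 5000, 8192)   # 8 KB slice, no big buffer needed
same = piece == pad[5000:5000 + 8192]
```

The on-the-fly version repeats hashing work whenever slices overlap, but its working set stays small enough to live in cache, which is the effect described in the post.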

Cryptonite (XCN): first mini-blockchain coin, innovative, running since 2014!

sp_ (OP)

Legendary

Offline

Activity: 2898
Merit: 1087

Team Black developer

Re: CCminer(SP-MOD) Modded NVIDIA Maxwell kernels.

March 14, 2016, 09:55:20 AM

#10026

Quote from: Wolf0 on March 14, 2016, 09:52:13 AM

Quote from: sp_ on March 14, 2016, 09:43:50 AM

Quote from: Wolf0 on March 14, 2016, 09:39:13 AM

Dead on. But I looked at some Blake-256 code, Kramble's. Probably different from the code you're thinking of, but anyways, it wasn't all that awesome. So you'd probably want to do your own design if you're gonna commit to a manufacturing run.

To reduce the cost, there are co-ops between developers that print an ASIC with many kernels in one chip and split the cost. Since Blake-256 will take little chip space you might get a good deal, i.e. pay 1% of the cost of the chip. But then you need to draw the circuit board, print it (expensive), mount it, and write code for it. If Decred's market cap goes up to 20 MUSD it might be worth it...
I think FPGA is the way to go. Less investment and programmable, but also much slower than an ASIC.

Fuck that noise, if I'm making an ASIC for a coin, I may as well go whole hog. I want to fit as many Blake-256 hashing cores as I can on each chip. I have a Blake-256 Decred implementation I run on FPGA that uses a 56-stage pipeline in order to keep outputting one result per clock tick, yet have very little delay so it'll clock to the moon. I can fit two of them on my Cyclone V - imagine what you could have for hashrate if you could pack a shitton of them on a chip made with the latest fabs (14nm) and put multiple chips on a board...

Here are some numbers from Blakecoin (8-round Blake-256):

1.6 GH/s on a ZTEX USB-FPGA 1.15y Quad Spartan-6 LX150 Development Board
1.5 GH/s on an Enterpoint Cairnsmore 1 Quad Spartan-6 LX150 Development Board
960 MH/s on a Lancelot Dual Spartan-6 LX150 Development Board
360 MH/s on a ZTEX USB-FPGA 1.15x Spartan-6 LX150 Development Board
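
To compare these boards chip for chip, the totals can be divided by the number of FPGAs on each board. The chip counts below are an assumption read off the board names ("Quad" = 4, "Dual" = 2, and one chip for the 1.15x); they are not stated separately in the list.

```python
# (board, total MH/s from the list above, assumed number of Spartan-6 LX150 chips)
boards = [
    ("ZTEX USB-FPGA 1.15y", 1600, 4),
    ("Enterpoint Cairnsmore 1", 1500, 4),
    ("Lancelot", 960, 2),
    ("ZTEX USB-FPGA 1.15x", 360, 1),
]

# Normalize to hashrate per FPGA so the designs can be compared directly.
per_chip = {name: total / chips for name, total, chips in boards}

for name, rate in per_chip.items():
    print(f"{name}: {rate:.0f} MH/s per LX150")
```

Under that assumption every design lands in the same range of a few hundred MH/s per LX150.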

chrysophylax

Legendary

Offline

Activity: 2814
Merit: 1091

--- ChainWorks Industries ---

Re: CCminer(SP-MOD) Modded NVIDIA Maxwell kernels.

March 14, 2016, 09:55:54 AM

#10027

Quote from: Wolf0 on March 14, 2016, 09:52:13 AM

Quote from: sp_ on March 14, 2016, 09:43:50 AM

Quote from: Wolf0 on March 14, 2016, 09:39:13 AM

Dead on. But I looked at some Blake-256 code, Kramble's. Probably different from the code you're thinking of, but anyways, it wasn't all that awesome. So you'd probably want to do your own design if you're gonna commit to a manufacturing run.

To reduce the cost, there are co-ops between developers that print an ASIC with many kernels in one chip and split the cost. Since Blake-256 will take little chip space you might get a good deal, i.e. pay 1% of the cost of the chip. But then you need to draw the circuit board, print it (expensive), mount it, and write code for it. If Decred's market cap goes up to 20 MUSD it might be worth it...

I think FPGA is the way to go. Less investment and programmable, but also much slower than an ASIC.

Fuck that noise, if I'm making an ASIC for a coin, I may as well go whole hog. I want to fit as many Blake-256 hashing cores as I can on each chip. I have a Blake-256 Decred implementation I run on FPGA that uses a 56-stage pipeline in order to keep outputting one result per clock tick, yet have very little delay so it'll clock to the moon. I can fit two of them on my Cyclone V - imagine what you could have for hashrate if you could pack a shitton of them on a chip made with the latest fabs (14nm) and put multiple chips on a board...

which is THE reason im looking for an investor to take on the 'challenge' ...

ooops! ... did i say that out loud? ...

...

#crysx

CWI-Thread (theFORUM) - https://bitcointalk.org/index.php?topic=1563601 . CWI-WebSite (theSITE) - https://chainworksindustries.com/ . CWI-Shop (theSHOP) - https://chainworksindustries.com/theSHOP.html .

Grim

Sr. Member

Offline

Activity: 504
Merit: 252

Re: CCminer(SP-MOD) Modded NVIDIA Maxwell kernels.

March 14, 2016, 10:01:29 AM

#10028

@ Wolf

So what is your idea of an ASIC resistant algo?
The most extreme algo in that direction is probably Burst.

But since anything can be calculated on the fly ... this is a losing battle?

I'm sorry but fpgas and asics are VERY much against a decentralized distribution.
You guys think too much about how to milk a coin and forget, on the other hand, that nobody cares for a coin which gets milked. Like shooting yourself in the foot.

(Yes I know you can make a shitton of money that way, but it essentially is against EVERYTHING cryptocoins stand for)

Ayers

Legendary

Offline

Activity: 2604
Merit: 1023

Leading Crypto Sports Betting & Casino Platform

Re: CCminer(SP-MOD) Modded NVIDIA Maxwell kernels.

March 14, 2016, 10:11:53 AM

#10029

Quote from: Grim on March 14, 2016, 09:19:59 AM

Quote from: Ayers on March 14, 2016, 07:28:33 AM

Also, if Ethereum goes PoS, another big coin will emerge, probably Decred, so a pump there is not so unexpected in the near future.
The money will always move one way or another, and diff will follow.

one HUGE flaw ...

Decred will have ASICs in a matter of months. Easy-to-implement, compute-only algo. (kindergarten)

And ETH actually has a VERY memory-hard algo, which has pretty much the best ASIC resistance in existence.
Yet exactly that coin goes PoS ... (strange world we live in, ain't it)

If the algos were the other way around you would be right ...

It may be right, but Decred can always be forked to a better algo. There is an evolution of the Ethereum algo, the one used by HODL coin (I'm not sure); they could use that in the future if Decred gets big.
Or maybe a new strong currency will emerge with that algo or a new one. You never know: just as Decred emerged from nothing, another altcoin can do the same.

Ayers

Legendary

Offline

Activity: 2604
Merit: 1023

Leading Crypto Sports Betting & Casino Platform

Re: CCminer(SP-MOD) Modded NVIDIA Maxwell kernels.

March 14, 2016, 10:12:50 AM

#10030

Quote from: Grim on March 14, 2016, 10:01:29 AM

@ Wolf

So what is your idea of an ASIC resistant algo?
The most extreme algo in that direction is probably Burst.

But since anything can be calculated on the fly ... this is a losing battle?

I'm sorry but fpgas and asics are VERY much against a decentralized distribution.
You guys think too much about how to milk a coin and forget, on the other hand, that nobody cares for a coin which gets milked. Like shooting yourself in the foot.

(Yes I know you can make a shitton of money that way, but it essentially is against EVERYTHING cryptocoins stand for)

Monero is a good candidate: not even GPUs are efficient there, so ASICs will not be efficient either. Hodlcoin uses an evolution of the Monero algo, so that is the way to go.

pallas

Legendary

Offline

Activity: 2716
Merit: 1094

Black Belt Developer

Re: CCminer(SP-MOD) Modded NVIDIA Maxwell kernels.

March 14, 2016, 10:16:31 AM

#10031

Quote from: Ayers on March 14, 2016, 10:12:50 AM

Quote from: Grim on March 14, 2016, 10:01:29 AM

@ Wolf

So what is your idea of an ASIC resistant algo?
The most extreme algo in that direction is probably Burst.

But since anything can be calculated on the fly ... this is a losing battle?

I'm sorry but fpgas and asics are VERY much against a decentralized distribution.
You guys think too much about how to milk a coin and forget, on the other hand, that nobody cares for a coin which gets milked. Like shooting yourself in the foot.

(Yes I know you can make a shitton of money that way, but it essentially is against EVERYTHING cryptocoins stand for)

Monero is a good candidate: not even GPUs are efficient there, so ASICs will not be efficient either. Hodlcoin uses an evolution of the Monero algo, so that is the way to go.

gpus are no more efficient than cpus on monero and hodl because cpus use the aes extension.
if the gpu had the same, they'd be much more efficient than cpus.
still the post by wolf0 is valid.
you don't need to be "memory hard", you need a "changing" algo so a fixed chip design is more difficult.
that's not the case of monero and hodl.

Grim

Sr. Member

Offline

Activity: 504
Merit: 252

Re: CCminer(SP-MOD) Modded NVIDIA Maxwell kernels.

March 14, 2016, 10:22:26 AM

#10032

Quote from: pallas on March 14, 2016, 10:16:31 AM

you don't need to be "memory hard", you need a "changing" algo so a fixed chip design is more difficult.
that's not the case of monero and hodl.

so how is that done? any example already out there?

pallas

Legendary

Offline

Activity: 2716
Merit: 1094

Black Belt Developer

Re: CCminer(SP-MOD) Modded NVIDIA Maxwell kernels.

March 14, 2016, 10:26:53 AM

#10033

Quote from: Grim on March 14, 2016, 10:22:26 AM

Quote from: pallas on March 14, 2016, 10:16:31 AM

you don't need to be "memory hard", you need a "changing" algo so a fixed chip design is more difficult.
that's not the case of monero and hodl.

so how is that done? any example already out there?

not that I know of.
but I don't understand all that "asic resistant" hype.
as people have been saying for years: if it's worth it, asics will come.
worth = high market cap: if you invested in the coin, you should be happy, not sad ;-)

sp_ (OP)

Legendary

Offline

Activity: 2898
Merit: 1087

Team Black developer

Re: CCminer(SP-MOD) Modded NVIDIA Maxwell kernels.

March 14, 2016, 01:13:34 PM

#10034

Ok. I have found a way to do the optimal Decred kernel now.

http://stackoverflow.com/questions/15842507/passing-the-ptx-program-to-the-cuda-driver-directly

So I will generate the PTX assembly with the midstate data included in the instructions. Then every time the midstate changes, I recompile the kernel at runtime with the API calls described in the article.
To estimate the speed gain, you can replace all the constant memory accesses with constants in the 1.7.4 code. Since the source code will be PTX assembly, I can also support Linux users. Since operations on constants can be precalculated, the compiler will reduce the number of instructions needed for you, so you end up with a kernel that uses fewer instructions than before.

Release #4 will be near optimal..
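
A minimal sketch of the "bake the midstate into the source" idea follows. It is not the miner's code: for readability it emits CUDA C text with `#define` lines rather than PTX, the kernel body is a placeholder, and the names `specialize_kernel_source`, `source_for` and `MIDSTATE_n` are hypothetical. Handing the generated text to the driver for compilation is left out.

```python
def specialize_kernel_source(midstate):
    """Return kernel source text with the midstate words baked in as literals.

    The kernel body is a placeholder; only the substitution technique matters.
    """
    consts = "\n".join(
        f"#define MIDSTATE_{i} 0x{word:08x}u" for i, word in enumerate(midstate)
    )
    body = (
        "__global__ void search(unsigned int *out, unsigned int start_nonce)\n"
        "{\n"
        "    /* rounds would use MIDSTATE_n instead of reading d_data[n] */\n"
        "}\n"
    )
    return consts + "\n\n" + body


_cache = {}


def source_for(midstate):
    """Regenerate the source only when the midstate actually changes."""
    key = tuple(midstate)
    if key not in _cache:
        _cache[key] = specialize_kernel_source(midstate)
    return _cache[key]


# Distinct dummy words, in the spirit of the 0x01234567 test suggested in the thread.
src = source_for([0x01234567 + i for i in range(24)])
```

Because every midstate word is now a literal, the compiler is free to fold any operation whose inputs are all literals, which is where the instruction savings would come from.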

pallas

Legendary

Offline

Activity: 2716
Merit: 1094

Black Belt Developer

Re: CCminer(SP-MOD) Modded NVIDIA Maxwell kernels.

March 14, 2016, 01:16:57 PM

#10035

Quote from: sp_ on March 14, 2016, 01:13:34 PM

Ok. I have found a way to do the optimal Decred kernel now.

http://stackoverflow.com/questions/15842507/passing-the-ptx-program-to-the-cuda-driver-directly

So I will generate the PTX assembly with the midstate data included in the instructions. Then every time the midstate changes, I recompile the kernel at runtime with the API calls described in the article.
To estimate the speed gain, you can replace all the constant memory accesses with constants in the 1.7.4 code. Release #4 will be optimal.

Interesting technique.
But I doubt you'll gain even 1% from it, likely less.

sp_ (OP)

Legendary

Offline

Activity: 2898
Merit: 1087

Team Black developer

Re: CCminer(SP-MOD) Modded NVIDIA Maxwell kernels.

March 14, 2016, 01:20:19 PM
Last edit: March 14, 2016, 01:34:28 PM by sp_

#10036

Quote from: pallas on March 14, 2016, 01:16:57 PM

Quote from: sp_ on March 14, 2016, 01:13:34 PM

Ok. I have found a way to do the optimal Decred kernel now.
http://stackoverflow.com/questions/15842507/passing-the-ptx-program-to-the-cuda-driver-directly
So I will generate the PTX assembly with the midstate data included in the instructions. Then every time the midstate changes, I recompile the kernel at runtime with the API calls described in the article.
To estimate the speed gain, you can replace all the constant memory accesses with constants in the 1.7.4 code. Release #4 will be optimal.

Interesting technique.
But I doubt you'll gain even 1% from it, likely less.

You will, because some of the first rounds will be gone (instructions are removed since they work on constant data). You can try it: replace d_data[0]...d_data[23] with constant data, 0x01234567 etc.; make sure every constant is different from the others. Compile, read the PTX, and count the lines before and after.

Then you don't have a 14-round Blake kernel, but a 12-round Blake kernel that only works for one midstate, and solves the 14-round Blake problem for that one given midstate.
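
The effect can be illustrated with a toy example. The mixing step below is deliberately NOT Blake-256, and `mix`, `hash_generic` and `specialize` are made-up names; the point is only that every step whose inputs are fixed for a given midstate can be done once, leaving a shorter per-nonce computation that gives the same result.

```python
MASK = 0xFFFFFFFF
INITIAL_STATE = (0x6A09E667, 0xBB67AE85)


def rotr(x, n):
    """Rotate a 32-bit value right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK


def mix(state, word):
    """Toy mixing step (not Blake-256): fold one 32-bit word into the state."""
    a, b = state
    a = (a + b + word) & MASK
    b = rotr(b ^ a, 7)
    return (a, b)


def hash_generic(fixed_words, nonce):
    """Generic version: every call redoes the steps that only touch fixed words."""
    state = INITIAL_STATE
    for w in fixed_words:
        state = mix(state, w)
    return mix(state, nonce)


def specialize(fixed_words):
    """Do the nonce-independent steps once; return a cheaper per-nonce function."""
    state = INITIAL_STATE
    for w in fixed_words:
        state = mix(state, w)
    return lambda nonce: mix(state, nonce)


fixed = [0x01234567 + i for i in range(24)]
fast = specialize(fixed)
results_match = all(fast(n) == hash_generic(fixed, n) for n in range(1000))
```

Here `fast` performs one mixing step per nonce instead of 25, but it is only valid for the one set of fixed words it was built from, which matches the "only works for one midstate" caveat above.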

pallas

Legendary

Offline

Activity: 2716
Merit: 1094

Black Belt Developer

Re: CCminer(SP-MOD) Modded NVIDIA Maxwell kernels.

March 14, 2016, 01:52:19 PM

#10037

Quote from: sp_ on March 14, 2016, 01:20:19 PM

Quote from: pallas on March 14, 2016, 01:16:57 PM

Quote from: sp_ on March 14, 2016, 01:13:34 PM

Ok. I have found a way to do the optimal Decred kernel now.
http://stackoverflow.com/questions/15842507/passing-the-ptx-program-to-the-cuda-driver-directly
So I will generate the PTX assembly with the midstate data included in the instructions. Then every time the midstate changes, I recompile the kernel at runtime with the API calls described in the article.
To estimate the speed gain, you can replace all the constant memory accesses with constants in the 1.7.4 code. Release #4 will be optimal.

Interesting technique.
But I doubt you'll gain even 1% from it, likely less.

You will, because some of the first rounds will be gone (instructions are removed since they work on constant data). You can try it: replace d_data[0]...d_data[23] with constant data, 0x01234567 etc.; make sure every constant is different from the others. Compile, read the PTX, and count the lines before and after.

Then you don't have a 14-round Blake kernel, but a 12-round Blake kernel that only works for one midstate, and solves the 14-round Blake problem for that one given midstate.

so, when in solo mode, every time you get a new transaction or block (and on a pool it's not much different), you will recompile the kernel? doesn't look optimal to me.

sp_ (OP)

Legendary

Offline

Activity: 2898
Merit: 1087

Team Black developer

Re: CCminer(SP-MOD) Modded NVIDIA Maxwell kernels.

March 14, 2016, 01:55:49 PM
Last edit: March 14, 2016, 03:06:04 PM by sp_

#10038

Quote from: pallas on March 14, 2016, 01:52:19 PM

Quote from: sp_ on March 14, 2016, 01:20:19 PM

Quote from: pallas on March 14, 2016, 01:16:57 PM

Quote from: sp_ on March 14, 2016, 01:13:34 PM

Ok. I have found a way to do the optimal Decred kernel now.
http://stackoverflow.com/questions/15842507/passing-the-ptx-program-to-the-cuda-driver-directly
So I will generate the PTX assembly with the midstate data included in the instructions. Then every time the midstate changes, I recompile the kernel at runtime with the API calls described in the article.
To estimate the speed gain, you can replace all the constant memory accesses with constants in the 1.7.4 code. Release #4 will be optimal.

Interesting technique.
But I doubt you'll gain even 1% from it, likely less.

You will, because some of the first rounds will be gone (instructions are removed since they work on constant data). You can try it: replace d_data[0]...d_data[23] with constant data, 0x01234567 etc.; make sure every constant is different from the others. Compile, read the PTX, and count the lines before and after.
Then you don't have a 14-round Blake kernel, but a 12-round Blake kernel that only works for one midstate, and solves the 14-round Blake problem for that one given midstate.

so, when in solo mode, every time you get a new transaction or block (and on a pool it's not much different), you will recompile the kernel? doesn't look optimal to me.

There is a faster way. Poke the new constants directly into the GPU binary (self-modifying code). Once the binary has been made, only 24 (+) constant numbers need to be changed (on a new transaction or block); then the kernel needs to be reloaded to the GPU with a cache flush (cudaDeviceReset), or perhaps there is an API call that can load/reload a .cubin file directly.

sp_ (OP)

Legendary

Offline

Activity: 2898
Merit: 1087

Team Black developer

Re: CCminer(SP-MOD) Modded NVIDIA Maxwell kernels.

March 14, 2016, 03:16:34 PM

#10039

Quote from: Wolf0 on March 14, 2016, 03:08:15 PM

I kinda doubt there's a documented and stable, supported method of doing so...

You can do it safely:

1. Put the compiled cubin in a ramdisk (virtual memory drive).
2. Poke the constant values directly into the file with the CPU. (The locations can be found by disassembly, and the offsets might change from compiler to compiler, i.e. between CUDA versions.)
3. Call the CUDA API function cuModuleLoad.

https://www.cs.cmu.edu/afs/cs/academic/class/15668-s11/www/cuda-doc/html/group__CUDA__MODULE_g366093bd269dafd0af21f1c7d18115d3.html
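
The "poke" step of the list above could look roughly like this. It is a sketch under strong assumptions: the offsets are hypothetical, the image is a zero-filled stand-in rather than a real cubin, and it assumes each constant sits in the binary as an aligned little-endian 32-bit word, which may not hold for the actual GPU instruction encoding. `patch_constants` is a made-up helper name.

```python
import struct


def patch_constants(image, patches):
    """Overwrite 32-bit little-endian words in place at the given byte offsets."""
    for offset, value in patches.items():
        if offset < 0 or offset + 4 > len(image):
            raise ValueError(f"offset {offset} is outside the image")
        struct.pack_into("<I", image, offset, value & 0xFFFFFFFF)


# Stand-in for a compiled module image; a real cubin would be read from the ramdisk.
image = bytearray(64)

# Hypothetical offsets: real ones must be found by disassembling the binary,
# and they can move between compiler (CUDA) versions.
patch_constants(image, {16: 0x01234567, 40: 0x89ABCDEF})

patched_word = struct.unpack_from("<I", image, 16)[0]
```

After patching, the file would be written back and loaded with cuModuleLoad as described in the list. The driver API also has cuModuleLoadData, which takes an in-memory image, so the ramdisk might not be strictly needed; whether that fits here has not been checked.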

joblo

Legendary

Offline

Activity: 1470
Merit: 1114

Re: CCminer(SP-MOD) Modded NVIDIA Maxwell kernels.

March 14, 2016, 04:59:52 PM

#10040

Quote from: Wolf0 on March 14, 2016, 04:05:23 PM

Quote from: sp_ on March 14, 2016, 03:16:34 PM

Quote from: Wolf0 on March 14, 2016, 03:08:15 PM

I kinda doubt there's a documented and stable, supported method of doing so...

You can do it safely:

1. Put the compiled cubin in a ramdisk (virtual memory drive).
2. Poke the constant values directly into the file with the CPU. (The locations can be found by disassembly, and the offsets might change from compiler to compiler, i.e. between CUDA versions.)
3. Call the CUDA API function cuModuleLoad.

https://www.cs.cmu.edu/afs/cs/academic/class/15668-s11/www/cuda-doc/html/group__CUDA__MODULE_g366093bd269dafd0af21f1c7d18115d3.html

I stand corrected.

Nice hack. I've always had a soft spot for self-modifying code. I once implemented a switch/case that way because there wasn't enough memory for a jump table. I didn't think it was still possible with modern CPUs and all their protections.

AKA JayDDee, cpuminer-opt developer. https://github.com/JayDDee/cpuminer-opt
https://bitcointalk.org/index.php?topic=5226770.msg53865575#msg53865575
BTC: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT,