Grim
March 14, 2016, 09:19:59 AM
Also, if Ethereum goes PoS, another big coin will emerge, probably Decred, so a pump there is not so unexpected in the near future. The money will always move one way or another, and diff will follow.
One HUGE flaw... Decred will have ASICs in a matter of months: an easy-to-implement compute-only algo (kindergarten stuff). And ETH actually has a VERY memory-hard algo, which has pretty much the best ASIC resistance in existence. Yet exactly that coin goes PoS... (strange world we live in, ain't it). If the algos were the other way around, you would be right...
sp_ (OP)
Legendary
Offline
Activity: 2898
Merit: 1087
Team Black developer
March 14, 2016, 09:32:11 AM
Ethereum will hardfork, but Dwarfpool has 57% of the network hashrate. What if they refuse to upgrade the wallet? Then there will be no hardfork.
Ethereum net hashrate: 1,303,320 GH/s. Dwarfpool: 740,872 GH/s.
In the early Dash days (Darkcoin), many pools refused to upgrade their wallets because the payouts to the miners were reduced.
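For reference, the quoted 57% follows directly from the two hashrate figures in the post above; a quick sanity check:

```python
# Sanity-check the claimed ~57% network share (figures from the post above).
network_ghs = 1_303_320   # Ethereum net hashrate, GH/s
dwarfpool_ghs = 740_872   # Dwarfpool hashrate, GH/s

share = dwarfpool_ghs / network_ghs
print(f"Dwarfpool share: {share:.1%}")
```

Anything above 50% is enough for the veto scenario sp_ describes: the majority pool can simply keep mining the old chain.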
sp_ (OP)
Legendary
Offline
Activity: 2898
Merit: 1087
Team Black developer
March 14, 2016, 09:34:47 AM
One HUGE flaw... Decred will have ASICs in a matter of months: an easy-to-implement compute-only algo (kindergarten stuff).
The VHDL code is already there, open source: Blake-256. But the problem is the cost of producing the chips. Decred has a market cap of 2 MUSD; an altcoin mosquito.
sp_ (OP)
Legendary
Offline
Activity: 2898
Merit: 1087
Team Black developer
March 14, 2016, 09:43:50 AM
Dead on. But I looked at some Blake-256 code, Kramble's. Probably different from the code you're thinking of, but anyways, it wasn't all that awesome. So you'd probably want to do your own design if you're gonna commit to a manufacturing run.
To reduce the cost there are co-ops between developers to print an ASIC with many kernels in one chip and split the cost. Since Blake-256 takes little chip space you might get a good deal, i.e. pay 1% of the cost of the chip. But then you need to design the circuit board, print it (expensive), mount it, and write code for it. If Decred's market cap goes up to 20 MUSD it might be worth it... I think FPGA is the way to go: less investment, programmable, but also much slower than an ASIC.
pallas
Legendary
Offline
Activity: 2716
Merit: 1094
Black Belt Developer
March 14, 2016, 09:54:55 AM
On a side note, I've worked on the HOdlcoin algo (which is similar to Memorycoin and has a 1 GB scratchpad of "random" data). It is a bit different from the DAG file because it depends on the block header (including the nonce), but it is still a similar "memory hard" algo. As a test I tried generating the scratchpad slice I need on the fly, instead of doing it all in advance. That way you only need 8 KB of data instead of 1 GB. Without specific optimisations, it was about half the speed (on CPU). On GPU, it would probably be faster than keeping the full buffer. It is very interesting because generating on the fly means an order of magnitude more calculations (for the SHA-512 part), yet it is only 2 times slower thanks to the much better cache usage.
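The trade-off pallas describes can be sketched like this (a toy model, not the actual HOdlcoin code; the seed derivation and slice size are placeholders): because the scratchpad is a deterministic function of a seed, any slice can be regenerated on demand instead of being stored, trading memory for extra hashing.

```python
import hashlib

SLICE = 64  # bytes per scratchpad index (one SHA-512 digest)

def scratchpad_slice(seed: bytes, index: int) -> bytes:
    """Regenerate one 64-byte slice of the 'scratchpad' on the fly."""
    return hashlib.sha512(seed + index.to_bytes(8, "little")).digest()

def full_scratchpad(seed: bytes, n_slices: int) -> bytes:
    """Precompute the whole scratchpad up front (the memory-hungry path)."""
    return b"".join(scratchpad_slice(seed, i) for i in range(n_slices))

seed = b"block header + nonce"   # stand-in for the real header-derived seed
pad = full_scratchpad(seed, 1024)  # 64 KB here; ~1 GB in the real algo

# Both paths agree: storing the pad and regenerating a slice are equivalent,
# so a miner can choose memory (store pad) or compute (regenerate slices).
assert pad[100 * SLICE:101 * SLICE] == scratchpad_slice(seed, 100)
```

The "only 2x slower despite 10x more hashing" observation then comes down to cache behaviour: the 8 KB working set stays in L1, while random reads into a 1 GB buffer miss every cache level.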
sp_ (OP)
Legendary
Offline
Activity: 2898
Merit: 1087
Team Black developer
March 14, 2016, 09:55:20 AM
Fuck that noise, if I'm making an ASIC for a coin, I may as well go whole hog. I want to fit as many Blake-256 hashing cores as I can on each chip. I have a Blake-256 Decred implementation I run on FPGA that uses a 56-stage pipeline in order to keep outputting one result per clock tick, yet has very little delay so it'll clock to the moon. I can fit two of them on my Cyclone V; imagine what you could have for hashrate if you could pack a shitton of them on a chip made with the latest fabs (14nm) and put multiple chips on a board...
Here are some numbers from Blakecoin (8-round Blake-256):
1.6 GH/s on a ZTEX USB-FPGA 1.15y quad Spartan-6 LX150 development board
1.5 GH/s on an Enterpoint Cairnsmore 1 quad Spartan-6 LX150 development board
960 MH/s on a Lancelot dual Spartan-6 LX150 development board
360 MH/s on a ZTEX USB-FPGA 1.15x Spartan-6 LX150 development board
chrysophylax
Legendary
Offline
Activity: 2814
Merit: 1091
--- ChainWorks Industries ---
March 14, 2016, 09:55:54 AM
Fuck that noise, if I'm making an ASIC for a coin, I may as well go whole hog. [...] imagine what you could have for hashrate if you could pack a shitton of them on a chip made with the latest fabs (14nm) and put multiple chips on a board...
which is THE reason im looking for an investor to take on the 'challenge'... ooops!... did i say that out loud?...
#crysx
Grim
March 14, 2016, 10:01:29 AM
@Wolf
So what is your idea of an ASIC-resistant algo? The most extreme algo in that direction is probably Burst.
But since anything can be calculated on the fly... is this a losing battle?
I'm sorry, but FPGAs and ASICs are VERY much against a decentralized distribution. You guys think too much about how to milk a coin and forget, on the other hand, that nobody cares for a coin which gets milked. Like shooting yourself in the foot.
(Yes, I know you can make a shitton of money that way, but it essentially is against EVERYTHING cryptocoins stand for.)
Ayers
Legendary
Offline
Activity: 2604
Merit: 1023
Leading Crypto Sports Betting & Casino Platform
March 14, 2016, 10:11:53 AM
Also, if Ethereum goes PoS, another big coin will emerge, probably Decred, so a pump there is not so unexpected in the near future.
One HUGE flaw... Decred will have ASICs in a matter of months: an easy-to-implement compute-only algo (kindergarten stuff). And ETH actually has a VERY memory-hard algo, which has pretty much the best ASIC resistance in existence. Yet exactly that coin goes PoS...
It may be right, but Decred can always be forked to a better algo. There is an evolution of the Ethereum algo, the one used by HOdlcoin, I'm not sure; they could use that in the future if Decred gets big. Or maybe a new strong currency will emerge with that algo, or a new one; you never know. Like Decred emerged from nothing, another altcoin can do the same.
Ayers
Legendary
Offline
Activity: 2604
Merit: 1023
Leading Crypto Sports Betting & Casino Platform
March 14, 2016, 10:12:50 AM
@Wolf
So what is your idea of an ASIC-resistant algo? The most extreme algo in that direction is probably Burst.
But since anything can be calculated on the fly... is this a losing battle?
I'm sorry, but FPGAs and ASICs are VERY much against a decentralized distribution. You guys think too much about how to milk a coin and forget, on the other hand, that nobody cares for a coin which gets milked. Like shooting yourself in the foot.
(Yes, I know you can make a shitton of money that way, but it essentially is against EVERYTHING cryptocoins stand for.)
Monero is a good candidate, since not even GPUs are efficient there, so ASICs will not be efficient either. HOdlcoin uses an evolution of the Monero algo, so that is the way to go.
pallas
Legendary
Offline
Activity: 2716
Merit: 1094
Black Belt Developer
March 14, 2016, 10:16:31 AM
@Wolf
So what is your idea of an ASIC-resistant algo? The most extreme algo in that direction is probably Burst.
But since anything can be calculated on the fly... is this a losing battle?
Monero is a good candidate, since not even GPUs are efficient there, so ASICs will not be efficient either. HOdlcoin uses an evolution of the Monero algo, so that is the way to go.
GPUs are no more efficient than CPUs on Monero and HODL because CPUs use the AES extension; if GPUs had the same, they'd be much more efficient than CPUs. Still, the post by Wolf0 is valid: you don't need to be "memory hard", you need a "changing" algo so that a fixed chip design is more difficult. That's not the case for Monero and HODL.
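One way to read pallas's "changing algo" idea is a sketch like the following (purely illustrative; no shipped coin is claimed to work exactly this way): let data from the previous block select the hash primitive for the next one, so a chip hard-wired for a single function is never optimal across blocks.

```python
import hashlib

# Candidate primitives; a fixed-function ASIC can only hard-wire one of them.
PRIMITIVES = [hashlib.sha256, hashlib.sha512, hashlib.blake2b, hashlib.sha3_256]

def changing_pow(prev_block_hash: bytes, header: bytes) -> bytes:
    """Toy 'changing' PoW: the previous block hash selects which
    primitive hashes this block, so the algo shifts every block."""
    selector = prev_block_hash[0] % len(PRIMITIVES)
    return PRIMITIVES[selector](header).digest()

h1 = changing_pow(b"\x00" + b"p" * 31, b"header-a")  # selector 0 -> sha256
h2 = changing_pow(b"\x02" + b"p" * 31, b"header-a")  # selector 2 -> blake2b
assert h1 != h2  # same header, different primitive chosen
```

A real design would need far more primitives (or generated circuits) to matter, since an ASIC could simply include all four cores here; the sketch only shows the selection mechanism.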
Grim
March 14, 2016, 10:22:26 AM
you don't need to be "memory hard", you need a "changing" algo so a fixed chip design is more difficult. that's not the case of monero and hodl.
so how is that done? any example already out there?
pallas
Legendary
Offline
Activity: 2716
Merit: 1094
Black Belt Developer
March 14, 2016, 10:26:53 AM
you don't need to be "memory hard", you need a "changing" algo so a fixed chip design is more difficult. that's not the case of monero and hodl.
so how is that done? any example already out there?
not that I know of. but I don't understand all that "ASIC resistant" hype. as people have been saying for years: if it's worth it, ASICs will come. worth = high market cap: if you invested in the coin, you should be happy, not sad ;-)
sp_ (OP)
Legendary
Offline
Activity: 2898
Merit: 1087
Team Black developer
March 14, 2016, 01:13:34 PM
Ok. I have found a way to do the optimal Decred kernel now: http://stackoverflow.com/questions/15842507/passing-the-ptx-program-to-the-cuda-driver-directly
So I will generate the PTX assembly with the midstate data included in the instructions. Then every time the midstate changes, I recompile the kernel at runtime with the API calls described in the article. To estimate the speed gain you can replace all the constant-mem accesses with constants in the 1.7.4 code. Since the source code will be PTX assembly I can also support Linux users. Since operations on constants can be precalculated, the compiler will reduce the number of instructions needed for you, so you end up with a kernel that uses fewer instructions than before. Release #4 will be near optimal.
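The trick sp_ describes, baking the current midstate into generated code and recompiling whenever it changes, can be illustrated in Python (a sketch of the idea only; the midstate values are placeholders, and the real thing emits PTX and loads it through the CUDA driver API rather than using `exec`):

```python
# Runtime specialization sketch: regenerate a function with the current
# midstate baked in as literals, so the compiler can fold them away.
MIDSTATE = (0x01234567, 0x89ABCDEF, 0xDEADBEEF)  # placeholder values

def compile_specialized(midstate):
    # Emit source with the midstate as literal constants, then compile it:
    # the analogue of regenerating and reloading PTX per midstate.
    src = "def round0(nonce):\n"
    src += "    v = nonce\n"
    for i, word in enumerate(midstate):
        src += f"    v = (v ^ {word:#010x}) & 0xFFFFFFFF  # m[{i}] baked in\n"
    src += "    return v\n"
    ns = {}
    exec(compile(src, "<specialized>", "exec"), ns)
    return ns["round0"]

def round0_generic(nonce, midstate=MIDSTATE):
    # Reference version: midstate arrives as data at runtime.
    v = nonce
    for word in midstate:
        v = (v ^ word) & 0xFFFFFFFF
    return v

specialized = compile_specialized(MIDSTATE)
assert specialized(42) == round0_generic(42)  # same result, constants baked in
```

The win comes from the compiler precomputing everything that only touches the baked-in constants; the cost is a recompile on every midstate change, which is the trade-off debated in the posts below.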
pallas
Legendary
Offline
Activity: 2716
Merit: 1094
Black Belt Developer
March 14, 2016, 01:16:57 PM
Ok. I have found a way to do the optimal Decred kernel now. [...] Release #4 will be optimal.
Interesting technique. But I doubt you'll gain even 1% from it, likely less.
sp_ (OP)
Legendary
Offline
Activity: 2898
Merit: 1087
Team Black developer
March 14, 2016, 01:20:19 PM (last edit: March 14, 2016, 01:34:28 PM by sp_)
Interesting technique. But I doubt you'll gain even 1% from it, likely less.
You will, because some of the first rounds will be gone (instructions are removed since they work on constant data). You can try it: replace d_data[0]...d_data[23] with constant data, 0x01234567 etc.; make sure every constant is different from the others. Compile, read the PTX, and count the lines before and after. Then you don't have a 14-round Blake kernel but a 12-round Blake kernel that only works for one midstate, and solves the 14-round Blake problem for that given midstate.
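sp_'s "count the lines before and after" experiment has a cheap CPU-side analogue using Python's own bytecode compiler (again just an illustration of constant folding, not CUDA):

```python
import dis

# Generic: operands arrive at runtime, so the xor must actually execute.
generic = compile("a ^ b", "<generic>", "eval")

# Specialized: both operands are literals, so the peephole optimizer
# folds the xor away at compile time, leaving fewer instructions.
specialized = compile("0x01234567 ^ 0x89ABCDEF", "<specialized>", "eval")

n_generic = len(list(dis.get_instructions(generic)))
n_specialized = len(list(dis.get_instructions(specialized)))
assert n_specialized < n_generic  # constant folding removed work
```

nvcc and ptxas do the same kind of folding on baked-in midstate words, which is where the claimed "12 rounds instead of 14" comes from.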
pallas
Legendary
Offline
Activity: 2716
Merit: 1094
Black Belt Developer
March 14, 2016, 01:52:19 PM
You will, because some of the first rounds will be gone (instructions are removed since they work on constant data). [...] Then you don't have a 14-round Blake kernel but a 12-round Blake kernel that only works for one midstate.
so, when in solo mode, every time you get a new transaction or block (and on a pool it's not much different), you will recompile the kernel? doesn't look optimal to me.
sp_ (OP)
Legendary
Offline
Activity: 2898
Merit: 1087
Team Black developer
March 14, 2016, 01:55:49 PM (last edit: March 14, 2016, 03:06:04 PM by sp_)
so, when in solo mode, every time you get a new transaction or block (and on a pool it's not much different), you will recompile the kernel? doesn't look optimal to me.
There is a faster way: poke the new constants directly into the binary on the GPU (self-modifying code). Once the binary has been made, only 24(+) constant numbers need to be changed (on a new transaction or block); then the kernel needs to be reloaded to the GPU with a cache flush (cudaDeviceReset), or perhaps there is an API call that can load/reload a .cubin file directly.
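The "poke the constants into the binary" idea can be sketched generically (a toy byte-patching model with made-up placeholder values; a real implementation would patch the immediate fields inside the cubin's machine code, whose offsets you would first have to locate):

```python
import struct

# Build a fake 'binary' with a recognizable placeholder where each of the
# 24 midstate constants lives, standing in for immediates in a cubin.
PLACEHOLDER = 0xCAFEBABE
N_CONSTS = 24
blob = bytearray(b"\x90" * 16)           # some unrelated "code" bytes
offsets = []
for _ in range(N_CONSTS):
    offsets.append(len(blob))            # remember where each constant sits
    blob += struct.pack("<I", PLACEHOLDER)
    blob += b"\x90" * 4                  # more "code" between constants

def patch_constants(blob: bytearray, offsets, values):
    """Self-modifying-code style update: overwrite constants in place,
    avoiding a full recompile when the midstate changes."""
    for off, val in zip(offsets, values):
        blob[off:off + 4] = struct.pack("<I", val & 0xFFFFFFFF)

new_midstate = list(range(1, N_CONSTS + 1))   # fresh per-block values
patch_constants(blob, offsets, new_midstate)

# Read one back to confirm the patch landed where expected.
assert struct.unpack_from("<I", blob, offsets[5])[0] == 6
```

Patching 24 words is far cheaper than a recompile, which is the whole point of sp_'s follow-up; the remaining cost is getting the modified binary reloaded onto the GPU.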
joblo
Legendary
Offline
Activity: 1470
Merit: 1114
March 14, 2016, 04:59:52 PM
Nice hack. I've always had a soft spot for self-modifying code. I once implemented a switch/case that way because there wasn't enough memory for a jump table. I didn't think it was still possible with modern CPUs and all their protections.