pallas
Legendary
Offline
Activity: 2716
Merit: 1094
Black Belt Developer
|
 |
March 14, 2016, 09:54:55 AM |
|
On a side note, I've worked on the hodlcoin algo (which is similar to memorycoin and has a 1GB scratchpad of "random" data). It is a bit different from the DAG file because it depends on the block header (including the nonce), but it is still a similar "memory hard" algo. As a test I tried generating the scratchpad slice I need on the fly, instead of building it all in advance. That way you only need 8KB of data instead of 1GB. Without specific optimisations, it was about half the speed (on CPU). On GPU, it would probably be faster than keeping the full buffer. It is very interesting because generating on the fly means an order of magnitude more calculations (for the sha512 part), yet it is only 2 times slower because of the much better cache usage.
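A toy sketch of the trade-off, for anyone who wants to play with it. This is not the real hodlcoin scratchpad: the layout (8KB segments, each a SHA-512 chain seeded by header plus segment index), the toy pad size and all the names below are assumptions made up for illustration. It only shows that a slice can be rebuilt from the header when needed instead of being read from a big precomputed buffer.

```python
import hashlib

BLOCK = 64                 # SHA-512 digest size in bytes
SEGMENT_BLOCKS = 128       # 128 * 64 B = 8 KB per segment (hypothetical layout)
SEG_BYTES = BLOCK * SEGMENT_BLOCKS
SEGMENTS = 16              # toy pad of 128 KB instead of 1 GB


def segment(header: bytes, seg_index: int) -> bytes:
    """Build one 8 KB segment as a SHA-512 chain seeded by header and index."""
    out = bytearray()
    block = hashlib.sha512(header + seg_index.to_bytes(4, "little")).digest()
    for _ in range(SEGMENT_BLOCKS):
        out += block
        block = hashlib.sha512(block).digest()
    return bytes(out)


def full_scratchpad(header: bytes) -> bytes:
    """Precompute the whole pad up front (the memory-hungry approach)."""
    return b"".join(segment(header, i) for i in range(SEGMENTS))


def read_on_the_fly(header: bytes, offset: int, length: int) -> bytes:
    """Regenerate only the segments covering [offset, offset + length).

    Assumes length >= 1 and that the range lies inside the pad.
    """
    first = offset // SEG_BYTES
    last = (offset + length - 1) // SEG_BYTES
    data = b"".join(segment(header, i) for i in range(first, last + 1))
    start = offset - first * SEG_BYTES
    return data[start:start + length]
```

Each on-the-fly read redoes up to a whole segment of hashing, so it costs more SHA-512 work per access, but the working set stays small enough to live in cache.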
|
|
|
|
sp_ (OP)
Legendary
Offline
Activity: 2926
Merit: 1087
Team Black developer
|
 |
March 14, 2016, 09:55:20 AM |
|
Dead on. But I looked at some Blake-256 code, Kramble's. Probably different from the code you're thinking of, but anyways, it wasn't all that awesome. So you'd probably want to do your own design if you're gonna commit to a manufacturing run.
To reduce the cost there are co-ops between developers to print an ASIC with many kernels in one chip and split the cost. Since Blake-256 will take little chip space you might get a good deal, i.e. pay 1% of the cost of the chip. But then you need to draw the circuit board, print it (expensive), mount it, and write code for it. If Decred's market cap goes up to 20M USD it might be worth it... I think FPGA is the way to go: less investment, programmable, but also much slower than an ASIC.
Fuck that noise, if I'm making an ASIC for a coin, I may as well go whole hog. I want to fit as many Blake-256 hashing cores as I can on each chip. I have a Blake-256 Decred implementation I run on FPGA that uses a 56-stage pipeline in order to keep outputting one result per clock tick, yet have very little delay so it'll clock to the moon. I can fit two of them on my Cyclone V - imagine what you could have for hashrate if you could pack a shitton of them on a chip made with the latest fabs (14nm) and put multiple chips on a board...
Here are some numbers from Blakecoin (8-round Blake-256):
1.6 GH/s on a ZTEX USB-FPGA 1.15y Quad Spartan-6 LX150 Development Board
1.5 GH/s on an Enterpoint Cairnsmore 1 Quad Spartan-6 LX150 Development Board
960 MH/s on a Lancelot Dual Spartan-6 LX150 Development Board
360 MH/s on a ZTEX USB-FPGA 1.15x Spartan-6 LX150 Development Board
|
|
|
|
chrysophylax
Legendary
Offline
Activity: 3122
Merit: 1093
--- ChainWorks Industries ---
|
 |
March 14, 2016, 09:55:54 AM |
|
Dead on. But I looked at some Blake-256 code, Kramble's. Probably different from the code you're thinking of, but anyways, it wasn't all that awesome. So you'd probably want to do your own design if you're gonna commit to a manufacturing run.
To reduce the cost there are co-ops between developers to print an ASIC with many kernels in one chip and split the cost. Since Blake-256 will take little chip space you might get a good deal, i.e. pay 1% of the cost of the chip. But then you need to draw the circuit board, print it (expensive), mount it, and write code for it. If Decred's market cap goes up to 20M USD it might be worth it... I think FPGA is the way to go: less investment, programmable, but also much slower than an ASIC.
Fuck that noise, if I'm making an ASIC for a coin, I may as well go whole hog. I want to fit as many Blake-256 hashing cores as I can on each chip. I have a Blake-256 Decred implementation I run on FPGA that uses a 56-stage pipeline in order to keep outputting one result per clock tick, yet have very little delay so it'll clock to the moon. I can fit two of them on my Cyclone V - imagine what you could have for hashrate if you could pack a shitton of them on a chip made with the latest fabs (14nm) and put multiple chips on a board...
which is THE reason i'm looking for an investor to take on the 'challenge' ... ooops! ... did i say that out loud? ... #crysx
|
|
|
|
Grim
|
 |
March 14, 2016, 10:01:29 AM |
|
@ Wolf
So what is your idea of an ASIC resistant algo? The most extreme algo in that direction is probably Burst.
But since anything can be calculated on the fly ... this is a losing battle?
I'm sorry but FPGAs and ASICs are VERY much against a decentralized distribution. You guys think too much about how to milk a coin and forget on the other hand that nobody cares for a coin which gets milked. Like shooting yourself in the foot.
(Yes I know you can make a shitton of money that way, but it essentially is against EVERYTHING cryptocoins stand for)
|
|
|
|
Ayers
Legendary
Offline
Activity: 2940
Merit: 1024
Make Your Own Fortune
|
 |
March 14, 2016, 10:11:53 AM |
|
also if ethereum goes PoS, another big coin will emerge, probably decred, so a pump there is not so unexpected in the near future. the money will always move one way or another and diff will follow
one HUGE flaw ... Decred will have ASICs in a matter of months. Easy to implement compute-only algo (kindergarten). And ETH actually has a VERY hard memory algo which has pretty much the best ASIC resistance in existence. Yet exactly that coin goes PoS ... (strange world we live in, ain't it) If the algos were the other way around you would be right ...
it may be right, but decred can always be forked for a better algo. there is an evolution of the ethereum algo, the one used by HODL coin (i'm not sure); they could use that in the future if decred gets big. or maybe a new strong currency will emerge with that algo or a new one. you never know: like decred emerged from nothing, another altcoin can do the same
|
|
|
|
Ayers
Legendary
Offline
Activity: 2940
Merit: 1024
Make Your Own Fortune
|
 |
March 14, 2016, 10:12:50 AM |
|
@ Wolf
So what is your idea of an ASIC resistant algo? The most extreme algo in that direction is probably Burst.
But since anything can be calculated on the fly ... this is a losing battle?
I'm sorry but FPGAs and ASICs are VERY much against a decentralized distribution. You guys think too much about how to milk a coin and forget on the other hand that nobody cares for a coin which gets milked. Like shooting yourself in the foot.
(Yes I know you can make a shitton of money that way, but it essentially is against EVERYTHING cryptocoins stand for)
monero is a good candidate: not even GPUs are efficient there, so ASICs will not be efficient either. hodlcoin uses an evolution of the monero algo, so that is the way to go
|
|
|
|
pallas
Legendary
Offline
Activity: 2716
Merit: 1094
Black Belt Developer
|
 |
March 14, 2016, 10:16:31 AM |
|
@ Wolf
So what is your idea of an ASIC resistant algo? The most extreme algo in that direction is probably Burst.
But since anything can be calculated on the fly ... this is a losing battle?
I'm sorry but FPGAs and ASICs are VERY much against a decentralized distribution. You guys think too much about how to milk a coin and forget on the other hand that nobody cares for a coin which gets milked. Like shooting yourself in the foot.
(Yes I know you can make a shitton of money that way, but it essentially is against EVERYTHING cryptocoins stand for)
monero is a good candidate: not even GPUs are efficient there, so ASICs will not be efficient either. hodlcoin uses an evolution of the monero algo, so that is the way to go
GPUs are no more efficient than CPUs on monero and hodl because CPUs use the AES extension. if GPUs had the same, they'd be much more efficient than CPUs. still, the post by wolf0 is valid: you don't need to be "memory hard", you need a "changing" algo so that a fixed chip design is more difficult. that's not the case for monero and hodl.
|
|
|
|
Grim
|
 |
March 14, 2016, 10:22:26 AM |
|
you don't need to be "memory hard", you need a "changing" algo so a fixed chip design is more difficult. that's not the case of monero and hodl.
so how is that done? any example already out there?
|
|
|
|
pallas
Legendary
Offline
Activity: 2716
Merit: 1094
Black Belt Developer
|
 |
March 14, 2016, 10:26:53 AM |
|
you don't need to be "memory hard", you need a "changing" algo so a fixed chip design is more difficult. that's not the case of monero and hodl.
so how is that done? any example already out there?
not that I know of. but I don't understand all that "asic resistant" hype. as people have been saying for years: if it's worth it, asics will come. worth it = high market cap: if you invested in the coin, you should be happy, not sad ;-)
|
|
|
|
sp_ (OP)
Legendary
Offline
Activity: 2926
Merit: 1087
Team Black developer
|
 |
March 14, 2016, 01:13:34 PM |
|
Ok. I have found a way to do the optimal Decred kernel now.
http://stackoverflow.com/questions/15842507/passing-the-ptx-program-to-the-cuda-driver-directly
So I will generate the PTX assembly with the midstate data included in the instructions. Then every time the midstate changes, I recompile the kernel at runtime with the API calls described in the article. To estimate the speed gain you can replace all the constant memory accesses with constants in the 1.7.4 code. Since the source code will be PTX assembly I can also support Linux users. Since operations on constants can be precalculated, the compiler will reduce the number of instructions for you, so you end up with a kernel that uses fewer instructions than before. Release #4 will be near optimal.
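The "bake the constants into the code and recompile when they change" idea can be tried out in plain Python before touching PTX. The sketch below is only an analogy: it generates Python source rather than PTX, it does not use the CUDA driver API, and `generic_mix`, `build_specialized` and the mixing function itself are made-up names and logic, not Blake-256 or anything from the miner.

```python
MASK = 0xFFFFFFFF


def rotr32(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK


def generic_mix(midstate, nonce):
    """Reference version: reads the midstate words from a table on every call."""
    v = nonce & MASK
    for word in midstate:
        v = rotr32((v + word) & MASK, 7) ^ word
    return v


def build_specialized(midstate):
    """Generate source with the midstate words as literals, then compile it.

    The returned function is only valid for this one midstate and has to be
    rebuilt whenever the midstate changes.
    """
    lines = ["def specialized_mix(nonce):", f"    v = nonce & {MASK:#x}"]
    for word in midstate:
        lines.append(f"    v = rotr32((v + {word:#x}) & {MASK:#x}, 7) ^ {word:#x}")
    lines.append("    return v")
    namespace = {"rotr32": rotr32}
    exec(compile("\n".join(lines), "<specialized>", "exec"), namespace)
    return namespace["specialized_mix"]
```

Usage would be `mix = build_specialized(midstate)` once per midstate, then `mix(nonce)` in the hot loop. Whether the rebuild cost pays off depends on how often the midstate changes.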
|
|
|
|
pallas
Legendary
Offline
Activity: 2716
Merit: 1094
Black Belt Developer
|
 |
March 14, 2016, 01:16:57 PM |
|
Ok. I have found a way to do the optimal Decred kernel now.
http://stackoverflow.com/questions/15842507/passing-the-ptx-program-to-the-cuda-driver-directly
So I will generate the PTX assembly with the midstate data included in the instructions. Then every time the midstate changes, I recompile the kernel at runtime with the API calls described in the article. To estimate the speed gain you can replace all the constant memory accesses with constants in the 1.7.4 code. Release #4 will be optimal.
Interesting technique. But I doubt you'll gain even 1% from it, likely less.
|
|
|
|
sp_ (OP)
Legendary
Offline
Activity: 2926
Merit: 1087
Team Black developer
|
 |
March 14, 2016, 01:20:19 PM Last edit: March 14, 2016, 01:34:28 PM by sp_ |
|
Ok. I have found a way to do the optimal Decred kernel now.
http://stackoverflow.com/questions/15842507/passing-the-ptx-program-to-the-cuda-driver-directly
So I will generate the PTX assembly with the midstate data included in the instructions. Then every time the midstate changes, I recompile the kernel at runtime with the API calls described in the article. To estimate the speed gain you can replace all the constant memory accesses with constants in the 1.7.4 code. Release #4 will be optimal.
Interesting technique. But I doubt you'll gain even 1% from it, likely less.
You will, because some of the first rounds will be gone (instructions are removed since they work on constant data). You can try it: replace d_data[0]...d_data[23] with constant data, 0x01234567 etc.; make sure every constant is different from the others. Compile, read the PTX, and count the lines before and after. Then you don't have a 14-round Blake kernel, but a 12-round Blake kernel that only works for one midstate, and solves the 14-round Blake problem for that given midstate.
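Why work on constant data can disappear is easy to see on a toy. The sketch below is not Blake-256 (there every round mixes every message word, so the split is less clean); the `step` function, the initial value and the layout with a single nonce word are assumptions for illustration. Everything that happens before the nonce enters depends only on constants, so it can be computed once per midstate instead of once per nonce.

```python
MASK = 0xFFFFFFFF
INIT = 0x6A09E667  # arbitrary starting value for the toy state


def rotr32(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK


def step(state, word):
    """One toy mixing step (hypothetical, not a Blake G function)."""
    return rotr32((state + word) & MASK, 12) ^ word


def full_hash(words, nonce_index, nonce):
    """Run every step for every nonce, as a generic kernel would."""
    state = INIT
    for i, w in enumerate(words):
        state = step(state, nonce if i == nonce_index else w)
    return state


def precompute_prefix(words, nonce_index):
    """Steps before the nonce depend only on constants: do them once."""
    state = INIT
    for w in words[:nonce_index]:
        state = step(state, w)
    return state


def hash_from_prefix(prefix_state, words, nonce_index, nonce):
    """Per-nonce work: only the nonce step and the steps after it."""
    state = step(prefix_state, nonce)
    for w in words[nonce_index + 1:]:
        state = step(state, w)
    return state
```

With 16 words and the nonce at index 12, the per-nonce loop does 4 steps instead of 16. A compiler that sees the constants as literals can do this kind of folding by itself, which is the effect described in the post.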
|
|
|
|
pallas
Legendary
Offline
Activity: 2716
Merit: 1094
Black Belt Developer
|
 |
March 14, 2016, 01:52:19 PM |
|
Ok. I have found a way to do the optimal Decred kernel now.
http://stackoverflow.com/questions/15842507/passing-the-ptx-program-to-the-cuda-driver-directly
So I will generate the PTX assembly with the midstate data included in the instructions. Then every time the midstate changes, I recompile the kernel at runtime with the API calls described in the article. To estimate the speed gain you can replace all the constant memory accesses with constants in the 1.7.4 code. Release #4 will be optimal.
Interesting technique. But I doubt you'll gain even 1% from it, likely less.
You will, because some of the first rounds will be gone (instructions are removed since they work on constant data). You can try it: replace d_data[0]...d_data[23] with constant data, 0x01234567 etc.; make sure every constant is different from the others. Compile, read the PTX, and count the lines before and after. Then you don't have a 14-round Blake kernel, but a 12-round Blake kernel that only works for one midstate, and solves the 14-round Blake problem for that given midstate.
So, in solo mode, every time you get a new transaction or block (and on a pool it's not much different), you will recompile the kernel? Doesn't look optimal to me.
|
|
|
|
sp_ (OP)
Legendary
Offline
Activity: 2926
Merit: 1087
Team Black developer
|
 |
March 14, 2016, 01:55:49 PM Last edit: March 14, 2016, 03:06:04 PM by sp_ |
|
Ok. I have found a way to do the optimal Decred kernel now.
http://stackoverflow.com/questions/15842507/passing-the-ptx-program-to-the-cuda-driver-directly
So I will generate the PTX assembly with the midstate data included in the instructions. Then every time the midstate changes, I recompile the kernel at runtime with the API calls described in the article. To estimate the speed gain you can replace all the constant memory accesses with constants in the 1.7.4 code. Release #4 will be optimal.
Interesting technique. But I doubt you'll gain even 1% from it, likely less.
You will, because some of the first rounds will be gone (instructions are removed since they work on constant data). You can try it: replace d_data[0]...d_data[23] with constant data, 0x01234567 etc.; make sure every constant is different from the others. Compile, read the PTX, and count the lines before and after. Then you don't have a 14-round Blake kernel, but a 12-round Blake kernel that only works for one midstate, and solves the 14-round Blake problem for that given midstate.
So, in solo mode, every time you get a new transaction or block (and on a pool it's not much different), you will recompile the kernel? Doesn't look optimal to me.
There is a faster way. Poke the new constants directly into the GPU binary (self-modifying code). Once the binary has been made, only 24 (+) constants need to be changed (on a new transaction or block); then the kernel needs to be reloaded to the GPU with a cache flush (CUDA device reset), or perhaps there is an API call that can load/reload a .cubin file directly.
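The patching step on its own can be sketched on a plain byte buffer. Everything here is hypothetical: the "image" is just a bytearray and not a real .cubin, and the assumptions that each constant is stored as a little-endian 32-bit word and can be located by searching for a distinct placeholder are mine and would have to be checked against a real binary. Loading the patched image back onto the GPU is not shown.

```python
import struct


def find_offsets(image, placeholders):
    """Locate each 32-bit placeholder; it must occur exactly once in the image."""
    offsets = []
    for value in placeholders:
        pattern = struct.pack("<I", value)
        first = image.find(pattern)
        if first < 0 or image.find(pattern, first + 1) >= 0:
            raise ValueError(f"placeholder {value:#010x} must occur exactly once")
        offsets.append(first)
    return offsets


def patch(image, offsets, new_values):
    """Overwrite the constants in place; the image length never changes."""
    for off, value in zip(offsets, new_values):
        struct.pack_into("<I", image, off, value)
```

The offsets only need to be found once per build; after that each new midstate is a handful of 4-byte writes, which is far cheaper than recompiling. The uniqueness check matters: if two placeholders were equal, or a placeholder also appeared elsewhere in the binary, the wrong bytes would get overwritten.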
|
|
|
|
|
joblo
Legendary
Offline
Activity: 1470
Merit: 1114
|
 |
March 14, 2016, 04:59:52 PM |
|
Nice hack. I've always had a soft spot for self modifying code. I once implemented a switch/case that way because there wasn't enough memory for a jump table. I didn't think it was still possible with modern cpus and all their protections.
|
|
|
|
bensam1231
Legendary
Offline
Activity: 1848
Merit: 1024
|
 |
March 14, 2016, 05:30:19 PM |
|
ethereum is much more profitable to mine so this is pointless, i can mine ethereum and buy more decred than mining decred
And BTC used to be profitable for GPUs to mine. Things change. We're in a huge profit bubble right now and that can pop at any moment and then all hell is going to break loose when all that Eth hash hits all the other GPU coins.
it does not work like that, they dump? i'm fine, diff will adjust = same profit as before
Oh yeah? I don't think it works the way you're thinking. Why do you think profitability will be the same if Eth loses market value? No other coin is nearly as profitable and Eth has hand over fist more hash than any other coin. If it starts to equalize the other coins can't support the amount of hash. As I mentioned before, GPU mining hash has grown about 30% in the last two weeks... Maybe closer to 50% as Eth has gained another 300Mh since then. This is completely putting aside that Eth can crash and it can go PoS, which means no more mining. They have talked about PoS already.
Decred has some pretty damned good profitability - it may not exceed Eth for all GPUs, but it comes fairly close.
Yeah, but quite fragile. Eth has a lot of hash and volume going for it. That isn't easily upset. If people from Eth all jumped on Decred it'd instantly bottom out.
also if ethereum goes PoS, another big coin will emerge, probably decred, so a pump there is not so unexpected in the near future. the money will always move one way or another and diff will follow
Investors don't just decide to invest in a new coin when one goes PoS in order to feed miners money. If Eth dies, either by bottoming out or PoS, miners are more than likely SoL. Decred and Vanilla are the next closest things. Before Eth it was Dash, and Dash has been private kernels/ASIC for quite some time... Three months ago we were making $.50 profit on a 970, today it's $6... This is definitely a high point and it shouldn't be expected to stay this way.
|
|
|
|
malekbaba
Legendary
Offline
Activity: 1526
Merit: 1026
|
 |
March 14, 2016, 09:16:13 PM |
|
In my opinion the performance of a 970 is equal to about 2.7x a GTX 750 Ti. Which would be the cleverer way to start: 1 GTX 970 or 3x 750 Ti? Some points:
1. If a GPU somehow dies, with the 970 you lose $350, but a single 750 Ti only costs $120.
2. In both cases the electricity bill will be almost the same.
3. For Eth, the 970 is the sole winner.
I am confused. Should I buy 1x 970 or 3x 750 Ti?
Also mention if you have another suggestion.
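The arithmetic behind the question can be written down quickly. The prices ($350 and $120) and the 2.7x performance ratio are the figures from this post; the function name and the idea of measuring performance in "750 Ti units" are just a convenience for the sketch, and power draw, resale value and Eth performance are left out.

```python
PRICE_970 = 350.0      # USD, figure from the post
PRICE_750TI = 120.0    # USD, figure from the post
PERF_970 = 2.7         # one 970 expressed in 750 Ti units, figure from the post


def cost_per_unit(total_price, total_perf):
    """Dollars paid per 750 Ti unit of performance."""
    return total_price / total_perf


# Option A: one GTX 970. Option B: three GTX 750 Ti.
option_a = cost_per_unit(PRICE_970, PERF_970)
option_b = cost_per_unit(3 * PRICE_750TI, 3.0)

# Money at risk if a single card dies.
loss_a = PRICE_970
loss_b = PRICE_750TI
```

On these numbers the three 750 Ti cards come out at $120 per unit against roughly $130 for the 970, and a single failure costs $120 instead of $350, so the 970 mainly wins on the Eth point.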
|
|
|
|
djm34
Legendary
Offline
Activity: 1400
Merit: 1050
|
 |
March 14, 2016, 09:52:40 PM |
|
In my opinion the performance of a 970 is equal to about 2.7x a GTX 750 Ti. Which would be the cleverer way to start: 1 GTX 970 or 3x 750 Ti? Some points:
1. If a GPU somehow dies, with the 970 you lose $350, but a single 750 Ti only costs $120.
2. In both cases the electricity bill will be almost the same.
3. For Eth, the 970 is the sole winner.
I am confused. Should I buy 1x 970 or 3x 750 Ti?
Also mention if you have another suggestion.
GPUs don't die like that unless you really don't take care of them, and you can always RMA them.
|
|
|
|
malekbaba
Legendary
Offline
Activity: 1526
Merit: 1026
|
 |
March 14, 2016, 10:07:21 PM |
|
I was sleeping while my 970 was mining. When I woke up, I found my PC in a comatose state: it was running, but there was no display and there was a burning smell coming from my cpu. I found some grease-like substance on the back end of my GPU, and that is how it died.
|
|
|
|
|