rocks
Legendary
Offline
Activity: 1153
Merit: 1000
|
|
August 20, 2013, 12:28:33 AM |
|
Yes, there was a significant increase in hash rate for FPGAs with SHA256, but this is a result of being able to fully unroll the SHA256 core. It then computes one hash per clock, so MHz = MHash/s. This is algorithmically impossible for scrypt, as the algorithm was specifically designed to resist that. In most linear algorithms you have two options: gain speed at the cost of more resources, or save resources and accept a lower computation rate. If one side decreases, the other increases, and vice versa. Not so for scrypt: both sides increase nearly equally. Here you can see a scrypt demonstration on FPGA with hash rates of ~2 kh/s (you need to be very experienced to make it twice that speed!): https://github.com/kramble/FPGA-Litecoin-Miner
Thank you hf_developer, this was very helpful. I was not aware of this effort and look forward to reading and understanding the code better. The design only used the on-chip FPGA RAM of an LX150, which is fairly limited. With many FPGAs you can have multiple 64-bit external memory ports that all run at full speed; for example, 4 ports * 64 bits/port * 200 MHz in an optimized design yields 6.4 GBytes/sec of memory bandwidth. Even a basic LX150 includes integrated memory controller blocks for DDR1-DDR3 memories at up to 12.8 Gb/s peak bandwidth (from the spec sheet). This chip may or may not be optimal for scrypt; other chips offer higher maximum bandwidth. I think the main point is that memory bandwidth shouldn't be a bottleneck if done right.
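The port arithmetic in this post (4 ports * 64 bits * 200 MHz) can be checked with a small C sketch. The function name is hypothetical, and the sketch assumes one full-width transfer per port per clock cycle, which is an idealised figure that real DDR controllers will not sustain.

```c
#include <stdint.h>

/*
 * Aggregate peak bandwidth, in bytes per second, of several identical
 * memory ports. Assumption (not from the thread): every port completes
 * one full-width transfer on every clock cycle.
 */
uint64_t aggregate_bytes_per_sec(uint32_t ports, uint32_t bits_per_port,
                                 uint64_t clock_hz)
{
    uint64_t bits_per_sec = (uint64_t)ports * bits_per_port * clock_hz;
    return bits_per_sec / 8u; /* 8 bits per byte */
}
```

Calling `aggregate_bytes_per_sec(4, 64, 200000000)` returns 6400000000, which matches the 6.4 GBytes/sec quoted in the post.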
|
|
|
|
yohan (OP)
|
|
August 20, 2013, 09:07:25 AM |
|
On Litecoin we do have a memory add-on for CM1 going into manufacture. I don't want anyone to think we have Litecoin mining running on CM1 now. We don't, and won't for at least some time to come given our current workload. This new module will simply be available to anyone who wants to try to implement it on a CM1. On the big boards we will come back to Litecoin at some point, but no promises on when that might be.
I _might_ be interested in one such CM1 board with memory add-ons in order to try and hack a solution together in my spare time (which is very little these days between work and babies, so it is doubtful I would have enough time). A couple of questions:
1) Have you done any sizing to estimate/optimize both the amount of memory and the number of memory ports that would be optimal for scrypt and CM1? Otherwise the board configuration is a stab in the dark.
2) Would such a board come with the necessary IP blocks needed to interface with the add-on memory ports during Verilog development? This would make development easier and is usually provided with development boards.
3) If someone did successfully develop a scrypt miner, would you offer any business terms to make it available for others?
We don't think the memory solution for CM1 will be the most optimal Litecoin solution possible with FPGAs, and interconnect bandwidth may or may not be somewhat of a limitation. There is a local FPGA with the memory that may help in this solution. Whilst we think CM1s will remain profitable (earning more than the electricity cost) in Bitcoin mining for some time to come yet, we would like to see the many CM1 owners have a way forward with their boards into Litecoin. We don't have them yet in final form, but all the elements for accessing the memory have been done before by the team, so I would anticipate we could get that knocked together into a workable form some time in September.
Someone made the comment that GPUs are ASICs. If you are working on that basis, FPGAs are also ASICs. However, it is worth saying that FPGAs can often compete with, and often beat, GPUs, albeit usually with a different structural approach. Outside of maybe a few people already working on FPGA Litecoin solutions, I doubt anyone actually has the knowledge to say which will be better, GPU or FPGA, at the moment.
Our point here is that there are an awful lot of CM1s out there with potentially nothing to do in maybe 6-12 months' time, when we foresee Bitcoin being no longer profitable, and provided CM1s operate at a profit there is no reason not to do Litecoin at whatever level a CM1 can achieve. I will also say that anyone making a serious attempt at a Litecoin solution should talk to us, partly so we know what is happening, but also so that we can offer some limited help, maybe technical, or even some form of financial reward for a solution. The latter might be a royalty (say, based on memory module sales), a straight payment in some form, or even a bounty. We have not discussed this or set anything up yet, so I can't say more at this point about what we might do.
|
|
|
|
Milan77
|
|
August 20, 2013, 09:21:27 AM |
|
There is an easier way to keep FPGA boards in the game: just write a SHA1 bruteforcer for them. It will keep them in play for a few more years!
|
|
|
|
hf_developer
Member
Offline
Activity: 66
Merit: 10
|
|
August 20, 2013, 05:40:06 PM |
|
With many FPGAs you can have multiple 64-bit external memory ports that all run at full speed; for example, 4 ports * 64 bits/port * 200 MHz in an optimized design yields 6.4 GBytes/sec of memory bandwidth. Even a basic LX150 includes integrated memory controller blocks for DDR1-DDR3 memories at up to 12.8 Gb/s peak bandwidth (from the spec sheet). This chip may or may not be optimal for scrypt; other chips offer higher maximum bandwidth. I think the main point is that memory bandwidth shouldn't be a bottleneck if done right.
Here is one portion of the scrypt core:
for (i = 0; i < 1024; i += 2) {
    memcpy(&V[i * 32], X, 128);
    salsa20_8(&X[0], &X[16]);
    salsa20_8(&X[16], &X[0]);
    memcpy(&V[(i + 1) * 32], X, 128);
    salsa20_8(&X[0], &X[16]);
    salsa20_8(&X[16], &X[0]);
}
As you can see, you have to memcpy 128 x 8 bit = 1024 bit. If you have those 64-bit RAM buses, you will need 16 of them to make this operation work in one cycle. This has to be done 2 times per loop, looping 1024 times (in the full scrypt algorithm).
You cannot hard-unroll this loop as you can with SHA256. You see, even if you had RAM buses of 1024 bit, you need at least 2048 RAM operations per scrypt. Assume your hardware runs at 500 MHz. Divide this by 2048. Even then you cannot get higher rates than 250 kh/s. (...and you will never see an FPGA with 1024-bit RAM buses)
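As a rough check of the bound argued in this post, here is a minimal C sketch. The function names are hypothetical, and it takes over the post's assumptions: each RAM operation moves one full 128-byte block, and each operation costs exactly one clock cycle.

```c
#include <stdint.h>

/*
 * Number of memory buses of width bus_bits needed to move one scrypt
 * block of block_bytes in a single cycle (rounded up).
 */
uint32_t buses_per_block(uint32_t block_bytes, uint32_t bus_bits)
{
    uint32_t block_bits = block_bytes * 8u;
    return (block_bits + bus_bits - 1u) / bus_bits;
}

/*
 * Upper bound on hashes per second for one scrypt core, assuming
 * (as in the post) one RAM operation per clock cycle and nothing else
 * limiting throughput.
 */
uint64_t max_hashes_per_sec(uint64_t clock_hz, uint32_t ram_ops_per_hash)
{
    return clock_hz / ram_ops_per_hash;
}
```

With the figures from the post, `buses_per_block(128, 64)` returns 16 and `max_hashes_per_sec(500000000, 2048)` returns 244140, a little under the 250 kh/s ceiling quoted.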
|
|
|
|
yohan (OP)
|
|
August 21, 2013, 09:23:03 AM |
|
You see, even if you had RAM buses of 1024 bit, you need at least 2048 RAM operations per scrypt. Assume your hardware runs at 500 MHz. Divide this by 2048. Even then you cannot get higher rates than 250 kh/s. (...and you will never see an FPGA with 1024-bit RAM buses)
Actually, indirectly, we have already done a 1024-bit memory interface in our HPC product Merrick4, which has 16 GB of local DDR3. What is different here is that this is done with 16 S6 FPGAs working together. The cost base is also expensive, before anyone asks. Once we have more time we will look at the viability of doing Litecoin on all of our HPC products. There are some better products than Merrick4 in the pipeline that have much more memory bandwidth and will trash GPUs in many applications, and quite possibly Litecoin too.
|
|
|
|
rocks
|
|
August 21, 2013, 06:14:56 PM |
|
Here is one portion of the scrypt core:
for (i = 0; i < 1024; i += 2) {
    memcpy(&V[i * 32], X, 128);
    salsa20_8(&X[0], &X[16]);
    salsa20_8(&X[16], &X[0]);
    memcpy(&V[(i + 1) * 32], X, 128);
    salsa20_8(&X[0], &X[16]);
    salsa20_8(&X[16], &X[0]);
}
As you can see, you have to memcpy 128 x 8 bit = 1024 bit. If you have those 64-bit RAM buses, you will need 16 of them to make this operation work in one cycle. This has to be done 2 times per loop, looping 1024 times (in the full scrypt algorithm). You cannot hard-unroll this loop as you can with SHA256. You see, even if you had RAM buses of 1024 bit, you need at least 2048 RAM operations per scrypt. Assume your hardware runs at 500 MHz. Divide this by 2048. Even then you cannot get higher rates than 250 kh/s. (...and you will never see an FPGA with 1024-bit RAM buses)
So essentially you either need about 2 Mb of memory traffic per hash (that is, 2 Mb/s of bandwidth for every hash/s), or ~128 KB of on-chip memory per hashing core. This means that an FPGA with 2 Gb/s of total memory bandwidth and perfect pipelining would only achieve about 1 kHash/s. OK, the difficulty is clear, thanks.
This also means a GPU card achieving ~250 kHash/s has over 500 Gb/s of usable memory bandwidth, which is very impressive. It also looks like the scrypt parameters Litecoin chose are tuned to take full advantage of common high-end GPU characteristics and no more, with a balanced ratio of GPU cores to bandwidth per core. If Litecoin had selected slightly larger parameters, it seems likely that GPUs would be much less efficient and unable to utilize all their available cores, but as it is, GPU bandwidth is just able to feed all the GPU cores...
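The bandwidth figures in this reply follow from the 2048 block transfers per hash mentioned in the quote. A minimal C sketch of that arithmetic is below; the function and macro names are hypothetical, and it assumes no time-memory trade-off, i.e. every 128-byte scratchpad entry is written once and read once per hash.

```c
#include <stdint.h>

/* Scrypt parameters as discussed in the thread: 1024 entries of 128 bytes. */
#define SCRYPT_N           1024u
#define SCRYPT_BLOCK_BYTES 128u

/* Scratchpad size per hashing core, in bytes. */
uint64_t scratchpad_bytes(void)
{
    return (uint64_t)SCRYPT_N * SCRYPT_BLOCK_BYTES;
}

/* Memory traffic per hash, in bits: N block writes plus N block reads. */
uint64_t traffic_bits_per_hash(void)
{
    return 2u * scratchpad_bytes() * 8u;
}

/* Hash rate sustainable from a given memory bandwidth (bits per second). */
uint64_t hashes_per_sec_at(uint64_t bandwidth_bits_per_sec)
{
    return bandwidth_bits_per_sec / traffic_bits_per_hash();
}

/* Memory bandwidth (bits per second) implied by a given hash rate. */
uint64_t bandwidth_bits_for(uint64_t hashes_per_sec)
{
    return hashes_per_sec * traffic_bits_per_hash();
}
```

Here `scratchpad_bytes()` is 131072 (128 KB), `traffic_bits_per_hash()` is 2097152 (about 2 Mb), `hashes_per_sec_at(2000000000)` is 953 (roughly 1 kHash/s), and `bandwidth_bits_for(250000)` is 524288000000 (a little over 500 Gb/s).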
|
|
|
|
yohan (OP)
|
|
September 18, 2013, 09:26:03 AM |
|
One of our test boards with the new concept Cairnsmore4 and Controller1 module fitted. This board supports 16 Clusters of up to 9 Bitfury ASICs. We will talk more about the spec and pricing when we are happy with the firmware/software, thermal solution and are ready to ship. Meanwhile enjoy.
|
|
|
|
shapemaker
Full Member
Offline
Activity: 238
Merit: 100
I run Linux on my abacus.
|
|
September 18, 2013, 11:47:23 AM Last edit: September 18, 2013, 12:35:12 PM by shapemaker |
|
One of our test boards with the new concept Cairnsmore4 and Controller1 module fitted. This board supports 16 Clusters of up to 9 Bitfury ASICs. We will talk more about the spec and pricing when we are happy with the firmware/software, thermal solution and are ready to ship. Meanwhile enjoy.
So it has 144 BF ASICs. At 22,5 EUR per chip, the chip cost alone is 3240 euros without bulk discounts. If you manage to get 4 GHash/s from each chip (as burnin already has), we're looking at 576 GH/s. At maybe 4 W per chip, the full unit would use around 600 watts of power. If you manage to price that competitively, I'm sure you will have sales. The pricing is what will decide whether people want it or not. Time to market is essential at the moment, though, so don't take too long.
edit: If you manage to deliver in October, that unit should be able to mine 50-75 BTC between October and May, depending, of course, on how harshly the difficulty rises in the next year. That leaves some wiggle room in pricing, so if we deduct the chip price, we're looking at maybe 3000-3500 euros ROI. Now you just need to decide how much you want from that 3k euros and how much the customer should get.
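The per-board estimates in this post are straightforward multiplications, sketched here in C. The struct and function names are hypothetical; the per-chip price, hash rate, and power draw are the poster's guesses rather than published specifications. Prices are kept in euro cents to avoid floating-point rounding.

```c
#include <stdint.h>

/* Totals for a fully populated board. */
typedef struct {
    uint32_t chips;               /* number of ASICs on the board */
    uint32_t chip_cost_eur_cents; /* chip cost only, no bulk discount */
    uint32_t total_ghs;           /* aggregate hash rate in GH/s */
    uint32_t total_watts;         /* chip power only, excluding losses */
} board_estimate;

board_estimate estimate_board(uint32_t clusters, uint32_t chips_per_cluster,
                              uint32_t chip_price_eur_cents,
                              uint32_t ghs_per_chip, uint32_t watts_per_chip)
{
    board_estimate e;
    e.chips = clusters * chips_per_cluster;
    e.chip_cost_eur_cents = e.chips * chip_price_eur_cents;
    e.total_ghs = e.chips * ghs_per_chip;
    e.total_watts = e.chips * watts_per_chip;
    return e;
}
```

`estimate_board(16, 9, 2250, 4, 4)` gives 144 chips, 324000 cents (3240 EUR), 576 GH/s, and 576 W, which the post rounds to about 600 W.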
|
Shut up and give me money: 115UAYWLPTcRQ2hrT7VNo84SSFE5nT5ozo
|
|
|
joeventura
|
|
September 18, 2013, 12:13:40 PM |
|
I predict BTC68 will be the cost
$14 USD a GH
|
|
|
|
markm
Legendary
Offline
Activity: 3024
Merit: 1121
|
|
September 18, 2013, 12:17:40 PM |
|
So a loss of eighteen to minus-two bitcoins, then? Seems like averaging across that range you're more likely to make a loss than a gain...
-MarkM-
|
|
|
|
shapemaker
|
|
September 18, 2013, 12:27:13 PM |
|
I predict BTC68 will be the cost
$14 USD a GH
At that price there wouldn't be much point in buying. I'd guess 45-50 BTC would be a decent spot.
|
|
|
|
eve
|
|
September 18, 2013, 02:32:12 PM |
|
I predict BTC68 will be the cost
$14 USD a GH
At that price there wouldn't be much point in buying. I'd guess 45-50 BTC would be a decent spot.
20-30 BTC will be more attractive.
|
|
|
|
CryptoCluster
Member
Offline
Activity: 84
Merit: 10
|
|
September 18, 2013, 02:36:41 PM |
|
I predict BTC68 will be the cost
$14 USD a GH
At that price there wouldn't be much point in buying. I'd guess 45-50 BTC would be a decent spot.
20-30 BTC will be more attractive.
And 0.5-1 BTC even more attractive.
|
"The cumulative development of a medium of exchange on the free market — is the only way money can become established. ... government is powerless to create money for the economy; it can only be developed by the processes of the free market." M. N. Rothbard
|
|
|
rocks
|
|
September 18, 2013, 05:41:46 PM |
|
Competitive pricing is by far the most important issue here.
I (and most people I suspect) just want a simple board populated with chips. In other words the exact same model as the Cairnsmore1 FPGA boards, where most of the cost was the FPGAs themselves and everything else was minimized.
My issue with the Cairnsmore2-3 proposals was that all of the expensive racking and other 'engineering' raised the cost well above the price of the base chips.
If this project and its pricing go the same way as the Cairnsmore1 FPGA boards, which maximized the chip share of the BOM vs. everything else, then I would be interested.
|
|
|
|
shapemaker
|
|
September 18, 2013, 07:15:48 PM |
|
Competitive pricing is by far the most important issue here.
I (and most people I suspect) just want a simple board populated with chips. In other words the exact same model as the Cairnsmore1 FPGA boards, where most of the cost was the FPGAs themselves and everything else was minimized.
My issue with the Cairnsmore2-3 proposals was that all of the expensive racking and other 'engineering' raised the cost well above the price of the base chips.
If this project and its pricing go the same way as the Cairnsmore1 FPGA boards, which maximized the chip share of the BOM vs. everything else, then I would be interested.
Agreed. There are quite a few BF-based products coming very soon now. The primary differentiating factors now are a) how cheaply a complete product can be made, and b) how much juice one can get out of the chips. Performance ties neatly into a), since if you can minimize the number of chips used while still getting a decent hash rate, you can push the price lower than the competition. I would say simplicity is key at this point, not fancy-pants features that just increase the failure rate. The chip is an interesting one, since it is very power efficient and apparently can be pushed to at least 4 GH/s. I wonder how much more one could get by just die shrinking...
|
|
|
|
spiccioli
Legendary
Offline
Activity: 1379
Merit: 1003
nec sine labore
|
|
September 18, 2013, 08:00:32 PM |
|
Competitive pricing is by far the most important issue here.
Chips are 22,5 EUR each (plus VAT if you're an end user).
The chip price is insane; those chips should cost less than one EUR each to make (maybe even less).
Enterpoint had a good product with the CM1s, and the first units were priced right at 520 EUR each + VAT or thereabouts.
If CM4s end up costing more than 50-55 BTC it will be very difficult to break even, and if they are priced cheaper they'll end up selling a ton, which makes breaking even more difficult as well.
spiccioli
|
|
|
|
rocks
|
|
September 18, 2013, 08:27:33 PM |
|
Chips are 22,5 EUR each (plus VAT if you're an end user).
Chip price is insane, those chips should cost less than one EUR each to make (maybe even less).
Agreed, these chips are currently priced at a steep markup over their actual cost. Most of this is due to the fact that the chip designers need to recoup their development and NRE costs, plus make a profit. When mining reaches break-even with limited ROI, purchases of new chips/rigs will slow or stop. This happened in the GPU/FPGA era. When it happens again, all of these chip vendors will be forced by competition to drop prices much closer to their actual manufacturing cost to keep making some sort of profit, and pricing will become more stable and reasonable.
|
|
|
|
bit_wizard
|
|
September 18, 2013, 08:49:36 PM |
|
Any word on the CM3?
|
|
|
|
yohan (OP)
|
|
September 18, 2013, 09:03:17 PM |
|
Any word on the CM3?
We are working on CM3 in parallel to CM4 and we have development prototypes that look a bit like the picture of the CM4 where we are working on 1 cluster of chips. A lot of the software control functions are actually common with CM4 and I expect both products will be ready in a similar timeframe.
|
|
|
|
spiccioli
|
|
September 18, 2013, 09:04:38 PM |
|
Chips are 22,5 EUR each (plus VAT if you're an end user).
Chip price is insane, those chips should cost less than one EUR each to make (maybe even less).
Agreed, these chips are currently priced at a steep markup over their actual cost. Most of this is due to the fact that the chip designers need to recoup their development and NRE costs, plus make a profit.
I think they're priced so high because chip designers need to keep competition low; they're all setting up private pools, which is where their earnings will come from.
Avalon batch #1 units were priced at 1500 USD, and each unit uses 240 chips and has a full aluminum case and heatsinks weighing 20 kg, plus a decent PSU.
At 22 EUR per chip, an Avalon would have had to cost 6500 USD in chips alone... they did try to price batch 3 this way, but I think in the end they found that mining with the units was better than selling them, because selling units increases difficulty for everyone.
spiccioli
|
|
|
|
|