Bitcoin Forum

Bitcoin => Mining speculation => Topic started by: investorpgroovy on January 02, 2018, 02:18:58 AM



Title: SHA256d IC design question
Post by: investorpgroovy on January 02, 2018, 02:18:58 AM
For any ASIC/FPGA designers: I would love to have an offline discussion about this.

What is the possibility that someone would have verified Verilog/VHDL for a SHA256d ASIC from a current or defunct company (BF or BM 28nm, for example) that they would be willing to sell? Does anyone know if the 16nm and 10nm designs use the same logic as the 28nm? It seems to me that is the case.

Is anyone trying to complete an improved ASIC at this point?





Title: Re: SHA256d IC design question
Post by: the_electronrancher on January 02, 2018, 07:00:58 PM
Do you have the money to tape one out?

The entire field of asics has a very narrow bloodline starting with grandpa Icarus and his kids.  There are improvements along the way, but you can see his heritage in all the children once you get to know them a bit.

Most generational changes are stuffing more hashcores and doing manufacturability/convenience tweaks.  There's only so many ways to unroll a loop, but you need a couple experienced layout guys to place it - you wouldn't want to P&R the whole thing, it would perform poorly.


Title: Re: SHA256d IC design question
Post by: investorpgroovy on January 03, 2018, 06:45:35 AM
This is purely exploratory, but financing the tapeout/production is the easy part. It's the R&D that worries me.

It had occurred to me to start with Icarus, but in my experience adapting an FPGA design to an ASIC is pretty risky compared to starting with a verified design and just moving to a new process, especially with the time factor. Honestly, I don't know what IP was used in those FPGA boards that's not synthesizable or not available to license.

I am guessing from your comment that you already know what I am thinking. I figured there had been a number of improvements since Icarus to get the hashrate per watt up. My initial thought was that if I could get my hands on something already verified in a relatively recent iteration, it would de-risk a project like this.

That being said, I haven't reviewed the designs of any of the ASICs or FPGAs. If the IP is out there and it's just a matter of getting a few layout guys, well, that's pretty damn interesting.


Title: Re: SHA256d IC design question
Post by: the_electronrancher on January 03, 2018, 09:33:01 PM
If you have deep pockets, you can get chipworks to reverse any chip you want.  Start with a larger geometry to save yourself some bucks.

But adapting verilog from fpga to asic is not difficult.  I can assure you that many digital designs are prototyped in fpga and then synthesized into asic.

I would say that other than asicboost, the architecture changes have been small in the last few generations - it's an unrolled loop, pipelined to give one result per clock.  There's really only one answer there, I would expect that everyone's design is very similar for this important part.





Title: Re: SHA256d IC design question
Post by: investorpgroovy on January 04, 2018, 07:44:35 AM
Sure, everyone uses FPGAs for design.

To clarify a bit more: the issue I would worry about with going from an FPGA is that I don't know how it was designed. If we started with Icarus and, for example, they designed it using the "free" IP that Xilinx offers, you start adding to the development time, not to mention timing issues and whatnot.

The last project where I acquired an FPGA design with the plan to convert it quickly to an ASIC ended up being a nightmare: it went a year beyond schedule and had crazy licensing fees (it was a complex design with 4 ARM cores, 3 levels of on-die cache and so on). It was successful in the end, but it wasn't easy.

Here is a guide to the type of issues I am talking about: http://www.onsemi.com/pub/Collateral/HBD872-D.PDF

So, I am not a designer myself and I suspect these designs are very simple. Let me ask: are you basically saying these chips are so simple that I don't need to worry about CPU core/memory timing issues and 3rd-party licensing?


Title: Re: SHA256d IC design question
Post by: investorpgroovy on January 04, 2018, 07:50:37 AM
I mean I am just generally exploring to try and find the lowest cost and shortest time to market. I'm not a huge fan of hiring someone to reverse engineer something if the other option is just to start with Icarus.


Title: Re: SHA256d IC design question
Post by: the_electronrancher on January 04, 2018, 02:20:37 PM
I didn't say simple, I said there are not many unique ways to make an optimized sha core to give the ideal performance of one hash per clock once pipelined. In my opinion, only one.

You're probably going to want to license the pll from the foundry, but the logic would be built from gates.  Some company built a hashcore they licensed out back in the 0.13 days, not sure of the name but it doesn't seem worth the money.

Pm me the part number of that asic project that you mentioned, I'd like to take a look



Title: Re: SHA256d IC design question
Post by: Entropy-uc on January 04, 2018, 07:21:20 PM
I believe the Hashfast design ended up in the hands of their silicon integrator, which did the design work in the first place. Similarly, Terrahash went bankrupt, so their design IP ended up in play and is probably held by someone who assigns it a modest value.  BFL - I don't know what became of them at all, but they had a design.

The catch is that all of these designs were laid out using VHDL to standard cell libraries.

Bitfury clearly demonstrated that laying out at the transistor level gave a massive advantage in power efficiency.  His 55 nm chip performed better than the 28 nm generation.  It was also buggy as hell.

I don't think it's feasible to deliver a power competitive design with standard cells.  You will need to start with transistor level design of an unrolled hashing core.  From there, it's likely there are power optimizations that are possible.  The design will likely need to be optimized thermally as well, to limit hot spots.

Delivering a working SHA256 hash core isn't all that hard.  Being competitive from a power efficiency standpoint will be difficult.  I doubt it's practical to expect you will be within 20% of Bitmain on a given node until your 3rd or 4th generation.

Good luck, but I think you will find your money would be better invested convincing a major player like AMD or NVIDIA to develop a solution.


Title: Re: SHA256d IC design question
Post by: investorpgroovy on January 05, 2018, 03:47:53 AM
So I had assumed that all these guys were using standard IP libraries. If what you're saying is true, then it's obvious why no one has picked up the IP from a defunct firm and run with it.

Based on your comment I found a few details. Seems like you have a good point.

The BFL at 28nm was, I guess, 400 GH/s at 0.27 J/GH or 1600 GH/s at 0.76 J/GH, but the layout size was massive compared to Bitmain/Bitfury. It seems like in addition to using standard libraries they had a significantly different design methodology.

The Bitfury at 28nm was at around 0.2 J/GH and supposedly the 16nm is 0.1 J/GH.
The Bitmain 1385 is listed at 0.18 J/GH in 16nm at the slowest speed (21 GH/s).
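
As a quick sanity check on what those efficiency figures imply, here is a minimal sketch that converts them to power draw. J/GH multiplied by GH/s is simply J/s, i.e. watts. The function name and the labels are made up for illustration, and the inputs are the unverified numbers quoted above.

```python
# Power draw implied by a hashrate and an energy-per-hash figure.
# (J/GH) * (GH/s) = J/s = W

def power_watts(hashrate_ghs: float, joules_per_gh: float) -> float:
    """Return electrical power in watts for a given hashrate and efficiency."""
    return hashrate_ghs * joules_per_gh


# Figures as quoted in this post (forum numbers, not verified).
quoted = {
    "BFL 28nm, first figure": (400.0, 0.27),
    "BFL 28nm, second figure": (1600.0, 0.76),
    "Bitmain 1385, slowest speed": (21.0, 0.18),
}

if __name__ == "__main__":
    for name, (ghs, j_per_gh) in quoted.items():
        print(f"{name}: {power_watts(ghs, j_per_gh):.1f} W")
```

Taken at face value that is roughly 108 W and 1216 W for the two BFL figures and a bit under 4 W for a single Bitmain chip, so the comparison is between very different kinds of devices.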

Interestingly Global Foundries (formerly the AMD fab) fabbed the BFL device

I think Jensen (CEO) of Nvidia already has plans to build specialized mining "GPUS" for ether.


Title: Re: SHA256d IC design question
Post by: Entropy-uc on January 05, 2018, 05:57:37 AM
So far the publicly stated intention has been to offer mining GPUs that don't have video outputs.  The sole purpose is to prevent miners from dumping their used GPU gear onto the market when the inevitable crash comes and they can't mine profitably.

Global Foundries operates on a standard contract fab model so it's not really surprising that they built the BFL devices.

I don't think the transistor level design requirement is that big of a barrier.  Bitfury did it by himself on a kitchen table over the course of a year.  The problem is you won't find a design house willing to work that way.  They have their tool sets and their work flows and they aren't going to diverge from it.  So you will need to buy your own set of design tools and find a team of borderline Asperger's cases to do the transistor design.  

Somebody should really fund a Professor to do the design work under an open hardware license.  Once the transistor design for SHA256 is done you just have to bring that into the fab's design tools and optimize for placement.  Conductor losses are becoming dominant at these process nodes so that is where the biggest optimizations will be found.


Title: Re: SHA256d IC design question
Post by: the_electronrancher on January 05, 2018, 06:47:44 PM
Borderline aspergers, lol.

I'd like to learn a little more about this transistor level implementation, I'm having a hard time picturing what could reasonably be exploded or minimized in the hash core.  Xor?  It's just flops and wiring otherwise, I would be surprised if the flop was exploded, but maybe - if you have any links to check out, it would be an interesting read.


Title: Re: SHA256d IC design question
Post by: NotFuzzyWarm on January 05, 2018, 08:03:14 PM
It is mainly about efficient layout of the signal paths between the cores and comms. As the Cray supercomputers proved decades ago, using very short and direct pathways with minimal reliance on multiple layers has a very dramatic effect on speed and power consumption. Standard foundry IP blocks only care about function and not about optimum I/O speed between the blocks.


Title: Re: SHA256d IC design question
Post by: investorpgroovy on January 07, 2018, 07:28:02 AM
I would never settle for just borderline aspergers on my engineering teams when I can hire full on aspies instead.

My main objective was to figure out whether there was a way to get something to market quickly enough to challenge the incumbent players. The thing that bothers me specifically about Bitmain is that they mine unfairly, driving up difficulty, and then release the parts into the market.

I started out in DRAM, so I know that once you get to the point where you need to do transistor-level layout, it's a lot harder to jump into a new market essentially from scratch. It has become clear to me that there is no shortcut in terms of buying the IP from a defunct company, as everything has advanced so much since anyone with a reasonably fast chip has been in the market. A better focus would be "breaking" a different algo.


That being said, it's beyond my scope of knowledge, but I wonder if there is a fundamentally more efficient way to go about hashing.

Bitmain's BM1382 calculates 63 hashes per clock cycle and the BM1384 calculates 55 hashes per clock cycle.
BitFury's BF756C55 is claimed to have 756 cores for about 11.6 hashes per clock cycle.
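
To show how I am reading those numbers, here is a rough sketch of the arithmetic. The helper names and the 250 MHz clock are purely my own placeholders, not specs of any real chip.

```python
# Back-of-envelope relations between core count, hashes per clock and hashrate.

def clocks_per_hash_per_core(cores: int, hashes_per_clock: float) -> float:
    """How many clocks one core needs per hash, if all cores are identical."""
    return cores / hashes_per_clock


def hashrate_ghs(hashes_per_clock: float, clock_mhz: float) -> float:
    """Chip hashrate in GH/s for a given clock in MHz."""
    return hashes_per_clock * clock_mhz * 1e6 / 1e9


if __name__ == "__main__":
    # 756 cores delivering about 11.6 hashes per clock (figures quoted above)
    print(clocks_per_hash_per_core(756, 11.6))
    # hypothetical 250 MHz clock, chosen only to illustrate the conversion
    print(hashrate_ghs(11.6, 250.0))
```

For the BitFury figures this gives about 65 clocks per hash per core, which would suggest small cores that each take many clocks per hash, whereas the Bitmain numbers look more like a count of one-hash-per-clock engines. That is only my inference from the quoted figures.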


Title: Re: SHA256d IC design question
Post by: the_electronrancher on January 07, 2018, 04:10:39 PM
Hashes per cycle means number of hash cores.  Each core is an unrolled SHA engine, so you continuously feed data into the front end, finished hashes come out the back end, and you check each result to see what difficulty a particular nonce achieved.
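
In software terms, the check each core performs looks roughly like the sketch below. It uses Python's standard hashlib; the 76-byte dummy header, the easy target and the function names are assumptions for illustration, and a real chip works from a precomputed midstate rather than rehashing the whole header.

```python
import hashlib
import struct


def sha256d(data: bytes) -> bytes:
    """Double SHA-256: SHA-256 applied to the SHA-256 digest of the data."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def meets_target(header76: bytes, nonce: int, target: int) -> bool:
    """Append a 32-bit little-endian nonce and compare the hash to the target."""
    header = header76 + struct.pack("<I", nonce)
    # Bitcoin compares the digest as a little-endian 256-bit integer.
    return int.from_bytes(sha256d(header), "little") <= target


def scan(header76: bytes, target: int, start: int, count: int) -> list:
    """Return every nonce in [start, start + count) whose hash meets the target."""
    return [n for n in range(start, start + count)
            if meets_target(header76, n, target)]


if __name__ == "__main__":
    dummy_header = bytes(76)      # placeholder, not a real block header
    easy_target = 1 << 248        # roughly 1 in 256 hashes qualifies
    print(scan(dummy_header, easy_target, 0, 2000))
```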

There is delay in filling the engine, but once it's full they all give one hash per clock, as each clock starts a new hash on the front end and spits out a finished one on the back end.
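
A toy model of that fill delay and the steady one-result-per-clock behaviour, as a minimal sketch. It only moves tokens through the stages and does no hashing; the 128-stage depth in the demo is an assumption (two SHA-256 passes of 64 rounds, one round per stage) and real designs may differ.

```python
def run_pipeline(items, depth):
    """Push items through a `depth`-stage pipeline, one new item per clock.

    Returns a list of (clock, item) pairs recording when each item left
    the last stage. Items must not be None, which marks an empty stage.
    """
    pipe = [None] * depth
    feed = list(items)
    finished = []
    clock = 0
    while feed or any(slot is not None for slot in pipe):
        clock += 1
        leaving = pipe[-1]                       # value leaving the back end
        entering = feed.pop(0) if feed else None
        pipe = [entering] + pipe[:-1]            # everything shifts one stage
        if leaving is not None:
            finished.append((clock, leaving))
    return finished


if __name__ == "__main__":
    results = run_pipeline(range(1000), depth=128)
    first_clock = results[0][0]
    last_clock = results[-1][0]
    print("first result at clock", first_clock)
    print("results per clock after fill:",
          len(results) / (last_clock - first_clock + 1))
```

The first result appears only after the pipeline has filled; from then on exactly one finished item comes out per clock, regardless of depth.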

So you naturally want to stuff as many copies of the engine in as your little power supply lines can handle.  :)


Title: Re: SHA256d IC design question
Post by: Entropy-uc on January 07, 2018, 06:48:32 PM
I would never settle for just borderline aspergers on my engineering teams when I can hire full on aspies instead.



Actually you want engineers with a balance if it's any sort of a team.  Full-on cases will only be effective with a strong manager who can command respect on a technical level.  There's an amusing article on Medium from a few months back talking about an example of this; it was something like 'We fired our best programmer and it was the smartest thing I ever did'.

Semi design isn't my area of expertise.  But as an outsider I don't understand why it wouldn't be feasible to have the transistor-level layout done in a platform-independent way.  That would then allow design debug to be completed at a low-cost node for less than $1M; then you could focus on building at the expensive node with confidence there won't be a catastrophe.  If that approach is feasible I don't really see why the whole thing couldn't be done in an open source fashion.  An open-sourced transistor layout for SHA-256 would seriously break open the whole competitive oligopoly that exists now.  There are plenty of folks with double-digit millions from bitcoin at this point who would see the benefit, so fundraising should be feasible.

I don't think there's much promise in pursuing other algorithms.  Basically ether's algo is the only one without existing silicon and a chance to survive long term.  I guarantee you there are people working on it already.

 


Title: Re: SHA256d IC design question
Post by: the_electronrancher on January 07, 2018, 07:58:07 PM
Your idea about starting at a larger node is a good one, you would certainly want to debug on a cheap process.

 An open sourced transistor layout for SHA-256 would seriously break open the whole competitive oligopoly that exists now.  There are plenty of folks with double digits millions from bitcoin at this point that would see the benefit, so fund raising should be feasible.

This I think is the tough part.  Bitcoin has changed from a cool open-source environment to ultra-greed mode.  Those who have the ability to do this design certainly aren't going to want to do it for free and see some other Chinese or Russian shop take the design, kill them on manufacturing cost so the original project creators get pushed out of business, and then the takers become the next Bitmain on the originator's backs.



Title: Re: SHA256d IC design question
Post by: Entropy-uc on January 07, 2018, 08:34:58 PM
That's the whole point of doing the transistor design as open hardware.  You eliminate the biggest barrier to entry by putting the transistor layout into the public domain.  There still would only be a handful of folks that would go to masks at 10 nm or lower nodes, but they would be forced to keep pricing competitive because there are dozens of entities capable of entering the market.

I am sure you could find faculty that would find this a worthwhile project, and you could easily fund a few spins at an 8 inch 64 nm fab for under $1M.  That's 50 BTC.  I paid that much as a bounty to fix my FPGA supplier's garbage code back in 2012!


Title: Re: SHA256d IC design question
Post by: the_electronrancher on January 07, 2018, 11:34:29 PM
Well, if you still have 50BTC you want to throw at it I have a couple of layout guys who will moonlight doing it.  Do the first tapeout at MOSIS, then buy a mask set once it's verified.  At that point, if you want to open source the GDS you're free to do so.


Title: Re: SHA256d IC design question
Post by: Entropy-uc on January 08, 2018, 04:12:30 AM
It would take a lot more than that.  Form a 501(c), build a project and test plan and publish a budget.  Identify a qualified team that's committed to moving forward if the needed resources are available, and the key milestones where funds are needed.

With that it really wouldn't be hard to raise the funds.  Whether it's via angel investors for a start up, or a kickstarter approach with a public domain solution as the end point would be up to you.


Title: Re: SHA256d IC design question
Post by: the_electronrancher on January 08, 2018, 08:38:10 PM
Are you by any chance a marketing guy?


Title: Re: SHA256d IC design question
Post by: alh on January 08, 2018, 11:26:38 PM

That's the whole point of doing the transistor design as open hardware.  You eliminate the biggest barrier to entry by putting the transistor layout into the public domain.  There still would only be a handful of folks that would go to masks at 10 nm or lower nodes, but they would be forced to keep pricing competitive because there are dozens of entities capable of entering the market.

I am sure you could find faculty that would find this a worthwhile project, and you could easily fund a few spins at an 8 inch 64 nm fab for under $1M.  That's 50 BTC.  I paid that much as a bounty to fix my FPGA supplier's garbage code back in 2012!

I am not a semiconductor guy, but just my discussions with folks that are suggest that the "design rules" and associated tool chains change on a regular basis as the node size shrinks. What that means to me is that the "rules and tools"  for a 40nm process don't work for a 28nm process which don't work at 16nm. I think that means that your idea of developing with "cheap" process and then shrinking down won't work since the Fab for 16nm can't use "masks" from a 40nm process. Voltages are all wrong, leakage current and a whole host of things that don't manifest at 40nm become hugely important at 16nm. I expect the testing and packaging also changes.



Title: Re: SHA256d IC design question
Post by: Entropy-uc on January 09, 2018, 06:42:39 AM
I know that very well.

The arrangement of transistor gates required to implement a double SHA-256 hash would not change.  The implementation would change for the target process node and fab.

By demonstrating that your transistor level design is correct, the risk is dramatically reduced.  It's not zero because, as you say, the implementation at each node would require a unique design layout.

Would it really move the needle on the costs to implement a bitcoin hash chip?  I can't say for certain.  I can tell you that a surface discussion with somebody exposed to semi design won't give you a valid answer because they are thinking in terms of tool chains and standard cells that decouple you from the transistor level by several layers.  It simply isn't industry standard practice.  But results delivered by Bitfury make it clear it's the only way to be competitive in the crypto mining space.



Moderator's note: This post was edited by frodocooper to remove a nested quote.


Title: Re: SHA256d IC design question
Post by: alh on January 09, 2018, 08:15:00 AM
Are you by any chance a marketing guy?

No.

Ph.D in engineering.  I worked in process R&D for Intel for over a decade before I escaped.

Given your experience and education, why would you start asking questions here? I am lost as to what you are seeking from a truly random collection of folks here.


Title: Re: SHA256d IC design question
Post by: majlkcze on January 09, 2018, 08:31:39 AM
Interesting topic, guys. I'm still thinking about the things you are talking about here, and this should be possible.
I went through the whole process from FPGA to ASIC as a microelectronics student, from design to fabrication; yes, I was personally in the clean room holding the wafers.

The saddest thing is that it was 5+ years back, crypto was not so well known, and I had no clue what could be done in this area.

Now I'm still at the same faculty as a PhD student, and I think I can pull some strings and be helpful in this area.
If a group of members decides to try something, count me in.



Moderator's note: This post was edited by frodocooper to remove an unnecessary quote.


Title: Re: SHA256d IC design question
Post by: 2112 on January 10, 2018, 01:12:08 AM
The problem is you won't find a design house willing to work that way.  They have their tool sets and their work flows and they aren't going to diverge from it.  So you will need to buy your own set of design tools and find a team of borderline Asperger's cases to do the transistor design.  
This is where I disagree: people are just looking for the wrong kind of design house. They need to look for designers experienced and interested in mixed-signal and power-electronics designs. This is significantly different from the predominant industry practice in digital logic design.

Technically there are 3 main points where Bitcoin mining chip differs from the typical modern digital IC:

1) SHA-256D is practically a fully self-testing circuit
2) SHA-256D has very high signal toggle rate (0.5, only 3dB below the theoretical maximum of 1.0 for a ring oscillator)
3) there are practically no external design requirements (like timing closure) the chip is 100% limited to either thermal/power (when over-clocking) or self-switching-noise (when under-volting)
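
The 0.5 figure in point 2 can be sanity-checked in software, at least at the outputs. The sketch below hashes consecutive nonces and counts how many of the 256 digest bits flip from one hash to the next. It is only a proxy: it looks at the final digest, not at internal nodes of a circuit, and the all-zero 76-byte header and the function names are made up for illustration.

```python
import hashlib
import struct


def sha256d(data: bytes) -> bytes:
    """Double SHA-256 of the input bytes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def output_toggle_rate(samples: int = 2000) -> float:
    """Average fraction of digest bits that change between consecutive nonces."""
    header76 = bytes(76)          # placeholder header, not a real block
    previous = None
    flipped = 0
    for nonce in range(samples + 1):
        digest = int.from_bytes(
            sha256d(header76 + struct.pack("<I", nonce)), "big")
        if previous is not None:
            # XOR marks the bits that differ; count the ones.
            flipped += bin(previous ^ digest).count("1")
        previous = digest
    return flipped / (samples * 256)


if __name__ == "__main__":
    print(f"output toggle rate: {output_toggle_rate():.4f}")
```

The result should sit very close to 0.5, which is what you would expect from a hash whose output bits behave like independent coin flips.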

The standard tool chains used in digital logic design fail to produce efficient designs with the above requirements:

1) heuristic layout optimization algorithms fail to converge on a design where each bit of the output depends on each bit of the input, so the designers force round unrolling to achieve convergence
2) the methodology is mostly designed for timing closure or test-driven design, neither of which is a problem here
3) the approximations made by the toolchains are very inaccurate in the interesting problem space (very high toggle rate and no timing demand whatsoever)

The end result is that standard tools produce designs that are way too conservative in terms of individual reliability of gates and flip-flops: they are way too reliable at the local-logic level and then trade this off for noise tolerance on the very long and high fan-out interconnections.

Somebody should really fund a Professor to do the design work under an open hardware license.  Once the transistor design for SHA256 is done you just have to bring that into the fab's design tools and optimize for placement.  Conductor losses are becoming dominant at these process nodes so that is where the biggest optimizations will be found.
Well, I haven't spoken with a Professor recently, but I did in the past. From that experience I can surmise that work on a Bitcoin miner could be a career-limiting move for a scientist in the current prevailing climate at the engineering schools.

If you are going to ask around here are the two good questions to ask:

1) Why does nobody consider plain old serial adders/subtractors for the most common operation in SHA-256D? One can add two 32-bit numbers with a few XOR gates and a D flip-flop with an absolute minimum of power spent. It will just take 32 clocks, but who cares? Why such an obsession with parallel adders and complex carry-look-ahead logic when there's no real timing constraint?

2) Why does nobody consider the old trick of using differential logic (like ECL vs TTL) when the signal toggle rate is at 50%? Complementary logic gives great power savings provided that the toggle rate is much lower, the closer to zero the better. That is of no benefit here whatsoever.
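
For anyone unfamiliar with the serial adder in question 1, here is a minimal behavioural sketch: one full adder plus a single stored carry bit, processing one bit position per clock, least significant bit first. It is a software model only, the function name is made up, and note that the carry logic needs an AND/OR pair in addition to the XORs.

```python
def serial_add32(a: int, b: int):
    """Add two 32-bit values bit-serially, LSB first.

    Returns (sum modulo 2**32, number of clocks used).
    """
    carry = 0                      # the state held in the single D flip-flop
    result = 0
    for clock in range(32):
        x = (a >> clock) & 1
        y = (b >> clock) & 1
        s = x ^ y ^ carry                       # sum bit: two XOR gates
        carry = (x & y) | (carry & (x ^ y))     # next carry, latched per clock
        result |= s << clock
    return result, 32


if __name__ == "__main__":
    total, clocks = serial_add32(0xFFFFFFFF, 1)
    print(hex(total), clocks)
```

The final carry is simply dropped, which matches the modulo 2**32 additions used throughout SHA-256.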

So if you guys are looking for either commercial design houses or semiconductor design professors avoid the mainstream. In addition to the two categories above I could also suggest asking about past experience with GaAs or other exotic processes, which exercised less explored corners of the design mind-scape.

Remember, BitFury's original design may have been done on a kitchen table or in a garage, but they did not unroll and decisively beat all the experienced CAD-monkeys despite using much older fabrication process.



Title: Re: SHA256d IC design question
Post by: 2112 on January 10, 2018, 08:11:53 PM
I'd like to learn a little more about this transistor level implementation, I'm having a hard time picturing what could reasonably be exploded or minimized in the hash core.  Xor?  It's just flops and wiring otherwise, I would be surprised if the flop was exploded, but maybe - if you have any links to check out, it would be an interesting read.
Here's the example of what can be optimized with the transistor-level knowledge.

SHA-256 has 64 rounds which, when unrolled, produce values that once computed have to be used in 16 different places (fanout of 16). For this example let's simplify and assume that there are only 2 inputs and 6 outputs.
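Code:
-- one adder drives all six destinations (fanout of 6)
s <= x + y;
a <= s;
b <= s;
c <= s;
d <= s;
e <= s;
f <= s;
This can be optimized to:
Code:
-- two copies of the adder, each placed near its three destinations (fanout of 3)
s0 <= x + y;
a <= s0;
b <= s0;
c <= s0;
s1 <= x + y;
d <= s1;
e <= s1;
f <= s1;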
The optimization is that the same value is computed twice, but in different physical locations on the die, so the signal needs shorter routes from the source to the destination. Here's a more in-depth explanation:

https://en.wikipedia.org/wiki/FO4
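
To put rough numbers on the fanout side of this, here is a small sketch using the textbook logical-effort delay model d = g*h + p (in units of tau), with g = p = 1 for an inverter-like driver. Those parameter values are the usual normalisation and an assumption here, not data from any real process, and the model ignores wire capacitance entirely, which is the other half of the argument.

```python
# Logical-effort style delay estimate, normalised to FO4.

P_PARASITIC = 1.0   # assumed parasitic delay of the driver, in tau
G_EFFORT = 1.0      # assumed logical effort of the driver


def gate_delay_tau(fanout: float) -> float:
    """Delay of one driver stage in units of tau for a given fanout."""
    return G_EFFORT * fanout + P_PARASITIC


FO4_TAU = gate_delay_tau(4)   # the fanout-of-4 reference delay


def delay_in_fo4(fanout: float) -> float:
    """Same delay expressed in FO4 units."""
    return gate_delay_tau(fanout) / FO4_TAU


if __name__ == "__main__":
    for fanout in (3, 6, 8, 16):
        print(f"fanout {fanout:2d}: {delay_in_fo4(fanout):.2f} FO4")
```

Under these assumptions a driver with fanout 6 costs 1.4 FO4 while two drivers with fanout 3 cost 0.8 FO4 each, at the price of a second adder and extra load on x and y.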

Note that the above optimization is the opposite of the ASICBOOST "optimization".

We know that recent Bitmain chips have the capability to work both in the regular way and boosted with ASICBOOST (with a theoretical maximum of about 25% savings). We also know that when used in the boosted configuration they need to be clocked much lower and have lower overall performance (and probably a lower yield of chips that can work in boosted mode).

If Bitmain were capable of accurately simulating their chips they wouldn't waste their resources on that exercise, because 25% is lower than the normal manufacturing tolerances on the process nodes they were using. Transistor-level simulation is nowadays more accurate than the manufacturing variance, and one can actually simulate the performance at the various process corners.

From the above we can deduce that they don't have any sort of transistor-level design; they just use standard cells and sandbag the design with wide safety margins. That is the same thing that KnCminer did years ago.

The other possibility is that Bitmain did implement their chips dual-capable (both boosted and un-boosted) for some non-technical, political or personal reasons. But that would mean that their chips are even less optimized than they could be without wasting space on the unused boosting logic.


Title: Re: SHA256d IC design question
Post by: 2112 on January 10, 2018, 08:31:13 PM
Your idea about starting at a larger node is a good one, you would certainly want to debug on a cheap process.
There's nothing to debug at the transistor level that is process-independent. In fact, even the transistor model changed from BSIM3 to BSIM4-family when you move from cheap to expensive processes.

The general topology of the models is already well known and open sourced:

http://bsim.berkeley.edu/models/

What is secret? The parameter values of those models. And even if you use MOSIS/Europractice or similar program you won't be able to publish those secret values. Without those you can't optimize in any sensible way beyond "sandbag the hell out of it and keep your fingers crossed". KnC did that already.


Title: Re: SHA256d IC design question
Post by: QuintLeo on January 12, 2018, 09:57:16 PM

Global Foundries operates on a standard contract fab model so it's not really surprising that they built the BFL devices.


 Partly true - they do have some extensive contracts with IBM and AMD dating back to the "fab spinoff" days and amended/updated every so often that lock up a lot of their capacity if IBM or AMD wants that capacity.

 The contract fab model applies to whatever is "left over".





Title: Re: SHA256d IC design question
Post by: NODEhaven on March 09, 2018, 11:46:27 PM
Your idea about starting at a larger node is a good one, you would certainly want to debug on a cheap process.
There's nothing to debug at the transistor level that is process-independent. In fact, even the transistor model changed from BSIM3 to BSIM4-family when you move from cheap to expensive processes.

The general topology of the models is already well known and open sourced:

http://bsim.berkeley.edu/models/

What is secret? The parameter values of those models. And even if you use MOSIS/Europractice or similar program you won't be able to publish those secret values. Without those you can't optimize in any sensible way beyond "sandbag the hell out of it and keep your fingers crossed". KnC did that already.


This is by far one of the better threads I have come across on Bitcointalk.

If it's not too much to ask, could you describe a little how KnC "sandbagged" the design, and why they didn't use Europractice?


Title: Re: SHA256d IC design question
Post by: 2112 on March 16, 2018, 04:17:53 PM

"Sandbagging" means that they used quite large factors of safety in their design ( https://en.wikipedia.org/wiki/Factor_of_safety describes it for mechanical/structural designs ). E.g. if the design tool came up with an N um wide power rail, they actually drew the power rail as S*N where S > 1. If their simulation computed that the maximum clock speed would be F MHz, they used D*F (where D < 1) in their published specification.

One of their executives enumerated their multiple layers of safety margins in the video they published upon initial release of their miners. Maybe somebody archived it somewhere in the KnC thread?
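
A tiny worked example of how such margins apply, as a sketch. The numbers (a 2.0 um rail, 500 MHz simulated maximum, S = 1.5, D = 0.8) and the function name are invented purely for illustration and have nothing to do with KnC's actual values.

```python
def sandbag(rail_width_um: float, fmax_mhz: float, s: float, d: float):
    """Apply a widening factor S > 1 and a clock derating factor 0 < D < 1.

    Returns (drawn rail width in um, published clock in MHz).
    """
    if s <= 1.0:
        raise ValueError("S must be greater than 1")
    if not 0.0 < d < 1.0:
        raise ValueError("D must be between 0 and 1")
    return rail_width_um * s, fmax_mhz * d


if __name__ == "__main__":
    drawn, published = sandbag(2.0, 500.0, s=1.5, d=0.8)
    print(f"drawn rail: {drawn:.1f} um, published clock: {published:.0f} MHz")
```

With those made-up inputs the rail is drawn 50% wider than the tool asked for and the published clock is 20% below the simulated maximum. Stack a few such margins and the area and performance cost adds up quickly.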

Europractice access is limited to educational/research/non-profit institutions. KnC was from the beginning a funded for-profit corporation. On the other hand Bitfury (person) initially developed his chip in cooperation with some Polish research institute before founding Bitfury (corporation).

I keep mentioning Europractice/MOSIS in threads like this because it is an obvious and effective way of saving money in the initial stages of a design. Lots of folks keep mentioning multi-million dollar initial costs of developing mining ASICs. But this is quite obviously not true if somebody knows how to use the educational discounts and how to deal with the associated limitations on merchantability.


Title: Re: SHA256d IC design question
Post by: NODEhaven on March 21, 2018, 02:01:35 AM
...

How does the overt ASICboost that Halong is implementing affect the logic on the chip?



Moderator's note: This post was edited by frodocooper to trim the quote from 2112.


Title: Re: SHA256d IC design question
Post by: 2112 on March 21, 2018, 05:12:56 PM
How does the overt ASICboost that Halong is implementing affect the logic on the chip?
I don't think that there's any non-bullshit information available publicly about Halong chips, so I'll refrain from making comments.


Title: Re: SHA256d IC design question
Post by: NODEhaven on March 24, 2018, 07:10:53 PM
"sandbagging" means that they used quite large factors of safety in their design ( https://en.wikipedia.org/wiki/Factor_of_safety describes it for mechanical/structural designs ). E.g. if the design tool came up with an N um wide power rail, they actually drew the power rail as S*N where S > 1. If their simulation computed that the maximum clock speed would be F MHz, they used D*F (where D < 1) in their published specification.

One of their executives enumerated their multiple layers of safety margins in the video they published upon initial release of their miners. Maybe somebody archived it somewhere in the KnC thread?

Europractice access is limited to educational/research/non-profit institutions. KnC from the beginning was a funded for-profit corporation. On the other hand Bitfury (person) initially developed his chip in cooperation with some Polish research institute before founding the Bitfury (corporation).

I keep mentioning Europractice/Mosis in threads like this because it is an obvious and effective way of saving money in the initial stages of a design. Lots of folks keep mentioning multi-million dollar initial costs of developing the mining ASICs. But this is quite obviously not true if somebody knows how to use the educational discounts and how to deal with associated limitations on merchantability.

I will check the KncMiner thread and post a link if I can find it.

Also, that's pretty genius of Bitfury.  I know of a few professors at University of Houston that are interested in developing some FPGAs for crypto-currency.  Also, in college I took full advantage of those types of licenses.

Right now I am looking into on-chip temperature sensors and voltage regulation to use in a feedback loop. If feasible, that may require outside IP, which would require a license.  Those licenses may rule out the non-profit approach.

Something I pulled up after a quick search.  It has digital output.  Not sure if that is an issue and how to calibrate it.
https://www.design-reuse.com/sip/temperature-sensor-series-6-with-digital-output-tsmc-7nm-ff-high-accuracy-thermal-sensing-for-reliability-and-optimisation-ip-43229/?login=1



Moderator's note: This post was edited by frodocooper to remove a nested quote.


Title: Re: SHA256d IC design question
Post by: 2112 on March 25, 2018, 12:37:12 AM
Right now I am looking into on-chip temperature sensors and voltage regulation to use in a feedback loop. If feasible, that may require outside IP, which would require a license.  Those licenses may rule out the non-profit approach.

Something I pulled up after a quick search.  It has digital output.  Not sure if that is an issue and how to calibrate it.
https://www.design-reuse.com/sip/temperature-sensor-series-6-with-digital-output-tsmc-7nm-ff-high-accuracy-thermal-sensing-for-reliability-and-optimisation-ip-43229/?login=1
Don't make the mistake of putting nontrivial control logic onto the same chip as the mining circuitry. In case of failure you won't be able to distinguish between a real fault and a bogus fault induced by the noise and/or heat from the mining logic. By definition the mining logic has to work at the edge of starvation or hyperthermia death, otherwise it is operating far from optimal.

Helveticoin did something like you are thinking (including an on-die ARM controller) and it was completely non-competitive. It had to be severely underclocked to maintain the reliability of the controlling SoC.

Spondoolies included on-die power-on-self-test and then had to create software workarounds for mining engines that fail the POST but operate correctly after a warm-up. Some desperadoes resorted to preheating their miners with a hair dryer.

You'll be much better off with just temperature-sensing diodes or averaging multiple low-accuracy temperature sensors located in far-away corners of the die.
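
A rough sketch of why averaging several low-accuracy sensors can be good enough. It assumes each sensor's error is independent Gaussian noise around one true die temperature, which is a simplification (a real die has gradients, and corner sensors will not see the hottest spot). The sigma, sensor count and temperature are hypothetical numbers.

```python
import random
import statistics


def die_temperature_estimate(readings_c):
    """Average several coarse sensor readings into one die-level estimate."""
    return statistics.fmean(readings_c)


def simulate(true_temp_c=85.0, sensor_sigma_c=4.0, n_sensors=4,
             trials=2000, seed=1):
    """Return (mean abs error of one sensor, mean abs error of the average).

    Assumption: sensor errors are independent and Gaussian.
    """
    rng = random.Random(seed)
    single_err, averaged_err = [], []
    for _ in range(trials):
        readings = [rng.gauss(true_temp_c, sensor_sigma_c)
                    for _ in range(n_sensors)]
        single_err.append(abs(readings[0] - true_temp_c))
        averaged_err.append(
            abs(die_temperature_estimate(readings) - true_temp_c))
    return statistics.fmean(single_err), statistics.fmean(averaged_err)


single, averaged = simulate()
print(f"one sensor: {single:.2f} C error, average of four: {averaged:.2f} C error")
```

Under that independence assumption the error of the average shrinks roughly with the square root of the number of sensors, so four cheap sensors behave about like one sensor that is twice as accurate.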


Title: Re: SHA256d IC design question
Post by: NODEhaven on March 26, 2018, 04:09:02 AM
In a rather striking coincidence, I may not be the only one looking at this exact same sensor at 7nm.  I contacted Moortec to learn a little moor about their sensor.

http://www.moortec.com/blog/2018/03/05/moortec-providers-of-in-chip-monitoring-pvt-subsystems-solutions-are-pleased-to-announce-that-canaan-creative-have-employed-moortecs-in-chip-monitoring-subsystem-their-hpc-ic

http://www.moortec.com/blog/tag/7nm



Moderator's note: This post was edited by frodocooper to correct erroneous URL formatting.


Title: Re: SHA256d IC design question
Post by: HyperMega on March 26, 2018, 03:39:46 PM
How does the overt ASICboost that Halong is implementing affect the logic on the chip?

Please have a look at page 8 of the original ASICboost white paper:
https://arxiv.org/ftp/arxiv/papers/1604/1604.00575.pdf

There is a Duo-Core ASICboost implementation shown. In case you would operate such a Duo-Core in a non-ASICboost mode (at the same clock frequency), you would run at 50% of the ASICboost performance, because only one of the two cores can operate in non-ASICboost mode.

Ck said in another thread, that the Halong miner is at 25% of its performance in a non-ASICboost mode. Because of that I would assume, that they implemented a Quad-Core, which requires about 18.75% less silicon area (leakage power)/logic toggling (dynamic power) compared to 4 non-ASICboost cores.
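
The arithmetic behind the 50%, 25% and 18.75% figures can be sketched as below. It assumes, following the white paper's idealized gate count, that about a quarter of a single core's logic can be shared among the cores of a multi-core; the function and parameter names are mine.

```python
def asicboost_estimates(n_cores, shared_fraction=0.25):
    """Idealized ASICboost figures for an n-core implementation.

    shared_fraction: part of one core's logic that is built once and reused
    by all cores (assumed to be about 25%).
    Returns (fraction of logic saved vs. n independent cores,
             relative performance when run in non-ASICboost mode).
    """
    # n independent cores cost n units; the multi-core builds the shared
    # part once, so it saves (n - 1) copies of shared_fraction.
    saving = shared_fraction * (n_cores - 1) / n_cores
    # Without ASICboost only one of the n cores does useful work.
    fallback_performance = 1 / n_cores
    return saving, fallback_performance


print(asicboost_estimates(2))  # (0.125, 0.5)   duo-core
print(asicboost_estimates(4))  # (0.1875, 0.25) quad-core
```

These are counts of ideal logic only; they say nothing about wiring, fan-out or timing in a real layout.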



Moderator's note: This post was edited by frodocooper to trim the quote from NODEhaven.


Title: Re: SHA256d IC design question
Post by: 2112 on March 26, 2018, 05:27:02 PM
Please have a look at page 8 of the original ASICboost white paper:
https://arxiv.org/ftp/arxiv/papers/1604/1604.00575.pdf

Ck said in another thread, that the Halong miner is at 25% of its performance in a non-ASICboost mode. Because of that I would assume, that they implemented a Quad-Core, which requires about 18.75% less silicon area (leakage power)/logic toggling (dynamic power) compared to 4 non-ASICboost cores.
All the numbers in that paper are theoretical values assuming infinite speed of light and counting of ideal logic gates with no parasitic impedances, infinite input impedance and zero output impedance.

That has no bearing on any actual implementation in any realistic logic circuit technology. In particular even non-ASIC-boosted but unrolled SHA256 has the same values used in 16 different places. This implies a https://en.wikipedia.org/wiki/Fan-out of 16 when nearly all CMOS processes are optimized for a fan-out of 4 https://en.wikipedia.org/wiki/FO4 .

The FO4 argument probably explains why that chip is built with fixed 4-way ASICboost.
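
To put a number on the fan-out point: if no gate may drive more than 4 loads, a signal needed in 16 places has to go through a tree of buffers. The sketch below only counts ideal tree stages; it ignores wire capacitance, gate sizing and placement, and the function name is mine.

```python
def fanout_stages(loads, max_fanout=4):
    """Number of driver stages needed so no driver sees more than max_fanout loads.

    1 means the source can drive all loads directly; every extra stage is one
    more level of buffers (and one more gate delay) between source and loads.
    """
    stages = 1
    capacity = max_fanout          # loads reachable with the current stages
    while capacity < loads:
        capacity *= max_fanout     # one more buffer level multiplies the reach
        stages += 1
    return stages


print(fanout_stages(4))    # 1: driven directly
print(fanout_stages(16))   # 2: one extra level of buffers
```

So a fan-out of 16 costs at least one additional buffer delay on every such net compared with the ideal gate count.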


Title: Re: SHA256d IC design question
Post by: HyperMega on March 26, 2018, 07:15:38 PM
All the numbers in that paper are theoretical values assuming infinite speed of light and counting of ideal logic gates with no parasitic impedances, infinite input impedance and zero output impedance.

That has no bearing on any actual implementation in any realistic logic circuit technology. In particular even non-ASIC-boosted but unrolled SHA256 has the same values used in 16 different places. This implies a https://en.wikipedia.org/wiki/Fan-out of 16 when nearly all CMOS processes are optimized for a fan-out of 4 https://en.wikipedia.org/wiki/FO4 .

The FO4 argument probably explains why that chip is built with fixed 4-way ASICboost.

These numbers are not based on completely ideal assumptions. They are based on the fact that the part of the pipeline whose outputs could be reused by other cores accounts for about 25% of the overall core logic of a single core.

Ok, you are right, the FO/load cap of the reused bits is increased by feeding multiple cores. But the reused outputs are only 32 bits in contrast to a 512 bit wide pipeline without increased FO, implemented only once.

So the gain of an ASICboost duo-core in terms of power efficiency will be a bit less than 12.5%, but not much.



Moderator's note: This post was edited by frodocooper to remove a nested quote.


Title: Re: SHA256d IC design question
Post by: 2112 on March 26, 2018, 08:13:09 PM
These numbers are not based on completely ideal assumptions. They are based on the fact that the part of the pipeline whose outputs could be reused by other cores accounts for about 25% of the overall core logic of a single core.

Ok, you are right, the FO/load cap of the reused bits is increased by feeding multiple cores. But the reused outputs are only 32 bits in contrast to a 512 bit wide pipeline without increased FO, implemented only once.
I haven't read the full patent application, but I understand how they are written with a goal of withstanding claim/counter-claim adversarial legal system in the USA and other anglophone countries. So I can confidently repeat: you are wrong, these numbers intentionally use idealized, abstract algebraic models to make a strong patent application. The whitepaper is just a marketing brief for the patent. This isn't a scientific report in the applied science field.

In the next paragraph you use the term "512-bit wide pipeline". This is just such nice marketing speak. SHA256 is actually a 16-stage 32-bit wide shift register with some fancy feedback terms. The re-invention of it as a 16*32=512 bit vector pipeline is nothing more than a workaround for the bugs/design flaws in the front-end Verilog tools preferred on the West Coast of the USA. If the design was done in VHDL (as preferred by East Coast USA boutiques) there would be no need for that trick of making 32-bit slices out of a 512-bit vector.
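
The shift-register view can be checked in software, at least for the message schedule. The sketch below is plain Python, not hardware: `schedule` keeps only sixteen 32-bit words in a deque and feeds one new word back per step, and the result is compared against `hashlib`. The round constants are derived with integer roots instead of being typed in; all helper names are mine.

```python
import hashlib
import math
import struct
from collections import deque

MASK = 0xFFFFFFFF


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def _primes(count):
    found, candidate = [], 2
    while len(found) < count:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found


def _icbrt(n):
    """Exact integer cube root (floor) by binary search."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 2)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


# First 32 fractional bits of the square/cube roots of the first primes.
H0 = [math.isqrt(p << 64) & MASK for p in _primes(8)]
K = [_icbrt(p << 96) & MASK for p in _primes(64)]


def schedule(block):
    """Yield the 64 schedule words from a 16-stage, 32-bit wide shift register."""
    reg = deque(struct.unpack(">16I", block), maxlen=16)
    for _ in range(64):
        yield reg[0]
        s0 = rotr(reg[1], 7) ^ rotr(reg[1], 18) ^ (reg[1] >> 3)
        s1 = rotr(reg[14], 17) ^ rotr(reg[14], 19) ^ (reg[14] >> 10)
        # Feedback term: appending shifts the register and drops the oldest word.
        reg.append((reg[0] + s0 + reg[9] + s1) & MASK)


def compress(state, block):
    a, b, c, d, e, f, g, h = state
    for k, w in zip(K, schedule(block)):
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
              + ((e & f) ^ (~e & g)) + k + w) & MASK
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]


def sha256(message):
    padded = message + b"\x80"
    padded += b"\x00" * (-(len(padded) + 8) % 64)
    padded += struct.pack(">Q", 8 * len(message))
    state = list(H0)
    for i in range(0, len(padded), 64):
        state = compress(state, padded[i:i + 64])
    return struct.pack(">8I", *state)


print(sha256(b"abc") == hashlib.sha256(b"abc").digest())
```

In an unrolled hardware engine each of the 64 steps gets its own copy of this logic, which is where the "512-bit pipeline" picture comes from.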

No matter which front-end was used, the actual physical layout is very far from the neatness associated with the word "pipeline" and how e.g. AMD/Intel use it in their marketing literature and die photos.

The physical layout of an unrolled mining engine designed this way very much resembles a snake pit like the one in my avatar. That happens because the heuristic layout optimization tools cannot find any useful gradient to optimize for, and either fail to converge or converge extremely slowly, resulting in a semi-random rat's nest of long traces.

So the gain of an ASICboost duo-core in terms of power efficiency will be a bit less than 12.5%, but not much.
I cut this paragraph into a separate quote because it is a beautiful sample of USDA prime marketing baloney.

Firstly, duo-core was just a sample in the whitepaper; Halong's implementation is quad-core. So it is 18.75%, not 12.5%.

Secondly, you use values of "bit" much less than 2. Such a nice English creative writing trick. How do your values of "bit" compare with manufacturing tolerances, which are about +/-20%?

Thirdly, it is not just about (A) reduction of power use. You neglected to mention:

B) lower clock speed, because a nearly four times larger area needs to be kept in lockstep;
C) lower yield, because the area of mutually dependent logic is increased nearly four-fold.
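
Point C can be illustrated with the textbook Poisson yield model, Y = exp(-A * D0). That model is my assumption here, not something from the whitepaper or from any foundry data, and the area and defect density below are hypothetical.

```python
import math


def poisson_yield(area_mm2, defects_per_mm2):
    """First-order Poisson yield model: probability of zero defects in the area."""
    return math.exp(-area_mm2 * defects_per_mm2)


engine_area_mm2 = 0.5        # hypothetical area of one independent engine
defect_density = 0.1         # hypothetical defects per mm^2

single = poisson_yield(engine_area_mm2, defect_density)
# Four mutually dependent engines: one defect anywhere kills all four.
locked_quad = poisson_yield(4 * engine_area_mm2, defect_density)

print(f"independent engine yield: {single:.3f}")
print(f"4x lockstep block yield:  {locked_quad:.3f}")
```

In this model quadrupling the mutually dependent area raises the single-engine yield to the fourth power, and a failure now takes out four engines' worth of hashrate instead of one.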

It is quite an achievement in marketing to squeeze 3-way deception into a single sentence. You must be a professional.

Finally, whatever one can say about Bitmain's chip that is ASIC-boost capable, at least it is somewhat honest in implementing switchable levels of ASIC-boost. One could actually measure the actual gains or losses from various levels of boosting and compare them with the table of theoretical values. It isn't as perfect an experiment as designing separate chips for each level of boosting, but a better scientific compromise.

All I can guess about Halong's chip is that its design was worked out as some sort of political compromise or attack/defense strategy. I'm definitely not up to speed on the factions currently involved in the Bitcoin internecine warfare.


Title: Re: SHA256d IC design question
Post by: HyperMega on March 27, 2018, 03:43:39 PM

It is quite an achievement in marketing to squeeze 3-way deception into a single sentence. You must be a professional.

Finally, whatever one can say about Bitmain's chip that is ASIC-boost capable, at least it is somewhat honest in implementing switchable levels of ASIC-boost. One could actually measure the actual gains or losses from various levels of boosting and compare them with the table of theoretical values. It isn't as perfect an experiment as designing separate chips for each level of boosting, but a better scientific compromise.

All I can guess about Halong's chip is that its design was worked out as some sort of political compromise or attack/defense strategy. I'm definitely not up to speed on the factions currently involved in the Bitcoin internecine warfare.


A professional marketing guy? No, I’m not that kind of professional.  :)

It always takes me a while to extract the useful information from your posts, but believe me, I finally agree with you, sometimes.
 
Yes, having the ability to switch ASICboost on/off (as Bitmain did) gives you a chance to compare the two modes. It would even be possible to do a complete power shut-off of the unused backup logic in ASICboost mode, to avoid the leakage of these logic parts. But it still consumes silicon area, which increases your production costs in terms of $/GH.

Halong has chosen a very aggressive way to implement ASICboost without any backup logic for a non-ASICboost mode. In this way they have enabled the full potential of ASICboost in terms of J/GH and $/GH. I wouldn’t dare something like that without the support of parts of the community (e.g. Slush). The risk of falling down to only 25% of the maximum performance would be much too high, in case no pool would support rolling versions.
So yes, I agree, it was “a sort of political compromise or attack/defense strategy”.


Title: Re: SHA256d IC design question
Post by: 2112 on March 27, 2018, 05:17:55 PM
A professional marketing guy? No, I’m not that kind of professional.  :)
But you are some sort of an insider, with access to the information about the most current tools and processes. Why don't you use it to produce some useful information, even if you don't have funding for the fully separate mask set production?

Why don't you actually run some representative simulations, even some reduced-rounds non-compatible version of SHA-256?

Why can't you squeeze a single Bitcoin mining engine into paid-for but left unused free space on some unrelated project taping out? The same way that Helveticoin did, and in the spirit of the long history of placing various more-or-less usable Easter Eggs into established silicon products?

It would even be possible to do a complete power shut-off of the unused backup logic in ASICboost mode, to avoid the leakage of these logic parts. But it still consumes silicon area, which increases your production costs in terms of $/GH.

Halong has chosen a very aggressive way to implement ASICboost without any backup logic for a non-ASICboost mode. In this way they have enabled the full potential of ASICboost in terms of J/GH and $/GH.
Can you make an educated speculation as to what would be the underlying business strategy?

What is the technical merit of multiplying the complexity of the engine several times in exchange for the gains lower than the regular manufacturing tolerances?

Lots of ASICs get designed purely for non-technical reasons: copy protection, hiding of patent or license violation in a way that is extremely hard to reverse-engineer and litigate, etc.

I think there should be some constructive speculation that you could post without violating your NDAs, don't you think? Or maybe everyone at your company already knows that HyperMega is a pseudonym of their 1st VP of Sales, and everyone there already watches your back?


Title: Re: SHA256d IC design question
Post by: NODEhaven on March 28, 2018, 02:51:12 PM
Can you make an educated speculation as to what would be the underlying business strategy?

What is the technical merit of multiplying the complexity of the engine several times in exchange for the gains lower than the regular manufacturing tolerances?

Lots of ASICs get designed purely for non-technical reasons: copy protection, hiding of patent or license violation in a way that is extremely hard to reverse-engineer and litigate, etc.

I think there should be some constructive speculation that you could post without violating your NDAs, don't you think? Or maybe everyone at your company already knows that HyperMega is a pseudonym of their 1st VP of Sales, and everyone there already watches your back?

I think the answer is simpler.  The originators of the project, if not engineers, may have seen an "easy" answer with ASICboost.  If the BDPL was always planned by Halong then that also answers the question, as it requires everyone using ASICboost to release all patents and purchase rights for any IP that is licensed from third-parties for everyone in the BDPL.

https://blockchaindpl.org/licensev10

As far as optimizations of layout go, I am looking at some different methodologies.

This one is interesting.  Not sure if their exact approach applies, but this 2017 paper shows power and efficiency improvements on the order of 300% for a modelled cryptographic implementation using asynchronous clocks.

https://www.sciencedirect.com/science/article/pii/S2090123217301170





Title: Re: SHA256d IC design question
Post by: 2112 on March 30, 2018, 02:41:02 AM
This one is interesting.  Not sure if their exact approach applies, but this 2017 paper shows power and efficiency improvements on the order of 300% for a modelled cryptographic implementation using asynchronous clocks.

https://www.sciencedirect.com/science/article/pii/S2090123217301170
This paper is about a way of implementing GALS (Globally Asynchronous Locally Synchronous) logic. IMO it won't help at all for SHA256D. It could probably work for something like scrypt() which is a sandwich of two layers of PBKDF2 with Salsa20 in the middle. Fixed length SHA-256 like in Bitcoin is too trivial to be susceptible to such advanced optimizations.
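
To show how fixed the workload is, here is a small sketch of SHA256d on an 80-byte block header using `hashlib`. The all-zero header is a placeholder, not a real block, and `compression_calls` is a helper of mine that just counts padded 64-byte blocks.

```python
import hashlib


def sha256d(data):
    """Double SHA-256, as used for Bitcoin block header hashing."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def compression_calls(message_len):
    """64-byte blocks SHA-256 processes for a message of this length.

    Padding adds one 0x80 byte and an 8-byte length, rounded up to 64 bytes.
    """
    return (message_len + 1 + 8 + 63) // 64


header = bytes(80)                       # placeholder 80-byte header
digest = sha256d(header)

first_pass = compression_calls(len(header))   # 80-byte header
second_pass = compression_calls(32)           # 32-byte digest of the first pass
print(len(digest), first_pass, second_pass)
```

Every nonce attempt is the same three fixed-size compression calls, and the first of them does not depend on the nonce, which sits in the last 16 bytes of the header. There is no data-dependent control flow or memory traffic for an asynchronous scheme to exploit.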