Bitcoin Forum
April 25, 2024, 02:23:14 AM *
Warning: One or more bitcointalk.org users have reported that they strongly believe that the creator of this topic is a scammer. (Login to see the detailed trust ratings.) While the bitcointalk.org administration does not verify such claims, you should proceed with extreme caution.
Author Topic: [Announcement] Avalon ASIC Development Status [Batch #1]  (Read 155269 times)
PuertoLibre
Legendary
*
Offline Offline

Activity: 1834
Merit: 1003


View Profile
December 16, 2012, 01:28:24 PM
 #401

I seem to remember that the 7.5*7.5mm number for BFL was the size of the package and not the die size. I am not sure the die size was ever revealed, but it must be much smaller to fit into the package.

As I recall, the BFL package size was 11mm*11mm.

How would one organise 88 chips? Would it be a good idea to put them all on one PCB, or stack PCBs with 22 or 44 chips?
I doubt there are 88 chips. But if there were, the slightest overclocking of that group of chips would incur one hell of a performance gain. (And lots of extra electrical use)
mrb
Legendary
*
Offline Offline

Activity: 1512
Merit: 1027


View Profile WWW
December 16, 2012, 08:11:35 PM
 #402

But if there were, the slightest overclocking of that group of chips would incur one hell of a performance gain. (And lots of extra electrical use)

This is nonsense. Overclocking by x% always brings a constant x% performance gain, whether it is 88 small chips or, say, 4 large chips.
PuertoLibre
Legendary
*
Offline Offline

Activity: 1834
Merit: 1003


View Profile
December 16, 2012, 09:06:17 PM
 #403

But if there were, the slightest overclocking of that group of chips would incur one hell of a performance gain. (And lots of extra electrical use)

This is nonsense. Overclocking by x% always brings a constant x% performance gain, whether it is 88 small chips or, say, 4 large chips.
Not exactly true, there are interconnect/bus issues and firmware issues that might stop that from being true. If you have overclocked a normal CPU you know that something other than the main chip itself might limit a decent performance gain. Chips are usually a part of a system and not a standalone device.

Talking in terms of ASICs being overclocked, it depends on a number of design decisions. Overclocking 8 massive chips from 60 to 120 Gh/s is not "exactly" the same deal as overclocking 88 chips that subdivide the work.

As long as you are dividing the heat load across a wider array of chips with more surface area [88 for example] and as long as your cooling is sufficient for all 88 (and you have enough space for all the chips), no single die should experience the same heat load as 1 in a group of 8. You can double the clock on a group of 88, but the heat is shared across a wider area.
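
A rough sketch of that heat-sharing argument. It assumes the total power splits evenly across dies and scales linearly with clock at a fixed voltage; in practice an overclock often needs more voltage, so real power tends to rise faster than this. The 400 W figure and the function name are made up for illustration.

```python
def per_die_power(total_power_w: float, n_dies: int, clock_scale: float = 1.0) -> float:
    """Heat each die must shed, assuming an even split and power linear in clock."""
    return total_power_w * clock_scale / n_dies


# Hypothetical 400 W system with the clock doubled, spread over 8 or 88 dies.
for n_dies in (8, 88):
    watts = per_die_power(400.0, n_dies, clock_scale=2.0)
    print(f"{n_dies} dies: {watts:.1f} W per die")
```

The total heat is the same in both cases; only the load each package and heatsink has to handle changes.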

In modern computing the idea is to create hyper efficient chips at a decent clock rate and stack them in as tiny a package as you possibly can. In fact, these days most CPU vendors are trying to compact as many cores as possible into one socket.

AMD has 32 per socket as an experimental design while Intel is aiming for 50.

----------------

In the ASICs coming from the vendors, depending on the design decisions being made, you don't have to go with that logic. You can spread it out into clusters/modules with their own heatsink (like bASIC did).

Anyway, overclocking is much more than just changing the rate of the clock if you are designing the hardware. Perhaps Avalon has gone with the "shot gun" approach where the chips are all very inefficient but they make up the difference by:

1) Perhaps in their simplicity. (reliable, easy to produce dies?)
2) Perhaps by being so tiny a lot of them can be packaged together like a mini rig?

I dunno. But there is more than one way to build a system. As long as you change the principles of the design enough that it makes practical sense.

BFL went with the idea of creating dense "Full custom" chips with high performance in a low nm process. But they are no "Intel" or "AMD". God knows how many failures they might face per wafer if their fab bakes the chips just slightly off.

Intel and AMD have their fabs set up to try tons of different combinations in one go. As the fab proceeds they get good data on what worked well and what worked terribly as the layers are checked and baked. Therefore the first chips out of their fabs are usually the worst, while the last runs are their best and most efficient chips (and highly overclockable).

etc...

mrb
Legendary
*
Offline Offline

Activity: 1512
Merit: 1027


View Profile WWW
December 16, 2012, 09:26:24 PM
 #404

But if there were, the slightest overclocking of that group of chips would incur one hell of a performance gain. (And lots of extra electrical use)
This is nonsense. Overclocking by x% always brings a constant x% performance gain, whether it is 88 small chips or, say, 4 large chips.
Not exactly true, there are interconnect/bus issues and firmware issues that might stop that from being true.

In the context of Bitcoin mining, an x% overclock does bring an x% performance gain, because mining is an embarrassingly parallel workload that requires very little bandwidth, so it is trivial to design the interconnect so as to not make it a bottleneck. This specific argument is certainly not going to explain why 88 chips would perform better than 4 large chips when overclocked (if the interconnect were even an issue, it would be more of a problem for 88 chips than for 4 chips).
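
A minimal sketch of that scaling claim, assuming the hashrate is simply chips x hashes per clock x clock frequency with no interconnect bottleneck. The chip counts, per-clock throughput and base clock below are hypothetical.

```python
def system_hashrate(n_chips: int, hashes_per_clock: float, clock_hz: float) -> float:
    """Total hashes per second for independent chips (no shared bottleneck)."""
    return n_chips * hashes_per_clock * clock_hz


def overclock_gain_pct(n_chips: int, hashes_per_clock: float,
                       clock_hz: float, overclock_pct: float) -> float:
    """Percentage hashrate gain from raising the clock by overclock_pct."""
    base = system_hashrate(n_chips, hashes_per_clock, clock_hz)
    boosted = system_hashrate(n_chips, hashes_per_clock,
                              clock_hz * (1 + overclock_pct / 100))
    return (boosted / base - 1) * 100


# 88 small chips vs 4 large chips, both overclocked by 10%.
print(overclock_gain_pct(88, 1, 300e6, 10))
print(overclock_gain_pct(4, 22, 300e6, 10))
```

The chip count cancels out of the ratio, so both configurations see the same relative gain.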

As long as you are dividing the heat load across a wider array of chips with more surface area [88 for example] and as long as your cooling is sufficient for all 88 (and you have enough space for all the chips), no single die should experience the same heat load as 1 in a group of 8. You can double the clock on a group of 88, but the heat is shared across a wider area.

Then you should have said "it is easier to overclock 88 small chips than 4 large chips" (which I agree with). Your sentence "the slightest overclocking of that group of chips would incur one hell of a performance gain" does not convey this idea at all. You need to communicate your ideas more clearly if you want to be understood.
PuertoLibre
Legendary
*
Offline Offline

Activity: 1834
Merit: 1003


View Profile
December 16, 2012, 11:58:27 PM
 #405

@ Mrb

I'll try better next time.
hardcore-fs
Full Member
***
Offline Offline

Activity: 196
Merit: 100


View Profile WWW
December 17, 2012, 12:55:05 AM
 #406

 "because mining is an embarrassingly parallel workload that requires very little bandwidth, so it is trivial to design the interconnect so as to not make it a bottleneck."

Sorry I would have to disagree with this, if you take a look at some of the RTL floating about,  a solution is provided every clock cycle.
That solution has to be tested and extracted, therefore the more engines you have working on solutions, the higher the probability of generating multiple nonces during the same clock phase that satisfy the rules you are looking for.

Let's say, for the sake of argument, that you have 4 cores running independently at 100 MHz (100 MH/s) and all four cores produce a solution at the same time (rare, but it can happen).
The internal silicon must then be capable of dealing with those 4 results during the same clock cycle. How are you going to do that? Run the combiner logic at 4x the system clock, so that the 4 results are processed within a single 100 MHz clock cycle, as 4 cycles at 400 MHz?
Yes, you could split the design into groups of two engines and process the results in parallel at 200 MHz, but eventually it all has to be combined to get it out of the chip. Now multiply that by the number of cores some of these designs are running (6?).
Or are we just going to "pretend" there was only 1 result and discard the other solutions?

Then you have to FIFO all this crap so that you can get it out of the chip, so the more cores you have on the chip, the more problems you have as regards raw silicon design, that is before you even think about HOW you are going to get work into the chip.

For interest, take a look at one of the ASICs floating about; they have given a proposed pinout showing 8 data lines and some strobes.
WTF.... even the nonce will require 4 CLK cycles just to get it out of the chip, and they are claiming this design is good into the GH/s range?
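
The arithmetic behind those figures, as a small sketch. It assumes a 32-bit nonce, one bit per data line per clock, and a combiner that handles one result per cycle; the function names are only for illustration.

```python
import math


def transfer_cycles(result_bits: int, data_lines: int) -> int:
    """Clock cycles needed to shift one result out, one bit per line per cycle."""
    return math.ceil(result_bits / data_lines)


def combiner_clock_hz(core_clock_hz: float, simultaneous_results: int) -> float:
    """Clock a one-result-per-cycle combiner needs to drain all results in one core cycle."""
    return core_clock_hz * simultaneous_results


print(transfer_cycles(32, 8))        # 32-bit nonce over 8 data lines
print(combiner_clock_hz(100e6, 4))   # 4 cores at 100 MHz, all hitting at once
```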



BTC:1PCTzvkZUFuUF7DA6aMEVjBUUp35wN5JtF
mem
Hero Member
*****
Offline Offline

Activity: 644
Merit: 501


Herp Derp PTY LTD


View Profile
December 17, 2012, 01:16:25 AM
 #407

It's not unlikely at all... depending on power factors, we estimate between 64 and 128 chips per device.  88 is an odd number (not impossible of course), but given the propensity for the Avalon team to like nice round figures, I suspect a multiple of 8.  My thoughts are they are going to have to go with more than 64 chips to overcome the particular issues they haven't run into quite yet.



Why don't you spend less time "guessing" what your competition is doing, what stage they are at and how far behind you they are, and spend more time answering the multitude of questions people have about your shitty company?

mrb
Legendary
*
Offline Offline

Activity: 1512
Merit: 1027


View Profile WWW
December 17, 2012, 01:52:07 AM
 #408

"because mining is an embarrassingly parallel workload that requires very little bandwidth, so it is trivial to design the interconnect so as to not make it a bottleneck."

Sorry I would have to disagree with this, if you take a look at some of the RTL floating about,  a solution is provided every clock cycle.
That solution has to be tested and extracted, therefore the more engines you have working on solutions, the higher the probability of generating multiple nonces during the same clock phase that satisfy the rules you are looking for.

Let's say, for the sake of argument, that you have 4 cores running independently at 100 MHz (100 MH/s) and all four cores produce a solution at the same time (rare, but it can happen).
The internal silicon must then be capable of dealing with those 4 results during the same clock cycle. How are you going to do that? Run the combiner logic at 4x the system clock, so that the 4 results are processed within a single 100 MHz clock cycle, as 4 cycles at 400 MHz?
Yes, you could split the design into groups of two engines and process the results in parallel at 200 MHz, but eventually it all has to be combined to get it out of the chip. Now multiply that by the number of cores some of these designs are running (6?).
Or are we just going to "pretend" there was only 1 result and discard the other solutions?

Then you have to FIFO all this crap so that you can get it out of the chip, so the more cores you have on the chip, the more problems you have as regards raw silicon design, that is before you even think about HOW you are going to get work into the chip.

For interest, take a look at one of the ASICs floating about; they have given a proposed pinout showing 8 data lines and some strobes.
WTF.... even the nonce will require 4 CLK cycles just to get it out of the chip, and they are claiming this design is good into the GH/s range?

No, a solution is not provided every clock cycle. A mining logic block will drop non-solutions without requiring any communication with any external logic: it just has to check whether the high 32 bits are zero or not.

The end result is that a ~7.5 Ghash/sec chip, for example, is going to output a difficulty-1 solution about every half second, on average. That's only a few hundred bytes transmitted every second. Hardly "rocket science".
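
A quick check of those numbers, assuming a difficulty-1 solution means the high 32 bits of the hash are zero, so one hash in about 2^32 qualifies. The 64 bytes per reported solution is a hypothetical message size, not something any vendor has published.

```python
DIFF1_HASHES = 2 ** 32  # expected hashes per difficulty-1 solution (approximation)


def seconds_per_share(hashrate_hps: float) -> float:
    """Average time between difficulty-1 solutions at a given hashrate."""
    return DIFF1_HASHES / hashrate_hps


def output_bytes_per_second(hashrate_hps: float, bytes_per_share: int) -> float:
    """Average outbound traffic if every difficulty-1 solution is reported."""
    return bytes_per_share / seconds_per_share(hashrate_hps)


rate = 7.5e9  # ~7.5 Ghash/sec chip
print(f"{seconds_per_share(rate):.3f} s between solutions")
print(f"{output_bytes_per_second(rate, 64):.0f} bytes/s at 64 bytes per solution")
```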
nathanrees19
Full Member
***
Offline Offline

Activity: 196
Merit: 100



View Profile
December 17, 2012, 02:19:49 AM
Last edit: December 17, 2012, 02:50:49 AM by nathanrees19
 #409

or are we just going to "pretend" there was only 1 result and discard the other solutions

Yes. I don't care if 0.000001% of results are lost.

Edit: What failure rate would be acceptable to a "serious" miner? I'm guessing that 0.1% would not even be noticeable.
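
A back-of-the-envelope sketch of how many results would actually be lost that way. It assumes each core tests one nonce per clock, each hash is a difficulty-1 hit with probability 2^-32, and that when several cores hit in the same cycle only one result is kept. The formula is a first-order approximation that is valid because the per-hash probability is tiny.

```python
P_HIT = 2.0 ** -32  # chance that a single hash is a difficulty-1 solution


def lost_fraction(n_cores: int, p_hit: float = P_HIT) -> float:
    """Approximate fraction of solutions dropped when only one result per cycle is kept.

    Expected hits per cycle are n*p; the expected number of dropped extras is
    about C(n, 2)*p^2, so the ratio is roughly (n - 1) * p / 2.
    """
    return (n_cores - 1) * p_hit / 2


for cores in (4, 6, 64):
    print(f"{cores} cores: {lost_fraction(cores) * 100:.2e} % of solutions lost")
```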
2112
Legendary
*
Offline Offline

Activity: 2128
Merit: 1065



View Profile
December 17, 2012, 02:55:27 AM
 #410

No, a solution is not provided every clock cycle. A mining logic block will drop non-solutions without requiring any communication with any external logic: it just has to check whether the high 32 bits are zero or not.

The end result is that a ~7.5 Ghash/sec chip, for example, is going to output a difficulty-1 solution about every half second, on average. That's only a few hundred bytes transmitted every second. Hardly "rocket science".
I think I understand what hardcore-fs has in mind. He is saying that with multiple hashing pipelines you may miss a more valuable difficulty-n (n>1) share if your glue hardware is occupied with transmitting a difficulty-1 share that had just been found by another pipeline. This situation is probably infrequent, but he insists on a synchronous FIFO to handle it properly.

I had a similar problem back in school, where we had to handle quite improbable fault conditions but didn't want to lose track of them. We simply used asynchronous S/R flip-flops and interrupts. Software would single-step backtrack the faulty channels if more than one fault occurred nearly simultaneously.

I think the same approach can be used for a hashing chip: don't bother catching the exact nonce; since you know the order in which nonces are tried, you can check a couple of previous nonces in software. Even if the hashing chip cannot reliably use asynchronous S/R flip-flops, it could for sure use synchronous J/K flip-flops.

Basically, it is a hardware/software tradeoff in handling rare, but important, conditions.

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
bce
Sr. Member
****
Offline Offline

Activity: 756
Merit: 250



View Profile
December 17, 2012, 02:56:16 AM
 #411

It's not unlikely at all... depending on power factors, we estimate between 64 and 128 chips per device.  88 is an odd number (not impossible of course), but given the propensity for the Avalon team to like nice round figures, I suspect a multiple of 8.  My thoughts are they are going to have to go with more than 64 chips to overcome the particular issues they haven't run into quite yet.



Why don't you spend less time "guessing" what your competition is doing, what stage they are at and how far behind you they are, and spend more time answering the multitude of questions people have about your shitty company?


This thread is about Avalon ASIC Development Status, and Inaba is staying on topic.
Bogart
Legendary
*
Offline Offline

Activity: 966
Merit: 1000


View Profile
December 17, 2012, 03:41:49 AM
 #412

No, a solution is not provided every clock cycle. A mining logic block will drop non-solutions without requiring any communication with any external logic: it just has to check whether the high 32 bits are zero or not.

The end result is that a ~7.5 Ghash/sec chip, for example, is going to output a difficulty-1 solution about every half second, on average. That's only a few hundred bytes transmitted every second. Hardly "rocket science".
I think I understand what hardcore-fs has in mind. He is saying that with multiple hashing pipelines you may miss a more valuable difficulty-n (n>1) share if your glue hardware is occupied with transmitting a difficulty-1 share that had just been found by another pipeline. This situation is probably infrequent, but he insists on a synchronous FIFO to handle it properly.

These chips crunch nearly a billion hashes per second.  Losing a small handful of those each second is minuscule.

Mine along on your CPU if you wanna make up the difference and then some.

"All safe deposit boxes in banks or financial institutions have been sealed... and may only be opened in the presence of an agent of the I.R.S." - President F.D. Roosevelt, 1933
Syke
Legendary
*
Offline Offline

Activity: 3878
Merit: 1193


View Profile
December 17, 2012, 04:12:57 AM
 #413

It's not unlikely at all... depending on power factors, we estimate between 64 and 128 chips per device.  88 is an odd number (not impossible of course), but given the propensity for the Avalon team to like nice round figures, I suspect a multiple of 8.  My thoughts are they are going to have to go with more than 64 chips to overcome the particular issues they haven't run into quite yet.

Why don't you spend less time "guessing" what your competition is doing, what stage they are at and how far behind you they are, and spend more time answering the multitude of questions people have about your shitty company?

This thread is about Avalon ASIC Development Status, and Inaba is staying on topic.

Inaba has no idea whatsoever how Avalon is doing things, so he has no useful info to add to the thread, therefore he's just trolling as usual.

Buy & Hold
abeaulieu
Sr. Member
****
Offline Offline

Activity: 295
Merit: 250



View Profile
December 17, 2012, 04:52:19 AM
 #414

It's not unlikely at all... depending on power factors, we estimate between 64 and 128 chips per device.  88 is an odd number (not impossible of course), but given the propensity for the Avalon team to like nice round figures, I suspect a multiple of 8.  My thoughts are they are going to have to go with more than 64 chips to overcome the particular issues they haven't run into quite yet.

Why don't you spend less time "guessing" what your competition is doing, what stage they are at and how far behind you they are, and spend more time answering the multitude of questions people have about your shitty company?

This thread is about Avalon ASIC Development Status, and Inaba is staying on topic.

Inaba has no idea whatsoever how Avalon is doing things, so he has no useful info to add to the thread, therefore he's just trolling as usual.

Just as much useful information as you or I may have...
Syke
Legendary
*
Offline Offline

Activity: 3878
Merit: 1193


View Profile
December 17, 2012, 05:04:50 AM
 #415


Inaba has no idea whatsoever how Avalon is doing things, so he has no useful info to add to the thread, therefore he's just trolling as usual.

Just as much useful information as you or I may have...

Actually, no. You or I are somewhat impartial. Inaba is a competitor. Any information he might put forth is extremely biased and anti-useful.

Buy & Hold
mem
Hero Member
*****
Offline Offline

Activity: 644
Merit: 501


Herp Derp PTY LTD


View Profile
December 17, 2012, 06:04:14 AM
 #416

On second thought, don't talk to me Inaba.

Talk to me when you actually have delivered 100 [ASIC] units of a product your company sells. Until that time shooo....I disown you as a legitimate vendor. Begone from my sight!

Edit: and uh, best of luck.

Wonderful PuertoLibre, does this mean you will leave this BFL thread and go to the actual ASIC vendor's thread? We will be deeply saddened by this loss.

abeaulieu is a hypocrite that knows exactly what he is doing and only does an average job of trying to hide it.
Your goal is to spread FUD against Avalon while propping up the polished turd BFL.
Shill is a Shill is a Shill.

bce
Sr. Member
****
Offline Offline

Activity: 756
Merit: 250



View Profile
December 17, 2012, 07:53:07 AM
Last edit: December 17, 2012, 08:04:43 AM by bce
 #417

On second thought, don't talk to me Inaba.

Talk to me when you actually have delivered 100 [ASIC] units of a product your company sells. Until that time shooo....I disown you as a legitimate vendor. Begone from my sight!

Edit: and uh, best of luck.

Wonderful PuertoLibre, does this mean you will leave this BFL thread and go to the actual ASIC vendor's thread? We will be deeply saddened by this loss.

abeaulieu is a hypocrite that knows exactly what he is doing and only does an average job of trying to hide it.
Your goal is to spread FUD against Avalon while propping up the polished turd BFL.
Shill is a Shill is a Shill.

so... troll, reverse troll, counter troll, counter reverse troll, triple sow cow double hypocrisy back flip =  useful thread?  

I think it's better to keep things on topic.   Back on topic about the ASIC design of The Avalon Team - Thanks, Inaba for the thoughtful feedback.
abeaulieu
Sr. Member
****
Offline Offline

Activity: 295
Merit: 250



View Profile
December 17, 2012, 01:48:24 PM
 #418

On second thought, don't talk to me Inaba.

Talk to me when you actually have delivered 100 [ASIC] units of a product your company sells. Until that time shooo....I disown you as a legitimate vendor. Begone from my sight!

Edit: and uh, best of luck.

Wonderful PuertoLibre, does this mean you will leave this BFL thread and go to the actual ASIC vendor's thread? We will be deeply saddened by this loss.

abeaulieu is a hypocrite that knows exactly what he is doing and only does an average job of trying to hide it.
Your goal is to spread FUD against Avalon while propping up the polished turd BFL.
Shill is a Shill is a Shill.

I'm actually quite impartial, and most certainly not being paid by BFL. You're just an idiot, mem, and there's no cure for that.

PuertoLibre
Legendary
*
Offline Offline

Activity: 1834
Merit: 1003


View Profile
December 17, 2012, 03:12:00 PM
 #419

Well, I was enjoying the technical talk between the three others. I actually want to hear what they have to say.

I want to see what they hash out [pun intended] between themselves on how they think the process should be handled. I dunno about anyone else, but at the end of that discussion I am going to ask the Avalon team members...."So your thoughts on this technical discussion are...?"

Then they will say something clever (one hopes) and let us in on some insight into how they handled the problem at the firmware or silicon level. That will benefit all of us, don't you think?
2112
Legendary
*
Offline Offline

Activity: 2128
Merit: 1065



View Profile
December 17, 2012, 03:26:55 PM
 #420

These chips crunch nearly a billion hashes per second.  Losing a small handful of those each second is minuscule.

Mine along on your CPU if you wanna make up the difference and then some.
I get a feeling that a longer explanation is required for those unfamiliar with digital logic design.

The issue isn't really about losing one in billions of hashes. It is about gaining the timing margin (a.k.a. overclocking headroom) in the design.

Of course Avalon's logic is secret, but I'm going to discuss the problem based on one of the open-source FPGA hashers. It had a critical timing path in the logic that latched the "golden nonce". Since the design was pipelined 125 stages deep, it had hardware that subtracted the constant 125 from the nonce counter before sending it out of the chip.

Now we have two ways to speed up the above design:

1) remove the 32-bit wide constant subtractor. This will gain a fraction of a nanosecond on every hash tried. It is very easy to subtract 125 in software from the nonce downloaded from the chip.

2) acknowledge that the timing violation may occur and the nonce latched may not be the exact one that solved the block, but the next or previous one, depending on the details of the latching logic. It is somewhat more involved, but still easily doable in software: recompute the hashes for nonce values n-126, n-125, n-124 and use the one that solved the block. Again this will make the design more tolerant of overclocking for every hash tried inside the chip.

Obviously 1) cannot be applied to the ASIC chip or closed-source FPGA bitstream. But the method 2) remains applicable, just use a different set of test values.
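
A minimal software sketch of methods 1) and 2) combined, assuming the chip returns its raw nonce counter and that the true nonce sits 125 steps behind it, give or take one. The depth of 125 comes from the FPGA design described above; the function names and the `slack` parameter are only for illustration. The example header is the Bitcoin genesis block, used here as a known difficulty-1 solution.

```python
import hashlib
import struct

PIPELINE_DEPTH = 125  # pipeline depth of the FPGA design discussed above


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def is_diff1_hit(header76: bytes, nonce: int) -> bool:
    """True if the 80-byte header with this nonce hashes with its high 32 bits zero."""
    digest = double_sha256(header76 + struct.pack("<I", nonce & 0xFFFFFFFF))
    # The digest is read as a little-endian number, so its high 32 bits
    # are the last four bytes.
    return digest[-4:] == b"\x00\x00\x00\x00"


def recover_nonce(header76: bytes, raw_counter: int,
                  depth: int = PIPELINE_DEPTH, slack: int = 1):
    """Try counter-(depth+slack) .. counter-(depth-slack); return the nonce that solves."""
    for offset in range(depth - slack, depth + slack + 1):
        candidate = (raw_counter - offset) & 0xFFFFFFFF
        if is_diff1_hit(header76, candidate):
            return candidate
    return None  # latched value was further off than the slack allows


# First 76 bytes of the genesis block header (everything except the nonce).
GENESIS76 = (
    struct.pack("<I", 1)                     # version
    + bytes(32)                              # previous block hash (all zero)
    + bytes.fromhex("4a5e1e4baab89f3a32518a88c31bc87f"
                    "618f76673e2cc77ab2127b7afdeda33b")[::-1]  # merkle root
    + struct.pack("<I", 1231006505)          # timestamp
    + struct.pack("<I", 0x1D00FFFF)          # bits
)
GENESIS_NONCE = 2083236893

# Simulate a chip whose latch fired one count late.
print(recover_nonce(GENESIS76, GENESIS_NONCE + PIPELINE_DEPTH + 1))
```

For a chip with unknown internals, the same loop would be run with whatever offsets testing shows to be relevant.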

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0