Bitcoin Forum
May 25, 2024, 02:43:43 AM
  Show Posts
Pages: [1] 2 »
1  Alternate cryptocurrencies / Altcoin Discussion / Re: Ripple Giveaway! on: May 20, 2013, 02:34:44 AM
rnsM7NFrahaJAxW2ozvBPjiL6hDEFNApLa
2  Bitcoin / Hardware / Re: Official Open Source FPGA Bitcoin Miner (Last Update: April 14th, 2013) on: May 10, 2013, 05:38:42 AM
Possibly, but I used some compiler directives to force SRLs and registers in certain situations so that the design would fit. XST infers too many registers or SRLs for the design to fit properly, so some manual instantiation might be required.

I'm still surprised that the Ztex project hits 300 MHz without extra pipeline stages. Might it be better to add DSPs to that project instead?
3  Bitcoin / Hardware / Re: Official Open Source FPGA Bitcoin Miner (Last Update: April 14th, 2013) on: May 09, 2013, 05:26:23 PM
Oh, what speed grade did you use for the V6? All my boards with 130s and 240s (ML605) are -1, so if you used -3 that could explain the big difference in the quality of the results.
4  Bitcoin / Hardware / Re: Official Open Source FPGA Bitcoin Miner (Last Update: April 14th, 2013) on: May 09, 2013, 05:25:01 PM
That's really interesting. I am not familiar with the Ztex projects (they don't compile in my preferred synthesis tool, Synplify Pro). I would not have expected the design to run on the V6 at that utilization at 300 MHz without additional pipelining. Maybe I'll take a look at that project and see what the big difference is.

I used the old Verilog port to get to 300 MHz on mine (adding DSPs and pipelining), but the power usage is around 7 W due to the use of the DSPs.

IIDX

Quote
Thanks for your hint. I've already tried SmartXplorer with the 7 default built-in strategies, but can't get a result above 160 MHz. So, you mean I should use the cost table method to brute-force it? Thanks.
Yup.  For reference, the released bitstreams took days/weeks to compile.

xbaby,

You can also try to floorplan the DSP48s if you want to cut your runtime. To get the boards I have with V6 130Ts to run at 300 MHz, I had to constrain the placement of each DSP48; otherwise there was no chance. This was based on the original Verilog port, but I'm sure the same problem appears without pre-placement.

Hi, thanks for your tips. I'm compiling the "X6000_ztex_comm4" project, which, as far as I know, doesn't use any DSP48 blocks. I also successfully compiled the same project for a V6 130T device (with minor fixes for the MMCM, FIFO and JTAG cores); it reaches at most 300 MHz, the same as yours, but without DSP48s. The compile time for the V6 device is much shorter than for the Spartan-6 LX150. I guess the long-route resources of the Virtex-6 make the difference.

Next, I want to try different implementation options to reach a higher target, such as 350 MHz.

BTW, the power estimate ISE gives for the V6 130T at 300 MHz is about 10 W. Below is the resource usage:

Code:
Device Utilization Summary:

Slice Logic Utilization:
  Number of Slice Registers:                85,173 out of 160,000   53%
    Number used as Flip Flops:              85,172
    Number used as Latches:                      1
    Number used as Latch-thrus:                  0
    Number used as AND/OR logics:                0
  Number of Slice LUTs:                     57,385 out of  80,000   71%
    Number used as logic:                   34,910 out of  80,000   43%
      Number using O6 output only:          14,978
      Number using O5 output only:             539
      Number using O5 and O6:               19,393
      Number used as ROM:                        0
    Number used as Memory:                   9,759 out of  27,840   35%
      Number used as Dual Port RAM:              0
      Number used as Single Port RAM:            0
      Number used as Shift Register:         9,759
        Number using O6 output only:         9,759
        Number using O5 output only:             0
        Number using O5 and O6:                  0
    Number used exclusively as route-thrus: 12,716
      Number with same-slice register load: 12,452
      Number with same-slice carry load:       264
      Number with other load:                    0

Slice Logic Distribution:
  Number of occupied Slices:                15,859 out of  20,000   79%
  Number of LUT Flip Flop pairs used:       62,383
    Number with an unused Flip Flop:         1,382 out of  62,383    2%
    Number with an unused LUT:               4,998 out of  62,383    8%
    Number of fully used LUT-FF pairs:      56,003 out of  62,383   89%
    Number of slice register sites lost
      to control set restrictions:               0 out of 160,000    0%
5  Bitcoin / Hardware / Re: Official Open Source FPGA Bitcoin Miner (Last Update: April 14th, 2013) on: May 09, 2013, 06:25:56 AM
Quote
Thanks for your hint. I've already tried SmartXplorer with the 7 default built-in strategies, but can't get a result above 160 MHz. So, you mean I should use the cost table method to brute-force it? Thanks.
Yup.  For reference, the released bitstreams took days/weeks to compile.

xbaby,

You can also try to floorplan the DSP48s if you want to cut your runtime. To get the boards I have with V6 130Ts to run at 300 MHz, I had to constrain the placement of each DSP48; otherwise there was no chance. This was based on the original Verilog port, but I'm sure the same problem appears without pre-placement.
6  Bitcoin / Hardware / Re: Official Open Source FPGA Bitcoin Miner (Last Update: April 14th, 2013) on: April 17, 2013, 09:18:10 PM
I think the problem is that linking 11 BRAMs together requires a lot of LUTs for address decoding and routing, since the BRAMs are arranged in columns throughout the chip. Plus, linking 11 together would probably result in a minimum period much higher than 2.0 ns (I think 2.0 ns is the figure for a single BRAM).

So, you would need 128 (hashers) * 11 (BRAMs per pipeline stage) = 1408 BRAMs in total. Of course, you're not suggesting using BRAM for all of the delay. However, I think the slices you would sacrifice to connect the BRAMs and generate their address logic would cost more than just using the built-in FFs or DMEMs (plus the speed hit).
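The BRAM budget is easy to sanity-check. This is a minimal Python sketch, assuming 72-bit-wide BRAM primitives (11 x 72 = 792 bits, matching the 11-primitive figure quoted in this thread) and the 264 BRAMs of the LX130T; the function names are illustrative only.

```python
import math

# Assumptions: each 36Kb BRAM is used 72 bits wide; the LX130T has 264 of them.
BRAM_WIDTH_BITS = 72
BRAMS_AVAILABLE_LX130T = 264


def brams_per_stage(delay_width_bits: int) -> int:
    """BRAM primitives needed to hold the delayed bits of one pipeline stage."""
    return math.ceil(delay_width_bits / BRAM_WIDTH_BITS)


def total_brams(stages: int, delay_width_bits: int) -> int:
    """BRAMs needed if every stage pushes its delayed data through BRAM."""
    return stages * brams_per_stage(delay_width_bits)


per_stage = brams_per_stage(792)  # 792 bits / 72 bits per BRAM
needed = total_brams(128, 792)    # 128 stages * 11 BRAMs
print(per_stage, needed, needed <= BRAMS_AVAILABLE_LX130T)
```

Even before counting the address decode and routing cost, the requirement is more than five times the BRAM count of the device.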

I'm hoping that by floorplanning each hashing module I can reach high clock speeds. Currently the logic delay is only about 2.0 ns, with routing taking the rest, so with some good routing I should hopefully meet my target.

The V6LX130 isn't even as big as the S6 150, but at least it has DSP48s.

I may also need to cut down the PCIe link from 4x to 1x and reduce its performance settings to regain some of the space that is being used up.

IIDX

Looks good! I tried to do the same thing on a V6 LX130T (use almost all DSPs and pipeline the rest of the LUT adders), but there aren't enough registers in that device for the tx_w and tx_state delays :( So many 512- and 256-bit registers...


   If you are short on flip-flops, have you considered using the BRAMs? You would need 11 primitives (there are 264 in the LX130T) to make a 792-bit-wide memory. You can set the BRAM to 'write first' mode, which echoes the written data to the output. The clock-to-out for an unpipelined BRAM is ~2.0 ns, slower than a FF.
   Since the BRAMs are dual-port, you can use both sides of the memory (with different locked addresses) and get enough storage for 48 stages of a fully unrolled algorithm.
   I've never tried this, but was just thinking of how to make use of all the unused BRAM lying around. I usually run out of LUTs, but I need to rethink whether this is worthwhile with the DSP48 implementation.
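To make the delay-line idea concrete, here is a small Python behavioural model of a RAM used as a circular buffer. Note that it assumes read-before-write ("read first") behaviour rather than the 'write first' echo mode mentioned above: each address is read just before it is overwritten, so data comes back out exactly `depth` clocks after it went in. The class name is hypothetical, and the model ignores the BRAM's own output latency.

```python
class BramDelayLine:
    """Circular-buffer delay line modelled on one RAM in read-first mode."""

    def __init__(self, depth: int):
        if depth < 1:
            raise ValueError("depth must be at least 1")
        self.mem = [0] * depth  # RAM contents, initialised to zero
        self.addr = 0           # shared read/write address counter

    def clock(self, data_in: int) -> int:
        data_out = self.mem[self.addr]   # read the oldest word first
        self.mem[self.addr] = data_in    # then overwrite it with the new word
        self.addr = (self.addr + 1) % len(self.mem)
        return data_out


line = BramDelayLine(depth=3)
print([line.clock(x) for x in range(1, 7)])  # [0, 0, 0, 1, 2, 3]
```

The only logic besides the RAM is one address counter, which is where the extra LUTs discussed in the reply above come from once many BRAMs have to be ganged together.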


7  Bitcoin / Hardware / Re: [In Dev] 28nm mining FPGA (Amateur) on: April 16, 2013, 07:00:40 PM
Forgot to mention that the ML605s and the boards with the LX130s are both speed grade -1 (DSP48 max frequency 450MHz).

I do have access to boards with K7 325Ts and K7 480Ts, but neither has PCIe access to a normal PC (they aren't Xilinx dev boards), so I never bothered trying those boards.
8  Bitcoin / Hardware / Re: [In Dev] 28nm mining FPGA (Amateur) on: April 16, 2013, 06:45:39 PM
The Kintex and Virtex FPGAs are basically the same.  The Virtex just comes in bigger sizes and allows for higher speed serial transceivers (up to 28 Gb/s).

As a reference, I've been using ML605s at 375 MH/s since 2011. That's just using 3 copies of the Verilog port, with some of the adders replaced by DSPs (not pipelined). Some space is also taken up because I use PCIe for the connectivity to the PC. So, you could use 375 MH/s as the benchmark for your ML605.

It's possible that there is more speed in there using more pipeline stages, but the size of the device limits what can be done.  I haven't tried to get more out of it so far. 

I've been trying to get 300+ MH/s out of V6LX130Ts, but only have it running at 200 MHz right now (using PCIe as the connectivity method again). I've been trying to fully pipeline the DSPs (as FPGAminer has done with the KC707), but the problem is there aren't enough registers/DMEM to delay the rest of the pipeline in that device (plus it does not have enough DSP48s to replace all the adders).
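The link between clock rate and MH/s in these figures can be written down directly. This Python sketch assumes each fully unrolled core completes one hash per clock (the `clocks_per_hash` parameter covers partially unrolled cores) and, for the LX130T, a single core; neither assumption is stated outright in the post.

```python
def hashrate_mhs(cores: int, clock_mhz: float, clocks_per_hash: int = 1) -> float:
    """Throughput of identical cores, each finishing one hash every clocks_per_hash clocks."""
    return cores * clock_mhz / clocks_per_hash


def implied_clock_mhz(total_mhs: float, cores: int, clocks_per_hash: int = 1) -> float:
    """Clock each core must run at to reach total_mhs."""
    return total_mhs * clocks_per_hash / cores


print(implied_clock_mhz(375, 3))  # ML605: three fully unrolled cores
print(hashrate_mhs(1, 200))       # one LX130T core at the current 200 MHz
print(hashrate_mhs(1, 300))       # the same core at the 300 MHz target
```

Under these assumptions the 375 MH/s ML605 figure corresponds to each of the three cores running at about 125 MHz.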

I just did a scan over the 7 series datasheets to see their relative performance.  Here is a summary of the important figures; I figure they might be helpful for this and any other 7-series based projects:

Everything is for the -2 speed grade; all values are maximum frequencies in MHz.
Code:
                   |  Artix 7  |  Kintex 7  |  Virtex 7
FIFO Fmax (MHz)    |   460.83  |   543.77   |   543.77
DSP48E1 Fmax (MHz) |   550.66  |   650.20   |   650.20

The FIFO Fmax (block RAM) has been a fairly good measure of the absolute maximum frequency we can expect to see out of hashing cores. My rough estimates show that the Artix 7 is likely to have a better MH/s/$ based on these figures alone and single-unit prices. However, it is difficult to tell for sure, because I suspect that the Artix is crippled in some other way. I have not checked each chip's routing and CLB configurations.
I'm surprised that the Kintex and Virtex are equal; I would expect the Virtex to be quite a bit faster xD
But thanks for this; it means the FPGA should put out about 1 GH/s.
9  Bitcoin / Hardware / Re: Official Open Source FPGA Bitcoin Miner (Last Update: April 14th, 2013) on: April 15, 2013, 07:24:50 AM
Looks good! I tried to do the same thing on a V6 LX130T (use almost all DSPs and pipeline the rest of the LUT adders), but there aren't enough registers in that device for the tx_w and tx_state delays :( So many 512- and 256-bit registers...

BTW, what does XPower report for that design at 400 MHz?
10  Bitcoin / Hardware / Re: PCI-e Based FPGA Mining Cards on: April 13, 2013, 05:43:54 AM
You need the LXT version for PCIe; I'm just making sure you know, since you mentioned the XC6SLX150 and not the XC6SLX150T.

The PCIe core on the Spartan 6 is one lane only, so there's no need to wire up all 16 lanes of the edge connector. You'll also need to ensure that your PCB stackup can handle PCIe Gen 1 speeds (2.5 GT/s). A lot of trouble just to push a few bits around :D
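To put "a few bits" in numbers, this Python sketch compares the data rate of one PCIe Gen 1 lane (2.5 GT/s less 8b/10b encoding, ignoring packet overhead) with what a miner needs to stay busy. It assumes each unit of work is a 44-byte payload (a 32-byte midstate plus 12 bytes of header tail) that keeps the card busy for one full 32-bit nonce range; results flowing back are ignored.

```python
NONCES_PER_WORK = 2 ** 32  # one 32-bit nonce range per unit of work


def pcie_gen1_data_bps(lanes: int = 1) -> float:
    """2.5 GT/s per lane after 8b/10b encoding; TLP/DLLP overhead is ignored."""
    return 2.5e9 * 8 / 10 * lanes


def work_feed_bps(hashrate_hs: float, bytes_per_work: int = 44) -> float:
    """Bits per second needed to hand a miner fresh work as each nonce range runs out."""
    works_per_second = hashrate_hs / NONCES_PER_WORK
    return works_per_second * bytes_per_work * 8


link = pcie_gen1_data_bps()
needed = work_feed_bps(375e6)  # one 375 MH/s card
print(f"{link:.3g} b/s available, {needed:.1f} b/s needed")
```

Even a single Gen 1 lane has many orders of magnitude more bandwidth than the hashing cores can use.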

I have 6 Spartan-6 LX150s in the mail, headed my way for some testing. I have the PCI-e board sketched up in KiCad, and am going to contact a PCB manufacturer tomorrow about getting a few made. If all goes well, I'll post the board schematics here for public use.

For those interested, DigiKey is selling single Spartan-6 LX150s for $158/chip and $170/chip (same chip, just 2 different batches); just search for "XC6SLX150".

My current board design is PCI-e x16, double wide to fit a wide heatsink and fan in.

I'm also in contact with a rep from Achronix, hoping for a low-ish estimate on their HD1000 series of FPGAs (sub $1,500/chip), but I doubt it.

Thoughts, ideas, opinions?

There are PCI-e FPGA cards already, are there not?


http://www.knjn.com/FPGA-PCIe.html

There are; however, they cost $600 for a card with a single Spartan 6. The card design I have now has 2 Spartan 6s and is costing me about $450 to make.
11  Bitcoin / Hardware / Re: Official Open Source FPGA Bitcoin Miner (Spartan-6 Now Tops Performance per $!) on: March 30, 2013, 12:50:35 AM
I've trawled this topic and GitHub, but am still not sure what the best starting point for a new (Kintex7-325) FPGA miner would be.
Multicore is key, and a pointer to a working open-source software/FPGA combo for a serial interface would be hugely appreciated, but any sensible starting point would be fine; I expect to do some work!
I'm poking about with the verilog_xilinx port at the moment.

I started with the verilog_xilinx port back in 2011 to put on a handful of ML605s (V6 240s).  I also have some K7 325s and K7 480s at work, but only did build tests for those because I didn't have permanent access to those boards.

I would suggest starting with verilog_xilinx or one of the Ztex ports. I used PCIe for mine, so I modified the interfaces to take in 32-bit words. Unfortunately, that means I don't have a starting point for you if you are going to use the serial port.
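For anyone adapting a core to a 32-bit word interface like this, the host side mostly has to slice each unit of work into words. Below is a hypothetical Python helper, assuming the core wants the 32-byte SHA-256 midstate plus the 12 header bytes that follow the first 64-byte block (the last 4 bytes of the Merkle root, the timestamp and nBits), with the nonce swept on the FPGA. The word order and little-endian byte order are assumptions; match whatever your HDL expects.

```python
import struct


def pack_work(midstate: bytes, header_tail: bytes) -> list:
    """Split one unit of work into eleven 32-bit words (hypothetical layout).

    Words 0-7 carry the midstate, words 8-10 the header tail, each read little-endian.
    """
    if len(midstate) != 32:
        raise ValueError("midstate must be 32 bytes (SHA-256 state after block 1)")
    if len(header_tail) != 12:
        raise ValueError("header tail must be 12 bytes (merkle tail, time, bits)")
    return list(struct.unpack("<11I", midstate + header_tail))


words = pack_work(bytes(range(32)), bytes(range(32, 44)))
print(len(words), [hex(w) for w in words[:2]])
```

Each word can then be written to the core's registers in turn; only the 12-byte tail changes when the same midstate is reused with a new timestamp.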

However, I would not try to fit more than 3 instances of the fully unrolled verilog_xilinx version into that 325 without changing some of the adders into DSP48s. On the V6 240 I can fit 3 instances if I use most of the DSP48s to replace some of the adders in the design. Sadly, the K7 325 doesn't have that many more adders. I don't think I ever succeeded in getting 4 instances of the verilog_xilinx port to fit.

Technically, you could just make several instances of the entire design... and use multiple serial ports to talk to it ;)
12  Bitcoin / Pools / Re: [3600 GH] BTC Guild - PPS, PPLNS with TxFees+Orphans, Stratum+Vardiff ASIC Ready on: January 25, 2013, 10:44:53 PM
Hey,

Where is lady luck at the moment with PPLNS? I pointed half of my rigs at it today, but I hope it's not me causing this unlucky streak of block finds... ??? ???

It will hopefully get better; I want to keep PPLNS for a longer period to see what will happen... ;D

You missed the luck train :( The previous 3 days had some great runs of blocks, but the last 24 hours have been pretty bad.



I will probably give it a shot soon and cross my fingers!  Hopefully my software won't know the difference.

The proxy should trick any properly set up software. The proxy talks to mining software using the old getwork protocol.

Well I've fired up the proxy and switched to pplns.  So far so good...
13  Bitcoin / Pools / Re: [3600 GH] BTC Guild - PPS, PPLNS with TxFees+Orphans, Stratum+Vardiff ASIC Ready on: January 25, 2013, 06:26:45 PM
I'm still using getwork because I have to use homemade mining software to drive my FPGAs (I use a handful of PCIe-based FPGA dev cards). I didn't know there was a stratum proxy available; maybe I will try it soon.

The proxy is actually the best solution if you have a large number of mining clients. Native support is good in cgminer, since it will run multiple cards over one stratum connection. However, if you have multiple PCs, they can all run through a single stratum connection via the proxy. Stratum's work distribution method supplies you more work every 30 seconds than you could possibly exhaust. Paired with variable difficulty, this means an entire farm of 100 PCs on the proxy uses roughly the same amount of external bandwidth as each PC would use individually with native stratum clients. There is one small disadvantage: you lose fine-grained measurement of individual rig performance under a proxy connection with multiple machines.
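To see why stratum work is so hard to exhaust, compare how long one 32-bit nonce range lasts with how many ranges a single stratum job provides. This Python sketch assumes each extranonce2 value yields a distinct Merkle root (and therefore a fresh nonce range) and uses a 4-byte extranonce2, which is common but is chosen by the pool; ntime rolling is ignored. The 2250 MH/s figure is the six-card setup mentioned elsewhere in these posts.

```python
NONCE_SPACE = 2 ** 32  # a block header's 32-bit nonce field


def seconds_per_nonce_range(hashrate_hs: float) -> float:
    """Time to sweep one full nonce range (about one getwork unit, without ntime rolling)."""
    return NONCE_SPACE / hashrate_hs


def hashes_per_stratum_job(extranonce2_size: int) -> int:
    """Each extranonce2 value gives a new coinbase, hence a new Merkle root and nonce range."""
    return NONCE_SPACE * 2 ** (8 * extranonce2_size)


farm_hs = 2250e6  # 2250 MH/s
seconds_per_year = 365 * 24 * 3600
print(f"{seconds_per_nonce_range(farm_hs):.2f} s per nonce range")
print(f"{hashes_per_stratum_job(4) / farm_hs / seconds_per_year:.0f} years per job")
```

A getwork miner at that rate has to fetch new work roughly every two seconds, while a single stratum job would last far longer than the 30 seconds between job updates.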

They each report separately, but because variable difficulty affects your whole farm, the time between shares on a per-worker basis will be longer.  Since they're solving higher difficulty work, this isn't a huge issue, it just means the per-worker speed estimates will be subject to higher variance.  That could be solved by splitting a large farm over a handful of proxies to keep per-worker variance lower.  Overall account variance is effectively unchanged regardless.
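The per-worker variance effect can be estimated by treating shares as a Poisson process: at difficulty D a share takes about D * 2^32 hashes on average, so the relative error of a hashrate estimate built from N shares is roughly 1/sqrt(N). A minimal Python sketch; the 375 MH/s worker, the 10-minute window and the difficulty of 32 are hypothetical values chosen for illustration.

```python
import math

HASHES_PER_DIFF1_SHARE = 2 ** 32  # approximate average hashes per difficulty-1 share


def expected_shares(hashrate_hs: float, difficulty: float, seconds: float) -> float:
    """Mean share count in a window; higher difficulty means proportionally fewer shares."""
    return hashrate_hs * seconds / (difficulty * HASHES_PER_DIFF1_SHARE)


def relative_error(hashrate_hs: float, difficulty: float, seconds: float) -> float:
    """Relative std-dev of a hashrate estimate from a Poisson share count (1/sqrt(N))."""
    return 1 / math.sqrt(expected_shares(hashrate_hs, difficulty, seconds))


worker_hs = 375e6  # hypothetical single worker
window_s = 600     # hypothetical 10-minute stats window
for diff in (1, 32):
    n = expected_shares(worker_hs, diff, window_s)
    print(f"diff {diff}: ~{n:.1f} shares, +/-{relative_error(worker_hs, diff, window_s):.0%}")
```

The total hashrate is unchanged; a farm-wide difficulty simply leaves each worker with fewer shares per window, which is why splitting a large farm across several proxies tightens the per-worker numbers.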


You can read up on how to start the proxy at: http://www.btcguild.com/stratum.php [or click 'Stratum Protocol' under Help & Support].


EDIT/UPDATE: Just to be clear, I'm saying that the proxy is the best/most efficient solution for a large farm, but it does introduce a single point of failure. Automatic detection and restart of a failed proxy would mostly eliminate that risk. If you're somebody who sets up their miners and completely ignores them for days at a time, it may not be the best solution. Idle warnings would give notification of a full proxy failure, though!

I actually run 6 cards in 1 PC (well, 1 card is inside and 5 cards are connected via an external PCIe dev motherboard), and the software I wrote controls all of them in one instance. Each card has 3 separate mining cores, but in the end there is only one instance of my software managing all the miners. So I already have a single point of failure anyway.

I will probably give it a shot soon and cross my fingers!  Hopefully my software won't know the difference.
14  Bitcoin / Pools / Re: [3600 GH] BTC Guild - PPS, PPLNS with TxFees+Orphans, Stratum+Vardiff ASIC Ready on: January 25, 2013, 03:48:06 PM
I'm still using getwork because I have to use homemade mining software to drive my FPGAs (I use a handful of PCIe-based FPGA dev cards). I didn't know there was a stratum proxy available; maybe I will try it soon.
15  Bitcoin / Hardware / Re: Official Open Source FPGA Bitcoin Miner (Spartan-6 Now Tops Performance per $!) on: October 24, 2012, 05:02:14 PM
So, does this work with the Xilinx ML605 board? I ask because every FPGA miner advertises support for the Spartan family but doesn't mention the Virtex family.

Yes, it works with the ML605 board.  I have been running a modified version of the verilog port for over a year on 6 ML605s.

As a previous poster stated, you have to make minor changes to account for the different board and chip.

You can see the performance achieved in the bitcoin mining hardware comparison article:

https://en.bitcoin.it/wiki/Mining_hardware_comparison#FPGAs

I added the entry when I got it working over a year ago, and no new ML605 entry has been added or updated since, so I suppose few, if any, others have tried. I just happened to have access to the boards, so I tried it out.
16  Economy / Services / Re: HideMyNet.com - VPN & Proxy Services - Now Accepting BTC! on: October 15, 2012, 05:08:08 AM
I've been using hide my net for about a year now, maybe I'll try paying by BTC for my latest invoice.
17  Bitcoin / Pools / Re: [3200 GH] BTC Guild - Pure PPS Merged Mining - Stratum+Variable Diff ASIC Ready on: October 14, 2012, 06:45:29 AM
Thank you for the info!
18  Bitcoin / Pools / Re: [3200 GH] BTC Guild - Pure PPS Merged Mining - Stratum+Variable Diff ASIC Ready on: October 14, 2012, 03:46:05 AM
Not sure if this has been asked and answered already, if it has I apologize.

Will the current (non stratum) interface be removed at some point?  I've been mining for a little over a year at BTC Guild with a non standard FPGA setup (6 PCIe FPGA dev boards, only 2250 MH/s total) using my own software to get and send work to the pool (since my FPGA interfaces aren't supported by the popular mining apps).

If there's a plan to move to stratum-only connections, I assume you'll give everybody a heads-up before the old protocol is shut down? That kind of news would get me off my butt and motivate me to actually revise my software to work with the new protocol ;D
19  Other / Off-topic / Re: "Frist Look at BFL's ASIC Hardware" on: September 26, 2012, 05:05:36 AM
So they only have basic 3d drafts done? Shouldn't they have made these designs way before they even asked for money?

There are several steps involved in the type of project that BFL is doing. Here's a general overview of the procedure they might be following. We do this where I work, but in general we don't do step #1 because we use off-the-shelf parts in most products.

If you need money to complete the steps, I guess you have to ask for it up front ;D

1a) ASIC (or whatever they are using) design:

The engineers will define the interfaces of the ASIC (power, digital and analog interfaces), and also decide at a high level how the guts of the chip will be put together (think block diagram).

1b) Implementation:

During this step, the designers implement the guts of the hashing chip and lay it out on the die, based on step 1a.

1c) ASIC manufacture:

At this point, BFL will know how to use their ASIC (what the pinouts are and how to talk to the chip).  All that is left is for the fab to actually make the chip.  The fab or other company may also test the chip based on test vectors provided by BFL.  BFL may also do this in house on the final PCB or a test PCB.

2) Schematic capture / system design:

Deciding which other electronic components are required in addition to the ASICs, and hooking them all together. This step can be started even before step 1 is complete. Once step 1a is complete and the final pinouts and communication/power requirements are known, the ASICs can be hooked up, completing step 2.

3) Mechanical design:

How big can the PCB be?  Where do we place the parts on the PCB to make them fit?  How much clearance do you have?  How much heat has to be dissipated (airflow/heatsink requirements)?  This step would normally happen in parallel with #2.

The picture in the OP is the result of #3 with input from #2 and #4 (because the PCB routing can change where parts are placed). However, the picture does not necessarily mean that step 4 is complete. That said, it might be finished, because top/bottom layer routing is visible. Based on their proposed release date, either they have the boards already or are just waiting for them to arrive.

4) PCB part placement and routing:

Based on the data gathered from #3, the electrical design team must actually place the required parts determined in step 2.  Once the parts are placed on the PCB, the actual components must be connected as per step 2.  The parts are connected via traces on multiple layers on the PCB.  During this step, parts may also be moved around to make the routing easier.

5) PCB manufacturing and assembly:

A company manufactures the PCB and then places and solders the required components onto the board.

6) Turn on and test

Now it's time for BFL to get the board and figure out what actually happens when they turn on the power... Errors that were not caught during the previous steps may be correctable with mods.

7) Done??

If all goes well, the product is finished!  Any problems that could not be corrected by mods will require you to create a new version of the board or ASIC.
20  Other / Off-topic / Re: Diablo Mining Company will never buy Butterfly Labs hardware on: July 06, 2012, 04:31:24 AM
What do you think the Kintex 7 480 would do in MH/s and MH/W? Any insight into Artix 7s? They are supposed to slot into the Spartan 6 space once available in good volume.

I haven't tried mining on the 480s, only on V6 240s.  The reason behind this is I have a handful of unused boards that have V6s on them, while the K7 boards are all used for my real work.

I made an entry in the mining hardware comparison long ago:

The V6 240s run at 375 MH/s at about 16 W. This isn't optimized (other than using some DSP48s), so I guess you could probably squeeze another 50-100 MH/s out of the devices with some effort.

The MH/s based on the size of the device breaks down as follows:

375MH/s / 240 = 1.5625 MH/s per "size unit"

Based on this, I would estimate the 480 could do:

480 * 1.5625 = 750 MH/s

I haven't used Xilinx's power estimator for a K7 bitcoin design, but Xilinx claims 50% less power than the V6.

So, you could assume your worst case would be 32 W and your best case 16 W. Your actual power will probably be somewhere in the middle.
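The size-based estimate above is a simple linear scaling. Written out as a Python sketch, assuming MH/s is proportional to the number in the part name and that Xilinx's 50% power claim applies uniformly:

```python
def scale_by_size(measured_mhs: float, measured_size: float, target_size: float) -> float:
    """Linear estimate: assumes MH/s is proportional to the size number in the part name."""
    return measured_mhs / measured_size * target_size


V6_MHS, V6_SIZE, V6_WATTS = 375, 240, 16  # measured V6 240 figures from the post
K7_SIZE = 480

k7_mhs = scale_by_size(V6_MHS, V6_SIZE, K7_SIZE)
# Worst case: twice the logic at the V6's power per size unit.
# Best case: Xilinx's claimed 50% saving cancels out the doubling.
worst_w = V6_WATTS * K7_SIZE / V6_SIZE
best_w = worst_w * 0.5
print(V6_MHS / V6_SIZE, k7_mhs, worst_w, best_w)
```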

BUT!!  You may think the Kintex 7 would be better for mining (price/performance) than a Spartan or Artix device, but you'd be wrong!

Take the ModMiner, for example:

840 MH/s at 40 W from 4 x Spartan-6 LX150 devices

840 MH/s / (4 * 150) = 1.4 MH/s per "size unit"

So, the S6 is a little less efficient in terms of size/performance (partly because I used DSP48s in the V6 example to reduce logic usage, but there are other factors), but the price difference is huge. I think the S6 LX150 is ~$100, so there's no really good reason to buy the K7 or V6. The K7 and V6 provide more advanced functionality (high-speed serial, more internal memory, more DSP slices, more pins, etc.) that isn't required for bitcoin mining. You'd end up paying for features you don't use.
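The same per-"size unit" comparison, plus a chip-only MH/s per dollar figure for the ModMiner, as a Python sketch. The ~$100 LX150 price is the rough figure from the post; board, power and cooling costs are ignored, and no K7/V6 price is assumed because none is given here.

```python
def mhs_per_size_unit(total_mhs: float, chips: int, size_per_chip: float) -> float:
    """MH/s divided by the summed size numbers of all chips on the board."""
    return total_mhs / (chips * size_per_chip)


def mhs_per_dollar(total_mhs: float, chips: int, price_per_chip: float) -> float:
    """MH/s per dollar spent on FPGAs alone."""
    return total_mhs / (chips * price_per_chip)


modminer = mhs_per_size_unit(840, 4, 150)  # 4 x LX150
ml605 = mhs_per_size_unit(375, 1, 240)     # 1 x V6 240
print(modminer, ml605, mhs_per_dollar(840, 4, 100))
```

Plugging in a real K7 or V6 quote as `price_per_chip` makes the price/performance gap explicit.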

Once the Artix devices come out in force, I suspect they will be similar in price/performance to the S6.  Power consumption will be somewhere between 50% less and the same as the S6.  However, they are the last devices to go into production (after the K7 and V7).

Okay, so the forum had V7 and K7 pricing an order of magnitude wrong (thanks guys), so what is the cheapest way of getting SASICs? Power usage isn't an issue if we go 65 or 90nm, and BFL is going 90 or 130nm on a "real" ASIC (which has yet to be proven).

I think the ideal device would be an FPGA that is just all slices (no DSP slices, no high-speed serial, no large gobs of memory, etc.) and, on top of that, offers SASIC migration.

Does any company offer that, no matter the node size?

I am not that familiar with Structured ASICs (SASICs). I'm mostly a Xilinx person, and they offer "EasyPath" devices, which aren't Structured ASICs. They are more or less FPGAs that only load one bitstream. Xilinx only promises a 35% cost reduction and requires a $300K upfront fee. I don't believe EasyPath devices are offered for the Spartan 6 family because the cost is already pretty low.
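Whether a fixed fee like this pays off is a simple break-even calculation. A Python sketch, taking the 35% reduction and the 300K fee from the post (assumed to be USD); the per-chip prices in the loop are hypothetical placeholders, not quotes.

```python
def break_even_units(upfront_fee: float, unit_price: float, cost_reduction: float) -> float:
    """Units to buy before the per-chip saving pays back a one-time fee."""
    if not 0 < cost_reduction <= 1:
        raise ValueError("cost_reduction must be a fraction in (0, 1]")
    return upfront_fee / (unit_price * cost_reduction)


FEE = 300_000     # the "300K" upfront fee, assumed USD
REDUCTION = 0.35  # the promised 35% cost reduction
for price in (500, 2_000):  # hypothetical per-chip prices
    print(price, round(break_even_units(FEE, price, REDUCTION)))
```

The cheaper the chip, the more units it takes to recover the fee, which fits the point that EasyPath makes little sense for an already inexpensive family like the Spartan 6.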

Altera offers actual structured ASICs for many families, so the price per unit can be reasonable for a larger device.  However, since it's an actual custom ASIC, the setup/up front fee is probably pretty large.  In addition, you have to budget time and money for "oops" mistakes which would require a second or even third turn of the ASIC.

Anyway, to answer your question, Altera offers 4 different structured ASIC options.  Two of the options offer the stuff you would use for mining (mostly logic, not features we wouldn't use) without extra fluff.  They also offer sizes that range from small to large within the different families.

I do not have any pricing information on this scheme because I've never gone down this path as part of my job. I still think the cost is non-trivial, since you're actually making a custom device.

There are other vendors you can work with that will help you create an ASIC or Structured ASIC, but Altera offers an end-to-end solution for SASICs. That probably makes them a good choice, since you only deal with one vendor. However, I have never dealt with that type of design flow before, so I have no first-hand experience.

In the end, it's all about up front money.  Upfront money for manufacturing a PCB (not too bad), and upfront money to create an ASIC or SASIC.  I am curious as to what route (ASIC/SASIC) BFL went for their "upcoming" products and what the upfront costs were...  Time to market for any of these options is pretty long, so they had to have started a while ago.