Bitcoin Forum
Topic: BTCMiner - Open Source Bitcoin Miner for ZTEX FPGA Boards, 215 MH/s on LX150
ztex (OP)
September 05, 2011, 08:05:25 AM
 #41

So my question in this case would be how long until my FPGA machine becomes more cost efficient than the GPU Rig.

How do you define "efficiency"? Usually efficiency means H/J. LX150 FPGA boards generate about 10 times more H/J than modern 6990 GPUs. This ratio is constant over time.
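
For anyone who wants to plug in their own numbers, here is a minimal Python sketch of the H/J definition above. The hash rates and wattages are placeholder values picked only to illustrate a 10x ratio; they are assumptions, not measurements from this thread, and `hashes_per_joule` is a hypothetical helper name.

```python
def hashes_per_joule(hash_rate_hs: float, power_w: float) -> float:
    """Energy efficiency in H/J: hashes per second divided by watts (J/s)."""
    if power_w <= 0:
        raise ValueError("power must be positive")
    return hash_rate_hs / power_w


# Placeholder inputs (assumptions for illustration only, not measured values).
fpga_hj = hashes_per_joule(hash_rate_hs=200e6, power_w=10.0)
gpu_hj = hashes_per_joule(hash_rate_hs=700e6, power_w=350.0)
ratio = fpga_hj / gpu_hj

print(f"FPGA: {fpga_hj:.3g} H/J, GPU: {gpu_hj:.3g} H/J, ratio: {ratio:.1f}x")
```

Since H/J depends only on the hardware and the design, not on difficulty or exchange rate, the ratio does not drift over time the way a cost break-even point does.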

ElectricMucus
September 05, 2011, 03:45:58 PM
 #42

The problem is FPGAs themselves are very expensive, and in order to get them to work you need a board with communication, logic voltage level conversion and power converters.

So while being the most versatile silicon real estate they are also the most expensive. They also have a lot of I/O capabilities (the most of any IC) which we don't need for bitcoin mining but have to pay for nevertheless.
Uhlbelk
September 05, 2011, 06:37:30 PM
 #43

The problem is FPGAs themselves are very expensive, and in order to get them to work you need a board with communication, logic voltage level conversion and power converters.

So while being the most versatile silicon real estate they are also the most expensive. They also have a lot of I/O capabilities (the most of any IC) which we don't need for bitcoin mining but have to pay for nevertheless.

So if FPGA is "too versatile" what is keeping ASIC folding development from taking off?
Thanks for bringing up this point.

Freedom with SciFi Coin
ElectricMucus
September 05, 2011, 07:35:57 PM
 #44

The problem is FPGAs themselves are very expensive, and in order to get them to work you need a board with communication, logic voltage level conversion and power converters.

So while being the most versatile silicon real estate they are also the most expensive. They also have a lot of I/O capabilities (the most of any IC) which we don't need for bitcoin mining but have to pay for nevertheless.

So if FPGA is "too versatile" what is keeping ASIC folding development from taking off?
Thanks for bringing up this point.
The main problem with ASICs is the initial cost.
There are basically 2 kinds of "ASICs":

Cell-based ASICs, which are built like FPGAs with the difference that the logic is fused ROMs and the interconnect is hard-wired. Those have the lowest initial cost but also the weakest performance. There are many different products which offer different block layouts and interconnect strategies.
The main advantage is that these chips can be produced from HDL code.

Custom ASICs are a different animal. You work on bare silicon, from scratch. You have to work in the analogue domain here, write your own software to model your transistors, choose between doping, layer thickness, model interconnect, calculate resonances and so on.
From what I've heard every company writes its own proprietary software and uses that; there is no FOSS toolkit or commercial one for that.
Even worse, every software uses different models, and the closer you can get to the cutting edge of physics the better the chip will be.

The research and the software alone are several man-years. As for bitcoin... if it ever becomes as big as, let's say, Linux you might start thinking of an effort, but right now this looks like a reach for the stars.
rph
September 06, 2011, 04:58:31 AM
Last edit: September 06, 2011, 05:09:49 AM by rph
 #45

Agreed. Somebody might make a toy full-custom ASIC on a low-cost academic fab (~250nm), but a modern FPGA will beat it on performance per watt and per dollar. A 40nm mask set costs $4M -- I don't think anyone will invest that kind of money into Bitcoin at the moment.

Xilinx Easypath could make sense once the HDL designs stop improving. But I think we will see 6s150 reach 175-200MH/s, OC'd, before then. And I'd hate to submit a final design into Easypath 1 week before somebody releases a 10% optimization to the RTL. That would suck.

-rph

Ultra-Low-Cost DIY FPGA Miner: https://bitcointalk.org/index.php?topic=44891
ztex (OP)
September 06, 2011, 08:12:24 AM
 #46

Xilinx Easypath could make sense once the HDL designs stop improving. But I think we will see 6s150 reach 175-200MH/s, OC'd, before then. And I'd hate to submit a final design into Easypath 1 week before somebody releases a 10% optimization to the RTL. That would suck.

The Easypath program is only available for Virtex FPGAs and the cost reduction is only 35%. This is still more expensive than Spartan 6 FPGAs.

The frequency limit is about 135 MHz (non-overclocked) with the pipelines I used. This limit is defined by the levels of logic. Higher frequencies are only possible with longer pipelines. But these pipelines are not routable on LX150 anymore.

Unfortunately LX150s (at least the ones I have) can't be overclocked much.



Uhlbelk
September 06, 2011, 08:17:51 AM
 #47

Thanks for all the valuable info. You folks are awesome.

rph
September 07, 2011, 06:01:40 AM
Last edit: September 07, 2011, 07:02:52 AM by rph
 #48

In high vol (>1k) the mid-range virtex6 parts become close to 6s150 in price per LUT.
High vol pricing is basically set by the die size.

I would not use spartan6 in a large scale mining operation, as the SLICEX
makes about half the LUTs (thus, half the die) useless. Virtex6 or Kintex7, with a carry
chain in every LUT, should have higher MH/$ for the folks willing to invest real money
into FPGA mining. And then once you're in virtex, if your wallet allows, you can consider
Easypath..

-rph

ztex (OP)
September 07, 2011, 08:28:06 AM
 #49

In high vol (>1k) the mid-range virtex6 parts become close to 6s150 in price per LUT.
High vol pricing is basically set by the die size.

I never asked my distributor for 1k prices of Spartans or Virtexes, but that would surprise me for several reasons:
  • Virtex FPGAs are faster. Xilinx would not sell them for the same price/LUT
  • Virtex FPGAs contain a huge number of large DSP slices which are mainly used for multiplication and are almost useless for bitcoin mining. These DSP slices are the backbone of Virtex FPGAs (used for HPC) and Xilinx would not give them away for free
  • Spartan FPGAs are simpler (SLICEX)

Quote
I would not use spartan6 in a large scale mining operation, as the SLICEX
makes about half the LUTs (thus, half the die) useless. Virtex6 or Kintex7, with a carry
chain in every LUT
There is no shortage of carry chains. About 60% of the used slices are SLICEXs, so I wouldn't agree that they can't be used ;)



rph
September 08, 2011, 06:01:09 AM
 #50

Well, my theory is that two pipelines could (barely) fit into a 6s150-sized part, if it had more carry chains
plus DSP48E1s. With the SLICEX and only 180 DSP48A1s, there's just no way.

RE Pricing: There will always be a premium for the Virtex. But much smaller than one might suspect
based on non-negotiated list prices on the web. V6 is a 40nm part; S6 is 45nm. And since V6/K7 doesn't
have the SLICEX limitations, and offers larger devices (meaning fewer boards to assemble)
I think it's worth it for a big operation.

And.. I think DSP48, esp DSP48E1 in V6/A7/K7, can be very useful for SHA256.
I hope to validate that claim soon on real HW. :D

-rph

ztex (OP)
September 08, 2011, 08:27:00 AM
 #51

Well, my theory is that two pipelines could (barely) fit into a 6s150-sized part, if it had more carry chains
plus DSP48E1s. With the SLICEX and only 180 DSP48A1s, there's just no way.
Did you ever try to validate your theory? Your theory does not consider some facts:
  • Only about 1/3 of the required slices of the current design need to have a carry chain. As I wrote in my last post: There is no shortage of carry chains.
  • It's not sufficient just to place the LUTs, they also have to be connected in some way, i.e. they have to be routed. The routing resources are the limiting factor.
  • DSP slices cannot be used for adder trees, are slower than LUTs and more difficult to route.

Quote
And.. I think DSP48, esp DSP48E1 in V6/A7/K7, can be very useful for SHA256.
I hope to validate that claim soon on real HW. :D
I tried it on S6. The number of required LUTs was reduced by less than 10% and AFAIR the clock was reduced to 70-80 MHz.

ArtForz
September 08, 2011, 06:20:18 PM
 #52

While 2 pipelines could theoretically fit on a S6-LX150, routing is impossible (S6s have a lot less "long" routing resources than virtexes).
What *is* possible is one pipeline with 2 register stages per sha256 round, use SRL16s for W where possible, don't go overboard with em and don't let synthesis infer more shift regs.
And don't expect to get > 170MHz or so post-p&r on -2 without giving the placer some help.

*edit: SRL16, not LSR16
*edit2: the xilinx docs aren't very clear on this, but a LUT6 in a SLICEM can be used as *2* equal-length SRL16s => a 32-bit wide shift reg up to 17 deep (slice FFs give the last stage) is only 4 SLICEMs.
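
The arithmetic behind that "only 4 SLICEMs" figure, as a rough Python sketch. It assumes four LUT6s per Spartan-6 slice and takes the two-SRL16s-per-LUT6 packing from the edit above; the function names are made up for illustration.

```python
import math

SRL16_PER_LUT6 = 2    # per the post: one SLICEM LUT6 used as two equal-length SRL16s
LUTS_PER_SLICEM = 4   # assumption: four LUT6s per Spartan-6 slice
SRL16_DEPTH = 16      # taps available in one SRL16


def slicems_for_shift_reg(width_bits: int) -> int:
    """SLICEMs needed for a shift register that is width_bits wide."""
    bits_per_slicem = SRL16_PER_LUT6 * LUTS_PER_SLICEM
    return math.ceil(width_bits / bits_per_slicem)


def max_depth(extra_ff_stages: int = 1) -> int:
    """Deepest delay reachable: SRL16 depth plus the slice flip-flop stage(s)."""
    return SRL16_DEPTH + extra_ff_stages


print(slicems_for_shift_reg(32), "SLICEMs for a 32-bit wide shift register")
print(max_depth(), "stages max (SRL16 plus one slice FF)")
```

The same function shows why letting synthesis infer wide shift registers everywhere gets expensive: every 8 bits of width costs another SLICEM, and only part of the slices on the die are SLICEMs.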

bitcoin: 1Fb77Xq5ePFER8GtKRn2KDbDTVpJKfKmpz
i0coin: jNdvyvd6v6gV3kVJLD7HsB5ZwHyHwAkfdw
rph
September 09, 2011, 06:46:09 PM
Last edit: September 09, 2011, 07:31:56 PM by rph
 #53

ArtForz, I've tried exactly that, and reached 156MHz in -3 (probably ~180MH/s OC'd).
I guess there is room to optimize a bit more.

The DSP48s are useful if you design around them. I've reached 500MHz+ with a rolled SHA256
core on V6 w/o much effort. The DSPs have dedicated routing and dedicated registers, so
there's less room for the SW tools to screw things up.

-rph
 

ztex (OP)
September 12, 2011, 08:33:40 AM
 #54

rph, please share your code with us or at least the report files (.syr, .map, .mrp, .par) so that I can believe your results.

ArtForz and rph, the theoretical speed limit according to xst for a 1 stage / sha256 round pipeline is 156.490 MHz (sounds familiar). This is defined by the levels of logic. In practice about 130-135 MHz can be achieved.

The theoretical frequency limit of a 2 stage / sha256 round pipeline reported by xst is up to 284.649 MHz, depending on the number of additional registers. But as written before, I was not able to route such a design. I even tried to use DSP slices, which dropped the theoretical max. frequency to about 80 MHz.

rph
September 17, 2011, 08:14:17 AM
Last edit: September 18, 2011, 05:03:52 AM by rph
 #55

> ArtForz and rph, the theoretical speed limit according to xst for a 1 stage / sha256 round pipeline is 156.490 MHz (sounds familiar).
> This is defined by the levels of logic. In practice about 130 -135 MHz can be achieved.

Agreed. The xst Fmax is unrealistically high b/c it does not really consider routing delays.

I was using a two cycle per stage (3-input adder) design like ArtForz described, to reach 156MHz in -3. It took about 3 hours to map.
I had to tune the build options; with some settings map would just freeze for 1+ day during global placement. Very annoying.
Will post more once I build the HW (hopefully this weekend), and ensure the design works and produces valid hashes
and can actually be powered/cooled.

-rph

rph
September 18, 2011, 05:21:28 AM
Last edit: September 18, 2011, 05:56:24 AM by rph
 #56

Build summary as ztex requested. Hopefully substantiates that 150MH/s+
is possible in this device, non-OC'd, with some room to spare!  :D

Slice Logic Utilization:
  Number of Slice Registers:                96,777 out of 184,304   52%
    Number used as Flip Flops:              96,777
    Number used as Latches:                      0
    Number used as Latch-thrus:                  0
    Number used as AND/OR logics:                0
  Number of Slice LUTs:                     58,692 out of  92,152   63%
    Number used as logic:                   39,716 out of  92,152   43%
      Number using O6 output only:          29,405
      Number using O5 output only:             424
      Number using O5 and O6:                9,887
      Number used as ROM:                        0
    Number used as Memory:                   3,056 out of  21,680   14%
      Number used as Dual Port RAM:              0
      Number used as Single Port RAM:            0
      Number used as Shift Register:         3,056
        Number using O6 output only:             0
        Number using O5 output only:             0
        Number using O5 and O6:              3,056
    Number used exclusively as route-thrus: 15,920
      Number with same-slice register load: 15,858
      Number with same-slice carry load:        62
      Number with other load:                    0
---
...
Phase 12  : 0 unrouted; (Setup:0, Hold:0, Component Switching Limit:0)     REAL time: 1 hrs 30 mins 48 secs
Total REAL time to Router completion: 1 hrs 30 mins 48 secs
Total CPU time to Router completion: 1 hrs 31 mins 27 secs
---
Critical path:
Slack:                  0.121ns (requirement - (data path - clock path skew + uncertainty))
  Source:               engine/sha2/pipe/stagegen[5].stageX/o_state_32 (FF)
  Destination:          engine/sha2/pipe/stagegen[6].stageX/t2_30 (FF)
  Requirement:          6.666ns
  Data Path Delay:      6.382ns (Levels of Logic = 8)
  Clock Path Skew:      -0.021ns (0.239 - 0.260)
  Source Clock:         clk rising at 0.000ns
  Destination Clock:    clk rising at 6.666ns
  Clock Uncertainty:    0.142ns

  Clock Uncertainty:          0.142ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter (TSJ):  0.070ns
    Total Input Jitter (TIJ):   0.000ns
    Discrete Jitter (DJ):       0.213ns
    Phase Error (PE):           0.000ns

  Maximum Data Path at Slow Process Corner: engine/sha2/pipe/stagegen[5].stageX/o_state_32 to engine/sha2/pipe/stagegen[6].stageX/t2_30
    Location             Delay type         Delay(ns)  Physical Resource
                                                       Logical Resource(s)
    -------------------------------------------------  -------------------
    SLICE_X49Y175.AMUX   Tshcko                0.461   engine/sha2/pipe/stagegen[5].stageX/state<3>
                                                       engine/sha2/pipe/stagegen[5].stageX/o_state_32
    SLICE_X32Y164.A5     net (fanout=2)        4.629   engine/sha2/pipe/stagegen[5].stageX/o_state<32>
    SLICE_X32Y164.COUT   Topcya                0.395   engine/sha2/pipe/stagegen[6].stageX/t2<3>
                                                       engine/sha2/pipe/stagegen[6].stageX/Madd_e0[31]_maj[31]_add_2_OUT_lut<0>
                                                       engine/sha2/pipe/stagegen[6].stageX/Madd_e0[31]_maj[31]_add_2_OUT_cy<3>
    SLICE_X32Y165.CIN    net (fanout=1)        0.003   engine/sha2/pipe/stagegen[6].stageX/Madd_e0[31]_maj[31]_add_2_OUT_cy<3>
    SLICE_X32Y165.COUT   Tbyp                  0.076   engine/sha2/pipe/stagegen[6].stageX/t2<7>
                                                       engine/sha2/pipe/stagegen[6].stageX/Madd_e0[31]_maj[31]_add_2_OUT_cy<7>
    SLICE_X32Y166.CIN    net (fanout=1)        0.003   engine/sha2/pipe/stagegen[6].stageX/Madd_e0[31]_maj[31]_add_2_OUT_cy<7>
    SLICE_X32Y166.COUT   Tbyp                  0.076   engine/sha2/pipe/stagegen[6].stageX/t2<11>
                                                       engine/sha2/pipe/stagegen[6].stageX/Madd_e0[31]_maj[31]_add_2_OUT_cy<11>
    SLICE_X32Y167.CIN    net (fanout=1)        0.003   engine/sha2/pipe/stagegen[6].stageX/Madd_e0[31]_maj[31]_add_2_OUT_cy<11>
    SLICE_X32Y167.COUT   Tbyp                  0.076   engine/sha2/pipe/stagegen[6].stageX/t2<15>
                                                       engine/sha2/pipe/stagegen[6].stageX/Madd_e0[31]_maj[31]_add_2_OUT_cy<15>
    SLICE_X32Y168.CIN    net (fanout=1)        0.082   engine/sha2/pipe/stagegen[6].stageX/Madd_e0[31]_maj[31]_add_2_OUT_cy<15>
    SLICE_X32Y168.COUT   Tbyp                  0.076   engine/sha2/pipe/stagegen[6].stageX/t2<19>
                                                       engine/sha2/pipe/stagegen[6].stageX/Madd_e0[31]_maj[31]_add_2_OUT_cy<19>
    SLICE_X32Y169.CIN    net (fanout=1)        0.003   engine/sha2/pipe/stagegen[6].stageX/Madd_e0[31]_maj[31]_add_2_OUT_cy<19>
    SLICE_X32Y169.COUT   Tbyp                  0.076   engine/sha2/pipe/stagegen[6].stageX/t2<23>
                                                       engine/sha2/pipe/stagegen[6].stageX/Madd_e0[31]_maj[31]_add_2_OUT_cy<23>
    SLICE_X32Y170.CIN    net (fanout=1)        0.003   engine/sha2/pipe/stagegen[6].stageX/Madd_e0[31]_maj[31]_add_2_OUT_cy<23>
    SLICE_X32Y170.COUT   Tbyp                  0.076   engine/sha2/pipe/stagegen[6].stageX/t2<27>
                                                       engine/sha2/pipe/stagegen[6].stageX/Madd_e0[31]_maj[31]_add_2_OUT_cy<27>
    SLICE_X32Y171.CIN    net (fanout=1)        0.003   engine/sha2/pipe/stagegen[6].stageX/Madd_e0[31]_maj[31]_add_2_OUT_cy<27>
    SLICE_X32Y171.CLK    Tcinck                0.341   engine/sha2/pipe/stagegen[6].stageX/t2<31>
                                                       engine/sha2/pipe/stagegen[6].stageX/Madd_e0[31]_maj[31]_add_2_OUT_xor<31>
                                                       engine/sha2/pipe/stagegen[6].stageX/t2_30
    -------------------------------------------------  ---------------------------
    Total                                      6.382ns (1.653ns logic, 4.729ns route)
                                                       (25.9% logic, 74.1% route)
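
To cross-check the slack and clock uncertainty lines in the report above, here is a small Python sketch that evaluates the two formulas exactly as the report prints them. All input values are copied from the report; the function names are hypothetical.

```python
import math


def clock_uncertainty(tsj: float, tij: float, dj: float, pe: float) -> float:
    """((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE, all values in ns."""
    return (math.sqrt(tsj ** 2 + tij ** 2) + dj) / 2 + pe


def setup_slack(requirement: float, data_path: float,
                clock_skew: float, uncertainty: float) -> float:
    """requirement - (data path - clock path skew + uncertainty), in ns."""
    return requirement - (data_path - clock_skew + uncertainty)


# Values taken from the critical path section of the report above.
unc = clock_uncertainty(tsj=0.070, tij=0.000, dj=0.213, pe=0.000)
slack = setup_slack(requirement=6.666, data_path=6.382,
                    clock_skew=-0.021, uncertainty=0.142)

print(f"clock uncertainty: {unc:.4f} ns (report shows 0.142 ns)")
print(f"setup slack:       {slack:.3f} ns (report shows 0.121 ns)")
```

Note that the negative clock path skew counts against the path here, so it eats into the slack just like the uncertainty does.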

ArtForz
September 18, 2011, 09:19:54 AM
 #57

As a little encouragement, here's a decent run for my old design (ISE synth+map+p&r, letting synth infer shift regs, no placement constraints, ...) for -3 speed grade

Device Utilization Summary:

Slice Logic Utilization:
  Number of Slice Registers:                92,964 out of 184,304   50%
    Number used as Flip Flops:              92,819
    Number used as Latches:                      0
    Number used as Latch-thrus:                  0
    Number used as AND/OR logics:              145
  Number of Slice LUTs:                     62,141 out of  92,152   67%
    Number used as logic:                   34,288 out of  92,152   37%
      Number using O6 output only:          21,087
      Number using O5 output only:             424
      Number using O5 and O6:               12,777
      Number used as ROM:                        0
    Number used as Memory:                   2,721 out of  21,680   12%
      Number used as Dual Port RAM:              0
      Number used as Single Port RAM:            0
      Number used as Shift Register:         2,721
        Number using O6 output only:           450
        Number using O5 output only:             0
        Number using O5 and O6:              2,271
    Number used exclusively as route-thrus: 25,132
      Number with same-slice register load: 25,117
      Number with same-slice carry load:        15
      Number with other load:                    0

Slice Logic Distribution:
  Number of occupied Slices:                16,519 out of  23,038   71%
  Number of LUT Flip Flop pairs used:       62,163
    Number with an unused Flip Flop:         2,573 out of  62,163    4%
    Number with an unused LUT:                  22 out of  62,163    1%
    Number of fully used LUT-FF pairs:      59,568 out of  62,163   95%
    Number of slice register sites lost
      to control set restrictions:               0 out of 184,304    0%

...
Total REAL time to PAR completion: 19 mins 45 secs
Total CPU time to PAR completion: 20 mins 34 secs


Timing:
 ================================================================================
 Timing constraint: TS_coreclk = PERIOD TIMEGRP "tncoreclk" 182 MHz HIGH 50% INPUT_JITTER 0.2 ns;
  3102658 paths analyzed, 386321 endpoints analyzed, 0 failing endpoints
  0 timing errors detected. (0 setup errors, 0 hold errors, 0 component switching limit errors)
  Minimum period is   5.262ns.
 --------------------------------------------------------------------------------
 
 Paths for end point XLXI_A/rb20/regt1_31 (SLICE_X104Y33.CIN), 252 paths
 --------------------------------------------------------------------------------
 Slack (setup path):     0.232ns (requirement - (data path - clock path skew + uncertainty))
   Source:               XLXI_A/rb19/outE_17 (FF)
   Destination:          XLXI_A/rb20/regt1_31 (FF)
   Requirement:          5.494ns
   Data Path Delay:      4.928ns (Levels of Logic = 8)
   Clock Path Skew:      -0.111ns (0.620 - 0.731)
   Source Clock:         coreclk rising at 0.000ns
   Destination Clock:    coreclk rising at 5.494ns
   Clock Uncertainty:    0.223ns
 
   Clock Uncertainty:          0.223ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
     Total System Jitter (TSJ):  0.070ns
     Total Input Jitter (TIJ):   0.200ns
     Discrete Jitter (DJ):       0.233ns
     Phase Error (PE):           0.000ns
 
   Maximum Data Path at Slow Process Corner: XLXI_A/rb19/outE_17 to XLXI_A/rb20/regt1_31
     Location             Delay type         Delay(ns)  Physical Resource
                                                        Logical Resource(s)
     -------------------------------------------------  -------------------
     SLICE_X126Y28.BQ     Tcko                  0.408   XLXI_A/rb19/outE<19>
                                                        XLXI_A/rb19/outE_17
     SLICE_X115Y27.A4     net (fanout=8)        1.658   XLXI_A/rb19/outE<17>
     SLICE_X115Y27.A      Tilo                  0.259   XLXI_A/rb20/s1<6>
                                                        XLXI_A/rb20/s1<6>1
     SLICE_X104Y27.CX     net (fanout=1)        1.798   XLXI_A/rb20/s1<6>
     SLICE_X104Y27.COUT   Tcxcy                 0.093   XLXI_A/rb20/regt1<7>
                                                        XLXI_A/rb20/Madd_s1[31]_ch[31]_add_18_OUT_cy<7>
     SLICE_X104Y28.CIN    net (fanout=1)        0.003   XLXI_A/rb20/Madd_s1[31]_ch[31]_add_18_OUT_cy<7>
     SLICE_X104Y28.COUT   Tbyp                  0.076   XLXI_A/rb20/regt1<11>
                                                        XLXI_A/rb20/Madd_s1[31]_ch[31]_add_18_OUT_cy<11>
     SLICE_X104Y29.CIN    net (fanout=1)        0.003   XLXI_A/rb20/Madd_s1[31]_ch[31]_add_18_OUT_cy<11>
     SLICE_X104Y29.COUT   Tbyp                  0.076   XLXI_A/rb20/regt1<15>
                                                        XLXI_A/rb20/Madd_s1[31]_ch[31]_add_18_OUT_cy<15>
     SLICE_X104Y30.CIN    net (fanout=1)        0.003   XLXI_A/rb20/Madd_s1[31]_ch[31]_add_18_OUT_cy<15>
     SLICE_X104Y30.COUT   Tbyp                  0.076   XLXI_A/rb20/regt1<19>
                                                        XLXI_A/rb20/Madd_s1[31]_ch[31]_add_18_OUT_cy<19>
     SLICE_X104Y31.CIN    net (fanout=1)        0.003   XLXI_A/rb20/Madd_s1[31]_ch[31]_add_18_OUT_cy<19>
     SLICE_X104Y31.COUT   Tbyp                  0.076   XLXI_A/rb20/regt1<23>
                                                        XLXI_A/rb20/Madd_s1[31]_ch[31]_add_18_OUT_cy<23>
     SLICE_X104Y32.CIN    net (fanout=1)        0.003   XLXI_A/rb20/Madd_s1[31]_ch[31]_add_18_OUT_cy<23>
     SLICE_X104Y32.COUT   Tbyp                  0.076   XLXI_A/rb20/regt1<27>
                                                        XLXI_A/rb20/Madd_s1[31]_ch[31]_add_18_OUT_cy<27>
     SLICE_X104Y33.CIN    net (fanout=1)        0.003   XLXI_A/rb20/Madd_s1[31]_ch[31]_add_18_OUT_cy<27>
     SLICE_X104Y33.CLK    Tcinck                0.314   XLXI_A/rb20/regt1<31>
                                                        XLXI_A/rb20/Madd_s1[31]_ch[31]_add_18_OUT_xor<31>
                                                        XLXI_A/rb20/regt1_31
     -------------------------------------------------  ---------------------------
     Total                                      4.928ns (1.454ns logic, 3.474ns route)
                                                        (29.5% logic, 70.5% route)
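
The logic/route split in the total line can be reproduced by summing the delay column. A Python sketch with the delays copied from the path table above (cell delays as "logic", net delays as "route"); `summarize_path` is a hypothetical helper, and the minimum period is derived here simply as requirement minus slack.

```python
# Delays in ns, copied from the "Maximum Data Path" table above.
logic_delays = [0.408, 0.259, 0.093] + [0.076] * 5 + [0.314]  # Tcko, Tilo, Tcxcy, 5x Tbyp, Tcinck
route_delays = [1.658, 1.798] + [0.003] * 6                    # the "net" rows


def summarize_path(logic, route):
    """Return total delay and the logic/route percentage split."""
    logic_total = sum(logic)
    route_total = sum(route)
    total = logic_total + route_total
    return {
        "logic_ns": logic_total,
        "route_ns": route_total,
        "total_ns": total,
        "logic_pct": 100.0 * logic_total / total,
        "route_pct": 100.0 * route_total / total,
    }


summary = summarize_path(logic_delays, route_delays)
min_period_ns = 5.494 - 0.232  # requirement minus worst-case setup slack

print(f"logic {summary['logic_ns']:.3f} ns, route {summary['route_ns']:.3f} ns, "
      f"total {summary['total_ns']:.3f} ns")
print(f"split {summary['logic_pct']:.1f}% logic / {summary['route_pct']:.1f}% route")
print(f"minimum period {min_period_ns:.3f} ns")
```

Roughly 70% of the critical path is routing, almost all of it in the two long nets ahead of the carry chain, which is why placement help matters more than shaving levels of logic.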

ArtForz
September 18, 2011, 09:48:35 AM
 #58

And here the exact same HDL/settings but using ISE 13.2 and tightening timing a bit:

Total REAL time to MAP completion:  30 mins 17 secs
Total CPU time to MAP completion:   28 mins 23 secs

Slice Logic Utilization:
  Number of Slice Registers:                92,968 out of 184,304   50%
    Number used as Flip Flops:              92,823
    Number used as Latches:                      0
    Number used as Latch-thrus:                  0
    Number used as AND/OR logics:              145
  Number of Slice LUTs:                     60,406 out of  92,152   65%
    Number used as logic:                   34,257 out of  92,152   37%
      Number using O6 output only:          21,087
      Number using O5 output only:             409
      Number using O5 and O6:               12,761
      Number used as ROM:                        0
    Number used as Memory:                   2,721 out of  21,680   12%
      Number used as Dual Port RAM:              0
      Number used as Single Port RAM:            0
      Number used as Shift Register:         2,721
        Number using O6 output only:           450
        Number using O5 output only:             0
        Number using O5 and O6:              2,271
    Number used exclusively as route-thrus: 23,428
      Number with same-slice register load: 23,414
      Number with same-slice carry load:        14
      Number with other load:                    0

Slice Logic Distribution:
  Number of occupied Slices:                15,446 out of  23,038   67%
  Number of LUT Flip Flop pairs used:       60,460
    Number with an unused Flip Flop:           868 out of  60,460    1%
    Number with an unused LUT:                  54 out of  60,460    1%
    Number of fully used LUT-FF pairs:      59,538 out of  60,460   98%
    Number of slice register sites lost
      to control set restrictions:               0 out of 184,304    0%

Total REAL time to Router completion: 25 mins 23 secs
Total CPU time to Router completion: 24 mins 26 secs


----------------------------------------------------------------------------------------------------------
  Constraint                                |    Check    | Worst Case |  Best Case | Timing |   Timing   
                                            |             |    Slack   | Achievable | Errors |    Score   
----------------------------------------------------------------------------------------------------------
  TS_coreclk = PERIOD TIMEGRP "tncoreclk" 1 | SETUP       |     0.233ns|     5.172ns|       0|           0
  85 MHz HIGH 50% INPUT_JITTER 0.2 ns       | HOLD        |     0.316ns|            |       0|           0

rph
September 18, 2011, 10:45:11 PM
Last edit: September 18, 2011, 11:08:33 PM by rph
 #59

Interesting. 5.172ns == 193MHz!
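
The conversion, for reference, as a tiny Python sketch (the helper names are made up):

```python
def period_ns_to_mhz(period_ns: float) -> float:
    """Clock frequency in MHz for a given period in ns (f = 1000 / T)."""
    return 1000.0 / period_ns


def mhz_to_period_ns(freq_mhz: float) -> float:
    """Clock period in ns for a given frequency in MHz."""
    return 1000.0 / freq_mhz


best_case_mhz = period_ns_to_mhz(5.172)    # "Best Case Achievable" from the table above
constraint_ns = mhz_to_period_ns(185.0)    # the 185 MHz PERIOD constraint

print(f"5.172 ns -> {best_case_mhz:.1f} MHz")
print(f"185 MHz  -> {constraint_ns:.3f} ns")
```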

Thanks for the data.

-rph

ArtForz
September 19, 2011, 05:53:31 PM
 #60

Yep, and that's with 200ps clock jitter, your assumed 0-jitter clock would knock it down to 5.091ns cycle time => a bit over 196MHz :D
On my real world rev1.1 boards with -2 speed grade, 1230-1250mV Vccint and 25°C ambient, this bitstream averages 193.9 MHz at very low error rate (0 errors over 2**35 hashes).
Pushing up error rate to 0.1% => 198.3MHz average.
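
Whether the extra clock is worth the errors can be estimated with a quick Python sketch. It assumes one hash result per clock cycle and that a bad result is simply lost; both are simplifying assumptions, and `effective_mhs` is a hypothetical helper.

```python
def effective_mhs(clock_mhz: float, error_rate: float,
                  hashes_per_clock: float = 1.0) -> float:
    """Valid hashes per second (in MH/s) after discarding erroneous results."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    return clock_mhz * hashes_per_clock * (1.0 - error_rate)


# Clock averages and error rates quoted above.
clean = effective_mhs(193.9, 0.0)      # effectively error-free operating point
pushed = effective_mhs(198.3, 0.001)   # 0.1% error rate

print(f"clean:  {clean:.1f} MH/s")
print(f"pushed: {pushed:.1f} MH/s")
```

Under these assumptions a 0.1% error rate costs far less than the roughly 2% gained in clock, so the higher setting still comes out ahead.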
