Bitcoin Forum
June 22, 2024, 12:41:39 PM *
  Show Posts
Pages: « 1 [2]
21  Other / Off-topic / Re: Diablo Mining Company will never buy Butterfly Labs hardware on: July 05, 2012, 10:30:03 PM
What do you think a Kintex 7 480 would do in MH/s and MH/W? Any insight into Artix 7s? They are supposed to slot into the Spartan 6 space once available in good volume.

I haven't tried mining on the 480s, only on V6 240s.  The reason behind this is I have a handful of unused boards that have V6s on them, while the K7 boards are all used for my real work.

I made an entry in the mining hardware comparison long ago:

The V6 240s run at 375 MH/s at about 16 W.  This isn't optimized (other than using some DSP48s), so you could probably squeeze another 50-100 MH/s out of the devices with some effort.

The MH/s based on the size of the device breaks down as follows:

375MH/s / 240 = 1.5625 MH/s per "size unit"

Based on this, I would estimate the 480 could do:

480 * 1.5625 = 750 MH/s

I haven't used Xilinx's power estimator for a K7 bitcoin design, but Xilinx claims 50% less power than the V6.

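So, you could assume your worst case would be 32 W (double the V6 240's 16 W for double the size, with no savings) and your best case 16 W (the full 50% reduction Xilinx claims).  Your actual power will probably land somewhere in the middle.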
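The scaling estimate above can be sketched in a few lines of Python. This is only a rough model under the post's own assumption that hashrate scales linearly with the device "size unit", and that power lands somewhere between no savings and Xilinx's claimed 50%. The function name `estimate_device` is made up for illustration.

```python
# Rough linear-scaling sketch; the figures are the ones quoted in the post.
V6_SIZE = 240          # V6 240 "size units"
V6_MHS = 375.0         # measured hashrate on the V6 240, MH/s
V6_WATTS = 16.0        # approximate power draw on the V6 240, W


def estimate_device(size, claimed_power_saving=0.5):
    """Return (MH/s, best-case W, worst-case W) for a device of `size` units.

    Assumes hashrate and power both scale linearly with size (an assumption,
    not a measurement); the best case applies the vendor's claimed saving.
    """
    mhs_per_unit = V6_MHS / V6_SIZE
    mhs = size * mhs_per_unit
    worst_watts = V6_WATTS * size / V6_SIZE
    best_watts = worst_watts * (1.0 - claimed_power_saving)
    return mhs, best_watts, worst_watts


mhs, best_w, worst_w = estimate_device(480)
print(f"K7 480 estimate: {mhs:.0f} MH/s at {best_w:.0f}-{worst_w:.0f} W")
```

Changing `claimed_power_saving` shows how sensitive the MH/W figure is to the vendor's power claim, which is the least certain input here.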

BUT!!  You may think the Kintex 7 would be better for mining (price/performance) than a Spartan or Artix device, but you'd be wrong!

Take the Mod miner for example:

840 MH/s @ 40w in 4 x Spartan 150 devices

840 MH/s / (4 * 150) = 1.4 MH/s per "size unit"

So, the S6 is a little less efficient in terms of size/performance (partly because I used DSP48s in the V6 example to reduce logic usage, but there are other factors), but the price difference is huge.  I think the S6 LX150 is ~$100, so there's no real good reason to buy the K7 or V6.  The K7 and V6 provide more advanced functionality (high speed serial, more internal memory, more DSP slices, more pins, etc) that aren't required for bitcoin mining.  You'd end up paying for features you don't use.
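For anyone who wants to redo the comparison with their own numbers, here is a small Python sketch of the density and efficiency arithmetic. It uses only figures quoted in this thread; the ~$100 LX150 price is the rough guess from the post, not a quoted price, and board, power supply and cooling costs are ignored.

```python
# Compare hashing density and efficiency using figures quoted in the thread.
devices = {
    # name: (MH/s, total size units, watts)
    "V6 240 (measured)": (375.0, 240, 16.0),
    "ModMiner, 4 x S6 150": (840.0, 4 * 150, 40.0),
}


def density_and_efficiency(mhs, size_units, watts):
    """Return (MH/s per size unit, MH/s per watt)."""
    return mhs / size_units, mhs / watts


results = {name: density_and_efficiency(*spec) for name, spec in devices.items()}
for name, (per_unit, per_watt) in results.items():
    print(f"{name}: {per_unit:.4f} MH/s per unit, {per_watt:.2f} MH/s per W")

# Chip cost per MH/s for the S6 route, assuming the ~$100 LX150 guess above.
S6_CHIP_USD = 100.0    # rough guess from the post, not a quoted price
s6_usd_per_mhs = (4 * S6_CHIP_USD) / 840.0
print(f"S6 chips alone: ${s6_usd_per_mhs:.2f} per MH/s")
```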

Once the Artix devices come out in force, I suspect they will be similar in price/performance to the S6.  Power consumption will be somewhere between 50% less and the same as the S6.  However, they are the last devices to go into production (after the K7 and V7).
22  Other / Off-topic / Re: Diablo Mining Company will never buy Butterfly Labs hardware on: July 05, 2012, 09:56:54 PM
While performance could come close, a Virtex 7 2000 can never compete with the price of BFL's ASIC, and at the end of the day, the main concern is how quickly you can pay off the hardware.

And let's not mention power consumption. 'Per chip' is an utterly meaningless metric, particularly when per-GH cost and power efficiency of ASICs can be two orders of magnitude better for any given process.

Anyway, there is no point trying to talk sense in to D3D.


BFL has not stated the power consumption of their SC minirigs, and their chips might be 130 or 90nm fab, while Virtex 7 is 28nm.

Even if Virtex 7s or Kintex 7s cost 3x more per MH than what BFL claims theirs do, so what? At least the vendors of existing Spartan 6 products have actually shown themselves to be committed to their relationship with the Bitcoin community, and they will most likely be the ones making Kintex/Virtex 7 products.

I have no idea how much they cost, but judging by the real cost of BFL's FPGAs in the MiniRig, and the fact that 28nm is very new so will be very expensive with no 2nd hand market, I would estimate at best you are looking at $1500 for 5GH Virtex7 vs $300 for 7GH with 2 ASIC coffee warmers.

Most likely BFL will miss the October date.

Let's say the Virtex 7 has 13x more usable slices and can be clocked at 2x what Spartan 6s are stable at; that's around 7.2 GH/s. I think someone on the forums said they'd cost about $1000 each, so we're looking at, say, a $1200 product price. That's only 4x more expensive per MH.

Rolling this as a SASIC could drop the price even more and also increase the stable clock speed 3-4x over Spartan 6s, or 10.8 - 14.4 GH/s for, say, a $1000 product. That's in the same ballpark as what BFL is claiming.

Whoa, there's NO way the V7-2000 part will cost $1000.  As you may know, the price of the chips really depends on the so-called "relationship" between Xilinx and your company.  The company must negotiate a discount based on its size, volume and other "factors".

I work for a large company that is a huge user of Xilinx FPGAs (ranging from small devices to large ones).  We're currently using Kintex 7 devices on a few projects.  For example, the production device cost of a Kintex 7 480 device for us is about $900 in quantities of at least 1000 per year.  This price goes down as the device gets older.  The price to a small-time company will probably be higher because their overall volume of all devices combined will be much lower.

However, based on the pricing Xilinx has shown us for the Virtex 7 devices, I would expect the low volume price of the V7 2000 device (their largest ever) to be at least $10,000.  The low volume price for the largest V6 device right now is over $12k.  I'd guess once the V7-2000T finally goes into production, even if you could get the deep discounts, you'd probably have to pay $3-5k each.

The other problem is that the bigger the devices get, the harder it is to get a good yield.  The huge V7 2000 is 2 to 4 (I forget the exact number) separate devices glued together.  The yields can prevent these chips from being produced in high volume, so supply can also be an issue in addition to price.  Usually these chips go to the biggest customers first, since they are the most important accounts.

I believe that the best we can hope for from FPGAs is to find the sweet spot in price/performance and use multiple devices.  This is how all the FPGA mining products are set up today.  I believe that the price/performance ratio won't get much better than what is available from the FPGA mining solutions today.

Xilinx's equivalent of HardCopy, EasyPath, costs $300k upfront right off the bat for a 35% cost reduction over your current FPGA pricing...  So even this path can be difficult.
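To put that upfront cost in perspective, here is a hedged break-even sketch in Python. The $300k and 35% figures come from the post; the $900 unit price is an assumption borrowed from the Kintex 7 480 volume price mentioned earlier, so a smaller customer paying more per device would break even sooner. The helper name `break_even_units` is illustrative only.

```python
import math

EASYPATH_UPFRONT_USD = 300_000   # upfront cost quoted in the post
COST_REDUCTION_PCT = 35          # per-device reduction quoted in the post
UNIT_PRICE_USD = 900             # assumption: the K7 480 volume price quoted earlier


def break_even_units(upfront, unit_price, reduction_pct):
    """Smallest number of devices whose per-unit savings cover the upfront cost."""
    saving_per_unit = unit_price * reduction_pct / 100
    return math.ceil(upfront / saving_per_unit)


units = break_even_units(EASYPATH_UPFRONT_USD, UNIT_PRICE_USD, COST_REDUCTION_PCT)
print(f"Break-even after {units} devices")
```

Under these assumptions the savings only cover the upfront fee after roughly 950 devices, before any board or assembly costs are counted.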
23  Economy / Currency exchange / Re: FastCash4Bitcoins: we buy your coins for cash TODAY! (Update: Dwolla in stock) on: June 08, 2012, 06:38:45 AM
+1 Sent some Dwolla USD and some BTC and completed the transaction quickly!  Will be doing more business with Gerald in the future!
24  Economy / Currency exchange / Re: UPDATE: FastCash4Bitcoins now has VISA gift cards for online/offline use. on: June 07, 2012, 04:54:37 AM
How do we conduct a transaction?  Via PM?
25  Bitcoin / Bitcoin Discussion / Re: [ANN] New AML Policy for all Dwolla users. on: May 28, 2012, 12:42:19 AM
Has anybody sent in their DL and a utility bill to make a Dwolla withdrawal since May 25th?  I'm somewhat wary of sending my personal info to Mt. Gox (What will happen to my info after the fact).
26  Bitcoin / Hardware / Re: Official Open Source FPGA Bitcoin Miner (Spartan-6 Now Tops Performance per $!) on: November 22, 2011, 07:07:02 PM
If I didn't use the DSP48s, I could only fit 2 copies of the unrolled code.  I didn't try to optimize the code in any other way.
Thank you very much for your valuable input. If you have a moment, could you please post a snippet of HDL code that shows how you convinced ISE DS to use DSP48s for adders? Does ISE have some flag to make it infer DSP48s from additions? Or did you have to explicitly instantiate them?

Since my last post in this thread I learned a lot about ISE software. The license is node-locked to the Ethernet MAC address using standard FlexLM technology. So it allows for designing on one system and running the design on another system. I was afraid of a node-locking technology that would require connecting the ML605 board to the system that runs ISE to allow it to check the license.

Also, would you dare to speculate what the initial pricing on the Kintex-7 KC705 evaluation kit will be? I hesitate to buy an ML605 right now because I could not really start working on it immediately due to the need to reorganize and remodel my physical workspace. On the other hand, I'm completely fascinated with contemporary FPGA design after a long break from doing any hardware-oriented design.


Hi, here's a section from the sha256_transform.v.  I picked one of the larger adders to replace with DSPs to help preserve logic resources.

I replaced a 4 input adder with a cascade of 2 DSPs.  I used coregen to generate two different DSP instances: a 2 input adder (dsp_2_input) that drives its dedicated cascade output (pcout), and a 3 input adder (dsp_3_input_cascade) that takes that value in on its dedicated cascade input (pcin).

I know you can ask ISE to infer DSP48s, but I think that's more a shotgun approach that I've never had much luck with.

Code:
//////////////// Begin DSP adder new_w ////////////
// Original 4-input adder built from fabric logic:
//wire [31:0] new_w = s1_w + rx_w[319:288] + s0_w + rx_w[31:0];
wire [31:0] new_w;

wire [47:0] new_w_stage1_pcout;
wire [47:0] new_w_stage2_out;

// Stage 1: s1_w + rx_w[319:288], result leaves on the cascade output
dsp_2_input new_w_stage1 (
        .c(s1_w), // input [31 : 0] c
        .concat(rx_w[319:288]), // input [31 : 0] concat
        .pcout(new_w_stage1_pcout), // output [47 : 0] pcout
        .p()); // output [31 : 0] p

// Stage 2: cascade input + s0_w + rx_w[31:0]
dsp_3_input_cascade new_w_stage2 (
        .pcin(new_w_stage1_pcout), // input [47 : 0] pcin
        .c(s0_w), // input [31 : 0] c
        .concat(rx_w[31:0]), // input [31 : 0] concat
        .pcout(), // output [47 : 0] pcout
        .p(new_w_stage2_out)); // output [47 : 0] p

// Only the low 32 bits are kept, matching the original 32-bit sum
assign new_w = new_w_stage2_out[31:0];

/////////////// End DSP adder /////////////
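As a sanity check on the cascade above, here is a small behavioural model in Python (not HDL, and not generated by coregen). It assumes each DSP instance simply adds its inputs into a 48-bit result, which is how the snippet uses them. The function names mirror the instance names, but their bodies are my own simplification.

```python
import random

MASK32 = (1 << 32) - 1
MASK48 = (1 << 48) - 1


def dsp_2_input(c, concat):
    """Model of stage 1: two 32-bit operands summed into a 48-bit cascade value."""
    return (c + concat) & MASK48


def dsp_3_input_cascade(pcin, c, concat):
    """Model of stage 2: cascade input plus two more 32-bit operands, 48 bits wide."""
    return (pcin + c + concat) & MASK48


def new_w(s1_w, w_hi, s0_w, w_lo):
    """Cascaded sum truncated to 32 bits, like new_w_stage2_out[31:0]."""
    pcout = dsp_2_input(s1_w, w_hi)
    return dsp_3_input_cascade(pcout, s0_w, w_lo) & MASK32


# Compare against the original single-expression adder, modulo 2**32.
rng = random.Random(0)
for _ in range(1000):
    a, b, c, d = (rng.getrandbits(32) for _ in range(4))
    assert new_w(a, b, c, d) == (a + b + c + d) & MASK32
print("cascade matches the 4-input adder")
```

The sum of four 32-bit values needs at most 34 bits, so nothing is lost in the 48-bit cascade before the final truncation.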

Now as far as the KC705 boards, I am guessing that they will be about the same price as the ML605s, around $2000.  Now, you have to watch out because the first runs of the boards are going to be ES parts (there can be bugs).

At work, we have actually already built boards (not for bitcoin of course!) using ES K7 325T devices.  I haven't found any huge speed advantages over the V6 devices, so I would not expect any huge frequency increases.  However, the device that's going on the KC705 board is going to be larger than the one on the ML605 (240 vs 325?), so you'll have more room for your design.  However, the number of DSP48s is about the same.

As far as porting the code from V6 to K7, if you took the bitcoin code as is, it would build for both just fine.  Except for pin changes of course.  If you started using DSP48s for the V6, you'd probably have to regenerate those for the K7, but that's not a big deal.
27  Bitcoin / Hardware / Re: Official Open Source FPGA Bitcoin Miner (Spartan-6 Now Tops Performance per $!) on: November 21, 2011, 11:28:11 PM
I have a quick question for those familiar with Xilinx Virtex-6 chips:

I have a different application that requires double SHA256 of a 256-bit string, thus the typical mining optimizations don't apply. Would the fully unrolled all-combinatorial-logic hasher fit into the XC6VLX240T-1FFG1156 that is included in the ML605 evaluation board? Please disregard any speed issues. At this moment I'm only concerned with the correctness of the implementation and being able to use my old VHDL files. The goal is to reproduce the defects in some faulty silicon of historical value.

With quite some difficulty I installed the evaluation ISE_DS on my Ubuntu 10.04.3 and even managed to start the Xilinx FPGA Editor, which uses old Motif libraries. But attempting to do any implementation on a Virtex-6 device on my 4GB RAM laptop is hopeless; it goes deep into a swapping storm.

I tried to understand the modifications that somebody made to get this miner to run on the ML605. It apparently had 3 hashing cores, but I'm unclear if the DSP48E1 use was a requirement or a choice. I'm also unclear if the 3 hashing cores were 3*single-SHA256 or 3*double-SHA256.

Thanks for any pointers you may have.


Hi,

I am the one who put 3 copies of the fully unrolled cores in the LX240T on the ML605.  I had to use the DSP48Es to get it to fit.  If I didn't use the DSP48s, I could only fit 2 copies of the unrolled code.  I didn't try to optimize the code in any other way.  So, the answer to your question is yes, there should be no problem with a single instance of the double SHA256 core fitting into the ML605.

I also found that more than 4 GB of memory was used when building with 3 copies of the fully unrolled code.  I'm using some older version of Red Hat for my development environment.  If you're running a 64 bit version of the application, upgrading to 8 GB of memory should get you going just fine.

As far as your ISE license question, I think the ISE might only be licensed to produce bitstreams for the specific device on the ML605.
28  Bitcoin / Hardware / Re: Official Open Source FPGA Bitcoin Miner (Spartan-6 Now Tops Performance per $!) on: August 14, 2011, 11:40:18 PM
Quote
200MH is simply way out of the question for an S6-LX150.
That won't stop me from trying ;D

As far as I can tell with the poking around I've done so far, the current bottleneck on the S6-LX150 is the far dependencies caused by the W calculations. These references make it so that the rounds are not isolated, and so cannot be routed into a uniform chain. This forces ISE to do completely absurd routing, splattering the placement of a round's components across a good 1/4th of the chip. And that, obviously, leads to massive routing delays. On my last few compiles, the worst-case paths were >80% routing (8ns+ of routing, with 2ns of logic).
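A quick back-of-the-envelope check of those timing numbers, as a Python sketch. The 8 ns and 2 ns figures are the approximate worst-case values from the post. The link between clock and hashrate assumes a single fully unrolled core producing one hash per clock, which is my assumption and not something stated in the quoted text.

```python
ROUTING_NS = 8.0     # approximate routing delay on the worst-case path (from the post)
LOGIC_NS = 2.0       # approximate logic delay on the same path (from the post)
TARGET_MHZ = 200.0   # assumption: one hash per clock, so 200 MH/s needs 200 MHz


def fmax_mhz(path_ns):
    """Maximum clock in MHz for a critical path of `path_ns` nanoseconds."""
    return 1000.0 / path_ns


path_ns = ROUTING_NS + LOGIC_NS
fmax = fmax_mhz(path_ns)
routing_share = ROUTING_NS / path_ns

# Routing delay left over at the target clock if the logic delay stays the same.
routing_budget_ns = 1000.0 / TARGET_MHZ - LOGIC_NS

print(f"fmax ~ {fmax:.0f} MHz, routing is {routing_share:.0%} of the path")
print(f"{TARGET_MHZ:.0f} MHz leaves ~{routing_budget_ns:.0f} ns for routing")
```

Under these assumptions, hitting the target is almost entirely a placement and routing problem: the logic delay already fits, but routing would have to shrink from about 8 ns to about 3 ns.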

Yeah, it looks like a "giant snake" that traverses the chip :D

Quote
The current critical path is approximately two 3-way 32-bit adders implemented as 16 total slices, thanks to the Spartan-6 fast carry look ahead chains. Is there a means of optimizing that logic that I have missed?

These are the adders that I tried to move into DSP48s, as they have dedicated carry paths to and from adjacent DSPs in a column.  I didn't look at how to optimize the actual math/operations though.
29  Bitcoin / Hardware / Re: Official Open Source FPGA Bitcoin Miner (Spartan-6 Now Tops Performance per $!) on: August 14, 2011, 08:00:50 PM
I don't know the usage for the single unrolled core on a S6 150, but here's the usage from my design in the V6 240T using 2 cores.  I'm guessing you'll want to use all 180 of those DSP48s to reduce the logic usage.  The number of slices and LUTs used is going to be close to your maximum capacity, but with 372 (186 per core) DSP48s used.

Code:
Device Utilization Summary:

Slice Logic Utilization:
  Number of Slice Registers:               101,697 out of 301,440   33%
    Number used as Flip Flops:              98,581
    Number used as Latches:                      1
    Number used as Latch-thrus:                  0
    Number used as AND/OR logics:            3,115
  Number of Slice LUTs:                     88,763 out of 150,720   58%
    Number used as logic:                   67,920 out of 150,720   45%
      Number using O6 output only:          32,057
      Number using O5 output only:           1,667
      Number using O5 and O6:               34,196
      Number used as ROM:                        0
    Number used as Memory:                   9,892 out of  58,400   16%
      Number used as Dual Port RAM:              0
      Number used as Single Port RAM:            0
      Number used as Shift Register:         9,892
        Number using O6 output only:         7,362
        Number using O5 output only:             0
        Number using O5 and O6:              2,530
    Number used exclusively as route-thrus: 10,951
      Number with same-slice register load: 10,889
      Number with same-slice carry load:        62
      Number with other load:                    0

Slice Logic Distribution:
  Number of occupied Slices:                27,898 out of  37,680   74%
  Number of LUT Flip Flop pairs used:      105,799
    Number with an unused Flip Flop:        28,962 out of 105,799   27%
    Number with an unused LUT:              17,036 out of 105,799   16%
    Number of fully used LUT-FF pairs:      59,801 out of 105,799   56%
    Number of slice register sites lost
      to control set restrictions:               0 out of 301,440    0%

Specific Feature Utilization:
  Number of RAMB36E1/FIFO36E1s:                 40 out of     416    9%
    Number using RAMB36E1 only:                 40
    Number using FIFO36E1 only:                  0
  Number of RAMB18E1/FIFO18E1s:                  0 out of     832    0%
  Number of BUFG/BUFGCTRLs:                      5 out of      32   15%
    Number used as BUFGs:                        5
    Number used as BUFGCTRLs:                    0
  Number of ILOGICE1/ISERDESE1s:                 0 out of     720    0%
  Number of OLOGICE1/OSERDESE1s:                 0 out of     720    0%
  Number of BSCANs:                              0 out of       4    0%
  Number of BUFHCEs:                             0 out of     144    0%
  Number of BUFIODQSs:                           0 out of      72    0%
  Number of BUFRs:                               0 out of      36    0%
  Number of CAPTUREs:                            0 out of       1    0%
  Number of DSP48E1s:                          372 out of     768   48%
  Number of EFUSE_USRs:                          0 out of       1    0%
  Number of FRAME_ECCs:                          0 out of       1    0%
  Number of GTXE1s:                              4 out of      20   20%
    Number of LOCed GTXE1s:                      4 out of       4  100%
  Number of IBUFDS_GTXE1s:                       1 out of      12    8%
  Number of ICAPs:                               0 out of       2    0%
  Number of IDELAYCTRLs:                         0 out of      18    0%
  Number of IODELAYE1s:                          0 out of     720    0%
  Number of MMCM_ADVs:                           1 out of      12    8%
  Number of PCIE_2_0s:                           1 out of       2   50%
    Number of LOCed PCIE_2_0s:                   1 out of       1  100%
  Number of STARTUPs:                            1 out of       1  100%
  Number of SYSMONs:                             0 out of       1    0%
  Number of TEMAC_SINGLEs:                       0 out of       4    0%
30  Bitcoin / Hardware / Re: Official Open Source FPGA Bitcoin Miner (Spartan-6 Now Tops Performance per $!) on: August 14, 2011, 04:37:59 AM
Can you humor me with a guess of how much this hardware would cost, iidx?

2250 Mhash/s- Wow! I wish I had that

It's pretty unreasonable, I think each of the cards was $2000 when I bought them about 6 months ago for a project.  Xilinx has "generously" reduced the price to $1795 now...  Not sure what the raw chip prices are, maybe $500 each in volume.

Quote
Another group at my company has custom emulation platforms with more than 50 (!!) Virtex 5 parts each. I wish I could spend some quality time with one of those and make it slave away in the Bitcoin mines, but sadly, that's not going to happen. I could get away with stuff like that when I was working for a little startup instead of a megacorp, but then we couldn't afford toys like that back then.

Wow, 50 devices!!  Maybe you can ask if you can do some performance testing for that group :D

I actually have another board that has a Virtex 5 on it (ML555) that I thought about also trying to use.  However, it has a pretty small device and no heatsink, so it probably would be a bad idea and yield 125-150 MHash at best.
31  Bitcoin / Hardware / Re: Official Open Source FPGA Bitcoin Miner (Spartan-6 Now Tops Performance per $!) on: August 14, 2011, 02:57:42 AM
Hi Guys,

I had a bunch of spare stuff laying around at work, so I whipped up a mining configuration as an exercise.

Supplies:
6 Xilinx ML605 cards (XC6VLX240T, Virtex 6)
PCIe switch development kit (an external board that connects to your PC and has a ton of PCIe slots)
TI USB to GPIO pod (for the ML605 power supplies)

Starting with the Xilinx Verilog port of the code, I found that I could fit 2 instances of the LOOP = 0 core.  However, that wasn't enough for me.  I figured that if I used the DSP48s in the Virtex 6, I could fit at least 1 more in there.  With the 3 cores, I am using about 558 of the hardware multipliers.  Sadly, there aren't enough to fit a 4th in there.  I may be able to fit it in there if I use a few fewer hardware multipliers, but it will be tight.  I ended up running the cores at 125 MHz, because that's the same speed my PCIe internal interface runs at.  There is more headroom available, and 150 MHz is probably doable, but power will start to become a concern.

It turned out that the ML605's power supplies could supply enough power to run 3 cores, but the digital power managers were not set to allow the full rated current.  I re-programmed the power managers to allow for the rated current in order to get the 3 core version of my design working.  The designs use about 16-18 W of power and 16-17 A on the VCCint rail (1 V).  I had to supply additional cooling to make sure the power supplies didn't overheat (they have no cooling normally).

Next, I had to connect it to my PC for mining data.  Now, of course 6 serial ports wasn't going to be the most elegant solution (and my PC actually had no serial ports).  I used an off the shelf PCIe core in conjunction with the Xilinx hard IP to connect the 3 bitcoin cores to the PC.  Sadly, the PCIe core is a licensed product, so I won't be able to share the source here.

The hardest part for me was last - I had to figure out what data to get from a mining pool, what to do with it and how to get it in the card.  I found some open source C# mining libraries (I need to credit the guy, but I don't have the code in front of me), modified it and wrote a mining program to feed and poll all of the cards.  It was a pain in the ass to get that finally working, but through analyzing a bunch of different mining software I figured it out.

But finally, my experiment is working @ 2250 Mhash/s and about 100w!  The cost is out of control, but since I had these cards laying around from other experiments, I figured I'd give it a shot.
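For reference, the headline number falls straight out of the configuration described above. This Python sketch assumes each fully unrolled (LOOP = 0) core produces one hash per clock, which is consistent with the reported total but is my reading of the setup; the constant names are illustrative.

```python
CARDS = 6              # ML605 boards in the rig
CORES_PER_CARD = 3     # fully unrolled cores per XC6VLX240T
CLOCK_MHZ = 125        # core clock, shared with the PCIe interface
HASHES_PER_CLOCK = 1   # assumption: one hash per clock per unrolled core

total_mhs = CARDS * CORES_PER_CARD * CLOCK_MHZ * HASHES_PER_CLOCK

# Power range from the 16-18 W per design figure quoted above.
watts_low, watts_high = CARDS * 16, CARDS * 18

print(f"{total_mhs} MH/s at roughly {watts_low}-{watts_high} W")
print(f"~{total_mhs / watts_high:.1f}-{total_mhs / watts_low:.1f} MH/s per W")
```

The same arithmetic shows what the 150 MHz headroom would buy: set `CLOCK_MHZ = 150` and the total becomes 2700 MH/s, power permitting.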

I'd be happy to contribute the modifications that use the DSP48s, but I can't actually distribute the PCIe DMA/PIO engine since it's licensed (not from Xilinx).  I'm happy to distribute the source for the software too, since I didn't really find a C#/.NET Windows version that suited my needs.

Questions and comments welcome!
32  Other / Beginners & Help / Re: -d in POCLBM doesnt wanna work on: August 14, 2011, 12:47:03 AM
You still need to execute poclbm.  In that last screenshot you didn't type "poclbm -d", you just typed -d.
33  Other / Beginners & Help / Re: Whitelist Requests (Want out of here?) on: August 14, 2011, 12:25:27 AM
Hi there,

I'm interested in bypassing the 5 post 4 hour limit to contribute to the FPGA mining thread.  I have made some modifications to the publicly available FPGA code (and software), and wanted to discuss it with the developers before attempting to commit it to their repository.

I've added an entry to https://en.bitcoin.it/wiki/Mining_hardware_comparison#FPGA_Devices that compares my contribution (see Xilinx XC6VLX240T-1FFG1156) to the other submissions.  While it won't win any $/MHash awards, it might be useful for anybody with the same development boards laying around at work.

Thanks!
34  Other / Beginners & Help / Re: Newbie restrictions on: August 14, 2011, 12:10:17 AM
I'm interested in posting in the FPGA miner thread.  I made some changes to the code and have a version running on some extra cards I have at work.  If possible, I'd like to bypass the 5 post, 4 hour limit.

Or I'll just have to make a few random posts instead ;D

You're in the wrong thread.


I see - I missed the white list thread, thanks!
35  Other / Beginners & Help / Re: Newbie restrictions on: August 13, 2011, 09:38:49 AM
I'm interested in posting in the FPGA miner thread.  I made some changes to the code and have a version running on some extra cards I have at work.  If possible, I'd like to bypass the 5 post, 4 hour limit.

Or I'll just have to make a few random posts instead ;D