Bitcoin Forum
Pages: « 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 [27] 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 »
Author Topic: Official Open Source FPGA Bitcoin Miner (Last Update: April 14th, 2013)  (Read 432886 times)
teknohog
August 28, 2011, 06:15:01 PM
Last edit: August 30, 2011, 04:25:45 PM by teknohog
 #521

I've been discussing building a cluster of cheap FPGAs with jonand, and I've got the basic idea working:

https://github.com/teknohog/Open-Source-FPGA-Bitcoin-Miner/tree/master/projects/DE2_115_cluster

It is a "cluster" of two miners on a single FPGA, but the links are asynchronous serial ports, so it could just as well be distributed. There is a hub that distributes work to miners and consolidates results, so a single serial port could drive a large number of FPGAs.

This will take some work to make everything smooth, but I think the basic idea is valid. For example, there is an if-else construct that should be rewritten (with generate?) for any number of miners.

Edit: the code is verified to work with an unholy alliance of one Xilinx and one Altera chip.
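
How the hub divides work is not spelled out in this post, so the following is only a sketch of one plausible scheme: hand each miner a contiguous slice of the 32-bit nonce space. The function name `split_nonce_space` and the contiguous-slice policy are assumptions for illustration, not taken from the DE2_115_cluster code. Looping over the miner count like this is the software analogue of replacing a hand-written if-else with a generate loop.

```python
def split_nonce_space(num_miners, nonce_bits=32):
    """Split the nonce space into num_miners contiguous (first, last) ranges.

    Hypothetical helper: the real hub may use a different policy.
    """
    if num_miners < 1:
        raise ValueError("need at least one miner")
    total = 1 << nonce_bits
    base, extra = divmod(total, num_miners)
    ranges = []
    start = 0
    for miner in range(num_miners):
        # The first `extra` miners take one additional nonce so nothing is left over.
        size = base + (1 if miner < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


if __name__ == "__main__":
    for miner, (first, last) in enumerate(split_nonce_space(2)):
        print(f"miner {miner}: {first:#010x}..{last:#010x}")
```

With two miners this gives each one half of the nonce space; with a miner count that does not divide 2^32 evenly, the leftover nonces go to the first few miners.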

world famous math art | masternodes are bad, mmmkay?
Every sha(sha(sha(sha()))), every ho-o-o-old, still shines
fpgaminer (OP)
August 28, 2011, 07:27:10 PM
 #522

Quote
I've been discussing building a cluster of cheap FPGAs with jonand, and I've got the basic idea working:
Very, very cool!

makomk
August 28, 2011, 10:53:02 PM
 #523

Speaking of cheaper FPGAs, in theory the faster speed grades of EP3C25 should be able to reach 30 MHash/s (though EP4C22 only appears to be capable of 25 and I haven't even been able to achieve this in practice due to DE0-nano issues). See the de0-nano-hax branch in my github repo. Uses up pretty much all the FPGA resources though, so I've no idea how interfaces like teknohog's would fare and it's probably mostly a curiosity.

Quad XC6SLX150 Board: 860 MHash/s or so.
SIGS ABOUT BUTTERFLY LABS ARE PAID ADS
Dexter770221
August 29, 2011, 06:26:26 PM
 #524

Hello again.
Does anyone know what would happen if I tried to load a file generated for an LX75 onto an LX150?
I only have the web edition of ISE, which supports devices only up to the LX75. Maybe I will buy an X6500 board from our FPGA eagles ;) and it will have an LX150... I have some knowledge, so I want to see what it's worth ;)
I suspect that it won't work, but confirmation would make me happier ;)
TIA.

Under development Modular UPGRADEABLE Miner (MUM). Looking for investors.
Changing one PCB with screwdriver and you have brand new miner in hand... Plug&Play, scalable from one module to thousands.
fpgaminer (OP)
August 30, 2011, 12:19:49 AM
 #525

Quote
Does anyone know what would happen if I tried to load a file generated for an LX75 onto an LX150?
Configuration data for an LX75 is not compatible with an LX150, so no matter what you do it won't work.

If you attempt to load a normally generated bitstream for an LX75 onto an LX150, configuration will immediately fail. Technically you can hack the bitstream and force it to load, but A) Xilinx says it may "damage your hardware if you do", and B) as stated above, the configuration data isn't compatible, so you accomplish nothing.

I should note that boards like the X6x00 do not require ISE to load firmware into the FPGA. You'd only need ISE if you wish to compile your own firmware and bitstreams. It's unfortunate, but that's how Xilinx's licensing works :/ They offer a 30-day trial which I think lets you compile for any device.

Dexter770221
August 30, 2011, 06:22:02 AM
 #526

Just as I suspected.
I figure that you will use an FT2232 in your design to load the bitstream and data into the FPGA ;) Correct? I use them too in a few projects...
I want to learn FPGAs (I've been thinking about it for some 3 years), that's why I need to use ISE.
So it's the 30-day trial or an LX75. Would it be possible to order an X6500 with one LX150 and one LX75?

Under development Modular UPGRADEABLE Miner (MUM). Looking for investors.
Changing one PCB with screwdriver and you have brand new miner in hand... Plug&Play, scalable from one module to thousands.
makomk
August 30, 2011, 10:34:22 AM
Last edit: August 30, 2011, 10:47:39 AM by makomk
 #527

Due to the lovely Bitcoin community I'm discontinuing development on my forks of this code. Also, one of the moderators is trying to get me banned which would kind of make it impractical anyway.

Edit: Really, I should've done this a long time ago - the Bitcoin community is freaking toxic - but it's so interesting...

Quad XC6SLX150 Board: 860 MHash/s or so.
SIGS ABOUT BUTTERFLY LABS ARE PAID ADS
teknohog
August 30, 2011, 03:00:51 PM
 #528

Quote
I want to learn FPGAs (I've been thinking about it for some 3 years), that's why I need to use ISE.

If you really want to learn about FPGAs, and not just mine Bitcoins, you will be much happier with a traditional dev kit. There are so many other fun things to do with an FPGA, and for those you will need some I/O connectors, and a few onboard buttons/switches/LEDs will come in handy. Even some cheap kits will run a miner, for example the one mentioned here:

https://bitcointalk.org/index.php?topic=9047.msg451101#msg451101

world famous math art | masternodes are bad, mmmkay?
Every sha(sha(sha(sha()))), every ho-o-o-old, still shines
Dexter770221
August 30, 2011, 06:20:35 PM
 #529

Yes, I'm thinking about a real dev kit too. But prices are sometimes too high (here in Poland that $69 dev kit costs $130). So buying a mining rig makes more sense. And I will have an excuse for my wife as to why I need to spend so much money :) Maybe there will be some room for a few modifications. I'm waiting for some specs of those new mining boards...

Under development Modular UPGRADEABLE Miner (MUM). Looking for investors.
Changing one PCB with screwdriver and you have brand new miner in hand... Plug&Play, scalable from one module to thousands.
pusle
September 19, 2011, 06:50:54 PM
 #530


Sorry, Imma cross-posting, biatch ;D

This is a 5-input carry-save adder I made in an attempt to fit two full chains in an LX150.
I don't know if this helps you guys out, but I don't have time to test it myself atm :-[

download it here:  http://www.omegav.ntnu.no/~kamben/adder5x.vhd

or copy paste this:

-- This block uses 94 LUTs with only 29 Carry chain LUTs. (sliceM/L) (implemented purely combinatorial, no regs )
-- XST synth of 4 or 5 input adder uses 64 LUTs with 64 carry chain LUTs. (sliceM/L)   (implemented purely combinatorial, no regs )

LIBRARY IEEE;
USE ieee.std_logic_1164.ALL;
USE ieee.std_logic_unsigned."+";

Library UNISIM;                -- needed: declares the LUT5 / LUT6_2 primitives instantiated below
use UNISIM.vcomponents.all;

ENTITY adder5x IS PORT (
 reset               : IN  std_logic;
 clk                 : IN  std_logic;

  ina                 : IN  std_logic_vector(31 downto 0);   
  inb                 : IN  std_logic_vector(31 downto 0);   
  inc                 : IN  std_logic_vector(31 downto 0);   
  ind                 : IN  std_logic_vector(31 downto 0);   
  ine                 : IN  std_logic_vector(31 downto 0);   
 
  qout                : OUT std_logic_vector(31 downto 0)); 
END  adder5x;


ARCHITECTURE rtl OF adder5x IS
 

SIGNAL  a:  std_logic_vector(31 downto 0);
SIGNAL  b:  std_logic_vector(31 downto 0);
SIGNAL  c:  std_logic_vector(31 downto 0);
SIGNAL  d:  std_logic_vector(31 downto 0); 
SIGNAL  e:  std_logic_vector(31 downto 0);   
SIGNAL  qr: std_logic_vector(31 downto 0);
--

SIGNAL SA,SAr     :std_logic_vector(31 downto 0);
SIGNAL SB,SBr     :std_logic_vector(31 downto 2);
SIGNAL S1,S2,S3   :std_logic_vector(31 downto 0);

--SIGNAL fasit : std_logic_vector(31 downto 0);

BEGIN
 
 
--  input_reg: PROCESS (reset, clk)
--BEGIN
--   IF (clk'event AND clk='1') THEN     
      a<=ina;
      b<=inb;                         
      c<=inc; 
      d<=ind; 
      e<=ine;   
--   END IF; 
--END PROCESS;   


--  pipe_reg: PROCESS (reset, clk)
--BEGIN
--   IF (clk'event AND clk='1') THEN          
      SAr<=SA;                           -- if your whole "chain" only has 1 pipeline register
      SBr<=SB;                           -- this might be a good place to put it
--   END IF; 
--END PROCESS;
   
   
--  output_reg: PROCESS (reset, clk)
--BEGIN 
--   IF (clk'event AND clk='1') THEN     
      qr<=SAr+(SBr & "00");             -- Regular carry chain adder for the last stage       
--   END IF; 
--END PROCESS; 


qout<=qr;


--fasit<=a+b+c+d+e;

------------
--calc


-- first LUT column of adder
-- 5 single bit inputs -> 3 bit sum output
LUT_stage1:FOR i IN 0 TO 31 GENERATE   
 
---------
S1(i)<=a(i) XOR b(i) XOR c(i) XOR d(i) XOR e(i);

-----
-- forced LUT alternative. slightly faster, uses more overall LUTs
-- could save 1 sliceM/L for every 2 adder blocks. Might make routing easier.

--LUT5_inst1a : LUT5
--generic map (
--INIT => x"96696996")
--port map (
--O =>  S1(i),
--I0 => a(i),
--I1 => b(i),
--I2 => c(i),
--I3 => d(i),
--I4 => e(i));   
-----
---------

LUT_inst1bc : LUT6_2
generic map (
INIT => x"E8808000177E7EE8")       
port map (
O6 =>  S3(i),
O5 =>  S2(i),
I0 => a(i),       
I1 => b(i),     
I2 => c(i),   
I3 => d(i),   
I4 => e(i),
I5 => '1');     

END GENERATE;


-- 2x3bit LUT sums -> 2+2bit output sum
-- max sum =  5+(2*5)=15, range 0-15 -> exact 4 bit
LUT_stage2A:FOR i IN 0 TO 15 GENERATE   

SA((i*2))<=S1((i*2));
SA((i*2)+1)<=S2((i*2)) XOR S1((i*2)+1); 

END GENERATE;   


--SB(0)<='0';
--SB(1)<='0'; 

LUT_stage2B:FOR i IN 0 TO 14 GENERATE   

LUT_inst2cd : LUT6_2
generic map (
INIT => x"0077640000641364")   
port map (
O6 =>  SB((i*2)+3),
O5 =>  SB((i*2)+2),
I0 => S2((i*2)),      -- B1
I1 => S3((i*2)),      -- C1
I2 => S1((i*2)+1),    -- A2
I3 => S2((i*2)+1),    -- B2
I4 => S3((i*2)+1),    -- C2
I5 => '1');   --   

END GENERATE;   


END rtl;
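
Since the block above is posted untested, a bit-level software model is one way to sanity-check the two INIT constants before synthesis. The Python sketch below mirrors the VHDL stage by stage and compares the result with a plain 32-bit sum. The function names (`lut6_2`, `adder5x`) are illustrative, and the LUT6_2 behaviour is modelled on the assumption that O6 is looked up with all six inputs while O5 is looked up in the lower 32 INIT bits with I4..I0 only.

```python
import random

MASK32 = 0xFFFFFFFF
INIT_STAGE1 = 0xE8808000177E7EE8  # LUT_inst1bc: O6 = S3, O5 = S2
INIT_STAGE2 = 0x0077640000641364  # LUT_inst2cd: O6 = SB(2i+3), O5 = SB(2i+2)


def bit(value, index):
    """Return bit `index` of `value` as 0 or 1."""
    return (value >> index) & 1


def lut6_2(init, i0, i1, i2, i3, i4, i5=1):
    """Model of a dual-output LUT (assumed semantics, see lead-in)."""
    idx5 = i0 | (i1 << 1) | (i2 << 2) | (i3 << 3) | (i4 << 4)
    o5 = (init >> idx5) & 1                # lower 32 bits, five inputs
    o6 = (init >> (idx5 | (i5 << 5))) & 1  # full 64 bits, six inputs
    return o6, o5


def adder5x(a, b, c, d, e):
    """Software mirror of the adder5x architecture (combinatorial path only)."""
    # Stage 1: per bit position, count the five input bits into S3:S2:S1.
    s1 = s2 = s3 = 0
    for i in range(32):
        ins = [bit(v, i) for v in (a, b, c, d, e)]
        s1 |= (ins[0] ^ ins[1] ^ ins[2] ^ ins[3] ^ ins[4]) << i
        o6, o5 = lut6_2(INIT_STAGE1, *ins)
        s3 |= o6 << i
        s2 |= o5 << i

    # Stage 2A: low two bits of each bit-pair sum go to SA.
    sa = 0
    for i in range(16):
        sa |= bit(s1, 2 * i) << (2 * i)
        sa |= (bit(s2, 2 * i) ^ bit(s1, 2 * i + 1)) << (2 * i + 1)

    # Stage 2B: high two bits of each bit-pair sum go to SB, two positions up.
    sb = 0
    for i in range(15):
        o6, o5 = lut6_2(
            INIT_STAGE2,
            bit(s2, 2 * i), bit(s3, 2 * i),
            bit(s1, 2 * i + 1), bit(s2, 2 * i + 1), bit(s3, 2 * i + 1),
        )
        sb |= o6 << (2 * i + 3)
        sb |= o5 << (2 * i + 2)

    # Final stage: one ordinary carry-chain addition, truncated to 32 bits.
    return (sa + sb) & MASK32


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(1000):
        vals = [rng.getrandbits(32) for _ in range(5)]
        assert adder5x(*vals) == sum(vals) & MASK32
    print("model matches a + b + c + d + e (mod 2^32)")
```

A model like this only checks the logic of the lookup tables, not timing, placement, or what the synthesis tool does with the design.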
2112
November 14, 2011, 02:23:32 AM
 #531

I have a quick question for those familiar with Xilinx Virtex-6 chips:

I have a different application that requires double SHA256 of a 256-bit string, thus the typical mining optimizations don't apply. Would the fully unrolled all-combinatorial-logic hasher fit into the XC6VLX240T-1FFG1156 that is included in the ML605 evaluation board? Please disregard any speed issues. At this moment I'm only concerned with the correctness of the implementation and being able to use my old VHDL files. The goal is to reproduce the defects in some faulty silicon of historical value.

With quite a difficulty I installed evaluation ISE_DS on my Ubuntu 10.04.3 and even managed to start the Xilinx FPGA Editor that uses old Motif libraries. But attempting to do any implementation on Virtex-6 device on my 4GB RAM laptop is hopeless; it goes deeply into swapping storm.

I tried to understand the modifications that somebody made to get this miner run on ML605. It apparently had 3 hashing cores, but I'm unclear if the DSP48E1 use was a requirement or choice. I'm also unclear if the 3 hashing cores were 3*single-SHA256 or 3*double-SHA256.

Thanks for any pointers you may have.
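
For checking correctness of a hasher like this, a software reference model is handy for generating test vectors. A minimal Python sketch using the standard `hashlib` module follows; the helper names `sha256d` and `sha256_block_count` are illustrative. It also shows the structural difference from mining: a 32-byte message pads to a single 64-byte block on the first pass, whereas an 80-byte Bitcoin block header needs two.

```python
import hashlib


def sha256d(message: bytes) -> bytes:
    """Double SHA-256: hash the message, then hash the 32-byte digest."""
    return hashlib.sha256(hashlib.sha256(message).digest()).digest()


def sha256_block_count(message_len_bytes: int) -> int:
    """Number of 64-byte blocks SHA-256 processes for a message of this length.

    Padding adds one 0x80 byte and an 8-byte length field, then rounds up
    to a multiple of 64 bytes.
    """
    return (message_len_bytes + 1 + 8 + 63) // 64


if __name__ == "__main__":
    sample = bytes(range(32))  # arbitrary 256-bit test input
    print("digest:", sha256d(sample).hex())
    print("blocks for a 32-byte input:", sha256_block_count(32))
    print("blocks for an 80-byte input:", sha256_block_count(80))
```

The digest printed for any chosen input can then be compared against the output of the HDL simulation for the same input.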

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
Dexter770221
November 14, 2011, 03:24:02 PM
 #532

The modifications are small, so if one fully unrolled core (double SHA256) has been packed into a Spartan-6 LX150, there shouldn't be ANY problem fitting it into a V6, even without using DSP slices.

Under development Modular UPGRADEABLE Miner (MUM). Looking for investors.
Changing one PCB with screwdriver and you have brand new miner in hand... Plug&Play, scalable from one module to thousands.
2112
November 14, 2011, 04:33:00 PM
Last edit: November 14, 2011, 05:02:33 PM by 2112
 #533

Quote
I haven't looked in FPGA Editor because the Linux version is a pain in the ass to actually run on my distro. It uses some grotty ancient Motif-based wrapper library that emulates some prehistoric version of the Windows API and requires deep magic to get it to launch.
For future reference: running ISE_DS's fpga_editor on Ubuntu Lucid requires the following magic incantations:

1) install old versions of libstdc++ from Ubuntu Hardy:
    libstdc++5_3.3.6-15ubuntu4_i386.deb libstdc++5_3.3.6-15ubuntu4_amd64.deb

2) install libmotif3 and libmotif-dev for your main architecture. If your Ubuntu is 64-bit then
    you may also want to install the 32-bit versions, as the Motif libraries aren't built with multilib:
    libmotif3_2.2.3-4_i386.deb libmotif-dev_2.2.3-4_i386.deb

3) change the DISPLAY environment variable to use the non-multiscreen format
    DISPLAY=:0

Instead of fighting with whatever user-friendly package management tools you are using, you have the option of installing the few relevant *.so and *.a files by hand:

a) mkdir temp; cd temp
b) ar xv ../whatever.deb
c) tar xzvf data.tar.gz
d) sudo mv usr/lib/*.{so*,a} /usr/lib (when installing libraries matching the system)
e) sudo mv usr/lib/*.{so*,a} /usr/lib32 (when installing 32-bit libraries on the 64-bit system)
f) cd ..; rm -rf temp

Seems like 64-bit FPGA editor isn't starting cleanly on 64-bit Ubuntu and gives the following dynamic linking warnings:

.../bin/lin64/_fpga_editor: Symbol `_XtperDisplayList' causes overflow in R_X86_64_PC32 relocation
.../bin/lin64/_fpga_editor: Symbol `_XtGetPerDisplayInput' causes overflow in R_X86_64_PC32 relocation

but it appeared to operate correctly during my short inspection. I nonetheless installed the required 32-bit libraries on my 64-bit system and the 32-bit FPGA editor starts without any complaints.

Oh, and one last thing: ISE seems to have Acrobat hardcoded as the PDF reader. The quick workaround is:

cd /usr/bin; sudo ln -s evince acroread

No need to restart the Project Navigator.

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
2112
November 14, 2011, 07:09:32 PM
 #534

Quote
The modifications are small, so if one fully unrolled core (double SHA256) has been packed into a Spartan-6 LX150, there shouldn't be ANY problem fitting it into a V6, even without using DSP slices.
Thanks for the encouragement. I managed to do an implementation on the VLX240T with LOOP_LOG2=1 on my 4GB laptop. It has about 33% utilization of SLICEs, the design strategy was "Runtime reduction with multithreading", and the swapping wasn't that bad.

So the next question I have is: would I be able to keep comfortably doing my trial VLX240T designs if I upgrade my laptop to 8GB of RAM? Or would the experienced people rather suggest that I dedicate a desktop machine with 8-16GB RAM and a PCIe slot for my ML605 experiments? Xilinx says that the ISE_DS will be nodelocked to the actual Virtex-6 chip, I presume through the USB cable, right?
 

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
iidx
November 21, 2011, 11:28:11 PM
 #535

Quote
I have a quick question for those familiar with Xilinx Virtex-6 chips:

I have a different application that requires double SHA256 of a 256-bit string, thus the typical mining optimizations don't apply. Would the fully unrolled all-combinatorial-logic hasher fit into the XC6VLX240T-1FFG1156 that is included in the ML605 evaluation board? Please disregard any speed issues. At this moment I'm only concerned with the correctness of the implementation and being able to use my old VHDL files. The goal is to reproduce the defects in some faulty silicon of historical value.

With quite a difficulty I installed evaluation ISE_DS on my Ubuntu 10.04.3 and even managed to start the Xilinx FPGA Editor that uses old Motif libraries. But attempting to do any implementation on Virtex-6 device on my 4GB RAM laptop is hopeless; it goes deeply into swapping storm.

I tried to understand the modifications that somebody made to get this miner run on ML605. It apparently had 3 hashing cores, but I'm unclear if the DSP48E1 use was a requirement or choice. I'm also unclear if the 3 hashing cores were 3*single-SHA256 or 3*double-SHA256.

Thanks for any pointers you may have.


Hi,

I am the one who put 3 copies of the fully unrolled cores in the LX240T on the ML605.  I had to use the DSP48Es to get it to fit.  If I didn't use the DSP48s, I could only fit 2 copies of the unrolled code.  I didn't try to optimize the code in any other way.  So, the answer to your question is yes, there should be no problem with a single instance of the double SHA256 core fitting into the ML605.

I also found that more than 4 GB of memory was used when building with 3 copies of the fully unrolled code.  I'm using an older version of Red Hat for my development environment.  If you're running a 64-bit version of the application, upgrading to 8 GB of memory should get you going just fine.

As far as your ISE license question, I think the ISE might only be licensed to produce bitstreams for the specific device on the ML605.
2112
November 22, 2011, 12:32:05 AM
 #536

Quote
If I didn't use the DSP48s, I could only fit 2 copies of the unrolled code.  I didn't try to optimize the code in any other way.
Thank you very much for your valuable input. If you have a moment, could you please post a snippet of HDL code that shows how you convinced ISE DS to use DSP48s for adders? Does ISE have some flag to make it infer DSP48s from additions? Or did you have to explicitly instantiate them?

Since my last post in this thread I learned a lot about ISE software. The license is node-locked to the Ethernet MAC address using standard FlexLM technology. So it allows for designing on one system and running the design on another system. I was afraid of a node-locking technology that would require connecting the ML605 board to the system that runs ISE to allow it to check the license.

Also, would you dare to speculate what will be the initial pricing on the Kintex-7 KC705 evaluation kit? I hesitate to buy the ML605 right now because I could not really start working on it immediately due to the need to reorganize and remodel my physical workspace. On the other hand, I'm completely fascinated with contemporary FPGA design after a long break from doing any hardware-oriented design.

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
Enigma81
November 22, 2011, 06:41:25 AM
 #537

The Kintex-7 chips are fairly similar in price to the Virtex-6 chips, but have about 50% more LUTs for that cost.

Digikey has a few of the chips in stock
http://search.digikey.com/us/en/cat/integrated-circuits-ics/embedded-fpgas-field-programmable-gate-array/2556262?k=XC7

Based upon that information, I would expect the KC705 to be similarly priced to Virtex-6:240 Evaluation boards

Enigma
O_Shovah
November 22, 2011, 06:51:11 AM
 #538

Quote
The Kintex-7 chips are fairly similar in price to the Virtex-6 chips, but have about 50% more LUTs for that cost.

Digikey has a few of the chips in stock
http://search.digikey.com/us/en/cat/integrated-circuits-ics/embedded-fpgas-field-programmable-gate-array/2556262?k=XC7

Based upon that information, I would expect the KC705 to be similarly priced to Virtex-6:240 Evaluation boards

Enigma

For the XC7K325T chip used in the eval board you mentioned: what hashrate would you expect?
How different would its code be?
Maybe the XC7 series is a potential candidate for the next FPGA board generation.

Enigma81
November 22, 2011, 07:55:21 AM
 #539

I haven't really looked all that much at the 7 series FPGA chips yet, so I can't speak very intelligently about them, but in general I wouldn't expect the VHDL to be much different at all, and would assume a hashrate for that chip of somewhere in the neighborhood of 1000-1200 MH/s with good placement and resource usage.

These are ALL assumptions though.  Take those numbers with a very large grain of salt until someone takes the time to cram some cores in and synthesize a design on the new chip.

Edited to explain my math:
I'm guessing that 5-6 fully unrolled double SHA256 cores can be stuffed into the Kintex-7 with that number of LUTs (Guess Number 1)
I'm guessing that even in the new architecture, it's going to be damn difficult to place those cores with much better than 5 ns delay = 200 MHz (Guess Number 2)
I'm guessing the new architecture doesn't allow any magic that I don't yet know about/understand (Guess Number 3)
At one hash per clock, that's 200 MH/s/core = 1000 to 1200 MH/s/chip

As you can see, lots of guesses, but they're fairly educated.
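
The arithmetic behind that estimate can be written out as a short Python sketch. The function names are illustrative, and the inputs are just the guesses listed above (5 to 6 cores, 5 ns delay, one hash per clock), not measurements.

```python
def clock_mhz_from_period(period_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


def chip_hashrate_mhs(cores, clock_mhz, hashes_per_clock=1.0):
    """Estimated chip hashrate in MH/s for fully unrolled cores."""
    return cores * clock_mhz * hashes_per_clock


if __name__ == "__main__":
    clock = clock_mhz_from_period(5.0)  # Guess Number 2: 5 ns delay
    for cores in (5, 6):                # Guess Number 1: 5-6 cores
        print(cores, "cores:", chip_hashrate_mhs(cores, clock), "MH/s")
```

Changing any one guess scales the result linearly, which is why the final figure is only as good as the core count and the achievable clock.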
Enigma81
November 22, 2011, 08:09:02 AM
 #540

Just for reference, The Kintex-7 we're speaking of would have to manage 2025MH/s to beat the Spartan-6 LX150 in terms of MH/$

It's actually a little lower than that, since you only need one (bigger) power supply for the Kintex compared to 10 power supplies and support chips for the 10 Spartans, but that's a rough estimate of where it would have to be to compete with the current Spartan-6 designs.  Can it get there? My gut feeling is no.

It will probably be able to beat the Spartan-6 in power consumption, but really, the Spartan is already so low compared to GPU designs that it doesn't really matter.  Additionally, we're talking about such low power consumption that even a significant percentage of power saving would take a long time to pay off the more expensive chip.
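
The MH/$ comparison above boils down to one proportion, sketched here in Python. The function name is illustrative and the inputs in the usage lines are placeholders: the 10x price ratio is inferred from the mention of 10 Spartans, and 202.5 MH/s per LX150 is simply back-computed so that the result reproduces the 2025 MH/s figure. Neither is a quoted price or a measured hashrate.

```python
def break_even_mhs(reference_mhs, reference_price, candidate_price):
    """Hashrate the candidate chip needs to match the reference chip's MH/s per dollar."""
    return reference_mhs * (candidate_price / reference_price)


if __name__ == "__main__":
    # Placeholder inputs (assumptions, see the note above): prices are in
    # arbitrary units, only their ratio matters.
    spartan_mhs = 202.5
    spartan_price = 1.0
    kintex_price = 10.0
    print("break-even:", break_even_mhs(spartan_mhs, spartan_price, kintex_price), "MH/s")
```

Board-level costs such as power supplies and support chips would lower the break-even point somewhat, as noted above; they are left out of this sketch.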

Enigma