Author Topic: BTCMiner - Open Source Bitcoin Miner for ZTEX FPGA Boards, 215 MH/s on LX150  (Read 161726 times)
pusle
Member
**
Offline Offline

Activity: 89
Merit: 10


View Profile
September 19, 2011, 06:48:49 PM
 #61


This is a 5-input carry-save adder I made in an attempt to fit two full chains in an LX150.
I don't know if this helps you guys out, but I don't have time to test it myself at the moment.

download it here:  http://www.omegav.ntnu.no/~kamben/adder5x.vhd

or copy paste this:

-- This block uses 94 LUTs with only 29 Carry chain LUTs. (sliceM/L) (implemented purely combinatorial, no regs )
-- XST synth of 4 or 5 input adder uses 64 LUTs with 64 carry chain LUTs. (sliceM/L)   (implemented purely combinatorial, no regs )

LIBRARY IEEE;
USE ieee.std_logic_1164.ALL;
USE ieee.std_logic_unsigned."+";

-- UNISIM is required for the LUT6_2 primitives instantiated below
Library UNISIM;
use UNISIM.vcomponents.all;

ENTITY adder5x IS PORT (
 reset               : IN  std_logic;
 clk                 : IN  std_logic;

  ina                 : IN  std_logic_vector(31 downto 0);   
  inb                 : IN  std_logic_vector(31 downto 0);   
  inc                 : IN  std_logic_vector(31 downto 0);   
  ind                 : IN  std_logic_vector(31 downto 0);   
  ine                 : IN  std_logic_vector(31 downto 0);   
 
  qout                : OUT std_logic_vector(31 downto 0)); 
END  adder5x;


ARCHITECTURE rtl OF adder5x IS
 

SIGNAL  a:  std_logic_vector(31 downto 0);
SIGNAL  b:  std_logic_vector(31 downto 0);
SIGNAL  c:  std_logic_vector(31 downto 0);
SIGNAL  d:  std_logic_vector(31 downto 0); 
SIGNAL  e:  std_logic_vector(31 downto 0);   
SIGNAL  qr: std_logic_vector(31 downto 0);
--

SIGNAL SA,SAr     :std_logic_vector(31 downto 0);
SIGNAL SB,SBr     :std_logic_vector(31 downto 2);
SIGNAL S1,S2,S3   :std_logic_vector(31 downto 0);

--SIGNAL fasit : std_logic_vector(31 downto 0);

BEGIN
 
 
--  input_reg: PROCESS (reset, clk)
--BEGIN
--   IF (clk'event AND clk='1') THEN     
      a<=ina;
      b<=inb;                         
      c<=inc; 
      d<=ind; 
      e<=ine;   
--   END IF; 
--END PROCESS;   


--  pipe_reg: PROCESS (reset, clk)
--BEGIN
--   IF (clk'event AND clk='1') THEN          
      SAr<=SA;                           -- if your whole "chain" only has 1 pipeline register
      SBr<=SB;                           -- this might be a good place to put it
--   END IF; 
--END PROCESS;
   
   
--  output_reg: PROCESS (reset, clk)
--BEGIN 
--   IF (clk'event AND clk='1') THEN     
      qr<=SAr+(SBr & "00");             -- Regular carry chain adder for the last stage       
--   END IF; 
--END PROCESS; 


qout<=qr;


--fasit<=a+b+c+d+e;

------------
--calc


-- first LUT column of adder
-- 5 single bit inputs -> 3 bit sum output
LUT_stage1:FOR i IN 0 TO 31 GENERATE   
 
---------
S1(i)<=a(i) XOR b(i) XOR c(i) XOR d(i) XOR e(i);

-----
-- forced LUT alternative. slightly faster, uses more overall LUTs
-- could save 1 sliceM/L for every 2 adder blocks. Might make routing easier.

--LUT5_inst1a : LUT5
--generic map (
--INIT => x"96696996")
--port map (
--O =>  S1(i),
--I0 => a(i),
--I1 => b(i),
--I2 => c(i),
--I3 => d(i),
--I4 => e(i));   
-----
---------

LUT_inst1bc : LUT6_2
generic map (
INIT => x"E8808000177E7EE8")       
port map (
O6 =>  S3(i),
O5 =>  S2(i),
I0 => a(i),       
I1 => b(i),     
I2 => c(i),   
I3 => d(i),   
I4 => e(i),
I5 => '1');     

END GENERATE;


-- 2x3bit LUT sums -> 2+2bit output sum
-- max sum =  5+(2*5)=15, range 0-15 -> exact 4 bit
LUT_stage2A:FOR i IN 0 TO 15 GENERATE   

SA((i*2))<=S1((i*2));
SA((i*2)+1)<=S2((i*2)) XOR S1((i*2)+1); 

END GENERATE;   


--SB(0)<='0';
--SB(1)<='0'; 

LUT_stage2B:FOR i IN 0 TO 14 GENERATE   

LUT_inst2cd : LUT6_2
generic map (
INIT => x"0077640000641364")   
port map (
O6 =>  SB((i*2)+3),
O5 =>  SB((i*2)+2),
I0 => S2((i*2)),      -- B1
I1 => S3((i*2)),      -- C1
I2 => S1((i*2)+1),    -- A2
I3 => S2((i*2)+1),    -- B2
I4 => S3((i*2)+1),    -- C2
I5 => '1');   --   

END GENERATE;   


END rtl;
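
Since the author notes the block above is untested, a quick software cross-check may be useful. The following is a minimal Python sketch, not part of the original post, that models the combinatorial datapath bit by bit using the two INIT constants from the VHDL and compares it with a plain 32-bit sum. It assumes the usual LUT6_2 behaviour (O5 reads the lower 32 INIT bits, O6 reads the upper 32 bits when I5 is '1'); the function names are made up for illustration.

```python
import random

MASK32 = 0xFFFFFFFF
INIT_STAGE1 = 0xE8808000177E7EE8  # INIT of LUT_inst1bc in the VHDL above
INIT_STAGE2 = 0x0077640000641364  # INIT of LUT_inst2cd in the VHDL above


def lut6_2(init, i0, i1, i2, i3, i4):
    """Model a LUT6_2 with I5 tied to '1'; returns (O6, O5)."""
    idx = i0 | (i1 << 1) | (i2 << 2) | (i3 << 3) | (i4 << 4)
    o5 = (init >> idx) & 1          # lower 32 INIT bits
    o6 = (init >> (32 + idx)) & 1   # upper 32 INIT bits, because I5 = 1
    return o6, o5


def adder5x(a, b, c, d, e):
    """Bit-level model of the adder5x datapath (no registers)."""
    s1, s2, s3 = [0] * 32, [0] * 32, [0] * 32
    # Stage 1: per bit position, compress 5 input bits into a 3-bit count.
    for i in range(32):
        bits = [(v >> i) & 1 for v in (a, b, c, d, e)]
        s1[i] = bits[0] ^ bits[1] ^ bits[2] ^ bits[3] ^ bits[4]
        s3[i], s2[i] = lut6_2(INIT_STAGE1, *bits)

    # Stage 2A: SA bits, as in the LUT_stage2A generate loop.
    sa = 0
    for i in range(16):
        sa |= s1[2 * i] << (2 * i)
        sa |= (s2[2 * i] ^ s1[2 * i + 1]) << (2 * i + 1)

    # Stage 2B: SB bits 2..31, as in the LUT_stage2B generate loop.
    sb = 0
    for i in range(15):
        o6, o5 = lut6_2(INIT_STAGE2, s2[2 * i], s3[2 * i],
                        s1[2 * i + 1], s2[2 * i + 1], s3[2 * i + 1])
        sb |= o5 << (2 * i + 2)
        sb |= o6 << (2 * i + 3)

    # Final stage: one regular carry-chain addition, truncated to 32 bits.
    return (sa + sb) & MASK32


def reference(a, b, c, d, e):
    """Plain 32-bit sum, the 'fasit' signal in the VHDL."""
    return (a + b + c + d + e) & MASK32


def random_check(n=1000, seed=1):
    """Return True if the model matches the plain sum on n random vectors."""
    rng = random.Random(seed)
    for _ in range(n):
        v = [rng.getrandbits(32) for _ in range(5)]
        if adder5x(*v) != reference(*v):
            return False
    return True
```

Calling random_check() compares the LUT-based model with the plain sum on random vectors. A pass only says the logic equations are consistent; it says nothing about LUT count, timing, or routing on the LX150.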


rph
Full Member
***
Offline Offline

Activity: 176
Merit: 100


View Profile
September 20, 2011, 04:18:22 AM
Last edit: September 20, 2011, 04:30:49 AM by rph
 #62

ArtForz, you'd have at least 3 friends for life if you posted the RTL.

With 2 clocks per stage, there are no >3 input adders, and xst seems to handle those
reasonably well. synth is fine; I'm currently battling the mapper, and its craptastic 4ns routes.
Either I have some long-distance routing requirement that ArtForz somehow eliminated,
or the tools are just being dumb and need some area constraint love.

-rph

Ultra-Low-Cost DIY FPGA Miner: https://bitcointalk.org/index.php?topic=44891
rph
September 20, 2011, 04:41:38 AM
Last edit: September 20, 2011, 05:15:20 PM by rph
 #63

Sharing is caring, so here's the business end of my VHDL. I'm planning to try
a few alternative options for the adders...

Code:
    one: if CYCLES = 1 generate
        t1     <= e1 + ch + i_t1;
        t2     <= e0 + maj;

        process(clk)
        begin
            if rising_edge(clk) then
                o_data(447 downto   0)   <= i_data(479 downto 32);
                o_data(479 downto 448)   <= s1 + i_data(287 downto 256) + i_data14;

                o_state( 31 downto   0)  <= t1 + t2;
                o_state( 63 downto  32)  <= i_state( 31 downto   0);
                o_state( 95 downto  64)  <= i_state( 63 downto  32);
                o_state(127 downto  96)  <= i_state( 95 downto  64);
                o_state(159 downto 128)  <= i_state(127 downto  96) + t1;
                o_state(191 downto 160)  <= i_state(159 downto 128);
                o_state(223 downto 192)  <= i_state(191 downto 160);


                o_t1     <= i_state(223 downto 192) + i_data(31 downto 0) + K_NEXT;
                o_data14 <= s0 + i_data(31 downto 0);
            end if;
        end process;
    end generate one;

    two: if CYCLES = 2 generate
        process(clk)
        begin
            if rising_edge(clk) then
                -- first cycle
                t1     <= e1 + ch + i_t1;
                t2     <= e0 + maj;

                data(447 downto   0)   <= i_data(479 downto 32);
                data(479 downto 448)   <= s1 + i_data(287 downto 256) + i_data14;

                state <= i_state;

                t1_p   <= i_state(223 downto 192) + i_data(31 downto 0) + K_NEXT;
                data14 <= s0 + i_data(31 downto 0);

                -- second cycle
                o_data <= data;

                o_state( 31 downto   0)  <= t1 + t2;
                o_state( 63 downto  32)  <= state( 31 downto   0);
                o_state( 95 downto  64)  <= state( 63 downto  32);
                o_state(127 downto  96)  <= state( 95 downto  64);
                o_state(159 downto 128)  <= state(127 downto  96) + t1;
                o_state(191 downto 160)  <= state(159 downto 128);
                o_state(223 downto 192)  <= state(191 downto 160);

                o_t1 <= t1_p;
                o_data14 <= data14;
            end if;
        end process;
    end generate two;
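
As I read the snippet above, the state bus carries only seven words (a to g), and the h + K + W part of T1 is computed one stage early from i_state(223 downto 192), i.e. from g, which becomes h in the next round. The following Python sketch, which is not from the original post, checks that this rearrangement gives the same state as a textbook SHA-256 round. The function names are hypothetical, and the K and W values are arbitrary placeholders rather than the real constants or message schedule.

```python
import random

MASK32 = 0xFFFFFFFF


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK32


def big_sigma0(x):  # called e0 in the VHDL (assumption)
    return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)


def big_sigma1(x):  # called e1 in the VHDL (assumption)
    return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)


def ch(e, f, g):
    return (e & f) ^ (~e & g & MASK32)


def maj(a, b, c):
    return (a & b) ^ (a & c) ^ (b & c)


def round_reference(state, k, w):
    """Textbook SHA-256 round on an 8-word state (a..h)."""
    a, b, c, d, e, f, g, h = state
    t1 = (h + big_sigma1(e) + ch(e, f, g) + k + w) & MASK32
    t2 = (big_sigma0(a) + maj(a, b, c)) & MASK32
    return ((t1 + t2) & MASK32, a, b, c, (d + t1) & MASK32, e, f, g)


def round_precomputed(state7, t1_partial, k_next, w_next):
    """Round on a 7-word state (a..g); h + K + W arrives as t1_partial."""
    a, b, c, d, e, f, g = state7
    t1 = (big_sigma1(e) + ch(e, f, g) + t1_partial) & MASK32
    t2 = (big_sigma0(a) + maj(a, b, c)) & MASK32
    # g is the next round's h, so the next partial sum can be formed now.
    next_partial = (g + k_next + w_next) & MASK32
    new_state7 = ((t1 + t2) & MASK32, a, b, c, (d + t1) & MASK32, e, f)
    return new_state7, next_partial


def run_both(state, ks, ws):
    """Run both formulations over the same K/W sequences."""
    ref = tuple(state)
    s7 = tuple(state[:7])
    partial = (state[7] + ks[0] + ws[0]) & MASK32
    for t in range(len(ks)):
        ref = round_reference(ref, ks[t], ws[t])
        k_next = ks[t + 1] if t + 1 < len(ks) else 0
        w_next = ws[t + 1] if t + 1 < len(ws) else 0
        s7, partial = round_precomputed(s7, partial, k_next, w_next)
    return ref, s7


rng = random.Random(0)
state0 = [rng.getrandbits(32) for _ in range(8)]
ks = [rng.getrandbits(32) for _ in range(64)]
ws = [rng.getrandbits(32) for _ in range(64)]
ref, s7 = run_both(state0, ks, ws)
print(ref[:7] == s7)
```

The point of the rearrangement is that the adder chain feeding the new a and e no longer includes h, K and W as separate operands, which is presumably why no adder in the 2-clock variant needs more than 3 inputs.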

-rph

ArtForz
Sr. Member
****
Offline Offline

Activity: 406
Merit: 257


View Profile
September 20, 2011, 09:59:47 PM
 #64

I'll give you a big fat hint: maybe having a nice and regular structure for the W updates isn't the best option...

bitcoin: 1Fb77Xq5ePFER8GtKRn2KDbDTVpJKfKmpz
i0coin: jNdvyvd6v6gV3kVJLD7HsB5ZwHyHwAkfdw
ztex (OP)
Donator
Sr. Member
*
Offline Offline

Activity: 367
Merit: 250

ZTEX FPGA Boards


View Profile WWW
September 23, 2011, 12:16:02 AM
Last edit: September 23, 2011, 12:28:30 AM by ztex
 #65

ArtForz and rph, motivated by your work I started a new attempt to implement a design with 2 stages per round. This time I paid more attention to the adders (rather than to overall utilization and speed, as before). I got it routed, but not faster than 160 MHz. But there are still several things to try out ...

I also analyzed the map/par reports from ArtForz. The design seems to be much easier to route. It is mapped/routed at least 2-3 times faster than all other designs I have seen (even the simple 1 stage per round designs). Or is there an optimizer option or constraint I missed?

The reason for this is not the arrangement of "W": if I omit it, I can't see any improvement in routability.

rph
September 23, 2011, 03:59:41 AM
 #66

Agreed.. ArtForz's worst-case routing delays and build times are certainly much better than mine.
I have hit a wall around 156MHz. I brute-force-scanned xst/map/par options with multiple PCs,
and none of them make anywhere near a 2-3X improvement. It's an RTL/design issue;
he apparently has some tricks that we haven't discovered.

I am going to experiment with area/placement constraints next.

-rph

c_k
Donator
Full Member
*
Offline Offline

Activity: 242
Merit: 100



View Profile
September 29, 2011, 04:56:42 AM
 #67

The 1.15x sounds promising, how is it coming along?

ztex (OP)
September 29, 2011, 03:09:37 PM
 #68

Quote
The 1.15x sounds promising, how is it coming along?

I sent the parts to the assembler about 2.5 weeks ago. I expect the finished boards next week. (Assemblers in Germany are currently overbooked.)

The final USD price may be a little lower than the first estimate due to the weaker EUR (and because I paid for the parts when the EUR was stronger).

With updated software the boards will generate at least 160 MH/s.

Here are a few pictures of the prototype (with and w/o heat sinks):




DeathAndTaxes
Donator
Legendary
*
Offline Offline

Activity: 1218
Merit: 1079


Gerald Davis


View Profile
September 30, 2011, 12:02:21 AM
 #69

Very nice.  Even though not economical for me I may just have to buy one anyways.

ngzhang
Hero Member
*****
Offline Offline

Activity: 592
Merit: 501


We will stand and fight.


View Profile
October 03, 2011, 05:52:08 PM
 #70

marked
ztex (OP)
October 03, 2011, 11:05:10 PM
 #71

Quote
Very nice.  Even though not economical for me I may just have to buy one anyways.

If you are looking for a general-purpose development board, I recommend USB-FPGA Modules 1.15d or 1.15b (http://www.ztex.de/usb-fpga-1/usb-fpga-1.15.e.html) + Experimental Board 1.3. This combination includes RAM and some other additional features and is also more flexible.

ztex (OP)
October 06, 2011, 12:36:58 PM
 #72

Just two updates:

Good news: I achieved 187 MHz non-overclocked. I'm still trying, but unless I find a faster design I will publish this one with the next release. At this speed almost no overclocking is possible.

There will be a further delay for the 1.15x boards: my assembler will try to deliver the boards at the beginning of next week.




c_k
October 06, 2011, 06:55:04 PM
 #73

Excellent news!

How does MHz translate to MH/s?

Is it roughly 1:1 or 2:1?

ztex (OP)
October 06, 2011, 07:50:41 PM
 #74

Quote
Excellent news!

How does MHz translate to MH/s?

Is it roughly 1:1 or 2:1?

1:1, i.e. 187 MH/s

ngzhang
October 08, 2011, 06:29:37 AM
 #75

Quote
Excellent news!

How does MHz translate to MH/s?

Is it roughly 1:1 or 2:1?

1:1, i.e. 187 MH/s

How about the power consumption @187MH/s?
eldentyrell
Donator
Legendary
*
Offline Offline

Activity: 980
Merit: 1004


felonious vagrancy, personified


View Profile WWW
October 08, 2011, 07:35:41 PM
 #76

Quote
Sharing is caring, so here's the business end of my VHDL. I'm planning to try
a few alternative options for the adders...

Code:
    two: if CYCLES = 2 generate
                -- second cycle
                o_data <= data;

Stupid question: doesn't this double the number of registers you need?

Your CYCLES=1 implementation appears to have one set of registers (o_data) whereas your CYCLES=2 implementation has two (o_data and data), though it's a bit difficult to be sure from just the segment you posted.

That's not necessarily a bad thing.  OTOH the map results you posted (elsewhere) show 50% register utilization, which doesn't seem like twice as many.  I must have missed something.

The printing press heralded the end of the Dark Ages and made the Enlightenment possible, but it took another three centuries before any country managed to put freedom of the press beyond the reach of legislators.  So it may take a while before cryptocurrencies are free of the AML-NSA-KYC surveillance plague.
ztex (OP)
October 10, 2011, 07:53:49 AM
 #77

Quote
How about the power consumption @187MH/s?

About 9W. Exact values follow.
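
For a rough sense of scale, the figures quoted in this thread (1:1 between MHz and MH/s, 187 MHz, about 9 W) can be combined as below. This is only a back-of-the-envelope Python sketch with made-up function names; the 9 W value is the approximate figure above, not a measurement.

```python
def hash_rate_mhs(clock_mhz, hashes_per_clock=1.0):
    """Hash rate in MH/s; 1 hash per clock is the 1:1 ratio stated above."""
    return clock_mhz * hashes_per_clock


def efficiency_mh_per_joule(rate_mhs, power_w):
    """MH/s divided by W gives MH per joule."""
    return rate_mhs / power_w


rate = hash_rate_mhs(187)                    # 187 MH/s at 187 MHz
eff = efficiency_mh_per_joule(rate, 9.0)     # roughly 20.8 MH/J at ~9 W
print(f"{rate:.0f} MH/s, {eff:.1f} MH/J")
```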

ztex (OP)
October 10, 2011, 07:58:12 AM
 #78

Quote
Stupid question: doesn't this double the number of registers you need?

A pipeline with two stages per SHA-256 round also needs approximately twice as many registers as a design with one stage per round. The number of registers per stage is approximately equal.
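
The register question can be made concrete by counting the registered signals in rph's snippet from post #63. The sketch below is an illustration only: the widths of o_data and o_state come from the slices in that snippet, while the 32-bit widths of t1, t2, t1_p, data14, o_t1 and o_data14 are assumptions, since the declarations were not posted. Synthesis may also trim or merge registers, so the real numbers can differ.

```python
# Signals registered per round when CYCLES = 1 (t1 and t2 are combinatorial).
ONE_CYCLE = {"o_data": 480, "o_state": 224, "o_t1": 32, "o_data14": 32}

# Additional signals registered per round when CYCLES = 2.
TWO_CYCLE_EXTRA = {"t1": 32, "t2": 32, "data": 480, "state": 224,
                   "t1_p": 32, "data14": 32}

bits_one = sum(ONE_CYCLE.values())
bits_two = bits_one + sum(TWO_CYCLE_EXTRA.values())
ratio = bits_two / bits_one

print(bits_one, bits_two, round(ratio, 2))
```

Under these assumptions the two-cycle variant registers slightly more than twice as many bits per round, which is consistent with "approximately twice".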

ztex (OP)
October 12, 2011, 08:40:58 PM
 #79

The first 1.15x modules arrived yesterday.

Detailed product information (including schematics) and a new BTCMiner version will appear in the next few days (a few features still have to be added, e.g. hot-plug support in cluster mode).

Those who don't want to wait can order the boards from the shop: http://shop.ztex.de/product_info.php?products_id=66. (Prices for >4 units on request.)
The boards run with the current BTCMiner version, but only at 135 MH/s. The new release will achieve about 190 MH/s.

ztex (OP)
October 20, 2011, 07:37:05 PM
 #80

A new BTCMiner release has been published at http://www.ztex.de/btcminer. With the new design, typically about 190 MH/s can be achieved
on USB-FPGA Modules 1.15x (192 MHz at an error rate of less than 1%).

The data in the initial post has been updated.

Since the USB-FPGA Modules 1.15x are available now, I created a separate thread in the mining hardware section: https://bitcointalk.org/index.php?topic=49180.0 This post also contains volume prices and estimated prices for license production programs.
