Bitcoin Forum
Author Topic: BlockBurner LLC - Crucible FPGA Scrypt Miner - Announcement Aug-19  (Read 42340 times)
tacotime
Legendary
*
Offline Offline

Activity: 1484
Merit: 1005



View Profile
April 16, 2013, 02:21:42 PM
Last edit: April 16, 2013, 02:33:11 PM by tacotime
 #41


The problem is that the on-device block RAM is insanely slow compared to GPU ram (about 10 times slower for most FPGAs).  The per slice block RAM for most FPGAs is also less than 128 KB (more like 8 KB in typical cases).


Well, I'm not as familiar with GPU, but I doubt it is 10 times faster.  And I believe you have been misinformed regarding the capacity as well.

The Spartan-6 LX150 used on many of the boards already built has 4.9 million bits of block RAM.  In a -3 speed grade part that memory can run at up to 320MHz.
Newer but similarly priced Artix-7 parts have 13.4 million bits, running at up to 509MHz in -3 speed grade.

As it relates to scrypt and its 128KB scratchpad, the core loop accesses memory sequentially in 1024-bit widths.  Within an FPGA, you can have access to all 1024 bits in a single clock.  While you may not be able to achieve that performance point due to other issues, 1024 bits @ 320/500MHz is nothing to sneeze at.



Total block ram on the whole chip for a spartan6 lx 150 (most expensive chip) is 4824 Kb.  http://www.xilinx.com/products/silicon-devices/fpga/spartan-6/lx.htm

Memory bandwidth for the block RAM is about 30-60 gb/s (your numbers above) while GPU internal bus is usually around 250 gb/s on higher end cards.

Code:
XMR: 44GBHzv6ZyQdJkjqZje6KLZ3xSyN1hBSFAnLP6EAqJtCRVzMzZmeXTC2AHKDS9aEDTRKmo6a6o9r9j86pYfhCWDkKjbtcns
phk
Newbie
*
Offline Offline

Activity: 28
Merit: 0


View Profile
April 16, 2013, 02:46:07 PM
 #42


Total block ram on the whole chip for a spartan6 lx 150 (most expensive chip) is 4824 Kb.  http://www.xilinx.com/products/silicon-devices/fpga/spartan-6/lx.htm

Yes, that's what I said.  (4.9 million == 4800 K)    (the exact number is 4939776 bits).


Quote
Memory bandwidth for the block RAM is about 10 gb/s while GPU internal bus is usually around 250 gb/s on higher end cards.

I'm not following your arithmetic.  Are you citing some document somewhere for either of those numbers?  If so, can you paste a link? 

From my previous post, an FPGA memory with 1024-bit width at (let's downgrade it to a more modest 200MHz) is about 200 billion bits per second, or roughly 25GB/s.
This would be per-memory-instance (or, per hypothetical scrypt-core).
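
As a rough check of the arithmetic above, here is a minimal Python sketch. It assumes one full-width access per clock on a single memory port and decimal gigabytes; the function name is only illustrative. The 200, 320 and 500MHz figures are the ones quoted in this thread.

```python
def bram_bandwidth_gb_per_s(width_bits: int, clock_mhz: float) -> float:
    """Peak bandwidth of one memory port, assuming one full-width access per clock."""
    bits_per_second = width_bits * clock_mhz * 1e6
    return bits_per_second / 8 / 1e9  # bits -> bytes -> decimal GB


if __name__ == "__main__":
    # Clock rates taken from the posts: a modest 200MHz, then the 320/500MHz figures.
    for mhz in (200, 320, 500):
        gb = bram_bandwidth_gb_per_s(1024, mhz)
        print(f"1024 bits @ {mhz} MHz: {gb:.1f} GB/s")
```

At 200MHz this works out to 25.6 GB/s per memory instance, so the "25GB/s" figure is a slight round-down. It is a peak number only; it says nothing about whether the surrounding logic can keep up.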

Joe_Bauers
Hero Member
*****
Offline Offline

Activity: 802
Merit: 1003


GCVMMWH


View Profile
April 16, 2013, 02:49:18 PM
 #43

Total block ram on the whole chip for a spartan6 lx 150 (most expensive chip) is 4824 Kb.  http://www.xilinx.com/products/silicon-devices/fpga/spartan-6/lx.htm

Memory bandwidth for the block RAM is about 10 gb/s while GPU internal bus is usually around 250 gb/s on higher end cards.

Something like this might be neat to attempt.
http://www.wpi.edu/Pubs/E-project/Available/E-project-031212-183607/unrestricted/FPGA_Design_for_DDR3_Memory.pdf


Please let me know if you need me to test it out...  ;)
phk
Newbie
*
Offline Offline

Activity: 28
Merit: 0


View Profile
April 16, 2013, 02:53:57 PM
 #44


Something like this might be neat to attempt.
http://www.wpi.edu/Pubs/E-project/Available/E-project-031212-183607/unrestricted/FPGA_Design_for_DDR3_Memory.pdf


Please let me know if you need me to test it out...  ;)

My point is that off-chip memory for SCRYPT is entirely unnecessary.  It really doesn't need much memory, and what it needs can fit entirely on-chip.  The latency of going off-chip would kill performance.
tacotime
Legendary
*
Offline Offline

Activity: 1484
Merit: 1005



View Profile
April 16, 2013, 03:01:09 PM
 #45


Total block ram on the whole chip for a spartan6 lx 150 (most expensive chip) is 4824 Kb.  http://www.xilinx.com/products/silicon-devices/fpga/spartan-6/lx.htm

Yes, that's what I said.  (4.9 million == 4800 K)    (the exact number is 4939776 bits).


Quote
Memory bandwidth for the block RAM is about 10 gb/s while GPU internal bus is usually around 250 gb/s on higher end cards.

I'm not following your arithmetic.  Are you citing some document somewhere for either of those numbers?  If so, can you paste a link?  

From my previous post, an FPGA memory with 1024-bit width at (let's downgrade it to a more modest 200MHz) is about 200 billion bits per second, or roughly 25GB/s.
This would be per-memory-instance (or, per hypothetical scrypt-core).



The total RAM per block is 18KB. Each block has a 72-bit width. I don't really know where you're pulling your numbers from. Even if you calculate in parallel, 128/18 = 8 block RAM units required, with 72-bit widths each --> not 1024 bit width either.

phk
Newbie
*
Offline Offline

Activity: 28
Merit: 0


View Profile
April 16, 2013, 03:38:54 PM
 #46

The total RAM per block is 18KB. Each block has a 72-bit width. I don't really know where you're pulling your numbers from. Even if you calculate in parallel, 128/18 = 8 block RAM units required, with 72-bit widths each --> not 1024 bit width either.

I think you are misinformed about what is and is not possible.

You can construct whatever width you like by putting multiple units in parallel.  This is commonly done, and is a general feature of FPGAs, not unique to Xilinx.

The vendors put them into small blocks like that to improve the granularity / flexibility for the designer.  As a result, you effectively lose capacity (bits) when your chosen configuration doesn't map efficiently to the underlying memory organization.

Artix-7 is even better, but limiting the discussion to Spartan 6 which many people have already bought, here is some documentation:

See page two of this:
(a) http://www.xilinx.com/support/documentation/ip_documentation/blk_mem_gen_ds512.pdf

See page nine of this:
(b) http://www.xilinx.com/support/documentation/user_guides/ug383.pdf

See page two of this:
(c) http://www.xilinx.com/support/documentation/data_sheets/ds160.pdf


To get a x1024 memory using (a), you can see from (b) that one possibility might be (32) instances of (x32) width.
As far as the capability of the LX150 part commonly used on existing bitcoin mining boards, you will see in (c) that this device has a total of (268) such blocks.
So accommodating the 128KB scratchpad in SCRYPT could be done with (64) blocks configured for (x32) width and (32) units in parallel.   The LX150 could possibly hold (4) such memories, but I think you run out of gates for SCRYPT arithmetic well before that.
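
The block count above can be reproduced with a short Python sketch. It assumes each 18Kb block is configured as 512 words of 32 data bits (the x36 aspect ratio with the parity bits left unused), which is an assumption to check against (b); the 268-block total is the figure quoted from (c), and the helper name is illustrative.

```python
import math
from typing import Tuple

BLOCKS_ON_LX150 = 268        # total 18Kb blocks, as quoted from (c)
BITS_PER_BLOCK = 18 * 1024   # 18Kb per block, parity bits included

# Assumption: each block is used as 512 words x 32 data bits.
WORDS_PER_BLOCK = 512
DATA_BITS_PER_BLOCK = 32


def blocks_needed(total_bytes: int, width_bits: int) -> Tuple[int, int]:
    """Return (blocks side by side for width, rows of blocks stacked for depth)."""
    depth_words = total_bytes * 8 // width_bits
    wide = math.ceil(width_bits / DATA_BITS_PER_BLOCK)
    deep = math.ceil(depth_words / WORDS_PER_BLOCK)
    return wide, deep


if __name__ == "__main__":
    wide, deep = blocks_needed(128 * 1024, 1024)  # 128KB scratchpad, x1024 port
    total = wide * deep
    print("bits on chip:", BLOCKS_ON_LX150 * BITS_PER_BLOCK)
    print("blocks in parallel:", wide, "rows:", deep, "total:", total)
    print("scratchpads that fit:", BLOCKS_ON_LX150 // total)
```

Under that assumption the result is 32 blocks in parallel, 2 rows deep, 64 blocks per scratchpad, and room for 4 scratchpads in 268 blocks. This counts memory only; as noted above, the arithmetic logic is likely to run out first.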

Operatr (OP)
Hero Member
*****
Offline Offline

Activity: 798
Merit: 1000


www.DonateMedia.org


View Profile WWW
April 17, 2013, 05:46:40 AM
 #47

I thought a FPGA was reprogrammable? Why can't someone just buy a blank FPGA with a decent amount of onboard memory and program the chip? Am I misunderstanding how to set up a FPGA?

They are versatile though some are more optimized in architecture for certain things. At the moment I am looking into FPGAs with Scrypt in mind over a more generic type such as the Spartan 6.

Operatr (OP)
Hero Member
*****
Offline Offline

Activity: 798
Merit: 1000


www.DonateMedia.org


View Profile WWW
April 17, 2013, 02:55:36 PM
 #48

At this point interest in this seems high enough that I am putting together a volunteer dev team to work with me. If you are an interested FPGA/microcomputing engineer, software engineer, or have otherwise relevant skills in bringing such a product to market, please PM me.

Project Overview

Design Goals:

Modular Scrypt FPGA system
USB Connectivity
Stand alone/Rack convertible casing for scalability
Associated software package

I have had a few PMs and have seen questions regarding pre-orders for this project:


On Pre-Orders


Any pre-order campaign will be tied to the current stage of development. Unlike other producers, there will be no pre-orders until a capital requirement covering the estimated costs of that stage is met. At this stage, that means producing a working prototype device. I am taking a community approach for complete transparency: every transaction would be made public knowledge, because if you are willing to take a chance on us, you should know exactly what your money is funding and see it develop before your eyes.

This approach minimizes risk and gives a linear progression of development that is seen by the whole community.

I don't believe it is fair to hold pre-orders in a way that presents the device as if it were a real product sold online, knowing full well it does not exist. I think that practice is fraudulent in nature.

Prototype Stage

Proto-adopters would be taking the bulk of the risk, so we would work out some other kind of benefit in return for funding assistance at this stage. This could be receiving a prototype device to help with testing, or some kind of future revenue sharing. I am open to ideas on what you would like to see if you opted to be a proto-adopter.

A price point will be announced before any pre-order campaign begins, along with a known cap to hit, and all pre-order capital will go into third-party escrow until the needed amount is reached. If the cap is not reached, it would be returned to you.

Production Stage

Once a working prototype is created, we will then move on to casing actual production costs, and much like the Proto stage, will have a certain goal needed before any capital is invested.

To do this will require a crowd-sourced effort, which would be conducted through various forums as well as things like Kickstarter campaigns and the like.

evilscoop
Sr. Member
****
Offline Offline

Activity: 350
Merit: 250



View Profile
April 17, 2013, 03:04:40 PM
 #49

sweet..

when you get to testing and software dev stage i can help more, until then ill be watching this closely..
gl and thx
Operatr (OP)
Hero Member
*****
Offline Offline

Activity: 798
Merit: 1000


www.DonateMedia.org


View Profile WWW
April 19, 2013, 04:02:20 AM
 #50

Announcement:

A dev team is officially being formed

Thank you all for your support! There will be more updates soon as our team comes together to start mapping out Stage 1, followed by round 1 pre-orders (or you may simply donate) when it comes time.

Operatr

blastbob
Hero Member
*****
Offline Offline

Activity: 602
Merit: 500



View Profile
April 19, 2013, 04:04:33 AM
 #51

Good stuff!

Will pre-order a few LTC hashers for sure, summer is coming. Heat is an issue

Bitrated user: blastbob.
Lacan82
Sr. Member
****
Offline Offline

Activity: 247
Merit: 250


View Profile
April 19, 2013, 04:17:36 AM
 #52

sweet :D I'm interested

tacotime
Legendary
*
Offline Offline

Activity: 1484
Merit: 1005



View Profile
April 19, 2013, 04:51:25 AM
 #53

The total RAM per block is 18KB. Each block has a 72-bit width. I don't really know where you're pulling your numbers from. Even if you calculate in parallel, 128/18 = 8 block RAM units required, with 72-bit widths each --> not 1024 bit width either.

I think you are misinformed about what is and is not possible.

You can construct whatever width you like by putting multiple units in parallel.  This is commonly done, and is a general feature of FPGAs, not unique to Xilinx.

The vendors put them into small blocks like that to improve the granularity / flexibility for the designer.  As a result, you effectively lose capacity (bits) when your chosen configuration doesn't map efficiently to the underlying memory organization.

Artix-7 is even better, but limiting the discussion to Spartan 6 which many people have already bought, here is some documentation:

See page two of this:
(a) http://www.xilinx.com/support/documentation/ip_documentation/blk_mem_gen_ds512.pdf

See page nine of this:
(b) http://www.xilinx.com/support/documentation/user_guides/ug383.pdf

See page two of this:
(c) http://www.xilinx.com/support/documentation/data_sheets/ds160.pdf


To get a x1024 memory using (a), you can see from (b) that one possibility might be (32) instances of (x32) width.
As far as the capability of the LX150 part commonly used on existing bitcoin mining boards, you will see in (c) that this device has a total of (268) such blocks.
So accommodating the 128KB scratchpad in SCRYPT could be done with (64) blocks configured for (x32) width and (32) units in parallel.   The LX150 could possibly hold (4) such memories, but I think you run out of gates for SCRYPT arithmetic well before that.



I'm sorry, but I still don't follow.  (b) Table 4 that you cited shows a maximum width of 32-bits for a 9 KB block.  With 18 KB data blocks, the maximum width is 64-bits (plus error checks bits).

You can get 32-bit writes in parallel on 32 separate 9 KB blocks, which is sort of like a 1024-bit interface (I guess; a 1024-bit interface really implies that you're writing 1024 bits a cycle through the same memory interface...).  I think a direct implementation like this won't achieve a very good speed, though (less than 10 KH/s on most of these chips).

The better implementation would just run in the allocated memory and remake the LUT as needed I would think.  See the kernel for cgminer and reaper, and use of the "lookup gap" function, which more or less does this.
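
To make the "lookup gap" idea concrete, here is a minimal Python sketch of the time-memory trade-off, not the cgminer or reaper kernel itself. It assumes scrypt's N=1024 with 128-byte entries and uses SHA-512 as a stand-in for the real Salsa20/8 block mix, so the outputs are not valid scrypt hashes; all function names are illustrative. Only every gap-th scratchpad entry is stored, and the missing ones are recomputed on demand.

```python
import hashlib

N = 1024  # scratchpad entries; 1024 x 128 bytes = 128KB


def mix(block: bytes) -> bytes:
    """Stand-in for scrypt's block mix. NOT Salsa20/8, just a 128-byte placeholder."""
    return hashlib.sha512(block).digest() * 2


def integerify(block: bytes) -> int:
    """Pick the next scratchpad index from the start of the last 64-byte half."""
    return int.from_bytes(block[-64:-60], "little") % N


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))


def romix_full(x: bytes) -> bytes:
    """Store all N entries, then do N data-dependent reads."""
    v = []
    for _ in range(N):
        v.append(x)
        x = mix(x)
    for _ in range(N):
        x = mix(xor(x, v[integerify(x)]))
    return x


def romix_gap(x: bytes, gap: int) -> bytes:
    """Store only every gap-th entry and rebuild the rest when they are read."""
    v = []
    for i in range(N):
        if i % gap == 0:
            v.append(x)
        x = mix(x)
    for _ in range(N):
        j = integerify(x)
        t = v[j // gap]
        for _ in range(j % gap):  # recompute the skipped entries
            t = mix(t)
        x = mix(xor(x, t))
    return x


if __name__ == "__main__":
    seed = bytes(range(128))
    assert romix_full(seed) == romix_gap(seed, 2)
    print("gap=2 matches the full table using half the memory")
```

With a gap of 2 the scratchpad shrinks to 64KB at the cost of, on average, half an extra mix call per read; larger gaps trade more recomputation for less memory.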

3ham3
Full Member
***
Offline Offline

Activity: 137
Merit: 100



View Profile
April 19, 2013, 10:47:21 AM
 #54


1. Do you think the market and community is ready for FPGA Litecoin?
Yes!


2. Is there definite interest in FPGA Litecoin machines? Would you buy one if the price was reasonable? What is reasonable?
I  am very interested, will be buying if the price is right.


3. Would you pre-order one to support first round funding for prototyping and first wave production?
Would need to see a complete working prototype before placing a pre order, or investing.
razorfishsl
Sr. Member
****
Offline Offline

Activity: 399
Merit: 250


View Profile WWW
April 19, 2013, 02:09:04 PM
 #55

 :'( Seriously, some noobie statements in this discussion.

1. FPGA is NOT, I repeat NOT, a single product from a single manufacturer, and as such there are NO hard and fast rules on what you can and cannot do with BRAMs or memory generation; even the Xilinx product range has a different 'flavour' across product lines.

So statements such as "you don't know what you are talking about" only show you up for the noob you are; if you were THAT WELL researched you would know this.

Talking about 'RAM' as a single entity is also a misnomer, because generally there are multiple ways to 'construct' RAM, which is after all just a flipflop.

If you are "lucky" the FPGA may have BRAM blocks where the internal resources and routing are all optimized for you, and you just 'hook it up'.
If you are not one of God's chosen people then you have to construct the 'RAM' from normal logic, with all the shitty routing and interconnection that implies.

2. Memory access speeds have little to do with it; ultimately it comes down to internal logic chains. No matter how FAST your memory is,
if your shitty VHDL/Verilog is so badly written that it takes 20ns to execute a clocked routine, then you may as well just be using pen and paper as a scratch pad; ultimately it bottlenecks somewhere.
Xilinx allows their internal BRAM to be operated at up to 600MHz on some of the V5/V6 parts, but unless you can get the rest of your relevant logic up to that speed, it does not really matter how fast it is.
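
To put numbers on that bottleneck, a minimal Python sketch, assuming a single clock domain where the slowest logic path sets the clock for everything, memory included. The 20ns and 600MHz figures are the ones mentioned above; the function name is illustrative.

```python
def achievable_clock_mhz(bram_fmax_mhz: float, critical_path_ns: float) -> float:
    """Clock rate limited by whichever is slower: the BRAM or the logic path."""
    logic_fmax_mhz = 1000.0 / critical_path_ns  # 1 / ns period, expressed in MHz
    return min(bram_fmax_mhz, logic_fmax_mhz)


if __name__ == "__main__":
    clock = achievable_clock_mhz(bram_fmax_mhz=600.0, critical_path_ns=20.0)
    print(f"usable clock: {clock:.0f} MHz")
    # Peak bandwidth of a 1024-bit port at that clock, in decimal GB/s.
    print(f"1024-bit port: {1024 * clock * 1e6 / 8 / 1e9:.1f} GB/s")
```

A 20ns path caps the design at 50MHz, so a 1024-bit port would move about 6.4 GB/s no matter what the BRAM itself is rated for.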

As regards Scrypt, I had actually contacted some members who claim to be interested in Technical co-op, but it came to naught....
People are only interested if they think you have an edge.

I have my own Scrypt code for Xilinx FPGA and a pluggable rack system that takes 10 boards; I had to mux them as 8+2 hot spares (yep, sometimes they drop in and out of service randomly).

It's a nice size, about 70cm*20*35cm, which allows for cooling and for sliding PCBs along to get the JTAG into each board, with per-board high-speed 17CFM MAGLEV fans (none of those shitty fans with the oil pool and stupid split washer under a label).
Only oversight is where on earth do I put the PSUs... (I'd banked on an ATX actually being able to supply the 3V3 supply, but they all lie about the capability)

Unfortunately...
Performance is shite... in comparison to high-end CPUs or GPUs.
Who knows if I can get an improvement, but it is going to be very hard to beat the GPU throughput vs. cost.

phk
Newbie
*
Offline Offline

Activity: 28
Merit: 0


View Profile
April 19, 2013, 09:07:54 PM
 #56


I'm sorry, but I still don't follow.  (b) Table 4 that you cited shows a maximum width of 32-bits for a 9 KB block.  With 18 KB data blocks, the maximum width is 64-bits (plus error checks bits).

You can get 32-bit writes in parallel on 32 separate 9 KB blocks, which is sort of like a 1024-bit interface (I guess; a 1024-bit interface really implies that you're writing 1024 bits a cycle through the same memory interface...).

Yes, an x1024 memory might be constructed with (32) blocks configured for x32 width.   Is there something you didn't understand about that?  (this is just repeating what I said earlier?)

CoinHoarder
Legendary
*
Offline Offline

Activity: 1484
Merit: 1026

In Cryptocoins I Trust


View Profile
April 19, 2013, 10:08:53 PM
Last edit: April 19, 2013, 11:05:11 PM by CoinHoarder
 #57

Good luck, I will be following closely.
TheSwede75
Full Member
***
Offline Offline

Activity: 224
Merit: 100



View Profile
April 19, 2013, 10:27:19 PM
 #58

Count me in for seed/prototype financing/purchase. No pain, no gain! (once dev team, ballpark cost etc. is presented).
tacotime
Legendary
*
Offline Offline

Activity: 1484
Merit: 1005



View Profile
April 19, 2013, 10:30:41 PM
 #59

Yes, an x1024 memory might be constructed with (32) blocks configured for x32 width.   Is there something you didn't understand about that?  (this is just repeating what I said earlier?)

No, that makes sense. Before, I was confused because I thought you were implying that a single 9 KB memory block could have a 1024-bit width.

TheSwede75
Full Member
***
Offline Offline

Activity: 224
Merit: 100



View Profile
April 19, 2013, 10:32:45 PM
 #60

Also: Not to be the guy asking the stupid questions here, but what is stopping bulk purchases of GPU chips (with clocking/memory specifically designed for mining) from AMD? With, say, 25 undervolted and fine-tuned 7850 chips on a fairly simple board that would plug in via USB and be recognized as a multi-CrossFire system, I can see that being a $$ while solution. Or maybe I am just dreaming..