Bitcoin Forum
Topic: Looking for an FPGA with cache for BTC and Litecoin Mining - any ideas?
disclaimer201 (OP)
August 04, 2012, 11:05:44 AM
 #1

Hey guys, I know this doesn't strictly belong in BTC mining, but the hardware section made the most sense imo.

Another one of my PSUs died recently, and I'm not sure if it makes sense to rebuild the rig for BTC and/or LTC mining. If possible I'd love to switch to FPGA mining, but with ASICs on the horizon I'd only buy an FPGA that is able to mine Litecoins as well as Bitcoins (not at the same time, obviously).

Does anyone have plans for an FPGA with memory cache for LTC mining?
2112
August 04, 2012, 03:43:30 PM
 #2

http://www.ztex.de/usb-fpga-1/usb-fpga-1.15.e.html

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
disclaimer201 (OP)
August 04, 2012, 09:05:26 PM
 #3


128 MB won't be enough to get good performance out of it, will it? And a working bitstream for mining would be needed as well, right? Maybe I'll ask again next year.
2112
August 04, 2012, 09:14:54 PM
Last edit: August 05, 2012, 02:01:15 PM by 2112
 #4

128 MB won't be enough to get good performance out of it, will it? And a working bitstream for mining would be needed as well, right? Maybe I'll ask again next year.
One thread of scrypt(1024,1,1) requires exactly 131583 bytes of memory, which is about 128.5 kB. Thus 128 MB would allow for pipelining of over 1000 parallel scrypt() threads.
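
As a quick sanity check on that arithmetic, here is a minimal Python sketch. The constant and function names are made up for illustration; only the 131583-byte and 128 MB figures come from the post.

```python
# Per-thread memory figure quoted in the post for scrypt(1024,1,1).
BYTES_PER_THREAD = 131583
# 128 MB of on-board RAM, taken as 128 * 2**20 bytes (an assumption
# about how the board's capacity is counted).
RAM_BYTES = 128 * 1024 * 1024


def max_threads(ram_bytes, bytes_per_thread):
    """Number of whole per-thread scratch areas that fit in RAM."""
    return ram_bytes // bytes_per_thread


per_thread_kb = BYTES_PER_THREAD / 1024
threads = max_threads(RAM_BYTES, BYTES_PER_THREAD)
print(f"{per_thread_kb:.1f} kB per thread, {threads} threads fit")
```

Under these assumptions the count comes out slightly above 1000, which matches the "over 1000" claim.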

pieppiep
August 05, 2012, 07:07:37 AM
 #5

I think I would round it up to 256 kB, so you only need address lines within the 256 kB plus separate address lines for the thread number. That way you don't have to calculate where to read by doing thread number * 128.5 kB.
That would still allow 512 parallel scrypt threads with that amount of memory.
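
The addressing trade-off can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the 131583-byte stride is the unpadded figure from the earlier post.

```python
SLOT_BITS = 18                      # 2**18 bytes = 256 kB per thread slot
RAM_BYTES = 128 * 1024 * 1024       # 128 MB, as discussed above
PACKED_STRIDE = 131583              # unpadded per-thread size (earlier post)


def addr_padded(thread, offset):
    """Power-of-two slots: the address is just two bit fields side by side,
    so hardware needs no multiplier, only wiring."""
    return (thread << SLOT_BITS) | offset


def addr_packed(thread, offset):
    """Tightly packed slots: needs a real multiplication per access."""
    return thread * PACKED_STRIDE + offset


# How many padded slots fit into the RAM.
max_padded_threads = RAM_BYTES >> SLOT_BITS
print(max_padded_threads, hex(addr_padded(3, 0x1F)), addr_packed(3, 0x1F))
```

The padded layout wastes roughly half of each slot but turns the address calculation into plain concatenation of the thread number and the offset.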
bitfury
August 06, 2012, 07:45:56 AM
 #6

Well, scrypt's scratchpad is a 1024 x 1024 matrix (1024 rows of 1024 bits). There are two loops, which cause a major slowdown:

1st loop - for i from 0 to 1023 - fill the scratchpad:
   scratchpad[i][1023..0] <= X[1023..0];
   X[511..0] <= xor_salsa(X[511..0], X[1023..512]);
   X[1023..512] <= xor_salsa(X[1023..512], X[511..0]);

2nd loop - for i from 0 to 1023 - use the scratchpad:
   X[1023..0] <= X[1023..0] xor scratchpad[X[521..512]][1023..0];
   X[511..0] <= xor_salsa(X[511..0], X[1023..512]);
   X[1023..512] <= xor_salsa(X[1023..512], X[511..0]);
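
For readers more used to software than HDL notation, the two loops can be sketched in Python. This is a slow reference sketch written from the public scrypt definition (N=1024, r=1, p=1), not anyone's miner code; the function names are my own, and the 1024-bit state X is held as 32 little-endian 32-bit words, so bits 521..512 are the low 10 bits of word 16.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF
N = 1024  # scratchpad rows, each 32 words = 1024 bits


def rotl(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK


def quarter(w, a, b, c, d):
    # One Salsa20 quarter-round, in place.
    w[b] ^= rotl((w[a] + w[d]) & MASK, 7)
    w[c] ^= rotl((w[b] + w[a]) & MASK, 9)
    w[d] ^= rotl((w[c] + w[b]) & MASK, 13)
    w[a] ^= rotl((w[d] + w[c]) & MASK, 18)


def xor_salsa(x, y):
    """Salsa20/8 core applied to (x xor y); x and y are 16-word lists."""
    t = [a ^ b for a, b in zip(x, y)]
    w = list(t)
    for _ in range(4):  # 4 double rounds = 8 rounds
        quarter(w, 0, 4, 8, 12); quarter(w, 5, 9, 13, 1)
        quarter(w, 10, 14, 2, 6); quarter(w, 15, 3, 7, 11)
        quarter(w, 0, 1, 2, 3); quarter(w, 5, 6, 7, 4)
        quarter(w, 10, 11, 8, 9); quarter(w, 15, 12, 13, 14)
    return [(a + b) & MASK for a, b in zip(t, w)]


def romix(x):
    scratchpad = []
    for _ in range(N):                      # 1st loop: sequential writes
        scratchpad.append(list(x))
        x[:16] = xor_salsa(x[:16], x[16:])
        x[16:] = xor_salsa(x[16:], x[:16])
    for _ in range(N):                      # 2nd loop: data-dependent reads
        j = x[16] & (N - 1)                 # bits 521..512 of the state
        x = [a ^ b for a, b in zip(x, scratchpad[j])]
        x[:16] = xor_salsa(x[:16], x[16:])
        x[16:] = xor_salsa(x[16:], x[:16])
    return x


def scrypt_1024_1_1(password, salt, dklen=32):
    """Full scrypt(1024,1,1): PBKDF2 in, the two loops, PBKDF2 out."""
    b = hashlib.pbkdf2_hmac("sha256", password, salt, 1, 128)
    x = romix(list(struct.unpack("<32I", b)))
    return hashlib.pbkdf2_hmac("sha256", password, struct.pack("<32I", *x), 1, dklen)
```

Calling `scrypt_1024_1_1(b"pw", b"salt")` returns a 32-byte digest. On Python builds that ship `hashlib.scrypt`, the result can be compared against `hashlib.scrypt(b"pw", salt=b"salt", n=1024, r=1, p=1, dklen=32)`.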

While xor_salsa can be perfectly pipelined, a Spartan-6 XC6SLX150 fits only 8 scratchpads.
If the BRAMs are not used for Bitcoin computations, it is possible to implement LTC mining on the XC6SLX150 at about 50 - 100 kh/s per chip with about 80% of slices free.
So a single chip can mine both LTC and BTC using different internal resources: BRAMs for LTC and logic for BTC.

What is interesting to note is that scratchpad access can be perfectly pipelined as well, and it is 1024 bits wide. That means an imagined FPGA would need only 6 wires to send out the address (6 bits + clock) and 1024 input wires for scratchpad data.

This means that multiple smaller DRAM chips working in parallel will do the best job, allowing about 500 mega-transfers per second for a low-cost / mid-cost FPGA, which is about 500 gigabits per second or 60 gigabytes per second. The overall cost of the DRAM will be about 150 EUR, and of an FPGA to handle that about 300 EUR. If it works in a fully pipelined manner, it would give about 500 kh/s mining performance for Litecoin.
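
Those bandwidth and hash-rate figures can be reproduced with a back-of-the-envelope Python sketch. The variable names are my own, and the accesses-per-hash counts are assumptions: the ~500 kh/s figure appears to count only the 1024 reads of the second loop, while counting the 1024 writes of the first loop as well would halve it.

```python
BUS_BITS = 1024             # scratchpad row width
TRANSFERS_PER_SEC = 500e6   # "about 500 mega-transfers" from the post
N = 1024                    # scratchpad accesses per loop


def bandwidth_gbit(bus_bits, transfers_per_sec):
    """Raw bus bandwidth in gigabits per second."""
    return bus_bits * transfers_per_sec / 1e9


def hashrate_khs(transfers_per_sec, accesses_per_hash):
    """Hashes per second (in kh/s) if memory transfers are the only limit."""
    return transfers_per_sec / accesses_per_hash / 1e3


gbit = bandwidth_gbit(BUS_BITS, TRANSFERS_PER_SEC)
gbyte = gbit / 8
khs_reads_only = hashrate_khs(TRANSFERS_PER_SEC, N)        # second loop only
khs_both_loops = hashrate_khs(TRANSFERS_PER_SEC, 2 * N)    # writes + reads
print(gbit, gbyte, round(khs_reads_only), round(khs_both_loops))
```

So the post's round numbers (500 Gbit/s, 60 GB/s, 500 kh/s) are close to 512 Gbit/s, 64 GB/s and roughly 488 kh/s under the reads-only assumption.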

Generally, the Litecoin performance achieved with an FPGA would be about the same as with a decent GPU board, but power consumption would be radically lower than for, say, SHA-256 Bitcoin mining. Power dissipation would be very low. That is the only advantage; the cost to build the solution would be higher.

What is more interesting is that there will be no cheap route to an ASIC for LTC, as most of the chip area would be RAM, and there is no significant edge in producing RAM for pipelining on, say, a 250-nm or 90-nm process. But building cheap 250-nm chips for the computations and to drive DRAM arrays would give a significant cost reduction compared to installing FPGAs. Still, DRAM prices will not go anywhere, and the best DRAM-based solution would not outperform GPUs or CPUs by orders of magnitude.

Say, for scrypt it is best to pipeline about 32-36 calculations deep, not 1024. That would make the xor_salsa calculation and DRAM access performance comparable.

The best on-die solution should contain a DRAM block with a 1024-bit wide bus that is 32768 rows tall; that will be the biggest thing on the chip. For example, at 90 nm (where 90 nm is the smallest feature size), a single holding cell with its routing area would be about 0.5 um^2. So the overall chip area would be ~16 mm^2 (!) without self-healing features, and the computation unit would be below 0.2 mm^2 :-)
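
The ~16 mm^2 estimate follows directly from the cell count. A minimal sketch, using only the figures from the post; the names are made up, and treating the block as exactly width x height one-bit cells is an assumption:

```python
BUS_BITS = 1024        # DRAM block width in bits
ROWS = 32768           # DRAM block height in rows
CELL_AREA_UM2 = 0.5    # one holding cell plus routing at 90 nm (post's figure)
UM2_PER_MM2 = 1_000_000


def dram_area_mm2(width_bits, rows, cell_area_um2):
    """Total area of a width x rows array of one-bit cells, in mm^2."""
    return width_bits * rows * cell_area_um2 / UM2_PER_MM2


area_90nm = dram_area_mm2(BUS_BITS, ROWS, CELL_AREA_UM2)
scratchpads_on_die = ROWS // 1024   # each scrypt scratchpad is 1024 rows
print(f"{area_90nm:.1f} mm^2 for {scratchpads_on_die} scratchpads")
```

The 32768 rows correspond to 32 full scratchpads, which lines up with the suggested pipeline depth of about 32-36.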

For 90 nm that still requires a $500k initial investment to build masks, plus investment in design, etc. You'd get a price of about $1 per die for a chip that could compute 124 kh/s.
Power consumption would be negligible: about 0.1 - 0.5 W.

What is more interesting is that 180 nm would require a ~$150k-$200k initial investment and lead to a $4 / die price (the die will be bigger) and about 60 kh/s performance.

And 250 nm would require _much_ less, about $50k-$80k initial investment, and lead to an $8 / die price (the die will be very big! 128 mm^2!) with about 40 kh/s performance.

So I would consider 180 nm to 250 nm for an LTC ASIC. A big die is maybe not that bad, as it can easily be mounted without packaging (11 mm x 11 mm is really big!).
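
To compare the three process nodes side by side, here is a rough Python sketch. It assumes die area scales with the square of the feature size, starting from roughly 16.8 mm^2 at 90 nm (1024 x 32768 cells at 0.5 um^2 each); that scaling rule is my assumption, not something stated in the post. Die prices and hash rates are the post's preliminary figures.

```python
import math

AREA_90NM_MM2 = 16.777216   # 1024 * 32768 cells * 0.5 um^2, in mm^2

# node (nm) -> (die price in USD, hash rate in kh/s), figures from the post
NODES = {90: (1.0, 124.0), 180: (4.0, 60.0), 250: (8.0, 40.0)}


def scaled_area_mm2(node_nm):
    """Die area assuming area grows with the square of the feature size."""
    return AREA_90NM_MM2 * (node_nm / 90) ** 2


def usd_per_khs(node_nm):
    """Die cost per kh/s, ignoring masks, design, packaging and power."""
    price, khs = NODES[node_nm]
    return price / khs


for node in NODES:
    area = scaled_area_mm2(node)
    print(f"{node} nm: ~{area:.0f} mm^2, side ~{math.sqrt(area):.1f} mm, "
          f"${usd_per_khs(node):.3f} per kh/s")
```

Under that scaling assumption the 250-nm die lands near 129 mm^2, close to the 128 mm^2 (about 11 mm x 11 mm) quoted above, while the per-die cost per kh/s rises from under one cent at 90 nm to about 20 cents at 250 nm. The one-off mask costs are not included.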

Well, these numbers are very preliminary... I am currently learning ASIC design, and I think I would design a scrypt hasher chip as well. 250 nm requires really small amounts of money to start with, and maybe something could be invented for speedup as well: scrypt accesses memory randomly only in the second part, when using the scratchpad; when generating it, it accesses memory sequentially.
Hmm ... maybe a Litecoin chip would even be done before a Bitcoin chip, as it seems to be much simpler and uses well-known techniques - less competition :-)

wizzardTim
March 19, 2013, 09:40:07 PM
 #7

Any news on this? Have you designed the chip?

Behold the Tangle Mysteries! Dare to know It's truth.

- Excerpt from the IOTA Sacred Texts Vol. I
loshia
March 20, 2013, 08:27:24 AM
 #8

I was wondering whether it is possible for miner software + a bitstream to use an external RAM resource, something like PC RAM? We could add as much as we want.
Having it as a commission bitstream is OK for me. There are a lot of Spartans out there, and if this is possible at all, whoever makes it will be rewarded for sure.
Any comments?

Please help the Led Boy aka Bicknellski to make us a nice Christmas led tree and pay WASP membership fee here:
https://bitcointalk.org/index.php?topic=643999.msg7191563#msg7191563
And remember Bicknellski is not collecting money from community;D
2112
March 20, 2013, 12:08:19 PM
 #9

I was wondering whether it is possible for miner software + a bitstream to use an external RAM resource, something like PC RAM? We could add as much as we want.
Spartan-6 memory controller blocks are designed to control single memory chips, not multi-chip memory modules. There are 4 memory controller blocks in each Spartan-6, but depending on the package not all are connected to pins.

While it isn't impossible to build a memory-module controller from the regular Spartan-6 logic blocks, such a controller will be inefficient and slow. In the Xilinx product line only Virtex FPGAs can directly interface with multi-chip memory modules.

BTW, bitfury is very busy with his 55nm Bitcoin ASIC project:

https://bitcointalk.org/index.php?topic=140366.msg1641318#msg1641318

loshia
March 20, 2013, 12:12:06 PM
 #10

I was wondering whether it is possible for miner software + a bitstream to use an external RAM resource, something like PC RAM? We could add as much as we want.
Spartan-6 memory controller blocks are designed to control single memory chips, not multi-chip memory modules. There are 4 memory controller blocks in each Spartan-6, but depending on the package not all are connected to pins.

While it isn't impossible to build a memory-module controller from the regular Spartan-6 logic blocks, such a controller will be inefficient and slow. In the Xilinx product line only Virtex FPGAs can directly interface with multi-chip memory modules.

BTW, bitfury is very busy with his 55nm Bitcoin ASIC project:

https://bitcointalk.org/index.php?topic=140366.msg1641318#msg1641318
10X

By the way, I have been watching bitfury closely for a long time :)

pieppiep
March 20, 2013, 12:25:34 PM
 #11

Remember, Litecoin doesn't use main-memory bandwidth; it uses the L1 (L2?) cache bandwidth, which is much higher.
daybyter
March 20, 2013, 01:05:42 PM
 #12

http://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&CategoryNo=138&No=683

wizzardTim
March 20, 2013, 01:55:44 PM
 #13


What exactly can we do with this board? It says it has advanced memory interfacing. Can we use it for mining LTCs?

daybyter
March 20, 2013, 03:14:51 PM
 #14

You can add up to 4 GB of RAM. I thought that might be sufficient for an LTC lookup table.

tacotime
March 20, 2013, 04:13:35 PM
 #15

Any news on this? Have you designed the chip?

These are theoretical numbers... laSeek has been busting his ass trying to get kilohash/second rates into the double digits with inexpensive FPGAs. The trials on Altera FPGAs were a trainwreck.

Even with a large number of slices, you will run into these problems:
1) Memory bandwidth in FPGA devices is poor compared to a GPU. For on-slice cache it is 10-20x less than that of a GPU, and for off-chip memory it is about 20-40x less than a GPU.
2) The clock rate of FPGA devices is generally lower than that of GPUs.

You can resolve 1) by chaining memory interfaces in a multichip configuration, but that's a lot of hardware customization.

Code:
XMR: 44GBHzv6ZyQdJkjqZje6KLZ3xSyN1hBSFAnLP6EAqJtCRVzMzZmeXTC2AHKDS9aEDTRKmo6a6o9r9j86pYfhCWDkKjbtcns
wizzardTim
March 20, 2013, 04:19:02 PM
 #16

You can add up to 4 GB of RAM. I thought that might be sufficient for an LTC lookup table.

That's good news. I do not think the RAM will be expensive, so who should we ask to get better info? Do you have any insights on this: how many hashes it will produce, and what is needed to program the board so it can mine with scrypt? I'm a software engineer, but I've never programmed a board.

daybyter
March 20, 2013, 05:39:18 PM
 #17

That's exactly my problem. I write software and programmed PALs etc. many years ago, but I have never programmed an FPGA. I got a link to a SHA-256 implementation in VHDL (I know, that's not what LTC requires), and I compared it to the C sources just to get an idea of how similar they look. At first glance you can port the C sources almost 1:1. But I guess the devil is in the details, so I won't claim that a scrypt port is no problem. I wondered if it's feasible to simulate the whole hardware before any money is spent on prototype boards? But maybe the dev software alone will cost quite some money... don't know...

tacotime
March 20, 2013, 06:01:38 PM
 #18

LaSeek has been running simulations like crazy, they come out fast, but after synthesis they run very slow so far.

wizzardTim
March 21, 2013, 09:05:11 AM
 #19

LaSeek has been running simulations like crazy, they come out fast, but after synthesis they run very slow so far.

Is there any chance that the slow speed comes from the simulation itself? What if we tried it on a real board? Would the results be similar or way different (better)?

tacotime
March 21, 2013, 07:47:55 PM
 #20

The slow speeds are on real chips. The simulations are what runs fast.

They're working on a lot of optimizations for the N=1024, p=1, r=1 scenario that is the current implementation.  I think it's more of a technical challenge for laSeek as an FPGA engineer than anything else. It'll be interesting to see if he gets it off the ground.

.m.
March 31, 2013, 02:00:49 PM
 #21

Hi, what do you think about this one?
Virtex-7 2000T: Designed with ASIC Prototyping and Emulation in Mind - FPGA enabled by Stacked Silicon Interconnect (SSI) technology delivers 2 million logic cells, 6.8 billion transistors in 28 nm design.
Around 20 W @ 100 MHz (3600 8-bit processing elements consumed 85% chip capacity providing 180 000 MIPS)

http://www.xilinx.com/applications/asic-prototyping/index.htm

Oops - do they really cost 5000 USD each?


lame.duck
March 31, 2013, 04:47:00 PM
 #22

LaSeek has been running simulations like crazy, they come out fast, but after synthesis they run very slow so far.

Do you have a link? I could not find much information
tacotime
March 31, 2013, 05:00:35 PM
 #23

LaSeek has been running simulations like crazy, they come out fast, but after synthesis they run very slow so far.

Do you have a link? I could not find much information

Proprietary design, they're not talking about it a lot.  You can try them at #litecoin-dev on freenode if you like.

dodegkr
May 04, 2013, 10:05:59 AM
 #24

I wonder if the up/down ports on the Cairnsmore1 FPGA board could be used to interface with some sort of memory.