Bitcoin Forum
Author Topic: I got an idea that could make scrypt fpga/asics a lot faster. [TECHNICAL]  (Read 887 times)
awesomeperson451 (OP)
June 26, 2015, 08:21:55 PM
 #1

I had a relatively simple design idea a while ago for speeding up the hashrate of any ASIC/FPGA design by a couple of orders of magnitude at the expense of a lot more logic. While I don't know if it was genius or dumb, I've been keeping the idea to myself in the hope that someday I'd learn enough about chip design to make my own FPGA, and later that I'd make a couple of friends in the industry who would let me make a small run of ASICs at a doable price.

Unfortunately, my interests have been moving in a different direction lately, and none of those hopes have come to fruition. So, I might as well share the idea with some of you fine folks. Just promise me that if you're a developer out there who wants to implement it in your design, you'll reserve me a spot on the pre-order and maybe give me an employee discount. Of course, I can't make you do anything, but it'd be nice.

Anyway, on to the idea: The main problem with designing a fast scrypt ASIC is the fact that scrypt is memory intensive. We all know that already, right? Well, what you probably don't know is that the problem has less to do with the physical amount of memory needed than with the number of individual memory accesses needed. Why? Because memory, or more specifically, external RAM/cache memory, is slow. Really slow. And, if you design one, you'll find that the speed of your ASIC soon becomes the speed of your ASIC's memory divided by the number of cores you implemented, plus or minus a small bit of overhead.

So, my idea is, why use memory at all? Just because we have to store a bunch of bits at various points in the hashing process to use later doesn't mean we have to resort to a different chip entirely. Just build your own memory, in logic! Why use slow, multipurpose RAM when you could use fast, application-specific memory instead? After all, "application-specific" is in the name of the thing you're designing!

All you need is a bit of clever design and a whole shitload of flip-flops (as in two NOR gates feeding back into each other, not the shoe). There's no seek time because you know where each bit is located, and now reads and writes are less complicated than adding 1+1. Memory latency is now just as fast as the rest of the circuit.
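For readers who have not seen one, the storage cell described above can be sketched in a few lines of Python. This is only a software model under stated assumptions: the names `nor` and `NorLatch` are made up for illustration, and two cross-coupled NOR gates strictly form an SR latch, while a clocked flip-flop in a real design is built from several such stages.

```python
def nor(a: int, b: int) -> int:
    """Two-input NOR gate on 0/1 values."""
    return 0 if (a or b) else 1


class NorLatch:
    """SR latch: two NOR gates, each fed by the other's output (illustrative model)."""

    def __init__(self) -> None:
        # Start in the "stored 0" state.
        self.q, self.q_bar = 0, 1

    def step(self, s: int, r: int) -> int:
        """Apply set/reset inputs and return the stored bit once the gates settle."""
        for _ in range(4):  # a few passes are enough for the outputs to stabilise
            q = nor(r, self.q_bar)
            q_bar = nor(s, q)
            if (q, q_bar) == (self.q, self.q_bar):
                break
            self.q, self.q_bar = q, q_bar
        return self.q


latch = NorLatch()
latch.step(1, 0)         # set: the latch now stores 1
print(latch.step(0, 0))  # inputs released: the bit is still held
```

With both inputs released the latch keeps whatever was last written, which is the "memory in logic" the post is talking about: one such cell (or a larger flip-flop) per stored bit.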

So... Good idea, or bad idea? Because I assume someone had to think of this before me.
djm34
June 26, 2015, 11:09:40 PM
 #2

you are not talking about a few bits of memory to store, but a lot more...

awesomeperson451 (OP)
June 27, 2015, 02:03:23 AM
 #3

Quote
you are not talking about a few bits of memory to store, but a lot more...

Yeah, I know. 128 kB (1,048,576 bits) per hash per core, if I remember correctly, which would require 1,048,576 flip-flops (plus some overhead from the various calculations that have to be done). Remember that memory usage only starts to get gigantic when you run multiple cores. Each modern ASIC chip has as many cores as its designers could fit on the die. As for GPUs, my 7970, for example, has 2048 cores. 2048 times 128 kB equals 256 MB (about 268 million bytes), which is right in the ballpark of how much GPU RAM it uses while hashing. Divide the number of hashes/second it runs (700 kh/s) by the number of cores (2048), multiply by the number of memory accesses per hash (I forgot), then take the reciprocal of that (divide one by it), and I'd bet you would come up with a figure somewhere close to the memory latency of the GPU's RAM (googled for an hour, still can't find it).
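The arithmetic in this post can be checked with a short sketch. The assumptions are mine, not the poster's: scrypt parameters N=1024 and r=1 (which give the 128 kB scratchpad, counted as 128 × 1024 bytes), one independent hash per GPU core, and 2 × N scratchpad accesses per hash (N writes to fill it, then N reads). The function names are made up for illustration.

```python
def scratchpad_bytes(n: int = 1024, r: int = 1) -> int:
    """Scratchpad size for one scrypt hash: 128 * r * N bytes (assumed N=1024, r=1)."""
    return 128 * r * n


def seconds_per_access(hashrate: float, cores: int, accesses_per_hash: int) -> float:
    """Time budget per memory access if every core works on one hash at a time."""
    accesses_per_second_per_core = hashrate / cores * accesses_per_hash
    return 1.0 / accesses_per_second_per_core


per_hash = scratchpad_bytes()
print(per_hash * 8)     # bits, i.e. flip-flops needed per core
print(per_hash * 2048)  # bytes across 2048 cores

# Assumed access count: N writes plus N reads per hash.
latency = seconds_per_access(hashrate=700_000, cores=2048, accesses_per_hash=2 * 1024)
print(f"{latency * 1e6:.2f} microseconds per access")
```

Under these assumptions the budget works out to roughly 1.4 microseconds per access per core. That is only an upper bound on the effective latency, since a GPU overlaps many accesses at once.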

But can you fit that many flip-flops on a chip, you might ask? Well, I figure, if you can fit it on an FPGA, you can fit it on an ASIC. Look at pages 10-11 of this user manual. According to that, you can get up to 2,443,000 flip-flops on an FPGA, and that's just that brand/series alone, so it can certainly be done on an ASIC.
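As a rough sanity check on that figure, here is how many complete scratchpads the quoted flip-flop count could hold. The sketch assumes one flip-flop per stored bit, a 128 × 1024 byte scratchpad, and ignores all the logic needed for the hashing itself; `scratchpads_that_fit` is a hypothetical helper name.

```python
def scratchpads_that_fit(flip_flops: int, bits_per_scratchpad: int = 128 * 1024 * 8) -> int:
    """Whole scratchpads that fit if every flip-flop stores exactly one bit."""
    return flip_flops // bits_per_scratchpad


print(scratchpads_that_fit(2_443_000))  # cores' worth of scratchpad on the quoted FPGA
```

So the quoted part would hold the working memory for about two cores, before counting any flip-flops used by the cores themselves.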
Eastwind
June 27, 2015, 04:14:43 PM
 #4

You need fast memory directly connected to the cores, not the GDDR5 memory.
MaxDZ8
June 30, 2015, 06:09:33 AM
 #5

You cannot find GPU memory latency easily because it's a function of the kernel being run.
For graphics kernels, latency is effectively 0.
Scrypt on my 7750 is not memory limited at all (using GAP 2), my performance scales with GPU clock perfectly.
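Assuming "GAP 2" refers to the lookup-gap setting found in scrypt GPU miners, the trade-off it makes can be sketched as follows: store only every second scratchpad entry and recompute the skipped ones when they are needed, halving memory at the cost of extra computation. This is a toy model, not real scrypt: `mix` uses SHA-256 as a stand-in for scrypt's Salsa20/8-based BlockMix, and all names are illustrative.

```python
import hashlib


def mix(block: bytes) -> bytes:
    """Stand-in for scrypt's BlockMix (assumption: SHA-256, not Salsa20/8)."""
    return hashlib.sha256(block).digest()


def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def romix(seed: bytes, n: int, gap: int = 1) -> bytes:
    """Toy ROMix loop that keeps only every `gap`-th scratchpad entry."""
    x = mix(seed)  # normalise the seed to one 32-byte block
    stored = []
    for i in range(n):  # fill phase: V[i] = x, then advance x
        if i % gap == 0:
            stored.append(x)
        x = mix(x)
    for _ in range(n):  # lookup phase: data-dependent reads of V[j]
        j = int.from_bytes(x[:4], "little") % n
        v = stored[j // gap]
        for _k in range(j % gap):  # recompute the entries that were skipped
            v = mix(v)
        x = mix(xor(x, v))
    return x


# Same result with a quarter of the memory, paid for with extra mix() calls.
print(romix(b"example", 64, gap=1) == romix(b"example", 64, gap=4))
```

With a larger gap the scratchpad shrinks while the arithmetic per lookup grows, which is consistent with performance tracking the GPU clock rather than memory speed.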

The memory you're talking about exists in commercial products. GPUs have it, it's called Local Data Share or "local memory". It's already twice as big as L1 (just because some idiots think cache is efficient), it has lower latency (potentially a couple of clocks), twice the bandwidth, it has a 32-way crossbar (!!!) and I've been told it burns 1/4 of the power.