Bitcoin Forum
May 06, 2024, 06:58:47 PM
Topic: A custom designed FPGA miner for LTC? (Read 5748 times)
WindMaster
Sr. Member
****
Offline

Activity: 347
Merit: 250

May 25, 2013, 09:22:59 PM
Last edit: May 26, 2013, 12:59:29 AM by WindMaster
 #41

This is a public service announcement for anyone that feels inclined to start sending BTC or LTC to nearly anyone that drops the words "Litecoin" and "FPGA" in the same post, even when it's apparent to everyone with in-depth knowledge on the subject that the OP likely doesn't know what he's talking about.


scrypt differs mostly because it uses an entirely new list so frequently.  
I think the big problem is that you can't unroll Salsa mixing because of its recursive form. Thus you can't parallelize the calculations as you can with SHA-256. The only thing you can do is run multiple instances of your 'cores' in parallel. But I don't think Stratix has enough on-die RAM (52 Mbit max) to overwhelm a pool as you said.
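To put the 52 Mbit figure in perspective, here is a back-of-envelope sketch in Python. It assumes (this is not stated in the thread) that 1 Mbit = 2^20 bits, that all block RAM can hold scratchpad data, and that every core keeps its own full 1024 × 128-byte scratchpad with no other working storage. The helper names are made up for illustration.

```python
# Back-of-envelope: how many full scrypt(N=1024, r=1, p=1) scratchpads fit on-die?
# Assumptions (illustrative only): 1 Mbit = 2**20 bits, 100% of block RAM is usable,
# and each core stores its own complete V array with no other buffers.


def scratchpad_bytes(n: int = 1024, r: int = 1) -> int:
    """Size of scrypt's V array: N entries of 128 * r bytes each."""
    return n * 128 * r


def max_cores(on_die_mbit: float, n: int = 1024, r: int = 1) -> int:
    """Upper bound on concurrent cores that each hold a full scratchpad."""
    total_bits = int(on_die_mbit * 2**20)
    return total_bits // (scratchpad_bytes(n, r) * 8)


if __name__ == "__main__":
    print(scratchpad_bytes())  # 131072 (128 KiB)
    print(max_cores(52))       # 52-Mbit Stratix figure -> 52
    print(max_cores(12))       # 12-Mb Cyclone V figure -> 12
```

Under these generous assumptions each Mbit of block RAM holds exactly one scratchpad, so even the largest part mentioned tops out at a few dozen concurrent scrypt cores.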

The OP's claim is way worse than that. If you look at what he posted over in the 'scrypt is "memory intensive" therefore no ASICs, but how?' thread, he elaborates a bit more on how he thinks scrypt works, and on what he actually means when he talks about setting up and tearing down an entirely new list:


Scrypt is resistant because it is memory hard.  
The amount of memory required is controlled specifically by a pseudo-randomly generated list that is changed at every hashing cycle. This means that the setup and take down of the list is expensive and it has to be done with each iteration. The alternative is to only generate a small subset of it, but the generation algorithm is itself CPU intensive.

The OP failed the basic scrypt knowledge test, I'm afraid.  I saw a flawed explanation posted somewhere that looked like the OP's description, but can't remember where I saw it.  This is far enough "out there" that I bet the OP had to have read it in the same place.

You can calculate scrypt+salsa20/8(1024,1,1) as used in Litecoin with a fixed 128kB buffer + a bit of extra scratchpad memory, all day long, without calculating any sort of dynamic list that determines how much memory will be involved in calculating the hash.  And the memory access pattern will be exactly the same every time you calculate the hash.  In fact, my own FPGA implementation of scrypt with external DDR3 leveraged this fact by shifting every scrypt core 1 clock cycle from the previous one, such that a burst read or write to/from DDR3 would fetch all the data needed (or written) by each core precisely when that core was going to use (or generate) it.  This was possible because the memory access pattern and amount of memory needed is exactly the same every time.
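A minimal Python sketch of the fixed-size computation described above, written from RFC 7914 (Salsa20/8 core, scryptBlockMix and scryptROMix, specialised here to r = 1). It is a slow software model, not anyone's FPGA design, and the function names are illustrative. It shows that the scratchpad V is always N × 128 bytes (128 KiB for N = 1024) and that no list size is computed at run time. One nuance: the number, order and size of accesses are fixed, but the entries read in the second loop are chosen by Integerify(X), so which entry gets read does depend on the data.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF


def _rotl(x: int, n: int) -> int:
    return ((x << n) | (x >> (32 - n))) & MASK


def salsa20_8(block: bytes) -> bytes:
    """Salsa20/8 core (RFC 7914, section 3) on one 64-byte block."""
    inp = struct.unpack("<16I", block)
    x = list(inp)

    def qr(a, b, c, d):  # one quarter round, updating x in place
        x[b] ^= _rotl((x[a] + x[d]) & MASK, 7)
        x[c] ^= _rotl((x[b] + x[a]) & MASK, 9)
        x[d] ^= _rotl((x[c] + x[b]) & MASK, 13)
        x[a] ^= _rotl((x[d] + x[c]) & MASK, 18)

    for _ in range(4):  # 4 double rounds = 8 rounds
        qr(0, 4, 8, 12); qr(5, 9, 13, 1); qr(10, 14, 2, 6); qr(15, 3, 7, 11)   # columns
        qr(0, 1, 2, 3); qr(5, 6, 7, 4); qr(10, 11, 8, 9); qr(15, 12, 13, 14)   # rows
    return struct.pack("<16I", *((x[i] + inp[i]) & MASK for i in range(16)))


def _xor(a: bytes, b: bytes) -> bytes:
    return (int.from_bytes(a, "little") ^ int.from_bytes(b, "little")).to_bytes(len(a), "little")


def blockmix_r1(b: bytes) -> bytes:
    """scryptBlockMix with r = 1: two chained Salsa20/8 calls over a 128-byte block."""
    y0 = salsa20_8(_xor(b[64:], b[:64]))
    y1 = salsa20_8(_xor(y0, b[64:]))
    return y0 + y1


def romix(b: bytes, n: int = 1024) -> bytes:
    """scryptROMix with r = 1. V always holds n * 128 bytes (128 KiB for n = 1024)."""
    v = []
    x = b
    for _ in range(n):  # phase 1: n sequential 128-byte writes
        v.append(x)
        x = blockmix_r1(x)
    for _ in range(n):  # phase 2: n reads; the index comes from the current X
        # Integerify(X) mod N; low 32 bits suffice because N is a power of two
        j = int.from_bytes(x[64:68], "little") % n
        x = blockmix_r1(_xor(x, v[j]))
    return x


def litecoin_scrypt(header: bytes) -> bytes:
    """scrypt(P=header, S=header, N=1024, r=1, p=1, dkLen=32) over a block header."""
    b = hashlib.pbkdf2_hmac("sha256", header, header, 1, 128)
    return hashlib.pbkdf2_hmac("sha256", header, romix(b), 1, 32)


if __name__ == "__main__":
    header = bytes(80)  # placeholder 80-byte header, all zeros
    ref = hashlib.scrypt(header, salt=header, n=1024, r=1, p=1, dklen=32)
    print(litecoin_scrypt(header) == ref)
```

Cross-checking against `hashlib.scrypt` (available when Python is built against an OpenSSL with scrypt support) is just a convenient way to confirm the sketch matches the standard function.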


Quote
The shortcut is to have a multicore setup and a ton of on-die ram.
A dedicated prng core which does the setup and teardown for the second core.
I don't see the shortcut here. Are you thinking of a two-stage pipeline with dual-port RAM in the middle?

The OP doesn't have a shortcut at all.  Even if scrypt worked the way he described, the OP's suggestion of a 2 core approach as a "shortcut" would be a retarded design for an FPGA implementation.


To conclude, I don't understand why you need funding for your idea, because you can test everything in simulation. Altera provides a free web edition of their dev tools that doesn't allow you to target Stratix, but you can target Cyclone V. You should be able to validate your idea with 12 Mb of on-die RAM. Then you'll have tangible results to get funds for a dev board, which is really expensive :P

+1
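As a rough illustration of the simulation route suggested above: a software reference model can produce expected outputs that a simulated design is checked against. The sketch assumes Python's `hashlib.scrypt` (which needs an OpenSSL build with scrypt support) as the reference. The helper names, the text file format, and the choice to vary only the 4-byte nonce field (offset 76 in an 80-byte Bitcoin/Litecoin-style header) are illustrative assumptions, not anything from the thread.

```python
import hashlib


def golden_vectors(count: int, seed_header: bytes = bytes(80)):
    """Yield (header, expected_hash) pairs for a simulation testbench.

    Each header is the seed header with a little-endian counter written into
    bytes 76..79, the nonce field of an 80-byte block header.
    """
    for nonce in range(count):
        header = seed_header[:76] + nonce.to_bytes(4, "little")
        digest = hashlib.scrypt(header, salt=header, n=1024, r=1, p=1, dklen=32)
        yield header, digest


def write_hex_file(path: str, count: int) -> None:
    """Write one 'header_hex digest_hex' line per vector (hypothetical format)."""
    with open(path, "w") as f:
        for header, digest in golden_vectors(count):
            f.write(f"{header.hex()} {digest.hex()}\n")
```

A testbench would then read the file, feed each header to the simulated core, and compare its output with the second column.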

In fact, if we look at his post in the other thread, he claims he already implemented it, and destroyed the FPGA on his dev board while it "sounded like a jet landing the whole time":


For the record I built an FPGA scrypt miner a few weeks ago.
That particular FPGA has a direct path to ASIC from the mfr because it's designed specifically for prototyping ASICs.

The value of LTC is not high enough at this time to justify the cost of the FPGA and in my case at least, an error in the code that I was using caused it to overheat (I couldn't get temp data out of it, the miner was reading 0 the whole time, it never slowed down and sounded like a jet landing the whole time).  It quickly became a paperweight.  
A $10,000 paperweight.  

Does not compute, for anyone with technical knowledge on the subject. In the highly unlikely case that this did actually occur, it would mean the OP already has the dev tools as well and would have no need to replace the whole dev board (he states earlier in this thread that the dev board costs much more than the FPGA IC); it would be more cost-effective to desolder the FPGA IC, clean up the BGA pads and reflow a new FPGA onto the board.


Also ASICs can be built from some FPGAs and those ASICs can be still faster.

Altera's HardCopy program is really just a mask-programmed FPGA that Altera has pre-qualified for your particular netlist to run at a little higher speed. I wouldn't call it a true ASIC: it doesn't achieve anywhere near the speed-up that you'd normally experience going from an FPGA to an actual ASIC implementation built from your original Verilog source, and it only achieves a few % cost reduction over Altera's equivalent FPGAs. The only reason it costs less than the equivalent FPGA is that Altera doesn't have to qualify and test the FPGA for every possible design someone could load on it; they only have to test and qualify it for your specific netlist that was mask-programmed on the die. Best not point at HardCopy as a valid route to an ASIC implementation for LTC.

Hopefully this gives people a little better idea what the odds are the OP is trying to scam people.  And I see people in this thread have already been sending LTC and BTC to him!  Wow..

OP: Take some time to learn how scrypt works.  Read Percival's original Tarsnap scrypt whitepaper.  Check out the source code for a few scrypt implementations.  That way you can have the correct details on the next BS / scam attempt.  Suggesting that scrypt's memory requirements are dynamic and determined by an expensively computed list calculated on each iteration was your biggest mistake here.
WindMaster
Sr. Member
****
Offline

Activity: 347
Merit: 250

May 25, 2013, 09:30:04 PM
 #42

Just preserving a copy so the OP can't change his original posts in this thread or the other one to remove all the incorrect claims that are red flags:


Scrypt is resistant because it is memory hard.  
The amount of memory required is controlled specifically by a pseudo-randomly generated list that is changed at every hashing cycle. This means that the setup and take down of the list is expensive and it has to be done with each iteration. The alternative is to only generate a small subset of it, but the generation algorithm is itself CPU intensive.

The solution to this problem is to have 2 cores  and a metric crapton of on die ram.  
In this design, 1 core runs the prng algo, the other does the hashing.  
They need to coordinate a little with one another, but it is most definitely doable.
It's much, much harder than SHA-256 and the hash rate will never be equal to an ASIC running SHA-256, but relative speed-ups should theoretically remain the same as we saw in the progression of Bitcoin mining.

Nevertheless, LTC can in fact be mined by even single-core FPGAs at a much higher rate than with GPUs. I estimate a 10x to 100x speed-up depending on a number of factors including bus speed, on-die RAM, internal clock speed etc.
  
Also ASICs can be built from some FPGAs and those ASICs can be still faster.

The trick is to find an FPGA with as much on die memory as possible and then ensure that your implementation takes full advantage of it.

For the record I built an FPGA scrypt miner a few weeks ago.
That particular FPGA has a direct path to ASIC from the mfr because it's designed specifically for prototyping ASICs.

The value of LTC is not high enough at this time to justify the cost of the FPGA and in my case at least, an error in the code that I was using caused it to overheat (I couldn't get temp data out of it, the miner was reading 0 the whole time, it never slowed down and sounded like a jet landing the whole time).  It quickly became a paperweight.  
A $10,000 paperweight.  

Nevertheless I did a lot of work on it in my spare time and I am considering a kickstarter project to fund continued development.
I almost started one until I found out the true cost of the FPGA I managed to toast and realized it was probably out of reach of pretty much everyone including myself.
It would have to come down by an order of magnitude in cost before it would be a financially viable option for most folks.



I have found what I believe is a shortcut in scrypt that if implemented correctly in hardware could dramatically speed up the hashrate.
I believe it should work and I know how I would implement it if I had the resources to acquire the FPGA and tools I need.

To show good faith I will elaborate on the algo and how the shortcut would work.  
This is really oversimplified, but you are free to take this idea and roll with it.

scrypt, the algo used by LTC, and in fact all hashing algos consist of 2 predominant steps.
#1 Generate a random list
#2 Hash across it.

To generate consistent results the random algo is actually deterministic pseudo-random and the setup for it is determined by a seed.
We will call this the prng.

The other step is hashing which is pretty well understood, you take a value from list a and replace it with a value from list b.
When you are done iterating you now have a hash.

scrypt differs mostly because it uses an entirely new list so frequently.  
The setup and tear down of this list requires quite a bit of CPU time and a lot of time is wasted on the memory bus performing storage & retrieval operations.
It cannot be done concurrently because the list itself changes frequently.

The shortcut is to have a multicore setup and a ton of on-die ram.
A dedicated prng core which does the setup and teardown for the second core.

The secondary core is the hashing core.  It would tell the prng core to setup a new list.
Then it would retrieve position x off the list from the shared memory space.
Other than that it would also perform all the normal hashing functions in a dedicated memory space.

I believe the total I need to make this work is about $12k USD, the FPGA I'm targeting right now is $10k and a license for the dev tools will be about $2k.
If I can find a less expensive option then I will go for that, but there aren't that many FPGAs that meet requirements right now.  
The particular target FPGA also has a direct path to ASIC from the mfr.

If you're willing to donate to the effort, I will keep you in the loop with full disclosure including build instructions and a copy of the sources and the firmware.
I haven't decided on a license for this if it works, but you will at least have a right to personal use.  
Perhaps if enough people are interested in production level manufacturing we could go a different route.  I'm not particularly interested in making this something I do for the rest of my life, but the contrarian in me is very excited by the potential here.

The LTC donation address is below.
LKfKkRMvMf2stQMNzQdKCvaf2YueAv1QSa

You can also donate BTC to the key in my sig.
There is no maximum but if you do decide to donate please send at least 0.5 LTC or the equivalent in BTC.
Then post just the address you donated from and I'll PM you here with a bitmessage key to join the group.

Thanks in advance!
BChydro
Hero Member
*****
Offline

Activity: 1426
Merit: 506

May 25, 2013, 09:33:35 PM
 #43

I personally would rather leave scrypt mining FPGA- or ASIC-free. That's the main appeal of it to me. But if an FPGA device were created I'm sure there would be ample interest in it, and then the question is: where will all the GPU miners go???
Nova! (OP)
Full Member
***
Offline

Activity: 140
Merit: 101

May 25, 2013, 11:03:39 PM
 #44

Quote from: WindMaster on May 25, 2013, 09:22:59 PM (post #41, quoted in full)

So I take it you have a functioning FPGA for LTC?  I would actually love to know even more about your design.  You do come off sounding like you have an impressive amount of knowledge on the subject.  As for me, I'm probably still standing at the top of Mt Stupid especially when I try to relate the concepts to others.
Nevertheless I am a programmer.  I've spent half my life finding optimizations in code, I'm pretty sure I see this one.

Ok so going back to where we started from.
#1 I am not raising money to develop a super secret, top-of-the-line, nuclear-rocket-powered FPGA. I have said from the beginning that I think I see a way to optimize a section of scrypt away. I would like the opportunity to explore that option. That is why the thread is entitled "A custom designed FPGA miner for LTC". The implication is that there would be a difference in the way it works, but the output should be the same, only faster.

#2 I'm pretty certain it's a well-known fact by now that I had a board from a previous employer. I was working on porting the OpenCL miner to it, and it ran fast, it ran hot, and it fried the board. This may have had something to do with the fact that I was using a version of the dev environment which I probably should not have been using. Hence the need for legit dev tools, which, by the way, are not just the IDE. Had I known I was working with a $10k board at the time, I would have taken a bit more caution. When I left my previous employer I asked if I could buy it at cost, and they sold it to me for $850. I wasn't privy to the fact that it probably cost them a bit more, and nowadays I'm wondering if they even knew. I'm pretty sure I did mention those facts in the referenced thread. If not, I'm sure I've mentioned them enough in other threads that it should be considered disclosed by now. But yes everyone, you should be aware. My original plan was just to compile the OpenCL miner and directly run it on the hardware. I did this with an expired/not legit version of the dev tools, but it seemed to work. The side effect of doing it that way was that it ran great for 3 hours, got hot enough to toast marshmallows, then would shut down for an hour. After a few days of this it eventually fried itself. Hence I need to rule out the actual dev tools. It also made me look closely at the source, which is where I think I see my answer.

#3 Of course I'm going to post the simplest layman's explanation of what is going on under the hood. I declared explicitly at the beginning, when I said "This is really oversimplified but here it goes...", that I would then explain hashing in general, and that scrypt is hard to FPGA because it requires access to a lot of RAM. If that RAM is on the die it really does help to speed things along quite a bit, even in a default scrypt implementation. Run the OpenCL miner on the chip I'm talking about and see. I then go on to explain that the innovation is creating what are effectively 2 cores and letting them communicate. One core has just the gates for the PRNG, the other core has the gates for the remainder of the hashing. They share a common memory space for the lookup table. I believe that by decoupling them and taking advantage of certain reductions in complexity we would see a speed-up. I don't know that this is in fact the case, but it looks right to me and I'm willing to do what it takes to prove or disprove my theory.

#4 I was unaware that the Altera ASIC was not a true ASIC. This is important information and is game-changing in my eyes, since part of the point is to provide a direct path to ASIC for anyone who wanted to participate. Because of this I am willing to fully refund anyone who has contributed. All they have to do is ask. Thank you for bringing that to my attention. I will keep it under advisement as I try to find a new path.


Donate @ 1LE4D5ERPZ4tumNoYe5GMeB5p9CZ1xKb4V
Nova! (OP)
Full Member
***
Offline

Activity: 140
Merit: 101

May 25, 2013, 11:11:46 PM
 #45

Quote from: WindMaster on May 25, 2013, 09:30:04 PM (post #42, quoted in full)

Here, let's preserve a copy together. I agree: if I change anything in the original description after reading your "red flags", then I must be a scammer. It could never be that I learned new information, tried to clarify, or found out I was wrong about something. :) So far that hasn't been the case.
By the way, you have some interesting points. They are a bit outside the realm of what I was trying to get at here, but if you see further optimizations please feel free to pitch in. You are actually giving us all a valuable learning experience in the right way to implement an FPGA for LTC. BTW, what chipset and board are you using? What dev tools? Thanks!

Viceroy
Hero Member
*****
Offline

Activity: 924
Merit: 501

May 25, 2013, 11:16:20 PM
 #46

Can you write an efficient scrypt miner that we can put on the Amazon cluster? I have $100 in credit plus an offer to beat up a bunch of Teslas for 24 hours.
Nova! (OP)
Full Member
***
Offline

Activity: 140
Merit: 101

May 26, 2013, 12:07:14 AM
 #47

Can you write an efficient scrypt miner that we can put on the Amazon cluster? I have $100 in credit plus an offer to beat up a bunch of Teslas for 24 hours.
I'm actually not sure about that. For instance, I know that GPU mining on the Tesla is horribly inefficient for Bitcoin. I can only guess that it would be even worse for Litecoin.
However, I do wonder how the CPUMiner would fare on one of those 64-ECU units.

From a cost perspective, spinning up a ton of micro instances as spot instances and just pointing them at a pool may in fact be your best bet there. I did that a month ago and it caused the pool operator to shut down, thinking they were under a DDoS attack. I trimmed it back from 200 units to 20, let it mine for a week and got nothing for my trouble but a big Amazon bill.

I'm going to take a look at the way scrypt is implemented in the CPU miner and see if there is any way to optimize it a bit, because now you've got my brain working in that direction, thanks!

mtrlt
Member
**
Offline

Activity: 104
Merit: 10

May 26, 2013, 12:18:21 AM
 #48

Nova: You have blatantly misunderstood how hash functions work, and specifically how scrypt works. I agree with WindMaster, there is no way you have made, or will make a scrypt FPGA. I advise everyone to not send Nova money.
Nova! (OP)
Full Member
***
Offline

Activity: 140
Merit: 101

May 26, 2013, 01:11:07 AM
 #49

Nova: You have blatantly misunderstood how hash functions work, and specifically how scrypt works. I agree with WindMaster, there is no way you have made, or will make a scrypt FPGA. I advise everyone to not send Nova money.

OK, elaborate. Please first explain in layman's terms how a cryptographic hash function works in general. Then also explain in layman's terms how scrypt differs from the SHA-256 of Bitcoin.
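For readers following along, a minimal Python sketch of the two proof-of-work hashes being contrasted, as they are commonly defined: Bitcoin applies SHA-256 twice to the 80-byte block header, while Litecoin runs scrypt with N = 1024, r = 1, p = 1, using the header as both password and salt. The function names and the zero-filled placeholder header are illustrative, `hashlib.scrypt` requires an OpenSSL build with scrypt support, and the timing loop only shows relative cost on whatever machine runs it.

```python
import hashlib
import time


def bitcoin_pow_hash(header: bytes) -> bytes:
    """Bitcoin: SHA-256 applied twice to the header; no large working memory."""
    return hashlib.sha256(hashlib.sha256(header).digest()).digest()


def litecoin_pow_hash(header: bytes) -> bytes:
    """Litecoin: scrypt(N=1024, r=1, p=1), header as password and salt.

    Every evaluation fills and then reads back a 1024 * 128-byte (128 KiB) scratchpad.
    """
    return hashlib.scrypt(header, salt=header, n=1024, r=1, p=1, dklen=32)


if __name__ == "__main__":
    header = bytes(80)  # placeholder 80-byte header
    for name, fn in (("sha256d", bitcoin_pow_hash), ("scrypt", litecoin_pow_hash)):
        start = time.perf_counter()
        for _ in range(100):
            fn(header)
        elapsed = time.perf_counter() - start
        print(f"{name}: {100 / elapsed:.0f} hashes/s on this machine")
```

Both functions are deterministic: the same header always gives the same 32-byte result, which is what lets every node verify a block independently.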

If I have completely misunderstood hashing over a lifetime of programming, then I really have some long hard thinking to do.

My guess is that you're focusing on an oversimplified explanation, something I can present to people who may or may not have any sort of experience with programming or hardware dev, and assuming that the reductions and omissions are there as an oversight or misunderstanding, rather than being intentionally omitted because they are not relevant to what I'm attempting to explain.

mtrlt, please enlighten us.

While doing so, don't try to point at the flaws in my explanation and say "it's not x but y & z".  Instead start from scratch.  Try to remember your audience here.
Frankly I'm genuinely interested in this.  I'm also impressed with all these FPGA experts chiming in with their knowledge, looks like lots of people have fully working LTC FPGAs. 

As for everyone else, the offer is still open here. 
If anyone has sent me money and no longer feels comfortable with what I've stated, they can request a refund.
Also, mtrlt is half right: I have not made a fully functional FPGA for LTC yet, which is partly why I'm asking for help raising funds to buy the bits I need.

Some good news is that people have been suggesting alternatives to build this on, which could end up being much cheaper.  I'm currently swimming in whitepapers :)

Anyways, I'm with mtrlt & WindMaster on this.  I expressly advise against anyone sending me money unless they understand that what they are getting is the collective result of what I learn in this process.  While the goal is to produce an FPGA running a version of scrypt with a slower section optimized away, this effort may produce no result other than "oh, OK, so it didn't work because I was wrong about x".  If I had all the information I needed, or the money to purchase the equipment to check it out, I wouldn't be going this route.

I am learning a lot though.  There do appear to be candidate boards that may work.  I'm studying them carefully now.

WindMaster
Sr. Member
****

Activity: 347
Merit: 250


May 26, 2013, 01:31:46 AM
 #50

Quote
mtrlt, please enlighten us.

Random tidbit that may be of interest if anyone is unsure whether mtrlt knows what he's talking about.  Nova, you mention above that you've looked at the OpenCL source for scrypt mining.  Assuming that you're talking about the OpenCL kernel from Reaper or cgminer, hop back into the source, scroll up to the top and examine the copyrights at the top of the file.

https://github.com/ckolivas/cgminer/blob/master/scrypt130511.cl#L2


Quote
While doing so, don't try to point at the flaws in my explanation and say "it's not x but y & z".  Instead, start from scratch.  Try to remember your audience here.
Frankly, I'm genuinely interested in this.  I'm also impressed with all these FPGA experts chiming in with their knowledge; it looks like lots of people have fully working LTC FPGAs.

This part is true.  However, most (all?) of the people who have actually done it have found that the cost/performance ratio is significantly worse than with GPUs.  This was actually the case for BTC too: FPGAs never had the edge in cost/performance, only in power consumption.  On the scrypt side of things, my position from first-hand experience is that FPGAs are only worthwhile for LTC mining if you have insanely expensive power, or are willing to wait multiple years for the power cost savings to deliver ROI.  We know what we're doing on the FPGA and ASIC development side of things, and yet we have a data center full of 5850- and 6950-based rigs mining LTC.
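The power-cost argument can be put into rough numbers with a minimal break-even sketch in Python. It assumes the FPGA and GPU setups deliver the same hashrate. Every input (the hardware premium, the two wattages, the electricity price) is a hypothetical placeholder to be replaced with real figures; none of them come from this thread.

```python
def months_to_break_even(extra_hw_cost: float, gpu_watts: float,
                         fpga_watts: float, price_per_kwh: float,
                         hours_per_month: float = 730.0) -> float:
    """Months until an FPGA's lower power draw repays its extra purchase price.

    Assumes both setups deliver the same hashrate. Every argument is a
    hypothetical, user-supplied estimate, not a measured figure.
    """
    saved_kw = (gpu_watts - fpga_watts) / 1000.0
    if saved_kw <= 0:
        return float("inf")  # no power saving at all: never breaks even
    monthly_saving = saved_kw * hours_per_month * price_per_kwh
    return extra_hw_cost / monthly_saving


if __name__ == "__main__":
    # Hypothetical inputs: $2000 more hardware, 600 W vs 100 W, $0.10/kWh.
    print(round(months_to_break_even(2000, 600, 100, 0.10), 1))  # 54.8
```

With those made-up inputs, the hardware premium takes roughly four and a half years to recover. Because the saving is linear in the electricity price, doubling the price only halves that time.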
Nova! (OP)
Full Member
***

Activity: 140
Merit: 101


May 26, 2013, 01:33:11 AM
 #51


Quote
You can calculate scrypt+salsa20/8(1024,1,1) as used in Litecoin with a fixed 128kB buffer + a bit of extra scratchpad memory, all day long, without calculating any sort of dynamic list that determines how much memory will be involved in calculating the hash.  And the memory access pattern will be exactly the same every time you calculate the hash.  In fact, my own FPGA implementation of scrypt with external DDR3 leveraged this fact by shifting every scrypt core 1 clock cycle from the previous one, such that a burst read or write to/from DDR3 would fetch all the data needed (or written) by each core precisely when that core was going to use (or generate) it.  This was possible because the memory access pattern and amount of memory needed is exactly the same every time.

The OP doesn't have a shortcut at all.  Even if scrypt worked the way he described, the OP's suggestion of a 2 core approach as a "shortcut" would be a retarded design for an FPGA implementation.
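The point about a fixed buffer and a fixed sequence of memory operations can be sketched in a few lines of Python. This follows the two-loop structure of scrypt's ROMix step (RFC 7914) with N = 1024 and r = 1. To keep it short, it swaps the real Salsa20/8-based BlockMix for a hypothetical stand-in, `mix()`, built on `hashlib.shake_256`, so the values it produces are not real scrypt output.

The scratchpad is always N × 128 × r = 131072 bytes (128 KiB). The read addresses in the second loop are derived from the data. What stays identical from hash to hash is the buffer size and the schedule of N writes followed by N reads. That is presumably what makes it possible to stagger cores and plan burst transfers in advance.

```python
import hashlib

N = 1024          # Litecoin's scrypt cost parameter (scrypt N)
R = 1             # scrypt block-size parameter r
BLOCK = 128 * R   # bytes per scratchpad entry


def mix(x: bytes) -> bytes:
    # HYPOTHETICAL stand-in for scrypt's Salsa20/8-based BlockMix, used only
    # to keep this sketch short; its outputs are NOT real scrypt values.
    return hashlib.shake_256(x).digest(BLOCK)


def integerify(x: bytes) -> int:
    # As in scrypt: read the start of the last 64-byte block as a
    # little-endian integer and reduce it mod N to pick a scratchpad entry.
    return int.from_bytes(x[-64:-60], "little") % N


def romix_trace(x: bytes):
    """Run the ROMix loop structure, logging every scratchpad access."""
    assert len(x) == BLOCK
    v = [b""] * N   # the whole scratchpad exists up front: N fixed-size entries
    trace = []
    # Phase 1: write entries 0, 1, ..., N-1 in order.
    for i in range(N):
        v[i] = x
        trace.append(("W", i))
        x = mix(x)
    # Phase 2: N reads whose addresses depend on the data being hashed.
    for _ in range(N):
        j = integerify(x)
        trace.append(("R", j))
        x = mix(bytes(a ^ b for a, b in zip(x, v[j])))
    scratchpad_bytes = sum(len(entry) for entry in v)
    return x, trace, scratchpad_bytes


if __name__ == "__main__":
    for seed in (b"header A", b"header B"):
        _, trace, size = romix_trace(hashlib.shake_256(seed).digest(BLOCK))
        same_schedule = [op for op, _ in trace] == ["W"] * N + ["R"] * N
        first_reads = [addr for _, addr in trace[N:N + 4]]
        # scratchpad size, number of accesses, schedule check, first read addresses
        print(seed, size, len(trace), same_schedule, first_reads)
```

Both seeds should report a 131072-byte scratchpad, 2048 accesses and the same write-then-read schedule. The first few read addresses will generally differ.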


I'm very interested in learning more about this approach.
You say that you shift every scrypt core 1 clock cycle from the previous one.
You then say that a 2 core approach would be retarded.

You either did or did not implement multiple cores, but it sounds to me like you implemented all of scrypt and loaded it onto separate cores.  Do you mean completely separate chips, or literal cores on the same chip?

You say it's retarded, but you make it sound like it works flawlessly for you.  The primary difference is that you're calling all the way out to RAM, and I'm saying let's keep that on-die.
Also, I believe you're talking about all of scrypt in a single functional unit, whereas I'm saying let's break it out into a generation core and a usage core.  You may have actually found a better solution.  I'd like to know your hashrate and platform, though, if you don't mind disclosing it.

Nova! (OP)
Full Member
***

Activity: 140
Merit: 101


May 26, 2013, 01:40:27 AM
 #52

Quote
mtrlt, please enlighten us.

Random tidbit that may be of interest if anyone is unsure whether mtrlt knows what he's talking about.  Nova, you mention above that you've looked at the OpenCL source for scrypt mining.  Assuming that you're talking about the OpenCL kernel from Reaper or cgminer, hop back into the source, scroll up to the top and examine the copyrights at the top of the file.

https://github.com/ckolivas/cgminer/blob/master/scrypt130511.cl#L2

At no point did I imply he didn't know what he was talking about.
However, posting the entirety of the code, pointing at it and saying "look" really wasn't an option.
I was asking him to explain what's going on in it in just a couple of sentences or paragraphs.

Nova! (OP)
Full Member
***

Activity: 140
Merit: 101


May 26, 2013, 01:57:12 AM
 #53

Now we're looking at the code.
This isn't the exact file I had (maybe something has changed, I don't know), but it's close enough in the places that matter anyways.

Anyways, for the crux of my argument, take lines 124 through 142, which make up the bulk of the random number generator.
These are currently implemented as defines.

Defines are a sort of macro: they get put into the final output as the code they represent.

Ask yourself what happens if you isolate just that section of code as its own separate core,
then modify the code to call into that core rather than repeating that section over and over again.

Now we're on the same page.  It would be a start.  There are some other things I see as well, but I'll keep those under my hat until I get a chance to make sure I'm right.
 

mtrlt
Member
**

Activity: 104
Merit: 10


May 26, 2013, 02:07:50 AM
 #54

Quote
Nova: You have blatantly misunderstood how hash functions work, and specifically how scrypt works. I agree with WindMaster: there is no way you have made, or will make, a scrypt FPGA. I advise everyone not to send Nova money.

OK, elaborate.  First, please explain in layman's terms how a cryptographic hash function works in general.  Then explain, also in layman's terms, how scrypt differs from Bitcoin's SHA-256.

Why should I explain it to you in layman's terms? I can only assume that you are indeed not a technical person. Besides, I have already proven that I know what I'm talking about (for example, by writing a BTC miner from scratch, and the first ever open-source GPU miner for LTC, which no one has been able to improve significantly, at least not publicly). You haven't. The only reason you can't explain your ideas technically is that you don't know what you're talking about.

Quote
If I have completely misunderstood hashing over a lifetime of programming, then I really have some long hard thinking to do.

Time doesn't give you knowledge. Time gives you the opportunity to gather knowledge. When I started coding my own miner in 2011, I had no idea what a hash function even was or how to program OpenCL. Now I know. Before I started working on my Yacoin GPU miner, I had no idea how SHA-3 or ChaCha worked. After 13.5 hours, I had a fast GPU implementation.

Quote
My guess is that you're focusing on an oversimplified explanation, one I can present to people who may or may not have any experience with programming or hardware development.  You're assuming that its reductions and omissions are oversights or misunderstandings, when in fact I left them out intentionally because they are not relevant to what I'm trying to explain.

I focus on an oversimplified explanation because that's all you have given. From what I can decipher of it, it's completely bogus.

Quote

mtrlt, please enlighten us.

While doing so, don't try to point at the flaws in my explanation and say "it's not x but y & z".  Instead, start from scratch.  Try to remember your audience here.


My audience is you, and you alone.


EDIT:

Quote
Anyways, for the crux of my argument, take lines 124 through 142, which make up the bulk of the random number generator.
These are currently implemented as defines.

Defines are a sort of macro: they get put into the final output as the code they represent.

Ask yourself what happens if you isolate just that section of code as its own separate core,
then modify the code to call into that core rather than repeating that section over and over again.

They are the rounds of SHA-256... not a random number generator. I am now completely sure you are completely ignorant. You also talk about #defines like they are a completely new concept to you. Are you even a programmer?
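For reference, here is what a SHA-256 round computes, transcribed into Python from the SHA-256 specification (FIPS 180-4) rather than from the cgminer source. The function names are my own.

`sha256_round()` is a single round: a fixed, deterministic update of eight 32-bit working words from one message-schedule word and one round constant. Nothing random happens in it. The compression function chains 64 of these per 64-byte block, and each round consumes the previous round's output. The constants are derived from the roots of the first primes, as the standard defines them, instead of being typed in by hand.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF


def first_primes(count):
    """The first `count` primes, by trial division (fine for 64 of them)."""
    primes, candidate = [], 2
    while len(primes) < count:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def icbrt(n):
    """floor(n ** (1/3)) for a positive int, via integer Newton iteration."""
    x = 1 << ((n.bit_length() + 2) // 3)  # start at or above the true root
    while True:
        y = (2 * x + n // (x * x)) // 3
        if y >= x:
            return x
        x = y


# FIPS 180-4 constants: first 32 fractional bits of the square roots of the
# first 8 primes (initial hash H0) and the cube roots of the first 64 primes (K).
PRIMES = first_primes(64)
H0 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]
K = [icbrt(p << 96) & MASK for p in PRIMES]


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def sha256_round(s, k, w):
    """One SHA-256 round: a deterministic update of the 8 working words."""
    a, b, c, d, e, f, g, h = s
    t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
          + ((e & f) ^ (~e & g)) + k + w) & MASK
    t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
          + ((a & b) ^ (a & c) ^ (b & c))) & MASK
    return [(t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g]


def compress(state, block):
    """Expand one 64-byte block into 64 schedule words, then run 64 rounds."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    v = list(state)
    for t in range(64):
        v = sha256_round(v, K[t], w[t])  # each round feeds the next
    return [(x + y) & MASK for x, y in zip(state, v)]


def sha256(msg: bytes) -> bytes:
    """Pad the message to a multiple of 64 bytes and compress block by block."""
    padded = (msg + b"\x80" + b"\x00" * ((55 - len(msg)) % 64)
              + struct.pack(">Q", 8 * len(msg)))
    state = H0
    for i in range(0, len(padded), 64):
        state = compress(state, padded[i:i + 64])
    return struct.pack(">8I", *state)


if __name__ == "__main__":
    for m in (b"", b"abc", bytes(80)):
        assert sha256(m) == hashlib.sha256(m).digest()
    print("matches hashlib.sha256")
```

The self-check compares against `hashlib.sha256`, so any transcription slip shows up immediately.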
WindMaster
Sr. Member
****

Activity: 347
Merit: 250


May 26, 2013, 02:10:02 AM
 #55

Quote
Now we're looking at the code.
This isn't the exact file I had (maybe something has changed, I don't know), but it's close enough in the places that matter anyways.

Anyways, for the crux of my argument, take lines 124 through 142, which make up the bulk of the random number generator.
These are currently implemented as defines.

That isn't a random number generator.  You're looking at macros for the SHA256 rounds.  Just stop, and go read the scrypt whitepaper.  Immediately, if not sooner.  I'm not trying to be rude; it's just that reading the scrypt whitepaper right away will be a better use of your time at this point.
Luke-Jr
Legendary
*

Activity: 2576
Merit: 1186



May 26, 2013, 02:16:39 AM
 #56

Quote
Now we're looking at the code.
This isn't the exact file I had (maybe something has changed, I don't know), but it's close enough in the places that matter anyways.

Anyways, for the crux of my argument, take lines 124 through 142, which make up the bulk of the random number generator.
These are currently implemented as defines.

That isn't a random number generator.  You're looking at macros for the SHA256 rounds.  Just stop, and go read the scrypt whitepaper.  Immediately, if not sooner.  I'm not trying to be rude; it's just that reading the scrypt whitepaper right away will be a better use of your time at this point.
Well, strictly speaking, scrypt is using SHA256 as a random number generator...

mtrlt
Member
**

Activity: 104
Merit: 10


May 26, 2013, 02:20:03 AM
 #57

Quote
Now we're looking at the code.
This isn't the exact file I had (maybe something has changed, I don't know), but it's close enough in the places that matter anyways.

Anyways, for the crux of my argument, take lines 124 through 142, which make up the bulk of the random number generator.
These are currently implemented as defines.

That isn't a random number generator.  You're looking at macros for the SHA256 rounds.  Just stop, and go read the scrypt whitepaper.  Immediately, if not sooner.  I'm not trying to be rude; it's just that reading the scrypt whitepaper right away will be a better use of your time at this point.
Well, strictly speaking, scrypt is using SHA256 as a random number generator...

In a way, yes. But Nova's proposal of doing single SHA-256 rounds in a separate core is bogus. I am not very knowledgeable about FPGAs, but I'd assume you'd at least unroll it, which would nullify what Nova is trying to accomplish in the first place.

Anyways, it's not like SHA-256 is the difficult part in calculating LTC hashes.
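Here is a compact, deliberately unoptimized sketch of scrypt as Litecoin parameterizes it: N = 1024, r = 1, p = 1, 32-byte output, with the 80-byte block header used as both password and salt. It is transcribed into Python from RFC 7914. The function names are my own, and this is not any miner's actual code.

It illustrates both points made above:

- SHA-256 only appears inside the two single-iteration PBKDF2-HMAC-SHA256 calls. These expand the input into a 128-byte block and compress the result, which is the loose sense in which scrypt uses SHA-256 as a random number generator.
- The real work is in `romix()`. It fills a 1024-entry, 128 KiB scratchpad and then re-reads it 1024 times at data-dependent addresses, mixing with Salsa20/8.

```python
import hashlib
import struct

M32 = 0xFFFFFFFF
COLUMNS = ((0, 4, 8, 12), (5, 9, 13, 1), (10, 14, 2, 6), (15, 3, 7, 11))
ROWS = ((0, 1, 2, 3), (5, 6, 7, 4), (10, 11, 8, 9), (15, 12, 13, 14))


def _rotl(x, n):
    return ((x << n) | (x >> (32 - n))) & M32


def _quarter(x, a, b, c, d):
    """Salsa20 quarter-round on the 32-bit words at indices a, b, c, d."""
    x[b] ^= _rotl((x[a] + x[d]) & M32, 7)
    x[c] ^= _rotl((x[b] + x[a]) & M32, 9)
    x[d] ^= _rotl((x[c] + x[b]) & M32, 13)
    x[a] ^= _rotl((x[d] + x[c]) & M32, 18)


def salsa20_8(block: bytes) -> bytes:
    """Salsa20/8 core: 4 double rounds on one 64-byte block, then feed-forward."""
    inp = struct.unpack("<16I", block)
    x = list(inp)
    for _ in range(4):
        for q in COLUMNS + ROWS:  # column round, then row round
            _quarter(x, *q)
    return struct.pack("<16I", *((a + b) & M32 for a, b in zip(x, inp)))


def _xor(a: bytes, b: bytes) -> bytes:
    return (int.from_bytes(a, "little") ^ int.from_bytes(b, "little")).to_bytes(len(a), "little")


def block_mix(b: bytes, r: int) -> bytes:
    """scryptBlockMix: 2r Salsa20/8 calls, outputs reordered even-then-odd."""
    x = b[-64:]
    ys = []
    for i in range(2 * r):
        x = salsa20_8(_xor(x, b[64 * i:64 * i + 64]))
        ys.append(x)
    return b"".join(ys[0::2] + ys[1::2])


def romix(b: bytes, n: int, r: int) -> bytes:
    """scryptROMix: fill an n-entry scratchpad, then make n data-dependent reads."""
    v = []
    x = b
    for _ in range(n):  # phase 1: sequential writes
        v.append(x)
        x = block_mix(x, r)
    for _ in range(n):  # phase 2: data-dependent reads
        # Integerify: low 32 bits of the last 64-byte block, little-endian;
        # sufficient because n is a power of two no larger than 2**32.
        j = struct.unpack_from("<I", x, 64 * (2 * r - 1))[0] % n
        x = block_mix(_xor(x, v[j]), r)
    return x


def scrypt(password: bytes, salt: bytes, n: int, r: int, p: int, dklen: int) -> bytes:
    """scrypt per RFC 7914; SHA-256 appears only in the two PBKDF2 calls."""
    blen = 128 * r
    b = hashlib.pbkdf2_hmac("sha256", password, salt, 1, p * blen)
    b = b"".join(romix(b[i * blen:(i + 1) * blen], n, r) for i in range(p))
    return hashlib.pbkdf2_hmac("sha256", password, b, 1, dklen)


def litecoin_pow_hash(header80: bytes) -> bytes:
    """Litecoin proof-of-work hash: scrypt(header, header, 1024, 1, 1, 32)."""
    assert len(header80) == 80
    return scrypt(header80, header80, 1024, 1, 1, 32)


if __name__ == "__main__":
    # First scrypt test vector from RFC 7914: P = "", S = "", N = 16, r = 1, p = 1.
    assert scrypt(b"", b"", 16, 1, 1, 64).hex() == (
        "77d6576238657b203b19ca42c18a0497f16b4844e3074ae8dfdffa3fede21442"
        "fcd0069ded0948f8326a753a0fc81f17e8d3e0fb2e0d3628cf35e20c38d18906")
    print(litecoin_pow_hash(bytes(80)).hex())
```

Almost all of the running time goes into the 4096 Salsa20/8 calls made by `romix()`, not into the PBKDF2 steps.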
WindMaster
Sr. Member
****

Activity: 347
Merit: 250


May 26, 2013, 02:31:18 AM
 #58

Quote
Anyways, it's not like SHA-256 is the difficult part in calculating LTC hashes.

I think if Nova wants to get started in FPGA development, a more realistic idea would be for him to experiment with making an FPGA implementation of a Bitcoin miner instead.  He's already caught up on the details of SHA256 rounds, after all.  :)

Nova, consider experimenting with Bitcoin instead.  It'll be a lot easier, since SHA256D is almost the only thing you'll need to figure out (+/- some miscellaneous fiddly details and communications), and you can happily instance copies of your core all over the place until you run out of logic area.  Figuring out scrypt is much harder than that, but because it uses SHA256, you'll need to come to grips with that hashing algorithm first anyway.
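By contrast, the core of a Bitcoin miner fits in a few lines. SHA256D is SHA-256 applied twice to the 80-byte block header, and mining means trying nonces until the result, read as a little-endian number, is at or below the target.

Below is a minimal Python sketch using `hashlib`, checked against the well-known Bitcoin genesis block header. The function names are my own, and a real FPGA design would implement the hash in logic rather than call a library. Since each nonce is independent, the nonce range can simply be split across however many copies of the core fit on the chip.

```python
import hashlib
import struct


def sha256d(data: bytes) -> bytes:
    """Bitcoin's SHA256D: SHA-256 of the SHA-256 digest."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def bits_to_target(bits: int) -> int:
    """Expand the header's compact 'bits' field into the full target.

    Simplified: assumes exponent >= 3 and ignores the sign bit, which holds
    for real Bitcoin difficulty targets.
    """
    exponent, mantissa = bits >> 24, bits & 0x007FFFFF
    return mantissa << (8 * (exponent - 3))


def scan(header76: bytes, start: int, count: int, target: int):
    """Try nonces start .. start+count-1; return the first that meets target.

    Each call is independent, so separate cores can scan separate ranges.
    """
    for nonce in range(start, start + count):
        digest = sha256d(header76 + struct.pack("<I", nonce))
        if int.from_bytes(digest, "little") <= target:
            return nonce
    return None


# Bitcoin genesis block header, serialized field by field (little-endian fields).
GENESIS_HEADER = bytes.fromhex(
    "01000000"                                                            # version 1
    + "00" * 32                                                           # previous block hash
    + "3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a"  # merkle root
    + "29ab5f49"                                                          # time 1231006505
    + "ffff001d"                                                          # bits 0x1d00ffff
    + "1dac2b7c"                                                          # nonce 2083236893
)

if __name__ == "__main__":
    target = bits_to_target(0x1D00FFFF)
    print(scan(GENESIS_HEADER[:76], 2083236890, 10, target))  # 2083236893
    # Block hash as usually displayed (byte-reversed):
    # 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f
    print(sha256d(GENESIS_HEADER)[::-1].hex())
```

`scan()` is the piece that would be replicated: give each copy its own `start` and `count`.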
Nova! (OP)
Full Member
***

Activity: 140
Merit: 101


May 26, 2013, 02:53:20 AM
 #59

Quote
Now we're looking at the code.
This isn't the exact file I had (maybe something has changed, I don't know), but it's close enough in the places that matter anyways.

Anyways, for the crux of my argument, take lines 124 through 142, which make up the bulk of the random number generator.
These are currently implemented as defines.

That isn't a random number generator.  You're looking at macros for the SHA256 rounds.  Just stop, and go read the scrypt whitepaper.  Immediately, if not sooner.  I'm not trying to be rude; it's just that reading the scrypt whitepaper right away will be a better use of your time at this point.

*embarrassed*
That's actually a good catch, and frankly something I had not seen before but should have.
It would have been a costly mistake to proceed on that.  I can see now that RND is Round, not Rand; I openly admit I did not see that before, and this was a critical thinking error.

Thank you both for showing me my mistake.
It completely breaks my premise.

There is no need to raise further funds for this.  These two have shown me a critical flaw in the plan, which was to isolate the RND function onto its own core.
I'm really glad the original author came on and explained this, because it wasn't clear to me, and in the absence of anything resembling a comment in the source code I find this information extremely valuable.

So here is what we've learned.

#1 I need to review more closely to make sure the code is doing what I think it's doing when working with someone else's code, especially when that code has no comments and all I have to go on is a whitepaper.

#2 I did in fact see the RND define as a hand-written pseudo-random number generator.  It is not one, and had my brain been in any way functional I should have caught that before making an announcement of this type.  In my mind it's still a candidate for optimization, but that's probably just me being stubborn.

#3 Not that it's advisable but...
 
There are several dev boards and OpenCL FPGAs out there on the market.  
It should in theory be possible to modify the OpenCL miner to run on one of these FPGAs.
The Altera Stratix V is at the high end of these and I did burn one out trying.

That should be enough of a start; however, I do plan to keep chasing this dog until it barks.  I'm no longer asking for contributions of any kind; I've gotten the information I needed.  I really appreciate everyone's patience on this.

Anyone who wishes a refund is entitled to it.

Nova! (OP)
Full Member
***

Activity: 140
Merit: 101


May 26, 2013, 03:30:41 AM
 #60

Quote
They are the rounds of SHA-256... not a random number generator. I am now completely sure you are completely ignorant. You also talk about #defines like they are a completely new concept to you. Are you even a programmer?
Defines are not a new concept.  My line of thinking was that, yes, having that section of code unrolled was precisely the problem.  Yes, flat code can be nice when it executes in a single stack, but calling out to a separate device with a single instruction is, in most cases, faster than executing a bunch of things on the stack.

This is not my first FPGA project, but it is my second.  My first was one I can't go into in depth, but the gist of it was: "Here is an OpenCL FPGA.  We already use OpenCL on GPU farms in our datacenter.  We are adopting the technology so that we can port our software to hardware and yield better performance for lower cost.  Here's a manual, here's a devboard, we're going golfing.  Yours truly, management".

That project worked well; most of what we had ported well, with almost straight-across compiles in most cases.  When I left that job I was able to take the devboard I had been playing with, and I decided to try it at mining LTC.

The rest has been explained here.

As for your comment about whether I'm actually a programmer, yes, I am, but this experience is making me wonder whether I've started to age out.  Now that I look, it's pretty obvious where I made my mistake: I didn't check a fundamental assumption, I just assumed I knew.  There is no excuse.  That leaves me looking like a fool, and I plan to leave this thread up and check it every time before I post something "I just know will work". :)

Thanks for the info.
