Author Topic: DIY FPGA Mining rig for any algorithm with fast ROI  (Read 99392 times)
Etherion (Sr. Member | Activity: 512 | Merit: 260)
May 17, 2018, 02:34:07 PM  #721

The Phi algo change will test the theory that they can adapt the FPGA software in a matter of hours or days??
Also, I don't get why the engineering samples of these boards are cheaper while the mass-production versions will sell for a higher price?? Shouldn't it be the other way around?

How on earth can one make a prediction of time if we don't know the amount of effort required? We don't know what the change will be. We don't know how much time is even available to dedicate to the change.

All we know is that FPGAs can be better at doing a specific job, but they might also be worse. There is an ASIC out for ETH that is less power efficient than two-year-old GPUs.
dhouse (Jr. Member | Activity: 266 | Merit: 2)
May 17, 2018, 02:43:39 PM  #722

You are wrong. :)

Well, that makes me sad. Mining Lux with an FPGA was the only reason I was following this thread!

I'm not alone in being hyped to mine Lux with an FPGA :)
But as I just went for a Ti rig instead, I'll be happy if they succeed in making it resistant :p Just hope no one bought hardware to mine Lux if they do succeed; others' pain is not my gain.

Is Lux just changing their algo because of these FPGA threads? Because that's a lot of work to defend against something that might not even happen, and if it does, it won't be on such a big scale anyway. And I don't think anyone's producing an ASIC for it anytime soon. Head scratcher.
Lunga Chung (Member | Activity: 277 | Merit: 23)
May 17, 2018, 03:24:09 PM  #723

Mining LUX with FPGAs goes back at least 6 months; this forum is not the only source.
toxiroxi (Newbie | Activity: 17 | Merit: 0)
May 17, 2018, 05:01:26 PM  #724

I know the talk here has been about FPGAs that people have at home, running and configured.

But has anybody already looked into the possibilities of Amazon EC2 F1 instances? They also provide FPGAs in their datacenters (one instance consists of 8 × 16 nm Xilinx UltraScale+ FPGAs).

Since you can build images and redeploy them on any other FPGA, I was wondering whether that might be of interest here. I just wanted to bring it up as another opportunity.
I would definitely try that out. If I understand correctly, this might make it possible to run "off-shore" rather than at home while still having dedicated hardware. So I would need to understand whether all the work being done here can also be used and replicated in the Amazon cloud/datacenter (especially the bitstreams/firmware).

Let me know your thoughts.

Cheers,

JCS-CCT (Jr. Member | Activity: 154 | Merit: 1)
May 17, 2018, 07:33:24 PM  #725

I know the talk here has been about FPGAs that people have at home, running and configured.

But has anybody already looked into the possibilities of Amazon EC2 F1 instances? They also provide FPGAs in their datacenters (one instance consists of 8 × 16 nm Xilinx UltraScale+ FPGAs).

Since you can build images and redeploy them on any other FPGA, I was wondering whether that might be of interest here. I just wanted to bring it up as another opportunity.
I would definitely try that out. If I understand correctly, this might make it possible to run "off-shore" rather than at home while still having dedicated hardware. So I would need to understand whether all the work being done here can also be used and replicated in the Amazon cloud/datacenter (especially the bitstreams/firmware).

Let me know your thoughts.

Cheers,



I believe Amazon has banned using their FPGAs for mining.
profdd (Newbie | Activity: 7 | Merit: 0)
May 17, 2018, 10:29:05 PM  #726

Found this product:
"BittWare’s XUPSVH is an UltraScale+ VU33P/35P FPGA-based PCIe card. The UltraScale+ FPGA helps these demanding applications avoid I/O bottlenecks with integrated High Bandwidth Memory (HBM2) tiles on the FPGA that support up to 8 GBytes of memory at 460 GBytes/sec."
Each FPGA device requires a unique bitstream. Think about it.


Sorry, what is a "unique bitstream"? You mean you need to do unique coding for each FPGA model to mine the same coin? :o
Tmdz (Hero Member | Activity: 1008 | Merit: 1000)
May 17, 2018, 10:54:52 PM  #727

Found this product:
"BittWare’s XUPSVH is an UltraScale+ VU33P/35P FPGA-based PCIe card. The UltraScale+ FPGA helps these demanding applications avoid I/O bottlenecks with integrated High Bandwidth Memory (HBM2) tiles on the FPGA that support up to 8 GBytes of memory at 460 GBytes/sec."
Each FPGA device requires a unique bitstream. Think about it.


Sorry, what is a "unique bitstream"? You mean you need to do unique coding for each FPGA model to mine the same coin? :o

Yeah, my take is that it's like a BIOS for the FPGA card that tells it exactly what to do, so each one is unique to every FPGA.

It's kind of like saying GPUs are a sledgehammer: a fairly basic set of instructions can be sent to them to take a swing at anything.
With an FPGA it would be more like programming a laser to etch out exactly what you want, resulting in a more precise operation.
GPUHoarder (Member | Activity: 154 | Merit: 37)
May 17, 2018, 11:41:53 PM  #728

The Phi algo change will test the theory that they can adapt the FPGA software in a matter of hours or days??
Also, I don't get why the engineering samples of these boards are cheaper while the mass-production versions will sell for a higher price?? Shouldn't it be the other way around?

There are not many new customers for FPGAs. The vendors are heavily incentivized to offer “cheap” dev kits to get a company to see the value in the chip and build a product around it. First they get you hooked, then...
GPUHoarder (Member | Activity: 154 | Merit: 37)
May 17, 2018, 11:47:57 PM  #729

Found this product:
"BittWare’s XUPSVH is an UltraScale+ VU33P/35P FPGA-based PCIe card. The UltraScale+ FPGA helps these demanding applications avoid I/O bottlenecks with integrated High Bandwidth Memory (HBM2) tiles on the FPGA that support up to 8 GBytes of memory at 460 GBytes/sec."
Each FPGA device requires a unique bitstream. Think about it.


Sorry, what is a "unique bitstream"? You mean you need to do unique coding for each FPGA model to mine the same coin? :o

Yeah, my take is that it's like a BIOS for the FPGA card that tells it exactly what to do, so each one is unique to every FPGA.

It's kind of like saying GPUs are a sledgehammer: a fairly basic set of instructions can be sent to them to take a swing at anything.
With an FPGA it would be more like programming a laser to etch out exactly what you want, resulting in a more precise operation.

This is somewhat accurate. The bitstream is literally the blueprint for the exact circuit you want the FPGA to currently be wired as. Every model of FPGA is like a unique building: it needs its own tailored electrical blueprint, even though the electrical blueprints for two similarly sized datacenters might look very similar. Maybe in one the main power feed comes in on the south wall and the rows run north to south, and in the other the power comes in on the east wall and the rows are spaced differently, running west to east.

With FPGAs you can easily rewire the whole building (but not change where the fixed resources are). With an ASIC you're starting from flat, level ground, and once the building and wires are in, they can never be changed.

With a GPU you can't change the wires: all the machines and manufacturing lines are already set in stone, each is only good for what it is good for, and you can only tell the machines what order of operations to execute.
SuppaHash (Newbie | Activity: 8 | Merit: 0)
May 18, 2018, 12:07:58 AM  #730

If someone knows other sources on FPGA mining in general, send me a PM. I'm currently starting to develop HPC applications on FPGAs; I'm starting slowly, but it has definitely raised my interest in FPGA mining, even though it is really expensive and hard to get development boards outside the USA.
senseless (Hero Member | Activity: 1118 | Merit: 541)
May 18, 2018, 12:10:33 AM  #731


Ran through timetravel10 today; it looks like with 8 FPGAs (one dedicated to each algo) you might be able to get up into 1-10 GH/s. Bitcore definitely needs to do something. A small FPGA cluster could 51% them pretty easily.



2112 (Legendary | Activity: 2128 | Merit: 1065)
May 18, 2018, 01:18:03 AM  #732

Ran through timetravel10 today; it looks like with 8 FPGAs (one dedicated to each algo) you might be able to get up into 1-10 GH/s. Bitcore definitely needs to do something. A small FPGA cluster could 51% them pretty easily.
You've made some interesting optimizations.

My naïve reading is:

a) they have 11 algorithms coded
b) only first 10 are used
c) the whole hash is a nesting of always 10 sub-hashes
d) chosen without repetition
e) which gives 10! possibilities
f) the choice of permutation is keyed from the block height
g) not sequentially, but skipping up to 8! permutations

So my naïve implementation (one card dedicated to each sub-hash) would require 10 FPGA cards.

What is your secret ingredient?

Edit: Link to the source code: https://github.com/LIMXTEC/BitCore/blob/master/src/crypto/hashblock.h
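
A minimal C++ sketch of the reading in (d)-(g): enumerate the orderings of the 10 sub-hash indices without repetition and pick one keyed by the block height. The pickPermutation name, the blockHeight parameter, and the simple modulo keying are illustrative assumptions only; BitCore's actual selection logic lives in the hashblock.h linked above.
Code:
#include <algorithm>  // std::next_permutation
#include <array>
#include <cstdint>
#include <iostream>

// Illustrative only: derive an ordering of the 10 sub-hashes from the
// block height. The modulo keying is a stand-in for whatever BitCore
// really does in hashblock.h.
std::array<uint32_t, 10> pickPermutation(uint64_t blockHeight)
{
    std::array<uint32_t, 10> perm;
    for (uint32_t i = 0; i < perm.size(); ++i)
        perm[i] = i;                               // identity ordering 0..9

    uint64_t steps = blockHeight % 3628800ULL;     // 10! possible orderings
    for (uint64_t s = 0; s < steps; ++s)
        std::next_permutation(perm.begin(), perm.end());

    return perm;                                   // perm[k] = algo index of pipeline stage k
}

int main()
{
    for (uint32_t idx : pickPermutation(12345))    // arbitrary example height
        std::cout << idx << ' ';
    std::cout << '\n';
}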

senseless (Hero Member | Activity: 1118 | Merit: 541)
May 18, 2018, 01:29:24 AM  #733 (Merited by 2112 (1))

Ran through timetravel10 today; it looks like with 8 FPGAs (one dedicated to each algo) you might be able to get up into 1-10 GH/s. Bitcore definitely needs to do something. A small FPGA cluster could 51% them pretty easily.
You've made some interesting optimizations.

My naïve reading is:

a) they have 11 algorithms coded
b) only first 10 are used
c) the whole hash is a nesting of always 10 sub-hashes
d) chosen without repetition
e) which gives 10! possibilities
f) the choice of permutation is keyed from the block height
g) not sequentially, but skipping up to 8! permutations

So my naïve implementation (one card dedicated to each sub-hash) would require 10 FPGA cards.

What is your secret ingredient?

Edit: Link to the source code: https://github.com/LIMXTEC/BitCore/blob/master/src/crypto/hashblock.h

I was looking at timetravel, not timetravel10. My bad. The hashrate would be the same, but yes, 10 cards would be required.



whitefire990 (OP) (Copper Member | Activity: 166 | Merit: 84)
May 18, 2018, 01:51:35 AM  #734 (Merited by 2112 (1))

Timetravel10 fits in a single VU13P. You partition the FPGA into 16 blocks and store about 14 partial bitstreams for each block. Then you do a dynamic partial reconfiguration from DDR4 to build the pipeline at the start of each block, based on the current algorithm sequence, yielding one hash per clock (i.e. 500 MH/s @ 500 MHz). You need 16 blocks because some functions like Groestl and Echo require 2 blocks. The FPGA can reconfigure itself in 0.25 seconds. The problem with Timetravel and X16R/X16S is the long time it takes to load the DDR4 bitstream table via USB. And you lose it if there is a power outage and must reprogram the DDR4 on each FPGA. This is where utilizing the PCIe bus would be an advantage.
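
To put rough numbers on the reconfiguration overhead described above, here is a small back-of-the-envelope sketch in C++. The 150-second block interval is an assumed example value, not a claim about Bitcore's actual block time; the 500 MH/s and 0.25 s figures are the ones quoted in the post, and the once-per-chain-block reconfiguration follows the scheme described there.
Code:
#include <iostream>

int main()
{
    // Figures quoted in the post above.
    const double peakHashRate  = 500e6;   // 500 MH/s (one hash per clock at 500 MHz)
    const double reconfigTime  = 0.25;    // seconds to rebuild the pipeline

    // Assumed example chain block interval; substitute the real value.
    const double blockInterval = 150.0;   // seconds

    // The pipeline only has to change when the algorithm sequence changes,
    // i.e. once per chain block, so the fraction of time spent hashing is:
    const double dutyCycle     = (blockInterval - reconfigTime) / blockInterval;
    const double effectiveRate = peakHashRate * dutyCycle;

    std::cout << "Duty cycle:         " << dutyCycle * 100.0   << " %\n";
    std::cout << "Effective hashrate: " << effectiveRate / 1e6 << " MH/s\n";
    // With these numbers the 0.25 s reconfiguration costs well under 1 %.
    return 0;
}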

GPUHoarder (Member | Activity: 154 | Merit: 37)
May 18, 2018, 01:54:47 AM  #735

Timetravel10 fits in a single VU13P. You partition the FPGA into 16 blocks and store about 14 partial bitstreams for each block. Then you do a dynamic partial reconfiguration from DDR4 to build the pipeline at the start of each block, based on the current algorithm sequence, yielding one hash per clock (i.e. 500 MH/s @ 500 MHz). You need 16 blocks because some functions like Groestl and Echo require 2 blocks. The FPGA can reconfigure itself in 0.25 seconds. The problem with Timetravel and X16R/X16S is the long time it takes to load the DDR4 bitstream table via USB. And you lose it if there is a power outage and must reprogram the DDR4 on each FPGA. This is where utilizing the PCIe bus would be an advantage.



Yep, this is the big reason chained hashing doesn't stop FPGAs: partial reconfiguration, the overhead of which can be almost entirely latency-hidden.
2112 (Legendary | Activity: 2128 | Merit: 1065)
May 18, 2018, 01:57:53 AM  #736

Timetravel10 fits in a single VU13P. You partition the FPGA into 16 blocks and store about 14 partial bitstreams for each block. Then you do a dynamic partial reconfiguration from DDR4 to build the pipeline at the start of each block, based on the current algorithm sequence, yielding one hash per clock (i.e. 500 MH/s @ 500 MHz). You need 16 blocks because some functions like Groestl and Echo require 2 blocks. The FPGA can reconfigure itself in 0.25 seconds. The problem with Timetravel and X16R/X16S is the long time it takes to load the DDR4 bitstream table via USB. And you lose it if there is a power outage and must reprogram the DDR4 on each FPGA. This is where utilizing the PCIe bus would be an advantage.
My naïve reading of the above-linked code is that Echo is coded, but never used. Am I wrong?

 

senseless (Hero Member | Activity: 1118 | Merit: 541)
May 18, 2018, 02:12:16 AM  #737

Timetravel10 fits in a single VU13P. You partition the FPGA into 16 blocks and store about 14 partial bitstreams for each block. Then you do a dynamic partial reconfiguration from DDR4 to build the pipeline at the start of each block, based on the current algorithm sequence, yielding one hash per clock (i.e. 500 MH/s @ 500 MHz). You need 16 blocks because some functions like Groestl and Echo require 2 blocks. The FPGA can reconfigure itself in 0.25 seconds. The problem with Timetravel and X16R/X16S is the long time it takes to load the DDR4 bitstream table via USB. And you lose it if there is a power outage and must reprogram the DDR4 on each FPGA. This is where utilizing the PCIe bus would be an advantage.
My naïve reading of the above-linked code is that Echo is coded, but never used. Am I wrong?


0->10, echo is 10 and the loop breaks on 11.



2112 (Legendary | Activity: 2128 | Merit: 1065)
May 18, 2018, 02:44:36 AM  #738

0->10, echo is 10 and the loop breaks on 11.
We must be looking at different code then.
Code:
#define HASH_FUNC_COUNT 10 // BitCore: HASH_FUNC_COUNT of 11
    uint32_t permutation[HASH_FUNC_COUNT];
    for (uint32_t i=0; i < HASH_FUNC_COUNT; i++) {
        permutation[i]=i;
    }
    // Compute the next permutation
    ...
    for (uint32_t i=0; i < HASH_FUNC_COUNT; i++) {
        switch(permutation[i]) {
            // cases 0 to 9 here
        case 10:
            // Echo is here, but isn't ever executed
            break;
        }
    }
The code (and the comments on the marketing websites) shows that initially there were 11! permutations, but that has since been reduced to 10!.
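
(For scale: 11! = 39,916,800 possible orderings, versus 10! = 3,628,800 once only ten algorithms are in play.)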

2112 (Legendary | Activity: 2128 | Merit: 1065)
May 18, 2018, 05:48:33 AM  #739

I had to watch a really boring "entertainment" program and used that time to edit the above file into a working C++ program. Echo was never used. Due to the peculiar permutation order, Blake and Bmw are always fixed at positions 0 and 1 respectively; only the remaining 8 positions change. So, using the terminology from whitefire990's post above, timetravel10 requires only 9 reconfigurable blocks, assuming that only Groestl requires a double block.
Code:
#include <algorithm>    // std::next_permutation
#include <iostream>
#include <inttypes.h>

#define HASH_FUNC_BASE_TIMESTAMP 1492973331 // BitCore: Genesis Timestamp (not used in this program)
#define HASH_FUNC_COUNT 10 // BitCore: HASH_FUNC_COUNT of 11
#define HASH_FUNC_COUNT_PERMUTATIONS 40320  // BitCore: HASH_FUNC_COUNT!

int main()
{
    // We want to permute algorithms. To get started we
    // initialize an array with a sorted sequence of unique
    // integers where every integer represents its own algorithm.
    uint32_t permutation[HASH_FUNC_COUNT];
    for (uint32_t i = 0; i < HASH_FUNC_COUNT; i++) {
        permutation[i] = i;
    }

    // Compute the next permutation
    uint32_t steps = HASH_FUNC_COUNT_PERMUTATIONS;
    for (uint32_t step = 0; step < steps; step++) {
        for (uint32_t i = 0; i < HASH_FUNC_COUNT; i++) {
            switch (permutation[i]) {
            case 0:
                std::cout << "blake ";
                break;
            case 1:
                std::cout << "bmw ";
                break;
            case 2:
                std::cout << "groestl ";
                break;
            case 3:
                std::cout << "skein ";
                break;
            case 4:
                std::cout << "jh ";
                break;
            case 5:
                std::cout << "keccak ";
                break;
            case 6:
                std::cout << "luffa ";
                break;
            case 7:
                std::cout << "cubehash ";
                break;
            case 8:
                std::cout << "shavite ";
                break;
            case 9:
                std::cout << "simd ";
                break;
            case 10:
                std::cout << "echo ";
                break;
            }
        }
        std::cout << std::endl;
        std::next_permutation(permutation, permutation + HASH_FUNC_COUNT);
    }
    return 0;
}
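
For anyone who wants to reproduce this: the program above compiles as-is with any reasonably recent C++ compiler (e.g. g++ -std=c++11 timetravel10.cpp, filename arbitrary) and prints one ordering per line, 40,320 (= 8!) lines in total. The first line is the sorted order (blake bmw groestl skein jh keccak luffa cubehash shavite simd), echo never appears, and because std::next_permutation walks the orderings lexicographically, the first two columns cannot change until all 8! arrangements of the last eight entries have been exhausted, which is exactly why Blake and Bmw stay pinned at positions 0 and 1 in the output.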

whitefire990 (OP) (Copper Member | Activity: 166 | Merit: 84)
May 18, 2018, 06:36:28 AM  #740

I had to watch a really boring "entertainment" program and used that time to edit the above file into a working C++ program. Echo was never used. Due to the peculiar permutation order, Blake and Bmw are always fixed at positions 0 and 1 respectively; only the remaining 8 positions change. So, using the terminology from whitefire990's post above, timetravel10 requires only 9 reconfigurable blocks, assuming that only Groestl requires a double block.

This is very interesting. I did some math and there is a decent chance Bitcore would fit in a VU9P (while it fits for sure in the more expensive VU13P).