Will CN7, Equihash and Neoscrypt fit on one VU9P? Or will one need two or three to get those algorithms to work?
I want to order two dev kits but need some idea whether I can use just one or should buy three.
I need to make a call on how many I need to order, so any estimate would be great.
CN7, Neoscrypt, Lyra2z, and Lyra2v2 all fit in one VU9P. Equihash isn't worth it competing against the Antminer Z9, unless we implement one of the Equihash forks.
Thanks a million. I hope Zcash forks so that Equihash becomes an option.
The Zcash parameter change would not give FPGAs much advantage, if any. This board in particular lacks the bandwidth.
|
|
|
Most of the algorithms out there already exist published "for free" - that isn't the hard part.
I found only Keccak. They'd still need expensive software licenses plus lots of time and a reasonable set of skills to synthesize and deploy bitstreams for a variety of hardware.
Vivado (WebPACK/Design Edition) TCL console plus gcc/cmake; no special skill needed, just one click. Both free. P.S.: The WebPACK edition won't build you UltraScale+ bitstreams for the VCU1525, though I guess you could use the trial of the full version.
|
|
|
who will reprogram this if a new algo is born?
How about the DevFee?
The devfee depends on the dev, as gpuhoarder alluded to in the previous post... There will likely be many devs coming out with bitstreams and software; Whitefire was just the first to announce. We are considering developing a platform that would allow any dev to develop firmwares for the boards, and provide the development environment. The devfee collected on our software would depend on what devfee the dev wanted to set.
I'm also doing this now, and any algo on which they want to earn I'll release on GitHub for free.
Most of the algorithms out there already exist published "for free" - that isn't the hard part. They'd still need expensive software licenses plus lots of time and a reasonable set of skills to synthesize and deploy bitstreams for a variety of hardware. Not to mention we tend to do a lot of floorplanning, rapid reconfiguration, and other things beyond some Verilog/VHDL. Typically speaking, especially for things like this where there is a very real high cost (in software, hardware, and time) to produce the bitstreams, you get what you pay for.
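For the curious: the usual way a closed-source miner collects a devfee is by time-slicing between the user's pool and the developer's pool. A minimal sketch, assuming that scheme; all names and numbers here are illustrative, not taken from any real miner:

```python
# Hypothetical sketch of a time-sliced devfee scheduler: the miner points at
# the developer's pool for fee_pct percent of each cycle and at the user's
# pool for the rest. Purely illustrative; real miners vary in how they do this.

def devfee_schedule(cycle_seconds: float, fee_pct: float):
    """Return (dev_seconds, user_seconds) for one mining cycle."""
    dev = cycle_seconds * fee_pct / 100.0
    return dev, cycle_seconds - dev

def pool_for_elapsed(elapsed: float, cycle_seconds: float, fee_pct: float) -> str:
    """Pick which pool to mine, based on position within the current cycle."""
    dev, _ = devfee_schedule(cycle_seconds, fee_pct)
    return "dev" if (elapsed % cycle_seconds) < dev else "user"
```

With a 1% fee over a one-hour cycle, this mines 36 seconds per hour for the dev, which is why a devfee is hard to strip without patching the binary.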
|
|
|
Ccminer is GPL, so any fork's source must be published. How will the FPGA programmer prevent his work from being copied? I understand that the bitstream is not GPL, but surely the FPGA can't replace the management app which combines and coordinates the mining process.
I can see it being like a driver, but what stops someone else from just using the driver and your bitstream?
There is nothing magical in ccminer. We have had our own stratum interface and "host" miner for some time. Senseless has suggested building these boards with a "shell" that would allow encrypted bitstreams (industry-standard IP protection; the traditional use of FPGAs is DoD-level projects). There are lots of solutions. I would say don't expect open-source FPGA miners.
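There really is nothing magical in the host side: stratum is just newline-delimited JSON-RPC over TCP. A minimal sketch of the message framing such a host miner would use; the method names are the standard stratum ones, everything else is illustrative:

```python
# Sketch of stratum message framing for a custom "host" miner. Stratum is
# newline-delimited JSON-RPC over TCP; the worker name and client string
# below are placeholders.
import json

def frame(msg_id: int, method: str, params: list) -> bytes:
    """Encode one stratum request as a newline-terminated JSON line."""
    return (json.dumps({"id": msg_id, "method": method, "params": params}) + "\n").encode()

def parse(line: bytes) -> dict:
    """Decode one stratum message received from the pool."""
    return json.loads(line.decode())

# Typical opening handshake messages:
subscribe = frame(1, "mining.subscribe", ["fpga-host-miner/0.1"])
authorize = frame(2, "mining.authorize", ["worker1", "x"])
```

The actual hashing work is dispatched to the FPGA; the host only needs this thin protocol layer plus whatever (possibly encrypted) bitstream-loading path the vendor provides.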
|
|
|
who will reprogram this if a new algo is born?
Whitefire would release a software update with the new bitstream included, as would others of us.
|
|
|
Xilinx has a large range of FPGA chips. Why is the VU9P the preferred one at this stage? Is it due to its larger-than-normal total block RAM?
The VU9P is the chip Xilinx has chosen to use in their VCU1525 boards as well as their VCU118 dev boards. The chip used for the dev boards is usually a mid-line chip with good yield, and the volume means it almost always has a better price to performance ratio than any other chip.
|
|
|
There's a fairly new coin I'd like to mine on AWS, but I'm not a programmer. I've been reading up on VHDL but I really don't know where to start. Would anyone be willing to assist me?
If you don't want to publicly state the coin, PM me. If you're not a programmer, or a hardware engineer, or an FPGA developer (very different from a programmer), you likely have a very steep slope to getting an efficient miner running a new algorithm. "User 'GPUHoarder' has not chosen to allow messages from newbies." Well, that's fun, as a relatively new member of this forum myself. Fixed.
|
|
|
Timetravel10 fits in a single VU13P. You partition the FPGA into 16 blocks and store about 14 partial bitstreams for each block. Then you do a dynamic partial reconfiguration from DDR4 to build the pipeline at the start of each block, based on the current algorithm sequence, yielding one hash per clock (i.e. 500 MH/s @ 500 MHz). You need 16 blocks because some functions like Groestl and Echo require 2 blocks. The FPGA can reconfigure itself in 0.25 seconds. The problem with Timetravel and X16R/X16S is the long time it takes to load the DDR4 bitstream table via USB. And you lose it if there is a power outage, and must reprogram the DDR4 on each FPGA. This is where utilizing the PCIe bus would be an advantage.
Yep this is the big reason chain hashing doesn’t stop FPGAs - partial reconfiguration, the overhead of which can be nearly entirely latency hidden.
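The host-side bookkeeping for this is straightforward. A sketch, assuming X16R's commonly cited rule of mapping the last 16 hex nibbles of the previous block hash to its 16 algorithms; the slot model and reuse check are illustrative, not from any real miner:

```python
# Sketch: derive the X16R algorithm sequence from the previous block hash and
# decide which pipeline slots need a partial bitstream swap. The nibble ->
# algorithm table is X16R's published ordering; the rest is illustrative.

ALGOS = ["blake", "bmw", "groestl", "jh", "keccak", "skein", "luffa",
         "cubehash", "shavite", "simd", "echo", "hamsi", "fugue",
         "shabal", "whirlpool", "sha512"]

def x16r_sequence(prev_hash_hex: str):
    """Map the last 16 hex nibbles of the previous block hash to algorithms."""
    return [ALGOS[int(n, 16)] for n in prev_hash_hex[-16:]]

def slots_to_reprogram(old_seq, new_seq):
    """Only slots whose algorithm changed need a partial bitstream load."""
    return [i for i, (a, b) in enumerate(zip(old_seq, new_seq)) if a != b]
```

Since consecutive blocks often share some algorithms in the same positions, only the changed slots need the ~0.25 s reconfiguration, and the loads can be overlapped with hashing on the unchanged slots - which is the latency hiding referred to above.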
|
|
|
Found this product: "BittWare’s XUPSVH is an UltraScale+ VU33P/35P FPGA-based PCIe card. The UltraScale+ FPGA helps these demanding applications avoid I/O bottlenecks with integrated High Bandwidth Memory (HBM2) tiles on the FPGA that support up to 8 GBytes of memory at 460 GBytes/sec."
Each FPGA device requires a unique bitstream. Think about it.
Sorry, what is a "unique bitstream"? Do you mean you need to do unique coding for each FPGA model to mine the same coin?
Yeah, my take is it's like a BIOS for the FPGA card that tells it exactly what to do, so each one is unique for every FPGA. Kind of like saying GPUs are a sledgehammer, where a more basic set of instructions can be sent to take a swing at anything. With an FPGA it would be more like programming a laser to etch out exactly what you want, resulting in a more precise operation.
This is somewhat accurate. The bitstream is literally the blueprint for the exact circuit you want the FPGA to currently be wired as. Every model of FPGA is like a unique building - it needs its own tailored electrical blueprint, even though the electrical blueprints for two similarly sized datacenters might look very similar. Maybe in one the main power feed comes in on the south wall and the rows run north to south, and in the other the power comes in the east wall and the rows are spaced differently, running west to east. With FPGAs you can easily rewire the whole building (but not change where the fixed resources are); with an ASIC you're starting from flat, level ground, and once the building and wires are in they can never be changed. With a GPU you can't change the wires: all the machines and manufacturing lines are already installed, set in stone, and each is only good for what it is good for; you can only tell the machines what order of operations to execute.
|
|
|
The Phi algo change will test the theory that they can adapt the FPGA software in a matter of hours or days. Also, I don't get why the engineering samples of these boards are cheaper when they can put them into mass production at a higher price. Shouldn't it be the other way around?
There are not many new customers of FPGAs. They are heavily incentivized to offer “cheap” dev kits to get a company to see the value in the chip and build a product around it. First they get you hooked, then...
|
|
|
Soon we should have FPGA boards in the realm of GPU costs ($200-600) that “everyone” could buy.
Specs? At those price points you won't see 20+ kH/s Cryptonight and 17 GH/s Keccak. You'll also be dealing with older chips - 28nm-variety Xilinx 7-series. But you can get performance on par with similarly priced GPUs (at 10% of the power) for most of the discussed algorithms. And that's from a relatively new platform that hasn't had near the level of optimization. I hesitate to give exact specs yet because things can change between engineering samples and shipping products. Overall, if you have $500 to spend and want to get into FPGAs you can do it profitably, but if you have $5000 you can participate much more profitably, dollar for dollar.
|
|
|
The Dwarf looks like a very interesting project/product, but the last pic (showing 0 H/s) doesn't inspire confidence in the claimed specs.
Can you explain a bit more what is meant here: "This small device is able to replace one rig and produce 7 kh / 10 W on the v7 and 3.5 kh / 8W on the heavy."
Sounds like it saves 2W power by cutting the performance in half?
If it's still profitable when it's ready to ship I'd order a couple of them.
CryptoNight7 and CryptoNightHeavy are two different algorithms with different power and performance characteristics.
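To make that concrete: the quoted figures describe two different algorithms, not a power-saving trade-off, and quick arithmetic on the claimed specs shows the efficiencies differ too (illustrative only, taken at face value from the ad copy):

```python
# Quick check on the quoted Dwarf figures. The two lines are different
# algorithms, not a half-speed mode; efficiency in kH/s per watt:

def efficiency(khs: float, watts: float) -> float:
    return khs / watts

cn_v7    = efficiency(7.0, 10.0)   # CryptoNight v7 claim: 7 kH/s at 10 W
cn_heavy = efficiency(3.5, 8.0)    # CryptoNightHeavy claim: 3.5 kH/s at 8 W
```

CryptoNightHeavy is simply more expensive per hash (its scratchpad is larger), so lower hashrate at similar power is expected, not a throttling choice.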
|
|
|
lol yea multiplication and division are the same thing...come on guys
Well, Chuck Norris is known to successfully divide by zero. Maybe he's posting here under an alias? Or maybe he has spread his knowledge of previously illegal operations to the "energy industry" or the "GPU hoarders"? Who knows?
Reference, please. I missed something - who said multiplication and division are the same thing? I mean, sure, multiplication by a reciprocal is the same as division, but who's counting. I generally have a distaste for arguments over semantics and pedantry. It is precisely why I don't do much in formal mathematics. I fully respect and appreciate the value of and need for such rigour, but it is a means to an end and not the end itself.
|
|
|
... If you are interested in acquiring hardware, contact jason.harvey@avnet.com for the VCU1525, or Christian Robichaud of Bittware for the Bittware XUPP3R-VU9P (crobichaud@bittware.com). Tell them you were referred by Zetheron Technology and that you want the cards for crypto-mining, and they can expedite the lead time, which is currently around 4 weeks. The intro price (at Avnet) on the VCU1525 is $3995 USD, but it will be going up to around $5K in July. ...
Initially I tried contacting Avnet to order a board or two to play with but am getting nowhere. I signed their export agreement, and every couple of days I ask what I need to do to order the hardware, but never get an answer to the question.
Same here, no response after signing the agreement. Maybe they are swamped.
Same here. Also, I don't get why Lux would want to be FPGA-resistant; FPGAs seem awesome.
Most of the people I have spoken with are anti-ASIC but pro general-purpose acceleration hardware (such as FPGAs). There is certainly a group that has a vested interest in keeping their GPU farms running or using the existing hardware that users have. Soon we should have FPGA boards in the realm of GPU costs ($200-600) that "everyone" could buy. That may sway opinions, but I imagine some will still cling to their GPU-centric goals. I can't fault a group for making a decision and sticking to it.
|
|
|
... If you are interested in acquiring hardware, contact jason.harvey@avnet.com for the VCU1525, or Christian Robichaud of Bittware for the Bittware XUPP3R-VU9P (crobichaud@bittware.com). Tell them you were referred by Zetheron Technology and that you want the cards for crypto-mining, and they can expedite the lead time, which is currently around 4 weeks. The intro price (at Avnet) on the VCU1525 is $3995 USD, but it will be going up to around $5K in July. ...
Initially I tried contacting Avnet to order a board or two to play with but am getting nowhere. I signed their export agreement, and every couple of days I ask what I need to do to order the hardware, but never get an answer to the question.
Same here, no response after signing the agreement. Maybe they are swamped.
I work with Avnet all the time; sadly, this is normal. Hardware on the back end doesn't move at consumer pace - it moves very slowly. Also, lead times are a few weeks now.
|
|
|
No need, it's all I had to say. The rest will come with time.
The group asked for my opinion and I agree. PHI2 appears to be more of the same ASIC-resistance line of thinking, based on some misconceptions about what makes things hard for hardware. The fact that you lumped ASIC and FPGA resistance in one line is more telling than anything else. To be less skeptical: show me your white paper documenting the PHI2 algorithm, the decisions that were made, and the cryptanalysis that was done on it to be FPGA-resistant. It also appears you've released the GPL miner binary, so I respectfully request the source.
To be clear, I am not against coins that decide from the get-go to be ASIC-resistant, even those that decide to be FPGA-resistant. I'm perfectly willing to design hardware to take advantage of a coin's failure to actually be resistant, though. The challenge alone is worth it; there are many ways to solve a problem. What I do think is important is to hold those decisions to scrutiny. Otherwise what you have is smaller coins waving the "hey, we are ASIC-resistant!" flag as a marketing tool, hoping to be the next big GPU coin regardless of the validity of the claims. They need to stand up to public review and get experts in the various disciplines to chime in on their decisions. From what I can tell, PHI2 seems to have been developed in secret - and nothing is published on the algorithm itself, which makes me suspicious. It also makes it clear that it hasn't been broadly reviewed and given a chance to stand on its own merits.
|
|
|
I’ll see if I can dig up recent ones. A lot of people pull up the old CUDA vs FPGA academic papers that are focused on very old architectures.
Thanks in advance. I'll put the blame squarely on the vendor's lap. Intel, which has now acquired Altera, still lists "An Independent Analysis of Altera's FPGA Floating-point DSP Design Flow" from 2011 as the only source mentioning "accuracy". I've found several other, newer papers, but they all repeat the old bullshit methodology: only using single precision and only estimating the errors. At most they'll show fused multiply-add, as if double precision or https://en.wikipedia.org/wiki/Kahan_summation_algorithm never existed or didn't apply.
As to GPU floating-point performance, you don't need a benchmark. The figures are right in the ISA documents. Single-precision TFLOPs are usually given in terms of FMA unit operations, though, which is a bit misleading.
FPGAs are a bit harder to get TFLOPs numbers for given the flexibility, but since most of the performance actually comes from the DSP blocks, you can calculate those. If you've never read them, Xilinx gives extremely detailed performance metrics for every chip for most IP blocks, as well as frequency numbers for the hard blocks in the DC and AC switching characteristics docs. Agner Fog publishes a very detailed set of specifications for the performance of those units on nearly every CPU/APU available as well.
The funny thing is that the closest to an honest comparison of Xilinx's FP I've found is on Altera's site: https://www.altera.com/content/dam/altera-www/global/en_US/pdfs/literature/wp/wp-01222-understanding-peak-floating-point-performance-claims.pdf
The main resource CPUs and GPUs have is instruction flexibility. Until a PoW hash truly requires most of the full instruction set to be supported, it will be hard to keep out ASICs/FPGAs.
I think this claim is true, but somewhat pessimistic. I think it would be fairly easy once a wider range of cryptocurrency programmers start to appreciate floating point and https://en.wikipedia.org/wiki/Chaos_theory as useful building blocks for proof-of-work algorithms. I've only skimmed the currently available literature on the subject, but it is next to trivial to demolish all the current claims of FPGA superiority that I was able to find today: 1) use double precision; 2) use division or reciprocal (either accurate or approximate); 3) use square root or reciprocal square root (either accurate or approximate) - and I haven't even gotten into transcendental functions (on CPUs) or using later, pixel-oriented hardware in the shaders (on GPUs). You did, however, motivate me to reconsider Altera/Quartus for certain future projects. They are now shipping limited, but fully hardware-implemented, single-precision floating point in their DSP blocks, and their toolchain has improved in terms of supported OSes and device drivers.
I deal with a lot of complex, large FFTs on CPUs, GPUs, and FPGAs. The "only using single precision" is unfortunately true of every vendor - GPU and FPGA. Marketing wants to use the big number - and frankly so do most real-world users now. Modern GPUs are horrible at double precision. It is a sad fate. Your comparison also pits a modern Stratix 10 (10 TFLOPs) against the previous-generation UltraScale (not UltraScale+), with slower fabric and significantly fewer DSP blocks than the VCU1525 (XCVU9P-L2FSGD2104E) everyone has been talking about here. Compared to even modern weak-DP GPUs, any normally priced CPU is horrible at double precision. A modern GPU runs circles around a CPU on complex FFTs using double precision. Both quickly become memory-bound. The FPGA performance is usually on par or slightly better for the double precision itself, but the benefits in the rest of the calculation are much greater.
I think you'll be hard pressed to build a hashing algorithm that is entirely floating point like a synthetic benchmark. The only place FPGAs really fall down is upfront cost. I'm still a bit confused by why you think sqrt/reciprocal and the transcendentals are so difficult for FPGAs, or that they are magically free on GPUs/CPUs. On at least AMD GPUs these are macro-ops that take hundreds of clock cycles (EDIT: searching for my reference on this, I see these ops are quarter rate; I may have been thinking of division). On the FPGA you can devote a lot of logic to lowering the latency of these functions, or you can pipeline them nice and long with very high throughput to match what you need for the algorithms in question. You have none of that flexibility on the GPU. What you do have is a tremendous amount of power and overhead in instruction fetching, scheduling, branching, caching, etc., feeding a limited set of ports that implement the opcodes for each GCN/CUDA core.
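The Kahan summation linked above is easy to demonstrate. A minimal sketch in plain Python (which uses double precision), with illustrative numbers chosen so naive accumulation visibly loses bits:

```python
# Kahan (compensated) summation: recovers low-order bits that naive
# floating-point accumulation throws away. This is the standard algorithm;
# the demo numbers below are chosen purely for illustration.

def kahan_sum(values):
    total = 0.0
    c = 0.0                      # running compensation for lost low-order bits
    for v in values:
        y = v - c                # apply the correction from the last step
        t = total + y            # big + small: low-order bits of y are lost...
        c = (t - total) - y      # ...but recovered algebraically here
        total = t
    return total

# Adding ten 1.0s to 1e16 naively loses them all: 1.0 is below the spacing
# (ULP) of doubles near 1e16, which is 2.0. Kahan accumulates the lost parts.
data = [1e16] + [1.0] * 10
```

Here `sum(data)` stays at 1e16 while `kahan_sum(data)` recovers 1e16 + 10, which is why "estimating the errors" instead of measuring them, as the papers above do, can hide a lot.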
|
|
|
Are you so sure about that? The floating-point performance per watt of modern FPGAs is much better than GPUs'. Even in the 28nm Virtex-7 days TFLOPs were roughly on par; it's neck and neck now, and the next-gen FPGAs are pulling ahead on the AI/half-precision stuff. That floating-point performance gap was true several years ago but has rapidly closed since.
The types of instructions you're listing also take many, many clock cycles on GPUs and CPUs, and can almost always be implemented faster in FPGAs.
I've never seen an honest comparison involving actual verification of accuracy, not even bit-accuracy. I've seen some very skewed benchmarks made with very ugly code that conflated FPU performance with memory bandwidth/latency limitations. https://en.wikipedia.org/wiki/False_sharing seems to be in fashion nowadays for obfuscation purposes. Frequently the comparisons don't even use real floating point, but some extended-precision fixed point in the inner loops, because the original CPU/GPU implementation was just generic library code versus carefully optimized, special-purpose code for the FPGA. It does make business sense, especially with regard to time-to-market, but I wouldn't call it science, even if published in an ostensibly scientific journal. Do you recall where you've seen those comparisons?
I'll see if I can dig up recent ones. A lot of people pull up the old CUDA vs FPGA academic papers that are focused on very old architectures. As to GPU floating-point performance, you don't need a benchmark. The figures are right in the ISA documents. Single-precision TFLOPs are usually given in terms of FMA unit operations, though, which is a bit misleading. FPGAs are a bit harder to get TFLOPs numbers for given the flexibility, but since most of the performance actually comes from the DSP blocks, you can calculate those. If you've never read them, Xilinx gives extremely detailed performance metrics for every chip for most IP blocks, as well as frequency numbers for the hard blocks in the DC and AC switching characteristics docs. Agner Fog publishes a very detailed set of specifications for the performance of those units on nearly every CPU/APU available as well. The main resource CPUs and GPUs have is instruction flexibility. Until a PoW hash truly requires most of the full instruction set to be supported, it will be hard to keep out ASICs/FPGAs.
|
|
|
|