Acorn M.2 FPGA based GPU Accelerator

GPUHoarder (OP)

Member

Offline

Activity: 154
Merit: 37

Acorn M.2 FPGA based GPU Accelerator

June 01, 2018, 06:06:38 AM

Merited by vapourminer (5), suchmoon (5), 64dimensions (5), smoolae (1)

#1

This information all existed in the discord but I wanted to share it with everyone.

So we’ve developed an FPGA accelerator over the past few months in M.2 (same as nVME drives) form factor designed to operate both standalone and in conjunction with GPUs.

The first version to be released has 4x high speed PCIe lanes to communicate between the system/GPUs as well as 512MB or 1GB of onboard DDR3 along with a 100k+ LE or 200k+ LE FPGA of high speed grade. We’ve named it the Acorn, and the three models are the CLE-101, CLE-215, and CLE-215+

General expectation is it will provide performance roughly scaled with price/performance of the VCU1525, but it has a unique role and is not applicable to all of the same algorithms. Its performance in this role is dominated by its interconnect bandwidth and not its processing power.

It is capable of providing up to 30MH of lift to a mining system with GPUs on a hand full of algorithms or operate independently at higher-than-GPU level hashrates for other non-memory intensive algorithms (Keccak, etc). I will be releasing it alongside our mining software and bitstreams to support hybrid GPU acceleration. This project was not developed commercially, it was developed out of a product for my day job for internal use in our own mining systems to give an edge to traditional PCs and gaming systems turned miners.

The accelerator works by streaming high bandwidth hash state between GPUs and the FPGA over PCIe., allowing each piece of hardware to handle the portion of the algorithm it is best at. In general this means memory bandwidth or area heavy portions of the algorithm may be handled by the GPU, and hash algorithms designed for hardware implementations are handled by the FPGA. This approach works for any algorithm whose internal state is 256 bit (60Mh gains) or 512 bit (x16r, Lyra2Rev2, etc.) or smaller. The accelerator supports rapidly reconfiguring its algorithms from on-board DDR to enable handling of per-block or period (TimeTravel10) re-sequencing. It was designed originally to provide performance gains (especially for older GPUs with poor cores) and power savings for ETH by way of offloading the opening and closing Keccak calculations, as well as hash-selection to improve locality of reference for early ETH rounds.

Given the anticipated path of ETH itself regarding POS and other fork possibilities please consider all those things if ETH is your target. It may be the most popular coin for GPUs, that does not mean it is the best use of FPGA or hybrid tech.

I’ve decided to make this hardware available to community at near cost, given all the FPGA interest lately, alongside my belief that broadly available general purpose acceleration hardware at its true market cost (not low volume industry specific dev boards) is the best defense against complete ASIC centralization. You will see this philosophy reflected in my activity around the VCU1525 board as well.

Anticipated pre-order prices of $199 for the CLE-101 512MB variant and $329 on the high end highest speed grade CLE-215+ 1GB DRAM version. On-board power consumption is nominally 15W. It will include a heatsink adequate for this dissipation level with reasonable airflow. It is important to note that to fit the FPGA this adapter is slightly outside of the 2280 M.2 specification, weighing in at 2380. The vast majority of M.2 slots should not have an issue with this.

I am also pursuing making available well priced options for individual PCIe x4 to M.2 M-key host boards (these are broadly available for $10-15), as well as Quad-M.2 PCIe switched and Bifurcated x16 host boards for those who do not have the available M.2 M-Key slots or require up to 240MH of acceleration.

I won’t post exact per algorithm stats or performance until I can do final testing of the actual boards to be shipped with the release hardware/heatsink/thermal management pieces in place, at which point I’ll accept pre-orders. This device requires quite a bit of testing to cover the list of common GPUs, PCIe configurations, and supported algorithms. I have no desire to sell anyone anything not useful to them, or to push a board at all, let alone one based on 3D renders, prototype parts pictures, or choppy YouTube videos, so I believe this full set of data along with final product pictures and overview must be published before I will take any preorders. I am sorry if that tests your patience.

Prototypes exist and I’ve already secured most of the hardware for a first batch so lead time will only be PCB + assembly.

At the time of shipping I will be releasing our internal miner software in closed source form for Windows and Linux that supports GPU only as well as Hybrid acceleration. You’re also welcome to develop your own bitstreams for the accelerator, and will have all the specifications necessary to do so.

I will also be publishing the interface for the bitstreams so that open source miners that wish to can use the FPGA directly.

We are handling all CE, FCC, RoHS, and other certifications as well as ITAR and export compliance, so we will be able to ship to all non-US embargo’d countries. Taxes and import duties will fall on the purchaser. We will be offering at least a 90 day warranty.

All feedback is welcome. This is not my source of income, nor that of the rest of my team, and we don’t want anyone’s money unless they are happy with what we’re offering. I’m also happy to continue conversations I am already having with coin devs and miner developers on how or if FPGAs fit into their plans for their coin and/or ASIC Resistance strategies. This community is about choice, and I will respect the choices of those teams.

So all I would like from all of you beyond the feedback, is for anyone interested to hit our pre-order registration survey at http://www.squirrelsresearch.com to help us ensure we’re covering your needs and wants and have all the appropriate hardware secured. Based on that info very detailed performance information and full device photos (spoiler - it looks like an SSD with a heatsink on it!) will be published at the time preorders open, expected in mid-June.

- David

trillobeat

Newbie

Offline

Activity: 39
Merit: 0