Bitcoin Forum
February 11, 2026, 06:48:44 PM *
Author Topic: UltrafastSecp256k1: Zero-dependency high-performance secp256k1 (CPU, CUDA, RISC-V)  (Read 10 times)
shrec (OP)
Newbie

Activity: 2
Merit: 0


Today at 05:27:59 PM
 #1

Hi everyone,

I’ve been working on a performance-oriented implementation of secp256k1 written from scratch in C++ and CUDA, with optional x86-64 (BMI2/ADX) and RISC-V optimizations.

This project is focused on architectural efficiency, benchmarking, and hardware-aware ECC implementation. It is not intended for breaking cryptography or private key recovery.

### Goals

* Implement secp256k1 without external big-integer libraries
* Maintain deterministic memory layout
* Avoid dynamic allocation in hot paths
* Explore hardware-level performance limits

### Features

* Complete field arithmetic (mod p)
* Scalar arithmetic (mod n)
* Affine and Jacobian coordinates
* GLV optimization
* CPU optimizations (BMI2/ADX)
* RISC-V RV64GC support
* CUDA batch kernels
* Benchmark suite included

### Measured Performance

On RTX 5060:
~2.5 billion Jacobian mixed-add operations per second (measured)

CPU benchmarks also show 3–5× improvement over naive implementations when using BMI2/ADX paths.

### Design Approach

The implementation treats elliptic curve math as a hardware interaction problem:

* Little-endian limb layout for computational efficiency
* Explicit carry handling
* Batch inversion via Montgomery’s trick
* Minimal abstraction in hot execution paths
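To make the limb layout and carry handling concrete, here is a minimal sketch of a 256-bit addition over four 64-bit little-endian limbs. The names `U256` and `add256` are hypothetical and are not taken from the repository; this is a portable illustration of the idea, not the project's actual code path.

```cpp
#include <array>
#include <cstdint>

// Hypothetical type: a 256-bit value stored as four 64-bit limbs,
// least-significant limb first (little-endian limb order).
using U256 = std::array<std::uint64_t, 4>;

// Computes out = (a + b) mod 2^256 and returns the final carry (0 or 1).
// The carry is propagated explicitly and without branches.
inline std::uint64_t add256(const U256& a, const U256& b, U256& out) {
    std::uint64_t carry = 0;
    for (int i = 0; i < 4; ++i) {
        std::uint64_t s = a[i] + carry;
        std::uint64_t c1 = (s < carry) ? 1u : 0u;  // wrapped when adding the carry in
        std::uint64_t t = s + b[i];
        std::uint64_t c2 = (t < s) ? 1u : 0u;      // wrapped when adding b[i]
        out[i] = t;
        carry = c1 | c2;  // at most one of c1, c2 can be set
    }
    return carry;
}
```

Compilers commonly lower this pattern to an add/adc chain on x86-64, but that depends on the compiler and flags, so it is worth checking the generated assembly.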

The idea is to reduce unnecessary movement and keep arithmetic predictable at the instruction level.
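For readers unfamiliar with the batch inversion mentioned above, here is a minimal sketch of Montgomery's trick. To stay short and portable it uses a toy 31-bit prime instead of the secp256k1 field prime, and `std::vector` for scratch space (a real hot path would use preallocated buffers). All names (`kP`, `mul_mod`, `inv_mod`, `batch_invert`) are hypothetical and not taken from the repository.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy modulus 2^31 - 1, chosen so products fit in 64 bits.
// This is NOT the secp256k1 field prime; it only illustrates the technique.
constexpr std::uint64_t kP = 2147483647ULL;

// Modular multiplication; assumes a, b < kP so a * b < 2^62.
inline std::uint64_t mul_mod(std::uint64_t a, std::uint64_t b) {
    return (a * b) % kP;
}

// Single inversion as a^(p-2) mod p (Fermat's little theorem).
inline std::uint64_t inv_mod(std::uint64_t a) {
    std::uint64_t result = 1, base = a % kP, e = kP - 2;
    while (e != 0) {
        if (e & 1) result = mul_mod(result, base);
        base = mul_mod(base, base);
        e >>= 1;
    }
    return result;
}

// Montgomery's trick: inverts every element in place using one field
// inversion plus roughly 3n multiplications.
// Precondition: every element is nonzero and less than kP.
inline void batch_invert(std::vector<std::uint64_t>& xs) {
    const std::size_t n = xs.size();
    if (n == 0) return;
    std::vector<std::uint64_t> prefix(n);
    std::uint64_t acc = 1;
    for (std::size_t i = 0; i < n; ++i) {
        prefix[i] = acc;            // product of xs[0..i-1]
        acc = mul_mod(acc, xs[i]);
    }
    std::uint64_t inv_acc = inv_mod(acc);  // inverse of the full product
    for (std::size_t i = n; i-- > 0;) {
        const std::uint64_t xi = xs[i];
        xs[i] = mul_mod(inv_acc, prefix[i]);  // equals xi^-1
        inv_acc = mul_mod(inv_acc, xi);       // now inverse of xs[0..i-1]
    }
}
```

The payoff is that n inversions cost one inversion plus cheap multiplications, which is why the trick is popular when converting many Jacobian points back to affine coordinates.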

### Scope Disclaimer

This project does NOT claim:

* Any weakness in secp256k1
* Practical discrete log attacks
* Private key recovery

It is purely for performance research, benchmarking, and educational exploration of ECC implementations.

### Repository

[https://github.com/shrec/UltrafastSecp256k1](https://github.com/shrec/UltrafastSecp256k1)

I would appreciate feedback from anyone working on:

* ECC performance
* GLV implementation details
* GPU optimization strategies
* RISC-V vectorization approaches

Thanks.

If there is interest, I can post detailed benchmark comparisons and profiling results.
Eze BTC
Jr. Member

Activity: 65
Merit: 8


Today at 06:07:21 PM
 #2

That's impressive work, I must say.
However, I have one suggestion that may help with ECC optimization:
If you need tighter determinism in hot paths, consider inversion via Fermat's little theorem (exponentiation by p-2), especially if side-channel resistance is the ultimate goal. It should be prioritized over a constant-time binary extended GCD. On the other hand, if the aim is pure performance research, faster non-constant-time paths are perfectly fine.
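A minimal sketch of the Fermat approach, assuming a toy 31-bit prime rather than the secp256k1 field prime so that it fits in plain 64-bit arithmetic. The names `kToyP`, `mulmod` and `fermat_inverse` are hypothetical. Because the exponent p-2 is a public constant, the sequence of squarings and multiplications is the same for every input, which is the determinism property in question.

```cpp
#include <cstdint>

// Toy modulus 2^31 - 1 (NOT the secp256k1 prime), so products fit in 64 bits.
constexpr std::uint64_t kToyP = 2147483647ULL;

// Modular multiplication; assumes a, b < kToyP.
// Note: '%' may compile to a variable-latency division; real field code
// would use a dedicated reduction instead.
constexpr std::uint64_t mulmod(std::uint64_t a, std::uint64_t b) {
    return (a * b) % kToyP;
}

// Computes a^(p-2) mod p with left-to-right square-and-multiply.
// The branch depends only on bits of the public exponent, never on 'a',
// so every call performs the same operations in the same order.
constexpr std::uint64_t fermat_inverse(std::uint64_t a) {
    a %= kToyP;
    constexpr std::uint64_t e = kToyP - 2;  // 31-bit exponent
    std::uint64_t result = 1;
    for (int bit = 30; bit >= 0; --bit) {
        result = mulmod(result, result);
        if ((e >> bit) & 1) result = mulmod(result, a);
    }
    return result;
}
```

For a 256-bit prime the same idea is usually implemented with a fixed, hand-tuned addition chain rather than a generic bit loop; the loop above only shows the principle. An input of zero returns zero, so callers must handle that case themselves.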