Bitcoin Forum
January 12, 2026, 06:07:17 AM *
News: Latest Bitcoin Core release: 30.2 [Torrent]
 
   Home   Help Search Login Register More  
Pages: [1]
  Print  
Author Topic: [ANN] Kangaroo - Cross-Platform GPU Pollard's Kangaroo  (Read 83 times)
Redni (OP)
Sr. Member
****
Offline Offline

Activity: 380
Merit: 264


View Profile
January 10, 2026, 01:38:33 PM
Merited by kTimesG (5)
 #1

Kangaroo
High-Performance Cross-Platform ECDLP Solver

What is this?

Kangaroo is a high-performance implementation of Pollard's Kangaroo algorithm (also known as the lambda algorithm) for solving the Elliptic Curve Discrete Logarithm Problem (ECDLP) on the secp256k1 curve.

It is designed to find a private key d when it is known to lie within a specific range [a, b].

Why another Kangaroo?

Most popular Kangaroo implementations, such as the ones by JeanLucPons or RCKangaroo, are written in CUDA. This creates a significant barrier for many researchers and enthusiasts:
  • They require NVIDIA GPUs exclusively.
  • They do not work on AMD GPUs (Vulkan/RADV).
  • They do not work on Apple Silicon (Metal).
  • They require complex proprietary driver setups on Linux.

Kangaroo solves this by using Pure Rust and wgpu (WebGPU). By using WGSL compute shaders, we provide native hardware acceleration across nearly all modern GPUs and operating systems without any CUDA dependency.

Features

  • Cross-platform GPU support: Runs on Vulkan, Metal, and DX12 backends.
  • Distinguished Points (DP): Efficient collision detection with configurable DP bits to balance memory and speed.
  • Data Providers: Native integration with the boha library to automatically fetch Bitcoin Puzzle Transaction data.
  • Pure Rust: Zero-dependency on proprietary toolchains (like CUDA or OpenCL).
  • CPU Fallback: Includes a multi-threaded CPU solver for testing and verification.
Supported Hardware

  • AMD GPUs (via Vulkan/RADV)
  • NVIDIA GPUs (via Vulkan)
  • Apple Silicon (via Metal)
  • Intel GPUs (via Vulkan)
Installation

Arch Linux (AUR):
Code:
paru -S kangaroo

Via Cargo:
Code:
cargo install kangaroo

From Source:
Code:
git clone https://github.com/oritwoen/kangaroo
cd kangaroo
cargo build --release

Quick Start

Direct usage with a public key and range:
Code:
kangaroo --pubkey 03a2efa402fd5268400c77c20e574ba86409ededee7c4020e4b9f0edbee53de0d4 --start 8000000000 --range 40

Using boha provider (auto-fetches puzzle data):
Code:
kangaroo --target boha:b1000/66

List available puzzles from providers:
Code:
kangaroo --list-providers

Use Cases

When to use:
  • Solving Bitcoin Puzzle Transaction challenges where the public key is known.
  • Recovering keys with partial information (e.g., you know 240 bits of the private key).
  • Verifying if a private key exists within a specific large range.

When NOT to use:
  • Full 256-bit brute force.
  • BIP39 passphrase attacks.
  • Attacking addresses where the public key has never been revealed on-chain.

Links

GitHub Repository: https://github.com/oritwoen/kangaroo
Issue Tracker: https://github.com/oritwoen/kangaroo/issues
License: MIT

Disclaimer

This software is provided for educational and research purposes only. The authors are not responsible for any misuse. Always ensure you have the legal right to perform cryptographic research on any target you choose.
kTimesG
Full Member
***
Offline Offline

Activity: 714
Merit: 221


View Profile
January 10, 2026, 01:52:45 PM
 #2

Interesting, some new code finally, not based on JLP.

I have one big (huge in fact) dilemma though:

You are computing points in Jacobian coordinates, and check the DP criteria on them. Only then (if it's a DP, according to the check), you convert to affine for storage.

AFAIK this cannot work as you expect, because you need the affine coordinates (the stable, canonical representation) BEFORE you check for the DP criteria.

Are you sure your code works as expected?

It would be much faster to simply work in affine coordinates, and this issue would not exist.

Off the grid, training pigeons to broadcast signed messages.
Redni (OP)
Sr. Member
****
Offline Offline

Activity: 380
Merit: 264


View Profile
January 10, 2026, 02:27:06 PM
 #3

You're right to scrutinize this carefully - it's a subtle point!

Looking at the shader flow in kangaroo_jacobian.wgsl:

1. Batch normalization happens FIRST in each loop iteration (lines 118-188), converting all points to affine (Z=1)
2. Then DP check runs on p.x[0] which is already the affine X coordinate
3. Then the jump happens via jac_add_affine(), producing a Jacobian result
4. Loop repeats - next iteration normalizes again before DP check

So the DP criteria IS checked on affine coordinates. The batch Montgomery inversion amortizes the cost of fe_inv() across 64 threads per workgroup.

The misleading part is the struct comments (e.g. "Jacobian X" in DistinguishedPoint) - these are stale. After normalization, Z=1, so what's stored is effectively affine. I should clean those up to avoid confusion.

I've tested this with manual parameters on known keys, for example:

Code:
oritwoen@rog-zephyrus-g14 ~/P/kangaroo (main) [SIGINT]> kangaroo --pubkey 03a2efa402fd5268400c77c20e574ba86409ededee7c4020e4b9f0edbee53de0d4 --start 8000000000 --range 40
2026-01-10T14:22:52.104488Z  INFO kangaroo: Kangaroo ECDLP Solver
2026-01-10T14:22:52.104521Z  INFO kangaroo: =====================
2026-01-10T14:22:52.104527Z  INFO kangaroo: Pubkey: 03a2efa402fd5268400c77c20e574ba86409ededee7c4020e4b9f0edbee53de0d4
2026-01-10T14:22:52.104533Z  INFO kangaroo: Search range: 40 bits from 0x8000000000
2026-01-10T14:22:52.337594Z  INFO kangaroo: GPU: AMD Radeon RX 6800S (RADV NAVI23)
2026-01-10T14:22:52.337608Z  INFO kangaroo: Compute units: 65535
2026-01-10T14:22:52.337624Z  INFO kangaroo: DP bits: 11
2026-01-10T14:22:52.337637Z  INFO kangaroo: Kangaroos: 262144
2026-01-10T14:22:52.337641Z  INFO kangaroo::solver: Creating pipeline...
2026-01-10T14:22:52.337646Z  INFO kangaroo::gpu::pipeline: Loading shader sources...
2026-01-10T14:22:52.337649Z  INFO kangaroo::gpu::pipeline: Creating shader module...
2026-01-10T14:22:52.347979Z  INFO kangaroo::gpu::pipeline: Shader module created
2026-01-10T14:22:52.347993Z  INFO kangaroo::gpu::pipeline: Creating bind group layout...
2026-01-10T14:22:52.348012Z  INFO kangaroo::gpu::pipeline: Bind group layout created
2026-01-10T14:22:52.348016Z  INFO kangaroo::gpu::pipeline: Creating pipeline layout...
2026-01-10T14:22:52.348023Z  INFO kangaroo::gpu::pipeline: Pipeline layout created
2026-01-10T14:22:52.348027Z  INFO kangaroo::gpu::pipeline: Creating compute pipeline...
2026-01-10T14:22:52.352347Z  INFO kangaroo::gpu::pipeline: Compute pipeline created
2026-01-10T14:22:52.352370Z  INFO kangaroo::solver: Pipeline created
2026-01-10T14:22:52.352376Z  INFO kangaroo::solver: Generating jump table...
2026-01-10T14:22:52.363831Z  INFO kangaroo::solver: Jump table generated: 256 entries
2026-01-10T14:22:52.363845Z  INFO kangaroo::solver: Jump dist[0] = 0x00062895
2026-01-10T14:22:52.363852Z  INFO kangaroo::solver: Jump dist[1] = 0x000367e8
2026-01-10T14:22:52.363857Z  INFO kangaroo::solver: Jump dist[2] = 0x00082207
2026-01-10T14:22:52.363863Z  INFO kangaroo::solver: Jump dist[3] = 0x000dd9b2
2026-01-10T14:22:52.363868Z  INFO kangaroo::solver: DP mask created
2026-01-10T14:22:52.363872Z  INFO kangaroo::solver: Config created: steps_per_call=16
2026-01-10T14:22:52.363877Z  INFO kangaroo::solver: Creating GPU buffers...
2026-01-10T14:22:53.598747Z  INFO kangaroo::solver: Calibrating GPU performance...
2026-01-10T14:22:55.822730Z  INFO kangaroo::solver:   steps_per_call=16: 1148ms
2026-01-10T14:22:55.822753Z  INFO kangaroo::solver: Calibrated: steps_per_call=16
2026-01-10T14:22:55.823827Z  INFO kangaroo: Starting search...
⠉ [00:00:02] [########################################] 8388608/1048576 (0s)                                                                               2026-01-10T14:22:58.268120Z  INFO kangaroo::solver: Ops: 12M | DPs: 4054 (1999 tame, 2055 wild)
⠚ [00:00:05] [########################################] 16777216/1048576 (0s)                                                                              2026-01-10T14:23:00.896525Z  INFO kangaroo::solver: Ops: 20M | DPs: 8153 (4094 tame, 4059 wild)
⠂ [00:00:07] [########################################] 25165824/1048576 (0s)                                                                              2026-01-10T14:23:04.954863Z  INFO kangaroo::cpu::dp_table: Collision found! Key: 0xe9ae4933d6
  [00:00:09] [########################################] 1048576/1048576 (0s)                                                                               2026-01-10T14:23:04.954996Z  INFO kangaroo: Private key found: 0xe9ae4933d6
2026-01-10T14:23:04.955004Z  INFO kangaroo: Verification: SUCCESS
2026-01-10T14:23:04.955010Z  INFO kangaroo: Total operations: 29360128
2026-01-10T14:23:04.955017Z  INFO kangaroo: Time elapsed: 9.13s
kTimesG
Full Member
***
Offline Offline

Activity: 714
Merit: 221


View Profile
January 10, 2026, 02:57:52 PM
 #4

OK, cool, I didn't dig deeper, only noticed you were storing x and z for a DP, which seems weird.

But why do you use J coordinates anyway? For the kangaroo use-case, the batch addition is faster if done in affine, as there are less field operations to perform, and the inversion is inevitable.

Also some speed benchmarks would be welcome.

Off the grid, training pigeons to broadcast signed messages.
Redni (OP)
Sr. Member
****
Offline Offline

Activity: 380
Merit: 264


View Profile
January 10, 2026, 03:56:57 PM
 #5

Initially I had issues with GPU testing for the affine approach, which is why I went with Jacobian. I've since resolved those problems.

Based on a simple benchmark on puzzle #40, you're right - affine batch addition is about 30% faster than Jacobian on my AMD RX 6800S:

Code:
JACOBIAN: ~8.7s average
AFFINE:   ~6.3s average

I'll switch to affine as the default in the next release. I also plan to explore further optimizations and add proper benchmarks so people can compare performance across different GPUs. Thanks for pushing on this!
Redni (OP)
Sr. Member
****
Offline Offline

Activity: 380
Merit: 264


View Profile
January 10, 2026, 06:42:43 PM
 #6

New release, 30% faster using affine instead of jacobian Wink Details and benchmarks are in the release notes.

https://github.com/oritwoen/kangaroo/releases/tag/v0.3.0
kTimesG
Full Member
***
Offline Offline

Activity: 714
Merit: 221


View Profile
January 10, 2026, 08:48:05 PM
 #7

I really like your code, especially because it's clearly commented, unlike all the n00b projects out there which are a spaghetti mess.

That said, there are plenty of optimizations to be done, and some obvious bugs (which you are clearly aware of, given the comments).

For example, if the kang == jump point, it seems that the walk gets corrupted, but the probability is wrong. There are very high chances for this to happen, especially for the wild points (since they can clearly be at the beginning of the interval, e.g. with small scalars).

There are some ways to mitigate this without ever having to check if x1 == x2, for example by always making sure the shifted interval starts at (maxJumpDist + 1) - for 2-kang, this works if all kangs always walk forward, as it makes sure that it can never be the case that a kang equals any jump point.

There's also a totally different solution to that if you use symmetry (3-kang or RC's SOTA). You should document a bit what exact type of algorithm you implemented.

Inversion is the killer though - FLT is way too slow, especially on a GPU.

Nice work overall, but it feels more like a playground project. 70-bits on a decent GPU is solvable within minutes, from scratch, and within seconds once a decent amount of Tame DPs are already stored. Not X days.

Off the grid, training pigeons to broadcast signed messages.
Redni (OP)
Sr. Member
****
Offline Offline

Activity: 380
Merit: 264


View Profile
January 11, 2026, 04:34:41 PM
 #8

Thanks! You're right - this is still very much an experimental project.

The backstory: I was doing deep research and experiments with cryptography and blockchain (especially Bitcoin), and at some point hit a wall. None of the existing Kangaroo tools were up-to-date or worked natively with my setup - I'm on Arch Linux with an AMD GPU, and most implementations are CUDA-only or abandoned. So I built this from scratch for my own needs and decided to share it.

It's also a learning project for me, and I really appreciate the feedback. Getting suggestions like yours helps me improve both the code and my understanding of the algorithm.

Regarding your specific points:

Jump point collision - already handled. When dx == 0, I substitute fe_one() to avoid poisoning the batch inversion, then skip the add. See kangaroo_affine.wgsl:160-164.

Batch inversion - implemented using Montgomery's trick with Blelloch parallel scan (no individual FLT inversions in the hot path).

Algorithm variant documentation - fair point, I should document that this is a basic 2-kangaroo implementation without symmetry. Will add that to the README.

Performance vs SOTA - I know there's a gap. No symmetry optimization yet, and probably other things I'm missing. Happy to hear specific suggestions if you have them.

---

Latest version: v0.4.0

Changes since first release:
Code:
v0.2.0  GPU auto-calibration, data provider system, wgpu v28
v0.3.0  Affine batch addition
v0.4.0  Parallel batch inversion (Blelloch scan)

Performance progression (AMD RX 6800S, 48-bit range):
Code:
Version   Rate       Improvement
v0.2.0    3.70 M/s   baseline
v0.3.0    5.50 M/s   +49%
v0.4.0    8.84 M/s   +139% total (+61% vs v0.3.0)

---

Thanks again for taking the time to look at the code.
Pages: [1]
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!