Bitcoin Forum
May 21, 2025, 12:19:18 AM *
Author Topic: Solving ECDLP with Kangaroos: Part 1 + 2 + RCKangaroo  (Read 10771 times)
This is a self-moderated topic. If you do not want to be moderated by the person who started this topic, create a new topic. (11 posts by 6+ users deleted.)
RetiredCoder (OP)
Full Member
December 11, 2024, 09:39:38 AM
 #21

Update: final part #3 (RCKangaroo) is ready and will be released shortly.

I've solved #120, #125, #130. How: https://github.com/RetiredC
stilichovandal
Jr. Member
December 11, 2024, 03:18:48 PM
 #22

Quote from: RetiredCoder
Update: final part #3 (RCKangaroo) is ready and will be released shortly.

Thank you.
RetiredCoder (OP)
Full Member
December 12, 2024, 11:20:18 AM
Merited by Cricktor (3)
 #23

As promised, here is the third and final part: RCKangaroo, Windows/Linux, open source:
https://github.com/RetiredC/RCKangaroo
This software demonstrates a fast implementation of the SOTA method and advanced loop handling on RTX 40xx cards.
Note that I have not included all possible optimizations, because it's public code and I want to keep it as simple and readable as possible. Still, it's fast enough to demonstrate the advantage, and you can improve it further if you have the skills.

Lolo54
Member
December 12, 2024, 02:14:24 PM
 #24

Wow, congratulations on finding #120, #125 and #130 (probably a world record for #130). Your skills have been rewarded with over 37 BTC recovered. How many GPUs did you need to find #130? Did you use an optimized version of your RCKangaroo or another tool?
RetiredCoder (OP)
Full Member
December 12, 2024, 02:59:13 PM
 #25

Quote from: Lolo54
Your skills have been rewarded with over 37 BTC recovered. How many GPUs did you have to use to find the 130? Have you used an optimized version of your RCKangaroo or another tool?

Less than 37 BTC, check your calculations.
Most of the GPU code from RCKangaroo was used for #130.
It took two months on about 400 RTX 4090s. I was lucky and solved it twice as fast as expected.

Lolo54
Member
December 12, 2024, 03:09:15 PM
 #26

My mistake, it's 26 BTC: I had not noticed that for #120 the prize had not yet been multiplied by 10! The reward is still decent. Thank you for your response and the details, but 400 RTX 4090s for two months is colossal and out of reach for 99.9% of people! Well done anyway.
albertajuelo
Newbie
December 12, 2024, 04:58:47 PM
 #27

Thanks for sharing your repositories and source code.

Great job!
RetiredCoder (OP)
Full Member
December 12, 2024, 05:33:29 PM
 #28

Some explanations about support for other GPUs:
1. I have zero interest in old cards (same for AMD cards), so I don't have them for development or tests and don't support them.
2. You can easily enable support for older NVIDIA cards and it will work, but my code is designed for the latest generation. On previous generations it's not optimal and the speed is not the best, which is why I disabled them.

albertajuelo
Newbie
December 12, 2024, 05:57:12 PM
Last edit: December 13, 2024, 10:21:19 AM by albertajuelo
 #29

Running puzzle #85 on an RTX 4060 Laptop GPU:

Quote
CUDA devices: 1, CUDA driver/runtime: 12.7/12.6
GPU 0: NVIDIA GeForce RTX 4060 Laptop GPU, 8.00 GB, 24 CUs, cap 8.9, PCI 1, L2 size: 32768 KB
Total GPUs for work: 1

MAIN MODE

Solving public key
X: 29C4574A4FD8C810B7E42A4B398882B381BCD85E40C6883712912D167C83E73A
Y: 0E02C3AFD79913AB0961C95F12498F36A72FFA35C93AF27CEE30010FA6B51C53
Offset: 0000000000000000000000000000000000000000001000000000000000000000

Solving point: Range 84 bits, DP 16, start...
SOTA method, estimated ops: 2^42.202, RAM for DPs: 3.062 GB. DP and GPU overheads not included!
Estimated DPs per kangaroo: 523.378.
GPU 0: allocated 841 MB, 147456 kangaroos.
GPUs started...
MAIN: Speed: 1180 MKeys/s, Err: 0, DPs: 174K/77175K, Time: 0d:00h:00m, Est: 0d:01h:11m
...
MAIN: Speed: 1132 MKeys/s, Err: 0, DPs: 154839K/77175K, Time: 0d:02h:28m, Est: 0d:01h:14m
Stopping work ...
Point solved, K: 2.310 (with DP and GPU overheads)


PRIVATE KEY: 00000000000000000000000000000000000000000011720C4F018D51B8CEBBA8

I compared the expected number of operations with JLP's Kangaroo:

Quote
Kangaroo v2.1
Start:1000000000000000000000
Stop :1FFFFFFFFFFFFFFFFFFFFF
Keys :1
Number of CPU thread: 0
Range width: 2^84
Jump Avg distance: 2^42.03
Number of kangaroos: 2^23.32
Suggested DP: 16
Expected operations: 2^43.12
Expected RAM: 6347.6MB
DP size: 16 [0xFFFF000000000000]

Quick comparison: with the same DP, RCKangaroo needs about half the RAM (3.062 GB vs 6347.6 MB) and fewer expected operations (2^42.202 vs 2^43.12).
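For anyone who wants to check these figures, here is a minimal Python sketch that reproduces the estimates printed in the RCKangaroo log above. The constant 1.15 is inferred from the log itself (2^42.202 is about 1.15 * 2^42 for an 84-bit range), not read from the source code, and the function names are made up for this example.

```python
import math

def sota_estimate(range_bits, dp_bits, kangaroos, k=1.15):
    """Expected operations and distinguished points (DPs).

    k=1.15 is an assumption inferred from the log output above.
    """
    ops = k * 2 ** (range_bits / 2)      # expected group operations
    total_dps = ops / 2 ** dp_bits       # one DP per 2^dp_bits points on average
    return {
        "log2_ops": math.log2(ops),
        "total_dps": total_dps,
        "dps_per_kangaroo": total_dps / kangaroos,
    }

def observed_k(dps_found, dp_bits, range_bits):
    """Approximate K actually spent, from the DP counter at solve time."""
    return dps_found * 2 ** dp_bits / 2 ** (range_bits / 2)

est = sota_estimate(range_bits=84, dp_bits=16, kangaroos=147456)
print(f"ops: 2^{est['log2_ops']:.3f}")                      # ops: 2^42.202
print(f"DPs: {est['total_dps'] / 1000:.0f}K")               # DPs: 77175K
print(f"DPs per kangaroo: {est['dps_per_kangaroo']:.3f}")   # DPs per kangaroo: 523.378

# JLP's Kangaroo reports 2^43.12 expected operations for the same range.
ratio = 2 ** (43.12 - est["log2_ops"])
print(f"JLP / RCKangaroo expected ops: {ratio:.2f}x")

# The last status line shows 154839K DPs, about twice the estimate.
print(f"observed K: {observed_k(154839e3, 16, 84):.2f}")
```

The observed K comes out close to the 2.310 printed by the program; the small difference is probably because the DP counter in the status line is rounded and is not the final count. So this particular run was unlucky and took roughly twice the expected work.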
b0dre
Jr. Member
December 12, 2024, 09:15:31 PM
 #30

Quote from: RetiredCoder
As promised, here is the third and final part: RCKangaroo, Windows/Linux, open source:
https://github.com/RetiredC/RCKangaroo

Thank you so much! Any advice for Linux users?
RetiredCoder (OP)
Full Member
December 12, 2024, 09:56:13 PM
 #31

OK, it's better to support 30xx cards even if RCKangaroo is not optimized for them, so I released v1.1.
A 3090 shows only about 3 GKeys/s; it could be much faster.

Quote from: b0dre
Thank you so much, any advice for Linux users?

A Linux executable is included as well, or you can compile it yourself.

b0dre
Jr. Member
December 12, 2024, 11:42:37 PM
 #32

Quote from: RetiredCoder
Ok, it's better to support 30xx cards even if RCKangaroo is not optimized for them, so I released v1.1.
3090 shows about 3GKeys/sec only, it can be really faster.

Really? "3090 shows about 3GKeys/sec only"? That's a record, man, I can't believe it! I can only say thanks!
Omniavincitbit
Copper Member
Newbie
December 13, 2024, 10:46:19 AM
 #33

Quote from: RetiredCoder
Linux exe is included as well, or you can compile it by yourself.

Do you have a Makefile compatible with Linux?
b0dre
Jr. Member
December 13, 2024, 11:31:02 AM
 #34

Quote from: Omniavincitbit
have a Makefile compatible with linux ?

Sure:

Code:
CC = g++
NVCC = nvcc
# -c: compile only, -O3: optimize, -g: keep debug symbols
CFLAGS = -c -O3 -g -I/usr/local/cuda/include
LDFLAGS = -L/usr/local/cuda/lib64 -lcudart
# sm_86 targets RTX 30xx cards (e.g. RTX 3060); change it to match your GPU
ARCH_FLAGS = -arch=sm_86
OBJ = RCGpuCore.o Ec.o GpuKang.o RCKangaroo.o utils.o
TARGET = RCKangaroo

.PHONY: all clean

all: $(TARGET)

# Recipe lines must start with a tab character, not spaces
$(TARGET): $(OBJ)
	$(CC) -o $(TARGET) $(OBJ) $(LDFLAGS)

RCGpuCore.o: RCGpuCore.cu
	$(NVCC) $(CFLAGS) $(ARCH_FLAGS) RCGpuCore.cu

Ec.o: Ec.cpp
	$(CC) $(CFLAGS) Ec.cpp

GpuKang.o: GpuKang.cpp
	$(CC) $(CFLAGS) GpuKang.cpp

RCKangaroo.o: RCKangaroo.cpp
	$(CC) $(CFLAGS) RCKangaroo.cpp

utils.o: utils.cpp
	$(CC) $(CFLAGS) utils.cpp

clean:
	rm -f *.o $(TARGET)
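If you are not sure which -arch value to use, RCKangaroo prints the compute capability of each card at startup (for example "cap 8.9" or "cap 7.5" in the logs in this thread). A small Python sketch of the mapping, where arch_flag is just a hypothetical helper name:

```python
def arch_flag(compute_capability: str) -> str:
    """Turn a compute capability like '8.6' into an nvcc -arch flag."""
    major, minor = compute_capability.split(".")
    return f"-arch=sm_{int(major)}{int(minor)}"

print(arch_flag("8.6"))  # -arch=sm_86 (RTX 30xx, as in the Makefile above)
print(arch_flag("8.9"))  # -arch=sm_89 (the RTX 4060 log reports cap 8.9)
print(arch_flag("7.5"))  # -arch=sm_75 (the GTX 1660 SUPER log reports cap 7.5)
```

Put the resulting value into ARCH_FLAGS before running make.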
Omniavincitbit
Copper Member
Newbie
December 13, 2024, 02:55:58 PM
 #35

This is my version of the Makefile:

Code:
PROJECT = RCKangaroo
CC = g++
NVCC = nvcc
CFLAGS = -Wall -O2
INCLUDES = -I/usr/local/cuda/include
LIBDIRS = -L/usr/local/cuda/lib64
LIBS_CUDA = -lcudart

# Source files
CPP_FILES = $(wildcard *.cpp)
CU_FILES = $(wildcard *.cu)
OBJECTS = $(CPP_FILES:.cpp=.o) $(CU_FILES:.cu=.o)

.PHONY: all clean

all: $(PROJECT)

# Recipe lines must start with a tab character, not spaces
$(PROJECT): $(OBJECTS)
	$(NVCC) $(OBJECTS) -o $(PROJECT) $(LIBDIRS) $(LIBS_CUDA)

%.o: %.cpp
	$(CC) $(CFLAGS) $(INCLUDES) -c $< -o $@

%.o: %.cu
	$(NVCC) -O2 $(INCLUDES) -c $< -o $@

clean:
	rm -f $(PROJECT) $(OBJECTS)
makekang
Newbie
December 14, 2024, 07:03:13 AM
 #36

How can I use it with an RTX 2070 Super, please? I only have a 2070 and I am very interested in testing your work. I tried modifying the parameter settings, but it failed.
RetiredCoder (OP)
Full Member
December 14, 2024, 09:05:25 AM
Merited by Etar (1)
 #37

Quote from: makekang
how to use it with RTX2070super, please, because I only have 2070, I am very interested in testing your work. I tried modifying the parameter settings but it failed.

As far as I remember, these cards have only 64 KB of shared memory.
Set JMP_CNT to 512 and change 17 to 16 in this line in KernelB:
u64* table = LDS + 8 * JMP_CNT + 17 * THREAD_X;
then recalculate the LDS_SIZE_ constants.
I think that's enough, though I may have forgotten something...
The main issue is not compiling but optimization: my code is not written for 20xx and 30xx cards, so it won't run at a good speed there.
That's why I don't want to support old cards: if I support them officially but don't optimize for them, you will blame me for the poor speed.
But feel free to modify/optimize the sources for your hardware :)
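To see why changing 17 to 16 can matter on a 64 KB card, here is a rough Python sketch of the arithmetic behind that line. It rests on several assumptions that are not confirmed here: that THREAD_X is the thread index inside the block, that each thread owns 17 (or 16) u64 words after the 8 * JMP_CNT words of jump data, that nothing else lives in this shared memory region, and that the block has 256 threads. The function name and the block size are hypothetical.

```python
U64_BYTES = 8

def kernel_b_lds_bytes(jmp_cnt, words_per_thread, block_size):
    """Shared memory implied by: LDS + 8 * JMP_CNT + words_per_thread * THREAD_X.

    block_size is an assumed value, not taken from the RCKangaroo sources.
    """
    jump_words = 8 * jmp_cnt                        # words before the per-thread tables
    per_thread_words = words_per_thread * block_size
    return (jump_words + per_thread_words) * U64_BYTES

LIMIT = 64 * 1024  # 64 KB of shared memory

for words in (17, 16):
    used = kernel_b_lds_bytes(jmp_cnt=512, words_per_thread=words, block_size=256)
    print(words, used, "fits" if used <= LIMIT else "too big")
# 17 67584 too big
# 16 65536 fits
```

Under these assumptions the layout lands exactly on 64 KB with a stride of 16, which would be why the LDS_SIZE_ constants need to be recalculated after the change. Check the real block size and layout in the sources before relying on this.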

Etar
Sr. Member
December 14, 2024, 11:58:14 AM
 #38

Quote from: RetiredCoder
That's why I don't want to support old cards: if I support them officially but not optimize you will blame me that they have bad speed.
But feel free to modify/optimize sources for your hardware :)

I'll be honest, your kangaroo finds the key faster than mine or JLP's. Yes, the displayed speed is lower, but in the end it finds the key much faster.
It works even on a 1660 Super (~600 MKeys/s).
Thanks for sharing.
MrGPBit
Newbie
December 14, 2024, 12:40:25 PM
 #39

Quote from: Etar
I'll be honest, your kangaroo finds the key faster than mine or jlp. Yes, the speed shows less, but in the end it finds it much faster.
Works even on 1660 super (~600Mkeys/s).
Thanks for sharing.

@Etar, what did you change to make your GTX 1660 work? Can you list all the changes and show them to me? Many thanks.
Etar
Sr. Member
December 14, 2024, 12:52:41 PM
Last edit: December 14, 2024, 01:49:58 PM by Etar
 #40

Quote from: MrGPBit
@Etar how did you do everything to make your GTX 1660 work? Can you tell me all the changes and show me them? Many thanks

file: RCGpuCore.cu
line 285: u64* table = LDS + 8 * JMP_CNT + 16 * THREAD_X;
file: RCKangaroo.cpp
line 99: if (deviceProp.major < 6)
file: defs.h
#define LDS_SIZE_A         (64 * 1024)
#define LDS_SIZE_B         (64 * 1024)
#define LDS_SIZE_C         (64 * 1024)
#define JMP_CNT            512
file: RCKangaroo.vcxproj
line 118: <CodeGeneration>compute_75,sm_75;compute_75,sm_75</CodeGeneration>
line 141: <CodeGeneration>compute_75,sm_75;compute_75,sm_75</CodeGeneration>

Code:
CUDA devices: 1, CUDA driver/runtime: 12.6/12.1
GPU 0: NVIDIA GeForce GTX 1660 SUPER, 6.00 GB, 22 CUs, cap 7.5, PCI 1, L2 size: 1536 KB
Total GPUs for work: 1

MAIN MODE

Solving public key
X: 29C4574A4FD8C810B7E42A4B398882B381BCD85E40C6883712912D167C83E73A
Y: 0E02C3AFD79913AB0961C95F12498F36A72FFA35C93AF27CEE30010FA6B51C53
Offset: 0000000000000000000000000000000000000000001000000000000000000000

Solving point: Range 84 bits, DP 16, start...
SOTA method, estimated ops: 2^42.202, RAM for DPs: 3.062 GB. DP and GPU overheads not included!
Estimated DPs per kangaroo: 570.958.
GPU 0: allocated 772 MB, 135168 kangaroos.
GPUs started...
MAIN: Speed: 599 MKeys/s, Err: 0, DPs: 88K/77175K, Time: 0d:00h:00m:10s, Est: 0d:02h:20m:43s
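The "Est" value in the status line looks like it is simply the expected operation count divided by the current speed. A minimal Python sketch of that arithmetic, assuming the same 1.15 * 2^(range/2) estimate that matches the "estimated ops: 2^42.202" line; the function names are invented for this example and the truncation to whole seconds is a guess:

```python
def estimated_seconds(range_bits, mkeys_per_s, k=1.15):
    """Expected run time in seconds; k=1.15 is inferred from the log, not the source."""
    ops = k * 2 ** (range_bits / 2)
    return ops / (mkeys_per_s * 1e6)

def format_eta(seconds):
    """Format seconds like the status line: 0d:02h:20m:43s."""
    seconds = int(seconds)  # truncate to whole seconds
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days}d:{hours:02d}h:{minutes:02d}m:{secs:02d}s"

# GTX 1660 SUPER at 599 MKeys/s, 84-bit range
print(format_eta(estimated_seconds(84, 599)))   # 0d:02h:20m:43s

# RTX 4060 Laptop at 1180 MKeys/s, same range
print(format_eta(estimated_seconds(84, 1180)))  # 0d:01h:11m:26s
```

Both results agree with the estimates shown in the logs in this thread (2h 20m 43s here, 1h 11m for the RTX 4060 Laptop). Keep in mind that this is only the expected time: the actual solve time varies a lot from run to run.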