RetiredCoder (OP)
Full Member
 
Offline
Activity: 128
Merit: 117
No pain, no gain!
|
 |
December 11, 2024, 09:39:38 AM |
|
Update: final part #3 (RCKangaroo) is ready and will be released shortly.
|
|
|
|
stilichovandal
Jr. Member
Offline
Activity: 34
Merit: 5
|
 |
December 11, 2024, 03:18:48 PM |
|
Quote from: RetiredCoder
Update: final part #3 (RCKangaroo) is ready and will be released shortly.

Thank you.
|
|
|
|
RetiredCoder (OP)
|
 |
December 12, 2024, 11:20:18 AM |
|
As promised, here is the third and final part: RCKangaroo, Windows/Linux, open source: https://github.com/RetiredC/RCKangaroo
This software demonstrates a fast implementation of the SOTA method and advanced loop handling on RTX 40xx cards. Note that I have not included all possible optimizations, because it's public code and I want to keep it as simple and readable as possible. Anyway, it's fast enough to demonstrate the advantage, and you can improve it further if you have the skills.
|
|
|
|
Lolo54
Member

Offline
Activity: 131
Merit: 32
|
 |
December 12, 2024, 02:14:24 PM |
|
Wow, congratulations on finding #120, #125 and #130 (probably a world record for #130). Your skills have been rewarded with over 37 BTC recovered. How many GPUs did you need to find #130? Did you use an optimized version of RCKangaroo or another tool?
|
|
|
|
RetiredCoder (OP)
|
 |
December 12, 2024, 02:59:13 PM |
|
Quote from: Lolo54
Wow, congratulations on finding #120, #125 and #130 (probably a world record for #130). Your skills have been rewarded with over 37 BTC recovered. How many GPUs did you need to find #130? Did you use an optimized version of RCKangaroo or another tool?

Less than 37 BTC, check your calculations. Most of the GPU code from RCKangaroo was used for #130. It took two months on about 400 RTX 4090s; I was lucky and solved it twice as fast as expected.
|
|
|
|
Lolo54
|
 |
December 12, 2024, 03:09:15 PM |
|
My mistake: 26 BTC. I had not noticed that for #120 the prize had not yet been multiplied by 10. The reward is still a decent one. Thank you for your response and the details, but 400 RTX 4090s for 2 months is colossal and out of reach for 99.9% of people! Well done anyway.
|
|
|
|
albertajuelo
Newbie
Offline
Activity: 8
Merit: 2
|
 |
December 12, 2024, 04:58:47 PM |
|
Thanks for sharing your repositories and source code.
Great job!
|
|
|
|
RetiredCoder (OP)
|
 |
December 12, 2024, 05:33:29 PM |
|
Some explanations about support for other GPUs:
1. I have zero interest in old cards (same for AMD cards), so I don't have them for development/tests and don't support them.
2. You can easily enable support for older NVIDIA cards and it will work, but my code is designed for the latest generation. On previous generations it is not optimal and the speed is not the best, which is why I disabled them.
|
|
|
|
albertajuelo
|
 |
December 12, 2024, 05:57:12 PM Last edit: December 13, 2024, 10:21:19 AM by albertajuelo |
|
Running puzzle #85 on an RTX 4060:
CUDA devices: 1, CUDA driver/runtime: 12.7/12.6
GPU 0: NVIDIA GeForce RTX 4060 Laptop GPU, 8.00 GB, 24 CUs, cap 8.9, PCI 1, L2 size: 32768 KB
Total GPUs for work: 1
MAIN MODE
Solving public key
X: 29C4574A4FD8C810B7E42A4B398882B381BCD85E40C6883712912D167C83E73A
Y: 0E02C3AFD79913AB0961C95F12498F36A72FFA35C93AF27CEE30010FA6B51C53
Offset: 0000000000000000000000000000000000000000001000000000000000000000
Solving point: Range 84 bits, DP 16, start...
SOTA method, estimated ops: 2^42.202, RAM for DPs: 3.062 GB. DP and GPU overheads not included!
Estimated DPs per kangaroo: 523.378.
GPU 0: allocated 841 MB, 147456 kangaroos.
GPUs started...
MAIN: Speed: 1180 MKeys/s, Err: 0, DPs: 174K/77175K, Time: 0d:00h:00m, Est: 0d:01h:11m
...
MAIN: Speed: 1132 MKeys/s, Err: 0, DPs: 154839K/77175K, Time: 0d:02h:28m, Est: 0d:01h:14m
Stopping work ...
Point solved, K: 2.310 (with DP and GPU overheads)
PRIVATE KEY: 00000000000000000000000000000000000000000011720C4F018D51B8CEBBA8
For comparison, here is the number of operations estimated by JLP's Kangaroo:
Kangaroo v2.1
Start:1000000000000000000000
Stop :1FFFFFFFFFFFFFFFFFFFFF
Keys :1
Number of CPU thread: 0
Range width: 2^84
Jump Avg distance: 2^42.03
Number of kangaroos: 2^23.32
Suggested DP: 16
Expected operations: 2^43.12
Expected RAM: 6347.6MB
DP size: 16 [0xFFFF000000000000]
Quick comparison: with the same DP, RCKangaroo estimates about half the RAM and fewer operations.
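The estimates printed in the RCKangaroo log can be reproduced with a little arithmetic. The Python sketch below is a back-of-the-envelope check, not code from RCKangaroo. It assumes the estimate is K * 2^(range/2) with K = 1.15 (a constant inferred from the log's 2^42.202, not confirmed by the sources) and that one point in 2^DP is a distinguished point. The function names are hypothetical.

```python
import math

# Assumption: K is inferred from the log (2^42.202 for an 84-bit range),
# it is not taken from the RCKangaroo sources.
ASSUMED_K = 1.15

def estimated_ops(range_bits, k=ASSUMED_K):
    """Estimated group operations for a range of the given bit width."""
    return k * 2 ** (range_bits / 2)

def estimated_dps(range_bits, dp_bits, k=ASSUMED_K):
    """Estimated distinguished points if one point in 2^dp_bits is stored."""
    return estimated_ops(range_bits, k) / 2 ** dp_bits

ops = estimated_ops(84)
dps = estimated_dps(84, 16)
print(f"estimated ops: 2^{math.log2(ops):.3f}")
print(f"estimated DPs: {dps / 1000:.0f}K")
print(f"DPs per kangaroo (147456 kangaroos): {dps / 147456:.3f}")
# Ratio of the two tools' own estimates, taken from the logs above.
print(f"JLP / SOTA expected ops: {2 ** (43.12 - 42.202):.2f}x")
```

Under these assumptions the numbers agree with the log (2^42.202, 77175K, 523.378), and the JLP estimate comes out a little under twice as large.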
|
|
|
|
b0dre
Jr. Member
Offline
Activity: 59
Merit: 1
|
 |
December 12, 2024, 09:15:31 PM |
|
Quote from: RetiredCoder
As promised, here is the third and final part: RCKangaroo, Windows/Linux, open source: https://github.com/RetiredC/RCKangaroo
This software demonstrates fast implementation of SOTA method and advanced loop handling on RTX40xx cards. Note that I have not included all possible optimizations because it's public code and I want to keep it as simple/readable as possible. Anyway, it's fast enough to demonstrate the advantage and you can improve it further if you have enough skills.

Thank you so much. Any advice for Linux users?
|
|
|
|
RetiredCoder (OP)
|
 |
December 12, 2024, 09:56:13 PM |
|
Ok, it's better to support 30xx cards even if RCKangaroo is not optimized for them, so I released v1.1. A 3090 shows only about 3 GKeys/s; it could be much faster.

Quote from: b0dre
Thank you so much, any advice for Linux users?

A Linux executable is included as well, or you can compile it yourself.
|
|
|
|
b0dre
|
 |
December 12, 2024, 11:42:37 PM |
|
Quote from: RetiredCoder
Ok, it's better to support 30xx cards even if RCKangaroo is not optimized for them, so I released v1.1. 3090 shows about 3GKeys/sec only, it can be really faster.

Really? "3090 shows about 3GKeys/sec only" is a record, man. I can't believe it! All I can say is thanks!
|
|
|
|
Omniavincitbit
Copper Member
Newbie
Offline
Activity: 7
Merit: 0
|
 |
December 13, 2024, 10:46:19 AM |
|
Quote from: RetiredCoder
Linux exe is included as well, or you can compile it by yourself.

Do you have a Makefile compatible with Linux?
|
|
|
|
b0dre
|
 |
December 13, 2024, 11:31:02 AM |
|
Quote from: Omniavincitbit
have a Makefile compatible with linux ?

Sure:

CC = g++
NVCC = nvcc
# Added optimization and debugging
CFLAGS = -c -O3 -g
LDFLAGS = -L/usr/local/cuda/lib64 -lcudart -I/usr/local/cuda/include
# Explicitly specify the architecture for RTX 3060
ARCH_FLAGS = -arch=sm_86
OBJ = RCGpuCore.o Ec.o GpuKang.o RCKangaroo.o utils.o
TARGET = RCKangaroo

all: $(TARGET)

$(TARGET): $(OBJ)
	$(CC) -o $(TARGET) $(OBJ) $(LDFLAGS)

RCGpuCore.o: RCGpuCore.cu
	$(NVCC) $(CFLAGS) $(ARCH_FLAGS) RCGpuCore.cu

Ec.o: Ec.cpp
	$(CC) $(CFLAGS) Ec.cpp

GpuKang.o: GpuKang.cpp
	$(CC) $(CFLAGS) GpuKang.cpp

RCKangaroo.o: RCKangaroo.cpp
	$(CC) $(CFLAGS) RCKangaroo.cpp

utils.o: utils.cpp
	$(CC) $(CFLAGS) utils.cpp

clean:
	rm -f *.o $(TARGET)
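The -arch value in this Makefile has to match the card's compute capability, which RCKangaroo prints at startup as "cap X.Y" (8.9 for the RTX 4060 Laptop GPU and 7.5 for the GTX 1660 SUPER in the logs in this thread). A minimal Python sketch of that mapping follows; the helper name arch_flag is hypothetical and not part of RCKangaroo or nvcc.

```python
def arch_flag(cap):
    """Turn a compute capability string such as '8.9' into an nvcc -arch flag."""
    major, minor = cap.strip().split(".")
    return f"-arch=sm_{int(major)}{int(minor)}"

# Capabilities mentioned in this thread.
for cap in ("8.6", "8.9", "7.5"):
    print(cap, "->", arch_flag(cap))
```

The resulting string would replace the ARCH_FLAGS value for the card you are building for.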
|
|
|
|
Omniavincitbit
|
 |
December 13, 2024, 02:55:58 PM |
|
This is my version of the Makefile:

PROJECT = RCKangaroo
CC = g++
NVCC = nvcc
CFLAGS = -Wall -O2
INCLUDES = -I/usr/local/cuda/include
LIBDIRS = -L/usr/local/cuda/lib64
LIBS_CUDA = -lcudart

# Source files
CPP_FILES = $(wildcard *.cpp)
CU_FILES = $(wildcard *.cu)
OBJECTS = $(CPP_FILES:.cpp=.o) $(CU_FILES:.cu=.o)

all: $(PROJECT)

$(PROJECT): $(OBJECTS)
	$(NVCC) $(OBJECTS) -o $(PROJECT) $(LIBDIRS) $(LIBS_CUDA)

%.o: %.cpp
	$(CC) $(CFLAGS) $(INCLUDES) -c $< -o $@

%.o: %.cu
	$(NVCC) -O2 $(INCLUDES) -c $< -o $@

clean:
	rm -f $(PROJECT) $(OBJECTS)
|
|
|
|
makekang
Newbie
Offline
Activity: 2
Merit: 0
|
 |
December 14, 2024, 07:03:13 AM |
|
How can I use it with an RTX 2070 Super, please? I only have a 2070 and I am very interested in testing your work. I tried modifying the parameter settings, but it failed.
|
|
|
|
RetiredCoder (OP)
|
 |
December 14, 2024, 09:05:25 AM |
|
Quote from: makekang
how to use it with RTX2070super, please, because I only have 2070, I am very interested in testing your work. I tried modifying the parameter settings but it failed.

As far as I remember, these cards have only 64KB of shared memory. Set JMP_CNT to 512 and change 17 to 16 in this line in KernelB:
u64* table = LDS + 8 * JMP_CNT + 17 * THREAD_X;
and recalculate the LDS_SIZE_ constants. I think that's enough, though maybe I forgot something...
The main issue is not compiling but optimization: my code is not written for 20xx and 30xx cards, so it won't run at a good speed there. That's why I don't want to support old cards: if I support them officially but don't optimize for them, you will blame me for the poor speed. But feel free to modify/optimize the sources for your hardware.
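To get a feel for why these two numbers matter, the pointer expression can be turned into a byte count. The Python sketch below only evaluates 8 * JMP_CNT + N * (threads per block) in u64 words of 8 bytes each. It assumes that THREAD_X is the thread index within a block, so that each thread gets its own slice of N words, and it uses 256 threads per block purely as a placeholder; neither assumption is confirmed in this thread, so check the sources for the real values.

```python
U64_BYTES = 8  # size of one u64 word

def lds_bytes(jmp_cnt, words_per_thread, threads_per_block):
    """Bytes spanned by 8 * JMP_CNT shared words plus one slice per thread."""
    return (8 * jmp_cnt + words_per_thread * threads_per_block) * U64_BYTES

# Assumption: 256 threads per block is a placeholder, not a value from the source.
THREADS = 256
LIMIT = 64 * 1024  # the 64KB mentioned in the post

for words in (17, 16):
    used = lds_bytes(512, words, THREADS)
    print(f"JMP_CNT=512, {words} words/thread: {used} bytes, fits: {used <= LIMIT}")
```

With these placeholder values, 16 words per thread lands exactly on 64KB while 17 words goes over it, which is consistent with the advice to change 17 to 16 on cards with less shared memory.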
|
|
|
|
Etar
|
 |
December 14, 2024, 11:58:14 AM |
|
Quote from: RetiredCoder
That's why I don't want to support old cards: if I support them officially but not optimize you will blame me that they have bad speed. But feel free to modify/optimize sources for your hardware

I'll be honest, your kangaroo finds the key faster than mine or JLP's. Yes, the displayed speed is lower, but in the end it finds the key much faster. It works even on a 1660 Super (~600 MKeys/s). Thanks for sharing.
|
|
|
|
MrGPBit
Newbie
Offline
Activity: 18
Merit: 1
|
 |
December 14, 2024, 12:40:25 PM |
|
Quote from: Etar
I'll be honest, your kangaroo finds the key faster than mine or jlp. Yes, the speed shows less, but in the end it finds it much faster. Works even on 1660 super (~600Mkeys/s). Thanks for sharing.

@Etar, what did you change to make it work on your GTX 1660? Could you list and show all the changes? Many thanks.
|
|
|
|
Etar
|
 |
December 14, 2024, 12:52:41 PM Last edit: December 14, 2024, 01:49:58 PM by Etar |
|
Quote from: MrGPBit
@Etar how did you do everything to make your GTX 1660 work? Can you tell me all the changes and show me them? Many thanks

file: RCGpuCore.cu
line 285: u64* table = LDS + 8 * JMP_CNT + 16 * THREAD_X;
file: RCKangaroo.cpp
line 99: if (deviceProp.major < 6)
file: defs.h
#define LDS_SIZE_A (64 * 1024)
#define LDS_SIZE_B (64 * 1024)
#define LDS_SIZE_C (64 * 1024)
#define JMP_CNT 512
file: RCKangaroo.vcxproj
line 118: <CodeGeneration>compute_75,sm_75;compute_75,sm_75</CodeGeneration>
line 141: <CodeGeneration>compute_75,sm_75;compute_75,sm_75</CodeGeneration>

CUDA devices: 1, CUDA driver/runtime: 12.6/12.1
GPU 0: NVIDIA GeForce GTX 1660 SUPER, 6.00 GB, 22 CUs, cap 7.5, PCI 1, L2 size: 1536 KB
Total GPUs for work: 1
MAIN MODE
Solving public key
X: 29C4574A4FD8C810B7E42A4B398882B381BCD85E40C6883712912D167C83E73A
Y: 0E02C3AFD79913AB0961C95F12498F36A72FFA35C93AF27CEE30010FA6B51C53
Offset: 0000000000000000000000000000000000000000001000000000000000000000
Solving point: Range 84 bits, DP 16, start...
SOTA method, estimated ops: 2^42.202, RAM for DPs: 3.062 GB. DP and GPU overheads not included!
Estimated DPs per kangaroo: 570.958.
GPU 0: allocated 772 MB, 135168 kangaroos.
GPUs started...
MAIN: Speed: 599 MKeys/s, Err: 0, DPs: 88K/77175K, Time: 0d:00h:00m:10s, Est: 0d:02h:20m:43s
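The Est value in this log looks like the estimated operation count divided by the current speed. A rough check in Python follows. It assumes, as an inference from the log rather than from the RCKangaroo sources, that the estimate is 1.15 * 2^(range/2) operations and that MKeys/s means 10^6 operations per second. Both function names are hypothetical.

```python
def eta_seconds(range_bits, mkeys_per_s, k=1.15):
    """Estimated run time: assumed op count divided by the reported speed."""
    ops = k * 2 ** (range_bits / 2)  # assumption: k = 1.15 inferred from the log
    return ops / (mkeys_per_s * 1e6)

def fmt(seconds):
    """Format seconds in the d:h:m:s style used by the log."""
    s = int(seconds)
    d, s = divmod(s, 86400)
    h, s = divmod(s, 3600)
    m, s = divmod(s, 60)
    return f"{d}d:{h:02d}h:{m:02d}m:{s:02d}s"

print(fmt(eta_seconds(84, 599)))   # GTX 1660 SUPER speed from the log
print(fmt(eta_seconds(84, 1180)))  # RTX 4060 Laptop GPU speed reported earlier
```

For 599 MKeys/s this gives 0d:02h:20m:43s, the same figure the log shows, which suggests the estimate does not include the DP and GPU overheads mentioned in the output.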
|
|
|
|
|