Bitcoin Forum
May 11, 2024, 09:14:12 AM *
Author Topic: secp256k1 library in pure assembly  (Read 1055 times)
ecdsa123 (OP)
Full Member
***
Offline Offline

Activity: 211
Merit: 105

Dr WHO on disney+


View Profile
November 28, 2022, 12:48:10 PM
Merited by NotATether (10), Welsh (3), vapourminer (2), ABCbits (1)
 #1

Hello all,

I'm looking for a library written in pure assembly (NASM/MASM) for Intel x86-64 (not ARM).

Does anybody know of one?

Donate: bc1q0sezldfgm7rf2r78p5scasrrcfkpzxnrfcvdc6

Subscribe : http://www.youtube.com/@Ecdsa_Solutions
PowerGlove
Hero Member
*****
hacker
Offline Offline

Activity: 510
Merit: 4005



View Profile
November 28, 2022, 01:45:03 PM
 #2

Off-topic (sorry), but just wanted to say that seeing questions like this on Bitcointalk really cheers me up! Smiley

I'm probably reading too much into it (nostalgia will sometimes do that to you), but someone asking for an assembly listing is (no joke) the highlight of my week (I really miss the old days).

Anybody else remember downloading TASM/MASM/NASM over dial-up and working through binders of printed out tutorials? Grin
ecdsa123 (OP)
Full Member
***
Offline Offline

Activity: 211
Merit: 105

Dr WHO on disney+


View Profile
November 28, 2022, 01:55:38 PM
Merited by Welsh (2)
 #3

I have checked and compared SHA-256 in pure asm (rewritten by myself) against C++.
The pure asm version is almost 120x faster than the C++ one (optimised with inline asm).

So you would be shocked to see how many "optimised" implementations of different mathematical problems are written in pure asm in 2022 :)



garlonicon
Hero Member
*****
Offline Offline

Activity: 804
Merit: 1932


View Profile
November 28, 2022, 03:49:41 PM
Last edit: November 28, 2022, 04:12:00 PM by garlonicon
Merited by Welsh (4), vapourminer (1), ABCbits (1)
 #4

Quote
I'm looking for a library written in pure assembly (NASM/MASM) for Intel x86-64 (not ARM).
If you are looking for a library, check headers like "immintrin.h". The keyword is "intrinsics" (see, for example, the x86 intrinsics list): you call a C-like function in your code, and it is converted to the corresponding assembly instruction when compiled. Unless you know assembly very well, it will be easier to write the code in C++ or a similar language and call those functions than to write everything in assembly.

Quote
Anybody else remember downloading TASM/MASM/NASM over dial-up and working through binders of printed out tutorials?
I used FASM, which I downloaded in modern times, a few years ago, when exploring how a BIOS is constructed: http://flatassembler.net/
I also used it some time ago when I tried to write my own operating system from scratch: https://wiki.osdev.org/Main_Page

Quote
So you would be shocked to see how many "optimised" implementations of different mathematical problems are written in pure asm in 2022 :)
Well, assembly can speed things up if you can use the right opcodes and your processor supports them. Otherwise, you may end up with code that is correct but not supported by your processor. So the first thing is to check your hardware and what is available, because some opcodes may trigger an error. Also, there is a good chance that the CPU is not the best tool for the job: if your code is going to be hardware-specific by design, it may pay off to target a GPU or an ASIC instead (but then you would probably need some custom hardware).

Edit: Also note that typical compilers have flags that optimize the output for a given architecture and feature set. Compilers like GCC can emit those instructions themselves, so check them first; the result may be faster than your hand-written assembly, unless you really know that language well.
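
For the "check your hardware first" step, here is a minimal sketch in Python, assuming Linux on x86, where /proc/cpuinfo has a "flags" line. The helper names parse_cpu_flags and missing_features are made up for illustration, and the flag names (bmi2, adx, sha_ni) are the ones the Linux kernel reports, as far as I know.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            # All cores normally report the same flags, so the first hit is enough.
            return set(value.split())
    return set()


def missing_features(cpuinfo_text, required):
    """Return the required flags that the CPU does not report, sorted."""
    return sorted(set(required) - parse_cpu_flags(cpuinfo_text))


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:  # Linux only (assumption)
            text = f.read()
    except OSError:
        text = ""
    # Extensions that hand-written big-number or SHA-256 code might want.
    print("missing:", missing_features(text, ["bmi2", "adx", "sha_ni"]))
```

If anything shows up as missing, code that uses those instructions will not run on that machine, so a fallback path is needed.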
PowerGlove
Hero Member
*****
hacker
Offline Offline

Activity: 510
Merit: 4005



View Profile
November 28, 2022, 05:26:53 PM
 #5

Quote
I used FASM, which I downloaded in modern times, a few years ago, when exploring how a BIOS is constructed: http://flatassembler.net/
Yup. FASM is pretty special and Tomasz Grysztar is an exceptional programmer! Writing a self-hosting assembler puts him in a very small group.

Quote
I also used it some time ago when I tried to write my own operating system from scratch: https://wiki.osdev.org/Main_Page
Nice one! 0xAA55 (and 0x7C00) is burned in my memory, too. Grin
pooya87
Legendary
*
Offline Offline

Activity: 3444
Merit: 10558



View Profile
November 29, 2022, 03:50:38 AM
Merited by NeuroticFish (2), ABCbits (2), NotATether (2), garlonicon (2), PowerGlove (2), vapourminer (1), DdmrDdmr (1)
 #6

Quote
I have checked and compared SHA-256 in pure asm (rewritten by myself) against C++.
The pure asm version is almost 120x faster than the C++ one (optimised with inline asm).
Have you ever published this code, or has anybody else (more specifically, a C++ expert) seen it? A 120x speedup does not sound right to me, unless the C++ code is bad or entirely different (e.g. a simple implementation of SHA-256 vs. one using the Intel SHA intrinsics), or your benchmark is flawed.

NotATether
Legendary
*
Offline Offline

Activity: 1596
Merit: 6735


bitcoincleanup.com / bitmixlist.org


View Profile WWW
November 29, 2022, 06:02:18 PM
 #7

Quote
Hello all,

I'm looking for a library written in pure assembly (NASM/MASM) for Intel x86-64 (not ARM).

Does anybody know of one?

I mean if you challenge me to do it, I might actually come up with an optimized secp256k1 ASM for GNU/Linux one day... who knows  Cheesy

I'm already working on a version that uses GMP which itself is heavily optimized.

ecdsa123 (OP)
Full Member
***
Offline Offline

Activity: 211
Merit: 105

Dr WHO on disney+


View Profile
November 30, 2022, 03:06:51 PM
 #8

As far as I can see, there is no known library for this.

albert0bsd
Hero Member
*****
Offline Offline

Activity: 856
Merit: 662



View Profile WWW
December 01, 2022, 12:56:23 AM
Last edit: December 01, 2022, 01:13:11 AM by albert0bsd
 #9

Quote
I'm looking for a library written in pure assembly (NASM/MASM) for Intel x86-64 (not ARM).

Writing code in ASM is really hard. It has been a long time since I wrote anything in ASM myself; the last ASM code I checked, editing just a few lines, was libaesni for some of my old projects.

If any other developers are interested in writing this code, please let me know.

pooya87
Legendary
*
Offline Offline

Activity: 3444
Merit: 10558



View Profile
December 01, 2022, 03:59:44 AM
 #10

Quote
As far as I can see, there is no known library for this.
Writing an entire ECC library in ASM is impossible: we are talking about thousands of lines of code, which would be a lot more in ASM, and as I said before, the benefits are not as great as you'd think. However, parts of the code can be written in ASM, as libsecp256k1 does for its field element code.

PowerGlove
Hero Member
*****
hacker
Offline Offline

Activity: 510
Merit: 4005



View Profile
December 01, 2022, 09:23:44 AM
Merited by Welsh (4), ABCbits (3), pooya87 (2), vapourminer (1), npuath (1)
 #11

Quote
As far as I can see, there is no known library for this.
Writing an entire ECC library in ASM is impossible, [...]
I don't know about that, man; heavier lifts have been made before. FASM is one example (an assembler written in assembly). If I check how many lines of code that has:

Code:
grep '^$' -rv ./fasm-1.73.30/fasm/source | wc -l

I get 35483. Even an elaborate, fully-featured secp256k1 library in x86-64 assembly would fit (more than) comfortably in 1/4 of that.

If that doesn't convince you (i.e. you feel that a significant fraction of FASM's source code is likely table-generated) then think of feats like the first RollerCoaster Tycoon game: Chris Sawyer wrote that in (99%) assembly. I don't know how familiar you are with gamedev, but something like RollerCoaster Tycoon completely dwarfs a secp256k1 library in terms of complexity.
ecdsa123 (OP)
Full Member
***
Offline Offline

Activity: 211
Merit: 105

Dr WHO on disney+


View Profile
December 01, 2022, 10:44:00 AM
Merited by Welsh (2)
 #12

In secp256k1 we have only five main macros. Four of them are easy to implement:
- add
- subtract
- multiply
- divide

So that part is an "easy" problem.

The main problem in secp256k1 is the fifth one: modulo p.
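
For reference, the secp256k1 prime has a special shape, p = 2^256 - 2^32 - 977, so 2^256 is congruent to 2^32 + 977 modulo p. That lets the high half of a wide product be folded back into the low half with one small multiplication instead of a general division. Below is a minimal sketch of that folding in Python. The function name reduce_mod_p is made up for illustration, and this only models the arithmetic on big integers: it is not constant-time, and it is not how any particular library lays out its limbs.

```python
P = 2**256 - 2**32 - 977   # secp256k1 field prime
C = 2**32 + 977            # equals 2**256 mod P
MASK = 2**256 - 1          # keeps the low 256 bits


def reduce_mod_p(x):
    """Reduce a non-negative integer modulo P by folding the high bits."""
    # Write x = hi * 2**256 + lo. Since 2**256 == C (mod P), x == hi * C + lo.
    while x >> 256:
        x = (x >> 256) * C + (x & MASK)
    # Now x < 2**256 < 2 * P, so at most one subtraction is left.
    if x >= P:
        x -= P
    return x
```

For a 512-bit product the loop runs only a few times, because every pass shrinks the value by roughly 223 bits. In assembly each pass would come down to a handful of mul/add/adc instructions per 64-bit limb.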

albert0bsd
Hero Member
*****
Offline Offline

Activity: 856
Merit: 662



View Profile WWW
December 01, 2022, 12:33:33 PM
Merited by Welsh (4), NotATether (3)
 #13

Quote
The main problem in secp256k1 is modulo p

Actually, we only need a good framework for big-number operations, one that also uses all the capabilities of modern CPUs.

Regards.

NotATether
Legendary
*
Offline Offline

Activity: 1596
Merit: 6735


bitcoincleanup.com / bitmixlist.org


View Profile WWW
December 01, 2022, 06:04:10 PM
 #14

Quote
The main problem in secp256k1 is modulo p

Actually, we only need a good framework for big-number operations, one that also uses all the capabilities of modern CPUs.

Regards.


Literally this.

Have you heard of GAP? It's a computer algebra system (with a kernel written in C) for doing huge integer math, and it knows about group theory and such. A really smart assembly guy recommended it to me a few months ago.

https://www.gap-system.org/Download/

albert0bsd
Hero Member
*****
Offline Offline

Activity: 856
Merit: 662



View Profile WWW
December 02, 2022, 03:45:20 PM
 #15

Also, a good ECC implementation based on GMP: https://github.com/masterzorag/ec_gmp

I used that library for some tools that I made, but it is not optimized for secp256k1. It is also somewhat vulnerable to side-channel attacks, and it is incomplete: it declares the EC.b parameter but never uses it.

A lot of improvements can be made to that implementation.

The fastest secp256k1 implementation that I have ever seen and used is the one inside the Kangaroo tool.

https://github.com/JeanLucPons/Kangaroo/tree/master/SECPK1

It is the same library that I use in my keyhunt code.





NotATether
Legendary
*
Offline Offline

Activity: 1596
Merit: 6735


bitcoincleanup.com / bitmixlist.org


View Profile WWW
December 02, 2022, 05:42:43 PM
 #16

Quote
The fastest secp256k1 implementation that I have ever seen and used is the one inside the Kangaroo tool.

https://github.com/JeanLucPons/Kangaroo/tree/master/SECPK1

I wonder if there is a way to optimize it further, though. Do you know whether it makes use of SSE? But even more important than that, maybe there's a series of assembly instructions that makes repeated calls run faster.

But since I use the secp256k1 curve only for testing and research, I do not care much about any possible vulnerabilities and attacks.

The safest (not necessarily the fastest) secp256k1 is the one used in Bitcoin Core. But I don't use it because I keep getting wrong answers when I do arithmetic. Maybe the privkey bytes are not being filled correctly or something.
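
On the privkey bytes: as far as I know, libsecp256k1 takes a secret key as exactly 32 bytes in big-endian order, so a little-endian or unpadded buffer would give "wrong" results without any error. Whether that is the cause here is only a guess. A minimal sketch of the conversion in Python, with seckey_bytes as a made-up helper name:

```python
# secp256k1 group order n, written in 32-bit chunks to make it easy to check.
N = int(
    "FFFFFFFF" "FFFFFFFF" "FFFFFFFF" "FFFFFFFE"
    "BAAEDCE6" "AF48A03B" "BFD25E8C" "D0364141",
    16,
)


def seckey_bytes(k):
    """Serialize private key k as 32 big-endian bytes, left-padded with zeros."""
    if not 0 < k < N:
        raise ValueError("private key must be in the range 1..N-1")
    return k.to_bytes(32, "big")


if __name__ == "__main__":
    # Small keys show the padding clearly: 31 zero bytes, then 0x01.
    print(seckey_bytes(1).hex())
```

Comparing the hex dump of the buffer actually passed to the library against this output is a quick way to rule byte order in or out.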

albert0bsd
Hero Member
*****
Offline Offline

Activity: 856
Merit: 662



View Profile WWW
December 02, 2022, 07:11:36 PM
 #17

And EC.b can be used here:

Thank you for pointing out where b is used. I rechecked the code and that function is not there; anyway, it is easy to add.


Quote
Do you know whether it makes use of SSE? But even more important than that, maybe there's a series of assembly instructions that makes repeated calls run faster.

SSE is used in some of JLP's other code, to generate 4 SHA-256 and 4 RIPEMD-160 hashes of fixed data at the same time.

That is only useful when you are working with addresses or RIPEMD-160 hashes; those SSE functions make that part of the code some 2 or 3 times faster.

Code here:

https://github.com/JeanLucPons/VanitySearch/blob/master/SECP256K1.cpp#L380

The functions are inside the hash folder.

I also use those in keyhunt for the address and rmd160 search.

As for whether that code can be optimized further: I have already checked it many times, and it is already optimized.

Regards

ecdsa123 (OP)
Full Member
***
Offline Offline

Activity: 211
Merit: 105

Dr WHO on disney+


View Profile
December 03, 2022, 11:58:35 AM
 #18

But it is still slower than pure asm.

At the moment I have implemented 256/512-bit addition in pure asm.

It is faster than C++

(without modulo)



albert0bsd
Hero Member
*****
Offline Offline

Activity: 856
Merit: 662



View Profile WWW
December 03, 2022, 02:38:24 PM
Merited by NotATether (5), ABCbits (2), vapourminer (1)
 #19

OK, let's try to write it in parts using OpenAI.

Code:
Write in assembly a code to add two numbers of 512 bits

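Code:
# Define the two 512-bit numbers to be added (eight 64-bit words each, least significant first)
.data
num1: .quad 0x1234567890ABCDEF, 0x1234567890ABCDEF, 0x1234567890ABCDEF, 0x1234567890ABCDEF
      .quad 0x1234567890ABCDEF, 0x1234567890ABCDEF, 0x1234567890ABCDEF, 0x1234567890ABCDEF
num2: .quad 0x0987654321FEDCBA, 0x0987654321FEDCBA, 0x0987654321FEDCBA, 0x0987654321FEDCBA
      .quad 0x0987654321FEDCBA, 0x0987654321FEDCBA, 0x0987654321FEDCBA, 0x0987654321FEDCBA

# Define the 512-bit result
result: .quad 0, 0, 0, 0, 0, 0, 0, 0

# Main program
.text
.global main
main:

  # Load the addresses of the operands and the result
  lea num1(%rip), %rsi
  lea num2(%rip), %rdi
  lea result(%rip), %r8

  # Eight words to add; xor zeroes the index and also clears the carry flag
  mov $8, %ecx
  xor %eax, %eax
  add_loop:

    # Add the corresponding 64-bit words plus the carry from the previous word
    movq (%rsi,%rax,8), %rdx
    adcq (%rdi,%rax,8), %rdx

    # Store the result in the corresponding 64-bit word in the result
    movq %rdx, (%r8,%rax,8)

    # inc and dec leave the carry flag untouched, so the carry survives the loop
    inc %rax
    dec %ecx
    jnz add_loop

  # End the program (exit status 0); the carry out of the top word is discarded
  xor %eax, %eax
  ret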
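
The part of multi-word addition that is easy to get wrong is the carry between 64-bit words. Here is a minimal model of it in Python, only to make the data flow explicit. The names to_limbs, from_limbs and add_limbs are made up for illustration; each loop pass corresponds to one add-with-carry step on real hardware.

```python
LIMB_BITS = 64
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(x, n):
    """Split x into n 64-bit limbs, least significant first."""
    return [(x >> (LIMB_BITS * i)) & LIMB_MASK for i in range(n)]


def from_limbs(limbs):
    """Rebuild the integer from least-significant-first limbs."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def add_limbs(a, b):
    """Add two equal-length limb lists; return (sum_limbs, carry_out)."""
    out, carry = [], 0
    for x, y in zip(a, b):
        t = x + y + carry            # at most 65 bits wide
        out.append(t & LIMB_MASK)    # low 64 bits stay in this limb
        carry = t >> LIMB_BITS       # bit 64 moves on to the next limb
    return out, carry


if __name__ == "__main__":
    # 512-bit operands use 8 limbs; all-ones plus one wraps to zero with carry 1.
    limbs, carry = add_limbs(to_limbs(2**512 - 1, 8), to_limbs(1, 8))
    print(limbs, carry)
```

Without the carry term, each 64-bit word would be added in isolation and any overflow from one word would be silently lost.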

NotATether
Legendary
*
Offline Offline

Activity: 1596
Merit: 6735


bitcoincleanup.com / bitmixlist.org


View Profile WWW
December 03, 2022, 04:25:27 PM
 #20

Quote
OK, let's try to write it in parts using OpenAI. [...]

Damn! I didn't know OpenAI could write code.

I used Dall-E to generate images before, but I wasn't aware of anything like this. Yeah, I've used Copilot, but I haven't generated any assembly with it.

This stuff could be very useful if it indeed works (AI generated code is sometimes buggy). It may not know how to generate a secp256k1 operation in ASM yet, but I think we'll get there soon (plus, ARM support!).
