ecdsa123 (OP)
Full Member
Offline
Activity: 207
Merit: 105
Dr WHO on disney+
|
Hello All
I'm looking for a library written in pure NASM/MASM assembly for x86-64 (Intel, not ARM).
Does anybody know of one?
|
|
|
|
PowerGlove
|
|
November 28, 2022, 01:45:03 PM |
|
Off-topic (sorry), but just wanted to say that seeing questions like this on Bitcointalk really cheers me up! I'm probably reading too much into it (nostalgia will sometimes do that to you), but someone asking for an assembly listing is (no joke) the highlight of my week (I really miss the old days). Anybody else remember downloading TASM/MASM/NASM over dial-up and working through binders of printed out tutorials?
|
|
|
|
ecdsa123 (OP)
|
|
November 28, 2022, 01:55:38 PM |
|
I have checked and compared SHA-256 in pure asm (rewritten by myself) against C++. The pure asm version is almost 120x faster than the C++ one (inline asm, optimised).
So you would be shocked if you checked how many "optimised" implementations are written in pure asm for different mathematical problems in 2022 :)
|
|
|
|
garlonicon
|
|
November 28, 2022, 03:49:41 PM Last edit: November 28, 2022, 04:12:00 PM by garlonicon Merited by Welsh (4), vapourminer (1), ABCbits (1) |
|
I'm looking for a library written in pure NASM/MASM assembly for x86-64 (Intel, not ARM).
If you are looking for a library, then check headers like "immintrin.h". The keyword is "intrinsics"; for example, see the x86 intrinsics list. That means you can just call a C-like function in your code, and it will be converted to the corresponding assembly instruction when compiled. It will be easier to write some code in C++ or a similar language and call some functions than to write everything in assembly, unless you know assembly very well.
Anybody else remember downloading TASM/MASM/NASM over dial-up and working through binders of printed out tutorials?
I used FASM and downloaded it in modern times, a few years ago, when exploring how the BIOS is constructed: http://flatassembler.net/
I also used it some time ago when I tried to write my own operating system from scratch: https://wiki.osdev.org/Main_Page
So you would be shocked if you checked how many "optimised" implementations are written in pure asm for different mathematical problems in 2022 :)
Well, assembly can speed things up if you can use the right opcodes, and if your processor supports them. Otherwise, you may end up with code that is correct but not supported by your processor. So the first thing is to check your hardware and what is available, because some opcodes may trigger an error. Also, there is a high chance that using your CPU is not the best way of solving the problem, and if your code will be hardware-specific by design, then it may be profitable to prepare it for some GPU or ASIC (but then you would probably need some custom hardware).
Edit: Also note that typical compilers have flags that can be used to optimize code for an architecture with given features. Compilers like GCC can produce those instructions, so check them first; the result may be faster than your code in assembly, unless you really know that language well.
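The "check your hardware first" step can be sketched as follows. This is only an illustration in Python, assuming a Linux machine that exposes feature flags in /proc/cpuinfo; the helper names parse_cpu_flags and supported are hypothetical, and the flag names in the example (sse4_1, avx2, bmi2, adx, sha_ni) are the spellings the Linux kernel reports on x86. On other systems the function simply reports every flag as unavailable.

```python
from pathlib import Path


def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supported(wanted, cpuinfo_path="/proc/cpuinfo"):
    """Map each wanted flag name to True/False for this machine (Linux only)."""
    path = Path(cpuinfo_path)
    flags = parse_cpu_flags(path.read_text()) if path.exists() else set()
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    # Flags relevant to big-number and hashing code on x86-64.
    print(supported(["sse4_1", "avx2", "bmi2", "adx", "sha_ni"]))
```

A build script could use a check like this to decide which code path to compile or load, instead of discovering an illegal-instruction error at run time.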
|
|
|
|
PowerGlove
|
|
November 28, 2022, 05:26:53 PM |
|
I used FASM and downloaded it in modern times, a few years ago, when exploring how the BIOS is constructed: http://flatassembler.net/
Yup. FASM is pretty special and Tomasz Grysztar is an exceptional programmer! Writing a self-hosting assembler puts him in a very small group. Nice one! 0xAA55 (and 0x7C00) is burned in my memory, too.
|
|
|
|
pooya87
Legendary
Offline
Activity: 3542
Merit: 10750
|
|
November 29, 2022, 03:50:38 AM |
|
I have checked and compared SHA-256 in pure asm (rewritten by myself) against C++. The pure asm version is almost 120x faster than the C++ one (inline asm, optimised).
Have you ever published this code, or has anybody else (more specifically, a C++ expert) seen it? A 120x speedup does not sound right to me unless the C++ code is bad or entirely different (e.g. a simple implementation of SHA-256 vs. one using Intel SHA intrinsics), or your benchmark is flawed.
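One way to rule out a flawed benchmark is to check that the implementation under test produces the same output as a trusted reference before timing it, and to take the best of several repeated runs. A minimal Python sketch of that idea, using hashlib as the reference; the function names reference_sha256 and check_and_time are hypothetical, and the candidate passed in would be whatever implementation is being measured:

```python
import hashlib
import timeit


def reference_sha256(data):
    """Trusted reference implementation from the standard library."""
    return hashlib.sha256(data).digest()


def check_and_time(candidate, data, repeats=5, calls=1000):
    """Verify candidate against the reference, then return best seconds per call."""
    if candidate(data) != reference_sha256(data):
        raise ValueError("candidate output differs from the reference")
    timer = timeit.Timer(lambda: candidate(data))
    # The minimum over several repeats is the least noisy estimate.
    return min(timer.repeat(repeat=repeats, number=calls)) / calls


if __name__ == "__main__":
    message = b"\x00" * 64
    per_call = check_and_time(reference_sha256, message)
    print(f"{per_call * 1e9:.0f} ns per hash")
```

Both implementations should be timed on the same input, on the same machine, and with comparable build settings, otherwise the ratio says little.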
|
|
|
|
NotATether
Legendary
Offline
Activity: 1694
Merit: 7133
In memory of o_e_l_e_o
|
|
November 29, 2022, 06:02:18 PM |
|
Hello All
I'm looking for a library written in pure NASM/MASM assembly for x86-64 (Intel, not ARM).
Does anybody know of one?
I mean, if you challenge me to do it, I might actually come up with an optimized secp256k1 ASM for GNU/Linux one day... who knows. I'm already working on a version that uses GMP, which itself is heavily optimized.
|
|
|
|
ecdsa123 (OP)
|
|
November 30, 2022, 03:06:51 PM |
|
as I see there is no known library for this
|
|
|
|
albert0bsd
|
|
December 01, 2022, 12:56:23 AM Last edit: December 01, 2022, 01:13:11 AM by albert0bsd |
|
I'm looking for a library written in pure NASM/MASM assembly for x86-64 (Intel, not ARM).
Writing code in ASM is really hard. It has been a long time since I wrote anything in ASM by myself; the last ASM code that I checked and edited (just a few lines) was libaesni, for some of my old projects. If there are any other developers interested in writing this code, please let me know.
|
|
|
|
pooya87
|
|
December 01, 2022, 03:59:44 AM |
|
as I see there is no known library for this
Writing an entire ECC library in ASM is impossible, since we are talking about thousands of lines of code that would be a lot more in ASM, and as I said before, the benefit is not as great as you'd think. However, parts of the code can be written in ASM, as libsecp256k1 does for its field element code.
|
|
|
|
PowerGlove
|
as I see there is no known library for this
Writing an entire ECC library in ASM is impossible, [...]
I don't know about that, man; heavier lifts have been made before. FASM is one example (an assembler written in assembly). If I check how many lines of code that has:
grep '^$' -rv ./fasm-1.73.30/fasm/source | wc -l
I get 35483. Even an elaborate, fully-featured secp256k1 library in x86-64 assembly would fit (more than) comfortably in 1/4 of that. If that doesn't convince you (i.e. you feel that a significant fraction of FASM's source code is likely table-generated) then think of feats like the first RollerCoaster Tycoon game: Chris Sawyer wrote that in (99%) assembly. I don't know how familiar you are with gamedev, but something like RollerCoaster Tycoon completely dwarfs a secp256k1 library in terms of complexity.
|
|
|
|
ecdsa123 (OP)
|
|
December 01, 2022, 10:44:00 AM |
|
In secp256k1 we have only five main macros:
- add - easy to implement
- subtract - easy to implement
- multiply - easy to implement
- divide - easy to implement
and these are the "easy" problems.
The main problem in secp256k1 is modulo p.
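For illustration, the reduction modulo p can exploit the special form of the secp256k1 prime, p = 2^256 - 2^32 - 977: since 2^256 is congruent to 2^32 + 977 modulo p, the high half of a 512-bit product can be folded into the low half with one small multiplication. The following is a minimal Python sketch of that folding, not code from any library mentioned in this thread; the names reduce_mod_p, mul_mod_p and inv_mod_p are hypothetical, and division is done through Fermat's little theorem as one possible choice.

```python
P = 2**256 - 2**32 - 977   # secp256k1 field prime
C = 2**32 + 977            # 2**256 is congruent to C modulo P
MASK = 2**256 - 1


def reduce_mod_p(x):
    """Reduce a non-negative integer (e.g. a 512-bit product) modulo P."""
    # Fold the part above bit 256 back in: hi * 2**256 == hi * C (mod P).
    while x >> 256:
        x = (x & MASK) + (x >> 256) * C
    # Now x < 2**256 < 2 * P, so one conditional subtraction is enough.
    if x >= P:
        x -= P
    return x


def mul_mod_p(a, b):
    """Field multiplication: full product followed by the folding reduction."""
    return reduce_mod_p(a * b)


def inv_mod_p(a):
    """Field inverse via Fermat's little theorem (a**(P-2) mod P), a != 0."""
    return pow(a, P - 2, P)


if __name__ == "__main__":
    a = 0x1234567890ABCDEF
    print(mul_mod_p(a, inv_mod_p(a)))
```

An assembly version would do the same folding on 64-bit words, where multiplying by C needs only a few multiply and add-with-carry instructions.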
|
|
|
|
albert0bsd
|
The main problem in secp256k1 is modulo p
Actually, we only need a good framework for big-number operations, one that also uses all the capabilities of modern CPUs. Regards.
|
|
|
|
NotATether
|
|
December 01, 2022, 06:04:10 PM |
|
The main problem in secp256k1 is modulo p
Actually, we only need a good framework for big-number operations, one that also uses all the capabilities of modern CPUs. Regards.
Literally this. Have you heard of GAP? It's a C language framework for doing huge integer math and knows about group theory and such. A really smart assembly guy recommended it to me a few months ago. https://www.gap-system.org/Download/
|
|
|
|
albert0bsd
|
|
December 02, 2022, 03:45:20 PM |
|
I used that library for some tools that I made, but it is not optimized for secp256k1. It is also somewhat vulnerable to some side-channel attacks, and incomplete, because it declares the EC.b parameter but never uses it. A lot of improvements can be made to that implementation. The fastest secp256k1 implementation that I have ever seen and used is the one inside the Kangaroo tool: https://github.com/JeanLucPons/Kangaroo/tree/master/SECPK1
It is the same library that I use in my keyhunt code.
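As a point of reference for the unused parameter: assuming EC.b stands for the constant term b of the curve equation y^2 = x^3 + b (b = 7 for secp256k1), its most common use is a check that a point really lies on the curve. A minimal Python sketch under that assumption; the function name is_on_curve is hypothetical and is not taken from the library discussed above.

```python
P = 2**256 - 2**32 - 977   # secp256k1 field prime
B = 7                      # curve constant b in y^2 = x^3 + b

# Standard secp256k1 generator point.
GX = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
GY = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8


def is_on_curve(x, y):
    """Return True if (x, y) satisfies y^2 = x^3 + B modulo P."""
    return (y * y - (x * x * x + B)) % P == 0


if __name__ == "__main__":
    print(is_on_curve(GX, GY))
```

Point addition and doubling on this curve do not need b at all, which may be why a library can declare it and never read it; validating untrusted input points is where it matters.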
|
|
|
|
NotATether
|
|
December 02, 2022, 05:42:43 PM |
|
I wonder if there is a way to optimize it further, though? Do you know whether it's making use of SSE? But even more important than that, maybe there's a series of assembly instructions you can run to make repeated calls faster.
But since I use the secp256k1 curve only for testing and research, I do not care much about any possible vulnerabilities and attacks.
The safest (not necessarily the fastest) secp256k1 is the one used in Bitcoin Core. But I don't use it because I keep getting wrong answers when I do arithmetic. Maybe the privkey bytes are not being filled correctly or something.
|
|
|
|
albert0bsd
|
|
December 02, 2022, 07:11:36 PM |
|
And EC.b can be used here:
Thank you for pointing out where b is used. I rechecked the code and that function is not there; anyway, it is easy to add.
Do you know whether it's making use of SSE? But even more important than that, maybe there's a series of assembly instructions you can run to make repeated calls faster.
SSE is used in some other code by JLP, to generate 4 SHA-256 and 4 RMD160 hashes of fixed data at the same time. That is only useful when you are working with addresses or RMD160 hashes; those SSE functions make that part of the code some 2 or 3 times faster. Code here: https://github.com/JeanLucPons/VanitySearch/blob/master/SECP256K1.cpp#L380
The functions are inside the hash folder. I also use them in keyhunt for address and RMD160 search.
As for whether that code can be optimized: I have already checked it many times, and it is already optimized. Regards
|
|
|
|
ecdsa123 (OP)
|
|
December 03, 2022, 11:58:35 AM |
|
But it is still slower than pure asm.
At the moment I have implemented 256/512-bit addition in pure asm (without modulo), and it is faster than C++.
|
|
|
|
albert0bsd
|
|
December 03, 2022, 02:38:24 PM |
|
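OK, let's try to write it in parts using OpenAI.
Write in assembly a code to add two numbers of 512 bits
# Define the two 512-bit numbers to be added
# (eight 64-bit words each, least significant word first)
        .data
num1:   .quad 0x1234567890ABCDEF, 0x1234567890ABCDEF, 0x1234567890ABCDEF, 0x1234567890ABCDEF
        .quad 0x1234567890ABCDEF, 0x1234567890ABCDEF, 0x1234567890ABCDEF, 0x1234567890ABCDEF
num2:   .quad 0x0987654321FEDCBA, 0x0987654321FEDCBA, 0x0987654321FEDCBA, 0x0987654321FEDCBA
        .quad 0x0987654321FEDCBA, 0x0987654321FEDCBA, 0x0987654321FEDCBA, 0x0987654321FEDCBA
# Define the 512-bit result
result: .quad 0, 0, 0, 0, 0, 0, 0, 0
# Main program (GNU as, AT&T syntax, x86-64)
        .text
        .global main
main:
        lea     num1(%rip), %rsi        # address of the first operand
        lea     num2(%rip), %rdi        # address of the second operand
        lea     result(%rip), %r8       # address of the result
        xor     %eax, %eax              # word index = 0; xor also clears the carry flag
        mov     $8, %ecx                # 8 words x 64 bits = 512 bits
# Loop through each of the 64-bit words in the numbers
1:
        mov     (%rsi,%rax,8), %rdx     # load a word of num1
        adc     (%rdi,%rax,8), %rdx     # add the word of num2 plus the carry from the previous word
        mov     %rdx, (%r8,%rax,8)      # store the word of the result
        inc     %rax                    # inc and dec leave the carry flag untouched
        dec     %ecx
        jnz     1b
# End the program (a carry out of the top word is discarded)
        xor     %eax, %eax              # return 0
        ret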
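Multi-word addition only gives the right answer if the carry out of each 64-bit word is fed into the next one, which is what the add-with-carry instruction does in hardware. A minimal Python sketch of that logic, assuming numbers are stored as little-endian lists of 64-bit words; the helper names add_words and to_int are hypothetical, and Python's built-in big integers serve as the check:

```python
WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1


def add_words(a, b):
    """Add two equal-length little-endian word lists; return (words, carry_out)."""
    if len(a) != len(b):
        raise ValueError("operands must have the same number of words")
    result = []
    carry = 0
    for x, y in zip(a, b):
        total = x + y + carry             # what adc computes for one word
        result.append(total & WORD_MASK)  # low 64 bits stay in this word
        carry = total >> WORD_BITS        # 0 or 1, passed to the next word
    return result, carry


def to_int(words):
    """Convert a little-endian word list to a Python integer."""
    return sum(w << (WORD_BITS * i) for i, w in enumerate(words))


if __name__ == "__main__":
    num1 = [0x1234567890ABCDEF] * 8       # 8 words = 512 bits
    num2 = [0x0987654321FEDCBA] * 8
    words, carry = add_words(num1, num2)
    assert to_int(words) + (carry << 512) == to_int(num1) + to_int(num2)
    print([hex(w) for w in words], carry)
```

The same function covers the 256-bit case by passing four words instead of eight.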
|
|
|
|
NotATether
|
|
December 03, 2022, 04:25:27 PM |
|
OK, let's try to write it in parts using OpenAI. [...]
Damn! I didn't know OpenAI could write code. I used Dall-E to generate images before, but I wasn't aware of anything like this. Yeah, I've used Copilot, but I haven't generated any assembly with it. This stuff could be very useful if it indeed works (AI generated code is sometimes buggy). It may not know how to generate a secp256k1 operation in ASM yet, but I think we'll get there soon (plus, ARM support!).
|
|
|
|
|