secp256k1 library in pure assembly

NotATether

Legendary

Offline

Activity: 1596
Merit: 6726

bitcoincleanup.com / bitmixlist.org

Re: secp256k1 library in pure assembly

December 27, 2022, 05:43:41 PM

#41

Quote from: arulbero on December 26, 2022, 06:48:47 PM

You have to distinguish between:

1) field operations (mod p, space of coordinates x and y):

https://github.com/bitcoin-core/secp256k1/tree/master/src/ all files with name : field*

and

2) scalar operations (mod n, space of private keys):

https://github.com/bitcoin-core/secp256k1/tree/master/src/ all files with name : scalar*

...

Alright.

There's one other thing to address: in the secp256k1_fe_mul (or something like that) function, all but the last limb are multiplied by the constant R. This gives a result different from the one I calculated in a Python example. So inside the fe_mul function, I need to modify it to avoid multiplying the values in the result (stack) by R, and send that multiplication to a temporary instead.


Pieter Wuille

Legendary

Offline

Activity: 1072
Merit: 1174

Re: secp256k1 library in pure assembly

December 28, 2022, 03:19:38 PM

#42

Quote from: NotATether on December 27, 2022, 05:43:41 PM

Alright.

There's one other thing to address: in the secp256k1_fe_mul (or something like that) function, all but the last limb are multiplied by the constant R. This gives a result different from the one I calculated in a Python example. So inside the fe_mul function, I need to modify it to avoid multiplying the values in the result (stack) by R, and send that multiplication to a temporary instead.

That makes no sense; fe_mul just multiplies two field elements modulo p. That R constant is an implementation detail that even differs between 32-bit and 64-bit platforms. It's not actually multiplying the result by this value.

Note that field elements are internally stored in a denormalized representation where the limbs can overflow. If you want to convert it to a portable format, use fe_get_b32.
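
To illustrate the point about overflowing limbs, here is a minimal Python sketch. It assumes the 5x52-bit limb layout used on 64-bit platforms; the helper names (limbs_to_int, normalize, to_b32) are hypothetical and are not libsecp256k1 functions. It shows that two different limb vectors can stand for the same field element, and that a value like 0x1000003D10 is simply 2^260 mod p, i.e. the kind of constant a reduction step over five 52-bit limbs would use to fold overflow back in.

```python
P = 2**256 - 2**32 - 977          # secp256k1 field prime
LIMB_BITS = 52                    # assumed limb width (5x52 layout)
LIMB_MASK = (1 << LIMB_BITS) - 1

def limbs_to_int(limbs):
    """Value represented by a limb vector; limbs may exceed 52 bits."""
    return sum(l << (LIMB_BITS * i) for i, l in enumerate(limbs))

def normalize(limbs):
    """Hypothetical helper: reduce mod p and split into five limbs below 2^52."""
    v = limbs_to_int(limbs) % P
    return [(v >> (LIMB_BITS * i)) & LIMB_MASK for i in range(5)]

def to_b32(limbs):
    """Portable form: 32 big-endian bytes of the fully reduced value."""
    return (limbs_to_int(limbs) % P).to_bytes(32, "big")

# The same field element written two ways: b has an overflowing limb 0.
a = [5, 1, 0, 0, 0]
b = [5 + (1 << LIMB_BITS), 0, 0, 0, 0]

assert a != b                    # the raw limbs differ
assert to_b32(a) == to_b32(b)    # the portable encodings agree
assert normalize(b) == a

# 2^260 sits just above five 52-bit limbs; mod p it folds back to a small constant.
assert pow(2, 260, P) == 0x1000003D10
```

Comparing raw limbs against a Python big-integer result will therefore mismatch even when the arithmetic is right; the comparison only makes sense on the normalized, serialized value.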

I do Bitcoin stuff.

NotATether

Legendary

Offline

Activity: 1596
Merit: 6726

bitcoincleanup.com / bitmixlist.org

Re: secp256k1 library in pure assembly

December 28, 2022, 04:41:36 PM

#43

Quote from: Pieter Wuille on December 28, 2022, 03:19:38 PM

Note that field elements are internally stored in a denormalized representation where the limbs can overflow. If you want to convert it to a portable format, use fe_get_b32.

That must be why I was getting different results while testing. I'll check out this function and run my C++ and Python mod-mul programs again. It will be interesting to see the results of this.


esa_a

Newbie

Offline

Activity: 1
Merit: 0

Re: secp256k1 library in pure assembly

February 09, 2023, 08:55:12 PM

#44

Quote from: AlexanderCurl on December 22, 2022, 05:17:29 AM

And moreover VanitySearch is only fast at "Point pub = secp256k1->ComputePublicKey(&privKey);" since it generates a "Point GTable[256*32];"
to be used in "Q = Add2(Q, GTable[256 * i + (b-1)])", with Jacobian coordinates as the calculation scheme.
If you use Add2 to add one million points in a sequence, it is far faster than classic point addition with inversion.
But to use any Jacobian point further after the calculation, you first need to call "Q.Reduce();", which costs the same as an inversion.
In that situation the classic scheme using GMP becomes faster, since you just add the point and can use it immediately without any Reduce().
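
The cost being described can be sketched in Python. This assumes standard Jacobian coordinates, where the affine point is (X/Z^2, Y/Z^3); the actual Reduce() in VanitySearch may use a different projective convention, and the function names below are hypothetical. The point is only that converting back to affine needs one modular inversion, which is the expensive step the additions were avoiding.

```python
P = 2**256 - 2**32 - 977   # secp256k1 field prime

# secp256k1 generator in affine coordinates
GX = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
GY = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8

def to_jacobian(x, y, z):
    """Lift affine (x, y) to Jacobian (X, Y, Z) for a chosen nonzero z."""
    return (x * z * z % P, y * z * z * z % P, z % P)

def reduce_to_affine(X, Y, Z):
    """Back to affine: one modular inversion plus a few multiplications."""
    z_inv = pow(Z, -1, P)            # modular inverse, Python 3.8+
    z_inv2 = z_inv * z_inv % P
    return (X * z_inv2 % P, Y * z_inv2 * z_inv % P)

# Any nonzero Z gives another representation of the same point.
X, Y, Z = to_jacobian(GX, GY, 0x1234567)
assert (X, Y) != (GX, GY)
assert reduce_to_affine(X, Y, Z) == (GX, GY)
```

If only the final point of a long chain of additions is needed in affine form, the inversion is paid once; if every intermediate point is needed, it is paid every time, which is the situation the quote describes.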

Hey, did you try piggypiggy's implementation? Was it any faster?

alexeyneu

Member

Offline

Activity: 312
Merit: 30

Re: secp256k1 library in pure assembly

February 09, 2023, 11:03:30 PM
Last edit: February 10, 2023, 12:02:10 AM by alexeyneu

Merited by vapourminer (1), Welsh (1)

#45

Quote from: albert0bsd on December 03, 2022, 02:38:24 PM

Ok, let's try to write it in parts using OpenAI

Code:

Write in assembly a code to add two numbers of 512 bits

Code:

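# Define the two 512-bit numbers to be added (eight 64-bit words each, least significant word first)
.data
num1: .quad 0x1234567890ABCDEF, 0x1234567890ABCDEF, 0x1234567890ABCDEF, 0x1234567890ABCDEF
      .quad 0x1234567890ABCDEF, 0x1234567890ABCDEF, 0x1234567890ABCDEF, 0x1234567890ABCDEF
num2: .quad 0x0987654321FEDCBA, 0x0987654321FEDCBA, 0x0987654321FEDCBA, 0x0987654321FEDCBA
      .quad 0x0987654321FEDCBA, 0x0987654321FEDCBA, 0x0987654321FEDCBA, 0x0987654321FEDCBA

# Define the 512-bit result (a carry out of the top word is discarded)
result: .quad 0, 0, 0, 0, 0, 0, 0, 0

# Main program
.text
.global main
main:

  # Load the array addresses RIP-relatively so the code also links as a PIE
  lea num1(%rip), %rsi
  lea num2(%rip), %rdi
  lea result(%rip), %r8

  # Word index = 0; xor also clears the carry flag
  xor %eax, %eax
  # Number of 64-bit words left to add
  mov $8, %ecx
  add_loop:

    # Add the corresponding 64-bit words plus the carry from the previous word
    movq (%rsi,%rax,8), %rdx
    adcq (%rdi,%rax,8), %rdx

    # Store the sum in the corresponding 64-bit word of the result
    movq %rdx, (%r8,%rax,8)

    # inc and dec leave the carry flag untouched, so the carry chain survives the loop
    inc %rax
    dec %ecx
    jnz add_loop

  # Return 0 from main
  xor %eax, %eax
  ret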
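
A multi-word addition has to pass the carry from each 64-bit word to the next. As a cross-check for any such routine, here is a minimal Python sketch of word-by-word addition with carry. It assumes the words are stored least significant first, and the names add_words and words_to_int are hypothetical.

```python
MASK64 = (1 << 64) - 1

def add_words(a, b):
    """Add two equal-length lists of 64-bit words (least significant first).

    Returns (sum_words, carry_out).
    """
    assert len(a) == len(b)
    out, carry = [], 0
    for x, y in zip(a, b):
        s = x + y + carry          # at most 65 bits
        out.append(s & MASK64)     # low 64 bits stay in this word
        carry = s >> 64            # the extra bit moves to the next word
    return out, carry

def words_to_int(words):
    """Combine little-endian 64-bit words into one Python integer."""
    return sum(w << (64 * i) for i, w in enumerate(words))

# Eight words per operand give 512 bits; the word values are the ones from the post.
num1 = [0x1234567890ABCDEF] * 8
num2 = [0x0987654321FEDCBA] * 8
total, carry_out = add_words(num1, num2)

# The word-wise result must agree with plain big-integer addition.
assert words_to_int(total) + (carry_out << 512) == words_to_int(num1) + words_to_int(num2)
```

With these particular sample values no word overflows, so a version without carry handling would happen to give the same answer; an input such as 0xFFFFFFFFFFFFFFFF in the lowest word exposes the difference.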

But dude, it's sandbox asm,

something like this: https://stackoverflow.com/questions/64726805/understanding-a-basic-assembly-code-with-lea-instruction#comment114452574_64726805

It's neither an executable nor a ring-zero OS component. It's nothing really. Why use asm then?

NotATether

Legendary

Offline

Activity: 1596
Merit: 6726

bitcoincleanup.com / bitmixlist.org

Re: secp256k1 library in pure assembly

February 14, 2023, 08:15:08 AM

Merited by vapourminer (1)

#46

Quote from: alexeyneu on February 09, 2023, 11:03:30 PM

But dude, it's sandbox asm,

something like this: https://stackoverflow.com/questions/64726805/understanding-a-basic-assembly-code-with-lea-instruction#comment114452574_64726805

It's neither an executable nor a ring-zero OS component. It's nothing really. Why use asm then?

There is really no point in writing assembly by hand unless it uses instructions that are faster than the ones gcc compiles down to.

For example, there are a bunch of MOVs, CMPs, JMPs, and ADD/XOR/LEA instructions when you compile some C file down to assembly. There are only two ways to make this faster:

1 - you can somehow rearrange the assembly to remove excess MOVs, so that it uses as few instructions as possible (will not result in a large performance improvement)
2 - your use case can be accelerated by SIMD instructions (will result in much faster performance).


alexeyneu

Member

Offline

Activity: 312
Merit: 30

Re: secp256k1 library in pure assembly

February 14, 2023, 07:13:34 PM
Last edit: February 14, 2023, 07:42:59 PM by alexeyneu

#47

Quote from: NotATether on February 14, 2023, 08:15:08 AM

Quote from: alexeyneu on February 09, 2023, 11:03:30 PM

But dude, it's sandbox asm,

something like this: https://stackoverflow.com/questions/64726805/understanding-a-basic-assembly-code-with-lea-instruction#comment114452574_64726805

It's neither an executable nor a ring-zero OS component. It's nothing really. Why use asm then?

There is really no point in writing assembly by hand unless it uses instructions that are faster than the ones gcc compiles down to.

For example, there are a bunch of MOVs, CMPs, JMPs, and ADD/XOR/LEA instructions when you compile some C file down to assembly. There are only two ways to make this faster:

1 - you can somehow rearrange the assembly to remove excess MOVs, so that it uses as few instructions as possible (will not result in a large performance improvement)
2 - your use case can be accelerated by SIMD instructions (will result in much faster performance).

But you can do SIMD in C, with intrinsics like _mm_setzero_si128(). In a protected-mode OS your executable is just a task with little to no hardware access, so asm doesn't make much sense. Removing some MOVs from the executable won't increase performance.
As for the code (or something like it) posted here: it's styled like MS-DOS real-mode asm, and it's unrelated to the binary that would actually be produced.