Bitcoin Forum
May 03, 2024, 06:35:41 PM
News: Latest Bitcoin Core release: 27.0 [Torrent]
 
Topic: secp256k1 library in pure assembly (Read 1054 times)
NotATether
Legendary

Activity: 1596
Merit: 6726


bitcoincleanup.com / bitmixlist.org


December 27, 2022, 05:43:41 PM
 #41

Quote
You have to distinguish between:

1) field operations (mod p, space of coordinates x and y):

https://github.com/bitcoin-core/secp256k1/tree/master/src/   all files with name : field*

and

2) scalar operations (mod n, space of private keys):

https://github.com/bitcoin-core/secp256k1/tree/master/src/  all files with name : scalar*


...
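To make the field/scalar distinction concrete, here is a minimal Python sketch (not library code) using the standard secp256k1 constants: the field prime p, in which the x and y coordinates live, and the group order n, in which private keys live. The helper names `field_mul` and `scalar_mul` are hypothetical. The only point is that the same two integers reduce to different results depending on which modulus applies.

```python
# Minimal sketch: identical 256-bit inputs give different products depending on
# whether they are treated as field elements (mod p) or as scalars (mod n).

# secp256k1 field prime p: coordinates x and y are reduced mod p
P = 2**256 - 2**32 - 977
# secp256k1 group order n: private keys (scalars) are reduced mod n
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def field_mul(a: int, b: int) -> int:
    """Hypothetical helper: multiplication in the coordinate field (mod p)."""
    return (a * b) % P


def scalar_mul(a: int, b: int) -> int:
    """Hypothetical helper: multiplication of scalars (mod n)."""
    return (a * b) % N


if __name__ == "__main__":
    a = 2**255 + 12345
    b = 2**255 + 67890
    # Same inputs, different moduli, different results
    print(hex(field_mul(a, b)))
    print(hex(scalar_mul(a, b)))
```

Code that mixes the two (for example, reducing a private-key product mod p) will silently produce wrong results, which is why the library keeps the field* and scalar* files separate.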

Alright.

There's one other thing to address: in the secp256k1_fe_mul function (or something like that), all but the last limb are multiplied by the constant R. This gives a different result from an example I calculated in Python. So inside fe_mul, I need to avoid multiplying the values in the result (stack) by R and send that multiplication to a temporary instead.

Pieter Wuille
Legendary

Activity: 1072
Merit: 1174


December 28, 2022, 03:19:38 PM
 #42

Quote from: NotATether on December 27, 2022, 05:43:41 PM
Alright.

There's one other thing to address: in the secp256k1_fe_mul function (or something like that), all but the last limb are multiplied by the constant R. This gives a different result from an example I calculated in Python. So inside fe_mul, I need to avoid multiplying the values in the result (stack) by R and send that multiplication to a temporary instead.

That makes no sense; fe_mul just multiplies two field elements modulo p. That R constant is an implementation detail, which even differs between 32-bit and 64-bit platforms. It's not actually multiplying the result by this value.

Note that field elements are internally stored in a denormalized representation where the limbs can overflow. If you want to convert it to a portable format, use fe_get_b32.
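A simplified Python model may help show why multiplying by R is not an extra factor in the result. It assumes the 5x52-limb layout used on 64-bit platforms, where R = 0x1000003D10 equals 2^260 mod p: a partial-product column of weight 2^(52k) with k >= 5 can be multiplied by R and added into column k-5 without changing the value mod p. The helpers `to_limbs`, `limb_mul` and `normalize_b32` are hypothetical and much simpler than the real fe_mul, which is organized differently and keeps its limbs bounded as it goes; `normalize_b32` only mirrors the idea behind fe_get_b32 (fully reduce, then serialize).

```python
# Simplified model of a 5x52-limb field multiplication for secp256k1.
# Not the library's algorithm: it only shows that folding high columns
# with R is modular reduction, and that the raw limbs are not normalized.

P = 2**256 - 2**32 - 977
M52 = (1 << 52) - 1      # mask for one 52-bit limb
R = 0x1000003D10         # 2**260 mod P


def to_limbs(x):
    """Split a value below 2**260 into five 52-bit limbs, least significant first."""
    return [(x >> (52 * i)) & M52 for i in range(5)]


def from_limbs(limbs):
    """Recombine limbs; also works when limbs exceed 52 bits (non-normalized)."""
    return sum(v << (52 * i) for i, v in enumerate(limbs))


def limb_mul(a, b):
    """Schoolbook product of two limb vectors, with high columns folded via R."""
    cols = [0] * 9
    for i in range(5):
        for j in range(5):
            cols[i + j] += a[i] * b[j]
    # Column k (k >= 5) has weight 2**(52*k) = 2**260 * 2**(52*(k-5)),
    # and 2**260 is congruent to R mod P, so it folds into column k-5.
    for k in range(5, 9):
        cols[k - 5] += cols[k] * R
    # Congruent to a*b mod P, but the limbs are oversized and the value may exceed P
    return cols[:5]


def normalize_b32(limbs):
    """Fully reduce mod P and serialize to 32 big-endian bytes."""
    return (from_limbs(limbs) % P).to_bytes(32, "big")


if __name__ == "__main__":
    a, b = P - 1, P - 2                  # i.e. -1 and -2 mod P
    out = limb_mul(to_limbs(a), to_limbs(b))
    print(any(c > M52 for c in out))     # True: limbs are not normalized
    print(normalize_b32(out).hex())      # 32-byte encoding of 2
```

Comparing the raw limbs against a Python `a * b % p` will not match; only the normalized, serialized form is comparable.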

I do Bitcoin stuff.
NotATether
Legendary

Activity: 1596
Merit: 6726


bitcoincleanup.com / bitmixlist.org


December 28, 2022, 04:41:36 PM
 #43

Quote from: Pieter Wuille on December 28, 2022, 03:19:38 PM
Note that field elements are internally stored in a denormalized representation where the limbs can overflow. If you want to convert it to a portable format, use fe_get_b32.

That must be why I was getting different results while testing. I'll check out this function and run my C++ and Python mod-mul programs again. It will be interesting to see the results of this.

esa_a
Newbie

Activity: 1
Merit: 0


February 09, 2023, 08:55:12 PM
 #44

Quote
And moreover, VanitySearch is only fast at "Point pub = secp256k1->ComputePublicKey(&privKey);" because it generates a "Point GTable[256*32];"
to be used in "Q = Add2(Q, GTable[256 * i + (b-1)])", with Jacobian coordinates as the calculation scheme.
If you use Add2 to add one million points in sequence, it is by far faster than classic point addition with an inversion each time.
But before using any resulting Jacobian point further, you need to call "Q.Reduce();" first, which amounts to an inversion.
And in that situation the classic scheme using GMP becomes faster, since you just add a point and can use it immediately without any Reduce().
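The trade-off described in the quote can be sketched in Python. This uses generic textbook formulas (affine addition, and mixed addition of a Jacobian point with an affine point), not VanitySearch's actual Add2 or Reduce code; the hypothetical `to_affine` plays the role the quote assigns to Q.Reduce(). Affine addition needs one modular inversion per step, while the Jacobian version needs a single inversion at the end, when the point is converted back.

```python
# Sketch: repeatedly adding G on secp256k1 in affine coordinates (one inversion
# per addition) versus Jacobian coordinates (one inversion at the very end).

P = 2**256 - 2**32 - 977
GX = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
GY = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8

inversions = 0  # counts modular inversions performed


def inv(x):
    """Modular inverse via Fermat's little theorem (x must be nonzero mod P)."""
    global inversions
    inversions += 1
    return pow(x, P - 2, P)


def affine_add(p1, p2):
    """Affine addition of two points with distinct x coordinates."""
    (x1, y1), (x2, y2) = p1, p2
    lam = (y2 - y1) * inv(x2 - x1) % P
    x3 = (lam * lam - x1 - x2) % P
    return x3, (lam * (x1 - x3) - y1) % P


def affine_double(p1):
    """Affine doubling (curve y^2 = x^3 + 7, so a = 0)."""
    x1, y1 = p1
    lam = 3 * x1 * x1 * inv(2 * y1) % P
    x3 = (lam * lam - 2 * x1) % P
    return x3, (lam * (x1 - x3) - y1) % P


def jacobian_add_affine(q, p2):
    """Mixed addition: q = (X, Y, Z) with x = X/Z^2, y = Y/Z^3; p2 = (x, y).
    Assumes q and p2 are distinct and not negatives of each other."""
    X1, Y1, Z1 = q
    x2, y2 = p2
    z1z1 = Z1 * Z1 % P
    u2 = x2 * z1z1 % P          # x2 scaled into q's coordinates
    s2 = y2 * Z1 * z1z1 % P     # y2 scaled into q's coordinates
    h = (u2 - X1) % P
    r = (s2 - Y1) % P
    hh = h * h % P
    hhh = h * hh % P
    v = X1 * hh % P
    X3 = (r * r - hhh - 2 * v) % P
    Y3 = (r * (v - X3) - Y1 * hhh) % P
    return X3, Y3, Z1 * h % P   # no inversion needed


def to_affine(q):
    """Convert Jacobian (X, Y, Z) back to affine; costs one inversion."""
    X, Y, Z = q
    zi = inv(Z)
    zi2 = zi * zi % P
    return X * zi2 % P, Y * zi2 * zi % P


if __name__ == "__main__":
    G = (GX, GY)
    start = affine_double(G)    # 2G, so adding G never hits the doubling case
    steps = 1000

    inversions = 0
    a = start
    for _ in range(steps):
        a = affine_add(a, G)
    affine_inversions = inversions

    inversions = 0
    q = (start[0], start[1], 1)
    for _ in range(steps):
        q = jacobian_add_affine(q, G)
    j = to_affine(q)

    print(a == j, affine_inversions, inversions)  # True 1000 1
```

The sketch starts from 2G because the mixed-addition function does not handle doubling or the point at infinity. It also shows the quote's caveat: every time a Jacobian point is needed in affine form, the one inversion has to be paid.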

Hey, did you try piggypiggy's implementation? Was it any faster?
alexeyneu
Member

Activity: 312
Merit: 30


February 09, 2023, 11:03:30 PM
Last edit: February 10, 2023, 12:02:10 AM by alexeyneu
Merited by vapourminer (1), Welsh (1)
 #45

OK, let's try to write it in parts using OpenAI:

Code:
Write in assembly a code to add two numbers of 512 bits

Code:
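# Add two 512-bit numbers stored as eight 64-bit limbs, least significant limb first
# (x86-64, GNU as / AT&T syntax; in GAS on x86 '#' starts a comment, ';' does not)
        .data
num1:   .quad 0x1234567890ABCDEF, 0x1234567890ABCDEF, 0x1234567890ABCDEF, 0x1234567890ABCDEF
        .quad 0x1234567890ABCDEF, 0x1234567890ABCDEF, 0x1234567890ABCDEF, 0x1234567890ABCDEF
num2:   .quad 0x0987654321FEDCBA, 0x0987654321FEDCBA, 0x0987654321FEDCBA, 0x0987654321FEDCBA
        .quad 0x0987654321FEDCBA, 0x0987654321FEDCBA, 0x0987654321FEDCBA, 0x0987654321FEDCBA

# The 512-bit result (already zero-initialized, so no explicit clearing is needed)
result: .quad 0, 0, 0, 0, 0, 0, 0, 0

# Main program
        .text
        .global main
main:
        # Point one past the end of each array and run an index from -8 up to 0,
        # so the loop test (INC + JNZ) never disturbs the carry flag.
        # RIP-relative LEA keeps the code valid in a position-independent executable.
        lea     num1+64(%rip), %rsi
        lea     num2+64(%rip), %rdi
        lea     result+64(%rip), %rdx
        mov     $-8, %rcx
        clc                             # no carry into the lowest limb

add_loop:
        # Add the corresponding 64-bit limbs plus the carry from the previous limb
        mov     (%rsi,%rcx,8), %rax
        adc     (%rdi,%rcx,8), %rax

        # Store the sum limb in the result
        mov     %rax, (%rdx,%rcx,8)

        # Advance to the next limb; INC updates ZF but leaves CF intact
        inc     %rcx
        jnz     add_loop

        # Return the final carry-out (bit 512) as the exit status
        setc    %al
        movzbl  %al, %eax
        ret

        .section .note.GNU-stack,"",@progbits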
But dude, it's sandbox asm,

some kind of this stuff: https://stackoverflow.com/questions/64726805/understanding-a-basic-assembly-code-with-lea-instruction#comment114452574_64726805

It's neither an executable nor part of a ring-zero OS. It's nothing, really. Why use asm, then?
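Independently of how the assembly is written, the expected output of a multi-limb addition can be checked against a small Python reference model. This sketch (hypothetical helpers `add_limbs` and `to_int`) propagates the carry from each 64-bit limb into the next, which is what the ADC instruction does and what a loop of plain per-limb ADDs misses.

```python
# Reference model: add two little-endian lists of 64-bit limbs with carry
# propagation, returning the sum limbs and the final carry-out.

MASK64 = (1 << 64) - 1


def add_limbs(a, b):
    """Return (sum_limbs, carry_out) for two equal-length limb lists."""
    out, carry = [], 0
    for x, y in zip(a, b):
        s = x + y + carry
        out.append(s & MASK64)   # low 64 bits stay in this limb
        carry = s >> 64          # 0 or 1, fed into the next limb
    return out, carry


def to_int(limbs):
    """Recombine limbs (least significant first) into a Python integer."""
    return sum(v << (64 * i) for i, v in enumerate(limbs))


if __name__ == "__main__":
    a = [MASK64] * 8             # 2**512 - 1
    b = [1] + [0] * 7            # 1
    print(add_limbs(a, b))       # ([0, 0, 0, 0, 0, 0, 0, 0], 1)
```

Without carry propagation the same inputs would leave the upper seven limbs at 0xFFFFFFFFFFFFFFFF, which is exactly the kind of error a quick comparison against Python integers catches.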
NotATether
Legendary

Activity: 1596
Merit: 6726


bitcoincleanup.com / bitmixlist.org


February 14, 2023, 08:15:08 AM
Merited by vapourminer (1)
 #46

Quote from: alexeyneu on February 09, 2023, 11:03:30 PM
But dude, it's sandbox asm,

some kind of this stuff: https://stackoverflow.com/questions/64726805/understanding-a-basic-assembly-code-with-lea-instruction#comment114452574_64726805

It's neither an executable nor part of a ring-zero OS. It's nothing, really. Why use asm, then?

There is really no point in writing assembly that doesn't use instructions faster than the ones gcc compiles down to.

For example, compiling a C file down to assembly gives you a bunch of MOV, CMP, JMP and ADD/XOR/LEA instructions. There are only two ways to make this faster:

1 - you can somehow rework the assembly to remove excessive MOVs, so that it uses as few instructions as possible (this will not give a large performance improvement)
2 - your use case can be accelerated by SIMD instructions (this will give much faster performance).

alexeyneu
Member

Activity: 312
Merit: 30


February 14, 2023, 07:13:34 PM
Last edit: February 14, 2023, 07:42:59 PM by alexeyneu
 #47

Quote from: NotATether on February 14, 2023, 08:15:08 AM
Quote from: alexeyneu on February 09, 2023, 11:03:30 PM
But dude, it's sandbox asm,

some kind of this stuff: https://stackoverflow.com/questions/64726805/understanding-a-basic-assembly-code-with-lea-instruction#comment114452574_64726805

It's neither an executable nor part of a ring-zero OS. It's nothing, really. Why use asm, then?

There is really no point in writing assembly that doesn't use instructions faster than the ones gcc compiles down to.

For example, compiling a C file down to assembly gives you a bunch of MOV, CMP, JMP and ADD/XOR/LEA instructions. There are only two ways to make this faster:

1 - you can somehow rework the assembly to remove excessive MOVs, so that it uses as few instructions as possible (this will not give a large performance improvement)
2 - your use case can be accelerated by SIMD instructions (this will give much faster performance).

But you can do SIMD in C, with intrinsics like _mm_setzero_si128(). In a protected-mode OS your executable is just a task with little to no hardware access, so asm doesn't make much sense. Removing a few MOVs from an executable won't increase performance.
As for the code (or something like it) posted here: it's stylized like MS-DOS real-mode asm, and it's unrelated to the binary produced from it.