Bitcoin Forum
May 12, 2024, 07:35:37 AM *
News: Latest Bitcoin Core release: 27.0 [Torrent]
 
   Home   Help Search Login Register More  
Pages: [1]
  Print  
Author Topic: Fast seaching approachs  (Read 564 times)
JPage (OP)
Full Member
***
Offline Offline

Activity: 399
Merit: 105



View Profile
May 15, 2016, 10:04:05 AM
 #1

Let's say I have a data field that looks like this: "1K15YmnotHd3TL1ZxL4p41MmxD1mAnt1ts"
I'd like to know whether that pattern was present in the last block added to the blockchain.  No problem.  

Now, assume I have 1 million similar datafields.  What is a fast way to determine whether any of them saw action in the last block?  

SQLServer stored procedure?  In memory dataset?  How would you do it to optimize speed?

Since each block could have up to 3000 transactions - the question is: "which of these 1 million exist in this group of 3000?".  I have to perform that every ten minutes.  And, the future suggests there will one day be 10 million.  I'd like my strategy to have a little room for the future.
1715499337
Hero Member
*
Offline Offline

Posts: 1715499337

View Profile Personal Message (Offline)

Ignore
1715499337
Reply with quote  #2

1715499337
Report to moderator
1715499337
Hero Member
*
Offline Offline

Posts: 1715499337

View Profile Personal Message (Offline)

Ignore
1715499337
Reply with quote  #2

1715499337
Report to moderator
1715499337
Hero Member
*
Offline Offline

Posts: 1715499337

View Profile Personal Message (Offline)

Ignore
1715499337
Reply with quote  #2

1715499337
Report to moderator
Advertised sites are not endorsed by the Bitcoin Forum. They may be unsafe, untrustworthy, or illegal in your jurisdiction.
1715499337
Hero Member
*
Offline Offline

Posts: 1715499337

View Profile Personal Message (Offline)

Ignore
1715499337
Reply with quote  #2

1715499337
Report to moderator
cloverme
Legendary
*
Offline Offline

Activity: 1512
Merit: 1057


SpacePirate.io


View Profile WWW
May 16, 2016, 01:21:45 AM
 #2

Like everything else in life, you're going to have a million options.  I had a similar situation a few years ago (non-bitcoin related), results were in ms per query for about 15m records.  I'd do the reverse, i'd use addresses in the blockchain to search against what you have patterned. Here's one possibility...

Setup...
- Create a database to hold your bitcoin addresses and other data, tableX, columnY, columnZ, columnZa,etc
- Index Y, Z
- Use a language that allows the max size of an array by memory (c++, etc)
- Store all your millions of bitcoin addresses into columnY.
- Create an array (A) in memory to hold all your stored addresses (10m, etc).
- Read your database of bitcoin addresses, hash the addresses (something fast and small, like fnv1a64 to reduce of the number of bytes and still have a reasonable collision acceptance 2^64), if your're going to have more than 2^64 bitcoin pattern addresses, just use another small hashing algorithm.  Note that it doesn't have to be cryptographic hash)
- For each of your addresses, store the hash of the address into the array (the number of bytes should ideally be less than the length of the bitcoin address). Also store the hash of the address back into your database (if it already doesn't exist), columnZ.

Process the blockchain...
- Look in the database to see what block was processed last, or if first run, block 0
- For each block, read in all the addresses, hash them and store them into another memory array (B)
- Search through array (A) for values in (B)
- If you have a match, append the block number into columnZa (block0:block22:block88842:etc)
store the last block searched into the database


I'd use an OO language, 64 bit based libraries.
Your DB is going to be doing more reads than writes, so optimize accordingly.
You're going to hog up a lot of memory on the array where you have millions of records, so go for 128GB or higher
Don't use a virtual server, use a physical server where you can optimize your CPU configuration
JPage (OP)
Full Member
***
Offline Offline

Activity: 399
Merit: 105



View Profile
May 16, 2016, 04:26:30 AM
 #3

Like everything else in life, you're going to have a million options.  I had a similar situation a few years ago (non-bitcoin related), results were in ms per query for about 15m records.  I'd do the reverse, i'd use addresses in the blockchain to search against what you have patterned. Here's one possibility...

Setup...
- Create a database to hold your bitcoin addresses and other data, tableX, columnY, columnZ, columnZa,etc
- Index Y, Z
- Use a language that allows the max size of an array by memory (c++, etc)
- Store all your millions of bitcoin addresses into columnY.
- Create an array (A) in memory to hold all your stored addresses (10m, etc).
- Read your database of bitcoin addresses, hash the addresses (something fast and small, like fnv1a64 to reduce of the number of bytes and still have a reasonable collision acceptance 2^64), if your're going to have more than 2^64 bitcoin pattern addresses, just use another small hashing algorithm.  Note that it doesn't have to be cryptographic hash)
- For each of your addresses, store the hash of the address into the array (the number of bytes should ideally be less than the length of the bitcoin address). Also store the hash of the address back into your database (if it already doesn't exist), columnZ.

Process the blockchain...
- Look in the database to see what block was processed last, or if first run, block 0
- For each block, read in all the addresses, hash them and store them into another memory array (B)
- Search through array (A) for values in (B)
- If you have a match, append the block number into columnZa (block0:block22:block88842:etc)
store the last block searched into the database


I'd use an OO language, 64 bit based libraries.
Your DB is going to be doing more reads than writes, so optimize accordingly.
You're going to hog up a lot of memory on the array where you have millions of records, so go for 128GB or higher
Don't use a virtual server, use a physical server where you can optimize your CPU configuration


That's a lot of really good information.  Thanks for that.  Smiley
Pages: [1]
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!