Bitcoin Forum
May 24, 2026, 07:17:01 PM *
Topic: Bare-metal C-engine bypassing RPC for P2P mempool telemetry (Read 99 times)
Ahmadyskhan (OP), Newbie, Activity: 5, Merit: 9
May 21, 2026, 11:28:12 AM #1
Merited by NotATether (5), Pmalek (3), ABCbits (1)

I’ve been researching high-frequency on-chain data and ran into a massive latency bottleneck. Polling standard Bitcoin Core RPCs involves too much disk I/O and indexing bloat for true sub-50ms execution.

To solve this, I built Mempool Oracle, a custom C-engine that ignores the RPC entirely. It subscribes directly to the raw P2P mesh network.

How the architecture handles the firehose:

When a transaction hits the network, the engine uses custom FNV-1a hashmaps entirely in RAM to resolve complex CPFP and Replace-By-Fee (RBF) topologies before the data ever touches a drive. It calculates fee velocity and shock scores on the fly, and then pushes the data out via a decoupled, stateless HTTPS Server-Sent Events (SSE) gateway to ensure clients have zero TCP ping-pong overhead.
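The CPFP part of that topology work comes down to a package feerate: a low-fee parent becomes attractive to miners once a high-fee child spends it, so the two are scored together. Below is a minimal Python sketch of an ancestor-package feerate, assuming transactions are already decoded into fee, BIP141 weight and the txids of their unconfirmed parents. The `Tx` type and the function names are illustrative only, not the engine's actual structures, and the post does not define its "fee velocity" or "shock score" formulas, so those are not reproduced.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Tx:
    txid: str
    fee_sat: int                                        # absolute fee in satoshis
    weight: int                                         # BIP141 weight units
    parents: List[str] = field(default_factory=list)    # unconfirmed parent txids


def vsize(weight: int) -> int:
    """Virtual size in vbytes: weight / 4, rounded up (BIP141)."""
    return (weight + 3) // 4


def ancestor_package(txid: str, pool: Dict[str, Tx]) -> List[Tx]:
    """Collect a tx plus all of its unconfirmed ancestors, each exactly once."""
    seen = set()
    stack = [txid]
    package = []
    while stack:
        current = stack.pop()
        if current in seen or current not in pool:
            continue  # already counted, or confirmed / unknown to this pool
        seen.add(current)
        tx = pool[current]
        package.append(tx)
        stack.extend(tx.parents)
    return package


def package_feerate(txid: str, pool: Dict[str, Tx]) -> float:
    """Combined fee over combined vsize (sat/vB) for the ancestor package."""
    package = ancestor_package(txid, pool)
    total_fee = sum(tx.fee_sat for tx in package)
    total_vsize = sum(vsize(tx.weight) for tx in package)
    return total_fee / total_vsize


# A 1 sat/vB parent bumped by a high-fee child (CPFP).
pool = {
    "parent": Tx("parent", fee_sat=200, weight=800),
    "child": Tx("child", fee_sat=5_800, weight=800, parents=["parent"]),
}
print(package_feerate("parent", pool))  # 1.0
print(package_feerate("child", pool))   # 15.0
```

The `seen` set matters for diamond-shaped topologies, where two parents share a grandparent that must only be counted once.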

I've documented the backend architecture and open-sourced the Python client implementation so the exact JSON schema is visible.

Project link: https://www.mempool-alpha-oracle.com

I'd appreciate any feedback from protocol engineers on the P2P interception method or the RAM topology mapping.

BattleDog, Full Member, Activity: 242, Merit: 228
May 21, 2026, 02:28:30 PM #2
Merited by Pmalek (3), ABCbits (1)

If you want low-latency mempool data, bypassing slow RPC polling is reasonable. Nobody serious should be hammering `getrawmempool` in a loop and then acting surprised when the node starts coughing bolts. Bitcoin Core already gives you better paths though: ZMQ, direct peer connections, patched instrumentation, or just running your own listener stack beside Core. So the concept itself is not crazy.

But some of the wording here makes my left eyebrow boot into safe mode. "P2P interception method" is a strange way to describe being a peer and listening to tx relay. You are not intercepting anything unless you are doing something much uglier. Also "zero TCP ping-pong overhead" through SSE is marketing glitter. SSE may be fine for pushing events to clients, but it does not magically erase network latency, peer topology, INV/GETDATA behavior, orphan handling, compact relay quirks, or the fact that there is no single canonical mempool.
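To make the "being a peer" point concrete: tx relay is announce-then-fetch. A peer sends an `inv` listing (type, hash) pairs, and a listener that wants a transaction replies with `getdata`. Below is a minimal Python sketch of that dedupe step using the standard P2P framing: 4-byte network magic, 12-byte null-padded command, little-endian payload length, and the first 4 bytes of double SHA-256 as checksum. The function names are hypothetical. A real listener must also complete the `version`/`verack` handshake, time out and re-request from other peers, and handle `notfound`, none of which is shown.

```python
import hashlib
import struct
from typing import List, Set, Tuple

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")
MSG_TX = 1                   # announced by txid
MSG_WTX = 5                  # announced by wtxid (BIP339)
MSG_WITNESS_TX = 0x40000001  # request the tx including witness data (BIP144)


def sha256d(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def read_compact_size(buf: bytes, off: int) -> Tuple[int, int]:
    """Decode a Bitcoin CompactSize integer; returns (value, new_offset)."""
    first = buf[off]
    if first < 0xFD:
        return first, off + 1
    if first == 0xFD:
        return struct.unpack_from("<H", buf, off + 1)[0], off + 3
    if first == 0xFE:
        return struct.unpack_from("<I", buf, off + 1)[0], off + 5
    return struct.unpack_from("<Q", buf, off + 1)[0], off + 9


def write_compact_size(n: int) -> bytes:
    if n < 0xFD:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)


def frame(command: str, payload: bytes, magic: bytes = MAINNET_MAGIC) -> bytes:
    """Wrap a payload in the 24-byte P2P message header."""
    cmd = command.encode("ascii").ljust(12, b"\x00")
    return magic + cmd + struct.pack("<I", len(payload)) + sha256d(payload)[:4] + payload


def parse_inv(payload: bytes) -> List[Tuple[int, bytes]]:
    """Split an inv/getdata payload into (type, 32-byte hash) entries."""
    count, off = read_compact_size(payload, 0)
    entries = []
    for _ in range(count):
        if len(payload) < off + 36:
            raise ValueError("truncated inventory vector")
        (inv_type,) = struct.unpack_from("<I", payload, off)
        entries.append((inv_type, payload[off + 4:off + 36]))
        off += 36
    return entries


def getdata_for_unseen(inv_payload: bytes, requested: Set[bytes]) -> bytes:
    """Build a framed getdata for tx announcements not requested before.

    Returns b"" when every announced tx has already been requested.
    """
    wanted = []
    for inv_type, h in parse_inv(inv_payload):
        if inv_type not in (MSG_TX, MSG_WTX) or h in requested:
            continue
        requested.add(h)
        # txid announcements are fetched as witness txs; wtxid ones keep MSG_WTX
        wanted.append((MSG_WITNESS_TX if inv_type == MSG_TX else MSG_WTX, h))
    if not wanted:
        return b""
    body = write_compact_size(len(wanted)) + b"".join(
        struct.pack("<I", t) + h for t, h in wanted
    )
    return frame("getdata", body)
```

Here `requested` never shrinks; in practice it would need expiry so that a peer that never answers does not make a transaction unrequestable forever.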

Also, FNV-1a is fast, sure, but if this is security-adjacent or adversarially exposed, I'd want to know where exactly it is used. Fast non-crypto hashes in public-facing hashmaps have a long history of becoming little denial-of-service piñatas if you are careless.
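The concern is that FNV-1a is unkeyed: anyone who knows the function can compute bucket indices offline and grind inputs that all land in one bucket. Txids being SHA-256d outputs does not help much, since an attacker chooses the transaction contents and can cheaply grind for a target bucket when the bucket count is small. Below is a minimal Python sketch contrasting the 64-bit FNV-1a definition with a bucket function keyed by a secret per-process random value. Keyed BLAKE2b from `hashlib` is a swapped-in choice because Python's standard library exposes it; Bitcoin Core salts its own txid-keyed containers with SipHash for the same reason. `KeyedBucketer` is a hypothetical name, and nothing here reflects how the engine in this thread actually hashes.

```python
import hashlib
import os
from typing import Optional

FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte in, then multiply by the FNV prime."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h


class KeyedBucketer:
    """Maps keys to buckets via keyed BLAKE2b with a secret random key."""

    def __init__(self, n_buckets: int, key: Optional[bytes] = None):
        self.n_buckets = n_buckets
        # A fresh random key per process means bucket positions can't be precomputed.
        self._key = key if key is not None else os.urandom(16)

    def bucket(self, data: bytes) -> int:
        digest = hashlib.blake2b(data, key=self._key, digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.n_buckets


naive = fnv1a_64(b"some-txid") % 1024             # same on every machine, forever
keyed = KeyedBucketer(1024).bucket(b"some-txid")  # differs from process to process
```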

Open source the C engine and let people beat on it. A Python client and a JSON schema are nice, but protocol engineers are going to want the ugly bits: peer handling, tx validation assumptions, dedupe logic, DoS resistance, memory bounds, and how it behaves when the mempool turns into a zoo during a fee spike.
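On the memory-bounds point, the usual pattern is a hard cap plus an eviction order. Below is a minimal Python sketch of a capacity-bounded store that evicts the lowest-feerate entry first, using a heap with lazy deletion so removals stay cheap. The class name and the evict-by-individual-feerate policy are assumptions for illustration. The sketch ignores package topology, so a real implementation would need to evict a transaction's unconfirmed descendants along with it. (Bitcoin Core expresses its own limit in megabytes via `-maxmempool`, not as a transaction count.)

```python
import heapq
from typing import Dict, List, Tuple


class BoundedFeeStore:
    """Holds at most `capacity` txids, evicting the lowest feerate first."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.feerates: Dict[str, float] = {}
        self._heap: List[Tuple[float, str]] = []  # may contain stale entries

    def add(self, txid: str, feerate: float) -> List[str]:
        """Insert or update a tx; returns the txids evicted to stay in bounds."""
        self.feerates[txid] = feerate
        heapq.heappush(self._heap, (feerate, txid))
        evicted = []
        while len(self.feerates) > self.capacity:
            rate, victim = heapq.heappop(self._heap)
            # Stale entry: tx was removed, or re-added with a different feerate.
            if self.feerates.get(victim) != rate:
                continue
            del self.feerates[victim]
            evicted.append(victim)
        return evicted

    def remove(self, txid: str) -> None:
        """Drop a tx (e.g. confirmed or replaced); its heap entry is skipped later."""
        self.feerates.pop(txid, None)
```

Under heavy churn the heap accumulates stale entries, so a production version would periodically rebuild it from `feerates`.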

Ahmadyskhan (OP), Newbie, Activity: 5, Merit: 9
May 21, 2026, 04:31:33 PM #3

Appreciate the thorough review, BattleDog. You raised some very valid points, let me clarify the architecture:

1. Terminology: You are completely right, "interception" is just marketing shorthand. Under the hood, it is simply a highly optimized passive listener. It handshakes as a standard peer, aggressively ingests inv messages, requests the missing data, and simply drops the relay responsibility.

2. SSE & The Canonical Mempool: I agree completely that a global canonical mempool is a myth. The "zero TCP ping-pong" claim was referring strictly to the client-side ingestion of the telemetry (the Python script connecting to the gateway), not the backend P2P propagation, which as you noted, is still bound by standard Bitcoin network realities.

3. Hashmaps & DoS Risk: Good catch on the RAM topology mapping. The mention of FNV-1a in the write-up was an oversimplification of how I'm handling the internal state to prioritize speed, but you are 100% correct about the HashDoS vulnerability if raw public inputs aren't sanitized before hashing. The engine aggressively prunes the topologies to mitigate memory exhaustion during spikes.

4. Open Sourcing: The Python client and JSON schemas are fully open source, but the C-engine itself is proprietary infrastructure backing a SaaS model, so I won't be open-sourcing the core node logic right now.
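For the client side described in point 2: SSE is a single long-lived HTTP response with the `text/event-stream` content type, so the client makes no per-event request, although TCP acknowledgements and keep-alives still flow underneath. Below is a minimal Python sketch of a parser for that format (a blank line ends an event, `data:` lines are joined with newlines, lines starting with `:` are comments), written against the WHATWG field rules rather than taken from the OP's client. The `whale` event name and JSON fields in the example are hypothetical; the real schema is whatever the published Python client consumes.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class SSEEvent:
    event: str
    data: str
    id: Optional[str] = None


def parse_sse(lines: Iterable[str]) -> Iterator[SSEEvent]:
    """Yield events from text/event-stream lines (newline endings optional)."""
    event_type, data_lines, last_id = "", [], None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line dispatches the buffered event, if it carried any data.
            if data_lines:
                yield SSEEvent(event_type or "message", "\n".join(data_lines), last_id)
            event_type, data_lines = "", []
            continue
        if line.startswith(":"):
            continue  # comment, commonly used as a keep-alive
        name, sep, value = line.partition(":")
        if sep and value.startswith(" "):
            value = value[1:]  # strip exactly one leading space
        if name == "data":
            data_lines.append(value)
        elif name == "event":
            event_type = value
        elif name == "id" and "\x00" not in value:
            last_id = value  # persists across events, per the spec
        # "retry" and unknown fields are ignored in this sketch


stream = [
    "event: whale\n",
    'data: {"txid": "ab12", "rbf": false}\n',
    "\n",
    ": keep-alive\n",
    "data: line one\n",
    "data: line two\n",
    "\n",
]
events = list(parse_sse(stream))
print(events[0].event, json.loads(events[0].data)["rbf"])  # whale False
print(repr(events[1].data))                                # 'line one\nline two'
```

Because the parser only needs an iterable of lines, it can be fed from any streaming HTTP response that yields decoded lines.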

That said, since you clearly know your way around network topology, I would be happy to provision a private sandbox API key for you. You can point a script at the SSE stream and stress-test the raw latency/dedupe behavior yourself during the next major fee spike. Let me know if you want access.

ABCbits, Legendary, Activity: 3612, Merit: 10064
May 22, 2026, 07:55:39 AM #4

I don't see any open source license mentioned on the repository I found (https://github.com/Ahmadyskhan/mempool-oracle-client-python). I just want to remind you that no license usually means all rights reserved, see https://choosealicense.com/no-permission/.

NotATether, Legendary, Activity: 2338, Merit: 9714
May 22, 2026, 12:35:23 PM #5

Is this only applicable to the mempool?

Would it be theoretically feasible for me to, e.g., fetch a confirmed transaction from X days ago entirely in memory? Of course, to fit in RAM (since even 128GB wouldn't be enough for that use case), the blockchain would have to be pruned to a particular cutoff date while preserving the UTXO set.

That would have a lot of interesting use-cases.

 
Ahmadyskhan (OP), Newbie, Activity: 5, Merit: 9
May 22, 2026, 05:48:06 PM #6

@ABCbits - Good catch. That was a plain oversight on my end when initially pushing the boilerplate. I've just added the MIT license to the repository. Appreciate you pointing that out so devs can actually integrate it legally.

@NotATether - Currently, the engine is strictly built for the mempool (streaming live unconfirmed transactions as they hit the network).

Theoretically? Yes, you could adapt the architecture to serve historical confirmed TXs entirely from RAM. But as you noted, the memory overhead becomes the primary bottleneck. Even with aggressive pruning to a specific cutoff date and the UTXO set held in RAM, you'd be building a completely different beast, moving from a real-time event streamer to an ultra-low-latency historical query engine. It's an interesting use case (especially for high-speed backtesting), but it falls outside the current scope of what this oracle is optimized for.
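The heart of "maintaining the UTXO set in RAM" is a map from outpoint (txid, output index) to the unspent output, updated atomically as each transaction is applied. Below is a minimal Python sketch of that update, assuming transactions arrive already decoded; the `Coin` type and `apply_tx` name are hypothetical. It skips scripts, coinbase handling and reorg undo data. A historical query engine of the kind discussed here would also need a txid-to-location index, since the UTXO set alone cannot return a transaction whose outputs have all been spent.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Outpoint = Tuple[str, int]  # (txid, output index)


@dataclass
class Coin:
    value_sat: int
    height: int  # block height that created this output


def apply_tx(utxos: Dict[Outpoint, Coin], txid: str, inputs: List[Outpoint],
             output_values: List[int], height: int) -> int:
    """Spend `inputs`, add new outputs, and return the fee in satoshis.

    Validates before mutating, so a rejected tx leaves `utxos` unchanged.
    """
    if len(set(inputs)) != len(inputs):
        raise ValueError("transaction spends the same outpoint twice")
    missing = [op for op in inputs if op not in utxos]
    if missing:
        raise KeyError(f"unknown or already spent: {missing}")
    in_total = sum(utxos[op].value_sat for op in inputs)
    out_total = sum(output_values)
    if out_total > in_total:
        raise ValueError("outputs exceed inputs")
    for op in inputs:
        del utxos[op]
    for index, value in enumerate(output_values):
        utxos[(txid, index)] = Coin(value, height)
    return in_total - out_total
```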

Ahmadyskhan (OP), Newbie, Activity: 5, Merit: 9
May 23, 2026, 07:53:48 PM #7

For those questioning the real-world utility of bypassing the standard RPC, the engine just caught a massive internal exchange shuffle in the wild. A 2,631 BTC ($199M) transfer was broadcast without RBF signaling, immediately preceded by a 59 BTC RBF-enabled hot-wallet sweep from the exact same node IP. Catching this in RAM before block confirmation gives a distinct latency advantage for algorithmic routing.
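For reference, an `RBF: True/False` flag like the one below presumably reflects BIP125 opt-in signaling, which is a property of the inputs' sequence numbers: a transaction signals replaceability if any input has `nSequence` below 0xfffffffe. A minimal Python sketch of that check follows, assuming the sequence numbers are already decoded; the function name is hypothetical. One caveat: since Bitcoin Core 28.0, full-RBF is the default mempool policy, so a non-signaling transaction can still be replaced on many nodes. "RBF: False" means "did not signal", not "cannot be replaced".

```python
from typing import Iterable

# Highest nSequence value that still signals opt-in RBF under BIP125.
BIP125_MAX_SIGNALING_SEQUENCE = 0xFFFFFFFD


def signals_opt_in_rbf(input_sequences: Iterable[int]) -> bool:
    """True if any input's nSequence is below 0xfffffffe (explicit BIP125 signal)."""
    return any(seq <= BIP125_MAX_SIGNALING_SEQUENCE for seq in input_sequences)


print(signals_opt_in_rbf([0xFFFFFFFF, 0xFFFFFFFE]))  # False: no input signals
print(signals_opt_in_rbf([0xFFFFFFFF, 0xFFFFFFFD]))  # True: one input signals
```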

-> WHALE | TXID: 211786068c5e58caa59f8dd4088374eec0279ab6cf9d429e76f279e67bd5274b | Size: 15.3621 BTC | RBF: False | ORIGIN: 86.48.28.16

-> WHALE | TXID: d47547fbfcc00aa42b89248b469a30fc6de5b7ad699ed1c3caeb0a331d1c06c2 | Size: 5.7947 BTC | RBF: False | ORIGIN: 173.206.159.91

-> WHALE | TXID: c47ddc81a63dce705126f79b72ce955e1b65e8c739537c0e80e6f74527ff1034 | Size: 59.5889 BTC | RBF: True | ORIGIN: 90.143.133.202

-> WHALE | TXID: 039dd6a5c4a4d8251fcdf0945905a04882ecd47c78e883a7ae8d59953b095069 | Size: 50.1309 BTC | RBF: True | ORIGIN: 35.200.48.92

-> WHALE | TXID: a6b330bbdf08a4da7fa597650f9e08fbff14cfdf894ae2a115fd984097711080 | Size: 2631.5581 BTC | RBF: False | ORIGIN: 90.143.133.202
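The same-IP observation can be reproduced from log lines like these. Below is a minimal Python sketch that groups them by their ORIGIN field, assuming the exact `-> WHALE | TXID: ... | Size: ... BTC | RBF: ... | ORIGIN: ...` layout shown above (`Size` here is the amount moved in BTC, not the transaction's byte size). The regex and function name are hypothetical. A caution on interpretation: for a passive listener, the "origin" is presumably the peer that first announced the transaction to this node, which may be a relay rather than the wallet that broadcast it, so a shared IP is a hint rather than proof.

```python
import re
from collections import defaultdict
from typing import Dict, List

WHALE_RE = re.compile(
    r"TXID: (?P<txid>[0-9a-f]{64}) \| Size: (?P<btc>[\d.]+) BTC \| "
    r"RBF: (?P<rbf>True|False) \| ORIGIN: (?P<ip>\S+)"
)


def group_by_origin(lines: List[str]) -> Dict[str, List[dict]]:
    """Bucket parsed WHALE lines by ORIGIN, preserving arrival order."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for line in lines:
        match = WHALE_RE.search(line)
        if match is None:
            continue  # blank or unrelated line
        groups[match["ip"]].append({
            "txid": match["txid"],
            "btc": float(match["btc"]),
            "rbf": match["rbf"] == "True",
        })
    return dict(groups)


log = [
    "-> WHALE | TXID: c47ddc81a63dce705126f79b72ce955e1b65e8c739537c0e80e6f74527ff1034 | Size: 59.5889 BTC | RBF: True | ORIGIN: 90.143.133.202",
    "-> WHALE | TXID: a6b330bbdf08a4da7fa597650f9e08fbff14cfdf894ae2a115fd984097711080 | Size: 2631.5581 BTC | RBF: False | ORIGIN: 90.143.133.202",
]
for ip, txs in group_by_origin(log).items():
    print(ip, len(txs), round(sum(t["btc"] for t in txs), 4))
```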

(If anyone wants to stream the JSON feed and test the latency vs. the spot market themselves, the Python client logic is open-sourced here: https://github.com/Ahmadyskhan/mempool-oracle-client-python)

Curious, for those of you also mapping the mempool: are you tracking the average spot market execution lag after these specific OTC cold storage sweeps, or just focusing on the RBF cascades?

NotATether, Legendary, Activity: 2338, Merit: 9714
May 24, 2026, 03:54:04 AM #8

You may be able to pick up the transaction quickly, but in practice only mining pools would be able to benefit financially from this (and of course, a 1Gbps line is basically required at this point).

But it sounds like a cool idea.

How would your client differentiate an OTC transaction from e.g. a large cold-wallet sweep?

 
ABCbits, Legendary, Activity: 3612, Merit: 10064
May 24, 2026, 08:45:46 AM #9

You mentioned automated trading as an example that benefits from your service. But are there any other uses (besides mining, which was already mentioned) that actually benefit from it?

Mining pools probably already use specialized software to fetch unconfirmed TXs and build block templates as quickly as possible, along with low-latency internet connections and well-connected nodes.

Ahmadyskhan (OP), Newbie, Activity: 5, Merit: 9
May 24, 2026, 09:00:59 AM #10

@NotATether & @ABCbits - Excellent questions. You both brought up the mining pool overlap, so I want to clarify the distinction between extracting fee value (what miners do) and extracting information value (what quants do).

You are 100% correct that mining pools already run highly optimized, well-connected nodes to parse unconfirmed TXs for block templates. But their goal is strictly maximizing fee revenue per block. My engine ignores the fee revenue game entirely. It focuses strictly on the market impact lag between Layer 1 (the blockchain) and Layer 2 (centralized exchange spot markets).

To ABCbits' question on other use cases besides automated trading:

Aside from directional spot trading, the biggest beneficiaries of this telemetry are Market Makers and Liquidity Providers.
If a market maker is providing liquidity on Binance, they are heavily exposed to toxic flow. If a 5,000 BTC supply shock hits the mempool, the MM needs to know before the exchange algorithms start slicing that volume into TWAP spot orders. By monitoring mempool density in real-time, Market Makers can dynamically widen their bid/ask spreads or pull their limit orders seconds before the spot market volatility actually hits.

To NotATether's point on the 1Gbps line requirement:

I completely agree. Ingesting the raw P2P inv firehose and resolving the RBF/CPFP topologies in RAM requires massive bandwidth and compute. That is exactly the bottleneck I built this to solve.
The C-engine sits on a dedicated, high-bandwidth server handling all the heavy lifting and memory management. It then calculates a "Shock Score" and streams only the lightweight JSON events out via Server-Sent Events (SSE). This means a trader running a Python bot on a standard home connection or a cheap VPS can react to the P2P firehose in milliseconds without needing a 1Gbps line or a 2TB NVMe drive.
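On the gateway side, streaming "only the lightweight JSON events" over SSE amounts to writing small framed chunks onto an open `text/event-stream` response. Below is a minimal Python sketch of the framing for one event; the `whale` event name, the field names and the use of an `id` line are assumptions for illustration, not the gateway's documented schema, and the HTTP server itself is omitted.

```python
import json
from typing import Any, Dict, Optional


def sse_frame(payload: Dict[str, Any], event: Optional[str] = None,
              event_id: Optional[str] = None) -> bytes:
    """Encode one JSON payload as a text/event-stream frame."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")  # reconnecting clients can send it back as Last-Event-ID
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps escapes newlines inside strings, so one data line is enough.
    lines.append("data: " + json.dumps(payload, separators=(",", ":")))
    return ("\n".join(lines) + "\n\n").encode("utf-8")


print(sse_frame({"txid": "ab12", "rbf": False}, event="whale", event_id="42"))
```

Compact separators keep each frame small, which is the point of shipping pre-computed events instead of raw P2P traffic.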

Regarding differentiating an OTC transaction vs. a Cold-Wallet Sweep:
A deterministic, 100% guarantee is impossible before on-chain forensic tracing is complete, but the engine relies on probabilistic clustering.

We look at Node IP Origin Clustering and Coin Days Destroyed (CDD):

1. Node IP Origin Clustering: If the massive non-RBF transaction originates from the exact same node IP as a standard, RBF-enabled exchange hot-wallet sweep, it heavily skews the probability toward an internal exchange cold vault transfer rather than an external OTC settlement.

2. Coin Days Destroyed: We track the age of the UTXOs being spent. Massive CDD spikes clustered with specific node IPs usually indicate deep cold storage movement rather than active OTC desk turnover.
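Coin Days Destroyed is conventionally the sum, over a transaction's spent inputs, of (amount in BTC) × (days since that output was created). A minimal Python sketch follows; the types and names are hypothetical, and for a mempool transaction the "spent at" time is assumed to be the first-seen time. Computing it requires each input's creation time, which a pure P2P listener does not get from the transaction itself; it needs a UTXO or block-height lookup, for example against a full node. How the engine turns CDD into thresholds is not described in this thread, so none is shown.

```python
from dataclasses import dataclass
from typing import List

SECONDS_PER_DAY = 86_400
SATS_PER_BTC = 100_000_000


@dataclass
class SpentInput:
    value_sat: int
    created_at: int  # unix time of the block that created the spent output


def coin_days_destroyed(inputs: List[SpentInput], spent_at: int) -> float:
    """Sum of BTC amount x days held over all inputs of one transaction."""
    total = 0.0
    for inp in inputs:
        age_days = max(0, spent_at - inp.created_at) / SECONDS_PER_DAY
        total += (inp.value_sat / SATS_PER_BTC) * age_days
    return total


ten_days = 10 * SECONDS_PER_DAY
print(coin_days_destroyed([SpentInput(100_000_000, 0), SpentInput(50_000_000, 0)], ten_days))  # 15.0
```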