March 25, 2026, 10:32:35 PM
A standard ClickHouse installation cannot handle real-time searches across billions of transactions. We have optimized this server (1.5 TB RAM) on five levels to turn it into a "time machine" for the blockchain.

1. Storage Architecture: Primary Key & Sorting Key
We moved away from default indexing. Searching for dormant addresses requires filtering by last-activity date ("Last Seen") across a massive number of unique addresses, so we use a composite sorting key: ORDER BY (LastSeenDate, AddressHash). This lets ClickHouse physically co-locate data from older transactions within the same data blocks, minimizing I/O when scanning historical epochs.

2. Sparse Indexes & Granularity
With 1.5 TB of RAM available, we set the index_granularity parameter to 8192, which allows the entire primary index to reside in memory.
· Searching a specific range of dormant wallets (e.g., those created between 2010 and 2012 with no outgoing transactions) happens without disk access, using only the sparse index in RAM.

3. In-Memory Data Processing
For the hottest datasets (e.g., lists of known vulnerable public keys or weak signature R-values), we configured tables on the Memory and Join engines.
· This eliminates serialization latency. Comparing an incoming transaction stream against a database of vulnerable patterns runs at up to 500 million rows per second.

4. Vector Instruction Optimization (SIMD)
The Intel Xeon Gold processors in this server support the AVX-512 instruction set.
· We compiled ClickHouse from source with optimization flags tailored to this server's processor architecture.
· This delivers a 2–3x performance gain when filtering data (WHERE clauses operate at the CPU register level), which is critical for rapid hash iteration and matching.
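The sorting-key design from section 1 can be sketched as a minimal DDL. The table and column names here are illustrative assumptions, not the shipped schema; only the ORDER BY clause and the index_granularity setting come from the text above:

```sql
-- Rows sharing a LastSeenDate are packed into the same granules,
-- so scans over old epochs read only a few data parts.
CREATE TABLE transactions
(
    LastSeenDate  Date,
    AddressHash   FixedString(32),
    TxHash        FixedString(32),
    ValueSatoshi  UInt64
)
ENGINE = MergeTree
ORDER BY (LastSeenDate, AddressHash)
SETTINGS index_granularity = 8192;  -- one primary-index mark per 8192 rows
```

Leading the key with LastSeenDate is the design choice that makes date-range pruning cheap; AddressHash as the second key column keeps rows for one address adjacent within a date.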
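The dormant-wallet range scan from section 2 would then look roughly like this (a sketch against the assumed schema above; the date bounds are the example given in the text):

```sql
-- Because LastSeenDate is the first sorting-key column, ClickHouse can
-- discard whole granules using only the in-RAM sparse index before
-- touching disk.
SELECT AddressHash
FROM transactions
WHERE LastSeenDate BETWEEN '2010-01-01' AND '2012-12-31';
```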
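The in-memory lookup sets from section 3 can be sketched with ClickHouse's Join engine. Memory and Join are real ClickHouse engines, and joinGet is a real function, but the table and column names here are assumptions for illustration:

```sql
-- A hot set of weak signature R-values held fully in RAM.
-- joinGet requires a table created with ENGINE = Join(ANY, LEFT, key).
CREATE TABLE weak_r_values
(
    RValue FixedString(32),
    Source String
)
ENGINE = Join(ANY, LEFT, RValue);

-- Checking a streamed transaction batch against the set, no disk I/O:
SELECT t.TxHash, joinGet('weak_r_values', 'Source', t.RValue) AS hit
FROM incoming_tx AS t;
```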
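For the source-built binary described in section 4, the compiled-in flags can be inspected from SQL. system.build_options is a standard ClickHouse system table; the exact row names it exposes vary between versions, so treat this as a sanity-check sketch:

```sql
-- Verify the binary was compiled with architecture-specific flags
-- (e.g. AVX-512 targets) rather than a generic distribution build.
SELECT name, value
FROM system.build_options
WHERE name IN ('CXX_FLAGS', 'BUILD_TYPE');
```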
5. Data Synchronization: Zero-Lag ETL
The server ships with a fully configured pipeline:
· Bitcoin Core Node → Custom ETL → ClickHouse
We use buffered inserts (the Buffer engine), so the node writes data without locks while the analytics engine reads with minimal latency: under 100 ms from block arrival to SQL availability.

The schema design, the selection of optimal compression codecs (ZSTD for historical data, LZ4 for hot data), and the calibration of the MergeTree engines took over 200 hours of testing.

Ready to Profit: You don't waste time syncing 700+ GB of data or fighting memory leaks. Simply power on the server, and within five minutes you can run your first SQL query to discover targets.
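The buffered-insert stage of the pipeline above can be sketched with ClickHouse's Buffer engine. The engine and its nine parameters (num_layers, min/max time, rows, and bytes thresholds) are real; the threshold values chosen here are assumptions, not the shipped configuration:

```sql
-- Inserts from the ETL land in RAM and are flushed to the underlying
-- MergeTree table in the background once any max_* threshold is hit,
-- so the writer never blocks on merges.
CREATE TABLE transactions_buffer AS transactions
ENGINE = Buffer(currentDatabase(), transactions,
                16,        -- num_layers
                1, 10,     -- min_time, max_time (seconds)
                10000,   1000000,   -- min_rows,  max_rows
                1000000, 10000000); -- min_bytes, max_bytes
```

Reads against transactions_buffer transparently merge buffered rows with the on-disk table, which is what keeps block-to-SQL latency low.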
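The ZSTD-vs-LZ4 codec split mentioned above maps to per-column CODEC clauses in the DDL. The split itself is from the text; the table, columns, and ZSTD level are illustrative assumptions:

```sql
-- ZSTD: better ratio for rarely-read historical columns;
-- LZ4: faster decompression for columns hit by hot queries.
CREATE TABLE tx_history
(
    BlockTime     DateTime        CODEC(Delta, ZSTD(3)),
    AddressHash   FixedString(32) CODEC(ZSTD(3)),
    ValueSatoshi  UInt64          CODEC(LZ4)
)
ENGINE = MergeTree
ORDER BY BlockTime;
```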