I read the paper "Bandwidth-Efficient Transaction Relay in Bitcoin" (the Erlay project) with interest. It seems quite flawed, and I believe I can offer a better approach.
In what respect? The core reconciliation mechanism in Erlay is arguably close to information-theoretically optimal: if peer A wants to tell peer B about the IDs A knows and B doesn't, it need only send about as much data as the number of missing IDs, no matter how many IDs they have in common, even though A has no idea which transactions B is missing.
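To make that intuition concrete, here is a toy Python sketch (my own illustration, not Erlay's actual construction, which uses BCH-based sketches via the minisketch library) of the simplest case, a single missing ID: each peer folds its entire set into one constant-size digest by XOR, the shared elements cancel, and the difference pops out no matter how many IDs the sets have in common.

    import hashlib
    from functools import reduce

    def txid(data: bytes) -> int:
        # Toy 64-bit transaction ID derived from SHA-256.
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def xor_sketch(ids) -> int:
        # Fold an arbitrarily large set into a single constant-size word.
        return reduce(lambda a, b: a ^ b, ids, 0)

    shared = {txid(str(i).encode()) for i in range(10_000)}  # known to both peers
    extra = txid(b"only-A-has-this")                         # the one ID B is missing

    sketch_a = xor_sketch(shared | {extra})
    sketch_b = xor_sketch(shared)

    # All common elements cancel; only the difference survives.
    assert sketch_a ^ sketch_b == extra

Recovering d missing IDs requires d independent equations of this kind; minisketch generalizes the idea (PinSketch over a finite field) so the sketch size stays proportional to the number of differences, not the set size.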
Reconciling relay sets is almost functionally equivalent to reconciling mempools. However, mempools can have persistent differences, due to policy differences between nodes or to ties in preference between conflicting transactions. Such a persistent difference risks continually leaking bandwidth, paying for the same entries on every reconciliation round, which I think makes reconciling relay sets more attractive.
Let's reconsider full mempool reconciliation. We can train a reconciliation AI model on an ordinary CPU; the training times are as follows:
What precisely is this machine learning model learning to predict? Absent such a description, your post is hard to distinguish from those of the significant number of non-technical conmen in the cryptocurrency space who just spew jargon.
Now, applying bucketing (sharding by the initial hash byte, which divides the problem into 256 "equal" subproblems) on the same CPU yields:
The author of a transaction can control the bytes of its hash, so any scheme that hopes to distribute work/load with respect to transactions is vulnerable to attack (e.g. an attacker mines a flood of transactions whose hashes all begin with a common byte).
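To illustrate how cheap such an attack is, here's a hypothetical Python sketch: grinding a malleable field of a transaction until its hash lands in a chosen bucket takes about 256 attempts on average, so flooding a single shard is essentially free.

    import hashlib
    import os

    TARGET_BUCKET = 0x00  # the shard the attacker wants to overload

    def grind_into_bucket(target: int):
        # Tweak a free-form field (standing in for a malleable part of a tx)
        # until the hash's first byte matches the target bucket.
        attempts = 0
        while True:
            attempts += 1
            candidate = b"tx-body-" + os.urandom(8)
            if hashlib.sha256(candidate).digest()[0] == target:
                return candidate, attempts

    trials = [grind_into_bucket(TARGET_BUCKET)[1] for _ in range(100)]
    print(f"average attempts per colliding tx: {sum(trials) / len(trials):.0f}")  # ~256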
If Party A sends this 66 KB (more realistically 44 KB) model to Party B, Party B can immediately respond with only the transactions that Party A doesn't have. The best part: notice that the AI model is smaller than a plain vector of hashes? It's about 3x-4x smaller.
That is a substantial amount of data, several times what the Bitcoin network uses today to relay a block in the presence of common transactions, which is itself substantially larger than we know is possible (the existing mechanism is used for implementation simplicity and to reduce the latency impact of decode time).
Checking a friend's node, just to give an example, I see that the last 7 blocks took the following numbers of bytes to relay to it: 38227, 18617, 30581, 31484, 41335, 42155, 24038 (about 32 KB on average).
reconcile(model, hash3) = false // This object is NOT a TX HASH but a Merkle tree root/inner node hash
Transaction order is a property of both the dependency graph and the miner's selection algorithm; as a result, attempting to reconcile on interior nodes will mostly just waste bandwidth, because even when the transaction sets are the same, the alignment in the tree is often different. Efficient reconciliation protocols effectively use no bandwidth for common data, so there shouldn't be any advantage to reconciling on interior nodes.
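A quick way to see the alignment problem (a hypothetical illustration using a simple pairwise SHA-256 tree, not Bitcoin's exact double-SHA256 construction): two miners including the identical four transactions in different orders produce trees that share no interior node at all, so a node-level reconciler has nothing to match on.

    import hashlib

    def h(x: bytes) -> bytes:
        return hashlib.sha256(x).digest()

    def merkle_levels(leaves):
        # Build every level of a simple pairwise hash tree, leaves first.
        levels = [leaves]
        while len(levels[-1]) > 1:
            prev = levels[-1]
            levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
        return levels

    txs = [h(bytes([i])) for i in range(4)]  # the same four transactions
    tree_a = merkle_levels(txs)              # miner A's ordering
    tree_b = merkle_levels(txs[::-1])        # miner B: same txs, reversed order

    interior_a = {n for level in tree_a[1:] for n in level}
    interior_b = {n for level in tree_b[1:] for n in level}
    print(len(interior_a & interior_b))      # 0 -- no interior node in common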
Full node protocol (a code sketch of this flow follows the list):
We mine a block or sign a new TX, and add it to the chain or mempool respectively.
If it’s a block, we send it to everyone using the unsolicited ProtocolPacketUnsolicitedData packet.
Initialize reconciliation for the just-mined block (height), and perform periodic reconciliation for the next candidate block (height+1).
When a new reconciliation model (based on the Merkle root and interior nodes) is trained, we announce it to all peers via ProtocolPacketReconcileRequest.
The other party may download it using ProtocolPacketReconcileModelRequest/ProtocolPacketReconcileModelResponse.
The other party will perform reconciliation and directly send us the objects that we don't have for that specific height/candidate block.
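For concreteness, a minimal Python sketch of the flow above; the ProtocolPacket* names come from the proposal, while the packet fields, the peer interface, and train_model are my assumptions:

    from dataclasses import dataclass

    @dataclass
    class ProtocolPacketUnsolicitedData:         # push a freshly mined block
        block: bytes

    @dataclass
    class ProtocolPacketReconcileRequest:        # announce a newly trained model
        height: int
        model_id: bytes

    @dataclass
    class ProtocolPacketReconcileModelRequest:   # peer asks to download the model
        model_id: bytes

    @dataclass
    class ProtocolPacketReconcileModelResponse:  # the model itself
        model_id: bytes
        model: bytes

    def on_block_mined(peers, block, height, train_model):
        # Push the block unsolicited, then announce a reconciliation model
        # for the next candidate block (height + 1); peers that want the
        # model fetch it via the ModelRequest/ModelResponse pair.
        for peer in peers:
            peer.send(ProtocolPacketUnsolicitedData(block))
        model_id, model = train_model(height + 1)
        for peer in peers:
            peer.send(ProtocolPacketReconcileRequest(height + 1, model_id))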
This and the following text have no coverage for the case where it's not a block but a new transaction. Erlay is *not* a block relay protocol. It is a transaction relay protocol, and given my efficiency comment above, I don't think what you're imagining is particularly interesting as a block relay protocol compared to what is already deployed.