Many times in the past people have suggested using a DHT (either by name or by reinventing the concept) to store the entire blockchain, but this breaks the trust-free nature of blockchain verification. To verify blocks a node must have the full block history. With the TxIds from the longest chain, however, you cannot be spoofed if you ask an arbitrary node for the details of that txn. To create a different txn with the same TxId would require a preimage attack (strictly, a second-preimage attack) on the hash, and that is considered infeasible as long as SHA-256 is cryptographically strong. This would allow txns to be stored in a distributed, trust-free manner using a DHT (Distributed Hash Table).
The primary reason would be to reduce the minimum storage requirements of each node and allow nodes to support the network on a partial basis. Currently there is an all-or-nothing requirement. You are either a full node (cost is ~25 GB) or you do not help the network in any way (SPV nodes do not improve the security of the network).
DHT of Transactions, a Distributed Transaction Table (DTT)
The required changes would not be significant and could be done in a backwards compatible manner.
1) Block structure for DHT nodes would be changed such that it contains just the block header and tx hashes (TxId).
2) A new block message would be created which allows relaying this "reduced block" (also useful in reducing the orphan cost of blocks).
3) The full transaction would be stored in the DTT. Individual nodes would store a subset of the DTT.
4) Current "full nodes" could be converted into DHT nodes by simply maintaining a 100% share of the DHT.
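As a rough sketch of items 1) and 2), a reduced block could be nothing more than the 80-byte header plus the ordered list of TxIds. The header already commits to the TxIds through its merkle root, so a node can check a reduced block without having any of the full txns. The names ReducedBlock and is_consistent are made up for illustration; the header layout (merkle root at bytes 36 to 68) and the double SHA-256 merkle tree are the existing Bitcoin rules, and TxIds are assumed to be in internal byte order.

```python
import hashlib
from dataclasses import dataclass
from typing import List


def dsha256(data: bytes) -> bytes:
    """Double SHA-256, the hash used for TxIds and merkle nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(txids: List[bytes]) -> bytes:
    """Merkle root over TxIds; an odd level duplicates its last entry."""
    if not txids:
        raise ValueError("a block has at least one txn (the coinbase)")
    level = list(txids)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [dsha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


@dataclass
class ReducedBlock:
    """Hypothetical 'reduced block': header and TxIds, no txn bodies."""
    header: bytes        # 80-byte block header
    txids: List[bytes]   # 32-byte TxIds in block order

    def is_consistent(self) -> bool:
        # Header layout: version(4) prev_hash(32) merkle_root(32)
        # time(4) bits(4) nonce(4)
        if len(self.header) != 80:
            return False
        return self.header[36:68] == merkle_root(self.txids)
```

This only checks that the TxId list matches the header. Validating the txns themselves still needs the full txns, from the memory pool, the local cache or the DTT.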
The naive solution would be for individual nodes to drop all txns which aren't in their share and use the DTT as needed. This would be rather inefficient, as the probability of a node needing a particular txn (either internally or because it was requested by a peer) is not the same for all txns. Certain txns would always be maintained in the local cache. Those would include:
a) Txns in "recent" blocks. In the event of a reorg, updating the UTXO requires knowing the details of the txns of the block being orphaned.
b) Txns in the UTXO. These are txns which have at least one unspent output. The UTXO is needed to validate new txns and new blocks.
c) Txns in the memory pool. This is the set of txns which are valid and known to the node but not yet included in a block. If block messages include only txn hashes, the memory pool will be needed to validate a block.
d) Txns involving the user's keys. This isn't a requirement but a user has a vested interest in ensuring copies of his txns are maintained. If nothing else this may be psychologically useful to improve trust in the DHT concept.
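The four categories above amount to a simple keep-or-drop rule. Here is a minimal sketch of it, assuming the node tracks the relevant TxIds in sets. LocalState, must_keep and the 144 block window are hypothetical names and values picked for illustration, not anything in the current client.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

RECENT_WINDOW = 144  # assumption: how many blocks count as "recent"


@dataclass
class LocalState:
    """Hypothetical view of what the node already knows."""
    tip_height: int
    utxo_txids: Set[bytes] = field(default_factory=set)     # (b)
    mempool_txids: Set[bytes] = field(default_factory=set)  # (c)
    wallet_txids: Set[bytes] = field(default_factory=set)   # (d)


def must_keep(txid: bytes, block_height: Optional[int],
              state: LocalState) -> bool:
    """True if the txn belongs in the local cache regardless of DHT share.

    block_height is None for unconfirmed txns.
    """
    # (a) txns in recent blocks, needed to undo a block during a reorg
    if (block_height is not None
            and state.tip_height - block_height < RECENT_WINDOW):
        return True
    # (b), (c), (d)
    return (txid in state.utxo_txids
            or txid in state.mempool_txids
            or txid in state.wallet_txids)
```

Anything for which must_keep returns False is a candidate for being dropped, unless it falls in the node's share of the DTT.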
Understand, however, that the integrity of txns comes from their txn hash (TxId), so a "badly optimized" DHT would have suboptimal performance but not reduced security.
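To make the point concrete: a node that already has the TxId from a block on the longest chain can accept the txn body from anyone, because it only has to hash what it receives. A minimal sketch, assuming the TxId is the double SHA-256 of the raw serialized txn in internal byte order. The peer interface (a callable that takes a TxId and returns bytes or None) and the name fetch_verified are hypothetical.

```python
import hashlib
from typing import Callable, Iterable, Optional

# Hypothetical peer interface: ask for a TxId, get raw txn bytes or None.
Peer = Callable[[bytes], Optional[bytes]]


def txid_of(raw_tx: bytes) -> bytes:
    """Double SHA-256 of the raw serialized txn."""
    return hashlib.sha256(hashlib.sha256(raw_tx).digest()).digest()


def fetch_verified(txid: bytes, peers: Iterable[Peer]) -> Optional[bytes]:
    """Ask untrusted peers in turn; return the first answer that hashes
    to the expected TxId, or None if nobody supplies a valid copy."""
    for ask in peers:
        raw = ask(txid)
        if raw is not None and txid_of(raw) == txid:
            return raw
    return None
```

A lying or broken peer can waste a round trip, but it cannot get a forged txn accepted. That is the sense in which a poor DHT costs performance and not security.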
Some rough numbers of the current storage requirements:
Full raw blockchain = ~18.9 GB
Undo chain = ~2.5 GB (can this be reduced? seems undo information past say the last 144 blocks would be of little value)
Blockchain Indexes = ~2.0 GB (I haven't decoded these but find it interesting it is so large relative to the blocks)
Chainstate (UTXO) = ~400 MB (compressed format contains unspent outputs & txn metadata only)
Memory Pool = <1MB (unconfirmed txns, 538 at time of post)
Total: ~23.8 GB
The current blockchain stats:
Number of blocks: 307,394
Number of transactions: ~41,182,000
Number of unspent txns: 3,347,562 (txns with at least one unspent output)
Number of unspent outputs: 11,648,626
Breaking this down:
Size of hash-only blocks: 1,320 MB (includes headers & txn hashes)
Size of the UTXO: 400 MB (unspent outputs only)
Size of the Unspent Txns in entirety: ~1,500 MB
So the local cache storage requirements for a DHT node would be 1.7 GB (or 2.8 GB if the full txns of UTXO elements are retained). If we assume the average node maintains a 5% DHT share of the remaining txns (the bulk of the block bodies), that would be another ~1 GB. DHT nodes would also keep a local copy of the entire memory pool, but that is a rounding error on storage requirements.
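The arithmetic behind those figures, written out as a small sketch using the rough sizes quoted in this post. The function name is made up, and the DHT share is taken against the whole ~18.9 GB raw chain, which slightly overstates the "remaining txns" and so gives an upper bound.

```python
# Rough sizes from this post, in MB.
HASH_ONLY_BLOCKS_MB = 1320   # headers + txn hashes
UTXO_MB = 400                # unspent outputs only
UNSPENT_TXNS_FULL_MB = 1500  # unspent txns kept in their entirety
RAW_CHAIN_MB = 18900         # full raw blockchain


def dht_node_storage_mb(share: float, keep_full_unspent: bool) -> int:
    """Approximate storage for a DHT node holding `share` of the DTT.

    share is a fraction between 0.0 and 1.0. The DHT share is applied to
    the whole raw chain, an approximation that errs on the high side.
    """
    unspent = UNSPENT_TXNS_FULL_MB if keep_full_unspent else UTXO_MB
    local_cache = HASH_ONLY_BLOCKS_MB + unspent
    dht_share = round(share * RAW_CHAIN_MB)
    return local_cache + dht_share


print(dht_node_storage_mb(0.05, keep_full_unspent=False))  # 1320 + 400 + 945
print(dht_node_storage_mb(0.05, keep_full_unspent=True))   # 1320 + 1500 + 945
```

Either way a node with a 5% share lands somewhere under 4 GB, against ~23.8 GB for a full node today.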
This wouldn't reduce the bootstrap time (or the amount of bandwidth used) for bootstrapping full nodes. As a full node you would still need to download 100% of the txns of 100% of the blocks to verify that the longest chain is the longest valid chain. However, once bootstrapped, the ongoing resource requirements would be reduced. This would work best if the protocol contained a block message which just sent the block header & txn hashes. Currently, when a full node receives and verifies a new block which extends the longest chain, it stores the full block and removes the now spent txns from the UTXO. DHT nodes would instead record the reduced block (header & hashes only), then save their DHT share of the spent txns and discard the rest. To reduce the overhead from reorgs and provide better coverage for syncing nodes needing only the tip of the blockchain, it may make sense for DHT nodes to retain all txns for the most recent x blocks (144? about one day of blocks?).
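A rough sketch of that block handling, under several loudly stated assumptions: the block has already been fully validated, every new txn is treated as having unspent outputs, the caller supplies the list of txns whose last output this block spent, and a node's "share" is decided by comparing the first 8 bytes of the TxId against a threshold. That last rule is invented purely for illustration; a real DHT would more likely use something like distance to a node ID. DhtNode and its method names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

KEEP_RECENT_BLOCKS = 144  # assumption: retain full txns for ~1 day of blocks


class DhtNode:
    def __init__(self, share: float) -> None:
        # Illustrative share rule: keep a txn if its TxId prefix is
        # below this threshold. share=1.0 behaves like a full node.
        self.threshold = int(share * 2**64)
        self.reduced_chain: List[Tuple[bytes, List[bytes]]] = []
        self.unspent: Dict[bytes, bytes] = {}    # txid -> raw txn
        self.dht_store: Dict[bytes, bytes] = {}  # this node's DTT share
        self.recent: Deque[Dict[bytes, bytes]] = deque(
            maxlen=KEEP_RECENT_BLOCKS)

    def in_share(self, txid: bytes) -> bool:
        return int.from_bytes(txid[:8], "big") < self.threshold

    def connect_block(self, header: bytes, txs: Dict[bytes, bytes],
                      fully_spent: List[bytes]) -> None:
        """Apply an already validated block that extends the tip."""
        # Record the reduced block: header and TxIds only.
        self.reduced_chain.append((header, list(txs)))
        # Keep full txns of recent blocks; the oldest block's txns
        # fall out of the deque automatically.
        self.recent.append(dict(txs))
        self.unspent.update(txs)
        # Spent txns leave the unspent set; keep only our DHT share.
        for txid in fully_spent:
            raw = self.unspent.pop(txid, None)
            if raw is not None and self.in_share(txid):
                self.dht_store[txid] = raw

    def lookup(self, txid: bytes) -> Optional[bytes]:
        """Local lookup; None means the txn must come from the DTT."""
        for store in (self.unspent, self.dht_store, *self.recent):
            if txid in store:
                return store[txid]
        return None
```

Undo data for reorgs is left out here. With share=1.0 nothing is ever discarded, which is the "full node as a 100% share" case.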
If a structure like this were used for all nodes, then "full nodes" would simply be nodes participating in the txn DHT that retain a 100% copy of the full txn set. The best comparison would be to the status of a "seeder" in the BitTorrent protocol. Since all DHT nodes would retain a full copy of the UTXO, these nodes could still support SPV clients. SPV clients could actually support the network by retaining a portion of the txn set. Retaining older txns would be especially beneficial in that they could reduce the load on full nodes when the network bootstraps new full nodes.