Idea: XPIR for more private lightweight nodes

theymos (OP)

Administrator
Legendary

Offline

Activity: 5180
Merit: 12900

Idea: XPIR for more private lightweight nodes

January 23, 2017, 09:59:06 PM

Merited by ABCbits (1)

#1

Currently, all lightweight nodes have to more-or-less give a list of all of their addresses to someone else in order to find their incoming and outgoing transactions. This totally destroys their privacy because it connects all of their addresses together, even if they're behind Tor. Some wallets try to obfuscate this with bloom filters, but research has shown that this isn't all that effective at maintaining privacy. Bitcoin Core avoids this problem by downloading the entire block chain, but this uses a lot of bandwidth, and if you don't store the entire block chain, you have to download everything again if you want to rescan.

I found that there is a (beta) library called XPIR which may be able to improve this. It uses homomorphic encryption to allow for a server to have a database which clients can query, but without the server or any listeners knowing which entries in the database the client is querying/receiving. So clients could ask the server (encrypted), "please give me all transactions associated with address ____", and the server would be able to provide this data without knowing anything about the client's query. This would significantly improve privacy.

Running a server will be pretty resource-intensive, so this isn't something that every full node is going to be doing. However, it would be appropriate for Electrum servers, sites like GreenAddress, etc.

I haven't actually tried the software, but looking at the paper, here's how I think it'd work on a technical level:

- The database to be queried must be structured as index -> value, where indices must be sequential integers starting at 0, and clients can only query for specific indices. So the first step is for the server to publish a mapping between all possible scriptPubKey queries and their indicies. This can be done by hashing each scriptPubKey seen in the block chain (maybe with wildcards to handle cases like stealth addresses), sorting the hashes, and giving each one a sequential index. All clients need to download this initial mapping. If the hash is 128 bits and the index is 64 bits (both of these could maybe be shortened), currently the mapping would be around 10.3 MB with today's 429k unique addresses. Everyone gets the same mapping, so there's no need to download it in any special anonymous way.
- The efficiency of the database goes down with the number of entries. Here, "entries" is the number of possible scriptPubKey queries. So each server should actually have several XPIR databases, each of which will contain perhaps around 100,000 possible scriptPubKeys. Info about which scriptPubKey ranges belong to which database will also have to be downloaded by clients, but this is a negligible amount of data. Each database needs to start its indices at 0, so the client will have to adjust the global index down according to which database it's querying. Segregating the scriptPubKeys like this allows the server to know that the client has some address in that range, but there are enough addresses to make this not-very-useful. The client could also send dummy queries to disrupt even this.
- The client would just send one query for every address in its wallet. For each one, they'd get all of the address's transactions along with its merkle branch linking it to the block chain. From page 14 of the XPIR paper, it looks like you can very roughly expect to download your transaction data at 12.5 kB/s, with a latency between first query and initial transaction data of 10-500 seconds depending mainly on the client's upload speed. These speeds seem OK.
- If clients are only interested in addresses that currently have BTC, you can significantly reduce the size of the database, and therefore increase efficiency. Similarly, the database size can be reduced for clients that have been online for a while and are therefore only interested in new/recent blocks.
- I'm not sure how many clients a reasonable server could comfortably handle. If it's not very many, then servers might need to charge for this. Maybe they could have a free queue, and then the possibility of paying to jump the queue. They could use blinded tokens to accept micropayments anonymously. The current XPIR library seems not to be very optimized, especially for multiple simultaneous clients, so performance improvements are probably possible.

Thoughts?

1NXYoJ5xU91Jp83XfVMHwwTUyZFK64BoAD

gmaxwell

Moderator
Legendary

Offline

Activity: 4158
Merit: 8382

Re: Idea: XPIR for more private lightweight nodes

January 24, 2017, 12:50:01 AM

Merited by ABCbits (1)

#2

Search bitcoin-wizards logs for percy++ which is another PIR library; there have been some similar proposals to yours.

We added functionality in the Bitcoin Core RPC 'importprunedfunds' that could be used to import transaction data retrieved from this kind of PIR scanning service. I was thinking very much along the same lines as you with a free queue and then the ability to use blind tokens to add priority to entries in the queue.

It's possible to avoid downloading that initial index:

Create that index.

now build a binary tree over it e.g. the rood node says [go left for <= 1WWWW..., right for the rest]
then the next layer down has [go left for <= 1FFFFF..., right for rest] [go left for <= 1kkkkkk... right for rest]
then the later below that would have four entries and so on..

each row of the tree in a separate database... then you just make multiple queries with the PIR hiding which of the nodes you are reading..
until you get to the bottom layer and know what index your spk is at.

In practice sending the whole index is probably fine.

I'm not sure if XPIR does something special here, but normally all the results have to be the same size.. so the fact that there are wildly different numbers of transactions connected to addresses has to have something done about it.

theymos (OP)

Administrator
Legendary

Offline

Activity: 5180
Merit: 12900

Re: Idea: XPIR for more private lightweight nodes

January 24, 2017, 01:46:02 AM

#3

Quote from: gmaxwell on January 24, 2017, 12:50:01 AM

Search bitcoin-wizards logs for percy++ which is another PIR library; there have been some similar proposals to yours.

I looked at percy++, but it wasn't immediately clear to me what version of single-server, computation PIR it was using. XPIR had a nice paper that I could read. percy++ could be better, I don't know. Though it seems to be focusing mainly on the version of PIR where multiple servers are involved and assumed to be non-cooperating, which I feel is an assumption that's too difficult to ensure in practice for this situation.

Quote

We added functionality in the Bitcoin Core RPC 'importprunedfunds' that could be used to import transaction data retrieved from this kind of PIR scanning service. I was thinking very much along the same lines as you with a free queue and then the ability to use blind tokens to add priority to entries in the queue.

Perfect! I was thinking that an RPC method like that was exactly what was needed. Non-PIR services to provide that data will also be useful in the near future -- loss of rescan was a huge downside to pruning.

Quote

It's possible to avoid downloading that initial index:

Cool!

Quote

I'm not sure if XPIR does something special here, but normally all the results have to be the same size.. so the fact that there are wildly different numbers of transactions connected to addresses has to have something done about it.

It does seem to have constant size, but it doesn't seem that the size affects efficiency much. The entries also seem to be streamed to clients in order, since the XPIR paper talks about streaming video. So if a record is 10MB composed of 250kB followed by 9.75MB of zeroes, the client could stop reading at some random point after receiving their 250kB.

If that doesn't work, another thing that occurred to me is that you could have lists of txids, and then another PIR database for fetching transactions. Or even just the simple piece of data that all of the address's transactions appear between blocks x and y, which the client could then go download elsewhere. Perhaps the database could degrade through all three of these possibilities depending on how much transaction data there is.

I'm glad this stuff is on your mind!

1NXYoJ5xU91Jp83XfVMHwwTUyZFK64BoAD

gmaxwell

Moderator
Legendary

Offline

Activity: 4158
Merit: 8382

Re: Idea: XPIR for more private lightweight nodes

January 24, 2017, 09:56:33 AM

#4

Percy++ can do single server... though it's much slower than the multi-server mode. I think it may be using the same kind of crypto as xpir for the the single server... not sure which of the two is faster. (I wasn't aware of xpir before your message.)

All of this stuff is super slow, but fast enough to be usable. It just needs a lot of glue work.

With respect to the multiserver assumption-- well if due to performance reasons your options are multi-server OR no PIR-- the multiserver is still better than nothing. It also has the advantage that no crypto break could ever compromise your privacy, where the single server PIR has the unfortunate property that if the server logs your queries and then later it's discovered the crypto was weak, your privacy could be lost.

I also think Percy++ can combine the two. E.g. multiserver, but if they cooperate then you're still protected but I could be misremembering. The elaborations are kind of moot when AFAIK no one uses PIR for anything anywhere.

HI-TEC99

Legendary

Offline

Activity: 2772
Merit: 2846

Re: Idea: XPIR for more private lightweight nodes

January 24, 2017, 10:55:46 AM

#5

I haven't had time to study whether XPIR or Percy++ rely on "key switching", but last August some cryptographers released a paper identifying weaknesses in a homomorphic encryption scheme by Zhou and Wornell. The paper explains how the encryption scheme relies on "key switching" and demonstrates three attacks on it.

The paper makes some assumptions that the exploits rely on, and they might not be relevant to XPIR or Percy++ even if those encryption schemes rely on "key switching".

The paper's abstract is here.

http://eprint.iacr.org/2016/775

The paper is here.

http://eprint.iacr.org/2016/775.pdf

These are the exploits and the assumptions made.

Quote

...we are able to mount three attacks. The first attack enables to recover a secret plaintext message broadcasted to multiple users. The second attack performs a chosen ciphertext key recovery attack and it was implemented and verified. The last attack is a related chosen plaintext decryption attack

Quote

Attack on Broadcast Encryption

A valid scenario for this attack would be one where a service provider has to send an activation key to its customers. The activation key is the same for all the customers. In such case, when the service provider has to send the encrypted activation key to enough customers, an unauthorized user could recover the activation key.

Quote

Chosen Ciphertext Attack

In this attack we assume that the adversary has access to an oracle that decrypts a given ciphertext. His goal is to retrieve the secret key S.

Quote

Chosen Related Plaintext Attack

In the attack we propose, we assume there is a secret x and that the adversary can obtain Enc(x + y) for many chosen y values. The purpose is to recover x. Interestingly, the adversary will take advantage in getting the encryption of x + y which is not Enc(x) + Enc(y). In clear, we will see that the y → M(I − 2J)Bin(carry(wx, wy), ℓ) function leaks.

HI-TEC99

Legendary

Offline

Activity: 2772
Merit: 2846

Re: Idea: XPIR for more private lightweight nodes

January 24, 2017, 06:13:16 PM
Last edit: January 25, 2017, 12:14:04 AM by HI-TEC99

#6

Quote from: gmaxwell on January 24, 2017, 09:56:33 AM

Percy++ can do single server... though it's much slower than the multi-server mode. I think it may be using the same kind of crypto as xpir for the the single server... not sure which of the two is faster. (I wasn't aware of xpir before your message.)

This is a list of PIR implementations that includes other schemes besides XPIR and the schemes Percy++ supports.

https://en.wikipedia.org/wiki/Private_information_retrieval

Quote

PIR implementations

Numerous Computational PIR and Information theoretic PIR schemes in the literature have been implemented. Here is an incomplete list:

Percy++[10] includes implementations of [AG 2007, DGH 2012, CGKS 1998, Goldberg 2007, HOG 2011, LG 2015].
RAID-PIR[14] is an implementation of the ITPIR scheme of [DHS 2014].
XPIR[15] is a fast CPIR implementation [ABFK 2014].
Popcorn[16] is a PIR implementation tailored for media [GCMSAW 2016].
upPIR[17] is an ITPIR implementation [Cappos 2013].

Popcorn is a modified version of Percy++. Unfortunately I haven't found a download for it yet. However page 7 of its paper states RAID-PIR has similar speeds to Percy++. Regrettably the RAID-PIR github hasn't been updated in two years, and the upPIR site's code hasn't been updated in 4 years.

http://www.cs.utexas.edu/~trinabh/papers/popcorn-PIR-nsdi16.pdf

Quote

Percy++’s CGKS ITPIR implementation is one of the fastest implementations for two-server ITPIR. An alternative is the CGKS implementation from RAID-PIR [35]

Page 13 lists other PIR implementations prior to popcorn.

Quote

Prior PIR implementations. Many of the CPIR and ITPIR protocols described above have been implemented. The Percy++ library [51] contains several of them [15, 31, 37–39, 52, 57, 74]. Also, [56] is implemented as a fork of Percy++, RAID-PIR [35] is implemented on top of upPIR [27], and there are numerous CPIR implementations [14, 33, 45, 76, 78, 86, 93, 104, 106, 110], among which XPIR [14] is the fastest. Popcorn incorporates some of these implementations as modules: it uses the XPIR library for CPIR and borrows the CGKS ITPIR [31] code from Percy++. Sections 5 and 6 empirically or analytically compare Popcorn against these prior implementations.

Unfortunately popcorn's privacy guarantee can be compromised by state-level adversaries using colluding organisations to serve objects.

Quote

Fundamental limitations. We see three main limitations. First, because Popcorn’s overheads grow linearly with the number of objects, it has no hope of scaling to YouTube-size libraries. Second, organizations that serve objects can collude to compromise Popcorn’s privacy guarantee. Admittedly, an assumption of no collusion may be unrealistic against state-level adversaries that can compromise multiple organizations (or already have). Third, Popcorn cannot support forward seeks during playback: such user actions alter the download pattern in a contentdependent way, thus revealing information.

These are a few other links to popcorn information.

http://keylorch.github.io/cs616-paper-presentation/#/23

https://github.com/keylorch/cs616-paper-presentation/blob/master/example/itpir.py

https://blog.acolyer.org/2016/04/01/scalable-and-private-media-consumption-with-popcorn/

Another PIR scheme implementation is the Apache Pirk framework.

https://pirk.incubator.apache.org/

Quote

Pirk is seeded with the Wideskies PIR algorithm which employs Paillier homomorphic encryption

It was initiated by the NSA and given to Apache in July 2016.

https://pirk.incubator.apache.org/pirk_origin

Quote

The initial scalable PIR algorithms and implementations of Pirk were developed at the National Security Agency and contributed to Apache via a Software Grant Agreement

http://events.linuxfoundation.org/sites/events/files/slides/ApachePirk-ApacheConEurope-20161115.pdf