I don't believe that nodes will ever give their peer list to their peers, at least not intentionally (assuming there are no flaws in the node implementation).
I meant that a person running this proxy node would also get the peer information from a node they are running simultaneously somewhere else, with the peers collected organically via peer discovery. The proxy nodes could then be gathered into some list online, and a load balancer placed in front of them to distribute request traffic across the proxy nodes and, through them, across geographically distant full nodes. Only the load balancer costs would be incurred in that case, and a rate limiter could be set to avoid being bombarded with requests and paying for too much bandwidth.
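To make the rate limiter idea concrete, here is a minimal token-bucket sketch in Python. It is only an illustration of one common approach, not part of any existing proxy node: the class name `TokenBucket` and its parameters are hypothetical, and a real deployment would more likely rely on the rate limiting built into the load balancer.

```python
import time


class TokenBucket:
    """Allow roughly `rate` requests per second, with bursts up to `capacity`.

    Hypothetical helper for illustration only.
    """

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum tokens that can accumulate
        self.clock = clock               # injectable clock, handy for testing
        self.tokens = float(capacity)    # start with a full bucket
        self.last = clock()

    def allow(self):
        """Return True if one more request may be served right now."""
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A proxy would call `allow()` once per incoming request (for example, with one bucket per client IP) and reject the request when it returns False, which caps the upload bandwidth a single client can consume.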
This is confusing. Are you saying that you are already running a node? Why would you not query the node you already have running?
If you want to run a (very) light node that connects to other nodes, the only real information you can obtain is historical blocks (by block number) and newly broadcast transactions. I don't believe a peer node will ever provide the information returned by most RPC commands.
Historical transactions and blocks (assuming the proxy node impl can deduce the nodes running with -txindex using trial and error). But then again, it's much better than paying for an API token to fetch the equivalent data.
Yes, nodes will send you existing blocks. However, if you try to use other nodes as an API to find transactions, the number of nodes that are willing to provide historical blocks to "new" nodes will go down. Sending data costs more than receiving it (most cloud service providers do not charge for data downloaded, but do charge for data uploaded).
Also, trying to find a particular transaction or a particular block will be inefficient this way if you don't know which block you are looking for.
If you want to create a new implementation of Core that allows an arbitrary user to obtain information that you might get from RPC commands -- the only information I can think of would be information about address balances and transactions -- you could connect to an Electrum server to get this information.
I know the Electrum API is always an option, but it can't fetch blocks, and it can only get the transaction history of an address you put inside your wallet. It's very cumbersome if you just want to analyze a group of transactions with different features (source address not being one of them).
If this is your goal, I would suggest that you put all transactions and blocks in a SQL database and search accordingly. This assumes that you are searching for something that is not supported by an RPC command, and if that is the case, you should just use a full node that has the entire blockchain downloaded.
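As a rough sketch of what that could look like, here is a small example using Python's built-in sqlite3 module. The table layout, the column names (`fee_sat`, `vsize`, `n_inputs`, `n_outputs`), and the helper names `open_index` and `find_transactions` are all assumptions made up for illustration; the features you actually store would depend on what you want to search by.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) a small index of blocks and transactions.

    The schema is hypothetical; add whatever per-transaction features
    you intend to filter on.
    """
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS blocks (
            height INTEGER PRIMARY KEY,
            hash   TEXT NOT NULL UNIQUE,
            time   INTEGER NOT NULL
        );
        CREATE TABLE IF NOT EXISTS transactions (
            txid         TEXT NOT NULL,
            block_height INTEGER NOT NULL REFERENCES blocks(height),
            fee_sat      INTEGER,
            vsize        INTEGER,
            n_inputs     INTEGER,
            n_outputs    INTEGER,
            PRIMARY KEY (txid, block_height)
        );
        CREATE INDEX IF NOT EXISTS tx_by_height
            ON transactions(block_height);
    """)
    return conn


def find_transactions(conn, min_outputs, min_fee_sat):
    """Return txids with at least `min_outputs` outputs and `min_fee_sat` fee."""
    rows = conn.execute(
        "SELECT txid FROM transactions "
        "WHERE n_outputs >= ? AND fee_sat >= ? ORDER BY txid",
        (min_outputs, min_fee_sat),
    )
    return [row[0] for row in rows]
```

Once the tables are filled from a full node, a search such as `find_transactions(conn, 10, 1000)` is a single local query rather than a series of requests to peers.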
If you are searching for transactions based on some criteria, and are making multiple queries, you will ultimately have to access the same block multiple times. The time it takes to query a block from a peer is going to be much longer than the time to query a block locally. So you will see performance problems if the blockchain is not stored on the machine running the queries (or accessible to it via a storage bucket).
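The repeated-access point can be illustrated with a small in-memory cache, sketched below in Python. It assumes a caller-supplied `fetch` function that retrieves a block from wherever it lives (a peer, a bucket, a local node); both `BlockCache` and `fetch` are hypothetical names, and the cache only softens the problem, since the first access to each block still pays the full remote cost.

```python
from collections import OrderedDict


class BlockCache:
    """Keep the most recently used blocks in memory (simple LRU cache).

    `fetch` is a hypothetical callable: block_hash -> block data.
    """

    def __init__(self, fetch, max_blocks=1000):
        self.fetch = fetch
        self.max_blocks = max_blocks
        self.blocks = OrderedDict()   # block_hash -> block, oldest first
        self.remote_fetches = 0       # how many times we had to call fetch

    def get(self, block_hash):
        if block_hash in self.blocks:
            # Cache hit: mark as most recently used, no remote call needed.
            self.blocks.move_to_end(block_hash)
            return self.blocks[block_hash]
        # Cache miss: pay the (slow) remote cost once and remember the result.
        block = self.fetch(block_hash)
        self.remote_fetches += 1
        self.blocks[block_hash] = block
        if len(self.blocks) > self.max_blocks:
            # Evict the least recently used block.
            self.blocks.popitem(last=False)
        return block
```

Comparing `remote_fetches` against the total number of `get` calls over a batch of queries gives a quick estimate of how much traffic to peers a local copy would save.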