Ok, to render with modern algorithms you typically need the whole scene, whether you render only a single pixel, a 64x64 bucket or a 4k frame.
Just hold your fingers up like a square frame in front of you (away from the screen) - you can probably still see indirect light from the sun, even if it is hidden behind clouds. Similarly, a lot of things that are hidden by your fingers still influence the "bucket" that you see through them (shadows, reflections, ...).
In the same way, even if you only render a small part of a scene, you need to consider a lot of objects that are not visible in it --> you need to transmit a lot of data.
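As a toy illustration (the scene and values are made up, this is not a real renderer): even shading a single pixel can touch geometry the camera never sees directly, because an indirect bounce can hit a light source hidden "behind your fingers".

```python
# Toy scene: the pixel directly sees a diffuse wall; an off-screen
# lamp (invisible to the camera) lights that wall indirectly.
scene = {
    "wall": {"visible_to_camera": True,  "emission": 0.0, "albedo": 0.8},
    "lamp": {"visible_to_camera": False, "emission": 5.0, "albedo": 0.0},
}

def shade_pixel():
    """Shade one pixel with a single indirect bounce and record
    which scene objects had to be loaded to do it."""
    touched = set()
    # Primary ray hits the wall...
    touched.add("wall")
    radiance = scene["wall"]["emission"]
    # ...then one indirect bounce samples the rest of the scene -
    # here it (deterministically) finds the hidden lamp.
    touched.add("lamp")
    radiance += scene["wall"]["albedo"] * scene["lamp"]["emission"]
    return radiance, touched

radiance, touched = shade_pixel()
# Both objects were needed, even though only the wall is visible.
print(radiance, sorted(touched))
```

So the node rendering this one pixel still needs the lamp's data, which is why you cannot just ship the visible subset of the scene.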
You described quite well how render farms work (splitting the job into frames and letting each node work on its frames alone). Still, this usually requires so much data that most render farms operate in a LAN, with big storage servers in the backend for large textures etc.
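The splitting itself is the trivial part - a minimal sketch (function name and round-robin policy are my own choices, not any particular farm's scheduler):

```python
def split_frames(start, end, nodes):
    """Assign each frame in [start, end] round-robin to a node.
    Note: every node still needs the FULL scene plus all textures,
    which is why data transfer dominates, not the splitting."""
    jobs = {n: [] for n in range(nodes)}
    for i, frame in enumerate(range(start, end + 1)):
        jobs[i % nodes].append(frame)
    return jobs

print(split_frames(1, 10, 3))
# frames 1-10 over 3 nodes: node 0 -> [1, 4, 7, 10],
# node 1 -> [2, 5, 8], node 2 -> [3, 6, 9]
```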
Another issue is that you have to hand out assets you might not want the person rendering the frame to see (e.g. a frame from the upcoming 2nd Hobbit movie). Homomorphic encryption is not that far yet, but it might enable even more general services in the future (you would just load a blob of encrypted data, process it and send it back without ever knowing what you actually computed).
A commercial platform using BURP, as far as I know:
http://www.renderfarm.fi/