I'm not clear who needs to create the hash-cash "token" - do you mean any node that connects to you that is not an already known peer (i.e. new incoming connections)?
The current rule is that anyone who asks for historical block data is send the blocks.
Hostile nodes could make it harder for new nodes to connect by trying to drain the bandwidth of QoS nodes.
I'd suggest that the algorithm be different to the standard PoW one though otherwise the "proof" will be of little issue to existing ASIC hardware.
I think ASICS are sufficiently hard coded that even a tiny change to the hash would make them useless.
The token could be formatted as a block header but with all the fields other than the previous pointer and merkle root set to zero.
It is likely that any byte array that doesn't have 80 bytes cannot be ASIC mined. Swapping the endianess of the output is another option.
The server could send a challenge for each block requested.
B = block hash
N = 32 byte nonce
S = salt (per peer)
M = bit mask
R = rotation
T = target
rotate(SHA256(B XOR N XOR S) XOR M, R) < T
The server could increase difficulty as it runs out of spare bandwidth.
A node could connect to 8 peers and ask for blocks from the peer with the highest T (lowest difficulty).