... are they all the same with a slight difference?
Yes, only the last 4 bytes (32 bits) are incremented locally by the GPU. That's a little over 4 billion (2^32) hashes. If your GPU can do 500 MH/s, then all possible hashes will be checked in about 8.6 seconds (2^32 / 500,000,000), and then the mining software should hand the GPU the next "work". In one "work" unit you may find a few nonces that produce a proper value (a share), but there's a possibility that there will be none. It comes down to luck and variance, but on average it should work out to about 1:1, i.e. one share per work unit.
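To make the arithmetic and the "only the last 4 bytes change" part concrete, here is a rough Python sketch. It assumes the usual Bitcoin layout of an 80-byte header with the little-endian nonce in the final 4 bytes and a double SHA-256 hash; the function names (`seconds_per_work_unit`, `header_with_nonce`, `scan_nonces`) and the all-zero header prefix are made up for illustration and are not taken from any real miner.

```python
import hashlib
import struct

NONCE_SPACE = 2 ** 32  # a 4-byte nonce gives 4,294,967,296 possible values


def seconds_per_work_unit(hashrate_hps: float) -> float:
    """Time needed to try every nonce of one work unit at the given hash rate."""
    return NONCE_SPACE / hashrate_hps


def header_with_nonce(header_prefix: bytes, nonce: int) -> bytes:
    """Append the nonce (little-endian, 4 bytes) to the fixed 76-byte prefix."""
    if len(header_prefix) != 76:
        raise ValueError("expected the first 76 bytes of an 80-byte header")
    return header_prefix + struct.pack("<I", nonce)


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as used for Bitcoin block hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def scan_nonces(header_prefix: bytes, target: int, start: int, count: int) -> list:
    """Return the nonces in [start, start + count) whose hash is <= target."""
    found = []
    for nonce in range(start, start + count):
        digest = double_sha256(header_with_nonce(header_prefix, nonce))
        # The hash is compared to the target as a little-endian 256-bit integer.
        if int.from_bytes(digest, "little") <= target:
            found.append(nonce)
    return found


if __name__ == "__main__":
    print(round(seconds_per_work_unit(500e6), 2))  # time for one unit at 500 MH/s
    prefix = bytes(76)  # hypothetical placeholder, not a real block header
    a = header_with_nonce(prefix, 0)
    b = header_with_nonce(prefix, 1)
    print(a[:76] == b[:76], a[76:].hex(), b[76:].hex())
```

A real GPU miner does this loop in parallel and far faster, of course; the point is only that the first 76 bytes stay fixed within a work unit while the nonce runs through its 2^32 values.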
How is one "work unit" different from another "work unit"?
So for every 500 MH/s you get one "work unit" roughly every 8 seconds. How are these different from each other?