just because you can store 256 versions/address types doesn't mean you have to use all of them today. you can use the ones you need now and leave the rest for future expansion, or use fewer bits for the version/address type, like 4 or 6.
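to make the "fewer bits" idea concrete, here is a minimal sketch that keeps a 4-bit version in the high half of one byte and leaves the low 4 bits reserved. the names (`VERSION_BITS`, `pack_header`, `unpack_header`) and the layout are assumptions made up for illustration, not taken from any real address format.

```python
# Hypothetical layout: high 4 bits = version, low 4 bits = reserved for later use.
VERSION_BITS = 4
RESERVED_BITS = 8 - VERSION_BITS
VERSION_MAX = (1 << VERSION_BITS) - 1    # 15
RESERVED_MAX = (1 << RESERVED_BITS) - 1  # 15


def pack_header(version, reserved=0):
    """Pack a version and a reserved field into a single byte value (0..255)."""
    if not 0 <= version <= VERSION_MAX:
        raise ValueError("version does not fit in %d bits" % VERSION_BITS)
    if not 0 <= reserved <= RESERVED_MAX:
        raise ValueError("reserved does not fit in %d bits" % RESERVED_BITS)
    return (version << RESERVED_BITS) | reserved


def unpack_header(byte):
    """Split a byte value back into (version, reserved)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not a single byte")
    return byte >> RESERVED_BITS, byte & RESERVED_MAX
```

with this layout only 16 versions are available, but the other half of the byte stays free for whatever is needed later.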
electrum actually uses a system where the first 4 bits tell you how long the version number is. it's very flexible and perhaps overkill.
a single byte would allow for 256 different address types (2^8).
not if you use it as a "flag". then each bit has a separate meaning and can be combined with other bits, so one byte gives you 8 independent flags rather than 256 unrelated types. if you use integer values (1, 2, 3, ...) then you'll have to define a lot of different cases (1->x, 2->y, 3->z, ... 50->x+y, 60->x+z, ...). that makes implementation a nightmare.
in contrast, using 0b00000001->x and 0b00000010->y is enough, because x+y is just 0b00000011, which you get with a simple OR (x|y).
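a minimal sketch of the flag approach, assuming three made-up features `X`, `Y` and `Z` (placeholders, not real address features). combining is an OR, and checking for a feature is an AND against its mask, so no per-combination table is needed.

```python
# Hypothetical feature flags: one bit per feature.
X = 0b00000001
Y = 0b00000010
Z = 0b00000100


def has_feature(flags, feature):
    """Return True if every bit of `feature` is set in `flags`."""
    return (flags & feature) == feature


def describe(flags):
    """List the names of the known features set in `flags`."""
    names = {"x": X, "y": Y, "z": Z}
    return [name for name, bit in names.items() if has_feature(flags, bit)]


combined = X | Y  # 0b00000011, no separate "x+y" case to define
```

usage: `has_feature(combined, X)` is true and `has_feature(combined, Z)` is false. adding a fourth feature only means defining one more bit, while the integer-value scheme would need a new entry for every combination it takes part in.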
what's the advantage of doing bitwise operations like these on the version bits? seems like a poor use of space to me.