1) Why are ASIC chips sold in bulk rather than individually?
Because they are produced in bulk. Per chip, producing a single one is far more expensive than producing 1000. In mining devices, it's usually more efficient to solder multiple chips onto a single board than to have only one chip per board.
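A simplified cost model makes the bulk argument concrete. Assume that a production run of N chips has a fixed up-front cost F (design, masks, fab setup) plus a variable cost v for each chip made. The symbols are illustrative assumptions, not figures from any real fab. The cost per chip is then:

```latex
c(N) = \frac{F}{N} + v, \qquad c(1) = F + v, \qquad \lim_{N \to \infty} c(N) = v
```

The fixed cost F dominates a small run and is spread thin over a large one, which is why chips are made and sold in batches.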
2) Why are some Bitcoin miners USB sticks, while others are hard disks?
There's no such thing as a hard disk Bitcoin miner.
Miners need a way to communicate with the Bitcoin network. The simpler devices typically do this through a host computer that they connect to over USB. The host computer controls the miner and handles network communication. More advanced devices have their own network port and a small amount of built-in hardware that handles network communication, so they can run without a dedicated computer.
3) What's the difference between CPU, GPU and ASIC mining, and why is ASIC mining the best?
In general, the more specialized a piece of hardware is, the more efficient it is. A CPU has a very broad range of applications, and because of this it's typically not especially fast at any of them. A GPU has a much narrower range of applications, as it only works well for heavily parallelized operations (in this case: doing the same operations simultaneously for a lot of different input values). That makes it much faster than a CPU for these types of operations, but more limited in what it can do.
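Bitcoin mining fits the "same operations, many input values" pattern well. A miner repeatedly hashes a block header with double SHA-256, changing only the 4-byte nonce, until the result falls at or below a target. The sketch below is a simplified, single-process illustration under stated assumptions: the 76-byte header prefix and the easy target are hypothetical placeholders, and the helper names `search_nonces` and `split_range` are made up for this example. Each nonce is tested independently of the others, so the range can be split into chunks and handed to many parallel workers (GPU threads or ASIC hashing cores); here the chunks are simply run one after another.

```python
import hashlib
import struct
from typing import List, Optional, Tuple


def sha256d(data: bytes) -> bytes:
    """Double SHA-256, the hash function Bitcoin uses for block headers."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def search_nonces(header_prefix: bytes, target: int, start: int, count: int) -> Optional[int]:
    """Try nonces start .. start+count-1 and return the first whose hash is <= target.

    header_prefix stands in for the first 76 bytes of an 80-byte header
    (a hypothetical placeholder here); the nonce fills the last 4 bytes.
    """
    for nonce in range(start, start + count):
        header = header_prefix + struct.pack("<I", nonce)  # nonce is little-endian
        # Bitcoin compares the hash as a little-endian 256-bit integer.
        if int.from_bytes(sha256d(header), "little") <= target:
            return nonce
    return None  # no valid nonce in this chunk


def split_range(start: int, count: int, workers: int) -> List[Tuple[int, int]]:
    """Split a nonce range into (start, count) chunks, one per hypothetical worker."""
    step = -(-count // workers)  # ceiling division
    return [(s, min(step, start + count - s)) for s in range(start, start + count, step)]


if __name__ == "__main__":
    prefix = bytes(76)   # hypothetical all-zero header fields
    target = 2 ** 248    # deliberately easy target (about 1 in 256 hashes succeeds)
    # Every chunk is independent work; a GPU or ASIC would run them at the same time.
    for chunk_start, chunk_count in split_range(0, 4096, workers=4):
        found = search_nonces(prefix, target, chunk_start, chunk_count)
        print(chunk_start, chunk_count, found)
```

Real mining targets are vastly harder than the one used here, which is why raw hashes per second (and hashes per joule) decide which hardware wins.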
An ASIC (Application Specific Integrated Circuit) is built for a single task. All of its logic is baked into the chip rather than programmed, as it is with a CPU or GPU. This means that for anything except the single task it was built for, an ASIC is nothing more than an expensive paperweight. However, because it is designed for only one task, it can be optimized extremely well for that task. So an ASIC will perform much better than a CPU or GPU at the task it was designed for, and will be literally useless for any other task.