- Download a block (bandwidth intensive)
- Verify its validity (CPU intensive, plus indexed queries)
- Insert the block into the DB and update the index (IO intensive, and probably O(log n) in the index size, perhaps even O(n); I'm not sure)
- Proceed to the next block (a sketch of this loop follows below)
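To make the loop concrete, here is a minimal C++ sketch of it. DownloadBlock, VerifyBlock and ConnectBlock are hypothetical stand-ins for the real networking, validation and database code, not the actual functions:

```cpp
// Sketch of the per-block sync loop described above. DownloadBlock,
// VerifyBlock and ConnectBlock are hypothetical stand-ins for the real
// networking, validation and database code; their bodies are stubs here.
#include <cstdint>
#include <optional>
#include <vector>

struct Block {
    std::vector<uint8_t> raw;  // serialized block data
};

// Stand-in for the network handler: next block, or nothing when done.
std::optional<Block> DownloadBlock() { return std::nullopt; }

// Stand-in for validity checking (CPU intensive, plus indexed queries).
bool VerifyBlock(const Block&) { return true; }

// Stand-in for inserting the block into the DB and updating the index.
bool ConnectBlock(const Block&) { return true; }

int main() {
    // Download, verify, connect, proceed -- one block at a time.
    while (std::optional<Block> block = DownloadBlock()) {
        if (!VerifyBlock(*block)) break;   // invalid block: stop
        if (!ConnectBlock(*block)) break;  // DB failure: stop
    }
}
```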
The blocks are downloaded into a memory buffer by the network handler thread. This buffer is 10 megabytes, so it can certainly contain several blocks simultaneously.
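For illustration, a bounded buffer along those lines could look like the sketch below. This is just the idea, not the actual bitcoind code; the 10 MB cap is the only number taken from the description above:

```cpp
// Sketch of a bounded download buffer (not the actual bitcoind code):
// the network handler thread pushes raw blocks until a 10 MB cap is hit,
// and the processing thread pops them one at a time.
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

class BlockBuffer {
    static constexpr std::size_t kCapacity = 10 * 1024 * 1024;  // 10 MB

    std::mutex mutex_;
    std::condition_variable not_full_, not_empty_;
    std::deque<std::vector<unsigned char>> blocks_;
    std::size_t bytes_ = 0;  // total size of buffered blocks

public:
    // Network thread: wait until the block fits (an oversized block is
    // still admitted into an empty buffer so it cannot stall forever).
    void Push(std::vector<unsigned char> block) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [&] {
            return blocks_.empty() || bytes_ + block.size() <= kCapacity;
        });
        bytes_ += block.size();
        blocks_.push_back(std::move(block));
        not_empty_.notify_one();
    }

    // Processing thread: wait until at least one block is available.
    std::vector<unsigned char> Pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [&] { return !blocks_.empty(); });
        std::vector<unsigned char> block = std::move(blocks_.front());
        blocks_.pop_front();
        bytes_ -= block.size();
        not_full_.notify_one();
        return block;
    }
};
```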
The initial verification is very fast.
The bulk of the verification is done while updating the index: we need to a) find the previous outputs, b) check that they are not yet marked spent, c) evaluate scripts and signatures, d) mark the previous outputs spent, and e) mark the new outputs spendable. This has to be done transaction by transaction (mostly), as each transaction can legally spend the outputs of the previous ones in the same block. Optimizations are certainly possible, but it's not as easy as splitting verification and DB updates into two threads.
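As an illustration of steps a) through e), here is a hypothetical per-transaction connect loop over an in-memory unspent-output map. The types and the CheckScript helper are made-up stand-ins, not the real data structures:

```cpp
// Hypothetical per-transaction index update, following steps a)-e) above.
// OutPoint/Tx/CheckScript are illustrative stand-ins, not the real code;
// coinbase handling is omitted.
#include <map>
#include <string>
#include <tuple>
#include <vector>

struct OutPoint {
    std::string txid;  // hash of the transaction whose output is spent
    unsigned int n;    // index of that output
    bool operator<(const OutPoint& o) const {
        return std::tie(txid, n) < std::tie(o.txid, o.n);
    }
};

struct TxOut { long long value; std::vector<unsigned char> scriptPubKey; };
struct TxIn  { OutPoint prevout; std::vector<unsigned char> scriptSig; };
struct Tx    { std::string txid; std::vector<TxIn> vin; std::vector<TxOut> vout; };

// Stand-in for script and signature evaluation (step c).
bool CheckScript(const TxIn&, const TxOut&) { return true; }

// Connect one block's transactions against the unspent-output index.
// This must run transaction by transaction: tx N may spend outputs
// created by tx N-1 in the same block.
bool ConnectTransactions(const std::vector<Tx>& txs,
                         std::map<OutPoint, TxOut>& unspent) {
    for (const Tx& tx : txs) {
        for (const TxIn& in : tx.vin) {
            auto it = unspent.find(in.prevout);             // a) find previous output
            if (it == unspent.end()) return false;          // b) unknown or already spent
            if (!CheckScript(in, it->second)) return false; // c) scripts and signatures
            unspent.erase(it);                              // d) mark it spent
        }
        for (unsigned int i = 0; i < tx.vout.size(); ++i)
            unspent[OutPoint{tx.txid, i}] = tx.vout[i];     // e) mark new outputs spendable
    }
    return true;
}
```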
By the way, Matt Corallo has a branch that does split the initial verification and connecting/indexing steps into separate threads, and this does indeed seem to improve throughput a bit.
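The idea is a classic producer/consumer pipeline: one thread verifies blocks and queues them, a second thread connects them to the index. The sketch below uses made-up names to illustrate it; it is not Matt's actual code:

```cpp
// Illustration of splitting verification and connecting into two threads
// (made-up names, not the actual branch): verification of later blocks
// overlaps with connecting/indexing of earlier ones.
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct Block { std::vector<unsigned char> raw; };

std::queue<Block> verified;  // blocks that passed verification
std::mutex m;
std::condition_variable cv;
bool done = false;

bool VerifyBlock(const Block&)  { return true; }  // stand-in: CPU-bound checks
void ConnectBlock(const Block&) {}                // stand-in: IO-bound indexing

void VerifyThread(std::vector<Block> blocks) {
    for (Block& b : blocks) {
        if (VerifyBlock(b)) {
            std::lock_guard<std::mutex> lock(m);
            verified.push(std::move(b));
            cv.notify_one();
        }
    }
    { std::lock_guard<std::mutex> lock(m); done = true; }
    cv.notify_one();
}

void ConnectThread() {
    for (;;) {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [] { return !verified.empty() || done; });
        if (verified.empty()) return;  // producer finished, queue drained
        Block b = std::move(verified.front());
        verified.pop();
        lock.unlock();
        ConnectBlock(b);  // overlaps with verification of later blocks
    }
}

int main() {
    std::thread producer(VerifyThread, std::vector<Block>(8));
    std::thread consumer(ConnectThread);
    producer.join();
    consumer.join();
}
```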