How does the wallet software know which private keys to generate,
Most wallets follow standards (such as BIP44, BIP49 and BIP84) that define which BIP32 derivation path to use for each child key.
Sometimes the software has to ask you for the address type in order to pick that derivation path, or it lets you enter the path manually.
Some wallets (like Electrum) use a modified mnemonic that already includes that information.
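
As a rough illustration of how the address type selects the path, here is a minimal sketch. The purpose numbers and the path layout come from BIP44, BIP49 and BIP84; the function name `derivation_path` and the dictionary labels are made up for this example and are not taken from any particular wallet.

```python
# Purpose numbers defined by BIP44, BIP49 and BIP84.
# The dictionary keys are illustrative labels, not a wallet's actual option names.
PURPOSE_BY_ADDRESS_TYPE = {
    "p2pkh": 44,        # legacy addresses (BIP44)
    "p2sh-p2wpkh": 49,  # nested SegWit addresses (BIP49)
    "p2wpkh": 84,       # native SegWit addresses (BIP84)
}


def derivation_path(address_type, index, change=False, account=0, coin_type=0):
    """Build m / purpose' / coin_type' / account' / change / address_index.

    coin_type 0 is Bitcoin mainnet; change is 0 for receiving and 1 for change.
    """
    purpose = PURPOSE_BY_ADDRESS_TYPE[address_type]
    return f"m/{purpose}'/{coin_type}'/{account}'/{int(change)}/{index}"


print(derivation_path("p2wpkh", 0))              # m/84'/0'/0'/0/0
print(derivation_path("p2pkh", 5, change=True))  # m/44'/0'/0'/1/5
```

A wallet that lets you enter the path manually is simply skipping this lookup and using whatever string you give it.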
which are in use, and when to stop generating them?
The software usually has a hard-coded cap (often called the gap limit) that decides how many keys to derive at first, for example 20. It then checks whether those addresses have been used before. If they have, it derives more keys and checks those too, stopping once it reaches a set of keys that have never been used.
The software usually has an option to set this value manually (e.g. the gap limit setting in Electrum).
Surely it could generate an infinite number?
Technically the number of keys is finite: a BIP32 child index is a 32-bit number, so each parent key has at most 2^32 children.
Say I'd done 10,000 transactions in my wallet so there were 10,000 private keys. How would it know just to generate those 10,000?
It generates 20 keys, sees that they are used, derives 20 more, and continues until it reaches the 10,020th key and sees that the last 20 keys were never used.
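
The stopping rule can be sketched roughly like this. It is a simplified model, not any wallet's actual code: `scan_used_indexes` and the `is_used` callback are hypothetical names, and in a real wallet `is_used` would derive the address at that index and look up its history on a node or server. This version counts consecutive unused addresses rather than working in fixed batches of 20, which stops at the same point in the example above.

```python
from typing import Callable, List, Tuple


def scan_used_indexes(is_used: Callable[[int], bool],
                      gap_limit: int = 20) -> Tuple[List[int], int]:
    """Walk address indexes 0, 1, 2, ... until gap_limit unused ones appear in a row.

    Returns the list of used indexes and the total number of indexes checked.
    """
    used = []
    index = 0
    unused_run = 0  # consecutive unused addresses seen so far
    while unused_run < gap_limit:
        if is_used(index):
            used.append(index)
            unused_run = 0  # any used address resets the gap counter
        else:
            unused_run += 1
        index += 1
    return used, index


# Simulate a wallet whose first 10,000 addresses all have history.
used, checked = scan_used_indexes(lambda i: i < 10_000)
print(len(used), checked)  # 10000 10020
```

This also shows why the gap setting matters: if the only used indexes were 0, 5 and 30, a gap limit of 20 would stop at index 25 and never see index 30, while a larger limit such as 50 would find it.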