This will not be possible, because there are too many interfering frequencies.
Imagine: you're in a supermarket, and there's the beep from the cash register, people talking, a baby crying, the video camera that beeps all the time, some advertising, ...
There are so many sources that can emit sound at the frequency of the signal that transmission via sound will not work reliably.
-> It might work if you use high frequencies, since low ones are more easily disturbed. Best would be a frequency so high that humans can't hear it.
(Ooh, but dogs can hear it and bite.)
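A minimal sketch of the high-frequency idea in Python, using binary frequency-shift keying (FSK): each bit becomes a short burst of one of two tones. The 44.1 kHz sample rate, the 18 kHz / 19 kHz tones and the 10 ms bit length are hypothetical choices, not taken from any real app or standard. At 44.1 kHz the highest frequency a digital signal can represent is 22.05 kHz (the Nyquist limit), so near-ultrasonic tones fit in the signal itself; whether a particular phone's speaker and microphone reproduce them well is a separate question.

```python
import math

SAMPLE_RATE = 44_100      # Hz, a common audio sample rate (assumption)
FREQ_ZERO = 18_000        # Hz, tone for bit 0 (hypothetical choice)
FREQ_ONE = 19_000         # Hz, tone for bit 1 (hypothetical choice)
SYMBOL_SECONDS = 0.01     # 10 ms per bit (hypothetical choice)


def fsk_modulate(bits, sample_rate=SAMPLE_RATE, f0=FREQ_ZERO, f1=FREQ_ONE,
                 symbol_seconds=SYMBOL_SECONDS):
    """Turn a sequence of 0/1 bits into audio samples in [-1, 1] using
    binary frequency-shift keying: one sine tone per bit value."""
    nyquist = sample_rate / 2
    for f in (f0, f1):
        # A sampled signal cannot represent tones at or above half the sample rate.
        if not 0 < f < nyquist:
            raise ValueError(f"{f} Hz is not below the Nyquist limit of {nyquist} Hz")
    n = int(round(sample_rate * symbol_seconds))  # samples per bit
    samples = []
    for bit in bits:
        freq = f1 if bit else f0
        # Each bit's tone restarts at zero phase (simplification, see note below).
        samples.extend(math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n))
    return samples


signal = fsk_modulate([1, 0, 1, 1])
print(len(signal))
```

This sketch restarts each tone at zero phase, which creates jumps between bits; those jumps spread energy into other frequencies and can be heard as clicks, so a more careful version would keep the phase continuous.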
Not possible on most phones, if I'm not mistaken. Their speakers are intended only for the human vocal range, so they don't reproduce very low or very high frequencies well. High is probably easier than low because the speakers are physically small, but probably not so high that humans can't hear it.
As for interference, I believe there are established signal processing methods for dealing with it, as long as it doesn't completely drown out the signal. It shouldn't be a problem if the two devices are close together and pointed at each other.
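One example of the kind of established method mentioned above is the Goertzel algorithm, which measures the energy at a single frequency and is classically used for DTMF (touch-tone) decoding. The sketch below assumes, hypothetically, that the sender uses an 18 kHz tone for bit 0 and a 19 kHz tone for bit 1, each lasting 441 samples (10 ms at 44.1 kHz), and that the receiver already knows where each bit starts; real systems would also need synchronization and error correction. The demo adds Gaussian noise with more power than the tone itself, and the detector should still recover the bits because it only looks at the two frequencies it cares about.

```python
import math
import random


def goertzel_power(samples, sample_rate, freq):
    """Return the power of `samples` at `freq` using the Goertzel algorithm."""
    n = len(samples)
    k = round(n * freq / sample_rate)   # nearest DFT bin to the target frequency
    coeff = 2 * math.cos(2 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_bit(block, sample_rate, f0, f1):
    """Return 1 if the bit-1 tone is stronger than the bit-0 tone, else 0."""
    p0 = goertzel_power(block, sample_rate, f0)
    p1 = goertzel_power(block, sample_rate, f1)
    return 1 if p1 > p0 else 0


SAMPLE_RATE = 44_100
F0, F1 = 18_000, 19_000     # hypothetical bit-0 / bit-1 tones
N = 441                     # samples per bit (10 ms), hypothetical

rng = random.Random(42)     # fixed seed so the demo is repeatable


def noisy_tone(freq):
    """One bit's worth of tone plus Gaussian noise whose power exceeds the tone's."""
    return [math.sin(2 * math.pi * freq * i / SAMPLE_RATE) + rng.gauss(0.0, 1.0)
            for i in range(N)]


sent = [1, 0, 1, 1, 0, 0, 1, 0]
received = [detect_bit(noisy_tone(F1 if b else F0), SAMPLE_RATE, F0, F1) for b in sent]
print(received == sent)
```

Comparing the two frequencies against each other, rather than against a fixed threshold, keeps the decision independent of how loud the signal arrives.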
What gets me, thinking about it more, is that there might not be a practical situation where such a transfer could be authenticated anyway.