"Hello, my name is David Weekly and this is a test of speech quality
audio coding. The purple cat, masked, made an indelible impression on the
clandestine cohorts." - a random sentence with crisp consonants
MS Audio v4.0
- 5kbps, 8kHz, mono (9 KB)
- 10kbps, 11kHz, mono (16 KB)
- 10kbps, 22kHz, mono (16 KB)
- 16kbps, 16kHz, mono (24 KB)
- 16kbps, 44kHz, mono (24 KB)
- 22kbps, 44kHz, mono (33 KB)
- 22kbps, 44kHz, stereo (33 KB)
- 32kbps, 32kHz, mono (48 KB)
- 32kbps, 44kHz, mono (48 KB)
- 40kbps, 44kHz, mono (60 KB)
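As a rough sanity check on the sizes listed above, the payload of a constant-bitrate file is just bitrate times duration divided by eight. The sketch below assumes a clip of about 12 seconds and 1 KB = 1000 bytes; neither figure is stated in the text, both are inferred from the 16-40kbps entries. The function name `expected_size_bytes` is purely illustrative.

```python
def expected_size_bytes(bitrate_kbps: float, duration_s: float) -> float:
    """Payload size of a constant-bitrate stream, ignoring any header overhead."""
    return bitrate_kbps * 1000 * duration_s / 8


# Assumption: clip length inferred from the listed sizes, not given in the text.
CLIP_SECONDS = 12.0

for kbps in (5, 10, 16, 22, 32, 40):
    size_kb = expected_size_bytes(kbps, CLIP_SECONDS) / 1000  # 1 KB taken as 1000 bytes
    print(f"{kbps:>2} kbps -> {size_kb:.1f} KB")
```

Under these assumptions the 16, 22, 32, and 40kbps entries match the listed sizes, while the 5 and 10kbps files come out somewhat smaller than listed, which may simply reflect fixed file overhead.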
The 5kbps version, while comprehensible, is unpleasant to listen to;
it has an echo, as if I were talking through a tin can. The 10kbps
version at 22kHz sounds rather robotic. Reducing the sampling rate
to 11kHz produced a much more pleasant version, as there was less
high-frequency "drowning" of the signal. The 16kbps versions at 16
versus 44kHz illustrate the same effect. It seems clear that if
one is to use low bitrates with MS Audio, it's better to
use a low sample rate as well. The 32kbps version still adds a rather
annoying "swish" to my voice, as if there were a thick piece of fabric
over my lips as I was speaking. At 40kbps it becomes listenable, though
some high-frequency artifacts remain.
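One crude way to see why a lower sample rate helps at a fixed bitrate is to look at how many bits are available per audio sample. The sketch below is only a back-of-the-envelope budget using the nominal rates quoted above; it is not how MS Audio (or any perceptual codec) actually allocates bits, and the helper name `bits_per_sample` is hypothetical.

```python
def bits_per_sample(bitrate_kbps: float, sample_rate_khz: float) -> float:
    """Average bit budget per mono sample at a given bitrate and sample rate."""
    return bitrate_kbps / sample_rate_khz


# Pairs discussed in the text: (bitrate in kbps, sample rate in kHz, nominal values).
for kbps, khz in ((10, 22), (10, 11), (16, 44), (16, 16)):
    print(f"{kbps}kbps at {khz}kHz: {bits_per_sample(kbps, khz):.2f} bits/sample")
```

Halving the sample rate doubles the average budget per sample, which is at least consistent with the 11kHz and 16kHz encodings sounding cleaner than their higher-rate counterparts.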
MP3
The VBR (lowest) file here performed admirably against the constant
bitrate samples. One notices a high-pitched ringing in the 24-48kbps
encodings. The 16kbps encoding is listenable, but sounds like I'm
speaking through a plastic tube of sorts.
Barath Raghavan wrote in to say that Fraunhofer's encoder offers better
quality than Xing for low, constant bitrate speech. As soon as I get my
hands on some samples, I will post them.
Alternative Speech Codecs
The MetaVoice codec performed outstandingly, intelligibly reproducing
my voice at a mere 2400 bits per second. While it sounds somewhat like a
Speak 'n Spell instead of me, the text comes across fairly clearly. I was
pleasantly impressed. The L&H CELP did not perform too well (IMHO) against
G.723.1 and ACELP.net, and while ADPCM offered high quality, the size was
nearly two orders of magnitude larger than MetaVoice.
ACELP.net would be my recommended codec here for 5-15kbps speech
coding, with MetaVoice handling anything beneath that.
RealAudio
RealAudio did pretty well: the 16 kbps (24 KB) and 32 kbps (55 KB)
encodings were both pleasant to listen to, even if not transparent (i.e.,
there were noticeable but acceptable errors in the audio).
Recommendations
For encoding speech,
I recommend the following codecs for the specified bitrates:
| codec | bitrate |
| MetaVoice | < 5kbps |
| ACELP.net | 5kbps - 15kbps |
| RealAudio | 15kbps - 50kbps |
| MP3 VBR | > 50kbps |
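The recommendations above can be sketched as a small lookup. This is only an illustration: the function name `recommend_codec` is hypothetical, the sub-5kbps choice follows the prose recommendation of MetaVoice, and because the table's ranges touch at 15 and 50kbps, the sketch assumes those boundary values fall in the lower range.

```python
def recommend_codec(bitrate_kbps: float) -> str:
    """Map a target speech bitrate (in kbps) to the codec recommended above."""
    if bitrate_kbps <= 0:
        raise ValueError("bitrate must be positive")
    if bitrate_kbps < 5:
        return "MetaVoice"
    if bitrate_kbps <= 15:  # assumption: exactly 15kbps stays with ACELP.net
        return "ACELP.net"
    if bitrate_kbps <= 50:  # assumption: exactly 50kbps stays with RealAudio
        return "RealAudio"
    return "MP3 VBR"


for target in (2.4, 8, 32, 64):
    print(f"{target}kbps -> {recommend_codec(target)}")
```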