<david.weekly.org> July 25 2008
codecs: Speech Tests
 

speech sample (2.0 MB)

"Hello, my name is David Weekly and this is a test of speech quality audio coding. The purple cat, masked, made an indelible impression on the clandestine cohorts." - a random sentence with crisp consonants

MS Audio v4.0

The 5 kbps version, while comprehensible, is unpleasant to listen to; it echoes, as if I were talking through a tin can. The 10 kbps version at 22 kHz sounds rather robotic. Reducing the sampling rate to 11 kHz produced a much more pleasant version, as there was less high-frequency "drowning" of the signal. The same effect shows up in the 16 kbps versions at 16 kHz versus 44 kHz. It seems clear that if one is to use low-bitrate signals with MS Audio, it's better to use a low sample rate as well. The 32 kbps version still adds a rather annoying "swish" to my voice, as if there were a thick piece of fabric over my lips as I was speaking. At 40 kbps it becomes listenable, though some high-frequency artifacts remain.
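One rough way to see why a lower sample rate helps at low bitrates is to look at how many coded bits the encoder can spend per input sample. The following is a minimal sketch, not a description of how MS Audio actually allocates bits: `bits_per_sample` is a hypothetical helper, the clip is assumed to be mono, and the sample rates are taken as nominal round values (22 kHz as 22050 Hz, and so on).

```python
def bits_per_sample(bitrate_kbps: float, sample_rate_hz: int) -> float:
    """Average number of coded bits available per input sample (mono assumed)."""
    return bitrate_kbps * 1000 / sample_rate_hz


# Bitrate / sample-rate pairs from the MS Audio tests described above.
# The exact Hz values are assumed nominal rates, not confirmed by the text.
pairs = [(10, 22050), (10, 11025), (16, 44100), (16, 16000)]

for kbps, hz in pairs:
    print(f"{kbps} kbps at {hz} Hz: {bits_per_sample(kbps, hz):.2f} bits/sample")
```

At a fixed bitrate, halving the sample rate doubles this per-sample budget, which is consistent with the lower-rate encodings sounding cleaner. It ignores psychoacoustic modelling and framing overhead, so it is only a back-of-the-envelope comparison.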

MP3

The VBR (lowest) file here performed admirably against the constant-bitrate samples. One notices a high-pitched ringing in the 24-48 kbps encodings. The 16 kbps encoding is listenable, but sounds like I'm speaking through a plastic tube of sorts.

Barath Raghavan wrote in to say that Fraunhofer's encoder offers better quality than Xing for low, constant bitrate speech. As soon as I get my hands on some samples, I will post them.

Alternative Speech Codecs

The MetaVoice codec performed outstandingly, intelligibly reproducing my voice at a mere 2400 bits per second. While it sounds somewhat like a Speak & Spell instead of me, the text comes across fairly clearly. I was pleasantly impressed. The L&H CELP did not perform too well (IMHO) against G.723.1 and ACELP.net, and while ADPCM offered high quality, its files were nearly two orders of magnitude larger than MetaVoice's.

ACELP.net would be my codec of choice here for 5-15 kbps speech coding, with MetaVoice handling anything beneath that.

RealAudio

RealAudio did pretty well: the 16 kbps (24 KB) and 32 kbps (55 KB) encodings were both pleasant to listen to, even if not transparent (i.e., there were noticeable but acceptable errors in the audio).
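The file sizes in parentheses follow roughly from bitrate times clip length. As a sanity check, here is a small sketch that works backwards from size and bitrate to the implied duration. It assumes a constant bitrate, 1 KB = 1024 bytes, and no container overhead, none of which the text confirms; `implied_duration_s` is a hypothetical helper.

```python
def implied_duration_s(size_kb: float, bitrate_kbps: float) -> float:
    """Clip length in seconds implied by a file size and a constant bitrate.

    Assumes 1 KB = 1024 bytes and ignores any header or container overhead.
    """
    size_bits = size_kb * 1024 * 8
    return size_bits / (bitrate_kbps * 1000)


# (file size in KB, bitrate in kbps) for the two RealAudio encodings above.
for size_kb, kbps in [(24, 16), (55, 32)]:
    seconds = implied_duration_s(size_kb, kbps)
    print(f"{size_kb} KB at {kbps} kbps: about {seconds:.1f} s")
```

Both encodings come out at a similar clip length, as one would expect for the same recording; the remaining gap is plausibly header overhead and rounding in the reported sizes.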


Recommendations

For encoding speech, I recommend the following codecs for the specified bitrates:

codec        bitrate
MetaVoice    < 5 kbps
ACELP.net    5 - 15 kbps
RealAudio    15 - 50 kbps
MP3 VBR      > 50 kbps
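
The recommendations can also be written as a small lookup. This is only a sketch: `recommend_codec` is a hypothetical function name, the sub-5 kbps slot is taken to be MetaVoice as in the Alternative Speech Codecs section, and the handling of the exact boundary values (5, 15, and 50 kbps) is an assumption, since the ranges above do not say which side each boundary belongs to.

```python
def recommend_codec(bitrate_kbps: float) -> str:
    """Return the speech codec recommended above for a target bitrate.

    Boundary handling is an assumption: 5 and 15 kbps go to the higher
    band, and exactly 50 kbps stays with RealAudio.
    """
    if bitrate_kbps <= 0:
        raise ValueError("bitrate must be positive")
    if bitrate_kbps < 5:
        return "MetaVoice"
    if bitrate_kbps < 15:
        return "ACELP.net"
    if bitrate_kbps <= 50:
        return "RealAudio"
    return "MP3 VBR"


for target in (2.4, 8, 32, 64):
    print(f"{target} kbps -> {recommend_codec(target)}")
```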

  
  content & layout © copyright 1995-2008 -{ david e weekly }-