Super Wideband or Super “Hype-band”?
When VoIP applications like softphones started to use wideband codecs in 2003, they gave a major boost to the VoIP market. The move from narrowband codecs, which carry 3.4 or 4 kHz of audio bandwidth, to wideband codecs, which carry 7 or 8 kHz, was a giant step in perceived voice quality and totally changed people’s views on VoIP’s legitimacy.
Today there is a lot of talk about HD audio (usually referred to as wideband) and super wideband codecs that use 14 or 16 kHz of bandwidth. I have been listening to different audio and music samples at 3.4, 7, 8, 14, 16, and 22 kHz to get a better understanding of the quality differences. As anyone who has tried Skype or Google Talk can attest, there is obviously a big difference when going from narrowband (3.4 kHz sampled at 8 kHz) to wideband speech (7 or 8 kHz sampled at 16 kHz). The bigger question is: can people hear the difference between 7, 8, 14, 16, and 22 kHz in a normal voice conversation?
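The bandwidth and sample-rate pairs above follow the Nyquist rule of thumb: a sample rate of fs can represent audio up to fs / 2. Here is a minimal sketch of that arithmetic. The function name is made up for illustration, and the 32 kHz rate for super wideband is an assumption (the post only gives 14 or 16 kHz of bandwidth, not a sample rate).

```python
# Nyquist rule of thumb: a sample rate of fs can represent audio up to fs / 2.
# The 32 kHz entry is an assumption; the post only states 14 or 16 kHz bandwidth.


def max_audio_bandwidth_hz(sample_rate_hz):
    """Return the theoretical upper bandwidth limit for a given sample rate."""
    return sample_rate_hz / 2


SAMPLE_RATES_HZ = {
    "narrowband": 8000,
    "wideband": 16000,
    "super wideband (assumed)": 32000,
}

for label, rate in SAMPLE_RATES_HZ.items():
    limit_khz = max_audio_bandwidth_hz(rate) / 1000
    print(f"{label}: sampled at {rate / 1000:g} kHz, bandwidth up to {limit_khz:g} kHz")
```

The usable bandwidth of a real codec tends to sit a little below this limit, which is consistent with the 3.4 kHz (out of 4) and 7 kHz (out of 8) figures quoted above.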
In my opinion, there is an audible difference when moving from a 7 kHz sample to a codec that supports 8 kHz, such as iSAC and iPCM-wb. However, there is a much less obvious difference between 8 kHz and 14 kHz, which I can only detect after listening to a speech sample several times. I experimented with different headsets and speakers, and found that studio-grade equipment can accentuate the quality of the super wideband samples, but not to an extent that the average user would be able to regularly appreciate. Furthermore, super wideband codecs are more susceptible to background noise, to the point that my experience was actually much worse using 14 kHz than 8 kHz when I added even low levels of ambient noise. Once I went beyond 14 kHz, I was unable to hear any difference in quality at any range for speech or even music.
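For readers who want to try a similar comparison, one way to approximate a lower-bandwidth version of a recording is to low-pass filter it. The sketch below is not the procedure used for the tests above; it is an illustrative windowed-sinc (Hamming) filter in plain Python, and every name, tone frequency, and parameter in it is an assumption. It band-limits a synthetic 32 kHz signal to 8 kHz and checks which test tones survive.

```python
import math

SAMPLE_RATE_HZ = 32000   # assumed super wideband sample rate
CUTOFF_HZ = 8000         # simulate a wideband (8 kHz) bandwidth limit
NUM_TAPS = 101           # filter length; longer means a sharper cutoff
NUM_SAMPLES = 3200       # 0.1 s, a whole number of cycles for both tones


def lowpass_taps(cutoff_hz, sample_rate_hz, num_taps):
    """Windowed-sinc low-pass FIR taps (Hamming window), unity gain at DC."""
    fc = cutoff_hz / sample_rate_hz          # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]


def band_limit(signal, taps):
    """Convolve signal with taps, keeping only fully overlapped outputs."""
    m = len(taps)
    return [
        sum(taps[i] * signal[k + m - 1 - i] for i in range(m))
        for k in range(len(signal) - m + 1)
    ]


def tone_amplitude(signal, freq_hz, sample_rate_hz):
    """Estimate the amplitude of one sinusoid by correlating against it."""
    step = 2 * math.pi * freq_hz / sample_rate_hz
    re = sum(s * math.cos(step * k) for k, s in enumerate(signal))
    im = sum(s * math.sin(step * k) for k, s in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)


# Test signal: a 1 kHz tone (inside the band) plus a 10 kHz tone (outside it).
total = NUM_SAMPLES + NUM_TAPS - 1
signal = [
    math.sin(2 * math.pi * 1000 * k / SAMPLE_RATE_HZ)
    + math.sin(2 * math.pi * 10000 * k / SAMPLE_RATE_HZ)
    for k in range(total)
]

taps = lowpass_taps(CUTOFF_HZ, SAMPLE_RATE_HZ, NUM_TAPS)
filtered = band_limit(signal, taps)

low_tone = tone_amplitude(filtered, 1000, SAMPLE_RATE_HZ)
high_tone = tone_amplitude(filtered, 10000, SAMPLE_RATE_HZ)
print(f"1 kHz amplitude after filtering:  {low_tone:.3f}")
print(f"10 kHz amplitude after filtering: {high_tone:.3f}")
```

The 1 kHz tone should pass almost unchanged while the 10 kHz tone is strongly attenuated. Applying the same filter to real speech at cutoffs of 7, 8, and 14 kHz would give a rough, codec-free approximation of the comparison described above.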
The basic conclusion of my simple tests is that quality differences between wideband and super wideband are not obvious above 8 kHz, but can be detected by using the right equipment.
There is obviously still more work to be done to provide the most robust speech quality for IP communications. Super wideband may end up pushing the market even further, but the jury is still out. Regardless of what transpires, GIPS will continue to support a wide range of codecs to provide the best user experience possible.
Tags: codecs, Google Talk, Skype, wideband
January 9th, 2009 at 6:07 am
Thanks Mats. Good analysis.
January 9th, 2009 at 6:37 am
Mats,
Thanks for the post. It seems we’re talking about similar things.
I have just questioned the same sentiment of increasing resolutions of video and the usefulness of it. I totally missed the same “trend” in voice.
Tsahi
January 9th, 2009 at 7:09 am
Tsahi,
Thanks for your comment.
> It seems we’re talking about similar things
Very much so. Great post on your side.
It is also interesting that you sometimes connect frame rate with the term HD, although HD was originally a definition of resolution. That raises the question of whether you can call a solution that provides 1 frame/s at 960×720 resolution an HD solution.
Mats
March 3rd, 2009 at 8:29 pm
[...] rates if the application in question is telephony. There are some in-the-know who argue that the difference between wideband and super-wideband is not appreciable. Of course, if the application is conveying music then that’s a whole ‘nuther [...]
March 5th, 2009 at 11:56 am
Mats,
or at least not HD.
Most of the time, HD means starting from 30 frames per second; I’d call anything lower than that disgraceful.
HD today can go up to 60 frames per second in some solutions.
The whole point in HD is stating that you’re giving something better in terms of the media’s quality. Increasing resolution but decreasing frame rate considerably won’t provide an increase in the perceived quality in most cases, so it isn’t considered as HD in my book.
March 6th, 2009 at 1:55 am
Tsahi,
I agree with you that ~30 fps and above is what gives you a smooth video experience. The higher the resolution, the more sensitive you are to the frame rate.
What is also important is to provide a flexible solution that can handle both packet loss (via error resilience) and a drop in bandwidth and still provide a good HD experience.
November 1st, 2009 at 10:02 am
I’d like to point out that 30 fps is used because beyond 26 fps the human eye/brain can’t notice the difference from reality.