Posts Tagged ‘Google Talk’

A Broken Compass

Posted by Henrik Lundin
on February 2nd, 2010 in Technology

Browsing around the papers presented at the latest NOSSDAV workshop, I found “An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google Talk, and MSN Messenger”. Having worked extensively with GIPS’ jitter buffer algorithms, and having some knowledge of Google Talk, I was intrigued by the title. The paper had some interesting experiments, but also a few giant leaps to conclusions.

The paper’s authors have created a laboratory test bench for PC softphones where they emulate different network conditions (delay, jitter, and packet loss) and measure both objective speech quality (PESQ) and end-to-end delay. Then they apply a previously proposed hybrid of PESQ and the E-Model to arrive at a score that takes both measured speech quality and delay into account. The idea is that both audio quality and end-to-end delay contribute to the total conversation experience, which is an easily supportable proposition. Finally, they derive an optimal playout buffer delay for each network condition based on this hybrid measure. I will come back to this approach later.
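To make the idea of such a hybrid score concrete, here is a minimal sketch of how one could be computed. It is not the paper's code, and the exact combination the authors use may differ. The sketch assumes that the PESQ score is mapped to an E-Model R-factor by numerically inverting the ITU-T G.107 R-to-MOS curve, and that the delay impairment follows the well-known Cole and Rosenbluth approximation. All function names are made up for illustration.

```python
def r_to_mos(r: float) -> float:
    """ITU-T G.107 mapping from the E-Model rating R to a MOS value."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + 7e-6 * r * (r - 60.0) * (100.0 - r)


def mos_to_r(mos: float) -> float:
    """Invert r_to_mos by bisection on [6.5, 100], where the curve is monotonic."""
    lo, hi = 6.5, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if r_to_mos(mid) < mos:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def delay_impairment(delay_ms: float) -> float:
    """Cole-Rosenbluth approximation of the E-Model delay impairment Id.

    delay_ms is the one-way mouth-to-ear delay in milliseconds.
    """
    extra = 0.11 * (delay_ms - 177.3) if delay_ms > 177.3 else 0.0
    return 0.024 * delay_ms + extra


def hybrid_score(pesq_mos: float, delay_ms: float) -> float:
    """Assumed hybrid: PESQ -> R, subtract the delay impairment, map back to MOS."""
    return r_to_mos(mos_to_r(pesq_mos) - delay_impairment(delay_ms))


def best_playout_delay(measurements):
    """Pick the buffer delay with the highest hybrid score.

    measurements: iterable of (buffer_delay_ms, pesq_mos, end_to_end_delay_ms).
    """
    return max(measurements, key=lambda m: hybrid_score(m[1], m[2]))[0]
```

Under these assumptions, a larger buffer raises the measured PESQ score but is penalized through the delay term, and the "optimal" buffer is simply the one where the combined score peaks. Note that the result is only as trustworthy as the PESQ input.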

The experimental part of the paper, setting up the lab and examining the three clients, seems fine to me, even though I’m not sure that their delay estimation algorithm can really cope with the rapid delay changes that modern jitter buffers apply. They also make rather wild assumptions about coding, packetization, and soundcard delays. But those are minor issues. My problem is their use of the objective hybrid model as a guide to optimality. It is widely known that PESQ is rubbish when it comes to assessing agile jitter buffers, simply because it cannot follow the swift delay adaptation. Tacking on a delay impairment factor to obtain a total user experience number frankly doesn’t improve the situation.

The authors wrap up their work by comparing the measured delays of the three clients with the delay that yields the highest score in their hybrid measure under the same network conditions. The three clients all exhibit different behavior – not very surprising since they have different jitter buffers – but none of them follows what the authors claim to be optimal. Hence, they conclude, the user experience of all three VoIP clients could be vastly improved if only the “optimal” delay were applied. Allow me to disagree.

Surely these VoIP clients can be improved, but to distrust the man-years of design and implementation, and endless hours of in-house and customer tuning and testing, I need something more than the broken compass that is PESQ.

Super Wideband or Super “Hype-band”?

Posted by Mats Perjons
on January 8th, 2009 in Technology

When VoIP applications like softphones started to use wideband codecs in 2003, they gave a major boost to the VoIP market. The improvement from narrowband codecs, which carry 3.4 or 4 kHz of audio bandwidth, to wideband codecs, which carry 7 or 8 kHz, was a giant step in terms of perceived voice quality, and totally changed people’s views on VoIP’s legitimacy.

Today there is a lot of talk about HD audio (usually referred to as wideband) and super wideband codecs that use 14 or 16 kHz of bandwidth. I have been listening to different audio and music samples at 3.4, 7, 8, 14, 16, and 22 kHz to get a better understanding of the quality differences. As anyone who has tried Skype or Google Talk can attest, there is obviously a big difference when going from narrowband speech (3.4 kHz, sampled at 8 kHz) to wideband speech (7 or 8 kHz, sampled at 16 kHz). The bigger question is: can people hear the difference between 7, 8, 14, 16, and 22 kHz in a normal voice conversation?

In my opinion, there is an audible difference when moving from a 7 kHz sample to a codec that supports 8 kHz, such as iSAC and iPCM-wb. However, the difference between 8 kHz and 14 kHz is much less obvious, and I can only detect it after listening to a speech sample several times. I experimented with different headsets and speakers, and found that studio-grade equipment can accentuate the quality of the super wideband samples, but not to an extent that the average user would be able to regularly appreciate. Furthermore, super wideband codecs are more susceptible to background noise, to the point that my experience was actually much worse at 14 kHz than at 8 kHz when I added even low levels of ambient noise. Once I went beyond 14 kHz, I was unable to hear any difference in quality for speech or even music.

The basic conclusion of my simple tests is that the quality differences above 8 kHz, between wideband and super wideband, are not obvious, but can be detected with the right equipment.

There is obviously still more work to be done to provide the most robust speech quality for IP communications. Super wideband may end up pushing the market even further, but the jury is still out. Regardless of what transpires, GIPS will continue to support a wide range of codecs to provide the best user experience possible.

Wideband Audio and Softphones

Posted by Roar Hagen
on November 25th, 2008 in Technology

It was very interesting to read Michael Graves’ “Rant” on softphones. What I found intriguing was that it seemed to be all about the need for wideband audio (and G.722). Wideband audio is defined by a 16 kHz sampling frequency (compared to the 8 kHz sampling of the narrowband audio used in regular telephony), which also doubles the audio bandwidth and provides fidelity closer to CD quality than the clunky telephony quality we are used to.
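The link between sampling frequency and audio bandwidth comes from the Nyquist limit: a signal sampled at a given rate can represent frequencies only up to half that rate. A minimal sketch of the arithmetic follows (the function name is made up for illustration). Real codecs typically use somewhat less than this theoretical maximum.

```python
def nyquist_bandwidth_hz(sample_rate_hz: float) -> float:
    """Upper bound on representable audio bandwidth for a given sampling rate."""
    return sample_rate_hz / 2.0


# Compare the theoretical limits for narrowband and wideband sampling rates.
for label, sample_rate_hz in (("narrowband", 8000), ("wideband", 16000)):
    limit_khz = nyquist_bandwidth_hz(sample_rate_hz) / 1000.0
    print(f"{label}: {sample_rate_hz} Hz sampling, up to {limit_khz:g} kHz of audio")
```

Doubling the sampling rate from 8 kHz to 16 kHz therefore doubles the upper bound from 4 kHz to 8 kHz.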

Skype is to me a softphone, and I see them as the enabler of the softphone market, providing the breakthrough for desktop VoIP. The two major reasons for Skype’s success were that their service actually worked, and their high audio quality. They were able to raise the bar on audio quality largely because they had robust wideband audio from the get-go. Codec-wise, Skype has always been proprietary (proprietary vs. standards is another long story), and I think G.722 is old circuit-switched technology that is not very well suited for packet networks.

Google Talk is in my book an even higher quality “softphone” (it seems like Google is never happy with anything but the best), and of course it also uses wideband audio (via the GIPS iSAC codec).