Posts Tagged ‘Super-wideband’

What’s New in GIPS VoiceEngine 3.4?

Posted by Henrik Andreasson
on October 16th, 2009 in Company News, Technology

A new version of GIPS VoiceEngine is about to be released. Therefore I would like to give a brief overview of some of the most important feature updates. 

From a customer point of view, the two main additions are expanded HD Voice support for Super Wideband (SWB) and support for Stereo Playout. Let me start by describing the new integration of SWB in VoiceEngine 3.4.

In order to expand support for SWB, all of VoiceEngine’s core functions now work efficiently at a 32 kHz sampling frequency. Currently, G.722.1 Annex C (or G.722.1C) is the SWB codec added in VoiceEngine 3.4, but the new architecture can accommodate any codec that uses a 32 kHz sample rate. G.722.1C provides 14 kHz audio bandwidth at a 32 kHz sample rate and supports three bitrates: 24, 32 and 48 kbps. However, as this blog has argued before, merely supporting an SWB codec is not sufficient to provide high-quality voice if the right additional components are not present. The new VoiceEngine from GIPS ensures true SWB quality since it contains SWB upgrades of all core components, such as:

  • adaptive jitter buffer and error-concealment unit (GIPS NetEQ)
  • echo cancellation
  • automatic gain control
  • noise suppression
  • voice-activity detection
  • comfort noise generation
  • mixers

In addition, the new SWB components are activated only when needed, so the footprints for the 8 and 16 kHz modes have not increased compared with the previous version of VoiceEngine.
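To put the G.722.1C numbers above in perspective, here is a minimal sketch of the per-frame arithmetic. It assumes the 20 ms frame length that G.722.1 uses (a detail not stated in this post), and the function and constant names are purely illustrative; none of this is VoiceEngine code.

```python
SAMPLE_RATE_HZ = 32_000                    # SWB sampling frequency from the post
FRAME_MS = 20                              # assumption: G.722.1 frame length
BITRATES_BPS = (24_000, 32_000, 48_000)    # the three G.722.1C bitrates


def frame_layout(bitrate_bps, sample_rate_hz=SAMPLE_RATE_HZ, frame_ms=FRAME_MS):
    """Return the size of one coded frame at the given bitrate."""
    samples = sample_rate_hz * frame_ms // 1000   # PCM samples covered by a frame
    bits = bitrate_bps * frame_ms // 1000         # coded bits per frame
    return {
        "samples": samples,
        "bits": bits,
        "bytes": bits // 8,
        "bits_per_sample": bits / samples,
    }


if __name__ == "__main__":
    for rate in BITRATES_BPS:
        print(rate, frame_layout(rate))
```

Under that assumption, each frame covers 640 samples and occupies 60, 80 or 120 bytes, which is at most 1.5 bits per sample compared with 16 bits per sample for uncompressed PCM.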

VoiceEngine’s other main new feature is added support for stereo playout. The existing version of VoiceEngine allows “stereo modifications”, such as panning, but all such actions must be performed on the client side. In the upcoming VoiceEngine 3.4, it will also be possible to play out a received stereo signal packetized according to IETF RFC 3551. In essence, this means that a GIPS client will now be able to play out a dual-channel RTP stream where, for instance, a conference server has performed some sort of spatial filtering of the conference participants. The end result is that the user gets the feeling of all participants sitting around a conference table, with their voices coming from different directions. Note that the client does not perform any stereo processing of its own; the actual stereo effect must be generated at the transmitting side. As with the new SWB capability, the new stereo features are activated and deactivated dynamically, and no new API calls are required. The only action needed on the client side to enable true stereo playout is to register a certain codec (payload type, name, etc.) as a dual-channel codec.
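For readers curious about what "packetized according to RFC 3551" means for a dual-channel stream, here is a minimal sketch. RFC 3551 specifies that, for sample-based encodings, samples from different channels taken at the same instant are packed consecutively, left channel first. The example uses the L16 encoding (16-bit signed samples in network byte order) purely as an illustration; it is not the codec discussed in this post, and the function name is hypothetical rather than part of any VoiceEngine API.

```python
import struct


def split_stereo_l16(payload: bytes):
    """Split an interleaved two-channel L16 RTP payload into (left, right).

    Assumes the RTP header has already been removed and that the payload
    holds whole stereo sample pairs (4 bytes per pair).
    """
    if len(payload) % 4 != 0:
        raise ValueError("payload must contain whole 16-bit stereo sample pairs")
    count = len(payload) // 2
    # ">" selects network (big-endian) byte order, "h" a signed 16-bit sample.
    samples = struct.unpack(f">{count}h", payload)
    left = list(samples[0::2])    # even positions: left channel
    right = list(samples[1::2])   # odd positions: right channel
    return left, right


if __name__ == "__main__":
    demo = struct.pack(">4h", 1, -1, 2, -2)   # L1, R1, L2, R2
    print(split_stereo_l16(demo))
```

A receiver that has registered the payload type as dual-channel would perform a split along these lines after decoding, then route each channel to the corresponding output.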

In addition to SWB and stereo playout, the following features will also be added to the latest release of VoiceEngine:

  • “RTP-dump” APIs, which allow received and transmitted RTP streams to be recorded in an rtpdump-compatible format.
  • Complete Windows 7 and Mac OS X 10.6 (Snow Leopard) support.
  • Support for automatic ducking, or stream attenuation, a new feature in Windows 7 that is intended for VoIP. By default, the operating system reduces the volume of an audio stream when a communication stream, such as a phone call, is received on the communication device through the computer. VoiceEngine 3.4 fully exploits this new functionality in Windows 7 and allows the user to define a certain audio device as the default communication device.
  • The possibility to build 64-bit versions of VoiceEngine for Mac OS X.

As the Technical Area Manager for voice technology, I can say that I am personally very excited about these new features, and look forward to seeing (and hearing) them enable some really cool and innovative products.

#I♥HDvoice

Posted by John Gallagher
on August 10th, 2009 in Market Trends, Technology

#I♥HDvoice and there are multiple reasons. Jeff Pulver provides a very good summary here in this video – so I won’t repeat what is already a good message.

However, I will add this: all of us in the industry need to work at raising the profile of HD voice, so that all telephone users understand what HD voice is. The fact that we can’t always understand each other on the telephone is one glaring reason. So today, GIPS is reaching out to spread the word on HD voice, and you can help too.

Let’s reach out to twitterers, colleagues, competitors, Facebook friends and IM buddies, and compile a list of mondegreens and spoonerisms.

What’s a mondegreen or spoonerism, I hear you say? A mondegreen is a phrase that has been misheard or misinterpreted. A spoonerism is an error in speech, or a deliberate play on words, in which corresponding consonants, vowels, or morphemes are switched. We’ve all heard them. Rude, fun, bizarre: they run the gamut.

I plan to compile the best Mondegreens and spoonerisms and weave them into a blog topic that we can share around to promote the need for HD voice.

It’s an easy way to reach out to twitterers (I suppose that’s a word now) and bloggers. It would be great to see if we can raise the profile of HD voice through Twitter and get people talking about the huge improvements in telecommunication that are possible with the technology that exists today.

So no matter how inane, it would be great if you could contribute, retweet #I♥HDvoice with a phrase – and what it was supposed to say.

To submit your tweet to GIPS on twitter click here.

Canned weed Buick – Yeah swing  gang!

Counterpoint: The Promise of HD Voice

Posted by John Hermansen
on March 16th, 2009 in Market Trends, Technology

A couple of months ago, my colleague Mats blogged about the dangers of overhyping “super wideband” speech. Since then, there have been a few posts on the GIPS blog discussing the technological significance, as well as the market implications, of super wideband. While these posts have certainly been accurate and well-informed, they have tended to downplay the importance of this emerging technology. Thus, I would like to highlight a few of the potential benefits of wideband technology, or HD voice, in general (as if we needed another opinion on the matter), and in turn hopefully broaden the overall discussion.

Perhaps the most promising element of the debate over HD voice is the growing awareness of the perception of voice quality. Skype’s announcement of their SILK codec, and Jeff Pulver’s plans for an HD VoIP Summit have raised the profile of the topic to the point where it is getting significant discussion. Since the inception of the company, GIPS has been arguing that voice quality matters. While there may be questions about the useful application of super wideband codecs, it only means that there is uncertainty in the market about the degree to which emerging technology should be implemented. The very fact that the discussion is taking place means that people recognize legacy solutions (e.g. those designed for the PSTN) are inadequate for the next generation of communication, and that new technology needs to be adopted to overcome these limitations.

Which leads me to my next point: an awareness of the importance of voice quality is good for the overall VoIP market. If VoIP can be recognized as not only a low-cost alternative to the PSTN, but also a mode of communication that can deliver even better quality than what people have become accustomed to, then it will truly gain mass adoption. I can’t tell you how many times I have dialed into a conference call from a landline or cell phone and struggled to keep up with the conversation due to poor quality. When people realize a better world is possible, it will be too difficult to go back to the inadequate solutions of the past. At that point, as my colleague Larry likes to say, the term “VoIP” will go away, and people will just be using “voice”.

How super-wideband is super-wideband enough?

Posted by Andrew MacDonald
on February 26th, 2009 in Technology

With the recent discussions of super-wideband codecs (cf. Mats’ post), I had the notion to find, at least for me, at what bandwidth a speech signal would be transparent to its source. Or, more accurately, at what bandwidth I could no longer declare the signal to be non-transparent.

Transparency, in the context of subjective evaluation, means to be indistinguishable from a reference. In a general sense it is ultimately the goal of all lossy compression schemes (of which most audio codecs are examples). The encoded and decoded output of audio codecs such as AAC can be transparent to reference at practical bitrates.

I prepared a small ABX test, in which the listener is presented with a series of unknown signals which they must correctly identify as a reference or alternative. I used a female source sampled at 48 kHz and the source resampled to a series of different bandwidths for the alternatives. For each bandwidth, I used 10 trials, striking a balance between statistical significance and time required for the test.

On to the results! A correct score here means I was able to correctly identify the unknown signal as reference or alternative. (The frequency listed is the signal bandwidth, which is half the sampling frequency.)

8 kHz — 10 correct
10 kHz — 10 correct
12 kHz — 10 correct
14 kHz — 10 correct
16 kHz — 8 correct (still ~95% confidence)
18 kHz — 3 correct

I stopped there, since I was clearly no longer able to make a distinction. Furthermore, I probably can’t hear much higher than 18 kHz (yet another interesting test!). To be honest, I was very surprised by the 16 kHz result; so surprised, in fact, that I repeated the test, only to arrive at the same result (which merely serves to improve the confidence…). I was, however, using a high-quality USB sound device and accurate headphones. To make the score somewhat more relevant to a practical VoIP scenario, I retook the 16 kHz test using my laptop sound card and a cheap headset. This time I scored 5 correct, which is no better than flipping a coin.
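The confidence figures for these scores can be checked with a one-sided binomial test. The sketch below assumes that a listener who hears no difference guesses correctly with probability 0.5 on each trial, and computes the chance of scoring at least as well by luck alone. The function name is illustrative, and this is a reconstruction of the arithmetic rather than the exact tool used for the test.

```python
from math import comb


def abx_p_value(correct, trials=10):
    """Probability of getting at least `correct` answers right by pure guessing."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


if __name__ == "__main__":
    for score in (10, 8, 5, 3):
        p = abx_p_value(score)
        print(f"{score}/10 correct: p = {p:.4f}, confidence = {1 - p:.1%}")
```

For 8 of 10 correct, the chance of doing that well by guessing is 56/1024, or about 5.5%, which matches the roughly 95% confidence quoted for the 16 kHz result. For 5 of 10 the figure is about 62%, and for 3 of 10 about 95%, so neither score gives any evidence of an audible difference.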

All of this should be taken with a grain of salt, but it suggests that i) it’s possible to distinguish between the reference and 16 kHz bandlimited speech given the best equipment, and ii) a typical VoIP user might consider something lower, such as 14 kHz bandlimited speech, to be transparent.