Author Archive for Dr Jam

VoIP Apps for iPhone Finally Set Free

Posted by Jan Linden
on February 18th, 2010 in Technology

After AT&T’s announcement last fall that they would allow VoIP applications to use the 3G network, very little has happened. No applications offering such services have actually been approved by Apple for sale in the App Store. Until now, that is. Today our customer Toktumi announced that their latest upgrade of the Line2 application for iPhone has been approved by Apple. This application is touted by Toktumi as “the first iPhone calling app that works over 3G, Wi-Fi, and cellular networks using the same number”. This is pretty big news. The end user can get better call quality (HD Voice), improved coverage (through WiFi), and save a lot on call charges. VoIP applications have previously only been available on Symbian, Windows Mobile, and most recently Android devices, while what is maybe the most popular smartphone has lacked such support.

The Line2 application offers much more than a standard VoIP application (as opposed to e.g. iCall, which is another VoIP application for the iPhone). In fact, I use it on my BlackBerry even though there is no VoIP support on that platform. This is because RIM hasn’t opened up the development environment in such a way that it is possible to develop a true VoIP application for the BlackBerry environment. That topic is worth its own post so I will refrain from commenting more on this very frustrating issue…

Is QoS the Answer to VoIP Quality Issues?

Posted by Jan Linden
on December 10th, 2009 in Technology

As long as I have been involved with VoIP, the debate over whether QoS methods are the solution to providing good voice quality has been ongoing. By QoS methods I mean protocols that allow for prioritization of packets that have low latency requirements, such as VoIP packets. Of course, if the network is perfect from the VoIP application’s point of view, you should also expect perfect quality. As a side point, that is a very reasonable expectation but unfortunately something that is very often not the case. The reasons can be endpoint hardware or software related, or a combination of both. I discuss some of the potential issues in a previous blog post.

The reason why QoS methods are not heralded as the savior of VoIP quality (and video for that matter) is that they are often impractical to implement and not as efficient as one might assume. For example, if the amount of data on the network that needs to be prioritized represents a significant portion of the total traffic, the scheme will fail completely. Another issue is the impact on the so-called background traffic that doesn’t get prioritized: the less prioritized data streams may end up behaving unacceptably.
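To make the failure mode concrete, here is a toy model of a strict-priority link. It is only a sketch: the link capacity, the per-tick traffic rates, and the function name `simulate` are all invented for illustration, and real traffic is bursty rather than constant.

```python
def simulate(capacity, voice_rate, background_rate, ticks):
    """Strict-priority link: voice is always served before background.

    All quantities are packets per tick (hypothetical numbers).
    Returns the (voice, background) backlog after the given number of ticks.
    """
    voice_queue = 0
    background_queue = 0
    for _ in range(ticks):
        # New packets arrive on both queues.
        voice_queue += voice_rate
        background_queue += background_rate
        # Voice gets first claim on the link capacity.
        served_voice = min(voice_queue, capacity)
        voice_queue -= served_voice
        # Background only gets whatever capacity is left over.
        served_background = min(background_queue, capacity - served_voice)
        background_queue -= served_background
    return voice_queue, background_queue


if __name__ == "__main__":
    for voice_rate in (2, 8, 12):
        backlog = simulate(capacity=10, voice_rate=voice_rate,
                           background_rate=6, ticks=100)
        print(f"voice rate {voice_rate}: backlog (voice, background) = {backlog}")
```

In this model prioritization works as long as the prioritized share is small. Once voice takes most of the link the background queue grows without bound, and once voice alone exceeds the capacity even the prioritized traffic starts to queue.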

QoS methods are successfully used in well-managed and controlled networks, but because VoIP traffic often traverses many networks, including the largely unmanaged Internet, end-to-end prioritization can rarely be guaranteed.
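For readers who have not seen what prioritization looks like from the application side, the following is a minimal sketch of one common approach: marking outgoing UDP packets with a DiffServ code point. The helper names are my own, and the choice of DSCP 46 (Expedited Forwarding) is an assumption about a typical voice setup. The marking is only a request; each network along the path decides whether to honor, ignore, or rewrite it.

```python
import socket

# DSCP 46 ("Expedited Forwarding") is a class commonly used for voice.
DSCP_EF = 46


def dscp_to_tos(dscp):
    """Return the IPv4 TOS byte for a 6-bit DSCP value.

    The DSCP occupies the upper six bits of the byte, hence the shift by two.
    """
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def open_marked_udp_socket(dscp=DSCP_EF):
    """Create a UDP socket whose outgoing packets carry the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


if __name__ == "__main__":
    sock = open_marked_udp_socket()
    print("TOS byte used for marking:", hex(dscp_to_tos(DSCP_EF)))
    sock.close()
```

Some operating systems restrict or ignore this socket option, which is one more reason an endpoint cannot rely on the marking alone.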

Because of these limitations of QoS methods it is crucial that any voice or video offering over packet networks deploy endpoints that can compensate for network issues.

So, you may ask, what can I do on my own network? In this article in ComputerWorld you can learn how to tweak your WiFi router settings to implement QoS on your home network. As I mentioned previously, this will unfortunately only help the performance of the WiFi network, and it requires changing the router configuration in a manner most consumers are not aware of or not able to do because of the complexity involved. So, even though I think it is a good idea to make such adjustments, they only solve problems on a small portion of the data path for a call (the actual broadband connection is much more often the real culprit) and are unlikely to be made by most end users. Therefore, as a developer of a VoIP or video over IP product you can never assume that QoS will save you; you have to make sure that your product has been properly designed to mitigate network issues.

Is Your HD Voice Solution Really HD?

Posted by Jan Linden
on September 25th, 2009 in Industry News

Over the last few weeks there has been a lot of activity on the HD Voice front. I have myself participated in no fewer than three HD Voice events since the beginning of the month. It started with the ITEXPO conference in L.A. in the beginning of September. A whole track was dedicated to HD Voice with several interesting panels. The room was full most of the time, indicating a growing interest in this topic. I think people have started to grasp the benefits of HD Voice, because now the discussion was much more focused on how it can get deployed quickly rather than what it is. In particular, the notion that just because you have an HD codec you don’t necessarily offer true HD Voice quality garnered a lot of interest. This was also the focus of my talk at the HD Communications Summit in New York a couple of weeks after ITEXPO. The codec is of course a crucial part of any VoIP solution and it sets the upper limit of the quality that can be achieved. So, a good HD Voice codec is a necessary but not sufficient requirement for offering HD Voice quality. Many other parts of the solution are equally important in order to achieve the best quality. The most important factors to consider are:

  • HD capable microphone
    • At least 16 kHz sampling frequency
  • HD Voice Quality Enhancement
    • Echo cancellation, noise suppression, gain control,…
  • High quality HD Voice codec
    • Suitable for usage scenario
  • End-to-end network HD Voice support
    • Preferably no transcoding
  • Network clean-up
    • Quickly adapting jitter buffer and smooth packet loss concealment
  • HD capable loudspeaker
  • Low latency
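As a rough illustration of the jitter buffer item in the list above, the sketch below tracks network jitter with the running estimator defined for RTP in RFC 3550 and derives a playout delay from it. The mapping from jitter to delay (`factor`, `floor_ms`, `ceiling_ms`) is a made-up heuristic for this example, not how any particular product does it. Production jitter buffers use considerably more elaborate and faster adaptation.

```python
def update_jitter(jitter, prev_send, prev_recv, send, recv):
    """One step of the RFC 3550 interarrival jitter estimate.

    Times are in milliseconds. The difference d compares the spacing of two
    packets at the receiver with their spacing at the sender.
    """
    d = (recv - prev_recv) - (send - prev_send)
    return jitter + (abs(d) - jitter) / 16.0


def target_delay_ms(jitter_ms, factor=4.0, floor_ms=20.0, ceiling_ms=200.0):
    """Hypothetical rule: buffer a few times the jitter, within fixed bounds."""
    return max(floor_ms, min(ceiling_ms, factor * jitter_ms))


if __name__ == "__main__":
    # Packets sent every 20 ms; the arrival times are invented for the demo.
    send_times = [0, 20, 40, 60, 80, 100]
    recv_times = [50, 70, 95, 110, 160, 165]
    jitter = 0.0
    for i in range(1, len(send_times)):
        jitter = update_jitter(jitter, send_times[i - 1], recv_times[i - 1],
                               send_times[i], recv_times[i])
        print(f"packet {i}: jitter {jitter:.2f} ms, "
              f"target delay {target_delay_ms(jitter):.1f} ms")
```

The trade-off is the one implied by the list: a larger buffer hides more jitter but adds latency, so the buffer should grow only when it has to and shrink again quickly.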

Only if all these factors are properly addressed will the users experience true HD Voice. My colleague John Hermansen found a very good way of illustrating this message with this picture:

An HD solution with just a codec is like a clunker with really nice rims...


The most recent event covering HD Voice was The VON CTO Summit which was organized by VON in conjunction with the VON conference in Miami this week. The event was advertised as “…a high-level dinner roundtable at which leading competitive service providers will develop a road map for creating a nationwide IP-based peering fabric that will bypass the legacy PSTN and support advanced services such as HD voice.” The results from this meeting will be announced in the near future.

G.722 Revisited

Posted by Jan Linden
on July 30th, 2009 in Uncategorized

After my last post, in which I mentioned that G.722 is not well suited for usage over unmanaged networks such as the Internet, I received a few comments on that topic and therefore thought it would be helpful to elaborate further in this posting.

Let me start by trying to straighten out a question mark related to G.722. What really is G.722? When we talk about G.722, do we talk about one codec or the set of codecs that seemingly belong to the same group since they all have names that start with G.722? This set of codecs includes, in addition to G.722 itself, G.722.1, G.722.1 Annex C, and G.722.2, which are all very different codecs (one could possibly argue that G.722.1 and G.722.1 Annex C are not so different since basically G.722.1C is a super wideband version of G.722.1). Since they are distinctly different codecs it is customary to treat them separately and not use the G.722 name for the group of codecs but only for the actual G.722 codec itself. With this cleared up we can focus on the characteristics of G.722.

G.722 was standardized by the ITU-T in 1988 and is a wideband (7 kHz audio bandwidth) speech codec operating at 48, 56 and 64 kb/s. The technology it is based on is called sub-band ADPCM. ADPCM coding is recursive which results in a strong dependency on previously received data when decoding at the current time instant. This obviously has a negative impact on performance when frames of data are lost. Without the proper history of data 100 % correct decoding is not possible and some kind of guesswork has to be included.
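The dependency on history can be seen in a toy example. The sketch below is not G.722: it is a first-order DPCM coder with a fixed quantizer step, whereas G.722 uses sub-band ADPCM with adaptive quantizers and predictors. The constants `LEAK`, `STEP`, and `FRAME` are arbitrary values picked for the demo, and "keep predicting with nothing added" stands in for a very simple concealment.

```python
import math

LEAK = 0.9    # predictor coefficient (assumed value)
STEP = 0.05   # fixed quantizer step (assumed value)
FRAME = 10    # samples per packet (assumed value)


def encode(samples):
    """Quantize the difference between each sample and its prediction."""
    codes, state = [], 0.0
    for x in samples:
        prediction = LEAK * state
        code = round((x - prediction) / STEP)
        # The encoder tracks what the decoder will reconstruct.
        state = prediction + code * STEP
        codes.append(code)
    return codes


def decode(codes):
    """Rebuild the signal; a code of None marks a sample from a lost packet."""
    out, state = [], 0.0
    for code in codes:
        prediction = LEAK * state
        # With no code available, fall back on the prediction alone.
        state = prediction if code is None else prediction + code * STEP
        out.append(state)
    return out


def drop_frame(codes, index):
    """Return a copy of codes with one whole frame marked as lost."""
    lost = list(codes)
    start = index * FRAME
    end = min(start + FRAME, len(lost))
    lost[start:end] = [None] * (end - start)
    return lost


def frame_error(a, b, index):
    """Largest absolute difference between two signals within one frame."""
    start = index * FRAME
    return max(abs(x - y) for x, y in
               zip(a[start:start + FRAME], b[start:start + FRAME]))


signal = [math.sin(2 * math.pi * n / 50) for n in range(60)]
codes = encode(signal)
clean = decode(codes)
damaged = decode(drop_frame(codes, 2))

if __name__ == "__main__":
    for i in range(len(signal) // FRAME):
        print(f"frame {i}: max error vs. lossless decode = "
              f"{frame_error(clean, damaged, i):.3f}")
```

Every packet after the lost one arrives intact, yet the decoder output stays wrong for several frames because its internal state no longer matches the encoder's. In this toy coder the error fades at a rate set by `LEAK`. A real PLC algorithm tries to make both the gap and the recovery inaudible.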

Even though G.722 is an old standard it is still being considered for new deployments, such as the New Generation DECT (DECT-NG) standard. The main factors making it an attractive choice are that it is a wideband codec with low complexity and no IPR issues. It supports three rates, 48, 56, and 64 kb/s, which facilitates some limited adaptation to available bandwidth. However, on the negative side you have to consider that the quality for the bit rate is not very high, and the robustness against packet loss is not the best.
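To put the three rates in perspective, the arithmetic below works out the payload size per packet and the bandwidth on the wire. It assumes 20 ms of audio per packet and plain IPv4, UDP, and RTP headers with no options or extensions (20 + 8 + 12 bytes). Both are assumptions of this sketch, not something G.722 itself prescribes.

```python
# IPv4 + UDP + RTP headers, without options or extensions.
HEADER_BYTES = 20 + 8 + 12


def payload_bytes(bitrate_bps, frame_ms):
    """Codec payload carried in one packet."""
    return bitrate_bps * frame_ms // 8000


def on_wire_kbps(bitrate_bps, frame_ms):
    """Bandwidth including headers; bits per millisecond equals kb/s."""
    packet_bits = (payload_bytes(bitrate_bps, frame_ms) + HEADER_BYTES) * 8
    return packet_bits / frame_ms


if __name__ == "__main__":
    for rate in (48000, 56000, 64000):
        print(f"{rate // 1000} kb/s codec rate: "
              f"{payload_bytes(rate, 20)} payload bytes per 20 ms packet, "
              f"{on_wire_kbps(rate, 20):.0f} kb/s on the wire")
```

Under these assumptions the 64 kb/s mode costs 80 kb/s per direction once headers are included, and dropping to 48 kb/s only brings that down to 64 kb/s.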

In fact, until recently there was no packet loss concealment (PLC) technique defined for G.722. Implementing PLC was left completely to the device or application manufacturer. The result is that the PLC, and therefore G.722 performance, varies significantly between implementations. Recently, the ITU added Appendices III and IV to the G.722 standard, which specify two PLC algorithms that can be used with G.722. Appendix III delivers better quality but at the price of significantly increased decoder complexity, while the complexity increase when using Appendix IV is negligible. Both are significantly better than typical implementations I have seen previously. However, both algorithms are based on IPR that most likely requires a license from the patent holders to be used.

For more details, please read this study of several ITU wideband codecs. The following figure, taken from the France Telecom presentation, compares the performance of G.722, G.722.2, and G.729.1:

 Wideband Codec Comparison

The conclusion is pretty clear. In clean (no packet loss, no background noise) conditions G.722 at 64 kb/s has noticeably lower quality than G.729.1 and G.722.2 at 24 kb/s. It should be noted that the quality of G.722 at 56 kb/s and 48 kb/s drops significantly compared with the 64 kb/s mode.

Consider the following figure from the same study for comparison of PLC methods for G.722:

 G.722 PLC Comparison

PLC A is Appendix III, PLC C is Appendix IV, and PLC 0 is a brute force method that just sets all missing codewords in the decoder to what corresponds to the minimum value, which gives a lower limit on what a PLC method for G.722 can do. If the packet loss concealment methods proposed in the new Appendices to G.722 are used, decent quality can be achieved up to about 5 % packet loss; however, less sophisticated methods cannot even handle 1 % packet loss properly.

In summary, the pros of G.722 are that it is IPR free (PLC not included) and has relatively low complexity. On the negative side you find high bandwidth utilization, lower quality than other options even at a significantly higher bit rate, and inconsistent packet loss robustness. As has previously been stated, it is clear that the codec in itself is just a small piece of the puzzle and that it is the implementation, more than anything else, that determines the quality.

Links to a few codec comparisons:

http://www.cablelabs.com/specifications/PKT-SP-CODEC-MEDIA-I07-090702.pdf

http://portal.etsi.org/stq/workshop2007presentations/Quinquis_slides.pdf

http://en.wikipedia.org/wiki/Comparison_of_audio_codecs

More on the HD Communication Summit

Posted by Jan Linden
on May 26th, 2009 in Industry News, Market Trends, Technology

As pointed out in John’s blog, the HD Communication Summit last week was a great gathering of industry experts, all with the same goal of advancing HD Voice deployments. At times the discussions were fairly heated but always very constructive.

The hottest issue related to the number of wideband codecs that need to be supported. Some suggested that in order for HD Voice to really take off a very limited (two to three) set of codecs has to be agreed upon. Clearly there are scenarios where interoperability is no issue and hence any codec can be used in such scenarios. Others said that it is unrealistic to assume that such a small set of codecs will be agreed upon and that transcoding will be a necessity. AudioCodes, for example, suggested that this is the most likely scenario. Dave Frankel, CEO of ZipDX, also suggested that we need to accept transcoding, at least initially, to get HD Voice going. If not, we run the risk of not getting anywhere by waiting for the codec “war” to come to an end. Currently, it seems like G.722 is the most common choice for interoperability. As Jason Fischl with Skype pointed out, not only is the bitrate high but G.722 is also extremely sensitive to packet loss and therefore not a good choice for VoIP anywhere outside managed networks.

Another discussion related to codec choices that created some debate was the topic of licensing. The codec landscape contains everything from open source, through license-free proprietary codecs, to standards with very complex licensing situations. For many, I believe, this was the first time they realized that even a codec that is labeled free is rarely truly free. Many free licensing agreements include marketing and IPR conditions that will be considered by some as having a high cost. In addition, indemnification from IPR claims does not come for free.

Most people I talked with agreed that the fastest way to create an end user pull for HD Voice is by widespread deployments in the wireless networks. It was very interesting to hear Benoit de Boursetty, Director FTNA, at Orange describe Orange’s deployments of HD Voice. Clearly this is an operator that takes HD Voice seriously and sees it as a key differentiator. Benoit said that they don’t see a distinct pull for HD Voice but on the other hand he claimed that it does increase customer retention.

The message in my own presentation was that the codec is just one piece of the HD Voice puzzle. I.e., in order to experience true HD Voice, all other parts, including acoustic hardware, echo cancellation, and other signal processing have to support HD Voice and provide the best possible quality. It doesn’t matter how good a codec you have if the other parts of your solution are not up to par. I was glad to see that several of the other speakers at the event, including Martyn Humphries of Broadcom and Christian Stredicke of Snom, made the same observation. Christian also suggested an HD Voice label to be put on HD Voice capable devices.

For more details about the summit, check out the Twitter feed here.

Jeff Pulver announced that the next event will be held on September 15 – 16 this fall.

 

Some additional blogs on this event:

http://pulverblog.pulver.com/archives/008925.html

http://dougonipcomm.wordpress.com/2009/05/21/hd-communications-summit-pulver-announces-hd-marketing-association-fcc-petition-fall-event/

http://dougonipcomm.wordpress.com/2009/05/22/hd-communications-summit-codec-convergence-hd-logo-take-center-stage/

http://blog.radvision.com/voipsurvivor/2009/05/20/can-you-hear-me-now-2/

http://www.mgraves.org/voip/2009/05/hdvoice-summit/

Slow Down Your Communication

Posted by Jan Linden
on April 17th, 2009 in General

In my post from eComm I mentioned the work of Stefan Agamanolis with Distance Lab. He likened today’s mobile communication to fast food and proposed “Slow Communication” as corresponding to the current trend of Slow Food. Very rarely do we pay full attention to a phone conversation anymore. Either we are on the computer at the same time or, because we are no longer tethered to a fixed phone, we are easily distracted by things around us.

I have thought a lot about this topic lately and found that there is almost no occasion when I pay full attention to the actual phone conversation. I find myself always doing something else. Even at home, I’m most of the time close to the PC. In addition, wireless phones make it possible to be where everything is happening. Using VoIP on the PC makes this even worse since you are basically tethered to a device which offers abundant opportunities for doing other things while on the phone (Internet browsing, email, IM, games, watching movies, etc.).

In my younger days this was definitely not the case. I grew up in a home with only one phone. It was of course stationary and it was placed in the entrance hall of the house. All my friends, with only a few exceptions, had a similar setup. The exception was that some actually had two phones, the second phone typically in the parents’ bedroom. When you were on the phone there was basically no distraction – the TV and radio were in other rooms and you could close the door to shut out annoying siblings.

After thinking about this for a while it has become clear to me that I would like to, at least for certain calls, be able to concentrate fully and “slow down” the communication. The quality of a conversation obviously improves significantly if there are no distractions and one can focus fully on the call. So, the question I ask myself is how to apply slow communication in my daily routines. The work at Distance Lab is very interesting but I am thinking about what I can do right now without the need for a complicated setup.

After experimenting with this a little bit I have found that there is a tremendous improvement in conversation quality (How is that measured? – That will have to be the topic of a later post) from just taking some simple steps. The first one is obvious: just try to find a place with no distractions. What I have also found is that turning out the light is another significant step towards the goal. I was actually surprised how much that helps. It is just a bit dangerous if you are tired. It is also important to find a comfortable place to sit – lying down is not at all recommended. The things I have tried so far are all very obvious and simple but I feel that these simple measures have had a significant impact on the quality of those phone conversations. I will continue to experiment with this and I will get back if anything interesting results.

Not Much Love for VoIP and Video in iPhone OS 3.0

Posted by Jan Linden
on March 17th, 2009 in Uncategorized

Today Apple previewed the iPhone OS 3.0 and announced the immediate availability of a Beta version for developers. There has been speculation that this release will address some of the major challenges that face voice and video over IP developers on this platform. Two issues in particular have bothered us, namely the lack of support for applications to run in the background and the fact that applications cannot get access to video recorded from the built-in camera.

Support for running applications in the background is essential for a fully functional voice or video application. Without it you either have to have the application running at all times, which makes it impossible to run other applications, or accept that inbound calls may not get through. Apple has to some extent addressed this issue with the introduction of a push notification system. Actually, this system was initially announced last summer for release in September 2008 but has not been released until now. The push notification system is based on communication with an Apple server over the cellular network but offers far from the functionality that can be provided with support for background processes.

In order to offer two-way video conferencing, support for capturing video from the iPhone’s camera is necessary. Hence, it is very disappointing that no such support has been added in this release. This limits video conferencing applications to one-way video. It is my guess that Apple will fairly soon release an iPhone with a front-facing camera, which is another prerequisite for a good video conferencing solution, and that there will be no support for video capture into 3rd party applications until this enhanced phone is available.

It seems like Apple has addressed a number of general issues for iPhone developers but left us VoIP and video over IP developers without much to get excited about.

eComm – A Great Place

Posted by Jan Linden
on March 6th, 2009 in Market Trends, Technology

This week I attended the eComm conference. What a great conference it was! Thanks Lee for putting this together. I think practical details such as keeping the presenters on a short leash and diligently keeping to the time schedule make for a very good experience. The 15 minute presentation format and no parallel sessions are also, in my mind, the right format for this type of conference.

There were many great presentations ranging from very technical and geeky to refreshing high level thoughts on communications. Even though there were many more really good ones I would like to single out a few that I found especially interesting.

Ge Wang of Smule/Stanford had an exciting keynote on “Creating New Expressive Social Mediums on the iPhone” where he presented a number of really cool applications for the iPhone including an application called Ocarina that turns the iPhone into a flute (you blow into the microphone).

 


Ge Wang playing the Ocarina on the iPhone at eComm2009. Copyright 2009 by James Duncan Davidson

In terms of new applications/services I really liked Matt Ranney’s presentation on RebelVox’s technology, which nicely combines live and asynchronous voice communications. This can be viewed as an integration of Voice SMS/IM, text IM, and live voice calls. This is definitely a type of service I would be prepared to pay for.

A nice perspective on today’s communication style was presented by Stefan Agamanolis with Distance Lab. He likened today’s mobile communication to fast food and proposed “Slow Communication” as corresponding to the current trend of Slow Food. Very rarely do we pay full attention to a phone conversation anymore. Either we are on the computer at the same time or, because we are no longer tethered to a fixed phone, we are easily distracted by things around us.

The trend towards enabling web developers (rather than just voice developers) with simple enough tools to allow them to build voice applications into their web offerings is continuing to evolve. A recent example is Voxeo’s launch of Tropo.com.

As a speech coding person it would be surprising if I didn’t comment on Skype’s SILK codec announcement. The codec, which can be run in narrowband, wideband, or even “superwideband” mode, seems to be a very well designed codec with good quality at many bitrates. Binaries can be obtained without any licensing fees and there is no obvious restriction for usage. I.e., it can be used for applications that do not involve Skype at all. Like practically all free codecs, and most standard codecs for that matter, it doesn’t come with indemnification against patent infringements. That is only to be expected, and quite natural since there is no licensing fee associated with the usage of the codec. Indemnification is of course one of the benefits you get from buying a solution from a vendor like GIPS. In addition to making binaries available to everybody it was announced that Skype is planning to release source code to select partners for optimization on certain platforms.

Regarding the technical specifications of SILK, my only concern is complexity and memory usage. Not that any of those numbers are worse than comparable codecs; they are actually in the same ballpark as most and complexity is better than e.g. AMR-WB. However, this level of complexity is high for many mobile and embedded solutions and there is a need for lower complexity wideband codecs.

A very nice gesture by Lee was to donate 10 % of the proceeds to a local charity. The money went to Shelter Network, which “…is committed to providing housing and support services that create opportunities for homeless families and individuals to re-establish self-sufficiency and to return to permanent homes of their own”.

 
Myself talking about VoIP on the iPhone at eComm 2009. Copyright 2009 by James Duncan Davidson


Wow, did Apple just get a patent on video conferencing for touch screen devices?

Posted by Jan Linden
on February 4th, 2009 in Uncategorized

The simple answer to that question is: No, they did not.

In his article in Information Week on the iPhone patent just awarded to Apple, Alexander Wolfe focuses on the fact that the patent indicates that Apple is planning video conferencing for the iPhone. This is obviously interesting and very likely given that the phone described in the patent has a user-facing camera. However, the article also gives the impression that the camera and video conferencing are included in what is covered by the patent. To be clear, Alexander isn’t stating this as a fact, but it makes sense to point out that this is not the case, especially since there are several other reports making similar suggestions.

If you read the patent carefully you will find that the description of video conferencing solutions is included in the “Description of Embodiments” part. As for all patents, what is actually patented is stated in the claims section, where there is no mention of a video application. The claims are all about the touch screen functionality. Those claims are of course very interesting and can cause problems for competitors.

Mobile VoIP Java Client

Posted by Jan Linden
on January 8th, 2009 in Technology

After watching the BCS Championship game I am in a good mood for a short post. On that topic I start with:

Congratulations Gators!

Every time there is news about a new Java based VoIP client for mobile phones I get questions about what this really means. This happened again today after it was revealed in recent news that Skype is about to release Skype Lite for Android and other Java enabled phones.

If you read the information closely you will notice that Skype Lite actually doesn’t support VoIP calls at all. For calls even between Skype users it is using the regular cellular voice network. The same is true for other similar offerings by e.g. fring (minifring). This is because it is far from a trivial task to design a generic Java based VoIP client.

Before addressing this topic I will comment on what I wrote in a recent blog entry where I talked about how to create a download free VoIP solution:

“Building a solution that doesn’t require a download is actually practically impossible because of the limitation that JavaScript cannot access audio and video captured on the PC. I.e., only one way streaming is possible for a JavaScript based solution.”

It is easy to confuse the JavaScript based solution with a plain Java solution, which is what we are talking about here. A Java client can utilize the full Java stack that is supported on the device and hence is not limited to streaming applications only, as the JavaScript based client would be.

In theory, Java, using the Java Media Framework (JMF), can be used to write a VoIP application, but codec support is very limited and significant latency issues do exist, even on powerful PC platforms. This is due to the inefficiency of the Java model (running a virtual machine) as well as the fact that JMF has been developed with focus on streaming applications.

So, to answer the question of why it is so hard to build a generic Java based client for VoIP, several factors have to be mentioned. Maybe the three most important ones are:

1. A fully Java based solution will be too slow to run in real time or will take up all available CPU on the mobile device. It is possible to circumvent this issue by using a native library in C or C++, but that requires a library for each platform.

2. Devices are so different that any attempt to make a generic solution will still need device-specific optimization for optimal performance.

3. Lack of support in the Java stack for certain important voice related features, or poor implementations of the same. These issues are typically related to the sound drivers and network socket drivers and will be device dependent.