Author Archive for Stefan Holmer

SVT Provides Great Coverage, if Not Top Quality

Posted by Stefan Holmer
on March 2nd, 2010 in Technology

Sweden has an exceptional public service broadcaster, SVT. During the Winter Olympics, SVT broadcast the most popular events, sometimes two or three at a time on different channels. As if that’s not enough, they also have a free web service called SVT Play, where they streamed almost every Olympic competition in good quality. Finally, for those really devoted to sports, SVT even has an iPhone app for watching the broadcasts on the go.

At first glance, the quality of the online stream appears to be really good. SVT Play encodes at a bit rate of 810 kbps and defaults to Flash, probably with On2 VP6 as the codec. Windows users can also choose Windows Media.

However, most people probably prefer to watch the stream in full-screen mode, and this is where I think SVT Play fails to deliver. In the image below, which shows part of a full-screen video frame from SVT Play, you can see severe aliasing at the edges. It is most apparent at the edge between the man’s neck and his shirt.

[Image: section of a full-screen SVT Play frame showing aliasing at the neck/shirt edge]

The aliasing is caused by poor (or nonexistent) interpolation when the images are upsampled. Whether this is a problem with Flash or with SVT Play I cannot say for sure, but we can at least assume it is solvable: a YouTube video, which also uses Flash, looks good in full screen, as the screen capture below of a section of a full-screen YouTube clip demonstrates.

[Image: section of a full-screen YouTube clip]
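To make the interpolation point concrete, here is a minimal Python sketch. It is not SVT’s or Adobe’s code, and we don’t know which filter their players actually use. The sketch upsamples one row of grey values containing a sharp edge by a factor of 4, once by simply repeating pixels (nearest neighbour) and once by linear interpolation. The function names are illustrative.

```python
def upsample_nearest(row, factor):
    """Repeat each sample `factor` times (no interpolation at all)."""
    return [v for v in row for _ in range(factor)]


def upsample_linear(row, factor):
    """Linearly interpolate between neighbouring samples.

    Output sample i maps back to source position x = (i + 0.5) / factor - 0.5
    (pixel centres aligned); positions outside the row are clamped to the edge.
    """
    n = len(row)
    out = []
    for i in range(n * factor):
        x = (i + 0.5) / factor - 0.5
        x = min(max(x, 0.0), n - 1)   # clamp to the valid source range
        left = int(x)                 # floor, since x >= 0 here
        right = min(left + 1, n - 1)
        frac = x - left
        out.append(round((1 - frac) * row[left] + frac * row[right]))
    return out


edge = [0, 0, 255, 255]               # a dark-to-bright edge, e.g. neck vs. shirt
print(upsample_nearest(edge, 4))
print(upsample_linear(edge, 4))
```

Nearest-neighbour repetition keeps the jump from 0 to 255 in a single step, so in two dimensions every source pixel becomes a 4x4 block and a diagonal edge turns into a visible staircase. Linear interpolation spreads the transition over several output pixels; applying it along rows and then along columns gives bilinear upsampling.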

There has recently been a lot of discussion about the HTML5 video tag and which codecs to use with it. Some prefer license-free codecs, while others prefer the best possible performance. But one thing is certain: however good your codec is, the user experience is what matters most. Poor post-processing will always have the last word, no matter how many bits and CPU cycles you spend encoding your video source.

Traffic Shaping for HD Video

Posted by Stefan Holmer
on April 23rd, 2009 in Technology

To me, real-time video communication is essentially about three things:

  1. Estimating the available resources, such as computation power, channel capacity and quality.
  2. Making the best use of those resources.
  3. Protecting against network impairments such as jitter and packet losses.

The best way to achieve the second goal, in my opinion, is to use traffic shaping, i.e. shaping the video traffic so that the quality experienced by the user is optimized. Typically we have a bandwidth limit that we must make sure to stay below. The most common way to do this is to increase the quantization step size and/or lower the frame rate until the bit rate fits within the limit.
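A minimal sketch of that common approach, assuming a toy rate model in which the bit rate is proportional to pixels times frame rate and inversely proportional to the quantization step size. The constant, the candidate step sizes and frame rates, and the policy of coarsening quantization before dropping frames are all illustrative assumptions, not GIPS’s rate controller.

```python
# Hypothetical rate model, for illustration only: bits per frame are taken to be
# proportional to the pixel count and inversely proportional to the
# quantization step size. Real encoders are far less tidy than this.
BITS_PER_PIXEL_AT_Q1 = 1.0


def estimated_bitrate(width, height, fps, qstep):
    """Estimated bits per second under the toy model above."""
    return BITS_PER_PIXEL_AT_Q1 * width * height * fps / qstep


def shape(limit_bps, width, height,
          qsteps=(1, 2, 4, 8, 16, 32),
          frame_rates=(30, 15, 10)):
    """Pick a (qstep, fps) pair whose estimated bit rate stays below limit_bps.

    Policy (one of many possible): keep the highest frame rate possible and
    coarsen quantization first; only when even the coarsest step is too
    expensive, drop to the next lower frame rate. Returns None if nothing fits.
    """
    for fps in frame_rates:          # highest frame rate first
        for q in qsteps:             # finest quantization first
            if estimated_bitrate(width, height, fps, q) <= limit_bps:
                return q, fps
    return None


print(shape(2_000_000, 1280, 720))   # (16, 30)
print(shape(500_000, 1280, 720))     # (32, 15)
```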

As network capacity increases and consumers demand more bandwidth-intensive applications, we are approaching a situation where even people making a trans-Atlantic call to mom want to use HD video. Unfortunately, the varying quality of many internet routes, and the huge variance in available bandwidth caused by cross traffic, also mean that the quantization step size and the frame rate will vary a lot. For instance, consider a real-time HD video call over a channel with 2 Mbit/s of available bandwidth. The image is sharp and the motion is smooth. Suddenly a lot of cross traffic reduces our available bandwidth to about 500 kb/s. Now the application must decide how to combine a larger quantization step size with a lower frame rate. In other words, what is the lowest image quality, and the most jerkiness, that the user is willing to tolerate?
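The arithmetic behind that trade-off is simple: at a fixed bit rate, the encoder’s per-frame budget is the bit rate divided by the frame rate. A short sketch (the helper name is hypothetical, and 30 and 15 fps are assumed example frame rates):

```python
def bits_per_frame(bitrate_bps, fps):
    """Average number of bits the encoder may spend on each frame."""
    return bitrate_bps / fps


for rate, fps in [(2_000_000, 30), (500_000, 30), (500_000, 15)]:
    print(f"{rate / 1000:.0f} kb/s @ {fps} fps: "
          f"{bits_per_frame(rate, fps):,.0f} bits/frame, "
          f"{1000 / fps:.0f} ms between frames")
```

Keeping 30 fps at 500 kb/s leaves each frame a quarter of its former budget. Halving the frame rate doubles the per-frame budget, at the price of doubling the gap between frames from about 33 ms to about 67 ms.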

Looked at from another perspective: in the past we had less bandwidth and used lower video resolutions. Now we have more bandwidth available and therefore use higher resolutions, up to HD. If the bandwidth along the route of our call varies a lot, should we use HD even though at times we must conform to bit rates as low as 500 kb/s? Wouldn’t it be better to lower the resolution?
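A rough way to see why lowering the resolution can help is to look at the average number of bits per pixel per frame. This is only a crude proxy for quality, and the 30 fps frame rate and the 960x540 and 640x360 sizes are assumptions chosen for illustration:

```python
def bits_per_pixel(bitrate_bps, fps, width, height):
    """Average number of bits the encoder can spend per pixel per frame."""
    return bitrate_bps / (fps * width * height)


hd_at_2m = bits_per_pixel(2_000_000, 30, 1280, 720)
for w, h in [(1280, 720), (960, 540), (640, 360)]:
    print(f"{w}x{h} @ 500 kb/s: {bits_per_pixel(500_000, 30, w, h):.3f} bpp")
print(f"1280x720 @ 2 Mbit/s: {hd_at_2m:.3f} bpp")
```

Halving both width and height quarters the pixel count, which exactly offsets the 4x drop from 2 Mbit/s to 500 kb/s: 640x360 at 500 kb/s gets the same bits per pixel as 1280x720 at 2 Mbit/s.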

At GIPS we solve this problem by automatically sub-sampling each frame spatially before encoding whenever we notice that the available bandwidth has dropped too low for the preferred frame size. On the decoder side, we then up-sample the image again after decoding. In this way we have added another parameter to tune when shaping the traffic, allowing for the best possible end-user experience.
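The decision described above could be sketched as follows. This is not GIPS’s implementation: the bits-per-pixel threshold, the list of frame sizes and the 2x2 box filter used for sub-sampling are all hypothetical choices made for illustration.

```python
# Hypothetical threshold, for illustration only: below this many bits per
# pixel per frame we assume the preferred frame size would look too blocky.
MIN_BITS_PER_PIXEL = 0.05

# Hypothetical candidate frame sizes, preferred (largest) first; each step
# halves the width and height of the previous one.
FRAME_SIZES = [(1280, 720), (640, 360), (320, 180)]


def choose_frame_size(bitrate_bps, fps):
    """Return the largest frame size that still gets MIN_BITS_PER_PIXEL."""
    for width, height in FRAME_SIZES:
        if bitrate_bps / (fps * width * height) >= MIN_BITS_PER_PIXEL:
            return width, height
    return FRAME_SIZES[-1]  # fall back to the smallest size


def downsample_2x(image):
    """Halve width and height by averaging each 2x2 block (a box filter).

    `image` is a list of rows of grey values with even width and height.
    """
    return [
        [
            (image[y][x] + image[y][x + 1]
             + image[y + 1][x] + image[y + 1][x + 1]) / 4
            for x in range(0, len(image[0]), 2)
        ]
        for y in range(0, len(image), 2)
    ]


print(choose_frame_size(2_000_000, 30))  # (1280, 720)
print(choose_frame_size(500_000, 30))    # (640, 360)
```

With these made-up numbers, the 2 Mbit/s call keeps 1280x720 while the 500 kb/s call falls back to 640x360. The receiver would then decode the smaller frame and up-sample it to the display size with an interpolating filter such as bilinear. A real implementation would presumably also need some hysteresis so the frame size does not flip back and forth when the bandwidth hovers around the threshold.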

Below are two screenshots of video frames produced by different coding methods. The first frame was JPEG encoded with a large quantization step size. The second frame was first sub-sampled, then JPEG encoded to the same file size (which allows a smaller quantization step size), and finally decoded and up-sampled. Notice the pixelization in the first image, especially around still and uniform objects such as the woman’s pants and the floor. Now look at how much clearer the second image appears. By taking network limitations into account and shaping traffic accordingly, we are able to produce a much more life-like experience for the end user.

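[Image: video frame 1, JPEG encoded with a large quantization step size]

[Image: video frame 2, sub-sampled, JPEG encoded to the same file size, then decoded and up-sampled]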