Introduction
I created this activity for a STEM event at a local secondary schools (aimed at Year 8s). Each group of about 6 students stays with you for 15-20 minutes before they rotate around. There are about 8-10 rotations.
I wanted to talk about my job in streaming video and video compression seemed like something I could demonstrate and that the students would understand and relate to. As I started to think about it, I realised I could build quite a good interactive demo in Lego. I spent the next hour or two sorting through boxes of Lego looking for 2x2 blocks!
The picture above shows the table I used at the event. Most of what follows is from a set of slides that I built to accompany the demo. In reality, I walked through the process with the Lego on the table rather than use the slides.
Demo walkthrough
How big are video streams?
The first task was to set up just how big raw video streams are. The slide shows some basic calculations for HD and 4K video at 50 frames (images) per second, assuming 8-bits each of red, green and blue (24 bits per pixel).
So that’s about 2.5Gbps for HD and 10Gbps for 4K video.
I asked the students if they knew their home broadband bandwidth (a lot did!). Given that none of the students had a bandwidth anywhere near the raw data rate, how come they can watch Netflix, iPlayer, etc on their home Wi-Fi?
Introducing the first image
To keep things simple, we are going to use a 6x6 image. Can you tell what it is? Obviously it is a pyramid, on the sand, with the sun shining!
If we wanted to send (stream) this image to someone else, we need to convert it into a sequence of numbers. The key on the right shows the encoding we have chosen for the different colour bricks.
Simple encoding
This simplest way of turning this in to a sequence of numbers is to scan the image from left to right and top to bottom (like reading a book). If we do that, we end up with the sequence of 36 numbers (one per pixel) shown underneath the image.
With that sequence of numbers, the “receiver” can recreate the image, as long as they know the width of the original image.
Snake scanning
Rather than a simple left-to-right scan, it’s better to do what’s called a snake scan. With the snake scan, we scan left-to-right across the first row, then drop down and scan right-to-left, drop down and repeat (as shown in the diagram).
Using the snake scan, the sequence of numbers that we’d send is shown in the slide. Can you see how it’s different to the previous sequence?
Note that this sequence is still 36 numbers so we haven’t changed the amount of data that needs to be sent.
Compressing the image
Look at the sequence of blocks we created above. You will notice that there are frequent runs of the same colour. This is true for real pictures too - there are normally areas of the picture that are the same colour.
Instead of just sending the number corresponding to the colour of the block, we can send two numbers - the length of the sequence and the colour of the block. Eg 6,1 means a run of 6 blue blocks.
We can think of re-arranging the sequences of the same colour blocks into columns. Each column needs two numbers to describe it. For our image, we can now send the image with 20 numbers instead of 36 - that’s 55% of the original size! We have very nearly halved the amount of data that needs to be sent.
This technique is called run-length encoding.
So far we are just looking at a simple static image.
Adding further frames
Now let’s look at the other two frames in our three-frame movie! As you can see, a crow has flown in to the scene in the second frame (there is only so much I can do at this resolution!) and it has flown slightly further across in frame three.
The thing to note here is that all three of the frames are very similar. That’s true of real video as well - we have 50 frames per second and normally relatively little changes in 20 milliseconds.
So, rather than sending frames 2 and 3 just like frame 1, can we devise a way of just sending what’s changed?
Encoding only the differences
To encode frame 2, let’s calculate the difference between frame 1 and 2. As you can see, the only difference is the single pixel that shows our “crow”.
We’re going to use a couple of new bricks to have special meanings. These bricks don’t appear in our images so there’s no ambiguity with what they mean. The red brick is going to represent “nothing” (ie nothing needs to change) and the orange brick means “stop” (ie we’re done with all the changes for this frame).
To encode frame two, we snake scan the difference image just as before. There are 11 pixels that do not change, then a black pixel, and no other changes. Using our new encoding, we can send 11,0 (11 x nothing) 1,4 (1 x black) 1,5 (1 x stop).
The entire second frame can be sent with just 6 numbers (11,0,1,4,1,5). In the naive encoding we would send 36 (one per pixel) so that’s a saving of over 80%!!
Have a go and see if you can work out how to encode the third frame.
Encoding of our three-frame movie!
The third frame is only slightly more complicated. The first 10 pixels don’t need to change. Then there’s the crow in its new position, and we have to redraw the blue under where the crow was in frame 1. So frame 2 is just 10,0,1,4,1,1,1,5).
This image shows the entire encoding of our movie. If we sent every pixel we’d need 36 x 3 = 108 numbers. Using our compression algorithm, we need just 34 numbers - that’s a reduction of nearly 70%.
Real video compression algorithms are more complex than this, but the principles are the same. They look for ways of sending individual images as effiently as possible, and then how to send sequences based on the differences between frames.
Things to think about…
- Our run-length encoding algorithm can actually make things worse! Look at this image with alternating pixels - we would need 72 numbers to encode that image rather than the original 36!
- Because our difference frames are so small, the compression will get better (more efficient) for longer sequences. A 30-minute video at 50 frames per second would have 30 x 60 x 50 = 90,000 frames. In theory we could send one full frame (run-length encoded) and 89,999 difference frames. In reality, we send a full frame every second or so.
- The encoding we’ve developed here is lossless. That means the receiver can recreate the source image exactly. In practice, real compression is lossy - it removes detail from the original so it is more efficient to encode. The trick is to do that in a way that people don’t really notice.
Supporting material
In addition to the Lego demo, I also created a video that I had running on a loop on an iPad. I used the open-source Big Buck Bunny video and created a split-screen video that showed the original frame on top with the difference between the current and previous frames underneath. It proved to be a very good demonstration of how little changed between frames.
If you watch the video for a few seconds, every so often the difference frame will “flash”. I asked each group why they thought that happened, and every one worked out what was going on!