Easy HD Expressway!The best seamless transition from cctv to full HD - ccHDTV

Introduction of "Latency"

Technical Forum: The Latency

 To discuss the topic of this article, latency, we first have to mention a related concept: fluency. For a surveillance system, fluency means the smoothness of the live video display (as opposed to playback). The closer it is to what human eyes see, the better. Latency means the delay between an event being captured by the sensor and the same event being shown on the screen. The shorter the delay, the better. In principle, the two pull against each other: the better the fluency, the longer the latency, and vice versa. Therefore, with limited hardware capability and system efficiency, most home and commercial surveillance products have to trade one off against the other.

 Do you remember what surveillance video looked like twenty years ago? It could be something like this: “A thief pried the door open, and before we saw how he did it, he was already in the middle of the house. A moment later, as if by superior kung fu, he appeared at the door of the room, and the next moment he had already stolen the money and left.” Such jerky, unreal video is in fact due to a low-efficiency system. The recorded video had a frame rate much lower than what human eyes can perceive, resulting in a serious lack of smoothness.

 The question then is how to design a surveillance system that looks as fluent as what human eyes see directly. The answer starts from two phenomena: persistence of vision and “brain fill”. The former refers to the image that lingers briefly after the light signal has ceased to enter the eye. The latter refers to the brain's ability to fill the gaps between discontinuous images. The result of the two phenomena is that, for most people, the brain considers a video continuous and smooth as long as the frame rate is roughly 24 fps (frames per second) or higher.

 For this reason, the frame rates of analog TV standards such as PAL, NTSC, and SECAM are all greater than 24 fps. There are slight differences among these standards because the mains electricity frequency differs from country to country. For example, the mains frequency in the United States (and in part of Japan) is 60 Hz, so NTSC naturally adopted 30 fps (the NTSC color specification is 29.97 fps, which is very close to 30 fps). The mains frequency in Europe is 50 Hz, so PAL and SECAM use 25 fps. Modern surveillance cameras that claim “real time” operation use 25 fps (the PAL figure) or 30 fps (the NTSC figure) as well.
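 The link between mains frequency and frame rate can be made explicit with a little arithmetic. The sketch below assumes the usual interlaced scheme, in which the field rate matches the mains frequency and two fields form one frame; that detail is not spelled out in the text, and the function name is illustrative only.

```python
from fractions import Fraction

# Assumption: interlaced scanning, two fields make up one frame.
FIELDS_PER_FRAME = 2

# NTSC color runs slightly below 30 fps: exactly 30000/1001 frames per second.
NTSC_COLOR_FPS = Fraction(30000, 1001)


def nominal_frame_rate(mains_hz):
    """Frame rate when the field rate equals the mains frequency."""
    return Fraction(mains_hz, FIELDS_PER_FRAME)


for label, mains_hz in [("60 Hz mains (NTSC)", 60), ("50 Hz mains (PAL/SECAM)", 50)]:
    fps = nominal_frame_rate(mains_hz)
    frame_period_ms = float(1000 / fps)
    print(f"{label}: {fps} fps, {frame_period_ms:.1f} ms per frame")

print(f"NTSC color: {float(NTSC_COLOR_FPS):.2f} fps")
```

 The frame period (40 ms at 25 fps, about 33 ms at 30 fps) is also the natural unit for the buffering delays discussed later in the article.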


 Having explained fluency, we come back to the main topic: latency.

 As mentioned earlier, latency means the delay between an event being captured by the sensor and the same event being shown on the screen.

 The problem of latency was seldom mentioned in the past because early surveillance cameras were all analog cameras with negligible latency. As digital surveillance cameras have spread, people have started to become aware of it.

 

 Beyond the migration from analog to digital systems, low latency is becoming more and more important in quite a few applications. The following are some examples.

  1. A live TV show. The host and the remote audience should react at the same time.
  2. A casino. The staff in the control center see that cheaters are playing tricks and immediately notify the security personnel. If there is latency, highly skilled cheaters may already have thrown away or destroyed the evidence.
  3. Traffic surveillance, or locations where high-speed photography is required. Taiwan High Speed Rail can reach 315 km/h, the Japanese Shinkansen can reach 320 km/h, and the Nagoya Maglev (magnetic levitation) line scheduled to operate in 2027 can even reach 603 km/h. These high-speed trains travel 88 to 168 m per second. It is easy to imagine that when an accident happens, the speed of reaction can make a great difference. In this scenario, “fast” and “slow” are measured in milliseconds.
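 To make the train example concrete, the following sketch converts the quoted speeds to metres per second and shows how far a train moves while the video is still "in flight". The example latencies (1 ms, 100 ms, 200 ms) and the function names are chosen for illustration only.

```python
def metres_per_second(speed_kmh):
    """Convert a speed in km/h to m/s."""
    return speed_kmh * 1000 / 3600


def distance_during_latency_m(speed_kmh, latency_ms):
    """Distance travelled (in metres) during a given latency."""
    return metres_per_second(speed_kmh) * latency_ms / 1000


# Speeds quoted in the text above.
TRAINS_KMH = {
    "Taiwan High Speed Rail": 315,
    "Shinkansen": 320,
    "Maglev": 603,
}

# Illustrative latencies, not measurements.
EXAMPLE_LATENCIES_MS = (1, 100, 200)

for name, kmh in TRAINS_KMH.items():
    for latency_ms in EXAMPLE_LATENCIES_MS:
        distance = distance_during_latency_m(kmh, latency_ms)
        print(f"{name} at {kmh} km/h: {distance:.2f} m during {latency_ms} ms")
```

 At 603 km/h, for instance, a 200 ms delay corresponds to about 33.5 m of travel before the image reaches the screen.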

 In the scenarios above, latency is critical and should be as small as possible. However, in most locations and applications, video quality, storage capability, remote backup, and so on are more important than millisecond-order latency. That is to say, low latency is not the ONLY concern. The camera manufacturer should consider all of these factors when determining the specifications.

 In general, the latency of analog cameras is smaller than that of digital cameras. The earlier analog CCTV cameras and the more recent analog HD cameras (AHD, HDCVI, and HDTVI) capture images and send them to the DVR as analog signals. There is no digitization, compression, or decompression before the signal reaches the DVR, so the latency is less than 100 ms. Some people think the latency of analog cameras is nearly zero, which is not strictly correct. All technologies introduce latency, but if the latency can be kept within 200 ms, people might not be aware of it.

 The digital surveillance products available on the market include HD-SDI, IP, and DTV. Among them, HD-SDI sends uncompressed video data directly in order to preserve video quality and reduce latency. However, the high data rate of uncompressed video not only requires high cable quality but also limits the transmission distance. HD-SDI has had difficulty gaining popularity in the market and finds applications only in some special cases.

 The IP surveillance system, which has been on the market for a long time, is the mainstream digital surveillance product. Unlike HD-SDI, IP cameras send compressed video. The compression follows the popular H.264 or H.265 standard. IP cameras send data over a network, whose bandwidth-sharing nature introduces latency. With contributions from both compression and network transmission, the latency of an IP system becomes too obvious to ignore, and people have started to discuss this phenomenon and its consequences.

 DTV was proposed as a solution to the problem of upgrading from analog to digital over existing coaxial cable. It was developed by ITE Tech. Inc. and is derived from DVB-T digital TV technology. Like IP, DTV cameras send compressed video. However, instead of sharing bandwidth, DTV allocates a fixed bandwidth to each camera and thus reduces the latency caused by bandwidth sharing. There is still latency, but it is predictable.

 Although the latency of analog transmission is relatively low, its transmission distance and video quality are both limited. Moving from analog to digital is no doubt the trend. For digital systems, video compression before transmission is also inevitable because of the ever-growing demand for video quality. We might think 720p and 1080p HD are good enough, but once 4K or higher-quality video is available, people quickly get used to it. Higher-quality video brings a higher data rate, and compression is a must because transmission bandwidth is limited. If compression cannot be avoided, how can latency be optimized for such systems? We must first understand the sources of latency in digital systems with video compression.


 The path from an event happening, to the event being captured by a camera, to the captured event being displayed consists of the following steps.

(1) Video capture.

 After an event is captured by the CCD/CMOS sensor in a camera, the captured signals are processed by the ISP (Image Signal Processor) to adjust resolution, exposure, white balance, dynamic range, illumination, etc. The purpose of the adjustment is to recover video detail under all kinds of environments.

 The design of the camera affects not only video quality but also latency. A bad design can sometimes cause pauses after long periods of operation because of unexpected buffer overflow. It may require clearing the buffer or a reset to resume normal operation.

(2) Data compression.

 For convenient transmission and storage, the video signals are digitized and compressed according to the quality requirements. The advantage of digitization and compression is that quality does not degrade further during transmission and storage. Note that the computing power of the ISP processor, which affects compression time, and the compression ratio, which affects the compressed data rate, both greatly affect fluency.

(3) Modulation.

 Modulation is the process of converting analog or digital data into a signal that can be transmitted. There are many modulation methods, designed for easy transmission or efficient spectrum utilization. For example, one way to share spectrum is to allocate different carrier frequencies to different baseband signals. The receiver can then select the desired signal with a bandpass filter. This is the basis of multiplexing in one transmission medium. In surveillance systems, CCTV uses analog modulation, while IP and DTV use their own digital modulation methods.

(4) Transmission.

 Transmission is the propagation of a signal or wave from one location to another.

 What affects latency here is the transmission protocol. For example, IP cameras send video data as IP packets through routers. The network architecture determines how the routers are distributed and therefore affects the transmission time. In addition, each router needs time to check the packet header, check for errors, and find the routing path. Sometimes the router has to buffer a packet for various reasons before sending it out. All of these factors contribute to latency. The nature of networks makes the latency of IP surveillance systems neither fixed nor predictable.

 Instead of packet-based transmission, DTV uses streaming. It is like allocating one lane to each car, with lane changes not allowed. The cars can then maintain a constant speed and will not be blocked by other cars. The latency of DTV is both fixed and predictable. Assume a system with the following parameters: 16QAM, code rate (CR) = 2/3, guard interval (GI) = 1/32, and 8 MHz bandwidth. The overall data rate is 16.09 Mb/s. Take coaxial cable as an example. Assume the usable frequency range is 100 MHz to 900 MHz and the channel spacing is 12 MHz (an 8 MHz channel plus guard band). Then (900 – 100) MHz / 12 MHz ≈ 66.7, which gives 66 whole channels, and 16 Mb/s x 66 channels = 1.056 Gb/s. This means the coaxial cable supports 66 DTV cameras with an aggregate data rate of about 1.056 Gb/s.
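 The channel-count arithmetic can be checked with a short sketch. The function names are illustrative; the frequency range, channel spacing, and per-channel rate are the figures assumed in the paragraph above.

```python
def channel_count(f_low_mhz, f_high_mhz, spacing_mhz):
    """Number of whole channels that fit in the given frequency range."""
    return (f_high_mhz - f_low_mhz) // spacing_mhz


def aggregate_rate_mbps(channels, per_channel_mbps):
    """Total data rate when every channel carries one camera stream."""
    return channels * per_channel_mbps


channels = channel_count(100, 900, 12)
print(f"channels: {channels}")
print(f"aggregate at 16 Mb/s each: {aggregate_rate_mbps(channels, 16) / 1000:.3f} Gb/s")
print(f"aggregate at 16.09 Mb/s each: {aggregate_rate_mbps(channels, 16.09) / 1000:.3f} Gb/s")
```

 Because each camera owns its channel, adding cameras up to the channel count does not change the rate, and hence the transmission latency, seen by any one of them.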

 To summarize, the major difference between IP and DTV digital surveillance systems is that DTV’s latency is fixed and predictable, while IP’s is not.

(5) Demodulation.

 Demodulation is the extraction of the original information-bearing signal from a modulated carrier wave.

(6) Decompression.

 Decompression is the process of recovering the compressed video for display. The decoding power of the NVR/DVR affects fluency. If the video exceeds the processing capability of the NVR/DVR (resolution too high or frame rate too high), or the image processing power of the NVR/DVR's main processor is limited, pauses may occur.

(7) Display.

 The decoded video signal is displayed on a monitor or TV through an interface such as BNC, VGA, or HDMI. The latency may vary from one monitor or TV to another because of differences in design.


 With an understanding of the factors that can affect latency, we use the following breakdown to show an example latency estimate for DTV products.


Modulation and demodulation

In the DVB-T specification, an interleaver is included to spread out the errors caused by impulse-like noise. The depth of the interleaver is 11 MPEG-2 packets, and each packet has 188 bytes. Taking BW = 6 MHz, 2K mode, 16QAM, CR = 3/4, GI = 1/32 as an example, the useful data rate from ETSI EN 300 744 is 13.572 Mbps. The interleaver latency is therefore 188 bytes/packet x 11 packets x 8 bits/byte / 13572 kbps ≈ 1.22 ms.

Latency: 1.22 ms
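The interleaver figure is simply the time needed to push the interleaver's depth through the link at the useful data rate. A minimal sketch, using the numbers quoted above (the function name is illustrative):

```python
def interleaver_latency_ms(depth_packets, packet_bytes, useful_rate_bps):
    """Time to transmit the interleaver depth at the useful data rate."""
    bits = depth_packets * packet_bytes * 8
    return bits / useful_rate_bps * 1000


latency = interleaver_latency_ms(
    depth_packets=11,            # interleaver depth in MPEG-2 packets
    packet_bytes=188,            # bytes per packet
    useful_rate_bps=13_572_000,  # 13.572 Mbps useful data rate
)
print(f"interleaver latency: {latency:.2f} ms")
```

A higher useful data rate shortens this delay proportionally, since the depth in packets stays the same.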

The other source of latency during modulation and demodulation is the time for signal processing (FFT, channel estimation, etc.), which varies from IC to IC. In our experience, the processing time is about 4 OFDM symbols (one in Tx and three in Rx). With the same parameters, and again from ETSI EN 300 744, the symbol duration is 308 µs. The delay for 4 OFDM symbols is 308 µs x 4 = 1.232 ms.

Latency: 1.23 ms
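The 308 µs symbol duration and the 13.572 Mbps useful data rate can both be reproduced from a few DVB-T constants. The sketch below is not taken from the article: the elementary periods (7/64 µs for 8 MHz, 7/48 µs for 6 MHz), the 1512 data carriers of 2K mode, and the 188/204 Reed-Solomon rate are constants as I understand them from ETSI EN 300 744, and all names are illustrative.

```python
from fractions import Fraction

# Assumed DVB-T constants (my reading of ETSI EN 300 744, not stated in the article).
ELEMENTARY_PERIOD_US = {8: Fraction(7, 64), 6: Fraction(7, 48)}  # keyed by bandwidth in MHz
FFT_SIZE_2K = 2048
DATA_CARRIERS_2K = 1512
RS_RATE = Fraction(188, 204)  # outer Reed-Solomon code rate


def symbol_duration_us(bandwidth_mhz, guard_interval):
    """OFDM symbol duration in 2K mode: useful part plus guard interval."""
    useful_us = FFT_SIZE_2K * ELEMENTARY_PERIOD_US[bandwidth_mhz]
    return useful_us * (1 + guard_interval)


def useful_rate_mbps(bandwidth_mhz, bits_per_carrier, code_rate, guard_interval):
    """Useful bits carried per OFDM symbol divided by the symbol duration."""
    bits_per_symbol = DATA_CARRIERS_2K * bits_per_carrier * code_rate * RS_RATE
    return bits_per_symbol / symbol_duration_us(bandwidth_mhz, guard_interval)  # bits/us = Mb/s


def processing_delay_ms(num_symbols, bandwidth_mhz, guard_interval):
    """Delay of a given number of OFDM symbols, in milliseconds."""
    return num_symbols * symbol_duration_us(bandwidth_mhz, guard_interval) / 1000


gi = Fraction(1, 32)
print(f"symbol duration (6 MHz): {float(symbol_duration_us(6, gi)):.0f} us")
print(f"useful rate (6 MHz, 16QAM, CR 3/4): {float(useful_rate_mbps(6, 4, Fraction(3, 4), gi)):.3f} Mb/s")
print(f"4-symbol processing delay: {float(processing_delay_ms(4, 6, gi)):.3f} ms")
```

Exact fractions are used so that the symbol duration comes out as a whole number of microseconds rather than a rounded float.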


Data transmission

The time for data transmission depends on the useful data rate and the I-frame size. Taking BW = 6 MHz, 2K mode, 16QAM, CR = 3/4, GI = 1/32 as an example, the useful data rate from ETSI EN 300 744 is 13.572 Mbps. Assuming the I-frame size after compression is around 100 KB, the time to transmit the I frame is 100 KB x 8 bits/byte / 13572 kbps ≈ 59 ms.

Note that this 59 ms is the time to transmit one I frame, not the time for the signal to propagate. After modulation, the data is carried by an electromagnetic wave, which propagates at a speed comparable to that of light. Therefore, the latency due to transmission distance can almost be ignored.

Latency: 59 ms
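The difference between transmit time and propagation delay is easy to see numerically. In the sketch below, 100 KB is taken as 100,000 bytes; the 500 m cable length and the 0.66 velocity factor are hypothetical values chosen only to illustrate the order of magnitude.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def transmit_time_ms(frame_bytes, rate_bps):
    """Time to clock one frame onto the link at the useful data rate."""
    return frame_bytes * 8 / rate_bps * 1000


def propagation_delay_ms(distance_m, velocity_factor):
    """Time for the signal to travel the cable length."""
    return distance_m / (velocity_factor * SPEED_OF_LIGHT_M_PER_S) * 1000


# I-frame size and data rate from the text (100 KB read as 100,000 bytes).
tx_ms = transmit_time_ms(100_000, 13_572_000)

# Hypothetical cable: 500 m of coax with an assumed velocity factor of 0.66.
prop_ms = propagation_delay_ms(500, 0.66)

print(f"I-frame transmit time: {tx_ms:.1f} ms")
print(f"propagation delay over 500 m: {prop_ms:.4f} ms")
```

Under these assumptions the propagation delay is a few microseconds, several orders of magnitude below the I-frame transmit time, which is why only the data rate and the I-frame size matter here.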

Video decoding

The data rate after compression is generally uneven because the compression ratio depends strongly on the video content. At the receiver, for smooth playback, the rate of the decompressed data has to be evened out. One way to achieve this is to use a buffer. If one frame is stored in the buffer, in a 30 fps system this introduces a delay of 33 ms. However, if fluency is not a concern, this delay can be ignored.

Latency: 33 ms

In addition, decoding the video itself takes some signal processing time. Here we use 20 ms.

Latency: 20 ms

Video display

As mentioned, the speed of the display affects latency, and the latency varies from maker to maker. We use 33 ms here.

Latency: 33 ms

Total

Latency: 210 ms

 The example above provides a rough estimate, and the resulting latency is 210 ms. In the process of estimation, we made quite a few assumptions about the Tx and Rx. It is very likely that the latency of a product available on the market differs from this number, and the difference can be large. As mentioned earlier, the manufacturer may optimize the product according to market needs, so a wide range of latencies is to be expected.
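 The budget can be tallied in a few lines. The sketch below adds up only the entries itemized in the estimate above; the dictionary keys are descriptive labels of my own. Those entries come to roughly 147 ms, so the quoted 210 ms total presumably also covers contributions that are not itemized in this text (for example, capture and encoding in the camera). That reading is an assumption, not something the estimate states.

```python
# Latency entries itemized in the estimate above, in milliseconds.
LISTED_LATENCIES_MS = {
    "interleaver": 1.22,
    "OFDM signal processing": 1.23,
    "I-frame transmission": 59,
    "decoder frame buffer": 33,
    "video decoding": 20,
    "video display": 33,
}

# Total quoted in the text.
QUOTED_TOTAL_MS = 210

listed_sum_ms = sum(LISTED_LATENCIES_MS.values())
not_itemized_ms = QUOTED_TOTAL_MS - listed_sum_ms

for stage, ms in LISTED_LATENCIES_MS.items():
    print(f"{stage:>24}: {ms:6.2f} ms")
print(f"{'sum of listed entries':>24}: {listed_sum_ms:6.2f} ms")
print(f"{'quoted total':>24}: {QUOTED_TOTAL_MS:6.2f} ms")
print(f"{'not itemized here':>24}: {not_itemized_ms:6.2f} ms")
```

 Laying the budget out this way also shows where optimization pays off: I-frame transmission and the frame-sized buffers dominate, while the modulation overhead is small.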

 In addition, from the breakdown above, we see that the latency contributed by COFDM modulation is 1.22 ms (from the interleaver) + 1.23 ms (from signal processing) = 2.45 ms. This is the latency overhead of DTV, and it is relatively small. Furthermore, since the data rate can be adjusted through the modulation parameters, one can choose a higher data rate to reduce the transmission latency, in addition to optimizing the encoder to reduce the I-frame size.


 To summarize, latency is inevitable in a digital system with compression. Although a digital surveillance system has larger latency than an analog one, it has advantages that an analog system cannot provide. With proper design, the latency can be optimized to an acceptable range. Within digital surveillance, the latency of IP systems varies with network topology, while that of DTV systems is predictable. From the example above, we see that 210 ms is achievable, and this latency is within the range in which viewers perceive the video as “real-time”. DTV, which can be used over coaxial cable, is really a reliable choice for an upgrade.