
Efficient Coding Unit and Prediction Unit Decision Algorithm for Multiview Video Coding

Wei-Hsiang Chang, Mei-Juan Chen, Gwo-Long Li, and Yu-Ting Chen

Abstract—To achieve higher coding efficiency for multiview video, the multiview extension of high efficiency video coding (MV-HEVC) has been proposed to encode the dependent views. However, the computational complexity of the MV-HEVC encoder also increases significantly, since MV-HEVC inherits all the computational complexity of HEVC. This paper presents an efficient algorithm for reducing the high computational complexity of MV-HEVC by fast coding unit decision during the encoding process. In our proposal, the depth information of the largest coding units (LCUs) from the independent view and the neighboring LCUs is analyzed first. Afterwards, the analyzed results are used to determine the depth for the dependent view early and thus reduce the computational complexity. Furthermore, a prediction unit (PU) decision strategy is also proposed to maintain the video quality. Experimental results demonstrate that our algorithm achieves 57% time saving on average, while maintaining good video quality and bit-rate performance compared with HTM8.0.

Index Terms—Coding unit, multiview video coding, prediction unit.

1. Introduction

Three-dimensional television (3DTV), high definition television (HDTV), and free viewpoint video (FVV) have become the focus of multimedia development. The joint video team (JVT) proposed multiview video coding (MVC) as an extension of the H.264/AVC video coding standard to support multiview video applications. Furthermore, to take advantage of the high coding efficiency of H.265/HEVC [1], Muller et al. extended the high efficiency video coding (HEVC) standard for the coding of multiview video (MV) and depth data [2]. Fig. 1 shows the frame structure of MV-HEVC coding. The coding order involves encoding the frame of the independent view (V0) first, and then encoding the frame at the same time instant in the dependent view (V1). Each dependent view has an inter-view reference frame to help the prediction. Compared with the MVC extension of H.264/AVC, MV-HEVC achieves a lower bit-rate but costs much more in terms of computation time.

Fig. 1. Frame structure of multiview coding system.

To speed up the coding time of HEVC, the work in [3] considered the depth similarity in the temporal and spatial domains. According to statistical probability, two sets are defined: the α set consists of largest coding units (LCUs) with higher probabilities, while the β set consists of LCUs with lower probabilities. When encoding each LCU, the depths in the two sets are checked, and the case is classified according to one of three degrees of similarity (high, medium, and low). The coding unit is then predicted according to the degree of similarity. The work proposed in [4] consists of an adaptive coding unit depth range determination (ACUDR) and three early termination methods. For ACUDR, the depths of the neighboring LCUs are multiplied by the corresponding weighting values to derive a predictive depth value, which then decides the candidates of coding unit sizes.
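As a rough sketch of the ACUDR idea described above (the weights and the range rule here are placeholders for illustration, not the values used in [4]):

```python
# Illustrative sketch of ACUDR-style depth prediction: weight the depths of
# neighboring LCUs, sum them into a predictive depth, and use it to bound
# the CU depth search range. The weights and the +/-1 range rule are assumed.

def predictive_depth(neighbor_depths, weights):
    """Weighted combination of neighboring LCU depths."""
    return sum(d * w for d, w in zip(neighbor_depths, weights))

def candidate_depth_range(pred_depth, max_depth=3):
    """Map the predictive depth to a candidate CU depth interval."""
    lo = max(int(pred_depth) - 1, 0)
    hi = min(int(pred_depth) + 1, max_depth)
    return lo, hi
```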

To reduce the high computational complexity of MV-HEVC, this paper proposes a fast coding unit (CU) and prediction unit (PU) decision algorithm. In our proposal, the depth information between LCUs in the view and spatial directions is analyzed first to constitute our proposed CU decision algorithm. To further decrease the computational complexity while keeping the compression efficiency, the conditional probability between the depth and the PU size is calculated to establish our proposed PU decision algorithm. Through the proposed algorithm, the computational complexity of MV-HEVC can be reduced significantly with negligible rate-distortion performance degradation.

This paper is organized as follows. In Section 2, we analyze the correlation between the current LCU and the inter-view/spatially neighboring LCUs, and then describe our proposed algorithm. Section 3 demonstrates the experimental results. Section 4 provides the conclusion.

2. Proposed Fast Coding Unit and Prediction Unit Decision Algorithm

In this paper, the depth relationship between LCUs is analyzed first, and the algorithm is then proposed according to the observation results. Here, the depths of the current LCU and the neighboring LCUs are used for observing the relationship. To further utilize the information between views, the depth of the predicted LCU from the independent view is also observed. To derive the predicted LCU depth from the independent view, our proposed algorithm first calculates the global disparity vector, which represents the geometrical shift between views. Once the global disparity vector has been calculated, it is used to find the predicted LCU depths from the independent view. The detailed operations of our proposed algorithm are explained below.

2.1 Calculation of Global Disparity Vector

In multiview video coding, a correlation exists between adjacent views. The independent view (V0) and the dependent view (V1) have a slight disparity due to the camera positions. Therefore, we can use such disparity information of V0 to predict the position of image content in V1. The global disparity vector is calculated by

$$\mathrm{GDV} = \frac{1}{W \times H}\sum_{i=1}^{W}\sum_{j=1}^{H} DV(i, j) \qquad (1)$$

where W and H are the horizontal and vertical LCU numbers, and DV(i, j) indicates the disparity vector of the LCU at position (i, j) in the 0th frame of the dependent view. The averaged DV is treated as the global disparity vector. Since the locations of the cameras are substantially fixed, the global disparity vector of each picture is similar. We therefore calculate it only on the 0th frame to obtain the global disparity vector between views, so that the computational complexity for deriving the GDV can be reduced significantly.
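A minimal sketch of this averaging step, assuming the per-LCU disparity vectors of frame 0 are available as a 2-D list of (dx, dy) tuples (the data layout and function name are illustrative, not taken from the HTM implementation):

```python
# Sketch of the global disparity vector (GDV) computation: average the
# per-LCU disparity vectors DV(i, j) of the 0th frame of the dependent view.
# The dv[i][j] layout is an assumption for illustration.

def global_disparity_vector(dv, W, H):
    """Return the GDV as the mean of the W*H per-LCU disparity vectors."""
    sum_x = sum(dv[i][j][0] for i in range(W) for j in range(H))
    sum_y = sum(dv[i][j][1] for i in range(W) for j in range(H))
    count = W * H
    return (sum_x / count, sum_y / count)
```

Since the camera geometry is essentially fixed, this runs once on frame 0 and the result is reused for all subsequent pictures.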

2.2 Analysis of the Correlation between Neighboring LCUs and Current LCU

After determining the global disparity vector, we can derive the depth information for the dependent view (V1) from the independent view (V0) by using it. Fig. 2 shows the relationship between V0 and V1 under global disparity mapping. Since the depth is predicted at the LCU level, the global disparity vector (GDV) is divided by n to obtain DX and DY, given by

$$DX = \mathrm{GDV}_x / n, \qquad DY = \mathrm{GDV}_y / n \qquad (2)$$

where n is set to 64 since the LCU size is 64×64 in the MV-HEVC coding system.

Afterwards, the prediction of the maximum depth can be achieved by

$$\mathrm{MaxDepth}_{P}(x, y) = \mathrm{MaxDepth}_{V0}(x + DX, y + DY) \qquad (3)$$

where (x, y) is the LCU-grid position of the current LCU in V1 and MaxDepthV0 is the maximum depth of the disparity-shifted LCU in V0.
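Under our reading of this mapping (the shift direction and the boundary clamping are assumptions), the per-LCU depth prediction from V0 can be sketched as:

```python
# Illustrative sketch: convert the pixel-domain GDV into an LCU-grid offset
# (DX, DY) by dividing by the LCU size n = 64, then predict the maximum depth
# of a V1 LCU from the disparity-shifted LCU of V0. Clamping at the frame
# boundary is our assumption.

def predict_max_depth(max_depth_v0, gdv, x, y, n=64):
    """Predict the maximum depth of the V1 LCU at grid position (x, y)."""
    dx = int(round(gdv[0] / n))          # horizontal LCU-grid offset
    dy = int(round(gdv[1] / n))          # vertical LCU-grid offset
    rows = len(max_depth_v0)
    cols = len(max_depth_v0[0])
    # look up the shifted LCU in V0, clamped to the frame boundary
    xv = min(max(x + dx, 0), cols - 1)
    yv = min(max(y + dy, 0), rows - 1)
    return max_depth_v0[yv][xv]
```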

Fig. 2. Deriving the maximum depth information from V0 to V1 by global disparity vector mapping.

Fig. 3. Neighboring LCUs of the current LCU.

Fig. 4. Probability of neighboring LCU’s max depth more than current LCU’s max depth for sequences of (a) book-arrival, (b) newspaper, and (c) average of all test sequences.

Once all required depth information has been derived successfully, we analyze the relationship between the maximum depth of the current LCU and those of the predicted LCU from V0 and the neighboring LCUs, as shown in Fig. 3. Here, the predicted depth from V0 is derived by (1) to (3). Five sequences with 1024×768 resolution (kendo, balloon, newspaper, book-arrival, and lovebird) are used for the analysis. Fig. 4 plots the averaged probabilities of different depth relationships. In Fig. 4, the letters A to I correspond to the neighboring LCUs shown in Fig. 3. From Fig. 4, we can observe that the probabilities between views are much higher than those in the spatial domain.

2.3 Fast Coding Unit Decision Algorithm

Based on the analytical results shown in the previous section, we propose a fast coding unit decision algorithm to reduce the computational complexity. The operation of our proposed algorithm is described below in detail. First, we divide the neighboring LCUs into two sets. The V0 set represents the depths between views, including LCUs A, B, C, D, and E. The V1 set includes F and G, which represent the depths in the spatial domain. The LCUs H and I are not included in our proposal since they have a lower probability of being selected. Fig. 5 shows the flowchart of our algorithm. For the LCU located at the upper left corner of the frame, all depths are checked to find the best coding unit size; thus, MaxDepthFinal is set to 3. For the LCUs at the frame boundary, MaxDepthFinal is determined by the maximum of the respective depths with the highest probabilities among the available LCUs in the V0 and V1 sets. If the LCU is not located at the boundary, the predicted maximum depth (MaxDepthP) of the current LCU is obtained as the maximum value in the V0 and V1 sets. However, this may result in MaxDepthP being larger than the depth of the best coding unit. To overcome this situation, we compute the average difference AD between MaxDepthP and the combined V0 and V1 sets by

$$AD = \frac{1}{N}\sum_{j=1}^{N}\bigl(\mathrm{MaxDepth}_{P} - Nei(j)\bigr) \qquad (4)$$

where N is set to 7, the number of depth candidates within the V0 and V1 sets, j indicates the index of the LCUs in the combined sets, and Nei(j) is the depth of the j-th LCU in the combined sets.

Afterwards, we use AD to determine MaxDepthAD. If AD is small, the depths of MaxDepthP and the combined set are similar, and MaxDepthAD is set to MaxDepthP. In contrast, if AD is large, there is a greater deviation between MaxDepthP and the combined set. In this situation, MaxDepthAD is decreased to reduce the computation according to the thresholds TH1 to TH3. In our algorithm, TH1, TH2, and TH3 are set to 0.86, 1.71, and 2.57, respectively.
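The depth-bound computation can be sketched as follows; the exact per-threshold decrement is our assumption about the flowchart in Fig. 5, not taken from the HTM code:

```python
# Sketch of the fast CU depth bound: MaxDepthP is the maximum depth over the
# combined V0 set (LCUs A-E) and V1 set (F, G); AD averages the gap between
# MaxDepthP and the N = 7 candidates, and larger AD lowers the bound step by
# step via TH1 to TH3 (the one-step-per-threshold rule is assumed).

TH1, TH2, TH3 = 0.86, 1.71, 2.57

def max_depth_ad(neighbor_depths):
    """neighbor_depths: depths of LCUs A-E (V0 set) and F, G (V1 set)."""
    n = len(neighbor_depths)                 # N = 7 in our setting
    max_depth_p = max(neighbor_depths)
    # average difference between MaxDepthP and the combined sets
    ad = sum(max_depth_p - d for d in neighbor_depths) / n
    if ad <= TH1:
        return max_depth_p                   # depths similar: keep the bound
    if ad <= TH2:
        return max(max_depth_p - 1, 0)
    if ad <= TH3:
        return max(max_depth_p - 2, 0)
    return max(max_depth_p - 3, 0)           # largest deviation
```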

Fig. 5. Flowchart of proposed fast coding unit decision algorithm.

2.4 Determination of Prediction Units

In the coding procedure of HEVC, a CU can be divided into several prediction units (PUs), and each PU has to be checked by the rate-distortion cost. To avoid the performance degradation caused by the early termination of the coding unit decision by MaxDepthAD, and to further reduce the computational complexity, we propose a determination scheme for the PUs in each CU based on an analysis of the conditional probability of PU distributions. Table 1 tabulates the simulation results of the conditional probability. From this table, we can find that the probabilities of 2N×2N, N×2N, and nL×2N are higher than those of the other modes for depth 1 or 2. In addition, for depth 3, 2N×2N and N×2N are also higher than the other modes. The proposed PU determination algorithm based on these analytical results is shown in Fig. 6, and its operation is described as follows.

Table 1: Probability of PU modes for various best CU sizes

First, the depth of the current CU is checked. If DepthCur (the depth of the current CU) is less than or equal to MaxDepthFinal or MaxDepthAD, all of the prediction modes are checked to derive the best results. Otherwise, if DepthCur is larger than MaxDepthAD and less than or equal to MaxDepthP, the proposed PU determination is applied: if DepthCur is equal to 1 or 2, only the 2N×2N, N×2N, and nL×2N modes are checked; if DepthCur is equal to 3, only the 2N×2N and N×2N modes are checked. The reason for using these prediction modes for each corresponding CU depth is that their total probability is larger than 90%. Finally, if none of the above conditions is satisfied, i.e., DepthCur is larger than MaxDepthP, no prediction mode is processed.
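As we read this flow, the candidate PU list can be sketched as follows (mode names follow HEVC PU partitions; the control flow is a hedged reconstruction, not the HTM implementation):

```python
# Sketch of the PU mode restriction: the candidate list depends on where the
# current CU depth falls relative to MaxDepthFinal/MaxDepthAD and MaxDepthP.
# The restricted lists cover the modes whose total probability exceeds 90%.

ALL_MODES = ["2Nx2N", "2NxN", "Nx2N", "NxN", "2NxnU", "2NxnD", "nLx2N", "nRx2N"]

def candidate_pu_modes(depth_cur, max_depth_final, max_depth_ad, max_depth_p):
    """Return the PU modes to evaluate for a CU at depth depth_cur."""
    if depth_cur <= max(max_depth_final, max_depth_ad):
        return ALL_MODES                       # full search preserves quality
    if depth_cur <= max_depth_p:
        if depth_cur in (1, 2):
            return ["2Nx2N", "Nx2N", "nLx2N"]  # dominant modes at depths 1, 2
        if depth_cur == 3:
            return ["2Nx2N", "Nx2N"]           # dominant modes at depth 3
    return []                                  # depth beyond MaxDepthP: skip
```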

Fig. 6. Flowchart of proposed PU decision algorithm.

3. Experimental Results

In this paper, we implement our algorithm in the MV-HEVC reference software HTM8.0 [5] with two views. Five test sequences with resolution 1024×768 (newspaper, kendo, balloon, book-arrival, and lovebird) and two test sequences with resolution 1280×960 (champagne-tower and dog) are evaluated. The experimental environment parameters are shown in Table 2.

Table 2: Parameters of experimental environment

Table 3 gives the comparison of the hit-rates of MaxDepthP and MaxDepthAD. In this table, a higher hit-rate means a higher chance of including the best result after encoding. From this table, we can observe that the hit-rate of MaxDepthP reaches 97.9%. Even for the complexity-reduced version MaxDepthAD, the hit-rate reaches 92.7%. Table 4 shows the performance comparison of our proposed algorithm with the ACUDR of [4] for the dependent view by calculating the Bjøntegaard delta (BD) bit-rate [6] and BD PSNR (peak signal-to-noise ratio) [7]. Both methods are implemented on the software HTM8.0. Compared with HTM8.0, the proposed algorithm increases the BD bit-rate by only 0.19%, drops the BD PSNR by only 0.011 dB, and achieves 57.49% time saving. Compared with the ACUDR of [4], the BD bit-rate is reduced by up to 1.32% in the book-arrival test sequence, with an average reduction of 0.88%. The largest BD PSNR increase of 0.041 dB is in the kendo test sequence, with an overall average increase of 0.026 dB.

Table 3: Comparison of hit-rates for MaxDepthP and MaxDepthAD

Table 4: Coding performance of the proposed algorithm and ACUDR of [4] in HTM 8.0 (dependent view V1)

The experimental results show that our proposed scheme is 14.89% faster than the ACUDR of [4] and provides better BD bit-rate and BD PSNR performance. For high-motion sequences, the depth information in the temporal domain may be inaccurate. However, the inter-view information is independent of the motion of the video and remains robust. The proposed algorithm employs the depth information between views to achieve better performance than methods relying only on temporal and spatial correlations.

4. Conclusions

In this paper, a fast CU decision algorithm was proposed for MV-HEVC to reduce the computational complexity. Based on the high correlation between the independent view and the dependent view, the depth information between them and the neighboring LCUs was used to compose our fast CU decision algorithm. In addition, we proposed a PU decision algorithm to maintain the coded video quality based on the observed relationship between CU and PU. Simulation results demonstrated that our proposed algorithm achieves 57% coding-time saving on average with negligible rate-distortion performance degradation compared with HTM8.0, and achieves higher coding-time savings and less rate-distortion performance degradation on average compared with the previous work ACUDR of [4].

[1] High Efficiency Video Coding, Recommendation ITU-T H.265, 2013.

[2] K. Muller, H. Schwarz, D. Marpe, C. Bartnik, S. Bosse, H. Brust, T. Hinz, H. Lakshman, P. Merkle, F. H. Rhee, G. Tech, M. Winken, and T. Wiegand, “3D high efficiency video coding for multi-view video and depth data,” IEEE Trans. on Image Processing, vol. 22, no. 9, pp. 3366-3378, 2013.

[3] Y. Zhang, H. Wang, and Z. Li, “Fast coding unit depth decision algorithm for inter-frame coding in HEVC,” in Proc. of Data Compression Conf., 2013, pp. 53-62.

[4] L. Shen, Z. Liu, X. Zhang, W. Zhao, and Z. Zhang, “An effective CU size decision method for HEVC encoders,” IEEE Trans. on Multimedia, vol. 15, no. 2, pp. 465-470, 2013.

[5] L. Zhang, G. Tech, K. Wegner, and S. Yea, “Test model of 3D-HEVC and MV-HEVC,” Document JCT3V-G1005 of Joint Collaborative Team on 3D Video Coding Extension Development, January 2014.

[6] G. Bjontegaard, “Calculation of average PSNR differences between RD curves,” ITU-T SG16/Q6 Document VCEG-M33, Austin, April 2001.

[7] G. Bjontegaard, “Improvements of the BD-PSNR model,” ITU-T SG16/Q6 Document VCEG-AI11, Berlin, July 2008.

Wei-Hsiang Chang was born in Taoyuan in 1989. He received the B.S. degree in electrical engineering from Tamkang University, Taipei in 2012, and the M.S. degree in electrical engineering from National Dong Hwa University, Hualien in 2014. His research interests include multiview video coding and HEVC.

Mei-Juan Chen received her B.S., M.S., and Ph.D. degrees in electrical engineering from National Taiwan University, Taipei in 1991, 1993, and 1997, respectively. Since August 2005, she has been a professor with the Department of Electrical Engineering, National Dong Hwa University, Hualien. She also served as the Chair of the department from 2005 to 2006. Her research topics include image/video processing, video compression, motion estimation, error concealment, and video transcoding.

Dr. Chen was the recipient of many awards, including the Dragon Paper Award in 1993, the Xerox Paper Award in 1997, the K.T. Li Young Researcher Award in 2005, the Distinguished Young Engineer Award in 2006, the Jun S. Huang Memorial Foundation best paper awards in 2005 and 2012, and the IPPR society best paper award in 2013.

Gwo-Long Li received his B.S. degree from the Department of Computer Science and Information Engineering, Shu-Te University, Kaohsiung in 2004; his M.S. degree from the Department of Electrical Engineering, National Dong Hwa University, Hualien in 2006; and his Ph.D. degree from the Department of Electronics Engineering, National Chiao-Tung University, Hsinchu in 2011. From 2011 to 2014, he was an engineer with the Industrial Technology Research Institute (ITRI), Hsinchu. In 2006, he received the Excellent Master Thesis Award from the Institute of Information and Computer Machinery. He is currently a senior engineer with Novatek Microelectronics Corp., Hsinchu. His research interests include video signal processing and coding and its VLSI architecture design.

Yu-Ting Chen was born in Taipei in 1993. She is now pursuing her B.S. degree in electrical engineering with National Dong Hwa University, Hualien. Her research interest mainly lies in 3D video coding.

Manuscript received November 1, 2014; revised January 13, 2015. This work was supported by NSC under Grant No. NSC 100-2628-E-259-002-MY3.

W.-H. Chang and Y.-T. Chen are with the Department of Electrical Engineering, National Dong Hwa University, Hualien (e-mail: destiny20216@hotmail.com; 410023017@ems.ndhu.edu.tw).

M.-J. Chen is with the Department of Electrical Engineering, National Dong Hwa University, Hualien (corresponding author, e-mail: cmj@mail.ndhu.edu.tw).

G.-L. Li is with Novatek Microelectronics Corp., Hsinchu (e-mail: gwolong@gmail.com).

Color versions of one or more of the figures in this paper are available online at http://www.journal.uestc.edu.cn.

Digital Object Identifier: 10.3969/j.issn.1674-862X.2015.02.001
