Extended Patch Prioritization for Depth Filling within Constrained Exemplar-based RGB-D Image Completion

We address the problem of hole filling in depth images, obtained from either active or stereo sensing, for the purposes of depth image completion in an exemplar-based framework. Most existing exemplar-based inpainting techniques, designed for color image completion, do not perform well on depth information with object boundaries obstructed or surrounded by missing regions. In the proposed method, using both color (RGB) and depth (D) information available from a commonplace RGB-D image, we explicitly modify the patch prioritization term utilized for target patch ordering to facilitate improved propagation of complex texture and linear structures within depth completion. Furthermore, the query space in the source region is constrained to increase the efficiency of the approach compared to other exemplar-driven methods. Evaluations demonstrate the efficacy of the proposed method compared to other contemporary completion techniques.


Introduction
As three-dimensional scene understanding based on scene depth becomes ever more applicable, missing or invalid depth information has created the need for special-case handling in subsequent processing (e.g. semantic understanding, tracking, odometry and the like), and the prevalence of low-cost, yet imperfect, depth sensing has seen depth completion emerge as an important research topic.
Despite significant prior work in color image completion [1-3, 6, 8, 11], depth filling is by contrast scarcely represented within the literature [4, 7, 9, 10, 13, 30], emerging as a relatively new research area posing significant challenges [12]. Although there have been many attempts to use structure-based or exemplar-based color image completion approaches for depth hole filling [1-3, 5], particular factors such as the absence of granular texture, clear object separation and the lack of in-scene transferability of varying depth sub-regions all create notable obstacles not present in the corresponding color completion case [29].
In this paper, we propose an improved exemplar-based inpainting approach, building on [1], for depth completion (Fig. 1). It adds "boundary" and "texture" terms to aid in determining the priority of the sample patches used to propagate structure and texture into the target region (Fig. 2). The high computational demands commonly associated with such approaches are also reduced by dynamically constraining the query space based on the location of spatially adjacent sample patch selections (Section 3). Within a traditional exemplar-based image completion paradigm, the approach produces superior results compared with other leading contemporary approaches (Fig. 1).
In a notable work, [7] improves upon the fast marching method-based inpainting proposed by [3] for depth filling. By assuming that adjacent pixels with similar color values have a higher probability of having similar depth as well, they introduce an additional "color" term into the weighting function to increase the contribution of pixels with the same color.
By contrast, [21] uses a fusion-based method integrated with a non-local filtering strategy. Their framework follows [22], utilizing a scheme similar to non-local means to make accurate predictions for depth values based on image textures. Herrera et al. [10] propose an approach similarly guided by the color image, based on the assumption within their energy function formulation that every surface is continuous and smooth. This "smoothness" term encourages flat depth planes in the completion process whilst ignoring the possibility of visible texture or relief in the filled depth region, and hence limits the plausibility of the completion. Zhang et al. [4] improve [1] by adding a "level set distance" term to the priority function. A joint trilateral filter then smooths the result as a post-process.
Overall, although such exemplar-based methods have rarely been used in depth completion, they tend to preserve texture. With increased granularity in modern depth sensing and increasing detail in depth scene rendering (e.g. illumination correction), the consideration of texture detail (relief) within any depth filling process is now paramount. As such, we propose an improved exemplar-based formulation capable of efficient and plausible depth texture completion.

Proposed Approach
In our approach, improvements are made to the framework of exemplar-based inpainting [1] to create a more suitable and efficient depth filling approach. In the methodology of [1], the target region and its boundary are identified, a patch is selected to be inpainted, and the source region is queried to find the best-matching patch via an appropriate error metric (e.g. sum of squared differences). After the candidate patch is found, all the information is updated and the process starts over. An extremely important factor in generating desirable results is the order in which these patches are selected for filling.
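The matching step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `best_match_ssd`, the boolean `valid` mask (True where depth is known), and the exhaustive scan over the whole source region are all choices made here for clarity.

```python
import numpy as np

def best_match_ssd(image, valid, top_left, patch_size):
    """Find the source patch closest to a partially known target patch.

    image:      2-D float array (e.g. a depth map).
    valid:      2-D boolean array, True where `image` holds known values.
    top_left:   (row, col) of the target patch's top-left corner.
    Returns ((row, col), error) for the best fully known candidate patch.
    Assumption: the error is the sum of squared differences (SSD) taken
    only over the known pixels of the target patch.
    """
    h, w = image.shape
    ty, tx = top_left
    target = image[ty:ty + patch_size, tx:tx + patch_size]
    known = valid[ty:ty + patch_size, tx:tx + patch_size]

    best, best_err = None, np.inf
    for y in range(h - patch_size + 1):
        for x in range(w - patch_size + 1):
            # Candidates must lie entirely inside the source (known) region.
            if not valid[y:y + patch_size, x:x + patch_size].all():
                continue
            cand = image[y:y + patch_size, x:x + patch_size]
            err = float(np.sum((cand - target)[known] ** 2))
            if err < best_err:
                best, best_err = (y, x), err
    return best, best_err
```

Once a match is returned, the unknown pixels of the target patch would be copied from the candidate and the mask updated before the next patch is selected.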
In [1], the priority of each patch is given by:
$$P(p) = C(p)\,D(p)$$
where $C(p)$, the "confidence" term, and $D(p)$, the "data" term, are determined by:
$$C(p) = \frac{\sum_{q \in \Psi_p \cap (I - \Omega)} C(q)}{|\Psi_p|}, \qquad D(p) = \frac{|\nabla I_p^{\perp} \cdot n_p|}{\alpha}$$
where $|\Psi_p|$ is the area of the selected patch $\Psi_p$, $I$ is the image, $\Omega$ is the target region, $\alpha$ is the normalization factor (255), $n_p$ is a unit vector orthogonal to the target boundary, and $\perp$ is the orthogonal operator (Fig. 2). Before the inpainting begins, the "confidence" term is initialized as:
$$C(p) = 0 \;\; \forall p \in \Omega, \qquad C(p) = 1 \;\; \forall p \in I - \Omega$$
The "confidence" term prioritizes patches constrained by more valid depth values (fewer missing neighbors) and the "data" term encourages the filling of patches into which isophotes (lines of equal intensity) flow. This framework creates a balance between these two terms for a more plausible inpainting [1].
However, when completing real-world depth images with large holes covering entire objects, boundaries, and isophotes, the information in the accompanying color image (within RGB-D) can be used to create a suitable depth filling approach. In our approach, the "confidence" term is initialized and updated based on the depth image, while the "data" term is calculated over the corresponding color image region (from RGB-D).
To ensure a better flow of dominant linear structures into the target region, a "boundary" term, $B(p)$, is added. It is computed over the color image from $G_{x \geq \tau}$ and $G_{y \geq \tau}$, the strong intensity gradients in the color image in the x and y directions respectively, with $\tau$ being the gradient threshold (e.g. $\tau = 0.7$). This term essentially prioritizes patches that contain a larger number of pixels that are part of a significant edge or gradient structure in the color image, which ensures a better propagation of object boundaries into the target region. As seen in Fig. 4, the original exemplar-based approach [1] gives equal priority to points A, B, and C (Fig. 4, result of [1]), while the proposed method prioritizes points B and C because of the "boundary" term (Fig. 4, proposed approach), which greatly affects the quality of the results.
Additionally, a "texture" term, $T(p)$, is introduced to guarantee a better propagation of texture into the target region. Since the color and depth gradients in certain parts of an image do not always match due to factors such as lighting and perspective, color information is not always a great indicator of texture. However, soft depth gradients always point to texture and relief, even though a depth image might appear smooth to the human eye. The "texture" term, which is applied to the depth image, determines which parts of the image surrounding the target boundary contain texture and encourages the process to fill them earlier to propagate texture into the target region. It is computed from $G_{x < \tau}$ and $G_{y < \tau}$, the slight intensity gradients in the depth image in the x and y directions respectively, with $\tau$ being the gradient threshold (e.g. $\tau = 0.3$). Even the smallest changes in the depth image are identified and taken into account for a better relief texture propagation. As seen in Fig. 5, in which significant edges and linear structures are hard to find, the proposed method correctly prioritizes patches with slight depth changes and functions better than the original approach [1].
After adding the two aforementioned terms, the priority evaluation function combines four terms: $C(p)$, $D(p)$, $B(p)$, and $T(p)$, which are the "confidence" term (based on the depth image), the "data" term (based on the color image), the "boundary" term (based on the color image), and the "texture" term (based on the depth image) respectively.
Finally, in most exemplar-based methods [1, 4, 6, 11], the entire source region is queried for candidate patches. However, our analysis shows that the most suitable candidates for any patch are located close to where the best-matching candidates were found for adjacent patches in previous patch filling iterations. As a result, a dynamic search perimeter is created when sampling candidates for a patch with previously filled neighbors (Fig. 6). The maximum and minimum of the x and y indices of the selected candidates for the previously filled adjacent patches are used to determine the perimeter. Tests run over 20 different color and depth image pairs indicate that in 91.2% of queries, the best-matching patch was found inside the perimeter. Although this can negatively affect the quality of the results for the remaining 8.8% of patches, the efficiency is improved by an average of 31% (with negligible standard deviation), which is significant.
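The dynamic search perimeter can be illustrated with a short sketch. It assumes that candidates are identified by the top-left corner of their patch and that the perimeter is the bounding box of the corners chosen for previously filled adjacent patches. The function name `search_perimeter` and the optional `margin` padding are hypothetical additions for illustration; the text above specifies only the use of the minimum and maximum x and y indices.

```python
def search_perimeter(neighbor_matches, patch_size, image_shape, margin=0):
    """Bounding box of candidate top-left corners to query for the next patch.

    neighbor_matches: list of (row, col) top-left corners of the candidates
                      selected for previously filled adjacent patches.
    margin:           hypothetical padding (in pixels) around the box; it is
                      an assumption of this sketch, not taken from the paper.
    Returns (row_min, row_max, col_min, col_max), inclusive and clamped so
    that every candidate patch fits in the image, or None when there are no
    filled neighbors (the caller would then query the full source region).
    """
    if not neighbor_matches:
        return None
    rows = [r for r, _ in neighbor_matches]
    cols = [c for _, c in neighbor_matches]
    h, w = image_shape
    row_min = max(min(rows) - margin, 0)
    row_max = min(max(rows) + margin, h - patch_size)
    col_min = max(min(cols) - margin, 0)
    col_max = min(max(cols) + margin, w - patch_size)
    return row_min, row_max, col_min, col_max


# Example: three neighbors were matched near each other in a 100 x 100 image.
box = search_perimeter([(10, 20), (14, 25), (12, 18)], 9, (100, 100), margin=2)
print(box)  # (8, 16, 16, 27)
```

Restricting the query to such a box trades a small risk of missing the globally best patch for a much smaller number of patch comparisons per iteration.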

Experimental Results
Hole filling is fraught with constant compromises between efficiency and accuracy. The proposed approach is an example of this, as it outperforms many of its predecessors qualitatively and quantitatively [1, 3, 7, 10] while being faster than others [1, 2, 9]. Results were evaluated using a number of images, but in the interest of space, only a few are presented here. We utilize the Middlebury dataset [28] to provide qualitative and quantitative evaluation. Fig. 3 demonstrates that the proposed method generates plausible results without significant invalid outliers, blurring, jagged edges or other artefacts compared to other approaches [1-3, 7, 9, 10]. All flaws and artefacts are marked in Fig. 3. Table 1 provides quantitative evaluation of the proposed approach against the same comparator set (GIF is the guided inpainting and filtering [7], SSI the second-order smoothness inpainting [10], FMM the fast marching method [3], FEI the framework for exemplar-based inpainting [2], FBF the Fourier basis for filling [9], and EBI the exemplar-based inpainting [1]). As shown in Table 1, the method strikes a balance between efficiency and accuracy. While it is more efficient than other exemplar-based methods [1, 2], it has a smaller root-mean-square error and fewer bad pixels (based on the evaluation methodology of [27]) than faster comparators [3, 7]. Experiments were performed on a 2.30 GHz CPU (Table 1).
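For readers who wish to reproduce this kind of comparison, the two error measures can be computed as in the sketch below. It is an illustration only: the function name `rmse_and_bad_pixels`, the optional evaluation `mask`, and the default `bad_threshold` of 1.0 are assumptions, and the exact protocol of [27] should be consulted for the settings used in Table 1.

```python
import numpy as np

def rmse_and_bad_pixels(filled, ground_truth, mask=None, bad_threshold=1.0):
    """Root-mean-square error and percentage of bad pixels.

    filled, ground_truth: arrays of the same shape (depth or disparity).
    mask:          optional boolean array selecting the pixels to evaluate
                   (for example, only the pixels that were originally holes).
    bad_threshold: a pixel counts as "bad" when its absolute error exceeds
                   this value; 1.0 is an assumed default for this sketch.
    """
    filled = np.asarray(filled, dtype=np.float64)
    ground_truth = np.asarray(ground_truth, dtype=np.float64)
    if mask is None:
        mask = np.ones(filled.shape, dtype=bool)
    err = np.abs(filled - ground_truth)[mask]
    rmse = float(np.sqrt(np.mean(err ** 2)))
    bad_percent = float(100.0 * np.mean(err > bad_threshold))
    return rmse, bad_percent
```

Passing the original hole mask as `mask` restricts both measures to the completed region, which avoids diluting the error with pixels that were never filled.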
Fig. 1 demonstrates the results of the proposed method in comparison with [1-3, 7, 9, 10] when applied to examples from the KITTI dataset [25] (resolution 1242 × 375). Depth is calculated using [26] with significant disparity speckles filtered out. The proposed method results in sharp images with fewer additional artefacts (Fig. 1). The closest performing approach, the variational framework for exemplar-based inpainting of [2], shows comparable quantitative performance (RMSE/PBMP, Table 1) in some aspects, but our approach offers a mean computational saving of 15.2% over [2]. The faster approaches [3, 7, 10] have significantly worse completion performance (Table 1, Fig. 1 & 3) than our approach. A video displaying the results of the work can be found here: https://vimeo.com/251792601.

Conclusions
In this paper, the problem of depth completion is addressed in an exemplar-based framework with a focus on a balance between efficiency and attention to surface (relief) detail accuracy. While exemplar-based methods are mostly used for color images, their ability to preserve texture in the target region makes them suitable for depth filling when texture is of importance. Here, the priority term that determines the order of patch sampling has been modified to allow for a better propagation of strong linear structures and texture into the target region. Moreover, by constraining the query space, the method performs more efficiently than other exemplar-based approaches. Our evaluation demonstrates that while the efficiency of the proposed method is better than that of other exemplar-based frameworks, the plausibility and statistical relevance of the depth-filled results compete with the accuracy of contemporary filling approaches in the field.

Fig. 4. A demonstration of the effect of the "boundary" term.

Fig. 5. A demonstration of the effect of the "texture" term.