
Underwater Image Bidirectional Matching for Localization Based on SIFT



Yan Lin1,2* and Bo Liu2

1. State Key Laboratory of Structural Analysis for Industrial Equipment, Dalian University of Technology, Dalian 116024, China
2. Department of Naval Architecture and Ocean Engineering, Dalian University of Technology, Dalian 116024, China

To identify the stern of a SWATH (small waterplane area twin hull) ship effectively and to improve techniques for assessing the ship's performance, this paper presents a novel bidirectional image registration and mosaicing strategy based on the scale invariant feature transform (SIFT) algorithm. The proposed method provides a wide-angle view of the stern for analyzing the performance of the SWATH's control fins. SIFT is among the most effective local feature descriptors, being invariant to scale, rotation, and illumination changes; nevertheless, the algorithm still produces some false matches. In underwater machine vision, an accurate matching rate is essential for rapidly locating an underwater robot and identifying the position of a target. Therefore, this paper first puts forward a principle for selecting the match ratio; second, the advantages of the bidirectional registration algorithm are established by analyzing the characteristics of the unidirectional matching method; finally, an automatic underwater image splicing method based on a fixed dimension is proposed, in which the edge of the images' overlapping region is blended by the principal component analysis algorithm. The experiments achieve accurate registration and a smooth mosaicing effect, demonstrating that the proposed method is effective.

SWATH; underwater image registration; SIFT; bidirectional matching strategy; automatic stitching

1 Introduction

In recent years, scientists throughout the world have explored deep-sea environments with increasing urgency. Why study these regions, and what is the undersea world like? One reason is the interest in machine vision research in underwater environments, which are rich in mineral resources among other things. Related research contributes to exploring the ocean floor, for example the monitoring of fin fish growth rate and shape change (Costa et al., 2006), the detection of hydrothermal chimney images (Espiau and Rivers, 2001), undersea cabled observatories (Aguzzi et al., 2011, 2012), and so on. However, underwater environments present numerous challenges for the design of robot vision sensors and algorithms. This paper focuses on the algorithms and implementation of underwater image matching and stitching, as well as the significance of this research for robot vision design. In deep seas, when natural scenes are captured with underwater cameras, problems arise with non-structured objects such as ores, and with uncalibrated mono-cameras. One difficult challenge is to extract robust characteristic information from unknown natural images, for which no universal method currently exists. In fact, the acquired underwater images are often blurred and complex, for several reasons: first, the images come from a single camera sequence without any knowledge of the camera's external and internal parameters; second, illumination conditions are poor and the imaging environment is very noisy. It is therefore assumed that the main objects are rigid but non-structured, with random textures. In addition, the sequences contain several types of motion: a rigid one corresponding to the camera motion, the movement of seawater, and others.

Generally, image registration is the process of aligning two or more overlapping images of the same scene taken at different times, from different viewpoints, or by different sensors; it geometrically aligns the reference and sensed images. Over recent decades, image registration has been widely studied and significant progress has been achieved for on-land applications. The study of the benthic zone has also benefited from recent progress in underwater technology, which allows the deployment of optical cameras for systematic surveying, and underwater image processing techniques have been developed and applied accordingly. In the following, the problem of underwater image registration and mosaicing is discussed.

Image registration is widely used in machine vision, remote sensing, medical imaging, etc. Registration techniques can generally be classified into two classes, namely intensity-based and feature-based; this paper considers only the feature-based technique. Its applications fall into many groups, and the matching method consists of the following four major steps (Zitova and Flusser, 2003): (a) feature detection; (b) feature matching; (c) transform model estimation; (d) image resampling and transformation. Feature detection is the foundation of image matching, and many studies have addressed it (Roberts, 1963; Moravec, 1981; Harris and Stephens, 1988; Lowe, 1999, 2001, 2004). Earlier work includes the Moravec algorithm (Moravec, 1981), which detects corners from local interest points, and the Harris corner detector (Harris and Stephens, 1988). Recent studies focus on the scale invariant feature transform (SIFT) (Lowe, 1999, 2001, 2004) for feature detection, while traditional edge detectors, such as the Roberts operator (Roberts, 1963), Sobel operator (Sobel, 1990), Prewitt operator (Prewitt, 1970), LoG operator (Lindeberg, 1998), Canny operator (Canny, 1986), and SUSAN (smallest univalue segment assimilating nucleus) operator (Smith and Brady, 1997), are all effective and mature for detecting image edges. These methods are usually based on feature point representatives such as centers of gravity, line endings, and distinctive points, on phase congruency information (Kovesi, 2003), or on derivatives of the image intensity.
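As a minimal illustration of these classical gradient-based edge detectors, the Sobel operator can be sketched in NumPy as follows (an illustrative sketch, not code from the paper; the kernels are the standard 3×3 Sobel masks):

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)  # horizontal-gradient kernel
SOBEL_Y = SOBEL_X.T                            # vertical-gradient kernel

def conv3x3(img, kernel):
    """Correlate a 3x3 kernel over the image with edge-replicated borders."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros((h, w), dtype=float)
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out

def sobel_magnitude(img):
    """Gradient magnitude: strong responses mark edges."""
    gx = conv3x3(img, SOBEL_X)
    gy = conv3x3(img, SOBEL_Y)
    return np.hypot(gx, gy)

# A vertical step edge: the response peaks along the intensity jump
# and vanishes in the flat regions.
step = np.zeros((8, 8))
step[:, 4:] = 1.0
mag = sobel_magnitude(step)
```

For a unit step the response at the jump is the sum of one kernel column's absolute weights (1 + 2 + 1 = 4), while flat regions give zero, which is what makes thresholding the magnitude a usable edge test.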

As is well known, there is a tremendous difference in pixel quality between underwater and on-land images. Since the images here are derived from underwater environments with an uncalibrated mono-camera, a robust method must be found to extract their features. The SIFT descriptor (Lowe, 1999, 2001, 2004) has several favorable properties: the features are invariant to image scale and rotation, changes in illumination, and the addition of noise, and the method is robust across a substantial range of affine distortions. In this paper, an improved SIFT approach is employed to process these underwater images.

This paper is organized as follows: section 2 describes how the underwater images are obtained; section 3 proposes the bidirectional matching algorithm and the stitching method; the experimental results are presented in section 4; and conclusions are drawn in section 5.

2 The acquisition of underwater images

In general, underwater image information can be obtained with the aid of light, sound, and vision. However, underwater imaging research and development is still basically in its early stage, lagging behind land-based imaging. From a practical point of view, many mature land-based vision technologies are not suitable for underwater use and cannot simply be transplanted. Owing to the complexity and changeability of marine environments, factors such as the impact of ocean currents on target identification, absorption and scattering by seawater, the fast attenuation of image information, and the low visibility of water all result in poor image quality. Therefore, the research and development of underwater robot vision systems must consider the impact of the underwater environment and the imaging characteristics of the underwater visual system.

Underwater images typically include acoustic images and optical images: the former rely on sonar detectors, while the latter are obtained from underwater cameras. This section briefly introduces the procedure of underwater image acquisition. First, the target location and distance are determined under the guidance of acoustic vision searching for the target. Second, within the illuminated working area, the target is detected and recognized under the guidance of optical vision, and the underwater image information is acquired. In this paper, the stern of the SWATH ship is observed with an uncalibrated mono-camera and underwater image sequences are acquired. Through analysis of the video images, some aspects of the SWATH ship's speed performance can be summarized.

As far as underwater optical images are concerned, image enhancement is necessary before the underwater image can be analyzed and processed. For example, Figs. 1-4 show the underwater image of the SWATH ship's hull, its histogram-equalized version, and the corresponding histograms.

Fig. 1 Original image

Fig. 2 Histogram of the original image

Fig. 3 Image after equalization

Fig. 4 Histogram of the image after equalization
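The histogram-equalization step illustrated in Figs. 1-4 can be sketched as follows. This is a NumPy sketch of standard global equalization; the paper does not give the authors' exact implementation:

```python
import numpy as np

def equalize_hist(img):
    """Global histogram equalization for an 8-bit grayscale image:
    remap gray levels through the normalized cumulative histogram so
    the output levels spread over the full [0, 255] range."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist).astype(np.float64)
    cdf_min = cdf[cdf > 0][0]            # first populated gray level
    lut = (cdf - cdf_min) / (cdf[-1] - cdf_min) * 255.0
    lut = np.clip(np.round(lut), 0, 255).astype(np.uint8)
    return lut[img]

# A synthetic low-contrast image squeezed into [100, 140], mimicking
# the flat histogram of a murky underwater frame.
low_contrast = np.linspace(100, 140, 64 * 64).reshape(64, 64).astype(np.uint8)
enhanced = equalize_hist(low_contrast)   # contrast stretched to [0, 255]
```

After equalization the gray levels occupy the full dynamic range, which is the effect visible when comparing the histogram of Fig. 2 with that of Fig. 4.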

3 Bi-directional matching algorithm and stitching method

In this section, a novel bi-directional matching algorithm is proposed to improve the accuracy of image matching. The registration strategy is straightforward: first, the SIFT feature extraction method is adopted to extract the feature vectors of the two images; second, the set of matching points for unidirectional registration is calculated by the SIFT method; third, repeating the previous step, the matching map in the opposite direction is calculated by the same approach. Note that it is not necessary to use the same threshold values (including the ratio value and the dist-threshold value) in the two registrations.
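The steps above can be sketched as ratio-test matching with a cross-check. This is an illustrative NumPy implementation with synthetic descriptors; the function names are ours, not the paper's:

```python
import numpy as np

def ratio_match(desc_a, desc_b, ratio=0.7):
    """One-way SIFT-style matching: for each descriptor in A, accept its
    nearest neighbour in B only if the ratio of the closest to the
    second-closest Euclidean distance is below the threshold."""
    matches = {}
    for i, d in enumerate(desc_a):
        dist = np.linalg.norm(desc_b - d, axis=1)
        nearest, second = np.argsort(dist)[:2]
        if dist[nearest] < ratio * dist[second]:
            matches[i] = int(nearest)
    return matches

def bidirectional_match(desc_a, desc_b, ratio=0.7):
    """Keep a pair (i, j) only if A->B maps i to j AND B->A maps j to i.
    The two directions could also use different thresholds."""
    ab = ratio_match(desc_a, desc_b, ratio)
    ba = ratio_match(desc_b, desc_a, ratio)
    return [(i, j) for i, j in ab.items() if ba.get(j) == i]

# Synthetic descriptors: image B holds the same features in reverse
# order, slightly perturbed, so the correct match of i is 19 - i.
rng = np.random.default_rng(0)
desc_a = rng.normal(size=(20, 128))
desc_b = desc_a[::-1] + 0.01 * rng.normal(size=(20, 128))
pairs = bidirectional_match(desc_a, desc_b)
```

The cross-check discards any pair that survives the ratio test in only one direction, which is exactly how the bidirectional strategy suppresses the one-way false matches discussed below.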

The image transformation model is the mathematical foundation of image matching and stitching, so the projective transformation model is adopted in this paper. According to the transformation relation between target images taken from different viewpoints and perspectives (Yang, 2008):

    [x_i', y_i', 1]^T ~ H [x_i, y_i, 1]^T                (1)

where (x_i', y_i') and (x_i, y_i) are the coordinates of the key-points in the sensed and reference images, respectively, and H is a 3×3 matrix called a homography.

Since the matrix H has eight degrees of freedom (it is defined only up to scale), Eq. (1) can be solved once the coordinates of four pairs of matching points obtained by the SIFT algorithm are available, and the homography H can thus be determined. To determine the exact location of the overlap between images, the sensed image must be re-sampled and transformed into a new blank image, from which a fused image is built. However, because the frames are extracted under different lighting conditions, the resulting images often differ in brightness, and naive fusion makes important parts appear discontinuous, with light and dark regions. To stitch the reference and sensed images smoothly, the new image is merged by the principal component analysis (PCA) algorithm (Pan et al., 2011).
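The recovery of H from four point pairs can be illustrated with a direct linear transform (DLT) in NumPy. This is a sketch assuming exact, outlier-free correspondences, not necessarily the paper's implementation:

```python
import numpy as np

def homography_from_points(src, dst):
    """Direct linear transform: recover the 3x3 homography H (8 DOF,
    fixed up to scale) from >= 4 point correspondences src -> dst.
    Each pair contributes two linear equations in the 9 entries of H."""
    rows = []
    for (x, y), (xp, yp) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y, -xp])
        rows.append([0, 0, 0, x, y, 1, -yp * x, -yp * y, -yp])
    A = np.asarray(rows, dtype=float)
    # The flattened H spans the null space of A: take the right singular
    # vector of the smallest singular value, then normalize H[2, 2] = 1.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_homography(H, pts):
    """Map 2D points through H using homogeneous coordinates."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Four matched corners of a 100x100 patch and a known ground-truth H.
H_true = np.array([[1.2, 0.1, 5.0],
                   [-0.05, 0.9, -3.0],
                   [5e-4, 2e-4, 1.0]])
src = np.array([[0.0, 0.0], [100.0, 0.0], [100.0, 100.0], [0.0, 100.0]])
dst = apply_homography(H_true, src)
H_est = homography_from_points(src, dst)
```

With exactly four pairs in general position the 8×9 system has a one-dimensional null space, so the estimate matches the ground truth up to numerical precision; with noisy real matches one would use more pairs and a robust estimator.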

4 Image registration and mosaic experiments and results

In this section, real underwater image sequences, such as the underwater images of the stern of the SWATH ship, are used for registration. The SIFT feature extraction method is adopted to extract scale-invariant interest points from the gray level images and their histogram-equalized versions at different ratios of distances. The ratio of distances is the ratio of the distance to the closest neighbor to the distance to the second-closest neighbor.

Fig. 5 The relationship between the number of matches and the different ratios of distances

Fig. 6 Correct matches rate

Fig. 5 displays the relation between the number of matches and the different ratios of distances. AB-match-histeq designates the unidirectional matching from image A to image B in the histogram-equalized images; BA-match-histeq designates the unidirectional matching from image B to image A in the histogram-equalized images. Similarly, AB-match-gray and BA-match-gray are the unidirectional matchings in the gray level images. The basic strategy of the SIFT feature matching algorithm is to take a feature point from image A and find the two feature points in image B with the smallest Euclidean distances to it. If the ratio of the closest to the second-closest distance is less than the preset matching threshold, the closest point in image B is accepted as the correspondence of the point in image A. Since matching is directional, matching from image A to image B is called AB unidirectional registration, and the reverse is BA unidirectional registration. Fig. 6 shows the rate of correct matches: correct-match-histeq designates the rate in the histogram-equalized images, and correct-match-gray the rate in the gray level images. From Fig. 5 and Fig. 6, it can be seen that the smaller the ratio threshold, the more accurate the matches but the fewer the correct match pairs; conversely, the higher the ratio threshold, the more unidirectional match pairs are produced, but the error also rises rapidly with their number. Combining these analyses, the distance ratio is chosen as 0.7. Fig. 7 shows the result of applying SIFT feature detection to the gray level images with a distance ratio of 0.7, in which there is one pair of clearly false matches.

Fig. 7 AB unidirectional image registration

Fig. 8 AB bi-directional image registration

In accordance with the bidirectional matching strategy proposed in this paper, the images are matched again. To eliminate false matches, a distance-threshold coefficient is set over the neighboring region; the distance threshold was set to 0.5. The matching results are shown in Table 1 and Table 2. Fig. 8 shows the result of bidirectional image registration with a distance ratio of 0.7; the number of correct matches is 18.

Table 1 Results of gray level image matching by the unidirectional and bidirectional match method

Table 2 Results of image after histogram equalized matching by the unidirectional and bidirectional match method

Finally, the above analysis is summarized with a group of experiments, as shown in Figs. 9-11. Comparing Fig. 7 with Fig. 8, and from Table 1 and Table 2, it can be concluded that the bidirectional matching algorithm is superior to the unidirectional method under the same conditions (e.g., a distance ratio of 0.7). As for image fusion, Fig. 11 shows intuitively that the PCA algorithm blends the edge of the overlapping region with good effect. The matching and stitching experiments on real-scene image sequences achieve accurate registration and a smooth mosaic.

Fig. 9 Alignment of image B with image A by the bi-directional match method

Fig. 10 Alignment of image A with image B by the bidirectional match method

Fig. 11 Image fusion
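The PCA-based blending of the overlapping region (Pan et al., 2011) can be sketched as a weighted fusion whose weights come from the leading eigenvector of the two images' covariance matrix. This NumPy sketch illustrates one common PCA-fusion formulation and is our assumption, not the exact algorithm of the cited paper:

```python
import numpy as np

def pca_fuse(img_a, img_b):
    """Fuse two aligned, equally sized grayscale images: stack their
    pixels as two variables, take the eigenvector of the 2x2 covariance
    matrix with the largest eigenvalue, and use its normalized
    components as blending weights."""
    data = np.vstack([img_a.ravel(), img_b.ravel()]).astype(float)
    cov = np.cov(data)                       # 2x2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    lead = np.abs(eigvecs[:, np.argmax(eigvals)])
    w = lead / lead.sum()                    # weights sum to 1
    return w[0] * img_a + w[1] * img_b

# Sanity check: fusing an image with itself must return the same image,
# since the weights collapse to (0.5, 0.5).
rng = np.random.default_rng(1)
overlap = rng.normal(size=(16, 16))
fused = pca_fuse(overlap, overlap)
```

Because the weights adapt to the relative variance of the two exposures, the blend favors the image carrying more information in the overlap, which helps smooth the light/dark seam described in section 3.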

5 Conclusions

In this paper, the problem of underwater image registration was approached by first analyzing the results obtained with different match ratios, from which the match-ratio selection principle and the proposed method were put forward. To make an informed choice, a comparative analysis was carried out on histogram-equalized images and gray level images respectively.

The proposed method provides some satisfactory results for this paper, which are summarized as follows:

(a) In view of the virtues of SIFT, namely that it is one of the most effective local feature descriptors invariant to scale, rotation, and illumination, this paper successfully employed the SIFT approach for processing underwater images. The experimental results also show that the SIFT approach is robust to added noise, viewpoint changes, and illumination changes, which makes good underwater image registration possible.

(b) A novel bidirectional matching strategy based on the SIFT algorithm is proposed; compared with the unidirectional matching method, it effectively reduces the false match rate and improves the accuracy of underwater image registration.

(c) Comparing Fig. 11 with Fig. 1 shows that the proposed method can reconstruct a wide-field, high-resolution underwater image, making panoramic underwater image mosaicing possible and helping to analyze underwater targets more clearly.

(d) Finally, the matched images are merged by the PCA algorithm, and the experimental results show that the PCA method achieves a good fusion effect.

In conclusion, this work shows that image processing technology has a wide range of applications in marine engineering and is of crucial significance.

Aguzzi J, Costa C, Robert K, Matabos M, Antonucci F, Juniper SK, Menesatti P (2011). Automated image analysis for the detection of benthic crustaceans and bacterial mat coverage at VENUS undersea cabled network. Sensors, 11(11), 10534-10556.

Aguzzi J, Company JB, Costa C, Matabos M, Azzurro E, Manuel A, Menesatti P, Sarda F, Canals M, Delory E, Cline D, Favali P, Juniper SK, Furushima Y, Fujiwara Y, Chiesa JJ, Marotta L, Bahamon N, Priede IG (2012). Challenges to the assessment of benthic populations and biodiversity as a result of rhythmic behaviour: video solutions from cabled observatories. Oceanography and Marine Biology, 50, 235-286.

Canny J (1986). A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6), 679-698.

Costa C, Loy A, Cataudella S, Davis D, Scardi M (2006). Extracting fish size using dual underwater cameras. Aquacultural Engineering, 35(3), 218-227.

Espiau FX, Rivers P (2001). Extracting robust features and 3D reconstruction in underwater images. OCEANS 2001: MTS/IEEE Conference and Exhibition, Honolulu, USA, 4013-4018.

Harris C, Stephens M (1988). A combined corner and edge detector. Proceedings of Fourth Alvey Vision Conference, Manchester, UK, 147-151.

Kovesi P (2003). Phase congruency detects corners and edges. The Australian Pattern Recognition Society Conference: Proceedings DICTA, Sydney, Australia, 309-318.

Lindeberg T (1998). Feature detection with automatic scale selection. International Journal of Computer Vision, 30(2), 79-116.

Lowe DG (1999). Object recognition from local scale-invariant features. International Conference on Computer Vision, Corfu, Greece, 1150-1157.

Lowe DG (2001). Local feature view clustering for 3D object recognition. IEEE Conference on Computer Vision and Pattern Recognition, Kauai, USA, 682-688.

Lowe DG (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91-110.

Moravec H (1981). Rover visual obstacle avoidance. International Joint Conference on Artificial Intelligence, Vancouver, Canada, 785-790.

Pan Y, Sun QS, Xia DS (2011). Image fusion framework based on PCA decomposition. Computer Engineering, 37(13), 210-212. (in Chinese)

Prewitt JMS (1970). Object enhancement and extraction. Lipkin BS, Rosenfeld A, eds. Picture Processing and Psychopictorics. Academic Press, Philadelphia, USA, 75-149.

Roberts LG (1963). Machine perception of three-dimensional solids. Ph.D. thesis, Department of Electrical Engineering, MIT, Massachusetts, USA, 1-39.

Smith SM, Brady M (1997). SUSAN—a new approach to low level image processing. International Journal of Computer Vision, 23(1), 45-78.

Sobel I (1990). An isotropic 3×3 image gradient operator. Freeman H, ed. Machine Vision for Three-Dimensional Scenes. Academic Press, Stanford, USA, 376-379.

Yang ZL (2008). Research on image registration and mosaic based on feature point. Ph.D. thesis, Xidian University, Xi’an, China, 17-31. (in Chinese)

Zitova B, Flusser J (2003). Image registration methods: a survey. Image and Vision Computing, 21(11), 977-1000.

Author biographies

Yan Lin was born in 1963. He is a professor at the Dalian University of Technology. His research interests include ship and offshore structure design, ship CAD software system development, and ocean engineering equipment research.

Bo Liu was born in 1977. He is a PhD candidate at the School of Naval Architecture and Ocean Engineering, Dalian University of Technology. His current research interests include naval architecture and ocean structure technology.

1671-9433(2014)02-0225-05

Received date: 2013-06-25.

Accepted date: 2014-04-23.

Supported by the “Liaoning Baiqianwan” Talents Program (No. 200718625), the Program of Scientific Research Project of Liao Ning Province Education Commission (No. LS2010046), and the National Commonweal Industry Scientific Research Project (No. 201003024).

*Corresponding author Email: linyanly@dlut.edu.cn

© Harbin Engineering University and Springer-Verlag Berlin Heidelberg 2014
