Date: 2024-08-31

Classification of Underwater Target Echoes Based on Auditory Perception Characteristics

Xiukun Li1,2＊, Xiangxia Meng1,2, Hang Liu1,2 and Mingye Liu1,2

1. Science and Technology on Underwater Acoustic Laboratory, Harbin Engineering University, Harbin 150001, China
2. College of Underwater Acoustic Engineering, Harbin Engineering University, Harbin 150001, China

In underwater target detection, bottom reverberation shares some properties with the target echo, which seriously degrades detection performance. It is therefore essential to study the differences between the target echo and reverberation. In this paper, motivated by the ability of human listeners to distinguish between objects, the Gammatone filter is taken as the auditory model. Time-frequency perception features and auditory spectral features are then extracted to separate active sonar target echoes from bottom reverberation. For the experimental data, the features are tightly concentrated within each class and differ markedly between classes, which shows that the method can effectively distinguish the target echo from reverberation.

Keywords: underwater target detection; auditory perception characteristics; target echoes; bottom reverberation; Gammatone filter

1 Introduction

When detecting quiet or buried underwater targets, the performance of an active sonar system is seriously degraded by reverberation caused by the roughness of the seabed and by other scatterers. Analyzing the differences between the target echo and bottom reverberation helps in designing effective identification methods and improving recognition performance. Time-frequency analysis and higher-order statistics methods, such as the wavelet transform and spectrum analysis, have been introduced for this purpose. However, because of the interference of bottom reverberation in complex backgrounds, it is difficult to achieve satisfactory results with these methods.

The human auditory system is highly sensitive and has a large dynamic range. In underwater acoustics, a sonar operator can judge whether a target is present, and of what type, by listening to the received echoes. Because human hearing plays such an important role in processing sonar echoes, it can be applied to extracting the characteristics of underwater target echoes. Wang Yang et al. (2006) and Wang Na and Chen (2006) simulated the human auditory mechanism and used psychoacoustic parameters such as loudness, specific loudness and specific sharpness for underwater target recognition. Their results showed that the extracted characteristics reflect the nature of the signal and improve target recognition performance. Building on research into the perception of the physical properties of objects vibrating in air and underwater (Tucker and Brown, 2003), Tucker and Brown (2005) developed a framework for classifying underwater transient signals using features selected by the Parcel algorithm. Young and Hines (2007) extracted auditory perception characteristics that describe the timbre of echoes produced by impulsive or continuous transmitted signals. On this basis, an automatic classifier was designed to discriminate between target and clutter echoes, and it achieved good recognition performance (Young and Hines, 2007; Allen et al., 2011). Yang (2012) extracted timbre-related auditory perception characteristics from the high-order diagonal slice spectrum to classify the target echo, reverberation and noise in the received signal; processing of experimental data showed that this method achieves good classification performance.

In the literature above, perception features perform well in underwater target detection, but none of these studies addressed the problems caused by reverberation. In this article, perception features are used to classify target echoes and reverberation in order to improve detection performance. Following the methods used to extract auditory perception features in speech and music processing, the Gammatone filter, which is used in speech signal processing as a model of the human auditory filter, is used to build the auditory model. With a feature extraction method suited to received sonar echoes, the auditory perception characteristics of the target echo and of the reverberation are then extracted. Analysis and comparison of the extracted characteristics show that they provide an effective means of distinguishing the target echo from bottom reverberation.

2 Human auditory model

The linear frequency modulated (LFM) pulse is commonly used in active sonar systems to detect and recognize underwater targets. The echo reflected from a shell structure has the same waveform as the original pulse, so the received sonar echo can be regarded as a combination of LFM signals and interference introduced by the channel. The target echo therefore shows the same obvious frequency variation as the LFM signal. Reverberation, on the other hand, is formed by waves scattered from a large number of random scatterers and does not show this frequency variation. The human auditory system performs a time-frequency analysis of sound and is sensitive to signals whose frequency changes in a regular way. Therefore, a model of human hearing can be used to distinguish the target echo from reverberation.

In the human auditory system, the cochlea in the inner ear plays the dominant role in processing sound signals. The basilar membrane in the cochlea transforms sound signals into electrical signals and transmits them to the nervous system. Different points on the basilar membrane respond maximally to stimuli of different frequencies, so the basilar membrane acts as a set of parallel bandpass filters. It is primarily this filtering property of the human auditory system that is used here. Researchers have put forward a variety of auditory filter models to study the human auditory system. The Gammatone filter simulates the filtering characteristics of the basilar membrane well with only a small number of parameters (Chen et al., 2008), and it has a simple unit impulse response from which the transfer function can be derived and easily realized as an IIR filter (Slaney, 1993). In this paper, therefore, the Gammatone filter bank is used as the human auditory model.

The Gammatone filter has a simple unit impulse response, which can be expressed in the time domain as (Van Immerseel and Peeters, 2003):

$$g(t) = B^{n}\, t^{\,n-1} e^{-2\pi B t} \cos(2\pi f_c t + \varphi)\, u(t) \qquad (1)$$

where $u(t)=0$ when $t<0$ and $u(t)=1$ otherwise, which ensures that the filter is causal; $B$ is the bandwidth of the filter, $n$ is the filter order, which determines the slope of the filter edges, $f_c$ is the center frequency of the filter, and $\varphi$ is the initial phase. The center frequencies of the filter bank are distributed evenly on the equivalent rectangular bandwidth (ERB) scale, and the relationship between the ERB scale and the linear frequency $f$ (in kHz) is given by (Moore and Glasberg, 1996):

$$\mathrm{ERB\ number} = 21.4\,\log_{10}(4.37 f + 1) \qquad (2)$$

The bandwidth of each filter is $B = b_1 \times \mathrm{ERB}(f_c)$, where $\mathrm{ERB}(f_c)$ is the equivalent rectangular bandwidth of the Gammatone filter, that is, the width of a rectangular filter whose output equals that of the Gammatone filter when both are driven by the same white noise. Its relationship with the center frequency is:

$$\mathrm{ERB}(f_c) = 24.7\,(4.37 f_c + 1) \qquad (3)$$

with $f_c$ in kHz and the ERB in Hz. The parameter $b_1 = 1.109$ is introduced to match the filter characteristics to the physiological characteristics. Under these conditions, the fourth-order Gammatone filter can simulate the filtering characteristics of the basilar membrane.
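The channel layout described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the commonly used Glasberg-Moore expressions for the ERB scale and the ERB bandwidth (frequency in kHz), and the 1 to 4 kHz band and the function names are hypothetical choices made only for the example.

```python
import numpy as np

def erb_number(f_khz):
    """ERB-scale value of a frequency in kHz (assumed Glasberg-Moore form)."""
    return 21.4 * np.log10(4.37 * f_khz + 1.0)

def erb_number_to_khz(e):
    """Inverse of erb_number: ERB-scale value back to frequency in kHz."""
    return (10.0 ** (e / 21.4) - 1.0) / 4.37

def erb_bandwidth_hz(f_khz):
    """Equivalent rectangular bandwidth in Hz at a centre frequency in kHz."""
    return 24.7 * (4.37 * f_khz + 1.0)

def gammatone_centres(f_low_khz, f_high_khz, n_channels, b1=1.109):
    """Centre frequencies evenly spaced on the ERB scale, with B = b1 * ERB(fc)."""
    e = np.linspace(erb_number(f_low_khz), erb_number(f_high_khz), n_channels)
    fc = erb_number_to_khz(e)
    return fc, b1 * erb_bandwidth_hz(fc)

# Hypothetical band of 1 to 4 kHz split into 8 channels.
fc_khz, bw_hz = gammatone_centres(1.0, 4.0, 8)
```

The centre frequencies come out denser at the low end of the band, because equal steps on the ERB scale correspond to narrower steps in Hz at low frequencies.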

Ignoring the constant gain $B^n$ and the initial phase $\varphi$, and setting $b = 2\pi B$ and $\omega_c = 2\pi f_c$, the Laplace transform is applied to Eq.(1). After rearrangement, the transfer function of the fourth-order Gammatone filter is obtained:

$$G(s) = \frac{6\left[(s+b)^4 - 6\omega_c^2 (s+b)^2 + \omega_c^4\right]}{\left[(s+b)^2 + \omega_c^2\right]^4} \qquad (4)$$

It can be seen that the transfer function has four first-order zeros and two fourth-order poles, and its region of convergence is $\mathrm{Re}(s) > -b$, which includes the $j\omega$ axis and thus ensures that the filter is causal and stable.

To apply the impulse invariance method conveniently, Eq.(4) is rewritten as a cascade and parallel combination of first- and second-order systems:

By using $a_1 = \cos(\omega_c T)$, $a_2 = \sin(\omega_c T)$ and $a_3 = e^{-bT}$, and adding a correction factor $T$, the Gammatone filter's transfer function in the Z domain can be obtained and expressed as:

This shows that the Gammatone filter has an eighth-order transfer function in the Z domain, which can be realized by cascading four second-order IIR filters.
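As a rough sketch of such a cascade, the code below builds four second-order sections by applying impulse invariance to each factor $(s + b - c\,\omega_c)/[(s+b)^2 + \omega_c^2]$, where $c \in \{\pm(\sqrt{2}+1), \pm(\sqrt{2}-1)\}$ are the zero positions of the fourth-order numerator. This gives sections of the form $T\,[1 - a_3(a_1 + c\,a_2)z^{-1}] / [1 - 2a_1a_3z^{-1} + a_3^2z^{-2}]$, similar to the implementation described by Slaney (1993). It is an assumed form, not necessarily the exact expression used by the authors; the constant factor 6 is dropped, and the function names and test tones are hypothetical.

```python
import numpy as np

def gammatone_sections(fc, bandwidth, fs):
    """Return four (n0, n1, d1, d2) tuples, one per second-order section."""
    T = 1.0 / fs
    b = 2.0 * np.pi * bandwidth
    wc = 2.0 * np.pi * fc
    a1, a2, a3 = np.cos(wc * T), np.sin(wc * T), np.exp(-b * T)
    r = np.sqrt(2.0)
    sections = []
    for c in (r + 1.0, -(r + 1.0), r - 1.0, -(r - 1.0)):
        # H_k(z) = T * (1 - a3*(a1 + c*a2) z^-1) / (1 - 2*a1*a3 z^-1 + a3^2 z^-2)
        sections.append((T, -T * a3 * (a1 + c * a2), -2.0 * a1 * a3, a3 ** 2))
    return sections

def second_order_filter(x, n0, n1, d1, d2):
    """Difference equation y[k] = n0*x[k] + n1*x[k-1] - d1*y[k-1] - d2*y[k-2]."""
    y = np.zeros(len(x))
    for k in range(len(x)):
        acc = n0 * x[k]
        if k >= 1:
            acc += n1 * x[k - 1] - d1 * y[k - 1]
        if k >= 2:
            acc -= d2 * y[k - 2]
        y[k] = acc
    return y

def gammatone_channel(x, fc, bandwidth, fs=1.0):
    """Pass x through the four cascaded sections of one Gammatone channel."""
    y = np.asarray(x, dtype=float)
    for coeffs in gammatone_sections(fc, bandwidth, fs):
        y = second_order_filter(y, *coeffs)
    return y

# Hypothetical check with normalized frequencies (fs = 1): a tone at the centre
# frequency should pass with far more energy than a tone well outside the band.
k = np.arange(2000)
in_band = gammatone_channel(np.sin(2 * np.pi * 0.045 * k), fc=0.045, bandwidth=0.005)
out_band = gammatone_channel(np.sin(2 * np.pi * 0.2 * k), fc=0.045, bandwidth=0.005)
energy_ratio = np.sum(in_band ** 2) / np.sum(out_band ** 2)
```

The pole radius of every section is $a_3 = e^{-bT} < 1$, so the cascade is stable for any positive bandwidth. The overall gain is left unnormalized in this sketch.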

Here the Gammatone filter is used to process the received sonar echo signals, and features are extracted from the properties of the filter output waveforms to distinguish the target echo from bottom reverberation.

3 Auditory perception features

The output waveforms of the different channels of the Gammatone filter bank have different properties. Time-frequency perception features and auditory spectral features are used to distinguish the target echo from bottom reverberation. Their definitions and calculation methods are given below.

3.1 Time-frequency perception features

The features discussed here are extracted from the time-domain output waveforms of the filters. Because the filter bank separates the input into signals in different frequency bands, the signal has also been processed in the frequency domain, and the features are therefore called time-frequency perception features. They include the subband attack time, subband attack slope, subband decay time, subband decay slope and subband correlation.

The subband attack time (SBAT) is the time from the beginning of the waveform to the maximum of the temporal envelope; it describes when the envelope of each channel reaches its peak. The subband attack slope (SBAS) is the slope of the waveform from the beginning to the peak of the temporal envelope; it describes the rate of change before the peak. The subband decay time (SBDT) is the time from the peak of the temporal envelope to the end of the waveform; together with the SBAT it gives the duration of the waveform. The subband decay slope (SBDS) is the slope of the waveform from the peak to the end; it describes the rate of change after the peak. The meaning of the SBAT is illustrated in Fig.1.
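The four attack and decay features can be sketched for a single channel as below. The text does not state how the slopes are measured, so this sketch assumes straight-line slopes between the envelope end points and its peak, and reports the decay slope as a positive magnitude. The function name and the sample envelope are hypothetical.

```python
import numpy as np

def attack_decay_features(envelope, fs):
    """SBAT, SBAS, SBDT and SBDS of one channel's temporal envelope.

    Assumption: slopes are straight-line slopes between the first sample,
    the peak and the last sample; SBDS is returned as a magnitude.
    """
    env = np.asarray(envelope, dtype=float)
    k = int(np.argmax(env))            # index of the envelope peak
    peak = env[k]
    sbat = k / fs                      # time from start to peak
    sbdt = (len(env) - 1 - k) / fs     # time from peak to end
    sbas = (peak - env[0]) / sbat if sbat > 0 else 0.0
    sbds = (peak - env[-1]) / sbdt if sbdt > 0 else 0.0
    return {"SBAT": sbat, "SBAS": sbas, "SBDT": sbdt, "SBDS": sbds}

# Hypothetical envelope sampled at fs = 2 samples per unit time.
feats = attack_decay_features([0.0, 2.0, 4.0, 3.0, 2.0, 1.0, 0.0], fs=2.0)
```

In this example the envelope peaks at the third sample, so SBAT + SBDT equals the duration of the waveform, as described in the text.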

Fig.1 The meaning of SBAT

The subband correlation (SBCorr) is the average of the correlation coefficients between the temporal envelope of a particular channel and the temporal envelopes of the other channels. It is computed as:

where $\rho_{ij}$ represents the correlation coefficient between the waveforms of channel $i$ and channel $j$, and $N$ is the number of filter channels.
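A minimal sketch of this calculation is given below. It assumes that the average for channel $i$ is taken over the other $N-1$ channels, excluding the channel's correlation with itself; the normalization actually used by the authors may differ. The function name and sample envelopes are hypothetical.

```python
import numpy as np

def subband_correlation(envelopes):
    """Mean correlation of each channel envelope with all other channels.

    envelopes: array of shape (N channels, samples).
    Assumption: the self-correlation (equal to 1) is excluded and the sum
    is divided by N - 1.
    """
    rho = np.corrcoef(np.asarray(envelopes, dtype=float))  # N x N matrix of rho_ij
    n = rho.shape[0]
    return (rho.sum(axis=1) - np.diag(rho)) / (n - 1)

# Hypothetical envelopes: channels 0 and 1 rise together, channel 2 falls.
sbcorr = subband_correlation([[0, 1, 2, 3], [0, 2, 4, 6], [3, 2, 1, 0]])
```

A channel with a constant envelope has zero variance, so its correlation coefficients are undefined and would need separate handling.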

These features are extracted from each channel, so each feature is a set of values, one per channel. A curve of each feature versus frequency can thus be obtained, and statistical properties can be extracted from each curve. Taking the SBAT as an example, we extract the maximum value, the minimum value, the average value, the center frequency corresponding to the maximum value and the center frequency corresponding to the minimum value, abbreviated as maxSBAT, minSBAT, meanSBAT, maxSBAT-F and minSBAT-F, respectively. The same statistical properties are obtained from the other four features, SBAS, SBDT, SBDS and SBCorr. In total, 25 statistical time-frequency perception features are extracted and used to distinguish the target echo from bottom reverberation.
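The five statistics per feature curve, and the resulting 25 features, can be assembled as in the following sketch. The function name is hypothetical, and the curve values are placeholders rather than measured data.

```python
import numpy as np

def curve_statistics(name, values, centre_freqs):
    """Max, min, mean, and the centre frequencies at the max and min of a curve."""
    v = np.asarray(values, dtype=float)
    f = np.asarray(centre_freqs, dtype=float)
    return {
        "max" + name: v.max(),
        "min" + name: v.min(),
        "mean" + name: v.mean(),
        "max" + name + "-F": f[np.argmax(v)],
        "min" + name + "-F": f[np.argmin(v)],
    }

names = ["SBAT", "SBAS", "SBDT", "SBDS", "SBCorr"]
centre_freqs = [0.03, 0.04, 0.05, 0.06]
# Placeholder curves only; in practice each comes from the filter bank outputs.
curves = {name: [0.2, 0.5, 0.1, 0.4] for name in names}

features = {}
for name in names:
    features.update(curve_statistics(name, curves[name], centre_freqs))
```

Five statistics for each of the five per-channel features give the 25 entries of `features`.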

3.2 Auditory spectral features

Loudness is a psychoacoustic parameter that describes the intensity of a sound as perceived by the human auditory system, and it can be calculated with the Moore model. First, the excitation level (Moore and Glasberg, 2004) of the signal is calculated. Here the excitation level is the energy level of the signal in the cochlea after it passes through the filter bank; specifically, it is the sum of the energy within one ERB band after passing through the cochlea. The calculation formula (Ma et al., 2008) is:

where $P_j$ is the sound pressure at the $j$th frequency point, and $P_0 = 2\times10^{-5}$ Pa is the reference sound pressure. The relationship between the excitation level and the loudness (Zheng et al., 2007) is:

where $E$ is the excitation level and $E_{\mathrm{THRQ}}$ is the excitation level at the human threshold of audibility, both in dB. In the calculation, $C$ is 0.046 871, and above 500 Hz the values of $E_{\mathrm{THRQ}}$, $G$ and $\alpha$ are 2.306 7, 0.1 and 0.2, respectively. The loudness is obtained in units of sone/ERB.

Through the calculation above, a loudness value is obtained from each channel, and the loudness spectrum curve can then be drawn. Auditory spectral features can be extracted from this set of data, mainly the peak loudness value (PLV), the peak loudness frequency (PLF) and the loudness centroid (LC). The PLV is the maximum loudness over the frequency range, and the PLF is the frequency at which it occurs. The LC describes how the loudness is distributed over frequency, and the calculation formula is:

$$\mathrm{LC} = \frac{\sum_{n=1}^{N} \mathrm{ERB}(f_n)\, N'(n)}{\sum_{n=1}^{N} N'(n)} \qquad (10)$$

where $N$ is the number of channels, $\mathrm{ERB}(f_n)$ is the ERB frequency corresponding to the $n$th channel, and $N'(n)$ is the loudness of the $n$th channel. The meaning of these auditory spectral features is shown in Fig.2.
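The three auditory spectral features can be computed from a loudness spectrum as sketched below. The loudness centroid is taken here as the loudness-weighted mean of the channel ERB frequencies, which is the usual meaning of a centroid; the function name and the sample values are hypothetical.

```python
import numpy as np

def auditory_spectral_features(loudness, erb_freqs):
    """Peak loudness value, peak loudness frequency and loudness centroid."""
    n_prime = np.asarray(loudness, dtype=float)   # loudness of each channel
    f = np.asarray(erb_freqs, dtype=float)        # ERB frequency of each channel
    k = int(np.argmax(n_prime))
    return {
        "PLV": n_prime[k],
        "PLF": f[k],
        "LC": np.sum(f * n_prime) / np.sum(n_prime),
    }

# Hypothetical three-channel loudness spectrum.
spectral = auditory_spectral_features([1.0, 3.0, 2.0], [10.0, 20.0, 30.0])
```

A loudness spectrum weighted toward the upper channels pulls the LC upward, which is the property exploited later when the LC values of the two signal classes are compared.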

Fig.2 The meaning of auditory spectral features

4 Experimental data processing

First, the auditory perception features of an LFM signal are extracted to observe how well the features characterize the signal. Because the properties of the target echo resemble those of the LFM signal, the perception features of the target echo can be predicted from them. The sea trial data are then processed to validate the ability of the features to distinguish the target echo from bottom reverberation. Here, the signal frequency is normalized by the sampling frequency.

4.1 The LFM signal analysis

An LFM signal with normalized frequency ranging from 0.03 to 0.06 is simulated here. Considering the operating distance, the frequency response of the transducer, and the acoustic wavelength relative to the dimensions of the target, the frequency of the transmitted pulse also ranges from 0.03 to 0.06. To keep the signal intact, the center frequencies of the 64-channel filter bank are chosen to range from 0.03 to 0.06. After the signal is passed through the filter bank, the features described above are extracted. To see how the features represent the properties of the signal, each feature is plotted against the center frequency of the filter bank. These curves are shown in Fig.3 to Fig.8.

Fig.3 The peak value changing along with frequency

Fig.4 The SBAT changing along with frequency

Fig.5 The SBAS changing along with frequency

Fig.6 The SBDS changing along with frequency

Fig.7 The SBCorr changing along with frequency

Fig.8 The loudness value changing along with frequency

It can be seen in Fig.3 that the curve of the output peak value versus center frequency has a turning point: it rises quickly before the turning point, then rises slowly, and finally drops. This is because, in the time domain, the peak of the filter output tends to increase with the center frequency, but at both ends of the signal the output waveform has no significant bulge. As shown in Fig.4, the subband attack time increases steadily with frequency. This is because the frequency of the LFM pulse increases with time and each filter passes only the frequency components within its passband, so the output waveform peaks at the time when the frequency of the LFM pulse reaches the center frequency of the filter. The subband attack slope first rises and then falls, reflecting the relative changes of the peak value and the subband attack time. The subband decay slope is very small at low frequencies and changes significantly near the highest center frequency, because there the peak lies very close to the end of the waveform. Overall, the subband correlation is small, which indicates that the output waveform of each channel has little similarity to those of the other channels. The loudness reflects the distribution of the signal energy and varies in a similar way to the peak value. These curves show that the perception features can characterize the frequency properties of the LFM pulse, which is the basis for classifying the target echo and reverberation in this article.
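The reasoning about the subband attack time can be illustrated with a short sketch. For an LFM pulse whose frequency sweeps linearly from $f_0$ to $f_1$ over $L$ samples, the instantaneous frequency reaches a center frequency $f_c$ at sample $L\,(f_c - f_0)/(f_1 - f_0)$, which is where the corresponding channel is expected to peak. The pulse length of 1000 samples and the function names are hypothetical.

```python
import numpy as np

def lfm_pulse(f0, f1, n_samples):
    """LFM pulse whose frequency (cycles/sample) sweeps linearly from f0 to f1."""
    n = np.arange(n_samples)
    rate = (f1 - f0) / n_samples              # frequency change per sample
    return np.cos(2.0 * np.pi * (f0 * n + 0.5 * rate * n ** 2))

def expected_attack_time(fc, f0, f1, n_samples):
    """Sample index at which the instantaneous frequency equals fc."""
    return (fc - f0) / (f1 - f0) * n_samples

pulse = lfm_pulse(0.03, 0.06, 1000)           # hypothetical pulse length
t_mid = expected_attack_time(0.045, 0.03, 0.06, 1000)
t_all = expected_attack_time(np.linspace(0.03, 0.06, 64), 0.03, 0.06, 1000)
```

The predicted peak time grows linearly with the center frequency across the 64 channels, consistent with the rising SBAT curve described for Fig.4. The measured peak will deviate somewhat from this prediction because of the delay of each filter.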

4.2 Sea trial data processing

In this section, data from a sea trial are processed. The experiment was conducted at a test site with a flat sandy seabed, and the target was placed on the sea bottom. A monostatic sonar with a line array of receivers was used to detect the target. The signal was transmitted at vertical incidence, which produces a strong target echo and strong reverberation. The transmitted waveform was an LFM signal with normalized frequency ranging from 0.03 to 0.06. The received signal contains both the target echo and reverberation. To illustrate the ability of the perception features to classify them, the two types of signal were separated before being passed through the filter bank. The center frequencies of the 64-channel filter bank range from 0.03 to 0.06.

The auditory perception features are extracted from the target echo and from the reverberation separately. The results show that not all features separate the two classes adequately: some perform well, while others do not. As pointed out earlier, reverberation is a random process that depends heavily on the environment. Under different conditions, such as a different number of scatterers or a different type of seabed sediment, the properties of the reverberation may differ. Because each perception feature describes a different aspect of the signal, some features may not capture the differences between the target echo and the reverberation. For a given experiment, the features that perform best can be selected. In our experiment, 10 features can classify the signals; 4 of them are shown in Fig.9 to Fig.12.

Fig.9 The distribution of the maxSBAT-F value

Fig.10 The distribution of the maxSBAS-F value

Fig.11 The distribution of the maxSBDS-F value

Fig.12 The distribution of the LC value

The SBAT reflects the time from the beginning to the peak of each output temporal envelope. Because the frequency of the target echo changes in an obvious way, the longest SBAT is close to the length of the signal and usually appears near the highest frequency, whereas the shortest SBAT appears near the lowest frequency. In this respect the target echo resembles the LFM pulse, and its feature values have a concentrated distribution. Reverberation, however, is composed of many echoes with different amplitudes and lengths and with irregular frequency changes, and it is strongly influenced by the actual conditions. The samples selected here consist of data measured by different sensors under the same conditions, so the frequencies corresponding to the maximum SBAT are concentrated, as shown in Fig.9. The SBAS reflects the relative magnitudes of the peak value and the SBAT. For the target echo, the largest SBAS appears near the lowest frequency and the smallest near the highest frequency, and both are clearly concentrated, in contrast to the reverberation. The SBAS feature of the selected samples is shown in Fig.10.

The SBDS reflects the relative magnitudes of the peak value and the SBDT. The peak of the target echo occurs when the signal frequency equals the filter center frequency, and the higher the frequency, the later the peak appears; that is, the shorter the SBDT, the larger the SBDS. However, because the peak value drops near the highest frequency, the largest SBDS should appear at a frequency somewhat below the highest frequency. The reverberation does not behave in this way, as shown in Fig.11. The loudness of the target echo is distributed relatively uniformly, whereas that of the reverberation is concentrated mainly in the high-frequency part. Therefore, the LC value of the reverberation is larger than that of the target echo, as shown in Fig.12.

To obtain better performance, combinations of multiple features are used to separate the target echo and reverberation. The results are shown in Fig.13. In Fig.13(a), the features maxSBAS-F, maxSBDS-F and LC each perform very well on their own, and when they are combined the two kinds of signal are widely separated. In Fig.13(b), maxSBDT-F and maxSBAT-F perform well while the PLV does not, but in the combined feature space the points of each class are still concentrated. In Fig.13(c), all three features except the LC perform poorly, but a plane can still be found in the 3-D space that keeps the two kinds of signal on opposite sides.
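How a separating plane in a three-feature space might be found can be sketched as follows. The text does not say how the plane is calculated, so this example uses one simple choice: the plane through the midpoint of the two class means, normal to the line joining them. The data are synthetic stand-ins generated for illustration and are not values from the sea trial.

```python
import numpy as np

def midpoint_plane(class_a, class_b):
    """Plane w.x + c = 0 through the midpoint of the class means.

    The normal w points from the mean of class_a to the mean of class_b,
    so class_a points tend to give negative values and class_b positive.
    """
    mean_a, mean_b = class_a.mean(axis=0), class_b.mean(axis=0)
    w = mean_b - mean_a
    c = -w @ (mean_a + mean_b) / 2.0
    return w, c

rng = np.random.default_rng(0)
# Synthetic stand-ins for three normalized features per sample.
echo = rng.normal(loc=[0.2, 0.3, 0.2], scale=0.05, size=(40, 3))
reverb = rng.normal(loc=[0.8, 0.7, 0.8], scale=0.05, size=(40, 3))

w, c = midpoint_plane(echo, reverb)
echo_side = echo @ w + c        # signed position of each echo sample
reverb_side = reverb @ w + c    # signed position of each reverberation sample
```

When the two clusters are compact and well apart, as in this synthetic case, all echo samples fall on one side of the plane and all reverberation samples on the other. For less cleanly separated features a trained linear classifier would be a more robust choice.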

Fig.13 The distribution of some kinds of combined features

The figures show that in all three cases very good classification performance is achieved, and the target echo and the reverberation each have a concentrated distribution. A few individual points are dispersed, which is caused by the performance of the transducer channels and the selection of the data. In practical applications, these points can be handled better by restricting some parameters of the signal. In conclusion, although some features classify poorly on their own, they may help to improve performance when combined with others.

The simulation and the processing of the sea trial data show that the auditory perception features can describe the differences between the target echo and bottom reverberation, and that with these features the target echo and bottom reverberation received by an active sonar can be effectively distinguished from one another.

5 Conclusions

Motivated by the unique advantages of the human auditory system, auditory perception features are used to distinguish the target echo from reverberation in order to improve the performance of underwater target detection and recognition. Drawing on research in speech and tone recognition, the Gammatone filter bank is used as a human auditory model, and time-frequency perception features and auditory spectral features are extracted from the filter outputs. The experimental results show that the features of the target echo and of the reverberation are each well concentrated, and that the combined features separate the two categories very well. This verifies that it is feasible to classify active sonar echoes on the basis of auditory perception characteristics.

References

Allen N, Hines PC, Young VW (2011). Performances of human listeners and an automatic aural classifier in discriminating between sonar target echoes and clutter. The Journal of the Acoustical Society of America, 130(3), 1287-1298.

Chen Shixiong, Gong Qin, Jin Huijun (2008). Gammatone filter bank to simulate the characteristics of the human basilar membrane. Journal of Tsinghua University (Science & Technology), 48(6), 1044-1048. (in Chinese)

Ma Yuanfeng, Chen Kean, Wang Na (2008). The study of loudness’ calculation based on Moore’s model. Technical Acoustics, 27(3), 390-395. (in Chinese)

Moore BCJ, Glasberg BR (1996). A revision of Zwicker’s loudness model. Acustica United with Acta Acustica, 82(2), 335-345.

Moore BCJ, Glasberg BR (2004). A revised model of loudness perception applied to cochlear hearing loss. Hearing Research, 188(1-2), 70-88.

Slaney M (1993). An efficient implementation of the Patterson-Holdsworth auditory filter bank. Technical Report No. TR-35, Apple Computer Inc.

Tucker S, Brown GJ (2003). Modelling the auditory perception of size, shape and material: Application to the classification of transient sonar sounds. The 114th Audio Engineering Society Convention, Amsterdam, 1-12.

Tucker S, Brown GJ (2005). Classification of transient sonar sounds using perceptually motivated features. IEEE Journal of Oceanic Engineering, 30(3), 588-600.

Van Immerseel L, Peeters S (2003). Digital implementation of linear Gammatone filters: Comparison of design methods. Acoustics Research Letters Online, 4(3), 59-64.

Wang Yang, Sun Jincai, Chen Kean (2006). Feature extraction of underwater targets based on psychoacoustic parameters. Journal of Data Acquisition & Processing, 21(3), 313-317. (in Chinese)

Wang Na, Chen Kean (2006). Investigation on underwater target recognition based on auditory characteristics. Proceedings of ASC 2006 Chinese Acoustic Academy Conference, Beijing, 413-414. (in Chinese)

Young VW, Hines PC (2007). Perception-based automatic classification of impulsive source active sonar echoes. The Journal of the Acoustical Society of America, 122(3), 1502-1517.

Yang Yang (2012). The feature extraction and classification of active sonar echoes based on timbre parameters. M.E. thesis, Harbin Engineering University, Harbin. (in Chinese)

Zheng Wen, Chen Kean, Ma Yuanfeng (2007). Key problem of calculation of loudness based on Moore’s model. Audio Engineering, 31(6), 11-13. (in Chinese)

Author’s biography

Xiukun Li was born in 1962. She is a professor at Harbin Engineering University. Her current research interests include underwater acoustic signal processing, underwater buried object detection, sonar array signal processing and pattern recognition, etc.

1671-9433(2014)02-0218-07

Received date: 2013-10-16.

Accepted date: 2013-12-26.

Supported by the National Natural Science Foundation of China (Grant No.51279033).

＊Corresponding author. Email: lixiukun@hrbeu.edu.cn