A dual channel perturbation particle filter algorithm based on GPU acceleration

时间：2024-07-28

LI Fan,BI Hongkui,XIONG Jiajun,YU Chenlong,and LAN Xuhui

1.Department of Graduate,Air Force Early-warning College,Wuhan 430019,China;

2.Department of Early Warning Intelligence,Air Force Early-warning College,Wuhan 430019,China

Abstract:The particle filter(PF)algorithm is one of the most commonly used algorithms for maneuvering target tracking.The traditional PF maps from multi-dimensional information to one dimensional information during particle weight calculation,and the incorrect transmission of information leads to the fact that the particle prediction information does not match the weight information,and its essence is the reduction of the information entropy of the useful information.To solve this problem,a dual channel independent filtering method is proposed based on the idea of equalization mapping.Firstly,the particle prediction performance is described by particle manipulations of different dimensions,and the accuracy of particle prediction is improved.The improvement of particle degradation of this algorithm is analyzed in the aspects of particle weight and effective particle number.Secondly,according to the problem of lack of particle samples,the new particles are generated based on the filtering results,and the particle diversity is increased.Finally,the introduction of the graphics processing unit(GPU)parallel computing the platform,the “channel-level”and “particle level”parallel computing the program are designed to accelerate the algorithm.The simulation results show that the algorithm has the advantages of better filtering precision,higher particle efficiency and faster calculation speed compared with the traditional algorithm of the CPU platform.

Keywords:particle filter(PF),dual channel filtering,graphic processing unit(GPU),parallel operation.

1.Introduction

The particle filter(PF)is an optimal regression Bayesian filtering algorithm based on Monte-Carlo simulation[1-3],which employs a set of sampled particles with correlation weights to approximate the system state distribution and outputs filtering results based on the particle weighted summation method.Compared with other filtering algorithms such as Kalman and H∞,PF has low requirements on environment and system,and this algorithm also has good expansibility and universality.It is widely used in aerospace,control and electronic fields.The main problems of PF are particle weight degradation[4,5]and sample scarcity[6],so far,many scholars have been devoted to the research of particle weight degradation and put forward a variety of solutions.The first is resampling,by copying the weight particles and removing the small weight particles to mitigate the weight degradation of the particles.There are mainly classification samplings such as combined sampling,residue sampling and importance resampling(SIR)and other methods.However,the resampling can lead to the loss of particle diversity or even serious sample shortage problems.The second is to improve the importance of PF density function,Kalman filter(KF),extended Kalman filter(EKF)and unscented Kalman filter(UKF)[7-11].Many methods are introduced into the distribution function of PF,such as Kalman particle filter(KPF)proposed by Xu et al.[12],extended particle filter(EPF)algorithm proposed by Zuoet al.[13],and unscented particle filter(UPF)algorithm proposed by Liu et al.[14].These methods suppress the degradation of the weight of particles to a certain extent,but the tracking effect is not ideal when the target maneuvers because of the limitations of these algorithms.The lack of samples and the degradation of weights are a contradiction between PFs and difficult to be completely eliminated,the degradation of weight is manifested as the weight in the partial particle concentration.The lack of the sample shows that the particle state distribution is too concentrated.In general,the more serious the degradation of particle weight,the more serious the lack of diversity after resampling.The lack of diversity is the negative effect of solving the weight degradation.Therefore,the key to solve this dual problem is the design of the resampling method,the current emergence of double sampling[15],effective particle size and other methods[16].

In addition,the performance of the PF algorithm is closely related to the number of particles,although the large number of particles can get better filtering effect,it will lead to the problem of low computational efficiency and low real-time performance[17-19].

In this paper,the PF algorithm is used to track the near space hypersonic target,the difference of prediction accuracy of particles in different dimensions is analyzed,a dual-channel perturbation particle filter algorithm is proposed based on graphic processing unit(GPU)acceleration (GPU-DC-PF),and the improvement of particle degradation is proved from the aspects of particle weight and effective particle number.Then, filtering results perturbation is used to generate new particles to alleviate the problem of particle scarcity.Finally,the GPU computing platform is introduced,and the implementation of the parallel scheme acceleration algorithm is designed.

2.GPU-DC-PF improved method

Assuming that the motion equation and the measurement equation of the system are respectively:

where X(k)and Z(k)are the state vector and the measurement vector,respectively;F(k)and H(k)are the state transition matrix and the measurement matrix,respectively;G(k)and u(k)are the input control matrix and the control signal,respectively;V(k)and W(k)are zero mean white Gaussian process noise and measurement noise,and their covariance are Q(k)and R(k)respectively.

2.1 PF algorithm

PF is essentially a sample of random particles{Xi(0:to approximate the statistical distribution of the state through the posterior probability density p[X(0:k)|Z(1:k)],after prediction and filtering to output the final filtering result based on particle weighting.X(0:k)is the state vector from 0 to k,Z(0:k)is the measurement vector from 0 to k,Xi(0:k)is the particle vector from 0 to k,qi(k)is the ith particle weight,Nsis the number of particles.However,since it is difficult to sample directly from p[X(0:k)|Z(0:k)],the importance probability density function π[X|Z]is usually used to obtain samples.The standard PF algorithm based on sequential important resampling(SIS)is as follows[20].

Step 1 Particle sampling

Randomly select Nssamples from the importance density function π[X|Z].

Step 2 Calculate the sample weights

The weights are normalized as

Step 3 Resampling

Preserve large weights particles and remove small weight particles. Then the posterior probability density distribution is approximately:

Step 4 State and covariance estimation

Conventional maneuvering target tracking is mostly performed in two-dimensional plane or three-dimensional space.The traditional PF maps from multi-dimensional information to one-dimensional information in Step 2 performs weight calculation,which leads to particle prediction information and weight mismatch.The physical nature of this problem is that the information transmission is incorrect and the information entropy is weakened,where the information entropy is the amount of information contained in the source.Sampling to get the current state of the forecast in Step1,be cause the prediction errors of different particles in different dimensions are different,the information entropy of each particle is also different,the information difference between the particles in the dimension is neglected in the process of Step 1 to Step 2,which leads to the decrease of the information entropy of the particles in Step 2.In view of this problem,this paper adopts the method of information equalization mapping to solve the entropy reduction problem of useful information in Step 1 to Step 2.

2.2 Particle prediction accuracy analysis

Assume that the target is a two-dimensional trajectory.The realization of PF weights can be calculated by likelihood function.

Fig.1 Likelihood functions based on residual joint distribution

Fig.1 shows the likelihood functions based on residual joint distribution,it is a curved surface,when the likelihood value Λj(k)is determined,the curved surface is crossed with a plane parallel to xoy,and the resulting tangent projected on the xoy plane is shown in Fig.2.There are numerous residual solutions satisfying the current likelihood.Obviously,the residuals differences of the particles in different dimensions are assimilated.In the traditional PF algorithm,the residual composition in the neighborhood of point B defaults to the current residual distribution in Fig.2,but the difference in the prediction accuracy of the particle in different dimensions is ignored in this algorithm.However,in the actual target tracking process,the target is in different dimensions of the movement speed,mobility cannot be completely consistent,that is,the residual distribution cannot be located in the B point neighborhood.There may be A,C and D points in Fig.2.Although the A,C and D points in the x and y dimension residual distribution differences are obvious,it is difficult to indicate this difference based on the joint distribution residual.

Fig.2 Residual distribution of likelihood function

Fig.3 shows mapping relationship between the particle residual,likelihood function and particle weight,assuming that the mapping of the residual to the likelihood function is fxy:Mxy→Nxy,where Mxy=and there arethat is,the mapping of fxyis not unique.Similarly,is not unique.It can be seen that there is no one-to-one correspondence between the joint distribution residual,the likelihood functions and the particle weight in the traditional PF algorithm.

Fig.3 PF mapping relationship between residual,likelihood function and particle weight

In the traditional PF algorithm,the weight of the particles is obtained based on the joint distribution residual,in fact,the accuracy of particle prediction for different dimensions has been compromised.If the accuracy of the particle j in different dimensions is similar at a certain time,the weight can indicate the real prediction accuracy of the particle j,but in most cases,the prediction accuracy of the majority of the particles in different dimensions is quite different.

Considering this problem,the GPU-DC-PF uses the edge distribution residual to describe the likelihood function,so that the algorithm can achieve a one-to-one mapping relationship between the residual,likelihood function and particle weight.The likelihood function of particle j in x-dimension is

where Zxis the measurement component,Rxis the measured noise variance component,is the x-dimensional measurement matrix,andis the covariance input.

According to(9),the shape of x-dimensional likelihood functionis a curve.When the likelihoodis determined,the distribution characteristic of the particle j in the x-dimensional prediction error can be accurately represented.

Fig.4 shows the GPU-DC-PF mapping relationship between residual,likelihood function and particle weight.Let the mapping relationship of the residual to the likelihood function be fx:Mx→Nx,wherethat is,fxis a subjective,and any residual has a likelihood value corresponding to it;there isis a single shot,and any residual has only one likelihood corresponding to it.

Fig.4 GPU-DC-PF mapping relationship between residual,likelihood function and particle weight

Assume that the mapping relationship of the residual to the likelihood function is gi:R(fx)→ Mx.∀n∈from fxfor a single shot that m1=m2,f-1xis unique,and any likelihood value exists with a residual expected to correspond to it;∀n∈R(fx),ifwhereAs defined by,we know that n1=n2,that is,is a single shot,any likelihood has and only one residual corresponding to it.Thereis one-to-one correspondence between the residuals,the likelihood functions and the particle weights in the GPU-DC-PF algorithm.

2.3 Particle degradation analysis

Assuming that the target observation state is Z(k)=[Zx(k),Zy(k)]Tat the time k,in the traditional PF,the weights of the two-dimensional particles are solved based on the current measurement information.Only a few of particles of considerable weight are involved in the final filtering calculation,and the efficiency is low.The essence of the PF algorithm is to select the current measurement ofneighborhood particles as the effective particle set Uxy,and through the effective particle set to get the filtering output,however,in the actual tracking process,the state of different dimensions is independent,and the prediction accuracy of any particle in different dimensions is different.The advantages of GPUDC-PF are analyzed in the aspects of particle weight and effective particle number.

2.3.1 Particle weight analysis

For convenience,let the covariance of residual at time kbe constant a,then there is

Fig.5(a)is a three-dimensional comparison of weights of the joint distribution and the edge distribution,where the edge distribution only considers x-dimensional measurement information.Fig.5(b)is the plane projection of Fig.5(a).The weight of the edge distribution is always greater than the weight of the joint distribution,when the difference ofandincreases,the GPU-DCPF has higher discrimination in particle size prediction in different dimensions,and can effectively suppress weight degradation.

Fig.5 Comparison of weights of different distributions

2.3.2 Analysis of effective particles numbers

Point A in Fig.6 has a very high approximation in the xdimension,but low in y-dimension.This difference leads to A/∈Uxy,where Uxyis the effective particle set of x and y dimensions.In this paper,a method to approximate the real state of particles based on different dimensions is proposed,the effective particle sets Uxand Uyare selected based on different dimensions,and the different dimension filter output is calculated according to different sets.

Fig.6 Particle distribution diagram based on real state

To sum up,Ux⊇Uxy,Uy⊇Uxy,the distribution of effective particles in the GPU-DC-PF is larger,the number of effective particles is larger,and the survivability of the particles is higher compared to the traditional PF.

2.4 Information entropy analysis

According to the definition of information entropy,we can get the information entropy of x-dimensional residual and likelihood function respectively:

where p(·)is the probability.The PF algorithm due to different dimensions of the residual coupling has h(vx(k))+h(vy(k))＞ h(Λ(k)),that is the lost part of the information in the transmission process,and it leads to a reduction in useful information.There is h(vx(k))=h(Λx(k))in GPU-DC-PF,the useful information in the delivery process is not eliminated and it ensures the correctness of the information transmission.In addition,h(vx(k))and h(Λy(k))are independent of each other,the information of different dimensions does not have coupling,and this feature is consistent with the rules of parallel computing.Similarly,other dimensions of GPU-DC-PF have the same nature.

2.5 Particle perturbation update

The lack of sample refers to the resampling after a few large weight particles are repeatedly copied,and this process will lead to a significant reduction in the number of different particles.In response to this problem,each time after filtering,the filter results are superimposed by randomized perturbation[21],which improves the particle diversity and alleviates the problem of particle depletion[22-24].

From the structure point of view,GPU-DC-PF does not change the main structure of the PF algorithm,the only difference is that the PF algorithm is carried out in different directions,PF only needs to calculate a likelihood function,a weight for a particle.In the GPU-DC-IMM,a particle needs to calculate the likelihood function and the weights separately in different dimensions.In addition,GPU-DCPF based on the results of the perturbation of new particles generates particles to alleviate the problem of lack of particle samples.

3.GPU-based dual-channel PF parallel computing design

The strong law of large numbers shows that the larger the number of particles,the better the accuracy and convergence performance of the filter,but the larger number of particle operations will have larger computational redundancy,which leads to the real-time problem that limits the application of the PF algorithm.In view of this problem,on the one hand,we can optimize the filter design,balance the contradiction of estimation accuracy and calculation redundancy;on the other hand,based on the current hardware level,open up a new computing platform to optimize algorithm implementation.

With the major breakthroughs in floating point operations,GPU-based parallel computing has become a feasible solution to real-time problems,in particular,since the emergence of the compute unified device architecture(CUDA)[25-27],GPU-based parallel computing has become one of the mainstream in aerospace,data mining,supercomputers and other high-performance common computing.The NVIDIA released a feasibility study based on GPU acceleration to enable the phased array radar missile tracking system to achieve 10 000 times performance improvements at NVIDIA’s GTC conference in 2016.At this point,GPU-based parallel computing has be come a phased array radar to achieve high real-time weathervane.This paper combines the characteristics of GPU multi-core multithread to plan GPU-DC-PF algorithm parallel computing tasks[28-30],a “channel-level”and “particle-level”parallel computing scheme is designed[25],while the GPUDC-PF “channel level”parallel corresponds to the GPU thread block parallel,and the “particle level”parallel corresponds to the thread parallel.The specific programs are shown in Fig.7 and Fig.8.

Fig.7 “Channel-level”parallelism

Fig.8 “Particle-level” parallelism

3.1 “Channel level” in parallel

The operation of different channels corresponds to different dimension filtering.The data are calculated independently in different channels,respectively,to complete the particle prediction,weight calculation,resampling,perturbation to generate new particle,and achieve different dimensions of the filter calculation.

3.2 “Particle level” in parallel

Each particle corresponds to a thread in the “particle-level” parallel program, the total time consuming of the algorithm is the time consuming for a single particle with the longest computation time.The detailed steps in the GPU-DC-PF are as follows.

Step 1 Particle parallelism prediction

Build the kernel threads with the same number of particles to complete the particle prediction,residual likelihood calculation,independently updating the state of each particle to achieve maximum parallelization.

Step 2 Weight calculation and normalization

Based on the residual likelihood of each particle,the weight of the particle is solved and normalized(serial).

Step 3 Particle serial resampling

Particle resampling is done by serial calculation when the correlation between particles is unavoidable for all particles resampling.

Step 4 Filter estimation

All the resampled particles are weighted summed,build the same number of kernel threads as the number of particles where each thread completes a particle state weighting to maximize the advantages of GPU parallel computing.

Step 5 Perturbs to generate new particles

With the target filter state as the mean,the state error covariance as the variance perturbation generates new particles instead of the multiple copies of the particles.The filter results are output to the CPU.The filter is completed at this moment and then the next moment cycle.

3.3 Time complexity analysis

If the traditional serial computing method is adopted,the time complexity of the particle filter algorithm is a function of the number of particles Ns,the time complexity is the linear order of the number of particles O(Ns).In the weight calculation and normalization stage,according to(3)and(4),the time complexity is O(Ns).In the resampling stage,using sequential importance sampling(SIS)for resampling,PF time complexity for the square order is about O().In the state and covariance estimation phase,the complexity is still O(Ns).GPU-DCPF is divided into different dimensions at the above stage.The GPU-DC-PF calculation is twice the PF in each stage.However,the superposition of time complexity of the same order does not affect the result,therefore,the GPU-DCPF is the same as the PF time complexity in the above stages.In addition,GPU-DC-PF also generates new particles by perturbation.At this stage,it is first necessary to detect and remove duplicate particles,similar to bubble sequencing,and its time complexity isAnd then produce new particles,the time complexity iswhereandare the number of particles remaining after the particles are deleted in different dimensions.Sinceis not constant to,its complexity is determined by the party that produces the majority of the particles.

While GPU-DC-PF using GPU to achieve parallel computing in the actual process,combined with GPU-DC-PF implementation steps,the time complexity is constant order O(1)in the parallel computing part,and the time complexity in the serial calculation section does not change.

4.Simulation analysis

Assuming that the target is released at a high altitude of 15 km,jumping cruises at 60 to 100 km altitude,the full ballistic motion pattern is complex and the maneuverability is strong.The state transition matrix F(T)=,and,the control matrix,observed noise w(k)and process noise n(k)are Gaussian random noise for N(unit/m)distribution,the sampling time interval is 1 s,with the simulation platform operating system for Windows 7,CPU model for the dual-core Intel(R)Core(TM)i7-3770 3.40 Hz,GPU model for the NVIDIA GeForce GTX 650,programming environment for Microsoft Visual Studio 2008.Compared with SIR,adapting particle filter(APF)[31]and sequential quadratic programming-particle filter(SQP-PF)algorithm[32],the number of particles is 1 000,and 100 Monte Carlo simulation results are shown in Fig.9,where only the SQP-PF is compared with the GPU-DC-PF due to the limitation of the picture.

In order to illustrate the advantages of the GPU-DC-PF algorithm compared with SIR,APF and SQP-PF,the particle weight distribution of 400-420s in different directions is compared and analyzed.

Fig.9 Comparison of filtering results

In order to verify the advantages of GPU-DC-PF in particle efficiency and computational efficiency,the filtering effect of the four algorithms is shown in Table 1 by changing the number of particles.Set the following parameters to evaluate the performance of the algorithm,overall root mean square error(ORMSE)is the total variance(unit/m),Ne is the number of effective particles,f is the number of filtering failures,T is the number of steps to track,and Tc is the time consuming(unit/s)of the algorithm.

where M and Neffiare the Monte Carlo times of successful filtering,the number of effective particles in the ith Monte Carlo simulation experiment,qji(k)is the weight of the normalized particle j at the time k in the ith experiment,ceil[·]is the rounding function,and qjkis the weight of the normalized particle j at time k.

Table 1 Comparison of algorithm performance

Compared with the above filter results,the GPU-DC-PF algorithm has the following advantages.

(i)The tracking accuracy improves.It can be seen from Fig.9 and Table 1 that the GPU-DC-PF algorithm has a relatively high accuracy when the number of particles is the same.GPU-DC-PF through the dual-channel particle distribution filter avoids the particles to predict the accuracy of assimilation in different directions,and more accurate characterization of different directions of different patterns of movement,especially when the difference in maneuvering strength is large in different directions,the traditional PF algorithm shows multiple error spikes,GPUDC-PF can maintain a more stable tracking accuracy.

(ii)Effectively mitigate the degradation of particle weights.It canbe seen from Fig.10that the particle weight distribution indifferent directions of the SQP-PF algorithm is difficult to accurately indicate the accuracy of particle prediction,that is,as shown in Fig.10(a)and Fig.10(c),there is a problem that the particle has a higher prediction accuracy in the y direction but lower in the x direction.The essential reason is that the prediction accuracy of the particle in the y direction affects the prediction accuracy in the other direction.The particles must simultaneously achieve higher prediction accuracy in different directions in order to obtain a larger weight in SQP-PF,but the actual tracking process in different directions of movement independent,this feature greatly reduces the number of effective particles,the particle weights are calculated through different channels to maximize the prediction advantage of particles in either direction in GPU-DC-PF,Fig.10(b)and Fig.10(d).Table1 shows that GPU-DC-PF has a more uniform particle weight distribution,more effective particles,better adaptability and fewer filter divergence times than SIR,APF and SQP-PF.

(iii)Real-time enhancement.Speed up the algorithm through the GPU to achieve rapid operation,reducing the filter time consuming,and greatly expanding the particle filter in the field of high real-time filtering applications.

Fig.10 Normalized particle weight distribution comparison

(iv)To a certain extent inhibit the lack of particles.As can be seen from Table 1,the GPU-DC-PF algorithm requires the smallest number of particles,and the generation of new particles is based on the results of filtering results in the expansion of particle diversity,which alleviates the problem of lack of particle diversity due to repeated replication of particles during resampling.

5.Conclusions

The GPU-DC-PF algorithm is proposed for the difference of the effectiveness of particles in different dimensions.Firstly,through the selection of effective particles in different dimensions,the problem of the degradation of traditional PF weights is alleviated,and the prediction advantage of particles in any dimension is brought into full play.Secondly,the new particles enhance the particle diversity based on the perturbation of the filtering results.Finally,in order to improve the real-time performance of the algorithm,a parallel scheme is designed according to the characteristics of GPU parallel operation,the simulation results show that GPU-DC-PF has improved performance in tracking accuracy,effective particle number,particle diversity and real-time performance.