
Facial Landmark Localization by Gibbs Sampling




Bofei Wang1, Diankai Zhang1, Chi Zhang2, Jiani Hu2, and Weihong Deng2
(1. ZTE Corporation, Shenzhen 518057, China;
2. Beijing University of Posts and Telecommunications, Beijing 100876, China)

In this paper, we introduce a novel method for facial landmark detection. We localize facial landmarks according to the MAP criterion. Conventional gradient ascent algorithms get stuck at local optima. Gibbs sampling is a kind of Markov chain Monte Carlo (MCMC) algorithm. We choose it for optimization because it is easy to implement and it guarantees global convergence. The posterior distribution is obtained by learning the prior distribution and the likelihood function. The prior distribution is assumed Gaussian. We use Principal Component Analysis (PCA) to reduce the dimensionality and learn the prior distribution. A locally linear support vector machine (LL-SVM) is used to obtain the likelihood function of every key point. In our experiments, we compare our detector with other well-known methods. The results show that the proposed method is simple and efficient, and it avoids being trapped in local optima.

facial landmarks; MAP; Gibbs sampling; MCMC; LL-SVM

1 Introduction

Facial landmark detection is a crucial step for face-related tasks such as face recognition [1]-[3], face tracking, face animation, and 3D face modeling. The accuracy of detection significantly affects the performance of such face-related systems.

Existing methods [4]-[15] for facial landmark detection fall into two categories:

1) A component detector, usually a classifier trained on the local feature of each component, is used to scan the whole face image and decide which subwindow contains the relevant component. To make the detector robust to degradation and corruption of local features, a shape constraint is combined to choose the optimal configuration of the key points.

2) A regressor is trained on the whole image region or on local features. The regressor directly predicts the positions of the key points. It is more efficient because it predicts the key points without scanning. Facial landmark detection is quite challenging when a face image is affected by the face angle, facial expression, and accessories such as glasses. In Fig. 1, we show some such challenging images.

In an Active Shape Model (ASM) [6], the deformable face shapes are represented by a set of key points that are localized with feature detection methods. The shape variations are modeled by Principal Component Analysis (PCA) so that the face shape can only vary in controlled directions learned during training. An Active Appearance Model (AAM) [4] solves the problem by jointly modeling holistic appearance and shape. In AAM, the shape and texture are combined in the PCA subspace so that the PCA coefficients are jointly tuned to reduce the geometry and texture differences from the mean face.

▲Figure 1. Examples of face images with a wide range of pose, expression, and accessories.

Everingham et al. [9] model the face configuration using pictorial structures and handle a wider range of pose, lighting, and expression by modeling the joint probability of the locations of nine fiducials relative to the bounding box with a mixture of Gaussian trees. Belhumeur et al. [12] propose a Bayesian model that combines the outputs of local detectors with a consensus of non-parametric global models for part locations. Uricar et al. [10] jointly optimize appearance similarity and deformation cost with a parameterized scoring function, where the parameters are learned from training instances rather than validation instances using the structured output Support Vector Machine (SVM) classifier. In recent years, many regression methods have been proposed. These methods make it possible to precisely localize facial landmarks. Dantone et al. [11] use the head pose as a global feature and use conditional regression forests to learn distributions conditional on global face properties. Cao et al. [7] directly learn a two-level boosted regression function based on shape-indexed features to infer the whole facial shape from the image and explicitly minimize the alignment errors over the training data. Xiong et al. [16] predict the shape increment by applying linear regression on SIFT features.

In this paper, we introduce a novel method for detecting facial landmarks. We localize the landmarks according to the maximum a posteriori (MAP) criterion. The posterior distribution is obtained by learning the prior distribution and the likelihood function. The prior distribution is assumed to be Gaussian. A locally linear SVM (LL-SVM) [17], [18] is used to obtain the likelihood function of every key point. To maximize the posterior distribution and guarantee global convergence, we use Gibbs sampling [19]. Compared with existing methods, our method can efficiently optimize the posterior probability in a huge probability space.

The remainder of this paper is organized as follows. Section 2 describes the detailed methodology for localizing the facial landmarks. Section 3 explains the experiment configuration. We summarize this paper in Section 4.

2 Localization of Facial Landmarks

We localize facial landmarks on the face image. We first obtain the face box with an off-the-shelf face detector, and then we normalize the face image to 100×100 pixels. We perform localization on the normalized face image and then convert the result back to the original image. We denote the facial landmark positions as a vector X = [x1, y1, x2, y2, ..., xm, ym]^T, where xi, yi are the horizontal and vertical coordinates of the i-th landmark, and the gray-level face image as I_G. Localization amounts to finding an optimal X* in the face image by maximizing the posterior probability:

X* = argmax_X P(X | I_G) = argmax_X P(I_G | X) P(X). (1)

We first learn the prior probability density distribution and the likelihood function. Then, we use a novel method to find the optimal X*. To locate the key points in the face box returned by the off-the-shelf face detector, the key points are restricted to a small region relative to the face box. Therefore, we define a search window for every key point and locate each key point in its corresponding window. We calculate the distribution of each key point, and the search window is then defined to include almost all points.
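The search-window construction described above can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the 3-sigma margin and the toy landmark data are assumptions; the paper only states that the window should cover almost all training points.

```python
import numpy as np

def search_windows(train_landmarks, n_sigma=3.0):
    """Per-keypoint search windows on the normalized 100x100 face.

    train_landmarks: array of shape (N, m, 2) with the (x, y) position
    of each of the m key points over N normalized training faces.  The
    window of key point i is its mean position +/- n_sigma standard
    deviations, clipped to the image, so it covers almost all training
    instances (n_sigma = 3 is an illustrative choice).
    """
    mean = train_landmarks.mean(axis=0)   # (m, 2) mean positions
    std = train_landmarks.std(axis=0)     # (m, 2) per-axis spread
    lo = np.clip(mean - n_sigma * std, 0, 99)
    hi = np.clip(mean + n_sigma * std, 0, 99)
    return lo, hi

# Toy data: two key points jittering around fixed positions.
rng = np.random.default_rng(0)
pts = rng.normal(loc=[[30.0, 40.0], [70.0, 40.0]], scale=2.0,
                 size=(500, 2, 2))
lo, hi = search_windows(pts)
```

Each key point then gets its own rectangular window [lo, hi] in which the salient map is evaluated.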

2.1 Likelihood Learning

The likelihood measures the feature similarity between the probe points and the ground-truth points. The likelihood score can be obtained with statistical models, regression methods, or discriminative methods; we choose a discriminative method. In the following, we call the likelihood scores over the search window the salient map. We calculate the likelihood probability by normalizing the likelihood scores to [0, 1.0]. The intensity values of the pixels in the neighborhood are used as the local feature. The LL-SVM produces a likelihood score for every point. Next, we briefly describe a fast version of LL-SVM. As described in [17], the coding vector of orthogonal coordinate coding (OCC) should be normalized in L1-norm. In our experiments, we find that, if x has been normalized, it is unnecessary to normalize the coding value C_x, because such L1-normalization contributes little to localization accuracy. So, to simplify the derivation, we omit the last step of OCC. Furthermore, the localization task is concerned only with the relative values of the LL-SVM output on the detected area. Therefore, the bias b can be ignored, and the decision function is a quadratic form:

f(x) = x^T A x,

where A is assembled from x_j, y_j, and α_j, the support vectors, their labels, and their coefficients, and G is the generator matrix composed of the normalized orthogonal bases obtained by SVD. We can then rewrite A as a real symmetric matrix A' = (A + A^T)/2, which can be diagonalized as:

A' = Z Λ Z^T,

where Z = [z1, z2, z3, z4, ...] is the orthogonal matrix consisting of the eigenvectors of A', and Λ is a diagonal matrix whose diagonal elements are the corresponding eigenvalues.

Since x^T A x = x^T A' x = Σ_{i=1}^{d} λi (zi^T x)^2, where d is the number of nonzero eigenvalues, and sorting the λi by absolute value in descending order, we can approximate x^T A x as:

x^T A x ≈ Σ_{i=1}^{n0} λi (zi^T x)^2.

From experience, we find that n0 can be highly compressed relative to n, and the approximation remains reliable even then. The training procedure for fast LL-SVM is shown in Algorithm 1. In our experiments, the fast LL-SVM is about three times faster than LL-SVM.

Algorithm 1. Training for Fast LL-SVM

Denote: D is the matrix of training samples; each column of D is the feature vector of a training sample.
1: Learn orthogonal basis vectors: (u, s, v) = SVD(D); select the top N columns of u as the basis vectors. These vectors assemble the generating matrix G.
2: Compute the training instance matrix K, with entries K_ij = K(i, j).
3: Use a traditional SVM package to solve: feed the matrix K, the training labels, and other parameters to the SVM package, and obtain the support vectors SVs and the corresponding weights α.
4: Compute A', then the approximate coefficients λi and vectors zi.
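The eigen-truncation step of Algorithm 1 (symmetrize A, keep the top-n0 eigen-pairs by absolute eigenvalue, and evaluate the quadratic form in the reduced basis) can be sketched as below. The matrix A here is a random stand-in, since constructing the true A from support vectors and the generator matrix G is beyond this snippet.

```python
import numpy as np

def truncated_quadratic(A, n0):
    """Return a function approximating x^T A x using the top-n0
    eigen-pairs of the symmetrized matrix A' = (A + A^T)/2, sorted by
    absolute eigenvalue, as in step 4 of Algorithm 1."""
    A_sym = (A + A.T) / 2                 # x^T A x == x^T A' x
    lam, Z = np.linalg.eigh(A_sym)        # eigen-decomposition of A'
    order = np.argsort(-np.abs(lam))[:n0] # largest |eigenvalue| first
    lam, Z = lam[order], Z[:, order]
    # x^T A x ~= sum_i lam_i * (z_i^T x)^2
    return lambda x: float(np.sum(lam * (Z.T @ x) ** 2))

rng = np.random.default_rng(1)
A = rng.normal(size=(8, 8))               # stand-in for the LL-SVM matrix
x = rng.normal(size=8)
exact = float(x @ A @ x)
approx_full = truncated_quadratic(A, 8)(x)  # n0 = n reproduces x^T A x
```

With n0 equal to the full dimension the reduced form reproduces x^T A x exactly; compressing n0 trades a small approximation error for the reported speed-up.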

2.2 Prior Learning

The prior probability of X can be learned from the training set. To simplify the derivation, we describe X as a matrix of dimension n×2, with each row representing one key point. First, we decompose X into several independent parameters: scale, angle, translation, and the inherent shape factor. Then we get:

X = s V T_r + C_xy,

where s is the scale factor, T_r is the orthogonal matrix related to the rotation of the key points, C_xy is the translation component with each row equal to [Cx, Cy], and V is the normalized shape. Because these parameters are independent, the prior probability is:

P(X) = P(s) P(θ) P(C_xy) P(V).

We model different parameters in different ways. Similar to ASM, the distribution of the inherent shape V is also assumed Gaussian. We then use PCA to reduce the dimensionality and learn the distribution:

u = B^T (V − u_v).

In this formula, V and u_v are unfolded into vectors, B is the matrix composed of the eigenvectors of the covariance matrix of V, and the mean normalized shape is denoted u_v. The eigenvalues are the variances in each dimension. We can then obtain the prior distribution of V from the distribution of u.
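The PCA shape prior above can be sketched as follows; the toy shape data and the choice of k components are illustrative assumptions, and the Gaussian log-prior is written up to an additive constant.

```python
import numpy as np

def learn_shape_prior(shapes, k):
    """PCA shape prior.  shapes: (N, 2m) flattened, aligned training
    shapes.  Returns the mean shape u_v, the top-k eigenvectors B of
    the shape covariance, and their eigenvalues (the per-dimension
    variances of the Gaussian prior)."""
    u_v = shapes.mean(axis=0)
    C = np.cov(shapes, rowvar=False)
    lam, B = np.linalg.eigh(C)            # ascending eigenvalues
    idx = np.argsort(lam)[::-1][:k]       # keep the k largest
    return u_v, B[:, idx], lam[idx]

def log_prior(shape, u_v, B, lam):
    """Gaussian log-prior (up to a constant) of a shape in the PCA
    subspace: u = B^T (V - u_v), with u_i ~ N(0, lam_i)."""
    u = B.T @ (shape - u_v)
    return float(-0.5 * np.sum(u ** 2 / lam))

# Toy training set: 200 "shapes" of 5 points (10 coordinates).
rng = np.random.default_rng(2)
shapes = rng.normal(size=(200, 10))
u_v, B, lam = learn_shape_prior(shapes, k=4)
```

The mean shape attains the highest log-prior; shapes farther from the mean along any principal direction are penalized in proportion to the inverse eigenvalue.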

2.3 The Optimization by Gibbs Sampling

Once the prior and likelihood probabilities have been determined, we search for the optimal X that maximizes the posterior probability. The globally optimal solution could be found by exhaustive traversal, but the computational complexity is very high. For example, if we intend to locate seven key points and every key point has 900 candidate locations, we would have to evaluate 900^7 configurations. The Markov chain Monte Carlo (MCMC) method is a technique for sampling from probability distributions by constructing a Markov chain whose equilibrium distribution equals the target distribution. MCMC is used in our system because it guarantees global convergence. It has been successfully used in face prior learning and image segmentation. There are different kinds of MCMC methods, including the Metropolis-Hastings algorithm, Gibbs sampling, and slice sampling. Gibbs sampling does not require any tuning if all the conditional distributions of the target distribution can be sampled exactly. Thus, we choose Gibbs sampling.

In our Gibbs sampling, the key point locations X are decomposed into different parameters. The probability space is simply controlled by P = {u, s, θ, C_xy}, where u is the m-dimensional vector determined by PCA and C_xy is a two-dimensional vector. That is to say, we can sample X by sampling the parameters. We write P as P = {P1, P2, P3, ..., PK}, where K is the total number of parameters of the model. We denote the i-th sample P^i. The sampling process is as follows:

1) Begin with an initial value P^0;

2) For each sample P^i, sample each variable from its conditional distribution given all other variables, using the most recent value of each variable.

The conditional distribution of each variable can be computed from the joint density. Because we have defined the variables as independent, the conditional distribution of each random variable equals its marginal distribution. Repeating step 2, we obtain samples P^1, P^2, P^3, ... subject to the target distribution. We then get samples of locations X^1, X^2, X^3, ... Finally, X* is computed using (1).
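The two-step procedure above can be sketched with a generic Gibbs sampler. The bivariate Gaussian target with correlation ρ = 0.8 is a toy assumption standing in for the parameter posterior; in the paper the conditionals come from the learned prior and likelihood.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_iter, seed=0):
    """Gibbs sampling on a toy 2-D Gaussian target with correlation
    rho.  Each coordinate is drawn from its exact conditional given
    the other, mirroring step 2 above: sample each parameter from its
    conditional using the most recent value of every other parameter."""
    rng = np.random.default_rng(seed)
    x = np.zeros(2)                       # step 1: initial value P^0
    samples = np.empty((n_iter, 2))
    s = np.sqrt(1 - rho ** 2)             # conditional std deviation
    for t in range(n_iter):               # step 2, repeated
        x[0] = rng.normal(rho * x[1], s)  # x0 | x1
        x[1] = rng.normal(rho * x[0], s)  # x1 | most recent x0
        samples[t] = x
    return samples

samples = gibbs_bivariate_normal(rho=0.8, n_iter=20000)
```

After a burn-in period the chain's samples reproduce the target's correlation structure, which is the property the paper relies on to approach the MAP solution.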

For initialization, we have tried two methods: Average of Synthetic Exact Filters (ASEF) [20] and SVM. We choose SVM because it is more precise than ASEF. The accuracy of initialization affects the convergence rate of Gibbs sampling. In this paper, we use LL-SVM to obtain the likelihood score for every key point; details are given in the previous section. We choose the maximum-probability points as the initial points.

3 Experiment and Comparison

In this section, we present experiments to evaluate the proposed facial landmark detector. We also compare the proposed method with the independent SVM detector, active shape models, and the detector proposed by Everingham et al.

3.1 Experiment Settings

To determine the effectiveness of the proposed method, we test our detector on the Labeled Faces in the Wild (LFW) [21] database. It contains 13,233 images, each 250×250 pixels, with great variance and low image quality, which is realistic. Dantone et al. describe the manual annotation of the eight landmarks of interest in LFW.

To keep comparability with [10], we randomly split the LFW database into training, testing, and validation sets in the proportion 6:2:2. We compare with the other competing detectors on the same testing set. The training and validation parts are selected using the same method as in [10].

We train the proposed detector and the baseline independent SVM detector; the other competing detectors have their own training databases. Two evaluation criteria are used to measure the detectors: the mean normalized deviation, defined by (10), and the maximal normalized deviation, defined by (11).

The mean normalized deviation is the average of ||x_j − x̂_j|| / K(X) over all key points (10), and the maximal normalized deviation is the maximum of ||x_j − x̂_j|| / K(X) over all key points (11), where x_j is the j-th point's ground-truth location, x̂_j is the j-th point's predicted location, and M is the number of key points. K(X) is the normalization factor [10], defined as the length of the line connecting the mid-point between the eye centers with the mouth center.
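The two evaluation criteria can be computed as follows; the toy coordinates are illustrative assumptions.

```python
import numpy as np

def normalized_deviations(pred, gt, eyes_mid, mouth):
    """Mean and maximal normalized deviation for one face.

    pred, gt: (M, 2) predicted / ground-truth key points.  The
    normalization factor K is the distance between the mid-point of
    the eye centers and the mouth center, as in the protocol of [10].
    """
    K = np.linalg.norm(np.asarray(eyes_mid) - np.asarray(mouth))
    d = np.linalg.norm(pred - gt, axis=1) / K
    return d.mean(), d.max()

# Toy face: two key points, each predicted 1 pixel off, K = 10.
gt = np.array([[0.0, 0.0], [10.0, 0.0]])
pred = gt + np.array([1.0, 0.0])
mean_dev, max_dev = normalized_deviations(pred, gt,
                                          eyes_mid=(5.0, 0.0),
                                          mouth=(5.0, 10.0))
```

Here both criteria equal 0.1, i.e. a 10% normalized deviation, which is exactly the threshold used in Table 1.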

In our experiments, we test the detector with 7 landmarks: the corners of the eyes (4 landmarks), the corners of the mouth (2 landmarks), and the nose.

3.2 Compared Methods

In this subsection, we introduce all the methods compared in our experiments. Some examples of localization are shown in Fig. 2.

3.2.1 Proposed Method

First, we use the fast LL-SVM to determine the likelihood score for every key point; details are given in the previous section. We choose the maximum-probability points as the initial points. In theory, we can obtain the optimal solution of X in a certain number of steps. Through experimentation, we find that the time needed for convergence is very long. The reason is that the pose varies widely, leading to a very large prior probability space.

▲Figure 2. Examples of localization in the LFW database.

We align the key points to a template by the Procrustes analysis method [22] and then determine the prior model from the aligned points. Thus, the probability space is compressed. Fig. 3 shows a comparison between the two conditions: the probability space is much smaller after the alignment process. We then draw samples in the probability space with the Gibbs sampling algorithm. Finally, the optimal X* is computed using (1).
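The Procrustes alignment step can be sketched as the standard similarity-transform fit below (scale, rotation, translation via SVD); the toy template and transform are illustrative assumptions, not data from the paper.

```python
import numpy as np

def procrustes_align(X, template):
    """Align key points X (m, 2) to a template (m, 2) by the optimal
    similarity transform, removing pose variation before the PCA
    prior is learned."""
    muX, muT = X.mean(0), template.mean(0)
    Xc, Tc = X - muX, template - muT      # center both point sets
    U, S, Vt = np.linalg.svd(Xc.T @ Tc)
    R = U @ Vt                            # optimal rotation
    if np.linalg.det(R) < 0:              # avoid a reflection
        U[:, -1] *= -1
        R = U @ Vt
    s = S.sum() / (Xc ** 2).sum()         # optimal scale
    return s * Xc @ R + muT

# Toy check: a scaled, rotated, translated copy aligns back exactly.
theta = 0.5
Rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
template = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.], [0.5, 2.]])
X = 2.0 * template @ Rot + np.array([3.0, -1.0])
aligned = procrustes_align(X, template)
```

Training shapes aligned this way differ only in the inherent shape factor, which is what compresses the prior probability space.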

3.2.2 Independent SVM Detector

This detector consists of eight independent SVM classifiers, one for each landmark. For training, LIBSVM [23] (an SVM tool package) is used. For each individual landmark, a training set containing examples of the positive and negative classes is constructed. We use the gray values of a pixel patch as the features; that is, for each point, we choose the pixel values in its neighborhood as its feature. The positive samples are generated from patches cropped around the ground-truth positions of the respective components. The negative samples are patches cropped from the image at a certain distance from the ground-truth positions. The distance between the negative samples and the ground-truth points satisfies the following condition:

▲Figure 3. Comparison of the probability distributions of scale and theta.

where x_j is the ground-truth location of the j-th key point and x̃_j is the negative sample's location. We set β = 0.1 in the experiment.
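Negative-sample generation under such a distance condition can be sketched as follows. The exact form of the paper's threshold is not recoverable here, so the rule "at least β times the normalized face size away from the ground truth" is an assumption consistent with β = 0.1 on the 100×100 face.

```python
import numpy as np

def sample_negatives(gt, n, beta=0.1, face_size=100, seed=0):
    """Draw n negative patch centers at least beta * face_size pixels
    from the ground-truth key point gt (this distance rule is an
    assumption; the paper's exact condition was lost in extraction)."""
    rng = np.random.default_rng(seed)
    out = []
    while len(out) < n:                   # rejection sampling
        p = rng.uniform(0, face_size, size=2)
        if np.linalg.norm(p - gt) > beta * face_size:
            out.append(p)
    return np.array(out)

negs = sample_negatives(np.array([50.0, 50.0]), 20)
```

Patches cropped at these centers then form the negative class for that landmark's SVM.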

To make the comparison meaningful, we also use the fast LL-SVM. The SVM regularization constant C is also set to 50 in order to minimize the classification error computed on the validation part of the LFW database. The parameter setting is the same as that used in our proposed method.

For testing, we use the well-trained classifiers of all components to predict the point positions in the test image. For each facial landmark, we choose the point with the maximum classifier response as the predicted position.

3.2.3 Active Shape Models

ASMs [4], [6] are statistical models of the shape of objects that are iteratively deformed to fit an instance of the object in a new image. The ASM algorithm has been widely used to analyze facial and medical images. Several extensions to this algorithm have been proposed. For example, Constrained Local Models [5] use PCA to model the landmark's appearance, and Boosted Regression Active Shape Models [24] use boosting to predict a new location for each point, given the patch around the current position. Stasm [25] is a C++ software library that is also based on ASM. We compare our proposed method with Stasm.

3.2.4 Detector Proposed by Everingham et al.

Everingham et al. [9] handle a wider range of pose, lighting, and expression by modeling the joint probability of the locations of nine fiducials relative to the bounding box with a mixture of Gaussian trees. The local appearance model is learned by a multiple-instance variant of the AdaBoost algorithm with Haar-like features used as the weak classifiers. The deformation cost is expressed as a mixture of Gaussian trees, and the parameters of this mixture are learned from examples. This landmark detector, trained on a database of consumer images, is publicly available, and we compare it with our detector. For this comparison, we consider only the landmarks relevant to our detector.

3.2.5 Flandmark Detector Proposed by Uricar et al.

Uricar et al. [10] jointly optimize appearance similarity and deformation cost with a parameterized scoring function. The parameters of this function are learned from training instances rather than validation instances using the structured output SVM classifier.

3.3 Analysis and Improvement

3.3.1 Sampling Analysis

We evaluate the sampling performance by the corresponding minimum-error sample, from which the upper bound of location accuracy is calculated. The accuracy is defined as the proportion of samples whose location errors are within a certain value. By continually increasing the sampling iterations, we depict the upper-bound curve of location accuracy (Fig. 4). As we can see, within 200 iterations the upper bound of accuracy for a mean deviation within 0.10 is 92.45%; it rises with more samplings but is almost stable after 300 iterations. In contrast, the upper bound of accuracy for a mean deviation within 0.05 keeps growing with more samplings.

▲Figure 4. The upper bound of accuracy (cumulative histograms for the mean deviation) increases as the number of Gibbs sampling iterations grows.

One remaining problem is inaccuracy, because it is very hard to sample exactly at the ground-truth position. We propose two schemes to address this problem, denoted Promotion 1 and Promotion 2. In the first, we filter the likelihood scores with a Gaussian filter to smooth the score surface; in this way, we avoid a very low score at the ground truth caused by small noise. In the second, "hard" sampling is changed to "soft" sampling: we apply a soft constraint to every point. That is, we construct a Gaussian window centered at each sampling point as a soft weight, and then find the maximum weighted response in the window. Fig. 5 shows the clear effect of the two strategies on the proposed method.
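The score-map smoothing of Promotion 1 can be sketched as a separable Gaussian convolution; sigma and the truncation radius are illustrative assumptions, since the paper does not give the filter parameters.

```python
import numpy as np

def smooth_score_map(score, sigma=1.0):
    """Blur the likelihood score map with a Gaussian so that small
    noise cannot suppress the score at the ground-truth position
    (Promotion 1).  Separable convolution with edge padding."""
    r = int(3 * sigma)                       # truncation radius
    t = np.arange(-r, r + 1)
    k = np.exp(-t ** 2 / (2 * sigma ** 2))
    k /= k.sum()                             # normalized 1-D kernel
    pad = np.pad(score, r, mode='edge')
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, 'valid'), 1, pad)
    return np.apply_along_axis(lambda v: np.convolve(v, k, 'valid'), 0, rows)

# Toy salient map: a single sharp peak.
score = np.zeros((21, 21))
score[10, 10] = 1.0
out = smooth_score_map(score, sigma=1.0)
```

The peak stays in place but its mass spreads over the neighborhood, which is exactly what lets a sample near (rather than exactly at) the ground truth still score well.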

We also propose a method to improve the efficiency of Gibbs sampling. A restart-like scheme is used: after every few iterations, we restart the sampling procedure from the last optimal sample. In our experiments, this method largely reduces the number of sampling iterations.

3.3.2 Accuracy Comparison

We present the accuracy comparison in Fig. 6. The evaluation criteria are the mean normalized error and the maximal normalized error. Table 1 shows the percentage of examples from the test part of the LFW database with the mean/maximal normalized deviation less than or equal to 10%. The curves of the Flandmark detector and the detector of Everingham et al. are estimated from the results in [10]. The proposed method localizes more than 97% of the images with a mean normalized deviation of less than 10%. This is similar to the Flandmark detector and far better than the other methods. Our method localizes more than 65% of the images with a maximal normalized deviation of less than 10%. This is far better than the Flandmark detector, which achieves 53.23%.

▲Figure 5. Cumulative histograms of the mean and maximal normalized deviation for the proposed method and the promotion methods.

▲Figure 6. Cumulative histograms of the mean and maximal normalized deviation for all compared detectors.

▼Table 1. Percentage of examples from the test part of the LFW database with the mean/maximal normalized deviation less than or equal to 10%

Independent LL-SVMs only detect the key points in local areas; they do not utilize the shape constraint. The Flandmark detector [10] jointly optimizes appearance similarity and deformation cost with a parameterized scoring function using the structured output SVM classifier. The detector proposed by Everingham et al. uses an ensemble of weak classifiers as a local discriminative classifier, and its deformation cost is expressed as a mixture of Gaussian trees whose parameters are learned from examples. ASM also uses PCA to model the face appearance, while its "profile model" is less powerful than LL-SVM: it looks for strong edges or uses the Mahalanobis distance to match a model template for each point.

The proposed Gibbs sampling method is a novel method that combines local discriminative information with a global constraint. We use a PCA model to constrain the shape, whereas existing methods often design sophisticated formulations. The main benefits of this algorithm are its powerful local discriminative classifier and its simple way of utilizing the global constraint. Such a simple method can achieve even better results.

4 Conclusion

In this paper, we propose a novel method for facial landmark detection. We use the MAP criterion to localize the landmarks and LL-SVM to obtain the likelihood function of every key point. PCA is used to reduce the dimensionality and learn the prior distribution. The posterior probability is optimized by Gibbs sampling. Various experiments on the LFW database have shown that this method is efficient. One remaining problem is that the proposed method is still too slow for real-time systems.

[1] W. Deng, J. Hu, J. Guo, W. Cai, and D. Feng, "Robust, accurate and efficient face recognition from a single training image: a uniform pursuit approach," Pattern Recognition, vol. 43, no. 5, pp. 1748-1762, May 2010. doi: 10.1016/j.patcog.2009.12.004.

[2] W. Deng, J. Hu, J. Lu, and J. Guo, "Transform-invariant PCA: a unified approach to fully automatic face alignment, representation, and recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 6, pp. 1275-1284, Jun. 2014. doi: 10.1109/TPAMI.2013.194.

[3] W. Deng, J. Hu, and J. Guo, "Extended SRC: undersampled face recognition via intraclass variant dictionary," IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 9, pp. 1864-1870, Sept. 2012. doi: 10.1109/TPAMI.2012.30.

[4] T. F. Cootes, G. J. Edwards, and C. J. Taylor, "Active appearance models," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 6, pp. 681-685, Jun. 2001. doi: 10.1109/34.927467.

[5] D. Cristinacce and T. Cootes, "Automatic feature localisation with constrained local models," Pattern Recognition, vol. 41, no. 10, pp. 3054-3067, Oct. 2008. doi: 10.1016/j.patcog.2008.01.024.

[6] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, "Active shape models - their training and application," Comput. Vision and Image Understanding, vol. 61, no. 1, pp. 38-59, 1995. doi: 10.1006/cviu.1995.1004.

[7] X. Cao, Y. Wei, F. Wen, and J. Sun, "Face alignment by explicit shape regression," Int. J. Comput. Vision, vol. 107, no. 2, pp. 177-190, Apr. 2014. doi: 10.1007/s11263-013-0667-3.

[8] X. Zhu and D. Ramanan, "Face detection, pose estimation, and landmark localization in the wild," in IEEE Conf. Comput. Vision Pattern Recognition, Providence, USA, Jun. 2012, pp. 2879-2886. doi: 10.1109/CVPR.2012.6248014.

[9] M. Everingham, J. Sivic, and A. Zisserman, "Hello! My name is... Buffy - automatic naming of characters in TV video," in 17th BMVC, Edinburgh, UK, Sept. 2006, pp. 899-908. doi: 10.5244/C.20.92.

[10] M. Uřičář, V. Franc, and V. Hlaváč, "Detector of facial landmarks learned by the structured output SVM," in 7th Int. Conf. Comput. Vision Theory Appl., Rome, Italy, pp. 547-556.

[11] M. Dantone, J. Gall, G. Fanelli, and L. Van Gool, "Real-time facial feature detection using conditional regression forests," in IEEE Conf. Comput. Vision Pattern Recognition, Providence, USA, Jun. 2012, pp. 2578-2585. doi: 10.1109/CVPR.2012.6247976.

[12] P. N. Belhumeur, D. W. Jacobs, D. J. Kriegman, and N. Kumar, "Localizing parts of faces using a consensus of exemplars," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 12, pp. 2930-2940, Dec. 2013. doi: 10.1109/TPAMI.2013.23.

[13] J. Hu, Y. Li, W. Deng, J. Guo, and W. Xu, "Locating facial features by robust active shape model," in 2nd IEEE Int. Conf. Network Infrastructure Digital Content, Beijing, China, Sept. 2010, pp. 196-200. doi: 10.1109/ICNIDC.2010.5657840.

[14] X. P. Burgos-Artizzu, P. Perona, and P. Dollár, "Robust face landmark estimation under occlusion," in IEEE Int. Conf. Comput. Vision, 2013, pp. 1513-1520. doi: 10.1109/ICCV.2013.191.

[15] Y. Sun, X. Wang, and X. Tang, "Deep convolutional network cascade for facial point detection," in IEEE Conf. Comput. Vision Pattern Recognition, Portland, USA, Jun. 2013, pp. 3476-3483. doi: 10.1109/CVPR.2013.446.

[16] X. Xiong and F. De la Torre Frade, "Supervised descent method and its applications to face alignment," in IEEE Conf. Comput. Vision Pattern Recognition, Portland, USA, Jun. 2013, pp. 532-539. doi: 10.1109/CVPR.2013.75.

[17] Z. Zhang, L. Ladicky, P. Torr, and A. Saffari, "Learning anchor planes for classification," in Neural Inform. Process. Syst. Conf., Granada, Spain, 2011, pp. 1611-1619.

[18] L. Ladicky and P. Torr, "Locally linear support vector machines," in 28th Int. Conf. Mach. Learning, Bellevue, USA, 2011, pp. 985-992.

[19] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-6, no. 6, pp. 721-741, Nov. 1984. doi: 10.1109/TPAMI.1984.4767596.

[20] D. S. Bolme, B. A. Draper, and J. R. Beveridge, "Average of synthetic exact filters," in IEEE Conf. Comput. Vision Pattern Recognition, Miami, USA, Jun. 2009, pp. 2105-2112.

[21] G. B. Huang, M. Mattar, T. Berg, and E. Learned-Miller. (2007). Labeled faces in the wild: a database for studying face recognition in unconstrained environments. [Online]. Available: http://vis-www.cs.umass.edu/lfw/lfw.pdf

[22] D. G. Kendall, "A survey of the statistical theory of shape," Statist. Sci., vol. 4, no. 2, pp. 87-99, 1989.

[23] C.-C. Chang and C.-J. Lin, "LIBSVM: a library for support vector machines," ACM Trans. Intell. Syst. Technol., vol. 2, no. 3, article 27, Apr. 2011. doi: 10.1145/1961189.1961199.

[24] D. Cristinacce and T. Cootes. (2007). Boosted regression active shape models. BMVC [Online]. Available: http://www.dcs.warwick.ac.uk/bmvc2007/proceedings/CD-ROM/papers/paper-131.pdf

[25] S. Milborrow and F. Nicolls. (2014). Active shape models with SIFT descriptors and MARS. [Online]. Available: http://www.milbo.org/stasm-files/active-shape-models-with-sift-and-mars.pdf

Biographies

Bofei Wang (wang.bofei@zte.com.cn) received his BE degree in electronic information engineering and his MS degree in communication and information systems from Huazhong University of Science and Technology (HUST), China, in 2003 and 2007. He is a senior video and image algorithm engineer at ZTE Corporation. His research interests include video and image processing, pattern recognition, and computer vision.

Diankai Zhang (zhang.diankai@zte.com.cn) received his BE degree in electronic information engineering and his MS degree in signal and information processing from Nanjing University of Posts and Telecommunications (NUPT), China, in 2006 and 2009. He is a senior video and image algorithm engineer at ZTE Corporation. His research interests include video and image processing, pattern recognition, and computer vision.

Chi Zhang (zhangchi2013@bupt.edu.cn) received his BE degree in electronic information engineering from NUPT in 2013 and is currently a master's student in the School of Information and Telecommunications Engineering of Beijing University of Posts and Telecommunications (BUPT), China. His research interests include pattern recognition, machine learning, and computer vision.

Jiani Hu (jnhu@bupt.edu.cn) received her BE degree in telecommunication engineering from China University of Geosciences in 2003 and her PhD degree in signal and information processing from BUPT in 2008. She is currently a lecturer in the School of Information and Telecommunications Engineering, BUPT. Her research interests include information retrieval, statistical pattern recognition, and computer vision.

Weihong Deng (whdeng@bupt.edu.cn) is an associate professor in the School of Information and Telecommunications Engineering, BUPT. His research interests include statistical pattern recognition and computer vision, with a particular emphasis on face recognition. He has published over 40 papers in international journals and conferences, including a technical comment on face recognition in Science magazine. He also serves as a reviewer for several international journals, such as IEEE TPAMI, IJCV, IEEE TIP, IEEE TIFS, PR, and IEEE TSMC-B. His dissertation, "Highly accurate face recognition algorithms," was awarded the Outstanding Doctoral Dissertation prize by the Beijing Municipal Commission of Education in 2011. He has been supported by the Program for New Century Excellent Talents of the Ministry of Education of China since 2013.


2014⁃08⁃20

10.3969/j.issn.1673-5188.2014.04.004

http://www.cnki.net/kcms/detail/34.1294.TN.20141208.1906.001.html, published online December 8, 2014

This work is supported by ZTE Industry-Academia-Research Cooperation Funds.
