Date: 2024-05-19
QIU Chen, DAI Tao, GUO Bin, YU Zhiwen, LIU Sicong
(1. Northwestern Polytechnical University, Xi’an 710072, China; 2. Chang’an University, Xi’an 710064, China)
Abstract: Person identification is the key to enabling personalized services in smart homes, including smart voice assistants, augmented reality, and targeted advertisement. Although decades of research in person identification have produced highly accurate technologies, existing solutions either require explicit user interaction or rely on image and video processing, and thus suffer from cost and privacy limitations. In this paper, we introduce a device-free person identification system, HiddenTag, which uses smartphones to identify different users by profiling indoor activities with inaudible sound and channel state information (CSI). HiddenTag sends inaudible sound and senses its diffraction and multi-path reflection using smartphones. Based on multi-path effects and human body absorption, we design suitable sound signals and acoustic features for constructing human body signatures. In addition, we use CSI to trigger the acoustic sensing system. Extensive experiments indicate that HiddenTag can distinguish multiple persons in 10–15 s with 95.1% accuracy. We implement a prototype of HiddenTag as an online system on Android smartphones, which maintains 84%–90% online accuracy.
Keywords: person identification; acoustic sensing; CSI; smart home
Numerous applications are enabled by the realization of smart living environments and the Internet of Things (IoT). Person identification is essential for smart home services, such as real-time recommendations on TV and human-machine interaction in video games [1–2]. With personalized applications, users can obtain desirable services pervasively [3–5]. Therefore, an accurate, lightly trained, and real-time person identification approach is needed.
Existing person identification mechanisms have many limitations that prevent them from being adopted pervasively. One of the biggest limitations is that they often intrude on users’ privacy. Camera and computer vision based solutions can recognize different persons effectively, but users’ faces, gestures, and other information may be exposed to others [6–8]. For example, monitoring a person’s face while she/he is sitting on the sofa or walking in the hallway may cause privacy concerns.
Moreover, many person identification methods require the user to do extra work to be recognized. For example, smart speakers such as Amazon Echo and Google Home can identify users by their voiceprints. This approach requires users to speak to trigger recognition [9], which makes it a reactive solution. We therefore ask the question: can we identify users without asking them to do any additional work, while preserving their privacy?
To this end, we introduce HiddenTag, a new device-free person recognition system that needs no pre-installed infrastructure or additional sensors. Using only the smartphone’s built-in hardware, as shown in Fig. 1, the acoustic sender emits high-frequency sound (18–21 kHz) that people cannot hear. When a user enters the smart environment, the user can continue normal activities, such as walking and standing, and all of these can be profiled by an acoustic receiver on off-the-shelf mobile devices. Based on the multi-path effects and body absorption in the experimental scenarios, HiddenTag constructs acoustic signatures for different persons. We design high-frequency features and enrich them by using sweeping and multi-tone techniques. In addition, we explore channel state information (CSI) to detect the human body and trigger person identification. By leveraging machine learning models, our system recognizes different users efficiently in smart home environments. Case studies show that online identification can reach 90.2% accuracy and the corresponding offline group achieves 96.0% accuracy.
▲Figure 1. Concept view of HiddenTag
In addition, we include common types of noise when training the learning model, which makes HiddenTag more robust to noise. Using the collected historical data and the temporal correlation of results, our system further corrects some errors with the proposed prediction model. Related smart things, such as smart LED bulbs and media players, can provide personalized services based on the classification results. This paper makes the following contributions: 1) To the best of our knowledge, HiddenTag is the first high-frequency (18–21 kHz) acoustic sensing solution for person identification; 2) HiddenTag introduces sweeping and multi-tone techniques to enrich the feature space, and adding common types of noise to training makes HiddenTag robust to real environmental noise; 3) HiddenTag is implemented both online and offline. The proposed offline system achieves 96.2% accuracy with four users and the corresponding online system reaches 90.2% accuracy.
The rest of the paper is organized as follows. Section 2 introduces the system design. Experiments and simulations are shown in Section 3. Section 4 further discusses the evaluations. We provide related work and a comparison in Section 5. Conclusions and future work are in Section 6.
▲Figure 2. Signal variations after band-pass filter in preliminary experiments
• HiddenTag employs existing smartphones without complex hardware modifications. The procedure of HiddenTag is illustrated in Fig. 2, where HiddenTag is a device-free system based on acoustic sensing. Off-the-shelf smartphones send high-frequency (18–21 kHz) sound signals via their speakers. The sound emitter can select one of three models: the single-tone model, the multi-tone model, and the sweeping model. The user’s indoor activities change the acoustic signals in the propagation channels. Receivers of HiddenTag sense the changed acoustic information with their microphones. By leveraging a band-pass filter, our system only processes sound between 18 kHz and 21 kHz, which cannot be heard by human beings.
• In the training phase, based on feature engineering, the system is trained on data from different users and labels the corresponding data. In the testing phase, HiddenTag adopts a band-pass filter to reduce noise. Classifiers based on the machine learning model are built to identify different users in smart home environments. Further, HiddenTag implements various personalized services (smart LED, music, smart TV, etc.) that rely on the results of person identification.
The fundamental idea of HiddenTag is that users can be recognized by their acoustic signatures. When the recorder receives acoustic signals, different degradations occur at different frequencies due to frequency-selective fading. Additionally, multi-path effects, diffraction, and reflection also affect the acoustic signals. When users walk in an indoor environment, their walking activities and bodies cause unique multi-path effects and body absorption. Fig. 3 illustrates the causes of such attenuation.
To verify this idea, we conduct preliminary experiments in an empty room of size 5 m × 5 m. We employ two Huawei Mate 30 smartphones as the acoustic sender and the receiver. The sender and receiver are both placed at a height of 75 cm. The acoustic sender generates sound at frequencies from 18 kHz to 21 kHz. The sampling frequency is 48 kHz. In sweeping mode, the sweeping period is 0.02 s.
We record acoustic data for three control groups. In the first group, the experimental room is empty. In the following two groups, User 1 and User 2 enter the room and walk around the sender and receiver.
As shown in Fig. 4, the x-axis indicates the time of the experiment and the y-axis refers to the range of sound frequency. We conclude that the power distribution over the spectrum differs for each control group, and that less power is distributed over the spectrum when a user is present than in the empty room.
Therefore, in the following we design an approach that leverages these different signatures to identify users.
Because we explore inaudible sound that can be generated by the built-in hardware of smartphones, choosing parameters for sound generation is a challenge. According to our experimental results and the literature, a single-tone acoustic signal between 18 kHz and 21 kHz alone can hardly support accurate person identification because it carries limited information. The feature space is constrained by a fixed sending frequency.
▲Figure 3. Signal variations after band-pass filter in preliminary experiments
▲Figure 4. Time-spectral comparison for different ambient mediums
As shown in Eq. (1), S(t) is the amplitude value of the sine wave, and f0 is the frequency of the sound wave that we send. If f0 is a fixed value, the value of S(t) can only reflect the wave at a certain frequency, which means that we do not use the inaudible sound on smartphones efficiently. Therefore, we introduce two other models, namely the sweeping model and the multi-tone model, to improve the identification accuracy by enriching the feature space.
1) Sweeping model: We propose periodic frequency sweeping from 18 kHz to 21 kHz and set the sampling frequency to 48 kHz. Consequently, the frequency changes quickly and covers the whole range from 18 kHz to 21 kHz in a short time period. This selection keeps the generated sound inaudible for most people while enriching the feature space for acoustic sensing.
Different from Eq. (1), where f0 is a fixed value, the value of f0 in the sweeping model is determined by Eq. (2). fu and fl indicate the upper bound and lower bound of the sweeping range. td is the duration of each sweeping period. Δt is the increment of the current time. As a result, the feature space of the sweeping mode includes information from different frequencies.
2) Multi-tone model: Even though the sweeping model includes different sound frequencies within a certain time period, at a specific time point it can only emit a single frequency. We therefore propose a multi-tone model, in which the sender emits an inaudible sound composed of multiple frequencies at the same time. Each component of the synthetic sound represents one sound wave at a designed frequency. Consequently, the multi-tone model can cover more frequencies simultaneously. However, if HiddenTag emits sound at several frequencies, the power distributed on each frequency decreases. We apply the three models and compare the results in the performance evaluation section.
In general, the multi-tone model can enrich the feature space by increasing the number of tones. However, increasing the number of tones reduces the power assigned to each tone. If the power distributed on each tone is too low, the identification accuracy decreases when we apply support vector machine (SVM) classification. Fig. 5 shows the FFT results for the three sound generation models.
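The three sound generation models described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: Eq. (1) and Eq. (2) are not reproduced in this text, so the sketch assumes a unit-amplitude sine for the single tone and a linear sweep from fl to fu within each period td. The function names, the 20 kHz default tone, and the peak normalization of the multi-tone signal are assumptions; the 48 kHz sampling rate, the 18–21 kHz band, the 0.02 s sweep period, and the 15 uniformly spaced tones come from the paper.

```python
import numpy as np

FS = 48_000                           # sampling frequency stated in the paper (Hz)
F_LOW, F_HIGH = 18_000.0, 21_000.0    # inaudible band used by HiddenTag (Hz)


def time_axis(duration, fs=FS):
    """Sample instants (in seconds) for a signal of the given duration."""
    return np.arange(int(round(duration * fs))) / fs


def single_tone(duration, f0=20_000.0, fs=FS):
    """Single-tone model: a sine wave at one fixed frequency f0."""
    return np.sin(2 * np.pi * f0 * time_axis(duration, fs))


def sweep(duration, period=0.02, f_low=F_LOW, f_high=F_HIGH, fs=FS):
    """Sweeping model (assumed linear): frequency rises from f_low to f_high each period."""
    dt = np.mod(time_axis(duration, fs), period)   # time elapsed in the current period
    # The phase is the integral of the instantaneous frequency
    # f_low + (f_high - f_low) * dt / period.
    phase = 2 * np.pi * (f_low * dt + 0.5 * (f_high - f_low) * dt**2 / period)
    return np.sin(phase)


def multi_tone(duration, n_tones=15, f_low=F_LOW, f_high=F_HIGH, fs=FS):
    """Multi-tone model: n_tones sines spread uniformly over the band, played together."""
    t = time_axis(duration, fs)
    freqs = np.linspace(f_low, f_high, n_tones)
    tones = np.sin(2 * np.pi * freqs[:, None] * t[None, :])
    # Dividing by n_tones keeps the summed signal within [-1, 1]; this is
    # why each individual tone gets less amplitude as n_tones grows.
    return tones.sum(axis=0) / n_tones, freqs


tone = single_tone(0.1)
chirp = sweep(0.1)
multi, tone_freqs = multi_tone(0.1)
```

Under this normalization each of the 15 tones has amplitude 1/15, which makes the trade-off in the text concrete: more tones cover more frequencies at once, but each one is emitted with less power.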
The process of receiving sounds is as follows.
1) Sensing trigger: In HiddenTag, a sensing trigger is needed for person identification. The sensing trigger in our system detects users in a certain area rather than in the whole home space. That is, HiddenTag should recognize users only in the targeted sensing areas of the smart home, not everywhere. When a user enters the targeted area, HiddenTag is turned on to collect acoustic data. Otherwise, HiddenTag remains inactive. This switch saves the energy of smartphones and avoids emitting high-frequency acoustic signals when they are unnecessary. In our system, we adopt Wi-Fi CSI signals [10–11], which are accurate and pervasive RF signals in smart homes, as the sensing trigger. Once our system detects CSI variations between wireless routers and receivers, HiddenTag starts acoustic sensing in the experimental area where the receiver is located.
2) Fast Fourier transform (FFT): Modern smartphones are able to generate sound waves with frequencies from 20 kHz to 22 kHz. There is an interesting phenomenon: most people cannot hear sound between 18 kHz and 22 kHz. Because users in the smart home are not disturbed by audible noise, we can leverage such sound to identify different users. We use two smartphones, on which the FFT converts time-domain signals into a frequency-domain representation. That is, the FFT takes a block of time-domain data and returns the frequency spectrum of the data. By applying the FFT and the inverse fast Fourier transform (IFFT), we obtain data in both the time domain and the frequency domain.
3) Band-pass filtering: To reduce background noise and focus on the high acoustic frequency range, we adopt a band-pass filter. A band-pass filter passes signals with frequencies in a certain range and attenuates signals with frequencies outside that range. We keep the sound signals in the frequency range between 18 kHz and 21 kHz. The order of the band-pass filter is 9.
▲Figure 5. Three models of sound generation
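The filtering and FFT steps above can be sketched with SciPy and NumPy as follows. The paper states only the pass band (18–21 kHz), the filter order (9), and the 48 kHz sampling rate; the Butterworth design, the zero-phase `sosfiltfilt` application (which runs the filter forward and backward), the Hann window, and the function names are assumptions made for illustration.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 48_000  # sampling frequency stated in the paper (Hz)


def bandpass_18_21k(x, fs=FS, order=9):
    """Keep the 18-21 kHz band. Butterworth is an assumed design; the paper gives only the order."""
    sos = butter(order, [18_000, 21_000], btype="bandpass", fs=fs, output="sos")
    # Forward-backward filtering gives zero phase distortion.
    return sosfiltfilt(sos, x)


def magnitude_spectrum(x, fs=FS):
    """FFT of a real signal: returns frequency bins (Hz) and magnitudes."""
    windowed = x * np.hanning(x.size)          # Hann window limits spectral leakage
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    return freqs, spectrum


# Synthetic one-second recording: an inaudible 19 kHz tone plus audible 1 kHz interference.
t = np.arange(FS) / FS
recording = np.sin(2 * np.pi * 19_000 * t) + np.sin(2 * np.pi * 1_000 * t)

filtered = bandpass_18_21k(recording)
freqs, spectrum = magnitude_spectrum(filtered)
peak_hz = freqs[np.argmax(spectrum)]           # dominant frequency after filtering
```

With a one-second block the FFT bins are 1 Hz apart, so `spectrum[1000]` and `spectrum[19000]` can be compared directly to see how strongly the audible component is suppressed.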
1) Constructing acoustic features: Designing a suitable feature space is important and challenging for high-frequency sound. Unlike in most speech recognition work, classical features such as the Mel frequency cepstral coefficient (MFCC) [12] and AFTE [13] do not work well in our system. In HiddenTag, we explore classical statistical features and extract them from both the time domain and the frequency domain.
The features are calculated over a time window, the size of which can be adjusted based on the system’s recommendations. Table 1 shows the main features computed in each time window.
Specifically, we introduce power spectral entropy and crest factor in detail. Entropy is a common measure of disorder within a macroscopic system. In HiddenTag, spectral entropy is computed in the following steps. First, we compute the spectrum X(ωi) of the received signal. Next, we calculate the power spectral density (PSD) of the received signal by squaring its amplitude and normalizing it by the number of bins.
Then, we normalize the calculated PSD so that it can be viewed as a probability density function (PDF).
The power spectral entropy can now be calculated using the standard formula for entropy.
Crest factor is a feature indicating the ratio of the peak value to the effective value of a waveform. For example, a crest factor of 1 indicates no peaks, and higher crest factors indicate peaks. In our system, as shown in Eq. (6), the crest factor, denoted by CdB, is the peak amplitude (xpeak) of the waveform divided by the root mean square (RMS) value (xRMS) of the waveform, where the RMS is the square root of the arithmetic mean of the squares of a set of numbers.
▼Table 1. Main features extracted in HiddenTag
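The two features described above can be sketched with NumPy. Eqs. (3)–(6) are not reproduced in this text, so the code follows the verbal steps: squared spectrum magnitude divided by the number of bins, normalization to a probability distribution, and the standard Shannon entropy. The natural logarithm, the optional decibel form of the crest factor (suggested by the symbol CdB), and the function names are assumptions.

```python
import numpy as np


def power_spectral_entropy(x):
    """Entropy of the normalized power spectral density of one time window."""
    spectrum = np.fft.rfft(np.asarray(x, dtype=float))
    psd = np.abs(spectrum) ** 2 / spectrum.size   # squared amplitude, normalized by bin count
    p = psd / psd.sum()                           # normalize so the PSD sums to 1
    p = p[p > 0]                                  # empty bins contribute nothing (0 * log 0 -> 0)
    return float(-np.sum(p * np.log(p)))          # natural log is an assumed choice of base


def crest_factor(x, in_db=False):
    """Peak amplitude divided by RMS; optionally expressed in decibels."""
    x = np.asarray(x, dtype=float)
    peak = np.max(np.abs(x))
    rms = np.sqrt(np.mean(x ** 2))
    ratio = peak / rms
    return float(20 * np.log10(ratio)) if in_db else float(ratio)


# A square wave has no peaks above its RMS level, so its crest factor is 1.
square = np.array([1.0, -1.0, 1.0, -1.0])
square_cf = crest_factor(square)

# A pure tone concentrates power in one bin (low entropy); noise spreads it (high entropy).
t = np.arange(48_000) / 48_000
tone = np.sin(2 * np.pi * 1_000 * t)
noise = np.random.default_rng(0).normal(size=48_000)
tone_entropy = power_spectral_entropy(tone)
noise_entropy = power_spectral_entropy(noise)
```

The tone-versus-noise comparison shows why spectral entropy is informative here: a body that absorbs or scatters some frequencies more than others changes how evenly the received power is spread over the band.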
2) Handling noises: Although we use band-pass filters to reduce noise outside the target range, other noise is distributed within the 18–21 kHz range. Such noise samples may reduce the classification accuracy of HiddenTag. Because common noises in smart home environments include speaking, clapping, and background noise, our system can automatically add four types of noise (background, clapping, speaking, and door knocking) to the dataset, or remove them from it, when we train classification models. In addition, we can assign a different proportion to each type of noise. When such noises occur in the testing phase, our system can handle them because the trained model already includes common noises.
3) Classification: HiddenTag uses SVM as the classification algorithm. Before implementing SVM in the proposed system, we must consider two questions: which type of kernel should be adopted, and how should the value of the penalty parameter be set? Because our system aims to identify users in a short time period, the number of observation samples is limited, and in our datasets the number of features is larger than the number of observations. Given the characteristics of common SVM kernels (linear kernel, radial basis function (RBF) kernel, etc.), we therefore select a linear kernel. Additionally, a low penalty parameter in SVM tends to make the decision surface smooth, while a high penalty parameter tries to classify all training samples correctly by giving the model freedom to select more samples as support vectors. The penalty parameter must be chosen to achieve optimal results, and HiddenTag adopts grid search to choose it.
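The classifier setup described above (linear-kernel SVM with the penalty parameter chosen by grid search) can be sketched with scikit-learn, the library the paper reports using. The feature matrix below is synthetic placeholder data with more features than observations, standing in for the real acoustic features; the candidate values of C, the 5-fold cross-validation, and all variable names are assumptions.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Placeholder data: 4 users, 20 time windows each, 120 features per window,
# so there are more features (120) than observations (80).
n_users, windows_per_user, n_features = 4, 20, 120
user_centers = rng.normal(scale=2.0, size=(n_users, n_features))
X = np.vstack([
    center + rng.normal(size=(windows_per_user, n_features))
    for center in user_centers
])
y = np.repeat(np.arange(n_users), windows_per_user)

# Grid search over the penalty parameter C of a linear-kernel SVM.
param_grid = {"C": [0.01, 0.1, 1, 10, 100]}      # assumed candidate values
search = GridSearchCV(SVC(kernel="linear"), param_grid, cv=5)
search.fit(X, y)

best_c = search.best_params_["C"]
cv_accuracy = search.best_score_                  # mean cross-validated accuracy for best_c
predicted_user = search.predict(X[:1])[0]         # classify one window with the refit model
```

A small C favors a smoother decision surface and a large C fits the training windows more tightly, which is the trade-off the grid search resolves.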
4) Calibrate exceptions by prediction: Even though HiddenTag is able to identify different users, there is still a probability of recognizing users incorrectly. Based on our observations, if the proposed system identifies users successfully in most cases, then when exceptions happen we can correct the errors with historical information. As shown in Algorithm 1, we introduce an approach that avoids exceptions by leveraging historical information. In each round, when we identify a user, our system not only counts the classification result from SVM in the current round but also adds the previous results weighted by a certain proportion (α). The parameter α can be adjusted according to the feedback from test cases.
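The calibration idea described above can be sketched as follows. The body of Algorithm 1 is not fully reproduced in this text, so the update rule used here (current score plus α times the carried-over score from the previous round, followed by picking the user with the largest calibrated score) is an assumed reading of the description, and the function name and example scores are hypothetical.

```python
def calibrate(raw_scores, alpha=0.5):
    """Smooth per-round classification scores with history.

    raw_scores[j][i] is the score of user i in round j before calibration.
    Returns the index of the identified user for every round.
    """
    n_users = len(raw_scores[0])
    carried = [0.0] * n_users          # calibrated scores from the previous round
    decisions = []
    for round_scores in raw_scores:
        # Assumed rule: current result plus a proportion alpha of the previous result.
        carried = [p + alpha * prev for p, prev in zip(round_scores, carried)]
        decisions.append(max(range(n_users), key=carried.__getitem__))
    return decisions


# Hypothetical scores for two users over four rounds; round 3 is an
# isolated misclassification in favor of user 1.
rounds = [[0.70, 0.30], [0.80, 0.20], [0.45, 0.55], [0.75, 0.25]]
with_history = calibrate(rounds, alpha=0.5)     # history overrides the outlier
without_history = calibrate(rounds, alpha=0.0)  # raw per-round decisions
```

In this example the third round alone would select user 1, but the scores carried over from the first two rounds keep the decision at user 0. The first rounds have little or no history to draw on, so they cannot benefit from this correction.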
Algorithm 1. Calibration algorithm for exceptions in HiddenTag
Require: α: between 0 and 1; P′i(j): classification result of user i in time period j before calibration
Ensure: Umax: the user with maximal prediction probability (identification result); n is the number of users, m is the number of time rounds
for int i=0;i
Because HiddenTag can distinguish users in a smart home with convincing accuracy, we implement more applications via SmartHome Hub to provide personalized services. Our system integrates a smart LED and speakers to show the identification results. The installed smart LED is assigned a different color for each user. Once a user is identified, the corresponding color is shown on the bulbs. The speaker can play personalized music for different users. If the user’s web account is associated with HiddenTag, the user’s preferred music is played on the smart speaker once the user is recognized. The system does not require explicit user interactions, such as logging in to an account, or recognizing, recalling, or executing users’ preferences. More applications can be integrated through SmartHome Hub based on the results of person identification.
HiddenTag includes an Android application and a model-view-controller (MVC) based website to process acoustic data and recognize users. All the devices are deployed in a smart home environment. We use a Huawei Mate 30 smartphone as the controller, sender, and receiver. The proposed mobile application plays inaudible sound (18–21 kHz) on senders. It supports three models: single-tone, multi-tone, and sweeping frequency. Initially, we choose the multi-tone model for our experiments. Our system generates 15 tones that are distributed uniformly over 18–21 kHz. The speaker’s volume is set to 100%. The distance between the sender and receiver is 3 m, and the area between them is empty. The sender and receiver are placed 75 cm above the floor. After the receiver captures the acoustic signals changed by human activities, the received acoustic data are transmitted to a Dell T3640 server via Wi-Fi. Based on the Python Scikit-Learn library, HiddenTag classifies different users via SVM. The C (penalty parameter) value is selected by grid search.
In our evaluation, we seek to answer three questions: Does HiddenTag identify different users successfully? Since more than three family members often live in a home environment, how many distinct users can our system identify? What factors can affect the experimental results?
In the offline analysis, we use the accuracy from a confusion matrix to describe person identification results. For online test results, we define accuracy as the success rate of our recognition.
We divide the case study into two phases: the training phase and the testing phase.
In the training phase, when each user enters the experimental environment, our system detects user activities and starts to profile the user. The user walks normally between the sender and the receiver. The user can also turn around and stand briefly. This training procedure lasts for 60 s. After the training procedure, the user leaves the experimental room.
▲Figure 6. Experimental scenario and case study
When the user returns to the room and walks into the same experimental area, HiddenTag starts to recognize the user and shows the confidence of user recognition. This step is the testing phase. Fig. 6 illustrates the experimental environment and the corresponding case study.
In this subsection, we observe the group of four users shown in Table 2. Each user was trained and tested separately. Table 3 is the confusion matrix for the classification. As shown in Fig. 7(a), testing accuracy can reach 96.1%. We conclude that the length of training influences the accuracy: longer training obtains better results. However, because our application scenario should limit the training procedure to a certain length, we choose 60 s in our implementation.
Then, we focus on the number of users in our case study. We extend our experiments from 4 users to 10 users. After changing the number of users, Fig. 7(b) shows that our system still achieves an accuracy of more than 90%. Although the system performs better with fewer users, HiddenTag can still handle 10 users with acceptable accuracy.
▼Table 2. Information of four volunteers
▼Table 3. Confusion matrix of four-volunteer experiment
Different sender volumes change the sound signal strength and the identification accuracy. We ran control group experiments to determine which volume percentage is best for our experiment. Fig. 7(c) shows the improvement with increasing volume.
Additionally, the distance between the sender and receiver is another factor that affects recognition results. Keeping the other experimental settings fixed, we adjust only the distance between the sender and receiver. Fig. 7(d) shows that a closer distance achieves better accuracy. Only when the distance is too short to profile walking activities (within 1 m) does the accuracy decrease.
Then, we compare the three sound generation models and discuss which one is best for the proposed system. In Fig. 7(e), we run two additional control groups using a single-tone model and a sweeping model. For the single-tone model, we set the frequency of the sound to 20 kHz. For the sweeping model, we sweep the frequency from 18 kHz to 21 kHz once per second. We compare the three techniques in different scenarios (smart homes, offices, and open halls) and conclude that the sweeping and multi-tone models outperform the single-tone model, because they increase accuracy by enriching the feature space. The multi-tone model is subject to reduced power per tone and thus needs a power amplifier to improve performance.
▲Figure 7. Experimental results of evaluations
Additionally, based on our observations, the errors of the proposed system mainly occur in the first or second frame. As time increases, the errors decrease sharply. There are two reasons for this phenomenon. First, the acoustic signature of a user cannot form in a very short time period. Once a user has walked 3–5 gait cycles, our system can recognize the user based on the acoustic signature. Second, as illustrated in Algorithm 1, the results are calibrated with historical data. The beginning frames cannot benefit from counting the results of previous rounds.
In this section, we further analyze HiddenTag with respect to three factors: online performance, experimental environment, and noise handling.
To deploy HiddenTag on a real platform, we develop an online system that shows real-time identification results. HiddenTag adopts Node.js and Python Flask to display the real-time accuracies. The delay of classification results can be kept between 2 s and 5 s. Table 4 compares online and offline results in the same experimental scenario. As shown in Eq. (7), for a certain user, the online accuracy is the number of successful identifications (Ns) divided by the total number of identifications (Nt), that is, Ns/Nt.
Although online accuracy is lower than offline accuracy, it still reaches 85.0%–90.0%. Next, we extend the online experiments from a room to other scenarios with different layouts and materials. We test HiddenTag in a conference room and a coffee room. The two experiments achieve online accuracies of 87.5% and 90.5%. These case studies show that HiddenTag can work normally in different environments.
There are two reasons why the accuracy is lower in the online system. First, in offline classification, SVM can choose optimal parameters by brute-force search. However, it is difficult to optimize all the parameters in a short time period in an online system due to computational limitations. Second, the experimental environment changes between online training and online testing. We discuss this issue in the following subsection.
Our experiments are conducted in the same environment. Unfortunately, even the same experimental scenario keeps changing due to variations in environmental factors, such as temperature and humidity. To assess the impact, a user walked in the experimental scenario at five-minute intervals. Table 5 shows the results when the same user is classified by HiddenTag four times in different time slots. Even though the same user enters the room four times, the visits can be classified as different users, and the identification accuracy is only 62.25%. To eliminate the effect of environmental changes, the system needs to continuously and implicitly collect long-term training data. Relying on a larger dataset that includes more environmental variations, we can identify people even if the environment changes sharply.
▼Table 4. Comparison between online and offline results
▼Table 5. Confusion matrix of identifying the same user in different time periods
We simulate typical noises when running HiddenTag in an experimental scenario. In this experiment, we use a Huawei Mate 30 smartphone to play three audio files: a song named “Amazing Grace”, the trailer of Game of Thrones 8, and a lecture from a machine learning class on Coursera. The playing smartphone is close to the receiver (30 cm). With the noise handling method introduced in Section 2.5, Fig. 7(f) shows that our system still reaches acceptable accuracy even when it encounters different types of noise. HiddenTag can work normally under some types of noise, but the accuracy decreases when it encounters others, such as the noise made by a working elevator or a heater starting up.
Existing person identification approaches broadly rely on computer vision and image techniques. By analyzing users’ faces and fingerprints, researchers and engineers have provided numerous solutions for user recognition. In a classical face recognition approach, TURK and PENTLAND leverage Eigenfaces to define the face space and identify people [14]. Recent work uses deep networks to enhance recognition accuracy [6, 15–16]. DeepID3 [6] designs a high-performance deep convolutional network and adds supervision to early convolutional layers, and it represents the state of the art on the YouTube Faces benchmark. Voice recognition is another common approach to identifying a user. MUDA et al. [17] explore MFCC and dynamic time warping (DTW) techniques to recognize users. In addition, biometric techniques, such as fingerprint and retina recognition, are other common types of person identification [18–21]. Unfortunately, all of these methods face privacy concerns. Although these approaches can recognize users by biometric information, key personal and private information has to be exposed.
Recently, some alternative methods have been proposed to identify persons. Researchers adopt wireless sensing to identify persons, gestures, and even micro-activities [22–23]. By classifying variations of Wi-Fi signals, WiWho [23] leverages CSI to describe users’ walking behaviors and identify users in Wi-Fi environments. However, wireless sensing methods often need specific devices, such as an emitter and a receiver with CSI drivers, which are not common in smart home environments.
Acoustic sensing has been a hot topic recently, and many corresponding applications, such as speech recognition and indoor localization, have been implemented in smart homes [24–27]. GEIGER et al. [28] presented a system that identifies humans by their walking sound using MFCC and a hidden Markov model, which reached an offline identification rate of 65.5% for 155 subjects. This approach depends on the sounds of footsteps. Once the shoes or floors change, the system might not work normally. This method also does not consider noise handling or online accuracy in a real environment. To our knowledge, no existing solution identifies persons by acoustic sensing without relying on the human voice or footstep sounds.
As shown in Table 6, unlike the existing solutions, HiddenTag is a device-free and highly accurate person identification approach. Using only the built-in hardware of smartphones, we can recognize users by profiling common indoor activities in home and office environments.
HiddenTag represents the first device-free system that employs inaudible acoustic sensing to achieve accurate person identification. Through this process, which requires no hardware modification, we gain important insights: 1) acoustic information at frequencies of 18–21 kHz can profile human indoor activities and recognize users in smart home environments; 2) sweeping frequency and multi-tone models can improve SVM classification on acoustic datasets by enriching features; 3) online and offline identification accuracy can reach more than 90% with simplified training and testing procedures that are close to normal activities, in environments similar to smart homes. We believe HiddenTag’s salient advantages will enable a myriad of personalized services in smart homes, including smart voice assistants, augmented reality, energy saving, and various pervasive applications.
Moving forward, we aim to further improve identification accuracy by leveraging other machine learning techniques, such as recurrent neural networks and generative adversarial networks, and to enrich the acoustic features by leveraging transfer learning. In addition, we aim to extend single-person identification to multi-person identification with more walking patterns.
▼Table 6. Comparison between HiddenTag and other classical approaches
2.6 Applications of Personal Identification at Smart Home
3 Evaluation
3.1 Experiment Setup
3.2 Evaluation Metrics
3.3 Case Study
4 Diving into Depth
4.1 Pushing Offline to Online
4.2 Environment Changing
4.3 Noise Handling
5 Related Work and Comparison
6 Conclusions and Future Work