Digisign Ref 1

2020 IEEE International Conference on Pervasive Computing and Communications (PerCom)
SilentSign: Device-free Handwritten Signature Verification through Acoustic Sensing
Mengqi Chen, Jiawei Lin, Yongpan Zou, Rukhsana Ruby and Kaishun Wu
College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
{chenmengqi2017,linjiawei2017}@email.szu.edu.cn, {yongpan,ruby,wu}@szu.edu.cn
Abstract—Signature is one of the most prevailing identity
authorization approaches. It is yet inconvenient to use in real
life in the sense that a majority of existing signature verification
approaches rely on additional digital signing devices. In this
paper, we propose a portable device-free signature verification
system named SilentSign which makes use of acoustic sensors
(i.e., microphone and speaker) embedded in smart devices to
enable secure and convenient signature verification service. The
basic idea is to leverage acoustic signals to measure the distance
variation of the tip of the pen while signing. We carefully design
the signal modulation scheme, develop a phase-based distance
measurement technique, and train the verification model for high
performance and robustness. Compared with conventional digital
signing systems, SilentSign allows users to sign more invisibly
and conveniently. We conduct extensive experiments involving 35
participants to evaluate SilentSign. Results show that SilentSign
can achieve 98.2% AUC and 1.25% EER.
Index Terms—Signature Verification, Acoustic Sensing
Fig. 1. SilentSign sends inaudible sound signal from speaker to capture the vertical motion variation of a pen’s tip during the signing process. Features are extracted from motion variation and used to train a machine learning model to verify the signer.
I. INTRODUCTION
Handwritten signature verification (or, HSV for short) aims at verifying whether a given signature is genuine or forged, and claiming consent on some obligations [1]. It shows considerable significance and wide application in our daily life such as signing important documents and handling banking business. However, due to the shortcomings of existing HSV techniques, handwritten signatures are reported to be forged in a large number of serious fraud cases. A recent report filed by JP Morgan shows that fraud associated with paper checks is ranked first for consecutive years among a variety of payment methods [2]. This indicates the significance of HSV research.
Depending on the acquisition method, HSV systems can be classified into two categories, namely, offline verification and online verification [1]. The former refers to writing one’s name on paper materials and checking the signature afterward, which is the most traditional HSV approach. This kind of HSV occurs in cases where paper documents such as contracts, receipts, etc. need to be signed. But this lacks a real-time verification process of the signer and makes the signature easier to forge. With the development of hardware, researchers have proposed some novel HSV technology that makes use of smart devices to accomplish online verification of signers and enhance the security of HSV. The underlying principle is to capture dynamic properties of the signing movement, such as the order of strokes, speed and pressure of the pen, in order to verify the identity of signers, with the help of smart devices such as digitizers and tablets [3]. The feasibility lies in that a signature is associated with several attributes like the form, movement and variation which show unique patterns for different people and can be viewed as a person’s identity. As a result, in online HSV systems, apart from the signature, the signing process itself can also be used to verify the identity of signers online, which enhances the security of handwritten signatures.
Within the scope of online HSV, most prior schemes [4]–[7] use a digital signing device such as a digitizer or touchpad, on which users can sign their signatures with specialized pens. However, these systems require specialized hardware, which makes them not applicable in the cases where users sign on paper materials in daily life. In other words, they cannot provide real-time verification service for offline handwritten signature scenarios. The latest work [8] takes advantage of the sensing capability of popular wearable devices, namely, a smartwatch, to capture the wrist movement via inertial sensors for signature verification. Compared with prior works [4]–[7], this novel approach overcomes the shortcomings of the unavailability of hardware. Nevertheless, it still has the following shortcomings. Due to the restriction of computing resources, the smartwatch-based HSV system has to offload collected data to another computing device, for example, a smartphone or a tablet. That is, an HSV system needs to be equipped with two separate devices. Even with more powerful
978-1-7281-4657-7/20/$31.00 ©2020 IEEE
Authorized licensed use limited to: University College London. Downloaded on July 06,2020 at 11:11:37 UTC from IEEE Xplore. Restrictions apply.
computing capability, the smartwatch-based HSV technique yet needs another device running applications like electronic banking services owing to its limited screen size. What is more, this method requires a signer to wear a device, which may degrade the user experience.
Consequently, we raise such a question: can we design an HSV system with only an off-the-shelf device and without the user wearing or touching any additional hardware? In this paper, we propose SilentSign, an acoustic-based touch-free HSV system that can transform any smart device with acoustic sensors into an online HSV system. SilentSign leverages the embedded speaker-microphone pair readily available on commercial smart devices without equipping any additional hardware or making hardware modification. It achieves a fine-grained signature verification objective accurately. The basic idea is to utilize inaudible ultrasound to capture the vertical trajectory of the pen tip during the signing process as shown in Fig. 1. Then, with image similarity distance as the feature, we train a machine learning classifier to determine whether the signature trajectory is genuine or forged when an unknown signature comes.
To summarize, we list the following contributions in this work:
• We propose an acoustic-based HSV method that can be easily implemented on readily available smart devices. It can not only supplement real-time signature verification function for scenarios of signing on paper materials but also replace specialized hardware in existing online HSV with a handy device. Compared with similar work [8], it does not require a signer to wear an additional device.
• We design a universal machine learning model for signature verification by combining imaging similarity features (e.g., SSIM, PSNR and Hausdorff distance) that characterize the dynamic pattern of signing trajectories. By such design, SilentSign achieves favorable performance when a new user is enrolled without retraining the model.
• Finally, we conduct extensive experiments and evaluate our system comprehensively. We recruit 35 students and clerks in our University for experiments and collect a total number of 1400 recordings of genuine and forged signatures. The evaluation results show that SilentSign can successfully distinguish genuine and forged handwritten signatures at AUC of 98.2% and EER of 2.37%.
The rest of this paper is organized as follows. We outline the related work in Sec. II. We provide the required background information and overview of the architecture in Sec. III. Sec. IV and Sec. V introduce the techniques used in system design and verification model construction. We evaluate the performance of the system in Sec. VI. In Sec. VII and Sec. VIII, we discuss the remaining problems and future work, and conclude this paper respectively.
II. RELATED WORK
A. Handwritten Signature Verification
Relying on the data acquisition type, existing methods for HSV can be divided into two types: offline and online [9]. Offline systems use offline acquisition devices such as a scanner or camera to obtain static images as input data. The verification process is done after the writing process. Current research mainly focuses on the online HSV approach due to its popularity in today’s marketplace. Online systems usually rely on dynamic data such as pen pressure, azimuth, altitude and so on. Pen or arm motion data while signing on the paper can be captured by various digitizing tools such as digitizing tablets, special pens and smart wrist [4], [5], [8]. Compared to the aforementioned works, SilentSign novelly uses acoustic signals to track the motion of the tip of the pen as input data. Moreover, SilentSign has the advantages of both online (dynamics data) and offline (device-free signing on the paper) systems.
B. Biometric Authentication on Mobile Devices
Biometric behavior or biometric authentication on mobile and wearable devices is a popular topic in recent years. Various biometrics such as voice, iris, face and keystrokes, captured by different sensors on portable or wearable devices, have been proved to be usable for the purpose of authentication [10]–[12]. Other features such as dental [13] and face [14], heart rate [11] and breath [12] have been used in authentication on mobile or wearable devices as well. On the other hand, due to the unique habits caused by the living environment, behavior biometric is a more trusted feature that can be used for authentication. VibWrite [10] captures the dynamic motion of a finger when a user performs a specific gesture on the touch screen to authenticate the user's identity. Compared to the aforementioned methods, handwritten signature as an authentication feature has been used for a long time in history; its proven uniqueness and application for special occasions are irreplaceable.
C. Acoustic Sensing
Acoustic sensing as a non-contact means of human-computer interaction has broad application scenarios. Due to the range spread and the smaller amount of processed data, sound-based sensing is more advantageous in motion detection or localization, such as gestures by using mobile and wearable devices [15]–[17] and indoor localization [18], [19]. FingerIO [15] uses an inaudible OFDM modulated sound frame to locate the moving finger by detecting the change of two consecutive frames. VSkin [16] leverages the structure-borne and the air-borne sound paths to sense gestures performed on the surface of the smartphone. BeepBeep [18] measures the distance between devices by acoustic ranging. [19] uses a chirp-based ranging sonar achieving a localization error within 1 m. In this paper, SilentSign combines the phase-based and frame-based approaches by using a Zadoff-Chu code that has nice auto-correlation properties and the ability to track phase changes. This yields a high refresh rate and measurements that correlate directly with the movement of the pen tip.
III. SYSTEM ARCHITECTURE
A. Considerations
We make the following considerations while designing SilentSign.
• Sensing Direction: In traditional online HSV approaches, the dynamics of signatures are captured by a digital signing device with the data modeled as
$$S(t) = [x(t), y(t), p(t), \ldots]^T, \quad t = 0, 1, 2, \ldots, n$$
in which x(t) and y(t) represent coordinates of the pen tip at time t, and p(t)... represent other features such as pressure and azimuth. In a typical English signature, x(t) typically grows linearly with small oscillations on the linear curve, while y(t) changes back and forth between positive and negative values more frequently with more obvious oscillation. Therefore, it is feasible to only consider the vertical movement while ignoring horizontal movement [20].
• Sensing Accuracy: The sampling rate of a current commercial digitizer is 75 ∼ 200 Hz with an ideal accuracy of about 0.2 mm [21]. Signing is a very delicate action. Using the digitizer, traditional signature verification not only captures the pen tip movement but also pressure and azimuth. Due to the limitation of acoustic sensing, it is impossible to utilize these properties as the features in our system. Therefore, we need to make at least the 1-D tracking accuracy as good as the digitizer.
Fig. 2. The system architecture of SilentSign.
B. Overview
SilentSign system architecture is shown in Fig. 2. The smart device transmits and records inaudible sound with the built-in speaker and microphone. By measuring impulse response and phase changes of received signals, it tracks the movement of a pen tip in the vertical direction. Since the trajectory is a sequence of 2-D vectors, we can regard it as a gray-scale image. We extract traditional image similarity features from two types of trajectory data, genuine and forged, to train an SVM classifier. Moreover, SilentSign consists of two usage stages, namely, signature enrollment and signature verification. In the former stage, the users supply a sufficient number of signatures as the template samples. In the authentication stage, the user only needs to sign his/her signature in the sensing range and the system conducts verification.
In the authentication phase, the user only needs to sign his/her signature in the sensing range of SilentSign so that the pen movement can be tracked. SilentSign extracts the response and phase change which we mention in the enrollment phase. The features are extracted by comparing input data and stored data and then fed into the trained SVM classifier for final verification.
IV. SYSTEM DESIGN
A. Acoustic Sensing
We take the following considerations into account when designing the acoustic sensing part of SilentSign.
1) Current digitizers have a sampling rate of 75 ∼ 200 Hz and a tracking accuracy of 0.2 mm. To achieve better verification accuracy, the performance of our acoustic sensing system should be close.
2) Due to the multipath effect, received echoes are reflected from multiple objects. Thus, we need to differentiate the path corresponding to the moving pen from others.
3) For better user experience, we transmit and record inaudible sound.
To meet these requirements, we design an acoustic sensing system as shown in Fig. 3. It consists of three major components including signal generation, signal reception and distance measurement. The signal generation component is responsible for generating the inaudible modulated ultrasound signal. The signal reception component receives reflected sound signals from surrounding objects via the microphone, and then synchronizes the sender and receiver. The distance measurement component demodulates the received signals to extract the distance variations between the smartphone and the moving pen tip. This component consists of three steps, namely, estimating differential impulse responses, estimating the phase change and formatting the impulse response. We shall describe each component in detail as follows.
Fig. 3. The overview of a sample acoustic sensing system.
B. Transmit Signal Generation
A phase-coded pulse is usually used in radar applications. Given a pulse, it can be divided into an N-bit sequence denoted
as $S = \{s[1], \ldots, s[N]\}$ with each bit coded with different phases. Finding a specific code with excellent resolution is challenging due to the unlimited number of possible phase codes. A manageable solution is to find a code with a good autocorrelation function. In this paper, we choose the 127-bit Zadoff-Chu code (ZC sequence) because of its ideal periodic autocorrelation properties. Furthermore, [16] has shown that a 127-bit ZC sequence can track a moving object with an average movement distance error of 3.59 mm and 3 kHz. These properties are very close to the digitizer.
To generate an adaptive inaudible ZC transmit signal, we first modulate the raw 127-bit ZC sequence ($ZC_{127bits}$) into 17 ∼ 23 kHz. Then we apply frequency domain interpolation on $ZC_{127bits}$ by padding zeros in the middle of $ZC_{127bits}$ in the frequency domain until the length of the sequence reaches 1024 bits. After this processing, we get the interpolated ZC sequence ($ZC_{1024bits}$), whose bandwidth is about 6 kHz at the sampling rate of 48 kHz. Then, we modulate the interpolated ZC sequence into the passband by multiplying the real part and imaginary part of $ZC_{1024bits}$ with a carrier. The carrier frequency is 20.25 kHz. Finally, the frequency of the transmit signal $S_{ZCT}$ is in the range of 17.29 ∼ 23.25 kHz.
For synchronizing the sender and receiver, we add 24000 zeros followed by $ZC_{1024bits}$ at the very beginning of $S_{ZCT}$. In the latter part of this section, we will explain how it works. Generated transmit signals can be saved as a WAV file and then played by the speaker of the smartphone. The microphone starts recording while the speaker is playing the sound. After receiving the reflected signals, we first use an adaptive energy-based synchronization approach to synchronize the sender and receiver. Then, we demodulate the received signals by down-converting passband signals into baseband ones.
C. Processing of Received Acoustic Signal
Traditional sonar systems can synchronize the sending and recording operations of the signal. After starting the recording operation, the sonar concurrently manages the buffering of the received signal and calculates the distance of the reflected path. Synchronization of the sender and receiver provides a reference for the delay between the sending and receiving time of the initial pulse through line-of-sight (LOS). Without synchronization, the delay between the initial pulse and the first received pulse may not accurately represent the time the pulse takes to travel through LOS, which will cause deviation in subsequent distance measurement. Due to the compatibility issue of the Android operating system, it is difficult to synchronize speaker and microphone.
1) Adaptive Energy-based LOS Detection: To solve this problem, we add 24000 zero points at the very beginning of $S_{ZCT}$. 24000 points last 0.5 second, which makes sure the recording operation of the microphone starts before the pulse is transmitted. This allows the microphone to receive the first pulse completely. Since our acoustic sensing system is based on monostatic sonar, the speaker and microphone are fixed on the smartphone, which means we already know the length between speaker and microphone. In other words, we know the length of LOS and the travel time between sending the first pulse and receiving it. Then, after we find the first pulse of LOS, we use it as the reference of start time by simply adding a fixed delay. The fixed delay is based on the distance between speaker and microphone.
To locate the first pulse, SilentSign adopts an adaptive energy-based LOS detection technique to find a precise LOS path. We append $ZC_{1024bits}$ after the 24000 zero points. The raw 1024-bit ZC sequence has a high auto-correlation gain. Once the recording is started, we perform the cross-correlation function $IR(t) = ZC_R^*(-t) * ZC_{1024bits}(t)$ to obtain the impulse response, where $ZC_R^*(-t)$ is the conjugation of the received baseband signal. Fig. 4 shows the impulse response of the initially received pulse. Owing to the ideal periodic autocorrelation properties of the ZC sequence, the auto-correlation side lobe level is low, and the first peak is the LOS path.
After applying the cross-correlation function, the next step is to precisely find the position of the LOS peak. For this purpose, we use an adaptive energy-based algorithm to find the rough starting point of the LOS path. We assume that the remaining noise power follows the Gaussian distribution. μ(t) and σ(t) are the average power and its standard deviation at time t. We denote the amplitude of the IR by a discrete series IR(t) and use a sliding window of width W to calculate the average noise power. μ(t) and σ(t) are calculated by
$$\mu(t) = \frac{1}{W} A(t) + \left(1 - \frac{1}{W}\right)\mu(t-1)$$
$$\sigma(t) = \frac{1}{W} B(t) + \left(1 - \frac{1}{W}\right)\sigma(t-1)$$
where
$$A(t) = \frac{1}{W}\sum_{k=t}^{W+t} |IR(k)|^2$$
$$B(t) = \frac{1}{W}\sum_{k=t}^{W+t} \left(|IR(k)|^2 - A(k)\right)^2$$
μ(0) = 0 and σ(0) = 0, A(t) is the accumulated power, and B(t) is the overall standard deviation of signals within a sliding window. A rough starting point of IR(t) can be determined if the following relation holds.
$$|S(t)|^2 > \mu(t) + \lambda_1 \sigma(t)$$
where $\lambda_1$ is a constant which is independent of the noise level. We empirically set W and $\lambda_1$ as 1024 points and 18, respectively. As shown in Fig. 4, the red line is a rough starting point, and the peak is within the next 1024 points in the LOS path. Finally, we apply a maximum function to find the exact position of this peak. The LOS path is the baseline of the following distance measurement. After adding a fixed delay to the position of the LOS path, we use this position as the starting point of the impulse response.
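The thresholding idea in the LOS detection step can be sketched in a few lines of Python. This is only an illustrative simplification, not the authors' implementation: the function name `find_los_peak` is hypothetical, the noise statistics are seeded from the first W samples (assumed to be noise only) instead of starting from μ(0) = σ(0) = 0, and the per-sample recursive update stands in for the windowed sums A(t) and B(t).

```python
import math


def find_los_peak(ir, window=1024, lam=18.0):
    """Return (rough_start, peak_index) of the first strong pulse, or None.

    ir     -- impulse-response samples (real or complex)
    window -- W: seeding length, smoothing constant and peak search span
    lam    -- lambda_1: threshold multiplier on the noise deviation
    """
    power = [abs(v) ** 2 for v in ir]
    if len(power) <= window:
        return None

    # Assumption: the first `window` samples hold noise only, so they can
    # seed the noise statistics (the text starts from mu(0) = sigma(0) = 0).
    seed = power[:window]
    mu = sum(seed) / window
    sigma = math.sqrt(sum((p - mu) ** 2 for p in seed) / window)

    for t in range(window, len(power)):
        if power[t] > mu + lam * sigma:
            # Rough start found: the exact peak is the maximum within the
            # next `window` samples.
            end = min(len(power), t + window)
            peak = max(range(t, end), key=lambda k: power[k])
            return t, peak
        # Recursive 1/W update of the noise mean and deviation
        # (a simplified stand-in for the windowed A(t) and B(t)).
        deviation = abs(power[t] - mu)
        mu = power[t] / window + (1 - 1 / window) * mu
        sigma = deviation / window + (1 - 1 / window) * sigma
    return None
```

Under these assumptions, the returned peak index plays the role of the LOS position; the fixed speaker-to-microphone delay would then be added to it to obtain the starting point of the impulse response.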
Fig. 4. Adaptive energy-based initial pulse detection (energy vs. time in s).
Fig. 5. Distance variation during the signing (distance in cm vs. time in s).
D. Distance Measurement
1) Differential IR Estimations: In modern sonar systems, a sonar transmitter typically sends a known training sequence. Sound signals propagate through the air, meet objects within the detection range, and then reflect back to the receiver in a very short time interval. During this process, the signal is reflected back along multiple paths of different lengths that lead to different time delays, and the received signal is a mixture of all paths. To separate different paths at the receiver, we use the cross-correlation function to estimate the Impulse Response (IR). For tracking the moving object, we can locate the channel path that changes due to the movement of the object by subtracting the IR between two adjacent time periods. After synchronizing the sender and receiver, we first demodulate the received echoes into baseband signals denoted by $S_{ZCR}(t)$ with a low-pass filter. The Impulse Response (IR) can be estimated by using the cross-correlation function, $IR(t) = S_{ZCR}^*(-t) * S_{ZCT}(t)$. Each peak in the IR estimation indicates one propagation path at the corresponding delay. If the pen tip starts to move, the magnitude of the propagation path changes, and we can obtain these changes by subtracting the impulse responses of two adjacent time intervals as follows:
$$\Delta IR = IR_{t \sim t+W-1} - IR_{t+W \sim t+2W-1}$$
Furthermore, to save computational cost, we use a standard energy threshold-based algorithm to detect the event when the pen tip starts to move. If the pen tip moves, the position of a maximum point in the ΔIR is the distance between the pen tip and smartphone. However, since the window size is 1024, the frame refreshing rate is about 46.875 Hz, which is below that of digitizers (75 ∼ 200 Hz). To increase the refreshing rate, we shall incorporate the estimation of phase change in the following section.
2) Estimation of Phase Change: Once the moving path is detected, we calculate the path coefficient of the moving path to improve the refreshing rate. Finally, by measuring the phase change of the path coefficient, the distance variation of a moving pen tip can be calculated. The path coefficient describes how the amplitude and phase of the given path change with time. The path coefficient is computed as follows:
$$h_t[n_i] = \sum_{l=0}^{N_{ZC}-1} S_{ZCR}[t+l] \times S_{ZCT}^*[(l - n_i) \bmod N_{ZC}]$$
where $N_{ZC}$ is 1024, the length of the ZC sequence, and $n_i$ is the position of the maximum point in the ΔIR, as explained in the previous section. To save computational cost, we only calculate the path coefficient of the moving path. The differential IR estimation indicates which path is related to the moving pen tip. Given the tracked phase change information of this path, the change of distance can be calculated by using the accumulated phase as follows.
$$d_i(t) - d_i(0) = -\frac{\sum_{i=1}^{t} \Delta\theta_{i-1}^{i}}{2\pi} \times \lambda_c$$
where $\lambda_c$ is the wavelength of sound, $\lambda_c = c/f_c$, and
$$\Delta\theta_{t-1}^{t} = \frac{Q_{h_t}}{I_{h_t}} - \frac{Q_{h_{t-1}}}{I_{h_{t-1}}}$$
We combine the low sampling rate IR response estimation with the distance change, and then by calculating the maximum point in the IR estimation result, the distance variation is a two-dimension array as shown in Fig. 5.
3) Format IR Estimations: Since the initial moving point is uncertain, for better training, we format the IR estimation by the following steps. We first scale the value of the IR estimation to 0 − 1 to build a good model.
$$IR_{norm}(t) = \frac{IR(t) - \min(IR)}{\max(IR) - \min(IR)}$$
Then, we use the following algorithm to remove the beginning points not associated with the detection of the movement and find the moving start point as the center of the array. $IR_{formated}$ is the training sample for the following discussion.
Algorithm 1: IR Estimation Format
Input: IR_norm
Output: IR_formated
col = 0; row = 0;
center = 0;
colSize = ColumnSizeOfIR(IR_norm(:)(0));
while sum(IR_norm(:)(col)) == 0 do
    col++
end
[value, row] = max(IR_norm(:)(col));
IR_formated = [zeros(colSize − row); IR_norm(1 : row + colSize, :)]
V. AUTHENTICATION MODEL
A. Feature Extraction
Traditional similarity features such as Structural similarity (SSIM), Peak signal-to-noise ratio (PSNR), Mean squared error (MSE), and Hausdorff distance have been widely used for measuring image similarity [22], [23]. Therefore, in our system, we calculate the above four features to generate a four-dimension similarity feature vector (S = {SSIM, PSNR, MSE, HAUSDORFF}) from two scaled and formatted IR responses $IR_A$, $IR_B$. Depending on whether these two IRs are generated from the same genuine signature dataset, we label the feature vector as genuine or forged.
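As a rough illustration of how such a four-dimension similarity vector could be computed, the following Python sketch treats two scaled IR arrays as small gray-scale images with values in [0, 1]. Several details are assumptions rather than the paper's method: the function name `similarity_features` is hypothetical, SSIM is computed over the whole image as a single window (standard SSIM averages over local windows), and the Hausdorff distance is taken between the sets of pixels whose value exceeds an arbitrary threshold of 0.5.

```python
import math


def _active_points(img, threshold):
    """Coordinates of pixels brighter than the threshold."""
    return [(r, c) for r, row in enumerate(img)
            for c, v in enumerate(row) if v > threshold]


def _directed(src, dst):
    """Largest distance from a point in src to its nearest point in dst."""
    return max(min(math.hypot(r1 - r2, c1 - c2) for r2, c2 in dst)
               for r1, c1 in src)


def _hausdorff(points_a, points_b):
    if not points_a and not points_b:
        return 0.0
    if not points_a or not points_b:
        return math.inf
    return max(_directed(points_a, points_b), _directed(points_b, points_a))


def similarity_features(img_a, img_b, threshold=0.5, peak=1.0):
    """Return SSIM, PSNR, MSE and Hausdorff distance for two equal-size images."""
    flat_a = [v for row in img_a for v in row]
    flat_b = [v for row in img_b for v in row]
    if not flat_a or len(flat_a) != len(flat_b):
        raise ValueError("images must be non-empty and of equal size")
    n = len(flat_a)

    mse = sum((a - b) * (a - b) for a, b in zip(flat_a, flat_b)) / n
    psnr = math.inf if mse == 0 else 10 * math.log10(peak * peak / mse)

    # Single-window SSIM (assumption: no local sliding windows).
    mu_a = sum(flat_a) / n
    mu_b = sum(flat_b) / n
    var_a = sum((a - mu_a) * (a - mu_a) for a in flat_a) / n
    var_b = sum((b - mu_b) * (b - mu_b) for b in flat_b) / n
    cov = sum((a - mu_a) * (b - mu_b) for a, b in zip(flat_a, flat_b)) / n
    c1 = (0.01 * peak) ** 2
    c2 = (0.03 * peak) ** 2
    ssim = ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a * mu_a + mu_b * mu_b + c1) * (var_a + var_b + c2))

    hausdorff = _hausdorff(_active_points(img_a, threshold),
                           _active_points(img_b, threshold))
    return {"ssim": ssim, "psnr": psnr, "mse": mse, "hausdorff": hausdorff}
```

Each call yields one feature vector for a pair of IR arrays; the label (genuine or forged) would then be attached according to where the pair comes from, as described above.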
Fig. 6. Model training design.
B. Model Training
1) Training Dataset Enrollment: During the enrollment phase of the training dataset, the user provides a reasonable number of signatures to calculate feature vector F. Based on the vectors’ labels, our training data can be classified into two sets, namely Genuine Signature vs. Forged Signature, and Genuine Signature vs. Genuine Signature.
2) Training Phase: For each experimenter, we randomly select 15 genuine signatures as reference signatures denoted by $R_i$ where i represents the index of experimenters. The remaining signature samples are denoted as $G_i$. His/her forged signatures are denoted as $F_i$. During model training, we first calculate similarity vectors $S_G^i$ between each pair of signatures in $G_i$ and $R_i$, then label them as genuine. Meanwhile, we also calculate similarity vectors $S_F^i$ between each pair of signatures in $G_i$ and $F_i$, then label them as forged. The training dataset consists of the above two types of samples. After that, we feed the training samples into classifiers to train verification models which decide whether a new input signature is genuine or forged. The training phase is shown in Fig. 6. In a traditional signature verification system, a user needs to enroll a certain number of genuine signatures as templates and retrain the system. In contrast, our system can be applied to a newly registered user with less retraining. This is because the trained classifier finds thresholds deciding genuine or forged that are later used in the verification process. During the system design, we explore four different classification models, including Logistic Regression (LR), Naive Bayes (NB), Random Forest (RF), and Support Vector Machine (SVM), to select an optimal one and achieve better verification performance. We shall demonstrate the comparison of different models in detail in Sec. VI-B4.
C. Signature Verification
In the verification phase, each new user needs to register his/her signatures in the system first. In this way, the system shall store their signatures as templates in the database and label him/her as genuine. When a non-registered person signs, SilentSign will compare the obtained signatures with the stored signatures, calculate similarity vectors, and pass them into a trained classifier to decide whether the signatures are genuine or forged.
Fig. 7. 1-D tracking errors with normal pen & Apple pencil (CDF vs. 1-D tracking error in mm).
Fig. 8. Acoustic Sensing Range.
VI. PERFORMANCE EVALUATION
A. Acoustic Sensing
1) Tracking accuracy in 1-D: We first evaluate the accuracy of distance estimation with SilentSign implemented on SAMSUNG galaxy note 8. During the evaluation, we attach a ruler of 50 cm on the top of an A4 paper on which we draw a line along the scale of the ruler to get the ground truth. When we move a pen from a starting position along the line, SilentSign makes use of the speaker and microphone in Note 8 to track the distance change between the pen and smartphone. The ground truth is the length of the line measured by the scale of the ruler. As we use A4 papers in our experiments, the overall testing distance is about 29.7 cm.
In distance estimation, as we use the LOS path as the baseline to measure the following distance, a compensation factor needs to be added to the resultant distance. This is because the initial pulse detected is not the sending time of acoustic signals. Instead, it is the receiving time of the first echo component. Consequently, we need to add the time of flight (TOF) of LOS as a compensation factor, which is essentially a time shift between the sender and receiver. This factor is determined by the distance between a speaker and a microphone. According to our measurements, it is 0.7 cm for SAMSUNG galaxy note 8. We also compare distance estimation performance with different pens. By repeating the above measurement 400 times, we can obtain the Cumulative Distribution Function (CDF) of distance estimation errors as shown in Fig. 7. As we can see, SilentSign achieves average errors of 4.09 mm and 4.20 mm for normal pen and Apple pencil, respectively. The 90th percentile errors are 9.64 mm and 9.80 mm which are similar to traditional digital signing devices.
2) Tracking Range: Since the speaker and microphone have the directionality property, we evaluate the tracking range of smartphones in this section. The area of A4 paper is 21 cm × 29 cm, so we first divide the paper into 609 square blocks (each block has 1 cm length and width). Then we place the smartphone above the centerline of the landscape paper. Second, we draw a circle in each area. If the SilentSign acoustic sensing system detects the pen movement within the sensing range, the path corresponding to the pen position will change, and an initial value appears in the differential impulse responses. We check each of the 609 blocks in turn and mark it when it has an initial value in the differential impulse responses. The experimental result is shown in Fig. 8. Darker area means more sensitive. Since the microphone and speaker of the mobile
phone are directional, the closer to speaker and microphone, the narrower the lateral range that can be detected. From our observation, the best sensing region is a 7 × 11 cm2 rectangle (the region surrounded by red dashed line in Fig. 8), and the distance between this region and the smartphone is 11 cm. Compared to the commercial signature pad (e.g., Wacom STU-300, the signing range is 2.5 × 9.9 cm2 [24]), our 7 × 11 cm2 signing range is larger and enough for signature verification and for avoiding the user signing beyond the sensing area. Therefore, we use this region as the signing range in the following experiment.
B. Signature Verification
1) Data Collection Setup: We recruited subjects to collect genuine and forged signature data. The subjects were asked to sign their names within the signing range on the iPad Pro by using Apple pencil. In their signing process, we place the smartphone above their signature position and turn on the SilentSign app to sense the movement of the pen tips. Although our signing range is relatively large, we did not specifically indicate that the signature must be written in the center of the signing range. As a result, the position of each signature of each participant relative to the mobile phone will randomly move. But the following evaluation results show that this random relative movement did not affect the verification accuracy. Moreover, we conduct our data collection in the rich-noise lab to challenge our system. On the other hand, for collecting forged signatures, we record the screen by using the iOS screen record function. A subject who plays the role of a forger can imitate other subjects with genuine signature by

signatures, and each subject was asked to practice the signatures until he/she becomes skilled. Finally, we collect 700 forged signatures from 35 subjects. Each subject contains 20 forged signatures, and these signatures are created by other 5 subjects.
3) Signature Verification Setup: Evaluating performance on genuine versus random forged signatures, and on genuine versus skilled forged signatures, is one of the main study topics of any signature verification system. A random forged signature is a signature created without any knowledge of the genuine one. By evaluating this, we can understand whether our system is robust in preventing random signatures from passing through. Skilled signatures are created with a certain level of training on the genuine signature of the claimed user [8]. As a result, we consider three testing cases to evaluate the verification model of SilentSign. Note that in our dataset, for each subject u, there are 20 genuine signatures denoted as $G_i$ and 20 forged signatures denoted as $F_i$.
• Case 1: distinguishing between genuine signatures and skilled forgeries (denoted as ‘SF’).
• Case 2: distinguishing between genuine signatures and random forgeries (denoted as ‘RF’).
• Case 3: distinguishing between genuine signatures and both types of forgeries (denoted as ‘ALL’).
All genuine signatures in case 1, case 2 and case 3 are randomly selected from u’s $G_i$, and the 15 forged signatures are randomly selected out of the $F_i$ of u. We select 15 subjects (not including u that we first selected) as a random forger, and then randomly select 1 genuine signature out of his/her
watching recorded signing video. To protect personal privacy, Gi for each of those subjects (we have total 15 samples as
we make a guarantee to each participant that their signature RF). Finally, we randomly select 15 signatures out of all the
data will not be made public and will only be used in the signature samples (we have a total 15 samples as ALL).
experiment. After labeling signatures, we then calculate the similarity
2) Data Collection: In our experiments, we recruit 35 vector between genuine signature and SF, genuine signature
participants including males and females at different ages and and RF, genuine signature and ALL, and fed to a trained
with different nationalities from our university. We collect classifier for verification. The experiments associated with case
these samples over one month to prove that the performance is 1, case 2 and case 3 have been repeated 50 times. Moreover,
time-invariant. The whole data collection experiments include we change random seed in each iteration to keep our system
the following two steps. generalizable. Final results are the average ones summarized
• Step 1: collecting Genuine Signatures. In this step, we by all iterations. We compare the performance of different
collect genuine signatures from 35 subjects. Each of them classifiers, namely, LR, NB, RF and SVM as mentioned in
is required to provide 20 signature samples. Before they Sec. V-B2.
sign, we will place the phone above the signature signing Similar to the works [8], [25], we adopt two main metrics
range like Fig. 1. In the meanwhile, we turn on the to quantify the performance of SilentSign, namely, area under
acoustic sensing app to track the pen movement in the curve (i.e., AUC) and equal error rate (i.e., EER). AUC is
vertical direction and record sign trajectory through the defined as the area under the receiver operating characteristic
screen recording function. Finally, we collect 700 genuine curve (i.e., ROC). The higher it is, the better the system works.
signatures from 35 subjects. EER is the point on the ROC curve that corresponds to an
• Step 2: collecting Forged Signatures. In this step, we equal probability of miss-classifying a positive or negative
collect the forged signatures of 35 subjects. Each of them sample. The lower its value is, the better the system performs.
is required to imitate 20 samples of a forged signature. We 4) Performance of different models: Fig. 9 shows the AUC
randomly select 5 genuine signatures from other 5 users. and EER of four different classifiers. All classifiers perform
Each subject imitates these 5 genuine signatures, and good, and the SVM model outperform others: AUC = 98.6%
every genuine signature is repeatedly imitated 4 times. and EER = 1.7% in SF, AUC = 96.7% and EER = 1.5%
Before signing, we play the recorded video of these in ALL, AUC = 98.2% and EER = 1.3% in RF. We believe
the nature of the features we used is the reason why the SVM classifier achieves the best results. The features used are the similarity values between reference and questioned signatures. Genuine signatures are more likely to achieve high similarity values than forged signatures. Moreover, the ranking of the SVM classifier for the three tasks is as follows. ALL has the best performance followed by RF, and SF is the worst. These results match our intuition that skilled forgeries are the most similar to the genuine signature since the distance variation is almost the same. In the following results, we use the SVM model as the final classifier for signature verification and continue to evaluate the verification accuracy of the SVM model.

Fig. 9. AUC & EER for different classifiers (SVM, LR, RF, NB) on the SF, ALL and RF tasks. (a) The performance of different classifiers in terms of AUC. (b) The performance of different classifiers in terms of EER.

5) Required number of reference samples: In this section, we evaluate the impact of the number of reference samples. We keep the number of subjects the same and gradually increase the number of reference samples from 1 to 10 for each subject. A trained SVM model is used to classify the three tasks mentioned in Sec. VI-B3. Fig. 10 shows the AUC and EER for different numbers of reference samples. As the number of reference samples increases, the scores improve rapidly from AUC = 42.1% and EER = 13.6% using a single reference signature to AUC = 97.4% and EER = 1.2% using 3 reference signatures. The best score is achieved on task RF: when the number of reference signatures is 9, the AUC is 98.9% and the EER is 1.3%. From our observation, even though 3 reference signatures seem to be enough, increasing the number of reference signatures leads to a more robust and secure system.

Fig. 10. AUC & EER for different reference signatures. (Plots of AUC (%) and EER (%) versus the number of reference signatures for the SF, ALL and RF tasks.)

6) Required number of training subjects: In this section, we evaluate how many training subjects are enough to achieve good performance. For this purpose, we train our models with varying numbers of subjects, starting with 5 and adding 5 subjects each time to the training set, with the rest used as the testing set, until the number of training subjects reaches 30. For example, we have 35 subjects in total. We first randomly select 5 subjects and use their genuine and forged signatures as training samples, and use the remaining 30 subjects as testing samples. Next, we randomly select 10 subjects as training subjects and 25 as testing subjects. We run the training and testing operations one after another until the number of training subjects reaches 30. To ensure that our experiments are not distorted by abnormal samples, the evaluation is repeated 25 times. The average AUC and EER are shown in Fig. 11. As the number of training samples increases, the performance improves. The best score is achieved on task RF: when the number of training subjects is 30, the AUC is 98% and the EER is 5.3%.

Fig. 11. AUC & EER for different numbers of training subjects. (a) AUC. (b) EER.

7) Impact of signature complexity: We also evaluate whether the signature complexity influences performance. To do this, we first define three complexity levels of signatures: 'Simple', 'Medium' and 'Complex'. If a signature contains less than 4 letters, over 10 letters, or in between, its complexity is defined to be 'Simple', 'Complex' or 'Medium', respectively. For each level, we select 3 participants with signatures meeting the criteria for verification. The result is shown in Table I. As we can see, as the signature complexity gets higher, the AUC increases and the EER generally decreases, indicating an improvement of system performance. We therefore recommend using a signature of medium complexity or above.

TABLE I
AUC & EER FOR DIFFERENT SIGNATURE COMPLEXITY

            AUC (%)               EER (%)
Complexity  SF    ALL   RF        SF    ALL   RF
Simple      83.1  85.9  85.5      18.9  16.8  16.7
Medium      88.8  91.9  94.4      11.5   6.9   3.1
Complex     93.8  92.4  96.1       3.4   4.2   3.8

8) Impact of number of forger imitators in training: SilentSign is intrinsically a non-retrain system with respect to forgers, since their signatures are difficult to obtain in real-world application scenarios. But it is possible to collect data from forger imitators, who could be our friends, colleagues, or recruited participants, to improve the authentication ability of our system. Thus, the question is how the system performance is affected by the number of forger imitators used in model training. To answer it, we randomly divide all the participants into three groups, namely, legitimate users, forger imitators, and real forgers. We train binary classification models with data from the legitimate users and forger imitators and test them with data from the remaining participants (i.e., real forgers). As Fig. 12 shows, when data of more forger imitators are used, the system performance improves. When the number of forger imitators in the training stage reaches 10, the AUC and EER reach around 96% and 4%, respectively.
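The AUC and EER values quoted throughout this evaluation are ROC-derived metrics, as defined in Sec. VI-B3. The following is a minimal sketch of how they could be computed from classifier outputs, not the authors' implementation. It assumes each verification attempt yields one score where a higher value means "more likely genuine"; the function names `roc_points`, `auc` and `eer` and the toy scores are hypothetical. The EER is obtained by linear interpolation between ROC points, which is one common convention among several.

```python
def roc_points(scores, labels):
    """Sweep a threshold over descending scores; return (fpr, tpr) lists.

    labels: 1 for a genuine signature, 0 for a forged one (assumed encoding).
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one genuine and one forged sample")
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2.0
        for k in range(1, len(fpr))
    )


def eer(fpr, tpr):
    """Error rate at the point where FPR equals FNR (= 1 - TPR)."""
    for k in range(1, len(fpr)):
        d0 = fpr[k - 1] - (1.0 - tpr[k - 1])
        d1 = fpr[k] - (1.0 - tpr[k])
        if d0 <= 0.0 <= d1:
            if d1 == d0:
                return fpr[k]
            t = -d0 / (d1 - d0)  # interpolation weight of the crossing
            return fpr[k - 1] + t * (fpr[k] - fpr[k - 1])
    return 1.0  # not reached: FPR - FNR runs from -1 to +1


# Toy example: four genuine and four forged attempts (made-up scores).
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
fpr, tpr = roc_points(scores, labels)
print(f"AUC = {auc(fpr, tpr):.4f}, EER = {eer(fpr, tpr):.4f}")
# prints "AUC = 0.9375, EER = 0.2500"
```

In the toy data, one genuine score (0.4) falls below one forged score (0.6), so 15 of the 16 genuine/forged pairs are ranked correctly, which gives the AUC of 0.9375.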
Fig. 12. AUC & EER under different numbers of forger imitators included in model training. (a) AUC. (b) EER.

Fig. 13. The performance of SilentSign when the smartphone is placed at different positions. (a) The experimental setup to evaluate the impact of smartphone position. (b) The AUC & EER when the smartphone is placed in different positions (P0 to P4).

9) Impact of the smartphone position: We finally evaluate the impact of smartphone position on verification performance. As shown in Fig. 13(a), we move the smartphone from the original position P0 along four directions by 10 cm and place it at four different positions (P1 to P4). Then we ask two participants to perform genuine signatures and forged signatures, respectively, as described in Sec. VI-B, 10 times at each position. Based on the collected samples, we run the verification process with samples in P0 as reference signatures and get the results shown in Fig. 13(b). As we can see, the smartphone position indeed affects system performance. When the smartphone is moved away from the original position P0, the performance of the already trained system degrades. The impact, however, varies across positions. When the smartphone is moved horizontally (P1 and P2), the performance decreases only slightly, while for vertical movements (P3 and P4), the performance degradation is more obvious. The underlying reason is that vertical movements cause changes in the relative orientation between the signing activity and the smartphone. This further affects the measurement of vertical movement, which is used as a key feature.

VII. DISCUSSION AND FUTURE WORK

In this part, we mainly discuss the limitations and future work of SilentSign.

A. The impact of relative orientation

Although we have verified with experiments that SilentSign is not sensitive to signing positions within the sensing area, it should be pointed out that this holds true only when the device does not move, as shown in Sec. VI-B9. Essentially, SilentSign is sensitive to the relative orientation between the signing pen and the device. If the device is moved vertically or rotated relative to the sensing area, the system performance will be negatively affected. This is because the orientation determines the relative movement between the signing activity and the device, which in turn affects the echo signals. This is one of the limitations of our system. We envision that this can be improved by extracting orientation-independent features and collecting training data from several orientations in future work.

B. The impact of lack of forged signatures

As aforementioned, the signatures of real forgers cannot be obtained in real-world application scenarios, which causes performance degradation as shown in the evaluation. Although adding data of forger imitators into training can improve the performance, the AUC and EER then stay stable even when more forger imitators' data are used. To gain further improvement and meet the higher requirements of certain scenarios like banking services, it is feasible to design a more advanced verification model based on deep neural networks, which are more powerful at extracting deep features. We leave this as future work.

VIII. CONCLUSION

In this paper, we propose an acoustic sensing-based handwritten signature verification method which can be implemented on handy smart devices such as smartphones and tablets. Compared with the common touchscreen-based HSV system, our method has a lower hardware requirement and can be applied in scenarios of signing on paper materials to supplement real-time signature verification. Our approach is a purely software-based solution and only uses a speaker and a microphone, which are basic components of most commodity devices. By extracting intrinsic patterns of signing movements, our system SilentSign can achieve satisfactory signature verification performance in terms of AUC and EER. Although it still has limitations in practicability and robustness, we believe that this is a promising technology deserving further research.

ACKNOWLEDGMENT

We really appreciate the kind effort of our shepherd, Prof. Petteri Nurmi, for giving precious suggestions to revise and improve the paper. This research was supported in part by the China NSFC Grant (61802264, 61872248, U1736207), Guangdong NSF 2017A030312008, Fok Ying-Tong Education Foundation for Young Teachers in the Higher Education Institutions of China (Grant No. 161064), Shenzhen Science and Technology Foundation (No. JCYJ20180305124807337, No. ZDSYS20190902092853047), GDUPS (2015), and Natural Science Foundation of SZU (No. 860-000002110537). Yongpan Zou is the corresponding author.

REFERENCES

[1] L. G. Hafemann, R. Sabourin, and L. S. Oliveira, “Offline handwritten signature verification - literature review,” in 2017 Seventh International Conference on Image Processing Theory, Tools and Applications (IPTA). IEEE, 2017, pp. 1–8.
[2] J. Morgan, “2019 afp payments fraud and control survey report,” https://www.jpmorgan.com/commercial-banking/insights/2019-afp-payments-fraud-control-survey-report.
[3] A. Pansare and S. Bhatia, “Handwritten signature verification using neural network,” International Journal of Applied Information Systems, vol. 1, no. 2, pp. 44–49, 2012.
[4] K. K. Gurrala, “Online signature verification techniques,” Ph.D. dissertation, 2011.
[5] D. Muramatsu and T. Matsumoto, “Effectiveness of pen pressure, azimuth, and altitude features for online signature verification,” in International Conference on Biometrics. Springer, 2007, pp. 503–512.
[6] A. Kholmatov and B. Yanikoglu, “Identity authentication using improved online signature verification method,” Pattern recognition letters, vol. 26, no. 15, pp. 2400–2408, 2005.
[7] A. A. Jaini, G. Sulong, and A. Rehman, “Improved dynamic time warping (dtw) approach for online signature verification,” arXiv preprint arXiv:1904.00786, 2019.
[8] A. Levy, B. Nassi, Y. Elovici, and E. Shmueli, “Handwritten signature verification using wrist-worn devices,” Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 2, no. 3, p. 119, 2018.
[9] J. Fierrez and J. Ortega-Garcia, “On-line signature verification,” in Handbook of biometrics. Springer, 2008, pp. 189–209.
[10] J. Liu, C. Wang, Y. Chen, and N. Saxena, “Vibwrite: Towards finger-input authentication on ubiquitous surfaces via physical vibration,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2017, pp. 73–87.
[11] C. X. Zhao, T. Wysocki, F. Agrafioti, and D. Hatzinakos, “Securing handheld devices and fingerprint readers with ecg biometrics,” in 2012 IEEE fifth international conference on biometrics: theory, applications and systems (BTAS). IEEE, 2012, pp. 150–155.
[12] J. Chauhan, Y. Hu, S. Seneviratne, A. Misra, A. Seneviratne, and Y. Lee, “Breathprint: Breathing acoustics-based user authentication,” in Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services. ACM, 2017, pp. 278–291.
[13] Y. Zou, M. Zhao, Z. Zhou, J. Lin, M. Li, and K. Wu, “Bilock: User authentication via dental occlusion biometrics,” Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 2, no. 3, p. 152, 2018.
[14] B. Zhou, J. Lohokare, R. Gao, and F. Ye, “Echoprint: Two-factor authentication using acoustics and vision on smartphones,” in Proceedings of the 24th Annual International Conference on Mobile Computing and Networking. ACM, 2018, pp. 321–336.
[15] R. Nandakumar, V. Iyer, D. Tan, and S. Gollakota, “Fingerio: Using active sonar for fine-grained finger tracking,” in Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. ACM, 2016, pp. 1515–1525.
[16] K. Sun, T. Zhao, W. Wang, and L. Xie, “Vskin: Sensing touch gestures on surfaces of mobile devices using acoustic signals,” in Proceedings of the 24th Annual International Conference on Mobile Computing and Networking. ACM, 2018, pp. 591–605.
[17] Y. Zou, Q. Yang, R. Ruby, Y. Han, S. Wu, M. Li, and K. Wu, “Echowrite: An acoustic-based finger input system without training,” in Proceedings of IEEE ICDCS. IEEE, 2019, pp. 778–787.
[18] C. Peng, G. Shen, Y. Zhang, Y. Li, and K. Tan, “Beepbeep: a high accuracy acoustic ranging system using cots mobile devices,” in Proceedings of the 5th international conference on Embedded networked sensor systems. ACM, 2007, pp. 1–14.
[19] P. Lazik and A. Rowe, “Indoor pseudo-ranging of mobile devices using ultrasonic chirps,” in Proceedings of the 10th ACM Conference on Embedded Network Sensor Systems. ACM, 2012, pp. 99–112.
[20] G. Gupta and A. McCabe, “A review of dynamic handwritten signature verification,” Department of Computer Science, James Cook University Townsville, Qld, vol. 4811, 1997.
[21] “Testing the accuracy of pen tablets,” https://neuroscript.net/tablets/reviews accuracy.php.
[22] E. A. Silva, K. Panetta, and S. S. Agaian, “Quantifying image similarity using measure of enhancement by entropy,” in Mobile Multimedia/Image Processing for Military and Security Applications 2007, vol. 6579. International Society for Optics and Photonics, 2007, p. 65790U.
[23] “Hausdorff distance between convex polygons,” http://cgm.cs.mcgill.ca/~godfried/teaching/cg-projects/98/normand/main.html.
[24] “Testing the accuracy of pen tablets,” http://signature.wacom.eu/wp-content/uploads/2014/06/Wacom factsheet-signature pad STU-300-EN.pdf.
[25] A. Fischer, M. Diaz, R. Plamondon, and M. A. Ferrer, “Robust score normalization for dtw-based on-line signature verification,” in 2015 13th international conference on document analysis and recognition (ICDAR). IEEE, 2015, pp. 241–245.
