I. INTRODUCTION
II.
A. Multimodal Processes
We have integrated four basic processes to support multimodal functionality: Face Detection-Recognition, Object Detection-Recognition, Speech Recognition, and Gesture Recognition.
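A minimal sketch of such an integration, with hypothetical class and method names, is shown below: a coordinator object simply collects the latest result of each of the four processes into one shared state. This is an illustrative assumption, not the system's actual implementation.

from dataclasses import dataclass
from typing import Optional


@dataclass
class MultimodalState:
    """Latest result produced by each of the four basic processes."""
    user: Optional[str] = None      # face detection-recognition
    obj: Optional[str] = None       # object detection-recognition
    speech: Optional[str] = None    # speech recognition
    gesture: Optional[str] = None   # gesture recognition


class MultimodalCoordinator:
    """Collects the results of the four recognizers into one shared state."""

    def __init__(self) -> None:
        self.state = MultimodalState()

    def on_face(self, user_name: str) -> None:
        self.state.user = user_name

    def on_object(self, object_name: str) -> None:
        self.state.obj = object_name

    def on_speech(self, command: str) -> None:
        self.state.speech = command

    def on_gesture(self, gesture: str) -> None:
        self.state.gesture = gesture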
EXPERIMENTAL RESULTS
The first step of our use case scenario deals with desktop login through face detection-recognition of the user. To be detected, the user can either stand (default position) or sit (seated position) in front of the Kinect device, so that the skeleton tracking procedure is activated. After the user's skeleton is detected (see upper left window of Figure 2, Figure 3, and Figure 4), our system continues by cropping the RGB image at the head joint's coordinates, creating a rectangle around the user's face.
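A minimal sketch of this crop-and-match idea follows; the head-joint coordinate format, the crop size, and the naive pixel-distance matcher are assumptions made for illustration rather than the face recognition method actually used by the system.

import numpy as np
from typing import Dict, Optional, Tuple


def crop_face(rgb_frame: np.ndarray, head_xy: Tuple[int, int],
              half_size: int = 60) -> np.ndarray:
    """Return a square crop of the RGB frame centred on the tracked head joint."""
    x, y = head_xy
    h, w = rgb_frame.shape[:2]
    top, bottom = max(0, y - half_size), min(h, y + half_size)
    left, right = max(0, x - half_size), min(w, x + half_size)
    return rgb_frame[top:bottom, left:right]


def recognize_user(face_crop: np.ndarray,
                   enrolled: Dict[str, np.ndarray]) -> Optional[str]:
    """Naive matcher: return the enrolled name with the smallest pixel distance."""
    best_name, best_dist = None, float("inf")
    for name, template in enrolled.items():
        if template.shape != face_crop.shape:
            continue  # a real system would resize and align the faces first
        dist = float(np.mean((template.astype(float) - face_crop.astype(float)) ** 2))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name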
The second step (see Table II) deals with the application selection through object detection-recognition. To accomplish that, once again the user must be in front of the Kinect device so that the skeleton tracking starts; as soon as the user is tracked, the RGB image is cropped at the right hand joint's coordinates, creating a red rectangle around the object (see RGB image in Figure 6 and Figure 7). The cropped image is then compared with a set of object images which are already stored in our database. The result of successful object recognition is the name of the recognized object (see Output Results field in Figure 6 and Figure 7). Successful object recognition results are sent to the Sentence Compiler through the Qualifier Input Control, which in turn compiles a correct sentence, if any, using the results from step 1 and step 2 (e.g. "Anestis Palette" in Figure 6 or "Petros Book" in Figure 7).
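A minimal sketch of this compilation step follows, assuming a hypothetical compile_sentence function and a simple "<user> <object>" sentence format that mirrors the quoted example output.

from typing import Optional


def compile_sentence(user_name: Optional[str],
                     object_name: Optional[str]) -> Optional[str]:
    """Combine the step-1 user name and step-2 object name into one sentence."""
    if not user_name or not object_name:
        return None  # one of the two steps has not produced a result yet
    return f"{user_name} {object_name}"


# Example mirroring the output mentioned in the text, e.g. "Anestis Palette"
print(compile_sentence("Anestis", "Palette"))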
Figure 11: Application Operation according to Gestures and Voice Commands Flow Diagram
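A minimal sketch of such command dispatching is given below; the command names and the bound operations are assumptions for illustration only, since the actual gesture and voice command set is the one defined by the flow diagram of Figure 11.

from typing import Callable, Dict

# The command names below ("open", "close") are assumed for illustration;
# the real command set is defined by the Figure 11 flow diagram.
OPERATIONS: Dict[str, Callable[[], None]] = {
    "open": lambda: print("opening the selected application"),
    "close": lambda: print("closing the selected application"),
}


def dispatch(command: str) -> None:
    """Run the operation bound to a recognized gesture or voice command."""
    action = OPERATIONS.get(command.lower())
    if action is None:
        print(f"unrecognized command: {command!r}")
        return
    action()


dispatch("open")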