
A hand-interaction model for augmented reality enhanced human-robot collaboration
Sebastian Blankemeyer, David Wendorff, Annika Raatz (2)*
Leibniz University Hannover, Institute of Assembly Technology and Robotics, An der Universitaet 2, 30823 Garbsen, Germany

Keywords: Augmented reality; Human robot collaboration; Assembly

Abstract: Flexible and rapidly adaptable automated production processes, e.g. with collaborative lightweight robots, are key aspects for the competitiveness of companies in globalised markets. However, software adaptation of the robot still requires specific programming skills. We have developed a human-centred programming by demonstration approach based on augmented reality to enable operators to intuitively adapt the robot programme. The developed hand-interaction model overcomes the challenge of object tracking during the assembly demonstration phase. This allows quick programme modifications by the operator using the head-mounted augmented reality device. The human-in-the-loop concept ensures a highly reliable and robust programming process.

© 2024 The Authors. Published by Elsevier Ltd on behalf of CIRP. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/)

1. Introduction

Manufacturing companies, especially those with high levels of manual processes, are faced with a complex dilemma [1]. On the one hand, the use of automation is crucial for companies to remain competitive, especially in high-wage countries. This is all the more important given the growing shortage of skilled workers, for example due to an ageing population. On the other hand, there is a desire for a high degree of flexibility in production in order to meet individual customer requirements and to be able to react quickly to changes. Consequently, adaptable automation systems, such as lightweight robots designed to work with humans, have significant potential to address these challenges [2]. This difficulty is particularly pronounced for small and medium-sized enterprises (SMEs) due to the increasing shortage of skilled workers.

Lightweight robots designed for collaboration have the potential to automate production processes flexibly and cost-effectively [2,3]. Integrated sensors allow the hardware to be quickly adapted to different processes. Many robot models simplify programming, allowing operators to use familiar methods, similar to using a smartphone. However, users must adapt to different manufacturer formats, and these systems often lack direct user feedback, contradicting human-centred process design.

Simplifying the programming process is therefore already the subject of recent research [4,5]. The use of augmented reality (AR) as a communication and/or programming medium between human and machine is a promising technology to overcome the challenges of a flexible automated production system [6,7,8]. For example, Wang et al. developed an augmented reality approach to improve the accuracy of Human-Robot Collaboration (HRC) [9]. An approach by Makris et al. implemented an operator support tool for HRC that shows the operator relevant information about the process and safety mechanisms [10]. Another recent, promising approach shows how augmented reality can be linked to a digital twin of the manufacturing system during the programming process to provide relevant process data for function blocks [4].

Our approach aims to bridge the gap between flexible hardware and people with a high level of process knowledge but no programming experience, using AR as the communication medium between the operator and the robot. We therefore present an AR-based approach that enables companies with a shortage of skilled programmers to adapt production processes quickly and intuitively. Within this approach, we propose a novel hand-interaction model that allows the operator to monitor the translation of the demonstrated assembly steps into the basic motions of Methods-Time Measurement (MTM). This permits the use of the operator's extensive and highly valuable process knowledge, which is often neglected in other programming by demonstration (PbD) approaches. Coupled with rapidly adaptable hardware, this idea supports companies in making rapid process adaptations.

In the following, a brief overview of the holistic framework for rapid programming is given. The developed, integrated hand-interaction model is then described in detail. Finally, the model's functionality is demonstrated in a pick-and-place process.

2. Holistic framework for AR enhanced HRC

To support SMEs with their automation challenges, we have developed a framework that addresses many of the requirements that arise during the integration of new HRC workstations. Our framework allows companies to quickly adapt the workstations by using AR. In addition, it is an enabler for vendor-independent, human-centred automation. To qualify our concept for practical applicability, both the human and the hardware implementation must be considered. While some of the required components and functionality are already available, others still need to be researched.

Our framework for AR enhanced HRC consists of three layers (see Fig. 1). The shop floor consists of lightweight robots designed to interact with humans; these are already commercially available in large numbers from various robot manufacturers. Peripherals, such as edge devices or databases, are connected in a separate layer. The operator, with the necessary interfaces to the technical systems, is located in the HRC HUB. Through the use of universal APIs and middleware-based approaches between the layers, our system ensures flexible extensibility and adaptability.

Fig. 1. General overview of the HRC HUB vision and its integration with the shop floor and peripheral services and devices.

At the centre of the HRC HUB is the human operator who, as a process expert, knows the characteristics and parameters of the production process. The operator interacts with the system through our Human-Machine Interface (HMI). The role of the HMI is to enable programming by demonstration. To do this, the real environment must be digitally captured using sensors. Methods such as 6D pose estimation and our novel hand-interaction model handle this input. A head-mounted display (HMD, e.g. the Microsoft HoloLens 2) captures and processes the operator's input and provides feedback.

The HMI connects to other components via our core stack. This stack connects the HMI to the peripherals and the shop floor while also containing additional supporting components. It processes input from the HMI with an event handler and stores the corresponding data in a document-oriented database. We also use Plug and Produce to deploy the system automatically. The core stack interacts with the specific robot on the shop floor via our standardised Robot API. It can therefore be connected to the robot's native programming environment (e.g. KUKA Robot Language, or Java for KUKA Sunrise.OS) or, via ROS2, to a middleware-based approach that allows the use of different robot systems. There are also interfaces to various peripherals that allow easy integration of services. For example, there will be an open communication interface to the operator or other personnel, as well as a database for the CAD models of the products. A reliable interface to the CAD models is needed, since the models are required both for visualising the components in the virtual world and for the pose estimation explained in the following.
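The core stack's implementation is not published in the paper. As a rough sketch of the pattern described here (an event handler feeding a document-oriented store behind a vendor-neutral Robot API), the following Python fragment may help; all class, method and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Protocol
import time

@dataclass
class DemonstrationEvent:
    """One HMI event, e.g. a derived MTM operation with its pose data."""
    operation: str   # "reach" | "grasp" | "move" | "position" | "release"
    payload: dict    # e.g. object id, 6D pose, contact points
    timestamp: float = field(default_factory=time.time)

class RobotAPI(Protocol):
    """Vendor-neutral robot interface; a concrete backend could wrap
    ROS2 or a native environment such as KUKA Sunrise.OS."""
    def execute(self, program: list[dict]) -> None: ...

class CoreStack:
    """Event handler that persists HMI input as documents and deploys
    the resulting program through the Robot API."""

    def __init__(self, robot: RobotAPI) -> None:
        self.robot = robot
        self.documents: list[dict] = []  # stand-in for a document-oriented database

    def on_event(self, event: DemonstrationEvent) -> None:
        # Store each demonstrated operation as one document.
        self.documents.append(
            {"op": event.operation, "payload": event.payload, "t": event.timestamp}
        )

    def deploy(self) -> None:
        # Translate the stored sequence into a generic robot program.
        program = [{"cmd": d["op"], "args": d["payload"]} for d in self.documents]
        self.robot.execute(program)
```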

2.1. 6D pose estimation as a key component for PbD

To use PbD effectively, it is necessary to capture the pose of the real parts during the demonstration. Only then can we transfer the demonstrated processes into a program for the robot. This is why 6D pose estimation of rigid objects is an integral part of our HMI.

We have chosen two different state-of-the-art methods for our PbD approach, both of which offer unique advantages. The ZebraPose [11] method uses surface encoding to achieve superior results on textureless objects. However, as a machine-learning-based method, it requires object-specific training: a training dataset must be created and the model trained before pose estimation. The MegaPose [12] method, on the other hand, is able to recognise novel (previously unknown) objects, so only CAD files of the objects need to be provided for pose estimation. The trade-off is that the initial processing time of approximately 27 s is comparatively long. To overcome this disadvantage, we use a fixed sensor, which makes it possible to determine the poses even before the PbD.

Table 1 shows a comparison of the two methods, giving the image processing time as well as the average recall (AR) on the Industrial 3D Object Detection Dataset (ITODD). The average recall score indicates the accuracy of the object's pose estimation within specific parameters; a higher score is desirable. We chose this evaluation metric because ITODD has a strong focus on objects and requirements from industrial use cases and therefore best matches the requirements of our applications.

Table 1
Comparison of the pose estimation methods used.

Method      Novel objects   AR (ITODD)   Time (s)
ZebraPose   no              0.504         0.250
MegaPose    yes             0.422        27.039

We chose MegaPose because it requires no prior training: no additional effort is needed to create a dataset, and no qualified personnel are required.

While the average recall of both methods is sufficient for pose estimation of static objects, it does not guarantee accurate detection of moving objects. The strong occlusion during the demonstration, mainly by the operator's hand, adds to this problem. In conclusion, the currently available methods are not sufficient to meet our requirements for reliable and continuous pose estimation during the PbD. Therefore, an additional solution for moving objects is required.
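To make the trade-off behind this choice concrete, the selection rule can be sketched as follows. The numbers are those of Table 1, while the profile structure and function are our own illustration, not part of the published system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EstimatorProfile:
    name: str
    handles_novel_objects: bool
    avg_recall_itodd: float  # average recall on ITODD (higher is better)
    time_s: float            # image processing time per estimate

ZEBRAPOSE = EstimatorProfile("ZebraPose", False, 0.504, 0.250)
MEGAPOSE = EstimatorProfile("MegaPose", True, 0.422, 27.039)

def choose_estimator(training_data_available: bool) -> EstimatorProfile:
    """Prefer MegaPose for SMEs without datasets or training personnel.

    Its long initial processing time (~27 s) is hidden by estimating the
    poses with a fixed sensor before the demonstration starts."""
    if training_data_available:
        return ZEBRAPOSE  # faster and more accurate on known, trained objects
    return MEGAPOSE
```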
2.2. A novel hand-interaction model for PbD

In contrast to pose estimation, there are already robust hand-tracking methods in AR that meet our requirements, such as Microsoft's MRTK2. It is therefore logical to use hand tracking to derive the object position during movement for our concept of PbD. To this end, we propose a novel hand-interaction model that enables the manipulation of virtual components by handling them in the real world. Our proposed model is less susceptible to occlusion and offers the possibility of systematically decomposing assembly tasks into basic operations. This is achieved through a combination of hand tracking during the demonstration and various object colliders for the interaction between the hand and the object. In addition, virtual indicators on the back of the hand provide visual feedback to the operator (see Fig. 2).

Fig. 2. Relation between state, operation, virtual indicator and object collider (green with dashed line) within the hand-interaction model.

In the initial state of an interaction, the object is "off-hand". A box collider at a distance of 8 cm from the object is set up to detect when the operator's hand approaches the object. A collision event with at least two tracked fingertips of the operator is interpreted as an approach. There is a similar procedure for gripping objects.
A mesh collider is created to monitor the contact between the fingertips and the surface of the object. This collider must be slightly larger than the dimensions of the CAD model, as deviations in pose estimation or fingertip detection can lead to differences between the assumed and the actual distance between hand and object. As soon as two or more fingertips are in constant contact, the operation is set to grasp and the model enters the "in-contact" transition state. The points of contact are recorded and can be used later to determine a possible gripping position for the end effector. Gripping connects the CAD object to the root of the operator's hand. During object manipulation, the operator's fingertips may be covered, making the root of the hand a more stable option, as it can be tracked continuously. The position of the movement is continuously recorded during the "in-hand" state. To recognise when the demonstration is finished, a small box collider at a distance of 1.5 cm from the object is applied. Once the fingertips are no longer in contact with this collider, the state returns to "in-contact", concluding the PbD.

During the demonstration, the interaction between hand and object colliders is used to derive the MTM base operations (reach, grasp, move, position and release). Virtual indicators display the respective active operation to the operator. If there are any deviations between the real and the derived operations, the operator can recognise them and take appropriate action. To complete the PbD, a robot program can be created based on the demonstrated MTM sequence; a similar approach was presented in our previous work [5].
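Read as a state machine, the interaction logic described above can be summarised in the following sketch. The distances (8 cm and 1.5 cm colliders) and the two-fingertip threshold come from the text; the event interface and the direct mapping of the release event onto the "position" and "release" operations are our assumptions (the original runs with Unity-style colliders on the HMD, not as standalone Python).

```python
from enum import Enum, auto

class State(Enum):
    OFF_HAND = auto()    # initial state: hand away from the object
    IN_CONTACT = auto()  # transition state: fingertips on the object surface
    IN_HAND = auto()     # object attached to the root of the hand

APPROACH_DIST_M = 0.08  # box collider 8 cm around the object
RELEASE_DIST_M = 0.015  # box collider 1.5 cm around the object
MIN_FINGERTIPS = 2      # tracked fingertips needed to trigger an event

class HandInteractionModel:
    """Derives MTM base operations (reach, grasp, move, position,
    release) from collider events between hand and object."""

    def __init__(self) -> None:
        self.state = State.OFF_HAND
        self.operations: list[str] = []    # demonstrated MTM sequence
        self.contact_points: list[tuple] = []
        self.trajectory: list[tuple] = []  # hand-root poses while in-hand

    def on_approach_collider(self, fingertips_inside: int) -> None:
        # Hand enters the 8 cm box collider -> "reach".
        if self.state is State.OFF_HAND and fingertips_inside >= MIN_FINGERTIPS:
            self.operations.append("reach")

    def on_surface_contact(self, contacts: list[tuple]) -> None:
        # Two or more fingertips in constant contact with the (slightly
        # enlarged) mesh collider -> "grasp"; the contact points are kept
        # as a candidate gripping position for the end effector.
        if self.state is not State.IN_HAND and len(contacts) >= MIN_FINGERTIPS:
            self.state = State.IN_CONTACT
            self.contact_points = list(contacts)
            self.operations.append("grasp")
            self.state = State.IN_HAND  # CAD object now follows the hand root

    def on_hand_root_pose(self, pose: tuple) -> None:
        # The object pose is derived from the hand root, which remains
        # trackable even when the fingertips are occluded.
        if self.state is State.IN_HAND:
            self.trajectory.append(pose)  # continuous "move" recording

    def on_release_collider(self, fingertips_inside: int) -> None:
        # Fingertips leave the 1.5 cm collider -> back to "in-contact",
        # which concludes the demonstration.
        if self.state is State.IN_HAND and fingertips_inside < MIN_FINGERTIPS:
            self.operations += ["position", "release"]
            self.state = State.IN_CONTACT
```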

3. Implementation of the hand-interaction model

Our hand-interaction model enables reliable programming by demonstration in four steps. The first two steps are preparation; the operator only executes the final two (see Fig. 3). The initial step is to prepare the assembly model and markers. This includes separating and transforming the individual components. The geometric conditions and relative positions are then determined and can later be used for the exact target pose. The second step is to establish the connection and exchange data. Pairing the necessary technical devices, such as the HMD and the edge device, is semi-automatic; the operator only needs to confirm the correct connection. A data connection is then initialised between the HMD and the edge device, over which the relevant data (e.g. the RGB data stream) is transferred. The virtual workspace is implemented in step three, connecting the real and virtual worlds through a basic reference system established by a marker. Furthermore, a pose estimation procedure (in this case MegaPose) is used to align the components and virtual assemblies. In the final step, the operator performs the PbD using the hand-interaction model, as described in Section 2.2.
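Read as a pipeline, the four steps could be orchestrated as in the sketch below. Every function here is an invented stub standing in for an unpublished HRC HUB interface; only the split into two preparation steps and two operator steps mirrors the text.

```python
def prepare_assembly_model(cad_file: str) -> list[str]:
    # Step 1 (preparation): separate and transform the components and
    # derive their geometric relations for the later target pose.
    return ["base_plate", "peg"]  # hypothetical component list

def pair_devices(hmd: str, edge: str) -> str:
    # Step 2 (preparation): semi-automatic pairing; the operator only
    # confirms, then the RGB data stream is opened between the devices.
    return f"{hmd}<->{edge}"

def register_workspace(marker_id: int) -> dict:
    # Step 3 (operator): a marker establishes the common reference system;
    # MegaPose estimates align the component holograms with the real parts.
    return {"reference_marker": marker_id}

def demonstrate(workspace: dict, components: list[str]) -> list[str]:
    # Step 4 (operator): PbD with the hand-interaction model (Section 2.2).
    return ["reach", "grasp", "move", "position", "release"]

def run_setup_and_pbd() -> list[str]:
    components = prepare_assembly_model("assembly.step")
    pair_devices("HoloLens 2", "edge-device")
    workspace = register_workspace(marker_id=0)
    return demonstrate(workspace, components)
```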
3.1. Object tracking during demonstration stages

Object tracking is a crucial aspect of deriving the MTM-1-based representation when demonstrating the assembly, as the blocks are defined from the poses of the objects together with the human movements. For each of the four object-tracking stages, there are different methods to derive the pose of the object (see Fig. 4). The stages are: determining the region of interest (ROI), the detailed initial position, capturing during motion, and the final position. Each method has specific advantages and disadvantages, so the most suitable one depends on the respective requirements, such as the tasks and the environmental conditions. We have selected the methods to best match the requirements of SMEs with a rapidly changing product range.

Fig. 4. Stages of object tracking and respective different methods of pose estimation (used methods are highlighted).

For the initial object detection and the associated ROI determination, a marker-based approach was chosen, as it is very robust while still offering some flexibility due to the movable markers. A more flexible approach would be object detection via a fixed sensor or the HMD; however, this comes with specific challenges and is less robust.

For the initial pose estimation, the rough position is already given by the ROI. We decided to use a fixed sensor with a method for novel objects (MegaPose) because it does not require object-specific training. This has the advantage of not requiring additional skills, which is particularly relevant for SMEs, but comes with the limitation of a longer processing time for the initial image. The fixed sensor compensates for this delay by allowing pose estimation to be performed before the demonstration process.

For tracking moving objects, we use our hand-interaction model. This has the advantage that the occlusion of components during grasping does not affect the position determination. By deriving the position from the root point of the hand, stable and reliable tracking of the components is possible.

In the final stage, pose estimation of the target pose is particularly important, as it enables the subsequent assembly. The use of geometric constraints provides a very robust option, as they do not depend on sensory input; however, this requires a geometric relationship within an assembly group. We therefore use the pose derived from hand tracking, as it is more flexible and can be used independently of assembly relationships (e.g. in simple pick-and-place processes).
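The resulting stage-to-method assignment (cf. Fig. 4) amounts to a simple lookup per tracking stage. A minimal sketch, with the method identifiers as placeholders and the alternatives from the text noted in comments:

```python
from enum import Enum, auto

class Stage(Enum):
    ROI = auto()           # region of interest
    INITIAL_POSE = auto()  # detailed initial position
    IN_MOTION = auto()     # capturing during motion
    FINAL_POSE = auto()    # final (target) position

TRACKING_METHODS = {
    Stage.ROI: "marker",                          # alt.: detection via fixed sensor or HMD
    Stage.INITIAL_POSE: "megapose_fixed_sensor",  # alt.: ZebraPose (needs training)
    Stage.IN_MOTION: "hand_interaction_model",    # robust to occlusion by the hand
    Stage.FINAL_POSE: "hand_derived_pose",        # alt.: geometric constraints
}

def pose_source(stage: Stage) -> str:
    """Return the configured pose-estimation method for a tracking stage."""
    return TRACKING_METHODS[stage]
```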
As previously mentioned, the approach utilises the human operator's intelligence and perception to intervene if any deviations are detected. If the operator is not satisfied with the accuracy of the overlap between the virtual and real parts, he or she can simply restart the PbD by moving the part back to its initial position.

Fig. 3. Procedure and necessary steps for our programming by demonstration approach.


3.2. Programming by demonstration in assembly

The operator's workstation is shown in Fig. 5 and includes a lightweight robot (in this case an LBR iiwa from KUKA), the components of the assembly, and the operator with the head-mounted augmented reality device (a HoloLens 2 from Microsoft). The components are placed on the markers that define the ROI for pose estimation. Once the pose of the real components has been determined, they are superimposed as virtual holograms in the HMD. Particularly for SMEs with a large number of variants, this approach offers the advantage that production processes can be adapted quickly by exchanging markers and estimating the pose, without the time-consuming creation of training data.

Fig. 5. Overview of our workplace showing the different components and their respective coordinate systems (fixed RGB camera not shown).

The functionality of the system is illustrated in Fig. 6. The figure shows the scene that the operator perceives during the execution of the assembly process (real and virtual content). The operator has picked up a component and is moving it to its target position (defined by the operator). The red virtual indicator signals that the current operation is "Move". Additional information about the components, such as the name, can be displayed to the operator if required. As can be seen, there is a deviation between the real workpiece and the virtual one linked to the operator's hand. This is due to an error in the previous pose estimation; however, the human-in-the-loop approach allows the operator to correct it by restarting the PbD. If the demonstration was successful, the derived MTM-based representation can be translated into a robot program and sent to the robot. Our approach thus offers a quick and intuitive method for utilising the operator's process knowledge and combining the robot's strengths (e.g. endurance) with the human's strengths (e.g. the ability to react to unexpected events). Using an HMD in HRC workstations has the additional advantage of keeping both hands free for operational tasks while meeting accessibility requirements.
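The translation step itself is not detailed in the paper. Assuming a one-to-one mapping of MTM base operations onto generic motion primitives, it could be sketched as follows; the primitive names and the program format are hypothetical.

```python
# Hypothetical mapping of MTM base operations onto robot primitives.
MTM_TO_PRIMITIVE = {
    "reach": "move_free",         # unloaded approach motion
    "grasp": "close_gripper",
    "move": "move_loaded",        # transport along the recorded trajectory
    "position": "fine_position",  # align with the demonstrated target pose
    "release": "open_gripper",
}

def to_robot_program(mtm_sequence: list[dict]) -> list[dict]:
    """Map each recorded MTM operation onto a robot primitive,
    carrying its recorded pose data along as arguments."""
    return [
        {"primitive": MTM_TO_PRIMITIVE[step["op"]],
         "args": step.get("payload", {})}
        for step in mtm_sequence
    ]

# Example: a minimal demonstrated pick-and-place sequence.
demo = [
    {"op": "reach"},
    {"op": "grasp", "payload": {"contact_points": []}},
    {"op": "move", "payload": {"trajectory": []}},
    {"op": "position", "payload": {"target_pose": None}},
    {"op": "release"},
]
program = to_robot_program(demo)
```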
Fig. 6. Photo recorded with the HoloLens 2 showing holographic virtual objects during the programming by demonstration.

4. Conclusion and outlook

The pressure to remain competitive is forcing companies to automate their production processes. Lightweight robots already offer a suitable opportunity to exploit the strengths of automation while maintaining a high level of flexibility. In this paper, we present an approach to simplify the programming of such workstations. The various steps and components of the overall approach are briefly presented and explained. The focus is on the human-machine interface with its main components: pose estimation and the novel hand-interaction model. The AR interface allows the operator to continuously monitor and adjust the process without in-depth programming knowledge. The resulting process representation is available as an MTM sequence and can then be transferred to a robot program or used for further process analysis.

The presented HRC HUB is still in active development and will be particularly useful for SMEs. The hand-interaction model with its human-in-the-loop approach allows the specific process knowledge of the operator to be used to obtain a reliable and correct demonstration output. Combined with the flexible hardware, this provides an optimal solution for rapid process adaptation.

Further work can be carried out on selecting and combining appropriate object-tracking approaches based on the application tasks. This can enhance usability and streamline processes. Additionally, the approach can be transferred to service robotics and the skilled trades, where intuitive programming is also a challenge.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Sebastian Blankemeyer: Conceptualization, Investigation, Methodology, Writing - original draft. David Wendorff: Investigation, Software, Visualization, Writing - original draft. Annika Raatz: Conceptualization, Funding acquisition, Project administration, Supervision, Writing - review & editing.

References

[1] Kemény Z, Váncza J, Wang L, Wang XV (2021) Human-Robot Collaboration in Manufacturing: A Multi-Agent View. In: Advanced Human-Robot Collaboration in Manufacturing, Springer, 3–43.
[2] Wang L, Gao R, Váncza J, Krüger J, Wang XV, Makris S, Chryssolouris G (2019) Symbiotic Human-Robot Collaborative Assembly. CIRP Annals - Manufacturing Technology 68(2):701–726.
[3] Raatz A, Blankemeyer S, Recker T, Pischke D, Nyhuis P (2020) Task Scheduling Method for HRC Workplaces Based on Capabilities and Execution Time Assumptions for Robots. CIRP Annals - Manufacturing Technology 69(1):13–16.
[4] Liu S, Wang XV, Wang L (2022) Digital Twin-Enabled Advance Execution for Human-Robot Collaborative Assembly. CIRP Annals - Manufacturing Technology 71(1):25–28.
[5] Blankemeyer S, Wiemann R, Posniak L, Pregizer C, Raatz A (2018) Intuitive Robot Programming Using Augmented Reality. Procedia CIRP 76:155–160.
[6] Nee AYC, Ong SK, Chryssolouris G, Mourtzis D (2012) Augmented Reality Applications in Design and Manufacturing. CIRP Annals - Manufacturing Technology 61(2):657–679.
[7] Overmeyer L, Jütte L, Poschke A (2023) A Real-Time Augmented Reality System to See Through Forklift Components. CIRP Annals - Manufacturing Technology 72(1):409–412.
[8] Makris S, Karagiannis P, Koukas S, Matthaiakis A-S (2016) Augmented Reality System for Operator Support in Human Robot Collaborative Assembly. CIRP Annals - Manufacturing Technology 65(1):61–64.
[9] Wang XV, Wang L, Lei M, Zhao Y (2020) Closed-Loop Augmented Reality Towards Accurate Human-Robot Collaboration. CIRP Annals - Manufacturing Technology 69(1):425–428.
[10] Makris S, Pintzos G, Rentzos L, Chryssolouris G (2013) Assembly Support Using AR Technology Based on Automatic Sequence Generation. CIRP Annals - Manufacturing Technology 62(1):9–12.
[11] Su Y, Saleh M, Fetzer T, Rambach J, Navab N, Busam B, Stricker D, Tombari F (2022) ZebraPose: Coarse to Fine Surface Encoding for 6DoF Object Pose Estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 6738–6748.
[12] Labbé Y, Manuelli L, Mousavian A, Tyree S, Birchfield S, Tremblay J, Carpentier J, Aubry M, Fox D, Sivic J (2023) MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare. In: Proceedings of the 6th Conference on Robot Learning (CoRL), PMLR 205:715–725.
