Acknowledgements
Much like rock climbing, writing a thesis is both a solitary endeavour and
a team exercise. Making progress up the mountain is not possible without
someone standing on terra firma and holding the other end of the rope. Com-
pleting a thesis is not possible without a strong, rich and varied supportive
context. For that, before we begin, I would like to express my gratitude to all
the people who have helped me in doing this research.
First and foremost I would like to thank Ludovic Tanguy, who has been my adviser not only for this thesis but for my entire life in academia. Starting from the very first introductions to the domain of natural language processing and to computer programming in 2007, right up to this very moment, Ludovic’s advice, guidance and support have been invaluable. Whether presented with an outrageous “great” idea or with a last-minute crisis, Ludovic has always had a trick up his sleeve and, with the fitting words, has pointed me in the right direction. For all that and much more, thank you, Ludovic!
I would like to thank the members of the jury, Patrice Bellot and Yannick Toussaint, for agreeing to write the detailed reports, and Cécile Fabre for agreeing to be my examiner.
This thesis would not have been possible without the material support
and the context provided by both CFH/Safety Data and the CLLE-ERSS
linguistics research laboratory.
From CFH/Safety Data, I would like to thank Eric Hermann and Michel
Mazeau for providing me with this opportunity and for believing in the po-
tential of natural language processing in the context of risk management. I
would also like to express my gratitude to them for introducing me to the fields
of industrial ergonomics and human factors and thus providing the founda-
tions for my understanding of the complexities of human work and interaction
with technology. My respect also goes to the rest of the CFH/Safety Data
team: Céline Raynal, Christophe Pimm, Vanessa Andréani, Marion Laignelet
and Pamela Maury for constituting of this exceptionally rich working envi-
ronment. From CLLE-ERSS I would like to thank all the past and present
members, whose various inputs throughout both the time of this writing and
my academic upbringing constitutes the foundations of this research. A won-
derful and colourful crowd, of which I feel honoured being a part.
I would especially like to thank Assaf Urieli and Nicolas Ribeiro, without whose help and contributions whole sections of this thesis would not exist, as well as my good friend and colleague Aleksandar Kalev for his years-long professional and personal support.
My gratitude also goes to Marie-Paule Péry-Woodley for providing a much
needed external perspective and helping me overcome the dreaded writer’s
block and Mai Ho-Dac for always finding a way to show me the bright side of
academia.
For their invaluable input in helping me understand the intricacies of aviation and flying, I would like to thank Grégory Caudy, Jerôme Rodriguez and especially Reinhard Menzel, for patiently sharing his profound knowledge of aviation safety management.
For accepting subjects related to my work for their Master’s projects and for their exemplary work, I thank Céline Barès, Joao Pedro Campello Rodriguez and Clement Thibert.
For their support and encouragement, I thank my fellow doctoral students Fanny Lalleman, François Morlane-Hondère and Caroline Atallah.
Finally, to all those whose love has made me wake up with a smile in the morning, and to all those whose words have made me fall asleep with a peaceful mind: thank you!
List of Acronyms
BEA Bureau for Safety Investigations and Analysis for Civil Aviation (Bureau d’Enquêtes et d’Analyses pour la Sécurité de l’Aviation Civile)
IR Information Retrieval
That day the first accident caused by man’s poor understanding and im-
proper use of complex technology had occurred. The first accident report had
been submitted. The first post-accident investigation had been conducted and
the first set of safety-related regulations had been issued.
Today millions of fires burn around the globe, breathing life into the immense apparatus that keeps our society on its feet. Complex techno-social systems such as energy extraction and production, transportation, healthcare, manufacturing and the military often involve thousands of individuals working with complex machinery, channelling vast amounts of energy. We take it for granted that these systems almost never fail. We entrust our lives to them while expecting them to forever innovate and outperform. Yet safety is not a natural byproduct of the industrial process. In order to achieve and maintain acceptably low levels of failure, modern systems rely on a framework of regulatory processes that guide day-to-day work practices and decision making. And, as much as energy is the lifeblood of any industry, information is the vital fluid of its immune system.
the heavy use of domain-specific vocabulary. For such solutions to be useful, one also needs to take into account the redundant information and the overlap between taxonomies and natural language descriptions. Likewise, given the widespread use of taxonomies, text categorisation, a well-proven technology, is applicable to incident and accident data and has the power to reduce the need for manual coding, while increasing the coverage of industry-standard metadata throughout a given collection. This requires, however, a thorough understanding of the specificities of these nomenclatures in order to correctly define the classification task (a minimal sketch of such a setup is given below).
• Having identified the needs of safety experts and explored the informa-
tional landscape in which they engage in their activities, we show how
NLP can contribute to improving the tools used by them when working
with incident and accident reports stored in electronic format and incor-
porate language processing in tools specifically tailored to the industry’s
requirements.
We show that text can be viewed not only as the vehicle of information between humans, but also as a resource that, when properly tapped and exploited, has the potential to improve the overall quality of communication of safety-related information within a given system. And we show that by considering the specificities of the data and the sector, one improves the quality of NLP applications designed to operate within the specific domain, chooses more suitable NLP methods and technologies, and better adapts them to the precise needs expressed by the community.
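For illustration, here is a minimal sketch of how taxonomy-code assignment can be framed as supervised text categorisation with off-the-shelf tools. The narratives and occurrence codes below are invented placeholders, not actual taxonomy labels, and a real system would of course be trained on thousands of coded reports.

```python
# Minimal sketch: assigning taxonomy codes to narratives as supervised
# text categorisation. Narratives and codes are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Each narrative is paired with a taxonomy code assigned by an analyst.
train_texts = [
    "aircraft overran the runway after landing on a wet surface",
    "crew noticed smoke in the cabin and returned to the gate",
    "loss of separation between two aircraft during climb",
]
train_codes = ["RE", "F-NI", "MAC"]  # hypothetical occurrence categories

model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(train_texts, train_codes)

# A new, uncoded report receives a candidate code automatically,
# reducing the manual coding burden.
print(model.predict(["runway excursion after touchdown in heavy rain"]))
```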
CFH - Safety Data at the time were (and still are) working with a number of actors from civil aviation: public entities such as the French state regulator (DGAC5), the authority responsible for carrying out safety investigations (BEA6) and the European Aviation Safety Agency (EASA7), as well as private aircraft manufacturers and service providers.
They were developing a text-based document classification solution for incident and accident reports (Hermann et al., 2008) and were seeking to expand their R&D8 effort beyond the classification of accident reports. This project was thus launched in partnership with the NLP group of CLLE-ERSS, with an initial focus on the prediction and identification of human-factors-related issues in free text, with the objective of integrating such functionalities into CFH - Safety Data’s existing commercial solutions intended for safety professionals.
At that time, with the usual enthusiasm associated with the beginning of a thesis, we focused the initial research project around the idea of “weak signal detection”, and our goal was to propose methods for identifying new and unseen risky scenarios in incident and accident report narratives. We started looking at methods for detecting outliers and statistical anomalies. As some of these methods are based on distance (or similarity), we started playing with document-document similarity, and very early on (winter 2010) we proposed a basic application for identifying similarities among incident and accident reports. In order to present these reports, rather than showing a list of documents to the user, the application would make use of an interactive visualisation technique, combining chronological distribution and textual similarity. When we presented the prototype to the clients of CFH - Safety Data, they immediately found it very pertinent to their everyday needs. The prototype became the timePlot system, which we present in Chapter 5; a minimal sketch of the kind of similarity computation involved is given below.
2. http://www.safety-data-analysis.com/
3. http://w3.erss.univ-tlse2.fr/
4. Industrial Agreements for Training Through Research (Conventions Industrielles de Formation par la REcherche)
5. Directorate General for Civil Aviation (Direction Générale de l’Aviation Civile)
6. Bureau for Safety Investigations and Analysis for Civil Aviation (Bureau d’Enquêtes et d’Analyses pour la Sécurité de l’Aviation Civile)
7. European Aviation Safety Agency
8. Research and Development
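As an illustration of the similarity computation behind such a prototype, here is a minimal sketch using TF-IDF vectors and cosine similarity; the reports below are invented placeholders, and the actual timePlot system is considerably more elaborate.

```python
# Sketch of the similarity computation behind a timePlot-like view:
# TF-IDF vectors compared with cosine similarity (reports are invented).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reports = [
    "engine failure shortly after takeoff, returned to departure airport",
    "bird strike during climb, engine vibration, precautionary landing",
    "passenger became disruptive and was restrained by the cabin crew",
]

vectors = TfidfVectorizer().fit_transform(reports)
sim = cosine_similarity(vectors)

# For each report, find its nearest neighbour; a front end could then
# plot the neighbours along a timeline.
for i in range(len(reports)):
    best_score, best_j = max((s, j) for j, s in enumerate(sim[i]) if j != i)
    print(f"report {i} is closest to report {best_j} ({best_score:.2f})")
```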
Document outline
This document is organised as follows:
Chapter 1 introduces the basics of accident prevention. We first discuss the events that need to be studied in order to prevent accidents. Next, we take a look at safety as a broad, multidisciplinary problem, ranging from airplanes to politicians, and at how information about incidents plays a key role in managing it. Finally, through an example, we discuss how this information is used.
Chapter 2 presents incident and accident data and how it circulates through the regulatory framework of civil aviation. We will start by applying a risk management model to the sector and list the different entities involved in the safety process. Next, we will see a representative cross-section of the types of occurrence data, how it is produced and what information it conveys. We will then explain how this data is stored and organised using taxonomies, before showing examples of how this data is used in order to improve the safety of civil aviation and discussing the main problems that arise when manipulating occurrences on a large scale. Finally, we draw up a list of needs that can be addressed by NLP applications.
Chapter 3 presents the domains of Information Retrieval and Text Categorisation and how they answer the needs expressed by the aviation safety community. Each section is organised by first presenting the domain and its key concepts, before discussing the specific implications of their application to occurrence data.
Chapter 4 is divided into two parts. First, we present our solution to the problem of normalising the textual material we encounter in incident and accident reports, in order to transform it into formats suitable for vector space modelling. Next, we discuss the vector space modelling framework, central to many current NLP methods.
Chapter 5 presents the timePlot system for detecting similar occurrence
reports. We present the tool’s graphical interface and show examples of the
results it presents to the users. We then discuss how the tool was really used
and how, by observing the actual use of such a tool, we came to gain further
insight into the needs of the users.
Chapter 6 explores the notion of similarity from several different angles, each addressing a different aspect of this complex notion. We first present a method that learns from documents and their associated metadata attributes and allows the user to filter out one or another facet of similarity. Next, we address the question of multilingual databases and explore the potential of second-order similarity methods to model collections of documents written in different languages. Next, we compare the results of Topic Modelling to the information in ASRS’s metadata and study their overlap. Finally, we present an approach based on active learning, allowing a user to model a certain aspect of an accidental scenario by providing the system with a few initial examples.
Chapter One
1. By similarity, the media seem to think that the outcome in terms of death tolls is more important than the comparability of the events themselves.
2. Source: OAG Aviation & PlaneCrashInfo.com accident database, 20 years of data (1993-2012)
3. From 2003 to 2013, 5,085 people perished in aircraft accidents, while over 8,400 deaths are recorded in the official Bulgarian statistics for traffic accidents.
4. International Civil Aviation Organisation
consequences of the event and at the final stage, cultural readjustment, lessons
are (hopefully) learnt from what just happened.
An accident can therefore be described as a complete instantiation of the process from stage 1 through 7. The magnitude of the accident depends on the mitigation measures taken in order to stop a minor event from escalating. Stopping before the situation gets out of control means breaking the chain reaction between the onset (stage 4) and a new triggering event (stage 3).
The (initial) triggering events, however, can be so insignificant and the immediate mitigation measures so effective that the accident is “stopped in its tracks” and phases 6 and 7 do not occur. Furthermore, as we will see, a properly functioning system by definition does not allow development further than phase 2: the gradual build-up of the very conditions potentially leading to an accident is not allowed.
In any case, accident prevention is largely about monitoring and understanding the system. When an accident occurs, crucial lessons are learned. Those lessons are applied to the system so that the process of failure is interrupted at the earliest possible stage. In civil aviation, official accident investigations (§2.1.2) produce this kind of feedback. However, as the system gets safer, we (fortunately) have fewer and fewer concrete examples of accidents to work with.
In order to improve safety, we therefore have to work with cases where the failure process was initiated but was stopped before becoming a full-blown catastrophe: incidents and abnormal situations. Monitoring such events becomes the focal point of the ongoing effort of improving the safety of an almost perfect system.
In the same spirit, in civil aviation, events which should be reported are
split in three categories. Annex 13 to the Convention on International Civil
Aviation (ICAO, 2001) gives the definitions shown in Figure 1.2.
In ICAO’s definition a continuum is clearly present. An accident is defined as a function of the severity of the occurrence. An incident is defined in opposition to an accident, and a serious incident is defined as an incident that was almost an accident, manifesting the overlap between the two concepts.
Johnson (2003, pp. 17-18) perfectly illustrates the difficulty of defining events ranging from, say, the discovery of an apple on the floor of a jetliner’s cockpit6 to the meltdown of Chernobyl’s reactor core. He gives a meta-definition, listing seven different strategies for attempting to define these events.
It is not in the scope of this thesis to discuss the conflicting definitions of
what an accident or incident is, nor to provide yet another one. So, in order
to skirt the accident/incident dichotomy and the need to discretise what is
clearly a continuum, we will employ the term occurrence and define it in a
6. This particular event comes from the internal incident reporting program of an airline. The danger is that the fruit may block the rudder pedals at an inappropriate moment during the flight.
In a dynamic and constantly evolving world, the different levels are subject to various disruptive forces to which the system must adapt. L1, for example, is influenced by shifts in public opinion, and politicians seek to respond by changing legislation. On the level of individual companies, changes in the market such as competition or shortage of resources call for counteraction. On the level of staff and management (L4, L5), phenomena such as “normalisation of deviance” (Vaughan, 1996) introduce a gradual and continuous shift towards riskier behaviours. Finally, the ever more rapidly evolving technology causes constant changes to level L6.
Understanding each level involves different academic disciplines: political science (L1, L2), law (L1, L2), economics and sociology (L1, L2, L3), organisational psychology and management theories (L3, L4), human-machine interaction and human factors (L5) and various engineering disciplines (L6). Change affects the system as a whole, but radically different frameworks are used to analyse and adapt to change at different levels of the hierarchy. This leads to misalignments that weaken the system and lead to catastrophes.
When changes are made at higher levels, they often disregard the implications that they have on the lower levels of the hierarchy.
on lower levels of the hierarchy, the higher ones must be adequately informed
in order to adapt accordingly.
In order to ensure safe operations within the system, vertical alignment of
the different levels must be maintained. This boils down to ensuring effective
two-way information flow within the hierarchy.
• Systems get more and more complex with time. With scale and advancing technology, the number of “moving parts” within a system, both in a strict sense and metaphorically speaking, increases dramatically. There are millions of parts in a single airplane. In most cases they are produced by hundreds of subcontractors from all over the world. For a single flight to be completed, thousands upon thousands of interactions need to be performed, ranging from the pilot acting upon the throttles, through different air-traffic controllers ensuring a free corridor, up to the airline personnel calculating fuel needs and even booking the hotel for the pilots. Each one of these interactions has the potential to impact safety and all need to be considered. With complexity, the need for empirical data for decision making increases.
• Systems are becoming safer. With time and well-functioning risk management, common and “simple” sources of failure are eliminated. In consequence, today’s accidents are of a far more complex and uncommon nature than those of yesteryear. Understanding and preventing them thus requires a far more detailed knowledge of the underlying processes and of the system as a whole (Amalberti, 2001).
Placed within Rasmussen’s model, the bulk of the work in this thesis addresses issues situated on the ascending flow of information between the front-line operators and the higher levels of the hierarchy. It is this information that safety experts need in order to gain insight into the overall state of operations.
This account taken from Macrae (2007) shows how the nature of the expertise is manifested in the very creation of the relation. Drawing a connection between these events means first isolating the factors that were alike in both incidents (runway overrun, waterlogged runway), factoring out elements such as the ineffective crew communication, and filtering out factors such as the malfunctioning windscreen wiper, which was incidental in one of the events.
In this example, not knowing about the existence of all of the cargo-related events would obviously have prevented the experts from making the connections. The opposite is also true: having to keep track of hundreds or even thousands of parallel occurrences would exceed the capacity of any single human. Furthermore, the final element needed to make the connection was extrinsic information: the identical location of all the occurrences. This example shows the importance of access to well-organised and categorised databases.
In both examples, interpreting the occurrences was based first and foremost on establishing a relation between them and categorising the nature of that relation. In both cases the reasoning was based on thorough expertise in the domain and on information about the occurrences. Investigators had knowledge about these events. In the first example, it seems that the “QF1” occurrence was well known and immediately came to mind. In the second example, investigators probably became “alert” when several similar events were reported over a short period and were looking out for more of the same kind. The effectiveness of these tactics depends mostly on the expertise of investigators and on providing them with just the right amount of information.
This type of reasoning pointed us to the basic need in the industry for
facilitated access to information contained in incident and accident narratives
and ultimately to the prototype presented in chapter 5 and the further re-
search into the subject presented in chapter 6.
“I never saw a wreck and never have been wrecked, nor was I ever
in any predicament that threatened to end in disaster. [...] I can-
not imagine any condition which could cause a ship to founder.
I cannot conceive of any vital disaster happening to this vessel.
Modern shipbuilding has gone beyond that.”
— Cpt. Edward Smith (Captain of the Titanic)1
This chapter is about incident and accident data and how it circulates through
the regulatory framework of civil aviation. We will start by applying a risk
management model to civil aviation and listing the different entities involved
in the safety process. In Section 2.1 we will see a representative cross-section of the types of occurrence data and how it is produced. Next, in Section 2.2, we will explain how this data is stored and organised using taxonomies. We will introduce the concept of metadata and show examples of different solutions. Then, in Section 2.3, we will show how this data is used in order to improve the safety of civil aviation. Finally, in Section 2.4, we will discuss the main problems that arise when manipulating occurrences on a large scale and how NLP solves some of them.
1. New York Times, April 16, 1912
A century of failures
As we saw in the previous chapter, when the system fails, light is shed on its inherent weaknesses and actions are subsequently taken in order to improve its robustness. Civil aviation is no exception: significant accidents are also the most important catalysts for improving safety. Major changes are introduced to the system in the wake of every plane crash, slowly shaping the manufacturing and regulatory landscape as we know it today. Before diving into the details of how data is produced, managed and used, let us take a look at three major early accidents that influenced aviation on the manufacturing and organisational levels.
On the manufacturing level, airplane design has been a continuous process of trial and error. Starting from the Comet crashes in 1954 (RAE, 1954), meticulous investigation of accidents helped reveal the causes of countless technical failures and propose solutions and design improvements that are present in all of today’s aircraft. The de Havilland Comet was the first commercial passenger jetliner, and it was not until two of them exploded in mid-air, instantly killing all aboard, that they were declared unsafe to fly, due to a combination of poor design and manufacturing techniques.
Pressurisation-depressurisation cycles caused fatigue cracks to form at the
corners of the airplanes’ square windows. The cracks grew bigger and bigger
until structural integrity was lost and the aircraft literally popped like a bal-
loon. Today, due to the lessons learned from these accidents, mid-air explosive
decompression due to metal fatigue is a thing of the past.
This particular series of accidents also has the merit of having founded the discipline of accident investigation itself. In the immediate aftermath, the United Kingdom saw its ambitions of becoming a global leader in commercial jet-powered aviation suddenly grind to a halt. Consequently, considerable political will2 was directed at finding the problem. Given that both aircraft had disintegrated at cruising altitude and over the Mediterranean Sea, very little evidence as to what went wrong was readily available. The investigators had to seek help from the Royal Navy in recovering the wreckage from the sea bed (a first) and then work out a theory as to why the aircraft had exploded. The hypothesis gradually narrowed down to metal fatigue and, in order to prove their theory, the investigators conducted a full-scale test by enclosing an aircraft of the same type in a sealed water tank and subjecting it to endless pressurisation cycles until the fuselage lost structural integrity, thus proving the metal-fatigue theory. All aircraft with a pressurised cabin manufactured since have rounded windows.
On the organisational level things are similar. One particular accident,
the Grand Canyon Collision in 1956 (NTSB, 1957) laid the foundations of
2. Rumour has it that Sir Winston Churchill himself was personally involved in the enquiry following the crashes.
commercial aviation as we know it today. That particular accident, consisting of a mid-air collision of two passenger airliners over the Grand Canyon, did not involve any technical failure. The two airplanes were in perfect working order. The causes were to be found in the very way flying was (or rather wasn’t) organised at the time. Once outside of the immediate vicinity of the airport, the responsibility for maintaining separation and avoiding collisions fell solely on the flight crews. They were tasked with communicating their positions to each other and negotiating with one another to ensure that they passed at a safe distance. In case the radio communications failed, the only barriers preventing collisions were the eyes of the pilots on the lookout for conflicting traffic and the relative vastness of the skies. It was only a matter of time before two planes collided over the Grand Canyon. After the collision, public outcry put enough pressure on the government that flight safety became an issue at the very highest political level. A monumental effort was undertaken to ensure that such accidents would not occur in the future in the US, leading, among other things, to the introduction of continuous radar tracking of flights, minimum separation standards, mandatory flight corridors and the creation of the FAA3, the US state regulator.
A major accident even kicked off voluntary incident reporting. The investigation of a crash in 1974, when a passenger jet flew into a mountain, found that the crew had misunderstood instructions from ATC4 (NTSB, 1975). It also revealed that, only six weeks prior to the accident, at the same location, another aircraft had misunderstood the clearance and only narrowly avoided the mountain. The airline had rushed to inform its own flight crews about the danger but, due to the lack of an adequate feedback channel, other airlines had not received any warning. The obvious avoidability of the accident led to an agreement between the FAA and NASA5 in 1976 to create and operate a voluntary, confidential, non-punitive reporting program called ASRS6 (§2.1.4). ASRS is currently considered one of the success stories in voluntary incident reporting and the model is being copied to other industries (Barach and Small, 2000).
Figure 2.1 shows the main types of entities that participate in the system, which we have arranged according to their proximity to the actual physical processes involved with flying airplanes. At the very bottom are the individuals doing the work (the operators): pilots, technicians, ground crews, air traffic controllers, airport staff, etc. Safety-wise, they are tasked with controlling the physical processes that they are responsible for: flying, controlling, maintaining. The arrows represent the flow of feedback information from the level of operations to the higher levels of the system.
Operators are almost all part of a larger entity (the service providers): airlines, airports, ATC, etc. These are responsible for providing and ensuring a safe working environment for their staff by crafting procedures and rules and enforcing them within their respective perimeters.
On the government level, several distinct entities are involved in the safety process. These are the national regulators, who are responsible for crafting the rules that each service provider must comply with, as well as for enforcing these rules. In France the national regulator is the DGAC, in the United States it is the FAA, and in Canada it is the Ministry of Transport (Transport Canada). At the national level we also find the accident investigation authorities, such as the BEA (France), the NTSB7 (USA) and the TSB8 (Canada), as well as various programs designed for information exchange, such as voluntary reporting programs.
Finally, as commercial aviation is not confined within national borders, there are a number of entities that regulate and coordinate the activity on an international level. The most notable are ICAO and, in Europe, the EASA and Eurocontrol, the latter responsible for European ATC.
Generally speaking, the entities closer to the bottom are the ones that mostly produce feedback data, and those closer to the top are the ones that mostly consume it. Also, as data propagates from the bottom up, the higher the entity, the more diverse the data sources and data types it accumulates. The DGAC, for example, collects data from all the service providers as well as from the accident investigation authority and from other regulatory authorities through data exchange programs. Programs such as ASRS effectively bypass the company level and aim specifically at collecting information from the operators at the government level.
However, information also propagates from top to bottom. This is called dissemination. Accident investigation authorities publish their findings and the information is consumed at lower levels; public data sources are maintained in North America by the FAA and the Canadian authorities, which publish large amounts of data; and, of course, everybody can read the specialised press dedicated to incidents and accidents in aviation.
Besides the above-mentioned entities, there are a number of other actors
that produce and consume the data that we are concerned with. These are:
7. National Transportation Safety Board
8. Transportation Safety Board of Canada
Investigations can take anywhere from a few weeks to several years before
all the relevant data is gathered and analysed. The end result is an accident
report and often a change in regulation and/or policy aiming at introducing
barriers to the specific accident scenario being investigated.
9. http://www.ascendworldwide.com/
2.1.2.2 Report examples
An official accident investigation report details all10 the relevant facts about the event, provides information about the investigation itself, presents the findings of the investigation, determines the probable causes of the accident and produces recommendations about how to improve the system.
• Factual part: The facts and circumstances of the accident are presented. Typically this part includes a detailed description of how the event unfolded and is a collection of factual data about the event. It is often supplemented by a more thorough description of those aspects of the event or of the circumstances that are relevant to the particular incident. If, for example, weather was a factor, the weather conditions will be described in detail.
• Analytical part: The part which presents the investigators’ analysis. Based on the gathered facts, the sequence of events that led to the accident is reconstructed and the causes are determined.
Figure 2.2 represents the table of contents of the NTSB’s report on the Asiana flight 214 accident of July 6, 2013 and shows the different sections.
10. Considered relevant by the investigating body
1. Factual Information (p. 1)
   1.1 History of Flight (p. 1)
   1.2 Injuries to Persons (p. 13)
   1.3 Damage to Airplane (p. 13)
   1.4 Other Damage (p. 14)
   1.5 Personnel Information (p. 14)
   1.6 Airplane Information (p. 19)
   1.7 Meteorological Information (p. 30)
   1.8 Aids to Navigation (p. 30)
   1.9 Communications (p. 30)
   1.10 Airport Information (p. 30)
   1.11 Flight Recorders (p. 33)
   1.12 Wreckage and Impact Information (p. 34)
   1.13 Medical and Pathological Information (p. 35)
   1.14 Fire (p. 36)
   1.15 Survival Aspects (p. 37)
   1.16 Tests and Research (p. 56)
   1.17 Organizational and Management Information (p. 61)
   1.18 Additional Information (p. 71)
2. Analysis (p. 77)
   2.1 General (p. 77)
   2.2 Accident Sequence (p. 78)
   2.3 Flight Crew Performance (p. 85)
   2.4 Autoflight System Training and Design (p. 93)
   2.5 Pilot Training (p. 98)
   2.6 Operations Issues (p. 101)
   2.7 Low Energy Alert (p. 104)
   2.8 Survival Aspects (p. 107)
3. Conclusions (p. 126)
   3.1 Findings (p. 126)
   3.2 Probable Cause (p. 129)
4. Recommendations (p. 130)

Figure 2.2: Table of contents of NTSB/AAR-14/01
The different parts have different rhetorical functions. Figure 2.3 shows
excerpts from the beginning of the document. There is an overall summary of
the accident as well as a very detailed chronological narrative of the accidental
sequence. These parts “paint” the overall picture and context and present the
facts as they were collected by the investigation authorities.
On July 6, 2013, about 1128 Pacific daylight time,1 a Boeing 777-200ER,
Korean registration HL7742, operating as Asiana Airlines flight 214, was on
approach to runway 28L when it struck a seawall at San Francisco Interna-
tional Airport (SFO), San Francisco, California. Three of the 291 passengers
were fatally injured; 40 passengers, 8 of the 12 flight attendants, and 1 of
the 4 flight crewmembers received serious injuries. The other 248 passen-
gers, 4 flight attendants, and 3 flight crewmembers received minor injuries
or were not injured. The airplane was destroyed by impact forces and a
postcrash fire. Flight 214 was a regularly scheduled international passenger
flight from Incheon International Airport (ICN), Seoul, Korea, operating un-
der the provisions of 14 Code of Federal Regulations (CFR) Part 129. Visual
meteorological conditions (VMC) prevailed, and an instrument flight rules
(IFR) flight plan was filed.
Figure 2.3: Excerpts from the “Factual Information - History of Flight” (§1.1,
p. 19) section of NTSB/AAR-14/01
For high-profile cases, such as the Asiana crash, the facts can be reported in very minute detail. Figure 2.4 is an excerpt from the “Personnel Information” section, where the morning activities of the pilot flying (PF) are presented. Similar sections exist for all three members of the flight crew.
On Saturday, July 6, the PF woke about 0700 feeling rested. He went jogging,
returned about 0800, and ate breakfast. He took a bus to ICN about 0930,
arrived about 1030, and began preparing for the flight. The official show
time was 1510, but he met his instructor (the PM) about 1440, and they
began briefing for the flight. The PF had a cup of coffee when he arrived at
the airplane.
Figure 2.5: Excerpts from the “Analysis - Flight Crew Performance” (§2.5, p.
86) section of NTSB/AAR-14/01
Figure 2.6 shows a passage from the analytical section where a system is described in detail. The primary function of such passages is to define the objects that are discussed in the report.
The 777 was equipped with a low airspeed alerting system that was first
certified on the 747-400 in 1996 and then certified for the 777-200B in 1997.
This system, developed as a result of a safety-related incident reported by a
customer airline in 1995,101 was designed to alert flight crews of decreasing
airspeed to avoid imminent stalls. The system, which activates when airspeed
decreases 30% into the amber band, was not designed to alert crews that their
airspeed had fallen below Vref during approach. According to Boeing, the
triggering threshold for the low airspeed alert was selected to avoid nuisance
alerts during normal operations and to minimize them during intentional
operations at low airspeeds. Minimizing nuisance alerts is an important
consideration in the design of alerts because too many false alerts can increase
flight crew response times or cause crews to ignore alerts altogether.
Figure 2.6: Description of the “low airspeed alerting system” (§2.7, p. 104) of NTSB/AAR-14/01
Finally, Figure 2.8 shows the recommendations section of the report. This is a manifestation of the overall safety process. After the accident, the investigators invite the regulator (in this case the FAA) to craft new legislation, and invite the different protagonists to reconsider certain aspects of their operations, all with the objective of never repeating the same accident.
All in all, official accident reports are the most comprehensive source of information about a particular occurrence and, taken as a whole, they constitute the repository of all that we have learned, during almost a century of flying, about why airplanes crash.
Data-wise, these documents present major challenges. They are intended for “human consumption” only and are formatted accordingly. Mostly published as PDF files, they are difficult to exploit automatically. Even “simple” tasks such as indexing in a full-text search engine (§2.3.1) require specific pre-processing to gain access to the text in machine-readable form. Furthermore, the quantity of (redundant) information they contain and their internal structuring and formatting make it rather difficult to access the relevant parts.
Searching for reports where crew fatigue was a factor, for example, will be difficult using off-the-shelf search engines. Querying for the term fatigue will return far too many reports where the term “fatigue” itself is present without being relevant to the task at hand.
Working with such reports, Thibert (2014) demonstrated in his master’s thesis that even for (apparently) simple tasks such as the one mentioned above, keyword-based approaches are insufficient and factors such as lexical ambiguity and negation need to be taken into account.
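A toy illustration of the problem (the snippets below are invented): a bare keyword match for fatigue retrieves structural and negated uses of the term alike, and any quick refinement rapidly calls for proper word-sense disambiguation and negation detection.

```python
# Toy illustration (invented snippets): a bare keyword query for
# "fatigue" over-retrieves structural and negated uses of the term.
import re

snippets = [
    "the captain reported acute fatigue after three night flights",
    "inspection revealed metal fatigue cracks in the wing spar",
    "the crew stated that fatigue was not a factor in this event",
]

query = re.compile(r"\bfatigue\b", re.IGNORECASE)
print(sum(bool(query.search(s)) for s in snippets))  # 3 hits, only 1 relevant

# A crude refinement excluding the structural sense and one negation
# pattern; real systems need proper disambiguation and negation handling.
exclude = re.compile(r"metal fatigue|fatigue was not", re.IGNORECASE)
print([s for s in snippets if query.search(s) and not exclude.search(s)])
```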
The internal structuring and formatting of such documents imply that different sections have different rhetorical roles (Teufel and Moens, 2002). Some are explanatory, some are argumentative, some are descriptive. Automatically discerning the role of a section has the potential to vastly improve the performance of information retrieval (§3.1) and text mining systems, by targeting the analysis on those parts of the document that are likely to contain the relevant information, rather than on the whole document. If one is interested in extracting the sequence of events, for example, one would target the (descriptive) “history of flight” section of a report. Exploring this possibility, Campello Rodrigues (2013) showed in his master’s thesis that automatically discerning the rhetorical function of the different parts is a feasible task.
We also used official accident reports as a resource for calculating similarity between documents written in two distinct languages. In this system we leveraged the redundancy of the information present and the fact that, in Canada, accident reports are systematically published both in English and in French, to provide an interlingual layer of processing for language-independent similarity calculation, and achieved encouraging results (§6.3) (Tulechki and Tanguy, 2013).
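The intuition can be sketched as follows, assuming a small parallel corpus as pivot (the documents below are invented placeholders, and the actual method in §6.3 is more elaborate): each document is re-represented by its similarities to the parallel “anchor” documents in its own language; since the k-th anchor is the same report in both languages, the resulting second-order vectors live in a shared space and can be compared across languages.

```python
# Sketch of second-order similarity across languages, using a parallel
# corpus as an interlingual pivot (all documents are tiny placeholders).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# anchors_en[k] and anchors_fr[k] are the same report in both languages.
anchors_en = ["engine failure after takeoff", "runway incursion by a vehicle"]
anchors_fr = ["panne moteur après le décollage", "incursion de piste par un véhicule"]

def second_order(doc, anchors):
    """Represent doc by its similarity to each anchor document."""
    matrix = TfidfVectorizer().fit_transform(anchors + [doc])
    return cosine_similarity(matrix[-1], matrix[:-1])  # shape (1, n_anchors)

en_doc = "the engine failed shortly after takeoff"
fr_doc = "le moteur est tombé en panne peu après le décollage"

# Both documents now live in the same anchor space and can be compared
# directly, even though they are written in different languages.
print(cosine_similarity(second_order(en_doc, anchors_en),
                        second_order(fr_doc, anchors_fr)))
```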
The report in figure 2.9 shows the narratives from a preliminary report in the Canadian CADORS database. CADORS is unique in that it systematically publishes information in both English and French. In this report, we can see how the information is progressively updated: the initial notification was published on April 17 and an update giving some more information was added to the record on April 22.
This collection is interesting in part due to the fact that documents are
systematically published in two languages (as are the reports from the Cana-
dian TSB), thus making them perfect candidates for building parallel corpora
(Véronis, 2000). We used this collection for evaluating the performance of a
system for detecting similarities across languages (§6.3).
15. Civil Aviation Daily Occurrence Reporting System
16. Accident and Incident Data System
[2015-04-17] A 2115828 Ontario Inc. Cessna 172N (C-GZTJ) from Vancouver
/ Boundary Bay, BC (CZBB) to Qualicum Beach, BC (CAT4) experienced
an engine failure while doing a VFR photo survey work over Texada Island.
The aircraft overturned while landing on a field. Two other aircraft working
with C-GZTJ circled overhead awaiting emergency personnel. Pilot was
able to walk to a farmhouse with minor injuries.
The report in figure 2.10 is an accident brief from the FAA’s AIDS database. It is a summary of an accident presented in a concise manner. It is interesting to note the terse writing style and the (deliberate?) choice to use all capital letters, even though this particular occurrence dates from February 2015.
While it might seem trivial, the fact that the text is written in all capital letters might pose a problem for tools that automatically identify sentence boundaries (tokenisers, §4.1.1), as they often rely on capitalisation as a cue to determine where a sentence starts (Kiss and Strunk, 2006).
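A minimal sketch of the issue, assuming NLTK’s Punkt sentence splitter (which implements the Kiss and Strunk approach) is used: all-caps input removes the case cues the model relies on, and lowercasing beforehand is a crude mitigation. The sample sentence is invented and the exact behaviour may vary with the model.

```python
# Sketch: sentence splitting of an all-caps report with NLTK's Punkt
# tokeniser, which normally relies on capitalisation cues; lowercasing
# first is a crude mitigation (behaviour may vary with the model used).
import nltk

nltk.download("punkt", quiet=True)

caps = "ACFT DEPARTED RWY 27. WX WAS BELOW MINIMUMS. CREW ELECTED TO DIVERT."

print(nltk.sent_tokenize(caps))          # boundaries may be missed or spurious
print(nltk.sent_tokenize(caps.lower()))  # often behaves better on such text
```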
CADORS has 200,933 published reports on its website17. The reports are in the form of highly structured HTML pages (see fig. 2.20 for a screenshot). Since April 2014, CADORS also provides a data feed by email, where two XML files (one in English and one in French) are sent on a daily basis.
AIDS data is available on the FAA’s website18, and 98,865 reports are retrievable through the web service.
19. We selected three for illustrative purposes.
Synopsis
Three pilots and three controllers reported an incident where, due to a controller coordination issue, one air carrier started takeoff roll when another was crossing the runway downfield, resulting in an aborted takeoff.
Narrative: 3
There was an unnecessary distraction in the Tower just prior to the event
that could have led to the near collision of the aircraft on the runway. For the
greater part of the afternoon we had in trail restrictions for our departures,
due to weather around our airspace. During this time, the Supervisor we
had in the Tower was letting the system work, and the Tower was quiet and
calm. This Supervisor was relieved by another Front Line Manager (FLM).
The first thing [the new FLM does is] to call Center, and ask for re-routes for
the departures for no reason. Like I said before, the system was working fine;
we weren’t delaying aircraft that could depart. However, in doing this, the
FLM was able to get one aircraft exempt from the 20 mile in trail restriction.
This aircraft had already taxied out, and in the Local East Bay. This drew
the attention of the Local East Controller, along with the Ground East Con-
troller. If their attention wasn’t diverted to this unnecessary coordination,
they would have been scanning better and possibly been able to stop this
event from happening.
Narrative: 5
On our taxi out to Runway XXL we were instructed to hold short of XXL
at Taxiway AAA which we complied with. We were then told to cross XXL
left on D full length, aircraft on XXL will be position and hold. I confirmed
with my First Officer cleared to cross, when we proceeded to cross I noticed
the aircraft was commencing the takeoff roll. I immediately added thrust
to expedite across and questioned Ground on the clearance. He initially did
not respond and then told us to go to Tower. We noticed the CRJ2 had also
aborted the takeoff.
Narrative: 6
We were instructed to line up and wait on XXL. After the preceding aircraft
rotated we were cleared for takeoff. We took a few seconds on the runway
to check for landing traffic and to finish the Takeoff Checklist. After a brief
delay of 4-5 seconds the pilot flying (Captain) pushed the thrust levers for-
ward and I was setting the thrust. Shortly after setting the thrust we both
noticed the CRJ7 at least half a plane length across the hold short line and
continuing to cross in front of us on Taxiway AAA. The Captain called for
the abort and initiated the aborted takeoff. Due to the close proximity of
the crossing aircraft, I applied the brakes as well. Soon after we had the
aircraft stopped, Captain was making an announcement to the passengers
and ATC was communicating with us. After the CRJ7 cleared the runway
we were instructed to turn right. It is hard to say how things could have
been done differently since the CRJ7 was still on Ground Control and we
could not hear ATC clearing them to cross XXL. The day VMC conditions
definitely allowed us to easily spot the crossing aircraft.
Given that ASRS has been capturing data since the 1970s, the form of the reports
has evolved considerably over time. In roughly the first two decades of its existence, the system imposed a particular writing style on the report narratives. Rather than being written in standard English, the reports were keyed in using a semi-controlled and standardised language, making heavy use of abbreviations for common aviation terms, such as ACFT for “aircraft” and WX for “weather”. The reports were also written using only capital letters. Figure 2.13 shows an example of this writing style, along with its “translation”.
Such reports present issues mainly due to their domain-specific terms. A search engine, for example (§3.1), needs to be provided with a list of abbreviations and a specific normalisation layer (§4.1.2) in order to be capable of retrieving documents employing such wording.
FLT (flight) WAS SBND (southbound) ON J-209 AND HAD BEEN CLRED (cleared) TO FL390a BY A PREVIOUS CTLR (controller). OVER SBYb VORc, CLIMBING THRU (through) FL360, TFC (traffic) WAS CALLED BY ZDCd NBOUND (northbound) AT FL370 AND 4 MI (miles) TFC (traffic) WAS OBSERVED AND CENTER THEN HAD US DSND (descend) TO FL350.
a. Flight Level 39,000 feet
b. Salisbury–Ocean City–Wicomico Regional Airport
c. VHF Omni-Directional Radio Range navigation system
d. Washington Air Route Traffic Control Center
Figure 2.13: Narrative of ASRS ASN45677 using the old writing style
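A minimal sketch of such a normalisation layer is given below; the abbreviation lexicon is a tiny invented excerpt, whereas a production resource would cover hundreds of domain terms.

```python
# Sketch of a normalisation layer expanding domain abbreviations before
# indexing or querying. The lexicon is a tiny excerpt for illustration;
# a real resource would contain hundreds of entries.
import re

ABBREVIATIONS = {
    "ACFT": "aircraft",
    "WX": "weather",
    "CLRED": "cleared",
    "TFC": "traffic",
}

def normalise(text):
    def expand(match):
        return ABBREVIATIONS.get(match.group(0).upper(), match.group(0))
    return re.sub(r"\b[A-Z]{2,}\b", expand, text)

print(normalise("TFC WAS CALLED AND ACFT WAS CLRED TO FL350"))
# -> "traffic WAS CALLED AND aircraft WAS cleared TO FL350"
```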
Figure 2.14 shows the narratives of three reports from a large airline company’s
SMS system. For clarity we chose reports in English, but the database we
used contains a mix of both English and French. We can see the uncontrolled
writing style, non-standard use of punctuation and first-person wording.
The wording and (lack of) grammar employed in these reports make all but the most basic language processing very inefficient. The lack or non-standard use of punctuation (such as the use of a semi-colon in place of a period for separating sentences) makes sentence splitting difficult. Token identification can also be tricky, given that terms such as “v/s” contain delimiter characters. The third report is difficult even at the most fundamental level, that of identifying the language in which it is written (vital information for all language processors). In the same report we can also see a (not so uncommon) encoding issue: the apostrophe is replaced by a question mark, probably during data migration.
REFUS DE REPONSE DE L’ATC A LA DEMANDE DE ROULAGE. [ATC’s refusal to respond to the taxi request.]
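A minimal sketch of a crude language-identification heuristic based on stopword counts follows; the stopword lists are tiny illustrative excerpts, a real system would use a trained identifier, and accent-less all-caps text such as the report above remains a hard case.

```python
# Sketch of a crude stopword-based language guesser for short reports.
# Tiny illustrative stopword lists; a real system would use a trained
# identifier, and accent-less all-caps text remains a hard case.
EN_STOPS = {"the", "of", "to", "and", "was", "on", "in", "a"}
FR_STOPS = {"le", "la", "de", "du", "des", "et", "un", "une", "en", "a"}

def guess_language(text):
    tokens = text.lower().split()
    en = sum(token in EN_STOPS for token in tokens)
    fr = sum(token in FR_STOPS for token in tokens)
    return "fr" if fr > en else "en"

print(guess_language("refus de reponse de l'atc a la demande de roulage"))  # fr
print(guess_language("the crew elected to divert to the alternate"))         # en
```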
Reports from service providers’ SMS are protected private data and are not publicly available. The formats differ and are usually custom-built solutions integrated into the service provider’s data management system.
it contains is factual and concise. The reports are collected from a variety21
of sources and are coded and checked internally before publishing.
2.1.6.3 Press
21. The provider does not wish to provide details.
22. http://avherald.com/
23. The interview is available on the website.
Incident: Jetblue A320 near Amarillo on Mar 27th 2012, captain
incapacitated by panic attack
Finally, websites such as The Aviation Safety Network24 also provide useful data about incidents and accidents. Not affiliated with any official organisation, this site is run by a not-for-profit organisation and aviation enthusiasts, who have compiled a comprehensive collection of occurrence reports, constantly updated as new events occur.
Figure 2.17 is a narrative from an ASN report.
24. http://aviation-safety.net/
25. http://avherald.com/
26. http://aviation-safety.net/
27. http://aviation-safety.net/wikibase/
Collection | Type | Occurrence class | Producer | Addressee | Purpose | Published | Edited | Dynamic
ASRS | Voluntary reporting | Incidents | Operator, Institution | Aviation community, Authorities | Report error, Inform about danger | Yes | Corrections, Deidentification | No
CADORS | Regulator | Accidents, Incidents | Institution | Aviation community, General public | Inform | Yes | NA | Yes
TSB, BEA, NTSB | Investigators | Accidents | Institution | Aviation community, Authorities, Specific entities | Inform, Explain, Persuade, Rule | Yes | Proofread | No
Aviation Herald | Press | Accidents, Incidents | Private initiative | General public, Aviation community | Inform, Explain | Yes | Autopublication | Yes
ASN | Press | Accidents | Private initiative | General public | Inform | Yes | Autopublication | Yes
ASN Wikibase | Wiki | Accidents | Community (user generated) | Aviation community | Inform | Yes | Autopublication | Yes
DGAC | Regulator | Accidents, Incidents | Multiple (aggregation) | Regulator, Company | Multiple (aggregation) | No | Raw | Yes
ASCEND | Data provider | Accidents | Institution (commercial) | Businesses, Institutions | Inform | Limited | NA | NA
Internal SMS | SMS | Incidents | Operator | Management | Report error, Inform about danger, Mandatory report, Express opinion | No | Raw | No
The boxplots in figure 2.18 show the distribution of document size across the databases. The whiskers show the extremes. First of all, we can see that, besides the official accident reports of the TSB (1,248 words on average), documents tend to be relatively short: about 100 words on average for the SMS and DGAC collections and about 200 words for ASRS and AvH.
In every collection, there tend to be some much longer documents. Both the DGAC’s and The Aviation Herald’s databases contain reports of over 4,000 words, corresponding to high-profile accident investigations for which a lot of information is generated. In both cases this is not surprising. The
DGAC collects occurrences from a variety of sources and thus works both with very succinct incident reports coming from service providers and with long accident reports coming from the BEA and other investigative authorities through data exchange channels. The Aviation Herald’s coverage depends on the amount of information its staff is able to gather: sometimes they work with accident briefs and sometimes they report throughout the investigation process of a major crash, thus accumulating vast amounts of data.
Reports from the SMS tend to be relatively stable in length, with only a few documents longer than 500 words, and so do ASRS’s. This stability is due to the fact that these databases are fed by a single, uniform process, and the reports are thus comparatively homogeneous.
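For illustration, length statistics of this kind can be computed in a few lines; the collections below are invented placeholders standing in for the actual databases.

```python
# Sketch of per-collection document-length statistics of the kind
# behind figure 2.18 (the collections below are invented placeholders).
import statistics

collections = {
    "SMS": ["short first person report", "another terse report text"],
    "ASRS": ["a somewhat longer narrative with more context " * 30],
}

for name, docs in collections.items():
    lengths = [len(doc.split()) for doc in docs]  # crude word counts
    print(name, "mean:", statistics.mean(lengths), "max:", max(lengths))
```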
We can also distinguish between the collections according to the following
characteristics:
• Writing style: Writing style also varies across reports. Official accident reports (§2.1.2.2) are formal documents, written carefully and proofread before publication. Other cases, such as first-person narratives written (or typed into an iPad) on the fly on a cockpit table (such as the example in fig. 2.14), exhibit non-standard use of punctuation, spelling mistakes and a mix of standard and very technical language. Still others, such as early reports from ASRS (fig. 2.13) and AIDS reports (fig. 2.10), use all capital letters. Early ASRS reports also use a standardised set of aviation abbreviations, a remnant from the times when screen real estate was a scarce commodity.
• Structure: Some documents exhibit weak structure, as they have zones with different functions but no explicit signalling such as headings.
All in all, occurrence data can take many forms and serve various initial functions. One thing is common to all reports: they may carry very valuable information. We will now see the various storage solutions that help organise this data and give access to the information it contains.
Collection | Corpus size (nb docs) | Mean length (nb words) | Informational content | Writing style | Structure | Multimodality | Language
ASRS | 167,000 | 254 | partial | first person, subjective, abbreviations, CAPS | multiple narratives, organised by taxonomy | None | English
CADORS | 200,933 | ≈ 150 | partial | third person, formal | None | None | English, French
TSB, BEA, NTSB | TSB: 1,093; BEA: 2,432; NTSB: - | 1,249 (TSB) | exhaustive | third person, formal | semi-structured documents | images, diagrams, tables, videos | English, French
Aviation Herald | 13,090 | 214 | brief to exhaustive | formal, third person | weakly-structured | hyperlinks, images, diagrams, tables, videos | English
ASN | 19,127 | NA (short accounts) | brief | formal, third person | None | hyperlinks, images, diagrams, tables, videos | English
ASN Wikibase | 162,336 | NA (very short accounts) | brief | formal, third person | None | hyperlinks | English
DGAC | 443,181 | 106 | mixed | mixed | unstructured | some | English, French
SMS | NA | 92 | partial | informal, subjective, first person, use of abbreviations | unstructured | None | English, French
Similar to the example from the previous section, reports from CADORS
also provide metadata. In Figure 2.20 we can see the same sort of factual
information as in the ASN report, but also a set of descriptors of the accident
in the form of occurrence categories and a list of discrete events in the “aircraft
events” part. These are coded by CADORS staff upon analysis of the event
and represent analytical metadata.
[Figure: the full coded metadata of ASRS report ACN 1002555, laid out field by field — Time/Day, Place, Environment, two Aircraft (a CRJ200 on takeoff and a CRJ700 taxiing, both Part 121 air carriers), six Person entries (tower controllers and the two flight crews, with associated Human Factors such as Distraction, Situational Awareness and Communication Breakdown), the Events (a ground conflict detected by ATC automation and the flight crew, resulting in a rejected takeoff, an advisory/alert and a new clearance) and the Assessments (Primary Problem: Human Factors).]
31
European Aviation Safety Agency
32
The FAA distribute occurrences in an outdated ECCAIRS format (e4f) via the ASIAS
website (http://www.asias.faa.gov/).
Most interesting are the analytical branches of the taxonomy used to model
the accident or incident scenario. The Occurrence Category branch provides
a high-level description of the corresponding event. In theory, every event
can be reliably categorised using one or more of the 36 labels. A consistently
labelled database would allow safety experts to examine trends and statistics
based on the labels, as well as to filter incident searches by label. Like the
rest of the ADREP taxonomy, the labels themselves are normalised and are
associated with a set of conditions that describe when they should be used.
Table 2.3 shows the list of possible values.
3.2 Probable Causes The accident was due to the following causes:
• High-speed passage of a tyre over a part lost by an aircraft that had
taken off five minutes earlier and the destruction of the tyre.
• The ripping out of a large piece of tank in a complex process of trans-
mission of the energy produced by the impact of a piece of tyre at
another point on the tank, this transmission associating deformation
of the tank skin and the movement of the fuel, with perhaps the con-
tributory effect of other more minor shocks and/or a hydrodynamic
pressure surge.
• Ignition of the leaking fuel by an electric arc in the landing gear bay or
through contact with the hot parts of the engine with forward propa-
gation of the flame causing a very large fire under the aircraft’s wing
and severe loss of thrust on engine 2 then engine 1.
In addition, the impossibility of retracting the landing gear probably con-
tributed to the retention and stabilisation of the flame throughout the flight.
Figure 2.25: Probable causes section of BEA report on the Concorde crash
Each event from the sequence is a set of assembled attributes. The fourth
event of the sequence, for example, combines the following:
• The Event Type: (“Aircraft wing related event” in blue) comes from
a four level hierarchy listing all possible events that can occur on an
aircraft.
• The Flight Phase: (“during Take-off run” in blue) comes from a separate
list and specifies at which point of the flight the particular event
occurred.
• A cross reference to the Aircraft (“F-BTSC” in blue) specifies the registration
of the aircraft concerned by the event (in this case the Concorde).
• The Descriptive factor (“Wing plates/skins” in red) further specifies
the event by providing the part of the wing that was affected.
• The Explanatory factor (“Aircraft manufacturing design staff” in
black) specifies which elements of the system should be addressed in
order to correct the problem.
The ADREP taxonomy has proven to be very useful when used correctly,
facilitating data exchange and providing a common frame of reference when
speaking about incidents and accidents in aviation (Stephens et al., 2008).
However, most of the time, fine-grained categorisation is simply not available.
In the DGAC database we are working with, only a third of the occurrences
are coded with the occurrence category, and even fewer with more precise
information such as event types, the main branch in ADREP for abstracting
information about the precise sequence of sub-events that occurred.
Acronym Term and detail
ARC Abnormal runway contact - Any landing or takeoff involving abnormal runway or landing surface contact.
BIRD Birdstrike - Occurrences involving collisions / near collisions with bird(s) / wildlife; a collision / near collision with or ingestion of one or several birds.
CFIT Controlled flight into or toward terrain - Inflight collision or near collision with terrain, water, or obstacle without indication of loss of control.
CTOL Collision with obstacle(s) during take-off and landing - Collision with obstacle(s), during take-off or landing whilst airborne.
F-NI Fire/smoke (non-impact) - Fire or smoke in or on the aircraft, in flight or on the ground, which is not the result of impact.
GCOL Ground collision - Collision while taxiing to or from a runway in use.
LOC-I Loss of control - inflight - Loss of aircraft control while, or deviation from intended flightpath, inflight.
MAC Airprox / ACAS alert / loss of separation / (near) midair collisions - Airprox, ACAS alerts, loss of separation, as well as near collisions or collisions between aircraft in flight.
RAMP Ground handling - Occurrences during (or as a result of) ground handling operations.
Primary categories
LOLI Loss of lifting conditions en-route - Landing en-route due to loss of lifting conditions.
UIMC Unintended flight in IMC - Unintended flight in Instrument Meteorological Conditions (IMC).
GTOW Glider towing related events - Premature release, inadvertent release or non-release during towing, entangling with the towing cable, loss of control, or impact into the towing aircraft / winch.
EXTL External load related occurrences - Occurrences during or as a result of external load or external cargo operations.
MED Medical - Occurrences involving illness of persons on board the aircraft.
NAV Navigation error - Occurrences involving the incorrect navigation of aircraft on the ground or in the air.
UNK Unknown or undetermined - Insufficient information exists to categorise the occurrence.
OTHR Other - Any occurrence type that is not covered by any other category.
A flight manager for an air carrier notes that the number and severity of
runway incursions at several major airports his air carrier services appear to
be down over the past several years. He feels that the runway safety training
his and other airlines conduct, and the work of FAA's Office of Runway Safety,
have had a positive impact. Reviewing his airline's training material, he
decides to update it with more recent ASRS Database examples of runway
incursion incidents from the past 4 years.
The flight manager notes that there is a wide spectrum of causal and
contributory issues in this data set. He really wants to focus on incidents
where confusion or misunderstanding played a role, so he modifies his search
strategy.
Step 2: Do not change any of the values for Date of Incident, FAR Part,
Location, or Event Type. Add the following text search terms:
Text = Confus% OR Misunder%
Note: The “%” symbol will find all words where the text begins with what
was entered, i.e., “Confus%” will find “Confusion,” “Confused,” etc. The
“OR” operator will surface records that reference any of these terms. Make
sure both “Narrative” and “Synopsis” are checked.
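As an illustration of what such a wildcard does under the hood (a sketch under our own assumptions, not ASRS's actual implementation), the "%" suffix amounts to a prefix match over the two checked fields; the records and field names below are invented:

```python
import re

# Hypothetical records with the two searchable text fields.
reports = [
    {"narrative": "Tower instructions were misunderstood by the crew.",
     "synopsis": "Runway incursion."},
    {"narrative": "We held short as instructed.",
     "synopsis": "Confusion over taxi clearance."},
]

# "Confus% OR Misunder%" becomes an alternation of prefix patterns.
pattern = re.compile(r"\b(confus|misunder)\w*", re.IGNORECASE)

# A record matches if either checked field contains one of the prefixes.
hits = [r for r in reports
        if pattern.search(r["narrative"]) or pattern.search(r["synopsis"])]
print(len(hits))  # both invented records match
```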
This example shows how metadata and full-text search are combined in
order to answer the information need of a user. Given the complexity of the
query, GUI34 solutions need to be adapted to the query. Figure 2.26 shows the
query-builder on the ASRS website. A query is formulated by first selecting
the entities one is interested in. Then, for each entity, a separate pop-up
window appears in which the user either chooses the value from a list or types
in a string.
In the next chapter (§3.1) we will discuss solutions to exactly this type of
issue from the field of information retrieval.
Figure 2.27 (DGAC, 2013, p. 12) shows that there are fewer fatal accidents,
both per million departures (grey line) and per billion kilometres travelled
(green line), as well as fewer individual fatalities per billion kilometres (black
line).
Detailed breakdowns for different cross sections of the industry are also
provided. Figure 2.28 (DGAC, 2013, p. 30) gives a typology of accidents in
2013 of general aviation aircraft registered in France. The criterion used is
the occurrence category of the ADREP taxonomy (§2.2.3.5). One can see, for
example, that most fatal accidents result from loss of control in flight35 , or
that abnormal runway contact accidents36 , while being among the most frequent
types, did not cause any loss of life.
35
perte de contrôle - en vol
36
contact anormal avec la piste ou le sol
This type of use essentially bridges the previous two. Starting from a
specific incident, the expert will query the collection in order to find others
that resemble it. He will then perform some measure of aggregation, whether
purely statistical (as in counting) or informational, where he will interpret and
summarise his findings. Such a use was already discussed in Section 1.3,
where experts trace a series of related events to a common source.
This type of use relies the most on the human expertise involved, as interpretation
by an expert is an integral part of the process. One should however
seek to provide experts with the tools best suited for the job. This reasoning
motivated us to research and develop the timePlot system for identifying similar
reports, which we present in Chapter 5. Identifying the particular ways
in which individual occurrences relate to one another is a viable way to
help experts discover trends.
Intelligence and monitoring, aided by heavily automated components, will
surely be part of the systems of the future. We cannot but recall here one
of the original (and overambitious, as is often the case) goals of this thesis -
building automatic monitoring systems. Factoring in the dynamic nature of
activities, one can not only look for trends, but also monitor their development
over time, identifying relevant occurrences as they happen.
From here it is a small step to imagine anomaly detection components that
automatically identify trends as they start developing, based on disturbances
of the temporal distribution of events in the information flow. In our case, we
quickly came to realise that such systems will only be possible once we find
ways to deal with the noisiness of the data in order to provide inputs of sufficient
quality. Thus our focus shifted to the preconditions. We nevertheless
continue to consider automatic monitoring systems as a long term objective
and direction of future research.
Fuelling this use of incident data is a general state of affairs worth mentioning.
We have dubbed it the 'collective hindsight bias'. Hindsight bias is "the
inclination, after an event has occurred, to see the event as having been predictable"
(Roese and Vohs, 2012). In the case of high profile accidents there is
a comparable effect on a social level. The mix of emotions focuses attention on
the one hand on the experts responsible for preventing the disaster and on the
other prompts "all available hands" to go sifting through the record to identify a
trend or a signal that should have alerted someone. With enough manpower
someone always finds a signal, a trend that led to the catastrophe. If it holds
at least a little bit of credibility, this is picked up by mainstream media. Only
in the rarest of cases is the trend a genuine type 2 error, but nevertheless
the institutions have no choice but to spend valuable resources explaining or
debunking the hypothesis. The fear of such a situation puts extreme pressure
on safety experts not to miss a pattern in the data.
When incident and accident reports start to pile up, they become notoriously
difficult to exploit without adequate tools. Full-text search engines
are becoming ubiquitous and many off-the-shelf solutions exist, yet as the
example in the previous section (§2.3.1) illustrates, even serious institutions
such as NASA, which maintains the ASRS database, struggle to provide full-text
search capabilities that take into account the inherent variability of natural
language, even at a basic level.
Official accident reports, most often published in PDF format, are sometimes
impossible to query without processing, and some publishers only provide the
most basic keyword search capabilities.
The NTSB website's integrated search, for example, only allows searching
for contiguous strings of words in the text. It does not account for even basic
variation such as plurals, provides no term highlighting, and searches via a
sequential scan of the whole database at each query, considerably slowing
down the process.
Obviously, producing reliable statistics from such reports is also impossible
without first manually classifying them into whatever aggregation criterion one
is looking for. If no metadata is present, in order to produce the example from
the previous section (fig. 2.28), an expert would need to comb through all the
reports and classify them within an occurrence category schema.
Looking for patterns and monitoring the system would likewise require reading
all incoming reports and manually tracking them.
Multilingual databases also pose their own unique set of problems. While English
is the lingua franca of aviation, other languages are sometimes also used.
This is an issue when maintaining large databases of incident reports and seeking
to access their content. Large national airline companies often collect data
in both English and the local language in their internal reporting programmes, as
many of the pilots are not native English speakers or bilingual. Accessing the data
is problematic, and we witness different in-house solutions developed for convenience,
such as translating the titles of the reports so that they are all in the same
language (fig. 2.14).
For the above-mentioned reasons, in practically all cases some classification
is performed. Relying on taxonomies, though, comes with its own set of
problems.
2.4.2 Issues with coded data and taxonomies
As we saw in the previous chapter, accurately describing an occurrence is a
complex and expertise-intensive task. Even the simplest of incident reporting
systems may have tens of fields for factual data and at least one set of high-level
categories. More complex ones, such as ICAO's ADREP taxonomy used
in ECCAIRS, have thousands of fields describing every aspect of the occurrence.
Such systems are designed with scale in mind. Their initial ambition is
that, by using taxonomies, they will constrain every possible occurrence within
a predefined set of possible values with minimum loss of information. "An
uncoded occurrence is a lost occurrence. It is unusable. It is just dead weight
in the database", an expert working with ECCAIRS data sets once told us,
meaning that the quality of coding is paramount to the success of data-based
risk assessments. In reality, however, this is rarely the case, for various reasons.
The underlying utopia is that:
• All incidents will be reported worldwide, on time, and all the information
about the occurrence will be reflected in the report.
• Anyone who is interested will have access to this body of information.
• The data will be stored in a single common format.
• The data will be organised (via a taxonomy) and indexed in such a way
so that a query could be formulated to (fully or partially) describe any
accident or incident scenario, complete with any level of detail regarding
the occurrence’s context.
• Any grouping on any criterion could be performed in order to produce
aggregations and perform quantitative analyses such as statistics and trends.
Basically the utopia boils down to two points: collecting all relevant data
(which is far beyond the scope of this work) and providing powerful means for
accessing the information it contains. In a way the second part is already
being addressed by the ADREP taxonomy, which (while far from perfect) in
37
Extract Transform Load
38
There are cases where narratives are split into different sub-fields, each with a different
discursive function, that need to be collated into a single one. When gathering ASRS reports
(2.1.4), for example, the host solution might not have the structure needed to accommodate
multiple narratives and a separate synopsis.
itself remains a considerable advance with respect to most other industries.
In theory every accident could be coded in ECCAIRS using ADREP and
then be available to be part of the answer to whatever safety-related question
somebody asks. In practice this is not the case. The example of the perfectly
coded Concorde crash we saw in section 2.2.3.5 came from a PowerPoint
presentation of the tool. In reality we have never seen such a well coded
occurrence in all our years of working with such data. The system simply
does not scale without investing enormous resources in consistent manual
coding.
What ADREP really attempts is to normalise this information in a schema
sufficiently abstract to be generalised over a large collection, yet sufficiently
precise to capture most of the intricacies of speaking about flying
airplanes, hence its complexity. However, one tends to forget that most of the
information available in an incident or accident report is also contained in the
text. Just compare the natural language and ADREP versions of the Concorde
crash (figs. 2.24 and 2.25): almost all the information from the ADREP
version is in the probable cause statement.
So what is the status of natural language narratives at present?
For one, efforts at developing and maintaining taxonomies and adopting
coding-based solutions have had the effect of eclipsing solutions aimed at exploiting
the narrative parts. The predominant rhetoric seems to be that (especially
for incidents) natural language accounts are for human consumption
only. They are read at the time of collection and screening, but once the
process is over and the immediate actions taken, the report is archived in a
database. The text only becomes useful if a human reads it once more. Ironically,
though, natural language accounts are the most resilient bit of data in an
accident report as it flows through the system. They are immune to bottleneck
effects, to format changes and to taxonomy incompatibilities, and they are
ubiquitous - 99% of accident and incident reports have narratives.
So the question we ask is: how, by considering report narratives as
an input material, can we provide safety experts with better access
to the information that incident and accident reports convey?
On one hand, if we follow the current trend in the industry, we might
consider that taxonomy based approaches are the only way to go and then
search for methods that help to more efficiently code and maintain the data
using narrative parts as input material for automatic classifiers. We discuss
such work in section 3.2.
On the other hand, we could consider that narratives are all that is needed.
In other words, we could declare that taxonomy-based approaches are a
failure and start researching how to build the most powerful full-text search
engine to replace them.
Balancing between these two extremes, we mostly explored the relationship
between text and metadata in an empirical manner. In the next part we will
show that, by considering natural language as raw input material to various
NLP methods, several of the needs identified with safety experts can be
addressed:
• The first and foremost need we identified with safety experts is to identify
patterns in the data (§1.3, §2.3.3). Our first take on the subject
was to build a system detecting similar occurrences in large incident
databases, which we describe in Chapter 5. Today the system is used in a
large airline (§2.1.5) and at the DGAC. A demonstration version, soon
to become a commercial product, is also available for ASRS (§2.1.4) and
The Aviation Herald (§2.1.6). The system is also used as a full-text
search engine, providing a much simpler and more intuitive solution than
ECCAIRS and the search engines provided by the web services of the
data providers. Furthermore, wishing to explore the taxonomy/text informational
redundancy, we experimented with the DGAC's database,
looking to neutralise aspects of variation already captured by ADREP
Occurrence Categories in order to find reports that are similar for reasons
absent from the original coding. This work is presented in Section 6.2
and is a first step towards devising methods that search for secondary
patterns, hidden by the primary aspects of variation.
• Multilingual databases are an issue for large service providers and for
entities that collect incident information from different sources (§2.4.1).
In Section 6.3 we present a system inspired by Cross Lingual Explicit
Semantic Analysis (Sorg and Cimiano, 2012) for detecting similar reports
written in different languages (English and French). We use the data
provided by the Canadian accident investigation authority (§2.1.2.2) and
evaluate the system on the CADORS database (§2.1.3).
This chapter presents the domains of Information Retrieval and Text Categorisation.
Each section is organised by first presenting the domain and the
key concepts, before discussing the specific implications of their application to
occurrence data.
• The Boolean model: This is the earliest and simplest framework
for IR. Manning et al. (2008) introduce it with a mock example; a
minimal sketch of Boolean retrieval over an inverted index is given
after this list.
• The vector space model: Unlike Boolean retrieval, which is very precise,
vector space modelling (Salton et al., 1975) allows one to quantitatively
represent the relationship between documents and queries and to attribute
a relevance score to the returned results. Thus the results
can be ranked according to how well they resemble the query (and answer
the information need). Vector space modelling is the predominant
paradigm in IR today and will be discussed in detail in the following
chapter (§4.2).
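As a minimal illustration (our own sketch, not the example from Manning et al.), Boolean retrieval reduces to set operations over an inverted index; the documents and query terms below are invented:

```python
from collections import defaultdict

# Toy collection (invented examples).
docs = {
    1: "aircraft overran the wet runway on landing",
    2: "bird strike on departure runway inspected",
    3: "aircraft rejected takeoff after bird strike",
}

# Inverted index: each term maps to the set of documents containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def boolean_and(*terms):
    """A Boolean AND query is an intersection of posting lists."""
    postings = [index[t] for t in terms]
    return set.intersection(*postings) if postings else set()

print(boolean_and("aircraft", "runway"))  # {1}: the only document with both terms
```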
Depending on the specific search task, many other display modes are possible and have
been applied (Kules et al., 2008).
3.1.1.4 IR performance
A perfect IR system will return all the relevant documents to a given query and
only those documents. However, this is rarely the case: there are always some
irrelevant documents in the results (noise), and some relevant documents
are missing from the results (silence). Combining the notions of relevance and
presence in the results, the documents considered during a given search session can be
split into four categories:
• True Positives: Documents that are relevant and are returned by the
system. (expected)
• False Positives: Documents that are not relevant and are returned
by the system. (noise)
• False Negatives: Documents that are relevant and are not returned
by the system. (silence)
• True Negatives: Documents that are not relevant and are not re-
turned by the system. (expected)
From these four categories two standard measures are derived: precision (the
proportion of returned documents that are relevant) and recall (the proportion
of relevant documents that are returned). The higher both values are, the
better the system performs.
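These are the standard definitions (not specific to occurrence data):

\[
\mathrm{precision} = \frac{TP}{TP + FP},
\qquad
\mathrm{recall} = \frac{TP}{TP + FN}.
\]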
We will now use an example to present some of the issues that arise when
searching for information in natural language documents. From
a user's point of view the process looks simple: just type in what you are
looking for. However, the relationship between the query and the document
is not always that simple. Let us walk through a real example of a search
scenario and look at the basic language-related issues.
Imagine the following information need: finding all reports that concern
runway excursions on wet runways1 .
Figures 3.2 and 3.3 are excerpts from documents that describe wet run-
way excursions. Both are relevant (true positives) and satisfy the Boolean
constraints.
There are however reports that do not satisfy the constraint and yet satisfy
the user’s information need. An example is the report in figure 3.4, which is
a false negative and part of the silence.
[. . . ] While turning onto the base leg for runway 20 the stick shaker
briefly activated due to turbulence, the autopilot disconnected, the airplane
pitched up to 12.5 degrees nose up and rolled to a 43.5 degrees left bank
before the crew was able to regain control. [. . . ] When the autopilot
disconnected she put her hands back onto the control wheel and felt the
stick shaker for a moment. She attributed the following attitude excursions
to turbulence.
Figure 3.5: AH report 42cbc93c
Finally the example (false positive) in Figure 3.6 concerns a report that
makes reference to another report. The search terms are situated in the ref-
erence and not in the main text. It illustrates how discourse structure can
have an impact on the relevance of the results.
1
For this example we simplify the information need and omit “wet” from the criteria.
The runway was wet with runway markings hardly recognizeable, re-
surfacing work was in progress. [. . . ] [D]uring roll [ the aircraft ] ran over
a sand bag holding a cable for the temporary runway edge lights causing
damage to a tyre, that did not deflate however. [. . . ] Following another
similiar incident new NOTAMs were released on Mar 11th clarifying the
work in progress and runway modifications, see also Incident: Travel Service
B738 at Lajes on Mar 10th 2011, runway excursion on landing.
Morphologically rich languages such as Finnish, German and Turkish exhibit
a very large number of variants for every given base form.
Not taking into account morphological variation in an IR system hurts
recall as documents containing variants of the query terms are not retrieved.
Morphological variation can be taken into account in two2 ways in IR
systems, one linguistic and one non-linguistic:
• One might employ a lemmatiser and derive a base form for all the
variants, which will then be used to match them. Plural nouns will
be folded to the singular, gender to the masculine and verbs to their
infinitive form.
• One might instead apply a stemmer, which strips affixes according to
surface-level rules and reduces the variants to a common stem.
Both methods have their pros and cons. Stemming is error-prone but robust.
A stemmer can be applied to any surface form. Because it relies only
on surface forms, however, it may incorrectly stem some words: "organisation",
for example, will become "organ", or in French "laser" will become "las"3
(a short illustration follows the footnotes below).
Lemmatisation, on the other hand, is costly, as it needs a resource that lists
all the variants and their common root. Moreover, in order to be correctly
performed, the lemmatiser needs to know the part of speech of every term.
For this reason, lemmatisation is associated with POS-tagging, the process
that derives parts of speech. For occurrence data, with all the domain
vocabulary, one practically has to build such a resource from scratch. Costly
and difficult as they are, POS-tagging and lemmatisation are nevertheless an essential
step if one wants to perform more complex processing such as syntactic
parsing.
2
Actually there are three. One might list and add all the variants of all the terms to
the query. This is known as query expansion but it is terribly inefficient both to perform at
every query and to maintain an index with unnormalised terms.
3
This particular example plagued us in the timePlot system. The stemmer treats the
string “laser" as the infinitive of a French verb from the first group. Given that laser pointers
are a current subject of interest (5.3.3) and LAS is the airport code of Las Vegas, the system
returned a lot of false positives due to this stemming error, for which we had to account by
introducing lists of stemming exceptions.
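The trade-off can be observed directly with off-the-shelf stemmers. The sketch below uses NLTK's Snowball stemmers; exact outputs depend on the stemmer implementation and version, so the comments only indicate the expected kind of behaviour:

```python
from nltk.stem.snowball import SnowballStemmer

en = SnowballStemmer("english")
fr = SnowballStemmer("french")

# Regular variants are folded onto a common stem, which is the desired behaviour.
print([en.stem(w) for w in ["landing", "landings", "landed"]])

# But rules based on surface forms alone can overshoot, as discussed above:
# English "organisation" and French "laser" are stemmed too aggressively.
print(en.stem("organisation"))
print(fr.stem("laser"))
```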
• Spelling errors.
We strongly suggest that for better text search results you use
as many variations of a word as possible including its abbreviation.
For example, if a user is looking for reports that reference the word
"takeoff" in a reports’ body of text, the terms/words "tkof," "take
off," and "take-off," should also be included in the search strategy
in order to obtain the best possible results.
There are two ways to tackle the problem of lexical variation: a symbolic
and a statistical approach.
Sometimes, moreover, the information need is not specified in advance, and
the user is simply looking for the unexpected. This is the case when sifting
through data on occurrences, just looking for "something out of the ordinary".
Thus a usage scenario of a purposefully designed IR system is also to allow
easy browsing through the data, while providing the users with cues to potential
patterns. The timePlot system, which we present in Chapter 5, was
initially aimed at just such a usage scenario.
Iteratively reformulating simple and complex queries based on the initial
results is common. What processing methods and what interface design choices
can assist the users in understanding the data and the results presented?
et al., 2013) have made TC one of the more active fields in contemporary
NLP.
TC has been studied for at least 60 years (Maron, 1961) and there are basically
two ways to attack the problem. The first is the knowledge engineering approach,
where expert knowledge is directly encoded in the system. In other
words, manually produced rules or heuristics are applied to the documents,
and based on these rules the system chooses one category or another. The
other approach is the machine learning approach, where a classifier function
is built inductively from a representative set of already classified examples.
Although useful in some highly controlled environments, the knowledge engineering
method requires considerable resources and expertise. This is why
most of the research today is centred on machine learning methods.
The problem can be formally defined as approximating an assignment func-
tion F : D × C → {0, 1}, where D is the set of documents and C the set of
classes. F (d, c) = 1 if one particular document d belongs to the category c.
The classifier is the approximating function M : D × C → {0, 1}. The clas-
sifier needs to be as close as possible to the assignment function F (Feldman
and Sanger, 2007).
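In practice, the multi-label case is commonly approximated by one binary classifier per category. A minimal sketch with scikit-learn follows; the narratives and labels are invented for illustration and do not come from any of the databases discussed here:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Toy labelled narratives (invented).
texts = ["bird strike on climb out",
         "tyre burst on landing roll",
         "bird ingested in engine on approach",
         "runway excursion after heavy landing"]
labels = [["BIRD"], ["ARC"], ["BIRD"], ["RE"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)  # one binary column per category

# One binary classifier per category c approximates F(d, c).
clf = make_pipeline(TfidfVectorizer(), OneVsRestClassifier(LogisticRegression()))
clf.fit(texts, Y)
print(mlb.inverse_transform(clf.predict(["bird strike during approach"])))
```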
Three aspects of statistical TC are important: the nature of the classifi-
cation problem, the choice of classifier and how to represent documents.
Choosing among classifiers is hard work. Often, when a given system has
acceptable performance, one sticks to it.
The systems commonly used in TC applications include, among others,
decision trees and support vector machines (SVMs).
We will not discuss the different classifiers further. Our personal position
on the subject is that any classifier that does the job well is a good choice.
It is worth mentioning, however, that the models produced by some classifiers
are less opaque than others. In some cases one might be interested in the
exact reasons a given classifier produces a given outcome. Decision trees are
by definition easy to interpret, as the model they build is a hierarchical structure
of binary decisions, each based on a single feature. They are suited to more
exploratory approaches. SVMs, on the other hand, are notoriously difficult to
interpret and are suited to result-oriented approaches.
• Missing metadata is the case when new reports simply do not have coded
values and the system assigns one to them automatically.
The first issue that arises with applying TC to this problem is the defini-
tion of the classification problem. There is only a tiny subset of all meta-data
attributes that are classifiable in a straightforward manner. Rare are those
attributes that are only simple lists of mutually exclusive values. The flight
phase 7 is such an example. More common are multilabel classification prob-
lems such as the ADREP occurrence categories (§2.2.3.5) whose classification
we will discuss in the next section. Hierarchical categories are also common.
Another particularity is that, as metadata is organised in taxonomies,
there is almost always some internal structure and domain-specific logic and
rules in its application. The occurrence categories, for example, are (in theory)
divided into primary and secondary categories. A document must have one
of 15 primary classes and may have one of 21 secondary categories. Thus
the question is more complex than a multi-class TC problem (even though
we treat it as such in the example in the next section). One more aspect of
using complex taxonomies is the fact that there is more than one metadata
attribute to classify and that informational redundancies can be established
between the different branches. An aircraft cannot experience an abnormal
runway contact 8 during Approach. Exploiting these links between different
branches can potentially produce more coherent classifications.
Lastly and most importantly, the objects we classify are not
documents in the strict sense. TC as it is defined is just that - assigning
categories to texts or documents. With occurrence data, however, the object
is not a text, it is a record of an event. Except with official accident reports
(§2.1.2.2), the text only conveys some of the information about the occurrence.
Other information is only present in metadata attributes. Thus mixed approaches
need to be researched in order to combine the two complementary sources of
information.
Data sparsity is another issue. Usually categories are unevenly distributed,
as we will see in the next section.
For a mix of the above-stated reasons, some parts of the metadata are
simply too complex to be approached via TC techniques. With their high
level of structuring, the Events of the ADREP taxonomy, for example, are a
very hard problem for simple TC. Even if we look at them as a TC problem
(that is, that an Event is a label for the occurrence), the sheer number of
values and the sparsity of the data make efficient categorisation highly
improbable. The high level of structuring (§2.2.3.5, Figure 2.24), however, is
coherent with a knowledge modelling approach.
7
The flight phase simply denotes the different stages of a flight: standing, push-back,
taxi, take-off, climb, cruise, approach, landing, taxi, standing.
8
ARC occurrence category
3.2.3.1 Context
The database currently consists of 404,289 occurrence reports from 2004 until
September 2014. Among these, only one third are labelled with at least one
occurrence category. The corpus used in the study thus contains 136,861
documents, which amount to a total of 15 million words.
3.2.3.4 Results
Having a close look at the results, we found obvious inconsistencies in the
original coding. One of the errors we identified was a common confusion
between some of the categories and the OTHR10 category. When looking
through the errors concerning the RAMP 11 category we identified that events
concerning spillage of fuel while refuelling were (correctly) classified by the
tool as RAMP events, while in the training corpus, roughly one out of five12
such events had been attributed the OTHR category.
Table 3.2 shows detailed results of the classifier’s performance for various
categories. It appears that our classifier gets very good results (with a pre-
cision exceeding 90%) for several categories, among which we can find some
that are very frequent. For ATM and BIRD, both relatively frequent cat-
egories, the classifier performs well enough to allow for entirely automated
classification with no human supervision.
Other categories are inherently difficult, even when frequently used. There
are many components in an aircraft and they all may fail. The (non-powerplant)
system/component failure category is a case in point.
9
http://liblinear.bwaldvogel.de/
10
Other - the catch-all category, defined as “Any occurrence not covered under another
category.”
11
Ground Handling - Occurrences during (or as a result of) ground handling operations.
12
Determined by a manual examination of 200 documents.
3.2.3.5 Industrialisation
All in all, these results validated the adoption of the system within Safety Data's
commercialised tools. Given that performance varies across the different
categories, it was decided to adopt a hybrid strategy where certain documents
will be coded in a fully automatic manner while for others the system will
produce suggestions to be validated by an expert.
A “high precision” strategy will be adopted for fully automatic classifica-
tion. Given that each of the 36 binary classifiers produces a probability for
a yes answer (that a given category describes a given document), we can cal-
culate the threshold for which the system achieves a certain level of precision
separately for each category. This level was set to 95%. If the probability for
a given document and a given category is above the threshold, the category is
automatically added. If it is below, the document is marked for manual vali-
dation, with the most probable categories presented in the form of suggestions
to the user. For categories such as BIRD, where the system performs well, it
17
There are several categories dealing with collisions.
18
When reviewing the data, we are convinced that this particular category is largely
under-represented: there are many events that should be coded GCOL and are not.
produces high recall, whereas for others, such as RE, the recall is relatively poor.
In both cases, though, the quality of the assigned codes is satisfactory.
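As an illustration of how such a threshold can be calibrated (a sketch under our own assumptions, not Safety Data's production code), one can sort validation documents by predicted probability and keep the lowest threshold at which precision still meets the target:

```python
import numpy as np

def precision_threshold(probs, truth, target=0.95):
    """Smallest probability threshold whose predictions reach the target
    precision on a validation set (one category at a time)."""
    order = np.argsort(-probs)          # candidates from most to least confident
    tp = np.cumsum(truth[order])        # true positives among the top-k
    k = np.arange(1, len(probs) + 1)
    precision = tp / k                  # precision of predicting the top-k as positive
    ok = np.where(precision >= target)[0]
    if len(ok) == 0:
        return None                     # no threshold reaches the target
    return probs[order][ok[-1]]         # lowest probability still meeting the target

# Toy validation scores for one category (invented values).
probs = np.array([0.97, 0.91, 0.88, 0.75, 0.60, 0.40])
truth = np.array([1,    1,    1,    0,    1,    0])
print(precision_threshold(probs, truth))  # 0.88 on this toy data
```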
This case is the inverse of the high recall strategy for IR (§3.1.3.3). While
users tolerate a lot of noise in an IR context, where they have control over and
visibility of the results, in the classification task what matters is accurately
coded data: users both have a higher tolerance for silence and are willing to
engage in manual validation of the borderline cases.
This chapter is divided into two parts. First, in Section 4.1 we present our
solution to the problem of normalising the textual material we encounter in
incident and accident reports in order to transform it to formats suitable for
vector space modelling. Next, in Sections 4.2 and 4.3 we discuss the vector
space modelling framework and present the notion of dimensionality reduction,
central to many current NLP methods.
4.1.1 Tokenising
The first step of any feature extractor is splitting the text up into individual
tokens - the words (Grefenstette and Tapanainen, 1994). In western languages
such as English this step seems trivial and often a basic whitespace tokeniser
does the job to a very satisfying degree. But even for English a tokeniser
must be able to handle ambiguous punctuation such as hyphens, full stops in
acronyms and other borderline cases. For other languages, however, tokenisation
is much more difficult. Chinese, for example, does not use whitespace to
separate words, so even at this basic level a sophistication such as dictionary
matching of sequences is required.
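A minimal regular-expression tokeniser handling these borderline cases might look as follows (an illustrative sketch, not the tokeniser actually used in the processing chain described below):

```python
import re

# Order matters: acronyms with internal stops, then hyphenated words,
# then plain words or numbers, then any remaining punctuation mark.
TOKEN = re.compile(r"""
    (?:[A-Z]\.){2,}      # acronyms written with full stops, e.g. U.S.
  | \w+(?:-\w+)*         # words, possibly hyphenated, e.g. take-off
  | [^\w\s]              # isolated punctuation
""", re.VERBOSE)

print(TOKEN.findall("The U.S. crew rejected take-off, RWY 27."))
# ['The', 'U.S.', 'crew', 'rejected', 'take-off', ',', 'RWY', '27', '.']
```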
MTOW
MAX TKOF WT
MAX TKOF WEIGHT
MAX TAKE OFF WT
max takeoff weight
max take off weight
Maximum Take Off Weight
Maximum Take-off Weight
maximum take-off weight
MAXIMUM TAKEOFF WEIGHT
Maximum Takeoff Weight
Maximum takeoff weight
maximum takeoff weight
The example in figure 4.1 shows how several levels of variation combine
to produce different surface forms. A feature extractor should be able to
take into account as many such types of variation as possible. We will now
see a processing chain built to account for such cases.
2
We present it here to discuss and exemplify some of the design choices that deal with
the problems presented in the previous section. While we have participated in some of the
said choices we do not by any means claim ownership of this work, which is a joint effort by
the Safety Data team.
The example in figure 4.3 is the synopsis of a report from the ASRS
database. It is written in an abbreviated concise writing style, typical for
ASRS until a few years ago. It contains acronyms, such as NMAC and abbre-
viations such as “RWY” (runway). We will use it to exemplify the different
steps of the process.
Table 4.1 shows the end result - the features extracted by the processing
chain.
We will now see in greater detail the different stages of processing that
produce this output.
First the language of the document is identified; no further analysis at this
level is needed, and information about the language is passed on to the later
stages.
Next, each sentence is tokenised by a regular expression based tokeniser.
The sentences are split up into individual words, essentially corresponding to
the features of types word, stopWord, acronym and punctuation in table 4.1.
At this stage, pre-processing and normalisation rules are applied to known
variants and abbreviations. For example, the strings “CLRNC”, “ACFT” and
“RWY” in the original text are replaced by their expanded variants - “clearance”,
“aircraft” and “runway”.
Regular expression based detection also identifies special types of tokens, such
as dates, URLs or units of measurement, as well as the stopwords. The latter
are kept in the feature list with a dedicated type, as they are useful for later
stages of processing.
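A much-simplified sketch of this expansion and token-typing stage is given below; the expansion table, patterns and stopword list are invented stand-ins for the real resources:

```python
import re

# Illustrative expansion table for known abbreviations (not the real resource).
EXPANSIONS = {"CLRNC": "clearance", "ACFT": "aircraft", "RWY": "runway"}

# Illustrative patterns typing special tokens before further processing.
SPECIAL = [
    ("date", re.compile(r"^\d{4}-\d{2}-\d{2}$")),
    ("unit", re.compile(r"^\d+(ft|kt|nm)$", re.IGNORECASE)),
]
STOPWORDS = {"the", "a", "of", "on"}

def normalise(tokens):
    out = []
    for tok in tokens:
        tok = EXPANSIONS.get(tok.upper(), tok)   # expand known variants
        ttype = "word"
        for name, pat in SPECIAL:                # type special tokens
            if pat.match(tok):
                ttype = name
                break
        if tok.lower() in STOPWORDS:
            ttype = "stopWord"                   # kept, with a dedicated type
        out.append((tok, ttype))
    return out

print(normalise(["ACFT", "on", "RWY", "at", "2011-03-10", "250kt"]))
```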
In a vector space model, the rows of the matrix correspond to the objects
represented as vectors and the columns to a set of features that describe them.
In the simplest possible VSM for representing a collection of documents, the
features will correspond to the words contained in the documents. Each
document will be represented by a bag (or multiset 5 ). For example, for the
document “Every day is a new day” the corresponding bag will be {a, day,
day, every, is, new}.
We can represent the bag with the vector x = ⟨1, 2, 1, 1, 1⟩, where the
first element in the vector is the frequency of a in the bag, the second element
the frequency of day and so on. Thus the collection of documents (or the set
of bags) can be represented as a matrix X where each row xi: corresponds
to a bag, each column x:j to a unique member and each element xij to the
frequency of the j-th element in the i-th bag.
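The running example can be reproduced in a few lines (our own illustration):

```python
from collections import Counter

docs = ["Every day is a new day"]
# Shared vocabulary, sorted alphabetically.
vocab = sorted({w for d in docs for w in d.lower().split()})
bags = [Counter(d.lower().split()) for d in docs]

# Each document becomes a row of term frequencies over the vocabulary.
X = [[bag[w] for w in vocab] for bag in bags]
print(vocab)  # ['a', 'day', 'every', 'is', 'new']
print(X[0])   # [1, 2, 1, 1, 1]
```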
Table 4.2 shows the term matrix constructed after analysing the three
documents in figure 4.4. For convenience we have only represented single-token
features. The term space constructed from these three examples corresponds
to the vocabulary of the (tiny) corpus after removing stopwords. It is
composed of twenty-four unique features; thus the space is said to have
twenty-four dimensions. Each document is represented by a vector in the
space, and its coordinates on the n-th dimension correspond to the frequency
of the n-th feature in the document.
Consider, for example, the terms “NMAC” and “land”, each shared by
two documents. “NMAC”, however, appears in (only) 6761 of the total 339320
texts in the collection, whereas “land” is present in 98609 texts. Their idf
values are then respectively 1.7 and 0.53, reflecting the relative importance
of each term. When considering similarity, the fact that two documents share
“NMAC” will be considered about three times more important than if they
share “land”.
Tf-idf is far from the only weighting function out there. Many exist, such
as PMI (Turney, 2001) or Okapi/BM25 (Spärck Jones et al., 2000), and, like
tf-idf itself, each has countless variants.
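The quoted idf values can be reproduced with a base-10 logarithm (our assumption about the base, which matches the figures above):

```python
import math

N = 339320                          # total number of texts in the collection
df = {"NMAC": 6761, "land": 98609}  # document frequencies quoted above

idf = {term: math.log10(N / n) for term, n in df.items()}
print(round(idf["NMAC"], 2), round(idf["land"], 2))  # 1.7 and 0.54 (0.53 with truncation)
```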
The first two are the most important and are highly related. The process of
compressing the original matrix grounds itself upon the inherent redundancy
of surface forms found in text. When this redundancy is predictable within
a collection there is a high probability that the surface forms are related. If
many texts in the collection contain the words “bird”, “strike”, “collision”
but also “seagull”, “goose” and “falcon”, the algorithm will determine that
these terms can be mapped on a single dimension (instead of 4) without any
major loss of information. This dimension can be then interpreted as related
to bird-strikes, hence the notion of latent meaning.
Since LSA, many other methods of matrix compression have been invented,
most notably Topic Modelling (Blei, 2012), which we have applied and tested
on the ASRS database (see §6.4).
It is interesting to note that comparable results can be achieved without
complex processing of the term matrix as a whole. Random Indexing
(Sahlgren, 2005) is a dimensionality reduction technique that does nothing
more than represent the individual features as vectors that are the sum of
their contexts. By initially assigning a sparse random vector to each feature
and then iteratively scanning the texts and summing the vectors of the features
in its immediate vicinity, this method achieves a comparably abstract
representation of word meaning.
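A bare-bones sketch of Random Indexing as just described (with invented texts and arbitrary parameter values):

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, NNZ, WINDOW = 100, 4, 2   # reduced dimensionality, non-zeros, context window

def index_vector():
    """Sparse ternary random vector: a few +1/-1 entries, zeros elsewhere."""
    v = np.zeros(DIM)
    pos = rng.choice(DIM, size=NNZ, replace=False)
    v[pos] = rng.choice([-1.0, 1.0], size=NNZ)
    return v

texts = [["bird", "strike", "on", "climb"], ["bird", "ingested", "in", "engine"]]
index_vecs, context_vecs = {}, {}
for tokens in texts:
    for w in tokens:
        index_vecs.setdefault(w, index_vector())
        context_vecs.setdefault(w, np.zeros(DIM))

# Each word's context vector is the sum of the index vectors of its neighbours.
for tokens in texts:
    for i, w in enumerate(tokens):
        for j in range(max(0, i - WINDOW), min(len(tokens), i + WINDOW + 1)):
            if j != i:
                context_vecs[w] += index_vecs[tokens[j]]

print(context_vecs["bird"][:10])
```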
A model can thus be computed once, distributed by the method's authors,
and provide “good enough” latent dimensions, so that
an interested party can index even a small collection without having to worry
about first constructing the model. Conversely, the “explicit” nature of ESA
is not mandatory. Claveau (2012), for example, shows that one can use ESA-like
methods in an intrinsic fashion, by constructing a second-order mapping
using the documents from the indexed collection itself.
The second point of interest is the interpretability of the reduced dimensions.
What ESA prides itself upon is that the dimensions of the resulting
reduced space are directly interpretable (hence the E for “explicit”). One can
at a glance determine the reason two texts are considered similar by a system
by looking at the titles of the Wikipedia articles they are associated with.
LSA avoids the question altogether by stressing the “latent” and “hidden”
nature of the resulting dimensions. The rhetoric around Topic Modelling is
more nuanced. It puts forward the interpretability of the sets of associated terms
that form the dimensions. As one can see by looking at the examples in section
6.4, these sets of terms are surely coherent, but nonetheless only provide
a very basic insight into the nature of the resulting dimensions and require
a great deal of interpretive effort to be usable in a real world6 indexing scenario.
Furthermore, in our opinion, where interpretability is concerned, there
is no fundamental difference between Topic Modelling and the other smoothing
methods, such as LSA, where one could extract from the model the n terms
with the highest loading for any given dimension (a sketch follows the footnote
below).
6
If we want to provide the reason for a given similarity score to a user, for example, it
would be quite cumbersome to show the user several columns of related terms.
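For instance, with a recent scikit-learn, extracting the n highest-loading terms per latent dimension of an LSA model takes only a few lines (an illustrative sketch on invented documents):

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["bird strike on climb", "seagull ingested in engine",
        "runway excursion on landing", "aircraft overran wet runway"]

vec = TfidfVectorizer()
X = vec.fit_transform(docs)
svd = TruncatedSVD(n_components=2, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for k, component in enumerate(svd.components_):
    top = component.argsort()[::-1][:3]   # the n terms with the highest loading
    print(k, [terms[i] for i in top])
```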
Chapter Five
One cannot hope thus to equal the speed and flexibility with which
the mind follows an associative trail, but it should be possible to
beat the mind decisively in regard to the permanence and clarity of
the items resurrected from storage.
— Vannevar Bush, As We May Think
In this chapter we present the timePlot system we have built for detecting similar
occurrence reports. Section 5.1 presents the problem and how similarity
between documents is computed. In Sections 5.2 and 5.3 we present the tool's
graphical interface and examples of the results it presents to the users. Finally,
in Sections 5.4 and 5.5 we discuss how the tool was used and how, by
observing the users' interactions with it, we came to gain further insight into
their actual needs.
As we saw in Sections 1.3 and 2.3.3, one of the main challenges, when
working with databases of occurrence reports, is the identification of recurrent
risks. An obvious manifestation of such risks are multiple distinct events with
almost identical or very similar circumstances. When these events are subject
to incident or accident reports, it is possible to identify them by comparing
the individual entries in a given database and representing their resemblance
by computing a similarity score for each pair of entries.
In a way, most events recur all the time, and the mere fact that an event
recurs does not automatically make it of interest. However, if a certain type
of event starts recurring more than usual, it might indicate a pattern. For this
reason the tool we present combines the notion of similarity with the chronological
distribution of similar events - the “time” in timePlot 1 .
We will now present how we apply the existing methods of calculating
textual similarity to the specific task at hand, and what the particularities of
the data and the way it is used can teach us about the different methods we
tested and envisioned. Given the heterogeneous and unstable nature of the
coded data (§2.2.2, §2.4), the initial focus was to build a “similarity analysis”
system using only the narrative data as a source. Such a system has the benefit
of being robust and uninfluenced by the many issues and biases of the coded data.
Also, by explicitly considering only the textual parts of the reports, the scope of
such an analysis is extended to databases with little or no coding and without a
clearly defined taxonomy. Such is the case of “young", undefined, or constantly
evolving reporting architectures (§2.4.2.3). Loss of coded data may also occur
due to bottleneck effects when data is exchanged between institutions using
different standards and formats (§2.4.2.4).
As we saw in Section 4.2, geometrically representing text is one of the fun-
damental methods in modern NLP, bridging the gap between the inherently
symbolic nature of human language and the numerical objects that machines
manipulate with ease. Both robust and simple to conceive and maintain,
vector-space modelling was the method we chose for building the system.
The basic idea was to exploit the narrative parts of incident and accident
reports and, by computing a similarity score between each pair of documents
in the collections to generate a layer of structure that is presented to the user
in the form of an interactive visualisation.
From an end-user's perspective, we intended to develop a system requiring
a minimum of initial configuration while allowing interactive browsing of a database
of incidents. The basic idea was to stimulate the expert's serendipity by making
explicit the sets of occurrences that might indicate a pattern. For this, an interactive
visualisation replaced the more traditional list of results we are accustomed to
finding. This echoes the unspecified information need we described in Section
1
We kept the name of the very first prototype. “timePlot.pl” was the file name of the
perl script which printed an outrageous html file with data for the similarities pre-loaded in
javascript variables.
3.1.3.2.
From a system's perspective, we willingly kept things simple by focusing
only on the texts of the report narratives. This allowed the system both to be immune
to the metadata-related issues discussed in Section 2.4.2 and to serve as
a basis for exploring how the narratives relate to one another, effectively
making it a stepping stone towards the more complex methods discussed
in Chapter 6. In hindsight this choice proved a wise one, as the system
was rapidly proposed as a service by Safety Data, and its simplicity allowed
it to scale to databases of close to half a million documents as well as to be
deployed for clients from fields other than aviation, with little or no metadata.
In the next sections we will first discuss how we compute similarity between
documents, next we will present the user interface in detail and we will show
several incident scenarios with different chronological distributions identified
by the system. Last, we will examine how the system was deployed at various
institutions and the lessons learned from examining how it was really used.
The indexing process produces a matrix containing a similarity score for each
pair of documents. This matrix was then pruned for computational efficiency,
discarding all the scores below a fixed threshold (0.10) and thus removing more
than 90% of the similarity scores, before manually loading the results into the
system for visualisation and browsing.
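The overall shape of this computation can be sketched as follows (an illustration on invented texts, not the production pipeline, which uses the feature extraction chain of Chapter 4):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["bird strike on take-off", "bird strike during climb",
        "tyre burst on landing", "laser illumination on final"]

X = TfidfVectorizer().fit_transform(docs)
sim = cosine_similarity(X)                 # similarity score for each pair
np.fill_diagonal(sim, 0.0)                 # a report is not "similar to itself"

# Prune: keep only pairs scoring at or above the fixed threshold (0.10).
keep = np.argwhere(sim >= 0.10)
for i, j in keep[keep[:, 0] < keep[:, 1]]: # report each unordered pair once
    print(i, j, round(sim[i, j], 2))
```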
Figure 5.1 shows the search engine tab for selecting a source report. Visible
in the upper half of the screen is the query builder that lists the criteria
available for choosing the source report. They correspond to keyword queries
in the narrative fields (title and text) and to different3 metadata attributes.
The criteria are joined by a logical AND. A document must satisfy all the
criteria in order to be shown in the result list. The list itself is visible on the
bottom half and is simply a table showing the title of the report, the date and
several metadata attributes. In the example shown in Figure 5.1 the user is
searching for documents containing “volcan”4 in their title.
Figure 5.2: timePlot GUI: source report selection via direct input
Figure 5.2 shows the input field where the user can paste the text of an
existing report and find similar reports. This action essentially bypasses the
selection interface (fig. 5.1) and shows the similarity page with the user’s text
as the source report. Initially this feature was intended to circumvent the
slow update cycle: the users wanted to be able to search for similar reports
using, as source reports, data which had not yet been imported into the system.
However, as we will see in the next section (§5.5), this feature was also used
as a classical full-text search engine, with several query terms rather than full
report narratives.
After the source report has been chosen or entered by the user, the system
identifies similar reports and presents them, alongside the source report, in
the form of an interactive scatter plot (Figure 5.3). On top is the source
report. Underneath is the scatter plot. Time is represented on the X-axis
and similarity to the source report on the Y-axis. Each point on the plot
represents a similar report.
[3] The attributes that can be used are chosen by the client.
[4] “volcano”
Hovering [5] over a point in the scatter plot displays the corresponding report’s
title and underlines in yellow the words it has in common with the source
report. This feature allows the user to quickly understand in what way the
two reports are similar, and was much appreciated by the users.
Clicking on the point opens a pop-up dialogue with the report in question
and the common terms underlined in yellow (Figure 5.4). In the pop-up
dialogue, a “Plot” [6] button allows the user to make the report in question the
new source report, effectively allowing navigation between reports in an
exploratory manner.
[5] Passing the mouse pointer over it without clicking.
[6] We did not give much consideration to the naming of this button, but for the users
at the DGAC it became synonymous with the action of “displaying a document in the
timePlot tool”. Thus a new French verb, “plotter”, was created and is currently used by the
people using the tool at that agency.
Under the scatter plot, a trend-line (visible in figs. 5.5 and 5.6) represents
the variation in frequency of the similar reports over time. Together with the
overall distribution of the points on the plot, it provides the user with
information about the behaviour of a given risk over time.
To compute the trend-line, we divide the overall temporal range into a fixed [7]
number of periods. The score for each period corresponds to the sum of the
similarity scores of the documents falling in that period. The values are then
normalised to their z-scores and represented as a smooth line chart.
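As a rough illustration, the following Python sketch computes such a trend-line (the names, the use of NumPy and the default number of periods are our assumptions, not the production code):

    import numpy as np

    def trend_line(timestamps, scores, n_periods=50):
        # Divide the overall temporal range into a fixed number of periods,
        # sum the similarity scores of the documents falling in each period,
        # and normalise the per-period totals to z-scores.
        timestamps = np.asarray(timestamps, dtype=float)
        scores = np.asarray(scores, dtype=float)
        edges = np.linspace(timestamps.min(), timestamps.max(), n_periods + 1)
        bins = np.clip(np.digitize(timestamps, edges) - 1, 0, n_periods - 1)
        totals = np.bincount(bins, weights=scores, minlength=n_periods)
        return (totals - totals.mean()) / totals.std()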
We will now see three examples of different chronological distributions of
risky scenarios, showing the usefulness of the temporal dimension for organ-
ising the results.
In figure 5.5, the source report concerns a bird strike: the aircraft collided
with a bird on take-off. This is a very common type of occurrence. However,
when we examine its temporal distribution, we can clearly identify a pattern:
most of the occurrences are concentrated in the warm months of the year (in
Europe). This example of seasonality is not surprising, as birds naturally tend
to be less active in winter.
Figure 5.6 also illustrates the built-in transparency of the system. When
hovering over a point on the plot, the system dynamically highlights the words
that the reports share. Likewise, when a similar report is opened in a pop-up
dialogue (Figure 5.4), shared words are highlighted in both the source and the
similar report. Besides providing an intuitive way for the user to determine
whether the report is of interest, this feature also provides information about
the reasons behind the computed similarity score.
Figure 5.7 shows the tool with a single term, “souffle” [9], used as a “source
report”. This is a real query submitted by a user and illustrates the misuse of
the tool we discuss in the next section. The results present no temporal
pattern, but we can see two vertical clusters. The higher one corresponds to
documents that have the term “souffle” in their title sections (as these are given
more weight by the system); the points lower on the graph are documents
that have the term only in the body of the document. The vertical pattern is
an artefact of the rounding of the similarity score performed by Lucene.
Even if the tool was not intended for this kind of use, we can see how
the scatter plot visualisation gives a much more concise view of the results
and allows more documents to be returned (currently the limit is set to 3,000
documents), making much higher recall possible. This is appreciated by the
users, who have shared with us that they prefer noisy results to ones with a
lot of silence. In other words, having to filter through false results is preferred
to not finding true ones.
[9] Jet blast
narratives, timePlot provides ways to quickly and easily find relevant infor-
mation.
The DGAC are also starting an occurrence data sharing programme supported
by the tool. The service providers (airports and companies) willing
to share part of their incident data will get free access to the tool, with all
the data that other operators participating in the programme have shared.
Currently there are 167 active users at the DGAC, and 560 queries are
performed monthly.
One interesting scenario concerns the airline’s testing of the tool. As part
of the test we had provided the tool loaded with a database of publicly available
incident reports. One of the questions the safety officers were interested
in concerned events that occurred at some of their diversion [10] airports.
For one particular airport in central Russia, the tool shed light on a larger
than normal concentration of runway overruns: cases where the landing aircraft
did not manage to stop in time. The problem was related to improper
drainage of the runway surface, and according to these findings the company
updated its procedures for emergency landings there.
In another case the experts were asked to investigate a series of specific
incidents. The identification of similar incidents over an extended time period
allowed them to determine that the original cluster was “a statistical accident”
and not a developing trend, thus avoiding the (very costly) creation of a special
investigative task force.
Rather than pasting whole narratives, some users started using the tool more
like a full-text search engine: the user would type in several search terms and
then explore the results on the chronological scatter plot. This confirmed the
need for such tools within the industry, where current solutions like ECCAIRS
favour metadata-based exploration and neglect the textual information contained
in the narratives.
A search query such as “approche non conforme ANC” [12] essentially takes
advantage of the indexing done for calculating similarity between documents
and of the dynamic highlighting of the input terms (fig. 5.6), and allows the
user to quickly scan the collection for documents containing any or all of
the terms. The fact that the user entered both the expanded form and the
acronym (ANC) clearly shows that he is aware that the data is noisy
and that the reports he is interested in may contain either of the variants.
A similar tactic was observed in the logs of the version deployed at the
airline, where queries like “souffle jet blast” would be formulated to search the
narratives of both the reports in English and those in French. This led
us to investigate methods for cross-lingual support, which we present
in Section 6.3.
A use case related to the regulation of mobile phone use on airplanes
exemplifies this trend. The regulation had recently changed, leading
the company to consider allowing their use in the cockpit by the pilots. Using
the tool, they searched for reports about possible interference (essentially by
entering related keywords such as “interference”, “cockpit” and the names
of different systems), and found one case where the mobile phone of a passenger
seated in one of the front rows interfered with crucial instruments. Based on
this, it was decided to maintain the ban in the company’s standard operating
procedures.
[11] While technically a misuse of the tool with regard to its initial purpose, we have to
mention that this is actually the intended purpose of the Lucene search engine, which we
ourselves had “hijacked” to allow the tool to scale up to the quantities of data it now processes.
[12] Unstabilised approach
Data clues to the next generation of tools aimed at the exploration of large
databases of incident and accident reports.
This chapter explores the notion of similarity from several different angles. It
is a collection of four independent approaches, each addressing a different
aspect of this complex notion. In Section 6.2 we present a method that learns
from documents and their associated metadata attributes and allows the user
to filter out one or another aspect of similarity. Next, in Section 6.3, we address
the question of multilingual databases and explore the potential of second-order
similarity methods to provide coherent representations of collections containing
documents written in different languages. We compare the results of Topic
Modelling to the information in ASRS’s metadata in Section 6.4. Finally, in
Section 6.5 we present an approach based on active learning, allowing a user
to model a certain aspect of an accident scenario by providing the system
with a few examples.
[1] “Because it can produce a few notes, tho they are very flat; and it is nevar put with
the wrong end in front!”
1. “Bird-strike on takeoff”
2. “Turbulence on takeoff”
3. “Bird-strike on landing”
If we are interested in the first text and want to identify similar occur-
rences, the system will identify both documents 2 and 3 as similar, with a
score of 0.5, as they both share one token with the first document.
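One way to reproduce this score is a cosine over binary bags of words, with “on” treated as a stop word (a toy sketch for the example only; the actual system relies on Lucene’s weighted scoring):

    import math

    STOPWORDS = {"on"}  # toy stop-word list for this example

    def binary_cosine(a: str, b: str) -> float:
        # Cosine similarity over binary bags of words, stop words removed.
        ta = {w for w in a.lower().split() if w not in STOPWORDS}
        tb = {w for w in b.lower().split() if w not in STOPWORDS}
        return len(ta & tb) / math.sqrt(len(ta) * len(tb))

    print(binary_cosine("Bird-strike on takeoff", "Turbulence on takeoff"))   # 0.5
    print(binary_cosine("Bird-strike on takeoff", "Bird-strike on landing"))  # 0.5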
From an expert’s point of view, however, these similarities are quite different.
He would instantly differentiate between the pair 1 and 3, where a
similar event occurred, and the pair 1 and 2, where completely different events
occurred in similar circumstances. Similarity is, in a way, faceted. While the
expert might accept [2] such behaviour from the system, in a real-world analysis
scenario it is helpful to be able to filter out the facets.
Besides, these facets of similarity are already reflected in the coded data. In
ICAO’s ADREP taxonomy (§2.2.3.5), for example, separate branches concern
the flight phase and the occurrence category. The flight phase indicates
at which moment [3] the event occurred. The occurrence category is a list of
36 values, classifying events on a macro level. It happens that both bird-strikes
and turbulence encounters are sufficiently frequent to have dedicated
occurrence categories. The three documents in the example will then be coded
as follows:
based on a threshold. We then calculate and compare the similarities in the
filtered and unfiltered matrices.
Table 6.2 shows average overlap (AO) for both unfiltered similarity and
filtered similarity. We also calculated a mean disturbance rate (DR) represent-
ing, on average, the number of new documents in the top 30 similar documents
when a filter is applied.
                          AO FlPh   AO OccCat   DR
    Unfiltered            75%       89%         -
    Filtered for FlPh     64%       84%         9.8
    Filtered for OccCat   73%       69%         13.6

Table 6.2: Filtered and unfiltered mean overlap between textual similarity and
coded data
We can see that applying a filter for a given facet reduces the number of
similar documents which share that facet with the source document. For an
incident report concerning bird-strikes on take-off, on average 89% of the 30
most similar reports will be about bird-strikes, and 75% of the top 30 will
concern events that occurred at take-off. When we filter on the occurrence
category, the average proportion of similar reports concerning bird-strikes
drops to 69%, and on average 13.6 new documents appear in the top 30.
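For clarity, here is a small Python sketch of how these two measures can be computed (the data structures and names are illustrative assumptions, not the evaluation code we actually ran):

    def average_overlap(neighbours, metadata, facet, k=30):
        # Fraction of each document's top-k similar reports that share its
        # facet value (e.g. flight phase), averaged over all documents.
        ratios = [
            sum(metadata[n][facet] == metadata[d][facet]
                for n in neighbours[d][:k]) / k
            for d in neighbours
        ]
        return sum(ratios) / len(ratios)

    def disturbance_rate(unfiltered, filtered, k=30):
        # Mean number of new documents entering the top k once a filter
        # is applied.
        new_docs = [len(set(filtered[d][:k]) - set(unfiltered[d][:k]))
                    for d in unfiltered]
        return sum(new_docs) / len(new_docs)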
Let us look in more detail at what this disturbance contributes from a
qualitative perspective, and how such a system has the potential to identify
minor, secondary facets of similarity. In our test corpus we identified the
following document:
However, when filters on both flight phase and occurrence category are
applied, documents that share only these facets of similarity will naturally
appear further down the list of similar documents, and shared secondary facets,
such as the “double input” event, will be emphasised and contribute more to
the similarity score.
indexed collection. In our case such a resource simply does not exist. Fortunately,
as Claveau (2012) demonstrates, second-order similarity can achieve
better results than traditional first-order methods even when the corpus of
pivots is a randomly assembled collection of texts, without any structure. To
our knowledge, the question of how to construct the pivot corpus in the
context of a collection of domain-specific documents has not been explored.
With respect to the above-mentioned considerations, and heavily constrained
by practical issues such as availability, we chose a solution halfway
between the (presumed) corpus of concepts of the original ESA implementation
and an unstructured collection of texts.
We constructed the pivot corpus using official accident reports (§2.1.2.2)
issued by the investigation authority of Canada, the TSB. Given that, in
Canada, both French and English are official languages, accident reports are
systematically published in both. They are generally long documents and
have identifiable parts, each having a different discursive function. As a re-
minder, the beginning will usually consist of a narrative of the event. Later
on, the analytical parts will “zoom in” and provide descriptions of the exact
mode of failure of a given (human or mechanical) subsystem. It follows that,
taken as a whole, accident reports cannot be considered representative of
concepts. However, when broken up into smaller sections, each section
represents a rather concise theme. The following paragraph from such a report,
for example, has high internal coherence, explaining a particular aspect of the
behaviour of helicopters:
Pushing the cyclic forward following a pull-up or rapid climb, or even from
level flight, produces a low-G (weightless) flight condition. If the helicopter is
still pitching forward when the pilot applies aft cyclic to reload the rotor, the
rotor disc may tilt aft relative to the fuselage before it is reloaded. The main
rotor torque reaction will then combine with tail rotor thrust to produce a
powerful right rolling moment on the fuselage. With no lift from the rotor,
there is no lateral control to stop the rapid right roll and mast bumping
can occur. Severe in-flight mast bumping usually results in main rotor shaft
separation and/or rotor blade contact with the fuselage.
The same processing chain is applied once to each of the pivots so that they
are compatible with the indexed documents.
A first-order vector space representation is constructed and weighting (§4.2.2)
is applied using the PPMI [11] method (Turney and Pantel, 2010). Then a similarity
score is calculated between each of the documents and each of the pivots
of the corresponding language in order to construct the second-order vectors.
We then calculate a cosine similarity, using the same method as in the
original timePlot implementation (§5.1).
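The core of the second-order construction can be sketched as follows in Python; we substitute scikit-learn’s TF-IDF weighting for the PPMI weighting described above, so this is an approximation of the pipeline rather than the implementation itself:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def second_order_vectors(docs, pivots):
        # Represent each document by its similarity to a set of pivot texts:
        # one dimension per pivot.
        vec = TfidfVectorizer()
        pivot_m = vec.fit_transform(pivots)   # first-order vectors of the pivots
        doc_m = vec.transform(docs)           # documents in the same term space
        return cosine_similarity(doc_m, pivot_m)

    # Since the pivots are aligned across languages, the resulting vectors are
    # directly comparable:
    # sim = cosine_similarity(second_order_vectors(docs_fr, pivots_fr),
    #                         second_order_vectors(docs_en, pivots_en))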
            FR     EN
    R@1     0.43   0.45
    R@10    0.71   0.74
    R@100   0.90   0.94

Table 6.4: Mate retrieval results
As we can see, the results are encouraging. In more than 40% of the
cases, the translated document was the most similar document returned by
the system and in more than 70% of the cases it was within the 10 most
similar documents. For comparison, Sorg and Cimiano (2012) report R@10
scores between 0.27 and 0.52.
6.3.4 Discussion
This experiment demonstrated that the ESA family of methods is applicable
in the context of multilingual databases of incident reports.
The experiments did however reveal several areas where further research has
the potential to improve the results. The availability of an adequate multilingual
resource for the pivots is crucial. While we achieved acceptable results
by simply taking paragraphs from accident reports as pivots, our intuition
tells us that the explicit character of the method merits further investigation.
Gabrilovich and Markovitch (2007) emphasise the importance of having
the “right” concepts and, accordingly, part of their work goes into investigating
the “right” way to concatenate Wikipedia articles based on different levels of
grouping in the categorisation hierarchy of the online encyclopedia. In our
case we can ask ourselves how to smooth the pivot corpus in order to get a
more “natural” set of pivot documents. This can be done in multiple ways.
One would be to provide more advanced methods of “cutting up” the documents
into conceptually coherent parts. The work we did with Campello Rodrigues
(2013), aimed at zoning accident reports into sections with different
rhetorical functions, provides an interesting starting point. By isolating
relevant parts of these documents based on their overall rhetorical structure,
only those zones having well-defined and context-independent informational
content (such as the purely descriptive parts) would be retained, providing a
less noisy corpus of pivots.
The explicit character of the method also has the advantage of being
interpretable. Given that we can identify which of the pivots are contributing
to a given similarity score, one can then extrapolate (for example by using
standard document classification techniques (§3.2) and values from the coded
data) which aspects of an incident are captured by a given cluster of similar
documents, effectively addressing the same concerns discussed in the previous
section.
The aforementioned considerations are valid for both intralingual and
interlingual ESA-like methods. For the interlingual part, the question of the
availability of aligned resources is a central one. While we were “lucky” that
the English-French pair is represented by the Canadian documents, aligned
technical documents for other language pairs are not easy to come by. Currently
one of the needs expressed by the aviation safety community in Europe
(and carried by the European Commission) is putting order in the centralised
incident repositories, where incidents are reported in all of the European
languages. For an ESA-like method to be applicable, all the pivots need to be
translations of the same (domain-specific) texts in all official languages of the
EU.
Table 6.5: The first 5 topics extracted from the ASRS corpus
[12] The hyper-parameters were left at their default values: α = 1/T, β = 1/T, 50 passes.
[13] The topics’ order is insignificant, as it is an artefact of the randomisation at the
beginning of the modelling process.
A safety expert was presented with the 15 most contributing words for each of
the 50 topics, and was asked to describe in a few words what each of these
topics could mean. His feedback is presented in the “Expert” column of table 6.5.
For 43 topics out of 50, the expert was able to identify a theme or
a small set of themes that could be expressed by the words with the highest
probability values. Although some of the words may seem opaque to a layman,
most of them are in fact quite transparent. The contributing words for topic 4,
for example, comprise the overall category (WX is the standard acronym
for weather), various meteorological phenomena (ice/icing, rain, thunderstorm
(TSTM)), common modifiers (light, moderate, severe) and consequences (turbulence
(TURB)); all this makes it an easily interpretable topic. This is not the
case for topic 5, where no coherence could be found, as the most contributing
words are scattered across several aspects of flying an airplane.
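For reference, a model with these settings can be trained with the gensim library; a minimal sketch under the stated settings, assuming `texts` holds the tokenised report narratives:

    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    dictionary = Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]

    lda = LdaModel(corpus=corpus, id2word=dictionary,
                   num_topics=50,       # T = 50 topics
                   passes=50,           # 50 passes over the corpus
                   alpha='symmetric',   # default symmetric prior, i.e. 1/T
                   random_state=0)

    # The 15 most contributing words of a topic, as presented to the expert
    print(lda.show_topic(4, topn=15))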
The document×topic matrix provides another means of interpreting the
topics: each document is represented by a vector of weights across the 50
topics. This means that each topic can be viewed as a distribution over the
documents, and as such can be compared to the documents’ metadata. We
thus computed Pearson’s correlation coefficient between each topic and each
metadata value across the documents (counting 1 if the document’s metadata
contain this value, and 0 otherwise). This gave us a different, more
objective angle from which to interpret each topic, as we could identify which
metadata value was most strongly associated with each topic. These values are
indicated in the “Metadata” column of table 6.5, along with the correlation
coefficient’s score [14].
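Computing these correlations is straightforward; a minimal NumPy sketch, with illustrative names and assuming the document×topic matrix is a dense array:

    import numpy as np

    def topic_metadata_correlations(doc_topic, has_value):
        # Pearson correlation between each topic (a column of the
        # document x topic matrix) and a 0/1 vector marking the documents
        # whose metadata contain a given value.
        x = np.asarray(has_value, dtype=float)
        return np.array([np.corrcoef(doc_topic[:, t], x)[0, 1]
                         for t in range(doc_topic.shape[1])])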
First, we can see that for some topics (numbers 1, 3 and 4 in our selection)
one or two highly correlated values (> 0.4) can be identified, and that these
confirm the expert’s interpretation. Other attributes can appear as secondary
correlates, such as flight phase and reporting person, but it nevertheless appears
that such topics have captured a well-known aspect of incident reports.
This is the case for 38 of the 50 topics. It has to be noted that any aspect
of a report can be “captured” by a topic in this way. For example, one particular
topic was associated with flights in California, the contributing words being the
names of locations in this traffic-dense area.
A second case is that of the topics that could easily be identified by the
expert but do not show any marked correlation with the metadata. This is
the case for topic 2 in our selection, where the only correlated attribute is
the company policy, although with a very low score. This kind of topic is
extremely interesting, as it shows that corpus analysis by this kind of method
can make some aspects of incident reports emerge. Only 2 of these could
be identified in the 50 topics examined in our experiment: fatigue and flight
planning. It is important to note that the fatigue attribute was added to
the ASRS taxonomy, along with other human factors, in 2009. Even though
the subset it covers is too small for meaningful results, and is heavily biased
because of this temporal constraint, partial analysis indicates that this topic
is highly correlated with this attribute.
[14] Only the attributes with a positive correlation higher than 0.1 are presented. This
threshold was chosen arbitrarily, as the population is too large to yield non-significant
correlation scores.
The 10 remaining topics could not be associated with any single aspect of
the reports. This is the case for topic 5 in our selection, where the correlated
attributes are numerous and scattered, making no more sense to the expert
than the contributing words. Other configurations in this category are topics
in which several identifiable themes are mixed together, and which are split
apart when a larger number of topics T is extracted.
6.4.3 Discussion
Although we only performed a limited number of experiments with topic
modelling on incident reports, it appears that topic modelling is suitable for
occurrence data. It is a very robust method that takes clear advantage of large
collections of redundant documents, as is the case for incident reports. Most
of the topics identified are in fact relevant aspects of these documents, as
can be seen through an expert’s interpretation. However, only a small fraction
of the identified topics are both relevant and independent from the metadata
attributes, and as such provide added value.
One of the main limitations of this approach is the granularity of the
extracted topics, especially when compared to the level of detail attained
in the organised description and indexing of aviation incident reports. As
seen in the previous analysis of the resulting topics, most of the topics do
little more than confirm an organisation that is already clearly expressed by
some of the metadata. If in some cases this method can identify non-encoded
aspects, they are difficult to detect among other, unavoidably noisy topics.
However, this technique can be extremely valuable for report databases that
are not supported by a thorough classification scheme and extensive metadata.
This can be the case for databases that need to be consolidated, or even for the
replacement of an unsuitable taxonomy.
On the technical level, topic models are somewhat sensitive to a number of
parameters, the first of which is the requested number of topics. We performed
several tests on the same data with T = 10, T = 100 and T = 200. None of the
10 topics were interpretable, as they all mingled several aspects of
the reports. Interesting things happened with 100 topics, including the clear
and expected separation of topics (from the 50 described above) that had been
identified as agglomerations of quite distinct sub-topics by the expert.
However, this led to only a few such improvements; most other topics were
deemed unnecessarily split. With the highest tested value (200), many resulting
topics were related to geography, with high-weighted tokens corresponding
to airports, beacon codes and city names (mostly in the US). Although these
topics were coherent and easily interpreted, their informational value seems
quite low. Finally, we could identify a few very stable topics across the variation
of T; this is the case for topic 2 (related to fatigue), which was found
almost identical in all experiments with T > 50. In the end, the optimal value
for T cannot be evaluated without a complete and thorough interpretation of
the resulting topics, and is estimated to be highly dependent on the collection of
documents.
Nevertheless, we see potential applications of the technique both in calculating
similarity and as an initial step when designing taxonomies for large
collections of textual reports, when a taxonomy is not available [15].
In order to apply topic modelling to similarity, one simply needs to compute
similarity scores between documents based on their scores for each topic. Such
an application essentially addresses the same issue we described in Section
6.2: the user would be able to “switch on/off” different topics and thus
influence the similarities identified by the system. To build on the previous
examples (table 6.5), if for some reason an expert is not interested in the role
weather played in an incident, he would turn off topic 4; documents similar
because of weather-related issues would then no longer be identified. Such a
shift in similarity will hopefully bring to light another, more subtle or hidden
pattern in the data.
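A minimal sketch of such a toggle, assuming documents are represented by rows of the document×topic matrix (illustrative code, not an existing feature of the tool):

    import numpy as np

    def faceted_similarity(doc_topic, source, off_topics=()):
        # Cosine similarity to a source document over topic vectors, with
        # the unwanted topics switched off (e.g. off_topics=[4] to ignore
        # the weather topic of table 6.5).
        mask = np.ones(doc_topic.shape[1])
        mask[list(off_topics)] = 0.0
        m = doc_topic * mask
        src = m[source]
        norms = np.linalg.norm(m, axis=1) * np.linalg.norm(src)
        return (m @ src) / np.where(norms == 0, 1.0, norms)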
Having determined the significant overlap between the produced topics and the
metadata, we can now consider topic modelling a viable solution for treating
large uncoded (or ineffectively coded) collections. Essentially, the same
interpretation exercise we performed in this experiment can serve as the basis
for a first version of a coding taxonomy: an expert would be presented
with the major topics and asked to interpret them, and an initial set of
metadata categories could thus be established. Such an approach would also have
the added value that, by definition, the categories would be easy to classify
with automatic classification techniques, as they are reflected in the narrative
parts.
[15] As is sometimes the case in industries and sectors where incident reporting is a more
recent enterprise.
negatives. At each iteration, the system first trains a new model M_i given the
current training set T. It then calculates a new dimension vector D_i using the
model M_i. Within the algorithm, we assume D contains a real positive or
negative distance from the SVM hyperplane, although it is trivial to convert
this to a yes/no answer by taking positives to be yes and negatives to be no.
Finally, the system reconstructs T as follows: T.p is automatically calculated
by taking all documents whose distance from the hyperplane exceeds the
bootstrap threshold. The expert is then asked to review the n documents
closest to the hyperplane margin on both sides, and to determine whether they
are really positives or negatives, assigning them respectively to T.P or T.N.
The assumption is that correctly reclassifying a small number of documents in
these marginal areas allows us to converge much more quickly than a random
review of documents.
The learning ends when the expert is satisfied with the dimension values
assigned to the documents, presumably when the hyperplane correctly
distinguishes the majority of documents reviewed.
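One iteration of this loop can be sketched as follows with scikit-learn (a simplified sketch of the procedure described above; the variable names and the choice of a linear SVM are assumptions):

    import numpy as np
    from sklearn.svm import LinearSVC

    def active_learning_iteration(X, labels, bootstrap_threshold=1.0, n_review=20):
        # labels: +1/-1 for documents currently in T, 0 for unlabelled ones.
        train = labels != 0
        model = LinearSVC().fit(X[train], labels[train])  # M_i on the current T
        dist = model.decision_function(X)                 # D_i: signed distances
        # T.p: documents confidently beyond the bootstrap threshold
        bootstrapped = np.where(dist > bootstrap_threshold)[0]
        # The expert reviews the n unlabelled documents closest to the
        # hyperplane, on both sides
        candidates = np.where(labels == 0)[0]
        to_review = candidates[np.argsort(np.abs(dist[candidates]))[:n_review]]
        return bootstrapped, to_review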
Table 6.7 shows the results of another simulation, this time on 7,025 documents
from the ASRS database (selected on a temporal criterion from the
corpus described in Section 2.1.4). We simulated the search for incident reports
where confusion was a factor, using the Human Factors attribute
of the Person entity as a validation criterion: we tested for those documents
classified with the value Confusion. The initial query is the word “confusion”.
While this configuration is closer to the real-world use the system is intended
for, it is also a much more difficult task than identifying bird-strikes.
This difficulty can be estimated by training a simple classifier for this metadata
attribute: our best configuration achieved only a 66% F1-score, while we reach
95% for the BIRD category in the DGAC corpus.
Accordingly, the system’s performance is worse than in the previous scenario,
but its behaviour is comparable. At iteration 3 the system has identified
253 more true positive documents, with only 40 submitted to the
expert for validation. After 10 iterations, while the F1-score is still below 50%,
recall has nearly doubled.
    i    T.p    T.P   T.N   True+   P (%)   R (%)   F1 (%)
    0    774    0     0     472     60.98   25.46   35.92
    1    1048   0     0     574     54.77   30.96   39.56
    2    1280   14    6     670     52.34   36.14   42.76
    3    1443   24    16    725     50.24   39.10   43.98
    4    1564   26    34    765     48.91   41.24   44.74
    10   1936   57    123   900     46.49   48.54   47.49

Table 6.7: Results for confusion (ASRS corpus)
Let us sum up the main advantages and drawbacks of the four methods
presented in this chapter:
safety experts. It became apparent that, given some specific expectations such
as high recall and transparent results, users were willing to engage
in iterative and prolonged search strategies. Based on these observations, we
developed an iterative process allowing example-based modelling of a given
scenario, with the added benefit of producing a persistent model that can be
applied to any document.
All in all, this thesis shows that text can be viewed not only as the vehi-
cle of information between humans, but also as a resource that, when prop-
erly tapped and exploited, improves the overall quality of communication of
safety-related information within a given system. We are confident that by
incorporating NLP components in the information processing framework of
a high-risk system, it is possible to conceive robust, bottom-up and scalable
methods allowing more efficient use of large quantities of occurrence data,
leading ultimately to an even better understanding of complex socio-technical
systems and rendering them even more reliable in the future.
Furthermore, we are not alone in believing in the potential of language
processing technologies applied to safety-related information. Today, timePlot’s
successor, PLUS [19], an industry-ready commercial application built by
a team of talented engineers and based largely upon the results of this research,
is proof of the contributions NLP has to offer to the domain of risk management.
The application was built with the same functionalities as the timePlot
system, and all the principles discussed in Chapter 5 also hold for it. At the
time of this writing, the tool is operational or in the process of being deployed
at companies and government institutions, both in the civil aviation domain
and in other sectors. Here is a (non-exhaustive) list of CFH - Safety
Data’s clients using or about to use the tool:
• Civil Aviation: EASA, DGAC, Air France, Dassault Aviation, WFP [20]
• Space: Astrium
• Rail: SNCF, RATP
• Energy: EDF
• Medical: UGECAM
As a consequence, we are also starting to notice how, with the introduction of
custom tools and processes built around NLP technologies, practices within the
community are starting to shift and the available data is starting to be looked
upon in novel ways. While until recently report narratives were meant for “human
eyes only”, we now start hearing voices from the community stating that
automatic language processing can replace [21] the current taxonomy-based
paradigm.
[19] PLUS stands for Processing Language Upgrades Safety.
[20] The World Food Programme is in charge of the United Nations’ transport operations.
[21] We personally find this claim a bit too extreme and overambitious, but we feel flattered
nonetheless.
This thesis is coming to an end, but work on the subject is only just beginning.
As we hope has become clear, we believe that access to data and to
users are the essential prerequisites for successfully applying NLP to a given task.
It also goes without saying that there are considerable engineering challenges
associated with managing data and building applications.
With CFH - Safety Data, we have now reached the point where a comprehensive
NLP-based solution for safety-related data is becoming mature. It is
therefore time for us to ask the question we were asking at the beginning of
the thesis in a different manner. While our objective then was to apply NLP
to risk management in civil aviation, now the question becomes: “what is the
best way to proceed in the future?” In other words, how do we evaluate the
contribution of the different NLP components in this particular context, how
do we choose between alternatives and how do we optimally parameterise them?
Given that the objective of this thesis is to equip specific users with tools
that satisfy specific needs, we look at the question of evaluating the system(s)
from an extrinsic perspective and on an end-to-end basis. Given the variety of
tasks, we came to think about the possibility of conducting such evaluations. We
see it as a spectrum, ranging from cases where the approach is necessary and
straightforward to cases where conducting formal evaluations is unreasonably difficult:
3. Feasible but costly: For IR systems, for example, the evaluation needs
to simulate the users’ expectations. The TREC competitions provide
ample examples of both the difficulty of the task and the necessity for a
great number of separate simulations in order to cover different contexts
of IR: searching in a firm’s internal document collection (Balog et al.,
2008) is different from searching the web (Collins-Thompson et al., 2014).
Building a similar evaluation protocol for incident and accident data is
certainly feasible, but would be a very costly enterprise. For the time being
it is not reasonable, at least until we have both sufficiently refined our
understanding of the information needs and sufficiently observed real-world
user interactions with an IR system in this particular context.
We are certain that the bulk of the tasks of NLP applications in the domain
of risk management will fall into the last category. This brings us to reconsider
the current evaluation paradigms: apply them whenever possible, but
also look for complementary means of ensuring acceptable performance. Our
approach is twofold and revolves around the notions of transparency and user
involvement.
We would also like to be able to act at the smallest possible level of granularity,
and thus prefer modular approaches to monolithic approaches for any given task.
In other words, we would prefer having the possibility to intervene at a very [23]
small scale in order to correct a particular problem or error produced by the
system.
It follows that building transparent and modular processing, while
maintaining a channel for feedback from an informed (and involved) user, will
allow us to act a posteriori, based on concrete examples of undesirable
results.
Regarding the work presented in this thesis, the main consequence of
these considerations is that they make us reconsider the use of (opaque)
dimensionality reduction techniques as the unique abstraction layer between
text and representation. As a consequence, we will start looking for ways to
progressively integrate symbolic and knowledge-rich methods into the tools we
propose.
[22] Such a position would be completely unthinkable and counterproductive for other types
of applications, such as web search engines, where people constantly try to “game” the system
for higher visibility in the search ranks. As a consequence, such a system’s performance
depends on the opaqueness of the underlying processing.
[23] A trivial example is the one we mentioned about the stemming of the word “laser” to
“las” (§3.1.2): in this case we do not want to have to reinvent a stemming algorithm, but to
be able to manually add an exception.
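To make the last point concrete, here is a minimal sketch of such an exception layer, wrapping NLTK’s French Snowball stemmer (an illustration of the principle, not the component we actually use):

    from nltk.stem.snowball import FrenchStemmer

    class PatchedStemmer:
        # Wrap an off-the-shelf stemmer with a manual exception table:
        # we do not reimplement the algorithm, we only override known
        # errors, e.g. "laser" being reduced to "las".
        def __init__(self, exceptions=None):
            self.base = FrenchStemmer()
            self.exceptions = exceptions or {"laser": "laser"}

        def stem(self, word: str) -> str:
            w = word.lower()
            return self.exceptions.get(w, self.base.stem(w))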
• It follows that they offer little if any direct control over the mappings. It
is difficult [24] to act on a small scale and, say, add or remove variants of
a given term.
• They are monolithic. Because they account for different aspects of how
language varies, they tend to be applied as single components, with basic
tokens as input and “conceptual” dimensions as output.
we will continue to test and use dimensionality reduction in those areas where
either an objective formal evaluation is possible (such as Text Categorisation)
or where it serves as an aid to humans in necessary modelling tasks, such
as building and maintaining lexicons.
The other option, as we saw, is to describe meaning and its relationship to
its primary vehicle (text). A knowledge-rich approach implies constructing
a world model as a central resource for extracting meaning from text. For
Nirenburg (2004), a knowledge-rich approach to NLP comprises:
And while for Turney and Pantel (2010) it “seems possible that all of the
semantics of human language might one day be captured in some kind of
Vector Space Model”, we know that the semantics of language can be described
and, provided there is a resource of sufficient coverage and scale, can be applied
to extract the underlying meanings from text in an understandable form.
The main criticism of knowledge-rich methods is that the sheer effort
needed to construct the model prohibits their practical application outside
small and controlled lab environments; in other words, they lack robustness.
Such approaches were mainstream from the 1960s to the 1990s but were
gradually replaced by statistical NLP in the late 1990s (Spärck-Jones, 2001).
Now they are starting to (spectacularly) come back, with systems such as
IBM’s Watson (Gliozzo et al., 2013) or Inquire (Chaudhri et al., 2013).
Lannoy (1996) discusses a potential application of full-scale semantic representations
to the domain of risk management, only to show that the complexity
of the modelling effort is prohibitive. It also illustrates two fundamental problems
with a number of such approaches:
• Secondly, (given the ambitious goal) the success of such a system depends
upon the completeness of all the modelling (all levels of language
and a complete model of the domain) in its entirety before it is capable
of delivering any usable results. In practice, knowledge-rich methods
• High stakes: Aviation accidents are incredibly costly and even a slight
improvement of safety saves a lot of money.
• Users generate more and more quality information, which is being
exploited for knowledge acquisition (Lafourcade, 2007; Wang et al., 2012).
• Robust large scale methods for knowledge extraction from text are avail-
able, partially as an answer to the availability of large quantities of an-
notated texts (Bellot et al., 2014).
Ahsan, S., Alshomrani, S., and Hassan, A. (2013). Semantic data mining for
security informatics: Opportunities and challenges. Life Science Journal,
10(12s).
Ale, B. J. M., Bellamy, L. J., Roelen, A. L. C., Cooke, R. M., Goossens, L.
H. J., Hale, A. R., Kurowicka, D., and Smith, E. (2005). Development of a
causal model for air transport safety, pages 107–116.
Allan, J. (2006). A heuristic risk assessment technique for birdstrike manage-
ment at airports. Risk Analysis, 26(3):723–729.
Amalberti, R. (2001). The paradoxes of almost totally safe transportation
systems. Safety Science, 37(2–3):109–126.
Andrzejewski, D., Zhu, X., and Craven, M. (2009). Incorporating domain
knowledge into topic modeling via Dirichlet forest priors. In Proceedings
of the 26th Annual International Conference on Machine Learning, pages
25–32. ACM.
Arampatzis, A. T., Van Der Weide, T. P., van Bommel, P., and Koster, C. H.
(2000). Linguistically motivated information retrieval. Encyclopedia of Li-
brary and Information Science: Volume 69-Supplement 32, page 201.
Arun, R., Suresh, V., Madhavan, C. V., and Murthy, M. N. (2010). On
finding the natural number of topics with latent Dirichlet allocation: Some
observations. In Advances in Knowledge Discovery and Data Mining, pages
391–402. Springer.
ASRS (2014). ASRS coding taxonomy.
Balog, K., Thomas, P., Craswell, N., Soboroff, I., Bailey, P., and De Vries,
A. P. (2008). Overview of the TREC 2008 enterprise track. Technical
report, DTIC Document.
Barach, P. and Small, S. D. (2000). Reporting and preventing medical
mishaps: lessons from non-medical near miss reporting systems. BMJ,
320(7237):759–763.
Bellot, P., Bonnefoy, L., Bouvier, V., Duvert, F., and Kim, Y.-M. (2014).
Large scale text mining approaches for information retrieval and extraction.
In Faucher, C. and Jain, L. C., editors, Innovations in Intelligent Machines-
4, volume 514 of Studies in Computational Intelligence, pages 3–45. Springer
International Publishing.
Blaser, S., Agnew, S., Kannan, N., and Ng, P. (2004). High volume targeting
of advertisements to user of online service. US Patent 6,757,661.
Blei, D., Ng, A., and Jordan, M. (2003). Latent Dirichlet Allocation. Journal
of Machine Learning Research, 3:993–1022.
Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J. L., and Blei, D. M. (2009).
Reading tea leaves: How humans interpret topic models. In Bengio, Y.,
Schuurmans, D., Lafferty, J., Williams, C., and Culotta, A., editors, Ad-
vances in Neural Information Processing Systems 22, pages 288–296. Curran
Associates, Inc.
Chaudhri, V. K., Cheng, B., Overtholtzer, A., Roschelle, J., Spaulding, A.,
Clark, P., Greaves, M., and Gunning, D. (2013). Inquire biology: A text-
book that answers questions. AI Magazine, 34(3):55–72.
Clark, A., Fox, C., and Lappin, S. (2013). The Handbook of Computational
Linguistics and Natural Language Processing. Wiley-Blackwell.
Collins-Thompson, K., Bennett, P., Diaz, F., Clarke, C. L., and Voorhees,
E. M. (2014). TREC 2013 web track overview. Ann Arbor University,
Tech. Rep.
Croft, W. B., Metzler, D., and Strohman, T. (2010). Search engines: Infor-
mation retrieval in practice. Addison-Wesley Reading.
Deerwester, S. C., Dumais, S. T., Landauer, T. K., Furnas, G. W., and Harsh-
man, R. A. (1990). Indexing by latent semantic analysis. Journal of the
American Society for Information Science, 41(6):391–407.
Gliozzo, A., Biran, O., Patwardhan, S., and McKeown, K. (2013). Semantic
Technologies in IBM Watson. In Proceedings of the Fourth Workshop on
Teaching NLP and CL, pages 85–92, Sofia, Bulgaria.
Gonzalo, J., Verdejo, F., Chugur, I., and Cigarran, J. (1998). Indexing
with wordnet synsets can improve text retrieval. In Proceedings of COL-
ING/ACL’98 Workshop Usage of WordNet for NLP.
Heinrich, H., Petersen, D., Roos, N., Brown, J., and Hazlett, S. (1980). Indus-
trial Accident Prevention: A Safety Management Approach. McGraw-Hill.
Hermann, E., Leblois, S., Mazeau, M., Bourigault, D., Fabre, C., Travadel,
S., Durgeat, P., and Nouvel, D. (2008). Outils de traitement automatique
des langues appliqués aux comptes rendus d’incidents et d’accidents. In 16e
Congrès de Maîtrise des Risques et de Sûreté de Fonctionnement, Avignon.
Ho, C.-H. and Lin, C.-J. (2012). Large-scale linear support vector regression.
Journal of Machine Learning Research, 13:3323–3348.
Hu, Y., Boyd-Graber, J., Satinoff, B., and Smith, A. (2014). Interactive topic
modeling. Machine learning, 95(3):423–469.
Jansen, B. J., Booth, D. L., and Spink, A. (2009). Patterns of query re-
formulation during Web searching. Journal of the American Society for
Information Science and Technology, 60(7):1358–1371.
Kim, S.-M. and Hovy, E. (2006). Extracting opinions, opinion holders, and
topics expressed in online news media text. In Proceedings of the Workshop
on Sentiment and Subjectivity in Text, pages 1–8. Association for Compu-
tational Linguistics.
Kristjannson, T., Culotta, A., Viola, P., and McCallum, A. (2004). Interactive
information extraction with constrained conditional random fields.
In Proceedings of the Conference of the American Association for Artificial
Intelligence (AAAI), San Jose, CA.
Kules, W., Wilson, M. L., Shneiderman, B., et al. (2008). From keyword
search to exploration: How result visualization aids discovery on the web.
Lafourcade, M. (2007). Making people play for lexical acquisition with the
jeuxdemots prototype. In Proceedings of SNLP’07: 7th international sym-
posium on natural language processing, page 7.
NTSB (1975). Trans World Airlines, Inc., Boeing 727-231, N54328, Berryville,
Virginia, December 1, 1974. Aircraft Accident Report NTSB-AAR-75-16,
National Transportation Safety Board.
Salton, G., Wong, A., and Yang, C. (1975). A vector space model for automatic
indexing. Communications of the ACM, 18(11):613–620.
Singh, A., Rose, C., Visweswariah, K., Chenthamarakshan, V., and Kamb-
hatla, N. (2010). Prospect: a system for screening candidates for recruit-
ment. In Proceedings of the 19th ACM international conference on Infor-
mation and knowledge management, pages 659–668. ACM.
Stephens, C., Ferrante, O., Olsen, K., and Sood, V. (2008). Standardizing
international taxonomies.
Tanguy, L., Tulechki, N., Urieli, A., Hermann, E., and Raynal, C. (2015).
Natural language processing for aviation safety reports: from classification
to interactive analysis. Computers in Industry. In press.
Tanguy, L., Urieli, A., Calderone, B., Hathout, N., and Sajous, F. (2011). A
multitude of linguistically-rich features for authorship attribution. In PAN
Lab at CLEF.
Teh, Y. W., Jordan, M. I., Beal, M. J., and Blei, D. M. (2006). Hierarchical
Dirichlet processes. Journal of the American Statistical Association, 101(476).
Turney, P. D. (2001). Mining the web for synonyms: PMI-IR versus LSA on
TOEFL. In Raedt, L. D. and Flach, P., editors, Machine Learning: ECML
2001, number 2167 in Lecture Notes in Computer Science, pages 491–502.
Springer Berlin Heidelberg.
Wang, J., Kraska, T., Franklin, M. J., and Feng, J. (2012). Crowder:
Crowdsourcing entity resolution. Proceedings of the VLDB Endowment,
5(11):1483–1494.