Measuring of The Reliability of Nde: F. Fücsök, C. Müller, M. Scharmach
ABSTRACT
An important question for contemporary NDT systems is how reliable the test results are. The same
question arises for every diagnostic system, in human medicine, mechanical engineering and civil
construction alike. The reliability of an NDE system is the consistency of the system's capability
to detect, classify and evaluate the deviations that actually exist within test pieces. The main
elements of reliability are the intrinsic capability of the system, the effect of application
parameters, and human factors.
At present we cannot determine the effect of every element, so reliability has to be measured
with methods such as POD (probability of detection) and ROC (Receiver Operating Characteristic)
analysis. According to the modular concept, the reliability of each module of an NDE system can
be determined separately.
The NDT section of the Hungarian Scientific Society of Mechanical Engineering recognised the
importance of the reliability of NDE. We therefore organised a round-robin test of radiographic
film evaluation, as one module of radiographic testing, in order to measure the effect of the
human factor.
The first round-robin test has been completed, and its evaluation method is presented in detail
as an example of reliability measurement. The second round-robin test is in progress in Hungary,
and Slovenian radiographers are invited to take part in this important and interesting test.
1. Introduction
The reliability of a diagnostic system is an important question for physicians, for fracture
mechanics experts and for the customers of NDT laboratories. The reliability of an NDE system is
the consistency of the system's capability to detect, classify and evaluate the deviations that
actually exist within test pieces. Measuring reliability is difficult, however, because it
depends on many factors.
Many radiographic exposures and film interpretations are made daily in a typical industrial test
laboratory. Yet questions remain about the precise probability of detecting specific
discontinuities, and about the reliability of each individual inspector or laboratory. Although
each laboratory's most experienced inspectors evaluate every radiograph, the actual reliability
of these inspectors is not precisely known.
Anyone who works in the field finds that everybody (including the welders) can evaluate
radiographic films. The most serious disputes are therefore about the results, in other words
about the reliability of the radiographic test. That is why we chose radiographic film evaluation
as the topic of an international round-robin test (RRT). The NDT section of the Hungarian
Scientific Society of Mechanical Engineering recognised the importance of the reliability of film
evaluation and organised this round-robin test.
The paper is organized as follows. The next section gives a short background of the ROC method.
Then the practical procedure of the RRT is described. Finally, the results of the RRT of
radiographic film evaluation are presented and a new RRT is announced.
TP (true positive): the defect was indicated where it was present
FN (false negative): the defect was not indicated where it was present
TN (true negative): no defect was indicated where none was present
FP (false positive): a defect was indicated where none was present
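The probability of detection (POD), in other words the probability of a true positive:
$$\mathrm{POD} = P(\mathrm{TP}) = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}$$
The probability of false alarm (PFA), in other words the probability of a false positive:
$$\mathrm{PFA} = P(\mathrm{FP}) = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}}$$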
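Both ratios follow directly from the four outcome counts. The Python sketch below is illustrative only: the class name `CallCounts` and its methods are hypothetical and not part of the original study, and the usage counts are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallCounts:
    """Hypothetical container for the four inspection outcome counts."""
    tp: int  # defect present and indicated
    fn: int  # defect present but not indicated
    tn: int  # no defect and nothing indicated
    fp: int  # no defect but an indication reported

    def pod(self) -> float:
        """Probability of detection, P(TP) = TP / (TP + FN)."""
        return self.tp / (self.tp + self.fn)

    def pfa(self) -> float:
        """Probability of false alarm, P(FP) = FP / (FP + TN)."""
        return self.fp / (self.fp + self.tn)


# Invented counts: 50 defect cells, 100 defect-free cells
example = CallCounts(tp=45, fn=5, tn=90, fp=10)
print(round(example.pod(), 2), round(example.pfa(), 2))
```

Note that POD is normalised by the number of defect cells and PFA by the number of defect-free cells, so the two rates are independent of how many cells of each kind a test piece contains.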
2.2 An example
Figure 2 shows a sketch of a welded seam divided into 22 cells. The weld contains a defect 7 cells
long (thicker green line). Suppose that the defect was detected, but partly at the wrong place
(thinner red line).
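$$\mathrm{POD} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} = \frac{5}{5 + 2} \approx 0.71$$
The probability of false alarm:
$$\mathrm{PFA} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}} = \frac{2}{2 + 13} \approx 0.13$$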
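The cell counting in this example can be reproduced mechanically by classifying every cell of the seam. The sketch below assumes a hypothetical placement of the defect (cells 5 to 11) and of the indication (cells 7 to 13) that is consistent with the counts above; the exact positions in Figure 2 are not known, and `tally_cells` is an illustrative helper, not a tool from the study.

```python
def tally_cells(n_cells: int, defect: set[int], indicated: set[int]) -> dict[str, int]:
    """Classify every cell of a seam as TP, FN, TN or FP.

    Cells are numbered 0 .. n_cells - 1; `defect` holds the cells that truly
    contain the flaw, `indicated` the cells the inspector marked.
    """
    counts = {"TP": 0, "FN": 0, "TN": 0, "FP": 0}
    for cell in range(n_cells):
        present = cell in defect
        called = cell in indicated
        if present and called:
            counts["TP"] += 1
        elif present:
            counts["FN"] += 1
        elif called:
            counts["FP"] += 1
        else:
            counts["TN"] += 1
    return counts


# Hypothetical layout: 7-cell defect in cells 5..11, indication shifted to cells 7..13
defect = set(range(5, 12))
indicated = set(range(7, 14))
c = tally_cells(22, defect, indicated)
pod = c["TP"] / (c["TP"] + c["FN"])
pfa = c["FP"] / (c["FP"] + c["TN"])
print(c, round(pod, 2), round(pfa, 2))
```

A two-cell shift of a 7-cell indication yields 5 true positives, 2 missed cells, 2 false calls and 13 correctly clear cells, which adds up to the 22 cells of the seam.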
These results are plotted on the ROC diagram in Figure 3, which also shows the results of a
perfect tester and of a guessing tester.
[Figure 3: Result of the example on the ROC (Receiver Operating Characteristic) diagram, p[TP] versus p(FP) = PFA.]
[Figure: ROC curve, p[TP] versus p[FP]; sensitivity rises along the reliability curve.]
In practice it is not possible to apply a continuously increasing signal threshold and to count
the correct and false call rates for each threshold value. Therefore discrete rating categories
are defined, which the inspectors apply during the non-destructive evaluation, as indicated in
Figure 5.
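How cumulative rating categories turn into discrete ROC points can be sketched as follows. This assumes that category I is the most confident "defect present" rating, that cells without any indication form a final category, and it uses invented counts; `roc_points` is a hypothetical helper, not the procedure actually used in the RRT.

```python
from itertools import accumulate


def roc_points(signal_counts: list[int], noise_counts: list[int]) -> list[tuple[float, float]]:
    """Cumulative ROC operating points from rating-category counts.

    signal_counts[k] / noise_counts[k]: number of defect / defect-free cells
    rated in category k, ordered from most confident (I) to least confident,
    with 'no indication' as the last entry.
    Returns (p[FP], p[TP]) for the thresholds I, I+II, I+II+III, ...
    """
    if len(signal_counts) != len(noise_counts):
        raise ValueError("both lists need one entry per category")
    n_signal = sum(signal_counts)
    n_noise = sum(noise_counts)
    # Summing all categories gives the trivial point (1, 1), so it is dropped
    cum_tp = list(accumulate(signal_counts))[:-1]
    cum_fp = list(accumulate(noise_counts))[:-1]
    return [(fp / n_noise, tp / n_signal) for fp, tp in zip(cum_fp, cum_tp)]


# Invented counts for categories I, II, III, IV and 'no indication'
signal = [30, 10, 5, 5, 10]   # 60 defect cells
noise = [2, 3, 5, 10, 80]     # 100 defect-free cells
pts = roc_points(signal, noise)
for i, (p_fp, p_tp) in enumerate(pts, start=1):
    print(f"threshold {i}: p[FP]={p_fp:.2f}, p[TP]={p_tp:.2f}")
```

Each looser threshold can only add calls, so both p[FP] and p[TP] are non-decreasing from point 1 to point 4; a smooth curve is then fitted through these points.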
[Figure 5: ROC points p[TP] versus p[FP] obtained from cumulative rating categories (1 = I, 2 = I + II, 3 = I + II + III, 4 = I + II + III + IV); the whole curve is determined by maximum-likelihood regression using the binormal model.]
[Figure 6 (partially recovered): ROC curves with legend: detectability (1.0 legible), standard deviation 1.0 for noise and signal in all curves, mean value of noise 0.0.]
The area under the ROC curve (see Figure 6) may vary from 0.5 (pure chance, curve 1) up to 1.0,
which corresponds to an ideal NDT system whose ROC curve is the step through the upper left
corner. For the fictive systems shown in Figure 6, the performance increases from curve 1 to
curve 7.
The distance of a curve from line 1 therefore gives a good summary performance value, showing the
capability of the method or of the human factor.
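The parameters listed with the ROC curves (mean value of noise 0.0, standard deviations 1.0) describe an equal-variance binormal model, for which the area under the curve has a closed form. The sketch below assumes that model: it estimates the detectability d′ from a single operating point, compares Φ(d′/√2) with a trapezoidal area, and, as one possible reading of "distance from line 1", computes the perpendicular distance of an operating point from the chance diagonal. It does not perform the maximum-likelihood curve fitting used for the whole curve, and all function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def binormal_tpf(fpf: float, d_prime: float) -> float:
    """p[TP] at a given p[FP] for noise ~ N(0, 1) and signal ~ N(d', 1).

    fpf must lie strictly between 0 and 1 (inv_cdf rejects 0 and 1).
    """
    return STD_NORMAL.cdf(d_prime + STD_NORMAL.inv_cdf(fpf))


def d_prime_from_point(fpf: float, tpf: float) -> float:
    """Detectability implied by one operating point under the equal-variance model."""
    return STD_NORMAL.inv_cdf(tpf) - STD_NORMAL.inv_cdf(fpf)


def binormal_auc(d_prime: float) -> float:
    """Area under the equal-variance binormal ROC curve: Phi(d' / sqrt(2))."""
    return STD_NORMAL.cdf(d_prime / sqrt(2))


def trapezoid_auc(points: list[tuple[float, float]]) -> float:
    """Area under the polyline through (0, 0), the (p[FP], p[TP]) points and (1, 1)."""
    pts = sorted([(0.0, 0.0), *points, (1.0, 1.0)])
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def distance_from_chance_line(fpf: float, tpf: float) -> float:
    """Perpendicular distance of an operating point from the diagonal p[TP] = p[FP]."""
    return (tpf - fpf) / sqrt(2)


# Operating point of the worked example (PFA = 2/15, POD = 5/7)
fpf, tpf = 2 / 15, 5 / 7
d = d_prime_from_point(fpf, tpf)
print(f"d' = {d:.2f}, binormal AUC = {binormal_auc(d):.3f}, "
      f"trapezoid AUC = {trapezoid_auc([(fpf, tpf)]):.3f}, "
      f"distance from chance line = {distance_from_chance_line(fpf, tpf):.3f}")
```

With d′ = 0 the binormal area is exactly 0.5 (chance, curve 1), and it approaches 1.0 as d′ grows. A single-point trapezoid underestimates the area of a smooth convex curve, which is one reason a fitted curve is preferred.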
[Figure: ROC diagram of participant 11204 (4 years of practice), all data: detection points, best and worst points, p(TP) versus p(FP), with the 0.95 and 0.05 levels marked.]
[Figure 8: The same diagram for participant 11204, selected data.]
When the circumstances of the RRT were discussed with the participants, many of them said that
they had not exactly followed the test instructions. They knew that small discontinuities could
be disregarded even under the strictest acceptance level, so they omitted them. Thus the most
experienced evaluators, following their long-term practice, did not list the small gas pores and
slag inclusions, and consequently obtained an incorrect (lower) detection reliability. We
therefore performed two types of evaluation: one taking all discontinuities into consideration
(called the Old evaluation) and one considering only the larger discontinuities that exceed
acceptance level 1 of EN 12517:1998 (called the New evaluation).
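The effect of this choice on POD can be illustrated with a toy evaluator who omits small flaws. The sketch below is hypothetical throughout: a single size limit `limit_mm` stands in for acceptance level 1 of EN 12517 (whose real criteria are not reproduced here), the flaw list is invented, and only POD is shown, not PFA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flaw:
    size_mm: float   # hypothetical size measure used for the acceptance check
    reported: bool   # did the evaluator list this flaw?


def pod(flaws: list[Flaw]) -> float:
    """Fraction of true flaws the evaluator reported, TP / (TP + FN).

    Raises ZeroDivisionError if the list is empty.
    """
    return sum(f.reported for f in flaws) / len(flaws)


def old_and_new_pod(flaws: list[Flaw], limit_mm: float) -> tuple[float, float]:
    """Old evaluation: all flaws. New evaluation: only flaws larger than limit_mm."""
    relevant = [f for f in flaws if f.size_mm > limit_mm]
    return pod(flaws), pod(relevant)


# Invented evaluator who omits small pores and slag but lists most larger flaws
flaws = [Flaw(0.5, False), Flaw(0.8, False), Flaw(2.5, True), Flaw(4.0, True), Flaw(3.0, False)]
old, new = old_and_new_pod(flaws, limit_mm=1.0)
print(f"Old POD = {old:.2f}, New POD = {new:.2f}")
```

Deliberately omitted small flaws count as false negatives in the Old evaluation but drop out of the New one, so the same film reading yields a higher POD once only the relevant flaws are scored.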
Figure 8 shows the results of the evaluation of the Croatian participant coded 11204, considering
only the flaws above acceptance level 1. The working point lies above the 95% reliability level.
This means that the evaluator will meet the ASME Code requirements with a probability of more
than 95%.
4. Conclusions
From the evaluation of the RRT results we learned that the reliability of the work of
laboratories' key personnel can be increased. The best result was a POD of 88.3% with a 12.9%
false alarm rate in the Old evaluation, and a POD of 85.2% with a PFA of 2.1% in the New
evaluation. These results are reasonable, but the overall results are worse, with a high false
alarm rate.
Clearly, the reliability of film evaluation has to be increased. Refresher training and
round-robin tests are necessary. The Reliability Laboratory at BAM has extensive experience with
RRTs. For this reason, we are continuing this RRT with a new set of films.
We collected a carefully selected set of radiographs that fulfil the following requirements:
technically good quality,
enough planar and volumetric flaws,
true values (the reference findings) as correct as possible,
indications with different grey levels.
The new round-robin test with the new set of films is in progress in Hungary. We invite Slovenian
radiographers to take part in this important and interesting test.