
Application of Regression and Neural Models to Predict Competitive Swimming Performance

Article in Perceptual and Motor Skills · April 2012
DOI: 10.2466/05.10.PMS.114.2.610-626 · Source: PubMed



Perceptual and Motor Skills, 2012, 114, 2, 610-626. © Perceptual and Motor Skills 2012

APPLICATION OF REGRESSION AND NEURAL MODELS TO PREDICT COMPETITIVE SWIMMING PERFORMANCE¹

ADAM MASZCZYK and ROBERT ROCZNIOK ZBIGNIEW WAŚKIEWICZ

Department of Sports Training Department of Team Sport Games


Chair of Methodology and Statistics

MIŁOSZ CZUBA KAZIMIERZ MIKOŁAJEC

Department of Sports Training Department of Team Sport Games

ADAM ZAJĄC ARKADIUSZ STANULA

Department of Sports Training Department of Sports Training


Chair of Methodology and Statistics
Jerzy Kukuczka Academy of Physical Education
Katowice, Poland

Summary.—This research problem was indirectly but closely connected with the optimization of an athlete-selection process, based on predictions viewed as determinants of future successes. The research project involved a group of 249 competitive swimmers (M age = 12 yr., SD = 0.5) who had trained and competed for four years. Measures involving fitness (e.g., lung capacity), strength (e.g., standing long jump), swimming technique (turn, glide, distance per stroke cycle), anthropometric variables (e.g., hand and foot size), as well as specific swimming measures (speeds over particular distances), were used. The participants (n = 189) trained from May 2008 to May 2009, which involved five days of swimming workouts per week and three additional 45-min. sessions devoted to measurements necessary for this study. Predictor variables identified in these data were used to construct models predicting final swimming performance in the 50 m and 800 m crawl events: nonlinear regression models and neural models were built for the dependent variables of sport results (performance at 50 m and 800 m). In June 2009, the predictor variables were measured in two further groups of 30 swimmers each (n = 60). In May 2010, these swimmers' actual race times for these events were compared to the predictions made a year earlier. Results for the nonlinear regression models and perceptron networks structured as 8-4-1 and 4-3-1 indicated that the neural models overall predicted final swimming performance more accurately from initial training, strength, fitness, and body measurements. The sums of absolute errors for the neural models were 4:11.96 (251.96 sec.; n = 30, 800 m, 8-4-1 network) and 20.39 sec. (n = 30, 50 m, 4-3-1 network), compared with 346.79 sec. and 34.82 sec. for the corresponding regression models. It seems possible that such models can be used to predict future performance, as well as in the process of recruiting athletes for specific styles and distances in swimming.

A scientific approach to sports, especially competitive sports, is multidimensional, incorporating such traditional sciences as pedagogy (Hamilton, 2009), psychology (Woodman, Zourbanos, Hardy, Beattie, & McQuillan, 2010), sports medicine (Barton & Lees, 1993), and physiology (Jones, Padilla, & Zhu, 2010). Most relations in sport science are not linear, because a unit change in an independent variable does not always bring about a similar change in the dependent variable (Zehr, 2005). Thus, sport scientists must use nonlinear tools such as nonlinear regression or neural models when investigating and modeling a wide variety of relationships, including physiological responses to exercise, periodization of training loads, and optimizing the recruitment of talented youth.

¹ Address correspondence to Adam Maszczyk, Department of Sports Training, Academy of Physical Education, Mikolowska 72A str., 40-065 Katowice, Poland or e-mail (a.maszczyk@awf.katowice.pl).

DOI 10.2466/05.10.PMS.114.2.610-626 ISSN 0031-5125
Until recently, the typical approach to optimizing athlete recruitment and predicting sport results was based on multiple regression procedures. From a cognitive, theoretical, and methodological perspective, applying neural networks to this optimization problem seems plausible. Neural networks can be employed wherever there are relations between independent variables (inputs) and dependent variables (outputs); however, they are especially useful when seeking complex input-output relationships that are difficult to capture with the statistical methods usually employed (e.g., analysis of relations or the separation of taxonomically homogeneous groups). Because relations between variables may be either linear or nonlinear, and neural networks can represent both, artificial neural networks have been used more frequently in recent years. Today, this tool is frequently used for solving problems in modelling and prediction (Haykin, 1994; Maier, Wank, Bartonietz, & Blickhan, 2000; Lees, 2002; Zadeh, 2002; Zehr, 2005; Bartlett, 2006).
The artificial neural network applied to optimise the recruitment pro-
cess and predict outcome allows simulation of the performance an indi-
vidual is likely to achieve based on his individual data alone. Naturally,
the reliability of this information depends on whether a suitable regres-
sion model (linear or nonlinear) has been correctly built, the structure and
type of the neural network regarding its capacity for making generalisa-
tions, and the volume of data available to test the network. If these con-
ditions are met, then it is possible that a recruitment process can be opti-
mised and the prospective results of young athletes predicted well.
The nature of recruitment and selection of an athlete consists of finding the vector of the candidate's abilities with respect to each stage of sports training. Therefore, the selection process can be optimised by creating a rich source of information on a candidate's sport abilities from as few examined features as possible, using a regression model and a neural model (multilayer perceptron or linear). Because a large number of cases with optimal information on a dependent variable Y, or on variables Y1 and Y2 (sports outcomes), can be created using only a few variables, it is not necessary to take multiple measurements of many features, some of which add no information about outcomes because they are collinear with others.

This article is a continuation of the problem presented by Roczniok and Rygula (2007) using Kohonen's network and by Maszczyk, Zajac, and Rygula (2011) using neural network models (multilayer perceptron; MLP), but it is not part of a larger study. In the earlier work, models based on Kohonen's networks were used to classify participants into three groups, which after one year of training achieved very good, average, or very weak performances. This implies that such models may be used for data mining aimed at assisting the recruitment of candidates for competitive swimming. It is hypothesized that neural network modelling will predict swimmers' performance more accurately than a typical regression model.
The investigation tested which models (regression models or artificial neural networks) predicted sport results more precisely and which better support and optimise the recruitment and selection of athletes, and it identified the variables that are most informative as explanatory variables in the regression and neural models.
Method
Participants
Written informed consent was obtained from all participants. Partici-
pants were free from any known cardiovascular or metabolic diseases as
reported in a health questionnaire. They were informed of the aim and ex-
perimental risks of the study. This project was approved by the Bioethics
Committee for Scientific Research at the Academy of Physical Education
in Katowice. The authors declare that they have no conflict of interest.
Model Construction group.—The investigation considered sports results achieved by a Model Construction group of 189 swimmers (M age = 12 yr., SD = 0.5) from the Silesian Region of Poland. The participants were selected using a purposive technique; the selection criteria were four years of training experience and the ability to swim the three competition strokes correctly. The core investigation was preceded by 12 months of general and specific physical fitness training, and all athletes were tested before and after this twelve-month training period. Each athlete participated in five training sessions per week in his own sports club and in three additional sessions for the purpose of this research: one in an indoor swimming pool (45 min.) and two in a gym (90 min.). During swimming practice, the athletes were taught to refine basic swimming techniques. The experimental factor was the differing structure of training loads.
Model Testing group.—Two groups of swimmers were tested to predict performance at the distances of 50 m (n = 30) and 800 m (n = 30). First, these participants were trained in the same way as the Model Construction group, and measures were taken in June 2009. Second, performance over the two distances, 50 m and 800 m, was measured in the same two groups of 30 swimmers in May 2010 so that these real results could be compared to the models' predictions made from the June 2009 data.
Design
In order to test the hypothesis, multidimensional statistical analyses were applied to measurements taken in the Model Construction group. The values of variables measured by means of robust scales and tests were used in multiple regression models. The research problem was addressed using empirical and predictive investigation, based on data obtained in the form of a multidimensional vector of variables, including the independent variables X1, ..., Xn and two dependent variables, Y1 and Y2. Mathematical models were created from the measurements of 189 swimmers taken in 2008–2009. Then, in 2009–2010, an additional study was conducted with the Model Testing group in order to verify the previously created models on two groups of 30 swimmers, one for the 50 m (n = 30) and one for the 800 m (n = 30) crawl event.
Measures
Numerous characteristics of the participants were measured as independent variables, including body build and general and specific physical fitness. The dependent variables were the results achieved by the swimmers in competition over 50 m and 800 m crawl stroke after 12 months of training.
Kicking.—25 m LL² (sec.) and 50 m LL (sec.): flutter kick with a kickboard, only legs active.
Exercises.—25 m AA (sec.): crawl stroke, arms only, with the board held between the legs. 800 m ALAS (sec.): crawl stroke, alternating legs only, arms only, and complete stroke, 4 × 200 m (each 200 m = 50 m LL/50 m stroke/50 m AA/50 m stroke).
Technique.—Turn (sec.): whole-body reaction time and spatial orien-
tation in the flip turn for crawl stroke. The participant stood 7.5 m from
the turning wall, swam to the wall at maximum speed when signalled,
performed a flip turn, glided, and swam a crawl stroke back to the start-
ing point. Three trials were measured (± .01 sec.) and the best time was
recorded. Starting dive (sec.) off the block followed by the crawl stroke:
the participant stood on the block in the starting position, dove off when
signalled and swam crawl stroke for 7.5 m. The time was measured (± .01
sec.) in three trials and the best time was recorded. Glide test (sec.) was
used to assess balance: the participant stood with his back against the
turning wall and, when signalled, performed a 7.5-m glide to swim the
crawl stroke. Three time trials (± .01 sec.) were performed and the best re-
sult was recorded.
² Throughout the rest of the manuscript, LL = legs only with kickboard; AA = arms only with board between legs; ALAS = alternating arms only, legs only, and complete crawl stroke.

Indexes.—The Swimmer Coordination Index (SCI) was calculated from the times for the arms-only and legs-only swims and for the full crawl stroke: SCI = [(25 m AA + 25 m LL)/2] − 25 m, where the last term is the 25 m crawl stroke time. AA cycle (n cycles/25 m) was measured as the number of crawl stroke cycles of an athlete swimming with a board held between the legs (i.e., during the AA measure). Distance Per Stroke (DPS) was calculated over a distance of 25 m; the better a swimmer's technique, the longer a single cycle. Crawl stroke cycle, all elements of the stroke (cycles/25 m): this indicates the power of thrust of the swimmer's arms and how well an athlete "feels" the water. The starting position was in the water to avoid possible interference by a starting dive.
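The two technique indexes can be computed as in the sketch below. It assumes that all times are in seconds, that the final "25 m" term in SCI is the 25 m full crawl-stroke time, and that DPS is the 25 m distance divided by the number of complete stroke cycles counted over it; the text does not spell out the DPS formula. Function names and input values are hypothetical.

```python
def swimmer_coordination_index(t_aa_25: float, t_ll_25: float, t_crawl_25: float) -> float:
    """SCI = [(25 m AA + 25 m LL) / 2] - 25 m crawl time, all in seconds."""
    return (t_aa_25 + t_ll_25) / 2.0 - t_crawl_25


def distance_per_stroke(cycles: float, distance_m: float = 25.0) -> float:
    """Assumed DPS definition: metres covered per complete stroke cycle."""
    if cycles <= 0:
        raise ValueError("cycle count must be positive")
    return distance_m / cycles


# Hypothetical times and cycle count for one swimmer (not study data).
sci = swimmer_coordination_index(t_aa_25=22.0, t_ll_25=26.0, t_crawl_25=17.5)
dps = distance_per_stroke(cycles=12.5)
print(f"SCI = {sci:.2f} s, DPS = {dps:.2f} m/cycle")  # SCI = 6.50 s, DPS = 2.00 m/cycle
```

Under this reading, a lower cycle count over the same 25 m gives a larger DPS, matching the statement that better technique lengthens a single cycle.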
Speed.—Speed variables were evaluated every 6 weeks, but the result accepted for further analysis was the one recorded in May 2009. To build the model, the better (smaller) of the two times measured at the end of the training cycle was selected. The variables were 25 m (sec.), 50 m (sec.), and 800 m (sec.) crawl stroke times.
General fitness measures.—Standing long jump (cm) was a measure
of power. The participant stood in front of a line with legs spread at hip
width, swung both arms and jumped forward as far as possible. The dis-
tance between the starting line and the last mark made by the swimmer’s
feet touching the ground was measured. The longer distance of two tri-
als was recorded (± 1 cm). Leg power is very significant in swimming, as
it determines the starting dive length and largely determines turn speed;
moreover, many authors propose it as a good criterion for evaluating an athlete's speed-strength potential (Bompa, 2000). Vital lung capacity (ml),
the maximum amount of air a person can exhale from the lungs after a
maximum inspiration, was measured by a spirometer.
Anthropometric measures.—Body height (cm); body mass (kg); foot length (cm), measured between the anthropometric points pternion and acropodion (Alfredson, Nordstrom, Pietila, & Lorentzon, 1999; Mester & Perl, 1999); hand length (cm), measured between the anthropometric points stylion and phalangion; and arm span (cm), the distance between the dactylions of the right and left hands measured with the arms spread. Rohrer Coefficient = mass (g)/height (cm)³ × 100; Body Mass Index (BMI, kg/m²) is the ratio of body mass (kg) to squared body height (m) (Deurenberg, Weststrate, & Seidell, 1991).
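A minimal sketch of the two body-build indexes, assuming body mass in kilograms and height in centimetres as measured here. The Rohrer coefficient uses the classical g/cm³ × 100 form, and BMI converts height to metres. The subject's values are illustrative, not study data, and the function names are hypothetical.

```python
def rohrer_coefficient(mass_kg: float, height_cm: float) -> float:
    """Rohrer coefficient = mass (g) / height (cm)**3 * 100."""
    return (mass_kg * 1000.0) / height_cm ** 3 * 100.0


def body_mass_index(mass_kg: float, height_cm: float) -> float:
    """BMI in kg/m^2 = mass (kg) / height (m)**2."""
    height_m = height_cm / 100.0
    return mass_kg / height_m ** 2


# Hypothetical swimmer: 45 kg, 155 cm (illustrative values only).
rc = rohrer_coefficient(45.0, 155.0)
bmi = body_mass_index(45.0, 155.0)
print(f"Rohrer = {rc:.2f}, BMI = {bmi:.2f}")  # Rohrer = 1.21, BMI = 18.73
```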
Procedure
The data of the 189 swimmers entered into the neural network and regression models were obtained from measurements made from May 2008 to May 2009. The predictors were then measured in the Model Testing group, comprising two new groups of 30 swimmers each, of the same age and training experience as the Model Construction group, whose results were not used to build the models; these participants were trained, and measures were taken, in June 2009. The predictions for these two groups were verified by comparing the model-generated predictions with the actual results achieved by the same two groups of 30 swimmers in May 2010 (n = 60); see Fig. 1 for details.
Analysis
In general, whenever a regression model can be transformed into a linear model, this is the preferred analytic method for estimating the respective model. The linear multiple regression model is very well understood mathematically and, from a pragmatic standpoint, is much easier to interpret. For example, for a simple exponential regression model of Y1 as a function of Xi, Y1 = exp(−b1*Xi), one can convert the nonlinear regression equation into a linear one by taking the logarithm of both sides, so that ln(Y1) = −b1*Xi. Substituting y for ln(Y1) yields the standard linear regression model (the intercept is omitted here for simplicity). Thus, the Y1 data can be log-transformed and multiple regression used to estimate the relationship between Xi and Y1, that is, to compute the regression coefficient b1.

Fig. 1. Procedure of model construction and testing groups. [Flowchart. Group I, Model Construction group (N = 189, 2008–2009): 20 variables, 2 outcomes, correlation analysis, log-transformation; regression models Y1 (50 m, 4 independent variables) and Y2 (800 m, 8 independent variables); neural models 4-3-1 and 8-4-1. Group II, Model Testing group (N = 60, 2009–2010): sprinters (n = 30, 4 variables measured) and distance swimmers (n = 30, 8 variables measured); 50 m and 800 m race times compared with the predictions of (1) the regression and (2) the neural models.]
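The log-transformation argument can be checked numerically. The sketch below is a generic illustration, not the authors' Statistica procedure: it generates noise-free data from an exponential model with a hypothetical intercept (Y1 = 2·exp(−0.3·X)), fits ordinary least squares to ln(Y1), and recovers b1 and the intercept on the original scale.

```python
import math


def fit_log_linear(xs, ys):
    """Fit ln(y) = c - b*x by ordinary least squares on the log scale.

    Returns (b, c). All y values must be positive because of the logarithm.
    """
    if any(y <= 0 for y in ys):
        raise ValueError("log transform needs strictly positive y values")
    ls = [math.log(y) for y in ys]
    n = len(xs)
    mx, ml = sum(xs) / n, sum(ls) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxl = sum((x - mx) * (l - ml) for x, l in zip(xs, ls))
    slope = sxl / sxx      # slope of ln(y) on x, which equals -b
    c = ml - slope * mx    # intercept on the log scale
    return -slope, c


# Synthetic, noise-free data from y = 2 * exp(-0.3 x) (not study data).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * math.exp(-0.3 * x) for x in xs]
b, c = fit_log_linear(xs, ys)
print(round(b, 6), round(math.exp(c), 6))  # 0.3 2.0
```

With real, noisy data the fit minimizes squared errors on the log scale rather than in seconds, which is the usual trade-off of linearizing by transformation.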
Means and standard deviations were calculated for all variables. The Kolmogorov-Smirnov test was used to verify the normality of the distributions, and Levene's test to verify the homogeneity of variance.
The scores in trials and tests were used as the explanatory variables (20 variables). A correlation matrix was calculated for the independent variables (X1–X20) and the dependent variables (Y1, 50 m; Y2, 800 m). The set of independent variables that were most strongly related to the dependent variables and least related to each other was assembled using Pearson's coefficients (Kothari, 2004; Benigno & Woodford, 2006). Nonlinear variables were log-transformed.
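One way to operationalise "most related to the dependent variables and least related to each other" is a greedy screen on Pearson coefficients, sketched below. The text does not give the cut-offs the authors used, so the thresholds r_min and r_max, the greedy ordering, the function names, and the toy data are all assumptions for illustration.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(va * vb)


def screen_predictors(candidates, y, r_min=0.5, r_max=0.8):
    """Keep candidates strongly related to y and weakly related to each other.

    candidates maps a variable name to its values. Candidates are visited in
    order of decreasing |r| with y; one is kept only if |r| with y is at
    least r_min and its |r| with every already-kept predictor is at most
    r_max. Both thresholds are illustrative, not values from the study.
    """
    r_y = {name: abs(pearson_r(vals, y)) for name, vals in candidates.items()}
    kept = []
    for name in sorted(r_y, key=r_y.get, reverse=True):
        if r_y[name] < r_min:
            break  # every remaining candidate is even less related to y
        if all(abs(pearson_r(candidates[name], candidates[k])) <= r_max for k in kept):
            kept.append(name)
    return kept


# Toy data (not study data): x2 is collinear with x1, x4 is unrelated to y.
y = [2, 0, 0, -2, 0, 0]
candidates = {
    "x1": [1, 1, -1, -1, 0, 0],
    "x2": [2, 2, -2, -2, 0, 0],
    "x3": [1, -1, 1, -1, 0, 0],
    "x4": [0, 0, 0, 0, 1, -1],
}
kept = screen_predictors(candidates, y)
print(kept)
```

Here x1 and x2 are equally related to y but perfectly collinear, so only one of them survives, while the uncorrelated x3 is kept and x4 is dropped.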
Multiple stepwise regression was used to select the explanatory variables offering the best prediction of athletes' results for swim distances of 50 m and 800 m in the Model Construction phase. The selected predictor variables (four for Y1 and eight for Y2) were log-transformed and used to form regression models predicting Y1 (50 m: glide, foot length, body height, 25 m AA) and Y2 (800 m: 50 m LL, 800 m ALAS, 25 m AA, standing long jump, AA cycle, hand length, vital lung capacity, Rohrer Coefficient); see Table 1 for details.
Regression and Neural Network Models
Constructed graphs of the variables indicated nonlinearity. It is well recognized that any statistical inquiry in which principles from some body of knowledge enter the analysis is likely to lead to a nonlinear model. Such models play a very important role in understanding the complex interrelations among variables. A nonlinear model is one in which at least one of the parameters enters nonlinearly; more formally, in a nonlinear model at least one derivative with respect to a parameter involves that parameter. In this study, the nonlinear models $Y_1(t) = \exp(a_1 t + b_1 t^2)$ and $Y_2(t) = \exp(a_2 t + b_2 t^2)$ were used and verified after being transformed to linear models with the transformations $X_{n1}(t) = \ln Y_1(t)$ and $X_{n2}(t) = \ln Y_2(t)$.
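Because ln Y(t) = a·t + b·t² is linear in the parameters a and b, the log-transformed model can be fitted by ordinary least squares. The sketch below solves the normal equations for one such model, keeping the no-intercept form written above; the data are synthetic and the function name is hypothetical.

```python
import math


def fit_exp_quadratic(ts, ys):
    """Fit Y(t) = exp(a*t + b*t**2) by least squares on z = ln Y(t).

    With no intercept, the normal equations are
        s2*a + s3*b = stz
        s3*a + s4*b = st2z
    and are solved directly by Cramer's rule.
    """
    if any(y <= 0 for y in ys):
        raise ValueError("ln Y(t) needs strictly positive Y values")
    zs = [math.log(y) for y in ys]
    s2 = sum(t ** 2 for t in ts)
    s3 = sum(t ** 3 for t in ts)
    s4 = sum(t ** 4 for t in ts)
    stz = sum(t * z for t, z in zip(ts, zs))
    st2z = sum(t ** 2 * z for t, z in zip(ts, zs))
    det = s2 * s4 - s3 ** 2
    if det == 0:
        raise ValueError("need at least two distinct non-zero t values")
    a = (stz * s4 - s3 * st2z) / det
    b = (s2 * st2z - s3 * stz) / det
    return a, b


# Synthetic, noise-free data generated with a = 0.2, b = -0.01 (not study data).
ts = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [math.exp(0.2 * t - 0.01 * t ** 2) for t in ts]
a, b = fit_exp_quadratic(ts, ys)
print(f"a = {a:.4f}, b = {b:.4f}")  # a = 0.2000, b = -0.0100
```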
For generalization and prediction of swimming results, Multilayer Perceptron (MLP) neural models were used to model the 50 m and 800 m performances with the following structures: 4-3-1 and 8-4-1 [four and eight input neurons (variables), respectively, one hidden layer (with three and four neurons, respectively), and one output neuron]. In the Neural Network Statistica Module (NNSM), the standard procedure is 100 epochs of training followed by 20 epochs of optimization. During the teaching process the network improves its performance, giving smaller errors in the training and validation sets. When the network begins to overlearn, the validation error starts to increase; at this moment the learning process must be stopped, and the NNSM automatically shifts the model values back to the optimal epoch (Szaleniec, Witko, Tadeusiewicz, & Goclon, 2006; Szaleniec, Tadeusiewicz, & Witko, 2008; Maszczyk, et al., 2011). The networks were trained using the Levenberg-Marquardt algorithm. The training process was iterative: in subsequent epochs of learning, the weights and thresholds are modified so as to diminish the total network error. The Levenberg-Marquardt algorithm is most often applied to small networks, in this case an 8-4-1 network for the 800 m and a 4-3-1 network for the 50 m swim (Fig. 2). A benefit of this procedure is that the user does not have to specify the hidden layers: the Neural Model application (with the Levenberg-Marquardt algorithm) automatically suggests the optimal number of layers, and the researcher only has to define the number of epochs. A more accurate model requires more learning epochs, but care is needed, because an overlearned network loses the ability to generalize. In our models, the initial setting was 600 epochs. The level of significance for all analyses was set at p ≤ .05.

Table 1
Regression Statistics for Y1–50 m and Y2–800 m Regression Models

Variable               β      SE β   B        SE B   t       p
Y1, 50 m crawl race
  Intercept                          147.64   1.52   57.05   .001
  Glide                −.48   .20    −25.54   8.72   −4.45   .002
  Foot length          .90    .20    64.41    8.85   8.26    .001
  Body height          −.74   .20    −53.24   8.26   −7.05   .003
  25 m AA              −.43   .20    −32.43   8.42   −4.08   .001
Y2, 800 m crawl race
  Intercept                          90.43    2.23   82.35   .001
  50 m LL              −.00   .09    −0.27    3.39   −0.05   .008
  Vital lung capacity  .34    .05    11.71    2.79   7.39    .001
  800 m ALAS           .35    .05    11.21    3.09   5.94    .001
  Hand length          .34    .05    10.73    2.96   6.35    .003
  25 m AA              −.53   .08    −18.61   3.98   −6.94   .001
  Rohrer coefficient   .20    .04    5.19     2.35   4.47    .001
  Standing long jump   −.26   .05    −7.23    2.47   −5.12   .002
  The AA technique     −.24   .05    −6.45    2.94   −3.95   .004
Note.—Percent variance accounted for: Y1, 50 m swim, R² = .83; for Y2, 800 m swim, R² = .89.
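The stop-at-overlearning rule described above resembles patience-based early stopping. The sketch below illustrates that generic rule; it is not Statistica's internal procedure, which the text does not specify, and the function name, the patience value, and the error curve are hypothetical.

```python
def early_stopping(val_errors, patience=5):
    """Return (best_epoch, stop_epoch), both counted from 1.

    Training stops once the validation error has not improved for
    `patience` consecutive epochs; the weights saved at best_epoch are
    the ones kept (the "shift back" to the optimal epoch).
    """
    best_i, best_err = 0, float("inf")
    for i, err in enumerate(val_errors):
        if err < best_err:
            best_i, best_err = i, err
        elif i - best_i >= patience:
            return best_i + 1, i + 1
    return best_i + 1, len(val_errors)


# Hypothetical validation-error curve: it falls, then rises as overlearning sets in.
curve = [0.90, 0.60, 0.45, 0.40, 0.38, 0.39, 0.41, 0.44, 0.47, 0.50]
print(early_stopping(curve, patience=3))  # (5, 8)
```

Here the validation error is lowest at epoch 5 and has failed to improve for three epochs by epoch 8, so training would stop at epoch 8 and the epoch-5 weights would be restored.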

[Figure: example of the neural model for Y1 = 50 m, showing the input layer (G, FL, BH, 25 m AA), one hidden layer, and the output layer (Y1 = 50 m).]
Fig. 2. Artificial Neural Network Model for dependent variable Y1 = 50 m with 4-3-1
structure. G = glide, FL = foot length, BH = body height and 25 m AA = arms n cycles/25 m.
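The 4-3-1 structure in Fig. 2 corresponds to the forward computation sketched below. The tanh hidden activation and linear output neuron are assumptions (the text does not state which activation functions were used), the inputs are assumed to be already scaled, and the weights are hypothetical placeholders rather than trained values. The helper also counts free parameters: 19 for a 4-3-1 network and 41 for an 8-4-1 network, all to be estimated from the Model Construction data.

```python
import math


def n_parameters(layers):
    """Count weights plus biases in a fully connected MLP, e.g. [4, 3, 1]."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer MLP (tanh hidden, linear output).

    x: scaled input values (G, FL, BH, 25 m AA for the 4-3-1 model)
    w_hidden: one weight row per hidden neuron; b_hidden: hidden biases
    w_out: weights from the hidden neurons to the output; b_out: output bias
    """
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


print(n_parameters([4, 3, 1]), n_parameters([8, 4, 1]))  # 19 41

# Hypothetical weights for a 4-3-1 network (placeholders, not trained values).
w_hidden = [[0.5, -0.2, 0.1, 0.3],
            [-0.4, 0.6, 0.2, -0.1],
            [0.1, 0.1, -0.5, 0.4]]
b_hidden = [0.0, 0.1, -0.1]
w_out = [0.7, -0.3, 0.5]
b_out = 0.2
y_hat = mlp_forward([0.2, -0.1, 0.4, 0.0], w_hidden, b_hidden, w_out, b_out)
print(round(y_hat, 4))
```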

Testing Data used to Verify Model-generated Predictions


The primary goal of the investigation was to compare and assess the
predictive abilities of the nonlinear regression and neural models. This ne-
cessitated testing the prediction values against the actual performance of
swimmers.
After data gathering, the regression and neural models were built and a second phase of research was conducted. In June 2009, 60 participants (two groups of swimmers, one (n = 30) of sprinters and the other (n = 30) of long-distance swimmers) performed the same training protocol as the first sample, and the independent variables were measured again for either the 50 m or the 800 m crawl stroke model, depending on which model their data were to test. Race times after one year of training were predicted using the above regression and neural models; one year after these predictions, the sprinters' and distance swimmers' times in the 50 m and 800 m races, respectively, were recorded (true values). In May 2010, the model-generated predictions were compared to the actual times the swimmers achieved at the 50 m and 800 m distances, and absolute errors were calculated.
The calculation of absolute errors was dictated by the specificity of the regression models. The regression function is fitted by the method of least squares, which minimizes the sum of squared deviations between observed and predicted values. As a result, individual predictions may deviate considerably while positive and negative deviations largely cancel, so that the sum of the signed errors is close to zero and says little about predictive precision. Only after adding up the absolute deviations in the neural and regression models can one detect the superiority of the nonlinear neural models, whose summed absolute error is smaller than that of the regression models (20.39 vs. 34.82 sec. for 50 m and 251.96 vs. 346.79 sec. for 800 m; see details in Table 3 and Table 4).
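The cancellation argument can be checked on four rows of Table 3 (rows 1, 4, 5 and 6, regression model). The signed errors partly cancel, so their sum is far smaller than the sum of absolute errors. The function name is hypothetical; the error sign convention (true value minus calculated value) matches the tables.

```python
def error_sums(true_vals, predicted):
    """Return (sum of signed errors, sum of absolute errors), error = true - predicted."""
    errors = [t - p for t, p in zip(true_vals, predicted)]
    return sum(errors), sum(abs(e) for e in errors)


# Rows 1, 4, 5 and 6 of Table 3: true 50 m times and regression predictions (sec.).
true_50 = [27.73, 37.12, 38.28, 38.74]
reg_50 = [29.38, 39.22, 36.96, 35.11]
signed, absolute = error_sums(true_50, reg_50)
print(round(signed, 2), round(absolute, 2))  # 1.2 8.7
```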

Table 2
Regression Statistics of Assessment of 4-3-1 and 8-4-1 Structure MLP Models for Dependent Variables Y1, 50 m and Y2, 800 m

Data standard                                  Learning series   Validation series   Test series
MLP 4-3-1 (Y1, 50 m)
  Normalized Root Mean Squared Error (NRMSE)   .34               .33                 .34
  Correlation                                  .94               .95                 .96
MLP 8-4-1 (Y2, 800 m)
  NRMSE                                        .18               .22                 .18
  Correlation                                  .98               .98                 .98

All statistical analyses in both groups of athletes were carried out on a PC using the statistical package Statistica 9.0, the Statistica Neural Networks module (Release 4.0 E), and Excel 2010 from Microsoft Office 2010.
Results
The distribution graphs of the variables showed nonlinearity; thus, the variables were log-transformed to linearize their relationships. With this
set of explanatory variables assembled, the construction of the multiple
regression models began. The first regression model, for the distance of 50
m, had the following form where G = glide, FL = foot length, BH = body
height and 25 mAA = arms n cycles/25 m (variables that came out of the
modeling process as predictive):
Y1(50m) = 147.64 − 25.54*G + 64.41*FL − 53.24*BH − 32.43*25 mAA ± 10.11.

The second regression model, for the distance of 800 m, where VLC = vital lung capacity, HL = hand length, RC = Rohrer coefficient, SLJ = standing long jump, and AAc = AA cycle, had the following form (variables that came out of the modeling process as predictive):
Y2(800m) = 90.43 − 0.27*50 mLL + 11.71*VLC + 11.21*800 mALAS + 10.72*HL − 18.61*25 mAA + 5.19*RC − 7.23*SLJ − 6.45*AAc ± 8.72.

Using the independent variables that were significantly associated with performance, the multilayer perceptron (MLP) models were constructed (Table 2). The outcome suggested that the perceptron model allowed prediction of the dependent variable Y1, 50 m. For the 4-3-1 network, the Normalized Root Mean Squared Error (NRMSE) values for the training, validation, and test series are shown in Table 2. The potential practical usefulness of the constructed model was confirmed by high correlation coefficients in each series (i.e., .94, .95, and .96). For the dependent variable Y2, 800 m, the NRMSE value for the training subset was .17, while

Table 3
Prediction for Dependent Variables Y1, 50 m (n = 30)

Columns: (1) n; (2) true value (sec.); regression model: (3) calculated value (sec.), (4) error (sec.), (5) absolute error (sec.); MLP 4-3-1: (6) calculated value (sec.), (7) error (sec.), (8) absolute error (sec.). Error = true value − calculated value.
1 27.73 29.38 −1.65 1.65 28.22 −0.49 0.49
2 34.22 36.33 −2.11 2.11 34.11 0.11 0.11
3 29.55 30.65 −1.1 1.1 29.78 −0.23 0.23
4 37.12 39.22 −2.1 2.1 38.1 −0.98 0.98
5 38.28 36.96 1.32 1.32 38.22 0.06 0.06
6 38.74 35.11 3.63 3.63 36.46 2.28 2.28
7 37.43 36.23 1.2 1.2 36.78 0.65 0.65
8 41.57 42.84 −1.27 1.27 43.02 −1.45 1.45
9 39.58 41.68 −2.1 2.1 41.72 −2.14 2.14
10 35.75 37.88 −2.13 2.13 37.85 −2.1 2.1
11 29.62 29.76 −0.14 0.14 29.66 −0.04 0.04
12 29.48 28.66 0.82 0.82 29.45 0.03 0.03
13 39.02 39.23 −0.21 0.21 39.17 −0.15 0.15
14 32.18 32.11 0.07 0.07 32.14 0.04 0.04
15 38.33 38.42 −0.09 0.09 38.39 −0.06 0.06
16 34.23 33.23 1.00 1.00 33.35 0.88 0.88
17 36.76 35.67 1.09 1.09 35.69 1.07 1.07
18 28.89 29.12 −0.23 0.23 29.08 −0.19 0.19
19 29.22 28.88 0.34 0.34 29.03 0.19 0.19
20 27.87 27.68 0.19 0.19 27.72 0.15 0.15
21 31.33 32.12 −0.79 0.79 32.09 −0.76 0.76
22 32.56 32.36 0.2 0.2 32.42 0.14 0.14
23 34.12 34.67 −0.55 0.55 34.65 −0.53 0.53
24 33.45 36.35 −2.9 2.9 35.44 −1.99 1.99
25 36.56 35.56 1.00 1.00 36.12 0.44 0.44
26 29.34 28.34 1.00 1.00 29.11 0.23 0.23
27 28.78 27.45 1.33 1.33 27.98 0.8 0.8
28 31.58 29.56 2.02 2.02 30.55 1.03 1.03
29 30.58 32.68 −2.1 2.1 31.72 −1.14 1.14
30 34.11 34.25 −0.14 0.14 34.15 −0.04 0.04
Sum 1,007.98 1,012.38 −4.40 34.82 1,012.17 −4.19 20.39

for the validation and test sets the values were .22 and .17, respectively. This indicates that the network model fit the data well, because the NRMSE values were low and comparable in the learning, validation, and test series (Table 2) (Werbos, 1994). The practical usefulness of this model was thus supported by the high correlation coefficients in each series, also given in Table 2.
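The text does not state how the RMSE in Table 2 was normalized. The sketch below assumes normalization by the standard deviation of the observed values, one common convention; under that assumption a model that always predicts the mean scores 1, so values of .18 to .34 would indicate errors well below the spread of the data. The function name and data are illustrative.

```python
import math


def nrmse(observed, predicted):
    """RMSE divided by the (population) standard deviation of the observed values.

    Normalizing by the SD is an assumption; other conventions divide by the
    range or the mean of the observed values.
    """
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    mean = sum(observed) / n
    sd = math.sqrt(sum((o - mean) ** 2 for o in observed) / n)
    return rmse / sd


obs = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [1.5, 1.5, 3.5, 3.5, 5.0]
print(round(nrmse(obs, pred), 4))    # 0.3162
print(nrmse(obs, [3.0] * 5))         # 1.0 (always predicting the mean)
```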
Table 3 and Table 4 include the results of the verification procedure

Table 4
Prediction for Dependent Variables Y2, 800 m (n = 30)

Columns: (1) n; (2) true value (sec.); regression model: (3) calculated value (sec.), (4) error (sec.), (5) absolute error (sec.); MLP 8-4-1: (6) calculated value (sec.), (7) error (sec.), (8) absolute error (sec.). Error = true value − calculated value.
1 587.47 589.45 −1.98 1.98 588.12 −0.65 0.65
2 508.56 521.23 −12.67 12.67 516.45 −7.89 7.89
3 507.15 512.87 −5.72 5.72 510.45 −3.3 3.3
4 641.41 639.12 2.29 2.29 640.31 1.1 1.1
5 582.42 567.34 15.08 15.08 579.78 2.64 2.64
6 803.91 807.23 −3.32 3.32 805.81 −1.9 1.9
7 538.58 529.12 9.46 9.46 535.55 3.03 3.03
8 558.78 580.11 −21.33 21.33 578.11 −19.33 19.33
9 565.25 571.45 −6.2 6.2 569.22 −3.97 3.97
10 655.45 651.34 4.11 4.11 653.12 2.33 2.33
11 692.32 723.56 −31.24 31.24 719.38 −27.06 27.06
12 574.77 587.23 −12.46 12.46 581.33 −6.56 6.56
13 547.28 538.34 8.94 8.94 545.33 1.95 1.95
14 576.15 587.34 −11.19 11.19 585.56 −9.41 9.41
15 632.75 636.67 −3.92 3.92 635.45 −2.7 2.7
16 645.28 701.21 −55.93 55.93 701.24 −55.96 55.96
17 589.39 567.34 22.05 22.05 569.54 19.85 19.85
18 678.53 681.34 −2.81 2.81 679.89 −1.36 1.36
19 587.78 578.34 9.44 9.44 580.77 7.01 7.01
20 599.31 589.87 9.44 9.44 593.67 5.64 5.64
21 687.25 678.19 9.06 9.06 681.34 5.91 5.91
22 691.84 645.45 46.39 46.39 655.45 36.39 36.39
23 598.43 589.89 8.54 8.54 596.12 2.31 2.31
24 578.34 571.23 7.11 7.11 572.11 6.23 6.23
25 603.25 598.27 4.98 4.98 601.81 1.44 1.44
26 546.62 541.23 5.39 5.39 543.67 2.95 2.95
27 567.91 564.71 3.2 3.2 565.01 2.9 2.9
28 605.04 612.96 −7.92 7.92 611.93 −6.89 6.89
29 597.31 599.87 −2.56 2.56 595.24 2.07 2.07
30 621.42 619.36 2.06 2.06 620.19 1.23 1.23
Sum 18,169.95 18,181.66 −11.71 346.79 18,211.95 −42 251.96

by which the prediction values generated by the nonlinear neural networks and nonlinear regression models for particular groups of competitive swimmers were compared with the actual race times of the tested swimmers.
Discussion
The primary objective was to assess the efficiency and predictive usefulness of artificial neural networks as a tool for recruiting athletes, compared with the widely used regression models. An attempt was made to identify which variables were most informative and should therefore serve as the models' explanatory variables. The nonlinear form of multiple regression (based on measurements) was used to select the variables that accounted for the most variance in athletes' 50 m and 800 m swim times; these variables were log-transformed and used to build models for the 50 m and 800 m swim times of the Model Construction group. Accordingly, the model predicting Y1 (50 m swim) was built with the following transformed independent variables: 25 m AA, glide, body height, and foot length. In the regression model, these variables accounted for 83% of the variance in race times.
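The fitting step described above can be sketched as an ordinary least-squares fit on log-transformed data. This is a minimal sketch under stated assumptions: the paper does not give the exact functional form, so a log-log (power-law) model ln Y = b0 + Σ bi ln Xi is assumed, the data are synthetic, and the selection of predictors is not shown. R² is computed on the log scale as the share of variance explained, the quantity behind figures such as 83% for Y1.

```python
import numpy as np

def fit_log_linear(X, y):
    """OLS fit of ln(y) = b0 + sum_i b_i * ln(x_i); all values must be positive.

    Returns the coefficients [b0, b1, ..., bk] and R^2 on the log scale.
    """
    X_log = np.log(np.asarray(X, dtype=float))
    y_log = np.log(np.asarray(y, dtype=float))
    design = np.column_stack([np.ones(len(y_log)), X_log])  # intercept column first
    coef, *_ = np.linalg.lstsq(design, y_log, rcond=None)
    residuals = y_log - design @ coef
    r2 = 1.0 - residuals @ residuals / np.sum((y_log - y_log.mean()) ** 2)
    return coef, r2

# Synthetic data only: 40 hypothetical swimmers, 4 positive-valued predictors.
rng = np.random.default_rng(1)
X = rng.uniform(1.0, 10.0, size=(40, 4))
true_coef = np.array([3.0, 0.4, -0.2, 0.1, -0.3])  # hypothetical power-law exponents
y = np.exp(true_coef[0] + np.log(X) @ true_coef[1:] + rng.normal(0.0, 0.05, size=40))

coef, r2 = fit_log_linear(X, y)
print(np.round(coef, 2), round(r2, 3))
```

On noise-free data the fit recovers the generating coefficients exactly and R² is 1; with the added noise, R² drops below 1 in proportion to the unexplained variance.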
Analysis of how these independent variables affected the race times used to build the Y1 model (50 m sprint performance) showed that race time was related to dynamic strength, swimming technique, and anthropometric parameters (hand and foot size). These results correspond to the findings of Hannula and Thornton (2001), who showed that the swimmers most predisposed to sprinting are those who have a good "feel" for the water, have long legs and feet, learn new swimming techniques quickly, and have excellent coordination abilities.
The predictive value of the variable "glide" is supported by Laughlin and Delves (2004), who noted that a swimmer moves in a medium one thousand times denser than air, so hydrodynamic efficiency is extremely important. Ideally, athletes assume the position that lets them move smoothly through the water and perform long arm strokes that meet the least resistance (indicated by 25 m AA). In laboratory testing, the fastest swimmers generated weaker propulsive force than their slightly slower colleagues. This apparent paradox arises because, although capable of generating stronger propulsive force, the faster swimmers do not need to expend extra energy, since they have better hydrodynamic efficiency. For the 50 m sprint, then, the analytical procedure identified four indicators that reflect hydrodynamic efficiency; from this standpoint, they should be useful as reliable predictors of the development of young competitive swimmers.
Using the same methodology, the variables 50 m LL, 800 m ALAS, 25 m AA, standing long jump, AA cycle, hand length, vital lung capacity, and Rohrer coefficient were selected for inclusion in the Y2 (800 m) model. These variables explained close to 89% (R² = .89) of the variance in 800 m race times. More abilities were required to predict performance over 800 m than in the 50 m sprint. The tests supported the previously predicted relations of endurance and vital lung capacity, and this finding indicates that these relations are among the dominant factors underlying prolonged exertion (Hannula & Thornton, 2001). An interesting finding was the significance of the 50 m LL variable in the biometric regression model for the 800 m distance. As the values of the model's standardised parameters show, this variable is useful because it reflects outstanding speed and technique. Moreover, the inclusion of somatic variables as significant predictors indicates that adequate body build, and especially effective use of the large muscle groups of the legs, also significantly affects endurance swimming performance.
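Standardised parameters (beta weights) make predictors measured in different units, such as times, jump distances, and body dimensions, comparable within one model. A minimal sketch, assuming ordinary least squares on synthetic data: each raw slope b_i is rescaled as beta_i = b_i · s(x_i) / s(y), which equals the slope obtained by regressing z-scored y on z-scored predictors. The predictors and their units here are hypothetical, not the study's variables.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bk]."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def zscore(a):
    """Centre each column and scale it to unit sample standard deviation."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)

def standardised_coefficients(X, y):
    """Beta weights: raw slopes rescaled by sd(x_i) / sd(y)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    slopes = ols(X, y)[1:]
    return slopes * X.std(axis=0, ddof=1) / y.std(ddof=1)

# Hypothetical data: predictor 0 in seconds, predictor 1 in centimetres.
rng = np.random.default_rng(2)
X = np.column_stack([rng.normal(30.0, 2.0, 50), rng.normal(180.0, 8.0, 50)])
y = 600.0 + 4.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0.0, 5.0, 50)

beta = standardised_coefficients(X, y)
print(np.round(beta, 3))
# Cross-check: slopes of a regression on z-scores give the same numbers.
print(np.round(ols(zscore(X), zscore(y))[1:], 3))
```

Because beta weights are unit-free, their magnitudes can be ranked directly, which is how a single predictor's contribution can be singled out within a model.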
This is consistent with Hannula and Thornton (2001), who defined endurance as the time for which an individual can perform an exercise while maintaining a predetermined effort intensity. The structural parameter values of the regression model for the 800 m race time outcome show that results in this endurance event are strongly influenced by the athlete's dynamic strength, swimming technique, anthropometric variables, vital capacity, and lower-limb strength. These results agree with the earlier findings of Laughlin and Delves (2004) and Counsilman (1977), who reported that performance in endurance events was strictly related to dynamic strength, swimming technique, and anthropometric variables, indicating a relationship between somatic build and sport results. Because the eight abilities represent the best combination of the regression model's explanatory variables and explain as much as 89% of the variability in 800 m race times, they can be considered good predictors of sport results for youth swimmers in endurance events.
The variables that were most informative in the regression models were also used to build the neural models. The practical value of the perceptron model with a 4-3-1 structure was confirmed. The multilayer perceptron network built for the dependent variable Y2 (800 m) from 8 independent variables had an 8-4-1 structure; this model had very good predictive properties.
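The labels 4-3-1 and 8-4-1 give the layer sizes: input variables, hidden neurons, and output neurons. A minimal forward-pass sketch, assuming fully connected layers, tanh hidden units, and a linear output; the paper does not report activations, training settings, or weights, so the random weights below are placeholders and training (for example by back-propagation) is not shown. The helper n_params counts weights plus biases, which shows how model size grows with structure, including the 9-2-1 network of Edelmann-Nusser et al. (2001).

```python
import numpy as np

def n_params(structure):
    """Weights plus biases of a fully connected MLP such as (8, 4, 1)."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(structure[:-1], structure[1:]))

def init_mlp(structure, rng):
    """Placeholder random weights and zero biases for each layer (not trained)."""
    return [(rng.normal(scale=0.5, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(structure[:-1], structure[1:])]

def forward(params, x):
    """Forward pass: tanh on hidden layers, identity on the output layer (assumed)."""
    h = np.asarray(x, dtype=float)
    for layer, (W, b) in enumerate(params):
        h = h @ W + b
        if layer < len(params) - 1:
            h = np.tanh(h)
    return h

rng = np.random.default_rng(0)
mlp_8_4_1 = init_mlp((8, 4, 1), rng)
x = rng.normal(size=(5, 8))           # 5 hypothetical swimmers, 8 standardised inputs
print(forward(mlp_8_4_1, x).shape)    # (5, 1)

for structure in [(4, 3, 1), (8, 4, 1), (9, 2, 1)]:
    print(structure, n_params(structure))   # 19, 41 and 23 parameters respectively
```

With only 30 or so cases per group, the parameter count is a useful sanity check: an 8-4-1 network already has 41 free parameters.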
To assess which of the modeling tools would be more useful for recruiting athletes, 60 athletes whose results had not been used to build either the multiple regression models or the neural models were sampled at random. The data in Tables 3 and 4 show that if only the sum of signed errors is considered, the nonlinear regression algorithms seem to outperform the neural network algorithms considerably in prediction quality. However, once the regression and neural network errors were converted to absolute errors and summed, it became clear that the neural networks' errors were considerably lower than those of the multiple regressions (Tables 3 and 4; for the 800 m group in Table 4, the signed sums are −11.71 s for regression and −42 s for the network, whereas the absolute sums are 346.79 s and 251.96 s, respectively). Thus, the neural networks generated smaller prediction errors. The data in Tables 3 and 4 also show that the summed errors of the neural networks are negative, denoting that prediction errors were larger for athletes achieving the fastest race times. The goodness of fit between the networks' predictions and the outcomes of well-performing athletes was very high.
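The contrast between signed and absolute error sums can be checked directly against Table 4. The sketch below recomputes, for the 800 m group, the sum of signed errors (error = true − calculated, as in the table), the sum of absolute errors, and the mean absolute error for both models. Signed errors of opposite sign cancel, which is why the regression's signed total looks better than the network's even though its absolute total is larger; the mean absolute errors come to about 11.56 s (regression) and 8.40 s (MLP 8-4-1).

```python
# Table 4 (800 m, n = 30): true times and predictions, in seconds.
TRUE = [587.47, 508.56, 507.15, 641.41, 582.42, 803.91, 538.58, 558.78, 565.25, 655.45,
        692.32, 574.77, 547.28, 576.15, 632.75, 645.28, 589.39, 678.53, 587.78, 599.31,
        687.25, 691.84, 598.43, 578.34, 603.25, 546.62, 567.91, 605.04, 597.31, 621.42]
REGRESSION = [589.45, 521.23, 512.87, 639.12, 567.34, 807.23, 529.12, 580.11, 571.45, 651.34,
              723.56, 587.23, 538.34, 587.34, 636.67, 701.21, 567.34, 681.34, 578.34, 589.87,
              678.19, 645.45, 589.89, 571.23, 598.27, 541.23, 564.71, 612.96, 599.87, 619.36]
MLP_8_4_1 = [588.12, 516.45, 510.45, 640.31, 579.78, 805.81, 535.55, 578.11, 569.22, 653.12,
             719.38, 581.33, 545.33, 585.56, 635.45, 701.24, 569.54, 679.89, 580.77, 593.67,
             681.34, 655.45, 596.12, 572.11, 601.81, 543.67, 565.01, 611.93, 595.24, 620.19]

def error_summary(true, predicted):
    """Signed sum, absolute sum and mean absolute error, with error = true - predicted."""
    errors = [t - p for t, p in zip(true, predicted)]
    signed = sum(errors)
    absolute = sum(abs(e) for e in errors)
    return round(signed, 2), round(absolute, 2), round(absolute / len(errors), 2)

for name, pred in [("regression", REGRESSION), ("MLP 8-4-1", MLP_8_4_1)]:
    print(name, error_summary(TRUE, pred))
```

The mean absolute error is simply the absolute sum divided by n, so it ranks the two models the same way while staying comparable across groups of different size.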
These findings confirm the conclusions formulated by Edelmann-Nusser, Hohmann, and Henneberg (2001) and point to the applicability of perceptron networks in predicting swimming performance. They demonstrated that nonlinear mathematical models based on artificial neural networks can be applied to predict the performance of elite swimmers at the Olympic Games in Sydney. Their data included results from the previous nineteen years, and their neural model, with a 9-2-1 structure, was very similar to the one presented in this study. Differences in input variables and hidden layers may result from differences in the swimmers' level of training and experience.
The results indicate that neural models are a useful and potentially superior alternative to the widely applied regression models, offering better optimisation possibilities for predicting sports results and for athlete recruitment and selection. The analyses and comparisons allowed the construction of realistic neural network models of sports results, which predicted future performance reasonably accurately from prior measures of performance and body characteristics.
The present models were based on objectively measured values from the samples. Using the Run One-off Case module (Haykin, 1994; Lula & Tadeusiewicz, 2001; Edelmann-Nusser, et al., 2001; Maszczyk, et al., 2011), the competitive swimmers' sports results after one year of training showed good agreement with the models' predictions. The multilayer perceptron (MLP) module allows modeling based on data entered from a keyboard: with the Run One-off Case option, one can modify input values and predict the resulting outcomes for competitive swimmers. Further research could refine and extend this approach to other swimming events and to athletes of different ages and sport levels.
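The Run One-off Case workflow amounts to re-running a fitted model on a single, edited input record. A minimal sketch, assuming any fitted model can be wrapped as a function from named inputs to a predicted time; the function one_off_case, the inputs x1 and x2, and the toy model's coefficients are hypothetical and represent neither the study's models nor the software's interface.

```python
from typing import Callable, Dict

Model = Callable[[Dict[str, float]], float]

def one_off_case(model: Model, baseline: Dict[str, float], **changes: float) -> float:
    """Predict one case after overriding selected inputs of a baseline record."""
    unknown = set(changes) - set(baseline)
    if unknown:
        raise KeyError(f"unknown input(s): {sorted(unknown)}")
    case = {**baseline, **changes}  # copy; the baseline record is left unchanged
    return model(case)

def toy_model(case: Dict[str, float]) -> float:
    # HYPOTHETICAL linear model for illustration only, not a fitted study model.
    return 30.0 + 0.5 * case["x1"] - 0.2 * case["x2"]

baseline = {"x1": 10.0, "x2": 5.0}
print(one_off_case(toy_model, baseline))            # 34.0
print(one_off_case(toy_model, baseline, x1=12.0))   # 35.0
```

Rejecting unknown input names guards against silently predicting from a mistyped variable; sweeping one input over several values is just a loop of such calls.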
References
Alfredson, H., Nordstrom, P., Pietila, T., & Lorentzon, R. (1999) Bone mass in the calcaneus after heavy loaded eccentric calf-muscle training in recreational athletes with chronic Achilles tendinosis. Calcified Tissue International, 64, 450-455.
Bartlett, R. M. (2006) Artificial intelligence in sports biomechanics: new dawn or false hope? Journal of Sports Science and Medicine, 5, 23-42.
Barton, G., & Lees, A. (1993) Development of a connectionist expert system to identify foot problems based on under foot pressure patterns. Clinical Biomechanics, 10, 31-39.
Benigno, P., & Woodford, M. (2006) Optimal taxation in an RBC model: a linear-quadratic approach. Journal of Economic Dynamics and Control, 30(9-10), 1445-1489.
Bompa, T. O. (2000) Training guidelines for young athletes. Champaign, IL: Human Kinetics. Pp. 1-20.
Counsilman, J. (1977) Swimming power. Swimming World, 10, 33-41.
Deurenberg, P., Weststrate, J. A., & Seidell, J. C. (1991) Body Mass Index as a measure of body fatness: age- and sex-specific prediction formulas. British Journal of Nutrition, 65, 105-114.
Edelmann-Nusser, J., Hohmann, A., & Henneberg, B. (2001) Prediction of the Olympic competitive performance in swimming using neural networks. In J. Mester, G. King, H. Strüder, E. Tsolakidis, & A. Osterburg (Eds.), Book of Abstracts of the 6th Annual Congress of the European College of Sport Science, Cologne: Nice, France: European College of Sport Science. P. 328.
Edelmann-Nusser, J., Hohmann, A., & Henneberg, B. (2002) Modeling and prediction of competitive performance in swimming upon neural networks. European Journal of Sport Science, 2, 1-10.
Hamilton, D. (2009) Pedagogy and the long course of learning. Pedagogy, Culture, & Society, 17, 115-121.
Hannula, D., & Thornton, N. (2001) The swim coaching bible. Vol. 1. Champaign, IL: Human Kinetics.
Haykin, S. (1994) Neural networks: a comprehensive foundation. New York: Macmillan.
Jones, M. K., Padilla, O. R., & Zhu, E. (2010) Survivin is a key factor in the differential susceptibility of gastric endothelial and epithelial cells to alcohol-induced injury. Journal of Physiology and Pharmacology, 61, 253-264.
Kothari, C. R. (2004) Research methodology: methods and techniques. New Delhi: New Age International. Pp. 55-152.
Laughlin, T., & Delves, J. (2004) Total immersion: the revolutionary way to swim better, faster, and easier. New York: Fireside. Pp. 12-34.
Lees, A. (2002) Technique analysis in sports: a critical review. Journal of Sports Sciences, 20, 813-828.
Lula, P., & Tadeusiewicz, R. (2001) STATISTICA Neural Networks. Krakow, Poland: StatSoft.
Maier, K., Wank, V., Bartonietz, K., & Blickhan, R. (2000) Neural network based models of javelin flight: prediction of flight distances and optimal release parameters. Sports Engineering, 3, 57-63.
Maszczyk, A., Zając, A., & Ryguła, I. (2011) A neural network model approach to athlete selection. Sports Engineering, 13(2), 83-93.
Mester, J., & Perl, J. (1999) Unconventional simulation and empirical evaluation of biological response to complex high training loads. In P. Parisi, F. Pigozzi, & G. Prinzi (Eds.), Sport science '99 in Europe. Pp. 163-175.
Roczniok, R., & Ryguła, I. (2007) The use of Kohonen's neural networks in the recruitment process for sport swimming. Journal of Human Kinetics, 17, 75-88.
Szaleniec, M., Tadeusiewicz, R., & Witko, M. (2008) The selection of optimal neural models for forecasting of biological activity of chemical compounds. Neurocomputing, 72, 241-256.
Szaleniec, M., Witko, M., Tadeusiewicz, R., & Goclon, J. (2006) Application of artificial neural networks and DFT-based parameters for prediction of reaction kinetics of ethylbenzene dehydrogenase. Journal of Computer-Aided Molecular Design, 20, 145-157.
Werbos, P. (1994) How we cut prediction error in half by using a different training method. World Congress on Neural Networks, San Diego, Vol. 1, 225-236.
Woodman, T., Zourbanos, N., Hardy, L., Beattie, S., & McQuillan, A. (2010) Do performance strategies moderate the relationship between personality and training behaviors? An exploratory study. Journal of Applied Sport Psychology, 22, 183-197.
Zadeh, L. (2002) From computing with numbers to computing with words. International Journal of Applied Math and Computer Science, 12, 4-12.
Zehr, E. P. (2005) Neural control of rhythmic human movement: the common core hypothesis. Exercise & Sport Sciences Reviews, 33, 54-60.

Accepted April 25, 2012.
