1. Introduction
The Asian Institute of Technology is an autonomous postgraduate institution with a highly international student body. The institute has an enrollment of 2000 students from 45 different countries, with the majority nationality comprising only 35% of the total student body. Each year we receive applications from students who have completed previous bachelor's or master's degrees at any one of approximately 600 different institutions. This extreme diversity in applicants for admission makes accurate evaluation a highly challenging task.

The evaluation of applications has traditionally been performed by faculty members who have some degree of knowledge of each particular country's educational system. Unfortunately, it is not always possible to find a faculty member in each program who has good knowledge of each country's system of higher education. We were thus motivated to produce a decision support system to provide a more methodical approach.

Research on academic performance [3,4] suggests using student outcome as a good basis to assess applicants' qualifications. A performance prediction model can be built by applying data mining to available admission and graduation grade point average data. Fortunately, AIT has a large database of information on past and current applicants. We decided to make use of this information by applying data mining techniques to predict student performance at AIT based on the information contained in the application.

Decision support systems have been built to help advisors instruct students in choosing suitable courses and appropriate study plans [1,2]. Previous work on student performance prediction has used logistic regression to examine the impact of various factors on student performance [1]. Bekele and Menzel [5] used Bayesian networks to predict mathematics performance of high school students. Their model categorized students into three categories: below satisfactory, satisfactory, and above satisfactory. The work reported in the present paper differs from theirs in the highly international nature of the applicant pool and the more fine-grained prediction.

The rest of this paper is organized as follows. Section 2 provides background on Bayesian networks and the methodology for their application to this problem. Section 3 presents empirical evaluation of the prediction model. The paper ends with conclusions and a discussion of ongoing and future work.

2. Methodology
A Bayesian network [6] is a graphical representation of a probability distribution. It is a directed acyclic graph in which nodes represent random variables and links represent probabilistic influences between the variables. Probabilistic dependence and independence are expressed by the presence or lack of paths between nodes in the graph. The fact that probabilistic dependence is encoded in the network topology in this way permits probability distributions over large numbers of random variables to be compactly represented and permits calculations to be performed efficiently. Due to the inherent uncertainty of the performance prediction problem, we chose to use Bayesian networks for the modeling task. Using a probabilistic model has the advantage that it can later become a component of a higher level optimization model.

At AIT, coursework is generally completed within the first year of study, so we used the grade point average (GPA) accumulated after the first year as the dependent variable. The numeric GPA at AIT ranges from 0 to 4.0, which is also translated into a letter grade with values of A, B+, B, C+ and Fail (C or below). The number of students classified as Fail is low due to an institute policy of continuous review of students and either special tutoring for or dismissal of students with low grades.

Based on research on student performance [3, 5] and available attributes in the admission data, ten attributes were chosen as predictors of performance: age, gender, marital status, nationality, English test score, institute of the previous degree, major of previous degree, GPA of the previous degree, field of study and degree program (master's or doctoral) to which the applicant is applying. The attributes major and field of study have a large number of values, so they were processed by clustering the values into groups of similar majors and similar fields of study.

The attributes nationality and institute of the previous degree have a large number of values (86 and 1707, respectively) without any intrinsic meaning. We thus decided to transform them into more meaningful values.

The socio-economic environment can play a major role in the performance of students. We therefore used the World Bank classification of countries according to their Gross National Income (GNI) to group countries into the GNI categories LIC (Low Income), LMC (Lower Middle Income), UMC (Upper Middle Income), NOC (High Income, non-OECD) and OEC (High Income, OECD).

The most important factor concerning the university of the previous degree is the quality of the academic programs there. We gauged this by correlating the GPA of the previous degree with that obtained at AIT. If students consistently enter AIT with somewhat average grades from a particular university but graduate with high grades, we take this as evidence of the high quality of that institution. We rated institutions on a scale from 0 to 10.

Experimentation with a number of different Bayesian network structures showed that the simple Naïve Bayes model shown in Figure 2.2 produced the best results. Predictor attributes are displayed along the sides and top of the graph. The model contains two nodes for the predicted variable CGPA. The one from which the edges to the other attributes emanate is a discrete-valued node and the other with the double edge n
