Feature Selection in a Fuzzy Student Sectioning Algorithm
Mahmood Amintoosi1 and Javad Haddadnia2
Mathematics Department, Sabzevar Teacher Training University, Iran, 397
Engineering Department, Sabzevar Teacher Training University, Iran, 397
{amintoosi, haddadnia}@sttu.ac.ir
Abstract. In this paper a new student sectioning algorithm is proposed.
The method uses fuzzy clustering, a fuzzy evaluator and a novel feature
selection method. Each student has a feature vector whose elements are
the courses he or she has taken. The best features are selected
for sectioning by removing those courses that the most or the
fewest students have taken. The Fuzzy c-Means classifier then
classifies the students. After that, a fuzzy function evaluates the produced
clusters based on two criteria: the balance of the sections and the similarity
of students’ schedules within each section. These are used as linguistic
variables in a fuzzy inference engine. The selected features determine the best
student sections. Simulation results show that sectioning performance improves
by about 18% in comparison with considering all of the features;
the method thus not only reduces the number of feature vector elements but
also increases the computing performance.
Introduction
The course timetabling problem essentially involves the assignment of weekly
lectures to time periods and lecture rooms in such a way that a set of constraints
is satisfied. Current methods for tackling timetabling problems include evolutionary
algorithms [14], [15], [17], [29], [30], genetic algorithms [12], [13], [18], [19], [20],
graph-based methods [10], [11], [35], simulated annealing [2], [3], tabu search [24],
[33], [34], neural networks [26], hybrid and heuristic approaches [6], [7], [28], [38],
constraint logic programming [1], [9], [21], [23], [32], ant colony optimization [36]
and fuzzy expert systems [5].
A particular problem related to timetabling is student sectioning. This problem
arises for courses that involve a large number of students. For a variety
of reasons, splitting these students into a few smaller sections is desirable, for
example:
1. Room capacity requirements: the number of students in a course may be
greater than the capacity of every room.
2. The policies of the institution: some institutions have rules about maximum
capacity of courses (e.g. 50 for specialized courses and 60 for public courses).
3. A good student sectioning may reduce the number of edges in the conflict
matrix [16].
E. Burke and M. Trick (Eds.): PATAT 2004, LNCS 3616, pp. 147–160, 2005.
© Springer-Verlag Berlin Heidelberg 2005
M. Amintoosi and J. Haddadnia
Most previous works related to the course scheduling problem have concentrated on timetable construction, with little regard to student sectioning.
Selim [35] introduced the idea of split vertices and began to determine
which vertices should be split so that the chromatic number is
reduced. Selim treated the problem as a conflict graph and showed how one could
pick out certain vertices in the conflict graph to split, reducing the chromatic
number of the graph to the desired value by increasing the number of sections
in the timetable. With this idea Selim decreased the chromatic number of the
conflict graph from 8 to 3, thus reducing the total number of periods needed.
The main algorithm of Laporte and Desroches [25] has the following stages:
1. Constructing student schedules without taking into account section enrollments and room capacities.
2. Balancing section enrollments.
3. Respecting room capacities.
One of its interesting features is the weight it gives to the overall quality of
student schedules.
Aubin and Ferland [6] generate an initial timetable with an assignment of
the students to the course sections; then an iterative procedure is used which
adjusts the timetable and the grouping successively until no more improvement
of the objective function can be obtained. At each iteration, two procedures are
used:
1. Given the grouping generated during the preceding iteration, the timetable
is modified to reduce the number of conflicts.
2. With this timetable, the grouping of students is modified to reduce the number of conflicts.
Hertz [24] used a tabu search technique for both timetabling and sectioning
problems. He assumed that the number of students in each section is fixed.
The neighborhood N(s) of a solution s consists of all those groupings which can
be obtained from s by exchanging two students from two different sections of
a course.
The initial student sectioning of Muller and Rudova [27] is based on Carter’s
[16] homogeneous sectioning and it is intended to minimize future student conflicts. They attempt to improve the solution with respect to the number of
student conflicts. This is achieved via section changes during the search. Each
student enrollment in a course with more than one section was processed. An
attempt was made to switch it with a student enrollment from a different section.
If this switch decreased the total number of student conflicts, it was applied.
Amintoosi et al. [4] introduced a fuzzy sectioning method which decreases
the average number of conflicts in a genetic timetabling program.
In this paper we concentrate on initial student sectioning prior to timetabling.
A new student-sectioning algorithm is proposed. In the proposed method a fuzzy
clustering, a fuzzy evaluator and a novel feature selection method are used.
Feature Selection in a Fuzzy Student Sectioning Algorithm
A Fuzzy c-Means algorithm classifies the students of a large class into smaller
sections. Each student has a feature vector in the fuzzy classifier. The courses taken
by each student are its feature elements.
The produced clustering is evaluated with a fuzzy function according to
two criteria: the sizes of the clusters and the similarity of students’ schedules
within each section. These parameters are used in a fuzzy inference engine as
linguistic variables for clustering evaluation.
By removing those courses that have been taken by the most or by the fewest
students, the best features (courses) are selected. Appropriate values
for the most and the fewest thresholds are determined with an iterative procedure.
In each iteration, courses that have either of the following properties are
removed:
– courses which have been taken by a high percentage of students, greater than
a specified threshold, and
– those that have been taken by a low percentage of students, lower than
another threshold.
The Fuzzy c-Means classifier then classifies the students using the remaining
courses. The produced clusters are evaluated by the fuzzy function mentioned
above. The best classification of students is available at the end of this loop.
Simulation results in the average case show that about 53% of courses are
essential for clustering and with these selected courses the clustering performance
would be about 18% more efficient.
The remainder of this paper is organized as follows. Section 2 describes the
Fuzzy c-Means clustering algorithm. In Section 3, the proposed method is explained
in more detail and in Section 4 simulation results are considered. Section
5 concludes.
Fuzzy c-Means Clustering
Fuzzy c-Means (FCM) is a data clustering algorithm in which each data point
is associated with a cluster through a membership degree. Most analytical fuzzy
clustering approaches are derived from Bezdek’s FCM [8], [31]. This technique
partitions a collection of $N_T$ data points into $r$ fuzzy groups and finds
a cluster center in each group, such that a cost function of a dissimilarity
measure is minimized [22]. The algorithm employs fuzzy partitioning such that a
given data point can belong to several groups with a degree specified by
membership grades between 0 and 1. A fuzzy $r$-partition of the input feature vectors
$X = \{x_1, x_2, \ldots, x_{N_T}\} \subset \mathbb{R}^n$ is represented by a matrix $U = [\mu_{ik}]$, where the
entries satisfy the following constraints:
$$\mu_{ik} \in [0, 1], \quad 1 \le i \le r, \; 1 \le k \le N_T \tag{1}$$
$$\sum_{i=1}^{r} \mu_{ik} = 1, \quad 1 \le k \le N_T \tag{2}$$
$$0 < \sum_{k=1}^{N_T} \mu_{ik} < N_T, \quad 1 \le i \le r. \tag{3}$$
$U$ can be used to describe the cluster structure of $X$ by interpreting $\mu_{ik}$ as
the degree of membership of $x_k$ to cluster $i$. A proper partition $U$ of $X$ may be
defined by the minimization of the following objective function:
$$J_m(U, C) = \sum_{k=1}^{N_T} \sum_{i=1}^{r} (\mu_{ik})^m d_{ik}^2 \tag{4}$$
where $m \in [1, +\infty)$ is a weighting exponent called the fuzzifier, $C = \{c_1, c_2, \ldots, c_r\}$
is the vector of the cluster centres, and $d_{ik}$ is the distance between $x_k$ and the
$i$th cluster centre. Bezdek proved that if $m > 1$ and $d_{ik}^2 > 0$ for $1 \le i \le r$, then $U$ and $C$
minimize $J_m(U, C)$ only if their entries are computed as follows:
$$\mu^*_{ik} = \frac{1}{\sum_{j=1}^{r} \left( d_{ik} / d_{jk} \right)^{2/(m-1)}} \tag{5}$$
$$c^*_i = \frac{\sum_{k=1}^{N_T} (\mu_{ik})^m x_k}{\sum_{k=1}^{N_T} (\mu_{ik})^m} \tag{6}$$
One of the major factors that influences the determination of appropriate
clusters of points is the dissimilarity measure chosen for the problem. Indeed,
the computation of the membership degrees $\mu^*_{ik}$ depends on the definition of the
distance measure $d_{ik}$, which is an inner-product-induced norm (quadratic norm) on
$\mathbb{R}^n$. The squared quadratic norm (distance) between a pattern vector $x_k$ and
the center $c_i$ of the $i$th cluster is defined as follows:
$$d_{ik}^2 = \|x_k - c_i\|_G^2 = (x_k - c_i)^T G (x_k - c_i) \tag{7}$$
where $G$ is any positive definite $(n \times n)$ matrix. The identity matrix is the
simplest and most popular choice of $G$.
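As a small illustration of Equation (7), the following sketch computes the squared quadratic-norm distance in plain Python. The function name `squared_distance` and the list-of-rows layout for G are assumptions made for this example only; passing no G stands for the identity matrix, in which case the result is the squared Euclidean distance.

```python
def squared_distance(x, c, G=None):
    """Squared quadratic-norm distance (x - c)^T G (x - c).

    x, c: sequences of equal length n; G: n x n matrix given as a list of rows.
    G=None is treated as the identity matrix (an assumption of this sketch).
    """
    diff = [xi - ci for xi, ci in zip(x, c)]
    if G is None:
        return sum(d * d for d in diff)
    # Compute G * diff first, then take the inner product with diff.
    g_diff = [sum(g * d for g, d in zip(row, diff)) for row in G]
    return sum(d * gd for d, gd in zip(diff, g_diff))
```

For binary feature vectors and G equal to the identity, this distance simply counts the courses in which two vectors differ.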
The FCM algorithm consists of a series of iterations alternating between
Equations (5) and (6). This algorithm converges to either a local minimum or a
saddle point of Jm (U, C). FCM is used to determine the cluster centers ci and
the membership matrix U for a given r value as follows:
Step 1: Initially the membership matrix is constructed using random values
between 0 and 1, such that constraints (1)–(3) are satisfied.
Step 2: For each cluster $i$ ($i = 1, 2, \ldots, r$) the fuzzy cluster centres $c_i$ are
calculated using Equation (6).
Step 3: For each cluster $i$, the distance measures $d_{ik}$ are computed using
Equation (7).
Step 4: The cost function in Equation (4) is computed. If it is
below a certain tolerance value, or if its improvement over the previous
iteration ($dJ_m$) is below a certain threshold, the clustering
procedure is terminated.
Step 5: A new $U$ is computed using Equation (5) and steps 2–5 are repeated.
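The five steps above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' Matlab implementation: the function name `fcm`, the default parameter values, the use of one tolerance `tol` for both stopping tests, and the handling of points that coincide with a centre are all assumptions of this sketch. The identity matrix is used for G.

```python
import random

def fcm(points, r=2, m=2.0, tol=1e-6, max_iter=100, seed=0):
    """Fuzzy c-Means sketch: returns (U, centres, J).

    U[i][k] is the membership of point k in cluster i.
    """
    rng = random.Random(seed)
    n_pts, dim = len(points), len(points[0])
    # Step 1: random memberships, normalised so each column sums to 1.
    U = [[rng.random() + 1e-9 for _ in range(n_pts)] for _ in range(r)]
    for k in range(n_pts):
        col = sum(U[i][k] for i in range(r))
        for i in range(r):
            U[i][k] /= col
    prev_J = None
    for _ in range(max_iter):
        # Step 2: cluster centres, Equation (6).
        centres = []
        for i in range(r):
            w = [U[i][k] ** m for k in range(n_pts)]
            total = sum(w)
            centres.append([sum(w[k] * points[k][j] for k in range(n_pts)) / total
                            for j in range(dim)])
        # Step 3: squared distances, Equation (7) with G = I.
        d2 = [[sum((p - c) ** 2 for p, c in zip(points[k], centres[i]))
               for k in range(n_pts)] for i in range(r)]
        # Step 4: cost function, Equation (4), and the two stopping tests.
        J = sum(U[i][k] ** m * d2[i][k] for i in range(r) for k in range(n_pts))
        if J < tol or (prev_J is not None and abs(prev_J - J) < tol):
            break
        prev_J = J
        # Step 5: new memberships, Equation (5).
        for k in range(n_pts):
            zeros = [i for i in range(r) if d2[i][k] == 0.0]
            for i in range(r):
                if zeros:
                    # Point sits exactly on a centre: share membership there.
                    U[i][k] = 1.0 / len(zeros) if i in zeros else 0.0
                else:
                    # (d_ik / d_jk)^(2/(m-1)) written in terms of squared distances.
                    U[i][k] = 1.0 / sum((d2[i][k] / d2[j][k]) ** (1.0 / (m - 1))
                                        for j in range(r))
    return U, centres, J
```

A hard sectioning can then be read off by assigning each student to the cluster with the largest membership. Because the result depends on the random initial matrix, different seeds may converge to different local minima.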
By using the above fuzzy clustering procedure, the students in large classes
are divided into r clusters. Clustering evaluation is done with a fuzzy function.
This fuzzy evaluation is based on a fuzzy inference engine. In the next section
the proposed method is explained.
The Proposed Method
The aim is to allocate students of a course into smaller sections, satisfying the
following criteria:
1. Student course selections must be respected.
2. Section enrollments should be balanced, i.e. all sections of the same course
should have roughly the same number of students.
3. Section capacities and institutional policies should not be violated.
4. Student schedules within each section should be as similar to each other
as possible.
A fuzzy c-Means algorithm is used for student sectioning [4]. This algorithm
satisfies criterion 1. Other criteria are evaluated at the evaluation phase. A fuzzy
inference engine evaluates the produced clustering. With a well-defined set of
rules [4], the criteria 2–4 are considered.
The best feature elements are obtained by removing those courses that have
been taken by the most or the fewest students. The simulation results
show that about 53% of the features are important for classification; in addition, the
clustering performance with the selected features is better than the performance
when all features are considered.
The proposed algorithm contains three basic parts as follows:
– method of data representation,
– fuzzy clustering and fuzzy evaluation,
– feature selection method.
The following sections explain the basic parts of the proposed method.
Data Representation
In the proposed method each student has a feature vector. The courses taken by
each student are its feature elements, represented by a bit array. Suppose that
P is the number of all courses and Vi is the list of taken courses by student i.
As shown in Equation (8), Vi is the feature vector of the ith object (student):
$$V_i = (V_{i1}, \ldots, V_{iP}), \qquad V_{ij} = \begin{cases} 1 & \text{if student } i \text{ has taken course } j \\ 0 & \text{otherwise.} \end{cases} \tag{8}$$
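The bit-array representation of Equation (8) can be illustrated with a short Python sketch. The function name `feature_vectors` and the enrolment data below are hypothetical and invented for illustration; they are not the data of Table 1.

```python
def feature_vectors(courses_taken, all_courses):
    """Build the bit-array feature vector V_i of Equation (8) for each student."""
    return [[1 if course in taken else 0 for course in all_courses]
            for taken in courses_taken]

# Hypothetical enrolment data, invented for illustration only.
all_courses = ["A", "B", "C"]
courses_taken = [{"A", "B"}, {"A", "B"}, {"A", "C"}]
vectors = feature_vectors(courses_taken, all_courses)
```

Each row of `vectors` is one student and each column one course, so the list can be passed directly to a clustering routine that expects one feature vector per object.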
Table 1 shows an example for five students. Column 2 lists the courses, out of
three courses in total, taken by each student. Column
3 shows the corresponding feature vector for each student. If course A is to be
sectioned, students 1 and 2 will be in Section 1, and the others will be in
Section 2. Column 4 displays the sectioning results.
Table 1. List of courses taken by five students and their corresponding feature vectors
Student | Courses taken | Feature vector | Section no.
Fuzzy Clustering and Fuzzy Evaluation
In our algorithm Fuzzy c-Means is used as the classifier. The input of the
classifier is the students’ feature vectors, as explained in the previous section.
For simplicity, the number of clusters is assumed to be 2.
Clustering evaluation is done with a fuzzy function. Its inputs are the rates of
section balancing and of similarity of students’ schedules in each section
(criteria 2 and 4), and its output is the clustering performance. Two linguistic
variables, “Density” and “N1PerN2” (N1/N2), are defined as inputs of a fuzzy inference
engine. The density of the clusters is the sum of the numbers of common courses
over all student pairs (a, b) such that students a and b lie in the same section.
Dividing this sum by its maximum value normalizes “Density”.
“N1PerN2” represents the sections’ balancing rate. N1 is taken to be
the size of the smaller section, and hence the range of this variable
is between 0 and 1. Since the number of students in each section should be as
equal as possible, the suitable values of N1PerN2 are close to 1. Figure 1 shows
the membership functions of these variables.
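The two inputs of the evaluator can be sketched as follows for a two-section clustering. The function names are hypothetical. The text does not state what the maximum value used to normalize “Density” is; this sketch assumes it is the number of same-section pairs multiplied by the number of courses, which should be read as an assumption rather than the authors' definition.

```python
from itertools import combinations

def density(vectors, labels):
    """Normalised count of courses shared by same-section student pairs.

    Assumption: the normalising maximum is (number of same-section pairs)
    x (number of courses); the paper does not spell this out.
    """
    shared, pairs = 0, 0
    for a, b in combinations(range(len(vectors)), 2):
        if labels[a] == labels[b]:
            pairs += 1
            # A course is common to a and b when both bits are 1.
            shared += sum(x & y for x, y in zip(vectors[a], vectors[b]))
    n_courses = len(vectors[0])
    return shared / (pairs * n_courses) if pairs and n_courses else 0.0

def n1_per_n2(labels):
    """Size of the smaller of two sections divided by the size of the larger."""
    n_first = sum(1 for s in labels if s == labels[0])
    n_second = len(labels) - n_first
    small, large = sorted((n_first, n_second))
    return small / large if large else 0.0
```

Both functions return values in [0, 1], matching the ranges described above for the two linguistic variables.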
The output of the fuzzy inference engine is named “Performance” and has
the following values: Bad, NotBad, Medium, Good and Excellent. Our Fuzzy
rules were defined as follows:
Fuzzy Rules
Rule 1:
if (Density is High) and (N1PerN2 is Suitable) then (Performance is Excellent)
Rule 2:
if (Density is High) and (N1PerN2 is Middle) then (Performance is Good)
Rule 3:
if (Density is High) and (N1PerN2 is UnSuitable) then (Performance is Bad)
Rule 4:
if (Density is Med) and (N1PerN2 is Suitable) then (Performance is Good)
Rule 5:
if (Density is Med) and (N1PerN2 is Middle) then (Performance is Medium)
Rule 6:
if (Density is Med) and (N1PerN2 is UnSuitable) then (Performance is Bad)
Fig. 1. Membership functions of our linguistic variables: “Density” (upper),
“N1PerN2” (lower)
Rule 7:
if (Density is Low) and (N1PerN2 is Suitable) then (Performance is NotBad)
Rule 8:
if (Density is Low) and (N1PerN2 is Middle) then (Performance is Bad)
Rule 9:
if (Density is Low) and (N1PerN2 is UnSuitable) then (Performance is Bad)
The rules are defined such that the influence of unsuitable section sizes is
greater than the effect of the similarity of students’ schedules. Rules 3, 6 and 9
reflect this. Hence the decision surface of our rule base, shown in Figure 2, is
asymmetric.
Fig. 2. Decision surface of our rules’ database
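The nine rules can be evaluated with a short Python sketch. Only the rule table is taken from the text. The triangular membership functions, the crisp output levels and the weighted-average defuzzification are assumptions introduced here, since the membership functions of Figure 1 and the inference method are not reproduced in this text; all names (`tri`, `DENSITY`, `BALANCE`, `OUTPUT`, `performance`) are hypothetical.

```python
def tri(x, a, b, c):
    """Triangular membership with feet a, c and peak b (a shoulder if a == b or b == c)."""
    if x <= a:
        return 1.0 if a == b else 0.0
    if x >= c:
        return 1.0 if b == c else 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# Assumed membership functions on [0, 1]; not the ones of Figure 1.
DENSITY = {"Low": (0.0, 0.0, 0.5), "Med": (0.0, 0.5, 1.0), "High": (0.5, 1.0, 1.0)}
BALANCE = {"UnSuitable": (0.0, 0.0, 0.5), "Middle": (0.0, 0.5, 1.0),
           "Suitable": (0.5, 1.0, 1.0)}
# Assumed crisp level for each output term.
OUTPUT = {"Bad": 0.0, "NotBad": 0.25, "Medium": 0.5, "Good": 0.75, "Excellent": 1.0}

# Rules 1-9 from the text: (Density term, N1PerN2 term) -> Performance term.
RULES = {
    ("High", "Suitable"): "Excellent", ("High", "Middle"): "Good",
    ("High", "UnSuitable"): "Bad", ("Med", "Suitable"): "Good",
    ("Med", "Middle"): "Medium", ("Med", "UnSuitable"): "Bad",
    ("Low", "Suitable"): "NotBad", ("Low", "Middle"): "Bad",
    ("Low", "UnSuitable"): "Bad",
}

def performance(density, balance):
    """Weighted average of rule outputs; min() plays the role of 'and'."""
    num = den = 0.0
    for (d_term, b_term), out in RULES.items():
        w = min(tri(density, *DENSITY[d_term]), tri(balance, *BALANCE[b_term]))
        num += w * OUTPUT[out]
        den += w
    return num / den if den else 0.0
```

Under these assumed shapes, a perfectly dense but completely unbalanced clustering scores lower than a perfectly balanced one with no shared courses, which mirrors the asymmetry expressed by Rules 3, 6 and 9.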
Feature Selection Method
Feature selection plays an important role in classification problems. It not
only reduces the dimension of the feature vector but also reduces
the complexity of the classifier. The most important rule in feature selection is
to reduce the number of feature elements as far as possible while preserving
their class discrimination [37].
As can be seen in Table 1, the first feature (course A) is common to
all vectors (students). Removing it should not influence the clustering results.
In our problem it seems that removing the following courses is a good idea:
1. courses that have been taken by the most students;
2. courses that have been taken by the fewest students.
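The observation about course A can be checked against Equations (6) and (7). The following worked step is an illustrative addition, assuming G is the identity matrix; the index $j^*$ for a course taken by every student is notation introduced here.

```latex
% Assumption: course j^* is taken by every student, so x_{kj^*} = 1 for all k.
c_{ij^*} = \frac{\sum_{k=1}^{N_T} (\mu_{ik})^m \cdot 1}{\sum_{k=1}^{N_T} (\mu_{ik})^m} = 1
\quad\Longrightarrow\quad
(x_{kj^*} - c_{ij^*})^2 = 0 \quad \text{for all } i, k .
```

Such a course therefore contributes nothing to any $d_{ik}^2$, and so leaves the memberships of Equation (5) unchanged.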
Courses that none of the students or all of them have taken are removed
in a pre-processing stage. One problem is to specify appropriate
threshold values for the most and the fewest parameters. An exhaustive search
procedure finds them as follows. The percentage of students that have taken each
course is determined; the set of these values is finite, and the number of
distinct values is at most the number of all courses. If P is the number of all courses
taken by the students of the class, a procedure with worst-case time complexity $O(P^2)$
will find the appropriate values for the most and the fewest. Before entering the
loop, the percentage of students that have taken each course, and the percentage
that have not taken each course, are determined. Each value in these two lists
can be a threshold for the most ($T_1$) and the fewest ($T_2$) parameters, respectively.
In each iteration, those courses for which the percentage of students that have
taken them is greater than $T_1$, or for which the percentage of students that have
not taken them is greater than $T_2$, are removed.
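The removal rule for one pair of thresholds can be sketched as follows. The function name `select_features` is hypothetical, and fractions in [0, 1] are used in place of percentages.

```python
def select_features(vectors, t1, t2):
    """Indices of the courses kept for thresholds t1 (most) and t2 (fewest).

    A course is dropped when the fraction of students who took it exceeds t1,
    or when the fraction who did not take it exceeds t2.
    """
    n = len(vectors)
    kept = []
    for j in range(len(vectors[0])):
        taken = sum(v[j] for v in vectors) / n
        if taken > t1 or (1.0 - taken) > t2:
            continue
        kept.append(j)
    return kept
```

For example, with four students and `t1 = 0.9`, `t2 = 0.7`, a course taken by all four is dropped as too common and a course taken by only one is dropped as too rare, while courses taken by half the class are kept.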
The Fuzzy c-Means algorithm classifies students based on these selected
courses (features). After that, the fuzzy function evaluates the produced clusters
based on two criteria: balancing of sections and similarity of students’ schedules
within each section. After the last iteration the best classification of students
is obtained. The following pseudo-code illustrates the overall procedure. In this
algorithm Clustering() is a classifier function and ClusteringEvaluation() is an
evaluator function.
AllCourses = Set of all courses that students of this
             course have taken; // (all features)
List1 = Percentage of students that have taken each
        course, sorted in non-increasing order;
Thresholds1 = Distinct elements of List1;
List2 = Percentage of students that have not taken each
        course, sorted in non-increasing order;
Thresholds2 = Distinct elements of List2;
BestPerformance = lowest possible performance;
for i := 1 to length(Thresholds1) do
begin
  MaxSet = Set of courses for which the percentage of students
           that have taken them > Thresholds1[i];
  for j := 1 to length(Thresholds2) do
  begin
    MinSet = Set of courses for which the percentage of students
             that have not taken them > Thresholds2[j];
    SelectedFeatures = AllCourses – (MaxSet ∪ MinSet);
    Clusters = Clustering(SelectedFeatures);
    if ClusteringEvaluation(Clusters) is better than BestPerformance then
    begin
      BestPerformance = ClusteringEvaluation(Clusters);
      BestClusters = Clusters;
      T1 = Thresholds1[i];
      T2 = Thresholds2[j];
    end; // end if
  end; // end for j
end; // end for i
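The pseudo-code above can be sketched in Python as follows. The function name `best_sectioning` and the callback signatures are assumptions of this sketch: `cluster` and `evaluate` stand in for Clustering() and ClusteringEvaluation() and must be supplied by the caller. The text does not state whether the evaluator sees the full or the reduced feature vectors; this sketch passes the reduced ones. Threshold pairs that leave no course at all are skipped, which is also an assumption.

```python
def best_sectioning(vectors, cluster, evaluate):
    """Exhaustive search over threshold pairs (T1, T2).

    cluster(reduced_vectors) -> labels
    evaluate(reduced_vectors, labels) -> score (higher is better)
    Returns (score, labels, t1, t2, kept_indices), or None if nothing was tried.
    """
    n, p = len(vectors), len(vectors[0])
    # Fraction of students that have taken each course.
    taken = [sum(v[j] for v in vectors) / n for j in range(p)]
    thresholds1 = sorted(set(taken), reverse=True)
    thresholds2 = sorted({1.0 - t for t in taken}, reverse=True)
    best = None
    for t1 in thresholds1:
        for t2 in thresholds2:
            # Keep a course unless it is in MaxSet or MinSet.
            kept = [j for j in range(p)
                    if taken[j] <= t1 and (1.0 - taken[j]) <= t2]
            if not kept:
                continue  # nothing left to cluster on
            reduced = [[v[j] for j in kept] for v in vectors]
            labels = cluster(reduced)
            score = evaluate(reduced, labels)
            if best is None or score > best[0]:
                best = (score, labels, t1, t2, kept)
    return best
```

Passing the clustering and evaluation routines as arguments keeps the search loop independent of any particular classifier or rule base. The two nested loops run at most P × P times for P courses, in line with the $O(P^2)$ bound stated earlier.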
Simulation Results
The information used for simulation is taken from the Mathematics department
at Sabzevar University. Students and courses are randomly selected with the
following characteristics:
– total number of students: 210;
– number of courses: 38.
The simulator program has been written with Matlab on a Windows platform.
Figure 3(a) shows the feature vectors for a course with 67 students and a total
of 31 features. Figure 3(b) highlights the selected features. As can be seen
in the last row of Table 2 (related to this course), the number of selected
features (NSF column) is 16 out of 31. With these selected features the
performance is about 35% better (P1/P2 column) than with consideration of all
features.
At each iteration of the feature selection algorithm, performance is evaluated
with the selected features. The values of the fuzzy function’s inputs vary
with every set of selected courses (features). Figure 4 shows the input
and output values of the fuzzy evaluator function over a total of 179 iterations for the
course with 67 students mentioned above. The best solution is marked with a circle at
iteration no. 87 in Figures 4 and 5. The values of the “Density”, “N1PerN2”,
“Performance” and the number of selected features are 0.73, 0.97, 0.75 and 16
respectively. The first iteration (values: 0.78, 0.60, 0.56 and 31) corresponds with
the case in which all features have been considered in clustering. As can be seen,
performance increases by about 35% (0.75/0.56 = 1.35).
The simulation results on 12 courses are shown in Table 2. As can be seen,
the performance with the proposed method (P1) is always better than or equal
to the performance when all features are considered (P2) in clustering. All of
the features are required only in three examples (Examples 3, 10 and 11). In
the average case, 53% of features are required and performance is increased by
about 18%.
Fig. 3. (a) Feature vectors for a course with 67 students and 31 features (courses). (b)
The selected features (columns in the figure) are highlighted.
Fig. 4. The values of inputs and output of fuzzy evaluator function in the Feature
Selection Algorithm. Dashed, dotted and solid lines represent the value of “Density”,
“N1PerN2” and “Performance”, respectively. The best solution is shown with a circle
on iteration no. 87.
Fig. 5. The best number of selected features is 16, marked with a circle at
iteration no. 87. The pattern of the points reflects the outer and the inner loops in the
proposed feature selection algorithm.
Table 2. The simulation results for 12 courses. P1 is the best performance achieved
(with selected features), P2 is the performance taking into account all the courses. NSF
is the best number of selected features, NTF is the number of total features, T1 and
T2 are the values of the most and the fewest parameters.
LessonNo N1PerN2
1 0.44
0.79 0.33 0.95
1 0.97
1 0.93
1 0.64
0.18 0.5 0.57
1 0.71
0.32 0.33 0.77
1 0.71
1 0.97
1 0.98
0.52 0.25 0.9
Concluding Remarks
In this paper a novel course selection method and a fuzzy evaluation function for
the student sectioning problem are proposed. Our aim is to allocate students of
a course to smaller sections, prior to timetabling, so as to satisfy the following
criteria:
– student course selections must be respected,
– section enrollments should be balanced,
– section capacities and rules of institution should not be exceeded,
– student schedules in each section should be the same as each other (as far
as possible).
Firstly, with a Fuzzy c-Means algorithm, students in a large class have been
classified. Each student has a feature vector in the fuzzy classifier. The courses
taken by each student are his/her features. In contrast to the usual graph-based
sectioning, not only is the number of courses common to two students important,
but the number of courses taken by neither of them is also significant.
Secondly, the produced clusters are evaluated with a fuzzy function, according
to two criteria: balancing sections and students’ schedules similarity of each
section. By removing those courses that the most students or the fewest students
have taken, the best features (courses) are selected. Simulation results in the
average case show that about 53% of courses are essential for clustering. The
best classification of students corresponds with these selected features. Clustering
performance with these selected features is increased by about 18% compared
with considering all of the features.
