Automated Scoring System For Essays: Summary


Automated Scoring System for Essays

Members: P.Aruna, R.Dhivya Priya, R.Divya Harshini


Project Guide : Dr K S Eashwara Kumar

SUMMARY:
The objective of an automated essay scoring system is to assign scores to essays written in an
educational setting. It is a method of educational assessment and an application of natural language
processing. This system uses a word-based document vector construction method because it is versatile.
We adopt Content Vector Analysis (CVA) in preference to Latent Semantic Analysis (LSA), because the
SVD step in LSA has a higher-order algorithmic complexity of O(n^2k^3) and LSA requires words to
exhibit a normal distribution for good performance. CVA can be used in this case because the
distribution of words in corporate datasets can only be expected to be random.
Our system uses a model-based approach because a memory-based approach requires a large training
dataset. The model-based approach is also popular in text classification problems, where very
high-dimensional spaces are the norm. Compared with the memory-based approach, which has heavy space,
storage and training requirements, our model-based approach of calculating the deviation of the
examined essay from the ideally scored essays is preferred.
The system first evaluates text complexity features: the number of characters in the document
(Chars), the number of words in the document (Words), the number of different words (Diffwds), the
fourth root of the number of words in the document, as suggested by Page (Rootwds), the number of
sentences in the document (Sents), the average word length (Wordlen = Chars/Words), the average
sentence length (Sentlen = Words/Sents) and the number of words longer than five characters (BW5).
Each feature has its own use. For example, the number of words represents the length of the essay,
since the length requirement is, say, 250-300 words. This feature can detect an empty essay, or an
essay so short that it cannot be processed, and reject it immediately. Otherwise a score is assigned
accordingly. Once the essay passes the feature extraction process, it is checked for spelling
mistakes. The number of spelling mistakes is recorded and the errors are autocorrected. The essay is
then checked for grammatical mistakes, and a component of the score is assigned based on the result.
Next, stop words are removed and the essay is stemmed. The number of times a word occurs in a
document (tf) and the number of documents containing the word (df) are calculated. The inverse
document frequency (idf) is calculated from df, and the tf-idf weight is computed. Content Vector
Analysis is then carried out, and a score is assigned to the test essay based on its deviation from
the reference essay. The individual raw scores, namely from the feature extraction process, the
grammar/spell check process and the content vector analysis process, are taken and a weight is
assigned to each component. The scores are then subjected to regression techniques, from which the
final score is calculated. In this manner, we can ensure that essays are graded uniformly, equitably
and with less fatigue.
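
The tf, df and idf quantities described above can be illustrated with a short sketch. This is Python written for illustration only, not the project's own code. The function name `tfidf_weights` is hypothetical, and the formula idf = log(N/df) is an assumption: it is one common variant, and the summary does not state which one the system uses.

```python
import math
from collections import Counter

def tfidf_weights(documents):
    """Return one {term: tf-idf weight} dict per document.

    documents: list of token lists, assumed to be already
    stop-word filtered and stemmed.
    """
    n_docs = len(documents)
    # df: number of documents that contain each term at least once
    df = Counter()
    for tokens in documents:
        df.update(set(tokens))
    weights = []
    for tokens in documents:
        tf = Counter(tokens)  # raw count of each term in this document
        # Assumed weighting: tf * log(N / df)
        weights.append({term: count * math.log(n_docs / df[term])
                        for term, count in tf.items()})
    return weights
```

With this variant, a term that occurs in every document gets weight 0, so only terms that distinguish one essay from another contribute to the content vector.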

STATUS:

Completed after Review-I

Completed before Review-I


Completed for Review II ( 2nd online submission)

Completed for Review II

Modules:
GUI:
The user keys in the test essay to be evaluated. On clicking the SUBMIT button, the essay is
recorded in a file.
Special Case: If the user clicks the SUBMIT button without typing an essay, a prompt appears, the
first time only, asking the user to key in the essay and then press the SUBMIT button. If the user
presses the SUBMIT button again with the essay still empty, it is counted as no answer and the score
is 0.
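
The special case above amounts to a small piece of state: whether the user has already been warned. A minimal Python sketch of that logic follows, separate from any GUI toolkit. The class name `SubmitHandler` and the returned status strings are hypothetical and are not taken from the project.

```python
class SubmitHandler:
    """Tracks whether the empty-essay prompt has already been shown."""

    def __init__(self):
        self.warned = False

    def submit(self, essay_text):
        """Return (status, score); score is None unless the essay is rejected."""
        if essay_text.strip():
            # Non-empty essay: it would be recorded in a file for evaluation.
            return ("recorded", None)
        if not self.warned:
            # First empty submission: prompt the user to key in the essay.
            self.warned = True
            return ("prompt", None)
        # Second empty submission: counted as no answer.
        return ("no_answer", 0)
```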

Surface Feature Extraction:


The text complexity features are extracted and compared with the specified requirements. Each
feature has its own use. For example, the number of words represents the length of the essay, since
the length requirement is, say, 250-300 words. This feature can detect an empty essay, or an essay
so short that it cannot be processed, and reject it immediately. Otherwise a score is assigned
accordingly.
Stanford toolkit: The Part-Of-Speech (POS) Tagger is used to tag each sentence, and the tagged
sentence is used to find the number of verbs in the essay.
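
The surface features named in the summary (Chars, Words, Diffwds, Rootwds, Sents, Wordlen, Sentlen, BW5) can be sketched in Python as below. This is an illustration under stated assumptions, not the project's code: the function name `surface_features` is hypothetical, words are taken to be runs of letters and apostrophes, and sentences are split on '.', '!' and '?'. The verb count from the Stanford POS tagger is not reproduced here.

```python
import re

def surface_features(text):
    """Compute the text complexity features for one essay."""
    # Assumed tokenization: a word is a run of letters/apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    # Assumed sentence boundary: any run of '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    chars = len(text)
    n_words = len(words)
    n_sents = len(sentences)
    return {
        "Chars": chars,
        "Words": n_words,
        "Diffwds": len({w.lower() for w in words}),
        "Rootwds": n_words ** 0.25,  # fourth root of the word count
        "Sents": n_sents,
        "Wordlen": chars / n_words if n_words else 0.0,
        "Sentlen": n_words / n_sents if n_sents else 0.0,
        "BW5": sum(1 for w in words if len(w) > 5),
    }
```

An essay whose "Words" value is 0, or far below the required length, can then be rejected before any further processing.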

Spell Checking:
[Authors' note: decide whether to state that we wrote the spell-checking code ourselves or took it from an existing snippet.]
Grammar Checking:

JLinkGrammar is used to count the grammar mistakes. A shell script that runs JLinkGrammar was
written and is called from NetBeans.

Content Vector Analysis:


The autocorrected test essay is subjected to stop word removal and then stemming. All the keywords
from the corpus are extracted and a term-document matrix is constructed. In the matrix, the first
document represents the test essay, so its correlation with the other documents is calculated. The
score is assigned depending on the deviation.
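
The comparison of the test essay with the other documents can be sketched as follows, in Python and for illustration only. Two assumptions are made that the text above does not confirm: the correlation is measured as cosine similarity between tf-idf vectors, and the test essay receives the score of the most similar reference essay. The function names `cosine_similarity` and `content_score` are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors given as {term: weight} dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector is similar to nothing
    return dot / (norm_u * norm_v)

def content_score(test_vec, reference_vecs, reference_scores):
    """Return (score, similarity) of the closest reference essay.

    Assumed scoring rule: the test essay inherits the score of the
    reference essay it deviates from least.
    """
    sims = [cosine_similarity(test_vec, ref) for ref in reference_vecs]
    best = max(range(len(sims)), key=sims.__getitem__)
    return reference_scores[best], sims[best]
```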

Regression:
The individual raw scores, namely from the feature extraction process, the grammar/spell check
process and the content vector analysis process, are taken and a weight is assigned to each component.
Component 1: Text Complexity Feature Score - 3%
Component 2: Spell Check Score - 7%
Component 3: Grammar Check Score - 10%
Component 4: Relation to the Topic Score - 80%
The final score is the weighted sum of all the score components.
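
The weighted sum can be written out as a short Python sketch using the four weights listed above. The dictionary keys and the function name `final_score` are hypothetical, and the sketch assumes that every component score is already on the same scale (for example 0-100), which the text does not specify.

```python
# Weights from the component list above (3%, 7%, 10%, 80%).
WEIGHTS = {
    "text_complexity": 0.03,
    "spell_check": 0.07,
    "grammar_check": 0.10,
    "topic_relation": 0.80,
}

def final_score(components):
    """Weighted sum of the component scores (assumed to share one scale)."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
```

For example, component scores of 50, 100, 80 and 90 give 1.5 + 7 + 8 + 72 = 88.5.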

Experiment:
Input: Test Essay & Corpus
Output: Score

Contribution of the Candidate:


[Authors' note: change this if you want. See to it that everyone gets 100% and that we do not put
ourselves in trouble by writing a module-wise split-up.]
Implementation:
Aruna P-20
Dhivya Priya R-40
Divya Harshini R-40

Documentation:
Aruna P-50
Dhivya Priya R-25
Divya Harshini R-25

Background Work:
Aruna P-30
Dhivya Priya R-35
Divya Harshini R-35

