Root Identification Tool For Arabic Verbs



Root Identification Tool for Arabic Verbs

Article in IEEE Access · January 2019


DOI: 10.1109/ACCESS.2019.2908177


1 author:

Bakeel Azman
Sana'a University

All content following this page was uploaded by Bakeel Azman on 09 March 2020.



Received February 2, 2019, accepted February 25, 2019, date of current version April 17, 2019.
Digital Object Identifier 10.1109/ACCESS.2019.2908177

Root Identification Tool for Arabic Verbs


BAKEEL AZMAN
College of Computer and Information Technology, Sana’a University, Sana’a, Yemen
e-mail: [email protected]

ABSTRACT Numerous Arabic morphology systems have been developed to supply the morphological
features of words that other text analyzers require. Term rooting is an essential requirement of
those systems, yet the rooting modules of state-of-the-art morphology systems do not sufficiently
meet it, especially for verbs, because they stop at the stem of a term rather than its root. Since
the stem of a verb is not its root, it is not feasible to generate or infer the verb’s derivations
and all of its surface forms (patterns), such as tense, number, mood, person, aspect, and other
irregular verb patterns. Therefore, we propose a new model for identifying the verb’s root,
implemented in a tool (RootIT), in order to extract verb roots without the ambiguity of the
traditional methods applied in current morphology systems. A major design goal of this system is
that it can be used as a standalone tool and can be integrated cleanly with other linguistic
analyzers. The adopted approach maps a surface verb against the full set of derived verb forms
stored beforehand in a relational database. The proposed system is tested on a dataset of PATB
verbs extracted with the CoreNLP system. The extracted dataset contains more than (7950) distinct
verbs belonging to (1938) different roots. The results obtained outperform the best compared
system by (2.74%) in accuracy.

INDEX TERMS Root, verb, pattern, stem, morphology, identifying, and ANLP.

The associate editor coordinating the review of this manuscript and approving it for publication
was Mohamed Elhoseny.

2169-3536 2019 IEEE. Translations and content mining are permitted for academic research only.
Personal use is also permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION
An Arabic sentence falls under one of two types, nominal or verbal [1]. The two types can occur in
a text in nearly equal proportion, given that it is possible to convert one type into the other.
A verbal sentence cannot be formed without a verb, which is what classifies the sentence as
verbal. The verb in that type of sentence is our main concern in this study. Arabic and other
Semitic languages have long been described in terms of a root interwoven with a pattern. The root
is a sequence of consonants: each Arabic verb contains (3) or (4) consonants that generally remain
unchanged in all of its conjugated forms and form the consonantal root; all the remaining
information in a conjugated form is called the “pattern” [2]. The patterns morph the meaning of
the root to create a variety of related words. Thus, the task of rooting verbs constitutes a
cornerstone for all morphology systems and for almost all linguistic analyzers that deal with
events (verbs) along with their variant surface forms. Furthermore, some systems aim to obtain a
verb’s conjugation, others, such as parsers, require a verb’s surface form, and others need to
provide alternative verb forms in order to paraphrase a text fragment without violating its
meaning. In addition, the task of semantic analysis needs to collect sentence arguments on the
basis of their thematic roles, which matters greatly to such systems [3]. This study was motivated
by our own work on parsing the cognate object that occurs in Arabic sentences, for which we were
in serious need of the root of the verb that spans the cognate clause.

Specialized rooting studies essentially enrich morphology systems with one of the most important
morphological features, in addition to supplying analyzers with a dedicated tool covering term
roots along with all their patterns and surface forms. Embedding such a tool within related
applied systems plays a pivotal role in the manipulation of sentence elements. Such systems
include information extraction, ontology relation (predicate) selection [4], lexicons and
dictionaries, machine translation [5], [6], subject-verb and object-verb agreement, verb movement,
verb sense disambiguation, Arabic spell checking, diacritization, and so forth.

The extraction of verb roots is important not only for pure linguistic analysis, but also for
discovering the different arguments of a sentence, purpose and effect in speech, and for more
sophisticated Natural Language Understanding systems in general. In spite of this importance,
there is no
B. Azman: Root Identification Tool for Arabic Verbs

specific tool or sufficiently capable embedded module for identifying a word’s root.

Our proposed model specializes in extracting the real verb root in its original (3) or (4)
characters, without being concerned with other morphological features, although those features
could be included because the model represents the root together with its surface forms. The main
method of the RootIT tool depends on a hierarchical tree structure built for each verb root,
starting with the root as the tree root, followed by several levels, and ending with leaves that
represent the real verb forms found in real text. All surface forms of the verbs have been derived
via the ALECSO (Arab League Educational, Cultural and Scientific Organization) derivational system
Sarf [7]. With this method, all derived and surface forms of the verb roots are represented in a
relational database, so that any verb root can be extracted. The tool works on any text and on any
verb form, without the input constraints found in similar systems.

A. ROOT vs STEM
A root is a form which is not further analyzable, either in terms of derivational or inflectional
morphology [8]. Additionally, it is that part of a word-form that remains when all inflectional
and derivational affixes are removed. A root is also the basic part that is always present in a
lexeme. The stem, in contrast, is the root of a word together with any derivational and
inflectional affixes that are added [9]. A stem consists minimally of a root, but may be
analyzable into a root plus derivational morphemes. A stem may require an inflectional operation
(often involving a prefix or suffix) in order to ground it into discourse and make it a fully
understandable word. If a stem does not stand by itself in a meaningful way in the language, it is
referred to as a bound morpheme.

Furthermore, there is a term called “base”, which is any form to which affixes of any kind can be
added [10]. This means that any root or any stem can be termed a base, but the set of bases is not
exhausted by the union of the set of roots and the set of stems: a derivationally analyzable form
to which derivational affixes are added can only be referred to as a base. That is, the verb
glossed “target” can act as a base for a prefix to give “will target”, but in this process it
cannot be referred to as a root, because it is analyzable in terms of derivational morphology, nor
as a stem, since it is not the addition of inflectional affixes that is in question [11].

A root differs partially from a stem in that a stem must have lexical meaning. A root has no
lexical meaning, and the semantic range of the root is vague, if there is any. A stem may contain
derivational affixes, and it becomes of concern only when dealing with inflectional morphology.
In the form “They will target” the stem is “ ”, although in the form “ ” the stem is “ ”. Stemming
sometimes affects the semantics of a word, whereas a lemma preserves the meaning of a word [12].

Root, stem and base are all terms used in the literature to designate that part of a word that
remains when all affixes are removed. The distinction is only useful in a highly inflected
language like Arabic. In English, both words are used in the same way, to indicate the form that
can take an affix. Since there are very few affixes in English, it doesn’t really matter.

II. RELATED WORK
Current morphology systems seek to get the stem of a word rather than the root. They endeavor to
obtain the stem of a term, in the sense of a non-affixed term; MADAMIRA [13], for example, adopts
an attached toolkit that produces the stem by decomposing the input word into, e.g., suffixes and
prefixes along with clitics. The extraction rules fail to obtain a sound root for verbs such as
those glossed “denounce”, “commit”, “hope”, and “be stained”. All morphology systems are supposed
to distinguish between stem and root [14]. The potential root of a term should be cast into its
original orthography of letters, just a trilateral or quadrilateral atom.

The list of proposed Arabic stemming models is huge [15]–[22]; thus, it is impossible to review
them all in this section. Instead, we review the main approaches adopted in common models and
highlight the state of the art on the same subject. Two approaches dominate root extraction
systems: the light stemmer and the root-based stemmer [23]. Stem-based algorithms remove prefixes
and suffixes from Arabic words, while root-based algorithms reduce stems to roots [24]. Light
stemming refers to the process of stripping off a small set of prefixes and/or suffixes without
trying to deal with infixes or to recognize patterns and find roots [14]; in short, a light
stemmer cannot be depended on for root extraction [25], [26].

The second approach is the root-based stemmer; as the name implies, the root is extracted from the
word by means of morphological analysis. It attempts to restore the original root of a word and
group words accordingly [23]. The two basic steps of root-based stemmers are, first, to remove
prefixes and suffixes and, second, to extract roots by analyzing Arabic words according to their
morphological components. This is accomplished by rule-based techniques, table lookup [25], or a
mixture of the two.

One of the earliest techniques developed for root-based stemmers is the Khoja stemmer [20]. In
this technique, prefixes and suffixes are removed, then two dictionaries are used: one to match
the remaining letters against Arabic patterns, and the second to confirm the correctness of the
root. Taghva et al. [27] is similar to Khoja et al. but uses no dictionaries; a rule-based
technique is used instead. Sonbol et al. [28] use a rule-based technique to extract roots by
dividing letters into those that are part of the root and others, which are further divided into
sub-groups that are examined with well-defined rules to extract the final root, with no use of a
dictionary. Moreover, a spline function technique has been utilized for extracting roots. That
method is divided into two phases. The first phase seeks all the possible roots of each term,
analyzed out of context with a morphalizer. The second phase constructs a disambiguation approach
based on continuous quadratic splines to choose, among these roots, the one that corresponds to
the word context [29]. Momani and Faraj [30]


filter rootless words, and then remove suffixes and prefixes. Excess letters are removed only if
they occur more than once in a word.

A new stemming model was introduced to design and implement an Arabic light stemmer that is
claimed to identify the word root. It uses predefined mathematical rules and several relations
between the letters of the term. After an appropriate rule is applied to a word, some clitics are
removed, and the produced word is then mapped against a roots dictionary. If no root in the
dictionary matches, the splitting of the original word is repeated with another rule. The process
continues recursively until a root is found or the last category of rules is reached with no
root [31]. However, the rule set cannot cope with the multitude of Arabic word morphology, as
evidenced by some unsound words. Mohammed [32] recognized the problems of the two approaches, and
so proposed an Arabic stemmer that combines the rules of the root-based stemmer and the light
stemmer to overcome those failures. Such problems include removing the affix before matching it
with a Tafeala and not dealing with words of three letters in length that result from these two
stemmers. A model that attempted to mix the two approaches with a statistical approach was
proposed in [33]. It only generates the possible roots of any given Arabic word. The analyzer is
based on automatically derived rules and statistics. It comprises three modules: one takes
advantage of a list of Arabic word-root pairs to derive a list of prefixes and suffixes, another
builds stem templates, and the last calculates the probability that a prefix, a suffix, or a
template would appear. The second accepts Arabic words as input, attempts to construct possible
prefix-suffix-template combinations, and outputs a ranked list of possible roots.

The most relevant method is [34], which attempts to build a lexicon including all verbal forms.
The verbal forms are generated from more than (15000) roots, whose source is undefined. The
generation process is based on root-patterns using finite-state transducer theory. Almost
2.5 million verb forms are generated and classified within a lexicon. Nevertheless, the
verification of the produced forms remains unclear, so the validity of the forms cannot be judged.
That is because the test was carried out on a small fragment of the Nemlar [35] corpus, which
itself contains no more than (500000) words, including nouns and adjectives, according to the
author’s claims. Many systems related to this research have been studied, but unfortunately we
have not come across a model that includes all the basic and standard roots in the thesaurus, such
as the Al-Mukhtar Al-Sahah¹ dictionary, which contains (7400) roots and more, although the ones in
use do not exceed (3000) roots.

¹ https://www.almaany.com/ar/dict/ar-ar/?c=%D9%85%D8%AE%D8%AA%D8%A7%D8%B1%20%D8%A7%D9%84%D8%B5%D8%AD%D8%A7%D8%AD

In general, root extraction systems mostly rely on a rule-based approach [36], which in turn has
deficiencies in tackling the abundance of Arabic inflections and derivations. Since most
rule-based Arabic stemmers remove prefixes, infixes, and suffixes, most of them cannot recognize
the right root of some Arabic terms, e.g. those glossed “they target”, “decompose”, and
“interrogated”. Most failures occur in words that contain one of the vowel letters. None of those
Arabic words includes all the letters of the corresponding Arabic root, so rule-based Arabic
stemmers are incapable of extracting their correct roots.

Up to now, no appropriate tool available for Arabic NLP applications fully meets the requirements
of word rooting, although many research projects have focused on the problem of Arabic
morphological analysis using different techniques and approaches, as briefly discussed above.

III. PROPOSED WORK
The RootIT system for verb root identification is a root-based system that contains two modules: a
roots database module and a root detection module. The general approach in most morphology systems
is top-down: it starts by capturing a sentence word w and mapping w against an embedded lexicon;
if no entry matches, the cycle invokes a trimmer of prefixes or suffixes. The process continues
recursively, trimming and mapping, until w matches a lexicon entry, which represents the stem of
w. Our approach is slightly different; it adopts a bottom-up approach along with the top-down one.
The method places the root r at the center of a tree, and from r its conjugations and surface
forms are reproduced in the surrounding levels. The leaf level (the real verb) points to its root
at the center via a traced path, which yields the root of the verb v. FIGURE 1 shows the method in
an example for the “aim” root.

FIGURE 1. Tree-hierarchized structure for the “aim” root.
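The tree described above, with the root at the top, feature levels in between, and real verb forms
as leaves, can be sketched as a small data structure. This is a minimal illustration only: the
names `Node`, `RootTree`, `add_form`, and `trace_root`, the choice of feature levels, and the
sample forms for the root هدف are assumptions made for the example, not RootIT’s actual schema or
Sarf output.

```python
class Node:
    """One node of the tree: the root, a feature level, or a leaf form."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = {}

    def child(self, label):
        # Return the existing child with this label, or create it.
        if label not in self.children:
            self.children[label] = Node(label, parent=self)
        return self.children[label]


class RootTree:
    """Tree with the root r at the top and surface forms as leaves."""

    def __init__(self, root):
        self.root = Node(root)
        self.leaves = {}  # surface form -> list of leaf nodes

    def add_form(self, features, surface):
        # Walk (or create) one level per feature, then attach the leaf.
        node = self.root
        for feature in features:
            node = node.child(feature)
        leaf = node.child(surface)
        bucket = self.leaves.setdefault(surface, [])
        if leaf not in bucket:
            bucket.append(leaf)

    def trace_root(self, surface):
        # Follow parent links from each matching leaf up to the root.
        # Returns (root, path) pairs; path lists the levels from the top.
        results = []
        for leaf in self.leaves.get(surface, []):
            path = []
            node = leaf.parent
            while node.parent is not None:
                path.append(node.label)
                node = node.parent
            results.append((node.label, list(reversed(path))))
        return results


# Hypothetical sample data for the "aim" root (illustrative, not from Sarf).
tree = RootTree("هدف")
tree.add_form(["Form I", "imperfect", "3rd masc. sing."], "يهدف")
tree.add_form(["Form X", "perfect", "3rd masc. sing."], "استهدف")
tree.add_form(["Form X", "imperfect", "3rd masc. plur."], "يستهدفون")

print(tree.trace_root("يستهدفون"))
```

Keeping a direct index from surface form to leaf means a lookup never has to strip affixes; the
path walked upward also recovers the feature levels attached to the form.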

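For contrast, the conventional top-down cycle described in this section (map w against a lexicon,
otherwise trim an affix and try again) can be sketched as follows. The function name `find_stem`,
the one-entry lexicon, and the affix lists are hypothetical and chosen only to show why such a
loop ends at a stem rather than a root; they are not taken from any of the cited systems.

```python
def find_stem(word, lexicon, prefixes, suffixes):
    """Return the first lexicon entry reached by trimming affixes, or None."""
    seen = set()
    stack = [word]
    while stack:
        w = stack.pop()
        if w in seen:
            continue
        seen.add(w)
        if w in lexicon:  # mapping step
            return w
        for p in prefixes:  # trimming step: drop one prefix
            if w.startswith(p) and len(w) > len(p):
                stack.append(w[len(p):])
        for s in suffixes:  # trimming step: drop one suffix
            if w.endswith(s) and len(w) > len(s):
                stack.append(w[:-len(s)])
    return None


# Hypothetical lexicon and affix lists (illustrative only).
lexicon = {"ستهدف"}
stem = find_stem("يستهدفون", lexicon, prefixes=["ي"], suffixes=["ون"])
print(stem)
```

In this toy run the loop stops at the five-letter entry, which still carries the derivational
letters, while the three-letter root is never reached.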



Detecting the root is a function of the surface form of the verb and its path. Given a verb v from
a text fragment, we can retrieve the verbs that match it from the roots database. Since v is
mostly a non-vocalized word, the search query for v retrieves a list of matched entries that have
different roots. The matched verbs are grouped into clusters by root: the cluster head is the
root, while the cluster children are the matched verbs. The root of v is the cluster that has the
maximum number of children. As an example, consider the verb “it announces”, which retrieved more
than one root. The sound one is “announce”, so most of the retrieved verbs (children of clusters)
fall under it and fewer under the remaining clusters.

Development of the RootIT tool is straightforward. RootIT’s method for identifying a verb root
maps the real verb in the text against the surface verb forms stored in the database, as shown in
the tool architecture in FIGURE 2. The surface forms of a verb are structured in a tree hierarchy:
the root at the tree’s root is followed by several levels that represent morphological features
and ends with the leaf level, which in turn represents the instances of verb forms.

FIGURE 2. RootIT architecture.

A relational database is adopted for storing the outputs of the biggest Arabic morphology system,
Sarf. Accordingly, we have developed a specific sub-tool that types a root r into the Sarf
interface and then registers the Sarf outputs (all surface forms of r) in the database. Sarf’s
roots are imported from the common thesaurus of Al-ShAh. Al-ShAh has 7442 roots, of which 5674 are
trilateral roots and 1829 are quadrilateral roots. Overall, the resulting surface forms are
2,775,742 verbs and 514,734 gerunds. Given that root count, we can claim that the database
comprises all Arabic roots and that no root is left out. However, some roots are rarely used or
not used at all. Alongside the stored surface forms, some morphological features are recorded,
such as active, passive, tense, mood, aspect, and diacritic form. The Derby DBMS is used since it
is the easiest one to embed within related systems. The preprocessing of raw document-mode input
is POS tagging, which is implemented by a POS tagger.

IV. EXPERIMENT
To assess the accuracy of RootIT, a series of experiments has been conducted. The effectiveness of
the five systems, Khoja et al. [20], ArabicStemmer [37], MADAMIRA [13], Buckwalter [21], and our
proposed rooter RootIT, has been evaluated and compared in terms of the F-score measure. The data
set used in our experiments is extracted from the most popular and standard Arabic corpus
(PATB) [38]. The adopted dataset consists of all verbs, with their surface forms, scattered
throughout PATB, which are extracted from the tagged and stored parse trees in the enormous
system, Stanford CoreNLP. The total number of dataset verbs tagged by Stanford’s embedded tool,
MaxentTagger, amounts to 7591 surface verbs, excluding unsoundly tagged verbs. Those verbs belong
to (1938) distinct roots, of which (76) are quadrilateral roots and the rest are trilateral roots.

The evaluation method that we have adopted is the commonly used and standard evaluation measure in
the IE community. The standard evaluation measures Precision, Recall, and F-measure are used to
evaluate the performance of our system and to compare it with the results of the other four
systems [39]. The compared systems cannot find the root for some verbs (a few are listed in
TABLE 1). The failure of those systems might be due to two factors. The first is that such systems
are satisfied with the stem of a verb without going down to its root, and the second is the
limited set of pattern rules used by those systems relative to the requirements of verb
morphology. Although the Khoja system has obtained good accuracy in detecting the roots of verbs
as well as nouns, it has been unable to recognize some verb patterns that evade the morphology
rules formatted in the system’s algorithm.

TABLE 1. Example of verbs rooting results using four compared systems and our system.

However, owing to our root-based approach, RootIT can extract all patterns of a verb and trace the
correct root path that best matches the given verb pattern, instead of using affixes (prefixes and
suffixes) in a list along with formatted templates that fail when faced with the abundance of
Arabic verb morphology.

TABLE 2. Rooting accuracies of the five systems.

To illustrate that RootIT is more efficient, we present some results of the Arabic verb root
identification systems in TABLE 2. TABLE 2 shows a RootIT accuracy of 97.34%. In comparison, we
observe that our proposed tool produces better results, as FIGURE 3 illustrates. One of the main
points of evaluation is the comparison of RootIT


with the most competitive systems. Hence, the RootIT results reflect our hypothesis as stated in
Section I.

As for the weak point of RootIT, it results from some verbs that have a pronoun attached as a
suffix. Nevertheless, it is at the top among the compared systems (see FIGURE 3).

FIGURE 3. Comparison between RootIT and other four systems.

V. CONCLUSION
We have developed a detection tool, RootIT, that identifies the root of a verb occurring in an
Arabic sentence, as a first step towards the automated detection of all arguments of the verb. We
have developed this tool in order to supply linguistic analyzers concerned with verbs with their
derivations and inflections, and also to improve the performance of Arabic morphology systems in
their verb rooting module. The method maps the real verb in the text against the surface verb
forms stored in a database filled with the output of the Sarf morphology system. For the test,
PATB verbs were extracted to construct our evaluation dataset. Four state-of-the-art rooting
systems have been adopted to compare their results with those of our system. We have obtained
(97.34%) in F-measure, the top accuracy among the compared systems. Adding the remaining
morphological features, which have already been initialized in this model along with the root,
remains future work, as does adding all derived nouns owned by a root to the database, with the
goal of identifying the roots of noun derivations.

REFERENCES
[1] M. S. Basati, “The nominal sentence in Arabic grammar,” Tech. Rep., 2018.
[2] W. Wright and C. P. Caspari, A Grammar of the Arabic Language. New York, NY, USA: Cosimo, 2011.
[3] N. Kambhatla and I. Zitouni, “Systems and methods for automatic semantic role labeling of high
morphological text for natural language processing applications,” Google Patent 8 527 262 B2,
Sep. 3, 2013.
[4] H. Ishkewy, H. Harb, and H. Farahat. (2014). “Azhary: An arabic lexical ontology.” [Online].
Available: https://arxiv.org/abs/1411.1999
[5] N. Habash, “Arabic morphological representations for machine translation,” Arabic
Computational Morphology. Springer, 2007, pp. 263–285.
[6] F. Aqlan, X. Fan, A. Alqwbani, and A. Al-Mansoub, “Improved Arabic–Chinese machine translation
with linguistic input features,” Future Internet, vol. 11, no. 1, p. 22, 2019.
[7] Sarf. [Online]. Available: www.alecso.org.tn
[8] M. C. Nouri, The Three Steps of Action in Arabic—A Phonetic Study Using Computer. 2018.
[9] A. Abdelali, K. Darwish, N. Durrani, and H. Mubarak, “Farasa: A fast and furious segmenter for
arabic,” in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, Demonstrations, 2016,
pp. 11–16.
[10] M. Attia, “Developing a robust Arabic morphological transducer using finite state
technology,” in Proc. 8th Annu. CLUK Res. Colloq., 2005, pp. 9–18.
[11] A. Amine, L. Bellatreche, Z. Elberrichi, E. J. Neuhold, and R. Wrembel, “Computer science and
its applications,” in Proc. 5th IFIP TC Int. Conf. (CIIA). Springer: Saida, Algeria, May 2015.
[12] W. Black et al., “Introducing the Arabic wordnet project,” in Proc. 3rd Int. WordNet Conf.,
2006, pp. 295–300.
[13] A. Pasha et al., “MADAMIRA: A fast, comprehensive tool for morphological analysis and
disambiguation of Arabic,” in Proc. LREC, 2014, pp. 1094–1101.
[14] I. A. Al-Sughaiyer and I. A. Al-Kharashi, “Arabic morphological analysis techniques: A
comprehensive survey,” J. Amer. Soc. Inf. Sci. Technol., vol. 55, no. 3, pp. 189–213, 2004.
[15] E. Benmamoun, “Arabic morphology: The central role of the imperfective,” Lingua, vol. 108,
nos. 2–3, pp. 175–201, 1999.
[16] B. Haddad, A. Awwad, M. Hattab, and A. Hattab, “Associative root–pattern data and
distribution in arabic morphology,” Data, vol. 3, no. 2, p. 10, 2018.
[17] S. Moscati, An Introduction to the Comparative Grammar of the Semitic Languages: Phonology
and Morphology. Markkleeberg, Germany: Otto Harrassowitz, 1964.
[18] W. Salloum and N. Habash, “ADAM: Analyzer for dialectal arabic morphology,” J. King Saud
Univ.-Comput. Inf. Sci., vol. 26, no. 4, pp. 372–378, 2014.
[19] M. Sawalha and E. Atwell, Visualisation of Arabic Morphology. 2012.
[20] S. Khoja and R. J. L. Garside, “Stemming Arabic text,” Dept. Comput., Lancaster Univ.,
Lancashire, U.K., 1999.
[21] T. Buckwalter, “Buckwalter Arabic morphological analyzer version 1.0,” Tech. Rep., 2002.
[22] M. N. Al-Kabi, S. A. Kazakzeh, B. M. A. Ata, S. A. Al-Rababah, and I. M. Alsmadi, “A novel
root based Arabic stemmer,” J. King Saud Univ.-Comput. Inf. Sci., vol. 27, no. 2, pp. 94–103, 2015.
[23] R. Alshalabi, “Pattern-based stemmer for finding Arabic roots,” Inf. Technol. J., vol. 4,
no. 1, pp. 38–43, 2005.
[24] J. Xu, A. Fraser, and R. Weischedel, “Empirical studies in strategies for Arabic retrieval,”
in Proc. 25th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retr., 2002, pp. 269–274.
[25] M. Ababneh, R. Al-Shalabi, G. Kanaan, and A. Al-Nobani, “Building an effective rule-based
light stemmer for Arabic language to improve search effectiveness,” Int. Arab J. Inf. Technol.,
vol. 9, no. 4, pp. 368–372, 2012.
[26] L. S. Larkey, L. Ballesteros, and M. E. Connell, “Light stemming for Arabic information
retrieval,” Arabic Comput. Morphol. Springer, 2007, pp. 221–243.
[27] K. Taghva, R. Elkhoury, and J. Coombs, “Arabic stemming without a root dictionary,” in Proc.
Int. Conf. Inf. Technol., Coding Comput., Apr. 2005, pp. 152–157.
[28] R. Sonbol, N. Ghneim, and M. S. Desouki, “Arabic morphological analysis: A new approach,” in
Proc. 3rd Int. Conf. Inf. Commun. Technol., Theory Appl., Apr. 2008, pp. 1–6.
[29] M. Boudchiche and A. Mazroui, “Improving the Arabic root extraction by using the quadratic
splines,” in Proc. Int. Conf. Intell. Syst. Comput. Vis., Apr. 2018, pp. 1–5.
[30] M. Momani and J. Faraj, “A novel algorithm to extract tri-literal Arabic roots,” in Proc.
IEEE/ACS Int. Conf. Comput. Syst. Appl., May 2007, pp. 309–315.
[31] A. Al-Omari and B. Abuata, “Arabic light stemmer (ARS),” J. Eng. Sci. Technol., vol. 9, no. 6,
pp. 702–717, 2014.
[32] R. Mohammed, “New Arabic stemming based on Arabic patterns,” Iraqi J. Sci., vol. 57, no. 3C,
pp. 2324–2330, 2016.
[33] K. Darwish, “Building a shallow Arabic morphological analyser in one day,” in Proc. ACL
Workshop Comput. Approaches semitic Lang., 2002, pp. 1–8.


[34] A. A. Neme, “A fully inflected Arabic verb resource constructed from a lexicon of lemmas by
using finite-state transducers,” Revue RIST, vol. 20, no. 2, p. 13, 2013.
[35] M. Yaseen et al., “Building annotated written and spoken Arabic LRs in NEMLAR project,” in
Proc. LREC, 2006, pp. 533–538.
[36] H. Khafajeh, N. Yousef, and M. Abdeldeen, “Arabic root extraction using a hybrid technique,”
Int. J. Adv. Comput. Res., vol. 8, no. 35, pp. 90–96, 2018.
[37] ArabicStemmer. [Online]. Available: https://www.arabicstemmer.com/
[38] M. Maamouri, A. Bies, T. Buckwalter, and W. Mekki, “The penn Arabic treebank: Building a
large-scale annotated arabic corpus,” in Proc. NEMLAR Conf. Arabic Lang. Resour. Tools, 2004,
pp. 466–467.
[39] D. M. Powers, “Evaluation: From precision, recall and F-measure to ROC, informedness,
markedness and correlation,” Tech. Rep., 2011.

Authors’ photographs and biographies not available at the time of publication.
