Evaluation of Arabic-Based Contextualized Word Embedding Models

dc.contributor.author: Yagi, Sane Mo
dc.contributor.author: Mansour, Youssef
dc.contributor.author: Kamalov, Firuz
dc.contributor.author: Elnagar, Ashraf
dc.date.accessioned: 2022-05-19T09:57:04Z
dc.date.available: 2022-05-19T09:57:04Z
dc.date.copyright: © 2021
dc.date.issued: 2021
dc.description: This conference paper is not available at the CUD collection. The version of scholarly record of this paper is published in 2021 International Conference on Asian Language Processing (IALP) (2021), available online at: https://doi.org/10.1109/IALP54817.2021.9675208.
dc.description.abstract: The distributed representation of words, as in Word2Vec, FastText, and GloVe, results in the production of a single vector for each word type regardless of the polysemy or homonymy that many words may have. Context-sensitive representation as implemented in deep learning neural networks, on the other hand, produces different vectors for the multiple senses of a word. Several contextualized word embeddings have been produced for the Arabic language (e.g., AraBERT, QARiB, AraGPT, etc.). The majority of these were tested on a few NLP tasks, but there was no direct comparison between them. As a result, we do not know which of these is most efficient and for which tasks. This paper is a first step in an endeavor to establish evaluation criteria for them. It describes 24 such embeddings, then conducts exploratory intrinsic and extrinsic evaluation of them. Afterwards, it tests relational knowledge in them, covering four semantic relations: colors of fruits, capitals of countries, causation, and general information. It also evaluates the utility of these models in Named Entity Recognition and Sentiment Analysis tasks. It has been demonstrated here that AraBERTv02 and MARBERT are the best on both types of evaluation; therefore, both are recommended for fine-tuning Arabic NLP tasks. The ultimate conclusion is that it is feasible to test higher order reasoning relations in these embeddings. © 2021 IEEE
dc.identifier.citation: Yagi, S. M., Mansour, Y., Kamalov, F., & Elnagar, A. (2021). Evaluation of Arabic-based contextualized word embedding models. 2021 International Conference on Asian Language Processing (IALP), pp. 200-206. https://doi.org/10.1109/IALP54817.2021.9675208
dc.identifier.isbn: 978-166548311-7
dc.identifier.uri: https://doi.org/10.1109/IALP54817.2021.9675208
dc.identifier.uri: http://hdl.handle.net/20.500.12519/645
dc.language.iso: en_US
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.relation: Author affiliations: Yagi, S. M., Dept. of Foreign Languages, University of Sharjah, Sharjah, United Arab Emirates; Mansour, Y., Dept. of Computer Science, University of Sharjah, Sharjah, United Arab Emirates; Kamalov, F., Dept. of Electrical Engineering, Canadian University of Dubai, Dubai, United Arab Emirates; Elnagar, A., Dept. of Computer Science, University of Sharjah, Sharjah, United Arab Emirates
dc.relation.ispartofseries: 2021 International Conference on Asian Language Processing (IALP)
dc.rights: Permission to reuse the abstract has been secured from Institute of Electrical and Electronics Engineers Inc.
dc.rights.holder: Copyright: © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
dc.rights.uri: https://www.ieee.org/publications/rights/rights-policies.html
dc.subject: BERT
dc.subject: Extrinsic evaluation
dc.subject: Intrinsic evaluation
dc.subject: Language Models
dc.title: Evaluation of Arabic-Based Contextualized Word Embedding Models
dc.type: Conference Paper

Files

Original bundle
Name: Access Instruction 645.pdf
Size: 56.32 KB
Format: Adobe Portable Document Format