
An Arabic Dialects Dictionary Using Word Embeddings

Azroumahli Chaimae, Yacine El Younoussi, Otman Moussaoui, Youssra Zahidi

OAI: oai:igi-global.com:251899 • DOI: 10.4018/IJRSDA.2019070102

Published by: IGI Global

Abstract

Dialectal Arabic and Modern Standard Arabic lack sufficient standardized language resources to support Arabic language processing tasks, even though the field is an active research area. This work addresses the issue by first highlighting the steps and issues involved in building a multi-dialect Arabic corpus from web data drawn from blogs and social media platforms (e.g., Facebook and Twitter). The corpus is then used to create a vectorized dictionary of the crawled data using word embeddings. In other words, the goal of this article is to build an up-to-date multi-dialect data set and then to extract an annotated corpus from it.
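
To make the idea of a "vectorized dictionary" concrete, the following is a minimal sketch only. The abstract does not name the embedding model, so this sketch swaps in a simple count-based technique (co-occurrence vectors compared by cosine similarity) rather than the authors' method. The function names, the window size, and the toy sentences are all assumptions made up for illustration; none of them come from the article.

```python
import math
from collections import Counter, defaultdict

# Toy, hand-written sentences (NOT data from the article): the same request
# phrased with three different words for "I want", plus an unrelated sentence.
TOY_CORPUS = [
    "أريد ماء بارد",
    "عايز ماء بارد",
    "بدي ماء بارد",
    "الطقس جميل اليوم",
]

def build_vector_dictionary(sentences, window=2):
    """Map each word to a sparse vector of co-occurrence counts.

    Assumption: whitespace tokenization and a symmetric context window.
    """
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return dict(vectors)

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def nearest(word, vectors, top_n=2):
    """Return the top_n words whose context vectors are closest to `word`."""
    target = vectors[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]

vectors = build_vector_dictionary(TOY_CORPUS)
neighbours = nearest("أريد", vectors)
```

In this toy setting, words that appear in the same contexts end up with similar vectors, which is the property a dialect dictionary built on embeddings would rely on. A real pipeline would replace the count vectors with a trained embedding model and the toy sentences with the crawled corpus.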

IT Research & Theory; Computer Science and Information Technology; Data Mining and Databases