Arabic Natural Language Processing using Deep Learning

  • Ayaa G. Alyousef

Student thesis: Master's Thesis

Abstract

Social media has become one of the main means of communication all over the world, including in Arab countries, and generates a great amount of online data every minute. This creates an opportunity to apply machine learning and deep learning techniques to this data in order to automatically infer the sentiment of the population. Social media analysis for Arabic is in high demand but is more challenging than for languages with simpler grammar, such as English. Recently, deep learning has been widely used across many applications (such as image analysis) to build complex models and perform comprehensive analysis. However, deep learning methods need a huge amount of data and an extremely long time to train a model, and a complex model trained on a limited amount of data tends to lose accuracy. Transfer learning, a technique that takes an existing model and adapts it to a new application using a limited amount of new data, can be used to alleviate this issue. This work proposes a novel method that combines transfer learning on Word2Vec with deep learning classification for more accurate analysis of short Arabic messages. The proposed method uses a Bidirectional Long Short-Term Memory (Bi-LSTM) architecture, which is widely known for its ability to handle sequential data (such as text), and applies transfer learning to an existing Word2Vec model in order to reuse its knowledge and incrementally learn new words automatically, which improves the overall accuracy of sentiment analysis. We used the benchmark Arabic Sentiment Tweets Dataset for performance comparison and evaluation. The results show that the proposed method achieves an improvement in accuracy over state-of-the-art methods.
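The idea of reusing an existing embedding model while incrementally learning new words can be illustrated with a minimal sketch. This is not the thesis implementation: the function name `extend_embeddings`, the toy vectors, and the random initialization range are all assumptions made for illustration. The sketch shows only the vocabulary-extension step; the further training of the new vectors and the Bi-LSTM classifier are not shown.

```python
import random


def extend_embeddings(pretrained, new_tokens, dim, seed=0):
    """Return a copy of `pretrained` that also covers every token in `new_tokens`.

    Vectors for known words are reused unchanged (the transferred knowledge).
    Unseen words get small random vectors, which a later training step
    (not shown here) would be expected to refine.
    """
    rng = random.Random(seed)
    # Copy so the original pretrained table is left untouched.
    extended = {word: list(vec) for word, vec in pretrained.items()}
    for token in new_tokens:
        if token not in extended:
            # Hypothetical initialization range; an assumption, not from the thesis.
            extended[token] = [rng.uniform(-0.05, 0.05) for _ in range(dim)]
    return extended


# Hypothetical toy data: two "pretrained" 3-dimensional vectors and one tokenized tweet.
pretrained = {"جميل": [0.2, 0.1, -0.3], "سيء": [-0.4, 0.0, 0.5]}
tweet_tokens = ["هذا", "جميل", "جدا"]

extended = extend_embeddings(pretrained, tweet_tokens, dim=3)
```

Here the vector for the known word "جميل" is carried over as-is, while "هذا" and "جدا" are added to the vocabulary with fresh vectors. In practice this step would typically be delegated to an embedding library that can update its vocabulary and continue training on the new corpus.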
Date of Award: Jul 2020
Original language: American English

Keywords

  • Sentiment Analysis
  • Arabic Natural Language Processing
  • Transfer Learning
  • Deep Learning
  • Social Media Analysis
  • Word Embedding
