Text Mining for Classification and Sentiment Analysis of Personal Stories

Authors

  • Adriana Soledad Ruiz Diaz, Universidad CAECE, Argentina
  • Miguel Mendez Garabetti, Free and Open Source Software/Hardware Research Laboratory, Argentina

Keywords

Text Mining, Machine Learning, Classification, Sentiment Analysis

Abstract

This work applies machine learning tools and techniques to automate the analysis of the narratives collected in three editions of the book "Matilda and Women in Engineering in Latin America." The goal is to identify factors that influence women's choice and practice of an engineering career. The methodology follows the guidelines proposed for a Knowledge Discovery in Texts (KDT) process and is divided into several stages: understanding the application domain; data extraction; data cleaning, processing, and transformation; and model development. The project is currently in the phase of constructing the corpus and removing non-significant patterns of information. Next, the text will be tokenized to understand its characteristics, and the most suitable technique for quantifying the words present in the corpus will be evaluated. A supervised machine learning model will then be built to predict the main theme of each narrative, and the narrative's sentiment will be analyzed in relation to that theme. The sentiment of a text will be computed as the sum of the sentiments of the individual words that compose it.
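The planned steps of tokenizing the text, counting its words, and scoring sentiment as a sum of word-level sentiments can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the sample sentence, and the tiny English lexicon `TOY_LEXICON` are all assumptions made for illustration, and a real study would need a proper sentiment lexicon in the language of the corpus.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (word -> polarity score); not taken from the paper.
TOY_LEXICON = {
    "proud": 1.0,
    "happy": 1.0,
    "support": 0.5,
    "difficult": -0.5,
    "afraid": -1.0,
}


def tokenize(text):
    """Lowercase the text and return its runs of letters as word tokens."""
    # [^\W\d_]+ matches Unicode letters only, so accented words stay whole.
    return re.findall(r"[^\W\d_]+", text.lower())


def bag_of_words(tokens):
    """Count how often each token occurs (one simple way to quantify words)."""
    return Counter(tokens)


def text_sentiment(text, lexicon=TOY_LEXICON):
    """Sum the lexicon score of every token; words not in the lexicon add 0."""
    return sum(lexicon.get(token, 0.0) for token in tokenize(text))


# Hypothetical sample sentence, not drawn from the corpus.
story = "I was afraid at first, but I am proud and happy with my career."
counts = bag_of_words(tokenize(story))
score = text_sentiment(story)
```

Under these assumptions, a positive `score` suggests a predominantly positive text and a negative one the opposite. A plain sum ignores negation and context ("not happy" still adds a positive score), which is a known limitation of this kind of lexicon-based scoring.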

Published

2023-07-10

Section

AGRANDA - Simposio Argentino de Ciencia de Datos y GRANdes DAtos

How to Cite

Ruiz Diaz, A. S., & Mendez Garabetti, M. (2023). Text Mining for Classification and Sentiment Analysis of Personal Stories. JAIIO, Jornadas Argentinas De Informática, 9(1). https://revistas.unlp.edu.ar/JAIIO/article/view/18235