This work addresses a text classification problem. The research question is which
workflow solves this problem best. The dataset consists of roughly 430,000 labeled
e-mails. The problem is tackled in two steps: vectorizing the text and then applying a
classification algorithm. Two vectorization methods (Word2vec and TF-IDF) and four
classifiers (Random Forest, Support Vector Machine, Graph Neural Network, and
Feed-Forward Neural Network) were evaluated. The results show that Word2vec performed
well, whereas the memory demands of TF-IDF were too high. Among the classifiers, the
Feed-Forward Neural Network achieved the highest F1 scores of 0.89-0.90, depending on
the trial, followed by Random Forest and Support Vector Machine with F1 scores of
0.87-0.89, while the Graph Neural Network achieved F1 scores of 0.80-0.87.
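Because the classifiers above are compared by F1 score, a minimal sketch of that metric may help when reading the numbers. It assumes a binary labeling with class `1` as the positive class; the abstract does not state the label set or the averaging scheme used in the thesis, and the function name and toy labels are purely illustrative.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return the F1 score (harmonic mean of precision and recall) for one class.

    Assumption: labels are compared against a single positive class; the
    averaging used in the thesis is not stated in the abstract.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        # No correct positive predictions: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical, not taken from the e-mail dataset).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2f}")
```

In this toy case there are three true positives, one false positive, and one false negative, so precision and recall are both 0.75 and so is the F1 score.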
Keywords (Czech)
-
Keywords (English)
Text classification, e-mail, Word2vec, Random Forest, Support Vector Machine, Neural Network
Extent of the accompanying text
p. vi, p. 47
Language
English
Abstract (Czech)
-