Abstract:
With the rapid growth of digitization, extracting topics from information stored as
unlabeled text is not an easy task. This calls for topic modeling, a family of
techniques based on unsupervised algorithms.
In this thesis, we explain the concept of topic modeling and several of its approaches,
namely Latent Dirichlet Allocation (LDA), the Embedded Topic Model (ETM), Gaussian LDA
(G-LDA), and LDA combined with Word2Vec (LDA2Vec).
In the experimental part, we empirically compare LDA and ETM on the 20 Newsgroups
dataset in terms of runtime and topic coherence. The results favor the ETM method.