Keerthana Murugaraj 

Doctoral researcher at the FSTM

Keerthana works on the project ‘Innovative Historical Research by Leveraging Topic Modeling and LLM-Powered RAG Chatbots on Impresso large atom collection of newspaper articles’ under the supervision of Martin Theobald.

Historical newspapers are among the most valuable resources for historians and other researchers in the field of history. They supply important details about global history, politics, culture and public opinion, and they document political and environmental issues and societal views, opening new avenues for research. Digitizing newspapers has become increasingly popular in recent years because it streamlines access to pertinent information. Providing access to digitized newspapers through online platforms also allows historians to filter specific articles, which can then be analyzed using text mining and data analysis tools. This enables historians to uncover hidden trends and understand biased or misleading reports, among other insights. However, it remains challenging and often unfeasible for historians to study, analyze, and interpret all available historical sources.
 
My current research aims to develop an effective topic model on a large collection of newspaper articles spanning from the 1790s to 2018. The model automatically identifies hidden topics that can then be linked back to their original articles. My work also focuses on building interactive visualizations of the identified topics and their evolution over time. These visualizations will help historians grasp topic trends in limited time. I extend my research by developing a real-time Retrieval-Augmented Generation (RAG) chatbot for historians to interact with this corpus. My research leverages neural topic modeling algorithms, along with large language models (LLMs) and RAG techniques, to delve into large collections of historical data. I employ advanced data mining techniques and tools to bridge the gap between historians’ practical research requirements and the abundant historical newspapers accessible within my corpus. These techniques and strategies will ultimately result in a valuable research tool that allows historians to efficiently and comprehensively analyze large collections of historical newspapers.