Aritra Kumar Lahiri

I am a Ph.D. graduate in Computer Science (Oct. 2025) with a natural flair for combining technical know-how with creativity. My doctoral thesis focuses on context-aware question answering, multimodal Retrieval-Augmented Generation (RAG) for medical applications, and conversational chat engines.

My research contributions can be summarized as follows:

Implemented AlzheimerRAG, a multimodal RAG pipeline for biomedical research that focuses primarily on Alzheimer's disease literature from PubMed. The pipeline uses multimodal fusion techniques to integrate textual and visual data, and it efficiently indexes and retrieves large volumes of biomedical literature.
Tools & Technology: LLMs such as fine-tuned variants of LLaMA and LLaVA, LangChain, FAISS, Jinja2, FastAPI, and GPT-4.0.
Published DragonVerseQA, an open-domain, long-form, context-aware question-answering dataset based on the fantasy universe of the TV series "House of the Dragon" and "Game of Thrones". It combines full episode summaries sourced from HBO and fandom wiki websites, user reviews from sources such as IMDb and Rotten Tomatoes, and structured data from repositories such as Wikidata, drawing only on high-quality, open-domain, legally admissible sources to provide multidimensional context in a single dataset. It also integrates a knowledge graph for context analysis.
Developed GameofThronesQA v1.0, a novel QA dataset built with the answer-aware question generation technique in NLP. The pipeline first extracts answers from named entities in the TV series and then generates questions for those answers.
Presented OTTQA v1.0, a named-entity-based sentiment analysis dataset. Supporting tweets are extracted according to their relevance to the answer-span keyword and are used to gauge how opinions about OTT series characters change over a given period.
Retrieved the most relevant clinical trials from PubMed data for the provided topics using natural language processing techniques and neural language models, as part of the TREC Clinical Trials Track 2023. Performed data preparation with PubMed Parser for article extraction, followed by cleaning and preprocessing of the model input for information retrieval. Used Sentence Transformer and Doc2Vec models for feature extraction and reranked the articles by cosine similarity score.
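The reranking step in the clinical-trials work compares a topic with each candidate article by the cosine similarity of their embeddings. A minimal sketch of that idea in plain Python follows. It assumes the topic and each article have already been turned into vectors (for example by a Sentence Transformer or Doc2Vec model); the function names `cosine_similarity` and `rerank`, the document IDs, and the toy vectors are hypothetical illustrations, not the code used in the TREC submission.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(
    query_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: score every document against the query embedding
    and return the top_k (doc_id, score) pairs, highest similarity first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real sentence embeddings (assumption).
    topic = [1.0, 0.0]
    articles = {"trial_a": [0.9, 0.1], "trial_b": [0.0, 1.0], "trial_c": [1.0, 1.0]}
    print(rerank(topic, articles, top_k=2))
```

Cosine similarity ignores vector length and compares only direction, which is why it is a common choice for comparing dense text embeddings whose magnitudes are not directly meaningful.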

You can refer to the project sections for more details.

When I am not working, I like to travel, hike, explore new places, or go on long drives. I also enjoy playing sports (especially cricket and badminton), which keep me fit and energized. I am passionate about creative writing; you can find my blogs here.

Let's connect

Want a quick look at my projects? Please refer to my GitHub repo, linked below.