Mohsen Ghasemizade

About

I am a PhD Student in Computer Science at the University of Vermont, currently serving as a MassMutual Research Fellow. My research lies at the intersection of privacy, ethics, and artificial intelligence, with a focus on GDPR compliance, differential privacy, and the privacy-preserving applications of Large Language Models.

Previously, I completed my M.S. in Computer Science with a thesis on the hierarchical analysis of conspiracy theories. Beyond my academic work, I am an advocate for safe streets and active transportation, serving as a Board Member for Local Motion in Burlington, VT.
Advisee of Juniper Lovato
Member of Computational Ethics Lab

Research Interests

- Privacy Policy
- Differential Privacy
- Large Language Models
- Conspiracy Theories
- Honeypots

Projects

AIM High, Stay Private

Addresses a critical privacy challenge by using Differential Privacy to create private, synthetic health data that protects individual identities.
- Successfully applied this method to a real-world health study (LEMURS) of college students, protecting sensitive data collected from Oura rings and personal health surveys.
- Demonstrates that a practical balance is achievable, identifying a "sweet spot" where the generated data remains highly useful for researchers while significantly reducing privacy risks for participants.
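
As a rough illustration of the privacy/utility trade-off described above, the sketch below adds Laplace noise to a histogram of counts, a basic building block behind many differentially private data releases. It is not the project's actual pipeline: the function names, the toy sleep-duration bins, and the epsilon values are all assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_histogram(counts, epsilon, rng):
    """Add Laplace noise to each bin.

    Assumes each person falls in exactly one bin, so adding or removing
    one person changes the histogram by at most 1 (L1 sensitivity 1).
    """
    scale = 1.0 / epsilon
    return {key: value + laplace_noise(scale, rng) for key, value in counts.items()}


def mean_abs_error(true_counts, noisy_counts):
    """Average absolute difference between true and noisy bin counts."""
    return sum(abs(true_counts[k] - noisy_counts[k]) for k in true_counts) / len(true_counts)


# Hypothetical toy data: number of participants per nightly sleep-duration bin.
true_counts = {"<6h": 40, "6-8h": 130, ">8h": 30}
rng = random.Random(0)

for epsilon in (0.1, 1.0, 10.0):
    noisy = private_histogram(true_counts, epsilon, rng)
    print(f"epsilon={epsilon}: mean abs error = {mean_abs_error(true_counts, noisy):.2f}")
```

Smaller epsilon values give stronger privacy but noisier counts. A "sweet spot" in this toy setting would be the smallest epsilon at which the error is still acceptable for the analysis at hand.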

Developing a hierarchical model for unraveling conspiracy theories

- Developed a scientifically structured 'family tree' of conspiracy theories, categorizing and illustrating the connections among various conspiracies to enhance community understanding.
- Created the dataset by scraping articles from fact-checking websites and efficiently labeling them, which simplified identifying the main themes in each article.
- Trained binary classifiers using several machine learning methods; the RoBERTa model performed best, with an F1 score of 87% in distinguishing conspiracy-related from non-conspiracy articles.
- Used UMAP together with HDBSCAN to cluster and explore the data, generating labels to add to the main family tree.
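
For readers unfamiliar with the F1 metric reported above, here is a minimal sketch of how it is computed for a binary classifier. The labels are made-up toy values, not data from the project, and `f1_score` is a hand-written helper rather than a library call.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels: 1 = conspiracy-related, 0 = not conspiracy-related.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]

score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2f}")
```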

Privacy Policy Compliance Scoring System

Automated evaluation of privacy policies against evolving legal standards.

- Developing a novel scoring system to assess tech companies' privacy policies for compliance with major legislation, ranging from the 1980 OECD guidelines to modern GDPR and CCPA frameworks.
- Tracking the evolution of policy language over time to identify shifts in compliance and transparency.
- Utilizing a fine-tuned LLM to automate the interpretation and ranking of complex legal text.

Enhancing Government Contract Transparency with LLMs

Leveraging Large Language Models to increase transparency in state government spending.
- Developed a contract classifier for over 20,000 Iowa state government contracts.
- Utilized Google Gemini embeddings with average embedding pooling, achieving an 85% F1 score.
- Implemented Few-Shot Chain-of-Thought prompting with GPT to handle complex classification tasks (71% accuracy).
- Fine-tuned LLaMA models for large-scale assisted labeling and multi-class classification.
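
The average embedding pooling mentioned above can be sketched as follows: a long contract is split into chunks, each chunk is embedded, and the chunk vectors are averaged into one document vector that a classifier can consume. This is only an illustration under stated assumptions. The vectors are hard-coded toy values (no embedding API is called), the category names are invented, and the nearest-centroid step stands in for whatever classifier the project actually used.

```python
import math


def mean_pool(chunk_vectors):
    """Average a list of equal-length chunk embeddings into one vector."""
    if not chunk_vectors:
        raise ValueError("need at least one chunk embedding")
    dim = len(chunk_vectors[0])
    n = len(chunk_vectors)
    return [sum(vec[i] for vec in chunk_vectors) / n for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_centroid(doc_vector, centroids):
    """Return the label whose centroid is most similar to the document vector."""
    return max(centroids, key=lambda label: cosine(doc_vector, centroids[label]))


# Toy 3-dimensional "embeddings" for three chunks of one contract.
chunks = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [0.5, 0.5, 1.0]]
doc_vector = mean_pool(chunks)

# Hypothetical class centroids, e.g. averaged from labeled training contracts.
centroids = {
    "construction": [0.4, 0.6, 0.9],
    "software": [1.0, -0.2, 0.1],
}
label = nearest_centroid(doc_vector, centroids)
print(doc_vector, label)
```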

Distributed Cryptocurrency System

- Simulated a secure, distributed cryptocurrency system, ensuring transparency and trust by enabling each participant to control and validate the ledger, effectively preventing fraudulent activities such as double spending.
- Built on a peer-to-peer network, the simulation covers the essential steps of a cryptocurrency's operation: transaction signing, block mining, broadcasting, and validation.
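
The mining and validation steps listed above can be sketched with a toy hash-linked chain. This is a simplified illustration, not the project's code: the block layout, the `DIFFICULTY` setting, and all function names are assumptions, and transaction signing, broadcasting, and balance checks against double spending are left out to keep the example short.

```python
import hashlib
import json

DIFFICULTY = 3  # assumed toy setting: block hash must start with this many hex zeros


def block_hash(block):
    """SHA-256 of the block's canonical JSON encoding."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def mine_block(index, prev_hash, transactions):
    """Try nonces until the block hash meets the difficulty target (proof of work)."""
    nonce = 0
    while True:
        block = {
            "index": index,
            "prev_hash": prev_hash,
            "transactions": transactions,
            "nonce": nonce,
        }
        if block_hash(block).startswith("0" * DIFFICULTY):
            return block
        nonce += 1


def is_valid_chain(chain):
    """Check every block's proof of work and its link to the previous block."""
    for i, block in enumerate(chain):
        if not block_hash(block).startswith("0" * DIFFICULTY):
            return False
        if i > 0 and block["prev_hash"] != block_hash(chain[i - 1]):
            return False
    return True


# Build a three-block chain with made-up transactions.
genesis = mine_block(0, "0" * 64, [])
block1 = mine_block(1, block_hash(genesis), [{"from": "alice", "to": "bob", "amount": 5}])
block2 = mine_block(2, block_hash(block1), [{"from": "bob", "to": "carol", "amount": 2}])
chain = [genesis, block1, block2]
print("original chain valid:", is_valid_chain(chain))

# Any participant can detect tampering: changing block 1 breaks block 2's link.
tampered = json.loads(json.dumps(chain))  # deep copy
tampered[1]["transactions"][0]["amount"] = 500
print("tampered chain valid:", is_valid_chain(tampered))
```

Because each block stores the hash of its predecessor, altering an earlier transaction invalidates every later block unless they are all re-mined, which is what lets each participant validate the ledger independently.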