Mohsen Ghasemizade

About

Privacy policy enthusiast and current PhD candidate in computer science with a background in Generative AI, NLP, Deep Learning, Differential Privacy, and data analytics. I have experience developing end-to-end NLP pipelines and machine learning models. I'm interested in leveraging data and AI to find practical solutions and insights. I've worked on uncovering patterns in complex datasets, particularly in social media behavior and data privacy.
Advisee of Juniper Lovato
Member of Computational Ethics Lab

Research Interests

- Privacy Policy
- Differential Privacy
- Large Language Models
- Conspiracy Theories
- Honeypots

AIM High, Stay Private

- Addresses a critical privacy challenge by using Differential Privacy to create private, synthetic health data that protects individual identities.
- Successfully applied this method to a real-world health study (LEMURS) of college students, protecting sensitive data collected from Oura rings and personal health surveys.
- Demonstrates that a practical balance is achievable, identifying a "sweet spot" where the generated data remains highly useful for researchers while significantly reducing privacy risks for participants.
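The privacy/utility trade-off described above can be illustrated with a very simple differentially private synthesizer: add Laplace noise to a histogram, then sample synthetic records from the noisy counts. This is a minimal sketch under stated assumptions, not the project's actual method. It handles a single categorical attribute, assumes each participant contributes exactly one value (so the histogram's L1 sensitivity is 1 under add/remove-one neighbors), and the helper names and sleep-quality bins are hypothetical.

```python
import random
from collections import Counter


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_histogram(values, domain, epsilon, rng):
    """Noisy counts over a fixed, public domain (hypothetical helper).

    Assumes each participant contributes one value, so adding or removing one
    person changes the histogram by at most 1 in L1 norm (sensitivity = 1).
    """
    counts = Counter(values)
    scale = 1.0 / epsilon  # Laplace scale b = sensitivity / epsilon
    noisy = {v: counts.get(v, 0) + laplace_noise(scale, rng) for v in domain}
    # Clipping negatives is post-processing, so the DP guarantee is unchanged.
    return {v: max(0.0, c) for v, c in noisy.items()}


def synthesize(noisy_hist, n, rng):
    """Sample n synthetic records in proportion to the noisy counts."""
    domain = list(noisy_hist)
    weights = [noisy_hist[v] for v in domain]
    if sum(weights) == 0:
        weights = [1.0] * len(domain)  # fall back to uniform if every count was clipped to 0
    return rng.choices(domain, weights=weights, k=n)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical sleep-quality bins standing in for sensitive health data.
    real = ["poor"] * 30 + ["fair"] * 50 + ["good"] * 20
    for eps in (0.1, 1.0, 10.0):
        hist = dp_histogram(real, ["poor", "fair", "good"], eps, rng)
        print(eps, {k: round(v, 1) for k, v in hist.items()})
    print(synthesize(hist, 5, rng))
```

Smaller values of `epsilon` mean a larger noise scale, which gives stronger privacy but noisier (less useful) synthetic data; the "sweet spot" is the range of `epsilon` where the synthetic counts still track the real ones closely enough for analysis.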

Developing a Hierarchical Model for Unraveling Conspiracy Theories

- Developed a scientifically structured 'family tree' of conspiracy theories, categorizing and illustrating the connections among various conspiracies to enhance community understanding.
- Created the dataset by scraping articles from fact-checking websites and efficiently labeling them using Keyphrase Extraction, simplifying the process of identifying the main themes in each article.
- Compared several machine learning methods for binary classification of conspiracy-related versus non-conspiracy articles; a RoBERTa model performed best, with an F1 score of 87%.
- Combined UMAP (dimensionality reduction) with HDBSCAN (density-based clustering) to cluster and explore the data, generating new labels to add to the main family tree.
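The labeling step relies on keyphrase extraction to surface each article's main themes. The source does not say which extractor was used, so the sketch below swaps in plain TF-IDF keyword scoring as a stand-in: it ranks single words (not multi-word phrases) by how frequent they are in one article and how rare they are across the corpus. The stopword list, function names, and example sentences are assumptions for illustration only.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list (an assumption; real pipelines use a fuller one).
STOPWORDS = {"the", "and", "was", "that", "this", "with", "for", "are", "from", "into", "its"}


def tokenize(text):
    """Lowercase word tokens, dropping stopwords and words of two letters or fewer."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if len(t) > 2 and t not in STOPWORDS]


def top_keywords(docs, k=3):
    """Return the k highest TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency: count each term once per document
    results = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1
        scores = {w: (c / total) * math.log(n_docs / df[w]) for w, c in tf.items()}
        # Sort by descending score, breaking ties alphabetically for determinism.
        results.append(sorted(scores, key=lambda w: (-scores[w], w))[:k])
    return results


if __name__ == "__main__":
    articles = [  # hypothetical fact-check snippets
        "The moon landing was faked in a studio; moon hoax claims persist.",
        "Vaccines contain microchips and the microchips track people.",
        "Chemtrails are sprayed from planes, and planes leave chemtrails.",
    ]
    for keywords in top_keywords(articles):
        print(keywords)
```

Terms that appear in every article get an IDF of zero and drop to the bottom, which is why shared boilerplate words rarely end up as labels.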

Hacker Detector with Honey Documents

- Created Google Docs containing simulated hacking methods for distribution across paste sites.
- Used the Cutt.ly API, Google Apps Script, and a self-controlled domain to track visitor metrics such as visit count, edits made, geolocation, browser type, operating system, and device.
- Distinguished bot accesses from human (non-bot) accesses.
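Separating bots from humans typically combines several weak signals from the collected metrics. The sketch below is a hypothetical rule-based heuristic, not the detector used in the project: it flags user agents that are missing or self-identify as automated, and treats frequent visits with no edits as scripted access. The signature regex, the `Visit` fields, and the 20-visits-per-hour threshold are all assumptions.

```python
import re
from dataclasses import dataclass

# Hypothetical signature list; a real deployment would use a maintained bot list.
BOT_UA = re.compile(r"bot|crawler|spider|curl|wget|python-requests|headless", re.IGNORECASE)


@dataclass
class Visit:
    user_agent: str        # browser / OS string reported by the visitor
    visits_last_hour: int  # visits from the same source in the past hour
    made_edit: bool        # whether the visitor edited the honey document


def looks_like_bot(visit: Visit, max_visits_per_hour: int = 20) -> bool:
    ua = visit.user_agent.strip()
    # Missing or self-identified automated user agents are treated as bots.
    if not ua or BOT_UA.search(ua):
        return True
    # Rapid repeated access with no interaction suggests scripted scraping.
    return visit.visits_last_hour > max_visits_per_hour and not visit.made_edit


if __name__ == "__main__":
    chrome = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0 Safari/537.36"
    print(looks_like_bot(Visit("Googlebot/2.1 (+http://www.google.com/bot.html)", 1, False)))
    print(looks_like_bot(Visit(chrome, 3, False)))
```

A rule list like this is easy to audit, but it is only a first pass; signals such as geolocation or device consistency could be layered on top.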

Music Genre Classification

Classified song genres from the metadata of over 1 million songs using 7 features, comparing support vector machines (SVM), multilayer perceptrons (MLP), convolutional neural networks (CNN), decision trees, and linear and logistic regression, with k-fold cross-validation used for evaluation.
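k-fold cross-validation splits the data into k folds, trains on k-1 of them, tests on the held-out fold, and averages the scores over all k rounds. The sketch below shows that loop in plain Python with a nearest-centroid classifier standing in for the SVM/MLP/CNN models; the classifier choice, function names, and the synthetic 7-feature data are assumptions, not the project's code.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Nearest-centroid 'training': the mean feature vector of each genre."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, x):
    """Assign x to the genre whose centroid is closest (squared Euclidean distance)."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], x)))


def cross_val_accuracy(X, y, k=5, seed=0):
    """Mean accuracy over k folds; each fold is held out exactly once."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical 7-feature songs from two well-separated genres.
    X = [[rng.gauss(0, 1) for _ in range(7)] for _ in range(50)]
    X += [[rng.gauss(3, 1) for _ in range(7)] for _ in range(50)]
    y = ["rock"] * 50 + ["jazz"] * 50
    print(cross_val_accuracy(X, y, k=5))
```

Any of the listed models can be dropped in by replacing `fit_centroids` and `predict`; the fold logic stays the same, which is what makes the comparison between models fair.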

Distributed Cryptocurrency System

- Simulated a secure, distributed cryptocurrency system, ensuring transparency and trust by enabling each participant to control and validate the ledger, effectively preventing fraudulent activities such as double spending.
- Built the simulation on a peer-to-peer network that demonstrates the core processes of a cryptocurrency, including transaction signing, block mining, broadcasting, and validation, resulting in a decentralized digital currency ecosystem.
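Mining and validation can be sketched with SHA-256 proof of work and a chain check that also rejects double spending. This is a minimal single-node sketch, not the project's implementation: networking and broadcasting are omitted, transaction signing is left out because the Python standard library has no public-key signature primitive, and each transaction is assumed (hypothetically) to be a dict with an `"input"` coin ID that may be spent only once.

```python
import hashlib
import json
from dataclasses import dataclass


def block_hash(index, prev_hash, transactions, nonce):
    # Canonical JSON so every peer hashes the same bytes for the same block.
    payload = json.dumps(
        {"index": index, "prev": prev_hash, "txs": transactions, "nonce": nonce},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()


@dataclass
class Block:
    index: int
    prev_hash: str
    transactions: list
    nonce: int
    hash: str


def mine(index, prev_hash, transactions, difficulty=3):
    """Proof of work: find a nonce whose block hash starts with `difficulty` zeros."""
    target = "0" * difficulty
    nonce = 0
    while True:
        h = block_hash(index, prev_hash, transactions, nonce)
        if h.startswith(target):
            return Block(index, prev_hash, transactions, nonce, h)
        nonce += 1


def valid_chain(chain, difficulty=3):
    """Check hashes, proof of work, links, and that no coin is spent twice."""
    target = "0" * difficulty
    spent = set()
    for i, b in enumerate(chain):
        if b.hash != block_hash(b.index, b.prev_hash, b.transactions, b.nonce):
            return False  # block contents were altered after mining
        if not b.hash.startswith(target):
            return False  # insufficient proof of work
        if i > 0 and b.prev_hash != chain[i - 1].hash:
            return False  # broken link to the previous block
        for tx in b.transactions:
            if tx["input"] in spent:
                return False  # double spend: the same input is used twice
            spent.add(tx["input"])
    return True


if __name__ == "__main__":
    genesis = mine(0, "0" * 64, [])
    b1 = mine(1, genesis.hash, [{"input": "coin-1", "to": "bob"}])
    print(valid_chain([genesis, b1]))
    b2 = mine(2, b1.hash, [{"input": "coin-1", "to": "carol"}])  # reuses coin-1
    print(valid_chain([genesis, b1, b2]))
```

Because every peer can rerun `valid_chain` on its own copy of the ledger, no single participant has to be trusted; a tampered block or a reused input is rejected by everyone independently.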