Pardis Sadat Zahraei

PhD Student in Computer Science

About Me

I am a first-year PhD student in Computer Science at the University of Illinois at Urbana-Champaign (UIUC). My research focuses on Natural Language Processing (NLP), with a particular interest in the safety and alignment of large language models (LLMs). I develop methods and benchmarks to evaluate and enhance LLMs in areas such as multilingual and cross-cultural understanding, reasoning, and ethics. You can also find me active on X (Twitter).

Publications

MENA Values Benchmark for Evaluating Cultural Alignment and Multilingual Bias in LLMs (Under Review - NeurIPS 2025)

Pardis Sadat Zahraei, Ehsaneddin Asgari

We introduce MENAValues, a new benchmark to evaluate how LLMs align with the cultural values of the Middle East and North Africa. Our research reveals that LLM responses are sensitive to language and framing, with models showing shifts in cultural alignment and, in some cases, producing biased outputs. We also identify a phenomenon called "Logit Leakage," where hidden model preferences are exposed through log-probability analysis. This work highlights the importance of using frameworks like MENAValues to assess the cultural sensitivity of LLMs.

Translate With Care: Addressing Gender Bias, Neutrality, and Reasoning in LLM Translations (Findings-ACL 2025)

Pardis Sadat Zahraei, Ali Emami

This paper addresses gender bias and logical coherence in machine translation, particularly between gendered languages like English and genderless ones such as Persian. We introduce the Translate-with-Care (TWC) dataset, which includes 3,950 challenging scenarios to test translation systems. Our findings show that all tested models struggle with genderless content, often defaulting to masculine pronouns in professional contexts. We demonstrate that fine-tuning an open-source model on our dataset can significantly reduce these biases and errors, outperforming proprietary LLMs.

WSC+: Enhancing The Winograd Schema Challenge Using Tree-of-Experts (EACL 2024)

Pardis Sadat Zahraei, Ali Emami

Tree-of-Experts (ToE) is a new prompting method that improves the generation of Winograd Schema Challenge questions, achieving 50% valid cases compared to 10% with existing methods. Using ToE, we created WSC+, a dataset of 3,026 LLM-generated questions that includes new categories for ambiguous and offensive content. Our findings show that while GPT-4 leads LLM performance on WSC+ with 68.7% accuracy, this falls well below human performance of 95.1%. We also found that LLMs don't necessarily answer their own generated questions better than those created by other models.

View Paper | View GitHub | View Video

TuringQ: Benchmarking AI Comprehension in Theory of Computation (Findings-EMNLP 2024)

Pardis Sadat Zahraei, Ehsaneddin Asgari

TuringQ is the first benchmark that tests LLMs' reasoning abilities in theoretical computer science. We evaluated various LLMs with Chain-of-Thought prompting and developed an automated evaluation system that performs comparably to human experts. Fine-tuning Llama3-8B on TuringQ improved both its theoretical reasoning and its performance on related tasks like algebra, demonstrating the benchmark's value for advancing LLM capabilities in computational theory.

View Paper | View GitHub | View Dataset | View Model

Detecting Bias and Enhancing Diagnostic Accuracy in Large Language Models for Healthcare

Pardis Sadat Zahraei, Zahra Shakeri

Biased AI medical advice poses risks to patient safety as LLMs increasingly influence healthcare decisions. This study introduces two key resources: BiasMD (6,007 Q&A pairs for bias evaluation) and DiseaseMatcher (32,000 clinical Q&As covering 700 diseases). Using these datasets, we developed EthiClinician, a fine-tuned model that surpasses GPT-4 in ethical reasoning and clinical judgment, setting new standards for safer AI-driven healthcare outcomes.

View Paper | View GitHub | View BiasMD | View DiseaseMatcher | View Model

Generative AI for Character Animation: A Comprehensive Survey

Mohammad Mahdi Abootorabi, Omid Ghahroodi, Pardis Sadat Zahraei, et al.

This survey offers a comprehensive overview of how generative AI is applied to character animation, covering facial animation, motion synthesis, and more. It highlights key research, datasets, and trends, providing a single, integrative perspective on the field. The paper also discusses open challenges and future research directions to help researchers and developers advance AI-driven animation technologies.

Projects

Persian Ease & Persian Formalizer

Persian Ease and Persian Formalizer are a pair of complementary language models fine-tuned for Persian text style transfer. Persian Ease transforms formal Persian text into a more casual, conversational style. Persian Formalizer converts informal Persian text into formal language suitable for professional or academic contexts. Both models leverage fine-tuning techniques to preserve meaning while adapting the linguistic style appropriately.

View PersianEase | View PersianTextFormalizer

Persian NER and Text Summarization

Implemented dual transformer-based models (mT5 and BERT) for Persian language processing, featuring Named Entity Recognition (NER) for token classification and an abstractive text summarization system.

View PersianSummarizer | View PersianNER

Cross-Lingual Drug Name Prediction

This project develops a model to accurately predict drug names in both Persian and English using two embedding techniques, FastText and BERT. By leveraging these embeddings, the model predicts drug names based on specific features and patterns in the input data. This bilingual approach enables pharmaceutical and healthcare applications to enhance drug name identification and suggestion in multilingual environments.

View Project

Sentiment Analysis on Social Media Data with Custom NLP Pipeline

This project conducts sentiment analysis on Twitter and YouTube data, implementing a specialized preprocessing pipeline to improve accuracy and contextual understanding. The pipeline includes essential NLP steps such as lemmatization, tokenization, NER, and spell checking, along with advanced customizations like bigram verification, contradiction resolution, and a slang dictionary tailored to social media language. These techniques enable more accurate and nuanced sentiment insights by accounting for informal language, abbreviations, and unique social media expressions.

View Project
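
The preprocessing idea described for the sentiment analysis project can be illustrated with a minimal sketch. This is not the project's actual code: the slang dictionary entries, the function names, and the regular-expression tokenizer are all assumptions made for illustration, and lemmatization, NER, spell checking, and contradiction resolution are omitted.

```python
import re

# Hypothetical slang dictionary; the entries are illustrative only.
SLANG = {"u": "you", "gr8": "great", "idk": "i do not know", "thx": "thanks"}

# Simple tokenizer (an assumption): lowercase runs of letters, digits, apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return TOKEN_RE.findall(text.lower())


def expand_slang(tokens, slang=SLANG):
    """Replace each slang token with its expansion, which may be several words."""
    expanded = []
    for token in tokens:
        expanded.extend(slang.get(token, token).split())
    return expanded


def bigrams(tokens):
    """Return adjacent token pairs, the raw material for a bigram check."""
    return list(zip(tokens, tokens[1:]))


def preprocess(text):
    """Tokenize, expand slang, and return the tokens with their bigrams."""
    tokens = expand_slang(tokenize(text))
    return tokens, bigrams(tokens)
```

Calling `preprocess("Thx, u r gr8!")` yields the tokens `thanks`, `you`, `r`, `great` together with their adjacent pairs. The unexpanded `r` shows why the dictionary needs to be tailored to the target platform's language.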

Contact

I'm always open to conversations and potential collaborations! Feel free to reach out at zahraei2 [at] illinois [dot] edu.