Computer Science Department, MS Thesis Presentation: Jeremy Lim, "LLM Cheat Prevention Via Adversarial Question Paraphrasing"
Jeremy Lim
MS Student
WPI – Computer Science Department
Thursday, April 24, 2025
Time: 10:30 AM – 11:30 AM
Location: Fuller Labs 141
Advisor: Prof. Fabricio Murai
Reader: Prof. Raha Moraffah
Abstract:
As Large Language Model (LLM) chatbots have become easily accessible, their use to cheat on schoolwork has become a serious problem. However, existing methods that combat this by detecting LLM-generated text are imperfect and may produce harmful false positives that damage the reputation of honest students.
This project explores an alternative strategy. We develop an “inoculation” process that adversarially generates paraphrases of a question in order to discover semantically identical questions that resist being answered correctly by LLMs.
We explore a preliminary strategy to search for inoculated questions. We prompt a small LLM, Llama 3.2 3B, to generate several paraphrases for each question in MMLU. To improve paraphrase quality, we employ the same model as a judge to evaluate each generated paraphrase for validity. Finally, we evaluate GPT-4o mini’s accuracy on the generated paraphrases.
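To make this pipeline concrete, below is a minimal sketch of the generate-judge-evaluate loop, assuming an OpenAI-compatible endpoint serving Llama 3.2 3B locally (e.g., via Ollama) and the OpenAI API for GPT-4o mini; the prompts, judging criterion, and correctness check are illustrative assumptions, not the thesis's exact implementation.

```python
# Minimal sketch of the inoculation search described above.
# Assumes two OpenAI-compatible endpoints: a local server for Llama 3.2 3B
# (e.g., Ollama) and the OpenAI API for GPT-4o mini. Prompts, judging
# criteria, and the correctness check are illustrative, not the thesis's own.
from openai import OpenAI

local = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")  # hypothetical local server
remote = OpenAI()  # reads OPENAI_API_KEY from the environment

def chat(client, model, prompt):
    """Send a single-turn prompt and return the model's text reply."""
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content.strip()

def find_inoculated(question, answer, n_paraphrases=5):
    """Return valid paraphrases of `question` that GPT-4o mini answers incorrectly."""
    candidates = []
    for _ in range(n_paraphrases):
        # 1) Small LLM generates a paraphrase of the original question.
        para = chat(local, "llama3.2:3b",
                    f"Paraphrase this question without changing its meaning:\n{question}")
        # 2) The same small model acts as a judge of semantic validity.
        verdict = chat(local, "llama3.2:3b",
                       "Do these two questions ask the same thing? Answer yes or no.\n"
                       f"A: {question}\nB: {para}")
        if not verdict.lower().startswith("yes"):
            continue  # discard paraphrases the judge rejects
        # 3) Check whether the target model still answers correctly.
        reply = chat(remote, "gpt-4o-mini", para)
        if answer.lower() not in reply.lower():  # crude correctness check
            candidates.append(para)  # inoculation candidate found
    return candidates
```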
Using a small LLM for the paraphrase generation process, we find a successful inoculation candidate for 35.7% of questions whose original phrasing is correctly answered by GPT-4o mini. Furthermore, we observe that 13% of paraphrased questions elicit incorrect responses. This work demonstrates the feasibility of a black-box approach that discovers inoculated questions while limiting the computational power required and preserving question semantics.