Computer Science Department, MS Thesis Presentation: Jeremy Lim, "LLM Cheat Prevention Via Adversarial Question Paraphrasing"
Jeremy Lim
MS Student
WPI – Computer Science Department
Thursday, April 24, 2025
Time: 10:30 AM – 11:30 AM
Location: Fuller Labs 141
Advisor: Prof. Fabricio Murai
Reader: Prof. Raha Moraffah
Abstract:
As Large Language Model (LLM) chatbots have become easily accessible, their use to cheat on schoolwork has become a serious problem. However, existing methods that combat this by detecting LLM-generated text are imperfect and may produce harmful false positives that damage the reputation of honest students.
This project explores an alternative strategy. We develop an “inoculation” process that adversarially generates paraphrases of a question in order to discover semantically identical questions that resist being answered correctly by LLMs.
We explore a preliminary strategy to search for inoculated questions. We prompt a small LLM, Llama 3.2 3B, to generate several paraphrases for each question in MMLU. To improve paraphrase quality, we employ the same model as a judge to evaluate each generated paraphrase for validity. Finally, we evaluate GPT-4o mini’s accuracy on the generated paraphrases.
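To make this pipeline concrete, below is a minimal sketch of the generate-judge-evaluate loop, assuming an OpenAI-compatible endpoint serving Llama 3.2 3B locally (e.g., via Ollama) and the OpenAI API for GPT-4o mini; the prompts, judging criterion, and correctness check are illustrative assumptions, not the thesis's exact implementation.

```python
# Minimal sketch of the inoculation search described above.
# Assumes two OpenAI-compatible endpoints: a local server for Llama 3.2 3B
# (e.g., Ollama) and the OpenAI API for GPT-4o mini. Prompts, judging
# criteria, and the correctness check are illustrative, not the thesis's own.
from openai import OpenAI

local = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")  # hypothetical local server
remote = OpenAI()  # reads OPENAI_API_KEY from the environment

def chat(client, model, prompt):
    """Send a single-turn prompt and return the model's text reply."""
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content.strip()

def find_inoculated(question, answer, n_paraphrases=5):
    """Return valid paraphrases of `question` that GPT-4o mini answers incorrectly."""
    candidates = []
    for _ in range(n_paraphrases):
        # 1) Small LLM generates a paraphrase of the original question.
        para = chat(local, "llama3.2:3b",
                    f"Paraphrase this question without changing its meaning:\n{question}")
        # 2) The same small model acts as a judge of semantic validity.
        verdict = chat(local, "llama3.2:3b",
                       "Do these two questions ask the same thing? Answer yes or no.\n"
                       f"A: {question}\nB: {para}")
        if not verdict.lower().startswith("yes"):
            continue  # discard paraphrases the judge rejects
        # 3) Check whether the target model still answers correctly.
        reply = chat(remote, "gpt-4o-mini", para)
        if answer.lower() not in reply.lower():  # crude correctness check
            candidates.append(para)  # inoculation candidate found
    return candidates
```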
Using a small LLM for the paraphrase generation process, we find a successful inoculation candidate for 35.7% of questions whose original phrasing is correctly answered by GPT-4o mini. Furthermore, we observe that 13% of paraphrased questions elicit incorrect responses. This work demonstrates the feasibility of a black-box approach that discovers inoculated questions while limiting the computational power required and preserving question semantics.