Computer Science Department, MS Thesis Presentation
Jeremy Lim: "LLM Cheat Prevention Via Adversarial Question Paraphrasing"


Jeremy Lim

MS Student

WPI – Computer Science Department 

 

Date: Thursday, April 24, 2025

Time: 10:30 AM – 11:30 AM

Location: Fuller Labs 141

 

Advisor: Prof. Fabricio Murai

Reader: Prof. Raha Moraffah 

Abstract:

As Large Language Model (LLM) chatbots have become easy to access, their use to cheat on schoolwork has become a serious problem. However, existing methods to combat this by detecting LLM-generated text are imperfect and can produce harmful false positives that damage the reputation of honest students.

This project explores an alternate strategy. We develop an “inoculation” process that adversarially generates paraphrases of a question in order to discover semantically identical questions that are resistant to being answered correctly by LLMs.

We explore a preliminary strategy to search for inoculated questions. We prompt a small LLM, Llama 3.2 3B, to generate several paraphrases for each question in MMLU. To improve paraphrase quality, we employ the same model as a judge to evaluate each generated paraphrase for validity. Finally, we evaluate GPT-4o mini’s accuracy on the generated paraphrases.
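The generate–judge–test loop can be summarized in a short sketch. The snippet below is a minimal illustration of that pipeline, assuming a hypothetical llm() wrapper for the model calls; the prompts, model identifiers, and answer-matching logic are illustrative assumptions, not the thesis implementation.

```python
# Minimal sketch of the inoculation search loop described above.
# llm() is a hypothetical stand-in for calls to Llama 3.2 3B
# (paraphraser and judge) and GPT-4o mini (target model).

def llm(model: str, prompt: str) -> str:
    """Hypothetical wrapper around a chat-completion endpoint."""
    raise NotImplementedError("connect to your LLM provider here")

def inoculate(question: str, answer: str, n_candidates: int = 8) -> str | None:
    """Return a paraphrase the target model answers incorrectly, if any."""
    for _ in range(n_candidates):
        # Step 1: a small LLM proposes a candidate paraphrase.
        cand = llm("llama-3.2-3b",
                   f"Paraphrase this question, preserving its meaning:\n{question}")
        # Step 2: the same small model judges semantic validity.
        verdict = llm("llama-3.2-3b",
                      "Do these two questions ask the same thing? Answer YES or NO.\n"
                      f"A: {question}\nB: {cand}")
        if "YES" not in verdict.upper():
            continue  # discard paraphrases that change the question's meaning
        # Step 3: test whether the target model still answers correctly
        # (simple substring match here; real answer checking would be stricter).
        response = llm("gpt-4o-mini", cand)
        if answer.lower() not in response.lower():
            return cand  # successful inoculation candidate
    return None
```

Because every step works through model prompts and responses alone, the search needs no gradients or model weights, which is what makes the approach black-box.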

Using a small LLM for the paraphrase generation process, we find a successful inoculation candidate for 35.7% of questions whose original phrasing is correctly answered by GPT-4o mini. Furthermore, we observe that 13% of paraphrased questions elicit incorrect responses. This work demonstrates the feasibility of a black-box approach that discovers inoculated questions while limiting the required computational power and preserving question semantics.
