Abstract: Large Language Models (LLMs) are increasingly used for tasks like match summarization and explanation in hiring pipelines. However,
these systems are vulnerable to prompt injection attacks, where malicious input manipulates the behavior of the model. In this paper,
we investigate a class of prompt injection attacks that aim to deceive LLM-based feature extractors into overestimating candidate
qualifications based on manipulated resume content. We present real-world examples of such resumes and evaluate the effectiveness
of various mitigation strategies. Specifically, we conduct a comparative vulnerability analysis across multiple models, prompting
techniques, and output formats. We also provide empirical results demonstrating the impact of these mitigations, showing before-and-after performance across key evaluation metrics. Our findings offer actionable best practices for securing LLM-powered extraction
pipelines against adversarial user-generated content.