Macquarie University

Evaluating the performance of Large Language Model and Prompting Technique combinations for vulnerability prioritisation

thesis
posted on 2025-12-03, 02:08 authored by Osama Al Haddad
<p dir="ltr">Security analysts are under growing pressure to assess a rising volume of increasingly complex vulnerabilities. This strains organisations' ability to respond correctly to vulnerabilities and keep corporate systems secure, leading to damaging breaches. To assist Security Operations Centre (SOC) teams in accurately and rapidly prioritising software vulnerabilities, this thesis evaluates Large Language Model (LLM) and Prompting Technique (PT) combinations for prioritising software vulnerabilities, using the Cybersecurity and Infrastructure Security Agency's Stakeholder-Specific Vulnerability Categorization (SSVC) framework. </p><p dir="ltr">OpenAI ChatGPT 4o-mini, Anthropic Claude 3 Haiku, and Google Gemini Flash were each paired with 12 prompting techniques, instructed to analyse 384 real-world vulnerability samples over three trials, and asked to return values for the four SSVC decision points (SDPs). For each trial, F1-scores were calculated for each LLM-PT-SDP combination. A harmonic mean was then calculated across the three trials to yield a single performance score for each LLM-PT-SDP combination. Further analysis examined LLM-PT-SDP performance by the level of data detail available for each vulnerability (high, medium, or low) and by whether the vulnerability was published before or after the LLM's knowledge cutoff date. </p><p dir="ltr">Gemini Flash 1.5 performed strongest overall, showing moderate performance on three SDPs, as did ChatGPT 4o-mini; Claude 3 Haiku achieved moderate performance on only one SDP. Gemini excelled on the Exploitation SDP, while all LLMs performed moderately on the Technical Impact SDP. All LLM-PT combinations performed poorly on the Mission & Wellbeing SDP. Exemplar-based PTs generally outperformed others. Performance generally improved with higher data detail levels, though some LLM-PT combinations performed moderately even with limited data. LLMs generally performed better on vulnerabilities published after their knowledge cutoff dates, with few exceptions. </p><p dir="ltr">On select SDPs, LLMs combined with an appropriate PT are capable of moderate performance. Depending on a vulnerability's data detail level and its publication date relative to the knowledge cutoff date, LLMs may also exhibit good performance. Consequently, LLMs can be a useful aid to help SOC teams accurately and rapidly prioritise software vulnerabilities.</p>
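The aggregation step described above can be sketched in a few lines of Python. This is an illustrative example only: the F1 values below are made up, not taken from the thesis, and the thesis's exact scoring pipeline is not specified here.

```python
from statistics import harmonic_mean

# Hypothetical per-trial F1-scores for one LLM-PT-SDP combination
# (illustrative values only; not results from the thesis).
trial_f1_scores = [0.62, 0.58, 0.65]

# Single performance score: the harmonic mean across the three trials,
# which penalises combinations with one unusually weak trial more than
# an arithmetic mean would.
score = harmonic_mean(trial_f1_scores)
print(round(score, 4))
```

Using the harmonic mean rather than the arithmetic mean means a combination cannot mask one poor trial with two strong ones, which suits a metric (F1) that is itself a harmonic mean of precision and recall.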

History

Table of Contents

1. Introduction -- 2. Related Works -- 3. Research Methods -- 4. Evaluation and Results -- 5. Assumptions, Limitations and Constraints -- 6. Conclusions and Future Research -- A. Appendix -- References

Notes

Additional Supervisor 3: Young Lee

Awarding Institution

Macquarie University

Degree Type

Thesis MRes

Degree

Master of Research

Department, Centre or School

School of Computing

Year of Award

2025

Principal Supervisor

Muhammad Ikram

Additional Supervisor 1

Hassan Asghar

Additional Supervisor 2

Ejaz Ahmed

Rights

Copyright: The Author. Copyright disclaimer: https://www.mq.edu.au/copyright-disclaimer

Language

English

Extent

155 pages

Former Identifiers

AMIS ID: 526509