posted on 2025-12-03, 02:08 authored by Osama Al Haddad
<p dir="ltr">Security analysts are under growing pressure to assess increasingly complex and high-volume vulnerabilities. This pressure impacts the ability of organisations to respond to vulnerabilities correctly and keep corporate systems secure, leading to damaging breaches. To assist Security Operations Centre (SOC) teams in accurately and rapidly prioritising software vulnerabilities, this thesis evaluates combinations of Large Language Models (LLMs) and Prompting Techniques (PTs) for vulnerability prioritisation, using the Cybersecurity and Infrastructure Security Agency's Stakeholder-Specific Vulnerability Categorization (SSVC) framework. </p><p dir="ltr">OpenAI ChatGPT 4o-mini, Anthropic Claude 3 Haiku, and Google Gemini Flash 1.5, each paired with 12 prompting techniques, were instructed to analyse 384 real-world vulnerability samples over three trials and to return values for the four SSVC decision points (SDPs). For each trial, F1-scores were calculated for each LLM-PT-SDP combination. A harmonic mean across the three trials then yielded a single performance score for each LLM-PT-SDP combination. Further analysis determined LLM-PT-SDP combination performance based on the level of data detail available for each vulnerability (high, medium, or low) and on whether the vulnerability was published before or after the LLM's knowledge cutoff date. </p><p dir="ltr">Gemini Flash 1.5 performed strongest overall, showing moderate performance on three SDPs, as did ChatGPT 4o-mini. Claude 3 Haiku achieved moderate performance on only one SDP. Gemini excelled on the Exploitation SDP, while all LLMs performed moderately on the Technical Impact SDP. All LLM and PT combinations performed poorly on the Mission & Wellbeing SDP. Exemplar-based PTs generally outperformed the others. Performance generally improved with higher data detail levels, though some LLM-PT combinations performed moderately even with limited data.
LLMs generally performed better on vulnerabilities published after LLM knowledge cutoff dates, with few exceptions. </p><p dir="ltr">On select SDPs, LLMs combined with an appropriate PT are capable of moderate performance. Depending on the vulnerability's data detail level and its publication date relative to the knowledge cutoff date, LLMs may also exhibit good performance. Consequently, LLMs can be a useful aid in helping SOC teams accurately and rapidly prioritise software vulnerabilities.</p>
Table of Contents
1. Introduction -- 2. Related Works -- 3. Research Methods -- 4. Evaluation and Results -- 5. Assumptions, Limitations and Constraints -- 6. Conclusions and Future Research -- A. Appendix -- References
Notes
Additional Supervisor 3: Young Lee
Awarding Institution
Macquarie University
Degree Type
Thesis MRes
Degree
Master of Research
Department, Centre or School
School of Computing
Year of Award
2025
Principal Supervisor
Muhammad Ikram
Additional Supervisor 1
Hassan Asghar
Additional Supervisor 2
Ejaz Ahmed
Rights
Copyright: The Author
Copyright disclaimer: https://www.mq.edu.au/copyright-disclaimer