Goblin Diary #2 - AI Tools for Analysts 🐯

Don't Use AI

Analyst work is built on the human capacity for creativity, memory recall and information gathering, and using so-called 'AI tools' will actively diminish you in these areas. Your ability to form useful thoughts is built on the continued labour of your mind. Each aspect of the labour you partake in develops its own unique quality in your cognition. Your drive connects you to this labour but does not assure you of any benefits. If you ask AI tools to format documents you will get worse at building narratives, and if you ask AI tools to process registry modification events you will get worse at pattern recognition.

After slowly eroding the individual, minute qualities of your cognition you will one day be asked to solve a difficult problem. You will fail. You will write the problem off as impossible or unfair, blinded by the damage you have already done to yourself. People who lack the capacity to understand may even turn to frustration and anger, but it will be in vain. Set in you will be behaviours that have overwritten what took perhaps 18 years to develop.

Our world is full of complexity and challenge. Our drive pushes us to explore mysteries and grow. These things build you with incredible depth. Inserting AI into the middle means you are not being built by your experiences; you are merely speeding through them. The experiences you can derive from using AI are as shallow as the puddle at the end of every street. Do not trade the ocean for a puddle just because it's easier to swim in. Love the labour, become better.



Introduction

AI tools are being gradually integrated into Security Operations Centres to either increase the quality of investigations or expand a given analyst's capacity to handle workload. This blog post walks through what you should consider when evaluating AI tools for these purposes. In particular, the focus is to enable people to review the text outputs and establish whether what the AI tool is generating is valuable.

In the future I will publish further content on how to generate simulations and measure triage responses and case management functions of any given AI tool.

Understanding AI Tool triage efficacy 

When assessing the quality of an AI tool for SOC triage workflows, each individual component of the process must be considered. Each component is articulated in the sections below.


Investigation 

Investigation describes a process whereby an analyst processes a provided alert, generates potential investigative avenues, weighs those avenues for appropriateness and finally translates the selected avenues into actions or tasks that collect evidence. By assessing an AI tool's outputs at each of the following stages, you can arrive at an assessment of how well it will perform during a live incident.

  • Alert comprehension 
  • Investigative Avenues 
  • Collection of evidence
  • Analysis of evidence 
  • Conclusions 
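
To keep assessments comparable between tools, the answers to the questions in the sections below can be recorded as simple yes/no checks per component. The following is only a minimal sketch: the `Scorecard` class, its method names and the example questions are hypothetical, and the pass rate is an assumed summary measure, not a recommended threshold.

```python
from collections import defaultdict

# The five components of an investigation, named as in the list above.
COMPONENTS = (
    "Alert comprehension",
    "Investigative Avenues",
    "Collection of evidence",
    "Analysis of evidence",
    "Conclusions",
)


class Scorecard:
    """Records yes/no answers to assessment questions, grouped by component."""

    def __init__(self):
        self._answers = defaultdict(list)

    def record(self, component, question, passed):
        if component not in COMPONENTS:
            raise ValueError(f"unknown component: {component}")
        self._answers[component].append((question, bool(passed)))

    def pass_rate(self, component):
        """Fraction of recorded checks that passed, or None if none were recorded."""
        answers = self._answers.get(component, [])
        if not answers:
            return None
        return sum(passed for _, passed in answers) / len(answers)

    def summary(self):
        return {name: self.pass_rate(name) for name in COMPONENTS}


# Hypothetical answers from reviewing one AI tool output.
card = Scorecard()
card.record("Alert comprehension", "Description matches the alert", True)
card.record("Alert comprehension", "Confidence rating stated", False)
card.record("Conclusions", "Conclusion linked to evidence", True)
print(card.summary())
```

Components with no recorded checks report `None` rather than zero, so an unassessed component is not mistaken for a failed one.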

Alert Comprehension 

In most instances an analyst engages with an investigation because an alert has triggered. Alerts capture behaviours that have been observed within telemetry and provide a high-level description of what the behaviour is. Analysts apply this information against their own understanding of the relevant systems and adversary behaviour to create a list of potential explanations for the behaviour, and also a list of additional behaviours that would further suggest that the alert is a true positive.

This can be measured through the following products: 
  • Is the AI tool's description of the behaviour functionally similar to the alert?
  • If the AI tool has identified a relationship between the behaviour and adversary playbooks:
    • Is the stated association known to be correct?
    • Does the AI tool stipulate a confidence rating for the association?
    • Has the AI tool identified other possibilities?
      • Items that increase the likelihood of a true positive
      • Items that increase the likelihood of a false positive

Investigative Avenues 

Once an alert's basic content has been captured, an analyst typically generates a list of unanswered questions. These questions will either enable an absolute conclusion on the alert's nature or inform the analyst's further reasoning. Investigative avenues are usually weighed by likely effort and by closeness to previously successful investigations: avenues that would take very little effort to explore and are known to have been useful in past investigations are selected first.

This can be measured through the following products: 
  • Do the avenues selected by the AI tool reflect those found to be successful in past incidents? 
  • Do the avenues appear relevant to the alert?
  • Are the investigative avenues sorted or ordered by priority? 
    • Are justifications provided for the sorting?
  • How broad are the investigative avenues?
    • Do the avenues utilise multiple data sources?
    • Do the avenues utilise all the data sources available?
  • Did the investigative avenues return results that moved the investigation towards a confident conclusion?
    • Is a false positive justification provided?
    • Is a true positive justification provided?
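
When checking whether an AI tool's ordering is sensible, it can help to have a reference ordering to compare against. The sketch below ranks avenues by the two factors described in this section, effort and past usefulness. The `Avenue` fields, the example avenues and the scoring formula are all assumptions made for illustration; a real team would substitute its own estimates and weighting.

```python
from dataclasses import dataclass


@dataclass
class Avenue:
    question: str
    effort_minutes: float  # analyst's estimate of the effort to explore
    past_successes: int    # times a similar avenue helped in past investigations


def priority(avenue):
    """Higher is better: more past successes and less effort (assumed formula)."""
    if avenue.effort_minutes <= 0:
        raise ValueError("effort_minutes must be positive")
    # Adding 1 keeps untested avenues rankable by effort alone.
    return (avenue.past_successes + 1) / avenue.effort_minutes


def prioritise(avenues):
    """Return avenues ordered from highest to lowest priority."""
    return sorted(avenues, key=priority, reverse=True)


# Hypothetical avenues for a suspicious process execution alert.
avenues = [
    Avenue("Image the full disk of the host", 240, 3),
    Avenue("Review the parent process tree", 5, 8),
    Avenue("Check sign-in logs for the user", 10, 6),
]

for avenue in prioritise(avenues):
    print(avenue.question)
```

The AI tool's ordering can then be compared against this reference, with any large disagreement treated as a prompt to read the tool's justification rather than as an automatic failure.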

Collection of evidence 

Collecting evidence can be considered a demonstration of an analyst's forensic ability. Translating investigative avenues into evidence collection requires broad domain knowledge of computer systems and available tools, and the rigour to ensure that collected items are easy to process and store.

This can be measured through the following products: 
  • Did the AI tool collect evidence from all the relevant sources?
  • Was the collected evidence easily searchable?

Analysis of evidence 

Once evidence has been gathered it must be associated with the drafted investigative avenues and used to provide either an absolute conclusion on what is being explored or justification for further evidence collection. Where an analyst is aiming to understand a behaviour in more detail, the evidence should contain data not found in the triggering alert. Where the existence of additional behaviours is being explored, the evidence should come from a surface or data source known to be interacted with by adversaries.

Using the gathered evidence, analysts assess whether its properties are suggestive of malicious intent and compare what is being analysed to known good examples. In some instances this is done cognitively, particularly for evidence such as authentication attempts from Russia, but in others, where a piece of evidence's properties are highly variable, it is achieved through direct comparison of different data sets. This direct comparison is used to arrive at assertions on how unlike the baseline the properties are. Where certain variables are deemed highly unlike a known good baseline, they are considered to contribute positively to the investigation. Examples include the volume of data downloaded from a SharePoint site over time and the distribution of file types in the download.

This can be measured through the following products:

  • Has the AI tool used the evidence to generate verifiable facts?
  • Has analysis of the evidence generated confidence statements?
    • Likelihood of true positive
    • Likelihood of false positive
  • Are references provided for the AI tool's reasoning?
  • How much of the evidence was used by the AI tool?
    • Facts derived
    • Associations to adversary behaviour
  • If minimal or no context is derived from a piece of evidence:
    • Does the AI tool prompt for more evidence collection?
    • Is reasoning provided for the absence of information?
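
To verify an AI tool's claim that a download is unlike the baseline, the direct comparison described in this section can be reproduced by hand. The sketch below uses two measures chosen here for illustration, not taken from any tool: a z-score for download volume and total variation distance for the file type mix. All figures are invented example data, and no alerting threshold is implied.

```python
from collections import Counter
from statistics import mean, stdev


def volume_z_score(baseline_mb, observed_mb):
    """Standard deviations between the observed volume and the baseline mean."""
    spread = stdev(baseline_mb)  # sample standard deviation; needs 2+ values
    if spread == 0:
        raise ValueError("baseline has no variation, z-score is undefined")
    return (observed_mb - mean(baseline_mb)) / spread


def file_type_distance(baseline_types, observed_types):
    """Total variation distance: 0.0 for identical mixes, 1.0 for no overlap."""
    base, obs = Counter(baseline_types), Counter(observed_types)
    base_total, obs_total = sum(base.values()), sum(obs.values())
    if base_total == 0 or obs_total == 0:
        raise ValueError("both samples must contain at least one file")
    extensions = set(base) | set(obs)
    return 0.5 * sum(
        abs(base[ext] / base_total - obs[ext] / obs_total) for ext in extensions
    )


# Invented example data: daily download volume in MB and file extensions.
baseline_daily_mb = [40, 55, 50, 45, 60]
baseline_files = ["docx"] * 6 + ["xlsx"] * 3 + ["pdf"]
observed_files = ["pdf"] * 5 + ["zip"] * 5

print(volume_z_score(baseline_daily_mb, 900))
print(file_type_distance(baseline_files, observed_files))
```

If the AI tool asserts that a property is anomalous, figures like these give the analyst something concrete to check the assertion against.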

Conclusions

Investigative work must end with conclusions. A conclusion can either be a statement of uncertainty and what work is required to resolve it, or a statement on the nature of the behaviours captured in the alert. To arrive at a conclusion, analysts fairly evaluate each component of their investigation and relate their findings to their own understanding of how adversary behaviour manifests in the products or systems. Conclusions often contain some uncertainty due to the open nature of computer systems; however, this uncertainty should be justified and surrounded by context that describes the specific circumstances in which the behaviour would be a false positive.

This can be measured through the following products:

  • Did the AI tool conclude a true or false positive?
    • Was it correct?
  • Is the provided conclusion correlated or linked to evidence?
    • Does the evidence support the conclusion?
  • Is the conclusion clear and without ambiguity?
  • Does the conclusion provide insight into causality?
    • Is evidence used to arrive at causality?
  • Is the degree of confidence in the conclusion stated?
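
The first question above, whether the concluded verdict was correct, can only be answered against alerts whose true nature is already known. A minimal sketch of tallying that follows; the verdict labels, the `results` pairs and the function names are hypothetical, and "inconclusive" is included because a conclusion may legitimately be a statement of uncertainty.

```python
from collections import Counter


def verdict_tally(results):
    """Count each (expected, concluded) pair across the reviewed alerts."""
    return Counter(results)


def accuracy(results):
    """Share of alerts where the concluded verdict matched the expected one."""
    if not results:
        raise ValueError("no results to score")
    correct = sum(1 for expected, concluded in results if expected == concluded)
    return correct / len(results)


# Hypothetical outcomes: (verdict known from prior investigation, AI tool verdict).
results = [
    ("true positive", "true positive"),
    ("false positive", "false positive"),
    ("true positive", "false positive"),
    ("false positive", "inconclusive"),
]

print(accuracy(results))
for (expected, concluded), count in verdict_tally(results).items():
    print(f"expected {expected}, concluded {concluded}: {count}")
```

Keeping the full tally rather than only the accuracy figure shows which kind of mistake the tool makes, since a missed true positive and an over-cautious "inconclusive" carry very different costs.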

Assessment of language

Beyond measuring an AI tool's outputs for investigative efficacy, it's important to understand how well the tool writes with regard to syntax and semantics. AI tools deployed in the SOC are typically multimodal large language models that tokenise content and arrange it based on probability. This probability is primarily shaped by pre-deployment training, where a model vendor controls inputs and programmatically adjusts parameters and conditions until the model generates outputs shaped in a desired way.

These shaped outputs are what analysts will consume when using AI tools to aid in investigations and alert triage, so it's important to measure whether the model vendor is designing the models to communicate in a way that analysts can apply to their work.

The general framework below has been created to help assess the model's outputs when using the measurements described in the ‘Investigation’ section. This section is slightly subjective, as the shape and style of communication an individual analyst prefers is dictated by what they have been exposed to in the past.

When making an assessment of the AI tool's output, each item below should be easily extractable from the text:

Point

Where an AI tool is creating an output, the point must be clear. What is being said should be easily identified amongst facts or statistics, and it should be tied to the surrounding context. The point acts as the idea the analyst wields when they read the text and review any findings.

Reason

A reason is a collection of details on why the point is being made. It is usually the most comprehensive component and reveals to the analyst the supporting context, the thought process and any assumptions being made.

Evidence

Evidence consists of artefacts: either forensic data points or facts that any reasonable person would arrive at based on the given information. Often forensic data and facts about the data are paired together to create a piece of evidence. Evidence must have integrity, meaning it carries references that make it easy to verify it was not hallucinated, and it must be relevant to the point.

Poor output

Considering the three aspects above, you can identify poor content either through the absence of one or more of the items or where continuity between the aspects is missing. Continuity throughout the text is important to ensure analysts are able to weigh the total set of circumstances. Where continuity is poor, analysts are unlikely to understand the relationships between different pieces of content, or how to attribute severity or causality, because the information is fragmented.
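
The absence check described above can be made routine once a reviewer has extracted the point, reason and evidence from an output by hand. The sketch below is one hypothetical way to record that extraction and list what is missing; the class and field names are assumptions, and it does not attempt to judge continuity, which still needs a human reader.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    artefact: str        # forensic data point or fact quoted by the tool
    reference: str = ""  # where the artefact can be verified, e.g. a log source


@dataclass
class ReviewedOutput:
    point: str = ""
    reason: str = ""
    evidence: list = field(default_factory=list)


def find_problems(output):
    """List missing aspects and any evidence that cannot be traced to a source."""
    problems = []
    if not output.point.strip():
        problems.append("point missing")
    if not output.reason.strip():
        problems.append("reason missing")
    if not output.evidence:
        problems.append("evidence missing")
    for item in output.evidence:
        if not item.reference.strip():
            problems.append(f"evidence without reference: {item.artefact}")
    return problems


# Hypothetical extraction from a single AI tool output.
reviewed = ReviewedOutput(
    point="The PowerShell execution is likely malicious",
    reason="",
    evidence=[
        Evidence("Encoded command line observed", "EDR process event"),
        Evidence("Parent process was a Word document"),
    ],
)

for problem in find_problems(reviewed):
    print(problem)
```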

Beyond the three aspects of writing, you should also consider whether the AI tool is overly repeating points or highlighting facts or information with significant bias. Analysts using AI tools must be presented with content fairly. Using a structure or shape that makes the most critical information clear can be useful, but it must not obscure other pieces of information. If an AI tool repeats a single point for several distinct purposes in the structure of the text, it is likely acting without sufficient evidence.
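
A rough first pass for spotting repetition is to count sentences that recur once case and punctuation are ignored. This is a deliberately crude sketch under that assumption: it only catches near-verbatim repeats, will miss a point restated in different words, and the function name and sample text are invented for illustration.

```python
import re
from collections import Counter


def repeated_sentences(text, minimum=2):
    """Return normalised sentences that appear at least `minimum` times."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    normalised = []
    for sentence in sentences:
        cleaned = re.sub(r"[^a-z0-9 ]", "", sentence.lower())
        cleaned = re.sub(r"\s+", " ", cleaned).strip()
        if cleaned:
            normalised.append(cleaned)
    counts = Counter(normalised)
    return {sentence: n for sentence, n in counts.items() if n >= minimum}


# Invented sample of AI tool output.
sample = (
    "The process was spawned by Word. Encoded PowerShell was observed. "
    "The process was spawned by Word! No network activity was seen."
)

print(repeated_sentences(sample))
```

Anything this flags still needs reading in context, since a repeated sentence in a summary section may be legitimate.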


Ending

AI tools are still struggling to integrate well with Security Operations Centres, particularly due to the cyber security industry's tendency to act without rigour or deliberateness. Establish small use cases and aim to achieve those perfectly before attempting to use AI tools for all analyst work.


