Goblin Detection Diary #1 - Data is queen
Introduction
Detection engineering underpins a huge share of the cybersecurity industry, yet it is rarely spoken about openly and usually gets relegated to some corner of the conference. So I've started this diary to capture the work I do in my roles and to demonstrate how different best practices play out in the real world.
This diary has five goals:
- share with people the complexity of working in detection engineering
- highlight best practice and how it fits
- share candid details on inter-team work and break/fix tasks
- ensure every entry has something that can be used immediately by other people
- keep the format diary-like, with no formal structure and a personal tone
Note: the content I generate will be sanitised, but I'll avoid high-level overviews. I hope other detection engineers enjoy my pain with me and that new aspirants come away brimming with ideas.
Starting Slowly..
I made this entry at the start of the week, so I spent time running my regular reporting. I run reporting and metrics at the start of the week because it lets me set up my planner (Microsoft ToDo) with tasks and deadlines.
As always, the key to a good reporting schedule is understanding where your data is, so I maintain a map of all the alerting sources I manage and how to query them. It looks something like this:
I've already automated the reporting, but what the diagram doesn't show are the analytics I use to discover new elements of infrastructure or newly generated data. These analytics essentially capture what was seen last week and compare it against what was captured this week, highlighting new additions. This matters because I rely on third-party platforms like Azure / Microsoft and CrowdStrike, which don't keep their customers up to date on everything they change, so whenever they add new logs or expand old ones, these analytics keep me in the loop.
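The week-over-week comparison boils down to a set difference. A minimal sketch, where the snapshot contents are illustrative placeholders rather than my actual inventory:

```python
# Hypothetical weekly snapshots: the set of distinct log sources/tables seen
# in each period. In practice these would come from a scheduled SIEM query.
last_week = {"SigninLogs", "AuditLogs", "DeviceEvents"}
this_week = {"SigninLogs", "AuditLogs", "DeviceEvents", "DeviceNetworkEvents"}

new_sources = sorted(this_week - last_week)      # appeared this week
retired_sources = sorted(last_week - this_week)  # stopped reporting

print(f"new={new_sources} retired={retired_sources}")
```

Anything in `new_sources` is a prompt to investigate whether the vendor added a log type or the business stood up new infrastructure.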
This week I shared a few observations with other teams; they don't have the capacity to understand the data the way I can, but they still benefit from the information I generate.
- Quietest periods of the week were sent to SOC leaders for rota management
- Playbook errors due to case sensitivity sent to other engineers
Now the cogs are turning..
Another engineer had been working on changes to the deployment pipeline and was exploring detections I'd written that were raising flags in his new linter. This was my favourite part of the day because I got to showcase detection methodologies I apply across a large number of rules, and to demo the advantages and disadvantages of different algorithms and functions.
In particular, I walked through how we evaluate events in sequence where two behaviours must occur in order: the first is required to occur x times before the second. This is really common logic found across many different detection ideas.
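As a rough sketch of that pattern, assuming time-sorted events and illustrative event names (a run of denials followed by an allow), with the threshold `x` as a placeholder:

```python
from collections import defaultdict

THRESHOLD = 3  # x: how many times the first behaviour must occur

def sequence_hits(events):
    """events: iterable of (timestamp, entity, event_type), assumed time-sorted."""
    counts = defaultdict(int)
    hits = []
    for ts, entity, etype in events:
        if etype == "denied":
            counts[entity] += 1
        elif etype == "allowed" and counts[entity] >= THRESHOLD:
            # Second behaviour observed after the first occurred >= x times
            hits.append((ts, entity, counts[entity]))
    return hits

events = [
    (1, "alice", "denied"), (2, "alice", "denied"), (3, "alice", "denied"),
    (4, "alice", "allowed"),                      # fires: 3 denials preceded it
    (5, "bob", "denied"), (6, "bob", "allowed"),  # does not fire: only 1 denial
]
print(sequence_hits(events))  # [(4, 'alice', 3)]
```

The same shape exists in most query languages as a per-entity aggregation followed by a conditional check; the pure-code version makes the control flow explicit.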
For new detection engineers, I'd recommend learning query optimisation and good data-modelling practices from outside the cybersecurity industry, as the large vendors such as Splunk and Microsoft have, in their generosity, fostered a lot of lazy attitudes. Beyond that, exploring different logic concepts in pure code will build skills that are universally applicable, as opposed to being tied to SPL or KQL.
There are a few ways to approach this, and largely you want to make the decision based on the nature of the platform you're using. SaaS platforms like Sentinel and CrowdStrike allow inefficient approaches to function well. If you're using one of these, you can usually approach the problem by generating a list of records in series and aggregating on a bin of those records over a timespan you choose, say 45 minutes. You can then evaluate patterns inside the list/index, and hopefully it's obvious to most which option is better:
store it in a string
| user.name | @timestamp | _duration | event.type[1] | source.ip | AccessDenied | AccessAllowed |
|---|---|---|---|---|---|---|
| fluffiest.cat@supercat.org | Oct. 28, 2025 14:07:13.000 | 25000 | denied denied denied denied denied allowed | 172.16.8.11 | 36 | 139 |
store it in an array
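The trade-off between the two representations can be sketched in a few lines; the field values and the 45-minute bin mirror the example above, but the pattern check is illustrative:

```python
from datetime import datetime, timedelta

BIN = timedelta(minutes=45)

events = [
    (datetime(2025, 10, 28, 14, 0), "denied"),
    (datetime(2025, 10, 28, 14, 5), "denied"),
    (datetime(2025, 10, 28, 14, 7), "allowed"),
]

# Collect the ordered outcomes that fall inside the bin
start = events[0][0]
in_bin = [etype for ts, etype in events if ts - start <= BIN]

as_string = " ".join(in_bin)  # "denied denied allowed" -> needs substring tricks
as_array = in_bin             # directly indexable: as_array[-1], counts, slices

# With the array, positional and count checks are trivial:
pattern_hit = as_array.count("denied") >= 2 and as_array[-1] == "allowed"
print(as_string, pattern_hit)
```

The string forces you into substring matching or regex to ask "did an allow follow the denials?", while the array answers the same question with an index lookup.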
New Data, New Problems
- What activity are the new users performing in this new region?
- How much variance do the authentication properties have in a given week?
Modelling this data to explore the two points above is really easy and can be done using basic aggregate functions. I added some additional logic to look at the volumes of activity for the users and calculate percentage increases or decreases, which lets me easily understand how significant a change has occurred in the data.
Across most tools you will simply need to compare two time windows of data, define your baseline median and upper bound, and calculate the standard deviation. You should get a result like this!
| Present90 | Historic50 | Historic90 | avg_daily_events | daily_event_variation | PercentDecrease | PercentIncrease |
|---|---|---|---|---|---|---|
| 0.4149804220344742 | 0.42 | 0.45448793562740775 | 231765.47 | 18291.06 | -8.55 | 0.91 |
| 1.5183902122100648 | 1.63 | 2.802939744889062 | 173693.43 | 80118.5 | -75 | 0.25 |
| 1.0924868727462969 | 0.84 | 1.200818894660522 | 215086.27 | 52950.67 | 15.96 | 1.16 |
| 1.6016681095583383 | 1.63 | 1.7209205113771453 | 218395.55 | 45422.7 | -6.88 | 0.93 |
Analytics shared online often use hard-coded threshold values, but building an understanding of volume with basic statistics like this lets teams move away from them: in the best of worlds, hard-coded thresholds require regular manual reviews to stay coherent with a changing understanding of the business, and at worst they're left unchanged for months.
Iterate
Another thing I was exploring today was how our analysts find documentation. While working across the different detections described above, it occurred to me that some of the fields I drop (remove from an output) before presenting the final result might be useful: inside the logic for all our detections is a name for the detection, and the documentation assigned to each one uses this same value. If I keep that key:value for analysts, they can copy and paste the whole string into our documentation store instead of searching on keywords. Luckily I keep logs of the searches analysts make, so I can answer questions like which paths analysts take to find materials and which key phrases they use that might help me improve search optimisation.
For context, the standards we hold our detection output to:
- All values with a relationship to an identity or asset, such as a user name or IP, must be surrounded by metadata, such as assigned privileges or the country the user typically authenticates from.
- All results are transformed into a table.
- All results are compatible with graph.
- Where external interfaces are required to triage the alert, a link to that interface is surfaced in the alert.