Goblin Diary #1
Introduction
Detection engineering underpins half of the entire cybersecurity industry, yet it's only ever softly spoken about or kept to some corner of the conference. So I've started this diary to capture the work I do in my roles and to show how different best practices get implemented in the real world.
This diary has five goals:
- share the complexity of working in detection engineering
- highlight best practices and where they fit
- share candid details on inter-team work and break/fix tasks
- make sure every entry has something other people can use immediately
- keep the format diary-like, with no formal structure and a personal tone
Note: The content I generate will be sanitised, but I won't retreat into high-level overviews. I hope other detection engineers enjoy my pain with me and new aspirants come away engorged with new ideas.
Starting Slowly..
This entry was made at the start of the week, so I spent the time running my regular reporting. I run reporting and metrics at the start of the week because it lets me set up my planner (Microsoft ToDo) with tasks and deadlines.
As always, the key to creating a good reporting schedule is understanding where your data is, so I maintain a map of all the alerting sources I manage and how to query them. It looks something like this:
I've already built the reporting to run automatically for me, but what the diagram doesn't show are the analytics I use to discover new elements of infrastructure or newly generated data. These analytics essentially capture what was seen last week, compare it to what was captured this week, and highlight the new additions. This is important because I rely on third-party platforms like Azure / Microsoft and Crowdstrike, who don't keep their customers up to date on everything they change, so whenever they add new logs or expand old ones I'm kept in the loop.
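A minimal sketch of that week-over-week comparison, assuming a helper that returns the (platform, table) pairs seen in a window (the helper and the hard-coded values below are illustrative, not my actual pipeline):

```python
from datetime import date, timedelta

def seen_sources(start: date, end: date) -> set[tuple[str, str]]:
    # Hypothetical helper: in reality this is a scheduled query against each
    # platform (Sentinel, Crowdstrike, ...). Hard-coded here so the sketch runs.
    if start >= date.today() - timedelta(days=7):
        return {("azure", "SigninLogs"), ("azure", "AuditLogs"), ("crowdstrike", "DnsRequest")}
    return {("azure", "SigninLogs"), ("azure", "AuditLogs")}

today = date.today()
this_week = seen_sources(today - timedelta(days=7), today)
last_week = seen_sources(today - timedelta(days=14), today - timedelta(days=7))

for platform, table in sorted(this_week - last_week):
    print(f"new this week: {platform} / {table}")   # crowdstrike / DnsRequest

for platform, table in sorted(last_week - this_week):
    print(f"went quiet: {platform} / {table}")
```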
This week I shared a few observations with other teams; they don't have the capacity to dig into the data the way I can, but they still benefit from the information I generate.
- The quietest periods of the week went to SOC leaders for rota management (a rough sketch of how to pull these out of raw alert timestamps follows this list)
- Playbook errors caused by case sensitivity went to the other engineers
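The quiet-period calculation is just a day-of-week / hour-of-day aggregation over alert timestamps. A minimal pandas sketch, with the DataFrame shape and column name assumed:

```python
import pandas as pd

def quietest_periods(alerts: pd.DataFrame, ts_col: str = "created_at") -> pd.DataFrame:
    """Return the ten least busy day/hour slots over the reporting window."""
    ts = pd.to_datetime(alerts[ts_col])
    counts = (
        alerts.assign(dow=ts.dt.day_name(), hour=ts.dt.hour)
              .groupby(["dow", "hour"])
              .size()
              .rename("alerts")
              .reset_index()
    )
    return counts.nsmallest(10, "alerts")
```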
Now the cogs are turning..
Another engineer had been working on changes to the deployment pipeline and was exploring detections I've written that his new linter was flagging. This was my favourite part of the day because I got to showcase detection methodologies I apply across a large number of rules, and to demo the advantages and disadvantages of different algorithms and functions.
In particular, I walked through how we evaluate events in sequence when two behaviours must occur and the first is required to happen x times before the second. This is really common logic, found across many different detection ideas.
For new detection engineers, I'd always recommend not learning how to implement logic primarily through vendor tools like Splunk or Sentinel, and instead first understanding how to write the logic in code. Not only will this help you keep a mental image of what you want to achieve that applies in every tool, but optimisation is a much more discussed topic in the SWE world, whereas it's treated as a nice-to-have in the big vendor platforms.
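To make that concrete, here's the "x of the first behaviour, then the second" pattern in plain Python; a minimal sketch, assuming events arrive as (timestamp, type) tuples already sorted by time and using an illustrative threshold of five denials within 45 minutes:

```python
from datetime import datetime, timedelta

def sequence_hit(events, first, second, x, window):
    """True if `first` occurs at least `x` times inside `window`
    and is then followed by `second`.

    `events`: iterable of (timestamp, event_type), sorted by timestamp.
    """
    firsts = []  # timestamps of the first behaviour still inside the window
    for ts, kind in events:
        firsts = [t for t in firsts if ts - t <= window]  # age out old hits
        if kind == first:
            firsts.append(ts)
        elif kind == second and len(firsts) >= x:
            return True
    return False

# e.g. five denials followed by an allow, all within 45 minutes
start = datetime(2025, 10, 28, 14, 7)
events = [(start + timedelta(minutes=i), "denied") for i in range(5)]
events.append((start + timedelta(minutes=20), "allowed"))
print(sequence_hit(events, "denied", "allowed", x=5, window=timedelta(minutes=45)))  # True
```

Once you can picture the logic like this, the vendor-specific query is mostly translation.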
There are a few ways to approach this, and largely you want to make the decision based on the nature of the platform you're using. SaaS platforms like Sentinel and Crowdstrike are very forgiving and allow inefficient approaches to function well. If you're using one of these, you can usually approach the problem by generating a list of records in series and aggregating those records into a bin over a timespan you choose, say 45 minutes. That then lets you evaluate patterns inside the list/index.
Store it in a string
| user.name | @timestamp | _duration | event.type[1] | source.ip | AccessDenied | AccessAllowed |
|---|---|---|---|---|---|---|
| fluffiest.cat@supercat.org | Oct. 28, 2025 14:07:13.000 | 25000 | denied denied denied denied denied allowed | 172.16.8.11 | 36 | 139 |
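With the sequence held as a single ordered string, the evaluation can be as simple as a pattern match. A sketch, assuming a threshold of five denials (the threshold here is illustrative):

```python
import re

# the aggregated event.type values from the row above, in time order
sequence = "denied denied denied denied denied allowed"

# five or more denials immediately followed by an allow
if re.search(r"(?:denied\s+){5,}allowed", sequence):
    print("hit: repeated denials followed by an allow inside the bin")
```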
Store it in an array
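The same evaluation with the events kept as an ordered list instead; again a minimal sketch with an assumed five-denial threshold:

```python
from itertools import groupby

# the same bin, aggregated into an ordered list rather than a string
sequence = ["denied", "denied", "denied", "denied", "denied", "allowed"]

# collapse the list into runs of identical values: [("denied", 5), ("allowed", 1)]
runs = [(value, sum(1 for _ in group)) for value, group in groupby(sequence)]

hit = any(
    value == "denied" and count >= 5
    and i + 1 < len(runs) and runs[i + 1][0] == "allowed"
    for i, (value, count) in enumerate(runs)
)
print(hit)  # True
```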
New Data, New Problems
Two questions came out of the new data:
- What activity are the new users performing in this new region?
- How much variance do the authentication properties have in a given week?
Modelling the data to explore those two points is really easy and can be done using basic aggregate functions. I added some additional logic to look at the volumes of activity for the users and calculate percentage increases or decreases, which lets me quickly see how significant a change has occurred in the data.
Across most tools you simply need to compare two time windows of data, define your baseline median and upper bound, and calculate the standard deviation. You should get a result like this (a rough sketch of the calculation follows the table):
| Present90 | Historic50 | Historic90 | avg_daily_events | daily_event_variation | PercentDecrease | PercentIncrease |
|---|---|---|---|---|---|---|
| 0.4149804220344742 | 0.42 | 0.45448793562740775 | 231765.47 | 18291.06 | -8.55 | 0.91 |
| 1.5183902122100648 | 1.63 | 2.802939744889062 | 173693.43 | 80118.5 | -75 | 0.25 |
| 1.0924868727462969 | 0.84 | 1.200818894660522 | 215086.27 | 52950.67 | 15.96 | 1.16 |
| 1.6016681095583383 | 1.63 | 1.7209205113771453 | 218395.55 | 45422.7 | -6.88 | 0.93 |
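As one way of producing that kind of table, here's a pandas sketch. The input shape (one row per source per day with an event count) and the exact column definitions are illustrative assumptions; the percentiles, mean, and standard deviation are the parts that matter:

```python
import pandas as pd

def two_window_summary(df: pd.DataFrame, split_day: str) -> pd.DataFrame:
    """Compare the present window against a historic one, per source.

    Assumes `df` has one row per source per day with columns
    ["source", "day", "events"] and ISO-formatted days. The output
    columns mirror the table above, though the percentile columns there
    look normalised; here they sit on raw daily counts for simplicity.
    """
    historic = df[df["day"] < split_day].groupby("source")["events"]
    present = df[df["day"] >= split_day].groupby("source")["events"]

    out = pd.DataFrame({
        "Present90": present.quantile(0.90),        # present upper bound
        "Historic50": historic.quantile(0.50),      # historic median
        "Historic90": historic.quantile(0.90),      # historic upper bound
        "avg_daily_events": historic.mean(),
        "daily_event_variation": historic.std(),    # std dev of daily counts
    })
    # signed percentage move of the present upper bound against the historic one
    change = (out["Present90"] - out["Historic90"]) / out["Historic90"] * 100
    out["PercentDecrease"] = change.where(change < 0).round(2)
    out["PercentIncrease"] = change.where(change > 0).round(2)
    return out.reset_index()
```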