Be Careful Borrowing From The Insurance Industry
Cybersecurity borrows many models and analogies from insurance in an attempt to manage risk. However, this can be dangerous if the underlying assumptions are not fully understood.
In my research, I’ve come across a few actuarial formulas borrowed from insurance in an attempt to undertake the massive task of quantifying cybersecurity risk. As I delved deeper, it became clear that statisticians are misunderstanding cybersecurity and/or cybersecurity professionals are misunderstanding statistics.
Insurance, at its core, is managing a lump sum of money and selling policies that cover certain events which expose that lump sum to a risk of loss. Exposure to loss is a known quantity because of two factors: historical data and policy limits. These determine whether insurers will cover an event. If there is not enough data to extract a distribution of outcomes, or if the event would create too much exposure, they simply do not take on the risk. By choosing both the event and the extent of exposure, insurers are able to select risks that their models do a good job of predicting. If they weren’t able to choose, they would not make any money.
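To make that concrete, here is a minimal sketch in Python. Every number in it (the claim distribution, the policy limit, the premium) is hypothetical; the point is only that a capped payout over a well-understood loss distribution gives the insurer a known, bounded exposure per policy.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical historical claim amounts (in dollars) for a well-understood event type.
historical_claims = rng.lognormal(mean=10, sigma=1.0, size=10_000)

policy_limit = 250_000  # the insurer caps its exposure per claim
premium = 40_000        # hypothetical premium charged per policy

# Because payouts are capped at the policy limit, the worst case per policy is known in advance.
capped_payouts = np.minimum(historical_claims, policy_limit)

print(f"Expected uncapped loss:      {historical_claims.mean():,.0f}")
print(f"Expected capped payout:      {capped_payouts.mean():,.0f}")
print(f"Expected profit per policy:  {premium - capped_payouts.mean():,.0f}")
```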
In cybersecurity, you don’t get to choose the risk, and the only upper bound on your exposure is bankruptcy. Every event is unique and thus follows its own distribution of occurrences and outcomes. Each path attackers take is unique and yields wildly different probabilities for the same threat. Anyone reading this blog knows that cybersecurity events also change the organization itself and can subsequently change the distribution of seemingly unrelated events. For example, patching a server in response to a security event usually brings patches for other software on that machine, which changes the probability of every attack against that software and of every attack path that hops through that node.
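As a toy illustration of that coupling, here is a hypothetical sketch of two attack paths that share an application server. The node names and hop probabilities are invented; the point is that patching the shared node moves the probabilities of both paths at once, so the events cannot be treated as independent.

```python
# Toy attack graph: each edge carries a hypothetical probability that an attacker
# successfully makes that hop. Both paths to the database share the app server.
edges = {
    ("internet", "web-server"):   0.30,
    ("internet", "vpn"):          0.10,
    ("web-server", "app-server"): 0.40,
    ("vpn", "app-server"):        0.50,
    ("app-server", "database"):   0.60,
}

paths = [
    ["internet", "web-server", "app-server", "database"],
    ["internet", "vpn", "app-server", "database"],
]

def path_probability(path):
    """Probability that every hop on the path succeeds."""
    prob = 1.0
    for hop in zip(path, path[1:]):
        prob *= edges[hop]
    return prob

print("Before patching:")
for p in paths:
    print("  " + " -> ".join(p), f"{path_probability(p):.3f}")

# Patching the app server (as part of the incident response) hardens every hop
# into it, which changes the success probability of *both* paths simultaneously.
edges[("web-server", "app-server")] = 0.10
edges[("vpn", "app-server")] = 0.15

print("After patching the shared app server:")
for p in paths:
    print("  " + " -> ".join(p), f"{path_probability(p):.3f}")
```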
What stood out to me is that this tramples the concept many statistical methods are built on: iid, or independent and identically distributed, variables. I’ve seen a handful of quantitative risk tools for cybersecurity, and every single one of them assumes that the data is independent and identically distributed. While this makes the math a whole lot easier, it is obvious to anyone who has worked in the trenches that events are neither identically distributed nor independent.
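To put a rough number on what that assumption costs, here is a hypothetical simulation (all probabilities and loss amounts are made up). It compares annual losses when breaches across systems are modeled as iid against a model where a shared dependency occasionally couples them. The two models produce almost the same average loss, but very different tails.

```python
import numpy as np

rng = np.random.default_rng(0)
n_years, n_systems = 100_000, 50
p_breach, loss_per_breach = 0.02, 1_000_000

# Model 1: breaches are iid across systems (what most quantitative tools assume).
iid_breaches = rng.binomial(n=n_systems, p=p_breach, size=n_years)
iid_losses = iid_breaches * loss_per_breach

# Model 2: a shared dependency (say, one common component) couples the systems.
# In 2% of years that component fails and every system is exposed at once.
common_shock = rng.random(n_years) < 0.02
dep_breaches = np.where(
    common_shock,
    rng.binomial(n=n_systems, p=0.50, size=n_years),  # correlated bad year
    rng.binomial(n=n_systems, p=0.01, size=n_years),  # ordinary year
)
dep_losses = dep_breaches * loss_per_breach

for name, losses in [("iid model", iid_losses), ("dependent model", dep_losses)]:
    print(f"{name}: mean annual loss = {losses.mean():,.0f}, "
          f"99th percentile = {np.percentile(losses, 99):,.0f}")
```

Both models report roughly one breach per year on average; only the dependent one shows the occasional year where half the estate falls at once, and that is exactly the year that matters.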
Now we find ourselves in a situation where many of the classical statistical and actuarial methods are simply not useful. At worst, they are harmful, because they carry a significant cost hidden behind their value: the cost of being wrong. When using quantitative tools, we must be sure that they are applied accurately and do not introduce more risk than the value they add.
Admittedly, I’m still very much re-learning the math I’ve long forgotten, but it blew me away how often such assumptions are ignored. At the same time, it made me excited about how much more there is to explore in quantitative cybersecurity. A shoutout to Nassim Taleb for pointing out that when practitioners are armed with the math, they can connect the dots that the mathematicians have missed.