
At The Office & At Home: Why We Need Machine Learning to Prevent Data Breaches

By Ed Bishop, August 17, 2020

Despite there being thousands of cybersecurity products on the market, data breaches are at an all-time high. Worse still, with workforces around the world having suddenly transitioned from offices to their homes, organizations are even more vulnerable to new and increasingly sophisticated threats.

The reason? For decades, businesses have focused on securing the machine layer — layering defenses on top of their networks, devices, and finally cloud applications. But these measures haven’t solved the biggest security problem — an organization’s own people.

In this article, I’ll outline how machine learning can help organizations solve their “people problem” and why this new human-centric approach is more important now than ever.

The “people problem”

While we don’t believe that employees are the weakest link, we do know that, unfortunately, to err is human. People make mistakes, break the rules, and are easily hacked.

The fact is, when faced with overwhelming workloads, constant distractions, and jam-packed schedules, cybersecurity just isn’t top of mind for the average employee. Cybersecurity training can go out the window in moments of stress. Best practices, policies, and procedures can be pushed aside in favor of an easier (albeit less secure) path.

Let’s look at email. Imagine an employee has to submit a project proposal by 8 PM. Because they’re working from home, they’re eager to finish up and spend some much-needed time with their family. After finalizing the document with just 5 minutes to spare, they draft an email, and hit send.

The problem is, the email went to Jane Green, not Jane Grier, and it included revenue projections, intellectual property, and confidential client data.

An employee could just as easily fall for a phishing scam. It shouldn’t be a huge surprise, then, that nearly half (43%) of people say they’ve made mistakes at work that compromised cybersecurity.

That’s why we shouldn’t leave people as the last line of defense against security threats.

But, that’s easier said than done. Why? Because no two humans are the same. We all communicate differently — and with natural language, not static machine protocols. These complexities make solving human layer security problems substantially more difficult than addressing those at the machine layer — we simply can’t codify our behavior with “if-this-then-that” logic.

But, this isn’t the only issue we face when trying to prevent human error. There’s also the issue of time. Our relationships and behaviors change and constantly evolve. We make new connections, take on new projects, and talk to different people about different things.

The time factor

We can use machine learning to identify normal patterns and signals, allowing us to detect anomalies in real time as they arise. This technology has allowed businesses to detect attacks at the machine layer more quickly and accurately than ever before.

One example of this is detecting when malware has been deployed by malicious actors to attack company networks and systems. By inputting a sequence of bytes from a computer program into a machine learning model, it is possible to predict whether there is enough commonality with previously seen malware attacks — while successfully ignoring any obfuscation techniques used by the attacker. Like many other threat detection problem areas at the machine layer, this application of machine learning is arguably “standard” because of the nature of malware: A malware program will always be malware.
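To make the "commonality with previously seen malware" idea concrete, here is a deliberately simplified Python sketch. It is not a machine learning model and not how any particular product works: it swaps in a plain byte n-gram overlap (Jaccard similarity) as a stand-in for the features a trained model would learn on its own. The function names and the sample byte strings are made up for illustration.

```python
from __future__ import annotations


def byte_ngrams(data: bytes, n: int = 4) -> set[bytes]:
    """Return the set of all length-n byte windows found in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard(a: set[bytes], b: set[bytes]) -> float:
    """Share of n-grams the two sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def max_similarity(sample: bytes, known_malware: list[bytes], n: int = 4) -> float:
    """Highest n-gram overlap between the sample and any known malicious sample."""
    grams = byte_ngrams(sample, n)
    return max((jaccard(grams, byte_ngrams(m, n)) for m in known_malware), default=0.0)


# Made-up byte strings, for illustration only.
known = [b"\x90\x90\xeb\xfe_payload_decrypt_loop_\xcc\xcc"]
variant = b"\x00\x00\x90\x90\xeb\xfe_payload_decrypt_loop_\xcc"  # lightly padded copy
benign = b"hello world, this is a plain text file"

score_variant = max_similarity(variant, known)
score_benign = max_similarity(benign, known)
```

The point of the sketch is that the input alone is enough: the same bytes produce the same score today and next year, which is what makes this kind of problem "standard".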

But, this method of detection won’t work for human behavior. As mentioned, it changes over time. That’s why, in order to solve the threat of data breaches caused by human error, we need stateful machine learning.

Consider the example of trying to detect and prevent data loss caused by an employee accidentally sending an email to the wrong person. Harmless mistake, right? Not quite. Misdirected emails were the leading cause of online data breaches reported to regulators in 2019. All it takes is one clumsy mistake, like adding the wrong person to an email chain, for data to be leaked.

It’s worth mentioning that this is happening a lot more than IT leaders think. Research shows that, while IT leaders in organizations with 1,000+ employees think just 480 misdirected emails are sent every year, the actual number of misdirected emails sent is more than 800. That’s a big difference.

But, how do you accurately predict whether an email is being sent to the right (or wrong) person? You need to understand — at that exact moment in time — the nature of the sender and recipient’s relationship. What do they typically discuss, and how do they normally communicate? You also need to understand the sender’s other email relationships to see if there may be a more appropriate intended recipient for this email. You essentially need an understanding of the sender’s entire historical email relationships up until that moment.
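As a rough illustration of what "understanding the sender’s email relationships up until that moment" could look like, the Python sketch below computes a few hand-picked signals for a draft email. Everything here is an assumption made for the example: the history format, the feature names, the `example.com` addresses, and the use of `difflib` string similarity to find look-alike contacts. A real system would use far richer signals.

```python
from __future__ import annotations

from collections import Counter
from datetime import datetime
from difflib import get_close_matches


def recipient_features(
    history: list[tuple[datetime, str, str]],  # (sent_at, sender, recipient)
    sender: str,
    recipient: str,
    as_of: datetime,
) -> dict:
    """Describe the sender/recipient relationship using only emails sent before as_of."""
    counts = Counter(r for ts, s, r in history if s == sender and ts < as_of)
    # Other contacts of this sender whose address looks like the chosen recipient.
    others = [r for r in counts if r != recipient]
    lookalikes = get_close_matches(recipient, others, n=3, cutoff=0.8)
    best_alt = max(lookalikes, key=lambda r: counts[r], default=None)
    return {
        "prior_emails_to_recipient": counts[recipient],
        "lookalike_contact": best_alt,
        "lookalike_prior_emails": counts[best_alt] if best_alt else 0,
    }


# Hypothetical history: the sender writes to Jane Grier regularly, never to Jane Green.
alice = "alice@example.com"
history = [
    (datetime(2020, 6, 1), alice, "jane.grier@example.com"),
    (datetime(2020, 6, 8), alice, "jane.grier@example.com"),
    (datetime(2020, 6, 15), alice, "jane.grier@example.com"),
    (datetime(2020, 9, 1), alice, "jane.green@example.com"),  # after the draft, ignored
]

draft = recipient_features(history, alice, "jane.green@example.com", datetime(2020, 8, 17))
```

Here the draft goes to an address the sender has never emailed, while a near-identical address has three prior emails. That combination is the kind of signal a model could use to flag a likely misdirected email. Note the `as_of` filter: the answer depends on when the question is asked.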

That’s why understanding “state,” or the exact moment in time, is absolutely critical.

Why stateful machine learning?

With a “standard” machine learning problem, you can input raw data directly into the model, like a sequence of bytes in the malware example, and it can generate its own features and make a prediction.

As previously mentioned, this application of machine learning is invaluable in helping businesses quickly and accurately detect threats at the machine layer, like malicious programs or fraudulent activity.

But, the most sophisticated and dangerous threats occur at the human layer when people use digital channels, like email. To predict whether an employee is about to leak sensitive data or determine whether they’ve received a message from a suspicious sender, for example, we can’t simply give that raw email data to the model. It wouldn’t understand the state or context within the individual’s email history.

What is stateful machine learning?

Stateful machine learning allows us to look across each employee’s historical email data set and calculate important features by aggregating all of the relevant data points leading up to that moment in time. We can then pass these into the machine learning model. The time variable makes this a non-trivial task; features now need to be calculated outside of the model itself, which requires significant engineering infrastructure and a lot of computing power, especially if predictions need to be made in real time. But failure to adopt this type of machine learning means you will never be able to truly protect your people or the sensitive data they access.
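One way to picture features being "calculated outside of the model" is a small piece of running state that is updated as each email is sent and queried just before a prediction. The Python sketch below is a minimal illustration under stated assumptions, not a description of any production system: the class name `RelationshipState`, its three features, and the requirement that emails arrive in chronological order are all hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import datetime


class RelationshipState:
    """Running per-sender aggregates, updated as each email is sent (in time order)."""

    def __init__(self) -> None:
        self._pair_counts: dict[tuple[str, str], int] = defaultdict(int)
        self._sender_totals: dict[str, int] = defaultdict(int)
        self._last_sent: dict[tuple[str, str], datetime] = {}

    def features(self, sender: str, recipient: str, now: datetime) -> dict:
        """Features for a draft email, built only from emails recorded so far."""
        pair = (sender, recipient)
        to_recipient = self._pair_counts.get(pair, 0)
        total = self._sender_totals.get(sender, 0)
        last = self._last_sent.get(pair)
        return {
            "emails_to_recipient": to_recipient,
            "share_of_sender_traffic": to_recipient / total if total else 0.0,
            "days_since_last_contact": (
                (now - last).total_seconds() / 86400 if last else None
            ),
        }

    def update(self, sender: str, recipient: str, sent_at: datetime) -> None:
        """Record a sent email so that later predictions can see it."""
        pair = (sender, recipient)
        self._pair_counts[pair] += 1
        self._sender_totals[sender] += 1
        self._last_sent[pair] = sent_at


state = RelationshipState()
a, b, c = "alice@example.com", "bob@example.com", "carol@example.com"

before = state.features(a, b, datetime(2020, 1, 1))  # no history yet
state.update(a, b, datetime(2020, 1, 1))
state.update(a, c, datetime(2020, 1, 5))
state.update(a, b, datetime(2020, 1, 10))
after = state.features(a, b, datetime(2020, 1, 17))  # same pair, later moment
```

The same sender and recipient produce different feature values at the two moments, which is the point: the model’s input depends on the state at prediction time, not on the email alone. Calling `features` before `update` for the email being scored keeps that email out of its own history.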

The bottom line: people are unpredictable and error-prone, and training and policies won’t change that simple fact, especially when employees are working remotely. Nearly half (48%) of employees say they’re less likely to follow safe security practices when working from home.

Businesses need a more robust, people-centric approach to cybersecurity. They need advanced technologies, like stateful machine learning, that understand how individuals’ relationships and behaviors change over time. Only then can we truly succeed in detecting and preventing threats caused by human error, and so reduce the frequency of data loss incidents and breaches.