Understanding Data Loss Prevention Techniques

All right, we have DLP today. That is data loss prevention, also sometimes called data leak prevention. And it's all about methods for preventing the loss of confidential, sensitive information, or in other words, keeping these confidential files and databases and records inside of our company.

Don't let them be disseminated or sent over email, the web, cloud storage services, instant messaging services, or anything else that can transfer files outside of your company. Keep an eye on those channels and make sure that confidential information doesn't get out. That's all DLP is about.

Now, the story behind DLP solutions is actually not that complicated, and it all starts with the necessity of having some sort of an engine that performs two functions. First of all, we need classification. We need to identify which documents are considered by us to be confidential or not.

Secondly, we need that engine to be able to enforce those policies, because we could try to classify those documents, but if we don't enforce a policy that actually blocks a user, that denies that user the action of attaching that sensitive file to a personal email message and then sending it on its merry way, well, that DLP solution does nothing. It's not data loss prevention. It's data loss notification.

So we do have some methods here for implementing this. The first one is manual. That is just for small companies. Now, the manual implementation of DLP can only work so much, and only as far as the classification part goes, because you cannot manually block traffic, right? Nobody is paid that well.

Secondly, we have an on-premises box that looks at the traffic. It can rely on some sort of a network agent that inspects the network traffic and knows to look for application-level information, like finding files within HTTP requests, you know, HTTP messages that are being sent outside.

Then we can rely on a cloud service. This is much easier to implement if your entire infrastructure, or even just the file storage services like SharePoint or the email services like Office 365 or Gmail and Google Apps, are also inside the same cloud environment. Implementing the DLP policy when all your data is already in the cloud is much, much easier.

And finally, we can work with agents installed on the hosts. Those are just small applications that we install on the user hosts, and they just look at the operations that the user is performing. Is the user trying to access a sensitive file? Okay, let's see what the user is about to do with that file. Is the user trying to attach it to an email message? Is the user trying to push it through an FTP connection? We might get a chance to catch it before it actually happens, because we actually have access to the actions of the user right on the host where the data leakage is about to happen.

Now, the client-side implementation is mostly considered to be the best way to protect against data loss risks, because you can also address USB data exfiltration, like when the user tries to copy that file to an external drive and then, you know, goes home with that drive or gives it to somebody else. On the server side, on the other hand, you're going to have to rely on some proxy device that can intercept and decrypt all the outbound connections that users are making and look for that sensitive data or those sensitive files.

Now, while the client-based method is going to create more admin overhead, the server-side method can create other problems. How are you actually able to decrypt or analyze encrypted traffic? Or where do you draw the line? Because users might not be so comfortable having some sort of a proxy appliance that looks inside each and every one of their email messages, maybe personal email messages that they send during office hours, just to look for sensitive data.

And of course, most DLP products are going to have some sort of a policy server or a management dashboard where you can configure the behavior of the solution: what to scan, what to do if you find some sensitive info, and what to do if the file cannot be scanned, for example, if it's within a password-encrypted archive. And some DLP products can manage that classification part of the files as well.

Others allow you to integrate with some sort of a content management database or document management system. And just to give you a couple of examples here of DLP solutions, we have Digital Guardian. We have Office 365 DLP, built into Office 365. Symantec also has a very competent solution. And there are tons of vendors out there that deal with this. Just Google "DLP solution", actually.
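To make the policy server idea a bit more concrete, here is a minimal sketch in Python of the three settings mentioned earlier: what to scan, what to do when sensitive info is found, and what to do when a file cannot be scanned. The names `DlpPolicy`, `Action` and `decide`, and the channel names, are made up for illustration. They are assumptions, not the configuration model of any particular product.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    NOTIFY = "notify"
    BLOCK = "block"


@dataclass
class DlpPolicy:
    # Hypothetical settings; real products expose their own options.
    scan_channels: set = field(default_factory=lambda: {"email", "web", "usb"})
    on_sensitive: Action = Action.BLOCK      # what to do if sensitive info is found
    on_unscannable: Action = Action.NOTIFY   # e.g. a password-encrypted archive


def decide(policy, channel, scannable, sensitive):
    """Return the action the policy prescribes for one outbound transfer."""
    if channel not in policy.scan_channels:
        return Action.ALLOW            # channel is outside the scanning scope
    if not scannable:
        return policy.on_unscannable   # content could not be inspected
    return policy.on_sensitive if sensitive else Action.ALLOW
```

With the defaults above, a sensitive file on the email channel is blocked, while an archive that cannot be opened only raises a notification. Whether unscannable content should be blocked or just reported is a choice each organization makes for itself.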

Now, as far as the DLP rules that we're using for classification and for matching, we can look for a number of things, starting with the file names or the file types. We know that specific Office documents that start with specific file names are restricted, or the contents of those files are restricted. We can also look inside the contents of the file by using some pattern matching, just looking for specific sets of data or strings that pertain to some sensitive information.
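As a rough illustration of these two kinds of rules, the sketch below checks a file name against name patterns and then checks the text against content patterns. The patterns themselves (`contract_*.docx`, `payroll_*.xlsx`, a `CTR-` contract number format) are invented for the example and would be replaced by whatever a company actually considers restricted.

```python
import fnmatch
import re

# Hypothetical file name rules: Office documents with restricted name prefixes.
RESTRICTED_NAME_PATTERNS = ["contract_*.docx", "payroll_*.xlsx"]

# Hypothetical content rules: a marker word and an assumed contract number format.
CONTENT_PATTERNS = [
    re.compile(r"\bCONFIDENTIAL\b", re.IGNORECASE),
    re.compile(r"\bCTR-\d{6}\b"),
]


def matches_rule(filename, text):
    """Return True if the file name or the file contents match any rule."""
    name = filename.lower()
    # fnmatchcase avoids platform-dependent case handling; we lowercase first.
    if any(fnmatch.fnmatchcase(name, p) for p in RESTRICTED_NAME_PATTERNS):
        return True
    return any(p.search(text) for p in CONTENT_PATTERNS)
```

For instance, `matches_rule("Contract_Acme.docx", "")` matches on the name alone, and `matches_rule("notes.txt", "see CTR-123456")` matches on the contents.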

Now, as far as the actions that we can take, of course, we can just block the transfer. We can block the connection that tries to exfiltrate that data. We can create a notification, which also helps with user training, because this is how we tell users this is sensitive information. You shouldn't be sending this out.
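The difference between blocking and only notifying can be sketched like this. `Verdict` and `enforce` are hypothetical names, and the message text is just an example of the kind of user training notice mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    allowed: bool        # whether the transfer goes through
    user_message: str    # what the user is told, empty if nothing


def enforce(is_sensitive, destination_is_external, notify_only=False):
    """Decide whether a transfer is allowed and what to tell the user."""
    if not (is_sensitive and destination_is_external):
        return Verdict(True, "")
    message = ("This file is classified as sensitive and should not be "
               "sent outside the company.")
    if notify_only:
        # The user is warned but not stopped: notification, not prevention.
        return Verdict(True, message)
    return Verdict(False, message)
```

In notify-only mode the data still leaves, which is the "data loss notification" situation described earlier. It can still be useful as a first phase, while users get used to the policy.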

Next up, we have quarantine. We use this when we detect unauthorized access to a file or an unauthorized sharing attempt, and we just place that file in a restricted area and await human intervention to unlock it. Most of the time, we do this to avoid a huge flow of alerts generated by that file, especially if the file was mistakenly placed in some publicly accessible area where employees think that it's a free-for-all, free to share. So we just place it temporarily in a quarantine area so we don't get overwhelmed with a lot of alerts from people trying to access that specific file or to share it.

And finally, we have tombstone. This is a replacement file that we place instead of the actual file, usually a generic file placeholder with a description or some text content saying that the original file was replaced due to a policy violation.

And as we said in the beginning, we need proper classification in order for a DLP solution to work properly. If we don't know exactly what is sensitive to us, it's going to be next to impossible to define some actual rules and policies to protect that information. So this classification for DLP can be done in a number of ways, starting with file tags.

This is just tagging specific files on a file server, specifying whether each one is confidential or not, right? Ideally, this would be automated as well. So we would have another type of policy that identifies sensitive information, which then in turn helps us determine which files should be tagged as sensitive or not.

Another one is dictionary. We can just search for specific keywords or regular expression patterns that we know should represent sensitive information, like credit card numbers or personally identifiable numbers, or anything that looks like the way we, let's say, number our internal contracts or SKU numbers, any kind of string or number that looks like something that is supposed to be secret.

We can also use some templates. These basically just do the classification job for us, because they are predefined in most DLP solutions to match the regulatory requirements of HIPAA, GDPR, PCI DSS, and so on. You just load that policy, the policy already knows what to look for, and you just let it scan your entire document base to identify those potentially sensitive sources of data.
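Here is a small sketch of the dictionary approach: a keyword list plus a regular expression for things that look like credit card numbers. The keyword list is invented for the example. The Luhn checksum step is an addition of mine, not something the episode covers, and it is there only to cut down on random digit strings being flagged as cards.

```python
import re

# Hypothetical dictionary of keywords that mark sensitive text.
KEYWORDS = {"confidential", "internal only"}

# 13 to 19 digits, optionally separated by spaces or dashes.
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_ok(digits):
    """Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_hits(text):
    """Return matched keywords, plus 'card_number' for each plausible card."""
    lowered = text.lower()
    hits = [kw for kw in sorted(KEYWORDS) if kw in lowered]
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if luhn_ok(digits):
            hits.append("card_number")
    return hits
```

Calling `find_hits("Confidential: card 4111 1111 1111 1111")` returns `["confidential", "card_number"]`, while a digit run that fails the checksum, such as `1234 5678 9012 3456`, is ignored.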

We can go one step further and do EDM. This is not electronic dance music, but exact data match, which means scanning our outgoing data, our fields, for exact values: things like social security numbers, phone numbers, credit card numbers, contact information, passport numbers. Since you're scanning for exact matches of that sensitive information, you cannot just upload all that data to the DLP tool. Instead, scanning is done by one-way hashing that sensitive data as well as the outgoing data fields, and then comparing the hashes. So in other words, I don't tell the DLP solution in plain text that this specific passport or social security number shouldn't be allowed to leave our company. I just hash that information, and whenever the DLP solution finds a passport number or a social security number, it hashes that value and compares the hashes, right? So we don't defeat the purpose by sharing that information.
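The hashing idea can be sketched as follows. This is a simplified illustration under a few assumptions: the protected values are social security numbers, candidates are found with a simple regular expression, and a keyed hash (HMAC-SHA-256) is used instead of a bare hash because short numeric values are easy to brute-force. The key, the function names and the SSN pattern are all hypothetical, and real products have their own indexing schemes.

```python
import hashlib
import hmac
import re

SECRET_KEY = b"example-key"  # hypothetical; would be kept out of the source code

# Candidate pattern: 9 digits, with or without dashes.
SSN_RE = re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")


def fingerprint(value):
    """One-way keyed hash of a value, after stripping non-digit characters."""
    normalized = re.sub(r"\D", "", value)
    return hmac.new(SECRET_KEY, normalized.encode(), hashlib.sha256).hexdigest()


def build_index(values):
    """Hash the protected values once; only the hashes go to the DLP tool."""
    return {fingerprint(v) for v in values}


def contains_protected(text, index):
    """Hash each candidate in outgoing text and compare it with the index."""
    return any(fingerprint(m.group()) in index for m in SSN_RE.finditer(text))
```

The index holds only hashes, so the tool can recognize a protected number when it sees one in outgoing data without ever storing the number itself. Normalizing before hashing means `078-05-1120` and `078051120` produce the same fingerprint.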

If we don't want to share that information, we're not going to share it with the DLP solution either, which makes a lot of sense, especially for regulatory requirements.

Then we have document matching. This is simply providing samples of specific documents that should not leave the company. This is what an invoice looks like. This is what, I don't know, a CV, a resume from HR looks like. This kind of sounds good because it's the way people work. Usually they just learn from examples.

It doesn't work so well with machines, with automated software, because this method is extremely susceptible to failure if minor changes occur in those documents, or even if a format conversion is performed. Like for example, okay, this is what a resume should look like, and we want to block these from going outside of the company. But what if the resume is in a DOC format instead of DOCX? Or a PDF format, or RTF, rich text format, whatever type of format, right? That exact document match is not going to work anymore.
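A tiny experiment shows why exact document matching is so brittle. The sketch assumes the simplest possible form of matching, a SHA-256 hash over the whole file. That is my simplification for the demonstration. Actual products may fingerprint documents in more tolerant ways. The sample text is invented.

```python
import hashlib


def doc_fingerprint(data: bytes) -> str:
    """Hash of the entire file content."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical sample document registered as "must not leave the company".
sample = b"Resume\nName: Jane Doe\nExperience: 5 years\n"
known = {doc_fingerprint(sample)}


def is_known_document(data: bytes) -> bool:
    """True only if the bytes are identical to a registered sample."""
    return doc_fingerprint(data) in known
```

The unchanged sample is recognized, but changing `5 years` to `6 years`, or even just converting the line endings, produces a completely different hash, so the match fails. A conversion from DOCX to PDF changes far more bytes than that.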

Now, of all these methods, EDM is the most difficult one to implement, but it's also the method that generates the fewest false positives. Also keep in mind that DLP solutions are not only technical controls against, let's say, insider threats, but also against unintentional data leakage, like attaching the wrong file to an email, or attaching the right file to an email and then accidentally sending that email to the wrong address. So not all data exfiltration is malicious in nature.

All right, short episode on DLP. Make sure you remember for the exam what precisely DLP does and what it is good for. Make sure you can enumerate some of the methods for performing DLP, what those rules should look like, what those rules should be looking out for, and remember that most DLP solutions also have some sort of an automatic response, like blocking, alerting, you know, tombstoning, and other examples just like the ones we mentioned.

All right, so until next time, when we will be talking about endpoint security, I wish you good luck, I would kindly remind you to like and subscribe, and I hope to see you on the next episode. Bye-bye.
