Transcript for:
Understanding Data Mining Processes and Techniques

If you've ever been panning for gold, you'll know that it takes a lot of time and effort to find even a small nugget. It's estimated that to extract enough gold to make a single gold ring, you'd need to sort through around twenty-six tons of rock and other stuff. That's a lot to sift through. The same is true when mining data, except the gold is replaced with insights and the panning is replaced with algorithms. So let's talk about it: data mining.

Data mining is the process of extracting valuable information from large datasets. It's used in a variety of industries, from marketing through to health care, and it can help businesses make more informed decisions. Fundamentally, data mining is about processing data and identifying patterns and trends in that information. And when we think about the evolution of things like data warehouses, and about the sheer volume of data, big data, we can really start to see why the use of these data mining techniques has rapidly accelerated over the last couple of decades. We need to process so much of this data and turn it into useful knowledge.

One of the main advantages of data mining is that it can help you to make predictions about future trends. By analyzing past data, you can build up a picture of how things might develop in the future. Data mining can also help you to identify relationships between different pieces of data that you might not have been able to see before. So, for example, you might see that there is a correlation between the amount of time somebody spends on your website and the likelihood of them making a purchase.

Now, we can think of the data mining process as consisting of four basic steps. Step one is setting objectives. This is where data scientists and business stakeholders work together to define the business problem that data mining will be applied to. Now, with the problem and the scope defined, we move on to step two, which is data preparation.
This identifies which set of data will help answer the pertinent business questions that we set in step one. Now, there's more here than just identifying the data. We also need to clean it, removing any noise, such as duplicates, missing values, and outliers. Then we move on to step three, which is applying data mining algorithms to that data. We're looking here for interesting data relationships and applying deep learning techniques, and we'll look deeper into step three in just a second. Then finally, step four is evaluating results. This is really about interpreting results, looking for ones that are valid, novel, useful and understandable.

So let's talk about some of those data mining techniques that make up step three. Data mining works by using various algorithms and techniques to turn large volumes of data into useful information. While there are many ways to do this, here are some of the most common. Let's start with perhaps the most straightforward, which is association. Association is rule-based, and it's a method for finding relationships between variables in a given dataset. You make a simple correlation between two or more items, often of the same type, to identify patterns. So, for example, when tracking people's buying habits, you might identify that a customer always buys cream when they buy strawberries. And therefore, you could suggest that the next time they buy strawberries, they might also want to purchase cream.

You can use another technique called classification as well. What classification does is build up an idea of the type of customer, the type of item, or the type of object by describing multiple attributes to identify a particular class. So, for example, you could easily classify cars into different types like sedan, 4x4, and convertible, and you could do that by identifying different attributes like the number of seats or the shape of the car.
Then, given a new car, you can assign it to a particular class by comparing its attributes with our known definitions.

Another useful technique is clustering. Clustering enables you to group individual pieces of data together to form a structure, correlating the data instances with other examples so you can see where the similarities and the ranges agree.

There are also a number of deep learning techniques utilizing artificial neural networks that we can use for things such as predictions. By analyzing past events or past instances, you can make a prediction about an event. If the input data is labeled, regression can be applied to predict the likelihood of a particular assignment. If the dataset isn't labeled, the individual data points in the training set are compared with one another to discover underlying similarities, clustering them based upon those shared characteristics. You'll often see things like decision trees and K Nearest Neighbor, or KNN, algorithms used here.

One of the most important things to remember is that data mining techniques are not a one-size-fits-all solution: different techniques are more or less effective depending upon your data, your business questions and what you're trying to achieve. It's often a case of trial and error to identify which method will work best for you.

So data mining combines business stakeholders and data scientists into this whole process shown here. And when done right, you can find golden insights that can be transformational for a business.

If you have any questions, please drop us a line below, and if you want to see more videos like this in the future, please like and subscribe. Thanks for watching.
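
As a supplement to the car classification example and the mention of K Nearest Neighbor in the talk, here is a minimal Python sketch of how a new car might be assigned to a class by comparing its attributes with known examples. Everything in it is an assumption made for illustration: the attribute set (seats, ground clearance in centimeters, folding roof), the sample values in KNOWN_CARS, the function name classify, and the choice of k = 3 do not come from the video.

```python
from collections import Counter
from math import dist

# Hypothetical labeled examples (invented for illustration).
# Each entry: ((seats, ground_clearance_cm, has_folding_roof), car_type)
KNOWN_CARS = [
    ((5, 14, 0), "sedan"),
    ((5, 15, 0), "sedan"),
    ((5, 13, 0), "sedan"),
    ((7, 22, 0), "4x4"),
    ((5, 24, 0), "4x4"),
    ((7, 23, 0), "4x4"),
    ((2, 12, 1), "convertible"),
    ((4, 13, 1), "convertible"),
    ((2, 11, 1), "convertible"),
]


def classify(new_car, known=KNOWN_CARS, k=3):
    """Return the most common class among the k known cars closest to new_car.

    Closeness is plain Euclidean distance over the raw attributes. That is a
    simplification: real data would usually be scaled first so that no single
    attribute dominates the distance.
    """
    nearest = sorted(known, key=lambda item: dist(item[0], new_car))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # A seven-seater with high ground clearance and a fixed roof.
    print(classify((7, 21, 0)))
```

With these invented examples, the seven-seater with high ground clearance ends up closest to the three 4x4 entries, so that is the class it receives. Which attributes to use, how to scale them, and what value of k to pick are the kind of choices that the trial and error mentioned in the talk would settle.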