Machine learning is a subset of artificial intelligence (AI) and a core tool of data science. Data science involves descriptive, diagnostic, predictive, and prescriptive analyses. Descriptive analysis covers what happened; diagnostic analysis explains why it happened; predictive analysis estimates what is most likely to happen in the future; and prescriptive analysis recommends the most logical course of action to achieve the desired outcome.
Machine learning focuses on predictive and prescriptive analytics, depending on the nature of the analysis and the algorithms used. This section provides an overview of the most common types of machine learning. A high-level, non-technical summary of machine learning algorithms is provided in another section.
1. Supervised learning
Supervised learning algorithms make predictions based on examples, e.g. using a sales history to estimate future prices. In such a case, the input consists of labeled training data, and there is a desired output variable. An algorithm analyzes the training data to learn the function that maps inputs to outputs. This learned function generalizes from the training data, so it can be applied to new examples to anticipate the results of unseen situations.
Supervised learning is therefore learning from examples. The learner (usually a computer program) receives two sets of data, the first for training and the second for testing. The process is as follows: the learner must “learn” from the labeled examples in the training dataset so that it can identify the unlabeled examples in the test dataset as accurately as possible.
The learner must therefore develop a rule, program, or procedure that classifies new examples (those found in the test dataset) by analyzing the examples that already have a class label. For example, a training data set may include images of different types of fruit, with the identity of the fruit in each image given to the learner.
The test data set then includes images of unidentified fruit belonging to the same classes. The learner’s task is to apply the rule it has developed to identify the items in the test data set.
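The split between training and test data can be sketched as follows. This is a minimal illustration only: the fruit file names, the helper name split_train_test, and the 75/25 ratio are all assumptions made for the example, not something the text prescribes.

```python
import random

def split_train_test(examples, test_fraction=0.25, seed=0):
    """Shuffle labeled examples and split them into training and test sets."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = list(examples)          # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labeled data: (image file name, fruit label)
fruit = [("img01.png", "apple"), ("img02.png", "banana"),
         ("img03.png", "apple"), ("img04.png", "cherry"),
         ("img05.png", "banana"), ("img06.png", "cherry"),
         ("img07.png", "apple"), ("img08.png", "banana")]

train_set, test_set = split_train_test(fruit)

# At test time the learner only sees the inputs; the labels are held back
# and used afterwards to score its predictions.
test_inputs = [name for name, _ in test_set]
test_labels = [label for _, label in test_set]
```

Keeping the test labels away from the learner is what makes the final score an honest estimate of how well the rule generalizes.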
There are many approaches to building good methods for classifying the examples of a test data set using data from a training set. In supervised learning, the training dataset contains ordered pairs (x1, y1), (x2, y2), …, (xn, yn), where each xi is a measurement, or a set of measurements, of a single data point, and yi is the label of that data point.
For example, an xi can be a group (sometimes called a vector) of five measurements for a hospital patient: height, weight, temperature, blood sugar, and blood pressure. The corresponding yi can be “healthy” or “unhealthy”.
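As a rough sketch of what such pairs look like in code, the example below stores a few (xi, yi) pairs and labels a new patient with the label of the closest training example (a one-nearest-neighbor rule). The patient values are invented, and the nearest-neighbor rule is just one simple choice of learner, assumed here for illustration.

```python
import math

# Hypothetical training pairs (x_i, y_i). Each x_i is a vector of
# (height cm, weight kg, temperature C, blood sugar mg/dL, blood pressure mmHg).
training = [
    ((170, 70, 36.8, 90, 120), "healthy"),
    ((165, 60, 36.6, 85, 115), "healthy"),
    ((180, 110, 38.5, 180, 160), "unhealthy"),
    ((160, 95, 38.9, 200, 170), "unhealthy"),
]

def predict(x_new, pairs):
    """Return the label of the training example closest to x_new."""
    nearest_x, nearest_y = min(pairs, key=lambda pair: math.dist(pair[0], x_new))
    return nearest_y

# A new, unlabeled measurement vector
label = predict((175, 72, 36.7, 92, 118), training)
```

The features are used unscaled here, so measurements with large numeric ranges (such as blood sugar) dominate the distance; a real pipeline would normally standardize them first.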
The test data consists of further measurements without labels: (xn+1, xn+2, …, xn+m). The goal, therefore, is to make informed inferences about the labels of the test data set (e.g. “healthy” or “unhealthy”) by drawing conclusions from the training data set. Supervised learning problems can be subdivided as follows:
When the data is used to predict a categorical variable, supervised learning is called classification. This is the case, for example, when a label (e.g. “dog” or “cat”) is assigned to an image. When there are only two labels, it is called binary classification. When there are more than two categories, it is called multiclass classification.
When we predict continuous values, we talk about regression.
When predictions about the future are made from past and present data, we talk about forecasting. This type of process is most often used to analyze trends, e.g. to estimate sales for next year based on sales for the current year and previous years.
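The sales example can be sketched with a straight-line fit. This is a minimal illustration using ordinary least squares; the yearly figures are invented, and a straight line is only one possible model for a trend.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical yearly sales (in thousands of units)
years = [2019, 2020, 2021, 2022, 2023]
sales = [100.0, 110.0, 120.0, 130.0, 140.0]

slope, intercept = fit_line(years, sales)
forecast_2024 = slope * 2024 + intercept   # continuous prediction for next year
```

Because the output is a continuous value rather than a category, this is a regression problem; using it to extrapolate beyond the observed years is what makes it a forecast.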
2. Unsupervised learning
In unsupervised learning, the machine receives unlabeled data. It is asked to discover the patterns underlying the data, e.g. a cluster structure, a low-dimensional manifold, or a sparse tree or graph.
Clustering involves grouping sample data so that the examples in one group (or cluster) are more similar to each other (by some criterion) than to the examples in other groups. This process is often used to segment an entire set of data into multiple groups. Analysis can then be done within each group to find intrinsic patterns.
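The grouping idea can be sketched with k-means, one common clustering algorithm (the text does not name a specific one, so this choice is an assumption). The points and the starting centroids below are invented for the example.

```python
import math

def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its assigned points."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters

# Hypothetical unlabeled 2-D data with two visible groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Starting centroids are picked by hand here; real implementations
# usually choose them at random or with a seeding heuristic.
centroids, clusters = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

No labels are supplied anywhere: the two groups emerge purely from distances between the points.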
Dimensionality reduction aims to reduce the number of variables under consideration. In many applications, raw data has high-dimensional features, some of which are redundant or irrelevant. Reducing the number of dimensions therefore makes it easier to find the true latent relationship.
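Reducing the number of variables can be illustrated in the smallest possible setting: two variables compressed into one. The sketch below finds the direction of largest variance of 2-D data (the idea behind principal component analysis, used here as an assumed example technique) and keeps only the coordinate along it. The data values are invented.

```python
import math

def principal_axis_2d(points):
    """Return the mean and the unit direction of largest variance for 2-D points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Closed-form angle of the leading eigenvector of a 2x2 covariance matrix
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (mean_x, mean_y), (math.cos(theta), math.sin(theta))

def project(points, mean, direction):
    """Replace each 2-D point by its single coordinate along the axis."""
    return [(x - mean[0]) * direction[0] + (y - mean[1]) * direction[1]
            for x, y in points]

# Hypothetical data in which the second variable is almost redundant (y is about 2x)
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.0)]
mean, direction = principal_axis_2d(data)
reduced = project(data, mean, direction)   # one number per example instead of two
```

Since the two variables move together, a single coordinate along the main axis preserves almost all of the variation in the data.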
Unsupervised learning is suitable when a problem requires a considerable amount of unlabeled data. For example, apps related to social media contain large amounts of untagged data.
In order to understand the meaning of these data, one needs algorithms capable of classifying them according to the patterns or clusters that are found. Unsupervised learning is an iterative process of analyzing data without human intervention.
Unsupervised learning is used, for example, to detect spam. Legitimate emails and spam emails contain far too many variables for an analyst to flag spam by hand. Classifiers based on clustering and association are therefore used to perform this task. Unsupervised learning algorithms segment data into groups of examples (clusters) or groups of features.
The patterns found in the unlabeled data determine the parameter values and the grouping of the data. In effect, this process adds labels to the data, so that it can then be used for supervised learning.
Unsupervised learning can determine results when there are very large amounts of data. In such a case, since the developer does not know the context of the data being analyzed, labeling is not possible at this point. Unsupervised learning can therefore be the first step before submitting data to supervised learning.
Unsupervised learning algorithms can help businesses understand a lot of new unlabeled data. As in supervised learning, these algorithms look for patterns in the data, the difference being that the data is not yet understood. For example, in the healthcare industry, collecting massive amounts of data on a particular disease can help better understand symptoms and link them to outcomes in patients.
It would take too long to label all of the data sources associated with a disease such as diabetes; therefore, unsupervised learning can help determine outcomes faster than supervised learning.
3. Semi-supervised learning
In supervised learning, labeling data can be time-consuming and expensive. If labels are limited, unlabeled examples can be used to enhance supervised learning. Since the machine is not fully supervised, we use the term “semi-supervised”. In semi-supervised learning, unlabeled examples are combined with a small amount of labeled data to improve learning accuracy.
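One simple way to combine the two kinds of data is self-training: fit a model on the few labeled examples, let it assign provisional labels to the unlabeled ones, and refit on everything. The sketch below does this with a nearest-centroid classifier. Both the technique and the data are assumptions chosen for illustration; the text does not specify a particular method.

```python
import math

def centroid(points):
    """Mean point of a list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def nearest_label(x, centroids):
    """Label of the class centroid closest to x."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))

def self_train(labeled, unlabeled, rounds=3):
    """Fit class centroids on labeled data, pseudo-label the unlabeled
    points, then refit on both; repeat for a few rounds."""
    by_class = {}
    for x, y in labeled:
        by_class.setdefault(y, []).append(x)
    centroids = {y: centroid(xs) for y, xs in by_class.items()}
    for _ in range(rounds):
        grouped = {y: list(xs) for y, xs in by_class.items()}
        for x in unlabeled:
            grouped[nearest_label(x, centroids)].append(x)  # provisional label
        centroids = {y: centroid(xs) for y, xs in grouped.items()}
    return centroids

# Hypothetical data: only two labeled points, six unlabeled ones
labeled = [((1.0, 1.0), "A"), ((9.0, 9.0), "B")]
unlabeled = [(1.5, 2.0), (2.0, 1.0), (2.5, 2.5),
             (8.0, 8.5), (7.5, 9.0), (8.5, 7.5)]

centroids = self_train(labeled, unlabeled)
prediction = nearest_label((3.0, 3.0), centroids)
```

The unlabeled points pull each class centroid toward where the data actually lies, which is the sense in which they improve on the two labeled examples alone.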
4. Reinforcement learning
Reinforcement learning enables the analysis and optimization of an agent's behavior based on feedback from the environment. Machines try different scenarios to discover the most beneficial actions, rather than just being told which actions to take. What sets reinforcement learning apart from other techniques is learning by trial and error and delayed reward.
Reinforcement learning is a behavioral learning model. The algorithm receives feedback from the analysis of the data, which directs the user toward the best result. Reinforcement learning differs from supervised learning in that the system is not trained from a sample data set; it learns by trial and error.
Therefore, a sequence of successful decisions “reinforces” the process, because it is the one best suited to solving the problem.
Reinforcement learning is used, for example, for self-driving cars. Driving an autonomous car is an extremely complex process due to the many possible obstacles. If all cars were autonomous, trial and error would be easier to manage. In the real world, however, human factors are often unpredictable.
Even in such a complex situation, the algorithm can be optimized over time to find ways to adapt to the state where actions are rewarded.
To understand reinforcement learning, think about training an animal to act in a certain way based on the rewards given to it. If a dog is given a treat every time it sits when its handler asks, then this action will become habitual.
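Trial and error with a delayed reward can be sketched with tabular Q-learning, one standard reinforcement learning algorithm (assumed here as an example; the text does not name one). The environment is a made-up corridor of five cells in which only the last cell pays a reward, so the agent must learn that several unrewarded steps to the right eventually pay off.

```python
import random

N_STATES = 5            # corridor cells 0..4; the reward sits in cell 4
ACTIONS = (0, 1)        # 0 = step left, 1 = step right

def step(state, action):
    """Move one cell; only reaching the last cell yields a reward."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)            # explore: try something
            else:                                       # exploit, random tie-break
                action = max(ACTIONS, key=lambda a: (q[state][a], rng.random()))
            nxt, reward = step(state, action)
            # Q-learning update: move the estimate toward the reward plus the
            # discounted value of the best action available afterwards.
            target = reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

q = train()
# Greedy policy for the non-terminal cells: the preferred action in each one
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

Like the dog and the treat, the agent is never told which action is correct; the reward at the end of the corridor gradually propagates back to the earlier cells through the update rule.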
5. Deep learning (neural networks)
Deep learning is a machine learning method that arranges neural networks in successive layers to learn iteratively from data. Deep learning is particularly useful for detecting patterns in unstructured data.
Deep learning methods (complex neural networks) are designed to mimic the workings of the human brain so that computers can be trained to deal with abstractions and ill-defined problems.
Neural networks and deep learning are often used in image and speech recognition and computer vision applications.
A neural network consists of at least three layers: an input layer, a hidden layer (or many hidden layers), and an output layer.
Data is taken in by the input layer. It is then transformed in the hidden layer and the output layer according to the weights applied to the nodes. A typical neural network can be made up of thousands, if not millions, of simple, densely interconnected processing nodes.
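The flow from input layer to hidden layer to output layer can be sketched as a forward pass. The layer sizes, the weight values, and the choice of a sigmoid activation below are all assumptions picked for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum per node, then sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
            for node_weights, b in zip(weights, biases)]

def forward(x, hidden_w, hidden_b, output_w, output_b):
    hidden = layer(x, hidden_w, hidden_b)        # hidden layer transforms the input
    return layer(hidden, output_w, output_b)     # output layer produces the result

# Hypothetical, hand-picked weights: 2 inputs -> 3 hidden nodes -> 1 output
hidden_w = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
hidden_b = [0.0, 0.1, -0.1]
output_w = [[0.7, -0.5, 0.2]]
output_b = [0.05]

output = forward([1.0, 2.0], hidden_w, hidden_b, output_w, output_b)
```

Each node does very little on its own; the behavior of the network comes from the weights, which in practice are learned from data rather than written by hand.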
The term “deep learning” is used when a neural network has multiple hidden layers. A neural network iteratively adjusts its weights and makes inferences until a specific stopping point is reached.
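The adjust-and-infer loop can be sketched with gradient descent. To keep the example short it trains a single sigmoid neuron rather than a deep network, which is a deliberate simplification; the learning rate, the loss threshold used as the stopping point, and the logical-OR data set are likewise assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(samples, lr=0.5, target_loss=0.1, max_epochs=10_000):
    """Gradient descent on one sigmoid neuron until the loss is small enough."""
    w = [0.0, 0.0]
    b = 0.0
    loss = float("inf")
    epoch = 0
    for epoch in range(1, max_epochs + 1):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        loss = 0.0
        for x, y in samples:
            p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)       # inference
            loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
            grad_w[0] += (p - y) * x[0]
            grad_w[1] += (p - y) * x[1]
            grad_b += p - y
        n = len(samples)
        loss /= n                                             # mean cross-entropy
        if loss < target_loss:                                # stopping point reached
            break
        w = [w[i] - lr * grad_w[i] / n for i in range(2)]     # adjust the weights
        b -= lr * grad_b / n
    return w, b, loss, epoch

# Hypothetical training data: the logical OR of two inputs
or_samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b, loss, epochs_used = train_neuron(or_samples)
predictions = [round(sigmoid(w[0] * x[0] + w[1] * x[1] + b)) for x, _ in or_samples]
```

Deep networks follow the same pattern, with the gradients for the hidden layers computed by backpropagation.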
Deep learning is a machine learning technique that uses hierarchical neural networks to learn from a combination of unsupervised and supervised algorithms.
It is often said that deep learning is a sub-discipline of machine learning. Usually, the machine learns from unlabeled and unstructured data. Although deep learning is very similar to a traditional neural network, the hidden layers are much more numerous. The more complex the problem, the more hidden layers the model contains.
Deep learning is expected to influence many areas. Speech recognition, for example, is likely to be used both in cars and in customer management.
In the Internet of Things and manufacturing applications, deep learning can be used to predict machine malfunction.
Selection of machine learning algorithms
Three aspects must be taken into account when choosing an algorithm: accuracy, training time, and ease of use. Many users prioritize accuracy, while beginners tend to prioritize algorithms with which they are most familiar.
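Two of these three aspects, accuracy and training time, can be measured directly. The sketch below times the fitting of two toy candidates and scores them on held-out data. The evaluate helper, the two candidate models, and the data are all hypothetical, and the threshold model assumes that the positive class has the larger feature values.

```python
import time

def evaluate(fit, train_set, test_set):
    """Return (accuracy on test_set, training time in seconds) for one candidate.
    `fit` takes the training pairs and returns a prediction function."""
    start = time.perf_counter()
    predict = fit(train_set)
    elapsed = time.perf_counter() - start
    correct = sum(predict(x) == y for x, y in test_set)
    return correct / len(test_set), elapsed

def fit_majority(pairs):
    """Baseline: always predict the most common training label."""
    labels = [y for _, y in pairs]
    most_common = max(set(labels), key=labels.count)
    return lambda x: most_common

def fit_threshold(pairs):
    """Use the midpoint between the two class means as a cut-off."""
    pos = [x for x, y in pairs if y == 1]
    neg = [x for x, y in pairs if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > cut else 0

# Hypothetical one-feature data set: (feature value, label)
train_set = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1)]
test_set = [(1.5, 0), (2.5, 0), (6.5, 1), (9.0, 1)]

results = {name: evaluate(fit, train_set, test_set)
           for name, fit in [("majority", fit_majority),
                             ("threshold", fit_threshold)]}
```

Ease of use is harder to quantify, but comparing candidates against a trivial baseline such as the majority rule at least shows whether a more elaborate algorithm earns its extra complexity.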
Beginners tend to choose algorithms that are easy to implement and produce results quickly, which works well as long as this is only the first step in the process.
The next step, however, should make way for more sophisticated algorithms to strengthen understanding of the data and improve results.