There are many kinds of computational learning systems. Our journey will focus on learning from examples. In this article, we will mostly be concerned with supervised learning from examples.
Author: Goran Sukovic, PhD in Mathematics, Faculty of Natural Sciences and Mathematics, University
What exactly does “supervised learning from examples” mean? We start with a simple example. In the beginning, you are given several photos of two animals you’ve never seen before, for example, cats and dogs. Then we tell you which animal is in which photo. After that, we give you a new, unseen photo, and you should be able to tell us which animal is in it. That’s it: you just performed supervised learning from examples. When we coax a computer to learn from examples, we have to present the examples in a certain way. Each example is measured on a common group of attributes, or features, and we record the value of each attribute for each example.
Let’s get a bit more concrete. For example, consider human medical records with relevant values for each patient: height, weight, sex, age, smoking history, systolic and diastolic blood pressures, and resting heart rate. Our examples are the different people represented in the dataset, and the biometric and demographic characteristics are attributes. A convenient way to represent data is a table, as in Table 1 below.
id | height (cm) | weight (kg) | smoke | sex | age | heart rate (bpm) | sys bp (mmHg) | dia bp (mmHg) |
071212 | 166 | 54 | no | f | 16 | 65 | 120 | 81 |
078612 | 175 | 85 | no | m | 42 | 73 | 115 | 75 |
123448 | 171 | 81 | yes | f | 42 | 56 | 109 | 70 |
876409 | 192 | 91 | no | m | 32 | 83 | 130 | 92 |
… | … | … | … | … | … | … | … | … |
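To make the rows-and-columns picture concrete, here is a minimal Python sketch that holds the four rows shown in Table 1 as a list of records: one dictionary per example (row), keyed by feature (column). The column names such as `height_cm` and the helper `column` are illustrative choices for this sketch, not part of any library, and the elided rows of the table are simply left out.

```python
# A minimal sketch: the rows shown in Table 1 as a list of records.
# Each dict is one example (a row); each key is one feature (a column).
# Column names are illustrative choices made for this sketch.
patients = [
    {"id": "071212", "height_cm": 166, "weight_kg": 54, "smoke": "no",
     "sex": "f", "age": 16, "heart_rate": 65, "sys_bp": 120, "dia_bp": 81},
    {"id": "078612", "height_cm": 175, "weight_kg": 85, "smoke": "no",
     "sex": "m", "age": 42, "heart_rate": 73, "sys_bp": 115, "dia_bp": 75},
    {"id": "123448", "height_cm": 171, "weight_kg": 81, "smoke": "yes",
     "sex": "f", "age": 42, "heart_rate": 56, "sys_bp": 109, "dia_bp": 70},
    {"id": "876409", "height_cm": 192, "weight_kg": 91, "smoke": "no",
     "sex": "m", "age": 32, "heart_rate": 83, "sys_bp": 130, "dia_bp": 92},
]


def column(rows, feature):
    """Return the values of one feature (a column) across all examples (rows)."""
    return [row[feature] for row in rows]


print(len(patients))            # 4 examples (rows)
print(list(patients[0]))        # the feature names (columns)
print(column(patients, "age"))  # [16, 42, 42, 32]
```

The ids are kept as strings rather than integers: they are labels, not quantities, and converting `"071212"` to a number would silently drop the leading zero.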
The rows of the table are the examples in the dataset, and the columns are the features. People often use “features” and “attributes” as synonyms: both refer to a column of values. Let’s take a moment to look at the types of values our attributes take. The first type is called discrete, symbolic, categorical, or nominal. Such attributes take one of a small, limited number of values, each representing one of several options. For example, the attribute sex has the values {male, female}. There is one practical point about categorical data: the information in those attributes can be recorded in two distinct ways. The first is to use a single feature that takes one value for each option. The second (often called one-hot encoding) is to use several features, one per option; one, and only one, of those features is marked as true, and the rest are marked as false.
Example: Two ways to represent the attribute sex for patients:
id | sex |
071212 | f |
078612 | m |
123448 | f |
876409 | m |
id | Sex is male | Sex is female |
071212 | false | true |
078612 | true | false |
123448 | false | true |
876409 | true | false |
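Converting the first representation into the second is mechanical. Below is a minimal Python sketch, assuming each patient is stored as a dictionary of column name to value; the function name `one_hot` and the mapping from recorded codes ("m", "f") to readable names are hypothetical choices for this illustration.

```python
def one_hot(rows, feature, options):
    """Replace one categorical column with one boolean column per option.

    `options` maps each recorded value (e.g. "m") to a readable name
    (e.g. "male"); the new columns are named "<feature> is <name>".
    """
    encoded = []
    for row in rows:
        if row[feature] not in options:
            raise ValueError(f"unexpected value {row[feature]!r} for {feature!r}")
        # Copy every column except the one being encoded.
        new_row = {key: value for key, value in row.items() if key != feature}
        # The recorded value matches exactly one option, so exactly one new column is True.
        for value, name in options.items():
            new_row[f"{feature} is {name}"] = row[feature] == value
        encoded.append(new_row)
    return encoded


rows = [
    {"id": "071212", "sex": "f"},
    {"id": "078612", "sex": "m"},
    {"id": "123448", "sex": "f"},
    {"id": "876409", "sex": "m"},
]
encoded = one_hot(rows, "sex", {"m": "male", "f": "female"})
print(encoded[0])  # {'id': '071212', 'sex is male': False, 'sex is female': True}
```

Libraries provide the same transformation (pandas, for instance, has `get_dummies`); the explicit loop simply shows what happens to each row.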
The second type of attribute can be lumped together under the term numerical features: they are recorded as numbers and can be operated on as numbers. They are sometimes loosely called continuous values, even though integer-valued attributes such as counts are, strictly speaking, discrete. Values for attributes like height and weight are typically recorded as decimal numbers, while attributes like age, blood pressure, and counts are often recorded as integers. We can perform arithmetic operations on these values. In practice, we can also record categorical data as numbers, but usually we can’t perform meaningful numerical calculations directly on those values: the patient ids in Table 1 look like numbers, yet averaging them produces a meaningless result.
Let’s shift our focus back to the biomedical dataset. The collected attributes might be useful for an insurance company or a health care provider trying to assess the likelihood that a patient will develop cardiovascular disease. That information can be added to the list of attributes. The idea of “developing heart disease” could be recorded in several different ways:
- Did the patient develop any heart disease within a certain time interval (e.g., within six years)? Possible answers: yes/no.
- What severity of heart disease did the patient develop within a certain time interval (e.g., within six years)? Possible answers: None, Grade I, Grade II, Grade III.
- What level of a specific heart-disease indicator did the patient show within a time interval? For example, the percentage of coronary artery blockage.
This approach tries to predict the future: we have the values of the attributes today and want to predict an outcome that we will only see later. In any case, we can pick a measurable target and try to develop a relationship between the features and the outcome. The concrete outcome is called the output, the target value, or simply the target. If our goal is to predict which of several classes, or categories, an outcome belongs to, we are talking about classification; classification deals with categorical output. If there are only two target classes, we have a binary classification task, for example with {Yes, No} or {True, False} targets. Mathematically, the two classes are usually encoded as {−1, +1} or {0, 1}. If there are more than two target classes, we have a multiclass problem. If the output is numerical, like ordinary decimal numbers, we call the process regression. In short, predicting a category is called classification, and predicting a numerical value is called regression. To emphasize the features used to predict the unknown future outcome, we may call them input features, predictive features, or predictors.
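The rule of thumb above can be written down directly. The following Python sketch looks at a list of raw target values and reports the kind of task it implies, and it encodes a yes/no target as {0, 1} or {−1, +1}. The function names `task_type` and `encode_binary` are hypothetical, and the target values for four patients are made up to mirror the three ways of recording heart disease listed earlier; they are not data from a real study.

```python
def task_type(targets):
    """Name the learning task implied by a list of raw target values.

    Heuristic sketch: numbers mean regression, anything else is treated as
    categorical. Pass labels in their original form; class labels already
    encoded as 0/1 would be mistaken for a numerical target.
    """
    if all(isinstance(t, (int, float)) and not isinstance(t, bool) for t in targets):
        return "regression"
    n_classes = len(set(targets))
    if n_classes < 2:
        raise ValueError("a classification target needs at least two classes")
    return "binary classification" if n_classes == 2 else "multiclass classification"


def encode_binary(labels, positive, values=(0, 1)):
    """Map `positive` to values[1] and every other label to values[0]."""
    negative_value, positive_value = values
    return [positive_value if label == positive else negative_value for label in labels]


# Hypothetical targets for four patients, one list per way of recording heart disease.
developed = ["no", "no", "yes", "no"]               # any heart disease: yes/no
severity = ["None", "Grade I", "None", "Grade III"]  # severity grade
blockage = [0.0, 12.5, 40.0, 5.0]                    # percent artery blockage

print(task_type(developed))                     # binary classification
print(task_type(severity))                      # multiclass classification
print(task_type(blockage))                      # regression
print(encode_binary(developed, "yes"))          # [0, 0, 1, 0]
print(encode_binary(developed, "yes", (-1, 1))) # [-1, -1, 1, -1]
```

The heuristic only inspects values. In practice, whether a target is categorical is a decision you make from its meaning, which is why labels that are already encoded as numbers need care.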
Some examples of both approaches can be found in the following table:
Classification | Regression |
Stock trading – buy or sell a stock based on the stock’s price history and other financial and market data. | Stock pricing – predict the future price of a stock from the same kind of data used for the buy/sell decision. |
Medical diagnosis – from a patient’s medical record, output whether the patient is sick or healthy. Here we are dealing with a combination of text and images: medical records, notes, and medical imaging. | Web browsing behavior – predict how likely a user is to purchase an item from an online store, based on the user’s browsing and purchasing history. The input features are not numerical, but the output is a percentage. |
Image classification – from an input image, output the animal shown (cat, dog, or none). | Student success – predict students’ exam scores based on homework completion rates, class attendance, a measure of participation, grades in previous courses, and even open-ended written assessments. |
A learning system can be described along many dimensions. Learning from data can be applied in different domains, such as business, science, and medicine. Within a domain, there are different tasks, such as image recognition, medical diagnosis, web browsing behavior, and stock market prediction. Different types of data are available for different tasks, and the relationship between the features and the outcome can be captured by several different models.
According to the CRISP-DM (Cross-Industry Standard Process for Data Mining) process model, the high-level steps for developing learning systems are:
1. Task understanding – develop an understanding of our task.
2. Collecting data – collect and understand our data.
3. Data preparation – prepare the data for modeling.
4. Modeling – build models of relationships in the data.
5. Evaluation – evaluate and compare models.
6. Deployment – transition the model into a deployable system.
Our main interest is the modeling step. To develop a supervised learning system, we have to deal with several issues. First, we have to decide which part of the data is our output and which parts are the features; then, we have to decide how to relate the input features to the output. Which algorithm should we use? More on these issues in subsequent articles, to be released soon. Thanks for reading this article. If you liked it, please recommend and share it.