# Machine Learning

## Introduction

Machine learning is an umbrella term that covers a wide range of algorithms that learn from data. Its main branches are supervised learning, unsupervised learning, and reinforcement learning. Supervised learning is the most commonly used.

Supervised learning trains a model on labeled training data: examples whose inputs (features) are paired with the correct answer. The supervised learning algorithm then adjusts the model so that its predictions are as close to those answers as possible. The trained model can then be applied to new data of the same type.

```
[ Features ] ----> [ Supervised Learning Model ] ----> [ Prediction ]
```

How the supervised learning model works depends on the implementation. A multivariate linear regression approach will give each feature a weight and iteratively determine a set of weights that produces a prediction that is as close to the answer as possible.
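As a minimal sketch of the features -> model -> prediction flow, the snippet below fits a linear model to a small, hypothetical labeled data set with NumPy. For brevity it solves for the weights in one step with NumPy's least-squares solver (`np.linalg.lstsq`) instead of iteratively; it illustrates the idea and is not the only way to train such a model.

```python
import numpy as np

# Hypothetical labeled training data: each row of X holds two features,
# and y holds the known answer for that row (here y = 2*x1 + 3*x2 + 1).
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = np.array([9.0, 8.0, 19.0, 18.0, 26.0])

# Add a column of ones so the solver can also learn an intercept (bias) term.
X_with_bias = np.column_stack([X, np.ones(len(X))])

# "Training": find the weights that minimize the squared error on the labeled data.
weights, *_ = np.linalg.lstsq(X_with_bias, y, rcond=None)

# "Prediction": apply the learned weights to a new example.
new_example = np.array([6.0, 2.0, 1.0])  # features 6 and 2, plus the bias column
prediction = new_example @ weights
print(weights)     # approximately [2. 3. 1.]
print(prediction)  # approximately 19.0
```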

Unsupervised learning is closely related to data mining: it looks for hidden patterns in data that has no labeled answers.

### Multivariate Linear Regression

Multivariate linear regression fits a set of weights, one per feature, that most closely reproduces the answers in a training data set. The weights can be found with a 3-step algorithm:

1. Set each feature's weight to an arbitrary starting value, say 1.0.
2. For every example in the training data, measure how far the prediction made with the current weights is from the answer. The average error is the cost of the function.
3. Repeat step 2 with different weights to minimize the cost (i.e. the error).

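Step 2 requires measuring how 'wrong' each prediction is compared with the answer. This is typically done with the mean squared error:

$$\text{Cost} = \frac{(\text{ans}_1 - \text{pred}_1)^2 + (\text{ans}_2 - \text{pred}_2)^2 + \dots + (\text{ans}_n - \text{pred}_n)^2}{n} = \frac{1}{n}\sum_{i=1}^{n} (\text{ans}_i - \text{pred}_i)^2$$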

By squaring each difference, we penalize larger errors more.
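A minimal NumPy sketch of this cost function (the mean squared error); the function name `cost` and the sample values are hypothetical:

```python
import numpy as np

def cost(answers, predictions):
    """Mean squared error: the average of the squared differences."""
    answers = np.asarray(answers, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    return np.mean((answers - predictions) ** 2)

# Hypothetical answers and predictions for three training examples.
answers = [3.0, 5.0, 7.0]
predictions = [2.0, 5.0, 10.0]
print(cost(answers, predictions))  # (1 + 0 + 9) / 3 = 3.333...
```

Note how the single error of 3 contributes 9 to the sum, while the error of 1 contributes only 1.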

Step 3 requires picking new weights that reduce the cost. This can be done with batch gradient descent, an iterative optimization algorithm that computes how the cost changes with respect to each weight (the gradient) over the whole training set, then nudges every weight a small step in the direction that lowers the cost. Repeated with a small enough step size, this moves the weights toward those that minimize the cost. However, a linear regression model can only capture linear relationships between the features and the answer. For non-linear data you can use other algorithms, such as gradient boosting.
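The sketch below puts the three steps together. It assumes NumPy, starts every weight at 1.0, and runs a fixed number of batch gradient descent updates on the mean squared error. The function name, learning rate, iteration count, and data are illustrative choices, not values from these notes; too large a learning rate makes the updates diverge instead of converge.

```python
import numpy as np

def train_linear_regression(X, y, learning_rate=0.02, iterations=5000):
    """Fit weights and a bias with batch gradient descent (minimal sketch)."""
    n_samples, n_features = X.shape
    # Step 1: start every weight (and the bias) at an arbitrary value, 1.0.
    weights = np.ones(n_features)
    bias = 1.0
    for _ in range(iterations):
        # Step 2: predict with the current weights and measure the error.
        predictions = X @ weights + bias
        errors = predictions - y
        # Step 3: move each weight against the gradient of the mean squared
        # error, computed over the whole batch of training data.
        weight_gradient = (2 / n_samples) * (X.T @ errors)
        bias_gradient = (2 / n_samples) * errors.sum()
        weights -= learning_rate * weight_gradient
        bias -= learning_rate * bias_gradient
    final_cost = np.mean((X @ weights + bias - y) ** 2)
    return weights, bias, final_cost

# Hypothetical training data generated from y = 2*x1 + 3*x2 + 1.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = np.array([9.0, 8.0, 19.0, 18.0, 26.0])
weights, bias, final_cost = train_linear_regression(X, y)
print(weights, bias, final_cost)  # weights close to [2, 3], bias close to 1
```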

## Hands On Overview

Some of the software used in these notes:

- NumPy - Python library for scientific computing
- scikit-learn - Python machine learning library
- pandas - Python data analysis library (the name comes from 'panel data'); it works like a programmable spreadsheet

### Using NumPy

With NumPy you can multiply an entire array in a single operation, as shown below. This is much faster than iterating over each number in Python, because the loop runs inside NumPy's compiled code, which can also use SIMD (single instruction, multiple data) CPU instructions. Replacing explicit loops with whole-array operations like this is called vectorizing the code.

If you find yourself writing loops over an array, you're probably doing it wrong.

```
import numpy as np

sizes = np.array([1, 2, 3, 4, 5])
# Multiply every element by 0.3 in one vectorized operation (no Python loop)
sizes = sizes * 0.3
```
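To make the loop-versus-vectorized difference concrete, here is a weighted sum (the core of a linear regression prediction) computed both ways. The arrays are hypothetical; both versions give the same result, but the vectorized one does the work inside NumPy's compiled code.

```python
import numpy as np

prices = np.array([100.0, 250.0, 175.0])
weights = np.array([0.5, 0.2, 0.3])

# Loop version: multiply and add one element at a time in Python.
total_loop = 0.0
for price, weight in zip(prices, weights):
    total_loop += price * weight

# Vectorized version: a single NumPy call performs the whole weighted sum.
total_vectorized = np.dot(prices, weights)

print(total_loop, total_vectorized)
```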

### Using Pandas

Pandas makes viewing large CSV files very easy, and lets you work with tabular data much like a spreadsheet.

For example, to write the first 100 rows to an HTML file and open it in a web browser:

```
import os
import webbrowser

import pandas

# Read the dataset into a data table using pandas
data_table = pandas.read_csv("ml_house_data_set.csv")
# Create a web page view of the first 100 rows for easy viewing
html = data_table[0:100].to_html()
# Save the HTML to a file
with open("data.html", "w") as f:
    f.write(html)
# Open the web page in the default web browser
full_filename = os.path.abspath("data.html")
webbrowser.open("file://{}".format(full_filename))
```

### Gradient Boosting

Gradient boosting is a machine learning algorithm that uses an ensemble of decision trees to predict values. It can handle complex, non-linear patterns in data that linear models can't.

A decision tree is a series of 'if' tests on feature values. The simplest tree, called a decision stump, makes a single test, e.g.:

sq_ft > 1000? -> Yes: +$50,000 value -> No: -$50,000 value

A stump like this has only one decision point. To increase the complexity of a tree-based model, we can either:

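- Create a tree that's hundreds of layers deep with thousands of branching paths, or
- Create many small trees that each contribute a little to the final answer (called ensemble learning)

Gradient boosting takes the second approach: it adds small trees one at a time, and each new tree is trained to correct the errors that the existing trees still make.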
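A minimal from-scratch sketch of gradient boosting for regression, assuming NumPy, a single feature, squared error, and one-split trees (stumps) like the `sq_ft > 1000?` example. Each round fits a new stump to the residuals (the errors the ensemble still makes) and adds a scaled-down copy of it to the prediction. The function names, learning rate, and data are hypothetical, and real libraries such as scikit-learn's `GradientBoostingRegressor` use deeper trees and many refinements.

```python
import numpy as np

def fit_stump(x, residuals):
    """Find the single split 'x <= threshold' that best fits the residuals."""
    best = None
    for threshold in np.unique(x)[:-1]:
        left = x <= threshold
        left_value = residuals[left].mean()
        right_value = residuals[~left].mean()
        fitted = np.where(left, left_value, right_value)
        error = np.sum((residuals - fitted) ** 2)
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]  # (threshold, left_value, right_value)

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

def fit_boosted_stumps(x, y, n_trees=50, learning_rate=0.3):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = y.mean()  # start from a constant prediction
    predictions = np.full(len(y), base)
    stumps = []
    for _ in range(n_trees):
        residuals = y - predictions  # what the ensemble still gets wrong
        stump = fit_stump(x, residuals)
        # Each small tree contributes only a fraction of its correction.
        predictions = predictions + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, learning_rate, stumps

def predict_boosted(model, x):
    base, learning_rate, stumps = model
    total = np.full(len(x), base)
    for stump in stumps:
        total = total + learning_rate * predict_stump(stump, x)
    return total

# Hypothetical data: square footage and price (in $1000s) with non-linear jumps.
sq_ft = np.array([600.0, 800.0, 1000.0, 1200.0, 1400.0, 1600.0, 1800.0, 2000.0])
price = np.array([150.0, 160.0, 165.0, 240.0, 250.0, 255.0, 400.0, 420.0])

model = fit_boosted_stumps(sq_ft, price)
baseline_mse = np.mean((price - price.mean()) ** 2)
training_mse = np.mean((price - predict_boosted(model, sq_ft)) ** 2)
print(baseline_mse, training_mse)  # the ensemble's error should be well below the baseline
print(predict_boosted(model, np.array([1100.0])))
```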

### Training Data

The training data should have two main components: the features and the answer. Features are the attributes of the data and are typically called 'X'. The answer is the value that the learning model is trying to predict and is called 'Y'.

The training data should cover as many combinations of features as possible; in general, the more data, the better. A common rule of thumb is to have at least 10 times as many data points as features.

Split the data before training: use about 70% for training and hold back the remaining 30% for testing the model.
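A minimal sketch of a shuffled 70/30 split with NumPy. Shuffling first matters when the rows are stored in some order (for example, sorted by price). The function name and data are hypothetical; scikit-learn's `train_test_split` (with `test_size=0.3`) provides the same functionality.

```python
import numpy as np

def split_70_30(X, y, seed=0):
    """Shuffle the rows, then keep 70% for training and 30% for testing."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    cut = int(round(0.7 * len(X)))
    train_idx, test_idx = indices[:cut], indices[cut:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Hypothetical data set: 10 rows with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)
X_train, X_test, y_train, y_test = split_70_30(X, y)
print(len(X_train), len(X_test))  # 7 3
```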

The most time-consuming part of supervised machine learning is figuring out which features to use for modelling a problem (known as feature engineering). The right set of features depends on the problem, and existing domain knowledge about it helps you choose features that make the machine learning algorithm more accurate and efficient.

Feature engineering boils down to:

- Adding features that correlate strongly with the output value, and dropping those that don't
- Combining features to simplify the data
- Binning features into categories (e.g. yes/no values)
- One-hot encoding, which converts a categorical feature (e.g. neighbourhood) into one boolean feature per category (e.g. is_rocky_ridge)
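A short pandas sketch of two of these techniques: binning a numeric feature into a yes/no value, and one-hot encoding a neighbourhood column with `pd.get_dummies`. The column names, the 2000 sq ft cutoff, and the data are hypothetical.

```python
import pandas as pd

# Hypothetical house data with a categorical neighbourhood column.
houses = pd.DataFrame({
    "neighbourhood": ["rocky_ridge", "lakeside", "rocky_ridge", "downtown"],
    "sq_ft": [1800, 950, 2400, 700],
})

# Binning: turn square footage into a yes/no (1/0) "is_large" feature.
houses["is_large"] = (houses["sq_ft"] > 2000).astype(int)

# One-hot encoding: one 0/1 column per neighbourhood, e.g. "is_rocky_ridge".
encoded = pd.get_dummies(houses, columns=["neighbourhood"], prefix="is", dtype=int)
print(encoded)
```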

## Glossary

Term | Definition |
---|---|
Overfitting | The model is too specific to the training data: it 'memorizes' the exact training examples instead of learning the underlying pattern. You can see this when the training set error is low but the test set error is high. |
Underfitting | The model is too simple to learn the patterns in the data. You can see this when both the training set and test set errors are high. |
Features | The fields in a piece of data that are used to determine the answer. Called X in machine learning code. |
Value to Predict | The answer the model is trained to predict, and the value it outputs for new data. Called Y in machine learning code. |
Regression | Predicting a numeric value (value prediction), such as a house price. |

## TensorFlow

TensorFlow is a framework for building and deploying machine learning models. The name comes from the design of the system: all data processed with TensorFlow is stored in multi-dimensional arrays called tensors, and operations on them are constructed as a computational graph. In a sense, you are controlling how the tensors flow through the graph, hence the name TensorFlow.

TensorFlow is typically used with large data sets made up of many individual attributes. A typical workflow is:

- Build a model as a graph.
- Train the model
- Test the model
- Evaluate the model

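All operations on the graph are executed in a TensorFlow session. A session is an object that runs operations on the graph and holds the state of the graph, such as the current values of its variables. (This describes TensorFlow 1.x; TensorFlow 2.x executes operations eagerly by default, and the session API is available under `tf.compat.v1`.)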

### Installation

The simplest way to install TensorFlow is with pip:

```
pip3 install tensorflow
```

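For GPU support, older TensorFlow releases used the separate `tensorflow-gpu` package. Since TensorFlow 2.1, the standard `tensorflow` package includes GPU support; check the official installation guide for the CUDA requirements of your platform.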

### Introduction

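A TensorFlow 1.x graph is built from nodes. Model parameters that training adjusts are `tf.Variable` nodes, while values fed in on every run, such as the training data, should be placeholder nodes.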

```
# Uses the TensorFlow 1.x graph/session API; under TensorFlow 2.x this runs
# through the tf.compat.v1 compatibility module.
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()  # build a graph and run it in a session, TF1-style

# Model parameters: trainable variables with an initial value and a type
W = tf.Variable([.3], dtype=tf.float32)
b = tf.Variable([-.3], dtype=tf.float32)
# Placeholders are fed with training data on every run:
# x holds the inputs, y holds the expected 'answers' used to compute the loss
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
# The model we are training. We want to solve for W and b.
linear_model = W * x + b
# Loss function: sum of squared differences (squaring penalizes larger errors)
loss = tf.reduce_sum(tf.square(linear_model - y))
# Optimizer that adjusts W and b with gradient descent (learning rate 0.01)
optimizer = tf.train.GradientDescentOptimizer(0.01)
# Training step: minimize the loss
train = optimizer.minimize(loss)
# Training data (the exact solution is W = -1, b = 1)
x_train = [1, 2, 3, 4, 5]
y_train = [0, -1, -2, -3, -4]
# Initialize the variables to their starting values inside a session
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
# Training loop: run N gradient descent steps
for i in range(1000):
    sess.run(train, {x: x_train, y: y_train})
# Evaluate the trained parameters and the final training loss
curr_W, curr_b, curr_loss = sess.run([W, b, loss], {x: x_train, y: y_train})
print("W: %s b: %s loss: %s" % (curr_W, curr_b, curr_loss))
```

## Other Topics