There are four key steps you would follow when creating a machine learning model.
1. Choose and Prepare a Training Data Set
Training data is information that is representative of the data the machine learning application will ingest to tune model parameters. Training data is sometimes labeled, meaning it has been tagged to call out classifications or expected values the machine learning mode is required to predict. Other training data may be unlabeled so the model will have to extract features and assign clusters autonomously.
For labeled, data should be divided into a training subset and a testing subset. The former is used to train the model and the latter to evaluate the effectiveness of the model and find ways to improve it.
2. Select an Algorithm to Apply to the Training Data Set
The type of machine learning algorithm you choose will primarily depend on a few aspects:
- Whether the use case is prediction of a value or classification which uses labeled training data or the use case is clustering or dimensionality reduction which uses unlabeled training data
- How much data is in the training set
- The nature of the problem the model seeks to solve
For prediction or classification use cases, you would usually use regression algorithms such as ordinary least square regression or logistic regression. With unlabeled data, you are likely to rely on clustering algorithms such as k-means or nearest neighbor. Some algorithms like neural networks can be configured to work with both clustering and prediction use cases.
3. Train the Algorithm to Build the Model
Training the algorithm is the process of tuning model variables and parameters to more accurately predict the appropriate results. Training the machine learning algorithm is usually iterative and uses a variety of optimization methods depending upon the chosen model. These optimization methods do not require human intervention which is part of the power of machine learning. The machine learns from the data you give it with little to no specific direction from the user.
4. Use and Improve the Model
The last step is to feed new data to the model as a means of improving its effectiveness and accuracy over time. Where the new information will come from depends on the nature of the problem to be solved. For instance, a machine learning model for self-driving cars will ingest real-world information on road conditions, objects and traffic laws.