Building A Machine Learning Algorithm
The first and most critical step of building a machine learning algorithm is to correctly define the problem by identifying the necessary inputs and expected outputs.
The next step is data collection. When collecting data, a larger dataset with ample feature variance will usually result in a better-performing model.
The data then needs to be prepared through pre-processing and feature engineering. The most common techniques used at this stage include dropping or filling in (imputing) missing data, one-hot encoding categorical variables, and feature scaling via normalization or standardization.
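To make these preparation steps concrete, here is a minimal sketch in plain Python. The helper names (`impute_mean`, `one_hot`, `min_max_normalize`, `standardize`) and the toy columns are hypothetical and invented for illustration, and filling gaps with the column mean is just one possible imputation strategy.

```python
from statistics import mean, pstdev

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def one_hot(labels):
    """Encode each label as a 0/1 vector with one slot per distinct category."""
    categories = sorted(set(labels))
    encoded = [[1 if label == c else 0 for c in categories] for label in labels]
    return encoded, categories

def min_max_normalize(values):
    """Rescale values linearly into [0, 1] (assumes they are not all equal)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Shift and scale to zero mean and unit population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical toy columns, made up for this example.
ages = [20, None, 40, 60]
colors = ["red", "green", "red", "blue"]

ages_filled = impute_mean(ages)              # the gap is filled with the mean, 40
encoded, categories = one_hot(colors)        # categories are sorted alphabetically
ages_normalized = min_max_normalize(ages_filled)
ages_standardized = standardize(ages_filled)
```

Normalization squeezes a feature into a fixed range, while standardization centers it at zero with unit spread. Which of the two suits a given model better depends on the data and the algorithm.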
We can also perform dimensionality reduction on the dataset using principal component analysis (PCA), which can help prevent overfitting and reduce training time. Lastly, the data needs to be split into a training set and a test set. A common ratio for the train-test split is 80/20.
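The split and the dimensionality reduction can be sketched with NumPy as follows. The data is synthetic, and the helper names (`train_test_split`, `fit_pca`, `apply_pca`) are hypothetical. Note one choice made here: the sketch splits first and fits PCA on the training rows only, a common precaution so that nothing about the test set leaks into the transformation.

```python
import numpy as np

def train_test_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle row indices and hold out `test_fraction` of the rows for testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

def fit_pca(X_train, n_components):
    """Return the training mean and the top principal directions (as rows)."""
    train_mean = X_train.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal directions,
    # ordered from most to least variance explained.
    _, _, vt = np.linalg.svd(X_train - train_mean, full_matrices=False)
    return train_mean, vt[:n_components]

def apply_pca(X, train_mean, components):
    """Center with the training mean, then project onto the components."""
    return (X - train_mean) @ components.T

# Synthetic data, invented for illustration: 100 samples, 5 features.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))
y = rng.integers(0, 2, size=100)

# 80/20 split, then PCA fitted on the training set only.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_fraction=0.2)
train_mean, components = fit_pca(X_train, n_components=2)
X_train_reduced = apply_pca(X_train, train_mean, components)
X_test_reduced = apply_pca(X_test, train_mean, components)
```

The test rows are transformed with the mean and components learned from the training rows, so the test set stays a fair stand-in for unseen data.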
After the data is prepared, it is time to choose a model. When choosing the model, it is critical to consider the size of the training data, the training time, and the data type of the output (for example, a category or a continuous value). Every choice involves a tradeoff. For example, opting for a more robust model may extend the training time.
The data we have access to will also dictate whether the algorithm will be based on supervised learning, unsupervised learning, or reinforcement learning. Supervised learning, for example, requires labeled data.
Once a model is chosen, it can be trained and tested. If the test results are unsatisfactory, the model can be optimized by tuning its hyperparameters, the settings that directly affect the learning process and the predictive strength of the algorithm. Cross-validation can be used to compare candidate hyperparameter values. The hyperparameters are then tuned until the model produces the best predictions possible.
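As an illustration of tuning with cross-validation, the sketch below picks a regularization strength for ridge regression using 5-fold cross-validation on the training data. Ridge regression, the synthetic data, the candidate values, and the helper names (`fit_ridge`, `cv_mse`) are all assumptions chosen for this example. The same loop applies to any model and hyperparameter.

```python
import numpy as np

def fit_ridge(X, y, alpha):
    """Closed-form ridge regression (no intercept): solve (X^T X + alpha*I) w = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

def cv_mse(X, y, alpha, k=5, seed=0):
    """Average validation mean squared error over k folds for one alpha."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    errors = []
    for i, val_idx in enumerate(folds):
        # Train on every fold except the i-th, validate on the i-th.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        w = fit_ridge(X[train_idx], y[train_idx], alpha)
        errors.append(np.mean((X[val_idx] @ w - y[val_idx]) ** 2))
    return float(np.mean(errors))

# Synthetic training data, invented for illustration.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(80, 3))
y_train = X_train @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=80)

# Score each candidate hyperparameter value and keep the best one.
candidate_alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
scores = {alpha: cv_mse(X_train, y_train, alpha) for alpha in candidate_alphas}
best_alpha = min(scores, key=scores.get)

# Refit on the full training set with the selected value.
final_weights = fit_ridge(X_train, y_train, best_alpha)
```

Because every candidate is scored on held-out folds of the training data, the test set is never used for tuning and remains available for a final, unbiased check.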
Written by Max Ong