How to Decide the Number of Layers and Nodes in a Neural Network
Choosing the number of layers and nodes is an essential step in designing a neural network architecture, and the choice has a strong effect on the network's predictive performance.
To begin with, I should say that there is no commonly accepted, standard method for deciding the number of layers; the most reliable way to solve this problem is systematic experimentation.
Before going into methods, it is important to understand why we use multiple layers. On the one hand, a single-layer neural network (one with no hidden layer) can only represent linearly separable functions.
Once the underlying patterns are non-linear and complex, a single-layer neural network will fail to capture them. On the other hand, deep neural networks have been shown experimentally to be more efficient, but it is sometimes hard to find the optimal number of layers.
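To make the linear-separability point concrete, here is a minimal sketch in plain Python. It uses XOR, a standard example of a pattern that is not linearly separable; the function names, the learning rate, and the hand-set weights are my own illustrative choices. A single threshold unit trained with the perceptron rule can never classify all four XOR cases, while a network with one hidden layer of two units can.

```python
# XOR truth table: inputs and target labels.
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def step(z):
    """Threshold activation: 1 if the weighted sum is positive, else 0."""
    return 1 if z > 0 else 0

def train_perceptron(data, epochs=100, lr=0.1):
    """Perceptron rule on a single threshold unit (no hidden layer)."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in data:
            err = y - step(w1 * x1 + w2 * x2 + b)
            w1 += lr * err * x1
            w2 += lr * err * x2
            b += lr * err
    return w1, w2, b

def two_layer_xor(x1, x2):
    """Hand-set weights: hidden units compute OR and AND; output is OR and not AND."""
    h_or = step(x1 + x2 - 0.5)
    h_and = step(x1 + x2 - 1.5)
    return step(h_or - h_and - 0.5)

w1, w2, b = train_perceptron(XOR)
single_correct = sum(step(w1 * x1 + w2 * x2 + b) == y for (x1, x2), y in XOR)
hidden_correct = sum(two_layer_xor(x1, x2) == y for (x1, x2), y in XOR)
print("single unit correct:", single_correct, "of 4")
print("one hidden layer correct:", hidden_correct, "of 4")
```

However long the single unit is trained, it gets at most three of the four cases right, because no straight line separates the two XOR classes.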
Moreover, for many simple problems, performance rarely improves with more than one hidden layer.
One hidden layer is sufficient for many problems; however, it should be noted that greater depth does seem to result in better generalization for a wide variety of tasks, which suggests that deep architectures express a useful prior that shallow ones lack.
First, hidden layers may sometimes be redundant, as linear and generalized linear models can solve a variety of problems. Linear models work well if the data is linearly separable, or if the dataset is so small and noisy that nonlinearities are hard to detect. In other words, one should always first consider a simple network with only an input layer and an output layer.
Next, with one or two hidden layers, a neural network can approximate any continuous function that maps one finite space to another, and can represent an arbitrary decision boundary to any accuracy with rational activation functions.
This conclusion derives from the universal approximation theorem, which states that a neural network with a single hidden layer can approximate any well-behaved function from one space to another, given enough hidden nodes.
In theory, then, additional hidden layers add little representational power. The problem, however, is that the theorem doesn’t specify how many nodes are needed in the layer or how difficult it will be for such a neural network to learn the pattern.
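For reference, one common form of the universal approximation theorem can be written as below. The notation is my own and not taken from the text above: f is a continuous function on a compact set K in R^n, sigma is a fixed continuous, non-polynomial activation function (a sigmoid, for example), and N is the number of nodes in the single hidden layer.

```latex
\[
\forall \varepsilon > 0 \;\; \exists\, N \in \mathbb{N},\;
\alpha_i, b_i \in \mathbb{R},\; w_i \in \mathbb{R}^n :
\quad
\sup_{x \in K} \Bigl| f(x) - \sum_{i=1}^{N} \alpha_i \,
\sigma\bigl(w_i^{\top} x + b_i\bigr) \Bigr| < \varepsilon
\]
```

The statement only guarantees that some finite N exists. It gives no bound on N and says nothing about whether training will actually find the weights.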
A neural network of greater depth can often approximate the same function with better generalization and efficiency, especially for complex problems such as those in computer vision.
Lastly, once we have decided to use a deep neural network, we can turn to (i) automated search strategies, (ii) pretrained network models, (iii) growing, (iv) pruning, and (v) genetic methods to choose the number of layers. As a rule of thumb, a neural network with 3 to 5 hidden layers can solve most high-dimensional problems.
(i) Automated search runs a preset strategy that loops through a set of configurations and returns the best network architecture found. Common strategies include random search, grid search, heuristic search, and exhaustive search. Grid search tries every combination of the listed hyperparameter values, while heuristic search explores configurations with a genetic algorithm or Bayesian optimization.
(ii) Reusing the configuration of a pretrained model is also a good choice and often gives competitive results, especially in computer vision (for example VGG19 and ResNet50) and NLP.
(iii) Growing approaches start with a small network with a limited number of nodes and layers, then insert layers according to some specific criterion. Common growing methods include Cascade-Correlation, Meiosis Networks, and Automatic Structure Optimization.
(iv) Pruning approaches start with a network much deeper than necessary and remove the parts that contribute least. The Hessian matrix of the loss is often used to decide what to remove.
(v) Genetic methods can also be used to search for a good network structure.
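As a rough sketch of what automated search looks like in practice, the snippet below enumerates configurations for grid search and samples them for random search using only the Python standard library. The search space values and the `toy_score` function are placeholders I made up for illustration; in a real experiment the scoring function would train a network with the given configuration and return its validation score.

```python
import itertools
import random

# Hypothetical search space; the values are placeholders, not recommendations.
SEARCH_SPACE = {
    "hidden_layers": [1, 2, 3, 4, 5],
    "nodes_per_layer": [16, 32, 64, 128],
}

def grid_configs(space):
    """Yield every combination of the listed hyperparameter values."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

def random_configs(space, n_trials, seed=0):
    """Yield n_trials configurations, sampling each hyperparameter independently."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        yield {k: rng.choice(v) for k, v in space.items()}

def best_config(configs, score_fn):
    """Return the configuration with the highest score under score_fn."""
    return max(configs, key=score_fn)

def toy_score(config):
    # Placeholder for "train the network and return validation accuracy".
    # It peaks at 3 layers of 64 nodes purely for illustration.
    return (-abs(config["hidden_layers"] - 3)
            - abs(config["nodes_per_layer"] - 64) / 64)

grid = list(grid_configs(SEARCH_SPACE))
print(len(grid), "grid configurations")
print("grid search picks:", best_config(grid, toy_score))
print("random search picks:",
      best_config(random_configs(SEARCH_SPACE, n_trials=8), toy_score))
```

Grid search cost grows multiplicatively with every hyperparameter added, which is why random search with a fixed trial budget is often tried first on larger spaces.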
The methods mentioned here are usually enough to settle the number of layers; a more challenging task is to determine the number of nodes in each layer. More importantly, I want to emphasize that adding hidden layers increases the complexity of the model, and excessive hidden layers often lead to overfitting.
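One simple way to see how depth adds complexity is to count trainable parameters. The sketch below assumes a plain fully connected network with a bias on every node; the layer sizes (20 inputs, 64 nodes per hidden layer, 1 output) are hypothetical values chosen only for the example.

```python
def count_parameters(layer_sizes):
    """Count weights plus biases of a fully connected network.

    layer_sizes lists the node count of every layer, input first, output last.
    """
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical sizes: 20 inputs, 1 output, hidden layers of 64 nodes each.
for depth in range(0, 6):
    sizes = [20] + [64] * depth + [1]
    print(depth, "hidden layers:", count_parameters(sizes), "parameters")
```

With these placeholder sizes, every additional 64-node hidden layer adds 65 * 64 = 4160 parameters that the same training data has to constrain.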
To sum up, when determining the number of layers in a neural network, start from a simple architecture, since one or two hidden layers can solve the problem in most cases. When it comes to a deep neural network, systematic experimentation is the more reliable choice, but there is always a tradeoff between overfitting and accuracy.
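The "start simple" advice can be sketched as a small loop: begin with no hidden layer and add one at a time until the held-out score stops improving. This is only one possible procedure, not a standard algorithm. `validation_score` is a hypothetical function the caller supplies, and the threshold `min_gain` and the accuracies below are made up for illustration.

```python
def select_depth(validation_score, max_depth=5, min_gain=0.005):
    """Add hidden layers until the validation score stops improving.

    validation_score(depth) is a caller-supplied (hypothetical) function that
    trains a network with `depth` hidden layers and returns its score on
    held-out data. Returns the chosen depth and its score.
    """
    best_depth, best_score = 0, validation_score(0)
    for depth in range(1, max_depth + 1):
        score = validation_score(depth)
        if score - best_score < min_gain:
            break  # extra depth no longer pays off on held-out data
        best_depth, best_score = depth, score
    return best_depth, best_score

# Made-up validation accuracies per depth, for illustration only.
fake_scores = {0: 0.71, 1: 0.83, 2: 0.86, 3: 0.861, 4: 0.85, 5: 0.84}
print(select_depth(fake_scores.get))
```

Stopping on the validation score rather than the training score is what guards against the overfitting that extra layers tend to bring.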