1 Here we count the input layer as a layer, so that a network with K + 1 layers has K - 1 hidden layers, in accordance with the conventions discussed earlier in this thesis. The input layer is layer k = 0, the hidden layers are layers k = 1, ..., K - 1, and the output layer is layer k = K.