Before comparing NNs, make sure each is designed as well as possible.
For example, are the errors randomly distributed with zero average?
If not, the NN has failed to extract trend information, and more
learning, perhaps with more hidden nodes, is indicated. Plot error
vs target value. Consult a statistics handbook and apply a test to
quantify randomness (e.g., runs test).
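The runs test can be sketched in a few lines. This is a minimal normal-approximation version (the function name and interface are my own, not from a standard API):

```python
import math
import statistics

def runs_test(errors):
    """Wald-Wolfowitz runs test for randomness of a residual sequence.
    Signs are taken relative to the median; exact ties are dropped.
    Returns the normal-approximation z statistic and two-sided p-value;
    a small p-value suggests the errors are not randomly ordered."""
    med = statistics.median(errors)
    signs = [1 if e > med else -1 for e in errors if e != med]
    n_pos = signs.count(1)
    n_neg = signs.count(-1)
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    mu = 2.0 * n_pos * n_neg / (n_pos + n_neg) + 1.0
    var = (mu - 1.0) * (mu - 2.0) / (n_pos + n_neg - 1.0)
    z = (runs - mu) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p
```

Either too many runs (alternating errors) or too few (clumped errors) gives a large |z|, i.e. a small p-value.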
You indicate below that you are concerned about the possible
nonnormality of the error distribution. My reply is:
You have commanded your NNs to minimize SSE (and nothing else).
If your error distribution is random with zero mean, what
difference does it make if it's nonnormal?
Now, let's compare NNs. Aside from training time and number
of weights, common sense leads to:
1. Which NN has the lowest SSE?
2. Is the difference in SSE statistically significant?
3. A statistician can refer you to the appropriate test.
My guess is an F-test.
4. Is the difference in SSE practically significant?
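Items 1-3 can be sketched as below, taking the guessed-at F-test on the ratio of mean squared errors. This assumes independent, zero-mean, roughly normal errors; `compare_sse` is a hypothetical helper, not a standard API:

```python
import numpy as np
from scipy import stats

def compare_sse(errors1, errors2):
    """Two-sided F-test on the ratio of the two models' mean squared
    errors, evaluated on test sets of sizes n1 and n2. Assumes the
    errors are independent, zero-mean, and roughly normal."""
    e1 = np.asarray(errors1, dtype=float)
    e2 = np.asarray(errors2, dtype=float)
    n1, n2 = e1.size, e2.size
    f = (np.sum(e1 ** 2) / n1) / (np.sum(e2 ** 2) / n2)
    # two-sided p-value from the F(n1, n2) distribution
    p = 2.0 * min(stats.f.sf(f, n1, n2), stats.f.cdf(f, n1, n2))
    return f, float(min(p, 1.0))
```

A small p-value says the SSE difference is statistically significant; whether it is practically significant (item 4) is a judgment call the test cannot make for you.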
It looks like you are trying to test whether the difference in the
average errors is statistically significant, not whether the
difference in SSE is. If so:
5. Which NN has the lowest bias?
6. Is the difference in bias statistically significant?
7. A statistician can refer you to the appropriate test.
My guess is a t-test.
8. Is the difference in bias practically significant?
Note: A significant bias indicates an inadequate design.
For example, not enough hidden nodes. Reread my first paragraph.
As indicated above, for a given SSE and zero bias, nonnormality
is a red herring. Besides, even if the biases are nonzero,
the t-test is relatively insensitive to nonnormality.
Who said there is a problem?
As far as comparing the two NNs, you can perform statistical tests
on the distribution of the differences in outputs.
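One distribution-free way to test those paired differences is the sign test; this is my choice of test, not one the original specifies:

```python
import math

def sign_test(diffs):
    """Two-sided sign test on paired differences (distribution-free).
    H0: the median difference is zero. Exact ties are dropped."""
    pos = sum(1 for d in diffs if d > 0)
    neg = sum(1 for d in diffs if d < 0)
    n = pos + neg
    k = min(pos, neg)
    # exact binomial tail probability under H0 (p = 1/2)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

Because it only uses signs, it is immune to the nonnormality concern raised above, at the cost of some power relative to a paired t-test.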
I think the most important comparison test is a plot of SSE1 and
SSE2 vs noise level as zero mean Gaussian noise is added to the
inputs. Well, since the NNs are nonlinear, you might want to plot
the entire curves rather than compare at a single noise level.
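A sketch of that SSE-vs-noise comparison, assuming each NN can be treated as a plain callable forward pass (the interface below is hypothetical; substitute however your networks are invoked):

```python
import numpy as np

def sse_vs_noise(model, X, y, noise_levels, trials=30, seed=0):
    """For each noise std sigma, add zero-mean Gaussian noise to the
    inputs and average the resulting SSE over `trials` repetitions.
    `model` is any callable mapping an input array to predictions."""
    rng = np.random.default_rng(seed)
    curve = []
    for sigma in noise_levels:
        sses = []
        for _ in range(trials):
            noisy = X + rng.normal(0.0, sigma, size=X.shape)
            err = model(noisy) - y
            sses.append(float(np.sum(err ** 2)))
        curve.append(float(np.mean(sses)))
    return curve
```

Run it once per network on the same `X`, `y`, and `noise_levels`, then plot the two curves on one set of axes; the flatter curve belongs to the more noise-robust NN.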
Hope this helps.