At the very least, there is a simple solution: choose randomly. Sample your parameter values from a normal distribution with zero mean and a relatively small standard deviation. In the case of neural networks there are better choices, but even then a random initialization will work.

Computing the Hessian
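The random initialization suggested above (zero mean, small standard deviation) can be sketched as follows. This is only an illustration: the helper name `init_params`, the default `std=0.01`, and the fixed seed are assumptions made for the example, not values taken from the text.

```python
import numpy as np


def init_params(shape, std=0.01, seed=0):
    """Draw initial parameter values from a normal distribution N(0, std^2).

    `std` and `seed` are illustrative defaults (assumptions), not recommendations.
    """
    rng = np.random.default_rng(seed)
    return rng.normal(loc=0.0, scale=std, size=shape)


# Hypothetical usage: a 4x3 weight matrix with small random entries.
weights = init_params((4, 3))
print(weights.shape)
```

A fixed seed keeps the draw reproducible while debugging; for repeated runs one would normally vary it.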

where $E_\theta$ is the expected value with respect to the distribution $p_\theta$. This matrix can also be viewed as the Hessian matrix of the Shannon entropy $H(p) = -\int p(x;\theta) \log p(x;\theta)\,dx$. This paper will approach the specific model $M = \psi(\Theta)$ of multivariate normal probability density functions on $\chi = \mathbb{R}^n$, with covariance matrix $\Sigma$. This model is a

2 Second Derivatives

As we have seen, a function $f(x, y)$ of two variables has four different partial derivatives: Of course, $f_{xy}(x, y)$ and $f_{yx}(x, y)$ are always equal ...
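The equality of the two mixed partials can be checked numerically for a concrete smooth function. The sketch below is an illustration under stated assumptions: the function `f`, the evaluation point, and the step size `h` are all hypothetical choices made for the example, and nested central differences stand in for exact differentiation.

```python
import math


def f(x, y):
    # Hypothetical smooth test function (an assumption for this example).
    return x**3 * y**2 + math.sin(x * y)


def d_dx(g, h=1e-3):
    # Central-difference approximation of the partial derivative in x.
    return lambda x, y: (g(x + h, y) - g(x - h, y)) / (2 * h)


def d_dy(g, h=1e-3):
    # Central-difference approximation of the partial derivative in y.
    return lambda x, y: (g(x, y + h) - g(x, y - h)) / (2 * h)


f_xy = d_dy(d_dx(f))  # differentiate in x first, then in y
f_yx = d_dx(d_dy(f))  # differentiate in y first, then in x


def f_xy_exact(x, y):
    # Mixed partial of this particular f, worked out by hand.
    return 6 * x**2 * y + math.cos(x * y) - x * y * math.sin(x * y)


x0, y0 = 1.5, -0.7
print(f_xy(x0, y0), f_yx(x0, y0), f_xy_exact(x0, y0))
```

Both orders of differentiation should agree with each other and with the hand-derived value up to the finite-difference error, which shrinks roughly like `h**2` for a function this smooth.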