
We do not initialize weight matrices with zeros because then the symmetry between hidden units is never broken, neither during the backward pass nor in the subsequent parameter updates.
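A minimal sketch of that symmetry, assuming a tiny one-hidden-layer network with three sigmoid hidden units, a linear output, the loss 0.5 * (out - y)^2, and the four XOR points as toy data (all illustrative choices; `train` is a hypothetical helper, not from any library). Sigmoid is used so that zero-initialized hidden units output 0.5 rather than 0, which lets the weights move at all; with every weight and bias starting at zero, each hidden unit gets exactly the same gradient, so the rows of `W1` stay identical:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(W1, b1, W2, b2, data, lr=0.5, epochs=100):
    """Hypothetical helper: full-batch gradient descent for a 1-hidden-layer net
    (sigmoid hidden units, linear output, loss 0.5 * (out - y)^2)."""
    n_hidden, n_in, m = len(W1), len(W1[0]), len(data)
    for _ in range(epochs):
        gW1 = [[0.0] * n_in for _ in range(n_hidden)]
        gb1 = [0.0] * n_hidden
        gW2 = [0.0] * n_hidden
        gb2 = 0.0
        for x, y in data:
            # Forward pass: row i of W1 (plus b1[i]) defines hidden unit i.
            h = [sigmoid(sum(W1[i][j] * x[j] for j in range(n_in)) + b1[i])
                 for i in range(n_hidden)]
            out = sum(W2[i] * h[i] for i in range(n_hidden)) + b2
            # Backward pass.
            d_out = out - y
            gb2 += d_out
            for i in range(n_hidden):
                gW2[i] += d_out * h[i]
                d_z = d_out * W2[i] * h[i] * (1.0 - h[i])
                gb1[i] += d_z
                for j in range(n_in):
                    gW1[i][j] += d_z * x[j]
        # Gradient-descent update, averaged over the batch.
        W1 = [[W1[i][j] - lr * gW1[i][j] / m for j in range(n_in)] for i in range(n_hidden)]
        b1 = [b1[i] - lr * gb1[i] / m for i in range(n_hidden)]
        W2 = [W2[i] - lr * gW2[i] / m for i in range(n_hidden)]
        b2 = b2 - lr * gb2 / m
    return W1, b1, W2, b2

# Illustrative XOR toy data.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]

# All weights and biases start at zero.
W1, b1, W2, b2 = train([[0.0, 0.0] for _ in range(3)], [0.0] * 3, [0.0] * 3, 0.0, data)

# Every hidden unit received identical updates, so the rows of W1 are still equal.
print(W1)
```

The three hidden units end up computing the same function, so the layer behaves like a single unit no matter how long it trains.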

But it is safe to initialize the bias vectors with zeros; they are still updated normally during training.
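A sketch of why zero biases are harmless once the weights are random, assuming a one-hidden-layer network with three sigmoid hidden units, a linear output, the loss 0.5 * (out - y)^2, and XOR toy data (illustrative choices; `bias_gradients` is a hypothetical helper). On the very first backward pass the hidden-bias gradients are non-zero and differ from unit to unit, because the random weights already make the units different:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bias_gradients(W1, b1, W2, b2, data):
    """Hypothetical helper: mean gradient of 0.5 * (out - y)^2 w.r.t. the hidden
    biases b1 (sigmoid hidden units, linear output)."""
    n_hidden, n_in = len(W1), len(W1[0])
    gb1 = [0.0] * n_hidden
    for x, y in data:
        h = [sigmoid(sum(W1[i][j] * x[j] for j in range(n_in)) + b1[i])
             for i in range(n_hidden)]
        out = sum(W2[i] * h[i] for i in range(n_hidden)) + b2
        d_out = out - y  # dL/d(out)
        for i in range(n_hidden):
            # Chain rule through the output weight and the sigmoid derivative.
            gb1[i] += d_out * W2[i] * h[i] * (1.0 - h[i])
    return [g / len(data) for g in gb1]

random.seed(0)
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]  # random weights
W2 = [random.uniform(-1, 1) for _ in range(3)]
b1 = [0.0, 0.0, 0.0]  # biases start at zero
b2 = 0.0
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]

gb1 = bias_gradients(W1, b1, W2, b2, data)
print(gb1)
```

Each bias gradient is scaled by that unit's own outgoing weight `W2[i]` and its own activation, so the zero starting value does not tie the biases together.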

Why is it safe to do so, and not the opposite?

**Why can’t we initialize bias vectors with random numbers and weight matrices with zeros?**

My initial thought is that a vector has shape (n, 1), where $n \in \mathbb{N}$, while a matrix generally has more than one column, and thus symmetry does not really come into play in the case of vectors.

But that does not address the fact that each layer of a deep neural network has its own weight matrix, so symmetry across different layers is not a concern.

So, **when we talk about symmetry are we talking about symmetry across different rows of the same matrix?**

Column-wise symmetry should not matter much, as the columns correspond to different training examples (for the first hidden layer). **Does column-wise symmetry disturb the training process much in the case of hidden layers other than the first one?**