News

There are several algorithms for finding optimal values of model parameters (trading off training cost against model accuracy), but gradient descent, in its various flavors, is one of the most popular.
However, stochastic gradient descent (SGD) performs poorly when used to train networks on non-ideal analog hardware composed of resistive device arrays with non-symmetric conductance modulation characteristics. Recently we ...
With plain SGD it turns out, perhaps surprisingly, that L2 regularization and weight decay are mathematically equivalent (up to a rescaling of the decay coefficient by the learning rate). Because of this fact, the terms L2 regularization and weight decay ...
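
The equivalence can be checked numerically. The following is a minimal sketch, not taken from the excerpted article: it assumes a one-parameter least-squares model, and the names `grad`, `sgd_l2_step`, `sgd_weight_decay_step`, `lr`, and `lam` are illustrative choices. The L2 variant adds `lam * w` to the gradient of the loss; the weight-decay variant shrinks the weight by the factor `1 - lr * lam` before taking the ordinary gradient step.

```python
def grad(w, x, y):
    """Gradient of the squared error 0.5 * (w * x - y) ** 2 with respect to w."""
    return (w * x - y) * x


def sgd_l2_step(w, x, y, lr, lam):
    """SGD step on the loss plus the L2 penalty 0.5 * lam * w ** 2."""
    return w - lr * (grad(w, x, y) + lam * w)


def sgd_weight_decay_step(w, x, y, lr, lam):
    """SGD step on the plain loss, with the weight shrunk by (1 - lr * lam)."""
    return (1.0 - lr * lam) * w - lr * grad(w, x, y)


# Toy data and hyperparameters, chosen only for illustration.
data = [(1.0, 2.0), (2.0, 3.9), (3.0, 6.1)]
lr, lam = 0.05, 0.1

w_l2 = w_wd = 0.5  # same starting weight for both variants
for _ in range(20):
    for x, y in data:
        w_l2 = sgd_l2_step(w_l2, x, y, lr, lam)
        w_wd = sgd_weight_decay_step(w_wd, x, y, lr, lam)

# The two trajectories agree up to floating-point rounding.
print(w_l2, w_wd, abs(w_l2 - w_wd))
```

Expanding the L2 step gives `w - lr * g - lr * lam * w`, which is the weight-decay step term for term. The identity relies on the update being plain SGD; for optimizers that rescale the gradient, such as Adam, the two are generally not the same.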