NIPS day 1

December is a great month. Like most people, I of course look forward to Christmas to top it all off, but there is another great time during December: the NIPS conference, which is held in Barcelona this year. NIPS is a great conference on machine learning, and it is a great pleasure to be able to attend it. I’ll take this great excuse to revive this blog and share some of the things I’m learning here.

Day 1 action report.

Day 1 was dedicated to tutorials: long talks with the objective of introducing the audience to important topics of machine learning.

I first attended a tutorial on variational inference by David Blei, Shakir Mohamed and Rajesh Ranganath. It was really cool. They presented variational inference (i.e., minimizing the reverse KL divergence to the target distribution over some restricted class of distributions) in a way that made everything they do very clear to understand. They first gave an overview of the Bayesian approach, then showed how variational inference can be applied in almost all situations, and finally showed some cool applications mixing Bayesian inference and deep nets. I don’t think their slides are online yet, sadly 😦 However, David Blei and co-authors have a review on the subject which seems worth checking out: https://arxiv.org/abs/1601.00670
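To make the reverse-KL idea concrete, here is a minimal sketch of black-box variational inference with stochastic (reparameterization) gradients of the ELBO. The Gaussian target, the parameter names and the step sizes are my own toy choices, not taken from the tutorial.

```python
import numpy as np

# Fit q(z) = N(mu, sigma^2) to an unnormalized target
# log p(z) = -0.5 * (z - 3)^2 by stochastic ascent on the ELBO,
# E_q[log p(z)] + H(q), where H(q) = log(sigma) + const.
# Maximizing the ELBO is equivalent to minimizing KL(q || p).

rng = np.random.default_rng(0)

def dlog_p(z):
    # derivative of log p(z) = -0.5 * (z - 3)^2 with respect to z
    return -(z - 3.0)

mu, log_sigma = 0.0, 0.0     # variational parameters
lr, n_samples = 0.05, 64     # illustrative step size and batch size

for step in range(2000):
    eps = rng.standard_normal(n_samples)
    sigma = np.exp(log_sigma)
    z = mu + sigma * eps                     # reparameterized draws from q
    grad_mu = dlog_p(z).mean()               # pathwise ELBO gradient wrt mu
    grad_log_sigma = (dlog_p(z) * eps * sigma).mean() + 1.0  # +1 from entropy
    mu += lr * grad_mu
    log_sigma += lr * grad_log_sigma

print(mu, np.exp(log_sigma))   # should approach 3 and 1, the target's moments
```

Since the target here is itself Gaussian, the optimal q matches it exactly; with a restricted family (e.g. mean-field over several dimensions), the same loop finds the closest member in reverse KL.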

The second exciting thing I got to learn about concerned gradient methods. I know very little about those, so I went to listen to Francis Bach and Suvrit Sra to learn more. Their talks focused on showing the prowess of variance-reduced stochastic gradient methods such as SAG (“stochastic average gradient”) and SVRG (“stochastic variance reduced gradient”) when compared to alternatives. The key idea, in the SAG variant, is:

• we do stochastic gradient descent
• but we keep the most recent gradient computed for each data point in memory and, even though these stored gradients are “stale” (they were computed at earlier iterates, not the current one), we follow the average of all the gradients we have in memory
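The two steps above can be sketched in a few lines of numpy. This is my own toy least-squares example with made-up data and a textbook step size, not code from the tutorial.

```python
import numpy as np

# SAG sketch: keep the last gradient computed for each example in a
# table, and at every step follow the average of the whole table,
# stale entries included.

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true                     # noiseless targets, so w_true is optimal

w = np.zeros(d)
table = np.zeros((n, d))           # stored per-example gradients (may be stale)
avg = np.zeros(d)                  # running average of the table
lr = 1.0 / (16 * max(np.sum(X**2, axis=1)))  # SAG-style step size 1/(16 L)

for step in range(20000):
    i = rng.integers(n)
    g = (X[i] @ w - y[i]) * X[i]   # fresh gradient for example i only
    avg += (g - table[i]) / n      # refresh the average in O(d)
    table[i] = g
    w -= lr * avg                  # follow the average of (stale) gradients

print(np.linalg.norm(w - w_true))  # should be close to 0
```

SVRG achieves a similar variance reduction without storing the table: it periodically recomputes a full gradient at a snapshot point and corrects each stochastic gradient against it.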

This approach, surprisingly, works super well and just wipes the floor with SGD and deterministic gradient methods. I’ll be sure to read the corresponding papers in detail (some samples: https://arxiv.org/abs/1202.6258 , https://papers.nips.cc/paper/4937-accelerating-stochastic-gradient-descent-using-predictive-variance-reduction.pdf ). Slides are available at: http://www.di.ens.fr/~fbach/fbach_tutorial_vr_nips_2016.pdf and http://www.di.ens.fr/~fbach/ssra_tutorial_vr_nips_2016.pdf .

I’m sure the rest of the conference will be at least as good!