Nips workshops day 2

Here is what I learned on the second day of the nips workshops, where I went to the deep Bayesian networks workshop.

I feel like I should take a second to define precisely what the workshop is about. In a nutshell, it’s about trying to combine Bayesian methods and deep neural networks. On paper, augmenting Bayesian methods with the flexibility of neural networks seems like a great idea. However, it is a path that really emphasizes the key weakness of Bayesian methods: the fact that they are riddled with computational problems.
The initial idea that people have been using to deal with the computational problems is “variational inference” (minimizing the “reverse” Kullback-Leibler divergence KL(q,p) between an approximation q and the target p).
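
To make the “reverse” KL a bit more concrete, here is a minimal sketch (not anything shown at the workshop). It assumes a made-up 1-D target p, a mixture of two Gaussians at -4 and +4, and finds the single Gaussian q minimizing KL(q,p) by brute-force grid search. The target, the grids and all names are assumptions chosen for illustration.

```python
import numpy as np

def normal_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

# Integration grid and a hypothetical bimodal target p (illustration only).
x = np.linspace(-15.0, 15.0, 3001)
dx = x[1] - x[0]
p = 0.5 * normal_pdf(x, -4.0, 1.0) + 0.5 * normal_pdf(x, 4.0, 1.0)

def reverse_kl(mean, std):
    """Numerical approximation of KL(q, p) = integral of q * log(q / p)."""
    q = normal_pdf(x, mean, std)
    mask = q > 0  # points where q underflows to 0 contribute nothing
    return np.sum(q[mask] * (np.log(q[mask]) - np.log(p[mask]))) * dx

# Brute-force search over the Gaussian family (fine in 1-D, hopeless otherwise).
means = np.linspace(-6.0, 6.0, 121)
stds = np.linspace(0.5, 5.0, 46)
best_kl, best_mean, best_std = min(
    (reverse_kl(m, s), m, s) for m in means for s in stds
)
print(best_kl, best_mean, best_std)
```

On this toy target the best q sits on one of the two modes instead of spreading over both, which is the usual “mode-seeking” behavior of the reverse KL.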

The two highlights of the day were, first, a talk by Zoubin Ghahramani, who gave a nice history lesson on Bayesian neural networks. We do tend to get a little bit caught up in what we are doing, and it’s great to have these talks from time to time to remember the giants whose shoulders we are standing on. The second highlight was a tribute by Ryan Adams to one of those giants, David MacKay, who passed away earlier this year. I didn’t know Prof. MacKay, but this was a very moving talk and it painted a very vivid picture of him. He seemed like a great guy, and an even greater scientist. He will be missed.

An intriguing idea (which was presented several times during the whole conference) was entitled “Stein variational inference”. It consists of finding a cloud of points to approximate a target probability distribution according to an objective that is reminiscent of KL(q,p). I’m not sure how much this differs from using a sparse kernel-based approximation of the log probability of the target distribution. They also had a deep network method that was reminiscent of generative adversarial networks.
This has a lot of interesting flavor with the combination of Stein’s method, variational inference and kernel methods, so I definitely need to look at it further.
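
Here is a rough sketch of the “cloud of points” idea, based on my understanding of the Stein variational gradient descent update: each particle is pushed by a kernel-weighted average of the target’s score, plus a repulsion term that keeps the cloud from collapsing. The 1-D Gaussian target, the RBF kernel with fixed bandwidth, the step size and the particle count are all assumptions for illustration, not what was presented.

```python
import numpy as np

def score(x):
    # Gradient of log p for a hypothetical target p = N(2, 1).
    return -(x - 2.0)

def svgd_step(x, step=0.1, h=1.0):
    """One update of all particles with an RBF kernel of bandwidth h."""
    diff = x[:, None] - x[None, :]            # diff[i, j] = x_i - x_j
    k = np.exp(-diff ** 2 / (2.0 * h ** 2))   # k[i, j] = k(x_j, x_i)
    drift = k @ score(x)                      # sum_j k(x_j, x_i) * score(x_j)
    repulsion = np.sum(diff * k, axis=1) / h ** 2  # sum_j d/dx_j k(x_j, x_i)
    return x + step * (drift + repulsion) / len(x)

rng = np.random.default_rng(0)
particles = rng.normal(-5.0, 1.0, size=50)    # start far away from the target
for _ in range(1000):
    particles = svgd_step(particles)

print(particles.mean(), particles.std())
```

With a single particle the repulsion vanishes and this reduces to plain gradient ascent on log p, so the cloud degrades gracefully into a MAP estimate.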

At this point, I was pretty saturated so I just couldn’t follow anymore, but the panel discussion which closed the workshop was pretty great. I’m guessing that these panels are growing on me after all. I really didn’t like them last year at nips, nor the few that were in the cosyne workshops (I remember one at cosyne that was particularly unproductive). It really depends on the panel and the audience’s comments… but it can be great. Overall, I liked the first day of the workshops more, but I’m guessing that, quite simply, that workshop aligned a bit more with my interests than today’s one. It was still great though.


Nips workshops day 1

Here is what I learned during the first day of workshops, at the approximate Bayesian inference workshop!

The day started with a very cool idea by Surya Ganguli, who showed how he could train a deep network to learn how to reverse the flow of time! It’s actually a bit less impressive than it sounds, but still super cool. He was looking at how to model some data set in an unsupervised fashion. His idea was the following. You first design a dynamical process that will decay each data point into white noise (or whatever the appropriate equivalent of that is) and then train a deep net to reverse the flow of time: take as input a sample from white noise and return a prediction for the corresponding data point. He then gets a machine that can approximately sample from the data distribution! That was great.
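
A minimal sketch of the first half of that idea only: a dynamical process that decays data into white noise. The specific process (repeatedly shrinking each point and adding a little Gaussian noise), the step count and the toy data set are assumptions for illustration. The deep net that learns to run this backwards is the hard part and is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D "data set": two tight clusters around -5 and +5.
data = np.concatenate([rng.normal(-5.0, 0.2, 2500), rng.normal(5.0, 0.2, 2500)])

def forward_decay(x, n_steps=200, beta=0.05, rng=rng):
    """Shrink towards 0 and inject noise; the stationary law is N(0, 1)."""
    for _ in range(n_steps):
        noise = rng.normal(size=x.shape)
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise
    return x

noised = forward_decay(data)
print(data.std(), noised.mean(), noised.std())
```

After enough steps the two clusters are gone and the points look like standard white noise. Each individual step is only a small perturbation, which is presumably what makes learning the reverse step tractable.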

I then gave a talk about expectation-propagation and how it can be viewed as a variant of gradient descent, shedding quite a bit of light on this method. I need to write a blog-post on that.

Closing the morning session was a panel on the computational aspects of approximate inference methods. Pretty interesting. I really liked the fact that the panel was pretty open to questions from the audience. I like it a bit less when panels are very structured: in most cases, I don’t think this leads to very lively discussions.

During the lunch break, there was an awesome number of super cool posters (where were they during the poster sessions of the main conference!?) which I’ll come back to in a future post if I have time. People have so many super-cool ideas, and actually make them work! I love the feeling of energy you get from places like nips.

In the afternoon, we had a great statistical talk by Jeffrey Regier. He and his co-authors managed to run a model to infer the position, angle, color, etc. of all stars and galaxies in the night sky (500TB). That. Was. Cool. His model had a large number of variables, interacting in interesting ways, and was trying to model a giant image of the sky. This was way cooler than my description of it.

Closing the day was a panel on the foundations and futures of variational inference. It was pretty interesting, but it was very directed. Richard Turner had a pretty great recap on the advantages of expectation propagation (oh yeah!), and Philipp Hennig had a great introduction to probabilistic numerical methods. When probed about the combination of deep nets and Bayesian inference, Ryan Adams said that he was hyped but that he also liked probabilistic graphical models, and that he wants to combine the two so that both are playing to their strengths. The final panel made a great point about the fact that, for variational inference, we optimize the approximate posterior without considering what we do with it afterwards / what task we solve. This is a great point, though I’m not sure how to find answers to that. Indeed, if my loss function is (\theta - \theta_0)^8, I’m sure that my approximation algorithm should take this into account, but how? Puzzling…


Nips day 3

Here is what I learned during the third day of nips.

The day started with a very, very cool talk by Kyle Cranmer from CERN. He did a great job explaining how physicists performed the statistical analysis behind the discovery of the Higgs boson. His talk went into great detail but remained extremely easy to understand and was absolutely great (probably my favorite of the whole conference). He then also had a few great proposals for interesting problems that the community might be interested in modeling.

In the afternoon, there was a very entertaining talk by Marc Raibert from Boston Dynamics. They make the wonderful robots that you have probably seen on YouTube. I was a little bit sad that he didn’t give us much insight into how they actually make the robots move, but he did show a lot and had lots of entertaining videos, so he gets a shout-out.

Another cool idea in the afternoon, one that was a lot more focused on practical theory, was presented by Damien Scieur, Alexandre d’Aspremont and Francis Bach. They showed a post-processing method for optimization. Their method is a very small and cheap trick that you can plug into any optimization method to make it better. Overall, it was pretty cool, but I’m not sure I understand optimization well enough to give the best commentary on that.

Nips day 2

This is the second day of nips. Here is what I learned.

In the morning, Matthias Seeger (Amazon) presented the model that they use to forecast demand. That was very interesting. It was basically a linear regression with handcrafted features (he didn’t go into detail about how they were crafted). The regression itself was also interesting: they want to regress the number of sales in the day on predictors; they did so by having two logistic regressions for determining whether you would get 1 sale or more, then 2 sales or more, and then a Poisson likelihood for all sales higher than that. This makes the system able to model the fact that the first few values are much more frequent than they should be in a Poisson model, while modeling the tail behavior in a simple way.
Another interesting bit: they have periods where products aren’t in stock, so they can’t be sold. If they ignore these periods, their model goes wrong (it under-estimates the desirability of the product). If they model these correctly (there are no sales because the product is out of stock), then their model is way better. As always, the generative model should match the physical realities of the data.
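
Here is a minimal sketch of how I picture that likelihood; it is a guess at the structure, not Amazon’s actual model. Two logistic “gates” decide 1-or-more and 2-or-more sales, and a Poisson handles the count beyond 2 (shifting the Poisson by 2 is my assumption). Days where the product is out of stock are simply left out of the likelihood. In the real model the gates and the rate would depend on the features; here they are plain numbers, and all names are made up.

```python
import numpy as np
from math import lgamma

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_pmf(y, z1, z2, rate):
    """Log-probability of y sales under the two-gate + Poisson model."""
    y = np.asarray(y)
    p1 = sigmoid(z1)            # P(y >= 1)
    p2 = sigmoid(z2)            # P(y >= 2 | y >= 1)
    k = np.maximum(y - 2, 0)    # count beyond 2, modeled as Poisson(rate)
    log_fact = np.array([lgamma(v + 1.0) for v in k.ravel()]).reshape(k.shape)
    log_tail = np.log(p1) + np.log(p2) - rate + k * np.log(rate) - log_fact
    log_one = np.log(p1) + np.log1p(-p2)
    return np.where(y == 0, np.log1p(-p1), np.where(y == 1, log_one, log_tail))

def log_likelihood(sales, in_stock, z1, z2, rate):
    """Sum the log-probabilities over in-stock days only."""
    sales = np.asarray(sales)
    in_stock = np.asarray(in_stock, dtype=bool)
    return np.sum(log_pmf(sales[in_stock], z1, z2, rate))

sales = [0, 3, 0, 1, 0]
in_stock = [1, 1, 0, 1, 0]      # the product was unavailable on days 3 and 5
print(log_likelihood(sales, in_stock, z1=0.3, z2=-0.5, rate=2.5))
```

The zeros on out-of-stock days never enter the sum, so they cannot drag the estimated demand down.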

In the afternoon, there was also a great talk by Saket Navlakha from the Salk Institute. I’m normally not a great fan of biomimetic approaches (trying to use biology to improve machine learning techniques, or, more generally, engineering technology; in most cases, the constraints we deal with as engineers are extremely different from the constraints that biology is dealing with) but he had two great examples of exactly that. The first one consisted of creating a communication network by removing edges from a fully-connected (or just very dense) network.

The final great thing was a very particular mixture of Bayesian methods and deep learning by Marco Fraccaro, Søren Kaae Sønderby, Ulrich Paquet and Ole Winther. They wanted to model sequential data (recordings of speech). A good idea for a model would be to use a latent Markov chain, conditional on which the sound at each time step is independent (a hidden Markov model). Normally, people would use a very simple latent model: a Kalman filter. This doesn’t work well at all, as its dynamics are too simple compared to the actual dynamics of the signal being modelled. What Marco and co-authors propose instead is to use a deep network to represent the dynamics of the latent space oO. They are then able to perform inference on the weights of their deep network !!?! and finally they learn an inference network that takes a soundwave as input and returns a variational approximation of the posterior on the latent space !!!?!??!!?! This sounds insane, but it actually works and works very well.
I love these extremely creative combinations of deep learning with Bayesian inference: there was a lot of it at nips this year and I’m very excited to see where it goes in the future.



Nips day 1

December is a great month. Like most people, I of course look forward to Christmas to top it all, but there is another great time during December: the nips conference, which is held in Barcelona this year. Nips is a great conference on machine learning, and it is a great pleasure to be able to attend it. I’ll take this great excuse to revive this blog and share some of the things I’m learning here.


Day 1 action report.

Day 1 was dedicated to tutorials: long talks with the objective of introducing the audience to important topics of machine learning.

I first attended a tutorial on variational inference by David Blei, Shakir Mohamed and Rajesh Ranganath. It was really cool. They presented variational inference (i.e., minimizing the reverse-KL divergence to the target distribution inside some restricted class of distributions) in a way that made everything they do very clear. They first presented an overview of the Bayesian approach, then showed how variational inference can be used in almost all situations, and finally showed some cool applications mixing Bayesian inference and deep nets. I don’t think their slides are online yet, sadly 😦 However, David Blei and co-authors have a review on the subject which seems worth checking out:

The second exciting thing I got to learn about concerned gradient methods. I know very little about those, so I went to listen to Francis Bach and Suvrit Sra to learn more. Their talks were focused on showing the prowess of SVRGD (“stochastic variance reduction gradient descent”, if I’m not mistaken) when compared to alternatives. The key idea of SVRGD is:

  • we do stochastic gradient descent
  • but we keep the gradients we have already computed in memory and, even though they are “stale” (they do not correspond to the point we are currently at), we follow the sum of all the gradients we have in memory
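
A minimal sketch of the two bullets above on a made-up least-squares problem. A caveat: storing one stale gradient per data point and following their average is, as far as I can tell, what the literature calls SAG (stochastic average gradient), and variance-reduction methods differ in the details, so treat this as an illustration of the bullets and not the exact algorithm from the tutorial. The problem, step size and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5
A = rng.normal(size=(n, d))          # hypothetical design matrix
w_true = rng.normal(size=d)
b = A @ w_true                       # noiseless targets, so w_true is optimal

def stale_gradient_descent(A, b, n_epochs=200, seed=1):
    n, d = A.shape
    rng = np.random.default_rng(seed)
    w = np.zeros(d)
    memory = np.zeros((n, d))        # last gradient computed for each example
    grad_sum = np.zeros(d)           # running sum of everything in memory
    lipschitz = np.max(np.sum(A ** 2, axis=1))
    step = 1.0 / (16.0 * lipschitz)  # conservative step size
    for _ in range(n_epochs * n):
        i = rng.integers(n)                  # pick one example, as in SGD
        g = (A[i] @ w - b[i]) * A[i]         # fresh gradient for example i
        grad_sum += g - memory[i]            # swap out its stale gradient
        memory[i] = g
        w -= step * grad_sum / n             # follow the (mostly stale) average
    return w

w_hat = stale_gradient_descent(A, b)
print(np.linalg.norm(w_hat - w_true))
```

Each iteration costs one gradient evaluation, like SGD, but the direction followed uses information from every data point, which is where the speed-up comes from. The price is the n-by-d memory table.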

This approach, surprisingly, works super well and just wipes the floor with SGD and deterministic gradient methods. I’ll be sure to read the corresponding papers in detail (some samples: , ). Slides are available at: and .


I’m sure the rest of the conference will be at least as good!