Nips workshops day 1

Here is what I learned during the first day of workshops: at the approximate bayesian inference workshop !

The day started by a very cool idea by Surya Ganguli, which showed how he could train a deep network to learn how to reverse the flow of time! It’s actually a bit less impressive than it sounds, but stil super cool. He was looking at how to model some data set in an unsupervised fashion. His idea was the following. You first design a dynamical process that will decay each data point into white noise (or whatever the appropriate equivalent of that is) and then training a deep-net to reverse the flow of time: take as input a sample from white-noise and return a prediction for the corresponding data point. He then gets machine that can approximately sample from the data distribution ! that was great.

I then gave a talk about expectation-propagation and how it can be viewed as a variant of gradient descent, shedding quite a bit of light on this method. I need to write a blog-post on that.

Closing the morning session was a pannel on the computational aspects of approximate inference methods. Pretty interesting. I really liked the fact that the panel was pretty open to questions from the audience. I like it a bit less when panels are very structured: in most cases, I don’t think this leads to very lively discussions.

During the lunch break, there was an awesome number of super cool posters (where were they during the poster sesssions of the main conference!?) which I’ll come back to in a future post if I have time.  People have so many super-cool ideas, and actually make them work! I love the energy-feeling you get from places like nips.

In the afternoon, we had a great statistical talk by Jeffrey Regier. They managed to run a model to infer the position, angle, color, etc of all stars and galaxies in the night sky (500TB). That. Was. Cool. His model had a large number of variables, interacting in interesting ways, and was trying to model a giant image of the sky. This was way cooler than my description of it.

Closing the day was a panel on the foundations and futures of variational inference. It was pretty interesting, but it was very directed. Richard Turner had a pretty great recap on the advantages of expectation propagation (oh yeah!), Philipp Hennig had a great introduction to probabilistic numerical methods. When probed about the combination of deep nets and bayesian inference, Ryan Adams said that he was hyped but that he also liked probabilistic graphical models and that he wants to combine so that both are playing to their strengths. The final panel made a great point about the fact that, for variational inference, we optimize the posterior without considering what we do afterwards with the approximate posterior / what task we solve. This is a great point, though I’m not sure how to find answers to that. Indeed, if my loss function is (\theta - \theta_0)^8 , I’m sure that my approximation algorithm should take this into account, but how? Puzzling …