Nips day 2

This is the second day of nips. Here is what I learned.

In the morning, Matthias Seeger (Amazon) presented the model that they use to forecast demand. That was very interesting. That was basically a linear regression with handcrafted features (he didn’t go in detail there about how they were crafted). The regression itself was also interesting: they want to regress the number of sales in the day to predictors; they did so by having two logistic regression for determining whether you would get 1 sale or more, 2 sales or more, and then a Poisson likelihood for all sales higher than that. This makes the system able to model the fact that the first few values are much higher than they should be in a Poisson model, while modeling the tail behavior in a simple way.
Another interesting bit: they have periods where products aren’t in stock, so they can’t be sold. If they ignore these periods, their model goes wrong (it under-estimates the desirability of the product). If they model these correctly: they are no sales because the product is out-of-stock, then their model is way better. As always, the generative model should match the physical realities of the data.

In the afternoon, there was also a great talk by Saket Navlakha from the Salk institute. I’m normally not a great fan of bio-mimetics approach (trying to use biology to improve machine learning techniques, or, more generally, engineering technology; in most cases, the constraints we deal with as engineers are extremely different from the constraints that biology is dealing with) but he had two great examples of exactly that. The first one consisted of creating a communication network by removing edges from a fully-connected (or just very dense) network.

Final great thing was a very particular mixture of Bayesian methods and deep learning by Marco Fraccaro, Søren Kaae Sønderby, Ulrich Paquet and Ole Winther. They wanted to model sequential data (recordings of speech). A good idea for a model would use a latent markov-chain, conditional on which the sound at each time step is independent (a hidden-markov-chain model). Normally, people would use a very simple latent model: a Kalman filter. This doesn’t work well at all as it as too simple dynamics compared to the actual dynamics of the signal being modelled. What Marco and co-authors propose to use instetad is to use a deep-network to represent the dynamics of the latent space oO. They then are able to perform inference on the weights of their deep network !!?! and finally they learn to use an inference network to take a soundwave as an input and return a variational inference approximation of the posterior on the latent space !!!?!??!!?! This sounds insane, but it actually works and works very well.
I love these extremely creative combinations of deep-learning with bayesian inference: there was a lot of it at nips this year and I’m very excited to see where it goes in the future.