Program

Session 1: 9:00 – 10:20

9:00 - 9:20 : Workshop Introduction [Slides]
9:20 - 10:20 : Tony Jebara, Quadratic Majorization for Learning Probabilistic Models [Slides]
Abstract
The partition function plays a key role in probabilistic modeling, including in maximum entropy, minimum relative entropy, conditional random fields, graphical models, and maximum likelihood estimation. To optimize partition functions, we introduce a quadratic variational upper bound. This inequality facilitates majorization methods: optimization of complicated functions through the iterative solution of simpler sub-problems. Such bounds remain efficient to compute even when the partition function involves a graphical model (with small tree-width) or in latent likelihood settings. For large-scale problems, low-rank versions of the bound are provided that outperform L-BFGS as well as first-order methods. Several learning applications are shown, each reducing to fast and convergent update rules. Experimental results show advantages over state-of-the-art optimization methods. We also propose a stochastic version of bound majorization which competes well against stochastic gradient descent (across all of its variations and tunings). It converges in fewer iterations, reduces computation time, and finds better parameter estimates. The proposed method bridges first- and second-order stochastic optimization methods by maintaining linear computational complexity (with respect to dimensionality) while exploiting second-order information about the pseudo-global curvature of the objective function.
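
As a rough illustration of the quadratic bound the abstract describes, the sketch below reconstructs the incremental bound-computation recipe from the speaker's published bound-majorization work for a small discrete exponential family. The function names, array layout, and numerical safeguards are our own assumptions; this is a minimal sketch, not the authors' implementation.

```python
import numpy as np

def quadratic_bound(theta, feats, h):
    """Quadratic upper bound on the partition function
    Z(theta') = sum_y h(y) * exp(theta' . f(y)), tight at the current theta:
        Z(theta') <= z * exp((theta' - theta) . mu
                             + 0.5 * (theta' - theta) . Sigma (theta' - theta))
    feats: (n_states, d) array of feature vectors f(y)
    h:     (n_states,) array of non-negative base measures h(y)"""
    d = feats.shape[1]
    z, mu, Sigma = 0.0, np.zeros(d), np.zeros((d, d))
    for fy, hy in zip(feats, h):
        alpha = hy * np.exp(fy @ theta)
        if z == 0.0:                 # first state: the bound starts exact
            z, mu = alpha, fy.astype(float).copy()
            continue
        l = fy - mu
        xi = np.log(alpha / z)
        # tanh(xi/2) / (2 xi) tends to 1/4 as xi -> 0
        c = 0.25 if abs(xi) < 1e-8 else np.tanh(0.5 * xi) / (2.0 * xi)
        Sigma += c * np.outer(l, l)  # curvature update uses the old mu, z
        mu += (alpha / (z + alpha)) * l
        z += alpha
    return z, mu, Sigma             # mu equals E[f] and z equals Z(theta)

def majorization_step(theta, feats, h, f_data):
    """One bound-majorization update for maximum likelihood: minimize the
    quadratic upper bound on -theta . f_data + log Z(theta) in closed form."""
    _, mu, Sigma = quadratic_bound(theta, feats, h)
    ridge = 1e-10 * np.eye(len(theta))  # guard against a singular Sigma
    return theta + np.linalg.solve(Sigma + ridge, f_data - mu)
```

Because each step minimizes an upper bound that touches the objective at the current iterate, repeating `majorization_step` decreases the negative log-likelihood monotonically, which is the convergent-update behaviour the abstract refers to.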

Coffee Break: 10:20 - 10:40

Session 2: 10:40 – 12:00

10:40 - 11:40 : Shiliang Sun, Multi-view Machine Learning with Maximum Entropy Discrimination [Slides]
Abstract
In this talk, I will first give a very brief review of existing multi-view learning methods. Then, after revisiting the multi-view maximum entropy discrimination (MED) framework introduced last year, I will present our recent research on an alternative multi-view MED, which differs substantially in methodology from the previous one. Finally, I will discuss some interesting directions for future work.
11:40 - 12:00 : A. Storkey, Z. Zhu, J. Hu. A Continuum from Mixtures to Products: Aggregation under Bias.

Lunch: 12:00 - 14:00

Session 3: 14:00 – 15:20

14:00 - 15:00 : Mark Reid, Entropic Duality in Probability and Learning [Slides]
Abstract
This talk will be a tour of some recent results that my collaborators and I have developed in the context of fast learning with expert advice and updating generalised exponential families. Underpinning all of these results are properties of what we call “entropic duals” – the convex conjugates of functions defined over probability distributions – and their connections with mirror updating.
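
To make the central object concrete: for negative Shannon entropy restricted to the probability simplex, the entropic dual (convex conjugate) is the log-sum-exp function, whose gradient is the softmax map; mirror updating with this regularizer recovers the familiar multiplicative-weights update from the expert-advice setting. The sketch below illustrates this standard correspondence; it is our own minimal example, not code from the talk.

```python
import numpy as np

def log_sum_exp(theta):
    """Entropic dual: the convex conjugate of negative Shannon entropy
    (restricted to the probability simplex), evaluated stably at theta."""
    m = theta.max()
    return m + np.log(np.exp(theta - m).sum())

def hedge_update(weights, losses, eta):
    """One mirror-descent step with the entropic regularizer: map the
    current distribution to the dual (log) space, take a gradient step on
    the losses, and map back through softmax, the gradient of log_sum_exp.
    This is exactly the multiplicative-weights / Hedge update."""
    theta = np.log(weights) - eta * losses
    return np.exp(theta - log_sum_exp(theta))

# e.g. three experts, uniform prior, one round of losses
p = hedge_update(np.ones(3) / 3, np.array([1.0, 0.0, 0.5]), eta=0.5)
```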
15:00 - 15:20 : S. Joshi, O. Koyejo, J. Ghosh. Constrained Inference for Multi-View Clustering.

Coffee Break: 15:20 - 15:40

Session 4: 15:40 – 17:00

15:40 - 16:00 : B. Chen, N. Chen, Y. Ren, B. Zhang. Nonparametric Latent Feature Relational Models with Data Augmentation.
16:00 - 16:20 : M. Park, M. Nassar. Variational Bayesian Inference for Forecasting Hierarchical Time Series.
16:20 - 16:40 : M. Basbug. Variational Inference for Nonparametric Switching Linear Dynamical Systems.
16:40 - 17:00 : Wrap-up / Discussion