Slides

Proceedings

  1. M. Basbug. Variational Inference for Nonparametric Switching Linear Dynamical Systems.
     Switching Linear Dynamical System (SLDS) models are of great interest to many fields, including econometrics, brain-computer interfaces, computer vision and neuroscience. A nonparametric Bayesian treatment of SLDS allows the number of temporal modes to grow as needed. Mode persistence, a much-desired feature, can be provided with a sticky implementation. In this paper, we propose a variational inference algorithm for the resulting model and compare it to Gibbs sampling. We also extend the original model to the multiple-observation case, where each observation has its own Markov process but shares the same temporal mode sequence. We apply the multi-observation model to explore the time course of a self-initiated movement using MEG data. (A generic SLDS generative form is sketched below.)

  2. B. Chen, N. Chen, Y. Ren, B. Zhang. Nonparametric Latent Feature Relational Models with Data Augmentation.
     We present nonparametric latent feature relational models for link prediction. Instead of performing standard Bayesian inference, we perform regularized Bayesian inference with a regularization parameter to deal with the imbalance issue in real networks. By introducing data augmentation, we design a proper pseudo-likelihood with auxiliary variables, which lets us develop simple Gibbs sampling algorithms without restrictive assumptions. Experimental results on real datasets demonstrate that our methods improve performance and that the latent dimension can be determined automatically. (The underlying latent feature link model is sketched below.)

  3. S. Joshi, O. Koyejo, J. Ghosh. Constrained Inference for Multi-View Clustering. [Download]
     We propose a novel approach for probabilistic multi-view clustering that combines view-specific models to improve global coherence. Global incoherence is measured by the difference between view-specific cluster assignment responsibilities. New cluster responsibilities are estimated by optimizing a cost function that maximizes per-view accuracy subject to a user-specified global coherence threshold. When combined with a parameter estimation step, this modified inference encourages the estimation of model parameters that agree across views. We show that the modified inference remains convex when the global coherence constraints are given by the norm of the difference between the responsibilities of each model. In addition, the global correction is embarrassingly parallel across examples. The proposed approach is evaluated on a synthetic dataset as well as real data, showing improved performance compared to strong baseline methods for multi-view clustering. (The constrained responsibility update is sketched below.)

  4. M. Park, M. Nassar. Variational Bayesian inference for forecasting hierarchical time series. [Download]
     In many real-world datasets, time series are hierarchically organized. Based on features such as products or geography, time series can be aggregated and disaggregated at several different levels. The so-called 'hierarchical time series' are often forecast using simple top-down or bottom-up approaches. In this paper, we build a probabilistic model that involves dynamically evolving latent variables to capture the proportion changes in the time series at each level of the hierarchy. We derive the variational Bayesian expectation maximisation (VBEM) algorithm under the new model. In our algorithm, we implement posterior inference in a sequential manner, which significantly decreases the computational overhead common in large hierarchical time series data. Furthermore, unlike the standard EM algorithm, which provides point estimates of the model parameters, our algorithm yields a distribution over the model parameters, giving insight into which subset of features drives the proportion changes of the time series. Simulation results show that our method significantly outperforms other methods in prediction. (The hierarchical aggregation setup is sketched below.)

  5. A. Storkey, Z. Zhu, J. Hu. A Continuum from Mixtures to Products: Aggregation under Bias. [Download]
     Machine learning models rely heavily on two compositional methods: mixtures and products. Probabilistic aggregation also commonly uses forms of linear opinion pools (which are effectively mixtures) or log opinion pools (which are effectively products). In this paper, we introduce a complete spectrum of compositional methods, Rényi mixtures, that interpolate between mixture models and product models, and hence between linear opinion pools and log opinion pools. We show that these compositional methods are maximum entropy distributions for aggregating information from agents subject to individual biases, with the Rényi divergence parameter dependent on the bias. We also demonstrate practically that Rényi mixtures can provide better performance than log and linear opinion pools, with log opinion pools being optimal in the limit where all agents are unbiased and see the same data. We infer that log opinion pools are the most appropriate aggregator for machine learning competitions. We designed, ran and analysed a machine learning Kaggle competition, the results of which confirmed this expectation. Finally, we relate Rényi mixtures to recent work on machine learning markets, showing that Rényi aggregators are directly implemented by machine learning markets with isoelastic utilities, and so can result from autonomous, self-interested decision making by individuals contributing different predictors. (The two opinion-pool endpoints are sketched below.)

The authors chose not to post their camera-ready paper.
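
Model sketches

For paper 1 (Basbug), a minimal sketch of a switching linear dynamical system in a generic parameterization; the symbols z_t, x_t, y_t, A_k, C, Q_k, R are illustrative and not taken from the paper. A discrete mode sequence z_t follows a Markov chain and selects the linear dynamics of a continuous latent state x_t, which is observed through y_t:

    z_t | z_{t-1} ~ \pi_{z_{t-1}}                          (mode sequence)
    x_t = A_{z_t} x_{t-1} + v_t,   v_t ~ N(0, Q_{z_t})     (latent linear dynamics)
    y_t = C x_t + w_t,             w_t ~ N(0, R)           (observations)

In the nonparametric, sticky variant the transition distributions \pi_k are typically given a sticky HDP prior, so the number of modes is unbounded and self-transitions receive extra probability mass, which yields mode persistence.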
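For paper 2 (Chen et al.), a minimal sketch of the standard nonparametric latent feature relational model the title refers to; the symbols Z, W, y_{ij}, \sigma are illustrative. Each node i has a binary latent feature vector Z_i drawn from an Indian Buffet Process, and a link y_{ij} is generated with probability

    P(y_{ij} = 1 | Z, W) = \sigma(Z_i W Z_j^T)

where W is a real-valued weight matrix and \sigma is a link function such as the logistic sigmoid. The paper replaces standard Bayesian inference over (Z, W) with regularized Bayesian inference and uses data augmentation to obtain simple Gibbs updates.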
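For paper 3 (Joshi et al.), a sketch of the kind of constrained responsibility update described in the abstract, for two views and a single example; the symbols r^{(v)}, q^{(v)}, \epsilon and the particular cost are illustrative rather than the paper's exact objective. Given view-specific responsibilities r^{(1)}, r^{(2)} over K clusters, corrected responsibilities are obtained by solving

    min_{q^{(1)}, q^{(2)} in the probability simplex}   \sum_v D(q^{(v)}, r^{(v)})
    subject to   || q^{(1)} - q^{(2)} || <= \epsilon

where D measures the loss in per-view accuracy and \epsilon is the user-specified global coherence threshold. With a norm constraint of this form the problem is convex, and it decouples across examples, which is why the global correction is embarrassingly parallel.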
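For paper 4 (Park and Nassar), a sketch of the hierarchical setting; the symbols y_{tot,t}, y_{k,t}, p_{k,t} are illustrative. Series at the bottom level aggregate to their parent,

    y_{tot,t} = \sum_k y_{k,t},     with proportions   p_{k,t} = y_{k,t} / y_{tot,t}

A bottom-up forecast predicts each child series and sums the forecasts; a top-down forecast predicts the total and splits it using fixed proportions. The model in the paper instead lets the proportions evolve over time through latent variables, so the split of a parent series across its children can change dynamically.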
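For paper 5 (Storkey et al.), the two endpoints of the spectrum in standard notation, with weights w_i >= 0 summing to one and component distributions p_i. The linear opinion pool is the mixture

    p(x) = \sum_i w_i p_i(x)

and the log opinion pool is the normalized product

    p(x) \propto \prod_i p_i(x)^{w_i}

Rényi mixtures, as described in the abstract, interpolate between these two aggregators, with the interpolation controlled by a Rényi divergence parameter that reflects agent bias.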
