Bayesian non-parametric modeling using Dirichlet processes

Posted on 03/12/2013 by ifog

A Bayesian nonparametric model is a Bayesian model on an infinite-dimensional
parameter space. The parameter space is typically chosen as the set of all possible
solutions for a given learning problem. For example, in a regression problem
the parameter space can be the set of continuous functions, and in a density estimation
problem the space can consist of all densities. A Bayesian nonparametric
model uses only a finite subset of the available parameter dimensions to explain
a finite sample of observations, with the set of dimensions chosen depending on
the sample, such that the e ffective complexity of the model (as measured by the
number of dimensions used) adapts to the data. Classical adaptive problems,
such as nonparametric estimation and model selection, can thus be formulated
as Bayesian inference problems. Popular examples of Bayesian nonparametric
models include Gaussian process regression, in which the correlation structure
is refined with growing sample size, and Dirichlet process mixture models for
clustering, which adapt the number of clusters to the complexity of the data.
Bayesian nonparametric models have recently been applied to a variety of machine
learning problems, including regression, classification, clustering, latent
variable modeling, sequential modeling, image segmentation, source separation
and grammar induction.

In this reading group we will be going over Michael I. Jordan’s tutorial presentation from NIPS focusing on Dirichlet processes.
The original post script file can found in the following link:
http://www.cs.berkeley.edu/~jordan/nips-tutorial05.ps
The presentation follows the same lines as the paper found in:
http://www.cs.berkeley.edu/~jordan/papers/hdp.pdf

Matrix completion and compressed sensing

Posted on 17/11/2013 by ifog

Addition of an application of these works (thanks to Noam):
Candes, Emmanuel J., et al. “Phase retrieval via matrix completion.” SIAM Journal on Imaging Sciences 6.1 (2013): 199-225.
http://arxiv.org/pdf/1109.0573.pdf

Also, the presentation Aditya surveyed is attached here: Wainwright_Winedale_060312. Continue reading →

Compilation of topics since May

Posted on 16/10/2013 by ifog

Sent September 9:

This week we will go over the paper:

Lin Xiao, Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization, in JMLR 2010

(NIPS version http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS2009_1048.pdf)

It talks about primal-dual averaging, one of the methods mentioned in the review we covered which we didn’t quite work out.

Sent August 26:

Reminder – As we discussed last week, we will talk about the second part of the optimization lecture on Thursday.

The slides: http://pages.cs.wisc.edu/~swright/nips2010/sjw-nips10.pdf

The video: http://videolectures.net/nips2010_wright_oaml/

Sent August 13:

The paper for this week is “Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization”.

Sent August 5:

I want to dedicate a few sessions to the area of optimization in ML. The idea is to cover new results but also try to make a “map” of the area, and make the connections between the fields.

To bring us all up to some level, this week, instead of reading a paper on a specific algorithmic/theoretical result, I thought we should read a review on the subject. I couldn’t find a good one written but found a nice nips tutorial on the subject (so you don’t even have to read:)).

The slides: http://pages.cs.wisc.edu/~swright/nips2010/sjw-nips10.pdf

The video: http://videolectures.net/nips2010_wright_oaml/

Sent July 28:

This week we will read the paper “Sparse coding for multitask and transfer learning”

from ICML 2013.

Sent July 14:

This week (Thursday @14:30) we will continue with Gaussian Processes. The subject will be the paper from ICML2011: http://www.icml-2011.org/papers/323_icmlpaper.pdf, which applies GPs to Reinforcement Learning.

Also, here a motivational video (learning this task previously demanded hundreds of trials, this algorithm does it in 7):

http://www.youtube.com/watch?v=XiigTGKZfks

Sent July 3:

It was suggested that we do a couple of sessions on Gaussian Processes. For next week, please read Chapters 2 and 5 of the book Gaussian Processes for Machine Learning, available at http://www.gaussianprocess.org/gpml/.

Sent June 16:

This week we will go over the paper:

“A Provably Efficient Algorithm for Training Deep Networks” http://arxiv.org/abs/1304.7045

Sent June 9:

Odalric will lead the discussion on the paper –

“Follow the Leader If You Can, Hedge If You Must”
http://arxiv.org/pdf/1301.0534v2.pdf, by Steven de Rooij, Tim van Erven, Peter D. Grünwald, Wouter M. Koolen.
This paper considers the online learning setting and tries to find a way to optimally
tune the Hedge algorithm so as to get an (really) adaptive algorithm.

A quick reference to the Hedge Algorithm: http://onlineprediction.net/n=Main.HedgeAlgorithm

Some motivation why this setting is useful, can be found in http://hal.archives-ouvertes.fr/docs/00/71/51/77/PDF/Devaine-Goude-Stoltz-Gaillard.pdf).

Sent April 29:

This week we’ll be reading the paper “Sparse inverse covariance estimation with the graphical lasso” by Friedman et al. (http://www-stat.stanford.edu/~tibs/ftp/graph.pdf). The paper discusses the problem of estimating sparse graphs by a lasso penalty applied to the inverse covariance matrix. The connection to graphs- conditional independence may be deduced when there is a zero in the inverse covariance matrix; for a reminder on the subject see the attached tutorial.