This week will be about causality. Please read:

– An article, “causal feature selection”, where we are interested in the causal part: Sections 1 to 2.2 inclusive, plus Sections 5 and 7 (add Section 8 if you want a lengthy list of applications).

– Part of a tutorial presentation on causality. I removed unnecessary slides so that you do not have to print the whole thing if you don’t want to. The video and the full slides are available on video lecture, but they are not necessary for the reading group. That said, the guy who gave the tutorial is really good.

Probabilistic Graphical models – Intro

Last week we discussed an introduction to Probabilistic Graphical Models.
It is a good, general introduction that covers the basics and tries to provide intuition rather than perfect mathematical formalism. It is slightly outdated, but it makes many links with other areas of machine learning.

Dimension Reduction

In the last few weeks we covered a few interesting subjects in dimensionality reduction. I would like to give a recap of the papers and give some pointers to further reading.

In the first meeting we covered a comparative review, mostly of nonlinear methods for dimension reduction (manifold learning).

Next, we dived into the details of one common manifold learning method – Diffusion Maps. Papers: The original paper by Coifman (link) and an extension paper.

Last, we talked about Random Projections. We looked at several different sources:

  • For some general theory see link. It gives the basic theorem and proofs, and an application to fast computation of truncated PCA (which is highly relevant to the global methods of dimensionality reduction, which use it to place points given a distance matrix).
  • For some practice see: “Random projection in dimensionality reduction: Applications to image and text data”. The paper shows how distances between images are better preserved by random projections than by PCA (note that PCA isn’t exactly supposed to preserve distances). In particular, it gives a feel for how these methods sometimes work at much more reasonable dimensions than the basic theory predicts.
  • A little more of both: “Experiments with Random Projection” by Dasgupta (Sections 3.2, 4.3). It shows a nice extra lemma about the eccentricity of Gaussians under random projections, and an application to simple classification of the MNIST handwritten digits dataset.
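
To get a feel for the distance-preservation claim in the items above, here is a minimal sketch in plain Python. It assumes a dense Gaussian projection matrix with i.i.d. N(0, 1/k) entries, which is one standard construction and not necessarily the one used in the sources above. The sizes `d`, `k`, `n`, the synthetic Gaussian data, and all function names are illustrative choices, not taken from the papers.

```python
import math
import random

def gaussian_projection_matrix(d, k, rng):
    """Return a k x d matrix with i.i.d. N(0, 1/k) entries (assumed construction)."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]

def project(matrix, x):
    """Map a d-dimensional point x to k dimensions."""
    return [sum(r * v for r, v in zip(row, x)) for row in matrix]

def distance(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def distortion_ratios(points, matrix):
    """Projected distance divided by original distance, for every pair of points."""
    projected = [project(matrix, p) for p in points]
    ratios = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            ratios.append(distance(projected[i], projected[j])
                          / distance(points[i], points[j]))
    return ratios

rng = random.Random(0)          # fixed seed so the run is repeatable
d, k, n = 500, 100, 15          # original dimension, target dimension, number of points
points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
R = gaussian_projection_matrix(d, k, rng)
ratios = distortion_ratios(points, R)
print(f"min ratio: {min(ratios):.3f}, max ratio: {max(ratios):.3f}")
```

Ratios close to 1 mean the pairwise distances survived the projection. Rerunning with a smaller `k` shows how quickly the spread of the ratios grows, which is the practical side of the trade-off discussed in the papers.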

For some more reading on random projections, suggested by Odalric:

Random projections in ML:

  • Linear regression with Random Projections:
  • Random projection trees: Sanjoy Dasgupta and Yoav Freund. Random projection trees and low dimensional manifolds. In Proceedings Of The 40th Annual ACM Symposium On Theory Of Computing, STOC ’08, pages 537–546, New York, NY, USA, 2008. ACM. (+ Dasgupta’s webpage)
  • Spectral clustering: Bin Zhao and Changshui Zhang. Compressed spectral clustering. In Proceedings Of The 2009 IEEE International Conference On Data Mining Workshops, ICDMW ’09, pages 344–349, Washington, DC, USA, 2009. IEEE Computer Society.

For applications in signal processing:

  • Sparse recovery with Brownian sensing:
  • Compressive sensing:

More topics on random matrices, some examples:

  • Random projection with sparse matrices: Dimitris Achlioptas. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal Of Computer And System Sciences, 66(4):671–687, June 2003.
  • The RIP property: R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin. A simple proof of the restricted isometry property for random matrices. Constructive Approximation, 28(3):253–263, 2008.
  • Compressed Sensing and high-dimensional geometry:
  • The notion of incoherence: E. Candes and J. Romberg. Sparsity and incoherence in compressive sampling. Inverse Problems, 23:969–985, 2007.
  • Singular values:
  • Fast Random projections (e.g.):
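
As a small illustration of the sparse-matrix item in the list above, here is a hedged sketch of the "database-friendly" construction from the Achlioptas paper: entries are sqrt(3/k) times +1, 0 or -1, drawn with probabilities 1/6, 2/3 and 1/6, so each entry has variance 1/k. The sizes `d` and `k`, the uniform test vectors, and the function names are illustrative assumptions, not taken from the paper.

```python
import math
import random

def achlioptas_matrix(d, k, rng):
    """Return a k x d matrix with entries sqrt(3/k) * {+1, 0, -1},
    drawn with probabilities 1/6, 2/3, 1/6."""
    scale = math.sqrt(3.0 / k)
    return [rng.choices((scale, 0.0, -scale), weights=(1, 4, 1), k=d)
            for _ in range(k)]

def project(matrix, x):
    # Zero entries contribute nothing, so they are skipped.
    return [sum(r * v for r, v in zip(row, x) if r != 0.0) for row in matrix]

def distance(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

rng = random.Random(1)          # fixed seed so the run is repeatable
d, k = 400, 120                 # original and target dimensions (illustrative)
x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
y = [rng.uniform(-1.0, 1.0) for _ in range(d)]
R = achlioptas_matrix(d, k, rng)

ratio = distance(project(R, x), project(R, y)) / distance(x, y)
zero_fraction = sum(row.count(0.0) for row in R) / (d * k)
print(f"distance ratio: {ratio:.3f}, zero entries: {zero_fraction:.2%}")
```

About two thirds of the entries are zero, so roughly two thirds of the multiplications can be skipped, and the remaining ones only need additions and subtractions followed by a single rescaling.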