## Info

- Location: Zoom
- Language: English or French
- Keywords: machine learning; information theory; deep learning; neural networks; information bottleneck; generalization bounds
- Everyone is welcome, whether or not you work on machine learning or information theory: the presentation will also cover the basics of both fields needed to follow the main ideas.

## Abstract

In this talk, I discuss recent approaches that use the Information Bottleneck (IB) principle [1], i.e., the information-theoretic trade-off between compressing the input and preserving the information relevant for prediction, to understand generalization in deep learning architectures. Tishby and Zaslavsky [2], in work later extended by Shwartz-Ziv and Tishby [3], suggest that deep networks optimize the IB trade-off layer by layer, based on the following findings:

1. Most of the training time is spent compressing the input into an efficient representation rather than fitting the training labels.
2. This compression of the representation begins once the training error becomes small.
3. The converged layers lie on or very close to the theoretical IB bound.

I then discuss the criticism by Saxe et al. [4], who showed that this picture does not hold for modern deep network architectures (convolutional layers, ReLU activations, residual connections, etc.); Naftali Tishby's response to their work; the support for Tishby's position from the recent work of Noshad, Zeng and Hero [5]; and I conclude with the open questions about generalization in deep networks.
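For readers new to the IB principle, the trade-off in [1] can be written as minimizing $\mathcal{L}[p(t \mid x)] = I(X;T) - \beta\, I(T;Y)$ over stochastic encoders $p(t \mid x)$, where $T$ is the compressed representation of the input $X$, $Y$ is the label, and $\beta > 0$ weighs prediction against compression. The Python sketch below evaluates this objective exactly for small discrete distributions. It is only an illustration under stated assumptions: the joint distribution $p(x, y)$ is known, the Markov chain $Y \to X \to T$ holds, and the function names (`mutual_information`, `ib_lagrangian`) are hypothetical.

```python
import numpy as np

def mutual_information(p_joint):
    """I(A;B) in bits for a joint distribution given as a 2-D array p_joint[a, b]."""
    p_joint = np.asarray(p_joint, dtype=float)
    p_a = p_joint.sum(axis=1, keepdims=True)  # marginal p(a), shape (|A|, 1)
    p_b = p_joint.sum(axis=0, keepdims=True)  # marginal p(b), shape (1, |B|)
    product = p_a * p_b                       # p(a) p(b) via broadcasting
    mask = p_joint > 0                        # convention: 0 * log 0 = 0
    return float(np.sum(p_joint[mask] * np.log2(p_joint[mask] / product[mask])))

def ib_lagrangian(p_xy, p_t_given_x, beta):
    """Return (I(X;T) - beta * I(T;Y), I(X;T), I(T;Y)) for an encoder p(t|x).

    Assumes the Markov chain Y - X - T: T depends on Y only through X.
    p_xy has shape (|X|, |Y|); p_t_given_x has shape (|X|, |T|), rows sum to 1.
    """
    p_xy = np.asarray(p_xy, dtype=float)
    p_t_given_x = np.asarray(p_t_given_x, dtype=float)
    p_x = p_xy.sum(axis=1)                # p(x)
    p_xt = p_x[:, None] * p_t_given_x     # p(x, t) = p(x) p(t|x)
    p_ty = p_t_given_x.T @ p_xy           # p(t, y) = sum_x p(t|x) p(x, y)
    i_xt = mutual_information(p_xt)
    i_ty = mutual_information(p_ty)
    return i_xt - beta * i_ty, i_xt, i_ty

# Toy example: X uniform on {0, 1, 2, 3}, label Y = parity of X.
p_xy = np.zeros((4, 2))
for x in range(4):
    p_xy[x, x % 2] = 0.25

identity = np.eye(4)          # T = X: no compression
parity = np.zeros((4, 2))     # T = X mod 2: keeps only what predicts Y
for x in range(4):
    parity[x, x % 2] = 1.0

print(ib_lagrangian(p_xy, identity, beta=2.0))  # (0.0, 2.0, 1.0)
print(ib_lagrangian(p_xy, parity, beta=2.0))    # (-1.0, 1.0, 1.0)
```

In this toy case the parity encoder discards one bit of $X$ while keeping all of $I(T;Y)$, so it reaches a lower (better) value of the objective than the identity encoder. The activations of real networks are continuous and high-dimensional, so these quantities have to be estimated from samples rather than computed exactly as here.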

[1] Naftali Tishby, Fernando C. Pereira, William Bialek. “*The Information Bottleneck Method*”. In *Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing*, 1999.

[2] Naftali Tishby and Noga Zaslavsky. “*Deep learning and the information bottleneck principle*”. In *IEEE Information Theory Workshop (ITW)*, 2015.

[3] Ravid Shwartz-Ziv and Naftali Tishby. “*Opening the black box of deep neural networks via information*”. arXiv preprint arXiv:1703.00810, 2017.

[4] Andrew M. Saxe, Yamini Bansal, Joel Dapello, Madhu Advani, Artemy Kolchinsky, Brendan D. Tracey, and David D. Cox. “*On the Information Bottleneck Theory of Deep Learning*”. In *International Conference on Learning Representations (ICLR)*, 2018.

[5] Morteza Noshad, Yu Zeng, Alfred O. Hero. “*Scalable mutual information estimation using dependence graphs*”. In *Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)*, 2019.