Suggested Readings:

Randy: 

1) https://openreview.net/pdf?id=B1J_rgWRW
This is a nice paper with some theorems, but it might be too hard.

2) https://arxiv.org/abs/1608.03287
This one is a bit easier, I think, and also mathematical.

3) https://arxiv.org/abs/1611.03530
This is the paper I discussed in my talk. It is pretty easy and provides a proof of a single theorem that might be of interest; I think it would be a nice paper for a student to look at, since it provides a bridge from the presentation I gave.
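
For orientation, the single theorem referred to above is, if I recall correctly (so check the paper for the precise statement), a finite-sample expressivity result: a two-layer ReLU network with only $2n + d$ weights can exactly fit any $n$ points in $d$ dimensions. Roughly: given a sample $\{(x_i, y_i)\}_{i=1}^{n}$ with distinct $x_i \in \mathbb{R}^d$, there exist $w \in \mathbb{R}^d$ and $a, b \in \mathbb{R}^n$ such that

    f(x) = \sum_{j=1}^{n} a_j \max(\langle w, x \rangle - b_j,\, 0)

satisfies $f(x_i) = y_i$ for every $i$; the single shared direction $w$ is what keeps the weight count down to $2n + d$.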

Sarkis:
1) Allan Pinkus. "Approximation theory of the MLP model in neural networks". Acta Numerica, 8:143-195, 1999.
https://www.cambridge.org/core/journals/acta-numerica/article/approximation-theory-of-the-mlp-model-in-neural-networks/18072C558C8410C4F92A82BCC8FC8CF9

If you cannot access Acta Numerica, the paper is also easy to find via Google Scholar.

Vladimir:
https://arxiv.org/abs/1706.03301

https://arxiv.org/abs/1606.09375

Elisa: 

1) George Cybenko. "Approximation by superpositions of a sigmoidal function".
Mathematics of Control, Signals and Systems, 2(4):303-314, 1989. (Elisa's talk
is based on this paper; its main theorem is sketched at the end of this list.)

2) Eric B. Baum and David Haussler. "What size net gives valid generalization?"
Advances in Neural Information Processing Systems, 1989.

3) John Makhoul, Richard Schwartz, and Amro El-Jaroudi. "Classification capabilities of two-layer neural nets". International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, 1989.

4) Andrew R. Barron. "Approximation and estimation bounds for artificial neural networks". Machine Learning, 14(1):115-133, 1994.
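
Since Elisa's talk is based on the Cybenko paper, here is its main theorem sketched from memory (modulo notation; see the paper for the exact statement). Let $\sigma$ be any continuous sigmoidal function, i.e. $\sigma(t) \to 1$ as $t \to +\infty$ and $\sigma(t) \to 0$ as $t \to -\infty$. Then finite sums of the form

    G(x) = \sum_{j=1}^{N} \alpha_j \, \sigma(y_j^\top x + \theta_j)

are dense in $C([0,1]^n)$: for every continuous $f$ on the unit cube and every $\varepsilon > 0$ there exist $N$, $\alpha_j$, $y_j$, $\theta_j$ with $|G(x) - f(x)| < \varepsilon$ for all $x \in [0,1]^n$.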