Convex Formulations of Finite-Width Neural Networks

This essay investigates convex formulations of finite-width neural networks and their geometric interpretation via Clifford algebra.

The essay is structured around two complementary viewpoints on neural network training: convex reformulations of finite-width networks and geometric interpretations based on Clifford algebra. A central part of the work was understanding how these formulations relate to each other and presenting them within a single coherent framework.
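For reference, here is a sketch of the kind of convex reformulation meant here. It assumes the essay follows the standard two-layer ReLU setting with weight decay of Pilanci and Ergen (2020). The symbols used (data matrix $X \in \mathbb{R}^{n \times d}$, targets $y \in \mathbb{R}^n$, width $m$, regularisation strength $\beta$, activation patterns $D_i$) are introduced here for illustration and may differ from the essay's own notation.

```latex
% Non-convex training problem: two-layer ReLU network with m neurons and weight decay
\min_{\{u_j,\alpha_j\}_{j=1}^{m}} \;
  \frac{1}{2}\Big\|\sum_{j=1}^{m} (X u_j)_+\,\alpha_j - y\Big\|_2^2
  + \frac{\beta}{2}\sum_{j=1}^{m}\big(\|u_j\|_2^2 + \alpha_j^2\big)

% Convex reformulation over the P activation patterns D_i = \operatorname{diag}(\mathbb{1}[X h_i \ge 0])
\min_{\{v_i,w_i\}_{i=1}^{P}} \;
  \frac{1}{2}\Big\|\sum_{i=1}^{P} D_i X (v_i - w_i) - y\Big\|_2^2
  + \beta\sum_{i=1}^{P}\big(\|v_i\|_2 + \|w_i\|_2\big)
\quad\text{s.t.}\quad (2D_i - I_n) X v_i \ge 0,\;\; (2D_i - I_n) X w_i \ge 0,\;\; i = 1,\dots,P
```

In that setting the two problems share the same optimal value once $m$ is at least a critical width $m^*$ (with $m^* \le n+1$). The group-$\ell_2$ penalty drives most blocks $v_i, w_i$ to zero. Each nonzero block gives one hidden neuron; for example, $u = v_i/\sqrt{\|v_i\|_2}$ and $\alpha = \sqrt{\|v_i\|_2}$ reproduce both the term $D_i X v_i = (X v_i)_+$ and the penalty $\beta\|v_i\|_2$.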

I provided geometric illustrations of the features selected in a simple two-dimensional example, both for the convex formulation and for the formulation derived using Clifford algebra.
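In convex formulations of two-layer ReLU networks, each candidate feature is a fixed activation pattern $D_i = \operatorname{diag}(\mathbb{1}[X h \ge 0])$, that is, a record of which training points a neuron with weights $h$ switches on. The minimal numpy sketch below assumes a two-dimensional input with a bias column appended, so that each pattern corresponds to a line $h_1 x_1 + h_2 x_2 + h_3 = 0$ splitting the points. It collects patterns by sampling random directions $h$, a common practical substitute for full enumeration. The function name `sample_activation_patterns` and the toy data are hypothetical. For six points in general position the arrangement has at most $2\big(\binom{5}{0} + \binom{5}{1} + \binom{5}{2}\big) = 32$ regions, so at most 32 distinct patterns can appear.

```python
import numpy as np

def sample_activation_patterns(X, num_samples=5000, seed=0):
    """Distinct ReLU activation patterns 1[X h >= 0] found by sampling random directions h."""
    rng = np.random.default_rng(seed)
    H = rng.standard_normal((X.shape[1], num_samples))  # random candidate neuron weights h
    # Each column of X @ H >= 0 is one pattern; transpose so each row is a pattern,
    # then keep only distinct rows.
    return np.unique((X @ H >= 0).T.astype(int), axis=0)

# Hypothetical toy data: six random points in the plane, lifted with a bias column,
# so each pattern corresponds to a line in the plane that splits the points.
rng = np.random.default_rng(1)
X = np.hstack([rng.standard_normal((6, 2)), np.ones((6, 1))])
patterns = sample_activation_patterns(X)
print(len(patterns), "distinct patterns found")
for p in patterns[:5]:
    print(p)
```

Each printed row is a 0/1 mask over the six points. The group-$\ell_2$ penalty in the convex program then keeps only a few of these masks, and those surviving masks are the kind of "selected features" a two-dimensional illustration can draw as lines.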

I also conducted numerical experiments comparing convex optimisation with stochastic gradient descent in both fixed- and random-design settings. These experiments illustrate the gap between the global optimum of the convex formulation and the solutions found by gradient descent and stochastic gradient descent on the original non-convex objective.
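Below is a minimal numpy sketch of this kind of comparison. It is not the essay's code. As a simplification, it solves the unconstrained gated ReLU relaxation rather than the constrained ReLU convex program: the cone constraints are dropped, which leaves a group lasso over sampled activation patterns, solved here by proximal gradient descent (ISTA). It then compares that objective with full-batch GD and minibatch SGD on the non-convex, weight-decayed two-layer ReLU network. The data, width, step sizes, iteration counts and function names are all hypothetical.

```python
import numpy as np

def relu_net_objective(X, y, U, a, beta):
    """Squared loss plus weight decay for f(X) = relu(X U) a."""
    r = np.maximum(X @ U, 0.0) @ a - y
    return 0.5 * r @ r + 0.5 * beta * (np.sum(U**2) + a @ a)

def train_relu_net(X, y, m=20, beta=1e-3, lr=1e-3, steps=2000, batch_size=None, seed=0):
    """(Stochastic) gradient descent on the non-convex two-layer ReLU objective."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U = rng.standard_normal((d, m)) / np.sqrt(d)
    a = rng.standard_normal(m) / np.sqrt(m)
    b = n if batch_size is None else batch_size  # batch_size=None means full-batch GD
    for _ in range(steps):
        idx = rng.choice(n, size=b, replace=False)
        Xb, yb = X[idx], y[idx]
        Z = Xb @ U
        r = (n / b) * (np.maximum(Z, 0.0) @ a - yb)  # rescaled so the minibatch gradient is unbiased
        grad_a = np.maximum(Z, 0.0).T @ r + beta * a
        grad_U = Xb.T @ (np.outer(r, a) * (Z > 0)) + beta * U
        a -= lr * grad_a
        U -= lr * grad_U
    return relu_net_objective(X, y, U, a, beta)

def gated_relu_convex(X, y, beta=1e-3, iters=5000, num_patterns=2000, seed=0):
    """ISTA on min_w 0.5*||sum_i D_i X w_i - y||^2 + beta * sum_i ||w_i||_2."""
    rng = np.random.default_rng(seed)
    D = np.unique((X @ rng.standard_normal((X.shape[1], num_patterns)) >= 0).T.astype(float), axis=0)
    P, d = D.shape[0], X.shape[1]
    A = np.hstack([Di[:, None] * X for Di in D])  # block i of columns is D_i X
    step = 1.0 / np.linalg.norm(A, 2) ** 2        # 1 / Lipschitz constant of the smooth part
    w = np.zeros(P * d)
    history = []
    for _ in range(iters):
        W = (w - step * A.T @ (A @ w - y)).reshape(P, d)  # gradient step on the squared loss
        norms = np.linalg.norm(W, axis=1, keepdims=True)
        W *= np.maximum(0.0, 1.0 - step * beta / np.maximum(norms, 1e-12))  # group soft-threshold
        w = W.ravel()
        r = A @ w - y
        history.append(0.5 * r @ r + beta * np.linalg.norm(W, axis=1).sum())
    return history

rng = np.random.default_rng(0)
X = np.hstack([rng.uniform(-1, 1, (20, 2)), np.ones((20, 1))])  # 2-D inputs plus a bias column
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(20)          # hypothetical noisy target
beta = 1e-3
print("convex (gated ReLU):", gated_relu_convex(X, y, beta)[-1])
print("GD :", train_relu_net(X, y, beta=beta))
print("SGD:", train_relu_net(X, y, beta=beta, batch_size=4))
```

Consider any ReLU network whose neurons' activation patterns are all among the sampled $D_i$. It maps to a feasible point of the gated program with no larger objective, so the convex value is a lower bound for such networks. Because the patterns are sampled rather than fully enumerated, the printed gap should be read as indicative only. Exact comparisons with the constrained ReLU program would need a conic solver.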

Through this project, I developed a deeper understanding of how optimisation theory applies to the neural network training problem and how geometric ideas can give insight into the behaviour of hidden neurons. In particular, I found it striking that finite-width neural networks admit convex reformulations under suitable conditions, which may help explain their strong empirical performance.

I have not yet found theoretical explanations for some of my numerical results. For example, the training loss reached the global minimum when the networks were trained on noisy data, but did not reach it when they were trained on noiseless data. The theoretical explanations behind these outcomes are left for future research.

The code is available in the GitHub repository.

The essay is available at link to pdf.

Below is an illustration of the formulation of neural networks via Clifford algebra.

Illustration of the formulation of neural networks via Clifford algebra