My neural network is a piecewise linear regression, but which one? (A glimpse of our AAAI-20 paper on Empirical Bounds)

One of the most beautiful things about the theory of computation is, in my opinion, how you start from something as simple and intuitive as a finite-state machine and then – with a few more bells and whistles – end up with something as complex and powerful as a Turing machine. When someone mockingly says that machine learning or deep learning is just regression, often implying merely linear regression, they are missing a jump in complexity similar to the one from finite-state machines to Turing machines.

In this post, we will look at rectifier networks: one of the simplest yet most expressive types of artificial neural network. A rectifier network is made of Rectified Linear Units, or ReLUs. Each ReLU defines a linear function on its inputs, which is then composed with a non-linear function that takes the maximum of 0 and that linear function. If the output is positive, we say that the input has activated the ReLU. At the microscopic level, one would be correct to think that adjusting the coefficients of the linear function associated with each ReLU during training is akin to linear regression (even if that person might be puzzled that the negative outputs produced when a ReLU is not activated are simply thrown away).

[Figure: the ReLU activation function]
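In code, a single ReLU is just a linear function followed by a threshold at zero; a minimal sketch in plain Python:

```python
def relu_unit(weights, bias, x):
    """Output of one ReLU: max(0, w . x + b)."""
    pre_activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return max(0.0, pre_activation)

# The unit is "activated" when the pre-activation is positive.
print(relu_unit([2.0, -1.0], 0.5, [1.0, 1.0]))   # 2 - 1 + 0.5 = 1.5
print(relu_unit([2.0, -1.0], 0.5, [-1.0, 1.0]))  # -2 - 1 + 0.5 < 0, so 0.0
```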

However, what happens at the macroscopic level when many ReLUs are put together is very different. If we were to combine units defining just linear functions, we would get a single linear function as a result. But as we turn negative outputs into zeroes, we obtain different linear functions for different inputs, and consequently the neural network models a piecewise linear function instead. That alone implies a big jump in complexity. Intuitively, we may guess that having a function with more pieces would make it easier to fit the training data. According to our work so far, that seems correct.

[Figure: a piecewise linear function]
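We can see the pieces directly: each distinct pattern of activated units corresponds to one linear region. The sketch below (with arbitrary illustrative weights) sweeps a one-dimensional input through a small two-layer network and counts the distinct activation patterns along the way:

```python
def activation_pattern(net, x):
    """net: list of (weights, biases) layers; returns a 0/1 tuple of which ReLUs fire."""
    pattern, h = [], x
    for W, b in net:
        pre = [sum(w * xi for w, xi in zip(row, h)) + bi for row, bi in zip(W, b)]
        pattern.extend(1 if p > 0 else 0 for p in pre)
        h = [max(0.0, p) for p in pre]
    return tuple(pattern)

# A tiny 1-input network: two hidden layers with 2 ReLUs each (illustrative weights).
net = [([[1.0], [-1.0]], [0.0, 0.5]),
       ([[1.0, -1.0], [1.0, 1.0]], [-0.25, 0.0])]

# Sweep the input and collect distinct activation patterns: each one corresponds
# to a piece (linear region) of the piecewise linear function the network computes.
xs = [i / 1000.0 - 2.0 for i in range(4001)]  # grid on [-2, 2]
patterns = {activation_pattern(net, [x]) for x in xs}
print(len(patterns))  # 4 linear regions along this slice
```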

 

Here is what we found out in prior work published at ICML 2018:

  1. The maximum number of these pieces, which are also called linear regions, may grow exponentially in the depth of the neural network. By extending prior results along the same lines, in which a zigzagging function is modeled with the network, we have shown that a neural network with one-dimensional input and L layers of n ReLUs each can define up to (n+1)^L linear regions rather than the previously known n^L.
    [Figure: lower bound construction]
  2. Surprisingly, however, too much depth may also hurt the number of linear regions. If we distribute 60 ReLUs uniformly among 1 to 6 layers, the best upper bound on the number of linear regions is attained by different depths depending on the input dimension. Because this upper bound is tight for a single layer, a shallow network may define more linear regions than a deep one if the input dimension is sufficiently large.
    [Figure: upper bounds by depth]
  3. If no layer in a small neural network trained on the MNIST dataset is too narrow (3 units or fewer), we found that the accuracy of the network relates to its number of linear regions. However, counting linear regions takes a long time even in small networks (for example, sometimes almost a week with just 32 ReLUs).
    [Figure: accuracy vs. number of linear regions]
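To get a feel for how fast the bound in the first result grows, here is a quick tabulation of the prior bound n^L against our (n+1)^L for one-dimensional input:

```python
# Maximum number of linear regions for 1-D input and L layers of n ReLUs each:
# the construction in the ICML 2018 paper attains (n+1)^L, versus n^L before.
def prior_bound(n, L):
    return n ** L

def our_bound(n, L):
    return (n + 1) ** L

for L in (1, 2, 3, 4):
    print(f"L={L}: {prior_bound(3, L)} -> {our_bound(3, L)}")
```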

In the paper that we are presenting at AAAI 2020, we describe a fast estimate of the number of linear regions (i.e., pieces of the piecewise linear function) that can be used to measure larger neural networks. We can map the inputs of a rectifier network to its outputs using a mixed-integer linear programming (MILP) formulation that also includes binary (0-1) variables denoting which units are activated by different values of the input. Every linear region corresponds to a different vector of binary values denoting which units are activated, so we just need to count the binary vectors that are feasible. The formulation below connects the inputs to the output of a single ReLU.

[Figure: MILP formulation of a single ReLU]
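As a concrete illustration of such a formulation, here is a standard big-M encoding of a single ReLU with a binary activation variable z. This is a generic sketch rather than the exact formulation in the paper; the brute-force check verifies that the feasible outputs coincide with max(0, wx+b):

```python
def relu(pre):
    return max(0.0, pre)

def feasible(pre, y, z, M=10.0, eps=1e-9):
    """Constraints of a standard big-M encoding of y = max(0, pre):
       y >= pre, y >= 0, y <= pre + M*(1 - z), y <= M*z, z in {0, 1},
       where M bounds the magnitude of the pre-activation."""
    return (y >= pre - eps and y >= -eps
            and y <= pre + M * (1 - z) + eps
            and y <= M * z + eps)

# For each pre-activation value, the only feasible output y (over both
# choices of z) should be exactly relu(pre).
for pre in [-3.0, -0.5, 0.0, 0.5, 3.0]:
    ys = {y for z in (0, 1)
            for y in [i / 10.0 for i in range(0, 101)]
            if feasible(pre, y, z)}
    assert ys == {relu(pre)}, (pre, ys)
print("big-M encoding matches ReLU on the grid")
```

Note how z encodes activation: z = 1 forces y = pre (the unit fires), while z = 0 forces y = 0. Counting feasible vectors of these z variables across all units is exactly the counting problem described above.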

What makes counting so slow – besides the fact that the number of linear regions grows very fast with the size of the neural network – is that there is little research on counting solutions of MILP formulations. And that is justifiable: research and development in operations research (OR) methodologies such as MILP focus on finding a good (and preferably optimal) solution as fast as possible. Meanwhile, counting solutions of propositional satisfiability (SAT) formulas, which involve only Boolean variables, is a widely explored topic in artificial intelligence (AI). For example, we can test multiple times how many additional clauses it would take to make a SAT formula unsatisfiable, and then draw conclusions about the total number of solutions based on that. There is no reason why the same methods could not be applied to count solutions over the binary variables of MILP formulations.

[Figure: approximate model counting for SAT]

In other words, we started with an AI problem of counting linear regions, moved to an OR problem of counting solutions of an MILP formulation, and came back to AI to use approximate model counting methods that are typically applied to SAT formulas. However, there is something special about MILP solvers that can make approximate counting more efficient: callback functions. Most of the SAT literature is based on the assumption that testing the satisfiability of a formula with every set of additional clauses requires solving that formula from scratch. With a lazy cut callback in an MILP solver, we can add more constraints as soon as a new solution is found. Consequently, we reduce the number of times that we have to prove that a formulation is infeasible, which is computationally very expensive.

[Figure: approximate model counting with an MILP solver]
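To make the role of these cuts concrete, here is a toy sketch (plain Python, with brute force standing in for the MILP solver) of counting 0-1 solutions by adding a "no-good" cut for each solution found. In the actual method the cuts are added inside a single solve through the lazy callback, rather than by re-solving from scratch as done here for simplicity:

```python
from itertools import product

def solve(n, constraints):
    """Stand-in for an MILP solver: return one feasible 0-1 vector or None."""
    for v in product((0, 1), repeat=n):
        if all(c(v) for c in constraints):
            return v
    return None

def count_solutions(n, constraints):
    """Count feasible 0-1 vectors by excluding each solution found with a cut."""
    cuts, count = [], 0
    while True:
        v = solve(n, constraints + cuts)
        if v is None:
            return count
        count += 1
        # No-good cut: at least one coordinate must differ from v.
        cuts.append(lambda w, v=v: any(wi != vi for wi, vi in zip(w, v)))

# Toy feasibility problem over 3 binaries: x0 + x1 + x2 <= 1.
print(count_solutions(3, [lambda v: sum(v) <= 1]))  # 4 solutions
```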

By adding random parity (XOR) constraints involving a few binary variables to iteratively restrict the number of solutions of the MILP formulation, we obtained probabilistic lower bounds on the number of linear regions that are very similar in shape to the exact counts. On the left of the figure below, we analyze 10 networks trained on the MNIST dataset for every possible combination of 22 ReLUs distributed across two hidden layers, followed by 10 ReLUs of output. The black and red curves on top are upper bounds, the black dots are averages of the exact number of linear regions, and the line charts below are lower bounds that hold with probability 95%. On the right of the figure below, we see that the probabilistic lower bounds are orders of magnitude faster to compute when the actual number of linear regions is large (at least 1 million each).

[Figure: results – bounds vs. exact counts, and running times]
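The parity-constraint idea can be sketched in a few lines: each random XOR constraint cuts the solution set roughly in half, so if solutions still survive after m constraints, the set likely has on the order of 2^m solutions. This is a simplified illustration on an explicit solution set, not the paper's procedure; the constants and the success probability require more care in the real method:

```python
import random

def parity_constraint(n, rng):
    """A random XOR constraint: the XOR of a random subset of the n variables
    must equal a random target bit."""
    subset = [i for i in range(n) if rng.random() < 0.5]
    target = rng.randrange(2)
    return lambda v: sum(v[i] for i in subset) % 2 == target

def survives(solutions, constraints):
    """Does any solution satisfy all constraints?"""
    return any(all(c(v) for c in constraints) for v in solutions)

# Toy solution set: all 6-bit vectors with an even number of ones (32 of them).
n = 6
solutions = [v for v in
             [tuple((i >> k) & 1 for k in range(n)) for i in range(2 ** n)]
             if sum(v) % 2 == 0]

# Add XOR constraints one at a time until no solution survives.
rng = random.Random(0)
constraints = []
while True:
    c = parity_constraint(n, rng)
    if not survives(solutions, constraints + [c]):
        break
    constraints.append(c)
m = len(constraints)
print(f"solutions survive {m} random XORs; count is likely around 2^{m}")
```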

If you are curious about our next steps, take a look at our upcoming CPAIOR 2020 paper, where we look at this topic from a different perspective by asking the following: do we need a trained neural network to be as wide or as deep as it is to do what it does? If a network is trained with L1 regularization, the answer is probably not! We use MILP to identify units that can be removed or merged in a neural network without affecting the outputs that are produced. In other words, we obtain a lossless compression.

Where to find our work at AAAI 2020:

  • Spotlight Presentation: Sunday, 11:15 – 12:30, Clinton (Technical Session 9: Constraint Satisfaction and Optimization)
  • Poster Presentation: Sunday, 7:30 – 9:30, Americas Halls I/II

Links to our papers:

  1. Bounding and Counting Linear Regions of Deep Neural Networks (ICML 2018)
  2. Empirical Bounds on Linear Regions of Deep Rectifier Networks (AAAI 2020)
  3. Lossless Compression of Deep Neural Networks (CPAIOR 2020)

All papers are joint work with Srikumar Ramalingam. Christian Tjandraatmadja is a co-author in the ICML 2018 paper. Abhinav Kumar is a co-author in the CPAIOR 2020 paper.

 

Summer 2020 schools on algorithms, data science, machine learning, networks, optimization, transportation, and other relevant topics in operations research

This post covers relevant schools happening between April of 2020 and September of 2020. If you know of other schools that are not listed here, please reach out to me. As in previous semesters, I will keep updating this post, and I may add some schools with past deadlines as a reference for readers looking for schools in the next years.

Previous posts: Summer 2016, Winter 2016 / 2017, Summer 2017, Winter 2017 / 2018, Summer 2018, Winter 2018 / 2019, Summer 2019, and Winter 2019 / 2020.

Spring School on Mathematical Statistics
March 30 – April 3     (deadline: January 24)
Leipzig, Germany

LMS Research School: Graph Packing
April 19-25     (deadline: January 31)
Eastbourne, England
* Included on January 23

NATCOR Heuristics and Stochastic Algorithms
April 20-24     (deadline not posted)
Nottingham, England

ISCO 2020 Spring School: Data Science, Machine Learning and Optimization
May 2-3     (deadline not posted)
Montreal, QC, Canada

Complex Networks: Theory, Methods, and Applications
May 18-21     (deadline: February 23)
Como, Italy

Mathematical Modelling, Numerical Analysis and Scientific Computing
May 24-29     (deadline: April 30)
Kacov, Czech Republic

Machine Learning Crash Course (MLCC 2020)
May 25-29     (deadline: April 3)
Oslo, Norway
* Included on February 28

Simons Institute Workshop: Statistics in the Big Data Era
May 27-29     (deadline: February 1st for travel support)
Berkeley, CA, USA

Column Generation 2020
May 31 – June 3     (deadline not posted)
Sainte-Adèle, QC, Canada

Summer School on Modern Optimization for Transportation
June 1-5     (deadline not posted)
Frejus, France

NATCOR Convex Optimization
June 1-5     (deadline not posted)
Edinburgh, Scotland

Risk Measurement and Control: Fintech and Digital Banking
June 3-6     (deadline not posted)
Rome, Italy

IPCO (Integer Programming and Combinatorial Optimization) Summer School
June 6-7     (deadline not posted)
London, England

Structural Graph Theory
June 7-12     (deadline not posted)
Murol, France
* Included on February 3 by suggestion of Aurélie Lagoutte

ICAPS-ICRA Summer School on Plan-Based Control for Robotic Agents
June 8-12     (deadline: March 31)
Paris, France

Zaragoza Logistics Center PhD Summer Academy
June 8-19     (deadline: June 1st)
Zaragoza, Spain

Summer School in Logic and Formal Epistemology
June 8-26     (deadline: March 13)
Pittsburgh, PA, United States
* Included on February 22

Hausdorff School Algorithmic Data Analysis
June 15-19     (deadline: March 29)
Bonn, Germany
* Included on February 22

Research school in computational complexity
June 15-19     (deadline not posted)
Paris, France
* Included on February 22 by suggestion of Ludmila Glinskih

Simulation Summer School (S3)
June 21     (deadline: February 15)
State College, PA, United States
* Included on January 27

DTU CEE Summer School 2020: Advanced Optimization, Learning, and Game‐Theoretic Models in Energy Systems
June 21-26     (deadline: January 31)
Copenhagen, Denmark

3rd International Summer School on Artificial Intelligence and Games
June 22-26     (deadline: March 1st for early registration)
Copenhagen, Denmark

Swedish Summer School in Computer Science (S3CS 2020): The Method of Moments in Computer Science and Beyond & Polyhedral Techniques in Combinatorial Optimization
June 28 – July 4     (deadline: February 11)
Stockholm, Sweden

Machine Learning Summer School – Germany
June 28 – July 10    (deadline: February 11)
Tubingen, Germany

Regularization Methods for Machine Learning (RegML)
June 29 – July 3    (deadline: March 20)
Genova, Italy

Data Science Summer School (DS3)
June 29 – July 3    (deadline not posted)
Palaiseau, France

Tsinghua University 2020 Deep Learning Summer School
June 29 – July 12    (deadline: April 14)
Beijing, China

International School of Mathematics “Guido Stampacchia”: Graph Theory, Algorithms and Applications
July 1-8     (deadline: April 10)
Erice, Italy

Special Interest Group on Genetic and Evolutionary Computation (SIGEVO) Summer School
July 5-9     (deadline: April 3)
Cancun, Mexico
* Included on January 21 by suggestion of Juergen Branke

4th Summer School on Cognitive Robotics
July 6-10     (deadline not posted)
Brisbane, Australia
* Included on January 23 by suggestion of Philip Kilby

Eastern European Machine Learning Summer School: Deep Learning and Reinforcement Learning
July 6-11     (deadline: March 27)
Krakow, Poland

Gdańsk Summer School on Algorithms for Discrete Optimization and Deep Learning
July 6-12     (deadline: April 30 for early registration)
Gdańsk, Poland
* Included on January 23 by suggestion of Georg Anegg

EURO PhD Summer Schools on Multiple Criteria Decision Aiding / Making (MCDA / MCDM)
July 6-17     (deadline: February 1st)
Ankara, Turkey

Bocconi Summer School in Advanced Statistics and Probability
July 6-17     (deadline: March 31)
Como, Italy

EADM Summer school on Learning and Decision Making
July 8-15     (deadline not posted)
Barcelona, Spain
* Included on February 22

EURO PhD School on Data Driven Decision Making and Optimization
July 10-19     (deadline: January 15)
Seville, Spain

3rd Advanced Course on Data Science & Machine Learning (ACDL 2020)
July 13-17    (deadline: March 31)
Siena, Italy
* Included on February 28

EURO PhD School on Sustainable Supply Chains
July 19-23     (deadline: January 20)
Lisbon, Portugal

Latin-American Summer School in Operational Research (ELAVIO)
July 19-24     (deadline: February 29)
Arequipa, Peru
* Included on January 20 by suggestion of Rodrigo Linfati

Gene Golub SIAM Summer School 2020: Theory and Practice of Deep Learning
July 20-31     (deadline: February 1)
Muizenberg, South Africa

Argonne Training Program on Extreme-Scale Computing
July 26 – August 7     (deadline: March 2)
St. Charles, IL, United States

4th International Summer School on Deep Learning (DeepLearn 2020)
July 27-31     (deadline: January 26 for early registration)
Leon, Mexico

Metaheuristics Summer School: Learning and Optimization from Big Data
July 27-31     (deadline: March 5)
Catania, Italy

4th Modelling Symposium: Introducing Deep Neural Networks
July 27-31     (deadline: March 27)
Magdeburg, Germany

Advanced Methods in Operations Research for Logistics and Transportation
July 27-31     (deadline not posted)
Bogota, Colombia

Digital Transformation of Mobility Systems – OR Models and Methods
July 27-31     (deadline: March 31)
Munich, Germany
* Included on February 28 by suggestion of Layla Martin

Deep Learning and Reinforcement Learning (DLRL) Summer School 2020
July 29 – August 6     (deadline not posted)
Montreal, QC, Canada

Machine Learning Summer School – Indonesia
August 3-9     (deadline: April 30)
Bandung, Indonesia

The Cornell, Maryland, Max Planck Pre-doctoral Research School 2020
August 4-9     (deadline: February 15)
Saarbrucken, Germany
* Included on January 27 by suggestion of Alex Efremov

Oxford Machine Learning School
August 17-22     (deadline: April)
Oxford, England
* Included on January 27

Prague Summer School on Discrete Mathematics
August 24-28     (deadline: March 15)
Prague, Czech Republic

Simons Institute Workshop: Probability, Geometry, and Computation in High Dimensions Boot Camp
August 24-28     (deadline not posted)
Berkeley, CA, USA

Simons Institute Workshop: Theory of Reinforcement Learning Boot Camp
August 31 – September 4     (deadline not posted)
Berkeley, CA, USA

Summer School on Machine Learning and Big Data with Quantum Computing (SMBQ 2020)
September 7-8     (deadline not posted)
Porto, Portugal
* Included on February 28

Combinatorial Optimization at Work (CO@Work)
September 14-26     (deadline not posted)
Berlin, Germany

NATCOR Forecasting and Predictive Analytics
September 21-25     (deadline not posted)
Lancaster, England

Simons Institute Workshop: Deep Reinforcement Learning
September 28 – October 2     (deadline not posted)
Berkeley, CA, USA