Why backtracking alone does not suffice to solve combinatorial problems: a CP perspective, evidence from empirical hardness models, and proof complexity results

Constraint Programming (CP) is a technique that is particularly strong at solving classes of combinatorial problems for which global constraints have been identified and exploited in the design of specialized algorithms. However, I have become convinced that CP is, above all, a general model for understanding combinatorial problems and their solution – using the term ‘model’ in a broader sense, as explained in a previous post and based on John Hooker’s article in CP Letters.

Regardless of the solving method used, the classic aim of CP is to tackle the Constraint Satisfaction Problem (CSP), which ultimately encodes any problem over variables with discrete and finite domains as the intersection of the regular languages defined by each of its constraints. When we observe special properties in a class of such languages that allow strong and efficient inferences among those variables, we have found one of the so-called global constraints. Not surprisingly, the concept is in fact a formalization of the widespread practice of exploiting problem structure, which was also discussed in a previous post linking the implementation of local search operators to CP concepts.
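To make that view concrete, here is a minimal sketch (my own illustration, not taken from any particular solver) of a CSP seen as the intersection of the assignment sets accepted by its constraints, with alldifferent playing the role of a global constraint; the variable names and the brute-force enumeration are purely illustrative.

```python
from itertools import product

# A tiny CSP: each constraint accepts a set ("language") of assignments,
# and a solution is any assignment lying in the intersection of those sets.
variables = ["x", "y", "z"]
domains = {"x": range(1, 4), "y": range(1, 4), "z": range(1, 4)}
constraints = [
    lambda a: a["x"] < a["y"],                     # an ordinary binary constraint
    lambda a: len({a["x"], a["y"], a["z"]}) == 3,  # alldifferent as a single global constraint
]

def solutions():
    for values in product(*(domains[v] for v in variables)):
        assignment = dict(zip(variables, values))
        if all(check(assignment) for check in constraints):
            yield assignment

print(list(solutions()))  # [{'x': 1, 'y': 2, 'z': 3}, {'x': 1, 'y': 3, 'z': 2}, ...]
```

A real CP solver would, of course, replace this enumeration with propagation and search, and the alldifferent predicate with a dedicated filtering algorithm (typically based on matching), which is precisely where those strong and efficient inferences come from.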

In spite of this claimed ‘agnosticism’ regarding solving methods, CSP solving is often associated with backtracking search algorithms. The rationale is that, compared to other general solving techniques such as dynamic programming, backtracking scales better in memory use with the size of the problem being solved. As a consequence, CP lectures and many lines of research are developed under the premise of backtracking-based search. In practice, however, that post on CP and local search also points out that the converse holds as well: local search methods like variable neighborhood search have been successfully used in commercial CP solvers.

One such line in which backtracking has been extensively studied is the adaptive selection of algorithms, which aims to select the most suitable algorithm for a given instance according to predictive models of performance. From the characterization of CSP instances by means of properties like constrainedness (e.g., the ratio of constraints to variables), empirical hardness models were developed that also documented easy-hard-easy phase transitions. In short, it is easy to solve an underconstrained problem or to infer that an overconstrained one is infeasible, but instances at the ‘feasibility frontier’ are much harder to solve. A short reference on the topic is the 2005 Nature article by Carla Gomes and Bart Selman entitled ‘Can get satisfaction’, and a precise picture of such hardness can be seen in the graph on slide 6 of Kevin Leyton-Brown’s talk at the Workshop on Beyond Worst-Case Analysis.
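As a toy illustration of that easy-hard-easy pattern (my own experiment sketch, not taken from the references above), the following generates random 3-SAT instances at increasing clause-to-variable ratios and counts the nodes visited by a plain backtracking search; around the critical ratio (roughly 4.3 for random 3-SAT) the effort tends to peak, while clearly under- or over-constrained instances are dispatched quickly.

```python
import random

def random_3sat(n_vars, n_clauses, rng):
    """Random 3-CNF: each clause picks three distinct variables with random signs."""
    clauses = []
    for _ in range(n_clauses):
        chosen = rng.sample(range(1, n_vars + 1), 3)
        clauses.append([v if rng.random() < 0.5 else -v for v in chosen])
    return clauses

def backtrack(clauses, n_vars, assignment, var, stats):
    """Plain backtracking: prune only when some clause is already falsified."""
    stats["nodes"] += 1
    for clause in clauses:
        # A clause is falsified if all its literals are assigned and false.
        if all(abs(lit) < var and assignment[abs(lit)] != (lit > 0) for lit in clause):
            return False
    if var > n_vars:
        return True  # complete assignment with no falsified clause: satisfiable
    for value in (True, False):
        assignment[var] = value
        if backtrack(clauses, n_vars, assignment, var + 1, stats):
            return True
    del assignment[var]
    return False

rng = random.Random(0)
n = 15
for ratio in (2.0, 3.0, 4.0, 4.3, 5.0, 6.0):
    nodes = sat = 0
    for _ in range(20):
        stats = {"nodes": 0}
        sat += backtrack(random_3sat(n, int(ratio * n), rng), n, {}, 1, stats)
        nodes += stats["nodes"]
    print(f"ratio {ratio}: {sat}/20 satisfiable, avg nodes {nodes / 20:.0f}")
```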

Although it is a well-documented phenomenon, until very recently it was unclear to me why this happens. I was only convinced of it this week, when I came across lecture notes on proof complexity by KTH Professor Jakob Nordstrom that discussed the resolution of formulas for the pigeonhole principle. Proof complexity is a discipline concerned with understanding the hardness of theorem proving, and comprises the study of the characteristics and relative strengths of different proof systems as well as their equivalences. Among such systems is resolution, whose tree-like form corresponds to solving by backtracking a CSP in which variables can only assume two values – true or false, or equivalently 0 or 1 – and constraints only use the logical operators NOT, OR, and AND in Conjunctive Normal Form (CNF): a set of clauses built with NOT and OR, joined by AND. This is a special form of the satisfiability problem, or simply SAT, for which backtracking search with only simple local inference corresponds to the well-known DPLL algorithm.
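For reference, here is a bare-bones rendition of that algorithm as I understand its textbook description (simplification, unit propagation, and branching), with a clause representation and naming of my own choosing; pure-literal elimination is left out for brevity.

```python
def dpll(clauses, assignment=frozenset()):
    """Bare-bones DPLL on DIMACS-style clauses (lists of nonzero integers).

    `assignment` is the set of literals currently assumed true. Returns a
    satisfying set of literals, or None if the formula is unsatisfiable.
    """
    # Simplify: drop satisfied clauses and remove falsified literals.
    simplified = []
    for clause in clauses:
        if any(lit in assignment for lit in clause):
            continue
        reduced = [lit for lit in clause if -lit not in assignment]
        if not reduced:
            return None              # empty clause: conflict, backtrack
        simplified.append(reduced)
    if not simplified:
        return assignment            # every clause satisfied
    # Unit propagation: a one-literal clause forces that literal.
    for clause in simplified:
        if len(clause) == 1:
            return dpll(simplified, assignment | {clause[0]})
    # Branch on the first literal of the first remaining clause.
    lit = simplified[0][0]
    return (dpll(simplified, assignment | {lit})
            or dpll(simplified, assignment | {-lit}))

# Example: (x1 or x2) and (not x1 or x2) and (not x2 or x3)
print(dpll([[1, 2], [-1, 2], [-2, 3]]))  # a satisfying set, e.g. frozenset({1, 2, 3})
```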

The pigeonhole principle states that it is impossible to place n pigeons into m pigeonholes without putting two of them in the same pigeonhole if n > m. It turns out that encoding an instance with n = m+1, in size polynomial in n and m, as a satisfiability problem (SAT), i.e., a CSP with variable domains of size two and constraints built from logical operators in CNF, results in an unsatisfiable formula whose resolution refutations have exponential size. That happens because a valid refutation must explore a huge number of combinations of assignments to prove that none of them satisfies the problem constraints. [check the erratum at the end of the post]
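For concreteness, the sketch below generates what I understand to be the standard CNF encoding of the pigeonhole principle with n = m+1: one Boolean variable per pigeon-hole pair, clauses stating that each pigeon sits in some hole, and clauses forbidding two pigeons from sharing a hole (the variable numbering is my own convention).

```python
def pigeonhole_cnf(m):
    """CNF for placing m + 1 pigeons into m holes; unsatisfiable by construction.

    Variable x(p, h) = (p - 1) * m + h is true iff pigeon p sits in hole h.
    """
    def x(p, h):
        return (p - 1) * m + h

    clauses = []
    # Every pigeon occupies at least one hole.
    for p in range(1, m + 2):
        clauses.append([x(p, h) for h in range(1, m + 1)])
    # No two pigeons share a hole.
    for h in range(1, m + 1):
        for p1 in range(1, m + 2):
            for p2 in range(p1 + 1, m + 2):
                clauses.append([-x(p1, h), -x(p2, h)])
    return clauses

m = 3
cnf = pigeonhole_cnf(m)  # PHP(4, 3): 4 pigeons, 3 holes, unsatisfiable
print(len(cnf), "clauses over", (m + 1) * m, "variables")
# Haken (1985) showed that resolution refutations of this family grow exponentially in m.
```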

In short, backtracking really needs global constraints!

Erratum

According to the first comment on this post, there is an encoding for which the mentioned refutation is found efficiently. The discussion is continued in the sequel to this post.

When CP becomes local search and local search becomes CP, and likewise for CP and machine learning

As a Brazilian song entitled Sobradinho puts it, “o sertão vai virar mar, dá no coração o medo que algum dia o mar também vire sertão” [the hinterland will become sea, and the heart fears that someday the sea will also turn into hinterland]. In academia, however, such a reversal is often a reason for joy rather than worry – as well as for joint conferences where you can be delighted by funny flamewars.

In the case of Constraint Programming (CP), I have been observing the development of a strong relationship with local search as well as with machine learning. Coincidentally, while I was writing this post, a presentation of a local search solver at the 2012 ISMP (International Symposium on Mathematical Programming) seems to have caused some buzz on Twitter about local search and CP. To the argument that local search has not been exploited by major commercial solvers (as far as I could understand the controversy), some replied that there is plenty of local search embedded inconspicuously in CP solvers. The discussion then extended to the boundary between what is CP and what is not.

However, why do we suppose that such techniques must be mutually exclusive?

From CP to local search…

Local search has been increasingly used within CP solvers, as is the case of Comet and of many papers authored by researchers from IBM describing what is behind CP Optimizer’s curtains. Many of them regard Large Neighborhood Search (LNS) as a must to solve cumulative scheduling problems, i.e., scheduling problems in which many activities can be simultaneously processed by the same resource. LNS is based on separate operators to generate modified solutions from a starting pool of solutions, and to adjust feasibility issues of those modified solutions. This latter type of operator addresses a common drawback of local search: the need to fix what the neighborhood operator unintentionally broke. [check the erratum at the end of the post]

For some, it might look strange to associate CP with local search and detach it from standard systematic techniques. While those techniques are important for tackling combinatorial problems in the general case, the size of many real-world problems is far beyond what can be fully explored in a lifetime. Hence, who cares if the search space is not fully explored by the search procedure, if that is more of a theoretical concern than a practical one? In addition, CP is about exploiting problem constraints, from modeling with richer semantics, to propagation that reduces the search space, to search that attempts to obtain feasible solutions. That leaves plenty of room to define the search according to the constraints in the model.

… and back from local search to CP!

A few weeks ago, I had the chance to observe the opposite trend: I saw a presentation about another commercial local search framework, but this one was using the concept of global constraints. As far as I could grasp, it was not on purpose. To reduce the problem of generating invalid solutions with standard local search operators on a broad range of problems, specialized constraints were defined to model certain combinatorial problems whose solutions cannot be modified at random. In other words, they are using specialized operators according to the structure of the problem, which is quite similar to performing propagation with specialized constraints in the context of CP models. As stated previously, the use of global constraints in CP is not only about propagation, but rather about exploiting problem structure for the sake of better performance. And that is what they did in that local search solver. When I see such coincidences of design, I wonder how powerful, and yet how little known, the principle beneath CP is.
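As a loose illustration of what such structure-aware operators buy (my example, not that solver’s): a 2-opt move on a tour always yields another valid permutation, while naively overwriting a position usually breaks the structure and forces a repair step.

```python
import random

def two_opt(tour, rng):
    """Structure-aware move: reversing a segment always yields a valid tour."""
    i, j = sorted(rng.sample(range(len(tour)), 2))
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

def naive_move(tour, rng):
    """Naive move: overwrite one position, which usually duplicates a city."""
    changed = list(tour)
    changed[rng.randrange(len(changed))] = rng.randrange(len(changed))
    return changed

rng = random.Random(0)
tour = list(range(8))
print(all(sorted(two_opt(tour, rng)) == tour for _ in range(100)))     # True: always a permutation
print(all(sorted(naive_move(tour, rng)) == tour for _ in range(100)))  # False: structure gets broken
```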

Now from CP to machine learning…

And since constraints can be leveraged not only to improve propagation but also to improve search, why not combine the latter with the feedback received during the solving process in order to improve the selection of algorithms? That is a topic in which I am very interested: the development of adaptive search methods for constraint solving. It has been studied in great depth over the past decade and a half, and has received even more attention in recent years. One of the venues dedicating space to this topic is the LION (Learning and Intelligent OptimizatioN) conference series, which will have its seventh edition in Catania at the beginning of 2013. That is one of the conferences I want to attend in the coming years (monetary and job vacation constraints prevent my prompt participation).
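Just to sketch the flavor of such adaptive selection (a hypothetical epsilon-greedy rule of my own, not any particular system): pick a heuristic from a portfolio, observe its runtime, and gradually favor the ones that have performed best so far.

```python
import random

class EpsilonGreedySelector:
    """Pick a heuristic from a portfolio, favoring those with low observed runtime."""

    def __init__(self, heuristics, epsilon=0.2, rng=None):
        self.heuristics = list(heuristics)
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.totals = {h: 0.0 for h in self.heuristics}  # cumulative runtime per heuristic
        self.counts = {h: 0 for h in self.heuristics}

    def choose(self):
        untried = [h for h in self.heuristics if self.counts[h] == 0]
        if untried:
            return self.rng.choice(untried)
        if self.rng.random() < self.epsilon:             # explore occasionally
            return self.rng.choice(self.heuristics)
        return min(self.heuristics,                      # exploit: lowest mean runtime
                   key=lambda h: self.totals[h] / self.counts[h])

    def report(self, heuristic, runtime):
        self.totals[heuristic] += runtime
        self.counts[heuristic] += 1

# Hypothetical usage: solve a stream of instances, learning which heuristic to trust.
selector = EpsilonGreedySelector(["min-domain", "max-degree", "random-restart"])
for instance in range(100):
    h = selector.choose()
    runtime = {"min-domain": 1.0, "max-degree": 2.5, "random-restart": 4.0}[h] \
              * random.uniform(0.5, 1.5)                 # stand-in for a real solver call
    selector.report(h, runtime)
print(max(selector.counts, key=selector.counts.get))     # most frequently chosen heuristic
```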

… and back again to CP!

And if local search and CP each show up in the context of the other, one should expect the same in this case. At least, that seems to be the thought of the proponents of the recent call for papers of the first COCOMILE workshop (COmbining COnstraint solving with MIning and LEarning), which will be held next August in Montpellier during the 2012 ECAI (European Conference on Artificial Intelligence). The workshop CFP emphasizes the two-way relationship between the two fields. While I know that there is some research in the CP-to-machine-learning direction (even though I am not following it very closely), I think that a single, very specific event looking at both directions is of great help in fostering further development. And, of course, that is yet another event I want to attend in the near future (hopefully publishing in at least one of those directions).

Erratum

The description of LNS as stated in the post is not correct. It would be more precise to say that the first type of operator fixes the values of part of the variables, and that the second attempts to find a feasible solution by assigning values to the remaining variables. The definition I gave earlier resembles what I understand as standard local search operators: procedures that fix the value of every variable in each solution they generate. In LNS, however, only part of the values is fixed, and an iterative process aims at completing such a partial solution.
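To make that corrected description concrete, here is a rough sketch of the generic LNS loop as I understand it, where relax, reoptimize, and cost are placeholders for problem-specific components; in a CP solver, the reoptimization step would typically be a time-limited tree search over the unfixed variables, which is exactly where propagation and global constraints re-enter the picture.

```python
import random

def large_neighborhood_search(solution, relax, reoptimize, cost, iterations=100):
    """Generic LNS loop following the corrected description above.

    relax(solution)      -> partial solution with part of the variables unassigned
    reoptimize(partial)  -> complete feasible solution extending it, or None on failure
    cost(solution)       -> objective value to be minimized
    """
    best = solution
    for _ in range(iterations):
        partial = relax(best)             # fix the values of part of the variables
        candidate = reoptimize(partial)   # assign values to the remaining variables
        if candidate is not None and cost(candidate) < cost(best):
            best = candidate              # keep only improving, feasible completions
    return best

# Toy usage: minimize the sum of a vector of digits; relax "forgets" some entries
# and the stand-in sub-solver fills them back with zeros.
rng = random.Random(0)
start = [rng.randrange(10) for _ in range(20)]
result = large_neighborhood_search(
    start,
    relax=lambda s: [v if rng.random() < 0.7 else None for v in s],
    reoptimize=lambda p: [0 if v is None else v for v in p],
    cost=sum,
)
print(sum(start), "->", sum(result))
```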