Dynamic Programming – Thiago Serra

Constraint Programming (CP) is a technique with particular strength to solve classes of combinatorial problems for which global constraints were identified and exploited in the design of specialized algorithms. However, I have been convinced that CP is, above all, a general model to understand combinatorial problems and their solution – using the term ‘model’ in a broader sense, as explained in a previous post and based on John Hooker’s article at CP Letters.

Regardless of the solving method used, the classic aim of CP is to tackle the Constraint Satisfaction Problem (CSP), which ultimately encodes any problem involving variables with discrete and finite values as the intersection of the regular languages defined by each of its constraints. When special properties are observed in a class of such languages that would allow us to make strong and efficient inferences among those variables, we have found one of those so-called global constraints. Not surprisingly, the concept is in fact a formalization of the widespread practice of exploiting problem structure, which was also discussed in a previous post linking the implementation of local search operators to CP concepts.

In spite of such claimed ‘agnosticism’ regarding solving methods, CSP solving is often associated with backtracking search algorithms. The rationale is that, if compared to other general solving techniques like dynamic programming, backtracking scales better in memory use with the size of the problem being solved. As a consequence, CP lectures and many lines of research are developed with the premise of using backtracking-based search. In practice, however, that post on CP and local search also points that the converse is true: local search methods like variable neighborhood search have been successfully used in commercial CP solvers.

One of such lines in which backtracking was extensively studied is the adaptive selection of algorithms, which is aimed to select the most suitable algorithm to solve a given instance according to predictive models of performance. From the characterization of CSP instances by means of properties like constrainedness, which is the ratio of pairwise constraints to variables, empirical hardness models were developed that also documented easy-hard-easy phase transitions. In short, it is easy to solve an unconstrained problem or to infer that an overconstrained one is unfeasible, but those in the ‘feasibility frontier’ are much harder to solve. A short reference about the topic is the 2005 article of Carla Gomes and Bart Selman in Nature entitled ‘Can get satisfaction’, and a precise figure of such hardness can be seen in the graph at slide number 6 of Kevin Leyton-Brown’s talk at the Workshop on Beyond Worst-Case Analysis.

Although a well-documented phenomenon, until very recently it was yet unclear to me why this happens. I was only convinced of it this week, when I came across some lecture notes of KTH Professor Jakob Nordstrom on proof complexity that discussed the resolution of formulas for the pigeonhole principle. Proof complexity is a discipline concerned with understanding the hardness of theorem proving, and comprises the study of characteristics and strengths of different proof systems as well as their equivalence. Among such systems is resolution, which is equivalent to solving through backtracking a CSP in which variables can only assume two values – true or false, or 0 or 1 – and constraints only use the logical operators NOT, OR, and AND in Conjunctive Normal Form (CNF): a set of clauses using NOT and OR joined by AND. This is a special form of the satisfiability problem, or simply SAT, for which the backtracking search with only local inference corresponds to the well-known DPLL algorithm.

The pigeonhole principle states that it is not possible to assign n pigeons into m pigeonholes without putting two of them in the same pigeonhole if n > m. It turns out that encoding an instance of this problem in polynomial size in n and m such that n = m+1 as a satisfiability problem (SAT), i.e., a CSP with variable domains of size two and constraints built of logical operators in CNF, results in an unsatisfiable problem that requires an exponential lower-bound resolution. That happens because a valid refutation requires exploring a large number of combinations of assignments to prove that none would lead to the satisfaction of the problem constraints. [check the erratum at the end of the post]

In short, backtracking really needs global constraints!

Erratum

According to the first comment to this post, there is an enconding for which the mentioned refutation is efficiently found. The discussion is continuned in the sequel of this post.

The INFORMS blog suggested that O.R. bloggers wrote about food. Figuring that a good meal is usually accompanied by a good wine, I’ve decided to focus on using an Operations Research technique to maximize the number of wines someone can taste at a time. I warn in advance to connoisseurs accessing this blog by chance that my knowledge about wine tasting is very short (once in a while, I resume my reading of Jancis Jobson’s book “How to Taste Wine”, but I’m closer to the first pages than to the last ones). Anyway, I hope that some of them find dynamic programming useful for their practice.

First of all, how wine should be tasted? According to a book that I just browsed during lunch time, the following rules must be followed:

white before red;
young before old;
light before heavy;
dry before sweet.

To simplify matters, I will assume that those rules are unbreakable (are they?), I will ignore that it is recommended to taste only similar wines each time, and get to the following question: under such circumstances and provided a collection of bottles, how can I maximize the number of tastings one can do at a time?

Let’s consider as an example the following wines, which this novice considered good and attempted to roughly classify in a binary way:

W1	Argentina	Finca Martha 878 Malbec 2008	red, young, heavy, dry
W2	Brazil	Miolo Gammay 2010	red, young, light, dry
W3	Brazil	Terranova Late Harvest Moscatel 2005	white, young, heavy, sweet
W4	Brazil	Terranova Shiraz 2010	white, young, light, dry
W5	Chile	Casillero del Diablo Carmenère 2009	red, young, heavy, dry
W6	Portugal	Dão Cabriz 2007	red, young, heavy, dry
W7	Portugal	Ramos Pinto Late Bottled Red Port 2000	red, old, heavy, sweet
W8	Portugal	Sandeman White Port 2005	white, young, heavy, sweet
W9	South Africa	Obikwa Pinotage 2008	red, young, light, dry

Without loss of generality and for the sake of breaking ties to avoid equivalent solutions (e.g., tasting W2 before W9 or W9 before W2), we will consider that one must proceed incrementally another in the case of a tie (i.e., W2 before W9 but not W9 before W2).

Now suppose that we start with W3 because it is white and young. Soon we will realize that only two wines can remain in our list for being also heavy and sweet: W8 and W9. Hence, W3 might not be a good starting point. However, it is easy to figure that the optimal path from W3 on is to taste W8 and then W9 because the former is white and the later red. Similarly, the optimal path starting from W8 is to proceed to W9, and from W9 is to do nothing.

Beyond the wines, do you “smell” something interesting here? We have overlapping subproblems and those optimal solutions share optimal substructures with each other. That’s where Dynamic Programming (DP) fits in! Using DP, we consider optimal solutions to varied subproblems as building blocks to find optimal solutions to increasingly bigger problems. Thus, even if those subproblems arise many times, it suffices to solve each of them once.

In the current case, we could do that by finding the best option between the following subproblems: Pi = “How many wines can I taste if I start from Wi?”, for i = 1 to 9. In turn, answering to each of those questions consists of adding one to the best answer found among wines that can be tasted after that first one. For instance, we start with W1, W2, …, or W9 to find the answers P1, P2, …, and P9. Picking W1, we have that P1 = 1 + MAX(P5, P6, P7) because W1 can only be followed by P5, P6 or P7. Note that once we answered P1, we already know the answers to P5, P6 and P7, and therefore we do not need to recalculate them in the remainder of the solving process. The act of memorizing such solutions for later recover is called memoization.

Applying DP to the current case, we will find the following answer to the subproblems:

P1	P2	P3	P4	P5	P6	P7	P8	P9
4	6	3	7	3	2	1	2	5

Working backwards, we start from W4 (P4=7) to find which wine can that can be tasted after W4 and from which point on it is possible to taste 6 wines, and so on until the last one. The final answer to our problem is the sequence W4, W2, W9, W1, W5, W6, W7.

As a final remark, I would like to remember that quantity does not mean quality. Drink responsibly and remind that a tasting experience does not necessarily means getting drunk in the end: you can always spit and enjoy the rest of your day in a better shape.

Once said that, “saúde”, “cheers”, or – as my Polish friends from the Erasmus program would say – “na zdrowie”!

Update: Shiraz is a grape that produces red wine, not white. Anyway, it is still possible to taste 7 out of the 9 wines at once.

Tag: Dynamic Programming

Why backtracking alone cannot suffice to solve combinatorial problems: CP perspective, empirical hardness models evidence, and proof complexity results

Tasting more wine with dynamic programming