Tasting more wine with dynamic programming

The INFORMS blog suggested that O.R. bloggers write about food. Figuring that a good meal is usually accompanied by a good wine, I’ve decided to focus on using an Operations Research technique to maximize the number of wines someone can taste in a single session. I warn any connoisseurs who stumble upon this blog that my knowledge about wine tasting is very limited (once in a while, I resume reading Jancis Robinson’s book “How to Taste Wine”, but I’m closer to the first pages than to the last ones). Anyway, I hope that some of them find dynamic programming useful for their practice.

First of all, how should wine be tasted? According to a book that I browsed during lunch time, the following rules must be followed:

  • white before red;
  • young before old;
  • light before heavy;
  • dry before sweet.

To simplify matters, I will assume that those rules are unbreakable (are they?) and ignore the recommendation to taste only similar wines in each session. That leads to the following question: under such circumstances and given a collection of bottles, how can one maximize the number of wines tasted in a single session?

Let’s consider as an example the following wines, which this novice considered good and attempted to roughly classify in a binary way:

W1 Argentina Finca Martha 878 Malbec 2008 red, young, heavy, dry
W2 Brazil Miolo Gamay 2010 red, young, light, dry
W3 Brazil Terranova Late Harvest Moscatel 2005 white, young, heavy, sweet
W4 Brazil Terranova Shiraz 2010 white, young, light, dry
W5 Chile Casillero del Diablo Carmenère 2009 red, young, heavy, dry
W6 Portugal Dão Cabriz 2007 red, young, heavy, dry
W7 Portugal Ramos Pinto Late Bottled Red Port 2000 red, old, heavy, sweet
W8 Portugal Sandeman White Port 2005 white, young, heavy, sweet
W9 South Africa Obikwa Pinotage 2008 red, young, light, dry

Without loss of generality, and to break ties between equivalent solutions (e.g., tasting W2 before W9 or W9 before W2), we will assume that wines with identical classifications must be tasted in increasing order of their index (i.e., W2 before W9 but not W9 before W2).

Now suppose that we start with W3 because it is white and young. We soon realize that, since W3 is also heavy and sweet, only two wines can remain in our list: W8 and W7. Hence, W3 might not be a good starting point. However, it is easy to see that the optimal path from W3 on is to taste W8 and then W7, because the former is white and the latter red. Similarly, the optimal path starting from W8 is to proceed to W7, and from W7 it is to do nothing.

Beyond the wines, do you “smell” something interesting here? We have overlapping subproblems, and the optimal solution to each of them is built from optimal solutions to smaller ones (optimal substructure). That’s where Dynamic Programming (DP) fits in! With DP, optimal solutions to smaller subproblems serve as building blocks to find optimal solutions to increasingly bigger problems. Thus, even if those subproblems arise many times, it suffices to solve each of them once.

In the current case, we could do that by finding the best option among the following subproblems: Pi = “How many wines can I taste if I start from Wi?”, for i = 1 to 9. In turn, answering each of those questions consists of adding one to the best answer found among the wines that can be tasted right after that first one. For instance, we start with W1, W2, …, or W9 to find the answers P1, P2, …, and P9. Picking W1, we have that P1 = 1 + MAX(P5, P6, P7) because W1 can only be followed by W5, W6 or W7. Note that once we have answered P1, we already know the answers to P5, P6 and P7, and therefore we do not need to recalculate them in the remainder of the solving process. Storing such solutions for later reuse is called memoization.

Applying DP to the current case, we find the following answers to the subproblems:

P1 P2 P3 P4 P5 P6 P7 P8 P9
4 6 3 7 3 2 1 2 5

Working backwards, we start from W4 (P4 = 7), find a wine that can be tasted after W4 and from which it is possible to taste 6 wines, and so on until the last one. The final answer to our problem is the sequence W4, W2, W9, W1, W5, W6, W7.
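The recursion and the backward reconstruction can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not the code behind the answer above: each rule is encoded as a 0/1 attribute (0 meaning “comes first”), W4 keeps the white classification from the list, and the names WINES, can_follow, P and best_sequence are hypothetical. Memoization comes from functools.lru_cache.

```python
from functools import lru_cache

# 0/1 encoding of the four rules: colour (0 white, 1 red), age (0 young, 1 old),
# body (0 light, 1 heavy), sweetness (0 dry, 1 sweet), as classified in the list above.
WINES = {
    1: (1, 0, 1, 0),  # Finca Martha 878 Malbec 2008
    2: (1, 0, 0, 0),  # Miolo Gamay 2010
    3: (0, 0, 1, 1),  # Terranova Late Harvest Moscatel 2005
    4: (0, 0, 0, 0),  # Terranova Shiraz 2010 (listed as white)
    5: (1, 0, 1, 0),  # Casillero del Diablo Carmenère 2009
    6: (1, 0, 1, 0),  # Dão Cabriz 2007
    7: (1, 1, 1, 1),  # Ramos Pinto Late Bottled Red Port 2000
    8: (0, 0, 1, 1),  # Sandeman White Port 2005
    9: (1, 0, 0, 0),  # Obikwa Pinotage 2008
}


def can_follow(i, j):
    """True if wine j may be tasted after wine i without breaking any rule."""
    a, b = WINES[i], WINES[j]
    if a == b:
        return j > i  # tie-break: identical wines go in increasing index order
    return all(x <= y for x, y in zip(a, b))


@lru_cache(maxsize=None)  # memoization: each P(i) is computed only once
def P(i):
    """Maximum number of wines that can be tasted starting from wine i."""
    return 1 + max((P(j) for j in WINES if can_follow(i, j)), default=0)


def best_sequence():
    """Start at the largest P(i), then repeatedly move to a successor with P one smaller."""
    current = max(WINES, key=P)
    sequence = [current]
    while P(current) > 1:
        wanted = P(current) - 1
        current = next(j for j in WINES if can_follow(current, j) and P(j) == wanted)
        sequence.append(current)
    return sequence


print([P(i) for i in WINES])  # [4, 6, 3, 7, 3, 2, 1, 2, 5]
print(best_sequence())        # [4, 2, 9, 1, 5, 6, 7]
```

Changing W4 to (1, 0, 0, 0) and calling P.cache_clear() gives [2, 4, 9, 1, 5, 6, 7] in this sketch, which is still seven wines.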

As a final remark, I would like to remind you that quantity does not mean quality. Drink responsibly and remember that a tasting experience does not necessarily mean getting drunk in the end: you can always spit and enjoy the rest of your day in better shape.

That said, “saúde”, “cheers”, or, as my Polish friends from the Erasmus program would say, “na zdrowie”!

 

Update: Shiraz is a grape that produces red wine, not white. Anyway, it is still possible to taste 7 out of the 9 wines at once.

Resolutions to optimize O.R. blogging

My blog has been on air for almost one year. Despite its modest audience, lots of data has been stored about its visits, and it would be ironic if a blog about Operations Research and Analytics did not use such data to improve itself. Based on some data from 2011, I’d like to commit myself to giving the audience more of what they expect in 2012 and to share some conclusions with other bloggers interested in doing the same.

Top 5 most viewed posts (out of 26):

# 1 Drug discovery optimization: a meeting point for data mining, graph theory and operations research
(204 unique views)

Context:

  • Motivated by an INFORMS blog challenge.
  • It is something I like and worked with in the past.
  • I found a catchy title (I guess).
  • It was a family work (my mother-in-law has a Ph.D. in organic chemistry).
  • Referenced by the SYSOR Reddit channel (that made a huge difference).

# 2 Revisiting operations research elements – part I: problem, model and solution
(133 unique views)

Context:

  • Motivated by crazy discussions about what a problem, a model and a solution are.
  • I read a lot before writing.
  • It was a family work [x2] (the discussions were started by my mother-in-law and Sabrina read my drafts until they were clear to someone outside the field).
  • People look for those things on Google.

# 3 How Analytics makes Operations Research the next big thing
(111 unique views)

Context:

  • Motivated by an INFORMS blog challenge [x2].
  • It has something to do with my job.
  • I found a catchy title (I guess) [x2].
  • People look for those things on Google [x2].

# 4 Optimizing Public Policies for Urban Planning
(84 unique views)

Context:

  • Motivated by an INFORMS blog challenge [x3].
  • It is something I like and worked with in the past [x2].
  • It was a family work [x3] (Sabrina has a degree in urban planning).
  • People look for those things on Google [x3].

# 5 When the Network becomes Social: Small World Graphs and O.R.
(67 unique views)

Context:

  • Motivated by an INFORMS blog challenge [x4].
  • I read a lot before writing [x2].
  • I found a catchy title (I guess) [x3].

Lessons learned:

  • People love creative applications of O.R.
  • Writing about what you like the most helps you write better.
  • Listening to a person around you is worth reading a dozen papers.
  • You can learn a lot by studying the topic you want to post about in more depth.
  • It is important to be direct and concise, and to provide resources to those interested in more.
  • Participating in INFORMS blog challenges is a win-win strategy.

Resolutions for 2012:

  • Write more posts like those above.
  • Use more visual resources and hands-on materials.

Over-constrained problems, soft constraints and family holiday parties – and why some companies ask for O.R. support

People are having fewer children, families are becoming smaller but some combinatorial problems involving them are becoming harder to solve. New families are facing harder planning and scheduling problems during Christmas and other holidays than their parents or grandparents ever did. Anyway, that’s an interesting way to explain what over-constrained problems and soft constraints are.

Suppose that you are the head of a family and you decide to throw a party on Christmas Eve or a banquet on the day after. One or two generations ago, it was not that hard: people used to live closer and have lots of children (I mean, more than two at least). In such a case, it would not be a disaster if five out of your nine sons were not able to come over. It might be the case that families sharing common members agreed on celebrating at different times. Anyway, the other parties would be so close to yours that everybody would eventually drop by at some point.

However, with fewer children, people easily moving far away to pursue a career or to retire, and divorced parents and grandparents running concurrent parties (maybe four grandparents married to four step-grandparents sharing a single grandchild), we must agree that pleasing everybody might become impossible.

That’s roughly what happens when some companies look for the help of an Operations Research consultant: they have a set of resources to produce goods or deliver services to their customers, and they are not sure whether it has become impossible to meet the increasing demand with what they have or whether they are just having a harder time finding a solution.

It might be the case that some constraints are not as important as they appeared to be. An over-constrained problem is a problem upon which so many constraints are imposed that every possible solution is ruled out. Looking carefully at the set of constraints, one might realize that some of them represent desirable but not mandatory situations, in which case they are what we call soft constraints: instead of being enforced, their violation is penalized, so that the best solution violates them as little as possible.

In the case of the families, what does a couple with no children do with four parties in four cities at two different times? At least in my case, we have to split up in order to cover them all. In the future, we aim to tackle this problem by throwing the party ourselves. 🙂
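The family example can be written as a tiny combinatorial problem. In this hypothetical sketch (the party names and time slots are made up, and brute-force enumeration stands in for a real solver), each member of a couple attends exactly one party per time slot. Requiring both “visit every party” and “always stay together” as hard constraints leaves no feasible plan; turning one of them into a soft constraint, that is, an objective to optimize, brings solutions back.

```python
from itertools import product

# Hypothetical instance: four family parties, two at each of two time slots.
SLOTS = {
    "Christmas Eve": ("Party A", "Party B"),
    "Christmas Day": ("Party C", "Party D"),
}
PEOPLE = ("me", "partner")
ALL_PARTIES = {party for parties in SLOTS.values() for party in parties}


def plans():
    """Yield every plan: each person attends exactly one party per time slot."""
    slot_names = list(SLOTS)
    schedules = list(product(*(SLOTS[s] for s in slot_names)))
    for choice in product(schedules, repeat=len(PEOPLE)):
        yield {person: dict(zip(slot_names, sched)) for person, sched in zip(PEOPLE, choice)}


def parties_covered(plan):
    """Set of parties visited by at least one member of the couple."""
    return {party for schedule in plan.values() for party in schedule.values()}


def slots_apart(plan):
    """Number of time slots in which the couple is at different parties."""
    return sum(1 for s in SLOTS if len({plan[p][s] for p in PEOPLE}) > 1)


# Both requirements as hard constraints: the problem is over-constrained.
hard = [p for p in plans() if parties_covered(p) == ALL_PARTIES and slots_apart(p) == 0]
print(len(hard))  # 0

# Soften "stay together": keep full coverage as hard, minimize time spent apart.
covering = [p for p in plans() if parties_covered(p) == ALL_PARTIES]
best = min(covering, key=slots_apart)
print(slots_apart(best))  # 2

# Or soften coverage instead: stay together, maximize the number of parties visited.
together = [p for p in plans() if slots_apart(p) == 0]
print(max(len(parties_covered(p)) for p in together))  # 2
```

Which constraint to soften is a modeling decision, not a mathematical one: the first relaxation is the “split up” answer above, the second is the “pick two parties” answer.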

Primal and dual valuation of our natural resources using O.R.

There are many ways in which one can see the importance of O.R. in protecting our environment, many of them dealing with optimization problems aimed directly at reducing the costs of preventing its destruction and so on. However, what about the environmental impacts of our patterns of consumption? Shall we change our way of living dramatically, or rather find a balance between what we want and what we can take from our environment? Maybe O.R. can help us with that.

Roughly speaking, Operations Research (O.R.) deals mostly with finding the best way of doing something subject to many different kinds of restrictions. Thus, one can indirectly take the protection of the environment into account whilst solving a wide range of optimization problems related to the daily needs of our society. To that end, I see two possibilities:

  • pricing natural resources appropriately, as a deduction from the profit of the operation;
  • limiting their use so as to avoid taking the share that belongs to future generations.

I’ve already written about the first possibility in my post about optimizing public policies for urban planning. My fiancée and I devised a model that accounts for the environmental costs of daily commuting when subsidizing low-income housing units in different parts of the city. However, finding the right data to run the model turned out to be our biggest problem. When one defines a penalty for the environmental impact of an operation, for instance using the value of carbon credits, it enters the problem as a cost. However, sometimes we might not have the data to price it. But if the consumption of a given natural resource is limited by a constraint instead of penalized in the profit, it is still possible to estimate the economic importance of that resource through duality. Therefore, let’s take a look at the second possibility as an alternative way of finding the price of natural resources, as well as of avoiding their excessive use.

The concept of duality in linear programming allows us to associate costs with our constraints. Suppose that our optimization problem is about finding how much of each kind of good to produce in order to maximize the profit they generate, subject to the limited resources available. The dual of this problem consists of finding a price for each unit of our limited resources such that, for every good, the resources it consumes are worth at least the profit it yields, while the total value of the available resources is as small as possible. At such prices, selling the resources would be at least as profitable as processing them. The relationship between those two problems is quite strong: at the optimum both have the same value, and if a resource is not used up to its limit, its dual price is zero, meaning that it has no economic importance according to the model. Therefore, duality can help us estimate how much a limited resource is worth (if anything) and thus provide a way of valuing resources according to their scarcity and importance.
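To make the primal/dual relationship concrete, here is a sketch with a hypothetical two-good, three-resource production problem (textbook-style toy numbers, not data from any real case). Instead of a real LP solver, it enumerates the vertices of the feasible region, which only works for two decision variables. It then estimates each resource’s price by adding one unit of that resource and re-solving. This finite-difference trick matches the dual prices here because the same constraints remain binding after each one-unit increase. In general, LP solvers report dual values directly.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy instance: two goods, three limited resources.
PROFIT = (3, 5)                   # profit per unit of good 1 and good 2
USAGE = ((1, 0), (0, 2), (3, 2))  # units of each resource consumed per unit of each good
CAPACITY = (4, 12, 18)            # available units of each resource


def solve_primal(capacity):
    """Maximize total profit with x1, x2 >= 0 by checking every vertex (two variables only)."""
    lines = [(Fraction(a1), Fraction(a2), Fraction(b)) for (a1, a2), b in zip(USAGE, capacity)]
    # Write x1 >= 0 and x2 >= 0 as -x1 <= 0 and -x2 <= 0 so all constraints share one form.
    lines += [(Fraction(-1), Fraction(0), Fraction(0)), (Fraction(0), Fraction(-1), Fraction(0))]
    best = None
    for (a, b, c), (d, e, f) in combinations(lines, 2):
        det = a * e - b * d
        if det == 0:
            continue  # parallel boundaries do not meet at a vertex
        x1 = (c * e - b * f) / det  # Cramer's rule on the two boundary equations
        x2 = (a * f - c * d) / det
        if all(p * x1 + q * x2 <= r for p, q, r in lines):
            value = PROFIT[0] * x1 + PROFIT[1] * x2
            if best is None or value > best[0]:
                best = (value, x1, x2)
    return best


def shadow_prices(capacity):
    """Extra profit from one more unit of each resource (finite difference on the primal)."""
    base = solve_primal(capacity)[0]
    prices = []
    for i in range(len(capacity)):
        bumped = list(capacity)
        bumped[i] += 1
        prices.append(solve_primal(bumped)[0] - base)
    return prices


value, x1, x2 = solve_primal(CAPACITY)
prices = shadow_prices(CAPACITY)
print("profit", value, "at x1 =", x1, "x2 =", x2)  # profit 36 at x1 = 2 x2 = 6
print("prices", [str(p) for p in prices])          # prices ['0', '3/2', '1']
print("resource value", sum(p * b for p, b in zip(prices, CAPACITY)))  # resource value 36
```

Only 2 of the 4 units of the first resource are used, so its price is zero, as complementary slackness predicts. At these prices, the total value of the resources equals the maximum profit, as strong duality predicts.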

As a matter of fact, the more you understand the relation between primal and dual problems, the easier it becomes to talk face-to-face with economists. Indeed, this topic has a lot to do with the 1975 Nobel Prize in Economics, awarded to Kantorovich and Koopmans. If you want to know more about that, their prize lectures are a good starting point.

This post was written for September’s INFORMS blog challenge: O.R. and the Environment.

When the Network becomes Social: Small World Graphs and O.R.

Mathematicians have been studying graphs for a long while. Sociologists found out that some of them explain how we interact, and social networks just make the connection more evident to anyone. Along the way, some researchers have been wondering about the following question: can we make optimal decisions based only on local information in a social network?

A world of lines and dots…

Dots and lines connecting pairs of dots: that’s a graph (but we usually say vertices and edges, or nodes and arcs, when talking about them). Mathematicians study graphs because they are structures capable of modeling lots of relationships among entities. Sometimes they wonder whether a property found in a certain graph implies another one, proving theorems in Graph Theory. Other times they want to leverage those properties when designing algorithms that manipulate certain types of graphs, as in Combinatorial Optimization. As a matter of fact, that is not an isolated case: many researchers handling real-world problems aim at designing algorithms with outstanding performance for the most common instances they expect to solve.

… and the world of people!

Many people have already heard about the “six degrees of separation” principle, which states that, on average, you can reach any person in the world through a chain of six people who know one another. This “magical number” emerged from experiments by Stanley Milgram and others during the 1960s, in which they asked a person in the U.S. to deliver a letter to a target person by sending it to an acquaintance whom he or she believed to be closer to that target. Theoretical results point to something similar: in a random graph with a fixed average degree, the average shortest distance between pairs of vertices grows proportionally to the logarithm of the number of vertices, which means it increases very slowly as graphs get bigger and bigger. However, that is not true for every graph. Instead, people started looking at a more specific class of graphs called Small World Graphs, which are supposed to be representative of a number of situations.

Small World Graphs to be explored everywhere

Small World Graphs can be thought of as a combination of lattices (grids of edges) and clusters or quasi-clusters (groups in which almost all edges exist among vertices), with a small average degree (number of edges per vertex). The former property ensures that the graph is connected and that it is possible to find a path between any pair of vertices. The latter has to do with the fact that two vertices sharing an edge with a third one are more likely to share an edge between them. Think about it: you might know some people from your university and almost everyone from your department, whereas each of your colleagues is likely to have some long-range connections with researchers sharing a common interest worldwide; all of that together means that you don’t need many steps to reach most researchers in the world. The same holds for airports: your local airport might be connected to a number of airports in other cities of your country, and some of those are connected to airports worldwide, so that you can attend meetings anywhere without worrying too much about how to get there. And if you do need to think about it, you will probably come up with a very good route, won’t you?
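A common way to build graphs with both properties is the Watts-Strogatz model, which this post does not discuss. You start from a ring lattice in which every vertex is linked to its k nearest neighbours, then rewire a small fraction p of the edges at random. The sketch below is a simplified version of that construction in plain Python. The function names, sizes and seed are illustrative assumptions. It measures the two quantities behind the discussion above: the clustering coefficient (how often two neighbours of a vertex are also neighbours) and the average shortest-path length.

```python
import random
from collections import deque
from itertools import combinations


def ring_lattice(n, k):
    """Ring of n vertices, each linked to its k // 2 nearest neighbours on each side."""
    graph = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            w = (v + step) % n
            graph[v].add(w)
            graph[w].add(v)
    return graph


def rewire(graph, p, rng):
    """With probability p, move one end of each original edge to a random new vertex."""
    n = len(graph)
    for u, w in sorted((u, w) for u in graph for w in graph[u] if u < w):
        if rng.random() < p:
            candidates = [x for x in range(n) if x != u and x not in graph[u]]
            if candidates:
                new = rng.choice(candidates)
                graph[u].discard(w)
                graph[w].discard(u)
                graph[u].add(new)
                graph[new].add(u)
    return graph


def clustering(graph):
    """Average share of each vertex's neighbour pairs that are also linked to each other."""
    total = 0.0
    for v, neighbours in graph.items():
        d = len(neighbours)
        if d >= 2:
            linked = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
            total += linked / (d * (d - 1) / 2)
    return total / len(graph)


def average_path_length(graph):
    """Mean breadth-first-search distance over all ordered pairs that can reach each other."""
    total = pairs = 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


rng = random.Random(7)
lattice = ring_lattice(500, 6)
small_world = rewire(ring_lattice(500, 6), 0.05, rng)
for name, g in (("lattice", lattice), ("rewired, p = 0.05", small_world)):
    print(f"{name}: clustering {clustering(g):.2f}, average path {average_path_length(g):.2f}")
```

With a small p, the clustering coefficient should stay close to the lattice’s value while the average path length drops considerably. That combination is the small-world signature. The exact numbers depend on the seed.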

Do we always have optimal answers from local network information?

That’s the question that Jon Kleinberg tries to answer in the article “The Small-World Phenomenon: An Algorithmic Perspective”. He claims to have found the class of graphs in which local information suffices to find short chains: grids in which each node also has a long-range contact chosen with a probability that decays with the square of the distance. To be honest, I didn’t read the entire paper (I’m in really busy times), but it sounds really interesting, and I leave that task to the curious reader (let me know about it afterwards).
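To get a feel for Kleinberg’s question, here is a small simulation of his grid model as I understand it. It is a simplified sketch, not code from the paper. Nodes sit on an n × n grid, and each knows its four grid neighbours plus one long-range contact drawn with probability proportional to d^(-r), where d is the grid distance. A message is forwarded greedily to whichever known contact is closest to the target, which uses only local information. Kleinberg’s result is that, in two dimensions, this rule yields short chains (polylogarithmic in n, in expectation) when r = 2. The grid size, seed and function names are illustrative choices.

```python
import random


def lattice_distance(u, v):
    """Manhattan (grid) distance between two nodes."""
    return abs(u[0] - v[0]) + abs(u[1] - v[1])


def local_contacts(u, n):
    """The up-to-four grid neighbours of node u in an n x n grid."""
    i, j = u
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return [(i + di, j + dj) for di, dj in steps if 0 <= i + di < n and 0 <= j + dj < n]


def long_range_contacts(n, r, rng):
    """Give every node one extra contact v, drawn with probability proportional to d(u, v) ** -r."""
    nodes = [(i, j) for i in range(n) for j in range(n)]
    contacts = {}
    for u in nodes:
        others = [v for v in nodes if v != u]
        weights = [lattice_distance(u, v) ** -r for v in others]
        contacts[u] = rng.choices(others, weights=weights, k=1)[0]
    return contacts


def greedy_route(source, target, n, contacts):
    """Each holder forwards the message to its own contact that is closest to the target."""
    path = [source]
    while path[-1] != target:
        u = path[-1]
        options = local_contacts(u, n) + [contacts[u]]
        path.append(min(options, key=lambda v: lattice_distance(v, target)))
    return path


rng = random.Random(2024)
n = 20
contacts = long_range_contacts(n, r=2, rng=rng)
path = greedy_route((0, 0), (n - 1, n - 1), n, contacts)
print(len(path) - 1, "hops; lattice distance", lattice_distance((0, 0), (n - 1, n - 1)))
```

The route always terminates, because some grid neighbour is always one step closer to the target. It is never longer than the plain grid distance, and any long-range shortcut can only shorten it.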

This post was prompted by the INFORMS Blog Challenge of July: OR and Social Networking. You can check all the submitted entries by August 4th.

The model as a spell and the solver as a wand: O.R. magic for a muggles’ world

Who cares about O.R. magic?

When I once told my sister that my former job was to put more fridges on each truck to save delivery trips (something that many of my colleagues consider a joyful job), she couldn’t have been less interested. Maybe I should have tried magic metaphors, describing models as spells, solvers as wands and programming contests as Quidditch games for students. Except for those interested in profits and costs, operations research practice sounds really boring to the general audience.

Who believes in O.R. magic?

We are surrounded by optimization problems that are usually overlooked. As a result, tackling one of them might look like plain witchcraft to an outsider: how come costs were reduced by 5% or profit raised by 20% just like that? Of course, such witchcraft may have to compete with quack consultants selling a single system supposedly capable of solving whatever problem the client has. Apart from a fraction of executives and engineers, many people regard O.R. with unfamiliarity or suspicion, which means a lot of lost opportunities.

How to bring them in or back to OuR magic?

Paul Rubin had many insights about that: on his blog, he presented a very sound analysis of “hitting muggles”, targeting high-level executives, business students and small organizations. Indeed, I have been attending training classes at Petrobras along with many recently hired young economists, most of whom know little about operations research but are very interested in it. I hope they enjoy the O.R. lectures to be held.

Nevertheless, I would like to argue for a holistic education about O.R. for engineers and IT professionals. O.R. is so diversified that practitioners in some of its fields are not fully aware of the existence of others that would suit their needs. Moreover, complex software systems are very likely to require O.R. at some point, but system analysts and system architects might not be aware of that. In both cases, an interesting application, if not ignored, might be approached with the wrong spell or wand! As much as I believe in magic, I know that I’m a muggle sometimes.

How Analytics makes Operations Research the next big thing

Engineers enjoy laughing at buzzwords that they don’t sell. Despite that, some buzzwords represent important paradigm shifts. They might not propose any technical novelty, but they do help empower our methodologies by valuing the presence of certain skilled specialists in large-scale projects. Analytics is one of them: it represents the application of IT to support business decision processes. This post aims to show that its existence can help leverage O.R. practice in the industry.

The O.R. filet: there is no such thing as a free lunch!

Who has never dreamed of that job in which all you have to do is what you do for fun? Suppose that you are an O.R. analyst hired by a company which provides you with perfect data and a well-defined problem that you know how to tackle. And that’s not all: they do not underestimate the amount of effort that the project will demand from you and your co-workers. In such a perfect world, you just pick that traveling salesman or bin packing problem with those idealistic instances and spend some time experimenting with your favorite techniques until you are satisfied with the results. You would probably finish your work very early and have the rest of the day to share a beer and French fries with your friends at the bar (if that happens in São Paulo).

Back to real life realm: from problem solver to problem finder

Unfortunately, there is a huge gap between being hired and having that well-defined problem and that perfect data. That is, if you manage to reach that point. If companies already had all of that figured out, they would probably have gone further with an in-house approach to their decision problems. In that case, the benefit of an external OR consultant’s work would often be quite modest. Hence, one must keep in mind that the work is not only about solving an optimization problem but also about helping the company understand what the problem is and how to collect data to properly solve it.

Some interesting discussions about those issues have been recently raised by a couple of OR professionals called Patricia and Robert Randall on their blog Reflections on Operations Research. They have a blog post about data cleanup and two other posts about understanding what is the right solution for the client’s problem (by the way, I’m waiting for the promised sequel – check the first and the second posts).

And then the O.R. team becomes the Analytics division…

What I described above reflects the change that is going on in industry, including my workplace. The O.R. team is no longer called only once someone “magically” finds an optimization problem that must be tackled within an IT project. By “magic”, I mean that someone working on a project knew about O.R. by chance and decided to invite an O.R. analyst to check it. Instead, new projects are supposed to pass through a preliminary assessment of the need for an O.R. approach. The analytics professional comes onto the scene to complement the team of software architects, software engineers, data modelers, project managers and stakeholders of any non-ordinary project. The role of that professional is to understand how the system can be used to support business decision-making and to define whether statistics, data mining or operations research tools are required to accomplish that. Such an assessment prevents anything from going unnoticed or being misunderstood and, of course, creates lots of interesting opportunities for O.R. professionals, both during the assessment and later in the project development phase. As a matter of fact, we have plenty of people ready for the job, as I wrote last month in a post about the O.R. job market in Brazil.

A win-win scenario: let’s spread the word about Analytics to empower O.R.!

An Analytics assessment of strategic projects would encourage a broader application of Operations Research, which usually means maximizing profit and reducing costs. Moreover, there is a huge workforce available for the demand that such a paradigm shift would create, including me and probably you. So let’s make that happen!

This post is my contribution to the INFORMS’ blog challenge of May: O.R. and Analytics. The INFORMS’ blog challenge consists of a monthly topic about O.R. that is proposed at the INFORMS’ blog. If you happen to write about the topic of the month, send an e-mail to them to get your post mentioned.

Drug discovery optimization: a meeting point for data mining, graph theory and operations research

How can I help finding the best compound to save a life?

At a glance, the answer is what research on optimizing drug design is about. Its main goal is to make drug development cheaper and faster while producing more effective and less toxic pharmaceuticals. It is an interesting topic that involves much of what business analytics and optimization professionals use daily to solve problems in many other areas. For that reason, I decided to write about it for April’s INFORMS Blog Challenge: O.R. in Health Care.

Drug design basics: in vitro vs. in silico methods and QSAR analysis

Before a drug can be prescribed to humans, it must go through a careful evaluation. However, it would not be reasonable or even feasible to test every possible compound for each desired effect, especially if such tests involve animals. Pharmaceutical researchers are interested in screening molecules from huge data sets, so that only the molecules most likely to work are tested in vitro, then in animals and ultimately in humans. They accomplish that with computer-aided predictive analysis, commonly called in silico tests.

In silico tests are designed on the belief that there is a valid theory to extrapolate quantitatively the activity of a compound from similar ones, using tools like Quantitative Structure-Activity Relationship (QSAR) analysis. The goal of a QSAR analysis is to find a function capable of predicting the activity of the main molecule of a compound based on the presence and quantity of certain molecular substructures. We are then led to the quest for a way of representing molecules and of using available data about an activity to create reliable predictions for similar molecules.

The role of data mining and molecules as graphs

For many purposes, a molecule is just a graph with vertices and edges. However, the fact that molecular data are more structured does not mean that the problem is easier. Roughly* deciding whether one graph can be found inside another one is known as the Sub-Graph Isomorphism problem. That problem is NP-Complete, which means that no efficient (polynomial-time) algorithm is known to solve it.

(* By roughly, I ask you to forgive the lack of formality in which I defined sub-graph isomorphism: graphs are not said to be inside each other. A one-to-one correspondence between the vertices of two graphs that preserves their edges is called an isomorphism, and the problem asks whether one graph is isomorphic to a sub-graph of the other. However, had I started with that, I would have lost most of the audience.)

More than just finding a sub-graph isomorphism, one is usually interested in finding the most frequent sub-graphs. One interesting approach to Frequent Sub-Graph (FSG) mining is gSpan, a DFS-based mining algorithm proposed by Yan and Han. It consists of defining a unique lexicographic encoding for each sub-graph so that it can be counted more easily while exploring molecular data. There is also a controversy about the validity of 2D models such as graphs for modeling molecules, especially because some geometric subtleties distinguish an innocuous molecule from a carcinogenic one. Thus, it is worth noting that predictive models are not intended to replace in vitro tests but just to point out which molecules are most likely to work.
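For intuition, here is a brute-force backtracking check for whether a small pattern graph appears inside a molecular graph. Vertices are labeled by element, and hydrogens and bond orders are ignored. It is only a sketch: the function names and toy molecules are my own illustrative choices, and it is nothing like gSpan. Its running time can grow exponentially with the size of the graphs, which is consistent with the problem being NP-complete. Counting how many molecules contain a pattern gives the pattern’s support, the quantity that frequent sub-graph mining compares against a minimum threshold.

```python
def molecule(elements, bonds):
    """Toy molecular graph: vertex -> element symbol, plus a set of undirected bonds."""
    return dict(enumerate(elements)), {frozenset(bond) for bond in bonds}


def contains(pattern, host):
    """Backtracking test: can every atom and bond of pattern be mapped, one-to-one, into host?"""
    p_atoms, p_bonds = pattern
    h_atoms, h_bonds = host
    order = list(p_atoms)

    def extend(mapping):
        if len(mapping) == len(order):
            return True
        v = order[len(mapping)]
        used = set(mapping.values())
        for candidate, element in h_atoms.items():
            if candidate in used or element != p_atoms[v]:
                continue
            # every bond between v and an already-mapped pattern atom must exist in the host
            if all(frozenset((mapping[u], candidate)) in h_bonds
                   for u in mapping if frozenset((u, v)) in p_bonds):
                mapping[v] = candidate
                if extend(mapping):
                    return True
                del mapping[v]
        return False

    return extend({})


# Heavy-atom skeletons only (hydrogens and bond orders ignored).
ethanol = molecule("CCO", [(0, 1), (1, 2)])
dimethyl_ether = molecule("COC", [(0, 1), (1, 2)])
acetic_acid = molecule("CCOO", [(0, 1), (1, 2), (1, 3)])
c_c_o = molecule("CCO", [(0, 1), (1, 2)])  # pattern: carbon-carbon-oxygen chain

print(contains(c_c_o, ethanol))         # True
print(contains(c_c_o, dimethyl_ether))  # False
print(sum(contains(c_c_o, m) for m in (ethanol, dimethyl_ether, acetic_acid)))  # 2
```

Ethanol and dimethyl ether have the same atoms but different bonds. This is why the check must preserve edges and not just labels.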

How can operations research fit in?

There are a number of O.R. applications for what we have been discussing here.

I would like to mention one interesting application of Linear Programming (LP) to QSAR analysis by Saigo, Kadowaki and Tsuda. Their approach consists of using LP to build a regression function for prediction in which the error is minimized. It is based on positive and negative relations between molecules and activities, that is, either a quantitative relation between a molecule and the activity or a constraint stating that such a relation is not relevant. The decision variables of the problem are the coefficients associated with every possible substructure. Since there can be a huge number of possible substructures, they start with a small problem and use a column generation algorithm that searches for a new variable whose addition to the problem might improve the results.
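As a rough idea of what such a model can look like, here is a generic LP for L1-regularized least-absolute-deviation regression over substructure features. It is my own illustrative formulation, and the exact loss and constraints in the paper may differ. Here y_i is the measured activity of molecule i, x_{ij} is the number of occurrences of substructure j in molecule i, and S is the set of substructures currently in the model. The coefficient of substructure j is α_j^+ − α_j^-, ξ_i bounds the absolute error on molecule i, and λ ≥ 0 trades accuracy for sparsity.

```latex
\begin{aligned}
\min_{\alpha^{+},\,\alpha^{-},\,\xi}\quad & \sum_{i=1}^{m} \xi_i \;+\; \lambda \sum_{j \in S} \left(\alpha_j^{+} + \alpha_j^{-}\right) \\
\text{s.t.}\quad & -\xi_i \;\le\; y_i - \sum_{j \in S} \left(\alpha_j^{+} - \alpha_j^{-}\right) x_{ij} \;\le\; \xi_i, \qquad i = 1, \dots, m, \\
& \alpha_j^{+},\ \alpha_j^{-} \ge 0 \quad (j \in S), \qquad \xi_i \ge 0 \quad (i = 1, \dots, m).
\end{aligned}
```

Each substructure contributes one pair of columns (α_j^+, α_j^-), which is why column generation fits. You solve the LP for a small S and use its dual values to search for a substructure whose column has negative reduced cost. You add that column and repeat until no such substructure is found.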

Final remarks: OR and the invisible good

The fun of many professions is seeing the benefit of what you are doing. That is even more present in health care, since you are dealing with people and their lives. On the other hand, OR is a kind of invisible science, at its best when ordinary people cannot sense its presence. Notwithstanding, OR practitioners enjoy numbers and are glad to know that things are a bit better because of their work behind the scenes. Still, blogging about OR can help people notice the importance of the field in supporting others that are visible to everyone.

I would like to thank my mother-in-law for some helpful advice on writing about QSAR.

 

Optimizing Public Policies for Urban Planning

Another possible title would be “Marrying an urban planner up to her research problems”.

It all happened when I started listening to my fiancée explain the problems of Brazilian housing policy, and it ended with an interesting model for sustainable housing subsidy that she presented at the 2009 meeting of the Latin American Real Estate Society.

The problem: Housing policy in Brazil 

The main housing subsidy for low-income families in Brazil is based on a federal program whose name roughly translates to “My House, My Life”, in which the public subsidy consists of an amount of money for each constructed housing unit that varies solely according to the city.

In a place like São Paulo, the unavoidable consequence of such a policy is to push low-income families even further towards suburban areas.
Most of them already spend much more than two hours per day commuting to work.
Moreover, great distances prevent them from going to work on foot or by bicycle, which raises public transportation demand and decreases their purchasing power.


São Paulo in 2009: Job offer vs. low-income settlement (source: SEMPLA).

What we did: A model for variable subsidy

We started by looking at the main problems caused by placing such families far away:

  • Poor quality of life due to the time wasted commuting.
  • More pollution due to the long commutes.
  • Public expenditures to support the public transportation system.

Instead of creating a complex multi-criteria model to tackle that, we just required that people be placed at most one hour away from 50% of the jobs in the city (seems fair, right?) and considered the criterion by which anything actually gets done: money!

After all, how much does it cost the government if families are so far from their jobs?

  • If the place is too far, one must account for the per-family cost of bringing adequate public transportation infrastructure there.
  • Depending on the mode of transportation, there must be public subsidies, and there is also the carbon footprint cost; both were accounted for two people per family commuting for one generation (25 years).

Thus, if you have a 20K subsidy to place a house or apartment anywhere in the city, it would be fair to raise it in places where nothing else must be done. For instance, we realized that extending a subway line costs about 20K per family, that is, the government actually spends double the subsidy, if not more, once adequate infrastructure is included.
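The cost accounting above can be written down as a few lines of arithmetic. In this sketch, four figures come from the text: the 20K subsidy, the 20K per family for a subway extension, the two commuters per family and the 25-year horizon. The yearly transport subsidy and carbon cost per worker are left as parameters defaulting to zero, because no figures are given here. The fair_subsidy rule (pay up to the cost that a well-located unit saves) is a hypothetical reading of “it would be fair to raise it”, not the formula from the paper.

```python
# From the text: flat 20K subsidy, about 20K per family to extend a subway line,
# two commuters per family, one generation (25 years).
BASE_SUBSIDY = 20_000
WORKERS_PER_FAMILY = 2
YEARS = 25


def cost_to_government(infrastructure_per_family, yearly_transport_subsidy_per_worker=0,
                       yearly_carbon_cost_per_worker=0):
    """Public cost of housing one family at a site: subsidy + infrastructure + recurring costs."""
    recurring = WORKERS_PER_FAMILY * YEARS * (
        yearly_transport_subsidy_per_worker + yearly_carbon_cost_per_worker)
    return BASE_SUBSIDY + infrastructure_per_family + recurring


def fair_subsidy(site_cost, reference_cost):
    """Hypothetical rule: a cheaper site may receive the base subsidy plus the cost it avoids."""
    return BASE_SUBSIDY + max(0, reference_cost - site_cost)


distant = cost_to_government(infrastructure_per_family=20_000)  # needs a subway extension
central = cost_to_government(infrastructure_per_family=0)       # infrastructure already there
print(distant, central)                # 40000 20000
print(fair_subsidy(central, distant))  # 40000
```

Filling in real yearly figures for transport subsidies and carbon costs would only widen the gap between the two sites, and with it the room for a higher subsidy downtown.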


São Paulo’s downtown: A very good infrastructure and many abandoned buildings.

What’s next?

As far as the O.R. community is concerned, it is not a challenging problem in terms of algorithms, but there is a lot of room to improve the modeling.
Unfortunately, we did not apply our model completely due to the lack of data, but we hope to do that someday.
However, it is already possible to see opportunities for reducing public expenditures, and forthcoming approaches would provide integrated decision models for housing subsidy and infrastructure investments.

Speaking of multidisciplinarity: O.R. in the Public Sector

A multidisciplinary approach might start at your own home, and it may point out interesting challenges that academia might not have considered so far.
In Brazil, where public sector decisions are anything but technical, there are big opportunities yet to be explored.

About the paper

We were glad to have had the support of Prof. J.C.S. Gonçalves, who was previously Sabrina’s advisor at FAU-USP and also coauthored this paper.

S. Harris, T. Serra, J.C.S. Gonçalves: Critical Analysis of Brazilian Housing Policy and Proposal of Subsidy Calculation Model Based on Transportation Sustainability Criteria.
In: Proceedings of LARES Conference 2009, São Paulo, Brazil.