CDF-intervals: a probabilistic interval constraint framework to reason about data with uncertainty
School
Supervisors
Abstract
We propose a novel framework, the cummulative distribution function (cdf )-intervals, that intuitively describes data coupled with uncertainty without losing any information given in the problem definition. Our new proposition brings about a construction of two algebraic convex structures: the cdf -intervals and the Probability Box (p-box) cdf-intervals. The two proposed structures were driven by the practical usage of reliable approaches in Constraint Programming (CP) and Operation Research (OR) paradigms. These approaches tackle large scale constraint optimization (LSCO) problems associated with data uncertainty in a tractable manner. The key idea is to bound data observed in the problem definition, then perform the computation only on the bounds using interval reasoning techniques. Output solution set from this process satisfies all possible realization of the data sought. Approaches following the convex modeling commonly treat data in their interval representation with an equal weight, thus they do not reflect any possible degree of knowledge about the whereabouts.
Motivated by bringing more knowledge to the realized solution set, we introduced the cdf -intervals in [Saad, Gervet, and Abdennadher (2010)]. The bounding points, in the cdf -intervals algebraic structure, each is specified by two values: data and its cumulative distribution function (cdf ) [Saad et al. (2010)]. This new structure attempts to represent data in a 2 dimensions (2D) manner, yet the probability distribution (the 2nd dimension) is an approximated representation of the actual distribution. We further extended the cdf -intervals, with the notion of p-box, in order to enclose all available information by two cdf distributions [Saad, Gervet, and Fruehwirth (2012b),Saad, Gervet, and Fruehwirth (2012a),Saad, Fruehwirth, and Gervet (2014) and Saad (2014)]. The bounding distributions are chosen to be uniform in order to ease the computations over the novel algebraic structure. The probabilities, within those bounds, are ranked based on the stochastic dominance. In this work, we define the formal frameworks for constraint reasoning over the cdf -intervals and the p-box cdf -intervals. The modeling and reasoning are constructed within the CP paradigm due to its powerful expressiveness. Moreover, we construct a system of global constraints, over the two algebraic structures, by extending Interval Linear Systems (ILS) with a second dimension (the cdf ). We develop a formal Constraint Logic Programming (CLP) language from the new defined domains and show how the new domains affect the problem variables and the decision process. We implement the new language as a separate solver module in the ECLiPSe constraint programming environment.
The p-box cdf -intervals combine techniques from the convex modeling, to take advantage of their tractability, with approaches revealing quatifiable information from the probabilistc and stochastic world, to take advantage of their expressiveness. We perform a comparison in the data representation and in the reasoning performance over models from the two paradigms and our novel framework. This comparison is further adopted to model two different real-life applications: the Network Traffic Application problem, used in network design problems, and the Inventory Management problem of a manufacturing process. The empirical evaluation of our implementation shows that, with minimal overhead, the output solution set realizes a full enclosure of the data along with tighter bounds on its probabilistic distributions. Solutions sought to be feasible in the real domain are excluded by the p-box cdf -intervals reasoning since they are infeasible because they violate the properties of the cdf -domain. Additional knowledge, on the data whereabouts, gained by the implementation of our novel formal and practical framework gives rise to a wide range of future research work.