
On the Performance Effects of Unbiased Module Encapsulation

R. Paul Wiegand, Gautham Anil, Ivan I. Garibay, Annie S. Wu, and Ozlem O. Garibay
Institute for Simulation & Training; School of Electrical Eng. & Computer Science; Office of Research & Commercialization
University of Central Florida, Orlando, FL 32826
[email protected] [email protected] [email protected] [email protected] [email protected]

ABSTRACT
A recent theoretical investigation of modular representations shows that certain modularizations can introduce a distance bias into a landscape. This was a static analysis, and empirical investigations were used to connect formal results to performance. Here we replace this experimentation with an introductory runtime analysis of performance. We study a base-line, unbiased modularization that makes use of a complete module set (CMS), with special focus on strings that grow logarithmically with the problem size. We learn that even unbiased modularizations can have profound effects on problem performance. Our (1+1) CMS-EA optimizes a generalized OneMax problem in Ω(n^2) time, provably worse than a (1+1) EA. More generally, our (1+1) CMS-EA optimizes a particular class of concatenated functions in O(2^{l_m} k n) time, where l_m is the length of module strings and k is the number of module positions, when the modularization is aligned with the problem separability. We compare our results to known results for traditional EAs, and develop new intuition about modular encapsulation. We observe that search in the CMS-EA is essentially conducted at two levels (intra- and extra-module) and use this observation to construct a module trap, requiring super-polynomial time for our CMS-EA and O(n ln n) for the analogous EA.

Categories and Subject Descriptors
F.2 [Theory of Computation]: Analysis of Algorithms and Problem Complexity; G.1.6 [Numerical Analysis]: Optimization

General Terms
Theory, Algorithms, Performance

Keywords
module encapsulation, runtime analysis, search space bias

1. INTRODUCTION
Many researchers have begun to develop complex encodings that try to exploit known regularities in a problem using notions like modularity in the representation space [10, 8, 17]. Modular representations permit repeatable and reusable components of a candidate solution (modules) to be embedded into the encoding of candidate solutions, which can alter the structure of the search space and the performance of the EA operating on that space. Indeed, many engineering problems seem well-addressed by algorithms using representations that make use of such ideas (e.g., multiagent learning [3]). A useful survey of concepts and definitions for modular representations appears in [4].
Empirical investigations into modularity have primarily focused on understanding how regularity in the landscape affects such representations [10, 15, 8]. Algorithmic design choices include: how many modules exist, how they should be composed, whether they can be dynamically discovered or generated, or whether there is a static a priori list of possible modules [2, 6]. Proponents of modular representations highlight, among other advantages, the potential for these representations to provide greater scalability for increasingly complex problems.

Theoretical approaches to understanding modularity have been much more limited and mainly confined to general investigations of genotype-phenotype mapping [19, 18]. A more focused analytical study of the effects of module encapsulation appeared recently [9], where simple modular encapsulations are considered from the perspective of search-space bias. That is, modular representations can introduce a kind of distance bias by over-representing some portion of the search space and under-representing another. As a kind of baseline, they introduce a representation where solutions are encoded only by modules and the potential module set consists of all possible substrings of a certain length (the complete module set, CMS). The paper proves that, via the application of a particular distance measure, this leaves the resulting search space unbiased. The authors focus their formalism on a static landscape analysis, using empirical studies to connect notions of representational bias to performance based on a well-known informal difficulty measure, and the experimental results suggest that unbiased representations will not impact performance when problem landscapes are well-suited to their distance measure.

Here we address the issue of scalability directly by formally examining the performance impact of such unbiased modular representations. The idea is to extend initial static investigations of landscape bias by replacing empirical study with a rigorous runtime analysis of performance. We study very simple algorithms, variants of the so-called (1+1) EA [5], in order to focus on our primary questions of representation, though at least one result generalizes to more sophisticated methods. Additionally, we confine ourselves to this first, simple modularization using a complete module set as an introductory step into performance analysis of modular representations.

We learn that even with a simple problem that appears quite compatible with the difficulty estimate used in existing experimental results, representations using unbiased modular encapsulation can still have provably different performance results from an analogous EA. Specifically, we show that a simple (1+1) EA will outperform our CMS variant on a generalized OneMax problem. We then generalize this bound for our CMS-EA to a particular class of aligned concatenated functions, providing upper and lower bounds on such functions. These analyses permit comparisons with traditional EAs on several problem classes and provide new intuition about modular encapsulation: When problems align well with the separability of the function, the module length relative to the problem size of an unbiased modular encoding can have a bracketing effect on the performance scaling of the EA, constraining what is possible for both the upper and lower bounds. Additionally, we see how search in the CMS-EA is essentially conducted at two levels (intra- and extra-module).
This insight enables us to construct a module trap, which is super-polynomially difficult for the CMS-EA and quite straightforward for the EA. The take-home message is that even simple modularization schemes for simple algorithms applied to simple problems generate surprising results and deserve further theoretical attention. Runtime analysis can be a useful way to extend existing efforts to understand the static effects of representational bias generated by a modularization scheme.

The next section provides the technical background by formalizing our notion of modular encapsulation and describing the runtime analysis framework we employ. Section 3 describes the two algorithms we consider in this paper and motivates the rest of the discussion by providing a lower bound for our CMS variant on generalized OneMax. Section 4 provides the analysis of the more general class of aligned concatenated problems and provides bounds for examples both in and out of that class. The final section discusses what we conclude from these results.

2. BACKGROUND

2.1 Modular Encapsulation
The formalism for mapping from a modular encoding to the true problem encoding discussed in [9] is quite general, and we refer the reader there for that framework. Here we focus only on a relevant subset of modularization schemes on which their analysis was conducted and simplify our formalism for the purposes of clarity of exposition. Where possible, our notation is consistent with both this work and the traditional runtime analysis literature; however, where this is confusing, we employ the latter.

We apply our algorithms to pseudo-Boolean problems, where the domain space consists of binary strings of length n, f : {0,1}^n → ℝ. We consider a module to be a symbol that represents some binary substring of fixed length, M_i ∈ {0,1}^{l_m}, and use a complete module set (CMS), M, consisting of symbols representing every such string exactly once. That is, M := {M_0, ..., M_{2^{l_m}−1}}, where M_j is the j-th binary string of length l_m indicated in M. A potential solution is encoded as a string of module symbols, which is translated into a binary string by simple substitution. Before substitution, we write such a solution m := m_1 m_2 ⋯ m_k, where k is the length of the modular string and n := k · l_m. We write the final solution string, after substitution, x := x_1 x_2 ⋯ x_n. When it is useful, we occasionally use the notation x^{(i)} to refer to the substring corresponding with the i-th module of the string, m_i = x^{(i)} := x_{(i−1)l_m+1} ⋯ x_{i·l_m}.
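To make the substitution mapping concrete, here is a minimal sketch (our own illustrative code, not from the paper; the helper name is hypothetical) of decoding a module string m into a binary string x:

import random

def decode(m, l_m):
    # Each module symbol j in the complete module set stands for the j-th
    # binary string of length l_m, so decoding is simple substitution:
    # concatenate the l_m-bit representations of the symbols in m.
    return [int(b) for j in m for b in format(j, '0' + str(l_m) + 'b')]

# Example with l_m = 2 and k = 4 module positions (so n = k * l_m = 8):
l_m, k = 2, 4
m = [random.randrange(2 ** l_m) for _ in range(k)]   # module string m_1 ... m_k
x = decode(m, l_m)                                    # binary string x_1 ... x_n
assert len(x) == k * l_m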
Examining both an un-encoded binary representation in conjunction with the above CMS representation, it is easy to see that there is no difference in the underlying points of the space being represented. That is, in the final search space, algorithms employing either representation will have access to all 2^n search points, and none are any more or less represented than another. A slightly subtler question is whether or not there are topological changes to the space as a result of the encoding, and such a question can only be answered when some kind of distance measure is imposed.

The authors of [9] examine the effects of modularization on the space. They employ Hamming distance to the global optimum to show that the average Hamming distance between the pre- and post-encoded spaces does not differ when using a complete module set. In that sense, the encapsulation is unbiased because the aggregate relationship of underlying search points to the optimum is unchanged. Biases can be introduced with more sophisticated choices about module set composition. That analysis is static, considering only the effects on the representation space given a distance measure. To say something about the performance of algorithms employing these schemes, the static descriptions of bias are connected to a well-known "difficulty" measure in the EC literature, fitness distance correlation [14], which relates distance to the optimum to solution quality. This difficulty measure is used to help make predictions about how these bias properties will perform experimentally.

2.2 Performance & Runtime Analysis
Though it may be informative in certain circumstances, fitness distance correlation is a poor general predictor for EA performance [1]. Moreover, the measure is meant to describe a problem's "difficulty" and, whether or not it achieves this, it is clear that what is difficult for an EA using one representation may not be difficult for another. It's instructive to consider what one wants out of a performance comparison. If performance and the scale of performance are of interest, it is productive to replace the empirical studies following static analysis of modular representation with runtime analysis of the first-hitting time of an algorithm to the global optimum. Here we try to elicit bounds for the expected number of function evaluations before an EA first evaluates the optimum, as well as provide bounds on the success probabilities [16, 20]. Since these bounds are typically expressed as function classes dependent on the size of the problem, and since one of the advantages modular representations are believed to have is scalability [4], determining such bounds as a function of problem size is particularly useful for further studying the performance of modular representations.

A challenge for those conducting runtime analysis as a follow-on to existing work is the tendency for researchers to neglect to specify the functional relationship between certain algorithmic parameters and the problem size [12]. In this case the trouble comes from l_m: Is it constant? Does it grow linearly with n? For our analysis, we primarily focus on l_m := lg n; our reasoning follows. If the length of the module string is constant in n, then there is no real difference between a modular encapsulation and a representation that uses a non-binary alphabet. Of course either can impact performance and may be worthy of study, but our view is that a true "modular" representation is one in which the modules have some hope of incorporating useful sub-portions of (some) problems, and that suggests some non-constant relationship to the size of the problem. If the length of the module string is linear in n, then the size of M grows exponentially as n increases, which is impractical if an implementation explicitly stores M (though often it isn't necessary to do so). With our choice, the size of the module set grows linearly with n, which is a useful property since it leads to a kind of apples-to-apples compatibility with the (1+1) EA, which we will discuss later. Other scaling function classes for l_m are possible and interesting, of course, l_m := √n in particular. More generally, one could imagine a class of scaling functions l_m := n^ε, where ε ∈ [0, 1]. When ε is 0, our CMS-EA is equivalent to an analogous EA, and when it is 1, our CMS-EA is random search. A thorough analysis of the effects of the range in between is beyond the scope of this paper; however, we will keep these extremes in mind in our discussion.
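For reference, the module-set sizes implied by these scalings follow directly from the definition of the CMS (the arithmetic here is ours, not a result from the analysis):

|M| = 2^{l_m}:  l_m = c gives |M| = 2^c (a fixed, non-binary alphabet);  l_m = lg n gives |M| = 2^{lg n} = n;  l_m = n gives |M| = 2^n;  and in general l_m = n^ε gives |M| = 2^{n^ε}, with n = k · l_m throughout.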
3. ALGORITHMS ANALYZED
For several reasons, we concentrate our analysis on very simple evolutionary algorithms, variants of the so-called (1+1) EA. First, we wish to focus on the question of representation rather than other algorithmic details, so it is sound to consider the simplest algorithm that demonstrates our points. Second, there are now many theoretical runtime results for this simple EA, so concentrating on it provides the most opportunity for comparison. Still, at least one of our bounds is quite general, and we will discuss the extent to which it includes other algorithms.

3.1 Pseudocode
Without loss of generality, our algorithms are described for maximization. We describe them without stopping criteria since our interest is in the expected time until the algorithm first evaluates a global optimum (first-hitting time). The (1+1) EA begins with a single random parent, produces a child by the standard bit-flip mutation operator, and replaces the parent with the child if it is at least as good.

Algorithm 1 (1+1) EvolutionaryAlgorithm
1. Choose x ∈ {0,1}^n uniformly at random
2. t := 0
3. Mutation: Create x′ ∈ {0,1}^n by copying x and independently flipping each bit with probability min{1/n, 1/2}
4. Selection: If f(x′) ≥ f(x) then set x := x′
5. t := t + 1
6. Continue at line 3

Our CMS variant functions similarly, except that it operates on the modules directly and only decodes module strings for evaluation. It maintains an individual that has k modules, produces a new individual by applying mutation, then decodes the two module strings into binary strings and evaluates their fitness. If the child is at least as good as the parent, the child's module string replaces the parent's. The important observation here is that the mutation operator must change to accommodate the modular representation. We use the same operator described by [9, 7]: Each position has an independent probability of having mutation applied to it, 1/k. However, now we choose a new module from the module set uniformly at random. This means that there is some (low) probability that the very same module will be selected, which essentially implies that the effective mutation rate is somewhat smaller than 1/k.

Algorithm 2 (1+1) CMS-EvolutionaryAlgorithm
1. Choose m ∈ M^k uniformly at random
2. t := 0
3. Mutation: Create m′ ∈ M^k by copying m; then for each m′_i, decide whether to mutate with probability min{1/k, 1/2}, and if mutating, choose a new m′_i ∈ M uniformly at random
4. Mapping: Decode m → x and m′ → x′ by substitution
5. Selection: If f(x′) ≥ f(x) then set m := m′
6. t := t + 1
7. Continue at line 3
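As a concrete reading of the two loops above, here is a minimal Python sketch (our own illustration, not the authors' code; the OneMax objective at the end is only an example):

import random

def one_plus_one_ea(f, n, steps):
    # (1+1) EA: bit-flip mutation with per-bit probability min(1/n, 1/2).
    p = min(1.0 / n, 0.5)
    x = [random.randint(0, 1) for _ in range(n)]
    for _ in range(steps):
        child = [b ^ 1 if random.random() < p else b for b in x]
        if f(child) >= f(x):        # selection: keep the child if at least as good
            x = child
    return x

def one_plus_one_cms_ea(f, k, l_m, steps):
    # (1+1) CMS-EA: mutate module symbols, decode by substitution, then select.
    p = min(1.0 / k, 0.5)
    decode = lambda m: [int(b) for j in m for b in format(j, '0' + str(l_m) + 'b')]
    m = [random.randrange(2 ** l_m) for _ in range(k)]
    for _ in range(steps):
        # A mutated position draws a fresh module uniformly; it may redraw the same one.
        child = [random.randrange(2 ** l_m) if random.random() < p else j for j in m]
        if f(decode(child)) >= f(decode(m)):
            m = child
    return decode(m)

# Example objective: OneMax with the all-ones target.
f = lambda x: sum(x)
print(one_plus_one_ea(f, n=16, steps=2000))
print(one_plus_one_cms_ea(f, k=4, l_m=4, steps=2000))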
3.2 Why it Matters: A Tale of Generalized OneMax
Since both modular bias and fitness distance correlation are measures of search point distance to the global optimum, and functions from the simple generalized OneMax class are defined in terms of the very same distance measure, an easy inference to draw is that unbiased modular encodings will not impact the performance of the algorithm on such problems; however, this is not true. Consider the generalized OneMax problem:

OneMax(x, x̂) := n − Σ_{i=1}^{n} |x_i − x̂_i|,

where x̂ specifies a target string (global optimum).

Theorem 1. The expected first-hitting time for the (1+1) CMS-EA on generalized OneMax is Ω(2^{l_m} k ln k) if it is initialized with no modules solved.

Proof. We examine the event when mutation produces a specific module symbol that decodes to the correct substring in the target string, referring to such an event as having considered the solution to that module. For this to happen for a module position, we must both have a mutation event and that event must produce the correct value, the probability for which is (1/k) · (1/2^{l_m}). The probability that this event does not happen in t − 1 steps is:

(1 − 1/(2^{l_m} k))^{t−1}

The probability that all k modules have been considered is

(1 − (1 − 1/(2^{l_m} k))^{t−1})^k

Let t − 1 := (2^{l_m} k − 1)(ln k − ln ln n). Using the well-known inequality (1 − 1/n)^{n−1} ≥ 1/e, we have:

(1 − 1/(2^{l_m} k))^{(2^{l_m} k − 1)(ln k − ln ln n)} ≥ e^{−(ln k − ln ln n)} = (ln n)/k

So the probability that the algorithm will have considered all module symbols needed in the solution string within t − 1 steps is at most (1 − (ln n)/k)^k ≤ e^{−ln n} = 1/n; thus, the probability that it takes at least t steps is 1 − 1/n. Noting that

E{T} = Σ_{t∈[1,∞]} t · Pr{T = t} = Σ_{t∈[1,∞]} Pr{T ≥ t},

we have

E{T} ≥ (2^{l_m} k − 1)(ln k − ln ln n)(1 − 1/n) = Ω(2^{l_m} k ln k)

The proof shown above is a special case of the work in [5] and follows their method. The (1+1) EA solves this problem class in Θ(n lg n) time, while our CMS variant is Ω((n^2/lg n) ln(n/lg n)) = Ω(n^2) when l_m = lg n, provably worse. Obviously this difference is relatively minor; however, it serves to motivate the analysis below. The Hamming distance correlation of search points to the optimum is an ideal distance measure for this landscape. Yet in spite of the fact that the CMS is unbiased according to that measure, these two algorithms scale by different function classes. At issue is the effect of the growth in the module length on the mutation operator, as we shall see.

4. CONCATENATED FUNCTIONS
As in [5], the lower bound we just presented is much more general than OneMax. Indeed, it holds (perhaps loosely) for any function with a unique global optimum — one provably worse than that of the (1+1) EA when l_m := lg n. The reason for this is that modularity forces a change in the mutation operator that induces a different topology on the underlying space than does bit-flip mutation. Since the length of the module string increases with n, the probability of mutating to the module containing the correct substring of the solution decreases. For any non-constant l_m, this lower bound must be higher than Ω(n ln n), so the scale of l_m has a kind of "raising the bar" effect on what is possible for a lower bound.

Also, note that the ratio of the length of the modules to the length of the candidate solution string shrinks as n increases. Since the number of modules in the module set is linear in n, there is a kind of underlying compatibility with the (1+1) EA: given that a mutation event occurs, the correct module can be found with probability 1/n. The difference is in what the algorithms do with the information inside a module — the (1+1) EA treats all information the same and ignores substring boundaries, while the (1+1) CMS-EA ignores most information inside the substring and concentrates search on assembling the useful modules. Indeed, the algorithm is performing two simultaneous searches: a mutation/selection-based search of useful modules and an essentially random search to find the correct module values. This observation leads one to think naturally of functions in which this dual-search notion is exploited, where the problem itself is composed of pieces, so-called concatenated functions. We define a class of generalized concatenated functions below, using the same notion of generalization we did for the OneMax problem above:
Definition 1. Let x̂ ∈ {0,1}^n be the unique global optimum of some function f : {0,1}^n → ℝ. We partition all search points, x ∈ {0,1}^n, into r pieces each of length l_r such that x^{(i)} := x_{(i−1)l_r+1} ⋯ x_{i·l_r}. Given some function g : {0,1}^{l_r} × {0,1}^{l_r} → ℝ, we define a generalized concatenated pseudo-Boolean function by

f(x, x̂) := Σ_{i=1}^{r} g(x^{(i)}, x̂^{(i)})

Further, when l_r = lg n, we refer to such functions as log-separable concatenated.

Since we require f to have a unique global optimum and that it be composed of identical functions operating over different portions of the solution string, g must also have a unique global optimum. Also, since the g function might itself be a linear combination of subordinate functions, the log-separable concatenated stipulation does not restrict the class to only functions that are precisely separable on lg n boundaries, but to those that are at least separable along such boundaries (e.g., the generalized OneMax just discussed is such a function). In fact, there are many functions that can fall into this class (e.g., concatenated Trap and Clob, discussed below).

Finally, while we have conveniently arranged our notation to match the definition for the module boundaries of the CMS-EA offered above, it's clear this needn't be so. The bit positions themselves might be presented in a different order and the underlying function would be the same (if the corresponding positions in x̂ were also re-ordered in the same way). This is important because the CMS-EA makes an implicit assumption about where good module boundaries may be, much as compositional coevolutionary algorithms do [13]. When the module boundaries and the boundaries of the pieces of the log-separable concatenated function are the same, we consider the algorithm's modularization to be aligned with the problem. Below, we consider the general case of when modules are aligned with concatenated functions, as well as a counter example for the CMS-EA that is outside this class.
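As an illustration of Definition 1 (our own sketch, not from the paper; the per-piece OneMax used for g is just one example of a component function with a unique optimum):

import math, random

def concatenated(g, x, target, l_r):
    # Generalized concatenated function: sum g over aligned pieces of length l_r.
    r = len(x) // l_r
    return sum(g(x[i*l_r:(i+1)*l_r], target[i*l_r:(i+1)*l_r]) for i in range(r))

# Example component function g with a unique global optimum: per-piece OneMax.
def g_onemax(piece, t):
    return sum(1 for a, b in zip(piece, t) if a == b)

n = 16
l_r = int(math.log2(n))                  # log-separable pieces, l_r = lg n
x_hat = [1] * n                          # unique global optimum
x = [random.randint(0, 1) for _ in range(n)]
print(concatenated(g_onemax, x, x_hat, l_r))   # maximal value n is reached only at x_hat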
4.1 Module Alignment
The purpose here is to show how the bounds on the CMS-EA's runtime behavior applied to the class of generalized concatenated functions are affected when the modularization is aligned with the problem. Consequently, we intentionally ignore the details of the component function for the underlying pieces except to insist it have a unique global optimum.

Theorem 2. The expected first-hitting time for the (1+1) CMS-EA on a function from the class of generalized concatenated functions is O(2^{l_m} k n) when the modularization is aligned with the problem.

Proof. By our definition, using a modularization that is aligned with the problem implies that l_r ≤ l_m, so we use l_m in all cases below for simplicity. We first consider the case where all values of g(x, x̂) are unique. Without loss of generality, we consider the values ordered by their component function value, writing g_1 for the maximal value of g and g_{2^{l_m}} for the minimal value. We proceed by fitness-based partitions [5], defining 2^{l_m} k + 1 levels:

L_{ij} := {x | g_{j−1} · (i+1) + g_j · (k−i−1) > Σ_{d=1}^{k} g(x^{(d)}, x̂^{(d)}) ≥ g_{j−1} · i + g_j · (k−i)},  ∀i ∈ [0, k−1], ∀j ∈ [2, 2^{l_m}]

L_{Opt} := L_{k1} := {x | Σ_{d=1}^{k} g(x^{(d)}, x̂^{(d)}) = g_1 · k}

To increase a level, the algorithm must mutate a module to a value that receives a larger payoff than the current value without generating deleterious mutations. Though there can in principle be a large number of combinations of component function values that result in a particular fitness level, for a given level L_{ij}, there must be at least one module value at or below the g_{j−1} level. We pessimistically assume that we must mutate only that module to some larger value. This occurs with probability at least (1/k)(1 − 1/k)^{k−1} · (j−1)/2^{l_m} ≥ (j−1)/(e · 2^{l_m} k). We must make at most 2^{l_m} k such level advances, and we estimate the total time by summing the expected time for each of such level increases:

E{T} ≤ Σ_{j=2}^{2^{l_m}} Σ_{i=0}^{k−1} (e · 2^{l_m} k)/(j−1) = O(2^{l_m} k^2 ln 2^{l_m}) = O(2^{l_m} k n)

Second, we consider the case where there exists j such that g_{j−1} = g_j. If the levels are defined as above, the fitness partition size is now zero, and it is possible to move backwards. Instead, one can define the fitness partitions so as to merge levels such that no fitness partition is of size zero. If this is done in such a way that rows comprising levels of size zero are repeatedly merged upward into rows above them until they contain valid points, one can ensure that module substrings decoding to the largest possible component fitness value from g appear only once in a given partition for each module position, and that all other values are the same and smaller. Thus, a module mutating between two points with equal fitness values does not lead to a transition in fitness partition. The probability of making a transition is still governed by the rank of the larger value in a partition for a given module; there are simply fewer rows and hence fewer fitness partitions than before. Since the probabilities are no worse and there are fewer steps to make, this case cannot be worse than the case where all g values are unique.

There is a similar limiting pattern to the upper bound as we noted for the lower bound: Sub-linear scaling for l_m and l_r leads to sub-exponential bounds on this function class, and there is a kind of "lowering the bar" effect on what is possible for an upper bound that occurs when this scale is reduced. Taken together, there is a bracketing on the bounds for these kinds of concatenated problems determined by the size of modules when they are no larger than the problem pieces and the modularization is aligned. This follows naturally from the observation that the algorithm is essentially ignoring the values within the pieces and simply performing a traditional search at the module level. What the (1+1) EA does with bits, the (1+1) CMS-EA does with modules, and when the additional gradient information provided in those pieces is helpful for search, the CMS-EA cannot make use of it. But when that information is harmful or misleading, the CMS-EA is undeterred. In essence, the bracketing results from the fact that the CMS-EA doesn't care about g.

Considering the (1+1) EA as an extreme case of the CMS-EA on linear functions, l_r = l_m := 1, k = n, our lower bound result is consistent with the known Θ(n ln n) result for the 1-separable concatenated function class (generalized OneMax). At the other extreme, when l_r = l_m := n, k = 1, the CMS-EA performs random search, and our bounds loosely corroborate this as well. When l_m := lg n, our lower bound is quadratic and the upper bound is nearly quadratic, O(n^3/lg n).

Our result is general enough to compare with known results for several function classes. The concatenated Trap function, where g decreases with the Hamming distance to its optimum at every point except that optimum, requires n^{lg n} time for the (1+1) EA if the length of the pieces grows as lg n.
The Clob (concatenated LeadingOnesBlock) problem, where g counts the left-to-right consecutive blocks of b 1-bits, is Θ(n^b (n/(br) + ln r)) and so depends heavily on the size of the block [13]. The CMS-EA can solve instances of both problem classes faster when l_r = lg n and b ≥ 2, as well as many others (of course). The BuildingBlock function class from [11] is also a concatenated function, though the analysis from that paper uses l_m := √n. The case of √n is particularly interesting since a simultaneous mutation of lg n bits is not so improbable, whereas this cannot be said for √n. Here, the running time of the CMS-EA with l_m := √n on functions in this class could be quite unpleasant, Ω(2^{√n} n^{3/2} lg n^{3/2}). But for the stated problem classes, the CMS-EA will solve such problems in nearly quadratic time.

We do not suggest that this aligned log-separable concatenation property is sufficient or necessary for a CMS-EA advantage — there are sub-classes in this class that are (somewhat) easier or no harder for a simple EA or other method to solve, and there very well may be functions on which the CMS-EA is advantageous that are not log-separable concatenated. But one clear effect that using such a module scheme has is to reduce the possible range of the bounds, and while the bounds will not be the same for more sophisticated variants of the algorithm (those with a population, for instance), we speculate that there will be a similar bracketing effect on what kind of runtime performance is possible for such problems.

4.2 A Module Trap
Obviously there are many functions for which the CMS-EA will perform very poorly — the full Trap function, for instance. Such examples have their place, but a more interesting counter example here can be constructed using the same observation that helped us understand one class of functions for which our CMS-EA is reasonably efficient: the CMS-EA cannot make effective use of the information inside the modules and has no effective means of searching the relationships between the internal content of the modules. An obvious way to exploit this search methodology is to create a trap for the modules, even though the underlying information in the modules contains sufficient information to solve the problem. We construct such a function class as follows. First, let Needle be a function that returns 1 when two given strings match and 0 when they do not. Interpreting Needle as our g function in the above framework allows us to specify the class of concatenated functions:

Definition 2. Let x̂ ∈ {0,1}^n be the unique global optimum of the function CNeedle : {0,1}^n → ℝ. We partition search points, x ∈ {0,1}^n, into r pieces each of length l_r such that x^{(i)} := x_{(i−1)l_r+1} ⋯ x_{i·l_r}.
We define this by

CNeedle(x, x̂) := Σ_{i=1}^{r} Needle(x^{(i)}, x̂^{(i)})

We use the following notational short-cuts, where x̂̄ denotes the complement of x̂:

• OM(x) = OneMax(x, x̂)
• ZM(x) = OneMax(x, x̂̄)
• CN(x) = CNeedle(x, x̂̄)
• MinOM(x) is short for min_{i∈[1,k]} {OM(x^{(i)}, x̂^{(i)})}

Definition 3. Let x̂ ∈ {0,1}^n be the unique global optimum of the function ModTrap : {0,1}^n → ℝ, let x̂̄ be the complement of x̂, and let x ∈ {0,1}^n be a search point. We define

ModTrap(x, x̂) :=
  n^4                      if x = x̂
  n^3 · ZM(x)              if CN(x) > 0
  OM(x) + n^2              if MinOM(x) ≥ ln lg n ∧ CN(x) = 0
  n · MinOM(x) + OM(x)     otherwise

Intuitively, each of the two underlying functions, CNeedle and OneMax, is a concatenated function and could be solved efficiently (if log-separable concatenated and properly aligned); however, for the combined function, once a Needle is activated, the function provides information leading away from the optimum. We also add a minor contrivance that prevents backward drift when very close to, but outside of, the trap region. It's useful to see that the ratio of Needle to non-Needle points in the landscape is quite small and shrinks as n grows, so the problem class is not that different from OneMax at the bit level when n is large. Indeed, as we show below, the (1+1) EA will solve this problem in O(n lg n) time, while the problem is quite difficult for the CMS-EA.
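The following is a small illustrative sketch of ModTrap as we read Definition 3 (our own code, not the authors'; it assumes lg n divides n evenly, and the helper names are hypothetical). With the target 1^n, an activated Needle is an all-zero piece.

import math

def modtrap(x, target):
    # Sketch of ModTrap (Definition 3). Pieces have length l_r = lg n.
    n = len(x)
    l_r = int(math.log2(n))
    comp = [1 - b for b in target]                             # complement of the target
    starts = range(0, n, l_r)
    om = sum(a == b for a, b in zip(x, target))                # OM(x)
    zm = sum(a == b for a, b in zip(x, comp))                  # ZM(x)
    cn = sum(x[i:i + l_r] == comp[i:i + l_r] for i in starts)  # CN(x): activated Needles
    min_om = min(sum(a == b for a, b in zip(x[i:i + l_r], target[i:i + l_r]))
                 for i in starts)                              # MinOM(x)
    if x == target:
        return n ** 4
    if cn > 0:
        return n ** 3 * zm                   # trap: rewards moving toward the complement
    if min_om >= math.log(math.log2(n)):     # the ln lg n threshold
        return om + n ** 2
    return n * min_om + om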
Theorem 3. The expected first-hitting time for the (1+1) EA on ModTrap is O(n lg n) if a Needle is never activated, which occurs with probability 1 − o(n^{−1/3}).

Proof. The proof is separated into three parts. First, we show that after initialization the EA will have at least ln lg n one-bits in each piece. We follow by bounding the probability of activating a Needle before the EA has completed n^2 steps. Finally, we compute the expected number of steps to the optimum given that no Needle is activated. Without loss of generality, we assume x̂ = 1^n and x̂̄ = 0^n.

Given a bit-flip mutation probability of 1/n, applying Chernoff bounds we know that the probability of having at least n/4 one-bits in the complete string is 1 − e^{−Ω(n)}, so we pessimistically assume we have only n/4 one-bits to distribute among the n/lg n pieces. First, we re-write n/4:

n/4 = (n ln n)/(4 lg n · ln 2)
    = (1/(4 ln 2)) · (n/lg n) · (ln n − ln lg n + ln lg n)
    = (1/(4 ln 2)) · ((n/lg n) ln(n/lg n) + (n/lg n) ln lg n)

Lemma 17 from [13] provides a generalized form of the well-known coupon collector's problem: the expected number of throws necessary to ensure at least h balls in each of k bins is Θ(k ln k + kh), with probability at least 1 − 1/√k. For our result, this implies that with probability at least 1 − 1/√(n/lg n) = 1 − o(n^{−1/3}) there are at least ln lg n one-bits in each piece. As long as no Needle has been activated in initialization, there will never be fewer ones after initialization.

Moreover, due to the MinOM condition, the only way to obtain fewer than ln lg n one-bits in a given module is to simultaneously flip all one-bits to zero. So in the easiest case, the EA must flip ln lg n bits to activate a Needle, which occurs with probability at most 1/n^{ln lg n}. Since there are r = n/lg n pieces of the problem, there are n/lg n opportunities to do so in each step, so considering n^2 steps gives us n^3/lg n such opportunities. The probability that there is never a Needle activation event in that time is (1 − 1/n^{ln lg n})^{n^3/lg n}.

If the Needle is not activated, the algorithm proceeds much like it would for OneMax. The upper bound can be derived using the common f-based partitions approach for OneMax since, in all cases where a Needle is not activated, it is sufficient to count only steps where precisely one 0 is flipped to a 1. Using OneMax as a proxy, we have L_i := {x | Σ_{j=1}^{n} x_j = i}, 0 ≤ i < n. The probability of advancing a level is at least (n−i)/(ne), so E{T} ≤ Σ_{i=0}^{n−1} ne/(n−i) = O(n lg n).

The probability that no Needles were activated during initialization and none were activated before the optimum was otherwise found, then, is at least:

(1 − o(n^{−1/3})) · (1 − 1/n^{ln lg n})^{n^3/lg n} = 1 − o(n^{−1/3})

Theorem 4. The expected first-hitting time for the (1+1) CMS-EA on ModTrap when the modularization is aligned with the problem is Ω((2^{l_m} k)^{k/4}) if a Needle is activated before at most 3k/4 modules have been solved, which occurs with probability 1 − e^{−Ω(k)}.

Proof. Again, we assume without loss of generality that the solution is found at the all-1 string, the complement at the all-0 string. Also, the fact that the modularization is aligned with the problem implies that l_r ≤ l_m, so we use l_m in all cases below for simplicity.

We begin by considering the probability that at least one Needle will have been activated in t := 2^{l_m} k/4 steps. The probability that a Needle is activated by a mutation event in a given position is 1/(2^{l_m} k), and there are 2^{l_m} k^2/4 opportunities for such a mutation in t steps. So the probability of Needle activation is at least:

1 − (1 − 1/(2^{l_m} k))^{2^{l_m} k · (k/4)} ≥ 1 − e^{−k/4}

Next, we bound the probability of having more than 3k/4 solved modules in t steps. The probability of finding a solution substring via a given module mutation event is also 1/(2^{l_m} k), and given that there are k opportunities in each step to find such a substring, the expected number of such solving events in time t is 2^{l_m} k^2/4 · 1/(2^{l_m} k) = k/4. Applying Chernoff bounds, we see that the probability that there are more than 3k/4 solving events in that time is (e^2/9)^{k/4}. Of course, not all of those events will occur in distinct module positions, but the number of solved modules cannot be more than the number of "solving events" encountered in time t. Thus, the probability that at most 3k/4 modules are solved is at least 1 − (e^2/9)^{k/4}.

We consider the first phase of the run to be these first 2^{l_m} k/4 steps and bound the probability that by such time at least one Needle will have been activated and no more than 3k/4 modules can have been solved:

(1 − (e^2/9)^{k/4}) · (1 − e^{−k/4}) = 1 − e^{−Ω(k)}

We consider the second phase of the run to be the remaining steps required to solve the problem once a Needle has been activated. Once this occurs, the algorithm cannot select an individual with fewer (or no) Needles unless it simultaneously flips all remaining unsolved modules. In the best case, there are at least k/4 of these modules. The probability of mutating all k/4 of these modules to the solution substring at the same time is at most (2^{l_m} k)^{−k/4}, so the expected waiting time for such an event is (2^{l_m} k)^{k/4}. Adding the time for the first phase to that of the second phase, we get 2^{l_m} k/4 + (2^{l_m} k)^{k/4} = Ω((2^{l_m} k)^{k/4}).

The (1+1) EA performs reasonably because it is able to use the intra-piece information to scale the Hamming levels to the solution before it has an opportunity to fall for the trap. The (1+1) CMS-EA is not so lucky. When l_r = l_m = lg n, the lower bound is Ω((n^2/lg n)^{n/(4 lg n)}). Because it cannot make use of the intra-block information, it must wait to assemble solved blocks. Unfortunately, it is very unlikely to solve all the modules before falling for the trap — and once in, it cannot easily escape.

5. CONCLUSIONS
In this paper, we provide a deeper look at the performance effects of the modular representations described in [9] that use a complete module set. That work lays out a rigorous and careful analysis of how certain kinds of modularization impact search space bias in terms of their distance to the global optimum, but their discussion of the actual performance impact of such representations is largely empirical.
Moreover, the empirical difficulty measure used implies that unbiased modular encodings, like those that include all substrings of a certain length (e.g., the CMS), will not typically impact search performance. The goal of our paper is to extend existing analysis of modular representations to include formal investigations into their effects on EA performance.

We have shown that even this very simple modularization can alter search performance, both positively and negatively. This can be true even when the problem is well-suited for the distance measure used to measure bias. Indeed, the reason fitness distance correlation is a poor predictor of CMS-EA performance is that it is attuned to a landscape topology induced by something like bit-flip mutation, whereas we now see that mutation in the CMS-EA leads to a quite different search mechanism.

The main technical difference between their work and ours centers around the relationship between the size of the modules and the size of the problem itself. We take the view that representations that presume constant-sized modules are really just changing the alphabet, while representations that use modules in the truest sense must admit some non-constant relationship between the module length and the size of the problem. We provide bounds for a general class of separable concatenated functions to which a properly aligned simple (1+1) CMS-EA can be applied, focusing on the subclass where the piece size scales with lg n. This class includes functions that require arbitrarily large polynomial time, as well as those that require super-polynomial time, for the (1+1) EA. The CMS-EA solves such problems in Ω(n^2), O(n^3/lg n) time when its module size is also lg n. Moreover, the general bounds relate in an informative way: Ω(2^{l_m} k ln k) and O(2^{l_m} k n). The relationship between l_m and n determines a kind of bracketing effect on the bounds and helps highlight how the CMS-EA trades off searches inside and outside modules. We use this intuition to construct a counter example that the CMS-EA requires super-polynomial time to solve, to help clarify this point.

More specifically, the CMS-EA is attempting to conduct two concurrent but different search processes: one in which individual genotypic values are being manipulated (intra-module search) and one in which assemblies of large components of the solution string are being explored (extra-module search). We note the relationship between EAs employing crossover, EAs employing modular representations, and cooperative coevolutionary EAs — all of which attempt to conduct this kind of two-level search. In cooperative coevolution, intra-module search occurs via genetic operators and extra-module search occurs via the collaboration mechanism [13]. With crossover, intra-module search employs mutation and extra-module search is performed via recombination. In the case of our (1+1) CMS-EA, extra-module search uses an essentially mutation-based method, while intra-module search is blind, random search. Still, the CMS-EA is merely a baseline.

Understanding this aspect of the algorithm helps inform our notion of modular bias. Conceptually, one can imagine a transition matrix, for example, where transitioning from one module substring to another occurs according to some specified probability. With this notion, an intra-module search that is equivalent to bit-flip mutation can be produced by simply eliciting the probabilities from the binomial distribution, while one that is identical to our CMS-EA would be based on a uniform distribution. Now, though, we can imagine other distributions, and doing so offers an opportunity to think very generally about modular bias.
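A minimal sketch of that idea (our own illustration; the interface and the choice of per-bit rate are hypothetical): intra-module transitions are sampled from a supplied distribution, so a uniform weighting recovers the CMS-EA's blind intra-module search, while a binomial, Hamming-distance-based weighting mimics bit-flip mutation inside a module.

import random

def mutate_module(current, l_m, weight):
    # Replace a module value by sampling the next value from a specified
    # transition distribution; weight(current, candidate, l_m) returns an
    # unnormalized transition weight.
    candidates = list(range(2 ** l_m))
    weights = [weight(current, c, l_m) for c in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

# Uniform transitions: the CMS-EA's blind intra-module search.
def uniform(cur, cand, l_m):
    return 1.0

# Binomially weighted transitions mimic bit-flip mutation inside a module:
# weight p^d * (1-p)^(l_m - d), where d is the Hamming distance between the
# two module substrings and p is the per-bit mutation rate (e.g., 1/n).
def bitflip_like(cur, cand, l_m, p=0.05):
    d = bin(cur ^ cand).count('1')
    return (p ** d) * ((1 - p) ** (l_m - d))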
We believe this analysis, though quite coarse, has improved our intuition about CMS-EAs, and we hope there will be increased interest in examining modularization from this perspective. Further runtime analysis of modular representation is certainly needed. In the future, we would like to consider population-based algorithms, as well as those that use more sophisticated operators. More important is the consideration of increasingly complex modular encodings, biased encodings in particular.

6. REFERENCES
[1] L. Altenberg. Fitness distance correlation analysis: An instructive counterexample. In Proceedings of the Seventh International Conference on Genetic Algorithms, pages 57–64. Morgan Kaufmann, 1997.
[2] P. Angeline and J. Pollack. Evolutionary induction of subroutines. In Proceedings of the 14th Annual Cognitive Science Conference, pages 236–241, 1992.
[3] D. D'Ambrosio and K. Stanley. Generative encoding for multiagent learning. In Proceedings of the 2008 Genetic and Evolutionary Computation Conference. ACM Press, 2008.
[4] E. De Jong, D. Thierens, and R. Watson. Defining modularity, hierarchy, and repetition. In Proceedings of the 2004 GECCO Workshop on Modularity, regularity and hierarchy in open-ended evolutionary computation, pages 2–6. Springer, 2004.
[5] S. Droste, T. Jansen, and I. Wegener. On the analysis of the (1+1) evolutionary algorithm. Theoretical Computer Science, 276:51–81, 2002.
[6] I. Garibay, O. Garibay, and A. Wu. Effects of module encapsulation in repetitively modular genotypes on the search space. In Proceedings of the 2004 Genetic and Evolutionary Computation Conference, pages 1125–1137. Springer, 2004.
[7] O. Garibay. Analyzing the Effects of Modularity on Search Spaces. PhD thesis, University of Central Florida, Orlando, FL, USA, 2009.
[8] O. Garibay, I. Garibay, and A. Wu. The modular genetic algorithm: exploiting regularities in the problem space. In Proceedings of the 2003 International Symposium on Computer and Information Systems, pages 584–591. Springer, 2003.
[9] O. Garibay and A. Wu. Analyzing the effects of module encapsulation on search spaces. In Proceedings of the 2007 Genetic and Evolutionary Computation Conference, pages 1234–1241. ACM Press, 2007.
[10] G. Hornby. Measuring, enabling and comparing modularity, regularity and hierarchy in evolutionary design. In Proceedings of the 2005 Genetic and Evolutionary Computation Conference, pages 1729–1736. Springer, 2005.
[11] T. Jansen and R. Watson. A building-block royal road where crossover is provably essential. In Proceedings of the 2007 Genetic and Evolutionary Computation Conference, pages 1452–1459. ACM Press, 2007.
[12] T. Jansen and R. Wiegand. Bridging the gap between theory and practice. In Proceedings of the 8th Parallel Problem Solving from Nature conference, pages 61–71. Springer, 2004.
[13] T. Jansen and R. Wiegand. The cooperative coevolutionary (1+1) EA. Evolutionary Computation, 12(4):405–434, 2004.
[14] T. Jones. Evolutionary Algorithms, Fitness Landscapes and Search. PhD thesis, University of New Mexico, Albuquerque, NM, USA, 1995.
[15] H. Lipson. Principles of modularity, regularity, and hierarchy for scalable systems. In Proceedings of the 2004 GECCO Workshop on Modularity, regularity and hierarchy in open-ended evolutionary computation. Springer, 2004.
[16] P. Oliveto, J. He, and X. Yao. Time complexity of evolutionary algorithms for combinatorial optimization: A decade of results. International Journal of Automation and Computing, 4(3):281–293, 2007.
[17] K. Stanley, D. D'Ambrosio, and J. Gauci. A hypercube-based indirect encoding for evolving large-scale neural networks. Artificial Life Journal, (to appear), 2009.
[18] M. Toussaint. The Evolution of Genetic Representations and Modular Adaptation. PhD thesis, Institut für Neuroinformatik, Ruhr-Universität Bochum, Germany, 2003.
[19] M. Toussaint. Compact genetic codes as a search strategy of evolutionary processes. In Foundations of Genetic Algorithms VIII, pages 75–94. Springer, 2005.
[20] I. Wegener. Theoretical aspects of evolutionary algorithms. In 28th International Colloquium on Automata, Languages and Programming (ICALP 2001), pages 64–78. Springer, 2001.