Foreign-Language Translation: Genetic Algorithms

Uploaded by: 仙***  Document ID: 129856474  Uploaded: 2022-08-03  Format: DOC  Pages: 13  Size: 155KB
What is a genetic algorithm?

- Methods of representation
- Methods of selection
- Methods of change
- Other problem-solving techniques

Concisely stated, a genetic algorithm (or GA for short) is a programming technique that mimics biological evolution as a problem-solving strategy. Given a specific problem to solve, the input to the GA is a set of potential solutions to that problem, encoded in some fashion, and a metric called a fitness function that allows each candidate to be quantitatively evaluated. These candidates may be solutions already known to work, with the aim of the GA being to improve them, but more often they are generated at random.

The GA then evaluates each candidate according to the fitness function. In a pool of randomly generated candidates, of course, most will not work at all, and these will be deleted. However, purely by chance, a few may hold promise - they may show activity, even if only weak and imperfect activity, toward solving the problem.

These promising candidates are kept and allowed to reproduce. Multiple copies are made of them, but the copies are not perfect; random changes are introduced during the copying process. These digital offspring then go on to the next generation, forming a new pool of candidate solutions, and are subjected to a second round of fitness evaluation. Those candidate solutions which were worsened, or made no better, by the changes to their code are again deleted; but again, purely by chance, the random variations introduced into the population may have improved some individuals, making them into better, more complete or more efficient solutions to the problem at hand. Again these winning individuals are selected and copied over into the next generation with random changes, and the process repeats.
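The evaluate-select-copy-with-changes cycle just described can be sketched in a few lines of Python. This is a minimal illustrative sketch, not any particular published GA; the function names, the truncation-style selection, and the toy fitness problem (maximize the number of 1s in a 20-bit string) are all invented for the example.

```python
import random

def genetic_algorithm(fitness, random_candidate, mutate, generations=200, pop_size=50):
    # Start from a pool of randomly generated candidates.
    population = [random_candidate() for _ in range(pop_size)]
    for _ in range(generations):
        # Evaluate every candidate and keep the more promising half.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        # Refill the pool with imperfect copies of the survivors:
        # the random changes happen during copying, as described above.
        offspring = [mutate(random.choice(survivors))
                     for _ in range(pop_size - len(survivors))]
        population = survivors + offspring
    return max(population, key=fitness)

# Toy problem: maximize the number of 1s in a 20-bit string.
def flip_one_bit(bits):
    i = random.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]

best = genetic_algorithm(
    fitness=sum,
    random_candidate=lambda: [random.randint(0, 1) for _ in range(20)],
    mutate=flip_one_bit,
)
```

Because the better half of each generation is carried over unchanged, the best fitness found never decreases, and after a few hundred rounds the pool converges on all-1s strings.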
The expectation is that the average fitness of the population will increase each round, and so by repeating this process for hundreds or thousands of rounds, very good solutions to the problem can be discovered.

As astonishing and counterintuitive as it may seem to some, genetic algorithms have proven to be an enormously powerful and successful problem-solving strategy, dramatically demonstrating the power of evolutionary principles. Genetic algorithms have been used in a wide variety of fields to evolve solutions to problems as difficult as or more difficult than those faced by human designers. Moreover, the solutions they come up with are often more efficient, more elegant, or more complex than anything comparable a human engineer would produce. In some cases, genetic algorithms have come up with solutions that baffle the programmers who wrote the algorithms in the first place!

Methods of representation

Before a genetic algorithm can be put to work on any problem, a method is needed to encode potential solutions to that problem in a form that a computer can process. One common approach is to encode solutions as binary strings: sequences of 1s and 0s, where the digit at each position represents the value of some aspect of the solution. Another, similar approach is to encode solutions as arrays of integers or decimal numbers, with each position again representing some particular aspect of the solution. This approach allows for greater precision and complexity than the comparatively restricted method of using binary numbers only and often "is intuitively closer to the problem space" (Fleming and Purshouse 2002, p. 1228). This technique was used, for example, in the work of Steffen Schulze-Kremer, who wrote a genetic algorithm to predict the three-dimensional structure of a protein based on the sequence of amino acids that go into it (Mitchell 1996, p. 62).
Schulze-Kremer's GA used real-valued numbers to represent the so-called "torsion angles" between the peptide bonds that connect amino acids. (A protein is made up of a sequence of basic building blocks called amino acids, which are joined together like the links in a chain. Once all the amino acids are linked, the protein folds up into a complex three-dimensional shape based on which amino acids attract each other and which ones repel each other. The shape of a protein determines its function.) Genetic algorithms for training neural networks often use this method of encoding also.

A third approach is to represent individuals in a GA as strings of letters, where each letter again stands for a specific aspect of the solution. One example of this technique is Hiroaki Kitano's "grammatical encoding" approach, where a GA was put to the task of evolving a simple set of rules called a context-free grammar that was in turn used to generate neural networks for a variety of problems (Mitchell 1996, p. 74).

The virtue of all three of these methods is that they make it easy to define operators that cause the random changes in the selected candidates: flip a 0 to a 1 or vice versa, add or subtract from the value of a number by a randomly chosen amount, or change one letter to another. (See the section on Methods of change for more detail about the genetic operators.) Another strategy, developed principally by John Koza of Stanford University and called genetic programming, represents programs as branching data structures called trees (Koza et al. 2003, p. 35). In this approach, random changes can be brought about by changing the operator or altering the value at a given node in the tree, or replacing one subtree with another.

Figure 1: Three simple program trees of the kind normally used in genetic programming.
The mathematical expression that each one represents is given underneath.

It is important to note that evolutionary algorithms do not need to represent candidate solutions as data strings of fixed length. Some do represent them in this way, but others do not; for example, Kitano's grammatical encoding discussed above can be efficiently scaled to create large and complex neural networks, and Koza's genetic programming trees can grow arbitrarily large as necessary to solve whatever problem they are applied to.

Methods of selection

There are many different techniques which a genetic algorithm can use to select the individuals to be copied over into the next generation, but listed below are some of the most common methods. Some of these methods are mutually exclusive, but others can be and often are used in combination.

Elitist selection: The most fit members of each generation are guaranteed to be selected. (Most GAs do not use pure elitism, but instead use a modified form where the single best, or a few of the best, individuals from each generation are copied into the next generation just in case nothing better turns up.)

Fitness-proportionate selection: More fit individuals are more likely, but not certain, to be selected.

Roulette-wheel selection: A form of fitness-proportionate selection in which the chance of an individual's being selected is proportional to the amount by which its fitness is greater or less than its competitors' fitness. (Conceptually, this can be represented as a game of roulette - each individual gets a slice of the wheel, but more fit ones get larger slices than less fit ones. The wheel is then spun, and whichever individual "owns" the section on which it lands each time is chosen.)

Scaling selection: As the average fitness of the population increases, the strength of the selective pressure also increases and the fitness function becomes more discriminating.
This method can be helpful in making the best selection later on when all individuals have relatively high fitness and only small differences in fitness distinguish one from another.

Tournament selection: Subgroups of individuals are chosen from the larger population, and members of each subgroup compete against each other. Only one individual from each subgroup is chosen to reproduce.

Rank selection: Each individual in the population is assigned a numerical rank based on fitness, and selection is based on this ranking rather than absolute differences in fitness. The advantage of this method is that it can prevent very fit individuals from gaining dominance early at the expense of less fit ones, which would reduce the population's genetic diversity and might hinder attempts to find an acceptable solution.

Generational selection: The offspring of the individuals selected from each generation become the entire next generation. No individuals are retained between generations.

Steady-state selection: The offspring of the individuals selected from each generation go back into the pre-existing gene pool, replacing some of the less fit members of the previous generation. Some individuals are retained between generations.

Hierarchical selection: Individuals go through multiple rounds of selection each generation. Lower-level evaluations are faster and less discriminating, while those that survive to higher levels are evaluated more rigorously. The advantage of this method is that it reduces overall computation time by using faster, less selective evaluation to weed out the majority of individuals that show little or no promise, and only subjecting those who survive this initial test to more rigorous and more computationally expensive fitness evaluation.

Methods of change

Once selection has chosen fit individuals, they must be randomly altered in hopes of improving their fitness for the next generation. There are two basic strategies to accomplish this.
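Two of the selection schemes listed above, roulette-wheel and tournament selection, are simple enough to sketch directly. These are generic textbook-style implementations written for this document, assuming fitness values are non-negative (a requirement of the roulette wheel); the function names are invented.

```python
import random

def roulette_select(population, fitness):
    # Each individual gets a slice of the wheel proportional to its
    # (non-negative) fitness; spinning the wheel picks an individual
    # with fitness-proportionate odds.
    total = sum(fitness(ind) for ind in population)
    spin = random.uniform(0, total)
    running = 0.0
    for ind in population:
        running += fitness(ind)
        if running >= spin:
            return ind
    return population[-1]   # guard against floating-point round-off

def tournament_select(population, fitness, k=3):
    # A random subgroup of k individuals competes; only its fittest
    # member is chosen to reproduce.
    return max(random.sample(population, k), key=fitness)
```

Tournament selection's pressure can be tuned through `k`: larger subgroups make it harder for weak individuals to slip through, while `k` near 1 approaches uniform random selection.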
The first and simplest is called mutation. Just as mutation in living things changes one gene to another, so mutation in a genetic algorithm causes small alterations at single points in an individual's code.

The second method is called crossover, and entails choosing two individuals to swap segments of their code, producing artificial "offspring" that are combinations of their parents. This process is intended to simulate the analogous process of recombination that occurs to chromosomes during sexual reproduction. Common forms of crossover include single-point crossover, in which a point of exchange is set at a random location in the two individuals' genomes, and one individual contributes all its code from before that point and the other contributes all its code from after that point to produce an offspring, and uniform crossover, in which the value at any given location in the offspring's genome is either the value of one parent's genome at that location or the value of the other parent's genome at that location, chosen with 50/50 probability.

Figure 2: Crossover and mutation. The above diagrams illustrate the effect of each of these genetic operators on individuals in a population of 8-bit strings. The upper diagram shows two individuals undergoing single-point crossover; the point of exchange is set between the fifth and sixth positions in the genome, producing a new individual that is a hybrid of its progenitors. The second diagram shows an individual undergoing mutation at position 4, changing the 0 at that position in its genome to a 1.

Other problem-solving techniques

With the rise of artificial life computing and the development of heuristic methods, other computerized problem-solving techniques have emerged that are in some ways similar to genetic algorithms. This section explains some of these techniques, in what ways they resemble GAs and in what ways they differ.
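For bit-string genomes like those in Figure 2, the mutation and crossover operators described above can be sketched as follows. This is an illustrative sketch with invented names, and the per-bit mutation rate of 0.05 is an arbitrary example value.

```python
import random

def single_point_crossover(parent_a, parent_b):
    # Pick one exchange point at random; the offspring takes parent_a's
    # code before that point and parent_b's code after it.
    point = random.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]

def uniform_crossover(parent_a, parent_b):
    # Each position comes from one parent or the other with 50/50 probability.
    return [a if random.random() < 0.5 else b
            for a, b in zip(parent_a, parent_b)]

def mutate(genome, rate=0.05):
    # Flip each bit independently with a small probability.
    return [1 - g if random.random() < rate else g for g in genome]
```

For the Figure 2 example, crossing `[0]*8` with `[1]*8` at the point between positions five and six yields `[0, 0, 0, 0, 0, 1, 1, 1]`.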
Neural networks

A neural network, or neural net for short, is a problem-solving method based on a computer model of how neurons are connected in the brain. A neural network consists of layers of processing units called nodes joined by directional links: one input layer, one output layer, and zero or more hidden layers in between. An initial pattern of input is presented to the input layer of the neural network, and nodes that are stimulated then transmit a signal to the nodes of the next layer to which they are connected. If the sum of all the inputs entering one of these virtual neurons is higher than that neuron's so-called activation threshold, that neuron itself activates, and passes on its own signal to neurons in the next layer. The pattern of activation therefore spreads forward until it reaches the output layer and is there returned as a solution to the presented input. Just as in the nervous system of biological organisms, neural networks learn and fine-tune their performance over time via repeated rounds of adjusting their thresholds until the actual output matches the desired output for any given input. This process can be supervised by a human experimenter or may run automatically using a learning algorithm (Mitchell 1996, p. 52). Genetic algorithms have been used both to build and to train neural networks.

Figure 3: A simple feedforward neural network, with one input layer consisting of four neurons, one hidden layer consisting of three neurons, and one output layer consisting of four neurons. The number on each neuron represents its activation threshold: it will only fire if it receives at least that many inputs. The diagram shows the neural network being presented with an input string and shows how activation spreads forward through the network to produce an output.
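A threshold network of the kind shown in Figure 3 can be sketched layer by layer. Note that the wiring and threshold values below are invented for illustration, not read off the actual diagram: this is a minimal sketch of the fire-if-enough-inputs rule described in the text, not a trainable network.

```python
def layer_output(inputs, connections, thresholds):
    # A unit fires (outputs 1) only if the number of active inputs it
    # receives is at least its activation threshold.
    # connections[j] lists which units of the previous layer feed unit j.
    return [1 if sum(inputs[i] for i in conn) >= thresholds[j] else 0
            for j, conn in enumerate(connections)]

# A 4-3-4 network in the spirit of Figure 3 (illustrative wiring only).
hidden = layer_output([1, 0, 1, 1],
                      connections=[[0, 1], [1, 2], [2, 3]],
                      thresholds=[1, 2, 2])
output = layer_output(hidden,
                      connections=[[0], [0, 1], [1, 2], [2]],
                      thresholds=[1, 1, 1, 2])
```

Here the input pattern `[1, 0, 1, 1]` activates hidden units 0 and 2 (unit 1 receives only one active input but needs two), and that hidden pattern in turn produces the output `[1, 1, 1, 0]`: activation spreads forward exactly as the text describes.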
Hill-climbing

Similar to genetic algorithms, though more systematic and less random, a hill-climbing algorithm begins with one initial solution to the problem at hand, usually chosen at random. The string is then mutated, and if the mutation results in higher fitness for the new solution than for the previous one, the new solution is kept; otherwise, the current solution is retained. The algorithm is then repeated until no mutation can be found that causes an increase in the current solution's fitness, and this solution is returned as the result (Koza et al. 2003, p. 59). (To understand where the name of this technique comes from, imagine that the space of all possible solutions to a given problem is represented as a three-dimensional contour landscape. A given set of coordinates on that landscape represents one particular solution. Those solutions that are better are higher in altitude, forming hills and peaks; those that are worse are lower in altitude, forming valleys. A hill-climber is then an algorithm that starts out at a given point on the landscape and moves inexorably uphill.) Hill-climbing is what is known as a greedy algorithm, meaning it always makes the best choice available at each step in the hope that the overall best result can be achieved this way. By contrast, methods such as genetic algorithms and simulated annealing, discussed below, are not greedy; these methods sometimes make suboptimal choices in the hopes that they will lead to better solutions later on.

Simulated annealing

Another optimization technique similar to evolutionary algorithms is known as simulated annealing. The idea borrows its name from the industrial process of annealing in which a material is heated to above a critical point to soften it, then gradually cooled in order to erase defects in its crystalline structure, producing a more stable and regular lattice arrangement of atoms (Haupt and Haupt 1998, p. 16).
In simulated annealing, as in genetic algorithms, there is a fitness function that defines a fitness landscape; however, rather than a population of candidates as in GAs, there is only one candidate solution. Simulated annealing also adds the concept of temperature, a global numerical quantity which gradually decreases over time. At each step of the algorithm, the solution mutates (which is equivalent to moving to an adjacent point of the fitness landscape). The fitness of the new solution is then compared to the fitness of the previous solution; if it is higher, the new solution is kept. Otherwise, the algorithm makes a decision whether to keep or discard it based on temperature. If the temperature is high, as it is initially, even changes that cause significant decreases in fitness may be kept and used as the basis for the next round of the algorithm, but as temperature decreases, the algorithm becomes more and more inclined to only accept fitness-increasing changes. Finally, the temperature reaches zero and the system freezes; whatever configuration it is in at that point becomes the solution. Simulated annealing is often used for engineering design applications such as determining the physical layout of components on a computer chip (Kirkpatrick, Gelatt and Vecchi 1983). 
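The single-candidate loop described above can be sketched as follows. The geometric cooling schedule, the acceptance rule (a worse move is accepted with probability exp(delta / T), as in the classic Metropolis-style formulation), and the toy one-dimensional fitness landscape are all invented for illustration. With the temperature held at zero, the loop only ever accepts improvements - that is, it reduces to the greedy hill-climbing algorithm of the previous section.

```python
import math
import random

def simulated_annealing(fitness, start, neighbor,
                        t_start=10.0, cooling=0.95, steps=500):
    current, temp = start, t_start
    for _ in range(steps):
        candidate = neighbor(current)   # mutate: move to an adjacent point
        delta = fitness(candidate) - fitness(current)
        # Improvements are always kept; while the temperature is high,
        # even moves that decrease fitness may be accepted.
        if delta >= 0 or (temp > 0 and random.random() < math.exp(delta / temp)):
            current = candidate
        temp *= cooling                 # the system gradually cools
    return current

# Toy landscape: a single smooth hill whose peak is at x = 3.
best = simulated_annealing(
    fitness=lambda x: -(x - 3) ** 2,
    start=random.uniform(-30, 30),
    neighbor=lambda x: x + random.uniform(-1, 1),
)
```

On this single-peaked landscape the candidate climbs toward x = 3 and, as the temperature decays toward zero, "freezes" near the peak; the early willingness to accept downhill moves only matters on landscapes with multiple peaks, where it lets the search escape local optima that would trap a pure hill-climber.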