Thursday, September 01, 2005

A Whiff of the BS Model

I thought it would be useful to compile a summary of the problems with the Behe & Snoke (2004) paper (BS). Note, however, that despite my criticisms I commend BS for their willingness to publish in a high-profile peer-reviewed journal (albeit one that does not often publish on molecular evolution). I wish other proponents of intelligent design creationism (ID) would do the same -- I'm thinking particularly of the mathematician thought by some to be "the Isaac Newton of information theory". (Curiously, this feeling is apparently shared by at least one proponent of ID.)

Here's the beginning of the list, largely based on Musgrave, Reuland & Cartwright's review in Panda's Thumb (MRC) and on the Lynch (2005) paper (L05):
  1. "Darwinism" is tested using a non-Darwinian model.
    "The flavor of BS’ paper may be gauged by the fact that the authors are skeptical of Darwinian processes to produce complex structures, yet use a model which largely ignores Darwinian processes." (MRC)
    "Although the authors claim to be evaluating whether Darwinian processes are capable of yielding new multiresidue functions, the model that they present is non-Darwinian [...]. Contrary to the principles espoused by Darwin, that is, that evolution generally proceeds via functional intermediate states, BS consider a situation in which the intermediate steps to a new protein are neutral and involve nonfunctional products." (L05, p 2217)
  2. The base population is constructed in a bizarre way. BS (pp 2651-2) assume:
    "that newly duplicated genes encode a full-length protein with the signals necessary for its proper expression. It is further assumed that all duplicate genes are selectively neutral. [...] Any given organism in the population may be thought to have anywhere from zero to multiple extra copies of the gene; that is, duplicate copy number is considered to have no selective effect. However, the model presupposes that there are a total of N duplicate copies of the gene, equal to the number of organisms in the population".
    This problem is corrected in L05's simulation (p 2218):
    "the model presented here starts with a more realistic base population harboring a single locus in all individuals. A duplicate gene then arises in a single random member of the population [...]"
    Note that the L05 approach is actually expected to increase the time to neo-functionalization, when compared to BS' approach. (See also points 4 and 5.)

  3. Only one advantageous target sequence is considered possible. This is acknowledged in BS' discussion (p 2661, their emphasis):
    "the simulation looks for the production of a particular MR [multi-residue] feature in a particular gene, the values will be overestimates of the time necessary to produce some MR feature in some duplicated gene. In other words, the simulation takes a prospective stance, asking for a certain feature to be produced, but we look at modern proteins retrospectively."
    However, they then set this caveat aside entirely, despite its potentially large effect on their population size and fixation time estimates. L05 (p 2220) corrected this problem in its simulations by introducing a new parameter: "the number of potential contributory sites to the new function (n)".

  4. The mutational advantage of duplication is ignored. L05 (p 2223) points out that BS:
    "failed to realize that a completely linked pair of duplicate genes has a mutational advantage equal to the mutation rate to null alleles (µ), owing to the fact that both members of a linked pair must be inactivated before the viability of the carrier is affected".
    This implies that, when the population size "N is moderately large (2Nµ>1), the fixation probability approaches 2µ" (L05). (See also point 2.)

  5. Intermediate alleles are excluded from the base population. L05 (p 2222) notes that BS:
    "assume that the evolution of a multi-residue function requires the origin of a full set of mutations previously kept absent from the population [...]."
    Since BS assume that "the intermediate steps toward the evolution of a selectable multi-residue function are entirely neutral [...]", intermediate alleles could already be present in the population before duplication. This is allowed in L05's model. (See also point 2.)

  6. The value of ρ used throughout is unrealistically high. BS define ρ as the ratio of the number of null mutations to mutations compatible with the novel function. They set the parameter to ρ=1000 in all their simulations and justify this choice by asserting that (pp 2652-3):
    "The majority of nonneutral point mutations to the gene will yield a null allele (again, by which we mean a gene coding for a nonfunctional protein) because most mutations that alter the amino acid sequence of a protein effectively eliminate function (Reidhaar-Olson and Sauer 1988, 1990; Bowie and Sauer 1989; Lim and Sauer 1989; Bowie et al. 1990; Reidhaar-Olson and Sauer 1990; Rennell et al. 1991; Axe et al. 1996; Huang et al. 1996; Sauer et al. 1996; Suckow et al. 1996)."
    This assumption is not supported by the studies cited by BS (see the "Rho-Oh!" section of MRC and the L05 passage quoted in my earlier post). By their own admission, the value of ρ has a profound effect on the outcome of their model (p 2661):
    "The model is more sensitive to the value of ρ [...]. If ρ were less by a factor of 10 (100 instead of 1000), then the population size needed to fix the feature in the preceding example in 10^8 generations would decrease from 10^22 to 10^16."
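
To put the sensitivity quoted in point 6 in perspective, one can ask what scaling those two numbers imply. BS do not state a scaling law. Purely for illustration, assume that the required population size N behaves like a power of ρ between the two quoted values. The implied exponent is then:

```latex
N \propto \rho^{k}
\quad\Longrightarrow\quad
k = \frac{\log_{10}\!\left(10^{22}/10^{16}\right)}{\log_{10}\!\left(1000/100\right)} = \frac{6}{1} = 6
```

Under that assumption, every ten-fold overestimate of ρ would inflate the required population size roughly a million-fold. This is why an unrealistically high ρ matters so much.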

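The limiting value quoted in point 4 can be checked numerically. The sketch below uses Kimura's diffusion approximation for the fixation probability of a single new copy (initial frequency 1/(2N)) in a diploid population, u = (1 - e^(-2s)) / (1 - e^(-4Ns)), and plugs in s = µ. Several choices here are my own simplifying assumptions and not necessarily how L05 derives the result: treating the linked pair's mutational advantage as an ordinary selective coefficient, taking effective and census population sizes to be equal, and using µ = 10^-6. The helper name fixation_probability is likewise hypothetical.

```python
import math


def fixation_probability(s: float, n: int) -> float:
    """Kimura's diffusion approximation for the fixation probability of a
    single new copy in a diploid population of size n (effective size assumed
    equal to n) with selective advantage s. Here s stands in for the
    mutational advantage mu of a linked duplicate pair (an assumption)."""
    if s == 0.0:
        # Neutral limit: fixation probability equals the initial frequency.
        return 1.0 / (2 * n)
    # (1 - e^(-2s)) / (1 - e^(-4ns)), written with expm1 so that tiny values
    # of s (e.g. 1e-6) do not lose precision to cancellation.
    return math.expm1(-2.0 * s) / math.expm1(-4.0 * n * s)


if __name__ == "__main__":
    mu = 1e-6  # hypothetical null-mutation rate per copy per generation
    for n in (100, 10_000, 1_000_000, 100_000_000):
        u = fixation_probability(mu, n)
        # u*2N near 1 means neutral-like behaviour; u/(2mu) near 1 means u ~ 2mu.
        print(f"N={n:>11,}  2N*mu={2 * n * mu:g}  u*2N={u * 2 * n:.3f}  u/(2mu)={u / (2 * mu):.3f}")
```

For small N the product u·2N stays close to 1, so the duplicate behaves much like a neutral allele. Once 2Nµ exceeds about 1, the ratio u/(2µ) settles toward 1, consistent with the L05 statement that the fixation probability approaches 2µ.
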
I'll continue to add to this list over the next few days. If you let me know of other issues, I'll add them to the list as well.

Update: Two more points have been added.

Update 2: One more point has been added.