
## Taxon Sampling for Ancestral State Reconstruction

Louxin Zhang, Department of Mathematics, National University of Singapore

### Little Background to Ancestral Sequence Reconstruction

Ancestral sequence reconstruction incorporates sequences from modern organisms into evolutionary models to estimate the corresponding sequence of an ancestor that no longer exists on Earth. It has become a popular approach to studying the origin, evolution, and sequence-function relationships of proteins, genes, and other components of life. This approach to understanding proteins, or life in general, was proposed by Zuckerkandl and Pauling in 1963.

[Figure: "After ~100 million years". Source: Carnegie Museum of Natural History.]


The differences are due to changes in our DNA (slide from J. Ma; source: U.S. DOE):

- human: ~3.1 G, 23 chromosomes
- chimp: ~3.3 G, 24 chromosomes
- mouse: ~2.7 G, 20 chromosomes
- dog: ~2.5 G, 39 chromosomes

### Recreate Genome of Ancient Human Ancestor

The boreoeutherian ancestor lived 70 million years ago. The Boreoeutheria were formed by a series of speciation events occurring rapidly after the ancestor, leading to a star-like phylogeny. Computer simulation suggests that a small number of extant genomes can give a highly accurate reconstruction of this ancient genome (Blanchette et al. 04; Ma et al. 07).

### Ancestral State Reconstruction Problem

Given

- a phylogenetic tree T of a character (the evolutionary history of the character), and
- an evolutionary model (a prior distribution over all possible states at the root, and a substitution probability on each branch of T),

estimate the root state from the leaf states of the character in the tree T.

### Part I: Methods for Ancestral State Reconstruction

Fitch (parsimony) method

Step 1: Working from the leaves toward the root, compute a subset $S_x$ of letters for each node $x$ with children $y$ and $z$ as follows (for a leaf, $S_x$ contains only its observed letter):

$$
S_x =
\begin{cases}
S_y \cap S_z & \text{if } S_y \cap S_z \neq \emptyset, \\
S_y \cup S_z & \text{otherwise.}
\end{cases}
$$

Step 2: Select a letter from the subset obtained at the root $r$ randomly.

[Figure: worked example on a small tree, with node sets such as {G} ∪ {A} = {G, A} and {G, A} ∩ {G} = {G}; the root is assigned G.]

Fitch method:

- It assigns a state to the root by minimizing the total number of substitutions placed on all branches.
- However, it ignores the substitution rates on branches and is very sensitive to the topology of the tree, which leads to several limitations.
- It is a local, and therefore efficient, method.

Marginal maximum likelihood (ML) method:

- It assigns to the root a state a that has the maximum likelihood, defined as Pr[the root state is a | the given states at the leaves]; ties are broken arbitrarily.
- It is a global method and so less efficient. Approximation algorithms have been studied.
- However, it has the largest reconstruction accuracy over all methods.
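The two Fitch steps can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the nested-tuple tree encoding, the function names `fitch_sets` and `fitch_root_state`, and the 4-leaf example tree are all assumptions made here for illustration.

```python
import random

def fitch_sets(tree):
    """Step 1: return the Fitch state set at the root of `tree`.

    Assumed encoding: a leaf is a one-character string (its observed
    letter); an internal node is a 2-tuple (left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return {tree}
    left, right = (fitch_sets(child) for child in tree)
    common = left & right
    # Intersection if it is non-empty, otherwise union.
    return common if common else left | right

def fitch_root_state(tree, rng=random):
    """Step 2: pick one letter uniformly at random from the root set."""
    return rng.choice(sorted(fitch_sets(tree)))

# Hypothetical 4-leaf tree ((G, A), (G, G)).
example = (("G", "A"), ("G", "G"))
print(fitch_sets(example))        # {'G'}
print(fitch_root_state(example))  # G
```

Each node is visited once and only its two children are consulted, which is the sense in which the method is local and efficient.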

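### A Reconstruction Example

Consider the following evolutionary model (tree + conservation rates + prior probabilities) for a character with two states, 0 and 1, where $p_{\text{prior}}(0) = p_{\text{prior}}(1) = 0.5$.

[Figure: tree with root r. One child of r is a leaf in state 0, on a branch with conservation rate 0.8. The other child is an internal node on a branch with conservation rate 0.9; its two children are leaves in state 1, each on a branch with conservation rate 0.9.]

A conservation rate of 0.8 means that a state remains unchanged along the branch with probability 0.8.

For root state $s_r = 0$, summing over the two possible states (1 and 0) of the internal node:

$$
\Pr[011 \mid s_r = 0] = 0.8 \times 0.1 \times 0.9 \times 0.9 + 0.8 \times 0.9 \times 0.1 \times 0.1 = 0.072
$$

For root state $s_r = 1$:

$$
\Pr[011 \mid s_r = 1] = 0.2 \times 0.9 \times 0.9 \times 0.9 + 0.2 \times 0.1 \times 0.1 \times 0.1 = 0.146
$$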
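The two conditional probabilities can be reproduced with a short recursive computation. The sketch below assumes the tree shape implied by the arithmetic: a leaf in state 0 attached to the root with conservation rate 0.8, and an internal node attached with rate 0.9 whose two leaf children (both in state 1) have rate 0.9 each. The list-of-pairs encoding and the name `cond_likelihood` are illustrative assumptions.

```python
def cond_likelihood(node, state):
    """Pr[leaf states below `node` | `node` is in `state`], binary character.

    Assumed encoding: a leaf is an int (its observed state); an internal
    node is a list of (conserve, child) pairs, where `conserve` is the
    probability that the state is unchanged along the branch to `child`.
    """
    if isinstance(node, int):
        return 1.0 if node == state else 0.0
    total = 1.0
    for conserve, child in node:
        same = conserve * cond_likelihood(child, state)
        flipped = (1.0 - conserve) * cond_likelihood(child, 1 - state)
        total *= same + flipped
    return total

# Leaves 0 1 1 on the example tree.
tree = [(0.8, 0), (0.9, [(0.9, 1), (0.9, 1)])]

like0 = cond_likelihood(tree, 0)
like1 = cond_likelihood(tree, 1)
print(round(like0, 4), round(like1, 4))  # 0.072 0.146

# With equal priors, the marginal ML choice is the state with the larger value.
ml_state = max((0, 1), key=lambda s: 0.5 * cond_likelihood(tree, s))
print(ml_state)  # 1
```

The second value evaluates to 0.2 × 0.73 = 0.146.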

$$
\Pr[011 \mid s_r = 0] = 0.072, \qquad \Pr[011 \mid s_r = 1] = 0.146
$$

The marginal ML method therefore selects 1 as the root state from the leaf states 0 1 1.

### Part II: Reconstruction Accuracy – Definition

For a reconstruction method M, its accuracy is the expected probability that the method correctly reconstructs the root state from a possible configuration D of states of the leaf species:

$$
\begin{aligned}
RA_M(T) &= \sum_{c,D} \Pr[c \text{ evolves into } D]\,\Pr[M \text{ reconstructs } c \text{ from } D] \\
&= \sum_{c,D} p_{\text{prior}}(c)\,\Pr[D \mid c]\,\Pr[M \text{ reconstructs } c \text{ from } D]
\end{aligned}
$$

[Figure: four example leaf configurations on the tree. For two of them Fitch selects 0 as the root state; for the other two Fitch selects 0 as the root state with probability 1/2.]

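Accuracy of the Fitch method, with $p_{\text{prior}}(0) = p_{\text{prior}}(1) = 0.5$ (by symmetry the two root states contribute equally, so the sum reduces to the terms for c = 0):

$$
\begin{aligned}
RA_F(T) &= \sum_{c,D} p_{\text{prior}}(c)\,\Pr[D \mid c]\,\Pr[F \text{ reconstructs } c \text{ from } D] \\
&= \sum_{D} \Pr[D \mid 0]\,\Pr[F \text{ reconstructs } 0 \text{ from } D] \\
&= 0.584 + 0.072 + 0.072 + 0.146 \times 0.5 + 0.072 \times 0.5 = 0.837
\end{aligned}
$$

Accuracy of ML, with $p_{\text{prior}}(0) = p_{\text{prior}}(1) = 0.5$:

$$
\begin{aligned}
RA_{ML}(T) &= \sum_{c,D} p_{\text{prior}}(c)\,\Pr[D \mid c]\,\Pr[ML \text{ reconstructs } c \text{ from } D] \\
&= \sum_{D} \Pr[D \mid 0]\,\Pr[ML \text{ reconstructs } 0 \text{ from } D] \\
&= 0.584 + 0.072 + 0.072 + 0.146 = 0.874
\end{aligned}
$$

For the ML method and a tree H, ML always returns a state c that maximizes Pr[c evolves into D], so the inner sum collapses to a maximum:

$$
\begin{aligned}
RA_{ML}(H) &= \sum_{c,D} \Pr[c \text{ evolves into } D]\,\Pr[ML \text{ reconstructs } c \text{ from } D] \\
&= \sum_{D} \sum_{c} \Pr[c \text{ evolves into } D]\,\Pr[ML \text{ reconstructs } c \text{ from } D] \\
&= \sum_{D} \max_{c} \Pr[c \text{ evolves into } D]
\end{aligned}
$$

[Figure: tree H with root state c and leaf configuration D: A G C G G.]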
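The two accuracy values 0.837 and 0.874 can be checked by brute-force enumeration of all eight leaf configurations. This is a sketch under assumptions: the tree shape is the one implied by the example's arithmetic (a leaf on a branch with conservation rate 0.8 at the root, plus an internal node on a branch with rate 0.9 whose two leaves have rate 0.9 each), and the encoding and function names are illustrative.

```python
from itertools import product

# Assumed encoding: a leaf is an int index into a configuration D; an
# internal node is a list of (conserve, child) pairs.
TREE = [(0.8, 0), (0.9, [(0.9, 1), (0.9, 2)])]
PRIOR = {0: 0.5, 1: 0.5}
CONFIGS = list(product((0, 1), repeat=3))

def likelihood(node, state, D):
    """Pr[states of D below `node` | `node` is in `state`]."""
    if isinstance(node, int):
        return 1.0 if D[node] == state else 0.0
    total = 1.0
    for conserve, child in node:
        total *= (conserve * likelihood(child, state, D)
                  + (1.0 - conserve) * likelihood(child, 1 - state, D))
    return total

def fitch_set(node, D):
    """Fitch state set at `node` (binary tree assumed)."""
    if isinstance(node, int):
        return {D[node]}
    (_, a), (_, b) = node
    sa, sb = fitch_set(a, D), fitch_set(b, D)
    return (sa & sb) or (sa | sb)

def fitch_prob(c, D):
    """Pr[Fitch reconstructs c from D], uniform random pick at the root."""
    s = fitch_set(TREE, D)
    return 1.0 / len(s) if c in s else 0.0

def accuracy_fitch():
    return sum(PRIOR[c] * likelihood(TREE, c, D) * fitch_prob(c, D)
               for c in PRIOR for D in CONFIGS)

def accuracy_ml():
    # ML picks the state maximizing the joint probability, so each D
    # contributes max_c p_prior(c) * Pr[D | c].
    return sum(max(PRIOR[c] * likelihood(TREE, c, D) for c in PRIOR)
               for D in CONFIGS)

print(round(accuracy_fitch(), 3), round(accuracy_ml(), 3))  # 0.837 0.874
```

Enumeration is exponential in the number of leaves, so it only serves as a check on small examples.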

### Part II: Reconstruction Accuracy – Monotonicity

When reconstructing the state of the common ancestor of a group of organisms, one would expect the accuracy to increase with the number of organisms used. However, this is not always true for a given method over a phylogeny. In other words, more organisms do not necessarily give a better estimate of the ancestral state.

Theorem: The accuracy function of the Fitch method is not monotonic.

Consider the following tree (Li, Steel, Zhang 08), with $p_{\text{prior}}(a) = p_{\text{prior}}(b) = 0.5$.

[Figure: counterexample tree.]

The accuracy of reconstruction from all leaves is 0.866, while the accuracy of reconstruction from the left leaf alone is 0.9.

Does this counterintuitive fact occur often?

[Figure: a counterexample with an ultrametric tree, in which the root is equally far from all leaves. Numeric labels in the figure: 50.0000, 47.8447, 24.9926, 21.7263, 17.5103, 9.7109, 14.1190, 10.0613.]

The accuracy of reconstruction from all leaves is 0.915268; the accuracy of reconstruction from a, i, b, e is 0.921926.
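Non-monotonicity is easy to reproduce numerically on a small tree. The sketch below does not use the tree from Li, Steel, and Zhang, whose figure is not available here. It uses a hypothetical 3-leaf tree chosen for illustration: one reliable leaf (conservation rate 0.9) attached to the root, and a noisy cherry (rate 0.6 on all three of its branches). The encoding and function names are likewise assumptions.

```python
from itertools import product

def likelihood(node, state, D):
    """Pr[states of D below `node` | `node` is in `state`], binary character."""
    if isinstance(node, int):
        return 1.0 if D[node] == state else 0.0
    total = 1.0
    for conserve, child in node:
        total *= (conserve * likelihood(child, state, D)
                  + (1.0 - conserve) * likelihood(child, 1 - state, D))
    return total

def fitch_set(node, D):
    """Fitch state set at `node`; a node with one child inherits its set."""
    if isinstance(node, int):
        return {D[node]}
    sets = [fitch_set(child, D) for _, child in node]
    result = sets[0]
    for s in sets[1:]:
        result = (result & s) or (result | s)
    return result

def fitch_accuracy(tree, n_leaves, prior=(0.5, 0.5)):
    """Sum over D and c of p_prior(c) * Pr[D | c] * Pr[Fitch picks c from D]."""
    total = 0.0
    for D in product((0, 1), repeat=n_leaves):
        root_set = fitch_set(tree, D)
        for c in root_set:
            total += prior[c] * likelihood(tree, c, D) / len(root_set)
    return total

# Hypothetical tree: leaf 0 on a 0.9 branch; leaves 1 and 2 under a noisy cherry.
full = [(0.9, 0), (0.6, [(0.6, 1), (0.6, 2)])]
left_only = [(0.9, 0)]  # the same tree restricted to the reliable leaf

acc_all = fitch_accuracy(full, 3)
acc_left = fitch_accuracy(left_only, 1)
print(round(acc_all, 3), round(acc_left, 3))  # 0.806 0.9
```

In this hypothetical tree the two noisy leaves can outvote or tie with the reliable one, which lowers the Fitch accuracy from 0.9 to 0.806.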

More taxa are not necessarily better in ancestral sequence reconstruction when the Fitch method is applied. Similarly, more sequence data do not always lead to the true phylogeny when the parsimony method is used. There are probably two reasons for these counterintuitive facts:

- the Fitch method ignores the character change rates on all branches;
- the Fitch method is a kind of local method.

Theorem: The maximum likelihood (ML) method has the largest reconstruction accuracy over all methods, for any tree and evolutionary model.

Proof of Theorem: For any method M and tree H, where c ranges over root states and D over leaf configurations of H,

$$
\begin{aligned}
RA_M(H) &= \sum_{c,D} \Pr[c \text{ evolves into } D]\,\Pr[M \text{ reconstructs } c \text{ from } D] \\
&= \sum_{D} \sum_{c} \Pr[c \text{ evolves into } D]\,\Pr[M \text{ reconstructs } c \text{ from } D] \\
&\le \sum_{D} \sum_{c} \Big(\max_{c'} \Pr[c' \text{ evolves into } D]\Big)\,\Pr[M \text{ reconstructs } c \text{ from } D] \\
&= \sum_{D} \Big(\max_{c'} \Pr[c' \text{ evolves into } D]\Big) \Big\{ \sum_{c} \Pr[M \text{ reconstructs } c \text{ from } D] \Big\} \\
&= \sum_{D} \max_{c'} \Pr[c' \text{ evolves into } D] = RA_{ML}(H)
\end{aligned}
$$

The last line uses the fact that, for each D, the probabilities with which M returns each state c sum to 1.

Corollary: The accuracy function of the ML method is monotonic.

Proof of Corollary: Using a subset of the leaves is just a specific reconstruction method that does not use the letter information in the other leaves. Hence its accuracy is not higher than that of the reconstruction from all the leaves when ML is used.
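The theorem can be sanity-checked on a small case by enumerating every deterministic method, that is, every map from leaf configurations to a root state. Randomized methods are mixtures of these, so they cannot do better. The sketch below assumes the 3-leaf example tree implied by the earlier arithmetic (conservation rate 0.8 on the branch to the single leaf, 0.9 on the other three branches). The encoding and names are illustrative.

```python
from itertools import product

# Assumed encoding: a leaf is an int index into D; an internal node is a
# list of (conserve, child) pairs.
TREE = [(0.8, 0), (0.9, [(0.9, 1), (0.9, 2)])]
PRIOR = (0.5, 0.5)
CONFIGS = list(product((0, 1), repeat=3))

def likelihood(node, state, D):
    """Pr[states of D below `node` | `node` is in `state`]."""
    if isinstance(node, int):
        return 1.0 if D[node] == state else 0.0
    total = 1.0
    for conserve, child in node:
        total *= (conserve * likelihood(child, state, D)
                  + (1.0 - conserve) * likelihood(child, 1 - state, D))
    return total

# JOINT[D][c] = p_prior(c) * Pr[D | c], i.e. Pr[c evolves into D].
JOINT = {D: [PRIOR[c] * likelihood(TREE, c, D) for c in (0, 1)]
         for D in CONFIGS}

# Accuracy of ML: sum over D of the largest joint probability.
ra_ml = sum(max(JOINT[D]) for D in CONFIGS)

# Accuracy of each of the 2**8 deterministic methods.
accuracies = [
    sum(JOINT[D][c] for D, c in zip(CONFIGS, choice))
    for choice in product((0, 1), repeat=len(CONFIGS))
]

print(round(ra_ml, 3), round(max(accuracies), 3))  # 0.874 0.874
```

No method exceeds the ML accuracy, and the best deterministic method attains it because ML is itself one of the enumerated maps.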

### Part III: Reconstruction Accuracy – Computation

Is the reconstruction accuracy $RA_M(T)$ polynomial-time computable for any phylogeny T, given a simple Markovian evolutionary model (say, Jukes-Cantor) and a method M?

Theorem: The reconstruction accuracy is linear-time computable for the Fitch method.

Proof: [Figure: tree with root r and nodes labeled X, Y, Z.]

Computing the accuracy of ML

It is unknown how to compute the accuracy of ML in polynomial time.

Theorem 1 (B. Ma & Zhang 09). For any n-leaf tree T in which a binary character changes with probability at most q < 1/2 on each branch, $RA_{ML}(T)$ can be approximated within ratio $1 - \epsilon$ in $O(n N^4)$ time for any $\epsilon$, where N = [expression missing].

Some notation:

- r: the root of tree T;
- D: a state configuration of the leaves;
- $p_{\text{prior}}(a)$: the prior probability of a root state a;
- $\Pr[D \mid s_r = a]$: the probability that root state a evolves into the states in D;
- $\Pr[s_r = a \mid D]$: the probability that the root state is a, given that the states in D are observed at the leaves. It is called the likelihood of the state a given D.

Lemma 1. Let T be a phylogeny of k (> 1) leaves with root r. For any configuration D of the leaves of T and any state c, [formula missing].

Proof. It is based on the following facts: (1) T has 2(k-1) edges; [remaining facts missing].

Method summary: $RA_{ML}(T) =$ [formula missing]. Calculate the sum of the estimates of the mutational probability recursively.

### Acknowledgement

- Guoling Li, Genome Institute of Singapore
- Jian Ma, UC Santa Cruz, USA
- Bin Ma, University of Waterloo, Canada
- Mike Steel, University of Canterbury, New Zealand

Thank You
