This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Phylogeny, a diagram for an evolutionary network, is used to infer the phylogenetic relationships among species or genes. During the pre-genomic era, phylogenetic analyses based on morphological, biological, and bionomic characters, allozymes, and RFLP data were used extensively to infer the evolutionary relationships among species. With the advent of high-throughput sequencing technologies and the development of extensive statistical analysis tools, an increasing amount of sequence information has become available in the public domain. This has revolutionized the field of phylogenetics and has opened up opportunities to draw and reconstruct phylogenetic relationships with more confidence and accuracy. Consequently, phylogenetics has become an integral part of sequencing-associated research projects. Although many publications on understanding phylogenetic trees are available, most of them are written either for experts in the field or for bioinformaticians. Beginners need a document that covers all the basics together with a brief account of modern developments in phylogenetics. Considering the importance of phylogenetic analysis in modern science, this review attempts to simplify the understanding of phylogenetic tree construction and of the availability and usability of the different methods and software tools for inferring trees.
The field of phylogenetics has become an integral part of modern biological research. Constructing a phylogenetic tree has become so easy that even a novice can build a reasonably accurate tree with little effort. This is largely due to the free availability of many tree construction, viewing, and editing tools that demand very little knowledge of the underlying procedures (i.e., it is not mandatory to know the models and algorithms that operate behind the scenes). Phylogenetic analysis can be performed to infer the evolutionary relationships among the members of a taxon; to understand the evolution of genomes and gene families; to classify genes into classes such as orthologs, paralogs, and in- or out-paralogs; and to understand the evolution of new functions through duplication, horizontal gene transfer, gene conversion, recombination, co-evolution, etc. (Hafner and Nadler, 1988; Nei, 2003; Pagel, 2000). Phylogenetic analysis provides a powerful tool for comparative genomics (Pagel, 2000). Genome sequencing projects are providing valuable sequence information that is widely used to infer the evolutionary relationships between different species or genes. Species phylogenies are generally inferred from paleontological/geological information or morphological traits (Nei, 2003). These phylogenies act as a reference for assessing the veracity of a phylogenetic tree constructed from any phylogenetically informative marker. With the increased availability of whole genome sequences, the field of phylogenomics (i.e., the use of either whole genomes or a large number of genes for phylogenetic analysis) is becoming popular among evolutionary biologists (Fitz Gibbon and House, 1999; Korbel et al., 2002; Snel et al., 1999; Thornton and DeSalle, 2000).
Many phylogenomics-based reports have been published, and most of them faithfully reflect the reference species phylogenies inferred from paleontological and/or geological information (Kumar and Filipski, 2001). Furthermore, phylogenomic reconstruction helps in supplementing or correcting earlier working phylogenetic relationships (Kumar and Filipski, 2001). Phylogenetic trees can be drawn from genes (nucleotide or protein sequences); morphological, biological, and bionomic characters; restriction fragment polymorphisms; whole genome orthologs; or geological records (Horner and Pesole, 2004; Klenk and Göker, 2010; Nikaido et al., 2001; Snel et al., 1999). Although it is very easy to construct phylogenetic trees using user-friendly software tools, a basic understanding of the processes that run behind the scenes greatly helps to improve the quality of the trees, because it allows better-informed input values to be supplied to the programs. Thus, this review article centers on the basic concepts of phylogenetic analysis using nucleotide or amino acid sequences.
A phylogenetic tree, also known as an "evolutionary tree", is the graphical representation of the evolutionary relationships among the taxa/genes in question. A dendrogram is a broad term for the diagrammatic representation of a phylogenetic tree. Different terms are used to describe the characteristics of a phylogenetic tree. A cladogram is a dendrogram that shows only the genealogy of the taxa and says nothing about branch lengths or times of divergence (Page and Holmes, 1998; Procter et al., 2010). A phylogram (additive tree) is a phylogenetic tree that explicitly represents the number of character changes (nucleotide or amino acid changes, or other character variations) through its branch lengths (Page and Holmes, 1998; Procter et al., 2010). In a phylogram, the evolutionary distance between any two taxa is given by the sum of the lengths of the branches connecting them. Though these trees may be rooted or unrooted, they often lack a root. A chronogram (ultrametric tree) is a rooted phylogenetic tree that possesses all the characteristics of an additive tree and, in addition, assumes a molecular clock, which makes it possible to estimate the molecular divergence times between taxa (Page and Holmes, 1998). The molecular clock hypothesis assumes that every site in a protein or coding nucleotide sequence evolves at a constant rate in all species (Zuckerkandl and Pauling, 1962). Furthermore, in a chronogram all taxa are placed equidistant from the ancestor, which is not the case in a phylogram. Phenetics (taximetrics) infers the relationships between taxa, usually using morphology or other observable traits as phylogenetically informative markers (Duncan and Baum, 1981; Mayr, 1965; Page and Holmes, 1998). A tree that shows the evolution of genes is known as a gene tree (Snel et al., 1999), while a tree that shows the evolution of species is known as a species tree.
It is important to note that gene trees do not necessarily follow the species tree. This is because different selection constraints can act on a gene, so that it may evolve at a rate distinct from that of other genes.

How to read a phylogenetic tree:
1. A monophyletic grouping is one in which all species share a common ancestor, and all species derived from that common ancestor are included. This is the only form of grouping accepted as valid by cladists.
2. A paraphyletic grouping is one in which all species share a common ancestor, but not all species derived from that common ancestor are included.
3. A polyphyletic grouping is one in which species that do not share an immediate common ancestor are lumped together, while excluding other members that would link them.

Phylogenetic trees may be rooted or unrooted (please see figure 1 for a typical phylogenetic tree with labeling). A rooted tree represents the divergence of a group of related species from their last common ancestor (the root) by successive branching events over time. In contrast, an unrooted phylogenetic tree reveals the relationships among species/taxa without identifying the most recent common ancestor, or root. Rooted phylogenies are constructed by including in the reconstruction a species or gene that lies outside the group of interest. The distantly related taxa used for rooting are called the out-group, while the more closely related taxa under study form the in-group. The terminal nodes in a phylogenetic tree are called operational taxonomic units (OTUs). The internal nodes, which are connected to the terminal nodes (leaves/OTUs; fig. 1) through branches, are called "ancestral states" or "hypothetical taxonomic units"; they represent forms that might have appeared during evolution and cannot be observed at present (Page and Holmes, 1998; Pagel, 2000). The internal branch points in a species tree represent speciation events, while in a gene family tree they represent duplication events (Pagel, 2000).
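The definition of a monophyletic grouping given above can be checked mechanically on a rooted tree. The following is a minimal sketch in Python, assuming the tree is written as nested tuples with taxon names as strings; the function names and the example tree are hypothetical and serve only as illustration.

```python
def leaves(node):
    """Return the set of taxon names found under a node of a nested-tuple tree."""
    if isinstance(node, str):
        return {node}
    found = set()
    for child in node:
        found |= leaves(child)
    return found


def is_monophyletic(tree, group):
    """True if some node of the rooted tree has exactly `group` as its descendants."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Hypothetical rooted tree: (((human, chimp), gorilla), orangutan)
tree = ((("human", "chimp"), "gorilla"), "orangutan")

print(is_monophyletic(tree, {"human", "chimp"}))    # one ancestor, all descendants included
print(is_monophyletic(tree, {"chimp", "gorilla"}))  # excludes "human", so not monophyletic
```

A group that fails this test may be paraphyletic or polyphyletic; telling the two apart requires additional information about the ancestor that the group is meant to include.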
The internal branches may be bifurcating or multi-furcating. Analysis of gene families generally yields multi-furcating branches, and each of the small multi-furcating branches forms a subtree or clade (Kao et al., 1999; Nei et al., 1997; Nei and Rooney, 2005). The whole process of constructing a phylogenetic tree is divided into five steps, viz.
Step 1: Choosing appropriate markers for the phylogenetic analysis
Step 2: Multiple sequence alignment
Step 3: Selection of an evolutionary model
Step 4: Phylogenetic reconstruction
Step 5: Evaluation of the phylogenetic tree

Step 1: Choosing appropriate markers for the phylogenetic analysis
Any biological information that can be used to infer the evolutionary relationships among taxa is known as a phylogenetically informative marker. It can be DNA, RNA, protein, RFLP, AFLP, ISSR, allozymes, conserved intronic positions, etc. Identification of conserved genetic loci (coding or non-coding) is the first step in analyzing phylogenetic relationships. Both coding (gene) and non-coding genetic regions can be used for the analysis of phylogenetic relationships.
However, the selected sequence(s) must satisfy the following rules:
(a) the sequence should have a long evolutionary history of conservation, as this feature firstly preserves long episodes of evolution and selection, and secondly aids in easy amplification of the target sequences from distant taxa;
(b) conserved, slowly evolving genes may be used to resolve the evolutionary relationships between distantly related species, while fast-evolving genes should be chosen for recently evolved species or for intra-species comparisons;
(c) amino acid sequences are more informative when inferring the evolutionary relationships among distantly related taxa and, conversely, nucleotide sequences are more informative for recently evolved/closely related species;
(d) the sequences to be employed in the phylogenetic analysis should be tested for their usability in a given lineage (for instance, mitochondrial (cytochrome c oxidase subunits I and II (CoxI and CoxII)), chloroplast (trnH-psbA, matK, rpoC, rpoB, rbcL), and ribosomal RNA (16S rRNA) conserved genes are preferred for analyzing animal, plant, and microbial species, respectively, and are called "barcode genes") (Chantangsi et al., 2007; Liu and Beckenbach, 1992; Raghavendra et al., 2009; Shneer, 2009);
(e) finally, if the objective is to estimate the divergence times between taxa, the selected gene or protein sequences should follow the molecular clock hypothesis (Barton et al., 2007; Kumar and Filipski, 2001). However, relaxed molecular clock models have also been proposed recently.
Marker selection is followed by polymerase chain reaction amplification of the target gene, then by sequencing and editing of the sequences for further analysis.

Step 2: Multiple sequence alignments
The second step in phylogenetic construction involves the alignment of the edited sequences.
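To make the idea of aligning sequences concrete, the following minimal Python sketch scores the best global (end-to-end) alignment of two sequences by dynamic programming. It uses the Needleman-Wunsch recurrence, which is one standard approach and is not named in this review; the scoring values (match = 1, mismatch = -1, gap = -1) and the function name are assumptions chosen only for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Score of the best end-to-end alignment of sequences a and b."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap          # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[n][m]


print(global_alignment_score("ACGT", "AGT"))
```

Practical multiple sequence alignment programs build on pairwise comparisons of this kind but use far more elaborate scoring schemes and heuristics.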
Aligning two sequences is known as pair-wise sequence alignment, while an alignment that includes more than two sequences is known as a multiple sequence alignment (MSA). Pair-wise sequence alignments can be classified as global or local. A global pair-wise alignment is an end-to-end alignment of the two given sequences irrespective of their lengths, while a local alignment finds the best alignment of short sequence segments (http://www.ncbi.nlm.nih.gov/). The main aim of multiple sequence alignment is to compare three or more nucleotide or protein sequences and to provide the basis for calculating the sequence diversities/divergences from which the evolutionary relationships among the taxa are inferred. Different models (discussed below) have been proposed, based on various assumptions, to calculate the divergences between sequences or taxa. Hence, a correct sequence alignment is mandatory for obtaining a phylogeny that is truly representative of the evolutionary relationships among the taxa (Feng and Doolittle, 1987). Numerous algorithms have been proposed to perform the task of correct sequence alignment (Procter et al., 2010). Some algorithms are heuristic and compromise on accuracy, others are slow but accurate, and still others are both fast and accurate (Edgar, 2004; Notredame et al., 2000). Some algorithms build the MSA by combining the results obtained from more than one program, and hence can produce a reasonably accurate multiple sequence alignment (Rice et al., 2000). Although many programs, both online and offline, are available to perform MSA, manual intervention is often warranted to achieve a correct MSA (Zvelebil and Baum, 2008).

Step 3: Selection of an evolutionary model
Selection of an evolutionary model follows the multiple sequence alignment.
According to the neutral theory of evolution, most mutations are neutral and occur at a rate of 10⁻⁶ to 10⁻⁸. Considering this fact, every site in a DNA sequence must have undergone a number of substitutions that is proportional to the evolutionary time period. Some sequences may evolve at a faster rate than others, and some lineages may undergo faster evolution than others (Lio and Goldman, 1998). Every site in a sequence may evolve differently (Van de Peer and De Wachter, 1997) and may have a different tolerance for mutation. Nucleotide substitutions can be classified into transitions (purine to purine or pyrimidine to pyrimidine) and transversions (purine to pyrimidine or vice versa), while substitutions in coding sequences can be classified as synonymous (no amino acid change) or non-synonymous (amino acid change). Although there are twice as many possible transversions as transitions, transitions are generally observed more frequently in nature. Thus, the ratio of transitions to transversions, denoted "R", is an important parameter for inferring correct phylogenetic relationships. The R-value may vary from sequence to sequence, and thus it needs to be estimated separately for every set of sequences. The simplest evolutionary models do not consider the R-value in their analysis.

The rate of substitution also varies from site to site within a given sequence (Van de Peer and De Wachter, 1997). This rate variation among sites is commonly modeled by a gamma distribution whose shape parameter, alpha, is estimated from the data. This parameter is used to derive a gamma-corrected distance, referred to as the gamma distance. Thus, inclusion of the gamma parameter increases the probability of obtaining the correct phylogenetic tree. The actual number of mutations that occurred during evolution to yield the present sequences can be considerably larger than the number of differences observed between them, because multiple substitutions at the same site mask one another.
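The gap between observed differences and corrected distances can be illustrated with a short Python sketch. It assumes two already aligned, upper-case DNA sequences and uses two well-known corrections that this review does not name explicitly: the Jukes-Cantor model (which ignores R), with an optional gamma shape parameter alpha, and the Kimura two-parameter model (which treats transitions and transversions separately). The function names and example sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}


def count_differences(seq1, seq2):
    """Return (compared sites, transitions, transversions) for two aligned sequences."""
    sites = transitions = transversions = 0
    for x, y in zip(seq1, seq2):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguous characters (pair-wise deletion)
        sites += 1
        if x == y:
            continue
        if (x in PURINES) == (y in PURINES):
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    return sites, transitions, transversions


def p_distance(seq1, seq2):
    """Observed proportion of differing sites (no correction)."""
    sites, ts, tv = count_differences(seq1, seq2)
    return (ts + tv) / sites


def jc69_distance(p, alpha=None):
    """Jukes-Cantor distance from p (requires p < 0.75); gamma-corrected if alpha is given."""
    x = 1 - 4 * p / 3
    if alpha is None:
        return -0.75 * math.log(x)
    return 0.75 * alpha * (x ** (-1 / alpha) - 1)


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance from transition (P) and transversion (Q) proportions."""
    sites, ts, tv = count_differences(seq1, seq2)
    P, Q = ts / sites, tv / sites
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))


s1 = "ACGTACGTACGTACGTACGT"
s2 = "GTCTACGTACGTACGTACGT"   # two transitions and one transversion relative to s1
p = p_distance(s1, s2)
print(p, jc69_distance(p), jc69_distance(p, alpha=0.5), k2p_distance(s1, s2))
```

In this example each corrected distance is larger than the observed proportion of differences, and the gamma-corrected value is the largest of the Jukes-Cantor pair, which is the behavior the paragraph above describes.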
Hence, a correction of the evolutionary distance is required to approach the actual value, and this is done by applying the best-fit model appropriately.

All these facts complicate the situation and call for evolutionary models that can best estimate the actual rate of substitution for a given set of sequences. Every phylogenetic reconstruction method uses models of evolution, ranging from simple to complex, in order to obtain evolutionary relationships that are as close as possible to reality. A number of different models have been proposed separately for nucleotide, codon, and protein sequences, differing in the assumptions made and the parameters used (Lio and Goldman, 1998; Yang, 2007). It is important to note that no single model incorporates all the possible information; thus, the best-fit model for the sequences under study should be chosen critically before the analysis. The evolutionary model that best explains the observed sequence data can be inferred using the ModelTest or jModelTest software. These programs use three different criteria to infer the best-fit model, namely the hierarchical Likelihood Ratio Test (hLRT), the Akaike Information Criterion (AIC), and the Bayesian Information Criterion (BIC). For more information on how these estimates are calculated and how parameter-rich models influence them, please refer to Posada (2008).

Technical details of the different available evolutionary models are beyond the scope of this chapter, and readers are referred to the further reading given in the Reference section (Barton et al., 2007; Delport et al., 2008; Lio and Goldman, 1998; Yang, 2007; Yang and Nielsen, 2002).

Step 4: Phylogenetic reconstruction
Two different methodologies are employed by the presently available programs to generate dendrograms: (a) clustering methods, in which the two most closely related taxa are placed under a single internal node and a third taxon is then added, with the taxa under the internal node treated as a single group.
In this way, the program progressively adds the remaining taxa to yield the final phylogenetic tree; (b) the second type of method generates a number of candidate trees that grows with the number of taxa in the analysis, and then selects the best-fitting tree topology (the one with the highest likelihood or probability) under a given evolutionary model. Choosing the correct substitution model is crucial for inferring the most accurate phylogenetic relationships. The freely available software for model selection is listed in the popular software section at the end of the chapter (Table 1).

Phylogenetic tree construction methods can be classified into distance, minimum evolution, parsimony, probabilistic, and likelihood methods (Table 1). Basically, the distance-based methods are simple, and the Operational Taxonomic Units (OTUs) are clustered based on sequence divergences that are calculated using different evolutionary models. The Unweighted Pair Group Method with Arithmetic mean (UPGMA), Neighbor Joining (NJ), Minimum Evolution, and Fitch-Margoliash are examples of distance-based methods (Saitou and Imanishi, 1989). These methods produce a single phylogenetic tree with branch lengths using clustering. Further, distance methods can handle a huge number of sequences, for example to construct the "Tree of Life". Distance-based methods derive the pair-wise distances from the MSA. Other methods work on the MSA directly and construct the phylogeny by taking every single site variation into account when deriving branch lengths. A distance matrix can also be derived from measured distances or morphometric analyses. Various pair-wise distance formulae (for example, the Jaccard coefficient) can be applied to morphological characters or to genetic distance data that come from sequences, restriction site polymorphisms, different methods of marker analysis (for example, micro- or mini-satellites, RAPDs, etc.)
or allozyme data. Distance-matrix methods generally depend on the MSA to calculate the pair-wise distances between OTUs. Gaps and missing data can be handled in different ways: (a) gapped positions (insertions/deletions) can be deleted either pair-wise or completely; (b) gaps can be included as mutations in the analysis. The resulting pair-wise distance matrix is used by the different phylogenetic reconstruction programs for clustering the taxa. An internal node is placed between the two most similar taxa, after which progressive clustering is carried out by treating each internal node as a single taxon.

The NJ method follows the minimum-evolution principle. The concept of minimum evolution is based on the least number of mutations required to obtain a given tree. Maximum parsimony also follows a minimum-evolution principle, but it operates directly on the alignment and minimizes the number of mutations required to obtain a given tree topology. Parsimony methods can be affected by long-branch attraction (fast-evolving lineages are inferred to be closely related because their phylogenetically informative sites are highly saturated), while likelihood methods are better suited to drawing correct phylogenies with strong statistical support in such cases (Zvelebil and Baum, 2008). Of all the methods, maximum likelihood and Bayesian methods are the most sophisticated; they rely on likelihood or probability models to infer evolutionary distances. These two methods are becoming increasingly popular for constructing phylogenies. However, they are computationally intensive, which limits the number of sequences that can be used for constructing larger phylogenies.
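The progressive clustering used by distance-based methods can be sketched in a few lines of Python. The example below follows UPGMA: it repeatedly merges the two closest clusters and averages their distances to every other cluster, weighted by cluster size. It recovers only the tree topology (branch lengths are omitted), and the function name, input format, and distance values are assumptions made for illustration.

```python
from itertools import combinations


def upgma(names, distances):
    """Return a nested-tuple tree built by UPGMA from pair-wise distances.

    `distances` maps a pair of taxon names, e.g. ("A", "B"), to their distance;
    every unordered pair must appear exactly once.
    """
    sizes = {name: 1 for name in names}  # active clusters -> number of leaves
    d = {frozenset(pair): dist for pair, dist in distances.items()}
    while len(sizes) > 1:
        # Find the two closest active clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)  # the new internal node is treated as a single taxon
        for c in sizes:
            if c not in (a, b):
                # Size-weighted average distance from the merged cluster to c.
                d[frozenset((merged, c))] = (
                    sizes[a] * d[frozenset((a, c))] + sizes[b] * d[frozenset((b, c))]
                ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes[a] + sizes[b]
        del sizes[a], sizes[b]
    return next(iter(sizes))


# Hypothetical distance matrix for four OTUs.
example = {
    ("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
    ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10,
}
print(upgma(["A", "B", "C", "D"], example))
```

Here A and B are joined first, C is then attached to that pair, and D joins last. UPGMA implicitly assumes a constant rate of evolution across lineages, so NJ is usually preferred when that assumption is doubtful.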
Finally, every method available to date can produce wrong phylogenetic relationships under certain conditions, and thus every method has its own followers and critics (Nei, 2003).

Step 5: Evaluating the phylogenetic tree
After successful construction of the phylogenetic tree, the next step is the evaluation of the tree topology. This can be done using two evaluation methods, namely the bootstrap method and the interior-branch test. The basic concept of the bootstrap method is to evaluate the tree topology by constructing as many phylogenetic trees as there are pseudo-replicate data sets. Each pseudo-replicate is a data set with the same number of sites (columns) as the original alignment, built by sampling whole columns from the original alignment at random with replacement, so that some columns appear more than once and others are left out. In this way, the user-defined number of pseudo-replicates is generated, and a phylogenetic tree is constructed from each of them. The number of times each node of the initial phylogenetic tree under evaluation is recovered in the bootstrap trees is given as a percentage at the tree nodes, called the "bootstrap value" or "bootstrap percentage" (Felsenstein, 2004). Tree nodes with bootstrap values >70% are generally considered consistent. The computational time required for bootstrap testing depends upon the number of sequences, the length of the sequences, and the number of pseudo-replicates/bootstrap replicates requested. This general method of bootstrapping is known as non-parametric bootstrapping. An alternative is parametric bootstrapping, in which the pseudo-replicate data sets are simulated under an evolutionary model. The given phylogenetic tree is then evaluated by the same procedure as in non-parametric bootstrapping (Makarenkov et al., 2010).
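The non-parametric bootstrap described above can be sketched in Python as follows. The resampling step draws alignment columns with replacement; the support for a grouping is then the percentage of pseudo-replicates in which a tree-building routine recovers it. The tree builder here is a deliberately trivial, hypothetical stand-in (it only reports the two most similar sequences as a group), and all function names and the toy alignment are assumptions made for illustration.

```python
import random
from itertools import combinations


def bootstrap_alignment(alignment, rng):
    """Resample the columns of an alignment (list of equal-length strings) with replacement."""
    n_sites = len(alignment[0])
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[c] for c in columns) for seq in alignment]


def closest_pair_clade(alignment):
    """Toy stand-in for a tree builder: the pair of sequences with the fewest differences."""
    def differences(i, j):
        return sum(x != y for x, y in zip(alignment[i], alignment[j]))
    i, j = min(combinations(range(len(alignment)), 2), key=lambda p: differences(*p))
    return {frozenset((i, j))}


def bootstrap_support(alignment, build_clades, clade, replicates=100, seed=0):
    """Percentage of pseudo-replicates whose inferred groupings contain `clade`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        pseudo = bootstrap_alignment(alignment, rng)
        if clade in build_clades(pseudo):
            hits += 1
    return 100.0 * hits / replicates


toy = ["AAAAAAAA", "AAAAAAAA", "CCCCCCCC"]  # sequences 0 and 1 are identical
print(bootstrap_support(toy, closest_pair_clade, frozenset((0, 1))))
```

In a real analysis, `build_clades` would be replaced by a full reconstruction method (for example NJ or maximum likelihood) that returns every grouping in the inferred tree.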
In the bootstrap interior-branch test, the data are resampled in the same way as in the bootstrap method, but the pseudo-replicates are used to estimate the branch lengths on the given original phylogenetic tree. The test assesses the confidence that each interior branch length is non-zero, and the tree nodes are labeled with the confidence of the corresponding branch length. This method is considered an improvement over the popular bootstrap method (Zvelebil and Baum, 2008).
1. Barton, N.H., Briggs, D.E.G., Eisen, J.A., Goldstein, D.B., Patel, N.H., 2007. Evolution (New York, Cold Spring Harbor Laboratory Press).
2. Chantangsi, C., Lynn, D.H., Brandl, M.T., Cole, J.C., Hetrick, N., Ikonomi, P., 2007. Barcoding ciliates: a comprehensive study of 75 isolates of the genus Tetrahymena. Int. J. Syst. Evol. Microbiol. 57, 2412-2423.
3. Delport, W., Scheffler, K., Seoighe, C., 2008. Models of coding sequence evolution. Brief. Bioinform. 10, 97-109.
4. Duncan, T., Baum, B.R., 1981. Numerical phenetics: its uses in botanical systematics. Annu. Rev. Ecol. Syst. 12, 387-404.
5. Edgar, R.C., 2004. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5, 113.
6. Felsenstein, J., 2004. Inferring phylogenies (Massachusetts, Sinauer Associates, Inc.), p. 644.
7. Felsenstein, J., 2005. PHYLIP (phylogeny inference package) version 3.6. Distributed by the author. Department of Genome Sciences, University of Washington, Seattle, 47-55.
8. Feng, D.F., Doolittle, R.F., 1987. Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J. Mol. Evol. 25, 351-360.
9. Fitz Gibbon, S.T., House, C.H., 1999. Whole genome-based phylogenetic analysis of free-living microorganisms. Nucleic Acids Res. 27, 4218.
10. Guindon, S., Lethiec, F., Duroux, P., Gascuel, O., 2005. PHYML Online: a web server for fast maximum likelihood-based phylogenetic inference. Nucleic Acids Res. 33, W557.
11. Hafner, M.S., Nadler, S.A., 1988. Phylogenetic trees support the coevolution of parasites and their hosts. Nature 332, 258-259.
12. Horner, D.S., Pesole, G., 2004. Phylogenetic analyses: a brief introduction to methods and their application. Expert Rev. Mol. Diagn. 4, 339-350.
13. Huelsenbeck, J.P., Ronquist, F., 2001. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17, 754-755.
14. Kao, H.T., Porton, B., Hilfiker, S., Stefani, G., Pieribone, V.A., DeSalle, R., Greengard, P., 1999. Molecular evolution of the synapsin gene family. J. Exp. Zool. 285, 360-377.
15. Klenk, H., Göker, M., 2010. En route to a genome-based classification of Archaea and Bacteria? Syst. Appl. Microbiol. 33, 175-182.
16. Korbel, J.O., Snel, B., Huynen, M.A., Bork, P., 2002. SHOT: a web server for the construction of genome phylogenies. Trends Genet. 18, 158-162.
17. Kumar, S., Filipski, A.J., 2001. Molecular phylogeny reconstruction. In Encyclopedia of life sciences (Macmillan Publishers Ltd, Nature Publishing Group).
18. Lio, P., Goldman, N., 1998. Models of molecular evolution and phylogeny. Genome Res. 8, 1233-1244.
19. Liu, H., Beckenbach, A.T., 1992. Evolution of the mitochondrial cytochrome oxidase II gene among 10 orders of insects. Mol. Phylogenet. Evol. 1, 41-52.
20. Makarenkov, V., Boc, A., Xie, J., Peres-Neto, P., Lapointe, F., Legendre, P., 2010. Weighted bootstrapping: a correction method for assessing the robustness of phylogenetic trees. BMC Evol. Biol. 10, 250.
21. Mayr, E., 1965. Numerical phenetics and taxonomic theory. Syst. Biol. 14, 73.
22. Nei, M., 2003. Phylogenetic analysis in molecular evolutionary genetics. Annu. Rev. Genet. 30, 371-403.
23. Nei, M., Gu, X., Sitnikova, T., 1997. Evolution by the birth-and-death process in multigene families of the vertebrate immune system. Proc. Natl. Acad. Sci. U. S. A. 94, 7799-7806.
24. Nei, M., Rooney, A.P., 2005. Concerted and birth and death evolution of multigene families. Annu. Rev. Genet. 39, 121-152.
25. Nikaido, M., Matsuno, F., Hamilton, H., Brownell, R.L., Cao, Y., Ding, W., Zuoyan, Z., Shedlock, A.M., Fordyce, R.E., Hasegawa, M., Okada, N., 2001. Retroposon analysis of major cetacean lineages: the monophyly of toothed whales and the paraphyly of river dolphins. Proc. Natl. Acad. Sci. U. S. A. 98, 7384-7389.
26. Notredame, C., Higgins, D.G., Heringa, J., 2000. T-Coffee: A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302, 205-217.
27. Page, R.D.M., Holmes, E.C., 1998. Molecular evolution: a phylogenetic approach. Wiley-Blackwell, 417 p.
28. Pagel, M., 2000. Phylogenetic-evolutionary approaches to bioinformatics. Brief. Bioinform. 1, 117.
29. Pond, S.L.K., Frost, S.D.W., Muse, S.V., 2005. HyPhy: hypothesis testing using phylogenies. Bioinformatics 21, 676-679.
30. Posada, D., 2008. jModelTest: Phylogenetic Model Averaging. Mol. Biol. Evol. 25, 1253-1256.
31. Procter, J.B., Thompson, J., Letunic, I., Creevey, C., Jossinet, F., Barton, G.J., 2010. Visualization of multiple alignments, phylogenies and gene family evolution. Nat. Meth. 7, S16-25.
32. Raghavendra, K., Cornel, A.J., Reddy, B.P.N., Collins, F.H., Nanda, N., Chandra, D., Verma, V., Dash, A.P., Subbarao, S.K., 2009. Multiplex PCR assay and phylogenetic analysis of sequences derived from D2 domain of 28S rDNA distinguished members of the Anopheles culicifacies complex into two groups, A/D and B/C/E. Infect. Genet. Evol. 9, 271-277.
33. Rice, P., Longden, I., Bleasby, A., 2000. EMBOSS: The European Molecular Biology Open Software Suite. Trends Genet. 16, 276-277.
34. Ronquist, F., Huelsenbeck, J.P., 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572.
35. Saitou, N., Imanishi, T., 1989. Relative efficiencies of the Fitch-Margoliash, maximum-parsimony, maximum-likelihood, minimum-evolution, and neighbor-joining methods of phylogenetic tree construction in obtaining the correct tree. Mol. Biol. Evol. 6, 51.
36. Schmidt, H.A., Strimmer, K., Vingron, M., Von Haeseler, A., 2002. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18, 502.
37. Shneer, V.S., 2009. DNA barcoding is a new approach in comparative genomics of plants. Genetika 45, 1436-1448.
38. Simon, D.L., Larget, B., 1998. Bayesian analysis in molecular biology and evolution (BAMBE). Department of Mathematics and Computer Science, Duquesne University, Pittsburgh.
39. Snel, B., Bork, P., Huynen, M.A., 1999. Genome phylogeny based on gene content. Nat. Genet. 21, 108-110.
40. Tamura, K., Dudley, J., Nei, M., Kumar, S., 2007. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol. Biol. Evol. 24, 1596-1599.
41. Thornton, J.W., DeSalle, R., 2000. Gene family evolution and homology: genomics meets phylogenetics. Annu. Rev. Genomics Hum. Genet. 1, 41-73.
42. Van de Peer, Y., De Wachter, R., 1997. Construction of evolutionary distance trees with TREECON for Windows: accounting for variation in nucleotide substitution rate among sites. Comput. Appl. Biosci. 13, 227-230.
43. Yang, Z., 2007. PAML 4: Phylogenetic Analysis by Maximum Likelihood. Mol. Biol. Evol. 24, 1586-1591.
44. Yang, Z., Nielsen, R., 2002. Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol. Biol. Evol. 19, 908.
45. Zuckerkandl, E., Pauling, L.B., 1962. Molecular disease, evolution, and genetic heterogeneity. In Horizons in Biochemistry, Kasha, M., Pullman, B., eds. (New York, Academic Press), pp. 189-225.
46. Zvelebil, M., Baum, J.O., 2008. Understanding bioinformatics, Holdsworth, D., ed. (Garland Science, Taylor & Francis Group, LLC, an informa business).