AnyBook4Less.com - ISBN: 0387952292 - Statistical Methods in Bioinformatics by Warren J. Ewens

Statistical Methods in Bioinformatics

Title: Statistical Methods in Bioinformatics
by Warren J. Ewens, Gregory R. Grant
ISBN: 0-387-95229-2
Publisher: Springer Verlag
Pub. Date: 20 April, 2001
Format: Hardcover
Volumes: 1
List Price(USD): $84.95

Average Customer Rating: 3.8 (5 reviews)

Customer Reviews

Rating: 2
Summary: Disappointing overview
Comment: This book is a tremendous disappointment, given other Amazon reviews and the impressive Table of Contents. I picked several topics about which I know something: likelihoods, P-values, bootstraps. I would have had NO idea about any of these subjects based on the poor delivery in this book. Topics are not well introduced, there are virtually no examples, and the introduction and discussion of most topics is wordy and uninformative.

A topic such as the two-sample t-statistic is scattered throughout the book, with the main part not even cited in the index!

Unfortunately, there are not a lot of books in the field of Statistics in Bioinformatics. However, I would recommend "The Elements of Statistical Learning" (Hastie et al.) for classifiers etc. (Duda and Hart's classic is also good). I would recommend "Biostatistical Analysis" by Zar for general coverage, and Terry Speed's "Stat Labs: Mathematical Statistics ...", which is not comprehensive but has good lab examples with associated statistical analysis.

Rating: 4
Summary: Pretty good overview
Comment: This book is a timely introduction to the mathematical statistics used in computational biology and bioinformatics. The authors have done a superb job in the overview of a subject that students of biology and bioinformatics can rely on for study and for reference. The mathematics is done at an advanced undergraduate level, but the authors are pragmatic in their approach, and interlace the discussion with biological applications immediately after the appropriate mathematical background has been developed. It thus seems appropriate to discuss the quality of the presentation with these applications in mind.

Chapter one begins, appropriately, with an introduction to probability theory, opening with a consideration of discrete probability distributions of one variable. The Bernoulli, binomial, uniform, geometric, generalized geometric, and Poisson distributions are discussed. The authors point out the use of geometric-like distributions in the BLAST application. They also caution the reader as to the difference between the mean and the average of a random variable. They then move on to consider continuous distributions, discussing briefly the uniform, Normal, exponential, gamma, and beta distributions. Moment-generating functions are also introduced, and they prove a "convexity" theorem for these functions that is important in the BLAST application. The authors also introduce the relative entropy and generalized support statistics, the latter also being used in BLAST.

The next chapter is an overview of probability theory in many random variables. The results in chapter one are discussed in this context, and the authors give an interesting application to the sequencing of EST libraries. The authors also point out that the variance of the maximum of a collection of random variables is finite as the number of variables increases, a fact that is used quite often in bioinformatics. Transformations of random variables are also discussed, with the goal of showing how these can be used to find the density function of a single random variable, this also being important in BLAST.

The most important subject of the book begins in chapter 3, wherein the authors introduce statistical inference. They begin with a very brief discussion of the differences between the frequentist and Bayesian approaches to statistical inference and then move on to classical hypothesis testing and nonparametric tests. This chapter is of great value to readers, for example biologists and would-be bioinformaticists, who are approaching statistics for the first time.

Chapter 4 introduces concepts that are of utmost importance in probabilistic computational biology, namely Markov chains. The discussion in this chapter sets up the strategies used in the next chapter on analyzing a single DNA sequence and a later chapter on hidden Markov models. Shotgun sequencing is discussed as a tool to determine an actual DNA sequence, and the authors discuss the probabilistic issues that arise in the reconstruction of long DNA sequences from shorter sequences. Missing in this chapter is a mathematical analysis of the advantages and disadvantages of shotgun versus whole genome sequencing strategies.

Chapter 6 then generalizes the analysis of chapter 5 to multiple DNA and protein sequences. It is here that one begins to talk about alignments between sequences, which bring about some very subtle mathematical problems in computational biology. The computational complexity of the (global) alignment problem entails the use of softer techniques, such as dynamic programming, which is discussed in this chapter. The (local) alignment problem is also discussed in some detail, using the linear gap model. The alignment problem and the issues with scoring for protein sequences are also discussed in detail. The reader first encounters the famous PAM and BLOSUM matrices in this chapter. The authors do not discuss any connections with the protein folding problem, unfortunately.
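For readers unfamiliar with the dynamic programming idea mentioned above, here is a minimal sketch of a global alignment score under a linear gap penalty (the Needleman-Wunsch recurrence). It is not taken from the book: the function name and the match, mismatch, and gap scores are illustrative assumptions, and a real protein alignment would replace the match/mismatch rule with a PAM or BLOSUM lookup.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of strings a and b.

    Assumed scoring (illustrative only): +1 match, -1 mismatch,
    and a linear gap penalty of -2 per gap position.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j];
    # the first row aligns the empty prefix of a against b using only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against the empty prefix of b
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + pair,  # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            ))
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "AGT"))  # prints 1 (three matches, one gap)
```

The table has (len(a)+1) x (len(b)+1) cells, so the work grows with the product of the sequence lengths rather than with the number of possible alignments, which is the point of the technique.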

The next chapter introduces the basic probability theory behind the BLAST algorithm, namely random walks. They do so with emphasis on moment generating functions, which might be a little abstract for the biologist reader.

The authors return to statistical estimation and hypothesis testing in chapter 8, with maximum likelihood and fixed sample size tests discussed in some detail. Again connecting with the BLAST algorithm, the sequential probability ratio test is treated.

The authors finally get down to the BLAST algorithm in chapter 9, using an older version of the software (1.4). The connection of the algorithm with random walks and how to assign scores is immediately apparent, as is the ability of BLAST to do database queries against a chosen sequence. The algorithm is compared with the sequential analysis discussed in the last chapter.

The authors return to Markov chains in chapter 10, and give some numerical examples. In addition, they treat the important topic of Markov chain Monte Carlo via the Hastings-Metropolis algorithm, Gibbs sampling, and simulated annealing. An application of simulated annealing to the double digest problem is described. The authors also spend a little time discussing continuous-time Markov chains.
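To give a flavor of Markov chain Monte Carlo, the following is a rough sketch of the Metropolis special case of the Hastings-Metropolis algorithm (symmetric proposals) on a small discrete state space. It is my own toy example, not the book's: the ring of states, the function name, and the target weights are all assumptions chosen for brevity.

```python
import random
from collections import Counter


def metropolis_ring(weights, n_steps=50000, seed=0):
    """Sample states 0..k-1 with probability proportional to weights.

    Hypothetical setup: states sit on a ring and the proposal moves one
    step left or right with equal probability, so it is symmetric and the
    acceptance probability reduces to min(1, w[proposal] / w[current]).
    All weights must be positive.
    """
    rng = random.Random(seed)
    k = len(weights)
    state = 0
    counts = Counter()
    for _ in range(n_steps):
        proposal = (state + rng.choice((-1, 1))) % k
        if rng.random() < min(1.0, weights[proposal] / weights[state]):
            state = proposal  # accept; otherwise stay where we are
        counts[state] += 1
    # Visit frequencies approximate the normalized target distribution.
    return [counts[s] / n_steps for s in range(k)]


freqs = metropolis_ring([1, 2, 3, 4])
print(freqs)  # should be roughly [0.1, 0.2, 0.3, 0.4]
```

Only ratios of weights are used, so the target never needs to be normalized, which is what makes the method practical when the normalizing constant is unknown.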

Hidden Markov models are finally discussed in chapter 11. These have been the most effective tools in sequence analysis and the authors give a nice overview of their construction and properties in this chapter. The Pfam package is discussed as a software implementation of HMMs for determining protein domains. Unfortunately, they do not discuss the excellent package HMMER for implementing HMMs in sequence analysis.

Chapter 12 discusses computationally intensive methods in classical inference. One of these methods, the bootstrap procedure, which is used for large sample sizes, is described. Because the bootstrap can be used to estimate confidence intervals in situations where there is not enough information to employ classical methods, the authors detail a method using quantiles to estimate the confidence interval for the standard deviation of the expression intensity of a gene. This is followed by a return to the multiple testing problem of chapter 3 in the context of the data analysis of expression arrays.
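The quantile idea can be sketched in a few lines. This is a generic percentile bootstrap for a standard deviation, written under my own assumptions (function name, number of resamples, and the made-up intensity values); the book's exact procedure may differ.

```python
import random
import statistics


def bootstrap_sd_interval(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the sample standard deviation.

    Resamples the data with replacement, recomputes the standard deviation
    each time, and reads the interval off the empirical quantiles.
    Requires at least two observations.
    """
    rng = random.Random(seed)
    n = len(values)
    estimates = sorted(
        statistics.stdev(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical expression intensities for one gene (invented numbers).
intensities = [7.1, 6.8, 7.9, 8.3, 6.5, 7.4, 7.7, 8.0, 6.9, 7.2]
print(bootstrap_sd_interval(intensities))
```

With only ten observations the interval will be wide and somewhat biased; the sketch is meant to show the mechanics, not to endorse the method for small samples.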

I did not read the last two chapters on evolutionary models and phylogenetic tree estimation so I will omit their review.

Rating: 5
Summary: guide into the right direction
Comment: This is one of the books I have been waiting for. For a population geneticist who wants to learn bioinformatics, most texts are unacceptable: They present heuristic methods in a cookbook fashion, with little reference to what is going on biologically as well as mathematically.

This book is the first exception I know of. It builds, and rests on, solid foundations of genetic stochastic processes and still goes all the way to real-life problems. Let me illustrate this by means of an example, rather than enumerating all the topics in the book.

Chap. 14, entitled `phylogenetic tree estimation' (as opposed to the more common term `phylogenetic tree reconstruction' - not without reason, I presume) builds on, and is firmly interlaced with, Chap. 13 about `evolutionary models', which systematizes the zoo (if not jungle) of substitution models in both discrete and continuous time. On this basis, the overview of tree-building methods makes a lot of sense. Even better, it does not stop here, but presents an application (to real sequence data), followed by a careful analysis of where the various methods agree, and where - and maybe why - they disagree. This way, it clears away some common misconceptions; in particular, it presents a careful analysis of what bootstrap does and what it does not in this context. The chapter closes with a discussion of unresolved problems (like inhomogeneity of substitution rates), and methods and possible pitfalls related to testing of nested and non-nested hypotheses in tree estimation.

The book is written in an informal style without being imprecise, which makes it pleasant reading. It is particularly suitable for teaching at a high level. This is enhanced by realistic (and even real-life) examples that furnish the text, as well as carefully chosen exercises at the end of each chapter.

Certainly, this first edition of `Statistical Methods in Bioinformatics' cannot be the last word in this fast-moving field. But it is an excellent guide into the `right' direction.

Similar Books:

	Title: Biological Sequence Analysis : Probabilistic Models of Proteins and Nucleic Acids by Richard Durbin, Sean R. Eddy, Anders Krogh, Graeme Mitchison ISBN: 0521629713 Publisher: Cambridge University Press Pub. Date: 01 July, 1999 List Price(USD): $45.00
	Title: Bioinformatics: Sequence and Genome Analysis by David W. Mount ISBN: 0879696087 Publisher: Cold Spring Harbor Laboratory Pub. Date: 15 March, 2001 List Price(USD): $75.00
	Title: Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology by Dan Gusfield ISBN: 0521585198 Publisher: Cambridge University Press Pub. Date: 15 January, 1997 List Price(USD): $75.00
	Title: Beginning Perl for Bioinformatics by James Tisdall ISBN: 0596000804 Publisher: O'Reilly & Associates Pub. Date: 15 October, 2001 List Price(USD): $39.95
	Title: Bioinformatics: The Machine Learning Approach, Second Edition (Adaptive Computation and Machine Learning) by Pierre Baldi, Søren Brunak ISBN: 026202506X Publisher: MIT Press Pub. Date: 01 August, 2001 List Price(USD): $60.00
