[Popgenlunch] Talk by Shimodaira on negative amounts of data, 11/17

Joe Felsenstein joe at gs.washington.edu
Mon Nov 7 16:59:13 PST 2011

Folks --

Hidetoshi Shimodaira is visiting next week and, together with Michael
Perlman, I have arranged for him to talk. "Shimo" is one of the most
original thinkers about how to test hypotheses about phylogenies (evolutionary
trees), a hard problem because usually the data are discrete and the space of
phylogenies is a mixture of discrete pieces, each continuous. He visited
my lab at UW in 1995, and has since put forward the AU ("almost unbiased")
test among multiple phylogenies, and the Shimodaira-Hasegawa (SH) test.
I think you will agree that the notion of negative amounts of data is

As you will see below, the talk is in Savery Hall. He has to leave for the
day right after the talk, but people wanting to talk to him can also book
times with him on Friday.

Joe Felsenstein joe at gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA

---------------- please send to whoever might be interested --------------

Thursday, November 17

11:30 - 12:30

Savery Hall, room 156

Bayesian is converted into frequentist by
reversing the sign of the data length.

Hidetoshi Shimodaira

Tokyo Institute of Technology

ABSTRACT: The observed frequency of a particular outcome in data-based
simulation, known as the bootstrap probability (BP) of Felsenstein (1985),
is very useful as a confidence level in data analyses with discrete
outcomes, such as estimating the phylogenetic tree from aligned DNA
sequences or identifying clusters from microarray expression
profiles. We argue that the length of simulated data sets should be
(-1) times the original data length to avoid false positives,
i.e., bias in hypothesis testing, although such a negative data length
cannot be realized. In other words, we perform the "m out of n" bootstrap
with m=-n. This turns out to be equivalent to the
approximately unbiased (AU) confidence level computed by the
multiscale bootstrap of Shimodaira (2002), but such a notion of
negative data length had not been known until Shimodaira (2008). The
method is illustrated in real data analysis of phylogenetic inference
and hierarchical clustering. In the latter part of the talk, the
mathematical justification is explained in terms of distance and
curvature with connection to the geometrical theory of Efron and
Tibshirani (1998) and the argument of Perlman and Wu (1999). BP is
interpreted as Bayesian posterior probability and AU is the
frequentist p-value, and thus changing the length of simulated data
sets bridges the gap between these two confidence levels.
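A minimal Python sketch of the idea, assuming the usual multiscale-bootstrap model in which the bootstrap probability at scale sigma^2 = n/m satisfies sigma * z = beta0 + beta1 * sigma^2, where z is the upper-tail normal quantile of BP, beta0 plays the role of a signed distance and beta1 of a curvature. Under that model the ordinary BP corresponds to sigma^2 = 1 (m = n), and evaluating the fitted line at sigma^2 = -1 (m = -n) gives AU. The function names, the toy outcome "sum of per-site scores is positive", and any scale grid are hypothetical choices for illustration, not Shimodaira's software.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def bootstrap_probability(scores, m, n_boot=1000, seed=0):
    """Fraction of m-out-of-n resamples of `scores` whose total is positive.

    Hypothetical toy outcome: `scores` holds per-site score differences,
    and the hypothesis counts as supported when their resampled sum is > 0.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_boot):
        resample = rng.choices(scores, k=m)  # draw m sites with replacement
        if sum(resample) > 0:
            hits += 1
    return hits / n_boot


def fit_multiscale(scales, bps):
    """Least-squares fit of sigma * z(BP) = beta0 + beta1 * sigma^2.

    `scales` are sigma^2 = n/m values (all > 0); each BP must lie strictly
    in (0, 1), otherwise inv_cdf raises StatisticsError.
    """
    xs = list(scales)
    # sigma * upper-tail z-value: sqrt(s) * Phi^{-1}(1 - BP)
    ys = [math.sqrt(s) * _STD_NORMAL.inv_cdf(1.0 - bp) for s, bp in zip(xs, bps)]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1


def au_and_bp(beta0, beta1):
    """Return (AU, BP): the fitted line at sigma^2 = -1 and at sigma^2 = +1."""
    def upper_tail(z):
        return 1.0 - _STD_NORMAL.cdf(z)
    return upper_tail(beta0 - beta1), upper_tail(beta0 + beta1)
```

In practice one would call `bootstrap_probability` with several resample sizes m = n/sigma^2 (for example sigma^2 between 0.5 and 2), pass the resulting BPs to `fit_multiscale`, and read AU off `au_and_bp`. The negative data length never has to be simulated: it is reached only by extrapolating the fitted line to sigma^2 = -1.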
