Seminar Announcements for Spring 2012

________________________________________

Title: Estimating Relative Risks for Longitudinal Binary Response Data
Speaker: Dr. Binbing Yu, National Institute on Aging - IRP, National Cancer Institute
Abstract: Logistic regression is the dominant modeling technique for measuring the effect of exposure or treatment on binary responses. The measure of risk in a logistic regression is the odds ratio (OR), which is also valid in retrospective studies. Nevertheless, relative risk (RR) is often the preferred measure of exposure effect because it is more interpretable. When the prevalence is low, OR is a good approximation to RR. Their difference, however, is large for common responses, in which case the log-binomial model is more desirable. Although various techniques have been developed to estimate RR for data from cross-sectional studies, there is no statistical method available for estimating RR in longitudinal studies. To address this issue, we developed log-binomial regression models for longitudinal binary response data. We consider both the marginal model and the random-effects model. The generalized estimating equation with the COPY method is used to fit the marginal log-binomial model, and the Bayesian Markov Chain Monte Carlo method is used to obtain the parameter estimates for the random-effects log-binomial model. The performance of the proposed methods is evaluated and compared with competing methods through a large-scale simulation study. The usefulness of the methods is illustrated with data from a respiratory disorder study.
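
As a rough numerical illustration of the claim that OR approximates RR only when the response is rare, the sketch below computes both measures from a pair of outcome probabilities. The function names and the probabilities are assumptions chosen for illustration; they are not taken from the talk.

```python
def relative_risk(p_exposed, p_unexposed):
    """Ratio of outcome probabilities in the exposed and unexposed groups."""
    return p_exposed / p_unexposed


def odds_ratio(p_exposed, p_unexposed):
    """Ratio of outcome odds in the exposed and unexposed groups."""
    odds_exposed = p_exposed / (1 - p_exposed)
    odds_unexposed = p_unexposed / (1 - p_unexposed)
    return odds_exposed / odds_unexposed


# Hypothetical rare outcome: both measures are close to 2.
rare_rr = relative_risk(0.02, 0.01)
rare_or = odds_ratio(0.02, 0.01)

# Hypothetical common outcome: RR is still 2, but OR is much larger.
common_rr = relative_risk(0.6, 0.3)
common_or = odds_ratio(0.6, 0.3)

print(rare_rr, rare_or)
print(common_rr, common_or)
```

In the common-outcome case the two measures disagree substantially, which is the situation where the abstract argues for modeling RR directly.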

Date: Friday, January 27th, 2012
Time: 11:00-12:00 noon
Location: Phillips Hall, Room 110 (801 22nd Street, NW, Washington, DC 20052)
________________________________________


Title: Banded estimation and prediction for linear time series
Speaker: Yulia R. Gel, Associate Professor, Department of Statistics and Actuarial Science, University of Waterloo
Abstract: This talk discusses banded regularization of an empirical autocovariance matrix and its impact on model estimation and forecasting of a linear weakly dependent time series which does not degenerate to a finite dimensional representation. In particular, we show that banding enables us to employ an approximating model of a much higher order than typically suggested by AIC, while controlling how many parameters are to be estimated precisely and the level of accuracy. We present results on asymptotic consistency of banded autocovariance matrices under the Frobenius norm and the same realization of time series, and provide a theoretical justification for optimal band selection using cross-validation. Remarkably, the cross-validation loss function for banded prediction is related to the conditional mean square prediction error (MSPE) and, thus, may be viewed as an alternative model selection criterion. The proposed procedure is illustrated by simulations and an application to predicting the sea surface temperature (SST) index in the Nino 3.4 region. This is joint work with Peter Bickel, University of California, Berkeley.
Date: Friday, February 3rd, 2012
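
To make the idea of banding concrete, here is a minimal sketch that builds an empirical autocovariance matrix and sets every entry more than a fixed number of lags from the diagonal to zero. The function names, the toy series, and the fixed band width are assumptions for illustration; the talk selects the band by cross-validation, which is not shown here.

```python
def sample_autocovariances(x, max_lag):
    """Empirical autocovariances at lags 0..max_lag, using the 1/n normalization."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    return [
        sum(centered[t] * centered[t + h] for t in range(n - h)) / n
        for h in range(max_lag + 1)
    ]


def banded_autocovariance_matrix(x, dim, band):
    """dim x dim Toeplitz autocovariance matrix with entries beyond `band` lags zeroed."""
    gamma = sample_autocovariances(x, dim - 1)
    return [
        [gamma[abs(i - j)] if abs(i - j) <= band else 0.0 for j in range(dim)]
        for i in range(dim)
    ]


# Hypothetical toy series; dim must not exceed the series length.
series = [1.0, 2.0, 4.0, 3.0, 5.0, 4.0, 6.0, 7.0]
sigma = banded_autocovariance_matrix(series, dim=5, band=1)
for row in sigma:
    print(row)
```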
Time: 11:00-12:00 noon
Location: Phillips Hall, Room 110 (801 22nd Street, NW, Washington, DC 20052)
________________________________________


Title: Simulation-based maximum likelihood inference for partially observed Markov process models
Speaker:  Anindya Bhadra, Postdoctoral Fellow, Statistics, Texas A&M University
Abstract: Estimation of static (or time constant) parameters in a general
class of nonlinear, non-Gaussian, partially observed Markov process models
is an active area of research. In recent years, simulation-based
techniques have made estimation and inference feasible for these models
and have offered great flexibility to the modeler. An advantageous feature
of many of these techniques is that there is no requirement to evaluate
the state transition density of the model, which is often high-dimensional
and unavailable in closed-form. Instead, inference can proceed as long as
one is able to simulate from the state transition density - often a much
simpler problem. In this talk, we introduce a simulation-based maximum
likelihood inference technique known as iterated filtering that uses an
underlying sequential Monte Carlo (SMC) filter. We discuss some key
theoretical properties of iterated filtering. In particular, we prove the
convergence of the method and establish connections between iterated
filtering and well-known stochastic approximation methods. We then use the
iterated filtering technique to estimate parameters in a nonlinear,
non-Gaussian mechanistic model of malaria transmission and answer
scientific questions regarding the effect of climate factors on malaria
epidemics in Northwest India. Motivated by the challenges encountered in
modeling the malaria data, we conclude by proposing an improvement
technique for SMC filters used in an off-line, iterative setting.
Date: Tuesday, February 7th, 2012
Time: 10:00-11:00am
Location: Monroe 451 (2115 G Street, NW, Washington, DC 20052)
________________________________________


Title: Statistical Methods for Dynamic Models with Application Examples
Speaker: Tao Lu, Department of Biostatistics and Computational Biology, University of Rochester, School of Medicine
Abstract: A dynamical system in engineering and physics, specified by a set of differential equations,
is usually used to describe a dynamic process which follows physical laws or engineering
principles. The parameters in the dynamical system are usually assumed known. However,
an interesting question to ask is how to estimate these parameters when they are not known
in advance. In this talk, I present two examples where various statistical methods are applied to
dynamic models for estimating unknown parameters based on observed data. Eventually, we
are interested in predicting the future behavior of the dynamic system. The first example is
on modeling HIV viral load dynamics from a clinical trial study. The second is on modeling
a complicated interactive network.
Date: Friday, February 10th, 2012
Time: 11:00-12:00 noon
Location: Phillips Hall, Room 110 (801 22nd Street, NW, Washington, DC 20052)
________________________________________


Title: Random walk Metropolis chains on the hypercube: Mixing time and cutoff
Speaker: Winfried Barta,  Department of Statistics, University of Chicago
Abstract: Markov chain Monte Carlo (MCMC) methods allow us to do approximate calculations in situations where exact ones are impractical or infeasible. To estimate features of a probability distribution Q, we construct a Markov chain that converges to Q as the number of steps goes to infinity. The mixing time of the chain tells us for how many steps we need to run the chain so that its distribution is close to Q. Some Markov chains exhibit cutoff: the distribution of the chain stays far away from Q for a while, and then rapidly gets extremely close.

We consider random walk Metropolis (RWM) chains on the hypercube {0, 1}^n. At each step we select one of the n coordinates uniformly at random and then flip it with a certain probability. Two relevant classes of applications are variable selection problems and random graph models. We present a new proof for cutoff of the RWM chain for the Erdős-Rényi random graph model. This is the distribution Q on the hypercube where each coordinate is present independently with probability q. The proof is purely probabilistic. It relies on coupling and a projection to a two-dimensional chain, where we only record the number of ones and the distance to the starting state of the chain.

Next we generalize this result to unimodal distributions Q that are radially symmetric. Under smoothness conditions that ensure asymptotic unimodality of the stationary distribution of the number of ones, we show that the RWM chain has cutoff when started from an extremal state. We conjecture a corresponding result for general starting states and briefly discuss a further generalization to radially symmetric multimodal distributions.
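
The single-coordinate update described in the abstract can be sketched as follows for the product target in which each coordinate equals one independently with probability q. The acceptance rule used here is the standard Metropolis ratio, which is an assumption about the "certain probability" mentioned above; the function names, chain length, seed, and burn-in are likewise illustrative choices.

```python
import random


def rwm_step(state, q, rng):
    """One random walk Metropolis step targeting independent Bernoulli(q) coordinates."""
    i = rng.randrange(len(state))  # pick one coordinate uniformly at random
    # Ratio Q(proposal) / Q(current) when coordinate i is flipped.
    ratio = (1 - q) / q if state[i] == 1 else q / (1 - q)
    if rng.random() < min(1.0, ratio):
        state[i] = 1 - state[i]  # accept the flip


def run_chain(n, q, steps, seed=0):
    """Run the chain from the all-zeros state and record the number of ones."""
    rng = random.Random(seed)
    state = [0] * n  # extremal starting state
    ones_history = []
    for _ in range(steps):
        rwm_step(state, q, rng)
        ones_history.append(sum(state))
    return ones_history


history = run_chain(n=50, q=0.3, steps=20000)
tail = history[5000:]  # discard an initial stretch as burn-in
mean_fraction_of_ones = sum(tail) / len(tail) / 50
print(mean_fraction_of_ones)
```

After the burn-in, the time-averaged fraction of ones should sit near q, reflecting convergence of the chain to Q.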
Date/Time: Tuesday, February 14th, 2012, 11:00-12:00 noon
Location: Rome Hall 5th floor department conference room
________________________________________

Title: Matern Class of Cross-Covariance Functions for Multivariate Random Fields 
Speaker: Dr. Tatiyana V Apanasovich, Thomas Jefferson University, Jefferson Medical College, Department of Pharmacology  and Experimental Therapeutics, Division of Biostatistics
Abstract: Data indexed by spatial coordinates have become ubiquitous in a large number of applications, for instance in environmental, climate and social sciences, hydrology and ecology. Recently, the availability of high-resolution microscopy together with advances in imaging technology has increased the importance of spatial data to detect meaningful patterns as well as to make predictions in medical applications (brain imaging) and systems biology (images of fluorescently labeled proteins, lipids, DNA). The defining feature of multivariate spatial data is the availability of several measurements at each spatial location. Such data may exhibit not only correlation between variables at each site but also spatial correlation within each variable and spatial cross-correlation between variables at neighboring sites. Any analysis or modeling must therefore allow for flexible but computationally tractable specifications for the multivariate spatial effects processes. In practice we assume that such processes, probably after some transformation, are not too far from Gaussian and characterized well by the first two moments. The model for the mean follows from the context. However, the challenge is to find a valid specification for cross-covariance matrices that is estimable and yet flexible enough to incorporate a wide range of correlation structures. Recent literature advocates the use of the Matern family for univariate processes. I will introduce a valid parametric family of cross-covariance functions for multivariate spatial random fields where each component has a covariance function from the Matern class (Apanasovich et al. (2012)). Unlike previous attempts, our model indeed allows for various smoothness and rates of correlation decay for any number of vector components. Moreover, I will provide an example of modeling time-dependent spatial data with Matern covariances, where dependences across space and time interact (Apanasovich (2012)).
Further, I will discuss models for multivariate response variables in both space and time, which include all possible interactions between space/time locations and variables (Apanasovich and Genton (2010)).

The application of the proposed methodologies will be illustrated on the datasets from environmental and soil sciences as well as meteorology and systems biology.


Date: Friday, February 17th, 2012
Time: 11:00-12:00 noon
Location: Phillips Hall, Room 110 (801 22nd Street, NW, Washington, DC 20052)
________________________________________

Title: Weighting Method and Pseudo-Semiparametric Inference for Population-Based Case-Control Studies with Complex Sampling
Speaker: Yan Li,  Department of Mathematics, University of Texas Arlington
Abstract: The use of complex sample designs (e.g. stratified multistage clustered sampling), along with frequency matching, is becoming common in population-based case-control studies. Survey design-based approaches can be inefficient for the analysis of case–control studies with frequency matching due to the large variation of population weights between cases and controls. We propose a weighting method that post-stratifies the scaled control population weights to the estimated population distribution of the matching variables among cases. This weighting maintains the efficiency of frequency matching. The developed weighting methods are then employed in the pseudo-semiparametric maximum likelihood estimators (pseudo-SPMLE) to investigate the effect of gene-environment interaction on the risk of human diseases. Two design complications (i.e. differential population weights and intra-cluster correlation of individuals) are considered in the pseudo-SPMLE. The weighting method and the pseudo-SPMLE are evaluated by using simulation studies and are applied to two motivating examples: the Kaposi sarcoma case–control study that was conducted in Sicily and the US kidney cancer case–control study.

Date/Time: Tuesday, February 21st, 2012, 11:00-12:00 noon
Location: Rome Hall 5th floor department conference room
________________________________________

Title: On Coverage & Detection Problems in Sensor Networks

Speaker: Dr. Bimal Roy, Director, Indian Statistical Institute, Kolkata, India

Abstract: Since a sensor has limited communication capability, covering a "field"
with sensors so that the communication in the network is smooth is a
challenging problem. A method of dropping sensors from a helicopter and
then using an actuator (robot with limited intelligence and carrying
capability) to make minor adjustments is proposed.
Once the sensors are placed, detecting an event (say for example, an
explosive) is the next challenge. Assuming a model for sensing, a method
based on a standard test of hypothesis is proposed.


Date/time: Wed., March 7, 2012, 4-5pm.

Location:  Duques Hall, Room: 453 (2201 G Street NW, Washington, DC 20052)

________________________________________

Title: Scale Invariant Estimation with High-dimensional Data
Speaker: Tingni Sun, Department of Statistics, Rutgers University
Abstract: We propose a scale invariant method for the estimation of parameters in linear regression and precision matrix estimation. In linear regression, the scaled Lasso jointly estimates the regression coefficients and noise level with a gradient descent algorithm. Under mild regularity conditions, we derive oracle inequalities for the prediction and the estimation of the noise level and regression coefficients. These oracle inequalities provide sufficient conditions for the consistency and asymptotic normality of the noise level estimator, including certain cases where the number of variables is of greater order than the sample size. Moreover, the scaled Lasso is used to construct an estimator for the precision matrix, by taking advantage of the relationship between the matrix inversion in block form and the multivariate linear regression. Under the sparsity on matrix degree and the boundedness of spectrum norm of the target matrix, the proposed estimator guarantees the fastest convergence rate under the spectrum norm. Numerical results demonstrate the superior performance of the proposed methods. This is joint work with Cun-Hui Zhang.

Date/Time: Tuesday, March 20th, 2012, 11:00-12:00 noon
Location: Rome Hall 5th floor department conference room
________________________________________

Title: PCA Asymptotics & Analysis of Tree Data
Speaker: Dan Shen,  Department of Statistics, University of North Carolina
Abstract: High dimensionality has become a common feature of data encountered in many divergent fields,
which provides modern challenges for statistical analysis. To cope with the high dimensionality,
dimension reduction becomes necessary. Principal component analysis (PCA) is arguably the
most popular classical dimension reduction technique, which uses a few principal components
(PCs) to explain most of the data variation.
My talk first introduces a general asymptotic framework for studying consistency properties of
PCA. Assuming the spike population model, the framework considers increasing sample size,
increasing dimension and increasing spike sizes. Our framework includes several previously
studied domains of asymptotics as special cases, and for the first time allows one to investigate
interesting connections and transitions among the various domains. The unification power and
additional theoretical insights offered by our general framework for PCA are really intriguing.
The second part of my topic is about developing statistical methods for analyzing tree-structured
data objects. This work is motivated by the statistical challenges of analyzing a set of blood
artery trees, which is from a study of Magnetic Resonance Angiography (MRA) brain images of
a set of 98 human subjects. The non-Euclidean property of tree space makes the application of
conventional statistical analysis, including PCA, to tree data very challenging. We develop an
entirely new approach that uses the Dyck path representation, a tool for asymptotic analysis of
point processes. This builds a bridge between the tree space (a non-Euclidean space) and curve
space (standard Euclidean space). That bridge enables the exploitation of the power of functional
data analysis to explore statistical properties of tree data sets.
This is joint work with Dr. J.S. Marron and Dr. Haipeng Shen.
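
As a small illustration of how a Dyck path turns a tree into a curve, the sketch below records the depth visited by a depth-first walk around a rooted ordered tree. The nested-list encoding of the tree, the function name, and the example tree are assumptions made for illustration; the abstract does not specify how the artery trees are encoded.

```python
def dyck_path(tree):
    """Return the heights visited by a depth-first walk around a rooted ordered tree.

    A tree is represented as a list of its child subtrees; a leaf is the empty list.
    """
    heights = [0]

    def walk(node, depth):
        for child in node:
            heights.append(depth + 1)  # step down the edge to the child
            walk(child, depth + 1)
            heights.append(depth)      # step back up the same edge

    walk(tree, 0)
    return heights


# Hypothetical tree: the root has two children, and the first child has two leaves.
example_tree = [[[], []], []]
path = dyck_path(example_tree)
print(path)
```

The resulting sequence starts and ends at zero, never goes negative, and has two steps per edge, so each tree becomes a curve that can be handled with ordinary functional data tools.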

Date/Time: Friday, March 23rd, 2012, 3:00-4:00
Location: Rome Hall 5th floor department conference room

________________________________________

Title: Current Challenges in Mathematical Genomics: A study on using Olfactory Receptors (ORs)
Speaker: Pabitra Pal Choudhury, Professor of Computer Science, International Statistical Educational Center (ISEC), ISI, Kolkata
Abstract: Scientists have learned that the human genome contains roughly 700 ORs,
each about 1000 base pairs long on average. In principle, 4^1000 distinct
sequences of length 1000 can be formed from the nucleotides A, T, C and G.
Of these, only about 700 have been selected by NATURE as human olfactory DNA
sequences. This could have come about through some formation methodology,
some selection methodology, or both, one followed by the other. These issues
will be discussed in the talk.
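
To give a sense of scale for the count 4^1000 quoted above, the following sketch computes the number of possible sequences exactly and compares it with the roughly 700 observed ORs. The variable names are illustrative assumptions.

```python
import math

sequence_length = 1000
alphabet_size = 4  # the nucleotides A, T, C and G

# Exact number of distinct sequences of the given length.
total_sequences = alphabet_size ** sequence_length
digits = len(str(total_sequences))

# Base-10 logarithm of the fraction of sequences that are observed as ORs.
log10_fraction_selected = math.log10(700) - sequence_length * math.log10(alphabet_size)

print(digits)
print(log10_fraction_selected)
```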


Date: Friday, March 23rd, 2012
Time: 11:00-12:00 noon
Location: Phillips Hall, Room 110 (801 22nd Street, NW, Washington, DC 20052)

________________________________________

Title: Bivariate/Multivariate Markers and ROC Analysis

Speaker: Mei-Cheng Wang, Professor, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health

Abstract: This talk considers receiver operating characteristic (ROC) analysis for bivariate marker measurements. The research interest is to extend rules and tools from the univariate marker setting to the bivariate marker setting for evaluating the predictive accuracy of markers. Using a tree-based and-or classifier, an ROC function together with a weighted ROC function (WROC) and their conjugate counterparts are proposed for examining the performance of bivariate markers. The proposed functions evaluate the performance of the and-or classifier among all possible combinations of marker values, and are ideal measures for understanding the predictability of biomarkers in the target population. Specific features of ROC and WROC functions and other related statistics are discussed in comparison with the familiar properties for a univariate marker. Nonparametric methods are developed for estimating ROC-related functions, (partial) area under the curve and concordance probability. The inferential results developed in this paper also extend to multivariate marker measurements with a sequence of arbitrarily combined and-or classifiers. The proposed procedures and inferential results are useful for evaluating and comparing marker predictability based on single or bivariate marker (or test) measurements with different choices of markers, and for evaluating different and-or combinations in classifiers. The approach is applied to the Alzheimer’s Disease Neuroimaging Initiative (ADNI) data to illustrate the applicability of the proposed procedures.

* Content of this talk is based on joint work with Shanshan Li

Date/time: Fri., April 6, 2012, 11-noon.

Location:  Phillips Hall, Room 110 (801 22nd Street, NW, Washington, DC 20052)
________________________________________

Title:  A Model-based Approach to Limit of Detection in Studying Persistent Environmental Chemicals Exposures and Human Fecundity

Speaker: Zhen Chen,  National Institute of Child Health & Human Development, NIH

Abstract:   Human exposure to persistent environmental pollutants often results in a
range of exposures with a proportion of concentrations below the
laboratory detection limits. Growing evidence suggests that inadequate
handling of concentrations below the limit of detection (LOD) may bias
assessment of health effects in relation to chemical exposures. We sought
to quantify such bias in models focusing on the day-specific probability
of pregnancy during the fertile window, and propose a model-based approach
to reduce such bias. A flexible multivariate skewed generalized
t-distribution constrained by LODs is assumed, which realistically
represents the underlying shape of the chemical exposures. Correlations in
the multivariate distribution provided information across chemicals. A
Markov chain Monte Carlo sampling algorithm was developed for implementing
the Bayesian computations. The deviance information criterion measure is
used for guiding the choice of distributions for chemical exposures with
LODs. We applied the proposed approach to data from the Longitudinal
Investigation of Fertility and the Environment (LIFE) Study.

Date/time: Fri., April 13, 2012, 11-noon.

Location:  Phillips Hall, Room 110 (801 22nd Street, NW, Washington, DC 20052)
________________________________________


Title:  On Statistical Inference in Meta-Regression

Speaker: Dr. Guido Knapp, Technische Universität Dortmund, Germany

Abstract: The explanation of heterogeneity that occurs when combining results of
different studies sharing a common goal is an important issue in
meta-analysis. Besides including a heterogeneity parameter in the
analysis, it is also important to understand the possible causes of heterogeneity.
A possibility is to incorporate study-specific covariates in the model
that account for between-trial variability. This leads to what is known as
the random effects meta-regression model. In this talk, we will discuss
the commonly used methods for meta-regression and propose a new method
based on generalised inference. Higher order likelihood methods will also
be considered.

Date/time: Fri., April 20, 2012, 11-noon.

Location:  Phillips Hall, Room 110 (801 22nd Street, NW, Washington, DC 20052)
________________________________________


Title: Subjective Probability: Its Axioms and Acrobatics.

Speaker: Nozer Singpurwalla, Professor of Statistics and Decision Science, GWU

Abstract:
The meaning of probability has been enigmatic, even to the likes of Kolmogorov, and continues to be so. It is fallacious to claim that the law of large numbers provides a definitive interpretation.
 
Whereas the founding fathers, Cardano, Pascal, Fermat, Bernoulli, de Moivre, Bayes, and Laplace, took probability for granted, the latter-day writers, Venn, von Mises, Ramsey, Keynes, de Finetti, and Borel, engaged in philosophical and rhetorical discussions about the meaning of probability. Entering into the arena were also physicists like Cox, Jeffreys, and Jaynes and philosophers like Carnap, Jeffrey, and Popper. Interpretation matters because the paradigm used to process information and act upon it is determined by perspective.

The modern view is that the only philosophically and logically defensible interpretation of probability is that probability is not unique, that it is personal, and therefore subjective. But to make subjective probability mathematically viable, one needs axioms of consistent behavior. The Kolmogorov axioms are a consequence of the behavioristic axioms. In this expository talk, I will review these more fundamental axioms and point out some of the underlying acrobatics that have led to debates and discussions.

Besides mathematicians, statisticians, and decision theorists, the material here should be of interest to physical, biological, and social scientists, risk analysts, and those engaged in the art of “intelligence” (Googling, code breaking, hacking, and eavesdropping).

Time:  Friday, April 27th 3:00 pm - 4:00 pm 

Place:  Duques 651 (2201 G Street NW, Washington, DC 20052)
________________________________________