Giles Hooker
Associate Professor:

Department of Statistical Science
Department of Biological Statistics and Computational Biology
Cornell University

Department of Healthcare Policy and Research
Weill Cornell Medical College

Director of Undergraduate Studies:
Biometry and Statistics

Curriculum Vitae
1186 Comstock Hall
Cornell University
Ithaca, NY 14853

Phone: (1 607) 255 1638
Fax: (1 607) 255 4698
*I try to check my e-mail only once each day*

My blog, Of Models and Meanings, is a collection of musings on the philosophy of mathematical modeling and what makes a model "interpretable".

I am also a member of, a group dedicated to the public understanding of statistics, in particular providing statistical expertise to journalists.

My book with Jim Ramsay and Spencer Graves, Functional Data Analysis with R and MATLAB, is now out.

I gave a short course based on this material at the Joint Statistical Meetings in 2015. I have previously presented this material at the 2010 International Workshop on Statistical Modeling. Here are a handout and computer lab from it. Code to reproduce the figures in the lecture can be found here. More resources can be found in my book and at

For a bit of amusement, some creativity from long ago: "The Stanford Statistics Songbook: A Musical Tribute". Technical Report, Department of Statistics, Stanford University.

If you are looking for a summer internship, unfortunately I do not have the capacity to take non-Cornell students as summer interns. Due to the volume of requests, I cannot respond to each request individually.

Research Interests:

  • Data analysis for dynamical systems and differential equations
  • Machine learning and data mining
  • Disparity-based inference

    My research focusses on a number of issues within these three fields. I am particularly interested in developing and extending the methods of functional data analysis for examining the evolution of systems in terms of nonlinear differential equations. This involves estimating parameters for such equations, diagnosing when and why equations do not fit data well and developing statistical theory to account for smooth perturbations of such systems.

    Please see my profiling webpages for Matlab code, manuals and webpage demonstrations. CollocInfer, a somewhat more general R package, has now been released; a user manual provides a guide to it.

    In machine learning, I focus on the problem of diagnostics and understanding the prediction functions that machine learning produces. I have recently started developing formal statistical inference and testing procedures using ensemble methods such as random forests and am extending these to boosting and to neural networks.

    Lastly, I am interested in robust inference via disparity-based methods such as Hellinger distance. These techniques have been in existence for 30 years or more for i.i.d. data; part of my research focusses on adapting these to regression, mixed-effects and Bayesian frameworks.

    I also have a number of articles on the phenomenon of "paradoxical results" in multidimensional educational testing.

    Teaching:


    BTRY 4030: Linear Models with Matrices, Fall 2016

    BTRY 4090: Theory of Statistics, Fall 2015

    BTRY 6020: Statistical Methods II, Spring 2009/2010/2011/2015.

    BTRY 6520: Computationally Intensive Methods of Statistics

    BTRY 6940: Inference in Nonlinear Dynamical Systems, Fall 2012.

    BTRY 3520: Statistical Computing, Spring 2012/2013.

    BTRY 7180: Generalized Linear Models, Fall 2011.

    BTRY 6150: Applied Functional Data Analysis, Fall 2008.

    CSCU Workshop: Introduction to Functional Data Analysis, October 13-14, 2011; March 27-28, 2008.

    BTRY 694: Theory of Multivariate Statistics, Spring 2008/Fall 2009

    BTRY 694: Statistical Learning Theory, Fall 2007

    BTRY 694: Functional Data Analysis, Spring 2007

    Publications:


    Yichen Zhou and Giles Hooker, 2016, "Interpreting Models via Single Tree Approximation", under review.

    Giles Hooker and Clifford A. Hooker, "Machine Learning and the Future of Realism", under review.

    Chong Liu, Surajit Ray and Giles Hooker, 2016, "Functional Principal Components Analysis of Spatially Correlated Data", Journal of Computational and Graphical Statistics, in press. ArXiv link.

    Lucas Mentch and Giles Hooker, 2016, "Formal Hypothesis Tests for Additive Structure in Random Forests", Journal of Computational and Graphical Statistics, in press. ArXiv link.

    Giles Hooker and Steven Roberts, 2016, "Maximal Autocorrelation Functions in Functional Data Analysis", Statistics and Computing. 26(5):945-950.

    Cecilia Earls and Giles Hooker, 2016, "Adapted Variational Bayes for Functional Data Registration, Smoothing, and Prediction", Bayesian Analysis, in press.

    Giles Hooker, 2016, "Consistency, Efficiency and Robustness of Conditional Disparity Methods", Bernoulli, 22(2):857-900. ArXiv link.

    Giles Hooker and Stephen P. Ellner, 2016, "Goodness of Fit in Nonlinear Dynamics: Mis-specified Rates or Mis-specified States?", Annals of Applied Statistics, 9(2):754-776. ArXiv link.

    Peter Hall and Giles Hooker, 2016, "Truncated Linear Models for Functional Data", Journal of the Royal Statistical Society, Series B, 78(3):637-653. ArXiv link.

    Giles Hooker and Lucas Mentch, 2016, "Comments on A Random Forest Guided Tour", TEST, 25(2):254-260.

    Keegan Kang and Giles Hooker, 2016, "Improving the Recovery Of Principal Components with Semi Deterministic Random Projections", Proceedings of the 50th Annual Conference on Information Science and Systems.

    Cecilia Earls and Giles Hooker, 2016, "Combining Functional Data Registration and Factor Analysis", Journal of Computational and Graphical Statistics, in press. ArXiv link

    Keegan Kang and Giles Hooker, 2016, "Block Correlated Deterministic Random Projections", Proceedings of the 6th Conference on Computational Mathematics, Computational Geometry and Statistics.

    Lucas Mentch and Giles Hooker, 2016, "Quantifying Uncertainty in Random Forests via Confidence Intervals and Hypothesis Tests", Journal of Machine Learning Research, 17(3):1-41. ArXiv link.

    Giles Hooker and Lucas Mentch, 2015, "Bootstrap Bias Corrections for Ensemble Methods", under review.

    Teller, Brittany J., Peter B. Adler, Collin B. Edwards, Giles Hooker, Robin E. Snyder and Stephen P. Ellner, 2015, "Linking demography with drivers: climate and competition", Methods in Ecology and Evolution, 7(2):171-183.

    Grinspan, Zachary M., JS Shapiro, Erika L. Abramson, Giles Hooker, Rainu Kaushal and Lisa M. Kern, 2015, "Predicting Frequent ED Use By People with Epilepsy with Health Information Exchange Data", Neurology, 85(12):1031-1058.

    Giles Hooker, James O. Ramsay and Luo Xiao, 2015, "CollocInfer: Collocation Inference in Differential Equations", Journal of Statistical Software, in press.

    Giles Hooker, Kevin K. Lin and Bruce Rogers, 2015, "Control Theory and Experimental Design in Diffusion Processes", Journal of Uncertainty Quantification, 3(1):234-264. ArXiv version.

    Matthew W. McLean, Giles Hooker and David Ruppert, 2015, "Restricted Likelihood Ratio Tests for Linearity in Scalar-on-Function Regression", Statistics and Computing, 25(5):997-1008. ArXiv link.

    Teppo Hiltunen, Stephen P. Ellner, Giles Hooker, Laura E. Jones, Nelson G. Hairston, "Eco-evolutionary Dynamics in a Three-Species Food Web with Intraguild Predation: Intriguingly Complex", in Advances in Ecological Research, Vol. 50: Eco-Evolutionary Dynamics.

    Teppo Hiltunen, Nelson G. Hairston, Giles Hooker, Laura E. Jones and Stephen P. Ellner, 2014, "A newly discovered role of evolution in previously published consumer-resource dynamics", Ecology Letters, 17(8):915-923.

    Giles Hooker and Anand Vidyashankar, 2014, "Bayesian Model Robustness via Disparities", TEST, 23(3):556-584. See also a .tar file of R code to reproduce all our results and an older ArXiv version.

    Matthew McLean, Giles Hooker, Ana-Maria Staicu, Fabian Scheipl and David Ruppert, 2014, "Functional Generalized Additive Models", Journal of Computational and Graphical Statistics, 23(1):249-269.

    Maria Asencio, Giles Hooker and H. Oliver Gao, 2014, "Functional Convolution Models", Statistical Modeling, 14(4):1-21.

    Cecilia Earls and Giles Hooker, 2014, "Bayesian Covariance Estimation and Inference in Latent Gaussian Process Models", Statistical Methodology, 18:79-100.

    Giles Hooker, 2013, "On the Identifiability of the Functional Convolution Model", Technical Report BU-1681-M, Department of Biological Statistics and Computational Biology, Cornell University.

    Yuefeng Wu and Giles Hooker, 2013, "Hellinger Distance and Bayesian Non-parametrics: Hierarchical Models for Robust and Efficient Bayesian Inference", under review.

    Giles Hooker, 2013, A review of "Boosting: Foundations and Algorithms" by Schapire and Freund, Journal of the American Statistical Association, 108(502):750-754.

    Matthew W. McLean, Fabian Scheipl, Giles Hooker, Sonja Greven and David Ruppert, 2013, "Bayesian Functional Generalized Additive Models with Sparsely Observed Covariates", under review.

    Yin Lou, Rich Caruana, Johannes Gehrke and Giles Hooker, 2013, "Accurate Intelligible Models with Pairwise Interactions", KDD'13.

    S.A. Jesty, S.W. Jung, J.M. Cordeiro, T.M. Gunn, J.M. Di Diego, S. Hemsley, B.G. Kornreich, G. Hooker, C. Antzelevitch, N.S. Moise, 2013, "Cardiomyocyte calcium cycling in a naturally occurring German shepherd dog model of inherited ventricular arrhythmia and sudden cardiac death", Journal of Veterinary Cardiology, 15(1):5-14.

    Robert D. Gibbons, Giles Hooker, Matthew D. Finkelman, David J. Weiss, Paul A. Pilkonis, Ellen Frank, Tara Moore and David J. Kupfer, 2013, "Computerized Adaptive Diagnosis of Depression Using the CAD-MDD", Journal of Clinical Psychiatry, 74(7):669-674.

    Leifur Thorbergsson and Giles Hooker, 2012, "Experimental Design for Partially Observed Markov Decision Processes", under review.

    Giles Hooker and James O. Ramsay, 2012. "Learned-Loss Boosting." Computational Statistics and Data Analysis, 56:3935-3944. Matlab software is also available.

    David Campbell, Giles Hooker and Kim McAuley, 2012, "Parameter Estimation in Differential Equation Models with Constrained States", Journal of Chemometrics, 56:322-332.

    Giles Hooker and Saharon Rosset, 2012, "Prediction-Focussed Regularization Using Data-Augmented Regression", Statistics and Computing, 1:237-349. Simulation code is available.

    Chong Liu, Surajit Ray, Giles Hooker and Mark Friedl, 2012, "Functional Factor Analysis for Periodic Remote Sensing Data", Annals of Applied Statistics, 6:601-624. ArXiv link

    Giles Hooker, Stephen P. Ellner, Laura de Vargas Roditi and David J. D. Earn, 2011, "Parameterizing State-space Models for Infectious Disease Dynamics by Generalized Profiling: Measles in Ontario", Journal of the Royal Society Interface, 8:961-975. Matlab code to conduct the analysis is available.

    Marija Zeremski, Giles Hooker, Marla A. Shu, Emily Winkelstein, Queenie Brown, Don C. Des Jarlais, Leslie H. Tobler, Barbara Rehermann, Michael P. Busch, Brian R. Edlin, and Andrew H. Talal, 2011, "Induction of CXCR3- and CCR5-associated Chemokines during Acute Hepatitis C Virus Infection", Journal of Hepatology, 55:545-553.

    Giles Hooker and Stephen P. Ellner, 2010, "On Forwards Prediction Error", Technical Report BU-1679-M, Department of Biological Statistics and Computational Biology, Cornell University.

    Matthew Finkelman, Giles Hooker and Jane Wang, 2010, "Prevalence and Magnitude of Paradoxical Results in Multidimensional Item Response Theory". Journal of Educational and Behavioral Statistics, 35:744-761.

    Giles Hooker, 2010. "On Separable Tests, Correlated Priors, and Paradoxical Results in Multidimensional Item Response Theory". Psychometrika, 75:694-707. Manuscript available upon request.

    Daniel Fink, Wesley M. Hochachka, Benjamin Zuckerberg, David W. Winkler, Ben Shaby, M. Arthur Munson, Giles Hooker, Mirek Riedewald, Daniel Sheldon and Steve Kelling, 2010, "Spatiotemporal Exploratory Models for Broad-scale Survey Data", Ecological Applications, 20:2131-2147.

    Giles Hooker, 2010, "Comments on: Dynamic Relations for Sparsely Sampled Gaussian Processes", TEST, 19:50-53.

    Ercan Atam and Giles Hooker, 2010, "An Identification-based State Estimation Method for a Class of Nonlinear Systems". J. Systems and Control Engineering, 224:349-359.

    Giles Hooker and Matthew Finkelman, 2009. "Paradoxical Results and Item Bundles". Psychometrika, 75:249-271. Manuscript available upon request.

    Giles Hooker, Matthew Finkelman and Armin Schwartzman, 2009, "Paradoxical Results in Multidimensional Item Response Theory". Psychometrika, 74:419-442. Slides from a recent talk. Manuscript available upon request.

    Matthew Finkelman, Giles Hooker and Jane Wang, 2009, "Unidentifiability and Lack of Monotonicity in the Multidimensional Three-Parameter Logistic Model". Technical Report BU-1678-M, Department of Biological Statistics and Computational Biology, Cornell University.

    S. Kelling, W. Hochachka, D. Fink, M. Riedewald, R. Caruana, M. Ballard and G. Hooker, 2009, "Data Intensive Science: A New Paradigm for Diversity Studies". Bioscience, 59:613-620.

    Giles Hooker, 2009. "Forcing Function Diagnostics for Nonlinear Dynamics". Biometrics, 65:928-936.

    Gelzer, A., M. L. Koller, N. F. Otani, J. J. Fox, M. W. Enyeart, G. Hooker, M. L. Riccio, C. R. Bartoli and R. F. Gilmour, 2008, "Dynamic Mechanisms for Initiation of Ventricular Fibrillation in vivo", Circulation, 118:1123-1129.

    Giles Hooker and Larry Biegler, 2007. "IPOPT and Neural Dynamics: Tips, Tricks and Diagnostics", Technical Report BU-1676-M, Department of Biological Statistics and Computational Biology, Cornell University. A demonstration bundle provides data and AMPL code from this estimation.

    James Ramsay, Giles Hooker, David Campbell and Jiguo Cao, 2007. "Parameter Estimation for Differential Equations: A Generalized Smoothing Approach". Journal of the Royal Statistical Society, Series B, 69:741-796 (with discussion).

    Giles Hooker, 2007, "Theorems and Calculations for Smoothing-based Profiled Estimation of Differential Equations", Technical Report BU-1671-M, Department of Biological Statistics and Computational Biology, Cornell University.

    Giles Hooker, 2007. "Generalized Functional ANOVA Diagnostics for High Dimensional Functions of Dependent Variables". Journal of Computational and Graphical Statistics, 16:709-732.

    Robert Norris, Jessica Ngo, Karen Nolan and Giles Hooker, 2005. "Volunteers are Unable to Properly Apply Pressure Immobilization in a Simulated Snakebite Scenario". Journal of Wilderness and Environmental Medicine, 16:16-21.

    Armin Schwartzman, Matthew Finkelman and Giles Hooker, 2004. "The Stanford Statistics Songbook: A Musical Tribute". Technical Report, Department of Statistics, Stanford University.

    Giles Hooker, 2004. "Diagnostics and Extrapolation in Machine Learning". PhD Thesis, Department of Statistics, Stanford University.

    Giles Hooker and Matthew Finkelman, 2004. "Sequential Analysis for Learning Modes of Browsing". WEBKDD 2004: Proceedings of the Sixth International Workshop on Knowledge Discovery from the Web.

    Giles Hooker, 2004. "Diagnosing Extrapolation: Tree-Based Density Estimation". Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

    Giles Hooker, 2004. "Discovering ANOVA Structure in Black Box Functions". Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

    Giles Hooker and Fuliang Weng, 2003. "Subset Selection in Large, Sparse Systems: An application of the Forward Stagewise approach to Natural Language Processing". Technical Report, Robert Bosch Corp.

    Michael Shirts, Eric Bair, Giles Hooker and Vijay Pande, 2003. "Equilibrium Free Energies from Nonequilibrium Estimates Using Maximum Likelihood Methods". Physical Review Letters, 91(14):140601.

    Giles Hooker, 1999. "Developing a Spline Smoothed Density". Honours Thesis, Department of Mathematics, Australian National University.

    Markus Hegland, Giles Hooker and Stephen Roberts, 1999. "Finite Element Thin Plate Splines in Density Estimation". Computational Techniques and Applications: Proceedings of the Ninth Biennial Conference: CTAC99. Journal of the Australian Mathematical Society, Series B (special issue).

    Prospective Publications:

    These are a range of paper ideas, some of them more likely to turn into papers than others, some of them larger projects than others, that I think worthwhile. Anybody who is interested in them, has relevant data, or knows of authors that have beaten me to it is highly encouraged to contact me. Many of these are also potential graduate student projects.

    "Boosting for Conditional Density and Quantile Estimates": describes a boosting scheme to estimate the conditional density of a response given features at each point in feature space; extensions to directly estimating quantile functions are possible.

    "Disparity Estimation in Nonlinear and Nonparametric Regression". Considers ways to perform Hellinger-distance and other disparity-based estimation for nonlinear regression and extensions to non-parametric regression. With Anand Vidyashankar.

    "Semi-Parametric Boosting": generalizes the ideas in "Learned-Loss Boosting" to a semi-parametric context in which there is an infinite dimensional non-parametric component. Not clear how well this would work, but worth a try.

    "Experiments in Extrapolation and Truncation": Considers a number of post-hoc truncation methods to deal with extrapolation. I will set up some careful experiments and use real-world data to determine what aspects of extrapolation are most salient.

    "A Tale of Two ANOVAs": suppose we have some set of prediction functions for the same task and we want to evaluate where those functions differ. These differences can be measured point-wise using the functional ANOVA in the sense of Ramsay and Silverman. These point-wise differences then define a high dimensional function that may be investigated through a functional ANOVA in the sense of Gu and Wahba. Together, they may provide some insights into what distinguishes the output of different machine learning algorithms.