PR~statistics

Delivering ecology-based courses and workshops

Course: Applied Bayesian modelling for ecologists

Instructors: Dr. Matt Denwood and Prof. Jason Matthiopoulos.

Coordinator: Oliver Hooker.

Date: TBC - email for details.

Cost: TBC - email for details.

Availability: 30 places total.

Venue: TBC - email for details.

Duration: 6 days, approximately 8 teaching hours per day.

Registration: Please send course requests to oliverhooker@prstatistics.co.uk stating the course title and whether you would be interested in an all-inclusive option. You will then be sent a registration form and invoice.

Course details: Starting from a refresher on probability & likelihood, the course will take students all the way to cutting-edge applications such as state-space population modelling & spatial point-process modelling. By the end of the week, you should have a basic understanding of how common MCMC samplers work and how to program them, and have practical experience with the BUGS language for common ecological and epidemiological models. The experience gained will provide a sufficient foundation for you to understand current papers using Bayesian methods, carry out simple Bayesian analyses on your own data and springboard into more elaborate applications such as dynamical, spatial and hierarchical modelling.

Day 1: REVISION OF LIKELIHOODS USING FULL LIKELIHOOD PROFILES AND AN INTRODUCTION TO THE THEORY OF BAYESIAN STATISTICS

- Conditional, joint and total probability, independence, Bayes’ law
- Probability distributions
- Uniform, Bernoulli, Binomial, Poisson, Gamma, Beta and Normal distributions – their range, parameters and common uses
- Likelihood and parameter estimation by maximum likelihood
- Numerical likelihood profiles and maximum likelihood
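As a minimal illustration of a numerical likelihood profile, the sketch below evaluates the Poisson log-likelihood over a grid of candidate rates and takes the grid value with the highest log-likelihood as the maximum likelihood estimate. The course practicals use R; this sketch is in Python, and the counts and grid bounds are hypothetical values chosen only for illustration.

```python
import math

def poisson_log_lik(lam, counts):
    """Log-likelihood of a Poisson rate `lam` given observed counts."""
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)

def likelihood_profile(counts, lo=0.1, hi=10.0, steps=991):
    """Evaluate the log-likelihood on an evenly spaced grid of rates."""
    step = (hi - lo) / (steps - 1)
    grid = [lo + i * step for i in range(steps)]
    return grid, [poisson_log_lik(lam, counts) for lam in grid]

# Hypothetical counts (e.g. animals seen per quadrat), for illustration only
counts = [2, 4, 3, 5, 1, 3]
grid, profile = likelihood_profile(counts)
best = max(range(len(grid)), key=lambda i: profile[i])
print(f"Grid MLE: {grid[best]:.2f}; analytical MLE (sample mean): {sum(counts) / len(counts):.2f}")
```

For the Poisson model the analytical MLE is the sample mean, so the grid maximum can be checked directly; the profile itself also shows how sharply the likelihood falls away from the maximum.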

- Relationship between prior, likelihood & posterior distributions
- Summarising a posterior distribution
- The philosophical differences between frequentist & Bayesian statistics, & the practical implications of these
- Applying Bayes’ theorem to discrete & continuous data for common data types given different priors
- Building a posterior profile for a given dataset, & comparing the effect of different priors for the same data
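A posterior profile can be built numerically by multiplying prior and likelihood on a grid and normalising. The Python sketch below does this for a binomial proportion under a flat Beta(1, 1) prior and a more informative Beta(10, 10) prior; the data (7 positives out of 20) and both priors are hypothetical choices for illustration, not course material.

```python
import math

def beta_log_pdf_unnorm(theta, a, b):
    """Unnormalised log density of a Beta(a, b) prior."""
    return (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)

def grid_posterior(successes, trials, a, b, n_grid=1000):
    """Posterior for a binomial proportion on a grid of midpoints in (0, 1)."""
    grid = [(i + 0.5) / n_grid for i in range(n_grid)]
    log_post = [
        beta_log_pdf_unnorm(t, a, b)                                          # log prior
        + successes * math.log(t) + (trials - successes) * math.log(1 - t)    # log likelihood
        for t in grid
    ]
    m = max(log_post)                        # subtract the maximum for numerical stability
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return grid, [w / total for w in weights]

def posterior_mean(grid, probs):
    return sum(t * p for t, p in zip(grid, probs))

# Hypothetical data: 7 infected animals out of 20 sampled
for label, (a, b) in {"flat Beta(1,1)": (1, 1), "informative Beta(10,10)": (10, 10)}.items():
    grid, probs = grid_posterior(7, 20, a, b)
    print(f"{label}: posterior mean = {posterior_mean(grid, probs):.3f}")
```

Because the Beta prior is conjugate to the binomial likelihood, the grid results can be checked against the exact posterior means, 8/22 and 17/40; the informative prior pulls the estimate towards 0.5.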

Day 2: AN INTRODUCTION TO THE WORKINGS OF MCMC, AND THE POTENTIAL DANGERS OF MCMC INFERENCE. PARTICIPANTS WILL PROGRAM THEIR OWN (BASIC) MCMC SAMPLER TO ILLUSTRATE THE CONCEPTS AND FULLY UNDERSTAND THE STRENGTHS AND WEAKNESSES OF THE GENERAL APPROACH. THE DAY WILL END WITH AN INTRODUCTION TO THE BUGS LANGUAGE.

- The curse of dimensionality & the advantages of MCMC sampling to determine a posterior distribution
- Monte Carlo integration, standard error, & summarising samples from posterior distributions in R
- Writing a Metropolis algorithm & generating a posterior distribution for a simple problem using MCMC
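A basic random-walk Metropolis sampler fits in a few lines. The Python sketch below targets the posterior of a binomial proportion with a flat Beta(1, 1) prior; the data (7 successes in 20 trials), proposal standard deviation, chain length, starting value and burn-in length are all illustrative assumptions, not the course's settings.

```python
import math
import random

def log_posterior(theta, successes, trials, a=1.0, b=1.0):
    """Unnormalised log posterior: Beta(a, b) prior times binomial likelihood."""
    if not 0.0 < theta < 1.0:
        return -math.inf                     # zero prior density outside (0, 1)
    return (a - 1 + successes) * math.log(theta) + (b - 1 + trials - successes) * math.log(1 - theta)

def metropolis(successes, trials, n_iter=20000, start=0.5, step_sd=0.1, seed=1):
    """Random-walk Metropolis sampler with a symmetric Normal proposal."""
    rng = random.Random(seed)
    current = start
    current_lp = log_posterior(current, successes, trials)
    samples, accepted = [], 0
    for _ in range(n_iter):
        proposal = rng.gauss(current, step_sd)
        proposal_lp = log_posterior(proposal, successes, trials)
        # Accept with probability min(1, posterior ratio); the symmetric proposal cancels.
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        samples.append(current)              # a rejected move repeats the current value
    return samples, accepted / n_iter

samples, acc_rate = metropolis(7, 20)
kept = samples[1000:]                        # discard burn-in
post_mean = sum(kept) / len(kept)
print(f"acceptance rate {acc_rate:.2f}; posterior mean {post_mean:.3f} (exact: {8 / 22:.3f})")
```

Only the ratio of posterior densities is needed, so the normalising constant never has to be computed; this is what makes the approach practical when that constant is an intractable integral.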

- Definition of a Markov chain
- Autocorrelation, effective sample size & Monte Carlo error
- The concept of a stationary distribution & burn-in
- Requirement for convergence diagnostics, & common statistics for assessing convergence
- Adapting an existing Metropolis algorithm to use two chains, & assessing the effect of the sampling distribution on the autocorrelation
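The diagnostics listed above can be sketched without any MCMC library. The Python example below uses two AR(1) series as a stand-in for autocorrelated MCMC output (an assumption made to keep it self-contained), started from over-dispersed values, and computes lag-1 autocorrelation, an effective sample size truncated at the first negative autocorrelation, the Monte Carlo error of the mean and the basic Gelman-Rubin statistic (R-hat). The AR coefficient, chain lengths and burn-in are illustrative choices; packages such as coda use more refined estimators.

```python
import math
import random

def ar1_chain(n, phi, start, seed):
    """Simulate an AR(1) series as a stand-in for autocorrelated MCMC output."""
    rng = random.Random(seed)
    x, out = start, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def autocorrelation(xs, lag):
    m = mean(xs)
    num = sum((xs[i] - m) * (xs[i + lag] - m) for i in range(len(xs) - lag))
    den = sum((x - m) ** 2 for x in xs)
    return num / den

def effective_sample_size(xs, max_lag=100):
    """ESS = n / (1 + 2 * sum of autocorrelations), truncated at the first negative lag."""
    rho_sum = 0.0
    for lag in range(1, max_lag + 1):
        rho = autocorrelation(xs, lag)
        if rho < 0:
            break
        rho_sum += rho
    return len(xs) / (1 + 2 * rho_sum)

def gelman_rubin(chains):
    """Basic potential scale reduction factor (R-hat) for equal-length chains."""
    n = len(chains[0])
    w = mean([variance(c) for c in chains])          # mean within-chain variance
    b = n * variance([mean(c) for c in chains])      # between-chain variance
    var_hat = (n - 1) / n * w + b / n
    return math.sqrt(var_hat / w)

# Two chains with deliberately over-dispersed starting values (-20 and +20)
chains = [ar1_chain(5000, 0.9, start, seed) for start, seed in [(-20.0, 1), (20.0, 2)]]
burned = [c[500:] for c in chains]                   # discard burn-in
ess = effective_sample_size(burned[0])
mc_error = math.sqrt(variance(burned[0]) / ess)      # Monte Carlo standard error of the mean
print(f"lag-1 autocorrelation: {autocorrelation(burned[0], 1):.2f}")
print(f"ESS: {ess:.0f} of {len(burned[0])} draws; MC error: {mc_error:.3f}")
print(f"R-hat: {gelman_rubin(burned):.3f}")
```

Strong autocorrelation shrinks the effective sample size well below the number of stored draws, which is why Monte Carlo error is based on the ESS rather than the raw chain length; R-hat values close to 1 indicate that the two chains have mixed into the same distribution.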

- Introduction to the BUGS language & how a BUGS model is translated to an MCMC sampler during compilation
- The difference between deterministic & stochastic nodes, & the contribution of priors & the likelihood
- Running, extending & interpreting the output of simple JAGS models from within R using the runjags interface

Day 3: THIS DAY WILL FOCUS ON THE COMMON MODELS FOR WHICH JAGS/BUGS WOULD BE USED IN PRACTICE, WITH EXAMPLES GIVEN FOR DIFFERENT TYPES OF MODEL CODE. ALL ASPECTS OF WRITING, RUNNING, ASSESSING AND INTERPRETING THESE MODELS WILL BE EXTENSIVELY DISCUSSED SO THAT PARTICIPANTS CAN CONFIDENTLY RUN SIMILAR MODELS ON THEIR OWN. THERE WILL BE A PARTICULARLY HEAVY FOCUS ON PRACTICAL SESSIONS DURING THIS DAY. THE DAY WILL FINISH WITH A DISCUSSION OF HOW TO ASSESS THE FIT OF MCMC MODELS USING THE DEVIANCE INFORMATION CRITERION (DIC) AND OTHER METHODS.

- Understanding and generating code for basic generalised linear mixed models in JAGS
- Syntax for quadratic terms and interaction terms in JAGS

- The need for minimal cross-correlation and independence between parameters and how to design a model with these properties
- The practical methods and implications of minimising Monte Carlo error and autocorrelation, including thinning
- Interpreting the DIC for nested models, and understanding the limitations of how this is calculated
- Other methods of model selection and where these might be more useful than DIC
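To make the DIC calculation concrete, the Python sketch below computes DIC = mean deviance + pD, with pD = mean deviance minus the deviance at the posterior mean, for two nested Poisson models: one shared rate versus a separate rate per habitat. Posterior draws come from the conjugate Gamma posterior rather than from JAGS, and the habitat counts and Gamma(0.01, 0.01) priors are hypothetical; JAGS and runjags compute DIC internally and may use a different pD estimator.

```python
import math
import random

def poisson_deviance(rates, counts):
    """Deviance (-2 * log-likelihood) for Poisson counts with per-observation rates."""
    return -2 * sum(y * math.log(r) - r - math.lgamma(y + 1) for y, r in zip(counts, rates))

def dic(param_samples, rates_fn, counts):
    """DIC = mean deviance + pD, with pD = mean deviance - deviance at the posterior mean."""
    deviances = [poisson_deviance(rates_fn(p), counts) for p in param_samples]
    d_bar = sum(deviances) / len(deviances)
    k = len(param_samples[0])
    p_mean = [sum(p[j] for p in param_samples) / len(param_samples) for j in range(k)]
    p_d = d_bar - poisson_deviance(rates_fn(p_mean), counts)
    return d_bar + p_d, p_d

def gamma_posterior_draws(ys, n_draws, rng, a=0.01, b=0.01):
    """Draws from the conjugate Gamma(a + sum y, rate b + n) posterior of a Poisson rate."""
    return [rng.gammavariate(a + sum(ys), 1.0 / (b + len(ys))) for _ in range(n_draws)]

# Hypothetical counts from two habitats (illustrative values only)
habitat = ["A"] * 6 + ["B"] * 6
counts = [2, 4, 3, 5, 1, 3, 7, 9, 6, 8, 10, 7]
rng = random.Random(42)
n_draws = 20000

# Model 1: one shared rate; Model 2: a separate rate for each habitat
pooled = [(lam,) for lam in gamma_posterior_draws(counts, n_draws, rng)]
a_counts = [y for y, h in zip(counts, habitat) if h == "A"]
b_counts = [y for y, h in zip(counts, habitat) if h == "B"]
separate = list(zip(gamma_posterior_draws(a_counts, n_draws, rng),
                    gamma_posterior_draws(b_counts, n_draws, rng)))

dic1, pd1 = dic(pooled, lambda p: [p[0]] * len(counts), counts)
dic2, pd2 = dic(separate, lambda p: [p[0] if h == "A" else p[1] for h in habitat], counts)
print(f"pooled: DIC={dic1:.1f}, pD={pd1:.2f}; separate: DIC={dic2:.1f}, pD={pd2:.2f}")
```

With weak priors pD comes out close to the number of free parameters (about 1 and 2 here), and the lower DIC favours the model with separate rates. For hierarchical models pD is generally not a whole number, which is one of the limitations of DIC worth keeping in mind.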

Day 4: DAY 4 WILL FOCUS ON THE FLEXIBILITY OF MCMC, AND PRECAUTIONS REQUIRED FOR USING MCMC TO MODEL COMMONLY ENCOUNTERED DATASETS. AN INTRODUCTION TO CONJUGATE PRIORS AND THE POTENTIAL BENEFITS OF EXPLOITING GIBBS SAMPLING WILL BE GIVEN. MORE COMPLEX TYPES OF MODELS SUCH AS HIERARCHICAL MODELS, LATENT CLASS MODELS, MIXTURE MODELS AND STATE SPACE MODELS WILL BE INTRODUCED AND DISCUSSED. THE PRACTICAL SESSIONS WILL FOLLOW ON FROM DAY 3.

- The flexibility of the BUGS language and MCMC methods
- The difference between informative and diffuse priors
- Conjugate priors and how they can be used
- Gibbs sampling
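Conditionally conjugate priors make Gibbs sampling possible because each full conditional distribution is a standard distribution that can be sampled directly. The Python sketch below assumes Normal data with unknown mean and precision, a Normal prior on the mean and a Gamma prior on the precision; the vague prior values, starting values, chain length and the body-length data are hypothetical choices for illustration.

```python
import math
import random

def gibbs_normal(ys, n_iter=5000, m0=0.0, t0=1e-4, a0=0.01, b0=0.01, seed=1):
    """Gibbs sampler for Normal data with unknown mean and precision.

    Priors (conditionally conjugate): mu ~ Normal(m0, precision t0),
    tau ~ Gamma(a0, rate b0). Each full conditional is a standard
    distribution, so every draw is accepted.
    """
    rng = random.Random(seed)
    n, total = len(ys), sum(ys)
    mu, tau = total / n, 1.0                 # starting values
    draws = []
    for _ in range(n_iter):
        # mu | tau, y ~ Normal with precision t0 + n * tau
        prec = t0 + n * tau
        mu = rng.gauss((t0 * m0 + tau * total) / prec, 1.0 / math.sqrt(prec))
        # tau | mu, y ~ Gamma(a0 + n/2, rate b0 + SS/2); gammavariate takes a scale
        ss = sum((y - mu) ** 2 for y in ys)
        tau = rng.gammavariate(a0 + n / 2, 1.0 / (b0 + ss / 2))
        draws.append((mu, tau))
    return draws

# Hypothetical body lengths (cm), for illustration only
ys = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5, 12.2, 13.3, 11.8, 12.6]
kept = gibbs_normal(ys)[500:]                # discard burn-in
post_mu = sum(mu for mu, _ in kept) / len(kept)
post_sd = sum(1 / math.sqrt(tau) for _, tau in kept) / len(kept)
print(f"posterior mean of mu: {post_mu:.2f}; posterior mean of sigma: {post_sd:.2f}")
```

Unlike the Metropolis algorithm, there is no proposal distribution to tune and no rejected moves; the price is that the full conditionals must be derivable, which is exactly what conjugacy provides.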

- Hierarchical and state space models
- Latent class and mixture models
- Conceptual application to animal movement
- Hands-on application to population biology
- Conceptual application to epidemiology

Day 5: DAY 5 WILL GIVE SOME ADDITIONAL PRACTICAL GUIDANCE ON THE USE OF BAYESIAN METHODS, AND FINISH WITH A BRIEF OVERVIEW OF MORE ADVANCED BAYESIAN TOOLS SUCH AS INLA AND STAN.

- Understand the usefulness of conjugate priors for robust analysis of proportions (Binomial and Multinomial data)
- Be aware of some methods of prior elicitation
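For proportions, the conjugate updates are simple enough to do by hand: a Beta(a, b) prior with x successes in n trials gives a Beta(a + x, b + n - x) posterior, and a Dirichlet prior with multinomial counts gives a Dirichlet posterior whose parameters are the prior parameters plus the counts. The Python sketch below applies both, with a Monte Carlo credible interval for the Beta case; the flat priors, survey data and diet categories are hypothetical examples.

```python
import random

def beta_binomial_update(a, b, successes, trials):
    """Beta(a, b) prior + Binomial data -> Beta(a + x, b + n - x) posterior."""
    return a + successes, b + trials - successes

def dirichlet_multinomial_update(alphas, counts):
    """Dirichlet(alphas) prior + Multinomial counts -> Dirichlet(alphas + counts)."""
    return [a + c for a, c in zip(alphas, counts)]

def dirichlet_mean(alphas):
    total = sum(alphas)
    return [a / total for a in alphas]

def beta_interval(a, b, level=0.95, n_draws=20000, seed=1):
    """Monte Carlo equal-tailed credible interval for a Beta(a, b) posterior."""
    rng = random.Random(seed)
    draws = sorted(rng.betavariate(a, b) for _ in range(n_draws))
    lo = draws[round((1 - level) / 2 * n_draws)]
    hi = draws[round((1 + level) / 2 * n_draws) - 1]
    return lo, hi

# Hypothetical prevalence survey: 7 positives out of 20, flat Beta(1, 1) prior
a_post, b_post = beta_binomial_update(1, 1, 7, 20)
lo, hi = beta_interval(a_post, b_post)
print(f"Beta({a_post}, {b_post}) posterior; 95% interval ({lo:.3f}, {hi:.3f})")

# Hypothetical diet study: prey counts in three categories, flat Dirichlet(1, 1, 1) prior
alpha_post = dirichlet_multinomial_update([1, 1, 1], [12, 5, 3])
print("Dirichlet posterior:", alpha_post, "means:", [round(m, 3) for m in dirichlet_mean(alpha_post)])
```

Because the posterior is available in closed form, there is no convergence to assess, and the posterior remains well defined even when some observed counts are zero.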

Day 6: ROUND TABLE DISCUSSIONS AND PROBLEM SOLVING WITH FINAL Q AND A

- The final day will consist of round table discussions, in which the class will be split into smaller groups to discuss set topics/problems, including participants’ own data where possible. After an early lunch there will be a general question and answer session for the whole group until approximately 2pm, before transport to Balloch train station.

Outcomes: By the end of this course you will be able to:

- Do calculations with conditional, joint and total probability.
- Understand the key philosophical differences between Bayesian and Frequentist statistics and be in a position to decide which approach is likely to be most useful for particular research questions.
- Use prior information along with likelihood information to form a Bayesian posterior in simple examples.
- Explain the concept of Markov chain Monte Carlo (MCMC) and how it is used in practice.
- Critically discuss the role of autocorrelation and cross-correlation in model identifiability and Monte Carlo error.
- Write regression models (GLMs, GLMMs) in WinBUGS / JAGS and fit these to data.
- Use biological first principles or independent information to choose and implement both informative and minimally informative priors.
- Identify when a model has converged and when sufficient Monte Carlo samples have been obtained. Conduct model selection and comparisons using DIC. Understand the motivation and advantages of alternative model selection methods.
- Understand and customise more complex models for ecological populations in space and time.

Assumed computer background: Before the course starts, you should have a working knowledge of:

- Basic R usage (command-line interactive, generation of graphs)
- Manipulation of data-frames in R
- Regression modelling (linear, generalised linear & mixed effects models)

- Programming structures (loops, conditional statements)
- The basic ideas of probability and likelihood

Equipment and software requirements: There is a computer-based practical session every day, so you will need to bring a laptop/personal computer pre-loaded with the following (free) software (don’t assume that you will have internet access during the course):