A post about simulating data from a generalized linear mixed model (GLMM), the fourth post in my simulations series involving linear models, is long overdue. I settled on a binomial example based on a binomial GLMM with a logit link.

I find binomial models the most difficult to grok, primarily because the model is on the scale of log odds, inference is based on odds, but the response variable is a counted proportion. I use the term counted proportion to indicate that the proportions are based on discrete counts, the total number of “successes” divided by the total number of trials. A different distribution (possibly beta) would be needed for continuous proportions like, e.g., total leaf area with lesions.

Models based on single parameter distributions like the binomial can be overdispersed or underdispersed, where the variance in the data is bigger or smaller, respectively, than the variance defined by the binomial distribution. Given this, I thought exploring estimates of dispersion based on simulated data that we know comes from a binomial distribution would be interesting.

Also see the simulate() function from package lme4. I find this function particularly useful if I want to simulate data based on a fitted model, but it can also be used in situations where you don’t already have a model.

As usual, I’ll start by writing out the statistical model using mathematical equations. If these aren’t helpful to you, jump down to the code. You may find that writing the code first and coming back to look at the statistical model later is helpful.

The imaginary study design that is the basis of my model has two different sizes of study units. This is a field experiment scenario, where multiple sites within a region are selected and then two plots within each site are randomly placed and a treatment assigned (“treatment” or “control”). You can think of “sites” as a blocking variable.
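To make the study design concrete, here is a minimal sketch in base R of how one data set from a design like this might be simulated: sites act as blocks, each site holds one control plot and one treatment plot, and the response is a count of successes out of a fixed number of trials. Every numeric value below (number of sites, coefficients, site standard deviation, trials per plot) and every object name is a hypothetical choice for illustration, not a value taken from the model in this post.

```r
# All settings below are hypothetical, chosen only for illustration
set.seed(16)
n_sites <- 10    # number of sites (blocks)
b0 <- 0          # log odds of success for the control group
b1 <- 0.5        # difference in log odds, treatment minus control
site_sd <- 0.5   # SD of the site random effects, on the log-odds scale
n_trials <- 50   # binomial sample size (number of trials) per plot

# Two plots per site, one assigned to each treatment level
site <- rep(paste0("site", 1:n_sites), each = 2)
treatment <- rep(c("control", "treatment"), times = n_sites)

# One random effect per site, shared by both plots within that site
site_eff <- rep(rnorm(n_sites, mean = 0, sd = site_sd), each = 2)

# Linear predictor on the log-odds scale, back-transformed to probabilities
log_odds <- b0 + b1 * (treatment == "treatment") + site_eff
prob <- plogis(log_odds)

# Draw the number of successes for each plot from a binomial distribution
successes <- rbinom(n = 2 * n_sites, size = n_trials, prob = prob)

# The counted proportion is successes divided by trials
dat <- data.frame(site, treatment, n_trials, successes,
                  prop = successes / n_trials)
head(dat)
```

The linear predictor is built on the log-odds scale and only converted to a probability with plogis() at the point where the binomial draw needs it, which mirrors the split between the scale of the model and the scale of the response. The counted proportion in the last column is derived from the counts rather than simulated directly.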