---
title: "For One Parameter: Binomial"
author: Jeff Rouder
date: Sept. 28th, 2018
output: html_document
---

# The Problem

What is the probability that toast falls butter side down? Let's suppose we have observed 7 successes (butter-side down) in 10 trials. What does that tell us about buttered toast?

# The Model

Let $Y$ be a random variable that denotes the number of successes. We model $Y$ as $Y \sim \mbox{Binomial}(\theta,n)$, where $\theta$ is the parameter of interest and $n$ is the number of trials.

Q. What are the invariances stipulated in this model?

# Binomial Distribution on Data

Let's look at some predictions of the model.

```{r}
n=10
y=0:n
p=dbinom(y,n,.7)
plot(y,p,typ='h')
points(y,p,pch=19,col='red')
```

We need some notation and terminology at this point. In classical statistics, the probability mass function (`dbinom` in this case) is
$$f(y;\theta) = \binom{n}{y}\theta^y(1-\theta)^{n-y}.$$
The key here is that the mass function is a function of $y$, the outcome (number of successes), for a given $\theta$. This is fine in classical statistics, where $\theta$ is treated as fixed but unknown.

# Bayesian Setup

In Bayesian analysis, parameters are treated as random variables, and probability distributions are placed on them. Let's talk about distributions as statements of belief. What are possible distributions of beliefs about toast? What do they represent? Prior beliefs, updating, posterior beliefs.

Now we have two random variables, the data $Y$ and the parameter $\theta$. Continuous, discrete?

Bayes' Rule:
$$f(\theta|y) = \frac{Pr(Y=y|\theta)}{Pr(Y=y)}f(\theta)$$

1. Prior, $f(\theta)$: shall we take a uniform to start? $f(\theta)=1$ for $0\leq\theta\leq 1$.
2. $Pr(Y=y|\theta) = \binom{n}{y}\theta^y(1-\theta)^{n-y}$
3. Derive $Pr(Y=y)$ from the Law of Total Probability:
$$Pr(Y=y) = \int Pr(Y=y|\theta)f(\theta)\,d\theta$$

Q.
Write an R function to compute $Pr(Y=y)$ for a given $y$ and $n$.

```{r,echo=T}
prior=function(theta) dunif(theta,0,1)
p.data.Given.theta=function(theta,y,n) dbinom(y,n,theta)
integrand=function(theta,y,n) prior(theta)*p.data.Given.theta(theta,y,n)
p.data=function(y,n) integrate(integrand,lower=0,upper=1,y=y,n=n)$value
```

Now, evaluate Bayes' Rule.

```{r, echo=T}
#Bayes Rule
posterior=function(theta,y,n) p.data.Given.theta(theta,y,n)*prior(theta)/p.data(y,n)
```

```{r}
theta=seq(0,1,.01)
prior.dens=prior(theta)
posterior.dens=posterior(theta,7,10)
plot(typ='l',theta,posterior.dens,col='darkgreen')
lines(theta,prior.dens)
```

# Encoding Other Beliefs

Meet the beta distribution.

```{r}
#Not very likely prior
a=1.5
b=3
prior=function(theta) dbeta(theta,a,b)
prior.dens=prior(theta)
posterior.dens=posterior(theta,7,10)
plot(typ='l',theta,posterior.dens,col='darkgreen')
lines(theta,prior.dens)
```

Here is Bayes' rule again:
$$f(\theta|y) = \frac{Pr(Y=y|\theta)}{Pr(Y=y)}f(\theta)$$
We are interested in the posterior, $f(\theta|y)$, which is a function of $\theta$ for fixed $y$. So we can treat the RHS as a function of $\theta$ too.

1. $Pr(Y=y|\theta)$ can be treated as a function of $\theta$ too.

```{r}
#usual: function of y for fixed theta
y=0:10
n=10
theta=.7
plot(y,dbinom(y,n,theta),typ='h')
#in Bayes' rule: function of theta for fixed y
y=7
n=10
theta=seq(0,1,.01)
plot(theta,dbinom(y,n,theta),typ='l',ylab="Likelihood")
```

This function of $\theta$ is sometimes called the likelihood:
$$L(\theta;y) = Pr(Y=y|\theta)$$

2. $Pr(Y=y)$, the denominator, does not depend on $\theta$. It is a constant.

3. Constant of proportionality, $\propto$: $y=Kx$ or $y \propto x$.

$$f(\theta|y) \propto L(\theta;y) \times f(\theta)$$

# Some One Parameter Models

## Binomial Data, Beta Prior

Here is the model:
$$\begin{aligned}
Y|\theta &\sim \mbox{Binomial}(\theta,n)\\
\theta &\sim \mbox{Beta}(a,b)
\end{aligned}$$

* Group 1: Figure out the beta distribution (wiki). What are $a$ and $b$? How should we set them?
* Group 2: Using the proportional form of Bayes' Rule, derive the posterior $\theta|Y$.
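For checking your work afterward, here is a sketch of where the proportional form leads with a beta prior (keeping only the terms that involve $\theta$):
$$\begin{aligned}
f(\theta|y) &\propto \theta^y(1-\theta)^{n-y} \times \theta^{a-1}(1-\theta)^{b-1}\\
&= \theta^{y+a-1}(1-\theta)^{n-y+b-1},
\end{aligned}$$
which is the kernel of a $\mbox{Beta}(y+a,\,n-y+b)$ distribution. With the uniform prior ($a=b=1$) and our toast data ($y=7$, $n=10$), the posterior is $\mbox{Beta}(8,4)$.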
## Exponential Data, Gamma Prior

$$\begin{aligned}
Y_i|\lambda &\sim \mbox{Exponential}(\lambda), \quad i=1,\ldots,n\\
\lambda &\sim \mbox{Gamma}(a,b)
\end{aligned}$$

* Group 3: Figure out the exponential and gamma distributions. What is $\lambda$? What are the roles of $(a,b)$, and how should we set them?
* Group 4: Using the proportional form of Bayes' Rule, derive the posterior $\lambda|Y$.

## Normal data, known variance, normal prior on $\mu$

$$\begin{aligned}
Y_i|\mu &\sim \mbox{Normal}(\mu,\sigma^2),\quad i=1,\ldots,n\\
\mu &\sim \mbox{Normal}(a,b)
\end{aligned}$$

* Group 5: (Experts!) Using the proportional form of Bayes' Rule, derive the posterior $\mu|Y$.

## Normal data, known mean, inverse gamma prior on $\sigma^2$

$$\begin{aligned}
Y|\sigma^2 &\sim \mbox{Normal}(\mu,\sigma^2)\\
\sigma^2 &\sim \mbox{Inverse Gamma}(a,b)
\end{aligned}$$

* Group 7: (Experts!) Explore the inverse gamma. How shall we set $(a,b)$? Using the proportional form of Bayes' Rule, derive the posterior $\sigma^2|Y$.

# The Many Parameter Problem

Suppose both $\mu$ and $\sigma^2$ are unknown.

Model:
$$\begin{aligned}
Y|\mu,\sigma^2 &\sim \mbox{Normal}(\mu,\sigma^2)\\
\mu &\sim \mbox{Normal}(a,b)\\
\sigma^2 &\sim \mbox{Inverse Gamma}(r,s)
\end{aligned}$$

Data: $y=105,100,124,104$

1. We can still use Bayes' rule:
$$f(\mu,\sigma^2|y_1,\ldots,y_N) \propto f(y_1,\ldots,y_N|\mu,\sigma^2) \times f(\mu,\sigma^2)$$

2. From before:
$$f(y_1,\ldots,y_N|\mu,\sigma^2) = \frac{1}{(2\pi)^{N/2}(\sigma^2)^{N/2}}\exp\left(-\frac{\sum_i(y_i-\mu)^2}{2\sigma^2}\right).$$

3. Joint prior: $\mu \sim \mbox{Normal}(a,b)$ and $\sigma^2 \sim \mbox{InvGamma}(r,s)$, taken as independent, so
$$f(\mu,\sigma^2) \propto \frac{1}{\sqrt{b}}\exp\left(-\frac{(\mu-a)^2}{2b}\right)\times \frac{1}{(\sigma^2)^{r+1}}\exp\left(-\frac{s}{\sigma^2}\right)$$

4. We are just going to do the rest in R rather than in math.
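Putting steps 1-3 together, the unnormalized joint posterior we evaluate on a grid in R is (a sketch; multiplicative constants not involving $\mu$ or $\sigma^2$ are dropped):
$$f(\mu,\sigma^2|y_1,\ldots,y_N) \propto \frac{1}{(\sigma^2)^{N/2}}\exp\left(-\frac{\sum_i(y_i-\mu)^2}{2\sigma^2}\right) \times \exp\left(-\frac{(\mu-a)^2}{2b}\right) \times \frac{1}{(\sigma^2)^{r+1}}\exp\left(-\frac{s}{\sigma^2}\right)$$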
So, let's do IQ with observations (105, 100, 124, 104).

```{r,warning=FALSE,message=FALSE}
library(MCMCpack)
#Show prior
a=100
b=5^2
r=1
s=10^2
mu=seq(80,120,1)
s2=seq(1^2,20^2,1)
my.prior=function(mu,s2) dnorm(mu,a,sqrt(b))*dinvgamma(s2,r,s)
prior=outer(mu,s2,my.prior)
contour(mu,s2,prior)
```

```{r}
#Show likelihood
dat=c(105,100,124,104)
my.like=function(mu,s2) exp(sum(dnorm(log=T,dat,mu,sqrt(s2))))
my.like=Vectorize(my.like)
like=outer(mu,s2,my.like)
contour(mu,s2,like)
```

```{r}
#posterior
post=prior*like
total=sum(post)
post=post/total #sums to 1
sum(post)
contour(mu,s2,post)
```

```{r}
#Marginal Distributions
mu.post=apply(post,1,sum)
plot(mu,mu.post,typ='l')
s2.post=apply(post,2,sum)
plot(s2,s2.post,typ='l')
```

This approach of evaluating joint distributions on a grid breaks down quickly with models that have 100s or 1000s of parameters. So, what to do? Next Lecture.
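A footnote on the marginal step above: the `apply` calls are grid approximations to marginalization integrals, e.g.
$$f(\mu|y_1,\ldots,y_N) = \int_0^\infty f(\mu,\sigma^2|y_1,\ldots,y_N)\,d\sigma^2,$$
with the integral replaced by a sum over the $\sigma^2$ grid (and likewise for $f(\sigma^2|y)$, summing over the $\mu$ grid).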