In this previous post, I stated the *likelihood equations* (or **score equations**) for **generalized linear models (GLMs)**. Any solution to the score equations is a maximum likelihood estimator (MLE) for the GLM. In this post, I work through the derivation of the score equations.

**Recap: What is a GLM?**

Assume we have data points $(x_{i1}, \dots, x_{ip}, y_i)$ for $i = 1, \dots, n$. We want to build a *generalized linear model (GLM)* of the response $y$ using the other features $x_1, \dots, x_p$. To that end, assume that the $x_{ij}$ values are all fixed. Assume that $y_1, \dots, y_n$ are samples of independent random variables $Y_1, \dots, Y_n$ which have probability density (or mass) function of the form

$$f(y_i; \theta_i, \phi) = \exp \left[ \frac{y_i \theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi) \right].$$
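As a concrete instance (my own worked example, not from the original post), the Poisson distribution can be put in this form:

```latex
% Poisson(mu): f(y) = e^{-mu} mu^y / y!
%                   = exp[ y log(mu) - mu - log(y!) ]
% Matching terms against exp[ (y*theta - b(theta))/a(phi) + c(y, phi) ]:
%   theta = log(mu),  b(theta) = e^theta,  a(phi) = 1,  c(y, phi) = -log(y!).
\[
f(y_i; \theta_i, \phi)
  = \exp\left[ y_i \log \mu_i - \mu_i - \log(y_i!) \right],
  \qquad \theta_i = \log \mu_i, \quad b(\theta_i) = e^{\theta_i}, \quad a(\phi) = 1.
\]
```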

In the above, the form of $f$ is known, but not the values of the $\theta_i$'s and $\phi$.

Let $\mu_i = \mathbb{E}[Y_i]$. We assume that

$$\eta_i = \sum_{j=1}^p \beta_j x_{ij},$$

where $\eta_i = g(\mu_i)$ for some *link function* $g$, assumed to be known.
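For instance (standard examples, not spelled out in the post), common choices of link function include:

```latex
% Common link functions g and the models they correspond to:
%   identity link: g(mu) = mu                     (linear regression)
%   log link:      g(mu) = log(mu)                (Poisson regression)
%   logit link:    g(mu) = log[mu / (1 - mu)]     (logistic regression)
\[
g(\mu_i) = \eta_i = \sum_{j=1}^p \beta_j x_{ij}.
\]
```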

**Recap: The likelihood/score equations**

The goal of fitting a GLM is to find estimates for $\beta_1, \dots, \beta_p$. More specifically, we want to find the values of $\beta_1, \dots, \beta_p$ which maximize the log-likelihood of the data:

$$L(\beta) = \sum_{i=1}^n L_i = \sum_{i=1}^n \log f(y_i; \theta_i, \phi) = \sum_{i=1}^n \left[ \frac{y_i \theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi) \right].$$

To do that, we differentiate $L$ w.r.t. each $\beta_j$ and set the derivative to 0. Using the chain rule as well as the form of $L_i$, after some algebra we have

$$\frac{\partial L_i}{\partial \beta_j} = \frac{\partial L_i}{\partial \theta_i} \cdot \frac{\partial \theta_i}{\partial \mu_i} \cdot \frac{\partial \mu_i}{\partial \eta_i} \cdot \frac{\partial \eta_i}{\partial \beta_j} = \frac{(y_i - \mu_i) x_{ij}}{\text{Var}(Y_i)} \cdot \frac{\partial \mu_i}{\partial \eta_i}.$$

Hence, the *likelihood equations* (or **score equations**) are

$$\sum_{i=1}^n \frac{(y_i - \mu_i) x_{ij}}{\text{Var}(Y_i)} \cdot \frac{\partial \mu_i}{\partial \eta_i} = 0, \qquad j = 1, \dots, p.$$

$\beta$ appears implicitly in the equations above through $\mu_i$: $\mu_i = g^{-1} \left( \sum_{j=1}^p \beta_j x_{ij} \right)$. (See the original post for a matrix form of these equations.)
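To see the score equations in action, here is a small numerical sketch (my own addition, not from the post, assuming `numpy` is available): we fit a Poisson GLM with the canonical log link by Newton's method, then verify that the score vector vanishes at the MLE. For the log link, $\mu_i = e^{\eta_i}$, $\text{Var}(Y_i) = \mu_i$ and $\partial \mu_i / \partial \eta_i = \mu_i$, so the score reduces to $X^\top (y - \mu)$.

```python
import numpy as np

# Simulate a small Poisson regression problem with known coefficients.
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([0.5, -0.3, 0.2])
y = rng.poisson(np.exp(X @ beta_true))

# Newton-Raphson for the Poisson GLM with log link.
beta = np.zeros(p)
for _ in range(50):
    mu = np.exp(X @ beta)
    score = X.T @ (y - mu)            # score vector, one entry per beta_j
    fisher = X.T @ (mu[:, None] * X)  # Fisher information X^T W X, W = diag(mu)
    step = np.linalg.solve(fisher, score)
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:
        break

# At the MLE, the score equations should hold (score approximately zero).
mu_hat = np.exp(X @ beta)
score_at_mle = X.T @ (y - mu_hat)
print(np.max(np.abs(score_at_mle)))  # near zero at convergence
```

The Fisher-scoring and Newton updates coincide here because the log link is canonical for the Poisson family.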

**Likelihood equations: A derivation**

All this involves is evaluating the 4 partial derivatives in

$$\frac{\partial L_i}{\partial \beta_j} = \frac{\partial L_i}{\partial \theta_i} \cdot \frac{\partial \theta_i}{\partial \mu_i} \cdot \frac{\partial \mu_i}{\partial \eta_i} \cdot \frac{\partial \eta_i}{\partial \beta_j}$$

and multiplying them together.

Using the form of the probability density function for $y_i$, the first partial derivative is

$$\frac{\partial L_i}{\partial \theta_i} = \frac{\partial}{\partial \theta_i} \left[ \frac{y_i \theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi) \right] = \frac{y_i - b'(\theta_i)}{a(\phi)}.$$

The second partial derivative, $\partial \theta_i / \partial \mu_i$, is probably the trickiest of the lot: simplifying it requires some properties of exponential families. Under general regularity conditions, we have

$$\mathbb{E} \left[ \frac{\partial L_i}{\partial \theta_i} \right] = 0 \qquad \text{and} \qquad -\mathbb{E} \left[ \frac{\partial^2 L_i}{\partial \theta_i^2} \right] = \mathbb{E} \left[ \left( \frac{\partial L_i}{\partial \theta_i} \right)^2 \right].$$

Applying the first identity to our setting:

$$0 = \mathbb{E} \left[ \frac{Y_i - b'(\theta_i)}{a(\phi)} \right] \quad \Rightarrow \quad \mu_i = \mathbb{E}[Y_i] = b'(\theta_i).$$

Applying the second identity to our setting:

$$\frac{b''(\theta_i)}{a(\phi)} = \mathbb{E} \left[ \left( \frac{Y_i - \mu_i}{a(\phi)} \right)^2 \right] = \frac{\text{Var}(Y_i)}{a(\phi)^2} \quad \Rightarrow \quad \text{Var}(Y_i) = b''(\theta_i) \, a(\phi).$$

Thus,

$$\frac{\partial \mu_i}{\partial \theta_i} = b''(\theta_i) = \frac{\text{Var}(Y_i)}{a(\phi)} \quad \Rightarrow \quad \frac{\partial \theta_i}{\partial \mu_i} = \frac{a(\phi)}{\text{Var}(Y_i)}.$$
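As a sanity check (my own, not in the post), these identities can be verified directly for the Poisson family:

```latex
% For Poisson(mu): theta = log(mu), b(theta) = e^theta, a(phi) = 1.
%   b'(theta)           = e^theta = mu    % matches E[Y] = mu
%   b''(theta) * a(phi) = e^theta = mu    % matches Var(Y) = mu
% Hence
\[
\frac{\partial \theta}{\partial \mu}
  = \frac{a(\phi)}{\text{Var}(Y)}
  = \frac{1}{\mu},
\]
% which agrees with differentiating theta = log(mu) directly.
```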

The third partial derivative, $\partial \mu_i / \partial \eta_i$, actually appears in the likelihood equations as-is, so we don't have to do any algebraic manipulation here. Finally, using the systematic component of the GLM, $\eta_i = \sum_{j=1}^p \beta_j x_{ij}$, we have

$$\frac{\partial \eta_i}{\partial \beta_j} = x_{ij}.$$

Putting these 4 parts together:

$$\frac{\partial L_i}{\partial \beta_j} = \frac{y_i - \mu_i}{a(\phi)} \cdot \frac{a(\phi)}{\text{Var}(Y_i)} \cdot \frac{\partial \mu_i}{\partial \eta_i} \cdot x_{ij} = \frac{(y_i - \mu_i) x_{ij}}{\text{Var}(Y_i)} \cdot \frac{\partial \mu_i}{\partial \eta_i},$$

as required.
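The derived formula can also be checked numerically (a sketch of my own, not from the post, assuming `numpy`): we compare it against a central finite-difference approximation of $\partial L / \partial \beta_j$ for a Poisson GLM with log link, where $\text{Var}(Y_i) = \mu_i$ and $\partial \mu_i / \partial \eta_i = \mu_i$, so the formula reduces to $X^\top (y - \mu)$.

```python
import numpy as np

# Simulate a small Poisson data set and pick an arbitrary beta (not the MLE;
# the chain-rule formula holds at any beta, not just at the maximizer).
rng = np.random.default_rng(1)
n, p = 50, 2
X = rng.normal(size=(n, p))
y = rng.poisson(1.0, size=n).astype(float)
beta = np.array([0.1, -0.2])

def loglik(b):
    # Poisson log-likelihood with log link; the log(y_i!) term is dropped
    # since it does not depend on beta.
    eta = X @ b
    return np.sum(y * eta - np.exp(eta))

# Analytic gradient from the derivation, simplified for the log link.
mu = np.exp(X @ beta)
analytic = X.T @ (y - mu)

# Central finite-difference approximation of the same gradient.
eps = 1e-6
numeric = np.array([
    (loglik(beta + eps * e) - loglik(beta - eps * e)) / (2 * eps)
    for e in np.eye(p)
])
print(np.max(np.abs(analytic - numeric)))  # small: the two gradients agree
```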

References:

- Agresti, A. *Categorical Data Analysis* (3rd ed.), Chapter 4.