Assume we are in the supervised learning setting with $n$ observations, where observation $i$ consists of the response $y_i$ and features $x_i \in \mathbb{R}^p$. A generalized linear model (GLM) consists of 3 components:
- A random component, or a family of distributions $f$ indexed by a parameter $\theta$ (usually an exponential family), such that $y_i \sim f_{\theta_i}$,
- A systematic component $\eta_i = x_i^T \beta$, and
- A link function $g$ such that $g(\mu_i) = \eta_i$, where $\mu_i = \mathbb{E}[y_i]$.
(See this previous post for more details of the components of a GLM.) The user gets to define the family of distributions and the link function $g$, and $\beta$ is the parameter to be determined by maximum likelihood estimation.
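For instance, here is a minimal sketch of specifying these three components for the probit model studied later in this post and estimating $\beta$ by maximum likelihood. It assumes the statsmodels package is available, and the simulated design matrix and coefficients are made up purely for illustration:

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

# simulated data, made up for the example
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
beta_true = np.array([0.5, -1.0])
y = rng.binomial(1, norm.cdf(X @ beta_true))

# random component: Binomial; systematic component: X @ beta; link function: Phi^{-1}
family = sm.families.Binomial(link=sm.families.links.Probit())
fit = sm.GLM(y, X, family=family).fit()
print(fit.params)   # maximum likelihood estimate of beta
```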
For one-dimensional exponential families with the canonical link function, it is known that the log-likelihood of the GLM is globally concave in $\beta$ (see, for example, Reference 1). Hence, the MLE $\hat{\beta}$ can be found using methods such as gradient descent or coordinate descent. When non-canonical links are used, the GLM's log-likelihood is no longer guaranteed to be concave in $\beta$. However, in some situations we can still show that the log-likelihood is concave in $\beta$. In this post, we show that the log-likelihood for probit regression is concave in $\beta$.
In the probit regression model, $\Phi^{-1}(\mu_i) = x_i^T \beta$, where $\Phi$ is the cumulative distribution function (CDF) of the standard normal distribution. The responses are binary with $y_i \in \{0, 1\}$ and $\mathbb{P}(y_i = 1) = \mu_i = \Phi(x_i^T \beta)$. The likelihood function is

$$L(\beta) = \prod_{i=1}^n \Phi(x_i^T \beta)^{y_i} \left[ 1 - \Phi(x_i^T \beta) \right]^{1 - y_i},$$

and the log-likelihood function is

$$\ell(\beta) = \sum_{i=1}^n \left\{ y_i \log \Phi(x_i^T \beta) + (1 - y_i) \log \left[ 1 - \Phi(x_i^T \beta) \right] \right\}.$$
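The sketch below implements this log-likelihood directly and maximizes it numerically (a rough illustration, assuming numpy and scipy are available; the data are simulated and the helper name neg_log_likelihood is made up for the example). It uses the identity $1 - \Phi(z) = \Phi(-z)$ so that both terms can be computed with a numerically stable log-CDF:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_log_likelihood(beta, X, y):
    """Negative probit log-likelihood -l(beta); uses 1 - Phi(z) = Phi(-z) for stability."""
    z = X @ beta
    return -np.sum(y * norm.logcdf(z) + (1 - y) * norm.logcdf(-z))

# simulated data, made up for the example
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
beta_true = np.array([1.0, -0.5])
y = rng.binomial(1, norm.cdf(X @ beta_true))

# since l(beta) is concave (as shown below), -l(beta) is convex,
# so any local minimum found by the optimizer is a global minimum
beta_hat = minimize(neg_log_likelihood, x0=np.zeros(2), args=(X, y)).x
print(beta_hat)   # should be close to beta_true for this simulated sample
```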
To show that $\ell(\beta)$ is concave in $\beta$, we make two reductions:
- Since the sum of concave functions is concave, it is enough to show that $\beta \mapsto y_i \log \Phi(x_i^T \beta) + (1 - y_i) \log \left[ 1 - \Phi(x_i^T \beta) \right]$ is concave.
- Since composition with an affine function preserves concavity (spelled out below), it is enough to show that $f(z) = y \log \Phi(z) + (1 - y) \log \left[ 1 - \Phi(z) \right]$ is concave in $z$. (Here, $z = x_i^T \beta$ and $y = y_i$.)
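To spell out the affine-composition step in the second reduction: if $f$ is concave in $z$, then for any $\beta_1, \beta_2$ and $\lambda \in [0, 1]$,

$$f\left( x_i^T \left[ \lambda \beta_1 + (1 - \lambda) \beta_2 \right] \right) = f\left( \lambda x_i^T \beta_1 + (1 - \lambda) x_i^T \beta_2 \right) \geq \lambda f(x_i^T \beta_1) + (1 - \lambda) f(x_i^T \beta_2),$$

so $\beta \mapsto f(x_i^T \beta)$ is concave as well.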
From here, we can show that $f$ is concave by showing that its second derivative is negative: $f''(z) < 0$ for all $z$. Since $y$ can only take on the values 0 and 1, we can consider those cases separately.
Let $\varphi$ denote the probability density function of the standard normal distribution. Recall that $\varphi'(z) = -z \varphi(z)$. When $y = 1$,

$$f(z) = \log \Phi(z), \qquad f'(z) = \frac{\varphi(z)}{\Phi(z)},$$

$$f''(z) = \frac{-z \varphi(z) \Phi(z) - \varphi(z)^2}{\Phi(z)^2} = -\frac{\varphi(z)}{\Phi(z)^2} \left[ z \Phi(z) + \varphi(z) \right] < 0,$$

since $z \Phi(z) + \varphi(z) > 0$ for all $z$ (see this previous post for a proof).
When $y = 0$,

$$f(z) = \log \left[ 1 - \Phi(z) \right], \qquad f'(z) = -\frac{\varphi(z)}{1 - \Phi(z)},$$

$$f''(z) = \frac{z \varphi(z) \left[ 1 - \Phi(z) \right] - \varphi(z)^2}{\left[ 1 - \Phi(z) \right]^2} = -\frac{\varphi(z)}{\left[ 1 - \Phi(z) \right]^2} \left[ \varphi(z) - z \left( 1 - \Phi(z) \right) \right].$$

To show concavity of $f$, it remains to show that $\varphi(z) - z \left[ 1 - \Phi(z) \right] > 0$ for all $z$. But this is true: see this previous post for a proof.
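As a quick numerical sanity check of the two inequalities used above (not a substitute for the proofs in the linked posts), the sketch below, assuming numpy and scipy are available, evaluates $z \Phi(z) + \varphi(z)$ and $\varphi(z) - z \left[ 1 - \Phi(z) \right]$ on a grid and confirms that both stay positive, so that $f''(z) < 0$ in both cases:

```python
import numpy as np
from scipy.stats import norm

z = np.linspace(-10, 10, 2001)
phi = norm.pdf(z)          # standard normal density
Phi = norm.cdf(z)          # standard normal CDF
sf = norm.sf(z)            # survival function 1 - Phi(z), computed stably

# quantity from the y = 1 case: z * Phi(z) + phi(z) > 0
case_y1 = z * Phi + phi
# quantity from the y = 0 case: phi(z) - z * (1 - Phi(z)) > 0
case_y0 = phi - z * sf
print(case_y1.min() > 0, case_y0.min() > 0)    # expect: True True

# the corresponding second derivatives are negative everywhere on the grid
f2_y1 = -(phi / Phi**2) * case_y1
f2_y0 = -(phi / sf**2) * case_y0
print(f2_y1.max() < 0, f2_y0.max() < 0)        # expect: True True
```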
References: