Assume we are in the supervised learning setting with $n$ observations, where observation $i$ consists of the response $y_i$ and features $x_i \in \mathbb{R}^p$. A generalized linear model (GLM) consists of 3 components:

- A random component, or a family of distributions $f_\theta$ indexed by $\theta$ (usually an exponential family), such that $y \sim f_\theta$,
- A systematic component $\eta = x^T \beta$, and
- A link function $g$ such that $g(\mathbb{E}[y]) = \eta$.

(See this previous post for more details of the components of a GLM.) The user gets to define the family of distributions and the link function $g$, and $\beta$ is the parameter to be determined by maximum likelihood estimation.
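For probit regression, the three components can be sketched in code (a minimal illustration with simulated data; the sample size, feature dimension, and coefficient values are made up for the example):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))      # feature matrix: n = 100 observations, p = 3
beta = np.array([0.5, -1.0, 2.0])  # illustrative coefficient vector

eta = X @ beta           # systematic component: eta_i = x_i^T beta
mu = norm.cdf(eta)       # probit link: mu_i = Phi(eta_i) = g^{-1}(eta_i)
y = rng.binomial(1, mu)  # random component: y_i ~ Bernoulli(mu_i)
```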

For one-dimensional exponential families with the canonical link function, it is known that the log-likelihood of the GLM is globally concave in $\beta$ (see, for example, Reference 1). Hence, the MLE can be found using methods such as gradient descent or coordinate descent. When non-canonical links are used, the GLM’s log-likelihood is no longer guaranteed to be concave in $\beta$. However, in some situations we can still show that the log-likelihood is concave in $\beta$. **In this post, we show that the log-likelihood for probit regression is concave in $\beta$.**

In the probit regression model, $g^{-1}(\eta) = \Phi(\eta)$, where $\Phi$ is the cumulative distribution function (CDF) of the standard normal distribution. The responses $y_i$ are binary with $\mathbb{P}(y_i = 1 \mid x_i) = \Phi(x_i^T \beta)$. The likelihood function is

$$L(\beta) = \prod_{i=1}^n \Phi(x_i^T \beta)^{y_i} \left[ 1 - \Phi(x_i^T \beta) \right]^{1 - y_i},$$

and the log-likelihood function is

$$\ell(\beta) = \sum_{i=1}^n \left\{ y_i \log \Phi(x_i^T \beta) + (1 - y_i) \log \left[ 1 - \Phi(x_i^T \beta) \right] \right\}.$$
To show that $\ell$ is concave in $\beta$, we make two reductions:

- Since the sum of concave functions is concave, it is enough to show that $\beta \mapsto y_i \log \Phi(x_i^T \beta) + (1 - y_i) \log \left[ 1 - \Phi(x_i^T \beta) \right]$ is concave.
- Since composition with an affine function preserves concavity, it is enough to show that $f(z) = y \log \Phi(z) + (1 - y) \log \left[ 1 - \Phi(z) \right]$ is concave in $z$. (Here, $z = x_i^T \beta$ and $y = y_i$.)

From here, we can show that $f$ is concave by showing that its second derivative is negative: $f''(z) < 0$ for all $z$. Since $y$ can only take on the values 0 and 1, we can consider those cases separately.
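Before doing the calculus, the claim can be sanity-checked numerically with a finite-difference approximation of $f''$ (a sketch; the grid and step size are arbitrary choices, and this is a spot check rather than a proof):

```python
import numpy as np
from scipy.stats import norm

def f(z, y):
    # f(z) = y log Phi(z) + (1 - y) log(1 - Phi(z)), using 1 - Phi(z) = Phi(-z)
    return y * norm.logcdf(z) + (1 - y) * norm.logcdf(-z)

z = np.linspace(-5, 5, 1001)
h = 1e-4
for y in (0, 1):
    # central second difference: f''(z) ~ (f(z+h) - 2 f(z) + f(z-h)) / h^2
    f2 = (f(z + h, y) - 2 * f(z, y) + f(z - h, y)) / h**2
    print(y, f2.max() < 0)  # the approximation stays negative on the grid
```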

Let $\phi$ denote the probability density function of the standard normal distribution. Recall that $\Phi'(z) = \phi(z)$ and $\phi'(z) = -z \phi(z)$. When $y = 1$, $f(z) = \log \Phi(z)$, so

$$f''(z) = \frac{d}{dz} \left[ \frac{\phi(z)}{\Phi(z)} \right] = \frac{\phi'(z) \Phi(z) - \phi(z)^2}{\Phi(z)^2} = -\frac{\phi(z) \left[ z \Phi(z) + \phi(z) \right]}{\Phi(z)^2} < 0,$$

since $z \Phi(z) + \phi(z) > 0$ for all $z$ (see this previous post for a proof).

When $y = 0$, $f(z) = \log \left[ 1 - \Phi(z) \right]$, so

$$f''(z) = \frac{d}{dz} \left[ \frac{-\phi(z)}{1 - \Phi(z)} \right] = \frac{-\phi'(z) \left[ 1 - \Phi(z) \right] - \phi(z)^2}{\left[ 1 - \Phi(z) \right]^2} = \frac{\phi(z) \left[ z \left( 1 - \Phi(z) \right) - \phi(z) \right]}{\left[ 1 - \Phi(z) \right]^2}.$$

To show concavity of $f$, it remains to show that $z \left[ 1 - \Phi(z) \right] - \phi(z) < 0$ for all $z$. But this is true: substituting $w = -z$ and using $1 - \Phi(z) = \Phi(-z)$, it is equivalent to $w \Phi(w) + \phi(w) > 0$; see this previous post for a proof.
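Both cases ultimately rest on the inequality $z \Phi(z) + \phi(z) > 0$, which is easy to spot-check numerically (a sketch over an arbitrary grid, not a substitute for the proof):

```python
import numpy as np
from scipy.stats import norm

z = np.linspace(-10, 10, 2001)
g = z * norm.cdf(z) + norm.pdf(z)  # z * Phi(z) + phi(z)
print(g.min() > 0)  # positive everywhere on the grid
```

As $z \to -\infty$ the quantity tends to 0 from above, so the minimum over the grid is small but still positive.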

References: