**T-learners**, **S-learners** and **X-learners** are all meta-algorithms that one can use for estimating the **conditional average treatment effect (CATE)** in the causal inference setting. The information here is largely taken from Künzel et al. (2019) (Reference 1). All 3 methods are implemented in Microsoft’s EconML Python package.

**Set-up**

- We index individuals by $i = 1, \dots, n$.
- Each individual is either in control ($W_i = 0$) or treatment ($W_i = 1$).
- We have some outcome/response metric of interest. If the individual is in control (treatment resp.), the response metric is $Y_i(0)$ ($Y_i(1)$ resp.). We only get to observe one of them, which we denote by $Y_i = Y_i(W_i)$.
- For each individual, we have a vector of pre-treatment covariates $X_i$.

The *conditional average treatment effect (CATE)* is defined as

$$\tau(x) = \mathbb{E}[Y(1) - Y(0) \mid X = x].$$

If we define the response under control and the response under treatment as

$$\mu_0(x) = \mathbb{E}[Y(0) \mid X = x], \qquad \mu_1(x) = \mathbb{E}[Y(1) \mid X = x],$$

then we can write the CATE as

$$\tau(x) = \mu_1(x) - \mu_0(x).$$
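To make the set-up concrete, here is a small simulation sketch. Everything about the data-generating process (the functions `mu0` and `mu1`, the sample size, the noise level and the 50/50 randomization) is made up for illustration and does not come from Reference 1. Because the data is simulated, both potential outcomes are visible, which is never the case with real data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical response functions, chosen only for illustration.
def mu0(x):
    return 1.0 + 2.0 * x   # response under control

def mu1(x):
    return 1.5 + 3.0 * x   # response under treatment

def true_cate(x):
    # CATE as the difference of the two response functions.
    return mu1(x) - mu0(x)

X = rng.uniform(0.0, 1.0, size=n)   # a single pre-treatment covariate
W = rng.integers(0, 2, size=n)      # 0 = control, 1 = treatment

Y0 = mu0(X) + rng.normal(0.0, 0.1, size=n)   # potential outcome under control
Y1 = mu1(X) + rng.normal(0.0, 0.1, size=n)   # potential outcome under treatment
Y_obs = np.where(W == 1, Y1, Y0)             # only one of the two is observed

print(true_cate(np.array([0.0, 0.5, 1.0])))
```

In practice only `X`, `W` and `Y_obs` would be available, which is why the CATE has to be estimated rather than computed directly.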

**T-learner**

The T-learner consists of 2 steps:

- Use observations in the control group to estimate the response under control, $\hat{\mu}_0(x)$. Similarly, use observations in the treatment group to estimate the response under treatment, $\hat{\mu}_1(x)$. Any machine learning method can be used to get these estimates.
- Estimate the CATE by $\hat{\tau}_T(x) = \hat{\mu}_1(x) - \hat{\mu}_0(x)$.
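The two steps above can be sketched in a few lines. This is a minimal illustration, not the EconML implementation: the base learner is ordinary least squares (the helper `fit_ols` is a hypothetical name), and the toy data is noise-free and invented so that the true CATE is known.

```python
import numpy as np

def fit_ols(X, y):
    """Hypothetical base learner: least squares with an intercept.
    Returns a function that predicts for new covariates."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef

def t_learner(X, W, Y, fit=fit_ols):
    # Step 1: fit one response model per treatment group.
    mu0_hat = fit(X[W == 0], Y[W == 0])
    mu1_hat = fit(X[W == 1], Y[W == 1])
    # Step 2: the CATE estimate is the difference of the two models.
    return lambda X_new: mu1_hat(X_new) - mu0_hat(X_new)

# Invented, noise-free toy data: the true CATE is 0.5 + x.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 1))
W = rng.integers(0, 2, size=200)
Y = 1.0 + 2.0 * X[:, 0] + W * (0.5 + X[:, 0])

cate_hat = t_learner(X, W, Y)
print(cate_hat(np.array([[0.0], [1.0]])))
```

Since the toy responses are exactly linear, the printed estimates should be close to 0.5 and 1.5. Any other regression routine with the same fit-then-predict shape could be passed as `fit`.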

**S-learner**

The S-learner treats the treatment variable $W$ as if it were just another covariate like those in the vector $X$. Instead of having two models for the response as a function of the covariates $X$, the S-learner has a single model for the response as a function of $X$ and the treatment $W$:

$$\mu(x, w) = \mathbb{E}[Y \mid X = x, W = w].$$

The S-learner consists of 2 steps:

- Use all the observations to estimate the response function above, $\hat{\mu}(x, w)$.
- Estimate the CATE by $\hat{\tau}_S(x) = \hat{\mu}(x, 1) - \hat{\mu}(x, 0)$.
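A minimal sketch of these two steps follows, again with least squares as an assumed base learner and invented noise-free data. One design choice here is mine rather than the source's: the helper `design` (a hypothetical name) adds an interaction column between the covariates and the treatment indicator, because a purely additive linear model could only ever produce a constant CATE. A more flexible base learner such as a tree ensemble would not need this.

```python
import numpy as np

def fit_ols(Z, y):
    """Hypothetical base learner: least squares with an intercept."""
    A = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z_new: np.column_stack([np.ones(len(Z_new)), Z_new]) @ coef

def design(X, W):
    # Treat W as one more covariate; the X * W column is an added assumption
    # so that a linear model can represent an effect that varies with x.
    W = np.asarray(W, dtype=float).reshape(-1, 1)
    return np.hstack([X, W, X * W])

def s_learner(X, W, Y, fit=fit_ols):
    # Step 1: a single model for the response, fitted on all observations.
    mu_hat = fit(design(X, W), Y)

    # Step 2: predict with the treatment switched on and off, then subtract.
    def cate(X_new):
        ones = np.ones(len(X_new))
        zeros = np.zeros(len(X_new))
        return mu_hat(design(X_new, ones)) - mu_hat(design(X_new, zeros))

    return cate

# Invented, noise-free toy data: the true CATE is 0.5 + x.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 1))
W = rng.integers(0, 2, size=200)
Y = 1.0 + 2.0 * X[:, 0] + W * (0.5 + X[:, 0])

cate_hat = s_learner(X, W, Y)
print(cate_hat(np.array([[0.0], [1.0]])))
```

On this toy data the estimates should again be close to 0.5 and 1.5.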

**X-learner**

The X-learner consists of 3 steps (sharing the first step with the T-learner):

- Use observations in the control group to estimate the response under control, $\hat{\mu}_0(x)$, and use observations in the treatment group to estimate the response under treatment, $\hat{\mu}_1(x)$.
- Use the estimates in Step 1 to obtain estimates of the individual treatment effects (ITE). For observations in the control group, the ITE estimate is $\tilde{D}_i = \hat{\mu}_1(X_i) - Y_i$; for observations in the treatment group, it is $\tilde{D}_i = Y_i - \hat{\mu}_0(X_i)$, where $Y_i$ is the observed response. Build a model for the ITE using just observations from the control group (with the imputed/estimated ITE as the response), $\hat{\tau}_0(x)$. Do so similarly with just observations from the treatment group to get $\hat{\tau}_1(x)$.
- Estimate the CATE by combining the two estimates above: $\hat{\tau}(x) = g(x) \hat{\tau}_0(x) + [1 - g(x)] \hat{\tau}_1(x)$, where $g(x) \in [0, 1]$ is a weight function. A good choice for $g$ is an estimate of the propensity score.
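The three steps can be sketched as follows. As before, this is an illustration under stated assumptions rather than the reference implementation: least squares stands in for the base learner (`fit_ols` is a hypothetical helper), the toy data is invented and noise-free, and the weight function is a constant propensity estimate (the share of treated units), which is only reasonable when treatment is assigned at random with the same probability for everyone.

```python
import numpy as np

def fit_ols(X, y):
    """Hypothetical base learner: least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef

def x_learner(X, W, Y, g, fit=fit_ols):
    X0, Y0 = X[W == 0], Y[W == 0]
    X1, Y1 = X[W == 1], Y[W == 1]

    # Step 1: response models per group (same as the T-learner).
    mu0_hat = fit(X0, Y0)
    mu1_hat = fit(X1, Y1)

    # Step 2: impute treatment effects, then model them within each group.
    D0 = mu1_hat(X0) - Y0        # control units: predicted treated minus observed
    D1 = Y1 - mu0_hat(X1)        # treated units: observed minus predicted control
    tau0_hat = fit(X0, D0)
    tau1_hat = fit(X1, D1)

    # Step 3: weighted combination of the two CATE models.
    def cate(X_new):
        gx = g(X_new)
        return gx * tau0_hat(X_new) + (1.0 - gx) * tau1_hat(X_new)

    return cate

# Invented, noise-free toy data: the true CATE is 0.5 + x.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 1))
W = rng.integers(0, 2, size=200)
Y = 1.0 + 2.0 * X[:, 0] + W * (0.5 + X[:, 0])

# Assumed weight function: constant propensity estimate (share of treated units).
propensity = lambda X_new: np.full(len(X_new), W.mean())

cate_hat = x_learner(X, W, Y, g=propensity)
print(cate_hat(np.array([[0.0], [1.0]])))
```

With covariate-dependent treatment assignment, `g` would instead be a fitted propensity model, for example a logistic regression of `W` on `X`.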

**When to use what?**

The conclusions here were drawn from Reference 1, the paper which proposed the X-learner. So while I think they make intuitive sense, just keep that in mind when reading this section.

- Overall
  - The choice of base learner (for the intermediate models) can make a large difference in prediction accuracy. (This is an important advantage of metalearners in general.)
  - There is no universally best metalearner: for each of these 3 metalearners, there are situations where it performs best.

- T-learner
  - Performs well if there are no common trends in the response under control and response under treatment and if the treatment effect is very complicated.
  - Because data is not pooled across treatment groups, it is difficult for the T-learner to mimic a behavior (e.g. discontinuity) that appears in all the treatment groups.

- S-learner
  - Since the treatment indicator plays no special role, the base learners can completely ignore it during model-fitting. This is good if the CATE is zero in many places.
  - The S-learner can be biased toward zero.
  - For some base learners (e.g. k-nearest neighbors), treating the treatment indicator like any other covariate may not make sense.

- X-learner
  - The X-learner can adapt to structural properties such as sparsity or smoothness of the CATE. (This is useful as CATE is often zero or approximately linear.)
    - When CATE is zero, it usually is not as good as the S-learner but is better than the T-learner.
    - When CATE is complex, it outperforms the S-learner, and is often better than the T-learner too.
  - It is particularly effective when the number of units in one treatment group (often the control group) is much larger than in the other.

References:

- Künzel, S. R., et al. (2019). Metalearners for estimating heterogeneous treatment effects using machine learning.