In this previous post, I introduced the **alternating direction method of multipliers (ADMM)**, an algorithm which solves a particular class of convex optimization problems. ADMM has gained popularity in recent years because it allows certain problems to be solved in a distributed manner. **Consensus optimization**, the subject of this post, is one example of this.

(Note: The material for this post is taken from Chapter 7 of Reference 1.)

**Recap**

Recall that ADMM solves problems that can be written in the form

$$\begin{aligned} \underset{x, z}{\text{minimize}} \quad & f(x) + g(z) \\ \text{subject to} \quad & Ax + Bz = c, \end{aligned}$$

with variables $x \in \mathbb{R}^n$ and $z \in \mathbb{R}^m$, and with $A \in \mathbb{R}^{p \times n}$, $B \in \mathbb{R}^{p \times m}$, and $c \in \mathbb{R}^p$. $f$ and $g$ are assumed to be convex functions. For brevity, I will call this problem the **“ADMM problem”**, or I may say that the problem is in **“ADMM form”**.

The **ADMM algorithm** consists of the iterations

$$\begin{aligned} x^{k+1} &= \underset{x}{\text{argmin}} \; L_\rho(x, z^k, y^k), \\ z^{k+1} &= \underset{z}{\text{argmin}} \; L_\rho(x^{k+1}, z, y^k), \\ y^{k+1} &= y^k + \rho (A x^{k+1} + B z^{k+1} - c), \end{aligned}$$

where $L_\rho(x, z, y) = f(x) + g(z) + y^T (Ax + Bz - c) + (\rho / 2) \| Ax + Bz - c \|_2^2$ is the augmented Lagrangian. In the above, superscripts refer to the iteration number.
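To make the iterations concrete, here is a minimal sketch on a toy problem of my own choosing (not from the post): $f(x) = (x - 1)^2$, $g(z) = |z|$, with constraint $x - z = 0$ (so $A = 1$, $B = -1$, $c = 0$). For this choice both minimization steps have closed forms.

```python
import numpy as np

# Toy ADMM instance (my own illustration, not from the post):
#   minimize (x - 1)^2 + |z|  subject to  x - z = 0.
rho = 1.0
x = z = y = 0.0
for k in range(200):
    # x-update: argmin_x (x - 1)^2 + y*(x - z) + (rho/2)*(x - z)^2, closed form
    x = (2.0 + rho * z - y) / (2.0 + rho)
    # z-update: argmin_z |z| - y*z + (rho/2)*(x - z)^2, i.e. soft-thresholding
    v = x + y / rho
    z = np.sign(v) * max(abs(v) - 1.0 / rho, 0.0)
    # dual update
    y = y + rho * (x - z)

print(x, z)  # both approach 0.5, the minimizer of (x - 1)^2 + |x|
```

At convergence $x = z$, recovering the minimizer of $(x - 1)^2 + |x|$.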

**Global Variable Consensus Optimization**

Consider the optimization problem

$$\underset{x}{\text{minimize}} \quad \sum_{i=1}^N f_i(x),$$

where $x \in \mathbb{R}^n$ and $f_1, \dots, f_N$ are convex. (Note that each term can encode constraints by setting $f_i(x) = +\infty$ when a constraint is violated.) Imagine that we had $N$ worker machines available to us (along with one central machine). **Could we break the optimization up into $N$ parallel parts, with the optimization of $f_i$ being done by machine $i$ (roughly speaking)?**

We can do so if we introduce auxiliary variables. For each $i$, replace the argument of $f_i$ with a “local copy” $x_i \in \mathbb{R}^n$, then constrain each of the $x_i$ to be equal to some other auxiliary variable $z \in \mathbb{R}^n$. In other words, solve the equivalent problem

$$\begin{aligned} \underset{x_1, \dots, x_N, z}{\text{minimize}} \quad & \sum_{i=1}^N f_i(x_i) \\ \text{subject to} \quad & x_i - z = 0, \quad i = 1, \dots, N. \end{aligned}$$

At first it might seem like we have taken a step back, since we now have to solve for $N + 1$ variables instead of just one variable. However, notice that the problem is in ADMM form, and so we can write the ADMM updates:

$$\begin{aligned} x_i^{k+1} &= \underset{x_i}{\text{argmin}} \; \left( f_i(x_i) + (y_i^k)^T (x_i - z^k) + (\rho / 2) \| x_i - z^k \|_2^2 \right), \\ z^{k+1} &= \frac{1}{N} \sum_{i=1}^N \left( x_i^{k+1} + \frac{1}{\rho} y_i^k \right), \\ y_i^{k+1} &= y_i^k + \rho (x_i^{k+1} - z^{k+1}). \end{aligned}$$

The first and third steps can be done in parallel across the $N$ machines, with machine $i$ solving the minimization problems associated with $x_i$ and $y_i$. After step 1, all the new $x_i^{k+1}$ values are fed to a central processing element (sometimes called the **central collector** or **fusion center**) to compute the $z^{k+1}$ update. This new value of $z$ is then broadcast back to the $N$ machines to compute the $y_i$ updates in parallel.
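As an illustration (with hypothetical data of my own, not from the post), here is a sketch of these three steps for scalar objectives $f_i(x) = (x - a_i)^2$, for which every update has a closed form; the consensus solution is the mean of the $a_i$.

```python
import numpy as np

# Consensus ADMM sketch, assuming hypothetical local objectives
# f_i(x) = (x - a_i)^2; the minimizer of sum_i (x - a_i)^2 is mean(a).
a = np.array([1.0, 2.0, 6.0])   # "data" held by N = 3 machines
N = len(a)
rho = 1.0
x = np.zeros(N)   # local copies x_i
y = np.zeros(N)   # dual variables y_i
z = 0.0           # global variable

for k in range(200):
    # Step 1 (parallel across machines): x_i-update, closed form here
    x = (2 * a - y + rho * z) / (2 + rho)
    # Step 2 (fusion center): z-update is an average
    z = np.mean(x + y / rho)
    # Step 3 (parallel across machines): dual updates
    y = y + rho * (x - z)

print(z)  # approaches mean(a) = 3.0
```

In a real deployment, steps 1 and 3 would run on separate workers, with only $x_i^{k+1}$ and $z^{k+1}$ exchanged over the network.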

**Global Variable Consensus Optimization: An example**

One example of global variable consensus optimization is linear regression over a huge dataset. Let’s say that the design matrix is $X \in \mathbb{R}^{m \times p}$ and the response vector is $y \in \mathbb{R}^m$. Linear regression solves the optimization problem

$$\underset{\beta}{\text{minimize}} \quad \| X \beta - y \|_2^2.$$

If $m$ is huge, the data may be broken up into $N$ parts and stored on different machines. For example, assume that $X$ and $y$ are broken up row-wise into $N$ parts, with part $i$, denoted by $X_i$ and $y_i$, being on machine $i$. We can then write the optimization problem as

$$\underset{\beta}{\text{minimize}} \quad \sum_{i=1}^N \| X_i \beta - y_i \|_2^2,$$

which is the global variable consensus problem. This allows us to use consensus ADMM to solve for $\beta$ in a distributed manner.
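Here is a sketch of this on synthetic data (the variable names and problem sizes are my own). Each "machine" holds one row block $(X_i, y_i)$, and its local update has the closed form of a ridge-like solve; the result is checked against an ordinary least-squares solve on the stacked data.

```python
import numpy as np

# Consensus ADMM for least squares split row-wise across N machines
# (synthetic data for illustration). With f_i(b) = (1/2)||X_i b - y_i||^2
# (same minimizer as the squared norm above), the local update is
#   beta_i = (X_i^T X_i + rho*I)^{-1} (X_i^T y_i - u_i + rho*z).
rng = np.random.default_rng(0)
N, rows_per_machine, p = 4, 10, 3
Xs = [rng.standard_normal((rows_per_machine, p)) for _ in range(N)]
ys = [rng.standard_normal(rows_per_machine) for _ in range(N)]

rho = 1.0
betas = [np.zeros(p) for _ in range(N)]
us = [np.zeros(p) for _ in range(N)]   # dual variables
z = np.zeros(p)

for k in range(500):
    # Step 1 (parallel): local proximal least-squares solves
    betas = [
        np.linalg.solve(X.T @ X + rho * np.eye(p), X.T @ y - u + rho * z)
        for X, y, u in zip(Xs, ys, us)
    ]
    # Step 2 (fusion center): averaging
    z = np.mean([b + u / rho for b, u in zip(betas, us)], axis=0)
    # Step 3 (parallel): dual updates
    us = [u + rho * (b - z) for u, b in zip(us, betas)]

# Compare with solving the full problem on the stacked data
X_full, y_full = np.vstack(Xs), np.concatenate(ys)
beta_direct = np.linalg.lstsq(X_full, y_full, rcond=None)[0]
print(np.max(np.abs(z - beta_direct)))
```

Note that no machine ever needs to see the full dataset: only the $p$-dimensional vectors $\beta_i$, $u_i$, and $z$ are communicated.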

**General Form Consensus Optimization**

The global variable consensus problem can be generalized. In the global variable consensus problem, machine $i$ deals with $x_i \in \mathbb{R}^n$, where $x_i$ is a local copy of the global variable $z$. In general form consensus optimization, machine $i$ deals with $x_i \in \mathbb{R}^{n_i}$, where **$x_i$ is (instead) a local copy of a subvector of the global variable $z$**.

This generalization is easy conceptually but cumbersome in terms of notation. Let $z \in \mathbb{R}^n$ be the global variable. For each $i = 1, \dots, N$, let $x_i \in \mathbb{R}^{n_i}$ be a “local” variable that maps to a subvector of $z$. Let $G$ be a mapping such that $G(i, j) = g$ if and only if local variable component $(x_i)_j$ corresponds to the global variable component $z_g$.

Perhaps an example will help make this notation more understandable. Suppose $N = 2$ and $n = 3$, with $x_1 \in \mathbb{R}^2$ and $x_2 \in \mathbb{R}^2$. If $(x_1)_2$ corresponds to $z_2$ and $(x_2)_1$ also corresponds to $z_2$, then $G(1, 2) = 2$ and $G(2, 1) = 2$.

The general form consensus optimization problem is of the form

$$\begin{aligned} \underset{x_1, \dots, x_N, z}{\text{minimize}} \quad & \sum_{i=1}^N f_i(x_i) \\ \text{subject to} \quad & (x_i)_j = z_{G(i, j)}, \quad i = 1, \dots, N, \; j = 1, \dots, n_i. \end{aligned}$$

Let $\tilde{z}_i \in \mathbb{R}^{n_i}$ be defined by $(\tilde{z}_i)_j = z_{G(i, j)}$. That is, $\tilde{z}_i$ is the global variable’s idea of what local variable $x_i$ should be. We can rewrite the optimization problem above as

$$\begin{aligned} \underset{x_1, \dots, x_N, z}{\text{minimize}} \quad & \sum_{i=1}^N f_i(x_i) \\ \text{subject to} \quad & x_i - \tilde{z}_i = 0, \quad i = 1, \dots, N. \end{aligned}$$

Again, this is in ADMM form and so we can write the ADMM updates easily:

$$\begin{aligned} x_i^{k+1} &= \underset{x_i}{\text{argmin}} \; \left( f_i(x_i) + (y_i^k)^T x_i + (\rho / 2) \| x_i - \tilde{z}_i^k \|_2^2 \right), \\ z^{k+1} &= \underset{z}{\text{argmin}} \; \sum_{i=1}^N \left( -(y_i^k)^T \tilde{z}_i + (\rho / 2) \| x_i^{k+1} - \tilde{z}_i \|_2^2 \right), \\ y_i^{k+1} &= y_i^k + \rho (x_i^{k+1} - \tilde{z}_i^{k+1}). \end{aligned}$$

As in global variable consensus, steps 1 and 3 decouple into $N$ subproblems, one for each $x_i$ and $y_i$. **Unlike global variable consensus, step 2 also decouples!** We can compute the update for each component $z_g$, $g = 1, \dots, n$, separately:

$$z_g^{k+1} = \frac{\sum_{(i, j) :\, G(i, j) = g} \left( (x_i^{k+1})_j + \frac{1}{\rho} (y_i^k)_j \right)}{\sum_{(i, j) :\, G(i, j) = g} 1}.$$

Hence, less broadcasting is required between steps 1 and 2 and between steps 2 and 3. If we picture the local components $(x_i)_j$ on the left and the global components $z_g$ on the right of a bipartite graph, with an edge from $(x_i)_j$ to $z_{G(i, j)}$, then between steps 1 and 2 we only have to broadcast along these edges from left to right, and between steps 2 and 3 along the same edges from right to left. (For global variable consensus, we would have to broadcast along the complete bipartite graph between the left and right.)
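The decoupled $z$-update can be sketched as follows, using a small hypothetical mapping $G$ (0-indexed here for convenience):

```python
import numpy as np

# Decoupled z-update sketch (0-indexed; small hypothetical mapping).
# G[i][j] = g means local component (x_i)_j corresponds to global z_g.
G = [[0, 1], [1, 2]]                                 # N = 2, z in R^3
x = [np.array([1.0, 2.0]), np.array([4.0, 6.0])]     # current x_i^{k+1}
y = [np.zeros(2), np.zeros(2)]                       # current duals y_i^k
rho = 1.0

# Each z_g is the average of (x_i)_j + (1/rho)(y_i)_j over all pairs
# (i, j) with G(i, j) = g; no coupled global solve is needed.
n = 3
sums, counts = np.zeros(n), np.zeros(n)
for i in range(len(G)):
    for j in range(len(G[i])):
        g = G[i][j]
        sums[g] += x[i][j] + y[i][j] / rho
        counts[g] += 1
z = sums / counts
print(z)  # [1. 3. 6.]: z_1 averages (x_0)_1 = 2 and (x_1)_0 = 4
```

Only the machines whose local variables touch $z_g$ need to send anything for that component, which is exactly the reduced broadcasting described above.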

**General Form Consensus Optimization: An example**

One example of global variable consensus optimization is linear regression over a huge dataset, where the dataset is split up row-wise across many machines. An example of general form consensus optimization arises when the dataset is further split up column-wise: that is, each machine has values for only a subset of the features, and for only a subset of the observations.

References:

- Boyd, S., et al. 2010. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers.