\documentclass[12pt]{article}
\textheight22truecm \textwidth17truecm
\setlength{\oddsidemargin}{0cm} \setlength{\evensidemargin}{0cm}
\setlength{\topmargin}{-1cm}
\usepackage{amsfonts}
\usepackage{graphics}
\newcommand{\Mn}{\mathbb{M}^n}
\newcommand{\R}{\mathbb{R}}
\newtheorem{theorem}{Theorem}
\newtheorem{example}{Example}
\newtheorem{lemma}{Lemma}
\newtheorem{corollary}{Corollary}
\newtheorem{condition}{Condition}
\newtheorem{conjecture}{Conjecture}
\newtheorem{proposition}{Proposition}
\newtheorem{assumption}{Assumption}
\newtheorem{definition}{Definition}
\newtheorem{property}{Property}
\newtheorem{remark}{Remark}
\newcommand{\proof}{\bf Proof: \rm }
\newcommand{\rvec}{{\rm vec}}
\newcommand{\svec}{{\rm svec}}
\def\cA{{\cal A}}
\def\squareforqed{\hbox{\rlap{$\sqcap$}$\sqcup$}}
\def\qed{\ifmmode\else\unskip\quad\fi\squareforqed}
\newcommand{\half}{\mbox{${1 \over 2}$}}
\renewcommand\arraystretch{1}
\def\ba{\begin{array}} \def\ea{\end{array}}
\def\beann{\begin{eqnarray*}} \def\eeann{\end{eqnarray*}}
\def\bea{\begin{eqnarray}} \def\eea{\end{eqnarray}}
\def\beq{\begin{equation}} \def\eeq{\end{equation}}
\def\la{\langle} \def\ra{\rangle}
\def\tr{{\rm trace \,}}
\def\Diag{{\rm Diag \,}} \def\diag{{\rm diag \,}}
\def\BT{\begin{theorem}} \def\ET{\end{theorem}}
\def\BL{\begin{lemma}} \def\EL{\end{lemma}}
\def\BP{\begin{proposition}} \def\EP{\end{proposition}}
\def\BC{\begin{corollary}} \def\EC{\end{corollary}}
\def\BD{\begin{definition}} \def\ED{\end{definition}}
\def\BA{\begin{assumption}} \def\EA{\end{assumption}}
\def\BR{\begin{remark}} \def\ER{\end{remark}}
\def\BE{\begin{example}} \def\EE{\end{example}}
\newcommand{\bu}{\bullet}

\begin{document}

\vspace{-20mm}
\setlength{\fboxrule}{.5mm}\setlength{\fboxsep}{1.2mm}
\newlength{\boxlength}\setlength{\boxlength}{\textwidth}
\addtolength{\boxlength}{-4mm}
\begin{center}\framebox{\parbox{\boxlength}{\bf Semidefinite Programming \hfill Lecture 1\\
OR 6327 Spring 2012 \hfill January 24, 2012 \\
Scribe: Mike Todd }}\end{center}
\vspace{1mm}

This course will be concerned with semidefinite programming, the study of optimization problems that include constraints that certain matrices be positive semidefinite. We will confine ourselves to problems where all other constraints and functions are linear (or sometimes quadratic). The subject also goes by the name of linear matrix inequalities in control theory.

I am generally assuming mathematical maturity, i.e., some linear algebra and real analysis. Familiarity with linear programming at a graduate level (like ORIE 6300) and with numerical linear algebra (like CS 6210) will be useful but not essential.

I can be reached at 229 Rhodes Hall, at {\tt mjt7@cornell.edu, miketodd@orie.cornell.edu}, \linebreak
{\tt miketodd@cs.cornell.edu}, or at 255-9135. There is a crude web site for the course at \linebreak
{\tt people.orie.cornell.edu/$\sim$miketodd/or6327/or6327.html}

A rough outline for the course is:
\begin{itemize}
\item Introduction to SDP problems;
\item Applications;
\item Duality;
\item The central path; and
\item Algorithms (mostly interior-point methods).
\end{itemize}

We will be concentrating on SDP problems in primal standard form and in the corresponding dual form.
The problem in primal form can be written as
\[
\ba{llrcl}
& \min_X & C \bu X \\
(P) & & A_i \bu X & = & b_i, \quad i = 1,\dots,m, \\
& & X & \succeq & 0,
\ea
\]
where the data $C$, $A_i$, $i = 1,\dots,m$, are real symmetric $n \times n$ matrices, $b$ is a real $m$-vector, and the variable $X$ is a real symmetric $n \times n$ matrix. (There are also interesting SDP problems where the matrices above are complex Hermitian rather than real symmetric, but we'll confine ourselves to the real case for simplicity.) Here $U \bu V := \tr(U^T V) = \sum_j \sum_k u_{jk} v_{jk}$ for any two matrices of the same dimensions, and $U \succeq 0$ means that the matrix $U$ is symmetric and positive semidefinite (abbreviated psd), i.e., that $z^T U z$ is nonnegative for all $z$. We also write $U \succ 0$ to mean that $U$ is positive definite (pd), i.e., that $z^T U z$ is positive for all nonzero $z$; $V \succeq W$ or $W \preceq V$ for $V - W \succeq 0$; and similarly $V \succ W$ or $W \prec V$ for $V - W \succ 0$.

Using the same data, we can construct an SDP problem in dual form:
\[
\ba{cccl}
\max_y & b^T y \\
& \sum_{i=1}^m y_i A_i & \preceq & C;
\ea
\]
the constraint can alternatively be written $C - \sum_{i=1}^m y_i A_i \succeq 0$. This is a {\em linear matrix inequality} in the variable $y$: a requirement that a matrix depending linearly (by which we mean affinely!) on some variables be psd. It is convenient to introduce a {\em slack matrix} $S$ and rewrite the problem as
\[
\ba{ccccccl}
& \max_{y,S} & b^T y & & & & \\
(D) & & \sum_i y_i A_i & + & S & = & C, \\
& & & & S & \succeq & 0.
\ea
\]
Here the variable $y$ is a real $m$-vector while $S$ is a real symmetric $n \times n$ matrix. We have been a little coy in calling $(D)$ a problem in dual form rather than the problem dual to $(P)$, but it will be a while before we establish that it is the Lagrangian dual or prove strong duality (under appropriate conditions). We will (almost) show weak duality below.

Our problems $(P)$ and $(D)$ above can be compared to the usual primal-dual pair for linear programming:
\[
\ba{rrclrrrrcl}
\min_x & c^Tx & & & \quad \max_{y,s} & b^T y \\
& a_i^Tx & = & b_i, \, i=1,\dots,m, & & \sum_i a_i y_i & + & s & = & c, \\
& x & \geq & 0, & & & & s & \geq & 0.
\ea
\]
Here the $a_i$'s, $c$, $x$, and $s$ are $n$-vectors, and the inequalities are as usual interpreted componentwise.

Let us introduce a little more notation:
\begin{itemize}
\item $\R^m$ denotes real $m$-dimensional Euclidean space, viewed as the space of real column $m$-vectors (all our vectors will be columns, so we can distinguish between inner products $u^Tv$ and outer products (rank-one matrices) $uv^T$); $\R^{m \times n}$ denotes the space of real $m \times n$ matrices;
\item $\Mn$ denotes the space of real symmetric $n \times n$ matrices, while
\item $\Mn_+$ and $\Mn_{++}$ denote its subsets of psd and pd matrices, respectively;
\item $\cA$ denotes the linear mapping from $\Mn$ to $\R^m$ defined by
\[
\cA X := (A_i \bu X)_{i=1}^m, \, \mbox{while}
\]
\item $\cA^*$ denotes the (adjoint) linear mapping from $\R^m$ to $\Mn$ defined by
\[
\cA^* y := \sum_i y_i A_i
\]
(so $\cA$ and $\cA^*$ replace the usual $m \times n$ matrix $A$ in the primal-dual pair of LPs above).
\end{itemize}
Now we can write our problems more compactly as
\[
\ba{crrclcrrrrcl}
& \min_X & C \bu X & & & & \max_{y,S} & b^T y \\
(P) & & \cA X & = & b, & \quad (D) & & \cA^* y & + & S & = & C, \\
& & X & \succeq & 0, & & & & & S & \succeq & 0.
\ea
\]
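As a small illustration of this primal-dual pair, here is a standard instance (computing the smallest eigenvalue of $C$).

\BE
Take $m = 1$, $A_1 = I$, and $b_1 = 1$, so the only equality constraint in $(P)$ is $\tr X = 1$. Then $(D)$ asks for $\max \{ y : yI \preceq C \}$, and since $yI \preceq C$ holds exactly when $y$ is at most the smallest eigenvalue $\lambda_{\min}(C)$ of $C$, its optimal value is $\lambda_{\min}(C)$. On the primal side, if $v$ is a unit eigenvector of $C$ corresponding to $\lambda_{\min}(C)$, then $X := vv^T$ is feasible for $(P)$ (it is psd with $\tr X = \|v\|^2 = 1$) and $C \bu X = v^T C v = \lambda_{\min}(C)$. By the weak duality result proved below, both solutions are therefore optimal, and this pair has no duality gap.
\EE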
We can mirror the proof of weak duality for linear programming:

\BP
(Weak Duality) Suppose $X$ and $(y,S)$ are feasible for $(P)$ and $(D)$ respectively. Then
\[
C \bu X - b^T y = S \bu X \geq 0,
\]
so that objective function values of primal feasible solutions always dominate those of dual feasible solutions.
\EP

\proof We again have an (almost) one-line proof:
\[
\ba{rcccl}
C \bu X - b^T y & = & (\cA^*y + S) \bu X - (\cA X)^T y & = & (\sum_i y_i A_i + S) \bu X - ((A_i \bu X)_{i=1}^m)^T y \\
& = & S \bu X & \geq & 0.
\ea
\]
Here we use the linearity in each argument of the expression $U \bu V$ and also the fact (established later this week) that $U \bu V$ is nonnegative when both $U$ and $V$ are psd. \qed

As a corollary, feasible $X$ and $(y,S)$ are optimal if their objective values are equal, which holds iff $S \bu X$ or $X \bu S$ is zero. (Thus ``No SeX Please, We're British'' or ``No eXcesS of primal objective over dual'' are sufficient conditions for optimality.)

By stacking its columns one above another, we can convert any $m \times n$ matrix $P$ into a vector
\[
\rvec(P) := (p_{11};p_{21};\dots;p_{m1};p_{12};p_{22};\dots;p_{m2};\dots;p_{1n};p_{2n};\dots;p_{mn}),
\]
where we use the MATLAB-like notation $(z_1;z_2;\dots;z_r)$ to denote a column vector with the appropriate components. (We also use this notation to stack column vectors or matrices columnwise.) Then if $p = \rvec(P)$ and $q = \rvec(Q)$, we have $p^Tq = P \bu Q$. For symmetric $n \times n$ matrices, this representation is wasteful (and not onto), so instead we use
\[
\svec(U) := (u_{11};\sqrt{2}u_{12};u_{22};\sqrt{2}u_{13};\sqrt{2}u_{23};u_{33};\dots;u_{nn}),
\]
where only the entries on and above the diagonal are used and the $\sqrt{2}$ factors are chosen so that we still have $\svec(U)^T \svec(V) = U \bu V$ for symmetric matrices $U$ and $V$. (For instance, when $n = 2$, $\svec(U) = (u_{11};\sqrt{2}u_{12};u_{22})$, and indeed $\svec(U)^T \svec(V) = u_{11}v_{11} + 2u_{12}v_{12} + u_{22}v_{22} = U \bu V$.) This embeds $\Mn$ into $\R^{n(n+1)/2}$ isometrically. Using this embedding, we can write $(P)$ and $(D)$ above as vector optimization problems, but usually we stick to the matrix notation for clarity. (As above, we try to reserve $P$, $Q$, $L$, and $R$ for possibly nonsymmetric matrices; other upper-case Roman letters usually refer to symmetric matrices.)

A more general vector optimization problem is the conic programming problem (and its dual form)
\[
\ba{rrclrrrrcl}
\min_x & c^Tx & & & \quad \max_{y,s} & b^T y \\
& a_i^Tx & = & b_i, \, i=1,\dots,m, & & \sum_i a_i y_i & + & s & = & c, \\
& x & \in & K, & & & & s & \in & K^*,
\ea
\]
where $K$ is a closed convex cone in $\R^n$ and $K^*$ is the dual cone defined by
\[
K^* := \{ s \in \R^n : s^T x \geq 0 \mbox{ for all } x \in K \}.
\]
It is easy to see that weak duality holds for this pair of problems also. In SDP, instead of $K$ and $K^*$, we have the cone of psd matrices (in both places), so it will be key to show that this cone is self-dual, i.e., equal to its dual.

This is a universal form for convex optimization problems, since an arbitrary convex problem,
\[
\ba{llrcl}
& \min_x & f(x) \\
(CP) & & x & \in & C,
\ea
\]
where $f$ is a convex function and $C$ a convex set, can be expressed as a conic program as follows:
\begin{itemize}
\item[i.] We can assume that the objective function is linear, since $(CP)$ is equivalent to
\[
\ba{llrcl}
& \min_{(x,\xi)} & \xi \\
& & f(x)-\xi & \leq & 0, \\
& & (x,\xi) & \in & \bar{C},
\ea
\]
where $\bar{C} := C \times \R$ (note that $(x,\xi) \mapsto f(x) - \xi$ is again a convex function).
\item[ii.] We can assume that the constraints are in conic form, since
\[
\ba{llrcl}
& \min_x & c^Tx \\
& & x & \in & C,
\ea
\]
is equivalent to
\[
\ba{llrcl}
& \min_{(x,\tau)} & c^Tx \\
& & \tau & = & 1, \\
& & (x,\tau) & \in & K := \{ (x,\tau) : \tau > 0, \, \frac{x}{\tau} \in C \},
\ea
\]
as the example after this list illustrates. (Strictly speaking, we should replace $K$ by its closure to obtain a closed convex cone; if $C$ is closed, this does not change the set of feasible points with $\tau = 1$.)
\end{itemize}
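To see what the homogenization in step ii.\ produces in a familiar case, here is a small example.

\BE
If $C$ is the unit ball $\{ x \in \R^n : \|x\|_2 \leq 1 \}$, then $K = \{ (x;\tau) : \tau > 0, \, \|x\|_2 \leq \tau \}$, and its closure $\{ (x;\tau) : \|x\|_2 \leq \tau \}$ is the $(n+1)$-dimensional second-order (Lorentz) cone. For $n = 2$ this is, up to a linear change of coordinates, exactly the cone of $2 \times 2$ psd matrices described next.
\EE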
\vspace{2.5in} % space left for a picture
\pagebreak

We conclude the lecture with a little feel for what the cone of psd matrices looks like. First consider the case $n = 2$, and look at the set
\[
\{ (x;y;z) \in \R^3 : \left[ \ba{cc} x & y \\ y & z \ea \right] \succeq 0 \}.
\]
This set is defined by the inequalities $x \geq 0$, $z \geq 0$, and $xz \geq y^2$, and the last can also be written $((x+z)/2)^2 - ((x-z)/2)^2 \geq y^2$, or, in the presence of the other two constraints, $(x+z)/2 \geq \| ((x-z)/2;y) \|_2$. This can be viewed (after a little scaling) as a right circular cone in 3-dimensional space, with its axis in the direction $(1;0;1)$; it is sometimes called the ice-cream, second-order, or Lorentz cone. We see that, while it has extreme rays, there are in fact infinitely many of them (one through each rank-one psd matrix), in contrast to the nonnegative orthant in LP, which has just $n$.

\vspace{3.4in} % space left for a picture

Next let us consider the case $n = 3$, but restrict ourselves to psd matrices whose diagonal entries are ones. These are {\em correlation matrices}, and the corresponding set is the {\em elliptope}. It is defined by
\[
\{ (x;y;z) \in \R^3 : \left[ \ba{ccc} 1 & x & y \\ x & 1 & z \\ y & z & 1 \ea \right] \succeq 0 \}.
\]
Pictures of this ``inflated tetrahedron'' or ``humbug'' can be found at the two websites \linebreak
{\tt http://www-user.tu-chemnitz.de/$\sim$helmberg/semidef.html}, \linebreak
{\tt http://www.convexoptimization.com/dattorro/elliptope\_and\_fantope.html}, \linebreak
where you can see that it has four sharp ``vertices'' but otherwise a generally smooth boundary. (The vertices are the rank-one correlation matrices $vv^T$ with $v \in \{\pm 1\}^3$; there are four rather than eight because $v$ and $-v$ give the same matrix.) The first of these websites is a very useful repository of information about SDP.

Next time we will present a very simple instance of an SDP in dual form to give an idea of its applicability, and list (and prove some of) a number of very useful facts about symmetric matrices.

\end{document}