Algebra and Optimization for Robotic Arms

Back when I took my first Linear Algebra class as an undergrad, many concepts were outright mysterious: things like the Null Space, Right-hand Systems, and Rotation Matrices made sense algebraically but not really geometrically. This post walks through a robotic arm project that puts some of these concepts to work.

The goal is to make a robotic arm pick up a can placed in a defined area and put it to the side. Everything is done automatically: the solution should determine where the can is and give orders to the arm to grab it and move it sideways.

Robotic Arm with servomotors.

To do this, we are going to use a webcam to take a picture of a rectangular area where a can will be placed randomly. Locating the can in that picture can be done with Python packages such as OpenCV. The end result of running the CV algorithm looks like this:

Computer Vision algorithm on an image of a randomly placed can on a piece of paper.

The piece of paper can be seen as a 2-d plane, where the center of the can occupies coordinates $(x_c,y_c)$. Moreover, the can has a height of $z_h$ centimeters. The center of the top of the can is at the point $(x_c,y_c,z_h)$, and the top contour of the can is the set of points $(x,y,z_h)$ such that $(x-x_c)^2+(y-y_c)^2=r^2$, where $r$ is the radius of the can. For example, the two contour points on the line through the center parallel to the $x$ axis are $(x_c\pm r,y_c,z_h)$. The question is then: given the position of the can expressed by these equations, how do we control the arm to move there?
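The contour equation can be checked numerically with a short sketch. The numbers below (center, height, radius) are made-up values in centimeters, and `top_contour` is a hypothetical helper name, not part of the project's code.

```python
import numpy as np

# Illustrative values in centimeters (assumptions, not measurements from the project).
x_c, y_c, z_h, r = 10.0, 5.0, 12.0, 3.0

def top_contour(x_c, y_c, z_h, r, n_points=8):
    """Return n_points points (x, y, z_h) evenly spaced on the top rim of the can."""
    t = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    x = x_c + r * np.cos(t)
    y = y_c + r * np.sin(t)
    z = np.full(n_points, z_h)
    return np.column_stack([x, y, z])

rim = top_contour(x_c, y_c, z_h, r)

# Every rim point should satisfy (x - x_c)^2 + (y - y_c)^2 = r^2 up to rounding.
residual = (rim[:, 0] - x_c) ** 2 + (rim[:, 1] - y_c) ** 2 - r ** 2
print(np.allclose(residual, 0.0))

# With 8 points, indices 0 and 4 are the two points (x_c + r, y_c, z_h) and (x_c - r, y_c, z_h).
print(rim[0], rim[4])
```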

The robot arm: translations and rotations

A robotic arm can be seen as a collection of rotations: each joint of the robot extends as a vector in a direction determined by the angle of its servomotor, and the length of that vector is the length of that particular joint.

This is where the methods of Linear Algebra become useful: imagine a point in 3-d space defined by the coordinates $(x,y,z)$. The first and simplest linear transformation is the so-called identity transformation, represented by a matrix $I$ called the identity: $I= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$

This transformation is useful as $I(x,y,z)^T=(x,y,z)^T$. In other words, the identity transformation applied to a vector returns the same vector. Another useful concept is that linear transformations can be concatenated by multiplying their matrices. For example, applying the identity transformation $n$ times still returns the same vector, that is, $I^n(x,y,z)^T=(x,y,z)^T$ (beyond the trivial algebraic fact derived from the definition of matrix multiplication, it’s an interesting exercise to think why this is the case).

The columns of the identity matrix $I$ form what’s called an orthonormal basis of $\mathbb{R}^3$, composed of 3 vectors: $(1,0,0)$, $(0,1,0)$ and $(0,0,1)$. Each of these vectors has unit length (it is normalized) and is perpendicular to the other two (orthogonal). These vectors are said to span $\mathbb{R}^3$, as any vector $(x,y,z)$ can be written as a combination of them: $(x,y,z)=x(1,0,0)+y(0,1,0)+z(0,0,1)$. Geometrically, the identity transformation preserves the original coordinate system of the vector and its length. It’s not difficult to imagine that the transformation that doubles the length of any vector is $2I$, and the one that halves it is $(1/2) I$.
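These properties are easy to verify numerically. The following is a minimal sketch with NumPy; the example vector `v` is an arbitrary choice made for illustration.

```python
import numpy as np

I = np.eye(3)                     # identity matrix; its columns are (1,0,0), (0,1,0), (0,0,1)
v = np.array([3.0, -1.0, 2.0])    # arbitrary example vector (an assumption)

same = I @ v                                   # the identity returns the same vector
repeated = np.linalg.matrix_power(I, 5) @ v    # applying it 5 times changes nothing

# Orthonormal columns: unit length and mutually perpendicular, so I^T I = I.
is_orthonormal = np.allclose(I.T @ I, np.eye(3))

# v written as a combination of the basis vectors.
combo = v[0] * I[:, 0] + v[1] * I[:, 1] + v[2] * I[:, 2]

doubled = (2 * I) @ v      # doubles the length
halved = (0.5 * I) @ v     # halves the length

print(same, repeated, is_orthonormal)
print(np.linalg.norm(doubled) / np.linalg.norm(v), np.linalg.norm(halved) / np.linalg.norm(v))
```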

Another useful transformation is rotating a vector by $\theta$ degrees around an axis. To build it, start with the 2-d basis formed by the vectors $(1,0)$ and $(0,1)$. Rotating $(1,0)$ by $\theta$ degrees yields the vector $(\cos(\theta),\sin(\theta))$. Likewise, rotating $(0,1)$ by $\theta$ degrees yields the vector $(-\sin(\theta),\cos(\theta))$. The matrix with these two vectors as columns, $\begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix}$, is the basis rotated by $\theta$ degrees from its original orientation in 2-d. Extending this idea to 3-d leads to the following rotation matrices:

\[R_x(\theta)= \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos(\theta) & -\sin(\theta) \\ 0 & \sin(\theta) & \cos(\theta) \end{pmatrix}, R_y(\theta)= \begin{pmatrix} \cos(\theta) & 0 & \sin(\theta) \\ 0 & 1 & 0 \\ -\sin(\theta) & 0 & \cos(\theta) \end{pmatrix}, R_z(\theta)= \begin{pmatrix} \cos(\theta) & -\sin(\theta) & 0 \\ \sin(\theta) & \cos(\theta) & 0 \\ 0 & 0 & 1 \end{pmatrix}\]
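As a sketch, the three matrices translate directly into NumPy. The function names `rot_x`, `rot_y` and `rot_z` are chosen here for illustration, and angles are taken in degrees to match the text (NumPy's trigonometric functions expect radians, hence the conversion).

```python
import numpy as np

def rot_x(theta_deg):
    """Rotation by theta_deg degrees around the x axis."""
    t = np.deg2rad(theta_deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(theta_deg):
    """Rotation by theta_deg degrees around the y axis."""
    t = np.deg2rad(theta_deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_z(theta_deg):
    """Rotation by theta_deg degrees around the z axis."""
    t = np.deg2rad(theta_deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

# Rotating (1,0,0) by 90 degrees around z should land on (0,1,0), up to rounding.
rotated = rot_z(90.0) @ np.array([1.0, 0.0, 0.0])
print(np.round(rotated, 6))

# Each rotation matrix has orthonormal columns and determinant 1, so lengths are preserved.
checks = [
    np.allclose(R.T @ R, np.eye(3)) and np.isclose(np.linalg.det(R), 1.0)
    for R in (rot_x(37.0), rot_y(37.0), rot_z(37.0))
]
print(checks)
```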

With these matrices, any vector can be rotated by any angle around an axis. For example, taking a vector $(x,y,z)$, rotating it first $\theta_1$ degrees around the $y$ axis and then $\theta_2$ degrees around the $x$ axis can be written as: $R_x(\theta_2)R_y(\theta_1)(x,y,z)^T$. The rotation applied first sits closest to the vector.
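As a quick worked check of the ordering (the vector and the 90 degree angles are chosen purely for illustration), take $(1,0,0)$ with $\theta_1=\theta_2=90$ degrees:

\[R_y(90^\circ)(1,0,0)^T=(0,0,-1)^T,\qquad R_x(90^\circ)(0,0,-1)^T=(0,1,0)^T\]

so $R_x(90^\circ)R_y(90^\circ)(1,0,0)^T=(0,1,0)^T$. Swapping the order gives a different result, because $R_x(90^\circ)$ leaves $(1,0,0)$ untouched:

\[R_y(90^\circ)R_x(90^\circ)(1,0,0)^T=R_y(90^\circ)(1,0,0)^T=(0,0,-1)^T\]

In other words, the order in which rotation matrices are multiplied matters.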

We are almost done with the transformations needed to fully describe a robotic arm. Rotations are described by concatenating rotation matrices. The only transformation left is translation: offsetting each rotation by the length of the corresponding joint of the robotic arm. Given a vector $(x,y,z)$ representing the end-point of one arm, rotated by an angle $\theta$ around the $z$ axis, that is $R_z(\theta)(x,y,z)^T$, and another vector $a=(x_1,y_1,z_1)$ representing another arm, adding these vectors is equivalent to chaining the arms together, that is, $R_z(\theta)(x,y,z)^T+a$. This can be written in matrix form as: $T(z,\theta,a)\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}= \begin{pmatrix} R_z(\theta) & a\\ 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}=\begin{pmatrix} R_z(\theta)(x,y,z)^T+a \\ 1 \end{pmatrix}$
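The block-matrix identity can be verified with a small sketch. The helper names `rot_z` and `transform_z` and the example numbers are assumptions made for illustration; angles are in degrees.

```python
import numpy as np

def rot_z(theta_deg):
    """Rotation by theta_deg degrees around the z axis."""
    t = np.deg2rad(theta_deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def transform_z(theta_deg, a):
    """4x4 matrix [[R_z(theta), a], [0, 1]] acting on vectors of the form (x, y, z, 1)."""
    T = np.eye(4)
    T[:3, :3] = rot_z(theta_deg)   # upper-left block: the rotation
    T[:3, 3] = a                   # last column: the translation
    return T

v = np.array([1.0, 0.0, 0.0])      # example end-point (an assumption)
a = np.array([5.0, 0.0, 0.0])      # example translation (an assumption)
theta = 90.0

lhs = transform_z(theta, a) @ np.append(v, 1.0)    # matrix form with the extra coordinate
rhs = np.append(rot_z(theta) @ v + a, 1.0)         # rotate, then add a, then append the 1

print(np.round(lhs, 6), np.allclose(lhs, rhs))
```

Here the rotation sends $(1,0,0)$ to $(0,1,0)$, and adding $a$ gives $(5,1,0)$, with the trailing 1 carried along unchanged.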

With this trick (adding an extra dimension), all movements of a robotic arm can be written as concatenations of rotation matrices and translation vectors by multiplying them as: $T(c_1,\theta_1,a_1)\times T(c_2,\theta_2,a_2)\times\cdots\times T(c_n,\theta_n,a_n)$

The vector $a_i$ normally encodes the length of a joint of the arm. For example, if the first joint has a length of 5 cm, the vector $a_1$ can be $(5,0,0)$.

Connection with convex optimization

With the method in the previous section, given a collection of angles $(\theta_1,\dots,\theta_n)$, joint lengths of the arm $(a_1,\dots,a_n)$ and rotation axes $(c_1,\dots,c_n)$ where $c_i\in\{x,y,z\}$, the final position of the tip of the robotic arm is given by the last column of the product $\prod_{i=1}^nT(c_i,\theta_i,a_i)=F$. Denote this last column as $f_4$; its first three entries are the coordinates of the tip and its fourth entry is 1.
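A minimal sketch of this product, assuming angles in degrees. The function names (`rotation`, `transform`, `forward`) and the two-joint example arm are hypothetical choices for illustration, not the project's actual code.

```python
import numpy as np

def rotation(axis, theta_deg):
    """3x3 rotation by theta_deg degrees around the axis named 'x', 'y' or 'z'."""
    t = np.deg2rad(theta_deg)
    c, s = np.cos(t), np.sin(t)
    if axis == "x":
        return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])
    if axis == "y":
        return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    if axis == "z":
        return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
    raise ValueError(f"unknown axis: {axis!r}")

def transform(axis, theta_deg, a):
    """4x4 matrix T(c, theta, a) = [[R_c(theta), a], [0, 1]]."""
    T = np.eye(4)
    T[:3, :3] = rotation(axis, theta_deg)
    T[:3, 3] = a
    return T

def forward(axes, thetas_deg, offsets):
    """Product F = T(c_1, theta_1, a_1) ... T(c_n, theta_n, a_n)."""
    F = np.eye(4)
    for axis, theta, a in zip(axes, thetas_deg, offsets):
        F = F @ transform(axis, theta, a)
    return F

# Hypothetical two-joint arm: lengths 5 cm and 3 cm, first joint turned 90 degrees around z.
axes = ("z", "y")
thetas = (90.0, 0.0)
offsets = ((5.0, 0.0, 0.0), (3.0, 0.0, 0.0))

F = forward(axes, thetas, offsets)
f_4 = F[:, 3]          # last column: (tip_x, tip_y, tip_z, 1)
print(np.round(f_4, 6))
```

Expanding the product shows that, with $T$ defined as above, the last column equals $a_1+R_1a_2+R_1R_2a_3+\cdots$ (writing $R_i$ for $R_{c_i}(\theta_i)$), so each $a_i$ is rotated by the joints that precede it and the last angle $\theta_n$ does not move the tip. In the example this gives $(5,0,0)+R_z(90^\circ)(3,0,0)=(5,3,0)$.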

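Then, for a given target position $(x',y',z')$, the angles $(\theta_1,\dots,\theta_n)$ that place the tip of the robotic arm there are obtained by solving the following optimization problem. The objective is a squared Euclidean distance, but because $f_4$ depends on the angles through sines and cosines it is in general not convex in $(\theta_1,\dots,\theta_n)$, so a numerical solver may return a local minimum: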

\[\min_{\theta_1,\dots,\theta_n}\ \lVert f_4(\theta_1,\dots,\theta_n)-(x',y',z',1)\rVert^2_2\]
\[\text{s.t. }\quad\theta_l\le\theta_{i}\le \theta_u,\quad i=1,\dots,n\]
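The sketch below shows one simple way to attack this problem numerically. It is not the solver used in the project: it swaps in a basic projected gradient descent with finite-difference gradients, chosen only because it needs nothing beyond NumPy. The arm geometry, the bounds of -90 to 90 degrees, the starting point and all function names are assumptions for illustration.

```python
import numpy as np

AXIS_INDEX = {"x": 0, "y": 1, "z": 2}

def transform(axis, theta_deg, a):
    """4x4 matrix [[R_axis(theta), a], [0, 1]] with theta in degrees."""
    t = np.deg2rad(theta_deg)
    c, s = np.cos(t), np.sin(t)
    i = AXIS_INDEX[axis]
    j, k = (i + 1) % 3, (i + 2) % 3   # the two coordinates the rotation mixes
    T = np.eye(4)
    T[j, j], T[j, k], T[k, j], T[k, k] = c, -s, s, c
    T[:3, 3] = a
    return T

def tip(thetas_deg, axes, offsets):
    """Last column f_4 of the product of all joint transforms."""
    F = np.eye(4)
    for axis, theta, a in zip(axes, thetas_deg, offsets):
        F = F @ transform(axis, theta, a)
    return F[:, 3]

def cost(thetas_deg, axes, offsets, target):
    """Squared distance between f_4 and (x', y', z', 1)."""
    diff = tip(thetas_deg, axes, offsets) - np.append(target, 1.0)
    return float(diff @ diff)

def solve(axes, offsets, target, lower, upper, theta0, iters=2000, h=1e-4):
    """Projected gradient descent: step, clip to the bounds, keep only improvements."""
    theta = np.clip(np.asarray(theta0, dtype=float), lower, upper)
    best = cost(theta, axes, offsets, target)
    step = 1.0
    for _ in range(iters):
        # Forward-difference estimate of the gradient with respect to each angle.
        grad = np.array([
            (cost(theta + h * e, axes, offsets, target) - best) / h
            for e in np.eye(len(theta))
        ])
        candidate = np.clip(theta - step * grad, lower, upper)
        value = cost(candidate, axes, offsets, target)
        if value < best:
            theta, best = candidate, value
            step *= 1.5     # accepted: try a longer step next time
        else:
            step *= 0.5     # rejected: shorten the step
    return theta, best

# Hypothetical arm: a 5 cm base offset and two 10 cm joints (assumed values).
axes = ("z", "y", "y")
offsets = [(0.0, 0.0, 5.0), (10.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
lower, upper = -90.0, 90.0                               # assumed servomotor limits in degrees

target = tip((30.0, -20.0, 0.0), axes, offsets)[:3]      # a target known to be reachable
theta0 = np.zeros(3)
initial_cost = cost(theta0, axes, offsets, target)
theta_opt, final_cost = solve(axes, offsets, target, lower, upper, theta0)
print(initial_cost, final_cost, theta_opt)
```

Because only improving steps are accepted and every candidate is clipped, the returned angles respect the bounds and the final cost is never worse than the starting one. The loop is not tuned for speed and offers no guarantee of reaching the global minimum. A library routine such as SciPy's `scipy.optimize.least_squares`, which accepts a `bounds` argument, could replace it.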

Solving this problem twice, once to position the arm above the can and once to move it to the side (each time obtaining the angles that produce the target position), results in the following movement:

Example of the optimization formulation to move the robotic arm.