Linear functions

Up to now we have covered the more practical part of the course: it was always about how to compute.

Now we enter a more conceptual part of the course and organize ideas.

The concept of a vector space

We have been speaking about a vector space as being closed under linear combinations. However, if we stop to think about it, we can only compute a linear combination provided we know how to multiply vectors by numbers and how to add vectors; see the linear combination below:

\[ \overbrace{\alpha u}^\text{number times a vector} + \overbrace{\beta v}^\text{number times a vector}=\underbrace{u'+v'}_\text{sum of vectors} \]

Let's look more carefully at two questions:

  • What is a sum of two vectors?

  • What is a vector times a number?

Answering these questions is important because the concept of linear combination and of vector space we adopted so far - a set of things closed under linear combinations - hinges on them.

Start with a set of \(n\) things with labels \(v_1\), \(v_2\), …, \(v_n\):

\[ \mathbb{V}=\{v_1,v_2,\dots,v_n\} \]

What can we do with these things in this set?

We can define a map, i.e., an assignment or a function, involving this set. There are two specific kinds of maps that are relevant to those questions:

  • A function which to each pair of elements of \(\mathbb{V}\) assigns another element of \(\mathbb{V}\), in math language we write:\[ \begin{align} + : \mathbb{V}\times\mathbb{V} &\longrightarrow\mathbb{V}\\ (u,v)&\longmapsto u+v \end{align} \tag{1}\]

    This is relevant for the first question.

  • A function which to each element of \(\mathbb{V}\) and each real number assigns another element of \(\mathbb{V}\); in math language:

    \[ \begin{align}\cdot : \mathbb{R}\times\mathbb{V} &\longrightarrow\mathbb{V}\\(\alpha,v)&\longmapsto \alpha\cdot v\end{align} \tag{2}\] This is relevant for the second question.

The symbol \(u+v\) denotes the element of \(\mathbb{V}\) that corresponds to the pair \((u,v)\), and \(\alpha\cdot v\) denotes the element of \(\mathbb{V}\) assigned to \((\alpha,v)\).

Additionally, we wish those maps to have the following properties:

  • Commutative: \(u+v=v+u\). [You can flip the order]

  • Associative: \((u+v)+w=u+(v+w)\). [You can shift the brackets]

  • Neutral element (w.r.t. \(+\)): There is an element in \(\mathbb{V}\), call it \(o\), so that for all \(u\in \mathbb{V}\) we find \(u+o=u\). [This is not \(0\) from the real numbers, it is the zero of \(\mathbb{V}\). To avoid confusion you can call \(o\) "circle" rather than "zero".]

  • Inverse element (w.r.t. \(+\)): For every element \(u\) there is an element, which we denote by \(-u\), that guarantees \(u+(-u)=o\). [\(u-v\) is just a short notation for \(u+(-v)\).]

The next list of rules describes how the operations \(+\) and \(\cdot\) interact when both are present in a calculation; two of these rules are distributive rules:

  • Associative: \(\alpha(\beta u)=(\alpha\beta)u\)

  • Distributive I: \((\alpha+\beta)u = \alpha u +\beta u\)

  • Distributive II: \(\alpha (u +v) = \alpha u + \alpha v\)

  • Unitary: \(1 u = u\)

When looking at Equation 1, Equation 2 and the CANI-ADDU rules, note that we did not say what the elements of \(\mathbb{V}\) are, nor did we say how to compute the assignments. The only thing stated is that \(+\), whatever it ends up being in practice, maps two elements of \(\mathbb{V}\) into another element of \(\mathbb{V}\), and that the dot \(\cdot\), whatever it ends up meaning in practice, must assign to a real number and an element of \(\mathbb{V}\) another element of \(\mathbb{V}\), all while the rules above are obeyed. So, in practice, \(+\) and \(\cdot\) are really special operations, such is the number of constraints imposed.

A vector space is then a triplet \((\mathbb{V},+,\cdot)\) satisfying the rules above.

Commentaries
  • Why these two maps with these rules? Because many useful maps in practice fit them. The list of properties is like describing someone: you can list all of their characteristics, or just the essential ones. What counts as essential depends on what the description is used for. For example, your CV works for the purpose of getting a job.

  • There are many constraints (CANI and ADDU) on the operations \(+\) and \(\cdot\), but they all boil down to making \(\mathbb{V}\) closed under l.c., where the l.c. is realized in practice in the natural and expected way. The constraints above are the common-sense rules for l.c.

  • The set of assignments made by \(+\) and \(\cdot\), which we can write as \(\{(u,v,u+v)\}\) and \(\{(c,u,cu)\}\), is what we call the structure of the vector space.

What specific sets and \(+\) and \(\cdot\) operations fit the profile above? Here are some examples:

Example 1:

The set of all polynomials of degree at most \(2\):

\[ \mathbb{P}=\{p:\mathbb{R}\rightarrow\mathbb{R}; x\mapsto a_2 x^2 +a_1 x +a_0\,\,|\,\, a_i \text{'s are real}\} \]

Together with:

\[ \begin{align}+ : \mathbb{P}\times\mathbb{P} &\longrightarrow\mathbb{P}\\(p,q)&\longmapsto (p+q)(x)=p(x)+q(x)\end{align} \]

Here, the \(+\) of the functions \(p\) and \(q\) is defined from the sum \(+\) of the real numbers \(p(x)\) and \(q(x)\). While before \(+\) had no concrete meaning, now it does, in terms of the \(+\) between numbers.

\[ \begin{align}\cdot : \mathbb{R}\times\mathbb{P} &\longrightarrow\mathbb{P}\\(\alpha,p)&\longmapsto (\alpha\cdot p)(x):=\alpha\cdot p(x)\end{align} \]

The multiplication between a real number and the function \(p\) is understood in terms of the multiplication between the real numbers \(\alpha\) and \(p(x)\).

All the rules above can be checked for this example.

The natural basis for this vector space is: \(\{1,x,x^2\}\)

Commentary

This example is provided to show that the elements of vector spaces need not be column vectors!
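As a quick sanity check of these definitions, here is a minimal sketch in Python (my own illustration, not part of the formalism above): a polynomial of degree at most \(2\) is represented by its coefficient tuple \((a_0,a_1,a_2)\), and the abstract \(+\) and \(\cdot\) become entry-wise operations on those coefficients.

```python
# Represent p(x) = a0 + a1*x + a2*x^2 by its coefficients (a0, a1, a2).
# The abstract "+" and "." are then realized entry-wise on the coefficients.

def add(p, q):
    """(p + q)(x) := p(x) + q(x), realized on coefficients."""
    return tuple(pi + qi for pi, qi in zip(p, q))

def scale(alpha, p):
    """(alpha . p)(x) := alpha * p(x), realized on coefficients."""
    return tuple(alpha * pi for pi in p)

p = (1.0, 0.0, 2.0)   # 1 + 2x^2
q = (0.0, -3.0, 1.0)  # -3x + x^2

# Spot-check a couple of the CANI-ADDU rules.
assert add(p, q) == add(q, p)                                 # commutativity
assert scale(2, add(p, q)) == add(scale(2, p), scale(2, q))   # distributive II
```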

Example 2 of a vector space

Consider the set of ordered pairs:

\(\mathbb{V} = \{(x,y)\,\,|\,\, x,y\in \mathbb{R}\}\)

And the maps:

\[ \begin{align} + : \mathbb{V}\times\mathbb{V} &\longrightarrow\mathbb{V}\\ (u,v)&\longmapsto u+v := (u_x,u_y)+(v_x,v_y)=(u_x+v_x,u_y+v_y) \end{align} \]

and

\[ \begin{align}\cdot : \mathbb{R}\times\mathbb{V} &\longrightarrow\mathbb{V}\\(\alpha,u)&\longmapsto \alpha\cdot u:=(\alpha u_x,\alpha u_y)\end{align} \]

Once again the sum of vectors manifests as an entry-wise sum of numbers, and the multiplication of a vector by a real number means multiplying its entries (real numbers) by the real number \(\alpha\). While before the \(\cdot\) procedure was not specified, now it is, through the multiplication \(\cdot\) between numbers.

Notice that when I write \((u_x,u_y)+(v_x,v_y)\) one may wonder what it means; we know it is another element of \(\mathbb{V}\), but which one? To answer that we have to tell how the operation \(+\) between two elements of \(\mathbb{V}\) actually maps. Whatever the way it maps, it must be consistent with the rules defined above. It turns out that defining this \(+\) as doing \((u_x+v_x,u_y+v_y)\) (a plus between reals) is a good and useful choice.
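To see the rules in action for this concrete choice of \(+\) and \(\cdot\), here is a small numerical spot-check (a sketch using NumPy; the random vectors and scalars are my own choice, the arrays play the role of the pairs \((x,y)\)):

```python
import numpy as np

rng = np.random.default_rng(0)
u, v, w = rng.normal(size=(3, 2))   # three random elements of V = R^2
a, b = 2.0, -3.0

# "+" is entry-wise addition and "." is entry-wise scaling, so the rules hold.
assert np.allclose(u + v, v + u)                   # commutative
assert np.allclose((u + v) + w, u + (v + w))       # associative
assert np.allclose((a + b) * u, a * u + b * u)     # distributive I
assert np.allclose(a * (u + v), a * u + a * v)     # distributive II
assert np.allclose(1.0 * u, u)                     # unitary
```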

Linear functions

Consider two generic vector spaces \(\mathbb{V}\) and \(\mathbb{W}\).

A linear function is a set of links between elements of both vector spaces; these links (i.e. pairs \((v,w)\)) are constructed as follows:

Definition 1 \[ \begin{align} f:\mathbb{V}&\overset{\sim}{\longrightarrow} \mathbb{W}\\ v&\longmapsto f(v) \end{align} \]

The word linear describes a property that \(f\) must have: \(f(a u+b v) = a f (u) + b f(v)\) for all \(a,b \in \mathbb{R}\) and \(u,v\in \mathbb{V}\).

Terminology: The vector space \(\mathbb{V}\) is the domain of the function \(f\), while \(\mathbb{W}\) is the codomain (of \(f\)).

When referring to a linear function we will use \(\sim\) and write: \(f:\mathbb{V}\overset{\sim}{\longrightarrow} \mathbb{W}\).

[Commentary: Many functions which are useful in practice have this property. Hence the definition.] Here are some examples of \(f\)’s and tests for the linearity property.

Example 1

Start with the vector spaces of polynomials of degree at most \(2\) and \(1\). The procedure \(\frac{d}{dx}\) defines the following map:

\[ \begin{align}f:\mathbb{P}_2&\longrightarrow \mathbb{P}_1,\qquad a_2x^2+a_1x+a_0\longmapsto f(a_2x^2+a_1x+a_0):=\frac{d}{dx}(a_2x^2+a_1x+a_0)=2a_2x+a_1\end{align} \]

which is clearly linear.

Example 2

Still the same vector spaces but this time the map is:

\[ \begin{align}f:\mathbb{P}_2&\longrightarrow \mathbb{P}_1,\qquad a_2x^2+a_1x+a_0\longmapsto f(a_2x^2+a_1x+a_0):=(a_2-a_1)x+(a_1+a_0)\end{align} \]

It is linear as well!
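One can spot-check this claim numerically. This is a sketch (my own illustration, not part of the notes): a polynomial \(a_2x^2+a_1x+a_0\) is represented by its coefficient triple \((a_0,a_1,a_2)\), and the map returns the coefficients of \((a_2-a_1)x+(a_1+a_0)\).

```python
import numpy as np

def f_coeffs(p):
    """Example 2 on coefficients: (a0, a1, a2)  ->  (a1 + a0, a2 - a1)."""
    a0, a1, a2 = p
    return np.array([a1 + a0, a2 - a1])

rng = np.random.default_rng(2)
p, q = rng.normal(size=(2, 3))   # two random coefficient triples
a, b = 1.5, -0.5

# Linearity: f(a*p + b*q) == a*f(p) + b*f(q)
assert np.allclose(f_coeffs(a * p + b * q), a * f_coeffs(p) + b * f_coeffs(q))
```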

Example 4

Now a function from \(\mathbb{R}\) into \(\mathbb{R}^2\), which maps as follows:

\[ \begin{align}g:\mathbb{R}&\longrightarrow \mathbb{R}^2\\ x&\longmapsto g(x):=(2x,x)\end{align} \]

The procedure on which the map is based is: for each \(x\), compute \(2x\) and then write \((2x,x)\). Is this procedure linear?

\[ g(c_1 x_1+ c_2 x_2)=(2(c_1 x_1+ c_2 x_2),c_1 x_1+ c_2 x_2) = c_1(2x_1,x_1)+c_2(2x_2,x_2)=c_1g(x_1)+c_2g(x_2) \]

Because our choices were arbitrary, yes, it is linear.

Example 5:

The projection function procedure consists of picking out the \(x\) component of the vector \((x,y)\):

\[ \begin{align}\pi:\mathbb{R}^2&\longrightarrow \mathbb{R}\\ (x,y)&\longmapsto \pi(x,y):=x\end{align} \]

Is it linear?

\[ \pi(c_1(x_1,y_1)+c_2(x_2,y_2)) = \pi (c_1x_1+c_2x_2,c_1y_1+c_2y_2)=c_1x_1+c_2x_2 =c_1 \pi(x_1,y_1) + c_2\pi(x_2,y_2) \]

A l.c. of inputs \((x_1,y_1)\) and \((x_2,y_2)\) yields the same l.c. of outputs \(\pi(x_1,y_1)\) and \(\pi(x_2,y_2)\).

Example 6:

From \(\mathbb{R}^2\) to \(\mathbb{R}^2\):

\[ \begin{align}\Phi:\mathbb{R}^2&\longrightarrow \mathbb{R}^2\\ (x,y)&\longmapsto \Phi(x,y):=(2x-y,x+y)\end{align} \]

Is it?

\[ \begin{align} \Phi(c_1(x_1,y_1)+c_2(x_2,y_2)) &= \Phi(c_1x_1+c_2x_2,c_1y_1+c_2y_2)\\ &=(2(c_1x_1+c_2x_2)-(c_1y_1+c_2y_2),c_1x_1+c_2x_2+c_1y_1+c_2y_2)\\ &=c_1(2x_1-y_1,x_1+y_1)+c_2(2x_2-y_2,x_2+y_2)\\ &=c_1\Phi(x_1,y_1)+c_2\Phi(x_2,y_2) \end{align} \]

Yes.

Commentary

It is usual to give a more broken-down version of the rule \(f(a u+b v) = a f (u) + b f(v)\):

  • \(f(u+v)=f(u)+f(v)\)

  • \(f(b u)=b f(u)\)

Because, if you know these, then \(f(a u+b v) = a f (u) + b f(v)\) comes as a natural consequence. So checking just these two suffices.
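Here is how one might check these two properties numerically, say for the \(\Phi\) of Example 6 (a sketch using NumPy; the random vectors and scalars are my own choice):

```python
import numpy as np

def phi(p):
    """Phi(x, y) = (2x - y, x + y), as in Example 6."""
    x, y = p
    return np.array([2 * x - y, x + y])

rng = np.random.default_rng(1)
u, v = rng.normal(size=(2, 2))
a, b = rng.normal(size=2)

# The two broken-down checks: additivity and homogeneity.
assert np.allclose(phi(u + v), phi(u) + phi(v))
assert np.allclose(phi(b * u), b * phi(u))
# Together they imply the full rule:
assert np.allclose(phi(a * u + b * v), a * phi(u) + b * phi(v))
```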

The Matrix representation of a linear function

Given a function \(f\), for example \(f(x,y)=(2x-y,x+y)\), what is its matrix representation?

How:

  1. Choose a basis for the domain and a basis for the codomain;
  2. Compute the output of \(f\) when it acts on the basis of the domain and then write the output in terms of the basis of the codomain.
  3. With 2. we can now act with the matrix on any vector of the domain, written w.r.t. that basis.

Let's follow these steps.

Step 1: Choose a basis for the domain and codomain

As said above we have to choose a basis for \(\mathbb{V}\), let it be:

\[ \mathbf{B}=(v_1, v_2, .. , v_n) \tag{3}\]

The next step is to choose a basis for the codomain \(\mathbb{W}\).

\[ \mathbf{C}=(w_1,w_2,..,w_m) \tag{4}\]

Choosing a basis allows us to express any element of these spaces in terms of the basis, therefore:

  • If \(v\in\mathbb{V}\) it has the form:

\[ v=\sum_{j=1}^n x_j v_j = \mathbf{B}X,\qquad X=\begin{pmatrix}x_1\\\vdots\\x_n\end{pmatrix} \tag{5}\]

  • If \(w\in\mathbb{W}\) it has the form:

\[ w=\sum_{i=1}^m y_i w_i = \mathbf{C}Y,\qquad Y=\begin{pmatrix}y_1\\\vdots\\y_m\end{pmatrix} \tag{6}\]

Example

For the function \(f:\mathbb{R}^2\longrightarrow \mathbb{R}^2\) given by \(f(x,y)=(2x-y,x+y)\) we have \(\mathbb{V}=\mathbb{W}=\mathbb{R}^2\). Choose the bases:

\[ v_1:=(1,-4),v_2:=(0,1)\qquad w_1:=(1,0),w_2:=(1,1) \tag{7}\]

I would like to stress that this is my choice; we can make other choices (as we shall see) of bases for \(\mathbb{V}\) and \(\mathbb{W}\).

Thus any vector \(v\) of the domain is expressed as:

\[ v=\begin{pmatrix}v_1 & v_2 \end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix} \tag{8}\]

and any \(w\) from the codomain can be written as:

\[ w=\begin{pmatrix}w_1 & w_2 \end{pmatrix}\begin{pmatrix}y_1\\y_2\end{pmatrix} \tag{9}\]

How to compute \(x_1, x_2, y_1, y_2\) for given \(v,w\in\mathbb{R}^2\) ?

Substitute into the lhs of Equation 8 our \(v=(\alpha,\beta)\) and on the rhs our choice of \(v_1\) and \(v_2\) from Equation 7:

\[ \begin{pmatrix}\alpha\\\beta\end{pmatrix}=\begin{pmatrix}1 & 0\\-4 & 1 \end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix} \tag{10}\]

then solve for \(x_1\) and \(x_2\).

Similarly, given \(w=(\mu,\nu)^\intercal\), Equation 9 becomes:

\[ \begin{pmatrix}\mu\\\nu\end{pmatrix}=\begin{pmatrix}1 & 1\\0 & 1 \end{pmatrix}\begin{pmatrix}y_1\\y_2\end{pmatrix} \tag{11}\]
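In code this step is just a linear solve. Here is a minimal sketch with NumPy, where the basis vectors of Equation 7 are stored as matrix columns (a convention of my own for the illustration):

```python
import numpy as np

B = np.array([[1, 0],
              [-4, 1]])     # columns are v1 = (1, -4), v2 = (0, 1)
C = np.array([[1, 1],
              [0, 1]])      # columns are w1 = (1, 0), w2 = (1, 1)

v = np.array([1, 3])        # some v = (alpha, beta) in the domain
X = np.linalg.solve(B, v)   # coordinates of v w.r.t. B, i.e. solve Equation 10
print(X)                    # [1. 7.]  ->  v = 1*v1 + 7*v2
```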

Step 2: Compute how \(f\) acts on the domain’s basis

Act with \(f\) on the chosen basis vectors \(v_j\), obtaining \(f(v_j)\).

For the particular example we are considering we find:

\[ f(v_1)=f(1,-4)=(2\cdot 1-(-4),1-4)=(6,-3)\qquad f(v_2)=f(0,1)=(2\cdot 0-1,0+1)=(-1,1) \]

But since these outputs live in \(\mathbb{W}=\mathbb{R}^2\) they can be written as an appropriate l.c. of the basis \(w_1\) and \(w_2\), see Equation 11:

\[ \begin{align} f(v_1)&=(6,-3)=9(1,0)-3(1,1)=9w_1-3w_2\\ f(v_2)&=(-1,1)=-2(1,0)+1(1,1)=-2w_1+w_2 \end{align} \tag{12}\]

The rhs of these formulas have the form:

\[ f(v_j) = \sum_{i=1}^m A_{ij}\,\,w_i \tag{13}\]

the coefficients are conveniently named \(A_{ij}\). See Equation 13 as a special case of Equation 6, applied to the vectors \(f(v_j)\).

Comparing Equation 13 and Equation 12 we see that:

\[ \begin{matrix} A_{11} = 9 & A_{12}=-2\\ A_{21} = -3 & A_{22} = 1 \end{matrix} \]

The numbers \(A_{ij}\) tell us how much \(f(v_j)\) looks like \(w_i\) just as the \(y_i\)’s did in Equation 9. Of course, unlike the \(y_i\)’s which were specific for the \(w\) vector, the \(A_{ij}\)’s have an additional index \(j\) to remind us that we are currently expressing \(f(v_j)\). As a result \(i=1,2,..,m\) and \(j=1,2,..,n\), where \(m=\dim \mathbb{W}\) is the dimension of the codomain and \(n=\dim \mathbb{V}\) is that of the domain. In the present example both \(n=m=2\).

We can write the \(n\) results in Equation 13 in a compact manner as:

\[ (f(v_1),..,f(v_n))=:f(\mathbf{B})=\mathbf{C}A \tag{14}\]

where the \(A\) is a matrix with entries \(A_{ij}\).

Equation 14 for our particular case turns out to be:

\[ (f(v_1),f(v_2))=:f(\mathbf{B})=\begin{pmatrix}w_1 & w_2 \end{pmatrix}\begin{pmatrix}9 & -2\\-3 & 1\end{pmatrix}=\begin{pmatrix} 9w_1-3w_2 & -2w_1+w_2\end{pmatrix} \tag{15}\]
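The same computation in code: each column of \(A\) comes from one linear solve against the codomain basis. A sketch, continuing the hypothetical column-matrix convention from before:

```python
import numpy as np

def f(p):
    """f(x, y) = (2x - y, x + y)."""
    x, y = p
    return np.array([2 * x - y, x + y])

B = np.array([[1, 0], [-4, 1]])   # columns v1, v2 (domain basis)
C = np.array([[1, 1], [0, 1]])    # columns w1, w2 (codomain basis)

# Column j of A holds the coordinates of f(v_j) in the basis C (Equation 13):
# solve C @ A[:, j] = f(v_j) for every j at once.
A = np.linalg.solve(C, np.column_stack([f(B[:, j]) for j in range(B.shape[1])]))
print(A)   # [[ 9. -2.]
           #  [-3.  1.]]
```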

Step 3: Mapping a generic vector

We can act with \(f(x,y)=(2x-y,x+y)\) on any vector of \(\mathbb{V}=\mathbb{R}^2\) directly; for example, acting with \(f\) on \(v=(1,3)\) gives us:

\[ f(1,3)=(2\cdot 1-3,1+3)=(-1,4) \tag{16}\]

However, what we want is to do something quite different from the calculation in Equation 16, which is to represent how \(f\) acts on any element of \(\mathbb{V}\) expressed in terms of \(\mathbf{B}\) and return the output expressed in terms of the chosen \(\mathbf{C}\). Steps 1 and 2 allow us to do that.

We express \(v\) in terms of \(\mathbf{B}\):

\[ v=\begin{pmatrix}v_1 & v_2 \end{pmatrix} \begin{pmatrix}x_1\\x_2\end{pmatrix}\qquad\text{with}\qquad x_1=1\,\,\, x_2=7 \]

Act with \(f\) on it:

\[ f\left(\begin{pmatrix}v_1 & v_2 \end{pmatrix}\begin{pmatrix}1\\7\end{pmatrix}\right)=f\begin{pmatrix}v_1 & v_2 \end{pmatrix}\begin{pmatrix}1\\7\end{pmatrix}=\begin{pmatrix}w_1 & w_2 \end{pmatrix}\begin{pmatrix}9 & -2\\-3 & 1\end{pmatrix}\begin{pmatrix}1\\7\end{pmatrix} = \begin{pmatrix}w_1 & w_2 \end{pmatrix}\begin{pmatrix}-5\\4\end{pmatrix} \tag{17}\]

where in the first step we used linearity and in the second step we used how \(f\) acts on the basis, see Equation 15.

The vector \(\mathbf{B}\begin{pmatrix}1\\7\end{pmatrix}\) is mapped into the vector \(\mathbf{C}\begin{pmatrix}-5\\4\end{pmatrix}\) by the function \(f\). As we can check, the output \((-1,4)\) we had in Equation 16 is just the rhs of Equation 17:

\[ \begin{pmatrix}w_1 & w_2 \end{pmatrix}\begin{pmatrix}-5\\4\end{pmatrix} =-5w_1+4w_2 = -5\begin{pmatrix}1\\0 \end{pmatrix}+4 \begin{pmatrix}1\\1 \end{pmatrix}=\begin{pmatrix}-1\\4 \end{pmatrix} \]
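Putting the three steps together numerically (a sketch continuing the same NumPy convention as before):

```python
import numpy as np

B = np.array([[1, 0], [-4, 1]])   # domain basis as columns
C = np.array([[1, 1], [0, 1]])    # codomain basis as columns
A = np.array([[9, -2], [-3, 1]])  # matrix of f in these bases

X = np.array([1, 7])   # coordinates of v = (1, 3) w.r.t. B
Y = A @ X              # coordinates of f(v) w.r.t. C, as in Equation 17
print(Y)               # [-5  4]
print(C @ Y)           # [-1  4]  ->  agrees with f(1, 3) computed directly
```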

In general, the formula Equation 17 can be compactly written as:

\[ f(v)=f(\mathbf{B}X)=f(\mathbf{B})X=\mathbf{C}AX \tag{18}\]

Details

The Equation 18 is an abbreviation for:

\[ \begin{align}f(v)&=f\left(\sum_{j=1}^n x_jv_j\right)\\&=\sum_{j=1}^n x_jf(v_j)\\&=\sum_{j=1}^n x_j\sum_{i=1}^m A_{ij}\,\,w_i\\&=\sum_{i=1}^m\left(\sum_{j=1}^n A_{ij}x_j\right)w_i\end{align} \]

where the last line is the expanded version of \(\mathbf{C} AX\).

What do steps 1, 2, 3 above have to do with \(Ax=b\)?

Starting from Equation 18, with further calculation, we can see the connection.

Since \(f(v)\) is in the codomain \(\mathbb{W}\), then it is some l.c. of the \(w\)’s, recall Equation 6. Therefore:

\[ f(v)=\sum_{i=1}^m y_i w_i =\mathbf{C}Y, \qquad Y=\begin{pmatrix}y_1\\\vdots\\y_m\end{pmatrix} \tag{19}\]

Comparing Equation 19 with Equation 18:

\[ \mathbf{C}Y = \mathbf{C}A X \tag{20}\]

Which implies:

\[ Y=AX \]

Which is the \(A\mathbf{x} =\mathbf{b}\) we have been dealing with.

Details

Equation 20 is:

\[ \begin{align} \sum_{i=1}^m y_iw_i=\sum_{i=1}^m\left(\sum_{j=1}^n A_{ij}x_j\right)w_i\\ \implies y_i=\sum_{j=1}^n A_{ij}x_j \end{align} \]

This calculation tells us something important: from the coefficients of \(v\) in the chosen basis of the domain of \(f\), it gives us the coefficients of \(f(v)\) in the chosen basis of the codomain. Let's read the meaning of this formula more carefully: the \(x\)'s tell us how much the input vector \(v\) looks like each one of the basis vectors \(v_j\). The numbers \(A_{ij}\) quantify how much \(f(v_j)\) looks like the basis vector \(w_i\). This means that the calculation \(\sum_{j=1}^n A_{ij}x_j\) can be understood like:

[how much \(f(v)\) looks like \(w_i\)] = [how much \(f(v_1)\) looks like \(w_i\)] \(\times\) [how much \(v\) looks like \(v_1\)] \(+\) [how much \(f(v_2)\) looks like \(w_i\)] \(\times\) [how much \(v\) looks like \(v_2\)] \(+\dots +\) [how much \(f(v_n)\) looks like \(w_i\)] \(\times\) [how much \(v\) looks like \(v_n\)]

It's a weighted sum of the representation of \(v\)!

This understanding is clear because we know exactly which bases we are speaking of; after all, the first step was to make the choice of bases.
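To make the weighted-sum reading concrete, here is the same computation written as an explicit double loop (a sketch; it just re-derives \(Y=AX\) entry by entry for our running example):

```python
import numpy as np

A = np.array([[9, -2], [-3, 1]])
X = np.array([1, 7])
m, n = A.shape

# y_i = sum_j A_ij x_j : the weighted sum described above, spelled out.
Y = np.zeros(m)
for i in range(m):
    for j in range(n):
        Y[i] += A[i, j] * X[j]
print(Y)   # [-5.  4.]  -- same as A @ X
```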

Commentary

In these sections we have started with a specific function \(f\), specific sets \(\mathbb{V}\) and \(\mathbb{W}\), and specific bases. Notice however that the only essential ingredient for the whole derivation is to know how \(f\) acting on the basis of the domain is expressed in the basis of the codomain! Knowing this, together with the property \(f(au+bv) = a f (u) + b f(v)\), allowed us to recast the procedure \(f\) in matrix form. These three steps are based on the idea that if we know how a linear function \(f\) maps the chosen basis of the domain, we know how it maps any vector of the domain. And this is possible because \(f\) has the linearity property.

Isomorphism and stuff

The hypervector \(\mathbf{B}\) (and \(\mathbf{C}\)) allows us to make a bijective correspondence between vectors in \(\mathbb{R}^n\) (and \(\mathbb{R}^m\)) and vectors in \(\mathbb{V}\) (and \(\mathbb{W}\)). A vector \(X\in\mathbb{R}^n\) is transformed into \(v\in\mathbb{V}\) by dotting the column vector \(X\) with the hypervector \(\mathbf{B}\):

\[ v=\mathbf{B}X \]

same thing with \(Y\) and \(w\):

\[ w=\mathbf{C}Y \]

We picture this correspondence by the vertical arrows in the following picture:

Subspaces of \(\mathbb{R}^n\) are translated into subspaces of \(\mathbb{V}\), and this has important consequences. If \(\mathbb{R}^n=\alpha\oplus\beta\) is decomposed as a direct sum of subspaces \(\alpha\) and \(\beta\), then multiplying the vectors in them by the hypervector \(\mathbf{B}\) yields the decomposition \(\mathbb{V}=(\mathbf{B}\alpha) \oplus(\mathbf{B}\beta)\). [Warning: if \(\beta=\alpha^\perp\) it may happen that it is not the case that \((\mathbf{B}\beta)=(\mathbf{B}\alpha)^\perp\).]

Now, introduce a map \(f\) between elements of \(\mathbb{V}\) and \(\mathbb{W}\), as was our \(f(x,y)=(2x-y,x+y)\); this is represented by the bottom arrow. Since the elements \(v\) are being mapped into \(w\), and both can be translated uniquely into \(X\) and \(Y\), we naturally expect that there is also a map, called \(A\), that directly connects \(X\) with \(Y\).

If \(f\) links \(\mathbf{B}X\) with \(\mathbf{C}Y\), then \(A\) links \(X\) with \(Y\). How? Answer: \(Y=AX\).

See the diagram below:

We have been talking about maps of individual vectors, but we can also do it for subspaces. A subspace \(V\subset_{vec}\mathbb{V}\) is mapped by \(f\) into the subspace \(f(V)\subset_{vec} \mathbb{W}\). Here is an intuitive justification: if \(e_1\) and \(e_2\) are a basis for \(V\), then any l.c. of them gives us a vector in \(V\); this is expected, after all \(V\) is a vector space. But is it true that every vector of \(f(V)\) is a l.c. of \(f(e_1)\) and \(f(e_2)\)? Well, a generic vector in \(f(V)\) must have the form \(f(ae_1+be_2)\), but since \(f\) is linear this in turn means \(af(e_1)+bf(e_2)\), which tells us that any element of \(f(V)\) is indeed a l.c. of \(f(e_1)\) and \(f(e_2)\).

Thus: a linear function \(f\) maps subspaces of \(\mathbb{V}\) into subspaces of \(\mathbb{W}\).

And since subspaces of \(\mathbb{V}\) and \(\mathbb{W}\) in turn have a version in \(\mathbb{R}^n\) and \(\mathbb{R}^m\), the matrix \(A\) maps those as well.

Thus: a linear function \(f\) maps subspaces of \(\mathbb{V}\) into subspaces of \(\mathbb{W}\). And correspondingly, its matrix representation \(A\) maps subspaces of \(\mathbb{R}^n\) into the corresponding subspaces of \(\mathbb{R}^m\).

Revisiting two important subspaces

In the domain and codomain of a map \(f:\mathbb{V}\overset{\sim}{\longrightarrow}\mathbb{W}\) we find two important subspaces.

Kernel of a linear function

Definition 2 Let \(f:\mathbb{V}\overset{\sim}{\longrightarrow} \mathbb{W}\), then the kernel of \(f\) is the subset of the domain \(\mathbb{V}\):

\[ \ker f = \{v_N\in\mathbb{V}\,\,|\,\,f(v_N)=0\} \]

Moreover, it is in fact a subspace.

With bases introduced in both spaces we can write this set as:

\[ \ker f = \{\mathbf{B}X_N\in\mathbb{V}\,\,|\,\,\mathbf{C}AX_N=\mathbf{C}0\} \]

After solving \(AX_N=0\) we multiply the solutions by \(\mathbf{B}\), giving us \(\ker f\).

Notice that the solutions of \(AX_N=0\) constitute the nullspace of the matrix \(A\). The difference between \(N(A)\) and \(\ker f\) is a multiplication by \(\mathbf{B}\):

\[ \ker f = \mathbf{B} N(A) \]

See the diagram above.
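Computationally, one possible route is to find an (orthonormal) basis of \(N(A)\) and then multiply by \(\mathbf{B}\). A sketch follows; the singular matrix below is made up just so the nullspace is nontrivial, since the \(A\) of our running example is invertible and has \(N(A)=\{0\}\).

```python
import numpy as np

def null_space(A, tol=1e-12):
    """Orthonormal basis for N(A), returned as columns, computed via the SVD."""
    _, s, vt = np.linalg.svd(A)
    rank = int((s > tol).sum())
    return vt[rank:].T

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])               # hypothetical rank-1 matrix
B = np.array([[1.0, 0.0], [-4.0, 1.0]])  # domain basis as columns

N = null_space(A)   # basis of N(A) in R^n coordinates
ker_f = B @ N       # ker f = B N(A): the same vectors expressed in V
print(N.ravel(), ker_f.ravel())
```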

Image of the domain (range of linear functions)

Definition 3 Let \(f:\mathbb{V}\overset{\sim}{\longrightarrow} \mathbb{W}\), then the range (image) of \(f\) is the following subset of the codomain \(\mathbb{W}\):

\[ f(\mathbb{V}) = \{f(u)\in\mathbb{W}\,\,|\,\,u\in\mathbb{V}\} \]

Moreover, it is also a subspace.

Upon introducing bases, this set can be reformulated as:

\[ f(\mathbb{V}) = \{\mathbf{C}AX\in\mathbb{W}\,\,|\,\,X\in\mathbb{R}^{\dim \mathbb{V}}\} \]

To compute this vector space we first find the column space of \(A\); multiplying it by \(\mathbf{C}\) gives us the image!

\[ f(\mathbb{V})=\mathbf{C} C(A) \]

See the diagram above.
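And similarly for the image (same sketch conventions as above, again with a made-up rank-1 matrix so the column space is a proper subspace):

```python
import numpy as np

def column_space(A, tol=1e-12):
    """Orthonormal basis for C(A), returned as columns, computed via the SVD."""
    u, s, _ = np.linalg.svd(A)
    rank = int((s > tol).sum())
    return u[:, :rank]

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])              # same hypothetical rank-1 matrix
C = np.array([[1.0, 1.0], [0.0, 1.0]])  # codomain basis as columns

col = column_space(A)   # basis of C(A) in R^m coordinates
image = C @ col         # f(V) = C C(A): the same vectors expressed in W
print(col.ravel(), image.ravel())
```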

Commentary

We know that:

  • \(\mathbb{R}^n=C(A^\intercal) \oplus N(A)\), which translates into \(\mathbb{V}=(\mathbf{B}C(A^\intercal)) \oplus (\mathbf{B}N(A))\)

  • \(\mathbb{R}^m=C(A) \oplus N(A^\intercal)\) which translates into \(\mathbb{W}=(\mathbf{C}C(A)) \oplus (\mathbf{C}N(A^\intercal))\)

  • \(C(A^\intercal) \perp N(A)\) and \(C(A) \perp N(A^\intercal)\), always! But the same is not true after multiplying by \(\mathbf{B}\) and \(\mathbf{C}\), unless the chosen bases \(\mathbf{B}\) and \(\mathbf{C}\) are orthogonal (e.g. the canonical basis).