APPLIED MATHEMATICS 1A (ENG)
Mathematics 132: Vectors and Matrices
University of KwaZulu-Natal, Pietermaritzburg
© C. Zaverdinos, 2010. All rights reserved. No part of this book may be reproduced, in any form or by any means, without permission in writing from the author.

Preface

This course has grown out of lectures going back well over 30 years. Names which come to mind are J. Nevin, R. Watson, G. Parish and J. Raftery. I have been involved with teaching linear algebra to engineers since the early 90s and some of my ideas have been incorporated in the course. What is new in these notes is mainly my approach to the theoretical side of the subject. Several of the numerical examples and exercises come from earlier notes, as well as some theoretical ones. I would like to thank Dr. Paddy Ewer for showing me how to use the program LatexCad for the diagrams and especially the Reverend George Parish for proof-reading and also for elucidating No.5 of Exercise 2.

About this course

The course is meant to be studied as a whole. Many examples and exercises in later chapters refer the reader to earlier chapters. This is especially true of Chapter 3.

Chapter 1 motivates the idea of a vector through geometry and discusses lines and planes and transformations related to such geometric objects. A vector can be thought of as a displacement in space and as an ordered triple of numbers.

Chapter 2 generalizes the idea of a triple to an n-tuple and motivates linear algebra through the problem of finding solutions to simultaneous linear equations in n unknowns. The rectangular array of coefficients of such equations is known as a matrix. Simplification of such a matrix by row operations forms the major part of this chapter.

Chapter 3 considers matrices in detail and looks at them dynamically, in the converse sense of Chapter 2: a matrix defines a transformation of points in n-dimensional space. Matrix multiplication is introduced in terms of the composition of such transformations, and some other key concepts such as linear independence, the rank and inverse of a matrix are discussed. Abstract vector spaces are never mentioned, but the proof of the basic theorem in 3.4.22 on linear independence goes through word-for-word for such spaces, which also leads to the well-known Replacement Theorem (see No.14 of Exercise 93).

Chapter 4 is about determinants and the cross product (also called the vector product). The theory of determinants predates that of matrices, going back to Leibniz in the 17th Century. One of the founders of linear algebra, the 19th Century mathematician Arthur Cayley, once remarked that many things about determinants should really come after the study of matrices, and this is the modern approach adopted by us. The cross product is used extensively in mechanics, in particular in the notes Dynamics for Mathematics 142. Algebraic properties of the cross product are derived from those of 3 × 3 determinants, while the exercises can serve as an introduction to some of its applications.

Note 1 Some exercises (particularly those marked with an asterisk *) are harder and, at the discretion of the instructor, can be omitted or postponed to a later stage.

Bibliography

The following books can be consulted, but they should be used with caution since different authors have a variety of starting points and use different notation. Many books also have a tendency to become too abstract too early. Unless you are a mature reader, books can confuse rather than help you.

1. A.E. Hirst: Vectors in 2 or 3 Dimensions (Arnold).
This may be useful for our Chapter 1 since it makes 3-dimensional space its central theme.
2. R.B.J.T. Allenby: Linear Algebra (Arnold). This is a quite elementary text.
3. J.B. Fraleigh & R.A. Beauregard: Linear Algebra (Addison-Wesley).
4. K. Hardy: Linear Algebra for Engineers and Scientists (Prentice-Hall).
5. D. Lay: Linear Algebra and its Applications (3rd Edition, Pearson). This book treats matrix multiplication in much the same way as we do, but its treatment of geometric aspects is less thorough. It has over 560 pages, becomes abstract and advanced after p.230, but will probably be useful in later years of study.
6. H. Anton: Elementary Linear Algebra (6th Edition, John Wiley and Sons).
7. E.M. Landesman & M.R. Hestenes: Linear Algebra for Mathematics, Science and Engineering (Prentice-Hall). This is quite advanced.

I recommend especially (1) for Chapter 1 and (4) and (5) for later chapters of the notes. Because of its elementary nature, (2) is good for Chapter 2. Some of the above books have been used on the Durban campus and, if consulted with care, can be helpful.

C. Zaverdinos
Pietermaritzburg, January 21, 2010

Contents

1 Two and Three-Dimensional Analytic Geometry. Vectors ....... 7
2 Matrices and the Solution of Simultaneous Linear Equations ....... 41
3 Linear Transformations and Matrices ....... 65
4 Determinants and the Cross Product ....... 123

Chapter 1
Two and Three-Dimensional Analytic Geometry. Vectors

1.1 Points in three-dimensional space

The study of the geometry of lines and planes in space provides a good introduction to Linear Algebra. Geometry is a visual subject, but it also has an analytical aspect: the study of geometry using algebra, that is, how geometric problems can be expressed and solved algebraically. This geometry is also called Cartesian Geometry, after its founder, the 17th Century philosopher and mathematician René Descartes.

From school you are already familiar with the Cartesian plane. There are an x-axis and a y-axis meeting at right angles at the origin O. Every point A in the x-y plane is uniquely represented by an ordered pair (a, b) of real numbers, where a is the x-coordinate and b is the y-coordinate, as in Figure 1.1. We write A = (a, b). Here M is the foot of the perpendicular from A to the x-axis and likewise N is the foot of the perpendicular from A to the y-axis. Notice that M = (a, 0) and N = (0, b).

[Figure 1.1: the point A = (a, b) in the Cartesian plane, with M = (a, 0) on the x-axis and N = (0, b) on the y-axis.]

In order to represent a point A in space we add the z-axis, which is perpendicular to both the x- and y-axes, as in Figure 1.2. Here A = (a, b, c) is a typical point: a is the x-coordinate, b is the y-coordinate and c is the z-coordinate of the point A. In the diagram P is the foot of the perpendicular from A to the y-z plane. Similarly, Q and R are the feet of the perpendiculars from A to the z-x and x-y planes respectively.

[Figure 1.2: the point A = (a, b, c) in space, with feet of perpendiculars P, Q, R on the coordinate planes and L, M, N on the axes.]

1.1.1 The Corkscrew Rule and right-handed systems

You wish to open a bottle of wine using a corkscrew. You first turn the corkscrew so that it enters the cork. As observed by the opener the turning is clockwise, but observed from the underside of the bottle the sense is reversed: it is anticlockwise. The direction in which a corkscrew moves as it is turned is given by the Corkscrew Rule.
In Figure 1.2, rotating from the x-axis to the y-axis and applying this rule produces motion along the (positive) z-axis. Rotating from the y-axis to the z-axis and applying the same rule gives motion along the (positive) x-axis. Finally, rotating from the z-axis to the x-axis and applying the rule produces motion along the (positive) y-axis. The axes x, y, z (in that order) are said to form a right-hand system. This can be represented by the cyclic scheme

rotating x → y gives z,  rotating y → z gives x,  rotating z → x gives y.

Have a look at yourself in the mirror while opening a bottle of wine and observe that what is clockwise for you is anticlockwise for the person in the mirror. Thus a right-hand system of axes observed in a mirror does not follow the corkscrew rule and is called a left-hand system. The usual convention is to use only right-hand systems.

1.1.2 Displacements and vectors

Consider two numbers a and b on the real line ℜ. The difference c = b − a is the displacement from a to b. Given any two of the numbers a, b and c, the third is uniquely determined by the equation c = b − a. For example, let a = −2 and b = 3. The displacement from a to b is c = 3 − (−2) = 5. The displacement from 6 to 11 is also 5.

Let A = (a1, a2, a3) and B = (b1, b2, b3) be two points in space. The displacement from A to B is denoted by AB and is given by

AB = (b1 − a1, b2 − a2, b3 − a3)   (1.1)

1.1.3 Units and distance

The unit can be any agreed upon distance, like the meter or kilometer. Units (of length, time) are important in applications, but for the most part we won't specify them.

The distance of A = (a, b, c) from the origin O is given by

|OA| = √(a² + b² + c²)

To see this, look at Figure 1.2 and use the theorem of Pythagoras:

|OA|² = |OR|² + |RA|² = |OL|² + |LR|² + |RA|² = a² + b² + c²

The distance from A to B is written |AB| and is given by

|AB| = √((b1 − a1)² + (b2 − a2)² + (b3 − a3)²)   (1.2)

and is also called the length or magnitude of AB. Because the displacement AB has both direction and magnitude it is called a vector and pictured as an arrow going from A to B. The tail of the arrow is at A and its head is at B, as in Figure 1.3.

As an example, let A = (−1, 2.3, −4.5) m and B = (2.1, 3.6, −2.5) m. Then AB = (2.1 − (−1), 3.6 − 2.3, −2.5 − (−4.5)) = (3.1, 1.3, 2.0) m.

Two displacement vectors AB and CD are equal if they are equal as displacements. Hence if C = (−3.5, 1.1, 3.3) and D = (−.4, 2.4, 5.3) and A, B are as above, then CD = (−.4, 2.4, 5.3) − (−3.5, 1.1, 3.3) = (3.1, 1.3, 2.0) = AB. The magnitude of this vector is |AB| = |CD| = √((3.1)² + (1.3)² + (2.0)²) ≈ 3.9115 m.

Various vectors (displacement, velocity, acceleration, force) play an important role in mechanics and have their own units. For example, if positions are measured in meters and time is measured in seconds, a velocity vector will have units ms⁻¹. So if a particle moves in such a way that each second it displaces (−1, 2, 3) meters, we say its velocity is constantly (−1, 2, 3) ms⁻¹.

Depending on the geometric interpretation, the tail (head) of a displacement vector may be at any point. Vectors are often written as u = (u1, u2, u3), v = (v1, v2, v3), a = (a1, a2, a3), b = (b1, b2, b3) etc.

We now have three ways of expressing the position of a point A in space. If A has coordinates a = (a1, a2, a3), then

A = (a1, a2, a3) = a = OA   (1.3)

Although equation (1.3) identifies a point with its position, geometrically speaking we like to distinguish the point itself from its position.
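The displacement and distance formulas (1.1) and (1.2) translate directly into code. The following is a minimal Python sketch (mine, not part of the original notes; the function names are my own) that checks the numbers of the example above.

```python
import math

def displacement(A, B):
    """Displacement AB = B - A, componentwise (equation (1.1))."""
    return tuple(b - a for a, b in zip(A, B))

def magnitude(u):
    """Length |u| = sqrt(u1^2 + u2^2 + u3^2) (equation (1.2))."""
    return math.sqrt(sum(x * x for x in u))

A, B = (-1, 2.3, -4.5), (2.1, 3.6, -2.5)
C, D = (-3.5, 1.1, 3.3), (-0.4, 2.4, 5.3)

AB = displacement(A, B)           # (3.1, 1.3, 2.0), up to rounding
CD = displacement(C, D)           # equal to AB as a displacement
print(AB, CD, magnitude(AB))      # magnitude is about 3.9115
```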
The vector OA is called the position vector of the point A.

[Figure 1.3: the vector AB drawn from its tail A to its head B in space.]

When we say "a is the position vector of point A" it is understood that the tail of a is at the origin and its head is at the point A. Our convention is that we use capital letters A, B, ... to denote points in space. Given the point C, there is a unique point D in space such that CD = AB (see Exercise 2, No.4). There is also a unique point D such that OD = AB, and we may write D = AB. In that case the tail of D can only be O, and it is better to use the notation OD to bring out the vectorial nature of the displacement from O to D.

1.1.4 More systematic notational convention

Instead of speaking of the x-, y- and z-axis, we also refer to these as the x1-, x2- and x3-axis respectively. Accordingly, given vectors a, b, c, ... it will be convenient to assume (unless otherwise stated) that a = (a1, a2, a3), b = (b1, b2, b3), c = (c1, c2, c3), .... Also by convention, points A, P, Q, ... have position vectors a, p, q, ..., unless otherwise specified.

1.1.5 Addition of vectors

Let u = (u1, u2, u3) and v = (v1, v2, v3) be two vectors. Their sum u + v is defined by the equation

u + v = (u1 + v1, u2 + v2, u3 + v3)

The vector −v is defined as −v = (−v1, −v2, −v3) and the difference

u − v = u + (−v) = (u1 − v1, u2 − v2, u3 − v3)

The zero vector is 0 = (0, 0, 0).

1.1.6 Basic properties of addition of vectors

For all vectors u, v and w,

1. u + (v + w) = (u + v) + w  (associative law)
2. u + v = v + u  (commutative law)
3. 0 + u = u  (the vector 0 = (0, 0, 0) behaves like the number 0)
4. u + (−u) = 0

An important result for addition is the following:

1.1.7 Geometrical interpretation: The triangle law for addition of vectors

AB + BC = AC

To see this, let the position vectors of A, B and C be a, b and c respectively. Then AB = b − a and BC = c − b. Hence by the properties of addition,

AB + BC = (b − a) + (c − b) = (b − b) + (c − a) = 0 + c − a = c − a = AC

Figure 1.4 illustrates this geometrical result, in which it is understood that the head B of AB is the tail of BC etc. The point D is chosen so that AD = BC. It follows that DC = AB (why?). We then have a geometric interpretation of the above commutative law (known as the parallelogram law for addition of vectors):

BC + AB = AD + DC = AC = AB + BC

[Figure 1.4: the triangle law AC = AB + BC, and the parallelogram ABCD with AB = DC and AD = BC.]
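As a quick numerical illustration of the triangle law (a throwaway Python sketch of mine, not from the notes): with points as tuples, AB + BC and AC agree componentwise.

```python
def displacement(P, Q):
    # Displacement from P to Q: Q - P componentwise
    return tuple(q - p for p, q in zip(P, Q))

def add(u, v):
    # Vector sum u + v componentwise
    return tuple(a + b for a, b in zip(u, v))

A, B, C = (1, 2, -3), (-1, 1, 2), (5, 4, -13)
AB, BC, AC = displacement(A, B), displacement(B, C), displacement(A, C)
assert add(AB, BC) == AC   # triangle law: AB + BC = AC
```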
1.1.8 Multiplication of vectors by scalars. Direction of vectors. Parallel vectors

In contrast to a vector, a scalar is just a real number, i.e. an element of the real number system ℜ. We use α, β, a, b, r, s, t, ... to denote scalars. Note that f (f underlined) denotes a vector while f is a scalar.

Let u = (u1, u2, u3) be a vector and α ∈ ℜ a scalar. The product αu is defined by

αu = (αu1, αu2, αu3)

The vector αu is said to be a (scalar) multiple of u. We usually omit the qualification 'scalar'.

1.1.9 Properties of the product αu

For all scalars α, β and vectors u and v,

1. α(u + v) = αu + αv
2. (α + β)u = αu + βu
3. α(βu) = (αβ)u
4. 1u = u
5. αu = 0 if, and only if, α = 0 or u = 0
6. |αu| = |α||u|  (the length of αu is |α| times the length of u)

The proofs of the first four are left to you. We prove the last:

|αu| = |(αu1, αu2, αu3)|
= √((αu1)² + (αu2)² + (αu3)²)
= √(α²u1² + α²u2² + α²u3²)
= √(α²(u1² + u2² + u3²))
= √α² √(u1² + u2² + u3²)
= |α| √(u1² + u2² + u3²)
= |α||u|

1.1.10 Geometric interpretation of αu. Parallel vectors

Since |αu| = |α||u|, multiplication of u by the scalar α multiplies the length of u by |α|. Let u be non-zero, so that it has a definite direction. If α > 0 the direction of αu is the same as that of u, while if α < 0 the direction of αu is opposite to that of u.

Let u and v be non-zero vectors. We say they have the same direction if u = αv for some scalar α > 0, and opposite directions if u = αv for some α < 0. The vector u is parallel to v if u is a multiple of v, that is, u = αv for some scalar α. Necessarily α ≠ 0, and we write u ∥ v to express this relation. Notice that we only speak of vectors being parallel if they are non-zero.

1.1.11 Properties of parallelism

For all non-zero u, v and w,

1. u ∥ u
2. u ∥ v implies v ∥ u
3. u ∥ v and v ∥ w together imply u ∥ w.

These properties are geometrically evident, while analytic proofs are left as an exercise. For example, to see property (2) analytically, let u = αv. As α ≠ 0, v = (1/α)u and v ∥ u, as expected.

Example 1.1.1 Let u = AB, where A is the point (−1, 7, −4) and B is the point (−4, 11, 1), so that u = (−3, 4, 5). Let v = CD, where C = (2, 9, −11) and D = (−7, 21, 4), so v = CD = (−9, 12, 15) = 3u. Hence u and v have the same direction, while u and −v have opposite directions but are still parallel.

1.1.12 The dot (scalar) product of two vectors

Let u = (u1, u2, u3) and v = (v1, v2, v3) be any two vectors. Their dot product u · v is defined as

u · v = u1v1 + u2v2 + u3v3   (1.4)

Note that the dot product of two vectors is a scalar and not a vector (that is why it is also called the scalar product).

1.1.13 Properties of the dot product

For all scalars α, β and vectors u, v and w,

1. u · v = v · u
2. α(u · v) = (αu) · v = u · (αv)
3. (αu + βv) · w = (αu · w) + (βv · w)
4. u · u = |u|², so |u| = +√(u · u)
5. u · u ≥ 0, and u · u = 0 if, and only if, u = 0.

1.1.14 Geometric interpretation of the dot product

Let u = AB and v = AC be non-zero vectors. Suppose that AB and AC make an angle θ between them at the vertex A, where 0 ≤ θ ≤ 180 in degrees, or 0 ≤ θ ≤ π in radians (recall that 180 degrees = π radians). Then

u · v = |u||v| cos θ   (1.5)

To see this result, see Figure 1.5 and use AB + BC = AC, so that BC = AC − AB = v − u and

|BC|² = (v − u) · (v − u) = v · v + (−u) · (−u) − v · u − u · v = |v|² + |u|² − 2u · v

Then by the cosine rule (which applies whether θ is acute or obtuse),

|BC|² = |AB|² + |AC|² − 2|AB||AC| cos θ = |u|² + |v|² − 2|u||v| cos θ

Cancelling |v|² + |u|² from the two expressions for |BC|² leads to the required result (1.5).

[Figure 1.5: the angle θ between u = AB and v = AC, shown acute in one panel and obtuse in the other.]

Example 1.1.2 Let A = (1, 0, 1), B = (2, 0, 3), C = (4, √10, 2) and D = (2, 2, −2). Find the angle θ between AB and AC. Decide if the angle φ between AB and AD is acute or obtuse.

Solution: AB = (2, 0, 3) − (1, 0, 1) = (1, 0, 2) and AC = (4, √10, 2) − (1, 0, 1) = (3, √10, 1). Hence

cos θ = (AB · AC)/(|AB||AC|) = ((1, 0, 2) · (3, √10, 1))/(√(1² + 0² + 2²) √(3² + (√10)² + 1²)) = 5/(√5 √20) = 1/2

and θ = 60 degrees or π/3 radians.
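Equation (1.5) gives a direct recipe for computing angles: cos θ = u · v / (|u||v|). Here is a small Python sketch of mine reproducing the first part of Example 1.1.2.

```python
import math

def dot(u, v):
    # Dot product u . v = u1*v1 + u2*v2 + u3*v3 (equation (1.4))
    return sum(a * b for a, b in zip(u, v))

def angle(u, v):
    # Angle between non-zero u and v, from cos(theta) = u.v / (|u||v|)
    return math.acos(dot(u, v) / math.sqrt(dot(u, u) * dot(v, v)))

AB = (1, 0, 2)
AC = (3, math.sqrt(10), 1)
AD = (1, 2, -3)
print(math.degrees(angle(AB, AC)))   # 60 degrees, up to rounding
print(dot(AB, AD) < 0)               # True: negative dot product, obtuse angle
```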
Similarly, AD = (1, 2, −3) and

cos φ = (AB · AD)/(|AB||AD|) = ((1, 0, 2) · (1, 2, −3))/(√(1² + 0² + 2²) √(1² + 2² + (−3)²)) = −5/(√5 √14) = −√(5/14)

Since the dot product is negative, the angle φ is obtuse.

1.1.15 Some further properties of the dot product

1. The non-zero vectors AB and AC are at right angles (are perpendicular, AB ⊥ AC) if, and only if, AB · AC = 0. Such vectors are also called mutually orthogonal.
2. |u · v| ≤ |u||v| for any vectors u and v (Cauchy–Schwarz inequality).
3. |u + v| ≤ |u| + |v| for any vectors u and v (triangle inequality).

The first result follows from the fact that AB ⊥ AC if, and only if, cos θ = 0. The second result is left as an exercise. To see the third result, use (2) and consider

|u + v|² = (u + v) · (u + v) = u · u + v · v + 2u · v ≤ u · u + v · v + 2|u||v| = |u|² + |v|² + 2|u||v| = (|u| + |v|)²

As |u + v|² ≤ (|u| + |v|)², it follows that |u + v| ≤ |u| + |v|.

Example 1.1.3 Find a non-zero vector perpendicular to both u = (2, 1, −1) and v = (3, 3, −1).

Solution: Let x = (x1, x2, x3) satisfy x ⊥ u and x ⊥ v. Then

x · u = 2x1 + x2 − x3 = 0
x · v = 3x1 + 3x2 − x3 = 0

By subtraction, −x1 − 2x2 = 0, or x1 = −2x2. Thus from the first equation, x3 = 2x1 + x2 = −3x2. We may let x2 be any non-zero number, e.g. x2 = 1. Then x = (−2, 1, −3) is perpendicular to u and to v. We will see later why it is always possible to find a vector x ≠ 0 perpendicular to two non-zero vectors. This fact will also come into our study of planes (see 1.3.4).

1.1.16 Unit vectors. Components. Projections

A vector u of length 1 is called a unit vector. For example, u = (3/5, 0, −4/5) is a unit vector. A vector c ≠ 0 defines the vector

ĉ = (1/|c|) c

which is the unique unit vector having the same direction as c. To see that ĉ has length 1 we note

|ĉ|² = ĉ · ĉ = (1/|c|²)(c · c) = |c|²/|c|² = 1

The component of the vector f in the direction of (or along) the non-zero vector c is defined as f · ĉ. Suppose that f ≠ 0 and that the angle between f and c is θ. Then

f · ĉ = |f| cos θ   (1.6)

To see equation (1.6), use equation (1.5):

f · ĉ = f · ((1/|c|) c) = (f · c)/|c| = (|f||c| cos θ)/|c| = |f| cos θ

[Figure 1.6: resolving f into a component |f| cos θ along c and a component |f| sin θ perpendicular to c.]

Notice that if π/2 < θ ≤ π then the component (1.6) is negative (as is only natural).

The projection of f on c is the vector

(f · ĉ) ĉ = ((f · c)/|c|²) c   (1.7)

[Figure 1.7: the projection of f on c (heavy arrow), shown for cos θ > 0 and for cos θ < 0.]

Let d ∥ c. Then the projection of f on c is the same as the projection of f on d. See Exercise 2, No.13a. The situation with components is slightly different: if d and c have the same direction then the components of f along c and d are the same. If d and c have opposite directions then the components of f along c and d are the same apart from sign - one is the negative of the other. We return to projections in subsection 1.2.5.

Example 1.1.4

1. If we imagine that c is pointing due East and f is a displacement, then (f · ĉ)ĉ is the Easterly displacement, while |f| cos θ is the Easterly component and |f| sin θ is the Northerly component of f. You may also think of f as a vector representing a force that is dragging a box placed at its tail. Then |f| sin θ is the lifting effect of f while |f| cos θ is the horizontal dragging effect of the force.

2. Let f = (11, −3, 4) and c = (5, −1, −2). The length of c is |c| = √(5² + (−1)² + (−2)²) = √30. The unit vector ĉ in the direction of c is

ĉ = (1/√30)(5, −1, −2)

The component of f along c is

f · ĉ = (11, −3, 4) · (1/√30)(5, −1, −2) = 50/√30 = (5/3)√30

and the projection of f on c is

(f · ĉ)ĉ = ((5/3)√30)(1/√30)(5, −1, −2) = (5/3)(5, −1, −2)
1.1.17 The vectors i, j, k

The vectors i, j and k are defined as i = (1, 0, 0), j = (0, 1, 0) and k = (0, 0, 1). They have unit length and are directed along the x-, y- and z-axis respectively. It is clear that they are also mutually orthogonal and that any vector a = (a1, a2, a3) has a unique expression

a = a1 i + a2 j + a3 k   (1.8)

The scalars a1, a2 and a3 are the components of a in the directions i, j and k respectively. See also No.15 of Exercise 2 below.

1.1.18 Linear combinations

Let a and b be two vectors and λ and µ two scalars. The vector x = λa + µb is called a linear combination of a and b with coefficients λ and µ. Similarly, for scalars r, s and t, the vector ra + sb + tc is a linear combination of a, b and c with coefficients r, s and t. Equation (1.8) shows that every vector is a linear combination of i, j and k.

If a = (−1, 2, 3), b = (5, −7, 1), c = (−4, 8, −1) then

(1/2)a + (2/3)b + (−1/3)c = (25/6, −19/3, 5/2)

is a linear combination of a, b and c.
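As a quick check of the arithmetic (a throwaway Python sketch of mine, using exact fractions):

```python
from fractions import Fraction as F

def linear_combination(coeffs, vectors):
    # Componentwise sum of coeffs[i] * vectors[i]
    return tuple(sum(c * x for c, x in zip(coeffs, col))
                 for col in zip(*vectors))

a, b, c = (-1, 2, 3), (5, -7, 1), (-4, 8, -1)
x = linear_combination([F(1, 2), F(2, 3), F(-1, 3)], [a, b, c])
print(x)   # components 25/6, -19/3, 5/2
```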
Exercise 2

1. In Figure 1.2 express ON, MP, OM, LP, NR, NM and OP as vectors involving a, b and c. Write down these vectors as linear combinations of i, j and k and find their lengths in terms of a, b and c.

2. Interpret the equation (r − (2, −1, 5)) · (r − (2, −1, 5)) = 49 geometrically.

3. Let u = (1, 3, −2), v = (2, 0, 1) and w = (3, −3, 4).
(a) Find the linear combinations u + 3v − 2w, (−5)u + 2v + 3w and (3/4)u − (1/2)v + (1/4)w.
(b) If the tail of u is the point A = (−2, −11, 13), what is the head of u? If the head of w is B = (2, −9, 10), what is the tail of w?
(c) Find coefficients α, β such that w = αu + βv.
(d) Is 3(3, −1, 3) a linear combination of u, v and w? This means you must look for scalars p, q, r such that pu + qv + rw = 3(3, −1, 3), and this involves three equations in three unknowns.

4. (Generalization of No.3b). Let u be a given vector. Show that if the tail A of u is given then the head B of u is uniquely determined. State and prove a similar result if the head of u is given. Draw a picture.

5. See Figure 1.8. Imagine that in each case the axes are part of a room in which you are standing and looking at the corner indicated. Decide which of the systems of mutually orthogonal axes are left-handed and which right-handed. If you are outside the room, what would your answer be?

[Figure 1.8: three corner views of mutually orthogonal axes labelled x, y, z.]

6. Let A0, ..., A9 be ten points in space and put am = Am−1Am for m = 1, ..., 9 and a10 = A9A0. Find the sum a1 + ··· + a10.

7. Complete the proofs in statements 1.1.6, 1.1.9 and 1.1.11 above.

Hint 3 These properties just reflect similar properties of numbers. For example, the commutative law of addition of vectors is proved by

u + v = (u1 + v1, u2 + v2, u3 + v3) = (v1 + u1, v2 + u2, v3 + u3) = v + u

The proof of the first item of 1.1.9 is

α(u + v) = α(u1 + v1, u2 + v2, u3 + v3)
= (α(u1 + v1), α(u2 + v2), α(u3 + v3))
= (αu1 + αv1, αu2 + αv2, αu3 + αv3)
= αu + αv

8. Complete the proofs of subsections 1.1.13 and 1.1.15.

9. Deduce that a dot product (αr + βs) · (γu + δv) of linear combinations multiplies out just like in ordinary arithmetic and conclude that |αu + βv|² = α²|u|² + β²|v|² + 2αβ u · v.

10. Show that if the vectors u and v in equation (1.5) are unit vectors, then cos θ = u · v.

11. Show that the non-zero vectors u and v are parallel if, and only if, |u · v| = |u||v|.
Solution: The vectors are parallel if, and only if, the angle between them is 0 or π radians.

12. Let a = (3, −1, 2) and b = (−7, 3, −1). Find
(a) The unit vectors â and b̂.
(b) The component of a along b and the component of b along a.
(c) The projection of a on b and the projection of b on a.

13. (a) If γ ≠ 0, show that the projections of f on c and f on γc are equal.
(b) Show that if f ≠ 0, then the projection of f on itself is f.

14. Let u and v be two vectors. The set of all linear combinations of u and v is called the span of the vectors and is denoted by sp(u, v). (This is discussed more fully in item 2 of subsection 3.1.2 of Chapter 3). Let u1, v1, u2, v2 be four vectors. Show that sp(u1, v1) ⊆ sp(u2, v2) if, and only if, each of the vectors u1 and v1 is a linear combination of the vectors u2 and v2.

15. Let l, m and n be three mutually orthogonal unit vectors. Placing the tails of these vectors at the origin, it is visually clear that we may use these vectors as a frame of reference in place of i, j and k. (An analytic proof of this will follow from work done in Chapter 3. See Exercise 78, No.9). Assuming this result, any vector a can be expressed as a linear combination

a = γ1 l + γ2 m + γ3 n

Show that necessarily the coefficients γ1, γ2, γ3 are the components (in the above sense) of a in the directions l, m and n respectively. In fact, γ1 l, γ2 m, γ3 n are the projections of the vector a on l, m and n respectively. Conclude that

|a|² = γ1² + γ2² + γ3²

Show that l = (1/√6)(1, −1, 2), m = (1/√3)(−1, 1, 1) and n = (1/√2)(1, 1, 0) is such a system of three mutually orthogonal unit vectors and find the coefficients γ1, γ2, γ3 if a = (−3, 2, 1).

Hint 4 Take the dot product of both sides of the above equation with each of l, m and n. The coefficients for the specific example are the components a · l = γ1 etc. A longer procedure to find the coefficients for the specific example is to use the method of 3d above.

16. What familiar result from Euclidean Geometry does |b − a| ≤ |a| + |b| represent?

17. Let u and v have the same length: |u| = |v|. Show that (u − v) · (u + v) = 0. In other words, if u − v and u + v are not zero they are at right angles.

18. How are our general results affected (and simplified) when one of the coordinates is restricted to be zero for all vectors under consideration?

1.2 The straight line

Two distinct points determine a line. Points that lie on one and the same line are called collinear. Let A, B and C be three distinct points. Then it is geometrically obvious that C lies on the (infinite) line through A and B if, and only if, AC is parallel to AB. We need to show that this axiom is symmetric with regard to A, B and C.
In other words, we must show that the statements AC ∥ AB, CB ∥ CA and BA ∥ BC are equivalent. It is left as an exercise to prove this equivalence algebraically. See Exercise 5, No.2. This axiom for the collinearity of A, B and C gives the clue on how to describe algebraically the line ℓ through A and B.

1.2.1 Generic (parametric) representations of ℓ

Let A and B be two distinct points. Then as t ∈ ℜ varies, the point R with position vector

r = r(t) = OA + tAB   (1.9)

varies over the entire straight line ℓ through A and B. As in Figure 1.9, from O jump onto the line at A and then go parallel to AB to R. The point R = A is obviously on ℓ and corresponds with t = 0. If R ≠ A we have AR = tAB for some t ≠ 0, so in general

r = OR = OA + AR = OA + tAB

When t = 1 we are at R = B.

[Figure 1.9: the line ℓ through A and B, with a general point R given by r = OR = OA + tAB.]

As r = r(t) represents a general point on the line ℓ, equation (1.9) is called a generic or parametric equation of the line through A and B. The variable t is called the parameter.

1.2.2 Some general observations

In what follows a = OA, b = OB, etc.

1. As t varies over the set ℜ of real numbers the point r = OR = a + t(b − a) of equation (1.9) varies continuously over the line ℓ. In this representation of the line ℓ we consider b − a = AB as the direction of ℓ. In Figure 1.9, as t increases R moves to the right and as t decreases R moves to the left.

2. We remarked that t = 0 corresponds to R = A. When t = 1 we have R = B. If t is between 0 and 1, say t = 1/2, we get

OR = a + (1/2)(b − a) = (1/2)(a + b)

Thus (1/2)(a + b) is the position vector of the point midway between A and B.

3. More generally, a point

r = r(t) = a + t(b − a) = (1 − t)a + tb

where 0 ≤ t ≤ 1 lies between A and B, and the set AB of these points is called the line segment joining A and B. If A ≠ B and 0 < t < 1, the point r(t) is said to lie strictly between A and B. The line segment AB is a set of points and must be distinguished from the vector AB and also from the distance |AB|. (See the sketch after this list for a numerical illustration.)

4. Let u ≠ 0 define a direction and suppose a is the position vector of the point A. Then a generic equation of the line through A with direction u is

r(t) = a + tu   (1.10)

A typical point a + tu on the line is called a generic point. The set ℓ of points R = r(t) satisfying a generic equation such as (1.10) as t varies over the real numbers is the analytic definition of a straight line.

5. A generic equation is not unique.

(a) If C ≠ D are two points on the line given by equation (1.9), then since these points also determine the line, another generic equation (now with parameter s) is

r(s) = OC + sCD

The parameters t and s are of course related. See Example 1.2.1 of subsection 1.2.3 and No.1b of Exercise 5.

(b) When do two generic equations like (1.10) define the same line? Let u and v be non-zero vectors and suppose that a + tu is a generic equation of line ℓ1 and b + sv a generic equation of line ℓ2. Then ℓ1 = ℓ2 if, and only if, u ∥ v and a − b is a multiple of u. (See No.5 of Exercise 5).
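The following short Python sketch (mine, not part of the notes) evaluates the generic point r(t) = (1 − t)a + tb of observations 2-3, using the points of Example 1.2.1 below:

```python
def point_on_line(a, b, t):
    # Generic point r(t) = (1 - t) a + t b on the line through A and B
    return tuple((1 - t) * ai + t * bi for ai, bi in zip(a, b))

a, b = (1, 2, 3), (-1, 1, 4)
print(point_on_line(a, b, 0))     # A itself
print(point_on_line(a, b, 1))     # B itself
print(point_on_line(a, b, 0.5))   # midpoint (0.0, 1.5, 3.5)
```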
1.2.3 Some illustrative Examples

Example 1.2.1 Find two generic equations for the line ℓ passing through A = (1, 2, 3) and B = (−1, 1, 4).

Solution: One generic equation is

r = (1, 2, 3) + t((−1, 1, 4) − (1, 2, 3)) = (1, 2, 3) + t(−2, −1, 1) = (1 − 2t, 2 − t, 3 + t)

Another is

r = (−1, 1, 4) + s((1, 2, 3) − (−1, 1, 4)) = (−1, 1, 4) + s(2, 1, −1) = (−1 + 2s, 1 + s, 4 − s)

In the second representation we have used r = OB + sBA, with parameter s. The relationship between s and t is s = 1 − t, as can easily be seen.

Example 1.2.2 Find the position vector of the point R lying on the line ℓ of Example 1.2.1 that is a distance (1/3)|AB| from A but on the side of A opposite to that of B.

Solution: To find R put t = −1/3 in OR = OA + tAB to get

OR = OA + (−1/3)AB = (1, 2, 3) + (−1/3)(−2, −1, 1) = (5/3, 7/3, 8/3)

Example 1.2.3 Find the foot Q of the perpendicular from the point P = (2, −1, 4) to the line ℓ of Example 1.2.1. Hence find the shortest distance between P and ℓ.

Solution: Let Q = (1 − 2t, 2 − t, 3 + t) be the required foot of the perpendicular from P = (2, −1, 4) to the line ℓ. Then PQ = (−1 − 2t, 3 − t, −1 + t) must be perpendicular to the direction (−2, −1, 1) of ℓ:

(−1 − 2t, 3 − t, −1 + t) · (−2, −1, 1) = 0

This gives t = 1/3 and so Q = q = (1/3, 5/3, 10/3). The shortest distance of P from ℓ is thus

|PQ| = |(−5/3, 8/3, −2/3)| = (1/3)√93

(See Figure 1.10).

[Figure 1.10: the perpendicular from P = (2, −1, 4) to the line r = (1 − 2t, 2 − t, 3 + t), meeting it at Q.]

Example 1.2.4 Find the foot Q of the perpendicular from a general point P = (p1, p2, p3) to the line r = a + tu if a = (1, 2, 3) and u = (−2, −1, 1).

Solution: Let t be such that Q has position vector a + tu. As before, we consider PQ = a + tu − p and must solve PQ · u = 0 for t and then substitute this value in a + tu to obtain Q. We have

(a + tu − p) · u = 0   (1.11)

Since PQ = (1 − 2t − p1, 2 − t − p2, 3 + t − p3), the condition PQ · u = 0 becomes

(1 − 2t − p1, 2 − t − p2, 3 + t − p3) · (−2, −1, 1) = −1 + 6t + 2p1 + p2 − p3 = 0

This gives t = (1/6)(1 − 2p1 − p2 + p3). Substituting this value of t in (1 − 2t, 2 − t, 3 + t) gives

q = (2/3 + (2/3)p1 + (1/3)p2 − (1/3)p3, 11/6 + (1/3)p1 + (1/6)p2 − (1/6)p3, 19/6 − (1/3)p1 − (1/6)p2 + (1/6)p3)

1.2.4 A general result for the foot of a perpendicular to a line

In Exercise 5, No.11 you are asked to show that the foot Q of the perpendicular from P to the line a + tu has position vector

q = a + (((p − a) · u)/|u|²) u   (1.12)

Example 1.2.5 Let us apply this formula to Example 1.2.3 above to find the foot of the perpendicular from P = (2, −1, 4) to the line (1, 2, 3) + t(−2, −1, 1).

Solution: We have |u|² = 6 and the foot has position vector

q = (1, 2, 3) + (1/6)(((2, −1, 4) − (1, 2, 3)) · (−2, −1, 1))(−2, −1, 1) = (1/3, 5/3, 10/3)

This is the same result as before.
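Formula (1.12) is mechanical enough to code directly. A small Python sketch (my own) that reproduces Examples 1.2.3 and 1.2.5:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def foot_of_perpendicular(p, a, u):
    # Equation (1.12): q = a + (((p - a) . u) / |u|^2) u
    t = dot(tuple(pi - ai for pi, ai in zip(p, a)), u) / dot(u, u)
    return tuple(ai + t * ui for ai, ui in zip(a, u))

p, a, u = (2, -1, 4), (1, 2, 3), (-2, -1, 1)
q = foot_of_perpendicular(p, a, u)
pq = tuple(qi - pi for qi, pi in zip(q, p))
print(q)                        # (1/3, 5/3, 10/3) as floats
print(math.sqrt(dot(pq, pq)))   # sqrt(93)/3, about 3.2146
```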
1.2.5 Projections on a line through the origin

When a line ℓ passes through the origin, the foot Q of the perpendicular from P to ℓ is called the projection of P on ℓ. The projection of p on the line tu is just the projection of p on the vector u. This is equation (1.7) with f = p and c = u:

q = ((p · u)/|u|²) u   (1.13)

Example 1.2.6 Find the projection of P = (p1, p2, p3) on the line t(−2, −1, 1).

Solution: In this case PQ = (−2t − p1, −t − p2, t − p3) and we can proceed as before by solving

(−2t − p1, −t − p2, t − p3) · (−2, −1, 1) = 6t + 2p1 + p2 − p3 = 0

for t and substituting into t(−2, −1, 1). It amounts to the same thing to use (1.13), and the required projection is

q = (((p1, p2, p3) · (−2, −1, 1))/|(−2, −1, 1)|²)(−2, −1, 1)
= ((−2p1 − p2 + p3)/6)(−2, −1, 1)
= ((2/3)p1 + (1/3)p2 − (1/3)p3, (1/3)p1 + (1/6)p2 − (1/6)p3, −(1/3)p1 − (1/6)p2 + (1/6)p3)   (1.14)

We will return to this example in Chapter 3 (see Example 3.3.1).

Example 1.2.7 A straight line lies in the x-y plane and passes through the origin. The line makes an angle of 30° with the positive x-axis. Find the foot Q of the projection from P = (p1, p2) on the line.

Solution: Let u = (cos(π/6), sin(π/6)) = (√3/2, 1/2). The parametric equation of the line is r = t(√3/2, 1/2) and the foot Q of the perpendicular from P to the line is the projection of p = (p1, p2) on the unit vector u:

q = ((√3/2)p1 + (1/2)p2)(√3/2, 1/2) = ((1/4)(3p1 + √3 p2), (1/4)(√3 p1 + p2))   (1.15)

We shall return to this example in subsection 1.4 below and again in Chapter 3, Example 3.2.1.
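Example 1.2.7 in code (a sketch of mine; the angle is given in radians):

```python
import math

def project_on_direction(p, theta):
    # Projection of p = (p1, p2) on the unit vector (cos theta, sin theta)
    u = (math.cos(theta), math.sin(theta))
    k = p[0] * u[0] + p[1] * u[1]        # p . u, with |u| = 1
    return (k * u[0], k * u[1])

p = (1.0, 2.0)
q = project_on_direction(p, math.pi / 6)   # 30 degrees
print(q)   # ((3 + 2*sqrt(3))/4, (sqrt(3) + 2)/4), about (1.6160, 0.9330)
```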
A well-known result on parallelograms

Example 1.2.8 Show that the diagonals of a parallelogram bisect each other.

Solution: In Figure 1.11, ABCD is a parallelogram with AB = DC and BC = AD. It is understood that the parallelogram is non-degenerate, in that no three of A, B, C, D are collinear.

[Figure 1.11: the parallelogram ABCD with AB = DC, AD = BC and X the midpoint of the diagonal AC.]

Let X be the midpoint of AC, so that AX = XC. Then AB + BX = AX = XC, so

BX = XC − AB = XC − DC = XC + CD = XD

Hence X is also the midpoint of the diagonal BD, so the diagonals bisect each other.

1.2.6 Intersecting, parallel and skew lines

Let ℓ1 and ℓ2, given by generic equations a1 + su1 and a2 + tu2 respectively, be two straight lines. They are parallel if u1 ∥ u2. Exactly one of the following holds (see Exercise 5, No.6* for a rigorous proof):

1. The lines are identical.
2. They are distinct and parallel.
3. They are not parallel and meet in a (unique) point.
4. They are not parallel and do not intersect (they are by definition skew).

A familiar example of skew lines is provided by a road passing under a railway bridge. A train and car can only crash at a level crossing! In the case of skew lines there will be points P on ℓ1 and Q on ℓ2 such that PQ is perpendicular to the directions of both lines and the distance |PQ| is minimum. See Figure 1.12.

[Figure 1.12: skew lines ℓ1 and ℓ2 with their common perpendicular PQ.]

Example 1.2.9 Let ℓ1 be the line through A = (0, 0, 0) and B = (1, 2, −1) and suppose ℓ2 is the line through C = (1, 1, 1) and D = (2, 4, 3). Are the lines skew? In any case, find the shortest distance between them.

Solution: Generic equations for ℓ1 and ℓ2 are t(1, 2, −1) and (1, 1, 1) + s(1, 3, 2) respectively. Since (1, 2, −1) ∦ (1, 3, 2), the lines are not parallel (and so cannot possibly be identical). They will meet if for some t and s

t(1, 2, −1) = (1, 1, 1) + s(1, 3, 2)

This means that the three equations t = 1 + s, 2t = 1 + 3s and −t = 1 + 2s must hold. From the first and last equations, 1 + s = −(1 + 2s) and thus s = −2/3, t = 1/3. But then 2t = 2/3 ≠ 1 + 3(−2/3) and the second equation fails. It follows that ℓ1 and ℓ2 are skew.

Let P = t(1, 2, −1) and Q = (1, 1, 1) + s(1, 3, 2) be the points where |PQ| is minimum. Then PQ = (1, 1, 1) + s(1, 3, 2) − t(1, 2, −1) is perpendicular to both directions (1, 2, −1) and (1, 3, 2). Hence:

((1, 1, 1) + s(1, 3, 2) − t(1, 2, −1)) · (1, 2, −1) = 2 + 5s − 6t = 0
((1, 1, 1) + s(1, 3, 2) − t(1, 2, −1)) · (1, 3, 2) = 6 + 14s − 5t = 0

Solving these two equations gives s = −26/59 and t = −2/59. Hence,

PQ = (1, 1, 1) + s(1, 3, 2) − t(1, 2, −1) = (35/59, −15/59, 5/59)

The shortest distance between the lines is |(35/59, −15/59, 5/59)| = 5/√59.
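The two linear conditions of Example 1.2.9 can be solved mechanically. A Python sketch of mine for the shortest distance between two non-parallel lines a1 + t·u1 and a2 + s·u2 (it solves the same 2 × 2 system by Cramer's rule):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def shortest_distance(a1, u1, a2, u2):
    # PQ = d + s*u2 - t*u1 with d = a2 - a1 must satisfy PQ.u1 = PQ.u2 = 0:
    #   s*(u2.u1) - t*(u1.u1) = -d.u1
    #   s*(u2.u2) - t*(u1.u2) = -d.u2
    d = tuple(x - y for x, y in zip(a2, a1))
    A, B, e = dot(u2, u1), -dot(u1, u1), -dot(d, u1)
    C, D, f = dot(u2, u2), -dot(u1, u2), -dot(d, u2)
    det = A * D - B * C                 # non-zero since the lines are not parallel
    s, t = (e * D - B * f) / det, (A * f - e * C) / det
    pq = tuple(di + s * wi - t * vi for di, wi, vi in zip(d, u2, u1))
    return math.sqrt(dot(pq, pq))

print(shortest_distance((0, 0, 0), (1, 2, -1), (1, 1, 1), (1, 3, 2)))
# 5/sqrt(59), about 0.6509
```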
Exercise 5

1. Let A = (1, 2, −3), B = (−1, 1, 2), C = (5, 4, −13), D = (−5, −1, 12) and E = (−1, −2, 3). Answer the following questions.
(a) Find a generic equation with parameter t of the straight line ℓ through A and B in the form of equation (1.9).
(b) Is E on ℓ? Show that C and D are on ℓ and find a corresponding generic equation of the line with parameter s. Find the relationship between s and t.
(c) Find two points on ℓ at a distance √24 from E.
(d) Find an equation describing the line segment joining A and B. In particular find the midpoint of the segment and also the three points strictly between A and B dividing the segment into four equal parts.

2. * Let A, B and C be three distinct points. Show that the conditions AB ∥ AC, BC ∥ AC and AB ∥ BC are all equivalent.

Hint 6 Since A, B and C are distinct, AB = b − a, AC = c − a and BC = c − b are all non-zero. Let p be the statement AB ∥ AC, q the statement BC ∥ AC and r the statement AB ∥ BC. The statement p says there is s with b − a = s(c − a). From this we get c − b = c − (a + s(c − a)) = c − a − s(c − a) = (1 − s)(c − a). We have just shown p ⇒ q (p implies q). Now substitute C for A, A for B and B for C in p ⇒ q. You should get q ⇒ r. Making the same substitutions in q ⇒ r you should get r ⇒ p.

3. Use the previous criterion to show that any three points satisfying equation (1.10) lie on a straight line (are collinear) and hence that the equation does indeed define a straight line.

Hint 7 Use the results of the previous problem.

4. Let the generic equation r(t) = a + tu represent a line ℓ, where u ≠ 0, and suppose that P with position vector p is a point. Show that P lies on ℓ
(a) if, and only if, p − a is a multiple of u;
(b) if, and only if, the equation a + tu = p has a unique solution for t.
(Compare this with No.14 below for another criterion).

5. * Let u and v be non-zero vectors and suppose that a + tu and b + sv are generic equations of lines ℓ1 and ℓ2 respectively. Show that necessary and sufficient conditions for the generic equations to represent the same set of points (i.e. ℓ1 = ℓ2) are (i) u ∥ v and (ii) a − b is a multiple of v. Using this result, show that two lines sharing two distinct points are identical.

6. * Use the results of the previous exercise to show that the four conditions in subsection 1.2.6 are in fact mutually exclusive and exhaustive.

7. Find - but not by simply substituting in equation (1.12) - the point Q of the line ℓ with parametric equation (1 − 2t, 2 − t, −3 + 5t) which is closest to
(a) the origin,
Solution: Solve (1 − 2t, 2 − t, −3 + 5t) · (−2, −1, 5) = −19 + 30t = 0. So t = 19/30 and Q is (1 − 2t, 2 − t, −3 + 5t) = (−4/15, 41/30, 1/6).
(b) the point P = (5, 8, 4),
(c) the point P = (−3, 2, 1).

8. Find the projections of the following points P on the line (−2t, −t, 5t).
(a) P = (4, 6, 7)
Solution: We have ((−2t, −t, 5t) − (4, 6, 7)) · (−2, −1, 5) = −21 + 30t = 0 and t = 7/10. The projection is (−2t, −t, 5t) = (−7/5, −7/10, 7/2). Alternatively, the projection of P on (−2, −1, 5) is

(((4, 6, 7) · (−2, −1, 5))/|(−2, −1, 5)|²)(−2, −1, 5) = (7/10)(−2, −1, 5)

(b) P = (−1, −2, 3).

Hint 8 These can be done in the same way you handled the previous question, but realize that you can use the projection of p on the vector (−2, −1, 5).

9. Check your results of 7a and 7b using equation (1.12). Can you see any connection between 7a and 8b and also between 7b and 8a?

10. Find the projection of the point P with position vector p = (p1, p2, p3) on the line (−2t, −t, 5t).

11. Prove the formula (1.12) that finds the foot Q of the perpendicular from P = (p1, p2, p3) onto the line a + tu. Deduce the projection formula (1.13).

Hint 9 Show that the solution to (a + tu − p) · u = 0 (equation (1.11)) is

t = −((a − p) · u)/|u|²

12. Show that the closest the line ℓ with parametric equation a + tu comes to the point P is

d = (1/|u|)√((|p − a||u|)² − ((p − a) · u)²)   (1.16)

13. Show that |r(t)|² in equation (1.10) is a quadratic in t. How does your knowledge of quadratics tie up with the formula (1.12) and equation (1.16)?

14. Deduce from equation (1.16) that ℓ passes through the point P if, and only if, (p − a) · u = ±|p − a||u|. Note that geometrically this is clear in view of the formula for the angle between p − a and u if p − a ≠ 0.

15. * If t is time in seconds and u is the displacement in meters per second (the constant velocity), then equation (1.10) can be interpreted as the position of (say) an aircraft in meters at time t.
(a) In a situation like this we would be interested in how close the aircraft comes to a certain point P for t ≥ 0. Find a condition for (1.16) to hold at some time t ≥ 0. If this condition fails, what is the closest the aircraft gets to P and when does this occur?
(b) How can your findings be used to find the closest distance two aircraft travelling with constant velocities get to each other, assuming that as time t ≥ 0 increases the distance between the aircraft decreases?

Hint 10 Let a + tu and b + tv describe the positions of the two aircraft at time t. Consider the line b − a + t(v − u). You want its closest distance from the origin. By assumption, why is (b − a) · (v − u) < 0?

16. A rhombus is a (non-degenerate) parallelogram all of whose sides have the same length. Show that the diagonals of a rhombus intersect at right angles.

Hint 11 In Example 1.2.8 (Figure 1.11) use |AB| = |BC| as well as AC = AB + BC and AB + BD = AD, so BD = AD − AB = BC − AB. Now use the result of Exercise 2, No.17.

17. * (Generalization of the previous exercise). Let a + tu and b + sv represent two straight lines and suppose that A = a + t1u, B = b + s1v, C = a + t2u and D = b + s2v are points on these lines with A ≠ C and B ≠ D. Show that |AB|² + |CD|² = |BC|² + |DA|² if, and only if, the lines are perpendicular.

Hint 12 Expand |AB|² = (b + s1v − (a + t1u)) · (b + s1v − (a + t1u)) etc. and show that |AB|² + |CD|² − |BC|² − |DA|² = 2u · v (t1 − t2)(s2 − s1).

18. Let A, B and C be three points not on a line, so forming the distinct vertices of a triangle. The median from A is the line joining A to the midpoint of the opposite side BC. Show that all three medians intersect in a single point M. This point is called the centroid of the triangle and is the centre of mass of three equal masses placed at the vertices of the triangle.

Hint 13 Let a, b and c be the position vectors of A, B and C respectively. Then the midpoint of BC has position vector (1/2)(b + c). A generic equation of the line through A and the midpoint of BC is

a + t((1/2)(b + c) − a)

Put t = 2/3 and simplify the result. You should conclude that the centroid lies two thirds of the way from any vertex along its median toward the opposite side.

19. (Cartesian equations for a straight line). Let equation (1.10) define a straight line, where u = (u1, u2, u3) and each ui ≠ 0. Show that r = (x, y, z) is on the line if, and only if,

(x − a1)/u1 = (y − a2)/u2 = (z − a3)/u3

What happens if one or two of the ui = 0? Find Cartesian equations for the line of Question 1a above. Find generic and Cartesian equations for the x-, y- and z-axis.

20. Let ℓ1 be the line through A = (−1, −2, 6) and B = (3, 2, −2) and ℓ2 the line through C = (1, 3, 1) and D = (1, 1, 5). Solve the following problems.
(a) Show that ℓ1 and ℓ2 are skew.
(b) Find points P on ℓ1 and Q on ℓ2 such that PQ is perpendicular to the directions of both lines. Hence find the shortest distance between ℓ1 and ℓ2.

21. Let ℓ1 be the line through A = (2, −1, 4) and B = (3, 0, 5) and ℓ2 the line through C = (−1, 0, 1) and D = (−2, 1, 0). Show that ℓ1 and ℓ2 meet and find the point of intersection.

22. How are our general results affected (and simplified) when one of the coordinates (say the z-coordinate) is restricted to be zero? In particular, what does equation (1.12) reduce to?
1.3 Planes

1.3.1 Generic Equation for a plane

Our first object is to find a generic equation for the plane passing through three non-collinear points A = (a1, a2, a3), B = (b1, b2, b3) and C = (c1, c2, c3). If four points lie on a plane, they are called coplanar. So if R is a point, our first question is "when are A, B, C and R coplanar?"

The answer is suggested by Figure 1.13. Suppose that R lies in the same plane as A, B and C. Let the line through R parallel to AC meet the line through A and B at the point P. Then AP = sAB for some scalar s. Similarly, let the line through R parallel to AB meet the line through A and C at the point Q, so that AQ = tAC for some scalar t. The figure APRQ forms a parallelogram and

OR = OA + AR = OA + (AP + PR) = OA + AP + AQ = OA + sAB + tAC   (1.17)

This equation describes the position vector r = OR of a general point on the plane through A, B and C and is called a generic equation of the plane through A, B and C.

[Figure 1.13: the plane through A, B and C; R is reached from A via sAB = AP = QR and tAC = AQ = PR.]

Example 1.3.1 Let A = (1, −1, 2), B = (1, 2, 3) and C = (−1, 0, 1) be three given points. They are not collinear since the vectors AB = (0, 3, 1) and AC = (−2, 1, −1) are not parallel. Hence a generic equation of the plane through A, B and C is

r = (1, −1, 2) + s(0, 3, 1) + t(−2, 1, −1)

In equation (1.17) the vectors u = AB and v = AC are non-zero and non-parallel and are thought of as parallel to the plane described. This suggests the following definition:

1.3.2 Analytic definition of a plane

Let u and v be two non-zero, non-parallel vectors and A = a a given point. Then the set Π of points R = r satisfying the generic (parametric) equation

r = r(s, t) = a + su + tv   (1.18)

describes a plane. The parameters are s and t and we call a + su + tv a generic point of the plane.

1.3.3 Equality of planes

As with lines, a generic equation for a plane is not unique. Let

a1 + s1u1 + t1v1 and a2 + s2u2 + t2v2

represent planes Π1 and Π2 respectively. Then Π1 = Π2 if, and only if, (i) sp(u1, v1) = sp(u2, v2) and (ii) a1 − a2 is in this common span. For the meaning of 'span' see No.14 of Exercise 2 and compare No.5 of Exercise 5. The proof of the above statement is left as an exercise. See No.11 of Exercise 16 below.
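A quick Python sketch (mine) of the generic equation (1.18), using the data of Example 1.3.1; varying s and t sweeps out the whole plane:

```python
def plane_point(a, u, v, s, t):
    # Generic point r(s, t) = a + s*u + t*v of the plane (equation (1.18))
    return tuple(ai + s * ui + t * vi for ai, ui, vi in zip(a, u, v))

a, u, v = (1, -1, 2), (0, 3, 1), (-2, 1, -1)   # A, AB, AC of Example 1.3.1
print(plane_point(a, u, v, 0, 0))   # (1, -1, 2) = A
print(plane_point(a, u, v, 1, 0))   # (1, 2, 3)  = B
print(plane_point(a, u, v, 0, 1))   # (-1, 0, 1) = C
```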
1.3.4 Cartesian Equation of a plane

A vector n = (n1, n2, n3) ≠ 0 is called a normal of the plane defined by equation (1.18) if it is perpendicular to every line lying in the plane. So for a variable point R on the plane we must have n · AR = 0. Thus, as in Figure 1.14, all points on the plane with position vector r = (x1, x2, x3) must satisfy

n · (r − a) = 0

[Figure 1.14: the plane through A with normal n; every point R = (x1, x2, x3) of the plane satisfies (r − a) · n = 0.]

This is a Cartesian equation of the plane. More fully this reads

n1x1 + n2x2 + n3x3 − (n1a1 + n2a2 + n3a3) = n1x1 + n2x2 + n3x3 + c = 0   (1.19)

where c = −(n1a1 + n2a2 + n3a3). A Cartesian equation is unique up to non-zero constant multiples of (n1, n2, n3, c), since evidently multiplying n1, n2, n3 and c by β ≠ 0 cannot change a solution (x1, x2, x3).

1.3.5 Finding a normal n

If n is a normal to the plane given by equation (1.18), then in particular n ⊥ u and n ⊥ v. In other words, simultaneously

n1u1 + n2u2 + n3u3 = 0
n1v1 + n2v2 + n3v3 = 0   (1.20)

Conversely, if equation (1.20) holds then n must be a normal (see Exercise 16, No.4 below). Because equation (1.20) involves two equations for three unknowns, a solution n ≠ 0 can always be found. See Chapter 2, 2.2.15 for a proof of this.

Example 1.3.2 Find a normal n ≠ 0 to the plane of Example 1.3.1 and hence its Cartesian equation.

Solution: We solve

(n1, n2, n3) · (0, 3, 1) = 3n2 + n3 = 0
(n1, n2, n3) · (−2, 1, −1) = −2n1 + n2 − n3 = 0

simultaneously for n1, n2 and n3. (We did much the same thing in Example 1.1.3). From the first equation, n3 = −3n2. Substituting this into the second equation, we get n1 = (1/2)(n2 − n3) = 2n2. Thus, provided n2 ≠ 0, the vector n = (2n2, n2, −3n2) = n2(2, 1, −3) is perpendicular (normal) to the plane. We may take n2 = 1 and n = (2, 1, −3). With A = (1, −1, 2) we get a Cartesian equation for the plane:

(2, 1, −3) · ((x1, x2, x3) − (1, −1, 2)) = 2x1 + x2 − 3x3 + 5 = 0

1.3.6 Some remarks on planes

Remark 14 The following statements are geometrically obvious, but analytic proofs can be given after we have studied the subject more deeply, in particular after we have done Chapter 3. See No.10 in Exercise 78.

1. Provided n = (n1, n2, n3) ≠ 0, an equation n1x1 + n2x2 + n3x3 + c = 0 always represents a plane with normal n. (If n = 0 the equation represents nothing if c ≠ 0 and all of space if c = 0).
2. By definition, two distinct planes are parallel if their normals are parallel. Such planes are a constant positive distance apart and have no points in common.
3. Two non-parallel planes meet in a line.
4. By definition, a line is parallel to a plane if it is perpendicular to the plane's normal (draw a picture to see that this is correct). A line not parallel to a plane meets the plane in exactly one point.
5. A line and a point not on it determine a unique plane.
6. There are an infinity of planes containing a given line.

Example 1.3.3 Find a generic equation for the line of intersection of the planes

x1 + x2 + x3 + 1 = 0
2x1 + x2 − x3 + 2 = 0

Solution: Note that (1, 1, 1) ∦ (2, 1, −1), so the planes are not parallel. Eliminating x1 gives x2 + 3x3 = 0, or x2 = −3x3. Substituting x2 = −3x3 into the first (or second) equation gives x1 = 2x3 − 1. Hence, with t = x3 the parameter,

r = (x1, x2, x3) = (2t − 1, −3t, t)

is the line of intersection of the two planes.
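Given a normal, the Cartesian coefficients follow from c = −n · a. A small Python check of mine for Example 1.3.2:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

a = (1, -1, 2)                 # the point A on the plane
u, v = (0, 3, 1), (-2, 1, -1)  # AB and AC
n = (2, 1, -3)                 # normal found in Example 1.3.2

assert dot(n, u) == 0 and dot(n, v) == 0   # n solves equations (1.20)
c = -dot(n, a)                             # constant term in (1.19)
print(n, c)   # the plane 2*x1 + x2 - 3*x3 + 5 = 0
```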
Example 1.3.4 Find a generic equation of the plane with Cartesian equation

x1 − 2x2 + 5x3 = 10   (1.21)

Solution: By far the easiest solution is

r = (10 + 2x2 − 5x3, x2, x3)

Here the independent parameters are x2 and x3 and they can vary freely over ℜ. We could of course let s = x2 and t = x3, but this is only an artificial device and changes nothing.

A more long-winded approach is to find three non-collinear points A, B and C on the plane and then proceed as in Example 1.3.1. Letting x2 = x3 = 0 gives x1 = 10 and A = (10, 0, 0) as one point. Putting x1 = x2 = 0 gives x3 = 2 and B = (0, 0, 2). With x1 = x3 = 0 we have x2 = −5 and C = (0, −5, 0). The points A, B and C do not lie on a line and another generic equation of the plane is

r = OA + sAB + tAC = (10 − 10s − 10t, −5t, 2s)

Remark 15 Instead of A, B and C we could have chosen any three non-collinear points satisfying equation (1.21). The parametric equation will be different but will represent the same plane.

Example 1.3.5 Find two non-parallel vectors u and v both perpendicular to n = (1, −2, 5).

Solution: Two such vectors are u = AB = (−10, 0, 2) and v = AC = (−10, −5, 0), where A, B and C are the points found in Example 1.3.4.

Example 1.3.6 Find the point of intersection of the line through (1, −1, 0) and (0, 1, 2) and the plane 3x1 + 2x2 + x3 + 1 = 0.

Solution: For some t the point

r = (1, −1, 0) + t((0, 1, 2) − (1, −1, 0)) = (1, −1, 0) + t(−1, 2, 2) = (1 − t, −1 + 2t, 2t)

must be on the given plane, i.e.

3(1 − t) + 2(−1 + 2t) + (2t) + 1 = 0

Thus t = −2/3 and the point of intersection is (5/3, −7/3, −4/3).

1.3.7 Perpendicular from a Point to a Plane. Projections

If Π is a plane and P a point, the problem in this section is to find the foot Q of the perpendicular from P to the plane. See Figure 1.15. If the plane passes through the origin, the foot Q is called the projection of P on the plane.

[Figure 1.15: the foot Q of the perpendicular from the point P to the plane Π.]

Example 1.3.7 Find the foot Q of the perpendicular from the point P = (4, −1, 3) to the plane x + 2y + z = −4. Hence find the shortest distance from P to the given plane.

Solution: The vector (1, 2, 1) is normal to the plane, so the line ℓ defined by

(4, −1, 3) + t(1, 2, 1) = (4 + t, −1 + 2t, 3 + t)

passes through P and has direction normal to the plane. Geometry tells us that the line ℓ hits the plane at the required foot Q of the perpendicular. Then |PQ| will be the shortest distance of P from the plane. To find t, substitute (4 + t, −1 + 2t, 3 + t) into the Cartesian equation of the plane:

(4 + t) + 2(−1 + 2t) + (3 + t) = −4

and solve for t. This gives t = −3/2 and the point Q therefore has position vector

q = (4 − 3/2, −1 + 2(−3/2), 3 − 3/2) = (5/2, −4, 3/2)

The required shortest distance of P from the plane is

|PQ| = |(5/2, −4, 3/2) − (4, −1, 3)| = |(−3/2, −3, −3/2)| = √(27/2)
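The same computation in Python (a sketch of mine, following the method of Example 1.3.7 for a plane written as n · r + c = 0):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def foot_on_plane(p, n, c):
    # Solve n . (p + t*n) + c = 0 for t, then q = p + t*n;
    # equivalently q = p - ((c + n.p)/|n|^2) n (equation (1.24) below)
    t = -(c + dot(n, p)) / dot(n, n)
    return tuple(pi + t * ni for pi, ni in zip(p, n))

p, n, c = (4, -1, 3), (1, 2, 1), 4          # plane x + 2y + z + 4 = 0
q = foot_on_plane(p, n, c)
pq = [qi - pi for qi, pi in zip(q, p)]
print(q)                        # (2.5, -4.0, 1.5)
print(math.sqrt(dot(pq, pq)))   # sqrt(27/2), about 3.6742
```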
Example 1.3.8 Find the foot Q of the perpendicular from a general point P = (p1, p2, p3) to

1. the above plane x + 2y + z = −4, and
2. the parallel plane x + 2y + z = 0, which passes through the origin.

Solution:

1. Consider the line (p1, p2, p3) + t(1, 2, 1) = (p1 + t, p2 + 2t, p3 + t). As above, we find where this line hits the given plane. The equation determining this point Q is

(p1 + t) + 2(p2 + 2t) + (p3 + t) = −4

This gives t = −(4 + p1 + 2p2 + p3)/6. Hence Q is

q = (p1, p2, p3) − ((4 + p1 + 2p2 + p3)/6)(1, 2, 1)   (1.22)

2. We must substitute (p1, p2, p3) + t(1, 2, 1) = (p1 + t, p2 + 2t, p3 + t) into the equation x + 2y + z = 0 of the plane:

(p1 + t) + 2(p2 + 2t) + (p3 + t) = 0

This gives t = −(p1 + 2p2 + p3)/6 and hence

q = (p1, p2, p3) − ((p1 + 2p2 + p3)/6)(1, 2, 1)   (1.23)

Because the plane x + 2y + z = 0 passes through the origin, the point Q given by equation (1.23) is the projection of P on the plane. See Exercise 16, No.5, where the theory is taken further.

Exercise 16

1. Show that A = (1, 2, 1), B = (0, 1, 3) and C = (−1, 2, 5) are not collinear and so define a unique plane Π. Find:
(a) A generic equation for Π.
(b) A normal to Π and hence a Cartesian equation for Π.
(c) The foot Q of the perpendicular from P = (1, 2, 3) to Π, and so the shortest distance from P to the plane Π.
(d) The foot Q of the perpendicular from P = (p1, p2, p3) to Π (see also No.5 below).
(e) The intersection of the line ℓ with parametric form (1 + t, 3 + 2t, 4 − t) and Π, as well as the acute angle between the line and the normal to Π.
(f) A plane containing the point (4, 5, −3) and the line (1 + t, 3 + 2t, 4 − t).

2. Find a generic equation for the plane 2x + 3y − 4z + 5 = 0.

3. Find generic and Cartesian equations for the x-y and y-z planes.

4. Prove that if n satisfies equation (1.20) then it is normal to the plane given by the generic equation (1.18).

5. Follow the method of Examples 1.3.7 and 1.3.8 above to show that the foot Q of the perpendicular from the point P to the plane Π with Cartesian equation n · r + c = 0 is given by

q = p − ((c + n · p)/|n|²) n   (1.24)

Here p is the position vector of P. Consider also the case c = 0. Then Q is called the projection of P on the plane.

Hint 17 Solve n · (p + tn) + c = 0 for t.

6. Use equation (1.24) to show that the shortest distance of the point P from the plane n · r + c = 0 is

|n · p + c| / |n|

7. Write the equation of a plane n · p + c = 0 in the form m · p + d = 0 where m is a unit vector.

8. Let n · r + c1 = 0 and n · r + c2 = 0 represent planes Π1 and Π2 respectively. Show algebraically that Π1 and Π2 are either identical or have no point in common. In the latter case write down a formula for the shortest distance between Π1 and Π2 if P = (p1, p2, p3) is a given point on the first plane.

9. Find a generic equation of the line of intersection of the planes 2x + 3y + z + 1 = 0 and x − 4y + 5z − 2 = 0.

10. (Hirst, p.36). In 10a-10c find the intersection of the three given planes. In each case give a brief geometric description.

(a) 3x − 2y + 2z = 5
    4x + y − z = 3
    x + y + z = 6

(b) x + 2y − z = 3
    2x + 4y − 2z = 5
    x − y + z = 1

(c) x + 2y − z = 2
    x − y + z = 1
    3x + 3y − z = 5

11. * Prove the statement of subsection 1.3.3.

Hint 18 Imitate the procedure of No.5 of Exercise 5.

12. (See Figure 1.16). Let n · r + c = 0 and m · r + d = 0 represent non-parallel planes Π1 and Π2 respectively. Show that the Cartesian equation of a plane Π that contains the line of intersection of Π1 and Π2 has the form

λ(n · r + c) + µ(m · r + d) = (λn + µm) · r + λc + µd = 0   (1.25)

where λ and µ are two scalars, not both of which are zero.

Hint 19 First note that because n ≠ 0 and m ≠ 0 are not parallel, the linear combination λn + µm is also non-zero if one or both of λ, µ are non-zero. In that case equation (1.25) represents a plane Π. Note that if (e.g.) λ ≠ 0 and µ = 0 then Π is just Π1. Now check that a point on Π1 and Π2 is also on Π.

13. Find the form of a Cartesian equation of the plane Π that contains the intersection of the planes

2x1 + 3x2 + x3 + 1 = 0 and x1 − 4x2 + 5x3 − 2 = 0

If the plane Π also contains the point (1, −1, 2), find its Cartesian equation.

[Figure 1.16: a sheaf of planes Π through the line of intersection of Π1 and Π2.]
[Figure 1.16: The planes Π1 and Π2 and a plane Π through their line of intersection.]

1.4 Reflections in a line or a plane

A line or plane can serve as a mirror, so that a point P is reflected to a point P′ in such a way that P and P′ are symmetric with respect to the line or plane, as in Figure 1.17. From the figure the point P′ is given by

p′ = OP′ = OP + 2PQ = p + 2(q − p) = 2q − p    (1.26)

where Q is the foot of the perpendicular from P to the line or plane and we have written p′ for the position vector of P′.

[Figure 1.17: Reflection of P in a line ℓ and reflection of P in a plane Π; in each case Q is the foot of the perpendicular from P.]

Example 1.4.1 Consider Example 1.2.7 of a line in the x-y plane. Using equation (1.15) and equation (1.26), the reflection of p = (p1, p2) in the line is

p′ = 2q − p = 2((3p1 + √3 p2)/4, (√3 p1 + p2)/4) − (p1, p2)
   = ((1/2)p1 + (√3/2)p2, (√3/2)p1 − (1/2)p2)    (1.27)

Example 1.4.2 Consider (1) of Example 1.3.8, where we found the foot of the perpendicular from p to the plane x + 2y + z = −4 given by equation (1.22):

q = (p1, p2, p3) − ((4 + p1 + 2p2 + p3)/6)(1, 2, 1)

From this it follows that the reflection P′ of P in the plane is the point with position

p′ = 2q − p = p − ((4 + p1 + 2p2 + p3)/3)(1, 2, 1)

Exercise 20

1. Using equation (1.12) - which finds the foot Q of the perpendicular from P to the line ℓ with parametric equation r = a + tu - find a formula for the reflection P′ of P in the line ℓ.

2. Using equation (1.24) - which finds the foot Q of the perpendicular from P to the plane with Cartesian equation n · r + c = 0 - find a formula for the reflection P′ of P in the given plane.

3. Specialize your formulae when a = 0 and c = 0.

4. Using the answers to Questions (2) and (3), or otherwise, find formulae for reflections in the planes x + 2y + z = 0, 3x − y − 2z = 1 and 3x − y − 2z = 0.

5. In the x-y plane find a formula for reflection in the line passing through the origin and making an angle of 60° with the positive x-axis.

6. * In the x-y plane find a formula for reflection in the line passing through the origin and making an angle of θ radians with the positive x-axis.

Hint 21 All these questions use equation (1.26).
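Equation (1.26) is easy to turn into a computation. The following sketch (Python with numpy assumed; merely illustrative) reflects the point P of Example 1.3.7 in the plane x + 2y + z = −4:

    import numpy as np

    # Reflection of p in the plane n.r + c = 0, using equation (1.26):
    # first find the foot q of the perpendicular, then p' = 2q - p.
    def reflect_in_plane(p, n, c):
        t = -(np.dot(n, p) + c) / np.dot(n, n)
        q = p + t * n               # the foot Q (compare equation (1.24))
        return 2 * q - p            # equation (1.26)

    n = np.array([1.0, 2.0, 1.0]); c = 4.0
    p = np.array([4.0, -1.0, 3.0])
    print(reflect_in_plane(p, n, c))    # [ 1. -7.  0.]

As a consistency check, the midpoint of P and P′ is then Q, exactly as the figure demands.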
1.5 Summary of Chapter 1

1.5.1 Points in Space, Vectors

(See section 1.1.) A point P = (p1, p2, p3) in space is identified with its position vector OP = p. The displacement from A to B is AB = b − a. Vectors u = (u1, u2, u3) are underlined. Scalars α, β, ..., s, t, ... are real numbers. The product of a vector a by a scalar γ is γa = (γa1, γa2, γa3). Non-zero vectors a and b are parallel if b = sa for a non-zero scalar s. Vectors a and b are added componentwise:

a + b = (a1 + b1, a2 + b2, a3 + b3)

Addition satisfies the triangle law AB + BC = AC. The dot product of vectors a and b is

a · b = a1b1 + a2b2 + a3b3

The length or magnitude of a is |a| = sqrt(a · a). The angle θ between a and b is given by a · b = |a||b| cos θ. The unit vector in the direction of a ≠ 0 is â = (1/|a|) a. The component of a in the direction b ≠ 0 and the projection of a on b are respectively a · b̂ and (a · b̂) b̂.

1.5.2 Straight lines

(See section 1.2.) A parametric equation of the straight line ℓ through a with direction u ≠ 0 is

r = a + tu    (t ∈ ℜ)

The foot Q of the perpendicular from P to ℓ is found by solving (a + tu − p) · u = 0 for t and then substituting this value back into a + tu. Lines ℓ1 and ℓ2 are parallel if they have parallel directions, and are skew if they are not parallel and do not intersect.

1.5.3 Planes

(See section 1.3.) A generic equation of the plane containing two non-parallel vectors u and v and passing through the point A is

r = a + su + tv    (s, t ∈ ℜ)

A vector n ≠ 0 is normal to the plane if it is perpendicular to every line lying in the plane. A Cartesian equation of the plane through the point a with normal n is

(r − a) · n = (x1 − a1)n1 + (x2 − a2)n2 + (x3 − a3)n3 = 0,

which can be written n · r + c = 0, where c = −a · n is a constant. The foot Q of the perpendicular from a point P to the plane is found by solving n · (p + tn) + c = 0 for t and then substituting this value into the line p + tn.

1.5.4 Projections and reflections

(See equation 1.7, subsection 1.2.5, Example 1.3.8 and section 1.4.) When a line or plane goes through the origin, the foot Q of the perpendicular from P to the line or plane is called the projection of P on the line or plane. The reflection P′ of a point P in a line or plane is given by p′ = 2q − p, where Q is the foot of the perpendicular from P to the line or plane.

Chapter 2

Matrices and the Solution of Simultaneous Linear Equations

Linear algebra is used in almost every branch of pure and applied mathematics. It is indispensable for all branches of engineering, the physical sciences (including physics, chemistry and biology), statistics and for operations research (mathematics applied to industry and commerce). Although most results in this course are valid for various number systems (e.g. rational, real or complex numbers), it is assumed that all numbers and number-variables dealt with are real, i.e. range over the real number system ℜ. Numbers will also be referred to as scalars.

2.1 The Solution of Linear Equations

At school you learned how to solve up to three simultaneous linear equations for three unknowns, say x1, x2, x3. In Chapter 1 we saw how such equations arise in analytic geometry and used vector notation such as a row x = (x1, x2, x3) to represent the variables x1, x2 and x3. In this chapter we will learn some systematic techniques for solving m equations for n unknowns x1, x2, ..., xn, and we will see the need to carefully distinguish rows from columns. While geometry often serves as an inspiration for various algebraic results, it is not possible to draw an actual picture representing a system involving n variables when n ≥ 4.

2.1.1 Some examples

Example 2.1.1 Find all solutions that satisfy simultaneously

2x1 + 3x2 = −1    (a)
 x1 − 4x2 =  5    (b)

Solution:

1. One way: Make x1 "the subject of the formula" in (b), i.e. x1 = 5 + 4x2, and substitute 5 + 4x2 for x1 in (a) to give 2(5 + 4x2) + 3x2 = −1, so x2 = −1 and x1 = 1.

2. We prefer to replace the above system by a sequence of equivalent systems (i.e. systems that have exactly the same solutions), a process known as Gauss reduction.

(a) (First equivalent system.) Multiply (b) by 2 and leave (a) unchanged:

2x1 + 3x2 = −1    (a)  leave unchanged
2x1 − 8x2 = 10    (b)  2 × (b)

(b) (Second equivalent system.) In the new system subtract (a) from (b), to obtain:

2x1 + 3x2 = −1    (a)  leave unchanged
    − 11x2 = 11   (b)  [(−1) × (a)] + (b)

It is now obvious from (b) that x2 = −1. By back-substitution in (a) we get x1 = 1. For reasons that will become clear later we write this unique solution as a column vector

x = [x1; x2] = [1; −1]

(Here and below we write a column compactly as [x1; x2; ...], the entries separated by semicolons.)

Example 2.1.2 Consider

 x1 + x2 + 3x3 = −1    (a)
 x1 + x2 + 4x3 = −3    (b)    (2.1)
−x1 + x2 − 2x3 =  1    (c)

Solution:

 x1 + x2 + 3x3 = −1    (a)  leave unchanged
            x3 = −2    (b)  [(−1) × (a)] + (b)
−x1 + x2 − 2x3 =  1    (c)  leave unchanged

 x1 + x2 + 3x3 = −1    (a)  leave unchanged
            x3 = −2    (b)  leave unchanged
     2x2 + x3  =  0    (c)  (a) + (c)

Finally, interchange the equations (b) and (c):

 x1 + x2 + 3x3 = −1    (a)
      2x2 + x3 =  0    (b)    (2.2)
            x3 = −2    (c)

Because the coefficients of x1 in (b) and (c) are zero, and also because the coefficient of x2 in (c) is zero, we say the system is in (row) echelon or (upper) triangular form. From (c), x3 = −2. Back-substitution into (b) gives x2 = 1. Finally, back-substitution of x3 = −2 and x2 = 1 into (a) gives x1 = 4. The solution is

x = [x1; x2; x3] = [4; 1; −2]    (2.3)
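Those who wish can confirm Example 2.1.2 on a computer. A minimal sketch in Python (numpy assumed; the course does not rely on it):

    import numpy as np

    A = np.array([[ 1.0, 1.0,  3.0],
                  [ 1.0, 1.0,  4.0],
                  [-1.0, 1.0, -2.0]])
    b = np.array([-1.0, -3.0, 1.0])
    print(np.linalg.solve(A, b))    # [ 4.  1. -2.], agreeing with (2.3)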
Example 2.1.3 Find all solutions x to the system

x1 + x2 + 3x3 + 2x4 = −1    (a)
     2x2 + x3 + x4  =  0    (b)    (2.4)
           x3 − x4  = −2    (c)

Solution: There are only three equations for four unknowns and the equations are already in echelon form. One of the unknowns (say x4) can be any number, and back-substitution gives in turn x3 = x4 − 2, x2 = −(1/2)(x3 + x4) = 1 − x4 and x1 = −1 − x2 − 3x3 − 2x4 = 4 − 4x4. It is customary to let x4 = t, so

x = [x1; x2; x3; x4] = [4 − 4t; 1 − t; t − 2; t]    (2.5)

is the general solution. The parameter t ∈ ℜ can be any real number.

Remark 22 Here we have 3 equations for 4 unknowns and the solution shows that we have an infinity of solutions. In Example 1.3.2 of Chapter 1 we found a vector n ≠ 0 normal to a plane. For β ≠ 0, βn is also a normal, so again we have an infinite number of solutions, this time of two equations in three unknowns. Example 1.3.4 of Chapter 1 involved solving one equation for three unknowns, and this involved two parameters. These examples and also Example 2.1.3 illustrate a general result: if a system of equations has more unknowns than equations and there is a solution, then the system has infinitely many solutions. See, for example, 2.2.15 below. In Exercise 52, No.5 you will get a more complete picture of this theorem.

Example 2.1.4 For which values of the variables does the following system have a solution?

 x1 +  x2 + 3x3 = −1    (a)
 x1 + 3x2 + 4x3 = −1    (b)
2x1 + 4x2 + 7x3 =  5    (c)

Solution:

x1 + x2 + 3x3 = −1    (a)  leave unchanged
     2x2 + x3 =  0    (b)  [(−1) × (a)] + (b)
     2x2 + x3 =  7    (c)  [(−2) × (a)] + (c)

The system cannot have a solution, since (b) and (c) are contradictory. The system is inconsistent (or the equations are incompatible) and the solution set is empty.

2.2 Matrices

A rectangular array A of numbers with m rows and n columns is called an m × n ("m by n") matrix. Matrices will form a new kind of algebra and it is customary to place brackets (we will use square ones) around such arrays.

2.2.1 Examples of matrices

[ −1  0  4  1 ]
[  5 −2  0 −2 ]  is a 3 × 4 matrix,
[  0  7 −3  6 ]

[ −1  5  0 ]
[  0 −2  7 ]
[  4  0 −3 ]  is a 4 × 3 matrix,
[  1 −2  6 ]

[ 3  2  1/2 ]
[ 1 −6 −5/2 ]  is a 2 × 3 matrix,

[  2  4 −1 ]
[ −1  0  5 ]  is a 3 × 3 matrix.
[  7  8 −2 ]

The column x = [x1; x2; x3] is a 3 × 1 matrix, while the row y = [y1 y2 y3 y4] is a 1 × 4 matrix.
One speaks of the row entries of a column and the column entries of a row.

2.2.2 Standard notation for an m × n matrix A

Given that A is an m × n matrix, the standard notation is to write A = [aij]. Here aij is the entry of A in the ith row and jth column. One calls aij the (i, j) entry of A. The first index i labels the rows and has the range i = 1, ..., m, while the second index j labels the columns and has the range j = 1, ..., n. For example, if m = 4 and n = 5,

    [ a11 a12 a13 a14 a15 ]
A = [ a21 a22 a23 a24 a25 ]    (2.6)
    [ a31 a32 a33 a34 a35 ]
    [ a41 a42 a43 a44 a45 ]

is a 4 × 5 matrix. One reads a12 as "a one-two", not as "a twelve". Likewise, a34 is "a three-four", not "a thirty-four". For example, consider the 2 × 3 matrix

A = [ 2  3 −1 ]
    [ 1 −4  5 ]

The (1, 1) entry is a11 = 2. It is the first entry in column 1 and also the first entry in row 1. The (2, 1) entry a21 = 1 is the second row entry of column 1 and the first entry in row 2. The numbers a12 = 3 and a22 = −4 are the first and second entries of column 2 respectively. Row 2 has entries a21 = 1, a22 = −4, a23 = 5. The (1, 3) entry of A is a13 = −1 and the (2, 3) entry is a23 = 5. It is important to keep in mind that aij is in row i and in column j of A. Sometimes we write A = Am×n to indicate that A is m × n. Thus if A has two rows and three columns, we may write A = A2×3.

Exercise 23

1. For each matrix A in 2.2.1 write down

(a) each of its entries in the form aij = ...
(b) each of its rows and columns.

What notation do you think would be suitable for denoting row i of a matrix A? What notation do you think would be suitable for denoting column j of a matrix A? In Chapter 3, 3.1.1, we will develop a systematic notation for these; here you are only asked to use your imagination.

2. Find a formula for the entries aij in the following 4 × 4 matrix:

A = [  2 −3  4 −5 ]
    [ −3  4 −5  6 ]
    [  4 −5  6 −7 ]
    [ −5  6 −7  8 ]

3. The n × n Hilbert matrix Hn = [hij] is defined by

hij = 1/(i + j − 1)    (i = 1, ..., n; j = 1, ..., n)

Write down the matrices H1, H2, H3, H4.

2.2.3 Expressing m equations in n unknowns as a matrix equation Ax = b

A system of linear equations has a matrix of coefficients, or coefficient matrix. For example, the coefficient matrix of Example 2.1.4 is

[ 1 1 3 ]
[ 1 3 4 ]
[ 2 4 7 ]

If we are given m simultaneous equations in n unknowns x1, x2, ..., xn, there is a standard way to express the equations as a single matrix equation. In the case of Example 2.1.4 this expression is

[ 1 1 3 ]       [ −1 ]
[ 1 3 4 ] x  =  [ −1 ]    (2.7)
[ 2 4 7 ]       [  5 ]

The system of Example 2.1.2 (equations (2.1)) has for coefficient matrix

A = [  1 1  3 ]
    [  1 1  4 ]
    [ −1 1 −2 ]

The corresponding standard matrix equation is

[  1 1  3 ]       [ −1 ]
[  1 1  4 ] x  =  [ −3 ]    (2.8)
[ −1 1 −2 ]       [  1 ]

What this means is the equality of two column matrices:

[  x1 + x2 + 3x3 ]   [ −1 ]
[  x1 + x2 + 4x3 ] = [ −3 ]    (2.9)
[ −x1 + x2 − 2x3 ]   [  1 ]

The system of equations of Example 2.1.3 has the matrix expression

[ 1 1 3  2 ]  [x1]   [ −1 ]
[ 0 2 1  1 ]  [x2] = [  0 ]    (2.10)
[ 0 0 1 −1 ]  [x3]   [ −2 ]
              [x4]

Using the general 4 × 5 matrix of (2.6), the following matrix equation expresses a general system of 4 equations in 5 unknowns x1, x2, ..., x5:

[ a11 a12 a13 a14 a15 ]  [x1]   [ b1 ]
[ a21 a22 a23 a24 a25 ]  [x2]   [ b2 ]
[ a31 a32 a33 a34 a35 ]  [x3] = [ b3 ]    (2.11)
[ a41 a42 a43 a44 a45 ]  [x4]   [ b4 ]
                         [x5]

Written out, we again have the equality of two column matrices:

[ a11x1 + a12x2 + a13x3 + a14x4 + a15x5 ]   [ b1 ]
[ a21x1 + a22x2 + a23x3 + a24x4 + a25x5 ] = [ b2 ]    (2.12)
[ a31x1 + a32x2 + a33x3 + a34x4 + a35x5 ]   [ b3 ]
[ a41x1 + a42x2 + a43x3 + a44x4 + a45x5 ]   [ b4 ]

In general, let A be the matrix of coefficients of m simultaneous equations in n unknowns x1, x2, ..., xn. If the right-hand side of equation i is bi, the standard way to express the equations in matrix form is

Ax = b    (2.13)

Here the unknowns xj are written as an n × 1 column matrix (a column vector) x, and b is the m × 1 column of the bi. Observe that for the matrix equation (2.13) to be meaningful, the number of rows of A must be the same as the number of rows of b, and the number of columns of A must equal the number of rows of x.
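The meaning of Ax = b is easy to test numerically. In the sketch below (Python, numpy assumed; illustrative only) the solution of Example 2.1.2 is substituted back into the matrix form of the system:

    import numpy as np

    A = np.array([[ 1, 1,  3],
                  [ 1, 1,  4],
                  [-1, 1, -2]])
    x = np.array([4, 1, -2])        # the solution found in Example 2.1.2
    print(A @ x)                    # [-1 -3  1], i.e. the column b

    # Entry i of A x is the 'dot product' of row i of A with the column x:
    print([np.dot(A[i], x) for i in range(3)])   # the same three numbers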
2.2.4 The dot product generalized

Consider the equations of the left-hand side of Example 2.1.2 or of equation (2.9). Each entry is a dot product; e.g. the third entry is (−1, 1, −2) · (x1, x2, x3) = −x1 + x2 − 2x3. In the next chapter we will emphasize this as the product of a row and a column:

[−1 1 −2] [x1; x2; x3] = −x1 + x2 − 2x3

Likewise, the left-hand sides of the first and second equations are respectively

[1 1 3] [x1; x2; x3] = x1 + x2 + 3x3    and    [1 1 4] [x1; x2; x3] = x1 + x2 + 4x3

Similarly, the left-hand side of the first equation of Example 2.1.3 is the product

[1 1 3 2] [x1; x2; x3; x4] = x1 + x2 + 3x3 + 2x4

The left-hand side of equation (2.10) in Example 2.1.3 is the column with three such products:

[ [1 1 3  2] x ]   [ x1 + x2 + 3x3 + 2x4 ]
[ [0 2 1  1] x ] = [ 2x2 + x3 + x4 ]
[ [0 0 1 −1] x ]   [ x3 − x4 ]

Remark 24 The last examples show that we are dealing with a product which is a generalization of the dot product of Chapter 1 (subsection 1.1.12). As remarked earlier, we will later develop special notations for row i and column j of A. What is more, we will emphasize Ax as the product of the matrices A and x. See 3.1.6.

2.2.5 Detached Coefficients

We can also express the above problems compactly by considering the array formed by attaching the right-hand side b to the coefficient matrix A in the form [A, b]. Corresponding to the problem of solving the matrix equation (2.13), we call R = [A, b] the array of detached coefficients, or the matrix A augmented by b. Operations done on the equations have their exact counterpart when done on the rows of R.

1. Example 2.1.1 above can be expressed as the matrix equation

[ 2  3 ] [x1]   [ −1 ]
[ 1 −4 ] [x2] = [  5 ]    (2.14)

or as the array of detached coefficients:

2  3 −1
1 −4  5     (2.15)

We have omitted brackets around the array. Here x1 and x2 can be thought of as labels of the first two columns of the array. The third column of (2.15) is the right-hand side of (2.14). It is clear that two equations in two unknowns determine and are determined by such an array. Instead of performing operations on the given equations, we perform the same operations on this array. In the case of Example 2.1.1 these steps are:

2  3 −1
2 −8 10    2R2 (multiply Row 2 by 2)

2   3 −1
0 −11 11   −R1 + R2 (add (−1) × Row 1 to Row 2)

From the last array we read off −11x2 = 11, so x2 = −1. The first row reads 2x1 + 3x2 = −1, so x1 = 1. This is the same solution we found before.

2. Example 2.1.2 has an expression as the matrix equation (2.8), and as an array of detached coefficients the equations are:

 1 1  3 −1
 1 1  4 −3
−1 1 −2  1

Here x1, x2 and x3 label the first three columns.
The steps leading to the solution are:

 1 1  3 −1
 0 0  1 −2    −R1 + R2
 0 2  1  0    R1 + R3

Finally, we interchange rows 2 and 3:

1 1 3 −1
0 2 1  0    R2 ↔ R3
0 0 1 −2    (2.16)

The equations are in echelon form and we can read off the solution: x3 = −2; 2x2 + x3 = 0, so x2 = 1; and finally from x1 + x2 + 3x3 = −1 we get x1 = 4. We have

[x1; x2; x3] = [4; 1; −2]

This is equation (2.3), found earlier.

Observation: It is important to observe that the right-hand side of the original set of equations is a linear combination of the columns of the coefficient matrix, with coefficients x1, x2 and x3:

[1; 1; −1] 4 + [1; 1; 1] 1 + [3; 4; −2] (−2) = [−1; −3; 1]    (2.17)

Linear combinations were introduced for row vectors in Chapter 1, subsection 1.1.18, and will be systematically developed in Chapter 3. See subsection 3.1.2 and especially 3.1.9. Note that it is natural to write the coefficients on the right of the column vectors, since x appears on the right of A when we express the equations in the form Ax = b.

3. Example 2.1.3 (equations (2.4)) has the matrix expression (2.10). The corresponding array of detached coefficients is already in echelon form:

1 1 3  2 −1
0 2 1  1  0    (2.18)
0 0 1 −1 −2

The general solution can be read off as before by putting x4 = t and solving in turn for x3, x2 and x1:

x = [x1; x2; x3; x4] = [4 − 4t; 1 − t; t − 2; t]

This is just the solution (2.5) found earlier. A particular solution is found by giving the parameter t some value, e.g. t = 3. In that case x1 = −8, x2 = −2, x3 = 1 and x4 = 3. As a linear combination we have

[1; 0; 0] (−8) + [1; 2; 0] (−2) + [3; 1; 1] 1 + [2; 1; −1] 3 = [−1; 0; −2]    (2.19)

4. Example 2.1.4 has two expressions, one as equation (2.7), and also as the array of detached coefficients (augmented matrix)

1 1 3 −1
1 3 4 −1
2 4 7  5

The steps leading to the solution are:

1 1 3 −1
0 2 1  0    −R1 + R2
0 2 1  7    −2R1 + R3

Taking it a step further, we obtain the echelon (triangular) form:

1 1 3 −1
0 2 1  0
0 0 0  7    −R2 + R3    (2.20)

The last row reads 0x1 + 0x2 + 0x3 = 7, an impossibility. The given equations are inconsistent (incompatible); they do not have a solution.

5. As a detached array, the equations (2.11) appear as

a11 a12 a13 a14 a15 b1
a21 a22 a23 a24 a25 b2    (2.21)
a31 a32 a33 a34 a35 b3
a41 a42 a43 a44 a45 b4

Remark 25 Our convention is that an operation performed on row i is written next to row i after the operation has been done.

2.2.6 Permissible operations on equations: Elementary row operations. Equivalent row operations on the augmented matrix R = [A, b]

The row operations which do not alter the solution set of a system Ax = b of m simultaneous linear equations for n unknowns are:

1. Interchange equations i and j (interchange rows Ri and Rj of the corresponding array of detached coefficients).
2. Multiply equation i (multiply row Ri of the corresponding array) by a scalar β ≠ 0.
3. For any scalar γ and i ≠ j, add γ times equation i to equation j (add γ times row Ri of the corresponding array to row Rj).

The above three operations done on an array are called elementary row operations (abbreviated as eros); a small computational sketch of them is given below. Note that forming βRi is just like multiplying a vector in 3-space by a scalar, except that now the vector has n + 1 entries. The type (3) operation replaces Rj with γRi + Rj, where addition of rows is addition of vectors.
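Here is the promised sketch of the three eros as short Python functions (numpy assumed; illustrative only). The array R stands for the augmented array [A, b]:

    import numpy as np

    def swap(R, i, j):              # ero (1): interchange rows i and j
        R[[i, j]] = R[[j, i]]

    def scale(R, i, beta):          # ero (2): multiply row i by beta != 0
        R[i] = beta * R[i]

    def add_multiple(R, i, j, gamma):   # ero (3): add gamma*(row i) to row j
        R[j] = gamma * R[i] + R[j]

    # Example 2.1.1 redone (note that Python numbers rows from 0):
    R = np.array([[2.0,  3.0, -1.0],
                  [1.0, -4.0,  5.0]])
    scale(R, 1, 2.0)                # 2 R2
    add_multiple(R, 0, 1, -1.0)     # -R1 + R2
    print(R)                        # [[ 2. 3. -1.], [ 0. -11. 11.]]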
What is obvious (and assumed at school) is that if x is a simultaneous solution of the m given equations before any one of the above eros is performed, then x remains a solution after the operation is performed. To see that the reverse is true, suppose we have just performed the elementary row operation of type (3) and that x is a solution to the new system of equations. Adding −γ times equation i (−γ times row Ri of the augmented array) to equation j (row Rj of the array) in the new system of equations (array) brings us back to the system before the type (3) operation was performed. This shows that x is also a solution to the old system.

For example, consider Example 2.1.2 again and the first step toward a solution, as in 2:

 1 1  3 −1          1 1  3 −1
 1 1  4 −3    →     0 0  1 −2    −R1 + R2
−1 1 −2  1         −1 1 −2  1

Adding row 1 to row 2 of the new array restores the original array:

 1 1  3 −1
 1 1  4 −3    R1 + R2
−1 1 −2  1

The following more general setting refers to the array (2.21). Suppose we add γ times Row 2 to Row 3, obtaining

a11         a12         a13         a14         a15         b1
a21         a22         a23         a24         a25         b2
γa21+a31    γa22+a32    γa23+a33    γa24+a34    γa25+a35    γb2+b3    γR2 + R3
a41         a42         a43         a44         a45         b4

Then the operation −γR2 + R3 applied to this new array brings us back to the original array (2.21).

Remark 26 For convenience we have been using Ri for row i of an array R. In the next chapter we will use a slightly different notation for row i of a matrix.

2.2.7 Arrays in row echelon form (row EF): Proper Definition

An array is in row echelon form if

1. All zero rows come after the non-zero rows.
2. For non-zero rows, the first non-zero entry in row i + 1 comes later than the first non-zero entry in row i.

Remark 27 Row echelon forms are also called upper triangular. In subsection 3.4.14 of Chapter 3 we will also discuss elementary column operations and column echelon forms. There, the distinction between the row and column forms will matter.

2.2.8 How an array gets reduced to row echelon form

At the start, and at any later stage, we arrange the rows so that

(a) all zero rows come after non-zero rows, and
(b) for the non-zero rows, the first non-zero entry in row i + 1 does not come earlier than the first non-zero entry in row i.

Suppose that only the first r rows are non-zero and in row echelon form. The first non-zero entry in row r is called the pivot element and row r is the pivot or reference row. We assume that (a) and (b) hold. If the remaining rows are zero we are finished. Otherwise, multiples of row r are added to the rows below it in such a way that all entries below the pivot element become zeros. The first r + 1 rows are then in echelon form, and this remains the case after arranging rows r + 1, r + 2, ... so that (a) and (b) hold. The process ends when the whole array is in row echelon form, and then each non-zero row has a pivot element. The columns of the array containing a pivot element are called pivot columns. (This will be used in 2.2.15 below and also in subsection 3.4.9.)

2.2.9 Further examples

Example 2.2.1 Reduce to row echelon form:

−3 −10 2
 7   5 6

Solution (method 1):

1 10/3 −2/3    (−1/3) R1, ref row, pivot 1
7    5    6

1  10/3  −2/3
0 −55/3  32/3    −7R1 + R2

Remark 28 This illustrates how the pivot element can always be made 1.
Method 2 (Fraction-free solution):

−21 −70 14    7R1, ref row
 21  15 18    3R2

−21 −70 14
  0 −55 32    R1 + R2

Remark 29 This example illustrates how it is possible to obtain a fraction-free answer when all entries are integers or even fractions.

Example 2.2.2 Reduce the array of detached coefficients to row echelon form and hence solve

[  1 0   2 ]       [ −1 ]
[  3 0   5 ] x  =  [  1 ]
[ −5 1 −10 ]       [  2 ]

Solution: The augmented matrix (without brackets) is

 1 0   2 −1    ref row
 3 0   5  1
−5 1 −10  2

Elementary row operations (eros) bringing the detached array into echelon form are:

1 0  2 −1
0 0 −1  4    −3R1 + R2
0 1  0 −3    5R1 + R3

then

1 0  2 −1
0 1  0 −3    R2 ↔ R3
0 0 −1  4

From this row echelon form we find the unique solution

x = [7; −3; −4]

Example 2.2.3 Triangulate and hence solve

[  3 −3  1 10 ]  [x1]   [  2 ]
[  2 −3 −1  3 ]  [x2] = [  0 ]    (2.22)
[ −1  2  0  3 ]  [x3]   [ −6 ]
[  1 −1  1  2 ]  [x4]   [  4 ]

Solution: The initial array of detached coefficients is

 3 −3  1 10  2
 2 −3 −1  3  0
−1  2  0  3 −6
 1 −1  1  2  4

The steps giving equivalent systems are:

Step 1:

 1 −1  1  2  4    R1 ↔ R4 (interchange rows 1 and 4), ref row
 2 −3 −1  3  0
−1  2  0  3 −6
 3 −3  1 10  2

We have interchanged rows 1 and 4, as it is convenient to have a 1 as pivot element.

Step 2:

1 −1  1  2   4
0 −1 −3 −1  −8    −2R1 + R2, new ref row
0  1  1  5  −2    R1 + R3
0  0 −2  4 −10    (−3)R1 + R4

At this point the first two rows are in row echelon form (row EF).

Step 3. Use row 2 as reference row to obtain:

1 −1  1  2   4
0 −1 −3 −1  −8
0  0 −2  4 −10    R2 + R3, new ref row
0  0 −2  4 −10

Step 4. Row 3 is the new reference row:

1 −1  1  2   4
0 −1 −3 −1  −8
0  0 −2  4 −10
0  0  0  0   0    −R3 + R4

This is in echelon form, but we can simplify still further:

1 −1  1  2  4
0  1  3  1  8    −R2
0  0  1 −2  5    (−1/2) R3
0  0  0  0  0    (2.23)

The required general solution to Example 2.2.3 is, with x4 = t the parameter, x3 = 2t + 5, x2 = −7t − 7 and x1 = −11t − 8. Or,

x = [x1; x2; x3; x4] = [−11t − 8; −7t − 7; 2t + 5; t]    (2.24)

As a linear combination of the columns of the coefficient matrix,

[3; 2; −1; 1](−11t − 8) + [−3; −3; 2; −1](−7t − 7) + [1; −1; 0; 1](2t + 5) + [10; 3; 3; 2] t = [2; 0; −6; 4]    (2.25)

Remark 30 Each column vector has 4 entries instead of 3, and so we cannot visualize these vectors in space. Nevertheless, this should not be a problem. In Chapter 3 we will consider n-dimensional vectors. See subsections 3.1.2 and 3.1.9.

Remark 31 Although we associate arrays with simultaneous equations, it is important to realize that such operations can be done on any array, i.e. any matrix, without reference to solving equations in the usual sense. Note however the following: reducing a matrix A to row echelon form amounts to reducing the augmented matrix [A, 0] to row echelon form. This in turn amounts to solving Ax = 0. See Remark 34.

Example 2.2.4 Reduce the following matrix to row echelon form:

A = [ 0  0 −3 −3 −1 ]
    [ 2 −2 −2 −3 −2 ]
    [ 0  0  3 −2 −2 ]
    [ 2 −3 −4  3 −3 ]

Solution: In order for (b) in 2.2.8 to hold, first interchange rows 1 and 4 of A:

2 −3 −4  3 −3    R1 ↔ R4, ref row
2 −2 −2 −3 −2
0  0  3 −2 −2
0  0 −3 −3 −1

Next:

2 −3 −4  3 −3
0  1  2 −6  1    −R1 + R2, new ref row
0  0  3 −2 −2
0  0 −3 −3 −1

Finally,

2 −3 −4  3 −3
0  1  2 −6  1
0  0  3 −2 −2
0  0  0 −5 −3    R3 + R4    (2.26)

The matrix is now in row echelon form.
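The procedure of 2.2.8 can be written as a short program. The following is a bare-bones sketch in Python (numpy assumed): it uses floating-point arithmetic and takes no care about round-off, so it is for illustration only, not serious numerical work.

    import numpy as np

    def row_echelon(R):
        R = R.astype(float).copy()
        m, n = R.shape
        r = 0                             # index of the next reference row
        for j in range(n):                # work through the columns
            nonzero = [i for i in range(r, m) if R[i, j] != 0]
            if not nonzero:
                continue                  # no pivot in this column
            R[[r, nonzero[0]]] = R[[nonzero[0], r]]    # ero (1): swap
            for i in range(r + 1, m):                  # ero (3): clear below pivot
                R[i] -= (R[i, j] / R[r, j]) * R[r]
            r += 1
        return R

    # Example 2.2.2:
    R = np.array([[ 1, 0,   2, -1],
                  [ 3, 0,   5,  1],
                  [-5, 1, -10,  2]])
    print(row_echelon(R))    # the same echelon array found by hand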
Exercise 32

1. Express the following simultaneous equations in the form Ax = b and use Gauss reduction and detached coefficients to find solutions. Display the final array in row echelon form. Also, express b as a linear combination of the columns of A whenever this is possible.

(a)
i. x1 − 2x2 = −1, 3x1 − x2 = 1.
ii. x − 2y = −3, 2x − 4y = −1.
iii. x − 2y = −3, 2x − 4y = −6.
iv. x1 − 2x2 = −3, 3x1 − x2 = 1, 4x1 − 3x2 = 5.
v. x − 2y = −3, 2x + y = 4, 3x − y = 1.
vi. 0x + 0y = 0.
vii. 3x + 0y = −3.

(b)
i. −x1 + x2 + x3 = 0, 2x2 − x3 = 1.
ii. x1 + x2 + x3 − x4 = 1, 2x2 + x3 + 4x4 = −1, 3x2 − x3 − x4 = −2.
iii. x1 + 2x2 + x3 − 5x4 = −1.
iv. x1 + 2x2 + x3 = 0.

2. (a) In solving simultaneous equations, if we drop an equation the resulting solution set contains, and is usually larger than, the original solution set. Illustrate this with an example.
(b) Rule 3 of permissible row operations in 2.2.6 says: for any scalar γ and i ≠ j, add γ times row Ri of the array to row Rj. Why must we have i ≠ j?
(c) In an array with rows R1, ..., Rm, let i ≠ j. Suppose that at least one of the numbers α, β is not zero and form the row αRi + βRj. Under what conditions can this new row replace row i or row j? Describe what is going on in terms of the strict rules of 2.2.6.

3. * (Simplified Elementary Row Operations.) Consider the following simplified set of two elementary row operations that can be applied to an array. Show that each row operation in 2.2.6 can be obtained by applying a sequence of these two operations. In other words, the two row operations are equivalent to those in 2.2.6.

(a) Multiply row Ri of the array by a scalar β ≠ 0.
(b) For i ≠ j, add row Ri to row Rj.

Hint 33 In order to add αRi to Rj when i ≠ j, we may obviously assume α ≠ 0. Perform (a) with β = α, then do (b), i.e. add the resulting row i to row j. What must be done next? It still remains to show that we can interchange rows Ri and Rj of the array using operations like (a) and (b). Try adding row i to row j, then changing the sign of row i.

2.2.10 Homogeneous Equations

A system Ax = b of m equations in n unknowns is called homogeneous if in the right-hand side all bi = 0. Such a system always has at least one solution, namely x = 0, the column vector with all n entries xj = 0. This is known as the trivial solution. The existence of non-trivial solutions is the subject of subsection 2.2.15. Every system defines a corresponding homogeneous system.

Example 2.2.5 Consider the following homogeneous system of equations:

[  3 −3  1 10 ]  [x1]   [ 0 ]
[  2 −3 −1  3 ]  [x2] = [ 0 ]    (2.27)
[ −1  2  0  3 ]  [x3]   [ 0 ]
[  1 −1  1  2 ]  [x4]   [ 0 ]

This is the homogeneous system corresponding to Example 2.2.3. We found the row echelon form of the detached coefficients for that example in (2.23). The row EF for the homogeneous system (2.27) is therefore

1 −1 1  2 0
0  1 3  1 0
0  0 1 −2 0
0  0 0  0 0

The general solution of (2.27) is, with x4 = t ∈ ℜ any real number,

x = [x1; x2; x3; x4] = [−11t; −7t; 2t; t]    (2.28)

Remark 34 When solving a homogeneous system Ax = 0, the column 0 can simply be ignored in the array of detached coefficients.
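Readers with access to a computer algebra system can check (2.28). In sympy (assumed available; this is just an aside), nullspace() returns a basis for the solution set of Ax = 0:

    from sympy import Matrix

    A = Matrix([[ 3, -3,  1, 10],
                [ 2, -3, -1,  3],
                [-1,  2,  0,  3],
                [ 1, -1,  1,  2]])
    print(A.nullspace())
    # [Matrix([[-11], [-7], [2], [1]])]: every solution of (2.27) is a
    # multiple t*(-11, -7, 2, 1), exactly the general solution (2.28)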
Exercise 35

1. Find a row echelon form of the following matrices:

(a)
A = [  1  2 −3 −1 ]
    [ −3  4  1  2 ]
    [ −2  6 −2  1 ]
    [  4 −2 −4 −3 ]

(b)
B = [  1 −3 −2  4 ]
    [  2  4  6 −2 ]
    [ −3  1 −2 −4 ]
    [ −1  2  1 −3 ]

(c) How are the matrices A and B related?

2. Let:

A = [  1  1  2 1 −3 ]
    [  2  1  5 2 −7 ]
    [  0 −1  3 5 −4 ]
    [ −2  1 −3 9  3 ]

(a) Reduce A to echelon form.
(b) Solve the homogeneous system Ax = 0. Express 0 as a linear combination of the columns of A in a non-trivial way.

Hint 36 For coefficients take the values of the xi in a homogeneous solution that are not all zero. (The trivial linear combination is the one with all coefficients 0.) The columns of the above A are said to be linearly dependent. Linear independence and dependence will be studied formally in Chapter 3, section 3.4. See also the theorem in 2.2.15 below.

3. (a) Reduce the array of detached coefficients to echelon form and solve:

[  1 −1  1  2 ]       [ 4 ]
[  2 −2 −1  3 ] x  =  [ 0 ]
[ −1  1  0 −6 ]       [ 3 ]
[  1 −2  2  5 ]       [ 5 ]

(b) Show that whatever the RHS b may be, the system always has a unique solution.

4. (a) Solve for x:

[ 1  1  2 1 ]       [  −3 ]
[ 2  1  5 2 ] x  =  [  −7 ]
[ 0 −1  3 5 ]       [  −4 ]
[ 3  1 10 8 ]       [ −14 ]

(b) Solve the corresponding homogeneous system for x:

[ 1  1  2 1 ]       [ 0 ]
[ 2  1  5 2 ] x  =  [ 0 ]
[ 0 −1  3 5 ]       [ 0 ]
[ 3  1 10 8 ]       [ 0 ]

5. (a) Solve for x:

[ 1 −1  1 2 ] x  =  [ 4 ]
[ 2 −2 −1 3 ]       [ 0 ]

(This consists of the first two equations of No.3.)

(b) Solve the homogeneous system for x.

6. (a) What are the solutions x to

[ 3 1 1  4 ]       [ −1 ]
[ 2 5 1  7 ] x  =  [ −1 ]
[ 1 9 1 10 ]       [  2 ]

(b) Solve the corresponding homogeneous system.

7. If you have one solution to 4a and the general solution of 4b, show that you can write down the general solution of 4a.

8. The same for 5a and 5b.

9. Solve the following system for x and relate your solution to that of its homogeneous counterpart:

[  2  1  3  5 ]       [ −1 ]
[ −2 −1 −1 −2 ] x  =  [  0 ]
[  4  2  7 11 ]       [ −3 ]
[  6  3 10 16 ]       [ −4 ]

10. Can you state a general theorem of which Nos. 7, 8 and 9 are special cases? Note that one solution to Example 2.2.3 (matrix equation (2.22)) plus the general solution (2.28) to the homogeneous Example 2.2.5 gives the general solution (2.24) to Example 2.2.3. We will return to this in the next chapter. (See Exercise 52, No.5.)

11. (a) Solve the system for x, y, z:

[  1  1 −1 ]  [x]   [ −3 ]
[ −2  3  4 ]  [y] = [  7 ]
[ −1  4 −2 ]  [z]   [  1 ]
[  4 −1 −2 ]        [  2 ]

(b) Find the homogeneous solution to (a).

12. Reduce to row EF:

[  2  1  3  5 ]
[ −2 −1 −1 −2 ]
[  4  2  7 11 ]
[  6  3 10 16 ]

13. * (Allenby, p.30) For any a, b, c and d discuss the solution x of

[ 1 2 −1  3 −1 ]       [ a ]
[ 3 4  2  5 −2 ] x  =  [ b ]
[ 1 0  4 −1  0 ]       [ c ]
[ 3 2  7  1 −1 ]       [ d ]

Find also the corresponding homogeneous solution.

14. For which scalars α and β is the system

[ 1  2 −3 ]       [  α ]
[ 3 −1  2 ] x  =  [  β ]
[ 1 −5  8 ]       [ −α ]

consistent? (Recall that a system of equations is consistent (or compatible) if it has at least one solution.) If so, find the general solution x.

15. Show that the system

[ 1 1 2 ]       [  λ ]
[ 3 4 2 ] x  =  [  λ ]
[ 2 3 1 ]       [ −1 ]

is consistent for any scalar λ. Find those values of λ for which (i) the system has a unique solution x and (ii) more than one solution x.

16. Let Ax = b represent a system of m equations for n unknowns. Which of the following is true? Give reasons.

(a) The system always has a solution if m < n.
(b) The system always has a solution if m = n.
(c) The system never has a solution if m > n (more equations than unknowns).
(d) The system always has a solution if it is homogeneous.

2.2.11 The Row Reduced Echelon Form (row REF)

A matrix A = [aij] is said to be in row reduced echelon form (row REF) if

1. It is in row echelon form.
2. If aij (the pivot entry) is the first non-zero entry in row i, then (a) aij = 1 and (b) all other entries in column j are zeros.
2.2.12 Some examples in row reduced echelon form

[ 0 1 −3 0 0 ]    [ 1 0 0 5 ]    [ 1 3 −5 0 0  7 ]
[ 0 0  0 1 0 ],   [ 0 1 0 3 ],   [ 0 0  0 1 0 −2 ]
[ 0 0  0 0 1 ]    [ 0 0 1 6 ]    [ 0 0  0 0 1  3 ]
                  [ 0 0 0 0 ]

2.2.13 Any matrix can be brought into row reduced echelon form by suitable elementary row operations

Let A be in row echelon form. Suppose that A has r non-zero rows. We first divide each non-zero row by its leading (first non-zero) entry, so we may suppose that the leading entry of row r is arj = 1. Use row r as reference row and arj as pivot element to reduce all entries above it to zeros. Now repeat the process with row r − 1, etc., the last reference row being row 2. The resulting array is in row reduced echelon form (row REF). Note that in converting a row EF to a row REF all the pivot elements remain in the same positions, but become 1s. The process is known as the Gauss-Jordan method.

Remark 37 Although a matrix does not in general have a unique row echelon form (which ones do?), it can be shown that the row reduced echelon form is unique.

2.2.14 Some examples of Gauss-Jordan reduction

Example 2.2.6 For our first example let us reduce the array of Example 2.2.1 to row reduced echelon form. We found the row EF

1  10/3  −2/3
0 −55/3  32/3

Step 1:

1 10/3    −2/3
0    1  −32/55    (−3/55) R2, ref row

Step 2:

1 0   14/11    (−10/3) R2 + R1
0 1  −32/55

Example 2.2.7 As a second example consider Example 2.1.2. Its row echelon form was found in (2.16):

1 1 3 −1
0 2 1  0
0 0 1 −2

The following steps bring it into row REF.

Step 1:

1 1   3 −1
0 1 1/2  0    (1/2) R2
0 0   1 −2    ref row, pivot a33 = 1

Step 2:

1 1 0  5    −3R3 + R1
0 1 0  1    (−1/2) R3 + R2, new ref row, pivot a22 = 1
0 0 1 −2

Step 3:

1 0 0  4    −R2 + R1
0 1 0  1
0 0 1 −2

We read off the same solution (2.3) as before, only it is now easier to do.

Example 2.2.8 Consider Example 2.1.3 again. From (2.18) we already have the row EF of the detached coefficients:

1 1 3  2 −1
0 2 1  1  0
0 0 1 −1 −2

The following steps bring it into row REF.

Step 1:

1 1   3   2 −1
0 1 1/2 1/2  0    (1/2) R2
0 0   1  −1 −2    ref row, pivot a33 = 1

Step 2:

1 1 0  5  5    −3R3 + R1
0 1 0  1  1    (−1/2) R3 + R2, new ref row, pivot a22 = 1
0 0 1 −1 −2

Step 3:

1 0 0  4  4    −R2 + R1
0 1 0  1  1
0 0 1 −1 −2

Reading off the solution gives (unsurprisingly) the same solution (2.5) that we found before.

Example 2.2.9 Example 2.1.4 (equation (2.7)) has a row echelon form (2.20):

1 1 3 −1
0 2 1  0
0 0 0  7

The following steps bring it into row REF:

1 1   3 −1
0 1 1/2  0    (1/2) R2
0 0   0  1    (1/7) R3

1 1   3 0    R3 + R1
0 1 1/2 0
0 0   0 1

1 0 5/2 0    −R2 + R1
0 1 1/2 0
0 0   0 1    (2.29)

Example 2.2.10 As a final example, consider Example 2.2.3. We found the row echelon form (2.23):

1 −1 1  2 4
0  1 3  1 8
0  0 1 −2 5    ref row
0  0 0  0 0

It is now two small steps to the reduced form:

Step 1:

1 −1 0  4 −1    −R3 + R1
0  1 0  7 −7    −3R3 + R2, new ref row
0  0 1 −2  5
0  0 0  0  0

Step 2:

1 0 0 11 −8    R2 + R1
0 1 0  7 −7
0 0 1 −2  5
0 0 0  0  0

The array of detached coefficients of Example 2.2.3 therefore has row REF

1 0 0 11 −8
0 1 0  7 −7
0 0 1 −2  5
0 0 0  0  0

The required solution to Example 2.2.3 (matrix equation (2.22)) is (unsurprisingly) as before. We note that the row REF of the coefficient matrix of the same example is

1 0 0 11
0 1 0  7    (2.30)
0 0 1 −2
0 0 0  0

The solution (2.28) to Example 2.2.5 can again be read off this array.
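Gauss-Jordan reduction is also available by machine. As an optional check (sympy assumed), rref() reproduces the row REF just computed in Example 2.2.10, together with the pivot columns:

    from sympy import Matrix

    R = Matrix([[ 3, -3,  1, 10,  2],
                [ 2, -3, -1,  3,  0],
                [-1,  2,  0,  3, -6],
                [ 1, -1,  1,  2,  4]])
    F, pivots = R.rref()
    print(F)         # rows (1,0,0,11,-8), (0,1,0,7,-7), (0,0,1,-2,5) and a zero row
    print(pivots)    # (0, 1, 2): in Python's 0-based numbering, columns 1, 2 and 3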
2.2.15 A homogeneous system with more unknowns than equations always has a non-trivial solution

More precisely, we will prove the following

Theorem: Let Ax = 0 be a homogeneous system of m equations in n unknowns and suppose that the row echelon form of A has r < n non-zero rows. Then there are solutions x in which the xj corresponding to the k = n − r non-pivot columns can have arbitrary values. In other words, the solution set has k independent parameters.

Proof: Consider the row REF A′′ of A. The matrix A′′ has r non-zero rows and r pivot columns. Thus there are k = n − r non-pivot columns. By the nature of the row REF, the k variables xj corresponding to these columns can be given any values, and we can then solve uniquely for the remaining r variables. (See the illustrative examples below.)

Corollary: A homogeneous system with more unknowns than equations always has a non-trivial solution.

Proof: With the above notation we have m < n, and since the number r of non-zero rows in a row REF satisfies r ≤ m, we have r < n.

Note 38 In Corollary 91 of section 3.6 in Chapter 3 there is an alternative approach to the topic.

Some illustrative examples

1. Suppose that the row REF of a matrix A is

A′′ = [ 1 3 −5 0 0  7 ]
      [ 0 0  0 1 0 −2 ]
      [ 0 0  0 0 1  3 ]

Here the non-pivot columns are columns 2, 3 and 6. The general solution to Ax = 0 is

x = [−3x2 + 5x3 − 7x6; x2; x3; 2x6; −3x6; x6] = [−3t1 + 5t2 − 7t3; t1; t2; 2t3; −3t3; t3]

We have renamed the parameters as t1 = x2, t2 = x3 and t3 = x6.

2. As a second example consider Example 2.2.5 (equation (2.27)), which in REF reads (see array (2.30)):

[ 1 0 0 11 ]       [ 0 ]
[ 0 1 0  7 ] x  =  [ 0 ]
[ 0 0 1 −2 ]       [ 0 ]
[ 0 0 0  0 ]       [ 0 ]

This shows that we are actually dealing with 3 equations for 4 unknowns. Column 4 of the coefficient matrix is the only non-pivot column. Hence x4 = t can have any value, and we get the same solution as before (equation (2.28)).

Exercise 39

1. Prove that if the augmented array [A, b] for a system of equations Ax = b has been reduced to row (reduced) echelon form [A′, b′], then A′ is also in (reduced) echelon form.

Hint 40 A zero row of A′ comes after the non-zero rows of A′.

2. Find the row reduced echelon form of the matrices and arrays of detached coefficients in Exercise 35, numbers 1a, 1b, 2, 3, 4, 9, 11, 12 and 13. Solve (once more) the problems involving systems of equations Ax = b using the row REF. Write down the row REF of the corresponding coefficient matrix A. Illustrate the above theorem 2.2.15 in the homogeneous cases.

3. Consider the general problem of solving two linear equations for two unknowns x1 and x2:

[ a11 a12 ]  [x1]   [ b1 ]
[ a21 a22 ]  [x2] = [ b2 ]    (2.31)

(a) In (2.31) all the entries except x1 and x2 are supposed known. Using formal Gauss reduction, find formulae for the unknowns x1 and x2 in terms of the other symbols. For the purposes of the reduction you may assume any number you like is non-zero. In fact, conclude that if only d = a11a22 − a21a12 ≠ 0, then you have a formula for x. Can you see that the solution must be unique?

Remark 41 The number a11a22 − a21a12 is called the determinant of the matrix A = [aij] and is denoted by |A|. For example,

| 3 −2 |
| 7  4 |  =  (3)(4) − (7)(−2) = 26

We will study determinants in some detail in Chapter 4.

Hint 42 Consider the array of detached coefficients and, assuming a11 ≠ 0, the step

a11   a12                     b1
  0   −(a21/a11)a12 + a22     −(a21/a11)b1 + b2    −(a21/a11) R1 + R2    (2.32)

Now assume −(a21 a12)/a11 + a22 ≠ 0 and solve for x2. Next, find x1.

(b) Use your formula to solve for x:

[ 11 −2 ] x  =  [ −5 ]
[ −7 13 ]       [  3 ]
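The formulae asked for in No.3 can be packaged as a tiny program. The sketch below (Python; numpy used only for comparison) is one possible form the answer can take, so compare it with your own derivation:

    import numpy as np

    def solve2x2(a11, a12, a21, a22, b1, b2):
        d = a11 * a22 - a21 * a12       # the determinant |A| of Remark 41
        assert d != 0, "no unique solution"
        x1 = (b1 * a22 - b2 * a12) / d
        x2 = (a11 * b2 - a21 * b1) / d
        return x1, x2

    print(solve2x2(11, -2, -7, 13, -5, 3))        # the system of No.3(b)
    print(np.linalg.solve(np.array([[11.0, -2.0], [-7.0, 13.0]]),
                          np.array([-5.0, 3.0])))  # the same answer from numpy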
2.3 Summary of Chapter 2

2.3.1 Matrix equation of simultaneous equations

Given a system of m equations for n unknowns, this is expressed compactly by a matrix equation (see section 2.2)

Ax = b

where A is the m × n coefficient matrix, b is a column with m entries and x is a column with n entries. Entry i of the column matrix Ax is the 'dot' product of row i of A and the column x (see subsection 2.2.3 and equation (2.13)):

[ai1 ai2 ... ain] [x1; x2; ...; xn] = ai1x1 + ... + ainxn

2.3.2 Detached Coefficients

(See subsection 2.2.5.) The other way to express the system of equations Ax = b is as the augmented matrix, or array of detached coefficients, C = [A, b], where the last column of C is understood to be the column b of constants.

2.3.3 Elementary row operations (eros). Row echelon and row reduced echelon form

(See subsections 2.2.6, 2.2.8 and 2.2.11.) Elementary row operations (eros), known as Gauss reduction, are done on the array C, reducing it to row echelon form (row EF):

C′ = [A′, b′]

Further eros (Gauss-Jordan) done on C′ reduce it to row reduced echelon form (row REF):

C′′ = [A′′, b′′]

The important fact about elementary row operations on a system of equations (or its detached array of coefficients) is that they leave the solution set unchanged. For any particular x the equation Ax = b is satisfied if, and only if, the equation A′x = b′ is satisfied if, and only if, the equation A′′x = b′′ is satisfied.

2.3.4 Homogeneous equations

(See subsection 2.2.10.) A homogeneous system of equations has b = 0 and consequently b′ = b′′ = 0, since an elementary row operation done on 0 leaves it as 0. For a particular x, the matrix equation Ax = 0 holds if, and only if, A′x = 0 holds if, and only if, A′′x = 0 holds. A system of homogeneous equations with more unknowns than equations always has a non-trivial solution (see 2.2.15).

Chapter 3

Linear Transformations and Matrices

In this chapter we look at a matrix A dynamically, as a transformation or mapping: A transforms a vector x into another vector y = Ax. Solving a set of linear equations Ax = b then appears as the inverse operation, namely recovering x from the transformed vector b. Examples in ordinary space are projections on planes and lines through the origin, reflections in such objects, and rotations about an axis through O (important for mechanics). Next we deal with the difficult idea of linear independence, and this is followed by a section on subspaces that can be regarded as an introduction to the idea of a general 'vector space'. Finally, there is a section on inverses of matrices.

3.1 The Algebra of Vectors and Matrices

We begin by developing some systematic notation for m × n matrices, their rows and columns, and for column and row vectors in general.

3.1.1 Systematic Notation For Matrices

1. Recall from section 2.2 that an m × n matrix is a rectangular array A with m rows and n columns. In standard notation, A = [aij], where the i-j entry aij ∈ ℜ is at the intersection of row i and column j. The set of m × n matrices is denoted by ℜm×n. Hence A ∈ ℜm×n means that A has m rows and n columns and that the size of A is m × n. Some authors write A = Am×n to indicate the size of A.
Alternative notation. If A = [aij], we will also find it most convenient to use the notation

[A]ij = aij    (3.1)

to describe the i-j entry aij of A. An n × n matrix is called square of size n. Such matrices deserve to be studied in their own right (see section 3.5).

2. From the definitions it follows that ℜn×1 is the set of columns with n entries. We call these column vectors and write

b = [b1; b2; ...; bn]    (3.2)

ℜ1×n denotes the set of row vectors with n entries:

a = [a1 a2 ... an]    (3.3)

For typographical reasons this row vector is sometimes written [a1, ..., an]. It is traditional to use ℜn for both ℜn×1 (column vectors) and ℜ1×n (row vectors). We will only use the notation ℜn if the context makes it quite clear whether rows or columns are meant.

3. The m × n zero matrix has 0 for all its entries and is written Om×n, or simply O when the size is understood.

4. Addition of matrices. Let A = [aij] and B = [bij] be of the same m × n size. Their sum is A + B = C, where the (i, j) entry of C is cij = aij + bij. In brief, using the notation of equation (3.1),

[A + B]ij = [A]ij + [B]ij    (3.4)

It is essential to note that the addition A + B makes sense only if A and B have the same size.

5. Multiplication of a matrix by a scalar. Let A = [aij] and s ∈ ℜ. The product sA = As has the number s·aij in the (i, j) position:

[sA]ij = [As]ij = s[A]ij = [A]ij s

Remark 43 Notice that it makes no difference if we write the scalar s on the left or on the right of A. The matrix −A is defined as (−1)A and has for (i, j) entry the number −aij, i.e. [(−1)A]ij = −[A]ij. The difference A − B of two matrices of the same size means the same thing as A + (−B). We will see that if a is a row vector as in (3.3) then it is more natural to write sa = [sa1 sa2 ... san] rather than as. On the other hand, for a column b, as in equation (3.2), it is better to write bs. See, for example, equation (3.23) below.

6. Row i of the m × n matrix A = [aij] is denoted by Ai•. Hence,

Ai• = [ai1 ai2 ... ain]    (i = 1, ..., m)    (3.5)

7. Column j of the m × n matrix A = [aij] is denoted by A•j. Hence,

A•j = [a1j; a2j; ...; amj]    (j = 1, ..., n)    (3.6)

Remark 44 The symbols for row i and column j are consistent with our other notation. Thus [A]ij = aij is at the intersection of row Ai• and column A•j. Other notations in use are Ai and [A]i for the ith row of A, and A(j) or [A]^j for the jth column. However, we will not use these.
Thus AT is an n × m matrix and can also be vaguely defined as ‘the matrix A with its rows and columns interchanged’. In fact, using the notation (3.1), £ T¤ (3.9) A ji = [A]ij (i = 1, ..., n; j = 1, 2, ..., m) 11. The product a b of the row a with n entries and the column b with n of entries is   b1  £ ¤  b1  a b = a1 a2 ... an  .  = a1 b1 + a2 b2 + · · · + an bn (3.10)  ..  bn Remark 45 This was introduced in subsection 2.2.3 of Chapter 2. You can, if you wish, call this a ‘dot product’ since it is a generalization of the dot product Equation (1.4) of Chapter 1, but there is no need here for the ‘dot’. Note the order in a b: first a, then b, This is important as ba will turn out to be something quite different: in fact, an n × n matrix.(See No.3 of Exercise 63). 3.1.2 Linear combinations of vectors, linear dependency, Span 1. The column vector v ∈ ℜn×1 is a linear combination of the vectors u1 , u2 ,...,uk with coefficients s1 , s2 , ..., sk if v = u1 s1 + u2 s2 + · · · + uk sk (3.11) 68 CHAPTER 3. LINEAR TRANSFORMATIONS AND MATRICES We then say that v depends linearly on u1 , u2 ,. . . ,uk . Of course, we can also write v = s1 u1 + s2 u2 + · · · + sk uk , with the coefficients on the left, but, as remarked, we prefer to write the coefficients of columns on the right. If v and u1 , u2 ,. . . ,uk are row vectors all of the same size our preference would be v = s1 u1 + s2 u2 + · · · + sk uk 2. The idea of ‘span’ was introduced in Exercise 2, No.14. The span of (column or row) vectors u1 , u2 , . . . , uk is the set of all linear combinations of these vectors. It is also called the space spanned by the vectors and is denoted by sp(u1 , . . . , uk ). Thus, v ∈ sp(u1 , u2 , . . . , uk ) if, and only if, (3.11) holds for certain coefficients s1 , . . . , sk . The idea of ‘span’ first came up in No.14 of Exercise 2. The column space of a matrix A is the span of its columns; its row space is the space spanned by its rows. The ideas of ‘span’, column space and row space will only be systematically developed in the section 3.6 on vector spaces. 3. Linear combinations of matrices We wish to extend the definition of ‘linear combination’ to matrices: Let A1 , A2 , ..Ak be k matrices, all of the same size and suppose that s1 , s2 , ..., sk are k scalars. The linear combination of A1 , A2 , ..,Ak with coefficients s1 , s2 , ..., sk is s1 A1 + s2 A2 + ... + sk Ak 3.1.3 Some examples 1. · −1 2 2. · − 73 2 5 5 4 13 −4 −7 ¸ 3.  5 2 3 7 2  ∈ ℜ3×2 , −1 2 6. (a) Let Then · ¸ 0 0 0 0 ∈ ℜ2×2 3 · · −4 ∈ ℜ2×3 ,  5 11  5  ∈ ℜ2×1 ,  − 92  ∈ ℜ3×1 6 4. 5.  ¸ 5 4 −4 O23 = · 0 0 0 0 0 0 ¸ + · −1 45 13 2 −4 −7 13 −7 ¸ ¸ , O22 = ¸ = −1 45 13 2 −4 −7 ¸ 2 − 74 5 −3 2 (100) = (100) · ·  A= A1• = A3• = −1 5 4 13 £ £ −7 −7 · 1 −1 = ·  2 − 23 −4 5  3 −7 8 −1 2 13 0 0 0 0 −7 − 32 3 8 ¤ ¤ ¸ − 21 − 32 −100 200 6 −14 125 −400 ¸ 1300 −700 ¸ (3.12) 69 3.1. THE ALGEBRA OF VECTORS AND MATRICES (b) Let   7 2 − 23    X=  9  − 65 then X2• = X4• = £ £ − 32 − 65 ¤ ¤ Remark 46 We usually use an ‘underline’ notation like (3.2) for column vectors but will occasionally write X2 = X2• = − 32 if it is understood that X is a column matrix (i.e. a column vector). Similar remarks apply to row vectors (i.e. row matrices). 7. (a) If A is equation (3.12) above then A•2   2 =  −4  −7  A•1 =  (b) If Y = £ 5 − 72 Y•2 −1 5 4 13   −9 10 · ¸ 7 = − 2 −23 ¤ Y•5 = [−23] 8. 
(a) The vectors (3 × 1 column matrices) e1, e2 and e3 in ℜ3×1 are

e1 = [1; 0; 0],  e2 = [0; 1; 0],  e3 = [0; 0; 1]

We met these as rows in Chapter 1, 1.1.17, calling them i, j and k respectively.

(b) In ℜ4×1,

e1 = [1; 0; 0; 0],  e2 = [0; 1; 0; 0],  e3 = [0; 0; 1; 0],  e4 = [0; 0; 0; 1]

9. We have

     [ 1 0 0 ]              [ 1 0 0 0 ]
I3 = [ 0 1 0 ]   and   I4 = [ 0 1 0 0 ]
     [ 0 0 1 ]              [ 0 0 1 0 ]
                            [ 0 0 0 1 ]

10. (a) Let

A = [   −5  0  2 ]
    [  7/2  6 −1 ]
    [    1 −1  3 ]
    [ −1/3  9  0 ]

Then A ∈ ℜ4×3, while the transpose

A^T = [ −5 7/2  1 −1/3 ]
      [  0   6 −1    9 ]  ∈ ℜ3×4
      [  2  −1  3    0 ]

(b) The transpose of the column vector (3.2) is the row b^T = [b1 b2 ... bn], and the transpose of a row vector such as (3.5) is a column vector.

(c) [Ii•]^T = I•i and [I•i]^T = Ii•, and I^T = I, where I = In.

11. (a) An example of the product a b of a row a and a column b is

[−3 5 −7 4] [2; 3; −2; 9] = (−3)(2) + (5)(3) + (−7)(−2) + (4)(9) = 59

(b) Another, using a general 2 × 3 matrix A = [aij] and a general 3 × 4 matrix B = [bij], is

Ai• B•j = ai1 b1j + ai2 b2j + ai3 b3j    (i = 1, 2; j = 1, 2, 3, 4)

Note that such a product is only possible when the number of columns of A equals the number of rows of B. The significance of such products will be seen in subsection 3.3.4 below.

Referring to linear combinations (item 3 in 3.1.2) we have:

1. (a) In ℜ4×1,

[1; 2; −3; 2](−6) + [2; 9; −4; −5/6](3) + [11; 1; 2; 3](−1/2) = [−11/2; 29/2; 5; −16]

(b) In ℜ1×5,

(−2)[p q 1/2 −3 s] + 4[−1 r+2 −1/4 5/2 1] = [−2p − 4, −2q + 4r + 8, −2, 16, −2s + 4]

2. The span of the rows of I3 is clearly ℜ1×3. Consider the column space of the matrix

A = [ 1 1 1 1 ]
    [ 0 1 1 1 ]
    [ 0 0 1 1 ]
    [ 0 0 0 1 ]

The span of the columns of A is the whole of ℜ4×1, since for any choices of b1, b2, b3 and b4,

[b1; b2; b3; b4] = [1; 0; 0; 0](b1 − b2) + [1; 1; 0; 0](b2 − b3) + [1; 1; 1; 0](b3 − b4) + [1; 1; 1; 1] b4

See also subsections 3.1.6 and 3.1.9 below.

3. In ℜ2×2,

(7) [ −1 2 ] + (−3) [  0 8 ]  =  [ −1 2 ] (7) + [  0 8 ] (−3)  =  [ −7 −10 ]
    [  3 6 ]        [ −2 5 ]     [  3 6 ]       [ −2 5 ]          [ 27  27 ]

3.1.4 Some basic algebraic properties

For all m × n matrices A, B and C and scalars s and t:

1. A + (B + C) = (A + B) + C (associative law for addition).
2. A + B = B + A (commutative law for addition).
3. O + A = A (O = Om×n behaves like the number 0).
4. A + (−A) = O.
5. sA = As (by definition).
6. s(A + B) = sA + sB.
7. (s + t)A = sA + tA.
8. s(tA) = (st)A.
9. 1A = A.
10. Every vector x ∈ ℜn×1 can be uniquely written as a linear combination

x = e1x1 + e2x2 + ... + en xn

(Compare 1.1.17 in Chapter 1.) Hence it is trivial that every x depends linearly on the unit vectors e1, e2, ..., en; in other words, these unit vectors span ℜn×1. However, there are many other sets of vectors that have a similar property, as we shall see. In Chapter 1, No.15 of Exercise 2, we indicated that any three mutually orthogonal vectors in ℜ3 span the whole of space. You are asked to show this rigorously in Exercise 78, No.9.

11. (A^T)^T = A for any matrix A.

12. Referring to the product (3.10),

(a b)^T = b^T a^T    (3.13)

13. (a) Let u and v be column vectors with n entries and a a row with n entries. Then

a(u + v) = a u + a v    (3.14)

(b) If s is a scalar,

a(us) = (a u)s = (sa)u    (3.15)
(c) Equations (3.14) and (3.15) combine, so that for all scalars s and t:

a(us + vt) = (a u)s + (a v)t    (3.16)

(d) More generally, equation (3.16) can be generalized as follows: suppose u1, u2, ..., uk are column vectors in ℜn×1, s1, s2, ..., sk are scalars and a ∈ ℜ1×n is a row vector. Then

a(u1s1 + u2s2 + ... + uk sk) = (a u1)s1 + (a u2)s2 + ... + (a uk)sk    (3.17)

14. (a) Quite similarly, if a and b are row vectors with n entries and u is a column with n entries,

(a + b)u = a u + b u    (3.18)

(b) and for scalars t,

(ta)u = t(a u)    (3.19)

(c) Equations (3.18) and (3.19) combine, so that for all scalars s and t:

(sa + tb)u = s(a u) + t(b u)    (3.20)

(d) The result (3.20) can be generalized as follows: suppose a1, a2, ..., ak are rows in ℜ1×n, t1, t2, ..., tk are scalars and u ∈ ℜn×1 is a column. Then

(t1a1 + t2a2 + ... + tk ak)u = t1(a1 u) + t2(a2 u) + ... + tk(ak u)    (3.21)

3.1.5 Some proofs

The above properties and their proofs are similar to those for vectors discussed in Chapter 1, subsections 1.1.6 and 1.1.9, equation (1.8). See No.7 of Exercise 47 below. We prove a few properties as illustrations.

Proof of property (6), that s(A + B) = sA + sB. The (i, j) entry in s(A + B) is s(aij + bij). But s(aij + bij) = saij + sbij, which is the (i, j) entry in sA + sB. Since s(A + B) and sA + sB have the same entries, they are equal. Using the alternative notation (equation (3.1)),

[s(A + B)]ij = s[A + B]ij = s([A]ij + [B]ij) = s[A]ij + s[B]ij = [sA]ij + [sB]ij = [sA + sB]ij

Proof of property (11):

[(A^T)^T]ij = [A^T]ji = [A]ij

Proof of equation (3.14). With obvious notation, entry i in the column u + v is ui + vi. Hence,

a(u + v) = a1(u1 + v1) + ... + an(un + vn)
         = (a1u1 + a1v1) + ... + (anun + anvn)
         = (a1u1 + ... + anun) + (a1v1 + ... + anvn)
         = a u + a v

Or, using Σ-notation,

a(u + v) = Σ ai(ui + vi) = Σ (aiui + aivi) = Σ aiui + Σ aivi = a u + a v    (sums over i = 1, ..., n)

Proof of equation (3.15). Again with obvious notation, the jth entry in the vector us is uj s. Hence

a(us) = a1(u1s) + ... + an(uns) = (a1u1 + ... + anun)s = (a u)s

Proof of equation (3.16). By equation (3.14), a(us + vt) = a(us) + a(vt). Using equation (3.15), a(us) + a(vt) = (a u)s + (a v)t.

Exercise 47

1. Consider Exercise 23 in Chapter 2 and the matrices in 2.2.1 referred to there. For each of these, as well as for the following A ∈ ℜm×n, find m and n and, using our notation in equation (3.5) and equation (3.6), write down all rows and columns of A.

(a) A = [ −2    5 ]
        [  3 −4/7 ]
        [  1 −1/2 ]

(b) A = [    2 4   7   0 ]
        [    0 9  −1 3/4 ]
        [ −3/5 1 3/2   0 ]

(c) A = [ p f c ]
        [ b q p ]
        [ d f r ]

(d) A = [ −3 −1  8 0 1 ]
        [  2  2 −7 9 2 ]
        [  0 −2  0 3 1 ]
        [ −4  5 −6 2 0 ]

2. Write down C if it is given that

(a) C has three columns, C•1 = [0; 2], C•2 = [2; 7], C•3 = [3; 3].
(b) C has three rows and C1• = [5 −z q], C2• = [0 a b], C3• = [−11 x 4].
(c) C = kI5 (k a scalar).

3. Find:

(a) [1; 3; 2; 3] 2 + [0; −2; −1/2; 10](−6) + [2; 5; 1; 4] 5 + [0; −1; −1/6; −1](−1)

(b) 4 [  6 −2 ]  + 3 [ 11 0 ]  − 2 [  0 6 ]
      [ −1  8 ]     [  2 1 ]      [  9 4 ]
      [  7  2 ]     [ −4 5 ]      [ −3 7 ]

(c) (1/2 + p)[2 −4 6 3] − (2/3)(2p + q)[−3 9 18 −6] + (1/4)(p − q)[0 −24 8 6]

4. Verify equation (3.16) in the case a = [−3 5/6 1/3 −1/2 1], u = [1; 2; −1; −3; 1/4], v = [5/2; 2; −1; 1; 0], s = 2 and t = −6.
 Find the transposes  of the matrices ¸ · − 56 −2 12 £ ¤ v −3 7 a  1 8 −9 , x −u c d and . −9 w 0 z 1 −7 2 3 £ ¤ £ ¤ 6. For the matrices A in No.5 compare Ai• and AT •i as well as A•j and AT j• . Can we £ ¤ £ ¤ T T say that [Ai• ] = AT •i and [A•j ] = AT j• ? Are these true for any matrix A? 7. Prove the above basic properties in subsection 3.1.4. For equation (3.17) and equation (3.21) you may assume k=3, since the proofs of the general cases are quite similar. Hint 48 Study the above proofs in 3.1.5 as well as your own proofs of the statements in 1.1.6 and 1.1.9 of Chapter 1. Apart from the fact that Chapter 1 concentrated on vectors in ℜ1×3 or ℜ1×2 most of the proofs for general vectors go through almost word-for-word. 8. (a) i. In ℜm = ℜm×1 let x be a linear combination of the vectors a1 , a2 and a3 . Suppose that each of a1 , a2 and a3 is a linear combination of e, f and g. Show that x is a linear combination of e, f and g. ii. Express the this result in terms of the ‘span’ concept. (b) Is the above result 8(a)i true for p × q matrices X, A1 A2 , A3 , E, F and G (in place of the vectors x,...,g)? Can you prove your answer? (c) Find a statement that generalizes No.8a: Suppose that x ∈ ℜm depends linearly on a1 , a2 , . . . , ak and each vector aj depends linearly on b1 , b2 , . . . , bn , then ...? (You need not prove your statement yet, but see Exercise 63, No.13). T 9. Show that, provided A and B are of the same size, (A + B) notation of equation 3.1). = AT + B T . (Use the Solution: T [(A + B) ]ji = [A + B]ij = [A]ij + [B]ij = [AT ]ji + [B T ]ji = [AT + B T ]ji . 10. For the following pairs A, B of matrices find all possible products Ai• B•j and arrange them in matrix form in what you think is a nice way. (a) A= (b) Solution: ¸ · 61 −32 −65 19 10 4 · 2 −9 3 1  ¸  −1 5 0 , A= 3 2 −4   −1 5 0 , A= 3 2 −4 , B= · 8 2 −1 −5 4 7 B= · −1 2 3 0 B= · −1 2 3 0 ¸ 1 0 −4 7 1 0 −4 7 ¸ ¸ 75 3.1. THE ALGEBRA OF VECTORS AND MATRICES 11. Show that with appropriate definitions in ℜn×1 (or in ℜ1×n ) properties 1 - 5 of subsection 1.1.13 in Chapter 1 are valid. Notice in p particular No.4, which allows one to define the length |x| of a vector x ∈ ℜn×1 as |x| = xT x. This in turn gives us a definition of the distance between a and b as |a − b|. 3.1.6 Ax viewed in two equivalent ways 1. In terms of the products Ai• x: Let A = [aij ] be m × n and x = [xj ] be n × 1. In Chapter 2 we considered the system of m equations for n unknowns symbolized by Ax = b. In 2.2.4 of the same chapter and in other examples we saw that Ax is a column with m entries. Its ith entry is (compare equation (3.10)): Ai• x = ai1 x1 + ai2 x2 + ... + a1n xn Hence,    Ax =   A1• x A2• x .. . Am• x      (3.22) We met such a product in 2.2.4 of Chapter 2. The entry in the ith row of Ax is the ‘dot product’ Ai• x of row Ai• and column x. 2. Ax as a linear combination of the columns of A with coefficients x1 , . . . , xn : Ax = A•1 x1 + A•2 x2 + ... + A•n xn (3.23) This view was anticipated in Chapter 2, for example in equations (2.17, (2.19) and (2.25). Like the previous view, it can immediately be seen from the definitions. The following examples should make this clear. Example 3.1.1 Let A= and x ∈ ℜ3×1 . 
Then · a11 a21 a12 a22 a13 a23 ¸   x1 a11 a12 a13  x2  Ax = a21 a22 a23 x3 · ¸ · ¸ · ¸ · ¸ a11 x1 + a12 x2 + a13 x3 a11 a12 a13 = = x1 + x2 + x3 a21 x1 + a22 x2 + a23 x3 a21 a22 a23 · ¸ So  x1 −3 6 5  x2  2 −4 7 x3 ¸ · ¸ ¸ · ¸ · · 5 6 −3 −3x1 + 6x2 + 5x3 x3 x2 + x1 + = = 7 −4 2 2x1 − 4x2 + 7x3 · ¸  76 CHAPTER 3. LINEAR TRANSFORMATIONS AND MATRICES Example 3.1.2 As a more general example, consider the general system of 4 equations for five unknowns, equation (2.11) of the previous chapter. We found there equation (2.12):     a11 x1 + a12 x2 + a13 x3 + a14 x4 + a15 x5 A1• x  A2• x   a21 x1 + a22 x2 + a23 x3 + a24 x4 + a25 x5     Ax =   A3• x  =  a31 x1 + a32 x2 + a33 x3 + a34 x4 + a35 x5  A4• x a41 x1 + a42 x2 + a43 x3 + a44 x4 + a45 x5           a11 a12 a13 a14 a15  a21   a22   a23   a24   a25           =   a31  x1 +  a32  x2 +  a33  x3 +  a34  x4 +  a35  x5 a41 a42 a43 a44 a45 Example 3.1.3 In the previous example let x = ej be the unit vector of equation (3.7) with n = 5 entries . Then Ae2 = A•2 (the second column of A) since x2 = 1 and x1 = x3 = x4 = x5 = 0. Similarly, for j = 1, · · · , 5, we have Aej = A•j . 3.1.7 Aej = A•j in general From the above example it is clear that if A is an m × n matrix and ej is the j th unit vector as in equation (3.7), then Aej = A•j (j = 1, · · · , n) (3.24) Equivalently, if I is the n × n identity matrix, AI•j = A•j 3.1.8 (j = 1, · · · , n) Ax as a product of A and x Henceforth we will regard Ax as the product of the m × n matrix A by the n × 1 column matrix x. 3.1.9 Connection between linear combinations and solving equations, the column space This was also anticipated in Chapter 2 (for example, in Exercise 32 and in No.2b from Exercise 35. Solving a system of equations Ax = b for x is exactly the same as looking for coefficients xj so that b is a linear combination of the columns of A: Ax = b if, and only if b = A•1 x1 + A•2 x2 + ... + A•n xn In terms of the concept of ‘span’ (see item 2 in subsection 3.1.1), this is saying that b is in the span of the columns of A: b ∈ sp(A•1 , A•2 , . . . , A•n ) Example 3.1.4 Is a = [1, 5, 2, 7] a linear combination of u = [1, 3, 0, 5] and v = [0, 1, 1, 1, 1]? Another way to put it: ’Does a depend linearly on u and v?’ In other terminology we have been using, ’is it true that a ∈ sp(u, v)?’ Solution: To answer this, we change the vectors into column vectors (i.e. we use uT , v T , aT ) and try to solve the following matrix equation for x:     1 0 · 1 ¸  3 1  x1  5       0 1  x2 =  2  7 5 1 3.1. THE ALGEBRA OF VECTORS AND MATRICES 77 In other words, we try to express the column on the right as a linear combination of the columns of the 4 × 2 coefficient matrix. Use the array of detached coefficients: 1 3 0 5 1 0 0 0 0 1 1 1 1 2 2 2 1 0 0 0 0 1 0 0 1 2 0 0 0 1 1 1 1 5 2 7 −3R1 + R2 −5R1 + R4 This is equivalent to −R2 + R3 −R2 + R4 Hence x2 = 2 and x1 = 1, and aT = uT + v T 2, so a = u + 2v and a depends linearly on u and v. Remark 49 We are in the habit of solving equations in the form Ax = b (so we are taking linear combinations of columns), but we could just as well have left the vectors as rows: Then we would be taking linear combinations of rows and using elementary column operations. Remark 50 Any result about columns has a corresponding result about rows and vice-versa. This is a fact you will appreciate more and more as we go ahead. 
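Example 3.1.4 is also easy to check by machine. The following is a minimal numerical sketch (assuming Python with NumPy, which is of course not part of these notes); it solves the 4 × 2 system above by least squares and recovers the coefficients x1 = 1, x2 = 2. Here v is the four-entry vector [0, 1, 1, 1] used in the solution.

    import numpy as np

    u = np.array([1.0, 3.0, 0.0, 5.0])
    v = np.array([0.0, 1.0, 1.0, 1.0])
    a = np.array([1.0, 5.0, 2.0, 7.0])

    A = np.column_stack([u, v])        # the 4 x 2 coefficient matrix [u^T, v^T]

    # Solve Ax = a; a zero residual means a lies in sp(u, v).
    x, res, rank, sv = np.linalg.lstsq(A, a, rcond=None)
    print(x)                           # -> [1. 2.], so a = u + 2v
    print(np.allclose(A @ x, a))       # -> True: a depends linearly on u and v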
Example 3.1.5 Consider, from Chapter 2, Example 2.1.4, equation (2.7):

[ 1  1  3 ]       [ -1 ]
[ 1  3  4 ] x  =  [ -1 ]
[ 2  4  7 ]       [  5 ]

We found from the detached array in echelon form (2.20) in Chapter 2 that there is no solution. Thus the right-hand side [ -1  -1  5 ]T is not a linear combination of the columns of the coefficient matrix A.

Example 3.1.6 Let

    [  2   5  -11 ]
A = [ -3  -6   12 ]                                                      (3.25)
    [  4   7  -13 ]

What sort of solutions does the homogeneous system Ax = 0 have? Is some column of A a linear combination of the other columns of A?

Solution: Solve the homogeneous linear system

[  2   5  -11 ] [ x1 ]   [ 0 ]
[ -3  -6   12 ] [ x2 ] = [ 0 ]
[  4   7  -13 ] [ x3 ]   [ 0 ]

x = 0 is in any case the (trivial) solution. As usual, to find the general solution, use Gauss reduction to find a row echelon form of the coefficient matrix A:

[ 2  5  -11 ]
[ 0  1   -3 ]                                                            (3.26)
[ 0  0    0 ]

and the general solution with parameter t is

[ x1 ]   [ -2t ]
[ x2 ] = [  3t ]                                                         (3.27)
[ x3 ]   [   t ]

(Check this.) If we put t = 1, we get

[  2   5  -11 ] [ -2 ]   [  2 ]        [  5 ]     [ -11 ]     [ 0 ]
[ -3  -6   12 ] [  3 ] = [ -3 ] (-2) + [ -6 ] 3 + [  12 ] 1 = [ 0 ]      (3.28)
[  4   7  -13 ] [  1 ]   [  4 ]        [  7 ]     [ -13 ]     [ 0 ]

Since each coefficient -2, 3 and 1 is non-zero, every column of A is linearly dependent on the other two columns. For example,

      [  2 ]   [  5 ]       [ -11 ]
A•1 = [ -3 ] = [ -6 ] 3/2 + [  12 ] 1/2
      [  4 ]   [  7 ]       [ -13 ]

Example 3.1.7 Let

    [  2   5   1 ]
A = [ -3  -6   0 ]                                                       (3.29)
    [  4   7  -2 ]

Can we find a non-trivial solution x to Ax = 0? Is any column of A linearly dependent on the other two?

Solution: Reduce A to row-echelon form:

[ 2    5    1  ]
[ 0   3/2  3/2 ]   (3/2)R1 + R2
[ 0   -3   -4  ]   (-2)R1 + R3

[ 2    5    1  ]
[ 0   3/2  3/2 ]
[ 0    0   -1  ]   2R2 + R3

[ 2  5  1 ]
[ 0  1  1 ]   (2/3)R2
[ 0  0  1 ]   (-1)R3

So a row EF for A is

    [ 2  5  1 ]
C = [ 0  1  1 ]
    [ 0  0  1 ]

Thus Ax = 0 has only the trivial solution

    [ 0 ]
x = [ 0 ]
    [ 0 ]

Hence no column of A can depend linearly on the other two. For example, should we have A•1 = A•2 β + A•3 γ, then

    [  1 ]
x = [ -β ]
    [ -γ ]

would be a non-trivial solution to Ax = 0 because 1 ≠ 0.

Remark 51 Questions of the sort "which columns of A are linearly dependent on other columns of A?" are further explored below in 3.4.22. When Ax = 0 has only the trivial solution x = 0 we will call the columns of A linearly independent. See subsection 3.4.1.

3.1.10 Fundamental properties of the product Ax

Let A be an m × n matrix and suppose u1, u2 are vectors in ℜn = ℜn×1 and s1, s2 are scalars. Then

A (u1 s1 + u2 s2) = (Au1) s1 + (Au2) s2                                  (3.30)

Proof Consider the ith entry of the left-hand side of (3.30). By equation (3.16) this is

Ai• (u1 s1 + u2 s2) = (Ai• u1) s1 + (Ai• u2) s2

As (Ai• u1) s1 + (Ai• u2) s2 is by definition the ith entry of the right-hand side of equation (3.30), the result follows.

More generally, equation (3.30) easily extends to the following general result: Let A be an m × n matrix and suppose u1, u2, ..., uk are vectors in ℜn = ℜn×1 and s1, s2, ..., sk are scalars. Then

A (u1 s1 + u2 s2 + ... + uk sk) = (Au1) s1 + (Au2) s2 + ... + (Auk) sk   (3.31)

Exercise 52

1. Find Ix if I = In is the n × n identity matrix as in equation (3.8) and x ∈ ℜn×1.

2. Let A be m × n. Show that βA2• + A1• is a linear combination of A2•, ..., Am• if, and only if, A1• is a linear combination of A2•, ..., Am•.

3. Suppose that v1, v2, ..., vk are solutions to a system Ax = b of m equations in n unknowns x1, ..., xn. Let w = v1 α1 + v2 α2 + ... + vk αk be a linear combination of these vectors.

(a) Show that if the system is homogeneous, i.e. b = 0, then w is also a solution, i.e.
Aw = 0 Solution: Aw = A (v 1 α1 + v 2 α2 + · · · + v k αk ) = (Av 1 ) α1 + (Av 2 ) α2 + · · · + (Av k ) αk = 0α1 + · · · 0αk = 0 (b) Find a simple example to show that (a) may fail if b 6= 0. (c) If α1 + α2 + · · · + αk = 1 show that Aw = b. 4. (a) Consider Exercise 35 of Chapter 2, Numbers 3, 4a, 5a and 6, 11a and 11b. These require solutions to a matrix equation Ax = b. For which of these is b a linear combination of the columns of A? Consider those for which b is a linear combination of the columns of A. For which of these are the coefficients in the linear combination unique? For those which are not, express b in two ways as linear combinations of the columns of A. (b) * In Problem 13 from Exercise 35 of Chapter 2. For which values of a, b, c, d is £ ¤T a b c d a linear combination of the columns of A? 5. Prove the following theorem, which was anticipated in No.10 of Exercise 35 in Chapter 2. Let Ax = b represent m equations in n unknowns. Suppose that the system has at least 80 CHAPTER 3. LINEAR TRANSFORMATIONS AND MATRICES one particular solution, say p. Then if Av = 0 so is x = p + v a solution and all solutions to Ax = b have this form. Conclude that if m < n and Ax = b has at least one solution, then the system has infinitely many solutions. In fact, if there are k non-pivot columns then the general solution contains k independent parameters. (Use 2.2.15 from Chapter 2.) 3.2 Matrices as Mappings Let A be an m × n matrix. In what follows ℜn stands for the set of column vectors with n entries, ie. ℜn = ℜn×1 . The matrix A converts or transforms the vector x ∈ ℜn into the vector Ax ∈ ℜm . In other words, letting y = Ax, have a function of the independent variable x with y = Ax being the dependent variable. In the language of set-theory, A is a function or transformation or mapping with domain ℜn and range in the set ℜm : A : ℜn → ℜm (3.32) A maps the vector x to the vector y = Ax. Symbolically, x → Ax x−−−− A−−→Ax or 3.2.1 The range The range of x → Ax is the set of vectors y such that Ax = y for some x. In other words, the range is just the set of all linear combinations of the columns of A. The range is therefore the column space of A (see item 2 in subsections 3.1.1 and 3.1.9). See also section 3.6, where some important properties of subspaces are developed. Remark 53 If c is a fixed column vector with m entries then x → c+Ax is also a mapping from ℜn to ℜm . However, we will concentrate mainly on the case c = 0 (linear transformations). 3.2.2 Solving equations is the inverse process of the mapping x → Ax Given the mapping (3.32) and a vector y ∈ ℜm , we can ask: Is there an x ∈ ℜn such that y = Ax? This is precisely the problem of searching for a solution x to the system of equations y = Ax, as discussed in 3.1.9. In Chapter 2 we found that some systems have a solution, others not. Example 3.2.1 Consider the Cartesian plane ℜ2 . In equation (1.27) of Chapter 1 we found the reflection of the point (p1 , p2 ) in the line ℓ lying in the x − y plane and which passes through the origin making an angle of 30◦ with the positive x−axis: ¶ µ 1√ 1√ 1 1 p1 + 3p2 , 3p1 − p2 p′ = 2 2 2 2 Hence considered as a mapping, the matrix " A= transforms the point x = · x1 x2 ¸ 1 √2 3 2 1 2 √ 3 − 21 # into its reflection in the above line: y=A · x1 x2 ¸ = · √ 1 1 2x √1 + 2 13x2 1 2 3x1 − 2 x2 ¸ 81 3.2. MATRICES AS MAPPINGS (We have written p and p′ as columns x and y respectively). See Figure 3.1. Geometrically, it is clear that the range of this reflection is ℜ2 . 
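The claim can also be verified numerically. In the sketch below (assuming Python with NumPy, not part of the original notes), Ax = y is solved for an arbitrary y, and reflecting twice is seen to return every point:

    import numpy as np

    # Reflection in the line through O making 30 degrees with the x1-axis.
    A = np.array([[1 / 2,            np.sqrt(3) / 2],
                  [np.sqrt(3) / 2,  -1 / 2        ]])

    y = np.array([-2.7, 4.1])           # an arbitrary target point
    x = np.linalg.solve(A, y)           # Ax = y has a solution...
    print(np.allclose(A @ x, y))        # -> True: y is in the range

    print(np.allclose(A @ (A @ x), x))  # -> True: reflecting twice is the identity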
This can be seen algebraically by showing that, whatever the values of y1 and y2 , the matrix equation · √ 1 1 2x √1 + 2 13x2 1 2 3x1 − 2 x2 ¸ = · y1 y2 ¸ always has a solution for x1 and x2 . (Show this). x x2 - axis line ℓ A O 30 x1 - axis ❯ y = Ax Figure 3.1 Example 3.2.2 The simpler matrix A= · −1 0 0 1 ¸ represents a reflection in the x2 −axis. The range is again ℜ2 . Example 3.2.3 The matrix A= · 1 0 0 0 1 0 ¸ T represents the mapping that projects the point x = [x1 , x2 , x3 ] ∈ ℜ3 in space onto the point y = Ax = [x1 , x2 ]T ∈ ℜ2 in the x1 − x2 plane. The range is obviously ℜ2 . Example 3.2.4 Consider a fixed column vector a with three entries and non-zero column matrix   u1 A =  u2  u3 The mapping, A : ℜ3 → ℜ1 = ℜ, and t → a + At represents a parametric equation of a straight line in space passing through the point a, as in Chapter 1, 1.2. The only difference is that here we are using columns instead of rows. 82 CHAPTER 3. LINEAR TRANSFORMATIONS AND MATRICES Example 3.2.5 Consider the matrix  u1 A =  u2 u3  v1 v2  v3 in which the columns are non-zero and non-parallel. The mapping · ¸ · ¸ s s →A t t now represents a generic equation of a plane going through the origin. See Chapter 1, subsection 1.3.1. Example 3.2.6 The matrix   −4 0 0 A =  0 −4 0  0 0 −4 transforms the point x ∈ ℜ3 into the point y = Ax = −4x ∈ ℜ3 . This represents a stretching x → 4x of x by a factor of 4 followed by an inversion x → −x in the origin O. ¸ · 0 −1 as mapping of ℜ2 to ℜ2 is Example 3.2.7 The matrix A = 1 0 · ¸ · ¸· ¸ −x2 0 −1 x1 y= = x1 1 0 x2 and represents a rotation of the point with position vector x through 90◦ (anticlockwise looking down on the plane). Convince yourselves of this by looking at a few values of x, but we will return to this below in 3.3.11. ¸ · 1 α , interpreted as a mapping, leaves x2 fixed and Example 3.2.8 The matrix A = 0 1 moves x1 to x1 + αx2 . It is known as a shear transformation. · ¸ x1 + αx2 Ax = x2 See Figure 3.2 where α > 0. The box OBCD is transformed into the quadrilateral OEF D. The range is again ℜ2 , as can very easily be seen. (Exercise 56 No.3). 3.2.3 Ax = Bx for all x if, and only if, A = B It is obvious that if A = B then Ax = Bx for all x. Conversely, if Ax = Bx for x ∈ ℜn then A and B must have the same number n of columns and also the same number m of rows (the number of rows in Ax = Bx). Furthermore, if ej ∈ ℜn is the j th unit vector, then by equation (3.24), A•j = Aej = Bej = B•j for j = 1, · · · , n. Hence A and B have the same columns and so A = B. 3.3 Linear Transformations Because of equation (3.31), the mapping (3.32) is said to be linear and is called a linear transformation. In abstract terms, a mapping T : ℜn → ℜm (3.33) 83 3.3. LINEAR TRANSFORMATIONS x2 B O x ..................................... C E ... .. .. .. .. .. .. .. D ... . .. αx2 x1 . Ax ... F .. .. .. .. .. .. .. . x1 + αx2 Figure 3.2 is called a linear transformation if for all u1 and u2 in ℜn and all scalars s1 and s2 it follows that T (u1 s1 + u2 s2 ) = T (u1 ) s1 + T (u2 ) s2 (3.34) An equivalent definition is the following. The mapping (3.33) is a linear transformation if, and only if, for all u and v in ℜn and all scalars s, T (u + v) = T (u) + T (v) (3.35) T (us) = T (u) s (3.36) For it is obvious that if (3.34) holds then so will (3.35) and (3.36). On the other hand, suppose that these last two conditions hold for the mapping T . 
Then for any u1 and u2 scalars s1 and s2 , T (u1 s1 + u2 s2 ) = T (u1 s1 ) + T (u2 s2 ) = T (u1 ) s1 + T (u2 ) s2 Remark 54 By putting s = 0 in (3.36) we see that a necessary condition for T to be linear is that it maps the zero vector of ℜn to the zero vector of ℜm . Most transformations are decidedly non-linear. Consider, for example, T : ℜ1 → ℜ1 given by T (x) = x2 . Here T (0) = 0 but T fails (3.35), since (e.g.) T (1 + 1) = 22 6= T (1) + T (1) = 2. 3.3.1 An extension of equation (3.34) The result expressed by equation (3.34) easily extends to the following: Let T be a linear transformation and suppose vectors u1 , u2 , ..,uk in ℜn are given as well as scalars s1 , s2 , ...sk , then T (u1 s1 + u2 s2 + ... + uk sk ) = T (u1 ) s1 + T (u2 ) s2 + ... + T (uk ) sk We use this result to show that (3.37) 84 CHAPTER 3. LINEAR TRANSFORMATIONS AND MATRICES 3.3.2 A linear transformation T : ℜn → ℜm is defined by a unique matrix mapping x → Ax Proof : We have seen that the mapping x → Ax is a linear transformation. Conversely, let T : ℜn − ℜm be a linear transformation. We show that there is a unique matrix A such that T (x) = Ax for all x ∈ ℜn . ¡ ¢ Let ej be one of the n unit vectors (3.7). By assumption, T ej is a column vector with m entries. Since x ∈ ℜn has the expression x = e1 x1 + e2 x2 + ... + en xn , we deduce from equation (3.37) that (3.38) T (x) = T (e1 ) x1 + T (e2 ) x2 + ... + T (en ) xn In other words, by equation (3.23), T (x) = Ax (x ∈ ℜn ) (3.39) ¡ ¢ = T ej for j = 1, · · · , n. This is saying that A is the Here the matrix A is defined by A•j m × n matrix £ ¤ A = T (e1 ) T (e2 ) ... T (en ) The matrix A in (3.39) is obviously unique. 3.3.3 A¡linear transformation T is completely determined by its effect ¢ T ej on the unit vectors ej This is just a restatement of equation (3.38). Remark 55 Since a linear transformation is just a mapping determined by a matrix, why bother with the concept ‘linear transformation’ ? One reason is that our intuition often suggests that a transformation is linear. Example 3.3.1 In equation 1.14 of Chapter 1 we found the projection Q of a point P = (p1 , p2 , p3 ) on the line t (−2, −1, 1): q= µ 1 1 1 1 1 1 1 1 2 p1 + p2 − p3 , p 1 + p2 − p3 , − p1 − p 2 + p3 3 3 3 3 6 6 3 6 6 ¶ Our geometric intuition strongly suggests that this should represent a linear transformation. Prove this by finding the matrix determining it. Solution: Writing x1 = p1 , x2 = p2 , x3 = p3 and using column vectors, the above projection formula becomes     2  1 − 13 q1 x1 3 3 1  q2  =  1 − 16   x2  3 6 1 1 1 q3 x3 −3 −6 6 This is the mapping x → q = Ax and is indeed a linear transformation. Example 3.3.2 Find a matrix representation of the mapping which sends the point P to the foot of the perpendicular from P to the plane x1 + 2x2 + x3 = −4. Show that this mapping is not linear but that the projection of P on the plane x1 + 2x2 + x3 = 0 is a linear transformation. Find the matrix that reflects in this plane. 85 3.3. LINEAR TRANSFORMATIONS Solution: From equation (1.22) in Chapter 1, with P = (p1 , p2 , p3 ) we found for the foot Q q = (p1 , p2 , p3 ) − 4 + p1 + 2p2 + p3 (1, 2, 1) 6 Letting x1 = p1 , x2 = p2 , x3 = p3 , and again using column vectors, this becomes      2    − x1 +2x6 2 +x3 q1 x1 3  q2  =  x2  −  4  +  −2 x1 +2x2 +x3  3 6 2 q3 x3 − x1 +2x6 2 +x3 3 This cannot be linear since x = 0 does not map to q = 0. 
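A quick numerical check (a sketch, assuming Python with NumPy) makes the failure of linearity concrete:

    import numpy as np

    n = np.array([1.0, 2.0, 1.0])      # normal of the plane x1 + 2x2 + x3 = -4

    def foot(x):
        # Foot of the perpendicular from x to the plane x1 + 2x2 + x3 = -4.
        return x - (4 + n @ x) / 6 * n

    print(foot(np.zeros(3)))           # -> [-2/3 -4/3 -2/3], not the zero vector

    u = np.array([1.0, 0.0, 0.0])
    v = np.array([0.0, 1.0, 0.0])
    print(np.allclose(foot(u + v), foot(u) + foot(v)))   # -> False: not additive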
However, from equation (1.23) we found the formula for the projection Q of P = (p1 , p2 , p3 ) on the plane x1 + 2x2 + x3 = 0 to be q = (p1 , p2 , p3 ) − p1 + 2p2 + p3 (1, 2, 1) 6 In matrix form with P = x, this reads       − x1 +2x6 2 +x3 q1 x1  q2  =  x2  +  −2 x1 +2x2 +x3  6 q3 x3 − x1 +2x6 2 +x3   5 1 1 6 x1 − 3 x2 − 6 x3 =  − 31 x1 + 31 x2 − 31 x3  − 1 x1 − 31 x2 + 65 x3  56   − 13 − 61 x1 6 1 − 31   x2  =  − 13 3 1 1 5 x3 −6 −3 6 The matrix that reflects in the plane x1 + 2x2 + x3 = 0 is   2    5 − 31 − 16 1 0 0 6 3 1 − 31  −  0 1 0  =  − 32 2  − 31 3 5 0 0 1 − 61 − 13 − 31 6 This also defines a linear transformation. − 32 − 13 − 23 (3.40)  − 13 − 23  2 3 Exercise 56 Partial Solutions 1. Find a matrix formula for the foot Q of the perpendicular from P = x ∈ ℜ3 to the plane 2x + z − 3 = 0. Do the same for the plane 2x + z = 0, showing in the latter case that we get a linear transformation. Write down the matrix A of this transformation, as well as the matrix that reflects in the plane 2x + z = 0. Hint 57 Refer to 1d from Exercise 16 in Chapter 1. There you found the foot Q of the perpendicular from (p1 , p2 , p3 ) to the plane 2x + z − 3 = 0: ¶ µ 2 6 4 2 3 1 q= p1 − p3 + , p 2 , p3 − p1 + 5 5 5 5 5 5 Now use appropriate columns in place of row vectors. 2. (a) Consider equation (1.13) in subsection 1.2.5 of Chapter 1 for the projection of a point on a line ℓ that passes through the origin. Using column vectors find a matrix A such that the foot Q of the perpendicular from the point x is given by q = Ax. 86 CHAPTER 3. LINEAR TRANSFORMATIONS AND MATRICES (b) Consider equation (1.24) from Exercise 16 in Chapter 1. Assume that the plane goes through the origin. Using column vectors find a matrix equation q = Ax for the foot Q of the perpendicular from the point x onto the plane (the projection of P onto the plane). The matrices in 2a and 2b are called projection matrices. (c) Specialize (a) in case the transformation is restricted to the Cartesian plane, i.e. is from ℜ2 to ℜ2 . 3. Find the range of the shear transformation (Example 3.2.8) and prove that your statement is correct. 4. Use Exercise 20 from Chapter 1 to find linear transformations describing the following: (a) The reflection in a line that passes through the origin. Remark 58 Observe that this matrix actually represents a rotation through π radians about the given line. Contrast this with the situation when we restrict ourselves to the Cartesian plane in Exercise 63, No.8, where a reflection cannot be a rotation. (b) The reflection in a plane that passes through the origin. (c) Specialize (a) in case the transformation is restricted to the Cartesian plane, i.e. is from ℜ2 to ℜ2 . (d) What do you think the ranges of these mappings are? 5. In the x1 - x2 plane let A be the rotation matrix of Example 3.2.7 and suppose B is the 2 × 2 matrix that reflects in the x1 −axis. (a) Find matrices C and D such that for all vectors x ∈ ℜ2 we have Cx = B (Ax) and Dx = A (Bx). (b) Show that C and D are reflection matrices and describe them geometrically. Remark 59 In the next section, we will regard C as the product of B and A, in that order: C = BA. Similarly, D = AB. See Example 3.3.3 and notice that AB 6= BA. 6. Let A be one of the projection matrices from the previous exercises. Show geometrically why you expect that A (Ax) = Ax for all points x. Now suppose that A is one of the reflection matrices. What should A (Ax) be? See also Exercise 63, No.10 below. 
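These expectations can be previewed numerically for the plane x1 + 2x2 + x3 = 0 of (3.40). The sketch below (assuming Python with NumPy) builds P from the formula P = I - n nT / |n|², which is only established later, in Exercise 63 No.17:

    import numpy as np

    n = np.array([1.0, 2.0, 1.0])      # normal to the plane x1 + 2x2 + x3 = 0
    I = np.eye(3)

    P = I - np.outer(n, n) / (n @ n)   # projection onto the plane, as in (3.40)
    M = 2 * P - I                      # reflection in the plane

    print(np.allclose(P @ P, P))       # -> True: projecting twice changes nothing
    print(np.allclose(M @ M, I))       # -> True: reflecting twice is the identity
    print(np.allclose(M @ P, P), np.allclose(P @ M, P))   # -> True True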
3.3.4 The product AB of two matrices

Let A be an m × k matrix and B a k × n matrix. As linear transformations,

B : ℜn → ℜk   and   A : ℜk → ℜm

Consequently, for x ∈ ℜn,

x → A (Bx)                                                               (3.41)

defines a transformation from ℜn to ℜm. See Figure 3.3. Symbolically,

x --B--> Bx --A--> A (Bx)

It is essential to observe that (3.41) only makes sense if the number of columns of A equals the number of rows of B (here both are k).

Remark 60 We have to read this as: first apply B to x, then apply A to the result Bx. This "backward" reading is due to our traditional functional notation: for ordinary functions f and g we write the composition of f and g as (f ◦ g)(x) = f (g (x)).

[Figure 3.3: x in ℜn is sent by B to Bx in ℜk, which is sent by A to A(Bx) in ℜm.]

We will define the product P = AB in such a way that

P x = A (Bx)   (x ∈ ℜn)                                                  (3.42)

In other words, so that AB defines the composition of the transformations A and B.

3.3.5 There is one, and only one, matrix P satisfying equation (3.42)

Proof Suppose (3.42) holds. Put x = ej, one of the n unit vectors (3.7). Then

P•j = P ej = A (B ej) = A B•j   (j = 1, · · · , n)                        (3.43)

In other words, for (3.42) to hold, (3.43) is the only choice for P. On the other hand, (3.41) is a linear transformation (see No.15 in Exercise 63). Therefore its matrix can only have columns (3.43). We therefore define the matrix AB as follows:

3.3.6 Definition

If A is m × k and B is k × n, then

AB = [ AB•1  AB•2  · · ·  AB•n ]                                          (3.44)

Equivalently,

[AB]•j = AB•j   (j = 1, · · · , n)

With this notation, writing AB for P, we rewrite (3.42) as

(AB) x = A (Bx)   (x ∈ ℜn)                                               (3.45)
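The defining property (3.45) is easy to confirm numerically for random matrices of compatible sizes. A minimal sketch (assuming Python with NumPy):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.integers(-5, 6, size=(2, 3)).astype(float)   # A is m x k
    B = rng.integers(-5, 6, size=(3, 4)).astype(float)   # B is k x n
    x = rng.standard_normal(4)                           # x in R^n

    P = A @ B                                            # the product AB
    print(P.shape)                                       # -> (2, 4): AB is m x n
    print(np.allclose(P @ x, A @ (B @ x)))               # -> True: (AB)x = A(Bx)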
3.3.7 Alternative, more symmetric definition of AB

Equation (3.44) seems biased towards columns, but that this is not the case can be seen by looking at the entries of (AB)•j. Using (3.22) with x = B•j and the notation of equation (3.1),

[AB]ij = Ai• B•j   (i = 1, · · · , m; j = 1, · · · , n)                    (3.46)

The entry in row i and column j of AB is the product of row Ai• and column B•j. Equation (3.46) can just as well be taken as the definition of AB and in fact is the standard definition. This equation was anticipated in Exercise 47, No.10a above.

Whatever is true for columns (rows) has its exact counterpart for rows (columns). From (3.46) we see that

[AB]i• = Ai• B   (i = 1, · · · , m)                                       (3.47)

or more fully,

     [ A1• B ]
AB = [ A2• B ]
     [  ...  ]
     [ Am• B ]

Expanding (3.46), we get, using the usual convention for matrix entries and Σ-notation,

[AB]ij = Σ (r = 1 to k) air brj = ai1 b1j + · · · + aik bkj

As already remarked, this equation only makes sense if the number of columns of A equals the number of rows of B. We may write (AB)m×n = Am×k Bk×n.

Example 3.3.3 Let B be the matrix that reflects in the x1-axis and let A be the rotation matrix of Example 3.2.7. Find the matrix of (i) first rotating then reflecting and (ii) the other way round, first reflecting then rotating. See No.5 in Exercise 56.

Solution (i) If we first rotate through 90° then reflect in the x1-axis we get the product

BA = [ 1   0 ] [ 0  -1 ]  =  [  0  -1 ]
     [ 0  -1 ] [ 1   0 ]     [ -1   0 ]

Solution (ii) If we first reflect and then rotate we obtain

AB = [ 0  -1 ] [ 1   0 ]  =  [ 0  1 ]
     [ 1   0 ] [ 0  -1 ]     [ 1  0 ]

Example 3.3.4 In Exercise 47, No.10a,

A = [ 2  -9 ] ,   B = [  8  2  -1 ]
    [ 3   1 ]         [ -5  4   7 ]

Find AB.

Solution: From (3.46),

AB = [ A1• B•1   A1• B•2   A1• B•3 ]  =  [ 61  -32  -65 ]
     [ A2• B•1   A2• B•2   A2• B•3 ]     [ 19   10    4 ]

Example 3.3.5 Find yB, where y = [ y1  y2 ] and B is as in the previous example:

yB = [ y1  y2 ] [  8  2  -1 ]                                             (3.48)
                [ -5  4   7 ]

Solution:

yB = [ y B•1   y B•2   y B•3 ] = [ 8y1 - 5y2   2y1 + 4y2   -y1 + 7y2 ]

Example 3.3.6 Express equation (3.48) as a linear combination of rows of B.

Solution: In (3.23) we found that the product Ax is a linear combination of the columns of A with coefficients xk. In exactly the same way we find the product (3.48) to be a linear combination of the rows of B with coefficients y1 and y2:

yB = y1 B1• + y2 B2• = y1 [ 8  2  -1 ] + y2 [ -5  4  7 ]
   = [ 8y1 - 5y2   2y1 + 4y2   -y1 + 7y2 ]

3.3.8 For a row y, the product yB is a linear combination of the rows of B with coefficients yi

Let B be a k × n matrix and y = [y1, · · · , yk] a row vector with k entries. Then

yB = y1 B1• + · · · + yk Bk•                                              (3.49)

The reasoning is just as in the case of (3.23) except that now we are multiplying on the left of B by a row y and taking a linear combination of rows of B. Like equation (3.22) we have

yB = [ yB•1  yB•2  · · ·  yB•n ]

which is just a special case of equation (3.44).

3.3.9 The rows of AB as linear combinations of rows of B and the columns of AB as linear combinations of columns of A

Let A = [air] be m × k and B = [brj] be k × n. In the product (3.46), we have

1. [AB]i• = ai1 B1• + ai2 B2• + · · · + aik Bk•   (i = 1, · · · , m)       (3.50)

2. [AB]•j = A•1 b1j + A•2 b2j + · · · + A•k bkj                           (3.51)

To see equation (3.50), use equations (3.47) and (3.49) with y = Ai•. In words: "Row i of AB is a linear combination of the rows of B with coefficients taken from row i of A."

To see equation (3.51), use equation (3.45) and our earlier result equation (3.23) with x = B•j. In words: "Column j of AB is a linear combination of the columns of A with coefficients taken from column j of B."

The above examples illustrate these statements, as do the following.

Example 3.3.7 Find AB in two ways: (i) in terms of linear combinations of rows of B and (ii) in terms of linear combinations of columns of A, given that

A = [  2  1 ]   and   B = [  5  4 ]
    [ -1  3 ]             [ -7  0 ]

Solution:

(i) In terms of the rows of AB:

AB = [ A1• B ] = [ 2 [ 5  4 ] + 1 [ -7  0 ]    ] = [   3   8 ]
     [ A2• B ]   [ (-1) [ 5  4 ] + 3 [ -7  0 ] ]   [ -26  -4 ]

(ii) In terms of the columns of AB:

AB = [ AB•1  AB•2 ] = [ A•1 (5) + A•2 (-7)   A•1 (4) + A•2 (0) ] = [   3   8 ]
                                                                   [ -26  -4 ]

Example 3.3.8 If A is a 3 × 3 matrix, find matrices B and C such that

     [ -5A2• + 67A1• + 92A3• ]
BA = [ 9A3• - 4A1•           ]   and   AC = [ 7A•1 - 37A•2   19A•2 + 17A•1 - 7A•3 ]
     [ 8A2• + 21A1•          ]

(AC has two columns).

Solution:

    [ 67  -5  92 ]             [   7  17 ]
B = [ -4   0   9 ]   and   C = [ -37  19 ]
    [ 21   8   0 ]             [   0  -7 ]
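Equations (3.50) and (3.51) can be confirmed numerically for the matrices of Example 3.3.4. A minimal sketch (assuming Python with NumPy):

    import numpy as np

    A = np.array([[2.0, -9.0],
                  [3.0,  1.0]])
    B = np.array([[ 8.0, 2.0, -1.0],
                  [-5.0, 4.0,  7.0]])
    AB = A @ B

    # Row 1 of AB is a combination of the rows of B, coefficients from row 1 of A:
    print(np.allclose(AB[0], 2 * B[0] + (-9) * B[1]))            # -> True

    # Column 1 of AB is a combination of the columns of A, coefficients from column 1 of B:
    print(np.allclose(AB[:, 0], A[:, 0] * 8 + A[:, 1] * (-5)))   # -> True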
Example 3.3.9 What is the matrix of the transformation T : ℜ3 → ℜ3 that first projects onto the plane x1 + 2x2 + x3 = 0 as in (3.40) and then projects onto the x1-x2 plane as in Example 3.2.3?

Solution: The required matrix is the product

[ 1  0  0 ] [  5/6  -1/3  -1/6 ]   [  5/6  -1/3  -1/6 ]
[ 0  1  0 ] [ -1/3   1/3  -1/3 ] = [ -1/3   1/3  -1/3 ]
            [ -1/6  -1/3   5/6 ]

3.3.10 The Associative Law: (AB) C = A (BC)

If A is m × k and B is k × n, our whole inspiration (3.42) for the definition of the product P = AB is that (AB) x = A (Bx) should hold for all x ∈ ℜn. This is a special case of the associative law for multiplication of matrices, but in fact gives us the general law directly.

Statement and proof of the associative law Let A be an m × k matrix, B a k × n matrix and C an n × p matrix. Then

(AB) C = A (BC)                                                          (3.52)

Considered as mappings, both sides of (3.52) have the same meaning: first apply C, then B, then A. Therefore both sides are equal as matrix products.

Powers of A if A is square Let A be m × m. Then (AA) A and A (AA) are both defined and are equal by (3.52), so we can write A3 for this product. Similarly, if n > 1 is an integer we have An = AA · · · A (n times), where the bracketing does not matter. By convention, A0 = Im (the m × m identity matrix).

Remark 61 We have given a motivated and especially simple proof of the associative law (3.52). The usual proof given in textbooks is the following, where r runs from 1 to k and s from 1 to n:

[A (BC)]ij = Σr [A]ir [BC]rj = Σr [A]ir ( Σs [B]rs [C]sj )
           = Σr Σs [A]ir [B]rs [C]sj
           = Σs ( Σr [A]ir [B]rs ) [C]sj = Σs [AB]is [C]sj = [(AB) C]ij

3.3.11 Rotations are linear transformations

Let X be a point in the Cartesian plane ℜ2 = ℜ2×1 and suppose that R rotates OX = x through the angle θ about the origin, anticlockwise looking down the x3-axis towards O. This is a clockwise rotation through θ looking from the origin O along the positive x3-axis. Let R (x) be the effect of R on x.

[Figure 3.4: points A, B and C with position vectors a, b and c = a + b, together with their images A′, B′ and C′ under the rotation R; OACB and OA′C′B′ are congruent parallelograms.]

We will now see that R : ℜ2 → ℜ2 is a linear transformation. In Figure 3.4, A and B are any two points in the plane with position vectors a and b. The point C has position vector c = a + b and the quadrilateral OACB is a parallelogram. The points A′, B′ and C′ are the respective results of rotating about O the points A, B and C through the angle θ. In other words, A′, B′ and C′ have position vectors R (a), R (b) and R (a + b) respectively. The parallelogram OA′C′B′ is congruent to OACB and this shows that

R (a + b) = R (a) + R (b)

Thus the transformation R satisfies the first condition in equation (3.35) for linearity. It is pretty obvious that if we multiply a by a scalar s then R (a) will be multiplied by the same factor, i.e. that R (as) = R (a) s, so that the second condition for linearity as expressed in equation (3.36) also holds. We have shown that R : ℜ2 → ℜ2 is a linear transformation.

3.3.12 The matrix Rθ of a rotation through θ

Let Rθ be the matrix of the above rotation transformation R that rotates through the angle θ. Then

Rθ = [ cos θ  -sin θ ]                                                   (3.53)
     [ sin θ   cos θ ]

To see this, note that from Figure 3.5,

Rθ [ 1 ] = [ cos θ ]   and   Rθ [ 0 ] = [ -sin θ ]
   [ 0 ]   [ sin θ ]            [ 1 ]   [  cos θ ]

Since these are the first and second columns of Rθ respectively, this proves (3.53).

[Figure 3.5: the unit vectors (1, 0) and (0, 1) rotated through θ into (cos θ, sin θ) and (-sin θ, cos θ).]
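The matrix (3.53) is easy to experiment with; the sketch below (assuming Python with NumPy) also previews 3.3.13:

    import numpy as np

    def R(theta):
        # Rotation through theta about the origin, equation (3.53).
        return np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])

    print(np.allclose(R(np.pi / 2) @ [1.0, 0.0], [0.0, 1.0]))  # e1 goes to e2
    theta, phi = 0.7, -1.9
    print(np.allclose(R(theta) @ R(phi), R(theta + phi)))      # -> True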
Solution:

R(π/3) = [ cos π/3  -sin π/3 ] = [   1/2    -(√3)/2 ]
         [ sin π/3   cos π/3 ]   [ (√3)/2      1/2  ]

R(2π/3) = [ cos 2π/3  -sin 2π/3 ] = [  -1/2    -(√3)/2 ]
          [ sin 2π/3   cos 2π/3 ]   [ (√3)/2     -1/2  ]

3.3.13 The addition formulae for sine and cosine

We note that rotation through φ followed by the rotation through θ is equivalent to the rotation through θ + φ, that is,

Rθ Rφ = Rθ+φ

This means

[ cos θ  -sin θ ] [ cos φ  -sin φ ] = [ cos (θ + φ)  -sin (θ + φ) ]      (3.54)
[ sin θ   cos θ ] [ sin φ   cos φ ]   [ sin (θ + φ)   cos (θ + φ) ]

Multiplying the two matrices on the left gives the addition formulae for sine and cosine:

[ cos θ cos φ - sin θ sin φ   -cos θ sin φ - sin θ cos φ ]   [ cos (θ + φ)  -sin (θ + φ) ]
[ sin θ cos φ + cos θ sin φ    cos θ cos φ - sin θ sin φ ] = [ sin (θ + φ)   cos (θ + φ) ]

Remark 62 Let X ≠ O be a point in three-dimensional space. Looking along OX we can rotate clockwise through θ and this defines a transformation T : ℜ3 → ℜ3. Much the same argument shows that T is linear. However, the formula for its matrix is more complicated.

Exercise 63

1. Find all possible products of the following matrices, whenever the product is defined, and when this is the case, express the rows and columns of the product as linear combinations, as in Example 3.3.7.

A = [ -1  3 ] ,  B = [ -2   5 ] ,  C = [  2  0  -5 ] ,  D = [ -3  4  -1 ] ,  E = [  2 ]
    [  2  1 ]        [  4  -7 ]        [ -4  3  11 ]                             [ -1 ]
                     [  0   3 ]                                                  [  6 ]

2. Given that B is a 3 × 3 matrix, find by inspection matrices A and C such that

     [ B1• - 2B3•        ]
AB = [ 2B2• + 5B3•       ]   and   BC = [ B•1 (-2) + B•2 (3) + B•3    B•2 - B•1 ]
     [ B1• + 4B2• + 7B3• ]
     [ -3B1•             ]

Note that BC has two columns.

3. If a ∈ ℜ1×n is a row and b ∈ ℜm×1 is a column, is the product X = b a defined? If so, describe the entries [X]ij.

4. If AB = C and two of A, B, C are square then so is the other. True or false?

5. Let A be an m × n matrix and Im and In identity matrices (see equation (3.8)). Show that

Im A = A = A In

Conclude that if A is n × n and I = In then IA = AI = A.

6. In No.10 of Exercise 5 of Chapter 1 you found the formula for the projection of a point (p1, p2, p3) on the line (-2t, -t, 5t). Find its matrix B as a linear transformation. In No.1 of Exercise 56 above you found the matrix A of the projection on the plane 2x + z = 0. Find

(a) the matrix of the transformation that first projects on the plane 2x + z = 0 then on the line (-2t, -t, 5t);

(b) the matrix of the transformation that first projects on the line (-2t, -t, 5t) then on the plane 2x + z = 0. The answer is

     [  1/5  0  -2/5 ]        [   4   2  -10 ]   [  4/25   2/25  -2/5 ]
AB = [   0   1    0  ] (1/30) [   2   1   -5 ] = [  1/15   1/30  -1/6 ]
     [ -2/5  0   4/5 ]        [ -10  -5   25 ]   [ -8/25  -4/25   4/5 ]

7. (a) Complete the proof of the addition formulae for sine and cosine by multiplying out equation (3.54).

(b) Check by direct multiplication that Rθ R-θ = I2.

8. * (a) Let ℓ be a line in the Cartesian plane making an angle θ with the positive x1-axis. Find the projection matrix Pθ on ℓ and deduce that the matrix Mθ which reflects in ℓ is given by

Mθ = [ cos 2θ   sin 2θ ]
     [ sin 2θ  -cos 2θ ]

Observe that, in spite of superficial appearances, Mθ is not a rotation matrix. Why?

(b) Let φ and θ be two angles and let α = φ - θ. Prove that Mφ Mθ = R2α in two ways:

[Figure 3.6: a vector x, its reflection Mθ x, and Mφ Mθ x in the x1-x2 plane, where α = φ - θ.]

i. geometrically, by referring to Figure 3.6;

ii. algebraically.

Hint 64 For the geometric solution, let Mθ rotate x through 2γ.
Then Mφ rotates Mθ x through 2(α − γ). For the algebraic solution use the expression for Mθ found in the previous exercise. 9. Following the corkscrew rule, find matrices that represent rotations through θ about the x−, y− and z− axes. Hint 65 Recall the corkscrew rule: looking along the specified axis, rotate clockwise through θ. The required rotation about the z−axis is   cos θ − sin θ 0  sin θ cos θ 0  0 0 1 10. (Compare Exercise 56, No.6 above). Let M be the matrix of a reflection in a plane going through the origin and P the matrix of a projection on the same plane. Explain geometrically why you expect the following to be true. Draw diagrams! (a) M 2 = I3 (b) P 2 = P (c) M P = P = P M Verify these results for the plane 2x + z = 0 of No.1 of Exercise 56 above. For analytical proofs of these results see No.17 below T 11. If the matrix product AB is defined, show that so is B T AT and that (AB) = B T AT . Hint 66 [(AB)T ]ji = [AB]ij = Ai• B•j . Now use equation (3.13) and Exercise 47 No.6. 96 CHAPTER 3. LINEAR TRANSFORMATIONS AND MATRICES 12. Prove that the composite transformation x → A (Bx) of (3.41) is indeed linear. Can you see why a similar result holds for general linear transformations? Hint 67 Consider A [B(us + vt)] = A[(Bu)s + (Bv)t] = · · · A[B(us + vt)] = A[(Bu)s + (Bv)t] = [A(Bu)]s + [A(Bv)]t 13. (“Linear combinations of linear combinations” theorem) (a) In Exercise 47, No.8c you anticipated the following result: Let x be a column vector with m entries that depends linearly on a1 . . . , ak and suppose that each vector aj depends linearly on b1 , b2 , . . . , bn . Then x depends linearly on b1 , b2 , . . . , bn . Using matrix multiplication give a short proof of this statement. It should be clear that the same result must be true if the vectors are rows, but give a modified matrix proof. (b) Interpret this result in terms of the span concept. Hint 68 Let the m × k matrix A have columns a1 , . . . , ak and let the m × n matrix B have columns b1 , . . . , bn . By assumption, there is an n × k matrix C such that A = BC. Since x = Au for some column vector u with k entries, the result follows. Fill in the remaining details. For the row case, think in terms of the transposes of the above matrices. P 14. Show that the column case of No.13 can also be proved using - notation as follows: Pk Pn Write x = j=1 aj uj and aj = i=1 bi cij for j = 1, . . . , n. Then expand and rearrange the sum à n ! k X X bi cij zj x= j=1 i=1 15. Prove that the composite transformation x → A (Bx) of (3.41) is indeed linear. Can you see why a similar result holds for general linear transformations? Hint 69 Consider A [B(us + vt)] = A[(Bu)s + (Bv)t] = · · · A[B(us + vt)] = A[(Bu)s + (Bv)t] = [A(Bu)]s + [A(Bv)]t 16. (a) If the matrix sum A + B and the matrix product C (A + B) are defined, show that C (A + B) = CA + CB. (b) If the matrix sum A + B and and the matrix product (A + B) C are defined, show that (A + B) C = AC + BC. (c) If AB is defined, and s is a scalar, show s(AB) = (sA)B and (AB)s = A(Bs) and that both are equal. Conclude with statements and proofs of more general results than (a) and (b). Hint 70 To show (a) use the definition of matrix multiplication and equation (3.14): [C (A + B)]ij = Ci• [A + B]•j = Ci• (A•j + B•j ) = Ci• A•j + Ci• B•j = · · · If C is m × k and A and B are k × n, you may prefer to proceed directly: [C (A + B)]ij = k X r=1 cir [A + B]rj = k X r=1 cir (arj + brj ) = ... 97 3.4. LINEAR INDEPENDENCE 17. * (a) Let C be an n × n matrix satisfying C 2 = I where I = In . 
2 2 Show that (I − C) = I − C and that (2C − I) = I. The rest of the question refers to No.10 above and to Exercise 56, No.2a, No.2b and No.4. (b) Let P be the projection matrix of Exercise 56, No.2a and write the vector u as a £ ¤T column vector u = u1 u2 u3 . Show that i. P = 1 u uT , |u|2 ii. P 2 = P using matrix multiplication, iii. the matrix M = 2P − I reflecting in the line satisfies M 2 = I. (c) Let P be the projection matrix of Exercise 56, No.2b onto a plane through the origin £ ¤T with normal n = n1 n2 n3 . Show that i. P = I − 1 n nT , |n|2 ii. P 2 = P using matrix multiplication, iii. the matrix M = 2P − I reflecting in the plane satisfies M 2 = I. (d) Complete analytical proofs of No.10 above. Hint 71 Note that we are forming matrix products of a column and a row as in No.3. Also make use of No.17a. 3.4 Linear independence In ℜ2 let a and b be two column vectors. If if one depends linearly on the other, say a = bs for a scalar s, we say they are linearly dependent. Otherwise the vectors are linearly independent: neither depends on the other. In that case, if some linear ¡ ¢combination as + bt = 0, then both s and t have to be zero. For if (say) t 6= 0, then b = a − st , and b would depend on a. Similarly, in ℜ3 three column vectors a, b and c are linearly dependent if one of them is a linear combination of the other two. This will be the case if there are numbers s, t, r not all zero such that as + bt + ct = 0. For example, if s 6= 0, then a will depend linearly on b and c. They are linearly independent if no vector depends linearly on the other two. In that case we can only have as + bt + cr = 0 if s = t = r = 0. Geometrically let a, b and c be linearly independent and let the tails of these vectors are placed at the origin O. Then O and the heads of a, b and c do not lie on a plane. For more details on the geometric meaning of linear independence see subsection 3.4.11 below. We now generalize and formalize these ideas as follows: 3.4.1 Column independence and column dependence Definition Let a1 , a2 , · · · , an be a list of n column vectors each with m entries. Form the m × n matrix A with these vectors as columns: A= £ a1 a2 ··· an ¤ 98 CHAPTER 3. LINEAR TRANSFORMATIONS AND MATRICES If there is a non-trivial solution x 6= 0 to the homogeneous system Ax = 0 (3.55) the vectors a1 , a2 , · · · , an are said to be linearly dependent, otherwise they are linearly independent. To put it another way, the columns of the m × n matrix A are linearly independent if, and only if for all column vectors x with n entries Ax = 0 implies x=0 (3.56) To say the same thing once more, if we can find a linear combination such that a1 x1 + a2 x2 + · · · + an xn = 0 (3.57) in which one or more of the coefficients xj are not zero, then the vectors a1 , a2 ,..., an are linearly dependent (there is a non-trivial linear relationship among them), otherwise they are independent. 3.4.2 An equivalent way to see independence/dependence Let a1 , a2 , · · · , an be column vectors each with m entries. If there is only one vector, i.e. n = 1, then a1 is dependent if, and only if it is the zero vector. If n ≥ 2, the vectors are linearly dependent if, and only if, one of them depends linearly on the others. Proof : The proof of the case n = 1 is left to you. See Exercise 78, No.1a. Let n ≥ 2 and suppose the vectors are linearly dependent. Then (say) x1 6= 0 in equation 3.57. 
In that case a1 depends linearly on a2 ,..., an : ¶ ¶ µ µ x2 xn a1 = a2 − + · · · + an − x1 x1 Similarly, if xj 6= 0, then aj depends linearly on the vectors ai for i 6= j. Conversely, if (say) there are scalars s2 , . . . , sn such that a1 = a2 s2 + · · · + an sn Then a1 + a2 (−s2 ) + · · · + an (−sn ) = 0 Since the coefficient of a1 is 1 6= 0, the vectors are linearly dependent, as required. For a slightly refined version of this theorem, see Exercise 78, No.1(c)v. Independence/dependence of row vectors The definition of linear independence for a list of row vectors is almost identical to that for column vectors. See 3.4.12 below. 3.4.3 Some examples of independence/dependence The most elementary example of linear independence The simplest example of linearly independent column vectors is afforded by one or more columns of the n × n identity matrix I = In . Since Ix = x, we can only have Ix = 0 if x = 0. Slightly more complicated is 99 3.4. LINEAR INDEPENDENCE Example 3.4.1 Consider the matrix (3.25) of example 3.1.6:   2 5 −11 A =  −3 −6 12  4 7 −13 A row-echelon was found in equation (3.26):   2 5 −11 A′ =  0 1 −3  0 0 0 The columns of A are linearly dependent, as can be seen from (3.27) by letting t 6= 0, say putting t = 1. The columns of the matrix (3.29) in example 3.1.7 are linearly independent since Ax = 0 has only the trivial solution x = 0. The columns of A in Exercise 35 No.2b are linearly independent. 3.4.4 If a matrix has more columns than rows, its columns are linearly dependent; if more rows than columns then its rows are linearly dependent This is just a rephrasing of the result of subsection 2.2.15 in Chapter 2: A homogeneous system with more unknowns than equations has a non-trivial solution. The row case is quite similar. 3.4.5 Testing for linear independence of some columns of A Suppose that A has seven columns and we wish to test if the columns of [A•2 , A•5 , A•6 ] are linearly independent. This is the same as considering all vectors x with x1 = x3 = x4 = 0 and testing the implication (3.56). The same principle obviously applies to any selection of columns of A. 3.4.6 Column-rank and row-rank If a1 , a2 , . . . , an are column vectors, their column-rank is the maximum number of vectors aj that are linearly independent. This is also the column-rank of the matrix A = £ ¤ a1 a2 · · · an . Row-rank The row-rank of a matrix A is defined similarly, except that ‘rows’ replace ‘columns’. See 3.4.12 below for more details. Given a list of vectors of the same size (either rows or columns), it should be clear what is be meant by their rank: it is the maximum number of linearly independent vectors among them. 3.4.7 Elementary row operations preserve column independence and dependence ¤ £ Consider a system Ax = b of m equations for n unknowns. Let A′ , b′ be the array after performing elementary row operations. We saw in the last chapter that Ax = b holds if, and only if, A′ x = b′ holds: b = A•1 x1 + · · · + A•n xn if, and only if, b′ = A′•1 x1 + · · · + A′•n xn 100 CHAPTER 3. LINEAR TRANSFORMATIONS AND MATRICES Now for the homogeneous system (3.55), b = b′ = 0, and any dependency relation among the columns of A also holds for A′ and vice-versa : A•1 x1 + · · · + A•n xn = 0 if, and only if, A′•1 x1 + · · · + A′•n xn = 0 In particular, certain columns of A are linearly independent if, and only if, the same columns of A′ are linearly independent. In particular, the column ranks of A and A′ are the same. 
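For the matrix of Example 3.1.6 this invariance can be watched directly. A minimal numerical sketch (assuming Python with NumPy):

    import numpy as np

    A = np.array([[ 2.0,  5.0, -11.0],
                  [-3.0, -6.0,  12.0],
                  [ 4.0,  7.0, -13.0]])

    A2 = A.copy()
    A2[1] = 1.5 * A[0] + A[1]          # the row operation (3/2)R1 + R2

    # The dependency A.3 = A.1 (2) + A.2 (-3) survives, with the same
    # coefficients, and the column rank is unchanged:
    print(np.allclose(A[:, 2],  2 * A[:, 0] - 3 * A[:, 1]))    # -> True
    print(np.allclose(A2[:, 2], 2 * A2[:, 0] - 3 * A2[:, 1]))  # -> True
    print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A2)) # -> 2 2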
Also, column A•j is a linear combination of other columns of A if, and only if, A′•j is a linear combination of the corresponding columns of A′ with the same coefficients. For these reasons, questions regarding dependence or independence of the columns of A are best answered by considering the row reduced echelon form (row REF) of A. 3.4.8 The column-rank of a matrix is the number of non-zero rows in a row echelon form Proof Let k be the number of non-zero rows in an echelon form of the m × n matrix A and suppose that A′′ is its row REF. Recall that the pivot columns (see 2.2.8) of A′′ are columns of the identity matrix Im , which are certainly linearly independent. Hence the column-rank of A′′ is at least k. The k × n matrix of the non-zero rows of A′′ cannot have column-rank more than k by 3.4.4. Since the other rows of A′′ are zero, the column rank of A′′ is exactly k. This is also the column-rank of A. 3.4.9 Every column of A is a linear combination of the linearly independent pivot columns of A This is because the pivot columns of A′′ , when restricted to the non-zero rows, are the columns of the identity matrix Ik . The statement says that the pivot columns span the column space. Example 3.4.2 Consider example 3.1.6 once more. Can we find a set of linearly independent columns of A in (3.25) such that every column of A is a linear combination of these independent columns? What is the column-rank of A? Illustrate the statements of 3.4.8 and 3.4.9. Solution: The original matrix (3.28) is The echelon form 3.26 is   2 5 −11 A =  −3 −6 12  4 7 −13  2 5  0 1 0 0  −11 −3  0 shows that first two columns of A are linearly independent. The row REF shows things even more clearly:   1 0 2  0 1 −3  (3.58) 0 0 0 Every column of A is a linear combination of the first two, which are pivot columns. The column rank is 2. 101 3.4. LINEAR INDEPENDENCE Example 3.4.3 Let  Illustrate 3.4.7, 3.4.8 and 3.4.9.  −1 −4 −3 2  3 4 1 2   A=  2 9 7 −5  −5 −1 4 −9 Solution: We first reduce the matrix A to row REF: −1 −4 −3 2 0 −8 −8 8 0 1 1 −1 0 19 19 −19 −1 −4 −3 0 −1 −1 0 1 1 0 1 1 2 1 −1 −1 −1 −4 −3 2 0 −1 −1 1 0 0 0 0 0 0 0 0 −1 −4 −3 2 0 1 1 −1 0 0 0 0 0 0 0 0 −1 0 0 0 This 3R1 + R2 2R1 + R3 −5R1 + R4 1 8 R2 1 19 R4 R2 + R 3 R2 + R 4 −R2 0 1 −2 4R2 + R1 1 1 −1 0 0 0 0 0 0 gives the row REF A′′ of A as  1  0 E=  0 0 0 1 0 0 −1 1 0 0  2 −1   0  0 The pivot columns of E are E•1 and E•2 and they are linearly independent, and therefore so are A•1 and A•2 . The column-rank of A is 2. Clearly, every column of E is a linear combination of E•1 and E•2 . Therefore, every column of A depends linearly on A•1 and A•2 . For example, E•4 = E•1 (2) + E•2 (−1), and so A•4 = A•1 (2) + A•2 (−1), much as in the previous example. 3.4.10 Uniqueness of the solution to Ax = b. One-to-one linear transformations If the columns of the m × n matrix A are linearly independent then for any m × 1 column vector b the system of equations Ax = b has at most one solution. To see this, consider the following: Suppose that u and v are two solutions, i.e. Au = b and Av = b. Then Au = Av and so Au − Av = A (u − v) = 0. As the columns of A are linearly independent, u − v = 0 and u = v. 102 CHAPTER 3. LINEAR TRANSFORMATIONS AND MATRICES Consider the matrix mapping x → Ax of (3.32). To say that this mapping is one-to-one means that for any two column vectors u and v with n entries, we can have Au = Av only if u = v. In other words, two linear combinations of the columns of A are equal only if they have identical coefficients. 
The mapping x → Ax is one-to-one if, and only if, the column-rank of A is n. Example 3.4.4 A row echelon form (2.2) Chapter 2 is  1  0 0 of the coefficient matrix A of Example 2.1.2 in  1 3 2 1  0 1 The column-rank of A is 3 and the mapping  1 1 x→ 1 1 −1 1  3 4 x −2 from ℜ3 to ℜ3 is one-to-one. All this becomes even clearer if we look at the row-reduced echelon form in Example 2.2.7 of Chapter 2. This is the 3 × 3 identity matrix and so Ax = b has a solution for any b ∈ ℜ3 : The mapping x → Ax has range the whole of ℜ3 . The example illustrates the following remark. 3.4.11 Geometric interpretation of column-rank A proper definition of ‘dimension’ is given in section 3.6), but we can accept that a line in ordinary space through O has dimension 1, that a plane through O has dimension 2 and that the dimension of the whole of ordinary space is 3. The column-rank of a matrix A can be thought of as the dimension of its range when A is considered as a linear mapping. To better appreciate this, see Exercise 78, No.19 as well as No.8 in Exercise 93. Consider the matrix of Example 3.3.1 that projects on a line. Each column is a multiple of any other and the column-rank is 1, as expected. Again, consider the coefficient matrix A of equation (3.40). Because A•1 + A•2 (2) + A•3 = 0, the column-rank of A is not 3. It is easy to see that any two columns are linearly independent, so r(A) = 2, in agreement with our comment. The range of a reflection matrix is the whole of space, which has dimension 3. So we expect its rank to be 3, which is true. 3.4.12 Row Independence and row-rank Everything that can be said about column independence has its counterpart for rows. Hence, as indicated earlier, the rows of the m × n matrix A are linearly independent if, and only if for all row vectors y with m entries, yA = 0 implies y=0 In other words, if we can find a row vector y 6= 0 such that the linear combination y1 A1• + y2 A2• + · · · + ym Am• = 0 then the rows of A are linearly dependent, otherwise they are linearly independent. The row-rank of A is the maximum number of linearly independent rows of A. 103 3.4. LINEAR INDEPENDENCE 3.4.13 The non-zero rows of a matrix in row echelon form are linearly independent To see this, let C be an m × n matrix in row echelon form, and suppose that C1• , C2• ,...., Ck• , are its non-zero rows. Let y1 C1• + y2 C2• + · · · + yk Ck• = 0 where 0 is the zero row with n entries. The first non-zero entry in C1• has only zeros below it. Therefore, y1 = 0 and y2 C2• + · · · + yk Ck• = 0. As the first non-zero entry in C2• has only zeros below it, y2 = 0. Carrying on the argument, y1 = y2 = · · · = yk = 0. The claim is proved. For example, the first two rows of  −1 −4 −3 2  0 1 1 −1     0 0 0 0  0 0 0 0  For more examples see Chapter 2, including Exercises 35 and 39. 3.4.14 Elementary column operations for equations of type yA = c Let A be an m × n matrix. In order to solve the matrix equation yA = c (3.59) we can mimic the procedure followed to solve a system of the type Ax = b. To do this we must perform elementary column operations on the augmented array · ¸ A c Such operations do not alter the solution set to equation 3.59 and are: (i) interchange two columns of the array, (ii) multiply a column of the array by a non-zero constant and (iii) add a multiple of some column to another. 
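In machine computation one usually passes to the transpose instead, a habit the next paragraph recalls. A minimal sketch (assuming Python with NumPy), for a small consistent system yA = c:

    import numpy as np

    A = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, 1.0]])    # 2 x 3
    c = np.array([2.0, 5.0, 1.0])      # a row with 3 entries

    # yA = c is the same system as A^T y^T = c^T, so solve the transpose.
    y, res, rank, sv = np.linalg.lstsq(A.T, c, rcond=None)
    print(y)                           # -> [2. 1.]
    print(np.allclose(y @ A, c))       # -> True: the system is consistent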
In remark 49 we mentioned that we are in the habit of rather doing elementary row operations on the system AT y T = cT Every elementary row operation on this system has its exact counterpart as an elementary column operation on the system yA = c, and this is true in particular for c = 0. Nevertheless, column operations on an array are important in their own right. Example 3.4.5 Let Find the row-rank of B. Solution:  −1  −4 B=  −3 2 3 4 1 2  2 −5 9 −1   7 4  −5 −9 B is the transpose of the matrix A in example 3.4.3. Thus its row-rank is the column-rank of A, which is 2. Note 72 For further elementary properties of linear dependence/independence, see No.1 in Exercise 78 below. 104 CHAPTER 3. LINEAR TRANSFORMATIONS AND MATRICES 3.4.15 A Column Lemma Let A and B have the same number n of columns. If for all column vectors x with n entries, Ax = 0 implies Bx = 0 (3.60) then (column-rank A) ≥ (column-rank B). Proof : We may suppose that the first k columns of B are independent, where k is the columnrank of B. Let x be any vector such that xk+1 = · · · = xn = 0 and Ax = 0. By assumption, Bx = 0. As the first k columns of B are independent, x = 0. Hence the first k columns of A are independent. Remark 73 We have actually shown a bit more: if the condition (3.60 holds, and certain columns of B are linearly independent, then the corresponding columns of A are also linearly independent. 3.4.16 Corollary: column-rank of a matrix product Let the matrix product CA = B be defined. Then (column-rank A ≥ (column-rank B) and if C has independent columns, the two ranks are equal. For, if Ax = 0, then certainly Bx = 0 and so (column -rank A) ≥ (column-rank B) by the Lemma 3.4.15. Now let C have independent columns and suppose Bx = 0. Then C(Ax) = 0 and therefore Ax = 0 as the columns of C are independent. It follows, again by the Lemma, that (column-rank B) ≥ (column-rank A) and equality holds. Example 3.4.6 Let C be the 3 × 3 matrix matrix of example 3.4.2 with column-rank column-rank of the product   1 1 3 2 CA =  −1 1 −2   −3 1 1 4 4 is also 2. of Example 3.4.4 of rank 3 and let A be the 3 × 3 2. From the corollary we can conclude that the 5 −6 7   −11 11 20 12  =  −13 −25 −13 15 27  −38 49  −51 The counterpart for rows of the Column Lemma is 3.4.17 A Row Lemma Let A and B have the same number m of rows. If for all row vectors y with m entries, yA = 0 implies yB = 0 then row-rank A ≥ row-rank B. Remark 74 As with the Column Lemma, we have shown that if the condition (3.4.17) holds and certain rows of B are linearly independent, the corresponding rows of A are also linearly independent. 3.4.18 Corollary: row-rank of a matrix product Let AD = B be a matrix product. Then row-rank A ≥ row-rank B and equality holds if D has independent rows. The proof of this Lemma and its corollary are left as exercises. See No.13 in Exercise 78. 105 3.4. LINEAR INDEPENDENCE 3.4.19 The the column-rank and row-rank of a matrix A are equal Proof : Let the column-rank of the m × n matrix A be k. By 3.4.8 the k pivot columns of A are linearly independent, forming an m × k matrix B. By 3.4.9 all columns of A are linear combinations of the columns of B. Thus, for some for some k × n matrix D, A = BD By Corollary 3.4.18, row-rank B ≥ row-rank A. But B has k columns, so its row-rank cannot exceed k: column-rank A = k ≥ row-rank B ≥ row-rank A Hence the column-rank of A is at least the row-rank of A. 
By symmetry (apply this to the transpose Aᵀ), the row-rank of A is at least the column-rank of A, and so the two are equal.

3.4.20 Definition of rank r(A)
The common row- and column-rank of A is called the rank of A and is denoted by r(A). By 3.4.8, the rank of A is the number of non-zero rows in a row echelon form of A.

Example 3.4.7 Find the rank of the matrix

A = [ 0  0 -3 -3 -1 ]
    [ 2 -2 -2 -3 -2 ]
    [ 0  0  3 -2 -2 ]
    [ 2 -3 -4  3 -3 ]

Solution: This is the matrix of Example 2.2.4 in Chapter 2. We found its row echelon form in equation (2.26):

[ 2 -3 -4  3 -3 ]
[ 0  1  2 -6  1 ]
[ 0  0  3 -2 -2 ]
[ 0  0  0 -5 -3 ]

Therefore r(A) = 4.

3.4.21 Lemma: A basic property of linear independence
A basic property concerning linear independence/dependence is the following. Let a1, a2, ..., an be a list of n linearly independent column vectors, each with m entries, and let b be a column vector with m entries. Then the rank of a1, a2, ..., an, b equals the rank of a1, a2, ..., an if, and only if, b depends linearly on a1, a2, ..., an.
Proof: The rank of a1, a2, ..., an, b is either n or n + 1. If it is n, then these vectors are linearly dependent, and there exist constants x1, ..., xn, β, not all zero, such that

a1 x1 + a2 x2 + · · · + an xn + bβ = 0

We must have β ≠ 0, otherwise one or more of the scalars xj would have to be non-zero, contradicting the linear independence of the vectors a1, a2, ..., an. Therefore

b = a1(-x1 β⁻¹) + a2(-x2 β⁻¹) + · · · + an(-xn β⁻¹)

and b is a linear combination of a1, a2, ..., an. Conversely, if b is a linear combination of a1, a2, ..., an, then a1, a2, ..., an, b cannot be linearly independent, and the Lemma has been proved.
The Lemma rephrased: Given linearly independent vectors a1, a2, ..., an, the vectors a1, a2, ..., an, b are linearly independent if, and only if, b does not depend linearly on a1, a2, ..., an.

3.4.22 Theorem: The fundamental property of linear independence
Note 75 This theorem is a generalization of the Lemma. To understand what follows you should have a thorough grasp of the content of the "linear combinations of linear combinations theorem" contained in No.8c of Exercise 47 and No.13 of Exercise 63.
Let a1, a2, ..., an, b be any list of column vectors, each with m entries. Then the rank of a1, a2, ..., an, b is not more than the rank of a1, a2, ..., an if, and only if, b depends linearly on a1, a2, ..., an.
Put slightly differently, the theorem says that column-rank [A, b] = column-rank A if, and only if, Ax = b has a solution. Here A is the m × n matrix having the vectors aj as columns.
Proof: Let the maximum number of linearly independent vectors among a1, a2, ..., an be s.
1. Suppose that the rank of a1, a2, ..., an, b is s. Then, by Lemma 3.4.21, b must be a linear combination of any s linearly independent vectors among a1, a2, ..., an.
2. Conversely (and this is the only non-trivial case), suppose that the rank of a1, a2, ..., an, b is s + 1. Then we may assume (without loss of generality) that a1, ..., as, b are linearly independent. By Lemma 3.4.21, every vector aj is a linear combination of a1, ..., as. If b were a linear combination of a1, a2, ..., an, it would then also be a linear combination of a1, ..., as, which is a contradiction, because a1, ..., as, b are assumed to be linearly independent.
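The theorem gives an immediately programmable test for the solvability of Ax = b: adjoin b to A and compare column-ranks. A minimal Python/NumPy sketch of our own (the matrices are illustrative, not from the notes):

import numpy as np

def is_consistent(A, b):
    # Ax = b has a solution iff rank[A, b] = rank A (Theorem 3.4.22).
    Ab = np.column_stack([A, b])
    return np.linalg.matrix_rank(Ab) == np.linalg.matrix_rank(A)

A = np.array([[1., 2.],
              [2., 4.]])                        # rank 1
print(is_consistent(A, np.array([3., 6.])))     # True: b depends on the columns
print(is_consistent(A, np.array([3., 7.])))     # False: the rank jumps to 2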
Corollary 76 Let the matrix B consist of k linearly independent columns of a matrix A, chosen so that the set cannot be enlarged by further columns of A without destroying independence. Then
1. Every column of A is a linear combination of columns of B (this follows from the Lemma);
2. k is the column-rank of A.

Remark 77 We emphasize that the above Theorem 3.4.22 about column vectors is equally valid for row vectors: the statement goes through word-for-word. The full use of the theorem will be seen in section 3.6 when we study vector spaces.

3.4.23 An Application of the Fundamental Theorem
We will use 3.4.22 to show directly that elementary row operations on an m × n matrix A do not change its row-rank.
It is clear that interchanging two rows cannot change the row-rank. It is almost as obvious that multiplication of a row by a non-zero constant does not alter the row-rank (see No.1(c)iv in Exercise 78). It remains to show that replacing (say) A1• by βA2• + A1• to form A′ does not change the row-rank.
Let the last m - 1 rows A2•, ..., Am• of A have rank k. Since βA2• + A1• is a linear combination of these m - 1 rows if, and only if, A1• is (see No.2 in Exercise 52), it follows that the row-rank of A′ is k + 1 if, and only if, the row-rank of A is k + 1. For an alternative way to view this see Exercise 93, No.9.

Exercise 78
1. In most of the elementary questions that follow, you may assume the given vectors are either rows or columns, each with the same number of entries. Prove the given statements.
(a) A single vector a is linearly independent if, and only if, a ≠ 0.
(b) Show directly that two vectors a, b are linearly dependent if, and only if, one is a multiple of the other. Compare the idea of parallel vectors in subsection 1.1.10, Chapter 1.
(c) Let a1, a2, ..., an be a given list of vectors. Show:
i. If some ai = 0, then a1, a2, ..., an are linearly dependent.
ii. If n ≥ 2 and some subset, say a2, ..., am, is linearly dependent, so are a1, a2, ..., an.
iii. If a1, a2, ..., an are linearly independent, what can be said of any subset of a1, a2, ..., an?
iv. The rank of a1, a2, ..., an is unaltered if one of the vectors aj is multiplied by a non-zero scalar k.
v. * Let n > 2 and suppose that a1, a2, ..., an are non-zero vectors. Prove that the vectors are linearly dependent if, and only if, some aj (j ≥ 2) depends linearly on a1, ..., a_{j-1}. Translate this statement into one about linear independence.
Hint 79 Suppose the vectors a1, a2, ..., an are linearly dependent. Since a1 ≠ 0, there must be a first index j ≥ 2 such that the vectors a1, ..., aj are linearly dependent.
2. How does Lemma 3.4.21 show why 3.4.9 follows directly from 3.4.8?
3. Let a, b, c be the position vectors of the points A, B and C respectively. Show that the three vectors are linearly independent if, and only if, A, B, C and O do not lie on a plane. (This was mentioned in the preamble to 3.4.)
4. Let A = [C, D], where C and D are matrices with the same number of rows. Show that if the rows of C (or of D) are independent, so are those of A. Conclude that in any case r(A) ≥ r(C) and r(A) ≥ r(D).
5. For any matrix product AB, say why r(AB) ≤ r(A) and r(AB) ≤ r(B).
6. Let C be m × n. If A is m × m of rank m, then r(AC) = r(C). Let B be n × n of rank n; then r(CB) = r(C). Consequently, r(ACB) = r(C). Prove these statements.
7. Without multiplying out, find the rank of

[  2  5 -11 ] [  1  1  3 ]   [ -14 -4 -48 ]
[ -3 -6  12 ] [ -1  1 -2 ] = [  15  3  51 ]
[  4  7 -13 ] [  1  1  4 ]   [ -16 -2 -54 ]

Hint 80 Consider Example 3.4.6.
8. The following questions refer you to the matrices in Exercise 35 of Chapter 2. To answer them you should consult the row reduced echelon forms (row REF) found in Exercise 39. Occasionally you may need to find new row reduced echelon forms.
(a) Consider the matrix A of No.1a. Show that the first two columns of A form a maximum independent set of columns of A and that every column of A is a linear combination of A•1 and A•2. Can you find three linearly independent columns of A? Find the rank r(A). Show that the first two rows of A form a maximum number of independent rows of A.
Hint 81 Either show directly that A1• and A2• are linearly independent or use the matrix B of No.1b in Exercise 35.
(b) Let C be the row REF of the matrix A of No.2 of Exercise 35. Show that A•5 depends linearly on A•1, A•2, A•3 and that every maximum set of linearly independent columns must include A•4. Find a maximum set of linearly independent columns of A and the rank r(A). Which of the following sets are linearly independent?
i. A•2, A•3, A•4, A•5
ii. A•1, A•3, A•4, A•5
iii. A•1, A•2, A•4, A•5
Find a maximum number of linearly independent solutions to the homogeneous equation.
(c) Consider the coefficient matrix A in No.3 of Exercise 35.
i. Does any row vector c = [c1 c2 c3 c4] depend linearly on the rows of A?
Solution: A is 4 × 4 and the rows of A together with c can have rank at most four. Since the rows of A are linearly independent (r(A) = 4 from the row REF), the answer is 'yes' by the row version of Lemma 3.4.21.
ii. Are the columns of the coefficient matrix A linearly independent?
iii. Does Ax = b have a unique solution for every column vector b with 4 entries?
iv. Let C be the augmented matrix of No.3a in Exercise 35. Does every column vector b with 4 entries depend linearly on C•1, C•2, C•3, C•5?
9. Prove that in 3-space mutually orthogonal non-zero vectors are linearly independent. Hence complete a rigorous proof of No.15 of Exercise 2 in Chapter 1.
10. * Give rigorous proofs of the statements about planes in subsection 1.3.6.
11. Let I be the m × m identity matrix. The matrix J resulting from I after performing an elementary row operation on I is called an m × m elementary matrix. Hence J is the result of doing one of the following: (i) multiplying row r of I by β ≠ 0, (ii) interchanging rows r and s of I, or (iii) changing row Is• to βIr• + Is• where s ≠ r.
(a) Show that an elementary matrix J has rank m.
(b) Show furthermore that if A is an m × n matrix, then JA is the result of doing the corresponding elementary row operation on A.
Hint 82 Use [JA]i• = Ji• A. So, for example, let J be the result of a type (iii) operation. Then for i ≠ s, [JA]i• = Ai•, while if i = s,

[JA]s• = Js• A = (βIr• + Is•)A = (βIr•)A + Is• A = βAr• + As•

(c) What are the corresponding results for elementary column operations?
(d) Give a description of "column echelon form" and "column reduced echelon form" of a matrix A.
12. Show that A, Aᵀ, AᵀA and AAᵀ all have the same rank.
Hint 83 First of all, recall that if a is a column vector and aᵀa = 0, then a = 0 (see Exercise 47, No.11). Now suppose AᵀAx = 0. Then xᵀAᵀAx = (Ax)ᵀAx = 0.
13. Prove the Lemma 3.4.17 and its corollary 3.4.18.
Hint 84 One way: use Aᵀ and Bᵀ in place of A and B in the previous Lemma 3.4.15.
For a more pleasant direct proof, we may suppose that the first k = r(B) rows of B are independent. Let y be any row vector such that y_{k+1} = · · · = y_m = 0 and yA = 0. Now proceed as in the Column Lemma 3.4.15, using rows in place of columns.
Solution: By assumption, yB = 0. As the first k = r(B) rows of B are independent, y = 0. Hence the first k rows of A are independent.
14. Complete the following alternative proof of the corollary to the Column Lemma (Corollary 3.4.16), which says r(BC) ≤ r(C): if k columns (say [BC]•1, ..., [BC]•k) of BC are linearly independent, then so are C•1, ..., C•k. Do the same for the row version.
15. In the following assume the relevant matrix products are defined. Prove the statements.
(a) If the columns of A are independent and AB = AC, then B = C.
(b) If the rows of A are independent and BA = CA, then B = C.
16. Let A be n × n with A ≠ I, A ≠ O and A² = I, where I = In. Find all integers k ≥ 1 such that Aᵏ = I.
17. * Using (among other things) the Column Lemma 3.4.15, show that if the condition (3.60) holds then B = CA for some C. (This is essentially the converse of the corollary 3.4.16.) State and prove the corresponding result for the Row Lemma 3.4.17.
Hint 85 Let d = Bi• be a row of B and let A′ be the augmented matrix formed by adding the row d to A. Then A′x = 0 if, and only if, Ax = 0. Hence A and A′ have the same column-ranks and so the same row-ranks. By the row version of the basic theorem 3.4.22 on linear independence, d is a linear combination of the rows of A.
18. Find the ranks of the following matrices and prove your results correct.
(a) A is the projection matrix on the plane 2x + z = 0 of Exercise 56, No.1.
(b) A is the reflection matrix in the plane 2x + z = 0.
19. Let A be a 3 × 3 matrix. In each of the following cases A is considered as a linear transformation. Describe the rank of A with a brief geometrical proof, illustrating the comment in 3.4.11.
(a) A is a projection matrix onto a line that passes through the origin.
(b) A is a projection matrix onto a plane that passes through the origin.
(c) A is a matrix that reflects in a plane or line that passes through the origin.
20. * Use the results of Exercise 63, No.17 to give analytic proofs for No.19 above.
21. If Rθ is the 2 × 2 rotation matrix of equation (3.53), prove algebraically that r(Rθ) = 2.
22. Let A be an m × n matrix.
(a) What is the row REF A′′ of A if r(A) = n? What is A′′ if r(A) = m = n?
(b) Let Ax = b be a system of m equations for n unknowns. If the rows of A are independent then the system always has a solution. Find a quick proof using the Fundamental Theorem 3.4.22.
(c) Let yA = c be a system of n equations for m unknowns y1, ..., ym. If the columns of A are independent then the system always has a solution.
(d) What can be said of the solutions in 22b and 22c if m = n?

3.5 Square matrices
In this section we consider square matrices A of size n × n. Unless otherwise stated, I stands for the identity matrix In. Observe that r(I) = n.

3.5.1 Invertible matrices

3.5.2 Definition of invertibility
An n × n matrix A is said to be invertible (or non-singular, or to have an inverse) if there is a matrix B such that

AB = BA = I     (3.61)

It is clear that such a matrix B must also be n × n. From the definition it is also clear that if B is an inverse of A then A is an inverse of B.
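The defining equations (3.61) are easy to test numerically for a candidate inverse. A small Python/NumPy sketch of our own, using the matrix A of Exercise 86, No.1(a) below and a matrix B that we expect to be its inverse:

import numpy as np

A = np.array([[2., 1.],
              [3., 2.]])
B = np.array([[ 2., -1.],
              [-3.,  2.]])   # candidate inverse of A

# B is an inverse of A precisely when AB = BA = I, equation (3.61).
I = np.eye(2)
print(np.allclose(A @ B, I), np.allclose(B @ A, I))   # True True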
From Exercise 78, No.5 we see that a necessary condition for A to have an inverse is that r(A) = n. The converse will be shown in 3.5.7.

3.5.3 Uniqueness of the inverse, definition of A⁻¹
Let X and Y be n × n matrices such that AX = I and Y A = I. Then X = Y and this matrix is the unique inverse of A.
Proof: We have

Y = Y I = Y (AX) = (Y A)X = IX = X     (3.62)

Hence X = Y is an inverse of A. But any two inverses X and Y of A must satisfy (3.62), and so A can have only one inverse. It is denoted by A⁻¹.

Example 3.5.1 Show directly that

[ 3 7 ]⁻¹   [  5 -7 ]
[ 2 5 ]   = [ -2  3 ]

Solution: By direct multiplication,

[ 3 7 ] [  5 -7 ]   [  5 -7 ] [ 3 7 ]   [ 1 0 ]
[ 2 5 ] [ -2  3 ] = [ -2  3 ] [ 2 5 ] = [ 0 1 ]

3.5.4 Use of the inverse

3.5.5 Solving Ax = b
Consider a system

Ax = b     (3.63)

of n equations for n unknowns. If A has an inverse, then the unique solution is

x = A⁻¹b

In the first place, we show that x = A⁻¹b is a solution to (3.63):

A(A⁻¹b) = (AA⁻¹)b = Ib = b

Conversely, let x = u satisfy equation (3.63), i.e. suppose Au = b. Multiply both sides of this equation on the left by A⁻¹:

A⁻¹(Au) = (A⁻¹A)u = Iu = u = A⁻¹b

Thus x = A⁻¹b is the unique solution to (3.63).

3.5.6 Solving yA = c
If A has an inverse, then for any row vector c with n entries, the unique solution to

yA = c     (3.64)

is

y = cA⁻¹

The proof can be derived from 3.5.5 using transposes (see equation 3.9). Alternatively, the proof parallels that of the previous section: the unique solution is found by multiplying both sides of equation (3.64) on the right by A⁻¹.

Example 3.5.2 Solve

[ 3 7 ] x = [   9 ]
[ 2 5 ]     [ -11 ]

Solution: Use the result of Example 3.5.1:

x = [ 3 7 ]⁻¹ [   9 ]   [  5 -7 ] [   9 ]   [ 122 ]
    [ 2 5 ]   [ -11 ] = [ -2  3 ] [ -11 ] = [ -51 ]

3.5.7 Necessary and sufficient conditions for A to be invertible
The n × n matrix A is invertible if, and only if, any one of the following equivalent conditions holds. If (14) holds, necessarily X = A⁻¹; if (15) holds, necessarily Y = A⁻¹.
1. r(A) = n.
2. The columns of A are independent.
3. The rows of A are independent.
4. The homogeneous equation Ax = 0 has only the trivial solution.
5. The homogeneous equation yA = 0 has only the trivial solution.
6. The row REF of A is I.
7. The column REF of A is I (the row REF of Aᵀ is I).
8. For every column vector b with n entries, Ax = b has a solution.
9. For every column vector b with n entries, Ax = b has a unique solution.
10. There is a column vector b with n entries such that Ax = b has a unique solution.
11. For every row vector c with n entries, yA = c has a solution.
12. For every row vector c with n entries, yA = c has a unique solution.
13. There is a row vector c with n entries such that yA = c has a unique solution.
14. There is a matrix X such that AX = I.
15. There is a matrix Y such that Y A = I.
Proof The first seven properties are equivalent from the definition of linear independence, the fact that row-rank is equal to column-rank, and because the row (column) REF of A is I if, and only if, the columns (rows) of A are independent.
Now assume r(A) = n. Since the addition of a column b (row c) to A does not change the column (row) rank, Ax = b has a solution by Lemma 3.4.21, and similarly so does yA = c. These solutions are necessarily unique on account of the independence of the columns (rows) of A.
Assume next that (8) holds. Then we can solve the n matrix equations AX•j = I•j for the columns X•j and obtain an n × n matrix X such that AX = I. Similarly, if (11) holds we can find an n × n matrix Y such that Y A = I.
By No.5 of Exercise 78, if either AX = I for some X or Y A = I for some Y, then necessarily r(A) = n. Thus all the above properties (1)-(13) are equivalent.
Finally, suppose (14) holds. Then, as the properties are equivalent, Y A = I for some Y. By 3.5.3, X = Y and X is the inverse of A. A similar argument shows that if (15) holds, necessarily Y = A⁻¹.

Example 3.5.3 Show that none of the matrices

[ 2 2 ]   [ 2 0 ]   [ 2 -3 ]
[ 1 1 ] , [ 1 0 ] , [ 0  0 ]

can have inverses.
Solution: None of these has rank 2.

3.5.8 Reduction method finds A⁻¹, or decides it does not exist
Reduce the augmented array [A, I] to row REF [A′′, I′′]. Then A has an inverse if, and only if, A′′ = I, in which case I′′ = A⁻¹. We are using property (14) of 3.5.7. Note that if at any stage, say [A′, I′], of the reduction A′ has a zero row, then A⁻¹ cannot exist.

Example 3.5.4 Consider the matrix of Example 3.1.7:

A = [  2  5  1 ]
    [ -3 -6  0 ]
    [  4  7 -2 ]

We found there that an echelon form has three non-zero rows. Therefore r(A) = 3 and A⁻¹ exists. Find A⁻¹ by row reduction.
Solution: Start with

[  2  5  1 | 1 0 0 ]
[ -3 -6  0 | 0 1 0 ]
[  4  7 -2 | 0 0 1 ]

[ 2  5    1   | 1    0  0 ]
[ 0  3/2  3/2 | 3/2  1  0 ]   (3/2)R1 + R2
[ 0  -3   -4  | -2   0  1 ]   (-2)R1 + R3

[ 2  5  1  | 1   0    0 ]
[ 0  1  1  | 1   2/3  0 ]   (2/3)R2
[ 0  0  -1 | 1   2    1 ]   2R2 + R3

[ 2  5  1 | 1   0    0 ]
[ 0  1  1 | 1   2/3  0 ]
[ 0  0  1 | -1  -2  -1 ]   (-1)R3

[ 2  5  0 | 2   2    1 ]   -R3 + R1
[ 0  1  0 | 2   8/3  1 ]   -R3 + R2
[ 0  0  1 | -1  -2  -1 ]

[ 2  0  0 | -8  -34/3  -4 ]   -5R2 + R1
[ 0  1  0 | 2    8/3    1 ]
[ 0  0  1 | -1  -2     -1 ]

[ 1  0  0 | -4  -17/3  -2 ]   (1/2)R1
[ 0  1  0 | 2    8/3    1 ]
[ 0  0  1 | -1  -2     -1 ]

Hence

A⁻¹ = [ -4  -17/3  -2 ]
      [  2   8/3    1 ]
      [ -1  -2     -1 ]

Exercise 86
1. Find the inverses of the following matrices (where possible) by reduction to I, and solve the related problems.
(a)
A = [ 2 1 ]
    [ 3 2 ]
(b) i.
B = [ 3 2  1 ]
    [ 0 2  2 ]
    [ 0 0 -1 ]
ii. Find the linear transformation T that maps column B•j of B to the corresponding column Y•j of Y for j = 1, 2, 3, where

Y = [  1 0  4 ]
    [ -1 1 -2 ]
    [  2 3  1 ]

Hint 87 You want a 3 × 3 matrix T such that Y•j = T B•j. In fact, T = Y B⁻¹.
(c)
C = [  1  6  2 ]
    [ -2  3  5 ]
    [  7 12 -4 ]
(d) i.
D = [  1 3 -1 ]
    [  2 0 -3 ]
    [ -1 4  2 ]
ii. Use your inverse to solve

x1 + 3x2 - x3 = 3
2x1 - 3x3 = -1
-x1 + 4x2 + 2x3 = 2

iii. Use your inverse to solve

y1 + 2y2 - y3 = 4
3y1 + 4y3 = 5
-y1 - 3y2 + 2y3 = -1

2. Prove that an invertible matrix maps linearly independent vectors to linearly independent vectors, and state and prove the converse.
3. Let A and B be n × n matrices. Show that AB is invertible if, and only if, both A and B are, in which case (AB)⁻¹ = B⁻¹A⁻¹. Generalize this result to products of three or more matrices.
4. Let the n × n matrix A be invertible. Show that Aᵀ is also invertible and that (Aᵀ)⁻¹ = (A⁻¹)ᵀ.
5. Let A be n × n. As a linear transformation, the mapping A : ℜ^n → ℜ^n is one-to-one if, and only if, A maps ℜ^n onto the whole of ℜ^n. Prove the statement. (Compare Example 3.4.4.)
6. If M is a matrix that reflects in a plane or line passing through the origin, show algebraically that M⁻¹ = M.
Hint 88 Use M² = I from Exercise 63, No.17.
7. * Let A be an m × n matrix of rank k. Show that by applying elementary row operations and elementary column operations to A we can find invertible matrices P and Q such that

P AQ = [ Ik            O_{k×(n-k)}     ]
       [ O_{(m-k)×k}   O_{(m-k)×(n-k)} ]

Hint 89 First reduce A to row REF. Now use elementary column operations and the pivot columns to reduce the non-pivot columns to zero columns.
Apply suitable permutations to the columns and finally use the results of Exercise 78, No.11. (Compare No.6 of the same exercise.)
8. In Chapter 2, Exercise 39, No.3 we considered the system Ax = b of two equations for two unknowns (2.32). Show directly that, provided the determinant defined by |A| = a11 a22 - a21 a12 is not zero,

B = (1/|A|) [ a22  -a12 ]
            [ -a21  a11 ]

satisfies AB = BA = I, so that B = A⁻¹. Hence derive Cramer's solution of the equations, as found in Chapter 2. (This theory will be further developed in Chapter 4.)
9. Let A be a 3 × 3 matrix such that 2I - 3A + 5A² - 7A⁴ = 0, where 0 is the 3 × 3 zero matrix. Show that A has an inverse and write it in terms of A. For the solution see Chapter 4, subsection 4.1.1 and equation (4.1.3).
Hint 90 A(7A³ - 5A + 3I) = 2I
10. Find all n × n invertible A such that A² = A.

3.6 Vector spaces: Subspaces of ℜ^m
Let ℜ^m stand for ℜ^{m×1} (the set of column vectors with m entries) or ℜ^{1×m} (the set of row vectors with m entries). (The context will decide which is meant, or it may not matter, as in Theorem A below, subsection 3.6.4.)
A non-empty subset S ⊆ ℜ^m is said to be a subspace of ℜ^m (or a vector space in ℜ^m) if whenever a1, a2, ..., ak are in S it follows that any linear combination of a1, a2, ..., ak is also in S. One says that S is closed under linear combinations. It is easy to see that S is a subspace of ℜ^m if S is closed under linear combinations of any two vectors from S. (See Exercise 93, No.1.)

3.6.1 Examples
1. The set S = {0} consisting of only the zero vector 0 ∈ ℜ^{m×1} is a subspace of ℜ^{m×1}. The whole space ℜ^{m×1} is a subspace of itself. These are called the trivial subspaces of ℜ^{m×1}.
2. Let A be an m × n matrix. The row space of A is a subspace of ℜ^{1×n}. We denote this space by R(A). This is the span of the rows of A.
3. The column space of A is a subspace of ℜ^{m×1}. It is denoted by C(A). This is the range of the mapping x → Ax.
4. The set of solutions x of the homogeneous equation Ax = 0 is a subspace of ℜ^{n×1}. It is called the null space of A and is denoted by N(A).
5. Let A be n × n and I = In. A scalar λ such that the matrix A - λI fails to be invertible is called an eigenvalue of A (see No.9 in Exercise 101). The null space N(A - λI) is called the eigenspace belonging to λ. These are important concepts, both for pure and applied mathematics.

3.6.2 Bases and dimension
We suppose for convenience here that subspaces are contained in ℜ^m = ℜ^{m×1}.

3.6.3 Definition of basis
Let S ≠ {0} be a subspace of ℜ^m. A set a1, a2, ..., an of linearly independent vectors that span S is called a basis of S.
The main results are the following.

3.6.4 Theorem A
Let a1, a2, ..., an be a basis for the subspace S ≠ {0} of ℜ^m. Then
1. The space S contains at most n linearly independent vectors.
2. If b1, b2, ..., bk are vectors in S that span S, then k ≥ n.
3. Any two bases of S contain the same number of vectors.
Proof
1. Suppose that the vectors b1, b2, ..., bk are linearly independent. By the fundamental theorem in 3.4.22, since they are in the span of the vectors a1, a2, ..., an, adding the bj to this set cannot create more than n independent vectors. Thus k ≤ n. This proves 1.
2. A similar argument shows that if the vectors b1, b2, ..., bk in S span the space S, then k ≥ n, since the independent set a1, ..., an is contained in this span. This proves 2.
3. If the vectors b1, ..., bk form a basis of S, then k ≥ n and k ≤ n. Hence k = n.
This is the statement of 3.

Corollary 91 A consequence of 1 is yet another proof that a homogeneous system Ax = 0 of m equations in n > m unknowns has a non-trivial solution. (See 2.2.15.) For ℜ^m has a basis consisting of the m unit vectors ej, and A has n > m columns in ℜ^m.

3.6.5 Theorem B
A subspace S ≠ {0} of ℜ^m always has a basis.
Proof: Consider the following process. Let a1 ≠ 0 be a member of S. If a1 spans S, the process stops. Otherwise there is some a2 ∉ sp(a1). By the fundamental theorem on linear independence, 3.4.22, a1 and a2 are linearly independent. If S = sp(a1, a2), stop. Otherwise, continuing in this way, the process must stop at some stage, since ℜ^m has at most m linearly independent vectors. At that point we have linearly independent vectors a1, a2, ..., an which span S. This proves the theorem.

Remark 92 Let the basis a1, a2, ..., an for S be the columns of the m × n matrix A = [a1, a2, ..., an]. What we have found is that r(A) = n and the column space C(A) = S.

Definition The number k of vectors in any basis of S is called the dimension of S, written k = dim(S). By definition the dimension of the trivial space {0} is 0.

Exercise 93
1. Let S be a non-empty subset of ℜ^m. Show that it is a subspace if, and only if, the following conditions hold: (i) if a ∈ S and γ is a scalar, then γa ∈ S; (ii) if a and b are both in S then a + b ∈ S.
2. Show that the items listed in 3.6.1 do satisfy the criteria of No.1 to be subspaces.
3. Show that the dimension of ℜ^m is m.
4. Show that the columns of a matrix A are linearly independent if, and only if, N(A) = {0}. (See equation (3.56).)
5. Let A be an n × n matrix. Write down three equivalent necessary and sufficient conditions on the dimensions of the row, column and null spaces of A for A to be invertible.
6. Let S and T be subspaces of ℜ^m. The sum S + T consists of all vectors a + b, where a ∈ S and b ∈ T. Show that both the intersection S ∩ T and the sum S + T are subspaces of ℜ^m.
7. Without any reference to elementary row operations, use the fundamental theorem 3.4.22 to prove that the column- and row-ranks of an m × n matrix A are equal.
Hint 94 Use results from Theorem A above and its Corollary 91 and suppose the columns of the m × k matrix B form a maximum number of independent columns of A. Follow the proof in subsection 3.4.19. Alternatively, you can first assume the m rows of A are independent and show that the column-rank of A is at least m. The general case, column-rank A ≥ row-rank A, follows from this.
8. Show that the column space C(A) of an m × n matrix A has a basis consisting of r(A) columns of A. A similar result holds for the row space R(A). Conclude that the dimensions of C(A) and R(A) are both r(A).
9. Let A be an m × n matrix. Suppose that B is the result of performing an elementary row operation on A (see subsection 2.2.6, Chapter 2). Show that A and B have the same row space. Conclude that if E is an echelon form (or REF) of A, then E and A have the same row space.
For example, the rows of the coefficient matrix of Example 2.2.5 in Chapter 2 are spanned by the three rows

[1 -1 1 2], [0 1 3 1], [0 0 1 -2].

Find two rows which span the row space of the matrix A in Exercise 35 of Chapter 2, No.1a.
Hint 95 Suppose that a1, ..., am are the rows of the matrix A and let y1, ..., ym and k be scalars. Then

y1 a1 + · · · + ym am = y1(a1 + k a2) + (y2 - y1 k) a2 + · · · + ym am
10. Let b1, b2, ..., bn span the space S of dimension n. Show that the vectors b1, b2, ..., bn are necessarily independent.
11. Let the independent vectors a1, a2, ..., ak lie in the space S of dimension n. Show that S has a basis that includes the vectors a1, a2, ..., ak. (This is implicit in the proof of the above Theorem B.)
12. Let B be a matrix whose columns are in the column space of A, i.e., each B•j ∈ C(A). Then we know that C(B) ⊆ C(A) and that r(B) ≤ r(A). (Why?) Show that C(B) = C(A) if, and only if, r(B) = r(A). State and prove similar results for the row space R(A) of A.
13. * Show that if matrices A and B have the same size, then

r(A + B) ≤ r[A, B] ≤ r(A) + r(B)

Hint 96 Use C(A + B) ⊆ C[A, B] = C(A) + C(B).
14. * Prove the following exchange theorem, first proved by the 19th Century mathematician H. Grassmann: In ℜ^m let a1, a2, ..., an be linearly independent and suppose that each vector aj depends linearly on b1, b2, ..., bk. Then (i) n ≤ k. (ii) At least n vectors among b1, b2, ..., bk can be dropped and replaced with the vectors a1, a2, ..., an in such a way that the resulting set has the same span as that of b1, b2, ..., bk.
Hint 97 We may assume a1, a2, ..., an, b1, b2, ..., br is a basis for S = sp(b1, b2, ..., bk). Use the fact that the dimension of this space is n + r ≤ k.
15. * The following theorem is basic for vector space theory, and is closely related to the result proved in Exercise 52, No.5: Let A be an m × n matrix. Show that the null and column spaces of A are related by the equation

dim N(A) + dim C(A) = n

Hint 98 Let u1, ..., uk be a basis for the null space N(A) of A. Extend this to a basis u1, ..., uk, u_{k+1}, ..., un for the whole space ℜ^{n×1}. Then Au_{k+1}, ..., Aun is a basis for the column space C(A).
16. Deduce the result of Exercise 52, No.5 from this theorem.

3.7 Summary of Chapter 3

3.7.1 Matrix notation; columns and rows
(See subsection 3.1.1.) An m × n matrix A = [aij] is also described by [A]ij = aij. ℜ^{m×n} denotes the set of m × n matrices with real number entries.
Row i of the m × n matrix A is written

Ai• = [ai1 ai2 ... ain]     (i = 1, ..., m)

Column j is written

A•j = [ a1j ]
      [ a2j ]
      [ ... ]     (j = 1, ..., n)
      [ amj ]

Addition of two matrices of the same size is defined naturally, as is multiplication by a scalar α: [αA]ij = α[A]ij.
If A is m × n and x a column vector with n entries xi, we have

Ax = [ A1• x ]
     [ A2• x ]
     [  ...  ] = A•1 x1 + A•2 x2 + ... + A•n xn
     [ Am• x ]

Here

Ai• x = ai1 x1 + ai2 x2 + · · · + ain xn

and

A•j xj = [ a1j xj ]
         [ a2j xj ]
         [  ...   ]
         [ amj xj ]

Similarly, if y is a row vector with m entries yj,

yA = [yA•1 yA•2 ... yA•n] = y1 A1• + y2 A2• + · · · + ym Am•

3.7.2 Span, row space, column space
(See item 2 in subsection 3.1.1.) The span of the vectors a1, ..., am (all of the same size) is the set of linear combinations of a1, ..., am. It is denoted by sp(a1, ..., am). The row space of a matrix A is the span of its rows (see also No.9 of Exercise 93). The column space of a matrix A is the span of its columns.

3.7.3 Matrices as mappings
(See section 3.2.) Let A be a fixed m × n matrix. If x is a variable column vector with n entries, then y = Ax expresses y as a function of x; A transforms the vector x into the vector y, so the function is also called a transformation or mapping x → y = Ax.
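The expansions just summarised (Ax as a combination of the columns of A, and yA as a combination of its rows) can be verified directly. A minimal Python/NumPy sketch of our own, with an illustrative matrix:

import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])
x = np.array([1., -1., 2.])
y = np.array([2., -3.])

# Ax = A.1 x1 + A.2 x2 + A.3 x3 : a combination of the columns.
col_comb = sum(A[:, j] * x[j] for j in range(3))
print(np.allclose(A @ x, col_comb))   # True

# yA = y1 A1. + y2 A2. : a combination of the rows.
row_comb = sum(y[i] * A[i, :] for i in range(2))
print(np.allclose(y @ A, row_comb))   # True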
The transformation is linear since for all vectors x1 and x2 and scalars α1 and α2,

A(x1 α1 + x2 α2) = (Ax1)α1 + (Ax2)α2

3.7.4 The product of two matrices
(See subsection 3.3.4.) Let A be an m × k matrix and B a k × n matrix. Then the product AB is given by

[AB]ij = Ai• B•j     (i = 1, ..., m; j = 1, ..., n)

Note that AB is only defined when the number of columns of A equals the number of rows of B. Then AB is m × n and

[AB]i• = Ai• B     (i = 1, ..., m)
[AB]•j = A B•j     (j = 1, ..., n)

Furthermore,

Ai• B = ai1 B1• + ai2 B2• + · · · + aik Bk•
A B•j = A•1 b1j + A•2 b2j + · · · + A•k bkj

If x is a variable vector with n entries, then the composite transformation x → A(Bx) is given by

A(Bx) = (AB)x

More generally, we have the associative law for multiplication: if AB and BC are defined, then so are (AB)C and A(BC), and

(AB)C = A(BC)

3.7.5 Linear independence
For this see section 3.4. The columns of a matrix A are linearly independent if the equation Ax = 0 has only the trivial solution x = 0. The rows of a matrix A are linearly independent if the equation yA = 0 has only the trivial solution y = 0. Elementary row operations preserve linear relations among the columns of A (see 3.4.7).
The column-rank of an m × n matrix A is the maximum number of linearly independent columns. This is the number of pivot columns in a row EF of A (see subsection 3.4.8). The row- and column-ranks of A are equal (see 3.4.19 and 3.4.20).
The main theorem on linear independence is 3.4.22: If we add a column b with m entries to an m × n matrix A to form [A, b], then A and [A, b] have the same column-rank if, and only if, Ax = b has a solution, i.e., if, and only if, b is linearly dependent on the columns of A.

3.7.6 Invertible square matrices
(See subsection 3.5.3.) An n × n matrix A is invertible if there is a matrix B such that AB = BA = I. When this is the case, B is unique and B = A⁻¹ is called the inverse of A. A has an inverse if, and only if, r(A) = n, and there are a number of equivalent conditions for this to be the case. See 3.5.7.

3.7.7 Vector spaces
(See section 3.6.) A subspace is a set S ⊆ ℜ^m closed under the taking of linear combinations. One of the main results is the dimension theorem: any two bases of a space S have the same number of vectors.

Chapter 4
Determinants and the Cross Product

4.1 Determinants
The determinant of a 1 × 1 matrix A = [a] is simply |A| = a (not |a|).

4.1.1 Determinants of 2 × 2 matrices
Starting in Chapter 2, Exercise 39 No.3 and continuing in Chapter 3, Exercise 86, No.8, we found that

A⁻¹ = (1/|A|) [ a22  -a12 ]
              [ -a21  a11 ]

provided that the determinant of A,

|A| = a11 a22 - a21 a12     (4.1)

is not zero. We will now see that for 2 × 2 matrices A,

4.1.2 A has an inverse if, and only if, |A| ≠ 0
Proof Let d = |A| and consider the matrix

C = [ a22  -a12 ]     (4.2)
    [ -a21  a11 ]

Then, by direct multiplication,

CA = AC = [ |A|  0  ] = dI     (4.3)
          [  0  |A| ]

where I = I2 is the 2 × 2 identity matrix. Therefore, if |A| ≠ 0, the matrix

B = (1/d) C = (1/|A|) [ a22  -a12 ]
                      [ -a21  a11 ]

satisfies

BA = AB = I

and so B is the inverse of A.
Conversely, suppose that A⁻¹ exists. Multiply both sides of equation (4.3) by A⁻¹:

A⁻¹(AC) = (A⁻¹A)C = IC = C = dA⁻¹

Now if d = 0, then C = O, and therefore A = O. Obviously the zero matrix cannot have an inverse, so d ≠ 0.
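The proof above is constructive: C/d is the inverse whenever d ≠ 0. A short Python/NumPy sketch of our own, implementing (4.1)-(4.3) literally:

import numpy as np

def inverse_2x2(A):
    # |A| as in (4.1); the matrix C as in (4.2); B = C/d as in the proof.
    d = A[0, 0] * A[1, 1] - A[1, 0] * A[0, 1]
    if d == 0:
        raise ValueError("determinant is zero: no inverse")
    C = np.array([[ A[1, 1], -A[0, 1]],
                  [-A[1, 0],  A[0, 0]]])
    return C / d

A = np.array([[3., 7.],
              [2., 5.]])
print(inverse_2x2(A))                               # [[ 5. -7.] [-2.  3.]]
print(np.allclose(A @ inverse_2x2(A), np.eye(2)))   # True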
4.1.3 Cramer's Rule for two equations in two unknowns
Let us use

A⁻¹ = (1/d) [ a22  -a12 ]
            [ -a21  a11 ]

to solve the system of two equations for two unknowns:

A [ x1 ] = [ b1 ]
  [ x2 ]   [ b2 ]

We have A⁻¹(Ax) = (A⁻¹A)x = Ix = x = A⁻¹b, so

[ x1 ] = (1/d) [ a22  -a12 ] [ b1 ] = (1/d) [ a22 b1 - a12 b2  ]
[ x2 ]         [ -a21  a11 ] [ b2 ]         [ -a21 b1 + a11 b2 ]

Or,

x1 = | b1 a12 | / |A|        x2 = | a11 b1 | / |A|
     | b2 a22 |                   | a21 b2 |

Notice how the column b replaces the column A•j in the determinant of A for xj. This is known as Cramer's Rule, after the 18th Century Swiss mathematician G. Cramer.

4.1.4 Properties of determinants of 2 × 2 matrices
Although a determinant is a number, equation (4.1) is also called a 2 × 2 determinant. The following properties hold for 2 × 2 determinants.
1. The determinant of A is the same as the determinant of its transpose:

|Aᵀ| = |A|

Proof: If A has rows [a11 a12] and [a21 a22], then Aᵀ has rows [a11 a21] and [a12 a22], so |Aᵀ| = a11 a22 - a12 a21 = |A|.
2. Interchanging rows changes the sign of the determinant:

| A2• | = - | A1• | = -|A|
| A1• |     | A2• |

Proof: The left-hand side is a21 a12 - a11 a22 = -|A|.
3. Multiplying a row by a scalar λ multiplies the determinant by λ:

| λA1• | = | A1•  | = λ|A|
| A2•  |   | λA2• |

Proof: The first determinant is λa11 a22 - a21 λa12 = λ|A|; the second is treated similarly.
4. Let C = [c1 c2]; then we have the expansion

| A1• + C |   | A1• |   | C   |
| A2•     | = | A2• | + | A2• |

Proof: The left-hand side is (a11 + c1)a22 - a21(a12 + c2) = (a11 a22 - a21 a12) + (c1 a22 - a21 c2), which is the right-hand side.
5. If one row is a multiple of another, the determinant vanishes:

| A1•  |
| λA1• | = 0

Proof: The determinant is a11 λa12 - λa11 a12 = 0.
6. Adding a multiple of one row to another row leaves the determinant invariant:

| A1•        |
| λA1• + A2• | = |A|

Proof: By (4) and (5),

| A1•        |   | A1•  |   | A1• |
| λA1• + A2• | = | λA1• | + | A2• | = 0 + |A| = |A|

7. If A and B are 2 × 2 matrices, then

|AB| = |A||B|

Proof: To better see the argument, let A have rows [a b] and [c d]. Then AB has rows aB1• + bB2• and cB1• + dB2•. Expanding with property (4) in each row gives four determinants:

|AB| = | aB1• | + | aB1• | + | bB2• | + | bB2• |
       | cB1• |   | dB2• |   | cB1• |   | dB2• |

By property (5) the first and last vanish, and by properties (2) and (3) the remaining two are ad|B| and -bc|B|. Hence

|AB| = (ad - bc)|B| = |A||B|

8. The above properties hold for columns in place of rows. In No.1 of Exercise 101 you are asked to formally state and prove this.
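Cramer's rule for two unknowns translates into a few lines of code. A minimal Python/NumPy sketch of our own, applied to the system of Example 3.5.2:

import numpy as np

def cramer_2x2(A, b):
    # Solve Ax = b by Cramer's rule; |A| must be non-zero.
    det = lambda M: M[0, 0] * M[1, 1] - M[1, 0] * M[0, 1]
    d = det(A)
    A1, A2 = A.copy(), A.copy()
    A1[:, 0] = b      # b replaces column 1 to give x1
    A2[:, 1] = b      # b replaces column 2 to give x2
    return np.array([det(A1) / d, det(A2) / d])

A = np.array([[3., 7.],
              [2., 5.]])
b = np.array([9., -11.])
print(cramer_2x2(A, b))   # [122. -51.], as in Example 3.5.2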
4.1.5 3 × 3 determinants

4.1.6 The search for a formula solving simultaneous equations
Let us try to solve formally

[ a11 a12 a13 ] [ x1 ]   [ b1 ]
[ a21 a22 a23 ] [ x2 ] = [ b2 ]
[ a31 a32 a33 ] [ x3 ]   [ b3 ]

The detached array for this system is

a11 a12 a13 | b1
a21 a22 a23 | b2
a31 a32 a33 | b3

Assuming a11 ≠ 0, this is equivalent to

a11   a12                      a13                      | b1
0     -(a21/a11)a12 + a22      -(a21/a11)a13 + a23      | -(a21/a11)b1 + b2     (-a21/a11)R1 + R2
0     -(a31/a11)a12 + a32      -(a31/a11)a13 + a33      | -(a31/a11)b1 + b3     (-a31/a11)R1 + R3

In particular, this means

[ -(a21/a11)a12 + a22   -(a21/a11)a13 + a23 ] [ x2 ]   [ -(a21/a11)b1 + b2 ]
[ -(a31/a11)a12 + a32   -(a31/a11)a13 + a33 ] [ x3 ] = [ -(a31/a11)b1 + b3 ]

Assuming that

| -(a21/a11)a12 + a22   -(a21/a11)a13 + a23 |
| -(a31/a11)a12 + a32   -(a31/a11)a13 + a33 | ≠ 0

we can use Cramer's rule to find

x2 = | -(a21/a11)b1 + b2   -(a21/a11)a13 + a23 |  /  | -(a21/a11)a12 + a22   -(a21/a11)a13 + a23 |     (4.4)
     | -(a31/a11)b1 + b3   -(a31/a11)a13 + a33 |     | -(a31/a11)a12 + a32   -(a31/a11)a13 + a33 |

Multiply the numerator and denominator of (4.4) by a11² and use property 3 of subsection 4.1.4:

x2 = | a11 b2 - a21 b1   a23 a11 - a21 a13 |  /  | a11 a22 - a21 a12   a23 a11 - a21 a13 |
     | a11 b3 - a31 b1   a33 a11 - a31 a13 |     | a11 a32 - a31 a12   a33 a11 - a31 a13 |

Consider the denominator:

(a11 a22 - a21 a12)(a33 a11 - a31 a13) - (a11 a32 - a31 a12)(a23 a11 - a21 a13)
= a11 a11 a22 a33 - a11 a13 a22 a31 - a11 a12 a21 a33 + a12 a21 a31 a13 - a11 a11 a23 a32 + a11 a13 a21 a32 + a11 a12 a23 a31 - a12 a13 a21 a31
= a11 (a11 a22 a33 - a13 a22 a31 - a12 a21 a33 - a11 a23 a32 + a13 a21 a32 + a12 a23 a31)

The numerator for x2 is found by substituting b1 for a12, b2 for a22 and b3 for a32:

a11 (a11 b2 a33 - a13 b2 a31 - b1 a21 a33 - a11 a23 b3 + a13 a21 b3 + b1 a23 a31)

Finally, we find

x2 = (a11 b2 a33 - a13 b2 a31 - b1 a21 a33 - a11 a23 b3 + a13 a21 b3 + b1 a23 a31) / (a11 a22 a33 - a13 a22 a31 - a12 a21 a33 - a11 a23 a32 + a13 a21 a32 + a12 a23 a31)     (4.5)

The denominator is defined as the determinant |A| of the 3 × 3 matrix A:

|A| = a11 a22 a33 - a13 a22 a31 - a12 a21 a33 - a11 a23 a32 + a13 a21 a32 + a12 a23 a31     (4.6)

|A| is called a 3 × 3 determinant and it is sometimes written as det A. Equation (4.5) can be written

x2 = | a11 b1 a13 |
     | a21 b2 a23 |  /  |A|
     | a31 b3 a33 |

Can you guess what x1 and x3 are, if |A| ≠ 0? This is again Cramer's rule. (See No.7 of Exercise 101 below.)

4.1.7 Discussion on the definition of the 3 × 3 determinant |A|
Each term of (4.6) is of the form

± a1i a2j a3k

The column indices i, j and k form an arrangement (permutation) ijk of the numbers 1, 2 and 3. In fact all six permutations occur, and each has its own sign according to the scheme:

+123   +231   +312   -132   -213   -321

4.1.8 The rule for finding the sign of ijk
Start with 123 as +. Each time an interchange of two numbers occurs, change the sign, as follows:
+123 → -132 (interchange 2 and 3)
-132 → +231 (interchange 1 and 2)
+231 → -213 (interchange 1 and 3)
-213 → +312 (interchange 2 and 3)
+312 → -321 (interchange 1 and 2)

4.1.9 Formal definition of |A|
Let σ = (σ1, σ2, σ3) denote a typical permutation of 1, 2, 3 and let sgn(σ) be its sign. Then we can write

|A| = Σ_σ sgn(σ) a1σ1 a2σ2 a3σ3     (4.7)

Here the sum extends over all 3! = 6 permutations σ of 1, 2, 3.
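The formal definition can be executed literally: sum the six signed products. A short Python sketch of our own, computing the sign of a permutation by counting inversions; the test matrix is the one of Example 4.1.2 below:

import numpy as np
from itertools import permutations

def sgn(p):
    # A permutation is even or odd according to its number of inversions.
    inversions = sum(1 for i in range(len(p))
                       for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def det_by_permutations(A):
    n = A.shape[0]
    return sum(sgn(p) * np.prod([A[i, p[i]] for i in range(n)])
               for p in permutations(range(n)))

A = np.array([[ 1., 3., -1.],
              [ 2., 4.,  0.],
              [-2., 5., -3.]])
print(det_by_permutations(A), np.linalg.det(A))   # both give -12 (up to rounding)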
4.1.10 The cofactor Ai|j of aij
The cofactor of aij is defined as

Ai|j = (-1)^{i+j} times the determinant formed by crossing out row i and column j

Example 4.1.1

A1|1 = (-1)^{1+1} | a22 a23 | = | a22 a23 |
                  | a32 a33 |   | a32 a33 |

A1|2 = (-1)^{1+2} | a21 a23 | = - | a21 a23 |
                  | a31 a33 |     | a31 a33 |

A2|2 = (-1)^{2+2} | a11 a13 | = | a11 a13 |
                  | a31 a33 |   | a31 a33 |

A2|3 = (-1)^{2+3} | a11 a12 | = - | a11 a12 |
                  | a31 a32 |     | a31 a32 |

A3|1 = (-1)^{3+1} | a12 a13 | = | a12 a13 |
                  | a22 a23 |   | a22 a23 |

4.1.11 The sign pattern for the cofactors Ai|j
Each time we move by one row or by one column we get a change in sign:

+ - +
- + -
+ - +

4.1.12 Expansion of |A| by a row
Expansion by row 1:

|A| = a11 a22 a33 - a11 a23 a32 + a12 a23 a31 - a12 a21 a33 + a13 a21 a32 - a13 a22 a31
    = a11 (a22 a33 - a23 a32) - a12 (a21 a33 - a23 a31) + a13 (a21 a32 - a22 a31)
    = a11 | a22 a23 | - a12 | a21 a23 | + a13 | a21 a22 |
          | a32 a33 |       | a31 a33 |       | a31 a32 |

Hence

|A| = a11 A1|1 + a12 A1|2 + a13 A1|3

Expansion by row 2:

|A| = a11 a22 a33 - a11 a23 a32 + a12 a23 a31 - a12 a21 a33 + a13 a21 a32 - a13 a22 a31
    = -a21 (a12 a33 - a13 a32) + a22 (a11 a33 - a13 a31) - a23 (a11 a32 - a12 a31)
    = -a21 | a12 a13 | + a22 | a11 a13 | - a23 | a11 a12 |
           | a32 a33 |       | a31 a33 |       | a31 a32 |

Hence

|A| = a21 A2|1 + a22 A2|2 + a23 A2|3

Expansion by row 3:

|A| = a11 a22 a33 - a11 a23 a32 + a12 a23 a31 - a12 a21 a33 + a13 a21 a32 - a13 a22 a31
    = a31 (a12 a23 - a13 a22) - a32 (a11 a23 - a13 a21) + a33 (a11 a22 - a12 a21)
    = a31 | a12 a13 | - a32 | a11 a13 | + a33 | a11 a12 |
          | a22 a23 |       | a21 a23 |       | a21 a22 |

Hence

|A| = a31 A3|1 + a32 A3|2 + a33 A3|3

4.1.13 Expansion by row i

|A| = ai1 Ai|1 + ai2 Ai|2 + ai3 Ai|3     (4.8)

Example 4.1.2 Let

A = [  1 3 -1 ]
    [  2 4  0 ]
    [ -2 5 -3 ]

Find |A| by expanding by the second row.
Solution:

|A| = a21 A2|1 + a22 A2|2 + a23 A2|3
    = (2)(-1)^{2+1} | 3 -1 | + (4)(-1)^{2+2} |  1 -1 | + 0
                    | 5 -3 |                 | -2 -3 |
    = (-2)(-4) + (4)(-5) = -12

4.1.14 Properties of 3 × 3 determinants
1. The determinant of A and its transpose are equal:

|A| = |Aᵀ|

Proof:

|A| = a11 a22 a33 - a13 a22 a31 - a12 a21 a33 - a11 a23 a32 + a13 a21 a32 + a12 a23 a31
    = a11 a22 a33 - a31 a22 a13 - a21 a12 a33 - a11 a32 a23 + a21 a32 a13 + a31 a12 a23

If B = Aᵀ then bij = aji by definition, so that |A| is equal to

b11 b22 b33 - b13 b22 b31 - b12 b21 b33 - b11 b23 b32 + b12 b23 b31 + b13 b21 b32 = |B|

This proves the result.
Note 99 The idea of the proof can be seen as follows. Consider some term of the full expansion of |A|, say -a12 a21 a33 or +a13 a21 a32. In the first case, -a12 a21 a33 = -a21 a12 a33 = -b12 b21 b33. In the second case, +a13 a21 a32 = +a21 a32 a13 = +b12 b23 b31. In both cases we get a typical term of |B| with its correct sign.
2. Interchanging two rows changes the sign of the determinant.
Proof: Consider, for example, interchanging rows 2 and 3. Expand |A| by row 1:

|A| = a11 A1|1 + a12 A1|2 + a13 A1|3
    = a11 | a22 a23 | - a12 | a21 a23 | + a13 | a21 a22 |
          | a32 a33 |       | a31 a33 |       | a31 a32 |

Hence, if we interchange A2• and A3•, the 2 × 2 determinants will change sign and so will |A|.
3. Multiplying a row of A by a scalar multiplies the determinant by the same scalar:

| λA1• |   | A1•  |   | A1•  |
| A2•  | = | λA2• | = | A2•  | = λ|A|
| A3•  |   | A3•  |   | λA3• |

Proof: Consider, for example, multiplying row 2 by λ. Expand the result by row 2:

| A1•  |
| λA2• | = (λa21)A2|1 + (λa22)A2|2 + (λa23)A2|3 = λ(a21 A2|1 + a22 A2|2 + a23 A2|3) = λ|A|
| A3•  |

Less elegantly, this can also be seen by expanding by row 1:

| A1•  |
| λA2• | = a11 | λa22 λa23 | - a12 | λa21 λa23 | + a13 | λa21 λa22 | = λ|A|
| A3•  |       | a32  a33  |       | a31  a33  |       | a31  a32  |

We have used the fact that each cofactor gets multiplied by λ.
4. Let B = [b1 b2 b3]; then we have the expansion

| A1• + B |   | A1• |   | B   |         | B   |
| A2•     | = | A2• | + | A2• | = |A| + | A2• |
| A3•     |   | A3• |   | A3• |         | A3• |

Proof:

| A1• + B |
| A2•     | = (a11 + b1)A1|1 + (a12 + b2)A1|2 + (a13 + b3)A1|3
| A3•     |
           = a11 A1|1 + a12 A1|2 + a13 A1|3 + b1 A1|1 + b2 A1|2 + b3 A1|3

which is the sum of the two determinants on the right. Similar results apply if the row B is added to another row, e.g.

| A1•     |   | A1• |   | A1• |
| A2• + B | = | A2• | + | B   |
| A3•     |   | A3• |   | A3• |

Using property (2) this can be proved directly from the first result as follows:

| A1•     |     | A2• + B |     | A2• |   | B   |   | A1• |   | A1• |
| A2• + B | = - | A1•     | = - | A1• | - | A1• | = | A2• | + | B   |
| A3•     |     | A3•     |     | A3• |   | A3• |   | A3• |   | A3• |

5. If one row of A is a multiple of another then |A| = 0.
Proof: First suppose that two rows are equal, say A1• = A2•. Then interchanging rows 1 and 2 of A changes the sign of the determinant by property (2). Hence |A| = -|A| and so |A| = 0. Now let A1• = λA2•. By property (3), |A| is λ times a determinant with equal rows 1 and 2 and so vanishes.
6. Adding a multiple of one row to a different row leaves |A| unchanged, e.g.

| A1•        |
| λA1• + A2• | = |A|
| A3•        |

Proof: Use properties (4) and (5):

| A1•        |   | A1• |   | A1•  |   | A1• |
| λA1• + A2• | = | A2• | + | λA1• | = | A2• | + 0 = |A|
| A3•        |   | A3• |   | A3•  |   | A3• |

7. * If A and B are 3 × 3 matrices,

|AB| = |A||B|     (4.9)

We will not prove this, but note that the proof follows that for 2 × 2 determinants.
8. All of the above properties (2)-(6) are true for columns in place of rows.
Proof: This follows from property (1), namely that |A| = |Aᵀ|. In particular, we have:
(a) The expansion of |A| by column j:

|A| = a1j A1|j + a2j A2|j + a3j A3|j

(b) Interchanging two columns changes the sign of the determinant.
(c) Multiplying a column of A by a scalar multiplies the determinant by the same scalar.
(d) If one column of A is a multiple of another, then |A| = 0.
(e) Adding a multiple of one column to another leaves |A| unchanged.

Example 4.1.3 Let

A = [  1 3 -1 ]
    [  2 4  0 ]
    [ -2 5 -3 ]

Find |A| by expanding by the third column.
Solution:

|A| = A1|3 a13 + A2|3 a23 + A3|3 a33
    = (-1)^{1+3} |  2 4 | (-1) + A2|3 (0) + (-1)^{3+3} | 1 3 | (-3)
                 | -2 5 |                              | 2 4 |
    = 18(-1) + (-2)(-3) = -12

Simplification: We may also simplify the evaluation of the determinant by adding a multiple of one row to another:

|A| = |  1  3 -1 |
      |  2  4  0 |
      | -5 -4  0 |     (-3)R1 + R3

Expanding by the third column gives

|A| = (-1)^{1+3} (-1) |  2  4 | = -12
                      | -5 -4 |

4.1.15 Fundamental properties of cofactors
For rows:

ai1 Aj|1 + ai2 Aj|2 + ai3 Aj|3 = |A| if i = j
ai1 Aj|1 + ai2 Aj|2 + ai3 Aj|3 = 0   if i ≠ j     (4.10)

For columns:

A1|i a1j + A2|i a2j + A3|i a3j = |A| if i = j
A1|i a1j + A2|i a2j + A3|i a3j = 0   if i ≠ j     (4.11)

Proof We know that if i = j in (4.10), the left-hand side is just the expansion of |A| by row i. Now let i ≠ j, say i = 2 and j = 1. Then

a21 A1|1 + a22 A1|2 + a23 A1|3 = a21 | a22 a23 | - a22 | a21 a23 | + a23 | a21 a22 |
                                     | a32 a33 |       | a31 a33 |       | a31 a32 |

This is the expansion of

| A2• |
| A2• |
| A3• |

by its first row and so vanishes. The result for columns can be derived from that for rows.

4.1.16 The adjoint adj(A) of A and the inverse A⁻¹
The adjoint of the n × n matrix A is the n × n matrix adj(A), where

[adj(A)]ij = Aj|i

This means that we can first find the matrix B with [B]ij = Ai|j and then adj(A) = Bᵀ.

4.1.17 Properties (4.10) and (4.11) as matrix products
Properties (4.10) and (4.11) just say

A adj(A) = |A| I = adj(A) A     (4.12)

where I = I3 is the 3 × 3 identity matrix.

Remark 100 For a 2 × 2 matrix A, the adjoint is

adj(A) = [ a22  -a12 ]
         [ -a21  a11 ]

This was found in equation (4.2), and the property (4.12) was found in equation (4.3).

4.1.18 Corollary: condition for A to be invertible. Formula for the inverse
The 3 × 3 matrix A is invertible if, and only if, |A| ≠ 0, in which case

A⁻¹ = (1/|A|) adj(A)     (4.13)

(For the definition of 'invertible', see Chapter 3, subsection 3.5.2.)
Proof If |A| ≠ 0, then equation (4.12) shows that

A ((1/|A|) adj(A)) = I = ((1/|A|) adj(A)) A

In other words, (4.13) holds. Conversely, suppose that A⁻¹ exists. Let an elementary row operation be done on A to obtain A′. By properties (2), (3) and (6) we see that |A| ≠ 0 if, and only if, |A′| ≠ 0. As A⁻¹ exists, its row REF is I. Since |I| = 1, it follows that |A| ≠ 0.

Example 4.1.4 Let

A = [ 2 4  3 ]
    [ 0 1 -1 ]
    [ 3 5  7 ]

If A⁻¹ exists, find it.
Solution:

|A| = a21 A2|1 + a22 A2|2 + a23 A2|3 = 0 + | 2 3 | - (-1) | 2 4 | = 5 - 2 = 3
                                           | 3 7 |        | 3 5 |

As |A| ≠ 0, A⁻¹ exists. We find

A1|1 = | 1 -1 | = 12,   A1|2 = - | 0 -1 | = -3,   A1|3 = | 0 1 | = -3
       | 5  7 |                  | 3  7 |                | 3 5 |

A2|1 = - | 4 3 | = -13,   A2|2 = | 2 3 | = 5,   A2|3 = - | 2 4 | = 2
         | 5 7 |                 | 3 7 |                 | 3 5 |

A3|1 = | 4  3 | = -7,   A3|2 = - | 2  3 | = 2,   A3|3 = | 2 4 | = 2
       | 1 -1 |                  | 0 -1 |               | 0 1 |

Hence

A⁻¹ = (1/|A|) adj A = (1/|A|) [ A1|1 A1|2 A1|3 ]ᵀ         [ 12 -13 -7 ]
                              [ A2|1 A2|2 A2|3 ]  = (1/3) [ -3   5  2 ]
                              [ A3|1 A3|2 A3|3 ]          [ -3   2  2 ]

4.2 Higher Order Determinants
Let A be an n × n matrix. As with three numbers (see subsection 4.1.7), a permutation (σ1, σ2, ..., σn) of 1, 2, ..., n is called even if it can be obtained from 1, 2, ..., n by an even number of interchanges, and otherwise odd; sgn(σ) = +1 if σ is even, and sgn(σ) = -1 if σ is odd.
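Before developing the n × n theory, note that the adjoint construction of 4.1.16-4.1.18 is easy to test numerically. A minimal Python/NumPy sketch of our own, checked against Example 4.1.4 above (the 2 × 2 minors are evaluated with np.linalg.det):

import numpy as np

def adjugate_3x3(A):
    # adj(A) is the transposed matrix of cofactors: [adj A]_ij = A_{j|i}.
    cof = np.empty((3, 3))
    for i in range(3):
        for j in range(3):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cof.T

A = np.array([[2., 4.,  3.],
              [0., 1., -1.],
              [3., 5.,  7.]])
d = np.linalg.det(A)                        # 3, so A is invertible
A_inv = adjugate_3x3(A) / d                 # equation (4.13)
print(np.allclose(A @ A_inv, np.eye(3)))    # True
print(np.round(3 * A_inv))                  # [[12 -13 -7] [-3 5 2] [-3 2 2]]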
Following the formal definition (4.7) for 3 × 3 determinants, for a general n × n matrix A its determinant |A| is given by

|A| = Σ_σ sgn(σ) a1σ1 a2σ2 ... anσn

where the sum extends over all n! permutations σ of the numbers 1, 2, ..., n. Recall that n! = (1)(2)(3)···(n - 1)(n) and is called n factorial.
As with 2 × 2 and 3 × 3 determinants (4.1.10 above), if A is an n × n matrix, the cofactor Ai|j of aij is given by

Ai|j = (-1)^{i+j} times the determinant formed by crossing out row i and column j

For a 4 × 4 determinant the sign pattern for the cofactors is

+ - + -
- + - +
+ - + -
- + - +

4.2.1 All the properties of 2 × 2 and 3 × 3 determinants hold for n × n determinants
So the properties in 4.1.14 and the expansions in 4.1.13 (equation (4.8)) carry over to n × n determinants. In particular, the formula |AB| = |A||B| and equation (4.12) hold, as well as the statement on the inverse in Corollary 4.1.18 and equation (4.13).

Example 4.2.1 Find |A| if

A = [  3 -2 -1  2 ]
    [  1 -2  4  0 ]
    [ -1  3  1 -3 ]
    [  0  1 -1  2 ]

Solution:

|A| = |  3 -2 -3  6 |
      |  1 -2  2  4 |
      | -1  3  4 -9 |
      |  0  1  0  0 |

Here we have added column 2 to column 3 and (-2)(column 2) to column 4. Expanding by row 4:

|A| = (-1)^{4+2} |  3 -3  6 |   | 0 -3  0 |
                 |  1  2  4 | = | 3  2  8 | = (-1)^{1+2}(-3) | 3  8 | = -81
                 | -1  4 -9 |   | 3  4 -1 |                  | 3 -1 |

We have added column 2 to column 1 and 2(column 2) to column 3, then expanded by row 1.

Exercise 101 Partial Solutions
1. Using property (1) for 2 × 2 determinants, state and prove similar results for columns in (2)-(6).
2. Evaluate the determinants of the matrices in Chapter 3, Exercise 86, No.1. Find the inverses of the matrices by the adjoint method if they exist. The values of the determinants are:

|A| = | 2 1 | = 1
      | 3 2 |

|B| = | 3 2  1 |
      | 0 2  2 | = -6
      | 0 0 -1 |

|Y| = |  1 0  4 |
      | -1 1 -2 | = -13
      |  2 3  1 |

|T| = |  1/3 -1/3 -13/3 |
      | -1/3  5/6  10/3 | = 13/6
      |  2/3  5/6   4/3 |

|C| = |  1  6  2 |
      | -2  3  5 | = 0
      |  7 12 -4 |

|D| = |  1 3 -1 |
      |  2 0 -3 | = 1
      | -1 4  2 |

3. Evaluate the determinants

(a) |  1  -3  0 -2 |
    |  3 -12 -2 -6 |
    | -2  10  2  5 |
    | -1   6  1  3 |

(b) | -3  7  8  4 |
    |  2  1  0  3 |
    |  6 -1 -2 -1 |
    | -3  4  2 11 |

(c) | -3  7  8  4 -7 |
    |  2  1  0  3 -1 |
    |  6 -1 -2 -1  1 |
    | -3  4  2 11 -4 |
    |  …  … 17 59 12 |

4. Verify that the adjoint of a 2 × 2 matrix is given by equation (4.2).
5. Let A be a 3 × 3 matrix and k a scalar. Show that |kA| = k³|A|. What do you think the general result is?
6. Suppose the n × n matrix A is not invertible. Show that A(adj A) is the zero matrix.
7. State and prove Cramer's rule for 3 equations in 3 unknowns. Do the same for n equations in n unknowns, using the fact that higher order determinants have the same properties as 2 × 2 and 3 × 3 determinants.
8. (a) Let A = [aij] be a 3 × 3 matrix and β a scalar. Replace the entry a23 with β and let the resulting matrix be B. Show that |B| = |A| + (β - a23)A2|3, and find a criterion for B to be invertible.
(b) Generalize the previous result.
9. Let A be a 3 × 3 matrix. A scalar λ such that A - λI is not invertible is called an eigenvalue of A (see No.5 in Examples 3.6.1). This means that |A - λI| = 0. Show that λ must satisfy a polynomial equation f(λ) = 0 of degree 3 (called the characteristic polynomial of A). Find this polynomial for the matrix

A = [  10 -12 -1 ]
    [   5  -6 -1 ]
    [ -11  12  0 ]

as well as the eigenvalues and three corresponding eigenvectors.
Solution: f(λ) = λ³ - 4λ² + λ + 6. The eigenvalues are λ = 2, λ = 3 and λ = -1. Eigenvectors belonging to these eigenvalues are respectively

[ 4/3 1 -4/3 ]ᵀ,  [ 3/2 1 -3/2 ]ᵀ  and  [ 7/6 1 5/6 ]ᵀ.

10. Let |A| be a 3 × 3 determinant. We can permute the rows A1•, A2• and A3• in six ways. For which of these does the determinant keep its sign and for which does it change its sign? State and prove a similar result for the columns A•1, A•2 and A•3 of A.

4.3 The Cross Product a × b
In this section all vectors have three entries and, as in the notation of Chapter 1, will normally be row vectors. The unit vectors i = (1, 0, 0), j = (0, 1, 0) and k = (0, 0, 1) were introduced in subsection 1.1.17. Let a = (a1, a2, a3) and b = (b1, b2, b3) be two vectors. Their cross product is defined as

a × b = | a2 a3 | i - | a1 a3 | j + | a1 a2 | k
        | b2 b3 |     | b1 b3 |     | b1 b2 |

      = ( | a2 a3 | , - | a1 a3 | , | a1 a2 | )     (4.14)
          | b2 b3 |     | b1 b3 |   | b1 b2 |

This is sometimes called the vector product and is used constantly in mechanics. Treating i, j and k as though they were scalars (which they are not), the cross product can be remembered by the formula

a × b = | i  j  k  |
        | a1 a2 a3 |
        | b1 b2 b3 |

Expanding this 'determinant' by its first row gives (4.14). If a and b are written as columns, the convention is to write

a × b = | i a1 b1 |
        | j a2 b2 |
        | k a3 b3 |

but we get the same result as before.

Example 4.3.1

(-1, 2, 4) × (2, -3, 1) = |  i  j k |
                          | -1  2 4 |
                          |  2 -3 1 |

  = |  2 4 | i - | -1 4 | j + | -1  2 | k
    | -3 1 |     |  2 1 |     |  2 -3 |

  = (14, 9, -1)

Example 4.3.2 A characteristic property of i, j and k:

i × j = k,   j × k = i,   k × i = j

To see these results, consider

i × j = | i j k |   | 0 0 |     | 1 0 |     | 1 0 |
        | 1 0 0 | = | 1 0 | i - | 0 0 | j + | 0 1 | k = k
        | 0 1 0 |

j × k = | i j k |   | 1 0 |     | 0 0 |     | 0 1 |
        | 0 1 0 | = | 0 1 | i - | 0 1 | j + | 0 0 | k = i
        | 0 0 1 |

k × i = | i j k |   | 0 1 |     | 0 1 |     | 0 0 |
        | 0 0 1 | = | 0 0 | i - | 1 0 | j + | 1 0 | k = j
        | 1 0 0 |

4.3.1 Properties of the cross product
The cross product (4.14) is quite different from the dot product a · b in that it is a vector, while the dot product is a scalar. Nevertheless, some properties are shared with the dot product. For all vectors a, b and c and scalars β and γ:
1. b × a = -(a × b)
Proof: In b × a we interchange the rows in the cofactors defining a × b. Since these all change sign, the result follows.
2. (The triple scalar product)

c · (a × b) = b · (c × a) = a · (b × c)     (4.15)

Proof: First note that

c · (a × b) = c1 | a2 a3 | - c2 | a1 a3 | + c3 | a1 a2 |
                 | b2 b3 |      | b1 b3 |      | b1 b2 |

            = | c1 c2 c3 |
              | a1 a2 a3 |     (4.16)
              | b1 b2 b3 |

Here we have expanded the determinant by its first row. The other properties follow by rearranging the rows of the above determinant according to the permutations bca and abc of cab. These are all even, and so the other determinants are all equal. (See Exercise 101, No.10.)
3. a · (a × b) = b · (a × b) = 0
Proof: The statement a · (a × b) = 0 follows by putting c = a in the determinant of (4.16). The resulting determinant is zero, as two of its rows are the same. The result b · (a × b) = 0 is similar.
Geometrically, this means that if a × b is non-zero it is perpendicular to both a and b.
4. β(a × b) = (βa) × b = a × (βb)
4.3.1 Properties of the cross product

The cross product (4.14) is quite different from the dot product a · b in that it is a vector while the dot product is a scalar. Nevertheless, some properties are shared with the dot product. For all vectors a, b and c and scalars α, β and γ:

1. b × a = −(a × b)

   Proof: In b × a we interchange the rows in the cofactors defining a × b. Since these all change sign, the result follows.

2. (The triple scalar product)

    c · (a × b) = b · (c × a) = a · (b × c)        (4.15)

   Proof: First note that

    c · (a × b) = c1 | a2  a3 | − c2 | a1  a3 | + c3 | a1  a2 |
                     | b2  b3 |      | b1  b3 |      | b1  b2 |

                = | c1  c2  c3 |
                  | a1  a2  a3 |        (4.16)
                  | b1  b2  b3 |

   Here we have expanded the determinant by its first row. The other equalities follow by rearranging the rows of the above determinant according to the permutations bca and abc of cab. These are all even, and so the other determinants are all equal. (See Exercise 101, No. 10.)

3. a · (a × b) = b · (a × b) = 0

   Proof: The statement a · (a × b) = 0 follows by putting c = a in the determinant of (4.16). The resulting determinant is zero as two of its rows are the same. The result b · (a × b) = 0 is similar. Geometrically, this means that if a × b is non-zero it is perpendicular to both a and b.

4. β(a × b) = (βa) × b = a × (βb)

5. (β + γ)(a × b) = β(a × b) + γ(a × b)

6. (αa + βb) × c = (αa × c) + (βb × c)

7. If either a = 0 or b = 0 then a × b = 0. Otherwise, a × b = 0 if, and only if, a and b are parallel. (See Exercise 102, No. 7.)

4.3.2 Geometric meaning of the cross product a × b

Suppose that 0 < θ < π is the angle between the non-zero and non-parallel vectors a and b. Let ĉ be the unit vector perpendicular to a and b with direction found by using the corkscrew rule: rotate from a to b through θ; then the direction of ĉ goes the way a corkscrew would go. See Chapter 1, subsection 1.1.1 and Figure 4.1.

[Figure 4.1: the vectors a and b spanning the angle θ, with a × b = (|a||b| sin θ) ĉ perpendicular to both.]

Geometric Interpretation

    a × b = (|a||b| sin θ) ĉ        (4.17)

Proof of equation (4.17): We first show that

    |a × b|² + (a · b)² = |a|²|b|²

We have

    |a × b|² + (a · b)²
      = (a2 b3 − a3 b2)² + (a1 b3 − a3 b1)² + (a1 b2 − a2 b1)² + (a1 b1 + a2 b2 + a3 b3)²
      = a1²b1² + a1²b2² + a1²b3² + a2²b1² + a2²b2² + a2²b3² + a3²b1² + a3²b2² + a3²b3²
      = (a1² + a2² + a3²)(b1² + b2² + b3²)
      = |a|²|b|²

(on expanding the squares, the cross terms cancel in pairs). It follows that

    |a × b|²/(|a|²|b|²) = 1 − (a · b)²/(|a|²|b|²) = 1 − cos²θ = sin²θ

Hence |a × b| = |a||b| sin θ. Since a × b is parallel to ĉ, we have a × b = ±(|a||b| sin θ) ĉ. If a = i and b = j, then from Example 4.3.2 and the corkscrew rule we have ĉ = k and i × j = +k. This suggests that the corkscrew rule applied to any cross product a × b gives the same direction as ĉ, and we will be satisfied with this. Equation (4.17) now follows.

From Example 4.3.2 we see how the geometric interpretation gives, besides i × j = k, also the other products j × k = i and k × i = j. Compare this with 1.1.1 on right-handed systems in Chapter 1.
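The identity |a × b|² + (a · b)² = |a|²|b|² and the resulting formula |a × b| = |a||b| sin θ are easy to test numerically. A small sketch (the two random vectors are merely examples):

    import numpy as np

    rng = np.random.default_rng(seed=1)
    a = rng.standard_normal(3)
    b = rng.standard_normal(3)

    axb = np.cross(a, b)
    lhs = np.dot(axb, axb) + np.dot(a, b)**2    # |a x b|^2 + (a.b)^2
    rhs = np.dot(a, a) * np.dot(b, b)           # |a|^2 |b|^2
    print(np.isclose(lhs, rhs))                 # True

    # |a x b| = |a||b| sin(theta), theta being the angle between a and b
    cos_t = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    theta = np.arccos(cos_t)
    print(np.isclose(np.linalg.norm(axb),
                     np.linalg.norm(a) * np.linalg.norm(b) * np.sin(theta)))  # True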
Exercise 102

1. Find (−1, 5, 7) × (2, −3, 4) and (−2, 4, −3) × (6, −1, 2).

   Solution:

    (−1, 5, 7) × (2, −3, 4) = (41, 18, −7)
    (−2, 4, −3) × (6, −1, 2) = (5, −14, −22)

2. Find formulae for i × a, j × a and k × a.

3. Consider three points A, B and C in space. If they are not collinear they form a triangle. Show that its area is (1/2)|AB × AC|.

4. Find the area of the triangle with vertices A = (3, 4, 5), B = (4, 4, 6), C = (5, 5, 4).

5. Show that if the non-zero vectors a and b are at right angles then |a × b| = |a||b|. What of the converse?

6. Let l and m be unit vectors and put n = l × m. Using the geometric interpretation of the cross product, show that l, m and n form a right-handed system (see 1.1.1). In fact,

    l × m = n,   m × n = l,   n × l = m

   Compare this with Example 4.3.2.

7. Complete the proofs of the properties 4.3.1 of the cross product.

   Hint 103 For property 7, consider the case when a and b are both non-zero. If they are parallel, use properties of 2 × 2 determinants in 4.1.4 to show that the cross product is zero. For the converse, use the geometric interpretation (4.17) to show that a × b ≠ 0 when the vectors are not parallel.

8. Let a, b and c be three vectors in ℜ³. Show that they are linearly independent if, and only if, the triple scalar product a · (b × c) is not zero.

9. *Consider the parallelogram ABCD in Figure 4.2. The line EF passing through Q is parallel to AB and the line GH passing through Q is parallel to AD. Using vector methods, show that the parallelograms EQHD and GBFQ have equal areas if, and only if, Q lies on the diagonal AC. Conclude that this is the case if, and only if, the parallelograms AGQE and QFCH are similar. (A theorem going back to Euclid.)

   [Figure 4.2: the parallelogram ABCD with interior point Q, the line EF through Q parallel to AB, and the line GH through Q parallel to AD.]

   Hint 104 Consider the cross product AQ × QC.

10. A parallelepiped is a 'squashed box': it has six faces, with opposite faces congruent parallelograms. Let u = AD, v = AB and w = AE be three of its linearly independent edges. Show that its volume is |(u × v) · w|.

    [Figure: a parallelepiped with base ABCD and edge AE, of height h = w · (u × v)/|u × v|.]

    Hint 105 Let its base be ABCD. This has area α = |AD × AB|. The height h of the parallelepiped is the absolute value of the component of AE along AD × AB. The volume is then hα. Make a drawing.

11. *(The triple vector product.) Let a, b and c be three given vectors. Show that

    a × (b × c) = (a · c) b − (a · b) c

    The result is not altogether surprising, since a × (b × c) is a linear combination of b and c.

    Hint 106 Show the result is true if a is i, j or k. Now use the result of No. 2.

12. *Application to torques (moments) of forces. Let F be a force vector that is applied to a point P in space. The torque (or moment) of F about the point Q is defined as M = QP × F. The line of action of the force is the straight line ℓ parallel to F that passes through P. Show that the torques of F applied at any two points on its line of action about Q are equal. Let H be the point on ℓ closest to Q. Then with F acting at H we get the usual definition of 'moment', except that it now has a vector quality; its direction is perpendicular to both QH and F.

13. *(Continuation.) Consider an axis ξ passing through Q and having the same direction as a given unit vector l. Let D and E be points on the line of action of F and the axis ξ respectively such that the distance |ED| is a minimum. (Thus ED, if not zero, is perpendicular to both lines.) Let m be the unit vector with direction ED, so that ED = μm, where μ = |ED|. Put n = l × m. In the sense of the right-hand rule, the torque of F about the axis ξ is the component F · n times the distance μ = |ED| (draw a picture to see this). Show that the torque about the axis ξ is the triple scalar product M · l. Show further that if P′ and Q′ are any points on the line of action of F and the axis ξ respectively, then

    (Q′P′ × F) · l = (QP × F) · l

    This shows that the torque of a force about an axis is purely a property of the force with its line of action and the axis, and does not depend on the selected points on these lines.

    Hint 107 Express QD as a linear combination λl + μm and F as a linear combination αl + γn. Now multiply out the expression M = (λl + μm) × (αl + γn) and find M · l.

4.4 Summary of Chapter 4

4.4.1 Determinants

Determinants are inspired by the search for a formula to solve n equations in n unknowns. (See, for example, subsection 4.1.6.)

4.4.2 2 × 2 determinants

In Exercise 3a of Chapter 2 we introduced the idea of the determinant: |A| = a11 a22 − a21 a12. In subsection 4.1.4 of 4.1.1 we develop some basic properties of 2 × 2 determinants.

4.4.3 3 × 3 determinants

For a 3 × 3 matrix A, its determinant is given by equation (4.7):

    |A| = Σ_σ sgn(σ) a1σ1 a2σ2 a3σ3

Here the sum extends over all 3! = 6 permutations σ = (σ1, σ2, σ3) of 1, 2, 3, and sgn(σ) is the sign of the permutation σ. A similar definition holds for an n × n matrix A.
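The permutation definition can be translated directly into code. The sketch below (plain Python, standard library only) is hopelessly inefficient for large n, since it sums all n! terms, but it mirrors the formula exactly. The test matrix is the one from Exercise 101, No. 9, whose determinant must equal the product of its eigenvalues, 2 · 3 · (−1) = −6:

    from itertools import permutations
    from math import prod

    def sgn(p):
        """Sign of a permutation p of 0, 1, ..., n-1: +1 if even, -1 if odd."""
        s = 1
        for i in range(len(p)):
            for j in range(i + 1, len(p)):
                if p[i] > p[j]:    # count inversions; each one flips the sign
                    s = -s
        return s

    def det(A):
        """Determinant as the signed sum over all n! permutations."""
        n = len(A)
        return sum(sgn(p) * prod(A[i][p[i]] for i in range(n))
                   for p in permutations(range(n)))

    A = [[ 10, -12, -1],
         [  5,  -6, -1],
         [-11,  12,  0]]
    print(det(A))    # -6 = 2 * 3 * (-1), the product of the eigenvalues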
4.4.4 Cofactors and properties of 3 × 3 determinants

(See subsections 4.1.10, 4.1.13 and 4.1.14.) The cofactor Ai|j of aij is (−1)^(i+j) times the determinant formed by deleting row i and column j. A determinant can be expanded by any row:

    |A| = ai1 Ai|1 + ai2 Ai|2 + ai3 Ai|3        (4.18)

(See equation (4.8).) Multiplying a row of A by a number k multiplies the determinant by k. Interchanging two rows (columns) changes the sign of the determinant. Adding a row (column) of A to another row (column) does not change the determinant. To each property of rows there is a corresponding property of columns.

4.4.5 The adjoint

(See subsection 4.1.16.) The adjoint of A is given by

    [adj(A)]ij = Aj|i

This means that we can first find the matrix B with [B]ij = Ai|j and then adj(A) = B^T. The fundamental property is

    A adj(A) = |A| I = adj(A) A

Consequently, A has an inverse if, and only if, |A| ≠ 0, in which case

    A⁻¹ = (1/|A|) adj(A)

4.4.6 General determinants

All properties of 2 × 2 and 3 × 3 determinants carry over to n × n determinants. (See section 4.2.)

4.4.7 The Cross Product

(See section 4.3.) The cross product is defined in equation (4.14):

    a × b = | a2  a3 | i − | a1  a3 | j + | a1  a2 | k
            | b2  b3 |     | b1  b3 |     | b1  b2 |

4.4.8 The Triple Scalar Product of a, b and c

This is a · (b × c). Basic properties are given in equations (4.15) and (4.16).

4.4.9 Geometric interpretation

    a × b = (|a||b| sin θ) ĉ

Here 0 < θ < π is the angle between the non-zero and non-parallel vectors a and b, and ĉ is the unit vector perpendicular to a and b with direction found by using the corkscrew rule: rotate from a to b through θ. The cross product can also be seen geometrically as an area (Exercise 102, No. 3); the triple scalar product as a volume (Exercise 102, No. 10).
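To close, the triple scalar product, the 3 × 3 determinant with rows u, v and w, and the volume of the parallelepiped spanned by u, v and w can all be compared numerically; a final sketch (the three vectors are arbitrary examples):

    import numpy as np

    u = np.array([1.0, 0.0, 0.0])
    v = np.array([1.0, 2.0, 0.0])
    w = np.array([1.0, 1.0, 3.0])

    print(np.dot(u, np.cross(v, w)))           # 6.0: the triple scalar product u.(v x w)
    print(np.linalg.det(np.array([u, v, w])))  # 6.0: the determinant with rows u, v, w

    # Volume of the parallelepiped with edges u, v, w (Exercise 102, No. 10)
    print(abs(np.dot(np.cross(u, v), w)))      # 6.0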