12.5 The Chain Rule

You are already quite familiar with the chain rule for functions of a single variable. Even if you don't remember the name, you certainly remember how it's used. For instance, to differentiate the function we have
 
 
You have used this rule hundreds of times, but you may be less familiar with the general form of the chain rule. For differentiable functions f and g, we have

$$\frac{d}{dx}\bigl[f(g(x))\bigr] = f'(g(x))\,g'(x).$$

We now pause to extend the chain rule to functions of several variables. This takes several slightly different forms, depending on the number of independent variables. Keep in mind that these are variations of the already familiar chain rule for functions of a single variable.

First, consider a differentiable function $f(x, y)$ whose arguments $x$ and $y$ are, in turn, both differentiable functions of a single variable $t$. To find the derivative of $f(x, y)$ with respect to $t$, we first write $g(t) = f(x(t), y(t))$. Then, from the definition of an ordinary derivative, we have

$$g'(t) = \lim_{\Delta t \to 0} \frac{g(t+\Delta t) - g(t)}{\Delta t} = \lim_{\Delta t \to 0} \frac{f(x(t+\Delta t), y(t+\Delta t)) - f(x(t), y(t))}{\Delta t}.$$
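For simplicity, we write $\Delta x = x(t+\Delta t) - x(t)$, $\Delta y = y(t+\Delta t) - y(t)$ and $\Delta z = f(x(t+\Delta t), y(t+\Delta t)) - f(x(t), y(t))$. We now have

$$g'(t) = \lim_{\Delta t \to 0} \frac{\Delta z}{\Delta t}.$$

Since $f$ is a differentiable function of $x$ and $y$, we have (from the definition of differentiability) that

$$\Delta z = \frac{\partial f}{\partial x}\,\Delta x + \frac{\partial f}{\partial y}\,\Delta y + \varepsilon_1\,\Delta x + \varepsilon_2\,\Delta y,$$

where $\varepsilon_1$ and $\varepsilon_2$ both tend to $0$ as $(\Delta x, \Delta y) \to (0, 0)$. Dividing through by $\Delta t$ gives us

$$\frac{\Delta z}{\Delta t} = \frac{\partial f}{\partial x}\,\frac{\Delta x}{\Delta t} + \frac{\partial f}{\partial y}\,\frac{\Delta y}{\Delta t} + \varepsilon_1\,\frac{\Delta x}{\Delta t} + \varepsilon_2\,\frac{\Delta y}{\Delta t}.$$

Taking the limit as $\Delta t \to 0$ now gives us

$$g'(t) = \frac{\partial f}{\partial x} \lim_{\Delta t \to 0} \frac{\Delta x}{\Delta t} + \frac{\partial f}{\partial y} \lim_{\Delta t \to 0} \frac{\Delta y}{\Delta t} + \lim_{\Delta t \to 0} \varepsilon_1 \frac{\Delta x}{\Delta t} + \lim_{\Delta t \to 0} \varepsilon_2 \frac{\Delta y}{\Delta t}. \tag{5.1}$$

Notice that

$$\lim_{\Delta t \to 0} \frac{\Delta x}{\Delta t} = \lim_{\Delta t \to 0} \frac{x(t+\Delta t) - x(t)}{\Delta t} = x'(t)$$

and

$$\lim_{\Delta t \to 0} \frac{\Delta y}{\Delta t} = \lim_{\Delta t \to 0} \frac{y(t+\Delta t) - y(t)}{\Delta t} = y'(t).$$

Further, notice that since $x(t)$ and $y(t)$ are differentiable, they are also continuous, and so

$$\lim_{\Delta t \to 0} \Delta x = \lim_{\Delta t \to 0} \bigl[x(t+\Delta t) - x(t)\bigr] = 0.$$

Likewise, $\lim_{\Delta t \to 0} \Delta y = 0$ also. Consequently, since $(\Delta x, \Delta y) \to (0, 0)$ as $\Delta t \to 0$, we have

$$\lim_{\Delta t \to 0} \varepsilon_1 = \lim_{\Delta t \to 0} \varepsilon_2 = 0.$$

From (5.1), we now have

$$g'(t) = \frac{\partial f}{\partial x}\,x'(t) + \frac{\partial f}{\partial y}\,y'(t) + 0 \cdot x'(t) + 0 \cdot y'(t) = \frac{\partial f}{\partial x}\,x'(t) + \frac{\partial f}{\partial y}\,y'(t).$$
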
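We summarize the chain rule for the derivative of $f(x(t), y(t))$ with the following result.

Theorem 5.1 (Chain Rule)

If $z = f(x(t), y(t))$, where $x(t)$ and $y(t)$ are differentiable and $f(x, y)$ is a differentiable function of $x$ and $y$, then

$$\frac{dz}{dt} = \frac{\partial f}{\partial x}\,\frac{dx}{dt} + \frac{\partial f}{\partial y}\,\frac{dy}{dt}.$$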
As a convenient device for remembering the chain rule, we often use a tree diagram like the one shown in the margin. Notice that if $z = f(x, y)$ and $x$ and $y$ are both functions of the variable $t$, then $t$ is the independent variable. We consider $x$ and $y$ to be intermediate variables, since they both depend on $t$. In the tree diagram, we list the dependent variable $z$ at the top, followed by the intermediate variables $x$ and $y$, with the independent variable $t$ at the bottom level, and with paths connecting each variable to the variables it depends on directly. Next to each path, we indicate the corresponding derivative; for example, between $z$ and $x$, we indicate $\frac{\partial z}{\partial x}$. The chain rule then gives $\frac{dz}{dt}$ as the sum of all of the products of the derivatives along each path to $t$. That is,

$$\frac{dz}{dt} = \frac{\partial z}{\partial x}\,\frac{dx}{dt} + \frac{\partial z}{\partial y}\,\frac{dy}{dt}.$$

This device is especially useful for functions of several variables that are in turn functions of several other variables, as we will see shortly.
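The path-summing rule is mechanical enough to sketch in code. The snippet below is only an illustration under our own assumptions: a tree diagram is stored as a dictionary (`tree`, a name we made up) that maps each variable to its child variables together with the derivative written on the connecting edge, already evaluated at one point, and the hypothetical helper `path_sum` adds the products of edge derivatives over every path from the top variable to the bottom one. For concreteness, the edges use the functions of Example 5.1 evaluated at $t = 0.5$.

```python
from math import cos, exp, sin


def path_sum(tree, top, bottom):
    """Return the sum, over every path from `top` down to `bottom`, of the
    product of the edge derivatives along that path (the tree-diagram rule)."""
    if top == bottom:
        return 1.0
    return sum(edge * path_sum(tree, child, bottom)
               for child, edge in tree.get(top, []))


# Tree diagram for z = f(x, y) = x^2 e^y with x = t^2 - 1 and y = sin t
# (the functions of Example 5.1), with every edge evaluated at t = 0.5.
t = 0.5
x, y = t**2 - 1, sin(t)
tree = {
    "z": [("x", 2 * x * exp(y)),   # dz/dx
          ("y", x**2 * exp(y))],   # dz/dy
    "x": [("t", 2 * t)],           # dx/dt
    "y": [("t", cos(t))],          # dy/dt
}

dz_dt = path_sum(tree, "z", "t")
print(dz_dt)
```

The same helper covers the tree for Theorem 5.2: give `x` and `y` two children each, `s` and `t`, labeled with the corresponding partial derivatives, and call `path_sum(tree, "z", "s")` or `path_sum(tree, "z", "t")`.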


We illustrate the use of this new chain rule in the following example.

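Example 5.1
Using the Chain Rule

For $f(x, y) = x^2 e^y$, $x(t) = t^2 - 1$ and $y(t) = \sin t$, find the derivative of $g(t) = f(x(t), y(t))$.

We first compute the derivatives $x'(t) = 2t$ and $y'(t) = \cos t$. The chain rule (Theorem 5.1) then gives us

$$g'(t) = \frac{\partial f}{\partial x}\,x'(t) + \frac{\partial f}{\partial y}\,y'(t) = 2x e^y (2t) + x^2 e^y \cos t = 2(t^2 - 1)e^{\sin t}(2t) + (t^2 - 1)^2 e^{\sin t} \cos t.$$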
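As a quick numerical sanity check (our own addition, not part of the example), the sketch below compares the chain-rule formula for $g'(t)$ with a central difference quotient of $g(t) = (t^2 - 1)^2 e^{\sin t}$, the function obtained by substituting for $x$ and $y$ first. The step size `h = 1e-6` and the sample values of $t$ are arbitrary choices.

```python
import math


def g(t):
    # g(t) = f(x(t), y(t)) after direct substitution: (t^2 - 1)^2 e^(sin t)
    return (t**2 - 1)**2 * math.exp(math.sin(t))


def g_prime_chain_rule(t):
    # g'(t) = f_x x'(t) + f_y y'(t), with f(x, y) = x^2 e^y
    x, y = t**2 - 1, math.sin(t)
    f_x = 2 * x * math.exp(y)    # partial derivative of x^2 e^y with respect to x
    f_y = x**2 * math.exp(y)     # partial derivative of x^2 e^y with respect to y
    return f_x * (2 * t) + f_y * math.cos(t)


def central_difference(func, t, h=1e-6):
    # Symmetric difference quotient approximating func'(t)
    return (func(t + h) - func(t - h)) / (2 * h)


for t in (0.0, 0.5, 1.3):
    print(t, g_prime_chain_rule(t), central_difference(g, t))
```

At $t = 0$ both sides equal $1$, since $x'(0) = 0$ and the second term reduces to $(-1)^2 e^0 \cos 0 = 1$.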

In Example 5.1, notice that you could have first substituted for $x$ and $y$ and then computed the derivative of $g(t) = (t^2 - 1)^2 e^{\sin t}$ using the usual rules of differentiation. In fact, when direct substitution is possible, it is usually preferable. In the following example, however, you have no alternative but to use the chain rule, since only the current values and rates of change of the variables are known, not formulas for them as functions of $t$.

Example 5.2
A Case Where the Chain Rule Is Needed

Suppose the production of a firm is modeled by the Cobb-Douglas production function $P(k, l) = 20k^{1/4} l^{3/4}$, where $k$ measures capital (in millions of dollars) and $l$ measures the labor force (in thousands of workers). Suppose that $l = 2$ and $k = 6$, that the labor force is decreasing at the rate of 20 workers per year and that capital is growing at the rate of 400,000 dollars per year. Determine the rate of change of production.
 
 
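Suppose that $g(t) = P(k(t), l(t))$. From the chain rule, we have

$$g'(t) = \frac{\partial P}{\partial k}\,k'(t) + \frac{\partial P}{\partial l}\,l'(t).$$

Notice that $\frac{\partial P}{\partial k} = 5k^{-3/4} l^{3/4}$ and $\frac{\partial P}{\partial l} = 15k^{1/4} l^{-1/4}$. With $l = 2$ and $k = 6$, this gives us $\frac{\partial P}{\partial k}(6, 2) \approx 2.1935$ and $\frac{\partial P}{\partial l}(6, 2) \approx 19.7411$. Since $k$ is measured in millions of dollars and $l$ is measured in thousands of workers, we have $k'(t) = 0.4$ and $l'(t) = -0.02$. From the chain rule, we now have

$$g'(t) \approx 2.1935(0.4) + 19.7411(-0.02) \approx 0.48258.$$

That is, production is increasing at a rate of approximately 0.48 units of production per year.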
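The arithmetic of Example 5.2 can be reproduced in a few lines. This is a sketch for checking the numbers; the helper name `cobb_douglas_partials` is ours, and the rates are converted to the units of $k$ and $l$ exactly as in the text.

```python
def cobb_douglas_partials(k, l):
    """Partial derivatives of P(k, l) = 20 k^(1/4) l^(3/4)."""
    dP_dk = 5 * k**(-0.75) * l**0.75    # 20 * (1/4) * k^(-3/4) * l^(3/4)
    dP_dl = 15 * k**0.25 * l**(-0.25)   # 20 * (3/4) * k^(1/4) * l^(-1/4)
    return dP_dk, dP_dl


k, l = 6.0, 2.0    # capital in millions of dollars, labor in thousands of workers
dk_dt = 0.4        # +400,000 dollars per year = +0.4 million dollars per year
dl_dt = -0.02      # -20 workers per year = -0.02 thousand workers per year

dP_dk, dP_dl = cobb_douglas_partials(k, l)
dP_dt = dP_dk * dk_dt + dP_dl * dl_dt   # chain rule (Theorem 5.1)
print(f"dP/dk = {dP_dk:.4f}, dP/dl = {dP_dl:.4f}, dP/dt = {dP_dt:.5f}")
```

Because the script keeps the partial derivatives unrounded, it reports a rate of about 0.48256 rather than 0.48258; the two values agree to four decimal places, and the difference comes only from rounding $\partial P/\partial k$ and $\partial P/\partial l$ before multiplying.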

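We can easily extend Theorem 5.1 to the case of a function $z = f(x, y)$, where $x$ and $y$ are both functions of the two independent variables $s$ and $t$: $x = x(s, t)$ and $y = y(s, t)$. Notice that if we differentiate with respect to $s$, we treat $t$ as a constant. Applying Theorem 5.1 (while holding $t$ fixed), we have

$$\frac{\partial z}{\partial s} = \frac{\partial z}{\partial x}\,\frac{\partial x}{\partial s} + \frac{\partial z}{\partial y}\,\frac{\partial y}{\partial s}.$$

Similarly, we can find a chain rule for $\frac{\partial z}{\partial t}$. This gives us the following more general form of the chain rule.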

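Theorem 5.2 (Chain Rule)

Suppose that $z = f(x, y)$, where $f$ is a differentiable function of $x$ and $y$ and where $x = x(s, t)$ and $y = y(s, t)$ both have first-order partial derivatives. Then we have the chain rules

$$\frac{\partial z}{\partial s} = \frac{\partial z}{\partial x}\,\frac{\partial x}{\partial s} + \frac{\partial z}{\partial y}\,\frac{\partial y}{\partial s}$$

and

$$\frac{\partial z}{\partial t} = \frac{\partial z}{\partial x}\,\frac{\partial x}{\partial t} + \frac{\partial z}{\partial y}\,\frac{\partial y}{\partial t}.$$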


Observe that the tree diagram shown in the margin serves as a convenient reminder of the chain rules indicated in Theorem 5.2, again by summing the products of the indicated partial derivatives along each path from z to s or t, respectively.

The chain rule is easily extended to functions of three or more variables. You will explore this in the exercises.

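Example 5.3
Using the Chain Rule

Suppose that $f(x, y) = e^{xy}$, $x(u, v) = 3u \sin v$ and $y(u, v) = 4v^2 u$. For $g(u, v) = f(x(u, v), y(u, v))$, find the partial derivatives $\frac{\partial g}{\partial u}$ and $\frac{\partial g}{\partial v}$.

We first compute the partial derivatives $\frac{\partial f}{\partial x} = y e^{xy}$, $\frac{\partial f}{\partial y} = x e^{xy}$, $\frac{\partial x}{\partial u} = 3 \sin v$ and $\frac{\partial y}{\partial u} = 4v^2$. The chain rule (Theorem 5.2) gives us

$$\frac{\partial g}{\partial u} = \frac{\partial f}{\partial x}\,\frac{\partial x}{\partial u} + \frac{\partial f}{\partial y}\,\frac{\partial y}{\partial u} = y e^{xy}(3 \sin v) + x e^{xy}(4v^2).$$

Substituting for $x$ and $y$, we get

$$\frac{\partial g}{\partial u} = 4v^2 u\, e^{12u^2 v^2 \sin v}(3 \sin v) + 3u \sin v\, e^{12u^2 v^2 \sin v}(4v^2).$$

For the partial derivative of $g$ with respect to $v$, we compute $\frac{\partial x}{\partial v} = 3u \cos v$ and $\frac{\partial y}{\partial v} = 8vu$. Here, the chain rule gives us

$$\frac{\partial g}{\partial v} = \frac{\partial f}{\partial x}\,\frac{\partial x}{\partial v} + \frac{\partial f}{\partial y}\,\frac{\partial y}{\partial v} = y e^{xy}(3u \cos v) + x e^{xy}(8vu).$$

Substituting for $x$ and $y$, we have

$$\frac{\partial g}{\partial v} = 4v^2 u\, e^{12u^2 v^2 \sin v}(3u \cos v) + 3u \sin v\, e^{12u^2 v^2 \sin v}(8vu).$$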
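A numerical spot check of both formulas is sketched below. It assumes, as our own choices, the sample point $(u, v) = (0.5, 0.7)$ and central differences with step `h = 1e-6`, and compares the chain-rule partial derivatives from Theorem 5.2 with difference quotients of $g(u, v) = e^{xy}$ written directly in terms of $u$ and $v$.

```python
import math


def g(u, v):
    # g(u, v) = f(x(u, v), y(u, v)) with f(x, y) = e^(xy), x = 3u sin v, y = 4 v^2 u
    x, y = 3 * u * math.sin(v), 4 * v**2 * u
    return math.exp(x * y)


def chain_rule_partials(u, v):
    # Theorem 5.2: g_u = f_x x_u + f_y y_u and g_v = f_x x_v + f_y y_v
    x, y = 3 * u * math.sin(v), 4 * v**2 * u
    f_x = y * math.exp(x * y)
    f_y = x * math.exp(x * y)
    g_u = f_x * (3 * math.sin(v)) + f_y * (4 * v**2)
    g_v = f_x * (3 * u * math.cos(v)) + f_y * (8 * v * u)
    return g_u, g_v


def difference_partials(func, u, v, h=1e-6):
    # Central differences in u (v held fixed) and in v (u held fixed)
    g_u = (func(u + h, v) - func(u - h, v)) / (2 * h)
    g_v = (func(u, v + h) - func(u, v - h)) / (2 * h)
    return g_u, g_v


print(chain_rule_partials(0.5, 0.7))
print(difference_partials(g, 0.5, 0.7))
```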

Once again, it is often simpler to substitute the expressions for x and y first. We leave it as an exercise to show that you get the same derivatives either way. On the other hand, there are many situations in which the chain rules of Theorems 5.1 and 5.2 are indispensable. You will see some of these in the exercises, and we present several important uses next.

Example 5.4
Converting from Rectangular to Polar Coordinates

For a differentiable function $f(x, y)$ with $x = r\cos\theta$ and $y = r\sin\theta$, show that $f_r = f_x \cos\theta + f_y \sin\theta$ and $f_{rr} = f_{xx}\cos^2\theta + 2f_{xy}\cos\theta\sin\theta + f_{yy}\sin^2\theta$.
 
 
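First, notice that $\frac{\partial x}{\partial r} = \cos\theta$ and $\frac{\partial y}{\partial r} = \sin\theta$. From Theorem 5.2, we now have

$$f_r = \frac{\partial f}{\partial r} = \frac{\partial f}{\partial x}\,\frac{\partial x}{\partial r} + \frac{\partial f}{\partial y}\,\frac{\partial y}{\partial r} = f_x \cos\theta + f_y \sin\theta.$$

Be very careful when computing the second partial derivative. Using the expression we have already found for $f_r$ and Theorem 5.2 (applied to $f_x$ and $f_y$, which are themselves functions of $x$ and $y$), and noting that $\cos\theta$ and $\sin\theta$ are constant with respect to $r$, we have

$$\begin{aligned}
f_{rr} = \frac{\partial}{\partial r}\left(f_x \cos\theta + f_y \sin\theta\right)
&= \left(\frac{\partial f_x}{\partial x}\,\frac{\partial x}{\partial r} + \frac{\partial f_x}{\partial y}\,\frac{\partial y}{\partial r}\right)\cos\theta + \left(\frac{\partial f_y}{\partial x}\,\frac{\partial x}{\partial r} + \frac{\partial f_y}{\partial y}\,\frac{\partial y}{\partial r}\right)\sin\theta \\
&= (f_{xx}\cos\theta + f_{xy}\sin\theta)\cos\theta + (f_{yx}\cos\theta + f_{yy}\sin\theta)\sin\theta \\
&= f_{xx}\cos^2\theta + 2f_{xy}\cos\theta\sin\theta + f_{yy}\sin^2\theta,
\end{aligned}$$

as desired. In the last step, we used $f_{yx} = f_{xy}$, which holds when the second-order partial derivatives of $f$ are continuous.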
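To see that the two formulas of Example 5.4 agree with direct differentiation in $r$, the sketch below uses a hypothetical test function $f(x, y) = x^2 y^3$ (our choice, not the text's), whose partial derivatives are easy to write by hand. It evaluates $f_r$ and $f_{rr}$ from the formulas above and compares them with difference quotients in $r$ of $f(r\cos\theta, r\sin\theta)$ at the arbitrary point $r = 1.2$, $\theta = 0.8$.

```python
import math


# Hypothetical test function f(x, y) = x^2 y^3 with hand-computed partials.
def f(x, y):
    return x**2 * y**3


def f_x(x, y):
    return 2 * x * y**3


def f_y(x, y):
    return 3 * x**2 * y**2


def f_xx(x, y):
    return 2 * y**3


def f_xy(x, y):
    return 6 * x * y**2    # equals f_yx, since this polynomial's partials are continuous


def f_yy(x, y):
    return 6 * x**2 * y


def polar_formulas(r, theta):
    # f_r and f_rr from the formulas of Example 5.4
    x, y = r * math.cos(theta), r * math.sin(theta)
    c, s = math.cos(theta), math.sin(theta)
    f_r = f_x(x, y) * c + f_y(x, y) * s
    f_rr = f_xx(x, y) * c**2 + 2 * f_xy(x, y) * c * s + f_yy(x, y) * s**2
    return f_r, f_rr


def polar_differences(r, theta, h=1e-4):
    # Difference quotients in r of F(r) = f(r cos theta, r sin theta), theta fixed
    def F(rho):
        return f(rho * math.cos(theta), rho * math.sin(theta))
    f_r = (F(r + h) - F(r - h)) / (2 * h)
    f_rr = (F(r + h) - 2 * F(r) + F(r - h)) / h**2
    return f_r, f_rr


print(polar_formulas(1.2, 0.8))
print(polar_differences(1.2, 0.8))
```

For this particular $f$, substituting gives $f(r\cos\theta, r\sin\theta) = r^5\cos^2\theta\sin^3\theta$, so $f_r = 5r^4\cos^2\theta\sin^3\theta$ and $f_{rr} = 20r^3\cos^2\theta\sin^3\theta$, which the formulas reproduce exactly.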
 

Implicit Differentiation

Suppose that the equation $F(x, y) = 0$ defines $y$ implicitly as a function of $x$, say $y = f(x)$. In section 2.8, we saw how to calculate $\frac{dy}{dx}$ in such a case. We can use the chain rule for functions of several variables to obtain an alternative method for calculating this derivative. Moreover, this will give us new insight into when this can be done and, more importantly, it will generalize to functions of several variables defined implicitly by an equation.

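We let $z = F(x, y)$, where $x = t$ and $y = f(t)$. From Theorem 5.1, we have

$$\frac{dz}{dt} = F_x\,\frac{dx}{dt} + F_y\,\frac{dy}{dt}.$$

But, since $z = F(x, y) = 0$, we have $\frac{dz}{dt} = 0$, too. Further, since $x = t$, we have $\frac{dx}{dt} = 1$ and $\frac{dy}{dt} = \frac{dy}{dx}$. This leaves us with

$$0 = F_x + F_y\,\frac{dy}{dx}.$$

Notice that we can solve this for $\frac{dy}{dx}$, provided $F_y \neq 0$. In this case, we have

$$\frac{dy}{dx} = -\frac{F_x}{F_y}.$$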
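Here is a small sketch of the formula $\frac{dy}{dx} = -F_x/F_y$ on a hypothetical example of our own, the circle $x^2 + y^2 = 25$ near the point $(3, 4)$, where the answer can also be obtained by solving for $y = \sqrt{25 - x^2}$ and differentiating directly. The function names are illustrative only.

```python
import math


# Hypothetical example: the circle F(x, y) = x^2 + y^2 - 25 = 0 near the point (3, 4).
def F_x(x, y):
    return 2 * x


def F_y(x, y):
    return 2 * y


def implicit_slope(x, y):
    # dy/dx = -F_x / F_y, which requires F_y(x, y) != 0
    fy = F_y(x, y)
    if fy == 0:
        raise ValueError("F_y = 0 here, so the formula does not apply")
    return -F_x(x, y) / fy


def explicit_slope(x):
    # Derivative of the upper semicircle y = sqrt(25 - x^2), computed directly
    return -x / math.sqrt(25 - x**2)


print(implicit_slope(3.0, 4.0), explicit_slope(3.0))   # -0.75 -0.75
```

At $(5, 0)$, where $F_y = 0$, the circle has a vertical tangent and no function $y = f(x)$ describes it near that point, which is why the sketch refuses to apply the formula there.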
We already know how to calculate $\frac{dy}{dx}$ implicitly, so this doesn't appear to give us anything new. However, the Implicit Function Theorem (proved in a course in advanced calculus) says that if $F_x$ and $F_y$ are continuous on an open disk containing a point $(a, b)$ where $F(a, b) = 0$ and $F_y(a, b) \neq 0$, then the equation $F(x, y) = 0$ implicitly defines $y$ as a function of $x$ near the point $(a, b)$.

More significantly, we can extend this notion to functions of several variables defined implicitly, as follows. Suppose that the equation $F(x, y, z) = 0$ implicitly defines a function $z = f(x, y)$, where $f$ is differentiable. Then we can find the partial derivatives $f_x$ and $f_y$ using the chain rule. We first let $w = F(x, y, z)$. From the chain rule, we have

$$\frac{\partial w}{\partial x} = F_x\,\frac{\partial x}{\partial x} + F_y\,\frac{\partial y}{\partial x} + F_z\,\frac{\partial z}{\partial x}.$$
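Notice that since $w = F(x, y, z) = 0$, we have $\frac{\partial w}{\partial x} = 0$. Also, $\frac{\partial x}{\partial x} = 1$ and $\frac{\partial y}{\partial x} = 0$, since $x$ and $y$ are independent variables. This gives us

$$0 = F_x + F_z\,\frac{\partial z}{\partial x}.$$

We can solve this for $\frac{\partial z}{\partial x}$, as long as $F_z \neq 0$, to obtain

$$\frac{\partial z}{\partial x} = -\frac{F_x}{F_z}. \tag{5.2}$$

Likewise, differentiating $w$ with respect to $y$ leads us to

$$\frac{\partial z}{\partial y} = -\frac{F_y}{F_z}, \tag{5.3}$$

again, as long as $F_z \neq 0$. Much as in the two-variable case, the Implicit Function Theorem for functions of three variables says that if $F_x$, $F_y$ and $F_z$ are continuous inside a sphere containing a point $(a, b, c)$, where $F(a, b, c) = 0$ and $F_z(a, b, c) \neq 0$, then the equation $F(x, y, z) = 0$ implicitly defines $z$ as a function of $x$ and $y$ near the point $(a, b, c)$.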

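Example 5.5
Finding Partial Derivatives Implicitly

Find $\frac{\partial z}{\partial x}$ and $\frac{\partial z}{\partial y}$, given that $F(x, y, z) = xy^2 + z^3 + \sin(xyz) = 0$.

First, note that using the usual chain rule, we have

$$F_x = y^2 + yz\cos(xyz), \qquad F_y = 2xy + xz\cos(xyz)$$

and

$$F_z = 3z^2 + xy\cos(xyz).$$

From (5.2), we now have

$$\frac{\partial z}{\partial x} = -\frac{y^2 + yz\cos(xyz)}{3z^2 + xy\cos(xyz)}.$$

Likewise, from (5.3), we have

$$\frac{\partial z}{\partial y} = -\frac{2xy + xz\cos(xyz)}{3z^2 + xy\cos(xyz)}.$$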
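These formulas can be checked numerically at a point on the surface. In the sketch below (an illustration, not the text's method), Newton's method is an added tool we use to solve $F(x, y, z) = 0$ for $z$ with $x = y = 1$, starting from the hypothetical guess $z = -0.7$; formulas (5.2) and (5.3) are then compared with difference quotients of that numerically defined $z(x, y)$. The names `solve_z` and `z0`, the tolerance and the step `h` are our own choices.

```python
import math


def F(x, y, z):
    return x * y**2 + z**3 + math.sin(x * y * z)


def F_x(x, y, z):
    return y**2 + y * z * math.cos(x * y * z)


def F_y(x, y, z):
    return 2 * x * y + x * z * math.cos(x * y * z)


def F_z(x, y, z):
    return 3 * z**2 + x * y * math.cos(x * y * z)


def solve_z(x, y, z0=-0.7, tol=1e-14, max_iter=50):
    # Newton's method in z for F(x, y, z) = 0, holding x and y fixed
    z = z0
    for _ in range(max_iter):
        step = F(x, y, z) / F_z(x, y, z)
        z -= step
        if abs(step) < tol:
            break
    return z


x, y = 1.0, 1.0
z = solve_z(x, y)                         # a point (1, 1, z) on the surface
dz_dx = -F_x(x, y, z) / F_z(x, y, z)      # formula (5.2)
dz_dy = -F_y(x, y, z) / F_z(x, y, z)      # formula (5.3)

h = 1e-5
dz_dx_fd = (solve_z(x + h, y) - solve_z(x - h, y)) / (2 * h)
dz_dy_fd = (solve_z(x, y + h) - solve_z(x, y - h)) / (2 * h)
print(z, dz_dx, dz_dx_fd, dz_dy, dz_dy_fd)
```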

Notice that, much like implicit differentiation with two variables, implicit differentiation with three variables yields expressions for the derivatives that depend on all three variables.


© 2002 McGraw-Hill Companies, Inc.