Multivariable Calculus: Differentiation
Reading time: ~45 min
Differentiating a single-variable function involves answering the question: near a given point, how much does the value of the function change per unit change in the input? In the higher-dimensional setting, the question must be made more specific, since the change in output depends not only on how much the input is changed but also on the direction of the change in input.
Consider, for example, the function f(x,y) which returns the altitude of the point on earth with latitude x and longitude y. If (x,y) identifies a point on a sloping hillside, then there are some directions in which f increases, others in which f decreases, and two directions in which f neither increases nor decreases (these are the directions along the hill's contour lines, as you would see represented on a map).
Partial derivatives
The simplest directions for inquiring about the instantaneous rate of change of f are those along the axes: the partial derivative \frac{\partial f}{\partial x}(x_0, y_0) of a function f(x,y) at a point (x_0, y_0) is the slope of the graph of f in the x-direction at the point (x_0, y_0). In other words, it's the slope of the intersection of the graph of f with the plane y = y_0. The partial derivative \frac{\partial f}{\partial x}(x_0, y_0) may also be denoted f_x(x_0, y_0).
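As a concrete illustration (with a function of our own choosing, not one from the text above), take f(x,y) = x^2 y + \sin y. Holding y fixed and differentiating in x, and then holding x fixed and differentiating in y, gives
\begin{align*}\frac{\partial f}{\partial x}(x,y) = 2xy, \qquad \frac{\partial f}{\partial y}(x,y) = x^2 + \cos y,\end{align*}
so, for instance, f_x(1,0) = 0 and f_y(1,0) = 2.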
Exercise Consider the function f(x,y) whose graph is shown. Determine the sign of f_x and the sign of f_y at the marked point.
Solution. If we increase x a little while holding y constant, then f decreases. Therefore, f_x < 0 at that point. If we increase y a little while holding x constant, then f increases. Therefore, f_y > 0 at that point.
Graphically, the partial derivative with respect to x at a point is equal to the slope of the trace of the graph in the "y = constant" plane passing through that point. Similarly, the partial derivative with respect to y at a point is equal to the slope of the trace of the graph in the "x = constant" plane passing through that point.
We can partial-differentiate multiple times, and it turns out that, as long as the resulting partial derivatives are continuous, the order in which we apply these partial differentiation operations doesn't matter. This fact is called Clairaut's theorem.
Exercise Choose a smooth function f(x,y) and show that differentiating with respect to x and then with respect to y gives the same result as differentiating with respect to y and then with respect to x.
Solution. Compute the partial derivative of f with respect to x and differentiate the result with respect to y; then compute the partial derivative of f with respect to y and differentiate the result with respect to x. For any smooth choice of f the two expressions agree, so the conclusion of Clairaut's theorem is satisfied. A worked instance follows.
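To make this concrete with one illustrative choice (any smooth function works equally well), take f(x,y) = x^2 y^3:
\begin{align*}f_x = 2xy^3, \quad (f_x)_y = 6xy^2, \qquad f_y = 3x^2y^2, \quad (f_y)_x = 6xy^2,\end{align*}
so the two mixed partial derivatives agree, as Clairaut's theorem guarantees.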
Differentiability
A single-variable function f is differentiable at a point a if and only if its graph looks increasingly like that of a non-vertical line when zoomed increasingly far in. In other words, f is differentiable at a if and only if there's a linear function L(x) = f(a) + m(x - a) such that \frac{f(x) - L(x)}{x - a} goes to 0 as x \to a.
Likewise, a function f(x,y) of two variables is said to be differentiable at a point (a,b) if its graph looks like a plane when you zoom in sufficiently around the point; that is, f is differentiable at (a,b) if
\begin{align*}\frac{f(x,y) - L(x,y)}{\sqrt{(x-a)^2 + (y-b)^2}} \to 0 \quad \text{as } (x,y) \to (a,b), \qquad \text{where } L(x,y) = c_0 + c_1(x - a) + c_2(y - b),\end{align*}
for some real numbers c_0, c_1, and c_2. If such a linear function L exists, then its coefficients are necessarily c_0 = f(a,b), c_1 = f_x(a,b), and c_2 = f_y(a,b).
So, the equation of the plane tangent to the graph of a differentiable function f at the point (a, b, f(a,b)) is given by
\begin{align*}z = f(a,b) + f_x(a,b)(x - a) + f_y(a,b)(y - b).\end{align*}
This equation says how f behaves for values of (x,y) very close to (a,b): the output changes by the x-change x - a times f's sensitivity to changes in x (namely f_x(a,b)) plus the y-change y - b times f's sensitivity to changes in y (namely f_y(a,b)).
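As an illustration (the function here is our own choice), take f(x,y) = x^2 + y^2 at the point (1,1). Then f(1,1) = 2, f_x(1,1) = 2, and f_y(1,1) = 2, so the tangent plane is
\begin{align*}z = 2 + 2(x - 1) + 2(y - 1).\end{align*}
Moving from (1,1) to (1.1, 1.02) changes the tangent-plane height by 2(0.1) + 2(0.02) = 0.24, close to the actual change f(1.1, 1.02) - f(1,1) = 0.2504.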
Gradient
Once we know how a differentiable function changes in the coordinate-axis directions, we can use the tangent plane formula above to succinctly express how it changes in any direction: we form the gradient \nabla f = [f_x, f_y] of f by putting all of the partial derivatives of f together into a vector. Then, for any unit vector \mathbf{u}, the rate of change of f in the direction \mathbf{u} is equal to \nabla f \cdot \mathbf{u}.
Since \nabla f \cdot \mathbf{u} = |\nabla f| \cos\theta, where \theta is the angle between \nabla f and \mathbf{u}, the direction of the gradient is the direction in which f increases most rapidly. The direction opposite to the gradient is the direction of maximum decrease, and the directions orthogonal to these are the ones in which f is constant to first order.
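For example, with an illustrative function f(x,y) = x^2 y, we have \nabla f(x,y) = [2xy, x^2], so \nabla f(1,2) = [4, 1]. The rate of change of f at (1,2) in the direction of the unit vector \mathbf{u} = [3/5, 4/5] is
\begin{align*}\nabla f(1,2) \cdot \mathbf{u} = 4 \cdot \tfrac{3}{5} + 1 \cdot \tfrac{4}{5} = \tfrac{16}{5},\end{align*}
and the direction of fastest increase at (1,2) is \nabla f(1,2)/|\nabla f(1,2)| = [4, 1]/\sqrt{17}.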
Exercise Suppose that f is a differentiable function at the point (a,b) and that its instantaneous rates of change in the directions \mathbf{u} and \mathbf{v} are known. Show that if \mathbf{u} and \mathbf{v} are not parallel, then it is always possible to infer f's rates of change in the coordinate-axis directions.
Solution. The problem stipulates that we are given equations of the form
\begin{align*}u_1 f_x + u_2 f_y &= c_1 \\ v_1 f_x + v_2 f_y &= c_2\end{align*}
for some numbers c_1 and c_2, where \mathbf{u} = [u_1, u_2] and \mathbf{v} = [v_1, v_2]. This system may be written in matrix form as
\begin{align*}\begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix}\begin{bmatrix} f_x \\ f_y \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}.\end{align*}
Since \mathbf{u} and \mathbf{v} are not parallel, they span \mathbb{R}^2. Therefore, the matrix is invertible, and the solution of the system is
\begin{align*}\begin{bmatrix} f_x \\ f_y \end{bmatrix} = \begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix}^{-1}\begin{bmatrix} c_1 \\ c_2 \end{bmatrix}.\end{align*}
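As a quick numerical sketch of this computation (the direction vectors and measured rates below are made up for illustration):

```python
import numpy as np

# Illustrative data: two non-parallel unit directions and the measured
# rates of change of f in those directions.
u = np.array([1.0, 0.0])                 # direction u
v = np.array([1.0, 1.0]) / np.sqrt(2)    # direction v
c = np.array([3.0, np.sqrt(2.0)])        # rates of change along u and v

# Each rate equals grad(f) . direction, so stacking the directions as
# rows gives a linear system for [f_x, f_y].
A = np.vstack([u, v])
grad = np.linalg.solve(A, c)

print(grad)  # recovers [f_x, f_y] = [3., -1.]
```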
Exercise Consider a differentiable function f from \mathbb{R}^2 to \mathbb{R} and a point where its gradient is nonzero. The number of directions in which f increases maximally from that point is ____. The number of directions in which f decreases maximally from that point is ____. The number of directions in which f remains approximately constant is ____.
Solution. f increases maximally in the direction of its gradient and decreases maximally in the opposite direction. It remains approximately constant in the two directions orthogonal to its gradient, so the answers are 1, 1, and 2.
Exercise Consider a differentiable function f from \mathbb{R}^3 to \mathbb{R} and a point where its gradient is nonzero. The number of directions in which f increases maximally from that point is ____. The number of directions in which f decreases maximally from that point is ____. The number of directions in which f remains approximately constant is ____.
Solution. f increases maximally in the direction of its gradient and decreases maximally in the opposite direction. It remains approximately constant in the plane of directions orthogonal to its gradient. Since a plane contains infinitely many directions, the number of directions in which f remains approximately constant is infinite.
Second-order differentiation
We can take the notion of a gradient, which measures the linear change of a function, up a degree. The Hessian of a function f: \mathbb{R}^n \to \mathbb{R} is defined to be the matrix
\begin{align*}\mathbf{H}(\mathbf{x}) = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix},\end{align*}
and the quadratic approximation of f at the origin is
\begin{align*}f(\mathbf{x}) \approx f(\mathbf{0}) + \nabla f(\mathbf{0}) \cdot \mathbf{x} + \tfrac{1}{2}\mathbf{x}' \mathbf{H}(\mathbf{0})\, \mathbf{x}.\end{align*}
The same is true at points \mathbf{a} other than the origin if we evaluate the gradient and Hessian at \mathbf{a} instead of \mathbf{0} and if we replace \mathbf{x} with \mathbf{x}-\mathbf{a}.
Exercise Suppose that a, b, c, d, e, and g are real numbers and that f(x,y) = a + bx + cy + dx^2 + exy + gy^2. Show that the quadratic approximation of f at the origin is equal to f.
Solution. The gradient of f evaluated at the origin is [b, c], so the linear approximation of f is
\begin{align*}f(0,0) + f_x(0,0)\,x + f_y(0,0)\,y = a + bx + cy.\end{align*}
The Hessian is \begin{bmatrix} 2d & e \\ e & 2g \end{bmatrix}, so the quadratic terms in the quadratic approximation are
\begin{align*}\tfrac{1}{2}\begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} 2d & e \\ e & 2g \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = dx^2 + exy + gy^2.\end{align*}
Adding the quadratic terms to the linear approximation gives a + bx + cy + dx^2 + exy + gy^2, which is exactly f(x,y).
We can combine the ideas of quadratic approximation and diagonalization to gain sharp insight into the shape of a function's graph at a point where the gradient is zero. Since the Hessian matrix H is symmetric by Clairaut's theorem, the spectral theorem implies that it is orthogonally diagonalizable.
With VDV' as the diagonalization of H, where V is orthogonal and D is the diagonal matrix of eigenvalues \lambda_1, \ldots, \lambda_n, the quadratic term in the quadratic approximation becomes
\begin{align*}\tfrac{1}{2}\mathbf{x}' H \mathbf{x} = \tfrac{1}{2}\mathbf{x}' V D V' \mathbf{x} = \tfrac{1}{2}(V'\mathbf{x})'\, D\, (V'\mathbf{x}).\end{align*}
Since the components of V'\mathbf{x} are the coordinates of \mathbf{x} with respect to the basis given by the columns \mathbf{v}_1, \ldots, \mathbf{v}_n of V, the quadratic term may be written as
\begin{align*}\tfrac{1}{2}(\lambda_1\tilde{x}_1^2 + \lambda_2\tilde{x}_2^2 + \cdots + \lambda_n\tilde{x}_n^2),\end{align*}
where [\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n] is the vector of coordinates of [x_1, x_2, \ldots, x_n] with respect to the basis given by the columns of V.
Writing the quadratic approximation of f in the form \frac{1}{2}(\lambda_1\tilde{x}_1^2 + \lambda_2\tilde{x}_2^2 + \cdots + \lambda_n\tilde{x}_n^2) is powerful because it presents the changes in f as a sum of n separate changes, each of which is as simple as the parabola y = ax^2.
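For instance, with a Hessian chosen purely for illustration, H = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} has eigenvalues \lambda_1 = 3 and \lambda_2 = 1 with orthonormal eigenvectors \mathbf{v}_1 = \tfrac{1}{\sqrt{2}}[1,1] and \mathbf{v}_2 = \tfrac{1}{\sqrt{2}}[1,-1], so in the rotated coordinates the quadratic term becomes
\begin{align*}\tfrac{1}{2}\mathbf{x}' H \mathbf{x} = \tfrac{3}{2}\tilde{x}_1^2 + \tfrac{1}{2}\tilde{x}_2^2,\end{align*}
an upward-opening bowl that rises three times as steeply along \mathbf{v}_1 as along \mathbf{v}_2.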
If \lambda_1 is positive, then the graph of f is shaped like an upward-opening parabola along the \mathbf{v}_1 axis. If it's negative, then the graph of f is shaped like a downward-opening parabola along that axis.
Exercise Consider a point (x_1, \ldots, x_n) where f has zero gradient and a Hessian with eigenvalues \lambda_1, \ldots, \lambda_n.
If all of the eigenvalues are positive, then f is ____ at (x_1, \ldots, x_n) than at nearby points.
If all of the eigenvalues are negative, then f is ____ at (x_1, \ldots, x_n) than at nearby points.
If some eigenvalues are positive and some are negative, then f increases as you move away from (x_1, \ldots, x_n) in some directions and ____ in other directions.
In addition to helping distinguish local minima, local maxima, and saddle points, the diagonalized Hessian can also help us recognize ravines in the graph of f. This idea arises in the context of numerical optimization methods for deep learning.
Exercise Suppose that f:\mathbb{R}^2 \to \mathbb{R} has zero gradient at a given point, and suppose that its Hessian matrix at that point has eigenvalues \lambda_1 and \lambda_2. How can you recognize, based on the values of \lambda_1 and \lambda_2, whether the graph of f is ravine-shaped?
Solution. If \lambda_1 and \lambda_2 are both positive, with one close to zero and the other very large, then the graph of f will be ravine-shaped. That's because the steep increase in one direction corresponds to one of the eigenvalues being very large, and the shallow increase in the orthogonal direction is indicated by the other eigenvalue being very small.
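As a small numerical sketch of that eigenvalue test (the Hessian below is made up for illustration), one can compute the eigenvalues with NumPy and compare their signs and magnitudes:

```python
import numpy as np

# Hessian at a critical point, chosen so the graph is steep in one
# direction and nearly flat in the orthogonal one (a ravine).
H = np.array([[100.0, 0.0],
              [  0.0, 0.1]])

# H is symmetric, so eigh returns real eigenvalues and an orthonormal
# basis of eigenvectors.
eigenvalues, eigenvectors = np.linalg.eigh(H)

if np.all(eigenvalues > 0):
    kind = "local minimum"
elif np.all(eigenvalues < 0):
    kind = "local maximum"
else:
    kind = "saddle point"

# A large ratio between the largest and smallest positive eigenvalue
# signals a ravine-shaped graph near the point.
ravine = np.all(eigenvalues > 0) and eigenvalues.max() / eigenvalues.min() > 100

print(kind, eigenvalues, "ravine-shaped:", ravine)
```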