# StatisticsPoint Estimation

In the previous section, we discussed the problem of estimating a distribution given a list of independent observations from it. Now we turn to the simpler task of **point estimation**: estimating a single real-valued feature (such as the mean, variance, or maximum) of a distribution. We begin by formalizing the notion of a real-valued feature of a distribution.

**Definition** (Statistical functional)

A **statistical functional** is any function from the set of distributions to .

For example, if we define to be the mean of the distribution , then is a statistical functional. Similarly, consider the *maximum* functional where is the CDF of . To give a more complicated example, we can define to be the expected value of the difference between the greatest and least of 10 independent random variables with common distribution . Then also a statistical functional.

Given a statistical functional, our goal will be to use a list of independent observations from to estimate .

**Definition** (Estimator)

An **estimator** is a random variable which is a function of i.i.d. random variables.

For example, the random variable is an estimator, if are independent and identically distributed. Let's develop a general approach for defining estimators. We begin by observing a large independent sample from a distribution gives us direct information about the CDF of the distribution.

**Example**

Draw 500 independent observations from an exponential distribution with parameter 1. Plot the function which maps to the proportion of observations at or to the left of on the number line. We call the **empirical CDF**. Compare the graph of the empirical CDF to the graph of the CDF of the exponential distribution with parameter 1.

*Solution.* We can graph using a step plot:

using Plots, Distributions n = 500 xs = range(0, 8, length=100) plot(xs, x-> 1-exp(-x), label = "true CDF", legend = :bottomright) plot!(sort(rand(Exponential(1),n)), (1:n)/n, seriestype = :steppre, label = "empirical CDF")

n <- 500 xvals = seq(0,8,length=100) ggplot() + geom_line(aes(x=xvals,y=1-exp(-xvals))) + geom_step(aes(x=sort(rexp(n)),y=(1:n)/n))

This example suggests an idea for estimating : since the unknown distribution is typically close to the measure which places mass at each of the observed observations, we can build an estimator of by plugging into .

**Definition** (Plug-in estimator)

The plug-in estimator of is .

**Example**

Find the plug-in estimator of the mean of a distribution. Find the plug-in estimator of the variance.

*Solution.* The plug-in estimator of the mean is the mean of the empirical distribution, which is the average of the locations of the observations. We call this the **sample mean**:

Likewise, the plug-in estimator of the variance is **sample variance**

Ideally, an estimator *mean* of

### Bias

**Definition** (Bias)

The **bias** of an estimator

An estimator is said to be **biased** if its bias is nonzero and **unbiased** if its bias is zero.

**Example**

Consider the estimator

of the maximum functional. Assuming that the distribution is described by a density function (in other words, it's a continuous rather than a discrete random variable), show that

*Solution.* If

We can numerically experiment to approximate the bias of this estimator in a specific instance. For example, if we estimate the maximum of a uniform distribution on

using Statistics mean(maximum(rand() for _ in 1:100) - 1 for _ in 1:10_000)

which is about -0.0098. We can visualize these sample maximum estimates with a histogram:

using Plots histogram([maximum(rand() for _ in 1:100) for _ in 1:10_000], label = "sample maximum", xlims = (0,1), legend = :topleft)

### Standard Error

Zero or small bias is a desirable property of an estimator: it means that the estimator is accurate *on average*. The second desirable property of an estimator is for the probability mass of its distribution to be concentrated near its mean:

**Definition** (Standard error)

The standard error

**Example**

Find the standard error of the sample mean if the distribution

*Solution.* We have

Therefore, the standard error is

We can see how the standard error decreases with

n = 100 histogram([mean(rand() for _ in 1:n) for _ in 1:10_000], label = "sample mean, $n observations", xlims = (0,1), size = (600,400))

### Mean Squared Error

If the expectation of an estimator of

**Definition** (Mean squared error)

The mean squared error of an estimator

As advertised, the mean squared error decomposes as a sum of *squared bias* and *squared standard error*:

**Theorem**

The mean squared error of an estimator

*Proof.* The idea is to add and subtract the mean of

The middle term is zero by

If the bias and standard error of an estimator both converge to 0, then the estimator is *consistent*:

**Definition** (Consistent)

An estimator is **consistent** if

**Example**

Show that the plug-in maximum estimator

*Solution.* The probability that

This converges to 0 as

The figure below summarizes the four possibilities for combinations of high or low bias and variance.

**Example**

Show that the sample variance

*Solution.* We will perform the calculation for

Squaring out each trinomial, we get

Recalling that

If we repeat the above calculation with

Motivated by this example, we define the **unbiased sample variance**

**Exercise**

Let's revisit the adult height distribution from the first section. We observed the human adult heights shown below (in inches). If we want to approximate the height distribution with a Gaussian, it seems reasonable to estimate μ and σ² using the unbiased estimators

Calculate these estimators for the height data.

heights = [71.54, 66.62, 64.11, 62.72, 68.12, 69.07, 64.82, 61.92, 68.45, 66.3, 66.99, 62.2, 61.04, 63.31, 68.94, 66.27, 66.8, 71.7, 68.93, 66.65, 71.97, 60.27, 62.81, 70.64, 71.61, 65.51, 63.1, 66.21, 68.23, 72.32, 62.29, 63.12, 64.94, 71.89, 65.48, 63.66, 56.11, 65.63, 61.26, 65.12, 66.93, 68.51, 67.2, 71.57, 66.65, 59.77, 61.51, 63.25, 69.12, 64.98]

*Solution.* Julia has built-in functions for this:

mean(heights), var(heights)

We could also write our own:

μ = sum(heights)/length(heights) σ̂² = 1/(length(heights)-1) * sum((h-μ)^2 for h in heights)

In the next section, we will develop an important extension of point estimation which supplies additional information about how accurate we expect our point estimate to be.