Given a probability space $(\Omega, \mathbb{P})$ and a random variable $X$, the distribution of $X$ tells us how $X$ distributes probability mass on the real number line. Loosely speaking, the distribution tells us where we can expect to find $X$ and with what probabilities.
Definition (Distribution of a random variable)
The distribution (or law) of a random variable $X$ is the probability measure on $\mathbb{R}$ which maps a set $A \subset \mathbb{R}$ to $\mathbb{P}(X \in A)$.
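Suppose that $X$ represents the amount of money you're going to win with the lottery ticket you just bought. Suppose that $\nu$ is the law of $X$. Then $\nu(A) = \mathbb{P}(X \in A)$ for each set $A \subset \mathbb{R}$; for example, $\nu((0, \infty)) = \mathbb{P}(X > 0)$ is the probability that the ticket wins a positive amount.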
We can think of $X$ as pushing forward the probability mass from $\Omega$ to $\mathbb{R}$ by sending the probability mass at $\omega$ to $X(\omega)$ for each $\omega \in \Omega$. The probability masses at multiple $\omega$'s can stack up at the same point on the real line if $X$ maps the $\omega$'s to the same value.
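The push-forward description can be sketched in a few lines of Python for a finite sample space. This is only an illustration under stated assumptions: the helper name `law`, the fair-die sample space, and the parity random variable are all hypothetical and are not taken from the text.

```python
from collections import defaultdict
from fractions import Fraction


def law(prob, X):
    """Push the mass prob[omega] forward to the point X(omega).

    prob: dict mapping each outcome omega to its probability mass.
    X:    function from outcomes to real numbers (the random variable).
    """
    nu = defaultdict(Fraction)  # Fraction() is 0, so masses accumulate from 0
    for omega, mass in prob.items():
        nu[X(omega)] += mass  # masses from several omegas can stack up here
    return dict(nu)


# Hypothetical example: one roll of a fair die.
prob = {omega: Fraction(1, 6) for omega in range(1, 7)}


def is_even(omega):
    """Random variable equal to 1 if the roll is even and 0 otherwise."""
    return 1 if omega % 2 == 0 else 0


parity_law = law(prob, is_even)
```

Here three outcomes are sent to 0 and three to 1, so the law places mass 1/2 at each of those two points.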
A problem on a test requires students to match molecule diagrams to their appropriate labels. Suppose there are three labels and three diagrams and that a student guesses a matching uniformly at random. Let $X$ denote the number of diagrams the student correctly labels. What is the probability mass function of the distribution of $X$?
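Solution. The number of correctly labeled diagrams is an integer between 0 and 3 inclusive. Suppose the labels are $A$, $B$, and $C$, and suppose the correct labeling sequence is $ABC$ (the final result would be the same regardless of the correct labeling sequence). The sample space consists of all six possible labeling sequences, and each of them is equally likely since the student applies the labels uniformly at random. So we have
$$X(ABC) = 3, \quad X(ACB) = 1, \quad X(BAC) = 1, \quad X(BCA) = 0, \quad X(CAB) = 0, \quad X(CBA) = 1.$$
The probability mass function of the distribution of $X$ is therefore
$$\mathbb{P}(X = 0) = \tfrac{2}{6} = \tfrac{1}{3}, \quad \mathbb{P}(X = 1) = \tfrac{3}{6} = \tfrac{1}{2}, \quad \mathbb{P}(X = 2) = 0, \quad \mathbb{P}(X = 3) = \tfrac{1}{6}.$$
All together, we have
$$\mathbb{P}(X = k) = \begin{cases} 1/3 & \text{if } k = 0 \\ 1/2 & \text{if } k = 1 \\ 1/6 & \text{if } k = 3 \\ 0 & \text{otherwise.} \end{cases}$$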
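The counting argument in this solution can be checked by brute-force enumeration. The sketch below assumes the labels are named A, B, and C and that the correct sequence is ABC; these names are illustrative, and any other choice gives the same probabilities.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

# Assumed label names; the correct labeling sequence is taken to be A, B, C.
correct = ("A", "B", "C")

# The sample space: all six equally likely labeling sequences.
outcomes = list(permutations(correct))


def num_correct(guess):
    """Number of positions where the guessed label matches the correct one."""
    return sum(g == c for g, c in zip(guess, correct))


# Count how many outcomes give each value, then divide by the number of outcomes.
counts = Counter(num_correct(guess) for guess in outcomes)
pmf = {k: Fraction(counts[k], len(outcomes)) for k in range(4)}
```

The value 2 receives no mass: if two diagrams are labeled correctly, the third label is forced to be correct as well.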
Cumulative distribution function
The distribution of a random variable $X$ may be specified by its probability mass function or by its cumulative distribution function $F_X$:
Definition (Cumulative distribution function)
If $X$ is a random variable, then its cumulative distribution function $F_X$ is the function from $\mathbb{R}$ to $[0, 1]$ defined by $F_X(x) = \mathbb{P}(X \leq x)$.
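For a discrete random variable, the definition amounts to summing the probability mass at all points less than or equal to the argument. A minimal sketch, assuming the distribution is given as a dictionary from values to masses; the helper name `cdf_from_pmf` and the example mass function are hypothetical.

```python
from fractions import Fraction


def cdf_from_pmf(pmf):
    """Return the function F with F(x) = total mass at values <= x."""
    def F(x):
        return sum((mass for value, mass in pmf.items() if value <= x),
                   Fraction(0))
    return F


# Hypothetical example: mass 1/4 at 0, 1/2 at 1, and 1/4 at 2.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
F = cdf_from_pmf(pmf)

# F is a step function: it jumps by the mass at each value and is flat between.
values = [F(-1), F(0), F(0.5), F(1), F(2), F(10)]
```

The resulting function is nondecreasing, equals 0 to the left of all the mass, and equals 1 to the right of it.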
Consider a random variable $X$ whose distribution is as shown in the figure above. Select the true statements.
Solution. The first one is true, since the CDF goes from about 0.1 at to about 0.9 at . The difference, about 0.8, is larger than 0.6.
The second one is also true, since there is no probability mass past 2.
The third one is false: there is no probability mass in the interval from to 0.
is equivalent to the probability that is less than , which (reading the graph of the CDF) we see is between and . Therefore, the last one is false.
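Suppose that $X$ is a random variable with CDF $F_X$ and that $Y = X^2$. Express $\mathbb{P}(Y > 9)$ in terms of the function $F_X$. For simplicity, assume that $\mathbb{P}(X = -3) = 0$.
Solution. By definition of $Y$, we have that $Y > 9$ if $X > 3$ or $X < -3$. Since these events are mutually exclusive, we have
$$\mathbb{P}(Y > 9) = \mathbb{P}(X > 3) + \mathbb{P}(X < -3) = 1 - F_X(3) + F_X(-3),$$
where the last step follows since $\mathbb{P}(X < -3) = \mathbb{P}(X \leq -3)$ for this random variable $X$.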
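As a sanity check on this kind of calculation, the sketch below takes the illustrative case $Y = X^2$ with threshold 9 and compares the probability computed directly against the expression $1 - F_X(3) + F_X(-3)$. The distribution of $X$ used here is hypothetical, chosen so that it places no mass at $-3$.

```python
from fractions import Fraction

# Hypothetical distribution for X: five equally likely values, none equal to -3.
pmf_X = {v: Fraction(1, 5) for v in (-4, -2, 0, 3, 5)}


def F_X(x):
    """CDF of X: total mass at values <= x."""
    return sum((m for v, m in pmf_X.items() if v <= x), Fraction(0))


# Direct computation: add up the mass at every value v of X with v**2 > 9.
direct = sum((m for v, m in pmf_X.items() if v ** 2 > 9), Fraction(0))

# Computation using only the CDF of X.
via_cdf = 1 - F_X(3) + F_X(-3)
```

Both computations give 2/5 here. If the distribution did place mass at $-3$, the CDF expression would overcount by exactly that mass, which is why the simplifying assumption is needed.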
Random variables with the same cumulative distribution function are not necessarily equal as random variables, because the probability mass sitting at each point on the real line can come from different $\omega$'s.
For example, consider the two-fair-coin-flip experiment and let $X$ be the number of heads. Find another random variable $Y$ which is not equal to $X$ but which has the same distribution as $X$.
Solution. If we define $Y$ to be the number of tails, then it's clear from symmetry that it has the same distribution as $X$. Furthermore, $X$ and $Y$ are unequal as random variables because if $X(\omega) = 2$, then $Y(\omega) = 0$ (and vice versa).
(In fact, we can express $Y$ in terms of $X$ as $Y = 2 - X$.)
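The difference between equality of random variables and equality of distributions can be made concrete by enumerating the four outcomes of two fair coin flips. This is a small sketch; the names `num_heads`, `num_tails`, and `law` are illustrative choices.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for two fair coin flips: four equally likely outcomes.
sample_space = list(product("HT", repeat=2))


def num_heads(omega):
    return omega.count("H")


def num_tails(omega):
    return omega.count("T")


def law(rv):
    """Distribution of rv under the uniform measure on sample_space."""
    counts = Counter(rv(omega) for omega in sample_space)
    return {value: Fraction(n, len(sample_space)) for value, n in counts.items()}


same_law = law(num_heads) == law(num_tails)

# Outcomes on which the two random variables take different values.
differ_on = [omega for omega in sample_space if num_heads(omega) != num_tails(omega)]

# Heads and tails always add up to the number of flips.
always_sum_to_two = all(num_heads(w) + num_tails(w) == 2 for w in sample_space)
```

The two laws agree, yet the random variables differ on the outcomes with two heads or two tails, so they are not the same function on the sample space.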