Probability
TOC
- Introduction
- Random variable
- Continuous random variable
- Discrete random variable
- Joint probability
- Conditional probability
- Expectation
Introduction
Probability is the branch of mathematics that studies random events and the likelihood of their occurrence. It is used to model situations involving uncertainty or randomness, and it is applied across fields such as statistics, finance, physics, engineering, and computer science. Probability is also central to machine learning and artificial intelligence, where it is used to model uncertainty in data and to make predictions.
Random variable
A random variable x denotes an uncertain quantity. It may be the result of a coin flip or the measurement of a temperature. Each time we observe x, it can take a different value.
In other words: if the experiment is repeated n times and the event A occurs k times, the relative frequency k/n approaches the probability P(A) as n grows.
The random variable can be described by a function f(x) whose domain is the set of all experiment outcomes (so that the total probability sums to one). Recall that a function is a rule of correspondence between x and y: the values of the independent variable x form a set D called the domain, and the values of the dependent variable y = f(x) form the range R of the function. Put differently, we have two sets of numbers D and R; for every x in D we assign a number y = f(x) belonging to R, and we say that f is a function of x. The mapping between x and y can be one-to-one or many-to-one.
There are two types of random variables: discrete and continuous. A discrete variable takes values in a set. This set can be ordered, for example the outcomes of a die roll, ranging from 1 to 6, or unordered, say, the weather outcomes sunny, snowy, rainy, and windy. The set can be finite or infinite, and the probability distribution is best shown as a histogram: each possible outcome has a positive probability, and these probabilities sum to 1. A continuous random variable, on the other hand, takes values in the real domain. The range of values can also be finite or infinite depending on the problem (it may be infinite yet bounded), and the distribution is best shown as the graph of a probability density function (pdf). Each outcome has its own density (propensity), and the integral of the pdf is always 1, in analogy with the discrete case.
Image: the visualization of the probability distribution of discrete and continuous variable
Continuous random variable
Normal (Gaussian) distribution
This is the most popular distribution. We say x is a normal (or Gaussian) random variable with parameters \mu and \sigma^2 if its probability density function is f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x - \mu)^2 / (2\sigma^2)}.
Many natural phenomena follow a Gaussian distribution. As one example, Maxwell arrived at the normal distribution for the velocities of molecules, under the assumption that the probability density of molecules with given velocity components depends only on the velocity magnitude and not on its direction.
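As a quick, illustrative sketch (not part of the original text), the Python snippet below evaluates the Gaussian density for assumed parameters \mu = 1 and \sigma = 2, draws samples with numpy, and numerically checks that the density integrates to one.
```python
# Minimal sketch: the Gaussian pdf and a sampling sanity check (numpy only).
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal random variable with mean mu and standard deviation sigma."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

rng = np.random.default_rng(0)
samples = rng.normal(loc=1.0, scale=2.0, size=100_000)

# The sample mean and standard deviation should be close to mu = 1 and sigma = 2.
print(samples.mean(), samples.std())

# The density integrates to (approximately) one over a wide grid.
grid = np.linspace(-20.0, 20.0, 10_001)
print(np.trapz(normal_pdf(grid, mu=1.0, sigma=2.0), grid))
```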
Exponential distribution
We say x is exponential with parameter \lambda > 0 if its density is f(x) = \lambda e^{-\lambda x} for x \geq 0 and 0 otherwise.
Typical exponentially distributed quantities are the waiting times between events such as phone calls or bus arrivals, given that the occurrences of those events are independent.
Image: The waiting time at bus stop or phone calls, according to exponential distribution assumption
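Below is a hedged simulation sketch of the waiting-time idea, assuming a made-up rate of 0.2 bus arrivals per minute; it also checks the memoryless property that characterizes the exponential distribution.
```python
# Sketch: exponential waiting times between bus arrivals (rate is an assumption).
import numpy as np

lam = 0.2                                  # assumed rate: one bus every 5 minutes on average
rng = np.random.default_rng(1)
waits = rng.exponential(scale=1.0 / lam, size=50_000)   # numpy's scale parameter is 1/lambda

print(waits.mean())                        # should be close to 1/lam = 5 minutes

# Memoryless property: P(wait > s + t | wait > s) equals P(wait > t).
s, t = 3.0, 4.0
lhs = (waits > s + t).mean() / (waits > s).mean()
rhs = (waits > t).mean()
print(lhs, rhs)                            # the two empirical values should be close
```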
Gamma distribution
We say x is a gamma random variable with parameters \alpha > 0 and \beta > 0 if its density is f(x) = \frac{\beta^{\alpha} x^{\alpha - 1} e^{-\beta x}}{\Gamma(\alpha)} for x \geq 0,
with the gamma function \Gamma(\alpha) = \int_0^{\infty} y^{\alpha - 1} e^{-y} \, dy.
Depending on its parameters, the gamma distribution takes on a wide variety of shapes and scales.
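To make this variety of shapes concrete, here is a small sampling sketch with arbitrarily chosen shape values \alpha and a fixed rate \beta = 1; the empirical mean and variance should track \alpha/\beta and \alpha/\beta^2.
```python
# Sketch: gamma samples for a few shape parameters (values chosen for illustration).
import numpy as np

rng = np.random.default_rng(2)
beta = 1.0                                 # rate; numpy's gamma takes scale = 1/rate
for alpha in (0.5, 1.0, 2.0, 5.0):
    x = rng.gamma(shape=alpha, scale=1.0 / beta, size=200_000)
    print(alpha, x.mean(), x.var())        # mean ~ alpha/beta, variance ~ alpha/beta**2
```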
Chi-square distribution
x is said to be a chi-square (\chi^2) random variable with n degrees of freedom if it is gamma distributed with \alpha = n/2 and \beta = 1/2, that is f(x) = \frac{x^{n/2 - 1} e^{-x/2}}{2^{n/2} \Gamma(n/2)} for x \geq 0.
With n = 2, we recover the exponential distribution (with \lambda = 1/2).
Uniform distribution
x is said to be uniformly distributed in the interval (a, b) if f(x) = \frac{1}{b - a} for a < x < b and 0 otherwise.
Image: A uniform distribution
Beta distribution
The random variable x has a beta distribution with positive parameters \alpha and \beta if f(x) = \frac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{B(\alpha, \beta)} for 0 < x < 1,
where the beta function is B(\alpha, \beta) = \int_0^1 t^{\alpha - 1} (1 - t)^{\beta - 1} \, dt = \frac{\Gamma(\alpha) \Gamma(\beta)}{\Gamma(\alpha + \beta)}.
Cauchy distribution
Laplace distribution
Maxwell distribution
Discrete random variable
Bernoulli distribution
The Bernoulli distribution describes any experiment with only two possible outcomes: success or failure (heads or tails). x is said to be Bernoulli distributed if x takes the values 1 and 0 with P(x=1) = p and P(x=0) = q = 1 - p.
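A minimal sketch (with an assumed p = 0.3): simulating Bernoulli trials as Binomial(1, p) draws and checking that the empirical frequency of successes approaches p.
```python
# Sketch: Bernoulli(p) trials via numpy; the empirical mean estimates p.
import numpy as np

p = 0.3                                    # assumed success probability
rng = np.random.default_rng(3)
x = rng.binomial(n=1, p=p, size=100_000)   # a Bernoulli trial is Binomial(1, p)
print(x.mean())                            # should be close to 0.3
```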
Binomial distribution
When we run n independent Bernoulli trials and count the successes, we obtain a binomial random variable. x is said to be a binomial random variable with parameters n and p if x takes the values 0, 1, 2, .., n with P(x=k) = \binom{n}{k} p^k q^{n-k}.
Since the binomial coefficients obey the binomial theorem, \sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k} = (p + q)^n = 1, these probabilities sum to one.
Let
Let’s state the law of large numbers: if an event A with P(A) = p occurs k times in n trials, then for any \epsilon > 0, P(|\frac{k}{n} - p| > \epsilon) \to 0 as n \to \infty; in other words, the relative frequency k/n converges to p.
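The following simulation sketch (with an assumed p = 0.25) shows the relative frequency k/n settling toward p as the number of trials grows.
```python
# Sketch: the law of large numbers in action for an event with probability p.
import numpy as np

p = 0.25                                   # assumed probability of the event A
rng = np.random.default_rng(4)
trials = rng.random(1_000_000) < p         # each entry is True with probability p
running_freq = np.cumsum(trials) / np.arange(1, trials.size + 1)
for n in (10, 100, 10_000, 1_000_000):
    print(n, running_freq[n - 1])          # drifts toward 0.25 as n grows
```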
For the Poisson approximation: if n is large, p is small, and np = \lambda remains moderate, then P(x=k) \approx \frac{e^{-\lambda} \lambda^k}{k!}.
Poisson distribution
The Poisson distribution represents random variables such as the number of telephone calls in a fixed period, the number of winning tickets in a large lottery, and the number of printing errors in a book. The event can be rare, but it does happen. x follows a Poisson distribution with parameter \lambda > 0 if P(x=k) = \frac{e^{-\lambda} \lambda^k}{k!} for k = 0, 1, 2, ..
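To see the Poisson approximation to the binomial mentioned above, here is a short sketch with assumed values n = 1000 and p = 0.003 (so \lambda = 3), comparing the exact binomial pmf with the Poisson pmf.
```python
# Sketch: Binomial(n, p) pmf versus its Poisson(lambda = n*p) approximation.
import math

n, p = 1000, 0.003                         # assumed: many trials, small success probability
lam = n * p

def binom_pmf(k):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k):
    return math.exp(-lam) * lam**k / math.factorial(k)

for k in range(6):
    print(k, round(binom_pmf(k), 5), round(poisson_pmf(k), 5))   # nearly identical values
```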
Geometric distribution
Let x be the number of trials needed to obtain the first success in repeated Bernoulli trials. Then x follows a geometric distribution: P(x=k) = p q^{k-1} for k = 1, 2, ..
The probability of the event (x > m) is P(x > m) = q^m, since the first m trials must all be failures.
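A quick simulation sketch (assuming p = 0.2 and m = 4) checks this tail formula against samples from numpy's geometric generator, which counts the trials up to and including the first success.
```python
# Sketch: empirical check of P(x > m) = q^m for a geometric random variable.
import numpy as np

p, m = 0.2, 4                              # assumed success probability and threshold
q = 1 - p
rng = np.random.default_rng(5)
x = rng.geometric(p=p, size=200_000)       # number of trials until the first success
print((x > m).mean(), q**m)                # the two values should be close (~0.41)
```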
Negative binomial distribution
x follows a negative binomial distribution with parameters r and p if x is the number of trials needed to obtain r successes: P(x=k) = \binom{k-1}{r-1} p^r q^{k-r} for k = r, r+1, ..
Discrete uniform distribution
x takes one of N equally likely values: P(x=k) = \frac{1}{N} with k = 1, 2, .., N
Joint probability
The joint probability of variables x and y, written Pr(x, y), is the probability that x and y simultaneously take a particular pair of values.
To extract the probability distribution of a single variable from a joint distribution we sum (or integrate) over all other variables: Pr(x) = \sum_y Pr(x, y) in the discrete case, or Pr(x) = \int Pr(x, y) \, dy in the continuous case.
Pr(x) is called the marginal distribution, and carrying out this sum or integral is called marginalization.
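The sketch below stores a small, made-up joint distribution Pr(x, y) for a discrete x (3 values) and y (2 values) as a table and marginalizes by summing over rows or columns.
```python
# Sketch: marginalizing a discrete joint distribution stored as a numpy array.
import numpy as np

# Rows index the values of x, columns index the values of y; entries sum to 1.
joint = np.array([[0.10, 0.20],
                  [0.05, 0.25],
                  [0.30, 0.10]])

pr_x = joint.sum(axis=1)                   # marginalize out y: Pr(x)
pr_y = joint.sum(axis=0)                   # marginalize out x: Pr(y)
print(pr_x, pr_x.sum())                    # [0.3 0.3 0.4], sums to 1
print(pr_y, pr_y.sum())                    # [0.45 0.55], sums to 1
```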
Image: Joint probability of two continuous variables x and y
Conditional probability
The conditional probability of x given that y takes a certain value is written Pr(x \mid y) and is defined by Pr(x \mid y) = \frac{Pr(x, y)}{Pr(y)}.
The denominator is the marginal probability of y, obtained by marginalizing the joint distribution over x.
Image: Conditional probability of variable x given two values of y
Bayes’ rule
Since Pr(x, y) = Pr(x \mid y) Pr(y) = Pr(y \mid x) Pr(x), rearranging gives Pr(y \mid x) = \frac{Pr(x \mid y) Pr(y)}{Pr(x)}.
This is called Bayes’ rule: Pr(y) is the prior, Pr(x \mid y) is the likelihood, Pr(x) is the evidence, and Pr(y \mid x) is the posterior.
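Here is a numeric sketch of Bayes’ rule with made-up numbers for a diagnostic-test scenario: a condition with a 1% prior, a test that is 95% sensitive, and a 10% false-positive rate.
```python
# Sketch: Bayes' rule with illustrative (made-up) numbers.
prior = 0.01                 # Pr(y): probability of having the condition
likelihood = 0.95            # Pr(x | y): probability of a positive test given the condition
false_positive = 0.10        # Pr(x | not y): probability of a positive test without the condition

evidence = likelihood * prior + false_positive * (1 - prior)   # Pr(x), by marginalization
posterior = likelihood * prior / evidence                      # Pr(y | x), by Bayes' rule
print(posterior)             # ~0.088: even after a positive test, the condition is unlikely
```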
Independence
Independence means that knowing x gives no information about y (and vice versa). Hence the conditional probability is simply the marginal: Pr(x \mid y) = Pr(x) (the evidence term in Bayes’ rule), and the joint factorizes as Pr(x, y) = Pr(x) Pr(y).
Expectation
Given a random variable x with distribution Pr(x) and a function f(x), we can calculate the expected value of f(x): E[f(x)] = \sum_x f(x) Pr(x) in the discrete case, or E[f(x)] = \int f(x) Pr(x) \, dx in the continuous case.
For multiple variables x and y: E[f(x, y)] = \sum_x \sum_y f(x, y) Pr(x, y), with the sums replaced by integrals for continuous variables.
When thinking of expectations, remember these rules:
- the expected value of a constant k with respect to the random variable x is k itself: E[k] = k
- the expected value of a constant k times a function of x is k times the expected value of that function: E[k f(x)] = k E[f(x)]
- the expected value of the sum of two functions of x is the sum of their expected values: E[f(x) + g(x)] = E[f(x)] + E[g(x)]
- the expected value of the product of two functions f(x) and g(y) is the product of the individual expected values if x and y are independent: E[f(x) g(y)] = E[f(x)] E[g(y)]
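The sketch below checks these rules numerically on a small, made-up discrete distribution, computing E[f(x)] as a weighted sum.
```python
# Sketch: verifying the expectation rules on a made-up discrete distribution.
import numpy as np

vals = np.array([0.0, 1.0, 2.0])           # possible values of x (made up)
probs = np.array([0.2, 0.5, 0.3])          # Pr(x) for each value (sums to 1)

def expect(f):
    """E[f(x)] = sum_x f(x) Pr(x) for a discrete random variable."""
    return np.sum(f(vals) * probs)

k = 3.0
print(expect(lambda x: np.full_like(x, k)), k)                    # E[k] = k
print(expect(lambda x: k * x), k * expect(lambda x: x))           # E[k f(x)] = k E[f(x)]
print(expect(lambda x: x + x**2),
      expect(lambda x: x) + expect(lambda x: x**2))               # linearity of expectation
```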
The expectations of certain functions also have special names. The mean of the random variable x is \mu = E[x].
We denote the k-th moment of x as E[x^k] and the k-th central moment as E[(x - \mu)^k].
The absolute moment is E[|x|^k].
Variance
The variance of f(x) is defined by: Var[f(x)] = E[(f(x) - E[f(x)])^2]
This measures how much variability there is in f(x) around its expected value E[f(x)].
For one variable x, the variance simplifies to Var[x] = E[x^2] - (E[x])^2.
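The same kind of made-up discrete distribution can be used to check this identity numerically.
```python
# Sketch: Var[x] = E[x^2] - E[x]^2 on a small discrete distribution (made-up values).
import numpy as np

vals = np.array([0.0, 1.0, 2.0])
probs = np.array([0.2, 0.5, 0.3])

mean = np.sum(vals * probs)                       # E[x]
var_def = np.sum((vals - mean) ** 2 * probs)      # E[(x - mu)^2], the definition
var_alt = np.sum(vals**2 * probs) - mean**2       # E[x^2] - E[x]^2
print(var_def, var_alt)                           # both equal 0.49
```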