This is the second blog post in my adventure to learn more about probability and statistics. This one is about probability transformations, probability moments, and their use in skewness, kurtosis, variance, and so on.

Sometimes it is easier for us if we can represent a probability distribution in a different way. Say we have a very complicated joint probability distribution of a set of random variables; we can transform it into a much simpler form using a probability transformation. In other words, a probability transformation constructs an equivalent set of values that can reasonably be modelled from a specified distribution.

Suppose we have a distribution like this:

F(x) = \int_{0}^{\sqrt{\pi/2}} 2x \cos(x^{2}) \, dx

We want to transform it into something simpler using the substitution y = x^{2}.

Under this substitution the limits x = 0 and x = \sqrt{\pi/2} become y = 0 and y = \pi/2.

\frac{d}{dx} x^{2} = \frac{dy}{dx} \implies 2x \, dx = dy

So F(y) = \int_{0}^{\pi/2} \cos y \, dy represents the same integral, since \frac{d}{dx} \sin(x^{2}) = 2x \cos(x^{2}).
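We can sanity-check this change of variables numerically. Here is a small sketch using scipy's `quad`; both integrals should give the same value (in fact 1, since the integrand is a valid density on this interval). Note the transformed integrand is cos y, because \frac{d}{dx}\sin(x^{2}) = 2x\cos(x^{2}).

```python
import numpy as np
from scipy.integrate import quad

# Original integral: int_0^sqrt(pi/2) 2x cos(x^2) dx
total_x, _ = quad(lambda x: 2 * x * np.cos(x**2), 0, np.sqrt(np.pi / 2))

# Transformed integral after y = x^2: int_0^{pi/2} cos(y) dy
total_y, _ = quad(lambda y: np.cos(y), 0, np.pi / 2)

print(total_x, total_y)  # both approximately 1.0
```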

If X has cdf F_X(x) and the transformation is Y = g(X), then the relationship between F_X and F_Y depends on whether g is increasing or decreasing: if g is increasing then F_Y(y) = F_X(g^{-1}(y)), but if g is decreasing then F_Y(y) = 1 - F_X(g^{-1}(y)).

Let X have pdf f_X(x) and let Y = g(X). Suppose there exists a partition A_0, A_1, …, A_k of the sample space such that:

i) g(x) = g_i(x) for x \in A_i

ii) g_i(x) is monotone on A_i

iii) the set Y = \{y : y = g_i(x) \text{ for some } x \in A_i\} is the same for each i, and each g_i^{-1}(y) has a continuous derivative on Y. Then f_Y(y) = \sum_{i=1}^{k} f_X(g_i^{-1}(y)) \left| \frac{d}{dy} g_i^{-1}(y) \right| for y \in Y.
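The classic illustration of this partition theorem (summing the density contributions over each monotone piece) is Y = X² with X standard normal: the partition is the negative and positive half-lines, the two inverses are -\sqrt{y} and \sqrt{y}, each with Jacobian 1/(2\sqrt{y}), and the result is the chi-square density with 1 degree of freedom. A quick numerical check, assuming scipy is available:

```python
import numpy as np
from scipy.stats import norm, chi2

# Y = X^2 with X ~ N(0, 1).  Partition: A_1 = (-inf, 0], A_2 = (0, inf).
# Inverses: g_1^{-1}(y) = -sqrt(y), g_2^{-1}(y) = sqrt(y); each |d/dy| = 1/(2 sqrt(y)).
def f_Y(y):
    jac = 1 / (2 * np.sqrt(y))
    return norm.pdf(-np.sqrt(y)) * jac + norm.pdf(np.sqrt(y)) * jac

ys = np.linspace(0.1, 5.0, 50)
# This should reproduce the chi-square(1) density
print(np.max(np.abs(f_Y(ys) - chi2.pdf(ys, df=1))))
```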

There are a lot of interesting theorems in the book. For example, if X has continuous cdf F_X and Y = F_X(X), then P(Y \le y) = y, i.e. Y is uniformly distributed on (0, 1). This is called the probability integral transformation.
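The probability integral transformation is easy to see empirically: push samples through their own cdf and the result looks uniform. A sketch using numpy (the exponential distribution and the sample size are my arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)  # X ~ Exponential, mean 2
y = 1 - np.exp(-x / 2.0)                      # Y = F_X(X), the exponential cdf

# If Y ~ Uniform(0, 1): mean should be ~1/2 and variance ~1/12
print(y.mean(), y.var())
```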

In the binomial transformation a sequence of numbers a_0, a_1, a_2, … gets transformed into b_0, b_1, b_2, …, where

b_n = \sum_{k=0}^{n} (-1)^{k} \dbinom{n}{k} a_k
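This is a one-liner with Python's `math.comb`. A nice property worth checking: with this sign convention the binomial transform is an involution, so applying it twice gives back the original sequence.

```python
from math import comb

def binomial_transform(a):
    """b_n = sum_{k=0}^{n} (-1)^k * C(n, k) * a_k"""
    return [sum((-1) ** k * comb(n, k) * a[k] for k in range(n + 1))
            for n in range(len(a))]

a = [1, 2, 4, 8, 16]               # a_k = 2^k
b = binomial_transform(a)
print(b)                           # (1 - 2)^n = (-1)^n, so [1, -1, 1, -1, 1]
print(binomial_transform(b))       # back to [1, 2, 4, 8, 16]
```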

A discrete random variable X has a binomial distribution if its pmf is of the form f_X(x) = P(X = x) = \dbinom{n}{x} p^{x} (1-p)^{n-x} for x = 0, 1, …, n, where 0 \le p \le 1 and n \ge 0.
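The pmf translates directly into code, and summing it over x = 0, …, n is a quick way to confirm it really is a probability distribution (the values of n and p below are arbitrary):

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) = C(n, x) * p^x * (1 - p)^(n - x)"""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.3
total = sum(binom_pmf(x, n, p) for x in range(n + 1))
print(total)  # the pmf sums to 1
```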

Now, let's get started with expectation. What to expect and what not to, that is one hell of a question. Don't go for very serious issues at the beginning; at least we can formulate what to expect when we roll a die. It is calculated as E(X) = \sum_{x} x \, p(x). So what to expect when we roll the die? A die has 6 sides, so for each side p(x) = 1/6.

E(X)=(1/6)*1+(1/6)*2+(1/6)*3+(1/6)*4+(1/6)*5+(1/6)*6=3.5
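The same sum in Python, using exact fractions so no rounding sneaks in:

```python
from fractions import Fraction

# E(X) = sum of x * p(x) for a fair six-sided die, p(x) = 1/6
p = Fraction(1, 6)
expectation = sum(x * p for x in range(1, 7))
print(expectation, float(expectation))  # 7/2, i.e. 3.5
```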

This is just the mean of the numbers, so the expectation is basically the mean, E(X). Expectation also has a relation with variance: the variance \sigma^{2} can be calculated as the expectation of (X - EX)^{2}, i.e. Var(X) = E(X - EX)^{2}. So from the expectation we know the average, as well as how spread out our data is, in other words the variance; the standard deviation is just the square root of \sigma^{2}.
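Continuing the die example, the variance also drops straight out of the definition as an expectation:

```python
import numpy as np

outcomes = np.arange(1, 7)      # faces of a fair die
probs = np.full(6, 1 / 6)       # p(x) = 1/6 for each face

mean = np.sum(outcomes * probs)                   # E(X)
variance = np.sum((outcomes - mean) ** 2 * probs) # E[(X - EX)^2]
std = np.sqrt(variance)                           # sqrt of the variance

print(mean, variance, std)  # 3.5, 35/12, sqrt(35/12)
```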

Probability moments are formulated using E(X^{k}), so a moment is an expected power of a random variable. The k-th central moment is E(X - EX)^{k}. Anything like the mean or variance can be calculated using these moments: for k = 1 the moment is exactly the mean, and for k = 2 the central moment is the variance. For k = 3, the standardized moment E(X - \mu)^{3} / \sigma^{3} is the skewness, which measures the asymmetry of the distribution; a rough estimate is (mean - median) / standard deviation. A positively skewed distribution has its peak on the left with a long tail to the right, and vice versa. For k = 4, E(X - \mu)^{4} / \sigma^{4} is the kurtosis, which tells whether the distribution has a higher peak and heavier tails than the normal distribution or is flatter.
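We can estimate these standardized moments directly from samples. A sketch using the exponential distribution (my choice, because its skewness and kurtosis are known exactly: 2 and 9, while the normal distribution has kurtosis 3):

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.exponential(size=200_000)  # right-skewed distribution

mu = data.mean()
sigma = data.std()
skew = np.mean((data - mu) ** 3) / sigma**3  # 3rd standardized moment
kurt = np.mean((data - mu) ** 4) / sigma**4  # 4th standardized moment

print(skew, kurt)  # roughly 2 and 9 for the exponential
```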

If X is a random variable with finite variance, then for any constants a and b, Var(aX + b) = a^{2} Var(X).
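The intuition is that shifting by b moves every point and the mean by the same amount, so the deviations are unchanged, while scaling by a scales every deviation and hence squares into the variance. A quick empirical check (the distribution and constants are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=5, scale=3, size=100_000)
a, b = 4.0, 7.0

# Var(aX + b) should equal a^2 Var(X): b drops out, a comes out squared
print((a * x + b).var(), a**2 * x.var())
```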

The convergence of MGFs theorem states that if \lim_{i \to \infty} M_{X_i}(t) = M_X(t) for all t in a neighborhood of 0, then F_{X_i}(x) \to F_X(x) at all points where F_X is continuous, i.e. the X_i converge to X in distribution.
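The textbook application of this theorem is the Poisson approximation to the binomial: the MGF of Binomial(n, \lambda/n), which is (1 - p + p e^{t})^{n}, converges to the Poisson(\lambda) MGF e^{\lambda(e^{t} - 1)} as n grows. A numeric sketch (\lambda and t are arbitrary values):

```python
import numpy as np

lam, t = 3.0, 0.5

def binom_mgf(t, n, p):
    # M(t) = (1 - p + p * e^t)^n for Binomial(n, p)
    return (1 - p + p * np.exp(t)) ** n

poisson_mgf = np.exp(lam * (np.exp(t) - 1))  # M(t) = e^{lam (e^t - 1)}

# Binomial(n, lam/n) MGF approaches the Poisson(lam) MGF as n grows
for n in [10, 100, 10_000]:
    print(binom_mgf(t, n, lam / n))
print(poisson_mgf)
```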