# What can you do with probability theory?

## Introduction to probability theory and statistics

### § 1. Models for random experiments, counting methods

The aim of probability theory is to analyze the laws that play a role in the description of so-called “random experiments”. By this we mean experiments whose outcomes are not determined by the experimental conditions for logical or other reasons. At least from a conceptual point of view, the experiments should be repeatable under the same conditions, in such a way that the outcome of the experiment is not necessarily always the same when it is repeated independently, but only follows statistical regularities.

### § 2. Conditional probability and independence

Often, before the result of a random experiment is known, the information is already available that the result belongs to a certain subset of the sample space. For example, a Skat player sees his own ten cards. If player 1 is interested in the probability of the event A that player 2 has two aces, he will first count his own aces. If he himself has three or four aces, the probability of A is of course 0 for him; if he has at most two aces, it is positive.
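The counting behind this example can be made concrete. The following sketch (the function name and layout are our own, purely illustrative) computes the probability that player 2 holds exactly two aces, given the number of aces player 1 sees in his own hand. It assumes the standard Skat deal: a 32-card deck with 4 aces, 10 cards per player, and 2 cards in the skat.

```python
from math import comb

def p_two_aces(k):
    """P(player 2 holds exactly two aces), given that player 1 sees k aces
    among his own 10 cards.  Skat deck: 32 cards, 4 aces; the remaining
    22 unseen cards are dealt 10 to player 2, 10 to player 3, 2 to the skat."""
    aces_left = 4 - k           # aces among the 22 unseen cards
    if aces_left < 2:
        return 0.0              # player 1 already holds 3 or 4 aces
    others = 22 - aces_left     # non-aces among the unseen cards
    # hypergeometric count: choose 2 of the remaining aces and 8 non-aces
    return comb(aces_left, 2) * comb(others, 8) / comb(22, 10)

# With no ace in hand the chance is highest; with three or four it is zero.
print(p_two_aces(0), p_two_aces(2), p_two_aces(3))
```

The probability is conditional in exactly the sense of this section: player 1's own hand restricts the sample space to the 22 unseen cards.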

### § 3. Random variable, expected value, variance

In many random experiments it is not so much the result ω itself that is of interest, but only a certain quantity X(ω) determined by ω. For example, if a person is selected at random, ω could be the name or passport number of the selected person and X(ω) their income. Occasionally, characteristics of a qualitative nature, such as religion or eye color, are also of interest.
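A minimal illustration of such a quantity X and the two moments named in the heading, using a fair die in place of incomes (the choice of example is ours, not from the text):

```python
from fractions import Fraction

# X(ω) = outcome of a fair die; expectation and variance computed from
# the definitions E[X] = Σ x·P(X = x) and Var(X) = E[X²] − E[X]².
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())
var = sum(x**2 * p for x, p in pmf.items()) - mean**2

print(mean, var)   # 7/2 and 35/12
```

Exact rational arithmetic via `Fraction` keeps the familiar textbook values 7/2 and 35/12 free of rounding.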

### § 4. Basic concepts of estimation theory

We now want to get to know a few important concepts in statistics. A classic example should help us with this.

### § 5. Approximations of the binomial distribution

For large n, the exact calculation of the probability
$$ b_{n,p}(k) = \binom{n}{k} p^k (1-p)^{n-k} $$
of obtaining exactly k successes in n independent trials with success probability p is troublesome. How likely is it, for example, to get k = 40 heads in n = 80 tosses of a fair coin? From the result $\binom{80}{40}\,2^{-80}$, not even the order of magnitude is easily recognized. The calculation of sums of such probabilities, e.g. the probability of getting heads between 40 and 50 times, is even more unwieldy. We therefore now want to deal with approximations for such probabilities.

### § 6. Tests

It is a fundamental idea of the empirical sciences that the decision between competing models of reality should be based on observation of an experiment which, under the competing model assumptions, yields different outcomes. Ideally, following Francis Bacon's idea, an “experimentum crucis” is possible that leads to a definitive decision. A famous example is the Michelson interference experiment.

### § 7. Generating functions

We now want to get to know a simple, yet surprisingly powerful tool for studying distributions on ℤ+ = {0, 1, 2, ...}.
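A small sketch of the key property (the helper function and the dice example are our own): the generating function of a distribution on ℤ+ is g(s) = Σ pₖ sᵏ, and multiplying generating functions, i.e. convolving their coefficient sequences, corresponds to adding independent random variables.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (= convolution)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

die = [0] + [1 / 6] * 6          # coefficients p_0 … p_6 of one fair die
two_dice = poly_mul(die, die)    # generating function of the sum of two dice

print(two_dice[7])   # P(sum = 7) = 6/36
```

Reading off P(sum = 7) as the coefficient of s⁷ replaces an explicit case count over all 36 outcomes.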

### § 8. Entropy and Coding

We want to at least briefly address a concept of stochastics that is less clear than e.g. the concepts of probability and independence, but which also has a fundamental meaning: the concept of entropy. It is closely linked to that of information.
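A minimal sketch of the quantity itself (the function and example distributions are ours): the Shannon entropy of a discrete distribution p, measured in bits, which quantifies the information content mentioned above.

```python
from math import log2

def entropy(p):
    """Shannon entropy H(p) = −Σ p_i log2 p_i in bits (0·log 0 := 0)."""
    return -sum(q * log2(q) for q in p if q > 0)

# A fair coin carries one bit of information per toss; a biased coin less,
# and a deterministic outcome carries none.
print(entropy([0.5, 0.5]), entropy([0.9, 0.1]), entropy([1.0]))
```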

### § 9. Run time analysis of recursive algorithms

In this section we want to give a first introduction to a topic that is of great importance in the face of the advance of computers. We are interested in statements about the running time of recursive algorithms. Sorting algorithms serve as an example. The results of this section are not needed in the remainder of the book.
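As a concrete instance of the kind of recurrence such an analysis produces (mergesort is our choice of sorting algorithm here; the function name is illustrative): the worst-case comparison count satisfies T(n) = T(⌈n/2⌉) + T(⌊n/2⌋) + n − 1 with T(1) = 0, which grows like n·log₂ n.

```python
from math import log2

def merge_comparisons(n):
    """Worst-case comparisons of mergesort via the recurrence
    T(n) = T(⌈n/2⌉) + T(⌊n/2⌋) + n − 1, T(1) = 0."""
    if n <= 1:
        return 0
    return merge_comparisons((n + 1) // 2) + merge_comparisons(n // 2) + n - 1

n = 1024
print(merge_comparisons(n), n * log2(n))   # 9217 vs. 10240.0
```

For n a power of two the recurrence solves exactly to n·log₂ n − n + 1, which the call above confirms.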

### § 10. Probability measures with densities

In addition to the discrete probability measures, we will be particularly interested in those with densities. But it is economical to formulate the basic terms in a general way.
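A numerical sketch of what "probability via a density" means (density and interval chosen by us for illustration): the probability of an interval is the integral of the density over it, here approximated by a midpoint Riemann sum for the exponential density f(x) = λe^(−λx) and checked against the closed form e^(−λa) − e^(−λb).

```python
from math import exp

lam, a, b, n = 1.0, 0.0, 1.0, 100_000
h = (b - a) / n

# Midpoint Riemann sum for P(a ≤ X ≤ b) = ∫_a^b λ e^{−λx} dx
riemann = sum(lam * exp(-lam * (a + (i + 0.5) * h)) for i in range(n)) * h

print(riemann, exp(-lam * a) - exp(-lam * b))   # both ≈ 0.632
```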

### § 11. Random variables and their moments

In the discrete case we called every mapping X from Ω into ℝ a random variable. This is not useful for general Ω. For example, we want to be able to speak of the probability that X ≤ 7. For this, {X ≤ 7} must be an event, i.e. it must belong to the σ-algebra on which P is defined.

### § 12. Limit theorems

In this section we want to derive a strengthening of the weak law of large numbers and generalize the normal approximation of the binomial distribution.

### § 13. Estimation procedure and error calculation

A more or less complete introduction to the most important statistical methods will not be attempted here. We only want to present some of them as examples. This is relatively easy if one simply describes each procedure as a recipe. For most procedures, on the other hand, it is difficult to prove that they are optimal in an appropriate sense. We therefore take an intermediate path and first motivate some estimation methods, then, in the next paragraph, some common tests, on the basis of general considerations. The maximum likelihood method is particularly suitable for this.
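The maximum likelihood idea in its simplest form, as a sketch (the sample size, success count, and grid search are our own illustration): for n Bernoulli(p) trials with k successes, the log-likelihood k·log p + (n − k)·log(1 − p) is maximized at p̂ = k/n, which a crude grid search recovers.

```python
from math import log

def log_likelihood(p, n, k):
    """Log-likelihood of k successes in n Bernoulli(p) trials."""
    return k * log(p) + (n - k) * log(1 - p)

n, k = 100, 37
grid = [i / 1000 for i in range(1, 1000)]       # p in (0, 1)
p_hat = max(grid, key=lambda p: log_likelihood(p, n, k))

print(p_hat)   # ≈ k/n = 0.37
```

The closed-form maximizer k/n follows by setting the derivative k/p − (n − k)/(1 − p) to zero; the grid search merely illustrates the "maximize the likelihood" recipe.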

### § 14. Some important test procedures

Let us recall that in a test problem a (mostly vector-valued) random variable X is observed whose distribution Pυ belongs to a family {Pυ : υ ∈ Θ} of distributions considered in the model. Θ is the disjoint union of two non-empty sets H and K, the hypothesis and the alternative, and on the basis of the observed value x of X it should be decided whether υ belongs to H or to K.

### § 15. The Markov property

Let (Ω, A, P) be a probability space, T any non-empty index set, and (I, I) a measurable space. A family {Xt, t ∈ T} of random variables with values in I is called a stochastic process with parameter range T and state space I.

### § 16. The behavior of Markov chains over long periods of time

If you know the transition probabilities (pij) of a Markov chain, probabilities that only depend on a small number of transitions can often be calculated explicitly. The computational effort, e.g. for calculating the n-step transition probabilities, can be extremely high for large n. We are therefore interested in limit theorems for n → ∞.
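The limiting behavior can be observed directly in a small example (the two-state chain and helper function are our own illustration): the n-step transition probabilities are the entries of the n-th matrix power of (pij), and for large n every row approaches the stationary distribution.

```python
def mat_mul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# A two-state Markov chain; row i of P holds the probabilities p_ij.
P = [[0.9, 0.1],
     [0.4, 0.6]]

# n-step transition probabilities: the n-th power of P (here n = 50).
Pn = P
for _ in range(49):
    Pn = mat_mul(Pn, P)

# Both rows have converged to the stationary distribution (0.8, 0.2).
print(Pn)
```

Solving πP = π for this chain gives π = (0.8, 0.2), and already at n = 50 the dependence on the starting state has vanished to within rounding.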

### § 17. The renewal theorem

We can now turn to the question of the convergence of the transition probabilities pij(n). The case in which j is transient can be dealt with fairly quickly. In the recurrent case we still need a “renewal theorem”, which is also of independent interest. The idea of renewal is one of the most fruitful ideas in probability theory.

### § 18. The Poisson process

We now discuss one of the simplest examples of a Markov chain in continuous time, the Poisson process, which can serve, among other things, as a model for the observation of radioactive decay. We only assume § 10 and § 11, but not the above results on Markov chains.
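The decay model can be simulated in a few lines (the function, rate, and seed are our own illustration): event times of a Poisson process with intensity λ are generated from independent Exp(λ) interarrival times, and the mean number of events on [0, t] should then be close to λ·t.

```python
import random

def poisson_process(rate, t_max, rng):
    """Event times of a Poisson process with intensity `rate` on [0, t_max],
    built from independent Exp(rate) interarrival times."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate)
        if t > t_max:
            return times
        times.append(t)

rng = random.Random(0)
# Mean number of events on [0, 10] with rate 2 should be rate·t_max = 20.
counts = [len(poisson_process(2.0, 10.0, rng)) for _ in range(2000)]
print(sum(counts) / len(counts))
```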