Frequentist vs Bayesian Statistics

As I was walking to a cafe to meet my friend Christine, a machine learning enthusiast like myself, I couldn't help but think about the age-old debate of Frequentist vs. Bayesian statistics. I mean, what could be more exciting than discussing probability interpretations over a warm cup of coffee? Okay, maybe bungee jumping or skydiving, but still. Little did I know that our friendly chat at the cafe was about to turn into a full-blown statistical showdown.

This blog is about the pros and cons of the Frequentist and Bayesian approaches to statistical analysis.

7 minute read

As I sat at the cafe with my friend Christine, we couldn't help but get into a heated debate over the best approach to interpreting probability. Christine argued that the frequentist (or classical) interpretation, which sees probability as the long-run frequency of outcomes in random, repeatable experiments, was the only way to go. She pointed out that this approach is often used in experiments and observations where it is possible to repeat the process multiple times and calculate the probability of a certain outcome based on the frequency with which that outcome occurs.

But I couldn't disagree more. I argued that the Bayesian interpretation of probability, which sees probabilities as quantifications of uncertainty, was the superior approach.

Christine: Using probabilities to represent uncertainty? Does that even make sense? Have any formal studies shown the rules of probability hold with this interpretation?

Me: Yeah, I read something in the book "Pattern Recognition and Machine Learning" by Christopher Bishop that I think is relevant:

The use of probability to represent uncertainty, however, is not an ad-hoc choice, but is inevitable if we are to respect common sense while making rational coherent inferences. For instance, Cox (1946) showed that if numerical values are used to represent degrees of belief, then a simple set of axioms encoding common sense properties of such beliefs leads uniquely to a set of rules for manipulating degrees of belief that are equivalent to the sum and product rules of probability.

Do you think it would be nice to "somehow" be able to incorporate our prior beliefs into our analysis, or would it introduce bias and subjectivity?

Christine: Wow, I didn't know that, but why would the Bayesian interpretation ever be helpful?

Me: Don't you see the problem with your approach? Consider an uncertain event like an earthquake that might destroy this cafe. It's not something we can repeat to define a notion of probability. But with the Bayesian approach, we can use probabilities to represent our uncertainty about the event.

Here's another thing. Using the frequentist approach, how would you incorporate new information into your probability estimates? For example, if you were trying to determine the probability of a tropical thunderstorm due to climate change and someone told you there were strong winds 200 km away, how would you account for this new information?

Christine: Okay Manas, I understand your concerns about the frequentist approach, but I still think it is the best!

  1. It's simpler and easier to use. You don't need to specify a prior probability distribution, and it relies on well-established statistical theory and methods.
  2. The frequentist approach is based on the idea of long-run frequencies, which are objective and not influenced by personal beliefs or biases.
  3. The frequentist approach is based on statistical sampling, which involves drawing a sample from a population and using it to make inferences about the population. This allows for the use of well-established statistical methods and leads to more consistent and reliable results.

Me: I see what you're saying, but using a prior can be valuable in certain situations. For example, let's say you're trying to model the probability of your favorite golf player, Smith, making a hole-in-one using a Bernoulli distribution. We can represent this by the random variable \(X\), where \(X = 1\) if Smith scores a hole-in-one and \(X = 0\) otherwise: \[P(X) = \begin{cases} \theta & X = 1 \\ 1-\theta & X = 0 \end{cases}\] Or, in other words, \[P(X) = \theta^{X} (1-\theta)^{1-X}.\] And let's say you observe that he makes a hole-in-one on three consecutive tries, i.e., your observation is \(D = \{1, 1, 1\}\).

With the frequentist approach, the estimated probability of Smith succeeding would be 1. We can find this by maximizing the likelihood of observing the data, \[L(D | \theta) = \prod_{i=1}^{3} \theta^{x_i} (1-\theta)^{1-x_i} = \theta^3.\] The usual Bernoulli maximum-likelihood estimate is the sample mean \(\hat{\theta} = \frac{1}{n}\sum_{i} x_i\), so maximizing over \(\theta \in [0, 1]\) gives \(\hat{\theta} = \frac{3}{3} = 1\).
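(If you'd rather see this in code than in calculus, here's a quick Python sketch of the same idea. It's my own illustration rather than part of our cafe conversation, and it just evaluates the likelihood on a grid of candidate \(\theta\) values.)

```python
import numpy as np

# Observed data: three hole-in-ones in a row
D = np.array([1, 1, 1])

# Bernoulli likelihood L(D | theta) = prod_i theta^x_i * (1 - theta)^(1 - x_i)
def likelihood(theta, data):
    return np.prod(theta ** data * (1 - theta) ** (1 - data))

# Evaluate the likelihood on a grid of candidate theta values in [0, 1]
thetas = np.linspace(0, 1, 1001)
L = np.array([likelihood(t, D) for t in thetas])

# The maximum sits at the boundary: theta_hat = 1
theta_mle = thetas[np.argmax(L)]
print(theta_mle)  # 1.0
```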

Christine, we both know that no one is good enough to always make a hole-in-one. Obviously there is a flaw!

Now suppose you were initially clueless about \(\theta\), i.e., you assume it comes from a uniform distribution on the range 0 to 1, \[\theta \sim \mathcal{U}(0, 1).\] In that case, you could use the Bayesian approach to incorporate this belief into your analysis. The posterior distribution of \(\theta\) given the data would be \[P(\theta | D) = \frac{P(D | \theta) P(\theta)}{P(D)}\] and since we...

Christine (interrupting): Isn't this Bayes' theorem, developed by the Reverend Thomas Bayes in the 18th century?

Me: Yes Christine, no need to flex. So where was I? Yeah, since \[P(D) = \int_{\theta=0}^{\theta=1}P(D | \theta) P(\theta)\, d\theta\] and \[P(\theta) = \frac{1}{b-a} = 1,\] we can calculate the probability of the data as follows: \[P(D) = \int_{\theta=0}^{\theta=1} P(D | \theta)\, d\theta = \int_{\theta=0}^{\theta=1} \theta^3\, d\theta = \frac{1}{4}.\] Thus \[P(\theta | D) = \frac{\theta^3 \cdot 1}{\frac{1}{4}} = 4\theta^3,\] whose mean is \(\mathbb{E}[\theta \mid D] = \int_{0}^{1} \theta \cdot 4\theta^3\, d\theta = \frac{4}{5}\), a far more sensible estimate than 1. Having a prior allowed us to incorporate what we already knew into the analysis and obtain more reasonable results. Of course, a bad prior will just as surely lead to bad results.
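(Again, purely as an aside for the code-inclined: a small numerical sketch of my own, using a simple grid approximation, reproduces the same evidence and posterior mean.)

```python
import numpy as np

# Grid approximation of the golf example: uniform prior, D = {1, 1, 1}
thetas = np.linspace(0, 1, 10001)
d_theta = thetas[1] - thetas[0]

prior = np.ones_like(thetas)   # P(theta) = 1 on [0, 1]
likelihood = thetas ** 3       # P(D | theta) = theta^3

# Evidence P(D) = integral of P(D | theta) P(theta) d(theta), should be ~1/4
evidence = np.sum(likelihood * prior) * d_theta

# Posterior P(theta | D) = P(D | theta) P(theta) / P(D), should be ~4 * theta^3
posterior = likelihood * prior / evidence

# Posterior mean E[theta | D], should be ~4/5
posterior_mean = np.sum(thetas * posterior) * d_theta

print(round(evidence, 3), round(posterior_mean, 3))  # ~0.25, ~0.8
```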

Christine: Thanks for the example, you just proved my first point about the superiority of the frequentist approach. Solving a continuous integral is so tedious and computationally expensive as well! I can't imagine any statistician wanting to deal with that.

Me: It's a small price to pay for salvation.

Christine: Sigh, I genuinely don't understand why you would want to use priors in situations where you can do experiments multiple times. Is the law of large numbers an alien concept to you?

Me: You know, the frequentist approach assumes that the probability of an event is fixed over time, based on the idea of long-run frequencies. But the Bayesian approach allows us to incorporate time-varying probabilities and update our estimates as new information becomes available. That's pretty cool, don't you think? And speaking of sample sizes, the Bayesian approach is definitely the way to go if you're working with a small sample. It's pretty much a no-brainer.
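(Christine and I never got into conjugate priors at the cafe, but a Beta prior is the standard conjugate choice for a Bernoulli likelihood, and it makes the "update as data arrives" point concrete. Beta(1, 1) is exactly the uniform prior from the golf example; the stream of observations below is hypothetical.)

```python
# Sequential Bayesian updating with a Beta prior (conjugate to the Bernoulli).
# Starting from Beta(1, 1), i.e. the uniform prior from the golf example,
# each new observation just bumps one of the two counts.
alpha, beta = 1.0, 1.0  # Beta(1, 1) == Uniform(0, 1)

observations = [1, 1, 1, 0, 1]  # hypothetical stream of hole-in-one attempts
for x in observations:
    alpha += x        # successes update alpha
    beta += 1 - x     # failures update beta
    posterior_mean = alpha / (alpha + beta)
    print(f"after x={x}: posterior mean of theta = {posterior_mean:.3f}")
```

After the first three hole-in-ones, the posterior mean is \(4/5 = 0.8\), exactly the value from the golf calculation above, and each new make or miss simply nudges the estimate rather than forcing us to start over.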

Christine: Are you suggesting that the Bayesian approach is more effective than using well-established statistical theories? It's safe to say that the probability of you getting a math degree is pretty much null and void.

We went back and forth for what felt like hours, each trying to convince the other of the superiority of our preferred interpretation. In the end, we agreed to disagree and decided to just enjoy our coffees instead.

By the way, if you had to choose, would you side with Christine or with me?
