My professor was teaching about hypothesis testing in class today.

It reminded me of some blog posts by Allen Downey that I bookmarked ages ago.

I read through them in class, and this is the framework I took away for hypothesis tests.

- Compute a **test statistic** that measures the size of the apparent effect. It could be the difference between two groups, the absolute difference in means, etc. (see more examples here). We call this test statistic 𝛿.
- Define a **null hypothesis**, which is a model of the world under the assumption that the effect is not real. For example, if you think there is a difference between groups A and B, H0 = there is no difference between A and B.
- The model of the null hypothesis should be stochastic, that is, capable of simulating data similar to the original data.
- Goal: compute the p-value, the probability of seeing an effect as big as 𝛿 under the null hypothesis. You can estimate the p-value using **simulation**: for each simulated dataset, calculate the same test statistic you used on the actual data.
- Count the fraction of simulations in which the test statistic meets or exceeds 𝛿. This fraction approximates the p-value. If it is sufficiently small, you can conclude that the apparent effect is unlikely to be due to chance.
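The steps above can be sketched as a permutation test, one common choice of stochastic null model: if the effect is not real, group labels are interchangeable, so shuffling the pooled data simulates the null hypothesis. A minimal sketch (the data and function name are made up for illustration):

```python
import random

def permutation_test(group_a, group_b, num_sims=10_000, seed=0):
    """Estimate a p-value by simulating the null hypothesis.

    Test statistic (delta): absolute difference in group means.
    Null model: the labels don't matter, so shuffling the pooled
    data simulates a world where the effect is not real.
    """
    rng = random.Random(seed)

    def test_stat(a, b):
        return abs(sum(a) / len(a) - sum(b) / len(b))

    delta = test_stat(group_a, group_b)  # effect size in the actual data
    pooled = list(group_a) + list(group_b)
    n = len(group_a)

    hits = 0
    for _ in range(num_sims):
        rng.shuffle(pooled)  # simulate one dataset under the null
        if test_stat(pooled[:n], pooled[n:]) >= delta:
            hits += 1
    return hits / num_sims  # fraction approximating the p-value

# Made-up samples for illustration
a = [84, 90, 77, 95, 88, 91]
b = [81, 74, 79, 70, 85, 76]
p = permutation_test(a, b)
```

Each of the five steps in the list maps onto a line or two here: `test_stat` is 𝛿, `rng.shuffle` is the stochastic null model, and the final fraction is the estimated p-value.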

## Why simulation?

- Analytical methods were appealing when computation was slow and expensive, but as computation gets faster and cheaper they lose that advantage, and they remain:
  - **inflexible**: a standard test commits you to a particular test statistic and model, which might not be appropriate for your problem domain.
  - **opaque**: a real-world scenario has many possible models, based on different assumptions. In standard tests those assumptions are implicit, so it is not easy to know whether the model is appropriate.

- Simulations, on the other hand, are:
  - **explicit**: creating a simulation forces you to think about your modeling decisions, and the simulations themselves document those decisions.
  - **arbitrarily flexible**: you can try out several test statistics and models, and choose the most appropriate one for the scenario.
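One way to see that flexibility concretely is to make the test statistic a parameter of the simulation, so swapping it out is a one-line change. A sketch under that assumption (the function names and data are hypothetical):

```python
import random

def estimate_p_value(group_a, group_b, test_stat, num_sims=5_000, seed=1):
    """Permutation simulation with a pluggable test statistic."""
    rng = random.Random(seed)
    delta = test_stat(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n = len(group_a)
    hits = 0
    for _ in range(num_sims):
        rng.shuffle(pooled)  # one simulated dataset under the null
        if test_stat(pooled[:n], pooled[n:]) >= delta:
            hits += 1
    return hits / num_sims

def diff_in_means(a, b):
    # statistic sensitive to a difference in location
    return abs(sum(a) / len(a) - sum(b) / len(b))

def diff_in_spread(a, b):
    # statistic sensitive to a difference in variability instead
    def std(xs):
        m = sum(xs) / len(xs)
        return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5
    return abs(std(a) - std(b))

a = [84, 90, 77, 95, 88, 91]
b = [81, 74, 79, 70, 85, 76]
p_means = estimate_p_value(a, b, diff_in_means)
p_spread = estimate_p_value(a, b, diff_in_spread)
```

The model and the statistic are both in plain sight, which is exactly the "explicit" point: anyone reading the code can see what assumptions the p-value depends on.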