Maximum Likelihood Estimation (MLE)
Introduction

This site provides a brief introduction to maximum likelihood estimation: the details are not essential to learn, but it is useful to have a grasp of some of the underlying principles.
Probability

The concept of likelihood, introduced by Sir R. A. Fisher, is closely related to the more familiar concept of probability. We speak of the probability of observing events. For example, for an unbiased coin, the probability of observing heads is 0.5 on every toss. This is taken to mean that if a coin were tossed a large number of times then we would expect, on average, to find that it landed heads half of the time and tails half of the time. There are certain laws of probability that allow us to make inferences and predictions based on probabilistic information. For example, the probabilities of the different outcomes of an event must always sum to 1: if there is a 20% chance of rain today, there must be an 80% chance of no rain. Another very common law is that if two events are independent of one another (that is, they in no way influence each other), then the probability of a particular pair of outcomes is the product of their individual probabilities: if we toss a coin twice, the probability of getting two heads is 0.5 × 0.5 = 0.25.
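The two laws mentioned above can be illustrated with a few lines of Python (a minimal sketch; the variable names are ours, not part of the original text):

```python
# Complement rule: the probabilities of all possible outcomes
# of an event sum to 1.
p_rain = 0.20
p_no_rain = 1.0 - p_rain          # 0.80

# Product rule: for independent events, the probability of a
# particular pair of outcomes is the product of their
# individual probabilities.
p_heads = 0.5
p_two_heads = p_heads * p_heads   # 0.25

print(p_no_rain, p_two_heads)
```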
Models: parameters and distributions

When we speak about the probability of observing events such as the outcome of a coin toss, we are implicitly assuming some kind of model, even in this simple case. For the coin, the model states that there is some fixed probability for each of the possible outcomes. This model has one parameter, p, the probability that the coin lands heads. If the coin is fair, then p = 0.5. We can then speak about the probability of observing an event, given specific parameter values for the model. In this simple case, if p = 0.5, then the probability of the coin landing heads on any one toss is also 0.5. It may not seem that we have gained very much here: we appear merely to be calling what was previously a simple probability the parameter of a model. As we shall see, however, this way of thinking provides a very useful framework for expressing more complex problems.
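The one-parameter coin model can be written as a tiny function (a sketch for illustration; the function name `coin_probability` is our invention, not the site's):

```python
def coin_probability(outcome, p):
    """P(outcome | p) under the one-parameter coin model,
    where p is the probability that a single toss lands heads."""
    return p if outcome == "H" else 1.0 - p

# A fair coin (p = 0.5) gives the same 0.5 for heads as before,
# but the same model also describes biased coins.
print(coin_probability("H", 0.5))   # 0.5
print(coin_probability("T", 0.7))   # probability of tails for a coin biased towards heads
```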
Conditional probability

In the real world, very few things have absolute, fixed probabilities. Many of the aspects of the world that we are familiar with are not truly random. Take, for instance, the probability of developing schizophrenia. Say that the prevalence of schizophrenia in a population is 1%. If we know nothing else about an individual, we would say that the probability of this individual developing schizophrenia is 0.01. In mathematical notation,
P(Sz) = 0.01

We know from empirical research, however, that certain people are more likely to develop schizophrenia than others. For example, having a schizophrenic first-degree relative greatly increases the risk of becoming schizophrenic. The probability above is essentially an average probability, taken across all individuals both with and without schizophrenic first-degree relatives. The notion of conditional probability allows us to incorporate other potentially important variables, such as the presence of familial schizophrenia, into statements about the probability of an individual developing schizophrenia. Mathematically, we write
P(X | Y)

meaning the probability of X conditional on Y, or given Y. In our example, we could write
P(Sz | first-degree relative has Sz)

and
P(Sz | first-degree relative does not have Sz)

Whether or not these two values differ is an indication of the influence of familial schizophrenia upon an individual's chances of developing schizophrenia.
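The remark above, that the 1% prevalence is "essentially an average probability," can be made concrete with the law of total probability (a sketch with entirely hypothetical risk figures, chosen only so that the weighted average comes out near 1%):

```python
# Hypothetical figures, for illustration only.
p_sz_given_relative = 0.10      # P(Sz | first-degree relative has Sz)
p_sz_given_no_relative = 0.008  # P(Sz | first-degree relative does not have Sz)
p_relative = 0.02               # assumed proportion with an affected relative

# Law of total probability: the population prevalence is the
# average of the conditional probabilities, weighted by how
# common each condition is.
p_sz = (p_sz_given_relative * p_relative
        + p_sz_given_no_relative * (1 - p_relative))
print(round(p_sz, 4))  # close to the 1% overall prevalence
```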
In exactly the same way, we can condition on the parameters of a model. For the coin-tossing model above, the probability of obtaining a head can be written

P(H | p = 0.5)

where H is the event of obtaining a head and p is the model parameter, set at 0.5. Let's think a little more carefully about what the full model would be for tossing a coin, if p is the parameter. What do we know about coin tossing? We know that each toss is independent of every other, with the same probability p of landing heads each time. Suppose we tossed a coin nine times and observed four heads: that result could have arisen from many different orderings of heads and tails, for example
H, T, H, H, T, T, H, T, T

or

T, H, H, T, H, T, T, H, T

or even

H, H, H, H, T, T, T, T, T

Every one of these orderings is assumed to have equal probability of occurring; the binomial coefficient counts how many such orderings there are (here, the number of ways of choosing which four of the nine tosses land heads).
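The counting argument can be checked numerically: enumerating the distinct orderings of four heads and five tails recovers the binomial coefficient C(9, 4) directly (a quick sketch; the p = 0.5 at the end is the fair-coin value used earlier):

```python
from itertools import permutations
from math import comb

# Enumerate every distinct ordering of 4 heads and 5 tails.
orderings = set(permutations("HHHHTTTTT"))
print(len(orderings))   # number of distinct orderings
print(comb(9, 4))       # the binomial coefficient C(9, 4) -- the same count

# Under the model, each ordering has probability p^4 * (1-p)^5,
# so the probability of observing 4 heads in 9 tosses is the
# count of orderings times the probability of any one of them.
p = 0.5
prob_four_heads = comb(9, 4) * p**4 * (1 - p)**5
print(prob_four_heads)
```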
Site created by S.Purcell, last updated 20.05.2007