Estimate the Unobserved: Moving-Average Model Estimation with Maximum Likelihood in Python
How the coefficients of unobserved covariates can be estimated with MLE
For those experienced with time series data and forecasting, terms like regressions, AR, MA, and ARMA should be familiar. Linear Regression is a straightforward model with a closed-form parametric solution obtained through OLS. AR models can also be estimated using OLS. However, things become more complex with MA models, which form the second component of the more advanced ARMA and ARIMA models.
The plan for this story:
- Introducing Moving Average models
- Discussing why there’s no closed-form solution for MA models
- Introducing Maximum Likelihood Estimation method
- MA(1) ML estimation — Theory and Python code
Moving Average Model
MA models of order q, or MA(q), can be described by the following formula:

x_t = mu + epsilon_t + theta_1 * epsilon_{t-1} + theta_2 * epsilon_{t-2} + … + theta_q * epsilon_{t-q}

where mu is the mean of the series.
Here, the thetas represent the model parameters, while the epsilons are the error terms, assumed to be mutually independent and normally distributed with constant variance. The intuition behind this formula is that our time series, X, can always be described by the last q shocks that occurred in the series. It is evident from the formula that each shock impacts only the subsequent q values of X, in contrast to AR models where the effect of a shock persists indefinitely, although gradually diminishing over time.
Closed-form estimation of a simple model — Linear Regression
As a reminder, the general form of a linear regression equation looks like this:

y = beta_0 + beta_1 * x_1 + beta_2 * x_2 + … + beta_k * x_k + epsilon
For forecasting tasks, we typically aim to estimate all the model’s parameters (the betas) using a sample set of x’s and y’s. Given a few assumptions about our model, the Gauss–Markov theorem states that the ordinary least squares (OLS) estimators for the betas have the lowest sampling variance among the class of linear unbiased estimators. In simpler terms, OLS provides us with the best possible estimation of the betas.
So, what is OLS? It is the closed-form solution to a loss-function minimization problem:

beta_hat = argmin over beta of S(beta) = (X^T X)^(-1) X^T y

where the loss function, S, is the sum of squared residuals:

S(beta) = (y - X beta)^T (y - X beta)
In this context, y and X are our sample data: observable arrays of numbers, just like the values of a time series. Therefore, it is straightforward to compute the loss S, take its derivative, and find the beta that solves the minimization problem.
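To make this concrete, here is a minimal sketch (not part of the original derivation) of the OLS closed-form solution in NumPy. The design matrix X, the target y, and the “true” betas are all made up for the example:

import numpy as np

# Hypothetical data: a design matrix X (intercept column + one feature) and a target y
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
true_betas = np.array([2.0, 0.5])
y = X @ true_betas + rng.normal(scale=0.1, size=100)

# Closed-form OLS solution: beta_hat = (X'X)^(-1) X'y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y
print(beta_hat)  # should be close to [2.0, 0.5]

Because y and X are fully observable, the whole estimation reduces to a couple of matrix operations.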
Closed-form estimation of MA(q)
It should now be clear why applying a method like OLS to estimate MA(q) models is problematic: the dependent variable, the time series values, is described by unobservable variables, the epsilons. This raises the question: how can these models be estimated at all?
Maximum Likelihood Estimation (MLE)
Likelihood Function
Statistical distributions typically depend on one or more parameters. For instance, the normal distribution is characterized by its mean and variance, which define its “center of mass” and its “height”.
Suppose we have a dataset, X={x_1,…x_n}, comprising samples drawn from an unknown normal distribution, with its parameters unknown. Our objective is to determine the mean and variance values that would characterize the specific normal distribution from which our dataset X is most likely to have been sampled.
MLE provides a framework that precisely tackles this question. It introduces a likelihood function, which is a function that yields another function. This likelihood function takes a vector of parameters, often denoted as theta, and produces a probability density function (PDF) that depends on theta.
The probability density function (PDF) of a distribution is a function that takes a value, x, and returns its relative likelihood (its density) under the distribution. Therefore, likelihood functions are typically expressed as follows:

L(theta; x) = pdf_theta(x)

where pdf_theta is the density function of the distribution defined by the parameters theta.
The value of this function indicates the likelihood of observing x from the distribution defined by the PDF with theta as its parameters.
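As a tiny illustration (using scipy, with made-up numbers): the likelihood of a single observation x = 1.5 under a normal distribution with hypothetical parameters theta = (mu = 0, sigma = 1) is just the normal PDF evaluated at that point:

from scipy import stats

# likelihood of observing x = 1.5 given theta = (mu=0, sigma=1)
likelihood = stats.norm.pdf(1.5, loc=0, scale=1)
print(likelihood)  # roughly 0.13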
The goal
When constructing a forecast model, we have data samples and a parameterized model, and our goal is to estimate the model’s parameters. In our examples, such as Regression and MA models, these parameters are the coefficients in the respective model formulas.
The equivalent in MLE is that we have observations and a PDF for a distribution defined over a set of parameters, theta, which are unknown and not directly observable. Our goal is to estimate theta.
The MLE approach involves finding the set of parameters, theta, that maximizes the likelihood function given the observable data, x:

theta_hat = argmax over theta of L(theta; x)
We assume our samples, x, are drawn from a distribution with a known PDF that depends on a set of parameters, theta. Under the true parameter values, the data we actually observed should be highly likely, so the theta that maximizes the likelihood function evaluated on our samples should be close to the true parameter values.
Conditional likelihood
Notice that we haven’t made any assumptions about the distribution (PDF) on which the likelihood function is based. Now, let’s assume our observation X is a vector (x_1, x_2, …, x_n). We’ll consider a probability function that represents the probability of observing x_n conditional on having already observed (x_1, x_2, …, x_{n-1}):

f(x_n | x_{n-1}, …, x_1; theta)
This represents the likelihood of observing just x_n given the previous values (and theta, the set of parameters). Now, we define the conditional likelihood function as the product of these one-step conditional terms:

L(theta; x_1, …, x_n) = product over t of f(x_t | x_{t-1}, …, x_1; theta)
Later, we will see why it is useful to employ the conditional likelihood function rather than the exact likelihood function.
Log-Likelihood
In practice, it is often convenient to use the natural logarithm of the likelihood function, referred to as the log-likelihood function:

l(theta; x) = ln L(theta; x)
This is more convenient because we often work with a likelihood function that is a joint probability function of independent variables, which translates to the product of each variable’s probability. Taking the logarithm converts this product into a sum.
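To see the whole idea in one place before moving on to MA(1), here is a minimal sketch (not from the original walkthrough) that recovers the mean and standard deviation of a normal distribution by maximizing the log-likelihood numerically; the “true” values 5.0 and 2.0 are made up for the example:

import numpy as np
from scipy import stats
from scipy.optimize import minimize

# Hypothetical "true" parameters we will try to recover
rng = np.random.default_rng(42)
samples = rng.normal(loc=5.0, scale=2.0, size=1000)

def neg_log_likelihood(theta):
    mu, sigma = theta
    # independent samples: the log-likelihood is a sum of log-densities
    return -stats.norm.logpdf(samples, loc=mu, scale=sigma).sum()

# maximizing the log-likelihood = minimizing its negative
result = minimize(neg_log_likelihood, x0=[0.0, 1.0],
                  bounds=[(None, None), (1e-6, None)])
print(result.x)  # should land close to (5.0, 2.0)

Minimizing the negative log-likelihood is the usual trick, since most numerical optimizers minimize rather than maximize.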
Estimating MA(1) with MLE
For simplicity, I’ll demonstrate how to estimate the most basic moving average model, MA(1):

x_t = alpha + beta * epsilon_{t-1} + epsilon_t
Here, x_t represents the time-series observations, alpha and beta are the model parameters to be estimated, and the epsilons are random noise drawn from a normal distribution with zero mean and a standard deviation sigma, which will also be estimated. Therefore, our “theta” is (alpha, beta, sigma), which we aim to estimate.
Let’s define our parameters and generate some synthetic data using Python:
import pandas as pd
import numpy as np

# true parameter values used to simulate the data
STD = 3.3
MEAN = 0
ALPHA = 18
BETA = 0.7
N = 1000

# et: the random shocks epsilon_t
df = pd.DataFrame({"et": np.random.normal(loc=MEAN, scale=STD, size=N)})
# et-1: the previous shock (0 for the first observation)
df["et-1"] = df["et"].shift(1, fill_value=0)
# the MA(1) process: x_t = alpha + beta * epsilon_{t-1} + epsilon_t
df["xt"] = ALPHA + (BETA*df["et-1"]) + df["et"]
Note that we have set the standard deviation of the error distribution to 3.3, with alpha at 18 and beta at 0.7.
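To see what the data looks like, you can plot the simulated column (a quick sketch, assuming matplotlib is installed):

import matplotlib.pyplot as plt

df["xt"].plot(title="Simulated MA(1) series")  # fluctuates around the intercept alpha = 18
plt.show()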
Likelihood function for MA(1)
Our objective is to construct a likelihood function that addresses the question: how likely is it to observe our time series X=(x_1, …, x_n) assuming they are generated by the MA(1) process described earlier?
The challenge in computing this probability lies in the mutual dependence among our samples — as evident from the fact that both x_t and x_{t-1} depend on epsilon_{t-1} — making it non-trivial to determine the joint probability of observing all samples (referred to as the exact likelihood).
Therefore, as discussed previously, instead of computing the exact likelihood, we’ll work with a conditional likelihood. Let’s begin with the likelihood of observing a single sample given all previous samples:

f(x_t | x_{t-1}, …, x_1; alpha, beta, sigma)
This is much simpler to calculate, because once all previous samples are known (and we condition on epsilon_0 = 0), the previous shock epsilon_{t-1} is fully determined through the recursion epsilon_t = x_t - alpha - beta * epsilon_{t-1}. Conditionally, x_t is therefore just normal noise around a known value:

x_t | x_{t-1}, …, x_1 ~ N(alpha + beta * epsilon_{t-1}, sigma^2)
All that remains is to calculate the conditional likelihood of observing all samples:

L(alpha, beta, sigma; x) = product over t = 1..n of (1 / (sigma * sqrt(2*pi))) * exp(-(x_t - alpha - beta * epsilon_{t-1})^2 / (2 * sigma^2))
applying a natural logarithm gives:

l(alpha, beta, sigma; x) = -(n/2) * ln(2*pi*sigma^2) - (1 / (2*sigma^2)) * sum over t = 1..n of (x_t - alpha - beta * epsilon_{t-1})^2
which is the function we should maximize.
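Before wiring this into statsmodels, here is a minimal standalone sketch of that conditional log-likelihood (the function name and the epsilon_0 = 0 convention are my own). Evaluating it on the simulated data at the true parameters is a handy sanity check:

from scipy import stats

def ma1_conditional_loglike(x, alpha, beta, sigma):
    # recover the shocks recursively: e_t = x_t - alpha - beta * e_{t-1}, with e_0 = 0
    e_prev, loglike = 0.0, 0.0
    for x_t in x:
        e_t = x_t - alpha - beta * e_prev
        loglike += stats.norm.logpdf(e_t, scale=sigma)
        e_prev = e_t
    return loglike

# sanity check on the simulated data at the true parameters
print(ma1_conditional_loglike(df["xt"], ALPHA, BETA, STD))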
Code
We’ll utilize the GenericLikelihoodModel class from statsmodels for our MLE estimation implementation. As outlined in the tutorial on statsmodels’ website, we simply need to subclass this class and include our likelihood function calculation:
from scipy import stats
from statsmodels.base.model import GenericLikelihoodModel
import statsmodels.api as sm


class MovingAverageMLE(GenericLikelihoodModel):

    def initialize(self):
        super().initialize()
        # register the extra parameters (beta and the noise std) on top of the intercept
        extra_params_names = ['beta', 'std']
        self._set_extra_params_names(extra_params_names)
        self.start_params = np.array([0.1, 0.1, 0.1])

    def calc_conditional_et(self, intercept, beta):
        # recover the shocks recursively: e_t = x_t - intercept - beta * e_{t-1},
        # conditioning on e_0 = 0
        df = pd.DataFrame({"xt": self.endog})
        ets = [0.0]
        for i in range(1, len(df)):
            ets.append(df.iloc[i]["xt"] - intercept - (beta * ets[i - 1]))
        return ets

    def loglike(self, params):
        # params = [intercept (alpha), beta, std]
        ets = self.calc_conditional_et(params[0], params[1])
        # conditional log-likelihood: sum the log-densities of the recovered shocks
        return stats.norm.logpdf(
            ets,
            scale=params[2],
        ).sum()
The key function to implement is loglike. Given the current parameter values params and the dependent variable (in this case, the time series samples), which is stored as the class member self.endog, it calculates the conditional log-likelihood value we derived earlier.
Now let’s create the model and fit it to our simulated data:
df = sm.add_constant(df) # add intercept for estimation (alpha)
model = MovingAverageMLE(df["xt"], df["const"])
r = model.fit()
r.summary()
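A small usage note: besides r.summary(), the results object also exposes the raw point estimates, which is handy for programmatic checks:

print(r.params)  # order: const (alpha), beta, std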
And that’s it! The fitted summary shows the estimated const (alpha), beta, and std, and MLE successfully recovered values close to the parameters we selected for the simulation: alpha = 18, beta = 0.7, and a noise standard deviation of 3.3.
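As a quick cross-check (not part of the original walkthrough), one could compare these numbers with statsmodels’ built-in ARIMA estimator, since an MA(1) is just an ARIMA(0, 0, 1). The parameter names below (const, ma.L1, sigma2) are what statsmodels reports, with sigma2 being the variance rather than the standard deviation:

from statsmodels.tsa.arima.model import ARIMA

# MA(1) is ARIMA with order (p=0, d=0, q=1)
arima_res = ARIMA(df["xt"], order=(0, 0, 1)).fit()
print(arima_res.params)  # const, ma.L1 and sigma2 (= std**2)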
Wrapping Up
Estimating even a simple MA(1) model with maximum likelihood demonstrates the power of this method, which not only allows us to make efficient use of our data but also provides a solid statistical foundation for understanding and interpreting the dynamics of time series data.
Hope you liked it!