**appelpy: Applied Econometrics Library for Python**

**appelpy** is the *Applied Econometrics Library for Python*. It seeks to bridge the gap between the software options that have a simple syntax (such as Stata) and other powerful options that use Python's object-oriented programming as part of data modelling workflows. ⚗️

Econometric modelling and general regression analysis in Python have never been easier!

The library builds upon the functionality of the 'vanilla' Python data stack (e.g. Pandas and NumPy) and other libraries such as Statsmodels.

## 10 Minutes to Appelpy

Explore the core functionality of Appelpy in the **10 Minutes to Appelpy** notebook, which is available in two forms:

- Interactive experience of the *10 Minutes to Appelpy* tutorial via Binder.
- Static render of the *10 Minutes to Appelpy* notebook.

# Installation

Install the library with pip:

```
pip install appelpy
```

Appelpy supports Python 3.6 and higher.

# Why Appelpy?

## Basic usage

It only takes **one line of code** to fit a basic linear model of 'y on X' and another line to return the model's results.

```
from appelpy.linear_model import OLS
model1 = OLS(df, y_list, X_list).fit() # y_list & X_list contain df columns
model1.results_output # returns (Statsmodels) summary results
```

Model objects have many useful attributes, e.g. the inputs X & y, standardized X and y values, results of fitted models (incl. standardized estimates). The library also has **diagnostic classes and functions** that consume model objects (or else their underlying data).

More results can be obtained in **one line of code**:

- *Diagnostics* can be called from the object, e.g. produce a P-P plot via `model1.diagnostic_plot('pp_plot')`.
- *Model selection statistics*: e.g. find the root mean square error of the model from `model1.model_selection_stats`.
- *Standardized model estimates*: `model1.results_output_standardized`.
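To make two of the quantities above concrete, here is a minimal sketch in plain Python, assuming a single regressor. It does not use Appelpy and is not its implementation: the helper names `simple_ols` and `sample_sd` are hypothetical, and the RMSE here divides by the number of observations, whereas some conventions divide by the residual degrees of freedom.

```python
import math

def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def sample_sd(values):
    """Sample standard deviation with an n - 1 denominator."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

# Illustrative data, not taken from the library or its tutorial
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = simple_ols(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Root mean square error: square root of the mean squared residual
rmse = math.sqrt(sum(e ** 2 for e in residuals) / len(residuals))

# Standardized (beta) coefficient: the slope rescaled by sd(x) / sd(y)
beta_std = slope * sample_sd(x) / sample_sd(y)
```

With one regressor, the standardized coefficient equals the Pearson correlation between `x` and `y`, which is a handy sanity check.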

Classes in the library have a fluent interface, so that they can be instantiated and have chained methods in one line of code.
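The fluent-interface pattern can be sketched in a few lines. `MeanModel` below is a hypothetical toy class, not part of Appelpy; it only shows how returning `self` from a method lets instantiation and method calls be chained.

```python
class MeanModel:
    """Hypothetical toy model (not an Appelpy class): predicts the mean of y."""

    def __init__(self, y):
        self.y = list(y)
        self.is_fitted = False

    def fit(self):
        self.mean_ = sum(self.y) / len(self.y)
        self.is_fitted = True
        return self  # returning self is what makes chaining possible

    def predict(self, n):
        """Return the fitted mean repeated n times."""
        return [self.mean_] * n

# Instantiate and fit in one line
model = MeanModel([1.0, 2.0, 3.0]).fit()

# Instantiate, fit and predict in one chained expression
predictions = MeanModel([1.0, 2.0, 3.0]).fit().predict(2)
```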

## Features that add value to model workflows in Python

See Appelpy's **key features** (with images), which add *so much more* to the vanilla Python data stack, e.g.:

- Fluent interface and API design make it easier to build pipelines for modelling & data pre-processing.
- More accessible interface for Stata users, while utilising the benefits of object-oriented programming.
- One method for calling **diagnostic plots** to assess whether OLS assumptions hold in a model.
- **Useful encoders** for transforming datasets, e.g. `DummyEncoder` and `InteractionEncoder`.
- Standardized model estimates (Beta coefficients).
- Decomposition of influence analysis into three parts: leverage, outlier and influence measures.
- Identify extreme observations in a model based on common heuristics.
- **Perform diagnostics not implemented in the main Python libraries**, e.g. studentized Breusch–Pagan test of heteroskedasticity.
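As an illustration of the last item, the studentized (Koenker) variant of the Breusch–Pagan test is commonly computed as n times the R² from regressing the squared OLS residuals on the regressors. The sketch below assumes a single regressor so that it needs only the standard library. The function names are hypothetical and this is not Appelpy's code.

```python
import math

def fit_line(x, y):
    """Least-squares intercept and slope for y on x (hypothetical helper)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def r_squared(x, y):
    """R-squared of the regression of y on a constant and x."""
    intercept, slope = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def studentized_bp(x, y):
    """Return (LM statistic, p-value) for one regressor, i.e. 1 degree of freedom."""
    intercept, slope = fit_line(x, y)
    squared_resid = [(yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)]
    lm = len(x) * r_squared(x, squared_resid)
    # Chi-squared survival function with 1 df: P(X > lm) = erfc(sqrt(lm / 2))
    p_value = math.erfc(math.sqrt(max(lm, 0.0) / 2.0))
    return lm, p_value

# Illustrative data whose noise grows with x (heteroskedastic by construction)
x = [float(i) for i in range(1, 11)]
y = [2.0 * xi + 0.3 * xi * (1 if i % 2 == 0 else -1) for i, xi in enumerate(x)]

lm_stat, p_value = studentized_bp(x, y)
```

A small p-value is evidence against the null hypothesis of homoskedasticity. With more regressors the auxiliary regression becomes a multiple regression and the degrees of freedom equal the number of regressors.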

# Modules

## Exploration and pre-processing

- `eda`: functions for exploratory data analysis (EDA) of datasets, e.g. `statistical_moments` for obtaining mean, variance, skewness and kurtosis of all numeric columns.
- `utils`: classes and functions for data pre-processing, e.g. encoding of interaction effects and dummy variables in datasets.
  - `DummyEncoder`: encode dummy variables in a dataset based on different policies for dealing with NaN values.
  - `InteractionEncoder`: encode interaction effects of variables in a dataset.
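The two encoding ideas can be sketched without any library. The functions, policy names and column-naming scheme below are hypothetical and chosen for illustration; they are not the signatures or options of `DummyEncoder` and `InteractionEncoder`. Missing values are represented by `None` to keep the example simple.

```python
def encode_dummies(name, values, nan_policy="zeros"):
    """Return {column_name: [0/1, ...]} with one column per category.

    Hypothetical policies for missing values (None):
      "zeros"     - a missing value gets 0 in every dummy column
      "indicator" - as "zeros", plus an extra <name>_nan column
      "propagate" - a missing value stays missing in every dummy column
    """
    categories = sorted({v for v in values if v is not None})
    columns = {}
    for cat in categories:
        columns[f"{name}_{cat}"] = [
            None if (v is None and nan_policy == "propagate") else int(v == cat)
            for v in values
        ]
    if nan_policy == "indicator":
        columns[f"{name}_nan"] = [int(v is None) for v in values]
    return columns

def encode_interaction(name_a, col_a, name_b, col_b):
    """Return the element-wise product of two columns as a new named column."""
    product = [
        None if (a is None or b is None) else a * b
        for a, b in zip(col_a, col_b)
    ]
    return {f"{name_a}_x_{name_b}": product}

# Illustrative data
region = ["north", "south", None, "north"]
income = [10.0, 20.0, 30.0, 40.0]

dummies = encode_dummies("region", region, nan_policy="indicator")
propagated = encode_dummies("region", region, nan_policy="propagate")
interaction = encode_interaction(
    "region_north", dummies["region_north"], "income", income
)
```

Here `dummies` contains `region_north`, `region_south` and `region_nan` columns, and `interaction` holds the income values for northern rows and zero elsewhere.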

## Model fitting

- `linear_model`: classes for linear models such as Ordinary Least Squares (OLS) and Weighted Least Squares (WLS).
- `discrete_model`: classes for discrete choice models, e.g. logistic regression (Logit).

## Model diagnostics

- `diagnostics`:
  - `BadApples`: class for inspecting observations that could 'stink up' a model, i.e. the observations that are outliers, high-leverage points or else have high influence in a model.
  - `variance_inflation_factors`: function that returns variance inflation factor (VIF) scores for regressors in a dataset.
  - `partial_regression_plot`: also known as 'added variable plot'. Examine the effect of adding a regressor to a model.
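To give a sense of the kind of check described for `BadApples`, the sketch below flags high-leverage points in a simple regression with one regressor. It assumes the common rule of thumb that leverage above 2p/n is high, where p counts the estimated parameters including the intercept. The function name is hypothetical, and the threshold is an assumption rather than a statement of what the library uses.

```python
def leverage_simple_regression(x):
    """Hat values for a regression of y on a constant and x.

    For one regressor: h_i = 1/n + (x_i - mean(x))^2 / sum_j (x_j - mean(x))^2
    """
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]

# Illustrative regressor values; the last one lies far from the rest
x = [1.0, 2.0, 3.0, 4.0, 20.0]

leverage = leverage_simple_regression(x)

# Rule-of-thumb threshold 2p/n with p = 2 parameters (intercept and slope)
threshold = 2 * 2 / len(x)
high_leverage = [i for i, h in enumerate(leverage) if h > threshold]
```

The hat values always sum to the number of parameters (2 here), and only the last observation exceeds the threshold in this example.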