Skillmodels Quickstart¶
[1]:
from skillmodels.likelihood_function import get_maximization_inputs
from skillmodels.config import TEST_DIR
import pandas as pd
import numpy as np
import yaml
from time import time
from estimagic import maximize
Loading Model Specification and Data¶
Model specifications are Python dictionaries that can be saved in yaml or json files. For the moment, just assume you know how to write a model specification and have a skillmodels-compatible dataset. Both are explained in separate tutorials.
Next we load the model specification and the dataset.
[2]:
with open(TEST_DIR / "model2.yaml") as y:
model_dict = yaml.load(y, Loader=yaml.SafeLoader)
[3]:
data = pd.read_stata(TEST_DIR / "model2_simulated_data.dta")
data.set_index(["caseid", "period"], inplace=True)
Getting the inputs for estimagic.maximize¶
Skillmodels basically has just one public function, called get_maximization_inputs. When called with a model specification and a dataset, it returns a dictionary with everything you need to maximize the likelihood function using estimagic.
By everything you need I mean everything model specific. You should still use the optional arguments of maximize to tune the optimization.
[4]:
max_inputs = get_maximization_inputs(model_dict, data)
/home/janos/anaconda3/envs/skillmodels/lib/python3.7/site-packages/jax/lib/xla_bridge.py:130: UserWarning: No GPU/TPU found, falling back to CPU.
warnings.warn('No GPU/TPU found, falling back to CPU.')
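For orientation, you can inspect the keys of the returned dictionary. The entries used in this notebook are params_template, constraints, debug_loglike, loglike, gradient and loglike_and_gradient; your version of skillmodels may include more:
sorted(max_inputs)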
Filling the Params Template¶
Often you can greatly reduce estimation time by choosing good start parameters. What good start parameters are depends strongly on the model specification, the scaling of your variables and the normalizations you make.
If you have serious difficulties picking good start values, you probably want to think again about the interpretability of your model parameters and possibly change the normalizations and scaling of your measurements.
As a rule of thumb: if all measurements are standardized, all fixed loadings are 1 and all fixed intercepts are 0, then 1 is a good start value for all free loadings and 0 is a good start value for all free intercepts. Measurement and shock standard deviations are best started slightly larger than you expect them to be (see the sketch after the params template below).
Below I just load start parameters for the CHS example model that I filled out manually.
[5]:
params_template = max_inputs["params_template"]
params_template.head()
[5]:
                                value lower_bound
category period name1 name2
controls 0      y1    constant    NaN        -inf
                      x1          NaN        -inf
                y2    constant    NaN        -inf
                      x1          NaN        -inf
                y3    constant    NaN        -inf
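For illustration, here is a minimal sketch of how the template could be filled according to the rule of thumb above. The category names "loadings", "meas_sds" and "shock_sds" are assumptions about the template's index; check your own params_template before relying on them.
# Minimal sketch; assumes the index categories "loadings", "meas_sds"
# and "shock_sds" exist in your template. Adapt the names as needed.
naive_params = params_template.copy()
naive_params["value"] = 0.0                   # 0 as default, e.g. for free intercepts
naive_params.loc["loadings", "value"] = 1.0   # 1 for all free loadings
naive_params.loc["meas_sds", "value"] = 1.0   # start SDs slightly larger than expected
naive_params.loc["shock_sds", "value"] = 1.0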
[6]:
index_cols = ["category", "period", "name1", "name2"]
chs_path = TEST_DIR / "regression_vault" / "chs_results.csv"
chs_values = pd.read_csv(chs_path)
chs_values.set_index(index_cols, inplace=True)
chs_values = chs_values[["chs_value", "good_start_value", "bad_start_value"]]
chs_values.head()
[6]:
                                chs_value  good_start_value  bad_start_value
category period name1 name2
controls 0      y1    constant   1.001618               1.0              0.0
                      x1         1.005455               1.0              0.0
                y2    constant   1.031439               1.0              0.0
                      x1         0.975992               1.0              0.0
                y3    constant   0.994091               1.0              0.0
[7]:
params = params_template.copy()
params["value"] = chs_values["chs_value"]
params.head()
[7]:
                                   value lower_bound
category period name1 name2
controls 0      y1    constant  1.001618        -inf
                      x1        1.005455        -inf
                y2    constant  1.031439        -inf
                      x1        0.975992        -inf
                y3    constant  0.994091        -inf
Timing compilation and evaluation speed¶
Skillmodels uses jax to just-in-time compile the numerical code and to get a gradient of the likelihood function by automatic differentiation.
This has the downside that it does not work on Windows and that there is a relatively long compile time before the first evaluation of a model. If you just want to check quickly whether your start parameters produce a valid likelihood, use the debug_loglike described below, which avoids the long compilation.
[8]:
debug_loglike = max_inputs["debug_loglike"]
loglike = max_inputs["loglike"]
gradient = max_inputs["gradient"]
loglike_and_gradient = max_inputs["loglike_and_gradient"]
[9]:
start = time()
debug_loglike_value = debug_loglike(params)
print(time() - start)
debug_loglike_value
8.9623384475708
[9]:
{'pre_update_states': fac1 fac2 fac3 mixture period measurement id
0 0.000000 0.000000 0.000000 0 0 y1 0
1 0.000000 0.000000 0.000000 0 0 y1 1
2 0.000000 0.000000 0.000000 0 0 y1 2
3 0.000000 0.000000 0.000000 0 0 y1 3
4 0.000000 0.000000 0.000000 0 0 y1 4
... ... ... ... ... ... ... ...
3995 -0.393568 -0.260489 -0.304120 0 7 Q1_fac1 3995
3996 0.111399 0.356767 0.299648 0 7 Q1_fac1 3996
3997 -0.321585 -0.549056 0.511860 0 7 Q1_fac1 3997
3998 0.293930 0.524977 -0.274992 0 7 Q1_fac1 3998
3999 -0.216362 0.023973 0.063563 0 7 Q1_fac1 3999
[236000 rows x 7 columns],
'post_update_states': fac1 fac2 fac3 mixture period id measurement
0 0.111871 0.003323 0.003591 0 0 0 y1
1 0.376414 0.011179 0.012084 0 0 1 y1
2 0.174989 0.005197 0.005618 0 0 2 y1
3 0.095859 0.002847 0.003077 0 0 3 y1
4 -0.311033 -0.009238 -0.009985 0 0 4 y1
... ... ... ... ... ... ... ...
3995 -0.015864 -0.255050 -0.288638 0 7 3995 Q1_fac1
3996 0.247573 0.359324 0.305756 0 7 3996 Q1_fac1
3997 -0.591241 -0.553675 0.501790 0 7 3997 Q1_fac1
3998 0.360258 0.525979 -0.271412 0 7 3998 Q1_fac1
3999 -0.288590 0.022965 0.060775 0 7 3999 Q1_fac1
[236000 rows x 7 columns],
'filtered_states': fac1 fac2 fac3 mixture period id
0 0.186376 0.091633 -0.106732 0 0 0
1 0.245693 -0.280525 0.285942 0 0 1
2 0.494929 0.037534 -0.458473 0 0 2
3 0.558594 0.067168 0.170920 0 0 3
4 0.084291 0.250471 0.466898 0 0 4
... ... ... ... ... ... ...
3995 -0.393568 -0.260489 -0.304120 0 7 3995
3996 0.111399 0.356767 0.299648 0 7 3996
3997 -0.321585 -0.549056 0.511860 0 7 3997
3998 0.293930 0.524977 -0.274992 0 7 3998
3999 -0.216362 0.023973 0.063563 0 7 3999
[32000 rows x 6 columns],
'state_ranges': {'fac1': minimum maximum
period
0 -1.286306 1.281440
1 -1.747732 1.704974
2 -1.956092 1.691804
3 -1.608079 1.783263
4 -1.835516 1.710544
5 -1.806694 1.614255
6 -1.728938 1.469292
7 -1.735790 1.567366, 'fac2': minimum maximum
period
0 -1.008363 1.138972
1 -1.308280 1.440339
2 -1.625487 1.482808
3 -1.539565 1.346799
4 -1.514808 1.541269
5 -1.377056 1.651247
6 -1.640329 1.301055
7 -1.422660 1.472981, 'fac3': minimum maximum
period
0 -1.243904 1.258522
1 -1.253434 1.257631
2 -1.265532 1.261948
3 -1.264254 1.254055
4 -1.264228 1.277353
5 -1.277315 1.293874
6 -1.288693 1.284524
7 -1.297715 1.306730},
'residuals': residual mixture period id measurement
0 0.431990 0 0 0 y1
1 1.453522 0 0 1 y1
2 0.675720 0 0 2 y1
3 0.370159 0 0 3 y1
4 -1.201056 0 0 4 y1
... ... ... ... ... ...
3995 2.461561 0 7 3995 Q1_fac1
3996 0.889661 0 7 3996 Q1_fac1
3997 -1.758525 0 7 3997 Q1_fac1
3998 0.433416 0 7 3998 Q1_fac1
3999 -0.470317 0 7 3999 Q1_fac1
[236000 rows x 5 columns],
'residual_sds': residual mixture period id measurement
0 0.825500 0 0 0 y1
1 0.825500 0 0 1 y1
2 0.825500 0 0 2 y1
3 0.825500 0 0 3 y1
4 0.825500 0 0 4 y1
... ... ... ... ... ...
3995 0.797613 0 7 3995 Q1_fac1
3996 0.797394 0 7 3996 Q1_fac1
3997 0.797556 0 7 3997 Q1_fac1
3998 0.797379 0 7 3998 Q1_fac1
3999 0.797690 0 7 3999 Q1_fac1
[236000 rows x 5 columns],
'all_contributions': contribution measurement period id
0 -0.864098 y1 0 0
1 -2.277343 y1 0 1
2 -1.062191 y1 0 2
3 -0.827706 y1 0 3
4 -1.785603 y1 0 4
... ... ... ... ...
3995 -5.454991 Q1_fac1 7 3995
3996 -1.314937 Q1_fac1 7 3996
3997 -3.123508 Q1_fac1 7 3997
3998 -0.840237 Q1_fac1 7 3998
3999 -0.866717 Q1_fac1 7 3999
[236000 rows x 4 columns],
'value': -251177.3089520689,
'contributions': array([-65.88419161, -57.35209259, -63.66290665, ..., -62.21729575,
-64.62474922, -61.60588466])}
[10]:
start = time()
loglike_value = loglike(params)
print(time() - start)
loglike_value
54.89086103439331
[10]:
{'contributions': array([-65.88419161, -57.35209259, -63.66290665, ..., -62.21729575,
-64.62474922, -61.60588466]), 'value': -251177.3089520689}
[11]:
%timeit loglike(params)
173 ms ± 3.75 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
[12]:
start = time()
gradient_value = gradient(params)
print(time() - start)
1000.3803389072418
[13]:
%timeit gradient(params)
614 ms ± 14.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
[14]:
start = time()
loglike_and_gradient_value = loglike_and_gradient(params)
print(time() - start)
0.6180074214935303
[15]:
%timeit loglike_and_gradient(params)
618 ms ± 19.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Insights¶
The debug_loglike
takes a few seconds for one evaluation. The first and all subsequent evaluations have the same speed.
You should use this if you want to quickly find out if your start parameters produce a valid likelihood or want to debug why it does not. As the name suggests, you can step through the debug_loglike
with a debugger. Moreover, the result of debug_loglike
containts intermediate results like the filtered states, Kalman residuals and others. All intermediate results are tidy DataFrames that make it easy to plot things. Besides debugging purposes, you can use the debug_loglike
to create
inputs for plotting functions.
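For example, since the filtered states come as a tidy DataFrame, plotting their means per period takes only a few lines. A minimal sketch using matplotlib, which is not imported above:
import matplotlib.pyplot as plt

filtered = debug_loglike_value["filtered_states"]
# Average filtered value of each factor per period.
filtered.groupby("period")[["fac1", "fac2", "fac3"]].mean().plot()
plt.show()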
The jitted loglike takes long for the first evaluation (on my machine about 50 seconds), but under 200 milliseconds for all subsequent evaluations.
The gradient or loglike_and_gradient (whichever you call first) takes about 15 minutes to compile, but under 700 milliseconds for all subsequent evaluations. In particular, evaluating just the gradient and evaluating loglike_and_gradient take exactly the same time! If you have an optimizer that evaluates criterion and gradient simultaneously, use it, as we do below by passing loglike_and_gradient as criterion_and_derivative.
A few additional constraints¶
To get the same values as CHS, we have to do a little more work. The reason is that, on top of the many constraints skillmodels generates automatically from the model specification, CHS impose three more constraints:
- All but the self-productivity parameter in the linear transition equation are fixed to zero.
- The initial mean of the states is not estimated but assumed to be zero.
- The anchoring parameters (intercepts, controls, loadings and measurement error SDs) are pairwise equal across periods.
Fortunately, estimagic makes it easy to express such constraints:
[16]:
constraints = max_inputs["constraints"]
additional_constraints = [
{"query": "category == 'transition' & name1 == 'fac2' & name2 != 'fac2'",
"type": "fixed", "value": 0},
{"loc": "initial_states", "type": "fixed", "value": 0},
{"queries": [f"period == {i} & name1 == 'Q1_fac1'" for i in range(8)],
"type": "pairwise_equality"}
]
[17]:
constraints = constraints + additional_constraints
Generating a group column for better dashboard output¶
[18]:
from estimagic.optimization.process_constraints import process_constraints
pc, pp = process_constraints(constraints, params)
params["group"] = params.index.get_level_values("category")
params.loc["controls", "group"] = params.loc["controls"].index.get_level_values("name2")
params["group"] = params["group"].astype(str) + "_" + params.index.get_level_values("period").astype(str)
params["group"] = params["group"].str.replace("_", "-")
params["group"] = params["group"].astype("O")
params.loc[~pp["_internal_free"], "group"] = None
params
/home/janos/anaconda3/envs/skillmodels/lib/python3.7/site-packages/IPython/core/interactiveshell.py:2848: PerformanceWarning: indexing past lexsort depth may impact performance.
raw_cell, store_history, silent, shell_futures)
[18]:
                                      value lower_bound       group
category   period name1 name2
controls   0      y1    constant   1.001618        -inf  constant-0
                        x1         1.005455        -inf        x1-0
                  y2    constant   1.031439        -inf  constant-0
                        x1         0.975992        -inf        x1-0
                  y3    constant   0.994091        -inf  constant-0
...                                     ...         ...         ...
transition 6      fac1  phi       -0.407018        -inf        None
                  fac2  fac1       0.000000        -inf        None
                        fac2       0.608871        -inf        None
                        fac3       0.000000        -inf        None
                        constant   0.000000        -inf        None

441 rows × 3 columns
Estimating the model¶
[19]:
params["value"] = chs_values["good_start_value"]
loc = params.query("category == 'shock_sds' & name1 == 'fac3'").index
params.loc[loc, "lower_bound"] = 0.00
loglike(params)
/home/janos/anaconda3/envs/skillmodels/lib/python3.7/site-packages/IPython/core/interactiveshell.py:2848: PerformanceWarning: indexing past lexsort depth may impact performance.
raw_cell, store_history, silent, shell_futures)
[19]:
{'contributions': array([-65.78459637, -56.84181997, -64.07640541, ..., -61.3151648 ,
-64.35146984, -61.33764354]), 'value': -249617.99170565547}
[32]:
res = maximize(
criterion=loglike,
params=params,
algorithm="scipy_lbfgsb",
criterion_and_derivative=loglike_and_gradient,
logging="model2.db",
constraints=constraints,
log_options={"if_exists": "replace"},
algo_options={"relative_criterion_tolerance": 1e-9}
)
[21]:
res["message"]
[21]:
b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
Inference Using Estimagic¶
[22]:
from estimagic.inference.likelihood_inference import do_likelihood_inference
[24]:
inference_df, free_cov = do_likelihood_inference(
loglike=loglike,
params=res["solution_params"],
constraints=constraints,
)
[25]:
inference_df = inference_df.loc[free_cov.index]
[30]:
inference_df["correct"] = chs_values.loc[free_cov.index, "chs_value"]
[31]:
inference_df
[31]:
                                                value  standard_error        p_value  ci_lower  ci_upper stars   correct
category         period name1     name2
controls         0      y1        constant   1.002213        0.026971  3.050083e-302  0.949350  1.055075   ***  1.001618
                                  x1         1.005312        0.046886  5.360805e-102  0.913416  1.097209   ***  1.005455
                        y2        constant   1.032298        0.028347  2.284859e-290  0.976738  1.087858   ***  1.031439
                                  x1         0.975610        0.048971   2.562509e-88  0.879627  1.071593   ***  0.975992
                        y3        constant   0.995045        0.031394  1.742298e-220  0.933513  1.056576   ***  0.994091
...                                               ...             ...            ...       ...       ...   ...       ...
initial_cholcovs 0      mixture_0 fac3-fac3  0.450412        0.016153  4.031445e-171  0.418752  0.482071   ***  0.481416
transition       0      fac1      fac1       0.587785        0.009386   0.000000e+00  0.569389  0.606181   ***  0.659788
                                  fac2       0.218174        0.007976  9.757881e-165  0.202540  0.233807   ***  0.174038
                                  phi       -0.374843        0.097642   1.210932e-04 -0.566221 -0.183464   *** -0.407018
                        fac2      fac2       0.612401        0.011861   0.000000e+00  0.589153  0.635649   ***  0.608871

203 rows × 7 columns
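As a quick sanity check (not part of the original comparison), you can look at the largest absolute deviation between the estimates and the CHS values:
# Maximum absolute deviation between our estimates and the CHS results.
(inference_df["value"] - inference_df["correct"]).abs().max()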