Modules Related to Simulation
Simulating a Dataset
Functions to simulate a dataset generated by a latent factor model.
- add_missings(data, meas_names, p_b, p_r)
Add np.nans to data.
nans are only added to measurements, not to control variables or factors.
Note that p_b is NOT the marginal probability of a measurement being missing. The marginal probability is given by p_m = p_b / (1 - serial_corr), where serial_corr = p_r - p_b is in general != 0 because p_r != p_b. When p_r > p_b, the average share of missing values in the entire dataset is therefore larger than p_b. Thus, p_b and p_r should be chosen according to the desired share of missing values.
- Parameters
data (pd.DataFrame) – contains the observable part of a simulated dataset
meas_names (list) – list of strings of names of each measurement variable
p_b (float) – probability that an observed measurement becomes missing in the next period
p_r (float) – probability that a missing measurement remains missing in the next period
- Returns
data_with_missings (pd.DataFrame) – Dataset with a share of measurements replaced by np.nan values
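The relation between p_b, p_r and the overall share of missing values can be checked numerically. The following is a minimal sketch, not the implementation of add_missings: all three helper functions are hypothetical, and the simulation assumes that every series starts out observed and that missingness follows a two-state Markov chain with the transition probabilities described above.

```python
import numpy as np


def stationary_missing_share(p_b, p_r):
    """Long-run share of missing values implied by p_b and p_r."""
    serial_corr = p_r - p_b
    return p_b / (1 - serial_corr)


def p_b_for_target_share(target_share, p_r):
    """Solve p_m = p_b / (1 - (p_r - p_b)) for p_b, given p_m and p_r."""
    return target_share * (1 - p_r) / (1 - target_share)


def simulate_missing_indicator(n_periods, p_b, p_r, rng):
    """Simulate one chain of missingness indicators (hypothetical helper).

    Assumption: the first period is always observed.
    """
    uniforms = rng.random(n_periods)
    missing = np.zeros(n_periods, dtype=bool)
    for t in range(1, n_periods):
        # Stay missing with probability p_r, become missing with probability p_b.
        prob = p_r if missing[t - 1] else p_b
        missing[t] = uniforms[t] < prob
    return missing


rng = np.random.default_rng(0)
p_b, p_r = 0.1, 0.5
share = stationary_missing_share(p_b, p_r)
simulated = simulate_missing_indicator(200_000, p_b, p_r, rng).mean()
recovered_p_b = p_b_for_target_share(share, p_r)
```

With p_b = 0.1 and p_r = 0.5 the implied share is 0.1 / 0.6, noticeably larger than p_b. The second helper inverts the formula, which may be convenient when a target share of missing values is given and p_r is fixed.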
- simulate_datasets(labels, dimensions, n_obs, params, parsing_info, update_info, control_means, dist_arg_dict, dist_name='multivariate_normal', policies=None)
Simulate datasets generated by a latent factor model.
- Parameters
labels (dict) – Dict of lists with labels for the model quantities like factors, periods, controls, stagemap and stages. See labels.
dimensions (dict) – Dimensional information like n_states, n_periods, n_controls, n_mixtures. See dimensions.
n_obs (int) – number of observations.
params (jax.numpy.array) – 1d array with model parameters
parsing_info (dict) – Dictionary with information on how the parameters have to be parsed.
control_means (array) – 1d array with initial means of the control variables
dist_name (string) – the elliptical distribution to use in the mixture
dist_arg_dict (list) – list of length n_mixtures of dictionaries with the relevant arguments of the mixture distributions. Arguments with default values should NOT be included in the dictionaries. Lengths of arrays in the arguments should be in accordance with n_states + n_controls. The key names of dist_arg_dict can be looked up in the module elliptical_distributions. For multivariate_normal, they are mean and cov.
policies (list) – list of dictionaries. Each dictionary specifies a stochastic shock to a latent factor AT THE END of “period” for “factor” with mean “effect_size” and “standard deviation”.
- Returns
observed_data (pd.DataFrame) – Dataset with measurements and control variables in long format
latent_data (pd.DataFrame) – Dataset with latent factors in long format
- generate_start_state_and_control_variables_elliptical(n_obs, dimensions, dist_name, dist_arg_dict, weights)
Draw initial states and control variables from a (mixture of) normals.
- Parameters
n_obs (int) – number of observations
dimensions (dict) – Dimensional information like n_states, n_periods, n_controls, n_mixtures. See dimensions.
dist_name (string) – the elliptical distribution to use in the mixture
dist_arg_dict (list of dict) – list of length n_mixtures of dictionaries with the relevant arguments of the mixture distributions. Arguments with default values should NOT be included in the dictionaries. Lengths of arrays in the arguments should be in accordance with n_states + n_controls.
weights (np.ndarray) – size (n_mixtures). The weight of each mixture element. Default value is equal to 1.
- Returns
start_states (np.ndarray) – shape (n_obs, n_states)
controls (np.ndarray) – shape (n_obs, n_controls)
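To illustrate the expected structure of dist_arg_dict and weights, here is a minimal sketch of drawing from a mixture of normals with plain NumPy. The function draw_from_normal_mixture is hypothetical and is not the library's implementation. It assumes that each drawn vector stacks the n_states factors first and the n_controls control variables second, which the description above does not state.

```python
import numpy as np


def draw_from_normal_mixture(n_obs, n_states, n_controls, dist_arg_dict, weights, seed=0):
    """Draw start states and controls from a mixture of normals (sketch)."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    n_mixtures = len(dist_arg_dict)

    # Assign each observation to one mixture element according to the weights.
    component = rng.choice(n_mixtures, size=n_obs, p=weights / weights.sum())

    draws = np.empty((n_obs, n_states + n_controls))
    for k, args in enumerate(dist_arg_dict):
        mask = component == k
        draws[mask] = rng.multivariate_normal(
            args["mean"], args["cov"], size=int(mask.sum())
        )

    # Assumed ordering: factors first, control variables second.
    return draws[:, :n_states], draws[:, n_states:]


# Two mixture elements, two factors and one control variable.
dist_arg_dict = [
    {"mean": np.zeros(3), "cov": np.eye(3)},
    {"mean": np.ones(3), "cov": 0.5 * np.eye(3)},
]
start_states, controls = draw_from_normal_mixture(
    n_obs=500, n_states=2, n_controls=1,
    dist_arg_dict=dist_arg_dict, weights=[0.7, 0.3],
)
```

Each dictionary contains only mean and cov, and both have length n_states + n_controls = 3, in line with the parameter description.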
- next_period_states(states, transition_names, transition_params, shock_sds)
Apply transition function to factors and add shocks.
- Parameters
states (np.ndarray) – shape (n_obs, n_states)
transition_names (list) – list of strings with the names of the transition function of each factor.
transition_params (list) – list of dictionaries of length n_states with the arguments for the transition function of each factor. A detailed description of the arguments of transition functions can be found in the module docstring of skillmodels.model_functions.transition_functions.
shock_sds (np.ndarray) – numpy array of length n_states.
- Returns
next_factors (np.ndarray) – shape (n_obs, n_states)
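The two steps, applying one transition function per factor and then adding shocks, can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: the transition functions are passed as callables instead of names, linear_transition and its arguments coeffs and constant are hypothetical, and the shocks are assumed to be independent normal draws scaled by shock_sds.

```python
import numpy as np


def linear_transition(states, coeffs, constant):
    """Hypothetical linear transition function for a single factor."""
    return states @ coeffs + constant


def next_states_sketch(states, transition_funcs, transition_params, shock_sds, rng):
    """Apply one transition function per factor, then add shocks (sketch)."""
    n_obs, n_states = states.shape
    next_states = np.empty((n_obs, n_states))
    for f, (func, kwargs) in enumerate(zip(transition_funcs, transition_params)):
        # Each factor's next value may depend on all current factors.
        next_states[:, f] = func(states, **kwargs)

    # Assumption: independent normal shocks with factor-specific sds.
    shocks = rng.normal(size=(n_obs, n_states)) * np.asarray(shock_sds)
    return next_states + shocks


rng = np.random.default_rng(0)
states = np.ones((4, 2))
transition_params = [
    {"coeffs": np.array([0.5, 0.5]), "constant": 0.0},
    {"coeffs": np.array([0.0, 1.0]), "constant": 1.0},
]
next_factors = next_states_sketch(
    states, [linear_transition, linear_transition], transition_params,
    shock_sds=np.zeros(2), rng=rng,
)
```

Setting shock_sds to zero, as in the usage lines, makes the result deterministic, which is handy for checking a transition function in isolation.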
- measurements_from_states(states, controls, loadings, control_params, sds)
Generate the variables that would be observed in practice.
This generates the data for only one period. Let n_meas be the number of measurements in that period.
- Parameters
states (pd.DataFrame or np.ndarray) – DataFrame of shape (n_obs, n_states)
controls (pd.DataFrame or np.ndarray) – DataFrame of shape (n_obs, n_controls)
loadings (np.ndarray) – numpy array of size (n_meas, n_states)
control_params (np.ndarray) – numpy array of size (n_meas, n_controls)
sds (np.ndarray) – numpy array of size (n_meas) with the standard deviations of the measurements. Measurement error is assumed to be independent across measurements.
- Returns
measurements (np.ndarray) – array of shape (n_obs, n_meas) with measurements.
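The shapes listed above fit a linear measurement equation. The sketch below shows one way such measurements could be generated for a single period. It is not the library's implementation: measurements_sketch is a hypothetical name, control_params is assumed to have shape (n_meas, n_controls), and the measurement errors are assumed to be independent normal draws.

```python
import numpy as np


def measurements_sketch(states, controls, loadings, control_params, sds, rng):
    """Generate measurements for one period from a linear equation (sketch)."""
    states = np.asarray(states)
    controls = np.asarray(controls)
    n_obs = states.shape[0]
    n_meas = loadings.shape[0]

    # Independent errors, scaled by measurement-specific standard deviations.
    errors = rng.normal(size=(n_obs, n_meas)) * np.asarray(sds)
    return states @ loadings.T + controls @ control_params.T + errors


rng = np.random.default_rng(0)
n_obs, n_states, n_controls, n_meas = 5, 2, 1, 3
states = np.ones((n_obs, n_states))
controls = np.full((n_obs, n_controls), 2.0)
loadings = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
control_params = np.array([[0.5], [0.0], [1.0]])

# With zero standard deviations the measurements are exact.
meas = measurements_sketch(
    states, controls, loadings, control_params, sds=np.zeros(n_meas), rng=rng
)
```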
Sampling from Elliptical Distributions
Sample from elliptical distributions.
- Contains functions for simulating random vectors of arbitrary size from:
multivariate student’s t
multivariate symmetric stable (based on Nolan (2018) and Nolan (2013))
multivariate normal (calls the implementation in np.random so that it can be looked up with getattr() in simulate_data)
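As an illustration of sampling from an elliptical distribution, a multivariate Student's t vector can be built from a multivariate normal draw and an independent chi-square draw. The sketch below uses this textbook construction. The function name and its arguments are hypothetical, and the module's own sampler may have a different signature.

```python
import numpy as np


def multivariate_t_sketch(mean, shape_matrix, df, size, rng):
    """Draw `size` vectors from a multivariate Student's t (sketch).

    Uses X = mean + Z * sqrt(df / W), where Z is a zero-mean multivariate
    normal with covariance `shape_matrix` and W is chi-square with `df`
    degrees of freedom.
    """
    mean = np.asarray(mean, dtype=float)
    dim = len(mean)
    normals = rng.multivariate_normal(np.zeros(dim), shape_matrix, size=size)
    chi2 = rng.chisquare(df, size=size)
    scale = np.sqrt(df / chi2)
    # One common scale factor per draw, applied to all dimensions.
    return mean + normals * scale[:, None]


rng = np.random.default_rng(0)
draws = multivariate_t_sketch(
    mean=[0.0, 1.0, 2.0], shape_matrix=np.eye(3), df=10, size=20_000, rng=rng
)
```

The same scale factor is applied to every dimension of a draw, which is what makes the resulting distribution elliptical rather than a product of independent univariate t distributions.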