Modules Related to Simulation
Simulating a Dataset
Functions to simulate a dataset generated by a latent factor model.
- add_missings(data, meas_names, p_b, p_r)
Add np.nan values to data.
NaNs are only added to measurements, not to control variables or factors.
Note that p_b is NOT the marginal probability of a measurement being missing. The marginal probability is given by p_m = p_b / (1 - serial_corr), where serial_corr = p_r - p_b is in general nonzero, since p_r != p_b. This means that, on average, the share of missing values in the entire dataset will be larger than p_b whenever p_r > p_b. Thus, p_b and p_r should be set according to the desired share of missing values.
- Parameters
data (pd.DataFrame) – contains the observable part of a simulated dataset
meas_names (list) – list of strings of names of each measurement variable
p_b (float) – probability that a measurement becomes missing
p_r (float) – probability that a missing measurement remains missing in the next period
- Returns
data_with_missings (pd.DataFrame) – Dataset with a share of measurements replaced by np.nan values
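The relation between p_b, p_r and the long-run share of missing values can be checked with a small sketch. It assumes that missingness of a single measurement follows a two-state Markov chain over periods, as the note above implies. The helper names are hypothetical and not part of the package, and in a short panel the realized share can differ from the long-run value.

```python
import random


def marginal_missing_share(p_b, p_r):
    """Long-run share of missing values: p_m = p_b / (1 - serial_corr)."""
    serial_corr = p_r - p_b
    return p_b / (1 - serial_corr)


def p_b_for_target_share(target_share, p_r):
    """Solve p_m = p_b / (1 - p_r + p_b) for p_b, given a desired share."""
    return target_share * (1 - p_r) / (1 - target_share)


def simulate_missing_share(p_b, p_r, n_periods, seed=0):
    """Simulate one long chain and return the observed share of missings."""
    rng = random.Random(seed)
    missing = False  # assumption: the chain starts in the non-missing state
    n_missing = 0
    for _ in range(n_periods):
        # Missing stays missing with probability p_r,
        # non-missing becomes missing with probability p_b.
        prob = p_r if missing else p_b
        missing = rng.random() < prob
        n_missing += missing
    return n_missing / n_periods


theoretical = marginal_missing_share(p_b=0.1, p_r=0.5)
simulated = simulate_missing_share(p_b=0.1, p_r=0.5, n_periods=200_000)
needed_p_b = p_b_for_target_share(target_share=theoretical, p_r=0.5)
```

With p_b = 0.1 and p_r = 0.5 the long-run share is 0.1 / 0.6, which is clearly larger than p_b.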
- simulate_dataset(model_dict, params, n_obs, control_means=None, control_sds=None, policies=None)
Simulate datasets generated by a latent factor model.
- Parameters
model_dict (dict) – The model specification. See: Model specifications
params (pandas.DataFrame) – DataFrame with model parameters.
n_obs (int) – Number of simulated individuals
control_means (pd.Series) – Series with means of the control variables. The index contains the names of the control variables. Control variables are assumed to be time constant for the simulation.
control_sds (pd.Series) – Series with standard deviations of the control variables. The index contains the names of the control variables.
policies (list) – List of dictionaries. Each dictionary specifies a stochastic shock to a latent factor AT THE END of “period” for “factor” with mean “effect_size” and “standard deviation”
- Returns
observed_data (pd.DataFrame) – Dataset with measurements and control variables in long format
latent_data (pd.DataFrame) – Dataset with latent factors in long format
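To illustrate what a policy entry describes, the following sketch adds a normally distributed shock to one latent factor. It is not the package's implementation: the helper apply_policy_shock, the factor names, and the dictionary key "standard_deviation" are assumptions made for this example, since the description above only names “period”, “factor”, “effect_size” and “standard deviation”.

```python
import numpy as np


def apply_policy_shock(states, factor_names, policy, rng):
    """Return a copy of states with a random shock added to one factor.

    Hypothetical helper; in a full simulation it would be called at the
    end of the period given in policy["period"].
    """
    shocked = states.copy()
    col = factor_names.index(policy["factor"])
    shock = rng.normal(
        loc=policy["effect_size"],
        scale=policy["standard_deviation"],  # assumed key name
        size=states.shape[0],
    )
    shocked[:, col] += shock
    return shocked


rng = np.random.default_rng(0)
factor_names = ["cognitive", "noncognitive"]  # made-up factor names
states = np.zeros((1000, 2))
policy = {
    "period": 1,
    "factor": "cognitive",
    "effect_size": 0.5,
    "standard_deviation": 0.1,
}
shocked = apply_policy_shock(states, factor_names, policy, rng)
```

Only the column of the targeted factor changes; the input array is left untouched.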
- generate_start_state_and_control_variable(n_obs, dimensions, dist_args, weights)
Draw initial states and control variables from a (mixture of) normals.
- Parameters
n_obs (int) – number of observations
dimensions (dict) – Dimensional information like n_states, n_periods, n_controls, n_mixtures. See dimensions.
dist_args (list) – list of length n_mixtures of dictionaries with the entries “mean” and “cov” for each mixture distribution.
- Returns
start_states (np.ndarray) – shape (n_obs, n_states)
controls (np.ndarray) – shape (n_obs, n_controls)
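Drawing from a mixture of normals can be sketched as follows: pick a mixture component for each observation, then draw from that component's multivariate normal. This is an illustration rather than the package's code. It assumes that weights holds the mixture probabilities and that the first n_states entries of each draw are the states and the remaining entries the controls; the function name is hypothetical.

```python
import numpy as np


def draw_from_normal_mixture(n_obs, n_states, dist_args, weights, seed=0):
    """Draw states and controls jointly from a mixture of normals."""
    rng = np.random.default_rng(seed)
    n_mixtures = len(dist_args)
    # Choose one mixture component per observation.
    component = rng.choice(n_mixtures, size=n_obs, p=weights)
    dim = len(dist_args[0]["mean"])
    draws = np.empty((n_obs, dim))
    for k, args in enumerate(dist_args):
        mask = component == k
        draws[mask] = rng.multivariate_normal(
            args["mean"], args["cov"], size=int(mask.sum())
        )
    # Assumed ordering: states first, controls last.
    return draws[:, :n_states], draws[:, n_states:]


dist_args = [
    {"mean": np.zeros(3), "cov": np.eye(3)},
    {"mean": np.full(3, 2.0), "cov": 0.5 * np.eye(3)},
]
start_states, controls = draw_from_normal_mixture(
    n_obs=500, n_states=2, dist_args=dist_args, weights=[0.7, 0.3]
)
```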
- next_period_states(states, transition_names, transition_params, shock_sds)
Apply transition function to factors and add shocks.
- Parameters
states (np.ndarray) – shape (n_obs, n_states)
transition_names (list) – list of strings with the names of the transition function of each factor.
transition_params (list) – list of dictionaries of length n_states with the arguments for the transition function of each factor. A detailed description of the arguments of transition functions can be found in the module docstring of skillmodels.model_functions.transition_functions.
shock_sds (np.ndarray) – numpy array of length n_states.
- Returns
next_factors (np.ndarray) – shape (n_obs, n_states)
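The step described here, one transition function per factor followed by an additive shock, might look like the sketch below. The linear function and its parameter keys "coeffs" and "constant" are hypothetical stand-ins; the actual transition functions and their arguments are documented in skillmodels.model_functions.transition_functions.

```python
import numpy as np


def linear(states, params):
    """Hypothetical linear transition: weighted sum of factors plus constant."""
    return states @ params["coeffs"] + params["constant"]


# Lookup from transition name to function (only one entry in this sketch).
TRANSITIONS = {"linear": linear}


def next_states_sketch(states, transition_names, transition_params, shock_sds, seed=0):
    """Apply each factor's transition function, then add normal shocks."""
    rng = np.random.default_rng(seed)
    n_obs, n_states = states.shape
    next_states = np.empty((n_obs, n_states))
    for f in range(n_states):
        func = TRANSITIONS[transition_names[f]]
        next_states[:, f] = func(states, transition_params[f])
    # One independent shock per observation and factor, scaled by its sd.
    shocks = rng.normal(size=(n_obs, n_states)) * shock_sds
    return next_states + shocks


states = np.ones((4, 2))
transition_params = [
    {"coeffs": np.array([0.5, 0.5]), "constant": 0.0},
    {"coeffs": np.array([0.0, 1.0]), "constant": 1.0},
]
no_shock = next_states_sketch(
    states, ["linear", "linear"], transition_params, shock_sds=np.zeros(2)
)
with_shock = next_states_sketch(
    states, ["linear", "linear"], transition_params, shock_sds=np.array([0.1, 0.2])
)
```

Setting shock_sds to zero makes the result deterministic, which is a convenient way to check the transition functions in isolation.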
- measurements_from_states(states, controls, loadings, control_params, sds)
Generate the variables that would be observed in practice.
This generates the data for only one period. Let n_meas be the number of measurements in that period.
- Parameters
states (pd.DataFrame or np.ndarray) – DataFrame of shape (n_obs, n_states)
controls (pd.DataFrame or np.ndarray) – DataFrame of shape (n_obs, n_controls)
loadings (np.ndarray) – numpy array of size (n_meas, n_states)
control_params (np.ndarray) – numpy array of size (n_meas, n_controls)
sds (np.ndarray) – numpy array of size (n_meas) with the standard deviations of the measurements. Measurement error is assumed to be independent across measurements.
- Returns
measurements (np.ndarray) – array of shape (n_obs, n_meas) with measurements.
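A linear measurement system consistent with these shapes can be sketched as below. This is an illustration, not the package's code: the function name is hypothetical, and it assumes the coefficient matrix on the controls (control_params in the signature) has shape (n_meas, n_controls) and that measurement errors are independent normal draws.

```python
import numpy as np


def measurements_sketch(states, controls, loadings, control_params, sds, seed=0):
    """Measurements = loadings * states + control effects + independent noise."""
    rng = np.random.default_rng(seed)
    states = np.asarray(states)      # also accepts DataFrames
    controls = np.asarray(controls)
    n_obs = states.shape[0]
    n_meas = loadings.shape[0]
    # Independent errors: one column per measurement, scaled by its sd.
    eps = rng.normal(size=(n_obs, n_meas)) * sds
    return states @ loadings.T + controls @ control_params.T + eps


n_obs, n_states, n_controls, n_meas = 5, 2, 1, 3
states = np.ones((n_obs, n_states))
controls = np.ones((n_obs, n_controls))
loadings = np.array([[1.0, 0.0], [0.5, 0.0], [0.0, 1.0]])
control_params = np.array([[0.1], [0.2], [0.3]])

noise_free = measurements_sketch(
    states, controls, loadings, control_params, sds=np.zeros(n_meas)
)
noisy = measurements_sketch(
    states, controls, loadings, control_params, sds=np.array([0.1, 0.1, 0.2])
)
```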