Modules Related to Simulation

Simulating a Dataset

Functions to simulate a dataset generated by a latent factor model.

add_missings(data, meas_names, p_b, p_r)

Add np.nan values to data.

NaNs are only added to measurements, not to control variables or factors.

Note that p_b is NOT the marginal probability of a measurement being missing. The marginal probability is given by p_m = p_b / (1 - serial_corr), where serial_corr = p_r - p_b, which is in general nonzero because p_r != p_b. Whenever p_r > p_b, the average share of missing values in the entire dataset will therefore be larger than p_b. Thus, p_b and p_r should be set accordingly given the desired share of missing values.

Parameters

data (pd.DataFrame) – contains the observable part of a simulated dataset
meas_names (list) – list of strings of names of each measurement variable
p_b (float) – probability that a measurement becomes missing
p_r (float) – probability that a measurement remains missing in the next period

Returns

data_with_missings (pd.DataFrame) – Dataset with a share of measurements replaced by np.nan values
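
The relation between p_b, p_r and the marginal share of missing values can be checked numerically. The following is a minimal sketch, not the implementation of add_missings: the helper simulate_missing_indicator is hypothetical, and it assumes that missingness follows a two-state Markov chain whose first period is drawn from the stationary distribution.

```python
import numpy as np


def simulate_missing_indicator(n_obs, n_periods, p_b, p_r, seed=0):
    """Hypothetical helper: simulate a two-state Markov chain of missingness.

    Returns a boolean array of shape (n_obs, n_periods); True means missing.
    """
    rng = np.random.default_rng(seed)
    missing = np.zeros((n_obs, n_periods), dtype=bool)
    # Assumption: the first period is drawn from the stationary distribution.
    p_m = p_b / (1 - (p_r - p_b))
    missing[:, 0] = rng.random(n_obs) < p_m
    for t in range(1, n_periods):
        # The probability of being missing depends on last period's status.
        prob = np.where(missing[:, t - 1], p_r, p_b)
        missing[:, t] = rng.random(n_obs) < prob
    return missing


p_b, p_r = 0.1, 0.3
missing = simulate_missing_indicator(n_obs=20_000, n_periods=8, p_b=p_b, p_r=p_r)
implied_share = p_b / (1 - (p_r - p_b))  # 0.1 / 0.8 = 0.125
simulated_share = missing.mean()
```

With these values the implied share is 0.125, which is above p_b = 0.1, and the simulated share should be close to it.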

simulate_dataset(model_dict, params, n_obs, control_means=None, control_sds=None, policies=None)

Simulate datasets generated by a latent factor model.

Parameters

model_dict (dict) – The model specification. See: Model specifications
params (pandas.DataFrame) – DataFrame with model parameters.
n_obs (int) – Number of simulated individuals
control_means (pd.Series) – Series with means of the control variables. The index contains the names of the control variables. Control variables are assumed to be time constant for the simulation.
control_sds (pd.Series) – Series with standard deviations of the control variables. The index contains the names of the control variables.
policies (list) – list of dictionaries. Each dictionary specifies a stochastic shock to a latent factor AT THE END of “period” for “factor” with mean “effect_size” and “standard deviation”.

Returns

observed_data (pd.DataFrame) – Dataset with measurements and control variables in long format
latent_data (pd.DataFrame) – Dataset with latent factors in long format
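
To illustrate what a policy entry means, the sketch below applies one stochastic shock to a factor at the end of a period. It is an illustration under assumptions, not the code used by simulate_dataset: the factor names are made up, and the dictionary key "standard_deviation" is an assumed spelling of the quoted “standard deviation” entry.

```python
import numpy as np

# Hypothetical policy entry. The key "standard_deviation" is an assumption.
policies = [
    {"period": 0, "factor": "fac1", "effect_size": 0.5, "standard_deviation": 0.1},
]

factor_names = ["fac1", "fac2"]  # hypothetical factor names
rng = np.random.default_rng(0)
n_obs = 10_000

# Stand-in for the latent factors at the end of period 0.
states = rng.normal(size=(n_obs, len(factor_names)))

shocked = states.copy()
for policy in policies:
    if policy["period"] == 0:
        col = factor_names.index(policy["factor"])
        # Each individual receives their own draw of the shock.
        shock = rng.normal(
            policy["effect_size"], policy["standard_deviation"], size=n_obs
        )
        shocked[:, col] += shock

# Average change per factor caused by the policy.
mean_shift = (shocked - states).mean(axis=0)
```

Only the targeted factor moves, and its average shift should be close to effect_size.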

generate_start_state_and_control_variable(n_obs, dimensions, dist_args, weights)

Draw initial states and control variables from a (mixture of) normals.

Parameters

n_obs (int) – number of observations
dimensions (dict) – Dimensional information like n_states, n_periods, n_controls, n_mixtures. See dimensions.
dist_args (list) – list of length n_mixtures of dictionaries with the entries “mean” and “cov” for each mixture distribution.

Returns

start_states (np.ndarray) – shape (n_obs, n_states)
controls (np.ndarray) – shape (n_obs, n_controls)
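
Drawing from a mixture of normals can be sketched as follows. This is a rough illustration and not the library's implementation: the function draw_from_normal_mixture is hypothetical, and it assumes that each “mean” and “cov” covers states and controls jointly, with the states in the first n_states columns.

```python
import numpy as np


def draw_from_normal_mixture(n_obs, n_states, dist_args, weights, seed=0):
    """Hypothetical helper: draw states and controls from a normal mixture."""
    rng = np.random.default_rng(seed)
    # Pick one mixture component per observation according to the weights.
    component = rng.choice(len(weights), size=n_obs, p=weights)
    dim = len(dist_args[0]["mean"])
    draws = np.empty((n_obs, dim))
    for k, args in enumerate(dist_args):
        mask = component == k
        draws[mask] = rng.multivariate_normal(
            args["mean"], args["cov"], size=mask.sum()
        )
    # Assumption: the first n_states columns are states, the rest controls.
    return draws[:, :n_states], draws[:, n_states:]


dist_args = [
    {"mean": [0.0, 0.0, 1.0], "cov": np.eye(3)},
    {"mean": [2.0, 2.0, 1.0], "cov": 0.5 * np.eye(3)},
]
weights = [0.4, 0.6]
start_states, controls = draw_from_normal_mixture(
    n_obs=5_000, n_states=2, dist_args=dist_args, weights=weights
)
```

Here the expected mean of each state is 0.4 * 0 + 0.6 * 2 = 1.2, which the sample means should approximate.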

next_period_states(states, transition_names, transition_params, shock_sds)

Apply transition function to factors and add shocks.

Parameters

states (np.ndarray) – shape (n_obs, n_states)
transition_names (list) – list of strings with the names of the transition function of each factor.
transition_params (list) – list of dictionaries of length n_states with the arguments for the transition function of each factor. A detailed description of the arguments of transition functions can be found in the module docstring of skillmodels.model_functions.transition_functions.
shock_sds (np.ndarray) – numpy array of length n_states.

Returns

next_factors (np.ndarray) – shape (n_obs, n_states)
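
The step "apply a transition function, then add shocks" might look like the sketch below. The linear function and the TRANSITIONS lookup are hypothetical stand-ins; the actual transition functions and their arguments are described in skillmodels.model_functions.transition_functions and may differ.

```python
import numpy as np


def linear(states, coeffs):
    """Hypothetical linear transition: weighted sum of the current states."""
    return states @ np.asarray(coeffs)


# Hypothetical lookup from transition name to function.
TRANSITIONS = {"linear": linear}


def next_states_sketch(states, transition_names, transition_params, shock_sds, seed=0):
    rng = np.random.default_rng(seed)
    n_obs, n_states = states.shape
    next_states = np.empty((n_obs, n_states))
    # One transition function and one dict of arguments per factor.
    for i, (name, kwargs) in enumerate(zip(transition_names, transition_params)):
        next_states[:, i] = TRANSITIONS[name](states, **kwargs)
    # Independent normal shocks, scaled by one standard deviation per factor.
    shocks = rng.normal(size=(n_obs, n_states)) * shock_sds
    return next_states + shocks


states = np.ones((4, 2))
result = next_states_sketch(
    states,
    transition_names=["linear", "linear"],
    transition_params=[{"coeffs": [0.9, 0.1]}, {"coeffs": [0.0, 1.0]}],
    shock_sds=np.zeros(2),  # no shocks, so the result is deterministic
)
```

With zero shock standard deviations and these coefficients, every entry of result equals 1.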

measurements_from_states(states, controls, loadings, control_params, sds)

Generate the variables that would be observed in practice.

This generates the data for only one period. Let n_meas be the number of measurements in that period.

Parameters

states (pd.DataFrame or np.ndarray) – DataFrame of shape (n_obs, n_states)
controls (pd.DataFrame or np.ndarray) – DataFrame of shape (n_obs, n_controls)
loadings (np.ndarray) – numpy array of size (n_meas, n_states)
control_params (np.ndarray) – numpy array of size (n_meas, n_controls)
sds (np.ndarray) – numpy array of size (n_meas) with the standard deviations of the measurements. Measurement error is assumed to be independent across measurements.

Returns

measurements (np.ndarray) – array of shape (n_obs, n_meas) with measurements.
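
A linear measurement equation consistent with the shapes above can be sketched as follows. This is an assumption-laden illustration rather than the library's code: measurements_sketch is a hypothetical name, and control_params is assumed to have shape (n_meas, n_controls).

```python
import numpy as np


def measurements_sketch(states, controls, loadings, control_params, sds, seed=0):
    """Hypothetical helper: measurements for a single period."""
    rng = np.random.default_rng(seed)
    n_obs = states.shape[0]
    n_meas = loadings.shape[0]
    # Independent measurement errors, one standard deviation per measurement.
    epsilon = rng.normal(size=(n_obs, n_meas)) * sds
    return states @ loadings.T + controls @ control_params.T + epsilon


states = np.array([[1.0, 2.0]])                            # (n_obs=1, n_states=2)
controls = np.array([[1.0]])                               # (n_obs=1, n_controls=1)
loadings = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # (n_meas=3, n_states=2)
control_params = np.array([[0.5], [0.0], [1.0]])           # (n_meas=3, n_controls=1)
sds = np.zeros(3)                                          # no noise, for checking

meas = measurements_sketch(states, controls, loadings, control_params, sds)
```

Without measurement error, the three measurements are 1 + 0.5, 2 + 0 and 3 + 1.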
