Modules Related to Simulation

Simulating a Dataset

Functions to simulate a dataset generated by a latent factor model.

add_missings(data, meas_names, p_b, p_r)

Add np.nans to data.

nans are only added to measurements, not to control variables or factors.

Note that p_b is NOT the marginal probability of a measurement being missing. The marginal probability is given by p_m = p_b / (1 - serial_corr), where serial_corr = p_r - p_b is in general nonzero because p_r != p_b. When p_r > p_b, the average share of missing values in the entire dataset is therefore larger than p_b. Thus, p_b and p_r should be set according to the desired share of missing values.

Parameters
  • data (pd.DataFrame) – contains the observable part of a simulated dataset

  • meas_names (list) – list of strings of names of each measurement variable

  • p_b (float) – probability that a measurement becomes missing

  • p_r (float) – probability that a missing measurement remains missing in the next period

Returns

data_with_missings (pd.DataFrame) – Dataset with a share of measurements replaced by np.nan values
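
The relation between p_b, p_r and the overall share of missing values can be checked with a small simulation. The following is a minimal sketch and not the implementation of add_missings: the helper names stationary_missing_share and simulate_missing_indicator are hypothetical, and it assumes that missingness follows a two-state Markov chain whose first period is drawn with probability p_b.

```python
import numpy as np


def stationary_missing_share(p_b, p_r):
    """Long-run share of missing values implied by p_b and p_r."""
    serial_corr = p_r - p_b
    return p_b / (1 - serial_corr)


def simulate_missing_indicator(n_obs, n_periods, p_b, p_r, seed=0):
    """Return a boolean (n_obs, n_periods) array; True marks a missing value."""
    rng = np.random.default_rng(seed)
    missing = np.zeros((n_obs, n_periods), dtype=bool)
    # Assumption: the first period is drawn with probability p_b.
    missing[:, 0] = rng.random(n_obs) < p_b
    for t in range(1, n_periods):
        # The probability of being missing depends on last period's status.
        prob = np.where(missing[:, t - 1], p_r, p_b)
        missing[:, t] = rng.random(n_obs) < prob
    return missing


p_b, p_r = 0.1, 0.5
indicator = simulate_missing_indicator(n_obs=20_000, n_periods=30, p_b=p_b, p_r=p_r)
implied_share = stationary_missing_share(p_b, p_r)  # 0.1 / (1 - 0.4)
late_share = indicator[:, -1].mean()  # empirical share in the last period
```

With these values the implied share is 1/6, noticeably larger than p_b = 0.1, and the empirical share in later periods should be close to it.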

simulate_dataset(model_dict, params, n_obs, control_means=None, control_sds=None, policies=None)

Simulate datasets generated by a latent factor model.

Parameters
  • model_dict (dict) – The model specification. See: Model specifications

  • params (pandas.DataFrame) – DataFrame with model parameters.

  • n_obs (int) – Number of simulated individuals

  • control_means (pd.Series) – Series with means of the control variables. The index contains the names of the control variables. Control variables are assumed to be constant over time for the simulation.

  • control_sds (pd.Series) – Series with standard deviations of the control variables. The index contains the names of the control variables.

  • policies (list) – list of dictionaries. Each dictionary specifies a stochastic shock to a latent factor AT THE END of “period” for “factor” with mean “effect_size” and “standard deviation”

Returns

  • observed_data (pd.DataFrame) – Dataset with measurements and control variables in long format

  • latent_data (pd.DataFrame) – Dataset with latent factors in long format

generate_start_state_and_control_variable(n_obs, dimensions, dist_args, weights)

Draw initial states and control variables from a (mixture of) normals.

Parameters
  • n_obs (int) – number of observations

  • dimensions (dict) – Dimensional information like n_states, n_periods, n_controls, n_mixtures. See dimensions.

  • dist_args (list) – list of length n_mixtures of dictionaries with the entries “mean” and “cov” for each mixture distribution.

Returns

  • start_states (np.ndarray) – shape (n_obs, n_states)

  • controls (np.ndarray) – shape (n_obs, n_controls)
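
Drawing from a mixture of normals can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the helper draw_from_normal_mixture is hypothetical, weights is taken to hold one mixture probability per element of dist_args, and each “mean” is assumed to stack the states first and the controls after them.

```python
import numpy as np


def draw_from_normal_mixture(n_obs, dist_args, weights, seed=0):
    """Draw n_obs vectors from a mixture of multivariate normals."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    # Pick one mixture component per observation.
    component = rng.choice(len(dist_args), size=n_obs, p=weights)
    n_dim = len(dist_args[0]["mean"])
    draws = np.empty((n_obs, n_dim))
    for k, args in enumerate(dist_args):
        mask = component == k
        # Fill the rows that belong to component k.
        draws[mask] = rng.multivariate_normal(
            args["mean"], args["cov"], size=int(mask.sum())
        )
    return draws


n_states, n_controls = 2, 1
dist_args = [
    {"mean": np.zeros(3), "cov": np.eye(3)},
    {"mean": np.array([1.0, -1.0, 0.5]), "cov": 0.5 * np.eye(3)},
]
draws = draw_from_normal_mixture(n_obs=500, dist_args=dist_args, weights=[0.3, 0.7])

# Assumed layout: the first n_states columns are states, the rest are controls.
start_states = draws[:, :n_states]
controls = draws[:, n_states:]
```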

next_period_states(states, transition_names, transition_params, shock_sds)

Apply transition function to factors and add shocks.

Parameters
  • states (np.ndarray) – shape (n_obs, n_states)

  • transition_names (list) – list of strings with the names of the transition function of each factor.

  • transition_params (list) – list of dictionaries of length n_states with the arguments for the transition function of each factor. A detailed description of the arguments of transition functions can be found in the module docstring of skillmodels.model_functions.transition_functions.

  • shock_sds (np.ndarray) – numpy array of length n_states.

Returns

next_factors (np.ndarray) – shape (n_obs, n_states)
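
The idea of applying a transition function per factor and then adding shocks can be sketched as below. The functions linear_transition and sketch_next_period_states are hypothetical stand-ins: the sketch assumes a purely linear transition for every factor and independent normal shocks, and it does not reproduce the transition functions in skillmodels.model_functions.transition_functions.

```python
import numpy as np


def linear_transition(states, coeffs):
    """Hypothetical linear transition: weighted sum of the current states."""
    return states @ coeffs


def sketch_next_period_states(states, transition_coeffs, shock_sds, seed=0):
    """Apply one transition per factor, then add normally distributed shocks."""
    rng = np.random.default_rng(seed)
    n_obs, n_states = states.shape
    next_states = np.empty((n_obs, n_states))
    for f in range(n_states):
        # Row f of transition_coeffs holds the coefficients for factor f.
        next_states[:, f] = linear_transition(states, transition_coeffs[f])
    # One shock per observation and factor, scaled by the factor's shock sd.
    shocks = rng.normal(size=(n_obs, n_states)) * shock_sds
    return next_states + shocks


states = np.array([[1.0, 2.0], [0.5, -1.0], [0.0, 3.0]])
coeffs = np.array([[0.9, 0.1], [0.0, 0.8]])
shock_sds = np.array([0.5, 0.2])

next_states = sketch_next_period_states(states, coeffs, shock_sds)
# With zero shock standard deviations only the deterministic part remains.
deterministic = sketch_next_period_states(states, coeffs, np.zeros(2))
```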

measurements_from_states(states, controls, loadings, control_params, sds)

Generate the variables that would be observed in practice.

This generates the data for only one period. Let n_meas be the number of measurements in that period.

Parameters
  • states (pd.DataFrame or np.ndarray) – DataFrame of shape (n_obs, n_states)

  • controls (pd.DataFrame or np.ndarray) – DataFrame of shape (n_obs, n_controls)

  • loadings (np.ndarray) – numpy array of size (n_meas, n_states)

  • control_params (np.ndarray) – numpy array of size (n_meas, n_controls)

  • sds (np.ndarray) – numpy array of size (n_meas) with the standard deviations of the measurements. Measurement error is assumed to be independent across measurements.

Returns

measurements (np.ndarray) – array of shape (n_obs, n_meas) with measurements.
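
A one-period measurement step can be sketched as follows. This assumes a linear measurement equation (factor loadings times states, plus control coefficients times controls, plus independent normal errors) and a control_params array of shape (n_meas, n_controls); the function sketch_measurements is hypothetical and only illustrates the array shapes involved.

```python
import numpy as np


def sketch_measurements(states, controls, loadings, control_params, sds, seed=0):
    """Linear measurements for one period with independent normal errors."""
    rng = np.random.default_rng(seed)
    n_obs = states.shape[0]
    n_meas = loadings.shape[0]
    # Independent errors, scaled by each measurement's standard deviation.
    errors = rng.normal(size=(n_obs, n_meas)) * sds
    return states @ loadings.T + controls @ control_params.T + errors


rng = np.random.default_rng(1)
states = rng.normal(size=(4, 2))            # (n_obs, n_states)
controls = rng.normal(size=(4, 1))          # (n_obs, n_controls)
loadings = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])  # (n_meas, n_states)
control_params = np.array([[0.2], [0.0], [-0.1]])          # (n_meas, n_controls)
sds = np.array([0.1, 0.2, 0.3])             # (n_meas,)

measurements = sketch_measurements(states, controls, loadings, control_params, sds)
# Setting all standard deviations to zero returns the error-free part.
noiseless = sketch_measurements(states, controls, loadings, control_params, np.zeros(3))
```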

© Copyright 2016-2021, Janos Gabler.