I have a dataset with ["Uni", 'Region', "Profession", "Level_Edu", 'Financial_Base', 'Learning_Time', 'GENDER'] columns. All values in ["Uni", 'Region', "Profession"] are filled while ["Level_Edu", 'Financial_Base', 'Learning_Time', 'GENDER'] always contain NAs.
For each column with NAs there are several possible values
Level_Edu = ['undergrad', 'grad', 'PhD']
Financial_Base = ['personal', 'grant']
Learning_Time = ["morning", "day", "evening"]
GENDER = ['Male', 'Female']
I want to generate all possible combinations of ["Level_Edu", 'Financial_Base', 'Learning_Time', 'GENDER'] for each observation in the initial data. So that each initial observation would be represented by 36 new observations (obtained by the formula of combinatorics: N1 * N2 * N3 * N4, where Ni is the length of the i-th vector of possible values for a column)
Here is a Python code for recreating two initial observations and approximation of the result I desire to get (showing 3 combinations out of 36 for each initial observation I want).
import pandas as pd
import numpy as np
sample_data_as_is = pd.DataFrame([["X1", "Y1", "Z1", np.nan, np.nan, np.nan, np.nan], ["X2", "Y2", "Z2", np.nan, np.nan, np.nan, np.nan]], columns=["Uni", 'Region', "Profession", "Level_Edu", 'Financial_Base', 'Learning_Time', 'GENDER'])
sample_data_to_be = pd.DataFrame([["X1", "Y1", "Z1", "undergrad", "personal", "morning", 'Male'], ["X2", "Y2", "Z2", "undergrad", "personal", "morning", 'Male'],
["X1", "Y1", "Z1", "grad", "personal", "morning", 'Male'], ["X2", "Y2", "Z2", "grad", "personal", "morning", 'Male'],
["X1", "Y1", "Z1", "undergrad", "grant", "morning", 'Male'], ["X2", "Y2", "Z2", "undergrad", "grant", "morning", 'Male']], columns=["Uni", 'Region', "Profession", "Level_Edu", 'Financial_Base', 'Learning_Time', 'GENDER'])
sample_data_as_is
?