The goal is to predict whether an employee will leave the company (yes or no). I have a dataframe with information about employees: 30 independent features and one dependent variable (Left: Yes or No). The data was gathered over a period of one year.
I also have data about which employees worked on the same projects. Using this data, I added an extra independent feature, "leavers", to the original dataframe. For each employee, it counts how many of the colleagues they worked with eventually left the company. My reasoning is that an employee who worked with many colleagues who later left may have a higher propensity to leave, because they are influenced by those colleagues.
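To make the feature concrete, here is a minimal sketch of how such a count could be computed with pandas. The toy tables, the column names `employee_id` and `project_id`, and the choice to count each colleague only once are assumptions for illustration, not my actual schema.

```python
import pandas as pd

# Hypothetical toy data; the real dataframe has 30 features plus "Left".
employees = pd.DataFrame({
    "employee_id": [1, 2, 3, 4],
    "Left": ["Yes", "No", "Yes", "No"],
})

# Hypothetical project assignments: one row per (employee, project) pair.
assignments = pd.DataFrame({
    "employee_id": [1, 2, 3, 2, 4, 3, 4],
    "project_id": ["A", "A", "A", "B", "B", "C", "C"],
})

# Pair every employee with everyone who shared a project with them.
pairs = assignments.merge(assignments, on="project_id", suffixes=("", "_coworker"))
pairs = pairs[pairs["employee_id"] != pairs["employee_id_coworker"]]

# Assumption: count each colleague once, even if several projects were shared.
pairs = pairs[["employee_id", "employee_id_coworker"]].drop_duplicates()

# Look up each colleague's outcome in the "Left" column.
left_flag = employees.set_index("employee_id")["Left"].eq("Yes")
pairs["coworker_left"] = pairs["employee_id_coworker"].map(left_flag)

# "leavers" = number of distinct colleagues who eventually left.
leavers = pairs.groupby("employee_id")["coworker_left"].sum()
employees["leavers"] = employees["employee_id"].map(leavers).fillna(0).astype(int)

print(employees)
```

Computed this way, the value of "leavers" in each row is derived from the `Left` values of other rows in the same dataframe.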
Would it be data leakage if I include this feature in my original dataframe and then do a train/test split?