Asked by Nia
Question
Handling data leakage in ML pipelines during feature engineering
I'm seeing a suspicious jump in model performance after adding a new feature. Upon inspection, it looks like the feature calculation is inadvertently using future data points from the validation set. How do you architect your pipelines to strictly prevent look-ahead bias when features depend on aggregations over the full dataset?
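Here is a minimal sketch of the failure mode in plain Python, with no pandas or scikit-learn. It assumes a single time-ordered list of values and uses a mean as the aggregation. All function names are hypothetical and for illustration only. It contrasts three features:

- a full-dataset mean, which leaks because every row sees future rows;
- an expanding mean shifted by one row, where each row sees only strictly earlier rows;
- statistics fitted on the training split only and then reused for validation.

It also includes a simple perturbation check: if changing the last row alters the feature value of any earlier row, the feature depends on future data.

```python
from statistics import mean


def leaky_mean_feature(values):
    """Leaky (hypothetical helper): every row gets the mean of the FULL series, future rows included."""
    overall = mean(values)
    return [overall for _ in values]


def point_in_time_mean_feature(values):
    """Point-in-time (hypothetical helper): row i gets the mean of rows 0..i-1 only.

    This is an expanding-window mean shifted by one row. The first row has no
    history, so its feature is None.
    """
    features = []
    running_sum = 0.0
    for i, v in enumerate(values):
        features.append(running_sum / i if i > 0 else None)
        running_sum += v  # add the current row only AFTER computing its feature
    return features


def fit_on_train_only(train_values, valid_values):
    """Split-aware (hypothetical helper): fit the statistic on training rows, then apply it unchanged."""
    train_mean = mean(train_values)
    return [train_mean] * len(train_values), [train_mean] * len(valid_values)


def leaks_future(feature_fn, values):
    """Return True if perturbing the last row changes the feature of any earlier row."""
    baseline = feature_fn(values)
    perturbed = feature_fn(values[:-1] + [values[-1] + 1000.0])
    return baseline[:-1] != perturbed[:-1]


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0, 10.0]           # time-ordered; last row plays "validation"
    print(leaky_mean_feature(values))          # [4.0, 4.0, 4.0, 4.0]
    print(point_in_time_mean_feature(values))  # [None, 1.0, 1.5, 2.0]
    print(fit_on_train_only(values[:3], values[3:]))  # ([2.0, 2.0, 2.0], [2.0])
    print(leaks_future(leaky_mean_feature, values))          # True
    print(leaks_future(point_in_time_mean_feature, values))  # False
```

The perturbation check is a cheap test you could run against any feature function before training. The shift-by-one expanding window and the fit-on-train-only pattern are two common ways to keep aggregations point-in-time. They are not the only options.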