Causality and Fairness
Machine Learning methods rely heavily on supervision, i.e., the provision of “labels” or “annotations” in the training data. While they are impressive at identifying patterns that are useful for making predictions, they are less successful at identifying causal relationships among the relevant variables, which are needed to make good counterfactual predictions. Although causal inference is a widely studied field, important challenges remain in bringing existing techniques to bear on the high-dimensional, non-parametric, and non-asymptotic sample settings relevant to modern learning applications. We develop these foundations using a variety of approaches from high-dimensional Statistics, Econometrics and Machine Learning. We also use our techniques to improve model performance under distribution shift, decrease the level of supervision needed to train models, obtain models that generalize better to unseen and multi-modal data, and reduce both the bias baked into trained models and the unfairness caused by their deployment.
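The gap between correlational and counterfactual prediction can be illustrated with a minimal synthetic sketch (the variables, coefficients, and sample size below are hypothetical choices of ours, not taken from the text): a confounder drives both treatment and outcome, so a naive regression overstates the causal effect, while adjusting for the confounder recovers it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
z = rng.normal(size=n)                       # confounder
t = 0.8 * z + rng.normal(size=n)             # "treatment" influenced by z
y = 1.0 * t + 2.0 * z + rng.normal(size=n)   # true causal effect of t on y is 1.0

# Naive (purely correlational) estimate: regress y on t alone.
naive = np.polyfit(t, y, 1)[0]

# Adjusted estimate: regress y on both t and z (backdoor adjustment).
X = np.column_stack([t, z, np.ones(n)])
adjusted = np.linalg.lstsq(X, y, rcond=None)[0][0]

print(f"naive slope:    {naive:.3f}")    # inflated well above 1.0
print(f"adjusted slope: {adjusted:.3f}") # close to the true effect 1.0
```

The naive slope answers “what does y look like when t happens to be high?”, while the adjusted slope approximates the counterfactual “what would y be if we set t?” — the distinction the paragraph above is drawing.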
The issue of algorithmic bias and fairness also arises in other contexts. We consider fairness for graphs. Graphs are a ubiquitous data model for entities and their relations, dependencies, and interactions, and real-world graphs abound, ranging from social, communication and transportation networks to biological networks and the brain. We study the representation bias in graph data that results from data collection, and its effect on graph algorithms. We also consider various processes that unfold on graphs, such as temporal evolution, ranking, information diffusion and opinion formation, and study their fairness. Finally, we explore explanations for the bias of graph algorithms, where the goal is to explain bias towards groups rather than individuals.
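How a ranking process on a graph can disadvantage a group can be sketched in a few lines (the graph model, group sizes, and edge probabilities below are illustrative assumptions of ours): in a homophilous random graph, a structurally under-connected minority receives a PageRank share below its population share.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_minority = 200, 40              # hypothetical group sizes
group = np.zeros(n, dtype=int)
group[:n_minority] = 1               # nodes 0..39 form the minority group

# Homophilous directed random graph: same-group edges are more likely.
p_same, p_cross = 0.05, 0.01
same = group[:, None] == group[None, :]
A = (rng.random((n, n)) < np.where(same, p_same, p_cross)).astype(float)
np.fill_diagonal(A, 0.0)

# PageRank via power iteration (damping 0.85); dangling nodes get a uniform row.
out_deg = A.sum(axis=1, keepdims=True)
P = np.divide(A, out_deg, out=np.full_like(A, 1.0 / n), where=out_deg > 0)
r = np.full(n, 1.0 / n)
for _ in range(100):
    r = 0.15 / n + 0.85 * (P.T @ r)

minority_share = r[group == 1].sum()
print(f"minority PageRank share: {minority_share:.3f}")  # below its 0.20 population share
```

The minority here makes up 20% of the nodes but attracts fewer in-links, so its aggregate PageRank falls short of 0.20 — the kind of group-level disparity, driven by representation in the data rather than by the algorithm’s code, that the work above seeks to measure and explain.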