Haosen Ge Ph.D.
WST: Weak-to-Strong Knowledge Transfer via Reinforcement Learning arXiv
NeurIPS 2025 Workshop on Efficient Reasoning
Haosen Ge; Shuo Li; Lianghuan Huang
Effective prompt engineering remains a challenging task for many applications. We introduce Weak-to-Strong Transfer (WST), an automatic prompt engineering framework in which a small "Teacher" model generates instructions that enhance the performance of a much larger "Student" model. Unlike prior work, WST requires only a weak teacher, making it efficient and broadly applicable in settings where large models are closed-source or difficult to fine-tune. Using reinforcement learning, the Teacher's instructions are iteratively improved based on the Student's outcomes, yielding substantial gains on reasoning (MATH-500, GSM8K) and alignment (HH-RLHF) benchmarks (98% on MATH-500 and 134% on HH-RLHF) and surpassing baselines such as GPT-4o-mini and Llama-70B. These results demonstrate that small models can reliably scaffold larger ones, unlocking latent capabilities while avoiding the misleading prompts that stronger teachers may introduce, and establish WST as a scalable solution for efficient and safe LLM prompt refinement.
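The abstract describes an RL loop in which a small Teacher proposes instructions and is rewarded by the large Student's task outcomes. The sketch below is a minimal, hypothetical rendering of that loop: teacher_propose, student_answer, and score are stand-in functions (not from the paper), and the REINFORCE-style baseline update is one plausible way to realize the RL step, not the paper's exact recipe.

```python
# Minimal sketch of a weak-to-strong prompt-optimization loop (illustrative,
# not the WST implementation). All three helpers are hypothetical stand-ins.
import random

def teacher_propose(task):
    # Stand-in for sampling an instruction from the small Teacher model.
    return f"Explain your reasoning step by step, then answer: {task['question']}"

def student_answer(instruction, task):
    # Stand-in for querying the large (possibly closed-source) Student model.
    return "42"

def score(answer, task):
    # Reward: 1 if the Student's final answer matches the reference, else 0.
    return float(answer.strip() == task["answer"])

def train_teacher(tasks, epochs=3):
    baseline = 0.0  # running reward baseline for variance reduction
    for _ in range(epochs):
        for task in random.sample(tasks, len(tasks)):
            instruction = teacher_propose(task)
            reward = score(student_answer(instruction, task), task)
            advantage = reward - baseline
            baseline = 0.9 * baseline + 0.1 * reward
            # A policy-gradient update on the Teacher's parameters would go here,
            # weighting the sampled instruction by its advantage.
    return baseline
```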
Stochastic Online Conformal Prediction with Semi-Bandit Feedback arXiv
ICML 2025
Haosen Ge; Hamsa Bastani; Osbert Bastani
Conformal prediction has emerged as an effective strategy for uncertainty quantification by modifying a model to output sets of labels instead of a single label. These prediction sets come with the guarantee that they contain the true label with high probability. However, conformal prediction typically requires a large calibration dataset of i.i.d. examples. We consider the online learning setting, where examples arrive over time, and the goal is to construct prediction sets dynamically. Departing from existing work, we assume semi-bandit feedback, where we only observe the true label if it is contained in the prediction set. For instance, consider calibrating a document retrieval model to a new domain; in this setting, a user would only be able to provide the true label if the target document is in the prediction set of retrieved documents. We propose a novel conformal prediction algorithm targeted at this setting, and prove that it obtains sublinear regret compared to the optimal conformal predictor. We evaluate our algorithm on a retrieval task, an image classification task, and an auction price-setting task, and demonstrate that it empirically achieves good performance compared to several baselines.
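One simplified way to picture the setting is an online threshold update in the spirit of adaptive conformal inference, adapted to semi-bandit feedback: the learner only learns whether the true label fell inside the set, since a miss is observed only as "label not revealed." The sketch below is illustrative of that feedback structure and is not the regret-optimal algorithm from the paper.

```python
# Simplified sketch of online prediction-set calibration under semi-bandit
# feedback. The threshold update is a generic adaptive-conformal-style rule,
# not the algorithm analyzed in the paper.
import numpy as np

def prediction_set(scores, tau):
    # Include every label whose nonconformity score is at most the threshold.
    return [y for y, s in enumerate(scores) if s <= tau]

def online_calibrate(stream, alpha=0.1, lr=0.05, tau=0.5):
    for scores, true_label in stream:   # scores: nonconformity score per label
        S = prediction_set(scores, tau)
        covered = true_label in S       # semi-bandit: a miss is seen only as "not in S"
        # Raise the threshold after a miss, lower it slightly after coverage,
        # targeting 1 - alpha coverage in the long run.
        tau += lr * ((0.0 if covered else 1.0) - alpha)
    return tau

# Illustrative usage with synthetic scores for a 10-class problem.
rng = np.random.default_rng(0)
stream = [(rng.random(10), int(rng.integers(10))) for _ in range(1000)]
print(online_calibrate(stream))
```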
Generative AI without Guardrails Can Harm Learning: Evidence from High School Mathematics Journal Version
Proceedings of the National Academy of Sciences (2025)
Cited in the 2025 Economic Report of the President
Hamsa Bastani; Osbert Bastani; Alp Sungu; Haosen Ge; Özge Kabakcı; Rei Mariman
Generative artificial intelligence (AI) is poised to revolutionize how humans work, and has already demonstrated promise in significantly improving human productivity. However, a key remaining question is how generative AI affects learning, namely, how humans acquire new skills as they perform tasks. This kind of skill learning is critical to long-term productivity gains, especially in domains where generative AI is fallible and human experts must check its outputs. We study the impact of generative AI, specifically OpenAI's GPT-4, on human learning in the context of math classes at a high school. In a field experiment involving nearly a thousand students, we deployed and evaluated two GPT-based tutors, one that mimics a standard ChatGPT interface (called GPT Base) and one with prompts designed to safeguard learning (called GPT Tutor). These tutors comprise about 15% of the curriculum in each of three grades. Consistent with prior work, our results show that access to GPT-4 significantly improves performance (48% improvement for GPT Base and 127% for GPT Tutor). However, we additionally find that when access is subsequently taken away, students actually perform worse than those who never had access (17% reduction for GPT Base). That is, access to GPT-4 can harm educational outcomes. These negative learning effects are largely mitigated by the safeguards included in GPT Tutor. Our results suggest that students attempt to use GPT-4 as a "crutch" during practice problem sessions, and when successful, perform worse on their own. Thus, to maintain long-term productivity, we must be cautious when deploying generative AI to ensure humans continue to learn critical skills.
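For illustration only, the snippet below shows what a safeguarding system prompt in the spirit of the GPT Tutor condition might look like when calling GPT-4 through the OpenAI chat API; it is not the prompt used in the study.

```python
# Illustrative safeguarding prompt, not the study's actual GPT Tutor prompt.
# Requires the openai package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

TUTOR_PROMPT = (
    "You are a math tutor. Guide the student with hints and leading questions, "
    "one step at a time. Never reveal the final answer, and ask the student to "
    "attempt each step before you confirm or correct it."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": TUTOR_PROMPT},
        {"role": "user", "content": "Solve x^2 - 5x + 6 = 0 for me."},
    ],
)
print(response.choices[0].message.content)
```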
Rethinking Algorithmic Fairness for Human-AI Collaboration arXiv
ITCS 2024
Haosen Ge; Hamsa Bastani; Osbert Bastani
Existing approaches to algorithmic fairness aim to ensure equitable outcomes if human decision-makers comply perfectly with algorithmic decisions. However, perfect compliance with the algorithm is rarely a reality or even a desirable outcome in human-AI collaboration. Yet, recent studies have shown that selective compliance with fair algorithms can amplify discrimination relative to the prior human policy. As a consequence, ensuring equitable outcomes requires fundamentally different algorithmic design principles that ensure robustness to the decision-maker's (a priori unknown) compliance pattern. We define the notion of compliance-robustly fair algorithmic recommendations that are guaranteed to (weakly) improve fairness in decisions, regardless of the human's compliance pattern. We propose a simple optimization strategy to identify the best performance-improving compliance-robustly fair policy. However, we show that it may be infeasible to design algorithmic recommendations that are simultaneously fair in isolation, compliance-robustly fair, and more accurate than the human policy; thus, if our goal is to improve the equity and accuracy of human-AI collaboration, it may not be desirable to enforce traditional algorithmic fairness constraints. We illustrate the value of our approach on criminal sentencing data before and after the introduction of an algorithmic risk assessment tool in Virginia.
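As a toy illustration of the definition above, the check below enumerates every compliance pattern for a tiny discrete example and verifies that no blend of human and algorithmic decisions is less fair than the human baseline. The disparity measure (a demographic-parity gap) and the data are illustrative, the brute-force enumeration is only feasible for toy instances, and the paper's optimization strategy is not reproduced here.

```python
# Toy check of compliance-robust fairness: for every compliance pattern
# (which cases the human defers to the algorithm on), the blended decisions
# must be at least as fair as the human's own policy.
from itertools import product

def disparity(decisions, groups):
    # Demographic-parity gap between groups 0 and 1 (positive-decision rates).
    rate = lambda g: sum(d for d, grp in zip(decisions, groups) if grp == g) / groups.count(g)
    return abs(rate(0) - rate(1))

def compliance_robustly_fair(human, algo, groups):
    base = disparity(human, groups)
    # Enumerate every compliance pattern: 1 = follow the algorithm, 0 = follow the human.
    for pattern in product([0, 1], repeat=len(human)):
        blended = [a if c else h for h, a, c in zip(human, algo, pattern)]
        if disparity(blended, groups) > base:
            return False
    return True

human  = [1, 0, 1, 0]   # human decisions per case
algo   = [1, 0, 1, 0]   # candidate algorithmic recommendations
groups = [0, 0, 1, 1]   # group membership per case
print(compliance_robustly_fair(human, algo, groups))
```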
Transformer Encoder for Social Science arXiv Huggingface
Haosen Ge; In Young Park; Xuancheng Qian; Grace Zeng
High-quality text data has become an important data source for social scientists. We have witnessed the success of pretrained deep neural network models, such as BERT and RoBERTa, in recent social science research. In this paper, we propose a compact pretrained deep neural network, Transformer Encoder for Social Science (TESS), explicitly designed to tackle text processing tasks in social science research. Using two validation tests, we demonstrate that TESS outperforms BERT and RoBERTa by 16.7% on average when the number of training samples is limited (<1,000 training instances). The results highlight the advantage of TESS over BERT and RoBERTa on social science text processing tasks. Lastly, we discuss the limitations of our model and offer guidance for future researchers.
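Since TESS is distributed through HuggingFace (linked above), fine-tuning it on a small labeled dataset follows standard transformers usage. The sketch below uses a placeholder model identifier and made-up labels; substitute the actual TESS checkpoint from the HuggingFace page and your own data.

```python
# Sketch of fine-tuning a compact pretrained encoder such as TESS on a small
# labeled social-science dataset. MODEL_ID is a placeholder, not the real id.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import Dataset

MODEL_ID = "path/to/tess-checkpoint"  # placeholder

texts = ["The treaty was ratified in 1994.", "Protests erupted after the vote."]
labels = [0, 1]  # e.g., a binary document classification task with <1,000 examples

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)

dataset = Dataset.from_dict({"text": texts, "label": labels}).map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=128),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tess-finetuned", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=dataset,
)
trainer.train()
```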