Research
My research goal is to develop communicative reasoners that can learn, adapt, and reason by interacting with the world and produce effective, explainable, and equitable outcomes. I apply my research to advance the frontier of science.
I led the development of Asta DataVoyager, a data-driven discovery tool to accelerate science with generative AI. We are the first to introduce a trustworthy evaluation harness for AI agents performing autonomous scientific discovery tasks.
Research highlights include DataVoyager, Auto-discovery with Surprisal, DiscoveryBench, DiscoveryWorld, and CodeScientist. I also design, train, and evaluate LLM agents at scale.
See my full body of work in Publications.
Awards
- [2025] Outstanding Position Paper at the International Conference on Machine Learning (ICML), 2025
- [2024] Recipient of UCSD CSE Doctoral Dissertation Award
- [2022] Work recognized as Highlights of ACM RecSys '22; invited for ACM Transactions on Recommendation Systems
- [2022] Recipient of TrustNLP Travel Grant award for NAACL 2022
- [2022] Recipient of UCSD CSE Doctoral Award for Excellence in Research, 2022
- [2022] Recipient of Adobe Research Fellowship, 2022
- [2021] Recipient of Friends of the International Center Fellowship, UC San Diego
- [2020] Recipient of Qualcomm Innovation Fellowship 2020, North America
- [2019] Intern Spotlight in Google-wide Engineering Newsletter for summer internship project with the Juicer Team
- [2019] Awarded $250,000 for leading UC San Diego (Team Bernard) in the finals of Alexa Prize 2019
- [2018] Department Fellowship, first year of PhD, Dept. of CSE, UC San Diego
- [2017] Gold medal and Endowment for the highest academic performance (Rank-1) in Masters, IIT Kharagpur
- [2016] Finalist, Data Science Game '16, Paris; Represented India (1 out of 3 teams), International Rank 14
- [2015] Scholarship for academic excellence (obtaining CGPA > 9.5), Indian Statistical Institute
- [2011] 4-year scholarship for academic excellence, Ministry of Human Resource & Development, India
Book: Practical Natural Language Processing by O'Reilly
Practical Natural Language Processing
O'Reilly Media, 2020
Sowmya Vajjala, Bodhisattwa P. Majumder, Anuj Gupta, Harshit Surana
amazon | safari online | website
Practical Natural Language Processing distills our collective wisdom on building real-world applications, covering data collection, working with noisy data and signals, incremental development of solutions, and the issues involved in deploying those solutions as part of a larger application, bridging the gap between current textbooks and online offerings.
Highlights:
- Endorsed by Zach Lipton, Sebastian Ruder, Marc Najork et al.
- #1 Best Seller on Amazon.com in the Data Mining category
- #1 New Release on Amazon.com in the Natural Language Processing category
- Read and adapted by 20+ AI companies and 6 academic courses internationally
Talks
Continual Learning with Language Agents | slides
- [2024] at Commonsense Reasoning in Natural Language Processing, University of British Columbia
User-centric Natural Language Processing | video
- [2023] PhD Defense, CSE, UC San Diego
Effective, Explainable, and Equitable NLP with Knowledge and Interactions | slides
- [2022] at Stanford University
- [2022] at Allen Institute for AI
- [2022] at University of Southern California/USC-ISI
- [2022] at Harvard University/Harvard Business School
- [2022] at University College London
- [2022] at UC Irvine
- [2022] at University of British Columbia
- [2022] at UC San Diego
Producing Explanations with Commonsense and Interactions | slides
- [2022] at AI Research Seminar, UC San Diego
- [2021] at Allen Institute for AI
Explainable Language Generation with Commonsense | slides
- [2021] at Facebook AI Research
- [2021] at Machine Learning Group, Oxford University
Grounding Language Generation with World Knowledge | slides
- [2021] at Microsoft Research, India
- [2021] at IIT Kharagpur
- [2020] at NC State, AI Club
- [2020] at INFORMS 2020, Mining and Learning on Graphs session, Washington, DC
Clarification Question Generation using Global Knowledge | slides
- [2021] at Microsoft Research, Redmond
- [2021] at AI Research Seminar, UC San Diego
Personalization, NLP and others
- [2020] at UC San Diego, CSE Research Open House, on Personalization in Natural Language Generation
- [2018] at Indian Institute of Management Calcutta, Industry Conclave & Graduate Orientation, on NLP - a primer
- [2017] at Walmart Labs, on Information Extraction from Images - Application in e-Commerce
- [2017] at Indian Statistical Institute, on Deep Neural Network: in light of Optimization and Regularization