Bodhisattwa P. Majumder
bodhisattwa[at]ucsd.edu
Office @ 4146, CSE (EBU3B)
UC San Diego

I am a 2nd year Ph.D. student in the Artificial Intelligence Group, Computer Science Department, UC San Diego, advised by Prof. Julian McAuley. I work on Natural Language Generation and Conversational AI with a focus on personalization and common sense reasoning. Broadly, my research interests lie at the intersection of Natural Language Processing and Machine Learning.

I am spending my summer at the NLP Group @ Microsoft Research working with Sudha Rao and Bill Dolan. In 2019, I spent another wonderful summer at Google AI Research with Sandeep Tata and Marc Najork. Currently, I also lead Team Bernard from UC San Diego in the Amazon Alexa Prize.

Previously, I graduated (2017) summa cum laude from IIT Kharagpur with a Master's in Machine Learning. I was advised by Prof. Animesh Mukherjee and Prof. Pawan Goyal. Before joining UC San Diego, I was a Research Engineer at Walmart Labs building large-scale NLP and ML applications for eCommerce. I just finished writing my book on Practical NLP, published by O'Reilly Media.

By Note-to-Self
Museum of Photographic Arts (MOPA)
A fusion with piano notes and digital image

CV  |  Google Scholar  |  Github
LinkedIn  |  Twitter

Highlights

  • [Sept] Joined the NLP group at CSE, UC San Diego in Fall 2018.
  • [July] Paper w/ Amrith Krishna, Rajesh Bhat and Pawan Goyal got published in CoNLL, 2018.

Here in xkcd.

Research

I explore dialog systems, question-answering, grounded generation, and natural language generation tasks more broadly. I'm interested in developing generative models that are personalized, capable of reasoning about common sense and world events, and able to provide subjective knowledge, all in service of interactive systems. My previous research on NLP includes information extraction, sequence labeling, sequence generation, and natural language parsers. I also worked on statistical modeling, game theory, and machine learning applications.

Selected research projects are listed here. The complete list of my publications can be found on my Google Scholar page.

Publications
(* denotes equal contribution)

Representation Learning for Information Extraction from Form-like Documents
Bodhisattwa P. Majumder, Navneet Potti, Sandeep Tata, James Wendt, Qi Zhao, Marc Najork
2020 Annual Conference of the Association for Computational Linguistics (ACL)
pdf | blog | slides

We propose a novel approach using representation learning to extract structured information from form-like document images. Our extraction system learns an interpretable representation for each candidate, generated based on the type of the target field and taking into account neighboring words and their spatial distribution in the document. These representations prove effective for solving the extraction task on unseen document templates. This work was done at Google AI as part of a 2019 summer internship.

ReZero is All You Need: Fast Convergence at Large Depth
Thomas Bachlechner*, Bodhisattwa P. Majumder*, Henry Mao*, Gary Cottrell, Julian McAuley
Preprint. Work In Progress. arXiv, 2020
pdf | code

To facilitate deep signal propagation, we propose ReZero, a simple change to the architecture that initializes an arbitrary layer as the identity map, using a single additional learned parameter per layer. When applied to 12-layer Transformers, ReZero converges 56% faster on enwiki8. ReZero applies beyond Transformers to other residual networks, enabling 1,500% faster convergence for deep fully connected networks and 32% faster convergence for a ResNet-56 trained on CIFAR-10.
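
As a rough illustration of the idea above (not the released code), the sketch below assumes each residual block computes x + alpha * F(x), with the scalar alpha starting at zero so that every block is the identity at initialization. The names rezero_block, toy_sublayer, and run_stack are hypothetical, and the toy sublayer merely stands in for an attention or feed-forward layer.

```python
import math


def toy_sublayer(x):
    # Hypothetical stand-in for an attention or feed-forward sublayer.
    return [math.tanh(2.0 * xi + 1.0) for xi in x]


def rezero_block(x, sublayer, alpha):
    # Residual update gated by one scalar: x + alpha * F(x).
    # Assumption: alpha is the single extra learned parameter per layer.
    fx = sublayer(x)
    return [xi + alpha * fi for xi, fi in zip(x, fx)]


def run_stack(x, alphas):
    # Apply one gated residual block per entry in alphas.
    for alpha in alphas:
        x = rezero_block(x, toy_sublayer, alpha)
    return x


x0 = [0.5, -1.0, 2.0]
depth = 12

# With every alpha at zero, the whole stack is the identity map.
out_at_init = run_stack(x0, [0.0] * depth)

# Once the alphas move away from zero, the sublayers start to contribute.
out_after_update = run_stack(x0, [0.1] * depth)
```

In a real model the alphas would be trained by gradient descent along with the other weights; here they are set by hand only to show the two regimes.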

Generating Personalized Recipes from Historical User Preferences
Bodhisattwa P. Majumder*, Shuyang Li*, Jianmo Ni, Julian McAuley
2019 Conference on Empirical Methods in Natural Language Processing (EMNLP)
pdf | code | data

Media coverage: Science Node, UCSD CSE News, UCSD JSOE News

We propose a new task, personalized recipe generation: expanding a name and incomplete ingredient details into complete natural-text instructions aligned with the user's historical preferences.

Improving Neural Story Generation by Targeted Common Sense Grounding
Henry Mao, Bodhisattwa P. Majumder, Julian McAuley, Gary Cottrell
2019 Conference on Empirical Methods in Natural Language Processing (EMNLP)
pdf | code

We propose a simple multi-task learning scheme to achieve quantitatively better common sense reasoning in language models by leveraging auxiliary training signals from datasets designed to provide common sense grounding.

Upcycle Your OCR: Reusing OCRs for Post-OCR Text Correction in Romanised Sanskrit
Amrith Krishna, Bodhisattwa P. Majumder, Rajesh S. Bhat, Pawan Goyal
2018 Conference on Computational Natural Language Learning (CoNLL), co-located with EMNLP
pdf | code+data | supplementary

We propose a post-OCR text correction approach for digitising texts in Romanised Sanskrit. We find that using a copying mechanism (Gu et al., 2016) yields a percentage increase of 7.69 in Character Recognition Rate (CRR) over the current SOTA model for monotone sequence-to-sequence tasks (Schnober et al., 2016). This work was done in collaboration with CNeRG.

An 'Eklavya' approach to learning Context Free Grammar rules for Sanskrit using Adaptor Grammar
Amrith Krishna, Bodhisattwa P. Majumder, Anil K. Boga, Pawan Goyal
17th World Sanskrit Conference, 2018
pdf

This work presents the use of Adaptor Grammar, a non-parametric Bayesian approach for learning (Probabilistic) Context Free Grammar productions from data. We discuss the effect of using Adaptor Grammars for Sanskrit on word-level supervised tasks, such as compound type identification and identification of source and derived words from corpora for derivational nouns, as well as on sentence-level structured prediction. This work was done in collaboration with CNeRG.

Deep Recurrent Neural Networks for Product Attribute Extraction in eCommerce
Bodhisattwa P. Majumder*, Aditya Subramanian*, Abhinandan Krishnan, Shreyansh Gandhi, Ajinkya More
arXiv, 2017
pdf | system description | video

We demonstrate the potential of recurrent neural architectures for product attribute extraction, improving overall F1 scores by at least 0.0391 over the previous benchmarks (More et al., 2016). This has helped Walmart e-commerce achieve significant coverage of important facets, or attributes, of products. This work was done at Walmart Labs and was followed by a US patent from Wal-mart.

Distributed Semantic Representations of Retail Products based on Large-scale Transaction Logs
Bodhisattwa P. Majumder*, Sumanth S Prabhu*, Julian McAuley
2018
report

We processed 18 million transactions covering 325,548 unique products from 1,551 categories to obtain vector representations that preserve product analogies. These representations were effective in identifying substitutes and complements. This work was done at Walmart Labs.

When lolcats meet philosoraptors! - What's in a 'meme'?
Bodhisattwa P. Majumder, Amrith Krishna, Unni Krishnan, Anil K. Boga, Animesh Mukherjee
arXiv, 2018
pdf | presentation

How similar are the dynamics of meme-based communities to those of text-based communities? We try to explain the community dynamics by categorising each day based on temporal variations in user engagement. This work was done in collaboration with CNeRG.

Patents
  • REDCLAN - RElative Density based CLustering and Anomaly Detection, Wal-mart, 2018
  • Automated Extraction of Product Attributes from Images, Wal-mart, 2018
  • System and Method for Product Attribute Extraction Using a Deep Recurrent System, Wal-mart, 2017
  • Analytical Determination of Competitive Interrelationship between Item Pairs, Wal-mart, 2017
Experiences

Microsoft Research, Redmond
Summer, 2020
Research Intern with Sudha Rao, Michel Galley, and Bill Dolan in the Natural Language Processing Group.

Developing dialog and question generation models.

Amazon Alexa Prize
2019-2020
Team Leader of Bernard, UC San Diego.

Media Coverage: cnet
Building a free-form social conversational agent as a finalist in the Amazon Alexa Prize Challenge 2019-2020, alongside 9 other finalist universities. We have been awarded $250,000 to support our research on dialog systems.

Google AI, Mountain View
Summer, 2019
Research Intern with Sandeep Tata and Navneet Potti from Team Juicer.

Developed an information extraction framework for form-like documents using representation learning. The work was featured as an Intern Spotlight article in the Google-wide newsletter and is being integrated with Google Cloud's Document AI. It was accepted as a long paper at ACL '20.

Walmart Labs
2017-2018
Research Engineer

Developed a neural multimodal attribute tagging framework to improve faceted product using both product description and product images. The work produced 2 US patents and one technical report published on arXiv. Other works on user modeling and product embeddings have also been patented.

Awards
  • [2020] Finalist in Qualcomm Innovation Fellowship, 2020 for North America
  • [2019] Nominated by UC San Diego (one of two from Dept. of CSE) for Google PhD Fellowship 2020
  • [2019] Intern Spotlight in Google-wide Engineering Newsletter for summer internship project with the Juicer Team
  • [2019] Team Leader for Team Bernard representing UC San Diego, a finalist in Alexa Prize 2019; awarded $250,000
  • [2018] Department Fellowship, 1st-year of PhD, Dept. of CSE, UC San Diego
  • [2017] Gold medal and Endowment for the highest academic performance (Rank-1) in Masters, IIT Kharagpur
  • [2016] Finalist, Data Science Game '16, Paris; Represented India (1 out of 3 teams), International Rank 14
  • [2015] Scholarship for academic excellence (obtaining CGPA > 9.5), Indian Statistical Institute
  • [2014] Officially credited as a contributor to NSF-CPS project (CNS-1136040) by the PIs, Kansas State University
  • [2011] 4-year scholarship for academic excellence, Ministry of Human Resource & Development, India
Education

PhD, Computer Science and Engineering
University of California, San Diego
2018-Present

Advised by Prof. Julian McAuley on Adapting Personalization and Common Sense Reasoning in Language Generation for Interactive and Conversational Systems.

MS, Computer Science and Engineering
University of California, San Diego
2018-2020

CGPA: 4.0; Courses: Intro to NLP, Data Mining, Program Synthesis, Deep Learning for Sequences, Probabilistic Reasoning, Intro to Computer Vision, Convex Optimization, Human-centered Programming

MS, Data Science and Machine Learning
Indian Institute of Technology, Kharagpur
2015-2017

Summa cum laude (Gold Medalist); Advised by Prof. Animesh Mukherjee as a part of CNeRG lab. Courses: Algorithms, Intro to ML, Multivariate Analysis, Complex Networks, Information Retrieval

Book: Practical NLP by O'Reilly

Practical Natural Language Processing
O'Reilly Media, 2020
Sowmya Vajjala, Bodhisattwa P. Majumder, Anuj Gupta, Harshit Surana
pre-order | early release (requires login) | website

Practical Natural Language Processing is a guide to building, iterating on, and scaling NLP systems in a business setting and to tailoring them for various industry verticals. The book distills our collective wisdom on building real-world applications, covering data collection, working with noisy data and signals, incremental development of solutions, and the issues involved in deploying solutions as part of a larger application. It aims to bridge a gap between current textbooks and online offerings.

Invited Talks
  • [2020, Upcoming] at INFORMS 2020, Mining and Learning on Graphs session in Washington, DC
  • [2020] at UC San Diego, CSE Research Open House, on Personalization in Natural Language Generation
  • [2018] at Indian Inst of Management Calcutta, Industry Conclave & Graduate Orientation, on NLP - a primer
  • [2017] at Walmart Labs, on Information Extraction from Images - Application in e-Commerce
  • [2017] at Indian Statistical Institute, on Deep Neural Network: in light of Optimization and Regularization

Thanks to Jon Barron for this nice template! Art by Bekin M ~