Bodhisattwa Prasad Majumder
bodhisattwa[at]ucsd.edu

I am a Ph.D. student in the Artificial Intelligence Group, Computer Science Department, UC San Diego, advised by Prof. Ndapa Nakashole. I focus on generalised methods for Natural Language Understanding (NLU) tasks. I explore multi-task learning and domain adaptation, mainly to realize benefits in low-resource regimes with limited labelled data. Broadly, my research interests lie at the intersection of Natural Language Processing (NLP), Computational Linguistics (CL), and Machine Learning (ML).

I graduated from IIT Kharagpur with a Master's degree, majoring in Data Science and Machine Learning, under the supervision of Prof. Animesh Mukherjee. Before joining UC San Diego, I was a Research Engineer at Walmart Labs, building large-scale NLP and Machine Learning applications for eCommerce. I actively collaborated with CNeRG, the NLP-Complex Network Group of IIT Kharagpur, on research in NLP and CL.

CV  /  Google Scholar  /  LinkedIn  /  Github  /  Twitter

By Note-to-Self
Museum of Photographic Arts (MOPA)
A fusion with piano notes and digital image

News

Here in xkcd.

Research

I'm interested in building NLP models that work in low-resource settings and are sample-efficient. I try to design, develop, and analyze generalized models for machine reading tasks that support domain and language adaptation. I also seek language representations that are domain-agnostic and can aid low-resource machine reading. My previous NLP research includes sequence labeling, sequence generation, and natural language parsers. I have also worked on statistical modeling, game theory, and machine learning applications.

Selected research projects are listed here. The complete list of my publications is available on my Google Scholar page.

Upcycle Your OCR: Reusing OCRs for Post-OCR Text Correction in Romanised Sanskrit
Amrith Krishna, Bodhisattwa P. Majumder, Rajesh S. Bhat, Pawan Goyal
22nd Conference on Computational Natural Language Learning (CoNLL), co-located with EMNLP, 2018
code+data / supplementary

We propose a post-OCR text correction approach for digitising texts in Romanised Sanskrit. We find that using a copying mechanism (Gu et al., 2016) yields a percentage increase of 7.69 in Character Recognition Rate (CRR) over the current SOTA model for monotone sequence-to-sequence tasks (Schnober et al., 2016).

This work was done in collaboration with CNeRG.

An 'Eklavya' approach to learning Context Free Grammar rules for Sanskrit using Adaptor Grammar
Amrith Krishna, Bodhisattwa P. Majumder, Anil K. Boga, Pawan Goyal
17th World Sanskrit Conference, 2018

This work presents the use of Adaptor Grammars, a non-parametric Bayesian approach for learning (Probabilistic) Context-Free Grammar productions from data. We discuss the effect of using Adaptor Grammars for the Sanskrit language on word-level supervised tasks, such as compound type identification and identification of source and derived words from corpora for derivational nouns, and on sentence-level structured prediction.

This work was done in collaboration with CNeRG.

Deep Recurrent Neural Networks for Product Attribute Extraction in eCommerce
Bodhisattwa P. Majumder*, Aditya Subramanian*, Abhinandan Krishnan, Shreyansh Gandhi, Ajinkya More (* denotes equal contribution)
arXiv, 2017
system description / video

We demonstrate the potential of recurrent neural architectures for product attribute extraction, improving overall F1 scores over the previous benchmarks (More et al., 2016) by at least 0.0391. This has enabled Walmart e-commerce to achieve significant coverage of important product facets, or attributes.

This work was done at Walmart Labs and was followed by a US patent from Wal-mart.

Distributed Semantic Representations of Retail Products based on Large-scale Transaction Logs
Bodhisattwa P. Majumder*, Sumanth S Prabhu*, Julian McAuley
2018

We processed 18 million transactions covering 325,548 unique products from 1,551 categories to obtain vector representations that preserve product analogies. These representations were effective in identifying substitutes and complements.

This work was done at Walmart Labs.

When lolcats meet philosoraptors! - What's in a 'meme'?
Bodhisattwa P. Majumder, Amrith Krishna, Unni Krishnan, Anil K. Boga, Animesh Mukherjee
arXiv, 2018
presentation

How similar are the dynamics of meme-based communities to those of text-based communities? We try to explain community dynamics by categorising each day based on temporal variations in user engagement.

This work was done in collaboration with CNeRG.

Course Projects

Exploring Domain Adaptability for Sentiment Classification Models
Bodhisattwa P. Majumder*, Khalil Mrini*, Yutong Shao* (* denotes alphabetical order)
CSE 258, Fall 2018

We explore the domain adaptability of machine learning models in a sentiment classification task. We train our model on reviews from one domain and study how it performs on reviews from a different domain. Our goal is a generalized representation of the input text that captures the general sense of words or expressions that are discriminative for sentiment, irrespective of domain.

Patents
  • [US patent] REDCLAN - RElative Density based CLustering and Anomaly Detection, Wal-mart, 2018
  • [US patent] Automated Extraction of Product Attributes from Images, Wal-mart, 2018
  • [US patent] System and Method for Product Attribute Extraction Using a Deep Recurrent System, Wal-mart, 2017
  • [US patent] Analytical Determination of Competitive Interrelationship between Item Pairs, Wal-mart, 2017
Invited Talks
  • [2018] at Indian Institute of Management Calcutta, Industry Conclave & Graduate Orientation, on NLP - a primer
  • [2017] at Walmart Labs, on Information Extraction from Images - Application in e-Commerce
  • [2017] at Indian Statistical Institute, on Deep Neural Network: in light of Optimization and Regularization

Thanks to Jon Barron for this nice template!