Ayush Maheshwari

(Now at Vizzhy Inc.) Grad student in NLP/ML at CSE, IIT Bombay

Biography

Update: Joined Vizzhy Inc. as a Research Scientist

I am Ayush Maheshwari (आयुष माहेश्वरी), a final-year PhD student at the Indian Institute of Technology Bombay (India), working with Prof. Ganesh Ramakrishnan and Prof. Manjesh Kumar Hanawal. I am fortunate to be funded by the Ekal fellowship from the Ekal foundation.

My research interests lie in Natural Language Processing and graphs from a machine learning perspective. Currently, I am working on constrained neural machine translation and on semi-supervised and unsupervised machine learning problems with data programming.

I am a key member of the neural machine translation project UDAAN, which helps publishers quickly translate technical content into Indian languages. The project is open-source and is used by several Indian government technical education agencies and official languages departments.

In my spare time, I enjoy playing and reading about Indian culture, Ramáyaṇa and Mahábhárat.

Download my resumé. (Last updated: Nov 2022)

Interests
  • Natural Language Processing
  • Human-in-the-loop AI
  • Neural Machine Translation
  • Machine Learning
  • Information Retrieval
Education
  • PhD in Computer Science, 2019-present

    Indian Institute of Technology Bombay

Updates

  • [Feb 24] Our paper on the development of an English-Sanskrit parallel corpus was accepted at LREC-COLING 2024. [Pre-print]

  • [Jan 24] Our paper on ‘Filtering of automatically induced rules for weak supervision’ was accepted at EACL 2024 ❤️. [Paper]

  • [Nov 23] Paper accepted at the Machine Learning for Health Conference (co-located with NeurIPS). [Paper] 🥳

  • 📝 Serving on the PC for the EMNLP Research & Industry Track 2023.

  • 📝 Serving on the PC for ARR, 2022 - Present.

  • [Jul 23] 🎤 Along with Prof. Ganesh Ramakrishnan, I delivered an invited half-day tutorial at the Educational Data Mining Conference held at IISc Bangalore. More details in the Talk section below! 🤩

  • [Mar 23] 📚️ The first batch of engineering books translated into Malayalam using our post-editing tool was released by the Honourable President of India in Thiruvananthapuram, Kerala. More details on this page.

  • [Jan 23] 🏆️ Our paper on a translation post-editing tool won the best paper award at CODS-COMAD 2023 🥳.

Click here for updates archive

Experience

Adobe Research
Research Intern
May 2021 – Aug 2021 Bengaluru

Worked on prototyping a new service for Adobe PDF in the legal domain. Responsibilities included:

  • Modeling of the problem
  • Designing, developing and prototyping using ML
  • Deployment and Demonstration
 
IIT Bombay
Project Engineer
Jan 2016 – Dec 2018 Mumbai
Developed software solutions for security agencies.
 
Tata Consultancy Services
System Engineer
Oct 2011 – Jul 2013 Mumbai

Projects

UDAAN - An NMT pipeline + Post-editing tool to translate documents (Best Paper Award at CODS-COMAD 2023)
UDAAN has an end-to-end Machine Translation (MT) and post-editing pipeline. Using our tool, users can upload a document, obtain raw MT output, and edit the raw translations. We have digitized >100 dictionaries from CSTT. You can freely download these dictionaries from the project website. Our pipeline is being used by >100 translators across 10 languages to translate >50 books.
SPEAR - Programmatically label and quickly build training data
SPEAR is a Python library that reduces data labeling effort using data programming. It implements several recent approaches such as Snorkel, ImplyLoss, and Learning to Reweight. In addition to data labeling, it integrates semi-supervised approaches for training and inference.
Temples of India
Temples of India is a not-for-profit knowledge platform that aims to document and store as many details as possible about temples across the Indian subcontinent. We aim to present every detail related to a temple, such as its location, images, videos, and opening and closing timings.

Recent Publications

Complete list at Google Scholar.
A Benchmark and Dataset for Post-OCR text correction in Sanskrit
Learning to Robustly Aggregate Labeling Functions for Semi-supervised Data Programming
Rule Augmented Unsupervised Constituency Parsing
Unsupervised Learning of Explainable Parse Trees for Improved Generalisation