Ayush Maheshwari

TL;DR: A comprehensive benchmark for evaluating LLMs on low-resource Indic languages. Dataset publicly available on Hugging Face. Addresses a critical gap in multilingual NLP evaluation.

Ayush Maheshwari, Kaushal Sharma, Vivek Patel, Aditya Maheshwari

TL;DR: Graduate-level benchmark for evaluating LLM understanding of Indic subjects. Dataset available on Hugging Face. Addresses the need for rigorous multilingual evaluation.

Ayush Maheshwari, Kaushal Sharma, Vivek Patel, Aditya Maheshwari

TL;DR: Domain-aware multilingual lexicon generation for 6 Indian languages across 8 domains using a routing-based architecture. Released a benchmark with 75K+ translation pairs. Accepted at the ACL 2025 Main Conference.

Ayush Maheshwari, Atul Kumar Singh, Karthika NJ, Krishnakant Bhat, Preethi Jyothi, Ganesh Ramakrishnan

TL;DR: ARISE iteratively induces rules and generates synthetic data for text classification via bootstrapping. Outperforms complex methods like contrastive learning across diverse domains and languages. Published at NAACL 2025 Findings.

Yaswanth M, Vaibhav Singh, Ayush Maheshwari, Amrith Krishna, Ganesh Ramakrishnan

TL;DR: DictDis disambiguates between multiple candidate dictionary translations in lexically constrained NMT. Achieves improvements of 2-3 BLEU points across the regulatory, finance, engineering, and health domains. Published at EMNLP 2024 Findings.

Ayush Maheshwari, Preethi Jyothi, Ganesh Ramakrishnan

TL;DR: FAIR filters automatically induced rules using submodular optimization that accounts for precision, coverage, and conflicts. Outperforms existing rule-filtering approaches with statistically significant results. Published at EACL 2024.

Divya Jyoti Bajpai, Ayush Maheshwari, Manjesh Kumar Hanawal, Ganesh Ramakrishnan

TL;DR: First comprehensive benchmark and dataset for English-Sanskrit machine translation. Addresses a critical gap in classical-language NLP. Published at LREC-COLING 2024.

Ayush Maheshwari, Ashim Gupta, Amrith Krishna, Atul Kumar Singh, Ganesh Ramakrishnan, G. Anil Kumar, Jitin Singla

TL;DR: EIGEN combines expert knowledge with joint learning for high-fidelity information extraction from medical document images. Applied to the healthcare domain. Presented at ML4Health (NeurIPS) 2023.

Abhishek Singh, Venkatapathy Subramanian, Ayush Maheshwari, Pradeep Narayan, Devi Prasad Shetty, Ganesh Ramakrishnan

TL;DR: A production-ready MT post-editing tool used by 100+ translators to translate technical content into Indian languages. Won the Best Paper Award at CODS-COMAD 2023. Impact: The first batch of engineering books translated using UDAAN was released by the President of India.

Ayush Maheshwari, Ajay Ravindran, Venkatapathy Subramanian, Ganesh Ramakrishnan

TL;DR: Python library for programmatic data labeling using weak supervision. Reduces manual labeling effort by combining multiple weak labeling sources (rules, heuristics, models). Open-source toolkit with 100+ GitHub stars.

Sai Abhishek, Harshad Ingole, Parth Laturia, Vineeth Dorna, Ayush Maheshwari, Rishabh Iyer, Ganesh Ramakrishnan
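
To make the idea of combining weak labeling sources concrete, here is a minimal sketch in plain Python. It is not the library's API or its aggregation method: it swaps in simple majority voting with abstention as a stand-in, and every name in it (`ABSTAIN`, `lf_contains_free`, `lf_many_exclamations`, `lf_greeting`, `weak_label`) is hypothetical and chosen only for illustration.

```python
from collections import Counter

# Sentinel returned by a labeling function that has no opinion (assumption).
ABSTAIN = None


def lf_contains_free(text):
    """Hypothetical keyword rule: the word 'free' suggests spam."""
    return "spam" if "free" in text.lower() else ABSTAIN


def lf_many_exclamations(text):
    """Hypothetical heuristic: three or more '!' suggests spam."""
    return "spam" if text.count("!") >= 3 else ABSTAIN


def lf_greeting(text):
    """Hypothetical rule: a message opening with a greeting suggests ham."""
    return "ham" if text.lower().startswith(("hi", "hello")) else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_free, lf_many_exclamations, lf_greeting]


def weak_label(text, labeling_functions=LABELING_FUNCTIONS):
    """Combine weak sources by majority vote; abstain on no votes or a tie."""
    votes = []
    for lf in labeling_functions:
        label = lf(text)
        if label is not ABSTAIN:
            votes.append(label)
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    # If the top two labels have equal support, do not guess.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


if __name__ == "__main__":
    for message in ["FREE prize!!! Click now", "Hello, lunch tomorrow?", "Meeting at 3pm"]:
        print(repr(message), "->", weak_label(message))
```

Majority voting treats every source as equally reliable. Weak-supervision toolkits typically go further and learn how much to trust each source, which this sketch does not attempt.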