Ayush Maheshwari
IndicParam: Benchmark to evaluate LLMs on low-resource Indic Languages
TL;DR:
Comprehensive benchmark for evaluating LLMs on low-resource Indic languages. Dataset publicly available on Hugging Face. Addresses a critical gap in multilingual NLP evaluation.
Ayush Maheshwari, Kaushal Sharma, Vivek Patel, Aditya Maheshwari
PDF
Cite
Code
Dataset
ParamBench: A Graduate-Level Benchmark for Evaluating LLM Understanding on Indic Subjects
TL;DR:
Graduate-level benchmark for evaluating LLM understanding of Indic subjects. Dataset available on Hugging Face. Addresses the need for rigorous multilingual evaluation.
Ayush Maheshwari, Kaushal Sharma, Vivek Patel, Aditya Maheshwari
PDF
Cite
Code
Dataset
LexGen: Domain-aware Multilingual Lexicon Generation
TL;DR:
Domain-aware multilingual lexicon generation for 6 Indian languages across 8 domains using a routing-based architecture. Releases a benchmark with 75K+ translation pairs.
Accepted at the ACL 2025 Main Conference.
Ayush Maheshwari, Atul Kumar Singh, Karthika NJ, Krishnakant Bhat, Preethi Jyothi, Ganesh Ramakrishnan
PDF
Cite
Code
ARISE: Iterative Rule Induction and Synthetic Data Generation for Text Classification
TL;DR:
ARISE iteratively induces rules and generates synthetic data for text classification via bootstrapping. Outperforms more complex methods such as contrastive learning across diverse domains and languages. Published in Findings of NAACL 2025.
Yaswanth M, Vaibhav Singh, Ayush Maheshwari, Amrith Krishna, Ganesh Ramakrishnan
PDF
Cite
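To make the rule-induction-and-bootstrapping idea in the ARISE summary concrete, here is a toy sketch. It is not ARISE's method. It omits the synthetic data generation step entirely, and the unigram "word implies label" rules, the `min_count`/`min_precision` thresholds and all function names are hypothetical choices made for illustration. Each round induces precise rules from the current labeled set, uses them to pseudo-label part of an unlabeled pool, and repeats with the enlarged set.

```python
from collections import Counter, defaultdict
from typing import Optional

Example = tuple[str, str]  # (text, label)

def induce_rules(labeled: list[Example], min_count: int = 2,
                 min_precision: float = 1.0) -> dict[str, str]:
    """Hypothetical unigram rules: keep word -> label if frequent and precise enough."""
    word_label: dict[str, Counter] = defaultdict(Counter)
    for text, label in labeled:
        for word in set(text.lower().split()):  # count each word once per example
            word_label[word][label] += 1
    rules = {}
    for word, counts in word_label.items():
        label, n = counts.most_common(1)[0]
        if n >= min_count and n / sum(counts.values()) >= min_precision:
            rules[word] = label
    return rules

def apply_rules(text: str, rules: dict[str, str]) -> Optional[str]:
    """Return a label only if every matching rule agrees; otherwise abstain (None)."""
    votes = Counter(rules[w] for w in set(text.lower().split()) if w in rules)
    return next(iter(votes)) if len(votes) == 1 else None

def bootstrap(labeled: list[Example], unlabeled: list[str],
              rounds: int = 3) -> tuple[list[Example], list[str]]:
    """Alternate rule induction and pseudo-labeling until nothing new is labeled."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        rules = induce_rules(labeled)
        newly = [(t, y) for t in pool if (y := apply_rules(t, rules)) is not None]
        if not newly:
            break
        labeled += newly
        taken = {t for t, _ in newly}
        pool = [t for t in pool if t not in taken]
    return labeled, pool

# Toy sentiment data (hypothetical).
seed = [("great movie loved it", "pos"), ("great acting", "pos"),
        ("boring plot", "neg"), ("boring and slow", "neg")]
unlabeled_pool = ["great soundtrack", "slow and boring",
                  "a great soundtrack again", "weird film"]
final_labeled, remaining = bootstrap(seed, unlabeled_pool)
```

On this toy data the first round labels three of the four pool items using only "great" and "boring". The enlarged set then yields new rules such as "soundtrack" and "slow", but "weird film" still matches nothing and stays unlabeled. The strict `min_precision=1.0` is a deliberately conservative choice so that noisy pseudo-labels do not snowball across rounds.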
DictDis: Dictionary Constrained Disambiguation for Improved NMT
TL;DR:
DictDis disambiguates between multiple candidate dictionary translations in lexically constrained NMT. Achieves improvements of 2-3 BLEU points across regulatory, finance, engineering, and health domains. Published in Findings of EMNLP 2024.
Ayush Maheshwari, Preethi Jyothi, Ganesh Ramakrishnan
PDF
Cite
Code
FAIR: Filtering of Automatically Induced Rules
TL;DR:
FAIR filters automatically induced rules using submodular optimization that accounts for precision, coverage, and conflicts. Outperforms existing rule-filtering approaches with statistically significant results. Published at EACL 2024.
Divya Jyoti Bajpai, Ayush Maheshwari, Manjesh Kumar Hanawal, Ganesh Ramakrishnan
PDF
Cite
Code
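The FAIR summary describes rule filtering as submodular optimization over precision, coverage, and conflicts. The sketch below shows the general shape of that idea with a hypothetical objective, not FAIR's actual formulation. The objective is coverage (number of instances fired on), plus `lam` times total rule precision, minus `mu` times pairwise conflicts, where a conflict is an instance on which two selected rules assign different labels. Coverage is submodular, the precision term is modular, and the negated pairwise penalty is submodular, so their sum is submodular (though not monotone). A standard greedy loop adds the rule with the largest positive marginal gain until the budget `k` is reached or no rule helps. The `Rule` layout, weights and names are assumptions.

```python
from itertools import combinations
from typing import Optional

# Hypothetical rule representation: (instance ids it fires on, label, estimated precision).
Rule = tuple[frozenset[int], int, float]

def objective(selected: list[Rule], lam: float = 1.0, mu: float = 1.0) -> float:
    """Hypothetical score: coverage + lam * total precision - mu * pairwise conflicts."""
    covered: set[int] = set()
    for fires, _, _ in selected:
        covered |= fires
    precision = sum(p for _, _, p in selected)
    # Count instances on which two selected rules fire with different labels.
    conflicts = sum(len(a[0] & b[0])
                    for a, b in combinations(selected, 2) if a[1] != b[1])
    return len(covered) + lam * precision - mu * conflicts

def greedy_filter(rules: list[Rule], k: int,
                  lam: float = 1.0, mu: float = 1.0) -> list[int]:
    """Greedily select up to k rule indices; stop early when no marginal gain is positive."""
    chosen: list[int] = []
    for _ in range(k):
        current = objective([rules[i] for i in chosen], lam, mu)
        best_gain, best_idx = 0.0, None
        best_idx: Optional[int]
        for i in range(len(rules)):
            if i in chosen:
                continue
            gain = objective([rules[j] for j in chosen + [i]], lam, mu) - current
            if gain > best_gain:
                best_gain, best_idx = gain, i
        if best_idx is None:
            break
        chosen.append(best_idx)
    return chosen

# Toy candidate rules (hypothetical): rule 1 overlaps rule 0 on instance 2 with a different label.
candidate_rules: list[Rule] = [
    (frozenset({0, 1, 2}), 1, 0.9),
    (frozenset({2, 3}), 0, 0.8),
    (frozenset({3, 4}), 0, 0.7),
    (frozenset({0, 1}), 1, 0.2),
]
selected = greedy_filter(candidate_rules, k=3)
```

Here the greedy loop keeps rules 0, 2 and 3 and rejects rule 1. Rule 1's extra coverage is already provided by rule 2, and its conflict with rule 0 makes its marginal gain negative. Recomputing the full objective each step keeps the sketch short. A real implementation would update marginal gains incrementally, or use lazy greedy evaluation.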
Sāmayik: A Benchmark and Dataset for English-Sanskrit Translation
TL;DR:
First comprehensive benchmark and dataset for English-Sanskrit machine translation. Addresses critical gap in classical language NLP. Published at LREC-COLING 2024.
Ayush Maheshwari, Ashim Gupta, Amrith Krishna, Atul Kumar Singh, Ganesh Ramakrishnan, G. Anil Kumar, Jitin Singla
PDF
Cite
Code
EIGEN: Expert-Informed Joint Learning Aggregation for High-Fidelity Information Extraction from Document Images
TL;DR:
EIGEN combines expert knowledge with joint learning for high-fidelity information extraction from medical document images. Applied to the healthcare domain; published at ML4Health (NeurIPS) 2023.
Abhishek Singh, Venkatapathy Subramanian, Ayush Maheshwari, Pradeep Narayan, Devi Prasad Shetty, Ganesh Ramakrishnan
PDF
Cite
Code
UDAAN - Machine Learning based Post-Editing tool for Document Translation
TL;DR:
A production-ready MT post-editing tool used by 100+ translators to translate technical content into Indian languages. Won the Best Paper Award at CODS-COMAD 2023.
Impact:
The first batch of engineering books translated using UDAAN was released by the President of India.
Ayush Maheshwari, Ajay Ravindran, Venkatapathy Subramanian, Ganesh Ramakrishnan
PDF
Cite
Code
Project
SPEAR: Semi-supervised Data Programming in Python
TL;DR:
Python library for programmatic data labeling using weak supervision. Reduces manual labeling effort by combining multiple weak labeling sources (rules, heuristics, models). Open-source toolkit with 100+ GitHub stars.
Sai Abhishek, Harshad Ingole, Parth Laturia, Vineeth Dorna, Ayush Maheshwari, Rishabh Iyer, Ganesh Ramakrishnan
PDF
Cite
Code
Dataset
Project
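The SPEAR summary describes combining multiple weak labeling sources (rules, heuristics, models) into training labels. The plain-Python sketch below illustrates the core idea using the simplest possible combiner, a majority vote over labeling functions that may abstain. It does not use SPEAR's API or its own aggregation models. The spam/ham task, the three labeling functions and the tie-breaking policy (abstain on ties) are hypothetical.

```python
from collections import Counter
from typing import Callable, Optional

ABSTAIN = None  # a labeling function returns ABSTAIN when it has no opinion
LabelingFunction = Callable[[str], Optional[str]]

def lf_contains_link(text: str) -> Optional[str]:
    """Heuristic: messages containing a URL are likely spam."""
    return "spam" if "http" in text.lower() else ABSTAIN

def lf_prize_words(text: str) -> Optional[str]:
    """Keyword rule: promotional words suggest spam."""
    return "spam" if any(w in {"win", "prize", "free"} for w in text.lower().split()) else ABSTAIN

def lf_short_message(text: str) -> Optional[str]:
    """Heuristic: very short messages are usually legitimate (ham)."""
    return "ham" if len(text.split()) < 5 else ABSTAIN

LFS: list[LabelingFunction] = [lf_contains_link, lf_prize_words, lf_short_message]

def majority_vote(text: str, lfs: list[LabelingFunction] = LFS) -> Optional[str]:
    """Combine non-abstaining votes; abstain if nobody votes or the top labels tie."""
    votes = [v for v in (lf(text) for lf in lfs) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]

weak_labels = {msg: majority_vote(msg) for msg in [
    "WIN a FREE prize at http://example.com",
    "see you soon",
    "free lunch?",
]}
```

Majority vote treats every source as equally reliable. A message like "free lunch?" gets one spam vote and one ham vote and is left unlabeled. Learning per-source reliability from data, rather than counting votes, is the main reason weak-supervision toolkits go beyond this baseline.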