Workshop Presentation at NASSCOM NLTF 2024 - Building Indian Language Foundation Models


Date
Feb 19, 2024 12:00 PM — 1:30 PM
Location
Mumbai, India

Workshop: Building Indian Language Foundation Models

Presented at NASSCOM’s National Leadership and Technology Forum (NLTF 2024), India’s premier platform for technology and business leadership.

Presentation Overview

Shared technical insights and approaches from building large-scale foundation models designed specifically for Indian languages, and addressed the challenges posed by India's linguistic diversity.

Key Topics Covered:

1. Data Collection & Processing:

  • Large-scale multilingual data curation strategies
  • Quality control for diverse Indian language data
  • Handling code-mixing and transliteration challenges (see the script-detection sketch after this list)
  • Building evaluation datasets for low-resource languages
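To make the code-mixing point concrete, here is a minimal sketch (not material from the talk) of one way to flag code-mixed Hindi-English text using Unicode script ranges; the function name, thresholds, and example sentence are illustrative assumptions.

```python
import re

# Unicode block for Devanagari plus basic Latin letters (illustrative subset).
DEVANAGARI = re.compile(r"[\u0900-\u097F]")
LATIN = re.compile(r"[A-Za-z]")

def script_mix_ratio(text: str) -> float:
    """Fraction of alphabetic characters written in Latin script.

    Values near 0 or 1 suggest monolingual text; intermediate values
    suggest code-mixing (e.g. Hinglish) that may need transliteration
    or separate handling in the data pipeline.
    """
    deva = len(DEVANAGARI.findall(text))
    latin = len(LATIN.findall(text))
    total = deva + latin
    return latin / total if total else 0.0

# Example: a Hinglish sentence mixing Devanagari and Latin script.
print(script_mix_ratio("मुझे ये movie bahut पसंद आई"))  # ≈0.45 → likely code-mixed
```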

2. Model Architecture & Training:

  • Tokenizer design for morphologically rich Indian languages (see the SentencePiece sketch after this list)
  • Training architecture for multilingual models
  • Distributed training on large accelerator clusters
  • Optimization techniques for efficient training
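As a rough illustration of the tokenizer point, the sketch below trains a SentencePiece unigram tokenizer on a multilingual corpus. The corpus path, vocabulary size, and coverage settings are placeholders, not the configuration used for the models discussed in the talk.

```python
import sentencepiece as spm

# Train a unigram tokenizer on a mixed Indic + English corpus.
# The file path and hyperparameters are illustrative placeholders;
# character_coverage close to 1.0 matters for scripts such as
# Devanagari, Tamil, or Bengali so rare characters are not dropped
# or split into bytes too aggressively.
spm.SentencePieceTrainer.train(
    input="indic_corpus.txt",        # one sentence per line (placeholder path)
    model_prefix="indic_sp",         # writes indic_sp.model / indic_sp.vocab
    vocab_size=64000,
    model_type="unigram",
    character_coverage=0.9999,
    byte_fallback=True,              # unseen characters fall back to bytes
)

sp = spm.SentencePieceProcessor(model_file="indic_sp.model")
print(sp.encode("भारत में कई भाषाएँ बोली जाती हैं", out_type=str))
```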

3. Model Tuning & Deployment:

  • Instruction tuning approaches (see the data-formatting sketch after this list)
  • Preference training for alignment
  • Deployment considerations for production systems
  • Performance evaluation across language families
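As an illustration of the instruction-tuning point, the sketch below formats instruction-response pairs into plain training strings. The template and field names are hypothetical; real pipelines would typically use the base model's own chat template.

```python
# Turn (instruction, response) pairs into training strings for
# supervised instruction tuning. The template is illustrative only.
TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n{response}"
)

def format_example(example: dict) -> str:
    return TEMPLATE.format(
        instruction=example["instruction"].strip(),
        response=example["response"].strip(),
    )

examples = [
    {
        "instruction": "इस वाक्य का अंग्रेज़ी में अनुवाद करें: मौसम आज अच्छा है।",
        "response": "The weather is nice today.",
    },
]

for ex in examples:
    print(format_example(ex))
```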

4. Real-world Impact:

  • Applications in education, government services, and content creation
  • Bridging the digital divide through vernacular AI
  • Democratizing access to AI technology across India

Context

This work was conducted while leading a team of five researchers building Indic large language models from scratch, combining technical innovation with practical deployment considerations for India's multilingual landscape.

The presentation contributed to NASSCOM’s vision of positioning India as a global AI powerhouse.

Ayush Maheshwari
Sr. Solutions Architect at NVIDIA
PhD in NLP/ML from CSE, IITB

My research interests include machine learning, NLP and machine translation.