Academic papers, patents, and research work.
For citation metrics, h-index, and a complete publication list, see my Google Scholar profile.
Amazon Nova Team, Shubham Garg
Amazon Science · 2025
We present Amazon Nova 2, a family of multimodal foundation models that demonstrate state-of-the-art performance in reasoning and generation tasks across text, image, and video modalities.
Research Team, Shubham Garg
NeurIPS 2025 · 2025
We present MIRAGE, a comprehensive benchmark for evaluating multimodal AI systems in the context of agricultural expert consultations, requiring complex information-seeking and reasoning capabilities.
Research Team, Shubham Garg
ACL 2025 · 2025
We introduce Tree-of-Prompts, a framework for systematically optimizing prompts by abstracting the control flow of prompt strategies into a searchable tree structure.
Amazon Nova Team, Shubham Garg
Amazon Science · 2025
This technical report details Amazon Nova Premier, our most capable foundation model, designed for complex reasoning, analysis, and agentic workflows.
Research Team, Shubham Garg
CoRL 2025 · 2025
We propose a real-to-sim-to-real pipeline that leverages vision-language models to generate iterative keypoint rewards for learning robotic manipulation skills.
Amazon Nova Team, Shubham Garg
arXiv · 2024
We introduce the Amazon Nova family of foundation models, spanning multiple sizes and capabilities for diverse AI applications.
Research Team, Shubham Garg
CoRL 2024 · 2024
We present RoboEXP, a framework that constructs action-conditioned scene graphs through interactive exploration to enable more effective robotic manipulation.
Research Team, Shubham Garg
ECML 2023 · 2023
We propose contextual data augmentation techniques that improve the robustness and generalization of task-oriented dialog systems.
We present methods for synthesizing code-switched text in language pairs not seen during training, enabling better multilingual NLP systems.
Research Team, Shubham Garg
EMNLP 2022 (MMNLU Workshop) · 2022
We present a comprehensive empirical analysis of cross-lingual phenomena in voice assistant interactions, using large-scale data from Alexa to understand how users naturally mix languages.