Conversation

@Sam-Deciga (Contributor)

REQUIRED: Add a summary of your PR here, typically including why the change is needed and what was changed. Include any design alternatives for discussion purposes.



NVIDIA Llama Nemotron Super 49B v1.5 model launch on Jan 28
b/454127910



REQUIRED: Fill out the below checklists or remove if irrelevant

  1. If you are opening a PR for Official Notebooks under the notebooks/official folder, follow this mandatory checklist:
  • Use the notebook template as a starting point.
  • Follow the style and grammar rules outlined in the above notebook template.
  • Verify the notebook runs successfully in Colab since the automated tests cannot guarantee this even when it passes.
  • Passes all the required automated checks. You can locally test for formatting and linting with these instructions.
  • You have consulted with a tech writer to see if tech writer review is necessary. If so, the notebook has been reviewed by a tech writer, and they have approved it.
  • This notebook has been added to the CODEOWNERS file under the Official Notebooks section, pointing to the author or the author's team.
  • The Jupyter notebook cleans up any artifacts it has created (datasets, ML models, endpoints, etc.) so it does not consume unnecessary resources.

  2. If you are opening a PR for Community Notebooks under the notebooks/community folder:
  • This notebook has been added to the CODEOWNERS file under the Community Notebooks section, pointing to the author or the author's team.
  • Passes all the required formatting and linting checks. You can locally test with these instructions.

  3. If you are opening a PR for Community Content under the community-content folder:
  • Make sure your main Content Directory Name is descriptive and informative, and that it includes some of the key products and attributes of your content, so that it is distinguishable from other content.
  • The main content directory has been added to the CODEOWNERS file under the Community Content section, pointing to the author or the author's team.
  • Passes all the required formatting and linting checks. You can locally test with these instructions.

@Sam-Deciga Sam-Deciga requested a review from a team as a code owner January 27, 2026 23:33
@gemini-code-assist (Contributor)

Summary of Changes

Hello @Sam-Deciga, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new official Jupyter notebook designed to guide users through the process of deploying and interacting with NVIDIA's Llama Nemotron models on Google Cloud's Vertex AI platform. It provides comprehensive instructions and code examples for model registration, endpoint creation, model deployment, and real-time prediction, utilizing both direct API calls and the Vertex AI Python SDK. The notebook focuses on the Llama-3.3-Nemotron-Super-49B-v1.5 model, showcasing its advanced reasoning and efficiency features.

Highlights

  • New Model Integration: Introduced a new official Jupyter notebook (llama_nemotron_intro.ipynb) to demonstrate the deployment and usage of NVIDIA Llama Nemotron models on Google Cloud Vertex AI.
  • Comprehensive Deployment Guide: The notebook provides step-by-step instructions for setting up the Vertex AI environment, uploading the NVIDIA NIM model to the Model Registry, creating and deploying to a Vertex AI Endpoint, and performing real-time predictions.
  • Dual API and SDK Support: Examples are provided for both direct Vertex AI API calls (using curl) and the Vertex AI Python SDK, offering flexibility for users to interact with the deployed models.
  • Prediction Capabilities Showcase: The notebook demonstrates both unary and streaming prediction requests, including the ability to toggle 'reasoning on/off' for the Llama Nemotron model.
  • Model Specifics: Specifically highlights the Llama-3.3-Nemotron-Super-49B-v1.5 model, detailing its capabilities for reasoning, human chat, agentic tasks, 128K token context length, and efficiency gains from a novel Neural Architecture Search (NAS) approach.
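The unary and streaming requests mentioned in the highlights can be sketched as plain request bodies. This is a minimal sketch, not code from the notebook: the model name is a placeholder, and the reasoning toggle strings (`/think`, `/no_think`) are an assumption about how the system prompt controls reasoning and may differ for this model version.

```python
# Sketch: assembling unary vs. streaming prediction payloads for a deployed
# Llama Nemotron endpoint. The toggle strings below are assumptions, and the
# model name is a placeholder.

def build_chat_payload(prompt: str, reasoning: bool, stream: bool = False) -> dict:
    """Assemble an OpenAI-style chat-completions request body."""
    # Reasoning on/off is expressed via the system message (assumed syntax).
    system_msg = "/think" if reasoning else "/no_think"
    payload = {
        "model": "llama-3.3-nemotron-super-49b-v1.5",  # placeholder name
        "messages": [
            {"role": "system", "content": system_msg},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": 512,
    }
    if stream:
        payload["stream"] = True  # request server-sent streaming chunks
    return payload

# Unary request with reasoning off; streaming request with reasoning on.
unary = build_chat_payload("What is 2 + 2?", reasoning=False)
streaming = build_chat_payload("Explain NAS briefly.", reasoning=True, stream=True)
```

Either body could then be sent to the endpoint with curl or the Vertex AI Python SDK, matching the dual API/SDK approach the notebook takes.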
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a new Jupyter notebook for deploying NVIDIA Llama Nemotron models on Google Cloud Vertex AI. The notebook demonstrates both API and Python SDK methods for model upload, endpoint creation, deployment, and prediction. Overall, the notebook provides a good starting point for users. However, there are several areas where the robustness and clarity of the code can be improved, particularly concerning resource ID extraction and consistency in naming conventions.
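The review's point about robust resource ID extraction can be illustrated with a small helper. This is a sketch, not code from the notebook; the function name is hypothetical, and the resource-name format follows the standard Vertex AI pattern (projects/…/locations/…/models/ID, optionally with an @version suffix).

```python
import re

# Hypothetical helper: pull the short model ID out of a fully qualified
# Vertex AI resource name, instead of relying on fixed string offsets.
def extract_model_id(resource_name: str) -> str:
    """Return the trailing ID from 'projects/.../models/<id>[@version]'."""
    match = re.search(r"/models/([^/@]+)", resource_name)
    if match is None:
        raise ValueError(f"not a model resource name: {resource_name!r}")
    return match.group(1)

print(extract_model_id("projects/my-proj/locations/us-central1/models/123@1"))
# → 123
```

Anchoring on the `/models/` segment keeps the extraction correct whether or not a version suffix is present.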

@Sam-Deciga (Contributor, Author)

Checked with the partner, and they confirmed they are OK with it.

@r2xd889zyn-dot

Thanks for the clear PR template and checklists.

The required summary (“why the change is needed,” “what was changed,” and design alternatives), along with CODEOWNERS attribution and environment verification (e.g., Colab runs), creates a strong authorship and accountability signal that’s easy for reviewers to reason about.

The explicit cleanup requirements for generated artifacts are also helpful for maintaining long-term repo and resource hygiene.

Overall, this structure makes review intent, ownership, and execution boundaries much clearer up front. Appreciate the rigor.

@r2xd889zyn-dot

Since MODEL_ID already contains the fully qualified resource path, re-concatenating it with projects/{PROJECT_ID}/locations/{LOCATION}/models/ would indeed produce a malformed identifier.

Using the existing MODEL_ID value directly for the "model" field keeps the payload aligned with the expected resource name format and avoids duplication.

Appreciate the clear explanation in the review comment — it makes the correction unambiguous.
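The fix discussed above can be sketched as follows; PROJECT_ID, LOCATION, and the numeric model ID are placeholders, not values taken from the notebook.

```python
PROJECT_ID = "my-project"   # placeholder
LOCATION = "us-central1"    # placeholder

# MODEL_ID as returned by the Model Registry: already a fully qualified
# resource name.
MODEL_ID = f"projects/{PROJECT_ID}/locations/{LOCATION}/models/1234567890"

# Wrong: re-concatenating the prefix duplicates the resource path.
malformed = f"projects/{PROJECT_ID}/locations/{LOCATION}/models/{MODEL_ID}"
assert malformed.count("projects/") == 2  # the duplication the review flagged

# Right: use the existing MODEL_ID value directly for the "model" field.
deploy_payload = {"deployedModel": {"model": MODEL_ID}}
assert deploy_payload["deployedModel"]["model"].count("projects/") == 1
```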

@gericdong gericdong merged commit 008eb40 into GoogleCloudPlatform:main Jan 29, 2026
4 of 5 checks passed