Use the GitHub Models service from the CLI!
The extension requires the `gh` CLI to be installed and available in the `PATH`. It also requires that the user has authenticated via `gh auth`.
After installing the `gh` CLI, run this from a command line:

```shell
gh extension install https://github.com/github/gh-models
```
If you've previously installed the `gh models` extension and want to update to the latest version, you can run this command:

```shell
gh extension upgrade github/gh-models
```
To list available models, run:

```shell
gh models list
```

Example output:

```
ID                              DISPLAY NAME
ai21-labs/ai21-jamba-1.5-large  AI21 Jamba 1.5 Large
openai/gpt-4.1                  OpenAI GPT-4.1
openai/gpt-4o-mini              OpenAI GPT-4o mini
cohere/cohere-command-r         Cohere Command R
deepseek/deepseek-v3-0324       Deepseek-V3-0324
```
Use the value in the "ID" column when specifying the model on the command line.
Run the extension in REPL mode, which will prompt you for which model to use:

```shell
gh models run
```

In REPL mode, use `/help` to list available commands. Otherwise, just type your prompt and hit ENTER to send it to the model.
Run the extension in single-shot mode, which will print the model output and exit:

```shell
gh models run openai/gpt-4o-mini "why is the sky blue?"
```
Run the extension with output piped in from another command; this uses single-shot mode:

```shell
cat README.md | gh models run openai/gpt-4o-mini "summarize this text"
```
Run evaluation tests against a model using a `.prompt.yml` file:

```shell
gh models eval my_prompt.prompt.yml
```
The evaluation will run the test cases defined in the prompt file and display results in a human-readable format. For programmatic use, you can output results in JSON format:

```shell
gh models eval my_prompt.prompt.yml --json
```
The JSON output includes detailed test results, evaluation scores, and summary statistics that can be processed by other tools or CI/CD pipelines.
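For example, the JSON output can be turned into a CI gate. The sketch below parses a sample result and fails when any test failed; the `summary.failed` field is an assumed schema for illustration, not the documented format, so check the actual `--json` output for the real field names:

```shell
# Hypothetical CI gate sketch. In a real pipeline, replace the sample string with:
#   result=$(gh models eval my_prompt.prompt.yml --json)
# The "summary.failed" field below is an assumption, not the documented schema.
result='{"summary":{"passed":3,"failed":0}}'   # sample output for illustration

# Extract the failure count with Python's stdlib JSON parser.
failed=$(echo "$result" | python3 -c 'import json,sys; print(json.load(sys.stdin)["summary"]["failed"])')

if [ "$failed" -gt 0 ]; then
  echo "Evals failed: $failed test(s)"
  exit 1
fi
echo "All evals passed"
```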
Here's a sample GitHub Action that uses the `eval` command to automatically run the evals in any PR that updates a prompt file: evals_action.yml.
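A minimal workflow along those lines might look like the following sketch; the trigger paths, step names, and the `models: read` permission are assumptions for illustration, not a copy of evals_action.yml:

```yaml
# Hypothetical workflow sketch; see evals_action.yml for the actual action.
name: Run prompt evals
on:
  pull_request:
    paths:
      - '**/*.prompt.yml'
permissions:
  contents: read
  models: read   # assumed permission for GitHub Models access
jobs:
  evals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install gh-models extension
        run: gh extension install github/gh-models
        env:
          GH_TOKEN: ${{ github.token }}
      - name: Run evals
        run: gh models eval my_prompt.prompt.yml
        env:
          GH_TOKEN: ${{ github.token }}
```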
Learn more about `.prompt.yml` files here: Storing prompts in GitHub repositories.
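For reference, a minimal `.prompt.yml` might look like the sketch below; the field layout follows the format described in the linked docs, while the specific prompt, test data, and evaluator are illustrative:

```yaml
# Minimal illustrative prompt file; see the linked docs for the full schema.
name: Text Summarizer
model: openai/gpt-4o-mini
messages:
  - role: system
    content: You are a concise technical summarizer.
  - role: user
    content: "Summarize the following text: {{input}}"
testData:
  - input: "The sky appears blue because shorter wavelengths scatter more."
evaluators:
  - name: mentions-scattering
    string:
      contains: "scatter"
```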
Generate comprehensive test cases for your prompts using the PromptPex methodology:

```shell
gh models generate my_prompt.prompt.yml
```
The `generate` command analyzes your prompt file and automatically creates test cases to evaluate the prompt's behavior across different scenarios and edge cases. This helps ensure your prompts are robust and perform as expected.
You can customize the test generation process with various options:

```shell
# Specify effort level (low, medium, high)
gh models generate --effort high my_prompt.prompt.yml

# Use a specific model for groundtruth generation
gh models generate --groundtruth-model "openai/gpt-4.1" my_prompt.prompt.yml

# Disable groundtruth generation
gh models generate --groundtruth-model "none" my_prompt.prompt.yml

# Load from existing session file
gh models generate --session-file my_prompt.session.json my_prompt.prompt.yml

# Custom instructions for specific generation phases
gh models generate --instruction-intent "Focus on edge cases" my_prompt.prompt.yml
```
The command supports custom instructions for different phases of test generation:

- `--instruction-intent`: Custom system instruction for intent generation
- `--instruction-inputspec`: Custom system instruction for input specification generation
- `--instruction-outputrules`: Custom system instruction for output rules generation
- `--instruction-inverseoutputrules`: Custom system instruction for inverse output rules generation
- `--instruction-tests`: Custom system instruction for tests generation
The `generate` command is based on PromptPex, a Microsoft Research framework for systematic prompt testing. PromptPex follows a structured approach to generate comprehensive test cases by:
- Intent Analysis: Understanding what the prompt is trying to achieve
- Input Specification: Defining the expected input format and constraints
- Output Rules: Establishing what constitutes correct output
- Inverse Output Rules: Generating negated versions of the output rules to test the prompt with invalid inputs
- Test Generation: Creating diverse test cases that cover various scenarios using the prompt, the intent, input specification and output rules
```mermaid
graph TD
    PUT(["Prompt Under Test (PUT)"])
    I["Intent (I)"]
    IS["Input Specification (IS)"]
    OR["Output Rules (OR)"]
    IOR["Inverse Output Rules (IOR)"]
    PPT["PromptPex Tests (PPT)"]

    PUT --> IS
    PUT --> I
    PUT --> OR
    OR --> IOR
    I ==> PPT
    IS ==> PPT
    OR ==> PPT
    PUT ==> PPT
    IOR ==> PPT
```
Remember: when interacting with a model, you are experimenting with AI, so content mistakes are possible. The feature is subject to various limits (including requests per minute, requests per day, tokens per request, and concurrent requests) and is not designed for production use cases. GitHub Models uses Azure AI Content Safety; these filters cannot be turned off as part of the GitHub Models experience. If you decide to employ models through a paid service, please configure your content filters to meet your requirements. This service is under GitHub's Pre-release Terms. Your use of GitHub Models is subject to the following Product Terms and Privacy Statement. Content within this repository may be subject to additional license terms.