generate command #79

Draft: wants to merge 128 commits into base: main

@pelikhan pelikhan commented Jul 24, 2025

Implement PromptPex strategy to generate tests for prompts automatically.


🚀 Automated Prompt Test Generation: PromptPex Integration, Robust CLI, and Enhanced Utilities

This PR introduces advanced automated test generation for prompt files using the PromptPex methodology, empowering users to systematically validate and harden prompt engineering workflows.

Highlights:

  • 🧪 PromptPex Test Generation Pipeline:
    Implements a new generate CLI command that orchestrates intent analysis, input specification, rule extraction, scenario generation, and test case creation for prompts—enabling automated, stepwise test generation.

  • 🛠️ Extensive CLI Enhancements:

    • Adds robust session management with context loading, merging, and saving for persistent state and resumable test generation.
    • Supports customizable effort levels (low/medium/high) to control test generation depth and complexity.
    • Integrates custom system instructions and advanced options for granular control.
  • 🧰 Utility Functions & Helpers:

    • Introduces utilities for response normalization, code block handling, and prompt string rendering.
    • Adds SHA256 hashing for prompt versioning and integrity checks.
    • Enhances output formatting for clear, styled CLI feedback and debugging.
  • 🏗️ Improved Reliability & Error Handling:

    • Implements retry logic with exponential backoff for LLM API calls, including spinner-based progress indication and clear error reporting.
    • Expands .gitignore and CI tooling for better artifact management and workflow reliability.
  • 🧑‍🔬 Comprehensive Testing:

    • Adds unit tests for new utilities, CLI flag parsing, error scenarios, and command behaviors.
    • Provides scaffolding for temporary prompt files and LLM client mocks, ensuring repeatable and robust test coverage.
  • 📄 Documentation & Examples:

    • Expands documentation for the generate command, PromptPex integration, and advanced usage.
    • Adds example documents for custom instruction scenarios.
  • 🔍 Debugging & Transparency:

    • Adds HTTP request logging for the Azure client, aiding in request/response debugging.
    • Refines prompt file management with improved YAML serialization and dedicated save methods.
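
The SHA256 hashing mentioned under the utility functions can be pictured with a minimal sketch. The helper name `promptHash` and the choice to hash the raw prompt text are assumptions made for illustration; the PR does not show which fields are actually hashed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// promptHash is a hypothetical helper (not taken from the PR): it returns the
// hex-encoded SHA-256 digest of the prompt text, usable as a version key.
func promptHash(content string) string {
	sum := sha256.Sum256([]byte(content))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := promptHash("You are a helpful assistant.")
	b := promptHash("You are a helpful assistant!")
	fmt.Println(a)
	// Any change to the prompt text changes the digest, so a stored hash
	// can be compared against the current file to detect edits.
	fmt.Println("unchanged:", a == b)
}
```

A digest like this lets a saved session be matched against the prompt it was generated from before resuming.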

These changes deliver a powerful, research-backed framework for automated prompt validation—streamlining prompt engineering, improving reliability, and making the CLI experience more transparent and user-friendly.

AI-generated content by prd may be incorrect.

pelikhan added 30 commits July 21, 2025 13:41
- Implement tests for Float32Ptr to validate pointer creation for float32 values.
- Create tests for ExtractJSON to ensure correct extraction of JSON from various input formats.
- Add tests for cleanJavaScriptStringConcat to verify string concatenation handling in JavaScript context.
- Introduce tests for StringSliceContains to check for string presence in slices.
- Implement tests for MergeStringMaps to validate merging behavior of multiple string maps, including overwrites and handling of nil/empty maps.
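
For orientation, the helpers named in these commit notes could plausibly have the following shapes. The signatures and bodies below are assumptions inferred from the names and the described test cases (overwrites, nil and empty maps); the PR text does not show the real code.

```go
package main

import "fmt"

// Float32Ptr returns a pointer to a copy of v (assumed signature).
func Float32Ptr(v float32) *float32 { return &v }

// StringSliceContains reports whether target occurs in items (assumed signature).
func StringSliceContains(items []string, target string) bool {
	for _, item := range items {
		if item == target {
			return true
		}
	}
	return false
}

// MergeStringMaps merges maps left to right, so later maps overwrite
// earlier keys (assumed behavior). Ranging over a nil map is a no-op in Go,
// which is why nil inputs need no special handling.
func MergeStringMaps(maps ...map[string]string) map[string]string {
	merged := make(map[string]string)
	for _, m := range maps {
		for k, v := range m {
			merged[k] = v
		}
	}
	return merged
}

func main() {
	p := Float32Ptr(0.5)
	fmt.Println(*p)
	fmt.Println(StringSliceContains([]string{"low", "medium", "high"}, "medium"))
	fmt.Println(MergeStringMaps(map[string]string{"a": "1"}, nil, map[string]string{"a": "2", "b": "3"}))
}
```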
… tests in export_test.go

- Changed modelParams from pointer to value in toGitHubModelsPrompt function for better clarity and safety.
- Updated the assignment of ModelParameters to use the value directly instead of dereferencing a pointer.
- Introduced a new test suite in export_test.go to cover various scenarios for GitHub models evaluation generation, including edge cases and expected outputs.
- Ensured that the tests validate the correct creation of files and their contents based on the provided context and options.
- Added NewPromptPex function to create a new PromptPex instance.
- Implemented Run method to execute the PromptPex pipeline with context management.
- Created context from prompt files or loaded existing context from JSON.
- Developed pipeline steps including intent generation, input specification, output rules, and tests.
- Added functionality for generating groundtruth outputs and evaluating test results.
- Implemented test expansion and rating features for improved test coverage.
- Introduced error handling and logging throughout the pipeline execution.
- Implemented TestCreateContext to validate various prompt YAML configurations and their expected context outputs.
- Added TestCreateContextRunIDUniqueness to ensure unique RunIDs are generated for multiple context creations.
- Created TestCreateContextWithNonExistentFile to handle cases where the prompt file does not exist.
- Developed TestCreateContextPromptValidation to check for valid and invalid prompt formats.
- Introduced TestGithubModelsEvalsGenerate to test the generation of GitHub Models eval files with various scenarios.
- Added TestToGitHubModelsPrompt to validate the conversion of prompts to GitHub Models format.
- Implemented TestExtractTemplateVariables and TestExtractVariablesFromText to ensure correct extraction of template variables.
- Created TestGetMapKeys and TestGetTestScenario to validate utility functions related to maps and test scenarios.
…se and restore its implementation; remove obsolete promptpex.go and summary_test.go files
…covering various scenarios and error handling
…neFlags function and update flag parsing to use consistent naming
pelikhan added 23 commits July 24, 2025 20:31
…ription for clarity; remove unused test functions
…erations field; update related tests for consistency
…s; update related tests and documentation for consistency
…pdate related parsing and test logic for consistency
…values; update related tests for consistency and remove unused test_types.go file
…s to values; update ApplyEffortConfiguration and tests for consistency
…e GetDefaultOptions and pipeline logic for usage
…nd update Test Generation section; add mermaid diagram for clarity
…tures

- Introduced constants for evaluator rules compliance in constants.go.
- Implemented GenerateRulesEvaluator function in evaluators.go for evaluating compliance with output rules.
- Updated GetDefaultOptions to include evaluation model in options.go.
- Modified pipeline to insert output rule evaluator into the prompt context.
- Refactored render functions to use new color constants.
- Added Eval field to PromptPexOptions in types.go for configuration.
…improved clarity and functionality; enhance test generation process with new rules and options
…ling in generateGroundtruth function and remove obsolete prompt_hash_test file
@pelikhan pelikhan requested a review from Copilot July 25, 2025 09:53
@Copilot Copilot AI left a comment

Pull Request Overview

This pull request implements the generate command to automatically generate test cases for prompts using the PromptPex methodology. The implementation adds comprehensive test generation capabilities that analyze prompts and create diverse test scenarios to evaluate prompt behavior across different edge cases.

  • Adds new generate command with full PromptPex pipeline for automated test generation
  • Implements HTTP request logging functionality for debugging API interactions
  • Extends prompt file structure to support generated test data and evaluations

Reviewed Changes

Copilot reviewed 37 out of 38 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| pkg/prompt/prompt.go | Added SaveToFile method and TestDataItem type, updated YAML tags for omitempty |
| internal/azuremodels/client.go | Added HTTP logging context utilities for request debugging |
| internal/azuremodels/azure_client.go | Implemented HTTP request logging to specified log files |
| examples/test_generate.yml | Example generated prompt file with 40+ test cases and evaluator configuration |
| examples/custom_instructions_example.md | Documentation for custom instruction flags usage |
| cmd/run/run.go | Minor variable extraction refactor |
| cmd/root_test.go | Added test assertion for generate command in help output |
| cmd/root.go | Registered new generate command |
| cmd/generate/* | Complete generate command implementation with pipeline, parsing, utilities, and tests |
| README.md | Added comprehensive documentation for generate command and PromptPex methodology |
| Makefile | Added ci-lint, build, and clean targets |

defer sp.Stop()

resp, err := h.client.GetChatCompletionStream(ctx, req, h.org)
if err != nil {
Copilot AI Jul 25, 2025

The defer statement for sp.Stop() is placed inside a loop and will be executed when the function returns, not when the loop iteration ends. This could lead to multiple spinners running simultaneously. Consider calling sp.Stop() explicitly before continuing to the next iteration or restructuring the code.

Suggested change
if err != nil {
resp, err := h.client.GetChatCompletionStream(ctx, req, h.org)
if err != nil {
sp.Stop() // Ensure spinner is stopped before handling errors
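
The behavior this comment describes can be reproduced in isolation. In the sketch below, `spinner` is a stand-in type invented for the demonstration, not the spinner used in the PR; the first function defers inside the loop, the second scopes each iteration in its own function so the deferred call runs before the next attempt starts.

```go
package main

import "fmt"

// spinner is a stand-in that records when Stop is called.
type spinner struct {
	id  int
	log *[]string
}

func (s *spinner) Stop() { *s.log = append(*s.log, fmt.Sprintf("stop %d", s.id)) }

// deferInLoop mirrors the flagged pattern: every deferred Stop waits until
// the enclosing function returns, then they run in last-in, first-out order.
func deferInLoop(n int) []string {
	var log []string
	func() {
		for i := 1; i <= n; i++ {
			sp := &spinner{id: i, log: &log}
			defer sp.Stop()
			log = append(log, fmt.Sprintf("attempt %d", i))
		}
	}()
	return log
}

// stopPerIteration wraps each attempt in a function literal, so the deferred
// Stop fires at the end of that attempt.
func stopPerIteration(n int) []string {
	var log []string
	for i := 1; i <= n; i++ {
		func() {
			sp := &spinner{id: i, log: &log}
			defer sp.Stop()
			log = append(log, fmt.Sprintf("attempt %d", i))
		}()
	}
	return log
}

func main() {
	fmt.Println(deferInLoop(2))      // [attempt 1 attempt 2 stop 2 stop 1]
	fmt.Println(stopPerIteration(2)) // [attempt 1 stop 1 attempt 2 stop 2]
}
```

Calling `Stop()` explicitly on each exit path, as the suggested change does, is the other common fix; the closure form trades a little nesting for not having to remember every path.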

Copilot uses AI. Check for mistakes.

}
reader := resp.Reader
//nolint:gocritic,revive // TODO
defer reader.Close()
Copilot AI Jul 25, 2025

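Similar to the spinner issue, the defer statement for reader.Close() is inside a loop, so it runs only when the enclosing function returns and each iteration's reader stays open until then. Consider closing the reader explicitly at the end of each iteration.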

Suggested change
defer reader.Close()
