Automate spam issue detection #11316
Signed-off-by: Babak K. Shandiz <[email protected]>
Pull Request Overview
This PR implements an automated spam detection system for GitHub issues using AI inference. The system analyzes newly opened issues and labels suspected spam with the "suspected-spam" label.
- Adds a GitHub Actions workflow that triggers on issue creation to run spam detection
- Creates a modular bash script architecture for spam detection with AI prompt generation
- Includes evaluation tooling to test and validate the AI model's accuracy
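The detect-then-label flow described above can be sketched roughly as follows. This is only an illustration: `check_issue` is a hypothetical stand-in for the AI inference step, and its "spam" / "not-spam" output convention is an assumption, not something taken from the PR's scripts. The only real CLI call used is `gh issue edit --add-label`.

```shell
#!/usr/bin/env bash
# Sketch of the detect-then-label flow.
# ASSUMPTION: `check_issue` is a hypothetical function, defined elsewhere,
# that prints "spam" or "not-spam" for the given issue number.
SPAM_LABEL="suspected-spam"

process_issue() {
  local issue="$1" verdict
  # Abort if the inference step itself fails.
  verdict="$(check_issue "$issue")" || return 1
  if [ "$verdict" = "spam" ]; then
    # Attach the label to the suspected issue.
    gh issue edit "$issue" --add-label "$SPAM_LABEL"
  fi
}
```

With this shape, the workflow step only needs to pass the new issue's number to `process_issue`; anything that is not flagged is left untouched.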
Reviewed Changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 6 comments.
Show a summary per file
File | Description |
---|---|
.github/workflows/detect-spam.yml | Main workflow file that triggers spam detection on new issues |
.github/workflows/scripts/spam-detection/process-issue.sh | Main entry point that orchestrates spam detection and labeling |
.github/workflows/scripts/spam-detection/check-issue.sh | Core logic for fetching issue data and running AI inference |
.github/workflows/scripts/spam-detection/generate-sys-prompt.sh | Generates comprehensive system prompt with project context and templates |
.github/workflows/scripts/spam-detection/generate-prompt.sh | Formats issue title and body into structured prompt format |
.github/workflows/scripts/spam-detection/eval.sh | Evaluation script for testing AI model accuracy |
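To illustrate what "formats issue title and body into structured prompt format" could look like, here is a minimal sketch. The function name `format_issue_prompt` and the `<TITLE>` / `<BODY>` tag names are assumptions for illustration only; the actual layout used by `generate-prompt.sh` is not shown on this page.

```shell
#!/usr/bin/env bash
# Hypothetical formatter: wraps the issue title and body in tags so the model
# can tell the two fields apart. Tag names are assumed, not taken from the PR.
format_issue_prompt() {
  local title="$1" body="$2"
  # printf keeps the body verbatim; it is passed as an argument, not as the format.
  printf '<TITLE>\n%s\n</TITLE>\n\n<BODY>\n%s\n</BODY>\n' "$title" "$body"
}
```

Passing the user-controlled text as `printf` arguments rather than as the format string means stray `%` characters in an issue body cannot break the output.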
Great work, @babakks! I think this is in a state where we can begin attempting to identify spammy issues, refining as we go along.
The following nits and questions are primarily about making some aspects easier for people to understand and maintain or minor rephrasing of particular content for prompts. They aren't deal breakers to block this work but I wanted to bring them up all the same.
<GitHub CLI docs>
\`\`\`
$(gh --help)
\`\`\`
</GitHub CLI docs>
question: how much context about `gh` capabilities does the model need here? `gh reference` produces usage information for all available commands. I ask because I'm uncertain how much the models have been trained on what `gh` provides, or to what degree we need to tell it.
Does removing this section affect the response at all?
Two concerns with using `gh --help` are that it includes the user's `gh alias` and `gh extension` entries, which can create confusion and affect the results.
@BagToad had an idea that we prepare a minimal list of the `gh` commands that are essential. 🤷
Expand for what this shows for me
$ gh --help
Work seamlessly with GitHub from the command line.
USAGE
gh <command> <subcommand> [flags]
CORE COMMANDS
auth: Authenticate gh and git with GitHub
browse: Open repositories, issues, pull requests, and more in the browser
codespace: Connect to and manage codespaces
gist: Manage gists
issue: Manage issues
org: Manage organizations
pr: Manage pull requests
project: Work with GitHub Projects.
release: Manage releases
repo: Manage repositories
GITHUB ACTIONS COMMANDS
cache: Manage GitHub Actions caches
run: View details about workflow runs
workflow: View details about GitHub Actions workflows
EXTENSION COMMANDS
backlog: Extension backlog
binary-ext: Extension binary-ext
catsup: Extension catsup
commit2pr: Extension commit2pr
copilot: Extension copilot
dash: Extension dash
eco: Extension eco
elephant-carpaccio: Extension elephant-carpaccio
kraken: Extension kraken
markdown-poc: Extension markdown-poc
models: Extension models
montage: Extension montage
publicize: Extension publicize
quoth: Extension quoth
repo-export: Extension repo-export
repo-stats: Extension repo-stats
script-ext: Extension script-ext
skyline: Extension skyline
slack: Extension slack
sonar: Extension sonar
token: Extension token
workflow-stats: Extension workflow-stats
ALIAS COMMANDS
slackd: Alias for "slack read -d"
unreleased: Shell alias for "!gh pr list --base trunk --limit 200 --search \"merged:>$(gh release view --js..."
ADDITIONAL COMMANDS
alias: Create command shortcuts
api: Make an authenticated GitHub API request
attestation: Work with artifact attestations
completion: Generate shell completion scripts
config: Manage configuration for gh
extension: Manage gh extensions
gpg-key: Manage GPG keys
label: Manage labels
preview: Execute previews for gh features
ruleset: View info about repo rulesets
search: Search for repositories, issues, and pull requests
secret: Manage GitHub secrets
ssh-key: Manage SSH keys
status: Print information about relevant issues, pull requests, and notifications across repositories
variable: Manage GitHub Actions variables
HELP TOPICS
accessibility: Learn about GitHub CLI's accessibility experiences
actions: Learn about working with GitHub Actions
environment: Environment variables that can be used with gh
exit-codes: Exit codes used by gh
formatting: Formatting options for JSON data exported from gh
mintty: Information about using gh with MinTTY
reference: A comprehensive reference of all gh commands
FLAGS
--help Show help for command
--version Show gh version
EXAMPLES
$ gh issue create
$ gh repo clone cli/cli
$ gh pr checkout 321
LEARN MORE
Use `gh <command> <subcommand> --help` for more information about a command.
Read the manual at https://cli.github.com/manual
Learn about exit codes using `gh help exit-codes`
Learn about accessibility experiences using `gh help accessibility`
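One way to address the alias/extension concern, short of curating a hand-written command list, would be to filter those sections out of the help text before it goes into the prompt. The sketch below is an assumption-laden illustration: `strip_user_sections` is a hypothetical helper, and it relies on section headers being unindented all-caps lines, as in the real (indented) `gh --help` output.

```shell
#!/usr/bin/env bash
# Hypothetical filter: drops the EXTENSION COMMANDS and ALIAS COMMANDS sections
# from `gh --help` output read on stdin. A section runs from its all-caps header
# up to (but not including) the next all-caps header.
strip_user_sections() {
  LC_ALL=C awk '
    # Every header line decides whether the lines that follow are skipped.
    /^[A-Z][A-Z ]*$/ { skip = ($0 == "EXTENSION COMMANDS" || $0 == "ALIAS COMMANDS") }
    !skip { print }
  '
}
```

Usage would be along the lines of `gh --help | strip_user_sections`, which keeps the core, Actions, and additional command sections intact.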
Maybe just remove it altogether and run the evals?
`gh` has Go templating support built in, so let's use it.
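As a rough illustration of that suggestion, `gh issue view` can emit JSON fields and render them through a Go template in one call, so no separate JSON parsing step is needed. The function name `render_issue_prompt` and the tag layout inside the template are assumptions for illustration, not the PR's actual template.

```shell
#!/usr/bin/env bash
# Sketch: let `gh` render the prompt via its built-in Go templating.
# ASSUMPTION: the <TITLE>/<BODY> layout is illustrative only.
render_issue_prompt() {
  local issue="$1"
  # --json selects the fields; --template formats them with a Go template.
  gh issue view "$issue" --json title,body \
    --template '<TITLE>{{.title}}</TITLE>{{"\n"}}<BODY>{{.body}}</BODY>{{"\n"}}'
}
```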
# The following `gh models eval` command will fail after 20 requests due to rate limits.
# We are going to open up an issue in `github/gh-models` to address this.
#
# TODO: break up `eval-prompts.yml` file into smaller batches to avoid hitting the rate limit.
Created github/gh-models#74 to discuss this further with the `github/gh-models` team.
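Until that is resolved, the batching idea from the TODO could look something like the sketch below. It is generic on purpose: `run_in_batches` is a hypothetical helper that reads one item per line (for example, test case identifiers) and runs a caller-supplied command once per batch, pausing between batches. How `eval-prompts.yml` would actually be split, and what command would consume each batch, are left open here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run_in_batches SIZE PAUSE_SECONDS COMMAND [ARGS...]
# Reads items from stdin (one per line) and invokes COMMAND with at most SIZE
# items appended as arguments, sleeping PAUSE_SECONDS between invocations.
run_in_batches() {
  local size="$1" pause="$2"
  shift 2
  local batch=() line first=1
  while IFS= read -r line || [ -n "$line" ]; do
    batch+=("$line")
    if [ "${#batch[@]}" -ge "$size" ]; then
      # Pause before every batch except the first one.
      [ "$first" -eq 1 ] || sleep "$pause"
      # Redirect stdin so the command cannot consume the remaining items.
      "$@" "${batch[@]}" < /dev/null
      batch=()
      first=0
    fi
  done
  # Flush the final, possibly smaller, batch.
  if [ "${#batch[@]}" -gt 0 ]; then
    [ "$first" -eq 1 ] || sleep "$pause"
    "$@" "${batch[@]}" < /dev/null
  fi
}
```

With a batch size below the observed limit of 20 requests and a suitable pause, each invocation should stay under the rate limit, though the right pause length would need to be found by experiment.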
Paired on this with @andyfeller and also self-reviewed again. LGTM
Referenced by a Renovate Bot MR updating cli/cli from `v2.74.2` to `v2.76.1`. The GitHub CLI 2.76.1 release notes list this change under Docs & Chores: "Automate spam issue detection by @babakks in cli/cli#11316".
Fixes #11168
This PR adds a workflow to help detect spam issues by using AI. A couple of bash scripts are added to `.github/workflows/scripts/spam-detection`, but there are two main entrypoints:
- `process-issue.sh`, which is used in the new workflow.
- `eval.sh`, which is meant to be used to run evals and verify the AI inference accuracy.
As of our last discussion, the spam issues will be labelled with `suspected-spam`.
Notes
Since this is our first attempt, I intentionally left some details to be fixed or improved later (e.g. re-running the workflow when an issue is edited, or checking the user's history in the repo).
Also, we could have used the `actions/ai-inference` action, but that would either introduce more coupling between our helper scripts or bring too much low-level processing into the workflow. So, I decided to go with a script and use the GitHub Models extension for `gh` to interact with AI.
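For readers unfamiliar with the extension, the script-based approach could be sketched as below. This is not the PR's implementation: `classify_text` is a hypothetical function, the prompt wording and the PASS/FAIL reply convention are assumptions, and the model name is whatever the caller passes in. The sketch relies only on the extension's basic `gh models run <model> <prompt>` form.

```shell
#!/usr/bin/env bash
# Sketch only: the prompt wording and the PASS/FAIL reply convention are
# assumptions for illustration, not taken from the PR.

# Ask a model (through the gh-models extension) whether a text looks like spam.
# Prints "spam" when the reply starts with FAIL, and "not-spam" for anything else.
classify_text() {
  local model="$1" text="$2" reply
  reply="$(gh models run "$model" \
    "Reply with exactly one word, FAIL if the following issue is spam and PASS otherwise: $text")" || return 1
  case "$reply" in
    FAIL*) echo "spam" ;;
    *)     echo "not-spam" ;;
  esac
}
```

Defaulting unknown replies to "not-spam" errs on the side of not labelling legitimate issues, which seems the safer failure mode for a first attempt.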