Skip to content

Automate spam issue detection #11316

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Jul 21, 2025
Merged

Conversation

babakks
Copy link
Member

@babakks babakks commented Jul 16, 2025

Fixes #11168

This PR adds a workflow to help detect spam issues by using AI. A couple of bash scripts are added to .github/workflows/scripts/spam-detection, but there are two main entrypoints:

  • process-issue.sh which is used in the new workflow.
  • eval.sh which is meant to be used to run evals and verify the AI inference accuracy.

As of our last discussion, the spam issues will be labelled with suspected-spam.

Notes

Since this is our first attempt, I intentionally left some details to be fixes/improved later (e.g. re-running the workflow when an issue is edited, or checking the user history in the repo).

Also, we could have used the actions/ai-inference action, but that would either introduce more coupling between our helpers scripts, or bring too much low level processings to the workflow. So, I decided to go with a script and use the GitHub Models extension for gh to interact with AI.

@Copilot Copilot AI review requested due to automatic review settings July 16, 2025 20:13
@babakks babakks requested a review from a team as a code owner July 16, 2025 20:13
@babakks babakks requested a review from williammartin July 16, 2025 20:13
@babakks babakks temporarily deployed to cli-automation July 16, 2025 20:13 — with GitHub Actions Inactive
@babakks babakks requested review from andyfeller and BagToad July 16, 2025 20:13
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements an automated spam detection system for GitHub issues using AI inference. The system analyzes newly opened issues and labels suspected spam with the "suspected-spam" label.

  • Adds a GitHub Actions workflow that triggers on issue creation to run spam detection
  • Creates a modular bash script architecture for spam detection with AI prompt generation
  • Includes evaluation tooling to test and validate the AI model's accuracy

Reviewed Changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
.github/workflows/detect-spam.yml Main workflow file that triggers spam detection on new issues
.github/workflows/scripts/spam-detection/process-issue.sh Main entry point that orchestrates spam detection and labeling
.github/workflows/scripts/spam-detection/check-issue.sh Core logic for fetching issue data and running AI inference
.github/workflows/scripts/spam-detection/generate-sys-prompt.sh Generates comprehensive system prompt with project context and templates
.github/workflows/scripts/spam-detection/generate-prompt.sh Formats issue title and body into structured prompt format
.github/workflows/scripts/spam-detection/eval.sh Evaluation script for testing AI model accuracy

Signed-off-by: Babak K. Shandiz <[email protected]>
Copy link
Member

@andyfeller andyfeller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, @babakks! I think this is in a state where we can begin attempting to identify spammy issues, refining as we go along.

The following nits and questions are primarily about making some aspects easier for people to understand and maintain or minor rephrasing of particular content for prompts. They aren't deal breakers to block this work but I wanted to bring them up all the same.

Comment on lines +83 to +87
<GitHub CLI docs>
\`\`\`
$(gh --help)
\`\`\`
</GitHub CLI docs>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: how much context about gh capabilities does the model need here?

gh reference produces usage information for all available commands. I ask just being uncertain with how much the models have been trained on what gh provides or what degree we need to tell it.

Does removing this section affect the response any?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 concerns with using gh --help is that it includes the user's gh alias and gh extension, which can create confusion and affect the results.

@BagToad had an idea we prepare a minimal list of gh commands that are essential. 🤷

Expand for what this shows for me

$ gh --help
Work seamlessly with GitHub from the command line.

USAGE
  gh <command> <subcommand> [flags]

CORE COMMANDS
  auth:          Authenticate gh and git with GitHub
  browse:        Open repositories, issues, pull requests, and more in the browser
  codespace:     Connect to and manage codespaces
  gist:          Manage gists
  issue:         Manage issues
  org:           Manage organizations
  pr:            Manage pull requests
  project:       Work with GitHub Projects.
  release:       Manage releases
  repo:          Manage repositories

GITHUB ACTIONS COMMANDS
  cache:         Manage GitHub Actions caches
  run:           View details about workflow runs
  workflow:      View details about GitHub Actions workflows

EXTENSION COMMANDS
  backlog:       Extension backlog
  binary-ext:    Extension binary-ext
  catsup:        Extension catsup
  commit2pr:     Extension commit2pr
  copilot:       Extension copilot
  dash:          Extension dash
  eco:           Extension eco
  elephant-carpaccio: Extension elephant-carpaccio
  kraken:        Extension kraken
  markdown-poc:  Extension markdown-poc
  models:        Extension models
  montage:       Extension montage
  publicize:     Extension publicize
  quoth:         Extension quoth
  repo-export:   Extension repo-export
  repo-stats:    Extension repo-stats
  script-ext:    Extension script-ext
  skyline:       Extension skyline
  slack:         Extension slack
  sonar:         Extension sonar
  token:         Extension token
  workflow-stats: Extension workflow-stats

ALIAS COMMANDS
  slackd:        Alias for "slack read -d"
  unreleased:    Shell alias for "!gh pr list --base trunk --limit 200 --search \"merged:>$(gh release view --js..."

ADDITIONAL COMMANDS
  alias:         Create command shortcuts
  api:           Make an authenticated GitHub API request
  attestation:   Work with artifact attestations
  completion:    Generate shell completion scripts
  config:        Manage configuration for gh
  extension:     Manage gh extensions
  gpg-key:       Manage GPG keys
  label:         Manage labels
  preview:       Execute previews for gh features
  ruleset:       View info about repo rulesets
  search:        Search for repositories, issues, and pull requests
  secret:        Manage GitHub secrets
  ssh-key:       Manage SSH keys
  status:        Print information about relevant issues, pull requests, and notifications across repositories
  variable:      Manage GitHub Actions variables

HELP TOPICS
  accessibility: Learn about GitHub CLI's accessibility experiences
  actions:       Learn about working with GitHub Actions
  environment:   Environment variables that can be used with gh
  exit-codes:    Exit codes used by gh
  formatting:    Formatting options for JSON data exported from gh
  mintty:        Information about using gh with MinTTY
  reference:     A comprehensive reference of all gh commands

FLAGS
  --help      Show help for command
  --version   Show gh version

EXAMPLES
  $ gh issue create
  $ gh repo clone cli/cli
  $ gh pr checkout 321

LEARN MORE
  Use `gh <command> <subcommand> --help` for more information about a command.
  Read the manual at https://cli.github.com/manual
  Learn about exit codes using `gh help exit-codes`
  Learn about accessibility experiences using `gh help accessibility`

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just remove it altogether and run the evals?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talking with @BagToad, he mentioned that @babakks said this improved the evaluation. 🤷

# The following `gh models eval` command will fail after 20 requests due to rate limits.
# We are going to open up an issue in `github/gh-models` to address this.
#
# TODO: break up `eval-prompts.yml` file into smaller batches to avoid hitting the rate limit.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created github/gh-models#74 to discuss this further with github/gh-models team

Copy link
Member

@BagToad BagToad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Paired on this with @andyfeller and also self-reviewed again. LGTM

@BagToad BagToad merged commit b2348f8 into trunk Jul 21, 2025
15 checks passed
@BagToad BagToad deleted the babakks/automate-spam-issue-detection branch July 21, 2025 23:49
tmeijn pushed a commit to tmeijn/dotfiles that referenced this pull request Jul 27, 2025
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [cli/cli](https://github.com/cli/cli) | minor | `v2.74.2` -> `v2.76.1` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>cli/cli (cli/cli)</summary>

### [`v2.76.1`](https://github.com/cli/cli/releases/tag/v2.76.1): GitHub CLI 2.76.1

[Compare Source](cli/cli@v2.76.0...v2.76.1)

#### `gh pr create` regression fix

This release fixes a regression introduced in `v2.76.0` where organization teams were retrieved outside of intentional use cases.  This caused problems for GitHub Enterprise Server users using the GitHub Actions automatic token that does not have access to organization teams.

For more information, see cli/cli#11360

#### What's Changed

##### 🐛 Fixes

- Fix: `gh pr create`, only fetch teams when reviewers contain a team  by [@&#8203;BagToad](https://github.com/BagToad) in cli/cli#11361

##### 📚 Docs & Chores

- add tenancy aware for san matcher by [@&#8203;ejahnGithub](https://github.com/ejahnGithub) in cli/cli#11261
- Run Lint and Tests on `push` to `trunk` branch by [@&#8203;andyfeller](https://github.com/andyfeller) in cli/cli#11325
- update ownership of pkg/cmd/release/shared/ by [@&#8203;ejahnGithub](https://github.com/ejahnGithub) in cli/cli#11326
- Automate spam issue detection by [@&#8203;babakks](https://github.com/babakks) in cli/cli#11316
- Improve `api` `--preview` docs by [@&#8203;jsoref](https://github.com/jsoref) in cli/cli#11274
- Incorporate govulncheck into workflows by [@&#8203;andyfeller](https://github.com/andyfeller) in cli/cli#11332
- chore(deps): bump advanced-security/filter-sarif from 1.0.0 to 1.0.1 by [@&#8203;dependabot](https://github.com/dependabot)\[bot] in cli/cli#11298
- chore(deps): bump github.com/sigstore/sigstore-go from 1.0.0 to 1.1.0 by [@&#8203;dependabot](https://github.com/dependabot)\[bot] in cli/cli#11307

**Full Changelog**: cli/cli@v2.76.0...v2.76.1

### [`v2.76.0`](https://github.com/cli/cli/releases/tag/v2.76.0): GitHub CLI 2.76.0

[Compare Source](cli/cli@v2.75.1...v2.76.0)

#### :copilot: Copilot Coding Agent Support

GitHub Copilot Pro+ and Copilot Enterprise subscribers can now assign issues to GitHub Copilot during issue creation using:

- Command-line flag: `gh issue create --assignee @&#8203;copilot`
- Launching web browser: `gh issue create --assignee @&#8203;copilot --web`
- Or interactively selecting `Copilot (AI)` as assignee in `gh issue create` metadata

For more details, refer to [the full changelog post for Copilot coding agent](https://github.blog/changelog/2025-05-19-github-copilot-coding-agent-in-public-preview/).

#### What's Changed

##### ✨ Features

- Assign Copilot during `gh issue create` by [@&#8203;andyfeller](https://github.com/andyfeller) in cli/cli#11279
- Display immutable field in `release view` command by [@&#8203;bdehamer](https://github.com/bdehamer) in cli/cli#11251

##### 🐛 Fixes

- FIX: Do not fetch logs for skipped jobs by [@&#8203;babakks](https://github.com/babakks) in cli/cli#11312
- Transform `extension` and `filename` qualifiers into `path` qualifier for web code search by [@&#8203;samcoe](https://github.com/samcoe) in cli/cli#11211

##### 📚 Docs & Chores

- FIX: Workflow does not contain permissions by [@&#8203;BagToad](https://github.com/BagToad) in cli/cli#11322
- Add automated feature request response workflow by [@&#8203;BagToad](https://github.com/BagToad) in cli/cli#11299

**Full Changelog**: cli/cli@v2.75.1...v2.76.0

### [`v2.75.1`](https://github.com/cli/cli/releases/tag/v2.75.1): GitHub CLI 2.75.1

[Compare Source](cli/cli@v2.75.0...v2.75.1)

#### What's Changed

##### 🐛 Fixes

- Ensure hostnames are visible in CLI website by [@&#8203;andyfeller](https://github.com/andyfeller) in cli/cli#11295
- Revert "Fix: `gh pr create` prioritize `--title` and `--body` over `--fill` when `--web` is present" by [@&#8203;andyfeller](https://github.com/andyfeller) in cli/cli#11300

##### 📚 Docs & Chores

- Ensure go directive is always .0 version in bump by [@&#8203;williammartin](https://github.com/williammartin) in cli/cli#11259
- Minor (1-word) documentation typo in generated `~/.config/gh/config.yml` by [@&#8203;kurahaupo](https://github.com/kurahaupo) in cli/cli#11246
- Automate closing of stale issues by [@&#8203;babakks](https://github.com/babakks) in cli/cli#11268
- Filter the `third-party/` folder out of CodeQL results by [@&#8203;BagToad](https://github.com/BagToad) in cli/cli#11278
- Exclude `third-party` source from golangci-lint by [@&#8203;andyfeller](https://github.com/andyfeller) in cli/cli#11293

##### :dependabot: Dependencies

- Bump Go to 1.24.5 by [@&#8203;github-actions](https://github.com/github-actions)\[bot] in cli/cli#11255
- chore(deps): bump github.com/sigstore/protobuf-specs from 0.4.3 to 0.5.0 by [@&#8203;dependabot](https://github.com/dependabot)\[bot] in cli/cli#11263
- chore(deps): bump golang.org/x/term from 0.32.0 to 0.33.0 by [@&#8203;dependabot](https://github.com/dependabot)\[bot] in cli/cli#11266
- chore(deps): bump golang.org/x/sync from 0.15.0 to 0.16.0 by [@&#8203;dependabot](https://github.com/dependabot)\[bot] in cli/cli#11264
- chore(deps): bump golang.org/x/text from 0.26.0 to 0.27.0 by [@&#8203;dependabot](https://github.com/dependabot)\[bot] in cli/cli#11265
- chore(deps): bump golang.org/x/crypto from 0.39.0 to 0.40.0 by [@&#8203;dependabot](https://github.com/dependabot)\[bot] in cli/cli#11275

#### New Contributors

- [@&#8203;kurahaupo](https://github.com/kurahaupo) made their first contribution in cli/cli#11246
- [@&#8203;github-actions](https://github.com/github-actions)\[bot] made their first contribution in cli/cli#11255

**Full Changelog**: cli/cli@v2.75.0...v2.75.1

### [`v2.75.0`](https://github.com/cli/cli/releases/tag/v2.75.0): GitHub CLI 2.75.0

[Compare Source](cli/cli@v2.74.2...v2.75.0)

#### What's Changed

##### ✨ Features

- init release verify subcommands  by [@&#8203;ejahnGithub](https://github.com/ejahnGithub) in cli/cli#11018
- Embed Windows resources (VERSIONINFO) during build by [@&#8203;babakks](https://github.com/babakks) in cli/cli#11048
- Support `--no-repos-selected` on `gh secret set` by [@&#8203;williammartin](https://github.com/williammartin) in cli/cli#11217

##### 🐛 Fixes

- Fix: `gh pr create` prioritize `--title` and `--body` over `--fill` when `--web` is present by [@&#8203;dankrzeminski32](https://github.com/dankrzeminski32) in cli/cli#10547
- fix: get token for active user instead of blank if possible by [@&#8203;anuraaga](https://github.com/anuraaga) in cli/cli#11038
- Use Actions API to retrieve job run logs as a fallback mechanism  by [@&#8203;babakks](https://github.com/babakks) in cli/cli#11172
- Fix query object state mutation during pagination by [@&#8203;babakks](https://github.com/babakks) in cli/cli#11244
- Handle `HTTP 404` when deleting remote branch in `pr merge` by [@&#8203;babakks](https://github.com/babakks) in cli/cli#11234

##### 📚 Docs & Chores

- chore: fix function name by [@&#8203;jinjingroad](https://github.com/jinjingroad) in cli/cli#11149
- chore: update Go version to 1.24 in devcontainer configuration and docs by [@&#8203;tMinamiii](https://github.com/tMinamiii) in cli/cli#11158
- Ensure lint workflow checks whether 3rd party license and code is up to date by [@&#8203;andyfeller](https://github.com/andyfeller) in cli/cli#11047
- docs: install\_linux.md: add Solus linux install instructions by [@&#8203;chax](https://github.com/chax) in cli/cli#10823
- Fix missing newline in install\_linux.md by [@&#8203;BagToad](https://github.com/BagToad) in cli/cli#11160
- Ensure automation uses pinned go-licenses version by [@&#8203;andyfeller](https://github.com/andyfeller) in cli/cli#11161
- Add `workflow_dispatch` support to MR Help Wanted check by [@&#8203;BagToad](https://github.com/BagToad) in cli/cli#11179
- Remove unused `GH_TOKEN` env variable from workflow by [@&#8203;BagToad](https://github.com/BagToad) in cli/cli#11190
- Add workflow to automate go version bumping by [@&#8203;williammartin](https://github.com/williammartin) in cli/cli#11189
- Fix inconsistent use of tabs and spaces by [@&#8203;Stefan-Heimersheim](https://github.com/Stefan-Heimersheim) in cli/cli#11194
- Decouple arg parsing from MR finder by [@&#8203;babakks](https://github.com/babakks) in cli/cli#11192
- docs: consistently use `apt` in installation instructions by [@&#8203;tklauser](https://github.com/tklauser) in cli/cli#11216
- Ensure bump go script has git user configured by [@&#8203;williammartin](https://github.com/williammartin) in cli/cli#11229
- Inject token into bump-go workflow by [@&#8203;williammartin](https://github.com/williammartin) in cli/cli#11233
- Reinstating Primer Style CLI content within `cli/cli` repository by [@&#8203;andyfeller](https://github.com/andyfeller) in cli/cli#11060
- Add setup-go to bump-go workflow by [@&#8203;williammartin](https://github.com/williammartin) in cli/cli#11237
- Ensure GoReleaser does not break on Mac OS and Linux when skipping Windows `.rsyso` generation script by [@&#8203;andyfeller](https://github.com/andyfeller) in cli/cli#11257

##### :dependabot: Dependencies

- Bump all dependencies except dev-tunnels by [@&#8203;williammartin](https://github.com/williammartin) in cli/cli#11203
- Update microsoft dev-tunnels to v0.1.13 by [@&#8203;williammartin](https://github.com/williammartin) in cli/cli#11205
- Consume dependabot minor versions for go modules by [@&#8203;williammartin](https://github.com/williammartin) in cli/cli#11213

#### New Contributors

- [@&#8203;jinjingroad](https://github.com/jinjingroad) made their first contribution in cli/cli#11149
- [@&#8203;tMinamiii](https://github.com/tMinamiii) made their first contribution in cli/cli#11158
- [@&#8203;chax](https://github.com/chax) made their first contribution in cli/cli#10823
- [@&#8203;dankrzeminski32](https://github.com/dankrzeminski32) made their first contribution in cli/cli#10547
- [@&#8203;anuraaga](https://github.com/anuraaga) made their first contribution in cli/cli#11038
- [@&#8203;Stefan-Heimersheim](https://github.com/Stefan-Heimersheim) made their first contribution in cli/cli#11194

**Full Changelog**: cli/cli@v2.74.2...v2.75.0

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0MC42Mi4xIiwidXBkYXRlZEluVmVyIjoiNDAuNjIuMSIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsiUmVub3ZhdGUgQm90Il19-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create workflow than runs on Issue and PR Creation and closes if high likelihood of spam
4 participants