[12.x] Consistent use of mb_split() to split strings into words #56338

Merged

@shaedrich shaedrich commented Jul 19, 2025

Situation

Currently, methods within \Illuminate\Support\Str split strings into words in different ways.

Downsides

In theory, this can lead to

  • inconsistent results between methods
  • a higher maintenance burden

Change

This PR unifies the splitting of strings into words so that it always uses mb_split() with the regular expression \s+ (mb_split() takes the pattern without delimiters).
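
To illustrate the unified approach, here is a minimal sketch. It is not code from the PR, the sample string is invented, and it assumes the mbstring extension is available with multibyte regex support.

```php
<?php
// Sketch only: the input string is an invented example, not taken from the PR.
mb_regex_encoding('UTF-8'); // make the multibyte regex encoding explicit

// mb_split() expects the bare pattern, without / delimiters.
$words = mb_split('\s+', "größe  über\tstraße");

var_dump($words);
```

Runs of spaces and tabs are treated as a single separator, and the multibyte characters inside each word are left intact.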

Alternatives

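  • use IntlBreakIterator to find locale-aware word boundaries

    public static function extractWords($value, $locale = 'en')
    {
        // Requires the intl extension.
        $wordBreakIterator = IntlBreakIterator::createWordInstance($locale);
        $wordBreakIterator->setText($value);

        // Drop parts that are empty after trimming; an explicit comparison
        // keeps parts such as "0", which a bare trim(...) callback would discard.
        return array_filter([...$wordBreakIterator->getPartsIterator()], static fn ($part) => trim($part) !== '');
    }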
  • split at word boundaries (\b) or non-word characters (\W+)
patch
diff --git a/src/Illuminate/Support/Str.php b/src/Illuminate/Support/Str.php
index ffe670f6f9c5..consistent-word-splitting-in-string-helper 100644
--- a/src/Illuminate/Support/Str.php
+++ b/src/Illuminate/Support/Str.php
@@ -1449,7 +1449,7 @@ public static function title($value)
      */
     public static function headline($value)
     {
-        $parts = mb_split('\s+', $value);
+        $parts = mb_split('\W+', $value);
 
         $parts = count($parts) > 1
             ? array_map(static::title(...), $parts)
@@ -1482,7 +1482,7 @@ public static function apa($value)
 
         $endPunctuation = ['.', '!', '?', ':', '—', ','];
 
-        $words = mb_split('\s+', $value);
+        $words = mb_split('\W+', $value);
 
         for ($i = 0; $i < count($words); $i++) {
             $lowercaseWord = mb_strtolower($words[$i]);
@@ -1697,7 +1697,7 @@ public static function studly($value)
             return static::$studlyCache[$key];
         }
 
-        $words = mb_split('\s+', static::replace(['-', '_'], ' ', $value));
+        $words = mb_split('\W+', static::replace(['-', '_'], ' ', $value));
 
         $studlyWords = array_map(fn ($word) => static::ucfirst($word), $words);
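
The difference between the merged pattern (\s+) and the \W+ alternative in the patch above can be seen in a small sketch. The input string is invented for illustration, and the snippet assumes mbstring with multibyte regex support.

```php
<?php
// Sketch only: invented input to contrast the two patterns.
$value = 'laravel-framework is_great';

$byWhitespace = mb_split('\s+', $value); // splits on whitespace only
$byNonWord = mb_split('\W+', $value);    // also splits on "-", but not on "_"

print_r($byWhitespace); // ['laravel-framework', 'is_great']
print_r($byNonWord);    // ['laravel', 'framework', 'is_great']
```

With either pattern, a string that starts or ends with a separator may yield empty elements at the edges, so callers may want to filter those out.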

Follow-ups

  • Extracting the functionality into a separate function ("words()" would be a good fit but this is, unfortunately, already in use; maybe mb_split_words() 🤔)
    • Pro: If made public, it can be used separately
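
One possible shape for such an extracted helper is sketched below. The name splitWords() and its signature are assumptions made for illustration. The PR does not define this function, and filtering out empty parts is an extra design choice that is not part of the merged change.

```php
<?php
// Hypothetical helper: the name splitWords() and its signature are
// assumptions for illustration; the PR does not define it.
function splitWords(string $value): array
{
    $parts = mb_split('\s+', $value);

    // mb_split() returns false on failure; normalise that to an empty list.
    if ($parts === false) {
        return [];
    }

    // Drop empty strings left by leading or trailing whitespace and reindex.
    return array_values(array_filter($parts, static fn (string $part): bool => $part !== ''));
}

$words = splitWords('  hello   wide world ');
```

Centralizing the pattern in one place would mean a later switch to another splitting strategy only touches a single function.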

@taylorotwell taylorotwell merged commit 816ffa5 into laravel:12.x Jul 20, 2025
62 checks passed
@shaedrich shaedrich deleted the consistent-word-splitting-in-string-helper branch July 21, 2025 09:49
itinerare pushed a commit to itinerare/Mundialis that referenced this pull request Jul 23, 2025
This PR contains the following updates:

| Package | Change |
|---|---|
| [laravel/framework](https://laravel.com) ([source](https://github.com/laravel/framework)) | `12.20.0` -> `12.21.0` |

---

### Release Notes

<details>
<summary>laravel/framework (laravel/framework)</summary>

### [`v12.21.0`](https://github.com/laravel/framework/blob/HEAD/CHANGELOG.md#v12210---2025-07-22)

[Compare Source](laravel/framework@v12.20.0...v12.21.0)

- fix(vite): [#55793](laravel/framework#55793) add explicit as-script to link tag for script modul… by [@midsonlajeanty](https://github.com/midsonlajeanty) in laravel/framework#55794
- \[12.x] Allow globally disabling Factory parent relationships via `Factory::dontExpandRelationshipsByDefault()` by [@cosmastech](https://github.com/cosmastech) in laravel/framework#56154
- \[12.x] Adds checking if a value is between two columns by [@DarkGhostHunter](https://github.com/DarkGhostHunter) in laravel/framework#56119
- \[12.x] Ensure database connection is always restored by [@xurshudyan](https://github.com/xurshudyan) in laravel/framework#56258
- \[12.x] Fix handling of `Htmlable` objects in `Js::convertDataToJavaScriptExpression()` by [@jj15asmr](https://github.com/jj15asmr) in laravel/framework#56253
- Reduce meaningless intermediate variables. by [@LjjGit](https://github.com/LjjGit) in laravel/framework#56265
- \[12.x] Improve typehints for `AbstractCursorPaginator@through()` by [@cosmastech](https://github.com/cosmastech) in laravel/framework#56267
- Use `Date` facade instead of `time()` for `password_confirmed_at` check by [@dylanbr](https://github.com/dylanbr) in laravel/framework#56270
- \[12.x] fix: Collection::transform() and Paginator::through() return types by [@calebdw](https://github.com/calebdw) in laravel/framework#56273
- \[12.x] Merge 11.x into 12.x by [@u01jmg3](https://github.com/u01jmg3) in laravel/framework#56289
- \[12.x] Reduce meaningless intermediate variables by [@AhmedAlaa4611](https://github.com/AhmedAlaa4611) in laravel/framework#56288
- \[12.x] Refactor build Method to Use Null Coalescing Assignment for Default C… by [@Ashot1995](https://github.com/Ashot1995) in laravel/framework#56283
- \[12.x] minor code formatting improvements by [@browner12](https://github.com/browner12) in laravel/framework#56296
- \[12.x] Use more specific route binding exception message for child routes by [@jessekoerhuis](https://github.com/jessekoerhuis) in laravel/framework#56298
- \[12.x] Fix Possible Undefined Variables by [@calfc](https://github.com/calfc) in laravel/framework#56292
- \[12.x] Fix: Ensure scheduler `dailyAt()` method parses minutes and ignores seconds when seconds are provided by [@amirhshokri](https://github.com/amirhshokri) in laravel/framework#56308
- \[12.x] Allows for strict boolean validation by [@peterfox](https://github.com/peterfox) in laravel/framework#56313
- Improve `SeedCommand` console output by [@Jehong-Ahn](https://github.com/Jehong-Ahn) in laravel/framework#56310
- \[12.x] Add unified enum support across framework docs by [@amirhshokri](https://github.com/amirhshokri) in laravel/framework#56271
- \[12.x] Allows for strict numeric validation by [@peterfox](https://github.com/peterfox) in laravel/framework#56328
- \[12.x] Update PHPDoc annotations in `Validation` by [@mrvipchien](https://github.com/mrvipchien) in laravel/framework#56321
- \[12.x] Add operator class support for PostgreSQL GiST spatial indexes by [@joteejotee](https://github.com/joteejotee) in laravel/framework#56324
- Fix multipart array value parsing in HTTP client ([#55732](laravel/framework#55732)) by [@joteejotee](https://github.com/joteejotee) in laravel/framework#56302
- Fixes bug with ShouldBeUniqueUntilProcessing locks getting stuck due to Middleware by [@TWithers](https://github.com/TWithers) in laravel/framework#56318
- \[12.x] add prompts based expectations to PendingCommand by [@BinaryKitten](https://github.com/BinaryKitten) in laravel/framework#56260
- \[12.x] Add Singleton and Scoped attributes to Container by [@riasvdv](https://github.com/riasvdv) in laravel/framework#56334
- Fix unsetting model castable attribute when cast to object ([#56335](laravel/framework#56335)) by [@guram-vashakidze](https://github.com/guram-vashakidze) in laravel/framework#56343
- \[12.x] Fix/memory improvement by [@CharrafiMed](https://github.com/CharrafiMed) in laravel/framework#56345
- \[12.x] Add hasMailer method to the mailable class by [@kevinb1989](https://github.com/kevinb1989) in laravel/framework#56340
- \[12.x] Consistent use of `mb_split()` to split strings into words by [@shaedrich](https://github.com/shaedrich) in laravel/framework#56338
- \[12.x] Add toStringable to Uri by [@Kyrch](https://github.com/Kyrch) in laravel/framework#56359
- \[12.x] Fix PHPStan Integrations by [@crynobone](https://github.com/crynobone) in laravel/framework#56369
- Add 'isEmpty' and 'isNotEmpty' to Fluent by [@cworreschk](https://github.com/cworreschk) in laravel/framework#56370
- \[12.x] Add mergeMetadata method to the Mailable class by [@kevinb1989](https://github.com/kevinb1989) in laravel/framework#56376
- Add 'dontReportUsing' to filter exceptions to be reported by [@pelmered](https://github.com/pelmered) in laravel/framework#56361

</details>

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Reviewed-on: https://code.itinerare.net/itinerare/Mundialis/pulls/282
Co-authored-by: Amadeus[bot] <[email protected]>
Co-committed-by: Amadeus[bot] <[email protected]>