# Efficient Fuzzing Guide

This guide applies to fuzzers created using [libfuzzer], not [FuzzTests]. None
of this advice is necessary for FuzzTests.

Once you have a fuzz target running, you can analyze and tweak it to improve its
efficiency. This document describes techniques to minimize fuzzing time and
maximize your results.

*** note
**Note:** If you haven’t created your first fuzz target yet, see the [Getting
Started Guide].
***

The most direct way to gauge the effectiveness of your fuzz target is to collect
metrics. You can get them manually, or take them from a [ClusterFuzz status]
page after your fuzz target is checked into the Chromium repository.

[TOC]

## Key metrics of a fuzz target

### Execution speed

A fuzzing engine such as libFuzzer typically explores a large search space by
performing randomized mutations, so it needs to run as fast as possible to find
interesting code paths.

Fuzz target speed is calculated in executions per second (`exec/s`). It is
printed while a fuzz target is running:

```
#11002 NEW cov: 1337 ft: 10934 corp: 707/409Kb lim: 1098 exec/s: 5333 rss: 27Mb L: 186/1098
```

You should aim for at least 1,000 exec/s from your fuzz target locally before
submitting it to the Chromium repository. If you’re under 1,000, consider the
following improvements:

* [Simplifying initialization/cleanup](#Simplifying-initialization-cleanup)
* [Minimizing memory usage](#Minimizing-memory-usage)

#### Simplifying initialization/cleanup

If your `LLVMFuzzerTestOneInput` function is too complex, it can decrease the
fuzzer’s execution speed. It can also cause the fuzzer to target specific
use-cases or fail to account for unexpected scenarios.

Instead of performing setup and teardown on each input, use static
initialization and shared resources. See the [startup initialization] section of
libFuzzer’s documentation for an example.

*** note
**Note:** You can skip freeing static resources. However, all other resources
allocated within the `LLVMFuzzerTestOneInput` function should be de-allocated,
since the function gets called millions of times during a fuzzing session. If
you don’t, you’ll often run out of memory and reduce overall fuzzing efficiency.
***

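As a minimal sketch of the static-initialization advice, the fuzz target below
builds its input-independent state once and reuses it on every call, while
per-input state lives in a local object that is released on return. The
`Environment` struct and `CountKeywords` function are hypothetical stand-ins
for real setup code and real code under test; they are not part of libFuzzer or
Chromium.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for expensive, input-independent setup
// (e.g., loading tables or configuring a library).
struct Environment {
  Environment() : keywords{"GET", "POST", "HEAD"} {}
  std::vector<std::string> keywords;
};

// Hypothetical code under test: counts how many keywords occur in the input.
int CountKeywords(const Environment& env, const std::string& input) {
  int count = 0;
  for (const std::string& keyword : env.keywords) {
    if (input.find(keyword) != std::string::npos)
      ++count;
  }
  return count;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // Constructed once, on the first call, and reused afterwards. It is
  // intentionally never freed.
  static Environment env;

  // Per-input state is owned by a local object, so it is released on return.
  std::string input(reinterpret_cast<const char*>(data), size);
  CountKeywords(env, input);
  return 0;
}
```

A function-local `static` keeps the setup next to the code that uses it; a
namespace-scope object would likely work just as well here.
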
#### Minimizing memory usage

Avoid allocation of dynamic memory wherever possible. Memory instrumentation
works faster for stack-based and static objects than for heap-allocated ones.

*** note
**Note:** It’s always a good idea to try different variants for your fuzz target
locally, then submit only the fastest implementation to the Chromium repository.
***

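One possible way to follow the memory advice, sketched under the assumption
that the target needs a mutable copy of each input, is to copy into a
fixed-size stack buffer instead of allocating a `std::vector` on every call.
`kMaxInputSize` and `LowercaseInPlace` are hypothetical names chosen for this
illustration.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical cap on how much of each input the target processes.
constexpr size_t kMaxInputSize = 4096;

// Hypothetical code under test: lowercases ASCII letters in place and returns
// how many bytes were changed.
size_t LowercaseInPlace(uint8_t* data, size_t size) {
  size_t changed = 0;
  for (size_t i = 0; i < size; ++i) {
    if (data[i] >= 'A' && data[i] <= 'Z') {
      data[i] = static_cast<uint8_t>(data[i] - 'A' + 'a');
      ++changed;
    }
  }
  return changed;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // A fixed-size stack buffer avoids a heap allocation on every call.
  // Inputs longer than the buffer are truncated.
  std::array<uint8_t, kMaxInputSize> scratch;
  const size_t n = std::min(size, scratch.size());
  std::copy(data, data + n, scratch.begin());
  LowercaseInPlace(scratch.data(), n);
  return 0;
}
```

The trade-off is that bytes beyond `kMaxInputSize` are never exercised, so the
cap should match the largest input you care about (libFuzzer’s `-max_len`
option can keep generated inputs under the same limit).
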
### Code coverage

You can check the percentage of code covered by your fuzz target to gauge
fuzzing effectiveness:

* Review aggregated Chrome coverage from recent runs by checking the [fuzzing
coverage] report. This report can provide insight on how to improve code
coverage.
* Generate a source-level coverage report for your fuzzer by running the
[coverage script] stored in the Chromium repository. The script provides
detailed instructions and a usage example.

For the `out/coverage` target in the coverage script, make sure to add all of
the gn args you needed to build the `out/libfuzzer` target; this could include
args like `target_os=chromeos` and `is_asan=true` depending on the [gn config]
you chose.

*** note
**Note:** The code coverage of a fuzz target depends heavily on the corpus. A
well-chosen corpus will produce much greater code coverage. On the other hand,
a coverage report generated by a fuzz target without a corpus won't cover much
code. If you don’t have a corpus to use, you can download the [corpus from
ClusterFuzz]. For more information on the corpus, see
[Corpus size](#Corpus-size).
***

### Corpus size

A guided fuzzing engine such as libFuzzer considers an input (a.k.a. testcase
or corpus unit) *interesting* if the input results in new code coverage (i.e.,
if the fuzzer reaches code that has not been reached before). The set of all
interesting inputs is called the *corpus*. A corpus is shared across fuzzer runs
and grows over time.

If a fuzz target stops discovering new interesting inputs after running for a
while, it typically indicates that the fuzz target is hitting a code barrier
(also called a *coverage plateau*). The corpus for a reasonably complex target
should contain hundreds (if not thousands) of inputs.

If a fuzz target reaches a coverage plateau with a small corpus, the common
causes are checksums and magic numbers. Alternatively, a lot of the code may
simply be unreachable from your fuzzer. The easiest way to diagnose the problem
is to generate and analyze a [coverage report](#Code-coverage). Then, to fix the
issue, try the following:

* Change the code (e.g., disable CRC checks while fuzzing) with a
[custom build](#Custom-build).
* Prepare or improve the [seed corpus](#Seed-corpus).
* Prepare or improve the [fuzzer dictionary](#Fuzzer-dictionary).

## Ways to improve a fuzz target

### Seed corpus

You can give your fuzz target a starting point by creating a set of valid and
interesting inputs called a *seed corpus*. If you don’t provide a seed corpus,
the fuzzing engine has to guess inputs from scratch, which can take time
(depending on the size of the inputs and the complexity of the target format).
In many cases, providing a seed corpus can increase code coverage by an order of
magnitude.

Seed corpuses work especially well for strictly defined file formats and data
transmission protocols:

* For file format parsers, add valid files from your test suite.
* For protocol parsers, add valid raw streams from a test suite into separate
files.
* For graphics libraries, add a variety of small PNG/JPG/GIF files.

#### Using a corpus locally

If you’re running a fuzz target locally, you can easily designate a corpus by
passing a directory as an argument:

```
./out/libfuzzer/my_fuzzer ~/tmp/my_fuzzer_corpus
```

The fuzzer stores all the interesting inputs it finds in the directory.

#### Creating a Chromium repository seed corpus

When running fuzz targets at scale, ClusterFuzz looks for a seed corpus defined
in the Chromium source repository. You can define one in your `BUILD.gn` file by
adding a `seed_corpus` attribute to your `fuzzer_test` target definition:

```
fuzzer_test("my_fuzzer") {
  ...
  seed_corpus = "test/fuzz/testcases"
  ...
}
```

If you want to specify multiple seed corpus directories, use the `seed_corpuses`
attribute instead:

```
fuzzer_test("my_fuzzer") {
  ...
  seed_corpuses = [ "test/fuzz/testcases", "test/unittest/data" ]
  ...
}
```

All files found in these directories and their subdirectories are stored in a
`<my_fuzzer>_seed_corpus.zip` output archive.

#### Uploading corpus files to GCS

If you can't store your seed corpus in the Chromium repository (e.g., it’s too
large, can’t be open-sourced, etc.), you can upload the corpus to the Google
Cloud Storage (GCS) bucket used by ClusterFuzz.

1) Open the [Corpus GCS Bucket] in your browser.
2) Search for the directory named `<my_fuzzer>`. If the directory does not
exist, create it.
3) In the `<my_fuzzer>` directory, upload your corpus files.

*** note
**Note:** If you upload your corpus to GCS, you don’t need to add the
`seed_corpus` attribute to your `fuzzer_test` target definition. However, adding
a seed corpus to the Chromium repository is the preferred approach.
***

You can do the same thing by using the [gsutil] command line tool:

```bash
gsutil -m rsync <path_to_corpus> gs://clusterfuzz-corpus/libfuzzer/<my_fuzzer>
```

*** note
**Note:** To write to this bucket using `gsutil`, you must be logged into your
@google.com account (@chromium.org will not work). You can use the `gcloud auth
login` command to log into your account in `gsutil` if you installed `gsutil`
through `gcloud`.
***

#### Minimizing a seed corpus

Your seed corpus is synced to all fuzzing bots for every iteration, so it's
important to minimize it to a small set of interesting inputs before uploading.
Keeping the seed corpus small improves fuzzing efficiency and prevents our bots
from running out of disk space.

You can minimize your seed corpus by using libFuzzer’s `-merge=1` option:

```bash
# Create an empty directory.
mkdir seed_corpus_minimized

# Run the fuzzer with -merge=1 flag.
./my_fuzzer -merge=1 ./seed_corpus_minimized ./seed_corpus
```

After running the command, the `seed_corpus_minimized` directory will contain a
minimized corpus that gives the same code coverage as your initial `seed_corpus`
directory.

### Fuzzer dictionary

You can help your fuzzer increase its coverage by providing a set of common
words or values that you expect to find in the input. Such a dictionary works
especially well for certain use-cases (e.g., fuzzing file format decoders or
text-based protocols like XML).

To add a fuzzer dictionary:

1) Create a flat ASCII text file that lists one input token per line in the
format `name="value"`. The value must appear in quotes with hex escaping
(`\xNN`) applied to all non-printable, high-bit, or otherwise problematic
characters (the `\\` and `\"` shorthands are recognized, too). This syntax is
similar to the one used by the [AFL] fuzzing engine (`-x` option).

*** note
**Note:** `name` can be omitted, but it is a convenient way to document the
meaning of each token.
***

Here’s an example dictionary:

```
# Lines starting with '#' and empty lines are ignored.

# Adds "blah" word (w/o quotes) to the dictionary.
kw1="blah"
# Use \\ for backslash and \" for quotes.
kw2="\"ac\\dc\""
# Use \xAB for hex values.
kw3="\xF7\xF8"
# Key name before '=' can be omitted:
"foo\x0Abar"
```

2) Test your dictionary by running your fuzz target locally:

```bash
./out/libfuzzer/my_fuzzer -dict=<path_to_dict> <path_to_corpus>
```

If the dictionary is effective, you should see `NEW` units discovered in the
output.

3) Add the dictionary file in the same directory as your fuzz target, then add
the `dict` attribute to the `fuzzer_test` definition in your `BUILD.gn` file:

```
fuzzer_test("my_fuzzer") {
  ...
  dict = "my_fuzzer.dict"
}
```

Once the dictionary is submitted to the Chromium repository and ClusterFuzz
picks up a new revision build, the dictionary is used automatically.

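If your tokens come from binary data, writing the escapes from step 1 by hand
is error-prone. The sketch below formats one dictionary line from a raw byte
string. `ToDictEntry` is a hypothetical helper, not part of libFuzzer or
Chromium, and it conservatively hex-escapes every byte outside printable ASCII,
which may be more than the parser strictly requires.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats one dictionary line as name="value".
// An empty name produces an unnamed entry.
std::string ToDictEntry(const std::string& name, const std::string& value) {
  std::string out;
  if (!name.empty())
    out = name + "=";
  out += '"';
  for (unsigned char c : value) {
    if (c == '\\') {
      out += "\\\\";  // Backslash shorthand.
    } else if (c == '"') {
      out += "\\\"";  // Quote shorthand.
    } else if (c >= 0x20 && c < 0x7F) {
      out += static_cast<char>(c);  // Printable ASCII is copied as-is.
    } else {
      char buf[5];  // "\xNN" plus the terminating NUL.
      std::snprintf(buf, sizeof(buf), "\\x%02X", static_cast<unsigned>(c));
      out += buf;
    }
  }
  out += '"';
  return out;
}
```

For instance, `ToDictEntry("kw3", "\xF7\xF8")` yields the line
`kw3="\xF7\xF8"`, matching the example dictionary above.
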
### Custom build

If you need to change the code being tested by your fuzz target, you can use
conditional compilation as follows:

* `#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION` in C/C++ code
* `if cfg!(FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION) { ... }` in Rust code

*** note
**Note:** Patching target code is not a preferred way of improving the
corresponding fuzz target, but in some cases it might be the only way to do it
(e.g., when there is no intended API to disable checksum verification, or when
the target code uses a random generator that affects the reproducibility of
crashes).
***

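As an illustration of the checksum case, the sketch below skips verification
only when `FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION` is defined, so production
builds keep the check. The packet layout (one checksum byte followed by the
payload) and the `ComputeChecksum` and `ParsePacket` functions are hypothetical
and exist only for this example.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical checksum: the sum of all payload bytes modulo 256.
uint8_t ComputeChecksum(const uint8_t* payload, size_t size) {
  uint8_t sum = 0;
  for (size_t i = 0; i < size; ++i)
    sum = static_cast<uint8_t>(sum + payload[i]);
  return sum;
}

// Hypothetical parser: byte 0 holds the checksum of the remaining bytes.
// Returns true if the packet is accepted.
bool ParsePacket(const uint8_t* data, size_t size) {
  if (size < 1)
    return false;
#if !defined(FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION)
  // In fuzzing builds this check is compiled out, so the fuzzer does not have
  // to guess a matching checksum before reaching the payload parser.
  if (data[0] != ComputeChecksum(data + 1, size - 1))
    return false;
#endif
  // Payload parsing would go here.
  return true;
}
```
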
[AFL]: http://lcamtuf.coredump.cx/afl/
[ClusterFuzz status]: libFuzzer_integration.md#Status-Links
[Corpus GCS Bucket]: https://console.cloud.google.com/storage/clusterfuzz-corpus/libfuzzer
[Getting Started Guide]: getting_started.md
[gn config]: getting_started.md#running-the-fuzz-target
[corpus from ClusterFuzz]: libFuzzer_integration.md#Corpus
[coverage script]: https://cs.chromium.org/chromium/src/tools/code_coverage/coverage.py
[fuzzing coverage]: https://analysis.chromium.org/coverage/p/chromium?platform=fuzz
[gsutil]: https://cloud.google.com/storage/docs/gsutil
[startup initialization]: https://llvm.org/docs/LibFuzzer.html#startup-initialization
[libfuzzer]: getting_started_with_libfuzzer.md
[fuzztests]: getting_started.md