# Efficient Fuzzing Guide

This guide applies to fuzzers created using [libfuzzer], not [FuzzTests]. None
of this advice is necessary for FuzzTests.

Once you have a fuzz target running, you can analyze and tweak it to improve its
efficiency. This document describes techniques to minimize fuzzing time and
maximize your results.

*** note
**Note:** If you haven’t created your first fuzz target yet, see the [Getting
Started Guide].
***

The most direct way to gauge the effectiveness of your fuzz target is to collect
metrics. You can get them manually, or take them from a [ClusterFuzz status]
page after your fuzz target is checked into the Chromium repository.

[TOC]

## Key metrics of a fuzz target

### Execution speed

A fuzzing engine such as libFuzzer typically explores a large search space by
performing randomized mutations, so it needs to run as fast as possible to find
interesting code paths.

Fuzz target speed is calculated in executions per second (`exec/s`). It is
printed while a fuzz target is running:

```
#11002 NEW cov: 1337 ft: 10934 corp: 707/409Kb lim: 1098 exec/s: 5333 rss: 27Mb L: 186/1098
```
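
If you want to compare speed across several local runs, the status line can be
read mechanically. The following is a minimal sketch, not part of libFuzzer or
the Chromium tooling: `ParseExecPerSecond` is a hypothetical helper that returns
the number following `exec/s:` in one line of output, or `-1` if the field is
missing.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not a libFuzzer API): returns the value that follows
// "exec/s: " in a libFuzzer status line, or -1 if the line has no such field.
long ParseExecPerSecond(const std::string& line) {
  const std::string key = "exec/s: ";
  const std::string::size_type pos = line.find(key);
  if (pos == std::string::npos) return -1;

  const char* begin = line.c_str() + pos + key.size();
  char* end = nullptr;
  const long value = std::strtol(begin, &end, 10);
  if (end == begin) return -1;  // No digits after the key.
  return value;
}
```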

You should aim for at least 1,000 exec/s from your fuzz target locally before
submitting it to the Chromium repository. If you’re under 1,000, consider the
following improvements:

* [Simplifying initialization/cleanup](#Simplifying-initialization-cleanup)
* [Minimizing memory usage](#Minimizing-memory-usage)

#### Simplifying initialization/cleanup

If your `LLVMFuzzerTestOneInput` function is too complex, it can decrease the
fuzzer’s execution speed. It can also cause the fuzzer to target specific
use-cases or fail to account for unexpected scenarios.

Instead of performing setup and teardown on each input, use static
initialization and shared resources. See the [startup initialization] section of
libFuzzer’s documentation for an example.

*** note
**Note:** You can skip freeing static resources. However, all other resources
allocated within the `LLVMFuzzerTestOneInput` function should be de-allocated,
since the function gets called millions of times during a fuzzing session. If
you don’t, you’ll often run out of memory and reduce overall fuzzing efficiency.
***
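
As an illustration of the advice above, here is a minimal sketch of one-time
setup using a function-local static. The `Environment` struct, its lookup table,
and the `g_environment_constructions` counter are made up for this example; the
counter only exists to show that setup runs once, no matter how many inputs are
executed.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Only here to make the one-time construction observable in this sketch.
static int g_environment_constructions = 0;

// Hypothetical stand-in for expensive setup (e.g., building tables or
// creating a test environment).
struct Environment {
  Environment() : lookup_table(256) {
    ++g_environment_constructions;
    for (size_t i = 0; i < lookup_table.size(); ++i)
      lookup_table[i] = static_cast<uint8_t>(i ^ 0x5A);
  }
  std::vector<uint8_t> lookup_table;
};

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // Function-local static: constructed on the first call only and
  // deliberately never freed.
  static Environment env;

  // Per-input state is local, so nothing allocated here outlives the call.
  uint8_t digest = 0;
  for (size_t i = 0; i < size; ++i)
    digest ^= env.lookup_table[data[i]];

  // Keep the result observable so the loop is not optimized away.
  volatile uint8_t sink = digest;
  (void)sink;
  return 0;
}
```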

#### Minimizing memory usage

Avoid dynamic memory allocation wherever possible. Memory instrumentation
works faster for stack-based and static objects than for heap-allocated ones.

*** note
**Note:** It’s always a good idea to try different variants for your fuzz target
locally, then submit only the fastest implementation to the Chromium repository.
***
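
One variant worth measuring is sketched below, under the assumption that the
code under test needs a NUL-terminated string: the input is copied into a
fixed-size stack buffer rather than a heap-allocated `std::string`.
`CountDigits` and the `kMaxInputSize` limit are hypothetical. Whether this is
actually faster for your target is something to verify locally.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical function under test: expects a NUL-terminated string and
// counts its ASCII digits.
static size_t CountDigits(const char* text) {
  size_t count = 0;
  for (; *text != '\0'; ++text)
    if (*text >= '0' && *text <= '9') ++count;
  return count;
}

// Assumed upper bound on useful input size for this target.
constexpr size_t kMaxInputSize = 4096;

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  if (size > kMaxInputSize) return 0;  // Ignore inputs that don't fit.

  // Stack buffer instead of a heap-allocated std::string or std::vector.
  char buffer[kMaxInputSize + 1];
  if (size > 0) std::memcpy(buffer, data, size);
  buffer[size] = '\0';

  // Keep the result observable so the call is not optimized away.
  volatile size_t sink = CountDigits(buffer);
  (void)sink;
  return 0;
}
```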

### Code coverage

You can check the percentage of code covered by your fuzz target to gauge
fuzzing effectiveness:

* Review aggregated Chrome coverage from recent runs by checking the [fuzzing
  coverage] report. This report can provide insight on how to improve code
  coverage.
* Generate a source-level coverage report for your fuzzer by running the
  [coverage script] stored in the Chromium repository. The script provides
  detailed instructions and a usage example.

For the `out/coverage` target in the coverage script, make sure to add all of
the gn args you needed to build the `out/libfuzzer` target; this could include
args like `target_os=chromeos` and `is_asan=true` depending on the [gn config]
you chose.

*** note
**Note:** The code coverage of a fuzz target depends heavily on the corpus. A
well-chosen corpus will produce much greater code coverage. On the other hand,
a coverage report generated by a fuzz target without a corpus won't cover much
code. If you don’t have a corpus to use, you can download the [corpus from
ClusterFuzz]. For more information on the corpus, see
[Corpus size](#Corpus-size).
***

### Corpus size

A guided fuzzing engine such as libFuzzer considers an input (a.k.a. testcase
or corpus unit) *interesting* if the input results in new code coverage (i.e.,
if the fuzzer reaches code that has not been reached before). The set of all
interesting inputs is called the *corpus*. A corpus is shared across fuzzer runs
and grows over time.
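
The idea of an *interesting* input can be illustrated with a toy model. This is
only a sketch, not how libFuzzer is implemented: `ToyCoverage` is a made-up
stand-in for compiler instrumentation that reports hypothetical branch IDs, and
`ToyCorpus` keeps an input only if it reaches a branch that no earlier input
reached.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for coverage instrumentation: reports which (made-up) branch
// IDs an input exercises.
std::set<int> ToyCoverage(const std::string& input) {
  std::set<int> branches;
  branches.insert(0);  // Function entry is always reached.
  if (!input.empty() && input[0] == 'P') {
    branches.insert(1);
    if (input.size() > 1 && input[1] == 'K') branches.insert(2);
  }
  return branches;
}

// Hypothetical corpus model: an input is kept only if it reaches a branch
// that no earlier input reached.
struct ToyCorpus {
  std::set<int> covered;
  std::vector<std::string> units;

  bool AddIfInteresting(const std::string& input) {
    bool is_new = false;
    for (int branch : ToyCoverage(input))
      if (covered.insert(branch).second) is_new = true;
    if (is_new) units.push_back(input);
    return is_new;
  }
};
```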

If a fuzz target stops discovering new interesting inputs after running for a
while, it typically indicates that the fuzz target is hitting a code barrier
(also called a *coverage plateau*). The corpus for a reasonably complex target
should contain hundreds (if not thousands) of inputs.

If a fuzz target reaches a coverage plateau with a small corpus, the common
causes are checksums and magic numbers. Or, it may be impossible for your fuzzer
to reach a lot of code. The easiest way to diagnose the problem is to generate
and analyze a [coverage report](#Code-coverage). Then, to fix the issue, try the
following:

* Change the code (e.g., disable CRC checks while fuzzing) with a
  [custom build](#Custom-build).
* Prepare or improve the [seed corpus](#Seed-corpus).
* Prepare or improve the [fuzzer dictionary](#Fuzzer-dictionary).

## Ways to improve a fuzz target

### Seed corpus

You can give your fuzz target a starting point by creating a set of valid and
interesting inputs called a *seed corpus*. If you don’t provide a seed corpus,
the fuzzing engine has to guess inputs from scratch, which can take time
(depending on the size of the inputs and the complexity of the target format).
In many cases, providing a seed corpus can increase code coverage by an order of
magnitude.

Seed corpuses work especially well for strictly defined file formats and data
transmission protocols:

* For file format parsers, add valid files from your test suite.
* For protocol parsers, add valid raw streams from a test suite into separate
  files.
* For graphics libraries, add a variety of small PNG/JPG/GIF files.

#### Using a corpus locally

If you’re running a fuzz target locally, you can easily designate a corpus by
passing a directory as an argument:

```
./out/libfuzzer/my_fuzzer ~/tmp/my_fuzzer_corpus
```

The fuzzer stores all the interesting inputs it finds in the directory.

#### Creating a Chromium repository seed corpus

When running fuzz targets at scale, ClusterFuzz looks for a seed corpus defined
in the Chromium source repository. You can define one in your `BUILD.gn` file by
adding a `seed_corpus` attribute to your `fuzzer_test` target definition:

```
fuzzer_test("my_fuzzer") {
  ...
  seed_corpus = "test/fuzz/testcases"
  ...
}
```

If you want to specify multiple seed corpus directories, use the `seed_corpuses`
attribute instead:

```
fuzzer_test("my_fuzzer") {
  ...
  seed_corpuses = [ "test/fuzz/testcases", "test/unittest/data" ]
  ...
}
```

All files found in these directories and their subdirectories are stored in a
`<my_fuzzer>_seed_corpus.zip` output archive.

#### Uploading corpus files to GCS

If you can't store your seed corpus in the Chromium repository (e.g., it’s too
large, can’t be open-sourced, etc.), you can upload the corpus to the Google
Cloud Storage (GCS) bucket used by ClusterFuzz.

1) Open the [Corpus GCS Bucket] in your browser.
2) Search for the directory named `<my_fuzzer>`. If the directory does not
   exist, create it.
3) In the `<my_fuzzer>` directory, upload your corpus files.

*** note
**Note:** If you upload your corpus to GCS, you don’t need to add the
`seed_corpus` attribute to your `fuzzer_test` target definition. However, adding
a seed corpus to the Chromium repository is the preferred approach.
***

You can do the same thing by using the [gsutil] command line tool:

```bash
gsutil -m rsync <path_to_corpus> gs://clusterfuzz-corpus/libfuzzer/<my_fuzzer>
```

*** note
**Note:** To write to this bucket using `gsutil`, you must be logged into your
@google.com account (@chromium.org will not work). You can use the `gcloud auth
login` command to log into your account in `gsutil` if you installed `gsutil`
through `gcloud`.
***

#### Minimizing a seed corpus

Your seed corpus is synced to all fuzzing bots for every iteration, so it's
important to minimize it to a small set of interesting inputs before uploading.
Keeping the seed corpus small improves fuzzing efficiency and prevents our bots
from running out of disk space.

You can minimize your seed corpus by using libFuzzer’s `-merge=1` option:

```bash
# Create an empty directory.
mkdir seed_corpus_minimized

# Run the fuzzer with -merge=1 flag.
./my_fuzzer -merge=1 ./seed_corpus_minimized ./seed_corpus
```

After running the command, the `seed_corpus_minimized` directory will contain a
minimized corpus that gives the same code coverage as your initial `seed_corpus`
directory.

### Fuzzer dictionary

You can help your fuzzer increase its coverage by providing a set of common
words or values that you expect to find in the input. Such a dictionary works
especially well for certain use-cases (e.g., fuzzing file format decoders or
text-based protocols like XML).

To add a fuzzer dictionary:

1) Create a flat ASCII text file that lists one input token per line in the
   format `name="value"`. The value must appear in quotes with hex escaping
   (`\xNN`) applied to all non-printable, high-bit, or otherwise problematic
   characters (`\\` and `\"` shorthands are recognized, too). This syntax is
   similar to the one used by the [AFL] fuzzing engine (`-x` option).

   *** note
   **Note:** `name` can be omitted, but it is a convenient way to document the
   meaning of each token.
   ***

   Here’s an example dictionary:

   ```
   # Lines starting with '#' and empty lines are ignored.

   # Adds "blah" word (w/o quotes) to the dictionary.
   kw1="blah"
   # Use \\ for backslash and \" for quotes.
   kw2="\"ac\\dc\""
   # Use \xAB for hex values.
   kw3="\xF7\xF8"
   # Key name before '=' can be omitted:
   "foo\x0Abar"
   ```

2) Test your dictionary by running your fuzz target locally:

   ```bash
   ./out/libfuzzer/my_fuzzer -dict=<path_to_dict> <path_to_corpus>
   ```

   If the dictionary is effective, you should see `NEW` units discovered in the
   output.

3) Add the dictionary file in the same directory as your fuzz target, then add
   the `dict` attribute to the `fuzzer_test` definition in your `BUILD.gn` file:

   ```
   fuzzer_test("my_fuzzer") {
     ...
     dict = "my_fuzzer.dict"
   }
   ```

   The dictionary is submitted to the Chromium repository. Once ClusterFuzz
   picks up a new revision build, the dictionary is used automatically.
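
If you generate dictionary tokens from code rather than writing them by hand,
the escaping rules from step 1 can be sketched as follows. `FormatDictEntry` is
a hypothetical helper, not part of libFuzzer or Chromium. It assumes that every
byte outside printable ASCII should be written as `\xNN`, and that an empty
`name` means the `name=` part is omitted.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats one dictionary line as name="value", escaping
// backslashes and quotes with a backslash and every byte outside printable
// ASCII as \xNN. Pass an empty name to omit the `name=` part.
std::string FormatDictEntry(const std::string& name, const std::string& value) {
  std::string line = name.empty() ? std::string() : name + "=";
  line += '"';
  for (unsigned char c : value) {
    if (c == '\\' || c == '"') {
      line += '\\';
      line += static_cast<char>(c);
    } else if (c >= 0x20 && c <= 0x7E) {
      line += static_cast<char>(c);
    } else {
      char hex[5];  // Backslash, 'x', two hex digits, and the terminating NUL.
      std::snprintf(hex, sizeof(hex), "\\x%02X", static_cast<unsigned>(c));
      line += hex;
    }
  }
  line += '"';
  return line;
}
```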

### Custom build

If you need to change the code being tested by your fuzz target, you can use
conditional compilation as follows:

* `#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION` in C/C++ code
* `if cfg!(FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION) { ... }` in Rust code

*** note
**Note:** Patching target code is not a preferred way of improving the
corresponding fuzz target, but in some cases it might be the only way to do it
(e.g., when there is no intended API to disable checksum verification, or when
the target code uses a random generator that affects the reproducibility of
crashes).
***
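
For the checksum case, a patch in C++ might look like the sketch below.
`ComputeChecksum` and `HasValidChecksum` are hypothetical functions standing in
for the code under test; only the `FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION`
macro comes from this guide.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical checksum used by the code under test: a simple byte sum.
uint8_t ComputeChecksum(const uint8_t* data, size_t size) {
  uint8_t sum = 0;
  for (size_t i = 0; i < size; ++i)
    sum = static_cast<uint8_t>(sum + data[i]);
  return sum;
}

// Hypothetical validation step in the code under test.
bool HasValidChecksum(const uint8_t* data, size_t size, uint8_t expected) {
#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
  // Fuzzing builds only: accept any checksum so that mutated inputs reach the
  // code behind this check.
  (void)data;
  (void)size;
  (void)expected;
  return true;
#else
  return ComputeChecksum(data, size) == expected;
#endif
}
```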

[AFL]: http://lcamtuf.coredump.cx/afl/
[ClusterFuzz status]: libFuzzer_integration.md#Status-Links
[Corpus GCS Bucket]: https://console.cloud.google.com/storage/clusterfuzz-corpus/libfuzzer
[Getting Started Guide]: getting_started.md
[gn config]: getting_started.md#running-the-fuzz-target
[corpus from ClusterFuzz]: libFuzzer_integration.md#Corpus
[coverage script]: https://cs.chromium.org/chromium/src/tools/code_coverage/coverage.py
[fuzzing coverage]: https://analysis.chromium.org/coverage/p/chromium?platform=fuzz
[gsutil]: https://cloud.google.com/storage/docs/gsutil
[startup initialization]: https://llvm.org/docs/LibFuzzer.html#startup-initialization
[libfuzzer]: getting_started_with_libfuzzer.md
[fuzztests]: getting_started.md