# Efficient Fuzzing Guide

This guide applies to fuzzers created using [libfuzzer], not [FuzzTests]; none
of this advice is necessary for FuzzTests.

Once you have a fuzz target running, you can analyze and tweak it to improve its
efficiency. This document describes techniques to minimize fuzzing time and
maximize your results.

*** note
Note: If you haven’t created your first fuzz target yet, see the [Getting
Started Guide].
***

The most direct way to gauge the effectiveness of your fuzz target is to collect
metrics. You can get them manually, or take them from a [ClusterFuzz status]
page after your fuzz target is checked into the Chromium repository.

[TOC]

## Key metrics of a fuzz target

### Execution speed

A fuzzing engine such as libFuzzer typically explores a large search space by
performing randomized mutations, so it needs to run as fast as possible to find
interesting code paths.

Fuzz target speed is calculated in executions per second (`exec/s`). It is
printed while a fuzz target is running:

```
#11002 NEW cov: 1337 ft: 10934 corp: 707/409Kb lim: 1098 exec/s: 5333 rss: 27Mb L: 186/1098
```

You should aim for at least 1,000 exec/s from your fuzz target locally before
submitting it to the Chromium repository. If you’re under 1,000, consider the
following improvements:

* [Simplifying initialization/cleanup](#Simplifying-initialization-cleanup)
* [Minimizing memory usage](#Minimizing-memory-usage)

#### Simplifying initialization/cleanup

If your `LLVMFuzzerTestOneInput` function is too complex, it can decrease the
fuzzer’s execution speed. It can also cause the fuzzer to target specific
use-cases or fail to account for unexpected scenarios.

Instead of performing setup and teardown on each input, use static
initialization and shared resources. See the [startup initialization] section of
libFuzzer’s documentation for an example.

*** note
Note: You can skip freeing static resources. However, all other resources
allocated within the `LLVMFuzzerTestOneInput` function should be de-allocated,
since the function gets called millions of times during a fuzzing session. If
you don’t, you’ll often run out of memory and reduce overall fuzzing efficiency.
***
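
As an illustration of the advice above, here is a minimal sketch of a fuzz
target that performs its setup once and reuses it for every input. The
`Environment` type, its `construction_count` counter, and `StartsWithKeyword`
are hypothetical names invented for this example; they stand in for whatever
expensive, input-independent setup and code under test your target has.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for expensive, input-independent setup.
struct Environment {
  Environment() : keywords{"GET", "POST", "HEAD"} { ++construction_count; }
  std::vector<std::string> keywords;
  static int construction_count;  // Only here to show that setup runs once.
};
int Environment::construction_count = 0;

// Hypothetical code under test: reports whether the input starts with one of
// the keywords held by the shared environment.
bool StartsWithKeyword(const Environment& env, const uint8_t* data,
                       size_t size) {
  for (const std::string& keyword : env.keywords) {
    if (size >= keyword.size() &&
        std::memcmp(data, keyword.data(), keyword.size()) == 0) {
      return true;
    }
  }
  return false;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  // Built on the first call, reused afterwards, and intentionally never freed.
  static Environment* env = new Environment();
  (void)StartsWithKeyword(*env, data, size);
  return 0;
}
```

Because the environment is a function-local static, no per-input teardown is
needed, and nothing allocated inside the function outlives a single call.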

#### Minimizing memory usage

Avoid allocation of dynamic memory wherever possible. Memory instrumentation
works faster for stack-based and static objects than for heap-allocated ones.

*** note
Note: It’s always a good idea to try different variants for your fuzz target
locally, then submit only the fastest implementation to the Chromium repository.
***
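
The following sketch shows one variant you might try locally: copying the input
into a fixed-size stack buffer instead of a heap-allocated string. `CountSpaces`,
`ProcessInput`, and the `kMaxInputSize` limit are assumptions made for this
example, not part of any Chromium API.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical code under test that requires a NUL-terminated string.
size_t CountSpaces(const char* text) {
  size_t count = 0;
  for (; *text != '\0'; ++text) {
    if (*text == ' ') ++count;
  }
  return count;
}

// Assumed upper bound on the inputs this target cares about.
constexpr size_t kMaxInputSize = 4096;

// Copies the input into a stack buffer (no heap allocation) and runs the
// code under test on it. Oversized inputs are skipped.
size_t ProcessInput(const uint8_t* data, size_t size) {
  if (size >= kMaxInputSize) return 0;
  std::array<char, kMaxInputSize> buffer;
  if (size > 0) std::memcpy(buffer.data(), data, size);
  buffer[size] = '\0';
  return CountSpaces(buffer.data());
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  (void)ProcessInput(data, size);
  return 0;
}
```

One trade-off to keep in mind: the stack buffer is larger than the input, so a
read slightly past the copied bytes stays inside the buffer and would not be
reported. Whether the speedup is worth that depends on the target, which is why
comparing variants locally is useful.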

### Code coverage

You can check the percentage of code covered by your fuzz target to gauge
fuzzing effectiveness:

* Review aggregated Chrome coverage from recent runs by checking the [fuzzing
  coverage] report. This report can provide insight on how to improve code
  coverage.
* Generate a source-level coverage report for your fuzzer by running the
  [coverage script] stored in the Chromium repository. The script provides
  detailed instructions and a usage example.

For the `out/coverage` target in the coverage script, make sure to add all of
the gn args you needed to build the `out/libfuzzer` target; this could include
args like `target_os=chromeos` and `is_asan=true` depending on the [gn config]
you chose.

*** note
Note: The code coverage of a fuzz target depends heavily on the corpus. A
well-chosen corpus will produce much greater code coverage. On the other hand,
a coverage report generated by a fuzz target without a corpus won't cover much
code. If you don’t have a corpus to use, you can download the [corpus from
ClusterFuzz]. For more information on the corpus, see
[Corpus size](#Corpus-size).
***

### Corpus size

A guided fuzzing engine such as libFuzzer considers an input (a.k.a. testcase
or corpus unit) interesting if the input results in new code coverage (i.e.,
if the fuzzer reaches code that has not been reached before). The set of all
interesting inputs is called the corpus. A corpus is shared across fuzzer runs
and grows over time.

If a fuzz target stops discovering new interesting inputs after running for a
while, it typically indicates that the fuzz target is hitting a code barrier
(also called a coverage plateau). The corpus for a reasonably complex target
should contain hundreds (if not thousands) of inputs.

If a fuzz target reaches a coverage plateau with a small corpus, the common
causes are checksums and magic numbers in the input format. It may also be
impossible for your fuzzer to reach much of the code. The easiest way to
diagnose the problem is to generate and analyze a
[coverage report](#Code-coverage). Then, to fix the issue, try the following:

* Change the code (e.g., disable CRC checks while fuzzing) with a
  [custom build](#Custom-build).
* Prepare or improve the [seed corpus](#Seed-corpus).
* Prepare or improve the [fuzzer dictionary](#Fuzzer-dictionary).

## Ways to improve a fuzz target

### Seed corpus

You can give your fuzz target a starting point by creating a set of valid and
interesting inputs called a seed corpus. If you don’t provide a seed corpus,
the fuzzing engine has to guess inputs from scratch, which can take time
(depending on the size of the inputs and the complexity of the target format).
In many cases, providing a seed corpus can increase code coverage by an order of
magnitude.

Seed corpuses work especially well for strictly defined file formats and data
transmission protocols:

* For file format parsers, add valid files from your test suite.
* For protocol parsers, add valid raw streams from a test suite into separate
  files.
* For graphics libraries, add a variety of small PNG/JPG/GIF files.

#### Using a corpus locally

If you’re running a fuzz target locally, you can easily designate a corpus by
passing a directory as an argument:

```
./out/libfuzzer/my_fuzzer ~/tmp/my_fuzzer_corpus
```

The fuzzer stores all the interesting inputs it finds in the directory.

#### Creating a Chromium repository seed corpus

When running fuzz targets at scale, ClusterFuzz looks for a seed corpus defined
in the Chromium source repository. You can define one in your `BUILD.gn` file by
adding a `seed_corpus` attribute to your `fuzzer_test` target definition:

```
fuzzer_test("my_fuzzer") {
  ...
  seed_corpus = "test/fuzz/testcases"
  ...
}
```

If you want to specify multiple seed corpus directories, use the `seed_corpuses`
attribute instead:

```
fuzzer_test("my_fuzzer") {
  ...
  seed_corpuses = [ "test/fuzz/testcases", "test/unittest/data" ]
  ...
}
```

All files found in these directories and their subdirectories are stored in a
`<my_fuzzer>_seed_corpus.zip` output archive.

#### Uploading corpus files to GCS

If you can't store your seed corpus in the Chromium repository (e.g., it’s too
large, can’t be open-sourced, etc.), you can upload the corpus to the Google
Cloud Storage (GCS) bucket used by ClusterFuzz.

1) Open the [Corpus GCS Bucket] in your browser.
2) Search for the directory named `<my_fuzzer>`. If the directory does not
   exist, create it.
3) In the `<my_fuzzer>` directory, upload your corpus files.

*** note
Note: If you upload your corpus to GCS, you don’t need to add the
`seed_corpus` attribute to your `fuzzer_test` target definition. However, adding
the seed corpus to the Chromium repository is the preferred approach.
***

You can do the same thing by using the [gsutil] command line tool:

```bash
gsutil -m rsync <path_to_corpus> gs://clusterfuzz-corpus/libfuzzer/<my_fuzzer>
```

*** note
Note: To write to this bucket using `gsutil`, you must be logged into your
@google.com account (@chromium.org will not work). You can use the `gcloud auth
login` command to log into your account in `gsutil` if you installed `gsutil`
through `gcloud`.
***

#### Minimizing a seed corpus

Your seed corpus is synced to all fuzzing bots for every iteration, so it's
important to minimize it to a small set of interesting inputs before uploading.
Keeping the seed corpus small improves fuzzing efficiency and prevents our bots
from running out of disk space.

You can minimize your seed corpus by using libFuzzer’s `-merge=1` option:

```bash
# Create an empty directory.
mkdir seed_corpus_minimized

# Run the fuzzer with -merge=1 flag.
./my_fuzzer -merge=1 ./seed_corpus_minimized ./seed_corpus
```

After running the command, the `seed_corpus_minimized` directory will contain a
minimized corpus that gives the same code coverage as your initial `seed_corpus`
directory.

### Fuzzer dictionary

You can help your fuzzer increase its coverage by providing a set of common
words or values that you expect to find in the input. Such a dictionary works
especially well for certain use-cases (e.g., fuzzing file format decoders or
text-based protocols like XML).

To add a fuzzer dictionary:

1) Create a flat ASCII text file that lists one input token per line in the
   format `name="value"`. The value must appear in quotes with hex escaping
   (`\xNN`) applied to all non-printable, high-bit, or otherwise problematic
   characters (`\\` and `\"` shorthands are recognized, too). This syntax is
   similar to the one used by the [AFL] fuzzing engine (`-x` option).

*** note
Note: `name` can be omitted, but it is a convenient way to document the
meaning of each token.
***

Here’s an example dictionary:

```
# Lines starting with '#' and empty lines are ignored.

# Adds "blah" word (w/o quotes) to the dictionary.
kw1="blah"
# Use \\ for backslash and \" for quotes.
kw2="\"ac\\dc\""
# Use \xAB for hex values.
kw3="\xF7\xF8"
# Key name before '=' can be omitted:
"foo\x0Abar"
```

2) Test your dictionary by running your fuzz target locally:

```bash
./out/libfuzzer/my_fuzzer -dict=<path_to_dict> <path_to_corpus>
```

If the dictionary is effective, you should see `NEW` units discovered in the
output.

3) Add the dictionary file in the same directory as your fuzz target, then add
   the `dict` attribute to the `fuzzer_test` definition in your `BUILD.gn` file:

```
fuzzer_test("my_fuzzer") {
  ...
  dict = "my_fuzzer.dict"
}
```

Once the dictionary is submitted to the Chromium repository and ClusterFuzz
picks up a new revision build, the dictionary is used automatically.
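
If you generate dictionary entries from raw byte strings, the escaping rules
from step 1 can be sketched as a small helper. `FormatDictEntry` is a
hypothetical function written for this example, not part of libFuzzer or
Chromium; it assumes that printable ASCII other than `\` and `"` can be emitted
as-is and that every other byte is written as `\xNN`.

```cpp
#include <cstdio>
#include <string>

// Formats one dictionary line as name="value" (or just "value" when the name
// is empty), escaping the value as described in step 1.
std::string FormatDictEntry(const std::string& name, const std::string& value) {
  std::string out = name.empty() ? std::string() : name + "=";
  out += '"';
  for (unsigned char c : value) {
    if (c == '\\') {
      out += "\\\\";  // Backslash shorthand.
    } else if (c == '"') {
      out += "\\\"";  // Quote shorthand.
    } else if (c >= 0x20 && c < 0x7F) {
      out += static_cast<char>(c);  // Printable ASCII is kept verbatim.
    } else {
      char buf[5];  // "\xNN" plus the terminating NUL.
      std::snprintf(buf, sizeof(buf), "\\x%02X", static_cast<unsigned>(c));
      out += buf;
    }
  }
  out += '"';
  return out;
}
```

For example, calling the helper with the name `kw3` and the two bytes `0xF7`
`0xF8` yields the `kw3` line from the example dictionary in step 1.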

### Custom build

If you need to change the code being tested by your fuzz target, you can use
conditional compilation as follows:

* `#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION` in C/C++ code
* `if cfg!(FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION) { ... }` in Rust code

*** note
Note: Patching target code is not the preferred way of improving the
corresponding fuzz target, but in some cases it might be the only way to do it
(e.g., when there is no intended API to disable checksum verification, or when
the target code uses a random generator that affects the reproducibility of
crashes).
***
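
As a sketch of the C/C++ form, the example below skips a checksum check in
fuzzing builds only. `ParsePacket`, `ComputeChecksum`, and the packet layout
(one checksum byte followed by the payload) are hypothetical and exist purely
to illustrate the pattern.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical checksum: the sum of all payload bytes, modulo 256.
uint8_t ComputeChecksum(const uint8_t* data, size_t size) {
  uint8_t sum = 0;
  for (size_t i = 0; i < size; ++i) {
    sum = static_cast<uint8_t>(sum + data[i]);
  }
  return sum;
}

// Hypothetical parser for packets laid out as [checksum byte][payload...].
// Returns true if the packet is accepted.
bool ParsePacket(const uint8_t* data, size_t size) {
  if (size < 1) return false;
#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
  // Production builds reject packets whose checksum does not match. Fuzzing
  // builds skip this check so that mutated inputs still reach the payload
  // handling below.
  if (ComputeChecksum(data + 1, size - 1) != data[0]) return false;
#endif
  // Payload handling would go here.
  return true;
}
```

Keeping the guarded region as small as possible limits how far the fuzzed code
diverges from what ships in production.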

[AFL]: http://lcamtuf.coredump.cx/afl/
[ClusterFuzz status]: libFuzzer_integration.md#Status-Links
[Corpus GCS Bucket]: https://console.cloud.google.com/storage/clusterfuzz-corpus/libfuzzer
[Getting Started Guide]: getting_started.md
[gn config]: getting_started.md#running-the-fuzz-target
[corpus from ClusterFuzz]: libFuzzer_integration.md#Corpus
[coverage script]: https://cs.chromium.org/chromium/src/tools/code_coverage/coverage.py
[fuzzing coverage]: https://analysis.chromium.org/coverage/p/chromium?platform=fuzz
[gsutil]: https://cloud.google.com/storage/docs/gsutil
[startup initialization]: https://llvm.org/docs/LibFuzzer.html#startup-initialization
[libfuzzer]: getting_started_with_libfuzzer.md
[fuzztests]: getting_started.md