Skip to Main Content
Fluent Python
book

Fluent Python

by Luciano Ramalho
August 2015
Intermediate content levelIntermediate
790 pages
18h 48m
English
O'Reilly Media, Inc.
Content preview from Fluent Python

Chapter 4. Text versus Bytes

Humans use text. Computers speak bytes.1

Esther Nam and Travis Fischer, Character Encoding and Unicode in Python

Python 3 introduced a sharp distinction between strings of human text and sequences of raw bytes. Implicit conversion of byte sequences to Unicode text is a thing of the past. This chapter deals with Unicode strings, binary sequences, and the encodings used to convert between them.

Depending on your Python programming context, a deeper understanding of Unicode may or may not be of vital importance to you. In the end, most of the issues covered in this chapter do not affect programmers who deal only with ASCII text. But even if that is your case, there is no escaping the str versus byte divide. As a bonus, you’ll find that the specialized binary sequence types provide features that the “all-purpose” Python 2 str type does not have.

In this chapter, we will visit the following topics:

  • Characters, code points, and byte representations

  • Unique features of binary sequences: bytes, bytearray, and memoryview

  • Codecs for full Unicode and legacy character sets

  • Avoiding and dealing with encoding errors

  • Best practices when handling text files

  • The default encoding trap and standard I/O issues

  • Safe Unicode text comparisons with normalization

  • Utility functions for normalization, case folding, and brute-force diacritic removal

  • Proper sorting of Unicode text with locale and the PyUCA library

  • Character metadata in the Unicode database ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

Fluent Python, 2nd Edition

Fluent Python, 2nd Edition

Luciano Ramalho

Publisher Resources

ISBN: 9781491946237Errata Page