Why does "bytes(n)" create a length n byte string instead of converting n to a binary representation?
I was trying to build this bytes object in Python 3:
b'3\r\n'
so I tried the obvious (for me), and found a weird behaviour:
>>> bytes(3) + b'\r\n'
b'\x00\x00\x00\r\n'
Apparently:
>>> bytes(10)
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
I couldn't find anything in the documentation explaining why the bytes conversion works this way. However, I did find some surprised comments in this Python issue about adding format to bytes (see also Python 3 bytes formatting):
http://bugs.python.org/issue3982
This interacts even more poorly with oddities like bytes(int) returning zeroes now
and:
It would be much more convenient for me if bytes(int) returned the ASCIIfication of that int; but honestly, even an error would be better than this behavior. (If I wanted this behavior - which I never have - I'd rather it be a classmethod, invoked like "bytes.zeroes(n)".)
Can someone explain to me where this behaviour comes from?
Answers
The behavior you're observing stems from the way Python's bytes constructor works. Specifically, bytes(3) creates a zero-filled bytes object of length 3, not the ASCII representation of the integer 3.
Here’s an explanation of the behavior.
bytes() constructor behavior:
In Python 3, the bytes() constructor behaves differently depending on the argument passed to it:
- Single integer argument:
  - When you pass a single integer n to bytes(n), it creates a bytes object of length n, filled with null bytes (\x00).
  - Example:
    >>> bytes(3)
    b'\x00\x00\x00'
    This creates a bytes object with 3 null bytes.
- Other arguments:
  - If you pass an iterable of integers (such as a list or tuple), it creates a bytes object in which each integer becomes one byte. Each integer must be in range(256). A str is not accepted on its own; it needs an encoding argument, as in bytes('3', 'ascii').
  - Example:
    >>> bytes([65, 66, 67])
    b'ABC'
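To make the dispatch on argument type concrete, here is a small sketch that runs the constructor forms side by side. The variable names are purely illustrative, and the two failing calls are included only to show that a bare str or an out-of-range integer is rejected.

```python
# Each call below exercises a different form of the bytes() constructor.
zeros = bytes(3)                   # int: that many null bytes
from_ints = bytes([65, 66, 67])    # iterable of ints, each in range(256)
from_text = bytes("3", "ascii")    # str: requires an explicit encoding
copied = bytes(b"3\r\n")           # bytes-like object: copied as-is

print(zeros)      # b'\x00\x00\x00'
print(from_ints)  # b'ABC'
print(from_text)  # b'3'
print(copied)     # b'3\r\n'

# A str without an encoding raises TypeError.
try:
    bytes("3")
except TypeError as exc:
    print("TypeError:", exc)

# An integer outside range(256) inside an iterable raises ValueError.
try:
    bytes([256])
except ValueError as exc:
    print("ValueError:", exc)
```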
Why this behavior?
- Historical reasons: bytes objects are immutable sequences of byte values. When a single integer n is passed, the constructor treats it as a length and returns a block of n zero-filled bytes. This is useful for working with binary protocols or low-level networking where you need buffers of specific lengths.
- Consistency with other sequences: Python’s bytes is a sequence type, and the behavior matches bytearray (the mutable counterpart of bytes), which also initializes a zero-filled sequence of a given length:
  >>> bytearray(5)
  bytearray(b'\x00\x00\x00\x00\x00')
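As an illustration of the zeroed-buffer use case, the sketch below preallocates a bytearray and fills it in place with struct.pack_into. The 6-byte header layout (a 2-byte type followed by a 4-byte length) is an assumption made up for this example, not a real protocol.

```python
import struct

# Hypothetical header layout, for illustration only:
# 2-byte unsigned type, then 4-byte unsigned length, both big-endian.
HEADER_FORMAT = ">HI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 6 bytes

buf = bytearray(HEADER_SIZE)                    # zero-filled and mutable
struct.pack_into(HEADER_FORMAT, buf, 0, 1, 42)  # write type=1, length=42 at offset 0
header = bytes(buf)                             # immutable copy of the filled buffer

print(header)  # b'\x00\x01\x00\x00\x00*'
```

The mutable bytearray is the natural choice when the buffer is filled afterwards; bytes(n) gives the same zeros but cannot be modified.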
What you wanted:
If you were looking to convert the integer 3 into its ASCII representation, you would use:
>>> b = b'3' + b'\r\n'
>>> print(b)
b'3\r\n'
Alternatively, if you want to construct the bytes object dynamically, you can use:
>>> b = str(3).encode() + b'\r\n'
>>> print(b)
b'3\r\n'
Here, str(3).encode() converts the integer 3 into a string, then encodes it as a bytes object before concatenating it with b'\r\n'.
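If you need this for arbitrary integers, one possible way to wrap the conversion is sketched below. The helper name int_line is hypothetical, chosen for this example. The last line uses %-formatting on bytes, which is available from Python 3.5 onwards.

```python
def int_line(n: int) -> bytes:
    """Return the decimal digits of n as ASCII bytes, followed by CRLF."""
    return str(n).encode("ascii") + b"\r\n"


print(int_line(3))    # b'3\r\n'
print(int_line(255))  # b'255\r\n'

# On Python 3.5 and later, %-formatting on bytes gives the same result.
formatted = b"%d\r\n" % 3
print(formatted)      # b'3\r\n'
```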
Why not represent the integer as an ASCII byte?
The bytes(int) behavior of returning a zero-filled sequence is based on Python’s design choice to prioritize binary data handling. If the bytes() constructor returned the ASCII encoding of integers, it would conflict with existing code that relies on bytes(int) to initialize zeroed buffers.
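To explicitly convert a single digit to its ASCII byte, you can use:
>>> bytes([3 + 48]) # 48 is the ASCII value of '0'
b'3'
This works because ASCII digits start at 48 ('0' is 48, '1' is 49, etc.). It only works for the single digits 0 through 9; for larger integers, use str(n).encode() instead.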
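Since the question title mentions a binary representation, it may help to contrast the two things "convert an int to bytes" can mean. This is a rough sketch: the text form gives the decimal digits as ASCII, while int.to_bytes gives the raw value in a chosen width and byte order. The width and the "big" byte order here are arbitrary choices for the example.

```python
n = 3
ascii_digits = str(n).encode("ascii")  # text form: the character '3'
raw_byte = n.to_bytes(1, "big")        # binary form: a single byte with value 3

print(ascii_digits)  # b'3'
print(raw_byte)      # b'\x03'

# With a multi-digit number the two forms differ in length as well.
big = 1000
big_ascii = str(big).encode("ascii")   # four ASCII digits
big_raw = big.to_bytes(2, "big")       # 1000 == 0x03E8, stored in two bytes

print(big_ascii)  # b'1000'
print(big_raw)    # b'\x03\xe8'
```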