ghz 3 months ago ⋅ 39 views

Why does "bytes(n)" create a length n byte string instead of converting n to a binary representation?

I was trying to build this bytes object in Python 3:

b'3\r\n'

so I tried the obvious (for me), and found a weird behaviour:

>>> bytes(3) + b'\r\n'
b'\x00\x00\x00\r\n'

Apparently:

>>> bytes(10)
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

Reading the documentation, I couldn't find any pointers on why the bytes conversion works this way. However, I did find some surprised comments in this Python issue about adding format to bytes (see also Python 3 bytes formatting):

http://bugs.python.org/issue3982

This interacts even more poorly with oddities like bytes(int) returning zeroes now

and:

It would be much more convenient for me if bytes(int) returned the ASCIIfication of that int; but honestly, even an error would be better than this behavior. (If I wanted this behavior - which I never have - I'd rather it be a classmethod, invoked like "bytes.zeroes(n)".)

Can someone explain to me where this behaviour comes from?

Answers

The behavior you're observing comes from how Python's bytes constructor dispatches on the type of its argument. Specifically:

  • bytes(3) creates a zero-filled bytes object of length 3, not the ASCII representation of the integer 3.

In more detail:

bytes() constructor behavior:

In Python 3, the bytes() constructor behaves differently based on the argument passed to it:

  1. Single Integer Argument:

    • When you pass a single integer n to bytes(n), it creates a bytes object of length n, filled with null bytes (\x00).
    • Example:
      >>> bytes(3)
      b'\x00\x00\x00'
      
      This creates a bytes object with 3 null bytes.
  2. Other Arguments:

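    • If you pass an iterable of integers (like a list or tuple), it creates a bytes object with each integer giving the value of one byte, so each must be in the range 0 to 255. A str is not accepted on its own: it needs an explicit encoding, as in bytes("abc", "utf-8"), otherwise a TypeError is raised.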
    • Example:
      >>> bytes([65, 66, 67])
      b'ABC'
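
The constructor forms above can be compared side by side. This is a minimal sketch; the variable names are chosen only for illustration:

```python
# Integer argument: a zero-filled bytes object of that length.
zeros = bytes(3)

# Iterable of integers in range(256): one byte per integer.
from_ints = bytes([65, 66, 67])

# A str needs an explicit encoding.
from_text = bytes("3", "ascii")

# Without an encoding, passing a str raises TypeError.
try:
    bytes("3")
    str_without_encoding_ok = True
except TypeError:
    str_without_encoding_ok = False

print(zeros)      # b'\x00\x00\x00'
print(from_ints)  # b'ABC'
print(from_text)  # b'3'
```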
      

Why this behavior?

  • Historical reasons: The constructor signature is shared with bytearray, the mutable counterpart of bytes. For a mutable type, preallocating n zero-filled bytes is a natural way to get a buffer that is filled in later, for example when working with binary protocols or low-level networking. For the immutable bytes type the same call is much less useful, which is why the comments quoted in the question call it an oddity.

  • Consistency with bytearray: bytes(n) mirrors bytearray(n), which returns a zero-filled mutable sequence of length n:

    >>> bytearray(5)
    bytearray(b'\x00\x00\x00\x00\x00')
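
To illustrate why a zero-filled buffer of a given length makes sense for the mutable type, here is a small sketch that preallocates a bytearray and writes fields into it in place. The 8-byte size and the field layout are assumptions made up for this example:

```python
import struct

# Preallocate an 8-byte, zero-filled, mutable buffer.
buf = bytearray(8)

# Write a 2-byte big-endian value and a 1-byte flag at offset 0.
struct.pack_into(">HB", buf, 0, 513, 7)

print(buf)  # bytearray(b'\x02\x01\x07\x00\x00\x00\x00\x00')

# Freeze the result into an immutable bytes object once it is complete.
frozen = bytes(buf)
```

With bytes itself there is no way to fill the zeros in afterwards, so bytes(n) is mostly useful as padding.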
    

What you wanted:

If the value is known in advance, you can write the ASCII digit directly as a bytes literal:

>>> b = b'3' + b'\r\n'
>>> print(b)
b'3\r\n'

Alternatively, if you want to construct the bytes object dynamically, you can use:

>>> b = str(3).encode() + b'\r\n'
>>> print(b)
b'3\r\n'

Here, str(3) converts the integer 3 into the string '3', and .encode() encodes it as a bytes object (UTF-8 by default, which is identical to ASCII for digits) before it is concatenated with b'\r\n'.
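
The same idea can be wrapped in a small helper. The function names below are hypothetical, and the second variant assumes Python 3.5 or later, where %-formatting is supported for bytes:

```python
def int_line(n: int) -> bytes:
    """Return the decimal digits of n as ASCII bytes, followed by CRLF."""
    return str(n).encode("ascii") + b"\r\n"


def int_line_fmt(n: int) -> bytes:
    """Same result using bytes %-formatting (Python 3.5+)."""
    return b"%d\r\n" % n


print(int_line(3))       # b'3\r\n'
print(int_line_fmt(42))  # b'42\r\n'
```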

Why not represent the integer as an ASCII byte?

bytes(int) returning a zero-filled sequence is a design choice that treats bytes as raw binary data rather than text. Changing the constructor to return the ASCII encoding of the integer would silently break existing code that relies on bytes(int) to create zeroed buffers.

To explicitly convert a single-digit integer (0 to 9) to its ASCII byte, you can use:

>>> bytes([3 + 48])  # 48 is the ASCII value of '0'
b'3'

This works because ASCII digits start at 48 ('0' is 48, '1' is 49, etc.), but only for the values 0 through 9. For larger integers, use str(n).encode() instead.
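
The question title also mentions a binary representation. The sketch below contrasts the ASCII digits of an integer with its raw binary encoding via int.to_bytes; the value 1024 and the 2-byte width are arbitrary choices for illustration:

```python
n = 1024

# ASCII text form: one byte per decimal digit.
ascii_digits = str(n).encode("ascii")

# Raw binary form: the caller chooses the width and byte order.
raw_big = n.to_bytes(2, "big")
raw_little = n.to_bytes(2, "little")

# The binary form can be converted back to an int.
round_trip = int.from_bytes(raw_big, "big")

# The "+ 48" trick gives a non-digit character once n exceeds 9.
not_a_digit = bytes([10 + 48])

print(ascii_digits)  # b'1024'
print(raw_big)       # b'\x04\x00'
print(raw_little)    # b'\x00\x04'
print(not_a_digit)   # b':'
```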