How to download a large file using FastAPI?

ghz 1 year ago ⋅ 2280 views

Question

I am trying to download a large file (.tar.gz) from a FastAPI backend. On the server side, I simply validate the file path and then use Starlette's FileResponse to return the whole file, just like I've seen in many related questions on StackOverflow.

Server side:

return FileResponse(path=file_name, media_type='application/octet-stream', filename=file_name)

After that, I get the following error:

  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 149, in serialize_response
    return jsonable_encoder(response_content)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/encoders.py", line 130, in jsonable_encoder
    return ENCODERS_BY_TYPE[type(obj)](obj)
  File "pydantic/json.py", line 52, in pydantic.json.lambda
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

I also tried using StreamingResponse, but got the same error. Any other ways to do it?

The StreamingResponse in my code:

@x.post("/download")
async def download(file_name=Body(), token: str | None = Header(default=None)):
    file_name = file_name["file_name"]
    # should be something like xx.tar
    def iterfile():
        with open(file_name,"rb") as f:
            yield from f
    return StreamingResponse(iterfile(),media_type='application/octet-stream')

Ok, here is an update to this problem. I found that the error did not occur in this API endpoint, but in the endpoint that forwards the request to it:

@("/")
def f():
    req = requests.post(url ="/download")
    return req.content

Here, when /download returned a StreamingResponse with the .tar file, returning its raw bytes (req.content) from this forwarding endpoint led to what looked like an encoding problem.
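The traceback is consistent with this. Returning bytes from a path operation (instead of a Response object) makes FastAPI serialize them with jsonable_encoder, and the pydantic encoder at the bottom of the traceback turns bytes into str by decoding them as UTF-8. A gzip-compressed file such as a .tar.gz always starts with the bytes 0x1f 0x8b, and 0x8b cannot start a UTF-8 character, which matches "byte 0x8b in position 1". The sketch below reproduces only that decode step in plain Python; it assumes the encoder behaves as the traceback suggests and does not run FastAPI itself.

```python
import gzip

# A gzip stream (e.g. a .tar.gz file) always begins with the magic bytes 1f 8b
payload = gzip.compress(b"some archive contents")
print(payload[:2])  # b'\x1f\x8b'

try:
    # Roughly what the JSON encoding step does with a bytes return value
    payload.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)  # 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
```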

When forwarding the content fetched with requests, remember to return it with the same media type (here, media_type='application/octet-stream') instead of returning the raw bytes. And it works!


Answer

If you find yield from f to be rather slow when [using StreamingResponse with file-like objects](https://fastapi.tiangolo.com/advanced/custom-response/#using-streamingresponse-with-file-like-objects), for instance:

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

some_file_path = 'large-video-file.mp4'
app = FastAPI()

@app.get('/')
def main():
    def iterfile():
        with open(some_file_path, mode='rb') as f:
            yield from f

    return StreamingResponse(iterfile(), media_type='video/mp4')

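you could instead create a generator that reads the file in chunks of a specified size, hence speeding up the process. Iterating over a file opened in binary mode yields it line by line (split at each newline byte), so the size of each piece depends on the file's content rather than on a size you choose. Examples can be found below.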
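To make the line-splitting effect concrete: iterating over a binary file object yields pieces that end at each b"\n" byte, whatever their length. A file with no newline bytes comes out as one huge piece (the whole file read into memory at once), while a file full of newline bytes comes out as a flood of 1-byte pieces. A small self-contained check using two throwaway temporary files (iter_lines is a hypothetical name that mirrors the iterfile() above):

```python
import os
import tempfile


def iter_lines(path):
    # Same as iterfile() above: yields "lines" ending at each b"\n" byte
    with open(path, "rb") as f:
        yield from f


with tempfile.TemporaryDirectory() as tmp:
    no_newlines = os.path.join(tmp, "a.bin")
    many_newlines = os.path.join(tmp, "b.bin")
    with open(no_newlines, "wb") as f:
        f.write(b"x" * 100_000)   # no b"\n" at all
    with open(many_newlines, "wb") as f:
        f.write(b"\n" * 100_000)  # a b"\n" at every byte

    sizes_a = [len(piece) for piece in iter_lines(no_newlines)]
    sizes_b = [len(piece) for piece in iter_lines(many_newlines)]

print(len(sizes_a), sizes_a[0])  # 1 100000
print(len(sizes_b), sizes_b[0])  # 100000 1
```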

Note that StreamingResponse can take either an async generator or a normal generator/iterator to stream the response body. If you use the standard open() method, which doesn't support async/await, you have to declare the generator function with a normal def. Regardless, FastAPI/Starlette will still work asynchronously: it checks whether the generator you passed is asynchronous (as shown in the source code), and if it is not, it runs the generator in a separate thread using iterate_in_threadpool, which is then awaited.

You can set the [Content-Disposition](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition) header in the response (as described in this answer, as well as here and here) to indicate whether the content is expected to be displayed inline in the browser (if you are streaming, for example, a .mp4 video, .mp3 audio file, etc.), or as an attachment that is downloaded and saved locally (using the specified filename).
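A minimal sketch of building that header value, using a hypothetical content_disposition() helper. Plain ASCII names go into the quoted filename parameter; anything else uses the RFC 5987/6266 filename* form (UTF-8, percent-encoded). As far as I know, Starlette's FileResponse does something similar when you pass filename=, so this is mainly useful with StreamingResponse:

```python
from urllib.parse import quote


def content_disposition(filename: str, inline: bool = False) -> str:
    """Build a Content-Disposition value (hypothetical helper)."""
    disposition = "inline" if inline else "attachment"
    if filename.isascii() and '"' not in filename and "\\" not in filename:
        # Safe to place inside the quoted filename parameter
        return f'{disposition}; filename="{filename}"'
    # Non-ASCII or awkward characters: percent-encoded UTF-8 in filename*
    return f"{disposition}; filename*=utf-8''{quote(filename)}"


headers = {"Content-Disposition": content_disposition("large_file.tar")}
print(headers)  # {'Content-Disposition': 'attachment; filename="large_file.tar"'}
```

The resulting headers dict can be passed to StreamingResponse(..., headers=headers) exactly as in the options shown in this answer; inline=True is the choice for media you want the browser to play rather than save.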

As for the media_type (also known as MIME type), there are two primary MIME types (see [Common MIME types](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types/Common_types)):

  • text/plain is the default value for textual files. A textual file should be human-readable and must not contain binary data.
  • application/octet-stream is the default value for all other cases. An unknown file type should use this type.

For a file with a .tar extension, as shown in your question, you can use the more specific subtype x-tar instead of octet-stream, i.e. application/x-tar. If the file is of unknown type, stick to application/octet-stream. See the linked documentation above for a list of common MIME types.
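If you don't want to hard-code the media type, Python's standard mimetypes module can guess it from the extension. A minimal sketch with a hypothetical guess_media_type() helper; it deliberately treats compressed files such as .tar.gz as opaque binary (mimetypes reports those as application/x-tar with a gzip encoding), and the exact mapping can vary slightly by platform because mimetypes also reads system mime.types files:

```python
import mimetypes


def guess_media_type(path: str) -> str:
    """Pick a media type from the file extension (hypothetical helper)."""
    media_type, encoding = mimetypes.guess_type(path)
    if media_type is None or encoding is not None:
        # Unknown type, or a compressed file such as .tar.gz: the bytes
        # sent are opaque binary, so fall back to octet-stream.
        return "application/octet-stream"
    return media_type


print(guess_media_type("large_file.tar"))  # application/x-tar
print(guess_media_type("backup.tar.gz"))   # application/octet-stream
```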

Option 1 - Using normal generator

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

CHUNK_SIZE = 1024 * 1024  # = 1MB - adjust the chunk size as desired
some_file_path = 'large_file.tar'
app = FastAPI()

@app.get('/')
def main():
    def iterfile():
        with open(some_file_path, 'rb') as f:
            while chunk := f.read(CHUNK_SIZE):
                yield chunk

    headers = {'Content-Disposition': 'attachment; filename="large_file.tar"'}
    return StreamingResponse(iterfile(), headers=headers, media_type='application/x-tar')
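To check the chunking logic on its own, the generator can be pulled out into a standalone function that takes the path as a parameter (iter_file_chunks is a hypothetical name). This sketch writes a throwaway file of two full chunks plus a remainder and confirms that every piece except the last is exactly CHUNK_SIZE bytes:

```python
import os
import tempfile

CHUNK_SIZE = 1024 * 1024  # 1 MiB, as in Option 1


def iter_file_chunks(path, chunk_size=CHUNK_SIZE):
    # Same loop as iterfile() in Option 1, with the path as a parameter
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "large_file.tar")
    with open(path, "wb") as f:
        f.write(os.urandom(2 * CHUNK_SIZE + 123))  # 2 full chunks + remainder
    sizes = [len(c) for c in iter_file_chunks(path)]

print(sizes)  # [1048576, 1048576, 123]
```

In the endpoint, this would be used as StreamingResponse(iter_file_chunks(some_file_path), headers=headers, media_type='application/x-tar').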

Option 2 - Using async generator with aiofiles

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
import aiofiles

CHUNK_SIZE = 1024 * 1024  # = 1MB - adjust the chunk size as desired
some_file_path = 'large_file.tar'
app = FastAPI()

@app.get('/')
async def main():
    async def iterfile():
        async with aiofiles.open(some_file_path, 'rb') as f:
            while chunk := await f.read(CHUNK_SIZE):
                yield chunk

    headers = {'Content-Disposition': 'attachment; filename="large_file.tar"'}
    return StreamingResponse(iterfile(), headers=headers, media_type='application/x-tar')
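If you would rather not add aiofiles as a dependency, a similar async generator can be written with anyio, which recent versions of Starlette already depend on. This swaps anyio.open_file in for aiofiles.open; the read loop is the same. A sketch under that assumption (aiter_file_chunks and collect_sizes are hypothetical names), tested here with a tiny file and a 4-byte chunk size:

```python
import os
import tempfile

import anyio

CHUNK_SIZE = 1024 * 1024  # 1 MiB


async def aiter_file_chunks(path, chunk_size=CHUNK_SIZE):
    # anyio.open_file() returns an AsyncFile whose methods run the
    # blocking file I/O in a worker thread
    async with await anyio.open_file(path, "rb") as f:
        while chunk := await f.read(chunk_size):
            yield chunk


async def collect_sizes(path, chunk_size):
    return [len(c) async for c in aiter_file_chunks(path, chunk_size)]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "large_file.tar")
    with open(path, "wb") as f:
        f.write(b"\0" * 10)
    sizes = anyio.run(collect_sizes, path, 4)

print(sizes)  # [4, 4, 2]
```

In the endpoint, it plugs in the same way: StreamingResponse(aiter_file_chunks(some_file_path), headers=headers, media_type='application/x-tar').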