Python Packaging multiple subpackages with different data directories

My directory structure looks like this, with the foobar and alphabet data directories alongside the code in something.py:

mylibrary/
    packages/
        foobar/
            foo.zip
            bar.zip
        alphabet/
            abc.zip
            xyz.zip
        something.py
    setup.py

The goal is for users to be able to install just one data set, like this:

pip install mylibrary[alphabet]

That should install only the Python code plus the data in packages/alphabet/*. pip install mylibrary[foobar] should behave the same way for packages/foobar/*.

If the user installs without specifying an extra:

pip install mylibrary

then all the data directories under packages/ should be included.

So far, I've written the following setup.py (Python 3.5):

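import glob
import os
from setuptools import setup


setup(
  name='mylibrary',
  packages=['packages'],
  # package_data paths must be relative to the package directory (packages/)
  package_data={'packages': [os.path.relpath(p, 'packages')
                             for p in glob.glob('packages/**/*.zip', recursive=True)]},
)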

That creates a distribution containing all the data directories, which users get with pip install mylibrary.

How should I change setup.py so that selective installs such as pip install mylibrary[alphabet] are possible?

Answer

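To let users install specific data sets with pip install mylibrary[alphabet] or pip install mylibrary[foobar], you can use extras, which are declared with extras_require in setup.py. There is an important limitation, however: an extra can only add other distributions (requirement specifiers such as 'somepackage>=1.0') to the install; it cannot select files. Everything listed in package_data goes into the single mylibrary wheel and is installed regardless of which extras the user requests. To make a data set optional, it therefore has to be published as its own distribution, which the matching extra then depends on.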

The approach has two parts:

1. Use extras_require to map each optional data set to its own distribution:

The extras_require argument maps an extra name, such as alphabet or foobar, to a list of requirement specifiers: the additional distributions pip installs when that extra is requested. It does not accept file paths.

2. Use package_data (plus MANIFEST.in for source distributions, if needed) to include each data set's files in its own distribution.
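To picture why extras must point at separate distributions, here is a toy model of how a requested extra combines with the base requirements. It is not pip's actual resolver, only a sketch that mirrors an extras_require dictionary, and the distribution names mylibrary-alphabet and mylibrary-foobar are hypothetical. The property it shows is that extras can only add requirements:

```python
# Toy model of extras resolution (not pip's real resolver).
# 'mylibrary-alphabet' and 'mylibrary-foobar' are hypothetical distribution names.
INSTALL_REQUIRES = []  # base requirements of mylibrary: no data sets
EXTRAS_REQUIRE = {
    'alphabet': ['mylibrary-alphabet'],
    'foobar': ['mylibrary-foobar'],
    'all': ['mylibrary-alphabet', 'mylibrary-foobar'],
}


def requirements_for(extras):
    """Return the sorted list of requirements for mylibrary[<extras>]."""
    wanted = set(INSTALL_REQUIRES)
    for extra in extras:
        # Each requested extra can only add requirements, never remove any.
        wanted.update(EXTRAS_REQUIRE[extra])
    return sorted(wanted)


print(requirements_for([]))                      # pip install mylibrary -> []
print(requirements_for(['alphabet']))            # -> ['mylibrary-alphabet']
print(requirements_for(['alphabet', 'foobar']))  # -> ['mylibrary-alphabet', 'mylibrary-foobar']
```

Because the set of requirements only ever grows, no extra can produce a smaller install than plain pip install mylibrary. (Real pip warns rather than fails when an unknown extra is requested; this sketch simply raises KeyError.)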

Here is a sketch of the modified setup.py for the main mylibrary distribution. The names of the data distributions, mylibrary-alphabet and mylibrary-foobar, are hypothetical placeholders.

Modified setup.py:

from setuptools import setup

# Main distribution: ships only the code (packages/something.py).
# extras_require values are requirement specifiers (distribution names),
# not file paths, so each data set lives in its own distribution.
# 'mylibrary-alphabet' and 'mylibrary-foobar' are hypothetical names.
setup(
    name='mylibrary',
    packages=['packages'],
    extras_require={
        'alphabet': ['mylibrary-alphabet'],  # pip install mylibrary[alphabet]
        'foobar': ['mylibrary-foobar'],      # pip install mylibrary[foobar]
        'all': ['mylibrary-alphabet', 'mylibrary-foobar'],  # both data sets
    },
)

Explanation of the Changes:

  1. packages=['packages'] instead of find_packages(where='packages'):

    • The main distribution ships only the Python code in packages/ (something.py). find_packages(where='packages') searches inside packages/ for subdirectories that contain an __init__.py, so it would not return the packages package itself.
  2. No package_data in the main distribution:

    • The .zip files belong to the data distributions instead, so they are not part of the mylibrary wheel. Note also that package_data patterns are relative to the package's own directory, so glob results that start with packages/ would not have matched anything.
  3. extras_require:

    • Each extra lists the distributions to install in addition to mylibrary: pip install mylibrary[alphabet] also installs mylibrary-alphabet, and pip install mylibrary[foobar] also installs mylibrary-foobar.
    • pip install mylibrary[all] installs both data distributions. Extras can also be combined, e.g. pip install mylibrary[alphabet,foobar].
    • Extras are purely additive: pip install mylibrary without extras installs only the code. pip cannot make the plain install include everything while an extra installs less. If all data should be installed by default, list both data distributions in install_requires instead, at the cost of no longer being able to leave either one out.
  4. include_package_data removed:

    • include_package_data=True only adds data files that are matched by MANIFEST.in (or tracked by a version-control plugin such as setuptools-scm); files named in package_data are included without it.
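Each extra then needs a real distribution to point at. Here is a minimal sketch of setup.py for the hypothetical mylibrary-alphabet project, assuming its project directory contains packages/alphabet/*.zip and no Python code; mylibrary-foobar would differ only in the names. Because it reuses the packages/ directory name, a regular (non-editable) pip install places the files inside the same packages/ directory as something.py:

```python
from setuptools import setup

# setup.py of the hypothetical 'mylibrary-alphabet' distribution.
# It contains only the data directory packages/alphabet/ (no Python code).
setup(
    name='mylibrary-alphabet',
    version='0.1.0',  # placeholder version
    packages=['packages.alphabet'],
    # package_data patterns are relative to the package directory packages/alphabet/
    package_data={'packages.alphabet': ['*.zip']},
)
```

Both data distributions have to be uploaded to the same package index as mylibrary so that pip can find them when an extra is requested.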

3. Using MANIFEST.in (if needed):

package_data controls what goes into the wheel. Depending on the setuptools version and configuration, the source distribution (sdist) may not contain the same files, so a MANIFEST.in makes sure the data files are included there as well.

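Here is an example MANIFEST.in covering both data directories; in the split setup, each data distribution would list only its own directory: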

include packages/foobar/*
include packages/alphabet/*

This will include all the files from the foobar and alphabet directories in the source distribution.
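To confirm that the data files actually made it into the source distribution, you can build it (for example with python setup.py sdist, which typically writes a .tar.gz into dist/) and list the archive. A small standard-library sketch; the function name zip_files_in_sdist is only illustrative:

```python
import tarfile


def zip_files_in_sdist(sdist_path):
    """Return the sorted names of all .zip files inside a .tar.gz sdist."""
    with tarfile.open(sdist_path, 'r:gz') as archive:
        return sorted(
            member.name
            for member in archive.getmembers()
            # Skip directory entries; keep only regular files ending in .zip.
            if member.isfile() and member.name.endswith('.zip')
        )
```

For example, calling it on the alphabet sdist should list packages/alphabet/abc.zip and packages/alphabet/xyz.zip under the archive's top-level folder. The exact sdist file name depends on the project name, version and setuptools version.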

4. Installing the package:

Now, when users install your package:

  • pip install mylibrary installs only the code, without any data set (extras can only add to the base install).
  • pip install mylibrary[alphabet] installs the code plus the alphabet data.
  • pip install mylibrary[foobar] installs the code plus the foobar data.
  • pip install mylibrary[all] installs the code and both data sets.
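Since the data sets a user has depend on the extras they chose, the code in something.py should not assume every data directory exists. A minimal sketch, assuming the data is installed as alphabet/ and foobar/ inside the same packages/ directory as something.py; the helper name available_data_sets and the DATA_SETS tuple are hypothetical:

```python
import os

# Hypothetical helper for packages/something.py. Which data sets exist depends
# on the extras the user installed, so check before using them.
DATA_SETS = ('alphabet', 'foobar')


def available_data_sets(base_dir=None):
    """Return the names of the data sets whose directory holds at least one .zip."""
    if base_dir is None:
        # Data directories are installed next to this module, in packages/.
        base_dir = os.path.dirname(os.path.abspath(__file__))
    found = []
    for name in DATA_SETS:
        path = os.path.join(base_dir, name)
        if os.path.isdir(path) and any(f.endswith('.zip') for f in os.listdir(path)):
            found.append(name)
    return found
```

something.py could, for instance, raise a clear error suggesting pip install mylibrary[foobar] when 'foobar' is missing from the returned list.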

5. How the package is structured:

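With the data split out, the source tree might look like this (the mylibrary-alphabet and mylibrary-foobar project directories are only examples):

mylibrary/
    setup.py
    README.md
    packages/
        something.py
mylibrary-alphabet/
    setup.py
    MANIFEST.in
    packages/
        alphabet/
            abc.zip
            xyz.zip
mylibrary-foobar/
    setup.py
    MANIFEST.in
    packages/
        foobar/
            foo.zip
            bar.zip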

With this setup, you can provide a flexible installation process where users can specify what data they need, without unnecessarily including everything in the default installation.