Python Packaging multiple subpackages with different data directories
I have a directory structure like this, with the foobar and alphabet data directories alongside the code in something.py:
\mylibrary
    \packages
        \foobar
            foo.zip
            bar.zip
        \alphabet
            abc.zip
            xyz.zip
        something.py
    setup.py
The goal is for users to be able to pip install the module like this:
pip install mylibrary[alphabet]
That should include only the data from packages/alphabet/* plus the Python code. The same behavior should be available for pip install mylibrary[foobar].
If the user installs without an extra:
pip install mylibrary
Then it should include all the data directories under packages/.
Currently, I've tried writing the setup.py with Python 3.5 like this:
import glob
from setuptools import setup, find_packages

setup(
    name='mylibrary',
    packages=['packages'],
    package_data={'packages': glob.glob('packages' + '/**/*.txt', recursive=True)},
)
That creates a distribution with all the data directories when users run pip install mylibrary.
How should I change setup.py so that specific installs like pip install mylibrary[alphabet] are possible?
Answer
To let users pick a data set with pip install mylibrary[alphabet] or pip install mylibrary[foobar], you can use extras (the extras_require argument) in setup.py. Keep in mind what an extra can do: it names additional distributions that pip should install alongside yours. It cannot select files inside a single distribution, so each optional data set has to be published as its own distribution, which the matching extra then pulls in.
You can modify your setup.py as follows:
1. Use extras_require to define optional data sets:
The extras_require argument defines named groups of additional dependencies, such as [alphabet] or [foobar]. Each group is a list of requirement strings, so a data set is referenced by the name of the distribution that carries it, not by file paths.
2. Use package_data and MANIFEST.in to include the data files in the distribution that carries them.
Modified setup.py:
from setuptools import setup

# NOTE: the two data distribution names below are hypothetical. Each would be
# a separate project that ships the zip files of one data set.
ALPHABET_DATA = 'mylibrary-alphabet-data'
FOOBAR_DATA = 'mylibrary-foobar-data'

setup(
    name='mylibrary',
    # The code lives in packages/ (something.py); no data is bundled here.
    packages=['packages'],
    extras_require={
        'alphabet': [ALPHABET_DATA],          # pip install mylibrary[alphabet]
        'foobar': [FOOBAR_DATA],              # pip install mylibrary[foobar]
        'all': [ALPHABET_DATA, FOOBAR_DATA],  # pip install mylibrary[all]
    },
)
Explanation of the Changes:
- packages=['packages']: This keeps the package list from your original setup.py. find_packages(where='packages') would search inside packages/, and it only reports directories that contain an __init__.py, so with the layout shown it would return nothing.
- package_data: This is not used in the main setup.py, because the main distribution ships only code. It belongs in the setup.py of each data distribution instead. Its keys are importable package names and its patterns are relative to that package's directory, so project-relative paths such as those returned by glob.glob('packages/alphabet/*') would not match.
- extras_require: This maps each extra name to a list of requirement strings, that is, names of other distributions. File paths are not valid here. mylibrary[alphabet] installs mylibrary plus the alphabet data distribution, mylibrary[foobar] installs mylibrary plus the foobar data distribution, and mylibrary[all] installs both. If no extras are specified (pip install mylibrary), only the code is installed. Extras are additive, so a plain install cannot mean "everything".
- include_package_data=True: This is left out of the main setup.py for the same reason as package_data. In a data distribution, it tells setuptools to also install data files inside a package that are matched by MANIFEST.in.
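The extras table can also be generated rather than typed by hand. The following is a minimal sketch, assuming each data set is published as a separate distribution named mylibrary-<name>-data. That naming scheme and the build_extras helper are hypothetical and not part of setuptools.

```python
def build_extras(dataset_names, prefix="mylibrary", suffix="data"):
    """Map each data set name to the requirement string of its data distribution.

    The '<prefix>-<name>-<suffix>' naming scheme is an assumption for illustration.
    """
    extras = {
        name: ["{}-{}-{}".format(prefix, name, suffix)]
        for name in dataset_names
    }
    # The 'all' extra lists every data distribution once, in the order given.
    extras["all"] = [req for name in dataset_names for req in extras[name]]
    return extras


extras = build_extras(["alphabet", "foobar"])
print(extras)
```

The resulting dictionary could be passed as extras_require=extras in setup(). Adding a new data set then only means adding its name to the list.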
3. Using MANIFEST.in (if needed):
In some cases, you might also need a MANIFEST.in file to make sure the data files end up in your source distribution (sdist). MANIFEST.in lists extra files to add to the sdist, and it sits next to the setup.py of whichever distribution carries the data.
Here is an example MANIFEST.in:
include packages/foobar/*
include packages/alphabet/*
This adds all the files from the foobar and alphabet directories to the source distribution.
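Whichever mechanism lists the files, it helps to check what a pattern actually matches before building. The sketch below recreates the layout from the question in a temporary directory and collects data files as paths relative to the package directory, which is the form package_data patterns are resolved against. The package_relative_files helper is hypothetical. Note that the '/**/*.txt' pattern from the question matches none of the .zip files.

```python
import tempfile
from pathlib import Path


def package_relative_files(project_root, package_dir, pattern="**/*.zip"):
    """Return matching files under package_dir as sorted POSIX paths relative to it."""
    base = Path(project_root) / package_dir
    return sorted(p.relative_to(base).as_posix() for p in base.glob(pattern))


# Recreate the layout from the question in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    for rel in ("foobar/foo.zip", "foobar/bar.zip",
                "alphabet/abc.zip", "alphabet/xyz.zip"):
        target = Path(root) / "packages" / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    (Path(root) / "packages" / "something.py").touch()

    zip_files = package_relative_files(root, "packages")
    txt_files = package_relative_files(root, "packages", pattern="**/*.txt")

print(zip_files)
print(txt_files)
```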
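4. Installing the package:
Now, when users install your package:
- pip install mylibrary installs only the base distribution. Extras are additive, so nothing optional is pulled in by default.
- pip install mylibrary[alphabet] also installs the requirements listed under the alphabet extra.
- pip install mylibrary[foobar] also installs the requirements listed under the foobar extra.
- pip install mylibrary[all] also installs the requirements of both extras, provided setup.py defines an all extra.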
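Because a user may end up with only some of the data installed, the library code may want to check which data sets are present before using them. This is a minimal sketch under stated assumptions: the installed_datasets helper is hypothetical, and data_root stands for wherever the data directories end up on disk, which depends on how the data is packaged.

```python
import tempfile
from pathlib import Path


def installed_datasets(data_root, known=("alphabet", "foobar")):
    """Return the known data set names whose directory holds at least one .zip file."""
    root = Path(data_root)
    return [
        name for name in known
        if (root / name).is_dir() and any((root / name).glob("*.zip"))
    ]


# Simulate an environment where only the alphabet data is present.
with tempfile.TemporaryDirectory() as data_root:
    alphabet_dir = Path(data_root) / "alphabet"
    alphabet_dir.mkdir()
    (alphabet_dir / "abc.zip").touch()

    found = installed_datasets(data_root)

print(found)
```

A check like this lets something.py raise a clear error, or skip a feature, when the requested data set was not installed.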
5. How the package is structured:
The final structure of your package might look like this:
mylibrary/
    setup.py
    MANIFEST.in
    packages/
        alphabet/
            abc.zip
            xyz.zip
        foobar/
            foo.zip
            bar.zip
        something.py
    README.md
With this setup, you can provide a flexible installation process where users can specify what data they need, without unnecessarily including everything in the default installation.