Easy Python Package Publishing with Poetry
I just published my first Python package on PyPI, called feature-grouper, a data science package for a simple form of dimensionality reduction.
The package itself is almost trivial, a couple of functions and a scikit-learn transformer class, but it’s something I anticipate reusing on future projects, so I wanted to be able to just import it rather than copy-pasting the code.
I thought I’d write a post to detail the end-to-end development process to serve as a reminder to myself the next time I want to write a package, and in case anyone else finds this useful. I used Poetry for package management, which made the experience nicer by unifying several different command line tools (pip, virtualenv, twine). While the overall process was pretty straightforward, there were a couple of parts that I felt were non-obvious and deserve explanation.
Installing the tools
Installing Python (on Ubuntu):
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.8 python3-pip python3.8-dev python3.8-venv make
python3.8 -m pip install --upgrade pip setuptools wheel
Installing Poetry:
curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python3
Initializing the project
I used the poetry new command to create the project structure.
poetry new feature-grouper
Poetry includes pytest by default (a good choice) and gives you a nice, simple package project directory like this:
cd feature-grouper
tree
.
├── feature_grouper
│   ├── feature_grouper.py
│   └── __init__.py
├── pyproject.toml
├── README.rst
└── tests
    ├── __init__.py
    └── test_feature_grouper.py
The pyproject.toml file replaces setup.py for Poetry projects, so I opened that file and updated the package description field. poetry install auto-created a virtual environment the first time and installed the base dependencies, generating a poetry.lock file that records the entire package dependency tree with compatible version numbers. poetry shell activates the virtual environment, and exit deactivates it.
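For reference, a freshly generated pyproject.toml looks roughly like the sketch below; the exact metadata, author, and version constraints here are illustrative, not copied from the real package.

```toml
[tool.poetry]
name = "feature-grouper"
version = "0.1.0"
description = "A simple form of dimensionality reduction"
authors = ["Author Name <author@example.com>"]

[tool.poetry.dependencies]
python = "^3.8"

[tool.poetry.dev-dependencies]
pytest = "^5.2"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
```

Poetry reads this one file for metadata, dependencies, and build configuration, which is what lets it replace setup.py, requirements files, and twine's setup.cfg in one place.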
Installing dependencies
The next step was to add and install the dependencies I knew I would need to develop the package.
poetry add scipy numpy scikit-learn
poetry add --dev black pylint wrapt Sphinx sphinx-rtd-theme
The package itself depends on scipy, numpy, and scikit-learn, while I use black and pylint for code formatting and linting, and Sphinx for documentation, so I installed those as dev dependencies. poetry add adds the dependencies to pyproject.toml, solves the dependency graph, updates poetry.lock, and installs the packages into the virtual environment, all in one step.
poetry export --without-hashes > requirements.txt generates a requirements.txt file, which can be used by other tools that need to install the dependencies.
Testing
Getting started with my test suite was a snap. I just had to open tests/test_feature_grouper.py, start writing functions with assert statements, and run them with pytest. That made it easy to get started with test-driven development.
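As a sketch of that workflow, here is what a minimal pytest-style test file might look like. Since feature-grouper's actual API isn't shown in this post, the example uses sklearn.decomposition.PCA as a stand-in, which shares the fit/transform/inverse_transform interface the package targets:

```python
import numpy as np
from sklearn.decomposition import PCA  # stand-in for feature_grouper.FeatureGrouper


def test_transform_shape():
    # 20 samples with 5 features, reduced to 2 components
    X = np.random.RandomState(0).rand(20, 5)
    model = PCA(n_components=2)
    reduced = model.fit_transform(X)
    assert reduced.shape == (20, 2)


def test_inverse_transform_restores_shape():
    # inverse_transform should map back to the original feature space
    X = np.random.RandomState(0).rand(20, 5)
    model = PCA(n_components=2)
    restored = model.inverse_transform(model.fit_transform(X))
    assert restored.shape == X.shape
```

Running pytest from the project root discovers and runs these functions automatically.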
Coding
The feature_grouper/feature_grouper.py file is only 131 lines including comments, so there isn't a whole lot going on. The key point is that the class FeatureGrouper extends BaseEstimator and TransformerMixin from the sklearn.base module and implements its own fit, transform, and inverse_transform methods, making it basically a drop-in replacement for an existing scikit-learn transformer class like sklearn.decomposition.PCA, so that it can be used in a sklearn.pipeline.Pipeline.
I also want to mention that I find it better to write the class and function docstrings as I go, rather than writing all the code first and going back to document it later. The latter always feels like way more work. I went with Sphinx-style docstrings but for my next package I will probably try Numpy-style docstrings as they are a little more readable in plaintext and seem more popular in the Python community.
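To illustrate the pattern, here is a generic skeleton of a custom scikit-learn transformer with Sphinx-style docstrings. This is a toy example (mean-centering), not the actual feature-grouper code:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class MeanCenterer(BaseEstimator, TransformerMixin):
    """Toy transformer that centers each feature at zero."""

    def fit(self, X, y=None):
        """Learn the per-feature means.

        :param X: Array of shape (n_samples, n_features).
        :param y: Ignored; present for pipeline compatibility.
        :returns: self, per the scikit-learn convention.
        """
        self.means_ = np.asarray(X).mean(axis=0)
        return self

    def transform(self, X):
        """Subtract the learned means from each feature."""
        return np.asarray(X) - self.means_

    def inverse_transform(self, X):
        """Add the learned means back, undoing transform."""
        return np.asarray(X) + self.means_
```

Because it extends TransformerMixin, fit_transform comes for free, and because fit returns self and parameters follow the scikit-learn conventions, the class can be used as a step in a sklearn.pipeline.Pipeline.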
Documentation
Sphinx is the go-to tool for generating code documentation for Python packages. It includes a sphinx-quickstart CLI to help you get started.
mkdir docs
cd docs
sphinx-quickstart
Then, so that Sphinx could find my code and autogenerate documentation pages from the docstrings, I had to add these lines to conf.py:
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys
sys.path.insert(0, os.path.abspath(".."))
I like the Read The Docs theme for documentation, and I wanted autodoc to generate the documentation from my class and function docstrings, so I added the sphinx.ext.autodoc and sphinx_rtd_theme extensions in conf.py:
# -- General configuration ---------------------------------------------------
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
"sphinx.ext.todo",
"sphinx.ext.viewcode",
"sphinx.ext.autodoc",
"sphinx_rtd_theme",
]
Then, to provide the content pages for the docs site, I added two files, docs/overview.rst and docs/reference.rst. In the overview file I put a description of the package and a code sample, and in the reference file I put the following:
``feature_grouper`` API reference
=================================
.. automodule:: feature_grouper.feature_grouper
   :members:
The automodule directive tells Sphinx to read all the class and function docstrings in your Python code file and generate API documentation for them.
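To tie the pages together, a docs/index.rst along these lines pulls both files into the site's table of contents (a sketch; my actual index file may differ):

```rst
feature-grouper documentation
=============================

.. toctree::
   :maxdepth: 2

   overview
   reference
```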
Sphinx includes a Makefile, so you can build the docs site by just typing make html and preview the HTML output in your browser. Each time you change a docstring, run make html again and it will rebuild the docs.
After pushing my package to GitHub, I published my documentation to Read The Docs because they offer free hosting that is pretty easy to configure. RTD failed to build my docs the first time because it was looking for a file called "content" by default, but I had named my file "index"; I had to make a trip to StackOverflow for that one. It turned out I just needed to add another setting, master_doc = "index", to docs/conf.py to get it to find the right file. At that point, the documentation was done.
Publishing to PyPI
Creating an account on PyPI is pretty straightforward: you just sign up through their web interface. I turned on Two-Factor Authentication and added an API token in Account settings.
One configuration step was needed to set up Poetry to publish my package, using the API token I obtained from the PyPI site:
poetry config pypi-token.pypi [api-token]
Once that was set, I just needed two more commands:
poetry build # to create the .tar.gz and .whl files in a dist/ folder
poetry publish # to upload the project to PyPI
and then my package was live on PyPI within seconds!