Batch processing SVGs with doit and vpype

doit (a.k.a. PyDoIt) is a fantastic Python-based tool to automate repetitive workflows. It works particularly well alongside vpype to address mundane plotting-related tasks. This article explains in detail how to automate an SVG optimisation and conversion workflow.

Most plotter workflows involve one or more repetitive steps which, when executed manually, are time-consuming, boring, and possibly error-prone. Here are some examples that come to mind:

Optimizing SVGs using vpype’s linemerge reloop linesort linesimplify commands.
Converting SVGs into a format your plotter understands (e.g. HPGL, or G-code using vpype-gcode).
Splitting multi-layer SVGs into individual layers (e.g. if this is a requirement of your plotter for multi-colour plots).
Making a PNG version of SVGs for archival purposes.
Running the axicli command to plot an SVG with an Axidraw.
Uploading optimised files to the computer/server/Raspberry Pi in control of your plotter.
Etc.

Not only may your workflow include one or more of these steps, but you may also need to apply it to a single SVG at a time, or to a bunch of them at once. Even better, you might want to apply your workflow only to SVGs which were updated or created since the last execution.

You can do exactly that with doit—let’s see how.

Installing doit

Although its documentation sadly doesn’t mention it, pipx is the best way to install doit (as it is for vpype):

$ pipx install doit

You can check that the installation was successful by running this command:

$ doit --version
0.36.0
lib @ /Users/<username>/.local/pipx/venvs/doit/lib/python3.10/site-packages/doit

Basics

As a starting point, let’s assume you have a bunch of SVGs which need optimising before plotting, stored in an originals subdirectory. Save the optimisation commands in a VPY file named optimize.vpy, with the following content:

linemerge reloop linesort linesimplify

Then, create a subdirectory named processed, which will contain the optimised SVGs:

$ mkdir processed

Here is how your file hierarchy should look:

.
├── optimize.vpy
├── originals/
│   ├── dots.svg
│   ├── halftone.svg
│   └── hline.svg
└── processed/

Our goal is to have doit automate the optimisation of the source SVGs in originals, and store the result in processed.

doit operates by loading a description of the task(s) it must execute, typically in a file named dodo.py¹. As the name suggests, the content of this file is Python code.

Create a dodo.py file with the following content:

import pathlib                                                            # (1)

DIR = pathlib.Path(__file__).parent                                       # (2)
SOURCES = list((DIR / "originals").glob("*.svg"))                         # (3)
VPY = DIR / "optimize.vpy"                                                # (4)

def task_optimize():                                                      # (5)
    """optimize SVGs"""                                                   # (6)
    for source in SOURCES:                                                # (7)
        optimized = DIR / "processed" / (source.stem + "_optimized.svg")  # (8)
        yield {                                                           # (9)
            "name": source.stem,                                          # (10)
            "actions": [
                f"vpype read '{source}' -I '{VPY}' write '{optimized}'"   # (11)
            ],
        }

Let’s examine this code line-by-line.

(1) The pathlib built-in module is great at file wrangling. Check this Real Python article for a gentle yet thorough introduction.
(2) Here we use it to find our project directory, which is the parent of the present file, whose path is stored in the __file__ variable by the Python interpreter.
(3) We list all the SVGs contained in the originals subdirectory and store them in the SOURCES variable. Note that glob() returns a generator, which must be converted to a list if SOURCES is to be iterated multiple times.
(4) We keep the path to the optimize.vpy file in the VPY variable.
(5) Python functions whose names start with task_ are interpreted by doit as tasks. Here we have just one. Let’s call it “optimize”, hence the task_optimize() function name.
(6) The function’s docstring is used by doit as the help string for the task, so it is useful to include one.
(7) Task functions must return one or more Python dictionaries describing the task. In our case, we want to create one sub-task per source SVG file.
(8) For each source SVG, we derive the path of the corresponding optimised SVG. The optimised SVGs are located in the processed subdirectory, and their names end with an _optimized.svg suffix.
(9) Using the yield keyword (instead of return) makes our function a generator (gentle introduction available here). This is a convenient way to return (er… yield) multiple objects, which is supported by doit. Here, we yield one dictionary per sub-task.
(10) Sub-tasks must be individually named so that they can be distinguished. Here we derive the sub-task name from the source SVG filename. For example, the sub-task corresponding to my_file.svg will be named my_file, and can be referred to with doit as optimize:my_file.
(11) Last but not least, the "actions" entry of the sub-task dictionary lists the actions to be performed by the task. doit interprets strings as shell commands, so we build a vpype pipeline that optimises the source SVG using our VPY file and saves the result in the desired location. For example, for my_file.svg, the action will be vpype read originals/my_file.svg -I optimize.vpy write processed/my_file_optimized.svg².
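The single quotes around each path in the action string protect paths containing spaces, but they break as soon as a path itself contains a single quote. A slightly more robust sketch, assuming a POSIX shell, builds the same command with the standard library’s shlex.quote(). The helper name vpype_command() is hypothetical, not part of doit or vpype:

```python
import pathlib
import shlex


def vpype_command(source: pathlib.Path, vpy: pathlib.Path, optimized: pathlib.Path) -> str:
    """Build the optimisation pipeline with every path quoted for a POSIX shell."""
    # shlex.quote() leaves "safe" paths untouched and single-quotes anything else,
    # escaping any single quote the path may contain
    return " ".join(
        [
            "vpype",
            "read", shlex.quote(str(source)),
            "-I", shlex.quote(str(vpy)),
            "write", shlex.quote(str(optimized)),
        ]
    )
```

Inside task_optimize(), the "actions" entry would then become [vpype_command(source, VPY, optimized)].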

Let’s take a step back to properly understand what’s going on.

The function task_optimize() produces a task description—it does not actually run the task. When we run doit (using the doit command), it loads the dodo.py file, notices that it contains a task function, and calls it to learn about that task. It’s only then that it can decide which action(s) to actually execute, based on the task description. In this case, the actions are the vpype pipelines stored in the "actions" entries.
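You can check this by calling the task function yourself: it merely yields dictionaries, and no vpype process is started. Below is a self-contained sketch of what doit receives; the SOURCES list is a made-up stand-in for the globbed files (they don’t even need to exist), and DIR is the current directory rather than the dodo.py location:

```python
import pathlib

DIR = pathlib.Path(".")
# Hypothetical stand-ins for the globbed SVGs; nothing is read or written here
SOURCES = [DIR / "originals" / name for name in ("dots.svg", "halftone.svg", "hline.svg")]
VPY = DIR / "optimize.vpy"


def task_optimize():
    """optimize SVGs"""
    for source in SOURCES:
        optimized = DIR / "processed" / (source.stem + "_optimized.svg")
        yield {
            "name": source.stem,
            "actions": [f"vpype read '{source}' -I '{VPY}' write '{optimized}'"],
        }


# Calling the function only builds the task descriptions doit would receive
for task in task_optimize():
    print(f"optimize:{task['name']} -> {task['actions'][0]}")
```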

Although this dodo.py file is not overly complicated, it can still feel like quite a lot of work compared to, you know, just calling vpype manually. I certainly felt so when first using doit. So let’s see what we gained by going through this effort.

First and foremost, we now have a potent batch processing system. We can optimise all of our source SVGs by telling doit to execute the optimize task:

$ doit optimize
.  optimize:dots
.  optimize:halftone
.  optimize:hline

Here is the result after running this command:

.
├── dodo.py
├── optimize.vpy
├── originals/
│   ├── dots.svg
│   ├── halftone.svg
│   └── hline.svg
└── processed/
    ├── dots_optimized.svg
    ├── halftone_optimized.svg
    └── hline_optimized.svg

doit indeed created properly-named, optimised versions of the source SVGs in the processed directory! 🎉

Since we have only one task defined, we don’t even need to specify its name:

$ doit
.  optimize:dots
.  optimize:halftone
.  optimize:hline

You can also execute a specific sub-task:

$ doit optimize:halftone
.  optimize:halftone

Pretty neat already—but there is a lot more to gain with a little more effort!

Handling targets and dependencies

Playing with the commands above, you may notice that each call of the optimize task triggers the processing of the corresponding SVGs, even if said SVGs were already processed before. The reason is that doit doesn’t yet know what the task’s inputs and outputs are, so it cannot check whether the outputs exist or are outdated. So, to be on the safe side, it executes all specified tasks every time.

Once doit knows about the tasks’ inputs and outputs, it can be much smarter about what it actually needs to do.

In doit parlance, the file(s) a task uses as input are called dependencies ("file_dep" entry). Likewise, the file(s) created as output are called targets ("targets" entry). By specifying what these are in the dodo.py file, doit can decide whether the target of a given task needs to be generated or not, saving a lot of time when repeating the workflow.
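To build some intuition, here is a deliberately simplified model of that decision. It is not doit’s actual implementation (doit keeps its state in a local database, see footnote 3), and the names needs_run() and record_run() are hypothetical. In this model, a task must run if any target is missing, if it has never run, or if the content hash of any dependency changed since the last successful run:

```python
import hashlib
import pathlib


def file_digest(path) -> str:
    """MD5 of a file's content (content-based, so a bare `touch` changes nothing)."""
    return hashlib.md5(pathlib.Path(path).read_bytes()).hexdigest()


def needs_run(task: dict, state: dict) -> bool:
    """Toy up-to-date check; `state` maps task names to {dependency: digest}."""
    # A missing target always forces a run
    if any(not pathlib.Path(t).exists() for t in task["targets"]):
        return True
    saved = state.get(task["name"])
    if saved is None:  # never run successfully before
        return True
    # Re-run if any dependency's content differs from the recorded digest
    return any(saved.get(str(dep)) != file_digest(dep) for dep in task["file_dep"])


def record_run(task: dict, state: dict) -> None:
    """Remember dependency digests after the task's actions succeeded."""
    state[task["name"]] = {str(dep): file_digest(dep) for dep in task["file_dep"]}
```

With both the source SVG and optimize.vpy listed as dependencies, editing either file flips needs_run() back to True in this model.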

Update the dodo.py file as follows:

import pathlib

DIR = pathlib.Path(__file__).parent
SOURCES = list((DIR / "originals").glob("*.svg"))
VPY = DIR / "optimize.vpy"

def task_optimize():
    """optimize SVGs"""
    for source in SOURCES:
        optimized = DIR / "processed" / (source.stem + "_optimized.svg")
        yield {
            "name": source.stem,
            "actions": [
                f"vpype read '{source}' -I '{VPY}' write '{optimized}'"
            ],
            "targets": [optimized],         # (1)
            "file_dep": [source, VPY],      # (2)
        }

(1) The "targets" entry is a list of all the files generated by the sub-task. In our case, there is only one, whose path is stored in the optimized variable.
(2) The "file_dep" entry is a list of all the files the sub-task depends on. In our case, both the source SVG and the VPY file are involved in creating an optimised SVG, so we list them both.

It would be easy to forget the VPY file in the "file_dep" entry. That would be a mistake. All the optimised SVGs should be regenerated when the VPY file is modified. For doit to realise this, we must list the VPY file as a dependency.

With the modification above, doit now knows when to run optimisation sub-tasks and when they can be skipped.

Let’s experiment with a clean slate by deleting all the processed files:

$ rm processed/*.svg

doit must now execute all sub-tasks:

$ doit
.  optimize:dots
.  optimize:halftone
.  optimize:hline

Notice the dot (.) prefixing each line and how the execution is relatively slow.

Now, this is what happens if we run doit again:

$ doit
-- optimize:dots
-- optimize:halftone
-- optimize:hline

Execution is now much faster, and each line is now prefixed with --, indicating that doit skipped the corresponding sub-task.

Let’s see what happens if one of the source files is modified.

$ echo " " >> originals/halftone.svg
$ doit
-- optimize:dots
.  optimize:halftone
-- optimize:hline

We first append a single space to halftone.svg (which is harmless on a valid SVG) to simulate a change³. As expected, doit rebuilds the optimised version of halftone.svg without running the other sub-tasks! 🎉

We now have a setup able to automatically process large batches of files and be smart about if/when any sub-task must be repeated. You have a thousand SVGs to process? It’s coffee time while the CPUs churn through them⁴. You add just one to the list? Instant results, thanks to doit!

Cleaning up

The files created by the optimize task can be considered “temporary”. When missing, they are automatically recreated by doit, and they are overwritten by a new version when the input file (or the VPY file) changes. In that sense, they matter much less than the source SVGs and the dodo.py file, which collectively form the “recipe” to build the optimised SVGs⁵.

The ability to delete these files may occasionally be useful, for example to force a complete rebuild of the optimised files, to make an archive with only the true source files, or simply to free some disk space.

doit provides this feature with a single modification to the dodo.py file:

import pathlib

DIR = pathlib.Path(__file__).parent
SOURCES = list((DIR / "originals").glob("*.svg"))
VPY = DIR / "optimize.vpy"

def task_optimize():
    """optimize SVGs"""
    for source in SOURCES:
        optimized = DIR / "processed" / (source.stem + "_optimized.svg")
        yield {
            "name": source.stem,
            "actions": [
                f"vpype read '{source}' -I '{VPY}' write '{optimized}'"
            ],
            "targets": [optimized],
            "file_dep": [source, VPY],
            "clean": True,                  # (1)
        }

(1) This tells doit that the target files should be deleted when running doit clean.

Let’s see this in action:

$ doit clean
optimize:hline - removing file '.../processed/hline_optimized.svg'
optimize:halftone - removing file '.../processed/halftone_optimized.svg'
optimize:dots - removing file '.../processed/dots_optimized.svg'

Works as expected! 🎉

Multiple tasks

Although doit already shines dealing with a single task, it reveals its true power when multiple tasks are involved—even more so when they depend on each other.

For illustration purposes, let’s imagine that we need to convert the optimised SVGs to HPGL, so that we may plot them on a shiny ’83 HP 7475A. We’ll add a second task for this⁶.

First, create a new hpgl subdirectory to store the HPGL files:

$ mkdir hpgl

Since we cleaned the optimised SVGs in the previous step, this is how your project directory should look:

.
├── dodo.py
├── hpgl/
├── optimize.vpy
├── originals/
│   ├── dots.svg
│   ├── halftone.svg
│   └── hline.svg
└── processed/

Now, update the dodo.py file with the following content:

import pathlib

DIR = pathlib.Path(__file__).parent
SOURCES = list((DIR / "originals").glob("*.svg"))
VPY = DIR / "optimize.vpy"

def optimized_path(source: pathlib.Path):                              # (1)
    """derive optimized path from source path"""
    return DIR / "processed" / (source.stem + "_optimized.svg")

def hpgl_path(source: pathlib.Path):                                   # (2)
    """derive HPGL path from source path"""
    return DIR / "hpgl" / (source.stem + ".hpgl")

def task_optimize():
    """optimize SVGs"""
    for source in SOURCES:
        optimized = optimized_path(source)                             # (3)
        yield {
            "name": source.stem,
            "actions": [
                f"vpype read '{source}' -I '{VPY}' write '{optimized}'"
            ],
            "file_dep": [source, VPY],
            "targets": [optimized],
            "clean": True,
        }

def task_hpgl():
    """convert to HPGL"""
    for source in SOURCES:                                             # (4)
        optimized = optimized_path(source)                             # (5)
        hpgl = hpgl_path(source)
        yield {
            "name": source.stem,
            "actions": [
                f"vpype read '{optimized}' write -d hp7475a -p a4 -q -c '{hpgl}'"
            ],
            "file_dep": [optimized],                                   # (6)
            "targets": [hpgl],                                         # (7)
            "clean": True,
        }

Let’s examine the changes one-by-one.

(1) To clean things up and avoid code duplication, we moved the code that derives the path of an optimised SVG from a source SVG into the optimized_path() function.
(2) We do the same to derive the path of an HPGL output from a source SVG in the hpgl_path() function. Note that neither of these function names starts with task_, so doit doesn’t interpret them as tasks.
(3) The only change to the optimize task is to use the optimized_path() helper function.
(4) This part is interesting. The purpose of the hpgl task is to convert optimised SVGs into HPGL files, yet we iterate over the source SVGs instead. The reason is that, for our purposes, SOURCES is our master “TODO list”. Everything the hpgl task must do is indirectly due to the presence of source SVGs.
(5) The source path is used only to derive the paths of the optimised SVG and the HPGL output. In particular, notice how source does not appear anywhere in the yielded dictionaries.
(6) The optimised SVG is now a dependency (whereas it is a target of the optimize task).
(7) Instead, the target is the HPGL file.

These two tasks collectively form a “pipeline”. The output (or target) of the first task corresponds to the input (or dependency) of the second. doit understands that thanks to the "file_dep" and "targets" entries being properly populated—and can now be smart about it!
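Conceptually, this amounts to matching each "file_dep" entry against the "targets" of the other tasks to decide what must run first. Here is a toy sketch of that inference, using the standard library’s graphlib (Python 3.9+) and hand-written task dictionaries. The execution_order() name is hypothetical, and doit’s own scheduler does more than this:

```python
import graphlib


def execution_order(tasks: dict) -> list:
    """Order task names so that each task comes after the producers of its file_dep."""
    # Map each target file to the task that produces it
    producer = {target: name for name, task in tasks.items() for target in task.get("targets", [])}
    sorter = graphlib.TopologicalSorter()
    for name, task in tasks.items():
        # A dependency produced by another task becomes a predecessor; plain inputs are ignored
        predecessors = [producer[dep] for dep in task.get("file_dep", []) if dep in producer]
        sorter.add(name, *predecessors)
    return list(sorter.static_order())


tasks = {
    "hpgl:dots": {"file_dep": ["processed/dots_optimized.svg"], "targets": ["hpgl/dots.hpgl"]},
    "optimize:dots": {
        "file_dep": ["originals/dots.svg", "optimize.vpy"],
        "targets": ["processed/dots_optimized.svg"],
    },
}
print(execution_order(tasks))
```

Even though hpgl:dots is listed first, the optimised SVG it depends on is a target of optimize:dots, so the latter is scheduled before it.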

Let’s take it for a spin by executing the hpgl task:

$ doit hpgl
.  optimize:dots
.  optimize:halftone
.  optimize:hline
.  hpgl:dots
.  hpgl:halftone
.  hpgl:hline

doit knows that it needs optimised SVGs to create the HPGL files, so it automatically executes the optimize task first.

Let’s remove a single HPGL file to test what happens. This can be done using the doit clean command:

$ doit clean hpgl:hline
hpgl:hline - removing file '.../hpgl/hline.hpgl'

This is what happens when we run the hpgl task again:

$ doit hpgl
-- optimize:dots
-- optimize:halftone
-- optimize:hline
-- hpgl:dots
-- hpgl:halftone
.  hpgl:hline

The optimised version of hline.svg is still present and up-to-date, so the corresponding task is skipped. Only the HPGL conversion is executed.

Now, let’s change one of the source files, like we did earlier:

$ echo " " >> originals/dots.svg  
$ doit hpgl
.  optimize:dots
-- optimize:halftone
-- optimize:hline
.  hpgl:dots
-- hpgl:halftone
-- hpgl:hline

doit correctly runs both the optimize and hpgl sub-tasks for the corresponding file! 🎉
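As footnote 6 points out, two tasks are not strictly required here, because vpype can optimise and export to HPGL in a single pipeline. Here is a sketch of that variant, reusing the write options of the hpgl task; task_hpgl_direct() and the made-up SOURCES list are hypothetical. It would replace both tasks rather than sit alongside them, since doit does not allow two tasks to share a target, and it no longer keeps the optimised SVGs on disk:

```python
import pathlib

DIR = pathlib.Path(".")
# Hypothetical stand-ins for the globbed sources
SOURCES = [DIR / "originals" / "dots.svg", DIR / "originals" / "hline.svg"]
VPY = DIR / "optimize.vpy"


def task_hpgl_direct():
    """optimize and convert to HPGL in one vpype pipeline"""
    for source in SOURCES:
        hpgl = DIR / "hpgl" / (source.stem + ".hpgl")
        yield {
            "name": source.stem,
            "actions": [
                # read, apply optimize.vpy, then write HPGL directly
                f"vpype read '{source}' -I '{VPY}' write -d hp7475a -p a4 -q -c '{hpgl}'"
            ],
            "file_dep": [source, VPY],  # both inputs still trigger a rebuild
            "targets": [hpgl],
            "clean": True,
        }
```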

Helper tasks

Tasks don’t have to be part of an intricate pipeline with carefully specified targets and dependencies. They can also be just nice little helpers that encapsulate a useful shell command.

Consider for example this task, which can readily be added to our dodo.py file:

def task_show():
    """display SVG"""
    for source in SOURCES:
        yield {
            "name": source.stem,
            "actions": [f"vpype read '{source}' show"],
        }

Its action consists of loading the source SVG and displaying it with vpype. This isn’t necessarily part of your workflow, but it is convenient to have handy:

$ doit show:dots

The corresponding SVG is displayed by the vpype viewer:

[Figure: the vpype viewer displaying an SVG made of many dots arranged in a circle]

This example is taken from vpype-perspective, where all the README’s figures are made from VPY files stored in the repository’s examples/figures subdirectory. The conversion of these VPYs into SVGs is handled by doit using this dodo.py file. It’s a nice example of what can be done with doit.
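Another helper along the same lines could print statistics about each optimised file. This sketch assumes vpype’s stat command and doit’s "task_dep" and "verbosity" task parameters: at the default verbosity, doit captures a command’s standard output, whereas 2 lets it through to the terminal. The task_stat() name and the made-up SOURCES list are hypothetical:

```python
import pathlib

DIR = pathlib.Path(".")
# Hypothetical stand-ins for the globbed sources
SOURCES = [DIR / "originals" / "dots.svg", DIR / "originals" / "hline.svg"]


def task_stat():
    """print statistics of optimized SVGs"""
    for source in SOURCES:
        optimized = DIR / "processed" / (source.stem + "_optimized.svg")
        yield {
            "name": source.stem,
            "actions": [f"vpype read '{optimized}' stat"],
            # run the matching optimize sub-task first, so the file is up to date
            "task_dep": [f"optimize:{source.stem}"],
            "verbosity": 2,  # show vpype's output instead of capturing it
        }
```

Because it lists no "file_dep" or "targets", doit runs this helper every time it is requested, e.g. with doit stat:dots.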

Final words

If you made it this far, I hope you are convinced of how useful doit is for workflow automation.

In this article, I focused on vpype, but doit can be used for entirely different things. As a matter of fact, I used it to automate my #plotloop machine, which I’ve described in my next article.

One of doit’s drawbacks is the fact that its dodo.py file is written in Python. Creating one requires at least some Python basics, or the willingness to acquire them. This might put off people uninterested in code.

But this is also its greatest strength. You wield the full power of Python when writing your dodo.py file, without any of the constraints of configuration languages such as YAML or TOML. This extends the possibilities much further than what was covered here, and makes learning doit a great investment! 🎯
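As a small taste of that flexibility, a plain Python loop can fan the HPGL conversion out over several page sizes. In this sketch, the PAGE_SIZES list, the hpgl/<size>/ output layout, the task_hpgl_sizes() name and the made-up SOURCES list are all illustrative assumptions; it also assumes a POSIX shell for mkdir -p and that your vpype configuration defines both page sizes for the hp7475a device. Note that the "actions" list simply holds two commands, run in order:

```python
import pathlib

DIR = pathlib.Path(".")
# Hypothetical stand-ins for the globbed sources
SOURCES = [DIR / "originals" / "dots.svg", DIR / "originals" / "hline.svg"]
PAGE_SIZES = ["a4", "a3"]  # assumed to exist in vpype's hp7475a configuration


def task_hpgl_sizes():
    """convert to HPGL for several page sizes"""
    for source in SOURCES:
        optimized = DIR / "processed" / (source.stem + "_optimized.svg")
        for size in PAGE_SIZES:
            hpgl = DIR / "hpgl" / size / (source.stem + ".hpgl")
            yield {
                "name": f"{source.stem}-{size}",  # e.g. hpgl_sizes:dots-a3
                "actions": [
                    f"mkdir -p '{hpgl.parent}'",  # create hpgl/<size>/ if missing
                    f"vpype read '{optimized}' write -d hp7475a -p {size} -q -c '{hpgl}'",
                ],
                "file_dep": [optimized],
                "targets": [hpgl],
                "clean": True,
            }
```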

Ready to take the plunge? I’m happy to help—just share details of your workflow in the comments 👇, on Twitter/Mastodon, or on the Drawingbots Discord.

Edit: TIL what “dolt” (lowercase L) means 😅, and changed DoIt (uppercase i) into doit, consistently with their documentation.

Edit: Added a link to the Automatic #plotloop Machine article and updated the video.

The file may also have a different name, or be located elsewhere, but then its path should be provided to doit. Using dodo.py is simpler because this file is automatically detected and loaded by doit. ↩︎
The code actually generates full paths. ↩︎
If you are used to make and similar systems, you might be tempted to touch originals/halftone.svg to trigger a rebuild instead of modifying the file’s content. This doesn’t work with doit as it uses a local database and file hashes instead of modification date to track dependencies. ↩︎
By the way, you can parallelise the processing of large batches using doit -n 8 optimize, where 8 is the number of CPU cores to use. ↩︎
This bears strong similarities with software build systems, where compiled object files are created from source code by the compiler. As a matter of fact, doit can serve as a build system. ↩︎
This example is slightly over-engineered. vpype can optimise and export to HPGL in one command, so technically a single doit task is needed. Even if multiple commands were required (vpype or otherwise), they could all be listed in a single doit task, since the "actions" entry is a list which can contain multiple items. It is still a relevant illustration of the many instances where multiple doit tasks are indeed useful. ↩︎
