Fix a race condition here:
    with self.learn.no_bar(), self.learn.no_logging():
        dl = self.learn.dls.test_dl(files, bs=bs)
        batch, _ = self.learn.get_preds(dl=dl)
The calls to `no_bar()`, and possibly `no_logging()`, were not
thread-safe because they modified the shared `self.learn` object. This
caused intermittent exceptions and deadlocks when gunicorn was under
load in production.
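One common way to protect a shared object like this is to serialize the state-mutating section with a lock. The sketch below only illustrates that idea and is not necessarily how this commit fixes the race; `SharedLearner`, `Predictor`, and the placeholder inference are hypothetical stand-ins, not fastai APIs.

```python
import threading
from contextlib import contextmanager


class SharedLearner:
    """Hypothetical stand-in for a shared object whose context manager mutates it."""

    def __init__(self):
        self.show_bar = True

    @contextmanager
    def no_bar(self):
        # Mutates shared state and restores it on exit. If two threads
        # overlap here, one can restore the wrong value.
        previous = self.show_bar
        self.show_bar = False
        try:
            yield
        finally:
            self.show_bar = previous


class Predictor:
    def __init__(self, learner):
        self.learner = learner
        self._lock = threading.Lock()  # one lock per shared learner

    def predict(self, item):
        # Only one thread at a time may enter the state-mutating section.
        with self._lock:
            with self.learner.no_bar():
                return item * 2  # placeholder for the real inference call
```

The lock trades throughput for safety. Setting the options once at startup, so that no request has to mutate the shared object, would avoid the contention entirely.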
Another attempt at fixing #2. Commit b216057 had a bug: a file couldn't
be passed through stdin because we tried to read it twice, which isn't
possible when the input is a pipe.
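A minimal sketch of the usual workaround, assuming the input fits in memory: read the stream once into a bytes buffer, then give each consumer its own `io.BytesIO` over that buffer. The function names and the optional `stdin` parameter are hypothetical and only make the example testable.

```python
import io
import sys


def read_input_bytes(path, stdin=None):
    """Return the full contents of `path`, or of stdin when `path` is '-'."""
    if path == "-":
        # A pipe can only be consumed once, so read everything up front.
        stream = stdin if stdin is not None else sys.stdin.buffer
        return stream.read()
    with open(path, "rb") as f:
        return f.read()


def two_passes(data):
    """Hypothetical consumer that needs two independent passes over the input."""
    header = io.BytesIO(data).read(4)  # e.g. sniff the file type
    body = io.BytesIO(data).read()     # e.g. decode the whole image
    return header, body
```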
* Only ignore invalid files in the `autotag` script, not in the
web service. In the web service we want to return an error if given an
invalid or corrupt file.
* Use the logger to log a warning instead of printing directly to stderr.
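The two behaviours above can be sketched as one loading loop with a flag, where the CLI skips bad files with a logged warning and the web service lets the error propagate. The logger name, `load_all`, and the `skip_invalid` parameter are assumptions for illustration, not names from the codebase.

```python
import logging

logger = logging.getLogger("autotag_sketch")  # hypothetical logger name


def load_all(paths, loader, skip_invalid):
    """Load each path with `loader`; skip or re-raise failures depending on the caller."""
    loaded = []
    for path in paths:
        try:
            loaded.append(loader(path))
        except Exception as exc:
            if not skip_invalid:
                raise  # web service: surface the error to the client
            # CLI script: warn through the logger and keep going
            logger.warning("Skipping invalid file %s: %s", path, exc)
    return loaded
```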
Add an option for taking the list of input files from a text file:
    ./autotag -i files.txt
This is useful in conjunction with the `find` command:
    find ~/images/ -name '*.jpg' | ./autotag -i -
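A minimal sketch of how such an option could read its list, assuming one path per line and that `-` means stdin. `read_file_list` and its `stdin` parameter are hypothetical names, not the script's actual code.

```python
import io
import sys


def read_file_list(list_path, stdin=None):
    """Return one path per non-empty line of `list_path`, or of stdin when it is '-'."""
    if list_path == "-":
        stream = stdin if stdin is not None else sys.stdin
        lines = stream.read().splitlines()
    else:
        with open(list_path, encoding="utf-8") as f:
            lines = f.read().splitlines()
    # Drop blank lines; keep every other line exactly as given.
    return [line for line in lines if line.strip()]


# Example: simulate `find ... | ./autotag -i -` with an in-memory stream.
example = read_file_list("-", stdin=io.StringIO("a.jpg\n\nb.jpg\n"))
```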
This is useful when performing batch inference on the Danbooru dataset.
There the filename is the MD5, so the output contains only the MD5
instead of the full path to the image file.
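As a rough illustration, and assuming the MD5 is the filename without its directory and extension, the reduction could look like this. `md5_from_path` and the example path are hypothetical.

```python
from pathlib import PurePosixPath


def md5_from_path(path):
    """Return the filename without its directory or extension."""
    return PurePosixPath(path).stem


# Hypothetical path layout, for illustration only.
print(md5_from_path("/images/ab/cd/0123abcd.jpg"))
```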
Fix `autotag` so you can pass a filename of '-' to read a file from
stdin. This way you can do this:
    docker run --rm -i ghcr.io/danbooru/autotagger autotag - < image.jpg
...to run prediction on a single file that lives outside the Docker container.
* Make it so you can give the `autotag` script a mixed list of files and
directories, and it will recursively process every file in each directory.
* Allow choosing the output format: one (filename, tag, score) tuple per
line, or one (filename, tags) tuple per line.
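The two changes above can be sketched as follows. This is only an illustration: `expand_paths`, `format_rows`, the `per_tag` flag, the tab separator, and the three-decimal score format are assumptions, not the script's actual names or output format.

```python
import os


def expand_paths(paths):
    """Yield every file in `paths`, descending recursively into directories."""
    for path in paths:
        if os.path.isdir(path):
            for root, _dirs, files in os.walk(path):
                for name in sorted(files):
                    yield os.path.join(root, name)
        else:
            yield path  # plain file: pass it through unchanged


def format_rows(filename, scores, per_tag):
    """Format predictions as one line per tag, or as a single line per file."""
    if per_tag:
        # One (filename, tag, score) tuple per line.
        return [f"{filename}\t{tag}\t{score:.3f}" for tag, score in scores.items()]
    # One (filename, tags) tuple per line.
    return [f"{filename}\t{' '.join(scores)}"]
```

The per-tag form is easier to load into a database or filter with line-oriented tools, while the per-file form is more compact to read.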