Бензина / Benzina¶
Table of Contents¶
Examples¶
ImageNet loading in PyTorch¶
As long as your dataset is converted into Benzina’s data format, you can load it to train a PyTorch model in a few lines of code. Here is an example demonstrating how this can be done with an ImageNet dataset. It is based on the ImageNet example from PyTorch.
import torch
import benzina.torch as bz
import benzina.torch.operations as ops

seed = 1234
torch.manual_seed(seed)

# Dataset
train_dataset = bz.dataset.ImageNet("path/to/dataset", split="train")
val_dataset = bz.dataset.ImageNet("path/to/dataset", split="val")

# Dataloaders
bias = ops.ConstantBiasTransform(bias=(0.485 * 255, 0.456 * 255, 0.406 * 255))
std = ops.ConstantNormTransform(norm=(0.229 * 255, 0.224 * 255, 0.225 * 255))

train_loader = bz.DataLoader(
    train_dataset,
    shape=(224, 224),
    batch_size=256,
    shuffle=True,
    seed=seed,
    bias_transform=bias,
    norm_transform=std,
    warp_transform=ops.SimilarityTransform(scale=(0.08, 1.0),
                                           ratio=(3./4., 4./3.),
                                           flip_h=0.5,
                                           random_crop=True))
val_loader = bz.DataLoader(
    val_dataset,
    shape=(224, 224),
    batch_size=256,
    shuffle=False,
    seed=seed,
    bias_transform=bias,
    norm_transform=std,
    warp_transform=ops.CenterResizedCrop(224/256))

for epoch in range(1, 10):
    # train for one epoch
    train(train_loader, ...)

    # evaluate on validation set
    accuracy = validate(val_loader, ...)
In the example above, two benzina.torch.dataset.ImageNet datasets are first created, specifying the location of the dataset and the desired split.
Note
To be able to quickly load your dataset with the hardware decoder of a GPU, Benzina needs the data to be converted into its own format, which embeds H.265-encoded images.
train_dataset = bz.dataset.ImageNet("path/to/dataset", split="train")
val_dataset = bz.dataset.ImageNet("path/to/dataset", split="val")
Then the transformations to apply to the dataset are defined. It is usually a good idea to normalize the data based on its statistical bias and standard deviation, which can be done in Benzina with benzina.torch.operations.ConstantBiasTransform and benzina.torch.operations.ConstantNormTransform respectively.
Note
- benzina.torch.operations.ConstantBiasTransform will subtract the bias from the images’ RGB channels
- benzina.torch.operations.ConstantNormTransform will multiply the norm with the images’ RGB channels
bias = ops.ConstantBiasTransform(bias=(123.675, 116.28 , 103.53))
std = ops.ConstantNormTransform(norm=(58.395, 57.12 , 57.375))
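The constants above are simply the usual ImageNet per-channel mean and standard deviation rescaled to the 0-255 pixel range, the same values as in the full example at the top of this section. The short sketch below only shows this arithmetic; it is not part of Benzina’s API.

# The bias/norm constants are the ImageNet mean/std scaled to the 0-255 range.
mean = (0.485, 0.456, 0.406)
std = (0.229, 0.224, 0.225)

bias_values = tuple(m * 255 for m in mean)   # (123.675, 116.28, 103.53)
norm_values = tuple(s * 255 for s in std)    # (58.395, 57.12, 57.375)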
The dataloaders are now ready to be instantiated. In this example, the dataset’s images are all of size 512 x 512 by the dataset specifications. A random crop resized to 224 x 224 and a random horizontal flip will be applied to the images prior to feeding them to the model. In Benzina, this is done by defining the size of the output tensor with the dataloader’s shape argument and by using Benzina’s similarity transform.
For the validation set, CenterResizedCrop is used: an alias for a specific similarity transform that applies a center crop with an edge scale of 224 / 256, resizes the cropped section so its smaller edge matches 224, then takes a 224 x 224 center crop. A perhaps more intuitive way to describe this transformation is as a resize making the smaller edge 256, followed by a 224 x 224 center crop.
Note
It’s useful to know that benzina.torch.operations.SimilarityTransform will automatically center the output frame on the center of the input image. This makes a vanilla benzina.torch.operations.SimilarityTransform equivalent to a center crop of the size of the output.
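As a small illustration of this note (a sketch only, assuming the other DataLoader arguments keep their defaults), a SimilarityTransform constructed without arguments can be used as a plain 224 x 224 center crop:

# With all defaults (no scaling, rotation, translation or flip), the warp
# reduces to a center crop of the dataloader's output shape.
center_crop = ops.SimilarityTransform()
loader = bz.DataLoader(val_dataset, shape=(224, 224), batch_size=256,
                       warp_transform=center_crop)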
train_loader = bz.DataLoader(
    train_dataset,
    shape=(224, 224),
    batch_size=256,
    shuffle=True,
    seed=seed,
    bias_transform=bias,
    norm_transform=std,
    warp_transform=ops.SimilarityTransform(scale=(0.08, 1.0),
                                           ratio=(3./4., 4./3.),
                                           flip_h=0.5,
                                           random_crop=True))
val_loader = bz.DataLoader(
    val_dataset,
    shape=(224, 224),
    batch_size=256,
    shuffle=False,
    seed=seed,
    bias_transform=bias,
    norm_transform=std,
    warp_transform=ops.CenterResizedCrop(224/256))
As demonstrated in the full example loading ImageNet to feed a PyTorch model, the code changes between a pure PyTorch implementation and an implementation using Benzina amount to only a few lines.
$ diff -ty --suppress-common-lines examples/python/imagenet/main.py examples/python/imagenet/imagenet_pytorch.py
> import torchvision.transforms as transforms
> import torchvision.datasets as datasets
### Benzina ### <
import benzina.torch as bz <
import benzina.torch.operations as ops <
### Benzina - end ### <
<
> parser.add_argument('-j', '--workers', default=4, type=int, met
> help='number of data loading workers (defau
### Benzina ### | normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406]
train_dataset = bz.dataset.ImageNet(args.data, split="train | std=[0.229, 0.224, 0.225])
<
bias = ops.ConstantBiasTransform(bias=(0.485 * 255, 0.456 * <
std = ops.ConstantNormTransform(norm=(0.229 * 255, 0.224 * <
train_loader = bz.DataLoader( | train_dataset = datasets.ImageNet(
train_dataset, shape=(224, 224), batch_size=args.batch_ | args.data, "train",
shuffle=True, seed=args.seed, | transforms.Compose([
bias_transform=bias, | transforms.RandomResizedCrop(224),
norm_transform=std, | transforms.RandomHorizontalFlip(),
warp_transform=ops.SimilarityTransform( | transforms.ToTensor(),
scale=(0.08, 1.0), | normalize,
ratio=(3./4., 4./3.), | ]))
flip_h=0.5, |
random_crop=True)) | train_loader = torch.utils.data.DataLoader(
| train_dataset, batch_size=args.batch_size, shuffle=True
val_loader = bz.DataLoader( | num_workers=args.workers, pin_memory=True)
bz.dataset.ImageNet(args.data, split="val"), shape=(224 |
batch_size=args.batch_size, shuffle=args.batch_size, se | val_loader = torch.utils.data.DataLoader(
bias_transform=bias, | datasets.ImageNet(args.data, "val", transforms.Compose(
norm_transform=std, | transforms.Resize(256),
warp_transform=ops.CenterResizedCrop(224/256)) | transforms.CenterCrop(224),
### Benzina - end ### | transforms.ToTensor(),
> normalize,
> ])),
> batch_size=args.batch_size, shuffle=False,
> num_workers=args.workers, pin_memory=True)
Datasets List¶
General Description of a Dataset¶
Dataset Composition¶
A Benzina dataset is, in essence, an indexing over a concatenation of inputs, targets and possibly filenames.
Dataset Structure¶
A Benzina dataset is structured using the mp4 format
- ftyp: Defines the compatibilities of the mp4 container
- mdat: Concatenation in 2-3 blocks of the inputs, targets and possibly filenames
- moov: Contains the metadata needed to load and present the raw data of mdat
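Because a Benzina dataset is a regular mp4 container, its top-level structure can be inspected with a few lines of Python. The sketch below relies only on the standard ISO BMFF box layout (a 4-byte big-endian size followed by a 4-byte type, with size 1 meaning a 64-bit size follows); the file name is hypothetical and this is not part of Benzina’s API.

import struct

def list_top_level_boxes(path):
    # Walk the top-level mp4 boxes; a Benzina dataset is expected to
    # expose ftyp, mdat and moov.
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            size, box_type = struct.unpack(">I4s", header)
            if size == 0:                      # box extends to the end of the file
                print(box_type.decode("ascii"), "(to end of file)")
                break
            if size == 1:                      # 64-bit "largesize" follows the type
                size, = struct.unpack(">Q", f.read(8))
                payload_offset = 16
            else:
                payload_offset = 8
            print(box_type.decode("ascii"), size)
            f.seek(size - payload_offset, 1)

list_top_level_boxes("path/to/dataset.bzna")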
Dataset’s Input Sample Structure¶
A Benzina dataset’s input sample can also be structured using the mp4 format. It is roughly the same as the dataset’s structure, with the difference that mdat will contain the raw concatenation of a single input, its target, possibly its filename, and possibly a 512 x 512 thumbnail stream.
ImageNet 2012¶
ImageNet 2012 classification dataset. It contains two sizes of the images, along with their classification target and filename:
- Resized high resolution images each with a smaller edge of at most 512 while preserving the aspect ratio. This set is accessed by referencing the bzna_input track of the input samples.
- Resized images each with a longer edge of at most 512 while preserving the aspect ratio. This set is accessed by referencing the bzna_thumb track of the input samples.
The dataset is represented by ImageNet, which simplifies iterating over the data as a classification dataset.
Warning
81 images are currently missing from the dataset and 111 had to first be transcoded to PNG before the final transcoding to H.265. More details can be found in the dataset’s README.
Warning
High resolution images stored in the bzna_input track of the input samples are currently not available through the DataLoader. Their widely varying sizes prevent them from being decoded using a single hardware decoder configuration. The selected solution is to represent the images in the HEIF format, which will be completed in future development.
Dataset Composition¶
The dataset is composed of a train set, followed by a validation set, then a test set, for a total of 1 431 167 entries. Targets and filenames are provided for each set:
- Train set: entries 1 to 1 281 167 (1 281 167 entries)
- Validation set: entries 1 281 168 to 1 331 167 (50 000 entries)
- Test set: entries 1 331 168 to 1 431 167 (100 000 entries)
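Since the three splits are stored back to back in a single indexed file, mapping a global entry index to its split is simple arithmetic on the ranges above. The helper below is only an illustration of those ranges, not part of Benzina’s API.

# Hypothetical helper reflecting the entry ranges listed above (1-based).
SPLIT_RANGES = {
    "train": (1, 1281167),
    "val": (1281168, 1331167),
    "test": (1331168, 1431167),
}

def split_of(entry_index):
    for split, (first, last) in SPLIT_RANGES.items():
        if first <= entry_index <= last:
            return split
    raise IndexError(entry_index)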
Dataset Structure¶
ilsvrc2012.bzna¶
- ftyp: Defines the compatibilities of the mp4 container
- mdat: Raw concatenation in 3 blocks of the images, targets and filenames
- moov: Contains the metadata needed to load and present the raw data of mdat
Dataset’s Input Samples Structure¶
A Benzina ImageNet dataset’s input sample is structured using the mp4 format.
- ftyp: Defines the compatibilities of the mp4 container
- mdat: Raw concatenation of the image, thumbnail, target and filename
- moov: Contains the metadata needed to load and present the raw data of mdat
Objectives¶
In much of the work in the fields of machine learning and deep learning, a bottleneck exists in the dataloading phase itself. This is increasingly recognised as an issue that needs to be solved.
Benzina aims to become a go-to tool for loading large datasets. Other tools exist, such as DALI, yet Benzina concentrates on two aspects:
- Highest level of performance for dataloading, using the GPU as the loading device
- Creation of a generalist storage format: a single file that facilitates distribution of datasets and is useful in the context of file system limits
Further feature points¶
- Generalist DNN framework methods provided to integrate Benzina with PyTorch and TensorFlow
- Command line programs to assist in creating Benzina-compatible datasets
- An API to interact with Benzina
Known limitations and important notes¶
As of September 2020¶
- No TensorFlow integration
- Currently only supports ImageNet
- Unknown effect on model accuracy of transcoding from various JPEG formats to H.265
- High resolution images stored in the bzna_input track of the input samples are currently not available through the DataLoader. Their varying sizes prevent them from being decoded using a single hardware decoder configuration. The selected solution is to represent the images in the HEIF format, which will be completed in future development.
- It is currently not possible to compose transformations like you can with torchvision.transforms.Compose, but SimilarityTransform should cover most of the necessary image transformations, as illustrated below.
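To illustrate that last point, the torchvision composition used in the ImageNet example (a random resized crop followed by a random horizontal flip) maps onto a single SimilarityTransform, mirroring the training pipeline shown earlier in this document:

import benzina.torch.operations as ops

# Rough Benzina counterpart of
# transforms.Compose([transforms.RandomResizedCrop(224),
#                     transforms.RandomHorizontalFlip()])
# with the 224 x 224 output size set through the DataLoader's shape argument.
warp = ops.SimilarityTransform(scale=(0.08, 1.0),
                               ratio=(3./4., 4./3.),
                               flip_h=0.5,
                               random_crop=True)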
Roadmap¶
Summer 2019¶
- Collaboration phase with researchers
- TensorFlow implementation
- Normalized format
- Specification freeze
- Dataset creation utils
- More tests
- Collaboration with researchers using new format
Autumn 2019¶
Conference Talk on Benzina
How to Contribute¶
This document is heavily based on Contributing to Open Source Projects
Submitting bugs¶
Due diligence¶
Before submitting a bug, please do the following:
Perform basic troubleshooting steps:
- Make sure you’re on the latest version. If you’re not on the most recent version, your problem may have been solved already! Upgrading is always the best first step.
- Try older versions. If you’re already on the latest release, try rolling back a few minor versions (e.g. if on 1.7, try 1.5 or 1.6) and see if the problem goes away. This will help the devs narrow down when the problem first arose in the commit log.
- Try switching up dependency versions. If the software in question has dependencies (other libraries, etc) try upgrading/downgrading those as well.
Search the project’s bug/issue tracker to make sure it’s not a known issue.
If you don’t find a pre-existing issue, consider checking with the mailing list and/or IRC channel in case the problem is non-bug-related.
What to put in your bug report¶
Make sure your report gets the attention it deserves: bug reports with missing information may be ignored or punted back to you, delaying a fix. The below constitutes a bare minimum; more info is almost always better:
What version of the core programming language interpreter are you using? For example, are you using Python 3.5? Python 3.6?
Which version or versions of the software are you using? Ideally, you followed the advice above and have ruled out (or verified that the problem exists in) a few different versions.
How can the developers recreate the bug on their end? If possible, include a copy of your code, the command you used to invoke it, and the full output of your run (if applicable.)
- A common tactic is to pare down your code until a simple (but still bug-causing) “base case” remains. Not only can this help you identify problems which aren’t real bugs, but it means the developer can get to fixing the bug faster.
Contributing changes¶
Licensing of contributed material¶
Keep in mind as you contribute, that code, docs and other material submitted to open source projects are usually considered licensed under the same terms as the rest of the work.
The details vary from project to project, but from the perspective of this document’s authors:
Anything submitted to a project falls under the licensing terms in the repository’s top level LICENSE file.
- For example, if a project’s LICENSE is BSD-based, contributors should be comfortable with their work potentially being distributed in binary form without the original source code.
Per-file copyright/license headers are typically extraneous and undesirable. Please don’t add your own copyright headers to new files unless the project’s license actually requires them!
- Not least because even a new file created by one individual (who often feels compelled to put their personal copyright notice at the top) will inherently end up contributed to by dozens of others over time, making a per-file header outdated/misleading.
Version control branching¶
Always make a new branch for your work, no matter how small. This makes it easy for others to take just that one set of changes from your repository, in case you have multiple unrelated changes floating around.
- A corollary: don’t submit unrelated changes in the same branch/pull request! The maintainer shouldn’t have to reject your awesome bugfix because the feature you put in with it needs more review.
Base your new branch off of the appropriate branch on the main repository:
Bug fixes should be based on the branch named after the oldest supported release line the bug affects.
- E.g. if a feature was introduced in 1.1, the latest release line is 1.3, and a bug is found in that feature - make your branch based on 1.1. The maintainer will then forward-port it to 1.3 and master.
- Bug fixes requiring large changes to the code or which have a chance of being otherwise disruptive, may need to base off of master instead. This is a judgement call – ask the devs!
New features should branch off of the ‘master’ branch.
- Note that depending on how long it takes for the dev team to merge your patch, the copy of master you worked off of may get out of date! If you find yourself ‘bumping’ a pull request that’s been sidelined for a while, make sure you rebase or merge to latest master to ensure a speedier resolution.
Code formatting¶
- Follow the style you see used in the primary repository! Consistency with the rest of the project always trumps other considerations. It doesn’t matter if you have your own style or if the rest of the code breaks with the greater community - just follow along.
- Python projects usually follow the PEP-8 guidelines (though many have minor deviations depending on the lead maintainers’ preferences.)
Documentation isn’t optional¶
It’s not! Patches without documentation will be returned to sender. By “documentation” we mean:
Docstrings (for Python; or API-doc-friendly comments for other languages) must be created or updated for public API functions/methods/etc. (This step is optional for some bugfixes.)
Don’t forget to include versionadded/versionchanged ReST directives at the bottom of any new or changed Python docstrings!
- Use versionadded for truly new API members – new methods, functions, classes or modules.
- Use versionchanged when adding/removing new function/method arguments, or whenever behavior changes.
New features should ideally include updates to prose documentation, including useful example code snippets.
All submissions should have a changelog entry crediting the contributor and/or any individuals instrumental in identifying the problem.
Full example¶
Here’s an example workflow for the project Benzina, which is currently in hypothetical version 1.0.x. Your username is yourname and you’re submitting a basic bugfix.
Preparing your Fork¶
- Click ‘Fork’ on Github, creating e.g. yourname/Benzina.
- Clone your project: git clone git@github.com:yourname/Benzina, then cd Benzina.
- Create and activate a virtual environment.
- Install the development requirements: pip install -r dev-requirements.txt.
- Create a branch: git checkout -b foo-the-bars 1.0.
Making your Changes¶
- Add changelog entry crediting yourself.
- Hack, hack, hack.
- Commit your changes:
git commit -m "Foo the bars"
Creating Pull Requests¶
- Push your commit to get it back up to your fork:
git push origin HEAD
- Visit Github, click the handy “Pull request” button that it will show upon noticing your new branch.
- In the description field, write down the issue number (if submitting code fixing an existing issue) or describe the issue + your fix (if submitting a wholly new bugfix).
- Hit ‘submit’! And please be patient - the maintainers will get to you when they can.
API¶
benzina.torch.dataloader¶
benzina.torch.dataset¶
benzina.torch.operations¶
class benzina.torch.operations.WarpTransform [source]
Interface class that represents a warp transformation as a combined rotation, scale, skew and translation 3 x 3 matrix. The transformation is called for each sample of a batch.
class benzina.torch.operations.NormTransform [source]
Interface class that represents a normalization transformation. The transformation is called for each sample of a batch.
class benzina.torch.operations.BiasTransform [source]
Interface class that represents a bias transformation. The transformation is called for each sample of a batch.
class benzina.torch.operations.ConstantWarpTransform(warp=(1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0)) [source]
Represents a constant warp transformation to be applied on each sample of a batch independently of its index.
Parameters: warp (iterable of numerics, optional) – a flattened, row-major 3 x 3 warp matrix (default: flattened identity matrix).
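As a usage sketch of the documented default, the identity warp can also be written out explicitly as a flattened, row-major 3 x 3 matrix:

import benzina.torch.operations as ops

# Explicit identity warp; equivalent to ConstantWarpTransform() with defaults.
identity_warp = ops.ConstantWarpTransform(warp=(1.0, 0.0, 0.0,
                                                0.0, 1.0, 0.0,
                                                0.0, 0.0, 1.0))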
class benzina.torch.operations.ConstantNormTransform(norm=(1.0, 1.0, 1.0)) [source]
Represents a constant norm transformation to be applied on each sample of a batch independently of its index.
Parameters: norm (numeric or iterable of numerics, optional) – an iterable in RGB order containing the normalization constant of a sample’s RGB channels. Components will be multiplied with the respective channels of a sample (default: (1.0, 1.0, 1.0)).
class benzina.torch.operations.ConstantBiasTransform(bias=(0.0, 0.0, 0.0)) [source]
Represents a constant bias transformation to be applied on each sample of a batch independently of its index.
Parameters: bias (numeric or iterable of numerics, optional) – an iterable in RGB order containing the bias of a sample’s RGB channels. Components will be subtracted from the respective channels of a sample (default: (0.0, 0.0, 0.0)).
class benzina.torch.operations.SimilarityTransform(scale=(1.0, 1.0), ratio=None, degrees=(-0.0, 0.0), translate=(0.0, 0.0), flip_h=0.0, flip_v=0.0, resize=False, keep_ratio=False, random_crop=False) [source]
Similarity warp transformation of the image keeping center invariant.
A crop of random size, aspect ratio and location is made. This crop can then be flipped and/or rotated to finally be resized to the output size.
Parameters:
- scale (Sequence or float or int, optional) – crop area scaling factor interval, e.g. (a, b), then scale is randomly sampled from the range a <= scale <= b. If scale is a number instead of a sequence, the range of scale will be (scale^-1, scale). (default: (+1.0, +1.0))
- ratio (Sequence or float or int, optional) – range of crop aspect ratio. If ratio is a number instead of a sequence like (min, max), the range of aspect ratio will be (ratio^-1, ratio). Will keep the original aspect ratio by default.
- degrees (Sequence or float or int, optional) – range of degrees to select from. If degrees is a number instead of a sequence like (min, max), the range of degrees will be (-degrees, +degrees). (default: (-0.0, +0.0))
- translate (Sequence or float or int, optional) – sequence of maximum absolute fraction for horizontal and vertical translations. For example translate=(a, b), then horizontal shift is randomly sampled in the range -output_width * a < dx < output_width * a and vertical shift is randomly sampled in the range -output_height * b < dy < output_height * b. If translate is a number instead of a sequence, translate will be (translate, translate). These translations are applied independently from random_crop. (default: (0.0, 0.0))
- flip_h (bool, optional) – probability of the image being flipped horizontally. (default: +0.0)
- flip_v (bool, optional) – probability of the image being flipped vertically. (default: +0.0)
- resize (bool, optional) – resize the cropped image to fit the output size. It is forced to True if scale or ratio are specified. (default: False)
- keep_ratio (bool, optional) – match the smaller edge to the corresponding output edge size, keeping the aspect ratio after resize. Has no effect if resize is False. (default: False)
- random_crop (bool, optional) – randomly crop the image instead of a center crop. (default: False)
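A small usage sketch based on the parameters above (these particular values are not taken from the Benzina examples):

import benzina.torch.operations as ops

# Random rotation within +/- 10 degrees, shifts of up to 10% of the output
# size and a 50% chance of a horizontal flip, keeping the crop centered.
jitter_warp = ops.SimilarityTransform(degrees=10,
                                      translate=(0.1, 0.1),
                                      flip_h=0.5)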
class benzina.torch.operations.RandomResizedCrop(scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333)) [source]
Crop to random size, aspect ratio and location.
A crop of random size, aspect ratio and location is made. This crop is finally resized to the output size.
This is popularly used to train the Inception networks.
Parameters:
- scale (Sequence or float or int, optional) – crop area scaling factor interval, e.g. (a, b), then scale is randomly sampled from the range a <= scale <= b. If scale is a number instead of a sequence, the range of scale will be (scale^-1, scale). (default: (+0.08, +1.0))
- ratio (Sequence or float or int, optional) – range of crop aspect ratio. If ratio is a number instead of a sequence like (min, max), the range of aspect ratio will be (ratio^-1, ratio). Will keep the original aspect ratio by default. (default: (3./4., 4./3.))
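A usage sketch based on the signature above; the particular scale range is illustrative only:

import benzina.torch.operations as ops

# Random crop covering 20%-100% of the area with an aspect ratio in
# [3/4, 4/3], then resized to the dataloader's output shape.
warp = ops.RandomResizedCrop(scale=(0.2, 1.0))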
class benzina.torch.operations.CenterResizedCrop(scale=1.0, keep_ratio=True) [source]
Crops at the center and resizes.
A crop at the center is made, then resized to the output size.
Parameters:
- scale (float or int, optional) – edges scaling factor. (default: +1.0)
- keep_ratio (bool, optional) – match the smaller edge to the corresponding output edge size, keeping the aspect ratio after resize. (default: False)
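The validation pipeline shown earlier in this document uses this alias with a scale of 224 / 256, i.e. the classic “resize so the smaller edge is 256, then take a 224 x 224 center crop” evaluation preprocessing:

import benzina.torch.operations as ops

# With a 224 x 224 output shape, this reproduces the usual ImageNet
# evaluation crop: resize to 256 on the smaller edge, then center crop 224.
val_warp = ops.CenterResizedCrop(224 / 256)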
benzina.torch.operations.compute_affine_matrix(in_shape, out_shape, crop=None, degrees=0.0, translate=(0.0, 0.0), flip_h=False, flip_v=False, resize=False, keep_ratio=False) [source]
Similarity warp transformation of the image keeping center invariant.
Parameters:
- in_shape (Sequence) – the shape of the input image
- out_shape (Sequence) – the shape of the output image
- crop (Sequence, optional) – crop center location, width and height. The center location is relative to the center of the image. If resize is not True, crop is simply a translation in the in_shape space.
- degrees (float or int, optional) – degrees to rotate the crop. (default: 0.0)
- translate (Sequence, optional) – horizontal and vertical translations. (default: (0.0, 0.0))
- flip_h (bool, optional) – flip the image horizontally. (default: False)
- flip_v (bool, optional) – flip the image vertically. (default: False)
- resize (bool, optional) – resize the cropped image to fit the output’s size. (default: False)
- keep_ratio (bool, optional) – match the smaller edge to the corresponding output edge size, keeping the aspect ratio after resize. Has no effect if resize is False. (default: False)
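A minimal usage sketch based on the signature above (the shapes are hypothetical): with every option left at its default, the resulting matrix corresponds to a plain center crop of the output size, as noted for SimilarityTransform earlier.

from benzina.torch.operations import compute_affine_matrix

# Affine matrix for a 224 x 224 center crop taken from a 512 x 512 input.
matrix = compute_affine_matrix(in_shape=(512, 512), out_shape=(224, 224))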
Description of the project¶
Benzina is an image loading library that accelerates image loading and preprocessing by making use of the hardware decoder in NVIDIA’s GPUs.
Since it minimizes the use of the CPU and of the GPU computing units, it is easier to saturate the GPU’s computing power or the CPU. In our tests using ResNet18 models in PyTorch on the ImageNet 2012 dataset, we observed a 1.8x increase in the number of images loaded, preprocessed, then processed by the model when using a single CPU and GPU:
Data Loader | CPU | CPU Workers | CPU Usage | GPU | Batch Size | Pipeline Speed |
---|---|---|---|---|---|---|
Benzina | Intel Xeon 2698* | 1 | 33% | Tesla V100* | 256 | 525 img/s |
PyTorch ImageFolder | Intel Xeon 2698* | 2 | 100% | Tesla V100* | 256 | 290 img/s |
PyTorch ImageFolder | Intel Xeon 2698* | 4 | 100% | Tesla V100* | 256 | 395 img/s |
PyTorch ImageFolder | Intel Xeon 2698* | 6 | 100% | Tesla V100* | 256 | 425 img/s |
DALI | Intel Xeon 2698* | 1 | 100% | Tesla V100* | 256 | 575 img/s |
Note
- Intel Xeon 2698 is the Intel Xeon E5-2698 v4 @ 2.20GHz version
- Tesla V100 is the Tesla V100 SXM2 16GB version
While DALI currently outperforms Benzina, the speedup can only be seen on JPEGs through the nvJPEG decoder. Benzina requires the input dataset to be transcoded to H.265, but the gain can then be seen on all types of images, and the dataset comes in a format that is easier to distribute.
The name “Benzina” is a phonetic transliteration of the Ukrainian word “Бензина”, meaning “gasoline” (or “petrol”).