Objectives
In much machine learning and deep learning work, data loading is itself a bottleneck. This is increasingly recognised as a problem that needs to be solved.
Benzina aims to become a go-to tool for loading large datasets. Other tools exist, such as DALI, but Benzina concentrates on two aspects:
- The highest possible data-loading performance, using the GPU as the loading device
- A general-purpose storage format that holds a dataset in a single file, which eases dataset distribution and helps work within file system limits
Further feature points
- General-purpose methods for integrating Benzina with the PyTorch and TensorFlow DNN frameworks
- Command-line programs will be created to help work with Benzina-compatible datasets
- An API to interact with Benzina