Objectives

In much machine learning and deep learning work, the dataloading phase itself is a bottleneck. This is increasingly recognised as a problem that needs to be solved.

Benzina aims to become a go-to tool for dataloading large datasets. Other tools exist, such as DALI, but Benzina concentrates on two aspects:

  • The highest level of dataloading performance, using the GPU as the loading device
  • A general-purpose, single-file storage format that eases the distribution of datasets and helps when working within file system limits

Further features

  • Generic DNN framework methods to integrate Benzina with PyTorch and TensorFlow
  • Command-line programs will be created to assist in working with Benzina-compatible datasets
  • An API to interact with Benzina