How to Use Machine Data Hub

Over the past decade, computer vision has advanced to better-than-human performance on many tasks, largely thanks to the curation of benchmark datasets such as ImageNet. However, datasets related to machinery and other engineering applications of AI/ML are still hard to find, scattered across many sources, and often not in a modern format. The goal of this project is to make state-of-the-art Machine Learning and Artificial Intelligence techniques more accessible to any researcher interested in applying them to engineering domains.

You can view our code here!

Download CLI

The machinedatahub Python package lets users navigate Machine Data Hub from a local Python environment. The package will soon be installable with 'pip install machine-data-hub'. It lets you list the available datasets, view information about a specific dataset, download a dataset, and get suggestions for related datasets.

Examples of using the Python package can be found in the blogs listed below.

Finding a Dataset

The Machine Data Hub has two methods to locate datasets.


The Machine Data Hub website opens up to a large grid of datasets.

Main Page

The 'Filter' and 'Sort' buttons open drop-down menus for filtering and sorting the datasets. The filters are multi-select.
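To show what multi-select filtering and sorting can look like over a dataset catalog, here is a minimal sketch. The record fields (name, tags, file_type, size_mb) and dataset names are hypothetical, not the site's actual schema. The matching rule is also an assumption. Within a single filter category, a dataset matches if it has any of the selected values (OR). Across categories, it must satisfy every active selection (AND).

```python
from typing import Any, Dict, List, Set

# Hypothetical catalog records; field names and values are illustrative only.
DATASETS: List[Dict[str, Any]] = [
    {"name": "dataset-a", "tags": {"vibration", "bearing"}, "file_type": "csv", "size_mb": 120},
    {"name": "dataset-b", "tags": {"engine", "prognostics"}, "file_type": "txt", "size_mb": 45},
    {"name": "dataset-c", "tags": {"vibration", "tool wear"}, "file_type": "mat", "size_mb": 80},
]


def matches(dataset: Dict[str, Any], selections: Dict[str, Set[str]]) -> bool:
    """True if the dataset has at least one selected value in every active category."""
    for field, chosen in selections.items():
        if not chosen:
            continue  # an empty selection means "no filter" for this category
        value = dataset[field]
        # Multi-valued fields (e.g. tags) are compared as sets; scalars become one-element sets.
        values = set(value) if isinstance(value, (set, frozenset, list, tuple)) else {value}
        if values.isdisjoint(chosen):
            return False
    return True


def filter_datasets(datasets: List[Dict[str, Any]], selections: Dict[str, Set[str]]) -> List[Dict[str, Any]]:
    """Apply all multi-select filters at once."""
    return [ds for ds in datasets if matches(ds, selections)]


def sort_datasets(datasets: List[Dict[str, Any]], key: str, descending: bool = False) -> List[Dict[str, Any]]:
    """Sort datasets by a single field, e.g. size."""
    return sorted(datasets, key=lambda ds: ds[key], reverse=descending)


if __name__ == "__main__":
    hits = filter_datasets(DATASETS, {"tags": {"vibration"}, "file_type": {"csv", "mat"}})
    print([ds["name"] for ds in sort_datasets(hits, "size_mb", descending=True)])
```

With these sample records, the example selects the two vibration datasets stored as CSV or MAT files and lists the larger one first.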

Sorter Page
Each card is clickable. Clicking a card opens a more detailed view, where the files associated with the dataset can be downloaded individually. Similar datasets also appear underneath the main card; these suggestions are based on shared tags.
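The page does not say exactly how tag-based suggestions are scored. One common choice is Jaccard similarity between tag sets: the number of shared tags divided by the number of distinct tags across both datasets. The following is a minimal sketch under that assumption. The dataset names, their tags, and the function suggest_similar are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical tag sets per dataset; not real Machine Data Hub entries.
TAGS: Dict[str, Set[str]] = {
    "dataset-a": {"vibration", "bearing", "rotating machinery"},
    "dataset-b": {"vibration", "bearing"},
    "dataset-c": {"engine", "prognostics"},
    "dataset-d": {"vibration", "tool wear"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared tags divided by all distinct tags; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_similar(target: str, tags: Dict[str, Set[str]], k: int = 3) -> List[str]:
    """Return up to k other datasets sharing at least one tag, most similar first."""
    base = tags[target]
    scored = [(jaccard(base, other), name) for name, other in tags.items() if name != target]
    scored = [item for item in scored if item[0] > 0.0]  # drop datasets with no shared tags
    scored.sort(key=lambda item: (-item[0], item[1]))  # highest score first, ties by name
    return [name for _, name in scored[:k]]


if __name__ == "__main__":
    print(suggest_similar("dataset-a", TAGS))
```

Here, dataset-b shares two of three distinct tags with dataset-a, and dataset-d shares one of four. Dataset-c shares none, so it is not suggested.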
Sorter Page

ETL Process

Machine Data Hub implements its own Extract, Transform, Load (ETL) processes with the Python package. You can view our package here! Documentation for the package can be found here.
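As a generic illustration of the three ETL stages, here is a minimal standard-library sketch. It does not reproduce the machinedatahub package's actual ETL code. The sample sensor CSV, the column-name normalization rule, the drop-incomplete-rows policy, and the SQLite table name "readings" are all assumptions chosen for the example.

```python
import csv
import io
import re
import sqlite3
from typing import Dict, List

# Hypothetical raw sensor export; the second data row has a missing value.
RAW_CSV = """Time (s),Sensor 1,Sensor 2
0.0,1.5,2.0
0.1,,2.1
0.2,1.7,2.3
"""


def extract(text: str) -> List[Dict[str, str]]:
    """Extract: parse raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def normalize(name: str) -> str:
    """Turn a header like 'Time (s)' into a column-safe name like 'time_s'."""
    return re.sub(r"[^0-9a-z]+", "_", name.lower()).strip("_")


def transform(rows: List[Dict[str, str]]) -> List[Dict[str, float]]:
    """Transform: drop incomplete rows, normalize column names, convert values to floats."""
    cleaned = []
    for row in rows:
        if any(v is None or v.strip() == "" for v in row.values()):
            continue  # assumed policy: skip rows with missing readings
        cleaned.append({normalize(k): float(v) for k, v in row.items()})
    return cleaned


def load(rows: List[Dict[str, float]], conn: sqlite3.Connection, table: str = "readings") -> int:
    """Load: write cleaned rows into a SQLite table and return the number inserted."""
    if not rows:
        return 0
    cols = list(rows[0].keys())
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(f'{c} REAL' for c in cols)})")
    placeholders = ", ".join("?" for _ in cols)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", [tuple(r[c] for c in cols) for r in rows])
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    with sqlite3.connect(":memory:") as conn:
        print(load(transform(extract(RAW_CSV)), conn))
```

Keeping each stage as a separate function makes each one easy to test or replace on its own, for example swapping SQLite for Parquet files.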


Three blogs were written to help users explore what Machine Data Hub can do.


Pull requests are welcome. File issues here.