Over the past decade, computer vision has advanced to better-than-human performance on many tasks, largely thanks to curated benchmark datasets such as ImageNet. However, datasets for machinery and other engineering applications of AI/ML are still hard to find, not centralized, and not in a modern format. The goal of this project is to make state-of-the-art machine learning and artificial intelligence techniques more accessible to any researcher interested in engineering domains.
The machinedatahub Python package lets users work with Machine Data Hub from a local Python environment. The package will soon be installable with 'pip install machine-data-hub'. It lets you view available datasets, see information about a specific dataset, download a dataset, and get suggestions for related datasets.
Examples of the Python package in use can be found in the example blogs below.
Finding a Dataset
The Machine Data Hub has two methods to locate datasets.
Website
The Machine Data Hub website opens up to a large grid of datasets.
The 'Filter' and 'Sort' buttons open drop-down menus for filtering and sorting the datasets. The filters are multi-select.
Each card is clickable. Clicking a card displays a more detailed view. The files associated with each dataset can be downloaded individually. Additionally, similar datasets appear underneath the main card. These cards are suggested based on shared tags.
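The tag-based suggestion described above can be illustrated with a minimal sketch. This is not the site's actual implementation: the scoring method (Jaccard overlap of tag sets), the function names, and the catalog entries are all assumptions made up for illustration.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Fraction of tags shared by two tag sets (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_similar(name: str, catalog: Dict[str, Set[str]], top_n: int = 3) -> List[str]:
    """Rank the other datasets by tag overlap with `name`, best match first."""
    target = catalog[name]
    scored = [
        (jaccard(target, tags), other)
        for other, tags in catalog.items()
        if other != name
    ]
    # Drop datasets with no shared tag; sort by score (descending),
    # then by name so that ties come out in a stable order.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [other for _, other in ranked[:top_n]]


# Hypothetical catalog: these dataset names and tags are invented for the example.
catalog = {
    "bearing-vibration": {"bearing", "vibration", "time-series"},
    "gearbox-faults": {"gearbox", "vibration", "time-series"},
    "turbofan-degradation": {"engine", "time-series", "prognostics"},
    "milling-images": {"milling", "images"},
}

print(suggest_similar("bearing-vibration", catalog))
# → ['gearbox-faults', 'turbofan-degradation']
```

Jaccard overlap is only one reasonable choice here; counting shared tags directly would also work, but it would favor datasets that carry many tags.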
ETL Process
Machine Data Hub runs its own Extract, Transform, Load (ETL) processes using the Python package. You can view the package here.
Documentation for the package can be found here.
Blogs
Three blog posts were written to help users explore what Machine Data Hub can do.