Overview
The BOS Connector for PyTorch is a high-performance data access tool provided by Baidu AI Cloud Object Storage (BOS) for AI training. It automatically optimizes BOS read and list requests, which speeds up data loading and checkpoint operations in PyTorch training tasks.
The BOS Connector for PyTorch provides two dataset types for reading from BOS: BosMapDataset for random data access and BosIterableDataset for sequential data access. It can also save checkpoint data directly to BOS, with no local storage needed as an intermediary.
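The difference between random and sequential access can be illustrated with a minimal sketch. It does not use the connector's API: the classes `InMemoryMapDataset` and `InMemoryIterableDataset` and the dictionary `FAKE_BUCKET` are hypothetical stand-ins, written in plain Python, that only mirror the two protocols PyTorch expects (`__getitem__`/`__len__` for map-style datasets, `__iter__` for iterable-style datasets).

```python
# Hypothetical in-memory stand-in for a bucket: object key -> object bytes.
FAKE_BUCKET = {
    "train/0001.bin": b"aaa",
    "train/0002.bin": b"bbbb",
    "train/0003.bin": b"cc",
}


class InMemoryMapDataset:
    """Map-style access: any sample can be fetched by index, in any order."""

    def __init__(self, bucket, prefix):
        # List matching keys once up front so that indices stay stable.
        self._bucket = bucket
        self._keys = sorted(k for k in bucket if k.startswith(prefix))

    def __len__(self):
        return len(self._keys)

    def __getitem__(self, index):
        key = self._keys[index]
        return key, self._bucket[key]


class InMemoryIterableDataset:
    """Iterable-style access: samples are streamed in listing order."""

    def __init__(self, bucket, prefix):
        self._bucket = bucket
        self._prefix = prefix

    def __iter__(self):
        for key in sorted(self._bucket):
            if key.startswith(self._prefix):
                yield key, self._bucket[key]


map_ds = InMemoryMapDataset(FAKE_BUCKET, "train/")
iter_ds = InMemoryIterableDataset(FAKE_BUCKET, "train/")

random_sample = map_ds[2]                    # jump straight to the third object
streamed_keys = [key for key, _ in iter_ds]  # walk the objects front to back
```

A map-style dataset suits shuffled training, where a sampler picks indices at random; an iterable-style dataset suits large sets that are read once from start to finish.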
Compared with accessing object storage through mounting tools such as bosfs, the BOS Connector for PyTorch has the following advantages:
| Dimension | Mounting tools such as bosfs | BOS Connector for PyTorch |
|---|---|---|
| Performance | Low, no targeted optimization | High, with specific optimizations for training set data loading and checkpoint data reading/writing |
| Data loading method | Requires pre-downloading (preheating) data | Supports streaming loading |
| Data access | Requires transfer through a file system translation layer | Reads and writes BOS directly |
| Configuration complexity | Relatively complex | Simple; works out of the box |
