Performance test
Updated at: 2025-11-03
Test data: 107 GB dataset, 1,000,000 images, with an average of 110 KB per image
Test environment: Baidu AI Cloud bcc.c5.c8m16, 8 CPUs, 16 GB memory, 3 Gbps intranet bandwidth
Dataset type: BosIterableDataset built via from_prefix
Test results:
| Tool | batch_size | num_workers | Dataset type | Construction method | Result |
|---|---|---|---|---|---|
| bostorchconnector | 256 | 8 | BosIterableDataset | from_prefix | 2785 img/s |
| bosfs | 256 | 8 | BosIterableDataset | from_prefix | 48 img/s |
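To put these figures in context, the short calculation below converts images per second into approximate network throughput, using the 110 KB average image size and the 3 Gbps bandwidth stated above. It assumes 1 KB = 1,000 bytes and that every image is of average size, so treat the output as a rough estimate rather than a measured value.

```python
# Rough estimate only: assumes 1 KB = 1,000 bytes and uniform image size.
AVG_IMAGE_KB = 110   # average image size from the test data
LINK_GBPS = 3.0      # intranet bandwidth of the test instance

results_img_per_s = {"bostorchconnector": 2785, "bosfs": 48}


def throughput_gbps(img_per_s: float, avg_kb: float = AVG_IMAGE_KB) -> float:
    """Convert images per second into gigabits per second."""
    bytes_per_s = img_per_s * avg_kb * 1000
    return bytes_per_s * 8 / 1e9


for tool, rate in results_img_per_s.items():
    gbps = throughput_gbps(rate)
    print(f"{tool}: {gbps:.2f} Gbps ({gbps / LINK_GBPS:.0%} of the link)")

speedup = results_img_per_s["bostorchconnector"] / results_img_per_s["bosfs"]
print(f"speedup: {speedup:.0f}x")
```

Under these assumptions, bostorchconnector works out to about 2.45 Gbps, roughly 82% of the 3 Gbps link, and about 58 times the bosfs rate.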
Test code:
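Python
import time
import torch
# Import path assumed from the package name; adjust if your version differs.
from bostorchconnector import BosClientConfig, BosIterableDataset

def time_it(func):
    # Minimal timing decorator; the original definition is not shown.
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__}: {time.perf_counter() - start:.2f} s")
        return result
    return wrapper

def transform(data):
    data.read()  # read the full object so the download is actually performed
    return data.key

@time_it
def test_bos():
    config = BosClientConfig()
    BOS_URI = "bos://bos-torch/img_1M/"

    dataset = BosIterableDataset.from_prefix(BOS_URI, endpoint="http://su.bcebos.com", transform=transform, bos_client_config=config, enable_sharding=True)
    dataloader = torch.utils.data.DataLoader(dataset, batch_size=256, num_workers=8)
    for step, key in enumerate(dataloader):
        print(key)  # each batch is a list of object keys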
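The table reports results in images per second, while the test code only prints keys. One way such a figure could be derived is sketched below: count the samples in each batch and divide by the elapsed wall-clock time. The helper name `measure_throughput` is hypothetical and not part of bostorchconnector; it accepts any iterable of batches and is shown with a stand-in list so that it runs without BOS access.

```python
import time
from typing import Iterable, Sized, Tuple


def measure_throughput(batches: Iterable[Sized]) -> Tuple[int, float]:
    """Return (total samples, samples per second) for an iterable of batches.

    Hypothetical helper: each batch only needs to support len().
    """
    start = time.perf_counter()
    total = 0
    for batch in batches:
        total += len(batch)
    elapsed = time.perf_counter() - start
    rate = total / elapsed if elapsed > 0 else float("inf")
    return total, rate


# Stand-in for a DataLoader: three batches of keys, the last one partial.
fake_batches = [["a", "b"], ["c", "d"], ["e"]]
total, rate = measure_throughput(fake_batches)
print(f"{total} samples, {rate:.0f} img/s")
```

Because the transform returns string keys, PyTorch's default collation yields each batch as a list of keys, so the same helper could be applied to the dataloader from the test code in place of `fake_batches`.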
