Upload files
In BOS, the basic data unit for user operations is an object. An object consists of a key, metadata, and data. The key is the object’s name, the metadata provides a user-defined description as a set of name-value pairs, and the data is the content of the object.
The BOS C SDK provides a rich set of file upload APIs, and files can be uploaded in the following ways:
- Simple upload
- Append upload
- Multipart upload
- Resumable upload
Simple upload
In simple upload scenarios, BOS supports uploading objects in the form of specified files, data streams, file descriptors, and strings. Please refer to the following code:
bos_pool_t *p = NULL;
char *object_name = "bos_test_put_object.ts";
char *str = "test bos c sdk";
bos_status_t *s = NULL;
int is_cname = 0;
bos_string_t bucket;
bos_string_t object;
bos_table_t *headers = NULL;
bos_table_t *head_headers = NULL;
bos_table_t *head_resp_headers = NULL;
char *content_type = NULL;
bos_request_options_t *options = NULL;
/* test put object */
bos_pool_create(&p, NULL);
options = bos_request_options_create(p);
init_test_request_options(options, is_cname);
/* attach user-defined metadata to the object */
headers = bos_table_make(p, 1);
apr_table_set(headers, "x-bce-meta-author", "bos");
/* upload the string as the object content */
s = create_test_object(options, TEST_BUCKET_NAME, object_name, str, headers);
Objects are uploaded to BOS as files. The put_object and upload_file functions support uploading objects of up to 5 GB in size. For files larger than 5 GB, please use upload_super_file. The reference implementation is as follows:
bos_pool_t *p = NULL;
int is_cname = 0;
bos_status_t *s = NULL;
bos_request_options_t *options = NULL;
bos_acl_e bos_acl = BOS_ACL_PRIVATE;
bos_string_t bucket;
bos_table_t *resp_headers;
bos_string_t object;
bos_string_t file_path;
/* create test bucket */
bos_pool_create(&p, NULL);
options = bos_request_options_create(p);
init_test_request_options(options, is_cname);
s = create_test_bucket(options, TEST_BUCKET_NAME, bos_acl);
/* upload a local file as the object "test.mp4" */
bos_str_set(&bucket, TEST_BUCKET_NAME);
bos_str_set(&object, "test.mp4");
bos_str_set(&file_path, "../../../test.mp4");
s = bos_put_object_from_file(options, &bucket, &object, &file_path, NULL, &resp_headers);
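The 5 GB limit described above can be turned into a small pre-check before choosing an upload call. The following is a minimal sketch, not SDK code: the enum, the macro, and the function name are hypothetical, and 5 GB is assumed to mean 5 × 2^30 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: "5 GB" is interpreted as 5 * 2^30 bytes. */
#define SIMPLE_UPLOAD_MAX_BYTES (5LL * 1024 * 1024 * 1024)

/* Hypothetical enum for this sketch; it is not part of the BOS C SDK. */
typedef enum {
    UPLOAD_SIMPLE,     /* put_object / upload_file */
    UPLOAD_SUPER_FILE  /* upload_super_file */
} upload_method_e;

/* Pick an upload path from the file size in bytes. */
upload_method_e choose_upload_method(int64_t file_size)
{
    assert(file_size >= 0);
    return file_size <= SIMPLE_UPLOAD_MAX_BYTES ? UPLOAD_SIMPLE
                                                : UPLOAD_SUPER_FILE;
}
```

A caller would obtain the file size first (for example with `stat`) and branch on the returned value.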
Set file meta information
Object metadata refers to the attributes of files provided by users when uploading to BOS. It is mainly divided into two categories: standard HTTP attribute settings (HTTP headers) and user-defined metadata.
- Set HTTP headers of the object
The BOS C SDK works by calling the backend HTTP API, so users can define an object's HTTP headers when uploading a file. Commonly used HTTP headers are described below:
| Name | Description | Default value |
|---|---|---|
| Content-MD5 | File data verification. When set, BOS enables MD5 verification of the file content, compares the MD5 you provide with the MD5 of the file, and returns an error if they differ | None |
| Content-Type | File MIME type. Defines the file type and web page encoding, which determines how the browser reads the file. If unspecified, BOS generates it from the file's extension. If the file has no extension, the default value is applied. | application/octet-stream |
| Content-Disposition | Indicates how the MIME user agent handles the attached file (open it or download it) and under which file name | None |
| Content-Length | The length of the uploaded file. If the stream/file is longer than this value, the upload is truncated; if it is shorter, the actual length is used | Length of the stream/file |
| Expires | Cache expiration time | None |
| Cache-Control | Specifies the caching behavior of the web page when the object is downloaded | None |
Reference code is as follows:
...
headers = bos_table_make(p, 1);
apr_table_set(headers, BOS_CONTENT_TYPE, "image/jpeg");
...
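When the Content-Type is set explicitly, the caller has to pick a value. The sketch below illustrates one way to derive it from the object name on the client side, falling back to the default listed in the table. The function name and the small mapping table are assumptions for illustration; the mapping BOS itself applies is not shown in this document.

```c
#include <assert.h>
#include <string.h>

#define DEFAULT_CONTENT_TYPE "application/octet-stream"

/* Illustrative mapping only; extend as needed. Matching is case-sensitive. */
static const struct {
    const char *ext;
    const char *mime;
} k_mime_table[] = {
    { ".jpg",  "image/jpeg" },
    { ".jpeg", "image/jpeg" },
    { ".png",  "image/png"  },
    { ".txt",  "text/plain" },
    { ".mp4",  "video/mp4"  },
};

/* Return a MIME type for the object name, or the default if unknown. */
const char *guess_content_type(const char *object_name)
{
    const char *dot;
    const char *slash;
    size_t i;

    assert(object_name != NULL);
    dot = strrchr(object_name, '.');
    slash = strrchr(object_name, '/');
    /* No extension, or the last dot belongs to a directory component. */
    if (dot == NULL || (slash != NULL && dot < slash)) {
        return DEFAULT_CONTENT_TYPE;
    }
    for (i = 0; i < sizeof(k_mime_table) / sizeof(k_mime_table[0]); i++) {
        if (strcmp(dot, k_mime_table[i].ext) == 0) {
            return k_mime_table[i].mime;
        }
    }
    return DEFAULT_CONTENT_TYPE;
}
```

The returned string could then be passed as the value in `apr_table_set(headers, BOS_CONTENT_TYPE, ...)`, as in the snippet above.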
- User-defined meta information
BOS supports user-defined metadata for describing objects. Example usage is shown in the following code:
apr_table_set(headers, "x-bce-meta-author", "bos");
Note:
- The code above defines one user metadata entry with the name "x-bce-meta-author" and the value "bos".
- When users download this object, this metadata is returned as well.
- An object may have multiple such entries, but the total size of all user metadata must not exceed 2 KB.
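The 2 KB limit can be checked on the client before the upload is sent. The following sketch is an illustration under assumptions: the `header_pair_t` type and both functions are hypothetical, and the size is assumed to be the sum of the name and value lengths of every `x-bce-meta-` entry, since this document does not state how BOS counts it.

```c
#include <assert.h>
#include <string.h>

#define USER_META_PREFIX    "x-bce-meta-"
#define USER_META_MAX_BYTES 2048 /* 2 KB limit stated above */

/* Hypothetical name/value pair; the SDK itself stores headers in a table. */
typedef struct {
    const char *name;
    const char *value;
} header_pair_t;

/* Sum name and value lengths of all user-defined metadata entries.
 * Assumption: names are already lower-case, so the prefix match is exact. */
size_t user_meta_size(const header_pair_t *headers, size_t count)
{
    size_t total = 0;
    size_t i;

    assert(headers != NULL || count == 0);
    for (i = 0; i < count; i++) {
        if (strncmp(headers[i].name, USER_META_PREFIX,
                    strlen(USER_META_PREFIX)) == 0) {
            total += strlen(headers[i].name) + strlen(headers[i].value);
        }
    }
    return total;
}

/* Return 1 if the user metadata fits within the limit, 0 otherwise. */
int user_meta_fits(const header_pair_t *headers, size_t count)
{
    return user_meta_size(headers, count) <= USER_META_MAX_BYTES;
}
```

Standard headers such as Content-Type are skipped by the prefix test, so only user-defined entries count toward the total.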
Set the copy attribute of the object
BOS provides a CopyObject API for copying an existing object to a different object. During the copying process, it evaluates the source object’s ETag or modification status to determine whether to proceed. Detailed parameter descriptions are as follows:
| Name | Type | Description | Required |
|---|---|---|---|
| x-bce-copy-source-if-match | string | If the ETag value of the source object matches the ETag value provided by the user, the copy operation is performed; otherwise, it fails. | No |
| x-bce-copy-source-if-none-match | string | If the ETag value of the source object does not match the ETag value provided by the user, the copy operation is performed; otherwise, it fails. | No |
| x-bce-copy-source-if-unmodified-since | string | If the source object has not been modified since x-bce-copy-source-if-unmodified-since, the copy operation proceeds; otherwise, it fails. | No |
| x-bce-copy-source-if-modified-since | string | If the source object has been modified since x-bce-copy-source-if-modified-since, the copy operation proceeds; otherwise, it fails. | No |
The corresponding example code:
bos_pool_t *p = NULL;
int is_cname = 0;
bos_status_t *s = NULL;
bos_request_options_t *options = NULL;
bos_string_t bucket;
bos_string_t object;
bos_string_t src_bucket;
bos_string_t src_object;
bos_string_t src_endpoint;
bos_table_t *resp_headers = NULL;
bos_pool_create(&p, NULL);
options = bos_request_options_create(p);
init_test_request_options(options, is_cname);
/* destination object */
bos_str_set(&bucket, TEST_BUCKET_NAME);
bos_str_set(&object, "test_copy.txt");
/* source object */
bos_str_set(&src_bucket, TEST_BUCKET_NAME);
bos_str_set(&src_object, "bos_test_put_object.ts");
bos_str_set(&src_endpoint, options->config->endpoint.data);
bos_copy_object_params_t *params = NULL;
params = bos_create_copy_object_params(p);
/* copy attributes are passed as request headers */
bos_table_t *headers = bos_table_make(p, 2);
apr_table_add(headers, "x-bce-metadata-directive", "replace");
apr_table_add(headers, "x-bce-storage-class", "STANDARD_IA");
json_t *root;
s = bos_copy_object(options, &src_bucket, &src_object, &bucket, &object, headers, &root, &resp_headers);
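The four conditional parameters in the table can be read as a predicate over the source object's ETag and modification time. The sketch below models that decision locally to make the semantics concrete. It is not SDK or server code: the struct and function are hypothetical, and it assumes that when several conditions are set, all of them must hold.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Hypothetical holder for the four conditions; unset fields are NULL or 0. */
typedef struct {
    const char *if_match;
    const char *if_none_match;
    time_t if_unmodified_since;
    time_t if_modified_since;
} copy_conditions_t;

/* Return 1 if the copy should proceed, 0 if it should fail.
 * Assumption: every condition that is set must be satisfied. */
int copy_should_proceed(const char *src_etag, time_t src_mtime,
                        const copy_conditions_t *c)
{
    assert(src_etag != NULL && c != NULL);
    if (c->if_match != NULL && strcmp(src_etag, c->if_match) != 0) {
        return 0; /* ETag differs from the expected one */
    }
    if (c->if_none_match != NULL && strcmp(src_etag, c->if_none_match) == 0) {
        return 0; /* ETag equals the excluded one */
    }
    if (c->if_unmodified_since != 0 && src_mtime > c->if_unmodified_since) {
        return 0; /* source was modified after the given time */
    }
    if (c->if_modified_since != 0 && src_mtime <= c->if_modified_since) {
        return 0; /* source was not modified after the given time */
    }
    return 1;
}
```

In the SDK example above, such conditions would presumably be added to the same `headers` table as the other copy attributes; that usage is an assumption and is not shown in this document.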
Set storage class when uploading an object
BOS supports standard storage, infrequent access storage, cold storage, and archive storage. To upload an object and store it in a given class, for example infrequent access storage, specify the StorageClass. The parameters corresponding to the four storage classes are as follows:
| Storage class | Parameters |
|---|---|
| Standard storage | STANDARD |
| Infrequent access storage | STANDARD_IA |
| Cold storage | COLD |
| Archive storage | ARCHIVE |
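To avoid typing the parameter strings by hand, the table can be mirrored in a small lookup. This is a sketch with hypothetical names (`storage_class_e`, `storage_class_param`, `storage_class_from_param`); only the four parameter strings come from the table above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical enum mirroring the table above. */
typedef enum {
    STORAGE_STANDARD = 0,
    STORAGE_STANDARD_IA,
    STORAGE_COLD,
    STORAGE_ARCHIVE,
    STORAGE_CLASS_COUNT
} storage_class_e;

static const char *const k_storage_params[STORAGE_CLASS_COUNT] = {
    "STANDARD", "STANDARD_IA", "COLD", "ARCHIVE"
};

/* Map a storage class to its parameter string. */
const char *storage_class_param(storage_class_e sc)
{
    assert((int)sc >= 0 && (int)sc < STORAGE_CLASS_COUNT);
    return k_storage_params[sc];
}

/* Map a parameter string back to the enum; returns -1 if it is unknown. */
int storage_class_from_param(const char *param)
{
    int i;

    assert(param != NULL);
    for (i = 0; i < STORAGE_CLASS_COUNT; i++) {
        if (strcmp(param, k_storage_params[i]) == 0) {
            return i;
        }
    }
    return -1;
}
```

The copy example earlier passes the class through an `x-bce-storage-class` header; the same header presumably applies on upload, although this document does not show it.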
Append upload
Objects created using the simple upload method described above are all of the standard type and do not support append writes. This limitation is inconvenient in scenarios where data is written continuously, such as log files, video surveillance, and live video streaming.
To address this, Baidu AI Cloud Object Storage (BOS) specifically supports the AppendObject method, which allows files to be uploaded in an append-write fashion. Objects created through the AppendObject operation are categorized as Appendable Objects, enabling data to be appended to them. The size limit for AppendObject files is 0–5 GB.
Example code for uploading via AppendObject is as follows:
bos_pool_t *p = NULL;
char *object_name = "bos_test_append_object_from_file";
bos_string_t bucket;
bos_string_t object;
char *filename = __FILE__;
bos_string_t append_file;
bos_status_t *s = NULL;
int is_cname = 0;
/* append position; 0 for a new object */
int64_t position = 0;
bos_table_t *headers = NULL;
bos_table_t *resp_headers = NULL;
bos_request_options_t *options = NULL;
/* test append object */
bos_pool_create(&p, NULL);
options = bos_request_options_create(p);
init_test_request_options(options, is_cname);
headers = bos_table_make(p, 0);
bos_str_set(&bucket, TEST_BUCKET_NAME);
bos_str_set(&object, object_name);
bos_str_set(&append_file, filename);
s = bos_append_object_from_file(options, &bucket, &object, position,
                                &append_file, headers, &resp_headers);
Multipart upload
In addition to simple and append uploads, BOS also offers another upload method: Multipart Upload. Users can utilize the Multipart Upload mode in various scenarios, including (but not limited to) the following:
- When resumable uploads are required.
- When uploading files larger than 5GB.
- When the connection to the BOS server is frequently interrupted due to unstable network conditions.
- When files need to be uploaded as a stream.
- When the file size cannot be determined before uploading.
The following will introduce the implementation of Multipart Upload step by step. Suppose there is a file with the local path /path/to/file.zip. Since the file is large, it will be transmitted to BOS in parts.
Initialize multipart upload
Use the bos_init_multipart_upload method to initialize a multipart upload event:
bos_status_t *s = NULL;
bos_table_t *resp_headers = NULL;
bos_string_t object;
bos_table_t *headers = NULL;
bos_table_t *complete_headers = NULL;
bos_string_t upload_id;
bos_upload_file_t *upload_file = NULL;
bos_list_upload_part_params_t *params = NULL;
bos_list_t complete_part_list;
bos_list_part_content_t *part_content = NULL;
bos_complete_part_content_t *complete_part_content = NULL;
int part_num = 1;
int64_t pos = 0;
int64_t file_length = 0;
bos_str_set(&object, "test1");
/* on success, upload_id identifies this multipart upload */
s = bos_init_multipart_upload(options, &bucket, &object,
                              &upload_id, headers, &resp_headers);
if (bos_status_is_ok(s)) {
    printf("Init multipart upload succeeded, upload_id:%.*s\n",
           upload_id.len, upload_id.data);
} else {
    printf("Init multipart upload failed\n");
    return;
}
The call returns the upload ID in upload_id, which uniquely identifies this multipart upload event and is used in all subsequent operations.
Upload parts
The file is then uploaded in multiple parts.
int res = BOSE_OK;
bos_file_buf_t *fb = bos_create_file_buf(p);
/* open the file once to determine its length */
res = bos_open_file_for_all_read(p, TEST_MULTIPART_FILE, fb);
if (res != BOSE_OK) {
    bos_error_log("Open read file fail, filename:%s\n", TEST_MULTIPART_FILE);
    return;
}
file_length = fb->file_last;
apr_file_close(fb->file);
/* upload the file in 2 MB parts; the last part may be shorter */
while (pos < file_length) {
    upload_file = bos_create_upload_file(p);
    bos_str_set(&upload_file->filename, TEST_MULTIPART_FILE);
    upload_file->file_pos = pos;
    pos += 2 * 1024 * 1024;
    upload_file->file_last = pos < file_length ? pos : file_length;
    s = bos_upload_part_from_file(options, &bucket, &object, &upload_id,
                                  part_num++, upload_file, &resp_headers);
    if (bos_status_is_ok(s)) {
        printf("Multipart upload part from file succeeded\n");
    } else {
        printf("Multipart upload part from file failed\n");
    }
}
The core of the above code is the call to bos_upload_part_from_file, which uploads each part. Note the following points:
- bos_upload_part_from_file requires every part except the last one to be at least 100 KB. However, the Upload Part API does not verify the size of the uploaded part immediately; the check is only performed when Complete Multipart Upload is called. If a part uploaded with upload_part does not meet the size requirement, the complete_multipart_upload API reports an error.
- To guard against errors during network transmission, it is recommended to use the Content-MD5 value returned by BOS for each part after upload_part to verify that the part data was uploaded correctly. Once all parts are combined into one object, the object no longer carries an MD5 value.
- The part number must be within the range of 1 to 10,000. If this limit is exceeded, BOS returns an InvalidArgument error code.
- For each uploaded part, the stream must be positioned at the beginning of the respective part.
- After each part is uploaded, the BOS response includes an ETag, which together with the part number (PartNumber) identifies the uploaded part. It is needed in the later step that completes the multipart upload, so it must be saved. Typically, these ETags are saved in a list.
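The size and numbering limits listed above can be checked before any part is sent. The following sketch computes the number of parts and the byte range of each part for a given part size. The names `count_parts`, `part_range`, and `part_range_t` are hypothetical; the `file_pos` and `file_last` fields only mirror the fields set in the upload loop.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_PART_BYTES  (100 * 1024) /* minimum size of every part but the last */
#define MAX_PART_NUMBER 10000        /* part numbers range from 1 to 10,000 */

/* Hypothetical description of one part of the file. */
typedef struct {
    int part_number;   /* 1-based */
    int64_t file_pos;  /* first byte of the part */
    int64_t file_last; /* one past the last byte of the part */
} part_range_t;

/* Return the number of parts, or -1 if part_size violates the limits. */
int64_t count_parts(int64_t file_length, int64_t part_size)
{
    int64_t n;

    assert(file_length > 0);
    if (part_size < MIN_PART_BYTES) {
        return -1; /* parts other than the last would be too small */
    }
    n = (file_length + part_size - 1) / part_size; /* round up */
    return n <= MAX_PART_NUMBER ? n : -1;
}

/* Compute the byte range of the given 1-based part. */
part_range_t part_range(int64_t file_length, int64_t part_size, int part_number)
{
    part_range_t r;

    assert(part_number >= 1 && part_size > 0);
    r.part_number = part_number;
    r.file_pos = (int64_t)(part_number - 1) * part_size;
    r.file_last = r.file_pos + part_size < file_length
                      ? r.file_pos + part_size
                      : file_length; /* the last part may be shorter */
    return r;
}
```

With a 5 MB file and 2 MB parts this yields three parts, the last covering the final 1 MB. If `count_parts` returns -1, a different part size should be chosen before starting the upload.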
Complete multipart upload
Complete the multipart upload as shown in the following code:
/* list the first page of uploaded parts */
s = bos_list_upload_part(options, &bucket, &object, &upload_id,
                         params, &list_part_resp_headers);
CuAssertIntEquals(tc, 200, s->code);
CuAssertIntEquals(tc, 1, params->truncated);
CuAssertStrEquals(tc, expect_part_num_marker,
                  params->next_part_number_marker.data);
CuAssertPtrNotNull(tc, list_part_resp_headers);
/* copy part number and ETag of each listed part into complete_part_list */
bos_list_for_each_entry(bos_list_part_content_t, part_content1, &params->part_list, node) {
    complete_content1 = bos_create_complete_part_content(p);
    bos_str_set(&complete_content1->part_number, part_content1->part_number.data);
    bos_str_set(&complete_content1->etag, part_content1->etag.data);
    bos_list_add_tail(&complete_content1->node, &complete_part_list);
}
/* continue listing from the returned marker */
bos_list_init(&params->part_list);
if (params->next_part_number_marker.data) {
    bos_str_set(&params->part_number_marker, params->next_part_number_marker.data);
}
s = bos_list_upload_part(options, &bucket, &object, &upload_id, params, &list_part_resp_headers);
CuAssertIntEquals(tc, 200, s->code);
CuAssertIntEquals(tc, 0, params->truncated);
CuAssertPtrNotNull(tc, list_part_resp_headers);
bos_list_for_each_entry(bos_list_part_content_t, part_content2, &params->part_list, node) {
    complete_content2 = bos_create_complete_part_content(p);
    bos_str_set(&complete_content2->part_number, part_content2->part_number.data);
    bos_str_set(&complete_content2->etag, part_content2->etag.data);
    bos_list_add_tail(&complete_content2->node, &complete_part_list);
}
bos_complete_part_content_t *content_test;
bos_list_for_each_entry(bos_complete_part_content_t, content_test, &complete_part_list, node) {
    /* inspect the collected parts here if needed */
}
/* submit the part list to assemble the object */
s = bos_complete_multipart_upload(options, &bucket, &object, &upload_id,
                                  &complete_part_list, complete_headers, &resp_headers);
The complete_part_list in the above code holds the part number and ETag of every uploaded part; here it is rebuilt from the results of bos_list_upload_part. After BOS receives the part list submitted by the user, it verifies each part one by one. Once all parts are validated, BOS assembles them into a complete object.
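An alternative to listing the parts afterwards is to record each part number and ETag as the parts are uploaded. The sketch below shows such a record list with caller-provided storage. All names are hypothetical, the 64-byte ETag buffer is an assumed upper bound, and the ordering check is a local sanity check rather than a documented BOS requirement.

```c
#include <assert.h>
#include <string.h>

#define ETAG_MAX_LEN 64 /* assumed upper bound for an ETag string */

/* Hypothetical record of one uploaded part. */
typedef struct {
    int part_number;
    char etag[ETAG_MAX_LEN];
} part_record_t;

/* Hypothetical list backed by caller-provided storage. */
typedef struct {
    part_record_t *items;
    int capacity;
    int count;
} part_list_t;

/* Save the part number and ETag returned for one part. Returns 0 or -1. */
int part_list_add(part_list_t *list, int part_number, const char *etag)
{
    size_t len;
    part_record_t *r;

    assert(list != NULL && etag != NULL);
    len = strlen(etag);
    if (part_number < 1 || part_number > 10000) return -1;
    if (list->count >= list->capacity) return -1;
    if (len >= ETAG_MAX_LEN) return -1;
    r = &list->items[list->count++];
    r->part_number = part_number;
    memcpy(r->etag, etag, len + 1); /* include the terminating NUL */
    return 0;
}

/* Return 1 if parts 1..expected were recorded in upload order. */
int part_list_is_complete(const part_list_t *list, int expected)
{
    int i;

    assert(list != NULL);
    if (list->count != expected) return 0;
    for (i = 0; i < list->count; i++) {
        if (list->items[i].part_number != i + 1) return 0;
    }
    return 1;
}
```

Each saved record would then be converted into a bos_complete_part_content_t entry in the same way as the loop over the listed parts does.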
Cancel multipart upload event
Users can cancel a multipart upload with the bos_abort_multipart_upload method.
bos_status_t *s = NULL;
bos_table_t *resp_headers = NULL;
bos_string_t object;
bos_table_t *headers = NULL;
bos_string_t upload_id;
bos_str_set(&object, "test1");
/* upload_id is the ID returned when the multipart upload was initialized */
s = bos_abort_multipart_upload(options, &bucket, &object, &upload_id,
                               &resp_headers);
Retrieve unfinished multipart upload event
Users can obtain the unfinished multipart upload events in a bucket with the bos_list_multipart_upload method.
bos_pool_t *p = NULL;
bos_string_t bucket;
char *object_name1 = "bos_test_abort_multipart_upload1";
char *object_name2 = "bos_test_abort_multipart_upload2";
int is_cname = 0;
bos_request_options_t *options = NULL;
bos_string_t upload_id1;
bos_string_t upload_id2;
bos_status_t *s = NULL;
bos_table_t *resp_headers;
bos_list_multipart_upload_params_t *params = NULL;
char *expect_next_key_marker = "bos_test_abort_multipart_upload1";
bos_pool_create(&p, NULL);
options = bos_request_options_create(p);
init_test_request_options(options, is_cname);
/* start two multipart uploads so there is something to list */
s = init_test_multipart_upload(options, TEST_BUCKET_NAME, object_name1, &upload_id1);
CuAssertIntEquals(tc, 200, s->code);
s = init_test_multipart_upload(options, TEST_BUCKET_NAME, object_name2, &upload_id2);
CuAssertIntEquals(tc, 200, s->code);
/* list at most one upload event per request */
params = bos_create_list_multipart_upload_params(p);
params->max_ret = 1;
bos_str_set(&bucket, TEST_BUCKET_NAME);
s = bos_list_multipart_upload(options, &bucket, params, &resp_headers);
Note:
- By default, if a bucket contains more than 1,000 multipart upload events, only 1,000 events are returned. In that case, the is_truncated field in the result is set to True, and the next_marker value indicates the starting point for the next retrieval.
- To retrieve more multipart upload events, you can use the set_marker function to set a marker for batch reading.
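The truncation and marker behaviour described in the note can be illustrated with a small self-contained loop. The sketch below uses a mock listing function over a sorted array of keys instead of the real service. All names are hypothetical, and it assumes the next marker is the last key returned on a truncated page.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result of one listing request. */
typedef struct {
    int returned;            /* number of events on this page */
    int truncated;           /* 1 if more events remain */
    const char *next_marker; /* where the next request should start */
} list_page_t;

/* Mock of the service: keys are sorted, and a page holds up to max_ret
 * keys that sort strictly after the marker (NULL means from the start). */
list_page_t mock_list_uploads(const char **keys, int key_count,
                              const char *marker, int max_ret)
{
    list_page_t page = { 0, 0, NULL };
    int start = 0;
    int remaining;

    assert(max_ret > 0 && key_count >= 0);
    if (marker != NULL) {
        while (start < key_count && strcmp(keys[start], marker) <= 0) {
            start++;
        }
    }
    remaining = key_count - start;
    page.returned = remaining < max_ret ? remaining : max_ret;
    if (start + page.returned < key_count) {
        page.truncated = 1;
        page.next_marker = keys[start + page.returned - 1];
    }
    return page;
}

/* Keep requesting pages until the result is no longer truncated. */
int list_all_uploads(const char **keys, int key_count, int max_ret)
{
    const char *marker = NULL;
    list_page_t page;
    int total = 0;

    do {
        page = mock_list_uploads(keys, key_count, marker, max_ret);
        total += page.returned;
        marker = page.next_marker;
    } while (page.truncated);
    return total;
}
```

With the real SDK, the loop body would call bos_list_multipart_upload and carry the returned marker over into the next request.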
Get all uploaded part information
Users can obtain all uploaded parts of an upload event with the bos_list_upload_part method.
params = bos_create_list_upload_part_params(p);
/* return at most one part per request */
params->max_ret = 1;
bos_list_init(&complete_part_list);
s = bos_list_upload_part(options, &bucket, &object, &upload_id,
                         params, &list_part_resp_headers);
Resumable upload
When a large file is uploaded to BOS and the network is unstable or the program crashes, the entire upload fails, the parts uploaded before the failure are lost, and the user has to start over. This wastes resources, and in an unstable network the upload may still not complete after multiple retries. For these scenarios, BOS provides resumable upload:
- In a generally stable network, it is recommended to use the three-step multipart upload, dividing the object into 5 MB parts; see [Multipart Upload](#Multipart upload).
- If the network condition is very poor, it is recommended to use the append_object method for resumable upload, appending a small amount of data (256 KB) each time; see [Append Upload](#Append upload).
Tips
- Resumable upload is an encapsulation and enhancement of multipart upload, implemented using multipart upload.
- For large files or poor network environments, it is recommended to use multipart upload.
