Upload files
In BOS, the basic data unit for user operations is an object. An object consists of a key, metadata, and data. The key is the object’s name, the metadata provides a user-defined description as a set of name-value pairs, and the data is the content of the object.
The BOS C++ SDK provides a rich set of file upload APIs, and files can be uploaded in the following ways:
- Simple upload
- Append upload
- Multipart upload
- Resumable upload
Simple upload
In simple upload scenarios, BOS supports uploading objects in the form of specified files, data streams, file descriptors, and strings. Please refer to the following code:
```cpp
int PutObjectDemo(Client& client, const std::string& bucketName, const std::string& objectKey) {
    // Obtain a file data stream
    FileInputStream inputStream("/path/to/test.zip"); // bcesdk/util/util.h
    int ret = 0;
    // Upload an object by file name
    ret = client.upload_file(bucketName, objectKey, "/path/to/test.zip");
    // Upload an object from a data stream
    ret = client.upload_file(bucketName, objectKey, inputStream);
    // Upload an object from a file descriptor
    fd_t fd = open("/path/to/test.zip", O_RDWR, 0666); // On Linux, fd_t is defined in common.h
    ret = client.upload_file(bucketName, objectKey, fd);
    // Upload an object from a string
    std::string data = "this is data";
    ret = client.put_object(bucketName, objectKey, data);
    return ret;
}
```
Objects are uploaded to BOS as files. The `put_object` and `upload_file` functions support objects of up to 5 GB in size. For files larger than 5 GB, please use `upload_super_file`. The reference implementation is as follows:
```cpp
int PutLargeObjectDemo(Client& client, const std::string& bucketName, const std::string& objectKey) {
    std::string fileName = "/path/to/test.zip";
    return client.upload_super_file(bucketName, objectKey, fileName); // The third parameter can also be an fd_t
}
```
Set file meta information
Object metadata refers to the attributes of files provided by users when uploading to BOS. It is mainly divided into two categories: standard HTTP attribute settings (HTTP headers) and user-defined metadata.
- Set the HTTP headers of an object
The BOS C++ SDK is essentially a wrapper around the backend HTTP API, so users can customize an object's HTTP headers during upload. Common HTTP headers are described as follows:
| Name | Description | Default value |
|---|---|---|
| Content-MD5 | File data verification: After setting, BOS will enable file content MD5 verification, compare the MD5 you provide with the MD5 of the file, and throw an error if they are inconsistent | None |
| Content-Type | File MIME: This defines the file type and web page encoding, determining how the browser reads the file. If unspecified, BOS generates it based on the file's extension. If the file lacks an extension, a default value will be applied. | application/octet-stream |
| Content-Disposition | Indicate how the MIME user agent displays the attached file, whether to open or download it, and the file name | None |
| Content-Length | The length of the uploaded data. If it exceeds the length of the stream/file, it is truncated to the actual length; if it is smaller, only the specified number of bytes is uploaded | Stream/file length |
| Expires | Cache expiration time | None |
| Cache-Control | Specify the caching behavior of the web page when the object is downloaded | None |
Reference code is as follows:
```cpp
...
// Initialize meta
ObjectMetaData meta;
// Set Content-Type
meta.set_content_type("application/json");
// Set Cache-Control
meta.set_cache_control("no-cache");
// Set x-bce-storage-class
meta.set_storage_class("STANDARD");
ret = client.upload_file(bucketName, objectKey, content, meta);
...
```
- User-defined meta information
BOS supports user-defined metadata for describing objects. Example usage is shown in the following code:
```cpp
// Set the value of the custom metadata "name" to "my-data"
meta.set_user_meta("name", "my-data");

// Upload the object
ret = client.upload_file(bucketName, objectKey, file_name, meta);
```
Prompt:
- In the above code, the user defines a piece of metadata with the name `name` and the value `my-data`. When users download this object, this metadata can also be obtained.
- An object may carry multiple such entries, but the total size of all user metadata must not exceed 2 KB.
Set the copy attribute of the object
BOS provides a CopyObject API for copying an existing object to a different object. During the copy, the source object's ETag or modification time can be checked to decide whether the operation proceeds. Detailed parameter descriptions are as follows:
| Name | Types | Description | Whether required |
|---|---|---|---|
| x-bce-copy-source-if-match | std::string | If the ETag value of the source object matches the ETag value provided by the user, the copy operation is performed; otherwise, it fails. | No |
| x-bce-copy-source-if-none-match | std::string | If the ETag value of the source object does not match the ETag value provided by the user, the copy operation is performed; otherwise, it fails. | No |
| x-bce-copy-source-if-unmodified-since | std::string | If the source object has not been modified since x-bce-copy-source-if-unmodified-since, the copy operation will proceed; otherwise, it will fail. | No |
| x-bce-copy-source-if-modified-since | std::string | If the source object has been modified since x-bce-copy-source-if-modified-since, the copy operation will proceed; otherwise, it will fail. | No |
The corresponding example code:
```cpp
// Initialize BosClient
Client client = ...;
// Create a CopyObjectRequest object
CopyObjectRequest copyObjectRequest(destBucketName, destKey, srcBucketName, srcKey);
CopyObjectResponse copyObjectResponse;
// Set new metadata
ObjectMetaData meta;
StringMap& userMetadata = *(meta.mutable_user_meta()); // StringMap == map<string, string>
userMetadata.clear();
userMetadata["<user-meta-key>"] = "<user-meta-value>";
copyObjectRequest.set_meta(&meta, false); // If the second parameter (is_own) is true, the meta will be deleted when copyObjectRequest is destructed
// copy-source-if-match
copyObjectRequest.set_if_match("111111111183bf192b57a4afc76fa632");
// copy-source-if-none-match
copyObjectRequest.set_if_none_match("111111111183bf192b57a4afc76fa632");

std::string gmtDate = TimeUtil::now_gmttime(); // Current time in GMT format
// copy-source-if-modified-since
copyObjectRequest.set_if_modified_since(gmtDate);
// copy-source-if-unmodified-since
copyObjectRequest.set_if_unmodified_since(gmtDate);
// Copy the object
client.copy_object(copyObjectRequest, &copyObjectResponse);
std::cout << "ETag: " << copyObjectResponse.etag() << " LastModified: " << copyObjectResponse.last_modified() << std::endl;
```
Set storage class when uploading an object
BOS supports the standard, infrequent access, cold, and archive storage classes. Uploading an object into a non-default class (for example, infrequent access) is achieved by specifying its StorageClass. The parameters corresponding to the four storage classes are as follows:
| Storage class | Parameters |
|---|---|
| Standard storage | STANDARD |
| Infrequent access storage | STANDARD_IA |
| Cold storage | COLD |
| Archive storage | ARCHIVE |
Taking infrequent access storage as an example, the code is as follows:
```cpp
void print_common_response(BceResponse& result) {
    printf("status:%d\n", result.status_code());
    if (result.is_ok()) {
        printf("request-id:%s\n", result.request_id().c_str());
        printf("debug-id:%s\n", result.debug_id().c_str());
    }
    if (result.is_fail()) {
        printf("error-message:%s\n", result.error().message().c_str());
    }
}

int putObjectStorageClass() {
    std::string filename = "file.txt";
    FileInputStream file(filename);
    PutObjectRequest request(bucket, object, &file);
    request.mutable_meta()->set_storage_class("STANDARD_IA");
    PutObjectResponse result;
    client.put_object(request, &result);
    print_common_response(result);
    printf("etag: %s\n", result.etag().c_str());
    return 0;
}
```
Using upload progress bar
```cpp
// Upload progress callback function
// Note: Avoid time-consuming/blocking operations in this callback, as they will hurt upload performance.
// increment:   the amount of data uploaded in this call
// transferred: the total amount of data uploaded so far
// total:       the total amount of data to be uploaded
// user_data:   user-defined data, such as the object's bucket + key
void progress_callback(int64_t increment, int64_t transferred, int64_t total, void* user_data) {
    std::cout << "progress_callback[" << user_data << "] => " <<
        increment << ", " << transferred << ", " << total << std::endl;
}

// File to be uploaded
std::string filename = "/tmp/put_file_test";
FileInputStream file(filename);
PutObjectRequest req(BUCKET, "transfer_progress_t1", &file);
PutObjectResponse rsp;

// Set the upload-progress data
// The TransferProgress structure is in the header file "bcesdk/common/common.h"
TransferProgress progress;
progress.transfer_progress_cb = progress_callback;
req.set_progress(progress);
// Upload data from the file `filename`
int ret = client()->put_object(req, &rsp);
if (ret) {
    LOGF(WARN, "client err: %d", ret);
}
if (rsp.is_fail()) {
    LOGF(WARN,
        "put_object: [status_code = %d], [message = %s], [requestid = %s]",
        rsp.status_code(),
        rsp.error().message().c_str(),
        rsp.error().request_id().c_str());
}
```
Append upload
Objects created using the simple upload method described above are all of a standard type and do not support append writes. This limitation can be inconvenient in scenarios that continually produce new data, such as log files, video surveillance, and live video streaming.
To address this, Baidu AI Cloud Object Storage (BOS) specifically supports the AppendObject method, which allows files to be uploaded in an append-write fashion. Objects created through the AppendObject operation are categorized as Appendable Objects, enabling data to be appended to them. The size limit for AppendObject files is 0–5 GB.
Example code for uploading via AppendObject is as follows:
```cpp
int AppendObjectDemo(Client& client, const std::string& bucketName, const std::string& objectKey) {
    // Obtain a data stream
    FileInputStream inputStream("/path/to/test.zip");
    // Append-upload an object from a data stream
    AppendObjectRequest appendObjectFromInputStreamRequest(bucketName, objectKey, &inputStream);
    AppendObjectResponse appendObjectFromInputStreamResponse;
    int ret = client.append_object(appendObjectFromInputStreamRequest, &appendObjectFromInputStreamResponse);

    // Append-upload an object from a string
    std::string data = "this is data";
    AppendObjectRequest appendObjectFromStringRequest(bucketName, objectKey, data);
    AppendObjectResponse appendObjectFromStringResponse;
    ret = client.append_object(appendObjectFromStringRequest, &appendObjectFromStringResponse);

    // Print the ETag
    std::cout << appendObjectFromInputStreamResponse.etag() << std::endl;
    // Print the NextAppendOffset
    std::cout << appendObjectFromInputStreamResponse.next_append_offset() << std::endl;

    // Append again: the request must carry the position of the next append write,
    // taken from the most recent append's response
    long long nextAppendOffset = appendObjectFromStringResponse.next_append_offset();
    AppendObjectRequest appendWithOffsetRequest(bucketName, objectKey, data);
    appendWithOffsetRequest.set_offset(nextAppendOffset);
    AppendObjectResponse appendWithOffsetResponse;
    ret = client.append_object(appendWithOffsetRequest, &appendWithOffsetResponse);
    return ret;
}
```
Multipart upload
In addition to simple and append uploads, BOS also offers another upload method: Multipart Upload. Users can utilize the Multipart Upload mode in various scenarios, including (but not limited to) the following:
- When resumable uploads are required.
- When uploading files larger than 5GB.
- When the connection to the BOS server is frequently interrupted due to unstable network conditions.
- When streaming uploads are required.
- When the file size cannot be determined before uploading.
The following will introduce the implementation of Multipart Upload step by step. Suppose there is a file with the local path /path/to/file.zip. Since the file is large, it will be transmitted to BOS in parts.
Initialize multipart upload
Use the `init_multipart_upload` method to initialize a multipart upload event:
```cpp
// Initiate the multipart upload
InitMultiUploadRequest initMultiUploadRequest(bucketName, objectKey);
InitMultiUploadResponse initMultiUploadResponse;
int ret = client.init_multipart_upload(initMultiUploadRequest, &initMultiUploadResponse);
// Exception handling
...
// Print the UploadId
std::cout << "UploadId: " << initMultiUploadResponse.upload_id() << std::endl;
```
The result in `initMultiUploadResponse` contains the UploadId, the unique identifier of this multipart upload event; it is used in all subsequent operations.
- Initialization for uploading infrequent access storage class objects
Initialize a multipart upload event for infrequent access storage:
```cpp
void putMultiUploadStorageClass() {
    ObjectMetaData meta;
    meta.set_storage_class("STANDARD_IA");
    InitMultiUploadRequest initMultiUploadRequest(bucketName, objectKey);
    InitMultiUploadResponse initMultiUploadResponse;
    initMultiUploadRequest.set_meta(&meta);
    client.init_multipart_upload(initMultiUploadRequest, &initMultiUploadResponse);
}
```
- Initialization for uploading a cold storage class object
Initialize a multipart upload event for cold storage:
```cpp
void putMultiUploadStorageClass() {
    ObjectMetaData meta;
    meta.set_storage_class("COLD");
    InitMultiUploadRequest initMultiUploadRequest(bucketName, objectKey);
    InitMultiUploadResponse initMultiUploadResponse;
    initMultiUploadRequest.set_meta(&meta);
    client.init_multipart_upload(initMultiUploadRequest, &initMultiUploadResponse);
}
```
Upload parts
The file is then uploaded in multiple parts.
```cpp
// Set each part to 5 MB
// [Note] Except for the last part, every part must be >= 100 KB
long partSize = 1024 * 1024 * 5L;
// Note: When the part data is a string / in-memory data, the UploadPartRequest constructor is:
// UploadPartRequest(const std::string& bucket_name, const std::string& object_name, const std::string& data, int part_number, const std::string& upload_id)
// The data field is std::string; do not pass a C-style char* string, as it will cause an error in calculating the data size.
// File to be uploaded in parts
std::string partFileName = "/path/to/file.zip";
FileInputStream file(partFileName);
// Calculate the number of parts
int partCount = static_cast<int>(file.get_size() / partSize);
if (file.get_size() % partSize != 0) {
    partCount++;
}
int64_t size = file.get_size();
int64_t off = 0;
std::vector<part_t> partEtags;
for (int i = 0; off < size; ++i) {
    if (off + partSize > size) {
        partSize = size - off;
    }
    FileInputStream partFile(file.fd(), off, partSize);
    UploadPartRequest uploadPartRequest(bucketName, objectName, partFile, i + 1, initMultiUploadResponse.upload_id());
    UploadPartResponse uploadPartResponse;
    int ret = client.upload_part(uploadPartRequest, &uploadPartResponse);
    // Verify the return value
    // Save the returned part number and ETag
    part_t partInfo;
    partInfo.part_number = i + 1;
    partInfo.etag = uploadPartResponse.etag();
    partEtags.push_back(partInfo);

    off += partSize;
}
```
The core of the above code is to call the `upload_part` method for each part (parts may also be uploaded concurrently). The following points should be noted:
- `upload_part` requires that every part except the last be at least 100 KB. However, the Upload Part API does not verify part sizes immediately; verification happens only when Complete Multipart Upload is called. If a part uploaded via `upload_part` does not meet this requirement, the `complete_multipart_upload` API will report an error.
- To guard against network transmission errors, it is recommended to verify each part after `upload_part` using the Content-MD5 value returned by BOS. Once all parts are combined into one object, it no longer carries an MD5 value.
- The part number must be within the range of 1 to 10,000. If this limit is exceeded, BOS returns an InvalidArgument error code.
- For each uploaded part, the stream must be positioned at the beginning of that part.
- After each part upload, BOS's response includes an `ETag`; together with the part number (PartNumber) it is needed to complete the multipart upload, so it must be saved. Generally, these entries are kept in a vector.
Complete multipart upload
Complete the multipart upload as shown in the following code:
```cpp
CompleteMultipartUploadRequest completeMultipartUploadRequest(bucketName, objectKey, initMultiUploadResponse.upload_id());
// Add part information, i.e., the order in which parts are merged
for (const part_t& partInfo : partEtags) {
    completeMultipartUploadRequest.add_part(partInfo.part_number, partInfo.etag);
}

// Complete the multipart upload
CompleteMultipartUploadResponse completeMultipartUploadResponse;
int ret = client.complete_multipart_upload(completeMultipartUploadRequest, &completeMultipartUploadResponse);

// Print the object's ETag
std::cout << completeMultipartUploadResponse.etag() << std::endl;
```
The `partEtags` in the above code is the list of `part_t` saved in the second step. After BOS receives the part list submitted by the user, it verifies the validity of each part one by one. Once all parts are validated, BOS assembles them into a complete object.
Cancel multipart upload event
Users can cancel multipart uploads by using the `abort_multipart_upload` method.
```cpp
AbortMultipartUploadRequest abortMultipartUploadRequest(bucketName, objectKey, uploadId);
AbortMultipartUploadResponse abortMultipartUploadResponse;
// Cancel the multipart upload
int ret = client.abort_multipart_upload(abortMultipartUploadRequest, &abortMultipartUploadResponse);
```
Retrieve unfinished multipart upload event
Users can obtain the unfinished multipart upload events in the bucket by the list_multipart_uploads method.
```cpp
ListMultipartUploadsRequest listMultipartUploadsRequest(bucketName);
ListMultipartUploadsResponse listMultipartUploadsResponse;
// Retrieve all upload events within the bucket
int ret = client.list_multipart_uploads(listMultipartUploadsRequest, &listMultipartUploadsResponse);
if (ret != 0) {
    return ret;
}
// Traverse all upload events
for (const MultipartUploadSummary& multipartUpload : listMultipartUploadsResponse.uploads()) {
    std::cout << "Key: " << multipartUpload.key <<
        " UploadId: " << multipartUpload.upload_id << std::endl;
}
```
Note:
- By default, if the number of multipart upload events in a bucket exceeds 1,000, only 1,000 entries are returned; `is_truncated` in the result will be `true`, and `next_marker` is returned as the starting point for the next read.
- To retrieve more multipart upload events, use the `set_marker` function to set the `marker` for batch reading.
Get all uploaded part information
Users can obtain all uploaded parts of an upload event with the `list_parts` method.
```cpp
ListPartsRequest listPartsRequest(bucketName, objectKey, uploadId);

// Retrieve all uploaded part information
ListPartsResponse listPartsResponse;
int ret = client.list_parts(listPartsRequest, &listPartsResponse);
if (ret != 0) {
    return ret;
}
// Traverse all parts
for (const PartSummary& part : listPartsResponse.parts()) {
    std::cout << "PartNumber: " << part.part_number << " ETag: " << part.etag << std::endl;
}
```
If you need to view the storage class of an object, use the following code:
```cpp
int listPartsStorageClass() {
    ListPartsRequest listPartsRequest(bucketName, objectKey, uploadId);

    // Retrieve all uploaded part information
    ListPartsResponse listPartsResponse;
    int ret = client.list_parts(listPartsRequest, &listPartsResponse);
    if (ret != 0) {
        return ret;
    }
    std::string storageClass = listPartsResponse.storage_class();
    return 0;
}
```
Resumable upload
When a user uploads a large file to BOS, an unstable network or a program crash fails the entire upload, and the parts already transferred before the failure are invalidated; the user has to start over. This wastes resources, and in an unstable network the upload may never complete even after repeated retries. For these scenarios, BOS provides resumable upload:
- In a generally stable network, it is recommended to use the three-step multipart method, dividing the object into 5 MB parts; refer to [Multipart Upload](#Multipart upload).
- If your network conditions are very poor, it is recommended to use the `append_object` method for resumable upload, appending a small amount of data (256 KB) at a time; refer to [Append Upload](#Append upload).
Tips
- Resumable upload is an encapsulation and enhancement of multipart upload, and is implemented on top of it.
- For large files or poor network environments, multipart upload is recommended.
