Upload files
In BOS, the basic data unit for user operations is an object. An object consists of a key, metadata, and data. The key is the object’s name, the metadata provides a user-defined description as a set of name-value pairs, and the data is the content of the object.
The BOS Ruby SDK provides a rich set of file upload APIs, and files can be uploaded in the following ways:
- Simple upload
- Append upload
- Multipart upload
- Resumable upload
Simple upload
In simple upload scenarios, BOS supports uploading an object from a specified file, a data stream, a binary string, or a string. Refer to the following code:
# Upload an object from a data stream
client.put_object(bucket_name, object_name, data)
# Upload an object from a string
client.put_object_from_string(bucket_name, object_name, "string")
# Upload an object directly from a file
client.put_object_from_file(bucket_name, object_name, file_path)
Files are uploaded to BOS as objects. The PutObject family of APIs supports uploading objects no larger than 5 GB. Once a PutObject request is successfully processed, BOS returns the object's ETag in the response header as its unique identifier.
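For a simple upload, the returned ETag is normally the MD5 of the uploaded content, so it can be checked against a locally computed digest. A minimal sketch, assuming the client returns a parsed response with an 'etag' key (the same shape the multipart example below relies on):

require 'digest'

local_md5 = Digest::MD5.file(file_path).hexdigest
response = client.put_object_from_file(bucket_name, object_name, file_path)
# 'etag' as a key of the parsed response is an assumption about this SDK's return shape
raise "upload corrupted" unless response['etag'].to_s.delete('"') == local_md5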
Set file meta information
Object metadata refers to the attributes of files provided by users when uploading to BOS. It is mainly divided into two categories: standard HTTP attribute settings (HTTP headers) and user-defined metadata.
Set the object's HTTP headers
The BOS Ruby SDK essentially interacts with the backend HTTP API, allowing users to customize HTTP headers when uploading files. Descriptions of commonly used HTTP headers are listed below:
| Name | Description | Default value |
|---|---|---|
| Content-MD5 | File data verification: once set, BOS enables MD5 verification of the file content, compares the MD5 you provide with the MD5 of the uploaded file, and returns an error if they are inconsistent | None |
| Content-Type | File MIME type: defines the file type and web page encoding, determining how the browser reads the file. If unspecified, BOS generates it from the file's extension; if the file lacks an extension, the default value is applied. | application/octet-stream |
| Content-Disposition | Indicates how the MIME user agent should present the attached file (open inline or download) and the file name | None |
| Content-Length | The length of the data to upload. If it exceeds the actual length of the stream/file, the upload is truncated to the actual length; if it is smaller, only the specified number of bytes is uploaded | Stream/file length |
| Expires | Cache expiration time | None |
| Cache-Control | Specify the caching behavior of the web page when the object is downloaded | None |
Reference code is as follows:
options = { Http::CONTENT_TYPE => 'string',
    Http::CONTENT_MD5 => 'md5',
    Http::CONTENT_DISPOSITION => 'inline',
    'key1' => 'value1'
}
client.put_object_from_string(bucket_name, object_name, "string", options)
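In the snippet above, 'md5' is a placeholder. Per the HTTP convention (RFC 1864), Content-MD5 is the base64 encoding of the binary MD5 digest of the request body; assuming BOS follows this standard encoding, the header value can be computed like so:

require 'digest'

content = "string"
options = {
    # Content-MD5 is the base64-encoded binary MD5 of the body (RFC 1864);
    # BOS is assumed to follow this standard encoding
    Http::CONTENT_MD5 => Digest::MD5.base64digest(content)
}
client.put_object_from_string(bucket_name, object_name, content, options)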
User-defined meta information
BOS supports user-defined metadata for describing objects. Example usage is shown in the following code:
options = {
    'user-metadata' => { "key1" => "value1" }
}
client.put_object_from_string(bucket_name, object_name, "string", options)
Note:
- In the code above, the user has defined a piece of custom metadata named "key1" with the value "value1".
- When the object is downloaded, this metadata is also returned.
- An object may carry multiple such entries, but the total size of all user-defined metadata must not exceed 2 KB.
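To verify the metadata without downloading the object body, a metadata query can be used. This is a sketch only, assuming this SDK exposes a get_object_meta_data method analogous to the other BOS SDKs:

# get_object_meta_data is an assumption modeled on the other BOS SDKs
response = client.get_object_meta_data(bucket_name, object_name)
# user-defined entries travel as x-bce-meta-* response headers
puts response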
Set storage class when uploading an object
BOS supports standard storage, infrequent access storage, and cold storage. An object's storage class is chosen by specifying StorageClass at upload time. The parameters corresponding to the three storage classes are as follows:
| Storage class | Parameters |
|---|---|
| Standard storage | STANDARD |
| Infrequent access storage | STANDARD_IA |
| Cold storage | COLD |
Taking infrequent access storage as an example, the code is as follows:
# Upload an infrequent access object (the default is a standard object)
client.put_object_from_file(bucket_name, object_name, file_path, Http::BOS_STORAGE_CLASS => 'STANDARD_IA')
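Uploading a cold storage object works the same way, only with a different parameter value from the table above:

# Upload a cold storage object
client.put_object_from_file(bucket_name, object_name, file_path, Http::BOS_STORAGE_CLASS => 'COLD')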
After the PutObject request is successfully processed, BOS returns the object's Content-MD5 in the response header, which users can use to verify the uploaded file.
Append upload
Objects created using the simple upload methods described above are all of the standard type and do not support append writes. This is inconvenient in scenarios that require frequent appends, such as log files, video surveillance, and live video streaming, where the whole object would otherwise have to be re-uploaded.
To address this, Baidu AI Cloud Object Storage (BOS) specifically supports the AppendObject method, which allows files to be uploaded in an append-write fashion. Objects created through the AppendObject operation are categorized as Appendable Objects, enabling data to be appended to them. The size limit for AppendObject files is 0–5 GB.
Example code for uploading via AppendObject is as follows:
# Upload an appendable object from a string
client.append_object_from_string(bucket_name, object_name, "string")
# Continue appending at offset 6, the current length of the object ("string")
client.append_object_from_string(bucket_name, object_name, "append_str", 'offset' => 6)
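On top of this, a local file can be streamed to BOS in small appends while the offset is tracked client-side. A minimal sketch using only the call shown above (the 256 KB chunk size follows the recommendation in the resumable upload section below):

# Append a local file in 256 KB chunks, tracking the next offset locally;
# assumes object_name does not exist yet, so the first call creates it
chunk_size = 256 * 1024
offset = 0
File.open(file_path, 'rb') do |f|
  while (chunk = f.read(chunk_size))
    if offset == 0
      client.append_object_from_string(bucket_name, object_name, chunk)
    else
      client.append_object_from_string(bucket_name, object_name, chunk, 'offset' => offset)
    end
    offset += chunk.bytesize
  end
end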
Multipart upload
Besides simple upload and append upload, BOS also offers another upload method called Multipart Upload. This mode can be used in the following scenarios (but is not limited to them):
- When resumable uploads are required.
- When uploading files larger than 5 GB.
- When the connection to the BOS server is frequently interrupted due to unstable network conditions.
- When streaming uploads are required.
- When the file size cannot be determined before uploading.
The following will introduce the implementation of Multipart Upload step by step. Suppose there is a file with the local path /path/to/file.zip. Since the file is large, it will be transmitted to BOS in parts.
Initialize Multipart Upload
Use initiate_multipart_upload method to initialize a multipart upload event:
upload_id = client.initiate_multipart_upload(bucket_name, object_name)["uploadId"]
The return result of initiate_multipart_upload contains uploadId, which is the unique identifier for distinguishing multipart upload events, and we will use it in subsequent operations.
Initialization for uploading an infrequent access storage class object
Initialize a multipart upload event for infrequent access storage:
options = {
    Http::BOS_STORAGE_CLASS => 'STANDARD_IA'
}
client.initiate_multipart_upload(bucket_name, object_name, options)
Initialization for uploading a cold storage class object
Initialize a multipart upload event for cold storage:
options = {
    Http::BOS_STORAGE_CLASS => 'COLD'
}
client.initiate_multipart_upload(bucket_name, object_name, options)
Upload parts
The file is then uploaded in multiple parts.
# Track the remaining bytes and the starting offset of each part
left_size = File.size(multi_file)
offset = 0
part_number = 1
part_list = []
while left_size > 0 do
  part_size = 5 * 1024 * 1024
  if left_size < part_size
    part_size = left_size
  end
  response = client.upload_part_from_file(
    bucket_name, object_name, upload_id, part_number, part_size, multi_file, offset)
  left_size -= part_size
  offset += part_size
  # You should store every part number and ETag to complete the multipart upload later
  part_list << {
    "partNumber" => part_number,
    "eTag" => response['etag']
  }
  part_number += 1
end
The core of the above code is to call the UploadPart method to upload each part, but the following points should be noted:
- The UploadPart method requires each part, except the last one, to be at least 5 MB in size. Part sizes are not validated as each part is uploaded, however, but only when the multipart upload is completed.
- To guard against errors during network transmission, it is recommended to use the Content-MD5 value returned by BOS after each UploadPart call to verify the correctness of the uploaded part data, as shown in the sketch after this list. Once all parts are combined into one object, it no longer contains an MD5 value.
- The part number must be within the range of 1 to 10,000. If this limit is exceeded, BOS will return the InvalidArgument error code.
- For each uploaded part, the stream must be positioned at the beginning of the respective part.
- After each part is uploaded, the response from BOS contains eTag and partNumber, which need to be saved to part_list. part_list is an array whose elements are hashes; each hash contains the two keys partNumber and eTag, which are used in the subsequent step of completing the multipart upload.
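As noted above, each part can be checked locally before the upload is completed. A minimal sketch, assuming the part eTag is the hex MD5 of the part's data and that parts were uploaded with the fixed 5 MB size used in the loop above:

require 'digest'

part_size = 5 * 1024 * 1024
File.open(multi_file, 'rb') do |f|
  part_list.each do |part|
    data = f.read(part_size)
    # the eTag of a part is assumed here to be the hex MD5 of its data
    unless Digest::MD5.hexdigest(data) == part['eTag'].to_s.delete('"')
      raise "part #{part['partNumber']} failed MD5 verification"
    end
  end
end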
Complete multipart upload
Complete the multipart upload as shown in the following code:
client.complete_multipart_upload(bucket_name, object_name, upload_id, part_list)
The part_list in the code above is the list of parts saved in the second step. After BOS receives the part list submitted by the user, it verifies the validity of each part one by one. Once all parts are validated, BOS assembles them into a complete object.
Cancel multipart upload
Users can cancel multipart uploads using the abort_multipart_upload method.
client.abort_multipart_upload(bucket_name, object_name, upload_id)
Get unfinished multipart upload events
Users can obtain the unfinished multipart upload events in the bucket by the list_multipart_uploads method.
response = client.list_multipart_uploads(bucket_name)
puts response['bucket']
puts response['uploads'][0]['key']
Note:
- By default, if the number of multipart upload events in a bucket surpasses 1,000, only 1,000 records will be returned. In such cases, the IsTruncated value in the response will be True, and the NextKeyMarker will indicate the starting point for the next query.
- To retrieve more multipart upload events, you can utilize the KeyMarker parameter for batch reading.
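Combining the two notes, all unfinished uploads can be traversed with KeyMarker pagination, for example to abort stale ones. A sketch, assuming the client accepts a 'keyMarker' option and the parsed response mirrors the REST API fields (isTruncated, nextKeyMarker, uploadId):

key_marker = nil
loop do
  options = key_marker ? { 'keyMarker' => key_marker } : {}
  response = client.list_multipart_uploads(bucket_name, options)
  response['uploads'].each do |upload|
    # e.g. clean up every unfinished upload
    client.abort_multipart_upload(bucket_name, upload['key'], upload['uploadId'])
  end
  break unless response['isTruncated']
  key_marker = response['nextKeyMarker']
end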
Get information about all uploaded parts
Users can obtain all uploaded parts in an upload event by the list_parts method:
response = client.list_parts(bucket_name, object_name, upload_id)
puts response['bucket']
# 'parts' (rather than 'uploads') is assumed to hold the uploaded part records here
puts response['parts'][0]['partNumber']
Note:
- By default, if the number of uploaded parts in a multipart upload surpasses 1,000, only 1,000 records will be returned. In such cases, the IsTruncated value in the response will be True, and the NextPartNumberMarker will indicate the starting point for the next query.
- To retrieve more parts, you can utilize the PartNumberMarker parameter for batch reading, following the same pagination pattern shown above.
Resumable upload
When users upload a large file to BOS and the network is unstable or the program crashes, the entire upload fails and the parts uploaded before the failure are invalidated, forcing users to start over. This wastes resources, and in an unstable network the upload may still fail after repeated retries. For these scenarios, BOS provides the capability of resumable upload:
- In a generally stable network, it is recommended to use three-step (multipart) upload, dividing the object into parts of about 1 MB; refer to [Multipart upload](#multipart-upload).
- If the network condition is very poor, it is recommended to use the AppendObject method for resumable upload, appending a small amount of data (256 KB) each time; refer to [Append upload](#append-upload).
Note:
- Resumable upload is an encapsulation and enhancement of multipart upload, and is implemented on top of it.
- For large files or poor network environments, multipart upload is recommended.
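A sketch of how such a resumable wrapper could be built from the multipart APIs shown above: persist the uploadId in a local checkpoint file and, on restart, recover the parts that already reached BOS via list_parts. The helper itself, the checkpoint file name, and the 'parts'/'partNumber'/'eTag' response keys are assumptions modeled on the REST API:

require 'json'

def resumable_upload(client, bucket_name, object_name, file_path,
                     checkpoint = "#{file_path}.checkpoint")
  part_size = 5 * 1024 * 1024
  # Reuse the uploadId from a previous interrupted run if a checkpoint exists
  if File.exist?(checkpoint)
    upload_id = JSON.parse(File.read(checkpoint))['uploadId']
  else
    upload_id = client.initiate_multipart_upload(bucket_name, object_name)['uploadId']
    File.write(checkpoint, JSON.generate('uploadId' => upload_id))
  end

  # Recover the parts that already reached BOS before the interruption
  uploaded = {}
  (client.list_parts(bucket_name, object_name, upload_id)['parts'] || []).each do |p|
    uploaded[p['partNumber']] = p['eTag']
  end

  part_list = []
  left_size = File.size(file_path)
  offset = 0
  part_number = 1
  while left_size > 0
    size = left_size < part_size ? left_size : part_size
    if uploaded.key?(part_number)
      etag = uploaded[part_number] # already uploaded, skip the transfer
    else
      response = client.upload_part_from_file(
        bucket_name, object_name, upload_id, part_number, size, file_path, offset)
      etag = response['etag']
    end
    part_list << { 'partNumber' => part_number, 'eTag' => etag }
    offset += size
    left_size -= size
    part_number += 1
  end

  client.complete_multipart_upload(bucket_name, object_name, upload_id, part_list)
  File.delete(checkpoint)
end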
