
          Upload Files

          In BOS, the basic unit of user data operations is the object. An object consists of a Key, Meta, and Data: the Key is the object's name, the Meta is the user's description of the object and consists of a series of Name-Value pairs, and the Data is the object's content.

          BOS Ruby SDK provides a rich set of file upload interfaces. Files can be uploaded in the following ways:

          • Simple Upload
          • Append Upload
          • Multipart Upload
          • Breakpoint Continued Upload

          Simple Upload

          For simple upload, BOS supports uploading an object from a file, a data stream, a binary string, or a string, as shown in the following code:

          # Upload object as the data stream
          client.put_object(bucket_name, object_name, data)
          
          # Upload object from string
          client.put_object_from_string(bucket_name, object_name, "string")
          
          # Directly upload object from file
          client.put_object_from_file(bucket_name, object_name, file_path)

          When an object is uploaded as a file, the putObject family of interfaces supports objects of no more than 5 GB. After a putObject request is processed successfully, BOS returns the ETag of the object in the response header as a file identifier.

          Set Object Metadata

          Object metadata is the attribute description of a file when the user uploads it to BOS. It falls into two types: HTTP headers and custom meta-information.

          Set the HTTP Header of Object

          The BOS Ruby SDK essentially calls the backend HTTP interface; therefore, you can customize the HTTP headers of an object when uploading a file. Common HTTP headers are described as follows:

          Name Description Default value
          Content-MD5 File data verification. After it is set, BOS enables MD5 verification of the file content, comparing the MD5 you provide with the MD5 of the file; an error is returned if they do not match. None
          Content-Type The MIME type of the file, which defines the file type and web page encoding and determines in what form and encoding the browser reads the file. If not specified, BOS generates it automatically from the file extension; if the file has no extension, the default value is used. application/octet-stream
          Content-Disposition Instructs the MIME user agent how to handle the attached file (display inline or download) and what file name to use. None
          Content-Length The length of the uploaded data; if the specified length exceeds that of the stream/file, the data is truncated, otherwise the actual value is used. Length of the stream/file
          Expires Cache expiration time None
          Cache-Control Specifies the caching behavior of the web page when the object is downloaded. None

          The reference codes are as follows:

          options = { Http::CONTENT_TYPE => 'string',
                      Http::CONTENT_MD5 => 'md5',
                      Http::CONTENT_DISPOSITION => 'inline',
                      'key1' => 'value1'
          }
          
          client.put_object_from_string(bucket_name, object_name, "string", options)
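
          For example, the Content-MD5 header value can be computed from the payload before upload. The sketch below uses Ruby's standard Digest library and assumes the Base64-encoded digest form that HTTP defines for Content-MD5; confirm the exact encoding your SDK version expects:

          require 'digest'
          
          data = "string"
          # Base64-encoded MD5 of the payload, per the HTTP Content-MD5 definition
          options = { Http::CONTENT_MD5 => Digest::MD5.base64digest(data) }
          
          client.put_object_from_string(bucket_name, object_name, data, options)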

          Custom Metadata

          BOS supports custom metadata for describing objects, as shown in the following code:

          options = { 
                      'user-metadata' => { "key1" => "value1" }
          }
          
          client.put_object_from_string(bucket_name, object_name, "string", options)

          Tips:

          • In the code above, the user defines a custom metadata entry with the name "key1" and the value "value1".
          • When downloading this object, the metadata can be retrieved along with it, as sketched below.
          • An object can carry multiple such entries, but the total size of the User Meta must be below 2 KB.
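
          A minimal read-back sketch, using the same user-metadata access pattern that appears in Modify Object Metadata later in this document:

          response = client.get_object_meta_data(bucket_name, object_name)
          # Prints "value1" for the entry defined above
          puts response['user-metadata']['key1']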

          Set Storage Type When Uploading Object

          BOS supports standard storage, infrequent access storage, and cold storage. To upload an object and store it as a particular storage type, specify StorageClass. The parameters corresponding to the three storage types are as follows:

          Storage type Parameter
          Standard storage STANDARD
          Infrequent access storage STANDARD_IA
          Cold storage COLD

          Taking infrequent access storage as an example, the code is as follows:

          # Upload an infrequent access object (the default is a standard object)
          client.put_object_from_file(bucket_name, object_name, file_path, Http::BOS_STORAGE_CLASS => 'STANDARD_IA')

          After the putObject request is handled successfully, BOS returns the Content-MD5 of the object in the header, and you can verify the file against this value, as sketched below.
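
          A minimal verification sketch. It assumes that, for simple (non-multipart) uploads, the ETag exposed by get_object_meta_data is the hex MD5 of the content; adjust if your objects were uploaded in parts:

          require 'digest'
          
          client.put_object_from_file(bucket_name, object_name, file_path)
          
          local_md5 = Digest::MD5.file(file_path).hexdigest
          remote_etag = client.get_object_meta_data(bucket_name, object_name)['etag']
          puts "upload verified" if local_md5 == remote_etag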

          Append Upload

          With the simple upload methods introduced above, the objects created are of the Normal type and cannot be appended to, which is inconvenient in scenarios where data is appended frequently, such as logging, video surveillance, and live streaming.

          For this reason, Baidu AI Cloud BOS supports appendObject, i.e., uploading a file by appending. The object created by the appendObject operation is of the Appendable type, and data can be appended to it. The size of an appendable object ranges from 0 to 5 GB.

          Sample code for uploading through appendObject is as follows:

          # Upload an appendable object from string
          client.append_object_from_string(bucket_name, object_name, "string")
          
          # Start to append from offset
          client.append_object_from_string(bucket_name, object_name, "append_str", 'offset' => 6)
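
          The offset must equal the current length of the appendable object. A minimal way to track it client-side is shown below; BOS also reports the next append offset in its response, but how the SDK surfaces it is not covered here, so this sketch simply accumulates the bytes written:

          first_chunk = "string"
          client.append_object_from_string(bucket_name, object_name, first_chunk)
          
          # The next append starts where the previous one ended (6 for "string")
          offset = first_chunk.bytesize
          client.append_object_from_string(bucket_name, object_name, "append_str", 'offset' => offset)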

          Multipart Upload

          In addition to simple upload and append upload, BOS also provides another upload mode: Multipart Upload. You can use Multipart Upload in application scenarios such as the following (but not limited to these):

          • Breakpoint upload support is required.
          • The file to upload is larger than 5 GB.
          • The network conditions are poor, and the connection with BOS servers is often disconnected.
          • The file needs to be uploaded in a streaming fashion.
          • The size of the uploaded file cannot be determined before uploading it.

          Next, the implementation of Multipart Upload is described step by step. Suppose you have a file at the local path /path/to/file.zip that is too large for a single upload, so you upload it to BOS through Multipart Upload.

          Initialize Multipart Upload

          Use the initiate_multipart_upload method to initialize a chunked upload event:

          upload_id = client.initiate_multipart_upload(bucket_name, object_name)["uploadId"] 

          The result returned by initiate_multipart_upload contains uploadId , which is the unique identifier of this chunked upload event; we will use it in the subsequent operations.

          Initialization of an Infrequent Access Storage Object Upload

          Initialize a Multipart Upload event of infrequent access storage:

          options = { 
                      Http::BOS_STORAGE_CLASS => 'STANDARD_IA'
          }
          client.initiate_multipart_upload(bucket_name, object_name, options)

          Initialization of a Cold Storage Object Upload

          Initialize a multipart upload event of cold storage:

          options = { 
                      Http::BOS_STORAGE_CLASS => 'COLD'
          }
          client.initiate_multipart_upload(bucket_name, object_name, options)

          Upload in Parts

          Next, upload the file in parts.

          # Determine the remaining bytes to upload; parts are sliced from the file by offset
          left_size = File.size(multi_file)
          offset = 0
          part_number = 1
          part_list = []
          
          while left_size > 0 do
              part_size = 5 * 1024 * 1024
              if left_size < part_size
                  part_size = left_size
              end
          
              response = client.upload_part_from_file(
                  bucket_name, object_name, upload_id, part_number, part_size, multi_file, offset)
              left_size -= part_size
              offset += part_size
              # You should store every part number and ETag in order to complete the multipart upload later
              part_list << {
                  "partNumber" => part_number,
                  "eTag" => response['etag']
              }
              part_number += 1
          end

          The core of the above code is to call the UploadPart method to upload each part. Pay attention to the following points:

          • The UploadPart method requires that every part except the last be at least 5 MB. However, the Upload Part interface does not verify the size of each part immediately; it verifies only when Complete Multipart Upload is called.
          • To ensure that no error occurs during network transmission, it is recommended that, after each UploadPart, you use the Content-MD5 value returned by BOS for that part to verify the uploaded part data. Once all parts are combined into one object, the object no longer carries an MD5 value.
          • Part numbers range from 1 to 10,000. If this range is exceeded, BOS returns the error code InvalidArgument.
          • Each time you upload a part, position the stream at the offset corresponding to the beginning of that part.
          • Each time a part is uploaded, the result returned by BOS contains eTag and partNumber, which must be saved in part_list. part_list is an array in which each element is a hash containing two keys, partNumber and eTag; we will use this list later to complete the chunked upload.

          Complete Multipart Upload

          Complete the Multipart Upload as shown in the following code:

          client.complete_multipart_upload(bucket_name, object_name, upload_id, part_list)

          The part_list in the code above is the part list saved in Step 2. After BOS receives the part list submitted by the user, it verifies the validity of each part one by one. Once all parts are verified, BOS combines them into a complete object.

          Cancel Multipart Upload

          You can use the abort_multipart_upload method to cancel a chunked upload:

          client.abort_multipart_upload(bucket_name, object_name, upload_id)

          Get Unfinished Multipart Upload Event

          You can use the list_multipart_uploads method to get unfinished Multipart Upload events in the bucket.

          response = client.list_multipart_uploads(bucket_name)
          puts response['bucket']
          puts response['uploads'][0]['key']

          Note:

          1. By default, if there are more than 1,000 multipart upload events, only 1,000 are returned; the value of isTruncated in the result is true, and nextKeyMarker is returned as the starting point of the next read.
          2. To retrieve further multipart upload events, use the keyMarker parameter to read in batches.
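
          A paging sketch for this listing. It assumes list_multipart_uploads accepts an options hash and that the response exposes 'isTruncated' and 'nextKeyMarker' in the same camelCase style as the list_objects responses later in this document; treat these names as assumptions:

          options = {}
          loop do
              res = client.list_multipart_uploads(bucket_name, options)
              res['uploads'].each { |u| puts u['key'] }
              break unless res['isTruncated']
              # Continue from where the previous page stopped (assumed key name)
              options['keyMarker'] = res['nextKeyMarker']
          end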

          Get All Uploaded Part Information

          You can use the list_parts method to get all uploaded parts of a given upload event.

          response = client.list_parts(bucket_name, object_name, upload_id)
          puts response['bucket']
          puts response['parts'][0]['partNumber']

          Note:

          1. By default, if there are more than 1,000 uploaded parts, only 1,000 are returned; the value of isTruncated in the result is true, and nextPartNumberMarker is returned as the starting point of the next read.
          2. To retrieve further parts, use the partNumberMarker parameter to read in batches.

          Breakpoint Continued Upload

          When a user uploads a large file to BOS, if the network is unstable or the program crashes, the entire upload fails, the parts uploaded before the failure are wasted, and the user has to start over. This not only wastes resources; under unstable network conditions the upload may never complete even after many retries. For these scenarios, BOS provides the ability to continue uploading from a breakpoint:

          • Under normal network conditions, it is recommended to use the three-step upload method and split the object into 1 MB parts. Refer to Multipart Upload.
          • Under poor network conditions, it is recommended to use the appendObject method for breakpoint resume, appending small chunks of about 256 KB at a time; see Append Upload.

          Tips

          • Breakpoint continued upload is an encapsulation and enhancement of multipart upload; it is realized on top of multipart upload, as sketched below.
          • When the file is large or the network environment is poor, uploading in parts is recommended.
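
          A minimal resume sketch built on the multipart interfaces above. It assumes the list_parts response exposes the uploaded parts under 'parts', each with 'partNumber' and 'eTag'; treat those key names as assumptions:

          # Recover what already reached BOS for this upload_id
          uploaded = {}
          client.list_parts(bucket_name, object_name, upload_id)['parts'].each do |p|
              uploaded[p['partNumber']] = p['eTag']
          end
          
          # Re-upload only the part numbers missing from `uploaded`, collect all
          # {partNumber, eTag} pairs into part_list, then finish as usual:
          # client.complete_multipart_upload(bucket_name, object_name, upload_id, part_list)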

          Download File

          BOS Ruby SDK provides a rich set of file download interfaces, and you can download files from BOS in the following ways:

          • Simple streaming download
          • Download to local file
          • Breakpoint continued download
          • Range download

          Simple Streaming Download

          You can read an object as a stream with the following code:

          client.get_object_as_string(bucket_name, object_name)

          Download Object to File

          You can download an object to a specified file by referring to the following code:

          client.get_object_to_file(bucket_name, object_name, file_name)

          Range Download

          For finer-grained access, you can specify a download range via the range parameter to fetch part of an object. If the specified range is 0-100, bytes 0 through 100 (inclusive) are returned, 101 bytes in total, i.e. [0, 100]. The range parameter is an array of the form (start, end), where both values are long integers in bytes. You can use this feature for segmented download and breakpoint continued download.

          range = [0,100]
          client.get_object_as_string(bucket_name, object_name, range)
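
          For example, range reads can be combined with get_object_meta_data to fetch a large object in fixed-size segments; a minimal sketch (the 4 MB chunk size and output path are illustrative):

          total = client.get_object_meta_data(bucket_name, object_name)['content-length'].to_i
          chunk = 4 * 1024 * 1024
          
          File.open('local_copy', 'wb') do |f|
              offset = 0
              while offset < total
                  # Range ends are inclusive, so this chunk ends at offset + chunk - 1
                  last = [offset + chunk - 1, total - 1].min
                  f.write(client.get_object_as_string(bucket_name, object_name, [offset, last]))
                  offset = last + 1
              end
          end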

          Other Methods

          Get Storage Type of Object

          The storage class attribute of an object is one of STANDARD (standard storage), STANDARD_IA (infrequent access storage), and COLD (cold storage). It can be retrieved with the following code:

          response = client.get_object_meta_data(bucket_name, object_name)
          puts response[Http::BOS_STORAGE_CLASS]

          Get ObjectMetadata Only

          You can also obtain only the ObjectMetadata, without the object body itself, through the get_object_meta_data method. As shown in the following code:

          response = client.get_object_meta_data(bucket_name, object_name)
          puts response['etag']

          The fields available in the response returned by the get_object_meta_data method include:

          Parameter Description
          content-type Type of object
          content-length Size of object
          content-md5 MD5 of object
          etag Entity tag of HTTP protocol for object
          x-bce-storage-class Storage type of object
          user-metadata Returned if custom User Meta was specified in putObject.

          Change File Storage Level

          As mentioned above, BOS supports assigning STANDARD (standard storage), STANDARD_IA (infrequent access storage), and COLD (cold storage) to files. The BOS Ruby SDK also supports changing the storage type of specific files.

          The parameters involved are as follows:

          Parameter Description
          x-bce-storage-class Specifies the storage type of the object: STANDARD_IA for infrequent access storage, COLD for cold storage; when not specified, standard storage is used by default.

          The following is an example:

          options = { 
                      Http::BOS_STORAGE_CLASS => 'STANDARD_IA'
          }
          
          # Convert from standard storage to infrequent access storage
          client.copy_object(bucket_name, object_name, bucket_name, object_name, options)
          puts client.get_object_meta_data(bucket_name, object_name)[Http::BOS_STORAGE_CLASS]
          
          options = { 
                      Http::BOS_STORAGE_CLASS => 'COLD'
          }
          
          # Convert to cold storage
          client.copy_object(bucket_name, object_name, bucket_name, object_name, options)
          puts client.get_object_meta_data(bucket_name, object_name)[Http::BOS_STORAGE_CLASS]

          Get File Download URL

          You can obtain the URL of an object by referring to the following code:

          options = { 'expiration_in_seconds' => 360,
                      'timestamp' => Time.now.to_i
          }
          
          puts client.generate_pre_signed_url(bucket_name, object_name, options)

          Note:

          • Before calling this function, you need to manually set the endpoint to the domain name of the region. Baidu AI Cloud currently provides multi-region support; please refer to the Region Selection Description. Currently, "North China-Beijing", "South China-Guangzhou", and "East China-Suzhou" are supported. Beijing: http://bj.bcebos.com; Guangzhou: http://gz.bcebos.com; Suzhou: http://su.bcebos.com
          • EXPIRATION_IN_SECONDS is the validity period of the generated URL, counted from the current time. It is optional and defaults to 1,800 seconds when not configured. To generate a URL that never expires, set expiration_in_seconds to -1; no other negative number is allowed.
          • TIMESTAMP is an optional parameter and defaults to the current system time when not configured.
          • If the target file is publicly readable, its URL can be obtained by simple splicing: http://$bucket.$region.bcebos.com/$object

          Enumerate Files in Storage Space

          BOS SDK allows users to enumerate objects in the following two ways:

          • Simple enumeration
          • Complex enumeration by parameters

          In addition, you can simulate folders while listing files.

          Simple Enumeration

          When you want to list files simply and quickly, you can obtain the object list of a bucket via the list_objects method.

          client.list_objects(bucket_name)

          Note:

          1. By default, if the number of objects in the bucket exceeds 1,000, only 1,000 objects are returned.
          2. To retrieve more objects, use the marker parameter to read in several passes.

          Complex Enumeration by Parameters

          In addition to the simple listing above, you can implement various flexible queries by configuring optional parameters via options. The settable parameters are as follows:

          Parameter Function
          PREFIX Restricts the returned object keys to those prefixed with prefix.
          DELIMITER A character used to group object names. Keys that contain the specified prefix and share the same string between the prefix and the first occurrence of the delimiter are returned as a single group element: CommonPrefixes.
          MARKER The results are returned starting from the first key that sorts alphabetically after the marker.
          MAX_KEYS Limits the maximum number of objects returned; it defaults to 1,000 when not set, and the value of max-keys cannot exceed 1,000.

          Note:

          1. If an object is named after the Prefix, a query using only the Prefix still returns that object among the keys, as shown in Recursively List All Files in the Directory.
          2. If an object is named after the Prefix, a query combining Prefix and Delimiter returns Null among the keys, and the key name does not contain the Prefix; see View Files and Subdirectories Under the Directory.

          Next, we use several cases to illustrate the method of parameter enumeration:

          Specify the Maximum Number of Returned Entries.

          # Specify the maximum number of returned entries to be 500
          options = { 
                      maxKeys: 500
          }
          puts client.list_objects(bucket_name, options)

          Return the Object with the Specified Prefix.

          # Return only objects prefixed with usr
          options = { 
                      prefix: 'usr'
          }
          puts client.list_objects(bucket_name, options) 

          Return from the Specified Object.

          # Specify the object to start from; the named object itself is not included in the results
          options = { 
                      marker: 'object'
          }
          puts client.list_objects(bucket_name, options)

          Page to Get All Objects

          You can set a maximum of 500 records per page.

          options = { 
                      maxKeys: 500
          }
          
          is_truncated = true
          while is_truncated 
              res = client.list_objects(bucket_name, options)
              is_truncated = res['isTruncated']
              options[:marker] = res['nextMarker'] unless res['nextMarker'].nil?
          end

          Page Through the Results Starting from a Specific Object

          You can set the number of records per page (5 in this example) and start listing from a specific object.

          options = { 
                      maxKeys: 5,
                      marker: 'object'
          }
          
          is_truncated = true
          while is_truncated 
              res = client.list_objects(bucket_name, options)
              is_truncated = res['isTruncated']
              options[:marker] = res['nextMarker'] unless res['nextMarker'].nil?
          end

          The fields available in the response returned by the list_objects method include:

          Parameter Description
          name bucket name
          prefix The prefix used in this query; objects matching from the prefix to the first occurrence of the Delimiter are returned as one group of elements
          marker Starting point of this query
          maxKeys Maximum number of requests returned
          isTruncated It indicates whether all queries have returned; false means all results have been returned this time; true means all results have not been returned this time.
          contents Container of an object returned
          +key object name
          +lastModified Last time this object was modified
          +eTag Entity tag of HTTP protocol for object
          +storageClass Storage form of object
          +size Content size of object (number of bytes)
          +owner User information of bucket corresponding to object
          ++id User ID of bucket Owner
          ++displayName Name of bucket Owner

          Simulate Folder Function

          There is no concept of a folder in BOS storage. All elements are stored as objects, but BOS users often need folders to organize files when working with data.

          Therefore, BOS provides the ability to create simulated folders, which in essence means creating an object of size 0. This object can be uploaded and downloaded like any other, and the console displays objects whose names end with "/" as folders.
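
          For example, a "folder" named dir/ can be created with an empty object:

          # A zero-size object whose key ends with "/" is shown as a folder in the console
          client.put_object_from_string(bucket_name, 'dir/', '')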

          You can simulate the folder function through the combination of Delimiter and Prefix parameters. The combination of Delimiter and Prefix works like this:

          If you set Prefix to a folder name, you list the files that begin with that Prefix, that is, all files and subfolders (directories) recursively under the folder. The file names are shown in Contents.

          If you additionally set Delimiter to "/", the return value lists only the files and subfolders (directories) directly under the folder. The names of the subfolders (directories) are returned in the CommonPrefixes section, and the files and folders recursively contained in those subfolders are not shown.

          Here are some application modes:

          List All Files in the Bucket

          When you need to get all files under a bucket, refer to Page to Get All Objects.

          Recursively List All Files in the Directory

          All files under the dir directory can be obtained by setting the Prefix parameter:

          options = { 
                      prefix: 'dir/'
          }
          
          is_truncated = true
          while is_truncated 
              res = client.list_objects(bucket_name, options)
              is_truncated = res['isTruncated']
              options[:marker] = res['nextMarker'] unless res['nextMarker'].nil?
          end

          View Files and Subdirectories Under the Directory

          By combining prefix with delimiter, the files and subdirectories directly under the dir directory can be listed; reading the response is sketched after the loop:

          options = { 
                      prefix: 'dir/',
                      delimiter: '/'
          }
          
          is_truncated = true
          while is_truncated 
              res = client.list_objects(bucket_name, options)
              is_truncated = res['isTruncated']
              options[:marker] = res['nextMarker'] unless res['nextMarker'].nil?
          end
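
          The loop above pages through the listing but does not print anything. The direct files arrive in contents and the subdirectory names in CommonPrefixes; the sketch below assumes the response exposes them as 'contents' and 'commonPrefixes' in the same camelCase style as the other fields, with each common prefix carrying a 'prefix' key (treat these names as assumptions):

          res = client.list_objects(bucket_name, prefix: 'dir/', delimiter: '/')
          res['contents'].each { |obj| puts obj['key'] }            # files directly under dir/
          res['commonPrefixes'].each { |cp| puts cp['prefix'] }     # subdirectories of dir/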

          List the Storage Attributes of Objects in Bucket

          After uploading, if you need to view the storage class attribute of all objects in the specified bucket, you can use the following code:

          res = client.list_objects(bucket_name)
          
          res['contents'].each { |obj| puts obj['storageClass'] } 

          Object Privilege Control

          Set Access Privilege of Object

          The following code sets the object's permission to private:

          client.set_object_canned_acl(bucket_name, object_name, Http::BCE_ACL => 'private')

          For the specific contents of privilege, please see <BOS API Document Object Access Control>.

          Set Access Privilege of Specified Users to Object

          BOS provides the set_object_acl method and the set_object_canned_acl method to configure a specified user's access to an object; you can refer to the following code:

          1. Set the access rights of specified users via the x-bce-grant-read and x-bce-grant-full-control headers of set_object_canned_acl:

          id_permission = "id=\"8c47a952db4444c5a097b41be3f24c94\",id=\"bbbbbbbbbbbbbbbbbbbbbbbbbbbbbb\""
          client.set_object_canned_acl(bucket_name, object_name, 'x-bce-grant-read' => id_permission)
          
          id_permission = "id=\"8c47a952db4444c5a097b41be3f24c94\",id=\"bbbbbbbbbbbbbbbbbbbbbbbbbbbbbb\""
          client.set_object_canned_acl(bucket_name, object_name, 'x-bce-grant-full-control' => id_permission)

          2. Set object permissions through set_object_acl:

          acl = [{'grantee' => [{'id' => 'b124deeaf6f641c9ac27700b41a350a8'},
                                {'id' => 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'}],
                  'permission' => ['FULL_CONTROL']
          }]
          client.set_object_acl(bucket_name, object_name, acl)

          Note:

          1. The permission setting contains two values, READ and FULL_CONTROL, which map directly to the corresponding rights.
          2. When granting rights to two or more users, follow the format of the example above; if the entries are merged, an error is returned.

          View Object Privilege

          The following code retrieves the object's permissions:

          client.get_object_acl(bucket_name, object_name)

          The fields available in the response returned by the get_object_acl method include:

          Parameter Description
          accessControlList Permission list of the object
          grantee The authorized users
          -id ID of an authorized user
          privilege Permissions of the authorized users

          Delete Object Privilege

          The following code deletes the object's permission settings:

          client.delete_object_acl(bucket_name, object_name)

          Delete File

          Delete a single file

          You can refer to the following code to delete an object:

          client.delete_object(bucket_name, object_name) 

          Check if the File Exists

          You can check whether a file exists through the following operations:

          begin
              client.get_object_meta_data(bucket_name, object_name)
          rescue BceServerException => e
              puts "#{object_name} not exist!" if e.status_code == 404    
          end

          Get and Update Object Metadata

          Object metadata is the attribute description of files uploaded by users to BOS. It includes two types: HTTP Headers and User Meta (custom metadata).

          Get Object Metadata

          Refer to Get ObjectMetadata Only.

          Modify Object Metadata

          BOS modifies an object's Metadata by copying the object: set the destination bucket to the source bucket and the destination object to the source object, and supply the new Metadata; the Metadata is then modified through the copy. If no Metadata is set, an error is reported.

          user_metadata = { "key1" => "value1" }
          options = {
                      'user-metadata' => user_metadata
          }
          
          client.copy_object(bucket_name, object_name, bucket_name, object_name, options)
          puts client.get_object_meta_data(bucket_name, object_name)['user-metadata']['key1']

          Copy Object

          Copy File

          You can copy an object through the copyObject function, as shown in the following code:

          client.copy_object(source_bucket_name, source_object_key, target_bucket_name, target_object_key)

          The copy_object method accepts optional parameters via options, with the parameter list as follows:

          Parameter Description
          user-metadata User defined Meta, containing Key-Value pair
          eTag ETag of the source object. Optional; if provided, it is compared with the actual ETag of the source object, and an error is returned if they do not match.
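
          For example, a copy can be guarded with the source ETag (a minimal sketch using the eTag option named in the table above):

          source_etag = client.get_object_meta_data(source_bucket_name, source_object_key)['etag']
          options = { 'eTag' => source_etag }
          
          # The copy fails if the source object changed since the ETag was read
          client.copy_object(source_bucket_name, source_object_key, target_bucket_name, target_object_key, options)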

          Synchronous Copy

          The current copyObject interface of BOS is implemented synchronously: the BOS server returns success only after the copy is completed. Synchronous copy lets users determine the copy status directly, but the copy time perceived by users is longer and proportional to the file size.

          Synchronous copy is more in line with industry conventions and improves compatibility with other platforms. It also simplifies the business logic of the BOS server and improves service efficiency.

          Multipart Upload Copy

          In addition to copying files through copyObject, BOS also provides another copy mode: Multipart Upload Copy. You can use Multipart Upload Copy in application scenarios such as the following (but not limited to these):

          • Breakpoint copy support is required.
          • The file to copy is larger than 5 GB.
          • Network conditions are poor, and connections to BOS services are often disconnected.

          Next, the three-step copy is introduced step by step.

          The three-step copy consists of init, part copy, and complete; the init and complete operations are the same as for chunked upload, see Initialize Multipart Upload and Complete Multipart Upload.
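
          For reference, the init step targets the destination object and mirrors Initialize Multipart Upload above:

          upload_id = client.initiate_multipart_upload(target_bucket_name, target_object_key)["uploadId"]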

          # content-length arrives as a string, so convert it before arithmetic
          left_size = client.get_object_meta_data(source_bucket_name, source_object_key)['content-length'].to_i
          offset = 0
          part_number = 1
          part_list = []
          
          while left_size > 0 do
              part_size = 5 * 1024 * 1024
              if left_size < part_size
                  part_size = left_size
              end
          
              response = client.upload_part_copy(
                  source_bucket_name, source_object_key, target_bucket_name, target_object_key, upload_id, part_number, part_size, offset)
              left_size -= part_size
              offset += part_size
              # You should store every part number and ETag in order to complete the multipart upload later
              part_list << {
                  "partNumber" => part_number,
                  "eTag" => response["eTag"]
              }
              part_number += 1
          end
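
          After the loop, the copy is finished in the same way as a chunked upload:

          client.complete_multipart_upload(target_bucket_name, target_object_key, upload_id, part_list)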

          Note: sizes are in bytes; part_size defines the size of each part, and except for the final part, every part must be at least 5 MB.
