          Upload Files

          In BOS, the basic unit of user data is the object. The number of objects in a bucket is unlimited, but a single object can store at most 5 TB of data. An object consists of Key, Meta and Data: the Key is the object's name; Meta is the user's description of the object, made up of a series of name-value pairs; Data is the object's content.

          The BOS Python SDK provides a rich set of file upload interfaces. Files can be uploaded in the following ways:

          • Simple Upload
          • Append Upload
          • Multipart Upload
          • Resumable Upload

          The naming rules for objects are as follows:

          • Use UTF-8 encoding.
          • The length must be between 1 and 1023 bytes.
          • The key cannot begin with /, and the @ character is not allowed, because @ is reserved for the image processing interface.

          Simple Upload

          For simple upload, BOS supports uploading an object from a file, a data stream or a string, as shown in the following code:

          The following code can be used to upload an object:

          data = open(file_name, 'rb')
          #To upload an object from a data stream, you need to calculate the data length content_length yourself
          #You also need to calculate content_md5 yourself: run the MD5 algorithm over the data to obtain 128 bits of binary data, then Base64-encode the result
          bos_client.put_object(bucket_name, object_key, data, content_length, content_md5)
          
          #object uploaded from string
          bos_client.put_object_from_string(bucket_name, object_key, string)
          
          #object uploaded from file
          bos_client.put_object_from_file(bucket_name, object_key, file_name)

          Here data is a stream object, and different object types are handled differently: uploads from a string use the return value of StringIO, and uploads from a file use the return value of open(). BOS therefore provides wrapped interfaces so these common uploads are quick to write.

          Objects are uploaded to BOS as files; the put_object family of interfaces supports objects of up to 5 GB. After a put_object, put_object_from_string or put_object_from_file request succeeds, BOS returns the object's ETag as the file identifier.
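          For example, a minimal sketch (names as above) that reads the returned ETag, assuming the ETag is exposed on the response metadata as metadata.etag, as with upload_part_from_file later in this document:

          response = bos_client.put_object_from_file(bucket_name, object_key, file_name)
          # The ETag of the uploaded object is returned in the response metadata
          print(response.metadata.etag)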

          All these interfaces accept the following optional parameters:

          • content_type — Type of the uploaded file or string.
          • content_md5 — File data checksum. When set, BOS verifies the MD5 of the file content: the MD5 you provide is compared with the MD5 of the file, and an error is raised if they differ.
          • content_length — Length of the file; put_object_from_string() does not take this parameter.
          • content_sha256 — Used for file verification.
          • user_metadata — Custom metadata.
          • storage_class — Storage type of the file.
          • user_headers — User-defined headers.

          content_md5 is computed by running the MD5 algorithm over the data to obtain 128 bits of binary data, and then Base64-encoding the result. The following is an example:

          import hashlib
          import base64
          
          file_name = "your_file"
          buf_size = 8192
          md5 = hashlib.md5()
          with open(file_name, 'rb') as fp:
              while True:
                  buf = fp.read(buf_size)
                  if not buf:
                      break
                  md5.update(buf)
          content_md5 = base64.standard_b64encode(md5.digest())

          Set Object Metadata

          Object metadata describes the attributes of a file uploaded to BOS. It falls into two types: HTTP standard headers and custom metadata.

          Set the HTTP Header of Object

          The BOS Python SDK ultimately calls the backend HTTP interface, so you can customize the object's HTTP headers when uploading a file. Common HTTP headers are described as follows:

          • Cache-Control — Specifies the caching behavior of the web page when the object is downloaded. Default: none.
          • Content-Encoding — Indicates how the message body content is encoded. Default: none.
          • Content-Disposition — Instructs the MIME user agent how to handle the attached file (open or download) and what file name to use. Default: none.
          • Expires — Cache expiration time. Default: none.

          The reference codes are as follows:

          • Upload object with specific header from string

            user_headers = {"header_key":"header_value"}
            #Upload object with specific header from string 
            bos_client.put_object_from_string(bucket=bucket_name, 
                                              key=object_key, 
                                              data=string,
                                              user_headers=user_headers)
            #Upload object with specific header from file 
            bos_client.put_object_from_file(bucket=bucket_name,
                                            key=object_key,
                                            file_name=file,
                                            user_headers=user_headers)

          Custom Metadata

          BOS supports custom metadata for describing an object, as shown in the following code:

          #custom metadata 
          user_metadata = {"name":"my-data"}
          #Upload object with meta customized by users from string
          bos_client.put_object_from_string(bucket=bucket_name, 
                                            key=object_key, 
                                            data=string,
                                            user_metadata=user_metadata)
          #Upload object with meta customized by users from file
          bos_client.put_object_from_file(bucket=bucket_name,
                                          key=object_key,
                                          file_name=file,
                                          user_metadata=user_metadata)

          Tips:

          • In the code above, the user defines a piece of metadata whose name is "name" and whose value is "my-data".
          • When the user downloads this object, the metadata is returned along with it, as the sketch below shows.
          • An object can carry multiple such entries, but the total size of the user meta must stay below 2 KB.
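          As a quick check, the custom metadata can be read back with get_object_meta_data (the bce_meta attribute is described under Get ObjectMetadata Only below); a minimal sketch:

          response = bos_client.get_object_meta_data(bucket_name, object_key)
          # Custom metadata set at upload time is returned as bce_meta
          print(response.metadata.bce_meta)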

          Set the Copy Attribute of Object

          BOS provides the copy_object interface to copy an existing object to another object. During the copy, the ETag or modification state of the source object can be checked, and whether to perform the copy is decided by the result. The parameters in detail:

          • x-bce-copy-source-if-match (String, optional) — The copy is performed only if the ETag of the source object equals the ETag provided; otherwise the copy fails.
          • x-bce-copy-source-if-none-match (String, optional) — The copy is performed only if the ETag of the source object does not equal the ETag provided; otherwise the copy fails.
          • x-bce-copy-source-if-unmodified-since (String, optional) — The copy is performed only if the source object has not been modified after x-bce-copy-source-if-unmodified-since; otherwise the copy fails.
          • x-bce-copy-source-if-modified-since (String, optional) — The copy is performed only if the source object has been modified after x-bce-copy-source-if-modified-since; otherwise the copy fails.

          Corresponding sample code:

          user_metadata = {"name": "my-data"}
          user_headers = {"header_key": "header_value"}
          copy_object_user_headers = {"copy_header_key": "copy_header_value"}
          
          bos_client.copy_object(source_bucket_name = bucket_name, 
                                 source_key = object_name, 
                                 target_bucket_name = bucket_name, 
                                 target_key = object_name, 
                                 user_metadata = user_metadata,
                                 user_headers = user_headers,
                                 copy_object_user_headers = copy_object_user_headers)

          Set Storage Type When Uploading Object

          BOS supports standard storage, infrequent access storage, cold storage and archive storage. The storage type is set through the storage_class parameter when an object is uploaded; the default is standard storage. The parameters corresponding to the four storage types are as follows:

          • Standard storage — STANDARD
          • Infrequent access storage — STANDARD_IA
          • Cold storage — COLD
          • Archive storage — ARCHIVE

          Taking cold storage and archive storage as examples, the code is as follows:

          from baidubce.services.bos import storage_class
          #Upload object of cold storage type from file
          bos_client.put_object_from_file(bucket=bucket_name,
                                          key=object_key,
                                          file_name=file,
                                          storage_class=storage_class.COLD)
          #Upload object of cold storage type from string
          bos_client.put_object_from_string(bucket=bucket_name,
                                            key=object_key,
                                            data=string,
                                            storage_class=storage_class.COLD)
          #Upload object of archive storage type from file
          bos_client.put_object_from_file(bucket=bucket_name,
                                          key=object_key,
                                          file_name=file,
                                          storage_class=storage_class.ARCHIVE)

          Append Upload

          With the simple upload method introduced above, the objects created are of the Normal type and cannot be appended to. This is inconvenient in scenarios with frequent incremental writes, such as logging, video surveillance and live streaming.

          For this reason, Baidu AI Cloud BOS supports append_object, i.e., uploading a file by appending writes. An object created by append_object is an Appendable object, and data can be appended to it. The size of an appendable object ranges from 0 to 5 GB. The archive storage type does not support append upload.

          Sample code for uploading through append_object:

          #Upload an appendable object. "content_md5(data)" stands for the MD5 value of the uploaded data, which you must compute yourself
          #content_md5 is computed by running the MD5 algorithm over the data to obtain 128 bits of binary data, and then Base64-encoding the result; see the "Simple Upload" section above for an example
          #"content_length(data)" stands for the length of the uploaded data, which you must also compute yourself
          response = bos_client.append_object(bucket_name=bucket_name,
                                              key=object_key,
                                              data=data,
                                              content_md5=content_md5(data),
                                              content_length=content_length(data))
          #Obtain the offset at which the next append must start
          next_offset = response.metadata.bce_next_append_offset
          bos_client.append_object(bucket_name=bucket_name,
                                   key=object_key,
                                   data=next_data,
                                   content_md5=content_md5(next_data),
                                   content_length=content_length(next_data),
                                   offset=next_offset)
          
          #Upload an appendable object from string
          from baidubce.services.bos import storage_class
          bos_client.append_object_from_string(bucket_name=bucket_name,
                                               key=object_key,
                                               data=string,
                                               offset=offset,
                                               storage_class=storage_class.STANDARD,
                                               user_headers=user_headers)

          Multipart Upload

          In addition to uploading files to BOS through the put_object interfaces, BOS provides another upload mode: Multipart Upload. You can use Multipart Upload in the following application scenarios (among others):

          • Resumable upload support is required.
          • The file to upload is larger than 5 GB.
          • The network conditions are poor, and the connection to the BOS servers is often interrupted.
          • The file needs to be uploaded as a stream.
          • The size of the file cannot be determined before the upload starts.

          Multipart upload is introduced step by step below.

          Initialize Multipart Upload

          Use the initiate_multipart_upload method to initialize a multipart upload event:

          upload_id = bos_client.initiate_multipart_upload(bucket_name, object_key).upload_id

          This method returns an InitMultipartUploadResponse object whose upload_id identifies the current upload event.

          Initialization of Multipart Upload with Specific Header

          bos_client.initiate_multipart_upload(bucket_name=bucket,
                                               key=object_key,
                                               user_headers=user_headers)

          The header attributes that can be set include "Cache-Control", "Content-Encoding", "Content-Disposition" and "Expires"; the get-object and get-object-meta interfaces return these four headers if they were set.

          Initialization of Multipart Upload of Infrequent Access Storage, Cold Storage and Archive Storage

          storage_class needs to be specified when initializing a multipart upload to infrequent access storage, see the following code (cold and archive storage work the same way):

          from baidubce.services.bos import storage_class
          
          bos_client.initiate_multipart_upload(bucket_name=bucket,
                                               key=object_key,
                                               storage_class=storage_class.STANDARD_IA)

          Upload in Parts

          After initialization, upload the parts:

          import os
          
          left_size = os.path.getsize(file_name)
          # left_size is the number of bytes still to upload
          # offset is the starting position of the current part within the file
          offset = 0
          
          part_number = 1
          part_list = []
          
          while left_size > 0:
              # Set each part to 5 MB
              part_size = 5 * 1024 * 1024
              if left_size < part_size:
                  part_size = left_size
          
              response = bos_client.upload_part_from_file(
                  bucket_name, object_key, upload_id, part_number, part_size, file_name, offset)
          
              left_size -= part_size
              offset += part_size
              part_list.append({
                  "partNumber": part_number,
                  "eTag": response.metadata.etag
              })
          
              part_number += 1

          Note:

          1. The offset parameter is the starting offset of the part within the file, in bytes.
          2. size defines the size of each part in bytes. Except for the final part, every part must be at least 5 MB. The Upload Part interface does not check the part size at upload time; it is only checked when complete_multipart_upload() is called.
          3. To guard against data errors during network transfer, it is recommended to verify each uploaded part against the Content-MD5 value returned by BOS after Upload Part. Once all parts are combined into one object, the object no longer carries an MD5 value.
          4. Part numbers range from 1 to 10,000. Outside this range, BOS returns the error code InvalidArgument.
          5. Each time a part is uploaded, position the stream at the offset where that part begins.
          6. Each time a part is uploaded, the BOS response contains an etag and a part number (partNumber). Both are needed later to complete the multipart upload, so save them; they are usually collected in a list.

          Complete Multipart Upload

          bos_client.complete_multipart_upload(bucket_name, object_key, upload_id, part_list)

          Here part_list is a list whose elements are dicts, each containing the two keys partNumber and eTag.

          The following is an example:

          [{'partNumber': 1, 'eTag': 'f1c9645dbc14efddc7d8a322685f26eb'}, {'partNumber': 2, 'eTag': 'f1c9645dbc14efddc7d8a322685f26eb'}, {'partNumber': 3, 'eTag': '93b885adfe0da089cdf634904fd59f71'}]

          The attributes available on the response object returned by this method include:

          • bucket — Bucket name.
          • key — Object name.
          • e_tag — ETag of the combined object.
          • location — URL of the object.

          Note: After receiving the part list submitted by the user, BOS verifies the validity of each part (using the per-part ETags gathered during upload) one by one. Once all parts are verified, BOS combines them into the complete object.

          Cancel Multipart Upload Event

          You can use the abort_multipart_upload method to cancel a multipart upload event:

          bos_client.abort_multipart_upload(bucket_name, object_key, upload_id = upload_id)

          Get Unfinished Multipart Upload Event

          You can use the following two methods to obtain the unfinished multipart upload events in a bucket:

          Method 1:

          response = bos_client.list_multipart_uploads(bucket_name)
          for item in response.uploads:
              print item.upload_id

          For list_multipart_uploads, BOS returns at most 1,000 multipart uploads per call and supports prefix and delimiter filtering; a manual paging sketch follows the parameter list below.

          The optional parameters of list_multipart_uploads also include:

          • delimiter (String, optional) — Delimiter; mainly implements the logic of listing folders.
          • key_marker (String, optional) — With object keys sorted in lexicographic order, this request returns entries starting after key_marker.
          • max_uploads (Int, optional) — Maximum number of multipart uploads returned by this request; default 1,000, maximum 1,000.
          • prefix (String, optional) — Key prefix; restricts the returned object keys to those with this prefix.
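          Based on these parameters, a minimal manual-paging sketch (Method 2 below avoids this loop entirely; the is_truncated and next_key_marker attributes are listed in the table below):

          key_marker = None
          while True:
              response = bos_client.list_multipart_uploads(bucket_name,
                                                           max_uploads=100,
                                                           key_marker=key_marker)
              for item in response.uploads:
                  print(item.upload_id)
              if not response.is_truncated:
                  break
              # Per the table below, next_key_marker may be absent unless delimiter
              # is set; fall back to the key of the last listed upload
              key_marker = getattr(response, 'next_key_marker', None)
              if key_marker is None and response.uploads:
                  key_marker = response.uploads[-1].key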

          The attributes available on the response object returned by list_multipart_uploads include:

          • bucket — Bucket name.
          • key_marker — The object key after which this listing starts.
          • next_key_marker — Returned only when delimiter is specified and is_truncated is true; use it as the marker for the next query.
          • is_truncated — Whether results were truncated: false means all results were returned this time; true means they were not.
          • prefix — Matches object keys starting with prefix, up to the first occurrence of the delimiter character, returned as a group of elements.
          • common_prefixes — Returned only when delimiter is specified.
          • delimiter — Query delimiter.
          • max_uploads — Maximum number of entries returned.
          • uploads — Container of all unfinished multipart upload events.
          • +key — Object name of the multipart upload.
          • +upload_id — Multipart upload ID.
          • +initiated — Start time of the multipart upload.
          • +owner — User information of the corresponding bucket.
          • ++id — User ID of the bucket owner.
          • ++display_name — Name of the bucket owner.

          The list_all_multipart_uploads method returns a generator over uploads; it is not limited to 1,000 results per call and returns all results.

          Method 2:

          uploads = bos_client.list_all_multipart_uploads(bucket_name)
          for item in uploads:
              print item.upload_id

          Get All Uploaded Part Information

          You can use the following 2 methods to obtain all uploaded parts in an upload event:

          Method 1:

          response = bos_client.list_parts(bucket_name, object_key, upload_id)
          for item in response.parts:
              print item.part_number

          Note: 1. BOS returns parts sorted in ascending order of partNumber. 2. It is not recommended to build the part list for the final CompleteMultipartUpload from the ListParts result, because network transfers can corrupt data.

          The optional parameters of the list_parts method also include:

          • max_parts (Int, optional) — Maximum number of parts returned by BOS at one time; default 1,000, maximum 1,000.
          • part_number_marker (Int, optional) — Parts are sorted by partNumber; this request returns parts starting after this partNumber.

          The attributes available on the response object returned by list_parts include:

          • bucket — Bucket name.
          • key — Object name.
          • initiated — Start time of the current multipart upload.
          • max_parts — Maximum number of entries returned.
          • is_truncated — Whether results were truncated: false means all results were returned this time; true means they were not.
          • storage_class — Storage type of the object: STANDARD (standard), STANDARD_IA (infrequent access), COLD (cold storage) or ARCHIVE (archive).
          • part_number_marker — Starting part marker of this listing.
          • parts — Part list (list type).
          • +part_number — Part number.
          • +last_modified — Time the part was last modified.
          • +e_tag — ETag of each uploaded part.
          • +size — Size of the part content, in bytes.
          • upload_id — Current multipart upload ID.
          • owner — User information of the corresponding bucket.
          • +id — User ID of the bucket owner.
          • +display_name — Name of the bucket owner.
          • next_part_number_marker — The partNumber of the last record in this response; it can be used as the part_number_marker of the next request.

          Method 2:

          parts = bos_client.list_all_parts(bucket_name, object_key, upload_id=upload_id)
          for item in parts:
              print item.part_number

          The list_all_parts method returns a generator over parts; it is not limited to 1,000 results per call and returns all results.

          Obtain Storage Type of a Multipart Upload Object

          response = bos_client.list_parts(bucket_name=bucket,
                                           key=object_key,
                                           upload_id=upload_id)
          
          print response.storage_class

          Packaged Multipart Upload

          In the Python SDK, BOS provides the put_super_obejct_from_file() interface (the method name is spelled this way in the SDK), which packages the three multipart upload methods initiate_multipart_upload, upload_part_from_file and complete_multipart_upload; you only need to call this one interface to complete a multipart upload.

          import multiprocessing
          file_name = "/path/to/file.zip"
          result = bos_client.put_super_obejct_from_file(bucket_name, key, file_name,
                      chunk_size=5, thread_num=multiprocessing.cpu_count())
          if result:
              print "Upload success!"

          The optional parameters of this method are:

          • chunk_size (int, optional) — Part size, in MB; the default is 5 MB.
          • thread_num (int, optional) — Number of threads in the thread pool used for the multipart upload; defaults to the number of CPU cores.

          If uploading a large file takes a long time and you want to stop the upload, you can call the cancel() method of UploadTaskHandle to cancel the multipart upload. For example:

          import threading
          import time
          import multiprocessing
          from baidubce.services.bos.bos_client import UploadTaskHandle
          file_name = "/path/to/file.zip"
          uploadTaskHandle = UploadTaskHandle()
          t = threading.Thread(target=bos_client.put_super_obejct_from_file, args=(bucket_name, key, file_name),
                  kwargs={
                      "chunk_size": 5,
                      "thread_num": multiprocessing.cpu_count(),
                      "uploadTaskHandle": uploadTaskHandle
                      })
          t.start()
          time.sleep(2)
          uploadTaskHandle.cancel()
          t.join()

          Resumable Upload

          When a user uploads a large file to BOS and the network is unstable or the program crashes, the entire upload fails and the part uploaded before the failure is wasted, forcing the user to start over. This wastes resources, and on an unstable network the upload may never complete even after many retries. For these scenarios, BOS supports resuming an upload from a breakpoint:

          • Under normal network conditions, the three-step (multipart) upload method is recommended, splitting the object into parts of about 1 MB; refer to Multipart Upload.
          • Under poor network conditions, the append_object method is recommended for resuming, appending small chunks of about 256 KB at a time; see Append Upload.

          Tips

          • Resumable upload is an encapsulation and enhancement of multipart upload; it is implemented on top of multipart upload, as the sketch below shows.
          • When the file is large or the network environment is poor, multipart upload is recommended.
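          The following is a minimal resumable-upload sketch built only from the multipart interfaces documented above. It assumes bos_client, bucket_name, object_key and file_name are defined, uses fixed 5 MB parts, and assumes you persist upload_id (e.g., in a local file) so it can be passed back in after a crash:

          import os
          
          def resumable_put(bucket_name, object_key, file_name, upload_id=None):
              # Reuse the saved upload_id when resuming; otherwise start a new upload
              if upload_id is None:
                  upload_id = bos_client.initiate_multipart_upload(
                      bucket_name, object_key).upload_id
          
              # Parts that reached the server before the interruption can be skipped
              done = {}
              for p in bos_client.list_all_parts(bucket_name, object_key,
                                                 upload_id=upload_id):
                  done[p.part_number] = p.e_tag
          
              left_size = os.path.getsize(file_name)
              offset = 0
              part_number = 1
              part_list = []
              while left_size > 0:
                  part_size = min(5 * 1024 * 1024, left_size)
                  if part_number in done:
                      etag = done[part_number]  # uploaded in a previous run
                  else:
                      response = bos_client.upload_part_from_file(
                          bucket_name, object_key, upload_id,
                          part_number, part_size, file_name, offset)
                      etag = response.metadata.etag
                  part_list.append({"partNumber": part_number, "eTag": etag})
                  left_size -= part_size
                  offset += part_size
                  part_number += 1
          
              bos_client.complete_multipart_upload(bucket_name, object_key,
                                                   upload_id, part_list)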

          Download File

          BOS Python SDK provides rich file download interfaces, and you can download files from BOS in the following ways:

          • Simple streaming download
          • Download to local file
          • Downloaded as string
          • Resumable download
          • Range download

          Simple Object Reading

          You can read an object as a stream through the following code:

          response = bos_client.get_object(bucket_name, object_key)
          s = response.data
          
          # Process object
          ...
          
          # Close stream
          response.data.close()

          Download Object to File or String Directly

          You can download an object to a local file by referring to the following code:

          bos_client.get_object_to_file(bucket_name, object_key, file_name)

          You can download an object as a string by referring to the following code:

          result = bos_client.get_object_as_string(bucket_name, object_key)
          print result

          Range Download

          For more flexibility, you can specify a download range with the range parameter to fetch only part of an object. If the specified range is 0-100, bytes 0 through 100 (inclusive) are returned, 101 bytes in total, i.e., [0, 100].

          range = [0,1000]
          #Return the object data within the specified range
          print bos_client.get_object_as_string(bucket_name, object_key, range = range)
          #Return the object data within the specified range to files 
          bos_client.get_object_to_file(bucket_name, object_key, file_name, range = range) 

          Set the range parameter of get_object_as_string and get_object_to_file to download a byte range of the object. This can be used for segmented downloads and resumable downloads, as sketched below.
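          A minimal segmented-download sketch built on the range parameter, assuming bos_client, bucket_name, object_key and file_name are defined as above and that the object size is read from its metadata (see Get ObjectMetadata Only below):

          # Total object size, taken from the object metadata
          file_size = int(bos_client.get_object_meta_data(bucket_name, object_key).metadata.content_length)
          
          chunk = 4 * 1024 * 1024  # 4 MB per request
          with open(file_name, 'wb') as fp:
              start = 0
              while start < file_size:
                  end = min(start + chunk, file_size) - 1
                  # range is inclusive on both ends: [start, end]
                  fp.write(bos_client.get_object_as_string(bucket_name, object_key, range=[start, end]))
                  start = end + 1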

          Other Methods

          Get Storage Type of Object

          An object's storage class can be STANDARD (standard storage), STANDARD_IA (infrequent access storage), COLD (cold storage) or ARCHIVE (archive storage), and it can be read with the following code:

          response = bos_client.get_object_meta_data(bucket_name, object_key)
          print response.metadata.bce_storage_class

          Get ObjectMetadata Only

          The get_object_meta_data method returns only the metadata of an object, not the object entity itself, as shown in the following code:

          response = bos_client.get_object_meta_data(bucket_name, object_key)

          The attributes available on the response object returned by get_object_meta_data include:

          • content_length — Size of the object.
          • e_tag — HTTP entity tag of the object.
          • bce_meta — Returned if custom user_metadata was specified when the object was uploaded.
          • bce_storage_class — Storage type of the object.
          • bce_restore — Returned for an archive object that is being retrieved or has already been retrieved. While retrieval is in progress, the value is ongoing-request="true"; once retrieved, the value is ongoing-request="false", expiry-date="Wed, 07 Nov 2019 00:00:00 GMT", where expiry-date is the time (GMT) at which the retrieved copy expires.

          Retrieve the Files of Archive Storage Type

          Retrieve the Archive Files

          After archive files are uploaded, they are in the frozen state; to download an archive file, you must first retrieve (restore) it. The requester must have read access to the archive file, and the file must be in the frozen state.

          The example for the retrieval of the archive files is shown as follows:

          # Retrieve the archive file and keep the restored copy available for 2 days
          bos_client.restore_object(bucket_name, target_key, days=2)

          Determine if the Archive File Is Retrieved

          For an archive object that is being retrieved or has already been retrieved, bce_restore is returned when getting the object metadata. While retrieval is in progress, the value of bce_restore is ongoing-request="true"; once retrieved, the value is ongoing-request="false", expiry-date="Wed, 07 Nov 2019 00:00:00 GMT", where expiry-date indicates when the retrieved copy expires.

          The following is an example:

          response = bos_client.get_object_meta_data(bucket_name, object_key)
          if response.metadata.bce_restore is not None:
              if response.metadata.bce_restore.find("expiry-date") >= 0:
                  print("archive object is restored.")
              else:
                  print("archive object is restoring.")
          else:
              print("archvie object is freezed.")

          Change File Storage Level

          As mentioned above, BOS supports four storage types: 'STANDARD' (standard storage), 'STANDARD_IA' (infrequent access storage), 'COLD' (cold storage) and 'ARCHIVE' (archive storage). The BOS Python SDK also lets users change the storage type of specific files. The parameter involved is as follows:

          • x-bce-storage-class — Specifies the storage class of the object: STANDARD (standard storage), STANDARD_IA (infrequent access storage), COLD (cold storage) or ARCHIVE (archive storage). Defaults to standard storage when not specified.

          Note:

          • When you call the copy_object() interface, if the source object is of the archive type, restore_object() must be called first to retrieve the archive file.

          The following is an example:

          from baidubce.services.bos import storage_class
          
          # Standard storage to infrequent access storage
          bos_client.copy_object(source_bucket_name, source_key,
                              target_bucket_name, target_key,
                              storage_class = storage_class.STANDARD_IA)
          # Infrequent access storage to cold storage
          bos_client.copy_object(source_bucket_name, source_key,
                              target_bucket_name, target_key,
                              storage_class = storage_class.COLD)

          Get File Download URL

          You can get a pre-signed download URL for an object with the following sample code:

          url = bos_client.generate_pre_signed_url(bucket_name, object_key, timestamp, expiration_in_seconds)

          Note:

          • Before calling this function, you need to manually set the endpoint to the domain name of the target region. Baidu AI Cloud currently supports multiple regions; please refer to the Region Selection Description. Currently "North China - Beijing", "South China - Guangzhou" and "East China - Suzhou" are supported. Beijing: http://bj.bcebos.com; Guangzhou: http://gz.bcebos.com; Suzhou: http://su.bcebos.com.
          • timestamp is an optional parameter; when not set, it defaults to the current time.
          • timestamp identifies the time from which the URL becomes valid, e.g. timestamp = int(time.time()), which requires import time.
          • expiration_in_seconds sets how long the URL remains valid; it is optional and defaults to 1800 seconds. To generate a URL that never expires, set expiration_in_seconds to -1; no other negative value is allowed. A complete example follows below.
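          A minimal sketch tying these parameters together (bucket_name and object_key as above):

          import time
          
          timestamp = int(time.time())       # the URL becomes valid now
          expiration_in_seconds = 1800       # the URL stays valid for 30 minutes
          url = bos_client.generate_pre_signed_url(bucket_name, object_key,
                                                   timestamp, expiration_in_seconds)
          print(url)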

          Enumerate Files in Storage Space

          BOS SDK allows users to enumerate objects in the following two ways:

          • Simple enumeration
          • Complex enumeration by parameters

          In addition, you can simulate folders while listing files.

          Simple Enumeration

          After a series of uploads, you may need to view all objects in a given bucket, which can be done with the following code:

          response = bos_client.list_objects(bucket_name)
          for object in response.contents:
              print object.key

          Note:

          1. By default, at most 1,000 objects are returned per request; if the bucket holds more than 1,000 objects, is_truncated is True and next_marker is returned as the starting point for the next read.
          2. To fetch the remaining objects, use the marker parameter to read in several passes.

          List all objects under the current bucket at one time:

          for object in bos_client.list_all_objects(bucket_name):
              print object.key

          Complex Enumeration by Parameters

          Other optional parameters of list_objects include:

          • prefix — Restricts the returned object keys to those prefixed with prefix.
          • delimiter — A character used to group object names: all keys that share the given prefix, up to the first occurrence of the delimiter character, are grouped into a single set of elements, CommonPrefixes.
          • max_keys — Limits the maximum number of objects returned, at most 1,000; defaults to 1,000 when not set.
          • marker — Results start from the first key after marker in lexicographic order.

          Note:

          1. If an object is named exactly after Prefix, a query using only Prefix still includes that object among the returned keys; see Recursively List All Files in the Directory.
          2. If an object is named exactly after Prefix, a query combining Prefix and Delimiter returns keys containing Null, and the key names do not contain the Prefix; see View Files and Subdirectories under the Directory.

          Next, we use several cases to illustrate the method of parameter enumeration:

          Specify the Maximum Number of Returned Entries

          max_keys = 500
          # Specify the maximum number of returned entries to be 500
          response = bos_client.list_objects(bucket_name, max_keys = max_keys)
          for object in response.contents:
              print object.key

          Return the Object with the Specified Prefix

          prefix = "test"
          # Specify the returned object with test as the prefix
          response = bos_client.list_objects(bucket_name, prefix = prefix)
          for object in response.contents:
              print object.key

          Return from the Specified Object

          marker = "object"
          # Results start after the specified object, which itself is not included
          response = bos_client.list_objects(bucket_name, marker = marker)
          for object in response.contents:
              print object.key

          Page to Get All Objects

          isTruncated = True
          # You can set a maximum of 500 records per page
          max_keys = 500
          marker = None
          while isTruncated:
              response = bos_client.list_objects(bucket_name, max_keys=max_keys, marker=marker)
              for object in response.contents:
                  print object.key
              isTruncated = response.is_truncated
              marker = getattr(response, 'next_marker', None)

          Page through All Objects Starting from a Specific Object

          # You can set up to 500 records per page and start from a specific object
          max_keys = 500
          marker = "object"
          isTruncated = True
          while isTruncated:
              response = bos_client.list_objects(bucket_name, max_keys=max_keys, marker=marker)
              for object in response.contents:
                  print object.key
              isTruncated = response.is_truncated
              marker = getattr(response, 'next_marker', None)

          Page through All Objects with a Specified Prefix

          # Page through the objects with the specified prefix, up to 500 records per page
          max_keys = 500
          prefix = "object"
          marker = None
          isTruncated = True
          while isTruncated:
              response = bos_client.list_objects(bucket_name, max_keys=max_keys, prefix=prefix, marker=marker)
              for object in response.contents:
                  print object.key
              isTruncated = response.is_truncated
              marker = getattr(response, 'next_marker', None)

          The attributes available on the response object returned by list_objects include:

          • name — Bucket name.
          • prefix — Matches object keys starting with prefix, up to the first occurrence of the delimiter character, returned as a group of elements.
          • marker — Starting point of this query.
          • max_keys — Maximum number of entries returned.
          • is_truncated — Whether results were truncated: false means all results were returned this time; true means they were not.
          • contents — Container of the returned objects.
          • +key — Object name.
          • +last_modified — Time the object was last modified.
          • +e_tag — HTTP entity tag of the object.
          • +size — Content size of the object, in bytes.
          • +owner — User information of the bucket that holds the object.
          • ++id — User ID of the bucket owner.
          • ++display_name — Name of the bucket owner.
          • next_marker — Returned whenever is_truncated is true, as the marker value for the next query.
          • common_prefixes — Returned only when delimiter is specified.

          The list_all_objects method returns a generator over contents; it is not subject to the 1,000-results-per-call limit and returns all results.

          Simulate Folder Function

          BOS storage has no real concept of folders: all elements are stored as objects. However, users often want folders to organize files when working with their data. BOS therefore supports simulated folders: creating a simulated folder essentially creates an object of size 0. This object can be uploaded and downloaded like any other, and the console displays any object whose key ends with "/" as a folder.
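          For example, the simulated folder fun/ used in the listings below can be created with put_object_from_string and an empty string:

          # Create a simulated folder: an object of size 0 whose key ends with "/"
          bos_client.put_object_from_string(bucket_name, "fun/", "")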

          You can simulate folder behavior by combining the delimiter and prefix parameters, which work together as follows:

          Setting prefix to a folder name lists the files beginning with that prefix, i.e., all files and subfolders (directories) under the folder recursively, with file names shown in contents. If delimiter is additionally set to "/", the return value lists only the files and subfolders directly under the folder: the names of the subfolders (directories) are returned in common_prefixes, and files and folders nested inside the subfolders are not shown.

          Suppose a bucket contains five files: bos.jpg, fun/, fun/test.jpg, fun/movie/001.avi and fun/movie/007.avi, and the symbol "/" is used as the folder separator.

          Here are some application modes:

          List All Files in the Bucket.

          When you need to get all files under a bucket, refer to Page to Get All Objects.

          Recursively List All Files in the Directory.

          Set the Prefix parameter to access all files under a directory:

          prefix = "fun/";
          print "Objects:";
          # Recursively list all files in the fun directory
          response = bos_client.list_objects(bucket_name, prefix = prefix)
          for object in response.contents:
              print object.key

          Output:

          Objects:
          fun/
          fun/movie/001.avi
          fun/movie/007.avi
          fun/test.jpg

          View Files and Subdirectories under the Directory.

          The files and subdirectories under a directory can be listed by combining prefix and delimiter:

          # "/" is a folder separator
          delimiter = "/"
          prefix = "fun/";
          # List all files and folders under fun file directory
          response = bos_client.list_objects(bucket_name, prefix = prefix, delimiter = delimiter)
          print "Objects:"
          for object in response.contents:
              print object.key
              
          # Traverse all CommonPrefix
          print "CommonPrefixs:"
          for object in response.common_prefixes:
              print object.prefix

          Output:

          Objects:
          fun/
          fun/test.jpg
          
          CommonPrefixs:
          fun/movie/

          In the returned result, the list in contents gives the files directly under the fun directory, and the list in common_prefixes gives all subfolders under fun. The files fun/movie/001.avi and fun/movie/007.avi are not listed because they are inside the movie subfolder of fun.

          List the Storage Attributes of Objects in Bucket.

          In addition, you can view all objects in a specified bucket together with each object's storage class, with the following code:

          response = bos_client.list_objects(bucket_name)
          for object in response.contents:
              print 'object:%s, storage_class:%s' % (object.key, object.storage_class)

          Object Privilege Control

          Set Access Privilege of Object

          Currently, BOS supports two ways of setting an ACL. The first uses a canned ACL: when calling PutObjectAcl, the object's access permission is set through the x-bce-acl or x-bce-grant-read / x-bce-grant-full-control header fields. The currently settable permissions include private and public-read, and the two types of headers cannot appear in the same request. The second way uploads an ACL file; for details, see Set Object Permission Control.

          1. Set the access permission of the object using the x-bce-acl header field

          from baidubce.services.bos import canned_acl
          # Set object as private privilege
          bos_client.set_object_canned_acl(bucket_name, object_key, canned_acl=canned_acl.PRIVATE)
          # Set object as public-read privilege
          bos_client.set_object_canned_acl(bucket_name, object_key, canned_acl=canned_acl.PUBLIC_READ)

          2. Set the access permission of the object using the x-bce-grant-read or x-bce-grant-full-control header field

          # Authorize the specified user the right to access object
          bos_client.set_object_canned_acl(bucket_name, object_key, grant_read='id="12345678dfd5487e99f5c85aca5c1234",id="1234567880274ea5a9d50fe94c151234"')
          # Authorize the specified user the FULL_CONTROL privilege of object
          bos_client.set_object_canned_acl(bucket_name, object_key, grant_full_control='id="12345678dfd5487e99f5c85aca5c1234",id="1234567880274ea5a9d50fe94c151234"')

          3. Set object permissions through the set_object_acl() interface.

          # Authorize the specified user the right to access object
          acl = [{
              "grantee": [{
                  "id": "12345678dfd5484399f5c85aca5c1234"
              }],
              "privilege": ["READ"]
          }]
          bos_client.set_object_acl(bucket_name, object_key, acl=acl)

          View Object Privilege

          You cannot set an object's ACL while the retrieval of an archive object is incomplete, or right after an archive file has been uploaded (for the duration, see the retrieval duration).

          View the permission of an object as shown in the following code:

          response = bos_client.get_object_acl(bucket_name, object_key)
          print "object acl:", response

          The attributes available on the response object returned by get_object_acl include:

          • access_control_list — Permission list of the object.
          • grantee — The authorized users.
          • +id — ID of the authorized user.
          • privilege — Permissions of the authorized user.

          Delete Object Privilege

          You cannot delete an object's ACL while the retrieval of an archive object is incomplete, or right after an archive file has been uploaded (for the duration, see the retrieval duration).

          The following code deletes the object's permission settings:

          bos_client.delete_object_acl(bucket_name, object_key)

          Delete File

          Delete an object

          Delete an object by the following code:

          bos_client.delete_object(bucket_name, object_key)

          Delete object in batch

          You can delete objects in batch with the following code:

          key_list = [object_key1, object_key2, object_key3]
          bos_client.delete_multiple_objects(bucket_name, key_list)

          Note: a single request can delete at most 1,000 objects; longer lists can be split into chunks, as sketched below.
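          For longer key lists, a minimal sketch that deletes in chunks of at most 1,000 keys per request:

          batch = 1000
          for i in range(0, len(key_list), batch):
              bos_client.delete_multiple_objects(bucket_name, key_list[i:i + batch])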

          Check if the File Exists

          You can check whether a file exists through the following operations:

          from baidubce import exception
          
          try:
              response = bos_client.get_object_meta_data(bucket_name, object_key)
              print "Get meta:", response.metadata
          except exception.BceError as e:
              print e

          Get and Update Object Metadata

          Object metadata describes the attributes of files uploaded to BOS. It includes two types: HTTP standard attributes (HTTP headers) and user meta (custom metadata).

          Get Object Metadata

          Refer to Get ObjectMetadata Only.

          Modify Object Metadata

          BOS modifies an object's metadata by copying the object onto itself: when copying, set the target bucket to the source bucket and the target object to the source object, and supply the new metadata; the copy then rewrites the metadata. If no metadata is set, an error is reported.

          Archived files do not support metadata modification.

          user_metadata = {'meta_key': 'meta_value'}
          bos_client.copy_object(source_bucket_name = bucket_name, 
                                 source_key = object_name, 
                                 target_bucket_name = bucket_name, 
                                 target_key = object_name, 
                                 user_metadata = user_metadata)
          response = bos_client.get_object_meta_data(bucket_name = bucket_name, 
                                                     key = object_name)
          print response

          Copy Object

          You can copy an object through the copy_object function, as shown in the following code:

          bos_client.copy_object(source_bucket_name, source_object_key, target_bucket_name, target_object_key)

          Synchronous Copy

          The current copy_object interface of BOS is implemented synchronously: the BOS server returns success only after the copy has completed. Synchronous copy lets users determine the copy status, but the copy time perceived by users is longer and proportional to the file size.

          Synchronous copy is more in line with industry conventions and improves compatibility with other platforms; it also simplifies the business logic of the BOS server and improves service efficiency.

          If the source object is an archived file, the archived file must be retrieved first.

          If you use an SDK earlier than bce-python-sdk-0.8.12, the copy request may appear to succeed while the actual copy fails, so you are recommended to use the latest version of the SDK.

          Multipart Upload Copy

          In addition to copying files through copy_object, BOS provides another copy mode: multipart upload copy. If the source object is an archived file, it must be retrieved first.

          You can use Multipart Upload Copy in the following application scenarios (but not limited to this), such as:

          • Resumable copy support is required.
          • The file to copy is larger than 5 GB.
          • Network conditions are poor, and connections to the BOS servers are often interrupted.

          The three-step copy is introduced step by step below. It consists of init, "copy part" and complete; the init and complete steps are the same as in multipart upload, see Initialize Multipart Upload and Complete Multipart Upload.

          Reference code for copying the parts:

          left_size = int(bos_client.get_object_meta_data(source_bucket, source_key).metadata.content_length)
          # left_size is the number of bytes still to copy
          
          # offset is the starting position of the current part
          offset = 0
          part_number = 1
          part_list = []
          while left_size > 0:
              # Set each part to 5 MB
              part_size = 5 * 1024 * 1024
              if left_size < part_size:
                  part_size = left_size
          
              response = bos_client.upload_part_copy(source_bucket, source_key, target_bucket, target_key,
                                                     upload_id, part_number, part_size, offset)
          
              left_size -= part_size
              offset += part_size
              part_list.append({
                  "partNumber": part_number,
                  "eTag": response.etag
              })
          
              part_number += 1

          Note:

          1. The offset parameter is the starting offset of the part, in bytes.
          2. The size parameter defines the size of each part in bytes. Except for the last part, every part must be at least 5 MB. The copy is then finalized as sketched below.
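          As noted above, the init and complete steps are identical to multipart upload. A minimal sketch that brackets the copy loop, using the names from the code above:

          # Before the loop: initialize the multipart upload on the target object
          upload_id = bos_client.initiate_multipart_upload(target_bucket, target_key).upload_id
          # ... run the copy loop above ...
          # After the loop: submit the collected part list to finish the copy
          bos_client.complete_multipart_upload(target_bucket, target_key, upload_id, part_list)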
