File Management
Upload Files
In BOS, the basic unit of user data is the object. The number of objects in a bucket is unlimited, but a single object can store at most 5 TB of data. An object consists of a Key, Meta and Data: the Key is the object's name, the Meta is the user's description of the object and consists of a series of name-value pairs, and the Data is the object's content.
BOS Python SDK provides a rich set of file upload interfaces. Files can be uploaded in the following ways:
- Simple Upload
- Append Upload
- Multipart Upload
- Breakpoint Continued Upload
Naming specification of object is as follows:
- Use UTF-8 code.
- The length must be between 1 and 1023 bytes.
- The key cannot start with "/", and the "@" character is not allowed, because "@" is reserved for the image processing interface.
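These rules can be checked locally before uploading; the following is a minimal validation sketch (is_valid_object_key is a hypothetical helper, not part of the SDK):

def is_valid_object_key(key):
    # The key is expected as a UTF-8 encoded byte string
    if not 1 <= len(key) <= 1023:
        return False
    # The key must not start with "/" and must not contain "@"
    if key.startswith('/') or '@' in key:
        return False
    return True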
Simple Upload
For simple upload, BOS supports uploading an object from a file, a data stream or a string; see the following code.
The following code can be used to upload an object:
data = open(file_name, 'rb')
# To upload an object from a data stream, you need to calculate the data length content_length yourself
# You also need to calculate content_md5 yourself: run the MD5 algorithm on the data to obtain 128-bit binary data, then Base64-encode the result
bos_client.put_object(bucket_name, object_key, data, content_length, content_md5)
# Upload an object from a string
bos_client.put_object_from_string(bucket_name, object_key, string)
# Upload an object from a file
bos_client.put_object_from_file(bucket_name, object_key, file_name)
Here, data is a stream object; different stream types are handled differently: upload from a string uses the return value of StringIO, and upload from a file uses the return value of open(), so BOS provides encapsulated interfaces for users to upload quickly.
When an object is uploaded to BOS as a file, the put_object family of interfaces supports objects of no more than 5 GB. After a put_object, put_object_from_string or put_object_from_file request is processed successfully, BOS returns the ETag of the object as the file identification.
All these interfaces have optional parameters:
Parameter | Description |
---|---|
content_type | Type of uploaded file or string |
content_md5 | File data verification. After it is set, BOS enables MD5 verification of the file content by comparing the MD5 you provide with the MD5 of the file; an error is thrown if they are inconsistent. |
content_length | Length of the file; put_object_from_string() does not take this parameter |
content_sha256 | Used for file verification |
user_metadata | Custom metadata |
storage_class | Set file storage type |
user_headers | User defined header |
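As a hedged illustration, several of these optional parameters can be combined in a single upload call (the content type, metadata and header values below are arbitrary examples):

bos_client.put_object_from_file(bucket_name,
                                object_key,
                                file_name,
                                content_type="application/octet-stream",
                                user_metadata={"name": "my-data"},
                                user_headers={"Cache-Control": "no-cache"})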
The calculation method of content_md5 is to run the MD5 algorithm on the data to obtain 128-bit binary data and then Base64-encode it. The following is an example:
import hashlib
import base64

file_name = "your_file"
buf_size = 8192
md5 = hashlib.md5()
fp = open(file_name, 'rb')
while True:
    buf = fp.read(buf_size)
    if not buf:
        break
    md5.update(buf)
fp.close()
content_md5 = base64.standard_b64encode(md5.digest())
Set Object Metadata
Object metadata is the attribute description of the file when the user uploads the file to BOS. It is mainly divided into two types: HTTP headers and custom metadata.
Set the HTTP Header of Object
The BOS Python SDK essentially calls the backend HTTP interface, so you can customize the HTTP headers of an object when uploading a file. Common HTTP headers are described as follows:
Name | Description | Default value |
---|---|---|
Cache-Control | It specifies the caching behavior of the web page when the object is downloaded. | None |
Content-Encoding | It represents in which way the message body encodes and converts contents | None |
Content-Disposition | It instructs the MIME user agent how to handle the attached file, whether to display it inline or download it, and what file name to use. | None |
Expires | Cache expiration time | None |
The reference code is as follows:
user_headers = {"header_key": "header_value"}

# Upload object with specific header from string
bos_client.put_object_from_string(bucket=bucket_name,
                                  key=object_key,
                                  data=string,
                                  user_headers=user_headers)

# Upload object with specific header from file
bos_client.put_object_from_file(bucket=bucket_name,
                                key=object_key,
                                file_name=file,
                                user_headers=user_headers)
Custom Metadata
BOS allows you to define custom metadata to describe an object, as shown in the following code:
#custom metadata
user_metadata = {"name":"my-data"}
#Upload object with meta customized by users from string
bos_client.put_object_from_string(bucket=bucket_name,
key=object_key,
data=string,
user_metadata=user_metadata)
#Upload object with meta customized by users from file
bos_client.put_object_from_file(bucket=bucket_name,
key=object_key,
file_name=file,
user_metadata=user_metadata)
Tips:
- In the above code, the user defined a metadata entry whose name is "name" and whose value is "my-data".
- When users download this object, they can get the metadata together; a read-back sketch follows these tips.
- One object can carry multiple such entries, but the total size of the User Meta must be below 2 KB.
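As mentioned in the tips, the custom metadata comes back with the object; a minimal read-back sketch, assuming the metadata is exposed through the bce_meta field described in "Get ObjectMetadata Only" below:

response = bos_client.get_object_meta_data(bucket_name, object_key)
# Custom metadata set at upload time is carried in bce_meta
print response.metadata.bce_meta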
Set the Copy Attribute of Object
BOS provides a copy_object interface to copy an existing object to another object. During copying, the ETag or modification status of the source object can be checked, and whether to execute the copy is decided according to the result. The parameters are described in detail below:
Name | Type | Description | Required or not |
---|---|---|---|
x-bce-copy-source-if-match | String | If ETag value of source object is equal to ETag provided by the user, copy operation is performed, otherwise the copy fails. | No |
x-bce-copy-source-if-none-match | String | If the ETag value of the source object is not equal to the ETag provided by the user, the copy operation is performed; otherwise the copy fails. | No |
x-bce-copy-source-if-unmodified-since | String | If source object is not modified after x-bce-copy-source-if-unmodified-since, copy operation is performed, otherwise copy fails. | No |
x-bce-copy-source-if-modified-since | String | If source object is modified after x-bce-copy-source-if-modified-since, copy operation is performed, otherwise copy fails. | No |
Corresponding sample code:
copy_object_user_headers = {"copy_header_key":"copy_header_value"}
bos_client.copy_object(source_bucket_name = bucket_name,
source_key = object_name,
target_bucket_name = bucket_name,
target_key = object_name,
user_metadata = user_metadata,
user_headers = user_headers,
copy_object_user_headers = copy_object_user_headers)
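A hedged sketch of a conditional copy, assuming the conditional headers from the table above can be passed through copy_object_user_headers in the same way as the custom header shown in the sample; the ETag value and target_object_name are placeholders:

# Copy only if the source object's ETag still matches the expected value
copy_object_user_headers = {"x-bce-copy-source-if-match": "expected-source-etag"}
bos_client.copy_object(source_bucket_name = bucket_name,
                       source_key = object_name,
                       target_bucket_name = bucket_name,
                       target_key = target_object_name,
                       copy_object_user_headers = copy_object_user_headers)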
Set Storage Type When Uploading Object
BOS supports standard storage, infrequent access storage, cold storage and archive storage. The storage type is selected by specifying StorageClass when the object is uploaded; the default is standard storage. The parameters corresponding to the four storage types are as follows:
Storage type | Parameter |
---|---|
Standard storage | STANDARD |
Infrequent access storage | STANDARD_IA |
Cold storage | COLD |
Archive storage | ARCHIVE |
Taking cold storage and archive storage as examples, the code is as follows:
from baidubce.services.bos import storage_class
#Upload object of cold storage type from file
bos_client.put_object_from_file(bucket=bucket_name,
key=object_key,
file_name=file,
storage_class=storage_class.COLD)
#Upload object of cold storage type from string
bos_client.put_object_from_string(bucket=bucket_name,
key=object_key,
data=string,
storage_class=storage_class.COLD)
#Upload object of archive storage type from file
bos_client.put_object_from_file(bucket=bucket_name,
key=object_key,
file_name=file,
storage_class=storage_class.ARCHIVE)
Append Upload
In the simple upload method introduced above, the objects created are of the Normal type and cannot be appended to, which is inconvenient in scenarios with frequent incremental writes such as logs, video monitoring and live video.
For this reason, Baidu AI Cloud BOS supports appendObject, i.e. uploading a file by appending writes. The object created by the appendObject operation is an Appendable object, and you can append data to it. The size of an appendObject ranges from 0 to 5 GB. The archive storage type does not support append upload.
The sample code uploaded through appendObject is as follows:
# Upload an appendable object. Here content_md5(data) means that you need to calculate the MD5 value of the uploaded data yourself
# The calculation method of content_md5 is to run the MD5 algorithm on the data to obtain 128-bit binary data and then Base64-encode it; see the section "Simple Upload" above for an example
# Likewise, content_length(data) means that you need to calculate the length of the uploaded data yourself
response = bos_client.append_object(bucket_name=bucket_name,
key=object_key,
data=data,
content_md5=content_md5(data), content_length=content_length(data))
#Obtain the position of appending write next time
next_offset = response.metadata.bce_next_append_offset
bos_client.append_object(bucket_name=bucket_name,
key=object_key,
data=next_data,
content_md5=content_md5(next_data), content_length=content_length(next_data),
offset=next_offset)
#Upload an appendable object from string
from baidubce.services.bos import storage_class
bos_client.append_object_from_string(bucket_name=bucket_name,
key=object_key,
data=string,
offset=offset,
storage_class=storage_class.STANDARD,
user_headers=user_headers)
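The content_md5(data) and content_length(data) calls above are placeholders rather than SDK functions; a minimal sketch of such helpers for in-memory byte strings, following the MD5-then-Base64 calculation described in "Simple Upload", could look like this:

import base64
import hashlib

def content_md5(data):
    # MD5 of the data (128-bit binary digest), then Base64 encoded
    return base64.standard_b64encode(hashlib.md5(data).digest())

def content_length(data):
    # Length of the data in bytes
    return len(data)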
Multipart Upload
In addition to uploading files to BOS through the putObject interfaces, BOS provides another upload mode: Multipart Upload. You can use the Multipart Upload mode in application scenarios such as the following (but not limited to them):
- Breakpoint upload support is required.
- The file to upload is larger than 5 GB.
- The network conditions are poor, and the connection with BOS servers is often disconnected.
- The file needs to be uploaded streaming.
- The size of the uploaded file cannot be determined before uploading it.
Multipart Upload step by step is introduced below.
Initialize Multipart Upload
BOS uses initiate_multipart_upload method to initialize a multipart upload event:
upload_id = bos_client.initiate_multipart_upload(bucket_name, object_key).upload_id
This method returns InitMultipartUploadResponse object which contains uploadId parameter to represent the current upload event.
Initialization of Multipart Upload with Specific Header
bos_client.initiate_multipart_upload(bucket_name=bucket,
key=object_key,
user_headers=user_headers)
The header attributes that can be set include "Cache-Control", "Content-Encoding", "Content-Disposition" and "Expires"; the get-object and get-object-meta interfaces return these four headers if they are set.
Initialization of Multipart Upload of Infrequent Storage, Cold Storage and Archive Storage
storage_class needs to be specified when initializing a multipart upload for infrequent access storage; see the following code (cold storage and archive storage work in the same way):
from baidubce.services.bos import storage_class
bos_client.initiate_multipart_upload(bucket_name=bucket,
key=object_key,
storage_class = storage_class.STANDARD_IA)
Upload in Parts
Upon initialization, perform multipart upload:
import os

# left_size is the number of bytes remaining to upload, initialized to the file size
left_size = os.path.getsize(file_name)
# offset is the starting position of the current part within the file
offset = 0
part_number = 1
part_list = []
while left_size > 0:
    # Set each part to 5 MB
    part_size = 5 * 1024 * 1024
    if left_size < part_size:
        part_size = left_size
    response = bos_client.upload_part_from_file(
        bucket_name, object_key, upload_id, part_number, part_size, file_name, offset)
    left_size -= part_size
    offset += part_size
    # Save the part number and ETag of each uploaded part for complete_multipart_upload
    part_list.append({
        "partNumber": part_number,
        "eTag": response.metadata.etag
    })
    part_number += 1
Note:
- The offset parameter is the starting offset position of the part, in bytes.
- The part size is in bytes and defines the size of each part; except for the last part, every part must be at least 5 MB. However, the Upload Part interface does not check the size of an uploaded part; it is only checked when complete_multipart_upload() is called.
- To ensure the data is not corrupted during network transmission, you are recommended to use the Content-MD5 value returned by BOS for each part to verify the validity of the uploaded part data after each Upload Part. When all parts are combined into one object, it no longer contains the MD5 value.
- Part numbers range from 1 to 10,000. If this range is exceeded, BOS returns the InvalidArgument error code.
- When uploading each part, position the stream to the offset at which that part begins.
- After each part is uploaded, the result returned by BOS contains an etag and a part number (partNumber). They are needed later to complete the multipart upload, so they should be saved, typically in a list.
Complete Multipart Upload
bos_client.complete_multipart_upload(bucket_name, object_key, upload_id, part_list)
Here part_list is a list; each element is a dict containing the two keys partNumber and eTag.
The following is an example:
[{'partNumber': 1, 'eTag': 'f1c9645dbc14efddc7d8a322685f26eb'}, {'partNumber': 2, 'eTag': 'f1c9645dbc14efddc7d8a322685f26eb'}, {'partNumber': 3, 'eTag': '93b885adfe0da089cdf634904fd59f71'}]
The parameters available for calling in parsing class returned by this method include:
Parameter | Description |
---|---|
bucket | bucket name |
key | object name |
e_tag | ETag for each upload chunk |
location | URL of object |
Note: The ETags contained in this request are the ETags of the individual parts uploaded during the Upload Part step. After receiving the list of parts submitted by the user, BOS verifies the validity of each part one by one; once all parts are verified, BOS combines them into a complete object.
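As a hedged illustration, assuming the fields in the table above are exposed as attributes of the returned object:

response = bos_client.complete_multipart_upload(bucket_name, object_key, upload_id, part_list)
# Fields listed in the table above
print response.bucket
print response.key
print response.e_tag
print response.location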
Cancel Multipart Upload Event
You can use abort_multipart_upload method to cancel multipart upload:
bos_client.abort_multipart_upload(bucket_name, object_key, upload_id = upload_id)
Get Unfinished Multipart Upload Event
Users can use the following 2 methods to obtain the uncompleted multipart events in bucket:
Method 1:
response = bos_client.list_multipart_uploads(bucket_name)
for item in response.uploads:
print item.upload_id
For list_multipart_uploads, BOS returns at most 1,000 multipart uploads each time and supports prefix and delimiter filtering.
Other parameters of the list_multipart_uploads method include:
Name | Type | Description | Required or not |
---|---|---|---|
delimiter | String | Delimiter; mainly implements the logic of list folder | No |
key_marker | String | After object is sorted in lexicographic order, this time it returns from the one after keyMarker. | No |
max_uploads | Int | The maximum number of Multipart Uploads returned by this request, with default of 1,000, maximum of 1,000 | No |
prefix | String | key prefix, object key restricted to return must be prefixed with this | No |
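A hedged example of passing these parameters, assuming they are accepted as keyword arguments like the other list interfaces in this document (the prefix and delimiter values are arbitrary):

response = bos_client.list_multipart_uploads(bucket_name,
                                             max_uploads = 100,
                                             prefix = "fun/",
                                             delimiter = "/")
for item in response.uploads:
    print item.upload_id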
The parameters available for calling in the parsing class returned by the list_multipart_uploads method include:
Parameter | Description |
---|---|
bucket | bucket name |
key_marker | Name of part object started to be uploaded |
next_key_marker | This item is returned only when delimiter is specified and IsTruncated is true, as the value to enquire marker next time. |
is_truncated | It indicates whether all queries have returned; false means all results have been returned this time; true means all results have not been returned this time. |
prefix | Match object starting from prefix to the Delimiter character of the first occurrence to return as a set of elements |
common_prefixes | This item is returned only when delimiter is specified |
delimiter | Query terminator |
max_uploads | Maximum number of requests returned |
uploads | Container of all multipart events not completed |
owner | User information of corresponding bucket |
id | User ID of bucket Owner |
display_name | Name of bucket Owner |
key | object name of part |
upload_id | Multipart upload id |
initiated | Starting time of multipart upload |
The list_all_multipart_uploads method returns a generator of uploads; it is not limited to at most 1,000 results per call, and all results are returned.
Method 2:
uploads = bos_client.list_all_multipart_uploads(bucket_name)
for item in uploads:
print item.upload_id
Get All Uploaded Part Information
You can use the following 2 methods to obtain all uploaded parts in an upload event:
Method 1:
response = bos_client.list_parts(bucket_name, object_key, upload_id)
for item in response.parts:
print item.part_number
Note: 1. BOS is sorted in ascending order of PartNumber. 2. It is not recommended to generate Part list of the final CompleteMultipartUpload with the result from ListParts because network transfers can go wrong.
Other parameters of the list_parts method include:
Name | Type | Description | Required or not |
---|---|---|---|
max_parts | Int | The maximum number of parts returned by BOS at one time, with default of 1,000, maximum of 1,000 | No |
part_number_marker | Int | Sort by partNumber. The starting part of this request is returned from the next of this partNumber. | No |
The parameters available for calling in the parsing class returned by the list_parts method include:
Parameter | Description |
---|---|
bucket | bucket name |
key | object name |
initiated | Starting time of current multipart upload |
max_parts | Maximum number of requests returned |
is_truncated | It indicates whether all queries have returned; false means all results have been returned this time; true means all results have not been returned this time. |
storage_class | The storage type of the object: standard STANDARD, infrequent access STANDARD_IA, cold storage COLD or archive ARCHIVE |
part_number_marker | Flag bit of part starting |
parts | Part list, list type |
+part_number | Part number |
+last_modified | Last modification time of the part |
+e_tag | ETag for each upload chunk |
+size | Size of part contents (number of bytes) |
upload_id | Current multipart upload id |
owner | User information of corresponding bucket |
+id | User ID of bucket owner |
+display_name | Name of bucket owner |
next_part_number_marker | The partNumber of the last record returned in the current request can be used as part_number_marker of the next request |
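A hedged pagination sketch based on part_number_marker and next_part_number_marker from the table above, in the same style as the object-listing pagination later in this document:

part_number_marker = None
is_truncated = True
while is_truncated:
    response = bos_client.list_parts(bucket_name, object_key, upload_id,
                                     max_parts = 100,
                                     part_number_marker = part_number_marker)
    for item in response.parts:
        print item.part_number
    is_truncated = response.is_truncated
    part_number_marker = getattr(response, 'next_part_number_marker', None)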
Method 2:
parts = bos_client.list_all_parts(bucket_name, object_key, upload_id = upload_id)
for item in parts:
print item.part_number
The list_all_parts method returns a generator of parts; it is not limited to at most 1,000 results per call, and all results are returned.
Obtain storage type of multipart upload object
response = bos_client.list_parts(bucket_name=bucket,
key=object_key,
upload_id=upload_id)
print response.storage_class
Packaged Multipart Upload
In the Python SDK, BOS provides the put_super_obejct_from_file() interface, which packages the three multipart upload methods initiate_multipart_upload, upload_part_from_file and complete_multipart_upload; users only need to call this single interface to complete a multipart upload.
import multiprocessing
file_name = "/path/to/file.zip"
result = bos_client.put_super_obejct_from_file(bucket_name, key, file_name,
chunk_size=5, thread_num=multiprocessing.cpu_count())
if result:
    print "Upload success!"
The optional parameters of this method are:
Name | Type | Description | Required or not |
---|---|---|---|
chunk_size | int | Block size, in MB. The default is 5MB | No |
thread_num | int | In block upload, the number of threads in the thread pool is equal to the number of CPU cores by default | No |
If uploading a large file takes a long time and the user wants to stop the multipart upload, the cancel() method of UploadTaskHandle can be called to cancel the operation. An example is as follows:
import threading
import multiprocessing
import time
from baidubce.services.bos.bos_client import UploadTaskHandle
file_name = "/path/to/file.zip"
uploadTaskHandle = UploadTaskHandle()
t = threading.Thread(target=bos_client.put_super_obejct_from_file, args=(bucket_name, key, file_name),
kwargs={
"chunk_size": 5,
"thread_num": multiprocessing.cpu_count(),
"uploadTaskHandle": uploadTaskHandle
})
t.start()
time.sleep(2)
uploadTaskHandle.cancel()
t.join()
Breakpoint Continued Upload
When a user uploads a large file to BOS, if the network is unstable or the program crashes, the entire upload fails and the data uploaded before the failure is wasted, so the user has to start over. This not only wastes resources; under unstable network conditions, the upload may never complete even after multiple retries. For such scenarios, BOS provides the ability to resume an upload from a breakpoint:
- Under normal network conditions, it is recommended to use the three-step upload method and divide the object into parts of about 1 MB; refer to Multipart Upload.
- Under poor network conditions, it is recommended to use the appendObject method for breakpoint resume, appending small chunks of data such as 256 KB at a time; see Append Upload.
Tips
- Breakpoint continued upload is the encapsulation and enhancement of multipart upload. It is realized through multipart upload.
- When the file is large or the network environment is poor, it is recommended to upload it in parts.
Download File
BOS Python SDK provides rich file download interfaces, and you can download files from BOS in the following ways:
- Simple streaming download
- Download to local file
- Download as a string
- Breakpoint continued download
- Range download
Simple Object Reading
You can read object in a stream through the following codes:
response = bos_client.get_object(bucket_name, object_key)
s = response.data
# Process object
...
# Close stream
response.data.close()
Download Object to File or String Directly
You can download an object to a specified file by referring to the following code:
bos_client.get_object_to_file(bucket_name, object_key, file_name)
The user can download the object to the string by referring to the following code:
result = bos_client.get_object_as_string(bucket_name, object_key)
print result
Range Download
To support more refined access, you can specify a download range with the range parameter. If the specified download range is 0-100, the bytes from position 0 to 100 (inclusive) are returned, 101 bytes of data in total, i.e. [0, 100].
range = [0,1000]
#Return the object data within the specified range
print bos_client.get_object_as_string(bucket_name, object_key, range = range)
#Return the object data within the specified range to files
bos_client.get_object_to_file(bucket_name, object_key, file_name, range = range)
Set the range parameter of get_object_as_string and get_object_to_file to return the object data within the given range. You can use this feature for segmented download of a file and for resuming an interrupted download, as sketched below.
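For example, a minimal sketch of a segmented download that fetches the object in fixed-size ranges, using only the interfaces shown above (the part size and local file name are arbitrary choices):

# Total size of the object in bytes, taken from its metadata
total_size = int(bos_client.get_object_meta_data(bucket_name, object_key).metadata.content_length)
part_size = 1024 * 1024  # download 1 MB per request
offset = 0
with open("downloaded_file", "wb") as f:
    while offset < total_size:
        end = min(offset + part_size, total_size) - 1
        # The range is inclusive on both ends, e.g. [0, 100] returns 101 bytes
        data = bos_client.get_object_as_string(bucket_name, object_key, range = [offset, end])
        f.write(data)
        offset = end + 1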
Other Methods
Get Storage Type of Object
The storage class attribute of an object can be STANDARD (standard storage), STANDARD_IA (infrequent access storage), COLD (cold storage) or ARCHIVE (archive storage), and it can be obtained with the following code:
response = bos_client.get_object_meta_data(bucket_name, object_key)
print response.metadata.bce_storage_class
Get ObjectMetadata Only
The get_object_meta_data method returns only the metadata of an object, not the object data itself. As shown in the following code:
response = bos_client.get_object_meta_data(bucket_name, object_key)
The callable parameters in the parser class returned by the get_object_meta_data method include:
Parameter | Description |
---|---|
content_length | Size of object |
e_tag | Entity tag of HTTP protocol for object |
bce_meta | Returned if custom user_metadata was specified in PutObject |
storageClass | Storage type of object |
bce_restore | Returned for an archive storage object that is being retrieved or has already been retrieved. While retrieval is in progress, the value of bce_restore is ongoing-request="true"; after retrieval, the value is ongoing-request="false", expiry-date="Wed, 07 Nov 2019 00:00:00 GMT", where expiry-date indicates the expiration time of the retrieved object, in Greenwich Mean Time. |
Retrieve the Files of Archive Storage Type
Retrieve the Archive Files
After an archive file is uploaded, it is in the frozen state; to download it, the user must first retrieve it. The requester must have read access to the archive file, and the archive file must be in the frozen state.
The example for the retrieval of the archive files is shown as follows:
# Retrieve the archive files and set the duration after the unfreezing to be 2 days
bos_client.restore_object(bucket_name, target_key, days=2)
Determine if the Archive File Is Retrieved
For an archive storage object that is being retrieved or has already been retrieved, bce_restore is returned when obtaining the ObjectMetadata. While retrieval is in progress, the value of bce_restore is ongoing-request="true"; after retrieval, the value is ongoing-request="false", expiry-date="Wed, 07 Nov 2019 00:00:00 GMT". expiry-date indicates the expiry time after the object is retrieved.
The following is an example:
response = bos_client.get_object_meta_data(bucket_name, object_key)
if response.metadata.bce_restore is not None:
    if response.metadata.bce_restore.find("expiry-date") >= 0:
        print("archive object is restored.")
    else:
        print("archive object is restoring.")
else:
    print("archive object is frozen.")
Change File Storage Level
As mentioned above, BOS supports four types of storage: 'STANDARD' (standard storage), 'STANDARD_IA' (infrequent storage), 'COLD' (cold storage) and 'ARCHIVE' (archival storage). Meanwhile, BOS python SDK also supports users to change storage type for the specific files. The parameters involved are as follows:
Parameter | Description |
---|---|
x-bce-storage-class | Specifies the storage class of the object: STANDARD (standard storage), STANDARD_IA (infrequent access storage), COLD (cold storage) or ARCHIVE (archive storage); defaults to standard storage if not specified. |
Note:
- When you call copy_object() interface, if the source object is an archive type, restore_object() needs to be called at first to retrieve the archive file.
The following is an example:
from baidubce.services.bos import storage_class

# Standard storage to infrequent access storage
bos_client.copy_object(source_bucket_name, source_key,
target_bucket_name, target_key,
storage_class = storage_class.STANDARD_IA)
# infrequent storage to cold storage
bos_client.copy_object(source_bucket_name, source_key,
target_bucket_name, target_key,
storage_class = storage_class.COLD)
Get File Download URL
The user can get the designated URL of object by the following sample code:
url = bos_client.generate_pre_signed_url(bucket_name, object_key, timestamp, expiration_in_seconds)
Note:
- Before calling this function, you need to manually set the endpoint to the domain name of the region. Baidu AI Cloud has opened multi-region support; please refer to the Region Selection Description. Currently "North China-Beijing", "South China-Guangzhou" and "East China-Suzhou" are supported. Beijing: http://bj.bcebos.com; Guangzhou: http://gz.bcebos.com; Suzhou: http://su.bcebos.com.
- timestamp is an optional parameter; its default value is the current time when it is not configured.
- timestamp is a Unix timestamp that identifies the effective start time of the URL, e.g. timestamp = int(time.time()), which requires import time.
- expiration_in_seconds sets the effective duration of the URL; it is an optional parameter whose default value is 1800 seconds when not configured. To make the URL permanently valid, expiration_in_seconds can be set to -1; it cannot be set to any other negative number.
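Putting the notes above together, a minimal sketch:

import time

# The URL takes effect immediately and stays valid for 1800 seconds
timestamp = int(time.time())
expiration_in_seconds = 1800
url = bos_client.generate_pre_signed_url(bucket_name, object_key, timestamp, expiration_in_seconds)
print url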
Enumerate Files in Storage Space
BOS SDK allows users to enumerate objects in the following two ways:
- Simple enumeration
- Complex enumeration by parameters
In addition, you can simulate folders while listing files.
Simple Enumeration
After completing a series of uploads, you may need to view all objects in a designated bucket, which can be done with the following code:
response = bos_client.list_objects(bucket_name)
for object in response.contents:
print object.key
Note:
- By default, at most 1,000 objects are returned; if the bucket has more than 1,000 objects, is_truncated is True and next_marker is returned as the starting point for the next read.
- To obtain more objects, you can use the marker parameter to read in several batches.
List all the objects under current bucket at one time.
for object in bos_client.list_all_objects(bucket_name):
print object.key
Complex Enumeration by Parameters
Other optional parameters of list_objects include:
Parameter | Description |
---|---|
prefix | Restricts the returned object keys to those prefixed with prefix. |
delimiter | A character used to group object names. Names that contain the specified prefix, truncated at the first occurrence of the delimiter, are grouped into a set of elements: CommonPrefixes. |
max_keys | Limits the maximum number of objects returned; the value cannot exceed 1000, and the default is 1000 if not configured. |
marker | Results are returned starting from the first key after the marker in lexicographic order. |
Note:
1. If an object is named exactly after the prefix, the returned keys will still contain the object named after the prefix when only the prefix is used for the query, as detailed in Recursively List All Files in the Directory.
2. If an object is named exactly after the prefix, when the combination of prefix and delimiter is used for the query, the returned keys contain Null and the key names do not contain the prefix, as detailed in View Files and Subdirectories under the Directory.
Next, we use several cases to illustrate the method of parameter enumeration:
Specify the Maximum Number of Returned Entries
max_keys = 500
# Specify the maximum number of returned entries to be 500
response = bos_client.list_objects(bucket_name, max_keys = max_keys)
for object in response.contents:
print object.key
Return the Object with the Specified Prefix
prefix = "test"
# Specify the returned object with test as the prefix
response = bos_client.list_objects(bucket_name, prefix = prefix)
for object in response.contents:
print object.key
Return from the Specified Object
marker = "object"
# Results are returned starting from the key after the specified marker (the marker itself is not included)
response = bos_client.list_objects(bucket_name, marker = marker)
for object in response.contents:
print object.key
Page to Get All Objects
isTruncated = True
# You can set a maximum of 500 records per page
max_keys = 500
marker = None
while isTruncated:
    response = bos_client.list_objects(bucket_name, max_keys = max_keys, marker = marker)
    for object in response.contents:
        print object.key
    isTruncated = response.is_truncated
    marker = getattr(response, 'next_marker', None)
Page the Results after Getting All Specific Objects
# You can set up to 500 records per page and get them from a specific object
max_keys = 500
marker = "object"
isTruncated = True
while isTruncated:
    response = bos_client.list_objects(bucket_name, max_keys = max_keys, marker = marker)
    for object in response.contents:
        print object.key
    isTruncated = response.is_truncated
    marker = getattr(response, 'next_marker', None)
Page to Get the Object Results for All the Specified Prefixes
# You can set the page to get the object with the specified prefix, with a maximum of 500 records per page
max_keys = 500
prefix = "object"
isTruncated = True
marker = None
while isTruncated:
    response = bos_client.list_objects(bucket_name, prefix = prefix, max_keys = max_keys, marker = marker)
    for object in response.contents:
        print object.key
    isTruncated = response.is_truncated
    marker = getattr(response, 'next_marker', None)
The callable parameters in the parser class returned by the list_objects method include:
Parameter | Description |
---|---|
name | bucket name |
prefix | Match object starting from prefix to the Delimiter character of the first occurrence to return as a set of elements |
marker | Starting point of this query |
max_keys | Maximum number of requests returned |
is_truncated | It indicates whether all queries have returned; false means all results have been returned this time; true means all results have not been returned this time. |
contents | Container of an object returned |
+key | object name |
+last_modified | Last time this object was modified |
+e_tag | Entity tag of HTTP protocol for object |
+size | Content size of object (number of bytes) |
+owner | User information of bucket corresponding to object |
++id | User ID of bucket Owner |
++display_name | Name of bucket Owner |
next_marker | As long as IsTruncated is true, next_marker returns as the value to query marker for the next time. |
common_prefixes | This item is returned only when delimiter is specified |
The list_all_objects method returns a generator of contents; it is not subject to the limit of at most 1,000 results per call, and all results are returned.
Simulate Folder Function
There is no concept of a folder in BOS storage; all elements are stored as objects. However, BOS users often need folders to manage files. Therefore, BOS provides the ability to create simulated folders, which essentially creates an object of size 0. This object can be uploaded and downloaded, but the console displays any object whose key ends with "/" as a folder.
You can simulate the folder function through the combination of Delimiter and Prefix parameters. The combination of Delimiter and Prefix works like this:
If Prefix is set to a folder name, you can list the files that begin with that Prefix, that is, all recursive files and subfolders (directories) under the folder; the file names are shown in Contents. If Delimiter is additionally set to "/", the return value lists only the files and subfolders (directories) directly under the folder: the names of the subfolders (directories) are returned in the CommonPrefixes section, and the recursive files and folders inside those subfolders are not displayed.
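For example, a simulated folder can be created by uploading a zero-length object whose key ends with "/"; a minimal sketch using the string-upload interface shown earlier:

# Creates the simulated folder "fun/" as an object of size 0
bos_client.put_object_from_string(bucket_name, "fun/", "")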
Suppose the bucket has 5 files: bos.jpg, fun/, fun/test.jpg, fun/movie/001.avi and fun/movie/007.avi, and the symbol "/" is used as the folder separator.
Here are some application modes:
List All Files in the Bucket.
When you need to get all files under bucket, refer to Page to Get All objects.
Recursively List All Files in the Directory.
Set the Prefix parameter to access all files under a directory:
prefix = "fun/";
print "Objects:";
# Recursively list all files in the fun directory
response = bos_client.list_objects(bucket_name, prefix = prefix)
for object in response.contents:
print object.key
Output:
Objects:
fun/
fun/movie/001.avi
fun/movie/007.avi
fun/test.jpg
View Files and Subdirectories under the Directory.
The files and subdirectories under a directory can be listed with the combination of Prefix and Delimiter:
# "/" is a folder separator
delimiter = "/"
prefix = "fun/";
# List all files and folders under fun file directory
response = bos_client.list_objects(bucket_name, prefix = prefix, delimiter = delimiter)
print "Objects:"
for object in response.contents:
print object.key
# Traverse all CommonPrefix
print "CommonPrefixs:"
for object in response.common_prefixes:
print object.prefix
Output:
Objects:
fun/
fun/test.jpg
CommonPrefixs:
fun/movie/
In the returned result, the list in Contents gives the files under the fun directory, and the list in CommonPrefixs gives all subfolders under the fun directory. The fun/movie/001.avi and fun/movie/007.avi files are not listed because they belong to the movie subfolder of the fun folder.
List the Storage Attributes of Objects in Bucket.
In addition, the user can view all objects in specified bucket as well as the storage class of object, with the code as follows:
response = bos_client.list_objects(bucket_name)
for object in response.contents:
print 'object:%s, storage_class:%s' % (object.key, object.storage_class)
Object Privilege Control
Set Access Privilege of Object
Currently BOS supports two ways to set an ACL. The first is to use a Canned ACL: when calling PutObjectAcl, set the access privilege of the object via the x-bce-acl or x-bce-grant-privilege header field. The privileges that can currently be set are private and public-read, and these two kinds of headers cannot appear in the same request. The second way is to upload an ACL file.
For details, please see Set Object Privilege Control.
1. Set the access privilege of the object by using the x-bce-acl header field
from baidubce.services.bos import canned_acl
# Set object as private privilege
bos_client.set_object_canned_acl(bucket_name, object_key, canned_acl=canned_acl.PRIVATE)
# Set object as public-read privilege
bos_client.set_object_canned_acl(bucket_name, object_key, canned_acl=canned_acl.PUBLIC_READ)
2. Set the access privilege of the object by using the x-bce-grant-privilege header field
# Authorize the specified user the right to access object
bos_client.set_object_canned_acl(bucket_name, object_key, grant_read='id="12345678dfd5487e99f5c85aca5c1234",id="1234567880274ea5a9d50fe94c151234"')
# Authorize the specified user the FULL_CONTROL privilege of object
bos_client.set_object_canned_acl(bucket_name, object_key, grant_full_control='id="12345678dfd5487e99f5c85aca5c1234",id="1234567880274ea5a9d50fe94c151234"')
3. Set the object privilege through the set_object_acl() interface
# Authorize the specified user the right to access object
acl = [{
"grantee":[{
"id":"12345678dfd5484399f5c85aca5c1234"
}],
"privilege":["READ"]
}]
bos_client.set_object_acl(bucket_name, object_key, acl = acl)
View Object Privilege
You cannot get the object ACL while retrieval of an archive storage object is incomplete, or when an archive file has just been uploaded (for the duration, please see the retrieval duration).
View the privilege of object as shown in the following code:
response = bos_client.get_object_acl(bucket_name, object_key)
print "object acl:", response
The callable parameters in the parser class returned by the get_object_acl method include:
Parameter | Description |
---|---|
accessControlList | Identify privilege list of object |
grantee | Identify authorized person |
-id | Authorized person ID. |
privilege | Identify the privilege of the authorized person. |
Delete Object Privilege
You cannot delete the object ACL while retrieval of an archive storage object is incomplete, or when an archive file has just been uploaded (for the duration, please see the retrieval duration).
The object ACL can be deleted with the following code:
bos_client.delete_object_acl(bucket_name,object_key)
Delete File
Delete an object
Delete an object by the following code:
bos_client.delete_object(bucket_name, object_key)
Delete object in batch
The user can delete objects in batch with the following code:
key_list = [object_key1, object_key2, object_key3]
bos_client.delete_multiple_objects(bucket_name, key_list)
Note: at most 1,000 objects can be deleted in one request.
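If more than 1,000 objects need to be deleted, the key list can be split into batches; a minimal sketch:

all_keys = [object_key1, object_key2, object_key3]  # possibly many more keys
batch_size = 1000
for i in range(0, len(all_keys), batch_size):
    # Each request deletes at most 1,000 objects
    bos_client.delete_multiple_objects(bucket_name, all_keys[i:i + batch_size])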
Check if the File Exists
You can check whether a file exists through the following operations:
from baidubce import exception
try:
    response = bos_client.get_object_meta_data(bucket_name, object_key)
    print "Get meta:", response.metadata
except exception.BceError as e:
    print e
Get and Update Object Metadata
Object metadata is the attribute description of a file uploaded by the user to BOS. It includes two types: HTTP standard attributes (HTTP Headers) and User Meta (custom metadata).
Get Object Metadata
Refer to Get ObjectMetadata Only.
Modify Object Metadata
BOS modifies an object's metadata by copying the object: when copying, set the destination bucket to the source bucket, set the destination object to the source object, and provide a new Metadata; the copy then replaces the metadata. If no Metadata is set, an error is reported.
The archived files do not support the modification to metadata.
user_metadata = {'meta_key': 'meta_value'}
bos_client.copy_object(source_bucket_name = bucket_name,
source_key = object_name,
target_bucket_name = bucket_name,
target_key = object_name,
user_metadata = user_metadata)
response = bos_client.get_object_meta_data(bucket_name = bucket_name,
key = object_name)
print response
Copy Object
You can copy an object through the Copyobject function, as shown in the following code:
bos_client.copy_object(source_bucket_name, source_object_key, target_bucket_name, target_object_key)
Synchronize Copy
The CopyObject interface of BOS is currently implemented synchronously. In synchronous mode, the BOS server returns success only after the copy is completed. Synchronous copy helps users determine the copy status, but the copy time perceived by users is longer, and it is proportional to the file size.
Synchronous copy is more in line with industry conventions and improves compatibility with other platforms. It also simplifies the business logic of the BOS server and improves service efficiency.
If the archived file is a source object, the archived file needs to be retrieved at first.
If you use an SDK version earlier than bce-python-sdk-0.8.12, the copy request may appear to succeed while the file copy actually fails, so you are recommended to use the latest version of the SDK.
Multipart Upload Copy
In addition to copying files through CopyObject, BOS also provides another copy mode: Multipart Upload Copy. If the source object is an archive file, it must be retrieved first.
You can use Multipart Upload Copy in the following application scenarios (but not limited to this), such as:
- Breakpoint copy support is required.
- The file to copy is larger than 5 GB.
- Network conditions are poor, and connections to BOS servers are often disconnected.
Next, the three-step copy will be introduced step by step.
Three-step copy consists of init, "copy part" and complete, where the operation of init and complete is consistent with upload by part, see Initialize Multipart Upload and Complete Multipart Upload directly.
The reference code for copying the parts is as follows:
# left_size is the number of bytes remaining to copy, initialized to the size of the source object
left_size = int(bos_client.get_object_meta_data(source_bucket, source_key).metadata.content_length)
# offset is the starting position of the current part within the source object
offset = 0
part_number = 1
part_list = []
while left_size > 0:
    # Set each part to 5 MB
    part_size = 5 * 1024 * 1024
    if left_size < part_size:
        part_size = left_size
    response = bos_client.upload_part_copy(source_bucket, source_key, target_bucket, target_key, upload_id, part_number, part_size, offset)
    left_size -= part_size
    offset += part_size
    # Save the part number and ETag for complete_multipart_upload
    part_list.append({
        "partNumber": part_number,
        "eTag": response.etag
    })
    part_number += 1
Note:
1. The offset parameter is the starting offset position of the part, in bytes.
2. The size parameter defines the size of each part, in bytes; except for the last part, each part must be at least 5 MB.
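As noted above, the init and complete steps are the same as for multipart upload; a hedged sketch of the full three-step copy around the loop shown earlier:

# Step 1: initialize a multipart upload on the target object
upload_id = bos_client.initiate_multipart_upload(target_bucket, target_key).upload_id

# Step 2: run the "copy part" loop shown above to fill part_list

# Step 3: complete the multipart upload with the collected part list
bos_client.complete_multipart_upload(target_bucket, target_key, upload_id, part_list)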