File Management
Upload Files
In BOS, the basic unit of user data is the object. The number of objects in a bucket is unlimited, but a single object can store at most 5 TB of data. An object consists of a Key, Meta and Data: the Key is the object's name, the Meta is the user's description of the object and consists of a series of name-value pairs, and the Data is the object's content.
BOS Python SDK provides a rich set of file upload interfaces. Files can be uploaded in the following ways:
- Simple Upload
- Append Upload
- Multipart Upload
- Breakpoint Continued Upload
Naming specification of object is as follows:
- Use UTF-8 code.
- The length must be between 1 and 1023 bytes.
- The key cannot start with "/", and the "@" character is not allowed, because "@" is reserved for the image processing interface.
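These rules can be checked locally before uploading; the following is a minimal validation sketch (is_valid_object_key is a hypothetical helper, not part of the SDK):

def is_valid_object_key(key):
    # The key is expected as a UTF-8 encoded byte string
    if not 1 <= len(key) <= 1023:
        return False
    # The key must not start with "/" and must not contain "@"
    if key.startswith('/') or '@' in key:
        return False
    return True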
Simple Upload
For simple upload, BOS supports uploading an object from a file, a data stream or a string; see the following code.
The following code can be used to upload an object:
data = open(file_name, 'rb')
# To upload an object from a data stream, you need to calculate the data length content_length yourself
# You also need to calculate content_md5 yourself: run the MD5 algorithm on the data to obtain 128-bit binary data, then Base64-encode the result
bos_client.put_object(bucket_name, object_key, data, content_length, content_md5)
# Upload an object from a string
bos_client.put_object_from_string(bucket_name, object_key, string)
# Upload an object from a file
bos_client.put_object_from_file(bucket_name, object_key, file_name)
Here, data is a stream object; different stream types are handled differently: upload from a string uses the return value of StringIO, and upload from a file uses the return value of open(), so BOS provides encapsulated interfaces for users to upload quickly.
When an object is uploaded to BOS as a file, the put_object family of interfaces supports objects of no more than 5 GB. After a put_object, put_object_from_string or put_object_from_file request is processed successfully, BOS returns the ETag of the object as the file identification.
All these interfaces have optional parameters:
Parameter | Description |
---|---|
content_type | Type of uploaded file or string |
content_md5 | File data verification. After it is set, BOS enables MD5 verification of the file content by comparing the MD5 you provide with the MD5 of the file; an error is thrown if they are inconsistent. |
content_length | Length of the file; put_object_from_string() does not take this parameter |
content_sha256 | Used for file verification |
user_metadata | Custom metadata |
storage_class | Set file storage type |
user_headers | User defined header |
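As a hedged illustration, several of these optional parameters can be combined in a single upload call (the content type, metadata and header values below are arbitrary examples):

bos_client.put_object_from_file(bucket_name,
                                object_key,
                                file_name,
                                content_type="application/octet-stream",
                                user_metadata={"name": "my-data"},
                                user_headers={"Cache-Control": "no-cache"})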
The calculation method of content_md5 is to run the MD5 algorithm on the data to obtain 128-bit binary data and then Base64-encode it. The following is an example:
import hashlib
import base64

file_name = "your_file"
buf_size = 8192
md5 = hashlib.md5()
fp = open(file_name, 'rb')
while True:
    buf = fp.read(buf_size)
    if not buf:
        break
    md5.update(buf)
fp.close()
content_md5 = base64.standard_b64encode(md5.digest())
Set Object Metadata
Object metadata is the attribute description of the file when the user uploads the file to BOS. It is mainly divided into two types: HTTP headers and custom metadata.
Set the HTTP Header of Object
The BOS Python SDK essentially calls the backend HTTP interface, so you can customize the HTTP headers of an object when uploading a file. Common HTTP headers are described as follows:
Name | Description | Default value |
---|---|---|
Cache-Control | It specifies the caching behavior of the web page when the object is downloaded. | None |
Content-Encoding | It represents in which way the message body encodes and converts contents | None |
Content-Disposition | It instructs the MIME user agent how to handle the attached file, whether to display it inline or download it, and what file name to use. | None |
Expires | Cache expiration time | None |
The reference code is as follows:
user_headers = {"header_key": "header_value"}

# Upload object with specific header from string
bos_client.put_object_from_string(bucket=bucket_name,
                                  key=object_key,
                                  data=string,
                                  user_headers=user_headers)

# Upload object with specific header from file
bos_client.put_object_from_file(bucket=bucket_name,
                                key=object_key,
                                file_name=file,
                                user_headers=user_headers)
Custom Metadata
BOS allows you to define custom metadata to describe an object, as shown in the following code:
#custom metadata
user_metadata = {"name":"my-data"}
#Upload object with meta customized by users from string
bos_client.put_object_from_string(bucket=bucket_name,
key=object_key,
data=string,
user_metadata=user_metadata)
#Upload object with meta customized by users from file
bos_client.put_object_from_file(bucket=bucket_name,
key=object_key,
file_name=file,
user_metadata=user_metadata)
Tips:
- In the above code, the user defined a metadata entry whose name is "name" and whose value is "my-data".
- When users download this object, they can get the metadata together; a read-back sketch follows these tips.
- One object can carry multiple such entries, but the total size of the User Meta must be below 2 KB.
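As mentioned in the tips, the custom metadata comes back with the object; a minimal read-back sketch, assuming the metadata is exposed through the bce_meta field described in "Get ObjectMetadata Only" below:

response = bos_client.get_object_meta_data(bucket_name, object_key)
# Custom metadata set at upload time is carried in bce_meta
print response.metadata.bce_meta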
Set the Copy Attribute of Object
BOS provides a copy_object interface to copy an existing object to another object. During copying, the ETag or modification status of the source object can be checked, and whether to execute the copy is decided according to the result. The parameters are described in detail below:
Name | Type | Description | Required or not |
---|---|---|---|
x-bce-copy-source-if-match | String | If ETag value of source object is equal to ETag provided by the user, copy operation is performed, otherwise the copy fails. | No |
x-bce-copy-source-if-none-match | String | If the ETag value of the source object is not equal to the ETag provided by the user, the copy operation is performed; otherwise the copy fails. | No |
x-bce-copy-source-if-unmodified-since | String | If source object is not modified after x-bce-copy-source-if-unmodified-since, copy operation is performed, otherwise copy fails. | No |
x-bce-copy-source-if-modified-since | String | If source object is modified after x-bce-copy-source-if-modified-since, copy operation is performed, otherwise copy fails. | No |
Corresponding sample code:
copy_object_user_headers = {"copy_header_key":"copy_header_value"}
bos_client.copy_object(source_bucket_name = bucket_name,
source_key = object_name,
target_bucket_name = bucket_name,
target_key = object_name,
user_metadata = user_metadata,
user_headers = user_headers,
copy_object_user_headers = copy_object_user_headers)
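A hedged sketch of a conditional copy, assuming the conditional headers from the table above can be passed through copy_object_user_headers in the same way as the custom header shown in the sample; the ETag value and target_object_name are placeholders:

# Copy only if the source object's ETag still matches the expected value
copy_object_user_headers = {"x-bce-copy-source-if-match": "expected-source-etag"}
bos_client.copy_object(source_bucket_name = bucket_name,
                       source_key = object_name,
                       target_bucket_name = bucket_name,
                       target_key = target_object_name,
                       copy_object_user_headers = copy_object_user_headers)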
Set Storage Type When Uploading Object
BOS supports standard storage, infrequent access storage, cold storage and archive storage. The storage type is selected by specifying StorageClass when the object is uploaded; the default is standard storage. The parameters corresponding to the four storage types are as follows:
Storage type | Parameter |
---|---|
Standard storage | STANDARD |
Infrequent access storage | STANDARD_IA |
Cold storage | COLD |
Archive storage | ARCHIVE |
Taking cold storage and archive storage as examples, the code is as follows:
from baidubce.services.bos import storage_class
#Upload object of cold storage type from file
bos_client.put_object_from_file(bucket=bucket_name,
key=object_key,
file_name=file,
storage_class=storage_class.COLD)
#Upload object of cold storage type from string
bos_client.put_object_from_string(bucket=bucket_name,
key=object_key,
data=string,
storage_class=storage_class.COLD)
#Upload object of archive storage type from file
bos_client.put_object_from_file(bucket=bucket_name,
key=object_key,
file_name=file,
storage_class=storage_class.ARCHIVE)
Append Upload
In the simple upload method introduced above, the objects created are of the Normal type and cannot be appended to, which is inconvenient in scenarios with frequent incremental writes such as logs, video monitoring and live video.
For this reason, Baidu AI Cloud BOS supports appendObject, i.e. uploading a file by appending writes. The object created by the appendObject operation is an Appendable object, and you can append data to it. The size of an appendObject ranges from 0 to 5 GB. The archive storage type does not support append upload.
The sample code uploaded through appendObject is as follows:
# Upload an appendable object. Here content_md5(data) means that you need to calculate the MD5 value of the uploaded data yourself
# The calculation method of content_md5 is to run the MD5 algorithm on the data to obtain 128-bit binary data and then Base64-encode it; see the section "Simple Upload" above for an example
# Likewise, content_length(data) means that you need to calculate the length of the uploaded data yourself
response = bos_client.append_object(bucket_name=bucket_name,
key=object_key,
data=data,
content_md5=content_md5(data), content_length=content_length(data))
#Obtain the position of appending write next time
next_offset = response.metadata.bce_next_append_offset
bos_client.append_object(bucket_name=bucket_name,
key=object_key,
data=next_data,
content_md5=content_md5(next_data), content_length=content_length(next_data),
offset=next_offset)
#Upload an appendable object from string
from baidubce.services.bos import storage_class
bos_client.append_object_from_string(bucket_name=bucket_name,
key=object_key,
data=string,
offset=offset,
storage_class=storage_class.STANDARD,
user_headers=user_headers)
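The content_md5(data) and content_length(data) calls above are placeholders rather than SDK functions; a minimal sketch of such helpers for in-memory byte strings, following the MD5-then-Base64 calculation described in "Simple Upload", could look like this:

import base64
import hashlib

def content_md5(data):
    # MD5 of the data (128-bit binary digest), then Base64 encoded
    return base64.standard_b64encode(hashlib.md5(data).digest())

def content_length(data):
    # Length of the data in bytes
    return len(data)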
Multipart Upload
In addition to uploading files to BOS through the putObject interfaces, BOS provides another upload mode: Multipart Upload. You can use the Multipart Upload mode in application scenarios such as the following (but not limited to them):
- Breakpoint upload support is required.
- The file to upload is larger than 5 GB.
- The network conditions are poor, and the connection with BOS servers is often disconnected.
- The file needs to be uploaded streaming.
- The size of the uploaded file cannot be determined before uploading it.
Multipart Upload step by step is introduced below.
Initialize Multipart Upload
BOS uses initiate_multipart_upload method to initialize a multipart upload event:
upload_id = bos_client.initiate_multipart_upload(bucket_name, object_key).upload_id
This method returns InitMultipartUploadResponse object which contains uploadId parameter to represent the current upload event.
Initialization of Multipart Upload with Specific Header
bos_client.initiate_multipart_upload(bucket_name=bucket,
key=object_key,
user_headers=user_headers)
The header attributes that can be set include "Cache-Control", "Content-Encoding", "Content-Disposition" and "Expires"; the get-object and get-object-meta interfaces return these four headers if they are set.
Initialization of Multipart Upload of Infrequent Storage, Cold Storage and Archive Storage
storage_class needs to be specified when initializing a multipart upload for infrequent access storage; see the following code (cold storage and archive storage work in the same way):
from baidubce.services.bos import storage_class
bos_client.initiate_multipart_upload(bucket_name=bucket,
key=object_key,
storage_class = storage_class.STANDARD_IA)
Upload in Parts
Upon initialization, perform multipart upload:
import os

# left_size is the number of bytes remaining to upload, initialized to the file size
left_size = os.path.getsize(file_name)
# offset is the starting position of the current part within the file
offset = 0
part_number = 1
part_list = []
while left_size > 0:
    # Set each part to 5 MB
    part_size = 5 * 1024 * 1024
    if left_size < part_size:
        part_size = left_size
    response = bos_client.upload_part_from_file(
        bucket_name, object_key, upload_id, part_number, part_size, file_name, offset)
    left_size -= part_size
    offset += part_size
    # Save the part number and ETag of each uploaded part for complete_multipart_upload
    part_list.append({
        "partNumber": part_number,
        "eTag": response.metadata.etag
    })
    part_number += 1
Note:
- The offset parameter is the starting offset position of the part, in bytes.
- The part size is in bytes and defines the size of each part; except for the last part, every part must be at least 5 MB. However, the Upload Part interface does not check the size of an uploaded part; it is only checked when complete_multipart_upload() is called.
- To ensure the data is not corrupted during network transmission, you are recommended to use the Content-MD5 value returned by BOS for each part to verify the validity of the uploaded part data after each Upload Part. When all parts are combined into one object, it no longer contains the MD5 value.
- Part numbers range from 1 to 10,000. If this range is exceeded, BOS returns the InvalidArgument error code.
- When uploading each part, position the stream to the offset at which that part begins.
- After each part is uploaded, the result returned by BOS contains an etag and a part number (partNumber). They are needed later to complete the multipart upload, so they should be saved, typically in a list.
Complete Multipart Upload
bos_client.complete_multipart_upload(bucket_name, object_key, upload_id, part_list)
Here part_list is a list; each element is a dict containing the two keys partNumber and eTag.
The following is an example:
[{'partNumber': 1, 'eTag': 'f1c9645dbc14efddc7d8a322685f26eb'}, {'partNumber': 2, 'eTag': 'f1c9645dbc14efddc7d8a322685f26eb'}, {'partNumber': 3, 'eTag': '93b885adfe0da089cdf634904fd59f71'}]
The parameters available for calling in parsing class returned by this method include:
Parameter | Description |
---|---|
bucket | bucket name |
key | object name |
e_tag | ETag for each upload chunk |
location | URL of object |
Note: The ETags contained in this request are the ETags of the individual parts uploaded during the Upload Part step. After receiving the list of parts submitted by the user, BOS verifies the validity of each part one by one; once all parts are verified, BOS combines them into a complete object.
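As a hedged illustration, assuming the fields in the table above are exposed as attributes of the returned object:

response = bos_client.complete_multipart_upload(bucket_name, object_key, upload_id, part_list)
# Fields listed in the table above
print response.bucket
print response.key
print response.e_tag
print response.location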
Cancel Multipart Upload Event
You can use abort_multipart_upload method to cancel multipart upload:
bos_client.abort_multipart_upload(bucket_name, object_key, upload_id = upload_id)
Get Unfinished Multipart Upload Event
Users can use the following 2 methods to obtain the uncompleted multipart events in bucket:
Method 1:
response = bos_client.list_multipart_uploads(bucket_name)
for item in response.uploads:
print item.upload_id
For list_multipart_uploads, BOS returns at most 1,000 multipart uploads each time and supports prefix and delimiter filtering.
Other parameters of the list_multipart_uploads method include:
Name | Type | Description | Required or not |
---|---|---|---|
delimiter | String | Delimiter; mainly implements the logic of list folder | No |
key_marker | String | After object is sorted in lexicographic order, this time it returns from the one after keyMarker. | No |
max_uploads | Int | The maximum number of Multipart Uploads returned by this request, with default of 1,000, maximum of 1,000 | No |
prefix | String | key prefix, object key restricted to return must be prefixed with this | No |
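A hedged example of passing these parameters, assuming they are accepted as keyword arguments like the other list interfaces in this document (the prefix and delimiter values are arbitrary):

response = bos_client.list_multipart_uploads(bucket_name,
                                             max_uploads = 100,
                                             prefix = "fun/",
                                             delimiter = "/")
for item in response.uploads:
    print item.upload_id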
The parameters available for calling in the parsing class returned by the list_multipart_uploads method include:
Parameter | Description |
---|---|
bucket | bucket name |
key_marker | Name of part object started to be uploaded |
next_key_marker | This item is returned only when delimiter is specified and IsTruncated is true, as the value to enquire marker next time. |
is_truncated | It indicates whether all queries have returned; false means all results have been returned this time; true means all results have not been returned this time. |
prefix | Match object starting from prefix to the Delimiter character of the first occurrence to return as a set of elements |
common_prefixes | This item is returned only when delimiter is specified |
delimiter | Query terminator |
max_uploads | Maximum number of requests returned |
uploads | Container of all multipart events not completed |
owner | User information of corresponding bucket |
id | User ID of bucket Owner |
display_name | Name of bucket Owner |
key | object name of part |
upload_id | Multipart upload id |
initiated | Starting time of multipart upload |
The list_all_multipart_uploads method returns a generator of uploads; it is not limited to at most 1,000 results per call, and all results are returned.
Method 2:
uploads = bos_client.list_all_multipart_uploads(bucket_name)
for item in uploads:
print item.upload_id
Get All Uploaded Part Information
You can use the following 2 methods to obtain all uploaded parts in an upload event:
Method 1:
response = bos_client.list_parts(bucket_name, object_key, upload_id)
for item in response.parts:
print item.part_number
Note: 1. BOS is sorted in ascending order of PartNumber. 2. It is not recommended to generate Part list of the final CompleteMultipartUpload with the result from ListParts because network transfers can go wrong.
Other parameters of the list_parts method include:
Name | Type | Description | Required or not |
---|---|---|---|
max_parts | Int | The maximum number of parts returned by BOS at one time, with default of 1,000, maximum of 1,000 | No |
part_number_marker | Int | Sort by partNumber. The starting part of this request is returned from the next of this partNumber. | No |
The parameters available for calling in the parsing class returned by the list_parts method include:
Parameter | Description |
---|---|
bucket | bucket name |
key | object name |
initiated | Starting time of current multipart upload |
max_parts | Maximum number of requests returned |
is_truncated | It indicates whether all queries have returned; false means all results have been returned this time; true means all results have not been returned this time. |
storage_class | The storage type of the object: standard STANDARD, infrequent access STANDARD_IA, cold storage COLD or archive ARCHIVE |
part_number_marker | Flag bit of part starting |
parts | Part list, list type |
+part_number | Part number |
+last_modified | Last modification time of the part |
+e_tag | ETag for each upload chunk |
+size | Size of part contents (number of bytes) |
upload_id | Current multipart upload id |
owner | User information of corresponding bucket |
+id | User ID of bucket owner |
+display_name | Name of bucket owner |
next_part_number_marker | The partNumber of the last record returned in the current request can be used as part_number_marker of the next request |
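A hedged pagination sketch based on part_number_marker and next_part_number_marker from the table above, in the same style as the object-listing pagination later in this document:

part_number_marker = None
is_truncated = True
while is_truncated:
    response = bos_client.list_parts(bucket_name, object_key, upload_id,
                                     max_parts = 100,
                                     part_number_marker = part_number_marker)
    for item in response.parts:
        print item.part_number
    is_truncated = response.is_truncated
    part_number_marker = getattr(response, 'next_part_number_marker', None)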
Method 2:
parts = bos_client.list_all_parts(bucket_name, object_key, upload_id = upload_id)
for item in parts:
print item.part_number
The list_all_parts method returns a generator of parts; it is not limited to at most 1,000 results per call, and all results are returned.
Obtain storage type of multipart upload object
response = bos_client.list_parts(bucket_name=bucket,
key=object_key,
upload_id=upload_id)
print response.storage_class
Packaged Multipart Upload
In the Python SDK, BOS provides the put_super_obejct_from_file() interface, which packages the three multipart upload methods initiate_multipart_upload, upload_part_from_file and complete_multipart_upload; users only need to call this single interface to complete a multipart upload.
import multiprocessing
file_name = "/path/to/file.zip"
result = bos_client.put_super_obejct_from_file(bucket_name, key, file_name,
chunk_size=5, thread_num=multiprocessing.cpu_count())
if result:
    print "Upload success!"
The optional parameters of this method are:
Name | Type | Description | Required or not |
---|---|---|---|
chunk_size | int | Block size, in MB. The default is 5MB | No |
thread_num | int | In block upload, the number of threads in the thread pool is equal to the number of CPU cores by default | No |
If uploading a large file takes a long time and the user wants to stop the multipart upload, the cancel() method of UploadTaskHandle can be called to cancel the operation. An example is as follows:
import threading
import multiprocessing
import time
from baidubce.services.bos.bos_client import UploadTaskHandle
file_name = "/path/to/file.zip"
uploadTaskHandle = UploadTaskHandle()
t = threading.Thread(target=bos_client.put_super_obejct_from_file, args=(bucket_name, key, file_name),
kwargs={
"chunk_size": 5,
"thread_num": multiprocessing.cpu_count(),
"uploadTaskHandle": uploadTaskHandle
})
t.start()
time.sleep(2)
uploadTaskHandle.cancel()
t.join()
Breakpoint Continued Upload
When a user uploads a large file to BOS, if the network is unstable or the program crashes, the entire upload fails and the data uploaded before the failure is wasted, so the user has to start over. This not only wastes resources; under unstable network conditions, the upload may never complete even after multiple retries. For such scenarios, BOS provides the ability to resume an upload from a breakpoint:
- Under normal network conditions, it is recommended to use the three-step upload method and divide the object into parts of about 1 MB; refer to Multipart Upload.
- Under poor network conditions, it is recommended to use the appendObject method for breakpoint resume, appending small chunks of data such as 256 KB at a time; see Append Upload.
Tips
- Breakpoint continued upload is the encapsulation and enhancement of multipart upload. It is realized through multipart upload.
- When the file is large or the network environment is poor, it is recommended to upload it in parts.
Download File
BOS Python SDK provides rich file download interfaces, and you can download files from BOS in the following ways:
- Simple streaming download
- Download to local file
- Download as a string
- Breakpoint continued download
- Range download
Simple Object Reading
You can read object in a stream through the following codes:
response = bos_client.get_object(bucket_name, object_key)
s = response.data
# Process object
...
# Close stream
response.data.close()
Download Object to File or String Directly
You can download an object to a specified file by referring to the following code:
bos_client.get_object_to_file(bucket_name, object_key, file_name)
The user can download the object to the string by referring to the following code:
result = bos_client.get_object_as_string(bucket_name, object_key)
print result
Range Download
To support more refined access, you can specify a download range with the range parameter. If the specified download range is 0-100, the bytes from position 0 to 100 (inclusive) are returned, 101 bytes of data in total, i.e. [0, 100].
range = [0,1000]
#Return the object data within the specified range
print bos_client.get_object_as_string(bucket_name, object_key, range = range)
#Return the object data within the specified range to files
bos_client.get_object_to_file(bucket_name, object_key, file_name, range = range)
Set the range parameter of get_object_as_string and get_object_to_file to return the object data within the given range. You can use this feature for segmented download of a file and for resuming an interrupted download, as sketched below.
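For example, a minimal sketch of a segmented download that fetches the object in fixed-size ranges, using only the interfaces shown above (the part size and local file name are arbitrary choices):

# Total size of the object in bytes, taken from its metadata
total_size = int(bos_client.get_object_meta_data(bucket_name, object_key).metadata.content_length)
part_size = 1024 * 1024  # download 1 MB per request
offset = 0
with open("downloaded_file", "wb") as f:
    while offset < total_size:
        end = min(offset + part_size, total_size) - 1
        # The range is inclusive on both ends, e.g. [0, 100] returns 101 bytes
        data = bos_client.get_object_as_string(bucket_name, object_key, range = [offset, end])
        f.write(data)
        offset = end + 1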
Other Methods
Get Storage Type of Object
The storage class attribute of an object can be STANDARD (standard storage), STANDARD_IA (infrequent access storage), COLD (cold storage) or ARCHIVE (archive storage), and it can be obtained with the following code:
response = bos_client.get_object_meta_data(bucket_name, object_key)
print response.metadata.bce_storage_class
Get ObjectMetadata Only
The get_object_meta_data method returns only the metadata of an object, not the object data itself. As shown in the following code:
response = bos_client.get_object_meta_data(bucket_name, object_key)
The callable parameters in the parser class returned by the get_object_meta_data method include:
Parameter | Description |
---|---|
content_length | Size of object |
e_tag | Entity tag of HTTP protocol for object |
bce_meta | Returned if custom user_metadata was specified in PutObject |
storageClass | Storage type of object |
bce_restore | Returned for an archive storage object that is being retrieved or has already been retrieved. While retrieval is in progress, the value of bce_restore is ongoing-request="true"; after retrieval, the value is ongoing-request="false", expiry-date="Wed, 07 Nov 2019 00:00:00 GMT", where expiry-date indicates the expiration time of the retrieved object, in Greenwich Mean Time. |
Retrieve the Files of Archive Storage Type
Retrieve the Archive Files
After an archive file is uploaded, it is in the frozen state; to download it, the user must first retrieve it. The requester must have read access to the archive file, and the archive file must be in the frozen state.
The example for the retrieval of the archive files is shown as follows:
# Retrieve the archive files and set the duration after the unfreezing to be 2 days
bos_client.restore_object(bucket_name, target_key, days=2)
Determine if the Archive File Is Retrieved
For an archive storage object that is being retrieved or has already been retrieved, bce_restore is returned when obtaining the ObjectMetadata. While retrieval is in progress, the value of bce_restore is ongoing-request="true"; after retrieval, the value is ongoing-request="false", expiry-date="Wed, 07 Nov 2019 00:00:00 GMT". expiry-date indicates the expiry time after the object is retrieved.
The following is an example:
response = bos_client.get_object_meta_data(bucket_name, object_key)
if response.metadata.bce_restore is not None:
    if response.metadata.bce_restore.find("expiry-date") >= 0:
        print("archive object is restored.")
    else:
        print("archive object is restoring.")
else:
    print("archive object is frozen.")
Change File Storage Level
As mentioned above, BOS supports four types of storage: 'STANDARD' (standard storage), 'STANDARD_IA' (infrequent storage), 'COLD' (cold storage) and 'ARCHIVE' (archival storage). Meanwhile, BOS python SDK also supports users to change storage type for the specific files. The parameters involved are as follows:
Parameter | Description |
---|---|
x-bce-storage-class | Specifies the storage class of the object: STANDARD (standard storage), STANDARD_IA (infrequent access storage), COLD (cold storage) or ARCHIVE (archive storage); defaults to standard storage if not specified. |
Note:
- When you call copy_object() interface, if the source object is an archive type, restore_object() needs to be called at first to retrieve the archive file.
The following is an example:
from baidubce.services.bos import storage_class

# Standard storage to infrequent access storage
bos_client.copy_object(source_bucket_name, source_key,
target_bucket_name, target_key,
storage_class = storage_class.STANDARD_IA)
# infrequent storage to cold storage
bos_client.copy_object(source_bucket_name, source_key,
target_bucket_name, target_key,
storage_class = storage_class.COLD)
Get File Download URL
The user can get the designated URL of object by the following sample code:
url = bos_client.generate_pre_signed_url(bucket_name, object_key, timestamp, expiration_in_seconds)
Note:
- Before calling this function, you need to manually set the endpoint to the domain name of the region. Baidu AI Cloud has opened multi-region support; please refer to the Region Selection Description. Currently "North China-Beijing", "South China-Guangzhou" and "East China-Suzhou" are supported. Beijing: http://bj.bcebos.com; Guangzhou: http://gz.bcebos.com; Suzhou: http://su.bcebos.com.
- timestamp is an optional parameter; its default value is the current time when it is not configured.
- timestamp is a Unix timestamp that identifies the effective start time of the URL, e.g. timestamp = int(time.time()), which requires import time.
- expiration_in_seconds sets the effective duration of the URL; it is an optional parameter whose default value is 1800 seconds when not configured. To make the URL permanently valid, expiration_in_seconds can be set to -1; it cannot be set to any other negative number.
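Putting the notes above together, a minimal sketch:

import time

# The URL takes effect immediately and stays valid for 1800 seconds
timestamp = int(time.time())
expiration_in_seconds = 1800
url = bos_client.generate_pre_signed_url(bucket_name, object_key, timestamp, expiration_in_seconds)
print url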
Enumerate Files in Storage Space
BOS SDK allows users to enumerate objects in the following two ways:
- Simple enumeration
- Complex enumeration by parameters
In addition, you can simulate folders while listing files.
Simple Enumeration
After completing a series of uploads, you may need to view all objects in a designated bucket, which can be done with the following code:
response = bos_client.list_objects(bucket_name)
for object in response.contents:
print object.key
Note:
- By default, at most 1,000 objects are returned; if the bucket has more than 1,000 objects, is_truncated is True and next_marker is returned as the starting point for the next read.
- To obtain more objects, you can use the marker parameter to read in several batches.
List all the objects under current bucket at one time.
for object in bos_client.list_all_objects(bucket_name):
print object.key
Complex Enumeration by Parameters
Other optional parameters of list_objects include:
Parameter | Description |
---|---|
prefix | Restricts the returned object keys to those prefixed with prefix. |
delimiter | A character used to group object names. Names that contain the specified prefix, truncated at the first occurrence of the delimiter, are grouped into a set of elements: CommonPrefixes. |
max_keys | Limits the maximum number of objects returned; the value cannot exceed 1000, and the default is 1000 if not configured. |
marker | Results are returned starting from the first key after the marker in lexicographic order. |
Note:
1. If an object is named exactly after the prefix, the returned keys will still contain the object named after the prefix when only the prefix is used for the query, as detailed in Recursively List All Files in the Directory.
2. If an object is named exactly after the prefix, when the combination of prefix and delimiter is used for the query, the returned keys contain Null and the key names do not contain the prefix, as detailed in View Files and Subdirectories under the Directory.
Next, we use several cases to illustrate the method of parameter enumeration:
Specify the Maximum Number of Returned Entries
max_keys = 500
# Specify the maximum number of returned entries to be 500
response = bos_client.list_objects(bucket_name, max_keys = max_keys)
for object in response.contents:
print object.key
Return the Object with the Specified Prefix
prefix = "test"
# Specify the returned object with test as the prefix
response = bos_client.list_objects(bucket_name, prefix = prefix)
for object in response.contents:
print object.key
Return from the Specified Object
marker = "object"
# Results are returned starting from the key after the specified marker (the marker itself is not included)
response = bos_client.list_objects(bucket_name, marker = marker)
for object in response.contents:
print object.key
Page to Get All Objects
isTruncated = True
# You can set a maximum of 500 records per page
max_keys = 500
marker = None
while isTruncated:
    response = bos_client.list_objects(bucket_name, max_keys = max_keys, marker = marker)
    for object in response.contents:
        print object.key
    isTruncated = response.is_truncated
    marker = getattr(response, 'next_marker', None)
Page the Results after Getting All Specific Objects
# You can set up to 500 records per page and get them from a specific object
max_keys = 500
marker = "object"
isTruncated = True
while isTruncated:
    response = bos_client.list_objects(bucket_name, max_keys = max_keys, marker = marker)
    for object in response.contents:
        print object.key
    isTruncated = response.is_truncated
    marker = getattr(response, 'next_marker', None)
Page to Get the Object Results for All the Specified Prefixes
# You can set the page to get the object with the specified prefix, with a maximum of 500 records per page
max_keys = 500
prefix = "object"
isTruncated = True
marker = None
while isTruncated:
    response = bos_client.list_objects(bucket_name, prefix = prefix, max_keys = max_keys, marker = marker)
    for object in response.contents:
        print object.key
    isTruncated = response.is_truncated
    marker = getattr(response, 'next_marker', None)
The callable parameters in the parser class returned by the list_objects method include:
Parameter | Description |
---|---|
name | bucket name |
prefix | Match object starting from prefix to the Delimiter character of the first occurrence to return as a set of elements |
marker | Starting point of this query |
max_keys | Maximum number of requests returned |
is_truncated | It indicates whether all queries have returned; false means all results have been returned this time; true means all results have not been returned this time. |
contents | Container of an object returned |
+key | object name |
+last_modified | Last time this object was modified |
+e_tag | Entity tag of HTTP protocol for object |
+size | Content size of object (number of bytes) |
+owner | User information of bucket corresponding to object |
++id | User ID of bucket Owner |
++display_name | Name of bucket Owner |
next_marker | As long as IsTruncated is true, next_marker returns as the value to query marker for the next time. |
common_prefixes | This item is returned only when delimiter is specified |
The list_all_objects method returns a generator of contents; it is not subject to the limit of at most 1,000 results per call, and all results are returned.
Simulate Folder Function
There is no concept of a folder in BOS storage; all elements are stored as objects. However, BOS users often need folders to manage files. Therefore, BOS provides the ability to create simulated folders, which essentially creates an object of size 0. This object can be uploaded and downloaded, but the console displays any object whose key ends with "/" as a folder.
You can simulate the folder function through the combination of Delimiter and Prefix parameters. The combination of Delimiter and Prefix works like this:
If Prefix is set to a folder name, you can list the files that begin with that Prefix, that is, all recursive files and subfolders (directories) under the folder; the file names are shown in Contents. If Delimiter is additionally set to "/", the return value lists only the files and subfolders (directories) directly under the folder: the names of the subfolders (directories) are returned in the CommonPrefixes section, and the recursive files and folders inside those subfolders are not displayed.
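For example, a simulated folder can be created by uploading a zero-length object whose key ends with "/"; a minimal sketch using the string-upload interface shown earlier:

# Creates the simulated folder "fun/" as an object of size 0
bos_client.put_object_from_string(bucket_name, "fun/", "")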
Suppose the bucket has 5 files: bos.jpg, fun/, fun/test.jpg, fun/movie/001.avi and fun/movie/007.avi, and the symbol "/" is used as the folder separator.
Here are some application modes:
List All Files in the Bucket.
When you need to get all files under bucket, refer to Page to Get All objects.
Recursively List All Files in the Directory.
Set the Prefix parameter to access all files under a directory:
prefix = "fun/";
print "Objects:";
# Recursively list all files in the fun directory
response = bos_client.list_objects(bucket_name, prefix = prefix)
for object in response.contents:
print object.key
Output:
Objects:
fun/
fun/movie/001.avi
fun/movie/007.avi
fun/test.jpg
View Files and Subdirectories under the Directory.
The files and subdirectories under a directory can be listed with the combination of Prefix and Delimiter:
# "/" is a folder separator
delimiter = "/"
prefix = "fun/";
# List all files and folders under fun file directory
response = bos_client.list_objects(bucket_name, prefix = prefix, delimiter = delimiter)
print "Objects:"
for object in response.contents:
print object.key
# Traverse all CommonPrefix
print "CommonPrefixs:"
for object in response.common_prefixes:
print object.prefix
Output:
Objects:
fun/
fun/test.jpg
CommonPrefixs:
fun/movie/
In the returned result, the list in Contents gives the files under the fun directory, and the list in CommonPrefixs gives all subfolders under the fun directory. The fun/movie/001.avi and fun/movie/007.avi files are not listed because they belong to the movie subfolder of the fun folder.
List the Storage Attributes of Objects in Bucket.
In addition, the user can view all objects in specified bucket as well as the storage class of object, with the code as follows:
response = bos_client.list_objects(bucket_name)
for object in response.contents:
print 'object:%s, storage_class:%s' % (object.key, object.storage_class)
Object Privilege Control
Set Access Privilege of Object
Currently BOS supports two ways to set an ACL. The first is to use a Canned ACL: when calling PutObjectAcl, set the access privilege of the object via the x-bce-acl or x-bce-grant-privilege header field. The privileges that can currently be set are private and public-read, and these two kinds of headers cannot appear in the same request. The second way is to upload an ACL file.
For details, please see Set Object Privilege Control.
1. Set the access privilege of the object by using the x-bce-acl header field
from baidubce.services.bos import canned_acl
# Set object as private privilege
bos_client.set_object_canned_acl(bucket_name, object_key, canned_acl=canned_acl.PRIVATE)
# Set object as public-read privilege
bos_client.set_object_canned_acl(bucket_name, object_key, canned_acl=canned_acl.PUBLIC_READ)
2. Set the access privilege of the object by using the x-bce-grant-privilege header field
# Authorize the specified user the right to access object
bos_client.set_object_canned_acl(bucket_name, object_key, grant_read='id="12345678dfd5487e99f5c85aca5c1234",id="1234567880274ea5a9d50fe94c151234"')
# Authorize the specified user the FULL_CONTROL privilege of object
bos_client.set_object_canned_acl(bucket_name, object_key, grant_full_control='id="12345678dfd5487e99f5c85aca5c1234",id="1234567880274ea5a9d50fe94c151234"')
3. Set the object privilege through the set_object_acl() interface
# Authorize the specified user the right to access object
acl = [{
"grantee":[{
"id":"12345678dfd5484399f5c85aca5c1234"
}],
"privilege":["READ"]
}]
bos_client.set_object_acl(bucket_name, object_key, acl = acl)
View Object Privilege
You cannot get the object ACL while retrieval of an archive storage object is incomplete, or when an archive file has just been uploaded (for the duration, please see the retrieval duration).
View the privilege of object as shown in the following code:
response = bos_client.get_object_acl(bucket_name, object_key)
print "object acl:", response
The callable parameters in the parser class returned by the get_object_acl method include:
Parameter | Description |
---|---|
accessControlList | Identify privilege list of object |
grantee | Identify authorized person |
-id | Authorized person ID. |
privilege | Identify the privilege of the authorized person. |
Delete Object Privilege
You cannot delete the object ACL while retrieval of an archive storage object is incomplete, or when an archive file has just been uploaded (for the duration, please see the retrieval duration).
The object ACL can be deleted with the following code:
bos_client.delete_object_acl(bucket_name,object_key)
Delete File
Delete an object
Delete an object by the following code:
bos_client.delete_object(bucket_name, object_key)
Delete object in batch
The user can delete objects in batch with the following code:
key_list = [object_key1, object_key2, object_key3]
bos_client.delete_multiple_objects(bucket_name, key_list)
Note: at most 1,000 objects can be deleted in one request.
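If more than 1,000 objects need to be deleted, the key list can be split into batches; a minimal sketch:

all_keys = [object_key1, object_key2, object_key3]  # possibly many more keys
batch_size = 1000
for i in range(0, len(all_keys), batch_size):
    # Each request deletes at most 1,000 objects
    bos_client.delete_multiple_objects(bucket_name, all_keys[i:i + batch_size])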
Check if the File Exists
You can check whether a file exists through the following operations:
from baidubce import exception
try:
    response = bos_client.get_object_meta_data(bucket_name, object_key)
    print "Get meta:", response.metadata
except exception.BceError as e:
    print e
Get and Update Object Metadata
Object metadata is the attribute description of a file uploaded by the user to BOS. It includes two types: HTTP standard attributes (HTTP Headers) and User Meta (custom metadata).
Get Object Metadata
Refer to Get ObjectMetadata Only.
Modify Object Metadata
BOS modifies an object's metadata by copying the object: when copying, set the destination bucket to the source bucket, set the destination object to the source object, and provide a new Metadata; the copy then replaces the metadata. If no Metadata is set, an error is reported.
The archived files do not support the modification to metadata.
user_metadata = {'meta_key': 'meta_value'}
bos_client.copy_object(source_bucket_name = bucket_name,
source_key = object_name,
target_bucket_name = bucket_name,
target_key = object_name,
user_metadata = user_metadata)
response = bos_client.get_object_meta_data(bucket_name = bucket_name,
key = object_name)
print response
Copy Object
You can copy an object through the Copyobject function, as shown in the following code:
bos_client.copy_object(source_bucket_name, source_object_key, target_bucket_name, target_object_key)
Synchronize Copy
The CopyObject interface of BOS is currently implemented synchronously. In synchronous mode, the BOS server returns success only after the copy is completed. Synchronous copy helps users determine the copy status, but the copy time perceived by users is longer, and it is proportional to the file size.
Synchronous copy is more in line with industry conventions and improves compatibility with other platforms. It also simplifies the business logic of the BOS server and improves service efficiency.
If the archived file is a source object, the archived file needs to be retrieved at first.
If you use an SDK version earlier than bce-python-sdk-0.8.12, the copy request may appear to succeed while the file copy actually fails, so you are recommended to use the latest version of the SDK.
Multipart Upload Copy
In addition to copying files through CopyObject, BOS also provides another copy mode: Multipart Upload Copy. If the source object is an archive file, it must be retrieved first.
You can use Multipart Upload Copy in the following application scenarios (but not limited to this), such as:
- Breakpoint copy support is required.
- The file to copy is larger than 5 GB.
- Network conditions are poor, and connections to BOS servers are often disconnected.
Next, the three-step copy will be introduced step by step.
Three-step copy consists of init, "copy part" and complete, where the operation of init and complete is consistent with upload by part, see Initialize Multipart Upload and Complete Multipart Upload directly.
The reference code for copying the parts is as follows:
# left_size is the number of bytes remaining to copy, initialized to the size of the source object
left_size = int(bos_client.get_object_meta_data(source_bucket, source_key).metadata.content_length)
# offset is the starting position of the current part within the source object
offset = 0
part_number = 1
part_list = []
while left_size > 0:
    # Set each part to 5 MB
    part_size = 5 * 1024 * 1024
    if left_size < part_size:
        part_size = left_size
    response = bos_client.upload_part_copy(source_bucket, source_key, target_bucket, target_key, upload_id, part_number, part_size, offset)
    left_size -= part_size
    offset += part_size
    # Save the part number and ETag for complete_multipart_upload
    part_list.append({
        "partNumber": part_number,
        "eTag": response.etag
    })
    part_number += 1
Note:
1. The offset parameter is the starting offset position of the part, in bytes.
2. The size parameter defines the size of each part, in bytes; except for the last part, each part must be at least 5 MB.
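As noted above, the init and complete steps are the same as for multipart upload; a hedged sketch of the full three-step copy around the loop shown earlier:

# Step 1: initialize a multipart upload on the target object
upload_id = bos_client.initiate_multipart_upload(target_bucket, target_key).upload_id

# Step 2: run the "copy part" loop shown above to fill part_list

# Step 3: complete the multipart upload with the collected part list
bos_client.complete_multipart_upload(target_bucket, target_key, upload_id, part_list)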