Ceph RGW uploads large files using multipart upload.
Beforehand, the RGW minimum part size parameter is lowered to 12 bytes, so that the tiny test parts used below are accepted.
To make the upload flow transparent, we use curl to implement all three stages of the RGW multipart upload logic.
Stage 1: initiation. The goal is to obtain the uploadId.
#!/bin/bash
ACCESS_KEY="test1"                  # access key
SECRET_KEY="test1"                  # secret key
HOST="192.168.1.29:7480"            # S3 endpoint address
BUCKET="curlbucket"                 # bucket name
CONTENT_TYPE="text/plain"           # MIME type (not sent in this request)
FILENAME=/root/curltest/24          # local file path
OBJECTNAME="24"
ACL="x-amz-acl:public-read"         # object ACL
META_DATA="x-amz-meta-ukey:value"   # custom metadata
FILESIZE=$(stat -c%s "$FILENAME")
echo "$FILESIZE"
FILEMD5=$(openssl dgst -md5 -binary "$FILENAME" | openssl enc -base64)
AUTH_PATH="/${BUCKET}/${OBJECTNAME}?uploads"
CURRENT_TIME=$(TZ=GMT LANG=en_US date "+%a, %d %b %Y %H:%M:%S GMT")
stringToSign="POST\n\n\n${CURRENT_TIME}\n${AUTH_PATH}"
echo "$stringToSign"
signature=$(echo -en "${stringToSign}" | openssl sha1 -hmac "${SECRET_KEY}" -binary | base64)
echo "$signature"
curl -s -v -X POST "http://${HOST}${AUTH_PATH}" \
-H "Authorization: AWS ${ACCESS_KEY}:${signature}" \
-H "Date: ${CURRENT_TIME}"
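The initiate response body is an InitiateMultipartUploadResult XML document whose UploadId element carries the id needed by the next two stages. A minimal sketch of extracting it with sed, using a sample response body (in practice, capture the body of the curl POST above, e.g. RESPONSE=$(curl -s ...)):

```shell
# Sample initiate response; the UploadId value matches the one used later in this post
RESPONSE='<?xml version="1.0"?><InitiateMultipartUploadResult><Bucket>curlbucket</Bucket><Key>24</Key><UploadId>2~cKXKXvrqwcYFya_UpQqGq4ltxUAYNGV</UploadId></InitiateMultipartUploadResult>'
# Pull out the text between <UploadId> and </UploadId>
UPLOAD_ID=$(echo "$RESPONSE" | sed -n 's|.*<UploadId>\(.*\)</UploadId>.*|\1|p')
echo "$UPLOAD_ID"
```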
Stage 2: upload the file parts using the uploadId, and keep the ETag value from each response.
#!/bin/bash
ACCESS_KEY="test1"                  # access key
SECRET_KEY="test1"                  # secret key
HOST="192.168.1.29:7480"            # S3 endpoint address
BUCKET="curlbucket"                 # bucket name
CONTENT_TYPE="text/plain"           # MIME type (not sent in this request)
FILENAME=/root/curltest/121         # local path of this part's 12-byte slice
OBJECTNAME="24"
FILESIZE=$(stat -c%s "$FILENAME")
echo "$FILESIZE"
FILEMD5=$(openssl dgst -md5 -binary "$FILENAME" | openssl enc -base64)
AUTH_PATH="/${BUCKET}/${OBJECTNAME}?partNumber=1&uploadId=2~cKXKXvrqwcYFya_UpQqGq4ltxUAYNGV"
CURRENT_TIME=$(TZ=GMT LANG=en_US date "+%a, %d %b %Y %H:%M:%S GMT")
stringToSign="PUT\n\n\n${CURRENT_TIME}\n${AUTH_PATH}"
echo "$stringToSign"
signature=$(echo -en "${stringToSign}" | openssl sha1 -hmac "${SECRET_KEY}" -binary | base64)
echo "$signature"
curl -s -v -X PUT "http://${HOST}${AUTH_PATH}" \
-H "Authorization: AWS ${ACCESS_KEY}:${signature}" \
-H "Date: ${CURRENT_TIME}" \
-H "Content-Length: ${FILESIZE}" \
-T "${FILENAME}"
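RGW returns the part's ETag as a response header on the PUT. A sketch of capturing it, here run against a sample header dump (in practice, dump the real headers with curl -D -):

```shell
# Sample part-upload response headers; in practice: HEADERS=$(curl -s -D - -o /dev/null ...)
HEADERS='HTTP/1.1 200 OK
etag: "14812c00f44e41ef5233694083171b26"
Content-Length: 0'
# Grab the etag header value, stripping the quotes and any trailing CR
ETAG=$(echo "$HEADERS" | grep -i '^etag:' | sed 's/^[Ee][Tt][Aa][Gg]:[[:space:]]*//' | tr -d '"\r')
echo "$ETAG"
```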
#!/bin/bash
ACCESS_KEY="test1"                  # access key
SECRET_KEY="test1"                  # secret key
HOST="192.168.1.29:7480"            # S3 endpoint address
BUCKET="curlbucket"                 # bucket name
CONTENT_TYPE="text/plain"           # MIME type (not sent in this request)
FILENAME=/root/curltest/122         # local path of this part's 12-byte slice
OBJECTNAME="24"
FILESIZE=$(stat -c%s "$FILENAME")
echo "$FILESIZE"
FILEMD5=$(openssl dgst -md5 -binary "$FILENAME" | openssl enc -base64)
AUTH_PATH="/${BUCKET}/${OBJECTNAME}?partNumber=2&uploadId=2~cKXKXvrqwcYFya_UpQqGq4ltxUAYNGV"
CURRENT_TIME=$(TZ=GMT LANG=en_US date "+%a, %d %b %Y %H:%M:%S GMT")
stringToSign="PUT\n\n\n${CURRENT_TIME}\n${AUTH_PATH}"
echo "$stringToSign"
signature=$(echo -en "${stringToSign}" | openssl sha1 -hmac "${SECRET_KEY}" -binary | base64)
echo "$signature"
curl -s -v -X PUT "http://${HOST}${AUTH_PATH}" \
-H "Authorization: AWS ${ACCESS_KEY}:${signature}" \
-H "Date: ${CURRENT_TIME}" \
-H "Content-Length: ${FILESIZE}" \
-T "${FILENAME}"
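The two per-part scripts above assume the 12-byte slices already exist on disk. The client-side slicing they rely on can be sketched with GNU split (the sample file and the 12-byte part size here are illustrative):

```shell
WORKDIR=$(mktemp -d)
# A 24-byte sample file standing in for the real upload source
printf 'hello world!second part.' > "$WORKDIR/24"
# Cut it into numbered 12-byte parts: part-00, part-01 (GNU split)
split -b 12 -d -a 2 "$WORKDIR/24" "$WORKDIR/part-"
ls "$WORKDIR"/part-*
```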
Stage 3: complete the upload.
#!/bin/bash
ACCESS_KEY="test1"                  # access key
SECRET_KEY="test1"                  # secret key
HOST="192.168.1.29:7480"            # S3 endpoint address
BUCKET="curlbucket"                 # bucket name
CONTENT_TYPE="text/plain"           # MIME type (not sent in this request)
FILENAME=/root/curltest/1.xml       # local path of the CompleteMultipartUpload XML
OBJECTNAME="24"
FILESIZE=$(stat -c%s "$FILENAME")
echo "$FILESIZE"
FILEMD5=$(openssl dgst -md5 -binary "$FILENAME" | openssl enc -base64)
AUTH_PATH="/${BUCKET}/${OBJECTNAME}?uploadId=2~cKXKXvrqwcYFya_UpQqGq4ltxUAYNGV"
CURRENT_TIME=$(TZ=GMT LANG=en_US date "+%a, %d %b %Y %H:%M:%S GMT")
stringToSign="POST\n\n\n${CURRENT_TIME}\n${AUTH_PATH}"
echo "$stringToSign"
signature=$(echo -en "${stringToSign}" | openssl sha1 -hmac "${SECRET_KEY}" -binary | base64)
echo "$signature"
curl -s -v -X POST "http://${HOST}${AUTH_PATH}" \
-H "Authorization: AWS ${ACCESS_KEY}:${signature}" \
-H "Date: ${CURRENT_TIME}" \
-T "${FILENAME}"
Contents of the XML file (the Part entries may appear in any order, but each PartNumber must be paired with its matching ETag):
<CompleteMultipartUpload>
<Part>
<PartNumber>1</PartNumber>
<ETag>"14812c00f44e41ef5233694083171b26"</ETag>
</Part>
<Part>
<PartNumber>2</PartNumber>
<ETag>"610fcea2f7195f652dbef3bfc77ea30a"</ETag>
</Part>
</CompleteMultipartUpload>
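Rather than writing the XML file by hand, the body can be generated from the recorded part numbers and ETags. A sketch in shell, seeded with the two sample ETags from this post:

```shell
# Each entry is "partNumber:etag"; the values are the sample ETags from above
PARTS="1:14812c00f44e41ef5233694083171b26 2:610fcea2f7195f652dbef3bfc77ea30a"
XMLFILE=$(mktemp)
{
  echo "<CompleteMultipartUpload>"
  for pair in $PARTS; do
    num=${pair%%:*}     # text before the first colon
    etag=${pair#*:}     # text after the first colon
    echo "  <Part>"
    echo "    <PartNumber>${num}</PartNumber>"
    echo "    <ETag>\"${etag}\"</ETag>"
    echo "  </Part>"
  done
  echo "</CompleteMultipartUpload>"
} > "$XMLFILE"
cat "$XMLFILE"
```

The resulting file can then be sent as the body of the stage-3 POST.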
1. Ceph RGW assembles the object by sorting the uploaded parts by partNumber, smallest first. partNumber is a uint32 and must be >= 1; if you pass -1 or 0, the upload-part stage does not return an error, and only the complete stage reports an InvalidPart error (RGW has this check in its code, so in practice partNumber must be greater than 0).
2. Part numbers do not have to be contiguous; RGW's completion logic handles this, and if the next part in sequence cannot be found, the function simply returns.
3. Resumable upload works by uploading only the parts that are still missing (the client must slice the file itself; s3cmd has this feature and implements it exactly this way). Parallel upload works on the same principle: multiple parts can be uploaded concurrently and asynchronously, and once all ETags have been recorded, the complete request is sent.
4. Large-file download was not investigated here. As of the Luminous release, Ceph supports AWS-style torrent download, which reduces bandwidth usage (see https://baike.baidu.com/item/BitTorrent/142795?fr=aladdin): https://ceph.com/releases/v12-0-1-luminous-dev-released/ https://docs.aws.amazon.com/zh_cn/AmazonS3/latest/dev/S3Torrent.html
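The parallel-upload idea in note 3 can be sketched with shell background jobs: start every per-part PUT concurrently, wait for all of them, then send the complete request. upload_part below is a hypothetical stand-in for the signed curl PUT from stage 2; here it only records a placeholder ETag:

```shell
ETAG_DIR=$(mktemp -d)
# Stand-in for the signed per-part curl PUT from stage 2; a real version would
# capture the ETag response header and store it for the complete request.
upload_part() {
  echo "etag-for-part-$1" > "$ETAG_DIR/etag.$1"
}
for part in 1 2 3; do
  upload_part "$part" &   # each part uploads in the background
done
wait                      # block until every part has finished
ls "$ETAG_DIR"
```

Once all the per-part ETag files exist, they can be folded into the CompleteMultipartUpload body and the stage-3 POST sent.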