AWS : S3 (Simple Storage Service) IV - Uploading a large file
The code below is based on An Introduction to boto's S3 interface - Storing Data.
To set up boto on a Mac:
$ sudo easy_install pip
$ sudo pip install boto
Because S3 requires AWS credentials, we need to provide our keys: AWS_ACCESS_KEY and AWS_ACCESS_SECRET_KEY. The code reads them from boto's configuration file, /etc/boto.cfg or ~/.boto:
[Credentials]
AWS_ACCESS_KEY_ID = A...3
AWS_SECRET_ACCESS_KEY = W...9
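As a quick sanity check before running the upload script, we can confirm that boto actually picks the keys up from ~/.boto (or /etc/boto.cfg). This is just a minimal sketch using the same boto.config.get() calls the script relies on; the file name check_boto_config.py is hypothetical and not part of the upload script:

#!/bin/python
# check_boto_config.py - hypothetical helper, not part of the upload script.
# boto.config reads /etc/boto.cfg and ~/.boto automatically when boto is imported.
import boto

access_key = boto.config.get('Credentials', 'aws_access_key_id')
secret_key = boto.config.get('Credentials', 'aws_secret_access_key')

if access_key and secret_key:
    print 'Credentials found (key id starts with %s...)' % access_key[:4]
else:
    print 'No credentials found - check ~/.boto or /etc/boto.cfg'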
Here is our Python code (s3upload.py):
#!/bin/python
import os
import argparse
import boto
import sys
from boto.s3.key import Key

AWS_ACCESS_KEY = boto.config.get('Credentials', 'aws_access_key_id')
AWS_ACCESS_SECRET_KEY = boto.config.get('Credentials', 'aws_secret_access_key')

def check_arg(args=None):
    parser = argparse.ArgumentParser(description='args : bucket name, file to upload')
    parser.add_argument('-b', '--bucket',
                        help='bucket name',
                        required=True,
                        default='')
    parser.add_argument('-f', '--filename',
                        help='file to upload',
                        required=True,
                        default='')

    results = parser.parse_args(args)
    return (results.bucket, results.filename)

def upload_to_s3(aws_access_key_id, aws_secret_access_key, file, bucket, key,
                 callback=None, md5=None, reduced_redundancy=False, content_type=None):
    """
    Uploads the given file to the AWS S3 bucket and key specified.

    callback is a function of the form:

        def callback(complete, total)

    The callback should accept two integer parameters: the first is the
    number of bytes successfully transmitted to S3, and the second is the
    total size of the object being transmitted.

    Returns a boolean indicating success/failure of the upload.
    """
    try:
        size = os.fstat(file.fileno()).st_size
    except:
        # Not all file objects implement fileno(),
        # so we fall back on this
        file.seek(0, os.SEEK_END)
        size = file.tell()

    conn = boto.connect_s3(aws_access_key_id, aws_secret_access_key)

    # List all buckets visible to these credentials
    rs = conn.get_all_buckets()
    for b in rs:
        print b

    # Warn if the target bucket does not exist
    nonexistent = conn.lookup(bucket)
    if nonexistent is None:
        print 'Not there!'

    bucket = conn.get_bucket(bucket, validate=True)
    k = Key(bucket)
    k.key = key
    if content_type:
        k.set_metadata('Content-Type', content_type)
    sent = k.set_contents_from_file(file, cb=callback, md5=md5,
                                    reduced_redundancy=reduced_redundancy,
                                    rewind=True)

    # Rewind for later use
    file.seek(0)

    if sent == size:
        return True
    return False

if __name__ == '__main__':

    bucket, filename = check_arg(sys.argv[1:])
    file = open(filename, 'r+')

    print 'ACCESS_KEY=', AWS_ACCESS_KEY
    print 'ACCESS_SECRET_KEY=', AWS_ACCESS_SECRET_KEY

    key = file.name
    print 'key=', key
    print 'bucket=', bucket

    if upload_to_s3(AWS_ACCESS_KEY, AWS_ACCESS_SECRET_KEY, file, bucket, key):
        print 'It worked!'
    else:
        print 'The upload failed...'
To run, use the following syntax:
python s3upload.py -b bucket-name -f file-name
A real run looks like this:
$ python s3upload.py -b s3-sample-bucket -f sample-file
ACCESS_KEY= A...
ACCESS_SECRET_KEY= W...
key= sample-file
bucket= s3-sample-bucket
<Bucket: s3-sample-bucket>
It worked!
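Note that upload_to_s3() accepts an optional progress callback, as described in its docstring, but the script never passes one. If you want to watch the transfer, a callback along these lines should work; boto invokes it periodically with the bytes sent so far and the total object size (the function name and output format here are our own choice):

def progress_callback(complete, total):
    # complete: bytes transmitted so far; total: size of the object being sent
    if total > 0:
        print 'Transferred %d of %d bytes (%.1f%%)' % (complete, total,
                                                       100.0 * complete / total)

# Hypothetical usage - pass it through when calling upload_to_s3():
# upload_to_s3(AWS_ACCESS_KEY, AWS_ACCESS_SECRET_KEY, file, bucket, key,
#              callback=progress_callback)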
The code below is based on An Introduction to boto's S3 interface - Storing Large Data.
To make the code work, we need to install both boto and FileChunkIO.
To upload a big file, we split it into smaller chunks and upload each chunk in turn; S3 then combines them into the final object. The Python code below makes use of the FileChunkIO module, so we may want to run
$ pip install FileChunkIO
if it isn't already installed.
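Before diving into the full script, it may help to see the chunk arithmetic on its own. With a 50 MiB chunk size, a roughly 524 MB file splits into ten full parts plus one smaller tail part, which is why the sample run below reports chunk_count=10 but sends eleven parts. The sketch below mirrors the offset/size calculation in the script; the file size is an assumption chosen to match the sample run, not any particular file of yours:

# Hypothetical sketch of the chunking arithmetic used by s3upload2.py below
import math

source_size = 524 * 1024 * 1024      # assumed ~524 MB, matching the sample run
chunk_size = 52428800                # 50 MiB

# Python 2 integer division truncates here, so the script loops
# chunk_count + 1 times to cover the final partial chunk.
# (If the file size were an exact multiple of chunk_size, the last
# iteration would produce a 0-byte part.)
chunk_count = int(math.ceil(source_size / chunk_size))

for i in range(chunk_count + 1):
    offset = chunk_size * i
    bytes = min(chunk_size, source_size - offset)
    print 'part %2d: offset=%d, bytes=%d' % (i + 1, offset, bytes)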
Here is our Python code (s3upload2.py):
#!/bin/python
# s3upload2.py
# Can be used to upload a large file to S3

import os
import sys
import argparse
import math
import boto
from boto.s3.key import Key
from filechunkio import FileChunkIO

def check_arg(args=None):
    parser = argparse.ArgumentParser(description='args : bucket name, file to upload')
    parser.add_argument('-b', '--bucket',
                        help='bucket name',
                        required=True,
                        default='')
    parser.add_argument('-f', '--filename',
                        help='file to upload',
                        required=True,
                        default='')

    results = parser.parse_args(args)
    return (results.bucket, results.filename)

def upload_to_s3(file, bucket):
    source_size = 0
    source_path = file.name
    try:
        source_size = os.fstat(file.fileno()).st_size
    except:
        # Not all file objects implement fileno(),
        # so we fall back on this
        file.seek(0, os.SEEK_END)
        source_size = file.tell()
    print 'source_size=%s MB' % (source_size / (1024 * 1024))

    aws_access_key = boto.config.get('Credentials', 'aws_access_key_id')
    aws_secret_access_key = boto.config.get('Credentials', 'aws_secret_access_key')

    conn = boto.connect_s3(aws_access_key, aws_secret_access_key)
    bucket = conn.get_bucket(bucket, validate=True)
    print 'bucket=%s' % (bucket)

    # Create a multipart upload request
    mp = bucket.initiate_multipart_upload(os.path.basename(source_path))

    # Use a chunk size of 50 MiB (feel free to change this)
    chunk_size = 52428800
    # Integer division truncates here, so the loop below runs
    # chunk_count + 1 times to pick up the final partial chunk.
    chunk_count = int(math.ceil(source_size / chunk_size))
    print 'chunk_count=%s' % (chunk_count)

    # Send the file parts, using FileChunkIO to create a file-like object
    # that points to a certain byte range within the original file. We
    # set bytes to never exceed the original file size.
    sent = 0
    for i in range(chunk_count + 1):
        offset = chunk_size * i
        bytes = min(chunk_size, source_size - offset)
        sent = sent + bytes
        with FileChunkIO(source_path, 'r', offset=offset, bytes=bytes) as fp:
            mp.upload_part_from_file(fp, part_num=i + 1)
        print '%s: sent = %s MBytes' % (i, sent / 1024 / 1024)

    # Finish the upload
    mp.complete_upload()

    if sent == source_size:
        return True
    return False

if __name__ == '__main__':
    '''
    Usage: python s3upload2.py -b s3-sample-bucket -f sample-file2
    '''
    bucket, filename = check_arg(sys.argv[1:])
    file = open(filename, 'r+')

    if upload_to_s3(file, bucket):
        print 'It works!'
    else:
        print 'The upload failed...'
The script takes two arguments, the bucket name and the file name:
/Users/kihyuckhong/DATABACKUP_From_EC2$ python s3upload2.py -b s3-sample-bucket -f sample-file2
source_size=524 MB
bucket=<Bucket: s3-sample-bucket>
chunk_count=10
0: sent = 50 MBytes
1: sent = 100 MBytes
2: sent = 150 MBytes
3: sent = 200 MBytes
4: sent = 250 MBytes
5: sent = 300 MBytes
6: sent = 350 MBytes
7: sent = 400 MBytes
8: sent = 450 MBytes
9: sent = 500 MBytes
10: sent = 524 MBytes
It works!
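One caveat: if a run dies before mp.complete_upload(), the parts already uploaded remain in the bucket (and keep accruing storage charges) until the multipart upload is completed or aborted. A minimal hedged sketch for listing and cancelling stale multipart uploads with boto follows; the bucket name is the one from the sample above, and the script name is hypothetical:

# abort_stale_uploads.py - hypothetical cleanup helper for incomplete multipart uploads
import boto

conn = boto.connect_s3(boto.config.get('Credentials', 'aws_access_key_id'),
                       boto.config.get('Credentials', 'aws_secret_access_key'))
bucket = conn.get_bucket('s3-sample-bucket')

# Each entry is an in-progress multipart upload; cancel_upload()
# discards its parts so they no longer take up storage.
for mp in bucket.get_all_multipart_uploads():
    print 'aborting %s (started %s)' % (mp.key_name, mp.initiated)
    mp.cancel_upload()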