Nobody likes hidden fees.
From contract fine print to being charged for spending money, there are innumerable examples of surprise costs taking a huge chunk out of people’s wallets. However, it’s particularly annoying when the cost comes as the result of a necessary action.
Thus we come to data egress.
Everybody transfers data; it’s not nearly as useful sitting in a silo. However, when you do this you can sometimes incur extra data egress charges, especially when you’re running a cloud-powered company on services like AWS.
That’s why in this post we’re going to teach you all about data egress, including what exactly it is, how it drives up AWS costs, and how to reduce those costs. Feel free to skip ahead to the section you’re interested in:
- What is data egress?
- When does data transfer in the cloud count as egress?
- Pricing, and why data egress is an additional AWS fee
- How to reduce data egress charges
- How to effectively manage your data egress
Let’s get started.
What is data egress?
“Data egress” is a term describing the process of data leaving a local network. The type of data doesn’t matter; any kind of information can be egressed. What makes it “egress” is the act of transferring it out of a local network.
Similarly, it doesn’t matter what kind of “local” network the data is being transferred out of, only that it is leaving that network. Whether you’re copying data from the company’s physical servers to a USB stick and taking it home with you, a client is downloading a file from your software, or you are sending an email, each of these involves data egress, because the data leaves the network it originated in.
It’s worth noting that this doesn’t include data coming into your local network; that is known as “data ingress”.
There are, however, a few nuances to be aware of as to what does and doesn’t count as data egress, which we’ll cover a little later.
When does data transfer in the cloud count as egress?
You get it. Data egress is always the act of transferring data out of a closed network. Yet, when you are using a cloud computing provider like AWS, you might not be aware of all the network boundaries your data may be crossing. As a result, data transfers that count as egress might surprise you.
Let’s run through a few examples to show you what we mean and really set the boundaries of what a closed network in AWS is.
We’ll start off with a basic setup. You have a SaaS app that leverages the computing power of EC2 and the storage capabilities of S3, all within a single AWS Region, a single Availability Zone, and the same AWS account. In this configuration, no artificial network boundaries have been established within the workload of your SaaS app: you’ve got a closed network. However, your clients do need to be able to use your app.
When your clients load up your app, and retrieve images and information within it, data is transferred out of your closed AWS network, out to the public internet and into the users’ browsers or applications. This is data egress, often referred to as internet transfer.
Next, imagine that you have a similar SaaS app that has grown in response to customer needs. It still leverages the computing power of EC2 and the storage capabilities of S3, but you have made several changes. You have established virtual private clouds (VPCs) to isolate different enterprise clients’ data and alleviate privacy concerns. You have distributed your workloads across Availability Zones to harden against disaster. You have replicated your clusters in several AWS Regions to improve global response times. And you have spread your workloads across multiple AWS accounts, each associated with a specific feature, so that you can easily analyze costs. So far, so good.
Well, from a cost perspective, it’s not great. Each of these improvements (establishing VPC boundaries, leveraging resources across AZs, and establishing capabilities across regions or accounts) represents a new closed network boundary in your AWS configuration. As a result, whenever you send data across these boundaries, you can incur data transfer charges before any data reaches a client over the internet, although, luckily, at a lower rate. Plus, whenever data moves between AZs, you are charged for data ingress on the receiving side as well!
For our final, most complex example, let’s take a look at how data egress would change if this same SaaS app were to start sending data from each of its regional clusters to a single AWS Lambda workload for processing, thereby adding greater functionality to the application without disturbing its core architecture. Great!
Perhaps not. Data needs to be transferred from each of your EC2 VPCs to your Lambda implementation for the workload to execute, at which point the results are returned to the original source. In this case there are two instances of data egress: first, when the data leaves for the Lambda workload, and again when the results return from it. To make matters worse, because every regional cluster sends data to a single Lambda deployment, most of this traffic crosses Regions, and AWS bills data moving into or out of a Lambda function from outside its Region at the standard EC2 data transfer rates.
However, notice that we called egress charges “extra”. This is because they aren’t included in the pricing model of any AWS service.
Pricing, and why data egress is an additional AWS fee
One of the biggest enduring problems of AWS is that, no matter which service you’re using, the pricing documentation is as clear as mud. While you’re technically provided with most of the information you need, it’s a hell of an ordeal to try and dig through the various submenus to actually find what you’re looking for.
So let’s start simple; why isn’t data egress included in your regular AWS pricing?
The short answer is that AWS service pricing isn’t a predictor of your rate of data egress. AWS service pricing typically relates to the scale of usage of the resources, such as the amount of data stored or processed, or the size of the resource used to do the processing. All these variables are completely independent of how many times data is sent across closed network boundaries. For example, you could be paying S3 just pennies a month to store a series of photos from a recent vacation, but if you view each one of those photos every hour of the day, you could be looking at some significant egress fees.
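To put rough numbers on the photo example, here is a small Python sketch. Every figure in it is a hypothetical assumption: 200 photos of 5 MB each, an S3 Standard storage price of $0.023 per GB-month, and the $0.09 per GB first-tier internet rate listed later in this post, with any free allowance ignored.

```python
# Hypothetical workload: every number here is an illustrative assumption.
PHOTO_COUNT = 200          # photos in the vacation album
PHOTO_SIZE_MB = 5          # average size of one photo
VIEWS_PER_PHOTO = 24 * 30  # each photo viewed once an hour for a 30-day month

STORAGE_RATE_PER_GB = 0.023  # assumed S3 Standard storage price, $/GB-month
EGRESS_RATE_PER_GB = 0.09    # first-tier internet egress rate, $/GB (free tier ignored)


def monthly_costs(count, size_mb, views, storage_rate, egress_rate):
    """Return (storage_cost, egress_cost) in dollars for one month."""
    stored_gb = count * size_mb / 1000   # decimal GB, for simplicity
    transferred_gb = stored_gb * views   # every view re-downloads the photo
    return stored_gb * storage_rate, transferred_gb * egress_rate


storage, egress = monthly_costs(PHOTO_COUNT, PHOTO_SIZE_MB, VIEWS_PER_PHOTO,
                                STORAGE_RATE_PER_GB, EGRESS_RATE_PER_GB)
print(f"storage: ${storage:.2f}/month, egress: ${egress:.2f}/month")
```

Under those assumptions it prints `storage: $0.02/month, egress: $64.80/month`: the same gigabyte that costs about two cents a month to store costs thousands of times more once it is sent out 720 times.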
When planning for your AWS budget, you have to predict data transfer costs separately, but it does get complicated.
Like the rest of AWS, data transfer is inherently on-demand: unless you know in advance exactly how many times you’ll be exporting or requesting data, you can’t simply create a round figure and call it a day. Thankfully, AWS throws us a bone here by grouping their data egress costs into categories and bands, much like with their S3 pricing.
Speaking of which, your data egress costs (which Amazon simply calls “data transfer”) are also itemized separately on your regular bill, because they exist largely independently of each service.
For example, the costs of data egress from EC2 instances to the public internet are as follows:
- First 100 GB/Month of data egress from EC2 to the internet = free
- Next 10 TB/Month = $0.0900 per GB
- Next 40 TB/Month = $0.0850 per GB
- Next 100 TB/Month = $0.0700 per GB
- Anything beyond the first 150 TB/Month = $0.0500 per GB
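Because each band is charged at its own rate, a month’s bill is the sum of the bands your traffic passes through. The Python sketch below reads the list above literally as consecutive bands and assumes 1 TB = 1,024 GB. It is an illustration, not AWS’s billing logic, and it ignores the fact that the free 100 GB is shared across services.

```python
import math

GB_PER_TB = 1024  # assumption: binary terabytes

# (band size in GB, price per GB), applied in order, read literally from the list above
EC2_INTERNET_TIERS = [
    (100, 0.0),               # first 100 GB free
    (10 * GB_PER_TB, 0.09),   # next 10 TB
    (40 * GB_PER_TB, 0.085),  # next 40 TB
    (100 * GB_PER_TB, 0.07),  # next 100 TB
    (math.inf, 0.05),         # everything beyond that
]


def tiered_egress_cost(gb, tiers=EC2_INTERNET_TIERS):
    """Charge each band of one month's egress at that band's own rate."""
    cost = 0.0
    remaining = gb
    for band_size, rate in tiers:
        used = min(remaining, band_size)  # how much of this band the traffic fills
        cost += used * rate
        remaining -= used
        if remaining <= 0:
            break
    return cost


print(f"${tiered_egress_cost(11_340):,.2f}")
```

For 11,340 GB (the free 100 GB, the full 10 TB band, then 1,000 GB in the next band) this comes to roughly $1,006.60: 10,240 GB at $0.09 plus 1,000 GB at $0.085.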
Additionally, data egress from an EC2 instance to AWS GovCloud or any other AWS Region costs $0.02 per GB (from most US Regions; other Regions charge more). Similarly, data moving between Availability Zones in a single AWS Region is charged $0.01 per GB on the way out and another $0.01 per GB on the way in, for a total of $0.02 per GB. Data egress to Amazon CloudFront is free.
These rates also apply to almost all data egress from AWS as a whole, no matter the service. The only exceptions are:
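These boundary charges add up per hop, so one rough way to reason about an architecture is to list every boundary a gigabyte crosses. This hypothetical Python sketch uses the per-GB rates from the paragraph above and assumes that traffic staying inside one AZ crosses no billable boundary; real rates vary by Region, so treat the table as a placeholder.

```python
# Per-GB rates for crossing each kind of boundary, taken from the figures above.
# Cross-AZ traffic is billed on both sides: $0.01 out plus $0.01 in.
BOUNDARY_RATES = {
    "same_az": 0.0,           # assumption: no boundary crossed, no transfer charge
    "cross_az": 0.01 + 0.01,  # egress + ingress
    "cross_region": 0.02,
    "to_cloudfront": 0.0,
}


def path_cost(gb, hops):
    """Cost in dollars of moving `gb` gigabytes across each boundary in `hops` once."""
    return sum(gb * BOUNDARY_RATES[hop] for hop in hops)


# Hypothetical: 300 GB/month hops to another AZ, then on to a second Region.
print(f"${path_cost(300, ['cross_az', 'cross_region']):.2f}")
```

Here that comes to $12.00 a month, versus nothing for the same 300 GB kept inside one AZ, which is the cost side of the resilience and latency gains described earlier.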
- S3 Multi-Region Access Points data egress within AWS = $0.0033 per GB
- Internet acceleration for S3 Multi-Region Access Points = between $0.0050 per GB and $0.0600 per GB, depending on your host region
- S3 Transfer Acceleration = $0.0400 per GB
How to reduce data egress charges
As with any additional charges you can incur, it’s good to know how to limit them as much as possible so that you can retain your budget for, say, purchasing extra resources to expand your operation. But how are you supposed to do that without totally rearchitecting how your app works?
The answer is:
- Limit unnecessary data egress
- Take advantage of Amazon CloudFront
- Limit the number of Virtual Private Clouds (VPCs), AWS Regions, and Availability Zones (AZs) in your configuration
- Look for the cheapest region or zone
The first answer is the most obvious one: reduce unnecessary data egress. You won’t get charged for something you don’t do, so if there are any operations you can streamline by transferring less data between services, across AWS Regions, or out to the internet at large, you’ll save money. For example, lazy-load images in a browser as a user scrolls down the page instead of loading all images at once; the user may never scroll all the way down!
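A back-of-the-envelope Python sketch of the lazy-loading example, using a hypothetical page of 40 images at 0.5 MB each and a visitor who scrolls a quarter of the way down. Real lazy loaders usually fetch a little ahead of the viewport, which this simplification ignores.

```python
import math


def mb_served(image_count, image_mb, scroll_fraction, lazy):
    """MB of images sent to one visitor who scrolls through `scroll_fraction` of the page."""
    if not lazy:
        return image_count * image_mb  # eager: every image loads up front
    visible = math.ceil(image_count * scroll_fraction)
    return visible * image_mb          # lazy: only images scrolled into view


# Hypothetical page: 40 images of 0.5 MB; the visitor scrolls a quarter of the way.
eager = mb_served(40, 0.5, 0.25, lazy=False)
lazy = mb_served(40, 0.5, 0.25, lazy=True)
print(f"eager: {eager} MB, lazy: {lazy} MB")
```

It prints `eager: 20.0 MB, lazy: 5.0 MB`. Multiplied across every visitor who never reaches the bottom of the page, that difference goes straight into your internet egress bill.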
Second, you should be using Amazon CloudFront as much as possible to alleviate some of your egress woes. This is because CloudFront has its own free data egress allowance and doesn’t incur extra charges when data is transferred from another AWS service to it.
CloudFront is especially useful for smaller-scale operations, as its free tier allows up to 1 TB of data per month to be transferred out before you start incurring charges.
Next, you can optimize your setup to limit the number of Virtual Private Clouds, AWS Regions, or Availability Zones (the smaller, isolated locations within each Region). This will limit your egress charges by reducing the number of closed network boundaries your data needs to cross, since each crossing can incur a cost.
If you’re willing to put in some extra migration effort and optimization, you could take this one step further and look for the cheapest region or zone that you can use without affecting performance. Data transfer rates are not identical everywhere: internet acceleration charges for S3 Multi-Region Access Points depend on your host region, and standard data transfer rates are higher in some regions than in others. Checking the rates for each candidate region is an easy way to potentially save a lot of money over the years.
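Here is a rough Python comparison for a hypothetical 800 GB month. It uses a simplified version of the EC2 rates listed earlier (first 100 GB free, then $0.09 per GB, valid only below roughly 10 TB) and CloudFront’s 1 TB free tier. The rate CloudFront charges beyond its free tier is passed in as a placeholder, because it depends on where your viewers are. Request fees are ignored.

```python
CLOUDFRONT_FREE_GB = 1024  # 1 TB/month free tier, assuming 1 TB = 1,024 GB
CLOUDFRONT_RATE_PLACEHOLDER = 0.085  # hypothetical; look up the real rate for your viewers


def direct_ec2_cost(gb):
    """Simplified EC2-to-internet cost: 100 GB free, then $0.09/GB (first band only)."""
    return max(gb - 100, 0) * 0.09


def via_cloudfront_cost(gb, rate_after_free):
    """Origin-to-CloudFront transfer is free; only traffic above the free tier is billed."""
    return max(gb - CLOUDFRONT_FREE_GB, 0) * rate_after_free


monthly_gb = 800  # hypothetical traffic
print(f"direct: ${direct_ec2_cost(monthly_gb):.2f}, "
      f"via CloudFront: ${via_cloudfront_cost(monthly_gb, CLOUDFRONT_RATE_PLACEHOLDER):.2f}")
```

At this volume the direct route costs about $63.00 a month while the CloudFront route stays inside the free tier; the placeholder rate only starts to matter once traffic passes 1 TB.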
How to effectively manage your data egress
There’s only one way to effectively manage your data egress, and it’s the same way you manage your AWS bill in general. You need to know it inside-out, and analyze it for what’s working, what isn’t, and what’s costing too much (or more than it should).
To that end, Aimably’s AWS Spend Transparency Software, AWS Cost Explorer, and the AWS Billing console are great tools that can help you get almost all of the data you’ll need. You’ll get even more insight if you group your Cost Explorer results by usage type to single out your data transfer charges, and use cost allocation tags to see which resources are generating them, as this will show you at a glance where most of your money is going.
It’s a tricky balancing act: finding the right mix of data transfer to get the important work done and keep your services running smoothly, while also staying within budget and not wasting your money on needless data egress. However, with these tips and a little experimentation with the available tools, you’ll be streamlining your operations in no time.
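If you prefer to pull the numbers programmatically, the sketch below uses boto3’s Cost Explorer client (`get_cost_and_usage`, grouped by the `USAGE_TYPE` dimension) and then keeps only usage types that look like data transfer. The filtering rule is a heuristic assumption based on common usage-type names such as `USE1-DataTransfer-Out-Bytes`; check it against the usage types on your own bill. Pagination (`NextPageToken`) is not handled, and Cost Explorer API requests are themselves billed.

```python
from datetime import date


def data_transfer_costs(response):
    """Sum UnblendedCost per usage type, keeping only data-transfer usage types.

    Heuristic (an assumption, not an official rule): data transfer usage types
    usually contain "DataTransfer" (internet and cross-AZ traffic) or end in
    "-AWS-Out-Bytes" / "-AWS-In-Bytes" (inter-Region traffic).
    """
    totals = {}
    for period in response["ResultsByTime"]:
        for group in period.get("Groups", []):
            usage_type = group["Keys"][0]
            if "DataTransfer" in usage_type or usage_type.endswith(("-AWS-Out-Bytes", "-AWS-In-Bytes")):
                amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
                totals[usage_type] = totals.get(usage_type, 0.0) + amount
    # Most expensive usage types first
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


def fetch_usage_by_type(start: date, end: date):
    """Query Cost Explorer for monthly cost grouped by usage type (needs AWS credentials)."""
    import boto3  # imported here so the parsing helper above works without boto3

    ce = boto3.client("ce")
    return ce.get_cost_and_usage(
        TimePeriod={"Start": start.isoformat(), "End": end.isoformat()},  # End is exclusive
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
    )
```

For example, `data_transfer_costs(fetch_usage_by_type(date(2024, 1, 1), date(2024, 2, 1)))` would return January’s data transfer charges per usage type, largest first. Keeping the parsing separate from the API call makes the filter easy to test against a saved response.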