A common scenario in log analytics is that many log events are high value for real-time analytics, while others are low value for analytics yet account for a very large percentage of overall log volume.
Often these same low-value logs are used only for occasional ad-hoc investigations or need to be retained for audit purposes.
Sumo Logic data tiering provides a comprehensive solution for all the data an organization has, high touch, low touch and everything in between, at an economical price. Data Tiers provide tier-based pricing based on your planned usage of the data you ingest.
The continuous data tier provides the full set of Sumo Logic features at a fixed ingest cost per GB in credits.
Where users don't require dashboards or real-time alerts, the infrequent data tier can ingest logs at a vastly reduced credit cost per GB, with a small additional scan charge per search run. Implemented correctly, credit savings per GB can exceed 90% versus the continuous data tier.
This article works through the analysis and implementation of data tiering for AWS CloudTrail logs, but the same analysis methodology and tiering design can be applied to any log type.
Routing low-value CloudTrail events to the infrequent tier
AWS CloudTrail is a critical log source for security analytics in AWS environments, and Sumo Logic has many out-of-the-box CloudTrail apps for security and operations use cases.
CloudTrail logging can also be very verbose, and it's quite common for some AWS workloads to generate such large volumes of CloudTrail events that the log analytics budget comes under intense cost pressure.
By identifying low-value logs and routing them to the infrequent data tier, Sumo Logic customers can match credit consumption to the analytics value of CloudTrail logs, segmented by categories such as AWS service, event or workload.
Users can set up CloudTrail ingestion such that:
Events with real-time analytics value live in the continuous tier, available for CloudTrail apps, dashboards and alerting;
Low-value, high-volume CloudTrail events are stored in the infrequent tier, where they are available for ad-hoc search if required;
Overall solution cost is greatly reduced;
All events are online for immediate search with no archive retrieval delays;
All events are stored in the same highly secure, scalable Sumo Logic cloud platform.
In the next sections we will see how to:
Check for and identify high-volume, low-value CloudTrail logs
Create an infrequent-tier routing solution in Sumo Logic using a custom 'tier' field, field extraction rules and an infrequent partition.
Walkthrough
1. Analysis: Find What Drives the Volume
For data tiering to be worthwhile, we need to identify one or more segments of the logs that account for over 10% of total volume.
First we need to understand which CloudTrail events generate the most volume. The Sumo Logic Data Volume index is valuable for comparing volume across source categories, but we need a different approach to find the volume drivers within a single category.
The key is to break down CloudTrail by properties of the event and find the largest categories using the built-in _size field.
Here is an example CloudTrail event shown in the Sumo Logic UI; some of the most commonly used fields are highlighted in the green boxes.
To identify the largest contributors of CloudTrail events, we can analyze fields like eventname or eventsource and look for large contributors to overall size.
Here is an example Sumo Logic query that analyzes CloudTrail volume by a custom dimension, eventname, using the _size field.
_sourcecategory=*cloudtrail*
| json field=_raw "eventName" as eventname
| sum(_size) as bytes by eventname
| sort by bytes
Use the pie chart visualization to see the top contributors. In the example below, Decrypt events generate over 35% of traffic in this account: a top candidate for routing to the infrequent tier to reduce credit consumption.
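The same technique pivots to other dimensions. As a minimal sketch, this variant of the query above breaks volume down by eventsource (the AWS service generating the event) instead:
_sourcecategory=*cloudtrail*
| json field=_raw "eventSource" as eventsource
| sum(_size) as bytes by eventsource
| sort by bytes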
Further Filtering
Once you have determined the high-level volume segments, it's important to understand what AWS workload is creating the verbose event logging. To do this, update your search to filter for a single large category, such as eventname="Decrypt", then repeat the bytes breakdown by other key fields to find which workloads create the events.
The query below generates a breakdown filtered to only Decrypt events while providing context from other fields. Use the table aggregation view to see the results.
_sourcecategory=*cloudtrail* decrypt
| json field=_raw "eventName" as eventname
| where eventname="Decrypt"
| json field=_raw "eventSource" as eventsource
| json field=_raw "userIdentity.arn" as arn nodrop
| sum(_size) as bytes by eventsource, eventname, arn
| sort by bytes
| total bytes as total_bytes
| (100 * bytes / total_bytes) as percent
| fields -total_bytes, bytes
In this example account, most Decrypt events have no user ARN context, but events from a trusted third-party role make up a significant portion.
Common Analytics Dimensions
The table below suggests fields to break down your logs by, with example workload scenarios:
Field/Dimension | About | Example scenario creating verbose logging
eventname | Name of the API call | A single high-volume API call, such as KMS Decrypt, dominating ingest
userIdentity.arn or userName | User context for an assumed role or user credential | A specific role, such as a trusted third-party integration, generating a large share of events
eventsource | AWS API source for the event | One service endpoint, such as kms.amazonaws.com, driving most of the volume
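The same _size technique applies to any of these dimensions. As a sketch, here is a breakdown by user context; the userIdentity.userName JSON path is assumed here and is absent for role-based activity, hence the nodrop:
_sourcecategory=*cloudtrail*
| json field=_raw "userIdentity.userName" as username nodrop
| json field=_raw "userIdentity.arn" as arn nodrop
| sum(_size) as bytes by username, arn
| sort by bytes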
2. Validate the Use Case
Once the workload sources are understood, discuss them with security and devops/SRE teams to establish whether they are suitable for the infrequent tier.
A good candidate for the infrequent tier is one that:
Comprises a significant share of the log volume (ideally 10% or more);
Is used for ad-hoc search only (i.e. no dashboards or real-time alerts);
Can be scoped in a query with a condition such as an if statement.
The Search Audit index is a great way to validate what queries are being run against log sources, and what query type they are: dashboard, interactive search and so on. By analyzing queries against CloudTrail logs for, say, a 30-day period it's possible to validate:
Who searches the logs;
Whether they search all logs or a subset of them;
Whether dashboards and alerts run against all logs or just a subset.
Here is an example dashboard you can import into your Sumo Logic account to make analysis of the search audit data easier. It can be filtered by query, type or user, and shows which metadata strings are used as well as complete queries and types. You can use the searches in these panels as a starting point for custom analysis.
Continuing with our example scenario, administrators use the Search Audit index to analyze queries against CloudTrail logs and find that the security team are the key users of CloudTrail logs, and that no queries appear to target Decrypt events directly.
After discussing with the security team, it's agreed that Decrypt events need to be kept for audit purposes, but they are only occasionally searched in response to security investigations, so dashboards and alerting are not required.
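A minimal sketch of such an analysis, assuming the Search Audit Index is enabled in the account (the sumologic_search_usage_per_query index name and the user_name/query_type fields follow the Search Audit Index schema, but verify them in your environment):
_index=sumologic_search_usage_per_query cloudtrail
| where query matches "*cloudtrail*"
| count by user_name, query_type
| sort by _count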
3. Create A Routing Plan
Before configuring routing to the infrequent tier, you should have a clear plan for what data to route.
In our example scenario the team decided on the following routing plan:
Decrypt events are routed to the infrequent tier except where they are errors;
An error is defined as an event containing an errorCode key;
This should reduce continuous-tier ingest credit use by about 35%, while all events are still retained in Sumo Logic for audit purposes.
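Before implementing, a quick sketch like the following can preview how much volume the plan would route; it mirrors the tiering logic of the field extraction rule built in the next step:
_sourcecategory=*cloudtrail*
| json field=_raw "eventName" as eventname
| json field=_raw "errorCode" as errorCode nodrop
| if (eventname = "Decrypt" and isempty(errorcode), "infrequent", "continuous") as tier
| sum(_size) as bytes by tier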
4. Implement The Infrequent Tier
Let's walk through the steps the team takes to set up data tiering for CloudTrail.
Define A Tier Field
Open the Manage Data / Logs / Fields UI and add a field called tier if it's not already present. In the following steps, Sumo will be configured to set the value of this field for each event, with the tier field used to route logs to partitions.
The team should agree on known values for this field, for example:
Value | Routing Strategy |
continuous (default) | continuous tier |
infrequent | infrequent tier |
Create Partitions
For CloudTrail, implement two partitions: one for continuous data and one for infrequent.
Partition name | cloudtrail | cloudtrail_infreq
Type | continuous | infrequent
Routing | _sourcecategory=*cloudtrail* NOT tier=infrequent | _sourcecategory=*cloudtrail* tier=infrequent
Contains | Most CloudTrail event types; Decrypt events only if they contain an errorCode | Only Decrypt events with no errorCode
Use case | Mission critical, frequent searches and alerts | Audit trail and ad-hoc searches when required, via the search UI or API
Create A Field Extraction Rule
You can set a field value in Sumo Logic at the collector or source level, but a more flexible way to set the tier field value per event is to use a field extraction rule (FER). FERs can accommodate very complex routing logic using parsing and logical operators like if.
Here's what the team creates for CloudTrail logs, including one or more if statements to set the tier field value.
Source:
_sourcecategory=*cloudtrail*
Parse Expression:
parse "eventSource\":\"*\"" as eventsource
| parse "\"sourceIPAddress\":\"*\"" as sourceipaddress
| parse "\"eventName\":\"*\"" as eventname
| parse "awsRegion\":\"*\"" as awsRegion
| parse "\"userName\":\"*\"" as userName nodrop
| json field=_raw "userIdentity.arn" as arn nodrop
| json field=_raw "recipientaccountid" as recipientaccountid nodrop
| json field=_raw "errorCode" as errorCode nodrop
// tier routing if statements
| "continuous" as tier
| if (eventname = "Decrypt" and isempty(errorcode),"infrequent",tier) as tier
As new data streams in, it is split into the two partitions based on the tier value. Most CloudTrail events will have continuous as the tier field value, except for the exceptions set by the if statements.
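If later analysis identifies further low-value segments, additional if statements can simply be chained onto the end of the FER. The second condition below is purely hypothetical, shown only to illustrate the pattern (GenerateDataKey is another typically high-volume KMS event):
| "continuous" as tier
| if (eventname = "Decrypt" and isempty(errorcode), "infrequent", tier) as tier
| if (eventsource = "kms.amazonaws.com" and eventname = "GenerateDataKey" and isempty(errorcode), "infrequent", tier) as tier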
5. Searching Infrequent Data
To search the infrequent data, users can search by the _index name:
_index=cloudtrail_infreq
Or, to search the whole infrequent tier, just add the _datatier modifier to any existing search:
_datatier=infrequent _sourcecategory=*cloudtrail*
Administrators can use a query like the one below to validate that data is routing to the correct index, using the built-in _index field (also exposed as _view) and the _datatier modifier:
_sourcecategory=*cloudtrail* _datatier=infrequent // or _datatier=continuous
| count by _sourcecategory, _index
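To keep an eye on how much volume lands in the infrequent partition over time (remembering that infrequent searches incur a scan charge), a sketch like this can be run periodically:
_index=cloudtrail_infreq
| timeslice 1d
| sum(_size) as bytes by _timeslice
| sort by _timeslice asc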
No More Tears With Tiers
In summary, we learned that Sumo Logic data tiering solves the dilemma that not all logs are of equal analytics value (or size), enabling Sumo Logic customers to work with all log types at an economical price.
We learned how to use the _size field, together with parsing of log-specific fields, to do detailed analysis of the contributors to overall log size.
Following the process outlined above, you can:
Understand what types of events within a set of logs are driving volume size;
Use the search audit index to investigate actual usage;
Design and implement data tiers to get better value from your Sumo Logic log analytics.
Thanks for taking the time to read this article about data tiers and good luck on your data tiering journey!