In the first post of our three-part Amazon Redshift series, we covered what Redshift is and how it works. For the second installment, we’ll discuss how Amazon Redshift queries are analyzed and monitored. Before we go deep into gauging query performance on Redshift, let’s take a quick refresher on what Amazon Redshift is and what it does.
What is Amazon Redshift?
Amazon Redshift is a fully-managed petabyte-scale cloud-based data warehouse service designed for large scale data set storage and analysis. Part of the larger Amazon Web Services (AWS) platform, Redshift is built to handle massive amounts of data for processing, analysis, and migration.
Read more: What is Amazon Redshift?
When you’re set up and running on Amazon Redshift, it’s crucial to monitor the performance of your clusters and databases regularly. You can use tools and techniques to optimize clusters, but without performance benchmarks, you can’t accurately track if your efforts are working. You can begin benchmarking by understanding the performance data and metrics Redshift provides through the AWS console.
How to Monitor Redshift Query Performance (300)
Monitoring query performance is essential in ensuring that clusters are performing as expected.
Redshift users can use the console to monitor database activity and query performance. Since the data is aggregated in the console, users can correlate physical metrics with specific events within databases simply.
If you are interested in monitoring the physical performance of your clusters, including CPU Utilization and Network Throughput, these metrics and more can be monitored through Amazon CloudWatch. You can learn more about CloudWatch here.
Viewing Performance Metrics in the Redshift Console
On the Redshift console, performance metrics are organized by cluster.
For performance metrics on the cluster-level, simply go to the Redshift console and choose the cluster you want to monitor from the dropdown menu. The Cluster details page will give you a quick view of the status of your cluster, including DB Health and availability. For query-level metrics, choose the Queries tab.
On the Queries tab, you will see the query summary, a list of all running queries in a cluster, including recently run queries. This list can be sorted by ID, query run time, and status. Though limited, the information and metrics provided in the Queries tab are beneficial in analyzing specific queries on your cluster. Information here include:
- General query properties: query ID, query type, cluster the query ran on, and query run time.
- Status information: Query status and number of errors.
- SQL statement that was run.
- If available, an explain plan.
- Cluster data during the query execution.
Remember that query performance metrics are very limited on Amazon CloudWatch, so to correlate an irregularity, it’s very useful to be able to view events on the query level to see more details on what really happened during an event.
How to Analyze Redshift Queries
Because of the massive amounts of data in Redshift, it can take a long time to execute complex queries to retrieve information from your clusters. To get the most out of Redshift, your queries must be processed as fast as possible. To know how your cluster is performing, there are tools that you can use to analyze queries and pinpoint the cause of performance issues.
- Use the ANALYZE command to get updated statistics on your queries, especially when you make significant changes in the size of your tables. You can run ANALYZE per column to save time.
- Review query alerts on the STL_ALERT_EVENT_LOG table.
- Run the EXPLAIN command to get the query plan--it is crucial in analyzing queries and identifying issues early on. The query plan is what the execution engine follows when running queries. The execution engine translates your query plan into steps, segments, and streams. Learn more about the query plan here.
- To get more human-readable and detailed information about query execution steps and statistics, use the SVL_QUERY_SUMMARY and SVL_QUERY_REPORT views.
Monitoring Redshift Disk Space
Typical Redshift users process large amounts of data so it’s crucial to regularly monitor how much space is left on your Redshift cluster. This allows you to incrementally upgrade and avoid being surprised by a full cluster.
There are three ways you can monitor disk space in Redshift.
First, via CloudWatch. Second, through the Performance tab on the AWS Console. And third, by querying Redshift directly.
Monitoring Redshift Disk Space via CloudWatch
Setting up alerts on CloudWatch can save you time by automating the monitoring of your Redshift storage. You can set up an alert for when the disk reaches a certain percentage of usage. Learn how to set up alerts here.
Checking Redshift Storage on the Redshift console
From the cluster details page, choose the Performance tab and scroll down to the graph that shows how much storage is currently in use.
Querying Redshift to Get Disk Space Information
Use the STV_PARTITIONS query to monitor your cluster’s disk utilization. This is only visible to superusers, so make sure to have the correct permissions set up beforehand.
How Do I Monitor Amazon Redshift
The monitoring tools provided by Amazon are very useful in optimizing Redshift. However, these tools are limited and could be inflexible, which can result in you not getting the information you need.
Using third-party monitoring tools like Sumo Logic, which seamlessly integrates with Redshift, gives you access to a richer set of metrics that are updated in real-time. In the next installment of this series, we will go deeper into Sumo Logic and how you can use it to optimize your Redshift clusters further.
Complete visibility for DevSecOps
Reduce downtime and move from reactive to proactive monitoring.