The powerful analytics capabilities of the Sumo Logic platform have always provided the greatest insights into your machine data. Recently we added an operator – bringing the essence of a SQL JOIN to your stream of unstructured data, giving you even more flexibility.
In a standard relational join, the datasets in the tables to be joined are fixed at query time. However, matching up IDs between log messages from different days within your search timeframe likely produces the wrong result because actions performed yesterday should not be associated with a login event that occurred today. For this reason, our Join operator provides for a specified moving timeframe within which to join log messages. In the diagram below, the pink and orange represent two streams of disparate log messages. They both contain a key/value pair that we want to match on and the messages are only joined on that key/value when they both occur within the time window indicated by the black box.
Now let’s put this to use. Suppose an application has both real and machine-controlled users. I’m interested in knowing which users are which so that I can keep an eye out for any machine-controlled users that are impacting performance. I have to find a way to differentiate between the real vs the machine-controlled users. As it turns out, the human users create requests at a reasonably low rate while the machine-controlled users (accessing via an API) are able to generate several requests per second and always immediately after the login event.
In these logs, there are several different messages coming in with varying purposes and values. Using Join, I can query for both the logins and requests and then restrict the time window of the matching logic to combine the two messages streams. The two sub queries in my search will look for request/query events and login events respectively. I’ve restricted the match window to just 15 seconds so that I’m finding the volume of requests that are very close to the login event. Then I’m filtering out users who made less than 10 requests in that 15-second time frame following a login. The result is a clear view of the users that are actively issuing a large volume of requests via the API immediately upon logging in. Here is my example query:
(login or (creating query)) | join (parse "Creating query: '*'" as query, "auth=User:*:" as user) as query, (parse "Login success for: '*'" as user) as login on query.user = login.user timewindow 15s | count by query_user | where _count > 10 | sort _count
As you can see from the above syntax, the subqueries are written with the same syntax and even support the use of aggregates (count, sum, average, etc) so that you can join complex results together and achieve the insights you need. And of course, we support joining more than just two streams of logs – combining all your favorite data into one query!