because there are just too many things to remember
Resources worth bookmarking and other jq lessons

It’s been a while since the last lesson. In that time, my computer died and was repaired and I’ve finally got things back together.

Today we’re going to dive into more useful jq things, but I’m also going to refer to resources worth bookmarking.

First up: jqterm. This is a fantastic tool for testing your queries interactively. It does almost everything you could do with jq in a terminal on your laptop, and gives immediate visual feedback. It doesn’t seem to handle comments the way jq itself does, but as a testing tool, it’s great.

It handles newlines beautifully, as illustrated below:

.Events[]
    | [.EventName,
        (.EventTime
            | strftime("%Y-%m-%d %H:%M:%S") ),
        (.CloudTrailEvent
            |fromjson
            |.sourceIPAddress,
             .userIdentity.userName,
             .eventTime,
             (.userAgent|split(" ")[3]),
             .responseElements.ConsoleLogin
        )
    ]

I use this expression because it illustrates a few things:

(1) The CloudTrailEvent looks like JSON, but it is stored as a string, so handling it initially required two jq commands chained together: one to extract the raw string and a second to parse it.
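A rough sketch of that two-step approach, not the exact command from the earlier lessons: the first jq pulls out the raw string with -r, and the second jq parses that string as JSON. The trimmed-down sample event is made up for illustration.

```shell
# Hypothetical, trimmed-down event: CloudTrailEvent is a JSON document stored as a string.
events='{"Events":[{"EventName":"ConsoleLogin","CloudTrailEvent":"{\"sourceIPAddress\":\"156.xxx.xxx.xxx\",\"userIdentity\":{\"userName\":\"hamish\"}}"}]}'

# Step 1: -r prints the string unescaped. Step 2: a second jq reads that text as JSON.
two_step=$(printf '%s' "$events" \
    | jq -r '.Events[].CloudTrailEvent' \
    | jq -r '.userIdentity.userName')
echo "$two_step"
```

Note that the second jq never sees .EventName or .EventTime, which is why the two halves of the event can’t be combined in one output this way.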

This is ugly for a number of reasons, but mostly because I would like to get all the output in a single sweep. More on that later.

(2) To write the output in a format we might like, such as tab-separated values (TSV), jq has a function for that. Piping the output directly through @tsv does this cleanly.

There are some great resources out there, but it’s taken me time to find them. I’ll create a separate blog post about them, but the ones listed at the end of this post are great to bookmark now. Lots of examples and demos. Had I found these before I started this series, I wouldn’t have done the series, because they are comprehensive resources on jq. Anyhow, here we are!
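A small sketch of @tsv in action (the sample data is invented for illustration). @tsv takes an array of simple values and returns a single string, so the -r flag is needed to print real tabs rather than a JSON-quoted string containing \t.

```shell
# Hypothetical sample data with two events.
rows='{"Events":[{"EventName":"ConsoleLogin","Username":"hamish"},{"EventName":"ConsoleLogin","Username":"alice"}]}'

# Build one array per event, then turn each array into a tab-separated line.
tsv=$(printf '%s' "$rows" | jq -r '.Events[] | [.EventName, .Username] | @tsv')
echo "$tsv"
```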

Lesson 13

Converting a document into JSON that you can use

Here’s a sample ConsoleLogin event from CloudTrail:

        {
            "EventId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxx068613",
            "EventName": "ConsoleLogin",
            "ReadOnly": "false",
            "EventTime": 1584080740.0,
            "EventSource": "signin.amazonaws.com",
            "Username": "hamish",
            "Resources": [],
            "CloudTrailEvent": "{\"eventVersion\":\"1.05\",\"userIdentity\":{\"type\":\"IAMUser\",\"principalId\":\"AIDAxxxxxxxxxxxxxxxxx\",\"arn\":\"arn:aws:iam::xxxxxxxx0012:user/hamish\",\"accountId\":\"xxxxxxxx0012\",\"userName\":\"hamish\"},\"eventTime\":\"2020-03-13T06:25:40Z\",\"eventSource\":\"signin.amazonaws.com\",\"eventName\":\"ConsoleLogin\",\"awsRegion\":\"us-east-1\",\"sourceIPAddress\":\"156.xxx.xxx.xxx\",\"userAgent\":\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:73.0) Gecko/20100101 Firefox/73.0\",\"requestParameters\":null,\"responseElements\":{\"ConsoleLogin\":\"Success\"},\"additionalEventData\":{\"LoginTo\":\"https://console.aws.amazon.com/iam/home?region=eu-west-1&state=hashArgs%23&isauthcode=true\",\"MobileVersion\":\"No\",\"MFAUsed\":\"No\"},\"eventID\":\"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxx068613\",\"eventType\":\"AwsConsoleSignIn\",\"recipientAccountId\":\"xxxxxxxx0012\"}"
        }

Let’s start with this as part (1) of this lesson:

.Events[]
    | [ (.CloudTrailEvent
            | fromjson
            | .sourceIPAddress,
              .userIdentity.userName,
              .eventTime,
              (.userAgent | split(" ")[3]),
              .responseElements.ConsoleLogin
        ),
        .EventName,
        (.EventTime
            | strftime("%Y-%m-%d %H:%M"))
      ]
    | @tsv

Now the .CloudTrailEvent looks awfully like JSON. In fact it is JSON, just stored as a string, and previously we would have had to pipe the events through two jq commands (see above).

The fromjson builtin function converts this ugly-looking almost-JSON string into JSON we can use. Hence the jq above. Notice how I enclose the .CloudTrailEvent in round ( ) parentheses. This allows me to handle all the .CloudTrailEvent keys in a single group.
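To see fromjson and the grouping parentheses on their own, here is a minimal sketch using a made-up, trimmed-down event. Inside the parentheses the input is the parsed CloudTrailEvent; outside them it is still the original event, so .EventName keeps working.

```shell
# Hypothetical, trimmed-down event: CloudTrailEvent is a JSON document stored as a string.
events='{"Events":[{"EventName":"ConsoleLogin","CloudTrailEvent":"{\"sourceIPAddress\":\"156.xxx.xxx.xxx\",\"userIdentity\":{\"userName\":\"hamish\"}}"}]}'

# fromjson parses the string; the parentheses keep that parsing local to one group.
one_step=$(printf '%s' "$events" \
    | jq -c '.Events[] | [.EventName, (.CloudTrailEvent | fromjson | .sourceIPAddress, .userIdentity.userName)]')
echo "$one_step"
```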

Part (2): Take note too how I changed the order of the data returned. How is that possible?

Each comma-separated expression inside the square brackets receives the same input (the current event), irrespective of the order the expressions are written in. So I’ve subtly swapped the first bit of JSON (the .Event information) with the second bit (the .CloudTrailEvent), and the output simply follows the order I wrote them in. That’s handy.
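A tiny sketch of the same idea, with an invented two-key document: both expressions read from the same input, so the output order is whatever order you write them in.

```shell
# Hypothetical document; the key names are only for illustration.
doc='{"first":1,"second":2}'

# Each expression in the array gets the whole document as its input.
swapped=$(printf '%s' "$doc" | jq -c '[.second, .first]')
echo "$swapped"
```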

Lesson 14

Defining variables for later use

Following on from the above (that jq passes the same JSON to every part of the program), why does it matter?

Well, in many other languages we would have to define variables to do even a simple calculation, while in jq we can often achieve the same thing more simply. Say we wanted to find the average of the following .time_taken values:

	{
	    "url_details": [
	        {
	            "api_url": "https://xxxxxxx/v1/wellness/count/level/customer/id/123456/impactarea/performance,efficiency,availability,protection,capacity,configuration,security?sum1=performance,efficiency&sum2=availability,protection",
	            "time_taken": 1373
	        },
	        {
	            "api_url": "https://xxxxxxxxxxxxx/v1/wellness/renewal/count/level/customer/id/1234546",
	            "time_taken": 1414
	        },
	        {
	            "api_url": "https://xxxxxxxxxxxxx/v1/wellness/renewal/count/level/customer/id/1234546",
	            "time_taken": 1478
	        },
	        {
	            "api_url": "https://xxxxxxxxxxxxx/v1/wellness/renewal/count/level/customer/id/1234546",
	            "time_taken": 1956
	        }]
	}

In traditional coding, we would need to keep track of (a) the number of entries (in this case 4) and (b) the sum of all entries to calculate the average. However, since jq sends the data through each part of a pipeline, we simply do this:

jq '[.url_details[] | .time_taken]|add/length'

Here the array of .time_taken values is passed to the function add, and that same array is passed to the function length. Dividing one result by the other gives the average. Bingo.

But wait: there’s something else going on here.

You should note the first set of square [ ] brackets inside the jq. We’ve not seen that before! Why do they appear before the .url_details?

The add and the length functions are expecting an array, so that they can add up its elements and find its length. Previously we’ve used the square brackets AFTER the pipe, but in this case we’re using an array to enclose all the values we wish to add and find the length of.

So contrast this:

jq '.url_details[] | [.time_taken]'
[
  1373
]
[
  1414
]
[
  1478
]
[
  1956
]

with this:

jq '[.url_details[] | .time_taken]'
[
  1373,
  1414,
  1478,
  1956
]

Subtle, but different; and to the add and length functions, this is a significant difference!
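To make the difference concrete, here is a sketch that runs add/length both ways on the four timings from above (the api_url fields are left out to keep it short). Collecting first gives one average; wrapping each value separately gives four one-element arrays, so each "average" is just the value itself.

```shell
# The four time_taken values from the sample, without the api_url fields.
timings='{"url_details":[{"time_taken":1373},{"time_taken":1414},{"time_taken":1478},{"time_taken":1956}]}'

# One array in, one average out.
avg=$(printf '%s' "$timings" | jq '[.url_details[] | .time_taken] | add/length')

# Four one-element arrays: add/length runs once per array and returns each value unchanged.
# The outer brackets only gather the four results so they print on one line.
per_item=$(printf '%s' "$timings" | jq -c '[.url_details[] | [.time_taken] | add/length]')

echo "$avg"
echo "$per_item"
```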

Here’s another example: We want to find out the average spot instance price of the m4.xlarge instance sizes over some period of time:

aws ec2 describe-spot-price-history --instance-types m4.xlarge --product-descriptions "Linux/UNIX" --start-time 2020-06-01T00:00:00|jq '[.SpotPriceHistory[]|(.SpotPrice|tonumber)]|add/length'

This is very cool, but if you wish to re-use part of the parsed JSON, you’re going to have to define a variable. jq makes this easy with the as $var syntax.

Check this out:

.Events[]
    | (.CloudTrailEvent|fromjson) as $ct |
    [
        .EventName,
        (.EventTime
            | strftime("%Y-%m-%d %H:%M:%S") ),
        $ct.userIdentity.userName,
        ($ct.userAgent|split(" ")[0,3,4])
    ]

How cool is that?

We’ve actually assigned the whole result of .CloudTrailEvent | fromjson to the variable $ct. This means that later, we can use $ct in the same way we used the parsed event before.

That’s defining a variable for a whole section of JSON. I’ve included some additional things here, including splitting the .userAgent string into parts so I can get the OS and the browser used to log in.
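A minimal sketch of the as $var pattern with a made-up, trimmed-down event. The thing to notice is that after the as $ct | part, the input is still the original event, so .EventName and $ct can be mixed freely.

```shell
# Hypothetical, trimmed-down event: CloudTrailEvent is a JSON document stored as a string.
events='{"Events":[{"EventName":"ConsoleLogin","CloudTrailEvent":"{\"userIdentity\":{\"userName\":\"hamish\"},\"awsRegion\":\"us-east-1\"}"}]}'

# Parse the string once, keep it in $ct, then use it alongside fields of the outer event.
with_var=$(printf '%s' "$events" \
    | jq -c '.Events[] | (.CloudTrailEvent | fromjson) as $ct | [.EventName, $ct.userIdentity.userName, $ct.awsRegion]')
echo "$with_var"
```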

Of course I could also place something specific into a variable like this:

.Events[]
    | (.CloudTrailEvent|fromjson) as $ct |
    [
        .EventName,
        (.EventTime
            | strftime("%Y-%m-%d %H:%M:%S") ),
        ($ct.userIdentity.userName|ascii_upcase) as $user |
        ($ct.userAgent|split(" ")[0,3,4]),
        "User is "+$user
    ]

The output:

[
  "ConsoleLogin",
  "2020-05-04 10:32:09",
  "Mozilla/5.0",
  "Linux",
  "x86_64;",
  "User is HAMISH"
]
[
  "ConsoleLogin",
  "2020-03-19 03:45:34",
  "Mozilla/5.0",
  "Mac",
  "OS",
  "User is HAMISH"
] ...

As you can see, I’ve assigned .userName to $user, but in addition, I’ve upper-cased the result for later use. That is pretty cool.

Take note that this can get pretty messy if you don’t keep it neat. I mean, all this works on a single line; but don’t ask me to maintain that later!

Additional things:

  • Putting a string into the resulting JSON, as with the final line “User is…”. We can join a literal string and a variable with +.

  • Assigning a variable means we have to terminate the assignment with a pipe. As before (and by way of convention), I’m going to put that terminating pipe at the end of the line rather than at the beginning.

  • Do yourself a favour: come up with some convention that works for you and stick to it. Whether it’s placing different outputs on different lines, or putting the pipe on the end of the line for variable definition. Whatever. Just make sure you stick to it, because you’re not going to understand this jq later unless you do!

Phew. That felt like a long post, a lot of information and technically a little more challenging.


Additional resources

Here are some resources I found useful:

A jq cheat-sheet

An interactive jq interface on the Interwebs: jqterm

Some good jq recipes (though they get advanced quickly!): jq recipes

Another great jq tutorial: Reshaping JSON with jq

An alternative to jqterm: JQ play


Last modified on 2020-06-02