Welcome to Tony's Notebook

URL Encoding

I had an issue at work recently that led me to look into a subject that I'd never really taken much notice of before - URL encoding. With the help of a colleague I managed to track down an issue that had been puzzling me and several others - more on that later.

## What is URL encoding?

At its simplest, URL encoding is a way of transforming characters so they are valid in a URL, such as https://tonys-notebook.com. A URL can only contain a limited set of ASCII characters: the letters A-Z and a-z, the digits 0-9, and a handful of special characters. Anything else, including a space, has to be encoded. Usually URL encoding replaces the space with either a '+' character or %20, where 20 is the hex value (32 decimal) of 'space' in the ASCII table. For example, if your input text was 'Hey there', this would be URL encoded as 'Hey+there' (or 'Hey%20there').

## Python code

OK so you know you aren't going to get far with content on this site before you have to write some Python code. So, to play with URL encoding you can rustle up a quick piece of Python like:

import urllib.parse

q = urllib.parse.quote_plus('Hey there!')
print(q)

q = urllib.parse.quote_plus('Hey, Punch & Judy?')
print(q)

q = urllib.parse.quote_plus('Hobbs + Shaw')
print(q)

The output, if you run this code with python3 url-encode.py, would be:

```shell
bash-3.2$ python3 url-encode.py
Hey+there%21
Hey%2C+Punch+%26+Judy%3F
Hobbs+%2B+Shaw
```

You'll see that spaces are converted to '+' and '+' is converted to '%2B' (this is 0x2B, which is 43 decimal, the ASCII code for '+').
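To make the two space encodings concrete, here is a small sketch using only the standard library's `urllib.parse`. It compares `quote_plus` (space becomes '+') with `quote` (space becomes '%20'), then decodes the result again with `unquote_plus`. The sample string is the one used above.

```python
import urllib.parse

text = 'Hobbs + Shaw'

# quote_plus: form-style encoding, space -> '+', literal '+' -> '%2B'
plus_encoded = urllib.parse.quote_plus(text)
print(plus_encoded)     # Hobbs+%2B+Shaw

# quote: plain percent-encoding, space -> '%20', literal '+' -> '%2B'
percent_encoded = urllib.parse.quote(text)
print(percent_encoded)  # Hobbs%20%2B%20Shaw

# unquote_plus reverses quote_plus: '+' -> space, '%2B' -> '+'
print(urllib.parse.unquote_plus(plus_encoded))  # Hobbs + Shaw
```

Either way, a literal '+' always ends up as '%2B', which is what keeps it distinct from an encoded space.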

So far, so good...

## The problem I ran into

The problem showed up when I was using Curl to test a REST API, in this case the Vonage Reports API. I had already set up and tested the API with Paw, but there was an issue when I used Curl on the command line. It only happened with [ISO-8601 dates](/articles/dealing-with-dates.html) of a certain format. As you may know, there are a couple of formats that conform to the ISO-8601 spec. You can have `2011-08-27T23:22:37Z`, which is the time expressed in UTC. There is also another format that expresses the time with an offset from UTC, such as `2017-11-10T11:07:29+0000`. Notice the '+' nonchalantly sitting in there.

Something like this would be fine (not all of the request is shown):

```shell
curl "https://api.nexmo.com/v2/reports/records?account_id=abcd1234&product=SMS&direction=outbound&date_start=2020-06-04T08:00:00Z&date_end=2020-06-04T14:00:00Z"
```

But the problem then comes when you have a Curl command such as:

curl "https://api.nexmo.com/v2/reports/records?account_id=abcd1234&product=SMS&direction=outbound&date_start=2020-06-04T08:00:00+0002&date_end=2020-06-04T14:00:00+0002"

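Now the problem is that your query URL has '+' characters in it, and we already know that '+' is a special character in URLs that is usually reserved to represent a space. A server that decodes the query string this way sees a space where the '+' was, so the date no longer has a valid offset. Oops.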
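Here is a minimal sketch of what goes wrong, assuming the server decodes the query string the way Python's `urllib.parse.parse_qs` does (treating '+' as a space). That is an assumption made for illustration, not a statement about how the Reports API is implemented.

```python
from urllib.parse import parse_qs

# Query string with a raw, unencoded '+' in the time offset
raw_query = 'date_start=2020-06-04T08:00:00+0002'

# parse_qs applies form-style decoding, so the '+' comes back as a space
decoded_raw = parse_qs(raw_query)['date_start'][0]
print(repr(decoded_raw))      # '2020-06-04T08:00:00 0002'

# The same value with '+' encoded as %2B survives decoding intact
encoded_query = 'date_start=2020-06-04T08%3A00%3A00%2B0002'
decoded_encoded = parse_qs(encoded_query)['date_start'][0]
print(repr(decoded_encoded))  # '2020-06-04T08:00:00+0002'
```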

## The solution

To get around this issue you need to URL encode the date values in the query string. This means the '+' would be replaced with '%2B', and any spaces, if there were any, would be replaced by '+'.

Curl has a convenient way to deal with this. You would end up with something like the following:

curl -G --data-urlencode "date_start=$DATE_START" --data-urlencode "date_end=$DATE_END" -u "$NEXMO_API_KEY:$NEXMO_API_SECRET" \
     "https://api.nexmo.com/v2/reports/records?account_id=$ACCOUNT_ID&product=$REPORT_PRODUCT&direction=$REPORT_DIRECTION"

A quick check of the Curl help:

curl --help | grep --color '\-G'
 -G, --get           Send the -d data with a HTTP GET (H)

and:

curl --help | grep --color '\-\-data-urlencode'
     --data-urlencode DATA  HTTP POST data url encoded (H)

This is nice. It means we can add data as if using the -d option, but have Curl convert this to GET using the -G option. The option --data-urlencode is essentially the -d option but with the data URL encoded for us.
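For comparison, roughly the same URL can be built in Python. This is only a sketch, not a description of what Curl does internally: it uses `urllib.parse.urlencode`, and the account ID and dates are placeholder sample values.

```python
from urllib.parse import urlencode

base_url = 'https://api.nexmo.com/v2/reports/records'

# Placeholder sample values, not real account details
params = {
    'account_id': 'abcd1234',
    'product': 'SMS',
    'direction': 'outbound',
    'date_start': '2020-06-04T08:00:00+0001',
    'date_end': '2020-06-04T14:00:00+0001',
}

# urlencode applies quote_plus to each value: ':' -> %3A, '+' -> %2B
url = base_url + '?' + urlencode(params)
print(url)
```

The printed URL carries each date as `2020-06-04T08%3A00%3A00%2B0001`, with the '+' safely encoded as '%2B'.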

## Debugging to confirm what's actually going on

Let's go back to our request, and plug in some actual values:

curl -G --data-urlencode date_start=2020-06-04T08:00:00+0001 --data-urlencode date_end=2020-06-04T14:00:00+0001 -u "abcd1234:password" \
     "https://api.nexmo.com/v2/reports/records?account_id=abcd1234&product=SMS&direction=outbound"

Add this to a file such as get-records-dates.sh, then chmod +x get-records-dates.sh. You can then run the script as:

./get-records-dates.sh | jq

The script's output is piped to jq to nicely format the JSON response. The response would start something like:

{
  "_links": {
    "self": {
      "href": "https://api.nexmo.com/v2/reports/records?account_id=abcd1234&product=SMS&direction=outbound&date_start=2020-06-04T08%3A00%3A00%2B0001&date_end=2020-06-04T14%3A00%3A00%2B0001"
    }
  },
  "request_id": "ffdf0e6b-cb4a-5678-1234-7e2f2f55d673",
  "request_status": "SUCCESS",
  "received_at": "2020-07-07T15:14:42+0000",
  "price": 0,
  "currency": "EUR",
  "direction": "outbound",
  "product": "SMS",
  ...

You can see that the '+' in the date format has been correctly URL encoded as '%2B', so '+0001' becomes %2B0001.

The point here is that the URL contained in the response confirms the dates were correctly encoded. If they hadn't been, we would have had an error response.

Of course, a more direct check is to look at the outbound request using Curl's verbose output. Simply add the -v option to the Curl command line in the script, and you will see a trace something like the following:

tbedford@Anthonys-MBP: ~/checkouts/tbedford/nexmo-apps/reports [master] $ ./get-records-dates.sh
*   Trying 104.18.99.29...
* TCP_NODELAY set
* Connected to api.nexmo.com (104.18.99.29) port 443 (#0)
...
> GET /v2/reports/records?account_id=abcd1234&product=SMS&direction=outbound&date_start=2020-06-04T08%3A00%3A00%2B0001&date_end=2020-06-04T14%3A00%3A00%2B0001 HTTP/2
> Host: api.nexmo.com
> Authorization: Basic ZmJ...k==
> User-Agent: curl/7.54.0
> Accept: */*
...

Quite a bit of debug info has been removed.

The key part is the request:

GET /v2/reports/records?account_id=abcd1234&product=SMS&direction=outbound&date_start=2020-06-04T08%3A00%3A00%2B0001&date_end=2020-06-04T14%3A00%3A00%2B0001

You can see that the offset in each date is correctly URL encoded as %2B0001.
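As a further check, the request line can be decoded again in Python to confirm that the offset round-trips. This is a small sketch using `urllib.parse`; the request target is copied from the trace above.

```python
from urllib.parse import urlsplit, parse_qs

# Request target as it appears in the Curl trace
request_target = ('/v2/reports/records?account_id=abcd1234&product=SMS'
                  '&direction=outbound'
                  '&date_start=2020-06-04T08%3A00%3A00%2B0001'
                  '&date_end=2020-06-04T14%3A00%3A00%2B0001')

# Split off the query string and decode each parameter
query = parse_qs(urlsplit(request_target).query)
print(query['date_start'][0])  # 2020-06-04T08:00:00+0001
print(query['date_end'][0])    # 2020-06-04T14:00:00+0001
```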

## Summary

So, you have seen that URL encoding is conceptually quite simple. However, problems can arise in situations where you have not correctly URL encoded your data. One to watch out for! Good luck!

Resources