Building URLs in Python
Janne Kemppainen
Building URLs is a common task in applications and APIs because most applications are closely interconnected. But how should we do it in Python? Here’s my take on the subject.
Different codebases might have different requirements such as:
- no unnecessary dependencies
- clean code
- quick and dirty
- etc.
Let’s see how the different options compare.
The standard way
Python has a built-in library made specifically for parsing URLs, called urllib.parse.
You can use the urllib.parse.urlsplit function to break a URL string into a five-item named tuple. The items are parsed from a URL of the following form:
scheme://netloc/path?query#fragment
The opposite of breaking a URL into parts is building one from its parts with the urllib.parse.urlunsplit function.
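As a small sketch of the round trip, the snippet below splits a URL, reads the named fields, and puts the parts back together. The URL itself is just an illustrative value, not one from a real API.

```python
from urllib.parse import urlsplit, urlunsplit

# Illustrative URL (an assumption made for this example).
url = "https://example.com/api/v1/book/12?format=mp3#details"
parts = urlsplit(url)

# Each of the five items is available by name (or by index).
print(parts.scheme)    # https
print(parts.netloc)    # example.com
print(parts.path)      # /api/v1/book/12
print(parts.query)     # format=mp3
print(parts.fragment)  # details

# Named tuples are immutable, so _replace() returns a modified copy.
changed = parts._replace(query="format=pdf")
rebuilt = urlunsplit(changed)
print(rebuilt)  # https://example.com/api/v1/book/12?format=pdf#details
```

Passing the untouched result back to urlunsplit gives the original string again for a URL like this one.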
If you check the library documentation you’ll notice that there is also a urlparse function. The difference between it and the urlsplit function is an additional item in the parse result for path parameters.
https://www.example.com/some/path;parameter=12?q=query
Path parameters are separated with a semicolon from the path and located before the query arguments that start with a question mark. Most of the time you don’t need them but it is good to know that they exist.
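To make the difference concrete, here is a short comparison of the two functions on the URL above. It is only a sketch of how the results differ.

```python
from urllib.parse import urlparse, urlsplit

url = "https://www.example.com/some/path;parameter=12?q=query"

# urlparse returns six items and separates the path parameters.
parsed = urlparse(url)
print(parsed.path)    # /some/path
print(parsed.params)  # parameter=12
print(parsed.query)   # q=query

# urlsplit returns five items and leaves the parameters in the path.
split = urlsplit(url)
print(split.path)     # /some/path;parameter=12

print(len(parsed), len(split))  # 6 5
```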
So how would you then build a URL with urllib.parse?
Let’s assume that you want to call some API and need a function for building the API URL. The required URL could be, for example:
https://example.com/api/v1/book/12?format=mp3&token=abbadabba
Here is how we could build the URL:
import os
from urllib.parse import urlunsplit, urlencode

SCHEME = os.environ.get("API_SCHEME", "https")
NETLOC = os.environ.get("API_NETLOC", "example.com")

def build_api_url(book_id, format, token):
    path = f"/api/v1/book/{book_id}"
    query = urlencode(dict(format=format, token=token))
    return urlunsplit((SCHEME, NETLOC, path, query, ""))
Calling the function works as expected:
>>> build_api_url(12, "mp3", "abbadabba")
'https://example.com/api/v1/book/12?format=mp3&token=abbadabba'
I used environment variables for the scheme and netloc because typically your program is calling a specific API endpoint that you might want to configure via the environment.
I also introduced the urlencode function which transforms a dictionary to a series of key=value pairs separated with & characters. This can be handy if you have lots of query arguments as a dictionary of values can be easier to manipulate.
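One more thing urlencode does for you is escape special characters in the values. The sketch below uses made-up parameter values to show that, along with the doseq option for list values.

```python
from urllib.parse import urlencode

# Spaces and ampersands in values are escaped automatically.
# The keys and values here are made up for illustration.
query = urlencode({"title": "war & peace", "format": "mp3"})
print(query)  # title=war+%26+peace&format=mp3

# With doseq=True a list value is expanded into repeated keys.
multi = urlencode({"tag": ["python", "urls"]}, doseq=True)
print(multi)  # tag=python&tag=urls
```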
The urllib.parse library also contains urljoin, which is similar to os.path.join. It can be used to build URLs by combining a base URL with a path. Let’s modify the example code a bit.
import os
from urllib.parse import urljoin, urlencode

BASE_URL = os.environ.get("BASE_URL", "https://example.com/")

def build_api_url(book_id, format, token):
    path = f"/api/v1/book/{book_id}"
    query = "?" + urlencode(dict(format=format, token=token))
    return urljoin(BASE_URL, path + query)
This time the whole base URL comes from the environment. The path and query are combined with the base URL using the urljoin function. Notice that this time the question mark at the beginning of the query needs to be set manually.
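One thing to keep in mind is that urljoin resolves the second argument the way a browser resolves a link, so leading and trailing slashes matter. The sketch below assumes a hypothetical base URL that contains a path prefix ("/service/"), which is not part of the example above.

```python
from urllib.parse import urljoin

# Hypothetical base URL with a path prefix, for illustration only.
base = "https://example.com/service/"

# A path with a leading slash replaces the whole base path.
absolute = urljoin(base, "/api/v1/book/12")
print(absolute)  # https://example.com/api/v1/book/12

# A relative path is appended when the base ends with a slash.
relative = urljoin(base, "api/v1/book/12")
print(relative)  # https://example.com/service/api/v1/book/12

# Without the trailing slash the last base segment is replaced.
no_slash = urljoin("https://example.com/service", "api/v1/book/12")
print(no_slash)  # https://example.com/api/v1/book/12
```

With a base URL that has no path, like the default used above, the leading slash in the path does no harm.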
The manual way
Libraries can be nice but sometimes you just want to get things done without thinking that much. Here’s a straightforward way to build a URL manually.
import os

# The variable name must be passed as a string.
BASE_URL = os.environ.get("BASE_URL", "https://example.com").rstrip("/")

def build_api_url(book_id, format, token):
    return f"{BASE_URL}/api/v1/book/{book_id}?format={format}&token={token}"
The f-strings in Python make this quite clean, especially with URLs that always have the same structure and not that many parameters. The BASE_URL initialization strips the trailing forward slash from the environment variable. This way the user doesn’t have to remember if it should be included or not.
Note that I haven’t added any validations for the input parameters in these examples so you may need to take that into consideration.
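As one possible take on that, the manual version could percent-encode each value with urllib.parse.quote before it goes into the string. This is only a sketch: BASE_URL is hardcoded here to keep the snippet self-contained, and the token value in the call is made up.

```python
from urllib.parse import quote

# Hardcoded for this sketch; the example above reads it from the environment.
BASE_URL = "https://example.com"

def build_api_url(book_id, format, token):
    # safe="" also escapes "/" so a value cannot add extra path segments.
    book = quote(str(book_id), safe="")
    fmt = quote(str(format), safe="")
    tok = quote(str(token), safe="")
    return f"{BASE_URL}/api/v1/book/{book}?format={fmt}&token={tok}"

print(build_api_url(12, "mp3", "abba dabba&x=1"))
# https://example.com/api/v1/book/12?format=mp3&token=abba%20dabba%26x%3D1
```

Without the escaping, the "&x=1" part of the token would have turned into a separate query argument.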
The Furl way
Then there is a library called furl which aims to make URL parsing and manipulation easy. It can be installed with pip:
>> python3 -m pip install furl
Let’s see it in action.
import os
from furl import furl

BASE_URL = os.environ.get("BASE_URL", "https://example.com")

def build_api_url(book_id, format, token):
    f = furl(BASE_URL)
    f /= f"/api/v1/book/{book_id}"
    f.args["format"] = format
    f.args["token"] = token
    return f.url
There are a few more lines here compared to the previous example. First we need to initialize a furl object from the base URL. The path can be appended using the /= operator, which is custom defined by the library.
The query arguments can be set with the args property dictionary. Finally, the URL can be built by accessing the url property.
Here’s an alternative implementation using the set() method to change the path and query arguments of an existing URL.
def build_api_url(book_id, format, token):
    return (
        furl(BASE_URL)
        .set(
            path=f"/api/v1/book/{book_id}",
            args={"format": format, "token": token},
        )
        .url
    )
In addition to building URLs, furl lets you modify existing URLs and parse parts of them. You can find many more examples in the API documentation.
Conclusion
These are just some examples of how to create URLs. Which one do you prefer?