JSON

Author

Clayton Cafiero

Published

2025-06-23

Structured data with JSON

JSON (JavaScript Object Notation) is a widely used format for representing structured data. Though it was originally developed for use with JavaScript, no JavaScript is needed to use this format. Python has included support for JSON in its standard library (the json module) since version 2.6 (2008). JSON has become popular because it can be read easily by humans and computers alike.

JSON can include key / value pairs, like Python dictionaries. It can also include sequences, like lists, as well as simple values such as strings, numbers, true / false, and null.

So what is JSON exactly? It’s a way of representing objects—dictionaries, lists, values—as strings.

For example, we can take a Python dictionary containing key / value pairs, with values that might be lists, tuples, integers, strings, whatever, and turn that into a JSON string. We call this encoding. Then we can save that string to a file, or send it over the internet.

In the other direction, we can read strings from a file and convert them back into Python dictionaries containing key / value pairs, with values that might be lists, integers, strings, whatever. We call this decoding.

Let’s take a look at some small examples. Here’s a little Python dictionary representing Spotify streams of top Noah Kahan songs (as of June 2025).

d = {'Stick Season': {'total': 1527026407, 
                      'daily': 903012},
     'Northern Attitude': {'total': 474141182,
                           'daily': 529964},
     'Dial Drunk': {'total': 465714487, 
                    'daily': 355062},
     'Hurt Somebody': {'total': 407002777, 
                       'daily': 137980}}

Now let’s convert this to a JSON string using Python’s json module. We convert to a string using the json.dumps() function (short for “dump string”; encoding to JSON is called “dumping,” and decoding is called “loading”).

import json

# given d (above)
s = json.dumps(d)
print(s)

This encodes the dictionary as a JSON string. When we print it, here’s what we see (wrapped across lines here for readability; the actual output is a single line):

{"Stick Season": {"total": 1527026407, "daily": 903012}, 
"Northern Attitude": {"total": 474141182, "daily": 529964}, 
"Dial Drunk": {"total": 465714487, "daily": 355062}, 
"Hurt Somebody": {"total": 407002777, "daily": 137980}}

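Look at that. Surprise! The JSON-encoded string looks almost exactly like a dictionary! (The one visible difference is that JSON always puts strings in double quotes, while Python prints them in single quotes.)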
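Looks can deceive, though: s is a str, not a dict. A minimal side-by-side check makes the difference visible. This sketch uses a one-song excerpt of d for brevity.

```python
import json

# A one-entry excerpt of the dictionary d from above
d = {'Stick Season': {'total': 1527026407, 'daily': 903012}}
s = json.dumps(d)    # encode the dictionary as a JSON string

print(type(d).__name__)   # dict
print(type(s).__name__)   # str
print(d)   # {'Stick Season': {'total': 1527026407, 'daily': 903012}}
print(s)   # {"Stick Season": {"total": 1527026407, "daily": 903012}}
```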

What happens if we JSON encode a list?

s = json.dumps(["red", "green", "blue"])
print(s)

This prints:

["red", "green", "blue"]

Amazing! This string looks just like a list!

So you see, there’s not a lot of mystery about JSON encoding. That’s part of its beauty.

Let’s go in the other direction and decode JSON strings.

import json

s = """{"Stick Season": {"total": 1527026407, "daily": 903012},
"Northern Attitude": {"total": 474141182, "daily": 529964},
"Dial Drunk": {"total": 465714487, "daily": 355062},
"Hurt Somebody": {"total": 407002777, "daily": 137980}}"""
# Notice s is a string, NOT a dictionary (using triple-quotes
# for multi-line string), and let's say we read this from
# a file or got it from some web API. Now we can turn it into
# a Python dict.

d = json.loads(s)
# Now we have the decoded dictionary. Let's confirm:

print(d['Hurt Somebody'])
# Prints: {'total': 407002777, 'daily': 137980}
print(f"{d['Stick Season']['total']:,d}")
# Prints: 1,527,026,407

Here we used json.loads() to decode a string and return a Python dictionary.
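json.loads() isn’t limited to dictionaries: it returns whichever Python object corresponds to the top-level JSON value in the string. Here’s a quick sketch with a few sample strings (the samples are our own, chosen for illustration):

```python
import json

samples = ['{"a": 1}', '["red", "green", "blue"]', '17', '3.5', '"foo"']

for text in samples:
    value = json.loads(text)   # decode one JSON string
    # Show the decoded value and the name of its Python type
    print(repr(value), type(value).__name__)
```

This prints:

{'a': 1} dict
['red', 'green', 'blue'] list
17 int
3.5 float
'foo' str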

Let’s construct a complete example:

  1. Encode a Python dictionary as a JSON string.
  2. Write the string to a file.
  3. Check the contents of the file.
  4. Read the contents of the file and decode to convert back to a Python dictionary.

import json

data = {'Egbert': {'born': 1960,
                   'favorite foods': ['grubs', 'pumpkins']},
        'Edwina': {'born': 1968,
                   'favorite foods': ['pickles', 'kale']}}

with open("test.json", 'w') as fh:
    s = json.dumps(data)
    fh.write(s)   # write json to file
    
s = None      # destroy s so we can test file contents
data = None   # destroy data so we can test json.loads()

with open("test.json", 'r') as fh:
    s = fh.read()
    print(s)                             # 1

data = json.loads(s)
print(data['Egbert']['favorite foods'])  # 2
print(data['Edwina']['born'])            # 3

In the above example:

  • #1 prints a valid JSON string of the encoded data.
  • #2 prints ['grubs', 'pumpkins']
  • #3 prints 1968

By making this round trip, we confirm everything works as expected. Notice that what we encoded was a dictionary, and what we got back was a dictionary. The original dictionary had string keys, and what we got back were string keys. The values of the original dictionary were themselves dictionaries, and that’s exactly what we got back. The values of the nested dictionaries were integers and lists of strings, and that’s exactly what we got back. So, at least for data like this (dictionaries, lists, strings, and numbers), encoding an object to JSON and then decoding it preserves the structure of the original object! (We’ll see some exceptions below.)
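A compact way to check a round trip like this is to compare the decoded object with the original using ==, which compares dictionaries (and the lists inside them) by content. A minimal sketch:

```python
import json

data = {'Egbert': {'born': 1960,
                   'favorite foods': ['grubs', 'pumpkins']},
        'Edwina': {'born': 1968,
                   'favorite foods': ['pickles', 'kale']}}

restored = json.loads(json.dumps(data))   # encode, then decode

print(restored == data)   # True: same keys, same nested values
print(restored is data)   # False: decoding builds a brand-new object
```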

We can also write JSON-encoded data directly to a file and read it directly from a file, skipping the intermediate steps of encoding to a string or of reading file contents into a string. To do this we use json.dump() instead of json.dumps(), and json.load() instead of json.loads(). json.dump() takes two arguments: the first is the data we wish to encode, and the second is an open file handle. json.load() takes an open file handle as its argument and returns the decoded object. Here’s the revised code:

import json

data = {'Egbert': {'born': 1960,
                   'favorite foods': ['grubs', 'pumpkins']},
        'Edwina': {'born': 1968,
                   'favorite foods': ['pickles', 'kale']}}

with open("test.json", 'w') as fh:
    json.dump(data, fh)

data = None   # destroy data so we can test json.load()

with open("test.json", 'r') as fh:
    data = json.load(fh)

print(data['Egbert']['favorite foods'])  # 1
print(data['Edwina']['born'])            # 2

  • #1 prints ['grubs', 'pumpkins']
  • #2 prints 1968

This is the preferred way of writing and reading JSON data with files.
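The two approaches produce the same file contents. Here’s a sketch that checks this, assuming we’re happy to write to a fresh temporary directory (created with the standard tempfile module) rather than the current directory:

```python
import json
import os
import tempfile

data = {'Egbert': {'born': 1960,
                   'favorite foods': ['grubs', 'pumpkins']},
        'Edwina': {'born': 1968,
                   'favorite foods': ['pickles', 'kale']}}

# Build a path to test.json inside a new temporary directory
path = os.path.join(tempfile.mkdtemp(), "test.json")

with open(path, 'w') as fh:
    json.dump(data, fh)      # encode and write in one step

with open(path, 'r') as fh:
    contents = fh.read()     # read the raw text back as a string

print(contents == json.dumps(data))   # True: same text as dumps()
```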

What happens if we try to load something invalid?

json.loads() and json.load() are used to load JSON-encoded data and return the appropriate objects. It’s reasonable to ask what happens if the strings we try to load aren’t valid JSON. In that case we get a JSONDecodeError, a new kind of exception (it’s a subclass of ValueError).

For example, while Python isn’t very fussy about whether we use single- or double-quotes to quote a string, JSON has some strong opinions about this.

>>> json.loads('"foo"')
'foo'
>>> json.loads("'foo'")
Traceback (most recent call last):
  File "<python-input-21>", line 1, in <module>
    json.loads("'foo'")
    ~~~~~~~~~~^^^^^^^^^
  ...
json.decoder.JSONDecodeError: 
     Expecting value: line 1 column 1 (char 0)

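Here the first call succeeds because the JSON text inside the Python string, "foo", is wrapped in double quotes, which is what JSON requires for strings. The second call fails because the JSON text 'foo' uses single quotes, which JSON does not allow. The outer quotes just delimit the Python string literal; json.loads() never sees them, so it doesn’t care which kind we use for those: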

>>> json.loads('17')
17
>>> json.loads("17")
17

What else might constitute an invalid string? Here are some examples, each of which would result in a JSONDecodeError if passed to json.loads() as a string (e.g., json.loads('[=]')):

  • '[=]'
  • '(4, 5, 6)'
  • '__name__'

Try these and others at the shell.
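Because JSONDecodeError is a subclass of ValueError, we can catch it with try / except like any other exception, and the exception object records where parsing failed (its msg, lineno, and colno attributes). In this sketch, try_loads() is a hypothetical helper of our own, not part of the json module:

```python
import json

def try_loads(text):
    """Decode text as JSON, or report the problem and return None."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        # e.msg describes the problem; lineno / colno locate it
        print(f"{text!r} is not valid JSON: {e.msg} "
              f"(line {e.lineno}, column {e.colno})")
        return None

for text in ['"foo"', "'foo'", '[=]', '(4, 5, 6)', '__name__']:
    print(try_loads(text))
```

Returning None on failure keeps the sketch short, but since the valid JSON text null also decodes to None, real code may prefer to let the exception propagate.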

Pretty-printing JSON

Pretty-printing JSON is … pretty easy. All you need to do is pass an indentation (e.g., indent=4) to json.dumps(). It then adds newlines and the specified indentation to the JSON-encoded string, so when you print it, it’s nicely formatted.

>>> import json
>>> data = {'Egbert': {'born': 1960,
...                    'favorite foods': ['grubs', 'pumpkins']},
...         'Edwina': {'born': 1968,
...                    'favorite foods': ['pickles', 'kale']}}
>>> s = json.dumps(data, indent=4)   # indent keyword argument
>>> print(s)
{
    "Egbert": {
        "born": 1960,
        "favorite foods": [
            "grubs",
            "pumpkins"
        ]
    },
    "Edwina": {
        "born": 1968,
        "favorite foods": [
            "pickles",
            "kale"
        ]
    }
}
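json.dumps() also accepts a sort_keys argument, and both indent and sort_keys work the same way with json.dump() when writing to a file. The extra whitespace doesn’t change the data: decoding the pretty-printed string gives back an equal dictionary. A short sketch:

```python
import json

data = {'Egbert': {'born': 1960,
                   'favorite foods': ['grubs', 'pumpkins']},
        'Edwina': {'born': 1968,
                   'favorite foods': ['pickles', 'kale']}}

# Two-space indentation, with dictionary keys sorted alphabetically
s = json.dumps(data, indent=2, sort_keys=True)
print(s)

# Whitespace between JSON tokens is ignored when decoding
print(json.loads(s) == data)   # True
```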

Why use JSON?

If we want to save or send data, we must somehow convert it to a string (or some binary format, which we won’t get into here). We use JSON when we want to preserve structure when saving data to disk or sending it across the internet. For example, JSON gives us something that plain CSV cannot. CSV is fine for flat tables of rows and columns, but it cannot represent any other structure effectively: we can’t use CSV to store dictionaries, nested lists, or anything other than a tabular structure.

Limitations of JSON

JSON is not without its limitations, though. Some things cannot be encoded as JSON strings at all, and for others we cannot preserve the original structure. (The json module allows for user customization of encoding and decoding, but that’s beyond the scope of what we’ll cover here.)

Common, simple objects such as int, float, bool, str, list, tuple, dict, and NoneType are all fair game for encoding.

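None is encoded as the bare JSON literal null (no quotes), and decoding null yields None. This is different from the Python string 'null', which is encoded with double quotes as "null" and decodes back to the string 'null'. So be careful with JSON written by hand or by other programs: an unquoted null will be decoded as None, not as a string.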

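The Boolean literals True and False are encoded as the bare JSON literals true and false, which decode back to True and False. The strings 'true' and 'false', on the other hand, are encoded in double quotes and decode back to strings; only an unquoted true or false in JSON text becomes a Boolean.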
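Here’s a quick sketch showing exactly what each of these values becomes when encoded, and what comes back when decoded:

```python
import json

values = [None, 'null', True, 'true', False, 'false']
s = json.dumps(values)
print(s)   # [null, "null", true, "true", false, "false"]

restored = json.loads(s)
print(restored)            # [None, 'null', True, 'true', False, 'false']
print(restored == values)  # True
```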

Sets, like {1, 2, 3}, cannot be encoded.
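Trying to encode a set raises a TypeError. One common workaround, sketched here as our own choice rather than anything the json module does for us, is to convert the set to a sorted list before encoding and back to a set after decoding:

```python
import json

colors = {'red', 'green', 'blue'}

try:
    json.dumps(colors)
except TypeError as e:
    print(e)   # Object of type set is not JSON serializable

# Workaround: encode a sorted list instead (sets have no fixed order)
s = json.dumps(sorted(colors))
print(s)   # ["blue", "green", "red"]

restored = set(json.loads(s))   # rebuild the set after decoding
print(restored == colors)       # True
```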

Tuples, like (1, 2, 3), are encoded as a list would be:

>>> import json
>>> json.dumps((1, 2, 3))
'[1, 2, 3]'

So if you JSON encode a tuple, and then decode the resulting string, you’ll get back a list, not a tuple.

>>> import json
>>> json.loads(json.dumps((1, 2, 3)))
[1, 2, 3]
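If we need tuples back, we have to convert them ourselves after decoding. A minimal sketch, assuming we know which values were tuples (here, every value in a small hypothetical dictionary):

```python
import json

data = {'origin': (0, 0), 'size': (640, 480)}   # hypothetical example data

decoded = json.loads(json.dumps(data))
print(decoded)           # {'origin': [0, 0], 'size': [640, 480]}
print(decoded == data)   # False: lists are not equal to tuples

# Convert each list back into a tuple
restored = {key: tuple(value) for key, value in decoded.items()}
print(restored == data)  # True
```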

JSON requires that all strings, including those used as keys, be double-quoted (even though strings can be single- or double-quoted in Python). For example, here the first call succeeds, but the second fails:

>>> import json
>>> json.loads('{"a": "b"}')
{'a': 'b'}
>>> json.loads("{'a': 'b'}")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  ...
json.decoder.JSONDecodeError: 
     Expecting property name enclosed in double quotes: 
     line 1 column 2 (char 1)
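This matters in practice because str() (and print()) on a dictionary produces Python syntax with single quotes, which is not valid JSON. To produce JSON, use json.dumps(). A short sketch:

```python
import json

d = {'a': 'b'}
print(str(d))         # {'a': 'b'}   Python syntax, not JSON
print(json.dumps(d))  # {"a": "b"}   valid JSON

try:
    json.loads(str(d))           # fails: single-quoted key
except json.JSONDecodeError as e:
    print("Not JSON:", e.msg)

print(json.loads(json.dumps(d)) == d)   # True
```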

We cannot encode things like range or enumerate objects or functions. If we try, we get a TypeError.

>>> import json
>>> json.dumps(range(10))
Traceback (most recent call last):
  File "<python-input-4>", line 1, in <module>
    json.dumps(range(10))
    ~~~~~~~~~~^^^^^^^^^^^
  ...
TypeError: Object of type range is not JSON serializable
>>> json.dumps(len)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  ...
TypeError: Object of type builtin_function_or_method 
           is not JSON serializable
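If what we really want is the values a range or enumerate object produces, we can convert it to a list first, and the resulting list encodes just fine. A short sketch (note that enumerate() yields tuples, so each pair is encoded as a list):

```python
import json

print(json.dumps(list(range(5))))   # [0, 1, 2, 3, 4]

# enumerate() yields (index, value) tuples, which encode as lists
pairs = list(enumerate(['red', 'green']))
print(json.dumps(pairs))            # [[0, "red"], [1, "green"]]
```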

What about named tuples? What do you think would happen when we try to JSON encode one of these?

It works, but perhaps not in the way you’d expect, and we don’t get a complete round trip.

>>> from collections import namedtuple
>>> import json
>>> Xyz = namedtuple('Xyz', ['x', 'y', 'z'])
>>> xyz = Xyz(1, 2, 3)
>>> json.dumps(xyz)
'[1, 2, 3]'
>>> result = json.loads(json.dumps(xyz))
>>> result
[1, 2, 3]

The field values of the named tuple are encoded and decoded as a list; the field names and the named tuple type are lost. Does this mean we can’t use named tuples with JSON? Of course not, but it does mean that if we want to recover a named tuple from a JSON-encoded string, we need to take care of that ourselves.

>>> Xyz = namedtuple('Xyz', ['x', 'y', 'z'])
>>> xyz = Xyz(1, 2, 3)
>>> result = json.loads(json.dumps(xyz))
>>> Xyz(*result)
Xyz(x=1, y=2, z=3)

You may ask: What’s up with the * in *result? That’s called a splat, and this operator unpacks the sequence into separate positional arguments, so Xyz(*result) is equivalent to Xyz(1, 2, 3).

>>> lst = [1, 2, 3]
>>> print(*lst)
1 2 3

There’s another alternative that makes use of the standard features of named tuples. Named tuples can be converted to dictionaries using the ._asdict() method (“as dictionary”).

>>> Xyz = namedtuple('Xyz', ['x', 'y', 'z'])
>>> xyz = Xyz(1, 2, 3)
>>> xyz._asdict()
{'x': 1, 'y': 2, 'z': 3}
>>> result = json.loads(json.dumps(xyz._asdict()))
>>> Xyz(**result)
Xyz(x=1, y=2, z=3)

This can be more robust than the earlier approach because we preserve the field names rather than depending on their order. Again, you may ask: What’s up with the ** in **result? First we had one star, now we have two? That’s called a kwarg splat (kwarg is short for keyword argument), and it unpacks a dictionary into keyword arguments. So if result is the dictionary {'x': 1, 'y': 2, 'z': 3}, then Xyz(**result) is equivalent to Xyz(x=1, y=2, z=3).

>>> Xyz(x=1, y=2, z=3)
Xyz(x=1, y=2, z=3)
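The same idea scales to a whole list of named tuples: convert each one with ._asdict() before encoding, and rebuild each one with ** after decoding. A minimal sketch, using two made-up points:

```python
import json
from collections import namedtuple

Xyz = namedtuple('Xyz', ['x', 'y', 'z'])
points = [Xyz(1, 2, 3), Xyz(4, 5, 6)]   # made-up example data

# Encode each named tuple as a dictionary of field names to values
s = json.dumps([p._asdict() for p in points])
print(s)   # [{"x": 1, "y": 2, "z": 3}, {"x": 4, "y": 5, "z": 6}]

# Rebuild each named tuple from its decoded dictionary
restored = [Xyz(**fields) for fields in json.loads(s)]
print(restored)            # [Xyz(x=1, y=2, z=3), Xyz(x=4, y=5, z=6)]
print(restored == points)  # True
```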


Copyright © 2023–2025 Clayton Cafiero

No generative AI was used in producing this material. This was written the old-fashioned way.