I often monitor key page speed metrics by testing web pages with WebPageTest or Google Lighthouse, using their CLI or Node tools. I save the test results as JSON, which is fine for looking at individual snapshots later. But I often end up with folders full of data that can't realistically be analyzed by hand:
working_directory
└───data
    ├───export1.json
    ├───export2.json
    ├───export3.json
    ├───...
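For context, each of those exports is simply a report saved to disk. A Lighthouse CLI run along these lines writes one such file (the URL and file name are just placeholders, and note that the script below expects Lighthouse-format reports):

lighthouse https://www.example.com --output=json --output-path=./data/export1.json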
For example, how do you compare changes in those metrics over time? Or find a peak in the data?
The following handy little Python 3 script is useful for sifting through a directory full of JSON files and exporting specific values to a CSV for ad-hoc analysis. It only uses built-in Python modules. I just drop it into my working directory and run it from the command line with python3 json-to-csv-exporter.py:
#!/usr/bin/env python3
# Place this Python script in your working directory when you have JSON files in a subdirectory.
# To run the script via command line: "python3 json-to-csv-exporter.py"

import json
import glob
from datetime import datetime
import csv

# Place your JSON data in a directory named 'data/'
src = "data/"
date = datetime.now()
data = []

# Change the glob if you want to only look through files with specific names
files = glob.glob(f'{src}*')

# Loop through the files
for single_file in files:
    with open(single_file, 'r') as f:
        # Use 'try-except' to skip files that may be missing data
        try:
            json_file = json.load(f)
            data.append([
                json_file['requestedUrl'],
                json_file['fetchTime'],
                json_file['categories']['performance']['score'],
                json_file['audits']['largest-contentful-paint']['numericValue'],
                json_file['audits']['speed-index']['numericValue'],
                json_file['audits']['max-potential-fid']['numericValue'],
                json_file['audits']['cumulative-layout-shift']['numericValue'],
                json_file['audits']['first-cpu-idle']['numericValue'],
                json_file['audits']['total-byte-weight']['numericValue']
            ])
        except KeyError:
            print(f'Skipping {single_file}')

# Sort the rows (by URL, then by fetch time)
data.sort()

# Add headers
data.insert(0, ['Requested URL', 'Date', 'Performance Score', 'LCP (ms)', 'Speed Index (ms)', 'Max Potential FID (ms)', 'CLS', 'First CPU Idle (ms)', 'Total Byte Weight (bytes)'])

# Export to CSV.
# Add a timestamp to the file name to avoid overwriting it each time
# (formatted so it is a valid file name on every platform).
csv_filename = f"{date.strftime('%Y-%m-%d_%H-%M-%S')}.csv"
with open(csv_filename, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(data)

print(f'Exported {csv_filename}')
That gives you a CSV that you can use to create charts or analyze to your heart’s content.
| Requested URL | Date | Performance Score | LCP (ms) | Speed Index (ms) | Max Potential FID (ms) | CLS | First CPU Idle (ms) | Total Byte Weight (bytes) |
| ----------------------- | ------------------------ | ----------------- | ------------------ | ------------------ | ---------------------- | ------------------- | ------------------- | ------------------------- |
| https://www.example.com | 2020-08-26T11:19:42.608Z | 0.96 | 1523.257 | 1311.5760337571400 | 153 | 0.5311671549479170 | 1419.257 | 301319 |
| https://www.example.com | 2020-08-26T11:32:16.197Z | 0.99 | 1825.5990000000000 | 2656.8016986395200 | 496 | 0.06589290364583330 | 1993.5990000000000 | 301282 |
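And if you want a quick answer to questions like the ones above without opening a spreadsheet, a few more lines of built-in-module Python will do. This is a minimal sketch that assumes a CSV produced by the script above (the file name here is hypothetical):

import csv

# Hypothetical file name: use whatever the exporter script just wrote
with open('2020-08-26_11-32-16.csv', newline='') as f:
    rows = list(csv.DictReader(f))

# Which test run had the slowest Largest Contentful Paint?
slowest = max(rows, key=lambda row: float(row['LCP (ms)']))
print('Slowest LCP:', slowest['Date'], slowest['LCP (ms)'])

# Average performance score across all runs
scores = [float(row['Performance Score']) for row in rows]
print(f'Average performance score: {sum(scores) / len(scores):.2f}')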