I often monitor key page speed metrics by testing web pages with WebPageTest or Google Lighthouse, using their CLI or Node tools. I save the test results as JSON, which is fine for looking at individual snapshots later. But I often end up with folders full of data that can't realistically be analyzed by hand:
working_directory
└───data
    ├───export1.json
    ├───export2.json
    ├───export3.json
    ├───...
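For context, each of those exports is simply a report saved to disk. A Lighthouse CLI run along these lines writes one such file (the URL and file name are just placeholders, and note that the script below expects Lighthouse-format reports):

lighthouse https://www.example.com --output=json --output-path=./data/export1.json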
For example, how do you compare changes in those metrics over time? Or find a peak in the data?
The following handy little Python 3 script is useful for sifting through a directory full of JSON files and exporting specific values to a CSV for ad-hoc analysis. It only uses built-in Python modules. I just drop it into my working directory and run it from the command line with python3 json-to-csv-exporter.py:
#!/usr/bin/env python3
# Place this Python script in your working directory when you have JSON files in a subdirectory.
# To run the script via command line: "python3 json-to-csv-exporter.py"

import json
import glob
from datetime import datetime
import csv

# Place your JSON data in a directory named 'data/'
src = "data/"
date = datetime.now()
data = []

# Change the glob if you want to only look through files with specific names
files = glob.glob(f'{src}*')

# Loop through the files
for single_file in files:
    with open(single_file, 'r') as f:
        # Use 'try-except' to skip files that may be missing data
        try:
            json_file = json.load(f)
            data.append([
                json_file['requestedUrl'],
                json_file['fetchTime'],
                json_file['categories']['performance']['score'],
                json_file['audits']['largest-contentful-paint']['numericValue'],
                json_file['audits']['speed-index']['numericValue'],
                json_file['audits']['max-potential-fid']['numericValue'],
                json_file['audits']['cumulative-layout-shift']['numericValue'],
                json_file['audits']['first-cpu-idle']['numericValue'],
                json_file['audits']['total-byte-weight']['numericValue']
            ])
        except KeyError:
            print(f'Skipping {single_file}')

# Sort the rows (by URL, then by fetch time)
data.sort()

# Add headers
data.insert(0, ['Requested URL', 'Date', 'Performance Score', 'LCP (ms)', 'Speed Index (ms)', 'Max Potential FID (ms)', 'CLS', 'First CPU Idle (ms)', 'Total Byte Weight (bytes)'])

# Export to CSV.
# Add a timestamp to the file name to avoid overwriting it each time
# (formatted so it is a valid file name on every platform).
csv_filename = f"{date.strftime('%Y-%m-%d_%H-%M-%S')}.csv"
with open(csv_filename, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(data)

print(f'Exported {csv_filename}')
That gives you a CSV that you can use to create charts or analyze to your heart’s content.
| Requested URL | Date | Performance Score | LCP (ms) | Speed Index (ms) | Max Potential FID (ms) | CLS | First CPU Idle (ms) | Total Byte Weight (bytes) |
| ----------------------- | ------------------------ | ----------------- | ------------------ | ------------------ | ---------------------- | ------------------- | ------------------- | ------------------------- |
| https://www.example.com | 2020-08-26T11:19:42.608Z | 0.96 | 1523.257 | 1311.5760337571400 | 153 | 0.5311671549479170 | 1419.257 | 301319 |
| https://www.example.com | 2020-08-26T11:32:16.197Z | 0.99 | 1825.5990000000000 | 2656.8016986395200 | 496 | 0.06589290364583330 | 1993.5990000000000 | 301282 |
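And if you want a quick answer to questions like the ones above without opening a spreadsheet, a few more lines of built-in-module Python will do. This is a minimal sketch that assumes a CSV produced by the script above (the file name here is hypothetical):

import csv

# Hypothetical file name: use whatever the exporter script just wrote
with open('2020-08-26_11-32-16.csv', newline='') as f:
    rows = list(csv.DictReader(f))

# Which test run had the slowest Largest Contentful Paint?
slowest = max(rows, key=lambda row: float(row['LCP (ms)']))
print('Slowest LCP:', slowest['Date'], slowest['LCP (ms)'])

# Average performance score across all runs
scores = [float(row['Performance Score']) for row in rows]
print(f'Average performance score: {sum(scores) / len(scores):.2f}')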