The Software Herald
har-analyze Review: 2-Second CLI Summaries for HAR Files

by Don Emmerson
April 15, 2026
in Dev

har-analyze: a stdlib-only Python CLI for two-second HAR summaries and common performance opportunities

har-analyze is a stdlib-only Python CLI that creates a quick terminal summary of HAR files, flags issues like missing compression, and exports JSON or markdown.


A tiny CLI to get the first 30 seconds of insight from a HAR

har-analyze is a small, dependency-free Python command-line tool that turns a verbose HTTP Archive (HAR) JSON export into an immediate, actionable summary. The project is presented as roughly 700 lines of Python that use only the standard library (notably json and argparse) to produce human-friendly output, machine-readable exports, and a short list of “opportunities” such as missing compression. It’s designed for the moment a teammate drops a multi-megabyte HAR into Slack and asks someone else to triage a performance problem — running har-analyze should give you the important signals in a couple of seconds.

Why HAR files frustrate debugging workflows

HAR is the de facto interchange format for browser DevTools network exports across Chrome, Firefox, Safari and Edge: the file structure is simple JSON where log.entries[] enumerates requests and each entry contains nested request, response and timings objects. In practice, however, HARs are verbose and noisy: an export of a modest page can easily be tens of kilobytes, with fields that are present but rarely useful (request.cookies appears by default, cache blocks are often empty), redundant size fields, and timing semantics that can mislead readers (startedDateTime is not guaranteed to be monotonic because of parallel requests).

Those properties make manual inspection slow and error-prone. Existing options are useful but not always practical for the quick second-opinion case: the DevTools Network panel is per-machine and requires someone to open the original tab; WebPageTest produces excellent reports but requires running a separate test on its infrastructure; using jq to query the raw JSON works but forces you to remember field names and schema quirks. har-analyze positions itself as the missing middle ground: a tiny, portable CLI you can run locally or inside Docker to get a terse, grep-friendly summary in about two seconds.

Which HAR fields actually carry signal

A central design question for har-analyze is “which HAR fields are useful and which are noise?” The author distilled the signal down to a handful of fields that the tool actually reads and trusts:

  • request.url and request.method — always relevant.
  • response.status — used for failures and redirects.
  • response.content.size — treated as the uncompressed body byte count and preferred for totals.
  • response.content.mimeType — used for content-type bucketing.
  • response.headers — only a few headers are read (content-encoding, cache-control, expires).
  • time — the total request time in milliseconds, taken from the HAR (browser-reported).
  • _transferSize (Chrome-only) — used when present as the bytes actually transferred, otherwise the tool falls back to bodySize + headersSize.
  • startedDateTime — present but not trusted for ordering; parallelism can make it non-monotonic.
  • _initiator.type — used best-effort (parser vs script) to support a render-blocking heuristic.

The parser intentionally skips or downranks fields that are almost always empty or unhelpful in exported HARs (cache blocks, request.cookies/response.cookies, and pageTimings are noted as inconsistent or routinely empty). The tool focuses on a compact set of inputs that reliably produce useful output for the common cases the author sees.
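The _transferSize fallback described above can be sketched in a few lines of stdlib Python. This is a hypothetical illustration of the approach, not the project's actual code; the field names follow the HAR 1.2 format, everything else is the author of this sketch's guess:

```python
def transfer_bytes(entry: dict) -> int:
    """Prefer Chrome's _transferSize; fall back to bodySize + headersSize."""
    resp = entry.get("response", {})
    transfer = resp.get("_transferSize")
    if isinstance(transfer, (int, float)) and transfer >= 0:
        return int(transfer)
    # HAR uses -1 to mean "unknown", so clamp each component before summing.
    body = max(resp.get("bodySize", 0), 0)
    headers = max(resp.get("headersSize", 0), 0)
    return body + headers

entry = {"response": {"bodySize": 1200, "headersSize": 300}}
print(transfer_bytes(entry))  # 1500 when _transferSize is absent
```

The clamp matters even here: a cached response with bodySize -1 would otherwise subtract from the total instead of contributing zero.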

How entries are normalized for simple rules

Parsing a HAR into a usable form is the first substantive step. har-analyze converts each nested HAR entry into an immutable Entry dataclass with a small set of fields: url, method, status, mime_type, content_type (bucketed), domain, size (uncompressed body size), transfer_size, time_ms, response_headers, request_headers and initiator_type.

Two small normalization practices make the rest of the code straightforward:

  • Lowercase header names once at parse time. HAR exports sometimes vary header capitalisation (Content-Encoding, content-encoding, etc.). By normalizing header names to lowercase at the parser boundary, downstream rules can use simple lookups without defensive casing logic.
  • Bucket mime types on parse rather than in each rule. Full MIME strings often contain parameters (for example, text/html; charset=utf-8). The parser maps common MIME prefixes or values into seven high-level buckets — html, css, js, img, font, xhr, other — so each rule can reason about content types without repeating string-matching logic.

The parser also clamps negative size values to zero. HAR sometimes uses -1 to indicate “unknown” (for example, for cached responses); without that clamp, summing sizes can produce negative totals and misleading summaries.
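The three normalization practices can be sketched like this. The seven bucket names match the list above, but the matching logic and helper names are illustrative guesses, not the project's implementation:

```python
def bucket_mime(mime: str) -> str:
    # Drop parameters like "; charset=utf-8" before matching.
    mime = mime.split(";")[0].strip().lower()
    if "html" in mime:
        return "html"
    if "css" in mime:
        return "css"
    if "javascript" in mime or mime.endswith("/ecmascript"):
        return "js"
    if mime.startswith("image/"):
        return "img"
    if mime.startswith("font/") or "font" in mime:
        return "font"
    if "json" in mime or "xml" in mime:
        return "xhr"
    return "other"

def lower_headers(headers: list) -> dict:
    # HAR headers are a list of {"name": ..., "value": ...} pairs;
    # normalize names to lowercase once, at the parser boundary.
    return {h["name"].lower(): h["value"] for h in headers}

def clamp_size(size) -> int:
    # HAR uses -1 for "unknown"; clamp so totals never go negative.
    return max(int(size or 0), 0)
```

With these in place, downstream rules can write `headers.get("content-encoding")` and `bucket == "js"` without any defensive casing or string parsing.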

The missing-compression opportunity: a focused, conservative rule

One of har-analyze’s most practical checks is the “missing-compression” opportunity. The rule targets compressible buckets (html, css, js, xhr) and flags responses that are larger than a noise threshold and served without a content-encoding header that indicates compression (gzip, br, deflate, zstd). The implementation uses five guard clauses designed to avoid false positives:

  • Only consider compressible content buckets (skip images, fonts and other already-compressed or non-text content).
  • Ignore tiny payloads (< 1,024 bytes) because compression headers and overhead make savings negligible and would produce noisy results like flagging favicons.
  • Respect existing compression: check content-encoding via a substring match, so values such as "gzip" or "gzip, identity" are recognized as compressed.
  • Skip server error and other non-success responses (status 0 or >= 400) because error bodies may not pass through normal middleware.
  • Rule severity and messaging are conservative and actionable (for example, suggesting gzip or brotli at the edge).

The overall approach is intentionally conservative: the author notes that a linter that cries wolf will be ignored, so har-analyze aims to avoid noisy recommendations.
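A minimal sketch of the five guard clauses, assuming the lowercased-header dict produced at parse time. This is illustrative, not the project's source; the names and the exact threshold constant are assumptions:

```python
COMPRESSIBLE = {"html", "css", "js", "xhr"}
ENCODINGS = ("gzip", "br", "deflate", "zstd")
MIN_BYTES = 1024  # below this, compression savings are negligible

def missing_compression(bucket: str, size: int, status: int, headers: dict) -> bool:
    if bucket not in COMPRESSIBLE:              # 1. skip images, fonts, etc.
        return False
    if size < MIN_BYTES:                        # 2. ignore tiny payloads (favicons)
        return False
    encoding = headers.get("content-encoding", "")
    if any(e in encoding for e in ENCODINGS):   # 3. substring match: "gzip, identity" passes
        return False
    if status == 0 or status >= 400:            # 4. skip errors and aborted requests
        return False
    return True                                 # 5. flag: suggest gzip/brotli at the edge
```

Each early return removes a class of false positives, which is what keeps the rule quiet enough to be trusted.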

Human-friendly terminal output and developer-friendly exports

har-analyze’s default renderer is a tree-style, terminal-oriented formatter that aims to feel familiar to CLI users. The human output uses box-drawing characters (├──, │, └──) similar to tree(1) and git’s graph output to represent hierarchy, and ANSI color is optional and controlled by a boolean flag to avoid pulling in dependencies. The formatter includes readable unit helpers for bytes and milliseconds, and the tool also supports machine-friendly exports: JSON for scripting, markdown for PR comments, and CSV for other workflows.

That flexibility lets har-analyze serve both interactive triage and automation use cases: you can pipe JSON into jq for scripted checks or generate markdown to paste into an issue or pull request comment.
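Unit helpers of the kind the human formatter needs might look like this. A hedged sketch: the project's actual thresholds and formatting choices may differ:

```python
def fmt_bytes(n: float) -> str:
    # Human-readable byte counts: 512 -> "512 B", 1536 -> "1.5 KB".
    for unit in ("B", "KB", "MB", "GB"):
        if abs(n) < 1024 or unit == "GB":
            return f"{int(n)} B" if unit == "B" else f"{n:.1f} {unit}"
        n /= 1024

def fmt_ms(ms: float) -> str:
    # Sub-second times stay in ms; longer times switch to seconds.
    return f"{ms:.0f} ms" if ms < 1000 else f"{ms / 1000:.2f} s"
```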

Implementation structure and test discipline

The project is broken into small, pure modules with a thin CLI wrapper:

  • parser.py — converts HAR JSON into flat Entry records.
  • analyzer.py — computes counts, top-N lists, failures and duplicate detection from entries.
  • opportunities.py — runs rule checks and produces Opportunity records.
  • formatters.py — renders Analysis objects into the different output formats (human, json, markdown, csv).
  • cli.py — exposes argparse-based CLI entry points and manages exit codes.

The codebase favors pure functions and isolates side effects to the HAR file read and the final print in cli.main. Tests are designed to be fast and self-contained: the suite constructs a synthetic HAR in conftest.py so there’s no on-disk fixture and the test run is extremely fast (the author reports the test harness runs in roughly 60ms and provides an example pytest run showing 49 tests passing in 0.05s). That same focus on minimal runtime and no external dependencies extends to packaging: a Docker image built around Python 3.12 alpine plus the project source is about 60 MB in size.
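The synthetic-fixture idea (building a HAR dict in memory instead of shipping a .har file on disk) can be sketched as follows; the helper names here are hypothetical, not the project's actual conftest.py:

```python
import json

def make_entry(url, status=200, mime="text/html", size=2048):
    # One minimal HAR 1.2 entry with just the fields a parser would read.
    return {
        "request": {"url": url, "method": "GET", "headers": []},
        "response": {
            "status": status,
            "content": {"size": size, "mimeType": mime},
            "headers": [],
            "bodySize": size,
            "headersSize": 100,
        },
        "time": 120.0,
        "startedDateTime": "2026-04-15T00:00:00.000Z",
    }

def make_har(entries):
    # Minimal HAR skeleton: enough structure to exercise the parser.
    return {"log": {"version": "1.2", "entries": entries}}

har = make_har([make_entry("https://example.com/")])
assert json.loads(json.dumps(har)) == har  # round-trips like a real export
```

Because the fixture is pure Python, each test can tweak exactly the fields it cares about, which keeps assertions readable and the suite fast.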

Tradeoffs and deliberate omissions

har-analyze intentionally avoids trying to be a full synthetic lab or audit tool. Notable tradeoffs called out by the author:

  • Wall-clock timing is not reconstructed. The HAR time field is browser-reported and may reflect throttling or capture settings; har-analyze reports what’s in the HAR rather than attempting to translate it into an external wall-clock reality.
  • HTTP/2 server-push detection is skipped. HAR doesn’t explicitly identify pushed responses reliably across browsers, and heuristics have a poor false-positive rate; Chrome itself moved away from server push.
  • Connection reuse and related connection-level analysis aren’t a focus. HAR has serverIPAddress and connection fields, but they’re inconsistent across browsers and the typical symptoms of bad connection reuse tend to surface elsewhere in the summary.
  • Cache-aware reconstruction is out of scope. HARs captured under different reload modes (hard vs soft reload) can look substantially different; har-analyze reports cached and uncached entries honestly but does not attempt to simulate cold or warm loads (that would be closer to what Lighthouse or WebPageTest do).

The guiding principle is to be the “first 30 seconds of looking at a HAR”: a quick filter that tells engineers whether they need to run a deeper test or pull up a fuller tool.

How to try it quickly

The project is maintained in a public GitHub repository under the sen-ltd organization (repo name: sen-ltd/har-analyze). The author provides quick-start instructions in the repository: clone the repo, build the Docker image, capture a HAR from the browser DevTools, and run the har-analyze image against the capture file. Example usages documented by the author include running with --opportunities to show rule findings, --format json to use the output in scripts, and --format markdown to produce PR-friendly summaries. The Docker image is intentionally small: the stated image contents are Python 3.12 on alpine plus exactly the project source, and the zero pip-dependency policy means there's nothing to install beyond Python itself.

Who benefits and where it fits in a workflow

har-analyze is aimed at developers, SREs and performance engineers who need a fast, local way to triage HARs that colleagues paste into chat or attach to bug reports. It’s explicitly not a replacement for WebPageTest, Lighthouse or a full synthetic test suite; instead it’s complementary: use har-analyze first to surface obvious problems (large hero images, missing gzip/brotli, high-latency requests, status failures), then decide whether to run deeper lab tests. Because it produces both human and machine-readable outputs, it can live in ad-hoc local use and be integrated into lightweight automation or PR workflows where a quick markdown summary is useful.

Broader implications for developer tooling and performance triage

har-analyze exemplifies a minimalist design philosophy that has broader relevance for developer tooling. The project makes a case for tools that are small, dependency-free, and focused on the most common, high-value signals rather than attempting to cover every corner case. For teams that frequently trade performance anecdotes and HAR files over chat, reducing the cognitive overhead of inspection can shorten feedback loops and reduce context switching: a succinct summary that surfaces the outstanding issues lets an engineer decide quickly whether to escalate, run a lab test, or file a bug.

From a tooling ecosystem perspective, har-analyze occupies a complementary niche alongside browser DevTools, WebPageTest and Lighthouse. Those larger tools are indispensable for deep analysis and synthetic lab metrics; a fast, portable CLI reduces the friction of routine triage and can feed into larger workflows (for example, by creating a PR comment-ready markdown section or producing JSON for automated checks).

Project context and openness to extension

The tool is presented as an entry in a broader portfolio by SEN LLC and the author positions it as one small, pragmatic utility among other stdlib-first projects (examples named in the project portfolio include csvdiff and robots-lint). The author explicitly invites feedback about additional opportunity rules; the project currently implements four opportunity checks that cover frequent cases the author encounters, and the plan is to consider adding more rules only when they address recurring, real-world pain points rather than theoretical edge cases.

The repository’s tests, small codebase and Docker-first distribution model all support rapid iteration and contribution: the codebase is modular (parser, analyzer, opportunities, formatters, cli), which makes adding rules or formatters straightforward within the stated architecture.

Looking ahead, har-analyze’s focused approach suggests an efficient route for teams that want to automate the first-pass triage of HARs without committing to heavyweight dependencies: small, auditable code with deliberate guardrails and a conservative rule set can raise signal without drowning developers in noise, and the project’s author is actively receptive to expanding the checks when real recurring frustrations are identified.

Tags: 2-Second, CLI, Files, HAR, har-analyze, Review, Summaries
The Software Herald © 2026 All rights reserved.

No Result
View All Result
  • AI
  • CRM
  • Marketing
  • Security
  • Tutorials
  • Productivity
    • Accounting
    • Automation
    • Communication
  • Web
    • Design
    • Web Hosting
    • WordPress
  • Dev

The Software Herald © 2026 All rights reserved.