  • Florian 
  • 7 min read

Compare Single-Line HTML Files: A Step-by-Step Guide

This hint is based on a support email we received a while ago:

Doing a diff of 2 files with no new line characters.  There are 3 changes in the line – I used to be able to click the next change and go to the next difference.  Now the app is just showing a single line (wrapped) and says 1 change which is the whole line.

How do I get back to doing character diffs instead of whole line diffs?

In the spirit of a quick tip, we first offer the final solution, before explaining how to arrive at it.

The Solution

Let’s assume we have page1.html and page2.html, both containing only one line of text, despite describing a complex HTML page. Use this command to show a human-readable diff in Kaleidoscope:

ksdiff <(tidy -qi page1.html 2>/dev/null) <(tidy -qi page2.html 2>/dev/null)

Let’s step back and explain what this is all about. While this hint shows a specific solution for HTML files, its strategy can be applied to many other formats, with some modifications.
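As one illustration of how the same idea might carry over to another format, here is a minimal sketch for single-line (minified) JSON. It assumes python3 is installed and uses its built-in json.tool module as the formatter. The function name pretty_json and the demo files are made up for this example, and plain diff stands in for ksdiff so the sketch runs without Kaleidoscope.

```shell
#!/usr/bin/env bash
# Assumption: python3 is on the PATH; json.tool is its built-in pretty-printer.
# pretty_json is a hypothetical helper name, not part of any tool.
pretty_json() {
  python3 -m json.tool "$1"
}

# Two hypothetical single-line JSON files, created only for this demo.
demo_dir=$(mktemp -d)
printf '%s\n' '{"name":"page","tags":["a","b"]}' > "$demo_dir/a.json"
printf '%s\n' '{"name":"page","tags":["a","c"]}' > "$demo_dir/b.json"

# diff stands in for ksdiff here. It exits with status 1 when the inputs
# differ, so "|| true" keeps the script going.
diff <(pretty_json "$demo_dir/a.json") <(pretty_json "$demo_dir/b.json") || true
```

With Kaleidoscope installed, replacing diff with ksdiff in the last line should open the same comparison in the app.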

The Problem

Kaleidoscope shows differences between files, but when those files consist of a single, very long line of text (like the HTML files mentioned above), the comparison can both take a long time to compute and be difficult to read.

A comparison of two single-line HTML files shows only 1 Change, because each file contains only one line, and is hard to read.

How can we improve that diff?

The Strategy

We need to divide our problem into two steps:

  1. Convert the HTML into a format that contains multiple lines but still results in the same rendered page as the original. In general, a reasonable conversion depends on the content and your goal, so it can’t be done automatically by Kaleidoscope.
  2. Compare the converted files instead of the originals.

This process might sound inconvenient if it required creating intermediate files that need to be cleaned up later. Fortunately, we can avoid that!

The Implementation

We are going to use the command line tool ksdiff for this example. Using the command line provides a powerful way to accomplish this task, giving us the full flexibility of Unix tools. If you aren’t familiar with ksdiff, it’s the command line tool for Kaleidoscope, integrating Kaleidoscope with the Unix world. We have several articles to help you get started.

In the simplest case, calling ksdiff with the two files as arguments compares them in Kaleidoscope, with the same result as if you had dropped them onto the Kaleidoscope app icon.

ksdiff page1.html page2.html

Let’s look at step 1 above: how can we convert those HTML files into something more readable without modifying the resulting page? macOS ships with a command called tidy that helps with formatting (and fixing) HTML files. A possible alternative would be xmllint.

Tidy to the Rescue

Here’s what the official project page says about tidy:

Tidy is a console application for macOS, Linux, Windows, UNIX, and more. It corrects and cleans up HTML and XML documents by fixing markup errors and upgrading legacy code to modern standards.

Tidy has been around for a long time and the version shipping with macOS is quite outdated. If you plan on using it more often, you may want to install a more recent variant, e.g., using brew install tidy-html5.

Tidy Example

Let’s say page.html contains this very simple HTML code, all in one line:

<html><head><title>Page Title</title></head><body><h1>Main Headline</h1></body></html>

Running tidy -qi page.html (-q suppresses nonessential output, -i indents the element content) prints a nicely formatted result to the standard output. Depending on the input file, tidy might also print warnings or errors above the formatted HTML.

line 1 column 1 - Warning: missing <!DOCTYPE> declaration
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">

<html>
<head>
  <meta name="generator" content=
  "HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 204), see www.w3.org">

  <title>Page Title</title>
</head>

<body>
  <h1>Main Headline</h1>
</body>
</html>

See how tidy has added line breaks, indentation, and even a meta tag, making the HTML much more readable?
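To try this yourself, the following sketch recreates the sample file in a temporary directory and runs tidy on it. It assumes tidy is available on the PATH. The variable name work_dir is only illustrative, and the exact output (DOCTYPE, generator line) will likely differ depending on the tidy version installed.

```shell
#!/usr/bin/env bash
# Recreate the single-line sample file in a temporary directory.
work_dir=$(mktemp -d)
printf '%s\n' '<html><head><title>Page Title</title></head><body><h1>Main Headline</h1></body></html>' > "$work_dir/page.html"

if command -v tidy >/dev/null 2>&1; then
  # tidy exits with a non-zero status when it reports warnings,
  # so "|| true" keeps the script from stopping here.
  tidy -qi "$work_dir/page.html" || true
else
  echo "tidy not found; install it to reproduce the output" >&2
fi
```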

We can route the formatted output into a file using the -o parameter, but then we’d have to deal with those files. We could also route the error output to a file using the -f parameter, but that would be yet another file to deal with. So let’s look at a way to do this in one step.

ksdiff and the Standard Input

Let’s start with the main part, the HTML result. It turns out that ksdiff can read from standard input: pass - as a parameter instead of a file name. The following command does the trick:

tidy -qi page.html | ksdiff -

It first calls tidy on page.html and pipes the output of that command into ksdiff. But then we have a Kaleidoscope document that contains only one file. How do we get the other one in as well? A process has only one standard input (stdin) and one standard output (stdout), so we can’t use - twice in one command.

Using process substitution

But there’s actually an even more elegant way: process substitution, a feature of shells such as bash and zsh. The syntax looks like this:

ksdiff <(processA) <(processB)

processA is a command whose output becomes side A, and processB does the same for side B. So we could write ksdiff <(cat page1.html) <(cat page2.html) to feed the content of both files into ksdiff using the cat command.
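To see what the shell actually hands to the command, here is a small sketch that uses only echo and cat in place of ksdiff. It needs bash or zsh, since plain POSIX sh does not support process substitution.

```shell
#!/usr/bin/env bash
# The shell replaces <(...) with a file path the command can read from,
# typically something like /dev/fd/63 (the number varies).
echo <(true)

# cat receives two such paths and reads each command's output like a file.
cat <(printf 'side A\n') <(printf 'side B\n')
```

Because each substitution shows up as an ordinary file path, a tool that takes file names as arguments can read the output of several commands in one call, without going through standard input.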

Now that we have all the ingredients, the final step is obvious: we need to replace processA and processB with our tidy commands:

ksdiff <(tidy -qi page1.html) <(tidy -qi page2.html)

This will transform both page1.html and page2.html using the tidy command and feed both outputs into ksdiff, which will happily open the contents in Kaleidoscope, giving you a nice comparison. Goal achieved!

Bonus Hint: ksdiff is not limited to two inputs. Just add more files to the command. They will be added to the Kaleidoscope document and appear in the File Shelf, with the last two being compared initially.

Ignoring Errors

In case either file contains formatting that tidy complains about, we end up with all kinds of warnings on the command line. Let’s prevent that by ignoring those warnings, emitted to stderr. The proper way to do this in Unix is to send stderr to /dev/null. The syntax for redirecting stderr to a file is 2>, so we need to add 2>/dev/null.

ksdiff <(tidy -qi page1.html 2>/dev/null) <(tidy -qi page2.html 2>/dev/null)
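The effect of the redirect can be checked without tidy. In this sketch, noisy_formatter is a made-up stand-in that writes one warning to stderr and one line of markup to stdout, and cat plays the role of the consuming tool.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for tidy: a warning on stderr, markup on stdout.
noisy_formatter() {
  echo "line 1 column 1 - Warning: missing <!DOCTYPE> declaration" >&2
  echo "<html>"
}

# The redirect sits inside <(...), so it applies to the formatter only.
# stderr is discarded, and only stdout reaches the consuming command.
cat <(noisy_formatter 2>/dev/null)
```

Note that stderr never becomes part of the compared content either way. Without the redirect, the warnings would simply clutter the terminal.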

The Final Result

Our final ksdiff command compares the re-formatted HTML files, making the diff much easier to understand.

Summary

With a little command line magic, seemingly complex tasks can be carried out quickly and elegantly, without creating extra files to worry about. If you need this regularly, you could easily wrap the command in a shell script, Alfred Workflow, Raycast Extension, Shortcuts workflow, …
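As a starting point for such a shell script, here is a minimal sketch of a wrapper function. The function name ksdiff_html and the variables KSDIFF_BIN and TIDY_BIN are invented for this example and are not part of Kaleidoscope or tidy. They only exist so that the tools can be swapped out, for example for testing.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the command from this article.
# KSDIFF_BIN and TIDY_BIN are illustrative overrides, not official settings.
ksdiff_html() {
  local diff_tool="${KSDIFF_BIN:-ksdiff}"
  local tidy_tool="${TIDY_BIN:-tidy}"

  if [ "$#" -ne 2 ]; then
    echo "usage: ksdiff_html FILE_A FILE_B" >&2
    return 2
  fi

  local f
  for f in "$1" "$2"; do
    if [ ! -r "$f" ]; then
      echo "ksdiff_html: cannot read $f" >&2
      return 1
    fi
  done

  # Same command as above: format both files, discard tidy's warnings,
  # and hand both results to the comparison tool.
  "$diff_tool" <("$tidy_tool" -qi "$1" 2>/dev/null) \
               <("$tidy_tool" -qi "$2" 2>/dev/null)
}
```

After sourcing this file in bash or zsh, ksdiff_html page1.html page2.html should behave like the full command shown earlier.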