Hyphens are punctuation, a part of the text; en and em dashes are not: they are formatting marks. I’ll talk a little about the differences between them and the genealogical applications of each. A brief resources section highlighting the significant sources used in this article is also given.

The hyphen, en dash, and em dash discussed here are part of the standard font package. The hyphen is in the Basic Latin section, and the other two are found in the General Punctuation section of the font’s special-character listings.
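For readers who want to check where these characters live, here is a minimal sketch using Python's standard unicodedata module. The names BLOCKS, block_of, and describe are my own, chosen for illustration, and the two block ranges are hard-coded from the Unicode code charts rather than looked up.

```python
import unicodedata

# Block ranges hard-coded from the Unicode code charts (only the two
# blocks mentioned in the text are listed here).
BLOCKS = {
    "Basic Latin": (0x0000, 0x007F),
    "General Punctuation": (0x2000, 0x206F),
}

def block_of(ch):
    """Return the name of the listed block containing ch, or 'other'."""
    cp = ord(ch)
    for name, (low, high) in BLOCKS.items():
        if low <= cp <= high:
            return name
    return "other"

def describe(ch):
    """Return the code point, official Unicode name, and block of ch."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} ({block_of(ch)})"

# Keyboard hyphen, en dash, em dash.
for ch in "\u002D\u2013\u2014":
    print(describe(ch))
```

Note that Unicode's official name for the keyboard hyphen is HYPHEN-MINUS, since the same key has long served for both.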

Hyphens are punctuation, a part of the text. In the old days of the typewriter and early days of the computer, hyphens were doubled and tripled to substitute for dashes. This is unnecessary now as we have proper dashes available. The hyphen is also distinct from a minus sign, but mathematical expressions occur only rarely in our type of writing.
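If you have older transcriptions that still use doubled or tripled hyphens, a short script can swap in real dashes. The sketch below assumes one particular convention (two hyphens for an en dash, three for an em dash, as in TeX); many typists used a double hyphen for an em dash instead, so check your own material first. The function name modernize_dashes is hypothetical.

```python
import re

EN_DASH = "\u2013"
EM_DASH = "\u2014"

def modernize_dashes(text):
    """Replace typewriter-style runs of hyphens with real dashes.

    Assumed convention: exactly three hyphens become an em dash and
    exactly two become an en dash. Single hyphens and longer runs
    (such as ruled lines) are left alone.
    """
    # Handle the longer run first so "---" is not read as "--" plus "-".
    text = re.sub(r"(?<!-)---(?!-)", EM_DASH, text)
    text = re.sub(r"(?<!-)--(?!-)", EN_DASH, text)
    return text

print(modernize_dashes("born 1820--1825 in Ohio---or so the census says"))
```

Ordinary hyphenated words such as "well-known" pass through unchanged.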

En Dash

En dashes are among what Bringhurst (see resources section) calls analphabetic characters. His thinking on how to handle them differs from traditional usage. The differences he considers significant take into account languages beyond English, the language most fonts are designed for.

In genealogical writing, the en dash is the strongest visual indicator of a date range. It separates the two ends of a range, such as 1582–1752. Some textual terms can also benefit from its use: an en dash emphasizes the separation between a prefix and a word in a compound term such as post–1945 or pre–marriage.
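Year ranges typed with a plain hyphen are easy to find and fix mechanically. This is a rough sketch, assuming ranges are written as two four-digit years; the names YEAR_RANGE and en_dash_year_ranges are made up for this example.

```python
import re

EN_DASH = "\u2013"

# Two four-digit years joined by a hyphen, with optional spaces around it.
YEAR_RANGE = re.compile(r"\b(\d{4})\s*-\s*(\d{4})\b")

def en_dash_year_ranges(text):
    """Rewrite hyphenated year ranges so they use an unspaced en dash."""
    return YEAR_RANGE.sub(rf"\1{EN_DASH}\2", text)

print(en_dash_year_ranges("The family farmed there 1582-1752."))
```

The pattern is deliberately narrow. It will still catch any other pair of four-digit numbers joined by a hyphen (part of a telephone or catalog number, say), so review the changes rather than applying them blindly.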

Em Dash

Em dashes separate thoughts. In some cases they also represent missing data, as in unknown surnames (—?—).

In terms of formatting, there are several micro-stylistic points to consider. One is how much spacing there should be around the em dash.

Bringhurst would have us use an en dash with a space on either side, as in “… – …”, as an alternative to the (subjectively) lengthy em dash. Doing this means putting a non-breaking space between the last word and the en dash to keep the two together on one line, which may affect a text’s justification.
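The non-breaking space can be inserted by script rather than by hand. A minimal sketch, assuming the text already uses a spaced en dash with ordinary spaces on both sides; bind_spaced_en_dash is a hypothetical helper name.

```python
NBSP = "\u00A0"      # NO-BREAK SPACE
EN_DASH = "\u2013"

def bind_spaced_en_dash(text):
    """Swap the ordinary space before a spaced en dash for a no-break space.

    The space after the dash stays ordinary, so a line may still break
    after the dash but never directly before it.
    """
    return text.replace(f" {EN_DASH} ", f"{NBSP}{EN_DASH} ")

sample = f"He left Ohio {EN_DASH} or so the family story goes."
print(bind_spaced_en_dash(sample))
```

The result looks identical on screen; the difference shows only when the line wraps.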

One of the faults of Times New Roman is that its em dash is too long. Most professionally designed fonts compensate for the length of the em dash by making the capital M a more realistic width. Times New Roman was designed for a specific purpose, newspapers, and should be used only by that type of publication. Linux Libertine, on the other hand, was designed for more common publications such as this one, and for books, so its readability is greater.

Illustration: Linux Libertine and Times New Roman em dashes

Hatcher, and Leclerc and Hoff (see resources section for both), differ on whether there should be spaces around an em dash in text. I prefer the latter approach, and include the spaces. Doing this also requires that you pay attention to justification and line breaks, so the dash doesn’t sit by itself at the beginning of a line.

My own thought on doubling or tripling the em dash for missing names is that it’s unnecessary. If a longer mark is wanted, the character Unicode calls a “horizontal bar” (―) can stand in for a triple dash. It is shorter, and more representative of the strong emphasis necessary. I prefer to denote missing data with just an em dash or as (—?—) [opening parenthesis em dash question mark em dash closing parenthesis].

Dumb and Curly Quotes, Redux

Using real quotes (curly “ / ”) raises the tone of what we read. They are also what most of us were brought up to see in printed materials. Online it’s another matter, though, since most early computer systems couldn’t handle curly quotes and kept the dumb quote from the teletype repertoire.


