Understanding Soundex

Mary K. Popovich, Busia's Roots

The Soundex is a system of indexing used by a number of genealogical resources, including the federal censuses of 1880, 1900, 1910 and 1920, as well as some immigration records. Its purpose is to enable the researcher to find a given person's name even if it was misspelled. This can be very important when tracing your immigrant ancestor, especially if he had a Slavic name, or even one that was originally written in Cyrillic and transliterated into English based on its pronunciation.

The Soundex reduces each surname to an alphanumeric code: one letter followed by three digits. Generally, the basic information was transcribed, e.g., from the census, and put onto a 3 x 5 index card. The Soundex code was recorded at the top left-hand corner of the card, and the information identifying where the person's record could be found was recorded in the top right-hand corner. All cards with a like number were grouped together, and then the group was alphabetized by given name.
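If it helps to picture the filing order, here is a small Python sketch of it. The Card layout, the given names and the "roll" references are all invented for this illustration; only the surnames and their codes come from this article.

```python
from typing import NamedTuple

class Card(NamedTuple):
    code: str        # Soundex code, from the top left-hand corner
    surname: str
    given_name: str  # hypothetical given names, for illustration only
    location: str    # placeholder for the top right-hand corner reference

cards = [
    Card("W-252", "Washington", "Martha", "roll A"),
    Card("S-552", "Szymanski", "Stanley", "roll B"),
    Card("S-552", "Simmons", "Anna", "roll C"),
    Card("L-000", "Lee", "Henry", "roll D"),
    Card("S-552", "Shemanski", "John", "roll E"),
]

def file_cards(cards):
    """Group cards by code, then alphabetize each group by given name."""
    return sorted(cards, key=lambda card: (card.code, card.given_name))

for card in file_cards(cards):
    print(card.code, card.given_name, card.surname)
```

Notice that within the S-552 group the surnames are mixed together; it is the given name that decides the order.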

Most of today's genealogical software programs include a utility to convert your surnames into Soundex codes. There is also a book available at most genealogical libraries that Soundexes all common surnames and most of the uncommon ones as well. So, why then would you want to understand the Soundex system?

First, the 3 x 5 cards that make up Soundex indexes were used for years by clerks before the cards were microfilmed. Over the years the oils and dirt from the clerks' hands darkened the paper of the cards, generally at the top corners. Thus, when the cards were microfilmed, the information in the darkened corner, the Soundex code number, may be unreadable. As you're cranking through the microfilm looking for your ancestor's code number, you may need to be able to code surnames "on the fly" in order to know whether you're in the right code for your ancestor, or whether you need to crank forward or backward.

Second, if you're familiar with how the Soundex works, you'll be able to put it to use in other areas of your research. For instance, suppose you're looking for a name on the microfiche index of Wisconsin birth records. You know your ancestor was born in Wisconsin in 1899, but you're not finding her name where it ought to be. Well, perhaps the name was misspelled by the clerk recording the birth certificate. If you understand how Soundex works, you'll be able to work backwards from your ancestor's Soundex code number, substituting letters that the Soundex codes alike and that the clerk might have used instead.

The Soundex code numbers are based on how consonants are pronounced. Generally, the Soundex disregards vowels. However, the first letter of the surname is always kept, whether consonant or vowel. The letters "H", "W" and "Y" are also disregarded, because they often either act like vowels or are silent. These letters are easy to remember if you rearrange them to spell the word "WHY".

Groups of consonants with similar characteristics of pronunciation are coded alike. For instance, say the words "do" and "to". Notice that your tongue is placed against your top front teeth and you expel a puff of air when making the initial "t" or "d" sound. "T" and "d" are therefore coded alike in the Soundex system.

The Soundex Code Numbers:

  1 = b, f, p, v
  2 = c, g, j, k, q, s, x, z
  3 = d, t
  4 = l
  5 = m, n
  6 = r
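If you keep your research notes on a computer, the table above translates directly into a lookup. Here is a minimal Python sketch; the names SOUNDEX_GROUPS, code_for and coded_like are made up for this illustration and aren't taken from any genealogy program.

```python
# The six letter groups from the Soundex code table.
SOUNDEX_GROUPS = {
    "1": "BFPV",
    "2": "CGJKQSXZ",
    "3": "DT",
    "4": "L",
    "5": "MN",
    "6": "R",
}

# Flip the table so each letter points at its digit.
LETTER_TO_DIGIT = {
    letter: digit
    for digit, letters in SOUNDEX_GROUPS.items()
    for letter in letters
}

def code_for(letter):
    """Return the Soundex digit for a letter, or None for vowels and H, W, Y."""
    return LETTER_TO_DIGIT.get(letter.upper())

def coded_like(letter):
    """Return the other letters that share this letter's digit."""
    digit = code_for(letter)
    if digit is None:
        return ""
    return SOUNDEX_GROUPS[digit].replace(letter.upper(), "")

print(code_for("d"), code_for("t"))  # 3 3
print(coded_like("z"))               # CGJKQSX
```

A helper like coded_like can be handy when working backwards from a code: for any letter after the first, the letters it returns are ones a clerk could have written instead without changing the code.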

Let's take a surname and code it.

  1. Write the name "Washington" on a scrap of paper.
  2. Put a circle around the "W" because you always keep the first letter of the surname.
  3. Cross out the vowels -- "a", "i" and "o."
  4. Cross out the "h" (it's in WHY).
  5. You're left with Wsngtn.
  6. Inserting Soundex code numbers for these letters you arrive at: W-252

Note that there were a couple of consonants left over ("t" and "n"). Once you have three digits following your surname initial, you're done. Ignore the remaining consonants. Similarly, if there are too few consonants in a surname, you add zeroes at the end until you have three digits. Some short surnames don't even have any consonants except the initial. The surname "Lee," for example, is coded L-000.
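The "three digits, no more and no fewer" rule is easy to express on its own. Here is a small Python sketch, in which finish_code is a hypothetical helper name and the digits are assumed to have been worked out by hand already:

```python
def finish_code(initial, digits):
    """Join the kept initial to exactly three digits.

    Extra digits are dropped; missing ones are filled with zeroes.
    """
    three = digits[:3].ljust(3, "0")
    return f"{initial.upper()}-{three}"

# "Wsngtn" gives the digits 2, 5, 2, 3, 5; only the first three are kept.
print(finish_code("W", "25235"))  # W-252

# "Lee" has no consonants after the initial, so all three digits are zeroes.
print(finish_code("L", ""))       # L-000
```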

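There is one more very important rule of Soundexing to remember. When any two contiguous letters would be coded alike, only one of them is counted. That is, if two consonants that are not separated from each other by a vowel would be coded alike, cross out all but the first of them. This applies to the first letter of the surname as well: a consonant immediately after the initial that would be coded like the initial is crossed out.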

Let's take another surname and code it.

  1. Write the name Szymanski on a scrap of paper.
  2. Put a circle around the "S".
  3. Cross out the "z" because it would be coded the same as the "S".
  4. Cross out the vowels -- "a" and "i".
  5. Cross out the "y" (it's in WHY).
  6. Cross out the "k" because it would be coded like the "s" immediately before it.
  7. You're left with Smns. (Note that you kept both the "m" and the "n" because even though they would be coded alike, they were separated by a vowel.)
  8. Inserting Soundex code numbers for these letters you arrive at: S-552
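The pencil-and-paper steps in the two worked examples can be sketched as a short Python routine. Treat it as one reading of the rules described here, not as the code any particular genealogy program uses. The function name soundex is simply chosen for the illustration, and the sketch makes two assumptions: surnames are written in plain A-Z letters, and "H", "W" and "Y" do not count as separators between like-coded consonants (the rule above only says that a vowel separates them).

```python
# Letter groups from the Soundex code table.
GROUPS = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}
CODES = {letter: digit for digit, letters in GROUPS.items() for letter in letters}
VOWELS = set("AEIOU")

def soundex(surname):
    """Code a surname as one letter, a hyphen and three digits."""
    letters = [ch for ch in surname.upper() if ch.isalpha()]
    if not letters:
        raise ValueError("surname contains no letters")

    initial = letters[0]           # always keep the first letter
    digits = []
    previous = CODES.get(initial)  # digit of the last consonant seen, if any

    for ch in letters[1:]:
        if ch in VOWELS:
            previous = None        # a vowel separates like-coded consonants
            continue
        code = CODES.get(ch)
        if code is None:
            continue               # H, W, Y: crossed out (assumed not to separate)
        if code != previous:
            digits.append(code)    # count only the first of a like-coded run
        previous = code

    # Keep three digits, adding zeroes if there are too few.
    return initial + "-" + "".join(digits)[:3].ljust(3, "0")

print(soundex("Washington"))  # W-252
print(soundex("Szymanski"))   # S-552
print(soundex("Lee"))         # L-000
```

The routine compares each consonant with the one before it while the vowels are still in place, which is why the "m" and "n" of Szymanski are both counted but the "z" and "k" are not.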

Okay, now try coding the name Shemanski or Shamansky or Semanski. You'll get exactly the same code for each. That's the beauty of the Soundex system! And, when you look at the actual Soundex index, you'll find Simons, Simmons and Simonson filed alongside Szymanski, all Americanized forms of the name, which your ancestor might have chosen.

Soundex is not a perfect system. Some letters that are in fact silent are coded, for example, the "g" in Wright. There may also be a problem when letters of a surname were transposed. For instance, over the centuries, one of my surnames was alternately spelled Mulzof and Musolf. The "l" and the "z" or "s" sounds would switch, so Mulzof is coded M-421 but Musolf is coded M-241. Another problem could be presented by names such as Schulz/Schultz. The German "z" already contains the "ts" sound, so the "t" in Schultz really isn't necessary for proper pronunciation in German. It probably crept into the spelling based on foreign influences. Yet Schulz is coded S-420, while Schultz is coded S-432.
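To see which spellings end up filed together and which don't, you can group a list of surnames by code. The Python sketch below carries its own compact version of the coding rules so that it runs by itself; build_index is a made-up name, the surnames are assumed to be non-empty and in plain A-Z letters, and "H", "W" and "Y" are assumed not to separate like-coded consonants.

```python
from collections import defaultdict

# Letter groups 1 through 6 from the Soundex code table.
DIGITS = {
    letter: digit
    for digit, letters in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1)
    for letter in letters
}

def soundex(surname):
    """Compact coding routine: initial, hyphen, three digits."""
    name = surname.upper()
    digits, previous = [], DIGITS.get(name[0])
    for ch in name[1:]:
        if ch in "AEIOU":
            previous = None                  # vowels separate like-coded consonants
        elif ch in DIGITS:
            if DIGITS[ch] != previous:
                digits.append(str(DIGITS[ch]))
            previous = DIGITS[ch]
        # anything else (H, W, Y) is skipped without separating
    return name[0] + "-" + "".join(digits)[:3].ljust(3, "0")

def build_index(surnames):
    """Map each Soundex code to the surnames filed under it."""
    index = defaultdict(list)
    for surname in surnames:
        index[soundex(surname)].append(surname)
    return dict(index)

names = ["Szymanski", "Shemanski", "Simmons", "Mulzof", "Musolf", "Schulz", "Schultz"]
for code, group in sorted(build_index(names).items()):
    print(code, group)
```

Run as written, the three Szymanski variants share a single code, while Mulzof, Musolf, Schulz and Schultz each land under a code of their own. When a search under one code comes up empty, that is a hint to code the likely alternate spellings as well.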

However, anomalies such as these are few, and now that you understand the Soundex system, you'll be able to take them into consideration when doing your research. Happy Soundex Hunting!

Mary K. Popovich is a professional genealogist specializing in U.S. urban research, Germany and Poland. She has 16 years' experience assisting research library patrons with a wide variety of ethnic backgrounds and research problems. Please contact her by email for additional information about her services.


Copyright © 2001
By the Author
All rights reserved