Research Limitations with “New” Death Master File
By Earl F Glynn | Franklin Center
In late 2011 the Social Security Administration made changes to the Death Master File to delete 4.2 million records and to remove geographic information.
At the same time the SSA announced it would report fewer deaths in the future. About 36% of the deaths previously reported in the DMF will no longer appear.
These changes to the DMF impose analysis limitations and may make the data nearly worthless within a few years.
Reporters looking for dead people who are still registered to vote will find DMF searches less and less effective in future years.
Analysis will be most affected in states such as Florida, where large numbers of people move and later die.
Selected statements about the changes:
4.2 million state records removed; fewer records in future
Social Security Administration Fact Sheet:
“We began disclosing certain state records on the Public DMF in 2002. After review of the Public DMF, we have determined that we can no longer disclose protected State records. Section 205(r) of the Social Security Act prohibits SSA from disclosing State death records we receive through our contracts with the States, except in limited circumstances. Therefore, we cannot legally share those State records on the Public DMF.”
NTIS Important Notice: Change in Public Death Master File Records:
“Effective November 1, 2011, the DMF data that we receive from [Social Security Administration] will no longer contain protected state death records. …”
“The historical Public DMF contains 89 million records. SSA will remove approximately 4.2 million records from this file and add about 1 million fewer records annually.”
Testimony of SSA Inspector General Patrick P. O’Carroll, Jr. to Congress, Feb. 2, 2012:
“The file contains about 85 million records, and it adds about 1.3 million records each year. …”
“SSA receives about 2.5 million death reports each year from many sources, including family members and funeral homes.”
Simple math from the SSA IG’s statement suggests 48% of death reports may be missing in the DMF now.
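The arithmetic behind that 48% figure can be checked in a few lines of Python. This is only a sketch: the variable names are mine, and it assumes the 1.3 million records added per year and the 2.5 million death reports received per year describe the same period.

```python
# Figures quoted from the SSA Inspector General's testimony (Feb. 2, 2012).
death_reports_per_year = 2_500_000      # death reports SSA receives annually
dmf_records_added_per_year = 1_300_000  # records added to the public DMF annually

# Share of reported deaths that do not reach the public DMF.
missing_share = 1 - dmf_records_added_per_year / death_reports_per_year

print(f"Missing from DMF: {missing_share:.0%}")
```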
Geographic Data Removed
Death Master File Extract Output Record Specification, Nov. 2011:
“Revised November 1, 2011 to remove the State/Country Code of Residence, Zip code – Last Residence, and Zip code – Lump Sum Payment fields as a result of no longer publishing protected state records.”
If one attempts to find matches in the 85-million-record DMF for the whole country, there will be false matches for many names.
One approach to reduce the number of false matches is to extract a state subset using geographic clues that may be in the file for a dead person.
Matching against a state subset is easier and faster, and the geographic information in the subset is useful for verification of matches.
How to find a state DMF subset in 2010
In 2010 there were four fields that provided geographic information about a dead person:
- SSN3: the first 3 digits of the SSN, which reflects the state where the original SSN application was made. For many this is where they were born, but not always.
- State code: a two-digit code maintained by SSA.
- LumpZIP: the five-digit post office ZIP code where a lump sum distribution payment was sent by SSA.
- LastZIP: the last known ZIP code maintained by SSA.
Let’s use the State of Missouri for an example of how to extract a state DMF subset:
1. SSN3: 486 – 500 for Missouri
2. State Code: 26 for Missouri
3. LumpZIP, LastZIP: 63001 (Allenton) to 65899 (Springfield)
The range(s) of ZIP codes for a state can be found in several places. One online resource from neighborhoodlink.com shows a list of ZIP codes for a state.
A free online file of ZIP codes can be filtered and sorted in Excel to find the ranges of ZIP codes for a particular state.
The USPS ZIP lookup can be used to verify the limits of the ranges.
I did not attempt to validate particular ZIP codes since only about half of all 5-digit numbers are valid ZIP codes. I’ll accept a ZIP for a state subset as long as it is within the correct range, whether or not the ZIP is technically valid.
We’ll want all DMF records that match any of the four conditions. A relatively simple SQL statement in MySQL tells us there are 2,736,860 records in a Missouri 2010 DMF:
SELECT COUNT(*) FROM dmf2010
WHERE ((SSN > "486000000") AND (SSN < "501000000"))
   OR ((LUMPZIP > "63000") AND (LUMPZIP < "65900"))
   OR ((LASTZIP > "63000") AND (LASTZIP < "65900"))
   OR (STATE = "26");
Some additional SQL queries tell us which conditions contributed how many of the total records — many death records hit on multiple conditions:
2,304,026 with proper SSN3 range for Missouri
792,313 with proper STATE code 26 for Missouri
1,729,023 for LASTZIP with Missouri ZIP
249,396 for LUMPZIP with Missouri ZIP
The high number of SSN3 hits in the total means most of the people in the Missouri subset applied for a Social Security Number in Missouri, which is not surprising.
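To see why the four per-condition counts add up to more than the combined total, here is a minimal sketch that runs the same four conditions against a tiny in-memory SQLite table. The five rows are invented for illustration, SQLite stands in for MySQL (so string literals use single quotes), and only the table and column names are taken from the query above.

```python
import sqlite3

# Toy stand-in for the 2010 DMF table. Every row is invented.
rows = [
    # (SSN, STATE, LASTZIP, LUMPZIP)
    ("490123456", "26", "64111", ""),       # Missouri SSN3, state code and last ZIP
    ("495111111", "",   "",      ""),       # Missouri SSN3 only
    ("123456789", "",   "65201", ""),       # other SSN3, Missouri last ZIP
    ("234567890", "",   "",      "63101"),  # other SSN3, Missouri lump-sum ZIP
    ("345678901", "10", "33101", ""),       # no Missouri connection
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dmf2010 (SSN TEXT, STATE TEXT, LASTZIP TEXT, LUMPZIP TEXT)")
con.executemany("INSERT INTO dmf2010 VALUES (?, ?, ?, ?)", rows)

# The four Missouri conditions from the article's query.
conditions = {
    "SSN3": "SSN > '486000000' AND SSN < '501000000'",
    "STATE": "STATE = '26'",
    "LASTZIP": "LASTZIP > '63000' AND LASTZIP < '65900'",
    "LUMPZIP": "LUMPZIP > '63000' AND LUMPZIP < '65900'",
}

def count_where(clause):
    """Count rows in dmf2010 that satisfy one WHERE clause."""
    return con.execute(f"SELECT COUNT(*) FROM dmf2010 WHERE {clause}").fetchone()[0]

# One count per condition, then one count for "any of the four".
per_condition = {name: count_where(clause) for name, clause in conditions.items()}
any_condition = count_where(" OR ".join(f"({c})" for c in conditions.values()))

print(per_condition)
print("any condition:", any_condition)
```

In this toy table the per-condition counts sum to 6 while only 4 distinct records match, because the first row satisfies three conditions at once. The same overlap explains the real Missouri numbers.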
With the removal of the geographic fields in late 2011, what happens to a Missouri 2012 subset?
How to find a state DMF subset in 2012
With the removal of the STATE, LASTZIP and LUMPZIP fields, the SQL statement to extract the Missouri DMF from an updated database is simple:
SELECT COUNT(*) FROM dmf WHERE (SSN > "486000000") AND (SSN < "501000000");
With DMF updates through Aug. 1, 2012, the 2012 Missouri DMF has only 2,335,383 dead people.
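The SSN3 test is simple enough to sketch outside the database. The snippet below assumes the SSN is stored as a fixed-width nine-character string, which is what makes the quoted comparison in the SQL work. The function names and sample SSNs are mine.

```python
def in_ssn3_range(ssn: str, low: int = 486, high: int = 500) -> bool:
    """True if the first three digits of a 9-digit SSN fall in [low, high]."""
    return low <= int(ssn[:3]) <= high

def sql_style_match(ssn: str) -> bool:
    """The article's comparison, done on fixed-width strings."""
    return "486000000" < ssn < "501000000"

# Invented SSNs on either side of the Missouri boundaries.
samples = ["485999999", "486000001", "500999999", "501000000"]
for ssn in samples:
    print(ssn, in_ssn3_range(ssn), sql_style_match(ssn))
```

The two tests agree on every sample. They differ only for the value 486000000 itself, which the strict inequality excludes. That makes no practical difference, since SSNs with a 0000 serial are not issued.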
The 2010 and 2012 DMF subsets can be compared with a Venn diagram (created from the MIT online Venn Diagram Generator).
Missouri DMF Comparison 2010 and 2012
The SSA deletes erroneous entries in monthly DMF updates, but most of the loss between 2010 and 2012 (the blue portion) reflects the Missouri portion of the 4.2 million records that were dropped nationally from the DMF in Nov. 2011.
The diagram shows an addition of 59,557 new death records by Aug. 2012 from what was observed in early 2010. These new death records have no geographic information except for the SSN3.
The diagram shows an overlap of 2,275,826 death records between the 2010 and 2012 files. For these matched records, geographic information from the 2010 file can be used to fill in the new 2012 file.
Two possibilities explain most of the portion in blue:
- People who died in Missouri but did not apply for their SSN in Missouri, or
- Dead people with Missouri connections who were among the 4.2 million purged records.
Obtaining a national DMF current through Oct. 2011, the month before the purge, might help with connecting new and old death information.
As geographic information for those not born in Missouri is connected with current death records, the size of the blue area may shrink considerably.
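The Missouri figures quoted above are internally consistent, and the size of the blue (2010-only) region can be derived from them. The sketch below does that arithmetic and then shows, on invented records, one way 2010 geography could be copied onto matching 2012 records by SSN. The dictionary layout and field names are assumptions for illustration, not the DMF record format.

```python
# Counts reported in the article.
total_2010 = 2_736_860   # Missouri 2010 subset (any of four conditions)
total_2012 = 2_335_383   # Missouri 2012 subset (SSN3 only)
overlap = 2_275_826      # records present in both subsets

only_2010 = total_2010 - overlap   # the blue portion, derived here
only_2012 = total_2012 - overlap   # new records with SSN3 as the only clue
print(only_2010, only_2012)

# Toy back-fill keyed by SSN. All SSNs and field values are invented.
dmf2010 = {
    "490123456": {"state": "26", "last_zip": "64111"},
    "123456789": {"state": "",   "last_zip": "65201"},
}
dmf2012 = {
    "490123456": {},   # present in both files
    "499000001": {},   # new since 2010, no geography available
}

# Copy 2010 geography onto each 2012 record that has a 2010 match.
for ssn, record in dmf2012.items():
    record.update(dmf2010.get(ssn, {}))

in_both = dmf2010.keys() & dmf2012.keys()
print(sorted(in_both), dmf2012)
```

The derived 2012-only count comes out to 59,557, the same number of new death records the diagram shows. The 2010-only count works out to 461,034.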
Texas DMF Comparison 2010 and 2012
The 2010 extraction of a Texas DMF was complicated by multiple ranges of ZIPs. The ZIP code range of 75001 (Addison) to 79999 (El Paso) was found to be lacking some valid Texas ZIPs.
After studying several sources, I learned about these additional TX ZIPs: 73301, 73344 (Austin) and 88510-88595 (also El Paso).
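A state with several ZIP blocks can be handled with a list of inclusive ranges. This is a minimal sketch using only the Texas ranges named above. Like the Missouri extraction, it accepts any five-digit value inside a range without checking that the ZIP is actually in use. The function and constant names are mine.

```python
# Texas ZIP ranges as described in the article. Bounds are inclusive.
TEXAS_ZIP_RANGES = [
    ("73301", "73301"),   # Austin
    ("73344", "73344"),   # Austin
    ("75001", "79999"),   # main Texas block
    ("88510", "88595"),   # El Paso
]

def in_zip_ranges(zip_code: str, ranges=TEXAS_ZIP_RANGES) -> bool:
    """True if a 5-digit ZIP string falls inside any inclusive range."""
    if len(zip_code) != 5 or not zip_code.isdigit():
        return False
    # Fixed-width digit strings compare in the same order as their numbers.
    return any(low <= zip_code <= high for low, high in ranges)

for z in ["75001", "73301", "88540", "73302", "63101", ""]:
    print(repr(z), in_zip_ranges(z))
```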
Overall, the Venn diagram for Texas is similar to Missouri.
The size of the blue area is slightly larger in Texas than in Missouri, and the size of the middle overlap is slightly smaller. This is likely because a larger proportion of people move to Texas from other states and later die there than is seen in Missouri.
Florida DMF Comparison 2010 and 2012
The Venn diagram shows how the loss of geographic information in creating the 2012 Florida DMF will greatly impact any dead voter analysis in Florida.
With such a huge migration of people to Florida from other states, who then die in Florida, the 2012 Florida DMF has become quite small.
With only 2012 Florida DMF information, the number of dead to match against voter records is less than half of what it was in 2010.
SSN Randomization
On June 25, 2011 the Social Security Administration implemented a “randomization” process to help protect the integrity of the SSN.
The Social Security Administration removed the geographical significance of the first three digits of the SSN with these new “randomized” SSNs.
Creating a “state subset” of the DMF for those receiving SSNs after June 2011 will not be possible. Matching names against the full DMF for those receiving these randomized SSNs will result in a huge false positive problem.
Matching voter registration lists against the DMF will become ineffective in a few decades, once no geographic information remains to constrain the matching and reduce false positives.
Data Sources
- Social Security Administration’s Death Master File, NTIS.
- Social Security Administration Death Master File, Investigative Reporters & Editors.
- Death Master File Notes 2010, Franklin Center. Gives statistical overview of DMF at that time.
- DMF Record Layout, Sept. 2001.
- DMF Record Layout, Nov. 2011.
Related
- Social Security Number Randomization, Social Security Administration
- Fact Sheet: Change to the Public Death Master File (DMF), Social Security Administration, 2011.
- NTIS Important Notice: Change in Public Death Master File Records, 2011.
- Death Master File Extract Output Record Specification, NTIS, Nov. 2011.
- Changes to the Public Death Master File (DMF) and the Social Security Death Index (SSDI), Steve’s Genealogy Blog, Nov. 1, 2011.
- Researchers Wring Hands as U.S. Clamps Down on Death Record Access, New York Times, Oct. 8, 2012.
- Changes in Access to Death Data May Impede Medical Research, news@JAMA, Oct. 19, 2011.
Previous Stories using DMF
- Dead voters in Kansas, Kansas Watchdog, Oct. 28, 2010.
- Dead voters in Missouri, Missouri Watchdog, Oct. 28, 2010.
efg
Contact Info: Email: Earl.Glynn@FranklinCenterHq.org, Twitter: @WatchdogLabs, Facebook: http://www.facebook.com/WatchdogLabs
The Public Death Master File and the Social Security Death Index do not list the place of death. These databases do, however, list the zip code of the address of record, that is, the zip code of the most recent address that the Social Security Administration had on file while the person was still living. For example, my Aunt Helen died in Ellis Hospital in Schenectady, Schenectady County, New York. The zip code for her address of record in the databases, however, is 12019, the zip code for Ballston Lake, Saratoga County, New York.
Wali
October 10, 2012
As described in the article above, the ZIPs were removed about a year ago and are no longer in the file.
efg
October 10, 2012
I needed to send them a NOTARIZED letter as to WHY I wanted her death certificate. ALSO I had to indicate my relationship to the deceased, as I was not close enough to get one. ONLY HER IMMEDIATE FAMILY would be allowed to get it. I also noticed it was already on the death index; however, her daughter (who never had a close relationship with her) had given the WRONG BIRTH DATE on her certificate. She lied about dates and names for years. I see many alternate names on the internet for her. Some legal, some not. She was a person who would steal your ID and had over 20 that the family knew about. Banks tell us that people are going into the banks and closing accounts based on these certificates. I don’t believe that, but was told by one bank that this has happened. Use hers, Pamela (O’NEIL)-IVIE-HEREFORD-KAIN; she might have used yours.
Saurabh
October 10, 2012