How to Download and Analyze Your Own Census Data
Oddly enough, there are market research consultants who are unaware that they (and you, and anybody in the world) can have direct access to census data files for analysis.
We just wrote a newsletter about the value of census data, and about the amazing rigor that goes into validating questions like those on race & ethnicity. I take it for granted that our competitors keep up on this stuff and know all the same things that we do. So I was a little disbelieving when the president of another firm posted this query on one of our top industry online forums:
Can anyone provide a resource for discovering a breakdown of percentage of each age (NOT age bracket) in the U.S. Population? For instance, I need to know the incidence of people who are 18 in the U.S.
It seems he is accustomed to using the Census Bureau’s website data tools, which of course aggregate data into age brackets for reporting. He has no idea that he can download the raw data behind these reports, which would get him answers faster and more easily, with far more analytical power.
I could have jumped in and answered his post: “Sure, it’s right here on my desktop. In 2018 there were an estimated 4,613,832 people age 18 in the U.S., so the incidence is 1.4%.” Or I could have been more precise and looked at the number of people who were age 16 and/or 17 in 2018, since they are the ones turning 18 now that we are in the year 2020. Nah. I’ll just share that secret with you, dear readers of the Versta Research newsletter and blog: You can get your own census data! By the way, it took me exactly 1 minute and 26 seconds to locate the data on my machine, open it up, run the query in my (painfully slow) SPSS software, and find the answer to this guy’s question.
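The arithmetic behind a figure like that 1.4% is a weighted share. PUMS is a sample, so each person record carries a person weight (PWGTP) alongside age (AGEP). The incidence of an age is the sum of weights for people that age divided by the sum of all weights; counting raw records would describe the sample rather than the population. Here is a minimal Python sketch, assuming the person records have already been read in as dictionaries keyed by those PUMS column names. The function name `weighted_incidence` is hypothetical, and the five records are made up for illustration, not real census data:

```python
from collections.abc import Iterable, Mapping


def weighted_incidence(rows: Iterable[Mapping[str, str]], ages: set[int]) -> float:
    """Weighted share of people whose age (AGEP) is in `ages`.

    Assumes each row has the PUMS person-file columns AGEP (age in years)
    and PWGTP (person weight). Each person counts by its weight, so the
    result estimates the population share, not the sample share.
    """
    in_group = 0
    total = 0
    for row in rows:
        weight = int(row["PWGTP"])
        total += weight
        if int(row["AGEP"]) in ages:
            in_group += weight
    if total == 0:
        raise ValueError("total weight is zero; nothing to compute a share of")
    return in_group / total


# Made-up records for illustration only (not real census data).
sample = [
    {"AGEP": "18", "PWGTP": "120"},
    {"AGEP": "17", "PWGTP": "80"},
    {"AGEP": "45", "PWGTP": "300"},
    {"AGEP": "16", "PWGTP": "100"},
    {"AGEP": "70", "PWGTP": "400"},
]

print(f"{weighted_incidence(sample, {18}):.1%}")      # → 12.0%
print(f"{weighted_incidence(sample, {16, 17}):.1%}")  # → 18.0%
```

Passing `{16, 17}` instead of `{18}` gives the two-years-later proxy described above: the weighted share of people who were 16 or 17 in the survey year.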
Here’s what you will need:
1. Two files from the 2018 ACS PUMS (American Community Survey Public Use Microdata Sample) website: csv_pus.zip and csv_hus.zip. The first has person-level data; the second has household-level data. It’s best to sort and merge the files so that you can look at person-level data (like age) by household-level attributes (like income).
2. You will also need software to tabulate and analyze the data. SPSS is handy, but it’s slow with large volumes of data like this. You can try Excel, but it is cumbersome for tabulation, and it’s too easy to make mistakes. The coolest and most elegant option would be R (which is free!), but if you don’t already use it, it will take you several months to learn.
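If you want to try a free, scriptable route instead of SPSS, here is a minimal Python sketch of the sort, merge, and tabulate workflow. It uses Python rather than R purely for illustration; the logic carries over. This is a sketch under stated assumptions, not an official Census Bureau tool. It reads the CSV files straight out of csv_pus.zip and csv_hus.zip without unzipping them, links each person to a household through SERIALNO, and builds a weighted table of single-year age by household income band. The column names SERIALNO, AGEP, PWGTP, and HINCP are assumed to match the PUMS data dictionary, so check it for your data year. The function names and the income cut points are hypothetical choices for illustration.

```python
from __future__ import annotations

import csv
import io
import zipfile
from collections import Counter
from collections.abc import Iterator
from typing import IO


def iter_zip_csv(zip_source: str | IO[bytes]) -> Iterator[dict[str, str]]:
    """Yield rows from every .csv member of a zip archive without unzipping to disk."""
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip non-CSV members, such as documentation
            # Members open as bytes; wrap them as text for the csv module.
            # "utf-8-sig" also tolerates a byte-order mark, if one is present.
            with io.TextIOWrapper(
                archive.open(name), encoding="utf-8-sig", newline=""
            ) as text:
                yield from csv.DictReader(text)


def income_band(hincp: str) -> str:
    """Hypothetical income bands; a blank HINCP (e.g. group quarters) becomes 'n/a'."""
    if not hincp.strip():
        return "n/a"
    income = int(hincp)  # the ADJINC inflation adjustment is ignored in this sketch
    if income < 50_000:
        return "<50k"
    if income < 100_000:
        return "50k-100k"
    return "100k+"


def age_by_income(
    person_zip: str | IO[bytes], household_zip: str | IO[bytes]
) -> Counter[tuple[int, str]]:
    """Weighted person counts keyed by (single-year age, household income band)."""
    # Pass 1: index every household's income band by its SERIALNO.
    band_by_serial = {
        row["SERIALNO"]: income_band(row["HINCP"])
        for row in iter_zip_csv(household_zip)
    }
    # Pass 2: stream person records, look up their household, and add their weight.
    table: Counter[tuple[int, str]] = Counter()
    for person in iter_zip_csv(person_zip):
        band = band_by_serial.get(person["SERIALNO"], "n/a")
        table[(int(person["AGEP"]), band)] += int(person["PWGTP"])
    return table
```

On the real downloads, a call like `age_by_income("csv_pus.zip", "csv_hus.zip")` would read each file once. The entry at `table[(18, "<50k")]` would then be the weighted estimate of 18-year-olds in households earning under $50,000. The household index is a dictionary lookup rather than a match on sorted files, so neither file needs sorting first. The cost is holding one small entry per household in memory.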
Every year, I download the most recent data and store it on my local machine. I find myself using it all the time, sometimes daily. You can too! Need help? Reach out to your research vendor — or, wink, wink — just call us.
—Joe Hopper, Ph.D.