Economystified: US Homicide Rates

Over the past few decades, criminology and economics have developed an interesting relationship. There are quite a few economists who research crime. And at plenty of universities, the two departments are tightly linked.

Why? One obvious motivation is better understanding how economic realities might drive individuals to crime. But there's another reason - there is a TON of data on crime! And economists love data!

Many of the analysis methods economists learn aren't econ-specific - the techniques can be applied to any kind of data.

Enterprising economists looking for an under-examined (or less crowded) field may turn to crime stats for unique and salient avenues of research.

The FBI Uniform Crime Report

The FBI releases an annual data set of broad but comprehensive crime stats called the Uniform Crime Report (UCR). The UCR compiles records of all known crimes committed in a year in any US city with a population over 10,000.

I downloaded the most recent complete UCR the other day, and started playing with the data. I frequently found myself surprised by what I was seeing. Quite a few cities I think of as "dangerous" actually had lower crime rates than I anticipated, and vice versa.

I can't tell you where, exactly, I got the idea that NYC was an unsafe place, for example. But it turns out, the Big Apple has some of the lowest crime rates in the nation. The completely incorrect idea had gotten into my head somehow, and I never even thought to question it.

We often don't think about our own intuition or assumptions, or wonder how they develop, do we? We just kinda feel like we know what we know - it's not until we take the exam that we realize there are gaps in our understanding.

But data gives us the objective information needed to keep those prejudices in check.

"Murder and Nonnegligent Manslaughter"

Although there's A LOT of different data included in the report, I'd like to focus on the homicide rates in the largest US cities.

I'm choosing homicide both because of its severity and because its definition is relatively consistent across police departments.

Additionally, since many other types of crimes (rapes, robberies, etc.) tend to go unreported, and dead bodies are hard to miss/ignore, I take this tally to be one of the most substantial and reliable ones included in the UCR.

Definition

The UCR defines murder and nonnegligent manslaughter as:

"The willful (nonnegligent) killing of one human being by another. The classification of this offense is based solely on police investigation as opposed to the determination of a court, medical examiner, coroner, jury, or other judicial body. The UCR Program does not include the following situations in this offense classification: deaths caused by negligence, suicide, or accident; justifiable homicides; and attempts to murder or assaults to murder, which are scored as aggravated assaults."

'Major' Cities

There are 36,011 cities, towns, and villages in the US.

Of course, most of these aren't included in the UCR data, as it only considers municipalities with populations over 100,000. There are 597 of these.

And since this is just a humble blog post, I filtered that number down even more, and looked just at cities with populations of 250,000 or more (of which there are 72 nationwide). There's no particular reason I picked 250K; it's completely arbitrary, but let's call them the 'major' US cities for now.
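If you'd rather script this step than do it in Excel, here is a minimal Python sketch of the filter-and-rank idea. The rows are made-up placeholders (not UCR figures), and the names `rows`, `THRESHOLD`, and `one_in` are my own illustrative choices, not anything defined by the UCR.

```python
# Hypothetical rows: (city, population, homicides). Placeholder values, NOT UCR data.
rows = [
    ("City A", 300_000, 10),
    ("City B", 500_000, 100),
    ("City C", 120_000, 3),  # below the threshold, so it gets dropped
]

THRESHOLD = 250_000  # the arbitrary 'major city' cutoff


def one_in(population, homicides):
    """Return N such that roughly 1 in N residents was a homicide victim."""
    if homicides == 0:
        return float("inf")  # no homicides recorded: the odds are unbounded
    return round(population / homicides)


# Keep only the 'major' cities and convert counts to "1 in N" odds.
major = [(city, one_in(pop, hom)) for city, pop, hom in rows if pop >= THRESHOLD]

# Safest first: the larger the N in "1 in N", the lower the homicide rate.
safest_first = sorted(major, key=lambda r: r[1], reverse=True)

for rank, (city, n) in enumerate(safest_first, start=1):
    print(f"{rank}\t{city}\t1 in {n:,}")
```

Sorting the same list in ascending order instead would give the "highest homicide rate" ordering.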

Lowest Homicide Rates

SO - of the US's 72 cities with populations 250k or greater, here are the ones you were least likely to be killed in, in 2012. Do any of these surprise you?

Rank	City	State	Chances any resident would be victim of a homicide were 1 in:
1	Plano	TX	273,816
2	Lincoln	NE	88,058
3	Henderson	NV	65,867
4	Mesa	AZ	32,242
5	Santa Ana	CA	30,226
6	Portland	OR	29,902
7	El Paso	TX	29,371
8	San Diego	CA	28,478
9	Seattle	WA	27,255
10	Austin	TX	26,868
11	Lexington	KY	25,194
12	Raleigh	NC	24,741
13	Colorado Springs	CO	24,016
14	Anaheim	CA	22,968
15	Jersey City	NJ	22,869
16	St. Paul	MN	22,362
17	Arlington	TX	22,311
18	San Jose	CA	21,699
19	Virginia Beach	VA	21,314
20	Anchorage	AK	19,943
21	New York	NY	19,784
22	Riverside	CA	19,596
23	Corpus Christi	TX	19,535
24	Las Vegas	NV	19,466
25	Fort Worth	TX	17,502
26	Wichita	KS	16,800
27	Denver	CO	16,117
28	San Antonio	TX	15,507
29	Tampa	FL	15,250
30	Long Beach	CA	14,684
31	Sacramento	CA	14,016
32	Albuquerque	NM	13,504
33	Greensboro	NC	13,149
34	Los Angeles	CA	12,893
35	Tucson	AZ	12,361
36	Phoenix	AZ	12,077

Highest Homicide Rates

Rank	City	State	Chances any resident would be victim of a homicide were 1 in:
1	Detroit	MI	1,832
2	New Orleans	LA	1,880
3	St. Louis	MO	2,820
4	Baltimore	MD	2,869
5	Newark	NJ	2,905
6	Oakland	CA	3,146
7	Stockton	CA	4,213
8	Kansas City	MO	4,420
9	Philadelphia	PA	4,649
10	Memphis	TN	4,943
11	Atlanta	GA	5,266
12	Chicago	IL	5,417
13	Buffalo	NY	5,467
14	Miami	FL	6,005
15	Cincinnati	OH	6,439
16	Milwaukee	WI	6,587
17	Oklahoma City	OK	7,007
18	Washington	DC	7,185
19	Toledo	OH	7,334
20	Pittsburgh	PA	7,612
21	Mobile	AL	7,860
22	Dallas	TX	8,062
23	Indianapolis	IN	8,646
24	Jacksonville	FL	9,039
25	Tulsa	OK	9,498
26	Fresno	CA	9,922
27	Minneapolis	MN	10,006
28	Nashville	TN	10,014
29	Houston	TX	10,034
30	Omaha	NE	10,194
31	Bakersfield	CA	10,462
32	Louisville	KY	10,745
33	Boston	MA	11,064
34	Aurora	CO	11,619
35	Fort Wayne	IN	11,665
36	San Francisco	CA	11,889

*Note - Flint, MI, and Camden, NJ actually had the highest homicide rates in the country. In Flint, 1 in 1,613 residents were victims of a homicide; in Camden, it was 1 in 1,159. However, they didn't make this list, because their populations are below my 250,000 threshold.

Do analysis of your own

Of course, this is just a demonstration of the sort of stuff you'll find in the UCR. I highly encourage you to download the data set, and mess around with it for yourself! Share any interesting findings in the comments section below.

(You'll download the data as a zip file. You'll have to 'unzip' the file, to extract the Excel spreadsheets that make up the UCR. If you don't know how to unzip a file, watch this video. If you don't know how to use Excel, I'd start by watching this playlist.)

Caveats

The FBI strongly encourages people not to do what I'm doing here: rank localities.

They don't like seeing it done because it's not really fair to compare really different cities side by side on simple crime stats. Every city has its own unique relevant demographics, economic situation, policing practices, number of police, culture, urban geography...heck, even weather might affect crime rates.

Because of this, the FBI doesn't like to see ranking lists. They're afraid people will draw unsubstantiated inferences from them.

Maybe a reporter finds City A has twice the car theft rate of City B, and jumps to the conclusion that thieves must be bolder in City A, and police less effective.

But what if City A also just has twice the car-ownership rates? Or half the off-street parking? Or half the number of car alarms?

If the reporter misses those facts, they'd fail to see that the cops and robbers in the two cities could very well be equally busy, and equally successful. The conclusion is revealed as erroneous once more info is introduced.
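To put some made-up numbers on that, here is a small Python sketch. Every figure below is hypothetical and chosen only to illustrate the point; none of it comes from the UCR, and `per_1000` is just a helper name I picked.

```python
# Hypothetical figures for illustration only; none of these come from the UCR.
cities = {
    "City A": {"residents": 100_000, "cars": 80_000, "thefts": 400},
    "City B": {"residents": 100_000, "cars": 40_000, "thefts": 200},
}


def per_1000(count, base):
    """Rate per 1,000 units of whatever the base is (residents or cars)."""
    return 1000 * count / base


for name, c in cities.items():
    by_resident = per_1000(c["thefts"], c["residents"])
    by_car = per_1000(c["thefts"], c["cars"])
    print(f"{name}: {by_resident:.1f} thefts per 1,000 residents, "
          f"{by_car:.1f} per 1,000 cars")
```

Measured per resident, City A looks twice as bad as City B. Measured per car, the two come out identical. The data are the same; only the denominator changed.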

The forces that drive crime can be so complicated that even a tiny bit of additional information might completely discredit a researcher's otherwise reasonable conclusion.

My rebuttal

I think the FBI makes some sensible arguments about the "pitfalls of ranking."

Ultimately, however, they're confusing cause and effect.

The FBI cautions against rankings because they're afraid people will draw bad inferences from them. But that's a reason to caution against snap judgements, not a reason to caution against ranking lists themselves.

You need to create ranks and averages and indexes to give individual data points context, which is the only way they have meaning.

"One in every thousand citizens of our city had a cellphone stolen this year." ....Ok, so what? Is that a lot or a little? Are you saying I don't need to worry about it, or that I do?

So I say, listicle on! There's nothing wrong with putting the stats out there. But, dear reader, do be careful in drawing conclusions from what you see, and always be asking "What else could this be? What else could have caused this?"

1 comment:

dwbapst, April 9, 2014 at 10:08 AM
On that point, I recently was involved in a G+ conversation on literacy rates, with the original poster suggesting that United Arab Emirates, Kuwait, Jamaica and The Bahamas had significantly higher female literacy than male literacy rates, and was wondering *why* those four.

I immediately smelled a fish: that's a pretty random list of countries, and 'significance' is a slippery fish when we're talking about hundreds of countries. So, I went and got the data from Wikipedia (which was from the CIA World Factbook) and looked at it myself. The first problem was that it was displayed in proportions of the total population, which are problematic from a statistical point of view: lots of countries were close to 1 for both male and female literacy, making it difficult to discuss significance from a linear model (i.e. a regression). Furthermore, it's hard to see the effect of smaller population sizes (and thus smaller sample sizes) on the variance in the rates. I recalculated it as the ratio of literate females to literate males (and vice-versa) and looked at these much better-behaved numbers.

http://goo.gl/2wLgUX

First, if we do a regression, we find that the slope is parallel to a 1-1 line, but the intercept is lowered: in other words, ACROSS THE ENTIRE WORLD, there is a systematic bias toward male literacy rates being higher than female literacy rates. We can then use classic two-tailed 95% confidence intervals on the regressions. We can then identify the outliers as countries which have significantly higher male literacy or significantly higher female literacy.

> #these countries have more literate males to unliterate males than expected
> row.names(data)[pred1[,2]>y]
[1] "BosniaandHerzegovina" "Sudan" "Tanzania"

> #these countries have more literate females to unliterate females than expected
> row.names(data)[pred1[,3]<y]
[1] "AntiguaandBarbuda" "Bermuda" "Israel"
[4] "Kenya" "Laos" "Yemen"

Which are completely different than the four countries the original poster listed, and thus an immediate red flag: changing how we look at our data has changed our answer!

However, I'm not done! Every brand of statistics can be laid to waste by the Multiple Comparisons issue, and 95% CI still say we'd expect 1 outlier for every 20 observations. We have 195 countries here. How many outliers would we expect given that we are effectively testing for 'significant outlier-ism' with each one?

195/20 = 9.75

And we have 9 outliers up above. Hmmm.

Okay, so if we really want to test if any country is really different from our statistical model, then we want a really strenuous statistical test that accounts for each comparison we're making: i.e. each point against the confidence intervals. A common fix is a Bonferroni correction, where we divide the alpha (0.05) by the 195 comparisons we want to make, which means we are now using 99.9... confidence intervals. Yeah, it's gonna be tough for anything to be an outlier.
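For readers who want to check the arithmetic, here is a short Python sketch of the multiple-comparisons numbers described above. It only reproduces the two calculations (expected chance outliers, and the Bonferroni-corrected level); it is not the regression itself, and the variable names are illustrative.

```python
ALPHA = 0.05        # per-test significance level behind a 95% interval
N_COUNTRIES = 195   # number of comparisons, one per country

# Expected number of 'outliers' from chance alone at the uncorrected level.
expected_false_positives = ALPHA * N_COUNTRIES

# Bonferroni: split alpha evenly across all the comparisons.
bonferroni_alpha = ALPHA / N_COUNTRIES
bonferroni_confidence = 1 - bonferroni_alpha

print(f"Expected chance outliers: {expected_false_positives:.2f}")
print(f"Corrected alpha:          {bonferroni_alpha:.6f}")
print(f"Corrected interval level: {bonferroni_confidence:.4%}")
```

That works out to about 9.75 expected chance outliers, and a corrected interval level of roughly 99.97%.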

And... (see bonferroni CI plot)... nothing is. There's no evidence from this data that any country really represents a break from a model where male literacy beats female literacy. There's just a lot of variance and a lot of countries.

I don't do social science though, so I might have missed something that disproves my little statistical exercise. Just like I wouldn't believe a non-paleontologist saying anything about the fossil record either. But as I said in the G+ thread, I don't see any evidence that any country really has a significant difference in male/female literacy ratios.


Monday, April 7, 2014
