

Follow Economystified on facebook

All posts by Dan Whalen (LinkedIn, Github)

Monday, April 7, 2014

US Homicide Rates

Over the past few decades, criminology and economics have developed an interesting relationship. Quite a few economists research crime, and at plenty of universities the two departments are tightly linked.

Why?  One obvious motivation is better understanding how economic realities might drive individuals to crime.  But there's another reason - there is a TON of data on crime!  And economists love data!

Many of the analysis methods economists learn aren't econ-specific; the techniques can be applied to any kind of data.

Enterprising economists looking for an under-examined (or less crowded) field may turn to crime stats for unique and salient avenues of research.


The FBI Uniform Crime Report

The FBI releases an annual data set of broad but comprehensive crime statistics called the Uniform Crime Report (UCR). The UCR compiles records of all known crimes committed in a year in any US city with a population over 10,000.

I downloaded the most recent complete UCR the other day and started playing with the data. Frequently, I found myself surprised by what I was seeing. Quite a few cities I think of as "dangerous" actually had lower crime rates than I anticipated, and vice versa.

I can't tell you where, exactly, I got the idea that NYC was an unsafe place, for example.  But it turns out, the Big Apple has some of the lowest crime rates in the nation.  The completely incorrect idea had gotten into my head somehow, and I never even thought to question it.

We often don't think about our own intuition or assumptions, or wonder how they develop, do we? We just kinda feel like we know what we know; it's not until we take the exam that we realize there are gaps in our understanding.

But data gives us the objective information needed to keep those prejudices in check.


"Murder and Nonnegligent Manslaughter"

Although there's A LOT of different data included in the report, I'd like to focus on the homicide rates in the largest US cities.  

I'm choosing homicide both because of its severity and because its definition is relatively consistent across police departments.

Additionally, since many other types of crimes (rapes, robberies, etc.) often go unreported, while dead bodies are hard to miss or ignore, I consider this tally one of the most substantial and reliable in the UCR.


Definition


"The willful (nonnegligent) killing of one human being by another. The classification of this offense is based solely on police investigation as opposed to the determination of a court, medical examiner, coroner, jury, or other judicial body. The UCR Program does not include the following situations in this offense classification: deaths caused by negligence, suicide, or accident; justifiable homicides; and attempts to murder or assaults to murder, which are scored as aggravated assaults." 


'Major' Cities

There are 36,011 cities, towns, and villages in the US.

Of course, most of these aren't included in the UCR data, as it only considers municipalities with populations over 100,000.  There are 597 of these.

And since this is just a humble blog post, I filtered that number down even more and looked only at cities with populations of 250,000 or more (of which there are 72 nationwide). There's no particular reason I picked 250K; it's completely arbitrary, but let's call them the 'major' US cities for now.
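
For anyone repeating this with the raw spreadsheets, the filtering step could look something like the minimal Python sketch below. It assumes each city has already been read into a simple (city, state, population, homicides) record; the city names, numbers, and the `major_cities` helper are hypothetical placeholders, not UCR figures or FBI tooling.

```python
# Hypothetical records: (city, state, population, homicides).
# Made-up numbers for illustration only; these are NOT UCR figures.
records = [
    ("City A", "XX", 1_200_000, 150),
    ("City B", "XX", 300_000, 12),
    ("City C", "XX", 180_000, 30),
    ("City D", "XX", 250_000, 5),
]

THRESHOLD = 250_000  # the post's arbitrary cutoff for a 'major' city


def major_cities(rows, threshold=THRESHOLD):
    """Keep only cities whose population is at least `threshold`."""
    return [row for row in rows if row[2] >= threshold]


print([row[0] for row in major_cities(records)])  # ['City A', 'City B', 'City D']
```

Using `>=` matches "250,000 or more"; a city of exactly 250,000 stays in.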


Lowest Homicide Rates

So, of the US's 72 cities with populations of 250K or greater, here are the 36 you were least likely to be killed in during 2012. Do any of these surprise you?

Rank  City              State  Odds of being a homicide victim (1 in N)
1     Plano             TX     273,816
2     Lincoln           NE     88,058
3     Henderson         NV     65,867
4     Mesa              AZ     32,242
5     Santa Ana         CA     30,226
6     Portland          OR     29,902
7     El Paso           TX     29,371
8     San Diego         CA     28,478
9     Seattle           WA     27,255
10    Austin            TX     26,868
11    Lexington         KY     25,194
12    Raleigh           NC     24,741
13    Colorado Springs  CO     24,016
14    Anaheim           CA     22,968
15    Jersey City       NJ     22,869
16    St. Paul          MN     22,362
17    Arlington         TX     22,311
18    San Jose          CA     21,699
19    Virginia Beach    VA     21,314
20    Anchorage         AK     19,943
21    New York          NY     19,784
22    Riverside         CA     19,596
23    Corpus Christi    TX     19,535
24    Las Vegas         NV     19,466
25    Fort Worth        TX     17,502
26    Wichita           KS     16,800
27    Denver            CO     16,117
28    San Antonio       TX     15,507
29    Tampa             FL     15,250
30    Long Beach        CA     14,684
31    Sacramento        CA     14,016
32    Albuquerque       NM     13,504
33    Greensboro        NC     13,149
34    Los Angeles       CA     12,893
35    Tucson            AZ     12,361
36    Phoenix           AZ     12,077
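
The "1 in N" figures are just population divided by homicides, so a larger N means a lower homicide rate. A rough sketch of how such a ranking could be produced is below; the inputs and the `one_in_n` helper are hypothetical, not the UCR's own numbers or method.

```python
# Hypothetical (population, homicides) pairs; NOT UCR figures.
cities = {
    "City A": (1_200_000, 150),
    "City B": (300_000, 12),
    "City D": (250_000, 5),
}


def one_in_n(population, homicides):
    """Express a homicide rate as '1 in N' residents (N = population / homicides).

    Returns None when there were no homicides, since N is undefined then.
    """
    if homicides == 0:
        return None
    return round(population / homicides)


odds = {name: one_in_n(pop, homicides) for name, (pop, homicides) in cities.items()}

# A larger N means a LOWER rate, so sorting by N in descending order
# puts the city where a resident was least likely to be killed first.
safest_first = sorted(odds, key=odds.get, reverse=True)
print(safest_first)  # ['City D', 'City B', 'City A']
```

A city with zero homicides gets `None` here and would need to be set aside before sorting.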


Highest Homicide Rates

Rank  City              State  Odds of being a homicide victim (1 in N)
1     Detroit           MI     1,832
2     New Orleans       LA     1,880
3     St. Louis         MO     2,820
4     Baltimore         MD     2,869
5     Newark            NJ     2,905
6     Oakland           CA     3,146
7     Stockton          CA     4,213
8     Kansas City       MO     4,420
9     Philadelphia      PA     4,649
10    Memphis           TN     4,943
11    Atlanta           GA     5,266
12    Chicago           IL     5,417
13    Buffalo           NY     5,467
14    Miami             FL     6,005
15    Cincinnati        OH     6,439
16    Milwaukee         WI     6,587
17    Oklahoma City     OK     7,007
18    Washington        DC     7,185
19    Toledo            OH     7,334
20    Pittsburgh        PA     7,612
21    Mobile            AL     7,860
22    Dallas            TX     8,062
23    Indianapolis      IN     8,646
24    Jacksonville      FL     9,039
25    Tulsa             OK     9,498
26    Fresno            CA     9,922
27    Minneapolis       MN     10,006
28    Nashville         TN     10,014
29    Houston           TX     10,034
30    Omaha             NE     10,194
31    Bakersfield       CA     10,462
32    Louisville        KY     10,745
33    Boston            MA     11,064
34    Aurora            CO     11,619
35    Fort Wayne        IN     11,665
36    San Francisco     CA     11,889

*Note - Flint, MI, and Camden, NJ actually had the highest homicide rates in the country: in Flint, 1 in 1,613 residents was a homicide victim; in Camden, it was 1 in 1,159. However, they didn't make this list because their populations are below my 250,000 threshold.
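
Crime stats are more often quoted as rates per 100,000 residents than as "1 in N" odds. Converting between the two is just 100,000 / N. A small sketch using values from the tables and note above (the `rate_per_100k` helper is a hypothetical name):

```python
def rate_per_100k(one_in):
    """Convert '1 in N' odds into homicides per 100,000 residents."""
    return 100_000 / one_in


# 'One in N' values copied from the tables and note above.
examples = {"Detroit": 1_832, "Camden": 1_159, "San Francisco": 11_889, "Plano": 273_816}

for city, n in examples.items():
    print(f"{city}: {rate_per_100k(n):.1f} homicides per 100,000 residents")
```

Detroit comes out at roughly 54.6 per 100,000 and Plano at about 0.4, a gap of nearly 150 times.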


Do your own analysis

Of course, this is just a demonstration of the sort of stuff you'll find in the UCR. I highly encourage you to download the data set and mess around with it for yourself! Share any interesting findings in the comments section below.

(You'll download the data as a zip file.  You'll have to 'unzip' the file, to extract the Excel spreadsheets that make up the UCR.  If you don't know how to unzip a file, watch this video.  If you don't know how to use Excel, I'd start by watching this playlist.)


Caveats

The FBI strongly encourages people not to do what I'm doing here: rank localities. 

They don't like seeing it done because it's not really fair to compare very different cities side by side on simple crime stats. Every city has its own relevant demographics, economic situation, policing practices, number of police, culture, urban geography...heck, even weather might affect crime rates.

Because of this, the FBI doesn't like to see ranking lists. They're afraid people will draw unsubstantiated inferences from them.

Maybe a reporter finds City A has twice the car theft rate of City B, and jumps to the conclusion that thieves must be bolder in City A, and police less effective.

But what if City A also just has twice the car-ownership rates?  Or half the off-street parking?  Or half the number of car alarms?

If the reporter misses those facts, they'd fail to see that the cops and robbers in the two cities could very well be equally busy, and equally successful.  The conclusion is revealed as erroneous once more info is introduced.
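
The car-ownership point can be made concrete with a little arithmetic: the same theft counts look very different depending on whether you divide by residents or by cars. A minimal sketch with two hypothetical cities (all numbers invented for illustration; `per_1000` is a hypothetical helper):

```python
def per_1000(events, base):
    """Events per 1,000 units of `base` (residents, cars, etc.)."""
    return 1000 * events / base


# Hypothetical cities with equal population but different car ownership.
city_a = {"residents": 100_000, "cars": 60_000, "thefts": 600}
city_b = {"residents": 100_000, "cars": 30_000, "thefts": 300}

for name, c in (("City A", city_a), ("City B", city_b)):
    print(
        name,
        "| thefts per 1,000 residents:", per_1000(c["thefts"], c["residents"]),
        "| thefts per 1,000 cars:", per_1000(c["thefts"], c["cars"]),
    )
```

Per resident, City A's theft rate (6.0) is twice City B's (3.0); per car, both sit at 10.0, which is exactly the "equally busy, equally successful" scenario.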

The forces that drive crime are so complicated that even a tiny bit of additional information might completely discredit a researcher's otherwise reasonable conclusion.


My rebuttal

I think the FBI makes some sensible arguments about the "pitfalls of ranking."  

Ultimately, however, they're confusing cause and effect.

The FBI cautions against rankings because they're afraid people will draw bad inferences from them. But that's a reason to caution against snap judgements, not a reason to caution against ranking lists themselves.

You need to create ranks and averages and indexes to give individual data points context, which is the only way they have meaning.

"One in every thousand citizens of our city had a cellphone stolen this year." ....Ok, so what? Is that a lot or a little? Are you saying I don't need to worry about it, or that I do?
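
One simple way to give that lone number context is a percentile rank: where does it fall among comparable cities? A sketch, assuming ten hypothetical peer cities (the rates and the `percentile_rank` helper are made up for illustration):

```python
def percentile_rank(value, others):
    """Percentage of comparison values strictly below `value`."""
    below = sum(1 for x in others if x < value)
    return 100 * below / len(others)


# Hypothetical cellphone-theft rates (thefts per 1,000 residents)
# for ten comparison cities; invented for illustration.
peer_rates = [0.4, 0.6, 0.7, 0.8, 0.9, 1.2, 1.5, 1.8, 2.1, 2.5]
our_rate = 1.0  # "one in every thousand"

print(percentile_rank(our_rate, peer_rates))  # 50.0
```

Against these made-up peers, "1 in 1,000" lands right in the middle, which is the kind of answer the bare statistic can't give.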

So I say, listicle on!  There's nothing wrong with putting the stats out there.  But, dear reader, do be careful in drawing conclusions from what you see, and always be asking "What else could this be?  What else could have caused this?"

1 comment:

  1. On that point, I was recently involved in a G+ conversation on literacy rates, in which the original poster suggested that the United Arab Emirates, Kuwait, Jamaica and The Bahamas had significantly higher female literacy rates than male literacy rates, and wondered *why* those four.

    I immediately smelled a fish: that's a pretty random list of countries, and 'significance' is a slippery fish when we're talking about hundreds of countries. So, I went and got the data from Wikipedia (which was from the CIA World Factbook) and looked at it myself. The first problem was that it was displayed as proportions of the total population, which are problematic from a statistical point of view: lots of countries were close to 1 for both male and female literacy, making it difficult to discuss significance from a linear model (i.e. a regression). Furthermore, it's hard to see the effect of smaller population sizes (and thus smaller sample sizes) on the variance in the rates. I recalculated it as the ratio of literate females to literate males (and vice versa) and looked at these much better-behaved numbers.

    http://goo.gl/2wLgUX

    First, if we do a regression, we find that the slope is parallel to the 1:1 line, but the intercept is lowered: in other words, ACROSS THE ENTIRE WORLD, there is a systematic bias toward male literacy rates being higher than female literacy rates. We can then put classic two-tailed 95% confidence intervals on the regression and identify the outliers: countries with significantly higher male literacy or significantly higher female literacy.
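
The commenter worked in R. Below is a rough standard-library Python analogue of the outlier check, not the commenter's actual code. It assumes the intervals were prediction intervals (R's `predict(..., interval = "prediction")` returns `fit`, `lwr`, `upr` columns, which would match `pred1[,2]` and `pred1[,3]`, but that is a guess), and it approximates the t critical value with a normal one, which is only reasonable for large n. The `prediction_outliers` helper and the data are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def prediction_outliers(x, y, level=0.95):
    """Fit y = a + b*x by ordinary least squares and return indices of points
    lying outside the two-sided prediction interval at `level`.

    Assumption: a normal critical value stands in for Student's t, which is
    reasonable when n is large (e.g. ~195 countries).
    """
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual standard error with n - 2 degrees of freedom.
    s = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    flagged = []
    for i, (xi, r) in enumerate(zip(x, residuals)):
        # Standard error for predicting one new observation at xi.
        se_pred = s * math.sqrt(1 + 1 / n + (xi - x_bar) ** 2 / sxx)
        if abs(r) > z * se_pred:
            flagged.append(i)
    return flagged


# Hypothetical data: 40 points hugging the line y = x - 0.02, plus one
# point placed far above it. Not real literacy figures.
xs = [0.80 + 0.01 * k for k in range(40)] + [1.00]
ys = [xi - 0.02 + (0.01 if k % 2 else -0.01) for k, xi in enumerate(xs[:40])] + [1.30]
print(prediction_outliers(xs, ys))  # [40]
```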

    > # these countries have more literate males to illiterate males than expected
    > row.names(data)[pred1[,2]>y]
    [1] "BosniaandHerzegovina" "Sudan" "Tanzania"

    > # these countries have more literate females to illiterate females than expected
    > row.names(data)[pred1[,3]<y]
    [1] "AntiguaandBarbuda" "Bermuda" "Israel"
    [4] "Kenya" "Laos" "Yemen"

    These are completely different from the four countries the original poster listed, which is an immediate red flag: changing how we look at our data has changed our answer!

    However, I'm not done! Every brand of statistics can be laid to waste by the multiple comparisons problem, and a 95% CI still says we'd expect 1 outlier for every 20 observations. We have 195 countries here. How many outliers would we expect, given that we are effectively testing each one for 'significant outlier-ism'?

    195/20 = 9.75

    And we have 9 outliers up above. Hmmm.
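
To see how unremarkable 9 flagged countries is, one can treat each country as an independent test with a 5% false-positive chance and use the binomial distribution. Independence is a simplifying assumption (the tests share one fitted line), and `prob_at_least` is a hypothetical helper.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_countries = 195
alpha = 0.05
expected = n_countries * alpha  # false positives expected by chance alone
p_nine_or_more = prob_at_least(9, n_countries, alpha)

print(f"expected false positives: {expected:.2f}")
print(f"P(9 or more flagged by chance alone): {p_nine_or_more:.2f}")
```

Under these assumptions, seeing 9 or more "outliers" from pure noise is more likely than not.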

    Okay, so if we really want to test whether any country is really different from our statistical model, then we want a really stringent statistical test that accounts for each comparison we're making, i.e. each point against the confidence intervals. A common fix is a Bonferroni correction, where we divide the alpha (0.05) by the 195 comparisons we want to make, which means we are now using 99.9... confidence intervals. Yeah, it's gonna be tough for anything to be an outlier.

    And... (see Bonferroni CI plot)... nothing is. There's no evidence from this data that any country really represents a break from a model where male literacy beats female literacy. There's just a lot of variance and a lot of countries.
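
The Bonferroni arithmetic can be sketched in a few lines. The critical values below use the normal distribution as a stand-in for t, an assumption that is reasonable with ~195 observations:

```python
from statistics import NormalDist

alpha = 0.05
m = 195  # number of countries, i.e. comparisons

alpha_per_test = alpha / m       # Bonferroni-adjusted per-test threshold
confidence = 1 - alpha_per_test  # width of the adjusted interval

# Two-sided critical values: how many standard errors out a point must sit.
z_plain = NormalDist().inv_cdf(1 - alpha / 2)
z_bonf = NormalDist().inv_cdf(1 - alpha_per_test / 2)

print(f"per-test alpha = {alpha_per_test:.6f}, confidence = {confidence:.5%}")
print(f"critical z: {z_plain:.2f} unadjusted vs {z_bonf:.2f} Bonferroni")
```

The bar for an "outlier" moves from about 2 standard errors to well over 3.5, which is why nothing clears it.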

    I don't do social science though, so I might have missed something that disproves my little statistical exercise. Just like I wouldn't believe a non-paleontologist saying anything about the fossil record either. But as I said in the G+ thread, I don't see any evidence that any country really has a significant difference in male/female literacy ratios.
