Over the past few decades, criminology and economics have developed an interesting relationship. There are quite a few economists who research crime. And at plenty of universities, the two departments are tightly linked.
Why? One obvious motivation is better understanding how economic realities might drive individuals to crime. But there's another reason - there is a TON of data on crime! And economists love data!
Many of the analysis methods economists learn aren't econ specific - the techniques can be applied to any kind of data.
Enterprising economists looking for an under-examined (or less crowded) field may turn to crime stats for unique and salient avenues of research.
The FBI Uniform Crime Report
The FBI releases an annual data set of broad but comprehensive crime stats called the Uniform Crime Report (UCR). The UCR compiles records of all known crimes committed in any US city with a population over 10,000 in a given year.
I downloaded the most recent complete UCR the other day and started playing with the data. Frequently, I found myself surprised by what I was seeing. Quite a few cities I think of as "dangerous" actually had lower crime rates than I anticipated, and vice versa.
I can't tell you where, exactly, I got the idea that NYC was an unsafe place, for example. But it turns out, the Big Apple has some of the lowest crime rates in the nation. The completely incorrect idea had gotten into my head somehow, and I never even thought to question it.
We often don't think about our own intuitions or assumptions, or wonder how they developed, do we? We just kinda feel like we know what we know - it's not until we take the exam that we realize there are gaps in our understanding.
But data gives us the objective information needed to keep those prejudices in check.
"Murder and Nonnegligent Manslaughter"
Although there's A LOT of different data included in the report, I'd like to focus on the homicide rates in the largest US cities.
I'm choosing homicide both because of its severity and because its definition is relatively consistent across police departments.
Additionally, since many other types of crime (rapes, robberies, etc.) tend to go unreported, and dead bodies are hard to miss or ignore, I take this tally to be one of the most substantial and reliable ones included in the UCR.
Definition
The UCR defines murder and nonnegligent manslaughter as:
"The willful (nonnegligent) killing of one human being by another. The classification of this offense is based solely on police investigation as opposed to the determination of a court, medical examiner, coroner, jury, or other judicial body. The UCR Program does not include the following situations in this offense classification: deaths caused by negligence, suicide, or accident; justifiable homicides; and attempts to murder or assaults to murder, which are scored as aggravated assaults."
'Major' Cities
There are 36,011 cities, towns, and villages in the US.
Of course, most of these aren't included in the UCR data, as it only considers municipalities with populations over 100,000. There are 597 of these.
And since this is just a humble blog post, I filtered that number down even more, and looked just at cities with populations of 250,000 or more (of which there are 72 nationwide). No particular reason I picked 250K; it's completely arbitrary, but let's call them the 'major' US cities for now.
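The filtering and the "1 in N" odds used in the tables below are simple enough to sketch in a few lines of Python. Note the city records here are made-up placeholders, not actual UCR figures:

```python
# Sketch: compute "1 in N" homicide odds and filter to 'major' cities.
# These city figures are invented placeholders, not real UCR values.
cities = [
    {"name": "Alphaville", "population": 700_000, "homicides": 350},
    {"name": "Betatown",   "population": 120_000, "homicides": 2},
    {"name": "Gammaburg",  "population": 260_000, "homicides": 13},
]

MAJOR_THRESHOLD = 250_000  # the same arbitrary cutoff used in this post

major = [c for c in cities if c["population"] >= MAJOR_THRESHOLD]
for c in major:
    # "Chances any resident would be a victim were 1 in": population / homicides
    odds = c["population"] / c["homicides"]
    print(f"{c['name']}: 1 in {odds:,.0f}")
```

The same division is how the Flint and Camden figures in the note below work: population divided by homicide count.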
Lowest Homicide Rates
SO - of the US's 72 cities with populations of 250k or greater, here are the ones you were least likely to be killed in during 2012. Do any of these surprise you?
Rank | City | State | Chances any resident would be a victim of a homicide were 1 in: |
1 | Plano | TX | 273,816 |
2 | Lincoln | NE | 88,058 |
3 | Henderson | NV | 65,867 |
4 | Mesa | AZ | 32,242 |
5 | Santa Ana | CA | 30,226 |
6 | Portland | OR | 29,902 |
7 | El Paso | TX | 29,371 |
8 | San Diego | CA | 28,478 |
9 | Seattle | WA | 27,255 |
10 | Austin | TX | 26,868 |
11 | Lexington | KY | 25,194 |
12 | Raleigh | NC | 24,741 |
13 | Colorado Springs | CO | 24,016 |
14 | Anaheim | CA | 22,968 |
15 | Jersey City | NJ | 22,869 |
16 | St. Paul | MN | 22,362 |
17 | Arlington | TX | 22,311 |
18 | San Jose | CA | 21,699 |
19 | Virginia Beach | VA | 21,314 |
20 | Anchorage | AK | 19,943 |
21 | New York | NY | 19,784 |
22 | Riverside | CA | 19,596 |
23 | Corpus Christi | TX | 19,535 |
24 | Las Vegas | NV | 19,466 |
25 | Fort Worth | TX | 17,502 |
26 | Wichita | KS | 16,800 |
27 | Denver | CO | 16,117 |
28 | San Antonio | TX | 15,507 |
29 | Tampa | FL | 15,250 |
30 | Long Beach | CA | 14,684 |
31 | Sacramento | CA | 14,016 |
32 | Albuquerque | NM | 13,504 |
33 | Greensboro | NC | 13,149 |
34 | Los Angeles | CA | 12,893 |
35 | Tucson | AZ | 12,361 |
36 | Phoenix | AZ | 12,077 |
Highest Homicide Rates
Rank | City | State | Chances any resident would be a victim of a homicide were 1 in: |
1 | Detroit | MI | 1,832 |
2 | New Orleans | LA | 1,880 |
3 | St. Louis | MO | 2,820 |
4 | Baltimore | MD | 2,869 |
5 | Newark | NJ | 2,905 |
6 | Oakland | CA | 3,146 |
7 | Stockton | CA | 4,213 |
8 | Kansas City | MO | 4,420 |
9 | Philadelphia | PA | 4,649 |
10 | Memphis | TN | 4,943 |
11 | Atlanta | GA | 5,266 |
12 | Chicago | IL | 5,417 |
13 | Buffalo | NY | 5,467 |
14 | Miami | FL | 6,005 |
15 | Cincinnati | OH | 6,439 |
16 | Milwaukee | WI | 6,587 |
17 | Oklahoma City | OK | 7,007 |
18 | Washington | DC | 7,185 |
19 | Toledo | OH | 7,334 |
20 | Pittsburgh | PA | 7,612 |
21 | Mobile | AL | 7,860 |
22 | Dallas | TX | 8,062 |
23 | Indianapolis | IN | 8,646 |
24 | Jacksonville | FL | 9,039 |
25 | Tulsa | OK | 9,498 |
26 | Fresno | CA | 9,922 |
27 | Minneapolis | MN | 10,006 |
28 | Nashville | TN | 10,014 |
29 | Houston | TX | 10,034 |
30 | Omaha | NE | 10,194 |
31 | Bakersfield | CA | 10,462 |
32 | Louisville | KY | 10,745 |
33 | Boston | MA | 11,064 |
34 | Aurora | CO | 11,619 |
35 | Fort Wayne | IN | 11,665 |
36 | San Francisco | CA | 11,889 |
*Note - Flint, MI, and Camden, NJ actually had the highest homicide rates in the country. In Flint, 1 in 1,613 residents were victims of a homicide, in Camden it was 1 in 1,159. However, they didn't make this list, because their populations are below my 250,000 threshold.
Do analysis of your own
Of course, this is just a demonstration of the sort of stuff you'll find in the UCR. I highly encourage you to download the data set and mess around with it for yourself! Share any interesting findings in the comments section below.
(You'll download the data as a zip file. You'll have to 'unzip' the file to extract the Excel spreadsheets that make up the UCR. If you don't know how to unzip a file, watch this video. If you don't know how to use Excel, I'd start by watching this playlist.)
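If you'd rather script the unzipping step than do it by hand, here's a minimal Python sketch; the archive and folder names are hypothetical stand-ins for whatever you downloaded:

```python
# Sketch: unzip a downloaded UCR archive programmatically.
# 'ucr.zip' and 'ucr_data' are hypothetical names - use your own paths.
import zipfile
from pathlib import Path

def extract_archive(zip_path, dest_dir):
    """Extract every file in zip_path into dest_dir; return the extracted paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        return [dest / name for name in zf.namelist()]

# Usage, once you've downloaded the archive:
# spreadsheets = extract_archive("ucr.zip", "ucr_data")
```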
Caveats
The FBI strongly encourages people not to do what I'm doing here: rank localities.
They don't like seeing it done because it's not really fair to compare very different cities side by side on simple crime stats. Every city has its own relevant demographics, economic situation, policing practices, number of police, culture, urban geography... heck, even weather might affect crime rates.
Because of this, the FBI doesn't like to see ranking lists. They're afraid people will draw unsubstantiated inferences from them.
Maybe a reporter finds City A has twice the car theft rate as City B, and jumps to the conclusion that thieves must be bolder in City A, and police less effective.
But what if City A also just has twice the car-ownership rates? Or half the off-street parking? Or half the number of car alarms?
If the reporter misses those facts, they'd fail to see that the cops and robbers in the two cities could very well be equally busy, and equally successful. The conclusion is revealed as erroneous once more info is introduced.
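The car-theft trap is just arithmetic, so it's easy to make concrete. The numbers here are invented for illustration:

```python
# Sketch of the reporter's trap, with invented numbers:
# City A has twice the per-resident theft rate, but also twice the cars.
city_a = {"residents": 100_000, "cars": 60_000, "thefts": 600}
city_b = {"residents": 100_000, "cars": 30_000, "thefts": 300}

per_resident_a = city_a["thefts"] / city_a["residents"]  # 0.006
per_resident_b = city_b["thefts"] / city_b["residents"]  # 0.003
per_car_a = city_a["thefts"] / city_a["cars"]            # 0.01
per_car_b = city_b["thefts"] / city_b["cars"]            # 0.01

# Per resident, City A looks twice as dangerous.
# Per car, the two cities are identical.
```

Same raw counts, opposite story, depending on what you normalize by.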
The forces that drive crime are so complicated that even a tiny bit of additional information might completely discredit a researcher's otherwise reasonable conclusion.
Ultimately, however, they're confusing cause and effect.
The FBI cautions against rankings because they're afraid people will draw bad inferences from them. But that's a reason to caution against snap judgments, not a reason to caution against ranking lists themselves.
You need to create ranks and averages and indexes to give individual data points context, which is the only way they have meaning.
"One in every thousand citizens of our city had a cellphone stolen this year." ...Ok, so what? Is that a lot or a little? Are you saying I don't need to worry about it, or that I do?
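Ranking is exactly the operation that answers that question. A tiny sketch, with invented peer-city rates:

```python
# Sketch: a raw rate means little until ranked against peers.
# All rates here are invented for illustration.
peer_rates = [1/250, 1/400, 1/700, 1/1500, 1/2200]  # thefts per resident
our_rate = 1/1000

# How many peer cities have a higher theft rate than ours?
higher = sum(rate > our_rate for rate in peer_rates)
print(f"{higher} of {len(peer_rates)} peer cities are worse off than us")
```

The single number 1/1000 is mute; its position in the list is what tells you whether to worry.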
So I say, listicle on! There's nothing wrong with putting the stats out there. But, dear reader, do be careful in drawing conclusions from what you see, and always be asking "What else could this be? What else could have caused this?"
On that point, I was recently involved in a G+ conversation on literacy rates, in which the original poster suggested that the United Arab Emirates, Kuwait, Jamaica, and The Bahamas had significantly higher female literacy rates than male literacy rates, and wondered *why* those four.
I immediately smelled a fish: that's a pretty random list of countries, and 'significance' is a slippery fish when we're talking about hundreds of countries. So I went and got the data from Wikipedia (which was sourced from the CIA World Factbook) and looked at it myself. The first problem was that it was displayed as proportions of the total population, which are problematic from a statistical point of view: lots of countries were close to 1 for both male and female literacy, making it difficult to discuss significance from a linear model (i.e. a regression). Furthermore, it's hard to see the effect of smaller population sizes (and thus smaller sample sizes) on the variance in the rates. I recalculated it as the ratio of literate females to literate males (and vice versa) and looked at these much better-behaved numbers.
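One plausible reading of that transformation, sketched in Python (the rates below are invented examples, and with equal male and female populations the rate ratio equals the count ratio):

```python
# Sketch of the ratio transformation: instead of two raw proportions
# that both sit near 1.0, use the female:male literacy ratio.
def literacy_ratio(female_rate, male_rate):
    """Ratio of female to male literacy. 1.0 is parity;
    above 1.0 means women out-literate men."""
    return female_rate / male_rate

# Two hypothetical countries whose raw rates both look 'close to 1':
print(literacy_ratio(0.97, 0.99))  # just under 1 -> slight male advantage
print(literacy_ratio(0.95, 0.90))  # above 1 -> female advantage
```

The ratio spreads out exactly the variation the raw proportions compress against the ceiling at 1.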
http://goo.gl/2wLgUX
First, if we do a regression, we find that the slope is parallel to the 1:1 line but the intercept is lowered: in other words, ACROSS THE ENTIRE WORLD, there is a systematic bias toward male literacy rates being higher than female literacy rates. We can then apply classic two-tailed 95% confidence intervals to the regression, and identify as outliers the countries with significantly higher male literacy or significantly higher female literacy than expected.
> # these countries have a higher ratio of literate to illiterate males than expected
> row.names(data)[pred1[,2]>y]
[1] "BosniaandHerzegovina" "Sudan" "Tanzania"
> # these countries have a higher ratio of literate to illiterate females than expected
> row.names(data)[pred1[,3]<y]
[1] "AntiguaandBarbuda" "Bermuda" "Israel"
[4] "Kenya" "Laos" "Yemen"
Which are completely different than the four countries the original poster listed, and thus an immediate red flag: changing how we look at our data has changed our answer!
However, I'm not done! Every brand of statistics can be laid to waste by the Multiple Comparisons issue, and 95% CIs still mean we'd expect 1 outlier for every 20 observations. We have 195 countries here. How many outliers would we expect, given that we are effectively testing for 'significant outlier-ism' with each one?
195/20 = 9.75
And we have 9 outliers up above. Hmmm.
Okay, so if we really want to test whether any country is really different from our statistical model, then we want a really strenuous statistical test that accounts for each comparison we're making: i.e. each point against the confidence intervals. A common fix is a Bonferroni correction, where we divide the alpha (0.05) by the 195 comparisons we want to make, which means we are now using 99.97% confidence intervals. Yeah, it's gonna be tough for anything to be an outlier.
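To see how much stricter that makes the cutoff, here's a standard-library Python sketch comparing the naive and Bonferroni-corrected two-tailed critical values:

```python
# Sketch: how much the Bonferroni correction widens the intervals.
# Standard library only; 195 = number of countries being compared.
from statistics import NormalDist

alpha, n_tests = 0.05, 195
z_naive = NormalDist().inv_cdf(1 - alpha / 2)                   # ~1.96
z_bonferroni = NormalDist().inv_cdf(1 - alpha / (2 * n_tests))  # much larger

print(f"naive z: {z_naive:.2f}, Bonferroni z: {z_bonferroni:.2f}")
```

The corrected cutoff is nearly twice as many standard deviations out, which is why nothing survives it below.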
And... (see the Bonferroni CI plot)... nothing is. There's no evidence from this data that any country really represents a break from a model where male literacy beats female literacy. There's just a lot of variance and a lot of countries.
I don't do social science though, so I might have missed something that disproves my little statistical exercise. Just like I wouldn't believe a non-paleontologist saying anything about the fossil record either. But as I said in the G+ thread, I don't see any evidence that any country really has a significant difference in male/female literacy ratios.