Comments on Economystified: US Homicide Rates

2014-04-09T10:08:23.211-07:00

On that point, I was recently involved in a G+ conversation on literacy rates. The original poster suggested that the United Arab Emirates, Kuwait, Jamaica and The Bahamas had significantly higher female literacy than male literacy rates, and was wondering *why* those four.

I immediately smelled a fish: that's a pretty random list of countries, and 'significance' is a slippery fish when we're talking about hundreds of countries. So, I went and got the data from Wikipedia (which was from the CIA World Factbook) and looked at it myself. The first problem was that it was displayed as proportions of the total population, which are problematic from a statistical point of view: lots of countries were close to 1 for both male and female literacy, making it difficult to discuss significance from a linear model (i.e. a regression). Furthermore, it's hard to see the effect of smaller population sizes (and thus smaller sample sizes) on the variance in the rates. I recalculated it as the ratio of literate females to literate males (and vice-versa) and looked at these much better-behaved numbers.

http://goo.gl/2wLgUX

First, if we do a regression, we find that the fitted line is parallel to a 1-1 line, but the intercept is lowered: in other words, ACROSS THE ENTIRE WORLD, there is a systematic bias toward male literacy rates being higher than female literacy rates. We can then use classic two-tailed 95% confidence intervals on the regressions.
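A minimal sketch of that regression step, run on simulated numbers rather than the Factbook data. The names `x`, `fit`, `below` and `above` are made up for illustration, and it assumes `pred1` came from `predict()` with `interval = "prediction"`, which the comment doesn't actually confirm.

```r
# Simulated stand-in for the literacy data (hypothetical values, NOT the Factbook numbers)
set.seed(1)
n <- 195
x <- rnorm(n)                          # stand-in for the male literacy measure
y <- -0.3 + x + rnorm(n, sd = 0.5)     # slope of 1, lowered intercept, plus noise
data <- data.frame(x = x, y = y, row.names = paste0("Country", seq_len(n)))

# Ordinary least-squares regression of the female measure on the male measure
fit <- lm(y ~ x, data = data)

# Columns of pred1 are fit, lwr, upr: a two-tailed 95% interval around the line
# (assumption: a prediction interval, since individual points are being judged)
pred1 <- predict(fit, newdata = data, interval = "prediction", level = 0.95)

# Points falling outside the interval on either side
below <- row.names(data)[pred1[, "lwr"] > data$y]
above <- row.names(data)[pred1[, "upr"] < data$y]
length(below) + length(above)
```

With 195 points and a 95% interval, a handful of points landing outside the band is expected from noise alone, even when nothing unusual is going on.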
We can then identify the outliers as countries which have significantly higher male literacy or significantly higher female literacy.

> #these countries have more literate males to illiterate males than expected
> row.names(data)[pred1[,2]>y]
[1] "BosniaandHerzegovina" "Sudan" "Tanzania"

> #these countries have more literate females to illiterate females than expected
> row.names(data)[pred1[,3]<y]
[1] "AntiguaandBarbuda" "Bermuda" "Israel"
[4] "Kenya" "Laos" "Yemen"

These are completely different from the four countries the original poster listed, and thus an immediate red flag: changing how we look at our data has changed our answer!

However, I'm not done! Every brand of statistics can be laid to waste by the Multiple Comparisons issue, and 95% CIs still say we'd expect 1 outlier for every 20 observations. We have 195 countries here. How many outliers would we expect given that we are effectively testing for 'significant outlier-ism' with each one?

195/20 = 9.75

And we have 9 outliers up above. Hmmm.

Okay, so if we really want to test if any country is really different from our statistical model, then we want a really strenuous statistical test that accounts for each comparison we're making: i.e. each point against the confidence intervals. A common fix is a Bonferroni correction, where we divide the alpha (0.05) by the 195 comparisons we want to make, which means we are now using roughly 99.97% confidence intervals. Yeah, it's gonna be tough for anything to be an outlier.

And... (see bonferroni CI plot)... nothing is. There's no evidence from this data that any country really represents a break from a model where male literacy beats female literacy. There's just a lot of variance and a lot of countries.

I don't do social science though, so I might have missed something that disproves my little statistical exercise.
I wouldn't believe a non-paleontologist saying anything about the fossil record either. But as I said in the G+ thread, I don't see any evidence that any country really has a significant difference in male/female literacy ratios.

dwbapst (https://www.blogger.com/profile/17606476387441191531)
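The multiple-comparisons arithmetic in the comment (195 tests at alpha 0.05, then the Bonferroni division) can be checked in a few lines of R. This is a sketch of the arithmetic only. The variable names are made up for illustration, and the normal-quantile comparison is an added assumption to show how much wider the corrected intervals get, not something the original analysis reports.

```r
n_tests <- 195     # one comparison per country
alpha   <- 0.05    # the usual two-tailed 95% level

# Outliers expected from chance alone at the uncorrected level
expected_by_chance <- n_tests * alpha      # 9.75

# Bonferroni correction: split alpha evenly across all comparisons
alpha_bonf <- alpha / n_tests              # about 0.000256
conf_level <- 1 - alpha_bonf               # about 0.9997

# Interval half-widths in standard-deviation units (normal approximation,
# an illustrative assumption rather than the t-based intervals lm would use)
z_95   <- qnorm(1 - alpha / 2)             # about 1.96
z_bonf <- qnorm(1 - alpha_bonf / 2)        # noticeably wider than z_95

c(expected_by_chance = expected_by_chance,
  conf_level = conf_level,
  z_95 = z_95,
  z_bonf = z_bonf)
```

Passing `level = conf_level` to `predict()` in place of 0.95 would be one way to get Bonferroni-style intervals for a regression, assuming the intervals were produced with `predict()` in the first place.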