If you are like me and watch an unhealthy amount of professional basketball, it would behoove you to check out Stats.NBA.com and spend the next, oh, 8 hours or so poring over “advanced metrics.” New statistical categories like TS% (True Shooting Percentage), PIE (Player Impact Estimate), and EFF (Efficiency Rating) have been created in the past few years to explain the game in ways plain ol’ points, rebounds, and assists cannot.
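To give a flavor of what one of these metrics does, here is a minimal sketch of True Shooting Percentage using the commonly cited formula, TS% = PTS / (2 × (FGA + 0.44 × FTA)). The function name and the stat line are made up for illustration; they are not taken from the site.

```python
def true_shooting_pct(points, fga, fta):
    """Commonly cited TS% formula: PTS / (2 * (FGA + 0.44 * FTA)).

    fga = field goal attempts, fta = free throw attempts.
    The 0.44 factor approximates how many free throws use up a possession.
    """
    attempts = fga + 0.44 * fta
    if attempts == 0:
        raise ValueError("player has no shooting attempts")
    return points / (2 * attempts)


# Hypothetical stat line, for illustration only.
ts = true_shooting_pct(points=25, fga=18, fta=5)
print(f"TS%: {ts:.1%}")
```

Unlike plain field goal percentage, this folds three-pointers and free throws into a single efficiency number, which is why it can tell a different story than points alone.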
The “statistical revolution” in basketball was started by nerds, embraced by bloggers, co-opted by front offices, and is now packaged in friendly charts and graphs. The site is comprehensive, easy to use, and pretty to look at. Check it out.
I confess, it’s true: as a statistician, I find it easy to position a storyline a certain way.
But numbers themselves don’t lie; people do. What’s even worse is when certain numbers are removed wholesale from data sets. You really don’t see this happen in market research studies. When it does, it is usually because the data was incorrectly defined when collected, and the collection is usually redone. But it sure does happen with the U.S. government. Especially, it seems, with the really important numbers they share with the general population: the bellwether benchmark numbers of how the country is doing.
A great example is the official unemployment rate. It’s big news nowadays given the sluggish economy, and big news in a presidential election year. Up until 1994 the reported figure was, give or take, the percentage of people of legal working age, not in school and not retired, who were unemployed. Then, for whatever reason, the method for calculating this number was changed, overnight, by the U.S. Bureau of Labor Statistics. They removed what were termed “discouraged workers.”
Take the U.S. unemployment number (chart, right). The red line (U.3) is what the government says is going on: the official number that is reported, i.e., the monthly headline number we all read. The gray line (U.6) includes “short-term discouraged workers,” those seeking full-time employment but only employed part-time, i.e., the ‘under-employed’ in work that was once only Christmas help and jobs for schoolkids. This number is not included in unemployment statistics, nor is it reported. A big difference. The blue line (SGS) includes “long-term discouraged workers,” defined as “those who have looked for work in the past 12 months, but are not currently looking because of real or perceived poor employment prospects,” i.e., those who’ve tried everything they know, and don’t know what to do. Or, in other words, lots of my friends who have been out of work for a while. This number is not included in unemployment statistics, nor is it reported. A huge difference that is baffling.
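For the curious, here is a rough sketch of how the first two lines differ arithmetically, going by the Bureau of Labor Statistics’ published definitions as I understand them: U-3 is the unemployed as a share of the labor force, while U-6 adds the “marginally attached” (which includes discouraged workers) and those working part-time for economic reasons. The function names and every number below are hypothetical, chosen only to show the mechanics. The SGS line is not reproduced, since its method isn’t spelled out here.

```python
def u3_rate(unemployed, labor_force):
    """Headline rate: unemployed as a share of the civilian labor force."""
    return unemployed / labor_force


def u6_rate(unemployed, marginally_attached, part_time_economic, labor_force):
    """Broader rate: adds the marginally attached and involuntary part-timers.

    The marginally attached are added to the denominator as well, because
    they are not counted in the labor force to begin with.
    """
    numerator = unemployed + marginally_attached + part_time_economic
    return numerator / (labor_force + marginally_attached)


# Hypothetical figures in millions of people, for illustration only.
labor_force = 154.0
unemployed = 14.0
marginally_attached = 2.5
part_time_economic = 8.5

u3 = u3_rate(unemployed, labor_force)
u6 = u6_rate(unemployed, marginally_attached, part_time_economic, labor_force)
print(f"U-3: {u3:.1%}   U-6: {u6:.1%}")
```

Same people, same month, two quite different headline numbers, depending only on who gets counted.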
A great place to dig and find the numbers that may better reflect what’s really going on is “Shadow Government Statistics.” Sure, it may have a bit of an anti-government conspiratorial tone, but it offers food for thought…
The President’s (not too) recent State of the Union Address finally has me thinking, so a post. The crux of his narrative was about the state of the U.S. economy and what needs to be done. Part of the issue, of course, is the ever-lingering problem we all face regarding high unemployment.
Statistics are the “science” of what I do for a living. Some revere the practice, others abhor its positivistic tenets, and others just scratch their heads and nod.
When speaking about out-of-work Americans, both we the people and the President zero in on statistics as a quick benchmark to gauge how bad (or good) things are nowadays. Specifically, that statistic is the percent of Americans unemployed within a particular period of time. This is the heuristic we all use: Obama, economists, cab drivers, Oprah. Yet, we all mutter under our breath that it’s not a “true measure” of those unemployed, and underneath we’ve been taught since our youth that “statistics lie” and “can be manipulated.” Maybe so.
Yet, often when my clients, the President and the rest of us focus on unemployment, we commit a classic boo-boo in the realm of statistics by focusing on the incorrect statistic when explaining a phenomenon. Statistics don’t “lie,” they can’t, they’re just dumb numbers. It’s people’s misperception and subsequent misinterpretation of their respective findings that are off the mark. For example, what matters when speaking about those out of work is to in fact assess those who are employed.
The employed-to-population number is a ratio, or statistic: the share of those in the U.S. 16 years of age or older, and not in the Armed Forces or in institutions (e.g., VA, nursing homes, prisons), who are employed. That statistic now stands at 58.3%. Hence, just a bit more than half of us are working. That’s it! The interpretation of 58.3% working is a heck of a lot different than an unemployment rate of about nine percent. Neither number lies. And theoretically, they’re examining the same phenomenon.
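The reason both numbers can be true at once is that they use different denominators: the unemployment rate divides by the labor force (employed plus unemployed), while the employed-to-population ratio divides by everyone in the eligible population, including people who aren’t looking for work at all. Here is a small sketch of that arithmetic. The function name is my own, and the figures are round, made-up numbers picked to land near the percentages above, not actual data.

```python
def labor_stats(employed, unemployed, not_in_labor_force):
    """Compute both statistics from the same three head counts."""
    labor_force = employed + unemployed
    population = labor_force + not_in_labor_force
    return {
        # Denominator: only people working or actively looking.
        "unemployment_rate": unemployed / labor_force,
        # Denominator: everyone in the eligible population.
        "employment_population_ratio": employed / population,
    }


# Hypothetical head counts in millions, for illustration only.
stats = labor_stats(employed=140, unemployed=14, not_in_labor_force=86)
print(f"Unemployment rate:            {stats['unemployment_rate']:.1%}")
print(f"Employed-to-population ratio: {stats['employment_population_ratio']:.1%}")
```

The 86 million “not in the labor force” never show up in the unemployment rate, but they pull the employment ratio down to well under 60%.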
The moral of this rant: don’t obsess about the statistic presented to you. Instead, ask first what you want to measure. Or as Chuck D. says: “don’t believe the hype!”
As we finish week two of the NFL, I wanted to point out an interesting use of sports statistics made by the New York Times. Specifically, the Times looked at the connection between winning percentage and games lost to injury. Using a fairly simple but logical weighting scheme, the Times came up with 10 rules, ranging from the obvious (injuries, or their absence, have a drastic effect on a team’s success) to the eye-opening (hamstring injuries linger – and NFL teams don’t know).
It’s a good article to keep in mind as your favorite NFL team makes you happy or serves up week after week of crushing despair.
XKCD is a web-comic with a rather sizable fanbase, particularly among the Digg crowd. For me personally, it’s hit or miss. However, from time to time, a strip comes up that I find both humorous and sometimes informative. Thought I’d share a couple of my favorites.
And don’t forget…
Correlation ≠ Causation