If you are like me and watch an unhealthy amount of professional basketball, it would behoove you to check out NBA Stats and spend the next, oh, 8 hours or so poring over “advanced metrics.” New statistical categories like TS% (True Shooting Percentage), PIE (Player Impact Estimate), and EFF (Efficiency Rating) have been created in the past few years to explain the game in ways plain ol’ points, rebounds, and assists cannot.

NBA Stats - Home

The “statistical revolution” in basketball was started by nerds, embraced by bloggers, co-opted by front offices, and is now packaged in friendly charts and graphs. The site is comprehensive, easy to use, and pretty to look at. Check it out.

I confess, it’s true: as a statistician it can be easy to position a storyline a certain way.

But numbers themselves don’t lie; people do. What’s even worse is when certain numbers are removed wholesale from data sets. You really don’t see this happen in market research studies. If it does, the data was usually incorrectly defined when collected, and the collection is usually redone. But it sure does happen with the U.S. government. Especially, it seems, with really important numbers they share with the general population: the bellwether benchmark numbers of how the country is doing.

A great example is the official unemployment rate. Big news nowadays given the sluggish economy; big news in a presidential election year. Up until 1994 the reported figure was, give or take, the percentage of people of legal working age, not in school and not retired, who were unemployed. Then, for whatever reasons, the method for calculating this number was changed, overnight, by the U.S. Bureau of Labor Statistics. They removed what were termed “discouraged workers.”

Take the U.S. unemployment number (chart, right). The red line (U-3) is what the government says is going on: the official number that is reported, i.e., the monthly headline number we all read. The gray line (U-6) includes “short-term discouraged workers,” those seeking full-time employment but only employed part-time, i.e., the ‘under-employed,’ a category that was once only Christmas help and jobs for schoolkids. This number is not included in unemployment statistics, nor is it reported. A big difference. The blue line (SGS) includes “long-term discouraged workers,” defined as “those who have looked for work in the past 12 months, but are not currently looking because of real or perceived poor employment prospects,” i.e., those who’ve tried everything they know and don’t know what to do. Or, in other words, lots of my friends who have been out of work for a while. This number is not included in unemployment statistics, nor is it reported. A huge difference that is baffling.
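To see why the three lines sit so far apart, here is a minimal sketch of how widening the definition of “unemployed” moves the rate. Every count below is made up for illustration, and the function `unemployment_rate` and its parameters are my own assumptions, not the actual BLS or SGS methodology. The sketch assumes that people outside the labor force (discouraged workers) are added to both the numerator and the denominator, while involuntary part-timers, who already count as employed, are added to the numerator only.

```python
def unemployment_rate(unemployed, labor_force, added_outside=0, added_inside=0):
    """Share of a (possibly widened) labor force counted as out of work.

    added_outside: people not in the official labor force (e.g. discouraged
                   workers); they join both numerator and denominator.
    added_inside:  people already in the labor force (e.g. part-timers who
                   want full-time work); they join the numerator only.
    """
    return (unemployed + added_outside + added_inside) / (labor_force + added_outside)


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
labor_force = 1000
unemployed = 90
short_term_discouraged = 15
involuntary_part_time = 55
long_term_discouraged = 60

# Narrowest definition: the headline-style rate.
headline = unemployment_rate(unemployed, labor_force)

# Wider: add short-term discouraged workers and the under-employed.
broader = unemployment_rate(
    unemployed, labor_force,
    added_outside=short_term_discouraged,
    added_inside=involuntary_part_time,
)

# Widest: also add long-term discouraged workers.
broadest = unemployment_rate(
    unemployed, labor_force,
    added_outside=short_term_discouraged + long_term_discouraged,
    added_inside=involuntary_part_time,
)

print(f"headline: {headline:.1%}")
print(f"broader:  {broader:.1%}")
print(f"broadest: {broadest:.1%}")
```

Same people, same month, three different answers. The only thing that changes is who gets counted.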

A great place to dig and find the numbers that may better reflect what’s really going on is “Shadow Government Statistics.” Sure, it may have a bit of an anti-government conspiratorial tone, but it offers food for thought…

The President’s (not too) recent State of the Union Address finally has me thinking, so a post. The crux of his narrative was about the state of the U.S. economy and what needs to be done. Part of the issue, of course, is the ever-lingering problem we all face regarding high unemployment.

Statistics are the “science” of what I do for a living. Some revere the practice, others abhor its positivistic tenets, and others just scratch their heads and nod.  

When speaking about out-of-work Americans, both we the people and the President zero in on statistics as a quick benchmark to gauge how bad (or good) things are nowadays. Specifically, that statistic is the percent of Americans unemployed within a particular period of time. This is the heuristic we all use: Obama, economists, cab drivers, Oprah. Yet, we all mutter under our breath that it’s not a “true measure” of those unemployed, and underneath we’ve been taught since our youth that “statistics lie” and “can be manipulated.” Maybe so.

Yet, often when my clients, the President and the rest of us focus on unemployment, we commit a classic boo-boo in the realm of statistics by focusing on the incorrect statistic when explaining a phenomenon. Statistics don’t “lie,” they can’t, they’re just dumb numbers. It’s people’s misperception and subsequent misinterpretation of their respective findings that are off the mark. For example, what matters when speaking about those out of work is to in fact assess those that are employed.

The employed-to-population number is a ratio, or statistic: the share who are employed among those in the U.S. 16 years of age or older who are not in the Armed Forces or in institutions (e.g. VA, nursing homes, prisons). That statistic now stands at 58.3%. Hence, just a bit more than half of us are working. That’s it! The interpretation of 58.3% working is a heck of a lot different than an unemployment rate of about nine percent. Neither number lies. And theoretically, they’re examining the same phenomenon.
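Here is a rough sketch of how both numbers can come out of the very same head count. The counts are invented (a population of 1,000 picked so the results land near the figures above), and the function names are mine. The key difference is the denominator: the employment-to-population ratio divides by everyone in the eligible population, while the unemployment rate divides only by the labor force (employed plus unemployed), leaving out anyone not working and not looking.

```python
def employment_population_ratio(employed, population):
    """Employed people as a share of the whole eligible population."""
    return employed / population


def unemployment_rate(employed, unemployed):
    """Unemployed people as a share of the labor force only."""
    labor_force = employed + unemployed
    return unemployed / labor_force


# Hypothetical counts for an eligible population of 1,000 people.
population = 1000
employed = 583
unemployed = 58  # not working, but actively looking
not_in_labor_force = population - employed - unemployed  # not working, not looking

ep_ratio = employment_population_ratio(employed, population)
u_rate = unemployment_rate(employed, unemployed)

print(f"employment-to-population ratio: {ep_ratio:.1%}")
print(f"unemployment rate:              {u_rate:.1%}")
print(f"left out of the unemployment rate entirely: {not_in_labor_force}")
```

In this made-up population, more than a third of the people never show up in the unemployment rate at all, which is why the two statistics tell such different stories.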

The moral of this rant: don’t obsess about the statistic presented to you. Instead, ask first what you want to measure. Or as Chuck D says: “don’t believe the hype!”

A recent article in the WSJ titled “Hard Questions From ‘Soft’ Sciences” got me thinking, and when I’m spurred to think a bit, I might as well share the thought!

In much of the last century, the goal of science was to answer “big” questions in the ‘hard’ sciences: curing polio, getting to the moon, building a better car, quicker. In school as a kid, I was taught the validity of objectivity by way of the “scientific method” – observable, empirical, and measurable evidence as a methodology for reasoning. In high school, both my statistics and biology teachers explained that this approach led to the development and advancement of knowledge as we know it, and is the main reason why we don’t all still live in caves. “Thank God they came up with that!” I remember thinking, for I kinda liked the way I had my bedroom set up at the time.

Not any longer. Nowadays, probably since the 70s, the public discourse of inquiry has shifted to the study of the social sciences. Even more recently, say the past ten years, this focus on social scientific inquiry has taken a nose dive and become notably obsessed with pop consumer behavior. People of all stripes, not just business leaders, are reading “The Tipping Point”, “The Blank Slate”, and “Freakonomics” to better understand, in a couple hundred pages, the application of sociology, psychology, evolution, and economics. Grasping the dynamics of the social order in less than a week; understanding the big ‘whys’ of the order of man in his environment.

Heck, only twenty years ago the business and social sciences sections of a bookstore were the domain of dull bankers seeking interest rate formulas or neurotic mothers trying to figure out why their kids smoke pot. Now bin titles are pushed up to the best-sellers table, featured as top choices on Amazon, or visiting a Kindle or iPad near you.

Arguably, the social sciences today are under attack from those seeking a book deal and $25,000+ fees from keynote speaking at marketing conferences. It’s a good gig if you’re willing to spend a chunk of time poring over the past two decades of arcane academic journals in a particular field of social science study, find themes of scholarly discourse that evolve and bubble up over time in the literature, and then condense the themes into palatable chapter-bites of info dumbed down to a Flesch Readability Index at the junior-high school level (i.e. cocktail party banter). Then find an agent to sell it to a publisher; then get a stylist.

Whatever happened to a good old-fashioned maturation in a social sciences graduate degree program for those of us who sought something beyond strict adherence to the scientific method as the explanation for everything (and a detour of a few more years from full-time work)? I mean, what’s wrong with being mentored by scholars who, without motive of profit or fame, help align the stars a bit?

In the end, I’ve never read Gladwell (yes, I’ve lied) or really any of the others who provide ‘cliff notes’ on social phenomena. Myself, I choose to go to the source, the academics, the originals. Like Gladwell? Read Granovetter and Rogers. That’s what grad school taught me. If you skip to shortcuts, you’re only shorting yourself…


As we finish week two of the NFL, I wanted to point out an interesting use of sports statistics made by the New York Times. Specifically, the Times looked at the connection between winning percentage and games lost to injury. Using a fairly simple but logical weighting scheme, the Times came up with 10 rules, ranging from the obvious (injuries, or their absence, have a drastic effect on a team’s success) to the eye-opening (hamstring injuries linger – and NFL teams don’t know).

It’s a good article to keep in mind as your favorite NFL team makes you happy or serves up week after week of crushing despair.

XKCD is a web-comic with a rather sizable fanbase, particularly among the Digg crowd. For me personally, it’s hit or miss. However, from time to time, a strip comes up that I find both humorous and sometimes informative. Thought I’d share a couple of my favorites.

Statistically Significant

And don’t forget…

Correlation ≠ Causation