Thought: How Happy Are You?

9/7/17

By Kevin DeLuca

ThoughtBurner

Opportunity Cost of Reading this ThoughtBurner post: $2.13 – about 9.67 minutes

I like to find solutions to Daily Optimization problems, but ultimately the reason I want to do this is to increase my happiness. Finding the optimal time to leave in order to minimize the time I spend waiting for the subway, for example, means I spend less time in an unpleasant state, which is one way of making myself happier.

But how much happier am I? How do you measure how happy you are in the first place?

I’ll try not to get too philosophical here, so just ask yourself this: on a scale from 0-10, how satisfied are you with where you are in your life right now?

On average, people in the United States say they are at a 6.99, according to a 2016 Gallup report. Norway was the highest at 7.54 and Central African Republic (CAR) was the lowest at 2.69. Syria was a low 3.46, less than half the US average.

Now ask: If you had to rate how happy you feel right now, on a scale from 0-6, what would you say? In the United States, the average response was 4.32.[1]

These are two simple ways to get an idea of how happy people are. The 0-10 life satisfaction scale (often called the “Cantril Ladder”) gives a measure of how satisfied people are with their lives overall. This measure depends heavily on the “remembering self” – it relies on a global evaluation of your entire life. In contrast, you can also think about an “experiencing self”, which relies on your subjective evaluation of how you feel in a particular moment. This is what the 0-6 scale measures.

Using this framework, if you want to make yourself happier, you could have two goals, which may or may not be at odds with each other:

  • Increase your overall life satisfaction (make the remembering self happy)
  • Improve your moment-to-moment experiences (make the experiencing self happy)

Ideally I think you would want to accomplish both, but first you need some way to measure each type of happiness – one measurement that captures the sort of overall life satisfaction that the remembering self thinks about, and another that captures the moment-by-moment feelings that the experiencing self enjoys. If these measures have a high degree of validity (i.e. if they measure what we want them to measure), then you can become happier by tracking and improving them.

I’ll come back to the broader life satisfaction measure in the future. For now, I want to focus on the moment-to-moment, “experiencing self” measure of happiness, because I think there is a really good way to measure this and potentially use it to help make us happier.

Happiness Ratings and the U-Index

My favorite approach to measuring the emotions of the experiencing self was developed by psychologist Daniel Kahneman and the economist Alan Krueger. They’ve written a paper as well as a book (with additional coauthors) that explains many of the conceptual issues surrounding the measurement of subjective well-being and happiness. Much of the following discussion comes from their work.

In their research, they first survey people and ask them to “reconstruct” everything they did the previous day – e.g., “first I woke up, then I took 30 minutes traveling to work, then I worked for a couple hours, then I spent an hour at lunch, etc., etc.” Next, for each activity mentioned, they ask the person to rate their subjective experience of happiness, tiredness, stress, sadness, pain, and meaningfulness, at the time they were participating in the activity. So, if they said they had a meeting with their boss in the afternoon, they would have to rate how happy they felt in the meeting, how sad they felt in the meeting, how stressed they were in the meeting, and so on. Using the survey results, they were able to get estimates of how certain activities affect people’s subjective well-being.

There are a bunch of interesting and fun things you can do with this data. The Census recently collected this type of data using a survey – the Well-Being Module of the American Time Use Survey – which asked the same emotional affect questions to a bunch of Americans. Table 1 shows the average responses, on a 0-6 scale, for men and women in the United States, for 2010, 2012, and 2013:

The “Cantril Ladder” row in Table 1 is the 0-10 life satisfaction rating mentioned earlier.

Because the survey asks about specific activities, you can also estimate how happy people are while they do different things. Table 2 shows people’s emotional affect ratings, conditional on the type of activity in which they were participating.

Kahneman and Krueger also define a measure called the “U-Index”, where “U” stands for “unpleasant” or “undesirable”. It measures the proportion of time that a person or a group of people spends in a negative emotional state. In practice, this is calculated by creating a variable equal to one if a negative emotion is the dominant emotion at a particular time, and equal to zero otherwise. In this case, it’s equal to one if sadness, stress, or pain is rated higher than self-reported happiness. Averaging this indicator variable over all time spent gives the percentage of time in which a negative emotion is the dominant emotion. In general, I think it’s fair to assume that people would like their U-index to be as low as possible.
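To make the definition concrete, here is a minimal sketch of how a U-index could be computed from a set of momentary emotion ratings. The data and column names are made up for illustration; they are not taken from the actual survey:

```python
import pandas as pd

# Made-up momentary ratings on the 0-6 scale, one row per sampled moment.
ratings = pd.DataFrame({
    "happy":  [5, 2, 4, 1, 3],
    "sad":    [0, 1, 0, 2, 0],
    "stress": [1, 4, 2, 3, 0],
    "pain":   [0, 0, 0, 0, 0],
})

# A moment counts as "unpleasant" if any negative emotion exceeds happiness.
worst_negative = ratings[["sad", "stress", "pain"]].max(axis=1)
ratings["unpleasant"] = (worst_negative > ratings["happy"]).astype(int)

# The U-index is simply the share of moments flagged as unpleasant.
u_index = ratings["unpleasant"].mean()
print(f"U-index: {u_index:.2f}")  # 0.40 for this toy data
```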

Table 1 above shows the U-index for the US population. It can be interpreted as: men spend 13% (U-index = 0.13) of their time in an unpleasant state, whereas women spend 15% (U-index = 0.15) of their time in an unpleasant state. Table 2 shows the U-index for different activities – education has the highest U-index out of this subset of activities, with about a fourth of the time spent doing educational activities being rated as unpleasant. From the table, it looks like this comes from the relatively low happiness and high stress ratings for education.

The point of this research, as stated in related work by Krueger et al., is to help people measure society’s subjective well-being. It provides alternative measures to traditional economic indicators, like GDP per capita, which can miss important trends. Many countries now collect time-use data, and many are working on incorporating this and related well-being measures into their policy evaluations. Generally, the idea is that you could look at whether the U-index of certain groups has been changing over time, and explore what those trends imply for policy.

I believe it can also be used by individuals to track their own happiness. Instead of surveying a lot of people about a single day, you could just survey yourself every day and track how happy you were, how sad you were, how much pain you felt, etc., and even calculate your own personal U-indexes for all your activities. To make it easier, instead of reconstructing your entire day and rating your emotions for each activity, you could take randomly timed self-surveys throughout the day to produce the relevant statistics (if you did this for enough days, of course). If you collected this data consistently for a long enough time, you could create a table of your own personal emotional affect.
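As a rough sketch, those randomly timed surveys could be turned into a personal affect table with a simple group-by. The activity labels and ratings below are placeholders, not my actual data:

```python
import pandas as pd

# Each row is one randomly timed self-survey: what I was doing, plus ratings (0-6).
surveys = pd.DataFrame({
    "activity": ["commuting", "work", "work", "leisure", "commuting", "leisure"],
    "happy":    [3, 4, 2, 5, 4, 5],
    "stress":   [2, 3, 5, 0, 1, 0],
    "sad":      [0, 0, 1, 0, 0, 0],
    "pain":     [0, 0, 0, 0, 0, 0],
})

surveys["unpleasant"] = (
    surveys[["sad", "stress", "pain"]].max(axis=1) > surveys["happy"]
).astype(int)

# Personal affect table: average ratings and U-index by activity, plus the share
# of sampled moments spent on each activity.
affect_table = surveys.groupby("activity").agg(
    happy=("happy", "mean"),
    stress=("stress", "mean"),
    u_index=("unpleasant", "mean"),
    share_of_time=("activity", lambda g: len(g) / len(surveys)),
)
print(affect_table)
```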

In the classic ThoughtBurner self-experiment style, I’ve been administering exactly these randomly timed surveys to myself for over half a year. I have an app buzz me randomly throughout the day (on average, once an hour), and then I fill out a shortened version of the well-being module survey that was administered by the Census Bureau and Krueger et al. This method has the added benefit of capturing moment-by-moment emotional affect (the experiencing self) even more accurately than the Census well-being module, since my emotional reports literally take place in the moment, whereas the well-being module asks people to recall their emotions from the day before.

Table 3 compares the US average emotional affect to my own personal averages, and the last column shows the difference between the two:

The table shows that my own ratings are lower than the US averages – both positive and negative emotions. I’m not too intense of a guy when it comes to emotions, so I think this is a reasonable result.

It’s also possible to break down the U-index into its component parts using this data. I can see how much of the time I spend in an unpleasant state is due to stress vs. pain vs. sadness.

Clearly a lot of my unpleasant experiences came from being stressed out, and almost none of it came from being in pain (it’s nice to be young and healthy). If I really want to lower my U-index, probably the best thing to do would be to either reduce the time I spend in stressful situations or do something to make those situations less stressful. (The reason that the three categories don’t add up to the overall U-index is that there is some overlap between the different criteria.)

Since I’ve been doing this for a while, another cool thing we can look at is any trends over time in these emotional affect results.

Happiness over time:

Stress over time:

Most of the other emotions had no trend over time for me. The gradual increase in my happiness rating, combined with the gradual decrease in my stress rating, suggests that my U-index might also be decreasing. Looking at the U-index directly, the data shows that it did decrease a fair amount:

This trend comes from a few things. First, I was working on graduate school applications in November, and their deadlines were all around December 1st, which is when my U-Index begins its gradual decline. Second, I took vacation towards the end of December/beginning of January, which probably helped to lower my U-index. And then I was accepted to graduate school in early February, which is part of why I think it stayed low despite returning to work.

We can also look at how certain emotions change depending on the time of day. I wish this were more interesting, but the only sort of cool result is the rating for “tired”, since the rest don’t have much variation. The time of day is military time (0-24 hours), and my average tiredness rating follows the pattern you’d probably expect:

A Simple Application 

There’s still a question as to how this data can be useful. Some people (like myself) may find it interesting on its own, regardless of whether there is any other use for it. But others will want something more concrete.

One simple exercise we can do is to estimate how a re-allocation of time would affect my well-being. Let’s say I want to estimate how reducing my commute time by 50% would affect my average level of happiness, ceteris paribus. To do this, we need to know 1) the percent of my time I spend commuting, 2) the average emotional ratings for happiness while I’m commuting, and 3) the average emotional ratings for happiness during the activity to which I reallocate my commute time.

All of these factors can be easily computed using the data I’ve collected from my random sampling. Table 4 below shows the necessary stats:

(I used to commute a lot, so if the percentage seems high, well, it is)

I’m not as miserable commuting as I thought I would be. I think it’s because I listen to (interesting and funny) podcasts on my commute, which apparently keeps my happiness rating fairly high. To calculate how a 50% reduction in commuting would affect my happiness, I simply reduce the percentage of time I spend commuting by 50% and then recalculate my average happiness rating (for simplicity, I assume that the average emotional ratings for “not commuting” apply to the time that I no longer spend commuting in this hypothetical scenario).
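Here is a sketch of that arithmetic. The commute share and happiness ratings are placeholder values chosen to roughly reproduce the figures from my data; the real inputs are in Table 4:

```python
# Placeholder inputs, chosen to roughly match my data; the real values are in Table 4.
share_commuting = 0.15        # fraction of sampled time spent commuting (assumed)
happy_commuting = 3.40        # average happiness while commuting (assumed)
happy_other = 3.79            # average happiness during everything else (assumed)

def average_happiness(commute_share):
    # Overall happiness as a time-weighted average of the two activity types.
    return commute_share * happy_commuting + (1 - commute_share) * happy_other

baseline = average_happiness(share_commuting)               # about 3.73
counterfactual = average_happiness(share_commuting * 0.5)   # cut commuting in half
print(f"{baseline:.2f} -> {counterfactual:.2f} ({counterfactual - baseline:+.2f})")
```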

The result is an increase in my average happiness rating from 3.73 to 3.76 (+0.03), or less than 1%. This was surprising to me, since I really don’t like commuting. But given that my happiness rating while commuting was already close to the average happiness rating, it makes sense in hindsight that reducing my commute time wouldn’t affect my overall happiness rating much.

One possible explanation is that when I think about the overall experience of commuting, I don’t like it because it seems super inefficient and boring, but in my moment-to-moment experiences of it I tend to be doing OK (i.e., a difference between my remembering and experiencing selves).

Being Cautious

While the previous demonstration shows how the emotional affect data could be used to estimate certain counterfactuals, the assumptions behind this type of estimation are very strong. I would only feel comfortable predicting changes in well-being when the changes in your life are relatively small.

Even then, it’s hard to know exactly how your time use will change, and how the new combination of time use will affect your overall emotions. Reducing your commute time might lower your stress level since commutes are stressful, or it might raise it if it means you end up spending more time at a stressful office.

Another thing to remember is that people are notoriously good at adapting to new levels of “normal”, so that even after major life events (both good and bad ones) they return to similar baseline levels of well-being measures. That is to say, I think there are strong individual fixed effects when it comes to subjective well-being.

Lastly, the amount of time you spend doing an activity will also most likely affect your overall rating of the activity. Things can become more or less enjoyable depending on how much time you spend doing them. Doing sport activities[2] might be fun and exciting at first, but after a while it can become tiring and stressful. Using the commuting example, maybe in addition to the effect of spending less time commuting, I would also become happier while I was commuting, which would imply that the effect on my overall happiness would be larger than +0.03.

These considerations make the well-being data more difficult to incorporate into your decision making, but not impossible. Ultimately, I think individuals with lots of private knowledge about themselves are best equipped to judge when these types of hypotheticals would be a valid thing to take into consideration. They just need to collect the data first.

Now that I’m in grad school, it’ll be interesting to see 1) how the allocation of my time changes, and 2) how it affects my emotional ratings and subjective well-being.

_______________________________________________________________________

[1] Author’s calculations, American Time Use Survey Well Being Module (2010, 2012, 2013).

[2] Can you tell I’m super sporty?

_______________________________________________________________________

PDF Version

_______________________________________________________________________

© Kevin DeLuca and ThoughtBurner, 2017. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts, links and citations may be used, provided that full and clear credit is given to Kevin DeLuca and ThoughtBurner with appropriate and specific direction to the original content.


Thought: Waiting for the Subway

2/2/17

By Kevin DeLuca

ThoughtBurner

Opportunity Cost of Reading this ThoughtBurner post: $0.66 – about 3.02 minutes

You know how when you’re waiting for something to happen, time feels like it goes by a lot slower? Like how a minute is super long when you’re watching a clock. That’s how I feel waiting for the subway to come. But the subway wait is even worse, because usually by the time I’m waiting for the subway I’m also quite eager to get home. This makes it feel like I wait forever for the train.

It really only takes about two minutes.

I know this because I started recording how long I wait for the subway on my way home. Weird? Yeah, I know. Worth it? Sure? Anyway, every day I take the uptown 1 train from Penn Station on my way home. I’ve been recording how long I wait for the train for a few months now (70 work days), and using this data I can finally answer the very very important question of how long I actually stand waiting for the train.

Here is the distribution of wait times:

[Figure: distribution of subway wait times]

Most of my wait times are under five minutes, and there is one true outlier that took 14 minutes (I remember that time, it was the worst). The average wait time is 2.01 minutes – not as long as I would have thought. For a perhaps surprising number of days, I get there right as the train is arriving at the station – close to 20% of the time.

But I don’t always arrive at the subway stop at the same time. Depending on when I leave work, I either get to the stop right before 7pm or right before 8pm (I usually take one of the NJ Transit trains that are scheduled to arrive at Penn Station at 6:35pm, 6:48pm or 7:43pm). The time I spend waiting for the train might depend on what time I actually get to Penn Station to begin this whole waiting ordeal.

Below is each wait time depending on the time of day I get to the station:

[Figure: wait time by arrival time at the station, with best linear fit]

The line shows the best linear fit through all of the observations. There is a slight decrease in the time I wait for the train if I get to the stop later in the day. When I get to the station close to 8pm, the average wait time is 1.37 minutes. When I get there closer to 7pm, however, the average wait time is 2.43 minutes – a whole minute and 3.6 seconds more! The decrease in average wait times later in the day seems to be mostly due to a higher chance that the 1 train will take a really long time to come if I get to the station closer to 7pm.

In fact, of the wait times that are in the top 10% of the distribution (i.e. my top 10% wait time lengths – more than 3.8 minutes), only one occurs at times closer to 8pm. Of those in the top 5% – which is more than 6.5 minutes – all of them occur at times closer to 7pm.

The table below shows the average wait time for all days, for days when I get to the station closer to 7pm, for days when I get to the station closer to 8pm, and the difference between the 7pm and 8pm wait times. It also shows the odds of getting to the station right when the train is arriving (wait time = 0) and the odds of having to wait for the train for more than 6.5 minutes (the 95th percentile).

[Table: average wait times and probabilities, overall and by arrival time (7pm vs. 8pm)]

Not only is the train more likely to be very late at 7pm, I’m also about half as likely to get to the station right as the train arrives – 14.3% chance at 7pm vs. a 32.1% chance at 8pm.
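If you want to replicate this with your own commute, here is a sketch of how those conditional statistics could be computed. The data layout, the illustrative values, and the 7:30pm cutoff for bucketing arrival times are my own assumptions:

```python
import pandas as pd

# One row per workday: when I reached the platform (decimal hours) and how long I waited.
waits = pd.DataFrame({
    "arrival_hour": [18.9, 19.8, 18.95, 19.85, 18.8, 19.9],   # illustrative values
    "wait_min":     [3.5,  0.0,  6.5,   1.0,   2.0,  0.0],
})

# Bucket each day by whether I got to the station closer to 7pm or to 8pm.
waits["bucket"] = waits["arrival_hour"].apply(lambda h: "7pm" if h < 19.5 else "8pm")

stats = waits.groupby("bucket")["wait_min"].agg(
    avg_wait="mean",
    pct_no_wait=lambda w: (w == 0).mean(),        # train arriving as I get there
    pct_long_wait=lambda w: (w > 6.5).mean(),     # beyond the 95th-percentile cutoff
)
print(stats)
```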

So what’s the big takeaway from all of this? I guess if I were perfectly indifferent about which train to Penn Station I took home, I could save about a minute in expected waiting time by choosing the later time. One minute less waiting isn’t a huge difference, but I could always frame it as ‘cutting your subway wait time in half’ since the average is two minutes. Also, the one minute of waiting time I save is one of those “long” minutes that takes a long time to go by, so subjectively it feels nicer than saving a “normal” minute of doing something else.

For a way cooler analysis of NYC subways and waiting times, see Erik Bernhardsson’s web article on the topic.[i] His analysis also shows evidence that average wait times are slightly decreasing between 7 and 8 (see the “Waiting time by time of day” section). Or rather, my “field work” confirms his analysis.

_______________________________________________________________________

[i] https://erikbern.com/2016/04/04/nyc-subway-math.html

_______________________________________________________________________

PDF Version

_______________________________________________________________________

© Kevin DeLuca and ThoughtBurner, 2017. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts, links and citations may be used, provided that full and clear credit is given to Kevin DeLuca and ThoughtBurner with appropriate and specific direction to the original content.

Thought: The Great Coffee Experiment

7/7/16

By Kevin DeLuca

ThoughtBurner

Opportunity Cost of Reading this ThoughtBurner post: $1.97 – about 8.95 minutes

See the Official ThoughtBurner Report

I drink a lot of coffee.

In fact, I’m drinking a cup as I write this article. I drink coffee because I think that it makes me more productive and gives me a slight boost in energy which allows me to work more on whatever it is I happen to be working on. As a Daily Optimizer,[i] it makes perfect sense for me to drink more coffee if it truly does make me more productive. And while I suspect that any boost in work productivity from coffee-drinking is small, sub-optimal coffee drinking habits could lead to large productivity losses over a lifetime.

I’m not alone in my belief in the power of coffee. An old survey found that 65% of workers drank coffee on the job and 38% of people say they “couldn’t live without it.” [ii] Another study showed that half of all people say that they are less productive without coffee.[iii] Clearly, many people believe in the productivity-boosting power of coffee. What I worry about, however, is that people tend to drink coffee precisely when they have to be more productive. They sip that late night java as they cram for an exam, or chug a second cup on a particularly busy morning at work. Does the coffee actually help, or is it all in their head? Causation or correlation?

From what I could find online, there has been a lot of research done on the effects of coffee-drinking. But the evidence looks pretty mixed, and it is less about work productivity and more about health. It seems like every few weeks or so, an article goes around social media talking about a new study (often with a sensationalist headline[iv]) that proves that coffee is bad for you. Then the next week, one will say that coffee is cancer-curing,[v] or, at worst, harmless. Some research has shown that caffeine boosts performance on a number of simple cognitive tasks. Many of these studies, however, take place in highly controlled environments and test performance on “memory, reasoning and concentration tasks,” [vi] and it’s not clear whether these lab results generalize to the real world. These health effects are interesting for sure, but people don’t worry about minimizing their risk of dying within 17 years[vii] when they switch on their coffee pots in the morning. They drink it because they believe it’s optimal to drink.

At least, that’s certainly true for me. I drink coffee because I think it makes me more productive. But I don’t really know that it does. This issue has plagued me ever since I became an avid coffee drinker. Rather than letting this problem keep me up at night (or is that all the caffeine?), I designed an experiment to get to the bottom of this coffee controversy.

The Great Coffee Experiment

My coffee-drinking self-experiment was a blind study meant to investigate the causal impacts of drinking coffee on my work output. I randomized the timing, amount, and type of coffee (regular or decaf) that I drank during workdays for a six-month period.

Experiment Design

The randomization of my coffee-drinking schedule was carried out in a number of steps. First, each day’s coffee type was determined – either regular or decaf. These two conditions happened with equal probability. Next, the number of cups was randomly decided. The number of cups was restricted to being an integer between zero and five inclusive, with each value having the same chance of being chosen. Then, the day was randomly chosen to be either a “random” day or a “free” day. On free days, I was told the number of cups of coffee I was assigned for the day, and was allowed to choose when I drank each cup. On random days, each cup was assigned a time at which it was required to be drunk. This random time was restricted to morning and work hours (for example, I did not allow for cups to be assigned a time in the middle of the night while I was most likely asleep). Times were chosen by selecting a random minute within the allowed time interval (6:30am-6:00pm).
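The randomization itself is simple enough to script. Here is a sketch of one day’s draw under the rules above; the function name and implementation details are illustrative rather than the exact code used in the experiment:

```python
import random

def draw_daily_assignment():
    """Generate one day's coffee assignment following the design described above."""
    coffee_type = random.choice(["regular", "decaf"])   # 50/50, kept blind to me
    n_cups = random.randint(0, 5)                       # 0-5 cups, equal probability
    day_type = random.choice(["random", "free"])        # assigned times vs. my choice

    cup_times = None
    if day_type == "random":
        # One random minute per cup within the allowed 6:30am-6:00pm window.
        start, end = 6 * 60 + 30, 18 * 60               # minutes since midnight
        minutes = sorted(random.randint(start, end) for _ in range(n_cups))
        cup_times = [f"{m // 60:02d}:{m % 60:02d}" for m in minutes]

    return {"type": coffee_type, "cups": n_cups, "day": day_type, "times": cup_times}

random.seed(1)  # for a reproducible example
print(draw_daily_assignment())
```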

For the experiment, I exclusively used Maxwell House instant coffee (either regular or decaf, depending on the assigned type). Coffee was given to me in a plastic bag at the beginning of each day by my research assistant (special shout out and big thank you to Ellen Kim, my research assistant and fiancée, who patiently put up with all of my weird experimental requests). The bags were pre-filled with enough instant coffee for five standard 8-oz cups. Notice that this means that I could not guess how much coffee I would have to drink on a given day based on the amount of instant coffee I observed in the bag each morning. In addition to being “blind” to the total amount, I was also “blind” to the coffee type assignment – the bags did not indicate whether the coffee was regular or decaf. After the timing for each cup was randomized, I used a web service called OhDontForget[viii] to send myself a text message at each of the randomly assigned times.

On free days, I was sent a text message at 6am that told me the number of cups I was required to drink that day. On random days, I received a text message exactly at each of the randomly assigned times. My research assistant entered in all of the times so that I did not know beforehand when I would be required to drink each cup. On random days, whenever I received a text message I was required to make a cup of instant coffee from the bag I had been given that morning, and I had to finish the entire cup within 30 minutes. Compliance with the experimental conditions was good; for 92% of days, I stuck perfectly to the randomized schedule.

This design solves two major endogeneity problems present in observational studies of coffee-drinking. First, by randomizing the coffee types and having the study be a blind study, we can control for any placebo effects. In a non-blind study, I would know how many caffeinated cups of coffee I drink on a given day, and therefore would know my caffeine dosage. In this case, we might worry that I would change my behavior based on how many cups I get to drink (i.e. increase my work effort if I get to drink more coffee), creating what seems to be a causal relationship between caffeine and work output. In this experiment, however, the placebo effect will influence both the regular and decaf work output equally due to the blind assignment. Therefore, we can control for (and measure) this placebo effect.

Second, by randomizing the timing and number of cups of coffee, we eliminate any endogeneity between choosing to drink coffee and the amount of work I have to complete. For example, it seems likely that, if people believe that drinking coffee increases their productivity or energy, they would drink more coffee precisely when they have a lot of work to complete. This would bias any estimates of coffee drinking on productivity upward, since people would choose to drink more coffee at times during the day when they are required to work more. The randomized amount and timing of coffee-drinking eliminates that bias.

In addition to solving these two endogeneity issues, the design of the experiment also allows us to measure how good I am at naturally optimizing my own coffee-drinking in terms of the timing of my coffee consumption. If people are good at deciding the best times for them to drink coffee, we might imagine that when my coffee-drinking schedule is randomly determined the positive effects of drinking coffee may not be as strong. For example, coffee may boost productivity more if I drink it when I start to feel tired during the day (and choose to drink it) rather than when I am randomly assigned to drink it at a time when I may or may not be feeling tired. By comparing the estimates of the coffee effects on random days versus free days, we can get a sense of how good I am at being a Daily Optimizer.[ix]

The outcome of interest was my daily labor supply – the number of minutes I worked in a day. I measured this by taking a stopwatch with me to my office. Whenever I would start working on a task, I would start the timer. Whenever I stopped working – whether it was to go have lunch, read a WaitButWhy[x] article, or go to the coffee shop across the street – I would stop the timer and mark down how long I had worked. This outcome measure, along with a record of the randomized coffee drinking assignment, is what I use to estimate the causal effects of drinking coffee on my work output and work efficiency.

Results

Figure 1 shows the best linear fit line for average total daily work time conditional on coffee type and number of cups, both for free and random days. For free days (left side of Figure 1), I worked more on regular type days as compared to decaf days, as indicated by the caffeinated line being above the decaf line. For both coffee types, total work output increases as the number of cups increases, which is visually suggestive of a positive placebo effect. Additionally, the slopes of both the regular and decaf lines seem to be approximately the same, which suggests that there is no positive effect of drinking an additional cup of regular coffee (as opposed to drinking an additional cup of decaf). In other words, there is no visual evidence that caffeinated coffee increases my labor supply any more than decaf coffee on free days.

[Figure 1: best linear fit of total daily work time by number of cups, regular vs. decaf, on free days (left) and random days (right)]

 

For random days (right side of Figure 1), the figure tells a slightly different story. Both the regular and decaf lines are still upward sloping, which again suggests a placebo effect. However, the slope for regular days looks steeper than the slope for decaf days. This difference in slope would suggest that the effect of drinking an additional cup of regular coffee increases my work output more than drinking an additional cup of decaf coffee. In other words, there is visually suggestive evidence that there is, indeed, a true effect of drinking caffeinated coffee on my total work output.

To more rigorously check these effects, I ran a regression of daily work time on the different coffee drinking variables. Specifically, I used the following model to test whether any of the effects were statistically significant:

Time Worked = β0 + β1(Regular) + β2(Cups) + β3(Regular × Cups) + ε

where Time Worked is my daily time worked in minutes, Regular is an indicator variable equal to one when the coffee type was regular, Cups is the number of cups of coffee I drank that day (regular or decaf), and (Regular x Cups) is the number of regular cups of coffee I drank that day (decaf cups don’t count).

The sign, magnitude, and statistical significance of the beta coefficients are important. Beta 1 tells us whether I worked more on days when I drank regular coffee compared to days when I drank decaf (not dependent on how many cups I had). Beta 2 measures whether an additional cup of coffee, either regular or decaf, increases the amount of time worked – in other words, Beta 2 measures any placebo effect. Beta 3 picks up any causal impacts of drinking an additional, regular cup of coffee on the total number of minutes I worked in a day. Most people believe that drinking more caffeinated coffee will help you get more work done. This is equivalent to believing that Beta 3 is positive and statistically significant. We will test whether this is true.
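As a sketch, the regression can be run with a few lines of statsmodels. The file name and column names are assumptions about how the daily records might be stored:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Assumed layout: one row per workday with minutes_worked, coffee_type, cups, day_type.
df = pd.read_csv("coffee_experiment.csv")   # hypothetical file name
df["regular"] = (df["coffee_type"] == "regular").astype(int)

# Time Worked = B0 + B1*Regular + B2*Cups + B3*(Regular x Cups) + error,
# estimated separately for free days and random days.
for day_type, group in df.groupby("day_type"):
    fit = smf.ols("minutes_worked ~ regular + cups + regular:cups", data=group).fit()
    print(day_type)
    print(fit.summary().tables[1])   # coefficient estimates and significance
```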

Results of the regression are presented in the table below.

[Table: regression estimates for free days and random days]

The estimates from the table generally confirm our visual intuitions, with the caveat that most of the estimates are not statistically significant. For both free days and random days, the estimate for Beta 2 is positive, indicating evidence of a placebo effect.

For random days, the estimates suggest that drinking an additional cup of coffee – regular or decaf – increases the total daily number of minutes worked by approximately 17 minutes. This is equivalent to a 4.8% increase from the daily mean total time worked. On random days, the estimate for Beta 3 is also positive, indicating that drinking an additional regular cup of coffee has a stronger positive effect on total work supply than drinking an additional cup of decaf coffee. Specifically, the estimates suggest that drinking an additional cup of regular coffee has double the effect of drinking an additional cup of decaf coffee – when the coffee was regular, my total daily work output increased by an additional 17 minutes per cup. However, neither the estimate for Beta 2 nor the estimate for Beta 3 is statistically significant at conventional levels.

Interestingly, there is no evidence of an “amplifying” effect on free days. In fact, the results of running the regressions separately for free and random days suggest that, if anything, being able to choose the timing of my coffee drinking actually decreased the effect of drinking an additional cup of regular coffee, though I don’t want to read too much into this result because the estimates are not statistically significant.

Discussion

In terms of my personal results, there seems to be slight evidence of both a placebo effect and “true” effect from drinking caffeinated coffee. I suspect that part of what is really happening, for me, is that when the experiment forced me to drink less coffee than I normally would, I worked less. Naturally, I probably drink about four regular cups of coffee per day. Whenever I had to drink less than this, I may have decreased my work output. It’s not so much that coffee makes me work more, just that not being able to drink my “natural” amount of coffee makes me work less. This would explain why both regular and decaf coffee seemed to increase my total work time.

While these results are real for me, they may not be the same for you. Caffeine and coffee can have very different effects on different people,[xi] and I wouldn’t be surprised if the effects were only applicable for me or a particular subset of people who are very similar to me.

So what is the solution to the daily optimization problem of how much coffee to drink? Given that the results of the experiment look promising for coffee consumption, I have since lifted restrictions on my coffee-drinking in order to maximize my work output and productivity. I now drink a combination of caffeinated and decaf coffee in order to hedge against possible long-term negative health consequences of daily caffeine consumption, in part because it seems to be the case that I can still increase my total work output even while drinking decaf coffee (though, perhaps, slightly less than if I exclusively drank regular coffee). Further (self-)research is needed to get more precise estimates of these potential coffee-drinking effects, and to be certain whether this is truly an optimal course of action.

_______________________________________________________________________

Note: Figure 1 was updated 12/12/16 to a prettier format. The report was also modified to include the new version of Figure 1.

_______________________________________________________________________

[i] https://thoughtburner.org/2015/02/19/thought-daily-optimization/

[ii] http://www.businessinsider.com/the-one-office-perk-you-must-splurge-on-2011-3

[iii] http://fizzler.com/167/can-coffee-add-to-productivity/

[iv] http://www.atlantamagazine.com/health/new-study-coffee-can-kill-you/

[v] http://well.blogs.nytimes.com/2015/01/22/coffee-may-cut-melanoma-risk/

[vi] http://www.telegraph.co.uk/news/health/news/7710780/How-office-coffee-breaks-make-staff-work-harder.html

[vii] http://www.npr.org/sections/thesalt/2013/08/17/212710767/how-many-cups-of-coffee-per-day-is-too-many

[viii] http://ohdontforget.com/

[ix] https://thoughtburner.org/2015/02/19/thought-daily-optimization/

[x] http://waitbutwhy.com/

[xi] http://earnworthy.com/caffeine-and-productivity/

_______________________________________________________________________

PDF Version

Coffee Drinking Data

Official Thoughtburner Report

_______________________________________________________________________

Thought: Powerball Lottery Tickets Are Actually Worth Less Now That The Jackpot Prize Is Higher

1/12/16

By Kevin DeLuca

ThoughtBurner

Opportunity Cost of Reading this ThoughtBurner post: $1.28 – about 5.83 minutes

You might want to think twice about buying a Powerball ticket.

The Powerball jackpot prize is currently a whopping $1.5 billion, making it the largest lottery jackpot ever[i]. The media hype is huge, and people all over the country are lining up[ii] to have a chance to become unthinkably rich overnight. If there was ever a good time to buy a lottery ticket, it would be now, right[iii]?

Well, no.

Many people don’t seem to realize that a higher jackpot doesn’t necessarily increase the value of a lottery ticket. Sure, the total jackpot prize gets higher. But as more and more people buy tickets, the odds that somebody else will also win the lottery increase significantly. This is bad because splitting the lottery with someone else greatly reduces the size of the jackpot payout, and therefore makes the lottery ticket much less valuable.

So, when should you buy a lottery ticket? The key idea is that you want the jackpot to be high enough to make the ticket worth the cost, but not so high as to induce a lot of people to buy tickets – you’ve got to find the jackpot sweet spot. When trying to find this sweet spot, you need to adjust the expected values of lottery tickets to reflect the changing probability of having multiple winners – that is, of having to split your winnings with someone else.

The Naïve Expected Value Approach

The standard approach used to determine whether it is “worth it” to buy a lottery ticket is to use an expected value calculation (for example, see this[iv] and this[v]). The basic idea is: find the “expected value” (EV) of the lottery ticket, which is simply the probability of winning a prize times the dollar amount of that prize, and then compare that to the cost of the ticket. Since there are multiple potential prizes, each of these possible prizes are multiplied by their respective odds and added together to determine the lottery ticket’s expected value. If the expected value is greater than the cost of the ticket ($1 for Mega Millions and $2 for Powerball), then you should buy a lottery ticket. If the expected value of the ticket is lower than the cost, then you should save your money for something better. Odds can be found for Powerball[vi] and Mega Millions[vii] on their websites.
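Here is a sketch of that arithmetic, using the published 1-in-292,201,338 jackpot odds. The $0.32 I use for the combined expected value of the non-jackpot prizes is an approximation consistent with the figures quoted in this post, not an official number:

```python
JACKPOT_ODDS = 1 / 292_201_338   # published odds of hitting the Powerball jackpot
OTHER_PRIZES_EV = 0.32           # approx. combined EV of non-jackpot prizes (assumed)
TICKET_COST = 2.00

def naive_expected_value(jackpot):
    # Probability of each prize times its payout, summed over all prizes.
    return JACKPOT_ODDS * jackpot + OTHER_PRIZES_EV

print(naive_expected_value(1.4e9))                      # about $5.11 at a $1.4 billion jackpot
print((TICKET_COST - OTHER_PRIZES_EV) / JACKPOT_ODDS)   # break-even jackpot: about $491 million
```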

The trick to this technique is that the expected value of a lottery ticket increases as the jackpot increases. This is because as the dollar amount of the prize increases, the price of the ticket and the chance of you choosing the winning numbers stay the same. In the end, it all comes down to finding the minimum jackpot value that makes the expected value of a lottery ticket equal to its cost. In the case of this fairly straightforward calculation, that comes out to about $491 million. Below, you can see the naïve expected values for Powerball tickets plotted over jackpot size, along with the cost of a ticket:

[Figure: naive expected value of a Powerball ticket by jackpot size, with the $2 ticket cost]

The point where the two lines cross shows the jackpot prize at which the ticket becomes ‘worth’ its cost. The higher the jackpot prize, the more the ticket is worth. According to this calculation, the $1.4 billion Powerball prize indeed means that lottery tickets are more valuable than ever, worth an expected $5.11.

Splitting the Jackpot

This standard approach, however, doesn’t take into account the fact that as the jackpot amount gets higher and higher, the number of tickets sold increases. This doesn’t change your odds – you still have the same chance of winning the lottery – but it does increase the chances that someone else will also win the lottery. This makes a big difference in the expected value of a lottery ticket: if two people both choose the winning numbers, they have to split the jackpot, effectively cutting the value of winning in half. If three people win, they have to split it into thirds… and so on and so forth.

This is why lottery tickets don’t necessarily keep increasing in value as the jackpot prize amount increases. Depending on how many more people buy lottery tickets as the prize increases, the value of lottery tickets could fall. You want the jackpot to be high, but not so high that you’ll have to split the winnings with someone else.

In order to take this into account, I used lottery ticket sales data[viii] to predict the number of tickets sold depending on the Powerball prize. This will help us adjust the expected values based on how likely it is that you’d split the jackpot with some other lucky person (or unlucky person, given that you’d both have to share now).

Below, I’ve plotted ticket sales over the size of the jackpot prize for every Powerball drawing since they switched to a $2 ticket (January 18th, 2012). The trend line is a cubic function fitted to the sales data. It estimates the number of tickets sold (y) based on the size of the jackpot prize in millions (x).

[Figure: Powerball ticket sales by jackpot size, with fitted cubic trend line]
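Fitting a curve like the one above is straightforward. A sketch, assuming the sales history sits in two arrays (the numbers below are placeholders, not the actual sales data):

```python
import numpy as np

# Placeholder sales history: jackpot size (millions of $) and tickets sold per drawing.
jackpots = np.array([40, 100, 200, 300, 500, 800])            # illustrative
tickets = np.array([12e6, 18e6, 35e6, 63e6, 120e6, 250e6])    # illustrative

# Fit a cubic: tickets_sold = a*x^3 + b*x^2 + c*x + d, with x in millions of dollars.
coeffs = np.polyfit(jackpots, tickets, deg=3)
predict_sales = np.poly1d(coeffs)
print(predict_sales(300))   # predicted tickets sold at a $300 million jackpot
```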

Using the equation from the model to estimate the number of tickets sold for each jackpot prize, I then calculated the odds that, given you have already won the lottery, you would have to split it with someone[1]. For example, with a $300 million Powerball prize, the model estimates that about 63,000,000 tickets will be sold. This makes the odds that nobody else wins equal to about 80%. If the prize were $600 million, however, the odds that nobody else will win the lottery is only about 51%. The odds that nobody else will win the jackpot decreases as the prize increases because the number of estimated tickets sold increases as the prize increases.

For each of the odds of splitting the jackpot, I multiplied the probability of the outcome by the prize amount you would receive if the outcome actually occurred. For example, if there was a $100 million prize with a 70% chance of not splitting the lottery, then the adjusted expected value of winning the lottery ticket changes from $0.34 to $0.29:

[Equation: adjusted EV of jackpot ≈ 0.70 × $0.34 + 0.30 × ($0.34 ÷ 2) ≈ $0.29]

Notice that this adjusted expected value of winning the jackpot is less than the unadjusted expected value of winning the jackpot. This will always be true because this new expected value takes into account the increased probability of having to split the jackpot prize as its amount increases. For each jackpot, I calculated the adjusted expected value of winning. Then, I added the expected value of the other prizes[2], and the result is the new adjusted expected value of the entire Powerball ticket.
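Here is a sketch of that adjustment, in the spirit of footnote [1]: given the number of other tickets in play, sum over the possible number of co-winners, splitting the jackpot accordingly. The binomial model for the other winners and the sales figure plugged in at the end are assumptions for illustration; the actual sales predictions come from the fitted cubic above:

```python
from math import comb

JACKPOT_ODDS = 1 / 292_201_338
OTHER_PRIZES_EV = 0.32           # approx. EV of non-jackpot prizes (assumed; these aren't split)

def adjusted_ticket_ev(jackpot, other_tickets_sold):
    """EV of one ticket when the jackpot may be split with other winners."""
    ev_jackpot, k = 0.0, 0
    while True:
        # Probability that exactly k of the other tickets also hit the jackpot (binomial).
        p_k = (comb(other_tickets_sold, k) * JACKPOT_ODDS**k
               * (1 - JACKPOT_ODDS) ** (other_tickets_sold - k))
        term = JACKPOT_ODDS * p_k * jackpot / (k + 1)   # my share if k others also win
        ev_jackpot += term
        if term < 0.01:          # stop once further terms stop mattering (footnote [1])
            break
        k += 1
    return ev_jackpot + OTHER_PRIZES_EV

# Example using the figures quoted above: a $300 million jackpot, ~63 million tickets sold.
print(adjusted_ticket_ev(300e6, 63_000_000))
```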

Below, I’ve plotted the adjusted expected value of a Powerball ticket, along with its cost.

[Figure: adjusted expected value of a Powerball ticket by jackpot size, with the $2 ticket cost]

As you can see, at first, the expected value of lottery tickets rises as the jackpot prize increases. But eventually it hits a tipping point where the ticket’s expected value actually starts to decline. This is because with a higher jackpot prize, more people buy tickets, which increases the probability of having to split the prize so much that it actually makes the ticket less valuable than it was before.

Specifically, the graph shows that Powerball tickets are most valuable when the jackpot is between $870-$875 million, where they have an expected value of about $1.96. At this jackpot size, the prize is high enough to increase the expected value of the ticket a good amount, but not so high as to make it very likely that you would have to split the jackpot with other people (about a 49% chance you have to split it with someone else).

Notice that the maximum expected value of a Powerball ticket is always less than the cost of a ticket ($2). This just reinforces the idea that buying lottery tickets is a horrible investment strategy. If you were to play Powerball an infinite number of times (infinite drawings, not infinite tickets), there is no strategy that will let you win money. And even if you bought 5 tickets for every Powerball drawing for 100 years, you would still have a 99.98% chance of never choosing the winning ticket.

While the $1.5 billion prize may be the largest jackpot in history, the adjusted expected value of a Powerball ticket right now is not actually at its maximum. In fact, the current expected value of a Powerball ticket is approximately $1.57 – which is about the same as the expected value of a Powerball ticket if the jackpot were $445 million.

This is all just to say that now is in fact not the time to buy Powerball tickets. That time was last week on Saturday, January 9th, when the jackpot would have paid out $950 million and Powerball tickets had an adjusted expected value of $1.96 – which is just $0.01 short of the maximum. I’m sorry to say, you’ve all missed the sweet spot already.

Then again, it is 1.5 billion dollars.

_______________________________________________________________________

[1] I actually calculated the odds for each possibility – that nobody wins, that exactly one person wins, that exactly two people win, etc. – and added the expected value of each outcome together, until the expected values were too small to make a difference (i.e. less $0.01).

[2] The expected value of the other, non-jackpot prizes remains the same, since these aren’t split if there are multiple winners.

_______________________________________________________________________

[i] http://money.cnn.com/2016/01/06/luxury/largest-lottery-jackpots/

[ii] http://www.wsj.com/articles/powerball-no-winner-so-jackpot-may-hit-1-3-billion-1452410989

[iii] http://www.marketwatch.com/story/you-should-really-buy-a-700-million-powerball-ticket-today-2016-01-08

[iv] http://time.com/money/4172196/powerball-math-odds-advantage/

[v] http://davidtorbert.com/2012/03/is-it-ever-worth-it-to-play-mega-millions/

[vi] http://www.powerball.com/powerball/pb_howtoplay.asp

[vii] http://www.megamillions.com/how-to-play

[viii] http://www.lottoreport.com/salescomparison.htm

_______________________________________________________________________

PDF Version

Lottery Sales Data and Expected Value Calculations

_______________________________________________________________________

 

Thought: Optimizing Presidential Debates

8/6/15

By Kevin DeLuca and Boyd Garriott

ThoughtBurner

Opportunity Cost of Reading this ThoughtBurner post: $1.54 – about 6.99 minutes

Today is the day of the first GOP presidential debate. Reminiscent of this time in the last presidential election cycle, there are a large number of potential candidates all looking to secure the Republican nomination. Naturally, all of them want a place in the nationally televised debates. But with limited time and space, the networks are not sending invitations to everyone.

If we assume that news networks are indeed trying to maximize the quality of the televised debates, FOX and CNN are facing a difficult optimization problem. Picking the optimal combination of candidates is no easy task; candidates don’t come with a measure of debate-inclusion quality. There are many different aspects that news stations must consider – the number of candidates, their popularity, their “true” chance of winning the nomination, the scope of their power and influence, and many more – while they try to optimize their debates. All of these factors are determined when the combination of candidates is decided.

Here at ThoughtBurner, we spent a lot of time thinking about the problem of how to optimize a presidential debate. We (Kevin DeLuca and Boyd Garriott) each independently came up with our own optimizing processes, inspired by a FiveThirtyEight contest. After sharing our ideas with each other, we realized that our methods were designed to optimize different aspects of debate quality. Namely, Boyd devised a formal, more objective way to measure candidate (and future president) quality, while Kevin created a method focused on optimizing the representation of different viewpoints.

These two aspects of presidential debates – candidate quality and the variety of viewpoints represented – break down the measure of ‘debate quality’ into more manageable parts. We decided to combine our methods into a single super-proposal that simultaneously maximizes candidate quality and ideological representativeness. Using this method, we came up with our own list of who should be invited to the GOP debates. We also created a personalized debate optimization calculator, which lets you customize our process based on your own personal opinion of how much certain candidate factors matter.

Here’s how it works: enter how important you personally think each of the 6 factors below is for a good debate. That’s it – ThoughtBurner’s optimization calculator will tell you which five candidates would be in the ‘best’ debate according to your preferences.

Try out the Personalized Debate Optimization Calculator

The Method

First, we focused our attention on designing a better measure of candidate quality. We wanted to create a measure of candidate quality that would be better than simple poll results. The five main factors we included in our analysis of candidate quality were: Endorsements, Net Favorability Ratings, Polling Averages, Fundraising Numbers, and Political Experience.

Using a variety of sources as well as our own formulations (see appendix at bottom), we turned each criterion into a percentile measure to capture how the candidates compare to each other. The average percentile ranking of all five measures (weighted equally) is the candidate’s overall final score, expressed as 0%-100%.

The top 10 highest quality candidates, using this design, are:

[Table: top 10 candidates by quality score]

If we were to only invite the top five highest quality candidates to the debate, it looks like Donald Trump wouldn’t make the cut, even though his polling numbers are the highest.

Next, we focused our attention on optimizing the debate by ensuring that a variety of opinions were included in the debate. If the candidates selected for the debate do not have different viewpoints, then we’ll end up watching a debate with very little debating. To ensure that issues are actually discussed, we need to ensure that a variety of viewpoints are represented by the candidates.

To do this, we collected ideological scores on four issues (Individual Rights, Domestic Issues, Economic Issues, and International/Defense Issues) from Inside.gov[i] in order to understand the positions that each contender holds. We then calculated the average ‘ideological distance’ of each candidate to all of the other candidates. A distance of 0 means the candidate is exactly in the middle of the pack; a negative score means they are less conservative while a positive score means they are more conservative.

We then decided to include a range of five different viewpoints: Least Conservative, Less Conservative, Moderately Conservative, More Conservative, and Most Conservative. Using the range of previously calculated distance scores, we then created five ‘Ideal Position Scores’ that would perfectly represent the five different desired viewpoints. Last, we compared each candidate’s distance to the ideal position to which they were closest to get their Opinion Diversity Score, which is a measure of how close they are to representing one of the desired viewpoints.

For each of the five viewpoints, the candidate who was closest was selected to be in the debate:

[Table: the candidate selected for each of the five viewpoints, by opinion diversity score]

Notice that, compared to the quality measure, these results would select only two of the same top-five candidates. Chris Christie gains a spot because he’s the least conservative of all the candidates; Rick Santorum gains a spot because he’s the most conservative; Marco Rubio beats out Rick Perry and Scott Walker for the more conservative spot; John Kasich takes the less conservative spot from Rand Paul; Mike Huckabee is closest to the middle, stealing the spot from Jeb Bush and Donald Trump.

By itself, the quality measure doesn’t ensure diversity of opinions – Jeb Bush and Mike Huckabee are not very different in terms of their ideological position scores, yet they would both get a spot in the debate. On the other hand, the diversity method doesn’t account for candidate quality in any way – Donald Trump is much less qualified than Jeb Bush, but his opinion diversity score is better so he would take the spot from Bush (if they both hadn’t lost it to Mike Huckabee, who is also less qualified than Bush).

Our final results, however, combine the methods above to ensure both candidate quality and opinion diversity. This provides a way to select a group of candidates that is both diverse and highly qualified. In theory, this group of candidates will lead to an optimized debate because each of the desired viewpoints will be defended by the most qualified candidate available to represent it.

To do this, we simply gave an equal weighting to candidate quality and diversity of opinion. The opinion diversity score of each candidate was changed to a percentile, and this new percentile score was averaged with the candidate’s quality percentile ranking (this would be the equivalent of rating each of the first 5 factors as “1” and the last as “5” if you want to double check our work).

Using the combined method, the five candidates that should be invited to the debate are:

[Table: final combined rankings of the five selected candidates]

This is ThoughtBurner’s solution to the debate optimization problem.

Of course, you may feel differently. We weighted all of our measures equally when deciding whom to invite to the presidential debates. But if you don’t think all of our factors matter equally, then your solution to the debate optimization question is different. You might not think that diversity of opinion should count for half of what makes a good debate. Or maybe you think that Fundraising Numbers isn’t a good factor to include in candidate quality, so you would never consider it when designing your optimal debate.

To account for this, and to help you find your own personal solution to the debate optimization problem, we’ve created a personalized debate optimization calculator. Using our calculator, you can assign different weights of importance to each of the 6 factors we used that influence debate quality (5 candidate quality factors plus 1 measure of opinion diversity). We assign relative weights to each of the factors according to your personal 1-10 ratings of importance. After we perform the same calculations to measure candidate quality and opinion diversity, we then weigh each factor by your personalized weights and re‑rank all of the candidates.

Rather than having to think about, research, and design an optimal debate all by yourself, ThoughtBurner’s optimization calculator takes your preferences and gives you a personalized solution. If FOX and CNN really want to have the best debate, they could allow a large number of people to enter their personal ratings of each factor’s importance, average the ratings, and then feed the average preferences into our debate calculator.

Technical Appendix

This is for all of you who are more interested in learning how we calculated our custom measures for determining candidate quality and optimal ideological positioning. Below, we both explain our measures in more detail. You can also download our spreadsheets below if you’d like.

Measuring Candidate Quality – Boyd Garriott

There are 5 criteria I used to measure candidate quality: Endorsements[ii], Net Favorability Ratings[iii], Polling Averages[iv], Fundraising Numbers[v], and Political Experience.

Political experience was calculated by assigning 1 ‘point’ for each year a candidate served in a local political position, 2 for each year in a state legislature, 3 for each year in a state-wide office [elected or appointed] as well as for federal appointments and House of Representatives, and 4 for each year spent as a governor or senator.

Further, each criterion is then converted into a percentile, and each criterion’s percentile is weighted based on your desired importance.
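A minimal sketch of that scoring step, with made-up raw numbers and generic candidate names standing in for the real data (the actual inputs are in the spreadsheet linked below):

```python
import pandas as pd

# Made-up criterion values (higher = better); the real inputs are in the spreadsheet.
raw = pd.DataFrame({
    "endorsements":      [30, 5, 0, 12],
    "net_favorability":  [10, -5, 20, 3],
    "polling_avg":       [12.0, 18.0, 4.0, 9.0],
    "fundraising":       [100, 40, 25, 60],
    "experience_points": [50, 0, 30, 20],
}, index=["Candidate A", "Candidate B", "Candidate C", "Candidate D"])

# Convert each criterion to a percentile rank, then take a weighted average.
percentiles = raw.rank(pct=True)
weights = pd.Series(1.0, index=raw.columns)   # equal importance; adjust to your own ratings
quality_score = (percentiles * weights).sum(axis=1) / weights.sum()
print(quality_score.sort_values(ascending=False))
```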

The thought process behind including each of these criteria is as follows:

Endorsements are a proven predictor of a candidate that can rally the support of their respective party – a necessity to win a party’s nomination.

Net Favorability ratings show what polling doesn’t – the support beyond a candidate’s ‘first-choice’ supporters. This indicates the staying power of a particular candidate since, in a field this big, candidates are going to need to steal support from beyond their die-hard loyalists.

Polling averages, while limited in scope, show which candidates the party’s voters really want to see on that stage and should rightfully be included.

Fundraising numbers show that a candidate is receiving adequate monetary support – an indicator of broader electoral support and long-term viability.

Lastly, political experience shows that a candidate has governing credentials and should be better able to withstand the intense scrutiny that comes with running for (or serving in) political office. Basically, Trump polling higher than, say Rick Perry, shouldn’t completely discount Perry’s 15 years as governor.

Spreadsheet of Candidate Quality Calculations

Measuring Diversity of Opinion – Kevin DeLuca

First, I recorded each of the candidates' viewpoints and positions based on their 'issues' score from this website[vi]. Then, I calculated a 'distance' score for each candidate by averaging each pairwise difference between the candidate and all of the other candidates. This is their position score, which measures how conservative a candidate is compared to all of the other candidates. Again, a score of 0 means the candidate is exactly 'average' in terms of their conservative-ness; positive numbers are more conservative and negative numbers are less conservative.

The scores ranged between -4.09 (Christie) and 3.41 (Santorum). I took the range (7.40) and divided it by 4 (1.85) to get the ideal distance between viewpoints (assuming 5 candidates). I then constructed the ideal viewpoints to include 0, the average viewpoint, and spaced the remaining positions 1.85 points apart in both directions – these are the 'ideal' position scores (to maximize representativeness).

Next, I matched up each candidate to the ideal viewpoint to which they were closest, and took the absolute difference between their position score and the ideal position score. This number tells us how close the candidate is to representing one of the ideal viewpoints for the debate. The lower the number, the closer they are.

Last, for each of the 5 viewpoints I selected the candidate that was closest to that viewpoint (the candidate with the lowest 'distance' from the ideal position). This method ensures that the candidates whose views most closely match the ideal positions are invited, which allows the whole conservative spectrum to be represented at the debate.
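
To make the procedure concrete, here is a rough Python sketch of the whole thing using made-up 'issues' scores. The candidate letters and numbers are purely illustrative – the actual inputs and results are in the spreadsheet linked below.

    # Sketch of the opinion-diversity procedure with made-up 'issues' scores
    # (the real inputs come from the candidate-issues site cited above).

    # Hypothetical conservative-ness scores for each candidate.
    issues = {"A": 2.0, "B": 8.0, "C": 5.0, "D": 6.5, "E": 3.5, "F": 7.0, "G": 4.0}

    # Position score: average difference from every other candidate
    # (0 = roughly average; positive = more conservative than the field).
    def position_score(name):
        others = [v for k, v in issues.items() if k != name]
        return sum(issues[name] - v for v in others) / len(others)

    positions = {name: position_score(name) for name in issues}

    # Ideal viewpoints: 5 evenly spaced positions centered on 0,
    # spanning the observed range (spacing = range / 4).
    lo, hi = min(positions.values()), max(positions.values())
    step = (hi - lo) / 4
    ideals = [step * k for k in (-2, -1, 0, 1, 2)]

    # For each ideal viewpoint, invite the closest candidate. Removing a
    # candidate once matched is a simplification to keep the invitees distinct.
    invited = []
    remaining = dict(positions)
    for ideal in ideals:
        best = min(remaining, key=lambda name: abs(remaining[name] - ideal))
        invited.append((round(ideal, 2), best, round(remaining.pop(best), 2)))

    print(invited)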

Spreadsheet of Candidate Diversity Calculations

_______________________________________________________________________

[i] http://presidential-candidates.insidegov.com/

[ii] http://projects.fivethirtyeight.com/2016-endorsement-primary/

[iii] http://www.gallup.com/poll/184337/among-republicans-gop-candidates-better-known-liked.aspx?utm_source=position1&utm_medium=related&utm_campaign=tiles

[iv] http://www.realclearpolitics.com/epolls/2016/president/us/2016_republican_presidential_nomination-3823.html

[v] http://www.nytimes.com/interactive/2016/us/elections/election-2016-campaign-money-race.html

[vi] http://presidential-candidates.insidegov.com/

_______________________________________________________________________

PDF Version

Personalized Debate Optimization Calculator

Candidate Quality Calculations

Candidate Diversity Calculations

New Logo, New Post, New Stuff

Hey all,

No Thoughts at the moment – I am spending my summer traveling and enjoying vacation before I start work in the fall. But I have done a bit of work on the ThoughtBurner website – I created a new logo (read about it here) and I've done some reorganizing of the different pages. I've also started working on some new Daily Optimization research projects (which involve some heavy data collecting) and am eating a lot of Korean food (not part of the data collecting, but also delicious). I've also created a ThoughtBurner twitter account (@ThoughtBurner) and Facebook page. I should have one or two research projects, as well as some new Thoughts, published sometime in August, so make sure to check back soon. Until then, wait on the edge of your seats as you anticipate how much better off you will be when I publish the next Thought, which will undoubtedly reveal insights that will help you better optimize your life.

ThoughtBurner Logo

ThoughtBurner needed a logo, so I finally made one:

[Image: the new ThoughtBurner logo]

The first character is 'tau', the Greek letter for 't', and the second character is 'beta', the Greek letter for 'b'. Both are in lowercase. As you may have deduced, "tb" or tau-beta stands for ThoughtBurner. The reason that Greek letters were used rather than their modernized counterparts is that economists are seriously in love with Greek letters. Exhibit A, from a finance class:

[Image: screenshot of Greek-letter-heavy formulas from a finance class]

Ew.

It is only fitting, then, that ThoughtBurner's logo utilize the letters of economists (really mathematicians, but w/e). Also, it looks significantly cooler with Greek letters (in the colloquial sense, not the statistical sense).

I would also like to point out that tau and beta work particularly well for the logo due to their common usage in economics. Tau is often used to denote time subscripts in economic models, and beta is usually the 'coefficient of interest' when economists run regressions and test for significance. Not only does the tau-beta combination represent the letters of ThoughtBurner, it is also made of Greek letters that are important and commonly used in economics! Basically, I think I'm really clever.

Check out the new logo in action on our new Facebook page and follow us on our new twitter account, @ThoughtBurner.

Thought: Speeding Quotas In Austin, Texas

5/14/15

By Kevin DeLuca

ThoughtBurner

Opportunity Cost of Reading this ThoughtBurner post: $1.61 – about 7.3 minutes

This is the last post I'll make about speeding, I swear (for now). You can read about the driver's problem or about the government's problem in my earlier posts about speeding. I wanted to share one more interesting discovery I made while investigating common speeding myths – hopefully it will help you better optimize your speeding behavior. I investigate the claim that the City of Austin police use speeding ticket quotas, and I explain how to optimally adjust your behavior to account for the effects of these possible quotas.

TRAFFIC QUOTA BEHAVIOR IN AUSTIN POLICE

I have often heard from friends and family that you shouldn’t speed at the end of the month because police officers will be trying to finish their month’s quota of speeding tickets. Also, apparently there were a few police forces around the country that got caught using speeding ticket quotas in the past, including cities in Texas[i]. Even though speeding ticket quotas are illegal in Texas[ii], it may be the case that police departments have implicit quotas or “performance standards” that effectively create the same behavior that we would expect from explicit quotas.

The assertions, then, are that 1) police are required, either explicitly or implicitly, to give out a certain number of speeding tickets each month and 2) as each month comes to an end, police give out more tickets to make sure that they meet their quota.

Rather than continuing to speculate, I turned to the data. Using the City of Austin’s open data website[iii], I gathered the details of every traffic violation given out by city police in fiscal year 2013-2014 (data available at the end of this post). After removing all non-speeding traffic violations, there were 67,606 tickets given out for a number of different types of speeding violations:

[Table: number of tickets given out, by type of speeding violation]

This amounts to about 185 speeding tickets every day on average – roughly 1 daily ticket for every 4,700 Austinites. Included in the data set is the date of each violation, which is exactly what we need to test whether police actually try to catch more speeders at the end of each month, as we might suspect.

First, I simply plotted the average number of speeding tickets for each day of the month and looked to see if it increased as the end of each month drew near. If more tickets were given out on later days of the month, it would provide suggestive evidence of quota behavior by Austin police. Here are the results:

[Figure: average number of speeding tickets given out on each day of the month]

It looks to me like the number of tickets given out actually decreased as the month came to an end. Visually, it doesn't appear that there are quotas – most days, police gave out between 150 and 200 tickets, with no clear increase as the month came closer to an end. It was a bit suspicious to me, however, that there seemed to be a higher-than-average number of tickets given out between the 14th and 22nd of the month, with a sudden and somewhat lasting decrease starting on the 23rd.

I was also worried that there might be day of the week effects – maybe the police department had certain days of the week where all of the cops would spend a large part of their time trying to catch speeders in the city. To visually test for this, I plotted the total number of tickets given out on each day of the week:

[Figure: total number of speeding tickets given out on each day of the week]

While there appeared to be no single day on which police gave out far more tickets (except maybe Monday?), it looks like police gave out far fewer tickets on Sundays compared to other days. Because of this pattern, I decided to add in controls for day of the week and day of the month effects just to be safe.

First, I calculated the average number of tickets given out on each day-of-week-day-of-month combination. For example, there were three Tuesdays that were also the first day of a month in fiscal year 2013-2014, and there were a total of 373 tickets given out on those three days, which made for an average of 124.33 tickets on Tuesday-the-1sts. Using these calculated averages, I ran a regression of average number of tickets on day of the month and day of the week controls:
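
The actual analysis was done in Stata (the do-file is linked at the end of this post), but here is a rough Python sketch of the same setup. The CSV file name and the 'violation_date' column name are assumptions for illustration, not the real layout of the data.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Rough Python sketch of the regression described above. Assumes a CSV
    # with one row per ticket and a date-only 'violation_date' column
    # (both names are assumptions, not the actual file layout).
    tickets = pd.read_csv("austin_speeding_2013_2014.csv",
                          parse_dates=["violation_date"])

    # Count tickets on each calendar day, then tag each day with its
    # day of the month and day of the week.
    daily = tickets.groupby("violation_date").size().rename("n_tickets").reset_index()
    daily["day_of_month"] = daily["violation_date"].dt.day
    daily["day_of_week"] = daily["violation_date"].dt.day_name()

    # Average number of tickets for each day-of-week / day-of-month combination.
    cells = (daily.groupby(["day_of_week", "day_of_month"])["n_tickets"]
                  .mean().rename("avg_tickets").reset_index())

    # Regress the cell averages on a linear day-of-month trend plus
    # day-of-week dummies (Saturday is the omitted comparison day).
    model = smf.ols(
        "avg_tickets ~ day_of_month + C(day_of_week, Treatment(reference='Saturday'))",
        data=cells,
    ).fit()
    print(model.summary())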

[Regression output: average number of tickets regressed on day of month, with day-of-week controls]

The regression results confirm what we suspected from the visual inspection of the data. As the day of the month increases (as we get closer to the end of each month), there is no significant effect on the average number of tickets police gave out (negative, insignificant coefficient on DayofMonth). On Sundays, cops gave out about 100 fewer tickets than on Saturdays (Saturday is the comparison day since it was dropped), and the effect is highly significant (coefficient on Sun). Also, cops gave out about 40 more tickets on Mondays, and this effect is marginally significant (coefficient on Mon)[iv]. Because there is no information about the number of people who drive on each day of the week, we can't tell if these day-of-the-week effects exist because cops act differently or because people drive differently over the course of a week. For example, if half as many people drive on Sundays, then that could explain why we see about half as many tickets.

While the day of the month variable was insignificant in the regression above, it doesn’t necessarily rule out the possibility of speeding ticket quotas. It just means that there probably aren’t ticket quotas due at the end of each month. If Austin City police had to meet their quotas by the middle of every month – maybe by the 22nd, for example – then the regression above wouldn’t be able to detect the expected pattern in the average number of speeding tickets given out.

To test for quotas that are "due" on different days of the month, I created artificial "quota months" where the suspected due date of the quota was made to be the last day of the month. For example, there was suggestive visual evidence that the 22nd might be the last day of a quota, so I changed the 22nd to be the last day of the month (31st), the 23rd to be the first day of the month (1st), and so on for each day of the month. Using the new, modified dates as the day of the month variable, I reran the regression with controls to see if the pattern of the average number of tickets given out conformed to expected quota behavior for any of the hypothetical quota months (i.e. if there was an increase in tickets towards the end of the new quota months). I actually did this for each possible monthly quota due date, but rather than showing you all 31 regression results, I put them in a nifty little table (described below) that summarizes the important parts.

The suspected end date column indicates by which day of the month I am assuming police officers had to complete their speeding ticket quota. For example, the expected end date of 22 means that I assume that police officers had to reach their quota by the end of the 22nd of each month. The effect size column indicates how big the effect is if I was correct about the quota end date. So, if I was right that the quota was due on the 22nd, then police gave out 1.434 more tickets on average for each day closer to the 22nd (starting from the 23rd of the month before). The highlighted rows are statistically significant effects at the 5% level.

The 21st, 22nd, and 23rd are all prime suspect dates for a quota end date. If the City of Austin police force is using a quota system, police most likely need to meet their quota by one of these dates. The 13th is essentially the least likely quota end date – if it were the actual quota end date, then police would actually be giving out about 1 fewer ticket per day (significantly) as their quota end date drew closer, which doesn't really make sense.
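
If you want to replicate the re-dating exercise yourself, here is a hedged Python sketch of it, using the same assumed file and column names as the sketch above. Treating every month as 31 days long is a simplification made for illustration.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Sketch of the 'quota month' test (file and column names are assumptions,
    # as in the previous sketch).
    tickets = pd.read_csv("austin_speeding_2013_2014.csv",
                          parse_dates=["violation_date"])
    daily = tickets.groupby("violation_date").size().rename("n_tickets").reset_index()
    daily["day_of_month"] = daily["violation_date"].dt.day
    daily["day_of_week"] = daily["violation_date"].dt.day_name()
    cells = (daily.groupby(["day_of_week", "day_of_month"])["n_tickets"]
                  .mean().rename("avg_tickets").reset_index())

    results = []
    for end_date in range(1, 32):
        shifted = cells.copy()
        # Re-date the month so the suspected quota end date becomes 'day 31';
        # days after it roll over into the next quota month (every month is
        # treated as 31 days long, a simplification).
        shifted["quota_day"] = ((shifted["day_of_month"] - end_date - 1) % 31) + 1
        fit = smf.ols(
            "avg_tickets ~ quota_day + C(day_of_week, Treatment(reference='Saturday'))",
            data=shifted,
        ).fit()
        results.append({"suspected_end_date": end_date,
                        "effect_size": fit.params["quota_day"],
                        "p_value": fit.pvalues["quota_day"]})

    print(pd.DataFrame(results).sort_values("p_value").head())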

I want to be careful with interpreting the results – this does not prove that police officers are using quotas; it just suggests that if they are, then their quotas are probably due between the 21st and 23rd of each month. The assumptions are: police officers do in fact have quotas, and police "procrastinate" in the sense that they wait until the quota end date is close to finish collecting their quota. If these assumptions are true, then we would see a higher number of tickets given out as the quota end date approaches. A higher, significant effect size means that police more strongly follow this expected behavior, given that the suspected end date is in fact when a monthly quota is due. The effect size, then, is sort of like the "chance that the quota ends on that date". Keeping this interpretation in mind, it becomes clear that if a quota exists, then it probably needs to be met sometime between the 21st and 23rd of each month, and almost definitely is not due around the 13th. This is apparent visually as well:

[Figure: estimated effect size for each suspected quota end date]

Another word of caution: As I mentioned before, I could not control for the number of drivers on each day. While I can’t think of a reason why more people would drive (or be more likely to speed) on these days of the month (and therefore more tickets would be given out), we can’t rule out the possibility that driver behavior is causing the trend rather than police quota behavior.

Last, I just want to point out that even if we were right about the existence of speeding quotas and police behavior and the day that quotas were due (the 22nd, say), the effects are rather small – only about 1.5 more tickets per day across all police officers and drivers. There are 2,300 police officers in Austin[v] and the city has a population of 885,400[vi], so 1.5 more tickets a day is a practically small effect.

Caveats aside, according to the best evidence I have, it appears that Austin police do not give out more tickets at the end of each month. Rather, they give out more tickets during the 3rd week of each month, between the 15th and 22nd. If speeding ticket quotas do exist, explicitly or implicitly, then they are most likely to be due around the 22nd of each month. For all of you drivers out there who are worried about the increased probability that you will get a speeding ticket due to ticket quotas, you should all speed a little less during the 3rd week of each month, and maybe a little more during the last week. Also, there seems to be some evidence that you are less likely to get a ticket on Sundays, and more likely to get a ticket on Mondays.

This means that Sunday-the-23rds are probably the best days to speed, from the driver's point of view. The most recent Sunday-the-23rd (that I have data for) was in March of 2014, and only 32 speeding tickets were given out. Compare that to the Austin City average of 185 tickets per day. August 23, 2015 is the next Sunday that is also the 23rd of the month – this day would probably give you the best chance to speed without getting caught. Monday-the-22nds would probably be the days you are most likely to get a speeding ticket – June 22nd, 2015 is the next of these. Make sure you plan your speeding accordingly.

_______________________________________________________________________

[i] http://blog.motorists.org/if-you-didnt-believe-ticket-quotas-existed-before-you-will-now/

[ii] http://www.statutes.legis.state.tx.us/Docs/TN/htm/TN.720.htm (see Sec. 720.002. PROHIBITION ON TRAFFIC-OFFENSE QUOTAS.)

[iii] https://data.austintexas.gov/

[iv] An F-test on the day-of-the-week coefficients shows that the average number of tickets given out is not significantly different across Tuesday through Saturday. This test and its results are included in the Stata .do file.

[v] http://www.austintexas.gov/department/police

[vi] http://quickfacts.census.gov/qfd/states/48/4805000.html

_______________________________________________________________________

PDF Version

Excel File

Notice About Stata Files: Because WordPress does not allow uploads of .do or .dta files (for Stata), I have uploaded the files as .doc files. If you want to use them in Stata, simply download each file and rename the .doc extension to the appropriate extension listed below.

2014dataraw (rename to .dta)

QuotaBehaviorTest (rename to .dta)

SpeedingDoFile (rename to .do)