A Behavioral Lens on Being Mortal
What would you be willing to trade to live for two more months? Would you risk paralysis of your lower body? Would you go on a feeding tube? Or would you rather accept death?
These are the difficult choices that face many patients and their caregivers. In Being Mortal, Dr. Atul Gawande argues that end of life care tends to focus on increasing the patient’s length of life, while sacrificing quality of life. In light of this dilemma, Dr. Gawande urges us to be more thoughtful when considering end of life treatment options. We should make deliberate choices that adhere to our personal definition of well-being. While one individual might accept a high risk of paralysis in exchange for the ability to watch football and eat ice cream, others might make radically different tradeoffs.
Looking at Being Mortal through a behavioral lens reveals a parallel argument: in order to support individual well-being, end of life care must focus on goals rather than preferences. To understand why, we can look to one of the most challenging human biases – the empathy gap.
Empathy gaps occur when we overemphasize our current “hot” emotional state and miscalculate our future “cold” state behavior (Loewenstein, 2005). The binge drinker who swears off alcohol at the height of inebriation offers a perfect example of the empathy gap in action. Overwhelmed by visceral pain and preoccupied with his emotional state, he vows to never drink again. What he doesn’t realize, however, is that his choice is heavily influenced by his “hot” state. After sobering up and returning to a “cold” state, he forgets the ill effects of inebriation and repeats the cycle at the next party.
The binge drinker in this example teaches us that a choice made in one state might not hold up in the other, because visceral factors like hunger, thirst, pleasure, and pain can cloud our judgment.
Now consider three common preferences in a living will:
- Do you want cardiopulmonary resuscitation if your heart ceases to function?
- Do you want mechanical respiration if you are unable to breathe unassisted?
- Do you want artificial hydration or nutrition if you are unable to eat or drink?
How can we expect a healthy individual sitting in a lawyer’s office to accurately make choices about a distant hypothetical life-threatening situation? Even in the cold state of good health, about 1 in 3 individuals will reverse their preferences for life-sustaining interventions at least once in a two-year period (Danis et al., 1994; Ditto et al., 2003). This cold state indecisiveness makes it difficult to generate authentic living wills in times of health. Add to that the hot state complexity of living with dementia or paralysis, and making end of life choices looks like a losing behavioral proposition. This cold-to-hot state empathy gap represents the inverse problem of the binge drinker, yet it is just as difficult to overcome.
Now consider some of the questions mentioned in Being Mortal:
- What is your understanding of your situation and its potential outcomes?
- What are your fears and what are your hopes?
- What are the trade-offs you are willing to make and not willing to make?
While Dr. Gawande acknowledges the importance of assessing preferences for life-sustaining interventions, he appears to focus primarily on a patient’s goals for living. The behavioral literature supports this approach in theory, as goals are more durable than preferences. When we set goals, we draw on our experiences in order to decide what we want to achieve. Goals do shift as an individual moves through life, but preferences are far more susceptible to empathy gaps and other context biases, such as framing. Some of these biases are fairly minor, but others will cause us to make critical mistakes. This is especially true when we are asked to make choices about situations with which we have little to no experience, which is often the case for end of life care.
As I’ve written before, the behavioral literature can only offer heuristics and guardrails for optimizing choice and mitigating bias. Ultimately, medical professionals must negotiate these behavioral hurdles and help their patients to make the right tradeoffs.
We’ve been hunter-gatherers for 99% of our history. We used to walk between five and ten miles a day, consume five pounds of sugar each year, and scour all day for sources of fat. We craved calorie-rich foods and our bodies figured out how to store energy to survive frequent bouts of famine.
We now live in a materially different environment. The average American now walks about one-third as far, consumes 25 times more sugar, and has ample access to caloric and fatty foods.
Our bodies are wired for famine, but we now live in abundance. As a result, we overeat and struggle to metabolize the excess sugar and fat, leading to obesity and type 2 diabetes. These chronic conditions are mismatch problems partially caused by a gap between our environment and our biology. A mismatch problem occurs when the traits that helped us thrive in our evolutionary past become inadequate in our new environment.
Mismatch problems also exist in the brain. Just as our endocrine system didn’t evolve to handle a high sugar intake, our minds didn’t specifically evolve to create art, philosophize, or make complex decisions.
Consider the Linda problem made famous by Amos Tversky and Daniel Kahneman in 1983 [PDF]:
Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.
Which is more probable?
A. Linda is a bank teller.
B. Linda is a bank teller and is active in the feminist movement.
If you’re like 85% of people, you’ll choose B because the description of Linda makes her sound like a feminist bank teller. However, basic probability reminds us that there must be more bank tellers than feminist bank tellers, as the latter is a subset of the former, so A must be more probable. This error is known as the conjunction fallacy.
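The conjunction rule behind this error is easy to verify mechanically. As a minimal sketch (the probabilities below are illustrative assumptions, not figures from the 1983 study), the joint event can never be more probable than either of its parts:

```python
# Conjunction rule: P(A and B) <= P(A), no matter how representative
# the description makes B sound. The numbers here are illustrative
# assumptions, not data from Tversky & Kahneman (1983).

p_teller = 0.05                  # assumed P(Linda is a bank teller)
p_feminist_given_teller = 0.30   # assumed P(feminist | bank teller)

p_teller_and_feminist = p_teller * p_feminist_given_teller

# The conjunction is a subset of the single event, so it is bounded by it.
assert p_teller_and_feminist <= p_teller
print(f"P(teller) = {p_teller}, P(teller and feminist) = {p_teller_and_feminist:.3f}")
```

Whatever values you pick for the two probabilities, multiplying by a conditional probability of at most 1 can only shrink the result, which is why option A must always be at least as probable as option B.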
Just as we can’t blame our bodies for struggling to keep our blood sugar stable after eating processed foods, we can’t fault our minds for making mistakes when asked to interpret abstract conjunctions. It’s unlikely that ancient humans needed to ruminate on conjunction problems to excel in the age of the woolly mammoth, so it’s no surprise we make these mistakes. In this way, being bad at abstract probabilities is like having diabetes — these simply aren’t problems that our evolutionary ancestors needed to handle.
How can we solve mismatch problems? We often attempt to enhance our capabilities through education. Unfortunately, the food pyramid is no match for biology programmed to devour the bulk bag of candy that awaits us in the kitchen.
Instead of trying to band-aid our biology, we should focus on shaping our local environment. We can craft our surroundings, which we often control, to promote the desired behavior. If we don’t buy bulk bags of candy, we won’t need to exhaust willpower to fight the inevitable temptation to overindulge.
The same technique applies to the Linda problem. When the problem is reframed with natural frequencies that match our experience of the world, fewer than 15% of people violate the conjunction rule (Hertwig & Gigerenzer, 1999 [PDF]):
In an opinion poll, the 200 women selected to participate have the following features in common: They are, on average, 30 years old, single, and very bright. They majored in philosophy. As students, they were deeply concerned with issues of discrimination and social justice and also participated in anti-nuclear demonstrations.
Please estimate the frequency of the following events.
How many of the 200 women are bank tellers? ____ of 200
How many of the 200 women are bank tellers and are active in the feminist movement? ____ of 200
Humans have been evolving for about two million years, but the demands of today are a substantial shift from our past. Our diet has changed dramatically in the last few thousand years, and we started using probabilities only a few hundred years ago. Our genetics simply haven’t had time to catch up.
In the meantime, we are simply caught in a state of disequilibrium between our genes and our environment. We shouldn’t hold ourselves to unreasonable normative standards in light of our evolutionary past. While we can’t yet easily change our genes, we can architect our environment to promote the behavior we desire.
Thanks to Kristen Berman, Miles Grimshaw, and Anastasiya Novatorskaya, who provided feedback on drafts of this article.
The well-known story of how the Economist magazine’s clever pricing led to a dramatic increase in premium subscriptions – an example of the asymmetric dominance or decoy effect1 – is probably incorrect. And it doesn’t really matter.
Two separate journal articles published this year have failed to replicate this classic finding after multiple attempts with thousands of participants (Frederick, Lee, & Baskin, 2014; Yang & Lynn, 2014). These studies have helped clarify that the asymmetric dominance effect generally only occurs with salient, numerically-defined attributes in which the dominance relationship is readily encoded. In the Economist example, the choices are defined by a jumble of words and numbers. It is much easier to compare $59 to $125 than “Economist.com subscription” to “Print & web subscription.” Combining these two dimensions mutes the effect.
Why did the example “work” ten years ago and why is it “not working” today? Nobody knows. Perhaps the concept of a print magazine is simply less relevant in 2014. Perhaps participants in earlier studies were unintentionally biased in some unrecognized manner. Perhaps the random perturbations of behavior flew under the arbitrary 5% level in statistical significance testing [pdf].
The true reason for the discrepancy, if it could be determined, is irrelevant. The Economist example teaches us this crucial lesson: behavioral science research cannot tell us what will or will not translate into the real world. Nor does it claim to. All it can do is suggest ways of thinking about the problem and point us in some general directions for experimentation.
Taken together, the behavioral science literature represents a tour de force of how bad we are at things we think we are pretty good at. There is not – and can never be – a grand theory that explains all of these observations. Behavioral decision theory is not a closed-ended discipline. We have some decent constructs and psychological models, but behavioral predictions in multifactorial decision contexts are hard to make and even harder to defend. I admire and applaud authors who deftly translate academic research into accessible vignettes for a general audience, but stories lifted from academic journals are not designed to translate directly into the messy real world. That’s precisely why the Economist example might not work today, and that’s why it doesn’t matter whether it works.
This reality ought not discourage. While human behavior is complicated, the heuristic value of this research is substantial, and applications to public policy and health care are particularly encouraging. It will simply require leaders with intellectual honesty, perseverance, and a dedication to experimentation to derive practical value from the groundwork laid by decades of research.
1 If you need a brief primer on asymmetric dominance, Wikipedia has a decent article. To be fair, this is not a pure example of the asymmetric dominance effect, as the third option completely dominates the second, but this is a distinction unimportant to the broader point.
Suppose your office printer breaks. What problem do you solve?
If you answered “a broken printer,” you’re likely going to buy a new printer. Your solution was a near tautology given the structure of the problem. This is how 99% of people “solve” problems. This is one reason why 99% of people are mediocre. (It’s not entirely their fault; they weren’t trained to do otherwise.)
Try raising the problem. Take it up a few notches. Ask: Why does the broken printer matter? What function did it serve? What is the goal that printing enables?
You might conclude that printers are used for communication—a much higher order goal. Now you don’t have a “broken printer” problem; you have a “communication” problem. That’s a problem that hurts because it forces you to stop and consider the situation in a way that doesn’t posit an obvious answer. It’s a real question. This new problem framing leads to an entirely different line of thinking:
How do people communicate here? Why do people print instead of email? Is the fifteen megabyte file size limit preventing employees from emailing large files? Why would someone print instead of use a projector? Are the conference rooms sufficiently private? Did we lose the VGA to HDMI adapter again? Why does the sales team dominate printer use? Are there alternatives to printing and faxing contracts to obtain signatures?
Perhaps you’ll end up with a paperless office in which everyone uses tablets, edits documents collaboratively in the cloud, and signs contracts electronically. Perhaps you’ll just replace your broken printer. Regardless, you’ll learn a lot about your environment and start to proactively solve problems that were not on your radar. Some of these problems will be interesting and meaningful.
We’ve all been told that problem framing is important for big strategic questions. The truth is that accepting the default problem leads to default solutions in any context. Managers tolerate this laziness because it results in mediocrity, not catastrophe. You probably won’t single-handedly destroy your company by rushing out to buy a new printer, but you likely won’t be the person who invents the iPad either.
Don’t be mediocre. Next time you encounter a challenge, raise the problem until it hurts—until it makes you really stop and think. Go solve that problem.
The federal health insurance exchanges launched in 2013 to a modestly encouraging but tepid insurer response. Anxious at the prospect of developing consumer insurance products to be sold through a relatively untested channel and targeted at a challenging demographic, many large insurers adopted a wait and see approach. While the marketplaces exceeded enrollment targets and enabled millions of Americans to purchase insurance, consumers in many markets faced limited options. In several states, such as New Hampshire, Mississippi, and Rhode Island, only one or two insurers participated on the exchanges.
Core to the public insurance marketplace thesis was the concept that a larger individual insurance market would attract competition, which would in turn reduce prices. However, many in the industry understood that competition on the exchange would not mature overnight. Prior to the Affordable Care Act, it was not uncommon for the dominant player in a state’s individual insurance market to enjoy a 50% market share. As coverage year 2015 health exchange filings become available, we can start to examine the relationship between market competition and insurance premiums on the federal exchange.
My first analytical approach was to plot the mean monthly premium in each state (by coverage level) against the number of unique insurers in each market (see R code on Github). A basic linear regression of premiums by number of unique insurers and by coverage level revealed no significant relationship.
This is clearly a disappointing discovery. However, it is possible that the important unit of analysis is not the mean premium in each market, but instead the minimum premium in each market. Repeating the same analysis with the lowest premium as our dependent variable reveals the following:
There is a significant negative relationship between competition and health insurance premiums for all coverage levels in 2014 and 2015. The regression model (collapsing 2014 and 2015 rates) suggests that each incremental market entrant corresponds to a 3.4% reduction in monthly premiums. Moreover, this trend continued as more insurers entered the market in 2015. However, linear regressions by year suggest that the size of the competition effect is shrinking: the effect of an incremental insurer on premiums was -3.7% in 2014 and -3.1% in 2015.
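The original analysis was done in R (linked above); the following Python sketch, using synthetic numbers rather than the actual exchange filings, illustrates the log-linear specification that underlies the “percent reduction per entrant” interpretation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (NOT the actual exchange filings): each additional
# insurer multiplies the minimum premium by exp(beta).
true_beta = -0.034                       # ~3.4% drop per entrant
n_insurers = rng.integers(1, 10, size=200)
log_premium = (np.log(250)               # assumed baseline monthly premium
               + true_beta * n_insurers
               + rng.normal(0, 0.05, size=200))  # market noise

# OLS of log(premium) on insurer count; exponentiating the slope
# converts it into a percentage effect per incremental entrant.
beta_hat, intercept = np.polyfit(n_insurers, log_premium, 1)
pct_change = (np.exp(beta_hat) - 1) * 100
print(f"estimated premium change per entrant: {pct_change:.1f}%")
```

Because the outcome is in logs, the slope is interpretable as an approximate percentage change per entrant, which is how a single coefficient can summarize the competition effect across markets with very different premium levels.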
A variety of potential factors could explain this pattern:
- As more insurers enter a market, the relatively fixed population is fragmented across a larger number of discrete risk pools, which leads to greater actuarial risk. However, an insurer with a dominant market share or national footprint may pool primary or secondary risk. A reduced risk profile should lead to lower premiums.
- In some markets, the past decade of provider consolidation gave integrated delivery systems and academic medical centers more negotiating leverage, which has led to higher contract rates. This may pose a particular challenge to new or young insurers (10-15% of the exchange market) that lack the patient volume to obtain the deep discounts required to compete on price.
- Selling insurance directly to consumers contributes to substantially higher marketing and administrative costs. However, national carriers may benefit from some scale efficiencies. Larger markets likely attract more insurers, and national insurers with potentially lower administrative costs are more likely to participate in these markets.
- Medical Loss Ratio (MLR) requirements squeeze all insurers into a narrow margin band, which deters all but the most confident and price-competitive entrants. A failure to reduce mean premiums might imply that insurers are approaching a theoretical local maximum on profitability. Low margins, even on high top-line revenue, might limit insurers’ ability to invest in the innovative benefit designs and reimbursement structures needed to dramatically improve savings.
These are some plausible factors that may contribute to the observed patterns, and it is important not to read too much into such a simple analysis. The key challenge behind this dynamic is that price competition is still predominantly volume-based rather than value-based. In other words, insurers in this market reduce costs by leveraging the number of patient lives they cover to obtain discounts from providers, not by fundamentally changing incentive structures to reward cost-effective improvements in patient outcomes. Only by properly aligning reimbursement with outcomes will our system be able to drive transformative reductions in total costs and improvements in patient care.
Warning – this is a bit dense!
Le Nozze di Figaro, commonly known as The Marriage of Figaro, is a comic opera written by Wolfgang Amadeus Mozart in 1786. The opera portrays the dramatic trials and tribulations of the soon-to-wed Figaro and Susanna as they navigate a series of feudal customs. While the story itself might be less familiar to the reader, Figaro’s Cavatina No. 3 (often referred to by its text incipit, Se vuol ballare) is one of Mozart’s best-known pieces. The music world has studied Le Nozze for centuries, but the emerging field of music cognition presents new opportunities for novel analyses. So, what can a 225-year-old opera teach us about the brain?
Musical Accentuation as Cognition
In the cavatina, Figaro flaunts his plans to thwart the Count, who intends to exercise his “feudal rights” with Figaro’s fiancée. Given the cavatina’s importance to the opera as a whole, Mozart set the text in a deliberate accentual pattern that highlights the words signaling Figaro’s emotional state. Interestingly, these accentual choices create extra emphasis by diverging from the established metrical patterns, placing particular weight on the word “si” to underscore the emotional developments in the plot. The poetic structure and musical accentuation in Se vuol ballare drive the complex rhythmic pattern that structures the listener’s perception of the piece within the overall plot of the cavatina. Using the Italian textual accentuation rules set forth by Balthazar (1992), I analyzed the poetic structure and accentuation in the No. 3 Cavatina (Figure 1). The divisions of the bars and the text/notation subdivisions are derived from a combination of the auditory perception of the piece and the musical score.
Figure 1: Poetic structure and accentuation in No. 3 Cavatina from Mozart Le Nozze
P = piano text setting
T = trunco text setting
S = sdrucciolo text setting
A = agogic accent
ME/T = metrical/textual accent
HM = hypermeasure
1 = words correspond roughly 1:1 with the first violin voice, textual accents match with agogic
2 = words do not correspond roughly 1:1 with violin voice
3 = strong disparity between metrical and agogic accents – feels like two separate beats
Hypermeasure = corresponds with hyperbeat for the motive for each section
Major accent = corresponds to location of most salient accent within each hyperbeat (note location/hypermeasure)
* = only a partial hypermeasure
Blank spaces indicate indeterminable or insignificant patterns
It is clear from Figure 1 that a pattern of A, B, C, D, A’, and E emerges, where the capital letters designate overall sections. Within these major sections are smaller text and notation subdivisions, separated based on the coherence of Figaro’s vocal line with the first violin part. For example, section A comprises an overarching rhythmic and textual pattern spanning measures 1-42. It includes a subdivision A1, in which Figaro’s line follows the first violin roughly 1:1 in pitch variation and note length; a subdivision A2, in which the words do not correspond with the first violin; and a second A1 subdivision similar to the first. This chart matches the auditory experience of the piece, in which measures 1-42 remain consistent in content and musical notation.
Next, measures 42-55 begin a climax in which the text is mismatched with the musical notation of the violin parts. Subsequently, measures 56-63 connect these more dramatic sections to the subdivision in measures 64-103, which contains the most significant climax in the cavatina. In this section, Figaro flaunts his plans to beat the Count at his own game, which is the central theme of the aria.
Finally, Figaro repeats the opening lines in measures 104-122, and measures 123-131 provide a flourish of instrumentation to mark the end of the cavatina. The interplay between the 3/4-metered sections A, B, C, A’, and E, interrupted by the 2/4-metered subdivision D, emphasizes the climactic structure of the piece. The patterns in notation and plot provide Mozart with fertile ground for manipulating the listener’s attention through deliberate accentuation.
Mozart’s use of segmentation, created by piano endings in the subdivided 3/4-metered sections of the first A1, second A2, C2, and A1’, creates a pattern of accentuation to which the listener’s attention becomes entrained. The combination of the musical elements in this section results in a specific pattern: trunco (where the final syllable is accented) and piano (where the penultimate syllable is accented) text setting endings are combined with significant auditory accents caused by metrical/textual accents in coincidence with agogic accents (see Figure 2).
Figure 2: Pattern of Metrical/Textual & Agogic Accents in Measures 1-8 of Le Nozze
Moreover, the compounding of these accents is amplified through interaction with the textual accents. Figure 3 displays the segmentation of the lyrical content in the first A1, second A2, C2, and A1’ sections. A pattern of quinario lines with piano and/or trunco endings, combined with a metrical/textual notational accent on the stanza-end quaternario line with either sdrucciolo (where the antepenultimate syllable is accented) or trunco endings, develops a strong pattern of accentuation in these sections. Additionally, the melismaticism created by the presence of alternating or consistent quarter notes per line for each quinario ending increases the power of the metrical entrainment. This pattern repeats throughout the piece, lending credence to the notion that Mozart deliberately used the combination of piano and trunco endings with textual and agogic accents to provide an entrainable rhythm that drives the listener’s perception of the cavatina.
Figure 3: Segmentation in No. 3 Cavatina from Mozart Le Nozze
STA = stanza number
REP = indicates repetition
QPL = quarter notes per line
MT/E = metrical/textual accent
Scansion = ending
The grouping of the abovementioned notational and accentual effects generates a hypermeasure that is then shifted by abnormal accentuation on the word “si.” The metric grid in Figure 4 for measures 1-4 of No. 3 of Le Nozze di Figaro demonstrates the various Metrical Preference Rules (MPRs) outlined by Lerdahl & Jackendoff (1983). Figure 4, when considered along with Figure 1, reveals a two-bar hypermeasure with the major accent (a combination of metrical/textual and agogic accents) in section A1 of the cavatina. This two-bar hypermeasure is similar in construction, in terms of the MPRs, to measures 9-12, although the major accent that anchors the hypermeasure shifts to the 7th beat in the sequence. Measures 13-16 should have been structurally similar to measures 1-4. However, the extra word “si” shifts the hyperbeat to a position more distant from the major accentual position, which falls on the 4th running beat (the first note of the second measure) of the 13-16 measure segment. While the metric grid differences between measures 1-4 and 9-12 are significant, they are consistent with a simple shift in the position of the major accent, whereas the shift between measures 1-4 and 13-19 constitutes an entire displacement of the hypermeasure. It is important to recognize that an extra quarter note in the first violin voice at the 11th position in measures 13-16 creates a whole note at that position. This additional note helps shift the beat that anchors the hypermeasure. This finding is significant because it reveals that the hyperbeat shift is caused directly by the unique accentuation of the word “si.” This departure from the expected attentional oscillations implies that the word “si” garners the listener’s attention by disrupting the established rhythmic patterns.
Figure 4: Metric Grid for No. 3 Cavatina from Mozart Le Nozze (mm. 1-4, 12-19)
The Link between Lyrical and Musical Attention
This analysis reveals that the combination of primary metrical/textual and agogic accents interacts with the piano and trunco line endings to give rise to a hypermeasure that is shifted by the addition of the word “si,” thus breaking the listener’s internal attentional oscillations. Considered within the context of the cavatina, in which the interplay between the 3/4 and 2/4 segments gives rise to a climax consistent with the plot of the opera, this early and repeated shift in attention plays an important role in the listener’s perception of the storyline. By punctuating Figaro’s opening sentiments about his interaction with the Count through the addition of the word “si,” Mozart disrupted listener attention to highlight the emotional climax of the plot. This manipulation of attentional oscillation suggests that Mozart had an intuitive understanding of the interplay between lyrical and musical attention, one that music researchers would not formally characterize until more than a century later.
1. Balthazar, Scott L., “The rhythm of text and music in ottocento melody: An empirical reassessment in light of contemporary treatises,” Current Musicology 49 (1992): 5-28; adapted from Robert A. Moreen, “Integration of text forms and musical forms in Verdi’s early operas” (Ph.D. diss., Princeton University, 1975).
2. Lerdahl, Fred, and Ray Jackendoff. “Theoretical Perspective.” In A Generative Theory of Tonal Music, 1-31. Cambridge: MIT Press, 1983.
Traditional insurance prices health products and services according to their cost. Formularies, the schedules of drug benefits created by insurance plans, operate under a one-size-fits-all model. As a result, insurance plans inadvertently discourage many health behaviors with high clinical value by pricing them without regard to their benefit to the patient. This leads to misaligned incentives and suboptimal outcomes. For example, a 2004 study demonstrated that decreasing medication co-payments for patients with coronary heart disease (the leading cause of death [pdf]) from $20 to $10 increased medication adherence from 49% to 76%. Poor medication adherence alone accounts for about 15% of total health care spending in the United States, and a nontrivial portion of this expense could be prevented by removing cost barriers to accessing medications.
Value-Based Insurance Design
Value-based insurance design (VBID) aims to correct the failure of the cost-based, one-size-fits-all approach. VBID acknowledges decades of behavioral economics research demonstrating that humans do not follow the strict axioms of rational decision making. VBID recognizes patient heterogeneity and adjusts out-of-pocket costs to match the expected clinical benefit according to evidence-based research. While traditional insurance design pairs a patient with an insurance product, the value-based insurance product is optimized for the patient, who is unable to properly assess the costs and benefits of most clinical options.

VBID suggests that a patient with coronary heart disease should have lower co-payments for the medications that drive the best outcomes. An insurance plan might also waive co-payments for routine physician visits for patients with chronic disease. These outpatient interactions are designed to monitor health, ensure medication adherence, and spot early indications of condition deterioration – often preventing expensive emergency inpatient care. VBID typically offers disease prevention programs promoting physical activity and proper nutrition at little to no cost. Type 2 diabetes adds an average of $85,000 in lifetime health care costs, so the expected outcome of preventive measures is often worth the up-front investment, as at least one empirical study has shown.

At the same time, VBID discourages the use of high-cost, low-value services. Identifying these services is challenging and sometimes controversial, but there are obvious interventions that reduce systemic costs. Colonoscopies in the same city frequently vary in price by up to 300%, so VBID often implements reference pricing to rein in overpaid diagnostic tests. Value-based insurance design helps alleviate the patient decision burden by incentivizing the high-value services that improve health outcomes.
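The core VBID mechanism – scaling cost-sharing inversely with expected clinical benefit – can be sketched in a few lines. Everything here is hypothetical: the benefit scores, the co-pay bounds, and the function itself are illustrative assumptions, not any real plan’s formula:

```python
# Toy sketch of the value-based idea: cost-sharing scales inversely
# with expected clinical benefit. The benefit scores and co-pay bounds
# are hypothetical illustrations, not any real plan's formula.

def value_based_copay(clinical_value, min_copay=0.0, max_copay=50.0):
    """clinical_value in [0, 1]: evidence-based expected benefit."""
    return round(max_copay - clinical_value * (max_copay - min_copay), 2)

# High-value: a statin for a coronary heart disease patient -> low co-pay
print(value_based_copay(0.9))   # 5.0
# Low-value: a duplicative diagnostic test -> near-full cost-sharing
print(value_based_copay(0.1))   # 45.0
```

A real plan would derive the benefit score from evidence-based research for each condition-treatment pair; the point of the sketch is only that the price a patient faces becomes a function of expected value rather than underlying cost.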
The Challenge of Economic Incentives
The Affordable Care Act follows the value-based insurance approach by permitting incentives for favorable behaviors, such as smoking cessation, weight loss, and exercise. These incentives are primarily economic – e.g., insurers may levy a 50% premium surcharge on smokers. Will it work? The empirical evidence is mixed. While 87% of large employers offer wellness programs, participation rates are abysmally low (5% to 8%). Benefit consultant Towers Watson predicted that an incentive of $350 per employee would be required to boost health risk appraisal (HRA) participation above 50%. The story is worse for more involved wellness programs that could meaningfully impact health outcomes and costs. Fewer than 10% of employees even attempt to enroll in a weight loss program, even when offered $600 or more to participate. Economic incentives are a necessary but insufficient component of value-based insurance design.
Behavioral Sciences and Population Health
The rules of trade will be rewritten as our health care system reorients towards population health. In the coming decade of care delivery and reimbursement experimentation, the roles of payers (“payors”), providers, and employers will shift in potentially dramatic ways. Accountable care organizations (ACOs) paid under some combination of capitation and fee-for-service with shared savings arrangements will shift risk to providers, who will be incented on patient outcomes. With the shift from reactive treatment and hospitalization to proactive prevention, many revenue line items on a hospital P&L will swap places with expenses. A real challenge for payers and providers in the coming decade will be to orchestrate behavioral change at scale – using techniques that go beyond economic incentives. But who owns these capabilities? Providers know how to treat illness. Insurers understand how to manage actuarial risk. And self-insured employers know how to write big checks. How can physicians conduct patient outreach and activation? What tools can they use to encourage patients to join weight loss programs, change their diet, or start exercising? And how do they monitor and coach patients through these behavioral changes?
The behavioral sciences will play an important role in this transformation, but the current capability gap is enormous. We know more about an individual as a consumer of discretionary products than as a consumer of essential health services. And we do more to convince the former to buy expensive headphones than the latter to get a flu shot. The academic community has proactively increased research at the intersection of health and decision theory. Most notably, the University of Pennsylvania’s schools of medicine, law, and business have joined together with Carnegie Mellon’s Center for Behavioral Decision Research to form the Center for Health Incentives and Behavioral Economics (CHIBE). But we can start with the existing playbook used by advertisers and begin applying these techniques more aggressively to drive health outcomes and improve lives. We can’t afford to do otherwise.
I enjoy speaking with industry folks about the behavioral sciences, but it’s often frustrating – and it’s not their fault.
While I sit and listen to well-educated and thoughtful individuals extol the benefits of using framing effects to boost organ donation rates or anchoring to secure a better price in a negotiation, I often think about the fascinating research from the past decade that they haven’t even heard about. I think about the number of times I read a relatively unknown journal article and thought “just wait until Amazon learns about this…” or saw a social problem that could in some way benefit from research locked away in academic databases.
The laws of human behavior are being rewritten in a handful of PhD classrooms around the world. Academics are making discoveries about decision making, social behavior, and motivation that can meaningfully improve lives. But it will likely take a decade, perhaps two, before the average interested individual even hears about them.
Why do findings about core human behavior often take so long to filter into industry? Here we can borrow from neoclassical economics: to understand the behavior, follow the incentive structure.
Professors earn the lifelong privilege of tenure by publishing original research – not by repackaging decades-old research into pop psychology business books. For supplemental income, it’s much safer and easier to do one-off consulting projects. Plus, since many of the behavioral sciences involve open-ended theories (i.e., no proofs as in economics), authoring a book with practical implications is exceedingly difficult. You can’t fault academics for not translating their research for a broader audience when the incentive structure does not encourage it.
Example: The Jam Study
Consider the “jam study” by Sheena Iyengar at Columbia and Mark Lepper at Stanford, which demonstrated that more choice leads to fewer purchases. Iyengar conducted the study in the mid-1990s while still a graduate student, and due to the slow pace of academic research and peer review, her findings were not published until 2000. Yet the jam study has only recently entered the industry consciousness. Even the New York Times editorialized in 2010: “There is a famous jam study (famous, at least, among those who research choice)…” One might point out that now, over a decade and a half later, even pseudo-news sources cover the jam study, but the fifteen-year lag between academia and industry remains a gap too wide for many of us to bridge.
How do we tighten the cycle between academic discovery and industry discussion? Unfortunately, I don’t have a great answer just yet. However, the recent and sustained popularity of Dan Ariely might be evidence that individual academics can bridge the public consciousness gap. While public thought leaders must write about interesting topics in order to sell copies, necessarily leaving behind countless avenues of interesting research, Ariely’s fame has outlived the launch of each of his books. This trend offers some promise that figures like him will serve as vehicles for transmitting research into practice.
Old World Monkeys and Social Structures
The lineage that produced the modern human subspecies Homo sapiens sapiens diverged from the ancestors of the Old World monkeys (e.g., baboons, rhesus macaques) approximately twenty-five million years ago, and many of our modern social behaviors are remnants of that shared evolutionary past. All primates, both human and non-human, live in social groups that sustain life by promoting cooperation and group cohesion. In fact, all social animals arrange themselves into groups with non-random structures that influence cooperative social behavior, including mating, food gathering, patterns of association, and social learning. The importance of group structures gives rise to a dominant supporting behavior: primates constantly act to maintain long-term relationships.
Grooming as Relationship Maintenance
In non-human primates, grooming is the primary behavior to maintain social relationships. Grooming typically involves reciprocated sessions, often temporally distinct, in which one monkey combs the hair of another to remove parasites. In addition to its anti-parasitic benefits, grooming plays a larger sociopositive role as the most fundamental form of social intercourse – it is related to fighting and dominance, reproductive behavior, and kin relations. Grooming typically mirrors the group hierarchy, so the despotic alpha male directs and receives the most grooming. Overall, public sociopositive grooming behavior, which monkeys observe and track, plays an integral role in creating and maintaining primate social structures.
Facebook as Modern Grooming
While grooming is the primary primate sociopositive behavior, no single action inhabits that role for humans. Privacy, expense, and researcher sanity considerations make longitudinal human interactions difficult to observe. However, Facebook provides an interesting set of structured and generally authentic data that parallels monkey grooming. Facebook enables individuals to maintain relationships, indicate public approval, and broadcast one’s affect to one’s group of friends. Is Facebook the modern equivalent to grooming?
Social Network Analysis
Social network analysis is particularly useful for understanding the social structures of groups because it describes networks in terms of relationship properties and patterns. Techniques from this field have been used to investigate social behavior in a variety of species, including buffalo, elephants, dolphins, and fish. Monkey grooming and Facebook status likes and comments share several structural commonalities that enable the social analytic approach: the behavior signals public dyadic approbation between individuals, and the reciprocal relationships encapsulated in dyads help maintain social relationships, including perceived hierarchy and rank.
I used the Facebook API to scrape twenty-five status updates with likes and comments from over 100 individuals (anonymized) in my network. I coded each like or comment as one directed link from one individual to another, and constructed an egocentric network for all individuals in the network using Pajek. (Pajek is easier to use for quick network explorations than R.) After removing myself from the data set, the resulting network consisted of vertices (nodes) representing individuals and arcs representing the interaction strength between each pair of individuals (M = 2.17, SD = 2.30, MIN = 1, MAX = 18). I extracted the largest component for comparison with the primate data.
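The same construction step can be sketched in Python with networkx (the original analysis used Pajek; the names and interactions below are made up for illustration).

```python
import networkx as nx

# Each (liker/commenter, status author) pair is one directed interaction.
# Hypothetical data standing in for the scraped likes and comments.
interactions = [
    ("alice", "bob"), ("alice", "bob"), ("bob", "alice"),
    ("carol", "alice"), ("carol", "bob"), ("dave", "carol"),
]

G = nx.DiGraph()
for src, dst in interactions:
    if G.has_edge(src, dst):
        G[src][dst]["weight"] += 1   # arc strength = repeated interactions
    else:
        G.add_edge(src, dst, weight=1)

# Keep only the largest (weakly) connected component, as in the analysis.
largest = max(nx.weakly_connected_components(G), key=len)
G = G.subgraph(largest).copy()

print(G.number_of_nodes(), G.number_of_edges())
print(G["alice"]["bob"]["weight"])  # alice interacted with bob twice
```

Coding repeated likes and comments as arc weight, rather than parallel edges, mirrors the "interaction strength" measure described above.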
I assembled the primate social network from an academic paper with detailed observational data about the grooming behavior of free-ranging rhesus macaques (Macaca mulatta) on the island of Cayo Santiago, Puerto Rico, a population still used today in comparative cognition research. Rhesus macaques are so similar to humans that 60-70% of all National Institutes of Health grants involving primates use this species. I coded the directed dyadic grooming behavior for all sixteen monkeys, represented as vertices, and the collective grooming duration as the arc strength (M = 6.10, SD = 10.36, MIN = 1, MAX = 63).
Descriptive Measures and Network Density
Since primate and human social systems differ in size and density, I focused on the quality of the network structure and its constituent relationships. Descriptive measures of each network are listed below. The Facebook data had fewer nodes (vertices) than the monkey data (11 vs. 16) and fewer directed ties between nodes (35 vs. 110). These differences raise two important considerations about the underlying data and the interactions that produced these networks. While the monkey data represented the sum total of interactions between all actors in a fixed population, the Facebook data represents a slice of one type of interaction within one subgroup of a population. Moreover, while grooming is the primary form of reciprocal behavior in primate networks, Facebook status liking and commenting are just one of many such behaviors for humans.
Network density is the proportion of actual arcs relative to possible arcs. The monkey network was roughly 50% denser than the Facebook network, indicating more links between individuals.
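For a directed network without self-loops, density is the number of arcs divided by the n(n−1) possible arcs. Plugging in the node and arc counts reported above reproduces the figures in the table:

```python
def density(n_nodes: int, n_arcs: int) -> float:
    """Directed network density without loops: arcs / possible arcs."""
    return n_arcs / (n_nodes * (n_nodes - 1))

facebook = density(11, 35)    # 11 nodes, 35 directed ties
monkey = density(16, 110)     # 16 nodes, 110 directed ties

print(round(facebook, 3))           # 0.318
print(round(monkey, 3))             # 0.458
print(round(monkey / facebook, 2))  # monkey network ~1.4x as dense
```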
| Measure | Facebook | Monkey |
| --- | --- | --- |
| Density w/o loops | 0.318 | 0.46 |
| All closeness centralization | 0.43 vs. 0.27 | 0.27 vs. 0.24 |
| Betweenness centralization | 0.27 vs. 0.14 | 0.11 vs. 0.03 |
| All degree centralization | 0.406 vs. 0.10 | 0.24 vs. 0.12 |
| Watts-Strogatz clustering coefficient | 0.53 vs. 0.29 | 0.60 vs. 0.44 |
| Network clustering coefficient (transitivity) | 0.45 vs. 0.30 | 0.56 vs. 0.44 |

All measures of Facebook and monkey network data (“vs.” expected values)
Centrality: Betweenness, Closeness, and Degree
The three primary measures of centrality – betweenness, closeness, and degree – indicate the importance of a node in a network based on structural positioning.
Betweenness centrality is a topological distance metric that reports the extent to which nodes lie on directed paths between other nodes. Nodes that lie on the shortest paths between others have higher betweenness centrality scores. In general, nodes with high betweenness scores have more control over information or behavioral flows. In humans, individuals with high betweenness centrality transmit norms, cultural knowledge, and behaviors to others in the network. The Facebook network had a betweenness centrality score nearly twice that expected in a network of comparable structure (0.27 vs. 0.14), while the monkey network’s score was more than three times greater than expected (0.11 vs. 0.03). The significantly higher betweenness centrality scores in both networks indicate a hierarchical composition, and the relatively greater betweenness centrality in the monkey data supports the notion that individual roles are more clearly defined and obeyed in primate social hierarchies. The alpha monkey had the highest betweenness centrality score.
Closeness centrality measures the distance from a node to all other nodes, and indicates the speed of interaction between nodes. In effect, nodes with the highest closeness centrality scores are able to easily monitor others in the network. The monkey network had a closeness centrality score only slightly higher than expected (0.27 vs. 0.24). However, the closeness centrality score of the Facebook network was 60% higher than expected (0.43 vs. 0.27), meaning that individuals in the Facebook network were significantly more closely connected with each other than in a statistically similar network.
Degree centrality measures the overall connectedness of the network by looking at the number of arcs connected to each node. Degree centrality in the Facebook network was more than four times greater than expected (0.41 vs. 0.10), indicating that the Facebook data was much more connected than a random network. The monkey network had a degree centrality score roughly twice the expected value (0.24 vs. 0.12).
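The three measures can be computed with networkx; the toy directed network below is illustrative, not the actual Facebook or monkey data.

```python
import networkx as nx

# A small bidirectional chain: a <-> b <-> c <-> d (toy data).
G = nx.DiGraph([
    ("a", "b"), ("b", "a"), ("b", "c"),
    ("c", "b"), ("c", "d"), ("d", "c"),
])

betweenness = nx.betweenness_centrality(G)  # control over shortest paths
closeness = nx.closeness_centrality(G)      # speed of reach to others
degree = nx.degree_centrality(G)            # overall connectedness

# The interior nodes bridge the chain, so they score highest on betweenness,
# while the endpoints score zero.
top = max(betweenness, key=betweenness.get)
print(top, round(betweenness[top], 2))
```

On real data, the node with the top betweenness score plays the role the analysis assigns to the alpha monkey: the broker that paths between others must pass through.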
The network clustering coefficient measures the extent to which a node’s neighbors are themselves connected. This local measure helps reveal the extent to which subgroups exist in a network, which is particularly important in understanding affiliative interactions. The clustering coefficient for the Facebook data was nearly twice the expected value (0.53 vs. 0.29), whereas the monkey network clustering coefficient was roughly a third larger than expected (0.60 vs. 0.44). The larger-than-expected clustering in the monkey network is evidence of cliques.
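The two clustering variants reported in the table – the Watts-Strogatz (average local) coefficient and transitivity – can be sketched on a toy undirected graph containing one clear clique (illustrative data only):

```python
import networkx as nx

# A triangle (a, b, c) plus a pendant node d hanging off c.
G = nx.Graph([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])

# Transitivity: fraction of connected triples that close into triangles,
# computed over the whole network.
print(nx.transitivity(G))

# Watts-Strogatz coefficient: mean of each node's local clustering.
print(nx.average_clustering(G))
```

The two measures diverge when clustering is unevenly distributed: nodes inside the clique have local clustering of 1, while the pendant node contributes 0 to the average.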
One tool to understand the patterns of connections between individual nodes in a network is the triadic census. Directed links between three vertices are called triads, and these links can be organized into sixteen distinct combinations (below). The prevalence of each triad, represented as the number found in a network relative to the expected number based on the number of arcs and nodes, provides valuable information about the underlying structure of a network.
[Figure: visual representations of the triad types (De Nooy, Mrvar & Batagelj, 2005), grouped into triadic modes overrepresented in both networks, modes overrepresented in one network, and modes interestingly underrepresented.]
According to balance theory, individuals seek consistent relations among connections and act to reduce the number of intransitive triads. Two types of triads are prohibited in balanced networks: forbidden triads – in which A has two friends, B and C, who are not friends with each other (type 201) – and transitive triads, in which the members sit at different hierarchical levels (type 012). A balanced network should have fewer triads classified as 201 or 012, because stable networks require symmetry. Both networks exhibited more 201 forbidden triads than expected, and the monkey network showed significantly more 012 transitive triads. Perhaps the greater ratio of found to expected triads in the Facebook network (7:1) versus the monkey network (2:1) indicates that Facebook networks are less “real,” in the sense that relationships that should not exist are relatively prevalent. The presence of many non-permissible transitive triads (012, 210) indicates a tendency towards hierarchy. On the other hand, the overrepresentation of the 300 triad – individuals in the same clique with complete mutual reciprocal influence – in both networks indicates more balance than normal. In fact, the Facebook and monkey networks were 35 and 7 times more reciprocal than expected, respectively. The combination of balanced reciprocity, intransitivity, and hierarchy points to similar network structures that are hierarchical in global structure but balanced within levels.
The absence of specific triad models is also interesting. Most notably, there was an underrepresentation of 021 triads, in which two nodes have unidirectional links to a single third node. This triad would represent a situation in which two monkeys groom a single monkey, but receive no grooming in return. It would have been surprising if this type of triad were frequent in either network, given the reciprocity inherent in both social behaviors. Similarly, the 030 triad model – in which each node is unidirectionally linked to exactly one other node – was also underrepresented. This triadic model of indirect reciprocity would be antithetical to the public signaling inherent in both monkey grooming and Facebook status liking.
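A triadic census like the one described above can be computed directly with networkx, which counts the sixteen standard triad types (labeled by their mutual-asymmetric-null composition) in a directed graph. The network here is toy data for illustration.

```python
import networkx as nx

G = nx.DiGraph([
    # A complete mutual clique among a, b, c -- the "300" triad type.
    ("a", "b"), ("b", "a"),
    ("b", "c"), ("c", "b"),
    ("a", "c"), ("c", "a"),
    # One unreciprocated tie to a pendant node.
    ("a", "d"),
])

census = nx.triadic_census(G)  # dict: triad type label -> count

print(census["300"])  # fully reciprocal cliques
print(census["102"])  # one mutual dyad plus an isolated third node
```

Comparing each count against its expectation under a random network with the same number of nodes and arcs is what lets the analysis call a triad type over- or underrepresented.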
Conclusions & Thoughts
So, what can monkeys teach us about Facebook? At least in this brief analysis, we can uncover striking similarities and important differences between rhesus macaque grooming and Facebook likes and comments networks:
- Grooming and Facebook networks show significantly higher levels of reciprocity than a statistically similar network.
- The grooming network demonstrated a more hierarchical structure than the Facebook network.
- The grooming network demonstrated less connectedness than the Facebook status network.
The coexistence of balanced but hierarchical triads is a particularly puzzling outcome of the analysis. An analysis of intra-level cliques would provide a closer look at how different hierarchies interact and the relative strength of the ties within them. Similarly, only the alpha male node was identified in this data set, but there are important structural differences based on the age of each actor. Understanding the correlates between structural position and monkey age could point to underlying variations in how different genders and ages are integrated into the social fabric of a primate colony. A similar comparison could be conducted with individuals joining social groups, with a time-series analysis offering a structural view of this process. Collecting this sort of data would require an enormous amount of behavioral observation, but future research in this area might yield answers to interesting questions about the evolutionary roots of prosocial networks.