I would like to speak to you tonight about econometrics and its role in the analysis of the economic effects of proposed changes in public policy. Econometrics is the subfield of economics whose focus is the development of statistical techniques and their application to economic data. The typical use of econometrics is in the quantification of relationships between and among economic variables. The two primary areas of interest to econometricians are: 1) forecasting future values of economic variables; and 2) measuring the effect that a change in the value of one economic variable has on another. My talk tonight will deal mainly with the latter and, in particular, will focus on quantifying the effect of a change in a qualitative policy variable on a given economic outcome variable of interest. By qualitative policy variable I mean a binary (on/off, yes/no) variable that is, to some degree, under the control of a policy maker. For example, a key component of nearly every health care reform proposal is the extension of public health care insurance coverage to segments of the population that are not currently covered. Health insurance coverage is a qualitative policy variable, and measuring its potential effect on aggregate health care costs should be of great interest to policy makers. The key question to be answered by the econometrician is "How much will the typical individual who does not currently have health insurance increase his usage of the health care system if he is given coverage?" The outcome variable of interest in this context is a specified quantitative measure of health services utilization (e.g., the number of doctor visits or the number of prescriptions filled). The effect of such a qualitative policy variable is often referred to as a treatment effect, and the variable itself is termed the treatment variable.
On its face, the evaluation (estimation) of a treatment effect does not appear to be a difficult task. It would seem that one can simply follow the lead of the chemist who seeks to determine the effect of light (the treatment variable) on the speed of a particular chemical reaction (the outcome variable). The chemist takes data on these two variables and estimates the treatment effect of light by computing the difference between the average reaction time for experiments done in light vs. the average reaction time for those conducted in the dark. This is the so-called difference-of-means approach to treatment effect estimation. Likewise, the econometrician might estimate the treatment effect of health insurance on health services utilization by computing the difference in average utilization for those who have health insurance coverage as opposed to those who do not. Unfortunately, there is an important distinction between the health insurance and chemistry examples that typically precludes the use of the difference-of-means method for estimating treatment effects in the public policy realm. This distinction has mainly to do with the manner in which the data are gathered. The chemist generates data on reaction time and the presence of light as the result of a series of carefully controlled laboratory experiments that are conducted while holding other relevant influences constant. Such potentially influential variables might include ambient temperature, and the composition of the medium in which the reaction is to take place. By controlling these "nuisance" variables, the chemist can be reasonably sure that observed changes in reaction time are due only to differences in the treatment variable (light). In analyzing the potential effect of a public policy change on the value of an economic variable of interest, the econometrician does not typically have experimentally generated data at his disposal. 
Most often the only data available for analysis is nonexperimental data that came from a retrospective survey of individuals in the real world. Therefore extraneous influences on the outcome variable are not controlled and the difference-of-means estimator of the treatment effect is likely to be biased. To see this, consider again the analysis of the potential increase in health services utilization resulting from the extension of health insurance coverage to uncovered segments of the population. In surveying individual health insurance status and health services utilization rates, important variables like age are not (in fact, cannot be) held constant. If older individuals are more likely to have health insurance (which is probably true), and older individuals tend to use more health care services (which is also probably true), and the analysis fails to control for the effect of age, then it will be difficult to determine whether observed increases in utilization are attributable to health insurance or are merely due to differences in age.
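The age example can be made concrete with a small sketch. The records below are hypothetical and hand-built (they are not survey data): visits are constructed so that the true insurance effect is exactly one visit per year, and older people are assumed to be over-represented among the insured. The names `visits`, `people`, and `difference_of_means` are illustrative only.

```python
from statistics import mean

# Hypothetical rule used to build the toy data: the true insurance
# effect is exactly 1 visit per year, and visits rise with age.
def visits(age, insured):
    return 1 + age / 10 + insured

# Assumed pattern: the old are mostly insured, the young mostly uninsured.
people = (
    [(30, 0)] * 4 + [(30, 1)] * 1 +
    [(60, 0)] * 1 + [(60, 1)] * 4
)
records = [(age, hi, visits(age, hi)) for age, hi in people]

def difference_of_means(rows):
    """Average outcome of the insured minus average outcome of the uninsured."""
    insured = [v for _, hi, v in rows if hi == 1]
    uninsured = [v for _, hi, v in rows if hi == 0]
    return mean(insured) - mean(uninsured)

# Naive comparison that ignores age.
naive = difference_of_means(records)

# Comparison that holds age fixed: insured vs. uninsured within each age group.
within_age = {
    age: difference_of_means([r for r in records if r[0] == age])
    for age in (30, 60)
}

print("naive difference of means:", naive)
print("difference within each age group:", within_age)
```

In this toy data the naive comparison overstates the effect because part of the gap between the insured and the uninsured is really a gap between the old and the young. Comparing people of the same age recovers the effect that was built in.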
In the remainder of the talk I will discuss ways of controlling for
the effects of nuisance variables in the context of the estimation of policy-relevant
treatment effects. We begin by examining situations in which the nuisance
variables are observable. The most common solution in this case, regression
analysis, is not new. We will nevertheless review it for the uninitiated.
The review will also help to fix ideas for the subsequent discussion. The
next part of the talk will cover two ways to avoid bias for the more troublesome
case in which the culprit nuisance variables are unobservable. The first
of these is the so-called pseudo experimental sampling approach in which
individuals are randomly assigned to treatment and non-treatment groups.
Pseudo experiments, though effective for eliminating bias in the estimation
of treatment effects, can be quite costly. The second way to avoid bias
is by applying appropriate econometric techniques to nonexperimental (retrospective
survey) data, which is typically less expensive to collect than pseudo
experimental data. Here I describe a new econometric method that I have
developed for this purpose. Two applications of the new econometric technique
are then considered. The talk concludes with a discussion of directions
for future research.
Controlling the Influences of Observable Nuisance Variables
There are two types of nuisance variable -- observable and unobservable.
If the nuisance variable is observable, its influence can be directly controlled
by the use of the appropriate statistical technique regardless of whether
the data were obtained experimentally or nonexperimentally. The statistical
technique to which I refer is called regression analysis. By properly
"weighting" each of the nuisance variables according to its impact on the
outcome variable, regression analysis allows unbiased estimation of the
treatment effect. In the health insurance example mentioned earlier, suppose
age is among the variables included in the sample survey. In applying regression
analysis to the estimation of the treatment effect of health insurance
on the number of visits to the doctor made by an individual per year we
would write
$V^* = \beta_1 + \beta_2\,HI + \beta_3\,AGE$
where
$V^*$ denotes the mean (average) number of doctor visits for an individual per year
HI = 1 if the person has health insurance, 0 otherwise
AGE = the person’s age in years
$\beta_2$ = the difference between the average number of doctor visits for insured vs. uninsured, after controlling for the effect of age (this is the insurance treatment effect)
The $\beta$s (i.e. the weights) are unknown but
by applying regression analysis to a sample of data we can obtain estimates
of these weights. As a practical matter, we can now estimate the impact
of proposed public policy measures that provide for universal health insurance
coverage. For example, suppose we applied regression analysis to a nationally
representative sample of data and, among the other $\beta$ estimates, suppose
we obtained the following insurance treatment effect estimate
$\hat{\beta}_2 = 4$.
This result would be interpreted as a projected increase of four doctor visits per year for the average uninsured person in the event that he is given insurance coverage. If we multiply this number by the approximately 45 million Americans who do not currently have health insurance, we obtain the number of additional visits per year that would be expected under universal health insurance coverage: 180 million. Moreover, if the average doctor visit costs $100, then the projected increase in health care cost will be $18 billion per year. This is, of course, only one aspect of health care utilization that will be affected by the expansion of public health insurance coverage. Similar analyses would apply to such health services as hospital care, prescription drug usage, and mental health care, just to name a few. From this example, one can easily see that unbiased estimation of the treatment effect of health insurance is a key component of accurate assessment of the projected aggregate cost of proposed health care reforms.
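The regression step and the cost arithmetic above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the estimation actually run on any survey: the sample is hypothetical and noise-free, built so that the insurance weight equals 4, and `solve` and `ols` are small helper functions written for this sketch. The 45 million and $100 figures are the ones quoted in the talk.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Least-squares weights from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical sample of (HI, AGE) pairs; visits are built with an
# insurance weight of 4 and no noise, purely for illustration.
sample = [(0, 25), (1, 30), (0, 45), (1, 50), (0, 60), (1, 70), (1, 40), (0, 35)]
X = [[1.0, hi, age] for hi, age in sample]   # columns: constant, HI, AGE
V = [1.0 + 4.0 * hi + 0.05 * age for hi, age in sample]

beta_1, beta_2, beta_3 = ols(X, V)

# Projection arithmetic with the figures quoted in the talk.
uninsured = 45_000_000
cost_per_visit = 100
extra_visits = beta_2 * uninsured
extra_cost = extra_visits * cost_per_visit

print("estimated insurance weight:", round(beta_2, 6))
print("additional visits per year:", round(extra_visits))
print("additional cost per year ($):", round(extra_cost))
```

With real survey data the weights would be estimated with error, so the projected cost would carry a margin of uncertainty that this noise-free sketch does not show.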
Controlling the Influences of Unobservable Nuisance Variables
Unbiased estimation of the treatment effect is more difficult when a variable is related to the treatment variable but not observable. In our health insurance example suppose individuals suffer from unobserved medical conditions that influence both their health service utilization and the likelihood that they will choose to be insured. This variable will result in the same kind of bias as that caused by the age variable. The problem here is that regression analysis is not directly applicable because such medical conditions are not reported and therefore are unobservable. There are two ways to control for such unobservable nuisance influences -- 1) pseudo experimental data; and 2) econometric technique applied to nonexperimental data.
Pseudo Experimental Data
Data can be pseudo-experimentally generated in the sense that individuals can be randomly assigned to the treatment and non-treatment groups. It is theorized that such randomization effectively breaks the link between the treatment variable and the nuisance variable. For example, in the estimation of the health insurance treatment effect, individual choice with respect to health insurance would be eliminated by randomization. This effectively eliminates any systematic relationship between unobserved medical conditions and health insurance status. This in turn means that we should expect no bias due to this unobserved nuisance variable. This was the motivation for the Health Insurance Experiment (HIE) conducted by the RAND Corporation for the US Government in the late 1970s and early 1980s. In the HIE, 2,000 families were randomized into five different health insurance plans categorized by the coinsurance rate (percentage of medical expenses paid out of pocket), which ranged from 0 to 95 percent. Data on many of these families were collected yearly over a five-year span. Various researchers have argued that the randomization implemented in the HIE did indeed break the link between unobservables and health insurance status, allowing unbiased estimates of the effects of different coinsurance rates on health care utilization. There are, however, two important shortcomings of this pseudo experimental approach. First, although randomization may eliminate bias, pseudo experimental data are typically more expensive to collect than comparably sized nonexperimental surveys. The cost of the HIE exceeded $210 million in today’s dollars. The most promising nonexperimental alternative to the HIE is the Medical Expenditure Panel Survey (MEPS) initiated by the Agency for Health Care Policy and Research (AHCPR) in 1997.
The MEPS is a yearly survey consisting of four components: 1) the household component involving 10,000 families; 2) the nursing home component involving 800 nursing homes; 3) the medical provider component involving 2,700 hospitals, 20,000 physicians, and 300 home health care providers; and 4) the insurance component involving 5,700 employers, 150 union officials, and managers at nearly 20,000 establishments. In order to draw comparisons with the HIE, I estimated the cost of a five-year run of the MEPS using figures I obtained from AHCPR. My cost estimate of $172 million, coupled with the fact that the MEPS is much more comprehensive than the HIE, explains why we will probably never again see such a large-scale pseudo experiment as the HIE. Secondly, although one might argue that the coinsurance effects estimated from the HIE can be extrapolated to measure utilization rates for the uninsured (100 percent coinsurance rate), strictly speaking the uninsured were not included in the study. To be specific, for ethical and practical reasons, families were not randomized into an uninsured category so that the insurance treatment effect, as we defined it earlier, cannot be directly estimated. It is the very nature of pseudo experimentation that often brings it into conflict with such practical and ethical considerations.
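The claim that randomization breaks the link between the treatment and an unobservable can be illustrated with a simulation. Everything here is an assumption made for the sketch: the data are simulated, the true effect is set to 1.5, the unobservable `u` is drawn from a standard normal distribution, and self-selection is modeled by letting people with larger `u` choose treatment more often. It is not a reconstruction of the HIE.

```python
import random
from statistics import mean

rng = random.Random(0)      # fixed seed so the sketch is repeatable
TRUE_EFFECT = 1.5           # assumed treatment effect in the simulation
N = 20_000

def outcome(treated, u):
    # Hypothetical outcome: the unobservable u raises utilization.
    return 2.0 + TRUE_EFFECT * treated + 2.0 * u

def difference_of_means(pairs):
    """Average outcome of the treated minus average outcome of the untreated."""
    treated = [y for t, y in pairs if t == 1]
    control = [y for t, y in pairs if t == 0]
    return mean(treated) - mean(control)

u_values = [rng.gauss(0, 1) for _ in range(N)]

# Self-selection: people with larger u are more likely to choose treatment.
self_selected = []
for u in u_values:
    t = 1 if u + rng.gauss(0, 1) > 0 else 0
    self_selected.append((t, outcome(t, u)))

# Randomization: a coin flip assigns treatment, independently of u.
randomized = []
for u in u_values:
    t = rng.randint(0, 1)
    randomized.append((t, outcome(t, u)))

biased_estimate = difference_of_means(self_selected)
randomized_estimate = difference_of_means(randomized)

print("difference of means under self-selection:", round(biased_estimate, 2))
print("difference of means under randomization:", round(randomized_estimate, 2))
```

Under self-selection the simple difference of means lands well above the assumed effect, while under random assignment it lands close to it, even though `u` is never used in the estimation.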
Econometric Technique Applied to Nonexperimental Data
We saw earlier that regression analysis can be used to control the effects of observable nuisance variables that simultaneously influence both the treatment variable and the outcome variable. Extensions of the regression analysis method can be used to deal with unobservable nuisance variables. Let us summarize the culprit unobservable nuisance variables as U. In the health insurance example, as we discussed earlier, the main component of U might be medical conditions that are known only to the individual. Note that the bias problem would be solved if we could include U in the regression analysis weighting scheme. Specifically, we would write
$V^* = \beta_1 + \beta_2\,HI + \beta_3\,AGE + \beta^{*}\,U$
where $\beta^{*}$ is the regression analysis weight for the unobservable U, and all of the other components are defined as above. As we know, however, we cannot directly apply regression analysis because we cannot observe U. Although we cannot observe U for a particular individual in the sample, we might have some knowledge (or be able to make some reasonable assumptions) about how the relative frequencies of the various values of U stack up across individuals in the population. In technical jargon, we might be confident in making assumptions about the probability distribution of U. As it turns out, after a rather lengthy mathematical argument, knowledge of the probability distribution of U is all that is needed to make the appropriate correction for biases caused by the unobservable variable U. I will spare you the technical details. The more technically inclined among you can consult the bibliography that I will give at the end of the talk.
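For those who want a glimpse of why the distribution of U matters, here is a sketch of the generic omitted-variable algebra, with the weights written as $\beta_1$, $\beta_2$, $\beta_3$ and $\beta^{*}$. It is not the lengthy argument referred to above, only the first step that motivates it: average the regression over the values of U that go with each observed combination of HI and AGE.

```latex
\begin{aligned}
E[V^{*} \mid HI, AGE]
  &= \beta_1 + \beta_2\,HI + \beta_3\,AGE + \beta^{*}\,E[U \mid HI, AGE], \\[4pt]
E[V^{*} \mid HI = 1, AGE] - E[V^{*} \mid HI = 0, AGE]
  &= \beta_2 + \beta^{*}\,\bigl(E[U \mid HI = 1, AGE] - E[U \mid HI = 0, AGE]\bigr).
\end{aligned}
```

The second line says that a comparison of insured and uninsured people of the same age equals the treatment effect $\beta_2$ plus a bias term. The bias term vanishes only if U has the same average among the insured and the uninsured. Assumptions about the probability distribution of U, and about how U enters the insurance decision, are what would allow the term $E[U \mid HI, AGE]$ to be worked out and the bias removed.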
This does not, however, completely solve our bias problem. Typically, the regression analysis formulation that we have suggested above is inappropriate because it does not allow for the possibility that the treatment effect of insurance will differ depending on the age group (or any other relevant observable nuisance variable). To be specific, $\beta_2$ is assumed to be constant across the population. Moreover, our simple regression analysis formulation does not place realistic restrictions on the value of the outcome variable. For example, the number of doctor visits cannot be negative. In this case an alternative regression analysis formulation that rules out negative values is called for. In other contexts the outcome variable of interest may be the probability of the receipt of a particular treatment (e.g., vaccination). In such cases the regression analysis formulation should be restricted to values between 0 and 1. In order to deal with the potential dependence of the treatment effect on other variables, and with restrictions on the range of the outcome variable, the regression analysis must be reformulated in a way that makes accounting for the effect of the unobservable nuisance variable more difficult. In some of my recent work, I have shown that even in these more complicated cases all that is needed is some general knowledge about the probability distribution of U, and I have derived methods for correcting the bias in treatment effect estimation (technical details can be found in the bibliography given at the end of the talk).
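One common way to build both features into a regression formulation is to model the mean through an exponential function. The sketch below assumes that form purely for illustration (the talk does not name a functional form), and the weights `B1`, `B2`, `B3` are hypothetical values, not estimates.

```python
import math

# Hypothetical weights for an assumed exponential-mean formulation:
#   mean visits = exp(B1 + B2*HI + B3*AGE)
B1, B2, B3 = 0.0, 0.5, 0.02

def mean_visits(hi, age):
    """Mean number of visits; the exponential keeps it above zero."""
    return math.exp(B1 + B2 * hi + B3 * age)

def treatment_effect(age):
    """Insured minus uninsured mean visits at a given age."""
    return mean_visits(1, age) - mean_visits(0, age)

effect_at_30 = treatment_effect(30)
effect_at_60 = treatment_effect(60)

print("treatment effect at age 30:", round(effect_at_30, 3))
print("treatment effect at age 60:", round(effect_at_60, 3))
```

With this formulation the predicted mean can never be negative, and the effect of insurance is no longer a single number: it grows with age because the same weight `B2` multiplies a larger baseline.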
Applications in Health Economics
I would like to briefly discuss two applications of my methodology in health economics. The first application demonstrates the importance of correcting for unobservable effects in nonexperimental data. The second application, in addition to making this point in a different context, shows that formulating the regression analysis to take account of the restrictions on the outcome variable can also be important.
The Effect of Physician Advice on Alcohol Consumption
Don Kenkel (Cornell University) and I examined the effect of physician advice on the demand for alcohol. The private and public costs of alcohol abuse and its negative consequences are substantial. Data from the 1993 National Longitudinal Alcohol Epidemiology Survey indicate that nearly 7.5 percent of Americans (almost 14 million) were alcohol abusers and/or alcohol dependent in 1993. Moreover, it has been estimated that over 100,000 deaths a year are alcohol-related. Policy makers have imposed higher alcohol taxes and enacted stricter drunk driving laws, partly in response to econometric studies demonstrating the effectiveness of such policy measures for reducing alcohol abuse. These approaches to solving the problem, though effective, have their shortcomings. Taxes impose costs on responsible drinkers, and enforcement of drunk driving laws can be quite costly.
Advice from physicians on the adverse effects of drinking is a preventative measure that is relatively inexpensive and imposes little or no cost on responsible drinkers. By providing advice and counseling, physicians have the opportunity to influence their patients’ drinking practices before serious alcohol-related problems and alcohol dependencies develop. The objective of our research was to evaluate the potential effectiveness of public policy aimed at encouraging physicians to regularly offer their patients advice on the adverse effects of drinking. The focus of our investigation was therefore the estimation of the treatment effect of physician advice on alcohol consumption. The outcome variable we chose to analyze was the number of alcoholic beverages consumed in the two weeks prior to the survey. The treatment variable of interest in this context is a binary variable indicating whether or not the individual had ever received advice from a physician regarding the adverse health consequences of drinking. Using data from the 1990 National Health Interview Survey, we estimated the treatment effect in three ways. First, we used the simple difference-of-means approach and obtained an estimated treatment effect of 7.79. According to this result, by giving their patients advice about the adverse health effects of drinking, doctors drive patients to drink. We then used regression analysis, controlling for such variables as income, age, education, race, marital status, employment status, and region of residence. The regression results were somewhat less disheartening, though still counterintuitive, yielding an estimated treatment effect of 7.60.
We surmised that the treatment effect estimate was upward biased (in fact positive) due to unobservable effects. For example, suppose that physicians diagnose individuals as alcoholics (either formally or informally) but this diagnosis is not observed as part of the 1990 NHIS. If diagnosed alcoholics are more likely to get advice and at the same time tend to drink more, then naive methods like difference-of-means and regression analysis that do not account for the influence of alcoholism will tend to overstate the treatment effect of advice. In our case, the treatment effect of advice is so overstated that it has a counterintuitive sign. It is positive when we expect it to be negative. Therefore, as a third approach to estimating the treatment effect of physician advice, we applied my methodology (discussed earlier) in which information about the probability distribution of the culprit unobservable is used to correct the bias, and the regression formulation is designed to obey the restrictions imposed by the nature of the outcome variable -- recall that the outcome variable in this case is the number of drinks and therefore is restricted to be nonnegative. With this method we obtained an estimated treatment effect of -4.5. This result indicates that advice from a physician can lead to significant decreases in alcohol consumption.
The Effect of Alcohol Abuse on Employment Outcomes
As the preceding discussion implies, reducing alcohol abuse is of considerable public policy importance. To the extent that public policy is effective in this regard, alcohol abuse can itself be viewed as a controllable qualitative variable, and estimation of the treatment effect of alcohol abuse becomes a legitimate research goal. One sector of the economy in which alcohol abuse is suspected to have serious adverse effects is the labor market. John Mullahy (Wisconsin) and Jody Sindelar (Yale) attempted to estimate the treatment effect of alcohol abuse on employment outcomes. They categorized employment status in the usual way as: 1) out of the labor force; 2) unemployed; or 3) employed. As is often done in categorical analyses like this one, they cast the discussion in terms of two outcome variables: 1) the probability of being unemployed vs. out of the labor force; and 2) the probability of being employed vs. out of the labor force. To control for the influences of observable nuisance variables, they took a regression analysis approach to estimation of the alcohol abuse treatment effects on the two outcome probabilities. They realized, however, that unobservable nuisance variables relating to such things as personality and family background are likely to exert influence on the employment outcome probabilities and the likelihood of alcohol abuse. For this reason, they implemented a conventional estimation method that, although designed to deal with bias due to the unobservables, does not take account of the fact that the outcome variables are probabilities whose values must be between 0 and 1. Using data from the Alcohol Supplement of the 1988 National Health Interview Survey, Mullahy and Sindelar found no statistically significant treatment effects on either of the outcome probabilities.
I obtained their data and applied my method which, in addition to dealing with biases due to unobservables, takes explicit account of the inherent restrictions on the outcome variable. I found no alcohol abuse treatment effect on the probability of being unemployed vs. out of the labor force. I did, however, find that alcohol abuse has a statistically significant and substantial effect on the probability of being employed vs. out of the labor force. For example, suppose an individual with given characteristics is not an alcohol abuser and has a .75 probability of being employed. My results imply that his chances of employment would drop to .16 if he were an alcohol abuser. The difference between this result and that of Mullahy and Sindelar lies in the fact that my method takes account of the (0, 1) restriction on the outcome variable, while theirs does not.
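To see why the (0, 1) restriction can matter, consider the following sketch. It assumes a logistic form for the probability, which is a choice made here for illustration and not a description of either estimation method. The .75 and .16 figures are the ones quoted above; the other starting probabilities and the function names are hypothetical.

```python
import math

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))

def inverse_logit(z):
    """Map any real number back to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-z))

p_non_abuser = 0.75   # figure quoted in the talk
p_abuser = 0.16       # figure quoted in the talk

# Shift on the log-odds scale implied by the two probabilities,
# under the assumed logistic form.
shift = logit(p_abuser) - logit(p_non_abuser)

def shifted_probability(p):
    """Apply the same log-odds shift to any starting probability."""
    return inverse_logit(logit(p) + shift)

# A formulation with no range restriction, sketched as a fixed subtraction.
linear_drop = p_non_abuser - p_abuser

for start in (0.75, 0.50, 0.25):
    restricted = shifted_probability(start)
    unrestricted = start - linear_drop
    print(start, round(restricted, 3), round(unrestricted, 3))
```

For a person who starts at .75 the two versions agree by construction. For a person who starts at .25, subtracting the same fixed amount gives a negative "probability", while the restricted version stays between 0 and 1.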
Summary and Future Work
The lesson that I hope you will take from tonight’s talk can perhaps be summarized in the old adage often invoked by econometricians "correlation does not imply causation." In the context of the present discussion this translates to "simple correlation between an outcome variable and a treatment variable does not imply that the treatment variable is a potential policy instrument." The problem is that nuisance variables tend to cloud the issue. Therefore, use of treatment variables as policy instruments based on analyses that do not adequately deal with the influences of such nuisance variables will generally lead to incorrect policy predictions and unintended policy results. There are two types of nuisance variables – observables and unobservables. Regression analysis can be used to deal with observables. Unobservables are more of a problem. Pseudo experimentally generating data can be used to break the link between the treatment variable and the unobservable, as was done in the RAND Health Insurance Experiment. This approach can, however, be quite costly and is typically subject to practical and ethical constraints.
On the other hand, one can use nonexperimental (survey) data and an appropriate econometric technique to deal with the problem. This approach is appealing because data collection for retrospective samples is relatively inexpensive and there now exist good econometric techniques for avoiding biases due to unobservable nuisance variables. I have developed such a method which, in addition to dealing with such biases, accounts for inherent restrictions on the range of the outcome variable. We discussed two applications of this method. The first of these (the effect of physician advice on alcohol demand) demonstrated the importance of controlling for biases due to the unobservables. The second (the effect of alcohol abuse on employment) shows the importance of accounting for the range restriction on the outcome variable.
We began tonight’s talk with a discussion of the importance of obtaining accurate estimates of the treatment effects of health insurance on utilization levels of the various health services. Such research is indispensable for the evaluation and comparison of proposed health care reforms. As a topic for future research, I plan to apply my new econometric technique in this context.