I saw a group of young adults smoking near the doorway of the grocery store today. That experience made me wonder which demographic variables might be associated with this known behavioral risk. The database of choice for this research project is the Behavioral Risk Factor Surveillance System (BRFSS), an ongoing data collection program designed to measure behavioral risk factors in the adult population. Fortunately, I had previously downloaded the dataset for another research project.
Data
The variables of interest in this study are:
- Smokers, the dependent variable, coded 1 = Yes, 0 = No
- Race, a categorical variable, capturing race in two categories: white and non-white
- Education, a categorical variable with four levels of educational attainment: 1) less than high school, 2) high school diploma, 3) attended college and 4) college degree
- Income, a categorical variable capturing five levels of income: 1) less than $15,000, 2) $15,000 to $25,000, 3) $25,000 to $35,000, 4) $35,000 to $50,000 and 5) $50,000 or more
- Age, a continuous variable, capturing the person’s age in years.
The number of observations with complete information equals 380,136.
Statistical Tool
Because the dependent variable is binary and we are interested in predicting the probability of smoking, the statistical tool of choice is logistic regression. Smokers was regressed on race, education, income and age.
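For readers who want to see what the estimation does, here is a minimal sketch of a logistic regression fitted by Newton-Raphson in plain Python. It is not the analysis reported here: the data are a small synthetic sample invented for illustration, only age is used as a predictor, and the names `inv_logit`, `fit_logit`, `ages` and `smoker` are hypothetical. In practice a statistics package would fit the full model with race, education, income and age.

```python
import math


def inv_logit(z):
    """Map a linear predictor z to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(x, y, iterations=25):
    """Fit P(y = 1) = inv_logit(b0 + b1 * x) by Newton-Raphson.

    Assumes x is not constant and the classes are not perfectly separated.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]]
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = inv_logit(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * step = score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Synthetic illustration only, NOT BRFSS data: age in years, smoker coded 1 = Yes, 0 = No
ages = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75]
smoker = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0]

b0, b1 = fit_logit(ages, smoker)
for age in (20, 40, 70):
    print(f"age {age}: predicted probability {inv_logit(b0 + b1 * age):.2f}")
```

The slope on age is negative for this toy sample, so the predicted probability falls as age rises. The full model would typically enter each level of race, education and income as an indicator variable alongside age.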
Findings
The model is statistically significant (Prob > chi2 = 0.000), as are all covariates (P>|z| = 0.000); both values are rounded to three decimals, so they indicate p < 0.001 rather than exactly zero. To depict the findings, I will present a series of graphs that show the predicted probability of smoking for each predictor.
A. Education
As age and educational attainment increase, the probability of smoking decreases, and the differences in probability across education levels narrow with age.
B. Income
As age and income increase, the probability of smoking decreases. The differences in probability across income levels also narrow as age increases.
C. Race
Whites exhibit greater probabilities of smoking than non-whites throughout the age distribution, with the gap narrowing as age increases.
D. Ideal Types
Using Max Weber’s concept of an ideal type - an analytical construct that provides an opportunity to make comparative observations - I calculated predicted probabilities for two ideal-type individuals at age 40, one with the characteristics most associated with smoking and one with the characteristics least associated with it:
Individual 1 (all the worst characteristics associated with smoking): age 40, white, with less than a high school diploma and an income of less than $15,000.
Individual 2 (all the best characteristics associated with not smoking): age 40, non-white, with a college degree and an income of $50,000 or more.
Individual 1 has a 47% predicted probability of being a smoker, compared to only 7% for individual 2.
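To put the gap between the two ideal types on the scale the model works in, the reported probabilities can be converted to odds. The short sketch below uses only the 47% and 7% figures stated above; the helper name `odds` and the variable names are my own, and the results are simple arithmetic on those two rounded numbers, not additional model output.

```python
import math


def odds(p):
    """Convert a probability to odds, p / (1 - p)."""
    return p / (1.0 - p)


p_worst = 0.47  # individual 1, as reported above
p_best = 0.07   # individual 2, as reported above

risk_ratio = p_worst / p_best
odds_ratio = odds(p_worst) / odds(p_best)
log_odds_gap = math.log(odds_ratio)  # difference in the two linear predictors

print(f"ratio of probabilities: {risk_ratio:.1f}")
print(f"odds ratio: {odds_ratio:.1f}")
print(f"log-odds gap: {log_odds_gap:.2f}")
```

Individual 1 is therefore roughly 6.7 times as likely to smoke as individual 2, with odds of smoking about 11.8 times as high. Because the inputs are rounded to whole percentages, these ratios are approximate.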


