Probably Overthinking It: 2012

Wednesday, July 11, 2012

Secularization in America: part seven

Abstract

Based on 2000-2010 data from the General Social Survey (GSS), I present results of a logistic regression that measures the relationship between Internet use and religious affiliation, controlling for religious upbringing, income and socioeconomic index, year born (age), and education.

I find that moderate Internet use reduces the chance of religious affiliation by 2 percentage points (odds ratio 0.8); heavier Internet use reduces affiliation by an additional 5 percentage points (odds ratio 0.7). Four years of college reduces affiliation by an additional 2 percentage points (odds ratio 0.8).

All reported effects are statistically significant with N=8960 respondents.

Results of logistic regression can be difficult to interpret; it might help to imagine the following progression:

Start with a hypothetical baseline person raised in any religion, with moderate or high household income ($25,000 per year or more), born in 1960, with high school education but no college, and low Internet use (less than 2 hours per week): in the GSS survey, 91% of people in this category have a religious affiliation. Now we change one variable at a time.
If this person were born 10 years later (in 1970) the fraction would drop to 89%.
If this person went to college, the fraction would drop to 87%.
If this person used the Internet 2 or more hours per week, the fraction would drop to 85%.
If this person used the Internet 8 or more hours per week, the fraction would drop to 80%.

Taken together, college education and Internet use are associated with a decrease in religious affiliation of 9 percentage points.

Introduction

From 1990 to 2010 the fraction of Protestants in the U.S. population dropped from 62% to 51%; at the same time the fraction of people with no religious preference increased from 8% to 18%. The following graph shows these trends:

In a previous article I presented evidence that something happened in the 1990s, continuing through the 2000s, that is causing disaffiliation from religion across all generations, with the largest effect on the youngest generations in the survey, people born in the 1960s and 1970s.

There are many possible explanations, but for me, the Internet pops to the top of this list. First, the timing is at least approximately right. Here is data from the World Bank, showing number of Internet users per hundred people in the U.S.

Internet use increased rapidly from 1995 to 2010, which is the interval of steepest change in religious affiliation.

Regressions

To identify factors that contribute to disaffiliation, I ran logistic regressions with the following dependent variable:

has_relig: 1 if the respondent reported any religious affiliation when interviewed as an adult, or 0 if the respondent reported "None" (based on the GSS variable RELIG)

And these explanatory variables:

had_relig: 1 if the respondent reported being raised in a religion, 0 otherwise (based on RELIG16)

born_from_1960: year the respondent was born minus 1960 (based on AGE and survey year). Subtracting 1960 makes it easier to interpret the results of the regression.

educ_from_12: number of years of school completed, minus 12 (based on EDUC).

somewww: 1 if the respondent reported using the Internet 2 or more hours per week, 0 otherwise (based on WWWHR, with the threshold chosen near the median)

heavywww: 1 if the respondent uses the Internet more than 8 hours per week, 0 otherwise (threshold chosen near the 75th percentile)

SEI: Socioeconomic index (a measure of occupational prestige developed by the GSS).

high_income: 1 if the respondent reports annual household income of $25,000 or more, which includes 62% of respondents who answered the question.

I used data from GSS survey years 2000, 2002, 2004, 2006, and 2010 (the relevant questions were not asked in 2008). I excluded respondents who were not asked or did not answer one or more of the questions I used in my analysis.
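As a rough illustration of the recoding described above, here is a minimal sketch. The input dict, its key names, and its value encodings (religion as a lowercase string, income in dollars) are assumptions made for the example; the real GSS files use numeric codes and income categories.

```python
def recode(resp):
    """Map one respondent's answers to the model variables.

    `resp` is a hypothetical dict; key names and encodings are
    illustrative assumptions, not the GSS file format.
    """
    required = ["relig", "relig16", "age", "year", "educ", "wwwhr", "income"]
    # Exclude respondents who were not asked or did not answer a question.
    if any(resp.get(key) is None for key in required):
        return None

    born = resp["year"] - resp["age"]  # year born, from AGE and survey year
    return {
        "has_relig": int(resp["relig"] != "none"),
        "had_relig": int(resp["relig16"] != "none"),
        "born_from_1960": born - 1960,
        "educ_from_12": resp["educ"] - 12,
        "somewww": int(resp["wwwhr"] >= 2),   # 2 or more hours per week
        "heavywww": int(resp["wwwhr"] > 8),   # more than 8 hours per week
        "high_income": int(resp["income"] >= 25000),
    }


example = {"relig": "none", "relig16": "prot", "age": 36, "year": 2006,
           "educ": 16, "wwwhr": 10, "income": 40000}
row = recode(example)
print(row)
```

Note that a heavy user has both somewww and heavywww set to 1, which is why the two effects add up in the model.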

It turns out that SEI does not make a contribution that is either statistically or practically significant, so I omit it from the model.

Here are the results of the model as reported by R:

Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)    -0.164434   0.094978  -1.731   0.0834 .
had_relig       2.318141   0.087372  26.532  < 2e-16 ***
high_income     0.166673   0.072345   2.304   0.0212 *
born_from_1960 -0.020161   0.002128  -9.474  < 2e-16 ***
educ_from_12   -0.051850   0.012228  -4.240 2.23e-05 ***
somewww        -0.178409   0.078490  -2.273   0.0230 *
heavywww       -0.336658   0.080546  -4.180 2.92e-05 ***
---
(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 7860.3  on 8959  degrees of freedom
Residual deviance: 6872.5  on 8953  degrees of freedom
AIC: 6886.5

Number of Fisher Scoring iterations: 5

All explanatory variables are statistically significant: high_income and somewww are borderline, both at p=0.02.

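The odds ratios and cumulative probabilities (in percent) are shown below. The odds ratio for born_from_1960 corresponds to a difference of 10 years, and the one for educ_from_12 to a difference of 4 years: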

                 odds   cumulative
                ratio  probability
(Intercept)      0.85           46
had_relig       10.16           90
high_income      1.18           91
born_from_1960   0.82           89
educ_from_12     0.81           87
somewww          0.84           85
heavywww         0.71           80

These results are summarized and interpreted in the Abstract, above.
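For readers who want to check the arithmetic, here is a short sketch that rebuilds the odds ratios and cumulative probabilities from the reported coefficients. The step sizes (10 years for born_from_1960, 4 years for educ_from_12) are inferred from the progression in the Abstract; the variable names are just for this example.

```python
import math

# Coefficients copied from the regression output.
coef = {
    "(Intercept)": -0.164434,
    "had_relig": 2.318141,
    "high_income": 0.166673,
    "born_from_1960": -0.020161,
    "educ_from_12": -0.051850,
    "somewww": -0.178409,
    "heavywww": -0.336658,
}

# Size of the change applied at each step of the progression
# (inferred: born 10 years later, 4 years of college).
step = {name: 1 for name in coef}
step["born_from_1960"] = 10
step["educ_from_12"] = 4


def logistic(x):
    """Convert log-odds to a probability."""
    return 1 / (1 + math.exp(-x))


log_odds = 0.0
rows = []
for name, beta in coef.items():
    delta = beta * step[name]
    log_odds += delta  # each step adds to the previous ones
    odds_ratio = round(math.exp(delta), 2)
    percent = round(100 * logistic(log_odds))
    rows.append((name, odds_ratio, percent))

for name, odds_ratio, percent in rows:
    print(f"{name:15s} {odds_ratio:6.2f} {percent:4d}")
```

Each odds ratio is exp(coefficient times step), and each cumulative probability applies the logistic function to the running sum of log-odds, which is why the order of the rows matters for the last column but not for the odds ratios.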

Discussion

As always, statistical association does not prove causation, but in this case I think there are reasons to believe that Internet use causes disaffiliation from religion:

It is easy to imagine how Internet use could allow a person in a homogeneous community to find information about people of other religions (and none), and to interact with them personally. And there is anecdotal evidence that those interactions contribute to religious disaffiliation (for example, numerous personal reports on reddit.com/r/atheism).
Conversely it is harder to imagine plausible reasons why disaffiliation might cause increased Internet use (except possibly on Sunday mornings).
Although it is possible that a third factor causes both disaffiliation and Internet use, that factor would also have to be new, coincidentally rising in prevalence, like the Internet, during the 1990s and 2000s.
Whatever causes disaffiliation has the strongest effect on the youngest generations, which is consistent with the hypothesis that Internet use during adolescence and young adulthood has the strongest effect on religious affiliation.

So with appropriate caution, I think there is a strong case here for causation, and not just statistical association.

Furthermore, the magnitude of the effect is large enough to explain a substantial part of the observed changes in religious affiliation. In my next article I will incorporate this regression model into the generational model I presented in Part Six, in order to estimate the effect of Internet use on these trends.

Summary of previous reports

In Part One I described trends in market share of major religions in the U.S.: since 1988, the fraction of Protestants dropped from 60% to 51%, and the fraction of people with no religious affiliation increased from 8% to 18%.

In Part Two I used data from the 1988 General Social Survey (GSS) to model transmission of religion from parent to child, and found that the model failed to predict the decrease in Protestants and the increase in Nones that occurred between 1988 and 2010.

In Part Three I looked at changes, between 1988 and 2008, in the spouse tables (which describe the tendencies of people to marry within their religions), the environment table (which describes parents' decisions about their children's religious upbringing), and the transmission table (which describes the likely outcomes for children raised within each religion). I found that the transmission table has changed substantially since 1988, and accounts for a large part of the observed increase in Nones, but not the decrease in Protestants.

In Part Four I looked at changes in religiosity over the lifetime of respondents. I tentatively concluded that the differences between generations were larger than changes in affiliation, within generations, over time.

But in Part Five I looked more closely and saw that all generations were becoming more religious, or staying the same, prior to 1990, and that all generations began to disaffiliate during the 1990s, continuing into the 2000s.

In Part Six I presented a generational model that retroactively "predicts" the changes we have seen since 1988, and used it to predict how those changes are likely to continue in the next 30 years. I expect the fraction of Protestants to continue to decrease, and the fraction of Nones to increase and overtake Catholic as the second-largest affiliation by 2030.

Tuesday, July 10, 2012

Secularization in America: part six

Summary so far

Generational Model

Now I am ready to get back to the generational model I have been working up to. The goal of the generational model is to separate these three effects:

Changes in religious preference from one generation to the next.
Changes in religious affiliation over the lifetime of respondents.
Changes in the composition of the GSS cohort over time.

The model works by simulation. Assuming that we are starting in 1988, here are the steps:

Read the survey data from 1988 and resample it. Compute and store the distribution of ages.
For each respondent, generate a hypothetical child. Use the BirthModel to determine year of birth, the UpbringingModel to determine what religion the child is raised in, and the TransmissionModel to determine what affiliation the child will have as an adult. Details of these models follow.
Form a combined cohort of parents and simulated children. Since the cohort of parents is a representative sample of the US population, the cohort of simulated children is a representative sample of the population one generation later (based, for now, on the simplifying assumptions that all groups have the same number of children on average, and there is no immigration).
In order to generate a cohort from a future survey year, draw a sample from the combined cohort, weighted so that the distribution of ages in the future year is the same as the original distribution of ages in 1988. As the simulation goes forward in time, this generated cohort contains fewer of the parents and more of the simulated children. After 20 years, about 25% of the "real" respondents have been replaced with "fake" respondents.
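The four steps can be sketched in a few lines of Python. This is only a toy version under stated assumptions: the three model tables are invented placeholders rather than GSS estimates, the respondent format is hypothetical, only one generation of children is simulated, and the age-weighted draw in the last step is simplified to picking someone of the nearest available age.

```python
import random

RELIGIONS = ["prot", "cath", "jew", "other", "none"]

# Toy stand-ins for the auxiliary models. Every number below is invented
# for illustration; none of it is estimated from the GSS.
BIRTH_MODEL = {20: 0.3, 25: 0.4, 30: 0.2, 35: 0.1}  # parent's age at birth
UPBRINGING_MODEL = {r: {r: 0.9, "none": 0.1} for r in RELIGIONS}
UPBRINGING_MODEL["none"] = {"none": 1.0}
TRANSMISSION_MODEL = {r: {r: 0.8, "none": 0.2} for r in RELIGIONS}
TRANSMISSION_MODEL["none"] = {"none": 0.7, "prot": 0.3}


def draw(dist, rng):
    """Draw one key from a {value: probability} map."""
    values = list(dist)
    weights = [dist[v] for v in values]
    return rng.choices(values, weights=weights)[0]


def make_child(parent, rng):
    """Step 2: generate a hypothetical child for one respondent."""
    born = parent["born"] + draw(BIRTH_MODEL, rng)
    raised = draw(UPBRINGING_MODEL[parent["relig"]], rng)
    relig = draw(TRANSMISSION_MODEL[raised], rng)
    return {"born": born, "relig": relig, "simulated": True}


def simulate(respondents, start_year, future_year, rng):
    # Step 1: resample the survey and store the distribution of ages.
    parents = rng.choices(respondents, k=len(respondents))
    ages = [start_year - p["born"] for p in parents]

    # Steps 2 and 3: one simulated child per parent, then combine.
    combined = parents + [make_child(p, rng) for p in parents]

    # Step 4 (simplified): for each stored age, pick someone who has the
    # nearest available age in the future year.
    by_age = {}
    for person in combined:
        by_age.setdefault(future_year - person["born"], []).append(person)
    available = sorted(by_age)
    cohort = []
    for age in ages:
        nearest = min(available, key=lambda a: abs(a - age))
        cohort.append(rng.choice(by_age[nearest]))
    return cohort


rng = random.Random(1)
survey_1988 = [  # fake respondents aged 18 to 80 in 1988
    {"born": 1988 - rng.randint(18, 80), "relig": rng.choice(RELIGIONS),
     "simulated": False}
    for _ in range(1000)
]
cohort_2008 = simulate(survey_1988, 1988, 2008, rng)
share_simulated = sum(p["simulated"] for p in cohort_2008) / len(cohort_2008)
print(f"simulated share after 20 years: {share_simulated:.0%}")
```

Because the tables and the age distribution here are made up, the printed share should not be expected to match the 25% figure from the real simulation.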

Now, where do all these auxiliary models come from?

BirthModel: This is just the distribution of parent's age when each child is born. It is based on data from the 1994 GSS, which includes questions about children. I had to do some work to correct for an obvious bias due to the ages of the respondents; I will skip the details here.

UpbringingModel: This is a combination of the SpouseTable and the EnvironmentTable, described in Part Three. It is a map from the parent's religion to the distribution of possible religions the child might be raised in.

TransmissionModel: This is the TransmissionTable described in Part Three. It is a map from the religious environment of the child to the distribution of religious affiliation reported by the child as an adult.

The Upbringing and Transmission models come in two flavors:

Time invariant: We use all respondents to estimate the parameters of the model, and apply the same model to generate all simulated children.

Time variant: We estimate different parameters for each generation (partitioned by decade born) and use different models to generate simulated children, depending on what year they are born.

For the time variant model, we have to extrapolate from observed data into the future. To keep this simple we copy the latest reliable data (based on sample size) and apply it to people born in later decades.
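One simple way to implement that copy-forward rule is sketched below. The function name, the table placeholders, and the minimum sample size of 100 are assumptions for the example; the text does not say what threshold was used.

```python
def extrapolate(tables, sample_sizes, decades, min_n=100):
    """Return a table for every decade, copying the latest reliable one forward.

    `tables` maps decade born -> estimated table; `sample_sizes` maps
    decade born -> number of respondents. `min_n` is a hypothetical
    threshold for what counts as reliable.
    """
    result = {}
    latest = None
    for decade in sorted(decades):
        if decade in tables and sample_sizes.get(decade, 0) >= min_n:
            latest = tables[decade]  # reliable: use it and remember it
        if latest is None:
            raise ValueError(f"no reliable table at or before {decade}")
        result[decade] = latest
    return result


# Placeholder strings stand in for real tables.
tables = {1950: "T1950", 1960: "T1960", 1970: "T1970"}
sizes = {1950: 900, 1960: 700, 1970: 40}
by_decade = extrapolate(tables, sizes, [1950, 1960, 1970, 1980, 1990])
print(by_decade)
```

In this example the 1970s table is based on too few respondents, so the 1960s table is applied to everyone born in the 1970s and later.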

Ok, that's enough methodology for now. Let's take a look at some...

Results

The first step is to validate the model by showing that it can predict the observed changes using past data. Here I mean "predict" in a peculiar sense, which is that I will use the entire dataset (including data after 1988) to build the auxiliary models, then use the simulator to generate trends from 1988 to 2010.

Here is what the results look like:

The thick lines are the observed data; the thin lines are simulations. Here are my observations:

For Jews and Catholics, the observed data falls within the bounds of the simulations, so the model validates.
For Other, the observed data sometimes exceeds the bounds of the simulations, which may be due to immigration (not included in this model).
For None, the observed data is at the high end of the range, and for Prot it is at the low end of the range. This is most likely due to the disaffiliation we saw in Part Five, which is only partly captured in this model.

I conclude that the model is capturing a large part of the observed changes since 1988, but of course I am cheating by using data from after 1988. So these results validate my modeling decisions (what to include and what to leave out) but they don't test the predictive power of the model.

Predictive power

To make an honest test, we have to restrict ourselves to data from before 1988. That way we can tell what part of the observed changes would have been predictable in 1988.

Here's what the result looks like:

So if we had used this model in 1988, we would have predicted a small decrease in the fraction of Protestants and a small increase in None, but we would have underestimated both trends.

This supports my conclusion in Part Five that something happened in the 1990s that changed trends in religious affiliation, and suggests that these changes were unpredictable based on data observable before 1988.

Predictions

Finally, we can use all data to build the models, use 2010 as the starting place for the simulations, and make some predictions for the next 30 years:

So what should we expect?

The decline in fraction of Protestants will continue. The fraction of Catholics will also decrease, but more slowly.
The fraction of Nones will increase, overtaking Catholics as the second-largest religious affiliation around 2030.
The fraction of Others will increase slowly, about 1 percentage point in 30 years. If immigration from Asia continues at current rates, that would add another percentage point, bringing the total to 6%.
The fraction of Jews will decrease, possibly by half by 2040.

These predictions are likely to be conservative; that is, the rate of secularization will almost certainly be faster. Why?

Over the last several generations, the UpbringingModel and the TransmissionModel have changed substantially. Parents are less likely to raise their children with religion, and those children are less likely to adopt the religion they are raised with. The model captures these trends, but assumes that they will level off in 2010. It would probably be more accurate to assume that they will continue.
Rates of disaffiliation among adults are also increasing. Again, the model includes trends that have already occurred, but it assumes that they will level off rather than continue.

So there are reasons to expect the increase in the fraction of Nones to accelerate.

Conversely, it is hard to imagine that the trends will be any slower than these predictions. To a large extent, these results are not predictions about things that will happen in the future; rather, they are the future consequences of things that have already happened. For example, in 2020, the GSS survey will include a cohort of people in their 40s. What will they be like? They will be a lot like the people in the 2010 survey who are in their 30s. But they will be older. Changes in the general population are slow because it takes a long time to replace each generation with the next; but as a result, they are also predictable.

Next time: Was Rick Santorum right? Is college the #1 enemy of religious belief? (Hint: no.) I will look more closely at the TransmissionModel to see what factors make vertical transmission of religion more (or less) likely.

Monday, July 9, 2012

Secularization in America: part five

Summary so far

Part Four revisited

In Part Four I looked at changes in religiosity over the lifetime of respondents. The GSS is not a longitudinal survey, so we can't follow individuals, but we can follow generations (which I partition by decade of birth) over time.

Last time I presented this figure, which shows religiosity (the fraction of respondents with any religious preference) as a function of respondent's age, partitioned by decade of birth, for people who were raised Protestant:

Each line represents a different generation. For example, the red line shows that people born in the 1920s were about 96% likely to report a religious preference when they were interviewed in their 40s, 50s, and 60s, and possibly less likely to be religious when they were in their 80s.

The conclusion I drew from this figure is that the differences between generations are larger than the changes, over time, within each generation. For purposes of modeling I concluded that religious disaffiliation accounts for only a small part of the observed changes in religious identity.

But I was bothered by one feature of these curves: many of them are concave down, and the maximum point in the curves is apparently shifting toward younger ages. I came to suspect that this picture of the data is "out of focus".

We can refocus the image by plotting the date of the survey (rather than the respondent's age) on the x-axis. Here's what that looks like:

In this figure, two trends are more apparent: before 1990, most generations were becoming more religious; after 1990, they all became less religious. So it seems clear that the explanation is something that affected all generations at a particular interval in time, not something that affects all people as they age.

We can see these changes more clearly by normalizing each curve with its 1990 value:

Again, most generations were becoming more religious before 1990; after 1990, all of them became less religious. Among people born in the 1960s, more than 10% lost their religion between 1990 and 2010 (when they were in their 30s and 40s).
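Here is a minimal sketch of the normalization, assuming it means dividing each point on a curve by that curve's 1990 value (subtracting would be another reasonable reading). The sample curve is made up for illustration.

```python
def normalize(curve, base_year=1990):
    """Divide each value in a {year: religiosity} curve by its base-year value."""
    base = curve[base_year]
    return {year: value / base for year, value in curve.items()}


# Invented values for one generation, not GSS data.
curve = {1980: 0.90, 1990: 0.92, 2000: 0.87, 2010: 0.81}
normalized = normalize(curve)
print(normalized)
```

After normalizing, every curve passes through 1.0 in 1990, so values below 1.0 in later years read directly as the fraction of that generation's 1990 religiosity that remains.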

Here's the same graph for people raised Catholic:

The general shape is the same: religious affiliation was flat or increasing prior to 1990, and decreasing for almost all generations after 1990.

Since the trends are similar for Catholics and Protestants, we can get a less noisy picture by combining them. Here is the same graph for respondents raised with any religion.

This figure makes it easier to compare across generations. It appears that more recent generations (born in the 1960s and 1970s) are disaffiliating at higher rates than earlier generations.

[As an aside, this result contradicts one of the primary (and widely-reported) claims of this article: Schwadel, Period and Cohort Effects on Religious Nonaffiliation and Religious Disaffiliation. Schwadel reports that people born in the 1960s and 1970s were disaffiliating at a slower rate than the previous generations. Some reasons my results might be different: Schwadel only had GSS data up to 2006, and he discards people under 30 years of age. So very little data about the youngest generations is included. Also, his results are based on statistical models that (if I understand correctly) don't include time as an explanatory variable, so they cannot account for an event that affects all generations during a particular interval.]

All right, it's audience participation time. What happened in the 1990s that caused widespread religious disaffiliation? Remember, idle speculations only. No evidence, please!

Thursday, June 28, 2012

Secularization in America: part four

Summary so far

Religiosity curves

Respondents in the GSS are surveyed at different ages, so we can get a sense of when people lose their religion (or acquire one). I collected all GSS respondents and partitioned them by the religion they were raised in and the decade they were born. For each of these subgroups, I plotted religiosity (the fraction with some religious preference) as a function of age when surveyed.

Here are the curves for people raised Protestant:

In the top right, we see that people born between 1900 and 1910 and raised Protestant were likely to be religious when they were interviewed in their 70s and 80s. In the lower left, we see that people born in the 1980s were less likely to be religious when they were interviewed in their 20s.

For the middle generations, we have a better sense of changes in religiosity over a respondent's lifetime. Several of the curves have an apparent peak in middle age; if this apparent effect is real, the location of the peak may be shifting left.

Overall, these curves are relatively flat, which suggests that respondents are not changing substantially after adulthood (everyone in the GSS is 18 or older).

The curves for Catholics are similar:

Again, there are substantial differences between generations, but within each generation, little change over the respondents' lifetimes. People born in the 50s, 60s and 70s might be leaving the church as they age, but it is hard to tell in this plot whether these trends are statistically significant.

Finally, here are the curves for people raised with no religion:

There are only enough respondents in this category to plot curves for a few generations, and even then, the curves are noisy. Not surprisingly, people raised without religion are less likely to be religious, and recent generations are less religious than their elders. Again, the curves are generally flat, suggesting that people generally do not change religious affiliation as adults.

A possible exception is that people born in the 1970s and raised without religion might be finding religion in their 30s. But this data point is based on a small number of respondents, so it is probably too early to tell.

Why people switch

In 1988 the GSS asked respondents questions about changes in religious affiliation and the reasons for the change. Unfortunately, it looks like this data won't do me much good, because:

In many cases where a respondent switched from a religious preference to None, they were not asked why.
There are so many inconsistencies in the data, I wonder if it might have been mangled.
Because these questions were only asked once, we can't track trends.

So that's disappointing.

Modeling a mixed-age cohort

One of the challenges of working with GSS data is that the respondents each year are a mixture of people of all ages. From year to year, the oldest generation drops out of the cohort and the youngest generation joins the mix.

So when there is a trend from each generation to the next, as with religious behavior, there is a lag before the trend appears in a GSS time series, and the slope of the trend is much shallower.

However, for purposes of prediction, this lag is actually useful. For example, 18 years after a baby boom, there is likely to be a spike in college enrollment; that's not really a prediction about the future; it's just a consequence of something that has already happened.

Similarly, we already know what most of the GSS cohort will look like next year. It will look like the cohort this year, one year older. The difference is that a few of the oldest respondents are replaced by the next group of 18 year olds.

In the 2010 cohort, the age range is roughly 20-80. To predict the 2020 cohort, we can:

Remove respondents older than 80.
Age the rest of the respondents by 10 years.
Add a new batch of respondents in their 20s.

Step 2 might be hard if people were changing religious affiliation as they age, but as we saw above, they generally do not. Step 3 is harder, but there are two reasonable options:

Conservatively, we can assume that the next generation will be like their immediate predecessors.
Alternatively, we can extrapolate from current trends. This option is probably better for prediction, but in some ways unsatisfying because it does not explain the cause of the trends, or why we should expect them to continue.

If we use this method to predict 20 years into the future, we replace about 25% of the cohort with simulated respondents. But since 75% of the prediction is based on simple population aging, it is likely to be reliable.
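The three steps, with the conservative option for the new batch, might look like the sketch below. The respondent format and function names are hypothetical, and the sketch reads the first step as dropping anyone who would be older than 80 after aging, which is an interpretation rather than something the text spells out.

```python
import random


def newcomers_like_predecessors(cohort, count, rng, max_young_age=29):
    """Conservative option: new respondents resemble today's youngest group."""
    young = [p for p in cohort if p["age"] <= max_young_age]
    return [dict(rng.choice(young)) for _ in range(count)]


def predict_cohort(cohort, newcomers, years=10, max_age=80):
    """Age the cohort, drop those past max_age, and add the new batch."""
    aged = []
    for person in cohort:
        new_age = person["age"] + years
        if new_age <= max_age:  # steps 1 and 2
            aged.append({**person, "age": new_age})
    return aged + [dict(p) for p in newcomers]  # step 3


# A tiny invented cohort, for illustration only.
cohort_2010 = [
    {"age": 22, "relig": "none"},
    {"age": 45, "relig": "prot"},
    {"age": 75, "relig": "cath"},
]
rng = random.Random(0)
new_batch = newcomers_like_predecessors(cohort_2010, 1, rng)
cohort_2020 = predict_cohort(cohort_2010, new_batch)
print(cohort_2020)
```

Religious affiliation is carried forward unchanged when people are aged, which is the simplification that the flat religiosity curves are meant to justify.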

Tuesday, June 26, 2012

The falling slinky problem

Let's take a break from statistics and do some physics!

My friend Ted Bunn recently wrote about the falling slinky problem in his blog. He points to this video, which shows a falling slinky in slow motion. After the top of the slinky is released, the bottom seems to hover until the top reaches it. The effect is particularly strange because if you look carefully, the top of the slinky does not accelerate as we expect for an object in free fall. Rather, it falls at a constant rate.

Ted explains:

...the information that the top end has been dropped can’t propagate down the slinky any faster than the speed of sound in the slinky (i.e., the speed at which waves propagate down it), so there’s a delay before the bottom end “knows” it’s been dropped. But it’s surprising (at least to me) to see how long the delay is.

This explains why there is a delay, but to me it doesn't explain why the delay is the same as the time it takes for the top of the slinky to reach the bottom. There are lots of models out there that explain parts of this behavior, but the ones I found are either complicated or wrong.

Here's my take on it. First, let's assume that what we see in the video is correct: the slinky collapses from top to bottom, so that each coil doesn't move until the one above it comes down and (nearly) hits it.

Let's call the initial length L and the mass m. After some time, a fraction of the slinky, x, has collapsed. At that point, the collapsed part of the slinky has mass xm at height (1-x)L. The rest of the slinky is spread uniformly [EDIT: this assumption is not right...see Ted's comment below] between height 0 and (1-x)L. So the center of mass is

x(1-x)L + (1-x)(1-x)L/2

Since the slinky is in free fall, we know the center of mass as a function of time:

L/2 - g/2 t^2

If we set those equal and type them into WolframAlpha, we get

x = sqrt(g/L) t

Which means that the top of the slinky is moving at constant speed. Remember that x is the fraction of the slinky that collapsed; to get the distance traveled, we multiply by L:

d = xL = sqrt(gL) t

So the speed of the top of the slinky is sqrt(gL).

We can get to the same result a different way by using the formula for wave speed in a vibrating string: sqrt(T/μ), where T is tension and μ is mass per linear measure. In this case T=mg and μ=m/L. Plug that in and get wave speed sqrt(gL).
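If you would rather not trust WolframAlpha, the result is easy to check by hand or numerically. The collapse expression simplifies to L(1 - x^2)/2, so setting it equal to L/2 - g t^2/2 gives x^2 = g t^2 / L. The sketch below confirms this for the uniform-density model as stated; the values of g and L are arbitrary choices for the check.

```python
import math

g = 9.8  # m/s^2
L = 1.0  # initial length in meters (arbitrary value for this check)


def com_collapse(x):
    """Center of mass when a fraction x has collapsed (uniform-density model)."""
    return x * (1 - x) * L + (1 - x) * (1 - x) * L / 2


def com_free_fall(t):
    """Center of mass of a body in free fall, starting at height L/2."""
    return L / 2 - g / 2 * t**2


t_end = math.sqrt(L / g)  # time at which x reaches 1
for i in range(11):
    t = t_end * i / 10
    x = math.sqrt(g / L) * t
    assert math.isclose(com_collapse(x), com_free_fall(t), abs_tol=1e-12)

speed = math.sqrt(g * L)            # from d = sqrt(gL) t
wave_speed = math.sqrt((g) / (1 / L))  # sqrt(T/mu) with T = m g, mu = m / L; m cancels
print(f"speed of the top: {speed:.2f} m/s")
```

Note that the mass m drops out of both calculations, so the speed depends only on g and the stretched length L.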

I think this analysis is useful, but to be rigorous, I haven't really explained why the slinky behaves the way it does. I have only shown that if the slinky collapses from top to bottom (as it appears to), then the top moves at a constant speed (as it appears to).

[UPDATE: Provoked by my amateurish attempts at Physics, Ted Bunn wrote up a version of this model that deals correctly with the change in the density of the spring from top to bottom. The result is that the speed of the top of the slinky is almost constant -- it slows down a bit at the end. ]

Friday, June 22, 2012

Secularization in America: part three

The spouse tables are based on the parents of 1988 respondents. People from later generations might be increasingly likely to marry outside their religion.
The environment table is also based on the previous generation; again, later parents might be making different decisions about the religious environment of their children.
The transmission table is based on 1988 respondents; it's possible that after 1988, children were less likely to adopt the religion they were raised in. Anecdotally, the culprits most often blamed for this effect are college and the Internet.
Finally, I have not considered adult conversions from one religious identity to another. The GSS has data on these switches, so I could add them to the model.

I will investigate each possibility in turn, starting with the prevalence of mixed-religion marriages. In Secularization, Steve Bruce presents results from a study of intermarriage in the UK that found that the rate of vertical transmission:

"is halved if the parents are of different faiths (even when the differences are just Methodist-Anglican). Even if the parents agree on which faith they wish to pass on, the fact of disagreement makes the child aware that there are good people in other churches and introduces the relativism that weakens conviction. [page 71]"

So if the rate of mixed marriages is increasing, that could contribute to the increasing number of Nones.

To measure this effect, I used these GSS variables:

RELIG: What is your religious preference?
SPREL: What is your husband's/wife's religious preference?

In cases where one partner converts to the other's religion before marriage, that would count (for this model) as a same-religion marriage, since we are interested in the decision the couple makes about the religious environment they raise children in.

The Spouse Tables

The following graph shows the fraction of same-religion marriages over the history of the survey (data for SPREL were not collected every year):

Before 1988, the fraction of same-religion marriages was around 84%; after 1988 it fell to 78%. The abruptness of the change makes me worry that it may be an artifact; for example, a change in the wording of the question. Also, these results only include respondents who are married, so they are biased toward older people and socio-economic groups that are more likely to be married.

But as it turns out, even if we take the data at face value, it has a small effect on the model's predictions.

I used the respondents from 2004-2010 to build spouse tables for men and women (see Part Two), then ran the 1988 model again with the anachronistic data. The results are almost identical to what we saw last time:

The only noticeable effect is that the prediction for Other got worse. I conclude:

It's possible that people are more likely now to marry outside their religion than in 1988, but the difference is small, and
Even if we cheat by using the 2004-2010 data in 1988, this change does not explain the subsequent changes in the fractions of Protestants and Nones.

The Environment Table

It seems unlikely that parents now are making different decisions about what religious environment to raise their children in, but just to rule it out, I compared the environment tables for 1988 and 2008.

           prot  cath   jew other  none  change     N  excess
prot-prot    97     1     0     0     2      +1   634     8.7
prot-cath    43    46     0     1    10      +4    49     1.9
prot-jew      0     0     0     0     0      +0     1     0.0
prot-othe    44    21     0    36     0      +0     5     0.0
prot-none    85     4     0     0    11      +5    90     4.2
cath-prot    30    56     0     0    14     +14    37     5.1
cath-cath     1    98     0     0     1      +0   294     0.8
cath-jew      0     0     0     0     0      +0     0     0.0
cath-othe     0     0     0     0     0      +0     2     0.0
cath-none     2    82     0     0    16      +7    20     1.5
jew-prot      0     0     0     0     0      +0     1     0.0
jew-cath      0   100     0     0     0      +0     0     0.0
jew-jew       0     0    96     0     4      +0    27     0.0
jew-othe      0     0     0     0     0      +0     0     0.0
jew-none      0     0    67     0    33     +33     0     0.0
othe-prot    27     0     0    54    18     +18     4     0.7
othe-cath     0    43     0    57     0      +0     3     0.0
othe-jew      0     0     0     0     0      +0     0     0.0
othe-othe     7     0     0    84     9      +3    28     0.7
othe-none    25    25     0    25    25     +25     4     1.0
none-prot    83     0     0     0    17     +17     6     1.0
none-cath    14    68     0     0    18     -82     2    -1.6
none-jew      0     0   100     0     0      +0     0     0.0
none-othe     0     0     0     0     0    -100     1    -1.0
none-none    23     4     0     0    74     +18    34     6.1

The left column is the mother's-father's religion. The next five columns show the religious environments those parents chose, as reported by their children in 2008. For example, the second row shows that if the mother is Protestant and the father Catholic, 43% of the children were raised Protestant, 46% Catholic, and 10% None.

The next column shows the change in the None column, in percentage points, since the 1988 survey. N is the number of families in 1988 that fell into each category. Finally, excess is the product of change and N, an estimate of the number of additional Nones in the 1988 survey that could be explained by changes in the environment table. The total of this column is 29, which is not nearly enough to explain the actual excess of 177.
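As a quick check on that total, the sketch below adds up the last column of the table, top to bottom, and compares it with the observed excess. The variable names are just for this example.

```python
# Last column of the environment table, in row order.
excess = [
    8.7, 1.9, 0.0, 0.0, 4.2,    # mother Protestant
    5.1, 0.8, 0.0, 0.0, 1.5,    # mother Catholic
    0.0, 0.0, 0.0, 0.0, 0.0,    # mother Jewish
    0.7, 0.0, 0.0, 0.7, 1.0,    # mother Other
    1.0, -1.6, 0.0, -1.0, 6.1,  # mother None
]

observed_excess = 177  # actual excess of Nones, from the text

total = sum(excess)
share = total / observed_excess
print(f"explained by the environment table: {total:.1f} of {observed_excess} ({share:.0%})")
```

So even taking every entry at face value, the environment table explains only about a sixth of the excess.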

Of course, most of the numbers in the change column are based on small samples, so we should not take them too seriously. By running simulations with resampled survey data, we can take account of these sample sizes.
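To illustrate why small samples deserve caution, here is a minimal sketch of resampling in Python. Both samples are hypothetical (they are not GSS data), and `bootstrap_interval` is a name chosen for this example; it is not the code used for the simulations in this article.

```python
import random


def bootstrap_interval(sample, runs=1000, seed=0):
    """Return the 5th and 95th percentiles of the resampled percentage of Nones."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(runs):
        # Draw a new sample of the same size, with replacement.
        resample = rng.choices(sample, k=len(sample))
        estimates.append(100 * resample.count("none") / len(resample))
    estimates.sort()
    return estimates[runs * 5 // 100], estimates[runs * 95 // 100 - 1]


# Hypothetical samples with the same fraction of Nones (25%) but different sizes.
small = ["none"] * 1 + ["other"] * 3
large = ["none"] * 100 + ["other"] * 300

print("N=4:  ", bootstrap_interval(small))
print("N=400:", bootstrap_interval(large))
```

The interval for the four-family sample is far wider than the one for the 400-family sample, which is why a change of +25 in a row with N=4 carries little weight.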

Using the tables from 1988 to predict the fraction of Nones in 2008, we expect only 8.0% (compared to the actual 16.8%). If we use the environment table from 2008 instead, the prediction goes up to 8.5%. If we also use the spouse table, it goes up to 8.7%. So clearly the changes in these tables are not enough to explain the observed changes.

The transmission table

The transmission table is a cross-tabulation of the religion the respondent was brought up in and the religion reported when surveyed. It shows the outcome, after some years, of parents' decisions about their children's religious upbringing and the effect of the environment.
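As a rough sketch of what such a cross-tabulation involves, the following Python builds a table of row percentages from (RELIG16, RELIG) pairs. The respondents listed here are hypothetical and only illustrate the mechanics; `cross_tabulate` is a name invented for this example.

```python
from collections import Counter, defaultdict

RELIGIONS = ["prot", "cath", "jew", "other", "none"]


def cross_tabulate(respondents):
    """Return {raised: {current: percent}} from (RELIG16, RELIG) pairs."""
    counts = defaultdict(Counter)
    for raised, current in respondents:
        counts[raised][current] += 1
    table = {}
    for raised, row in counts.items():
        total = sum(row.values())
        # Each row is normalized separately, so it sums to 100.
        table[raised] = {r: 100 * row[r] / total for r in RELIGIONS}
    return table


# Hypothetical respondents, for illustration only (not GSS data).
respondents = (
    [("prot", "prot")] * 8
    + [("prot", "none")] * 2
    + [("cath", "cath")] * 3
    + [("cath", "prot")] * 1
)

table = cross_tabulate(respondents)
for raised, row in table.items():
    print(raised, [round(row[r]) for r in RELIGIONS])
```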

The following is the transmission table for 2008, with changes since 1988:

raised	prot	cath	jew	other	none	change	N	excess
prot	82	3	1	2	13	+7	951	67.7
cath	17	70	0	2	12	+6	414	24.6
jew	9	2	73	1	14	+9	31	2.7
other	12	2	0	75	10	+1	31	0.4
none	31	5	1	2	62	+5	53	2.6

Each row corresponds to a religious upbringing; each column shows a possible outcome. For example, the first row shows that of children raised Protestant, 82% report that their religious preference is Protestant, and 13% report None. The fraction of Nones has increased 7 percentage points since 1988. Since there are 951 people in this row, this increase accounts for 68 excess Nones in the 2008 survey.

Overall, the changes in the transmission table account for 98 excess Nones, which is a little more than half of the observed increase.
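The arithmetic behind these comparisons can be checked with a short Python sketch. The excess values are copied from the environment and transmission tables above, and 177 is the observed excess quoted earlier; `share_explained` is a helper name made up for this example.

```python
OBSERVED_EXCESS = 177  # actual excess of Nones, as stated in the article

# "excess" columns copied from the two tables above, row by row.
environment_excess = [
    8.7, 1.9, 0.0, 0.0, 4.2,
    5.1, 0.8, 0.0, 0.0, 1.5,
    0.0, 0.0, 0.0, 0.0, 0.0,
    0.7, 0.0, 0.0, 0.7, 1.0,
    1.0, -1.6, 0.0, -1.0, 6.1,
]
transmission_excess = [67.7, 24.6, 2.7, 0.4, 2.6]


def share_explained(excess_column, observed=OBSERVED_EXCESS):
    """Return (total excess, fraction of the observed excess it covers)."""
    total = sum(excess_column)
    return total, total / observed


for name, column in [("environment", environment_excess),
                     ("transmission", transmission_excess)]:
    total, share = share_explained(column)
    print(f"{name}: {total:.1f} excess Nones, {share:.0%} of observed")
```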

If we run the simulations again, applying the transmission table from 2008 in 1988, we get the following predictions:

The prediction for Nones is better, but it's clear that this model still misses the mark: it predicts that the fraction of Catholics should be going down, and fails to predict the decrease in the fraction of Protestants.

The problem is that I am treating everyone interviewed in 1988 as a cohort, but they represent people of all ages, who were raised in different environments. Also, I am using data from 2008 to predict what will happen in 2008, so I have got away from the original goal, to see whether the changes that occurred between 1988 and 2008 could have been predicted in 1988.

However, this model has given me some leads. It looks like a large part of the increase in Nones is due to changes in the transmission table, possibly a small part due to the environment table, and little or none due to the spouse tables.

Next time I will present a different model that reorganizes respondents into cohorts by year of birth, which will make it possible to compare people raised over the same time span. It will also allow me to look for trends that began prior to 1988.

Thursday, June 21, 2012

Secularization in America: part two

In Part One I described some trends in market share of the major religions in the U.S.; in particular, since 1988, the fraction of Protestants dropped from 60% to 51%, and the fraction of people with no religious affiliation increased from 8% to 18%.

I would like to know if something happened after 1988 to cause these changes, or if they could have been predicted based on patterns occurring before 1988. As a first step, I will use data from 1988 to model vertical transmission (from parent to child) and see if it predicts the observed changes.

My model of vertical transmission works like this:

Each respondent chooses a spouse,
Each pair decides what religion to bring their children up in,
Each child chooses a religion.

I model each step of this process using data from the General Social Survey (GSS); specifically, I used these variables:

RELIG: What is your religious preference?
RELIG16: In what religion were you raised?
MARELIG: What was your mother's religious preference when you were growing up?
PARELIG: What was your father's religious preference when you were growing up?

The first two questions were asked every year, but questions about parents' religion were only asked in 1988 and 2008. I will use the data from 1988 to build and validate models, then use the data from 2008 to make predictions.

I used MARELIG and PARELIG to build two "Spouse tables", one for men and one for women. Here is the table for men:

Spouse Table (men)

	prot	cath	jew	other	none
prot	93	6	0	0	1
cath	14	85	0	0	1
jew	4	0	96	0	0
other	6	4	0	90	0
none	59	13	0	3	24

Each row indicates the religion of a male respondent; each column is the religion of a possible spouse; the numbers are percents. For example, the first row indicates that 93% of male Protestants married other Protestants, and another 6% married Catholics.

Here is the spouse table for women:

Spouse Table (women)

	prot	cath	jew	other	none
prot	82	7	0	0	12
cath	10	85	0	0	5
jew	4	0	96	0	0
other	8	3	0	74	15
none	13	7	0	0	80

In general, women are more likely to marry out of their religion than men, but still the great majority marry a co-religionist. One asymmetry is apparent: men with no religion seldom marry another None (24%), but women with no religion usually do (80%). This effect is partly due to the gender gap: 11% of male respondents are Nones, but only 5% of the women are (there is a similar, possibly smaller, gender gap in the CIRP data).

Once the respondents have paired up, they decide what religion to raise the children in. The following table shows results from the 1988 data. The rows enumerate all pairs of mother's and father's religion; the columns indicate the religious environment they chose. For example, the second row indicates that if a Protestant woman marries a Catholic man, they raise the children Protestant 58% of the time, Catholic 36% of the time, and None 6%.

Environment table

parents	prot	cath	jew	other	none
prot-prot	99	1	0	0	1
prot-cath	58	36	0	0	6
prot-jew	100	0	0	0	0
prot-other	100	0	0	0	0
prot-none	89	4	0	1	7
cath-prot	39	61	0	0	0
cath-cath	1	99	0	0	0
cath-jew	0	0	0	0	0
cath-other	100	0	0	0	0
cath-none	17	69	0	6	8
jew-prot	0	0	100	0	0
jew-cath	0	0	0	0	0
jew-jew	0	0	96	0	4
jew-other	0	0	0	0	0
jew-none	0	0	0	0	0
other-prot	60	0	0	40	0
other-cath	0	0	0	100	0
other-jew	0	0	0	0	0
other-other	7	2	0	89	2
other-none	33	0	0	67	0
none-prot	100	0	0	0	0
none-cath	0	0	0	0	100
none-jew	0	0	0	0	0
none-other	0	0	0	0	0
none-none	40	4	0	0	56

One surprise in this table is the last row: when two people with no religion marry, 40% of the time they apparently choose to raise their children Protestant. This seems unlikely, but there are several possible explanations: (1) the parents might have chosen to raise their children in the prevalent religion of their community, (2) a respondent might not have been raised by his parents, (3) a respondent might not be reporting his parents' religion accurately. For purposes of modeling I take these responses at face value.

Children raised with a religion usually adopt that religion, but not always. The following "transition table" shows possible outcomes for each religious environment. For example, 89% of respondents who say they were raised Protestant also report that their religious preference is Protestant, but 3% are Catholic and 6% have no religious preference. More people convert from Catholic to Protestant than the other way around.

Transition table

	prot	cath	jew	other	none
prot	89	3	0	1	6
cath	11	83	0	0	6
jew	0	0	95	0	5
other	5	3	0	83	9
none	32	11	0	0	57

As expected, the majority of people raised with no religion report no religious preference, but 32% of them identify as Protestant and 11% identify as Catholic. I found that surprising. I will look more closely later, but for now, again, I will take it at face value.

Finally, we can combine these results into a single "Generation table" that shows the transitions from one generation to the next. I ran simulations with the following steps:

For each respondent, choose a spouse's religion from the Spouse Table.
For each parent pair, choose a religious environment from the Environment Table.
For each hypothetical child, choose a religious identity from the Transition Table.
For each parent-child pair, make an entry in the Generation Table, below.
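The steps above can be sketched in Python as follows. This is a minimal illustration, not the code behind the results in this article: it uses the rounded percentages from the tables above rather than resampled survey data, it assumes each respondent's sex is a coin flip, and it leaves out the environment rows that are all zeros. The function names, `n`, and `seed` are choices made for this example.

```python
import random
from collections import Counter

RELIGIONS = ["prot", "cath", "jew", "other", "none"]

# Percentages copied from the 1988 tables in this article.
SPOUSE_MEN = {
    "prot": [93, 6, 0, 0, 1],
    "cath": [14, 85, 0, 0, 1],
    "jew": [4, 0, 96, 0, 0],
    "other": [6, 4, 0, 90, 0],
    "none": [59, 13, 0, 3, 24],
}
SPOUSE_WOMEN = {
    "prot": [82, 7, 0, 0, 12],
    "cath": [10, 85, 0, 0, 5],
    "jew": [4, 0, 96, 0, 0],
    "other": [8, 3, 0, 74, 15],
    "none": [13, 7, 0, 0, 80],
}
# Keys are (mother, father); rows that are all zeros in the article are omitted.
ENVIRONMENT = {
    ("prot", "prot"): [99, 1, 0, 0, 1],
    ("prot", "cath"): [58, 36, 0, 0, 6],
    ("prot", "jew"): [100, 0, 0, 0, 0],
    ("prot", "other"): [100, 0, 0, 0, 0],
    ("prot", "none"): [89, 4, 0, 1, 7],
    ("cath", "prot"): [39, 61, 0, 0, 0],
    ("cath", "cath"): [1, 99, 0, 0, 0],
    ("cath", "other"): [100, 0, 0, 0, 0],
    ("cath", "none"): [17, 69, 0, 6, 8],
    ("jew", "prot"): [0, 0, 100, 0, 0],
    ("jew", "jew"): [0, 0, 96, 0, 4],
    ("other", "prot"): [60, 0, 0, 40, 0],
    ("other", "cath"): [0, 0, 0, 100, 0],
    ("other", "other"): [7, 2, 0, 89, 2],
    ("other", "none"): [33, 0, 0, 67, 0],
    ("none", "prot"): [100, 0, 0, 0, 0],
    ("none", "cath"): [0, 0, 0, 0, 100],
    ("none", "none"): [40, 4, 0, 0, 56],
}
TRANSITION = {
    "prot": [89, 3, 0, 1, 6],
    "cath": [11, 83, 0, 0, 6],
    "jew": [0, 0, 95, 0, 5],
    "other": [5, 3, 0, 83, 9],
    "none": [32, 11, 0, 0, 57],
}


def draw(weights, rng):
    """Pick one religion with probability proportional to its weight."""
    return rng.choices(RELIGIONS, weights=weights)[0]


def simulate_child(parent, rng):
    """Run the three steps for one respondent; return None if no table row exists."""
    # Assumption: the respondent's sex is a coin flip.
    is_man = rng.random() < 0.5
    spouse = draw((SPOUSE_MEN if is_man else SPOUSE_WOMEN)[parent], rng)
    mother, father = (spouse, parent) if is_man else (parent, spouse)
    weights = ENVIRONMENT.get((mother, father))
    if weights is None:
        return None  # no data for this pairing, so skip it
    environment = draw(weights, rng)
    return draw(TRANSITION[environment], rng)


def generation_table(n=2000, seed=0):
    """Return {parent: {child: percent}} from n simulated children per row."""
    rng = random.Random(seed)
    table = {}
    for parent in RELIGIONS:
        children = (simulate_child(parent, rng) for _ in range(n))
        counts = Counter(c for c in children if c is not None)
        total = sum(counts.values())
        table[parent] = {r: 100 * counts[r] / total for r in RELIGIONS}
    return table


for parent, row in generation_table().items():
    print(parent, [round(row[r]) for r in RELIGIONS])
```

Because of these simplifications, the output should not be expected to match the table below exactly.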

Since this computation is based on random simulations, it varies from run to run, but here is a typical outcome:

Generation table

	prot	cath	jew	other	none
prot	86	6	0	1	7
cath	19	72	1	1	7
jew	0	0	95	0	5
other	29	10	0	55	6
none	67	9	0	1	23

Assuming that a generation time is about 22 years, we can use this model to predict the distribution of religions in 2010 (using only data from 1988). This figure shows the actual time series and the model predictions for each group:
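One simple way to see what the generation table implies is to apply it once to a starting distribution. The sketch below is a simplification that treats the whole population as turning over in a single generation, which is not how the simulations were run. The starting shares for Protestants (60%) and Nones (8%) come from the figures quoted above; the shares for Catholic, Jewish, and Other are hypothetical placeholders that fill the remainder.

```python
RELIGIONS = ["prot", "cath", "jew", "other", "none"]

# Generation table from the article (rows: parent, columns: child, in percent).
GENERATION = {
    "prot": [86, 6, 0, 1, 7],
    "cath": [19, 72, 1, 1, 7],
    "jew": [0, 0, 95, 0, 5],
    "other": [29, 10, 0, 55, 6],
    "none": [67, 9, 0, 1, 23],
}

# Starting shares in percent. prot and none come from the article;
# cath, jew and other are HYPOTHETICAL placeholders that fill the remainder.
start = {"prot": 60, "cath": 26, "jew": 2, "other": 4, "none": 8}


def next_generation(shares):
    """Apply the generation table once: new[j] = sum_i shares[i] * G[i][j] / 100."""
    result = dict.fromkeys(RELIGIONS, 0.0)
    for parent, share in shares.items():
        for child, percent in zip(RELIGIONS, GENERATION[parent]):
            result[child] += share * percent / 100
    return result


predicted = next_generation(start)
for religion in RELIGIONS:
    print(religion, round(predicted[religion], 1))
```

With these placeholder inputs, the projected Protestant share does not fall and the None share barely moves, which is in line with the model's failure to anticipate the observed changes.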

On the right side of the plot, the vertical lines show the 90% confidence interval; the boxes show the mean of 20 simulation runs. [One technical note: each simulation is based on tables from resampled survey data, so the confidence intervals reflect both the sampling error of the survey and random variation of the simulations.]

The actual values for Catholics, Jews and Other fall within the prediction intervals, but the model fails to predict the decrease in Protestants or the increase in None.

So, what's missing from this model that could account for the observed changes?

The spouse tables are based on the parents of 1988 respondents. People from later generations are increasingly likely to marry outside their religion.
The environment table is also based on the previous generation; again, later parents might be making different decisions about the religious environment of their children.
The transition table is based on 1988 respondents; it's possible that after 1988, children were less likely to adopt the religion they were raised in. Anecdotally, the culprits most often blamed for this effect are college and the Internet.
Finally, I have not considered adult conversions from one religious identity to another. The GSS has data on these switches, so I could add them to the model.

Over the next few installments, I will investigate each of these factors to see which, if any, account for the observed changes.