Campaign Donations and Voting Behavior in Congress
In the labyrinth of American politics, money talks. But how loud does it speak, especially when it comes to influencing the votes of congressmen?
This question was part of my capstone project, which took me on a deep dive into the murky waters of political contributions and their impact on legislative decisions between 2006 and 2012, a period that includes the Great Recession. I worked on this project with Brandon Escobar, Joseph Hudson, and Surya Venkatraman, fellow statisticians. Here comes the intersection of finance, politics, and data in today’s Venn Datagram!
Motivation and Scope
Our team was intrigued by the extent to which political contributions from various sectors sway congressional voting patterns. Specifically, we wanted to determine if these contributions influenced congressmen to vote against the majority of their political party on bills and joint resolutions. To unravel this, we relied on datasets from Adam Bonica, a political science researcher at Stanford University.
Data — The Building Blocks
In the last few years, Bonica’s research has focused on campaign finance data. For the data-minded, he created the DIME database, which stands for Database on Ideology, Money in Politics, and Elections. It contains data on campaign donations for federal elections, along with demographic information on congressmen.
Additionally, he created a database called DIME PLUS, which includes the voting records of congressmen. These are both great sources of information for federal campaigns and measuring results.
I liked this data, but one additional thing that I found fascinating was donations from companies. Bonica did too, and published a paper on that. This included a database related to Fortune 500 companies and their donation records, which made for an even better project.
The analysis relied on merging three key areas of data:
- Demographic Information: Details about congressional candidates from 1979 to 2022, filtered to include only those who won their elections between 2006 and 2012.
- Congressional Bills and Votes: Information on bills, joint resolutions, and voting records from 2006 to 2014.
- Political Contributions: Data on contributions from Fortune 500 companies to congressmen within the specified period.
Methodology
Our analytical journey began with selective filtering. We considered only the votes of Republicans and Democrats and focused on yes or no votes, excluding abstentions. To gauge whether a congressman voted against his party’s majority, we created a Vote.Against.Majority label by comparing each individual vote to the party’s majority stance on that bill. The majority stance was whichever vote, Yes or No, most members of that party cast. We excluded Independents because there was no consistent group to compare their votes against on every bill, so we could not determine whether a vote went against a majority.
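As a rough illustration of the labeling step, here is a minimal Python sketch. The records, party codes, and the `label_votes` helper are hypothetical and are not taken from our project code. Skipping bills where a party splits evenly is also an assumption, since the text above does not say how ties were handled.

```python
from collections import Counter, defaultdict

# Hypothetical vote records: (bill_id, member, party, vote)
votes = [
    ("HR1", "A", "D", "Yes"),
    ("HR1", "B", "D", "Yes"),
    ("HR1", "C", "D", "No"),
    ("HR1", "D", "R", "No"),
    ("HR1", "E", "R", "No"),
    ("HR1", "F", "I", "Yes"),      # Independent: filtered out
    ("HR1", "G", "R", "Abstain"),  # not a yes/no vote: filtered out
]


def label_votes(records):
    """Return (bill, member, party, vote, against_majority) tuples."""
    # Keep only Democrats/Republicans casting a Yes or No vote.
    kept = [r for r in records if r[2] in {"D", "R"} and r[3] in {"Yes", "No"}]

    # Tally Yes/No votes per (bill, party).
    tallies = defaultdict(Counter)
    for bill, _, party, vote in kept:
        tallies[(bill, party)][vote] += 1

    labeled = []
    for bill, member, party, vote in kept:
        counts = tallies[(bill, party)]
        if counts["Yes"] == counts["No"]:
            continue  # assumption: an even split has no majority stance
        majority = "Yes" if counts["Yes"] > counts["No"] else "No"
        labeled.append((bill, member, party, vote, int(vote != majority)))
    return labeled


for row in label_votes(votes):
    print(row)
```

In this toy example only member C is flagged, because C voted No while most Democrats voted Yes.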
Another essential aspect was the timing of donations. We used donations from the previous election cycle to predict voting behavior in the following congressional session. We defined an election cycle as the election year plus the year before it. For example, the 2006 election cycle contains donations from 2005 and 2006. This approach ensured that we captured the immediate impact of financial contributions and accounted for the difference in term lengths between senators (6 years) and House members (2 years).
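The cycle definition can be sketched in a few lines of Python. The function names are hypothetical. The `previous_cycle` rule also assumes that every vote in a two-year congressional session is paired with the cycle that ended just before that session began.

```python
def election_cycle(donation_year):
    """Map a donation year to its election cycle (the even election year)."""
    # Odd years roll forward: 2005 -> 2006. Even years stay put: 2006 -> 2006.
    return donation_year + donation_year % 2


def cycle_years(cycle):
    """Return the two donation years that make up an election cycle."""
    return (cycle - 1, cycle)


def previous_cycle(vote_year):
    """Assumed rule: the cycle that ended right before the vote's session began."""
    # Sessions start in odd years (e.g. 2011-2012 follows the 2010 election).
    session_start = vote_year if vote_year % 2 == 1 else vote_year - 1
    return session_start - 1


print(cycle_years(election_cycle(2005)))
print(previous_cycle(2011))
```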
To address outliers in total donations, we log-transformed the donation amounts. The transformation compresses the long right tail of very large totals, which was crucial for keeping our linear models consistent.
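To see what the transformation does, here is a small sketch with made-up donation totals. The natural log is an assumption, since the base is not stated above, and the sketch only handles strictly positive totals.

```python
import math

# Hypothetical total donations in dollars, including one extreme outlier.
donations = [500.0, 2_000.0, 10_000.0, 5_000_000.0]

# Natural log of each total (assumed base; requires totals > 0).
logged = [math.log(d) for d in donations]

raw_ratio = max(donations) / min(donations)  # spread on the dollar scale
log_spread = max(logged) - min(logged)       # spread on the log scale

print(raw_ratio, log_spread)
```

The largest total is thousands of times the smallest in dollars, but on the log scale the two sit only a few units apart.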
We then had to merge all of this into a single dataset that was workable for analysis. Here is how we structured our data:
From the bill and vote data, we took the bill, bill topic, and bill vote, and created the Vote.Against.Majority label. We joined each congressman’s vote to that congressman’s donation records from the previous election cycle, matching bill topics to company sectors. As a result, one record represents a single vote on a bill, and its donation information is the total donated to that congressman in the previous election cycle by the sector aligned with the bill’s topic.
If we were looking at a vote from congressman Y on healthcare bill X from 2011, then one row of data represented the (logged) total donations from the healthcare sector to congressman Y from 2009 to 2010, for a vote on healthcare bill X.
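The row structure described above can be sketched as follows. The topic-to-sector mapping, the field names, and the sample figures are all hypothetical. Dropping votes with no matching donations is an assumption made here only so that the log is defined.

```python
import math
from collections import defaultdict

# Hypothetical mapping from bill topics to company sectors.
TOPIC_TO_SECTOR = {"Health": "Healthcare", "Energy": "Energy"}

# Hypothetical donations: (member, year, sector, amount in dollars)
donations = [
    ("Y", 2009, "Healthcare", 5_000.0),
    ("Y", 2010, "Healthcare", 10_000.0),
    ("Y", 2008, "Healthcare", 99_999.0),  # belongs to the 2008 cycle
    ("Y", 2010, "Energy", 7_000.0),       # different sector
]

# Hypothetical votes: (bill, topic, vote_year, member, against_majority)
votes = [("X", "Health", 2011, "Y", 0)]


def build_rows(votes, donations):
    # Total donations per (member, election cycle, sector).
    totals = defaultdict(float)
    for member, year, sector, amount in donations:
        cycle = year + year % 2  # odd years roll into the next even year
        totals[(member, cycle, sector)] += amount

    rows = []
    for bill, topic, vote_year, member, against in votes:
        sector = TOPIC_TO_SECTOR.get(topic)
        # Assumed rule: use the cycle that ended before the session began.
        cycle = vote_year - 1 if vote_year % 2 == 1 else vote_year - 2
        total = totals.get((member, cycle, sector), 0.0)
        if sector is None or total <= 0:
            continue  # assumption: skip votes with no matching donations
        rows.append({
            "bill": bill,
            "member": member,
            "sector": sector,
            "cycle": cycle,
            "log_td": math.log(total),
            "vote_against_majority": against,
        })
    return rows


rows = build_rows(votes, donations)
print(rows)
```

For the 2011 vote on bill X, only the 2009 and 2010 healthcare donations to congressman Y end up in the row.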
We then connected this information to whether a congressman voted against the majority of his party.
In the analysis below, we looked at other variables, such as the chamber of Congress, and even broke company sectors down further into their industries, but our main focus was always on the effect of total donations alone.
NOTE: I use total donations, TD, logged total donations, and log(TD) interchangeably in this article, but anytime I write any of these, I mean logged total donations.
Unearthing Patterns
Healthcare and Transportation — The Standouts
Boxplots suggested that political contributions were associated with voting outcomes in some sectors.


In Figures 1 and 2, the x-axes represent the logged transformation of total donations from the previous election cycle. The y-axes represent whether the congressman voted against the majority of his party (1) or not (0). This stark contrast between median logged donations from the healthcare and transportation sectors made us curious, especially since it only appeared for those two sectors.
Lasso Regression
We began to wonder whether total donations were even an important variable for predicting voting behavior. We needed to see whether the variable survived variable selection. We turned to Lasso Regression, not only for that selection but also to interpret donations through their coefficient and through lambda plots. We fit a separate model for each sector.
Lambda Plots
At first, we were excited: the coefficient for the logged total donations variable (TD) was never shrunk to 0 in any lambda plot at the specified lambdas (Fig. 3)! This meant that the penalized model still found some effect from donations. After further evaluation of these plots, though, we found issues with TD.
In some plots, the TD coefficient was negative, in contrast to the other variables. We also noticed in these plots that being a Senator (seatTrue), rather than TD, was usually the biggest indicator of whether a candidate would vote against his party in each sector. In a future analysis, we would analyze bills and votes in the House of Representatives and the Senate separately.
Further Coefficient Interpretation
Regardless, we began to think about how to interpret the TD variable through these plots. Almost every other variable we used was categorical, meaning its contribution was a one-time increase in probability equal to the record’s value (1 or 0) multiplied by the coefficient. For example, the coefficient value was 0.567 for genderM (congressman gender is male) (Fig. 4).
If the congressman that we are looking at is male, then the probability that he votes against the majority of his party increases by 0.567, holding other variables constant. We just add a number once to the probability.
We began to ask, “What would a one-time increase from TD look like from a modeling perspective?” and “How much did we expect the probability to increase because of TD?” We decided to take the median logged total donation by sector and see what the value was after multiplying it by the TD coefficient.
We compared the absolute value of each of the other beta coefficients to the absolute value of the TD coefficient multiplied by the median logged donation. After this, we found that TD was one of the largest contributors to the probability. The only larger coefficients were the ones for Senator and, for the Transportation sector, donations from the trucking industry.
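The comparison can be sketched like this. Only the genderM coefficient (0.567) comes from our output. The seatTrue and TD coefficients and the median logged donation are made-up values for illustration, and `rank_contributions` is a hypothetical helper.

```python
def rank_contributions(coefs, median_log_td, td_name="TD"):
    """Rank variables by the size of their typical contribution.

    Indicator variables contribute |beta| once. TD contributes
    |beta * median logged donation| for a typical record.
    """
    contrib = {
        name: abs(beta * median_log_td) if name == td_name else abs(beta)
        for name, beta in coefs.items()
    }
    return sorted(contrib.items(), key=lambda kv: kv[1], reverse=True)


# genderM is from the text; the other numbers are hypothetical.
coefs = {"seatTrue": 1.2, "genderM": 0.567, "TD": 0.08}
ranking = rank_contributions(coefs, median_log_td=11.0)

for name, size in ranking:
    print(name, round(size, 3))
```

With these illustrative numbers, a small-looking TD coefficient still outranks genderM once it is scaled by a typical logged donation.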
Logistic Regression
After Lasso Regression showed that the total donation variable was retained for each sector after feature selection, we began to think about predictiveness. We turned to logistic regression and odds ratios. If you are unfamiliar with interpreting logistic regression models and odds ratios, here is a great article for you!
For these models, we wanted to interpret the effect of donations alone, so logged total donations is the only predictor in each model. Additionally, each model covers a single sector.
TD was marked as a significant coefficient for every logistic regression model at our threshold (p < 0.05). In the model above for the Healthcare sector (Fig. 6), the p-value was extremely low (<2e-16). The same was true of the total donation coefficient in each sector’s model except for Environment, where p = 0.01.
Interpreting Probabilities
Before thinking about odds ratios, we checked how realistic the donation amounts needed to reach certain probability thresholds were. We compared the max donation amount for each sector to the amount required to reach a probability of 0.5.
To calculate the predicted donation value at this point, we needed to solve the equation 𝛂 + 𝛃*log(TD) = 0, because a predicted probability of 0.5 corresponds to log odds of 0. If you are confused, below is a breakdown of this logic in a formulaic manner (Fig. 7).
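Written out step by step, the logic looks roughly like this. Here π is the predicted probability of voting against the party majority, TD is the total donation in dollars, and the natural log is assumed.

```latex
\begin{align*}
\pi &= \frac{1}{1 + e^{-(\alpha + \beta \log(TD))}} \\
\log\!\left(\frac{\pi}{1 - \pi}\right) &= \alpha + \beta \log(TD) \\
\pi = 0.5 \;\Rightarrow\; \log\!\left(\frac{0.5}{0.5}\right) = 0 &= \alpha + \beta \log(TD) \\
\log(TD) &= -\frac{\alpha}{\beta} \\
TD &= e^{-\alpha / \beta}
\end{align*}
```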

Below are the alpha values and the coefficients, which we substituted in (Fig. 8).
After inputting the alpha and beta values into the model, we compared that donation amount at π = 0.5 to the max donation amount by sector (Fig. 9).
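Here is a minimal sketch of that calculation. The healthcare beta (1.04499) is the coefficient reported later in this article. The alpha and the max donation are made-up placeholders, not our fitted values, and the natural log is assumed.

```python
import math


def predicted_probability(alpha, beta, donation):
    """Logistic model: probability of voting against the party majority."""
    log_odds = alpha + beta * math.log(donation)
    return 1.0 / (1.0 + math.exp(-log_odds))


def break_even_donation(alpha, beta):
    """Donation at which the predicted probability equals 0.5."""
    # Solve alpha + beta * log(TD) = 0 for TD.
    return math.exp(-alpha / beta)


beta = 1.04499               # healthcare coefficient from the article
alpha = -17.0                # hypothetical intercept, for illustration only
max_donation = 20_000_000.0  # hypothetical sector maximum

needed = break_even_donation(alpha, beta)
print(f"donation needed for p = 0.5: ${needed:,.0f}")
print("within observed range:", needed <= max_donation)
```

Plugging the break-even amount back into `predicted_probability` returns 0.5, which is a quick sanity check on the algebra.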
Looking at the values, almost all of the donation amounts to reach a probability of 0.5 for voting against the majority were more than the sector’s max donation amount. The only one below that amount was for the healthcare sector. Transportation was closer than other sectors, but still above the max donation amount by a good mountain o’ money.
After inspecting the 50% probability mark and the max donation amount, we were curious about what probabilities certain percentiles of donations would actually produce in our model. We looked at the 50th and 80th percentile donation amounts, along with their probabilities (Fig. 10). One other item that we added was the dollars required to cause a percent change between the 50th and 80th percentiles.
The results were fascinating.
After looking at the donations required for each probability, the models did not seem useful for predicting the actual dollar amount required to get a congressman to vote against his party (especially based on the table showing donations for probability = 0.5); rather, these models better indicated how much impact each donation has by sector and allowed for comparison.
For example, there was about an $11 million difference between the 50th and 80th percentile donations in the energy sector, but the probability only increases from around 12% to 13.2%, a measly 1.2 percentage points. For healthcare, the difference is about $9 million and the probability increases from 7.5% to 17.6%, which is over 10 percentage points. Similar differences in donation amounts ($11m and $9m) produce vastly different changes in probability (1.2 and 10 percentage points).
This information is important for identifying topics where donations may have more influence, such as healthcare and transportation, where donations of a similar size to those in other topics result in larger increases in the probability of voting against the party.
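To make the percentile comparison concrete, here is a sketch under stated assumptions. The donation list and both intercepts are made up, the "flat" slope of 0.1 is hypothetical, and only the "steep" slope (1.04499) is the healthcare coefficient from this article. `percentile_effect` is a hypothetical helper, and the natural log is assumed.

```python
import math
import statistics


def predicted_probability(alpha, beta, donation):
    """Logistic model: probability of voting against the party majority."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * math.log(donation))))


def percentile_effect(alpha, beta, donations):
    """Compare predicted probabilities at the 50th and 80th percentiles."""
    # Deciles: index 4 is the 50th percentile, index 7 is the 80th.
    cuts = statistics.quantiles(donations, n=10, method="inclusive")
    amt50, amt80 = cuts[4], cuts[7]
    p50 = predicted_probability(alpha, beta, amt50)
    p80 = predicted_probability(alpha, beta, amt80)
    return {
        "dollar_gap": amt80 - amt50,
        "p50": p50,
        "p80": p80,
        "point_change": 100.0 * (p80 - p50),  # percentage points
    }


# Hypothetical sector totals in dollars.
donations = [1e5, 5e5, 1e6, 2e6, 4e6, 6e6, 8e6, 1e7, 1.5e7, 2e7, 3e7]

steep = percentile_effect(alpha=-18.82, beta=1.04499, donations=donations)
flat = percentile_effect(alpha=-3.553, beta=0.1, donations=donations)

print(steep)
print(flat)
```

The same dollar gap moves the predicted probability far more under the steeper slope, which is the kind of sector-to-sector contrast described above.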
Interpreting Odds Ratios
Next, we considered interpretation by odds ratios.
Our 𝛃 values are useful for interpreting the log odds: a one-unit increase in the logged total donations increases the log odds of voting against the party by 𝛃. Still, log odds are a little less useful for us, so we convert them to odds by exponentiating, that is, by raising e to the power of each 𝛃.
For each coefficient 𝛃, we calculated e^𝛃. Then, we said that each one-unit increase in the logged total donations multiplied a congressman’s odds of voting against his own party by a factor of e^𝛃.
We took the 𝛃 coefficients and performed the manipulation (Fig. 11).
For example, a one-unit increase in the log of total donations from the healthcare sector multiplied the odds of a congressman voting against his party by e^1.04499, which equals 2.8434, an increase of 184.34%. This is almost triple the odds of voting against the majority of his party, which is insane.
Similarly, a one-unit increase in the log total donations for transportation more than doubled the odds. This was further indication that donations from the healthcare and transportation sectors had a larger effect on the voting behavior of a congressman.
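The conversion from coefficient to odds ratio is a one-liner. This sketch uses the healthcare coefficient quoted above, and `odds_ratio` and `percent_increase` are hypothetical helper names.

```python
import math


def odds_ratio(beta):
    """Multiplicative change in odds per one-unit increase in logged donations."""
    return math.exp(beta)


def percent_increase(beta):
    """Percent increase in odds implied by the odds ratio."""
    return (odds_ratio(beta) - 1.0) * 100.0


healthcare_beta = 1.04499
print(round(odds_ratio(healthcare_beta), 4))
print(round(percent_increase(healthcare_beta), 2))
```

If the models use the natural log (an assumption here), a one-unit increase in logged donations corresponds to multiplying the dollar amount by e, or roughly 2.7.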
Takeaway
Our study underscores the multifaceted nature of political influence. While statistical models revealed significant connections between contributions and voting behavior, they also highlighted the other factors that come into play. It’s not as straightforward as paying a congressman to secure a favorable vote.
Our project offers a glimpse into the significant yet intricate relationship between political contributions and congressional voting behavior. Future research could delve deeper into other variables, such as committee donations and houses in Congress, or independently-affiliated congressmen, to paint a fuller picture. In the end, our analysis reaffirms the timeless adage: in politics, money talks — but it doesn’t tell the whole story.
Link to GitHub Repository here.