This notebook shares code to plot migration flows in the San Francisco Bay Area. I show how to use Python's networkx library to draw a network of migration flows. You can find the Jupyter notebook on my GitHub page.
The most interesting result from this exercise is that most moves into Bay Area counties originate outside the Bay Area, rather than from other counties within it.
I use data from the American Community Survey.
This is an early part of research by Brian Higgins and Boaz Abramson. Stay tuned for the full paper.
import numpy as np
from numba import njit
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
%matplotlib inline
We import the data using pandas. The raw data is a spreadsheet that records, for each pair of counties, the proportion of moves into county A that come from county B, and vice versa. We include the nine counties of the Bay Area and an extra group, Outside, which captures all moves to and from places outside the Bay Area. The dataset was built from the American Community Survey, which provides county-to-county moves for every county in the US.
You can see the basic structure of the data printed below.
#import data
data = pd.read_excel("../../3.data/County_Migration/phi.xls")
data.head()
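To make the layout concrete, here is a minimal sketch of what such a table could look like. The column names are the ones used later in this notebook, but the three regions and all the numbers are made up for illustration. The check at the end assumes that `phi_B_to_A` is a share of inflows, so the values for each destination county should sum to one.

```python
import pandas as pd

# Hypothetical toy data: three regions instead of ten, with made-up proportions.
# Each row is one origin (B) -> destination (A) pair.
toy = pd.DataFrame({
    "fipsnameB": ["Alameda County", "Outside", "San Francisco County", "Outside"],
    "fipsnameA": ["San Francisco County", "San Francisco County",
                  "Alameda County", "Alameda County"],
    "phi_B_to_A": [0.25, 0.75, 0.40, 0.60],
})

# Assumption: phi_B_to_A is the share of moves into A that come from B,
# so the shares should add up to one within each destination county.
inflow_totals = toy.groupby("fipsnameA")["phi_B_to_A"].sum()
print(inflow_totals)
```

Running the same `groupby` on the real `data` frame is a quick way to confirm that the proportions are normalized by destination rather than by origin.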
We use Python's networkx library to convert our data to a graph. The nodes of the graph are the counties, and an edge links two counties if there is migration between them. More precisely, this is a weighted directed graph: each edge points from the origin county to the destination county, and its weight is the proportion of moves into the destination that come from the origin.
We can also convert this graph into the adjacency matrix, which we denote $\phi$.
#define a directed graph
DG = nx.from_pandas_edgelist(df=data, source='fipsnameB', target='fipsnameA', edge_attr=['phi_B_to_A'],create_using=nx.DiGraph)
phi = nx.to_numpy_array(DG, dtype=None, nodelist=('Contra Costa County', 'Alameda County', 'Marin County', 'Napa County', 'San Francisco County', 'San Mateo County', 'Santa Clara County', 'Solano County', 'Sonoma County','Outside'), weight='phi_B_to_A', nonedge=0.0)
# create list of names which will be useful later
names = ['Con', 'Ala', 'Mar', 'Napa', 'SF', 'SM', 'SC', 'Sol', 'Son','Out']
names_long = ['Contra', 'Alameda', 'Marin', 'Napa', 'SF', 'Mateo', 'Clara', 'Solano', 'Sonoma','Outside']
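It is easy to mix up rows and columns in the adjacency matrix, so here is a small sketch of the orientation. It uses the same networkx calls as above on a hypothetical three-region edge list with made-up weights. Because the source is county B and the target is county A, entry `[i, j]` holds the flow from region `i` into region `j`: rows are origins and columns are destinations.

```python
import networkx as nx
import numpy as np
import pandas as pd

# Hypothetical toy edge list (made-up numbers), same column names as the real data.
toy = pd.DataFrame({
    "fipsnameB": ["Alameda County", "Outside", "San Francisco County", "Outside"],
    "fipsnameA": ["San Francisco County", "San Francisco County",
                  "Alameda County", "Alameda County"],
    "phi_B_to_A": [0.25, 0.75, 0.40, 0.60],
})

G = nx.from_pandas_edgelist(toy, source="fipsnameB", target="fipsnameA",
                            edge_attr=["phi_B_to_A"], create_using=nx.DiGraph)

# Fix the node order explicitly so rows and columns are predictable.
order = ["Alameda County", "San Francisco County", "Outside"]
phi_toy = nx.to_numpy_array(G, nodelist=order, weight="phi_B_to_A", nonedge=0.0)

# Row i = origin, column j = destination.
print(phi_toy)

# Each destination column sums to one; the Outside column is zero
# only because this toy example has no edges pointing into Outside.
col_sums = phi_toy.sum(axis=0)
print(col_sums)
```

Passing `nodelist` matters here: without it, the row and column order follows the order in which nodes were added to the graph, which need not match the list of names.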
Once we've created a graph in networkx, we can easily plot it using matplotlib.pyplot. The width of each edge in the plot represents the proportion of moves flowing into the destination county from the origin. It's clear that most flows come from outside the Bay Area.
I also print the adjacency matrix so that you can see the underlying numbers.
plt.figure(figsize =(10, 7))
node_color = [DG.degree(v) for v in DG]
node_size = [5 for v in DG]
edge_width = [15 * DG[u][v]['phi_B_to_A'] for u, v in DG.edges()]
# edge_width holds one line width per edge, proportional to the edge weight
nx.draw_networkx(DG, node_size=node_size,
                 node_color=node_color, alpha=0.7,
                 with_labels=True, width=edge_width,
                 edge_color='.4', cmap=plt.cm.Blues)
plt.title('Network of moves into the SF Bay Area')
plt.axis('off')
plt.tight_layout();
plt.savefig('network_BayArea.png')
# put the phi matrix in a DataFrame so it prints with county names
df_phi = pd.DataFrame(phi, index=names_long, columns=names_long)
df_phi[['Contra', 'Alameda', 'Marin', 'Napa', 'SF', 'Mateo', 'Clara', 'Solano', 'Sonoma']].style.background_gradient(cmap='Blues')
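One way to read the table is to compare, for each destination county, the share of inflows that come from Outside with the share that comes from other Bay Area counties. The sketch below does this on a hypothetical three-region matrix with made-up numbers. It assumes the same orientation as `df_phi`: rows are origins and columns are destinations.

```python
import pandas as pd

# Hypothetical toy matrix (made-up numbers): rows = origin, columns = destination.
labels = ["Alameda", "SF", "Outside"]
toy_phi = pd.DataFrame(
    [[0.00, 0.25, 0.0],   # flows out of Alameda
     [0.40, 0.00, 0.0],   # flows out of SF
     [0.60, 0.75, 0.0]],  # flows out of Outside
    index=labels, columns=labels,
)

bay = ["Alameda", "SF"]

# Share of each county's inflows that originates outside the Bay Area.
from_outside = toy_phi.loc["Outside", bay]

# Share of each county's inflows that originates in other Bay Area counties.
from_bay = toy_phi.loc[bay, bay].sum(axis=0)

summary = pd.DataFrame({"from_outside": from_outside, "from_bay": from_bay})
print(summary)
```

Applying the same two lines to `df_phi`, with the nine county labels in place of `bay`, would give the comparison for the real data.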