Contact : Arthur Charpentier charpentier.arthur@uqam.ca


We will use some some network packages (mainly to illustrate)

as well as some packages for spatial analysis, and visualization

For technical reasons, we need additional packages

In the applications, we will use gurobi. It is a commercial optimization solver for linear programming, quadratic programming (among many other), that is available for free in academia. Installation can be a little bit complicated (see the vignette online),

We will consider three applications in this guide

  • the toy example just to explain the four components in our approach : the nodes and the edges (a road network for instance), the location of sellers and buyers. The first part is to match locations of sellers and buyers with nodes. Then we solve the linear programming problem.
  • the metro example to illustrate the use of a real network (here the metro in Paris), with a single buyer and several sellers (hospitals here)
  • the maternity example to illustrate on read data, with all maternities in France (sellers) and all pregnant women (buyers).

1 The Theoretical Model

The geometry of the geographic market is described by a connected graph \(G=\left(V,E\right)\) where \(V\) is a set of vertices and \(E\) is a set of directed edges. Vertices are the different locations; edges are road segments linking two adjacent locations.

Agents are located at the vertices of our graph (if not, as we will see, we will match them with the closest vertice).We make the following assumptions on their consumption preferences:

  • (A1) Consumers with the same location have the same outside option,
  • (A2) Consumers have the same cost of travel along a given edge,
  • (A3) There is no capacity constraints on the supply side.

A path \(\pi_{vw}\) between two nodes \(v,w\in V\) is a sequence of connected edges starting in \(v\) and ending in \(w\). We note \(\Pi_{vw}\) the set of paths between \(v\) and \(w\).

For each edge \(e\in E\), we define \(c_{e}\) the cost of travel along this arc. A seller located in \(w\in V\) sells at an endogenous price \(p_{w}\). Therefore a consumer located in \(v\) chooses to buy from the seller located in \(w\) that minimizes his total cost \(p_{v}\) \[ p_{v}=\underset{w\in V\text{, }\pi_{vw}\in\Pi_{vw}}{\min}\left[p_{w}+\underset{e\in\pi_{vw}}{\sum}c_{e}\right] \]

Let’s denote\(u_{v}\) the reservation utility of agents located in \(v\) and \(p_{v}\) the price for these agents (including the cost of transportation). Therefore, if \(p_{v}>u_{v}\) a consumer in \(v\) his outside option, else he buys from the most competitive seller depending on his location in the market.

This outside option can be modeled by a fictitious node \(0\), where the price of the commodity is set to \(p_{0}=0\) and the cost of travel is \(c_{v0}=u_{v}\).

Hence the excess demand \(z_{v}\) at a node \(v\in V\) does not depend on the price. We are solving a matching problem over a network between sellers and consumers.

At equilibrium each consumer chooses the most competitive seller depending on his location; furthermore the demand received by each seller must equal his supply. If we define \(\mu_{e}\) the number of agents traveling along edge \(e\), these equilibrium conditions can be written: \[ \left\{ \begin{array}{l} \forall vw\in E\text{, }p_{w}-p_{v}\leq c_{vw}\\ \forall vw\in E\text{ such that }\mu_{vw}>0\text{, }p_{w}-p_{v}=c_{vw}\\ \forall v\in V\text{, }\underset{w\in V}{\sum}\mu_{vw}-\underset{u\in V}{\sum}\mu_{uv}=z_{v} \end{array}\right. \]

These equations are the complementary slackness conditions for a minimum cost flow problem over network \(G\). Hence, if the graph is connected and if there isn’t any loop of negative cost in \(G\), there exists an equilibrium which is the solution of the min-cost flow problem \[ \begin{array}{cl} \mu=\text{argmin}\underset{e\in E}{\sum}c_{e}\mu_{e}\\ & \mu\text{ s.t. }\forall v\in V\text{, }\underset{w\in V}{\sum}\mu_{vw}-\underset{u\in V}{\sum}\mu_{uv}=z_{v} \end{array} \]

Models used in trade theory usually use iceberg transport costs. These costs are linear in distance and paid in the transported good once arrived - hence the metaphor of the iceberg that melts as you transport it.

Our model can incorporate these costs applying a log-transformation to the equilibrium conditions. Let’s define for all \(vw\in E\) iceberg costs \(\tau_{vw}\). An equilibrium \(\left(\mu,p\right)\) should satisfy \[ \left\{ \begin{array}{l} \forall vw\in E\text{, }\left(1-\tau_{vw}\right)p_{w}\leq p_{v}\\ \forall vw\in E\text{ such that }\mu_{vw}>0\text{, }\left(1-\tau_{vw}\right)p_{w}=p_{v}\\ \forall v\in V\text{, }\underset{w\in V}{\sum}\mu_{vw}-\underset{u\in V}{\sum}\mu_{uv}=z_{v} \end{array}\right. \] which becomes \[ \left\{ \begin{array}{l} \forall vw\in E\text{, }\log p_{w}-\log p_{v}\leq-\log\left(1-\tau_{vw}\right)\\ \forall vw\in E\text{ such that }\mu_{vw}>0\text{, }\log p_{w}-\log p_{v}=-\log\left(1-\tau_{vw}\right)\\ \forall v\in V\text{, }\underset{w\in V}{\sum}\mu_{vw}-\underset{u\in V}{\sum}\mu_{uv}=z_{v} \end{array}\right. \]

Therefore this problem is equivalent to the one presented earlier with \(\forall vw\in E\), \(c_{vw}=-\log\left(1-\tau_{vw}\right)\), and replacing \(p\) by its logarithm.

2 A Toy Example

2.1 Networks and Shapefiles

Consider a shapefile of roads, or train lines : each road is a collection of segments, or connected nodes. Our shapefile is, so fact, only in a dataframe dd

Here are the (given) knots and the four roads

Let us create a shapefile from those segments

Show code for the extend_shp() function

Intersections of roads are now added, as knots of the network

We can also visualize all knots of the network

2.3 Minimal Distance (Euclidean)

Here, we want to connect the buyers (green squares) and the sellers (orange bullets). Let use visualize, as a starting point, euclidean distances

We use the closest distance to match demand and supply

FALSE     X   Y   Type
FALSE 4 6.5 5.5 Supply

or with a more classical bipartite graph with buyers on the left (demand), and sellers on the right (supply)

Show code for the bipartite graph

2.4 Minimal Distance on a Network

From now on, let us forget about that shortest Euclidean distance, and let us consider the distance on the network (using roads).

2.4.1 Agents Location and Knots

The first step is to match the agents with a knot of the network (we use here the Euclidean distance)

FALSE     X   Y   Type       Xk       Yk NO
FALSE 1 5.5 4.0 Demand 5.000000 4.000000  7
FALSE 2 3.5 9.0 Demand 3.000000 9.000000  1
FALSE 3 6.5 8.0 Supply 6.000000 8.000000 13
FALSE 4 6.5 5.5 Supply 6.263158 5.894737  6
FALSE 5 4.5 9.0 Supply 5.000000 9.000000 11

2.5 Using igraph

One can use library(igraph) (see http://igraph.org) to find the closest supply location

FALSE The graph is connex : TRUE

If the graph is not connex, we need to keep the largest strongly connected subgraph

We need to keep the largest strongly connected subgraph (this technical assumption will be important later on):

FALSE          [,1]     [,2]     [,3]
FALSE [1,] 542.9799 253.2018 772.7032
FALSE [2,] 425.6273 519.5013 246.6023

2.6 Using gurobi

One can also use library(gurobi) (see http://gurobi.com/) to solve a linear programing problem. Very generally, the gurobi interface enables to solve problems of the form \(\mathbf{x}^{\text{T}}Q\mathbf{x}+c^{\text{T}}\mathbf{x}+\alpha\) subject to bound constraints \(\mathbf{\ell}\leq\mathbf{x}\leq \text{u}\), some quadratic constraints \(\mathbf{x}^{\text{T}}Qc\mathbf{x}+q^{\text{T}}\mathbf{x}\leq\beta\), some integrality constraints, some ordered set constraints, etc.

Here,the syntax will be the following (for gurobi functions)

  • a sparse matrix \(\nabla\) - Nabla - of size the number of edges (raws) times the number of nodes (columns). Each raw contains zeros, except for the starting node (+1) and the arrival node (-1). Note that \(|\nabla^{\text{T}}|\) is the incidence matrix of the network,
  • a cost vector \(\Phi\) - Phi - of size the number of edges. Here, we use a linear cost function, so \(\Phi\) is the vector of length (in km) of edges,
  • an in/out flow vector \(n\) - n - of size the number of nodes
  • an objective called modelsense : here we want to maximize the flow.

result = gurobi ( list(A=t(Nabla), obj=Phi, modelsense='max', rhs=n, sense='=', start=matrix(0,nbArcs,1)), params=NULL)

The outputs of the gurobi call will be

  • the flows on each edge \(\pi\), pi = result$x
  • the prices for each node \(p\) (in the same unit as matrix \(\Phi\))
FALSE [1] 0
FALSE Optimize a model with 13 rows, 26 columns and 52 nonzeros
FALSE Coefficient statistics:
FALSE   Matrix range     [1e+00, 1e+00]
FALSE   Objective range  [6e+01, 6e+02]
FALSE   Bounds range     [0e+00, 0e+00]
FALSE   RHS range        [2e+00, 3e+00]
FALSE Presolve removed 10 rows and 20 columns
FALSE Presolve time: 0.01s
FALSE Presolved: 3 rows, 6 columns, 12 nonzeros
FALSE 
FALSE Iteration    Objective       Primal Inf.    Dual Inf.      Time
FALSE        0   -1.6954446e+03   4.000000e+00   0.000000e+00      0s
FALSE        2   -1.9682155e+03   0.000000e+00   0.000000e+00      0s
FALSE 
FALSE Solved in 2 iterations and 0.01 seconds
FALSE Optimal objective -1.968215514e+03
FALSE $status
FALSE [1] "OPTIMAL"
FALSE 
FALSE $runtime
FALSE [1] 0.01120687
FALSE 
FALSE $itercount
FALSE [1] 2
FALSE 
FALSE $baritercount
FALSE [1] 0
FALSE 
FALSE $nodecount
FALSE [1] 0
FALSE 
FALSE $objval
FALSE [1] -1968.216
FALSE 
FALSE $x
FALSE  [1] 3 1 0 1 0 0 3 0 0 0 2 0 2 0 0 0 0 1 0 0 0 0 0 0 0 0
FALSE 
FALSE $slack
FALSE  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0
FALSE 
FALSE $rc
FALSE  [1]    0.0000    0.0000 -383.6504    0.0000 -383.6521    0.0000    0.0000
FALSE  [8]    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 -246.6023
FALSE [15] -246.8586 -912.8559 -161.8896    0.0000 -750.9680 -506.4036 -295.1692
FALSE [22] -703.7146 -246.8586 -246.6023 -117.6091 -195.9041
FALSE 
FALSE $pi
FALSE  [1]  246.7304715  123.4293099    0.0000000 -264.6027722  -80.9448213
FALSE  [6]  110.8812242  364.0830417  -36.7033854 -388.5606944  246.8586198
FALSE [11]    0.1281483  -22.1402962 -178.8968596
FALSE 
FALSE $vbasis
FALSE  [1]  0  0 -1  0 -1  0  0  0  0  0  0  0  0 -1 -1 -1 -1  0 -1 -1 -1 -1 -1
FALSE [24] -1 -1 -1
FALSE 
FALSE $cbasis
FALSE  [1] -1 -1  0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
FALSE 
FALSE $objbound
FALSE [1] -1968.216

The legend for those prices is here

FALSE     X   Y   Type       Xk       Yk NO        price
FALSE 1 5.5 4.0 Demand 5.000000 4.000000  7  364.0830417
FALSE 2 3.5 9.0 Demand 3.000000 9.000000  1  246.7304715
FALSE 3 6.5 8.0 Supply 6.000000 8.000000 13 -178.8968596
FALSE 4 6.5 5.5 Supply 6.263158 5.894737  6  110.8812242
FALSE 5 4.5 9.0 Supply 5.000000 9.000000 11    0.1281483

3 Subway in Paris

For this application, we will use network data of Paris’ Metro with a dataset of edges/arcs and a dataset with locations of nodes,

FALSE [1] 868   3
FALSE [1] 366   6

The following code can be used to visualize our network, with background information obtained from OpenStreetMap, as described in caRtosm’s github page.

Show code to import Paris background information

The following function use use to create a background to visualize the metro network

Consider for instance locations of some hospitals in Paris (one could actually use the geocode function of library(ggmap), from the adresses of hospitals, but here, we have locations, whith longitudes - lon - and latitudes - lat)

and consider someone sick, located Place Denfert-Rochereau

We can plot them on the map

Let us move all those locations to the closest subsway station (the nodes of our network)

Show code

3.1 Using igraph

Using library(igraph), it is possible to compute distances, for instance to get to the first hospital in our list (from the location of the sick person)

or the third one

Distances are respectively

FALSE          [,1]
FALSE [1,] 2872.404
FALSE          [,1]
FALSE [1,] 6029.571

On a graph, we get

Here is the closest hopital (here the th one), for our sick agent

FALSE [1] 5