Contact : Arthur Charpentier charpentier.arthur@uqam.ca


We will use some some network packages (mainly to illustrate)

as well as some packages for spatial analysis, and visualization

For technical reasons, we need additional packages

In the applications, we will use gurobi. It is a commercial optimization solver for linear programming, quadratic programming (among many other), that is available for free in academia. Installation can be a little bit complicated (see the vignette online),

We will consider three applications in this guide

  • the toy example just to explain the four components in our approach : the nodes and the edges (a road network for instance), the location of sellers and buyers. The first part is to match locations of sellers and buyers with nodes. Then we solve the linear programming problem.
  • the metro example to illustrate the use of a real network (here the metro in Paris), with a single buyer and several sellers (hospitals here)
  • the maternity example to illustrate on read data, with all maternities in France (sellers) and all pregnant women (buyers).

1 The Theoretical Model

The geometry of the geographic market is described by a connected graph \(G=\left(V,E\right)\) where \(V\) is a set of vertices and \(E\) is a set of directed edges. Vertices are the different locations; edges are road segments linking two adjacent locations.

Agents are located at the vertices of our graph (if not, as we will see, we will match them with the closest vertice).We make the following assumptions on their consumption preferences:

  • (A1) Consumers with the same location have the same outside option,
  • (A2) Consumers have the same cost of travel along a given edge,
  • (A3) There is no capacity constraints on the supply side.

A path \(\pi_{vw}\) between two nodes \(v,w\in V\) is a sequence of connected edges starting in \(v\) and ending in \(w\). We note \(\Pi_{vw}\) the set of paths between \(v\) and \(w\).

For each edge \(e\in E\), we define \(c_{e}\) the cost of travel along this arc. A seller located in \(w\in V\) sells at an endogenous price \(p_{w}\). Therefore a consumer located in \(v\) chooses to buy from the seller located in \(w\) that minimizes his total cost \(p_{v}\) \[ p_{v}=\underset{w\in V\text{, }\pi_{vw}\in\Pi_{vw}}{\min}\left[p_{w}+\underset{e\in\pi_{vw}}{\sum}c_{e}\right] \]

Let’s denote\(u_{v}\) the reservation utility of agents located in \(v\) and \(p_{v}\) the price for these agents (including the cost of transportation). Therefore, if \(p_{v}>u_{v}\) a consumer in \(v\) his outside option, else he buys from the most competitive seller depending on his location in the market.

This outside option can be modeled by a fictitious node \(0\), where the price of the commodity is set to \(p_{0}=0\) and the cost of travel is \(c_{v0}=u_{v}\).

Hence the excess demand \(z_{v}\) at a node \(v\in V\) does not depend on the price. We are solving a matching problem over a network between sellers and consumers.

At equilibrium each consumer chooses the most competitive seller depending on his location; furthermore the demand received by each seller must equal his supply. If we define \(\mu_{e}\) the number of agents traveling along edge \(e\), these equilibrium conditions can be written: \[ \left\{ \begin{array}{l} \forall vw\in E\text{, }p_{w}-p_{v}\leq c_{vw}\\ \forall vw\in E\text{ such that }\mu_{vw}>0\text{, }p_{w}-p_{v}=c_{vw}\\ \forall v\in V\text{, }\underset{w\in V}{\sum}\mu_{vw}-\underset{u\in V}{\sum}\mu_{uv}=z_{v} \end{array}\right. \]

These equations are the complementary slackness conditions for a minimum cost flow problem over network \(G\). Hence, if the graph is connected and if there isn’t any loop of negative cost in \(G\), there exists an equilibrium which is the solution of the min-cost flow problem \[ \begin{array}{cl} \mu=\text{argmin}\underset{e\in E}{\sum}c_{e}\mu_{e}\\ & \mu\text{ s.t. }\forall v\in V\text{, }\underset{w\in V}{\sum}\mu_{vw}-\underset{u\in V}{\sum}\mu_{uv}=z_{v} \end{array} \]

Models used in trade theory usually use iceberg transport costs. These costs are linear in distance and paid in the transported good once arrived - hence the metaphor of the iceberg that melts as you transport it.

Our model can incorporate these costs applying a log-transformation to the equilibrium conditions. Let’s define for all \(vw\in E\) iceberg costs \(\tau_{vw}\). An equilibrium \(\left(\mu,p\right)\) should satisfy \[ \left\{ \begin{array}{l} \forall vw\in E\text{, }\left(1-\tau_{vw}\right)p_{w}\leq p_{v}\\ \forall vw\in E\text{ such that }\mu_{vw}>0\text{, }\left(1-\tau_{vw}\right)p_{w}=p_{v}\\ \forall v\in V\text{, }\underset{w\in V}{\sum}\mu_{vw}-\underset{u\in V}{\sum}\mu_{uv}=z_{v} \end{array}\right. \] which becomes \[ \left\{ \begin{array}{l} \forall vw\in E\text{, }\log p_{w}-\log p_{v}\leq-\log\left(1-\tau_{vw}\right)\\ \forall vw\in E\text{ such that }\mu_{vw}>0\text{, }\log p_{w}-\log p_{v}=-\log\left(1-\tau_{vw}\right)\\ \forall v\in V\text{, }\underset{w\in V}{\sum}\mu_{vw}-\underset{u\in V}{\sum}\mu_{uv}=z_{v} \end{array}\right. \]

Therefore this problem is equivalent to the one presented earlier with \(\forall vw\in E\), \(c_{vw}=-\log\left(1-\tau_{vw}\right)\), and replacing \(p\) by its logarithm.

2 A Toy Example

2.1 Networks and Shapefiles

Consider a shapefile of roads, or train lines : each road is a collection of segments, or connected nodes. Our shapefile is, so fact, only in a dataframe dd

Here are the (given) knots and the four roads

Let us create a shapefile from those segments

Show code for the extend_shp() function

Intersections of roads are now added, as knots of the network

We can also visualize all knots of the network

2.3 Minimal Distance (Euclidean)

Here, we want to connect the buyers (green squares) and the sellers (orange bullets). Let use visualize, as a starting point, euclidean distances

We use the closest distance to match demand and supply

FALSE     X   Y   Type
FALSE 4 6.5 5.5 Supply

or with a more classical bipartite graph with buyers on the left (demand), and sellers on the right (supply)

Show code for the bipartite graph

2.4 Minimal Distance on a Network

From now on, let us forget about that shortest Euclidean distance, and let us consider the distance on the network (using roads).

2.4.1 Agents Location and Knots

The first step is to match the agents with a knot of the network (we use here the Euclidean distance)

FALSE     X   Y   Type       Xk       Yk NO
FALSE 1 5.5 4.0 Demand 5.000000 4.000000  7
FALSE 2 3.5 9.0 Demand 3.000000 9.000000  1
FALSE 3 6.5 8.0 Supply 6.000000 8.000000 13
FALSE 4 6.5 5.5 Supply 6.263158 5.894737  6
FALSE 5 4.5 9.0 Supply 5.000000 9.000000 11

2.5 Using igraph

One can use library(igraph) (see http://igraph.org) to find the closest supply location

FALSE The graph is connex : TRUE

If the graph is not connex, we need to keep the largest strongly connected subgraph

We need to keep the largest strongly connected subgraph (this technical assumption will be important later on):

FALSE          [,1]     [,2]     [,3]
FALSE [1,] 542.9799 253.2018 772.7032
FALSE [2,] 425.6273 519.5013 246.6023

2.6 Using gurobi

One can also use library(gurobi) (see http://gurobi.com/) to solve a linear programing problem. Very generally, the gurobi interface enables to solve problems of the form \(\mathbf{x}^{\text{T}}Q\mathbf{x}+c^{\text{T}}\mathbf{x}+\alpha\) subject to bound constraints \(\mathbf{\ell}\leq\mathbf{x}\leq \text{u}\), some quadratic constraints \(\mathbf{x}^{\text{T}}Qc\mathbf{x}+q^{\text{T}}\mathbf{x}\leq\beta\), some integrality constraints, some ordered set constraints, etc.

Here,the syntax will be the following (for gurobi functions)

  • a sparse matrix \(\nabla\) - Nabla - of size the number of edges (raws) times the number of nodes (columns). Each raw contains zeros, except for the starting node (+1) and the arrival node (-1). Note that \(|\nabla^{\text{T}}|\) is the incidence matrix of the network,
  • a cost vector \(\Phi\) - Phi - of size the number of edges. Here, we use a linear cost function, so \(\Phi\) is the vector of length (in km) of edges,
  • an in/out flow vector \(n\) - n - of size the number of nodes
  • an objective called modelsense : here we want to maximize the flow.

result = gurobi ( list(A=t(Nabla), obj=Phi, modelsense='max', rhs=n, sense='=', start=matrix(0,nbArcs,1)), params=NULL)

The outputs of the gurobi call will be

  • the flows on each edge \(\pi\), pi = result$x
  • the prices for each node \(p\) (in the same unit as matrix \(\Phi\))
FALSE [1] 0
FALSE Optimize a model with 13 rows, 26 columns and 52 nonzeros
FALSE Coefficient statistics:
FALSE   Matrix range     [1e+00, 1e+00]
FALSE   Objective range  [6e+01, 6e+02]
FALSE   Bounds range     [0e+00, 0e+00]
FALSE   RHS range        [2e+00, 3e+00]
FALSE Presolve removed 10 rows and 20 columns
FALSE Presolve time: 0.01s
FALSE Presolved: 3 rows, 6 columns, 12 nonzeros
FALSE 
FALSE Iteration    Objective       Primal Inf.    Dual Inf.      Time
FALSE        0   -1.6954446e+03   4.000000e+00   0.000000e+00      0s
FALSE        2   -1.9682155e+03   0.000000e+00   0.000000e+00      0s
FALSE 
FALSE Solved in 2 iterations and 0.01 seconds
FALSE Optimal objective -1.968215514e+03
FALSE $status
FALSE [1] "OPTIMAL"
FALSE 
FALSE $runtime
FALSE [1] 0.01120687
FALSE 
FALSE $itercount
FALSE [1] 2
FALSE 
FALSE $baritercount
FALSE [1] 0
FALSE 
FALSE $nodecount
FALSE [1] 0
FALSE 
FALSE $objval
FALSE [1] -1968.216
FALSE 
FALSE $x
FALSE  [1] 3 1 0 1 0 0 3 0 0 0 2 0 2 0 0 0 0 1 0 0 0 0 0 0 0 0
FALSE 
FALSE $slack
FALSE  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0
FALSE 
FALSE $rc
FALSE  [1]    0.0000    0.0000 -383.6504    0.0000 -383.6521    0.0000    0.0000
FALSE  [8]    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000 -246.6023
FALSE [15] -246.8586 -912.8559 -161.8896    0.0000 -750.9680 -506.4036 -295.1692
FALSE [22] -703.7146 -246.8586 -246.6023 -117.6091 -195.9041
FALSE 
FALSE $pi
FALSE  [1]  246.7304715  123.4293099    0.0000000 -264.6027722  -80.9448213
FALSE  [6]  110.8812242  364.0830417  -36.7033854 -388.5606944  246.8586198
FALSE [11]    0.1281483  -22.1402962 -178.8968596
FALSE 
FALSE $vbasis
FALSE  [1]  0  0 -1  0 -1  0  0  0  0  0  0  0  0 -1 -1 -1 -1  0 -1 -1 -1 -1 -1
FALSE [24] -1 -1 -1
FALSE 
FALSE $cbasis
FALSE  [1] -1 -1  0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
FALSE 
FALSE $objbound
FALSE [1] -1968.216

The legend for those prices is here

FALSE     X   Y   Type       Xk       Yk NO        price
FALSE 1 5.5 4.0 Demand 5.000000 4.000000  7  364.0830417
FALSE 2 3.5 9.0 Demand 3.000000 9.000000  1  246.7304715
FALSE 3 6.5 8.0 Supply 6.000000 8.000000 13 -178.8968596
FALSE 4 6.5 5.5 Supply 6.263158 5.894737  6  110.8812242
FALSE 5 4.5 9.0 Supply 5.000000 9.000000 11    0.1281483

3 Subway in Paris

For this application, we will use network data of Paris’ Metro with a dataset of edges/arcs and a dataset with locations of nodes,

FALSE [1] 868   3
FALSE [1] 366   6

The following code can be used to visualize our network, with background information obtained from OpenStreetMap, as described in caRtosm’s github page.

Show code to import Paris background information

The following function use use to create a background to visualize the metro network

Consider for instance locations of some hospitals in Paris (one could actually use the geocode function of library(ggmap), from the adresses of hospitals, but here, we have locations, whith longitudes - lon - and latitudes - lat)

and consider someone sick, located Place Denfert-Rochereau

We can plot them on the map

Let us move all those locations to the closest subsway station (the nodes of our network)

Show code

3.1 Using igraph

Using library(igraph), it is possible to compute distances, for instance to get to the first hospital in our list (from the location of the sick person)

or the third one

Distances are respectively

FALSE          [,1]
FALSE [1,] 2872.404
FALSE          [,1]
FALSE [1,] 6029.571

On a graph, we get

Here is the closest hopital (here the th one), for our sick agent

FALSE [1] 5

3.2 Using gurobi

As discussed in the previous example, it is necessary to reshape the data to fit with inputs of the gurobi function. Very generally, the gurobi interface enables to solve problems of the form \(\mathbf{x}^{\text{T}}Q\mathbf{x}+c^{\text{T}}\mathbf{x}+\alpha\) subject to bound constraints \(\mathbf{\ell}\leq\mathbf{x}\leq \text{u}\), some quadratic constraints \(\mathbf{x}^{\text{T}}Qc\mathbf{x}+q^{\text{T}}\mathbf{x}\leq\beta\), some integrality constraints, some ordered set constraints, etc. We have here

  • a sparse matrix \(\nabla\) - Nabla - of size the number of edges (raws) times the number of nodes (columns). Each raw contains zeros, except for the starting node (+1) and the arrival node (-1). Note that \(|\nabla^{\text{T}}|\) is the incidence matrix of the network,
  • a cost vector \(\Phi\) - Phi - of size the number of edges. Here, we use a linear cost function, so \(\Phi\) is the vector of length (in km) of edges,
  • an in/out flow vector \(n\) - n - of size the number of nodes
  • an objective called modelsense : here we want to maximize the flow.

result = gurobi ( list(A=t(Nabla), obj=Phi, modelsense='max', rhs=n, sense='=', start=matrix(0,nbArcs,1)), params=NULL)

The outputs of the gurobi call will be

  • the flows on each edge \(\pi\), pi = result$x
  • the prices for each node \(p\) (in the same unit as matrix \(\Phi\))
FALSE Optimize a model with 366 rows, 868 columns and 1736 nonzeros
FALSE Coefficient statistics:
FALSE   Matrix range     [1e+00, 1e+00]
FALSE   Objective range  [6e+00, 6e+03]
FALSE   Bounds range     [0e+00, 0e+00]
FALSE   RHS range        [1e+00, 1e+00]
FALSE Presolve removed 300 rows and 600 columns
FALSE Presolve time: 0.00s
FALSE Presolved: 66 rows, 268 columns, 536 nonzeros
FALSE 
FALSE Iteration    Objective       Primal Inf.    Dual Inf.      Time
FALSE        0   -0.0000000e+00   2.000000e+00   0.000000e+00      0s
FALSE       31   -6.0295714e+03   0.000000e+00   0.000000e+00      0s
FALSE 
FALSE Solved in 31 iterations and 0.00 seconds
FALSE Optimal objective -6.029571414e+03
FALSE [1] "OPTIMAL"
FALSE [1] 6029.571
FALSE [1] 231 276 763 852 854 856 859 862
FALSE [1]  78 359 360 361 362 316 363  93  37

3.3 Voronoi sets

To conclude, notice that it is also possible to create some sort of Voronoi diagrams. The classical one is based on euclidean distances,

Now, if we use subway-based distances

Show code

3.4 Min-Cost Flow

Assume there is one sick person in each metro station. Suppose also that hospitals admit the same number of sick people.

FALSE  [1] -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 44.75 -1.00 -1.00 -1.00
FALSE [12] -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00
FALSE [23] -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00
FALSE [34] -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00
FALSE [45] -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00
FALSE [56] -1.00 -1.00 44.75 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00
FALSE [67] -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00
FALSE [78] -1.00 -1.00 -1.00 -1.00 44.75 -1.00 -1.00 -1.00 -1.00 -1.00 -1.00
FALSE [89] -1.00 -1.00 -1.00
FALSE [1] 0

Then we use a min-cost flow algorithm

FALSE Optimize a model with 366 rows, 868 columns and 1736 nonzeros
FALSE Coefficient statistics:
FALSE   Matrix range     [1e+00, 1e+00]
FALSE   Objective range  [6e+00, 6e+03]
FALSE   Bounds range     [0e+00, 0e+00]
FALSE   RHS range        [1e+00, 4e+01]
FALSE Presolve removed 147 rows and 294 columns
FALSE Presolve time: 0.00s
FALSE Presolved: 219 rows, 574 columns, 1148 nonzeros
FALSE 
FALSE Iteration    Objective       Primal Inf.    Dual Inf.      Time
FALSE        0   -1.0169517e+06   7.030000e+02   0.000000e+00      0s
FALSE      317   -1.7186206e+06   0.000000e+00   0.000000e+00      0s
FALSE 
FALSE Solved in 317 iterations and 0.00 seconds
FALSE Optimal objective -1.718620640e+06

The black points below are the hospitals… colors here are related to the price, from (dark) red to (dark) blue

The legend for those prices is here

Show code

4 Natality

In this section, we plan to match pregnent women (demand) and maternities (supply), in France.

4.1 Preliminary work on the datasets

  • Shapefile of roads, where roads are the edges of the network, and intersections are the nodes, obtained from ign. Two files arcs.csv and nodes.csv were created.
  • Maternities details where obtained from data.gouv (to get the locations of all maternitives) and data.drees.sante.gouv.fr to get yearly statistics. From those files, three files were created: PERINAT_2016r.csv with birth counts, ID_2016r.csv with information about maternities, and etalab_stock_et_20181231.csv with spatial locations of hospitals,
  • Births in France, where obtained from data.gouv, and has been transfered to two files naissance_commune.csv and naissance_arrondissement.csv

For visualisation, we will use various shapefiles

  • Shapefiles of communes (cities) where obtained from actualitix.com but one can also use data.gouv
  • Shapefiles of departements where obtained from actualitix.com There is one shapefile per departement

4.1.2 Birth dataset

The birth file contains the number of births per city (ZIP code) in France. A first step can be to visualize those data,

Show code

We can get the number of birth per city (commune) even within Paris (per arrondissement)

Show code

For the region ÃŽle-de-France (containing the city of Paris, north-central and the most populous one in France, with 18.2% of the population). Colors are here just based on raw number of births, per commune (red is law while blue is high)

It is also possible to zoom in, on Paris (75), Hauts de Seine (92), Seine Saint-Denis (93) and https://en.wikipedia.org/wiki/Val-de-Marne (94), the three départements with a frontier with Paris

Show code

4.1.3 Births and Network

Consider the Hauts de Seine (92) departement, we can visualize the 36 communes (the polygons) and all the node (road intersections)

For each polygon, we have the number of birth, for a given year. It is then possible to allocate them to knots inside each polygon.

4.1.4 Maternity dataset

Information about maternities is downloaded from data.gouv

Show code

We have a collection of 578 maternities, that can be ploted below

FALSE 'data.frame': 578 obs. of  10 variables:
FALSE  $ FI       : Factor w/ 664 levels "010000024","010000032",..: 1 2 3 4 6 7 8 9 10 11 ...
FALSE  $ ACCUN    : int  2112 413 566 801 1233 882 1089 599 600 574 ...
FALSE  $ ACCMU    : int  40 5 3 5 18 12 12 7 8 7 ...
FALSE  $ dep      : Factor w/ 101 levels "01","02","03",..: 1 1 1 1 2 2 2 2 2 2 ...
FALSE  $ COMINSEE : Factor w/ 1794 levels "01004","01034",..: 3 2 11 1 32 28 33 23 22 32 ...
FALSE  $ coordx   : num  871098 908660 902284 882724 719368 ...
FALSE  $ coordy   : num  6569398 6521504 6578385 6542588 6973708 ...
FALSE  $ birth    : int  2152 418 569 806 1251 894 1101 606 608 581 ...
FALSE  $ NEAR_FID : int  151065 194305 206357 90300 39307 90743 42038 15097 158438 193937 ...
FALSE  $ ID_RTE500: int  151065 194305 206357 90300 39307 90743 42038 15097 158438 193937 ...

Per département, the total number of maternities is the following

FALSE dpt
FALSE  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 21 22 23 24 25 26 
FALSE  4  6  5  2  2 11  4  3  2  3  4  5 17  8  4  4  7  4  4  5  6  1  4  4  6 
FALSE 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 
FALSE  7  5  9  6  8  2 12  8  7  3  7 10  3  3  4  8  2  7  5  3  4  1  5  6  7 
FALSE 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 
FALSE  3  3  6  3  7  8  3 22  6  4 12  4  7  4  3  8  8 16  1  5  3  5  7 19 11 
FALSE 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 
FALSE 10 11  3  4  4  3  9  7  7  3  4  5  3  1 13 13 14  9 11

4.2 Matching Homes and Maternities

The first step is a transformation of the edge file (some roads are one way, while others are two way - we duplicate the two way road)

Then transformation of the node file (names of nodes)

A transformation of the birth (demand) file (again, the names of variables)

And a transformation of the maternity (supply) file

We do not have homes of pregnent mothers, only their city - code postal. Sometimes, nodes can be on two cities. For technical reasons, it is necessary to keep nodes only once.

FALSE [1] 980
FALSE [1] 320763      5
FALSE [1] 310742
FALSE [1] 611002      5

Let us check the connexity of the graph

FALSE The graph is connex : FALSE

Unfortunately, the graph is not connected. We will keep the largest strongly connected subgraph,

Show code

We have now 564 maternities.

Now, we need to balance the flow : the number of pregnent mothers should be equal to the number of deliveries in maternies, again for technical reasons. But the first step is to allocate births within a city among all the nodes. We will do that uniformely

For instance, in the Hauts de Seine region, we have

Observe that the number of births per node is not necessarily an integer, but it is not (technically at least) a major issue here.

Let us now balance to insure that the in-coming flow equals the out-coming flow

Number of births in the demand dataset (number of pregnent mothers) is 7.3893310^{5}, while the number of births according to the maternity dataset should be 806341. The difference is here significant (9.1%), but since the goal here is to illustrate the technique, let us no discuss further this issue here.

This will be our balanced in-coming flows

In order to allocate mothers to hospital, define the following shortest path function shortestPath(arcs, nodes, originNode, destinationNode), based on gurobi.

Show code for the shortestPath() function

For instance, consider two random nodes, 41 as the starting one, and 50115 as the destination one. Here is the shortest road,

FALSE Optimize a model with 209105 rows, 605318 columns and 1210636 nonzeros
FALSE Coefficient statistics:
FALSE   Matrix range     [1e+00, 1e+00]
FALSE   Objective range  [1e-02, 4e+01]
FALSE   Bounds range     [0e+00, 0e+00]
FALSE   RHS range        [1e+00, 1e+00]
FALSE 
FALSE Concurrent LP optimizer: dual simplex and barrier
FALSE Showing barrier log only...
FALSE 
FALSE Presolve removed 46809 rows and 93850 columns
FALSE Presolve time: 2.19s
FALSE Presolved: 162296 rows, 511468 columns, 1022930 nonzeros
FALSE 
FALSE Ordering time: 0.00s
FALSE 
FALSE Barrier statistics:
FALSE  AA' NZ     : 2.519e+05
FALSE  Factor NZ  : 2.634e+06 (roughly 300 MBytes of memory)
FALSE  Factor Ops : 9.147e+07 (less than 1 second per iteration)
FALSE  Threads    : 1
FALSE 
FALSE                   Objective                Residual
FALSE Iter       Primal          Dual         Primal    Dual     Compl     Time
FALSE    0  -6.07058023e+05  1.27199149e+00  4.37e+00 2.60e+00  1.51e+01     3s
FALSE    1  -4.55634553e+05 -2.60680691e+01  5.72e-01 4.99e+00  4.08e+00     3s
FALSE    2  -2.14918787e+05 -4.83183251e+01  2.04e-02 2.84e-01  5.55e-01     3s
FALSE    3  -3.53126137e+04 -1.04241845e+02  2.02e-03 5.19e-02  7.74e-02     4s
FALSE    4  -1.42523713e+04 -2.11013242e+02  7.07e-04 2.36e-02  2.94e-02     4s
FALSE    5  -5.25404704e+03 -3.04486913e+02  2.10e-04 1.42e-02  1.02e-02     4s
FALSE    6  -2.94902504e+03 -4.90216105e+02  9.32e-05 1.09e-02  4.98e-03     4s
FALSE    7  -2.00978381e+03 -6.19302115e+02  5.03e-05 5.12e-03  2.78e-03     5s
FALSE    8  -1.50040656e+03 -6.49211353e+02  2.84e-05 3.34e-03  1.70e-03     5s
FALSE    9  -1.31510556e+03 -6.66366203e+02  2.09e-05 2.23e-03  1.29e-03     5s
FALSE   10  -1.19957750e+03 -6.87874872e+02  1.64e-05 1.98e-03  1.02e-03     5s
FALSE   11  -1.09070090e+03 -6.92678533e+02  1.24e-05 3.36e-03  7.89e-04     5s
FALSE   12  -9.27352798e+02 -7.06689238e+02  6.60e-06 1.51e-03  4.36e-04     6s
FALSE   13  -7.48638047e+02 -7.09681943e+02  9.05e-07 8.38e-04  7.69e-05     6s
FALSE   14  -7.14444353e+02 -7.13293883e+02  8.96e-09 8.45e-06  2.25e-06     6s
FALSE   15  -7.13341458e+02 -7.13338658e+02  1.18e-11 1.10e-13  5.48e-09     6s
FALSE   16  -7.13339997e+02 -7.13339997e+02  2.78e-11 1.10e-13  1.25e-14     7s
FALSE 
FALSE Barrier solved model in 16 iterations and 6.54 seconds
FALSE Optimal objective -7.13339997e+02
FALSE 
FALSE Crossover log...
FALSE 
FALSE   161975 DPushes remaining with DInf 0.0000000e+00                 7s
FALSE        0 DPushes remaining with DInf 6.2022124e-12                 9s
FALSE 
FALSE        1 PPushes remaining with PInf 0.0000000e+00                 9s
FALSE        0 PPushes remaining with PInf 0.0000000e+00                 9s
FALSE 
FALSE   Push phase complete: Pinf 0.0000000e+00, Dinf 6.2022124e-12      9s
FALSE 
FALSE Iteration    Objective       Primal Inf.    Dual Inf.      Time
FALSE   161979   -7.1334000e+02   0.000000e+00   0.000000e+00      9s
FALSE 
FALSE Solved with barrier
FALSE Solved in 161979 iterations and 9.24 seconds
FALSE Optimal objective -7.133399975e+02

Here, maternities are distributed as follows

For the min-cost flow analysis, use the following code

FALSE [1] 2.388758e-10

The in-coming and out-coming flows are here equal so we can use gurobi()

FALSE Optimize a model with 209105 rows, 605318 columns and 1210636 nonzeros
FALSE Coefficient statistics:
FALSE   Matrix range     [1e+00, 1e+00]
FALSE   Objective range  [1e-02, 4e+01]
FALSE   Bounds range     [0e+00, 0e+00]
FALSE   RHS range        [5e-02, 1e+03]
FALSE 
FALSE Concurrent LP optimizer: dual simplex and barrier
FALSE Showing barrier log only...
FALSE 
FALSE Presolve removed 12131 rows and 23783 columns
FALSE Presolve time: 1.78s
FALSE Presolved: 196974 rows, 581535 columns, 1163064 nonzeros
FALSE 
FALSE Ordering time: 0.00s
FALSE 
FALSE Barrier statistics:
FALSE  AA' NZ     : 2.890e+05
FALSE  Factor NZ  : 3.003e+06 (roughly 340 MBytes of memory)
FALSE  Factor Ops : 9.776e+07 (less than 1 second per iteration)
FALSE  Threads    : 1
FALSE 
FALSE                   Objective                Residual
FALSE Iter       Primal          Dual         Primal    Dual     Compl     Time
FALSE    0  -1.47600095e+09 -1.18765362e+05  8.80e+03 1.77e+00  2.60e+04     3s
FALSE    1  -1.06983833e+09 -3.10921683e+06  1.52e+03 3.34e+00  6.88e+03     3s
FALSE    2  -4.72815511e+08 -5.82705536e+06  1.15e+02 1.49e-01  9.65e+02     4s
FALSE    3  -1.04226535e+08 -1.20454452e+07  1.31e+01 1.28e-02  1.66e+02     4s
FALSE    4  -6.10772196e+07 -1.69639690e+07  5.62e+00 5.50e-03  7.78e+01     4s
FALSE    5  -4.50557406e+07 -2.08870328e+07  2.98e+00 5.16e-03  4.22e+01     4s
FALSE    6  -3.85653932e+07 -2.22695869e+07  1.91e+00 2.92e-03  2.84e+01     5s
FALSE    7  -3.73820196e+07 -2.31777180e+07  1.72e+00 2.58e-03  2.47e+01     5s
FALSE    8  -3.25649020e+07 -2.38170487e+07  9.47e-01 1.92e-03  1.52e+01     5s
FALSE    9  -2.93672014e+07 -2.49830366e+07  4.48e-01 9.39e-04  7.60e+00     5s
FALSE   10  -2.77523198e+07 -2.54155134e+07  1.99e-01 6.13e-04  4.05e+00     5s
FALSE   11  -2.72155369e+07 -2.58793383e+07  1.18e-01 2.86e-04  2.31e+00     6s
FALSE   12  -2.70252887e+07 -2.60649004e+07  8.97e-02 1.65e-04  1.66e+00     6s
FALSE   13  -2.66488543e+07 -2.61851594e+07  3.55e-02 8.88e-05  8.03e-01     6s
FALSE   14  -2.65325627e+07 -2.62911117e+07  1.74e-02 4.14e-05  4.18e-01     6s
FALSE   15  -2.64802350e+07 -2.63627024e+07  9.40e-03 1.13e-05  2.03e-01     7s
FALSE   16  -2.64480568e+07 -2.63829883e+07  5.07e-03 4.82e-06  1.12e-01     7s
FALSE   17  -2.64299368e+07 -2.63951109e+07  2.65e-03 1.73e-06  6.02e-02     7s
FALSE   18  -2.64170805e+07 -2.64031819e+07  9.52e-04 6.86e-10  2.40e-02     7s
FALSE   19  -2.64112650e+07 -2.64074925e+07  2.23e-04 1.72e-10  6.52e-03     8s
FALSE   20  -2.64097741e+07 -2.64086210e+07  5.88e-05 5.27e-11  1.99e-03     8s
FALSE   21  -2.64093544e+07 -2.64089704e+07  1.62e-05 1.83e-11  6.64e-04     8s
FALSE   22  -2.64092458e+07 -2.64091285e+07  5.75e-06 3.69e-12  2.02e-04     8s
FALSE   23  -2.64091952e+07 -2.64091670e+07  1.15e-06 8.53e-13  4.88e-05     9s
FALSE   24  -2.64091831e+07 -2.64091807e+07  1.10e-06 2.87e-10  4.21e-06     9s
FALSE   25  -2.64091817e+07 -2.64091814e+07  9.73e-07 5.49e-10  4.07e-07     9s
FALSE   26  -2.64091816e+07 -2.64091816e+07  3.00e-07 1.20e-10  1.02e-07     9s
FALSE   27  -2.64091816e+07 -2.64091816e+07  1.25e-07 3.02e-11  3.26e-08     9s
FALSE   28  -2.64091816e+07 -2.64091816e+07  4.09e-08 7.53e-12  9.87e-09    10s
FALSE 
FALSE Barrier solved model in 28 iterations and 9.68 seconds
FALSE Optimal objective -2.64091816e+07
FALSE 
FALSE Crossover log...
FALSE 
FALSE     2406 DPushes remaining with DInf 0.0000000e+00                10s
FALSE        0 DPushes remaining with DInf 9.3240346e-12                11s
FALSE 
FALSE     1477 PPushes remaining with PInf 0.0000000e+00                11s
FALSE        0 PPushes remaining with PInf 0.0000000e+00                11s
FALSE 
FALSE   Push phase complete: Pinf 0.0000000e+00, Dinf 9.2009386e-12     11s
FALSE 
FALSE Iteration    Objective       Primal Inf.    Dual Inf.      Time
FALSE     3886   -2.6409182e+07   0.000000e+00   0.000000e+00     12s
FALSE 
FALSE Solved with barrier
FALSE Solved in 3886 iterations and 11.85 seconds
FALSE Optimal objective -2.640918159e+07

We can actually zoom in, e.g. on South-East part of France

or on Paris

4.3 Overall Perspective

One can also obtain easily chloropleth maps, but it is also possible to smooth, to visualize prices

Let us define a grid on France

Show code

where we keep only points within the French territory.

Let us use k-nearest neighbors, on the k closest nodes (here k=20)

Again, we can zoom in, for instance on the south-east of France, with maternities in black dots

or on Paris

5 Generic R Function(s)

5.1 Dataset Preparation

We need four datasets here as inputs

  • the nodes dataset nodes, with locations (latitude, longitude)
  • the edge dataset arcs, with (directed) connexions between nodes
  • the supply dataset supply, with supply locations (latitude, longitude)
  • the demand dataset demand, with demand locations (latitude, longitude)

The tidy(arcs,nodes,supply,demand) function that reshape datasets * to make sure that supply and demand location are located on one node * to make sure that the graph is connected * to balance supply and demand (or skew symmetry and conservation) : the total flow entering must equal the total flow leaving

It will return four clean datasets arcs, nodes, supply and demand.

Show code for the tidy() function

Show code for the pricing() function

5.2 Application to Paris Region

Consider the sub-graph of nodes and edges, located in that rectangle,

There are 333 ‘cities’ remaining.

Just to check, just look at

Show code Show code

If we focus only on maternity (there are 24 maternities in this region), we obtain the following map,

5.3 Application to South-East France

Consider the sub-graph of nodes and edges, located in that rectangle,

There are 8216 ‘cities’ remaining.

Just to check, just look at

Show code Show code

If we focus only on maternity (there are 163 maternities in this region - or 156 depending on the dataset we use…), we obtain the following map,

5.4 Border Issue

5.4.1 In/Out Difference

We did mention it previously, but it is necessary here that the sum of deliveries in (pregnant women, birth registered in France) equals the sum of deliveries out (birth in hospitals), \[ \sum_{\text{node } i:\text{ pregnant woman}}\overline{d}_i = \sum_{\text{node } j:\text{ maternity}}d_j \] Unfortunately, because we use two different sources, both not not equal. So it is necessay to use a correction \[ \tilde{d}_k=\frac{\displaystyle{\sum_{\text{node } j:\text{ maternity}}d_j}}{\displaystyle{\sum_{\text{node } i:\text{ pregnant woman}}\overline{d}_i}}\cdot\overline{d}_k \] Unfortunately, this approximation is uniform over all France. To overcome that issue, local pricing was considered.

5.4.2 Local Approaches

Using the generic function mentioned previously, it is possible to compute prices on any territory. The strategy was to dived France in small territories, and to compute prices on each of them (the area below). In order to avoid border problems when we reconcile, we did actually use slighly larger regions.