Exploring Geospatial data in Elixir with Phoenix, D3, and PostGIS

28 Jan 2016

Captain Cook’s Journeys on the Endeavor

I wanted to explore using Elixir, the Phoenix Web Framework, the D3 javascript library, and PostGIS.

Source code is available at this GitHub repo.

I noticed that Kaggle has started hosting some publicly available datasets. One of which is Ocean Ship Logbooks (1750-1850). It is provided as CSV files and they describe the data as

In the mid-eighteenth to nineteenth centuries, navigating the open ocean was an
imprecise and often dangerous feat. In order to calculate their daily progress
and avoid running/sailing into the unknown, a ship's crew kept a detailed logbook
with data on winds, waves, and any remarkable weather.

An example they provide is a Python visualization of Captain Cook’s travels on the HMS Endeavour. I thought it would be interesting to recreate the example in a web browser (which you can see in action above).

I assume some familiarity with the technologies used and just wanted to provide a working example. If you need more information I highly recommend reading guides like this excellent one for Phoenix.

Docker

For simplicity in this example we’ll be using Docker and the mdillon/postgis image. If you already have PostGIS installed or do not want to use Docker feel free to skip this section.

# Create a persistent data volume
$ sudo docker create -v /var/lib/postgresql/data \
    --name postgres-ocean-ships-data busybox

# Create the postgis container
$ sudo docker run --name postgres-ocean-ships -p 5432:5432 \
    -e POSTGRES_PASSWORD=postgres -d --volumes-from postgres-ocean-ships-data \
    mdillon/postgis:9.4

You can double check everything is working by connecting to Postgres. When prompted, the password will be “postgres”, but you should consider using a much better one.

If you have psql installed locally you can do it like so:

```sh $ psql -h localhost -p 5432 -U postgres ~~~

Otherwise you can also use the one in the container and connect with these commands:

$ sudo docker exec -i -t postgres-ocean-ships bash
root@10499abd059e:/# psql -h localhost -p 5432 -U postgres

Now that the containers are set up, in the future you can start and stop the PostGIS container like this:

$ sudo docker start postgres-ocean-ships
$ sudo docker stop postgres-ocean-ships

Phoenix

Create the Phoenix app:

$ mix phoenix.new ocean_ship_logbooks
$ cd ocean_ship_logbooks/
$ mix ecto.create

Add these Elixir CSV and Geo libraries to your mix.exs dependencies.

{:csv, "~> 1.2.3"}
{:geo, "~> 1.0"}

Fetch them

$ mix deps.get

We are going to be using be using the CSV library to parse the data, and the Geo library to work with PostGIS (via Ecto), so add to your config files (dev.exs, test.exs, and prod.secret.exs):

  extensions: [{Geo.PostGIS.Extension, library: Geo}]

For example, the database section of tour config/dev.exs should now look something like:

# Configure your database
config :ocean_ship_logbooks, OceanShipLogbooks.Repo,
  adapter: Ecto.Adapters.Postgres,
  username: "postgres",
  password: "postgres",
  database: "ocean_ship_logbooks_dev",
  hostname: "localhost",
  pool_size: 10,
  extensions: [{Geo.PostGIS.Extension, library: Geo}]

Create database migrations

First we will want to enable the PostGIS extension in Postgres for our database. Lets generate a migration.

$ mix ecto.gen.migration enable_postgis
* creating priv/repo/migrations
* creating priv/repo/migrations/20160128183445_enable_postgis.exs

Edit 20160128183445_enable_postgis.exs so that the contents are this:

defmodule OceanShipLogbooks.Repo.Migrations.EnablePostgis do
  use Ecto.Migration

  def up do
    execute "CREATE EXTENSION IF NOT EXISTS postgis"
  end

  def down do
    execute "DROP EXTENSION IF EXISTS postgis"
  end

end

Now lets create ship data table.

Since the ship data will be accessed via a JSON API, you normally might generate a resource with the mix phoenix.gen.json task to create the model, migration, view, etc. code, but since we are using custom types from the Geo library we’ll just create them manually.

$ mix ecto.gen.migration create_ship_data
* creating priv/repo/migrations
* creating priv/repo/migrations/20160128223253_create_ship_data.exs

We’ll want to store the name of the ship, a timestamp, and a geometry object in the database, and we will be querying on the ship’s name. Alter the file so that it has the following contents:

defmodule OceanShipLogbooks.Repo.Migrations.CreateShipData do
  use Ecto.Migration

  def change do
    create table(:ship_data) do
      add :ship,     :string
      add :utc,      :integer
      add :geom,     :geometry
    end
    create index(:ship_data, [:ship])
  end

end

Run the migrations:

$ mix ecto.migrate

ShipData Model

Create a file at web/models/ship_data.ex with these contents:

defmodule OceanShipLogbooks.ShipData do
  use OceanShipLogbooks.Web, :model
  schema "ship_data" do
    field :ship, :string
    field :utc,  :integer
    field :geom, Geo.Point
  end
end

This is a pretty straightforward model. We are using the Geo library so we can use a geometry column type that is specific to PostGIS.

ShipData View

Create a file web/views/ship_data_view.ex with these contents:

defmodule OceanShipLogbooks.ShipDataView do
  use OceanShipLogbooks.Web, :view

  def render("show_ships.json", %{ships: ships}) do
   %{ship_data: render_many(ships, OceanShipLogbooks.ShipDataView, "ship_data.json")}
  end

  def render("ship_data.json", %{ship_data: ship}) do
    {lat, lon} = ship.geom.coordinates

    %{ship: ship.ship,
      utc: ship.utc,
      lat: lat,
      lon: lon}
  end

end

Router

Look in web/router.ex for the api block. Uncomment it and change to look like this:

# Other scopes may use custom stacks.
scope "/api", OceanShipLogbooks do
  pipe_through :api
  resources "/ship_data", ShipDataController, only: [:index]
end

Ship Data Controller

Create a file web/controllers/ship_data_controller.ex with these contents:

defmodule OceanShipLogbooks.ShipDataController do
  use OceanShipLogbooks.Web, :controller
  alias OceanShipLogbooks.ShipData

  def index(conn, _params) do
    # For simplicity in this example we hard code against
    # a specific ship name.
    query = from s in ShipData,
    where: s.ship == "Endeavour",
    order_by: [asc: s.utc],
    select: s

    ships = Repo.all(query)
    render(conn, "show_ships.json", ships: ships)
  end


  # These next two functions probably do not belong in a controller,
  # since they are for manually importing data, but we put them in here
  # for this example.
  def import_from_csv() do
    File.stream!("ship-data.csv")
    |> CSV.decode(headers: true)
    |> Stream.filter(fn(x) -> x["Lat3"] != "NA" and x["Lon3"] != "NA" end)
    |> Stream.map(&build_ship/1)
    |> Enum.each(&OceanShipLogbooks.Repo.insert!/1)
  end

  defp build_ship(row) do
    {lat, _} = Float.parse(row["Lat3"])
    {lon, _} = Float.parse(row["Lon3"])
    {utc, _} = Integer.parse(row["UTC"])
    geom = %Geo.Point{ coordinates: {lat, lon}, srid: 4326}
    %OceanShipLogbooks.ShipData{ship: row["ShipName"], utc: utc, geom: geom}
  end

end

Download and clean the data.

Ensure you have downloaded the data the ship data from Kaggle. Extract the file CLIWOC15.csv into the root directory of your project (this file should be about 200 megabytes in size).

Unfortunately the Elixir CSV parser we are using is not RFC 4180 compliant and the ship data includes fields with escaped newlines. I did not investigate other Elixir or Erlang CSV parsers, so while not ideal, until this is fixed I removed the newlines with the following python script called transform_csv.py:

import csv

with open("CLIWOC15.csv", 'rU') as csvIN:
    with open('ship-data.csv', 'wb') as csvOUT:
        writer = csv.writer(csvOUT, delimiter=',', quoting=csv.QUOTE_ALL)
        for line in csv.reader(csvIN, delimiter=','):
            line = [x.replace('\n', '') for x in line]
            writer.writerow(line)

Run the script to remove extraneous newlines:

$ python transform-csv.py

Load the data into PostGIS

Run the app inside IEx (Interactive Elixir):

$ iex -S mix phoenix.server
iex(1)> OceanShipLogbooks.ShipDataController.import_from_csv
# Exit iex

Now verify the data is in PostGIS. Your session should look something like this:

$ psql -h localhost -p 5432 -U postgres ocean_ship_logbooks_dev

ocean_ship_logbooks_dev=# select count(*) from ship_data;
 count  
--------
 252917
(1 row)


ocean_ship_logbooks_dev=# select * from ship_data where ship = 'Endeavour' limit 4;
  id   |   ship    |    utc     |                        geom                        
-------+-----------+------------+----------------------------------------------------
 41085 | Endeavour | 1768110915 | 0101000020E610000024287E8CB97B35C03333333333D342C0
 41084 | Endeavour | 1768093011 | 0101000020E61000006666666666E62F40423EE8D9ACAA35C0
 41083 | Endeavour | 1768090513 | 0101000020E6100000545227A08988454000000000000024C0
 41082 | Endeavour | 1768090413 | 0101000020E6100000AE47E17A14CE4540D7A3703D0A5724C0

You can also query the data via the phoenix app:

$ iex -S mix phoenix.server

iex(1)> import Ecto.Query, only: [from: 2]
iex(2)> query = from s in OceanShipLogbooks.ShipData,
...(2)> where: s.ship == "Endeavour",
...(2)> limit: 4,
...(2)> select: s
iex(3)> ships = OceanShipLogbooks.Repo.all(query)
[%OceanShipLogbooks.ShipData{__meta__: #Ecto.Schema.Metadata<:loaded>,
  geom: %Geo.Point{coordinates: {-21.4833, -37.65}, srid: 4326}, id: 41085,
  ship: "Endeavour", utc: 1768110915},
 %OceanShipLogbooks.ShipData{__meta__: #Ecto.Schema.Metadata<:loaded>,
  geom: %Geo.Point{coordinates: {15.95, -21.6667}, srid: 4326}, id: 41084,
  ship: "Endeavour", utc: 1768093011},
 %OceanShipLogbooks.ShipData{__meta__: #Ecto.Schema.Metadata<:loaded>,
  geom: %Geo.Point{coordinates: {43.0667, -10.0}, srid: 4326}, id: 41083,
  ship: "Endeavour", utc: 1768090513},
 %OceanShipLogbooks.ShipData{__meta__: #Ecto.Schema.Metadata<:loaded>,
  geom: %Geo.Point{coordinates: {43.61, -10.17}, srid: 4326}, id: 41082,
  ship: "Endeavour", utc: 1768090413}]

Feel free to delete the CSV files to clear up a few hundred megs of disk space.

Download Countries GeoJSON

Get the GeoGSON data from this GitHub repo and put the geojson file at priv/static/js/world.geojson

This will let us draw a map of the world in D3.

HTML and Javascript code

Edit web/templates/page/index.html.eex to look like this:

<div class="jumbotron">
  <h2>Captain Cook's travels</h2>
</div>

<div id="d3_map"></div> 

<script src="//d3js.org/d3.v3.min.js" charset="utf-8"></script>

<script>
  var shipData;
  var width = 500;
  var height = 500;
  var projection = d3.geo.mercator()
    .scale(80)
    .translate([width / 2, height / 2]);

  var svg = d3.select("#d3_map")
    .append("svg")
    .attr("width", width)
    .attr("height", height);
  
  d3.json("/js/world.geojson", createMap);

  function createMap(countries) {
    var geoPath = d3.geo.path().projection(projection);
    d3.select("svg").selectAll("path").data(countries.features)
      .enter()
      .append("path")
      .attr("d", geoPath)
      .attr("class", "countries");
  };

  function makeRequest(url) {
    httpRequest = new XMLHttpRequest();
    if (!httpRequest) {
      console.log('Cannot create an XMLHTTP instance');
      return false;
    }
    httpRequest.onreadystatechange = jsonContents;
    httpRequest.open('GET', url);
    httpRequest.send();
  }

  function jsonContents() {
    if (httpRequest.readyState === XMLHttpRequest.DONE) {
      if (httpRequest.status === 200) {
        var response = JSON.parse(httpRequest.responseText);
        shipData = response.ship_data;
        drawCircle(shipData.slice(0));
      } else {
        console.log('There was a problem with the request.');
      }
    }
  }
  
  function drawCircle(shipDataCopy) {
    // if no data, erase the circles from the map and start over
    if (shipDataCopy.length == 0) {
      svg.selectAll("circle").remove();
      setTimeout(function() { drawCircle(shipData.slice(0))}, 50);
      return;
    }

    shipDatum = shipDataCopy.shift();

    d3.select("svg")
      .append("circle")
      .style("fill", "red")
      .attr("r", 2)
      .attr("cx", function(d) {return projection([shipDatum.lon,shipDatum.lat])[0]})
      .attr("cy", function(d) {return projection([shipDatum.lon,shipDatum.lat])[1]});

    setTimeout(function() { drawCircle(shipDataCopy) }, 50);
 }

makeRequest("/api/ship_data");

</script>

Now start the server

$ mix phoenix.server

Open a web browser to http://localhost:4000 to see an animation like at the top of this post and a page that should look something like this:

Cook's Travel

comments powered by Disqus