How to use custom input files

Before starting the analysis, the brokenspoke-analyzer will need to download some input files in order to proceed. In most of the cases you want to let it download the files automatically. However, if can happen that you would want to use your own input files, for example for testing an hypothesis.

Doing so is very simple, you simply have to provide the file(s), and copy them in the data directory for the city, following our naming conventions.

What input files are being used

Let say that you want to analyze the city of Provincetown, MA in the United States.

If you would let the brokenspoke-analyzer do the work automatically you would obtain the following file structure:

.
└── data
   └── provincetown-massachusetts-united-states
       ├── city_fips_speed.csv
       ├── ma_od_aux_JT00_2022.csv
       ├── ma_od_main_JT00_2022.csv
       ├── massachusetts-latest.osm.pbf
       ├── massachusetts-latest.osm.pbf.md5
       ├── population.cpg
       ├── population.dbf
       ├── population.prj
       ├── population.shp
       ├── population.shx
       ├── population.xml
       ├── population.zip
       ├── provincetown-massachusetts-united-states.clipped.osm
       ├── provincetown-massachusetts-united-states.cpg
       ├── provincetown-massachusetts-united-states.dbf
       ├── provincetown-massachusetts-united-states.geojson
       ├── provincetown-massachusetts-united-states.osm
       ├── provincetown-massachusetts-united-states.prj
       ├── provincetown-massachusetts-united-states.shp
       ├── provincetown-massachusetts-united-states.shx
       └── state_fips_speed.csv

Let’s see what these are in details.

Data directory

First, the data directory. This is the location where all the necessary input files are going to be written on disk.

By default it is named data in the folder where you cloned the repository. This can be overridden with the --data-dir flag in most of the commands if need be, but most of the time the default location will work just fine.

The name of the directory containing the data matches the following convention: <city>[-<region>]-<country>.

All these values match the parameters that you passed on the CLI. Note that the region parameter is optional for non-US cities, therefore you may end up with a directory named <city>-<country>, like valetta-malta for instance.

Boundary files

They represent the administrative boundaries of the city. For historical reasons, this file exists in 2 formats:

  • Geojson

  • Shapefile

However only the shapefile is used for the analysis.

The name of the file is the same name as the directory, with the .geojson extension, or all the extensions of a shapefile:

├── provincetown-massachusetts-united-states.cpg
├── provincetown-massachusetts-united-states.dbf
├── provincetown-massachusetts-united-states.geojson
├── provincetown-massachusetts-united-states.osm
├── provincetown-massachusetts-united-states.prj
├── provincetown-massachusetts-united-states.shp
├── provincetown-massachusetts-united-states.shx

OSM region file

This is the osm.pbf file (OSM PBF files are binary files that contain OpenStreetMap data in the Protocolbuffer Binary Format, which is more compact and faster to process than the XML format) representing the region where the city is located.

Note that if the region was omitted, this file will have the name of the country instead.

The checksum file, .md5, is required to verify the integrity of the data.

├── massachusetts-latest.osm.pbf
├── massachusetts-latest.osm.pbf.md5

Clipped city file

This is an extract of the region file, matching the boundaries of the city to analyze.

It has the same name as the directory, with the .clipped.osm extension.

├── provincetown-massachusetts-united-states.clipped.osm

Population file

For US cities it is simply the shapefile provided by the US Census Bureau for the state where the city is located.

For non-US cities, we generate synthetic population data to simulate the census. Refer to the “Preparation workflow” tutorial for more details.

The shapefile is simply named “population”.

├── population.cpg
├── population.dbf
├── population.prj
├── population.shp
├── population.shx
├── population.xml

Employment files (US only)

These files are provided by the US census and contain information about US jobs.

├── ma_od_aux_JT00_2022.csv
├── ma_od_main_JT00_2022.csv

Speed limits

There is a file containing the default speed limits per state (US only), and a file for the speed limit of the cities if it differs from the default one.

├── city_fips_speed.csv
└── state_fips_speed.csv

Note that while these files can be edited, we recommend you use the --city-speed-limit option on the CLI if you need to override the default value.