# Caching Depending on how you use the analyzer and how frequently, caching the files downloaded from the internet can be very beneficial and substantially reduce the time spent for preparing the analysis. Here is a flowchart depicting the process: ```{graphviz} digraph Flowchart { rankdir=LR; node [shape=rectangle, style=filled, fillcolor=lightgray]; A [label="File Request"]; B [label="Cached?", shape=diamond, fillcolor=white]; D [label="Download into cache"]; F [label="Copy to Store"]; G [label="Processing?", shape=diamond, fillcolor=white]; H [label="Process"]; K [label="Done"]; A -> B; B -> F [label="Yes"]; B -> D [label="No"]; D -> F; F -> G; G -> H [label="Yes"]; G -> K [label="No"]; H -> K; } ``` The following files will be cached: - US 2010 Census blocks - US 2019 LODES data (employment) - US Water blocks - US State speed limits - US City speed limits ```{important} The brokenspoke-analyzer does not perform any cache management operation, like invalidating the cache or cleaning up the files. This is the user's responsability to ensure the content of the cache is up to date. ``` ## Caching strategies The brokenspoke analyzer provides several caching strategies: - No cache - User cache directory - AWS S3 Bucket The cache is configured using environment variables only. ```{attention} Environment variables are case sensitive. If the value is incorrect, it falls back to the "no cache" strategy. ``` ### No cache (default) By default, the brokenspoke-analyzer does not cache any data. It simply stores them in the `output directory` specified by the user via the `--output-dir` option (by default it is `./data`). However if the files already exist in the `output directory`, then they won't be redownloaded. If you use the analyzer occasionally for only one or two cities, this strategy is most likely the best match. ### User cache directory For users running multiple or frequent analyses, this is the recomended caching strategy. Files will be downloaded and stored in the user cache directory for future uses, speeding up the data ingestion phase. Depending on the platform, the user cache directory will be one of the following: - OSX: `~/Library/Application Support/brokenspoke-analyzer` - Linux: `~/.local/share/brokenspoke-analyzer` - Windows: `C:\Documents and Settings\\Application Data\Local Settings\PeopleForBikes\brokenspoke-analyzer` To use it, set the following environment variable: ```bash export BNA_CACHING_STRATEGY=USER_CACHE ``` ### AWS S3 bucket When using the brokenspoke-analyzer in the AWS cloud there is the possibility to cache them in an S3 bucket. The bucket name (i.e. wthout the `s3://` scheme) and the AWS region of the account must be specified. To use it, set the following environment variables: ```bash export BNA_CACHING_STRATEGY=AWS_S3 export BNA_CACHE_AWS_S3_BUCKET=my-aws-cache-bucket export AWS_REGION=us-east-1 ```