From 3ca876c5744d9030f4d954f73038ddb05d014d42 Mon Sep 17 00:00:00 2001
From: Alex Browne
Date: Tue, 27 Nov 2018 16:43:00 -0800
Subject: [pipeline] Add additional documentation to the README (#1328)

---
 packages/pipeline/README.md | 113 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 113 insertions(+)

diff --git a/packages/pipeline/README.md b/packages/pipeline/README.md
index df54de21c..c647950a2 100644
--- a/packages/pipeline/README.md
+++ b/packages/pipeline/README.md
@@ -31,3 +31,116 @@ yarn clean
 ```bash
 yarn lint
 ```
+
+### Migrations
+
+* Create a new migration: `yarn migrate:create --name MigrationNameInCamelCase`
+* Run migrations: `yarn migrate:run`
+* Revert the most recent migration (CAUTION: may result in data loss!): `yarn migrate:revert`
+
+## Connecting to PostgreSQL
+
+Across the pipeline package, any code which accesses the database uses the
+environment variable `ZEROEX_DATA_PIPELINE_DB_URL`, which should be a properly
+formatted
+[PostgreSQL connection URL](https://stackoverflow.com/questions/3582552/postgresql-connection-url).
+
+## Test environment
+
+The easiest way to start Postgres is via Docker. Depending on your
+platform, you may need to prepend `sudo` to the following command:
+
+```
+docker run --rm -d -p 5432:5432 --name pipeline_postgres postgres:11-alpine
+```
+
+This will start a Postgres server with the default username and database name.
+You should set the environment variable as follows:
+
+```
+export ZEROEX_DATA_PIPELINE_DB_URL=postgresql://postgres@localhost/postgres
+```
+
+The first thing you need to do is run the migrations:
+
+```
+yarn migrate:run
+```
+
+Now you can run scripts locally:
+
+```
+node packages/pipeline/lib/src/scripts/pull_radar_relay_orders.js
+```
+
+To stop the Postgres server (you may need to add `sudo`):
+
+```
+docker stop pipeline_postgres
+```
+
+Because the container was started with `--rm`, stopping it removes all data from the database.
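As a quick sanity check on the `ZEROEX_DATA_PIPELINE_DB_URL` value described above, the following is a minimal sketch of how a connection URL breaks down into its parts. The `parseDbUrl` helper and the `DbConnectionInfo` interface are hypothetical names used only for illustration; they are not part of the pipeline package, and the sketch assumes a Node.js runtime where the `URL` class is available globally.

```typescript
// Hypothetical helper, not part of the pipeline package: splits a PostgreSQL
// connection URL into its parts using the built-in WHATWG URL parser.
interface DbConnectionInfo {
    user: string;
    host: string;
    port: number;
    database: string;
}

function parseDbUrl(rawUrl: string): DbConnectionInfo {
    const url = new URL(rawUrl);
    if (url.protocol !== 'postgresql:' && url.protocol !== 'postgres:') {
        throw new Error(`Unexpected protocol in database URL: ${url.protocol}`);
    }
    return {
        user: decodeURIComponent(url.username),
        host: url.hostname,
        // An empty port means the URL did not specify one, so fall back to
        // the default PostgreSQL port.
        port: url.port === '' ? 5432 : Number(url.port),
        // The pathname starts with a slash; the remainder is the database name.
        database: decodeURIComponent(url.pathname.slice(1)),
    };
}

// The URL from the test environment above. A real script would more likely
// read the value from the ZEROEX_DATA_PIPELINE_DB_URL environment variable.
const info = parseDbUrl('postgresql://postgres@localhost/postgres');
console.log(info);
```

If the parsed user, host, or database name is not what you expect, the environment variable is probably malformed.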
+
+If you prefer, you can also install Postgres with, e.g.,
+[Homebrew](https://wiki.postgresql.org/wiki/Homebrew) or
+[Postgres.app](https://postgresapp.com/). As long as you set the
+`ZEROEX_DATA_PIPELINE_DB_URL` environment variable appropriately, any Postgres
+server will work.
+
+## Directory structure
+
+```
+.
+├── lib: Code generated by the TypeScript compiler. Don't edit this directly.
+├── migrations: Code for creating and updating database schemas.
+├── node_modules:
+├── src: All TypeScript source code.
+│   ├── data_sources: Code responsible for getting raw data, typically from a third-party source.
+│   ├── entities: TypeORM entities which closely mirror our database schemas. Some other ORMs call these "models".
+│   ├── parsers: Code for converting raw data into entities.
+│   ├── scripts: Executable scripts which put all the pieces together.
+│   └── utils: Various utils used across packages/files.
+└── test: All tests go here and are organized in the same way as the folder/file that they test.
+```
+
+## Adding new data to the pipeline
+
+1. Create an entity in the _entities_ directory. Entities directly mirror our
+   database schemas. We follow the practice of having "dumb" entities, so
+   entity classes should typically not have any methods.
+2. Create a migration using the `yarn migrate:create` command. Create/update
+   tables as needed. Remember to fill in both the `up` and `down` methods. Try
+   to avoid data loss as much as possible in your migrations.
+3. Create a class or function in the _data_sources_ directory for getting raw
+   data. This code should abstract away pagination and rate-limiting as much as
+   possible.
+4. Create a class or function in the _parsers_ directory for converting the raw
+   data into an entity. Also add tests in the _test_ directory to test the
+   parser.
+5. Create an executable script in the _scripts_ directory for putting
+   everything together. Your script can accept environment variables for things
+   like API keys. It should pull the data, parse it, and save it to the
+   database. Scripts should be idempotent and atomic (when possible). This
+   means that your script may be responsible for determining **which** data
+   needs to be updated. For example, you may need to query the database to find
+   the most recent block number that we have already pulled, then pull new data
+   starting from that block number.
+6. Run the migrations and then run your new script locally and verify it works
+   as expected.
+
+#### Additional guidelines and tips:
+
+* Table names should be plural and separated by underscores (e.g.,
+  `exchange_fill_events`).
+* Any table which contains data which comes directly from a third-party source
+  should be namespaced in the `raw` PostgreSQL schema.
+* Column names in the database should be separated by underscores (e.g.,
+  `maker_asset_type`).
+* Field names in entity classes (like any other fields in TypeScript) should
+  be camel-cased (e.g., `makerAssetType`).
+* All timestamps should be stored as milliseconds since the Unix Epoch.
+* Use the `BigNumber` type for TypeScript code which deals with 256-bit
+  numbers from smart contracts or for any case where we are dealing with large
+  floating point numbers.
+* The [TypeORM documentation](http://typeorm.io/#/) is thorough and can be a
+  helpful resource.
--
cgit v1.2.3
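The idempotency requirement in step 5 above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the pipeline's actual implementation: `RawEvent`, `InMemoryEventTable`, `fetchEvents`, and `pullNewEvents` are hypothetical names, and an in-memory map stands in for a database table with a primary key on transaction hash and log index.

```typescript
// All names in this sketch are hypothetical and exist only for illustration.
interface RawEvent {
    blockNumber: number;
    transactionHash: string;
    logIndex: number;
}

// Stand-in for a database table keyed by (transactionHash, logIndex).
class InMemoryEventTable {
    private readonly _rows = new Map<string, RawEvent>();

    // Upsert: saving the same event twice still leaves a single row.
    public save(events: RawEvent[]): void {
        for (const event of events) {
            this._rows.set(`${event.transactionHash}:${event.logIndex}`, event);
        }
    }

    // Highest block number stored so far, or undefined if the table is empty.
    public latestBlockNumber(): number | undefined {
        let latest: number | undefined;
        this._rows.forEach(row => {
            if (latest === undefined || row.blockNumber > latest) {
                latest = row.blockNumber;
            }
        });
        return latest;
    }

    public count(): number {
        return this._rows.size;
    }
}

// Stand-in for a third-party data source: returns events at or after fromBlock.
function fetchEvents(source: RawEvent[], fromBlock: number): RawEvent[] {
    return source.filter(event => event.blockNumber >= fromBlock);
}

// Pulls only the data that may be missing. Returns the number of events fetched.
function pullNewEvents(table: InMemoryEventTable, source: RawEvent[], firstBlock: number): number {
    const latest = table.latestBlockNumber();
    // Start from the most recent stored block (inclusive), so a block that was
    // only partially saved by an earlier run is completed.
    const fromBlock = latest === undefined ? firstBlock : latest;
    const events = fetchEvents(source, fromBlock);
    table.save(events);
    return events.length;
}

const chain: RawEvent[] = [
    { blockNumber: 100, transactionHash: '0xaa', logIndex: 0 },
    { blockNumber: 101, transactionHash: '0xbb', logIndex: 0 },
];
const table = new InMemoryEventTable();
pullNewEvents(table, chain, 100);
pullNewEvents(table, chain, 100); // Running again does not duplicate rows.
console.log(table.count()); // prints 2
```

Because the second run starts from the latest stored block and saves with upsert semantics, re-running the script leaves the stored data unchanged. In a real script the save would ideally also happen inside a single database transaction to cover the "atomic" half of the guideline.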