path: root/packages/pipeline
author     Alex Browne <stephenalexbrowne@gmail.com>  2018-11-28 08:43:00 +0800
committer  Fred Carlsen <fred@sjelfull.no>            2018-12-06 19:04:25 +0800
commit     3ca876c5744d9030f4d954f73038ddb05d014d42 (patch)
tree       dc2c901978748b2e7bff9e7e16de9b574215ef8b /packages/pipeline
parent     6d2f4b91a99b2ae022564ceb2163ca80b7af2d90 (diff)
[pipeline] Add additional documentation to the README (#1328)
Diffstat (limited to 'packages/pipeline')
-rw-r--r--  packages/pipeline/README.md  113
1 file changed, 113 insertions(+), 0 deletions(-)
diff --git a/packages/pipeline/README.md b/packages/pipeline/README.md
index df54de21c..c647950a2 100644
--- a/packages/pipeline/README.md
+++ b/packages/pipeline/README.md
@@ -31,3 +31,116 @@ yarn clean
```bash
yarn lint
```
+
+### Migrations
+
+* Create a new migration: `yarn migrate:create --name MigrationNameInCamelCase`
+* Run migrations: `yarn migrate:run`
+* Revert the most recent migration (CAUTION: may result in data loss!): `yarn migrate:revert`
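+
+As an illustration, here is roughly what a generated migration might look like
+once filled in. This is a minimal sketch assuming TypeORM's
+`MigrationInterface`; the table and column names are hypothetical:
+
+```typescript
+import { MigrationInterface, QueryRunner, Table } from 'typeorm';
+
+// Hypothetical example migration; all names are illustrative only.
+export class CreateExampleEvents1543000000000 implements MigrationInterface {
+    // `up` applies the schema change.
+    public async up(queryRunner: QueryRunner): Promise<void> {
+        await queryRunner.createTable(
+            new Table({
+                name: 'raw.example_events',
+                columns: [
+                    { name: 'id', type: 'bigint', isPrimary: true },
+                    { name: 'block_number', type: 'bigint' },
+                ],
+            }),
+        );
+    }
+
+    // `down` reverts the change so that `yarn migrate:revert` can undo it.
+    public async down(queryRunner: QueryRunner): Promise<void> {
+        await queryRunner.dropTable('raw.example_events');
+    }
+}
+```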
+
+## Connecting to PostgreSQL
+
+Across the pipeline package, any code which accesses the database reads the
+`ZEROEX_DATA_PIPELINE_DB_URL` environment variable, which should be a properly
+formatted
+[PostgreSQL connection URL](https://stackoverflow.com/questions/3582552/postgresql-connection-url).
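+
+For example, a connection might be created like this (a minimal sketch assuming
+TypeORM's `createConnection`; error handling is omitted):
+
+```typescript
+import { Connection, createConnection } from 'typeorm';
+
+// Read the connection URL from the environment, as the pipeline code does.
+async function connect(): Promise<Connection> {
+    return createConnection({
+        type: 'postgres',
+        url: process.env.ZEROEX_DATA_PIPELINE_DB_URL,
+    });
+}
+```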
+
+## Test environment
+
+The easiest way to start Postgres is via Docker. Depending on your
+platform, you may need to prepend `sudo` to the following command:
+
+```bash
+docker run --rm -d -p 5432:5432 --name pipeline_postgres postgres:11-alpine
+```
+
+This will start a Postgres server with the default username and database name.
+You should set the environment variable as follows:
+
+```bash
+export ZEROEX_DATA_PIPELINE_DB_URL=postgresql://postgres@localhost/postgres
+```
+
+The first thing you will need to do is run the migrations:
+
+```bash
+yarn migrate:run
+```
+
+Now you can run scripts locally:
+
+```bash
+node packages/pipeline/lib/src/scripts/pull_radar_relay_orders.js
+```
+
+To stop the Postgres server (you may need to add `sudo`):
+
+```bash
+docker stop pipeline_postgres
+```
+
+Because the container was started with the `--rm` flag, stopping it also
+removes all data from the database.
+
+If you prefer, you can also install Postgres directly, e.g., via
+[Homebrew](https://wiki.postgresql.org/wiki/Homebrew) or
+[Postgres.app](https://postgresapp.com/). As long as you set the
+`ZEROEX_DATA_PIPELINE_DB_URL` environment variable appropriately, any Postgres
+server will work.
+
+## Directory structure
+
+```
+.
+├── lib: Code generated by the TypeScript compiler. Don't edit this directly.
+├── migrations: Code for creating and updating database schemas.
+├── node_modules: Installed dependencies (managed by Yarn; don't edit this directly).
+├── src: All TypeScript source code.
+│   ├── data_sources: Code responsible for getting raw data, typically from a third-party source.
+│   ├── entities: TypeORM entities which closely mirror our database schemas. Some other ORMs call these "models".
+│   ├── parsers: Code for converting raw data into entities.
+│   ├── scripts: Executable scripts which put all the pieces together.
+│   └── utils: Various utils used across packages/files.
+└── test: All tests go here and are organized in the same way as the folder/file that they test.
+```
+
+## Adding new data to the pipeline
+
+1. Create an entity in the _entities_ directory. Entities directly mirror our
+ database schemas. We follow the practice of having "dumb" entities, so
+ entity classes should typically not have any methods (see the sketch after
+ this list).
+2. Create a migration using the `yarn migrate:create` command. Create/update
+ tables as needed. Remember to fill in both the `up` and `down` methods. Try
+ to avoid data loss as much as possible in your migrations.
+3. Create a class or function in the _data_sources_ directory for getting raw
+ data. This code should abstract away pagination and rate-limiting as much as
+ possible.
+4. Create a class or function in the _parsers_ directory for converting the raw
+ data into an entity. Also add tests in the _test_ directory to test the
+ parser.
+5. Create an executable script in the _scripts_ directory for putting
+ everything together. Your script can accept environment variables for things
+ like API keys. It should pull the data, parse it, and save it to the
+ database. Scripts should be idempotent and, when possible, atomic. In
+ practice, this means your script may be responsible for determining **which**
+ data needs to be updated. For example, you may need to query the database to
+ find the most recent block number that we have already pulled, then pull new
+ data starting from that block number.
+6. Run the migrations and then run your new script locally and verify it works
+ as expected.
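+
+To make steps 1 and 4 concrete, here is a minimal sketch of an entity and a
+matching parser. All names here are hypothetical; real entities live in
+_src/entities_ and parsers in _src/parsers_:
+
+```typescript
+import { Column, Entity, PrimaryColumn } from 'typeorm';
+
+// Hypothetical "dumb" entity mirroring a raw.example_events table.
+@Entity({ name: 'example_events', schema: 'raw' })
+export class ExampleEvent {
+    @PrimaryColumn({ name: 'id' })
+    public id!: string;
+
+    @Column({ name: 'block_number' })
+    public blockNumber!: number;
+}
+
+// Hypothetical raw shape as returned by a third-party data source.
+interface RawExampleEvent {
+    id: string;
+    blockNumber: string;
+}
+
+// Parser: convert raw third-party data into an entity.
+export function parseExampleEvent(raw: RawExampleEvent): ExampleEvent {
+    const event = new ExampleEvent();
+    event.id = raw.id;
+    event.blockNumber = Number(raw.blockNumber);
+    return event;
+}
+```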
+
+#### Additional guidelines and tips
+
+* Table names should be plural and separated by underscores (e.g.,
+ `exchange_fill_events`).
+* Any table containing data that comes directly from a third-party source
+ should be namespaced in the `raw` PostgreSQL schema.
+* Column names in the database should be separated by underscores (e.g.,
+ `maker_asset_type`).
+* Field names in entity classes (like any other fields in TypeScript) should
+ be camel-cased (e.g., `makerAssetType`).
+* All timestamps should be stored as milliseconds since the Unix Epoch.
+* Use the `BigNumber` type in TypeScript code which deals with 256-bit numbers
+ from smart contracts, or in any case where we are dealing with large
+ floating-point numbers (see the column transformer sketch after this list).
+* [TypeORM documentation](http://typeorm.io/#/) is pretty robust and can be a
+ helpful resource.
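+
+As one example of combining the naming and `BigNumber` guidelines, a column can
+be declared with a value transformer. This is a sketch assuming the
+`bignumber.js` package; the entity and column names are illustrative:
+
+```typescript
+import { BigNumber } from 'bignumber.js';
+import { Column, Entity, PrimaryColumn, ValueTransformer } from 'typeorm';
+
+// Convert between BigNumber in TypeScript and a numeric string in Postgres.
+const bigNumberTransformer: ValueTransformer = {
+    to: (value: BigNumber): string => value.toString(),
+    from: (value: string): BigNumber => new BigNumber(value),
+};
+
+@Entity({ name: 'example_fill_events', schema: 'raw' })
+export class ExampleFillEvent {
+    @PrimaryColumn({ name: 'id' })
+    public id!: string;
+
+    // Snake-cased column name in the database, camel-cased field in TypeScript.
+    @Column({ name: 'maker_asset_amount', type: 'numeric', transformer: bigNumberTransformer })
+    public makerAssetAmount!: BigNumber;
+}
+```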