From 4061731245a8513e8d990f3af87e182fb674838b Mon Sep 17 00:00:00 2001
From: Alex Browne <stephenalexbrowne@gmail.com>
Date: Tue, 27 Nov 2018 16:43:00 -0800
Subject: [pipeline] Add additional documentation to the README (#1328)

---
 packages/pipeline/README.md | 113 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 113 insertions(+)

diff --git a/packages/pipeline/README.md b/packages/pipeline/README.md
index df54de21c..c647950a2 100644
--- a/packages/pipeline/README.md
+++ b/packages/pipeline/README.md
@@ -31,3 +31,116 @@ yarn clean
 ```bash
 yarn lint
 ```
+
+### Migrations
+
+*   Create a new migration: `yarn migrate:create --name MigrationNameInCamelCase`
+*   Run migrations: `yarn migrate:run`
+*   Revert the most recent migration (CAUTION: may result in data loss!):
+    `yarn migrate:revert`
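+
+As a rough illustration of the `up`/`down` pair every migration needs, here is
+a minimal, self-contained sketch. In this package a migration class would
+implement TypeORM's `MigrationInterface` and receive a TypeORM `QueryRunner`.
+To keep the example runnable without a database, a hypothetical `SqlRunner`
+interface stands in for it. The class name and the `raw.example_events` table
+are made up for illustration.
+
+```typescript
+// Hypothetical stand-in for TypeORM's QueryRunner; only `query` is modeled.
+interface SqlRunner {
+    query(sql: string): Promise<unknown>;
+}
+
+// Hypothetical migration. A real one would implement TypeORM's MigrationInterface.
+// Assumes the `raw` schema already exists.
+class CreateExampleEvents1543360000000 {
+    // `up` applies the schema change.
+    public async up(runner: SqlRunner): Promise<void> {
+        await runner.query(`CREATE TABLE raw.example_events (
+            block_number BIGINT NOT NULL,
+            log_index INTEGER NOT NULL,
+            maker_asset_type VARCHAR NOT NULL,
+            PRIMARY KEY (block_number, log_index)
+        )`);
+    }
+
+    // `down` undoes `up`. Dropping the table discards its rows, which is why
+    // reverting a migration can cause data loss.
+    public async down(runner: SqlRunner): Promise<void> {
+        await runner.query('DROP TABLE raw.example_events');
+    }
+}
+
+// Fake runner that records statements instead of executing them.
+class RecordingRunner implements SqlRunner {
+    public readonly statements: string[] = [];
+
+    public async query(sql: string): Promise<unknown> {
+        this.statements.push(sql);
+        return undefined;
+    }
+}
+```
+
+Running `up` and then `down` against a `RecordingRunner` leaves a
+`CREATE TABLE` followed by the matching `DROP TABLE` in `statements`, which is
+a quick way to check that the two methods mirror each other.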
+
+## Connecting to PostgreSQL
+
+All code in the pipeline package that accesses the database reads the
+connection string from the `ZEROEX_DATA_PIPELINE_DB_URL` environment variable,
+which should be a properly formatted
+[PostgreSQL connection URL](https://stackoverflow.com/questions/3582552/postgresql-connection-url).
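+
+As a minimal sketch, a script might read and sanity-check this variable before
+connecting. The helper name `getDbUrl` is hypothetical. The check only verifies
+the `postgresql://` or `postgres://` scheme that PostgreSQL accepts for
+connection URIs; it is not a full URL parser.
+
+```typescript
+// Environment variable read by all database code in this package.
+const DB_URL_ENV_VAR = 'ZEROEX_DATA_PIPELINE_DB_URL';
+
+// Hypothetical helper: returns the connection URL or throws a descriptive error.
+// The environment is passed in (e.g. `process.env`) so the function is easy to test.
+function getDbUrl(env: { [key: string]: string | undefined }): string {
+    const url = env[DB_URL_ENV_VAR];
+    if (url === undefined || url === '') {
+        throw new Error(`${DB_URL_ENV_VAR} is not set`);
+    }
+    // PostgreSQL accepts both "postgresql://" and "postgres://" as URI schemes.
+    if (!/^postgres(ql)?:\/\//.test(url)) {
+        throw new Error(`${DB_URL_ENV_VAR} must start with postgresql:// or postgres://`);
+    }
+    return url;
+}
+```
+
+In a script this would be called as `getDbUrl(process.env)`. It fails fast with
+a clear message instead of surfacing a connection error deep inside the ORM.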
+
+## Test environment
+
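+The easiest way to start Postgres is via Docker. Depending on your
+platform, you may need to prepend `sudo` to the following command:
+
+```
+docker run --rm -d -p 5432:5432 -e POSTGRES_HOST_AUTH_METHOD=trust --name pipeline_postgres postgres:11-alpine
+```
+
+This will start a Postgres server whose default username and database name are
+both `postgres`. Recent builds of the official image refuse to start without a
+superuser password. `POSTGRES_HOST_AUTH_METHOD=trust` disables password
+authentication instead, which is acceptable only for local development. You
+should set the environment variable as follows: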
+
+```
+export ZEROEX_DATA_PIPELINE_DB_URL=postgresql://postgres@localhost/postgres
+```
+
+The first thing you need to do is run the migrations:
+
+```
+yarn migrate:run
+```
+
+Now you can run scripts locally:
+
+```
+node packages/pipeline/lib/src/scripts/pull_radar_relay_orders.js
+```
+
+To stop the Postgres server (you may need to add `sudo`):
+
+```
+docker stop pipeline_postgres
+```
+
+Because the container was started with `--rm`, stopping it also removes the
+container and all data in the database.
+
+If you prefer, you can also install Postgres locally, for example with
+[Homebrew](https://wiki.postgresql.org/wiki/Homebrew) or
+[Postgres.app](https://postgresapp.com/). Any Postgres server will work as long
+as you set the `ZEROEX_DATA_PIPELINE_DB_URL` environment variable appropriately.
+
+## Directory structure
+
+```
+.
+├── lib: Code generated by the TypeScript compiler. Don't edit this directly.
+├── migrations: Code for creating and updating database schemas.
+├── node_modules: Installed dependencies.
+├── src: All TypeScript source code.
+│   ├── data_sources: Code responsible for getting raw data, typically from a third-party source.
+│   ├── entities: TypeORM entities which closely mirror our database schemas. Some other ORMs call these "models".
+│   ├── parsers: Code for converting raw data into entities.
+│   ├── scripts: Executable scripts which put all the pieces together.
+│   └── utils: Various utils used across packages/files.
+└── test: All tests go here and are organized in the same way as the folder/file that they test.
+```
+
+## Adding new data to the pipeline
+
+1.  Create an entity in the _entities_ directory. Entities directly mirror our
+    database schemas. We follow the practice of having "dumb" entities, so
+    entity classes should typically not have any methods.
+2.  Create a migration using the `yarn migrate:create` command. Create/update
+    tables as needed. Remember to fill in both the `up` and `down` methods. Try
+    to avoid data loss as much as possible in your migrations.
+3.  Create a class or function in the _data_sources_ directory for getting raw
+    data. This code should abstract away pagination and rate-limiting as much as
+    possible.
+4.  Create a class or function in the _parsers_ directory for converting the raw
+    data into an entity. Also add tests for the parser in the _test_ directory.
+5.  Create an executable script in the _scripts_ directory that puts
+    everything together. Your script can accept environment variables for things
+    like API keys. It should pull the data, parse it, and save it to the
+    database. Scripts should be idempotent and, when possible, atomic. In
+    practice, this means your script may be responsible for determining
+    **which** data needs to be updated. For example, you may need to query the
+    database for the most recent block number that has already been pulled,
+    then pull new data starting from that block number.
+6.  Run the migrations and then run your new script locally and verify it works
+    as expected.
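+
+The steps above can be sketched end to end in plain TypeScript. Everything
+here is hypothetical and simplified. `RawFill`, `FillEvent`, `FetchPage`,
+`getAllFillsSince`, `parseRawFill` and `pullNewFills` are illustrative names.
+The third-party API is a function you pass in. An in-memory array stands in
+for the database table. A real entity would carry TypeORM decorators, and a
+real script would save through TypeORM and rely on the table's primary key,
+rather than an in-memory set, to skip rows it already has.
+
+```typescript
+// Hypothetical raw record, shaped like a third-party API response.
+interface RawFill {
+    block_number: number;
+    log_index: number;
+    maker_asset_type: string;
+}
+
+// Step 1: a "dumb" entity mirroring the table: fields only, no methods.
+class FillEvent {
+    public blockNumber!: number;
+    public logIndex!: number;
+    public makerAssetType!: string;
+}
+
+// Hypothetical API call returning one page of results starting at `fromBlock`.
+type FetchPage = (fromBlock: number, page: number) => Promise<RawFill[]>;
+
+// Step 3: a data source that hides pagination from its callers.
+async function getAllFillsSince(fetchPage: FetchPage, fromBlock: number): Promise<RawFill[]> {
+    const all: RawFill[] = [];
+    for (let page = 0; ; page++) {
+        const batch = await fetchPage(fromBlock, page);
+        if (batch.length === 0) {
+            return all; // an empty page means there is nothing left
+        }
+        all.push(...batch);
+        // A real data source would also throttle here to respect rate limits.
+    }
+}
+
+// Step 4: a parser converting one raw record into an entity.
+function parseRawFill(raw: RawFill): FillEvent {
+    const event = new FillEvent();
+    event.blockNumber = raw.block_number;
+    event.logIndex = raw.log_index;
+    event.makerAssetType = raw.maker_asset_type;
+    return event;
+}
+
+// Step 5: an idempotent script body. `saved` stands in for the database table.
+async function pullNewFills(fetchPage: FetchPage, saved: FillEvent[]): Promise<void> {
+    // Start from the most recent block already stored (inclusive), or block 0.
+    const latest = saved.reduce((max, e) => Math.max(max, e.blockNumber), 0);
+    const seen = new Set(saved.map(e => `${e.blockNumber}:${e.logIndex}`));
+    const raw = await getAllFillsSince(fetchPage, latest);
+    for (const event of raw.map(parseRawFill)) {
+        const key = `${event.blockNumber}:${event.logIndex}`;
+        // Skip rows we already have, so re-running the script changes nothing.
+        if (!seen.has(key)) {
+            seen.add(key);
+            saved.push(event);
+        }
+    }
+}
+```
+
+Starting from the latest stored block (inclusive) and skipping known rows means
+a run that was interrupted partway through a block can simply be repeated.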
+
+### Additional guidelines and tips
+
+*   Table names should be plural and separated by underscores (e.g.,
+    `exchange_fill_events`).
+*   Any table containing data that comes directly from a third-party source
+    should be namespaced in the `raw` PostgreSQL schema.
+*   Column names in the database should be separated by underscores (e.g.,
+    `maker_asset_type`).
+*   Field names in entity classes (like any other fields in TypeScript) should
+    be camel-cased (e.g., `makerAssetType`).
+*   All timestamps should be stored as milliseconds since the Unix Epoch.
+*   Use the `BigNumber` type for TypeScript code that deals with 256-bit
+    numbers from smart contracts, or in any other case where a JavaScript
+    `number` (a 64-bit float) would lose precision.
+*   The [TypeORM documentation](http://typeorm.io/#/) is thorough and can be a
+    helpful resource.
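+
+A small sketch of some of these conventions in code. Both helper names are
+hypothetical. The seconds-to-milliseconds conversion assumes the source (for
+example, an Ethereum block timestamp) reports Unix time in seconds. In an
+actual entity the column name is normally declared explicitly rather than
+derived, for example through the `name` option of TypeORM's `@Column`
+decorator. The `BigNumber` type itself comes from a library and is not shown.
+
+```typescript
+// Hypothetical helper: derive a snake_case column name from a camelCase field
+// name, e.g. to check that an entity and its table follow the conventions above.
+function camelToSnake(fieldName: string): string {
+    return fieldName.replace(/[A-Z]/g, letter => `_${letter.toLowerCase()}`);
+}
+
+// Ethereum block timestamps are in seconds; convert before storing, since all
+// timestamps in the database should be milliseconds since the Unix Epoch.
+function secondsToMillis(unixSeconds: number): number {
+    return unixSeconds * 1000;
+}
+
+// Why plain `number` is unsafe for 256-bit values: integers above
+// Number.MAX_SAFE_INTEGER (2^53 - 1) lose precision.
+const PRECISION_LOST: boolean = 2 ** 53 + 1 === 2 ** 53; // true
+```
+
+For example, `camelToSnake('makerAssetType')` returns `'maker_asset_type'`,
+matching the column naming guideline above.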