aboutsummaryrefslogtreecommitdiffstats
path: root/packages/pipeline/README.md
diff options
context:
space:
mode:
Diffstat (limited to 'packages/pipeline/README.md')
-rw-r--r--packages/pipeline/README.md166
1 files changed, 166 insertions, 0 deletions
diff --git a/packages/pipeline/README.md b/packages/pipeline/README.md
new file mode 100644
index 000000000..794488cac
--- /dev/null
+++ b/packages/pipeline/README.md
@@ -0,0 +1,166 @@
+## @0xproject/pipeline
+
+This repository contains scripts used for scraping data from the Ethereum blockchain into SQL tables for analysis by the 0x team.
+
+## Contributing
+
+We strongly recommend that the community help us make improvements and determine the future direction of the protocol. To report bugs within this package, please create an issue in this repository.
+
+Please read our [contribution guidelines](../../CONTRIBUTING.md) before getting started.
+
+### Install dependencies:
+
+```bash
+yarn install
+```
+
+### Build
+
+```bash
+yarn build
+```
+
+### Clean
+
+```bash
+yarn clean
+```
+
+### Lint
+
+```bash
+yarn lint
+```
+
+### Migrations
+
+Create a new migration: `yarn migrate:create --name MigrationNameInCamelCase`
+Run migrations: `yarn migrate:run`
+Revert the most recent migration (CAUTION: may result in data loss!): `yarn migrate:revert`
+
+## Testing
+
+There are several test scripts in **package.json**. You can run all the tests
+with `yarn test:all` or run certain tests seprately by following the
+instructions below. Some tests may not work out of the box on certain platforms
+or operating systems (see the "Database tests" section below).
+
+### Unit tests
+
+The unit tests can be run with `yarn test`. These tests don't depend on any
+services or databases and will run in any environment that can run Node.
+
+### Database tests
+
+Database integration tests can be run with `yarn test:db`. These tests will
+attempt to automatically spin up a Postgres database via Docker. If this doesn't
+work you have two other options:
+
+1. Set the `DOCKER_SOCKET` environment variable to a valid socket path to use
+ for communicating with Docker.
+2. Start Postgres manually and set the `ZEROEX_DATA_PIPELINE_TEST_DB_URL`
+ environment variable. If this is set, the tests will use your existing
+ Postgres database instead of trying to create one with Docker.
+
+## Running locally
+
+`pipeline` requires access to a PostgreSQL database. The easiest way to start
+Postgres is via Docker. Depending on your platform, you may need to prepend
+`sudo` to the following command:
+
+```
+docker run --rm -d -p 5432:5432 --name pipeline_postgres postgres:11-alpine
+```
+
+This will start a Postgres server with the default username and database name
+(`postgres` and `postgres`). You should set the environment variable as follows:
+
+```
+export ZEROEX_DATA_PIPELINE_DB_URL=postgresql://postgres@localhost/postgres
+```
+
+First thing you will need to do is run the migrations:
+
+```
+yarn migrate:run
+```
+
+Now you can run scripts locally:
+
+```
+node packages/pipeline/lib/src/scripts/pull_radar_relay_orders.js
+```
+
+To stop the Postgres server (you may need to add `sudo`):
+
+```
+docker stop pipeline_postgres
+```
+
+This will remove all data from the database.
+
+If you prefer, you can also install Postgres with e.g.,
+[Homebrew](https://wiki.postgresql.org/wiki/Homebrew) or
+[Postgress.app](https://postgresapp.com/). Keep in mind that you will need to
+set the`ZEROEX_DATA_PIPELINE_DB_URL` environment variable to a valid
+[PostgreSQL connection url](https://stackoverflow.com/questions/3582552/postgresql-connection-url)
+
+## Directory structure
+
+```
+.
+├── lib: Code generated by the TypeScript compiler. Don't edit this directly.
+├── migrations: Code for creating and updating database schemas.
+├── node_modules:
+├── src: All TypeScript source code.
+│   ├── data_sources: Code responsible for getting raw data, typically from a third-party source.
+│   ├── entities: TypeORM entities which closely mirror our database schemas. Some other ORMs call these "models".
+│   ├── parsers: Code for converting raw data into entities.
+│   ├── scripts: Executable scripts which put all the pieces together.
+│   └── utils: Various utils used across packages/files.
+├── test: All tests go here and are organized in the same way as the folder/file that they test.
+```
+
+## Adding new data to the pipeline
+
+1. Create an entity in the _entities_ directory. Entities directly mirror our
+ database schemas. We follow the practice of having "dumb" entities, so
+ entity classes should typically not have any methods.
+2. Create a migration using the `yarn migrate:create` command. Create/update
+ tables as needed. Remember to fill in both the `up` and `down` methods. Try
+ to avoid data loss as much as possible in your migrations.
+3. Add basic tests for your entity and migrations to the **test/entities/**
+ directory.
+4. Create a class or function in the **data_sources/** directory for getting
+ raw data. This code should abstract away pagination and rate-limiting as
+ much as possible.
+5. Create a class or function in the **parsers/** directory for converting the
+ raw data into an entity. Also add tests in the **tests/** directory to test
+ the parser.
+6. Create an executable script in the **scripts/** directory for putting
+ everything together. Your script can accept environment variables for things
+ like API keys. It should pull the data, parse it, and save it to the
+ database. Scripts should be idempotent and atomic (when possible). What this
+ means is that your script may be responsible for determining _which_ data
+ needs to be updated. For example, you may need to query the database to find
+ the most recent block number that we have already pulled, then pull new data
+ starting from that block number.
+7. Run the migrations and then run your new script locally and verify it works
+ as expected.
+
+#### Additional guidelines and tips:
+
+* Table names should be plural and separated by underscores (e.g.,
+ `exchange_fill_events`).
+* Any table which contains data which comes directly from a third-party source
+ should be namespaced in the `raw` PostgreSQL schema.
+* Column names in the database should be separated by underscores (e.g.,
+ `maker_asset_type`).
+* Field names in entity classes (like any other fields in TypeScript) should
+ be camel-cased (e.g., `makerAssetType`).
+* All timestamps should be stored as milliseconds since the Unix Epoch.
+* Use the `BigNumber` type for TypeScript code which deals with 256-bit
+ numbers from smart contracts or for any case where we are dealing with large
+ floating point numbers.
+* [TypeORM documentation](http://typeorm.io/#/) is pretty robust and can be a
+ helpful resource.