## @0xproject/pipeline

This package contains scripts used for scraping data from the Ethereum blockchain into SQL tables for analysis by the 0x team.

## Contributing

We strongly recommend that the community help us make improvements and determine the future direction of the protocol. To report bugs within this package, please create an issue in this repository.

Please read our [contribution guidelines](../../CONTRIBUTING.md) before getting started.

### Install dependencies:

```bash
yarn install
```

### Build

```bash
yarn build
```

### Clean

```bash
yarn clean
```

### Lint

```bash
yarn lint
```

### Migrations

Create a new migration: `yarn migrate:create --name MigrationNameInCamelCase`.

Run migrations: `yarn migrate:run`

Revert the most recent migration (CAUTION: may result in data loss!): `yarn migrate:revert`
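For orientation, a generated migration is a TypeORM `MigrationInterface` class whose `up` method applies a schema change and whose `down` method reverts it. Below is a minimal, hypothetical sketch (the `raw.example_events` table, its columns, and the class name are all made up for illustration):

```typescript
import { MigrationInterface, QueryRunner } from 'typeorm';

// Hypothetical migration creating a table in the `raw` PostgreSQL schema.
export class CreateExampleEvents1543000000000 implements MigrationInterface {
    public async up(queryRunner: QueryRunner): Promise<any> {
        await queryRunner.query(`
            CREATE TABLE raw.example_events (
                transaction_hash VARCHAR NOT NULL,
                block_number BIGINT NOT NULL,
                maker_asset_amount NUMERIC NOT NULL,
                PRIMARY KEY (transaction_hash)
            );
        `);
    }

    // down should exactly revert up so that `yarn migrate:revert` is safe.
    public async down(queryRunner: QueryRunner): Promise<any> {
        await queryRunner.query(`DROP TABLE raw.example_events;`);
    }
}
```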
## Testing

There are several test scripts in **package.json**. You can run all the tests with `yarn test:all` or run certain tests separately by following the instructions below. Some tests may not work out of the box on certain platforms or operating systems (see the "Database tests" section below).

### Unit tests

The unit tests can be run with `yarn test`. These tests don't depend on any services or databases and will run in any environment that can run Node.

### Database tests

Database integration tests can be run with `yarn test:db`. These tests will attempt to automatically spin up a Postgres database via Docker. If this doesn't work, you have two other options:

1. Set the `DOCKER_SOCKET` environment variable to a valid socket path to use for communicating with Docker.
2. Start Postgres manually and set the `ZEROEX_DATA_PIPELINE_TEST_DB_URL` environment variable. If this is set, the tests will use your existing Postgres database instead of trying to create one with Docker.

## Running locally

`pipeline` requires access to a PostgreSQL database. The easiest way to start Postgres is via Docker. Depending on your platform, you may need to prepend `sudo` to the following command:

```bash
docker run --rm -d -p 5432:5432 --name pipeline_postgres postgres:11-alpine
```

This will start a Postgres server with the default username and database name (`postgres` and `postgres`). You should set the environment variable as follows:

```bash
export ZEROEX_DATA_PIPELINE_DB_URL=postgresql://postgres@localhost/postgres
```

The first thing you will need to do is run the migrations:

```bash
yarn migrate:run
```

Now you can run scripts locally:

```bash
node packages/pipeline/lib/src/scripts/pull_radar_relay_orders.js
```

To stop the Postgres server (you may need to add `sudo`):

```bash
docker stop pipeline_postgres
```

Because the container was started with the `--rm` flag, this will also remove all data from the database.

If you prefer, you can also install Postgres with e.g., [Homebrew](https://wiki.postgresql.org/wiki/Homebrew) or [Postgres.app](https://postgresapp.com/). Keep in mind that you will need to set the `ZEROEX_DATA_PIPELINE_DB_URL` environment variable to a valid [PostgreSQL connection url](https://stackoverflow.com/questions/3582552/postgresql-connection-url).

## Directory structure

```
.
├── lib: Code generated by the TypeScript compiler. Don't edit this directly.
├── migrations: Code for creating and updating database schemas.
├── node_modules: Installed dependencies.
├── src: All TypeScript source code.
│   ├── data_sources: Code responsible for getting raw data, typically from a third-party source.
│   ├── entities: TypeORM entities which closely mirror our database schemas. Some other ORMs call these "models".
│   ├── parsers: Code for converting raw data into entities.
│   ├── scripts: Executable scripts which put all the pieces together.
│   └── utils: Various utils used across packages/files.
└── test: All tests go here and are organized in the same way as the folder/file that they test.
```

## Adding new data to the pipeline

1. Create an entity in the _entities_ directory. Entities directly mirror our database schemas. We follow the practice of having "dumb" entities, so entity classes should typically not have any methods (a hedged sketch of one possible entity follows this list).
2. Create a migration using the `yarn migrate:create` command. Create/update tables as needed. Remember to fill in both the `up` and `down` methods. Try to avoid data loss as much as possible in your migrations.
3. Add basic tests for your entity and migrations to the **test/entities/** directory.
4. Create a class or function in the **data_sources/** directory for getting raw data. This code should abstract away pagination and rate-limiting as much as possible.
5. Create a class or function in the **parsers/** directory for converting the raw data into an entity. Also add tests in the **test/** directory to test the parser.
6. Create an executable script in the **scripts/** directory for putting everything together. Your script can accept environment variables for things like API keys. It should pull the data, parse it, and save it to the database. Scripts should be idempotent and atomic (when possible). This means that your script may be responsible for determining _which_ data needs to be updated. For example, you may need to query the database to find the most recent block number that we have already pulled, then pull new data starting from that block number (see the script sketch at the end of this README).
7. Run the migrations and then run your new script locally and verify it works as expected.
8. After all tests pass and you can run the script locally, open a new PR to the monorepo. Don't merge this yet!
9. If you added any new scripts or dependencies between scripts, you will need to make changes to https://github.com/0xProject/0x-pipeline-orchestration and make a separate PR there. Don't merge this yet!
10. After your PR passes code review, ask @feuGeneA or @xianny to deploy your changes to the QA environment. Check the [QA Airflow dashboard](http://airflow-qa.0x.org:8080) to make sure everything works correctly in the QA environment.
11. Merge your PR to 0x-monorepo (and https://github.com/0xProject/0x-pipeline-orchestration if needed). Then ask @feuGeneA or @xianny to deploy to production.
12. Monitor the [production Airflow dashboard](http://airflow.0x.org:8080) to make sure everything still works.
13. Celebrate! :tada:
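As a concrete illustration of step 1 and of the naming conventions listed below, here is a minimal sketch of a "dumb" TypeORM entity. Everything in it is hypothetical (the `raw.example_events` table, the field names, and the transformers), and it assumes a `BigNumber` export from `@0xproject/utils`:

```typescript
import { BigNumber } from '@0xproject/utils';
import { Column, Entity, PrimaryColumn } from 'typeorm';

// Postgres NUMERIC columns come back as strings; convert to/from BigNumber.
const bigNumberTransformer = {
    from: (value: string): BigNumber => new BigNumber(value),
    to: (value: BigNumber): string => value.toString(),
};

// The Postgres driver also returns BIGINT columns as strings; normalize here.
const bigintToNumberTransformer = {
    from: (value: string): number => parseInt(value, 10),
    to: (value: number): string => value.toString(),
};

// Hypothetical entity: fields only, no methods. Note the camelCase fields
// mapped to snake_case columns, and the `raw` schema namespace.
@Entity({ name: 'example_events', schema: 'raw' })
export class ExampleEvent {
    @PrimaryColumn({ name: 'transaction_hash' })
    public transactionHash!: string;

    @Column({ name: 'block_number', type: 'bigint', transformer: bigintToNumberTransformer })
    public blockNumber!: number;

    // 256-bit on-chain values don't fit in a JavaScript number.
    @Column({ name: 'maker_asset_amount', type: 'numeric', transformer: bigNumberTransformer })
    public makerAssetAmount!: BigNumber;
}
```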
#### Additional guidelines and tips:

- Table names should be plural and separated by underscores (e.g., `exchange_fill_events`).
- Any table which contains data which comes directly from a third-party source should be namespaced in the `raw` PostgreSQL schema.
- Column names in the database should be separated by underscores (e.g., `maker_asset_type`).
- Field names in entity classes (like any other fields in TypeScript) should be camel-cased (e.g., `makerAssetType`).
- All timestamps should be stored as milliseconds since the Unix Epoch.
- Use the `BigNumber` type for TypeScript code which deals with 256-bit numbers from smart contracts or for any case where we are dealing with large floating point numbers.
- [TypeORM documentation](http://typeorm.io/#/) is pretty robust and can be a helpful resource.
- Scripts/parsers should perform minimal data transformation/normalization. The idea here is to have a raw data feed that will be cleaned up and synthesized in a separate step.
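To make the idempotency guidance from step 6 concrete, here is a hedged sketch of a script that resumes from the most recent block already in the database. `fetchRawEvents`, `parseRawEvent`, the import path, and the `ExampleEvent` entity are hypothetical stand-ins for code that would live in **data_sources/**, **parsers/**, and **entities/**:

```typescript
import { createConnection } from 'typeorm';

import { ExampleEvent } from '../entities'; // hypothetical entity from the sketch above

// Hypothetical stand-ins for a data source and a parser.
async function fetchRawEvents(fromBlock: number): Promise<object[]> {
    return []; // e.g., paginate through a third-party API starting at fromBlock
}
function parseRawEvent(raw: object): ExampleEvent {
    return new ExampleEvent(); // minimal transformation: keep the data raw
}

(async () => {
    // createConnection reads connection options from ormconfig / environment.
    const connection = await createConnection();
    const repository = connection.getRepository(ExampleEvent);

    // Idempotency: find the last block already pulled and resume after it.
    const result = await repository
        .createQueryBuilder('event')
        .select('MAX(event.blockNumber)', 'max')
        .getRawOne();
    const startBlock = result && result.max !== null ? Number(result.max) + 1 : 0;

    const rawEvents = await fetchRawEvents(startBlock);
    await repository.save(rawEvents.map(parseRawEvent));
    await connection.close();
})().catch(err => {
    console.error(err);
    process.exit(1);
});
```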