From 57e7119c0d4f1ab7dd1d4c0118e72dc1706e2151 Mon Sep 17 00:00:00 2001
From: Alex Browne
Date: Mon, 17 Sep 2018 11:27:38 -0700
Subject: Rebase pipeline branch off development

---
 packages/pipeline/README.md | 54 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)
 create mode 100644 packages/pipeline/README.md
(limited to 'packages/pipeline/README.md')

diff --git a/packages/pipeline/README.md b/packages/pipeline/README.md
new file mode 100644
index 000000000..594454bd0
--- /dev/null
+++ b/packages/pipeline/README.md
@@ -0,0 +1,54 @@
+## @0xproject/pipeline
+
+This repository contains scripts used for scraping data from the Ethereum blockchain into SQL tables for analysis by the 0x team.
+
+## Contributing
+
+We strongly recommend that the community help us make improvements and determine the future direction of the protocol. To report bugs within this package, please create an issue in this repository.
+
+Please read our [contribution guidelines](../../CONTRIBUTING.md) before getting started.
+
+## Local Dev Setup
+
+Requires Node version 6.9.5 or higher.
+
+Add the following to your `.env` file:
+
+```
+REDSHIFT_USER
+REDSHIFT_DB
+REDSHIFT_PASSWORD
+REDSHIFT_PORT
+REDSHIFT_HOST
+WEB3_PROVIDER_URL
+```
+
+Example of running a script:
+
+```
+node ./lib/scripts/scrape_data.js --type tokens
+```
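+
+For context, the sketch below shows one way a script entry point might consume
+these variables once a loader such as `dotenv` has copied them into
+`process.env`. It is illustrative only: the `getRequiredEnv` helper and the
+config shape are assumptions, not exports of this package.
+
+```typescript
+// Minimal sketch of reading the environment variables listed above.
+// `getRequiredEnv` is a hypothetical helper, not part of this package.
+function getRequiredEnv(name: string): string {
+    const value = process.env[name];
+    if (value === undefined) {
+        throw new Error(`Missing required environment variable: ${name}`);
+    }
+    return value;
+}
+
+const redshiftConfig = {
+    user: getRequiredEnv('REDSHIFT_USER'),
+    database: getRequiredEnv('REDSHIFT_DB'),
+    password: getRequiredEnv('REDSHIFT_PASSWORD'),
+    port: parseInt(getRequiredEnv('REDSHIFT_PORT'), 10),
+    host: getRequiredEnv('REDSHIFT_HOST'),
+};
+const web3ProviderUrl = getRequiredEnv('WEB3_PROVIDER_URL');
+```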
+
+### Install dependencies:
+
+```bash
+yarn install
+```
+
+### Build
+
+```bash
+yarn build
+```
+
+### Clean
+
+```bash
+yarn clean
+```
+
+### Lint
+
+```bash
+yarn lint
+```
-- cgit v1.2.3

From dca2a4e9c2712f67852bed4ae6ae76c6434f7e56 Mon Sep 17 00:00:00 2001
From: Alex Browne
Date: Tue, 6 Nov 2018 12:53:59 -0800
Subject: Remove outdated info from README

---
 packages/pipeline/README.md | 21 ---------------------
 1 file changed, 21 deletions(-)
(limited to 'packages/pipeline/README.md')

diff --git a/packages/pipeline/README.md b/packages/pipeline/README.md
index 594454bd0..df54de21c 100644
--- a/packages/pipeline/README.md
+++ b/packages/pipeline/README.md
@@ -8,27 +8,6 @@ We strongly recommend that the community help us make improvements and determine
 
 Please read our [contribution guidelines](../../CONTRIBUTING.md) before getting started.
 
-## Local Dev Setup
-
-Requires Node version 6.9.5 or higher.
-
-Add the following to your `.env` file:
-
-```
-REDSHIFT_USER
-REDSHIFT_DB
-REDSHIFT_PASSWORD
-REDSHIFT_PORT
-REDSHIFT_HOST
-WEB3_PROVIDER_URL
-```
-
-Example of running a script:
-
-```
-node ./lib/scripts/scrape_data.js --type tokens
-```
-
 ### Install dependencies:
 
 ```bash
-- cgit v1.2.3

From 4061731245a8513e8d990f3af87e182fb674838b Mon Sep 17 00:00:00 2001
From: Alex Browne
Date: Tue, 27 Nov 2018 16:43:00 -0800
Subject: [pipeline] Add additional documentation to the README (#1328)

---
 packages/pipeline/README.md | 113 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 113 insertions(+)
(limited to 'packages/pipeline/README.md')

diff --git a/packages/pipeline/README.md b/packages/pipeline/README.md
index df54de21c..c647950a2 100644
--- a/packages/pipeline/README.md
+++ b/packages/pipeline/README.md
@@ -31,3 +31,116 @@ yarn lint
 ```
+
+### Migrations
+
+Create a new migration: `yarn migrate:create --name MigrationNameInCamelCase`
+Run migrations: `yarn migrate:run`
+Revert the most recent migration (CAUTION: may result in data loss!): `yarn migrate:revert`
+
+## Connecting to PostgreSQL
+
+Across the pipeline package, any code which accesses the database uses the
+environment variable `ZEROEX_DATA_PIPELINE_DB_URL`, which should be a properly
+formatted
+[PostgreSQL connection url](https://stackoverflow.com/questions/3582552/postgresql-connection-url).
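+
+As a hedged illustration of what "code which accesses the database" looks
+like, the sketch below opens a connection from that variable. It assumes the
+TypeORM 0.2.x `createConnection` API; the empty `entities` list is a
+placeholder for this package's entity classes.
+
+```typescript
+import { createConnection } from 'typeorm';
+
+async function connectAsync(): Promise<void> {
+    // The URL is read from the environment, as described above.
+    const connection = await createConnection({
+        type: 'postgres',
+        url: process.env.ZEROEX_DATA_PIPELINE_DB_URL,
+        entities: [], // Entity classes would be listed here.
+        synchronize: false, // Schema changes are applied via migrations instead.
+    });
+    console.log(`Connected: ${connection.isConnected}`);
+    await connection.close();
+}
+
+connectAsync().catch(err => {
+    console.error(err);
+    process.exit(1);
+});
+```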
+
+## Test environment
+
+The easiest way to start Postgres is via Docker. Depending on your
+platform, you may need to prepend `sudo` to the following command:
+
+```
+docker run --rm -d -p 5432:5432 --name pipeline_postgres postgres:11-alpine
+```
+
+This will start a Postgres server with the default username and database name.
+You should set the environment variable as follows:
+
+```
+export ZEROEX_DATA_PIPELINE_DB_URL=postgresql://postgres@localhost/postgres
+```
+
+The first thing you will need to do is run the migrations:
+
+```
+yarn migrate:run
+```
+
+Now you can run scripts locally:
+
+```
+node packages/pipeline/lib/src/scripts/pull_radar_relay_orders.js
+```
+
+To stop the Postgres server (you may need to add `sudo`):
+
+```
+docker stop pipeline_postgres
+```
+
+This will remove all data from the database.
+
+If you prefer, you can also install Postgres with e.g.,
+[Homebrew](https://wiki.postgresql.org/wiki/Homebrew) or
+[Postgres.app](https://postgresapp.com/). As long as you set the
+`ZEROEX_DATA_PIPELINE_DB_URL` environment variable appropriately, any Postgres
+server will work.
+
+## Directory structure
+
+```
+.
+├── lib: Code generated by the TypeScript compiler. Don't edit this directly.
+├── migrations: Code for creating and updating database schemas.
+├── node_modules: Installed dependencies.
+├── src: All TypeScript source code.
+│   ├── data_sources: Code responsible for getting raw data, typically from a third-party source.
+│   ├── entities: TypeORM entities which closely mirror our database schemas. Some other ORMs call these "models".
+│   ├── parsers: Code for converting raw data into entities.
+│   ├── scripts: Executable scripts which put all the pieces together.
+│   └── utils: Various utils used across packages/files.
+├── test: All tests go here and are organized in the same way as the folder/file that they test.
+```
+
+## Adding new data to the pipeline
+
+1. Create an entity in the _entities_ directory. Entities directly mirror our
+   database schemas. We follow the practice of having "dumb" entities, so
+   entity classes should typically not have any methods (see the sketches
+   after this section for a hypothetical entity, migration, and parser).
+2. Create a migration using the `yarn migrate:create` command. Create/update
+   tables as needed. Remember to fill in both the `up` and `down` methods. Try
+   to avoid data loss as much as possible in your migrations.
+3. Create a class or function in the _data_sources_ directory for getting raw
+   data. This code should abstract away pagination and rate-limiting as much as
+   possible.
+4. Create a class or function in the _parsers_ directory for converting the raw
+   data into an entity. Also add tests in the _tests_ directory to test the
+   parser.
+5. Create an executable script in the _scripts_ directory for putting
+   everything together. Your script can accept environment variables for things
+   like API keys. It should pull the data, parse it, and save it to the
+   database. Scripts should be idempotent and atomic (when possible). What this
+   means is that your script may be responsible for determining **which** data
+   needs to be updated. For example, you may need to query the database to find
+   the most recent block number that we have already pulled, then pull new data
+   starting from that block number.
+6. Run the migrations and then run your new script locally and verify it works
+   as expected.
+
+#### Additional guidelines and tips:
+
+* Table names should be plural and separated by underscores (e.g.,
+    `exchange_fill_events`).
+* Any table which contains data which comes directly from a third-party source
+    should be namespaced in the `raw` PostgreSQL schema.
+* Column names in the database should be separated by underscores (e.g.,
+    `maker_asset_type`).
+* Field names in entity classes (like any other fields in TypeScript) should
+    be camel-cased (e.g., `makerAssetType`).
+* All timestamps should be stored as milliseconds since the Unix Epoch.
+* Use the `BigNumber` type for TypeScript code which deals with 256-bit
+    numbers from smart contracts or for any case where we are dealing with large
+    floating point numbers.
+* [TypeORM documentation](http://typeorm.io/#/) is pretty robust and can be a
+    helpful resource.
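+
+To make steps 1 and 4 concrete, here is a hedged sketch of a "dumb" entity and
+a matching parser. The `TokenTransferEvent` name, its fields, and the shape of
+the raw input are invented for illustration; only the conventions (snake_case
+columns, camelCase fields, the `raw` schema) come from the guidelines above.
+
+```typescript
+import { Column, Entity, PrimaryColumn } from 'typeorm';
+
+// A "dumb" entity: columns and nothing else. The table and column names are
+// snake_case; the TypeScript fields are camelCase.
+@Entity({ name: 'token_transfer_events', schema: 'raw' })
+export class TokenTransferEvent {
+    @PrimaryColumn({ name: 'transaction_hash' })
+    public transactionHash!: string;
+
+    @Column({ name: 'maker_asset_type' })
+    public makerAssetType!: string;
+
+    // Milliseconds since the Unix Epoch, per the guidelines above. (In
+    // practice a value transformer may be needed for `bigint` columns to
+    // round-trip as numbers.)
+    @Column({ name: 'block_timestamp', type: 'bigint' })
+    public blockTimestamp!: number;
+}
+
+// A parser is a pure function from raw data to an entity, which keeps it
+// trivial to unit test. The `raw` input shape here is hypothetical.
+export function parseTokenTransferEvent(raw: {
+    txHash: string;
+    assetType: string;
+    timestampSeconds: number;
+}): TokenTransferEvent {
+    const event = new TokenTransferEvent();
+    event.transactionHash = raw.txHash;
+    event.makerAssetType = raw.assetType;
+    event.blockTimestamp = raw.timestampSeconds * 1000; // Seconds -> ms.
+    return event;
+}
+```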
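+
+And a sketch of the migration that step 2 would pair with it, assuming the
+TypeORM `MigrationInterface` shape that `yarn migrate:create` scaffolds (the
+class name and its timestamp suffix are placeholders):
+
+```typescript
+import { MigrationInterface, QueryRunner, Table } from 'typeorm';
+
+const table = new Table({
+    name: 'raw.token_transfer_events',
+    columns: [
+        { name: 'transaction_hash', type: 'varchar', isPrimary: true },
+        { name: 'maker_asset_type', type: 'varchar' },
+        { name: 'block_timestamp', type: 'bigint' },
+    ],
+});
+
+export class CreateTokenTransferEvents0000000000000 implements MigrationInterface {
+    // Creates the table; `down` reverses it so the migration can be reverted
+    // without leaving anything behind.
+    public async up(queryRunner: QueryRunner): Promise<void> {
+        await queryRunner.createTable(table);
+    }
+
+    public async down(queryRunner: QueryRunner): Promise<void> {
+        await queryRunner.dropTable(table);
+    }
+}
+```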
-- cgit v1.2.3

From 3d211c415b58a67f84332ff512bf9372cac5a3ac Mon Sep 17 00:00:00 2001
From: Alex Browne
Date: Wed, 28 Nov 2018 13:21:04 -0800
Subject: Introduce framework for running basic tests for entities (#1344)

* Introduce framework for running basic tests for entities
* Add pipeline tests to CircleCI config
* Make pipeline tests more configurable and fix CircleCI config
* Add coverage dir to pipeline package
* Add basic tests for all exchange event entities
* Add tests for remaining entities
* Create separate test scripts in package.json and add new info to README
* Update db_setup.ts to revert migrations even if you are using docker
* Automatically pull the postgres image if needed
* Add comment about why NumberToBigIntTransformer is needed
---
 packages/pipeline/README.md | 59 ++++++++++++++++++++++++++++---------------
 1 file changed, 39 insertions(+), 20 deletions(-)
(limited to 'packages/pipeline/README.md')

diff --git a/packages/pipeline/README.md b/packages/pipeline/README.md
index c647950a2..fb563b14c 100644
--- a/packages/pipeline/README.md
+++ b/packages/pipeline/README.md
@@ -38,17 +38,34 @@ Create a new migration: `yarn migrate:create --name MigrationNameInCamelCase`
 Run migrations: `yarn migrate:run`
 Revert the most recent migration (CAUTION: may result in data loss!): `yarn migrate:revert`
 
-## Connecting to PostgreSQL
-
-Across the pipeline package, any code which accesses the database uses the
-environment variable `ZEROEX_DATA_PIPELINE_DB_URL`, which should be a properly
-formatted
-[PostgreSQL connection url](https://stackoverflow.com/questions/3582552/postgresql-connection-url).
+## Testing
 
-## Test environment
+There are several test scripts in **package.json**. You can run all the tests
+with `yarn test:all` or run certain tests separately by following the
+instructions below. Some tests may not work out of the box on certain platforms.
 
-The easiest way to start Postgres is via Docker. Depending on your
-platform, you may need to prepend `sudo` to the following command:
+### Unit tests
+
+The unit tests can be run with `yarn test`. These tests don't depend on any
+services or databases and will run in any environment that can run Node.
+
+### Database tests
+
+Database integration tests can be run with `yarn test:db`. These tests will
+attempt to automatically spin up a Postgres database via Docker. If this doesn't
+work you have two other options:
+
+1. Set the `DOCKER_SOCKET` environment variable to a valid socket path to use
+   for communicating with Docker.
+2. Start Postgres manually and set the `ZEROEX_DATA_PIPELINE_TEST_DB_URL`
+   environment variable. If this is set, the tests will use your existing
+   Postgres database instead of trying to create one with Docker.
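+
+To give a feel for what these tests look like, here is a hedged sketch of a
+basic entity round-trip test. It assumes a mocha/chai setup and reuses the
+hypothetical `TokenTransferEvent` entity sketched earlier in this README; the
+import path is invented, and migrations are assumed to have been run against
+the test database.
+
+```typescript
+import 'mocha';
+import * as chai from 'chai';
+import { Connection, createConnection } from 'typeorm';
+
+// Hypothetical path; points at the entity sketched in "Adding new data".
+import { TokenTransferEvent } from '../src/entities/token_transfer_event';
+
+const expect = chai.expect;
+
+describe('TokenTransferEvent entity', () => {
+    let connection: Connection;
+    before(async () => {
+        connection = await createConnection({
+            type: 'postgres',
+            url: process.env.ZEROEX_DATA_PIPELINE_TEST_DB_URL,
+            entities: [TokenTransferEvent],
+        });
+    });
+    after(async () => {
+        await connection.close();
+    });
+    it('saves and loads an entity', async () => {
+        const event = new TokenTransferEvent();
+        event.transactionHash = '0x1';
+        event.makerAssetType = 'erc20';
+        event.blockTimestamp = 1543000000000;
+        const repository = connection.getRepository(TokenTransferEvent);
+        await repository.save(event);
+        const found = await repository.findOne(event.transactionHash);
+        expect(found).to.not.equal(undefined);
+        expect(found!.makerAssetType).to.equal('erc20');
+    });
+});
+```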
+
+## Running locally
+
+`pipeline` requires access to a PostgreSQL database. The easiest way to start
+Postgres is via Docker. Depending on your platform, you may need to prepend
+`sudo` to the following command:
 
 ```
 docker run --rm -d -p 5432:5432 --name pipeline_postgres postgres:11-alpine
 ```
@@ -83,9 +100,9 @@ This will remove all data from the database.
 
 If you prefer, you can also install Postgres with e.g.,
 [Homebrew](https://wiki.postgresql.org/wiki/Homebrew) or
-[Postgres.app](https://postgresapp.com/). As long as you set the
-`ZEROEX_DATA_PIPELINE_DB_URL` environment variable appropriately, any Postgres
-server will work.
+[Postgres.app](https://postgresapp.com/). Keep in mind that you will need to
+set the `ZEROEX_DATA_PIPELINE_DB_URL` environment variable to a valid
+[PostgreSQL connection url](https://stackoverflow.com/questions/3582552/postgresql-connection-url).
 
 ## Directory structure
 
@@ -111,21 +128,23 @@
 2. Create a migration using the `yarn migrate:create` command. Create/update
    tables as needed. Remember to fill in both the `up` and `down` methods. Try
    to avoid data loss as much as possible in your migrations.
-3. Create a class or function in the _data_sources_ directory for getting raw
-   data. This code should abstract away pagination and rate-limiting as much as
-   possible.
-4. Create a class or function in the _parsers_ directory for converting the raw
-   data into an entity. Also add tests in the _tests_ directory to test the
-   parser.
-5. Create an executable script in the _scripts_ directory for putting
+3. Add basic tests for your entity and migrations to the **test/entities/**
+   directory.
+4. Create a class or function in the **data_sources/** directory for getting
+   raw data. This code should abstract away pagination and rate-limiting as
+   much as possible.
+5. Create a class or function in the **parsers/** directory for converting the
+   raw data into an entity. Also add tests in the **tests/** directory to test
+   the parser.
+6. Create an executable script in the **scripts/** directory for putting
   everything together. Your script can accept environment variables for things
   like API keys. It should pull the data, parse it, and save it to the
   database. Scripts should be idempotent and atomic (when possible). What this
-   means is that your script may be responsible for determining **which** data
+   means is that your script may be responsible for determining _which_ data
   needs to be updated. For example, you may need to query the database to find
   the most recent block number that we have already pulled, then pull new data
   starting from that block number.
-6. Run the migrations and then run your new script locally and verify it works
+7. Run the migrations and then run your new script locally and verify it works
   as expected.
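+
+The sketch below illustrates the idempotent pattern from the last two steps:
+query for the most recent block already pulled, then fetch only newer data.
+The `block_number` column and the fetch helper are invented for illustration.
+
+```typescript
+import { createConnection } from 'typeorm';
+
+// Stand-in for a data_sources helper; a real one would handle pagination
+// and rate-limiting as described in the steps above.
+async function fetchFillEventsAsync(fromBlock: number): Promise<object[]> {
+    return []; // Placeholder.
+}
+
+async function runAsync(): Promise<void> {
+    const connection = await createConnection({
+        type: 'postgres',
+        url: process.env.ZEROEX_DATA_PIPELINE_DB_URL,
+    });
+    // Determine which data needs to be updated by asking the database what
+    // we already have.
+    const rows = await connection.query(
+        'SELECT COALESCE(MAX(block_number), 0) AS max_block FROM raw.exchange_fill_events',
+    );
+    const startBlock = Number(rows[0].max_block) + 1;
+    const newEvents = await fetchFillEventsAsync(startBlock);
+    // Parsing the raw events and saving the resulting entities would go here.
+    await connection.close();
+}
+
+runAsync().catch(err => {
+    console.error(err);
+    process.exit(1);
+});
+```
+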
 #### Additional guidelines and tips:
-- cgit v1.2.3

From 00f86ca0f7871639d2b0be496f6f8c5e0d8d7ffe Mon Sep 17 00:00:00 2001
From: Alex Browne
Date: Tue, 4 Dec 2018 20:04:08 -0800
Subject: Address PR feedback

---
 packages/pipeline/README.md | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)
(limited to 'packages/pipeline/README.md')

diff --git a/packages/pipeline/README.md b/packages/pipeline/README.md
index fb563b14c..0f4abd935 100644
--- a/packages/pipeline/README.md
+++ b/packages/pipeline/README.md
@@ -42,7 +42,8 @@ Revert the most recent migration (CAUTION: may result in data loss!): `yarn migr
 
 There are several test scripts in **package.json**. You can run all the tests
 with `yarn test:all` or run certain tests separately by following the
-instructions below. Some tests may not work out of the box on certain platforms.
+instructions below. Some tests may not work out of the box on certain platforms
+or operating systems (see the "Database tests" section below).
 
 ### Unit tests
 
@@ -71,8 +72,8 @@ Postgres is via Docker. Depending on your platform, you may need to prepend
 docker run --rm -d -p 5432:5432 --name pipeline_postgres postgres:11-alpine
 ```
 
-This will start a Postgres server with the default username and database name.
-You should set the environment variable as follows:
+This will start a Postgres server with the default username and database name
+(`postgres` and `postgres`). You should set the environment variable as follows:
 
 ```
 export ZEROEX_DATA_PIPELINE_DB_URL=postgresql://postgres@localhost/postgres
@@ -149,17 +150,17 @@ set the `ZEROEX_DATA_PIPELINE_DB_URL` environment variable to a valid
 
 #### Additional guidelines and tips:
 
-* Table names should be plural and separated by underscores (e.g.,
+- Table names should be plural and separated by underscores (e.g.,
    `exchange_fill_events`).
-* Any table which contains data which comes directly from a third-party source
+- Any table which contains data which comes directly from a third-party source
   should be namespaced in the `raw` PostgreSQL schema.
-* Column names in the database should be separated by underscores (e.g.,
+- Column names in the database should be separated by underscores (e.g.,
   `maker_asset_type`).
-* Field names in entity classes (like any other fields in TypeScript) should
+- Field names in entity classes (like any other fields in TypeScript) should
   be camel-cased (e.g., `makerAssetType`).
-* All timestamps should be stored as milliseconds since the Unix Epoch.
-* Use the `BigNumber` type for TypeScript code which deals with 256-bit
+- All timestamps should be stored as milliseconds since the Unix Epoch.
+- Use the `BigNumber` type for TypeScript code which deals with 256-bit
   numbers from smart contracts or for any case where we are dealing with large
   floating point numbers.
-* [TypeORM documentation](http://typeorm.io/#/) is pretty robust and can be a
+- [TypeORM documentation](http://typeorm.io/#/) is pretty robust and can be a
   helpful resource.
-- cgit v1.2.3

From 2e704ac01a077b0c73288aaa53c9cf66c73e27f1 Mon Sep 17 00:00:00 2001
From: Alex Browne
Date: Tue, 4 Dec 2018 20:08:32 -0800
Subject: Fix prettier

---
 packages/pipeline/README.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)
(limited to 'packages/pipeline/README.md')

diff --git a/packages/pipeline/README.md b/packages/pipeline/README.md
index 0f4abd935..794488cac 100644
--- a/packages/pipeline/README.md
+++ b/packages/pipeline/README.md
@@ -150,17 +150,17 @@ set the `ZEROEX_DATA_PIPELINE_DB_URL` environment variable to a valid
 
 #### Additional guidelines and tips:
 
-- Table names should be plural and separated by underscores (e.g.,
+* Table names should be plural and separated by underscores (e.g.,
   `exchange_fill_events`).
-- Any table which contains data which comes directly from a third-party source
+* Any table which contains data which comes directly from a third-party source
  should be namespaced in the `raw` PostgreSQL schema.
-- Column names in the database should be separated by underscores (e.g.,
+* Column names in the database should be separated by underscores (e.g.,
  `maker_asset_type`).
-- Field names in entity classes (like any other fields in TypeScript) should
+* Field names in entity classes (like any other fields in TypeScript) should
  be camel-cased (e.g., `makerAssetType`).
-- All timestamps should be stored as milliseconds since the Unix Epoch.
-- Use the `BigNumber` type for TypeScript code which deals with 256-bit
+* All timestamps should be stored as milliseconds since the Unix Epoch.
+* Use the `BigNumber` type for TypeScript code which deals with 256-bit
  numbers from smart contracts or for any case where we are dealing with large
  floating point numbers.
-- [TypeORM documentation](http://typeorm.io/#/) is pretty robust and can be a
+* [TypeORM documentation](http://typeorm.io/#/) is pretty robust and can be a
  helpful resource.
-- cgit v1.2.3