## @0xproject/pipeline

This repository contains scripts used for scraping data from the Ethereum blockchain into SQL tables for analysis by the 0x team.

## Contributing

We strongly recommend that the community help us make improvements and determine the future direction of the protocol. To report bugs within this package, please create an issue in this repository.

Please read our [contribution guidelines](../../CONTRIBUTING.md) before getting started.

### Install dependencies

```bash
yarn install
```

### Build

```bash
yarn build
```

### Clean

```bash
yarn clean
```

### Lint

```bash
yarn lint
```

### Migrations

Create a new migration: `yarn migrate:create --name MigrationNameInCamelCase`.

Run migrations: `yarn migrate:run`.

Revert the most recent migration (CAUTION: may result in data loss!): `yarn migrate:revert`.

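
If you are new to TypeORM migrations, here is a minimal sketch of what a filled-in `up`/`down` pair might look like. The class name, table, and columns below are hypothetical (the numeric suffix is added automatically when the migration file is generated); the existing files in the **migrations/** directory are the authoritative reference.

```typescript
import { MigrationInterface, QueryRunner, Table } from 'typeorm';

// Hypothetical migration: the table, columns, and class name are examples only.
export class CreateExampleEvents1549000000000 implements MigrationInterface {
    public async up(queryRunner: QueryRunner): Promise<any> {
        await queryRunner.createTable(
            new Table({
                name: 'raw.example_events',
                columns: [
                    { name: 'block_number', type: 'bigint', isPrimary: true },
                    { name: 'maker_asset_type', type: 'varchar' },
                ],
            }),
        );
    }

    public async down(queryRunner: QueryRunner): Promise<any> {
        // Mirror `up` so the migration can be cleanly reverted.
        await queryRunner.dropTable('raw.example_events');
    }
}
```
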
## Testing

There are several test scripts in **package.json**. You can run all the tests
with `yarn test:all` or run certain tests separately by following the
instructions below. Some tests may not work out of the box on certain platforms
or operating systems (see the "Database tests" section below).

### Unit tests

The unit tests can be run with `yarn test`. These tests don't depend on any
services or databases and will run in any environment that can run Node.

### Database tests

Database integration tests can be run with `yarn test:db`. These tests will
attempt to automatically spin up a Postgres database via Docker. If this
doesn't work, you have two other options:

1. Set the `DOCKER_SOCKET` environment variable to a valid socket path to use
   for communicating with Docker.
2. Start Postgres manually and set the `ZEROEX_DATA_PIPELINE_TEST_DB_URL`
   environment variable (see the example below). If this is set, the tests
   will use your existing Postgres database instead of trying to create one
   with Docker.

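
For example, to run the database tests against an existing local Postgres instance instead of Docker (the connection URL below assumes the default `postgres` user and database; adjust it for your setup):

```bash
export ZEROEX_DATA_PIPELINE_TEST_DB_URL=postgresql://postgres@localhost/postgres
yarn test:db
```
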
## Running locally

`pipeline` requires access to a PostgreSQL database. The easiest way to start
Postgres is via Docker. Depending on your platform, you may need to prepend
`sudo` to the following command:

```bash
docker run --rm -d -p 5432:5432 --name pipeline_postgres postgres:11-alpine
```

This will start a Postgres server with the default username and database name
(`postgres` and `postgres`). You should set the environment variable as follows:

```bash
export ZEROEX_DATA_PIPELINE_DB_URL=postgresql://postgres@localhost/postgres
```

The first thing you will need to do is run the migrations:

```bash
yarn migrate:run
```

Now you can run scripts locally:

```bash
node packages/pipeline/lib/src/scripts/pull_radar_relay_orders.js
```

To stop the Postgres server (you may need to add `sudo`):

```bash
docker stop pipeline_postgres
```

Because the container was started with `--rm`, stopping it will remove all data
from the database.

If you prefer, you can also install Postgres with e.g.,
[Homebrew](https://wiki.postgresql.org/wiki/Homebrew) or
[Postgres.app](https://postgresapp.com/). Keep in mind that you will need to
set the `ZEROEX_DATA_PIPELINE_DB_URL` environment variable to a valid
[PostgreSQL connection URL](https://stackoverflow.com/questions/3582552/postgresql-connection-url).

## Directory structure

```
.
├── lib: Code generated by the TypeScript compiler. Don't edit this directly.
├── migrations: Code for creating and updating database schemas.
├── node_modules: Installed dependencies. Don't edit this directly.
├── src: All TypeScript source code.
│   ├── data_sources: Code responsible for getting raw data, typically from a third-party source.
│   ├── entities: TypeORM entities which closely mirror our database schemas. Some other ORMs call these "models".
│   ├── parsers: Code for converting raw data into entities.
│   ├── scripts: Executable scripts which put all the pieces together.
│   └── utils: Various utils used across packages/files.
└── test: All tests go here and are organized in the same way as the folder/file that they test.
```

## Adding new data to the pipeline

1. Create an entity in the **entities/** directory. Entities directly mirror our
   database schemas. We follow the practice of having "dumb" entities, so
   entity classes should typically not have any methods.
2. Create a migration using the `yarn migrate:create` command. Create/update
   tables as needed. Remember to fill in both the `up` and `down` methods. Try
   to avoid data loss as much as possible in your migrations.
3. Add basic tests for your entity and migrations to the **test/entities/**
   directory.
4. Create a class or function in the **data_sources/** directory for getting
   raw data. This code should abstract away pagination and rate-limiting as
   much as possible.
5. Create a class or function in the **parsers/** directory for converting the
   raw data into an entity. Also add tests in the **test/** directory to test
   the parser.
6. Create an executable script in the **scripts/** directory for putting
   everything together (see the sketch after this list). Your script can accept
   environment variables for things like API keys. It should pull the data,
   parse it, and save it to the database. Scripts should be idempotent and
   atomic (when possible). This means your script may be responsible for
   determining _which_ data needs to be updated. For example, you may need to
   query the database to find the most recent block number that we have already
   pulled, then pull new data starting from that block number.
7. Run the migrations and then run your new script locally and verify it works
   as expected.
8. After all tests pass and you can run the script locally, open a new PR to
   the monorepo. Don't merge this yet!
9. If you added any new scripts or dependencies between scripts, you will need
   to make changes to https://github.com/0xProject/0x-pipeline-orchestration
   and open a separate PR there. Don't merge this yet!
10. After your PR passes code review, ask @feuGeneA or @xianny to deploy your
    changes to the QA environment. Check the [QA Airflow dashboard](http://airflow-qa.0x.org:8080)
    to make sure everything works correctly in the QA environment.
11. Merge your PR to 0x-monorepo (and
    https://github.com/0xProject/0x-pipeline-orchestration if needed). Then ask
    @feuGeneA or @xianny to deploy to production.
12. Monitor the [production Airflow dashboard](http://airflow.0x.org:8080) to
    make sure everything still works.
13. Celebrate! :tada:

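
The following is a rough sketch of what a script (step 6) might look like. `ExampleEvent`, `ExampleSource`, and `parseExampleEvents` are hypothetical stand-ins for a real entity, data source, and parser, and the existing files in **scripts/** are the authoritative reference for how connections are actually set up:

```typescript
// Hypothetical sketch: `ExampleEvent`, `ExampleSource`, and `parseExampleEvents`
// stand in for a real entity, data source, and parser.
import 'reflect-metadata';
import { createConnection } from 'typeorm';

import { ExampleSource } from '../data_sources/example';
import { ExampleEvent } from '../entities';
import { parseExampleEvents } from '../parsers/example';

(async () => {
    // Use the same connection URL environment variable described above.
    const connection = await createConnection({
        type: 'postgres',
        url: process.env.ZEROEX_DATA_PIPELINE_DB_URL,
        entities: [ExampleEvent],
    });
    const repository = connection.getRepository(ExampleEvent);

    // Idempotency: find the most recent block we have already pulled and only
    // request data starting from the next block.
    const result = await repository
        .createQueryBuilder('event')
        .select('MAX(event.blockNumber)', 'max')
        .getRawOne();
    const startBlock = result && result.max !== null ? Number(result.max) + 1 : 0;

    const source = new ExampleSource();
    const rawEvents = await source.getEventsAsync(startBlock);
    const events = rawEvents.map(parseExampleEvents);

    // Save everything in one call to keep the script as close to atomic as possible.
    await repository.save(events);
    await connection.close();
})().catch(err => {
    console.error(err);
    process.exit(1);
});
```
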
#### Additional guidelines and tips

- Table names should be plural and separated by underscores (e.g.,
  `exchange_fill_events`).
- Any table which contains data that comes directly from a third-party source
  should be namespaced in the `raw` PostgreSQL schema.
- Column names in the database should be separated by underscores (e.g.,
  `maker_asset_type`).
- Field names in entity classes (like any other fields in TypeScript) should
  be camel-cased (e.g., `makerAssetType`).
- All timestamps should be stored as milliseconds since the Unix Epoch.
- Use the `BigNumber` type for TypeScript code which deals with 256-bit
  numbers from smart contracts or for any case where we are dealing with large
  floating point numbers.
- The [TypeORM documentation](http://typeorm.io/#/) is pretty robust and can be
  a helpful resource.
- Scripts/parsers should perform minimal data transformation/normalization.
  The idea here is to have a raw data feed that will be cleaned up and
  synthesized in a separate step.
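
To tie several of these guidelines together, here is a hypothetical entity sketch. The table and field names are made up and the `BigNumber` import path is assumed; the existing files in **src/entities/** are the authoritative reference:

```typescript
import { BigNumber } from '@0xproject/utils'; // assumed import path for BigNumber
import { Column, Entity, PrimaryColumn } from 'typeorm';

// Hypothetical entity: lives in the `raw` schema, uses snake_case column names
// in the database, and camelCase field names in TypeScript.
@Entity({ name: 'example_fill_events', schema: 'raw' })
export class ExampleFillEvent {
    @PrimaryColumn({ name: 'block_number', type: 'integer' })
    public blockNumber!: number;

    @Column({ name: 'maker_asset_type', type: 'varchar' })
    public makerAssetType!: string;

    // 256-bit on-chain values use BigNumber; persisting them as `numeric`
    // typically requires a column transformer.
    @Column({ name: 'maker_asset_amount', type: 'numeric' })
    public makerAssetAmount!: BigNumber;
}
```
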