Rename erc20_token to token
medvedev1088 committed Aug 4, 2018
1 parent 74882c5 commit 468b7b4
Showing 31 changed files with 192 additions and 194 deletions.
54 changes: 27 additions & 27 deletions README.md
@@ -35,7 +35,7 @@ Read this article https://medium.com/@medvedev1088/exporting-and-analyzing-ether
- [receipts.csv](#receiptscsv)
- [logs.csv](#logscsv)
- [contracts.csv](#contractscsv)
- [erc20_tokens.csv](#erc20_tokenscsv)
- [tokens.csv](#tokenscsv)
- [Exporting the Blockchain](#exporting-the-blockchain)
- [Export in 2 Hours](#export-in-2-hours)
- [Command Reference](#command-reference)
@@ -88,13 +88,13 @@ tx_input | hex_string |

Column | Type |
--------------------|-------------|
erc20_token | address |
erc20_from | address |
erc20_to | address |
erc20_value | numeric |
erc20_tx_hash | hex_string |
erc20_log_index | bigint |
erc20_block_number | bigint |
token_address | address |
from_address | address |
to_address | address |
value | numeric |
tx_hash | hex_string |
log_index | bigint |
block_number | bigint |

### receipts.csv

@@ -133,20 +133,20 @@ contract_function_sighashes | string |
contract_is_erc20 | boolean |
contract_is_erc721 | boolean |

### erc20_tokens.csv
### tokens.csv

Column | Type |
-----------------------------|-------------|
erc20_token_address | address |
erc20_token_symbol | string |
erc20_token_name | string |
erc20_token_decimals | bigint |
erc20_token_total_supply | numeric |
address | address |
symbol | string |
name | string |
decimals | bigint |
total_supply | numeric |

You can find column descriptions in [schemas/gcp](schemas/gcp)

Note: `erc20_token_symbol`, `erc20_token_name`, `erc20_token_decimals`, `erc20_token_total_supply`
columns in `erc20_tokens.csv` can have empty values in case the contract doesn't implement the corresponding methods
Note: `symbol`, `name`, `decimals`, `total_supply`
columns in `tokens.csv` can have empty values in case the contract doesn't implement the corresponding methods
or implements it incorrectly (e.g. wrong return type).
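
A consumer of `tokens.csv` has to tolerate those empty values. The following is a minimal sketch of one way to do so, assuming the renamed column headers from this commit; `parse_token_row` and the sample rows are hypothetical and not part of the repository, and converting `total_supply` with `int` assumes the value is written as a plain integer string.

```python
import csv
import io


def parse_token_row(row):
    """Convert one tokens.csv row, mapping empty strings to None."""
    def optional(value, convert=str):
        # An empty cell means the contract call failed or is not implemented.
        return convert(value) if value != "" else None

    return {
        "address": row["address"],
        "symbol": optional(row["symbol"]),
        "name": optional(row["name"]),
        "decimals": optional(row["decimals"], int),
        "total_supply": optional(row["total_supply"], int),
    }


# Hypothetical sample standing in for tokens.csv.
sample = io.StringIO(
    "address,symbol,name,decimals,total_supply\n"
    "0xaa,TKN,Token,18,1000000\n"
    "0xbb,,,,\n"
)
tokens = [parse_token_row(row) for row in csv.DictReader(sample)]
```

Keeping `None` distinct from `0` or an empty string makes it possible to tell a token with zero decimals from one whose `decimals()` call could not be read.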

Note: for the `address` type all hex characters are lower-cased.
@@ -230,7 +230,7 @@ Additional steps:
- [extract_token_transfers.py](#extract_token_transferspy)
- [export_receipts_and_logs.py](#export_receipts_and_logspy)
- [export_contracts.py](#export_contractspy)
- [export_erc20_tokens.py](#export_erc20_tokenspy)
- [export_tokens.py](#export_tokenspy)
- [get_block_range_for_date.py](#get_block_range_for_datepy)

All the commands accept `-h` parameter for help, e.g.:
@@ -332,28 +332,28 @@ Then export contracts:

You can tune `--batch-size`, `--max-workers` for performance.

##### export_erc20_tokens.py
##### export_tokens.py

First extract token addresses from `token_transfers.csv`
(Exported with [export_token_transfers.py](#export_token_transferspy)):

```bash
> python extract_csv_column.py -i token_transfers.csv -c erc20_token -o - | sort | uniq > erc20_token_addresses.csv
> python extract_csv_column.py -i token_transfers.csv -c token_address -o - | sort | uniq > token_addresses.csv
```
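
For readers without `sort` and `uniq` available, the same extraction step can be approximated in Python. This is a sketch under assumptions, not the repository's `extract_csv_column.py`: `unique_column_values` is a hypothetical helper, the sample data is made up, and the column name `token_address` is the renamed one from this commit.

```python
import csv
import io


def unique_column_values(csv_file, column):
    """Return the sorted, de-duplicated non-empty values of one CSV column."""
    reader = csv.DictReader(csv_file)
    return sorted({row[column] for row in reader if row[column]})


# Hypothetical sample standing in for token_transfers.csv.
sample = io.StringIO(
    "token_address,from_address,to_address,value\n"
    "0xbb,0x01,0x02,10\n"
    "0xaa,0x02,0x03,5\n"
    "0xbb,0x03,0x01,7\n"
)
token_addresses = unique_column_values(sample, "token_address")
print(token_addresses)
```

Collecting the values in a set holds every distinct address in memory, which is usually acceptable for token addresses but may not be for larger columns.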

Then export ERC20 tokens:

```bash
> python export_erc20_tokens.py --token-addresses erc20_token_addresses.csv \
--provider-uri file://$HOME/Library/Ethereum/geth.ipc --output erc20_tokens.csv
> python export_tokens.py --token-addresses token_addresses.csv \
--provider-uri file://$HOME/Library/Ethereum/geth.ipc --output tokens.csv
```

You can tune `--max-workers` for performance.

Note that there will be duplicate tokens across different partitions,
which need to be deduplicated (see Querying in Google BigQuery section).
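
Outside BigQuery, the duplicates can also be removed while merging partition files. The sketch below keeps the first row seen for each token address; it is only an illustration, it is not the logic of `tokens_deduplicate.sql` (which is not shown in this diff), and `deduplicate_tokens` and the sample rows are hypothetical.

```python
def deduplicate_tokens(rows):
    """Keep the first row seen for each token address, preserving order."""
    seen = {}
    for row in rows:
        seen.setdefault(row["address"], row)
    return list(seen.values())


# Hypothetical rows read from two partitions of tokens.csv.
partition_a = [{"address": "0xaa", "symbol": "TKN"}]
partition_b = [
    {"address": "0xaa", "symbol": "TKN"},
    {"address": "0xbb", "symbol": "OTH"},
]
unique_tokens = deduplicate_tokens(partition_a + partition_b)
```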

Upvote this pull request to make erc20_tokens export faster
Upvote this pull request to make tokens export faster
https://github.com/ethereum/web3.py/pull/944#issuecomment-403957468

##### get_block_range_for_date.py
@@ -394,7 +394,7 @@ CREATE DATABASE ethereumetl;
- contracts: [schemas/aws/contracts.sql](schemas/aws/contracts.sql)
- receipts: [schemas/aws/receipts.sql](schemas/aws/receipts.sql)
- logs: [schemas/aws/logs.sql](schemas/aws/logs.sql)
- erc20_tokens: [schemas/aws/erc20_tokens.sql](schemas/aws/erc20_tokens.sql)
- tokens: [schemas/aws/tokens.sql](schemas/aws/tokens.sql)

### Tables for Parquet Files

@@ -437,7 +437,7 @@ To upload CSVs to BigQuery:
> bq --location=US load --replace --source_format=CSV --skip_leading_rows=1 ethereum.receipts gs://<your_bucket>/ethereumetl/export/receipts/*.csv ./schemas/gcp/receipts.json
> bq --location=US load --replace --source_format=NEWLINE_DELIMITED_JSON ethereum.logs gs://<your_bucket>/ethereumetl/export/logs/*.json ./schemas/gcp/logs.json
> bq --location=US load --replace --source_format=NEWLINE_DELIMITED_JSON ethereum.contracts gs://<your_bucket>/ethereumetl/export/contracts/*.json ./schemas/gcp/contracts.json
> bq --location=US load --replace --source_format=CSV --skip_leading_rows=1 --allow_quoted_newlines ethereum.erc20_tokens_duplicates gs://<your_bucket>/ethereumetl/export/erc20_tokens/*.csv ./schemas/gcp/erc20_tokens.json
> bq --location=US load --replace --source_format=CSV --skip_leading_rows=1 --allow_quoted_newlines ethereum.tokens_duplicates gs://<your_bucket>/ethereumetl/export/tokens/*.csv ./schemas/gcp/tokens.json
```

Note that NEWLINE_DELIMITED_JSON is used to support REPEATED mode for the columns with lists.
@@ -449,11 +449,11 @@ Join `transactions` and `receipts`:
> bq --location=US query --replace --destination_table ethereum.transactions_join_receipts --use_legacy_sql=false "$(cat ./schemas/gcp/transactions_join_receipts.sql | tr '\n' ' ')"
```

Deduplicate `erc20_tokens`:
Deduplicate `tokens`:

```bash
> bq mk --table --description "Exported using https://github.com/medvedev1088/ethereum-etl" ethereum.erc20_tokens ./schemas/gcp/erc20_tokens.json
> bq --location=US query --replace --destination_table ethereum.erc20_tokens --use_legacy_sql=false "$(cat ./schemas/gcp/erc20_tokens_deduplicate.sql | tr '\n' ' ')"
> bq mk --table --description "Exported using https://github.com/medvedev1088/ethereum-etl" ethereum.tokens ./schemas/gcp/tokens.json
> bq --location=US query --replace --destination_table ethereum.tokens --use_legacy_sql=false "$(cat ./schemas/gcp/tokens_deduplicate.sql | tr '\n' ' ')"
```

### Public Dataset
@@ -21,10 +21,10 @@
# SOFTWARE.


class EthErc20Token(object):
class EthToken(object):
def __init__(self):
self.erc20_token_address = None
self.erc20_token_symbol = None
self.erc20_token_name = None
self.erc20_token_decimals = None
self.erc20_token_total_supply = None
self.address = None
self.symbol = None
self.name = None
self.decimals = None
self.total_supply = None
14 changes: 7 additions & 7 deletions ethereumetl/domain/token_transfer.py
@@ -23,10 +23,10 @@

class EthTokenTransfer(object):
def __init__(self):
self.erc20_token = None
self.erc20_from = None
self.erc20_to = None
self.erc20_value = None
self.erc20_tx_hash = None
self.erc20_log_index = None
self.erc20_block_number = None
self.token_address = None
self.from_address = None
self.to_address = None
self.value = None
self.tx_hash = None
self.log_index = None
self.block_number = None
@@ -23,18 +23,18 @@

from ethereumetl.executors.batch_work_executor import BatchWorkExecutor
from ethereumetl.jobs.base_job import BaseJob
from ethereumetl.mappers.erc20_token_mapper import EthErc20TokenMapper
from ethereumetl.service.erc20_token_service import EthErc20TokenService
from ethereumetl.mappers.token_mapper import EthTokenMapper
from ethereumetl.service.token_service import EthTokenService


class ExportErc20TokensJob(BaseJob):
class ExportTokensJob(BaseJob):
def __init__(self, web3, item_exporter, token_addresses_iterable, max_workers):
self.item_exporter = item_exporter
self.token_addresses_iterable = token_addresses_iterable
self.batch_work_executor = BatchWorkExecutor(1, max_workers)

self.erc20_token_service = EthErc20TokenService(web3, clean_user_provided_content)
self.erc20_token_mapper = EthErc20TokenMapper()
self.token_service = EthTokenService(web3, clean_user_provided_content)
self.token_mapper = EthTokenMapper()

def _start(self):
self.item_exporter.open()
@@ -47,8 +47,8 @@ def _export_tokens(self, token_addresses):
self._export_token(token_address)

def _export_token(self, token_address):
token = self.erc20_token_service.get_token(token_address)
token_dict = self.erc20_token_mapper.erc20_token_to_dict(token)
token = self.token_service.get_token(token_address)
token_dict = self.token_mapper.token_to_dict(token)
self.item_exporter.export_item(token_dict)

def _end(self):
14 changes: 7 additions & 7 deletions ethereumetl/jobs/exporters/token_transfers_item_exporter.py
@@ -24,13 +24,13 @@
from ethereumetl.jobs.exporters.composite_item_exporter import CompositeItemExporter

FIELDS_TO_EXPORT = [
'erc20_token',
'erc20_from',
'erc20_to',
'erc20_value',
'erc20_tx_hash',
'erc20_log_index',
'erc20_block_number'
'token_address',
'from_address',
'to_address',
'value',
'tx_hash',
'log_index',
'block_number'
]


@@ -24,20 +24,20 @@
from ethereumetl.jobs.exporters.composite_item_exporter import CompositeItemExporter

FIELDS_TO_EXPORT = [
'erc20_token_address',
'erc20_token_symbol',
'erc20_token_name',
'erc20_token_decimals',
'erc20_token_total_supply'
'address',
'symbol',
'name',
'decimals',
'total_supply'
]


def erc20_tokens_item_exporter(erc20_tokens_output):
def tokens_item_exporter(tokens_output):
return CompositeItemExporter(
filename_mapping={
'erc20_token': erc20_tokens_output
'token': tokens_output
},
field_mapping={
'erc20_token': FIELDS_TO_EXPORT
'token': FIELDS_TO_EXPORT
}
)
@@ -21,13 +21,13 @@
# SOFTWARE.


class EthErc20TokenMapper(object):
def erc20_token_to_dict(self, erc20_token):
class EthTokenMapper(object):
def token_to_dict(self, token):
return {
'type': 'erc20_token',
'erc20_token_address': erc20_token.erc20_token_address,
'erc20_token_symbol': erc20_token.erc20_token_symbol,
'erc20_token_name': erc20_token.erc20_token_name,
'erc20_token_decimals': erc20_token.erc20_token_decimals,
'erc20_token_total_supply': erc20_token.erc20_token_total_supply
'type': 'token',
'address': token.address,
'symbol': token.symbol,
'name': token.name,
'decimals': token.decimals,
'total_supply': token.total_supply
}
14 changes: 7 additions & 7 deletions ethereumetl/mappers/token_transfer_mapper.py
@@ -25,11 +25,11 @@ class EthTokenTransferMapper(object):
def token_transfer_to_dict(self, token_transfer):
return {
'type': 'token_transfer',
'erc20_token': token_transfer.erc20_token,
'erc20_from': token_transfer.erc20_from,
'erc20_to': token_transfer.erc20_to,
'erc20_value': token_transfer.erc20_value,
'erc20_tx_hash': token_transfer.erc20_tx_hash,
'erc20_log_index': token_transfer.erc20_log_index,
'erc20_block_number': token_transfer.erc20_block_number,
'token_address': token_transfer.token_address,
'from_address': token_transfer.from_address,
'to_address': token_transfer.to_address,
'value': token_transfer.value,
'tx_hash': token_transfer.tx_hash,
'log_index': token_transfer.log_index,
'block_number': token_transfer.block_number,
}
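
Files exported before this commit still carry the `erc20_`-prefixed headers. A minimal sketch for rewriting the header row of such a `token_transfers.csv` follows; the old-to-new pairs are taken from the mapper change above, while `rename_header`, `RENAMED_COLUMNS`, and the sample row are hypothetical and not part of the repository.

```python
import csv
import io

# Old-to-new column names, as paired in the mapper diff above.
RENAMED_COLUMNS = {
    "erc20_token": "token_address",
    "erc20_from": "from_address",
    "erc20_to": "to_address",
    "erc20_value": "value",
    "erc20_tx_hash": "tx_hash",
    "erc20_log_index": "log_index",
    "erc20_block_number": "block_number",
}


def rename_header(src, dst, renames=RENAMED_COLUMNS):
    """Copy a CSV from src to dst, renaming header columns found in renames."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to copy
    writer.writerow([renames.get(name, name) for name in header])
    writer.writerows(reader)


# Hypothetical file exported with the old column names.
old_csv = io.StringIO(
    "erc20_token,erc20_from,erc20_to,erc20_value,"
    "erc20_tx_hash,erc20_log_index,erc20_block_number\n"
    "0xaa,0x01,0x02,10,0xff,0,100\n"
)
new_csv = io.StringIO()
rename_header(old_csv, new_csv)
```

Only the header row is rewritten; data rows are copied unchanged, so the column order stays the same as in the original export.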
@@ -23,11 +23,11 @@

from web3.exceptions import BadFunctionCallOutput

from ethereumetl.domain.erc20_token import EthErc20Token
from ethereumetl.domain.token import EthToken
from ethereumetl.erc20_abi import ERC20_ABI


class EthErc20TokenService(object):
class EthTokenService(object):
def __init__(self, web3, function_call_result_transformer=None):
self._web3 = web3
self._function_call_result_transformer = function_call_result_transformer
@@ -41,12 +41,12 @@ def get_token(self, token_address):
decimals = self._call_contract_function(contract.functions.decimals())
total_supply = self._call_contract_function(contract.functions.totalSupply())

token = EthErc20Token()
token.erc20_token_address = token_address
token.erc20_token_symbol = symbol
token.erc20_token_name = name
token.erc20_token_decimals = decimals
token.erc20_token_total_supply = total_supply
token = EthToken()
token.address = token_address
token.symbol = symbol
token.name = name
token.decimals = decimals
token.total_supply = total_supply

return token

14 changes: 7 additions & 7 deletions ethereumetl/service/token_transfer_extractor.py
@@ -51,13 +51,13 @@ def extract_transfer_from_log(self, receipt_log):
return None

token_transfer = EthTokenTransfer()
token_transfer.erc20_token = to_normalized_address(receipt_log.address)
token_transfer.erc20_from = word_to_address(topics_with_data[1])
token_transfer.erc20_to = word_to_address(topics_with_data[2])
token_transfer.erc20_value = hex_to_dec(topics_with_data[3])
token_transfer.erc20_tx_hash = receipt_log.transaction_hash
token_transfer.erc20_log_index = receipt_log.log_index
token_transfer.erc20_block_number = receipt_log.block_number
token_transfer.token_address = to_normalized_address(receipt_log.address)
token_transfer.from_address = word_to_address(topics_with_data[1])
token_transfer.to_address = word_to_address(topics_with_data[2])
token_transfer.value = hex_to_dec(topics_with_data[3])
token_transfer.tx_hash = receipt_log.transaction_hash
token_transfer.log_index = receipt_log.log_index
token_transfer.block_number = receipt_log.block_number
return token_transfer

return None