Rename erc20_transfer to token_transfer

medvedev1088 committed Aug 4, 2018
1 parent dc6359c commit 74882c5

Showing 22 changed files with 110 additions and 110 deletions.
44 changes: 22 additions & 22 deletions README.md
@@ -10,11 +10,11 @@ Export blocks and transactions ([Reference](#export_blocks_and_transactionspy)):
--provider-uri https://mainnet.infura.io/ --blocks-output blocks.csv --transactions-output transactions.csv
```

Export ERC20 transfers ([Reference](#export_erc20_transferspy)):
Export ERC20 transfers ([Reference](#export_token_transferspy)):

```bash
> python export_erc20_transfers.py --start-block 0 --end-block 500000 \
--provider-uri file://$HOME/Library/Ethereum/geth.ipc --output erc20_transfers.csv
> python export_token_transfers.py --start-block 0 --end-block 500000 \
--provider-uri file://$HOME/Library/Ethereum/geth.ipc --output token_transfers.csv
```

Export receipts and logs ([Reference](#export_receipts_and_logspy)):
@@ -31,7 +31,7 @@ Read this article https://medium.com/@medvedev1088/exporting-and-analyzing-ether
- [Schema](#schema)
- [blocks.csv](#blockscsv)
- [transactions.csv](#transactionscsv)
- [erc20_transfers.csv](#erc20_transferscsv)
- [token_transfers.csv](#token_transferscsv)
- [receipts.csv](#receiptscsv)
- [logs.csv](#logscsv)
- [contracts.csv](#contractscsv)
@@ -84,7 +84,7 @@ tx_gas | bigint |
tx_gas_price | bigint |
tx_input | hex_string |

### erc20_transfers.csv
### token_transfers.csv

Column | Type |
--------------------|-------------|
@@ -194,7 +194,7 @@ there is no need to wait until the full sync as the state is not needed.
...
output/transactions/start_block=00000000/end_block=00099999/transactions_00000000_00099999.csv
...
output/erc20_transfers/start_block=00000000/end_block=00099999/erc20_transfers_00000000_00099999.csv
output/token_transfers/start_block=00000000/end_block=00099999/token_transfers_00000000_00099999.csv
...
```
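
The `start_block`/`end_block` partition directories are zero-padded to eight digits so that lexicographic and numeric ordering agree. A minimal sketch of how such a path can be built (the `partition_path` helper is illustrative, not part of the repo):

```python
# Illustrative sketch: build a Hive-style partition path with zero-padded
# block ranges, matching the layout shown above (the helper name is hypothetical).
def partition_path(entity, start_block, end_block):
    return (
        f"output/{entity}/start_block={start_block:08d}/end_block={end_block:08d}/"
        f"{entity}_{start_block:08d}_{end_block:08d}.csv"
    )

print(partition_path("token_transfers", 0, 99999))
# output/token_transfers/start_block=00000000/end_block=00099999/token_transfers_00000000_00099999.csv
```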

@@ -226,8 +226,8 @@ Additional steps:
#### Command Reference

- [export_blocks_and_transactions.py](#export_blocks_and_transactionspy)
- [export_erc20_transfers.py](#export_erc20_transferspy)
- [extract_erc20_transfers.py](#extract_erc20_transferspy)
- [export_token_transfers.py](#export_token_transferspy)
- [extract_token_transfers.py](#extract_token_transferspy)
- [export_receipts_and_logs.py](#export_receipts_and_logspy)
- [export_contracts.py](#export_contractspy)
- [export_erc20_tokens.py](#export_erc20_tokenspy)
@@ -260,21 +260,21 @@ Omit `--blocks-output` or `--transactions-output` options if you want to export

You can tune `--batch-size`, `--max-workers` for performance.

##### export_erc20_transfers.py
##### export_token_transfers.py

The API used in this command is not supported by Infura, so you will need a local node.
If you want to use Infura for exporting ERC20 transfers refer to [extract_erc20_transfers.py](#extract_erc20_transferspy)
If you want to use Infura for exporting ERC20 transfers, refer to [extract_token_transfers.py](#extract_token_transferspy) instead.

```bash
> python export_erc20_transfers.py --start-block 0 --end-block 500000 \
--provider-uri file://$HOME/Library/Ethereum/geth.ipc --batch-size 100 --output erc20_transfers.csv
> python export_token_transfers.py --start-block 0 --end-block 500000 \
--provider-uri file://$HOME/Library/Ethereum/geth.ipc --batch-size 100 --output token_transfers.csv
```

Include `--tokens <token1> <token2>` to export transfers for the specified tokens only, e.g.

```bash
> python export_erc20_transfers.py --start-block 0 --end-block 500000 --provider-uri file://$HOME/Library/Ethereum/geth.ipc \
--output erc20_transfers.csv --tokens 0x86fa049857e0209aa7d9e616f7eb3b3b78ecfdb0 0x06012c8cf97bead5deae237070f9587f8e7a266d
> python export_token_transfers.py --start-block 0 --end-block 500000 --provider-uri file://$HOME/Library/Ethereum/geth.ipc \
--output token_transfers.csv --tokens 0x86fa049857e0209aa7d9e616f7eb3b3b78ecfdb0 0x06012c8cf97bead5deae237070f9587f8e7a266d
```

You can tune `--batch-size`, `--max-workers` for performance.
@@ -302,14 +302,14 @@ You can tune `--batch-size`, `--max-workers` for performance.
Upvote this feature request https://github.com/paritytech/parity/issues/9075; it will make receipts and logs export much faster.

##### extract_erc20_transfers.py
##### extract_token_transfers.py

First export receipt logs with [export_receipts_and_logs.py](#export_receipts_and_logspy).

Then extract transfers from the logs.csv file:

```bash
> python extract_erc20_transfers.py --logs logs.csv --output erc20_transfers.csv
> python extract_token_transfers.py --logs logs.csv --output token_transfers.csv
```

You can tune `--batch-size`, `--max-workers` for performance.
@@ -334,11 +334,11 @@ You can tune `--batch-size`, `--max-workers` for performance.

##### export_erc20_tokens.py

First extract token addresses from `erc20_transfers.csv`
(Exported with [export_erc20_transfers.py](#export_erc20_transferspy)):
First extract token addresses from `token_transfers.csv`
(exported with [export_token_transfers.py](#export_token_transferspy)):

```bash
> python extract_csv_column.py -i erc20_transfers.csv -c erc20_token -o - | sort | uniq > erc20_token_addresses.csv
> python extract_csv_column.py -i token_transfers.csv -c erc20_token -o - | sort | uniq > erc20_token_addresses.csv
```
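
The same column extraction and deduplication can be sketched in plain Python, assuming the `erc20_token` column name shown in the schema above (an illustration, not a repo script):

```python
# Illustrative alternative to the extract_csv_column.py | sort | uniq pipeline:
# collect the distinct values of the erc20_token column from token_transfers.csv.
import csv

with open('token_transfers.csv', newline='') as transfers_file:
    token_addresses = sorted({row['erc20_token'] for row in csv.DictReader(transfers_file)})

with open('erc20_token_addresses.csv', 'w') as out_file:
    out_file.write('\n'.join(token_addresses) + '\n')
```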

Then export ERC20 tokens:
@@ -390,7 +390,7 @@ CREATE DATABASE ethereumetl;
- Create the tables:
- blocks: [schemas/aws/blocks.sql](schemas/aws/blocks.sql)
- transactions: [schemas/aws/transactions.sql](schemas/aws/transactions.sql)
- erc20_transfers: [schemas/aws/erc20_transfers.sql](schemas/aws/erc20_transfers.sql)
- token_transfers: [schemas/aws/token_transfers.sql](schemas/aws/token_transfers.sql)
- contracts: [schemas/aws/contracts.sql](schemas/aws/contracts.sql)
- receipts: [schemas/aws/receipts.sql](schemas/aws/receipts.sql)
- logs: [schemas/aws/logs.sql](schemas/aws/logs.sql)
@@ -403,7 +403,7 @@ Read this article on how to convert CSVs to Parquet https://medium.com/@medvedev
- Create the tables:
- parquet_blocks: [schemas/aws/parquet/parquet_blocks.sql](schemas/aws/parquet/parquet_blocks.sql)
- parquet_transactions: [schemas/aws/parquet/parquet_transactions.sql](schemas/aws/parquet/parquet_transactions.sql)
- parquet_erc20_transfers: [schemas/aws/parquet/parquet_erc20_transfers.sql](schemas/aws/parquet/parquet_erc20_transfers.sql)
- parquet_token_transfers: [schemas/aws/parquet/parquet_token_transfers.sql](schemas/aws/parquet/parquet_token_transfers.sql)

Note that the DECIMAL type is limited to 38 digits in Hive (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-decimal),
so values with more than 38 digits will be null; ERC20 values are uint256 and can be up to 78 digits long.
@@ -433,7 +433,7 @@ To upload CSVs to BigQuery:
> cd ethereum-etl
> bq --location=US load --replace --source_format=CSV --skip_leading_rows=1 ethereum.blocks gs://<your_bucket>/ethereumetl/export/blocks/*.csv ./schemas/gcp/blocks.json
> bq --location=US load --replace --source_format=CSV --skip_leading_rows=1 ethereum.transactions gs://<your_bucket>/ethereumetl/export/transactions/*.csv ./schemas/gcp/transactions.json
> bq --location=US load --replace --source_format=CSV --skip_leading_rows=1 ethereum.erc20_transfers gs://<your_bucket>/ethereumetl/export/erc20_transfers/*.csv ./schemas/gcp/erc20_transfers.json
> bq --location=US load --replace --source_format=CSV --skip_leading_rows=1 ethereum.token_transfers gs://<your_bucket>/ethereumetl/export/token_transfers/*.csv ./schemas/gcp/token_transfers.json
> bq --location=US load --replace --source_format=CSV --skip_leading_rows=1 ethereum.receipts gs://<your_bucket>/ethereumetl/export/receipts/*.csv ./schemas/gcp/receipts.json
> bq --location=US load --replace --source_format=NEWLINE_DELIMITED_JSON ethereum.logs gs://<your_bucket>/ethereumetl/export/logs/*.json ./schemas/gcp/logs.json
> bq --location=US load --replace --source_format=NEWLINE_DELIMITED_JSON ethereum.contracts gs://<your_bucket>/ethereumetl/export/contracts/*.json ./schemas/gcp/contracts.json

ethereumetl/domain/erc20_transfer.py → ethereumetl/domain/token_transfer.py
@@ -21,7 +21,7 @@
# SOFTWARE.


class EthErc20Transfer(object):
class EthTokenTransfer(object):
def __init__(self):
self.erc20_token = None
self.erc20_from = None
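
The rest of the renamed domain class is collapsed in this diff. Judging by the fields written out by the mapper further down, it presumably looks roughly like this (a reconstruction, not the verbatim file):

```python
# Hedged reconstruction of the full renamed domain object; the field names are
# taken from EthTokenTransferMapper.token_transfer_to_dict shown below.
class EthTokenTransfer(object):
    def __init__(self):
        self.erc20_token = None
        self.erc20_from = None
        self.erc20_to = None
        self.erc20_value = None
        self.erc20_tx_hash = None
        self.erc20_log_index = None
        self.erc20_block_number = None
```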

ethereumetl/jobs/export_erc20_transfers_job.py → ethereumetl/jobs/export_token_transfers_job.py
@@ -22,13 +22,13 @@

from ethereumetl.executors.batch_work_executor import BatchWorkExecutor
from ethereumetl.jobs.base_job import BaseJob
from ethereumetl.mappers.erc20_transfer_mapper import EthErc20TransferMapper
from ethereumetl.mappers.token_transfer_mapper import EthTokenTransferMapper
from ethereumetl.mappers.receipt_log_mapper import EthReceiptLogMapper
from ethereumetl.service.erc20_transfer_extractor import EthErc20TransferExtractor, TRANSFER_EVENT_TOPIC
from ethereumetl.service.token_transfer_extractor import EthTokenTransferExtractor, TRANSFER_EVENT_TOPIC
from ethereumetl.utils import validate_range


class ExportErc20TransfersJob(BaseJob):
class ExportTokenTransfersJob(BaseJob):
def __init__(
self,
start_block,
@@ -49,8 +49,8 @@ def __init__(
self.batch_work_executor = BatchWorkExecutor(batch_size, max_workers)

self.receipt_log_mapper = EthReceiptLogMapper()
self.erc20_transfer_mapper = EthErc20TransferMapper()
self.erc20_transfer_extractor = EthErc20TransferExtractor()
self.token_transfer_mapper = EthTokenTransferMapper()
self.token_transfer_extractor = EthTokenTransferExtractor()

def _start(self):
self.item_exporter.open()
@@ -78,9 +78,9 @@ def _export_batch(self, block_number_batch):
events = event_filter.get_all_entries()
for event in events:
log = self.receipt_log_mapper.web3_dict_to_receipt_log(event)
erc20_transfer = self.erc20_transfer_extractor.extract_transfer_from_log(log)
if erc20_transfer is not None:
self.item_exporter.export_item(self.erc20_transfer_mapper.erc20_transfer_to_dict(erc20_transfer))
token_transfer = self.token_transfer_extractor.extract_transfer_from_log(log)
if token_transfer is not None:
self.item_exporter.export_item(self.token_transfer_mapper.token_transfer_to_dict(token_transfer))

self.web3.eth.uninstallFilter(event_filter.filter_id)

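The creation of `event_filter` is collapsed above. With the web3 filter API the job presumably builds it from the block batch, `TRANSFER_EVENT_TOPIC`, and the optional token address list; a sketch under that assumption (the `self.tokens` attribute name is assumed, not confirmed by the visible diff):

```python
# Hedged sketch of the collapsed part of ExportTokenTransfersJob._export_batch:
# install a log filter for ERC20 Transfer events over the block batch,
# optionally narrowed to the given token contract addresses.
def _export_batch(self, block_number_batch):
    filter_params = {
        'fromBlock': block_number_batch[0],
        'toBlock': block_number_batch[-1],
        'topics': [TRANSFER_EVENT_TOPIC],
    }
    if self.tokens:
        # self.tokens is assumed to hold the optional --tokens address list
        filter_params['address'] = self.tokens
    event_filter = self.web3.eth.filter(filter_params)
    events = event_filter.get_all_entries()
    ...
```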

ethereumetl/jobs/exporters/erc20_transfers_item_exporter.py → ethereumetl/jobs/exporters/token_transfers_item_exporter.py
@@ -34,12 +34,12 @@
]


def erc20_transfers_item_exporter(erc20_transfer_output):
def token_transfers_item_exporter(token_transfer_output):
return CompositeItemExporter(
filename_mapping={
'erc20_transfer': erc20_transfer_output
'token_transfer': token_transfer_output
},
field_mapping={
'erc20_transfer': FIELDS_TO_EXPORT
'token_transfer': FIELDS_TO_EXPORT
}
)
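
The `FIELDS_TO_EXPORT` list itself is collapsed above; judging by the keys produced by `EthTokenTransferMapper.token_transfer_to_dict`, it presumably contains the following names (a reconstruction, not the verbatim file):

```python
# Hedged reconstruction of the collapsed FIELDS_TO_EXPORT list; the names mirror
# the dictionary keys emitted by the token transfer mapper below.
FIELDS_TO_EXPORT = [
    'erc20_token',
    'erc20_from',
    'erc20_to',
    'erc20_value',
    'erc20_tx_hash',
    'erc20_log_index',
    'erc20_block_number',
]
```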

ethereumetl/jobs/extract_erc20_transfers_job.py → ethereumetl/jobs/extract_token_transfers_job.py
@@ -22,12 +22,12 @@

from ethereumetl.executors.batch_work_executor import BatchWorkExecutor
from ethereumetl.jobs.base_job import BaseJob
from ethereumetl.mappers.erc20_transfer_mapper import EthErc20TransferMapper
from ethereumetl.mappers.token_transfer_mapper import EthTokenTransferMapper
from ethereumetl.mappers.receipt_log_mapper import EthReceiptLogMapper
from ethereumetl.service.erc20_transfer_extractor import EthErc20TransferExtractor
from ethereumetl.service.token_transfer_extractor import EthTokenTransferExtractor


class ExtractErc20TransfersJob(BaseJob):
class ExtractTokenTransfersJob(BaseJob):
def __init__(
self,
logs_iterable,
@@ -40,8 +40,8 @@ def __init__(
self.item_exporter = item_exporter

self.receipt_log_mapper = EthReceiptLogMapper()
self.erc20_transfer_mapper = EthErc20TransferMapper()
self.erc20_transfer_extractor = EthErc20TransferExtractor()
self.token_transfer_mapper = EthTokenTransferMapper()
self.token_transfer_extractor = EthTokenTransferExtractor()

def _start(self):
self.item_exporter.open()
@@ -55,9 +55,9 @@ def _extract_transfers(self, log_dicts):

def _extract_transfer(self, log_dict):
log = self.receipt_log_mapper.dict_to_receipt_log(log_dict)
erc20_transfer = self.erc20_transfer_extractor.extract_transfer_from_log(log)
if erc20_transfer is not None:
self.item_exporter.export_item(self.erc20_transfer_mapper.erc20_transfer_to_dict(erc20_transfer))
token_transfer = self.token_transfer_extractor.extract_transfer_from_log(log)
if token_transfer is not None:
self.item_exporter.export_item(self.token_transfer_mapper.token_transfer_to_dict(token_transfer))

def _end(self):
self.batch_work_executor.shutdown()

ethereumetl/mappers/erc20_transfer_mapper.py → ethereumetl/mappers/token_transfer_mapper.py
@@ -21,15 +21,15 @@
# SOFTWARE.


class EthErc20TransferMapper(object):
def erc20_transfer_to_dict(self, erc20_transfer):
class EthTokenTransferMapper(object):
def token_transfer_to_dict(self, token_transfer):
return {
'type': 'erc20_transfer',
'erc20_token': erc20_transfer.erc20_token,
'erc20_from': erc20_transfer.erc20_from,
'erc20_to': erc20_transfer.erc20_to,
'erc20_value': erc20_transfer.erc20_value,
'erc20_tx_hash': erc20_transfer.erc20_tx_hash,
'erc20_log_index': erc20_transfer.erc20_log_index,
'erc20_block_number': erc20_transfer.erc20_block_number,
'type': 'token_transfer',
'erc20_token': token_transfer.erc20_token,
'erc20_from': token_transfer.erc20_from,
'erc20_to': token_transfer.erc20_to,
'erc20_value': token_transfer.erc20_value,
'erc20_tx_hash': token_transfer.erc20_tx_hash,
'erc20_log_index': token_transfer.erc20_log_index,
'erc20_block_number': token_transfer.erc20_block_number,
}

ethereumetl/service/erc20_transfer_extractor.py → ethereumetl/service/token_transfer_extractor.py
@@ -24,15 +24,15 @@
import logging
from builtins import map

from ethereumetl.domain.erc20_transfer import EthErc20Transfer
from ethereumetl.domain.token_transfer import EthTokenTransfer
from ethereumetl.utils import chunk_string, hex_to_dec, to_normalized_address

# https://ethereum.stackexchange.com/questions/12553/understanding-logs-and-log-blooms
TRANSFER_EVENT_TOPIC = '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef'
logger = logging.getLogger(__name__)


class EthErc20TransferExtractor(object):
class EthTokenTransferExtractor(object):
def extract_transfer_from_log(self, receipt_log):

topics = receipt_log.topics
@@ -50,15 +50,15 @@ def extract_transfer_from_log(self, receipt_log):
.format(receipt_log.log_index, receipt_log.transaction_hash))
return None

erc20_transfer = EthErc20Transfer()
erc20_transfer.erc20_token = to_normalized_address(receipt_log.address)
erc20_transfer.erc20_from = word_to_address(topics_with_data[1])
erc20_transfer.erc20_to = word_to_address(topics_with_data[2])
erc20_transfer.erc20_value = hex_to_dec(topics_with_data[3])
erc20_transfer.erc20_tx_hash = receipt_log.transaction_hash
erc20_transfer.erc20_log_index = receipt_log.log_index
erc20_transfer.erc20_block_number = receipt_log.block_number
return erc20_transfer
token_transfer = EthTokenTransfer()
token_transfer.erc20_token = to_normalized_address(receipt_log.address)
token_transfer.erc20_from = word_to_address(topics_with_data[1])
token_transfer.erc20_to = word_to_address(topics_with_data[2])
token_transfer.erc20_value = hex_to_dec(topics_with_data[3])
token_transfer.erc20_tx_hash = receipt_log.transaction_hash
token_transfer.erc20_log_index = receipt_log.log_index
token_transfer.erc20_block_number = receipt_log.block_number
return token_transfer

return None

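Two collapsed details are worth spelling out: `TRANSFER_EVENT_TOPIC` is the keccak-256 hash of the `Transfer(address,address,uint256)` event signature, and `topics_with_data` is presumably the indexed topics followed by the 32-byte words of the log's data field. A sketch under those assumptions (the `split_to_words` helper and the `receipt_log.data` attribute are illustrative, not confirmed by the visible diff):

```python
# The first log topic of an ERC20 Transfer event is the keccak-256 hash of its
# signature; this reproduces the TRANSFER_EVENT_TOPIC constant above.
from web3 import Web3

assert Web3.sha3(text='Transfer(address,address,uint256)').hex() == \
    '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef'

# Hedged sketch of how topics_with_data may be assembled: indexed topics first,
# then the unindexed data split into 32-byte (64 hex character) words using the
# chunk_string helper imported at the top of this file.
def split_to_words(data):
    if data and len(data) > 2:
        return list(chunk_string(data[2:], 64))  # strip the '0x' prefix
    return []

topics_with_data = topics + split_to_words(receipt_log.data)
```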
16 changes: 8 additions & 8 deletions export_all.sh
@@ -96,14 +96,14 @@ for (( batch_start_block=$start_block; batch_start_block <= $end_block; batch_st
python3 export_blocks_and_transactions.py --start-block=${batch_start_block} --end-block=${batch_end_block} --provider-uri="${provider_uri}" --batch-size=${export_blocks_batch_size} --blocks-output=${blocks_file} --transactions-output=${transactions_file}
quit_if_returned_error

### erc20_transfers
### token_transfers

erc20_transfers_output_dir=${output_dir}/erc20_transfers${partition_dir}
mkdir -p ${erc20_transfers_output_dir};
token_transfers_output_dir=${output_dir}/token_transfers${partition_dir}
mkdir -p ${token_transfers_output_dir};

erc20_transfers_file=${erc20_transfers_output_dir}/erc20_transfers_${file_name_suffix}.csv
log "Exporting ERC20 transfers from blocks ${block_range} to ${erc20_transfers_file}"
python3 export_erc20_transfers.py --start-block=${batch_start_block} --end-block=${batch_end_block} --provider-uri="${provider_uri}" --batch-size=${export_erc20_batch_size} --output=${erc20_transfers_file}
token_transfers_file=${token_transfers_output_dir}/token_transfers_${file_name_suffix}.csv
log "Exporting ERC20 transfers from blocks ${block_range} to ${token_transfers_file}"
python3 export_token_transfers.py --start-block=${batch_start_block} --end-block=${batch_end_block} --provider-uri="${provider_uri}" --batch-size=${export_erc20_batch_size} --output=${token_transfers_file}
quit_if_returned_error

### receipts_and_logs
@@ -152,8 +152,8 @@ for (( batch_start_block=$start_block; batch_start_block <= $end_block; batch_st
mkdir -p ${erc20_token_addresses_output_dir}

erc20_token_addresses_file=${erc20_token_addresses_output_dir}/erc20_token_addresses_${file_name_suffix}
log "Extracting erc20_token_address from erc20_token_transfers file ${erc20_transfers_file}"
python3 extract_csv_column.py -i ${erc20_transfers_file} -c erc20_token -o - | sort | uniq > ${erc20_token_addresses_file}
log "Extracting erc20_token_address from erc20_token_transfers file ${token_transfers_file}"
python3 extract_csv_column.py -i ${token_transfers_file} -c erc20_token -o - | sort | uniq > ${erc20_token_addresses_file}
quit_if_returned_error

erc20_tokens_output_dir=${output_dir}/erc20_tokens${partition_dir}
8 changes: 4 additions & 4 deletions export_erc20_transfers.py → export_token_transfers.py
@@ -25,8 +25,8 @@

from web3 import Web3

from ethereumetl.jobs.export_erc20_transfers_job import ExportErc20TransfersJob
from ethereumetl.jobs.exporters.erc20_transfers_item_exporter import erc20_transfers_item_exporter
from ethereumetl.jobs.export_token_transfers_job import ExportTokenTransfersJob
from ethereumetl.jobs.exporters.token_transfers_item_exporter import token_transfers_item_exporter
from ethereumetl.logging_utils import logging_basic_config
from ethereumetl.providers.auto import get_provider_from_uri
from ethereumetl.thread_local_proxy import ThreadLocalProxy
@@ -48,12 +48,12 @@

args = parser.parse_args()

job = ExportErc20TransfersJob(
job = ExportTokenTransfersJob(
start_block=args.start_block,
end_block=args.end_block,
batch_size=args.batch_size,
web3=ThreadLocalProxy(lambda: Web3(get_provider_from_uri(args.provider_uri))),
item_exporter=erc20_transfers_item_exporter(args.output),
item_exporter=token_transfers_item_exporter(args.output),
max_workers=args.max_workers,
tokens=args.tokens)

8 changes: 4 additions & 4 deletions extract_erc20_transfers.py → extract_token_transfers.py
@@ -26,8 +26,8 @@
import json

from ethereumetl.file_utils import smart_open
from ethereumetl.jobs.exporters.erc20_transfers_item_exporter import erc20_transfers_item_exporter
from ethereumetl.jobs.extract_erc20_transfers_job import ExtractErc20TransfersJob
from ethereumetl.jobs.exporters.token_transfers_item_exporter import token_transfers_item_exporter
from ethereumetl.jobs.extract_token_transfers_job import ExtractTokenTransfersJob
from ethereumetl.logging_utils import logging_basic_config

logging_basic_config()
@@ -46,10 +46,10 @@
logs_reader = (json.loads(line) for line in logs_file)
else:
logs_reader = csv.DictReader(logs_file)
job = ExtractErc20TransfersJob(
job = ExtractTokenTransfersJob(
logs_iterable=logs_reader,
batch_size=args.batch_size,
max_workers=args.max_workers,
item_exporter=erc20_transfers_item_exporter(args.output))
item_exporter=token_transfers_item_exporter(args.output))

job.run()