Empty dataframe in eia_io_trading.py #251

Open
i-sayuh opened this issue Aug 13, 2024 · 19 comments

Comments

@i-sayuh

i-sayuh commented Aug 13, 2024

The dataframe 'df_ba_trade_sum' is empty when running main.py with BA (balancing authority) areas as the regional aggregation.

@m-jamieson
Collaborator

Can you provide some more details? Perhaps it would be easier to share the model config file you're using and then which version of electricitylci you're running.

@i-sayuh
Author

i-sayuh commented Aug 13, 2024

ELCI_1_config.txt
Here is the model config file in TXT format. I am using the latest version of electricitylci. Running main.py for eGRID regions works. The dataframe 'df_ba_trade' is also empty in eia_io_trading.py.

@i-sayuh
Author

i-sayuh commented Aug 13, 2024

It looks like the list 'BA_TO_BA_ROWS', which 'df_ba_trade' depends on, also remains empty in eia_io_trading.py.

@m-jamieson
Collaborator

Are you able to find the folders where electricitylci downloads the data files from EIA, etc.? I'm wondering if electricitylci/bulk_data/EBA.zip is either missing or downloading an empty version. I'd be surprised if electricitylci didn't error out if the download failed. The EBA.zip I have downloaded is from 2022 and is 457MB.

@i-sayuh
Author

i-sayuh commented Aug 14, 2024

I located the EBA.zip file, which contains only EBA.txt. Its compressed size is 321 MB. When I open the file in the Anaconda PowerShell Prompt and display the first 10 lines, they are full of arrays that look like the following: ["20230828T11-04","-642"],["20230828T10-04","-789"],["20230828T09-04","-866"]...["20181231T22-05","-442"],["20181231T21-05","-447"],["20181231T20-05","-473"]. The folders within \electricitylci\data that hold EIA data, such as \eia860_2014 and \f923_2015, appear to have all relevant EIA data successfully downloaded and stored.
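For anyone repeating this check, here is a minimal sketch of peeking at the first lines of EBA.txt without extracting the archive. The helper name `peek_zip_member` and its parameters are hypothetical and not part of electricitylci; the only things taken from this thread are the archive name EBA.zip and its single member EBA.txt. Lines are truncated because each one can hold years of hourly data.

```python
import io
import zipfile


def peek_zip_member(zip_path, member="EBA.txt", n_lines=10, max_chars=200):
    """Return the first n_lines of a text member in a zip, each truncated.

    Hypothetical helper for inspection only; not part of electricitylci.
    """
    lines = []
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member) as raw:
            # Wrap the binary stream so lines are decoded lazily.
            text = io.TextIOWrapper(raw, encoding="utf-8")
            for _ in range(n_lines):
                line = text.readline()
                if not line:
                    break  # Reached the end of the file early.
                lines.append(line.rstrip("\n")[:max_chars])
    return lines
```

Calling `peek_zip_member("EBA.zip", n_lines=3)` should return the start of the first three records, which is usually enough to see which keys each line carries.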

@m-jamieson
Collaborator

m-jamieson commented Aug 15, 2024

Thanks - I'm looking at two different vintages of the EBA now - one I downloaded yesterday and one that I've had since 2022. There are two different issues going on here:

  1. They changed the format of the file a little bit. In the 2022 era data, there's a field in the lines called geoset_id. This is where the code is currently looking for the text "keywords": EBA.NG.H or EBA.NG.HL. The new version of the bulk data file doesn't appear to have the same field. We'll likely need to change the code to do a regex search at the beginning of the lines to catch EBA.TVA-ALL.NG.COL.H for example. This is why you're getting completely empty dataframes.
  2. They appear to be rotating through the data so that the 2016 data no longer exists. I see this in a line of the 2024 EBA: "start":"20190101T00","end":"20240814T10", and in the same data line in the 2022 file I see: "start":"20180716T01-05","end":"20220616T00-05". I'm trying to go back further, but it appears I only have a May 2021 EBA.zip, which doesn't contain 2016 data unless I'm misunderstanding the data here. For reference, I ran electricitylci for the official version of the electricity baseline in May of 2020. So even if we were correctly reading the new version of the file, we wouldn't be able to get 2016 data.
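The regex idea in point 1 can be sketched as follows. This is an illustration under stated assumptions, not the change that went into electricitylci: it assumes each line of EBA.txt is a JSON object that begins with a `"series_id"` key, and that net generation series end in `.NG.H` or `.NG.HL`, optionally with a fuel code such as `COL` in between (as in EBA.TVA-ALL.NG.COL.H). The names `SERIES_RE`, `match_series_id`, and `net_generation_lines` are hypothetical.

```python
import re

# Assumed line prefix: {"series_id":"EBA.<BA>-ALL.NG[.<FUEL>].H" or ".HL"
SERIES_RE = re.compile(
    r'^\{"series_id":\s*"(EBA\.[A-Z0-9]+-ALL\.NG(?:\.[A-Z]+)?\.HL?)"'
)


def match_series_id(line):
    """Return the net generation series id at the start of a line, or None."""
    m = SERIES_RE.match(line)
    return m.group(1) if m else None


def net_generation_lines(lines):
    """Yield only the lines whose leading series id looks like net generation."""
    for line in lines:
        if match_series_id(line) is not None:
            yield line
```

Matching on the start of the line avoids parsing every record as JSON just to discard most of them, which matters for a text file of this size.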

@m-jamieson
Collaborator

Just a quick update on the available data years. It takes more than a spot-check of a couple of lines. I found data going back to 2018 in the 2024 EBA.zip, and in the May 2021 version, there are data for 2016. Now whether the older data is in the right place, we won't really find out until the script turns up empty data frames. I have no desire to parse 2+GB text files. However, it appears that the 2024-vintage EBA.zip I have only has the data we need going back to 2019.
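Rather than spot-checking a couple of lines, the coverage of a given EBA.txt can be summarized in one pass. The sketch below assumes each line is a JSON object with `series_id`, `start`, and `end` keys (the `start` and `end` strings quoted earlier in this thread begin with a four-digit year); the function names are hypothetical, and lines that do not parse are skipped.

```python
import json


def coverage_by_series(lines, prefix="EBA."):
    """Map series_id -> (start, end) for JSON lines that carry those keys."""
    coverage = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # Skip anything that is not a JSON record.
        if not isinstance(record, dict):
            continue
        sid = record.get("series_id", "")
        if sid.startswith(prefix) and "start" in record and "end" in record:
            coverage[sid] = (record["start"], record["end"])
    return coverage


def earliest_year(coverage):
    """Return the earliest start year across all series in the mapping."""
    return min(int(start[:4]) for start, _ in coverage.values())
```

Because the function accepts any iterable of lines, it can be fed a file handle opened on EBA.txt so the whole file never has to sit in memory.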

@i-sayuh
Author

i-sayuh commented Aug 15, 2024

If you run the latest version of electricitylci with the May 2021 EBA.zip for 'BA' regional aggregation, does the main code run fine, or do you run into an error with eia_io_trading.py? If it runs fine, which file do you recommend I download from EIA to replace the existing EBA.zip that the code generated?

@m-jamieson
Collaborator

I was able to run v.1.0.1 with my ~2021 era EBA.zip. Unfortunately, I'm not aware of historical EBA files being stored anywhere on EIA. I also just tried to add my 2021 file as an attachment here, but the file size limit is 25MB. I might try to filter mine down to the 2016 data needed and add it here.
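One way such a filter could look is sketched below. It assumes each line of EBA.txt is a JSON object whose `data` key holds `[timestamp, value]` pairs like the ones quoted earlier in this thread, and it selects points by the year prefix of the timestamp as written, so points near a year boundary depend on whether the file uses local or UTC stamps. The function name `keep_year` is hypothetical and this is not the script used for the official baseline.

```python
import json


def keep_year(line, year=2016):
    """Return the record with only data points from `year`, or None if empty.

    Assumes the line is a JSON object with a "data" list of
    [timestamp, value] pairs; timestamps start with a four-digit year.
    """
    record = json.loads(line)
    prefix = str(year)
    kept = [pt for pt in record.get("data", []) if str(pt[0]).startswith(prefix)]
    if not kept:
        return None  # Nothing from the requested year in this series.
    record["data"] = kept
    # Compact separators keep the filtered file as small as possible.
    return json.dumps(record, separators=(",", ":"))
```

Writing only the non-`None` results to a new EBA.txt and zipping it gives a reduced file, although the size still depends on how many series carry data for that year.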

@i-sayuh
Author

i-sayuh commented Aug 15, 2024

In the config file, I set 'replace_egrid' and 'net_trading' to False which means generation is coming from eGRID and trading does not depend on EIA data. In that case, what purpose does EIA bulk data serve? Is this bulk EIA data necessary for the code to run regardless of configuration settings? I am trying to find a way to bypass the need for the data, if possible.

@m-jamieson
Collaborator

Well, that is a separate issue. You could use EPA_eGRID_trading: True, which uses a different method entirely that doesn't depend on bulk data. You would also have to set replace_egrid: False. You'll only get consumption mixes for eGRID regions, if you're okay with that.

@m-jamieson
Collaborator

For posterity, even after removing the unneeded data from EBA.txt, I was still only able to get the zipped file size to 100MB.

@i-sayuh
Author

i-sayuh commented Aug 15, 2024

The goal is to generate the inventories (generation or consumption) at the BA area spatial level. I realize now that 'EPA_eGRID_trading' can only be used when eGRID is the aggregation level such that it must be set to False or an error occurs if set to True while BA is the chosen aggregation level. I set this to False. I also set 'net_trading' to False indicating that I am only interested in generation mixes (i.e. do not use EIA bulk data to generate consumption mixes at BA level). And, I set 'replace_egrid' to True indicating that generation should come from eGRID, and not EIA bulk for BA area. After this, we still get the original error of empty dataframes in eia_io_trading.py when running the main code for BA. It seems like the code requires this EIA data regardless of the model configuration for BA aggregation. If the issue is indeed within the EBA.zip itself, then the solution would be to obtain the 2021 zip file you have so that I can run the code successfully. I work at NREL and we can communicate externally (check email).

@i-sayuh
Author

i-sayuh commented Aug 16, 2024

Basically, the question is why does the main code for BA calculate BA level consumption mixes via EIA bulk data when it has been specified by the configuration file that trading is turned off and generation should come from eGRID?

@m-jamieson
Collaborator

If your aggregation is BA, then the script is going to force the consumption mix path to be compatible with that. I think there's a logging message somewhere along the way that says this, but I think we chose to just go ahead and run the code rather than stop execution. regional_aggregation needs to be eGRID in order to use the EPA_eGRID trading scheme. And the net_trading flag might be misleading: it's an option for the EPA_eGRID trading scheme.
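The constraints described in this comment can be written down as a small pre-flight check. This is a hypothetical helper, not electricitylci's own validation logic; it only encodes what is stated in this thread, and it assumes the model config has been loaded into a plain dict with the keys `regional_aggregation`, `EPA_eGRID_trading`, and `net_trading`.

```python
def check_trading_config(cfg):
    """Return a list of human-readable notes about a model config dict.

    Hypothetical pre-flight check based only on the rules stated in
    this issue thread; it does not mirror electricitylci internals.
    """
    notes = []
    aggregation = cfg.get("regional_aggregation")
    egrid_trading = bool(cfg.get("EPA_eGRID_trading"))

    if egrid_trading and aggregation != "eGRID":
        notes.append("EPA_eGRID_trading requires regional_aggregation: eGRID.")
    if cfg.get("net_trading") and not egrid_trading:
        notes.append("net_trading only applies to the EPA_eGRID trading scheme.")
    if aggregation == "BA":
        notes.append(
            "BA aggregation forces the consumption mix path that reads "
            "the EIA bulk data (EBA.zip)."
        )
    return notes
```

Running it on a config with `regional_aggregation: BA` returns the bulk data note regardless of the other flags, which matches the behavior reported above.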

@m-jamieson
Collaborator

As another update here, it looks like the datetime strings have changed in the updated EBA files and are now time zone agnostic (all forced to Zulu). In any case, it's going to require a change to how the datetimes are processed. Working on an update now.
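A parser that tolerates both styles seen in this thread, for example "20180716T01-05" with a UTC offset and "20190101T00" without one, might look like the sketch below. It assumes that a string with no suffix (or a trailing "Z") is already in UTC, which is how the comment above describes the newer files; the function name `parse_eba_timestamp` is hypothetical and this is not the code from the referenced commit.

```python
import re
from datetime import datetime, timedelta, timezone

# YYYYMMDDTHH, optionally followed by "Z" or a whole-hour offset like "-05".
_TS_RE = re.compile(r"^(\d{8}T\d{2})(Z|[+-]\d{2})?$")


def parse_eba_timestamp(text):
    """Parse an EBA hourly timestamp and return an aware datetime in UTC."""
    m = _TS_RE.match(text)
    if m is None:
        raise ValueError(f"Unrecognized EBA timestamp: {text!r}")
    stamp, suffix = m.groups()
    naive = datetime.strptime(stamp, "%Y%m%dT%H")
    if suffix is None or suffix == "Z":
        # Assumption: unsuffixed strings are already UTC (Zulu).
        return naive.replace(tzinfo=timezone.utc)
    # Older style: local hour plus offset, e.g. "-05" means UTC-5.
    local = naive.replace(tzinfo=timezone(timedelta(hours=int(suffix))))
    return local.astimezone(timezone.utc)
```

Normalizing everything to UTC up front means downstream code that filters by year or joins hourly series does not need to know which vintage of the file it is reading.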

m-jamieson added a commit to KeyLogicLCA/ElectricityLCI that referenced this issue Aug 19, 2024
@m-jamieson
Collaborator

As an FYI, I made some changes to the bulk data processing to address this. Still won't get 2016 data, but it can at least process a newer EBA.zip: KeyLogicLCA@3d8c244

@dt-woods
Collaborator

dt-woods commented Sep 20, 2024

It's a pity that the EIA's API does not have data that go back to 2016. Fortunately, I have a copy of EBA.txt from March 2024, that includes data back to 2015. It includes 8,784 rows for net generation, and 2.5 million rows in BA trade.

@dt-woods
Collaborator

dt-woods commented Nov 6, 2024

In 2022, the U.S. Energy Information Administration (EIA) records officer transferred the older 930 data to NARA for permanent archival. That link is here:

https://catalog.archives.gov/search?page=1&q=%22Hourly%20and%20Daily%20Balancing%20Authority%20Operations%20Report%20Files%22.
