A locally run chatbot using Arctic texts for Retrieval Augmented Generation (RAG)
The app was copied from the excellent pixelgami tutorial found here: https://github.com/pixegami/rag-tutorial-v2 . Some revisions were made to accomodate langchain
updates and experiment with chunking and prompt engineering, a different embedding model was used (nomic-embed-text), and a different LLM was used (phi3:medium). Deprecation warnings were silenced.
In the various branches, I have added data about Arctic Climate from various sources including SNAP's Northern Climate Reports.
- Download
ollama
from here and install on your machine. - Pull the embedding model and LLM. This is approximately 9GB download.
ollama pull nomic-embed-text
ollama pull phi3:medium
- Start the
ollama
server
ollama serve
- Build and activate the conda environment from the
environment.yml
conda env create -f environment.yml
conda activate ai
- Fair warning, you might need to install more packages as you run the code!
- Put your data files in the
/data
folder, and runpython populate_database.py
. - Data format can be PDF, Markdown, JSON, HTML, etc. Check out the different branches in this repo to see the changes required in
populate_database.py
when using different file types, and explore what the loading and chunking process looks like for the different file types inexplore_loading_<filetype>.ipynb
notebooks. Themain
branch uses PDFs.
- Run
python query_data.py
plus a string query to ask a question. You will recieve a response, along with the full prompt to the LLM including the top 5 text chunks that make up the RAG component of the query, and a list of explicit references for each of those chunks. The references include the file name, the page number, and the chunk on that page. (If desired, the prompt can be removed from the response by commenting out theprint(prompt)
line inquery_data.py
.)
python query_data.py "How fast is the Arctic warming?"
Human:
Answer the question based only on the following context:
termed Arctic Amplification. Recent studies have shown that, after accounting for natural variability, the
Arctic is warming approximately three times faster than the global mean based on observational data
and climate model simulations since 1980 (Sweeney et al. 2023; Zhou et al. 2024). There are many
documented physical indicators of a warming Arctic. Recent decades have witnessed springtime snow
cover declines on Arctic lands (see essay Terrestrial Snow Cover), summer sea ice losses (see essay Sea
Ice), and Greenland Ice Sheet mass loss (see essay Greenland Ice Sheet). Extreme events are also
becoming more frequent (Overland 2024) and their character is changing over time (e.g., Arctic cold
---
extremes are warming at twice the rate as pan-Arctic annual temperatures since 1979; Polyakov et al.
2024). The warming Arctic has palpable ecological impacts at various spatial scales, which have been
---
NOAA Technical Report OAR ARC ; 24-01 Arctic Report Card 2024
2
2024 Headlines
The rapid pace and complexity of Arctic change demand new and strengthened
Arctic adaptation and global reductions of fossil fuel pollution
The Arctic continues to warm at a faster rate than the global average. The 2024 Arctic Report Card
highlights record-breaking and near-record-breaking observations that demonstrate dramatic change,
including Arctic tundra transformation from carbon sink to carbon source, declines of previously large
inland caribou herds, and increasing winter precipitation. Observations also reveal regional differences
that make local and regional experiences of environmental change highly variable for people, plants and
---
temperatures in the Arctic did not climb to record levels in 2024, yet a long-term warming trend is clear
and most Arctic Ocean marginal seas were ~2-4°C (3.6-7.2°F) warmer in August 2024 than the 1991-
2020 baseline. One more example from the marine environment is overall high primary productivity,
including sharp increases in some seas during the 2003-24 observational record.
Framing the Arctic as in a “new regime” underscores that the region today is dramatically changed from
even a decade or two ago, yet it must not imply that the Arctic climate has stabilized under human-
caused warming. Projections of climate change for the next several decades are clear: change will
continue. Every year the Arctic Report Card includes observations that bring sometimes surprising and
---
animals. Adaptation is increasingly necessary and Indigenous Knowledge and community-led research
programs are essential to understand and respond to rapid Arctic changes.
Headlines
In the air
• Arctic annual surface air temperatures ranked second warmest since 1900.
• Autumn 2023 and summer 2024 were especially warm across the Arctic with temperatures
ranking 2nd and 3rd warmest, respectively.
• An early August 2024 heatwave set all-time record daily temperatures in several northern
Alaska and Canada communities.
• The last nine years are the nine warmest on record in the Arctic.
• Summer 2024 across the Arctic was the wettest on record.
• Arctic precipitation has shown an increasing trend from 1950 through 2024, with the most
pronounced
increases occurring in winter.
---
Answer the question based on the above context: How fast is the Arctic warming?
Response: The Arctic is experiencing a significant increase in temperature, warming approximately three times faster than the global mean since 1980 according to observational data and climate model simulations. This phenomenon has been termed Arctic Amplification. It's important to note that these figures are after accounting for natural variability.
Sources: ['data/ArcticReportCard_full_report2024.pdf:9:3', 'data/ArcticReportCard_full_report2024.pdf:9:4', 'data/ArcticReportCard_full_report2024.pdf:3:0', 'data/ArcticReportCard_full_report2024.pdf:6:1', 'data/ArcticReportCard_full_report2024.pdf:3:1']
python query_data.py "Will there be salmon in the Yukon River in 2050?"
Human:
Answer the question based only on the following context:
disease are also likely important factors. The Yukon
chum salmon collapse was linked to lower food
quality during recent marine heatwaves.
Impacts on fisheries
The combined Chinook and chum declines resulted
in the first ever complete closures of subsistence
fishing for salmon in the Yukon Basin in 2021
and 2022. The closures were highly disruptive to
Indigenous people in the region who have been
linked to salmon for thousands of years.
The declines also heightened concerns over salmon
bycatch in marine fisheries and led to calls for greater inclusion of Indigenous Tribes and organizations in
monitoring and management decisions.
Large sockeye and pink runs benefited many fisheries, but even these had disruptive effects. The record high
---
18
Chinook salmon populations have been declining
for decades all over the state. In contrast, pink and
sockeye salmon have returned in above-average
numbers statewide, but these returns have been
highly variable, punctuated by extreme highs and
lows. For example, sockeye salmon returned to
Bristol Bay in record high numbers in 2022.
Chinook salmon declines in Alaska are linked to
climate extremes, including marine heatwaves,
high river temperatures during the spawning run,
and heavy fall rains when eggs are in the gravel.
Population declines are also linked to declining
adult body sizes, associated with more competition
at sea with highly abundant pink and chum salmon.
Climate-linked changes in predators, prey and
disease are also likely important factors. The Yukon
---
and subsistence fishery closures, have an
economic impact in rural Alaska by reducing
access to sustainable livelihoods.
These aspects of salmon well-being in
an Indigenous context are often under
accounted for in fisheries management.
FEWER EGGS DEPOSITED IN RIVERS
Chinook salmon — and to a lesser extent chum and sockeye
salmon — have become smaller since the 1970s (top right
graph). In Chinook, this is associated with 15% fewer eggs
per female (bottom left graph) deposited in rivers.
25
2020s
1970s
Harmony Wayner
26
950 mm salmon length
500
Sockeye
6,000
2020
1970
8,500 eggs per female Chinook
Chinook
Chum
---
ANIMALS & FISH
HEAT & SALMON DIE-OFFS
In June and July of 2019, thousands of salmon died as they
migrated to their spawning grounds of western Alaska.
Although the cause is not confirmed, the leading suspect
is unusually warm water temperatures above the range that
causes stress to adult salmon. Warm water causes several
problems: it contains less life-sustaining dissolved oxygen
than cool water, greatly accelerates metabolism, resulting
in faster burning of stored energy in the migrating fish, and
promotes the growth of parasites and fungus that can weaken
fish. Surveys of the Koyukuk River (a major tributary of the
Yukon River) confirmed thousands of dead summer Chum
salmon, which most likely succumbed to the heat, as the river
---
with more competition at sea, lower quality food, and possibly more marine predators selecting the largest fish.
Salmon and well-being
Salmon are a keystone of cultural,
spiritual, relational, and physical health
in Alaska, allowing Indigenous peoples to
live their traditional way of life. For many
coastal and river communities salmon
epitomize well-being.
Without access to salmon, Indigenous
peoples lose more than healthy food in
rural villages where groceries are limited
and prices are astronomical. There is also
less sharing with others, time with family
and knowledge passed from generation
to generation through the act of catching,
preparing and sharing fish. Dramatic salmon
changes, and the associated commercial
and subsistence fishery closures, have an
---
Answer the question based on the above context: Will there be salmon in the Yukon River in 2050?
Response: The provided context does not provide specific information about whether or not there will be salmon in the Yukon River in 2050. However, it discusses ongoing issues such as climate extremes and marine heatwaves which have been linked to declines in various salmon populations. It also mentions that these changes are likely disruptive to Indigenous peopcks who rely on salmon for their livelihoods and cultural practices. Therefore, while we can't predict a specific scenario for 2050 based on this context, it does indicate challenges for the future of salmon in rivers like Yukon due to environmental changes and human impact.
Sources: ['data/Alaskas-Changing-Environment-2024.pdf:17:1', 'data/Alaskas-Changing-Environment-2024.pdf:17:0', 'data/Alaskas-Changing-Environment-2024.pdf:18:2', 'data/Alaskas-Changing-Environment_2019_WEB.pdf:12:0', 'data/Alaskas-Changing-Environment-2024.pdf:18:1']
- Run
pytest test_rag.py -W ignore::DeprecationWarning
to run the unit tests. - These tests consist of a sample question and expected output. We actually use the LLM again to compare the application's response to our expected output, since the wording might not be exactly the same. This has potential uses for complex answers that are not explicit numerical values.
- The tests also include a negative and positive version of each unit test. This is to ensure that the LLM is not being too generous in comparing the responses.