remove usage of stale in some docs (dagster-io#13658)
## Summary & Motivation

Updates docs to respect the revised staleness ontology, which does not
use the word "Stale".

As part of this change, I removed the tab on the asset details page that
deals with "Upstream changed". This was already out of date.

This PR also includes a fix to the imports in one of the
AssetObservation examples. I can pull that out into a separate change if
helpful.

## How I Tested These Changes
sryza authored Apr 14, 2023
1 parent bcbd8ee commit eb663dd
Showing 5 changed files with 17 additions and 47 deletions.
15 changes: 10 additions & 5 deletions docs/content/concepts/assets/asset-observations.mdx
@@ -55,7 +55,7 @@ height={917}
There are a variety of types of metadata that can be associated with an observation event, all through the <PyObject object="MetadataValue" /> class. Each observation event optionally takes a dictionary of metadata that is then displayed in the event log and the [Asset Details](/concepts/dagit/dagit#asset-details) page. Check our API docs for <PyObject object="MetadataValue" /> for more details on the types of event metadata available.

```python file=concepts/assets/observations.py startafter=start_observation_asset_marker_2 endbefore=end_observation_asset_marker_2
-from dagster import AssetObservation, MetadataValue, op
+from dagster import AssetMaterialization, AssetObservation, MetadataValue, op


@op
@@ -93,7 +93,7 @@ height={1146}
If you are observing a single slice of an asset (e.g. a single day's worth of data on a larger table), rather than mutating or creating it entirely, you can indicate this to Dagster by including the `partition` argument on the object.

```python file=/concepts/assets/observations.py startafter=start_partitioned_asset_observation endbefore=end_partitioned_asset_observation
-from dagster import AssetMaterialization, op
+from dagster import AssetObservation, op


@op(config_schema={"date": str})
@@ -111,8 +111,13 @@ def partitioned_dataset_op(context):
<PyObject object="SourceAsset" /> objects may have a user-defined observation function
that returns a <PyObject object="DataVersion" />. Whenever the observation
function is run, an <PyObject object="AssetObservation" /> will be generated for
-the source asset and tagged with the returned data version. The data version is
-used in staleness calculations for downstream assets.
+the source asset and tagged with the returned data version. When an asset is
+observed to have a newer data version than the data version it had when a
+downstream asset was materialized, then the downstream asset will be given a
+label in the UI that indicates that upstream data has changed.
+
+<PyObject object="AutoMaterializePolicy" pluralize /> can be used to automatically
+materialize downstream assets when this occurs.

The <PyObject object="observable_source_asset" /> decorator provides a convenient way to define source assets with observation functions. The below observable source asset takes a file hash and returns it as the data version. Every time you run the observation function, a new observation will be generated with this hash set as its data version.

@@ -130,7 +135,7 @@ def foo_source_asset():
return DataVersion(hash_sig.hexdigest())
```

-When the file content changes, the hash and therefore the data version will change - this will notify Dagster that downstream assets derived from an older value (i.e. a different data version) of this source asset are stale.
+When the file content changes, the hash and therefore the data version will change - this will notify Dagster that downstream assets derived from an older value (i.e. a different data version) of this source asset might need to be updated.

Source asset observations can be triggered via the "Observe sources" button in the Dagit graph explorer view. Note that this button will only be visible if at least one source asset in the current graph defines an observation function.
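
As a rough illustration of the data-version logic described in this hunk, the sketch below hashes file content and compares the result with the version recorded when a downstream asset was last materialized. It uses only the Python standard library: `file_data_version` and `upstream_changed` are hypothetical names, SHA-256 is an assumed hash choice, and none of this is Dagster's actual implementation.

```python
import hashlib


def file_data_version(content: bytes) -> str:
    """Hash the file content; identical content always yields the same version."""
    return hashlib.sha256(content).hexdigest()


def upstream_changed(observed_version: str, version_used_downstream: str) -> bool:
    """True when the observed source version differs from the version the
    downstream asset was materialized from (hypothetical helper)."""
    return observed_version != version_used_downstream


# Data version recorded when the downstream asset was last materialized.
used = file_data_version(b"id,value\n1,10\n")

# The file is unchanged, so a fresh observation yields the same version.
unchanged = upstream_changed(file_data_version(b"id,value\n1,10\n"), used)

# The file content changed, so the observed version no longer matches.
changed = upstream_changed(file_data_version(b"id,value\n1,10\n2,20\n"), used)
```

Here `unchanged` is `False` and `changed` is `True`: only a change in content, not the act of observing, produces a new data version.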

41 changes: 3 additions & 38 deletions docs/content/concepts/assets/software-defined-assets.mdx
@@ -274,15 +274,15 @@ def downstream(may_not_materialize):

### Asset code versions

-Assets may be assigned a `code_version`. Versions let you help Dagster track what assets are stale and avoid performing redundant computation.
+Assets may be assigned a `code_version`. Versions let you help Dagster track what assets haven't been re-materialized since their code has changed, and avoid performing redundant computation.

```python file=/concepts/assets/code_versions.py startafter=start_single_asset endbefore=end_single_asset
@asset(code_version="1")
def asset_with_version():
return 100
```

-When an asset with a code version is materialized, the generated `AssetMaterialization` is tagged with the version. An asset that has a different code version than the code version used for its most recent materialization will be considered stale. Any assets downstream of a stale asset are also considered stale.
+When an asset with a code version is materialized, the generated `AssetMaterialization` is tagged with the version. The UI will indicate when an asset has a different code version than the code version used for its most recent materialization.

Multi-assets may assign different code versions for each of their outputs:

@@ -298,7 +298,7 @@ def multi_asset_with_versions():
yield Output(200, "b")
```

-Just as with regular assets, these versions are attached to the `AssetMaterialization` objects for each of the constituent assets and accounted for when determining asset staleness.
+Just as with regular assets, these versions are attached to the `AssetMaterialization` objects for each of the constituent assets and represented in the UI.
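
The code-version comparison described in these hunks can be sketched in plain Python. This is only an illustration under stated assumptions: `AssetRecord` and `assets_with_changed_code` are hypothetical names, not part of Dagster, and the real bookkeeping lives in Dagster's event log rather than in a list like this.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AssetRecord:
    """Hypothetical record pairing an asset's current and last-used code version."""

    key: str
    code_version: str          # version declared in the asset definition today
    materialized_version: str  # version tagged on the most recent materialization


def assets_with_changed_code(records: List[AssetRecord]) -> List[str]:
    """Return keys of assets whose code changed since their last materialization."""
    return [r.key for r in records if r.code_version != r.materialized_version]


records = [
    # The definition was bumped to "2" but the last materialization used "1".
    AssetRecord("asset_with_version", code_version="2", materialized_version="1"),
    # Declared and materialized versions agree, so nothing is flagged.
    AssetRecord("a", code_version="1", materialized_version="1"),
]
changed_keys = assets_with_changed_code(records)
```

Only `asset_with_version` is flagged, since it is the only asset whose declared version differs from the version on its latest materialization.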

---

@@ -335,7 +335,6 @@ A <PyObject object="Definitions" /> object defines a code location, which is a c
- [Viewing all assets](#asset-catalog)
- [Details for an asset](#asset-details)
- [Dependency graph](#dependency-graph)
-- [Upstream changes](#upstream-changed)

<TabGroup>
<TabItem name="Asset catalog (all assets)">
@@ -388,40 +387,6 @@ width={3574}
height={1962}
/>

-</TabItem>
-<TabItem name="Upstream changed indicator">
-
-#### Upstream changed
-
-<Note>
-Currently, the <strong>upstream changed</strong> indicator won't display in
-the following scenarios:
-<ul>
-<li>The upstream asset is in another code location definition or job</li>
-<li>The assets are partitioned</li>
-</ul>
-</Note>
-
-On occasion, you might see an **upstream changed** indicator on an asset in the dependency graph or on the **Asset Details** page:
-
-<Image
-alt="Asset Graph with an upstream changed indicator"
-src="/images/concepts/assets/software-defined-assets/upstream-changed.png"
-width={1556}
-height={790}
-/>
-
-This occurs when a downstream asset's last materialization took place **earlier than the asset it depends on.** Dagit displays this alert to notify you that the contents of an asset may be stale. For example:
-
-- `comments` is upstream of `comment_stories`
-- `comment_stories` depends on `comments`
-- `comment_stories` was last materialized on February 25 at **5:30PM**
-- `comments` was last materialized on February 25 at **7:05PM**
-
-In this case, the contents of `comment_stories` may be outdated, as the most recent data from `comments` wasn't used to compute them.
-
-You can resolve this issue by re-materializing the downstream asset. This will re-compute the contents with the most recent data/changes to its upstream dependency.
-
</TabItem>
</TabGroup>

2 changes: 1 addition & 1 deletion docs/content/integrations/dbt.mdx
@@ -18,7 +18,7 @@ Dagster orchestrates dbt alongside other technologies, so you can combine dbt wi
Dagster has built-in support for loading dbt models, seeds, and snapshots as software-defined assets, enabling you to:

- Visualize and orchestrate a graph of dbt assets, and execute them with a single dbt invocation
-- Version your dbt models by their defining SQL code, allowing downstream assets to be automatically marked stale when a model changes
+- Version your dbt models by their defining SQL code, allowing Dagster to indicate when a model has changed
- View detailed historical metadata and logs for each asset
- Define Python computations that depend directly on tables updated using dbt
- Track data lineage through dbt and your other tools
2 changes: 1 addition & 1 deletion docs/content/integrations/dbt/reference.mdx
@@ -200,7 +200,7 @@ dbt_assets = load_assets_from_dbt_project(

## dbt models, code versions, and staleness

-Note that Dagster allows the optional specification of a [`code_version`](/guides/dagster/scheduling-assets#step-5-change-code-versions) for each software-defined asset, which is used to track asset staleness. The `code_version` for an asset arising from a dbt model is defined automatically as the hash of the SQL defining the DBT model. This means that changing the code of the model will automatically cause the corresponding asset, and all downstream assets, to be marked stale.
+Note that Dagster allows the optional specification of a [`code_version`](/guides/dagster/scheduling-assets#step-5-change-code-versions) for each software-defined asset, which are used to track changes. The `code_version` for an asset arising from a dbt model is defined automatically as the hash of the SQL defining the DBT model. This allows the asset graph in the UI to indicate which dbt models have new SQL since they were last materialized.

---

@@ -35,7 +35,7 @@ def observation_op(context):
# end_observation_asset_marker_0

# start_partitioned_asset_observation
-from dagster import AssetMaterialization, op
+from dagster import AssetObservation, op


@op(config_schema={"date": str})
@@ -52,7 +52,7 @@ def partitioned_dataset_op(context):


# start_observation_asset_marker_2
-from dagster import AssetObservation, MetadataValue, op
+from dagster import AssetMaterialization, AssetObservation, MetadataValue, op


@op