Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: improve status image quality #391

Draft
wants to merge 35 commits into
base: main
Choose a base branch
from

Conversation

Shinigami92
Copy link
Member

Improves the image quality of status preview cards by fetching async behind the background server and calls the underlying website and extracts the og:image meta-property
then caches it and keep it in memory until not relevant anymore

previous new
image image

tip: open both images and compare by switching tabs back and forth

@Shinigami92 Shinigami92 added the c: feature Request for new feature label Dec 10, 2022
@Shinigami92 Shinigami92 self-assigned this Dec 10, 2022
@stackblitz
Copy link

stackblitz bot commented Dec 10, 2022

Review PR in StackBlitz Codeflow Run & review this pull request in StackBlitz Codeflow.

@netlify
Copy link

netlify bot commented Dec 10, 2022

Deploy Preview for elk-zone ready!

Name Link
🔨 Latest commit 5b37047
🔍 Latest deploy log https://app.netlify.com/sites/elk-zone/deploys/6397b46b65ae7f0008ea1634
😎 Deploy Preview https://deploy-preview-391--elk-zone.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

@patak-dev
Copy link
Member

@Shinigami92 is the cache really necessary? I think that og-images should already be returning with proper cache headers and we can let the browser do this work. I think we shouldn't need to deal with this on our side 🤔

@Shinigami92
Copy link
Member Author

@Shinigami92 is the cache really necessary? I think that og-images should already be returning with proper cache headers and we can let the browser do this work. I think we shouldn't need to deal with this on our side thinking

We have two options:

  • use an opengraph api with an auth-token and in worst case run into rate-limit + cost
  • having this cache and just do it on our own

The solution I implemented fetches the WHOLE html content and then extract the og:image from it (if it exists)
Requesting and extracting the og:image is IMO more costy then a O(1) cache access without requesting e.g. GitHub or Reddit for every single thumbnail

@patak-dev
Copy link
Member

mmm... I misunderstood what the cache was for. Ok, this now makes more sense, although I don't know if we want this kind of logic in the server. Leaving this one to @danielroe to asses, but this may not scale. It is a pity but maybe we should change the way we do our previews to have smaller social images somehow to use the social image infra of Mastodon servers 🤔
It may be something that is configurable at the server level, and we could detect how good the resolution is and change how we render the card.

Or maybe the original image is cached too? I don't see the API in their docs though. Looks like only this size is cached by the servers.
https://media.webtoo.ls/cache/preview_cards/images/000/002/750/original/b5f8f103aca6a135.png

@Shinigami92
Copy link
Member Author

Or maybe the original image is cached too? I don't see the API in their docs though. Looks like only this size is cached by the servers. media.webtoo.ls/cache/preview_cards/images/000/002/750/original/b5f8f103aca6a135.png

I already searched like an hour for alternative bigger images in the cache, and I found nothing
Trust me I would have implemented differently if I had found a better solution ^^

@patak-dev
Copy link
Member

Ya, I see. IMO we should still use the social images infra that Mastodon servers provide and:

  • Use a different card design so they don't look blurry
  • For certain URLs, don't show the social image and instead generate a card ourselves with the info we have (or using another endpoint). For example for GitHub URLs, we can recreate the card with HTML and show the org name, the repo name, and the avatar with a better definition for https://github.com/vitejs/vite, using https://github.com/vitejs.png.

@Shinigami92
Copy link
Member Author

Ya, I see. IMO we should still use the social images infra that Mastodon servers provide and:

  • Use a different card design so they don't look blurry
  • For certain URLs, don't show the social image and instead generate a card ourselves with the info we have (or using another endpoint). For example for GitHub URLs, we can recreate the card with HTML and show the org name, the repo name, and the avatar with a better definition for https://github.com/vitejs/vite, using https://github.com/vitejs.png.

I'm slightly against this as these thumbnails holds usually much more info then just a logo
It already contains the description text, contributor, issues, stars and forks count
also usually an avatar of the committer and so on and so forth
an this is just for github, but these thumbnails are not only for github but e.g. also for everything else like e.g. fakerjs.dev 🤷

It is impossible to create thumbnails for all kind of sites without asking concrete informations

Let's wait on @danielroe and I hope he sees the value in this cached solution that already works great
I could also think, that the 5k count can be adjusted to any value the elk-server hoster wants to

@patak-dev
Copy link
Member

For the record, I see the value and would love to have full resolution social images but if it would be just a matter of a memory cache, the Mastodon server would already be doing this. Twitter also has strong caching of social images.

@Shinigami92
Copy link
Member Author

Shinigami92 commented Dec 12, 2022

Here is another not GitHub example where I can clearly spot a difference in image quality

https://elk.zone/mas.to/@[email protected]/109496256426369984
Links to => https://twitter.com/valorantde/status/1601970137187614723

Extracts og:image:

<meta content="https://pbs.twimg.com/media/FjtXVhIX0AEZfle.jpg:large" property="og:image" data-rh="true">

And another example: https://elk.zone/mas.to/@[email protected]/109500354824115624


luckily both these examples have reposted the image itself, but this is not always the case and therefore not the point here for these examples


Okay here is now an extreme example: https://elk.zone/mas.to/@[email protected]/109500422454039277

Mastodon: https://media.mas.to/masto-public/cache/preview_cards/images/014/199/635/original/c816ebf180bc7d69.png
vs
Real: https://cdn.kevquirk.com/wp-content/uploads/2020/04/night-light-gnome-1024x613.png

How in the world could someone resolve og:image from cdn.kevquirk.com in any other way? (except of using opengraph which costs money)

server/api/og-image.ts Outdated Show resolved Hide resolved
@Shinigami92 Shinigami92 marked this pull request as draft December 12, 2022 13:35
@Shinigami92
Copy link
Member Author

The current solution does not work for examples like Twitter that async loads og:image via JS

@jasikpark
Copy link

Eh, I was thinking of how it's nice that some services like Signal and vanilla Mastodon only request the og image once, via the author in Signal's case & via the homeserver in Mastodon's case.

The nice thing is this prevents following people from causing you to make requests to exterior sites, which could be a privacy vector similar to facebook tracking pixel for example, ++ then servers aren't getting N og image requests per client loading the post

@Shinigami92
Copy link
Member Author

@ayoayco @jasikpark But I already use caching strategies as most as I can
I'm aware of that it makes some more requests, but as it only perform requests if it needs to make one and also it only calls distributed pages, I think this is not such a big problem

I'm assuming that Twitter makes a lot more such requests than mastodon and elk combined

@Shinigami92 Shinigami92 marked this pull request as ready for review December 12, 2022 22:20
return
}

await sendRedirect(event, ogImageUrl)
Copy link
Member

@danielroe danielroe Dec 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the main issue is that by sending a redirect we are opting-out of the netlify built assets cache. pushing something to test

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But fetching all the images and caching them on netlify is non goal and something I specially wanted to omit from the start

All these og:image out there in the world are already exactly provided to be shown via social media, are they?
So why do we need to copy images and store them again on our side? IMO this is just duplicating and waste of carbon footprint.

Whereas the first solution I had with the in memory cache worked totally find but just was one hop more to our internal api
Why do we not cache public status posts or account but every client ask the mastodon api individually? This is okay but the in memory internal server side cache is not?

Copy link
Member

@danielroe danielroe Dec 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. Yes, I see that. It's tricky, but firing up a lambda to respond to a request is also quite costly compared to CDN cache. We could cache a JSON response with the URL to the image instead.

But, on the other hand, that way we have eagerly request rather than relying on browser + srcset 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. Yes, I see that. It's tricky, but firing up a lambda to respond to a request is also quite costly compared to CDN cache. We could cache a JSON response with the URL to the image instead.

But, on the other hand, that way we have eagerly request rather than relying on browser + srcset 🤔

A benefit would be, we can enhance it later with more tags like og:title, og:image:alt and og:description (if needed)
I think the on-demand loading could be solved via virtual scrolling and/or Intersection Observer

@danielroe
Copy link
Member

Removed opengraph integration for now as the 100 req / mo limit was very low and it became costly. I'm still worrked about usage; the consequence would be what appears to be sluggish image loading, and huge usage on netlify serverless functions.

It might be better to perform selective enhancement for known domains (like github.com) which would also reduce the potential of this being misused (after all, currently we will fetch any url included in the request to /api/og-image, which feels somewhat unsafe) - and in some cases we may be able to perform that optimisation without another round trip (that is, if og image url is predictable).

@Shinigami92
Copy link
Member Author

Opened an issue at Mastodon: mastodon/mastodon#22268

@antfu
Copy link
Member

antfu commented Dec 13, 2022

Generally, I would suggest avoiding putting server-side logic into Elk as much as possible. As it increases the load exponentially as the user base grows. I agree it's better to be handled on Mastodon side instead.

@Shinigami92
Copy link
Member Author

Generally, I would suggest avoiding putting server-side logic into Elk as much as possible. As it increases the load exponentially as the user base grows. I agree it's better to be handled on Mastodon side instead.

I wasn't aware that all server side code inside Nuxt's server folder are running on lambda functions.
I thought it is just one instance (like EC2).
Therefore my in-memory-cache strategy only worked locally on my machine, but wouldn't work on lambda functions as they do not have shared mem.

I put this PR on hold for now and we will see what Mastodon will do.
One improvement already made by @ayoayco via #390, but this doesn't fix blurriness of e.g. GitHub cards.

@Shinigami92 Shinigami92 added the s: on hold Blocked by something or frozen to avoid conflicts label Dec 14, 2022
@Shinigami92 Shinigami92 marked this pull request as draft December 14, 2022 08:26
@ayoayco
Copy link
Member

ayoayco commented Dec 14, 2022

Thanks @Shinigami92! I think you did good exploration here.

In #413 @LittleSound also got in some code change to not stretch smaller images.

I'm wanna check next if native dimensions (i think 400x200) will look okay for bigger ones

@KaKi87 KaKi87 closed this Jan 4, 2023
@KaKi87 KaKi87 deleted the feat-increase-status-preview-card-image-quality branch January 4, 2023 17:49
@danielroe danielroe restored the feat-increase-status-preview-card-image-quality branch January 4, 2023 17:57
@danielroe danielroe reopened this Jan 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: feature Request for new feature s: on hold Blocked by something or frozen to avoid conflicts
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

8 participants