
Suggestion: save previews/content of external URLs (crawler and logic/template support) #4

Closed
mikkovedru opened this issue Jan 9, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@mikkovedru
Contributor

It would really be nice to have this program support saving previews of external URLs. Either the way Telegram does it:
image
or by extending it to crawl the page (with something simple like https://github.com/baranazal/HTML-PAGE-SCRAPER-BOT) and save information from the page according to logic/template files.

But why?..

In addition to normal Obsidian notes (daily or zettelkasten), I also use Obsidian to keep track of references and sources. Some of them are PDFs of research, some are links to (or even files of) videos, some are long articles (opinion pieces) together with my opinion on them. But a lot of them are simple news articles.

This has been incredibly helpful. I now have a database of actual raw news that I can reference in my normal notes. News items are linked together, and I can easily search for developments, noticing patterns and inconsistencies/lies. With this system I feel like a superhuman, acting as if I remember everything, able to provide the sources for my opinions and point out flip-flopping in an instant.

The only problem is that it takes a lot of time. Imagine having merely 10 news articles per day worth saving: that is a lot of time spent on the manual, tedious work of saving this raw data into my Obsidian. It would be amazing if this could be done automatically, with me simply forwarding the links I consider worthy within Telegram.

Solution

  • having one LOGIC FILE for each URL source. A new logic file needs to be created for each new base URL, because every website is different and the information is located in different places.
  • having one NEWS TEMPLATE FILE per person. Each person either uses the default one or creates their own based on several provided news template example files.

There are not many sources that I consider reputable enough to use, and I suspect that creating merely 20-30 logic files (and one template) would cover the vast majority of my grunt work. Other people could create logic files for other sources, as well as easily modify the template file to fit their individual needs.

How I envision the solution (to have at some point)

  1. Check whether the message consists of a URL only.
  2. If yes, check whether there is a LOGIC FILE associated with the base URL.
  3. If yes, crawl the page and pass its full content to that URL's logic file.
  4. The logic file processes the content of the page and returns it in parts (date, title, short content/description, full content).
  5. All the returned info is then used, according to the NEWS TEMPLATE FILE, to create a new note.
  6. PROFIT. The most tedious work is done, and if I want, I can go and manually process the news note further: add some tags, write my own opinion about this news, link it to other news, and make some names clickable/active.
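The six steps above could be sketched in Python roughly as follows. Everything here is hypothetical: the `LOGIC_FILES` registry, `NEWS_TEMPLATE`, and the function names are placeholders for illustration, not part of any existing implementation.

```python
from urllib.parse import urlparse

# Hypothetical registry: base URL (netloc) -> "logic file" parser that
# splits a fetched page into the parts the template needs.
LOGIC_FILES = {}

def register_logic(base_url):
    def wrap(fn):
        LOGIC_FILES[base_url] = fn
        return fn
    return wrap

def is_url_only(message):
    # Step 1: the message must consist of a single URL and nothing else.
    parts = message.strip().split()
    return len(parts) == 1 and parts[0].startswith(("http://", "https://"))

# Hypothetical NEWS TEMPLATE FILE, inlined here for the sketch.
NEWS_TEMPLATE = (
    "# {title}\n\n{date}\n\n> {description}\n\n{content}\n\nSource: {url}\n"
)

def handle_message(message, fetch):
    if not is_url_only(message):
        return None
    url = message.strip()
    logic = LOGIC_FILES.get(urlparse(url).netloc)  # Step 2: find the logic file
    if logic is None:
        return None
    html = fetch(url)    # Step 3: crawl the page
    parts = logic(html)  # Step 4: date, title, description, content
    # Step 5: fill the news template to produce the new note.
    return NEWS_TEMPLATE.format(url=url, **parts)

# Example logic file for a made-up source; a real one would parse the HTML.
@register_logic("news.example.com")
def parse_example(html):
    return {"date": "2023-01-09", "title": "Example title",
            "description": "Short description", "content": html}
```

With this shape, supporting a new source is just registering one more parser function, which matches the one-logic-file-per-base-URL idea.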

The features needed

  • Template (in addition to supporting the parts mentioned above: date, title, short content/description, full content; it should):
    • Generate a YYYYMMDDHHMMSS zettelkasten index number, and throttle generation so that at most one index is produced per second (in case the program needs to process several messages with URLs at once).
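The index generation with the one-per-second restriction is small enough to sketch directly. This is a minimal process-local version; the function name is made up, and a real implementation might need to persist the last issued index across runs.

```python
import time

_last_index = None  # last index issued in this process

def zettel_index():
    """Return a YYYYMMDDHHMMSS index; if called again within the same
    second, wait until the clock ticks so that indexes stay unique."""
    global _last_index
    while True:
        stamp = time.strftime("%Y%m%d%H%M%S")
        if stamp != _last_index:
            _last_index = stamp
            return stamp
        time.sleep(0.05)  # slow the process down until the next second
```

Processing a batch of URL messages in a loop would then naturally take about one second per note, which is exactly the restriction described above.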

Discussion

What do you think? I know that this is quite a lot of work. But could we at least start working in this direction?

@dimonier dimonier added the enhancement New feature or request label Jan 11, 2023
@dimonier dimonier moved this to 📋 Backlog in Telegram to Obsidian Jan 11, 2023
@dimonier
Owner

@mikkovedru Thank you for the suggestion!
Looks both useful and quite a lot of work :)

For now I'd prefer to incorporate a ready-made solution that would return a tiny screenshot like the following for a link:
image

And get back to idea of saving the whole page later.

dimonier added a commit that referenced this issue Jan 22, 2023
Previews/content of external URLs part of changes for #4
@github-project-automation github-project-automation bot moved this from 📋 Backlog to ✅ Done in Telegram to Obsidian Jan 22, 2023
@dimonier dimonier reopened this Jan 22, 2023
@dimonier
Owner

@mikkovedru Please check how @pkb implemented this using meta properties in the og: namespace.
It looks fine and relatively simple to me.
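For reference, pulling `og:` meta properties (Open Graph title, image, description, and so on) out of a page can be done with the standard-library HTML parser alone. This is a generic sketch of the idea, not the code from the implementation mentioned above:

```python
from html.parser import HTMLParser

class OGParser(HTMLParser):
    """Collect Open Graph <meta property="og:..." content="..."> tags."""
    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        d = dict(attrs)
        prop = d.get("property", "")
        if prop.startswith("og:") and "content" in d:
            # Store "og:title" as "title", "og:image" as "image", etc.
            self.og[prop[3:]] = d["content"]

def og_preview(html):
    parser = OGParser()
    parser.feed(html)
    return parser.og
```

Since most news sites publish `og:title`, `og:description`, and `og:image` for social-media previews, this covers the Telegram-style link preview use case without per-site logic files.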
