
Suggestion: save previews/content of external URLs (crawler and logic/template support) #4

Closed
mikkovedru opened this issue Jan 9, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@mikkovedru
Contributor

It would really be nice to have this program support saving previews of external URLs. Either the way Telegram does it:
image
or by extending it to crawl the page (with something simple like https://github.com/baranazal/HTML-PAGE-SCRAPER-BOT) and save information from the page according to logic/template files.

But why?..

In addition to normal Obsidian notes (daily or zettelkasten), I also use Obsidian to keep track of references and sources. Some of them are PDFs of research, some are links to (or even files of) videos, some are long articles (opinion pieces) together with my opinion on them. But a lot of them are simple news articles.

This has been incredibly helpful. I now have a database of actual raw news that I can reference in my normal notes. News items are linked together, and I can easily search for developments, noticing patterns and inconsistencies/lies. With this system I feel like a superhuman, acting as if I remember everything, able to provide the sources for my opinions and point out flip-flopping in an instant.

The only problem is that it takes a lot of time. Imagine having merely 10 news articles per day worth saving: that is a lot of time spent on the manual, tedious work of saving this raw data into my Obsidian. It would be amazing if this could be done automatically, with me simply forwarding the links I consider worthy within Telegram.

Solution

  • having one LOGIC FILE for each URL source. A new logic file needs to be created for each new base URL, because every website is different and the information is located in different places.
  • having one NEWS TEMPLATE FILE per person. Each person either uses the default one or creates their own based on several provided news template example files.

There are not many sources that I consider reputable enough to use, and I suspect that creating merely 20-30 logic files (and one template) would cover the vast majority of my grunt work. Other people could create logic files for other sources, as well as easily modify the template file to fit their individual needs.

How I envision the solution (to have at some point)

  1. Check whether the message consists of a URL only.
  2. If yes, check whether there is a LOGIC FILE associated with the base URL.
  3. If yes, crawl the page and pass its full content to that URL's logic file.
  4. The logic file processes the content of the page and returns it in parts (date, title, short content/description, full content).
  5. All the returned info is then used, according to the NEWS TEMPLATE FILE, to create a new note.
  6. PROFIT. The most tedious work is done, and if I want, I can go and manually process the news note further: add some tags, write my own opinion about this news, link it to other news, and make some names clickable/active.
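The six steps above could be sketched in Python roughly as follows. Everything here is hypothetical: the `LOGIC_FILES` registry, `NEWS_TEMPLATE`, and the function names are placeholders for illustration, not part of any existing implementation.

```python
from urllib.parse import urlparse

# Hypothetical registry: base URL (netloc) -> "logic file" parser that
# splits a fetched page into the parts the template needs.
LOGIC_FILES = {}

def register_logic(base_url):
    def wrap(fn):
        LOGIC_FILES[base_url] = fn
        return fn
    return wrap

def is_url_only(message):
    # Step 1: the message must consist of a single URL and nothing else.
    parts = message.strip().split()
    return len(parts) == 1 and parts[0].startswith(("http://", "https://"))

# Hypothetical NEWS TEMPLATE FILE, inlined here for the sketch.
NEWS_TEMPLATE = (
    "# {title}\n\n{date}\n\n> {description}\n\n{content}\n\nSource: {url}\n"
)

def handle_message(message, fetch):
    if not is_url_only(message):
        return None
    url = message.strip()
    logic = LOGIC_FILES.get(urlparse(url).netloc)  # Step 2: find the logic file
    if logic is None:
        return None
    html = fetch(url)    # Step 3: crawl the page
    parts = logic(html)  # Step 4: date, title, description, content
    # Step 5: fill the news template to produce the new note.
    return NEWS_TEMPLATE.format(url=url, **parts)

# Example logic file for a made-up source; a real one would parse the HTML.
@register_logic("news.example.com")
def parse_example(html):
    return {"date": "2023-01-09", "title": "Example title",
            "description": "Short description", "content": html}
```

With this shape, supporting a new source is just registering one more parser function, which matches the one-logic-file-per-base-URL idea.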

The features needed

  • Template (in addition to supporting the parts mentioned above: date, title, short content/description, full content; it should):
    • Generate a YYYYMMDDHHMMSS zettelkasten index number, and throttle generation so that at most one index is produced per second (in case the program needs to process several messages with URLs at once).
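The index generation with the one-per-second restriction is small enough to sketch directly. This is a minimal process-local version; the function name is made up, and a real implementation might need to persist the last issued index across runs.

```python
import time

_last_index = None  # last index issued in this process

def zettel_index():
    """Return a YYYYMMDDHHMMSS index; if called again within the same
    second, wait until the clock ticks so that indexes stay unique."""
    global _last_index
    while True:
        stamp = time.strftime("%Y%m%d%H%M%S")
        if stamp != _last_index:
            _last_index = stamp
            return stamp
        time.sleep(0.05)  # slow the process down until the next second
```

Processing a batch of URL messages in a loop would then naturally take about one second per note, which is exactly the restriction described above.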

Discussion

What do you think? I know that this is quite a lot of work. But could we at least start working in this direction?

@dimonier dimonier added the enhancement New feature or request label Jan 11, 2023
@dimonier dimonier moved this to 📋 Backlog in Telegram to Obsidian Jan 11, 2023
@dimonier
Owner

@mikkovedru Thank you for the suggestion!
Looks both useful and quite a lot of work :)

For now I'd prefer to incorporate a ready-made solution that would return a tiny screenshot like the following for a link:
image

And get back to idea of saving the whole page later.

dimonier added a commit that referenced this issue Jan 22, 2023
Previews/content of external URLs part of changes for #4
@github-project-automation github-project-automation bot moved this from 📋 Backlog to ✅ Done in Telegram to Obsidian Jan 22, 2023
@dimonier dimonier reopened this Jan 22, 2023
@dimonier
Owner

@mikkovedru Please check how @pkb implemented this using meta properties in the og: namespace.
It looks fine and relatively simple to me.
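For reference, pulling `og:` meta properties (Open Graph title, image, description, and so on) out of a page can be done with the standard-library HTML parser alone. This is a generic sketch of the idea, not the code from the implementation mentioned above:

```python
from html.parser import HTMLParser

class OGParser(HTMLParser):
    """Collect Open Graph <meta property="og:..." content="..."> tags."""
    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        d = dict(attrs)
        prop = d.get("property", "")
        if prop.startswith("og:") and "content" in d:
            # Store "og:title" as "title", "og:image" as "image", etc.
            self.og[prop[3:]] = d["content"]

def og_preview(html):
    parser = OGParser()
    parser.feed(html)
    return parser.og
```

Since most news sites publish `og:title`, `og:description`, and `og:image` for social-media previews, this covers the Telegram-style link preview use case without per-site logic files.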
