Commit graph

51 commits

Author SHA1 Message Date
Mark Felder 5da9cbd8a5 RichMedia refactor
Rich Media parsing was previously handled on-demand with a 2 second HTTP request timeout and retained only in Cachex. Every time a Pleroma instance is restarted it will have to request and parse the data for each status with a URL detected. When fetching a batch of statuses they were processed in parallel to attempt to keep the maximum latency at 2 seconds, but often resulted in a timeline appearing to hang during loading due to a URL that could not be successfully reached. URLs which had images links that expire (Amazon AWS) were parsed and inserted with a TTL to ensure the image link would not break.

Rich Media data is now cached in the database and fetched asynchronously. Cachex is used as a read-through cache. When the data becomes available we stream an update to the clients. If the result is returned quickly the experience is almost seamless. Activities were already processed for their Rich Media data during ingestion to warm the cache, so users should not normally encounter the asynchronous loading of the Rich Media data.

Implementation notes:

- The async worker is a Task with a globally unique process name to prevent duplicate processing of the same URL
- The Task will attempt to fetch the data 3 times with increasing sleep time between attempts
- The HTTP request obeys the default HTTP request timeout value instead of 2 seconds
- URLs that cannot be successfully parsed due to an unexpected error receives a negative cache entry for 15 minutes
- URLs that fail with an expected error will receive a negative cache with no TTL
- Activities that have no detected URLs insert a nil value in the Cachex :scrubber_cache so we do not repeat parsing the object content with Floki every time the activity is rendered
- Expiring image URLs are handled with an Oban job
- There is no automatic cleanup of the Rich Media data in the database, but it is safe to delete at any time
- The post draft/preview feature makes the URL processing synchronous so the rendered post preview will have an accurate rendering

Overall performance of timelines and creating new posts which contain URLs is greatly improved.
2024-06-09 17:33:48 +01:00
FloatingGhost 98cb255d12 Support elixir1.15
OTP builds to 1.15

Changelog entry

Ensure policies are fully loaded

Fix :warn

use main branch for linkify

Fix warn in tests

Migrations for phoenix 1.17

Revert "Migrations for phoenix 1.17"

This reverts commit 6a3b2f15b7.

Oban upgrade

Add default empty whitelist

mix format

limit test to amd64

OTP 26 tests for 1.15

use OTP_VERSION tag

baka

just 1.15

Massive deps update

Update locale, deps

Mix format

shell????

multiline???

?

max cases 1

use assert_recieve

don't put_env in async tests

don't async conn/fs tests

mix format

FIx some uploader issues

Fix tests
2023-08-03 17:44:09 +01:00
FloatingGhost bf7ff6a337 Put rich media processing in a Task 2022-12-30 20:11:53 +00:00
Haelwenn (lanodan) Monnier c4439c630f
Bump Copyright to 2021
grep -rl '# Copyright © .* Pleroma' * | xargs sed -i 's;Copyright © .* Pleroma .*;Copyright © 2017-2021 Pleroma Authors <https://pleroma.social/>;'
2021-01-13 07:49:50 +01:00
lain 713612c377 Cachex: Make caching provider switchable at runtime.
Defaults to Cachex.
2020-12-18 17:44:46 +01:00
rinpatch db80b9d630 RichMedia: Fix log spam on failures and resetting TTL on cached errors 2020-09-17 16:56:39 +03:00
rinpatch bb407edce4 RichMedia: fix a compilation error due to nonexistent variable
No idea why this passed Gitlab CI
2020-09-14 15:46:00 +03:00
rinpatch f66a15c4a5 RichMedia parser: do not set a cache TTL for unchanging errors 2020-09-14 14:44:25 +03:00
rinpatch 170599c390 RichMedia: do not log webpages missing metadata as errors
Also fixes the return value of Parser.parse on errors, previously
was just `:ok` due to the logger call in the end
2020-09-05 22:05:35 +03:00
rinpatch 19691389b9 Rich media: Add failure tracking 2020-09-02 14:59:52 +03:00
Alexander Strizhakov 03d06062ab
don't fail on url fetch 2020-09-01 19:39:07 +03:00
Mark Felder 016d8d6c56 Consolidate construction of Rich Media Parser HTTP requests 2020-08-03 12:37:31 -05:00
Mark Felder 18438a9bf0 Add "Bot" to User Agent to coerce Twitter into serving OGP <meta> tags. 2020-07-07 11:21:05 -05:00
Egor Kislitsyn 58e4e3db8b
Merge remote-tracking branch 'origin/develop' into merge-ogp-twitter-parsers 2020-06-15 16:03:40 +04:00
Egor Kislitsyn 520367d6fd Fix atom leak in Rich Media Parser 2020-06-13 12:08:46 +03:00
Egor Kislitsyn 1f35acce54
Merge OGP parser with TwitterCard 2020-06-11 17:57:31 +04:00
Alexander Strizhakov 509c81e4b1
Merge branch 'develop' into gun 2020-03-03 10:08:07 +03:00
Haelwenn (lanodan) Monnier 6da6540036
Bump copyright years of files changed after 2020-01-07
Done via the following command:
git diff fcd5dd259a --stat --name-only | xargs sed -i '/Pleroma Authors/c# Copyright © 2017-2020 Pleroma Authors <https:\/\/pleroma.social\/>'
2020-03-02 06:08:45 +01:00
Mark Felder cf94349287 Merge branch 'develop' into gun 2020-02-18 09:06:27 -06:00
Alexander Strizhakov 514c899275
adding gun adapter 2020-02-18 08:19:01 +03:00
rinpatch 472132215e Use floki's new APIs for parsing fragments 2020-02-16 01:55:26 +03:00
feld 237b2068f9 Revert "Merge branch 'feat/floki-fasthtml' into 'develop'"
This reverts merge request !2194
2020-02-11 16:55:18 +00:00
rinpatch ea1631d7e6 Make Floki use fast_html 2020-02-11 16:17:21 +03:00
Maksim Pechnikov b4cf74c106 added prepare html for RichMedia.Parser 2019-09-15 14:53:58 +03:00
Ariadne Conill d3bdb8e704 rich media: parser: splice the given URL into the result 2019-07-23 23:51:29 +00:00
rinpatch 3368174785 Fix rich media parser failing when no TTL can be found by image TTL
setters
2019-07-21 18:22:22 +03:00
Sachin Joshi de9906ad56 change the structure of image ttl parsar 2019-07-19 11:43:42 +05:45
Sachin Joshi 18234cc44e add the rich media ttl based on image exp time 2019-07-17 00:20:34 +05:45
Alex S f4447d82b8 parsers configurable 2019-07-14 09:21:56 +03:00
rinpatch 92213fb87c Replace Mix.env with Pleroma.Config.get(:env)
Mix.env/0 is not availible in release environments such as distillery or
elixir's built-in releases.
2019-06-06 23:59:51 +03:00
Sergey Suprunenko 1690be991e Replace missing non-nullable Card attributes with empty strings 2019-05-30 21:03:31 +00:00
Roman Chvanikov 4615e56219 Add with_body: true to requests relying on max_body: val 2019-04-12 00:16:33 +07:00
William Pitcock 19afd9f81f http: rework connection timeouts to match hackney docs, enforce 1 second max TCP connection timeout 2019-03-08 22:56:16 +00:00
William Pitcock 45e57dd187 rich media: tighten fetching timeouts and size limits 2019-02-10 21:54:08 +00:00
William Pitcock d83dbd9070 rich media: parser: reject any data which cannot be explicitly encoded into JSON 2019-02-05 20:50:57 +00:00
William Pitcock 46dba03098 rich media: parser: only try to validate strings, not numbers (OEmbed) 2019-01-31 16:19:31 +00:00
William Pitcock dafb6f0b5e rich media: parser: reject OGP fields we cannot safely process 2019-01-31 16:03:56 +00:00
href 5ea0397e2d
Fix 4aff4efa typos 2019-01-30 21:08:41 +01:00
href 4aff4efa8d
Use multiple hackney pools
* federation (ap, salmon)
* media (rich media, media proxy)
* upload (uploader proxy)

Each "part" will stop fighting others ones -- a huge federation outbound
could before make the media proxy fail to checkout a connection in time.

splitted media and uploaded media for the good reason than an upload
pool will have all connections to the same host (the uploader upstream).
it also has a longer default retention period for connections.
2019-01-30 15:06:46 +01:00
William Pitcock 0f11254a06 rich media: parser: add some basic sanity checks on the returned data with pattern matching 2019-01-28 20:43:21 +00:00
William Pitcock 83b7062634 rich media: parser: cache negatives 2019-01-28 20:19:07 +00:00
William Pitcock 8fb16e9f0f rich media: parser: add copyright header 2019-01-28 20:00:01 +00:00
William Pitcock de42646634 rich media: add try/rescue to ensure we catch parsing and fetching failures 2019-01-28 05:53:17 +00:00
William Pitcock 8f2f471e94 rich media: gracefully handle fetching nil URIs 2019-01-26 16:36:17 +00:00
Maxim Filippov b8a77c5d70 Add OEmbed parser 2019-01-13 02:06:50 +02:00
Maxim Filippov 1f851a0723 Add Twitter Card parser 2019-01-10 18:09:56 +00:00
rinpatch a2d7f0e0e9 Remove :commit since a tuple is already returned 2019-01-09 21:35:01 +03:00
William Pitcock 487c00d36d rich media: disable cachex in test mode 2019-01-04 23:53:26 +00:00
William Pitcock 0964c207eb rich media: use cachex to avoid flooding remote servers 2019-01-04 23:32:01 +00:00
Maxim Filippov 48e81d3d40 Add RichMediaController and tests 2019-01-02 17:02:50 +03:00