Commit graph

15710 commits

Author SHA1 Message Date
Mark Felder f4daa90bd8 Remove test validating missing descriptions are returned as an empty string 2024-06-09 17:37:59 +01:00
Mark Felder 688748b531 Improve test description 2024-06-09 17:37:32 +01:00
Mark Felder 2e5aa71176 Rich Media Cards are fetched asynchonously and not guaranteed to be available on first post render 2024-06-09 17:37:22 +01:00
Mark Felder 7ca655a999 Rich Media Cards are cached by URL not per status 2024-06-09 17:36:57 +01:00
Mark Felder 4746f98851 Fix broken Rich Media parsing when the image URL is a relative path 2024-06-09 17:36:28 +01:00
Mark Felder 765c7e98d2 Respect the TTL returned in OpenGraph tags 2024-06-09 17:36:15 +01:00
Mark Felder ddbe989461 Fix broken tests 2024-06-09 17:35:47 +01:00
Floatingghost 4a3dd5f65e lost in cherry-pick 2024-06-09 17:34:41 +01:00
Mark Felder bfe4152385 Increase the :max_body for Rich Media to 5MB
Websites are increasingly getting more bloated with tricks like inlining content (e.g., CNN.com) which puts pages at or above 5MB. This value may still be too low.
2024-06-09 17:34:29 +01:00
Mark Felder 5da9cbd8a5 RichMedia refactor
Rich Media parsing was previously handled on-demand with a 2 second HTTP request timeout and retained only in Cachex. Every time a Pleroma instance is restarted it will have to request and parse the data for each status with a URL detected. When fetching a batch of statuses they were processed in parallel to attempt to keep the maximum latency at 2 seconds, but often resulted in a timeline appearing to hang during loading due to a URL that could not be successfully reached. URLs which had images links that expire (Amazon AWS) were parsed and inserted with a TTL to ensure the image link would not break.

Rich Media data is now cached in the database and fetched asynchronously. Cachex is used as a read-through cache. When the data becomes available we stream an update to the clients. If the result is returned quickly the experience is almost seamless. Activities were already processed for their Rich Media data during ingestion to warm the cache, so users should not normally encounter the asynchronous loading of the Rich Media data.

Implementation notes:

- The async worker is a Task with a globally unique process name to prevent duplicate processing of the same URL
- The Task will attempt to fetch the data 3 times with increasing sleep time between attempts
- The HTTP request obeys the default HTTP request timeout value instead of 2 seconds
- URLs that cannot be successfully parsed due to an unexpected error receives a negative cache entry for 15 minutes
- URLs that fail with an expected error will receive a negative cache with no TTL
- Activities that have no detected URLs insert a nil value in the Cachex :scrubber_cache so we do not repeat parsing the object content with Floki every time the activity is rendered
- Expiring image URLs are handled with an Oban job
- There is no automatic cleanup of the Rich Media data in the database, but it is safe to delete at any time
- The post draft/preview feature makes the URL processing synchronous so the rendered post preview will have an accurate rendering

Overall performance of timelines and creating new posts which contain URLs is greatly improved.
2024-06-09 17:33:48 +01:00
Floatingghost a924e117fd Add pool timeouts 2024-06-09 17:20:29 +01:00
floatingghost d1c4b97613 Merge pull request 'Raise minimum PostgreSQL version to 12' (#786) from Oneric/akkoma:psql-min-ver into develop
Reviewed-on: https://akkoma.dev/AkkomaGang/akkoma/pulls/786
2024-06-07 16:53:22 +00:00
Oneric 2180d068ae Raise log level for start failures 2024-06-07 16:21:21 +02:00
Oneric a3840e7d1f Raise minimum PostgreSQL version to 12
This lets us:
 - avoid issues with broken hash indices for PostgreSQL <10
 - drop runtime checks and legacy codepaths for <11 in db search
 - always enable custom query plans for performance optimisation

PostgreSQL 11 is already EOL since 2023-11-09, so
in theory everyone should already have moved on to 12 anyway.
2024-06-07 16:21:09 +02:00
Oneric b17d3dc6d8 Fix changelog
Apparently got jumbled during some rebase(s)
2024-06-07 16:20:34 +02:00
floatingghost f8f364d36d Merge pull request 'Handle errors from HTTP requests gracefully' (#791) from wp-embeds into develop
Reviewed-on: https://akkoma.dev/AkkomaGang/akkoma/pulls/791
2024-06-07 12:58:58 +00:00
floatingghost 329d8fcba8 Merge pull request 'Update PGTune recommendations' (#795) from norm/akkoma:pgtune into develop
Reviewed-on: https://akkoma.dev/AkkomaGang/akkoma/pulls/795
2024-06-07 12:57:00 +00:00
Norm e2860e5292 Update PGTune recommendations
From experience, setting DB type to "Online transaction processing
system" seems to give the most optimal configuration in terms of
performance.

I also increased the recomended max connections to 25-30 as that leaves
some room for maintenance tasks to run without running out of
connections.

Finally, I removed the example configs since they're probably out of
date and I think it's better to direct people to use PGTune instead.
2024-06-06 12:18:51 -04:00
Floatingghost 0f65dd3ebe remove pointless logger 2024-06-04 14:34:59 +01:00
Floatingghost 38d09cb0ce remove now-pointless clause 2024-06-04 14:34:18 +01:00
Floatingghost c9a03af7c1 Move rescue to the HTTP request itself 2024-06-04 14:30:16 +01:00
Floatingghost 0f7ae0fa21 am i baka 2024-06-04 14:26:33 +01:00
Floatingghost 30e13a8785 Don't error on rich media fail 2024-06-04 14:21:40 +01:00
Floatingghost 778b213945 enqueue pin fetches after changeset validation 2024-06-01 08:25:35 +01:00
floatingghost 8f97c15b07 Merge pull request 'Preserve Meilisearch’s result ranking' (#772) from Oneric/akkoma:search-meili-order into develop
Reviewed-on: https://akkoma.dev/AkkomaGang/akkoma/pulls/772
2024-05-31 14:12:05 +00:00
Floatingghost 3af0c53a86 use proper workers for fetching pins instead of an ad-hoc task (#788)
Reviewed-on: https://akkoma.dev/AkkomaGang/akkoma/pulls/788
Co-authored-by: Floatingghost <hannah@coffee-and-dreams.uk>
Co-committed-by: Floatingghost <hannah@coffee-and-dreams.uk>
2024-05-31 08:58:52 +00:00
Oneric fc7e07f424 meilisearch: enable using search_key
Using only the admin key works as well currently
and Akkoma needs to know the admin key to be able
to add new entries etc. However the Meilisearch
key descriptions suggest the admin key is not
supposed to be used for searches, so let’s not.

For compatibility with existings configs, search_key remains optional.
2024-05-29 23:17:27 +00:00
Oneric 59685e25d2 meilisearch: show keys by name not description
This makes show-key’s output match our documentation as of Meilisearch
1.8.0-8-g4d5971f343c00d45c11ef0cfb6f61e83a8508208. Since I’m not sure
if older versions maybe only provided description, it will fallback to
the latter if no name parameter exists.
2024-05-29 23:17:27 +00:00
Oneric 65aeaefa41 meilisearch: respect meili’s result ranking
Meilisearch is already configured to return results sorted by a
particular ranking configured in the meilisearch CLI task.
Resorting the returned top results by date partially negates this and
runs counter to what someone with tweaked settings expects.

Issue and fix identified by AdamK2003 in
https://akkoma.dev/AkkomaGang/akkoma/pulls/579
But instead of using a O(n^2) resorting, this commit directly
retrieves results in the correct order from the database.

Closes: https://akkoma.dev/AkkomaGang/akkoma/pulls/579
2024-05-29 23:17:27 +00:00
Oneric 5d6cb6a459 meilisearch: remove duplicate preload 2024-05-29 23:17:27 +00:00
floatingghost 8afc3bee7a Merge pull request 'Use /var/tmp for media cache path' (#776) from norm/akkoma:nginx-var-tmp into develop
Reviewed-on: https://akkoma.dev/AkkomaGang/akkoma/pulls/776
Reviewed-by: floatingghost <hannah@coffee-and-dreams.uk>
2024-05-28 02:05:17 +00:00
floatingghost 72871d4514 Merge pull request 'Drop unused indices' (#767) from Oneric/akkoma:purge-unused-indices into develop
Reviewed-on: https://akkoma.dev/AkkomaGang/akkoma/pulls/767
2024-05-28 01:35:18 +00:00
floatingghost 72af38c0e9 Merge pull request 'migrate CI config to v2' (#785) from woodpecker-v2 into develop
Reviewed-on: https://akkoma.dev/AkkomaGang/akkoma/pulls/785
2024-05-27 03:32:40 +00:00
Floatingghost ae19fd90c9 use elixir 1.16 for format checks 2024-05-27 04:07:44 +01:00
Floatingghost 66b3248dd3 mix tests probably shouldn't be async 2024-05-27 04:03:13 +01:00
Floatingghost 73ead8656a don't allow emoji formatter to be async 2024-05-27 03:25:18 +01:00
Floatingghost f32a7fd76a arch is aarch64 now 2024-05-27 03:02:02 +01:00
Floatingghost 4078fd655c migrate CI config to v2 2024-05-27 02:56:05 +01:00
floatingghost 5bdef8c724 Merge pull request 'Allow for attachment to be a single object in user data' (#783) from single-attachment into develop
Reviewed-on: https://akkoma.dev/AkkomaGang/akkoma/pulls/783
2024-05-27 01:44:53 +00:00
floatingghost cdc918c8f1 Merge pull request 'Document AP and nodeinfo extensions' (#778) from Oneric/akkoma:doc_ap-extensions into develop
Reviewed-on: https://akkoma.dev/AkkomaGang/akkoma/pulls/778
2024-05-27 01:34:58 +00:00
Floatingghost f15eded3e1 Add extra test case for nonsense field, increase timeouts 2024-05-27 02:09:48 +01:00
Oneric 05eda169fe Document AP and nodeinfo extensions
And while add it point to this via a top-level
FEDERATION.md document as standardised by FEP-67ff.

Also add a few missing descriptions to the config cheatsheet
and move the recently removed C2S extension into an appropiate
subsection.
2024-05-26 19:04:06 +02:00
floatingghost 3ce855cbde Merge pull request 'Fix Exiftool stderr being read as an image description' (#782) from norm/akkoma:fix-exiftool-description into develop
Reviewed-on: https://akkoma.dev/AkkomaGang/akkoma/pulls/782
2024-05-26 16:11:12 +00:00
Floatingghost da67e69af5 Allow for attachment to be a single object in user data 2024-05-26 17:09:26 +01:00
Norm c2d3221be3 Fix Exiftool stderr being read as an image description
Fixes: https://akkoma.dev/AkkomaGang/akkoma/issues/773
2024-05-23 14:44:17 -04:00
Floatingghost 5e92f955ac bump version 2024-05-22 19:42:25 +01:00
Floatingghost b72127b45a Merge remote-tracking branch 'oneric-sec/media-owner' into develop 2024-05-22 19:36:10 +01:00
Oneric 9a91299f96 Don't try to handle non-media objects as media
Trying to display non-media as media crashed the renderer,
but when posting a status with a valid, non-media object id
the post was still created, but then crashed e.g. timeline rendering.
It also crashed C2S inbox reads, so this could not be used to leak
private posts.
2024-05-22 20:30:23 +02:00
Oneric fbd961c747 Drop activity_type override for uploads
Afaict this was never used, but keeping this (in theory) possible
hinders detecting which objects are actually media uploads and
which proper ActivityPub objects.

It was originally added as part of upload support itself in
02d3dc6869 without being used
and `git log -S:activity_type` and `git log -Sactivity_type:`
don't find any other commits using this.
2024-05-22 20:30:23 +02:00
Oneric 0c2b33458d Restrict media usage to owners
In Mastodon media can only be used by owners and only be associated with
a single post. We currently allow media to be associated with several
posts and until now did not limit their usage in posts to media owners.
However, media update and GET lookup was already limited to owners.
(In accordance with allowing media reuse, we also still allow GET
lookups of media already used in a post unlike Mastodon)

Allowing reuse isn’t problematic per se, but allowing use by non-owners
can be problematic if media ids of private-scoped posts can be guessed
since creating a new post with this media id will reveal the uploaded
file content and alt text.
Given media ids are currently just part of a sequentieal series shared
with some other objects, guessing media ids is with some persistence
indeed feasible.

E.g. sampline some public media ids from a real-world
instance with 112 total and 61 monthly-active users:

  17.465.096  at  t0
  17.472.673  at  t1 = t0 + 4h
  17.473.248  at  t2 = t1 + 20min

This gives about 30 new ids per minute of which most won't be
local media but remote and local posts, poll answers etc.
Assuming the default ratelimit of 15 post actions per 10s, scraping all
media for the 4h interval takes about 84 minutes and scraping the 20min
range mere 6.3 minutes. (Until the preceding commit, post updates were
not rate limited at all, allowing even faster scraping.)
If an attacker can infer (e.g. via reply to a follower-only post not
accessbile to the attacker) some sensitive information was uploaded
during a specific time interval and has some pointers regarding the
nature of the information, identifying the specific upload out of all
scraped media for this timerange is not impossible.

Thus restrict media usage to owners.

Checking ownership just in ActivitDraft would already be sufficient,
since when a scheduled status actually gets posted it goes through
ActivityDraft again, but would erroneously return a success status
when scheduling an illegal post.

Independently discovered and fixed by mint in Pleroma
1afde067b1
2024-05-22 20:30:18 +02:00