Archive your Bookmarks - Personal Archive Tool
Tim Schuster 771d316074

AYBM - Archive your Bookmarks

AYBM is a tool that aims to make the archival of your bookmarks as easy as possible.

Using a plugin-driven architecture, AYBM can archive data very precisely if needed. For example, it can capture a PDF file only as a PDF instead of downloading it with Wget's recursive options and taking screenshots of it.

AYBM offers a self-consistent data structure: an archive consists of an infofile that contains all relevant information about the page and its archival status.

Installation

To install, run: go get go.rls.moe/aybm

Usage

AYBM is not yet complete, so there is no binary and no documented usage guide yet.

However, to capture an RSS feed, go into the aybm-rss directory and run go run main.go <RSS Feed URL>

AYBM will fetch all links from the feed.
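As a rough illustration of what "fetching all links from the feed" involves, the following sketch pulls the link of every item out of an RSS 2.0 document using only the Go standard library. It is not taken from aybm-rss: the names rssFeed, extractLinks and sampleFeed are hypothetical, and the real tool may parse feeds differently.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// rssFeed mirrors only the parts of an RSS 2.0 document needed here.
// Hypothetical type for illustration, not AYBM's own data structure.
type rssFeed struct {
	Items []struct {
		Link string `xml:"link"`
	} `xml:"channel>item"`
}

// extractLinks returns the non-empty <link> of every <item> in doc.
func extractLinks(doc string) ([]string, error) {
	var feed rssFeed
	if err := xml.Unmarshal([]byte(doc), &feed); err != nil {
		return nil, err
	}
	links := make([]string, 0, len(feed.Items))
	for _, item := range feed.Items {
		if l := strings.TrimSpace(item.Link); l != "" {
			links = append(links, l)
		}
	}
	return links, nil
}

// sampleFeed is a made-up feed used only to exercise extractLinks.
const sampleFeed = `<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example</title>
  <link>https://example.org/</link>
  <item><title>A</title><link>https://example.org/a</link></item>
  <item><title>B</title><link>https://example.org/b.pdf</link></item>
</channel></rss>`

func main() {
	links, err := extractLinks(sampleFeed)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// Each link would then be handed to the archiver.
	for _, l := range links {
		fmt.Println(l)
	}
}
```

The channel-level link is skipped on purpose, since only the per-item links are bookmarks to archive.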

Goals

  1. Make archival easy: installation should be quick, and the tool should report missing dependencies.
  2. Make data self-contained: the archived site should not depend on outside files, which makes sharing easy.
  3. Unsupervised archiving: the tool should work mostly unsupervised and make good decisions on its own.
  4. Don't fix data: if there is an error, a human should fix it, because archives are sensitive.
  5. Extensible: some sites are better captured with dedicated tools, and AYBM should allow developers to extend it easily to use these tools.

Roadmap

[x] PDF Archival
[x] Favicon Archival
[x] Wget Archival (Recursive + WARC)
[ ] Metadata Capture on HTML Pages
[ ] PDF Printout of Webpages
[ ] Screenshots (Fixed Size)
[ ] Screenshots (Dynamic Size)
[ ] Web Interface (Read-Only)
[ ] Web Administration Interface (retry captures, fetch links automatically from feed, etc.)

Config

The Git repository comes with a default configuration; copy it to the working directory of your AYBM tool.

The config is written in SECL format. The existing options should give sufficient examples to work with the file easily.

Linkhash

By default, the directory path is /bookmarks/{linkhash:chop}

linkhash refers to the xxhash64 algorithm, which is a fast, non-cryptographic hash with a low collision rate. This default path should handle up to a couple million links without running into any collisions.

You can use any of these placeholders instead:

  • {linkhash:sha256}, {linkhash:sha256-chop} for SHA2 256bit
  • {linkhash:sha512}, {linkhash:sha512-chop} for SHA2 512bit
  • {linkhash}, {linkhash:chop} for xxhash 64bit
  • {linkhash:short}, {linkhash:short-chop} for xxhash 32bit (this is shorter but might generate collisions once you store more than a couple thousand links)
  • {link} for the escaped link
  • {timestamp}, {timestamp:hex} for the current unix timestamp; beware that this currently breaks refetching and remembering which links are stored
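The collision estimates for the 64-bit and 32-bit hashes can be checked with the birthday bound, p ≈ 1 - exp(-n(n-1) / 2^(bits+1)). The sketch below assumes hash values are uniformly distributed, which is an idealization, and the function name collisionProbability is made up for this example. Under that assumption, two million links in a 64-bit space give a collision probability on the order of 10^-7, while a 32-bit space already reaches a few tenths of a percent at five thousand links.

```go
package main

import (
	"fmt"
	"math"
)

// collisionProbability approximates the chance that at least two of n
// uniformly distributed hash values of the given bit width collide,
// using the birthday bound p ≈ 1 - exp(-n(n-1) / 2^(bits+1)).
func collisionProbability(n float64, bits int) float64 {
	space := math.Pow(2, float64(bits))
	// Expm1 keeps precision when the probability is very small.
	return -math.Expm1(-n * (n - 1) / (2 * space))
}

func main() {
	fmt.Printf("64-bit,  2,000,000 links: %.2e\n", collisionProbability(2e6, 64))
	fmt.Printf("32-bit,      5,000 links: %.2e\n", collisionProbability(5e3, 32))
	fmt.Printf("32-bit,    100,000 links: %.2e\n", collisionProbability(1e5, 32))
}
```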

The chop option splits the path every 4 characters by default; this can be adjusted in extra.chop-size.
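To make the effect of chopping concrete, here is a minimal sketch that hashes a link with SHA-256 and splits the hex digest into 4-character path segments. It is an illustration only: the helper chop, the "/" separator and the hex encoding of the digest are assumptions, not AYBM's confirmed behaviour. SHA-256 is used here because it is in the Go standard library, whereas xxhash would need a third-party package.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// chop splits s into pieces of at most size bytes and joins them with "/".
// Hypothetical helper; it assumes ASCII input such as a hex digest.
func chop(s string, size int) string {
	if size <= 0 || len(s) <= size {
		return s
	}
	var parts []string
	for len(s) > size {
		parts = append(parts, s[:size])
		s = s[size:]
	}
	parts = append(parts, s) // remaining tail, 1..size bytes long
	return strings.Join(parts, "/")
}

func main() {
	sum := sha256.Sum256([]byte("https://example.org/article"))
	digest := hex.EncodeToString(sum[:])
	// Roughly what a {linkhash:sha256-chop} path could look like.
	fmt.Println("/bookmarks/" + chop(digest, 4))
}
```

Splitting the hash into short segments keeps the number of entries per directory small, which is presumably the reason the option exists.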

User-Agent

The default user agent is Lynx AYBM/Crawler0.1. This prompts some websites to present themselves in a form better suited to capture, at the cost of reduced graphical complexity.

Changing the user agent is not recommended, but you can adjust it in extra.user-agent.

LogLevel

The default log level is debug; for a production setting, warn is recommended.

You may set DISCORD_WEBHOOK_URL to receive log messages at warn level and above in a Discord channel.
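The following sketch shows one way such forwarding could work. It is not AYBM's logging code. It assumes DISCORD_WEBHOOK_URL is read as an environment variable, and the level names and their ordering, the message format, and the functions shouldNotify, buildPayload and notify are all made up for illustration. The only external fact it relies on is that a Discord webhook accepts a JSON body with a content field.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// levels orders the log levels used in this sketch (an assumption).
var levels = map[string]int{"debug": 0, "info": 1, "warn": 2, "error": 3}

// shouldNotify reports whether level is known and at or above warn.
func shouldNotify(level string) bool {
	rank, ok := levels[level]
	return ok && rank >= levels["warn"]
}

// buildPayload encodes a log message as a Discord webhook JSON body.
func buildPayload(level, msg string) ([]byte, error) {
	content := fmt.Sprintf("[%s] %s", level, msg)
	return json.Marshal(map[string]string{"content": content})
}

// notify posts msg to the webhook if a URL is set and the level qualifies.
func notify(webhookURL, level, msg string) error {
	if webhookURL == "" || !shouldNotify(level) {
		return nil
	}
	body, err := buildPayload(level, msg)
	if err != nil {
		return err
	}
	resp, err := http.Post(webhookURL, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("webhook returned %s", resp.Status)
	}
	return nil
}

func main() {
	// Assumption: the URL comes from the environment.
	hook := os.Getenv("DISCORD_WEBHOOK_URL")
	if err := notify(hook, "warn", "capture failed for https://example.org/a"); err != nil {
		fmt.Fprintln(os.Stderr, "notify:", err)
	}
}
```

With the variable unset, notify does nothing, so the sketch is safe to run without a webhook.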