An exporter written in Python that exports all documents from a Bookstack instance in different formats.

bookstack-file-exporter

Background

Features are actively being developed. See Future Items section for more details. Open an issue for a feature request.

This tool provides a way to export Bookstack pages and their content (text, images, metadata, etc.) into a relational parent-child layout locally with an option to push to remote object storage locations. See Backup Behavior section for more details on how pages are organized.

This small project was mainly created to run as a cron job in k8s but works anywhere. It allows me to export my docs in markdown or in other formats like pdf. I use Bookstack's markdown editor by default instead of the WYSIWYG editor, which makes my notes portable anywhere, even offline.

Features

What it does:

  • Discover and build relationships between Bookstack Shelves/Books/Chapters/Pages to create a relational parent-child layout
  • Export Bookstack pages and their content to a .tgz archive
  • Additional page content, such as images and metadata, can also be exported
  • Can modify markdown files to replace image links with local exported image paths for a more portable backup (see Modify Markdown Files)
  • YAML configuration file for repeatable and easy runs
  • Can be run via Python or Docker
  • Can push archives to remote object storage like Minio
  • Basic housekeeping option (keep_last) to keep a tidy archive destination

Supported backup targets are:

  1. local
  2. minio
  3. s3 (Not Yet Implemented)

Supported backup formats are based on the Bookstack API and are listed below:

  1. html
  2. pdf
  3. markdown
  4. plaintext

Use Case

The main use case is to back up all docs in a relational directory-tree format to cover the following scenarios:

  1. Share docs with another person to keep locally.
  2. Keep an offline copy.
  3. Back up at a file level as an accessory or alternative to disk and volume backups.
  4. Migrate all Bookstack page contents to Markdown documents for simplicity.
  5. Provide an easy way to do automated file backups locally, in Docker, or in Kubernetes for Bookstack page contents.

Using This Application

Ensure a valid configuration is provided when running this application. See Configuration section for more details.

Simple example configuration:

# config.yml
host: "https://bookstack.yourdomain.com"
credentials:
    token_id: ""
    token_secret: ""
formats:
- markdown
- html
- pdf
- plaintext
output_path: "bkps/"
assets:
    export_images: false
    modify_markdown: false
    export_meta: false
    verify_ssl: true

Run via Pip

The exporter can be installed via pip and run directly.

Examples

python -m pip install bookstack-file-exporter

# using pip
python -m bookstack_file_exporter -c <path_to_config_file>

# if you already have python bin directory in your path
bookstack-file-exporter -c <path_to_config_file>

Options

Command line options:

| option | required | description |
|---|---|---|
| -c, --config-file | True | Relative or absolute path to a valid configuration file. This configuration file is checked against a schema for validation. |
| -v, --log-level | False, default: info | Provide a valid log level: info, debug, warning, error. |

Environment Variables

See Valid Environment Variables for more options.

Example:

export LOG_LEVEL=debug

# using pip
python -m bookstack_file_exporter -c <path_to_config_file>

Python Version

Note: This application is tested and developed on Python version 3.12.X. The minimum required version is 3.8, but installing a 3.12.X version (or setting up a venv with one) is recommended.

Run Via Docker

Docker can be utilized to run the exporter.

Examples

# --user flag to override the uid/gid for created files. Set this to your uid/gid
docker run \
    --user ${USER_ID}:${USER_GID} \
    -v $(pwd)/config.yml:/export/config/config.yml:ro \
    -v $(pwd)/bkps:/export/dump \
    homeylab/bookstack-file-exporter:latest

Minimal example with object storage upload. A temporary filesystem will be used, so the archive will not persist locally.

docker run \
    -v $(pwd)/config.yml:/export/config/config.yml:ro \
    homeylab/bookstack-file-exporter:latest

Environment Variables

See Valid Environment Variables for more options.

Tokens and other options can be specified as environment variables. Example:

# '-e' flag for env vars
# --user flag to override the uid/gid for created files. Set this to your uid/gid
docker run \
    -e LOG_LEVEL='debug' \
    -e BOOKSTACK_TOKEN_ID='xyz' \
    -e BOOKSTACK_TOKEN_SECRET='xyz' \
    --user 1000:1000 \
    -v $(pwd)/config.yml:/export/config/config.yml:ro \
    -v $(pwd)/bkps:/export/dump \
    homeylab/bookstack-file-exporter:latest

Bind Mounts

| purpose | static docker path | description | example |
|---|---|---|---|
| config | /export/config/config.yml | A valid configuration file | -v /local/yourpath/config.yml:/export/config/config.yml:ro |
| dump | /export/dump | Directory to place exports. This is optional when using remote storage option(s). Omit if you don't need a local copy. | -v /local/yourpath/bkps:/export/dump |

Authentication

Note: page visibility depends on the user, so use a user that has access to the pages you want to back up.

Ref: https://demo.bookstackapp.com/api/docs#authentication

Provide a tokenId and a tokenSecret as environment variables or directly in the configuration file.

  • BOOKSTACK_TOKEN_ID
  • BOOKSTACK_TOKEN_SECRET

Env variables for credentials will take precedence over configuration file options if both are set.
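
The precedence rule can be sketched as follows. This is a minimal illustration, not the exporter's actual code: the function names `resolve_credentials` and `auth_header` are hypothetical, and the `Authorization: Token <id>:<secret>` header format is the one described in the Bookstack API authentication docs referenced above.

```python
import os


def resolve_credentials(config: dict, env=os.environ) -> tuple:
    """Return (token_id, token_secret), preferring environment variables."""
    section = config.get("credentials") or {}
    # Environment variables win over configuration file values when both are set.
    token_id = env.get("BOOKSTACK_TOKEN_ID") or section.get("token_id", "")
    token_secret = env.get("BOOKSTACK_TOKEN_SECRET") or section.get("token_secret", "")
    if not token_id or not token_secret:
        raise ValueError("Bookstack token_id and token_secret must be set")
    return token_id, token_secret


def auth_header(token_id: str, token_secret: str) -> dict:
    """Build the request header in the form 'Token <id>:<secret>'."""
    return {"Authorization": f"Token {token_id}:{token_secret}"}
```

With both sources set, the environment values are returned; with an empty environment, the values from the `credentials` section are used.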

For object storage authentication, see the respective provider sections further down.

Configuration

See below for an example and explanation. Optionally, look at the examples/ folder of the GitHub repo for more examples. Ensure Authentication has been set up beforehand for required credentials.

For object storage configuration, see the respective provider sections.

Schema and values are checked so ensure proper settings are provided. As mentioned, credentials can be specified as environment variables instead if preferred.

Just Run

Below is an example configuration to get running quickly without any additional options.

host: "https://bookstack.yourdomain.com"
credentials:
    token_id: ""
    token_secret: ""
formats: # md only example
- markdown
# - html
# - pdf
# - plaintext
output_path: "bkps/"
assets:
    export_images: false
    modify_markdown: false
    export_meta: false
    verify_ssl: true

Full Example

Below is an example configuration that shows example values for all possible options.

host: "https://bookstack.yourdomain.com"
credentials:
    token_id: ""
    token_secret: ""
additional_headers:
  test: "test"
  test2: "test2"
  User-Agent: "test-agent"
formats:
  - markdown
  - html
  - pdf
  - plaintext
minio:
  host: "minio.yourdomain.com"
  access_key: ""
  secret_key: ""
  region: "us-east-1"
  bucket: "mybucket"
  path: "bookstack/file_backups"
  keep_last: 5
output_path: "bkps/"
assets:
  export_images: true
  modify_markdown: false
  export_meta: false
  verify_ssl: true
keep_last: 5

Options and Descriptions

More descriptions can be found for each section below:

| Configuration Item | Type | Required | Description |
|---|---|---|---|
| host | str | true | If http/https is not specified in the url, defaults to https. Use assets.verify_ssl to disable certificate checking. |
| credentials | object | false | Optional section where the Bookstack tokenId and tokenSecret can be specified. Env variables for credentials may be supplied instead. See Authentication for more details. |
| credentials.token_id | str | true if credentials | If the credentials section is given, this should be a valid tokenId. |
| credentials.token_secret | str | true if credentials | If the credentials section is given, this should be a valid tokenSecret. |
| additional_headers | object | false | Optional section where key/value pairs can be specified to use in Bookstack http request headers. |
| formats | list<str> | true | Which export formats to use for Bookstack page content. Valid options are: ["markdown", "html", "pdf", "plaintext"] |
| output_path | str | false | Optional (default: cwd), which directory (relative or full path) to place exports in. The user who runs the command should have read/write access to this directory. If not provided, the current run directory is used. |
| assets | object | false | Optional section to export additional assets from pages. |
| assets.export_images | bool | false | Optional (default: false), export all images for a page to an image directory within the page directory. See Backup Behavior for more information on layout. |
| assets.modify_markdown | bool | false | Optional (default: false), modify markdown files to replace image links with local exported image paths. This requires assets.export_images to be true in order to work. See Modify Markdown Files for more information. |
| assets.export_meta | bool | false | Optional (default: false), export metadata about the page in a json file. |
| assets.verify_ssl | bool | false | Optional (default: true), whether or not to check ssl certificates when requesting content from the Bookstack host. |
| keep_last | int | false | Optional (default: None), whether the exporter can delete older archives. Valid values: -1 deletes all archives after each run (useful if you only want to upload to object storage); 1+ retains that number of archives; 0 results in no action. |
| minio | object | false | Optional Minio configuration options. |
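
A few of the rules in the table can be checked with a short sketch like the one below. It is an illustration under assumptions, not the exporter's own schema validation: `normalize_host` and `find_config_problems` are hypothetical names, and whether the exporter rejects or silently ignores `modify_markdown` without `export_images` is not stated here.

```python
VALID_FORMATS = {"markdown", "html", "pdf", "plaintext"}


def normalize_host(host: str) -> str:
    """Default to https when the url has no http/https scheme."""
    if host.startswith(("http://", "https://")):
        return host
    return f"https://{host}"


def find_config_problems(config: dict) -> list:
    """Return human-readable problems found in a parsed configuration."""
    problems = []
    if not config.get("host"):
        problems.append("host is required")
    formats = config.get("formats") or []
    if not formats:
        problems.append("formats is required")
    for fmt in formats:
        if fmt not in VALID_FORMATS:
            problems.append(f"unsupported format: {fmt}")
    assets = config.get("assets") or {}
    # modify_markdown only works when images are exported as well.
    if assets.get("modify_markdown") and not assets.get("export_images"):
        problems.append("assets.modify_markdown requires assets.export_images")
    return problems
```

An empty list means none of these particular checks failed; it does not imply the full schema is satisfied.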

Valid Environment Variables

General

  • LOG_LEVEL: default: info. Provide a valid log level: info, debug, warning, error.

Bookstack Credentials

  • BOOKSTACK_TOKEN_ID
  • BOOKSTACK_TOKEN_SECRET

Minio Credentials

  • MINIO_ACCESS_KEY
  • MINIO_SECRET_KEY

Backup Behavior

General

Backups are exported in .tgz format and named using a timestamp. Export names use the format %Y-%m-%d_%H-%M-%S (Year-Month-Day_Hour-Minute-Second). Files are first pulled locally to create the tarball, which can then be sent to object storage if needed. Example file name: bookstack_export_2023-09-22_07-19-54.tgz.
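
As a small illustration of the naming format, the example file name can be reproduced with `strftime`. The function name `archive_name` is hypothetical and the `bookstack_export_` prefix is taken from the example file name above.

```python
from datetime import datetime


def archive_name(now: datetime) -> str:
    """Build an archive file name from a timestamp."""
    return f"bookstack_export_{now.strftime('%Y-%m-%d_%H-%M-%S')}.tgz"


print(archive_name(datetime(2023, 9, 22, 7, 19, 54)))
# bookstack_export_2023-09-22_07-19-54.tgz
```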

The exporter can also do housekeeping duties and keep a configured number of archives and delete older ones. See keep_last property in the Configuration section. Object storage provider configurations include their own keep_last property for flexibility.
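
The keep_last semantics described in the Configuration section can be sketched roughly as below. This is an assumption-laden illustration rather than the exporter's implementation: `archives_to_delete` is a hypothetical helper, and it relies on the timestamped names sorting from oldest to newest.

```python
def archives_to_delete(archive_names, keep_last):
    """Return the archive names that housekeeping would remove."""
    if keep_last is None or keep_last == 0:
        return []  # no action
    ordered = sorted(archive_names)  # timestamped names sort oldest to newest
    if keep_last == -1:
        return ordered  # delete everything after each run
    if keep_last < -1:
        raise ValueError("keep_last must be -1, 0, or a positive integer")
    # Keep the newest `keep_last` archives and remove the rest.
    return ordered[:-keep_last]
```

For example, with three archives and keep_last set to 2, only the oldest archive is returned for deletion.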

File names use slug names from the Bookstack API. As a result, certain characters such as ! and / are dropped and spaces are replaced in page names/titles.

All sub directories will be created as required during the export process.

Shelves --> Books --> Chapters --> Pages

## Example
kafka (shelf)
---> controller (book)
    ---> settings (chapter)
        ---> retention-settings.md (page)
        ---> retention-settings_meta.json
            ...
        ---> compression.html (page)
        ---> compression.pdf
        ---> compression_meta.json
            ...
        ---> optional-config.md (page)
            ...
        ---> main.md (page)
            ...
---> broker (book)
    ---> settings.md (page)
        ...
    ---> deploy.md (page)
        ...
kafka-apps (shelf)
---> schema-registry (book)
    ---> protobuf.md (page)
        ...
    ---> settings.md (page)
        ...

## Example with image layout
# unassigned dir is used for books with no shelf
unassigned (shelf)
---> test (book)
    ---> images (image_dir)
        ---> test_page (page directory)
            ---> img-001.png
            ---> img-002.png
        ---> rec-page
            ---> img-010.png
            ---> img-020.png
    ---> test_page.md (page)
            ...
    ---> rec_page (page)
        ---> rec_page.md
        ---> rec_page.pdf

Another example is shown below:

## First example:
# programming = shelf
# react = book
# basics = page

bookstack_export_2023-11-28_06-24-25/programming/react/basics.md
bookstack_export_2023-11-28_06-24-25/programming/react/basics.pdf
bookstack_export_2023-11-28_06-24-25/programming/react/images/basics/YKvimage.png
bookstack_export_2023-11-28_06-24-25/programming/react/images/basics/dwwimage.png
bookstack_export_2023-11-28_06-24-25/programming/react/images/basics/NzZimage.png
bookstack_export_2023-11-28_06-24-25/programming/react/images/nextjs/next1.png
bookstack_export_2023-11-28_06-24-25/programming/react/images/nextjs/tips.png
bookstack_export_2023-11-28_06-24-25/programming/react/nextjs.md
bookstack_export_2023-11-28_06-24-25/programming/react/nextjs.pdf

Books without a shelf will be put in a shelf folder named unassigned.
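
The directory layout, including the unassigned fallback, can be sketched as a path builder. The function name and parameters below are hypothetical, and the file extension is passed in explicitly because only some extensions (.md, .html, .pdf) appear in the examples above.

```python
from pathlib import PurePosixPath


def page_export_path(page_slug, book_slug, extension, shelf_slug=None, chapter_slug=None):
    """Build the relative path shelf/book[/chapter]/page.ext inside the archive."""
    # Books that do not belong to a shelf go under the "unassigned" folder.
    parts = [shelf_slug or "unassigned", book_slug]
    if chapter_slug:
        parts.append(chapter_slug)
    return str(PurePosixPath(*parts) / f"{page_slug}{extension}")


print(page_export_path("basics", "react", ".md", shelf_slug="programming"))
# programming/react/basics.md
```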

Empty/new pages will be ignored: they have not been modified since creation, so they have no content and no valid slug. Example:

{
    ...
    "name": "New Page",
    "slug": "",
    ...
}
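
A filter along these lines would skip such pages. It is only a sketch: `exportable_pages` is a hypothetical name, and the input is assumed to be a list of page dictionaries shaped like the API response above.

```python
def exportable_pages(pages):
    """Keep only pages that have a non-empty slug."""
    return [page for page in pages if page.get("slug")]


pages = [
    {"name": "New Page", "slug": ""},
    {"name": "Basics", "slug": "basics"},
]
print(exportable_pages(pages))
# [{'name': 'Basics', 'slug': 'basics'}]
```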

You may notice that some directories (books) and/or files (pages) in the archive have a random string at the end, for example nKA in user-and-group-management-nKA. This is expected: a resource with the same name was created in another shelf, and Bookstack appends a string to ensure uniqueness.

Images

Images will be dumped in a separate directory named images within the parent (book/chapter) directory of the page they belong to. The relative path will be {parent}/images/{page}/{image_name}. As shown earlier:

bookstack_export_2023-11-28_06-24-25/programming/react/images/basics/dwwimage.png
bookstack_export_2023-11-28_06-24-25/programming/react/images/basics/NzZimage.png
bookstack_export_2023-11-28_06-24-25/programming/react/images/nextjs/next1.png
bookstack_export_2023-11-28_06-24-25/programming/react/images/nextjs/tips.png
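
The {parent}/images/{page}/{image_name} pattern can be expressed as a one-line helper. The name `image_export_path` and its parameters are hypothetical and only illustrate the layout.

```python
from pathlib import PurePosixPath


def image_export_path(parent_dir, page_slug, image_name):
    """Build {parent}/images/{page}/{image_name} for an exported image."""
    return str(PurePosixPath(parent_dir) / "images" / page_slug / image_name)


print(image_export_path("programming/react", "basics", "dwwimage.png"))
# programming/react/images/basics/dwwimage.png
```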

Note: you may see old images in your exports. This is because, by default, Bookstack retains uploaded images/drawings even if they are no longer referenced on an active page. Admins can run Cleanup Images in the Maintenance Settings or via CLI to remove them.

Modify Markdown Files

To use this feature, assets.export_images should be set to true.

The configuration item assets.modify_markdown can be set to true to modify markdown files so that image url links are replaced with local exported image paths. This makes your markdown exports much more portable.

Page (parent) -> Images (Children) relationships are created and then each image url is replaced with its own respective local export path. Example:

## before
[![pool-topology-1.png](https://demo.bookstack/uploads/images/gallery/2023-07/scaled-1680-/pool-topology-1.png)](https://demo.bookstack/uploads/images/gallery/2023-07/pool-topology-1.png)

## after
[![pool-topology-1.png](./images/{page_name}/pool-topology-1.png)](https://demo.bookstack/uploads/images/gallery/2023-07/pool-topology-1.png)

This allows the image to be found locally within the export files, so your markdown docs display all their images as they normally would.

Note: This will work properly if your pages are using the notation used by Bookstack for Markdown image links, example: [![image alt text](Bookstack Markdown image URL link)](anchor/url link) The (anchor/url link) is optional.
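
One way to picture the replacement is a regular expression over the image part of the link. The sketch below is not the exporter's actual implementation: `localize_image_links`, `page_name`, and `host` are hypothetical names, and it simply maps each image url on the given host to ./images/{page_name}/{file name}, leaving the outer anchor link untouched.

```python
import posixpath
import re
from urllib.parse import urlparse

# Matches the image part of a markdown link: ![alt text](url)
IMAGE_PATTERN = re.compile(r"!\[(?P<alt>[^\]]*)\]\((?P<url>[^)\s]+)\)")


def localize_image_links(markdown: str, page_name: str, host: str) -> str:
    """Replace image urls served by `host` with local exported image paths."""

    def _replace(match):
        url = match.group("url")
        if not url.startswith(host):
            return match.group(0)  # leave third-party images as they are
        file_name = posixpath.basename(urlparse(url).path)
        return f"![{match.group('alt')}](./images/{page_name}/{file_name})"

    return IMAGE_PATTERN.sub(_replace, markdown)
```

Applied to the "before" line above with a page named pools, the image url becomes ./images/pools/pool-topology-1.png while the anchor url stays the same.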

Object Storage

Optionally, target(s) can be specified to upload generated archives to a remote location. Supported object storage providers can be found below:

Minio Backups

Optionally, look at examples/minio_config.yml in the GitHub repo for more examples.

Authentication

Credentials can be specified directly in the minio configuration section or as environment variables. If both are specified, the environment variables take precedence.

Environment variables:

  • MINIO_ACCESS_KEY
  • MINIO_SECRET_KEY

Example

minio:
    host: "minio.yourdomain.com"
    region: "us-east-1"
    bucket: "mybucket"
    access_key: ""
    secret_key: ""
    path: "bookstack/file_backups"
    keep_last: 5

Configuration

| Item | Type | Required | Description |
|---|---|---|---|
| host | str | true | Hostname for minio. A host/ip + port combination is also allowed, example: minio.yourdomain.com:8443 |
| region | str | true | This is required since the minio api appears to require it. Set to the region your bucket resides in; if unsure, try us-east-1. |
| bucket | str | true | Bucket to upload to. |
| access_key | str | false if specified through env var, otherwise true | Access key for the minio instance. |
| secret_key | str | false if specified through env var, otherwise true | Secret key for the minio instance. |
| path | str | false | Optional, path of the backup to use. Will use the root bucket path if not set. <bucket_name>:/<path>/bookstack-<timestamp>.tgz |
| keep_last | int | false | Optional (default: None), whether the exporter can delete older archives in minio. Valid values: 1+ retains that number of archives; 0 results in no action. |
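
How the optional path combines with the archive file name can be sketched as below. The helper name `object_key` is hypothetical, and the exact object naming used by the exporter is an assumption here; the sketch only shows the "path prefix or bucket root" behavior described for the path option.

```python
import posixpath


def object_key(archive_file_name, path=None):
    """Return the object key for an archive, using the bucket root when no path is set."""
    if not path:
        return archive_file_name
    return posixpath.join(path.strip("/"), archive_file_name)


print(object_key("bookstack_export_2023-09-22_07-19-54.tgz", "bookstack/file_backups"))
# bookstack/file_backups/bookstack_export_2023-09-22_07-19-54.tgz
```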

Future Items

  1. Be able to pull images locally and place in their respective page folders for a more complete file level backup. (Available via assets.export_images, see Features.)
  2. Include the exporter in a maintained helm chart as an optional deployment. The helm chart is here.
  3. Be able to modify markdown links of images to local exported images in their respective page folders for a more complete file level backup. (Available via assets.modify_markdown, see Modify Markdown Files.)
  4. Be able to pull attachments locally and place in their respective page folders for a more complete file level backup.
  5. Export to S3 and more options.
  6. Filter shelves and books by name - for more targeted backups. Example: you only want to share a book about one topic with an external friend/user.
  7. Be able to pull media/photos from 3rd party providers like drawio

Download files

Source Distribution

bookstack-file-exporter-1.0.1.tar.gz (27.7 kB)

Uploaded Source

Built Distribution

bookstack_file_exporter-1.0.1-py3-none-any.whl (27.4 kB)

Uploaded Python 3
