Creating a mirror

1 Private Mirrors

A complete mirror is over 250GB in size. While disk space is inexpensive, bandwidth is very costly for the public mirror hosts. Unlike subscribers to most consumer-grade ISP plans, public mirror hosts must pay according to the amount of data transferred. A complete private mirror downloads many more packages than will ever be used, wastefully transferring many gigabytes per year at the expense of the mirror operator. Please be kind to mirror operators, and do not keep a complete mirror unless you intend to share the entire mirror with many others, or you are working on the Parabola system as a whole, with the intention of contributing your work to Parabola.

If your intention is only to speed up upgrades on multiple local computers, please consider Mirroring on demand instead of setting up a complete private mirror.

2 Public Mirrors

Due to the high load and bandwidth limits, Parabola uses a 2-tier mirroring scheme. There are a few dedicated high-bandwidth mirrors which sync directly from parabola.nu, multiple times per hour. These are denoted as 'tier-1'. Only those mirrors are allowed to sync with parabola.nu. The other mirrors on the Parabola mirrorlist sync with one of the tier-1 mirrors, and are expected to sync at least once per hour. All other mirrors should sync with a tier-1 or tier-2 mirror. If you are not sure which one to use, consider geographical proximity as the best measure.

3 Distributed Mirrors

If you wish to share Parabola packages publicly, but you cannot meet the bandwidth and up-time requirements in the following section, or you would like to avoid revealing your identity or IP address, you can help Parabola distribute packages and LiveISOs by serving them on the Pacman2Pacman p2p network. That would actually be more helpful to Parabola than another centralized mirror.

Pacman2Pacman is a plugin for pacman which allows it to download via bittorrent and HTTP mirrors simultaneously and transparently, and to "seed" downloaded packages back to other Parabola users. This reduces load on the mirrors, and makes Parabola more autonomous, by making package distribution less dependent on centralized hosts.

More importantly, the Pacman2Pacman network provides an extra layer of resiliency, which is absent from the standard federated mirror network. It is the "plan-b" cushion against any unexpected outages. It is also much easier to set up and operate than a complete mirror; and it can help, even if your computer is not always online, and even if you only share the packages that you have installed anyway.

The standard federated mirror network is quite strong; but it has a certain vulnerability, due to its hierarchical nature. As with any federated network, there is an ever-present risk of partial or total blackouts. This p2p distributed (aka: de-centralized) distribution system allows Parabola to distribute software, even in some of the worst-case scenarios (eg: DNS blackout, censorship, or total loss of the Parabola web server infrastructure). However, its health depends on the participation of Parabola users. Adding your disk space and bandwidth to the distributed network, and encouraging other Parabola users to do the same, is actually preferable to growing the federated mirror network.

4 Mirror Requirements

The Parabola repositories are currently about 75GB per architecture; and there are currently 3 arches. This should be considered to be the bare minimum, as there are often supporting tools other than packages, such as LiveISOs and snapshots for exotic hardware, which could vary in quantity at any time.
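
As a rough illustration of the arithmetic above, the following sketch multiplies the per-architecture figure by the number of architectures, and compares the result against the free space on a disk. The function names are hypothetical, and the 75GB figure is simply the one quoted above; it is not an official sizing tool.

```shell
# Figure taken from the text above; adjust it if the repositories grow.
GB_PER_ARCH=75

# Print the bare-minimum disk space (in GB) for the given number of arches.
required_gb() {
  echo $(( ${GB_PER_ARCH} * $1 ))
}

# Print the free space on the filesystem holding the given path.
# 'df -Pk' reports 1024-byte blocks, so the result is in whole GiB,
# which is close enough for a rough check.
available_gb() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Succeed only when the path has more free space than the bare minimum.
has_enough_space() {
  [ "$(available_gb "$1")" -gt "$(required_gb "$2")" ]
}

# Bare minimum for the 3 arches mentioned above.
required_gb 3
```

For example, has_enough_space /srv/mirror 3 (where /srv/mirror is only a placeholder path) succeeds when that filesystem has more free space than the bare minimum for three arches. Since LiveISOs and snapshots are not counted, some extra headroom is advisable.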

All of the software in the Parabola repos is freely distributable, so anyone is free to keep their own personal mirror and offer access to it to others. There are no specific requirements for that use-case; so it can be accomplished on any computer with sufficient disk space. However, all public mirrors recognized by the Parabola project are expected to meet some criteria for speed and reliability.

4.1 Tier-0 Requirements

  • Expose a public IPv4 and IPv6 rsync service
  • Bandwidth >= 10Gbit/s
  • Sync with repo.parabola.nu, continuously
  • Always permit rsync access from all Parabola Tier-1 mirrors (and let us know if that ever becomes a problem)
  • Meet the Tier-1 requirements also, except:
    • Public access is not a strict requirement
    • HTTP/HTTPS access is not a strict requirement

4.2 Tier-1 Requirements

  • Expose a public IPv4 rsync service
  • Bandwidth >= 1Gbit/s
  • Sync with a Tier-0 source, multiple times per hour, and coordinate the schedule with the Tier-0 source admin (The 'mirrors' mailing list may be used for this purpose)
  • Demonstrate reliability and dedication (see the "Mirror Information" section for details)
  • Provide IRL contact information (see the "Mirror Information" section for details)
  • Meet the Tier-2 requirements also

4.3 Tier-2 Requirements

  • Expose a public IPv4 or IPv6 web service, with both HTTP and HTTPS support
  • Bandwidth >= 100Mbit/s
  • Disk-space > 250 GB (This may increase to 500GB in the future)
  • Sync with a Tier-1 mirror (see https://www.parabola.nu/mirrors/)
  • Sync all contents of the upstream mirror (i.e. do not sync only some repositories)
  • Sync once per hour, and coordinate the schedule with your chosen Tier-1 mirror (The 'mirrors' mailing list may be used for this purpose)
  • Meet the Tier-3 requirements also

4.4 Tier-3 Requirements

  • Expose a public IPv4 or IPv6 web service, with HTTP or HTTPS support
  • Bandwidth >= 1Mbit/s
  • May sync repositories selectively (eg: a single arch, LiveISOs only, source-balls only); but host each completely
  • Use the following rsync options: -rtlvH --delete-after --delay-updates --safe-links
  • The "Mirror Sync Script" is perfectly suitable for this purpose, and pre-configured to meet the previous criteria
  • Subscribe to the low-volume, private 'mirrors' mailing list
  • Restrict usage of the 'mirrors' mailing list to only important and on-topic messages/questions
  • Meet the private mirror requirements also
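
The rsync options listed above could be combined into a command along the following lines. This is only a sketch: UPSTREAM_URL and LOCAL_DIR are placeholders, not real Parabola endpoints or required paths, and the "Mirror Sync Script" remains the recommended way to sync.

```shell
# Placeholders only: substitute your chosen upstream mirror and local path.
UPSTREAM_URL="${UPSTREAM_URL:-rsync://mirror.example.org/parabola/}"
LOCAL_DIR="${LOCAL_DIR:-/srv/mirror/parabola/}"

# The options required of Tier-3 mirrors, exactly as listed above.
RSYNC_OPTS="-rtlvH --delete-after --delay-updates --safe-links"

# Print the rsync command, without contacting any server.
print_sync_cmd() {
  echo "rsync ${RSYNC_OPTS} ${UPSTREAM_URL} ${LOCAL_DIR}"
}

# Run the sync. RSYNC_OPTS is left unquoted on purpose, so that the
# shell splits it into separate options.
run_sync() {
  rsync ${RSYNC_OPTS} "${UPSTREAM_URL}" "${LOCAL_DIR}"
}

# Show what would be run; call run_sync explicitly to transfer anything.
print_sync_cmd
```

Printing the command first makes it easy to review the source and destination before any data is transferred or deleted, which matters because --delete-after removes local files that no longer exist upstream.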

4.5 Private Mirror Requirements

  • Disk-space > 75 GB (per arch)
  • Always check the '/lastupdate' file, and avoid running the rsync command, if the timestamps match
  • The "Mirror Sync Script" is perfectly suitable for this purpose, and pre-configured to meet the previous criteria

Note: We cannot expect that any mirror will allow unlimited access to everyone. This is at their discretion. Please respect their terms of use, and do not evade IP throttling or bans. The Pacman2Pacman network exists to provide unlimited anonymous access to everyone.

5 Join the Parabola Mirror Network

Access via http:, https:, ftp:, and rsync: is supported over both IPv4 and IPv6. Only IPv4 access via HTTP and HTTPS (tier-1 and tier-2), and IPv4 via rsync (tier-1) are mandatory. All other protocols are encouraged, but optional. We expect that you will maintain the sync schedule, which you should coordinate with your upstream mirror, in order to avoid congestion.

5.1 Parabola 'mirrors' Mailing List

We expect that all mirror operators will be subscribed to the dedicated 'mirrors' mailing list. We intend for this to be a very low-volume list. Feel free to send any messages/questions. All messages will be read by a Parabola team member; but messages will not necessarily be propagated to other subscribers. This mailing list is fully moderated to eliminate noise; and only messages which are relevant to multiple parties will be propagated to the list. Subscriptions are also moderated and the messages are not published publicly. Anyone may send messages to the list; but only operators of active mirrors will ever receive messages, and only if deemed important by a moderator.

5.2 Mirror Information

5.2.1 Public Mirrors

If you would like your mirror to become part of the official Parabola public mirror network, send an email to the Parabola mirrors mailing list with the following information:

  • Geographical location of the service (eg: the State/Province, not a large country such as USA)
  • Real name of the primary responsible party or organization
  • Email address(es) of server admin(s)
  • Nominal out-going bandwidth that your server can offer consistently
  • The upstream mirror with which you are synchronizing
  • URLs to the repository base directory on the server, for each supported protocol

Also, if you will not be hosting all repositories (all arches, LiveISOs, and source-balls), indicate which ones you will be hosting (eg: everything, i686 only, ISOs only). You can save a lot of bandwidth and disk space that way, especially if keeping backups.

Example:

Geographical Location:  Umeå, Sweden
Responsible Party:      Academic Computer Club, Umeå University
Admin Email:            admin@example.org
Alternate Email:        (optional)
Nominal Outgoing B/W:   20Mbit/s
Upstream Mirror:        rsync.cyberbits.eu
Service URLs:           http://ftp.acc.umu.se/mirror/parabola/
                        https://ftp.acc.umu.se/mirror/parabola/
                        rsync://ftp.acc.umu.se/mirror/parabola/
Repos Hosted:           everything

Private (Optional):

Personal Email:     (optional)
PGP key ID or file: (optional)
Telephone:          (optional)
Snail Mail:         (optional)

The primary public mirror information, per the example above, will be entered into the ParabolaWeb database, and will be visible only to ParabolaWeb moderators. Any optional personal email addresses and offline contact information will not be in the database.

NOTE TO MODERATORS: Be sure _not_ to pass personal contact information onto the mailing list. This information is not to be entered into the ParabolaWeb database. Parabola sysadmins keep this information locally, in an encrypted text file. The primary public mirror information should be entered into the ParabolaWeb interface though.

5.2.2 Tier-1 Mirrors

If your server can reliably provide a relatively large amount of outgoing bandwidth, we may ask that you become a Tier-1 mirror, with which less capable mirrors can sync directly. We prefer that no one asks for this promotion specifically. Rather, we may ask existing mirrors, whenever another Tier-1 mirror is needed. To be considered as a candidate Tier-1 mirror, it is sufficient to be a reliable Tier-2 mirror for a while (eg: one year, with reasonable up-time, above average bandwidth, responsiveness to out-of-sync notifications, etc). We are looking for a long-term commitment at this level.

The following information is optional for most mirrors. Mirrors will be expected to provide it to a Parabola sysadmin on a side-channel, but only when elevating to Tier-1.

  • Secondary offline contact information (telephone and/or snail-mail)
  • PGP key for private messages
  • Stable rsync IP for authorization of connections to a Tier-0 source

6 Parabola Sysadmin Tasks

Parabola mirrors are represented in three places. When a new public mirror joins the network, or when an old one leaves permanently or temporarily, or changes (geographical location, domain name, etc), adjust its representations accordingly. Private mirrors are currently not tracked nor represented anywhere.

  1. The complete mirror info is entered via the parabolaweb admin web interface. That is, everything requested in the "Mirror Information" section above. The "Notes" field is not public; but it is in the database. Personal information may be omitted (eg: the admin's personal cell phone number or home address); but add a note, indicating that you have it.
    • Ensure that new mirrors have the "active" and "public" flags set.
    • Parting mirrors can simply have the "active" flag reset. That will hide it from the web.
  2. If the mirror hosts package repos, its geographical location and HTTP URLs are in the canonical mirrorlist.txt, in the repo root directory.
    • Regenerate the 'pacman-mirrorlist' package, whenever this information changes.
  3. If the mirror hosts LiveISOs, its geographical location and one of its HTTP URLs are in the "HTTP_Mirrors" section of the Downloads page.
    • XML comments (<!-- -->) may be used on the wiki to hide a table row temporarily.

7 Mirror Sync Script

The Parabola repositories include a pre-configured mirror sync script, which can be used to create and maintain a mirror. Before using the script, there are several constants in the '### CONFIG ###' section which could be adjusted; but at the very least, 'UPSTREAM_HOST' must be defined explicitly.

In order to keep your mirror up to date, you should run this script regularly (eg: by configuring a cron job or systemd timer to your preferred schedule). Please note that, in order to minimize the server load and bandwidth used, this script will only attempt to synchronize when the upstream '/lastupdate' file has changed. Even when there are no new packages to download, rsync still transmits approximately 18MiB of metadata for each sync session, which is a non-trivial amount for a public service with many clients. Please be kind to your upstream mirror operator, and do not circumvent the if [[ "${upstream_ts}" == "${local_ts}" ]] test.
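
The '/lastupdate' check described above might look roughly like the following. This is a simplified sketch, not the actual script: it assumes that the upstream mirror serves '/lastupdate' over HTTP(S) and that curl is installed, and UPSTREAM_HTTP, LOCAL_DIR and all function names are hypothetical. It uses the portable [ ... ] form of the comparison, rather than [[ ... ]].

```shell
# Hypothetical values; the real script defines its own constants.
UPSTREAM_HTTP="${UPSTREAM_HTTP:-https://mirror.example.org/parabola}"
LOCAL_DIR="${LOCAL_DIR:-/srv/mirror/parabola}"

# Succeed (exit status 0) only when the two timestamps differ.
needs_sync() {
  [ "$1" != "$2" ]
}

# Print the local timestamp; prints nothing if the mirror was never synced.
read_local_ts() {
  cat "${LOCAL_DIR}/lastupdate" 2>/dev/null || true
}

# Print the upstream timestamp (assumes curl and an HTTP(S) upstream).
read_upstream_ts() {
  curl -fsS "${UPSTREAM_HTTP}/lastupdate"
}

# Report whether a sync is required, without running rsync.
maybe_sync() {
  upstream_ts="$(read_upstream_ts)" || return 1
  local_ts="$(read_local_ts)"
  if needs_sync "${upstream_ts}" "${local_ts}"; then
    echo "lastupdate changed: sync required"
  else
    echo "lastupdate unchanged: skipping rsync"
  fi
}
```

Fetching the small '/lastupdate' file costs only a few bytes, whereas an rsync session costs the upstream mirror the metadata transfer mentioned above, even when nothing has changed. If the upstream timestamp cannot be fetched at all, maybe_sync returns a failure status rather than falling through to a sync.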

Finally, in order to use the mirror, you should add the path to your repo on the local filesystem to your /etc/pacman.d/mirrorlist. If you would like to expose the repo to the internet or LAN, any standard web-server such as Apache or nginx will suffice. If you have enough bandwidth to serve the repo to the general public, please do consider joining the Parabola mirror network and/or seeding the Pacman2Pacman mirror network.
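
As an illustration of the mirrorlist step, the sketch below prints a 'Server' line which points pacman at a repo on the local filesystem. The path is only a placeholder, and the '$repo/os/$arch' suffix assumes that your local copy keeps the same directory layout as the public mirrors.

```shell
# Hypothetical location of the synced repo tree on the local filesystem.
LOCAL_DIR="${LOCAL_DIR:-/srv/mirror/parabola}"

# Print a pacman 'Server' line for a local path. '$repo' and '$arch'
# are single-quoted, so they stay literal for pacman to expand itself.
mirrorlist_entry() {
  printf 'Server = file://%s/$repo/os/$arch\n' "$1"
}

mirrorlist_entry "${LOCAL_DIR}"
```

The printed line can be placed at the top of /etc/pacman.d/mirrorlist, so that pacman tries the local repo before any remote mirror.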