Discussion: Existing cloud mirror automation
Bastian Blank
2018-10-15 08:55:54 UTC
Hi folks

We decided we would be willing to maintain cloud provider mirrors.

These are the things I have already created:

You can try the current test setup running on GCE:

http://cdn-gce.deb.debian.org/debian
https://cdn-gce.deb.debian.org/debian
http://cdn-gce.deb.debian.org/debian-security
https://cdn-gce.deb.debian.org/debian-security

The Terraform definition:
https://salsa.debian.org/mirror-team/debian-mirror-cloud-terraform

The system Ansible definition:
https://salsa.debian.org/mirror-team/debian-mirror-cloud-ansible

And the mirror config definition:
https://salsa.debian.org/mirror-team/mirror/ansible

I developed this mainly for the GCE case, but there are initial AWS
definitions in the Terraform repository.

Regards,
Bastian
--
"... freedom ... is a worship word..."
"It is our worship word too."
-- Cloud William and Kirk, "The Omega Glory", stardate unknown
Bastian Blank
2018-10-15 11:44:44 UTC
Here is my initial description of this setup:

Google has a globally distributed anycast load balancer, which I'd like
to use. So we have one IPv4 (plus one IPv6) address that supports HTTP
and HTTPS and is, hopefully, reachable from most parts of the world.
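As a quick sanity check from any client, the test hostname from the
first message should resolve to exactly one IPv4 and one IPv6 address,
both pointing at the anycast load balancer. A minimal sketch using the
Python standard library (only the hostname from the test setup above is
assumed):

    # Resolve the test hostname and collect the distinct addresses;
    # the anycast setup should yield one IPv4 and one IPv6 address.
    import socket

    addrs = {
        info[4][0]
        for info in socket.getaddrinfo(
            "cdn-gce.deb.debian.org", 443, proto=socket.IPPROTO_TCP
        )
    }
    print(addrs)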

There is one large difference to what we currently have at Fastly and
CloudFront: the load balancer can only use backends within Google's own
cloud infrastructure, so we need to run backends there. This also makes
sure the setup does not share any failure modes with Fastly and
CloudFront.

Also, every system carries only one archive, so even the main and
security archives are completely separated.

Setup
-----

My currently planned initial setup covers everything for the main and
security archives. The others, debug and ports, will be redirects for
the time being.

This means the following number of instances:
- one syncproxy per archive (= 2),
- two backends per archive in three regions (= 12),
- one jump host and management router (= 1), and
- one monitoring host for Prometheus and Icinga2 (= 1).

The numbers for the backends can easily be changed later. Setting up a
new one for the main archive from an existing snapshot takes less than
30 minutes.
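For reference, those numbers add up as follows; a small sketch using
only the figures from the list above:

    # Instance count for the initial setup: 2 archives, 3 regions,
    # 2 backends per archive and region, plus jump and monitoring hosts.
    archives = ["main", "security"]
    regions = 3
    backends_each = 2  # per archive and region

    syncproxies = len(archives)                          # 2
    backends = len(archives) * regions * backends_each   # 12
    jump_hosts = 1
    monitoring_hosts = 1

    print(syncproxies + backends + jump_hosts + monitoring_hosts)  # 16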

Prometheus is used to extract both host and web access statistics,
including from the load balancer. Apache is not able to provide this
kind of information, so this setup runs nginx.
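To illustrate how such web access statistics can be pulled back out,
here is a sketch that queries the Prometheus HTTP API for the
per-backend request rate. The server address and the metric name
(nginx_http_requests_total, as exposed by a typical nginx exporter) are
assumptions, not details of this setup:

    # Query the standard Prometheus HTTP API (/api/v1/query) for the
    # request rate per backend instance over the last five minutes.
    import requests

    PROMETHEUS_URL = "http://prometheus.example.org:9090"  # placeholder

    query = "sum by (instance) (rate(nginx_http_requests_total[5m]))"
    resp = requests.get(
        f"{PROMETHEUS_URL}/api/v1/query",
        params={"query": query},
        timeout=10,
    )
    resp.raise_for_status()
    for result in resp.json()["data"]["result"]:
        print(result["metric"].get("instance"), result["value"][1])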

More traditional monitoring via Icinga2 is also partially configured,
but does not produce any output yet.

The whole setup is maintained in three different git repositories:
- the cloud resources themselves are managed via Terraform,[^terraform-git]
- the system setup is done via Ansible,[^ansible-git] and
- the mirror sync setup is done via the mirror team's Ansible repository.

Terraform does the whole network, load balancer, and instance setup.
Only Terraform knows how many systems exist; everything else just asks
the Google platform to see what's available.
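As an illustration of the "ask the platform" part, the instances
Terraform created can simply be listed through the gcloud CLI. The
label filter used here (labels.role=backend) is a hypothetical naming
convention, not necessarily the one from the Terraform repository:

    # List the backend instances by asking the Google platform directly,
    # instead of keeping a separate inventory of systems.
    import json
    import subprocess

    out = subprocess.run(
        [
            "gcloud", "compute", "instances", "list",
            "--filter=labels.role=backend",
            "--format=json",
        ],
        check=True,
        capture_output=True,
        text=True,
    ).stdout

    for instance in json.loads(out):
        print(instance["name"], instance["zone"].rsplit("/", 1)[-1])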

Capacity planning
-----------------

The described setup should (judging from the official quota
documentation and my own observations) provide us, per backend, with:
- Network: 4 Gbps
- Disk for the main archive: 240 MB/s, 15k IOPS
- Disk for the security archive: 72 MB/s, 7k IOPS

So, cumulated across the backends, we get:
- for the main archive:
  - 24 Gbps network
  - 1440 MB/s disk
- for the security archive:
  - 24 Gbps network
  - 432 MB/s disk
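Those totals follow directly from the per-backend figures and the
backend count (two backends per region in three regions):

    # Cumulated throughput per archive: 2 backends x 3 regions = 6 backends.
    backends_per_archive = 2 * 3

    per_backend = {
        "main":     {"network_gbps": 4, "disk_mb_s": 240},
        "security": {"network_gbps": 4, "disk_mb_s": 72},
    }

    for archive, figures in per_backend.items():
        print(
            archive,
            figures["network_gbps"] * backends_per_archive, "Gbps,",
            figures["disk_mb_s"] * backends_per_archive, "MB/s",
        )
    # main 24 Gbps, 1440 MB/s
    # security 24 Gbps, 432 MB/s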

As everything is also cached in the CDN, we can push much higher
figures to the users.

Still unfinished stuff
----------------------

- Instance mail. Google blocks access to ports 25, 465, and 587, so
mail needs to go another way.
- If we want to do HTTPS, DSA needs to provide certificates and
automatically update them in the Google cloud target proxy (see the
sketch below).
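For the HTTPS point, one possible shape of the renewal step is sketched
below. It drives gcloud from Python; the resource names
(mirror-cert-2018-10, mirror-https-proxy) and the overall flow are
assumptions, not an agreed design:

    # Hypothetical renewal step: upload a freshly issued certificate and
    # switch the load balancer's HTTPS target proxy over to it.
    import subprocess

    def gcloud(*args):
        subprocess.run(["gcloud", "compute", *args], check=True)

    # Upload certificate and key as a new SSL certificate resource.
    gcloud("ssl-certificates", "create", "mirror-cert-2018-10",
           "--certificate=fullchain.pem", "--private-key=privkey.pem")

    # Point the target proxy at the new certificate.
    gcloud("target-https-proxies", "update", "mirror-https-proxy",
           "--ssl-certificates=mirror-cert-2018-10")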
--
Death, when unnecessary, is a tragic thing.
-- Flint, "Requiem for Methuselah", stardate 5843.7