Sprint minutes

Tomasz Rybak

2018-11-12 23:19:39 UTC

Hello.
I finished editing and polishing minutes from our sprint.
Here you can find all files from Gobby, joined and reordered,
so related issues are close together. I also tried to wrap lines
before 80 columns, but it was not always possible.

Warning - it's almost 1300 lines. Have a nice read!

We had 3 intensive days of sprint.

ToC
1. Status on platforms
2. Status of building
3. Testing and plans for workflows
a) Salsa/GitLab workflow
b) testing and related publishing workflows
4. Buster
5. LTS, backports, variants
6. Other topics:
a) cloud-init
b) cloud kernel
c) unattended upgrades, user name, services, initramfs, networking, etc.
7. Non-x86 ports
8. Secure boot
9. Governance
a) Mirror network
b) Account management
10. Wrap up, summary

Status on various platforms
===========================
We have 4 clouds represented in the room.

* how is Debian running on each platform?
* what needs addressing?
* current tools we have and what is already deployed:
* what guest/SDK code isn't in Debian yet, how can we get there?
* Supporting those packages
* what isn't built with FAI yet, how can we get there?

Azure
-----
Debian is running fine, growing in usage and popularity.
Support needs addressing. Different policies. Timelines of supports?

Daily images are removed 14 days after release.
Official images are still built using custom scripts, not FAI. The script is
derived from the OpenStack script.
Road map to move to common set of tools (esp. FAI)
* For consistency
* Easy for end users to build custom images based on official configs
* Should evangelise doing this!

SDK, daemons, CLI: own repository. Do we try to integrate it into Debian
repositories?
Cloud kernel - good, interesting, beneficial. Already used for some images
on Azure
Update speed

Cloud init - not currently using it, but customers are asking

GCE
---
Debian is still the default image

Still using bootstrap-vz for now, going to move to FAI for buster+.
Buster image with FAI has issues - can we resolve?
Is Debian enterprise distro? Companies support (financially) LTS

Security and EOL notices
* Reasonable (6m?) notice for EOL of a release
* GCE's concept of deprecating images: image is not recommended or chosen
  by default, but users can choose it (by choosing to see deprecated
  images) and use it with warnings
* Deprecation of weekly images (the same for all clouds) (Azure deletes
  them after 14 days)

Still having pain getting guest code into Debian
* Currently own repository and package for both guest
  (https://github.com/GoogleCloudPlatform/compute-image-packages) and SDK.
* The SDK will possibly never be part of Debian, but the guest will be and is
  in the process of being added as a Debian package
  (https://salsa.debian.org/debian/google-compute-image-packages)
* The SDK (gcloud) is not needed to use the images and is optional for users.
Maintaining GCE guest SW in Debian is a worry

Don't really care about cloud-init, but some users do.
* Slow (adds an average of 5 seconds to boot time), only boot-time config
  (not runtime)
* Too many dependencies, hard to maintain.
* Not against a cross-distro, cross-platform tool, but the existing one is
  not good enough.

UEFI wanted, with Secure Boot

AWS
---
Things are stable, quite OK
We (most probably) run on all instance types.
Users use marketplace or other way (without statistics via marketplace)
Debian is default for Kubernetes kops - suggests good quality of images.
But it uses Jessie with a backported kernel

Stretch generated using FAI, regenerated regularly
not using the cloud kernel yet, missing a couple of drivers? see BTS

Problems with gov cloud. Requires paperwork; ownership of account by
Debian should help here

cloud-init and the AWS CLI are included in the image

OpenStack
---------
Still using the old build script, but happy to move to FAI (with help!)
Building regularly for Jessie and Stretch, and weekly buster builds
amd64 and arm64
Basic support for arm64
Still wanting to add ppc64el (and s390x?)

Kernel usage. Currently the backported kernel is used; suggestion is not to
use the cloud kernel.
OpenStack requires more drivers.
Many platforms (Xen, kvm, etc.). But not all supported (only kvm, or more?)
We might not support everything, but users can build (and maintain) their own
variants or configurations
virtio is covered (what does it mean)

Brazilian university providing access to hardware?

Status of building
==================

qemu-vm project on Salsa

Salsa with CI functionality:
https://salsa.debian.org/waldi/debian-cloud-images/

Run after each push, so we can see if anything got broken
Building images, normal jobs are running on GCE with an option to build
things on casulana too.

Gitlab CI runner.
Runner asks to start CoreOS images x86 (only architecture so far)
CoreOS is not Debian. But problems with Docker on Debian (is it still
current?)
Docker runs there, and complete builder runs inside this docker
Takes 50s to bootstrap; one VM (and one docker inside) per job

How do we upload those images?
Images are uploaded to artifacts on Salsa.

Docker machine to maintain VMs.
Makefile is not used by Salsa but might be needed to run/build locally
Needs documentation
We can run on Casulana. Only protected branches are (should be?) built on
Casulana.
Building images using UEFI.
Debian Grub packages only support installing either BIOS boot or UEFI.
grub-cloud adds an option to support both.
Many variants: BIOS, UEFI, GPT, MBR, MsDos partitions.
Support issues for Grub maintainers

EFI hybrid images - use GPT with a protective MBR too. That way they boot
either via BIOS or UEFI
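
As a rough illustration of the hybrid layout described above, the sketch below
inspects the first two sectors of a raw disk image and reports whether it
carries an MBR boot signature, a protective MBR entry (partition type 0xEE)
and a GPT header. It assumes 512-byte logical sectors and a raw (not qcow2 or
compressed) image; the function name is made up for this example and is not
part of any existing tool.

```python
SECTOR = 512  # assumption: 512-byte logical sectors


def describe_partition_table(image: bytes) -> dict:
    """Report which boot-related structures the first two sectors contain."""
    mbr = image[:SECTOR]
    # A valid MBR ends with the boot signature 0x55 0xAA at bytes 510-511.
    has_mbr_signature = len(mbr) == SECTOR and mbr[510:512] == b"\x55\xaa"
    # The four MBR partition entries start at byte 446, 16 bytes each;
    # byte 4 of every entry holds the partition type.
    types = [mbr[446 + 16 * i + 4] for i in range(4)] if has_mbr_signature else []
    # Type 0xEE marks the protective entry that covers a GPT disk.
    has_protective_entry = 0xEE in types
    # The GPT header lives in LBA 1 and starts with the ASCII text "EFI PART".
    has_gpt_header = image[SECTOR:SECTOR + 8] == b"EFI PART"
    return {
        "mbr_signature": has_mbr_signature,
        "protective_mbr": has_protective_entry,
        "gpt_header": has_gpt_header,
    }
```

Reading the first 1024 bytes of an image file and passing them in is enough.
This only checks the partition table; it says nothing about whether BIOS boot
code or an EFI system partition is actually present.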

UEFI is the right way in the future, and then Secure Boot
GCE has flags that can be added to an image to declare features (e.g. UEFI
support)
Azure does gen1 or gen2 VM format - gen2 is defined as using UEFI
AWS doesn't (yet) do UEFI, but we expect it to happen in future

Auto-growing images at boot? Not currently working with FAI. Will need extra
tweaks to deal with changing the size of a GPT setup.
cloud-init can grow partitions - does it grow filesystems?
systemd script to expand things. Needs to be taught about the two partitions
needed in a GPT+UEFI context. It already handles the simpler BIOS case.

Locations
---------
Currently only GCE. Casulana build is freshly baked in but not working
fully yet.
CI runs as user cloud?
Casulana is used to build CD images
Casulana is build box, not publish. pettersson is used to publish; it'll
go away but not yet.
We need space: 100-200GB. We have enough space
2nd publish location but not done yet
Need for redundancy. Hosting, geography, hardware.
Publishing from Casulana to other locations.
New jobs. Need for glue to join everything (building, publishing,
testing)
Supervision of publishing.
* registering (storing binary artifacts)
* publishing to providers, especially for release, should require human
in the loop
* publish to all platforms at once
* use day inside name (or identifier) not build number (which might
differ)

Providing
---------
New web site. More priority for downloading; might include image finder
With CI we have JSON with metadata which should help here
Need for volunteers to deal with it
Link to providers' images.
e.g. AWS: link to marketplace, CLI arguments, AMI IDs, etc. Per region!
Ubuntu sends you to the marketplace, but also gives the AMI ID
Ubuntu does not like their own tool, prefers SUSE's
Look at the SUSE "pint" tool which works with AMI metadata to help find
the right image to run etc.
- https://github.com/SUSE/Enceladus/tree/master/susePublicCloudInfoClient

OpenStack providers? 18 providers currently. Bad thing to provide links
only to 3 biggest
Every image has UUID. Potentially many UUIDs to publish
Also private OpenStacks. They should download latest from Casulana, but
then we need to provide links to those (appropriate) images
Outreachy/GSoC project? come back to this later - let's get the images
and metadata first!

Images will be available at
http://cdimage.debian.org/images/cloud/

cloud.debian.org
Put small page there with links to built images
Now we must use Salsa artifacts, which is not really intuitive
It's also temporary space
cloud.d.org (alias of pettersson) can also be used to host

Testing and plans for workflows
===============================

Salsa/Gitlab workflow
---------------------
* Automatic CI builds via salsa
* Locations for building and publishing
* Providing images for download
+ Image finder etc.
* Image automatic uploader service (aka: uploading OpenStack images to
public clouds automatically, or at least triggers)
* Development images published automatically

Do we need policy for managing those repositories and changes?
Especially for fai and debian-cloud-images (formerly fai-cloud-images)

Layout of branches
Need for review of merge requests
Then we can remove direct access to master or similar branches and
require usage of merge requests
Which branch triggers pipeline run?

rename "master" to "unstable"
People to PR to branches (testing/unstable/stable)
any PRs trigger builds and tests
After X approvals then the "merge" button becomes available

Branch names like stretch, buster (not "stable" as it'll change)
Similar to gpb process (but not the same).
No direct commits to named branches
Need of cleanup for branches (removal of old, clean naming)

Stable images should be buildable from stable
Conflicts of FAI versions (configuration changes)

Separate repository for building images and publishing it
Former is (should be) treated as more stable and not changing
(especially for stable)
The latter might need changes, e.g. cloud providers change the API for
publishing images and we need to update the code during stable

The same policy for cloud images like for Debian.
Stable stays stable.
Forking/branching FAI and config?
* but how then we merge them?
* backporting changes to stable
* or cherry-picking security fixes (or similar)
Unstable can be used for experimentation
OpenStack only builds testing and stable currently (i.e.: no Sid images)

Current gitlab runners use the same VMs for all builds.
Do we need to split them?

Bot publishing info about repository changes
including merge requests

We should not have things running in pipeline without somebody doing
code review/requests merge

Two workflows
One automatic for daily
Another (manual) for official images where we'll publish manually after some
review (to marketplace)

Also removal.
Both from market place but also from storage
For example we use 4T of storage on Azure

We have all cd images for all images since 3.x
No ISO but Jigdo for most, so we can rebuild ISO when needed
We point to snapshots so no problem with Ftp cleaning

New images triggered by:

AWS
---
DSA announcements or point release

OpenStack
---------
list of packages and version, e.g.:
https://cloud.debian.org/images/openstack/current/debian-9.5.5-20181004-openstack-amd64-packages.list

Script at
https://salsa.debian.org/images-team/live-setup/blob/master/bin/check-openstack-updates

Comparing versions doesn't work when a new kernel releases because it
is a different package name
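
To illustrate why a plain version comparison misses kernel updates, here is a
minimal sketch. It assumes the package list is plain text with one
"name version" pair per line (the real format of the packages.list file may
differ), treats any version string difference as an update instead of doing a
proper Debian version comparison, and uses a made-up pattern for versioned
kernel package names. It is not the logic of check-openstack-updates.

```python
import re

# Assumption: versioned kernel packages look like linux-image-4.9.0-8-amd64,
# while the metapackage (linux-image-amd64) has no digit after the prefix.
KERNEL_RE = re.compile(r"^linux-image-\d")


def parse_package_list(text):
    """Parse 'name version' lines into a dict; other lines are ignored."""
    packages = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            packages[fields[0]] = fields[1]
    return packages


def rebuild_reasons(installed, available):
    """Return human-readable reasons why an image may need a rebuild."""
    reasons = []
    # Same package name, different version: the case a naive check finds.
    for name, version in sorted(installed.items()):
        if name in available and available[name] != version:
            reasons.append(f"{name}: {version} -> {available[name]}")
    # A new kernel ABI ships under a NEW package name, so also look for
    # kernel packages in the archive that the image does not contain.
    if any(KERNEL_RE.match(name) for name in installed):
        for name in sorted(available):
            if KERNEL_RE.match(name) and name not in installed:
                reasons.append(f"new kernel package: {name}")
    return reasons
```

A real check would also have to filter by kernel flavour and ignore older ABI
packages still present in the archive; this sketch flags every kernel package
name that is not installed.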

*Not* doing testing yet after build.

Azure
-----
Also watching DSA announcements, test and promote the latest daily release
if it's needed

GCE
---
Always build monthly regardless
Watching DSAs and triggering rebuilds manually if needed

Where should we announce new images?
debian-cloud@ is not great, that's development discussion
Should we have a d-c-announce@ list? How else can we announce things?
Maybe not the right answer - new images may not be available on all platforms
at the point the mail goes out :-(
Pending changes in the fai config stable branch should probably trigger
warnings?
Probably little value in daily builds for the sake of it...

When we're doing a new stable build for all platforms, we should publish each
independently as they're available and tested. If one platform fails, don't
gate other platform releases on that one.
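
A minimal sketch of the "don't gate" rule above: each platform is published on
its own and a failure on one is recorded without stopping the others. The
`publish` callable and the platform names are placeholders for whatever the
real per-provider upload step turns out to be.

```python
def publish_all(platforms, publish):
    """Publish to every platform independently and collect the outcomes."""
    results = {}
    for platform in platforms:
        try:
            publish(platform)
        except Exception as exc:  # one failure must not block the rest
            results[platform] = f"failed: {exc}"
        else:
            results[platform] = "published"
    return results
```

The returned mapping can then feed an announcement that lists only the
platforms where the image is actually available.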

Testing
-------

* Testing (progress update)
* Reviews of changes
* branches etc.
* Workflows for builds, particularly security rebuilds
* compare/contrast/merge to get the best practice for everybody

We have a runner that will test AWS and GCE images.
* Starts an instance, runs a script inside it.

Roadmap:
* Static tests outside the image
* Test basic Debian stuff in the image on casulana using kvm to run an
instance
* Push to the specific provider to test the provider-specific bits

Needs to be integrated into our workflow
Test framework decoupled from tests

Run tests on all combinations of hardware and configuration
(e.g. instance types, number of disks, etc.)

Do we need package for test runner or testing cases?
Good idea for people who build own images, so they can test those images

Pipelines:
run all tests when building new image.
But do we rerun when we push new tests?
Pipeline is per project

Google released their own tests as FLOSS
We could use this, or at least treat as template
github.com/GoogleCloudPlatform/compute-image-tools

GCE Integration Tests
* Daisy is a GCE workflow tool; it's an API client. Written in Go.
(github.com/GoogleCloudPlatform/compute-image-tools)
* The test framework uses the GKE (kubernetes) open source system (Prow)
and associated modules. (https://github.com/kubernetes/test-infra/)
* Examples of tests: test on 2 instances, one ssh to another, etc.

How complex should it be?

Current gitlab-ci pipeline defines a build for each platform.
Automated testing plan:
- build an image (for example, gce stretch)
  - stores an artifact, e.g. debian-sid-20181009-gce-amd64.zip
- local static filesystem tests
  - local-static-test.sh debian-sid-20181009-gce-amd64.zip
- local qemu live tests
  - local-live-test.sh debian-sid-20181009-gce-amd64.zip
- register/upload image to cloud provider
  - make daily builds public
  - stores an artifact with the image name
- run debian cloud tests
  - cloud-tests.sh <registered image name>
- trigger provider tests if they exist
  - GCE: daisy https://github.com/GoogleCloudPlatform/compute-image-tools
  - AWS: angry kitten (not public, but some possibility)
  - AZR: unknown; github.com/LIS/LISAv2; there is internal Jenkins, but we'll
    probably use LISA; it has PowerShell control scripts and bash test
    scripts

For official releases, begin the manual approval/publishing step
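
The plan above can be sketched as an ordered list of stages keyed by the
artifact name. The naming pattern is inferred from the single example
debian-sid-20181009-gce-amd64.zip and uses the build day rather than a build
number; the stage labels and function names are illustrative only, not the
actual gitlab-ci job names.

```python
import datetime


def artifact_name(release, platform, arch, day, suffix="zip"):
    """Build an artifact name that embeds the build day, not a build number."""
    return f"debian-{release}-{day:%Y%m%d}-{platform}-{arch}.{suffix}"


def plan_stages(release, platform, arch, day, official=False):
    """Return the ordered (stage, argument) pairs for one image build."""
    image = artifact_name(release, platform, arch, day)
    stages = [
        ("build", image),
        ("local-static-test.sh", image),
        ("local-live-test.sh", image),
        ("register", image),
        # The cloud tests take the name returned by the registration step.
        ("cloud-tests.sh", "<registered image name>"),
        ("provider-tests", platform),
    ]
    if official:
        # Official releases end with a human approving publication.
        stages.append(("manual-approval", image))
    return stages
```

Daily builds would stop after the provider tests, while official releases get
the extra manual step at the end.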

Buster
======

Any changes we need or will affect us?

systemd-networkd? no
grub_cloud - all the x86 images should use hybrid images by default now

We don't need one metapackage (to pull dependencies) as we control what we
install anyway
hybrid image?

All clouds use (should use) EFI
uboot should not be used. usage of grub with it is problematic, depends
on uboot details

add a grub-cloud-arm64 package with simple config - maybe just install
grub-efi-arm64?

cloud kernel image
only amd64 for now
want an arm64 equivalent too
need some work to make it more generic before adding new architectures
add ppc64el? do that when somebody cares

The cloud kernel has problems on AWS. It does not have drivers for ENI
(more performant networking)
Should work on lower instances though, but people might want to use
better machine types ;-)

cloud-kernel might be default for clouds
might be problematic for OpenStack but we should at least test it
on AWS/AZU/GCE
May need to add some cloud-specific drivers?
OS images are using this image already for buster, no complaints so far

OpenStack bare metal provider would need to have generic kernel
Build both when we have automation?
We might need to add back some small amount of HW drivers, e.g. NVMe
for AWS instances (like m5d)

Back-porting such drivers to stable kernels?

Python3
Remove everything for Buster which is not Python3
There is already lintian check suggesting only python3
Python2 will stay in Buster, will be removed from Bullseye

Buster Freeze: Q1 2019
Should we try to release cloud images at release day?
Azure has long rollout time (up to 1 week)
But we should announce that we're releasing

Pre-release images, beta during freeze
We have D-I alpha/beta, so we could (should) do the same for cloud
images
We already have weekly images so we could (re-)use them
But we might want to rename them or give more publicity

Release team has release criteria; do we want similar for clouds?
Removal of RC-buggy packages.
There are key packages which are not removed; we should add cloud-critical
packages to it
Take packages from fai config space

Debian LTS
==========

If there are packages we'll build new images?

First. Do we want to provide LTS images?
Cloud users are usually short-lived.
OTOH kops still uses Jessie (with custom kernel)

Problem with environment/community
They use old, outdated images (e.g. Jessie) as basis.

Also, Debian volunteers do not provide LTS
Paid people provide support

We should state that we're not providing support exceeding oldstable
support

LTS vs backports

security as best effort

Users want to have more than 3 years

Requires effort to keep images supported

Keeping images for some time, but reduce discoverability
We won't provide images for LTS, but won't prevent users from pointing
to LTS repository if they want to
But they won't be official D images

Images including packages from backports
========================================
Mostly the kernel, but in the past has also included things like ssh
Most of other packages are doable by cloud-init (or similar)
Kernel is special as this requires restart (reboot)

AZU currently uses backported kernel (kernel from backports)
They (Azure or users? Users) want it and use it
Proposal to do the same for all providers

Backports (unlike LTS) is official part of Debian, hosted
on our infrastructure, maintained by DDs or DMs, etc.

ssh was used from backports for GCE for performance/cipher reasons

We should label it, describe/document it and be OK with it
Possibility is included in FAI

Multiple variants of images
===========================
* Needed properties inside cloud images
* Buster release -- Changes that matter to us? What do we need to do to
prepare?
* Cloud kernel images
* and grub-cloud package?
* Support for non-x86 architectures
* UEFI Secure Boot
* LTS in the cloud vs. elsewhere
+ deprecating older images etc.
* Images with newer (backports) kernels
* Multiple variants of images? (e.g. minimal vs including vendor
integrations)

Only single variant of image, or many (e.g. minimal, full, integrated with
vendor stuff?)

Publish as ready-to-use or also to download (e.g. on get.d.org,
cloud.d.o).
Related to this - publishing images to OpenStack. API, hooks
Supervised or automatic push?

Variation on previous one

But also non-free, not always needed, vendor stuff

Debian base (minimum)

Amazon Linux provides base image but it's not very popular (or is it?)
What's removed?
it has cloud-init but people want to remove it
No official list with removed stuff, or difference to standard one

We have FAI class with differences

We might want to have cloud-init, some basic agents
But don't include SDKs or cloud CLI, etc.

Be good citizen (VM)

Putting SDK (GCE one) to Debian
Google people don't care about Debian (are not DDs)
GCE puts new version every week, and that won't change

Put it to unstable and prevent from migrating?
But then we won't have backports (they should be releasable)

We should have one config space to build all variants
Otherwise it'll be divergence

OTOH we'll be diluting "Debian" by providing images with non-debian
repository
Also, those repositories might end up containing non-free software

Canonical was republishing SDKs in its own repository but it was 6-8 weeks
out of date
Users were complaining about missing support for new features, regions, etc.

Use cloud-init to add other repositories?
It'll increase boot time

There is no one clean solution

We (Debian) are not doing a good job maintaining fast moving software
(docker, kubernetes, GCE SDK, etc.)
It's not one-time effort but continuous

We might include FAI classes for vendor repositories, without doing them

Do we want to provide many variants of images, or just make it easy
for people to build their own images?

We've chosen FAI to make it easy to build own variants
We might build some important variants but not too many

Having tests will also assure users that their changes
did not break anything

We might revisit it in the future

The cloud providers should write the FAI configs and include them into
the cloud repo. Any ideas for class names?

SSL/TLS 1.2?
Recent OpenSSL release disables some ciphers and TLS below 1.2
It might break some software deployed in the cloud

We shall not prevent users from e.g. enabling old, insecure ciphers
But at the same time we should have sensible default configuration

cloud-init
==========

1 RC bug - current version in unstable FTBFS
a few other bugs to fix
upstream work - development is ongoing
is it good for us?
forking / replacing has been attempted by various folks, but never takes off
sprint happened recently
notes -
https://docs.google.com/document/d/1-gctZNXA9oshxyDsMqoWAw66ll5MBBpvRkdA6wzMYDk/edit

CLA no longer a problem, it seems
maybe slow to get patches accepted upstream

These are mostly what the current OpenStack image supports (apart from
Hybrid), and which we probably want to see within the published images.

* cloud-init with this list of metadata sources in
  /etc/cloud/cloud.cfg.d/90_dpkg.cfg:
  - datasource_list: [ NoCloud, ConfigDrive, OpenNebula, Azure, AltCloud,
    OVF, MAAS, GCE, OpenStack, CloudSigma, SmartOS, Ec2, CloudStack, None ]
* Support amd64 + arm64
* Must resize itself (partition + resize of the fs)
* password or ssh key defined, cloud-init does it
* tty + ttyS0 for:
- grub (please fill...)
- systemd .service unit printing (aka: systemd.show_status=true)
- getty consoles as per
http://0pointer.de/blog/projects/serial-console.html
* Latest kernel installed, a single cloud kernel.
* No unattended-upgrades
* Single kernel installed after security updates
* Set console for emergency and rescue shells (ie: emergency.service
rescue.service)

Nice to have:
* Hybrid BIOS + EFI (grub-cloud does that...)
* Currently in experimental: 18.3-1
- Tested with a Buster image for ssh key, and connectivity (could ping
and ssh in)
- Need other cloud providers to test it

* Addressed by last upload:
- #907672, #788103
* Opened bugs:
- https://bugs.debian.org/909557 (renaming
/etc/NetworkManager/dispatcher.d/hook-network-manager)
* Need a decision on:
- https://bugs.debian.org/887226 (add Suggests: e2fsprogs, btrfs-progs,
  xfsprogs, ...)

All the above bugs are already addressed.

* Need to discuss:
- https://bugs.debian.org/837757 (provide a utility/mode to wait for
cloud-init to complete)
User should use systemd or go upstream for this feature
It also has ability to go to URL when done but it requires setting up
service

Previously upstream was almost dead. Now it's better, at least there is
some progress

Proposal to move cloud-init as systemd part (service, hook, etc.) but nobody
found time to do it

For now cloud-init works well enough that people don't have enough motivation
to change it.
There are some pain points but not enough

What we do with cloud init:
set hostname
create default user, install public key (GCE)
users can provide script (user-data) and it'll be executed
Users don't complain too much about it

Users can be doing many things but we don't know about it

Complaints: size, added boot time

Plans for additional architectures, software, features.

Cloud kernel
============
Cloud kernel. Idea: small footprint, quick start. Removed many drivers
(e.g. PCMCIA, etc.)
Bit of mismatch of expectations, e.g. removal of filesystems (AFS?)

Official Kconfig. What else is needed or can be removed? We need to document,
especially _why_ something was removed or a configuration changed

We cannot have more variants of kernel. Release/security/kernel team agreed
to one kernel for cloud needs, but not more

Config for Cloud kernel:
https://salsa.debian.org/kernel-team/linux/blob/master/debian/config/amd64/config.cloud-amd64

unattended-upgrades
-------------------
Need to discuss this again. We're enabling it in AWS and GCP images and
people seem happy.
Debian-security team don't like this. Need to discuss properly.
waldi thinks that d-i in buster is installing and enabling u-u by default.
Need to check

default user name
-----------------
openstack adds a "debian" user
AWS is using "admin"
do we care about the difference? we're using separate images already
aws might have an agent in the future, so maybe having a single image for
openstack and aws won't be possible.

emergency service stuff
-----------------------
openstack adds emergency.service and rescue.service. What for? Needed?
sulogin on tty0
we think this is obsoleted by newer systemd stuff

cloud-initramfs-growroot
------------------------
used to grow the rootfs - maybe not needed any more - growpart script
wanted for GCE and anybody else not using cloud-init
now in "cloud-guest-utils"
partition and fs growing is a common problem. waldi to look into this

Some GPT setup has partition 1 at the end.
Previously partition 1 had to be at the beginning to boot from
Now it changed.
But there are (maybe?) still some scripts which only grow partition 1
(hardcoded in source)
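
One way to avoid the hardcoded "partition 1" assumption is to pick the
partition that ends last on the disk, since only that one can be extended
into free space at the end. The sketch below assumes the partition table has
already been read into (number, start_sector, end_sector) tuples; how that
table is obtained is left out, and the function name is invented for this
example.

```python
def partition_to_grow(partitions):
    """Return the number of the partition that ends last on the disk.

    `partitions` is a list of (number, start_sector, end_sector) tuples.
    """
    if not partitions:
        raise ValueError("no partitions found")
    # Choose by position on disk, not by partition number.
    return max(partitions, key=lambda part: part[2])[0]
```

With a layout where the boot partitions come first and the root partition is
numbered 1 but placed last, this returns 1; with a classic layout it returns
the highest-placed partition instead of blindly growing the first one.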

cloud-initramfs-growroot might be removed if we check that it's not
used anymore

network bringup
---------------
different providers have different network startup scripts, mostly very
similar
use systemd-networkd?

IPv6 poses some problems, and there are hacks to fix it
But it adds many seconds to boot time
Problems between ifup/systemd.
systemd might fix it now (for Buster) but it'll require switching to
systemd networkd
It does not fit well for WiFi, and also cloud
but no hard data, mostly anecdotes of people with problems
Now desktops depend on network-manager
We still install ifup/down but it's disabled

gce/azure - diff only blank line
aws: configures 8 interfaces, IPv6 helper script to determine v6 or not
ifup does not work for dynamic attachment; but dynamic is possible on
clouds

We'll leave differences for now; we might return to this once there is
one good solution
There will be differences, e.g. configuration of 25/40Gbps for AWS, etc.
All should include e/n/interfaces.d/*
Also all VPCs and point-to-point configuration so /e/n/i.d/* should be
done
SuSE has cloud-net-config. It's framework for configuring it.
Should we look into this, or leave for the future

Need for stable networking interfaces' names

It looks like there are many solutions, each with its own problems and
deficiencies

Noah proposed testing udev rules

serial-***@.service
Zigo uses it to have serial console in OpenStack
Now systemd should start getty automatically so it might not be needed
anymore
but we still need to check.
If it does not work, it's bug which needs to be reported and fixed

ttyS0 is not always first. ARM64 has ttyAMA0

Non-x86 ports
=============
arm64 OpenStack
There is ppc64el in a Brazilian university's OpenStack

Only OpenStack has non-x86
AWS/AZU/GCP are only doing amd64?

Cross-building
FAI checks if target is the same as host, uses qemu-static if they
differ
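
The decision described above (native build when host and target match,
qemu static emulation otherwise) can be sketched like this. It is not FAI's
actual code: the mapping tables and function name are assumptions for the
example, covering only the architectures mentioned in these minutes.

```python
import platform

# Machine names as reported by the kernel, mapped to Debian architecture names.
MACHINE_TO_DEBIAN = {
    "x86_64": "amd64",
    "aarch64": "arm64",
    "ppc64le": "ppc64el",
    "s390x": "s390x",
}

# User-mode emulator binaries as shipped by the qemu-user-static package.
QEMU_STATIC = {
    "amd64": "qemu-x86_64-static",
    "arm64": "qemu-aarch64-static",
    "ppc64el": "qemu-ppc64le-static",
    "s390x": "qemu-s390x-static",
}


def emulator_for(target_arch, host_machine=None):
    """Return the qemu static binary needed to build for target_arch.

    Returns None when the host can run the target's binaries natively.
    """
    host_machine = host_machine or platform.machine()
    host_arch = MACHINE_TO_DEBIAN.get(host_machine, host_machine)
    if host_arch == target_arch:
        return None
    return QEMU_STATIC[target_arch]
```

On an amd64 build box this returns None for amd64 images and the matching
emulator name for arm64 or ppc64el ones.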

Emulation:
slow, but works for now
casulana uses kvm for x86 or qemu static for other architectures (1/20
of speed)

Summary:
we support other architectures, we'll solve problems if they arrive
The biggest problem might be grub/booting
We won't be adding new architectures unless there are real providers
using them

Secure boot
===========

All providers want it, as customers start expecting (demanding) it
It'll be opt-in for customers

Debian wants to have Secure Boot for Buster (release target)
Much work done during DebConf 2018.

Next alpha installer should have it
Software is ready

Additional kernel modules.
GPU drivers require them
Others - we don't know

Still a problem with bootstrapping and adding new keys

Kernel and then later stages verification
Debian should cope with it; software should be ready, might require
scaling up

Platforms should have MS key and that's all
(just like desktops)

qemu secure boot?
is ready, just need to add command line arguments with key details
(OVMF)

Non-x86 differs
No root CA, like MS is for x86
HPE shipped HW with ability to add new keys (non-enforcing mode?)

GOVERNANCE
==========
# attendance
- Luca Filipozzi (SPI; Debian)
- David Duncan (AWS)
- Bastian Blank (credativ for Microsoft; Debian)
- Lucas Kanashiro (Collabora for Google)
- Jimmy Kaplowitz (SPI; Debian)
- Max Illfelder (Google)
- Tomasz Rybak (Debian)

* Handling of cloud accounts including permissions and ownership
* AWS Account ownership (currently owned by JEB, (needs|should really)
to be SPI)
* GCP Account
* AZU Account (to coin an acronym)

Next level of relationship between Debian/SPI and cloud providers.
Delegate to the team instead of going through leader.
Delegates vs. assistants
More than 1 delegates (preferably 3)

Individual can create instance (long-lived) and then lose Debian
account.
Need for way for DSA (?) to be able to terminate those resources
Need for specific per-cloud solution for this.
E.g. management of private ssh keys
No long-lived instances (?) or long-lived managed by DSA and not by DDs?

# discussion

## business

### requirements
- account not held by an individual (no SPOF)
- account should be for 'organization' not for 'individual'
- account could use standard terms and conditions; could negotiate some
  changes such as indemnity

### current status
- AWS:
  - Bromberger-tied account under AWS VP
    - used primarily by Noah to build and publish Debian cloud images
    - Also hosts some Debian QA projects
- Azure:
  - credativ-tied account in public and regional Azure clouds, used for
    mirror and image publishing
    - internal name: credativ (used more deeply than just 'owner id')
      - need to know how it is used, and after that to decide what we do
        with it
    - external name: Debian
- GCP:
  - debian-cloud and related projects owned by and billed to Google
    - debian-cloud is where Google's official Debian images are published
    - debian-cloud-experiments is where Debian cloud team people can tinker
    - debian-infra used for the mirror project
  - Collabora-tied account under which Debian cloud images are tested
    and improved (built and released by Google)
  - SPI-tied account under G-Suite (used so far by salsa, and separately
    also freedesktop.org)
    - Salsa expenses paid through credit applied to the account by Google
    - Freedesktop.org expenses paid through SPI debit card by fdo's funds

### action items
- AWS
  - structure:
    - AWS VP (cost covered by AWS VP; owned by AWS)
      - Debian publication account (owned by SPI; delegated to Debian; new)
      - Separate Debian publication account for AWS GovCloud region
      - Debian engineering account (owned by SPI; delegated to Debian; new)
      - ArchLinux publication account (ditto)
      - ArchLinux engineering account (ditto)
    - SPI (cost covered by SPI; owned by SPI)
      - Debian bar (owned by SPI; delegated to Debian; new)
      - freedesktop (owned by SPI; delegated to freedesktop; new)
      - etc.
  - schedule cadence meetings covering both business and technical
    considerations
- Azure
  - structure:
    - SPI for Debian (some subscriptions covered by AZU, others paid;
      owned by SPI)
      - Debian mirror subscription
      - Debian cloud-build subscription
      - Debian bar subscription (not sponsored)
    - SPI for ArchLinux
      - ArchLinux subscription (sponsored)
    - SPI for freedesktop
      - freedesktop subscription (not sponsored)
  - mirror
    - complete
- GCE
  - SPI G Suite organization (linked to spi-inc.org GCP organization)
    - folders
      - Debian GCP folder
        - Debian cloud-build project (sponsored)
        - Debian bar project (not sponsored)
      - ArchLinux GCP folder
        - ArchLinux project (sponsored)
      - Freedesktop GCP
        - freedesktop project (not sponsored)
    - billing
      - Multiple billing accounts, one or more per project
        (e.g. "Debian - Salsa")
      - One or more payment profiles (e.g. "SPI debit card", "future SPI
        credit card", "invoice", "something Google-internal maybe")
      - Payment profiles can be used for one or more billing accounts
      - Billing accounts can be used for one or more projects
- DO
  - ?
- Gandi
  - ?

### contacts
- - AWS:
- administrative: David Duncan (***@amazon.com)
- technical: ***@debian.org
- - Azure:
- administrative: Stephen Zarkos <***@microsoft.com>
- technical: Bastian Blank <***@credativ.de>
- - GCP:
- administrative: Shanmugam (Sham) Kulandaivel (***@google.com)
- technical: Zach Marano (***@google.com)
- - SPI:

### constraints
- - AWS
- no cap, but use must be frugal and directed
- not-for-profit use
- - AZU
- high cap, use for mirrors and testing
- - GCE
- no cap, but use must be frugal and directed

Mirrors
4 people are ready for ops:
* Tomasz Rybak
* Ross
* Noah
* Bastian
We should make sure that mirrors are reliable as they will be used by
all our instances in clouds

Marketplace publication
Contractual requirements; we should do a general ToS review before
creating accounts and publishing images

Accounts.
New ones, so new IDs -> we need to have transition plan

AWS:
jeb's account: no way of detaching it from its owner and moving it to
Debian/SPI

We publish Buster to new account, and old (Stretch) to old one,
so people have continuity and we avoid confusion

Blog note that we'll migrate between accounts

Need for legal revision before we publish Buster beta (freeze, Jan 2019)

Azure
Subscription
Metadata, publisher id: credativ
description (JSON): can be changed

So discoverability "Debian" but creation requires publisher id, so
"credativ"

Create alias for marketplace.
No timeline for this feature from Azure

Currently Credativ is publishing, but they should do it through
Debian/SPI account

Mirrors. Credativ also manages them for Azure. Do we move this to SPI
account?
It's a less clean situation, as mirrors are not as official as images.
Also we have university-run mirrors, not under Debian control.

GCE
We have SPI account
Images are not tied to account
But images are tied to project, which is owned by Google right now?
There might be possibility to hand over this project.

We might need to create more projects for tests, etc.

## technical

### mirror networks
- - requirements:
- Debian wants users of Debian Images in cloud providers to have
fastest/cheapest/easiest access to package mirror
- fastest in ms
- cheapest in $ (no network cost)
- easiest -> preconfigured (ideally deb.debian.org)
- - the cloud providers want an internal Debian mirror network in each
of their geographic regions in order to reduce network costs (to
the provider) and to provide fast access to users
- - AWS
- desires:
- similar to Debian
- options:
- (A) use CloudFront in front of EC2 Instances in three AWS Regions
(NA, EU, OC) supplemented with AWS Route53 entries to drive
traffic to the correct place
- (B) use EC2 Instances in each AWS Region, each with an AWS ALB,
supplemented with AWS Route53 entries (or fallback to DNS names
using region names)
- waldi has tried (A) under jeb's account using terraform (torn
down); will need to be recreated under SPI/Debian
- constraints / decision factors:
- cost (AWS decision)
- effort (Debian prefers A if it is reliable)
- reliability (Debian assumes A is reliable)
- concerns:
- private endpoints in VPCs ("private link") -> future work
- actions:
- davdunc to confirm we can use CloudFront
- noahm will speak with the AWS folks that run the Amazon Linux yum
mirrors to see if there's a different approach to consider
- - AZU
- desires:
- same as Debian
- options:
- (A) no CDN equivalent
- (B) Traffic Manager (DNS) -> Load Balancer -> two instances in each
region
- option (B) exists (more or less) under credativ account; will need
to be recreated under SPI/Debian
- constraints / decision factors:
- only choice (B)
- cost / effort / reliability
- concerns:
- none that we know of
- actions:
- waldi to consider moving inside debian.org
- - GCE
- desires:
- same as Debian
- "enterprise mirror" ... control the point in time that packages are
available (like snapshot.debian.org or ipfs)
- options:
- (A) CloudCDN ... VM instances in multiple regions ... single name
handles per region service
- (B) instances per region with a load balancer ... not preferred by
GCE with hostname tricks
- (C) a new secret google service that might do the equivalent ...
- option (A) exists (more or less) under debian-infra project and is
blocking on DSA; will need to be recreated under SPI/Debian
- constraints / decision factors:
- cost (GCE prefers (C) followed by (A); Debian same)
- effort (Debian prefers (A))
- reliability (Debian assumes (C) and (A) are more reliable than
(B))
- timing ... GCE will have decisions regarding (C) by end of year
- concerns:
- private endpoints in VPCs -> future work but "magic"
- actions:
- zmarano will reply by end of year for option (C)
- lucaf to review waldi's ticket from 9mo ago

- - how does the above interact with deb.debian.org? does it need to?
- no, we propose ${vendor}.mirror.debian.org where ${vendor} is
- AWS -> delegate to Route53
- AZU -> CNAME
- GCE -> A record (magic IP address)
- and is set at cloud image creation time and possibly overridden in
cloud-init
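
The naming proposal above can be sketched as a small helper that an image
build step might call to write the APT source line. This is only an
illustration under assumptions: the lower-case vendor labels ("aws", "azu",
"gce"), the "/debian" path and the function names are hypothetical and were
not decided at the sprint.

```python
# Hypothetical sketch of the ${vendor}.mirror.debian.org proposal.
# Vendor labels, the "/debian" path and all names here are assumptions.
KNOWN_VENDORS = ("aws", "azu", "gce")


def mirror_host(vendor: str) -> str:
    """Return the proposed per-vendor mirror hostname."""
    label = vendor.lower()
    if label not in KNOWN_VENDORS:
        raise ValueError(f"unknown vendor: {vendor!r}")
    return f"{label}.mirror.debian.org"


def sources_list_line(vendor: str, suite: str = "buster",
                      components=("main",), https: bool = False) -> str:
    """Build one APT source line, as it could be set at image creation."""
    scheme = "https" if https else "http"
    host = mirror_host(vendor)
    return f"deb {scheme}://{host}/debian {suite} {' '.join(components)}\n"


if __name__ == "__main__":
    # Print the line a GCE image might ship with.
    print(sources_list_line("gce"), end="")
```

A value written this way at image creation time could still be replaced
later by cloud-init, as noted above.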

- - how do we make use of above for worldwide access to deb.debian.org?
- AWS -> good to go if we use (A)
- AZU -> no
- GCE -> TBD (zmarano to respond if yes and whether (C) or (A))

- - https as default
- this is a certificate management problem
- AWS -> would prefer; use ACM
- AZU -> would work with effort because load balancer is TCP not
HTTPS; need to push certs to every backend instance
- GCE -> would prefer; for (C), will work magically; for (A),
certificate management exists from waldi
- is set at cloud image creation time

- - AWS
- waldi has tried stuff
- davdunc can find us a resource
- - AZU
- waldi knows most
- - GCE
- jimmy knows most

Account management
------------------

### requirements
- - SPI does not wish to run SSO for every Associated Project
- - accounts for people and service accounts
- - each Associated Project can tie SSO

### overview
Debian: LDAP as protocol (LDAP--)
OpenLDAP based. Customised, uses posix users. Debian user schema with
additional attributes
memberOf attribute, not group with member attribute
UserDIR for authentication, not LDAP itself
event-driven way of propagating changes to machines
Especially as there are machines around the world, we don't want to have
to use full LDAP and give full access to strange machines

Debian uses it, SPI, ganeth, etc.
sso written by Enrico. x509 based, browser has client certificate

For clouds we'd need new LDAP or some facade.
SAML, OAuth2 or something along those lines

Or some synchronisation (if above does not work)
We keep copy of relevant data in cloud and push changes
Just like we do now with UserDIR

Federated provider?
AWS: roles (ephemeral users)
No such functionality elsewhere; we'd need to have user objects there

Lowest common denominator
Users, groups, etc.
Then problem: authentication. How to manage passwords?

IDP-based ephemeral user objects

AWS. Associate role to group of users
Needs to be done in our own LDAP. Attach role to LDAP group, not AWS
user group (in IAM)

Debian group gets attached role. During synchronisation, not during
authentication.

SAML is for assertions. User authenticates to us and then we
request/pass token

AWS user can belong to many groups.
But can only have one role

G
For humans. A Google account needs to exist for each person.
We can (but do not need to) push password
Roles - created in Google
Auth can also be SAML
LDAP group -> Google group

ability to push full posix, including ssh keys

We can instantiate users.

sync using API - we need to write code ourselves

experience across platforms should be consistent
AWS:
sync no
sync using API: yes
sync + SAML: no, but Noah needs to check it. Still no - uses role behind
pure SAML - yes, ephemeral roles

Google:
sync yes, but bad hashing choice
sync + SAML: yes
pure saml: no, they require G account

Azure:
sync: probably not, Zach is checking
sync + saml:
pure saml: yes
sync and AD connect, federated model. Azure ID would go to our LDAP?
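
The per-provider notes above can be read as a small support matrix. The
sketch below is one hedged way to find which method no provider has ruled
out; it only encodes answers stated in these minutes, and anything not
stated or still being checked is recorded as None. The key names and
function names are illustrative assumptions.

```python
# True  = stated as possible in the minutes
# False = stated as not possible
# None  = not stated, or still being checked (e.g. Azure "probably not")
SUPPORT = {
    "AWS": {"sync": False, "sync_api": True,
            "sync_saml": False, "pure_saml": True},
    "Google": {"sync": True, "sync_api": None,
               "sync_saml": True, "pure_saml": False},
    "Azure": {"sync": None, "sync_api": None,
              "sync_saml": None, "pure_saml": True},
}


def all_methods(support):
    """Collect every method name mentioned for any provider."""
    return sorted({m for caps in support.values() for m in caps})


def not_ruled_out(support):
    """Methods for which no provider has an explicit 'no'."""
    return [m for m in all_methods(support)
            if all(caps.get(m) is not False for caps in support.values())]


def confirmed_everywhere(support):
    """Methods for which every provider has an explicit 'yes'."""
    return [m for m in all_methods(support)
            if all(caps.get(m) is True for caps in support.values())]


if __name__ == "__main__":
    print("not ruled out:", not_ruled_out(SUPPORT))
    print("confirmed everywhere:", confirmed_everywhere(SUPPORT))
```

With the answers recorded so far, only API-based sync has no explicit "no",
and nothing is yet confirmed on all three platforms.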

More than one AWS project - then we'll need to have sync to all of them

Lowest common denominator, maybe preferred choice:
sync using API
We don't put passwords, users need to set it themselves.
When account is deleted from Debian, it'll also be deleted from clouds.
In either case we won't care about password

AWS - prefers using roles
Azure Internally uses AD ID?
Google: we need user
But role can be attached to group

User and group on Debian side
On the platform side, attach privileges to groups (not users!)
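
A minimal sketch of the "sync using API" idea, assuming user and group
data has already been fetched from the Debian side and from one cloud. It
only computes what would need to change: no passwords are handled, and
accounts missing on the Debian side are scheduled for deletion. Function
names and data shapes are hypothetical, and real code would need to exclude
service accounts and call each provider's own API.

```python
def plan_user_sync(debian_users, cloud_users):
    """Return which cloud user objects to create and which to delete."""
    debian, cloud = set(debian_users), set(cloud_users)
    return {
        "create": sorted(debian - cloud),   # exists in Debian only
        "delete": sorted(cloud - debian),   # removed from Debian
    }


def plan_group_sync(debian_groups, cloud_groups):
    """Return per-group membership changes.

    Both arguments map a group name to an iterable of member names.
    Privileges are assumed to hang off the groups, so only membership
    needs to be synchronised.
    """
    plan = {}
    for group in sorted(set(debian_groups) | set(cloud_groups)):
        want = set(debian_groups.get(group, ()))
        have = set(cloud_groups.get(group, ()))
        if want != have:
            plan[group] = {
                "add": sorted(want - have),
                "remove": sorted(have - want),
            }
    return plan


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    print(plan_user_sync({"alice", "bob"}, {"bob", "carol"}))
    print(plan_group_sync({"cloud-team": {"alice", "bob"}},
                          {"cloud-team": {"bob", "carol"}}))
```

Running the same plan against every account or project keeps the experience
consistent across platforms, at the cost of writing the API calls ourselves.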

Google authentication best practices:
https://cloud.google.com/blog/products/identity-security/using-your-existing-identity-management-system-with-google-cloud-platform
The likely way to apply this to Debian is Google Cloud Directory Sync +
authenticating via SAML IdP.

# Storing credentials to access clouds
To run tests
use gitlab Protected Account

Also - authorisation of those roles/user
Credential rotations, etc

WRAP UP
=======
We have FAI config files for all the providers
We want debian accounts
SPI umbrella for accounts
We will probably have delegation from DPL
We have salsa pipelines to build images
We don't have an image finder

Timeline:
Buster.
Freeze - Jan 2019, we should have:
accounts
mirror names
beta images with those

Bastian:
mirror
EFI - secure boot
cloud kernel config cleanup

Luca + Jimmy as SPI
Debian accounts and legal agreements with the providers

Luca
delegation
authentication, authorisation (not required for Buster)
driving the cloud mirrors conversation through DSA
Send email about reimbursement

Jimmy
help luca with auth

Ross
integrating tests with the build pipeline
re-org of test framework

Tomasz
Delegate
Summarize notes of the sprint
integration tests
Planning to go through the mirror

Helen
work on tests - framework, integration etc.
nm work :-)
secure boot (+steve + lucas)

Martin
SPI and DSA work, not much free time otherwise
Debian-cloud-announcement ML

Steve M
Delegation
images vs cloud team on casulana
cloud-kernel for arm64
help zigo run arm64 OpenStack
arm64 host (not sure here)
secure boot
Organizing monthly meetings

Thomas L
likely to be occupied - surgery
review pull requests, etc

Thomas G
openstack ppc64el build and test

Noah
continue maintaining the stretch images for AWS
developing buster aws images
automatic building/registering from casulana
coordinate creating new AWS account (including publishing on gov cloud)

Lucas
secure boot
image finder :-D

David Duncan
New account structure
help with mirrors
help with rollout of tests on AWS
And try to open source existing tests
Set up monthly sync with SPI

Steve Z
accounts for publishing
user account federation (for DDs etc.)
EFI help for gen v2 (new HW) - make sure it works on Buster
whitelist waldi to access gen-2 capable region
access to team for test infrastructure (not defined timeline)

- --
Tomasz Rybak, Debian Developer <***@debian.org>
GPG: A565 CE64 F866 A258 4DDC F9C7 ECB7 3E37 E887 AA8C