Bosh

From Leo's Notes
Last edited on 14 June 2020, at 23:32.

Bosh is a tool created by Cloud Foundry to deploy and manage software on virtual machines, either on private vSphere clouds or on public clouds.

See Bosh's website for more information at https://bosh.io/.

Usage

Installation

Obtain the binaries from https://github.com/cloudfoundry/bosh-cli/releases.

If using PKS, Bosh is already installed as part of the Ops Manager deployment.

Obtaining Credentials

Before using any of the commands below, you will need to authenticate your bosh client against the bosh director. This can be done using environment variables or by using the bosh login command.

With PKS, the Bosh credentials can be obtained from Ops Manager. Log in to Ops Manager, then navigate to Bosh Tile -> Credentials -> Bosh Commandline Credentials.

Set the following environment variables in a .bash_profile file:

export BOSH_CLIENT=ops_manager
export BOSH_CLIENT_SECRET=xxxxxxxxxxxxxxxx
export BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate
export BOSH_ENVIRONMENT=172.31.0.3
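
Before running any command, it can help to confirm that the variables are actually set in the current shell. The following is a minimal sketch, assuming a POSIX shell; the function name check_bosh_env is hypothetical and not part of the bosh CLI.

```shell
# check_bosh_env: hypothetical helper, not part of the bosh CLI.
# Returns non-zero and reports the first problem it finds.
check_bosh_env() {
    [ -n "${BOSH_CLIENT:-}" ]        || { echo "BOSH_CLIENT is not set" >&2; return 1; }
    [ -n "${BOSH_CLIENT_SECRET:-}" ] || { echo "BOSH_CLIENT_SECRET is not set" >&2; return 1; }
    [ -n "${BOSH_ENVIRONMENT:-}" ]   || { echo "BOSH_ENVIRONMENT is not set" >&2; return 1; }
    [ -n "${BOSH_CA_CERT:-}" ]       || { echo "BOSH_CA_CERT is not set" >&2; return 1; }
    # BOSH_CA_CERT is used as a file path in these notes, so check it is readable.
    [ -r "$BOSH_CA_CERT" ]           || { echo "cannot read $BOSH_CA_CERT" >&2; return 1; }
    return 0
}
```

Running check_bosh_env && bosh env afterwards should then confirm that the Director is reachable with those credentials.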

Commands

Deployments

bosh deployments shows deployments tracked by the Bosh Director.

Each deployment has a name, a set of Bosh releases (i.e. the components required by the cloud VMs), a stemcell (i.e. the operating system image), and the team it belongs to (i.e. the parent deployment).

# bosh deployments
Using environment '172.31.0.3' as client 'ops_manager'

Name                                                   Release(s)                                Stemcell(s)                                      Team(s)
harbor-container-registry-b987f1f1a01cf539193c         bosh-dns/1.10.0                           bosh-vsphere-esxi-ubuntu-xenial-go_agent/250.93  -
                                                       harbor-container-registry/1.7.4-build.42
pivotal-container-service-ad2c46d3833f5f4ea239         backup-and-restore-sdk/1.8.0              bosh-vsphere-esxi-ubuntu-xenial-go_agent/250.93  -
                                                       bosh-dns/1.10.0
                                                       bpm/1.0.4
                                                       cf-mysql/36.14.0.1
                                                       cfcr-etcd/1.10.0
                                                       docker/35.1.0
                                                       harbor-container-registry/1.7.4-build.42
                                                       kubo/0.31.7
                                                       kubo-service-adapter/1.4.0-build.230
                                                       nsx-cf-cni/2.4.1.13515827
                                                       on-demand-service-broker/0.26.0
                                                       pks-api/1.4.0-build.230
                                                       pks-nsx-t/1.30.0
                                                       pks-telemetry/2.0.0-build.201
                                                       pks-vrli/0.9.0
                                                       pks-vrops/0.13.0
                                                       pxc/0.14.0
                                                       sink-resources-release/0.1.32
                                                       syslog/11.4.0
                                                       uaa/71.2
                                                       wavefront-proxy/0.14.0
service-instance_06d9a723-24d8-477e-8e7c-dcbf31c84e96  bosh-dns/1.10.0                           bosh-vsphere-esxi-ubuntu-xenial-go_agent/250.93  pivotal-container-service-ad2c46d3833f5f4ea239
                                                       bpm/1.0.4
                                                       cfcr-etcd/1.10.0
                                                       docker/35.1.0
                                                       harbor-container-registry/1.7.4-build.42
                                                       kubo/0.31.7
                                                       nsx-cf-cni/2.4.1.13515827
                                                       pks-api/1.4.0-build.230
                                                       pks-nsx-t/1.30.0
                                                       pks-telemetry/2.0.0-build.201
                                                       pks-vrli/0.9.0
                                                       pks-vrops/0.13.0
                                                       sink-resources-release/0.1.32
                                                       syslog/11.4.0
                                                       wavefront-proxy/0.14.0

bosh delete-deployment -d deployment_name deletes a deployment by name. Use the --force option to force the deletion and ignore any script or job errors.

ubuntu@ITSOOPSMAN01:~$ bosh delete-deployment -d service-instance_92975a8c-be8f-4db4-8731-8a6c3aab51d7
Using environment '172.31.0.3' as client 'ops_manager'

Using deployment 'service-instance_92975a8c-be8f-4db4-8731-8a6c3aab51d7'

Continue? [yN]: y

Task 189372

Task 189372 | 20:36:37 | Deleting instances: master/3d9a60e5-0f27-40c2-933f-fef5f9d874e0 (2)
Task 189372 | 20:36:37 | Deleting instances: worker/80adeff6-9017-47cc-b76e-853944b71a89 (3)
Task 189372 | 20:36:37 | Deleting instances: apply-addons/7942478a-ec42-4bf0-a1ea-dbe9cd2bdd09 (0)
Task 189372 | 20:36:37 | Deleting instances: worker/a18216cb-253f-4067-8f94-5e1e099a94a7 (1)
Task 189372 | 20:36:37 | Deleting instances: master/5dffde0c-2890-4a6f-8b94-d6c7f6089cac (1)
Task 189372 | 20:36:37 | Deleting instances: worker/8cc69743-84ec-4862-aabd-9277769e7332 (0)
Task 189372 | 20:36:37 | Deleting instances: master/aa5ff8b9-57b4-4c2e-aa50-8dcece512e22 (0)
Task 189372 | 20:36:37 | Deleting instances: worker/516b4580-b910-45ec-aa24-dc8b112d7f00 (4)
Task 189372 | 20:36:37 | Deleting instances: worker/5dacad58-b77b-42ed-8d86-f2451eb67db7 (2)
Task 189372 | 20:36:37 | Deleting instances: apply-addons/7942478a-ec42-4bf0-a1ea-dbe9cd2bdd09 (0) (00:00:00)

Task 189409 | 20:38:09 | Deleting instances: worker/80adeff6-9017-47cc-b76e-853944b71a89 (3) (00:00:11)
                      L Error: Action Failed get_task: Task 0803ac73-6db2-4145-7160-ceaddba77230 result: 1 of 3 drain scripts failed. Failed Jobs: kubelet. Successful Jobs: syslog_forwarder, openvswitch.
Task 189409 | 20:38:09 | Deleting instances: worker/a18216cb-253f-4067-8f94-5e1e099a94a7 (1) (00:00:11)
                      L Error: Action Failed get_task: Task 6ea57ea7-c5b9-4778-5947-73be0bb50579 result: 1 of 3 drain scripts failed. Failed Jobs: kubelet. Successful Jobs: syslog_forwarder, openvswitch.
Task 189409 | 20:38:09 | Deleting instances: worker/516b4580-b910-45ec-aa24-dc8b112d7f00 (4) (00:00:11)
                      L Error: Action Failed get_task: Task f76f0e3d-0d23-4434-7955-5e13778fc5ba result: 1 of 3 drain scripts failed. Failed Jobs: kubelet. Successful Jobs: syslog_forwarder, openvswitch.
Task 189409 | 20:38:09 | Error: Action Failed get_task: Task 0803ac73-6db2-4145-7160-ceaddba77230 result: 1 of 3 drain scripts failed. Failed Jobs: kubelet. Successful Jobs: syslog_forwarder, openvswitch.

Task 189409 Started  Tue Aug 27 20:37:58 UTC 2019
Task 189409 Finished Tue Aug 27 20:38:09 UTC 2019
Task 189409 Duration 00:00:11
Task 189409 error

Deleting deployment 'service-instance_92975a8c-be8f-4db4-8731-8a6c3aab51d7':
  Expected task '189409' to succeed but state is 'error'

Exit code 1

As the output above shows, a failed drain script aborts the deletion. Such errors can be ignored by re-running the command with the --force option.
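
The retry with --force can be scripted. The following is a minimal sketch, assuming a POSIX shell; delete_deployment is a hypothetical wrapper name, and -n is the bosh global non-interactive flag, which skips the "Continue?" prompt. Whether an automatic forced retry is acceptable depends on the environment, so treat this as an illustration rather than a recommended default.

```shell
# delete_deployment NAME: hypothetical wrapper around bosh delete-deployment.
# Tries a normal delete first and only falls back to --force if that fails.
delete_deployment() {
    name=$1
    if bosh -n delete-deployment -d "$name"; then
        return 0
    fi
    echo "Normal delete failed; retrying with --force" >&2
    bosh -n delete-deployment -d "$name" --force
}
```

For example: delete_deployment service-instance_92975a8c-be8f-4db4-8731-8a6c3aab51d7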

VMs

bosh vms shows the VMs running in every deployment managed by the Bosh Director. Use the -d option to limit the output to a specific deployment.

Common options:

  • --vitals to show resource usage statistics
  • --cloud-properties to show cloud properties

# bosh vms -d service-instance_62e82b91-cabb-40f0-842c-e1c10a98f2f7 --vitals
Using environment '172.31.0.3' as client 'ops_manager'

Task 164927. Done

Deployment 'service-instance_62e82b91-cabb-40f0-842c-e1c10a98f2f7'

Instance                                     Process State  AZ           IPs          VM CID                                   VM Type     Active  VM Created At                 Uptime          Load              CPU    CPU   CPU   CPU    Memory        Swap         System      Ephemeral   Persistent
                                                                                                                                                                                                 (1m, 5m, 15m)     Total  User  Sys   Wait   Usage         Usage        Disk Usage  Disk Usage  Disk Usage
master/74eb8b50-da8f-4d51-9a26-c8ab63c2bc70  running        pks-compute  172.16.14.3  vm-ec465136-0887-4a17-9aa0-3b9be8a33469  xlarge      true    Fri Aug 23 22:04:07 UTC 2019  2d 22h 27m 35s  0.15, 0.10, 0.08  -      1.0%  0.7%  4.6%   9% (1.5 GB)   0% (0 B)     46% (33i%)  13% (3i%)   4% (0i%)
master/c8cd7497-78bb-49a8-a092-264cbad73100  running        pks-compute  172.16.14.2  vm-f82f0fe9-cd61-40cd-86bb-ac60c88143ca  xlarge      true    Fri Aug 23 22:04:07 UTC 2019  2d 22h 27m 34s  0.10, 0.14, 0.15  -      0.9%  0.6%  7.5%   10% (1.6 GB)  0% (0 B)     46% (33i%)  13% (3i%)   4% (0i%)
master/d50104c4-759f-40e7-b55f-15aa93083272  running        pks-compute  172.16.14.4  vm-1ffa5f44-2d08-4080-808c-4733e98bc954  xlarge      true    Fri Aug 23 22:04:06 UTC 2019  2d 22h 27m 34s  0.04, 0.08, 0.08  -      0.6%  0.2%  0.3%   9% (1.5 GB)   0% (0 B)     46% (33i%)  13% (3i%)   4% (0i%)
worker/6f3ec8e8-8976-449d-851b-83803a67d348  running        pks-compute  172.16.14.5  vm-4eccd396-a040-44ac-8fdb-61bbf87ef024  large.disk  true    Fri Aug 23 22:04:08 UTC 2019  2d 22h 27m 34s  2.04, 2.82, 2.10  -      5.7%  2.7%  44.7%  40% (3.3 GB)  0% (4.1 MB)  46% (33i%)  7% (1i%)    12% (6i%)

4 vms
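
With many VMs it is easier to list only the ones that are not healthy. The following is a minimal sketch, assuming a POSIX shell with awk and assuming that the instance name is the first column and the process state the second, as in the output above; non_running_vms is a hypothetical name.

```shell
# non_running_vms: hypothetical filter, reads `bosh vms` text output on stdin
# and prints the instance and state of every VM whose state is not "running".
non_running_vms() {
    # Instance rows are the only rows whose first field contains a "/"
    # (group/uuid); banner, header and summary lines are skipped.
    awk 'index($1, "/") > 0 && $2 != "running" { print $1, $2 }'
}
```

For example: bosh vms -d service-instance_62e82b91-cabb-40f0-842c-e1c10a98f2f7 | non_running_vms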

Tasks

bosh tasks shows the tasks and errands that are currently being run by the Bosh Director. Show recently completed tasks with the -r[=n] or --recent[=n] flag, where the optional n limits the output to n results.

bosh task n shows the logs for task n. Common flags:

  • --debug to get debug logs
  • --cpi to show cloud provider interface logs

bosh cancel-task n will cancel a specific task n.

root@ITSOOPSMAN01:~# bosh tasks
Using environment '172.31.0.3' as client 'ops_manager'

ID      State       Started At                    Last Activity At              User                                            Deployment                                             Description                                                                                              Result
164924  processing  Mon Aug 26 20:28:45 UTC 2019  Mon Aug 26 20:28:45 UTC 2019  pivotal-container-service-ad2c46d3833f5f4ea239  service-instance_92975a8c-be8f-4db4-8731-8a6c3aab51d7  create deployment                                                                                        -
164034  processing  Mon Aug 26 19:37:16 UTC 2019  Mon Aug 26 19:37:16 UTC 2019  ops_manager                                     pivotal-container-service-ad2c46d3833f5f4ea239         run errand upgrade-all-service-instances from deployment pivotal-container-service-ad2c46d3833f5f4ea239  -

2 tasks

Succeeded
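
Debug and CPI logs tend to be long, so saving them to files is often more practical than reading them in the terminal. The following is a minimal sketch, assuming a POSIX shell; save_task_logs and the file naming scheme are hypothetical, and only the --debug and --cpi flags listed above are used.

```shell
# save_task_logs ID [DIR]: hypothetical helper that writes the debug and CPI
# logs of task ID into DIR (default: current directory).
save_task_logs() {
    id=$1
    dir=${2:-.}
    bosh task "$id" --debug > "$dir/task-$id.debug.log"
    bosh task "$id" --cpi   > "$dir/task-$id.cpi.log"
}
```

For example: save_task_logs 189409 /tmp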

Instances

bosh instances shows instances in deployments. Common flags:

  • --details to show details
  • --ps to show the processes of each instance, as reported by Monit on each VM

SSH

bosh ssh -d deployment vm opens an SSH connection to the specified VM.

# bosh ssh -d service-instance_92975a8c-be8f-4db4-8731-8a6c3aab51d7 worker/8cc69743-84ec-4862-aabd-9277769e7332
Using environment '172.31.0.3' as client 'ops_manager'

Using deployment 'service-instance_92975a8c-be8f-4db4-8731-8a6c3aab51d7'

Task 164932. Done
Unauthorized use is strictly prohibited. All access and activity
is subject to logging and monitoring.
Welcome to Ubuntu 16.04.6 LTS (GNU/Linux 4.15.0-52-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

Last login: Mon Aug 26 20:48:23 2019 from 172.31.0.2
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

worker/8cc69743-84ec-4862-aabd-9277769e7332:~$
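
bosh ssh can also run a single command instead of opening an interactive shell, using its -c (--command) option. The following is a minimal sketch, assuming a POSIX shell; run_on_vm is a hypothetical wrapper name.

```shell
# run_on_vm DEPLOYMENT INSTANCE COMMAND: hypothetical wrapper.
# Passes COMMAND to bosh ssh -c as a single argument, so it runs on the VM
# and the connection closes when it finishes.
run_on_vm() {
    bosh ssh -d "$1" "$2" -c "$3"
}
```

For example: run_on_vm service-instance_92975a8c-be8f-4db4-8731-8a6c3aab51d7 worker/8cc69743-84ec-4862-aabd-9277769e7332 'sudo monit summary'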


Cleanup

bosh clean-up removes all unused releases and stemcells.

# bosh clean-up
Using environment '172.31.0.3' as client 'ops_manager'

Continue? [yN]: y

Task 164925

Task 164925 | 20:29:18 | Deleting stemcells: bosh-vsphere-esxi-ubuntu-xenial-go_agent/250.29 (00:00:12)
Task 164925 | 20:29:30 | Deleting dns blobs: DNS blobs (00:00:00)

Task 164925 Started  Mon Aug 26 20:29:18 UTC 2019
Task 164925 Finished Mon Aug 26 20:29:30 UTC 2019
Task 164925 Duration 00:00:12
Task 164925 done

Succeeded

Locks

bosh locks lists the locks currently held by tasks or deployments.

Errands

bosh errands -d deployment lists all errands defined by a deployment.

bosh run-errand -d deployment errand runs an errand, identified by its job name, in a particular deployment.

# bosh run-errand -d pivotal-container-service-ad2c46d3833f5f4ea239 upgrade-all-service-instances
Using environment '172.31.0.3' as client 'ops_manager'

Using deployment 'pivotal-container-service-ad2c46d3833f5f4ea239'

Task 164997

Task 164997 | 21:06:08 | Preparing deployment: Preparing deployment (00:00:01)
Task 164997 | 21:06:09 | Running errand: pivotal-container-service/6b58ba3e-be95-43e7-a9f5-57e8812c4826 (0)

Logs

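bosh logs -d deployment fetches logs from the instances of a particular deployment. By default the logs are downloaded as an archive; the -f flag follows them instead.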


Troubleshooting

Restarting Bosh Director VM Breaks Bosh

After rebooting the Bosh Director VM, subsequent bosh calls result in:

Using environment '172.31.0.3' as client 'ops_manager'

Finding current tasks:
  Performing request GET 'https://172.31.0.3:25555/tasks?state=processing%!C(MISSING)cancelling%!C(MISSING)queued&verbose=2':
    Performing GET request:
      Requesting token via client credentials grant: UAA responded with non-successful status code '503' response 'FAILURE'

Exit code 1

This issue is known, see:

The fix is to run monit restart all on the Bosh Director VM once all services except credhub have started. After a few minutes, the credhub service should reach a running state and bosh should function correctly again.
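
The waiting step can be scripted. The following is a minimal sketch, assuming a POSIX shell with awk, run as root on the Bosh Director VM with monit on the PATH. It also assumes that monit summary prints one line per service in the form Process 'name' status; the function names and the 10 second polling interval are hypothetical.

```shell
# not_running_except NAME: hypothetical filter over `monit summary` output on
# stdin. Prints every process whose status is not "running", skipping NAME.
not_running_except() {
    awk -v skip="'$1'" '$1 == "Process" && $2 != skip && $3 != "running" { print $2 }'
}

# restart_when_ready: waits until every service except credhub is running,
# then issues the restart described above.
restart_when_ready() {
    while [ -n "$(monit summary | not_running_except credhub)" ]; do
        sleep 10
    done
    monit restart all
}
```

After restart_when_ready returns, monit summary can be checked again until credhub reports running.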