Merge pull request rook#2269 from travisn/ceph-volume
Provision OSDs with ceph-volume
travisn authored Dec 7, 2018
2 parents 389c489 + 6ff3248 commit a85aa84
Showing 21 changed files with 827 additions and 207 deletions.
7 changes: 7 additions & 0 deletions Documentation/ceph-cluster-crd.md
@@ -121,6 +121,12 @@ The following storage selection settings are specific to Ceph and do not apply t
- `databaseSizeMB`: The size in MB of a bluestore database. Include quotes around the size.
- `walSizeMB`: The size in MB of a bluestore write ahead log (WAL). Include quotes around the size.
- `journalSizeMB`: The size in MB of a filestore journal. Include quotes around the size.
- `osdsPerDevice`**: The number of OSDs to create on each device. High-performance devices such as NVMe can handle running multiple OSDs. If desired, this can be overridden for each node and each device, as shown in the sketch after the note below.

** **NOTE:** Depending on the Ceph image running in your cluster, OSDs will be configured differently. Newer images configure OSDs with `ceph-volume`, which provides support for `osdsPerDevice` as well as other features that will be exposed in future Rook releases. OSDs created prior to Rook v0.9 or with older images of Luminous and Mimic were not created with `ceph-volume` and thus do not support the same features. For `ceph-volume`, the following images are supported:
- Luminous 12.2.10 or newer
- Mimic 13.2.3 or newer
- Nautilus
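
For illustration, here is a minimal sketch of overriding `osdsPerDevice` at the node and device level (the node and device names are hypothetical):

```yaml
storage:
  config:
    osdsPerDevice: "1"      # cluster-wide default
  nodes:
  - name: "node-a"          # hypothetical node name
    config:
      osdsPerDevice: "2"    # node-level override
    devices:
    - name: "nvme0n1"       # hypothetical high-performance device
      config:
        osdsPerDevice: "5"  # device-level override
```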

### Placement Configuration Settings
Placement configuration for the cluster services. It includes the following keys: `mgr`, `mon`, `osd` and `all`. Each service will have its placement configuration generated by merging the generic configuration under `all` with the most specific one (which will override any attributes).
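
As a sketch of that merge behavior (the `role` label and its value are hypothetical; the affinity and toleration fields follow the standard Kubernetes API):

```yaml
placement:
  all:                      # merged into every service's placement
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: role       # hypothetical node label
            operator: In
            values:
            - storage-node
  osd:                      # osd-specific settings override the merged `all` settings
    tolerations:
    - key: storage-node
      operator: Exists
```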
@@ -184,6 +190,7 @@ spec:
metadataDevice:
databaseSizeMB: "1024" # this value can be removed for environments with normal sized disks (100 GB or larger)
journalSizeMB: "1024" # this value can be removed for environments with normal sized disks (20 GB or larger)
osdsPerDevice: "1"
```

### Storage Configuration: Specific devices
4 changes: 3 additions & 1 deletion PendingReleaseNotes.md
@@ -22,7 +22,8 @@
- The number of mons can be changed by updating the `mon.count` in the cluster CRD.
- RBD Mirroring is enabled by Rook. By setting the number of [rbd mirroring workers](Documentation/ceph-cluster-crd.md#cluster-settings), the daemon(s) will be started by rook. To configure the pools or images to be mirrored, use the Rook toolbox to run the [rbd mirror](http://docs.ceph.com/docs/mimic/rbd/rbd-mirroring/) configuration tool.
- Object Store User creation via CRD for Ceph clusters.
- Ceph MON, MGR, MDS, and RGW deployments (or DaemonSets) will be updated/upgraded automatically with updates to the Rook operator.
- Ceph OSD, MGR, MDS, and RGW deployments (or DaemonSets) will be updated/upgraded automatically with updates to the Rook operator.
- Ceph OSDs are created with the `ceph-volume` tool when configuring devices, adding support for multiple OSDs per device. See the [OSD configuration settings](Documentation/ceph-cluster-crd.md#osd-configuration-settings).


## Breaking Changes
@@ -48,5 +49,6 @@
All connections from users and clients are expected to come in through the [configurable Service instance](cluster/examples/kubernetes/minio/object-store.yaml#37).

## Known Issues
- Upgrades to Nautilus are not supported. Specifically, OSDs configured before the upgrade (without `ceph-volume`) will fail to start on Nautilus. Nautilus is not officially supported until its release, but it is otherwise expected to work in test clusters.

## Deprecations
5 changes: 4 additions & 1 deletion cluster/examples/kubernetes/ceph/cluster.yaml
@@ -231,6 +231,7 @@ spec:
# storeType: bluestore
databaseSizeMB: "1024" # this value can be removed for environments with normal sized disks (100 GB or larger)
journalSizeMB: "1024" # this value can be removed for environments with normal sized disks (20 GB or larger)
osdsPerDevice: "1" # this value can be overridden at the node or device level
# Cluster level list of directories to use for storage. These values will be set for all nodes that have no `directories` set.
# directories:
# - path: /rook/storage-dir
@@ -250,7 +251,9 @@ spec:
# - name: "172.17.4.201"
# devices: # specific devices to use for storage can be specified for each node
# - name: "sdb"
# - name: "sdc"
# - name: "nvme01" # multiple osds can be created on high performance devices
# config:
# osdsPerDevice: "5"
# config: # configuration can be specified at the node level which overrides the cluster level config
# storeType: filestore
# - name: "172.17.4.301"
82 changes: 75 additions & 7 deletions cmd/rook/ceph/osd.go
@@ -19,6 +19,7 @@ package ceph
import (
"fmt"
"os"
"strconv"
"strings"

"github.com/rook/rook/cmd/rook/rook"
@@ -59,13 +60,22 @@ var filestoreDeviceCmd = &cobra.Command{
Short: "Runs the ceph daemon for a filestore device",
Hidden: true,
}
var osdStartCmd = &cobra.Command{
Use: "start",
Short: "Starts the osd daemon", // OSDs that were provisioned by ceph-volume
Hidden: true,
}
var (
osdDataDeviceFilter string
ownerRefID string
mountSourcePath string
mountPath string
osdID int
copyBinariesPath string
osdStoreType string
osdStringID string
osdUUID string
osdIsDevice bool
)

func addOSDFlags(command *cobra.Command) {
@@ -82,6 +92,7 @@ func addOSDFlags(command *cobra.Command) {

// flags for generating the osd config
osdConfigCmd.Flags().IntVar(&osdID, "osd-id", -1, "osd id for which to generate config")
osdConfigCmd.Flags().BoolVar(&osdIsDevice, "is-device", false, "whether the osd is a device")

// flag for copying the rook binaries for use by a ceph container
copyBinariesCmd.Flags().StringVar(&copyBinariesPath, "path", "", "Copy the rook binaries to this path for use by a ceph container")
@@ -90,11 +101,17 @@ func addOSDFlags(command *cobra.Command) {
filestoreDeviceCmd.Flags().StringVar(&mountSourcePath, "source-path", "", "the source path of the device to mount")
filestoreDeviceCmd.Flags().StringVar(&mountPath, "mount-path", "", "the path where the device should be mounted")

// flags for running osds that were provisioned by ceph-volume
osdStartCmd.Flags().StringVar(&osdStringID, "osd-id", "", "the osd ID")
osdStartCmd.Flags().StringVar(&osdUUID, "osd-uuid", "", "the osd UUID")
osdStartCmd.Flags().StringVar(&osdStoreType, "osd-store-type", "", "whether the osd is bluestore or filestore")

// add the subcommands to the parent osd command
osdCmd.AddCommand(osdConfigCmd)
osdCmd.AddCommand(copyBinariesCmd)
osdCmd.AddCommand(provisionCmd)
osdCmd.AddCommand(filestoreDeviceCmd)
osdCmd.AddCommand(osdStartCmd)
}

func addOSDConfigFlags(command *cobra.Command) {
@@ -107,6 +124,8 @@
command.Flags().IntVar(&cfg.storeConfig.DatabaseSizeMB, "osd-database-size", osdcfg.DBDefaultSizeMB, "default size (MB) for OSD database (bluestore)")
command.Flags().IntVar(&cfg.storeConfig.JournalSizeMB, "osd-journal-size", osdcfg.JournalDefaultSizeMB, "default size (MB) for OSD journal (filestore)")
command.Flags().StringVar(&cfg.storeConfig.StoreType, "osd-store", "", "type of backing OSD store to use (bluestore or filestore)")
command.Flags().IntVar(&cfg.storeConfig.OSDsPerDevice, "osds-per-device", 1, "the number of OSDs per device")
command.Flags().BoolVar(&cfg.storeConfig.EncryptedDevice, "encrypted-device", false, "whether to encrypt the OSD with dmcrypt")
}

func init() {
@@ -117,11 +136,30 @@
flags.SetFlagsFromEnv(copyBinariesCmd.Flags(), rook.RookEnvVarPrefix)
flags.SetFlagsFromEnv(provisionCmd.Flags(), rook.RookEnvVarPrefix)
flags.SetFlagsFromEnv(filestoreDeviceCmd.Flags(), rook.RookEnvVarPrefix)
flags.SetFlagsFromEnv(osdStartCmd.Flags(), rook.RookEnvVarPrefix)

osdConfigCmd.RunE = writeOSDConfig
copyBinariesCmd.RunE = copyRookBinaries
provisionCmd.RunE = prepareOSD
filestoreDeviceCmd.RunE = runFilestoreDeviceOSD
osdStartCmd.RunE = startOSD
}

// Start the osd daemon if provisioned by ceph-volume
func startOSD(cmd *cobra.Command, args []string) error {
required := []string{"osd-id", "osd-uuid", "osd-store-type"}
if err := flags.VerifyRequiredFlags(osdStartCmd, required); err != nil {
return err
}

commonOSDInit(osdStartCmd)

context := createContext()
err := osddaemon.StartOSD(context, osdStoreType, osdStringID, osdUUID, args)
if err != nil {
rook.TerminateFatal(err)
}
return nil
}

// Start the osd daemon for filestore running on a device
@@ -181,7 +219,7 @@ func writeOSDConfig(cmd *cobra.Command, args []string) error {
crushLocation := strings.Join(locArgs, " ")
kv := k8sutil.NewConfigMapKVStore(clusterInfo.Name, clientset, metav1.OwnerReference{})

if err := osddaemon.WriteConfigFile(context, &clusterInfo, kv, osdID, cfg.storeConfig, cfg.nodeName, crushLocation); err != nil {
if err := osddaemon.WriteConfigFile(context, &clusterInfo, kv, osdID, osdIsDevice, cfg.storeConfig, cfg.nodeName, crushLocation); err != nil {
rook.TerminateFatal(fmt.Errorf("failed to write osd config file. %+v", err))
}
return nil
@@ -209,17 +247,21 @@ func prepareOSD(cmd *cobra.Command, args []string) error {
return err
}

var dataDevices string
var usingDeviceFilter bool
var dataDevices []osddaemon.DesiredDevice
if osdDataDeviceFilter != "" {
if cfg.devices != "" {
return fmt.Errorf("Only one of --data-devices and --data-device-filter can be specified.")
}

dataDevices = osdDataDeviceFilter
usingDeviceFilter = true
dataDevices = []osddaemon.DesiredDevice{
{Name: osdDataDeviceFilter, IsFilter: true},
}
} else {
dataDevices = cfg.devices
var err error
dataDevices, err = parseDevices(cfg.devices)
if err != nil {
rook.TerminateFatal(fmt.Errorf("failed to parse device list (%s). %+v", cfg.devices, err))
}
}

clientset, _, rookClientset, err := rook.GetClientset()
@@ -241,7 +283,7 @@ func prepareOSD(cmd *cobra.Command, args []string) error {
forceFormat := false
ownerRef := cluster.ClusterOwnerRef(clusterInfo.Name, ownerRefID)
kv := k8sutil.NewConfigMapKVStore(clusterInfo.Name, clientset, ownerRef)
agent := osddaemon.NewAgent(context, dataDevices, usingDeviceFilter, cfg.metadataDevice, cfg.directories, forceFormat,
agent := osddaemon.NewAgent(context, dataDevices, cfg.metadataDevice, cfg.directories, forceFormat,
crushLocation, cfg.storeConfig, &clusterInfo, cfg.nodeName, kv)

err = osddaemon.Provision(context, agent)
@@ -265,3 +307,29 @@ func commonOSDInit(cmd *cobra.Command) {

clusterInfo.Monitors = mondaemon.ParseMonEndpoints(cfg.monEndpoints)
}

// Parse the devices, which are comma separated. A colon indicates a non-default number of osds per device.
// For example, one osd will be created on each of sda and sdb, with 5 osds on the nvme01 device.
// sda,sdb,nvme01:5
func parseDevices(devices string) ([]osddaemon.DesiredDevice, error) {
var result []osddaemon.DesiredDevice
parsed := strings.Split(devices, ",")
for _, device := range parsed {
parts := strings.Split(device, ":")
d := osddaemon.DesiredDevice{Name: parts[0], OSDsPerDevice: 1}
if len(parts) > 1 {
count, err := strconv.Atoi(parts[1])
if err != nil {
return nil, fmt.Errorf("error parsing count from devices (%s). %+v", devices, err)
}
if count < 1 {
return nil, fmt.Errorf("osds per device should be greater than 0 (%s)", parts[1])
}
d.OSDsPerDevice = count
}
result = append(result, d)
}

logger.Infof("desired devices to configure osds: %+v", result)
return result, nil
}
51 changes: 51 additions & 0 deletions cmd/rook/ceph/osd_test.go
@@ -0,0 +1,51 @@
/*
Copyright 2018 The Rook Authors. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package ceph

import (
"testing"

"github.com/stretchr/testify/assert"
)

func TestParseDesiredDevices(t *testing.T) {
devices := "sda,sdb,nvme01:5"
result, err := parseDevices(devices)
assert.Nil(t, err)
assert.Equal(t, 3, len(result))
assert.Equal(t, "sda", result[0].Name)
assert.Equal(t, "sdb", result[1].Name)
assert.Equal(t, "nvme01", result[2].Name)
assert.Equal(t, 1, result[0].OSDsPerDevice)
assert.Equal(t, 1, result[1].OSDsPerDevice)
assert.Equal(t, 5, result[2].OSDsPerDevice)
assert.False(t, result[0].IsFilter)
assert.False(t, result[1].IsFilter)
assert.False(t, result[2].IsFilter)

// negative osd count is not allowed
devices = "nvme01:-5"
result, err = parseDevices(devices)
assert.Nil(t, result)
assert.NotNil(t, err)

// 0 osd count is not allowed
devices = "nvme01:0"
result, err = parseDevices(devices)
assert.Nil(t, result)
assert.NotNil(t, err)
}
2 changes: 1 addition & 1 deletion pkg/clusterd/disk.go
@@ -86,7 +86,7 @@ func DiscoverDevices(executor exec.Executor) ([]*sys.LocalDisk, error) {
if diskType != sys.PartType {
diskUUID, err = sys.GetDiskUUID(d, executor)
if err != nil {
logger.Warningf("skipping device %s with an unknown uuid. %+v", d, err)
logger.Warningf("device %s has an unknown uuid. %+v", d, err)
continue
}
}