[fluent-operator] Fluentd access to secured OpenSearch domain using IAM #323

Open
kaiohenricunha opened this issue Mar 15, 2023 · 0 comments

kaiohenricunha commented Mar 15, 2023

Describe the issue

I'm using the fluent-operator to deploy Fluent Bit to collect logs and Fluentd to process them and send them to an OpenSearch domain with advanced security options enabled.

It works with open domains, but not with secured ones.

I noticed that the operator creates a service account for Fluent Bit and Fluentd by default. I then attached an IAM Role for Service Accounts (IRSA) to Fluentd's service account with the following inlinePolicy:

apiVersion: auth.XXX.XXX.com/v1
kind: IRSA
metadata:
  name: fluent-test
  namespace: fluent-system
  annotations:
    auth.XXX.XXX.com/serviceaccount: managed
spec:
  serviceAccount: fluentd
  nameOverride: fluent-test
  path: /XXXX/
  inlinePolicy: |
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "es:*",
                "Resource": [
                    "arn:aws:es:*:XXXX:domain/*"
                ]
            }
        ]
    }

But the Fluentd pod still can't communicate with the specified domain:

The client is unable to verify distribution due to security privileges on the server side. Some functionality may not be compatible if the server is running an unsupported product.
2023-03-15 09:27:50 +0000 [warn]: #0 [ClusterFluentdConfig-cluster-fluentd-config::cluster::clusteroutput::fluentd-output-opensearch-0] Could not communicate to OpenSearch, resetting connection and trying again. [401]
2023-03-15 09:27:50 +0000 [warn]: #0 [ClusterFluentdConfig-cluster-fluentd-config::cluster::clusteroutput::fluentd-output-opensearch-0] Remaining retry: 14. Retry to communicate after 2 second(s).
2023-03-15 09:27:54 +0000 [warn]: #0 [ClusterFluentdConfig-cluster-fluentd-config::cluster::clusteroutput::fluentd-output-opensearch-0] Could not communicate to OpenSearch, resetting connection and trying again. [401]

After I apply the IRSA resource, the irsa-operator generates the equivalent role in AWS with the correct inlinePolicy, and the role even lists the OpenSearch Service under "Allowed Services". The operator also correctly attaches the role to the fluentd service account in the EKS cluster.
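
To double-check the attachment from the pod's side, one option is to look for the two environment variables that the EKS pod identity webhook injects when a service account carries an IAM role: AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE. The sketch below is a hypothetical helper (check_irsa_env is not part of any tool) that reads env-style output on stdin. The pod name fluentd-0 in the usage comment is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: reads KEY=VALUE lines (as printed by `env`) on stdin
# and reports whether the two IRSA-related variables are present and non-empty.
# Returns 0 when both are set, 1 otherwise.
check_irsa_env() {
  input=$(cat)
  missing=0
  for name in AWS_ROLE_ARN AWS_WEB_IDENTITY_TOKEN_FILE; do
    if printf '%s\n' "$input" | grep -q "^${name}=."; then
      echo "${name}: set"
    else
      echo "${name}: missing"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (pod name is an assumption, adjust to the actual Fluentd pod):
#   kubectl exec -n fluent-system fluentd-0 -- env | check_irsa_env
```

If either variable is missing, the pod may have been created before the role was attached to the service account. The webhook only mutates pods at creation time, so recreating the Fluentd pod would be the next thing to try.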

I've also used the IAM Policy Simulator, which seems to indicate my role/policy is correct:

[image: IAM Policy Simulator result]

I'm starting to wonder if it's possible at all to use IRSA to give fluentd access to a secured OpenSearch domain...

I noticed that the Fluent Bit output plugin for OpenSearch has some parameters for authentication and IAM roles, but Fluentd's doesn't.

Is this an unavoidable limitation? Has anyone used the fluent-operator in Fluent Bit + Fluentd mode, with Fluentd using IRSA to connect to AWS OpenSearch?
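
One way to narrow this down is to send a single SigV4-signed request to the domain with the role's temporary credentials and compare the status code with the 401 that Fluentd receives. This is a minimal sketch, assuming curl 7.75.0 or newer (for --aws-sigv4) and that temporary credentials for the IRSA role are already exported as AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN. sigv4_probe is a hypothetical helper name, and the endpoint and region in the usage comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: sends one SigV4-signed GET to the domain root and
# prints only the HTTP status code.
# Assumes AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN
# hold temporary credentials for the role under test.
sigv4_probe() {
  endpoint=$1   # OpenSearch domain hostname, without scheme
  region=$2     # AWS region of the domain
  curl --silent --output /dev/null --write-out '%{http_code}\n' \
    --aws-sigv4 "aws:amz:${region}:es" \
    --user "${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY}" \
    --header "x-amz-security-token: ${AWS_SESSION_TOKEN}" \
    "https://${endpoint}/"
}

# Usage (placeholder values):
#   sigv4_probe <domain-endpoint> <region>
```

If the signed request gets past the 401 while unsigned ones do not, that would point to missing request signing on the Fluentd side rather than a problem with the role or policy.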

To Reproduce

  1. Provision an OpenSearch domain with advanced security options using this Terraform provider.

I used the following inputs:

  dedicated_master_enabled        = "true"
  dedicated_master_count          = "3"
  dedicated_master_type           = "r5.large.search"
  automated_snapshot_start_hour   = "0"
  domain_name                     = "any-name"
  engine_version                  = "OpenSearch_2.3"
  instance_type                   = "m5.large.search"
  instance_count                  = 3
  subnet_ids                      = [...]
  volume_size                     = 50
  vpc_id                          = "your-vpc-id"
  default_zone_awareness_config   = false
  zone_awareness_enabled          = true
  create_iam_service_linked_role  = "false"
  encryption_enabled              = "true"
  enforce_https                   = "true"
  node_to_node_encryption_enabled = "true"
  retention_in_days               = 7
  warm_instance_enabled           = "true"
  warm_instance_type              = "ultrawarm1.medium.search"
  warm_instance_count             = 2
  cold_storage_enabled            = "true"
  sg_egress_all_enabled           = "true"
  sg_ingress_443_enabled          = "true"
  sg_ingress_9200_enabled         = "true"
  # Custom Endpoint
  #custom_endpoint_fqdn            = "your-custom-endpoint-fqdn"
  #custom_endpoint_certificate_arn = ...
  # SSO
  advanced_security_enabled       = "true"
  anonymous_auth_enabled          = "false"
  master_user_arn                 = "..."
  saml_master_user_name           = "..."
  saml_master_backend_role        = "..."
  internal_user_database_enabled  = "false"
  ## Okta Integration
  saml_enabled                    = "true"
  saml_entity_id                  = "http://www.okta.com/XXXX"
  saml_metadata_content           = file("./saml-metadata.xml")
  # COGNITO
  /*
  cognito_options_enabled         = "true"
  cognito_user_pool_id            = "us-west-2_xxxx"
  cognito_identity_pool_id        = "..."
  cognito_role_arn                = "XXX"
  */
  2. Install the fluent-operator, Fluent Bit and Fluentd as described in the "How did you install fluent operator?" section below.

Expected behavior

  1. All fluent-operator, fluentbit and fluentd pods up and running.
  2. Fluentbit collecting logs and forwarding to fluentd.
  3. Fluentd shipping logs to a secured OpenSearch domain.

Your Environment

- Fluent Operator version: 2.0.1
- Container Runtime: Docker
- Operating system: Linux(Ubuntu)
- Kernel version: 5.4.0-135-generic

How did you install fluent operator?

I installed the operator via the Helm chart with Fluent Bit and Fluentd disabled:

helm repo add fluent https://fluent.github.io/helm-charts
helm repo update
helm install fluent-operator fluent/fluent-operator --create-namespace -n fluent-system --version 2.0.2 --values values.yaml

My custom values.yaml had the following configuration:

  containerRuntime: docker
  Kubernetes: false
  operator:
    initcontainer:
      repository: "docker"
      tag: "20.10"
    container:
      repository: "kubesphere/fluent-operator"
      tag: v2.0.1
    resources:
      limits:
        cpu: 100m
        memory: 60Mi
      requests:
        cpu: 100m
        memory: 20Mi
    logLevel: debug
  fluentd:
    enable: false

I then applied the Fluent Bit and Fluentd manifests manually:

apiVersion: fluentbit.fluent.io/v1alpha2
kind: FluentBit
metadata:
  labels:
    app.kubernetes.io/name: fluent-bit
  name: fluent-bit
  namespace: fluent-system
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-role.kubernetes.io/edge
            operator: DoesNotExist
  fluentBitConfigName: fluent-bit-config
  image: kubesphere/fluent-bit:v2.0.9
  positionDB:
    hostPath:
      path: /var/lib/fluent-bit/
  resources:
    limits:
      cpu: 500m
      memory: 200Mi
    requests:
      cpu: 10m
      memory: 25Mi
  tolerations:
  - operator: Exists
---
apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterInput
metadata:
  labels:
    fluentbit.fluent.io/component: logging
    fluentbit.fluent.io/enabled: "true"
  name: docker
spec:
  systemd:
    db: /fluent-bit/tail/systemd.db
    dbSync: Normal
    path: /var/log/journal
    systemdFilter:
    - _SYSTEMD_UNIT=docker.service
    - _SYSTEMD_UNIT=kubelet.service
    tag: service.*
---
apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterInput
metadata:
  labels:
    fluentbit.fluent.io/component: logging
    fluentbit.fluent.io/enabled: "true"
  name: tail
spec:
  tail:
    db: /fluent-bit/tail/pos.db
    dbSync: Normal
    memBufLimit: 5MB
    parser: docker
    path: /var/log/containers/*.log
    refreshIntervalSeconds: 10
    skipLongLines: true
    tag: kube.*
---
apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterFluentBitConfig
metadata:
  labels:
    app.kubernetes.io/name: fluent-bit
  name: fluent-bit-config
spec:
  filterSelector:
    matchLabels:
      fluentbit.fluent.io/enabled: "true"
  inputSelector:
    matchLabels:
      fluentbit.fluent.io/enabled: "true"
  outputSelector:
    matchLabels:
      fluentbit.fluent.io/enabled: "true"
  service:
    parsersFile: parsers.conf
---
apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterFilter
metadata:
  labels:
    fluentbit.fluent.io/component: logging
    fluentbit.fluent.io/enabled: "true"
  name: kubernetes
spec:
  filters:
  - kubernetes:
      annotations: true
      kubeCAFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      kubeTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubeURL: https://kubernetes.default.svc:443
      labels: true
  - nest:
      addPrefix: kubernetes_
      nestedUnder: kubernetes
      operation: lift
  - modify:
      rules:
      - remove: stream
      - remove: kubernetes_pod_id
      - remove: kubernetes_host
      - remove: kubernetes_container_hash
  - nest:
      nestUnder: kubernetes
      operation: nest
      removePrefix: kubernetes_
      wildcard:
      - kubernetes_*
  match: kube.*
---
apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterOutput
metadata:
  labels:
    fluentbit.fluent.io/component: logging
    fluentbit.fluent.io/enabled: "true"
  name: fluentd
spec:
  forward:
    host: fluentd.fluent-system.svc
    port: 24224
  matchRegex: (?:kube|service)\.(.*)
---
apiVersion: fluentd.fluent.io/v1alpha1
kind: Fluentd
metadata:
  name: fluentd
  namespace: fluent-system
  labels:
    app.kubernetes.io/name: fluentd
spec:
  globalInputs:
    - forward:
        bind: 0.0.0.0
        port: 24224
  replicas: 1
  image: kubesphere/fluentd:v1.15.3
  resources:
    limits:
      cpu: 500m
      memory: 500Mi
    requests:
      cpu: 100m
      memory: 128Mi
  fluentdCfgSelector:
    matchLabels:
      config.fluentd.fluent.io/enabled: "true"
---
apiVersion: fluentd.fluent.io/v1alpha1
kind: ClusterFluentdConfig
metadata:
  labels:
    config.fluentd.fluent.io/enabled: "true"
  name: fluentd-config
spec:
  clusterFilterSelector:
    matchLabels:
      filter.fluentd.fluent.io/enabled: "true"
  clusterOutputSelector:
    matchLabels:
      output.fluentd.fluent.io/enabled: "true"
  watchedNamespaces: # find an easier way to do this or open an issue
    - kube-system
    - fluent-system
    - default
---
apiVersion: fluentd.fluent.io/v1alpha1
kind: ClusterOutput
metadata:
  labels:
    output.fluentd.fluent.io/enabled: "true"
  name: fluentd-output-opensearch
spec:
  outputs:
  - opensearch:
      host: vpc-XXX-us-XXX-XXXX-XXXX.us-XXX-XXX.es.amazonaws.com
      logstashFormat: true
      logstashPrefix: logs
      port: 443
      scheme: https
    logLevel: debug # change to info after OpenSearchErrorHandler is fixed
kaiohenricunha changed the title from "[fluent-operator] Fluentd access to secured OpenSearch domain using IRSA" to "[fluent-operator] Fluentd access to secured OpenSearch domain using IAM" on Mar 20, 2023