Ray

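This example runs the KubeRay operator on a KWOK cluster and uses a mock Ray head server in place of the Ray dashboard, so that RayJobs can run to completion on simulated nodes. More information about Ray can be found in the Ray documentation.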

Set up Cluster

Create a new KWOK cluster using the kind runtime

kwokctl create cluster --runtime kind

Create Node

Create a KWOK fake node

kubectl apply -f https://kwok.sigs.k8s.io/examples/node.yaml

Verify that the nodes have been created and are Ready

kubectl get node
NAME                      STATUS                     ROLES           AGE     VERSION
kwok-kwok-control-plane   Ready,SchedulingDisabled   control-plane   3m33s   v1.33.0
kwok-node-0               Ready                      agent           3m11s   kwok-v0.7.0
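To run the same check from a script, the table can be parsed. The following is a minimal Python sketch, assuming kubectl's default column output with no whitespace inside any cell; parse_table and all_ready are hypothetical helpers, and the sample text is the output shown above. A node counts as ready when Ready appears among its comma-separated STATUS values, so the cordoned control plane (Ready,SchedulingDisabled) still passes.

```python
SAMPLE = """\
NAME                      STATUS                     ROLES           AGE     VERSION
kwok-kwok-control-plane   Ready,SchedulingDisabled   control-plane   3m33s   v1.33.0
kwok-node-0               Ready                      agent           3m11s   kwok-v0.7.0
"""


def parse_table(text):
    """Turn kubectl's default table output into one dict per row.

    Assumes no cell contains whitespace, which holds for `kubectl get node`.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def all_ready(rows):
    # STATUS can list several values, e.g. "Ready,SchedulingDisabled";
    # "NotReady" is a different token, so it does not count as ready.
    return all("Ready" in row["STATUS"].split(",") for row in rows)


rows = parse_table(SAMPLE)
print([row["NAME"] for row in rows])  # ['kwok-kwok-control-plane', 'kwok-node-0']
print(all_ready(rows))                # True
```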

Deploy Ray Operator

Add the KubeRay Helm repository and install the KubeRay operator using Helm

helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm install kuberay-operator kuberay/kuberay-operator --version 1.4.2

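Patch the KubeRay operator deployment to run on the control plane node. Setting spec.nodeName bypasses the scheduler, so the pod lands on the real kind node (which is cordoned, as the SchedulingDisabled status shows) instead of the fake KWOK node, where no container actually runs.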

kubectl patch deploy kuberay-operator --type=json -p='[{"op":"add","path":"/spec/template/spec/nodeName","value":"kwok-kwok-control-plane"}]'
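The -p argument above is a JSON Patch (RFC 6902) containing a single add operation. The Python sketch below builds the same patch string and applies it to a minimal stand-in for the Deployment object, to show where nodeName ends up. node_name_patch and apply_add are hypothetical helpers; apply_add handles only object members (no array indices or ~0/~1 escapes), so it illustrates the operation rather than implementing RFC 6902 in full.

```python
import json


def node_name_patch(node_name):
    # One JSON Patch "add" operation targeting the pod template's nodeName.
    return [{"op": "add", "path": "/spec/template/spec/nodeName", "value": node_name}]


def apply_add(doc, op):
    """Apply a single "add" operation whose path names an object member.

    Sketch only: array indices and ~0/~1 escaping are not handled.
    """
    *parents, last = op["path"].lstrip("/").split("/")
    target = doc
    for key in parents:
        target = target[key]  # the parent object must already exist
    target[last] = op["value"]  # adds the member, or replaces an existing one
    return doc


patch = node_name_patch("kwok-kwok-control-plane")
patch_arg = json.dumps(patch, separators=(",", ":"))
print(patch_arg)  # identical to the -p argument of the kubectl patch command

deployment = {"spec": {"template": {"spec": {"containers": []}}}}
for op in patch:
    apply_add(deployment, op)
print(deployment["spec"]["template"]["spec"]["nodeName"])  # kwok-kwok-control-plane
```

Per RFC 6902, add on an object member that already exists replaces its value, which is why the same patch can be reapplied without error.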

Create Mock Ray Head

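Create a ConfigMap containing a mock Ray head server script. The server listens on port 8265 (the Ray dashboard port) and answers every GET /api/jobs/{job_id} request with a SUCCEEDED status; any other path returns 404.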

kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: mock-head
  namespace: default
data:
  server.py: |
    import json
    import http.server
    
    class MockRayHandler(http.server.BaseHTTPRequestHandler):
        """
        Handle GET /api/jobs/{job_id} requests for mock Ray head API
        """
        def do_GET(self):
            path = self.path.strip('/').split('/')
            
            if len(path) >= 3 and path[0] == 'api' and path[1] == 'jobs':
                self.send_response(200)
                self.send_header('Content-Type', 'application/json')
                self.end_headers()
                self.wfile.write(json.dumps({
                    "job_id": path[2],
                    "status": "SUCCEEDED"
                }).encode())
                return
            
            self.send_response(404)
            self.end_headers()
            self.wfile.write(b'Not Found')
    
    if __name__ == "__main__":
        http.server.HTTPServer(('', 8265), MockRayHandler).serve_forever()
EOF
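Before deploying, the handler logic can be exercised locally. This Python sketch copies the routing from server.py, serves it on an OS-assigned port on 127.0.0.1 (rather than 8265) in a background thread, and requests one job URL and one unknown path. The job ID raysubmit_123 and the path /healthz are arbitrary examples, and get is a hypothetical helper.

```python
import http.server
import json
import threading
import urllib.error
import urllib.request


class MockRayHandler(http.server.BaseHTTPRequestHandler):
    # Same routing as server.py in the ConfigMap above.
    def do_GET(self):
        path = self.path.strip('/').split('/')
        if len(path) >= 3 and path[0] == 'api' and path[1] == 'jobs':
            self.send_response(200)
            self.send_header('Content-Type', 'application/json')
            self.end_headers()
            self.wfile.write(json.dumps({"job_id": path[2], "status": "SUCCEEDED"}).encode())
            return
        self.send_response(404)
        self.end_headers()
        self.wfile.write(b'Not Found')

    def log_message(self, format, *args):
        pass  # silence per-request logging during the check


# Ignore any proxy settings, since the request targets a local server.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def get(url):
    """Return (status code, body) for a GET, including error responses."""
    try:
        with opener.open(url, timeout=5) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()


# Port 0 asks the OS for any free port, so the check does not need 8265.
server = http.server.HTTPServer(('127.0.0.1', 0), MockRayHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_address[1]}"

job_code, job_body = get(f"{base}/api/jobs/raysubmit_123")
missing_code, _ = get(f"{base}/healthz")
server.shutdown()
server.server_close()

print(job_code, json.loads(job_body))  # 200 {'job_id': 'raysubmit_123', 'status': 'SUCCEEDED'}
print(missing_code)                    # 404
```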

Create a Deployment for the mock Ray head service

kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mock-head
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mock-head
  template:
    metadata:
      labels:
        app: mock-head
    spec:
      nodeName: kwok-kwok-control-plane
      containers:
      - name: server
        image: docker.io/library/python:3.11-alpine
        ports:
        - containerPort: 8265
        volumeMounts:
        - name: mock-head
          mountPath: /app
        command: ["python"]
        args: ["/app/server.py"]
      volumes:
      - name: mock-head
        configMap:
          name: mock-head
EOF
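The Deployment wires the ConfigMap into the container: with no items list on the volume, each key in the ConfigMap's data appears as a file of the same name under mountPath, and the container process is command followed by args. The Python sketch below models just those two rules to show that the container runs exactly the mounted script; configmap_volume_files and container_argv are hypothetical helpers, and the script content is a placeholder.

```python
import posixpath


def configmap_volume_files(data, mount_path):
    """Paths at which a configMap volume exposes each key (no `items` list given)."""
    return {posixpath.join(mount_path, key): content for key, content in data.items()}


def container_argv(command, args):
    # The container's process is `command` followed by `args`.
    return [*command, *args]


# Values mirror the ConfigMap and Deployment above; "..." is a placeholder body.
files = configmap_volume_files({"server.py": "..."}, "/app")
argv = container_argv(["python"], ["/app/server.py"])

print(sorted(files))      # ['/app/server.py']
print(argv)               # ['python', '/app/server.py']
print(argv[-1] in files)  # True: the container runs the mounted script
```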

Create a Service to expose the mock Ray head

kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: mock-head
  namespace: default
spec:
  ports:
  - port: 8265
    targetPort: 8265
    protocol: TCP
  selector:
    app: mock-head
EOF
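The Service forwards traffic on port 8265 to pods whose labels include every key/value pair in spec.selector. This Python sketch models that equality-based matching and the Service's in-cluster DNS name, which is the name the CoreDNS rule in the next step redirects to. selector_matches and service_dns_name are hypothetical helpers, the default cluster domain cluster.local is assumed, and the pod-template-hash value is made up (Deployment-managed pods carry such a label in addition to app).

```python
def selector_matches(selector, labels):
    """Equality-based selection: every selector pair must appear in the pod's labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def service_dns_name(name, namespace, cluster_domain="cluster.local"):
    # In-cluster DNS name of a Service, assuming the default cluster domain.
    return f"{name}.{namespace}.svc.{cluster_domain}"


service_selector = {"app": "mock-head"}
mock_pod = {"app": "mock-head", "pod-template-hash": "7c9d8f"}  # hash value made up
other_pod = {"app": "other"}

print(selector_matches(service_selector, mock_pod))   # True
print(selector_matches(service_selector, other_pod))  # False
print(service_dns_name("mock-head", "default"))       # mock-head.default.svc.cluster.local
```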

Update CoreDNS

Get the current CoreDNS configuration and save it to a file

kubectl get configmap coredns -n kube-system -o yaml > coredns.yaml

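Add a rewrite rule so that DNS queries for any Ray head Service (KubeRay names it <raycluster-name>-head-svc in the RayCluster's namespace) resolve to the mock-head Service instead. The sed command inserts the rule after the ready line of the Corefile.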

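# The empty '' after -i is BSD/macOS sed syntax; with GNU sed (most Linux distributions), use plain -i
sed -i '' '/ready/a\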
        rewrite name regex (.+)-head-svc\.(.+)\.svc\.cluster\.local mock-head.default.svc.cluster.local
' coredns.yaml
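As a portable alternative to the sed command, the same edit can be made in Python. This sketch mimics sed's append command for the pattern ready: it adds the rewrite line after every matching line, using the same 8-space indentation. append_after_matches and patch_file are hypothetical helpers, and the sample is an abbreviated Corefile excerpt for illustration only.

```python
import re

# The rule from the sed command above, with the same 8-space indentation.
REWRITE_LINE = (
    r"        rewrite name regex (.+)-head-svc\.(.+)\.svc\.cluster\.local"
    " mock-head.default.svc.cluster.local"
)


def append_after_matches(text, pattern, new_line):
    """Append new_line after every line matching pattern, like sed's append command."""
    out = []
    for line in text.splitlines(keepends=True):
        out.append(line)
        if re.search(pattern, line):
            if not line.endswith("\n"):
                out.append("\n")  # keep the appended text on its own line
            out.append(new_line + "\n")
    return "".join(out)


def patch_file(path="coredns.yaml"):
    """Rewrite the exported ConfigMap in place (not called in the demo below)."""
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(append_after_matches(text, "ready", REWRITE_LINE))


SAMPLE = """\
    .:53 {
        errors
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
"""
print(append_after_matches(SAMPLE, "ready", REWRITE_LINE))
```

Like the sed command, this touches every line containing ready, so it is meant to be run once on a fresh export of the ConfigMap.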

Apply the updated CoreDNS configuration

kubectl apply -f coredns.yaml
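To see which query names the applied rule redirects, its pattern can be tried with Python's re module. This is an approximation under two assumptions: Python's re behaves like Go's regexp for this simple pattern, and on a match CoreDNS replaces the whole query name with the target (the target contains no {1} or {2} group references). The RayCluster name in the first sample is hypothetical, and rewritten is a hypothetical helper.

```python
import re

# Pattern and target copied from the rewrite rule added to the Corefile.
PATTERN = re.compile(r"(.+)-head-svc\.(.+)\.svc\.cluster\.local")
TARGET = "mock-head.default.svc.cluster.local"


def rewritten(qname):
    """Return the name resolved instead of qname, or None if the rule does not apply.

    Assumes a match replaces the whole query name with TARGET.
    """
    return TARGET if PATTERN.search(qname) else None


for name in [
    "rayjob-pytorch-mnist-abcde-head-svc.default.svc.cluster.local.",  # hypothetical name
    "kuberay-operator.default.svc.cluster.local.",
    "mock-head.default.svc.cluster.local.",
]:
    print(name, "->", rewritten(name))
```

The last sample shows that mock-head's own name does not contain -head-svc, so queries for the mock Service are left alone and the rule cannot loop.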

Patch the CoreDNS deployment to run on the control plane node. Because this changes the pod template, the CoreDNS pods are recreated and load the updated configuration.

kubectl patch deployment coredns -n kube-system --type=json -p='[{"op":"add","path":"/spec/template/spec/nodeName","value":"kwok-kwok-control-plane"}]'

Test RayJob

Deploy a sample Ray job to test the setup

kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/master/ray-operator/config/samples/pytorch-mnist/ray-job.pytorch-mnist.yaml

Check the status of the Ray job

kubectl get rayjob
NAME                   JOB STATUS   DEPLOYMENT STATUS   START TIME             END TIME               AGE
rayjob-pytorch-mnist   SUCCEEDED    Complete            2025-08-16T18:52:01Z   2025-08-16T18:52:02Z   5m31s
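In scripts, polling until the job reaches a terminal state is more robust than a single kubectl get. This Python sketch polls a caller-supplied get_status function until it returns one of Ray's terminal job statuses (SUCCEEDED, FAILED, STOPPED) or a timeout expires. wait_for_job is a hypothetical helper; get_status could, for example, wrap kubectl get rayjob rayjob-pytorch-mnist -o jsonpath='{.status.jobStatus}', where the jobStatus field name is an assumption about the RayJob status schema. The clock and sleep parameters let the loop be driven without real waiting.

```python
import time

# Ray job statuses after which the status no longer changes.
TERMINAL = {"SUCCEEDED", "FAILED", "STOPPED"}


def wait_for_job(get_status, timeout=300.0, interval=2.0,
                 clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it returns a terminal Ray job status.

    Raises TimeoutError if no terminal status appears within `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"last status: {status!r}")
        sleep(interval)


# Simulated status sequence; the empty string stands for "not reported yet".
statuses = iter(["", "PENDING", "RUNNING", "SUCCEEDED"])
print(wait_for_job(lambda: next(statuses), sleep=lambda s: None))  # SUCCEEDED
```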