From 6de5e14be21947c166ffe43b1e1ce9b2ec1d2304 Mon Sep 17 00:00:00 2001 From: Andreas Naoum <49308613+andreasnaoum@users.noreply.github.com> Date: Thu, 4 Apr 2024 18:39:43 +0200 Subject: [PATCH] Updated READMEs for the examples - Batch 1 (#5620) Updated READMEs for the examples: Detect and Track Objects Dicom MRI Face Tracking Gesture Detection Human Pose Tracking LiDAR Live Camera Edge Detection Live Depth Sensor ### What ### Checklist * [x] I have read and agree to [Contributor Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and the [Code of Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md) * [x] I've included a screenshot or gif (if applicable) * [x] I have tested the web demo (if applicable): * Using newly built examples: [app.rerun.io](https://app.rerun.io/pr/5620/index.html) * Using examples from latest `main` build: [app.rerun.io](https://app.rerun.io/pr/5620/index.html?manifest_url=https://app.rerun.io/version/main/examples_manifest.json) * Using full set of examples from `nightly` build: [app.rerun.io](https://app.rerun.io/pr/5620/index.html?manifest_url=https://app.rerun.io/version/nightly/examples_manifest.json) * [x] The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG * [x] If applicable, add a new check to the [release checklist](https://github.com/rerun-io/rerun/blob/main/tests/python/release_checklist)! - [PR Build Summary](https://build.rerun.io/pr/5620) - [Docs preview](https://rerun.io/preview/53229bd36daf2158b782ef62b9d76bd90befbdee/docs) - [Examples preview](https://rerun.io/preview/53229bd36daf2158b782ef62b9d76bd90befbdee/examples) - [Recent benchmark results](https://build.rerun.io/graphs/crates.html) - [Wasm size tracking](https://build.rerun.io/graphs/sizes.html) --------- Co-authored-by: Nikolaus West Co-authored-by: Emil Ernerfeldt --- docs/cspell.json | 7 + .../python/detect_and_track_objects/README.md | 159 +++++- examples/python/dicom_mri/README.md | 45 +- examples/python/face_tracking/README.md | 196 +++++-- examples/python/gesture_detection/README.md | 490 ++++-------------- examples/python/human_pose_tracking/README.md | 11 +- examples/python/lidar/README.md | 44 +- .../live_camera_edge_detection/README.md | 63 ++- examples/python/live_depth_sensor/README.md | 103 +++- 9 files changed, 677 insertions(+), 441 deletions(-) diff --git a/docs/cspell.json b/docs/cspell.json index 9e490a316ce8..e8af69834881 100644 --- a/docs/cspell.json +++ b/docs/cspell.json @@ -48,6 +48,8 @@ "binsearching", "binstall", "binutils", + "blendshape", + "blendshapes", "Birger", "Birkl", "booktitle", @@ -124,6 +126,8 @@ "ewebsock", "extrinsics", "farbfeld", + "FACEMESH", + "facemesh", "Farooq", "Feichtenhofer", "fieldname", @@ -177,6 +181,7 @@ "keypointid", "keypoints", "Kirillov", + "klass", "kpreid", "Landmarker", "Larsson", @@ -329,6 +334,7 @@ "scipy", "scrollwheel", "segs", + "Segmentations", "serde", "Shaohui", "Shap", @@ -404,6 +410,7 @@ "Viktor", "virtualenv", "visualizability", + "voxels", "Vizzo", "vstack", "vsuryamurthy", diff --git a/examples/python/detect_and_track_objects/README.md b/examples/python/detect_and_track_objects/README.md index 8dacfd1fbb00..81301692955b 100644 --- a/examples/python/detect_and_track_objects/README.md +++ b/examples/python/detect_and_track_objects/README.md @@ -1,13 +1,14 @@ + @@ -16,11 +17,161 @@ channel = "release" -Another more elaborate example applying simple object detection and segmentation on a video using the Huggingface `transformers` library. 
Tracking across frames is performed using [CSRT](https://arxiv.org/pdf/1611.08461.pdf) from OpenCV.
+Visualize object detection and segmentation using the Hugging Face [Transformers](https://huggingface.co/docs/transformers/index) library and [CSRT](https://arxiv.org/pdf/1611.08461.pdf) from OpenCV.
+
+# Used Rerun Types
+[`Image`](https://www.rerun.io/docs/reference/types/archetypes/image), [`SegmentationImage`](https://www.rerun.io/docs/reference/types/archetypes/segmentation_image), [`AnnotationContext`](https://www.rerun.io/docs/reference/types/archetypes/annotation_context), [`Boxes2D`](https://www.rerun.io/docs/reference/types/archetypes/boxes2d), [`TextLog`](https://www.rerun.io/docs/reference/types/archetypes/text_log)
+
+# Background
+In this example, CSRT (Channel and Spatial Reliability Tracker), a tracking API available in OpenCV, is employed to track objects across frames.
+Additionally, the example showcases basic object detection and segmentation on a video using the Hugging Face `transformers` library.
+
+
+# Logging and Visualizing with Rerun
+The visualizations in this example were created with the following Rerun code.
+
+
+## Timelines
+For each processed video frame, all data sent to Rerun is associated with the [`timeline`](https://www.rerun.io/docs/concepts/timelines) `frame`.
+
+```python
+rr.set_time_sequence("frame", frame_idx)
+```
+
+## Video
+The input video is logged as a sequence of [`Image`](https://www.rerun.io/docs/reference/types/archetypes/image) objects to the `image` entity.
+
+```python
+rr.log(
+    "image",
+    rr.Image(rgb).compress(jpeg_quality=85)
+)
+```
+
+Since the detection and segmentation model operates on smaller images, the resized images are logged to the separate `segmentation/rgb_scaled` entity.
+This allows us to subsequently visualize the segmentation mask on top of the video.
+
+```python
+rr.log(
+    "segmentation/rgb_scaled",
+    rr.Image(rgb_scaled).compress(jpeg_quality=85)
+)
+```
+
+## Segmentations
+The segmentation results are logged through a combination of two archetypes.
+The segmentation image itself is logged as a
+[`SegmentationImage`](https://www.rerun.io/docs/reference/types/archetypes/segmentation_image) and
+contains the class id for each pixel. It is logged to the `segmentation` entity.
+
+
+```python
+rr.log(
+    "segmentation",
+    rr.SegmentationImage(mask)
+)
+```
+
+The color and label for each class are determined by the
+[`AnnotationContext`](https://www.rerun.io/docs/reference/types/archetypes/annotation_context), which is
+logged to the root entity using `rr.log("/", …, timeless=True)` as it should apply to the whole sequence and all
+entities that have a class id.
+
+```python
+class_descriptions = [ rr.AnnotationInfo(id=cat["id"], color=cat["color"], label=cat["name"]) for cat in coco_categories ]
+rr.log(
+    "/",
+    rr.AnnotationContext(class_descriptions),
+    timeless=True
+)
+```
+
+## Detections
+The detections and tracked bounding boxes are visualized by logging [`Boxes2D`](https://www.rerun.io/docs/reference/types/archetypes/boxes2d) archetypes to Rerun.
+ +### Detections +```python +rr.log( + "segmentation/detections/things", + rr.Boxes2D( + array=thing_boxes, + array_format=rr.Box2DFormat.XYXY, + class_ids=thing_class_ids, + ), +) +``` -For more info see [here](https://huggingface.co/docs/transformers/index) +```python +rr.log( + f"image/tracked/{self.tracking_id}", + rr.Boxes2D( + array=self.tracked.bbox_xywh, + array_format=rr.Box2DFormat.XYWH, + class_ids=self.tracked.class_id, + ), +) +``` +### Tracked bounding boxes +```python +rr.log( + "segmentation/detections/background", + rr.Boxes2D( + array=background_boxes, + array_format=rr.Box2DFormat.XYXY, + class_ids=background_class_ids, + ), +) +``` +The color and label of the bounding boxes is determined by their class id, relying on the same +[`AnnotationContext`](https://www.rerun.io/docs/reference/types/archetypes/annotation_context) as the +segmentation images. This ensures that a bounding box and a segmentation image with the same class id will also have the +same color. + +Note that it is also possible to log multiple annotation contexts should different colors and / or labels be desired. +The annotation context is resolved by seeking up the entity hierarchy. + +## Text Log +Rerun integrates with the [Python logging module](https://docs.python.org/3/library/logging.html). +Through the [`TextLog`](https://www.rerun.io/docs/reference/types/archetypes/text_log#textlogintegration) text at different importance level can be logged. After an initial setup that is described on the +[`TextLog`](https://www.rerun.io/docs/reference/types/archetypes/text_log#textlogintegration), statements +such as `logging.info("…")`, `logging.debug("…")`, etc. will show up in the Rerun viewer. + +```python +def setup_logging() -> None: + logger = logging.getLogger() + rerun_handler = rr.LoggingHandler("logs") + rerun_handler.setLevel(-1) + logger.addHandler(rerun_handler) + +def main() -> None: + # … existing code … + setup_logging() # setup logging + track_objects(video_path, max_frame_count=args.max_frame) # start tracking +``` +In the viewer you can adjust the filter level and look at the messages time-synchronized with respect to other logged data. + +# Run the Code +To run this example, make sure you have the Rerun repository checked out and the latest SDK installed: +```bash +# Setup +pip install --upgrade rerun-sdk # install the latest Rerun SDK +git clone git@github.com:rerun-io/rerun.git # Clone the repository +cd rerun +git checkout latest # Check out the commit matching the latest SDK release +``` + +Install the necessary libraries specified in the requirements file: ```bash pip install -r examples/python/detect_and_track_objects/requirements.txt -python examples/python/detect_and_track_objects/main.py +``` +To experiment with the provided example, simply execute the main Python script: +```bash +python examples/python/detect_and_track_objects/main.py # run the example +``` + +If you wish to customize it for various videos, adjust the maximum frames, explore additional features, or save it use the CLI with the `--help` option for guidance: + +```bash +python examples/python/detect_and_track_objects/main.py --help ``` diff --git a/examples/python/dicom_mri/README.md b/examples/python/dicom_mri/README.md index 2f4477b8af60..067c7a92f83b 100644 --- a/examples/python/dicom_mri/README.md +++ b/examples/python/dicom_mri/README.md @@ -16,9 +16,50 @@ channel = "main" -Example using a [DICOM](https://en.wikipedia.org/wiki/DICOM) MRI scan. 
This demonstrates the flexible tensor slicing capabilities of the Rerun viewer. +Visualize a [DICOM](https://en.wikipedia.org/wiki/DICOM) MRI scan. This demonstrates the flexible tensor slicing capabilities of the Rerun viewer. +# Used Rerun Types +[`Tensor`](https://www.rerun.io/docs/reference/types/archetypes/tensor), [`TextDocument`](https://www.rerun.io/docs/reference/types/archetypes/text_document) + +# Background +Digital Imaging and Communications in Medicine (DICOM) serves as a technical standard for the digital storage and transmission of medical images. In this instance, an MRI scan is visualized using Rerun. + +# Logging and Visualizing with Rerun + +The visualizations in this example were created with just the following line. +```python +rr.log("tensor", rr.Tensor(voxels_volume_u16, dim_names=["right", "back", "up"])) +``` + +A `numpy.array` named `voxels_volume_u16` representing volumetric MRI intensities with a shape of `(512, 512, 512)`. +To visualize this data effectively in Rerun, we can log the `numpy.array` as [`Tensor`](https://www.rerun.io/docs/reference/types/archetypes/tensor) to the `tensor` entity. + +In the Rerun viewer you can also inspect the data in detail. The `dim_names` provided in the above call to `rr.log` help to +give semantic meaning to each axis. After selecting the tensor view, you can adjust various settings in the Blueprint +settings on the right-hand side. For example, you can adjust the color map, the brightness, which dimensions to show as +an image and which to select from, and more. + +# Run the Code +To run this example, make sure you have the Rerun repository checked out and the latest SDK installed: +```bash +# Setup +pip install --upgrade rerun-sdk # install the latest Rerun SDK +git clone git@github.com:rerun-io/rerun.git # Clone the repository +cd rerun +git checkout latest # Check out the commit matching the latest SDK release +``` + +Install the necessary libraries specified in the requirements file: ```bash pip install -r examples/python/dicom_mri/requirements.txt -python examples/python/dicom_mri/main.py +``` +To experiment with the provided example, simply execute the main Python script: +```bash +python examples/python/dicom_mri/main.py # run the example +``` + +If you wish to customize it, explore additional features, or save it, use the CLI with the `--help` option for guidance: + +```bash +python examples/python/dicom_mri/main.py --help ``` diff --git a/examples/python/face_tracking/README.md b/examples/python/face_tracking/README.md index c5c491e65efc..a93bb7255e75 100644 --- a/examples/python/face_tracking/README.md +++ b/examples/python/face_tracking/README.md @@ -1,6 +1,7 @@ @@ -15,44 +16,179 @@ thumbnail_dimensions = [480, 480] -Use the [MediaPipe](https://google.github.io/mediapipe/) Face Detector and Landmarker solutions to detect and track a human face in image, videos, and camera stream. +Use the [MediaPipe](https://google.github.io/mediapipe/) Face Detector and Landmarker solutions to detect and track a human face in image, video, and camera stream. 
-```bash -pip install -r examples/python/face_tracking/requirements.txt -python examples/python/face_tracking/main.py +# Used Rerun Types +[`Image`](https://www.rerun.io/docs/reference/types/archetypes/image), [`Points2D`](https://www.rerun.io/docs/reference/types/archetypes/points2d), [`Points3D`](https://www.rerun.io/docs/reference/types/archetypes/points3d), [`Boxes2D`](https://www.rerun.io/docs/reference/types/archetypes/boxes2d), [`AnnotationContext`](https://www.rerun.io/docs/reference/types/archetypes/annotation_context), [`Scalar`](https://www.rerun.io/docs/reference/types/archetypes/scalar) + +# Background +The face and face landmark detection technology aims to give the ability of the devices to interpret face movements and facial expressions as commands or inputs. +At the core of this technology, a pre-trained machine-learning model analyses the visual input, locates face and identifies face landmarks and blendshape scores (coefficients representing facial expression). +Human-Computer Interaction, Robotics, Gaming, and Augmented Reality are among the fields where this technology shows significant promise for applications. + +In this example, the [MediaPipe](https://developers.google.com/mediapipe/) Face and Face Landmark Detection solutions were utilized to detect human face, detect face landmarks and identify facial expressions. +Rerun was employed to visualize the output of the Mediapipe solution over time to make it easy to analyze the behavior. + +# Logging and Visualizing with Rerun +The visualizations in this example were created with the following Rerun code. + +## Timelines + +For each processed video frame, all data sent to Rerun is associated with the two [`timelines`](https://www.rerun.io/docs/concepts/timelines) `time` and `frame_idx`. + +```python +rr.set_time_seconds("time", bgr_frame.time) +rr.set_time_sequence("frame_idx", bgr_frame.idx) +``` + +## Video +The input video is logged as a sequence of [`Image`](https://www.rerun.io/docs/reference/types/archetypes/image) objects to the 'Video' entity. +```python +rr.log( + "video/image", + rr.Image(frame).compress(jpeg_quality=75) +) +``` + +## Face Landmark Points +Logging the face landmarks involves specifying connections between the points, extracting face landmark points and logging them to the Rerun SDK. +The 2D points are visualized over the video/image for a better understanding and visualization of the face. +The 3D points allows the creation of a 3D model of the face reconstruction for a more comprehensive representation of the face. + +The 2D and 3D points are logged through a combination of two archetypes. First, a timeless +[`ClassDescription`](https://www.rerun.io/docs/reference/types/datatypes/class_description) is logged, that contains the information which maps keypoint ids to labels and how to connect +the keypoints. Defining these connections automatically renders lines between them. +Second, the actual keypoint positions are logged in 2D and 3D as [`Points2D`](https://www.rerun.io/docs/reference/types/archetypes/points2d) and [`Points3D`](https://www.rerun.io/docs/reference/types/archetypes/points3d) archetypes, respectively. + +### Label Mapping and Keypoint Connections + +An annotation context is logged with one class ID assigned per facial feature. The class description includes the connections between corresponding keypoints extracted from the MediaPipe face mesh solution. 
+A class ID array is generated to match the class IDs in the annotation context with keypoint indices (to be utilized as the class_ids argument to rr.log). +```python +# Initialize a list of facial feature classes from MediaPipe face mesh solution +classes = [ + mp.solutions.face_mesh.FACEMESH_LIPS, + mp.solutions.face_mesh.FACEMESH_LEFT_EYE, + mp.solutions.face_mesh.FACEMESH_LEFT_IRIS, + mp.solutions.face_mesh.FACEMESH_LEFT_EYEBROW, + mp.solutions.face_mesh.FACEMESH_RIGHT_EYE, + mp.solutions.face_mesh.FACEMESH_RIGHT_EYEBROW, + mp.solutions.face_mesh.FACEMESH_RIGHT_IRIS, + mp.solutions.face_mesh.FACEMESH_FACE_OVAL, + mp.solutions.face_mesh.FACEMESH_NOSE, +] + +# Initialize class descriptions and class IDs array +self._class_ids = [0] * mp.solutions.face_mesh.FACEMESH_NUM_LANDMARKS_WITH_IRISES +class_descriptions = [] + +# Loop through each facial feature class +for i, klass in enumerate(classes): + # MediaPipe only provides connections for class, not actual class per keypoint. So we have to extract the + # classes from the connections. + ids = set() + for connection in klass: + ids.add(connection[0]) + ids.add(connection[1]) + + for id_ in ids: + self._class_ids[id_] = i + + # Append class description with class ID and keypoint connections + class_descriptions.append( + rr.ClassDescription( + info=rr.AnnotationInfo(id=i), + keypoint_connections=klass, + ) + ) + +# Log annotation context for video/landmarker and reconstruction entities +rr.log("video/landmarker", rr.AnnotationContext(class_descriptions), timeless=True) +rr.log("reconstruction", rr.AnnotationContext(class_descriptions), timeless=True) + +rr.log("reconstruction", rr.ViewCoordinates.RDF, timeless=True) # properly align the 3D face in the viewer +``` + +With the below annotation, the keypoints will be connected with lines to enhance visibility in the `video/detector` entity. +```python +rr.log( + "video/detector", + rr.ClassDescription( + info=rr.AnnotationInfo(id=0), keypoint_connections=[(0, 1), (1, 2), (2, 0), (2, 3), (0, 4), (1, 5)] + ), + timeless=True, +) +``` +### Bounding Box + +```python +rr.log( + f"video/detector/faces/{i}/bbox", + rr.Boxes2D( + array=[bbox.origin_x, bbox.origin_y, bbox.width, bbox.height], array_format=rr.Box2DFormat.XYWH + ), + rr.AnyValues(index=index, score=score), +) ``` -## Usage -CLI usage help is available using the `--help` option: +### 2D Points + +```python +rr.log( + f"video/detector/faces/{i}/keypoints", + rr.Points2D(pts, radii=3, keypoint_ids=list(range(6))) +) +``` +```python +rr.log( + f"video/landmarker/faces/{i}/landmarks", + rr.Points2D(pts, radii=3, keypoint_ids=keypoint_ids, class_ids=self._class_ids), +) ``` -$ python examples/python/face_tracking/main.py --help -usage: main.py [-h] [--demo-image] [--image IMAGE] [--video VIDEO] [--camera CAMERA] [--max-frame MAX_FRAME] [--max-dim MAX_DIM] [--num-faces NUM_FACES] [--headless] [--connect] [--serve] [--addr ADDR] [--save SAVE] -Uses the MediaPipe Face Detection to track a human pose in video. +### 3D Points -options: - -h, --help show this help message and exit - --demo-image Run on a demo image automatically downloaded - --image IMAGE Run on the provided image - --video VIDEO Run on the provided video file. - --camera CAMERA Run from the camera stream (parameter is the camera ID, usually 0 - --max-frame MAX_FRAME - Stop after processing this many frames. If not specified, will run until interrupted. - --max-dim MAX_DIM Resize the image such as its maximum dimension is not larger than this value. 
- --num-faces NUM_FACES - Max number of faces detected by the landmark model (temporal smoothing is applied only for a value of 1). - --headless Don't show GUI - --connect Connect to an external viewer - --serve Serve a web viewer (WARNING: experimental feature) - --addr ADDR Connect to this ip:port - --save SAVE Save data to a .rrd file at this path +```python +rr.log( + f"reconstruction/faces/{i}", + rr.Points3D( + [(lm.x, lm.y, lm.z) for lm in landmark], + keypoint_ids=keypoint_ids, + class_ids=self._class_ids, + ), +) ``` -Here is an overview of the options specific to this example: +## Scalar +Blendshapes are essentially predefined facial expressions or configurations that can be detected by the face landmark detection model. Each blendshape typically corresponds to a specific facial movement or expression, such as blinking, squinting, smiling, etc. + +The blendshapes are logged along with their corresponding scores. +```python +for blendshape in blendshapes: + if blendshape.category_name in BLENDSHAPES_CATEGORIES: + rr.log(f"blendshapes/{i}/{blendshape.category_name}", rr.Scalar(blendshape.score)) +``` -- *Running modes*: By default, this example streams images from the default webcam. Another webcam can be used by providing a camera index with the `--camera` option. Alternatively, images can be read from a video file (using `--video PATH`) or a single image file (using `--image PATH`). Also, a demo image with two faces can be automatically downloaded and used with `--demo-image`. -- *Max face count*: The maximum face detected by MediaPipe Face Landmarker can be set using `--num-faces NUM`. It defaults to 1, in which case the Landmarker applies temporal smoothing. This parameter doesn't affect MediaPipe Face Detector, which always attempts to detect all faces present in the input images. -- *Image downscaling*: By default, this example logs and runs on the native resolution of the provided images. Input images can be downscaled to a given maximum dimension using `--max-dim DIM`. -- *Limiting frame count*: When running from a webcam or a video file, this example can be set to stop after a given number of frames using `--max-frame MAX_FRAME`. 
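+
+The blendshape scores come from the MediaPipe Face Landmarker itself, which has to be configured to output them. The snippet below is a rough sketch of that setup using the MediaPipe Tasks API; the example script wraps this in its own logger class, so names and details may differ, and `face_landmarker.task` stands in for the locally downloaded model file.
+
+```python
+import mediapipe as mp
+import numpy as np
+from mediapipe.tasks import python
+from mediapipe.tasks.python import vision
+
+options = vision.FaceLandmarkerOptions(
+    base_options=python.BaseOptions(model_asset_path="face_landmarker.task"),  # placeholder model path
+    output_face_blendshapes=True,  # required to get blendshape scores
+    num_faces=1,  # max number of faces detected by the landmark model (the script's --num-faces option)
+)
+landmarker = vision.FaceLandmarker.create_from_options(options)
+
+# Run on a single RGB frame (a dummy image here, for illustration only)
+frame = np.zeros((480, 640, 3), dtype=np.uint8)
+result = landmarker.detect(mp.Image(image_format=mp.ImageFormat.SRGB, data=frame))
+for i, blendshapes in enumerate(result.face_blendshapes):
+    print(i, [(b.category_name, b.score) for b in blendshapes])
+```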
+# Run the Code +To run this example, make sure you have the Rerun repository checked out and the latest SDK installed: +```bash +# Setup +pip install --upgrade rerun-sdk # install the latest Rerun SDK +git clone git@github.com:rerun-io/rerun.git # Clone the repository +cd rerun +git checkout latest # Check out the commit matching the latest SDK release +``` +Install the necessary libraries specified in the requirements file: +```bash +pip install -r examples/python/face_tracking/requirements.txt +``` +To experiment with the provided example, simply execute the main Python script: +```bash +python examples/python/face_tracking/main.py # run the example +``` +If you wish to customize it for various videos, adjust the maximum frames, explore additional features, or save it use the CLI with the `--help` option for guidance: +```bash +python examples/python/face_tracking/main.py --help +``` diff --git a/examples/python/gesture_detection/README.md b/examples/python/gesture_detection/README.md index 58231fe484a0..f717c31a4474 100644 --- a/examples/python/gesture_detection/README.md +++ b/examples/python/gesture_detection/README.md @@ -1,7 +1,7 @@ @@ -15,429 +15,133 @@ thumbnail_dimensions = [480, 480] -# Run +Use the [MediaPipe](https://google.github.io/mediapipe/) Hand Landmark and Gesture Detection solutions to +track hands and recognize gestures in images, video, and camera stream. -```bash -# Install the required Python packages specified in the requirements file -pip install -r examples/python/gesture_detection/requirements.txt -python examples/python/gesture_detection/main.py -``` - -# Usage - -CLI usage help is available using the `--help` option: - -```bash -$ python examples/python/gesture_detection/main.py --help -usage: main.py [-h] [--demo-image] [--demo-video] [--image IMAGE] - [--video VIDEO] [--camera CAMERA] [--max-frame MAX_FRAME] - [--headless] [--connect] [--serve] [--addr ADDR] [--save SAVE] - [-o] - -Uses the MediaPipe Gesture Recognition to track a hand and recognize gestures -in image or video. - -optional arguments: - -h, --help show this help message and exit - --demo-image Run on a demo image automatically downloaded - --demo-video Run on a demo image automatically downloaded. - --image IMAGE Run on the provided image - --video VIDEO Run on the provided video file. - --camera CAMERA Run from the camera stream (parameter is the camera - ID, usually 0; or maybe 1 on mac) - --max-frame MAX_FRAME - Stop after processing this many frames. If not - specified, will run until interrupted. - --headless Don\'t show GUI - --connect Connect to an external viewer - --serve Serve a web viewer (WARNING: experimental feature) - --addr ADDR Connect to this ip:port - --save SAVE Save data to a .rrd file at this path - -o, --stdout Log data to standard output, to be piped into a Rerun - Viewer -``` - -Here is an overview of the options specific to this example: - -- ***Running modes*:** By default, this example streams images from the default webcam. Another webcam can be used by - providing a camera index with the `--camera` option. Alternatively, images can be read from a video file ( - using `--video PATH`) or a single image file (using `-image PATH`). Also, a demo image can be automatically downloaded - and used with `--demo-image`. Also, a demo video can be automatically downloaded and used with `--demo-video`. -- ***Limiting frame count*:** When running from a webcam or a video file, this example can be set to stop after a given - number of frames using `--max-frame MAX_FRAME`. 
- -# Overview - -Use the [MediaPipe](https://google.github.io/mediapipe/) Gesture detection and Gesture landmark detection solutions to -track hands and recognize gestures in images and videos. - -Logging Details: - -1. Hand Landmarks as 2D Points: - - - Extracts hand landmark points as normalized 2D coordinates. - - - Utilizes image width and height for conversion into image coordinates. - - - Logs the 2D points to the Rerun SDK. - - -2. Hand Landmarks as 3D Points: - - - Detects hand landmarks using MediaPipe solutions. - - - Converts the detected hand landmarks into 3D coordinates. - - - Logs the 3D points to the Rerun SDK. - - -3. Gesture Detection Results: +# Used Rerun Types +[`Image`](https://www.rerun.io/docs/reference/types/archetypes/image), [`Points2D`](https://www.rerun.io/docs/reference/types/archetypes/points2d), [`Points3D`](https://www.rerun.io/docs/reference/types/archetypes/points3d), [`LineStrips2D`](https://www.rerun.io/docs/reference/types/archetypes/line_strips2d), [`ClassDescription`](https://www.rerun.io/docs/reference/types/datatypes/class_description), [`AnnotationContext`](https://www.rerun.io/docs/reference/types/archetypes/annotation_context), [`TextDocument`](https://www.rerun.io/docs/reference/types/archetypes/text_document) - - Utilizes the Gesture Detection solution from MediaPipe. +# Background +The hand tracking and gesture recognition technology aims to give the ability of the devices to interpret hand movements and gestures as commands or inputs. +At the core of this technology, a pre-trained machine-learning model analyses the visual input and identifies hand landmarks and hand gestures. +The real applications of such technology vary, as hand movements and gestures can be used to control smart devices. +Human-Computer Interaction, Robotics, Gaming, and Augmented Reality are a few of the fields where the potential applications of this technology appear most promising. - - Logs the results of gesture detection as emoji +In this example, the [MediaPipe](https://developers.google.com/mediapipe/) Gesture and Hand Landmark Detection solutions were utilized to detect and track hand landmarks and recognize gestures. +Rerun was employed to visualize the output of the Mediapipe solution over time to make it easy to analyze the behavior. -# Logging Data +# Logging and Visualizing with Rerun +The visualizations in this example were created with the following Rerun code. -## Timelines for Video +## Timelines -You can utilize Rerun timelines' functions to associate data with one or more timelines. As a result, each frame of the -video can be linked with its corresponding timestamp. +For each processed video frame, all data sent to Rerun is associated with the two [`timelines`](https://www.rerun.io/docs/concepts/timelines) `time` and `frame_idx`. ```python -def run_from_video_capture(vid: int | str, max_frame_count: int | None) -> None: - """ - Run the detector on a video stream. - - Parameters - ---------- - vid: - The video stream to run the detector on. Use 0/1 for the default camera or a path to a video file. - max_frame_count: - The maximum number of frames to process. If None, process all frames. 
- """ - cap = cv2.VideoCapture(vid) - fps = cap.get(cv2.CAP_PROP_FPS) - - detector = GestureDetectorLogger(video_mode=True) - - try: - it: Iterable[int] = itertools.count() if max_frame_count is None else range(max_frame_count) - - for frame_idx in tqdm.tqdm(it, desc="Processing frames"): - ret, frame = cap.read() - if not ret: - break - - if np.all(frame == 0): - continue - - frame_time_nano = int(cap.get(cv2.CAP_PROP_POS_MSEC) * 1e6) - if frame_time_nano == 0: - frame_time_nano = int(frame_idx * 1000 / fps * 1e6) - - frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) - - rr.set_time_sequence("frame_nr", frame_idx) - rr.set_time_nanos("frame_time", frame_time_nano) - detector.detect_and_log(frame, frame_time_nano) - rr.log( - "Media/Video", - rr.Image(frame) - ) - - except KeyboardInterrupt: - pass - - cap.release() - cv2.destroyAllWindows() +rr.set_time_sequence("frame_nr", frame_idx) +rr.set_time_nanos("frame_time", frame_time_nano) ``` -## Hand Landmarks as 2D Points - -![gesture_recognition_2d_points](https://github.com/rerun-io/rerun/assets/49308613/7e5dd809-be06-4f62-93a8-4fc03e5dfa0e) - -You can extract hand landmark points as normalized values, utilizing the image's width and height for conversion into -image coordinates. These coordinates are then logged as 2D points to the Rerun SDK. Additionally, you can identify -connections between the landmarks and log them as 2D linestrips. - +## Video +The input video is logged as a sequence of [`Image`](https://www.rerun.io/docs/reference/types/archetypes/image) objects to the `Media/Video` entity. ```python -class GestureDetectorLogger: - - def detect_and_log(self, image: npt.NDArray[np.uint8], frame_time_nano: int | None) -> None: - # Recognize gestures in the image - height, width, _ = image.shape - image = mp.Image(image_format=mp.ImageFormat.SRGB, data=image) - - recognition_result = ( - self.recognizer.recognize_for_video(image, int(frame_time_nano / 1e6)) - if self._video_mode - else self.recognizer.recognize(image) - ) - - # Clear the values - for log_key in ["Media/Points", "Media/Connections"]: - rr.log(log_key, rr.Clear(recursive=True)) - - if recognition_result.hand_landmarks: - hand_landmarks = recognition_result.hand_landmarks - - # Convert normalized coordinates to image coordinates - points = self.convert_landmarks_to_image_coordinates(hand_landmarks, width, height) - - # Log points to the image and Hand Entity - rr.log( - "Media/Points", - rr.Points2D(points, radii=10, colors=[255, 0, 0]) - ) - - # Obtain hand connections from MediaPipe - mp_hands_connections = mp.solutions.hands.HAND_CONNECTIONS - points1 = [points[connection[0]] for connection in mp_hands_connections] - points2 = [points[connection[1]] for connection in mp_hands_connections] - - # Log connections to the image and Hand Entity - rr.log( - "Media/Connections", - rr.LineStrips2D( - np.stack((points1, points2), axis=1), - colors=[255, 165, 0] - ) - ) +rr.log( + "Media/Video", + rr.Image(frame).compress(jpeg_quality=75) +) ``` -## Hand Landmarks as 3D Points +## Hand Landmark Points +Logging the hand landmarks involves specifying connections between the points, extracting pose landmark points and logging them to the Rerun SDK. +The 2D points are visualized over the video and at a separate entity. +Meanwhile, the 3D points allows the creation of a 3D model of the hand for a more comprehensive representation of the hand landmarks. 
-![gesture_recognition_3d_points](https://github.com/rerun-io/rerun/assets/49308613/b24bb0e5-57cc-43f0-948b-3480fe9073a2) +The 2D and 3D points are logged through a combination of two archetypes. +For the 2D points, the Points2D and LineStrips2D archetypes are utilized. These archetypes help visualize the points and connect them with lines, respectively. +As for the 3D points, the logging process involves two steps. First, a timeless [`ClassDescription`](https://www.rerun.io/docs/reference/types/datatypes/class_description) is logged, that contains the information which maps keypoint ids to labels and how to connect +the keypoints. Defining these connections automatically renders lines between them. Mediapipe provides the `HAND_CONNECTIONS` variable which contains the list of `(from, to)` landmark indices that define the connections. +Second, the actual keypoint positions are logged in 3D [`Points3D`](https://www.rerun.io/docs/reference/types/archetypes/points3d) archetype. -You can first define the connections between the points using keypoints from Annotation Context in the init function, -and then log them as 3D points. +### Label Mapping and Keypoint Connections ```python - -class GestureDetectorLogger: - - def __init__(self, video_mode: bool = False): - # … existing code … - rr.log( - "/", - rr.AnnotationContext( - rr.ClassDescription( - info=rr.AnnotationInfo(id=0, label="Hand3D"), - keypoint_connections=mp.solutions.hands.HAND_CONNECTIONS - ) - ), - timeless=True, +rr.log( + "/", + rr.AnnotationContext( + rr.ClassDescription( + info=rr.AnnotationInfo(id=0, label="Hand3D"), + keypoint_connections=mp.solutions.hands.HAND_CONNECTIONS, ) - rr.log("Hand3D", rr.ViewCoordinates.RIGHT_HAND_X_DOWN, timeless=True) - - -def detect_and_log(self, image: npt.NDArray[np.uint8], frame_time_nano: int | None) -> None: - # … existing code … - - if recognition_result.hand_landmarks: - hand_landmarks = recognition_result.hand_landmarks + ), + timeless=True, +) - landmark_positions_3d = self.convert_landmarks_to_3d(hand_landmarks) - if landmark_positions_3d is not None: - rr.log( - "Hand3D/Points", - rr.Points3D(landmark_positions_3d, radii=20, class_ids=0, - keypoint_ids=[i for i in range(len(landmark_positions_3d))]), - ) - - # … existing code … +rr.log("Hand3D", rr.ViewCoordinates.LEFT_HAND_Y_DOWN, timeless=True) ``` -## Gesture Detection Presentation - -![Gesture Detection Presentation](https://github.com/rerun-io/rerun/assets/49308613/32cc44f4-28e5-4ed1-b283-f7351a087535) -One effective method to present these results to the viewer is by utilizing a TextDocument along with emojis for -enhanced visual communication. 
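+
+Before any 2D logging, the normalized landmark coordinates returned by MediaPipe are converted to pixel coordinates using the image width and height. A simplified version of the helper used for this conversion looks as follows:
+
+```python
+def convert_landmarks_to_image_coordinates(hand_landmarks, width: int, height: int):
+    # MediaPipe landmarks are normalized to [0, 1]; scale them by the image size
+    return [(int(lm.x * width), int(lm.y * height)) for hand_landmark in hand_landmarks for lm in hand_landmark]
+```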
+### 2D Points ```python +# Log points to the image and Hand Entity +for log_key in ["Media/Points", "Hand/Points"]: + rr.log( + log_key, + rr.Points2D(points, radii=10, colors=[255, 0, 0]) + ) -# Emojis from https://github.com/googlefonts/noto-emoji/tree/main -GESTURE_URL = "https://raw.githubusercontent.com/googlefonts/noto-emoji/9cde38ef5ee6f090ce23f9035e494cb390a2b051/png/128/" - -# Mapping of gesture categories to corresponding emojis -GESTURE_PICTURES = { - "None": "emoji_u2754.png", - "Closed_Fist": "emoji_u270a.png", - "Open_Palm": "emoji_u270b.png", - "Pointing_Up": "emoji_u261d.png", - "Thumb_Down": "emoji_u1f44e.png", - "Thumb_Up": "emoji_u1f44d.png", - "Victory": "emoji_u270c.png", - "ILoveYou": "emoji_u1f91f.png" -} - - -class GestureDetectorLogger: - - def detect_and_log(self, image: npt.NDArray[np.uint8], frame_time_nano: int | None) -> None: - # Recognize gestures in the image - height, width, _ = image.shape - image = mp.Image(image_format=mp.ImageFormat.SRGB, data=image) - - recognition_result = ( - self.recognizer.recognize_for_video(image, int(frame_time_nano / 1e6)) - if self._video_mode - else self.recognizer.recognize(image) - ) - - for log_key in ["Media/Points", "Hand/Points", "Media/Connections", "Hand/Connections", "Hand3D/Points"]: - rr.log(log_key, rr.Clear(recursive=True)) - - for i, gesture in enumerate(recognition_result.gestures): - # Get the top gesture from the recognition result - gesture_category = gesture[0].category_name if recognition_result.gestures else "None" - self.present_detected_gesture(gesture_category) # Log the detected gesture - - def present_detected_gesture(self, category): - # Get the corresponding ulr of the picture for the detected gesture category - gesture_pic = GESTURE_PICTURES.get( - category, - "emoji_u2754.png" # default - ) - - # Log the detection by using the appropriate image - rr.log( - "Detection", - rr.TextDocument( - f'![Image]({GESTURE_URL + gesture_pic})'.strip(), - media_type=rr.MediaType.MARKDOWN - ) - ) - +# Log connections to the image and Hand Entity [128, 128, 128] +for log_key in ["Media/Connections", "Hand/Connections"]: + rr.log( + log_key, + rr.LineStrips2D(np.stack((points1, points2), axis=1), colors=[255, 165, 0]) + ) ``` -# Gesture Detector Logger +### 3D Points ```python +rr.log( + "Hand3D/Points", + rr.Points3D( + landmark_positions_3d, + radii=20, + class_ids=0, + keypoint_ids=[i for i in range(len(landmark_positions_3d))], + ), +) +``` -class GestureDetectorLogger: - """ - Logger for the MediaPipe Gesture Detection solution. - This class provides logging and utility functions for handling gesture recognition. 
- - For more information on MediaPipe Gesture Detection: - https://developers.google.com/mediapipe/solutions/vision/gesture_recognizer - """ - - # URL to the pre-trained MediaPipe Gesture Detection model - MODEL_DIR: Final = EXAMPLE_DIR / "model" - MODEL_PATH: Final = (MODEL_DIR / "gesture_recognizer.task").resolve() - MODEL_URL: Final = ( - "https://storage.googleapis.com/mediapipe-models/gesture_recognizer/gesture_recognizer/float16/latest/gesture_recognizer.task" - ) - - def __init__(self, video_mode: bool = False): - self._video_mode = video_mode - - if not self.MODEL_PATH.exists(): - download_file(self.MODEL_URL, self.MODEL_PATH) - - base_options = python.BaseOptions( - model_asset_path=str(self.MODEL_PATH) - ) - options = vision.GestureRecognizerOptions( - base_options=base_options, - running_mode=mp.tasks.vision.RunningMode.VIDEO if self._video_mode else mp.tasks.vision.RunningMode.IMAGE - ) - self.recognizer = vision.GestureRecognizer.create_from_options(options) - - rr.log( - "/", - rr.AnnotationContext( - rr.ClassDescription( - info=rr.AnnotationInfo(id=0, label="Hand3D"), - keypoint_connections=mp.solutions.hands.HAND_CONNECTIONS - ) - ), - timeless=True, - ) - # rr.log("Hand3D", rr.ViewCoordinates.RIGHT_HAND_Y_DOWN, timeless=True) - rr.log("Hand3D", rr.ViewCoordinates.LEFT_HAND_Y_DOWN, timeless=True) - - @staticmethod - def convert_landmarks_to_image_coordinates(hand_landmarks, width, height): - return [(int(lm.x * width), int(lm.y * height)) for hand_landmark in hand_landmarks for lm in hand_landmark] - - @staticmethod - def convert_landmarks_to_3d(hand_landmarks): - return [(lm.x, lm.y, lm.y) for hand_landmark in hand_landmarks for lm in hand_landmark] - - def detect_and_log(self, image: npt.NDArray[np.uint8], frame_time_nano: int | None) -> None: - # Recognize gestures in the image - height, width, _ = image.shape - image = mp.Image(image_format=mp.ImageFormat.SRGB, data=image) - - recognition_result = ( - self.recognizer.recognize_for_video(image, int(frame_time_nano / 1e6)) - if self._video_mode - else self.recognizer.recognize(image) - ) - - for log_key in ["Media/Points", "Hand/Points", "Media/Connections", "Hand/Connections", "Hand3D/Points"]: - rr.log(log_key, rr.Clear(recursive=True)) - - for i, gesture in enumerate(recognition_result.gestures): - # Get the top gesture from the recognition result - gesture_category = gesture[0].category_name if recognition_result.gestures else "None" - self.present_detected_gesture(gesture_category) # Log the detected gesture - - if recognition_result.hand_landmarks: - hand_landmarks = recognition_result.hand_landmarks - - landmark_positions_3d = self.convert_landmarks_to_3d(hand_landmarks) - if landmark_positions_3d is not None: - rr.log( - "Hand3D/Points", - rr.Points3D(landmark_positions_3d, radii=20, class_ids=0, - keypoint_ids=[i for i in range(len(landmark_positions_3d))]), - ) - - # Convert normalized coordinates to image coordinates - points = self.convert_landmarks_to_image_coordinates(hand_landmarks, width, height) - - # Log points to the image and Hand Entity - for log_key in ["Media/Points", "Hand/Points"]: - rr.log( - log_key, - rr.Points2D(points, radii=10, colors=[255, 0, 0]) - ) - - # Obtain hand connections from MediaPipe - mp_hands_connections = mp.solutions.hands.HAND_CONNECTIONS - points1 = [points[connection[0]] for connection in mp_hands_connections] - points2 = [points[connection[1]] for connection in mp_hands_connections] +## Detection - # Log connections to the image and Hand Entity [128, 128, 128] - for 
log_key in ["Media/Connections", "Hand/Connections"]: - rr.log( - log_key, - rr.LineStrips2D( - np.stack((points1, points2), axis=1), - colors=[255, 165, 0] - ) - ) +To showcase gesture recognition, an image of the corresponding gesture emoji is displayed within a `TextDocument` under the `Detection` entity. - def present_detected_gesture(self, category): - # Get the corresponding ulr of the picture for the detected gesture category - gesture_pic = GESTURE_PICTURES.get( - category, - "emoji_u2754.png" # default - ) - - # Log the detection by using the appropriate image - rr.log( - "Detection", - rr.TextDocument( - f'![Image]({GESTURE_URL + gesture_pic})'.strip(), - media_type=rr.MediaType.MARKDOWN - ) - ) +```python +# Log the detection by using the appropriate image +rr.log( + "Detection", + rr.TextDocument(f"![Image]({GESTURE_URL + gesture_pic})".strip(), media_type=rr.MediaType.MARKDOWN), +) +``` +# Run the Code +To run this example, make sure you have the Rerun repository checked out and the latest SDK installed: +```bash +# Setup +pip install --upgrade rerun-sdk # install the latest Rerun SDK +git clone git@github.com:rerun-io/rerun.git # Clone the repository +cd rerun +git checkout latest # Check out the commit matching the latest SDK release +``` +Install the necessary libraries specified in the requirements file: +```bash +pip install -r examples/python/gesture_detection/requirements.txt +``` +To experiment with the provided example, simply execute the main Python script: +```bash +python examples/python/gesture_detection/main.py # run the example +``` +If you wish to customize it for various videos, adjust the maximum frames, explore additional features, or save it use the CLI with the `--help` option for guidance: +```bash +$ python examples/python/gesture_detection/main.py --help ``` diff --git a/examples/python/human_pose_tracking/README.md b/examples/python/human_pose_tracking/README.md index adec70e1b05c..f0730103153f 100644 --- a/examples/python/human_pose_tracking/README.md +++ b/examples/python/human_pose_tracking/README.md @@ -19,11 +19,16 @@ Use the [MediaPipe Pose Landmark Detection](https://developers.google.com/mediap -## Used Rerun Types +# Used Rerun Types [`Image`](https://www.rerun.io/docs/reference/types/archetypes/image), [`Points2D`](https://www.rerun.io/docs/reference/types/archetypes/points2d), [`Points3D`](https://www.rerun.io/docs/reference/types/archetypes/points3d), [`ClassDescription`](https://www.rerun.io/docs/reference/types/datatypes/class_description), [`AnnotationContext`](https://www.rerun.io/docs/reference/types/archetypes/annotation_context), [`SegmentationImage`](https://www.rerun.io/docs/reference/types/archetypes/segmentation_image) -## Background -The [MediaPipe Pose Landmark Detection](https://developers.google.com/mediapipe/solutions/vision/pose_landmarker) solution detects and tracks human pose landmarks and produces segmentation masks for humans. The solution targets real-time inference on video streams. In this example we use Rerun to visualize the output of the Mediapipe solution over time to make it easy to analyze the behavior. +# Background +Human pose tracking is a task in computer vision that focuses on identifying key body locations, analyzing posture, and categorizing movements. +At the heart of this technology is a pre-trained machine-learning model to assess the visual input and recognize landmarks on the body in both image coordinates and 3D world coordinates. 
+The use cases and applications of this technology include, but are not limited to, Human-Computer Interaction, Sports Analysis, Gaming, Virtual Reality, Augmented Reality, Health, etc.
+
+In this example, the [MediaPipe Pose Landmark Detection](https://developers.google.com/mediapipe/solutions/vision/pose_landmarker) solution was utilized to detect and track human pose landmarks and produce segmentation masks for humans.
+Rerun was employed to visualize the output of the Mediapipe solution over time to make it easy to analyze the behavior.

# Logging and Visualizing with Rerun

diff --git a/examples/python/lidar/README.md b/examples/python/lidar/README.md
index 15feac668d08..b8c572aeb3bc 100644
--- a/examples/python/lidar/README.md
+++ b/examples/python/lidar/README.md
@@ -15,9 +15,49 @@ thumbnail_dimensions = [480, 480]


-This example visualizes only the lidar data from the [nuScenes dataset](https://www.nuscenes.org/) using Rerun. For a moe extensive example including other sensors and annotations check out the [nuScenes example](https://www.rerun.io/examples/real-data/nuscenes).
+Visualize the LiDAR data from the [nuScenes dataset](https://www.nuscenes.org/).
+
+# Used Rerun Types
+[`Points3D`](https://www.rerun.io/docs/reference/types/archetypes/points3d)
+
+# Background
+This example demonstrates the ability to read and visualize LiDAR data from the nuScenes dataset, which is a public large-scale dataset specifically designed for autonomous driving.
+The scenes in this dataset encompass data collected from a comprehensive suite of sensors on autonomous vehicles, including 6 cameras, 1 LiDAR, 5 RADARs, and GPS and IMU sensors.
+
+
+It's important to note that in this example, only the LiDAR data is visualized. For a more extensive example including other sensors and annotations, check out the [nuScenes example](https://www.rerun.io/examples/real-data/nuscenes).
+
+# Logging and Visualizing with Rerun
+
+The visualization in this example was created with just the following lines.
+
+
+```python
+rr.set_time_seconds("timestamp", sample_data["timestamp"] * 1e-6)  # Setting the time
+rr.log("world/lidar", rr.Points3D(points, colors=point_colors))  # Log the 3D data
+```
+
+When logging data to Rerun, it's possible to associate it with a specific time by using Rerun's [`timelines`](https://www.rerun.io/docs/concepts/timelines).
+In the code above, we first set the time on the `timestamp` timeline and then log the 3D points.
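+
+For context, the `points` and `point_colors` passed to `rr.Points3D` can be derived from a single nuScenes LiDAR sweep roughly as sketched below. This is a simplified illustration assuming the nuScenes devkit is installed; the exact loading and coloring logic in the example script may differ, and `lidar_file_path` is a placeholder for the path of a `.pcd.bin` sweep file.
+
+```python
+import numpy as np
+from nuscenes.utils.data_classes import LidarPointCloud
+
+# Load one LiDAR sweep; `.points` is a (4, N) array of x, y, z, intensity
+pointcloud = LidarPointCloud.from_file(lidar_file_path)  # `lidar_file_path` is a placeholder
+points = pointcloud.points[:3].T  # (N, 3) positions in the sensor frame
+
+# One possible coloring: map the distance from the sensor onto a simple red-to-blue ramp
+distances = np.linalg.norm(points, axis=1)
+t = (distances - distances.min()) / max(distances.max() - distances.min(), 1e-6)
+point_colors = np.stack([1.0 - t, np.zeros_like(t), t], axis=1)
+```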
+ +# Run the Code +To run this example, make sure you have the Rerun repository checked out and the latest SDK installed: +```bash +# Setup +pip install --upgrade rerun-sdk # install the latest Rerun SDK +git clone git@github.com:rerun-io/rerun.git # Clone the repository +cd rerun +git checkout latest # Check out the commit matching the latest SDK release +``` +Install the necessary libraries specified in the requirements file: ```bash pip install -r examples/python/lidar/requirements.txt -python examples/python/lidar/main.py +``` +To experiment with the provided example, simply execute the main Python script: +```bash +python examples/python/lidar/main.py # run the example +``` +If you wish to customize it, explore additional features, or save it use the CLI with the `--help` option for guidance: +```bash +python examples/python/lidar/main.py --help ``` diff --git a/examples/python/live_camera_edge_detection/README.md b/examples/python/live_camera_edge_detection/README.md index 3b404e072867..f480e14c2116 100644 --- a/examples/python/live_camera_edge_detection/README.md +++ b/examples/python/live_camera_edge_detection/README.md @@ -1,6 +1,7 @@ @@ -14,11 +15,65 @@ thumbnail_dimensions = [480, 480] Live Camera Edge Detection example screenshot -Very simple example of capturing from a live camera. +Visualize the [OpenCV Canny Edge Detection](https://docs.opencv.org/4.x/da/d22/tutorial_py_canny.html) results from a live camera stream. -Runs the opencv canny edge detector on the image stream. +# Used Rerun Types +[`Image`](https://www.rerun.io/docs/reference/types/archetypes/image) -Usage: +# Background +In this example, the results of the [OpenCV Canny Edge Detection](https://docs.opencv.org/4.x/da/d22/tutorial_py_canny.html) algorithm are visualized. +Canny Edge Detection is a popular edge detection algorithm, and can efficiently extract important structural information from visual objects while notably reducing the computational load. +The process in this example involves converting the input image to RGB, then to grayscale, and finally applying the Canny Edge Detector for precise edge detection. + +# Logging and Visualizing with Rerun + +The visualization in this example were created with the following Rerun code: +## RGB Image + +The original image is read and logged in RGB format under the entity "image/rgb". +```python +# Log the original image +rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) +rr.log("image/rgb", rr.Image(rgb)) +``` + +## Grayscale Image + +The input image is converted from BGR color space to grayscale, and the resulting grayscale image is logged under the entity "image/gray". +```python +# Convert to grayscale +gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) +rr.log("image/gray", rr.Image(gray)) +``` + +## Canny Edge Detection Image + +The Canny edge detector is applied to the grayscale image, and the resulting edge-detected image is logged under the entity "image/canny". 
+```python +# Run the canny edge detector +canny = cv2.Canny(gray, 50, 200) +rr.log("image/canny", rr.Image(canny)) +``` + + +# Run the Code +To run this example, make sure you have the Rerun repository checked out and the latest SDK installed: +```bash +# Setup +pip install --upgrade rerun-sdk # install the latest Rerun SDK +git clone git@github.com:rerun-io/rerun.git # Clone the repository +cd rerun +git checkout latest # Check out the commit matching the latest SDK release +``` +Install the necessary libraries specified in the requirements file: +```bash +pip install -r examples/python/live_camera_edge_detection/requirements.txt +``` +To experiment with the provided example, simply execute the main Python script: +```bash +python examples/python/live_camera_edge_detection/main.py # run the example ``` -python examples/python/live_camera_edge_detection/main.py +If you wish to customize it, explore additional features, or save it use the CLI with the `--help` option for guidance: +```bash +python examples/python/live_camera_edge_detection/main.py --help ``` diff --git a/examples/python/live_depth_sensor/README.md b/examples/python/live_depth_sensor/README.md index 2762fc8ebc92..e085beb0635d 100644 --- a/examples/python/live_depth_sensor/README.md +++ b/examples/python/live_depth_sensor/README.md @@ -1,6 +1,7 @@ @@ -14,10 +15,106 @@ thumbnail_dimensions = [480, 360] Live Depth Sensor example screenshot +Visualize the live-streaming frames from an Intel RealSense depth sensor. -A minimal example of streaming frames live from an Intel RealSense depth sensor. +This example requires a connected realsense depth sensor. -Usage: +# Used Rerun Types +[`Pinhole`](https://www.rerun.io/docs/reference/types/archetypes/pinhole), [`Transform3D`](https://www.rerun.io/docs/reference/types/archetypes/transform3d), [`Image`](https://www.rerun.io/docs/reference/types/archetypes/image), [`DepthImage`](https://www.rerun.io/docs/reference/types/archetypes/depth_image) + +# Background +The Intel RealSense depth sensor can stream live depth and color data. To visualize this data output, we utilized Rerun. + +# Logging and Visualizing with Rerun + +The RealSense sensor captures data in both RGB and depth formats, which are logged using the [`Image`](https://www.rerun.io/docs/reference/types/archetypes/image) and [`DepthImage`](https://www.rerun.io/docs/reference/types/archetypes/depth_image) archetypes, respectively. +Additionally, to provide a 3D view, the visualization includes a pinhole camera using the [`Pinhole`](https://www.rerun.io/docs/reference/types/archetypes/pinhole) and [`Transform3D`](https://www.rerun.io/docs/reference/types/archetypes/transform3d) archetypes. + +The visualization in this example were created with the following Rerun code. + +```python +rr.log("realsense", rr.ViewCoordinates.RDF, timeless=True) # Visualize the data as RDF +``` + + + +## Image + +First, the pinhole camera is set using the [`Pinhole`](https://www.rerun.io/docs/reference/types/archetypes/pinhole) and [`Transform3D`](https://www.rerun.io/docs/reference/types/archetypes/transform3d) archetypes. Then, the images captured by the RealSense sensor are logged as an [`Image`](https://www.rerun.io/docs/reference/types/archetypes/image) object, and they're associated with the time they were taken. 
+ + + +```python +rgb_from_depth = depth_profile.get_extrinsics_to(rgb_profile) + rr.log( + "realsense/rgb", + rr.Transform3D( + translation=rgb_from_depth.translation, + mat3x3=np.reshape(rgb_from_depth.rotation, (3, 3)), + from_parent=True, + ), + timeless=True, +) +``` + +```python +rr.log( + "realsense/rgb/image", + rr.Pinhole( + resolution=[rgb_intr.width, rgb_intr.height], + focal_length=[rgb_intr.fx, rgb_intr.fy], + principal_point=[rgb_intr.ppx, rgb_intr.ppy], + ), + timeless=True, +) +``` +```python +rr.set_time_sequence("frame_nr", frame_nr) +rr.log("realsense/rgb/image", rr.Image(color_image)) +``` + +## Depth Image + +Just like the RGB images, the RealSense sensor also captures depth data. The depth images are logged as [`DepthImage`](https://www.rerun.io/docs/reference/types/archetypes/depth_image) objects and are linked with the time they were captured. + +```python +rr.log( + "realsense/depth/image", + rr.Pinhole( + resolution=[depth_intr.width, depth_intr.height], + focal_length=[depth_intr.fx, depth_intr.fy], + principal_point=[depth_intr.ppx, depth_intr.ppy], + ), + timeless=True, +) +``` +```python +rr.set_time_sequence("frame_nr", frame_nr) +rr.log("realsense/depth/image", rr.DepthImage(depth_image, meter=1.0 / depth_units)) +``` + + + + + +# Run the Code +To run this example, make sure you have the Rerun repository checked out and the latest SDK installed: +```bash +# Setup +pip install --upgrade rerun-sdk # install the latest Rerun SDK +git clone git@github.com:rerun-io/rerun.git # Clone the repository +cd rerun +git checkout latest # Check out the commit matching the latest SDK release +``` +Install the necessary libraries specified in the requirements file: +```bash +pip install -r examples/python/live_depth_sensor/requirements.txt +``` +To experiment with the provided example, simply execute the main Python script: +```bash +python examples/python/live_depth_sensor/main.py # run the example ``` -examples/python/live_depth_sensor/main.py +If you wish to customize it, explore additional features, or save it use the CLI with the `--help` option for guidance: +```bash +python examples/python/live_depth_sensor/main.py --help ```
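+
+As background for the logging calls above, the color and depth frames typically come from a `pyrealsense2` capture loop along the following lines. This is a simplified sketch; the stream configuration and intrinsics handling in the example script may differ.
+
+```python
+import numpy as np
+import pyrealsense2 as rs
+
+pipe = rs.pipeline()
+config = rs.config()
+config.enable_stream(rs.stream.depth)
+config.enable_stream(rs.stream.color)
+profile = pipe.start(config)
+
+# Scale factor of the depth stream: meters per depth unit
+depth_units = profile.get_device().first_depth_sensor().get_depth_scale()
+
+for frame_nr in range(100):
+    frames = pipe.wait_for_frames()
+    depth_image = np.asanyarray(frames.get_depth_frame().get_data())
+    color_image = np.asanyarray(frames.get_color_frame().get_data())
+    # ... log with rr.Image / rr.DepthImage as shown above ...
+```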