Using 3-D LiDAR Data for Safe Physical Human-Robot Interaction

Sarthak Arora¹, Karthik Subramanian², Odysseus Adamides³, and Ferat Sahin⁴

This material is based upon work supported by the National Science Foundation under Award No. DGE-2125362. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

¹ Sarthak Arora is with the Department of Electrical Engineering and Micro-electronic, Rochester Institute of Technology, 1 Lomb Memorial Drive, Rochester, NY 14623, USA [email protected]
² Karthik Subramanian is with the Department of Electrical and Micro-electronic Engineering, Rochester Institute of Technology, 1 Lomb Memorial Drive, Rochester, NY 14623, USA kxs8997rit.edu
³ Odysseus Adamides is with the Department of Electrical and Micro-electronic Engineering, Rochester Institute of Technology, 1 Lomb Memorial Drive, Rochester, NY 14623, USA [email protected]
⁴ Ferat Sahin is with Faculty of Electrical and Microelectronic Engineering, Rochester Institute of Technology, 1 Lomb Memorial Drive, Rochester, NY 14623, USA [email protected]

Abstract

This paper explores the use of 3D lidar in a physical Human-Robot Interaction (pHRI) scenario. Experiments were conducted to mimic a modern shop-floor environment, and data was collected from a pool of seventeen participants performing pre-determined tasks in a workspace shared with the robot. To demonstrate an end-to-end case, a perception pipeline was developed that leverages reflectivity, signal, near-infrared, and point-cloud data from a 3D lidar. This data is then used to perform safety-based control while satisfying the speed and separation monitoring (SSM) criteria. To support the perception pipeline, a state-of-the-art object detection network was fine-tuned by transfer learning. An analysis is provided along with results of the perception pipeline and the safety-based controller. Additionally, the system is compared with previous work.

I Introduction

Industry 4.0 has significantly increased the integration of point-rich perception sensors into industries including manufacturing, supply chain, warehousing, medical fields, and construction [36]. The integration of these sensors has expanded the automation capabilities of these fields. A key sensor technology integrated across these fields has been lidar. These sensors provide two-dimensional and three-dimensional information about the environment around them and have been used to detect objects, obstacles, and humans during the processes and tasks taking place in the surrounding workspace [19]. With the data provided by lidars, industry has been able to implement more complex algorithms and autonomous approaches. This includes the rise of autonomous vehicles, autonomous space vehicle landings, automated guided vehicles (AGVs), unmanned aerial vehicles (UAVs), and collaborative robotics applications [11, 34, 20, 25, 5]. Throughout the past decade, both the algorithms and the sensors have continued to see significant innovations.

Figure 1: An image showing different stages of the experiment setup used in this work. In image “A”, the layout of the robot workspace is shown along with the exteroceptive sensors used in the setup (encircled in red and white). In image “B”, the test subject is wearing a motion capture body suit for acquiring minimum distance associated with the human and robot. In image “C”, the participant is wearing a reflective vest and reflective hardhat.

Time-of-Flight (ToF) cameras have begun to increase in depth resolution, it has become easier to calculate depth for stereoscopic cameras, and millimeter wave radar has begun to be used for human tracking applications [28, 1, 14, 30]. Though perception options have diversified, lidar remains a commonly used sensing method across industrial and research applications [23, 24, 3, 10]. In the past few years, new lidar product lines have been released that bring new innovations to the perception platform. One such product line was released by the lidar manufacturer Ouster. The “OS” series of lidars includes the OS0, OS1, OS2, and OSDome. Along with various viewing angles and data channels, this product line generates four 2D image data modalities formed from the traditional 3D point cloud the lidar generates [27]. The four frames are the Range, Signal, Near-IR (near-infrared), and Reflectivity frames. The Range frame provides a per-pixel ToF distance calculation from the sensor origin to the pixel in the Range frame. The Signal frame provides the light return strength per pixel in the frame. The Near-IR frame provides the light return to the sensor per pixel that was not generated by the laser emitter local to the lidar. This frame measurement is similar to a monochrome IR return from a traditional image sensor. Lastly, the Reflectivity frame provides the reflectivity strength per pixel. This frame provides key data on the reflectivity of materials and surfaces in the environment. The significance of Ouster including these frames in addition to the traditional point cloud is that it allows existing 2D machine learning algorithms to be directly applied to the 3D lidar data [27]. Hence, by leveraging the data provided by these channels, we aim to make the following contributions:

  1. Develop a lidar based dataset with multiple participants with varying clothing and body shapes in realistic shop-floor conditions.

  2. Demonstrate a successful use of the data collected by training a state-of-the-art object detector with validation and testing.

  3. Apply the perception pipeline to develop a simple speed and separation monitoring safety controller based on prior work in [9].

II Literature Survey

2D frames of 3D lidar point clouds have been used in a number of research fields. This includes [37], where the reflectivity image was used to correct drone odometry. Additionally, [17] fuses the multiple modalities to increase 3D object detection performance. These alternate frame data formats have also been used in the automotive field to test segmentation of humans, vehicles, and other traffic objects without the use of a traditional CMOS image sensor [29]. Across these applications, there are plenty of previous works that illustrate a sufficient approach for feature extraction and data formatting to feed image based classifiers and algorithms. With the dawn of Industry 5.0, it is imperative for lidar to maintain compatibility with 2D machine vision and machine learning algorithms such that lidars match the performance of other perception systems used across industry [2]. Whereas Industry 4.0 set up the infrastructure of digitally driven and automated processes, Industry 5.0 pushes researchers to look deeper at these processes and their impact on the human individuals who must coexist with this infrastructure. A key research area that will continue to be a focus in Industry 5.0 is Human Robot Collaboration (HRC). In this field, the pose of the worker, the distance from worker to robot, and the trajectories of the human and robot in the workspace are vital to increasing the safety and comfort of the worker [12, 32, 13]. Speed and separation monitoring (SSM) is one of the four major collaborative approaches identified in the International Organization for Standardization (ISO) standard ISO/TS 15066:2016 [9]. In the field of SSM research, a number of different sensor configurations and modalities are considered, including ToF cameras, stereo cameras, mmWave radars, ultrasonic sensors, and lidars [13]. Lidar was the primary sensor used in the early years of SSM research [18].
As innovations in computation and perception have progressed, the other perception modalities have seen a rise in use in the field. To track the human in the scene, it is crucial for the image based perception systems used in an SSM setup to feed data to convolutional neural networks (CNNs) [22]. This localization of the human in the frame enables the computation of the minimum distance data needed for an SSM algorithm. With the Ouster OS-0-32, the lidar data can also be used to directly feed CNN based algorithms for human position and pose tracking.

In this paper, frame based lidar data is directly used to train a YOLOv9 [33] model, in contrast to traditional methods which require raw 3D point cloud processing and mapping prior to the input into a neural network. Additionally, the data captured in this work covers diverse body shapes and clothing materials in an industrial environment. Furthermore, the data and model are applied to a simple, generalized SSM algorithm which outputs a safety distance and an operational velocity scaling factor. Lastly, the paper explores the viability of lidars with a vertical and horizontal field of view (FoV) for safety based applications. The dataset and trained model will be provided for other researchers to conduct further studies.

III Methodology

This section covers the various components involved in the experimental process. The goal of the setup was to explore the usage scenarios for 3D lidars such as the Ouster OS-0-32 in an industrial shop-floor environment. This environment comprised mostly static objects (workbenches etc.) with a limited number of dynamic objects (humans and robots) within the lidar FoV.

Figure 2: Control schema showing the complete system. Communication is handled by the Robot Operating System (ROS).

III-A Setup & Calibration

In this step, the focus was on achieving time synchronization between the heterogeneous data streams emanating from the lidar sensor, the motion capture system, and the robot control box. Synchronization relied on a local high speed Ethernet based network that exhibited an average latency of approximately 0.25 milliseconds (round-trip time). Therefore, it was assumed that the delta between the time of arrival of data packets and their time of origin was negligible. Finally, an asynchronous time synchronization on the various streams was performed. The inter-stream time delta of the synchronization was 5 milliseconds. In the calibration procedure, the main goal was to obtain the rigid body transformations of all the sensing data in a common reference frame. Customized rigid body marker-sets were developed for the motion capture system. These markers were affixed to the lidar, the pedestal of the robot, and the skeleton tracking body suit worn by a human participant. This step provided a coarse calibration; however, for a better estimate of the lidar extrinsics, an optimization based point-set alignment was used as shown in [31].
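The association of samples across streams can be sketched as a nearest-timestamp match. This is a minimal illustration under stated assumptions rather than the implementation used in this work: the function name `match_nearest`, the interpretation of the 5 millisecond inter-stream delta as a matching tolerance, and the sample timestamps are all hypothetical.

```python
from bisect import bisect_left

def match_nearest(ref_stamps, other_stamps, tolerance_s=0.005):
    """For each reference timestamp, return the index of the nearest
    timestamp in `other_stamps`, or None if the gap exceeds the tolerance.
    Both lists must be sorted in ascending order (seconds)."""
    matches = []
    for t in ref_stamps:
        i = bisect_left(other_stamps, t)
        # Candidates: the neighbour just before and just after t.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(other_stamps)]
        if not candidates:
            matches.append(None)
            continue
        best = min(candidates, key=lambda j: abs(other_stamps[j] - t))
        within = abs(other_stamps[best] - t) <= tolerance_s
        matches.append(best if within else None)
    return matches

# Hypothetical timestamps: lidar at 20 Hz, motion capture at 120 Hz.
lidar_t = [k / 20.0 for k in range(5)]
mocap_t = [k / 120.0 for k in range(30)]
pairs = match_nearest(lidar_t, mocap_t)
```

Using the slowest stream (here the lidar) as the reference keeps one matched sample per lidar frame; reference samples without a partner inside the tolerance are marked `None` so they can be dropped downstream.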

III-B Data Collection and Labeling

Once a synchronized and calibrated setup was achieved, the incoming data was recorded over the network to local disk storage. The following data fields were the focus:

  • Lidar Data at 20 Hz: Point-cloud, Reflectivity, Signal, Near-Infrared and Stacked images

  • Motion Capture Data at 120 Hz: Rigid-bodies and 3D marker locations

  • Robot State Data at 125 Hz: Joint positions and velocities

Figure 3: Clean samples of the reflectivity, signal, near-IR, and a depth-wise stacked image of the first three. The annotation is overlaid on the grayscale images in black and on the colored images in green.

After the data was collected, it was processed for labeling and downstream tasks such as low-level image processing and classification. The bulk of the process comprised pre-processing steps applied to the lidar point-cloud and the image quadruplet obtained from the lidar data. First, the lidar point-cloud images were “destaggered” as described in the documentation provided by the manufacturer [21]. The main idea behind this step was to remove the time offset from each element of the lidar data (point-cloud and images). Afterwards, the image quadruplet was subjected to bit depth down-sampling from 16-bit to 8-bit image data. As the image resolution was $1024\times 32$, the images were resized to $1024\times 256$ by applying bi-linear interpolation. The images were then subjected to auto-exposure adjustment and histogram equalization as part of pre-processing. The images were then annotated in a semi-automated fashion with bounding boxes in MSCOCO format [16]. For semi-automation, the static nature of the environment was exploited to remove a large number of points by background removal, followed by statistical outlier rejection on the remaining points. Afterwards, noisy bounding-box labels were generated by re-projecting the non-stationary 3D points into the images. Ultimately, the bounding boxes were hand tuned.
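The image pre-processing chain above (bit depth reduction, resizing, histogram equalization) can be sketched with NumPy alone. This is an illustrative sketch, not the code used in this work: the min/max scaling used for the 16-bit to 8-bit conversion, the corner-aligned interpolation grid, the function names, and the random test image are assumptions, and the auto-exposure step is omitted.

```python
import numpy as np

def to_uint8(img16):
    """Scale a 16-bit image to 8 bits using its own min/max range (assumed)."""
    img = img16.astype(np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:                      # flat image: avoid division by zero
        return np.zeros(img.shape, dtype=np.uint8)
    return np.round((img - lo) / (hi - lo) * 255.0).astype(np.uint8)

def resize_rows_bilinear(img, new_rows):
    """Resize along the row axis only. The width is unchanged, so bilinear
    interpolation reduces to linear interpolation between adjacent rows."""
    rows = img.shape[0]
    pos = np.linspace(0.0, rows - 1, new_rows)   # sample positions in source rows
    r0 = np.floor(pos).astype(int)
    r1 = np.minimum(r0 + 1, rows - 1)
    w = (pos - r0)[:, None]
    out = (1.0 - w) * img[r0].astype(np.float64) + w * img[r1].astype(np.float64)
    return np.round(out).astype(np.uint8)

def equalize_hist(img8):
    """Global histogram equalization of an 8-bit image."""
    hist = np.bincount(img8.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    denom = cdf[-1] - cdf_min
    if denom == 0:                    # single gray level: nothing to equalize
        return img8.copy()
    lut = np.round((cdf - cdf_min) / denom * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[img8]

# Hypothetical 16-bit lidar image with 32 rows (beams) and 1024 columns.
rng = np.random.default_rng(0)
raw = rng.integers(0, 2**16, size=(32, 1024), dtype=np.uint16)
processed = equalize_hist(resize_rows_bilinear(to_uint8(raw), 256))
```

Only the vertical axis is interpolated because the resize from $1024\times 32$ to $1024\times 256$ leaves the width untouched.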

III-C Network Training and Inference

The YOLOv9 [33] object detection network was selected to annotate bounding boxes around the human body shape in the lidar data (applied to the image quadruplet only). For this step, two datasets were developed from the data collected during multiple experiments. The two variants comprised single-channel annotated reflectivity images and multi-channel annotated images in which reflectivity, signal, and near-infrared images were stacked depth-wise to form a tensor. It must be noted that the images in the datasets represented only a subset of the total data recorded during the experiments. A larger, full version of the dataset will be made available to the research community. Transfer learning was performed on a pre-trained variant of YOLOv9 called “YOLOv9-C”, which has fewer parameters than the largest YOLOv9 variant, “YOLOv9-E”. The network was selected due to its state-of-the-art performance and efficiency as shown in [33]. Both datasets consisted of 14,000 annotated images, and the dataset split was selected as 80% training and 20% validation. For testing, new single-channel and multi-channel datasets (comprising data unseen by the network) were prepared. As safety is one of the key challenges in pHRI [32], it is important to analyze every image, labeled or unlabeled, processed by the network at inference time to determine its suitability in high-stakes scenarios. Therefore, the test set was representative of one full trial performed by a human subject during the experiment, and was evaluated using the fine-tuned network. For training, the stochastic gradient descent (SGD) optimizer with a momentum of 0.937 was selected. The batch size and number of epochs were chosen to be 16 and 50, respectively.
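The depth-wise stacking and the 80/20 split can be illustrated as follows. This is a sketch under assumptions: the text does not state whether the split was random or grouped by trial, so the shuffled split, the helper names `stack_channels` and `split_indices`, the seed, and the blank placeholder images are hypothetical. The `train_cfg` dictionary only collects the hyperparameters quoted above and is not a YOLOv9 configuration file.

```python
import random
import numpy as np

def stack_channels(reflectivity, signal, near_ir):
    """Stack three single-channel 8-bit images depth-wise into H x W x 3."""
    return np.stack([reflectivity, signal, near_ir], axis=-1)

def split_indices(n_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and split them into training and validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = int(round(n_samples * train_fraction))
    return idx[:n_train], idx[n_train:]

# Hyperparameters quoted from the text, collected in one place.
train_cfg = {"optimizer": "SGD", "momentum": 0.937, "batch_size": 16, "epochs": 50}

# Placeholder images at the resized resolution (256 rows, 1024 columns).
h, w = 256, 1024
dummy = [np.zeros((h, w), dtype=np.uint8) for _ in range(3)]
tensor = stack_channels(*dummy)
train_idx, val_idx = split_indices(14000)
```

For consecutive lidar frames, a per-frame random split can place near-identical images in both sets; splitting by trial or participant is a common alternative when that is a concern.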

III-D Human point cloud extraction

In this phase of the pipeline, the annotations provided by the aforementioned network at inference time were used. The spatial structure of the lidar frame (comprising the point-cloud and the image quadruplet) allowed for a bi-directional mapping between the images and the point-cloud, so the bounding-box rectangles were projected into the corresponding point-cloud and points external to the region of interest were pruned. This reduced the total number of points from $1024\times 32$ to approximately $20\times 50$ (based on the largest possible size of the bounding box). Then, plane-segmentation and DBSCAN [6] clustering were used to extract the points associated with the human body shape.
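Because the point-cloud is organized with one point per lidar pixel, the bounding-box pruning amounts to an index slice. The following is a minimal sketch, assuming the cloud is stored as a `(rows, cols, 3)` array, the box is given in pixels of the resized $1024\times 256$ image, and pixels without a return are encoded as all zeros; the function name and the example box are hypothetical, and the plane-segmentation and DBSCAN steps are not shown.

```python
import numpy as np

def crop_points(cloud, box, image_rows=256):
    """Return the points of an organized cloud that fall inside an image box.

    cloud: array of shape (rows, cols, 3), one XYZ point per lidar pixel.
    box:   (x_min, y_min, x_max, y_max) in pixels of the resized image.
    """
    rows, cols, _ = cloud.shape
    scale = image_rows / rows                 # e.g. 256 / 32 = 8
    x_min, y_min, x_max, y_max = box
    r0 = max(int(np.floor(y_min / scale)), 0)
    r1 = min(int(np.ceil(y_max / scale)), rows)
    c0 = max(int(np.floor(x_min)), 0)
    c1 = min(int(np.ceil(x_max)), cols)
    roi = cloud[r0:r1, c0:c1].reshape(-1, 3)
    # Drop pixels with no return (assumed here to be encoded as all zeros).
    return roi[np.any(roi != 0.0, axis=1)]

# Hypothetical cloud and box: 160 image rows map to 20 beams, 50 columns wide.
cloud = np.ones((32, 1024, 3))
human_roi = crop_points(cloud, box=(100, 0, 150, 160))
```

The example box yields $20\times 50$ points, which matches the order of magnitude quoted above for the largest bounding box.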

Figure 4: An image showing human shape geometry extraction using a point-cloud along with a bounding box determined from a reflectivity image. Points in blue represent the human body; points in black are rejected as background.

III-E Speed and Separation Monitoring Algorithm

The implementation of Speed and Separation Monitoring (SSM) was derived from [12]. That work defines the 3D geometric representation of the human operator in the robot’s workspace, expressed in the same reference frame as the robot. A scene graph was constructed and closest-pair-of-points queries between the human and the robot were performed. The algorithm of choice for this task was the “GJK” algorithm [7], which is widely used for such applications. The closest pair of points allowed for the computation of the minimum distance vector. This vector was used to compute the protective safety distance (a threshold or barrier around the robot) that triggers the robot stop behavior. As stated in [9] and [12], the speed and separation monitoring equation is given by:

$$S_{safety}(t_{0})=V_{human}\cdot(t_{r}+t_{s})+V_{robot}\cdot t_{r}+C+Z_{s}+Z_{r} \tag{1}$$

$Z_{s}$ and $Z_{r}$ represent the position measurement uncertainties for the human and the robot, respectively. These values were obtained from the data-sheets of the robot and the exteroceptive sensing equipment used. $C$ is the intrusion distance defined by [8]; in essence, it represents the threshold at which an obstacle is successfully detected. $t_{r}$ and $t_{s}$ represent the control loop processing time and the time required by the robot to come to a full stop, respectively. These time values can be obtained from the robot’s (UR10) data-sheet or estimated empirically. The robot stopping time $t_{s}$ can also be tuned but should be lower bounded by the worst case stopping time. To achieve jerk free stop behavior, the online trajectory generation library [4] was used.
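As a worked illustration, Equation (1) can be evaluated directly once its terms are known. The sketch below follows the equation exactly as written above; the function name and every numeric value except the 1.6 m/s human speed are hypothetical and are not taken from this work or from any data-sheet.

```python
def protective_distance(v_human, v_robot, t_r, t_s, c, z_s, z_r):
    """Evaluate Equation (1) as written in the text; SI units (m, s, m/s)."""
    return v_human * (t_r + t_s) + v_robot * t_r + c + z_s + z_r

# Hypothetical parameter values for illustration only.
s_safety = protective_distance(
    v_human=1.6,   # m/s, human speed
    v_robot=0.5,   # m/s, directed robot speed towards the human
    t_r=0.1,       # s, control loop processing time
    t_s=0.3,       # s, robot stopping time
    c=0.1,         # m, intrusion distance
    z_s=0.05,      # m, human position uncertainty
    z_r=0.01,      # m, robot position uncertainty
)
```

With these assumed values, the human term contributes 0.64 m, the robot term 0.05 m, and the constants 0.16 m, giving a protective distance of 0.85 m.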

As governed by the standard [9], most terms in the equation can be substituted with constants; the only quantity that is non-trivial to compute is $V_{robot}$. Thus, a fairly simple but general algorithm is proposed to compute the directed velocity of the robot towards the human based on the velocity kinematics of the robot:

Data: Closest points: $P_{human}$, $P_{robot}$
Workspace limit: $W_{max}$
Routine: “$jacobian()$” to compute the robot Jacobian
Result: $V_{robot}$ in direction of the operator
$\vec{S}\leftarrow P_{human}-P_{robot}$;
$\vec{z}=(0,0,1)^{T}$
if is_valid($\vec{S}$) then
       $\vec{f}$ = $normalize(\vec{S})$;
       $\vec{r}$ = $\vec{z}\times\vec{f}$
       $\vec{u}$ = $\vec{f}\times\vec{r}$
       ${}^{w}\textbf{T}_{r}$ = $\begin{pmatrix}\vec{\textbf{r}} & \vec{\textbf{u}} & \vec{\textbf{f}} & \textbf{P}_{robot}\\ 0 & 0 & 0 & 1\end{pmatrix}$ (directed transform along z in world coordinates)
       $\textbf{J}_{robot}=jacobian(\textbf{q},{}^{w}\textbf{T}_{r})$
       $\begin{bmatrix}{}^{w}\textbf{v}_{twist}\end{bmatrix}_{6\times 1}=\textbf{J}_{robot}\cdot\dot{\textbf{q}}$
       ${}^{w}\textbf{T}_{h}={}^{w}\textbf{T}_{r}\cdot\begin{pmatrix}1&0&0&0\\ 0&1&0&0\\ 0&0&1&\lVert\vec{\textbf{S}}\rVert\\ 0&0&0&1\end{pmatrix}$
       ${}^{h}\textbf{v}_{r}={{}^{w}\textbf{T}_{h}}^{-1}\cdot\begin{bmatrix}{}^{w}\textbf{v}_{twist}\end{bmatrix}_{4\times 4}$
       $V_{robot}=sign({}^{h}\textbf{v}_{r}\langle\textbf{z}\rangle)\cdot\lVert{}^{h}\textbf{v}_{r}\langle\textbf{z}\rangle\rVert$
       return $V_{robot}$
else
       return $-1$
Algorithm 1: Algorithm to compute directed $\textbf{V}_{robot}$
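The core of Algorithm 1 can be sketched numerically. This is a simplified illustration under stated assumptions, not the implementation used in this work: the linear-velocity Jacobian of the closest robot point is assumed to be supplied by the caller in world coordinates (the `jacobian()` routine is not reproduced), only the linear part of the twist is used, a fallback axis is chosen when $\vec{f}$ is parallel to $\vec{z}$, and `None` is returned instead of $-1$ for an invalid separation vector so that it cannot be confused with a valid signed speed.

```python
import numpy as np

def directed_robot_speed(p_human, p_robot, jacobian_lin, q_dot, eps=1e-9):
    """Signed speed of the closest robot point along the robot-to-human direction.

    jacobian_lin: 3 x n linear-velocity Jacobian of the closest robot point,
                  expressed in the world frame (assumed to be provided).
    Returns None when the separation vector is degenerate.
    """
    s = np.asarray(p_human, dtype=float) - np.asarray(p_robot, dtype=float)
    dist = np.linalg.norm(s)
    if not np.isfinite(dist) or dist < eps:
        return None
    f = s / dist                                # forward axis, towards the human
    z = np.array([0.0, 0.0, 1.0])
    r = np.cross(z, f)
    if np.linalg.norm(r) < eps:                 # f parallel to z: pick a fallback axis
        r = np.array([1.0, 0.0, 0.0])
    r /= np.linalg.norm(r)
    u = np.cross(f, r)
    rot = np.column_stack([r, u, f])            # rotation block of the directed frame
    v_world = np.asarray(jacobian_lin, dtype=float) @ np.asarray(q_dot, dtype=float)
    v_local = rot.T @ v_world                   # velocity expressed in the directed frame
    return float(v_local[2])                    # component along f

# Hypothetical 2-joint example: the closest point moves at 0.3 m/s along +x.
J = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
v = directed_robot_speed(p_human=[2.0, 0.0, 0.0], p_robot=[0.5, 0.0, 0.0],
                         jacobian_lin=J, q_dot=[0.3, 0.0])
```

A positive value indicates motion towards the human and a negative value motion away from the human. The translation by $\lVert\vec{S}\rVert$ in ${}^{w}\textbf{T}_{h}$ does not change the rotated linear velocity, which is why only the rotation block appears in this sketch.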

The signed scalar $V_{robot}$ can be substituted into (1) to obtain the safety distance. The safety distance $S_{safety}$ and the magnitude of the minimum distance vector $\lVert\vec{S}\rVert$ can then be used to compute the speed scaling command sent to the robot controller:

$$\rho_{scaling}=\frac{\max(\lVert\vec{S}\rVert-S_{safety},\,0)}{W_{max}} \tag{2}$$

such that $\rho_{scaling}\in[0,1]$ and $W_{max}$ is the distance limit in meters for the robot workspace beyond which sensor readings are clamped. $\rho_{scaling}$ represents the scaling factor to control the speed of the robot. This factor can be used to uniformly scale the joint velocities $\dot{\textbf{q}}$ of the robot as shown in [12].
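Equation (2) and the uniform joint velocity scaling can be sketched as below. The explicit upper clamp at 1 stands in for the clamping of sensor readings at $W_{max}$ described above; the function names and all numeric values (including $W_{max}=3$ m) are hypothetical and chosen only for illustration.

```python
def speed_scaling(min_distance, s_safety, w_max):
    """Equation (2), with the result clamped to the interval [0, 1]."""
    rho = max(min_distance - s_safety, 0.0) / w_max
    return min(rho, 1.0)

def scale_joint_velocities(q_dot, rho):
    """Uniformly scale every joint velocity by the same factor."""
    return [rho * qd for qd in q_dot]

# Hypothetical values: S_safety = 0.85 m, W_max = 3.0 m.
rho_far = speed_scaling(min_distance=10.0, s_safety=0.85, w_max=3.0)
rho_mid = speed_scaling(min_distance=1.45, s_safety=0.85, w_max=3.0)
rho_near = speed_scaling(min_distance=0.50, s_safety=0.85, w_max=3.0)
q_dot_cmd = scale_joint_velocities([0.4, -0.2, 0.1], rho_near)
```

With these assumed numbers, the robot runs at full speed when the human is far away, at 20% of its commanded speed at a separation of 1.45 m, and stops once the separation falls below the protective distance.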

IV Experiment

Figure 5: Flow diagram of the experiment performed as prescribed by the authors in [26]

The experimental configuration comprised distinct tasks allocated to both the human and the robot, emulating an assembly line scenario. Specifically, a collaborative robot was responsible for extracting a component from a pallet and depositing it onto a conveyor belt. Subsequently, a UR-10 robot retrieved the necessary component from the conveyor and placed it in a bin accessible to the human operator; the human and the UR-10 coexisted within the same workspace. There were four stations in the shared workspace. The human worker walked to each of these stations to complete the assembly of a PVC coupling. One of the parts was provided by the UR-10. Once the assembly of a single part was completed, the human was required to deposit the item at a designated location and perform an arbitrary task at the same station. This process was repeated 24 times in one trial. A lidar captured point clouds and multi-channel imagery throughout the experiment. Each worker participated in 6 trials; in 3 of those trials, they were required to wear a high visibility jacket. Additionally, the entire workspace was monitored with a motion capture system whose 13 cameras flooded the workspace with 940 nm infrared light.

V Results and Discussion

V-A Dataset

The experiment involved 17 participants; 29% of the participants were female and the remaining 71% were male. The variety of clothing, fabrics, and colors worn by participants was recorded. A word-cloud image representing this diversity is shown in Figure 6. The most common clothing color, fabric, and type were black, cotton, and jeans with hoodie, respectively. This suggests that clothing worn by operators on shop floors may also be dark in color and made of cotton. Hence, a lidar should be able to detect these materials regardless of the surface reflectivity they exhibit.

Figure 6: A plot and word-cloud showing the participant attribute distribution. Left: the weight-height and age distribution of the participants. Right: a word cloud showing a distribution of clothing types, materials and colors.

The weight of the participants ranged from 49 kg to 136 kg, and the height ranged from 1.5 m to 1.85 m. It is vital for any learning based model to be exposed to the varying body geometries found in a shop-floor environment. The approval for human subject research was granted by Rochester Institute of Technology (approval number: 21081267).

After the data collection, pre-processing was applied and the training and validation sets (for “single-channel” and “multi-channel”) were prepared with about 12,000 and 2,200 images, respectively.

Figure 7: Tiled layout of 18 samples randomly drawn from each training dataset. Left: a snapshot of the single-channeled dataset built with reflectivity images. Right: a snapshot of the multi-channeled dataset built with depth-wise concatenation of reflectivity, signal, and near-infrared images.

V-B Quantitative Results

The two previously mentioned datasets were used to train the YOLOv9 object detector in a binary detection mode. The validation curves recorded during the training sessions are shown in Figure 8. During the “multi-channel” training, it was observed that the YOLOv9 network converged faster and exhibited a higher mAP50-95 validation score.

Figure 8: Plots showing the validation metrics during YOLOv9 fine-tuning on the two prepared datasets, namely the single-channel and multi-channel datasets.

After training the network, inference was performed on unseen lidar sequences of 12,500 samples for both variants. In Figure 9, it was observed that the “multi-channel” variant performed approximately 1% better than the “single-channel” variant. However, the classifier confidence was more robust during “single-channel” inference: on analyzing the spread of the confidence values of the classifiers, it was found that the multi-channel detector was measurably less certain than the single-channel variant.

Figure 9: Figures showing inference examples, confusion matrix, and confidence scatter plots on the test sets.

To measure the accuracy of the lidar for pHRI scenarios, the closest pair of points between the human operator and the robot was recorded with the lidar and the motion capture system simultaneously. For the 3D lidar mounted on the robot base, the root mean square error (RMSE) was more than 4 times lower than that of the on-robot time-of-flight sensing rings in [12]. The margin of error was found to be lower bounded by 3 mm, as reported by the manufacturer.

| | Lidar (ours) | ToF Rings [12] |
| --- | --- | --- |
| RMSE (m) | 0.0605 | 0.25 |

TABLE I: RMSE comparison for lidar and time-of-flight sensing rings
Figure 10: Minimum-distance comparison between data acquired from the lidar and the motion capture system, overlaid on each other.
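The RMSE comparison above can be illustrated with a minimal sketch. It assumes that the two minimum-distance traces are sampled at different rates and that the motion capture trace is linearly interpolated onto the lidar timestamps before the error is computed; the paper does not state how the two streams were aligned. The function names, sampling rates, and synthetic signals below are hypothetical and serve only as an illustration.

```python
import numpy as np


def resample_to(t_src, d_src, t_ref):
    """Linearly interpolate a distance trace onto reference timestamps."""
    return np.interp(t_ref, t_src, d_src)


def rmse(estimate, reference):
    """Root mean square error between two equally sampled traces."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if estimate.shape != reference.shape:
        raise ValueError("traces must have the same shape")
    return float(np.sqrt(np.mean((estimate - reference) ** 2)))


# Synthetic data (hypothetical): motion capture at 100 Hz, lidar at 20 Hz.
t_mocap = np.linspace(0.0, 25.0, 2501)
d_mocap = 1.0 + 0.3 * np.sin(0.5 * t_mocap)

t_lidar = np.linspace(0.0, 25.0, 501)
# The lidar trace is given a constant 5 cm offset purely for illustration.
d_lidar = 1.0 + 0.3 * np.sin(0.5 * t_lidar) + 0.05

d_mocap_at_lidar = resample_to(t_mocap, d_mocap, t_lidar)
error = rmse(d_lidar, d_mocap_at_lidar)
print(f"RMSE = {error:.4f} m")
```

With real recordings, the two clocks would also need to be synchronized before interpolation; that step is outside the scope of this sketch.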

The results from the safety algorithm are presented in Figure 11. A 25-second recording of the results was analyzed; the safety distance $S_{safety}$ was found to track the directed robot velocity $V_{robot}$ proportionally, owing to its linear dependence on the latter. $V_{human}$ was set to $1.6\,m/s$ (as prescribed by [9]) and the remaining terms in equation 1 were taken from the robot’s datasheet. Between the 18.5 and 19.0 second marks, the speed scaling term $\rho_{scaling}$ decayed as the minimum distance $\lVert\vec{S}\rVert$ violated or tended towards $S_{safety}$. It should also be noted that $\rho_{scaling}$ was always below $0.5$, as the human subject was always within 1.5 meters ($< W_{max}$) of the robot. Furthermore, even though no smoothing or filtering was applied to the data, $\lVert\vec{S}\rVert$ computed from lidar data was significantly smoother than in [12], where an exponential filter was used.
Because the closest distance between two articulated bodies tends to vary at high frequencies, the ability of the system to provide a relatively smooth metric virtually eliminates the need for low-pass filters, which can introduce time lag in the controller. This is vital in high-stakes scenarios where safety is the main goal.
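The linear dependence of the safety distance on the directed robot velocity can be sketched as follows. The sketch assumes the constant-speed simplification of the protective separation distance in ISO/TS 15066 [9]; the exact form of equation 1 in this paper may differ. Only the human speed of 1.6 m/s comes from the text. The timing, braking, and intrusion values, the value of `w_max`, and the linear ramp used for the speed scaling factor are hypothetical placeholders and not the controller used in the experiments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SsmParams:
    v_human: float = 1.6      # m/s, value stated in the text (from [9])
    t_reaction: float = 0.1   # s, hypothetical system reaction time
    t_stop: float = 0.3       # s, hypothetical robot stopping time
    d_brake: float = 0.1      # m, hypothetical braking distance
    c_intrusion: float = 0.1  # m, hypothetical intrusion/uncertainty margin
    w_max: float = 2.0        # m, hypothetical outer bound of the scaling zone


def safety_distance(v_robot: float, p: SsmParams) -> float:
    """Simplified SSM separation distance, linear in the directed robot speed."""
    human_term = p.v_human * (p.t_reaction + p.t_stop)
    robot_term = v_robot * p.t_reaction
    return human_term + robot_term + p.d_brake + p.c_intrusion


def speed_scaling(distance: float, s_safety: float, w_max: float) -> float:
    """Hypothetical linear ramp: 0 at the safety distance, 1 at w_max."""
    if w_max <= s_safety:
        return 0.0
    ratio = (distance - s_safety) / (w_max - s_safety)
    return min(max(ratio, 0.0), 1.0)


params = SsmParams()
for v_robot in (0.0, 0.5, 1.0):
    s_safety = safety_distance(v_robot, params)
    rho = speed_scaling(1.2, s_safety, params.w_max)
    print(f"v_robot={v_robot:.1f} m/s  S_safety={s_safety:.2f} m  rho={rho:.2f}")
```

In this sketch the scaling factor reaches zero once the measured minimum distance falls to the safety distance, which mirrors the decay described above without claiming to reproduce its exact shape.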

Figure 11: Time series (25 seconds) plots showing the results of the directed robot velocity computation (bottom most in blue) towards the human along with the minimum distance (top in black), safety distance threshold (second from top in red) computed with the SSM Equation [9] and the speed scaling factor (third from top in yellow) used for modulating the operational speed of the robot.

V-C Qualitative Results

During data collection, it was observed that the response of the lidar was poorer in scenarios where the participants were wearing significantly darker clothes, even in close proximity to the robot. The reflectivity and range images provided by the lidar exhibited holes. As a consequence, the point cloud lacked the 3D information associated with the human’s shape geometry (points were missing from the point cloud). This phenomenon is illustrated in the side-by-side comparison shown in Figure 12. It should be noted that in the left half of the figure, the participant was wearing a high-reflectivity vest with black cotton garments underneath. Only the points associated with the reflective vest were reported by the lidar.

Figure 12: Left: a lidar reflectivity image with holes and its corresponding 3D point cloud in perspective view. Right: a healthy sample of the reflectivity image with its 3D point cloud.

Another limitation was observed: the “multi-channel” variant performed poorly after the floor layout changed. This limitation, shown in Figure 13, can be explained by a distribution shift, since the network is biased directly by the metrics associated with the photons scattered in the environment. The colored patches in the image can also create ambiguous textures that confuse the network. It should be noted that the participant in this image is wearing a reflective vest. This can also be addressed with a higher-resolution lidar such as the OS-0-128, whose base resolution is $1024 \times 128$, so that up-scaling does not create aliasing artifacts.

Figure 13: A misclassification by the “multi-channel” variant due to a room layout change and ambiguous texture patches in the input image. The correct bounding box is drawn with a dashed outline.

Figure 14 shows the monotonic nature of the lidar recordings on the left. This can cause a rolling shutter effect: while the lidar records at a relatively high frequency (20 $Hz$), it may create artifacts if an object in the frame moves fast enough from one point to another before the rolling shutter has scanned the entire frame. The moving object may then appear to have teleported in the recorded frame. Swiftly moving objects can also appear distorted, as they may have been recorded at staggered intervals.

To explain the slightly higher validation and inference results of the “multi-channel” variant, the structural similarity index measure (SSIM) matrix was used. As shown on the right in Figure 14, the relative SSIM of each image type is significantly below 1.0, so there is a likelihood that the network can extract additional features from the added channels. If the features were redundant, the relative SSIM would be closer to 1.0 (the matrix elements would diffuse more). However, the Near-IR channel tends to exhibit higher ambient noise than the other images, although it can be useful in close-proximity scenarios where objects are closer than the minimum distance measurable by the lidar.
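A pairwise similarity matrix of this kind can be sketched as below. For brevity, the sketch uses a global (single-window) SSIM, which is a simplification of the usual sliding-window SSIM and will not reproduce the values in Figure 14. The channel names, image size, and random test images are hypothetical; the constants 0.01 and 0.03 are the conventional SSIM stabilisation constants.

```python
import numpy as np


def global_ssim(x, y, data_range):
    """Single-window SSIM computed over the whole image (simplified variant)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2.0 * mu_x * mu_y + c1) * (2.0 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)


def ssim_matrix(channels, data_range):
    """Symmetric matrix of pairwise SSIM values between named image channels."""
    names = list(channels)
    matrix = np.eye(len(names))
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            value = global_ssim(channels[names[i]], channels[names[j]], data_range)
            matrix[i, j] = matrix[j, i] = value
    return names, matrix


# Hypothetical stand-ins for normalised lidar image channels.
rng = np.random.default_rng(0)
channels = {
    "reflectivity": rng.random((32, 64)),
    "signal": rng.random((32, 64)),
    "near_ir": rng.random((32, 64)),
}
names, matrix = ssim_matrix(channels, data_range=1.0)
print(names)
print(np.round(matrix, 3))
```

Off-diagonal values close to 1.0 would indicate that two channels carry largely redundant structure, while values well below 1.0 suggest complementary information.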

Figure 14: Left: the timestamps of the sequential readout performed by the lidar. Right: a matrix showing the relative structural similarity index measure (SSIM) of the lidar images.

VI Conclusions & Future Work

The on-robot base-mounted lidar can significantly outperform on-robot time-of-flight sensing rings due to the 3D point-cloud and 2D image data. Furthermore, the bi-directional $2D \Leftrightarrow 3D$ mapping enables higher-level tasks such as object detection on images and, subsequently, region-of-interest extraction on the corresponding point clouds. This leads to a more efficient perception pipeline, as image-based backend(s) can be used to bootstrap detection networks while pruning the 3D search space. This capability also allowed us to semi-automate bounding box annotation for our datasets. In the future, this can enable the application of techniques such as continual learning [35].
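The region-of-interest extraction enabled by the 2D-to-3D mapping can be sketched as follows. The sketch assumes an organized point cloud stored as an H x W x 3 array whose pixel grid is aligned with the lidar image, and a detector bounding box already expressed in that image's pixel coordinates (a box predicted on an up-scaled image would first have to be scaled back). Treating all-zero points as holes is also an assumption. The function name and the toy data are hypothetical.

```python
import numpy as np


def roi_points(cloud_hw3, bbox_xyxy):
    """Return the valid 3D points that fall inside a 2D bounding box.

    cloud_hw3: organized point cloud, shape (H, W, 3), aligned with the image.
    bbox_xyxy: (x_min, y_min, x_max, y_max) in pixel coordinates.
    """
    height, width, _ = cloud_hw3.shape
    x_min, y_min, x_max, y_max = bbox_xyxy
    # Clamp the box to the image bounds.
    x0 = max(int(np.floor(x_min)), 0)
    y0 = max(int(np.floor(y_min)), 0)
    x1 = min(int(np.ceil(x_max)), width)
    y1 = min(int(np.ceil(y_max)), height)
    patch = cloud_hw3[y0:y1, x0:x1].reshape(-1, 3)
    # Discard holes: non-finite points or points with zero range (assumed encoding).
    valid = np.isfinite(patch).all(axis=1) & (np.linalg.norm(patch, axis=1) > 0.0)
    return patch[valid]


# Toy example: a 4 x 6 organized cloud with one hole inside the box.
cloud = np.ones((4, 6, 3))
cloud[2, 2] = 0.0  # simulated missing return
points = roi_points(cloud, (1, 1, 4, 3))
print(points.shape)
```

Only the points inside the detected box are passed on, which is what prunes the 3D search space for downstream steps such as minimum-distance computation.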

The lidar also exhibits some limitations due to the presence of holes in the image channels, which affect the quality of the point cloud. Therefore, on a shop floor it is vital to wear high-reflectivity markers such as vests and helmets, as they can alleviate the presence of holes in the lidar data. Ultimately, we can conclude that the use of the 3D lidar in close-proximity pHRI scenarios is viable, as long as steps are taken to prevent sensing failures and pitfalls.
For future work, the first step is to develop a target network that can directly handle the input sizes provided by the lidar and is designed to work with 16-bit precision. To overcome the distribution shift problem, the channel order can be randomized while also introducing small changes in the floor layout so that the network becomes more robust. The changes required would be quite small, as a shop-floor environment is more static than an outdoor scenario. Exploring deep-learning-based image up-scaling techniques such as [15] and using more advanced sensing hardware (OS-0-128) will also provide more reliable inference. Another downstream task that we are already working on is instance segmentation; we are currently developing mask annotations for the lidar images. For the safety controller, leveraging the directed velocity of the human operator towards the robot will also allow the safety barrier to be relaxed in situations where the human is moving away from the robot. Because we assume $V_{human}$ to be a positive constant, the operator is implicitly treated as always moving towards the robot at a constant velocity. Therefore, measuring the velocity of the operator in real time will benefit robot productivity without sacrificing operator safety.
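The directed human velocity mentioned above can be illustrated with a short sketch. It assumes that the position and velocity of the operator and the position of the robot are available in a common frame, and it projects the operator's velocity onto the unit vector pointing from the operator to the robot. Clamping negative values to zero when the operator moves away is one possible design choice, not something prescribed by the paper. All names and numbers are hypothetical.

```python
import numpy as np


def directed_speed(p_human, v_human, p_robot):
    """Signed speed of the human along the line towards the robot (m/s).

    Positive when approaching the robot, negative when moving away.
    """
    separation = np.asarray(p_robot, dtype=float) - np.asarray(p_human, dtype=float)
    distance = np.linalg.norm(separation)
    if distance == 0.0:
        raise ValueError("human and robot positions coincide")
    return float(np.dot(np.asarray(v_human, dtype=float), separation / distance))


def approach_speed(p_human, v_human, p_robot):
    """Speed that could replace a constant V_human: zero when moving away."""
    return max(directed_speed(p_human, v_human, p_robot), 0.0)


robot = [2.0, 0.0, 0.0]
human = [0.0, 0.0, 0.0]
print(directed_speed(human, [1.0, 0.0, 0.0], robot))   # walking towards the robot
print(directed_speed(human, [-1.0, 0.0, 0.0], robot))  # walking away from the robot
print(approach_speed(human, [-1.0, 0.0, 0.0], robot))  # clamped contribution
```

A measured approach speed below the constant 1.6 m/s would shrink the human term of the safety distance, which is how the barrier could be relaxed when the operator moves away.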

References

  • [1] Odysseus Alexander Adamides, Alexander Avery, Karthik Subramanian, and Ferat Sahin. Evaluation of On-Robot Depth Sensors for Industrial Robotics. In 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pages 1014–1021, October 2023. ISSN: 2577-1655.
  • [2] João Barata and Ina Kayser. Industry 5.0 – Past, Present, and Near Future. Procedia Computer Science, 219:778–788, January 2023.
  • [3] Himansu Sekhar Behera and Anandakumar M Ramiya. Urban flood modelling simulation with 3D building models from airborne LiDAR point cloud. In 2022 IEEE Mediterranean and Middle-East Geoscience and Remote Sensing Symposium (M2GARSS), pages 145–148, March 2022.
  • [4] Lars Berscheid and Torsten Kröger. Jerk-limited Real-time Trajectory Generation with Arbitrary Target States, June 2021. arXiv:2105.04830 [cs].
  • [5] Răzvan-Ionuț Bălașa, Ghoerghe Olaru, Daniel Constantin, Amado Ștefan, Ciprian-Marian Bîlu, and Maria Beatrice Bălăceanu. LIDAR based distance estimation for emergency use terrestrial autonomous robot. In 2021 13th International Conference on Electronics, Computers and Artificial Intelligence (ECAI), pages 1–4, July 2021.
  • [6] Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, KDD’96, pages 226–231, Portland, Oregon, August 1996. AAAI Press.
  • [7] E.G. Gilbert, D.W. Johnson, and S.S. Keerthi. A fast procedure for computing the distance between complex objects in three-dimensional space. IEEE Journal on Robotics and Automation, 4(2):193–203, April 1988.
  • [8] ISO. ISO 13855:2010(en) Safety of machinery — Positioning of safeguards with respect to the approach speeds of parts of the human body, 2010.
  • [9] ISO. ISO/TS 15066:2016(en) Robots and robotic devices — Collaborative robots, 2022.
  • [10] Efstathios Karypidis, Georgios Zamanakos, Lazaros Tsochatzidis, and Ioannis Pratikakis. Point Contrastive learning for LiDAR-based 3D object detection in autonomous driving. In 2023 24th International Conference on Digital Signal Processing (DSP), pages 1–5, June 2023. ISSN: 2165-3577.
  • [11] G. Ajay Kumar, Ashok Kumar Patil, Rekha Patil, Seong Sill Park, and Young Ho Chai. A LiDAR and IMU Integrated Indoor Navigation System for UAVs and Its Application in Real-Time Pipeline Classification. Sensors (Basel, Switzerland), 17(6):1268, June 2017.
  • [12] Shitij Kumar, Sarthak Arora, and Ferat Sahin. Speed and Separation Monitoring using On-Robot Time-of-Flight Laser-ranging Sensor Arrays. In 2019 IEEE 15th International Conference on Automation Science and Engineering (CASE), pages 1684–1691, August 2019. ISSN: 2161-8089.
  • [13] Shitij Kumar, Celal Savur, and Ferat Sahin. Survey of Human–Robot Collaboration in Industrial Settings: Awareness, Intelligence, and Compliance. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 51(1):280–297, January 2021.
  • [14] Bakir Lacevic, Andrea Maria Zanchettin, and Paolo Rocco. Safe Human-Robot Collaboration via Collision Checking and Explicit Representation of Danger Zones. IEEE Transactions on Automation Science and Engineering, 20(2):846–861, April 2023.
  • [15] Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, and Kyoung Mu Lee. Enhanced Deep Residual Networks for Single Image Super-Resolution. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1132–1140, July 2017. ISSN: 2160-7516.
  • [16] Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, and Piotr Dollár. Microsoft COCO: Common Objects in Context, February 2015. arXiv:1405.0312 [cs].
  • [17] Yunze Man, Xinshuo Weng, Prasanna Kumar Sivakumar, Matthew O’Toole, and Kris Kitani. Multi-Echo LiDAR for 3D Object Detection. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 3743–3752, October 2021. ISSN: 2380-7504.
  • [18] Jeremy A. Marvel and Rick Norcross. Implementing speed and separation monitoring in collaborative robot workcells. Robotics and Computer-Integrated Manufacturing, 44:144–155, 2017.
  • [19] Paul F. McManamon and Society of Photo-optical Instrumentation Engineers. LiDAR technologies and systems, volume PM300. SPIE, Bellingham, Washington (1000 20th St. Bellingham WA 98225-6705 USA), 2019.
  • [20] Charith Munasinghe, Fatemeh Mohammadi Amin, Davide Scaramuzza, and Hans Wernher van de Venn. COVERED, CollabOratiVE Robot Environment Dataset for 3D Semantic segmentation. In 2022 IEEE 27th International Conference on Emerging Technologies and Factory Automation (ETFA), pages 1–4, September 2022.
  • [21] Ouster. Ouster SDK — Ouster Sensor SDK 0.10.0 documentation, 2022.
  • [22] Justyna Patalas-Maliszewska, Adam Dudek, Grzegorz Pajak, and Iwona Pajak. Working toward Solving Safety Issues in Human–Robot Collaboration: A Case Study for Recognising Collisions Using Machine Learning Algorithms. Electronics, 13(4):731, 2024.
  • [23] Arne Peters, Adam Schmidt, and Alois C. Knoll. Extrinsic Calibration of an Eye-In-Hand 2D LiDAR Sensor in Unstructured Environments Using ICP. IEEE Robotics and Automation Letters, 5(2):929–936, April 2020.
  • [24] Aquib Rashid, Kannan Peesapati, Mohamad Bdiwi, Sebastian Krusche, Wolfram Hardt, and Matthias Putz. Local and Global Sensors for Collision Avoidance. In 2020 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), pages 354–359, September 2020.
  • [25] Zoltan Rozsa and Tamas Sziranyi. Obstacle Prediction for Automated Guided Vehicles Based on Point Clouds Measured by a Tilted LIDAR Sensor. IEEE Transactions on Intelligent Transportation Systems, 19(8):2708–2720, August 2018.
  • [26] Celal Savur. A physiological computing system to improve human-robot collaboration by using human comfort index. PhD thesis, Rochester Institute of Technology, Rochester, NY, 2022.
  • [27] Fisher Shi. Object Detection and Tracking using Deep Learning and Ouster Python SDK | Ouster, March 2022.
  • [28] Jeffrey Too Chuan Tan and Tamio Arai. Triple stereo vision system for safety monitoring of human-robot collaboration in cellular manufacturing. In 2011 IEEE International Symposium on Assembly and Manufacturing (ISAM), pages 1–6, May 2011.
  • [29] Maria Tsiourva and Christos Papachristos. LiDAR Imaging-based Attentive Perception. In 2020 International Conference on Unmanned Aircraft Systems (ICUAS), pages 622–626, September 2020. ISSN: 2575-7296.
  • [30] Barnaba Ubezio, Christian Schöffmann, Lucas Wohlhart, Stephan Mülbacher-Karrer, Hubert Zangl, and Michael Hofbaur. Radar Based Target Tracking and Classification for Efficient Robot Speed Control in Fenceless Environments. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 799–806, September 2021. ISSN: 2153-0866.
  • [31] S. Umeyama. Least-squares estimation of transformation parameters between two point patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(4):376–380, April 1991.
  • [32] Valeria Villani, Fabio Pini, Francesco Leali, and Cristian Secchi. Survey on human–robot collaboration in industrial settings: Safety, intuitive interfaces and applications. Mechatronics, 55:248–266, November 2018.
  • [33] Chien-Yao Wang, I.-Hau Yeh, and Hong-Yuan Mark Liao. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information, February 2024. arXiv:2402.13616 [cs].
  • [34] Heng Wang, Bin Wang, Bingbing Liu, Xiaoli Meng, and Guanghong Yang. Pedestrian recognition and tracking using 3D LiDAR for autonomous vehicle. Robotics and Autonomous Systems, 88:71–78, February 2017.
  • [35] Liyuan Wang, Xingxing Zhang, Hang Su, and Jun Zhu. A Comprehensive Survey of Continual Learning: Theory, Method and Application. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1–20, 2024.
  • [36] Li Da Xu, Eric L. Xu, and Ling Li. Industry 4.0: state of the art and future trends. International Journal of Production Research, 56(8):2941–2962, April 2018.
  • [37] Yanfeng Zhang, Yunong Tian, Wanguo Wang, Guodong Yang, Zhishuo Li, Fengshui Jing, and Min Tan. RI-LIO: Reflectivity Image Assisted Tightly-Coupled LiDAR-Inertial Odometry. IEEE Robotics and Automation Letters, 8(3):1802–1809, March 2023.