<div>
<div>
<div>
<p>For pedestrians and autonomous vehicles (AVs) to co-exist harmoniously and safely in
the real world, AVs will need not only to react to pedestrian actions, but also to anticipate
their intentions. In this thesis, we propose to use rich visual and pedestrian-environment
interaction features to improve pedestrian crossing intention prediction from the ego-view.
We do so by combining visual feature extraction, graph modeling of scene objects and their
relationships, and feature encoding as comprehensive inputs for an LSTM encoder-decoder
network.
</p>
<p>Pedestrians react and make decisions based on their surrounding environment and the
behaviors of other road users around them. The human-human social relationship has already
been explored for pedestrian trajectory prediction from the bird’s-eye view of stationary
cameras. However, context and pedestrian-environment relationships are often missing in
current research into pedestrian trajectory and intention prediction from the ego-view. To
map the pedestrian’s relationship to the surrounding objects, we use a star graph with the
pedestrian in the center connected to all other road objects/agents in the scene. The
pedestrian and road objects/agents are represented in the graph through visual features extracted
using state-of-the-art deep learning algorithms. We use graph convolutional networks and
graph autoencoders to encode the star graphs in a lower dimension. Using the graph
encodings, pedestrian bounding boxes, and human pose estimation, we propose a novel model
that predicts pedestrian crossing intention using not only the pedestrian’s action behaviors
(bounding box and pose estimation), but also their relationship to their environment.
</p>
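<p>The star-graph encoding described above can be illustrated with a minimal sketch. It
assumes a single graph convolution with the commonly used symmetric normalization and
self-loops; the node count, feature sizes, random features, and random weights are
placeholders chosen for illustration, not the configuration used in the thesis.</p>

```python
import math
import random


def star_adjacency(num_objects):
    """Adjacency of a star graph: node 0 (the pedestrian) links to every object node."""
    n = num_objects + 1
    adj = [[0.0] * n for _ in range(n)]
    for j in range(1, n):
        adj[0][j] = 1.0
        adj[j][0] = 1.0
    return adj


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, features, weights):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    hidden = matmul(matmul(norm, features), weights)
    return [[max(v, 0.0) for v in row] for row in hidden]


# Hypothetical sizes: 4 scene objects, 8-d visual features, 3-d encoding.
rng = random.Random(0)
num_objects, feat_dim, code_dim = 4, 8, 3
adj = star_adjacency(num_objects)
features = [[rng.gauss(0, 1) for _ in range(feat_dim)] for _ in range(num_objects + 1)]
weights = [[rng.gauss(0, 1) for _ in range(code_dim)] for _ in range(feat_dim)]
encoded = gcn_layer(adj, features, weights)  # one low-dimensional row per node
```

<p>In this sketch the pedestrian row of <code>encoded</code> mixes information from every
object in the scene, while each object row mixes only its own features with the
pedestrian's, which is the property a star topology provides.</p>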
<p>By tuning hyperparameters and experimenting with different graph convolutions
for our graph autoencoder, we are able to improve on the state-of-the-art results. Our
context-driven method outperforms current state-of-the-art results on the Pedestrian
Intention Estimation (PIE) benchmark dataset. The state of the art predicts pedestrian
crossing intention with a balanced accuracy (to account for dataset imbalance) score of 0.61,
while our best-performing model has a balanced accuracy score of 0.79. Our model performs
especially well in no-crossing-intention scenarios, with an F1 score of 0.56 compared to the state
of the art’s score of 0.36. Additionally, we experiment with training the state-of-the-art
model and our model to predict pedestrian crossing action and intention jointly. While
jointly predicting crossing action does not help improve crossing intention prediction, the
distinction between predicting crossing action and predicting crossing intention remains important.</p>
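<p>For readers unfamiliar with the metrics quoted above, the following sketch computes
balanced accuracy (the mean of per-class recall) and a per-class F1 score. The labels are
invented toy data, not PIE results, with 1 standing for crossing intention and 0 for no
crossing intention.</p>

```python
def recall(y_true, y_pred, label):
    """Fraction of samples with true class `label` that were predicted as `label`."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == label]
    return sum(p == label for p in relevant) / len(relevant) if relevant else 0.0


def precision(y_true, y_pred, label):
    """Fraction of samples predicted as `label` whose true class is `label`."""
    predicted = [t for t, p in zip(y_true, y_pred) if p == label]
    return sum(t == label for t in predicted) / len(predicted) if predicted else 0.0


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so a rare class counts as much as a common one."""
    labels = sorted(set(y_true))
    return sum(recall(y_true, y_pred, c) for c in labels) / len(labels)


def f1_score(y_true, y_pred, label):
    """Harmonic mean of precision and recall for one class."""
    p = precision(y_true, y_pred, label)
    r = recall(y_true, y_pred, label)
    return 2 * p * r / (p + r) if p + r else 0.0


# Toy, imbalanced labels (illustrative only): 8 crossing, 2 not crossing.
y_true = [1] * 8 + [0] * 2
y_pred = [1] * 8 + [0, 1]  # one no-crossing sample is misclassified

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_pred)
f1_no_cross = f1_score(y_true, y_pred, label=0)
```

<p>On this toy data plain accuracy is 0.9 while balanced accuracy is 0.75, which shows why
balanced accuracy is the more informative figure when one class dominates.</p>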
</div>
</div>
</div>
Identifier | oai:union.ndltd.org:purdue.edu/oai:figshare.com:article/14831025 |
Date | 06 August 2021 |
Creators | Chen Chen (11014800) |
Source Sets | Purdue University |
Detected Language | English |
Type | Text, Thesis |
Rights | CC BY 4.0 |
Relation | https://figshare.com/articles/thesis/Modeling_Spatiotemporal_Pedestrian-Environment_Interactions_for_Predicting_Pedestrian_Crossing_Intention_from_the_Ego-View/14831025 |