
Real-time road traffic information detection through social media

In the current study, a mechanism is proposed to extract traffic-related information, such as congestion and incidents, from textual data on the internet. The current data source is Twitter; however, the same mechanism can be extended to any kind of text available online. Because the data under consideration is extremely large, automated models are developed to stream, download, and mine it in real time. If a tweet contains traffic-related information, the models should be able to infer and extract it. To this end, artificial intelligence, machine learning, and natural language processing techniques are used. The models are designed to detect traffic congestion and traffic incidents from the Twitter stream at any location.
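The abstract does not say which classifier the study uses to decide whether a tweet is traffic-related. As a minimal sketch, assuming a multinomial naive Bayes model with add-one smoothing (a common baseline for short-text classification, swapped in here purely for illustration), such a traffic vs. non-traffic classifier could look like the following. The class name `NaiveBayesTweetClassifier`, the `tokenize` helper, and the toy training tweets are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase a tweet and split it into simple alphanumeric tokens (hypothetical helper)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesTweetClassifier:
    """Hypothetical multinomial naive Bayes classifier with Laplace (add-one) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesTweetClassifier":
        self.class_counts = Counter(labels)        # number of tweets per label
        self.word_counts = defaultdict(Counter)    # token frequencies per label
        self.vocab: set[str] = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, text: str) -> str:
        tokens = tokenize(text)
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = "", -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus smoothed log likelihood of every token in the tweet.
            score = math.log(count / n_docs)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, invented training data (not from the study).
train_texts = [
    "heavy traffic on i-85 north, stuck for 20 minutes",
    "accident blocking two lanes on the highway",
    "crash on the interstate, traffic backed up",
    "congestion downtown again this morning",
    "great coffee with friends this morning",
    "watching the game tonight",
    "new album is amazing",
    "happy birthday to my sister",
]
train_labels = ["traffic"] * 4 + ["other"] * 4

clf = NaiveBayesTweetClassifier().fit(train_texts, train_labels)
print(clf.predict("traffic jam after an accident on the highway"))
print(clf.predict("watching the game with friends tonight"))
```

In a real streaming setting the same `predict` call would be applied to each incoming geo-tagged tweet; a production system would need far more training data and likely a stronger model than this baseline.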
Currently, data is collected only for the United States. It was collected on 85 days (50 complete and 35 partial) randomly sampled over a span of five months (September 2014 to February 2015), yielding 120,000 geo-tagged traffic-related tweets and six million geo-tagged non-traffic-related tweets. The classification models for detecting traffic congestion and incidents are trained on this dataset, which is also used for various spatial and temporal analyses. A mechanism is proposed to calculate the level of traffic congestion, safety, and traffic perception for U.S. cities. Traffic congestion and safety rankings for various urban areas are obtained and statistically validated against existing, widely adopted rankings. Traffic perception captures people's attitudes toward and perceptions of traffic.
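The abstract does not define how the congestion level is computed or which statistic is used to validate the rankings. One plausible sketch, assuming a hypothetical congestion index equal to the share of a city's geo-tagged tweets that are traffic-related, and assuming Spearman rank correlation as the validation statistic, is shown below. The function names, city names, tweet counts, and reference scores are invented for illustration and are not results from the study.

```python
from statistics import mean


def congestion_index(traffic_tweets: dict[str, int], all_tweets: dict[str, int]) -> dict[str, float]:
    """Hypothetical index: fraction of a city's geo-tagged tweets that are traffic-related."""
    return {c: traffic_tweets.get(c, 0) / n for c, n in all_tweets.items() if n > 0}


def average_ranks(values: list[float]) -> list[float]:
    """Rank values with 1 = largest; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank for the tie group
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented counts and reference scores (higher = more congested in both).
traffic = {"City A": 120, "City B": 45, "City C": 80, "City D": 10, "City E": 60}
total = {"City A": 4000, "City B": 3000, "City C": 2000, "City D": 1000, "City E": 2500}
reference = {"City A": 50, "City B": 36, "City C": 55, "City D": 20, "City E": 35}

index = congestion_index(traffic, total)
cities = sorted(index)
rho = spearman([index[c] for c in cities], [reference[c] for c in cities])
print(f"Spearman rho = {rho:.2f}")
```

Ranking both lists in the same direction (higher means more congested) keeps a positive rho meaning agreement; a rho near 1 would indicate that the tweet-based ranking closely matches the reference ranking.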
It is also observed that traffic-related tweets, when visualized spatially and temporally, show the same patterns as actual traffic flows in various urban areas. At the city level, the flow of tweets clearly follows the flow of vehicles, indicating that traffic-related tweets are representative of traffic within cities.
Overall, the findings of the current study show that a significant amount of traffic-related information can be extracted from Twitter and other internet sources. Furthermore, Twitter and these data sources are freely available and are not bound by spatial or temporal limitations: wherever there is a user, there is potential for data.

Identifier: oai:union.ndltd.org:GATECH/oai:smartech.gatech.edu:1853/53889
Date: 21 September 2015
Creators: Khatri, Chandra P.
Contributors: Hunter, Michael P.
Publisher: Georgia Institute of Technology
Source Sets: Georgia Tech Electronic Thesis and Dissertation Archive
Language: en_US
Detected Language: English
Type: Thesis
Format: application/pdf
