Research on Human Mobility and Urban Computing with Big Data

In the Mobile Age, with the widespread use of location-aware devices, it is possible to collect large-scale location-based datasets, such as mobile phone data, GPS-enabled taxi trajectories, and location-based social media data, to sense complex human movements and human-environment interactions in urban space. Here are some examples of research that we have been working on.

(1) Discovering Spatial Interaction Communities from Mobile Phone Data

This research attempts to explore and interpret patterns embedded in the network of phone‐call interaction and the network of phone‐users’ movements, by considering the geographical context of mobile phone cells. We adopt an agglomerative clustering algorithm based on a Newman‐Girvan modularity metric and propose an alternative modularity function incorporating a gravity model to discover the clustering structures of spatial‐interaction communities using a mobile phone dataset from one week in a city in China. The results verify the distance decay effect and spatial continuity that control the process of partitioning phone‐call interaction, which indicates that people tend to communicate within a spatial‐proximity community. Furthermore, we discover that a high correlation exists between phone‐users’ movements in physical space and phone‐call interaction in cyberspace. Our approach presents a combined qualitative‐quantitative framework to identify clusters and interaction patterns, and explains how geographical context influences communities of callers and receivers. The findings of this empirical study are valuable for urban structure studies as well as for the detection of communities in spatial networks.
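The idea of swapping the null model inside modularity can be sketched as follows. This is a minimal illustration, not the function from the paper: the gravity form (node strengths multiplied together, divided by distance raised to a hypothetical exponent `beta`, then rescaled to the total observed flow) and the toy cell coordinates and call counts are all assumptions.

```python
from math import dist

def ng_expected(weights):
    """Newman-Girvan null model: expected flow is s_i * s_j / (2m)."""
    n = len(weights)
    strength = [sum(row) for row in weights]
    two_m = sum(strength)
    return [[strength[i] * strength[j] / two_m for j in range(n)] for i in range(n)]

def gravity_expected(weights, coords, beta=1.0):
    """Assumed gravity null model: s_i * s_j / d_ij**beta, rescaled to total flow."""
    n = len(weights)
    strength = [sum(row) for row in weights]
    two_m = sum(strength)
    raw = [[0.0 if i == j else
            strength[i] * strength[j] / dist(coords[i], coords[j]) ** beta
            for j in range(n)] for i in range(n)]
    scale = two_m / sum(sum(row) for row in raw)
    return [[value * scale for value in row] for row in raw]

def modularity(weights, labels, expected):
    """Q = (1 / 2m) * sum over same-community pairs of (observed - expected)."""
    n = len(weights)
    two_m = sum(sum(row) for row in weights)
    q = sum(weights[i][j] - expected[i][j]
            for i in range(n) for j in range(n) if labels[i] == labels[j])
    return q / two_m

# Toy data (made up): four cells on a line, two tight pairs far apart.
coords = [(0, 0), (1, 0), (10, 0), (11, 0)]
weights = [[0, 10, 1, 0],
           [10, 0, 0, 1],
           [1, 0, 0, 10],
           [0, 1, 10, 0]]
labels = [0, 0, 1, 1]

q_ng = modularity(weights, labels, ng_expected(weights))
q_gravity = modularity(weights, labels, gravity_expected(weights, coords))
print(round(q_ng, 3), round(q_gravity, 3))
```

In this toy case the gravity-based score is lower than the Newman-Girvan score for the same partition, because much of the within-pair interaction is already explained by proximity.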

(2) Spatio-Temporal Analytics for Exploring Human Mobility Patterns and Urban Dynamics

In this research, we present a spatio-temporal analytical framework that includes spatio-temporal visualization (STV), space-time kernel density estimation (STKDE), and spatio-temporal autocorrelation analysis (STAA) to explore human mobility patterns and intra-urban communication dynamics. Experiments were conducted using large-scale detailed records of mobile phone calls in a city. The space-time path, time series graphs, vertical Bezier curves, STKDE, STAA, and related techniques in 3D GIS, as well as statistical tests, are suggested for different spatio-temporal analysis tasks. We also investigated several statistical measures that extend the classic spatial association indices to spatio-temporal autocorrelation analysis. The spatial order of the weight matrix was found to have a more significant effect than the temporal neighbors on the autocorrelation strength of hourly phone calls.
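As a rough illustration of STKDE, the sketch below evaluates a space-time density at one location and time using a product of Epanechnikov kernels. The kernel choice, the bandwidths `hs` (spatial) and `ht` (temporal), and the toy call events are assumptions made for this example, not settings reported in the study.

```python
from math import pi

def stkde(x, y, t, events, hs, ht):
    """Space-time kernel density at (x, y, t).

    events: list of (x_i, y_i, t_i); hs, ht: spatial and temporal bandwidths.
    Uses a 2D Epanechnikov kernel in space and a 1D one in time.
    """
    total = 0.0
    for xi, yi, ti in events:
        d2 = ((x - xi) ** 2 + (y - yi) ** 2) / hs ** 2  # squared scaled distance
        w = (t - ti) / ht                               # scaled time difference
        if d2 < 1 and abs(w) < 1:
            total += (2 / pi) * (1 - d2) * 0.75 * (1 - w * w)
    return total / (len(events) * hs ** 2 * ht)

# Toy events (made up): x and y in km, t in hours of the day.
events = [(0.0, 0.0, 9.0), (0.1, 0.0, 9.5), (5.0, 5.0, 20.0)]

near = stkde(0.0, 0.0, 9.0, events, hs=1.0, ht=2.0)
far = stkde(5.0, 5.0, 9.0, events, hs=1.0, ht=2.0)
print(near > far)
```

Evaluating the function over a regular grid of (x, y, t) voxels would give the density volume that is typically rendered in a 3D GIS view.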

(3) Spatio-Temporal-Network Visualization for Exploring Human Movements and Interactions in Physical and Virtual Spaces

Spatiotemporal visualization techniques are effective in detecting human activity patterns over space and time, while social network analysis is good for exploring graph structures and interactions among individual nodes and groups. Integrating these two types of methods can facilitate the exploration of complex social networks in space and time, and can help discover hidden spatiotemporal and social connections. In this research, we propose a novel conceptual framework for spatiotemporal and social network visualization in a three-dimensional context. Based on this framework, new spatio-temporal-network (STN) quantitative metrics (including STN-impact-extent, STN-impact-center, STN-distance, STN-efficiency, and STN-centrality) are introduced to measure the underlying dynamic interactions among entities. The proposed framework aims to help better understand spatiotemporal patterns of human dynamics and social interactions over both physical and virtual spaces simultaneously, as well as explore how emerging events trigger spatial-temporal-social interactions and information diffusion from a process perspective. As a proof of concept, we demonstrate the proposed framework with a case study using geotagged tweets and associated visualization in the ArcScene software. We hope that this research can stimulate new insights on integrating multidisciplinary knowledge to explore human dynamics in a broader way.

(4) Uncovering the digital divide and the physical divide using mobile phone data

In this research, we first aim to develop data analytics that can derive insights into how people from different regions communicate and connect via mobile phone calls and physical movements. We uncover the digital divide (geographical segregation of phone communication patterns) and the physical divide (geographical limits of human mobility) in a developing country. The research also demonstrates that the chosen spatial unit and temporal resolution can affect the community detection results of spatial interaction graphs when analyzing human mobility patterns and exploring urban dynamics in the mobile age. We find that daily detection generates a more stable partition structure than hourly detection, while monthly changes also exist over time. The presented framework can help identify patterns of spatial interaction in both cyberspace and physical space from phone call detail records in regions where census data acquisition is difficult, especially in some developing countries.

(5) Identifying Local Spatiotemporal Autocorrelation Patterns of Large-scale Taxi Pick-ups and Drop-offs

Analyzing spatiotemporal autocorrelation helps in understanding the underlying dynamic patterns in space and time simultaneously. In this work, we aim to extend the conventional spatial autocorrelation statistics to a more general framework that considers both spatial and temporal dimensions. Specifically, we focus on the spatiotemporal version of Getis-Ord’s G*. The new indicator, STG*, can quantify the local association of adjacent features in space and time. As a proof of concept, the proposed method is applied to a large-scale GPS-enabled taxi dataset to identify local spatiotemporal autocorrelation patterns of taxi pick-ups and drop-offs in New York City.
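One way to picture a space-time G* is to apply the standard Getis-Ord Gi* z-score to (cell, time step) features with a neighborhood that spans both dimensions. The sketch below assumes binary weights, a neighborhood made of the cell itself plus its spatial neighbors within `time_lag` steps, and made-up pick-up counts; the weighting scheme actually used for STG* may differ.

```python
from math import sqrt

def st_gi_star(counts, spatial_neighbors, time_lag=1):
    """Gi* z-scores over space-time features with binary weights.

    counts[i][t]: value for cell i at time step t.
    spatial_neighbors[i]: indices of the cells adjacent to cell i.
    """
    n_cells, n_times = len(counts), len(counts[0])
    values = [v for row in counts for v in row]
    n = len(values)
    mean = sum(values) / n
    s = sqrt(sum(v * v for v in values) / n - mean ** 2)
    z = [[0.0] * n_times for _ in range(n_cells)]
    for i in range(n_cells):
        cells = [i] + list(spatial_neighbors[i])
        for t in range(n_times):
            steps = range(max(0, t - time_lag), min(n_times, t + time_lag + 1))
            local = [counts[c][u] for c in cells for u in steps]
            w_sum = len(local)  # binary weights: sum(w) equals sum(w**2)
            numerator = sum(local) - mean * w_sum
            denominator = s * sqrt((n * w_sum - w_sum ** 2) / (n - 1))
            z[i][t] = numerator / denominator
    return z

# Toy pick-up counts (made up): 4 cells on a line, 4 hourly steps,
# with a burst of activity in cells 0 and 1 during the first two hours.
counts = [[10, 10, 1, 1],
          [10, 10, 1, 1],
          [1, 1, 1, 1],
          [1, 1, 1, 1]]
spatial_neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

z = st_gi_star(counts, spatial_neighbors)
print(z[0][0] > 1.96, z[3][3] < 0)
```

Large positive scores mark space-time hot spots and large negative scores mark cold spots. The sketch does not guard against a neighborhood that covers every feature or against constant input, both of which make the denominator zero.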

(6) Extracting urban functional regions from points of interest and human activities on location‐based social networks

Data about points of interest (POI) have been widely used in studying urban land use types and for sensing human behavior. However, it is difficult to quantify the correct mix or the spatial relations among different POI types indicative of specific urban functions. In this research, we develop a statistical framework to help discover semantically meaningful topics and functional regions based on the co‐occurrence patterns of POI types. The framework applies the latent Dirichlet allocation (LDA) topic modeling technique and incorporates user check‐in activities on location‐based social networks. Using a large corpus of about 100,000 Foursquare venues and user check‐in behavior in the 10 most populated urban areas of the US, we demonstrate the effectiveness of our proposed methodology by identifying distinctive types of latent topics and, further, by extracting urban functional regions using K‐means clustering and Delaunay triangulation spatial constraints clustering. We show that a region can support multiple functions but with different probabilities, while the same type of functional region can span multiple geographically non‐adjacent locations. Since each region can be modeled as a vector consisting of multinomial topic distributions, similar regions with regard to their thematic topic signatures can be identified. Compared with remote sensing images which mainly uncover the physical landscape of urban environments, our popularity‐based POI topic modeling approach can be seen as a complementary social sensing view on urban space based on human activities.

(7) Detecting Origin-Destination Mobility Flows From Geotagged Tweets

Human origin-to-destination (OD) trip information is of major importance in urban transportation modelling and infrastructure planning in order to optimize the use of street networks. The increasing use of social media like Twitter offers unprecedented opportunities to study individual activities and movements, to know where users are at which time, and what they are talking about. In this work, we quantitatively study the reliability of detecting regional OD trips from individual geotagged tweets in comparison with survey data, and explore the spatiotemporal flow patterns extracted from social media. We investigate whether OD trips mined from social media yield results comparable to those of expensive and labour-intensive large-scale studies. To do so, we derive large-scale OD trips from geotagged tweets, aggregate them, and compare the results by correlating them with the American Community Survey data.
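The derive, aggregate, and correlate steps can be sketched in a few lines. Everything specific here is an assumption for illustration: tweets are taken to be already assigned to a region, a trip is any pair of consecutive tweets by one user in different regions within a hypothetical `max_gap_hours`, the comparison uses Pearson correlation, and the survey counts are invented toy numbers rather than American Community Survey values.

```python
from collections import Counter, defaultdict
from math import sqrt

def extract_od_flows(tweets, max_gap_hours=4.0):
    """tweets: iterable of (user_id, time_in_hours, region_id)."""
    by_user = defaultdict(list)
    for user, ts, region in tweets:
        by_user[user].append((ts, region))
    flows = Counter()
    for records in by_user.values():
        records.sort()  # chronological order per user
        for (t0, r0), (t1, r1) in zip(records, records[1:]):
            if r0 != r1 and t1 - t0 <= max_gap_hours:
                flows[(r0, r1)] += 1
    return flows

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Toy geotagged tweets (made up).
tweets = [
    ("u1", 8.0, "A"), ("u1", 9.0, "B"), ("u1", 18.0, "A"),
    ("u2", 7.5, "A"), ("u2", 8.5, "B"),
    ("u3", 8.0, "B"), ("u3", 9.0, "C"),
]
flows = extract_od_flows(tweets)

# Toy survey flows (made up) for the same region pairs.
survey = {("A", "B"): 200, ("A", "C"): 10, ("B", "C"): 90}
pairs = sorted(survey)
r = pearson([flows.get(p, 0) for p in pairs], [survey[p] for p in pairs])
print(dict(flows), round(r, 3))
```

The return leg of user `u1` is dropped because the gap between the two tweets exceeds the threshold, which shows how strongly the trip definition shapes the extracted flows.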

(8) Exploring the uncertainty of activity zone detection from Geotagged Tweets with multi-scaled DBSCAN (M-DBSCAN)

When exploring human mobility patterns based on digital footprints captured from social media, the density-based spatial clustering of applications with noise (DBSCAN) method is often used to identify activity zones that an individual regularly visits. However, DBSCAN is sensitive to two parameters: the search radius of a cluster (eps) and the minimum number of points (minpts). This research first discusses the uncertainty in detecting an individual’s activity zones through digital footprints. An improved density-based clustering algorithm for mobility analysis, known as Multi-Scaled DBSCAN (M-DBSCAN), is then presented to mitigate the detection uncertainty of clusters produced by DBSCAN at different scales of density and cluster size. Next, we demonstrate that M-DBSCAN iteratively calibrates suitable local eps and minpts values, instead of using one global parameter setting as DBSCAN does, for detecting clusters of varying densities, and that it proves to be very effective for detecting potential activity zones (clusters) with the historic geotagged tweets of selected users. In addition, M-DBSCAN can significantly reduce the noise ratio (the proportion of trajectory points not included in any cluster) by identifying all points capturing the activities performed in each zone. Using the historic geotagged tweets of a large number of users in Madison, Wisconsin and Washington, D.C., the results of M-DBSCAN and DBSCAN with a minpts value of 4 and varying eps values reveal that: 1) M-DBSCAN can capture dispersed clusters with a low density of points, thereby detecting more activity zones for each user and resulting in a lower noise ratio; 2) a value of 40 m or higher should be used for eps in order to reduce the possibility of collapsing distinctive activity zones and to ensure a relatively low noise ratio during the clustering process; and 3) a value between 200 m and 300 m is recommended for eps when using DBSCAN to detect activity zones from geotagged tweets.
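The sensitivity of plain DBSCAN to eps, and its effect on the noise ratio, can be seen with a small sketch. This is a naive textbook DBSCAN with a single global parameter setting, not M-DBSCAN, and the coordinates (assumed to be projected, in metres) are made up: one dense zone, one dispersed zone, and one isolated point.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, minpts):
    """Naive O(n^2) DBSCAN; returns one cluster label per point (-1 = noise)."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < minpts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster       # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= minpts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels

def noise_ratio(labels):
    """Share of points that belong to no cluster."""
    return labels.count(NOISE) / len(labels)

# Toy footprints of one user (made up), in metres.
dense_zone = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5)]
dispersed_zone = [(1000, 1000), (1150, 1000), (1000, 1150), (1150, 1150)]
isolated = [(5000, 5000)]
points = dense_zone + dispersed_zone + isolated

ratio_small_eps = noise_ratio(dbscan(points, eps=50, minpts=4))
ratio_large_eps = noise_ratio(dbscan(points, eps=250, minpts=4))
print(ratio_small_eps, ratio_large_eps)
```

With the small radius the dispersed zone is labelled entirely as noise, while the larger radius recovers it as a second activity zone. A single global eps has to trade off these cases, which is the limitation that locally calibrated parameters are meant to address.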

(9) Reconstruction of human movement trajectories from large-scale low-frequency mobile phone data

Understanding human mobility is significant in many fields, such as urban planning, transportation, and sociology. Due to the wide spatiotemporal coverage and low operational cost, mobile phone data have been recognized as a major resource for human mobility research. However, due to conflicts between the data sparsity problem of mobile phone data and the requirement of fine-scale solutions, trajectory reconstruction is of considerable importance. Although there have been initial studies on this problem, existing methods rarely consider the effect of similarities among individuals and the temporal patterns of missing data. To address this issue, we propose a novel multi-criteria data partitioning trajectory reconstruction (MDP-TR) method for large-scale mobile phone data. In this research, a multi-criteria data partitioning (MDP) technique is used to measure the similarity among individuals in near real-time and investigate the spatiotemporal patterns of missing data. With this technique, the trajectory reconstruction from mobile phone data is then conducted with classic machine learning models. We verified the method using a real mobile phone dataset in a large city. Results indicate that the MDP-TR method outperforms competing methods in both accuracy and robustness. We argue that the MDP-TR method can be effectively utilized for grasping highly dynamic human movement status and improving the spatiotemporal resolution of human mobility research.

(10) Understanding neighborhood isolation through spatial interaction network analysis using location big data

Hidden biases of racial and socioeconomic preferences shape residential neighborhoods throughout the USA. Thereby, these preferences shape neighborhoods composed predominantly of a particular race or income class. However, the assessment of spatial extent and the degree of isolation outside the residential neighborhoods at large scale is challenging, which requires further investigation to understand and identify the magnitude and underlying geospatial processes. With the ubiquitous availability of location-based services, large-scale individual-level location data have been widely collected using numerous mobile phone applications and enable the study of neighborhood isolation at large scale. In this research, we analyze large-scale anonymized smartphone users’ mobility data in Milwaukee, Wisconsin, to understand neighborhood-to-neighborhood spatial interaction patterns of different racial classes. Several isolated neighborhoods are successfully identified through the mobility-based spatial interaction network analysis.

Figure: A dark-background version of the neighborhood isolation map, without a cartogram, for Milwaukee, Wisconsin.


Gao, S., Liu, Y., Wang, Y., & Ma, X. (2013). Discovering Spatial Interaction Communities from Mobile Phone Data. Transactions in GIS, 17(3), 463-481.

Gao, S. (2015). Spatio-temporal analytics for exploring human mobility patterns and urban dynamics in the mobile age. Spatial Cognition & Computation, 15(2), 86-114.

Gao, S., Janowicz, K., & Couclelis, H. (2017). Extracting urban functional regions from points of interest and human activities on location‐based social networks. Transactions in GIS, 21(3), 446-467.

Gao, S., Yan, B., Gong, L., Regalia, B., Ju, Y., & Hu, Y. (2017). Uncovering the digital divide and the physical divide in Senegal using mobile phone data. In Advances in Geocomputation (pp. 143-151). Springer, Cham.

Gao, S., Yang, J. A., Yan, B., Hu, Y., Janowicz, K., & McKenzie, G. (2014, September). Detecting origin-destination mobility flows from geotagged Tweets in greater Los Angeles area. In Eighth International Conference on Geographic Information Science (GIScience’14).

Gao, S., & Long, Y. (2015). Finding public transportation community structure based on large-scale smart card records in Beijing. In Long, Y., & Shen, Z. (Eds) Geospatial Analysis to Support Urban Planning in Beijing (pp. 155-167). Springer, Cham.

Gao, S., Zhu, R., & Mai, G. (2016, January). Identifying Local Spatiotemporal Autocorrelation Patterns of Taxi Pick-ups and Drop-offs. In International Conference on GIScience Short Paper Proceedings (Vol. 1, No. 1). (pp.109–113), Montreal, Canada, Sep.27-Sep.30, 2016. DOI:10.21433/B31104b2d8xp.

Gao, S., Chen, H., Luo, W., Hu, Y., & Ye, X. (2018). Spatio-Temporal-Network Visualization for Exploring Human Movements and Interactions in Physical and Virtual Spaces. In Shih-Lung Shaw and Daniel Sui (Eds.), Human Dynamics Research in Smart and Connected Communities (Chapter 4, pp. 67-80). Springer.

Lee, J. H., Gao, S., & Goulias, K. G. (2015). Can Twitter data be used to validate travel demand models. In 14th International Conference on Travel Behaviour Research.

Liu, X.Y., Huang, Q.Y., & Gao, S. (2019). Exploring the uncertainty of activity zone detection using digital footprints with multi-scaled DBSCAN. International Journal of Geographical Information Science. DOI: 10.1080/13658816.2018.1563301

Li, M.X., Gao, S., Lu, F., & Zhang, H.C. (2019). Reconstruction of human movement trajectories from large-scale low-frequency mobile phone data. Computers, Environment and Urban Systems. DOI: 10.1016/j.compenvurbsys.2019.101346

Prestby, T., App, J., Kang, Y., & Gao, S. (2019). Understanding Neighborhood Isolation through Spatial Interaction Network Analysis using Location Big Data. Environment and Planning A: Economy and Space. DOI: 10.1177/0308518X19891911