Prof. Gao joins the new NSF-funded AI Institute: ICICLE

Today, the U.S. National Science Foundation (NSF) announced the establishment of 11 new NSF National Artificial Intelligence Research Institutes. Each institute will receive $20 million, for a total NSF investment of $220 million. Building on the seven institutes funded in 2020, the new program is meant to broaden access to AI to solve complex societal problems.

Prof. Song Gao joins the Institute for Intelligent Cyberinfrastructure with Computational Learning in the Environment (ICICLE).

Led by The Ohio State University, ICICLE will build the next generation of Cyberinfrastructure to render Artificial Intelligence (AI) more accessible to everyone and drive its further democratization in the larger society.

ICICLE will build and prove its system around three use-inspired science application domains: smart foodsheds, digital agriculture, and animal ecology. Analogous to watersheds, foodsheds define the geographical and human elements that affect how, when and where food is grown and consumed. Digital agriculture seeks to use technology to improve the yield and efficiency of crops, while animal ecology focuses on the roles of animals in agriculture and the environment.

More information: https://icicle.ai/

Two COVID-19 research papers published in PNAS

1. Xiao Hou, Song Gao*, Qin Li*, Yuhao Kang, Nan Chen, Kaiping Chen, Jinmeng Rao, Jordan S. Ellenberg, Jonathan A. Patz (2021) Intracounty modeling of COVID-19 infection with human mobility: Assessing spatial heterogeneity with business traffic, age, and race. Proceedings of the National Academy of Sciences. June 15, 2021, 118 (24) e2020524118; DOI: 10.1073/pnas.2020524118

Abstract:

The COVID-19 pandemic is a global threat presenting health, economic, and social challenges that continue to escalate. Meta-population epidemic modeling studies in the susceptible–exposed–infectious–removed (SEIR) style have played important roles in informing public health policy making to mitigate the spread of COVID-19. These models typically rely on a key assumption on the homogeneity of the population. This assumption certainly cannot be expected to hold true in real situations; various geographic, socioeconomic, and cultural environments affect the behaviors that drive the spread of COVID-19 in different communities. What’s more, variation of intracounty environments creates spatial heterogeneity of transmission in different regions (e.g., varying peak infection timing). To address this issue, we develop a human mobility flow-augmented stochastic SEIR-style epidemic modeling framework with the ability to distinguish different regions and their corresponding behaviors. This modeling framework is then combined with data assimilation and machine learning techniques to reconstruct the historical growth trajectories of COVID-19 confirmed cases in two counties in Wisconsin. The associations between the spread of COVID-19 and business foot traffic, race and ethnicity, and age structure are then investigated. The results reveal that, in a college town (Dane County), the most important heterogeneity is age structure, while, in a large city area (Milwaukee County), racial and ethnic heterogeneity becomes more apparent. Scenario studies further indicate a strong response of the spread rate to various reopening policies, which suggests that policy makers may need to take these heterogeneities into account very carefully when designing policies for mitigating the ongoing spread of COVID-19 and reopening.
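For readers unfamiliar with SEIR-style metapopulation models, the following is a minimal deterministic sketch of how regions can be coupled through a mixing matrix. It is not the stochastic, mobility flow-augmented framework from the paper: the function name `simulate_seir`, the row-stochastic mixing matrix `M`, and all parameter values are assumptions chosen only for illustration.

```python
import numpy as np

def simulate_seir(S0, E0, I0, R0, beta, sigma, gamma, M, days):
    """Deterministic daily-step SEIR model for several coupled regions.

    beta  : per-region transmission rate (assumed values, not fitted)
    sigma : rate of leaving the exposed state (1 / incubation period)
    gamma : rate of leaving the infectious state (1 / infectious period)
    M     : row-stochastic matrix; M[i, j] is the assumed share of region
            i's contacts that happen with residents of region j
    """
    S, E, I, R = (np.asarray(x, dtype=float) for x in (S0, E0, I0, R0))
    beta = np.asarray(beta, dtype=float)
    M = np.asarray(M, dtype=float)
    N = S + E + I + R  # population of each region stays constant
    history = [(S, E, I, R)]
    for _ in range(days):
        # Each region is exposed to a mobility-weighted mix of prevalences.
        force = beta * (M @ (I / N))
        new_exposed = force * S
        new_infectious = sigma * E
        new_removed = gamma * I
        S = S - new_exposed
        E = E + new_exposed - new_infectious
        I = I + new_infectious - new_removed
        R = R + new_removed
        history.append((S, E, I, R))
    return np.array(history)  # shape: (days + 1, 4, n_regions)

# Two hypothetical regions; the outbreak is seeded only in the first one.
trajectory = simulate_seir(
    S0=[9990, 20000], E0=[0, 0], I0=[10, 0], R0=[0, 0],
    beta=[0.4, 0.3], sigma=0.2, gamma=0.1,
    M=[[0.9, 0.1], [0.2, 0.8]], days=120,
)
peak_day = trajectory[:, 2, :].argmax(axis=0)  # day of peak infections per region
```

Giving each region its own `beta` and its own row of `M` is what lets a sketch like this produce different peak timings across regions, the kind of spatial heterogeneity the abstract describes.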

2. Xiaoyi Han, Yilan Xu, Linlin Fan, Yi Huang, Minhong Xu, Song Gao. (2021) Quantifying COVID-19 importation risk in a dynamic network of domestic cities and international countries. Proceedings of the National Academy of Sciences. August 3, 2021, 118 (31) e2100201118; DOI: 10.1073/pnas.2100201118

Abstract:

Since its outbreak in December 2019, the novel coronavirus 2019 (COVID-19) has spread to 191 countries and caused millions of deaths. Many countries have experienced multiple epidemic waves and faced containment pressures from both domestic and international transmission. In this study, we conduct a multiscale geographic analysis of the spread of COVID-19 in a policy-influenced dynamic network to quantify COVID-19 importation risk under different policy scenarios using evidence from China. Our spatial dynamic panel data (SDPD) model explicitly distinguishes the effects of travel flows from the effects of transmissibility within cities, across cities, and across national borders. We find that within-city transmission was the dominant transmission mechanism in China at the beginning of the outbreak and that all domestic transmission mechanisms were muted or significantly weakened before importation posed a threat. We identify effective containment policies by matching the change points of domestic and importation transmissibility parameters to the timing of various interventions. Our simulations suggest that importation risk is limited when domestic transmission is under control, but that cumulative cases would have been almost 13 times higher if domestic transmissibility had resurged to its precontainment level after importation and 32 times higher if domestic transmissibility had remained at its precontainment level since the outbreak. Our findings provide practical insights into infectious disease containment and call for collaborative and coordinated global suppression efforts.

GeoDS Lab students’ industry internship experience

Besides schoolwork, students in the GeoDS Lab also have the opportunity to work as interns in geospatial industries over the summer. They can apply the Cartography/GIS/Spatial Data Science knowledge and skills learned at school to solve real-world problems and build a better understanding of which knowledge and skills make a difference. In this post, two students, Yunlei Liang and Jinmeng Rao, share their summer 2020 internship experiences.

In addition, please join us in congratulating lab members and alumni Yuhao Kang (Google X), Jake Kruse (Arity, Allstate), Jinmeng Rao (Google X), and Timothy Prestby (Apple Maps), who will take up 2021 summer internships.

Yunlei Liang:

Last summer, I worked as a Data Science Intern at Arity, a mobility data and analytics company under Allstate. I was very lucky to work on two teams. On the first team, I worked on understanding the impact of COVID-19 on user trajectories and analyzing how the model and statistics changed because of the reduced travel. On the second team, I was responsible for evaluating Points of Interest (POIs) from different vendors. I matched their classifications and locations, assessed the coverage quality, assigned scores to each vendor, and produced a recommendation report for the team.

Through this 12-week internship, I learned a lot of technical skills, which also helped me realize what knowledge I should improve once back at school. The cross-team experience taught me how to work in a team. It was very different from what I did in school. In a company, I am expected to communicate with different people: my mentor, my teammates, and people from other teams. Understanding what others are doing is extremely important because collaboration is common, and people always help each other by discussing solutions to various problems. Being active and always reaching out to others are my main takeaways from this internship. I also gained similar experience from my previous internship on the Data Science team at WeWork Inc.

Jinmeng Rao:

Last summer, I worked as a Geospatial Vision Intern at Sturfee Inc., a spatial intelligence company focusing on Visual Positioning Service (VPS), to design and implement computer vision algorithms and toolkits on geospatial data (e.g., street/satellite view images, GPS traces) to improve city-scale AR experience.

During my 3-month internship at Sturfee, our team developed a cross-view Perspective-n-Point (PnP) aligner tool for estimating and refining camera pose based on satellite images and street view images. My main tasks were to design an efficient algorithm to synthesize aerial view images from street view images and to integrate the algorithm into the tool. After the integration, the camera pose estimation accuracy improved significantly, and the PnP aligner tool became much easier to use. I also worked on designing a grid-based keypoint matching algorithm to automatically find matching points between two different views and search for the best camera pose accordingly.

My internship experience at Sturfee was great and fruitful. As an intern, I had a chance to learn state-of-the-art industrial solutions, and I got a general picture of what the industry cares about most. The biggest takeaway for me is that I learned how to apply my skills to solve real-world problems in the industry. I believe my experience at Sturfee will help me do better in research or work in the future.

Prof. Gao joins the Editorial Board of CaGIS and Scientific Reports

Recently, Prof. Gao was invited to serve on the Editorial Board for the following two journals:

Cartography and Geographic Information Science (CaGIS) is the official publication of the Cartography and Geographic Information Society. The Society supports research, education, and practices that improve the understanding, creation, analysis, and use of maps and geographic information. The CaGIS journal implements the objectives of the Society by publishing authoritative peer-reviewed articles that report on innovative research in cartography and geographic information science.

Scientific Reports (Nature) is an open access journal publishing original research from across all areas of the natural sciences.

A Five-Star Guide for Achieving Replicability and Reproducibility When Working with GIS Software and Algorithms

Reference: John P. Wilson, Kevin Butler, Song Gao, Yingjie Hu, Wenwen Li & Dawn J. Wright (2020) A Five-Star Guide for Achieving Replicability and Reproducibility When Working with GIS Software and Algorithms. Annals of the American Association of Geographers, DOI: 10.1080/24694452.2020.1806026

Abstract: The availability and use of geographic information technologies and data for describing the patterns and processes operating on or near the Earth’s surface have grown substantially during the past fifty years. The number of geographic information systems software packages and algorithms has also grown quickly during this period, fueled by rapid advances in computing and the explosive growth in the availability of digital data describing specific phenomena. Geographic information scientists therefore increasingly find themselves choosing between multiple software suites and algorithms to execute specific analysis, modeling, and visualization tasks in environmental applications today. This is a major challenge because it is often difficult to assess the efficacy of the candidate software platforms and algorithms when used in specific applications and study areas, which often generate different results. The subtleties and issues that characterize the field of geomorphometry are used here to document the need for (1) theoretically based software and algorithms; (2) new methods for the collection of provenance information about the data and code along with application context knowledge; and (3) new protocols for distributing this information and knowledge along with the data and code. This article discusses the progress and enduring challenges connected with these outcomes.

New Protocols for Distributing the Data and Code of Geospatial Research

Here, we propose a five-star practical guide for sharing data and code in geospatial research, modeled after the five-star system offered by Berners-Lee (2009) for publishing linked open data on the Web. Instead of asking researchers to share all pieces of data and code, this five-star guide encourages a simple start of data and code sharing, and researchers can move to a higher level when time and other resources allow.

See more papers on the Forum on Reproducibility and Replicability in Geography.

Prof. Gao receives a new geospatial data science research grant

The American Family Insurance Data Science Institute (AFIDSI) is honored to announce the results of the new round of the American Family Funding Initiative, a research competition for data science projects. American Family Insurance has partnered with UW–Madison through the Institute to offer “mini grants” of $75k to $150k per year for data science research. This is the second installment of a $10 million research agreement.

The goal of the American Family Funding Initiative is to stimulate and support highly innovative research. The successful projects, reviewed by faculty and staff from across the UW–Madison campus, were evaluated based on their potential contributions to the field of data science, practical use, and the novelty of their approaches.

AFIDSI brings people together to launch new research in data science and apply findings to solve problems. In collaboration with researchers across campus and beyond, AFIDSI focuses on the fundamentals of data science research and on translating that research into practice.

New projects funded in the second round of the American Family Funding Initiative include:

A Deep Learning Approach to User Location Privacy Protection
Principal Investigator: Song Gao, Assistant Professor of Geography.
Co-Principal Investigator: Jerry Zhu, Computer Sciences.

Location information is among the most sensitive data being collected by mobile apps, and users increasingly raise privacy concerns. The proposed research aims to develop a deep learning architecture that will protect users’ location privacy while keeping the capability for location-based business recommendations such as usage-based insurance (UBI).

Machine Learning Approaches for Metadata Standardization
Principal Investigator: Colin Dewey, Professor of Biostatistics and Medical Informatics.
Co-Principal Investigator: Mark Craven, Biostatistics and Medical Informatics.

The need to manually standardize metadata describing records in large data sets, compiled from many sources, is a major bottleneck in both research and business. This project will develop machine learning approaches for automating metadata standardization and identifying records that would most benefit from expert human input.

Adaptive Operations Research and Data Modeling for Insurance Applications
Principal Investigator: Michael Ferris, Professor of Computer Sciences.

Insurance claims applications must be operated efficiently under normal conditions and allow for rapid reconfiguration in crisis situations. The proposed work will develop optimization models, data and solution processes to schedule resources over time, servicing normal workloads, while creating resilience to abrupt changes from random disturbances.

GAN-mixup: A New Approach to Improve Generalization in Machine Learning
Principal Investigator: Kangwook Lee, Assistant Professor of Electrical and Computer Engineering.
Co-Principal Investigator: Dimitris Papailiopoulos, Electrical and Computer Engineering.

Recent machine learning successes rely on predictive models that adapt to previously unseen data. This research will provide a new approach to improve such generalization, with provable performance guarantees.

Integer Programming for Mixture Matrix Completion
Principal Investigator: Jeff Linderoth, Professor of Industrial and Systems Engineering.
Co-Principal Investigators: Jim Luedtke, Industrial and Systems Engineering; Daniel Pimentel-Alarcon, Biostatistics and Medical Informatics.

Matrix completion, or filling in the unknown entries in a matrix, is used in applications such as recommender systems and systems for analyzing visual images. This project will apply integer programming techniques to develop algorithms for solving a mixture matrix completion problem, paving the way towards applying this method to large-scale data science problems.

Developing a State-of-the-Science Regional Weather Forecasting System
Principal Investigator: Michael Morgan, Professor of Atmospheric and Oceanic Sciences.
Co-Principal Investigator: Brett Hoover, Space Science and Engineering Center.

This project will develop a weather prediction system for American Family Insurance, run entirely in cloud computing infrastructure, that will improve the accuracy of forecasting hazards such as hail and hurricanes. The probabilistic system will also estimate the uncertainty associated with the predictability of hazardous weather.

Model Recycling: Accelerating Machine Learning by Re-using Past Computations
Principal Investigator: Shivaram Venkataraman, Assistant Professor of Computer Sciences.
Co-Principal Investigator: Dimitris Papailiopoulos, Electrical and Computer Engineering.

Training machine learning models that are used in a wide range of domains, from drug discovery to recommendation engines, takes significant time and resources. This project will automate and accelerate this process of fine-tuning by reusing and sharing past computations from prior training jobs, using a technique called model recycling.

Additionally, two projects from the first round received continued funding:

Question Asking with Differing Knowledge and Goals
Principal Investigator: Joe Austerweil, Assistant Professor of Psychology.

Despite tremendous progress in machine learning, automated answers to questions are still inferior to answers from humans. This project investigates whether incorporating psycholinguistic factors that influence how people respond to language can improve automated question-answering methods.

Lightweight Natural Language and Vision Algorithms for Data Analysis
Principal Investigator: Vikas Singh, Professor of Biostatistics and Medical Informatics.
Collaborators: Zhanpeng Zeng, Computer Sciences; Shailesh Acharya and Glenn Fung, American Family Insurance.

Natural language processing is a form of artificial intelligence that helps computers read and understand human language. The overarching goal of this project is to accelerate the time it takes to train and test efficient, accurate natural language processing models.

National Fellowships Engage Geospatial Research And Education On COVID-19

Projects address human mobility patterns, access to health care and food systems, and racial and disability disparities during the pandemic.

The Geospatial Software Institute (GSI) Conceptualization Project has announced 16 fellowships to researchers at 13 institutions to tackle COVID-19 challenges using geospatial software and advanced capabilities in cyberinfrastructure and data science. Prof. Song Gao was selected as one of the geospatial fellows. A full list of the fellows, with biographies and project information, is at https://gsi.cigi.illinois.edu/geospatial-fellows-members/.

The GSI Conceptualization Project is supported by the National Science Foundation (NSF), and carried out in partnership with the American Association of Geographers (AAG), Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI), the National Opinion Research Center (NORC) at the University of Chicago, Open Geospatial Consortium (OGC), and University Consortium for Geographic Information Science (UCGIS). Technical and cyberinfrastructure support are provided by the CyberGIS Center for Advanced Digital and Spatial Studies (CyberGIS Center) at the University of Illinois at Urbana-Champaign.

“The COVID-19 crisis has shown how critical it is to have cutting-edge geospatial software and cyberinfrastructure to tackle the pandemic’s many challenges,” said Shaowen Wang, the principal investigator of the NSF project and founding director of the CyberGIS Center. “We are extremely grateful for NSF’s support to fund this talented group of researchers, whose work is so diverse yet complementary.”

Michael Goodchild, chair of the NSF project advisory committee and professor emeritus in geography at UC-Santa Barbara, agreed. “Geospatial data and tools have enormous potential for helping us address the challenges of COVID-19, and these 16 Fellows have exactly the right qualifications and experience. I’m very excited to see what they are able to achieve.”

The Fellows come from varied professional, cultural, and institutional backgrounds, representing many disciplinary areas, including public health, food access, emergency management, housing and neighborhood change, and community-based mapping. The fellowship projects represent frontiers of emerging geospatial data science, including, for example, geospatial AI and deep learning, geovisualization, and advanced approaches to gathering and analyzing geospatial data.

Pioneered by multi-million dollar research funded by NSF, cyberGIS (i.e., cyber geographic information science and systems based on advanced computing and cyberinfrastructure) has emerged as a new generation of GIS, comprising a seamless integration of advanced cyberinfrastructure, GIS, and spatial analysis and modeling capabilities while leading to widespread research advances and broad societal impacts. Built on the progress made by cyberGIS-related communities, the GSI conceptualization project is charged with developing a strategic plan for a long-term hub of excellence in geospatial software infrastructure, one that can better address emergent issues of food security, ecology, emergency management, environmental research and stewardship, national security, public health, and more.

The Geospatial Fellows program will enable diverse researchers and educators to harness geospatial software and data at scale, in reproducible and transparent ways; and will contribute to the nation’s workforce capability and capacity to utilize geospatial big data and software for knowledge discovery. With a particular focus on COVID-19, the combined research findings of the Fellows will offer insight on how to make geospatial research computationally reproducible and transparent, while also developing novel methods, including analysis, simulation, and modeling, to study the spread and impacts of the virus. The Fellows’ research will substantially add to public understanding of the societal impacts of COVID-19 on different communities, assessing the social and spatial disparities of COVID-19 among vulnerable populations.

“I look forward to seeing the results of these projects, particularly as FAIR and open datasets, software, and models that others can then build on,” says Daniel S. Katz, Assistant Director for Scientific Software and Applications at the National Center for Supercomputing Applications (NCSA), the University of Illinois.

For more information about the GSI conceptualization project, see their website: https://gsi.cigi.illinois.edu/.

For a list of Geospatial Fellows and their projects, visit https://gsi.cigi.illinois.edu/geospatial-fellows-members/.

Location Big Data for Business Analytics

Reference: Yunlei Liang, Song Gao, Yuxin Cai, Natasha Z. Foutz, Lei Wu. (2020) Calibrating the dynamic Huff model for business analysis using location big data. Transactions in GIS, 24(3), 681-703.

Abstract: The Huff model has been widely used in location‐based business analysis to delineate a trade area containing a store’s potential customers. Calibrating the Huff model and its extensions requires empirical location visit data. Many studies rely on labor‐intensive surveys. With the increasing availability of mobile devices, users in location‐based platforms share rich multimedia information about their locations at a fine spatio‐temporal resolution, which offers opportunities for business intelligence. In this research, we present a time‐aware dynamic Huff model (T‐Huff) for location‐based market share analysis and calibrate this model using large‐scale store visit patterns based on mobile phone location data across the 10 most populated US cities. By comparing the hourly visit patterns of two types of stores, we demonstrate that the calibrated T‐Huff model is more accurate than the original Huff model in predicting the market share of different types of business (e.g., supermarkets versus department stores) over time. We also identify the regional variability where people in large metropolitan areas with a well‐developed transit system show less sensitivity to long‐distance visits. In addition, several socioeconomic and demographic factors (e.g., median household income) that potentially affect people’s visit decisions are examined and summarized.

The Whole Foods Markets in Los Angeles with their temporal visit probability.
The spatial distributions of CBGs that have visit flows to five Whole Foods Markets.
The probability density distribution, empirical cumulative distribution, and log-log plots of visitors’ distance from home to supermarkets and grocery stores (NAICS: 445110) and to department stores (NAICS: 452210) in the top 10 most populated cities in the US.
(a) Estimated market share of five Whole Foods Market stores in Los Angeles using the original Huff model; and (b) Actual market share derived from the SafeGraph visit database.
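As a rough illustration of the original Huff model mentioned in the abstract and captions above, the sketch below computes visit probabilities as attractiveness divided by a power of distance, normalized across competing stores, and then aggregates them into market shares. The function names, the exponents `alpha` and `beta`, and the toy numbers are assumptions for illustration; this is not the calibrated T-Huff model from the paper.

```python
import numpy as np

def huff_probabilities(attractiveness, distances, alpha=1.0, beta=2.0):
    """Original Huff model: P[i, j] is the probability that a customer
    from origin i (e.g., a census block group) visits store j.

    attractiveness : length-m array, one value per store
    distances      : n x m array of origin-to-store distances (> 0)
    alpha, beta    : attractiveness and distance-decay exponents
                     (illustrative defaults, not calibrated values)
    """
    attractiveness = np.asarray(attractiveness, dtype=float)
    distances = np.asarray(distances, dtype=float)
    utility = attractiveness[np.newaxis, :] ** alpha / distances ** beta
    # Normalize each origin's utilities so its probabilities sum to 1.
    return utility / utility.sum(axis=1, keepdims=True)

def market_share(probabilities, population):
    """Population-weighted share of expected visits captured by each store."""
    population = np.asarray(population, dtype=float)
    expected_visits = population @ probabilities
    return expected_visits / expected_visits.sum()

# Toy example: two origins, two equally attractive stores.
P = huff_probabilities(attractiveness=[1.0, 1.0],
                       distances=[[1.0, 2.0], [2.0, 1.0]])
shares = market_share(P, population=[100, 100])
```

Calibration, as the abstract describes it, amounts to choosing parameters such as `alpha` and `beta` so that predicted shares match observed visit data. A time-aware variant would additionally let the probabilities vary by hour, which this sketch does not attempt.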

New Research Paper on Trajectory Privacy Protection accepted in GIScience 2021

Reference: Rao, J., Gao, S., Kang, Y., & Huang, Q. (2020). LSTM-TrajGAN: A Deep Learning Approach to Trajectory Privacy Protection. In the Proceedings of the 11th International Conference on Geographic Information Science (GIScience 2021), No. 12; pp. 12:1–12:17. DOI: 10.4230/LIPIcs.GIScience.2021.12

Abstract: The prevalence of location-based services contributes to the explosive growth of individual-level trajectory data and raises public concerns about privacy issues. In this research, we propose a novel LSTM-TrajGAN approach, which is an end-to-end deep learning model to generate privacy-preserving synthetic trajectory data for data sharing and publication. We design a loss metric function TrajLoss to measure the trajectory similarity losses for model training and optimization. The model is evaluated on the trajectory-user-linking task on a real-world semantic trajectory dataset. Compared with other common geomasking methods, our model can better prevent users from being re-identified, and it also preserves essential spatial, temporal, and thematic characteristics of the real trajectory data. The model better balances the effectiveness of trajectory privacy protection and the utility for spatial and temporal analyses, which offers new insights into the GeoAI-powered privacy protection.
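To give a sense of what a simple geomasking method looks like, here is a minimal sketch of random perturbation, which moves each point by a random offset within a fixed radius. It is not LSTM-TrajGAN, and whether it matches the exact baselines used in the paper is not stated here. The function name `random_perturbation` and the assumption that coordinates are already projected to meters are illustrative choices.

```python
import numpy as np

def random_perturbation(xy, max_radius, seed=None):
    """Displace each point by a random offset drawn uniformly from a disk.

    xy         : n x 2 array of projected coordinates (assumed to be meters)
    max_radius : largest allowed displacement, in the same units
    seed       : optional seed for reproducible masking
    """
    rng = np.random.default_rng(seed)
    xy = np.asarray(xy, dtype=float)
    n = xy.shape[0]
    angle = rng.uniform(0.0, 2.0 * np.pi, size=n)
    # The square root makes offsets uniform over the disk's area
    # instead of clustering near the original location.
    radius = max_radius * np.sqrt(rng.uniform(0.0, 1.0, size=n))
    offset = np.column_stack((radius * np.cos(angle), radius * np.sin(angle)))
    return xy + offset

# Mask a short hypothetical trajectory with up to 200 m of displacement.
trajectory = np.array([[0.0, 0.0], [150.0, 40.0], [320.0, 90.0]])
masked = random_perturbation(trajectory, max_radius=200.0, seed=42)
```

A method like this perturbs each point independently, so it ignores the temporal and semantic structure of a trajectory. Preserving that structure is the gap that a generative approach such as the one in the abstract aims to address.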