Sampling Bias in the Cuebiq Dataset across MSAs

We study the bias in mobility data across 11 Metropolitan Statistical Areas (MSAs) in the U.S.: New York NY, Los Angeles CA, Boston MA, Seattle WA, Baltimore MD, Tulsa OK, Fresno CA, Tyler TX, Champaign-Urbana IL, Sebring-Avon Park FL, and Cheyenne WY. These MSAs were selected to cover diverse socioeconomic and racial compositions.

Based on detected home locations, we analyze sampling bias using the ratio between the number of devices and the population. We assume that there is no bias among population groups within the same block group. The violin plot below visualizes 6 MSAs with a population over 500,000. In Los Angeles, CA, both the White population and other racial groups are under-represented in our dataset, but the overall bias for other racial groups, at -9.60%, is far larger than that for the White population, at -1.04%. A similar pattern appears in Fresno, CA. In Boston, MA, and Seattle, WA, the pattern is reversed: the White population is less represented than other races. In Tulsa, OK, both White and other racial groups are over-represented. In Baltimore, MD, the bias in the two groups appears balanced once outliers are excluded.
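As a rough illustration of the ratio-based analysis above, the Python sketch below shows one way such a group-level bias could be computed. It is not the exact procedure behind the reported figures. The bias formula (a group's sampling rate relative to a reference rate, minus one), the reference_rate parameter, the BlockGroup fields, and the toy numbers are all assumptions introduced here for illustration. The only element taken from the text is the assumption of no bias within a block group, which lets us split each block group's devices across racial groups in proportion to their population shares.

```python
from dataclasses import dataclass


@dataclass
class BlockGroup:
    """One census block group (field names are hypothetical)."""
    devices: int    # devices whose detected home falls in this block group
    pop_white: int  # White residents
    pop_other: int  # residents of all other racial groups


def group_bias(block_groups, reference_rate):
    """Relative bias per group, defined here (as an assumption) as
    (estimated devices / population) / reference_rate - 1.

    Negative values mean under-representation relative to reference_rate.
    """
    est_devices = {"white": 0.0, "other": 0.0}
    population = {"white": 0, "other": 0}
    for bg in block_groups:
        total = bg.pop_white + bg.pop_other
        if total == 0:
            continue  # no residents to allocate devices to
        # No bias within a block group: allocate devices by population share.
        est_devices["white"] += bg.devices * bg.pop_white / total
        est_devices["other"] += bg.devices * bg.pop_other / total
        population["white"] += bg.pop_white
        population["other"] += bg.pop_other
    return {
        group: (est_devices[group] / population[group]) / reference_rate - 1.0
        for group in population
        if population[group] > 0
    }


# Toy, made-up numbers purely to exercise the function.
toy_block_groups = [
    BlockGroup(devices=50, pop_white=800, pop_other=200),
    BlockGroup(devices=30, pop_white=200, pop_other=800),
]
toy_bias = group_bias(toy_block_groups, reference_rate=0.04)
for group, value in toy_bias.items():
    print(f"{group}: {value:+.1%}")
```

Measuring each group against an external reference rate, rather than against its share of the sample, is what allows both groups in an MSA to come out under-represented or over-represented at the same time, as in the Los Angeles and Tulsa cases described above. Which reference rate is appropriate (for example, an overall rate across all studied MSAs) is left open in this sketch.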

Community Call Plan

The team plans to engage stakeholders and collaborating labs to develop the community call and issue it to mobility labs in the US and around the world. The community call will specify a set of mobility metrics to be submitted, along with a set of information describing the data and the techniques used. The call will identify a due date by which participating labs shall submit their results to the PI team. After receiving the submissions, the PI team will conduct a meta-analysis of them. The results of the meta-analysis will be presented to all participating labs in a virtual meeting, which stakeholders will also attend. The culminating event of the research project is an in-person whole-community workshop, to be held at the University of Washington, involving all participating labs and stakeholders. This workshop has two important purposes: 1) to report back the findings of this community-coordinated effort and formulate, with all participating labs and stakeholders, recommendations for the research community and for the stakeholders (policymakers); and 2) to develop a result dissemination mechanism involving all participating labs and stakeholders. Planned dissemination methods include manuscripts, white papers, short essays for non-technical audiences, policy briefs, and short videos.

We expect to release the community call in late 2022 or early 2023. Please check back.