Sampling Bias in the Cuebiq Dataset Across MSAs

We study the bias in mobility data across 11 Metropolitan Statistical Areas (MSAs) in the U.S.: New York NY, Los Angeles CA, Boston MA, Seattle WA, Baltimore MD, Tulsa OK, Fresno CA, Tyler TX, Champaign-Urbana IL, Sebring-Avon Park FL, and Cheyenne WY. These MSAs were selected to cover diverse socioeconomic and racial compositions.

Based on detected home locations, we analyze sampling bias using the ratio between the number of devices and the population. Here we assume that there is no bias among population groups within the same block group. The violin plot below shows the 6 MSAs with a population over 500,000. In Los Angeles, CA, both the White population and other racial groups are under-represented in our dataset, but the total bias for other racial groups reaches -9.60%, much larger in magnitude than the -1.04% for the White population. A similar pattern exists in Fresno, CA. In Boston, MA, and Seattle, WA, the pattern is reversed: the White population is less represented than other races. In Tulsa, OK, both White and other racial groups are over-represented. In Baltimore, MD, the bias in the two groups appears balanced when outliers are excluded.
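To make the device-to-population ratio concrete, here is a minimal Python sketch of one way such a bias figure could be computed. It is an illustration under stated assumptions, not necessarily the exact procedure behind the numbers above. The names BlockGroup, group_bias, and reference_rate are hypothetical, the toy numbers are invented, and the choice of baseline (a single reference sampling rate that every group is compared against) is an assumption. Following the no-within-block-group-bias assumption, each block group's devices are split across racial groups in proportion to that block group's population.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BlockGroup:
    devices: int                # devices whose detected home is in this block group
    population: Dict[str, int]  # population per racial group (hypothetical keys)


def group_bias(block_groups: List[BlockGroup], reference_rate: float) -> Dict[str, float]:
    """Relative bias per group: estimated devices / expected devices - 1.

    ASSUMPTION: within a block group, devices are spread across groups in
    proportion to population, so every group there shares the same rate.
    ASSUMPTION: reference_rate is the sampling rate an unbiased sample would
    have (for example, total devices / total population over all areas).
    """
    estimated: Dict[str, float] = {}
    expected: Dict[str, float] = {}
    for bg in block_groups:
        total_pop = sum(bg.population.values())
        if total_pop == 0:
            continue  # no residents, so no ratio can be formed
        rate = bg.devices / total_pop  # devices per resident in this block group
        for group, pop in bg.population.items():
            estimated[group] = estimated.get(group, 0.0) + rate * pop
            expected[group] = expected.get(group, 0.0) + reference_rate * pop
    # Negative values mean under-representation, positive over-representation.
    return {g: estimated[g] / expected[g] - 1.0 for g in estimated if expected[g] > 0}


# Toy data, invented for illustration only.
toy = [
    BlockGroup(devices=30, population={"white": 800, "other": 200}),
    BlockGroup(devices=10, population={"white": 100, "other": 900}),
]
bias = group_bias(toy, reference_rate=0.03)
for group, value in bias.items():
    print(f"{group}: {value:+.2%}")
```

Because the baseline is an external reference rate rather than the MSA's own average, both groups can come out negative at once, as in the Los Angeles case above. A different baseline would shift every value, so the choice should be reported alongside the results.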
