Tricera-topping the Charts: Taking a Closer Look at US State Fossils (and US Fossil Records in General)

Project Overview

For this project, I was originally interested in exploring the Paleobiology Database and comparing their datasets to the current list of official US State fossils to see if there are any patterns to be found in the selection of an official state fossil (i.e., most common fossil discovered in the state, unique fossil to that state, etc.).

I’ve known for most of my life that my home state, Colorado, has the Stegosaurus as its official fossil, even though most of the state used to be an ancient seabed (I once found a fish fossil on a 2nd grade field trip near my house). It wasn’t until this project, though, that I realized how little I knew about Washington’s prehistoric past or about how official state fossils are selected.

In discussing my potential project idea with my peers and family members, the most common response I received before finishing my explanation was “ohhh, what’s the state fossil for ___?”, immediately followed by excited internet searches. Taking this as an indicator of success, I hoped to use R to answer the following questions:

    • What are the most commonly reported fossils in the PBDB dataset?
    • What is the most popular (i.e., most commonly shared) state fossil?
    • Which states are the most commonly shared state fossils found in?

While I did eventually come up with an answer for each of these, examining datasets to get those answers ended up being a lot more complicated than I originally anticipated.

 

The Paleobiology Database Dataset

This project uses open-access data available from The Paleobiology Database. According to their website, “The Paleobiology Database (PBDB) is a non-governmental, non-profit public resource for paleontological data. It has been organized and operated by a multi-disciplinary, multi-institutional, international group of paleobiological researchers”.

Fossil occurrences are added to The Paleobiology Database by nearly 400 scientists from over 130 institutions in 24 countries, all of whom have been approved to add to the collection by the PBDB Executive Committee.

In order to limit the scope of my dataset to the United States and obtain location information, I used the following options on their data download page:

  • Download: Specimens, csv, show all available parameters
  • Select by location: Country = United States
  • Output options: Additional output blocks = location

With these filters, the resulting dataset is roughly 32,700 lines and 35 columns and can be downloaded here: https://paleobiodb.org/data1.2/specs/list.csv?datainfo&rowcount&cc=US&show=loc.
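As a minimal sketch of how those download options map onto the link above, the query string can be assembled from its parts. This is written in Python for illustration rather than the R used for the project, and the function and argument names are hypothetical. Only the endpoint and the `cc`, `show`, `datainfo`, and `rowcount` pieces come from the link itself.

```python
from urllib.parse import urlencode

# Endpoint copied from the download link above.
BASE_URL = "https://paleobiodb.org/data1.2/specs/list.csv"


def build_download_url(country_code="US", output_block="loc"):
    """Assemble the specimen-download URL (hypothetical helper)."""
    # Valueless switches that appear in the original link.
    flags = ["datainfo", "rowcount"]
    # cc = country filter, show = additional output block.
    params = urlencode({"cc": country_code, "show": output_block})
    return f"{BASE_URL}?{'&'.join(flags)}&{params}"


url = build_download_url()
print(url)
```

The resulting string can then be handed to whatever CSV reader is being used for the analysis.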

 

Disclaimer:

Just from looking at the download generator page, it is clear that this is a database created and organized by knowledge experts. The page allows filtering for things that I, a lifelong casual paleontology fan, have never even heard of before. I am certain that there are industry standards or practices that I am unaware of that influence how the data is currently being presented. Someone who has more familiarity with the vocabulary in this field would likely be able to see additional trends that I might miss, and catch if I am making an assumption about the data born from inexperience.

From my initial perusal of this specific dataset, however, I feel confident that I understand which columns contain the state, the species, and the specimen ID. That should allow me to use the information without accidentally counting duplicates (such as multiple bones from the same specimen being listed on separate lines).
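The duplicate concern can be illustrated with a small sketch that keeps one row per specimen ID before anything is counted. It is Python rather than the project's R, and the column names (`specimen_no`, `accepted_name`, `state`) and sample rows are assumptions made up for the example, not values taken from the actual download.

```python
import csv
import io

# Tiny stand-in for the downloaded file. Column names and rows are
# hypothetical and only illustrate the shape of the problem.
SAMPLE_CSV = """specimen_no,accepted_name,state
1001,Taxon A,Colorado
1001,Taxon A,Colorado
1002,Taxon A,Colorado
1003,Taxon B,Texas
"""


def unique_specimens(rows, id_column="specimen_no"):
    """Keep the first row seen for each specimen ID."""
    seen = set()
    unique = []
    for row in rows:
        if row[id_column] not in seen:
            seen.add(row[id_column])
            unique.append(row)
    return unique


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
specimens = unique_specimens(rows)
print(len(rows), "rows ->", len(specimens), "unique specimens")
```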

 

The State Fossil Dataset

My next step was to create a data frame listing states with their official state fossils, for use in comparative analysis against the PBDB dataset later. To do this, I compared the National Park Service’s webpage of official state fossils to Wikipedia’s listing of U.S. State Fossils. While the vast majority of states had identical state fossil information listed on both sites, I caught a few notable differences, such as Minnesota’s newly adopted state fossil (May 2025) not appearing on the National Park Service page yet.

  • The Science Museum of Minnesota launched a State Fossil Campaign to explore several options for their official state fossil, and I highly recommend checking it out even though the campaign is over.

In aggregating this list of state fossils, I realized that states have wildly varying degrees of specificity in their selection of fossils, which could impact my ability to identify the correct species in the PBDB dataset. For example:

Several states have chosen an official state fossil, but their selection is incredibly broad:

  • Rhode Island’s state fossil, Trilobite, is technically a class of extinct marine arthropods, rather than a species
  • Kentucky has listed the entire phylum of Brachiopods as their state fossil
  • Massachusetts and Connecticut both chose “dinosaur tracks” (Eubrontes giganteus) for their state fossil
  • Georgia, likewise, has settled on “shark teeth” for their state fossil classification

On the other end of the spectrum, we have states like Nebraska.

  • Nebraska has listed not one, but three very specific, distinct species of mammoth as their state fossil.

Several states have more than one official fossil, including Kansas, Nebraska, and Ohio, while some states don’t have an official state fossil at all. I briefly considered narrowing the scope of my project to only states with official state fossils, but people like being able to connect fun and cool data to the places they are familiar with so I made a few exceptions:

  • Arkansas and Texas have state dinosaurs instead of state fossils and personally I think that’s close enough
  • Florida and New Hampshire have unofficial/proposed state fossils and I wanted to include them as well

Which leaves us with two states that I eventually had to leave blank: Hawaii and Iowa. Neither of these states has a state fossil (official or unofficial), and neither has any fossil records in the PBDB dataset. Geologically, it makes sense for Hawaii to have fewer fossil records as it is made up of volcanic islands, but I’m really curious to know what the reason behind Iowa’s dearth of dinosaurs could be.

 

Exploring the PBDB Dataset

In order to get a better understanding of the fossil data I was working with, I grouped all of the fossil records in the downloaded PBDB dataset by state and plotted them in the map below:

 

Two things in this graphic particularly stand out:

1. This dataset only appears to contain data for 41/50 states

It’s worth noting that while this dataset is incredibly robust, the Paleobiology Database collects data primarily from published writing, including academic journal articles and book chapters. Since their goal of “databasing every published fossil occurrence on the planet” is foundationally limited to published occurrences, this initial exploration can alert us to potential gaps in the dataset. For example, Georgia’s state fossil, “shark teeth”, is common enough to be sold in local shops, but the PBDB dataset doesn’t contain any shark tooth records for Georgia.

 

2. Of the 32,746 total fossil records in this dataset, 47% were discovered in Texas, making it a clear outlier.

When we exclude Texas from the dataset in order to better visualize the distribution of fossil records through the rest of the country, we see that Colorado, Wyoming, Nevada, and California are the next highest contributors to the PBDB fossil repository.
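The per-state tally and the "set the outlier aside" step described above can be sketched as follows. This is an illustrative Python version, not the R code behind the maps, and the record list is invented toy data rather than real PBDB counts.

```python
from collections import Counter

# Hypothetical state labels, one per fossil record (not real PBDB counts).
record_states = ["Texas"] * 5 + ["Colorado"] * 3 + ["Wyoming"] * 2

counts = Counter(record_states)
total = sum(counts.values())

# Share of all records contributed by each state.
shares = {state: n / total for state, n in counts.items()}

# Drop the largest contributor so the remaining states are easier to compare.
top_state, _ = counts.most_common(1)[0]
without_top = {state: n for state, n in counts.items() if state != top_state}

print(top_state, shares[top_state])
print(without_top)
```

Repeating the last step on `without_top` would peel off the next-largest contributors in the same way.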

Filtering out all five of these states still leaves us with a consistent clustering of where states with the highest number of documented fossil records appear:

All of these states cluster around what used to be the Western Interior Seaway during the Late Cretaceous period. The higher quantities of fossils can likely be attributed to the ideal fossilization conditions created when a sediment-laden interior sea was eventually replaced by the Rocky Mountains. Additionally, NPS notes that “dinosaurs are rare in the eastern half of the country because this area was generally eroding instead of being a place of deposition when dinosaurs were around”.

 

Answering the Original Questions

Question 1: What are the most commonly reported fossils in the PBDB dataset? 

With 196 records each, Jupiteria and Venericardia are the most commonly reported fossils in the PBDB dataset. From what I can tell, both are genera of marine bivalve mollusks. Venericardia (Baluchicardia) bulla, a species within the Venericardia genus, also appears as the fourth-most commonly reported fossil, with 135 records.

I suppose I shouldn’t have been surprised that mollusks are the most commonly reported fossils, given that most of our fossils are discovered in what used to be an ancient sea bed.
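Because the top spot here is a tie, a plain "take the first row" approach would hide one of the two winners. A minimal Python sketch of a tie-aware count is shown below, using placeholder taxon names rather than real records. The project itself used R, so this is an illustration only.

```python
from collections import Counter

# Hypothetical taxon names, one per record.
record_names = ["Taxon A", "Taxon B", "Taxon A", "Taxon C", "Taxon B"]

name_counts = Counter(record_names)

# Highest record count, then every name that reaches it (handles ties).
highest = max(name_counts.values())
most_common = sorted(name for name, n in name_counts.items() if n == highest)

print(highest, most_common)
```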

 

Question 2: What is the most popular (i.e., most commonly shared) state fossil?

The vast majority of states seem to have a unique official state fossil. The only exceptions to this are:

  • Mammuthus primigenius (Woolly Mammoth) is shared by Nebraska, Alaska, and Vermont
  • Mammuthus columbi (Columbian Mammoth) is shared by Nebraska, South Carolina, and Washington
  • Mammut americanum (Mastodon) is shared by Indiana, Michigan, and (unofficially) New Hampshire
  • Triceratops horridus (Triceratops) is shared by South Dakota and Wyoming
  • Eubrontes giganteus (dinosaur tracks) is shared by Connecticut and Massachusetts


Mammoths are clearly the popular choice when it comes to selecting an official state fossil, and shout out to Nebraska for including Mammuthus imperator as their third official state fossil despite already claiming the other two top mammoths as well.
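One way to find shared fossils is to flip a state-to-fossil table into a fossil-to-states table and keep the entries with more than one state. The Python sketch below does this using only the states named in the list above. It is an illustration, not the original R code, and the variable names are made up.

```python
from collections import defaultdict

# State -> fossil(s), limited to the states named in the list above.
state_fossils = {
    "Nebraska": ["Mammuthus primigenius", "Mammuthus columbi", "Mammuthus imperator"],
    "Alaska": ["Mammuthus primigenius"],
    "Vermont": ["Mammuthus primigenius"],
    "South Carolina": ["Mammuthus columbi"],
    "Washington": ["Mammuthus columbi"],
    "Indiana": ["Mammut americanum"],
    "Michigan": ["Mammut americanum"],
    "New Hampshire": ["Mammut americanum"],  # unofficial
    "South Dakota": ["Triceratops horridus"],
    "Wyoming": ["Triceratops horridus"],
    "Connecticut": ["Eubrontes giganteus"],
    "Massachusetts": ["Eubrontes giganteus"],
}

# Invert the mapping: fossil -> list of states that claim it.
fossil_states = defaultdict(list)
for state, fossils in state_fossils.items():
    for fossil in fossils:
        fossil_states[fossil].append(state)

# Keep only fossils claimed by more than one state.
shared = {f: sorted(s) for f, s in fossil_states.items() if len(s) > 1}

for fossil, states in shared.items():
    print(fossil, "->", ", ".join(states))
```

Mammuthus imperator drops out of `shared` because only Nebraska lists it.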

 

Question 3: Which states are the most commonly shared state fossils found in? 

Of the five state fossils from Question 2 that were shared between multiple states, only three of them actually appeared within the PBDB dataset:

With more time, I would try to discover why South Carolina (official state fossil: Mammuthus columbi) is tied with Utah for the highest number of Mammut americanum fossil records, even though it doesn’t have any records for its actual state fossil in the PBDB dataset.
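Two checks sit behind that observation: which states tie for the most records of a given fossil, and which states have no records of their own state fossil. A rough Python sketch of both is below. The records, the lookup table, and the helper name are hypothetical placeholders, and the project itself was done in R.

```python
from collections import Counter

# Hypothetical (fossil name, state) pairs standing in for PBDB records.
records = [
    ("Fossil X", "State 1"),
    ("Fossil X", "State 2"),
    ("Fossil X", "State 2"),
    ("Fossil Y", "State 1"),
]

# Hypothetical state -> official fossil lookup.
official = {"State 1": "Fossil X", "State 2": "Fossil Y"}

pair_counts = Counter(records)


def top_states(fossil):
    """Return every state tied for the most records of one fossil."""
    per_state = {s: n for (f, s), n in pair_counts.items() if f == fossil}
    if not per_state:
        return []
    best = max(per_state.values())
    return sorted(s for s, n in per_state.items() if n == best)


# States with no records at all for their own official fossil.
missing_own = sorted(s for s, f in official.items() if pair_counts[(f, s)] == 0)

print(top_states("Fossil X"), missing_own)
```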

 

What ethical considerations must be kept in mind when using this data?

PBDB makes it explicitly clear on their website that they are an open-access resource and have granted permission for anyone around the world to access their data via the internet. Because permission for use has already been given, there are fewer ethical questions to weigh when working with this data, but a few still remain. General users (like myself) are not allowed to contribute data directly to the database, so my work will cite the Paleobiology Database but will not alter the existing database. Researchers are also allowed to embargo their own contributed data, so if part of the dataset I downloaded is later embargoed, I will remove or adjust it accordingly.

– Susan Broadfoot
