Our Datasets
Data Critique
For this study, the Stanford Education Data Archive (SEDA) drew on various publicly available data files, technical documentation, and data codebooks. The researchers used data on Title I funding, test scores, district-level poverty rates, district characteristics, federal pandemic relief funding, and the percent of the 2020-21 school year that each district operated with remote, hybrid, or in-person instruction. Average test scores in each district were estimated with probit models from the counts of students in each proficiency category. For Title I funding, the researchers obtained state-adjusted Title I allocations for districts from ED Data Express, allowing them to allocate Title I aid proportionately. Data on federal pandemic relief spending for schools was collected from Burbio, Inc. and Edunomics.org, including ESSER II and ESSER III allocations and spending reports as of various dates; missing ESSER II data was imputed based on state-level ratios. District characteristics were generated from Common Core of Data and EDFacts data, using the Longitudinal Imputed School Dataset to estimate free and reduced-price lunch eligibility rates for U.S. public schools. Erroneous entries were replaced with imputed values, and district-level rates were calculated as an enrollment-weighted average of school-level rates. Finally, the share of the year each district operated remotely or in a hybrid mode was measured by averaging weekly district-level data from the Return to Learn tracker and the COVID-19 School Data Hub. Missing values were imputed via regression predictions to account for errors in each data source.
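As a rough illustration of the enrollment-weighted averaging mentioned above for district-level free and reduced-price lunch rates, the following is a minimal sketch. The function name, the input layout, and the numbers are hypothetical and chosen only for illustration; they are not taken from the SEDA code or files.

```python
def district_rate(schools):
    """Enrollment-weighted average of school-level rates.

    `schools` is a list of (enrollment, rate) pairs; this layout is an
    assumption for illustration, not the SEDA file format.
    """
    total_enrollment = sum(enrollment for enrollment, _ in schools)
    if total_enrollment == 0:
        return None  # no students, so the rate is undefined
    weighted_sum = sum(enrollment * rate for enrollment, rate in schools)
    return weighted_sum / total_enrollment


# Made-up example: three schools in one district.
example_schools = [(200, 0.50), (600, 0.25), (200, 0.75)]
example_rate = district_rate(example_schools)
print(example_rate)
```

Weighting by enrollment means a large school moves the district rate more than a small one, which a simple mean of the three school rates would not capture.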
The dataset was created by the Stanford Education Data Archive (SEDA), an initiative to improve educational opportunities through the use of data that is housed by the Educational Opportunity Project (EOP) at Stanford University. The National Center for Education Statistics (NCES), part of the Institute of Education Sciences (IES), provided some of the data used to create the SEDA files. The data used to develop SEDA came from two primary sources. The first is EDFacts, a U.S. Department of Education initiative with a goal similar to SEDA's, accessed via an agreement with NCES. The second is state-reported accountability data, which was standardized across states using the National Assessment of Educational Progress (NAEP). SEDA itself is funded by grants from the Bill and Melinda Gates Foundation, the William T. Grant Foundation, the IES, the Overdeck Family Foundation, and the Spencer Foundation, and by a visiting fellowship from the Russell Sage Foundation.
The EOP aims to “generate and share data and research that can help scholars, policymakers, educators, and parents learn how to improve educational opportunities for all children.” Thus, while it is academically oriented, it is also designed to be accessible to the general public. It targets racial, gender, and SES achievement gaps across non-COVID-impacted years, and aims to shed light on pandemic-related losses and recoveries in achievement.
The Covid Pandemic Impact on Educational Opportunities dataset includes versions 1.0 to 5.0 and SEDA 2022 to 2023. Specifically, SEDA provides data measuring educational opportunities in every community in America. It covers educational opportunities from 2009 to 2019, including demographics, opportunity gaps, and export reports. In addition, the Education Recovery Explorer gives insight into how remote learning and federal funds affected student learning during the pandemic, and the interactive map depicts school segregation between racial and economic groups in the country. However, some information is missing from the spreadsheets.
First, more data is needed by ethnicity and type of school. The dataset's achievement gaps compare each racial group against white students only; other permutations (e.g. Hispanic-Asian achievement gaps) are not included, so gaps between other pairs of groups are not reported. Consequently, we have less data from which to draw thorough conclusions about how different racial groups were impacted by the COVID-19 pandemic. Moreover, the data separates individuals into specific racial categories. Dividing the data in this way creates a fixed set of experiences to measure and protects student privacy, but it may not represent the diverse and intersectional characteristics, identities, and experiences of students at these schools. For gender achievement gaps, the 2012-2019 data dichotomizes students as male or female, while the 2022-23 model does not account for gender at all. The dichotomy reinforces an ideological binary that may not be representative of students’ self-declared identities.
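To make the point about missing permutations concrete, here is a minimal sketch of how gaps between every pair of groups could be computed, assuming subgroup mean scores are available on a common scale. The function name, group labels, and values are hypothetical and made up for illustration; they are not drawn from the SEDA files.

```python
from itertools import combinations


def pairwise_gaps(means):
    """Return {(a, b): means[a] - means[b]} for every unordered pair of groups."""
    groups = sorted(means)  # fixed ordering so each pair appears exactly once
    return {(a, b): means[a] - means[b] for a, b in combinations(groups, 2)}


# Made-up subgroup means in standard-deviation units (illustration only).
example_means = {"asian": 0.5, "black": -0.25, "hispanic": -0.125, "white": 0.25}
example_gaps = pairwise_gaps(example_means)

for pair, gap in example_gaps.items():
    print(pair, gap)
```

With four groups this yields six pairs, only three of which involve white students. The remaining three are the kind of comparison that gaps measured solely against white students leave out.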
In the data, test scores are a proxy for student achievement. This limits the implications of the study to test scores specifically, leaving out information on other measures such as COVID-19 transmission, student health and wellbeing, social-emotional wellness, absenteeism, enrollment rates, disciplinary rates, etc.
Lastly, SEDA covers public schools only; private school data is not included. Although public schools in the United States outnumber private schools, the omission is not negligible because a substantial proportion of students attend private schools. Also, even though SEDA draws on data from every state in the U.S., some states lack sufficient data; states with insufficient data (i.e. Colorado and Tennessee) are removed. The researchers clarify that they have not adjusted the estimates to account for changes in the population and in subgroups between 2019 and 2022, which could affect the data given pandemic losses and declining enrollment. Additionally, SEDA provides data only through 2023, so users must look up 2024 data on their own. The SEDA data includes demographic information, but more information is needed about financial support for each school, since academic performance may vary with financial support or other outside factors.