The researchers found that in some stores, highly processed foods were the only choice in some categories
A recent Nature Food study used machine learning techniques to analyze over 50,000 products from major US grocery websites, developing the GroceryDB database, which facilitates consumer decision-making and informs public health initiatives.
Quantifying the extent of food processing in grocery stores
Research has shown the adverse health effects of reliance on ultra-processed foods (UPF), which contribute up to 60% of total calorie intake in developed countries. Much UPF reaches consumers through grocery stores, which raises questions about quantifying the extent of food processing in the food supply, methods to be used, and alternatives to reduce UPF consumption.
Measuring the degree of food processing is not straightforward because food labels often contain mixed and unclear messages, leaving room for ambiguities and differences in interpretation. Therefore, scientists have advocated a more objective definition of the degree of food processing based on biological mechanisms.
Furthermore, because food data are large-scale and complex, artificial intelligence (AI) methodologies are increasingly being used to promote food security.
About the study
Publicly accessible food data were collected from the websites of three leading US grocers: Walmart, Target and Whole Foods. The researchers navigated each website to locate specific food items and ensured consistency by aligning the classification systems used by each store.
Food labels were used to standardize nutrient concentrations, while FoodProX was used to assess the degree of processing of each product. FoodProX is a random forest classifier that translates combinatorial changes in the amounts of nutrients affected by food processing into a food processing score (FPro).
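As a minimal sketch of how a classifier's output could be collapsed into a single processing score, assume the classifier returns, for each product, a probability that its nutrient profile looks unprocessed and a probability that it looks ultra-processed. The function name `fpro_score` and the symmetric combination rule below are illustrative assumptions; the article does not give the exact formula.

```python
def fpro_score(p_unprocessed: float, p_ultra_processed: float) -> float:
    """Combine two class probabilities into a score in [0, 1].

    Assumption (not stated in the article): the score rises with the
    probability of the ultra-processed class and falls with the
    probability of the unprocessed class, weighted symmetrically.
    """
    for p in (p_unprocessed, p_ultra_processed):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    if p_unprocessed + p_ultra_processed > 1.0 + 1e-9:
        raise ValueError("the two class probabilities cannot sum to more than 1")
    return (1.0 - p_unprocessed + p_ultra_processed) / 2.0


# Hypothetical products: (name, P(unprocessed), P(ultra-processed))
examples = [
    ("plain oats", 0.90, 0.02),
    ("flavored snack bar", 0.05, 0.85),
]
scores = {name: fpro_score(p_un, p_up) for name, p_un, p_up in examples}
```

Under this rule, a product the classifier is sure is unprocessed scores 0, one it is sure is ultra-processed scores 1, and a product with equal probability for both ends of the scale scores 0.5.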
Extensive testing and validation were performed on the stability of FPro. The final score depended on the likelihood of observing the overall pattern of nutrient concentrations in unprocessed foods as opposed to UPF. Variation in price per calorie at different levels of food processing was estimated using robust linear models with Huber's T norm.
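To illustrate why a robust linear model with a Huber loss is a reasonable choice for price data, here is a small hand-rolled sketch that fits a straight line by iteratively reweighted least squares. It is not the authors' code: the function name `huber_line`, the tuning constant 1.345, the MAD-based scale estimate, and the toy data are all assumptions made for illustration. In practice an established statistics library would be used.

```python
from statistics import median


def huber_line(x, y, k=1.345, iters=50):
    """Fit y = a + b*x with Huber weights via iteratively reweighted least squares."""
    w = [1.0] * len(x)
    a = b = 0.0
    for _ in range(iters):
        # Weighted least-squares fit for a single predictor (closed form).
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        b = sxy / sxx
        a = my - b * mx
        # Robust scale of the residuals: median absolute deviation / 0.6745.
        res = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        med = median(res)
        scale = median(abs(r - med) for r in res) / 0.6745
        if scale == 0:
            break
        # Huber weights: full weight for small residuals, down-weight large ones.
        w = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in res]
    return a, b


# Toy data (hypothetical): log price per calorie falls with FPro, plus one outlier.
fpro = [i / 10 for i in range(10)]
log_price = [2.0 - 1.5 * f + (0.01 if i % 2 else -0.01) for i, f in enumerate(fpro)]
log_price[-1] += 3.0  # a single mispriced or mislabeled item
intercept, slope = huber_line(fpro, log_price)
```

An ordinary least-squares fit would be pulled strongly toward the single outlier, whereas the Huber weights shrink its influence so that the estimated slope stays close to the trend followed by the other items.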
Study findings
Leveraging the FoodProX machine learning classifier, the GroceryDB database assigned an FPro score to all foods. In all three supermarkets, the distribution of FPro was similar and the results showed that foods with low FPro (minimal processing) represent a relatively small fraction of the grocery stock. Most items were in the high FPro or UPF category. Low FPro items account for a proportionally larger portion of actual purchases, showing a mismatch between sales data and available food choices.
Some differences between stores were noted; for example, Whole Foods offers fewer ultra-processed options, while Target offers a high percentage of high-FPro food items. Little variation in FPro was noted in categories such as jerky, popcorn, cookies, mac and cheese, chips and bread, highlighting limited consumer choice in these segments. This was not the case in other categories such as cereals, pasta, milk and milk substitutes, and snack bars, where consumers had more choice. Furthermore, the distribution of FPro in GroceryDB was similar to that in the most recent USDA Food and Nutrient Database for Dietary Studies (FNDDS).
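The idea of "limited choice" within a category can be made concrete by looking at the spread of FPro scores per store and category. The sketch below assumes a flat list of (store, category, FPro) records; the record layout, the function name `fpro_spread_by_category`, and the sample values are hypothetical and are not taken from GroceryDB.

```python
from collections import defaultdict


def fpro_spread_by_category(items):
    """Return {(store, category): (min_fpro, max_fpro, range)} for the given records."""
    groups = defaultdict(list)
    for store, category, fpro in items:
        groups[(store, category)].append(fpro)
    return {
        key: (min(vals), max(vals), max(vals) - min(vals))
        for key, vals in groups.items()
    }


# Hypothetical records, for illustration only.
records = [
    ("StoreA", "cereal", 0.25),
    ("StoreA", "cereal", 0.75),
    ("StoreA", "popcorn", 0.875),
    ("StoreA", "popcorn", 1.0),
]
spread = fpro_spread_by_category(records)
# Categories sorted from narrowest to widest FPro range.
narrowest_first = sorted(spread, key=lambda key: spread[key][2])
```

A narrow range concentrated near 1 indicates that every option on the shelf is highly processed, while a wide range means a shopper can find a less processed alternative within the same category.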
Regarding the relationship between price and calories, a 10% increase in FPro was associated with an 8.7% decrease in the price per calorie of products across all categories in GroceryDB. The strength of this relationship depended on the food category, although in general more processed foods were likely to be cheaper per calorie than less processed alternatives. In the milk and milk substitute category, by contrast, price per calorie showed an increasing trend with FPro.
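One way to read the 8.7% figure is as the effect of a regression slope on a log scale. The sketch below rests on two assumptions that the article does not confirm: that the model is linear in the logarithm of price per calorie, and that a "10% increase in FPro" means a step of 0.1 on the 0 to 1 scale. Both helper functions are hypothetical.

```python
import math


def slope_from_percent_change(pct_change, delta_fpro=0.1):
    """Slope b of log(price per calorie) = a + b * FPro implied by a percent change."""
    return math.log(1.0 + pct_change) / delta_fpro


def price_ratio(slope, delta_fpro):
    """Multiplicative change in price per calorie for a given change in FPro."""
    return math.exp(slope * delta_fpro)


# An 8.7% decrease per 0.1 step of FPro, as reported across all categories.
b = slope_from_percent_change(-0.087)
ratio_one_step = price_ratio(b, 0.1)   # recovers the reported 8.7% decrease
ratio_half_scale = price_ratio(b, 0.5)  # extrapolation under the same assumptions
```

Under these assumptions the effect compounds multiplicatively, so larger differences in FPro would correspond to proportionally larger price gaps. This extrapolation is illustrative and is not a result reported by the study.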
Regarding store heterogeneity within the same food category, the analysis showed that cereals sold at Whole Foods typically contain fewer artificial and natural flavors, less sugar and fewer added vitamins than those sold at Walmart and Target. The brands offered by each store could also explain the heterogeneity, with Whole Foods relying on different suppliers than Target and Walmart.
Certain food categories, such as pizza, popcorn, and mac and cheese, are highly processed in all stores. According to GroceryDB, Whole Foods offers cookies and crackers across a wider range of FPro scores for consumers to choose from, while Target and Walmart have identical, narrower ranges of FPro scores.
An ingredient-level FPro score (IgFPro), ranging from 0 (unprocessed) to 1 (ultra-processed), was calculated to rank ingredients based on their contribution to the degree of processing of the final product. An analysis of a variety of food items showed that not all ingredients contribute equally to the degree of processing and that foods with more complex ingredient lists tend to be more processed.
Conclusions
In summary, this work uses machine learning techniques to model the chemical complexity of food items offered by some leading US supermarkets. GroceryDB and FPro offer a data-driven approach for consumers to identify similar but less processed alternatives across a range of categories.