Nowcasting Italian CPI with online prices and machine learning
A new Banca d'Italia Occasional Paper explores using real-time online food price data and machine learning to nowcast the Italian Consumer Price Index (CPI). The study demonstrates the potential for more granular and timely inflation indicators, especially during periods of economic volatility.
Web-scraped data for real-time insights
The research collected daily online food prices from 20 supermarkets across major Italian cities between December 2020 and March 2023.
These web-scraped data, focusing on COICOP5 categories (the five-digit level of the Classification of Individual Consumption According to Purpose) such as fruit, vegetables, and meat, were then used to nowcast the CPI for those specific food categories.
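Daily scraped prices have to be aggregated before they can be compared with a monthly CPI figure. The sketch below shows one common way to do this, assuming an unweighted Jevons index (the geometric mean of price relatives) over products observed in both months. The paper's actual aggregation method is not described here, so the function names, the row layout, and the choice of index are illustrative assumptions.

```python
import math
from collections import defaultdict


def monthly_mean_prices(daily_rows):
    """Average daily (month, product, price) rows into {month: {product: mean price}}.

    The row layout is a hypothetical one chosen for this sketch.
    """
    acc = defaultdict(lambda: defaultdict(list))
    for month, product, price in daily_rows:
        acc[month][product].append(price)
    return {
        month: {product: sum(vals) / len(vals) for product, vals in products.items()}
        for month, products in acc.items()
    }


def jevons_index(base_prices, current_prices):
    """Unweighted geometric mean of price relatives, scaled so the base month is 100.

    Only products present in both months are used (a matched-sample assumption).
    """
    matched = [p for p in base_prices if p in current_prices]
    if not matched:
        raise ValueError("no products observed in both months")
    log_sum = sum(math.log(current_prices[p] / base_prices[p]) for p in matched)
    return 100.0 * math.exp(log_sum / len(matched))
```

Calling `monthly_mean_prices` on the scraped rows and then `jevons_index` on two of the resulting months gives a category-level index that a forecasting model could take as input.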
The authors employed machine learning models, including the Prophet model, to process the high-frequency data.
The study's objective was to assess the feasibility and accuracy of this approach, particularly during periods of high macroeconomic uncertainty such as those following the COVID-19 pandemic and the onset of the war in Ukraine.
The findings suggest that web-based price data can effectively complement traditional statistical sources, offering valuable insights for researchers and practitioners interested in real-time economic monitoring.
The methodology involved manually labeling a significant portion of the dataset with expert help, then writing regular-expression rules to assign products to COICOP5 categories.
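The rule-based step can be sketched as follows. This is a minimal illustration, not the authors' rule set: the patterns, the category labels, and the function names are hypothetical, and a real system would need many more rules plus handling for ambiguous product names. The hand-labelled sample serves here as a check on how often the rules reproduce the expert labels.

```python
import re

# Hypothetical rules: patterns and labels are illustrative only,
# not the ones used in the paper.
RULES = [
    ("fruit", re.compile(r"\b(mele|pere|banane|arance)\b", re.IGNORECASE)),
    ("vegetables", re.compile(r"\b(pomodori|zucchine|carote|insalata)\b", re.IGNORECASE)),
    ("meat", re.compile(r"\b(pollo|manzo|maiale|vitello)\b", re.IGNORECASE)),
]


def classify(product_name):
    """Return the label of the first matching rule, or None if no rule matches."""
    for label, pattern in RULES:
        if pattern.search(product_name):
            return label
    return None


def rule_accuracy(labelled):
    """Share of hand-labelled (name, label) pairs that the rules reproduce."""
    if not labelled:
        raise ValueError("labelled sample is empty")
    hits = sum(1 for name, label in labelled if classify(name) == label)
    return hits / len(labelled)
```

Because the first matching rule wins, rule order matters whenever a product name could match more than one pattern. Products that match no rule return `None` and can be routed back for manual review.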
Addressing traditional forecasting gaps
Traditional CPI forecasting models often struggle to incorporate real-time data and adapt to rapid changes, leading to potential inaccuracies in short-term forecasts.
This paper addresses these limitations by leveraging the increasing availability of 'Big Data' and advanced web scraping techniques.
The literature widely supports the use of online prices as representative of overall market prices, with several National Statistical Institutes (NSIs) globally, including ISTAT, actively researching or implementing web-scraped data for official CPI production.
Pioneering efforts like MIT's Billion Prices Project laid the groundwork, though they faced challenges related to coverage and continuous updating.
The study highlights that nowcasting granular categories, such as fresh food prices, is particularly challenging due to their inherent volatility.
The data came from a single national retail chain and covered provinces with a combined population of about 20 million people, roughly one-third of the Italian population, which gives the sample some geographical spread.
Promising, but with inherent limitations
This paper offers a promising proof-of-concept for enhancing inflation measurement with high-frequency, alternative data, particularly valuable in times of economic flux.
However, the reliance on a single retail chain and specific food categories inherently limits the generalizability of its findings to the broader Italian CPI basket.
While methodologically sound for its defined scope, the study's practical policy implications for central banks would require broader data coverage and further validation across diverse sectors.