University of Pittsburgh
Earth data
Global Sunrise & Sunset Data:
Description: Global, pixel-level daily data on sunrise and sunset times from 1700 to the present.
Python code: the Python code to scrape the data; modify it to cover any location and any period.
Stata code: the Stata code to clean the scraped data above.
Data Source: U.S. Naval Observatory
Paper: Nutrition, Labor Supply, and Productivity: Evidence from Ramadan in Indonesia
Global Temperature Data:
Description: Global, pixel-level daily data on average, minimum, and maximum temperature from 1979 to the present.
Python code: the Python code to convert the original NetCDF-format data to CSV.
Stata code: the Stata code to merge and clean the CSV data above.
Data Source: Climate Prediction Center Global Temperature Time Series
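As a rough illustration of the NetCDF-to-CSV step, the sketch below flattens a (date, latitude, longitude) grid into long-format CSV rows, one row per pixel per day. It is not the linked code: it assumes the grid has already been read from the NetCDF file into plain Python sequences, and the function name `grid_to_csv_rows`, the column names, and the `missing` sentinel are all hypothetical.

```python
import csv
import io


def grid_to_csv_rows(dates, lats, lons, values, out, missing=None):
    """Write a gridded variable as long-format CSV and return the row count.

    values[t][i][j] is the reading on dates[t] at (lats[i], lons[j]).
    Cells that are None or equal to `missing` are skipped (assumed fill value).
    """
    writer = csv.writer(out)
    writer.writerow(["date", "lat", "lon", "value"])
    n_rows = 0
    for t, date in enumerate(dates):
        for i, lat in enumerate(lats):
            for j, lon in enumerate(lons):
                v = values[t][i][j]
                if v is None or v == missing:
                    continue  # drop pixels with no observation
                writer.writerow([date, lat, lon, v])
                n_rows += 1
    return n_rows


# Tiny made-up grid: one day, two latitudes, one longitude.
buffer = io.StringIO()
written = grid_to_csv_rows(
    dates=["1979-01-01"],
    lats=[0.25, 0.75],
    lons=[10.25],
    values=[[[1.5], [-999.0]]],
    out=buffer,
    missing=-999.0,
)
print(written)
print(buffer.getvalue())
```

The long format keeps one observation per row, which is usually the easiest shape to merge on date and coordinates afterwards.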
Search index data
Google Search Index Data:
Description: Search index data for selected keywords in all countries, at monthly to minutely frequency and at country to city level, from 2004 to the latest available date.
Python code: the Python code to scrape the search index of any keyword in any region for any period from 2004 onward.
Data Source: Google Trends
Note: The search index data can be used to predict the latest trends or as outcome variables that measure intention. Daily, hourly, and minutely data are available only for the past 90 days, past 7 days, and past 4 hours, respectively.
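The availability limits in the note can be written as a small lookup, which may help when deciding what window to request. This is only a sketch: the function name `finest_granularity` and the returned labels are hypothetical, and the thresholds are taken from the note above rather than from any official specification.

```python
from datetime import timedelta

# Lookback limits from the note above, ordered from finest to coarsest.
_LIMITS = [
    (timedelta(hours=4), "minutely"),
    (timedelta(days=7), "hourly"),
    (timedelta(days=90), "daily"),
]


def finest_granularity(lookback):
    """Return the finest frequency available for a window ending now."""
    for limit, label in _LIMITS:
        if lookback <= limit:
            return label
    # Beyond 90 days only lower-frequency data is available.
    return "coarser than daily"


print(finest_granularity(timedelta(hours=3)))
print(finest_granularity(timedelta(days=30)))
print(finest_granularity(timedelta(days=365)))
```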
Scraping data from other websites
Book/Movie Ratings:
Description: Data from Douban, a leading Chinese book and movie review website with 200 million users.
Python code: the Python code to scrape book information, such as title, author, and rating. The code can also be adapted to scrape other information from Douban pages with a similar structure.
Data Source: Douban
Wikipedia:
Description: Encyclopedic information about any subject from Wikipedia.
Python code: E.g., the Python code to scrape the country code and official languages of any country from Wikipedia.
Data Source: Wikipedia
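To illustrate the kind of extraction involved, the sketch below pulls label/value pairs out of table rows using only the Python standard library. It is not the linked code: the class `LabelValueTable`, the helper `parse_label_value_rows`, and the hand-written `SAMPLE` markup are hypothetical, and a real Wikipedia page has a more complex layout that would need extra handling.

```python
from html.parser import HTMLParser


class LabelValueTable(HTMLParser):
    """Collect <th>label</th><td>value</td> pairs from table rows."""

    def __init__(self):
        super().__init__()
        self.pairs = {}
        self._cell = None  # "th" or "td" while inside such a cell
        self._label = []
        self._value = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._label, self._value = [], []  # start a fresh row
        elif tag in ("th", "td"):
            self._cell = tag

    def handle_data(self, data):
        if self._cell == "th":
            self._label.append(data)
        elif self._cell == "td":
            self._value.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._cell = None
        elif tag == "tr":
            # Collapse whitespace and keep only rows with both parts.
            label = " ".join("".join(self._label).split())
            value = " ".join("".join(self._value).split())
            if label and value:
                self.pairs[label] = value


def parse_label_value_rows(html):
    parser = LabelValueTable()
    parser.feed(html)
    return parser.pairs


# Hand-written sample imitating an infobox; not copied from a real page.
SAMPLE = """
<table class="infobox">
  <tr><th>Official languages</th><td><a href="#">Indonesian</a></td></tr>
  <tr><th>ISO 3166 code</th><td>ID</td></tr>
</table>
"""

print(parse_label_value_rows(SAMPLE))
```

The same row-by-row pattern can be pointed at other pages that present facts as label/value tables, provided the markup is checked first.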
Online Product Data:
Description: Information about products sold on an online shopping website.
Python code: E.g., the Python code to scrape product names, brands, and shipping fees. It can also be adapted to gather other key information, such as price and rating.
Data Source: Newegg