The collection stage acquires the data needed for a meaningful analysis grounded in accurate information.
Define which data is needed to properly approach the project (e.g. format, variables, time range, granularity)
Find reliable and relevant data sources (e.g. databases, APIs, files, sensor readings)
Secure the necessary permissions and credentials to access the data (e.g. username/password, OAuth, API key, compliance with robots.txt)
Acquire the data using appropriate methods (e.g. SQL queries, API calls, web scraping, manual data entry)
Handle the data in accordance with best practices (e.g. data quality, data governance, data security)
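
As a rough illustration, several of the steps above could be sketched in Python using only the standard library. Everything named here is an assumption made for the example rather than part of any particular project: the DataRequirement class, the readings table with its day, sensor and value columns, the sample rows, and the example-bot user agent are all hypothetical. The sketch defines what data is needed, checks a robots.txt policy before any scraping, acquires rows with a parameterized SQL query, and applies one very basic quality check.

```python
import sqlite3
from dataclasses import dataclass
from urllib.robotparser import RobotFileParser

# Hypothetical schema for the example; a real project would define its own.
KNOWN_COLUMNS = {"day", "sensor", "value"}

# Hypothetical robots.txt content, supplied inline so no network call is needed.
ROBOTS = ["User-agent: *", "Disallow: /private/"]


@dataclass(frozen=True)
class DataRequirement:
    """Illustrative record of what the project needs: variables and time range."""
    variables: tuple[str, ...]
    start: str  # ISO date, inclusive
    end: str    # ISO date, inclusive


def scraping_allowed(robots_lines, user_agent, url):
    """Return True if the given robots.txt lines permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


def fetch_readings(conn, req):
    """Acquire the requested columns for the requested date range via SQL."""
    unknown = set(req.variables) - KNOWN_COLUMNS
    if unknown:
        # Column names cannot be bound as parameters, so they are validated
        # against a fixed set before being placed in the query text.
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    columns = ", ".join(req.variables)
    query = f"SELECT {columns} FROM readings WHERE day BETWEEN ? AND ? ORDER BY day"
    return conn.execute(query, (req.start, req.end)).fetchall()


def drop_incomplete(rows):
    """Basic quality check: discard rows that contain a missing value."""
    return [row for row in rows if all(v is not None for v in row)]


def demo_connection():
    """Build an in-memory database holding made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (day TEXT, sensor TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?, ?)",
        [
            ("2024-01-01", "a", 1.5),
            ("2024-01-02", "a", None),  # missing reading
            ("2024-01-03", "b", 2.0),
            ("2024-02-01", "b", 3.0),   # outside the requested range
        ],
    )
    return conn


if __name__ == "__main__":
    req = DataRequirement(("day", "value"), "2024-01-01", "2024-01-31")
    rows = fetch_readings(demo_connection(), req)
    print(drop_incomplete(rows))
    print(scraping_allowed(ROBOTS, "example-bot", "https://example.com/public/data.csv"))
```

Writing the requirement down as a small object keeps the first step explicit and lets the acquisition code reject columns that were never agreed on. Real projects would also need credential handling, governance and security measures, which this sketch leaves out.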