Data Collection and Processing

Overview

To explore the relationship between sentiment and mimetic behaviour in the digital assets market, we collected and processed data from two public APIs:

  • Alternative.me’s Fear & Greed Index API – for a sentiment index quantifying market mood from Extreme Fear to Extreme Greed.
  • CoinGecko API – for daily price, volume, and market capitalisation data of Bitcoin, Ethereum, and Solana.

We then stored the processed data in a relational SQLite database to facilitate structured analysis and reproducibility.

Data Collection

📉 Market Sentiment (Alternative.me)

The Fear & Greed Index API requires no authentication and provides daily scores representing aggregated market sentiment.

From the API endpoint: https://api.alternative.me/fng/?limit=0&format=json

We extracted:

  • value (0–100 index)
  • value_classification (Extreme Fear, Fear, Neutral, Greed, Extreme Greed)
  • timestamp (UNIX format → converted to date)

The data was filtered to include only the most recent 365 days, and the columns were renamed for clarity (value_classification became classification, and the converted timestamp became date).
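The extraction steps above can be sketched as follows. This is a minimal illustration, not the exact code we ran: the function name parse_fear_greed and the days parameter are hypothetical, and the payload is assumed to have the shape returned by the endpoint (a "data" list whose entries carry value, value_classification and timestamp as strings).

```python
import pandas as pd


def parse_fear_greed(payload: dict, days: int = 365) -> pd.DataFrame:
    """Turn the JSON body of the Fear & Greed endpoint into a tidy DataFrame.

    Hypothetical helper: assumes payload["data"] is a list of records with
    "value", "value_classification" and "timestamp" fields.
    """
    columns = ["value", "value_classification", "timestamp"]
    df = pd.DataFrame(payload["data"])[columns].copy()

    # The API returns numbers as strings, so cast them explicitly.
    df["value"] = df["value"].astype(int)

    # UNIX seconds -> calendar date (time of day set to midnight).
    df["date"] = pd.to_datetime(df["timestamp"].astype("int64"), unit="s").dt.normalize()

    df = df.rename(columns={"value_classification": "classification"})
    df = df.drop(columns="timestamp")

    # Keep only the most recent `days` calendar days.
    cutoff = df["date"].max() - pd.Timedelta(days=days - 1)
    df = df[df["date"] >= cutoff]
    return df.sort_values("date").reset_index(drop=True)
```

In practice the payload would come from something like requests.get(url).json() on the endpoint listed above; keeping the parsing separate from the HTTP call makes it easy to check against a saved response.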

📊 Cryptocurrency Price and Volume (CoinGecko)

We used the CoinGecko Demo API to fetch:

  • Daily OHLC prices (open, high, low, close)
  • Market capitalisation
  • 24-hour trading volume

Coins tracked:

  • Bitcoin (BTC)
  • Ethereum (ETH)
  • Solana (SOL)

Each coin’s data was collected for the last 365 days using Python’s requests library. We respected rate limits using a time.sleep(1.5) delay between API calls.

We made use of two endpoints:

  • /coins/{coin_id}/ohlc
  • /coins/{coin_id}/market_chart
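A rate-limited collection loop over these two endpoints might look like the sketch below. Several details are assumptions made for illustration: the helper names build_request and fetch_coin_data, the quote currency (usd), and the 30-second timeout. The base URL, the coin ids and the x-cg-demo-api-key header follow CoinGecko's public Demo API conventions.

```python
import time

import requests

BASE_URL = "https://api.coingecko.com/api/v3"
COIN_IDS = ["bitcoin", "ethereum", "solana"]
ENDPOINTS = ("ohlc", "market_chart")


def build_request(coin_id, endpoint, api_key, days=365):
    """Return (url, params, headers) for one CoinGecko call (hypothetical helper)."""
    url = f"{BASE_URL}/coins/{coin_id}/{endpoint}"
    params = {"vs_currency": "usd", "days": days}  # quote currency is an assumption
    headers = {"x-cg-demo-api-key": api_key}       # Demo API key header
    return url, params, headers


def fetch_coin_data(api_key, coin_ids=COIN_IDS, delay=1.5):
    """Fetch both endpoints for every coin, pausing `delay` seconds between calls."""
    raw = {}
    for coin_id in coin_ids:
        raw[coin_id] = {}
        for endpoint in ENDPOINTS:
            url, params, headers = build_request(coin_id, endpoint, api_key)
            response = requests.get(url, params=params, headers=headers, timeout=30)
            response.raise_for_status()  # fail loudly on HTTP errors such as 429
            raw[coin_id][endpoint] = response.json()
            time.sleep(delay)            # stay under the rate limit
    return raw
```

Calling fetch_coin_data(DEMO_API_KEY) would return a nested dictionary keyed by coin id and endpoint, leaving all reshaping to a later step.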

Data was normalised into a pandas DataFrame per coin, with timestamps converted to standard datetime format, and then concatenated into a unified price_data table.
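As a rough sketch of this normalisation step, the market_chart response (three lists of [timestamp_ms, value] pairs under the keys prices, market_caps and total_volumes) can be reshaped per coin and then stacked. The function names and the output column names (price, market_cap, volume) are illustrative assumptions rather than the exact names used in our code.

```python
import pandas as pd

# Maps each key of the market_chart response to an (assumed) output column name.
SERIES = {"prices": "price", "market_caps": "market_cap", "total_volumes": "volume"}


def market_chart_to_frame(coin_id, chart):
    """Reshape one coin's market_chart JSON into a single DataFrame."""
    parts = []
    for key, column in SERIES.items():
        part = pd.DataFrame(chart[key], columns=["timestamp", column])
        parts.append(part.set_index("timestamp"))

    # Align the three series on their shared millisecond timestamps.
    df = pd.concat(parts, axis=1).reset_index()
    df["date"] = pd.to_datetime(df["timestamp"], unit="ms").dt.normalize()
    df["coin_id"] = coin_id
    return df.drop(columns="timestamp")


def build_price_data(charts):
    """Stack the per-coin frames into one table (charts: {coin_id: market_chart JSON})."""
    frames = [market_chart_to_frame(coin_id, chart) for coin_id, chart in charts.items()]
    return pd.concat(frames, ignore_index=True)
```

The /ohlc response, a list of [timestamp_ms, open, high, low, close] rows, can be handled the same way by passing those five names as the columns of a single DataFrame.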

Data Processing Pipeline

Cleaning & Merging

Once collected, all raw data was processed using pandas to:

  • Convert timestamps
  • Drop unused columns
  • Handle missing values where necessary
  • Sort and reset DataFrame indices
  • Align timeframes between price and sentiment datasets
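The steps listed above could be combined along the following lines. This is a sketch under stated assumptions: the function name clean_and_merge and the column names (date, coin_id, price) are hypothetical, and dropping rows without a price is only one possible way of handling missing values, chosen here for illustration.

```python
import pandas as pd


def clean_and_merge(price_df, sentiment_df):
    """Align price and sentiment observations on calendar date (illustrative)."""
    price = price_df.copy()
    sentiment = sentiment_df.copy()

    # Give both tables the same midnight-normalised datetime key.
    price["date"] = pd.to_datetime(price["date"]).dt.normalize()
    sentiment["date"] = pd.to_datetime(sentiment["date"]).dt.normalize()

    # Keep one row per (date, coin_id); drop rows with no price (assumed policy).
    price = price.drop_duplicates(subset=["date", "coin_id"], keep="last")
    price = price.dropna(subset=["price"])

    # An inner join keeps only dates present in both datasets.
    merged = price.merge(sentiment, on="date", how="inner")
    return merged.sort_values(["coin_id", "date"]).reset_index(drop=True)
```

An inner join is used so that the merged table contains only days covered by both sources; a left join would instead keep every price row and leave sentiment empty where it is missing.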

🔒 API Key Security

For CoinGecko authentication, we kept the Demo API key in a local .env file, loaded it into the environment with the dotenv package, and read it with os.getenv so that the credential stays out of version control:

DEMO_API_KEY = os.getenv("COINGECKO_DEMO_API_KEY")

🧱 Database Schema Design

To store and structure our data efficiently, we implemented a relational SQLite database with three interlinked tables: coin_metadata, price_data, and sentiment_data.

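This schema keeps the data normalised: coin attributes are stored once in coin_metadata rather than repeated on every price row, and the shared date and coin_id keys allow the tables to be joined directly in SQL.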

Table           | Description
coin_metadata   | Contains unique identifiers, names, and symbols for each cryptocurrency. Serves as a reference table to avoid repeating coin attributes across records.
price_data      | Stores daily observations for each coin, including closing price, trading volume, and market capitalisation. Uses a composite primary key (date, coin_id) and links to coin_metadata via a foreign key.
sentiment_data  | Stores the daily Fear & Greed Index scores and associated classifications. Keyed by date, allowing it to be joined directly to price_data on the same date.
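One way to express this schema with Python's built-in sqlite3 module is sketched below. The table names, the composite primary key and the foreign key follow the descriptions above; the individual column names and types (for example close, volume, market_cap, and dates stored as ISO text) are assumptions made for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS coin_metadata (
    coin_id TEXT PRIMARY KEY,
    name    TEXT NOT NULL,
    symbol  TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS sentiment_data (
    date           TEXT PRIMARY KEY,   -- ISO date, e.g. 2024-01-01
    value          INTEGER NOT NULL,   -- 0-100 index
    classification TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS price_data (
    date       TEXT NOT NULL,
    coin_id    TEXT NOT NULL,
    close      REAL,
    volume     REAL,
    market_cap REAL,
    PRIMARY KEY (date, coin_id),
    FOREIGN KEY (coin_id) REFERENCES coin_metadata (coin_id)
);
"""

# Example query: price and sentiment for the same day, with the coin symbol.
JOIN_QUERY = """
SELECT p.date, c.symbol, p.close, s.value
FROM price_data AS p
JOIN coin_metadata AS c ON c.coin_id = p.coin_id
JOIN sentiment_data AS s ON s.date = p.date
ORDER BY p.date, c.symbol
"""


def create_database(path=":memory:"):
    """Open (or create) the database and make sure all three tables exist."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    return conn
```

With foreign keys switched on, inserting a price_data row for a coin that is missing from coin_metadata raises sqlite3.IntegrityError, which guards against orphaned price records.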