Elle Cawtheray :)

myspaql is a pretend social media platform, populated by thousands of randomly generated users with randomly generated behaviour (posts, likes, follows, etc.).

All user and website data is stored in a MySQL database. Using PHP, the website queries the database to display users, their posts, who they follow, etc. The search function in particular required a strong grasp of SQL, with queries, sub-queries and sub-sub-queries...
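To give a flavour of what a query with a sub-query and a sub-sub-query can look like, here is a minimal sketch. It is not the site's actual search code: the table and column names (users, posts, follows) are invented for illustration, and it uses Python's built-in sqlite3 module with an in-memory database in place of MySQL and PHP so that it runs on its own.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, country TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT);
        CREATE TABLE follows (follower_id INTEGER, followee_id INTEGER);
        INSERT INTO users VALUES (1, 'ada', 'UK'), (2, 'bo', 'France'), (3, 'cy', 'UK');
        INSERT INTO posts VALUES (1, 1, 'hello stars'), (2, 2, 'stars tonight'), (3, 3, 'lunch');
        INSERT INTO follows VALUES (3, 1), (1, 2), (2, 3);
    """)
    return conn


# Find users who have posted text matching a search term AND who are
# followed by at least one user from a given country.
SEARCH_SQL = """
    SELECT username FROM users
    WHERE id IN (
        SELECT user_id FROM posts WHERE body LIKE ?          -- sub-query
    )
    AND id IN (
        SELECT followee_id FROM follows
        WHERE follower_id IN (
            SELECT id FROM users WHERE country = ?           -- sub-sub-query
        )
    )
    ORDER BY username
"""


def search(conn, term, follower_country):
    """Run the nested search with bound parameters and return usernames."""
    rows = conn.execute(SEARCH_SQL, (f"%{term}%", follower_country)).fetchall()
    return [row[0] for row in rows]
```

Calling search(build_demo_db(), "stars", "UK") on this toy data returns the users who mention "stars" and have a follower in the UK. The search term and country are passed as bound parameters rather than pasted into the SQL string, which is the usual way to avoid SQL injection in a search box.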

Site data is generated and uploaded to the database with Python. User behaviour is not entirely random. For instance, when a new user is generated, they are more likely to be from a country that already has users on the site. Users are also more likely to follow people whose posts they like, and to like posts from the people they follow.
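One simple way to get this "popular countries get more popular" effect is a weighted random choice, where each country's weight grows with the number of users it already has. The sketch below is an assumption about how such a rule could be written, not the generator's real code; pick_country and base_weight are hypothetical names.

```python
import random
from collections import Counter


def pick_country(existing_countries, all_countries, rng, base_weight=1.0):
    """Pick a country for a new user, favouring countries already on the site.

    existing_countries: one entry per existing user (repeats expected).
    all_countries: every country a new user could come from.
    base_weight: assumed constant so countries with no users can still appear.
    """
    counts = Counter(existing_countries)
    # Weight = base chance + number of current users from that country.
    weights = [base_weight + counts[country] for country in all_countries]
    return rng.choices(all_countries, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so the demo is repeatable
    current_users = ["UK"] * 50 + ["France"] * 5
    print(pick_country(current_users, ["UK", "France", "Japan"], rng))
```

With 50 users from the UK, 5 from France and none from Japan, the weights are 51, 6 and 1, so most new users come from the UK while Japan still turns up occasionally. The same idea could be applied to follows and likes by raising the weight of accounts a user has already interacted with.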

To determine how many likes a post gets, I used scikit-learn to train a linear regression model on real-life Twitter data obtained from a Kaggle competition. TF-IDF is used to vectorise the post text within the model.
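A TF-IDF plus linear regression model of this kind can be sketched as a scikit-learn pipeline. The example below is only an illustration under assumptions: the training texts and like counts are made up (they are not the Kaggle data), the function names are hypothetical, and rounding and clamping predictions at zero is my own addition, since a plain linear regression can predict negative or fractional likes.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline


def train_like_model(texts, like_counts):
    """Fit a TF-IDF vectoriser followed by a linear regression."""
    model = make_pipeline(TfidfVectorizer(), LinearRegression())
    model.fit(texts, like_counts)
    return model


def predict_likes(model, text):
    """Predict a like count for one post, rounded and clamped at zero."""
    raw = model.predict([text])[0]
    return max(0, int(round(float(raw))))


if __name__ == "__main__":
    # Toy data invented for illustration only.
    texts = [
        "look at my cat",
        "my cat is asleep again",
        "quarterly tax return reminder",
        "tax forms are due soon",
    ]
    likes = [120, 95, 3, 5]
    model = train_like_model(texts, likes)
    print(predict_likes(model, "another cat picture"))
```

Wrapping both steps in one pipeline means the same vocabulary and weighting learned during training are reused automatically when predicting likes for newly generated posts.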

The Quick Draw dataset and related Python API are used for profile picture generation. EurOccupations is used for occupational data (see their data policy here). Text data is generated with wonderwords and GPT-2 (I couldn't afford GPT-3 + ...)

By importing the MySQL database into Power BI, I have created an interactive visualisation of user activity, demographics, etc.

Browse myspaql: Click here!
How it works: In progress...
Visualising user data: In progress...

Background gif (Animated Starfield Tile) created by ArtBIT, distributed under the Creative Commons Attribution-ShareAlike 3.0 License.