This is an undergraduate computer science project I had a chance to work on with a classmate of mine: Garry (@garrygrewal), see the GitHub Repository
What is HypeTracker?
What is this “hype” and how do I get more of it?
HypeTracker is a data aggregator application which took data from social media platforms such as Reddit and Twitter to display the number of occurrences a sneaker has been mentioned by people over a period of time. This data is valuable because the prices of aftermarket sneakers can be affected by the attention or “hype” of said sneaker at a certain point in time. By collecting this data and displaying it in graphical form, users can make informed decisions about whether to purchase a certain sneaker based on the perceived value through social media attention and comparing it against the price history of the item.
Designing a Database Using ER Modeling
This project was largely data-centric and so we wanted to design the methods in which we are storing and accessing data in order to not run into issues such as incorrect data relationships or duplicate entries. The errors described could have forced us to delete our database and restart from scratch which can be disheartening depending on the progress of the project.
Entities & Attributes
Can I re-roll for more strength?
In HypeTracker, we used entity relationship (ER) modeling as a way to visually describe our data model before implementing it in SQL. We started by listing out our most important entities (real world objects):
- Sneakers - the topic focus of this application
- Members - the users of our application
- Rankings - stores the historical data of mentions / occurrences of a sneaker
Next, we wrote down some attributes (characteristics or information) we had in mind for each entity.
*** retail price at launch (not price history)
Keep in mind that this is only the initial set of items we thought of and more attributes were added later on. However, this chart gave us a baseline for the most important attributes needed for our application and a simplistic view to refer back to once our data model becomes complicated.
Relationships Between Entities
What are we?
In the next phase, we began to define the relationships between entities through simple scenarios of how the entities interact.
Sneakers and Members
- each sneaker may be monitored by one or more member
- each member may monitor one or more sneaker
Sneakers and Rankings
- each sneaker can have zero or more rankings
- each ranking can only contain one sneaker
*** Members and Rankings entities have no relation between them
Entity relationships can be modeled by their cardinality), which adds a numerical representation to their relations. For example, sneakers and members have a many-to-many relationship, because one sneaker can be watched by many members while one member may watch many sneakers. Meanwhile, sneakers and rankings have a one-to-many relationship because one sneaker can have zero or more ranks associated with it, but each ranking can only describe one sneaker.
Translating all of that into symbols using Crow’s foot notation for cardinality, this is what our ER diagram looks like at this point.
Weak Entity Sets
Apes strong together
A key point to identify at this point is that the Rankings entity does not exist without at least one Sneaker entity. This creates a different type of relationship where the weak entity (Rankings) has an existence dependency on the stronger entity (Sneakers). We can represent this by changing the relationship into a double diamond, changing the weak entity into a double rectangle, and using two lines between the weak entity and weak relationship.
There’s not enough room for all of us in this relationship
Unfortunately we were not finished with this data model yet. Another glaring issue was the cardinality between the Sneakers and Members entities. Many-to-many relationships creates problems in SQL such as how can one members row in the database store many sneakers at the same time? In addition, there are other issues and proposed solutions which you can read about in this article, but the recommended solution is to use an associative entity.
Using an associative entity, we can refactor the relationship between sneakers and members into a new Watchlist entity which keeps track of members and their sneakers.
That was easy
Here is the SQL file to implement this in MariaDB (v10.1.35):
1/* hypetracker.sql */23CREATE DATABASE IF NOT EXISTS HypeTracker;45USE HypeTracker;67CREATE TABLE IF NOT EXISTS `Sneakers` (8 `Name` VARCHAR(100) NOT NULL,9 `Price` DECIMAL(9,2) UNSIGNED,10 `Brand` VARCHAR(40) NOT NULL,11 CONSTRAINT `PK_Sneakers` PRIMARY KEY (Name)12) ENGINE=InnoDB;1314CREATE TABLE IF NOT EXISTS `Members` (15 `Name` VARCHAR(40) NOT NULL,16 `Email` VARCHAR(100) NOT NULL,17 `Password` VARCHAR(40) NOT NULL,18 CONSTRAINT `PK_Members` PRIMARY KEY (Email)19) ENGINE=InnoDB;2021CREATE TABLE IF NOT EXISTS `Watchlist` (22 `SneakerName` VARCHAR(100) NOT NULL,23 `MemberEmail` VARCHAR(100) NOT NULL,24 CONSTRAINT `PK_Watchlist`25 PRIMARY KEY (SneakerName, MemberEmail),26 CONSTRAINT `FK_Watchlist_Sneakers`27 FOREIGN KEY (SneakerName)28 REFERENCES Sneakers(Name)29 ON DELETE CASCADE ON UPDATE CASCADE,30 CONSTRAINT `FK_Watchlist_Members`31 FOREIGN KEY (MemberEmail)32 REFERENCES Members(Email)33 ON DELETE CASCADE ON UPDATE CASCADE34) ENGINE=InnoDB;3536CREATE TABLE IF NOT EXISTS `Rankings` (37 `Platform` VARCHAR(100) NOT NULL,38 `Date` DATE NOT NULL,39 `SneakerName` VARCHAR(100) NOT NULL,40 `Mentions` INT UNSIGNED NOT NULL,41 CONSTRAINT `PK_Rankings`42 PRIMARY KEY (Platform, Date, SneakerName),43 CONSTRAINT `FK_Rankings_Sneakers`44 FOREIGN KEY (SneakerName)45 REFERENCES Sneakers(Name)46 ON DELETE CASCADE ON UPDATE CASCADE47) ENGINE=InnoDB;
Here are some example SQL queries to retrieve data for certain scenarios:
1/* Get All Sneakers Watched by a Member */2SELECT S.Name, S.Price, S.Brand3FROM Sneakers S4INNER JOIN Watchlist W5ON S.Name = W.SneakerName6WHERE W.MemberEmail = '$email'; /* $email is a PHP variable here */78/* Get the 5 Most Mentioned Sneakers In the Last Week */9SELECT S.Name, S.Price, S.Brand, RS.TotalMentions10FROM Sneakers S11INNER JOIN12(13 SELECT R.SneakerName, SUM(Mentions) AS TotalMentions14 FROM Rankings R15 WHERE R.Date > DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)16 GROUP BY R.SneakerName17 ORDER BY TotalMentions DESC18 LIMIT 519) RS20ON S.Name = RS.SneakerName;
Takeaways and Learnings
It’s not over yet!
To recap, we designed a relational database based on our requirements using the entity relationship model in order to visualize our database before implementation. I learned how to express data relationships through cardinality and how to refactor many-to-many relationships so that it will work nicely in SQL. Although creating ER diagrams can be tedious, this is an important process to verify our design decisions in order to avoid simple dependency or redundancy issues later on. I am continuously learning more about SQL and this write-up details an iteration of the project after it had been implemented in PHP.
That being said, we were working with a relatively simple model due to the small number of entities we needed and did not run in more issues which will require more normalization techniques. I’ve avoided using this term in the post because it is a complicated topic on its own and we were able to achieve a data model in Boyce-Codd Normal Form (BCNF) just by one refactoring step. If you are planning to learn more about databases, I would suggest looking at the different normal forms and normalization techniques, as well as relational algebra and relational calculus to express your SQL queries more effectively.