Navigate back to the homepage

HypeTracker - An Exercise In Designing Databases

Justin Ho
September 16th, 2020 · 4 min read

This is an undergraduate computer science project I had a chance to work on with a classmate of mine: Garry (@garrygrewal), see the GitHub Repository

What is HypeTracker?

What is this “hype” and how do I get more of it?

HypeTracker is a data aggregator application which took data from social media platforms such as Reddit and Twitter to display the number of occurrences a sneaker has been mentioned by people over a period of time. This data is valuable because the prices of aftermarket sneakers can be affected by the attention or “hype” of said sneaker at a certain point in time. By collecting this data and displaying it in graphical form, users can make informed decisions about whether to purchase a certain sneaker based on the perceived value through social media attention and comparing it against the price history of the item.

Designing a Database Using ER Modeling

This project was largely data-centric and so we wanted to design the methods in which we are storing and accessing data in order to not run into issues such as incorrect data relationships or duplicate entries. The errors described could have forced us to delete our database and restart from scratch which can be disheartening depending on the progress of the project.

Entities & Attributes

Can I re-roll for more strength?

In HypeTracker, we used entity relationship (ER) modeling as a way to visually describe our data model before implementing it in SQL. We started by listing out our most important entities (real world objects):

  • Sneakers - the topic focus of this application
  • Members - the users of our application
  • Rankings - stores the historical data of mentions / occurrences of a sneaker

Next, we wrote down some attributes (characteristics or information) we had in mind for each entity.

SneakersMembersRankings
namenameplatform
brandemailmentions
price*passworddate

*** retail price at launch (not price history)

Keep in mind that this is only the initial set of items we thought of and more attributes were added later on. However, this chart gave us a baseline for the most important attributes needed for our application and a simplistic view to refer back to once our data model becomes complicated.

Relationships Between Entities

What are we?

In the next phase, we began to define the relationships between entities through simple scenarios of how the entities interact.

Sneakers and Members

  • each sneaker may be monitored by one or more member
  • each member may monitor one or more sneaker

Sneakers and Rankings

  • each sneaker can have zero or more rankings
  • each ranking can only contain one sneaker

*** Members and Rankings entities have no relation between them

Entity relationships can be modeled by their cardinality), which adds a numerical representation to their relations. For example, sneakers and members have a many-to-many relationship, because one sneaker can be watched by many members while one member may watch many sneakers. Meanwhile, sneakers and rankings have a one-to-many relationship because one sneaker can have zero or more ranks associated with it, but each ranking can only describe one sneaker.

Translating all of that into symbols using Crow’s foot notation for cardinality, this is what our ER diagram looks like at this point.

Initial conceptual diagram modelling relationships between entities listed earlier
Initial ER Diagram With Important Entities and Their Relationships

Weak Entity Sets

Apes strong together

A key point to identify at this point is that the Rankings entity does not exist without at least one Sneaker entity. This creates a different type of relationship where the weak entity (Rankings) has an existence dependency on the stronger entity (Sneakers). We can represent this by changing the relationship into a double diamond, changing the weak entity into a double rectangle, and using two lines between the weak entity and weak relationship.

Modified ER Diagram After Account For the Weak Entity (Rankings)
ER Diagram After Modifications for Weak Entity Sets

Many-To-Many Relationships

There’s not enough room for all of us in this relationship

Unfortunately we were not finished with this data model yet. Another glaring issue was the cardinality between the Sneakers and Members entities. Many-to-many relationships creates problems in SQL such as how can one members row in the database store many sneakers at the same time? In addition, there are other issues and proposed solutions which you can read about in this article, but the recommended solution is to use an associative entity.

Using an associative entity, we can refactor the relationship between sneakers and members into a new Watchlist entity which keeps track of members and their sneakers.

ER Diagram after adding an associative entity between Sneakers and Members
ER Diagram After Modifications for Associative Entities

Finishing Up

That was easy

Now we were basically done! Just add in the foreign key(s) as attributes and underline the primary key(s) and this was the resulting ER diagram:

The Final ER Diagram
ER Diagram Complete!

Here is the SQL file to implement this in MariaDB (v10.1.35):

1/* hypetracker.sql */
2
3CREATE DATABASE IF NOT EXISTS HypeTracker;
4
5USE HypeTracker;
6
7CREATE TABLE IF NOT EXISTS `Sneakers` (
8 `Name` VARCHAR(100) NOT NULL,
9 `Price` DECIMAL(9,2) UNSIGNED,
10 `Brand` VARCHAR(40) NOT NULL,
11 CONSTRAINT `PK_Sneakers` PRIMARY KEY (Name)
12) ENGINE=InnoDB;
13
14CREATE TABLE IF NOT EXISTS `Members` (
15 `Name` VARCHAR(40) NOT NULL,
16 `Email` VARCHAR(100) NOT NULL,
17 `Password` VARCHAR(40) NOT NULL,
18 CONSTRAINT `PK_Members` PRIMARY KEY (Email)
19) ENGINE=InnoDB;
20
21CREATE TABLE IF NOT EXISTS `Watchlist` (
22 `SneakerName` VARCHAR(100) NOT NULL,
23 `MemberEmail` VARCHAR(100) NOT NULL,
24 CONSTRAINT `PK_Watchlist`
25 PRIMARY KEY (SneakerName, MemberEmail),
26 CONSTRAINT `FK_Watchlist_Sneakers`
27 FOREIGN KEY (SneakerName)
28 REFERENCES Sneakers(Name)
29 ON DELETE CASCADE ON UPDATE CASCADE,
30 CONSTRAINT `FK_Watchlist_Members`
31 FOREIGN KEY (MemberEmail)
32 REFERENCES Members(Email)
33 ON DELETE CASCADE ON UPDATE CASCADE
34) ENGINE=InnoDB;
35
36CREATE TABLE IF NOT EXISTS `Rankings` (
37 `Platform` VARCHAR(100) NOT NULL,
38 `Date` DATE NOT NULL,
39 `SneakerName` VARCHAR(100) NOT NULL,
40 `Mentions` INT UNSIGNED NOT NULL,
41 CONSTRAINT `PK_Rankings`
42 PRIMARY KEY (Platform, Date, SneakerName),
43 CONSTRAINT `FK_Rankings_Sneakers`
44 FOREIGN KEY (SneakerName)
45 REFERENCES Sneakers(Name)
46 ON DELETE CASCADE ON UPDATE CASCADE
47) ENGINE=InnoDB;

Here are some example SQL queries to retrieve data for certain scenarios:

1/* Get All Sneakers Watched by a Member */
2SELECT S.Name, S.Price, S.Brand
3FROM Sneakers S
4INNER JOIN Watchlist W
5ON S.Name = W.SneakerName
6WHERE W.MemberEmail = '$email'; /* $email is a PHP variable here */
7
8/* Get the 5 Most Mentioned Sneakers In the Last Week */
9SELECT S.Name, S.Price, S.Brand, RS.TotalMentions
10FROM Sneakers S
11INNER JOIN
12(
13 SELECT R.SneakerName, SUM(Mentions) AS TotalMentions
14 FROM Rankings R
15 WHERE R.Date > DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
16 GROUP BY R.SneakerName
17 ORDER BY TotalMentions DESC
18 LIMIT 5
19) RS
20ON S.Name = RS.SneakerName;

Takeaways and Learnings

It’s not over yet!

To recap, we designed a relational database based on our requirements using the entity relationship model in order to visualize our database before implementation. I learned how to express data relationships through cardinality and how to refactor many-to-many relationships so that it will work nicely in SQL. Although creating ER diagrams can be tedious, this is an important process to verify our design decisions in order to avoid simple dependency or redundancy issues later on. I am continuously learning more about SQL and this write-up details an iteration of the project after it had been implemented in PHP.

Data Normalization

That being said, we were working with a relatively simple model due to the small number of entities we needed and did not run in more issues which will require more normalization techniques. I’ve avoided using this term in the post because it is a complicated topic on its own and we were able to achieve a data model in Boyce-Codd Normal Form (BCNF) just by one refactoring step. If you are planning to learn more about databases, I would suggest looking at the different normal forms and normalization techniques, as well as relational algebra and relational calculus to express your SQL queries more effectively.

More articles from Justin

Automating My Personal Portfolio

integrating continuous delivery concepts with a personal touch

September 4th, 2020 · 6 min read
© 2020 Justin
Link to $https://github.com/justinhodevLink to $https://linkedin.com/in/justin-ho-devLink to $https://dev.to/justinhodev