The task was to :
- Design a database for given streaming company wit ERD.
- Write DDL queries to create it and DML to seed data + query to update value of servers.
- Write queries for extraction different kinds of data.
- Offer some indexing structure
- Write stored procedure for processing records.
- Propose simple data warehouse model.
Additional notes :
- Channels can be without categories.
- We need to log all changes of
Viewing Requests
table (inRequestStatusHistory
table). - Viewer is defined by Country, Email and Username (combination shouldbe unique).
- Servers have Desctruction Date - nullable, filled when server is destroyed.
- Servers should have CurrentUsersAmt always less or equal to their capacity.
- Database is in the 3NF.
Files :
init.sql
- file with database creation.seed.sql
- seeding database.update_servers.sql
- query to update amount of active users for each server.
Some prerequisite knowledge :
- Primary key is treated by MS SQL Server as a clustered unqiue index.
- Clustered index describes how data is actually stored.
- Non-clustered index describes how to navigate to the row and is stored separately from the database.
- In order to declare new clustered index we need to delete index, created by SQL server by default (PK).
- Servers - clustered,
(CurrentUsersAmt DESC, ExpirationDate ASC)
- task required to firstly fill servers with highest amount of users. Not unique. - Viewers - leave PK as clustered unique index, add non-clustered not unique on
CountryId
- for statistics. - ViewingRequests - we address this table a lot by two combinations:
viewer – channel
andchannel-server
. They should be created as UNIQUE non-clustered indexes. PK should be left. - All other tables should be left as they are : junction tables already define primary keys as combination of their foreign keys, and small tables as country or statuses doesn’t need anything except for the primary key.
Tose queries are kept in folder Queries
with following name pattern query[X].sql
where X is a number of a query as they are listed below :
- For each viewer show channels that have never been requested by that viewer.
- Identify channels for which the average viewing time per viewer exceeds average viewing time per user by 45%.
- For each country list top 10 customers with largest ratio of total viewing time by total requests.
- Identify 3 channels with largest share of short views among all completed views (status = 'open') A shot view is view for less than 15 sec.
- Identify customers who have already viewed every channel for total at least an hour in 2018.
Procedure proceeds requests from old to new ones, if expiration date of channel is earlier than request date then it is rejected. If it finds server (we browse them by decreasing amount of current users and availability) then it increments its current users amount. If not - creates new one. After that it assigns request to the server and logs changes in request status.
File : Procedures/procedure.sql
.
Steps :
- Denormalize original database to the 2NF.
- Add
DurationInSeconds
property to viewing request. Calculated with help ofDATEDIFF
. - Use
Star
design:
- Query to update servers doesn't update servers who doesn't have active requests waiting(so, amount of current user will not change if it decreases in real life).
- Query #5 is not finished.
- In procedure it wiuld be good to remove extra cursor inside and replace it with
TOP(1)
SQL select statement. - Place
UPDATE
statements into transactions (so, if anythin goes wrong it is possible to rollback).