BTM 382 Project Fall 2016
The purpose of the BTM 382 project is to provide you the opportunity to analyze and develop the database model and operational database in a realistic organizational scenario. This is a group project. Other than with your group members, you may not collaborate with other classmates or anyone else. You should certainly use the Internet and any other resources to learn more in order to complete this project, but if you need any direct help from any person, ask questions on Piazza, ask the instructor in class or during office hours, or book an appointment with the TA.
This document provides full, detailed project requirements and submission instructions. In addition, it includes many tips on finding resources to help you succeed in the project, and the final section provides a suggestion of how to effectively organize your teamwork.
Project Case Overview
Concordia University Campus Recreation serves Concordia University students, staff and faculty, and also the general public with athletic facilities and activities for recreation, sports and fitness. Downtown on the Sir George Williams (SGW) Campus, Le Gym is a full-service fitness centre with over a hundred exercise machines, many physical fitness classes (including aerobics, dance and martial arts), personal training, and also open time for some indoor group sports. On the Loyola Campus, the Athletics Complex provides a gym, outdoor sports fields, a hockey arena and other facilities for various sports that require large amounts of space.
In this project, you will model and reproduce the database required to support Campus Recreation’s scheduling of instructional classes and personal training, and customer management. Sample data for this project is based on Campus Recreation’s schedule for the current semester, which contains a detailed specification of real data that would be required for this project. There are two versions of this schedule:
These two versions contain exactly the same schedule information, but in slightly different formats.
Details of project and deliverables
In general, you will develop a fully-specified database model (ERD), write SQL scripts to create tables and constraints, write SQL scripts to populate the database, and write SQL queries to generate data for reports that will be needed by Campus Recreation. You will not construct any kind of user interface for the system, or produce any actual printed reports. You are required to use Oracle for this project; hence your SQL must be fully compatible with Oracle.
SQL queries needed for reports and operations
Campus Recreation needs a number of SQL queries and operations for maintenance activities and reports. You will write SQL queries necessary for the generation of these reports and operations. However, you will not be required to actually create the reports; you will only write the SQL queries that would support them. Although some of these reports are based on the real Campus Recreation schedule, most of them are fictional (that is, they do not necessarily represent the actual operations of Campus Recreation), and even the real ones are somewhat simplified for the sake of this project; they are specified here to enable you to practice creating SQL queries. However, read these report specifications very carefully, since many business rules are contained in them that will affect the specification of your database model. In all of your reports, please note the following stipulations:
- Your database tables must not directly store any calculated fields or multivalued attributes. That is poor database design. So, the reports need to use the appropriate SQL operators or functions to display information in the reports that require calculations or combined data. For example, to display the days of the week, times and prices as required in some of the reports, you will need to use the Oracle CONCAT function or concatenation (||) operator.
- Some of the reports require you to display specific class categories or other such information. You are not permitted to hard-code primary key codes into your queries; your queries must rather accept appropriate data values. For example, an SQL clause like “
where Class.ID = 15” is no permitted; that is poor program design since database programmers would have to know the specific primary key values in advance. Rather, you should have something like “
where Class.ClassName = 'Zumba'” that permits programmers to use human-understandable inputs in their code.
- Although you are not allowed to request help from any person other than myself or the tutor for the project, you are strongly encouraged to search the Web and use any resources you can find to help you out. Other than what you might find from a regular web search, please note:
- One of the best resources for learning Oracle SQL is Tech on the Net’s Oracle pages. They are fairly complete, include simple examples, and are much easier to understand than Oracle’s official pages.
- StackOverflow is the best programming question-and-answer site on the Web for any and all kinds of programming. You can search for your questions and always include “Oracle” in your search question. For this project, you are permitted to post questions on StackOverflow and use the answers that you might be provided, because that’s what real programmers do; this is the only exception to only requesting help from the tutor or myself.
- When you search the Web for SQL help, make absolutely certain that the page you are looking at talks specifically about Oracle’s flavour of SQL. You could easily waste a lot of time trying to apply an answer that actually applies to some other flavour of SQL, such as Microsoft SQL Server.
- In particular, be very careful when you get results from the oracle.com website. Oracle owns not only Oracle Database, but also MySQL, so a lot of SQL documentation from the oracle.com website refer to MySQL, not Oracle Database—very confusing. You should be absolutely certain that you are reading about Oracle Database SQL.
- You are allowed to copy or modify code from any website (or use an answer from StackOverflow) but you absolutely must cite it properly. This is what professional programmers do. If you’re not sure how to cite something properly, just ask on Piazza.
Here is the list of SQL queries you need to create:
- SGW printed schedule: Display the full schedule on the Sir George Williams campus for the Fall semester 2016 as displayed in the SGW printed schedule. In other words, reproduce all of the data in the schedule tables except for the data in the “Weight Room” section. Show all classes being held in the semester with their category, class name, instructor, days, location, times, and price (for Concordia students / everyone else). The classes must be grouped by their category. This report will display only the real data provided to you; it should not include any fictional data. Note the following requirements and modifications from the actual print report:
- You are not permitted to save these multivalued attributes in single table attributes.
- It is OK (and probably necessary) for the class categories to be repeated on every line.
- Similarly, some of the sports in Recreation (specifically, Badminton, Beach Volleyball and Open Training) have repeating groups (multivalued attributes) in the class name or price. In these cases, repeat the missing data on each row.
- However, the missing instructors in Recreation are not multivalued attributes. You need to represent them in your database as nulls, but your SQL output should display blank (rather than null) as in the printed schedule.
- Unlike in the printed schedule, your results must be sorted alphabetically by class category.
- Within each category, the individual classes must be sorted alphabetically.
- Use consistent abbreviations for each day of the week: Su, Mo, Tu, We, Th, Fr, Sa.
- Whereas some classes have two instructors (e.g. Zumba and Salsa), delete the second instructor in each case; assume that one class can only have one instructor. If schedule gives you only a first name or only a last name for an instructor, then make up a fictional first or last name for that instructor so that they have a complete name in the database.
- Note that the printed schedule only displays classes on the Sir George Willams campus. So, even though your database must also store Loyola classes, your printed schedule query must not display them.
- “Open Training (outdoors)” (also called “open time”) means that clients can do whatever they want at the scheduled time and location.
- Full website schedule: Display the schedule for one category of sports the Fall semester 2016 as displayed on the website. In other words, if the query specifies a specific category, then you will produce the same results query as that section on the website. For example, if the query includes a clause that is something like “
where Class.Category = 'Aerobics'“, your query will display the results of the “Aerobics” block only of the website. For those classes, display the class name, instructor, days, times, cost (for Concordia students / everyone else), location (facility), and campus. This report will display only the real data provided to you; it should not include any fictional data. All the requirements and modifications given for the SGW printed report above apply, with the following points specific to the actual website report:
- The query will display results for only one category that you specify (e.g. “Aerobics”), not the entire website report. By changing the parameter for the category, the result will display results for a different category.
- Do not attempt to display anything for
“Personal training”, “Weight room at Le Gym – Fall 2016” or “Intramural sports”.
- On the website, for SGW facilities, the report lists both the facility and the specific location in the facility (e.g. “Le Gym (Gym)”). This is different from the format in the printed schedule, and you should show it in this different way. (However, it is inconsistent: sometimes it includes parentheses, e.g. “Le Gym (Gym)”, and sometimes it omits them, e.g., “Le Gym Studio C”. Your report should consistently include the parentheses.) The Oracle NVL2 function might be useful for this requirement.
- Unlike the printed report, the website schedule query should include any classes held on the Loyola Campus, except that you should completely ignore “Open Time” at Loyola in the Recreational category.
- Although it is not displayed on the website, your website query should include “Open Training (outdoors)” in the Recreation category (as in the printed report); this is on the SGW campus.
- When you see classes with names like “Yoga @ Loyola”, that is the full name of the class that should be recorded and displayed.
- Do not include hyperlinks in your query results (though you can do so if you really want to).
- You should correct obvious typos and inconsistencies (e.g., the Aerobics block should have “Facility” as a column header instead of “Loc.”). If you’re not sure about anything, ask on Piazza for clarification).
- Daily classes report: Display all the classes being held on a specific day (for example, November 17, 2016). So, if for example, you enter ‘
2016-11-17‘ as the date, your query should detect that that date is a Thursday, and should display results for all Thursday classes held that semester. Your report should display the classes, their instructors, their locations, their times, and the names of all the customers in each class. You should have at least three customers in each class for the day you choose. Note that whereas the first two queries are based only on real data from the published schedules for Campus Recreation, this and all the following queries will require you to generate some fake sample data as described in the instructions for the “Data population script” below. You might find the following information helpful for this and some other queries:
- How to get the week day name from a date?
- Format functions for the TO_DATE function, as well as other Oracle date and time functions
- Daily personal training report: Display all the customers with personal training sessions on a specific day (for example, Thursday, November 17, 2016). Your report should display the customer and instructor names, the session times, and whether or not the customer is a current member of Le Gym.
- Semester revenue report by class: Display a report that lists the enrollment and revenue by class. This report should list all the classes offered, and each row should show the number of customers enrolled for that class, and the total revenue obtained for that class. (Revenue is based on the price paid by each customer, based on their customer category, as explained below in the “Data population script“.) The results should be sorted by total revenue, with the highest-earning class on top. In your results, some classes should have no customers and hence no revenue.
- Semester revenue report by day of week: Display a report that lists the enrollment and revenue for each day of the week. This report should list seven rows (one for each day of the week), and each row should show the number of customers enrolled for all classes on that day of the week, and the total revenue obtained on that day of the week. In your results, all days of the week should have some customers.
- Flag customers for discount: Le Gym wants to offer membership discounts to customers who have had more than one personal training sessions in the past, but who are not current members. However, rather than mailing these customers, Le Gym wants to simply flag (mark or track) their accounts so that the receptionist can personally talk to the customer when next they come in for a personal training session. To support this marketing effort, write an UPDATE query that flags all customers who have had more than one personal training session, but who have never subscribed for a regular membership. For this query, note:
- This flag must be stored in the database so that it can be available for later retrieval. You must decide the appropriate way and place to store such information.
- This query requires one or more UPDATE or INSERT queries (there are different ways that you could implement it); it does not involve a SELECT query, unlike all the other queries.
For each query specified in this section, you will submit all the SQL in a single text file. In addition, in a word processor file you should list the code of each query with a screenshot of the query results from Oracle SQL Developer. (However, for UPDATE queries, you will only include the code of the query; there is no corresponding screenshot.) Here is an example of what each image should look like within the word processor document; note that the SQL code is listed above the screenshot in a
monospace font such as Courier New.
Database model (ERD)
Based on the schedule data for Campus Recreation and the description of the required queries, create a fully-specified database model (ERD) for a database required to support Campus Recreation’s reports and operations. Note that the reports currently in the printed schedule and on the website are not normalized; they might not even be in 1NF. Moreover, note that the project would require you to generate more sample data. Your ERD must include all the database tables and relationships needed to store all data, whether real or sample. Please note:
- Your database model must be normalized to 4NF. However, there are situations where full normalization is not ideal or practical. In such cases, you should normalize the table to the appropriate level (usually either 1NF or 2NF) and then add comments to your submission cover page with explanations. For each such table, you must
- State that you consciously and deliberately did not normalize the table to 4NF;
- Specify which normal form the table is in; and
- Justify your reason for normalizing at a level below 4NF.
- Your ERD must follow all standard Crow’s Foot notation. Be sure to refer to the rubric (marking guide) for Homework Assignment 1, since that contains the detailed standards that will be used to evaluate the project ERD as well.
- For the sake of your learning and for academic integrity, you are not permitted to contact Campus Recreation or Le Gym to ask them what their actual database model is, nor are you permitted to ask such questions of anyone who has ever worked on a database model for Le Gym. It is not necessary to obtain further information from Campus Recreation for this project, since the specifications make a number of assumptions which might or might not be their actual situation.
To document the technical structure of your database model, you will also submit a fully specified data dictionary. You should enter it in a spreadsheet following the exact format for data dictionaries given in the class textbook, as shown in Table 3.6 (A Sample Data Dictionary) and Table 7.3 (Data Dictionary for the Ch07_SaleCo Database).
As specified in the submission instructions below, you will first create the ERD and then submit it by the specified deadline (only the ERD, not the data dictionary). I will mark and correct it so that you can continue with the rest of the project. (You will only submit the data dictionary at the end of the semester.)
Database structure (DDL) script
After correcting the ERD, create the SQL required to generate the tables, indexes, and views that are specified in the database model. Every single detail in the ERD and data dictionary must be reflected in your SQL script.
At the top of the file, before the CREATE TABLE statements, you must include DROP TABLE statements so that you can automatically drop and recreate tables as often as you need to. Note that in order to maintain referential integrity, you must create and drop tables in a particular order: For creating tables, you must first create the table on the one side of a one-to-many relationship (your first tables created must not be on the many side of any relationship). For dropping tables, you must first drop the table on the many side of a one-to-many relationship (your first tables dropped must not be on the one side of any relationship). Thus, you will drop tables in the reverse order that you create them.
Although it is not necessary, you are permitted to use automatic tools to generate DDL elements such as CREATE TABLE and DROP TABLE. If you use such a tool, you must cite it properly.
Data population script
You will create the SQL required to add data to the tables you need for this project. For this, you will enter schedule data from the Campus Recreation website. In addition, you will also have to create sample data for tables not specified in the real schedule. The following details specify which data you need to enter into your database:
- The basic requirement is that you must enter all sample data needed to display all the reports in the Queries section. You do not need to add any data not needed for the queries. For example, when creating information for customers, if there is no report that requests that you print their addresses, then you do not need to include any address information in the database. Although a real database would probably have such information, in order to keep this project manageable, only create the data that you need and nothing more.
- To do the required queries, you will also have to create sample data for tables not specified in the real schedule. To generate sample data, you may use a test data generator ; one that I particularly like is GenerateData.com (if you use such a generator, you must cite it properly). Here are some important additional tips:
- In generating sample data, one of the most challenging aspects is handling foreign keys when tables are connected. I’ve prepared a video with detailed instructions on one way to do this. The video is quite long (58 minutes), but it can save you many hours by making the process fairly simple.
- If the only thing you need to do is to transfer spreadsheet data (e.g. from Excel) to SQL, then you only need to watch the video from 15:00 to 18:45 (around 4 minutes). The spreadsheet that I display is available as a link at the bottom of the blog page (“Test Data Excel spreadsheet”). You can copy and modify the code to meet your needs.
- Although you do need to spend some time to generate sample data (this is an important part of real-life database creation and testing), it should not take an inordinate amount of time. If you find that generating sample data is taking too much time or is very challenging, then contact me or the tutor immediately to help you out.
- Some of the customers in your database should be current members of Le Gym, but some of them will not be. (Note that only Le Gym, on the SGW campus, offers memberships.) A customer does not need to be a member of Le Gym to enroll for an instructional class or to sign up for personal training.
- Choose any five of the existing instructors (do not create any new fake instructors) who are also personal trainers available to give personal training sessions.
- At least 15 customers should sign up for personal training sessions during the semester, with at least 3 from each of the five categories of customer. You need to make up specific times and dates for each session within the dates of the semester schedule (September 12 to December 11, 2016). The pricing is $30/hr for students and $35/hr for everyone else. Some of the customers should sign up for more than one training session during the semester.
- For the other records and tables that you need to make up, there is no minimum of how many records you need. Obviously, you need a sufficient number of records in each table to support the total number of customers and to generate the reports.
Note that, to maintain referential integrity, you must insert data into tables in a particular order: You must first add data into the table on the one side of a one-to-many relationship (the first tables into which you add data must not be on the many side of any relationship). In other words, you should add data into tables in the same order in which you created the tables.
For the INSERT statements, you are permitted to use an automatic tool to convert sample data from tabular format to SQL INSERT statements (there are numerous examples on the Web, including a guide for importing data into Oracle SQL Developer). If you do use such a tool, you must cite it in the comments section at the top of your INSERT script by including a URL reference to its source webpage.
For each table that you populate, you must generate the full data result of the table (using the “SELECT * FROM TableName” statement), and save an image of the first page of results from Oracle SQL Developer and submit them all in one word processor document. Here is an example of what each image should look like within the word processor document; note that the SQL code is listed above the screenshot in a monospace font such as Courier New.
It is important that each team member contributes significantly for your team to succeed in this project. To evaluate each member’s contribution, each individual will submit a confidential Peer Evaluation in the middle of the semester and at the end of the project. Your individual grade for the group project will depend partially on your group grade, and also on how your peers assess your contribution and participation. So, be a team player!
However, rather than waiting until problems get so serious that they negatively affect your team’s perform, if any student is concerned about a teammate’s performance, then at any time in the course, early on please notify me. I will anonymously e-mail the following message to the student:
“One or more of your teammates has anonymously notified me that they are concerned about your performance on the team. These concerns might be related to the quality of your contributions, your responsibility in contributing to the team effort, your attitude and cooperation with other team members, or other matters of concern.
Please have an honest conversation with all of your teammates to address any possible concerns. When you have this conversation with your team, please don’t try to find out who might have said what to me, but rather try to understand any issues of concern (please note that other teammates might also have concerns that they have not yet expressed, so it might not be whom you think). Hopefully, any concerns will be quickly resolved so that your team can perform at its best.”
There are three submission deadlines, and two of the deadlines require not only a group submission, but also individual peer evaluation submissions.
1. Team member list submission
Each project team must post the list of members on Piazza no later than Wednesday October 5, 2016 by 11:55 pm. In order to succeed in the project, you must form your team early and start working. Since each student is personally responsible to join a team, anyone who joins a team later than October 5 will receive a 1% penalty deducted from their individual final project grade for each day that they are late in joining a team. Obviously, you are not contributing to the team effort if you haven’t even joined a team yet.
2. Project ERD submission
The ERD is due Wednesday November 2, 2016 by 11:55 pm. It is worth 20% of the project grade (that is, 5% of the course grade). There are two parts to the submission:
First, one group member must upload the ERD to Moodle in one single zip file named “Group X ERD.zip” (replace “X” with your group letter), with the following individual files:
- A cover page (word processor file) that lists the name of all group members. In this file, if you choose, you may list any notes and comments for the project. For example, you might explain and justify why some tables might not be in 4NF.
- A one-page PDF file named “ERD.pdf” with the fully specified ERD.
Second, every group member must individually complete and submit to Moodle a confidential ERD peer evaluation form where you evaluate the performance of every member of the group (including yourself) for their work so far in the semester. This evaluation is non-binding; it is for feedback purposes so that each teammate knows how their teammates percieve their contributions thus far. This individual evaluation is important for the entire team to function smoothly, and so anyone who submits it late will receive a 1% penalty deducted from their individual final project grade for each day that it is late.
3. Final project submission
The final complete project is due Monday December 5, 2016 by 11:55 pm. It is worth 80% of the project grade (that is, 20% of the course grade). There are two parts to the submission:
First, one group member must upload the final complete project to Moodle in one single zip file named “Group X final project.zip” (replace “X” with your group letter), with the following individual files:
- A cover page (word processor file) that lists the name of all group members. In this file, if you choose, you may list any notes and comments for the project. For example, you can explain and justify why some tables might not be in 4NF.
- A one-page PDF file named “ERD.pdf” with the fully specified ERD. This is the corrected version of the ERD that the final project is based on.
- A spreadsheet file named “Data Dictionary” with the fully specified data dictionary.
- A text file named “Structure.SQL” with the database structure script described above.
- A text file named “Population.SQL” with the data population script described above.
- A text file named “Reports.SQL” with the SQL queries for reports and operations listed above; they should all be in one file. Write the name of each query as specified above (e.g. “1. SGW printed schedule”) in the comments before each query.
- A word processor file named “Screenshots” with the screenshots from the table listings from Structure.SQL and each of the reports from Reports.SQL as specified in the details; these should all be in one file. Write the name of each query in the comments before each query.
Second, every group member must individually complete and submit to Moodle a confidential final project peer evaluation form where you evaluate the performance of every member of the group (including yourself) based on their contribution to the entire project effort from beginning to the end. (However, if a team member has significantly improved their performance since the ERD submission, you may choose to only evaluate their most recent performance.) Your project grade will depend partially on how your teammates evaluate you. This individual evaluation is required for me to prepare the final project grade, and so anyone who submits it late will receive a 1% penalty deducted from their individual final project grade for each day that it is late.
Suggested procedure for successfully executing the project
This project is quite large and complex, and so I provide here a suggestion for how to go about it effectively. The most important point is that rather than strictly dividing up all the work so that each person does their own part, it is most effective to work together on most parts. Multiple minds will help to understand and overcome challenges more easily, and will identify and correct errors more effectively. This could save you more time and would certainly increase the quality of the final project. This is merely a suggestion, but it highlights the parts that are best for everyone to be involved in, and the parts where it might be more profitable to divide the work:
- First meeting:
- Before meeting: Each member should indvidually read all requirements, then identify entities and attributes
- Before meeting: Send each other your lists, then each member individually corrects their own list
- MEET: Agree on entities and attributes
- Second meeting:
- Before meeting: Each member should indvidually specify relationships with full cardinalities, and also draft data types and structure for each attribute
- It is very important to name your relationships in your drafts. Many errors are hidden when you don’t name your relationships.
- Before meeting: Send each other your relationships, then individually correct your relationships
- Agree on relationships and cardinalities
- Agree on general data types and structure for each attribute
- Finalize ERD together
- Before meeting: Each member should indvidually specify relationships with full cardinalities, and also draft data types and structure for each attribute
- SUBMIT the ERD and individual peer evaluations
- Third meeting: After the ERD is marked, MEET:
- Correct the ERD together and decide on the final ERD for the rest of the project.
- Decide who will be responsible for various tasks:
- SQL queries: No query should be done by only one person. At least two people should be responsible for each query. Collaborative work on queries increases learning and effectiveness. Moreover, every team member should work on at least two or three queries, at the very minimum.
- Data dictionary and database structure script: Two people should be assigned to work together on these closely related tasks. They should be people who do work early (not late), or else they will hold everyone else up.
- Data population script: One person can do this, but they should also be someone who does work early (not late), or else no one will be able to test their queries on. Ideally, this person should be very good with Excel; my instructions on data population could be very helpful.
- Project work on your own or in small groups of two or three, according to assigned tasks
- Data dictionary and database structur script have to be done first and quickly.
- Data population script has to be done next and quickly.
- Everyone can begin drafts of SQL queries, but real coding and testing can’t begin until the data is all finalized.
- Before final submission, EVERYONE should execute and verify all SQL queries (not only the ones that each person was responsible for, but all of them) to approve them as correct
- SUBMIT the final project and individual peer evaluations