2022

Facts and Learning Tips

Databases will be one of the hardest subjects you have encountered during your studies. Although decades of experience suggests that the only thing needed to pass (with flying colors) is willing to obtain the knowledge presented to you during the course. Trust me, the knowledge you get here is basically a full-stack slice of THE system engineering skillset, which can be directly applied in the industry. During the semester, me and my colleagues attemp to give you a profound knowledge of both practice and theory. This includes the tech skills introduced during lab sessions, but dives down all the way to database theory including high-level structural modeling, physical storage considerations, algorithms and detailed answers for many “why”-s. So now I’m giving you the “how”: some tips and best practices for being a successful student of Databases:

  • Study every single week. For each lecture, practice and lab, you should be familiar with the whole material of the lectures presented to you at that given point in time. You also have to be familiar with the relevant sections of the reference book. Do not be surprised, if the quiz or midterm or exam asks for knowledge not discussed in class, but discussed by the reference book. During (self-)studying, your instructors are open to discussion via e-mail in case you don’t understand something.
  • Knowledge does not mean being able to recite the material word-by-word. Reciting the material word-by-word is completely useless in this course, as it is not what makes a good engineer.
  • Knowledge means having a deep understanding of the material, seeing the connections between its different parts, and knowing the answer for all “why”-s. This knowledge can be obtained by trying to explain the material at home, in front of the mirror, to your teddy bear, your friend or roommate: Do this out loud, and while doing so, you will hear yourself and sense which parts are missing and have to be worked on.
  • Knowledge also means being able to apply your knowledge. For this, doing all the tutorial exercises, doing the whole exercise book, and checking it with your tutorial instructor is a very good way.
  • It is OK to be wrong. It is a valuable and necessary step in any learning process.
  • It is OK to ask. As being wrong is OK, don’t be afraid to. All your questions will be appreciated! Ask during lectures, ask during tutorials, ask during labs or ask via e-mail: We are there to help and answer. Actually, that is why we’re there.
  • It is not okay to pretend you understand while you do not. Feel comfortable to stop the class at any time and ask any question. There is no such thing as dumb question. All questions are valuable!
  • As the goal is a profound and practical knowledge, the tests will contain exercises for checking this. Test exercises might greatly differ from the ones in class, but will need the same skillset and knowledge.
  • A prerequisite of this course is to speak, comprehend and read English fluently. If you feel you are not there yet, do drop the course, spend the same amount of time improving your English skills, and come back next fall. Background: This course is the so-called “mirror course” of the Hungarian Databases course. You will get the same degree eventually, so you have to meet the same requirements to pass Databases. Hungarian students speak Hungarian fluently, so you have to speak English fluently to avoid language skills from being a barrier of your Databases studies.

Material

Reference book. This is the book you have to self-study at home besides the class material. The relevant sections are defined below.

PPT presentations:

Exercise book. If you do every single exercise, and discuss your questions with your instructors, your chances of getting a good grade will significantly improve (as decades of experience suggests).

Here are the recordings of last year’s Databases lectures and tutorials. The ones relevant for you are the ones under my name. “Databases, Lecture n” is the nth lecture, “Databases, Practice m” is the m tutorial session. Also, there is a “Databases, 7,5th Lecture”. Don’t miss that one. Note: These videos only cover like half of the complete material!

Lectures

Lectures are held every Monday from 14:15 to 15:45. Location: QBF.11

Topics and corresponding sections in the reference book:

  • ER modeling: p32-42. Notes for the ER modeling chapter of the book:

    • disregard p33/paragraph 2., as we don’t talk about object-oriented data modeling

    • The book uses the term “relationship set” the same way as our course does. However, the book uses the word “relationship” as a synonym of “relationship type”, which describes which entity sets are connected by the given relationship type. According to the book, the instances of a relationship (type) are all the exact representations of the given relationship type, which are relationship sets, where each relationship set contains different connections among the entities of the connected entity sets. This terminology of the book is a very nice, mathematically fair and consistent. However, in this class, we will use an rather lightweight terminology, where:

      • a relationship denotes an element of a relationship set, i.e. a relationship defines a connection among a pair of (or set of) entities (i.e. entity set elements)

      • a relationship set is the set of all relationships of the same semantics. There is only a single relationship set of the given semantics, and its elements (the relationships) may change by time

      • So whenever the book refers to a “relationship”, it means “relationship set” in the terminology of our course. Some exact cases (not all of them) will be mentioned below.

    • As a consequence of the above, on p36, the first paragraph under section Relationships is adjusted to the terminology used in this course as follows: “A relationship set among entity sets is an ordered list of entity sets. A particular entity set may appear more than once on the list. If there is relationship R among entity sets E1, E2,…, Ek, then the current instance of R (i.e. the current relationship) is a set of k-tuples. Each k-tuple (e1,e2,…,ek) in relationship set R implies that entities e1,e2,…,ek, where e1 is in set E1, e2 is in set E2, and so on, stand in relationship to each other as group. The most common case, by far, is where k=2, but lists of three or more entity sets are sometimes related.”

    • In Example 2.4 MOTHER_OF is a relationship set itself, not a relationship. (p1,p2) is a relationship (instance).

    • In section “Entity-Relationship Diagrams”, under 3, “Diamonds represent relationship types” according to our terminology.

    • When the book writes about “Functionality of Relationships”, according to our terminology, it writes about “Functionality of Relationship Sets”

    • Also note that “isa relationship” is separate from the notion of “relationship”, as an isa relationship is defined between entity sets and not entities.

    • Note that the concept described in the “Borrowed key attributes” is the same as that of weak entity sets, except for the final example of the section about multiple citizenships: the toolset used in this course does not directly support modeling such cases.

  • relational data model: p43-65
    • On page 49, in point 4 (top of page) k=1 that is, there is only a single entity set on the ‘many’ side of the relationship set, as we only define cardinality (functionality) for binary relationship sets. In conclusion when mapping a many-to-one (binary) relationship set to a relational scheme, the key of the relational scheme will be the key of the entity set on the ‘many’ side.
  • Physical organization: p294-330 (Section 6.7 is not needed)
    • skip the part from section “Variable-Length Records” on p300, all the way to the beginning of section 6.2 on p304 (Section 6.2 is needed)
    • skip the part from section titled “Sorted Files with Pinned Records” on p318, all the way to the beginning of Section 6.5 on p321 (Aection 6.5 is needed)
    • you can read Section 6.13 (p358) for fun (will not be part of midterm or exam)
    • you might want to read Section 6.8, but in lecture we will have a more concise discussion on this (and we will only use dense indices for second indices)
    • B*-tree example here
  • Query optimization: See PPT
  • Transaction management: p467-543
    • TODO
  • Design theory: p376-403
    • TODO

Instructors

Evaluation of results

A single midterm test will be written. Furthermore, after each topic. Passing the midterm test is a requirement for the signature. One retake is possible in case of failure on the midterm test.

Furthermore, there will be a quiz (10 minute) in class after each (sub)topic. You will be notified about the next quiz in the previous lecture. Quizzes are discussed right away (5 minutes). Quizzes and their discussion take 15 minutes at maximum. Further discussion is possible after class or via e-mail. The purpose of quizzes is to serve as a feedback for you on how clear you are with the material and how much you are on the good path on passing the midterm test or exam. Quizzes act as a “progress bar” if you like. Quizzes generally do not influence the signature or the final mark, but might convince me to or not to round your final mark up in case you are in between marks.

Your final grade is calculated as follows: For each lab, you get a lab grade, depending on the grade you got for your work from the evaluator, and your classwork in lab. Note: Your grade for each lab will be assigned by your lab instructor. The evaluator only grades the work you handed in. The lab instructor has the right to give a different lab grade.

You will get 4 lab grades in total (for the labs, except for the Team Workshop), which will be averaged (let’s call this lab average L). Each of your lab grades must be greater than or equal to 2 to get the signature. A single lab can be retaken in case of failure. If you fail 2 labs or more, you will not get the signature. Failing a lab entry test makes you fail the given lab.

The prerequisites of the signature are thus: Passing the midterm (or the midterm retake), and passing all labs.

If you have the signature, you can take the exam. You also get a grade for your exam. Let E denote this grade. E must be greater than or equal to 2.

Your final grade, registered in Neptun will be 0.6E+0.4L.

Your performance on the Team workshop might increase your final mark.

Tutorials (practices)

Tutorials are held on Thursdays, from 16:15 to 17:45 in room IE.220 (take the triple elevator to the 2nd floor), according to the following schedule. You should prepare for these classes, from all preceding lectures.

Instructor

Material and dates

Please make sure you have this open at the beginning of each practice.

Topic Date Exercise sheet
ER Modeling Sep. 15 HERE
LECTURE Sep. 22  
Relational Queries Sep. 29 HERE
Physical Storage Oct. 13 TODO
Query Optimization Oct. 27 TODO
LECTURE Nov. 10  
Transaction Management Nov. 24 TODO

Please note that the tutorial about design theory will be incorporated to the lectures. Also, the tutorial time slots denoted by LECTURE mean that in the given Thursday there will be a lecture in the time and place of the tutorial.

Laboratories

Instructors and contact

Lab responsible:

  • Ruba Mahasneh, ruba.mahasneh -AT- db.bme.hu

Lab instructors:

  • Idrees Razak, idreesrazak -AT- edu.bme.hu
  • András Gerendás, gerendas.andras -AT- db.bme.hu
  • AlGhaith Ahmad, alghaith.ahmad -AT- edu.bme.hu
  • Toghrul Jabbarli, jabbarli.toghrul -AT- edu.bme.hu
  • Tatiana Barbova, tatiana.barbova -AT- edu.bme.hu
  • Chaitanya Arora, chaitanya.arora -AT- edu.bme.hu
  • Ashwin Varma, ashwin.varma -AT- edu.bme.hu

Place of labs

For the Team Workshop EVERYONE go to TODO

For all the other labs, do as described below:

  • If your NEPTUN ID starts with A, B, C, D or E then go to room R4N. Your lab instructors are Toghrul Jabbarli, Ashwin Varma, and Idrees Razak
  • If your NEPTUN ID starts with F, H, I, J or M then go to room R4O. Your lab instructors are András Gerendás and Tatiana Barbova
  • Otherwise, go to room R4P. Your lab instructors are AlGhaith Ahmad and Chaitanya Arora.

Time of labs

Each lab is on Thursday, from 16:15 to 17:45, according to the following schedule:

Topic Date
Oracle Oct. 6
SQL I Oct. 20
SQL II Nov. 3
SQL III Dec. 1
Team Workshop & Retakes Dec. 8

The Team workshop will be in room IE.220, only for those, who do not need to retake any labs.

General information

You should arrive at the lab prepared from the course material. At the beginning of each lab, an entry test will be written which is necessary for completing the given lab practice. The course material can be found below for each topic, separately.

You will start solving the exercises during class. The amount of exercises that you get can be solved within a single class if you are prepared. Once finished, you can upload your solutions in a predefined format (see below) in the lab portal, which is here. If you can’t finish the exercises during class, you can finish them at the Students’ Computer Center (HSZK, floor 4, building R), or at home, but 48 hours after the beginning of the given lab, your solution has to be uploaded. The deadline is thus, Friday, 16:15.

The labs are equipped with the hardware and software needed for the classes. Using your own computers during labs is allowed, although we cannot give support for installing/configuring or troubleshooting the necessary tools on your laptops.

Software you will need

  • Oracle SQL Developer 4.2.0. To run it: Unpack it, and execute sqldeveloper\sqldeveloper.exe. If it does not work, then run sqldeveloper\sqldeveloper\bin\sqldeveloper.bat. This is for Win64. You’ll need JRE to run this.
  • For other platforms, you can download the same version of SQL Developer here. You’ll need to install the appropriate Java 8 JDK to run this. Attention! On Oracle’s website, SQL Developer 18.1 stable can be downloaded since April 5th 2018. However, in the lab we will use version 4.2. We are unaware of any significant issues caused by the differences between versions, but should there be any issue due to this, we might not know the solution. So please, use version 4.2

The work environment is provided and supported in HSZK.

Network settings for home

SQL Developer connection:

  • Hostname: rapid.eik.bme.hu
  • Port: 1521
  • SID: szglab

Useful material

Lab 1, Oracle

Please prepare from the following PDF:

Your exercise sheet can be downloaded from here.

You should upload a single PDF file to the lab portal, containing your report. The name of the file should be NEPTUN-1-c16.pdf, where you should substitute NEPTUN with your actual NEPTUN code.

Below is the SQL code that has to be executed after solving exercise 5.

-------------------------------------------------
-- SQL code to be executed in the 5th exercise
-------------------------------------------------
column grantor format a8;
column grantee format a8;
column table_name format a20;
column privilege format a20;
select grantor
     , grantee
     , table_name
     , privilege
     , initcap(grantable) grant_opt
  from all_tab_privs
 where grantor = user
    or grantee = user
 order by grantor, grantee, table_name, privilege
;

Labs 2, 3, and 4: SQL1, SQL2, SQL3

The material for the SQL labs (SQL1, SQL2, and SQL3) can be found here. The appendix about SQL constraints is here

  • For the first SQL lab, you should be prepared from table definition, data manipulation statements and from the ER modeling and ER->relational mapping chapters of the Databases course.
  • For the second SQL lab, you should be prepared from SQL queries.
  • For the third SQL lab, you should be prepared from the entire SQL guide.

For the requirements of completing this lab, and for some SQL tips, read this Students’ guide!

The skeleton of the script to be handed in is here, while here is a page that helps you create the XML file if you don’t do it manually. USING THE GENERATOR PAGE IS PREFERRED.

SQL 1 sample file is here

Lab 5, Team workshop

Before you join this workshop, make sure you are familiar with all of the material already covered in the lectures, tutorials and previous labs. Exercises will only be shared with you during the lab.

Are you ready for a challenge?

Choose your teammates (3 or 4), then the first two teams to deliver the correct solution will be entitled to upgrade their final grade.

The aim of this workshop is to facilitate discussions and activities to explore ER modeling by solving two real-life use cases. The only way you can claim the extra marks is by finishing the two case studies within the lab time. The first 10 minutes will be dedicated to the instructors explaining (going over) the use cases and to sharing the link were you can download them.

Then the competition beings!

Note: All the teams will be able to view the solutions, as they will be posted on the portal after the lecture.

The solution of the team workshop is here

SQL exercises

In order for you to be able to try writing SQL statements before the lab, and gain some practical knowledge, I made for you a script and some exercises you can do before the labs.

To initialize the environment, run this script in SQL Developer. This creates two tables and fills them up with data you can work on. Then, query the following:

  • The name of each person.
  • People and cars belonging to them.
  • People without cars.
  • Car types and number of owners of that type.
  • People and number of cars they have in descending order of number of owned cars.
  • People and number of cars they have, including those without a car (i.e. having 0 cars). Rows should be ordered by the number of cars owned, in ascending order.

Create a table storing which person drove which car (not necessarily their own) and when.

Insert example data into this newly created table. Try to insert data belonging to a nonexistent person or car. Do you get an error message? If not, create the appropriate constraints, which you have not created yet.

Have fun!

Theoretical exercises

TODO remove and/or into exercise book

Here are some exercises you can use for preparing for tests (beyond the tutorial material):

  1. Given is the following description: A patient might have multiple diseass, and there are diseases not affecting anyone at the moment. Each patient is treated on a single base. A patient might be treated by multiple doctors. Doctors might have multiple patients, who can be lying in different bases. A base can be empty, and always belongs to a single hospital. A hospital might have multiple bases and multiple doctors. A doctor is employed by at most 3 hospitals. A hospital is always led by a director who is a doctor at the given hospital, has an economy degree too, and is not employed by any other hospital. Create an ER diagram about the above problem. Display the cardinality (functionality) of the relations with the learned syntax. Identify entities with appropriately chosen attributes, and define the keys.

  2. We’d like store the monthly menu of a canteen in a database. Each day, the menu contains a soup, a main course, and a dessert. A meal can occur multiple times within the month, but we know that to a single soup-main course combination, a single dessert is suitable. For each meal, we’d like to store its name, the energy it contains, and its ingredients (ingredient and amount needed to produce the meal). Create an ER-model for the database.

  3. Given is the following description: A hospital contains multiple departments. Each department has a main leading doctor and an arbitrary number of leading doctors. If there is no main leading doctor, then there is a temporarty leader for the department. They all have a doctor’s degree, and are the employees of the hospital, and not employed in any other hospital. Beyond them, the hospital staff has other members too: nurses and auxiliary personnel. Doctors and nurses always work at a given department of the hospital, while the auxiliary personnel can directly belong to the hospital too. Each employee has a code number, but for doctors, we register their state-level doctor IDs as well. The hospital is always led by a doctor who has an economy degree, and not employed in any other hospital. If a patient gets into the hospital, can be assigned to different departments one after another, and can be treated for different diseases until completely cured. Create an ER-diagram about the above topic. Display the cardinality (functionality) of the relations with the learned syntax. Identify entities with appropriately chosen attributes, and define the keys.

  4. A file of 15 525 records has to be stored using sparse index. Record length is 850 bytes, block capacity is 4000 bytes. A key takes 50 bytes while a pointer takes 18 bytes.
    • At least how many blocks are needed?
    • How long does it take at maximum, to read the contents of a record if the memory has 5000 bytes free space? (One block operation takes 5 ms)
    • Does it help with the record access time if we have 10 times as much free memory? What if we have 100 times as much? How should we use the extra memory?
  5. We would like to store a file using a single-layer sparse index built on a dense index. Give a lower estimate for the number of necessary blocks within the following conditions:
    • The file consists of 3 000 000 records.
    • A record takes 300 bytes.
    • The size of a block is 1000 bytes.
    • A key takes 45 bytes.
    • A pointer takes 5 bytes.
  6. We store 1 000 000 records in a database, using bucket hash. A record takes 110 bytes, a block is 3000 bytes , a key is 25 bytes, while a pointer is 64 bytes. Reading a block takes 5 ms. The record access time must not be greater than 20 ms. The bucket catalog fits in the memory, and the hash function is even.
    • What is the average record access time?
    • How many bytes does the bucket catalog occupy from the memory?
    • How much extra memory do we need to reduce the block access time to its half?

Midterm

  • November 7th, 14:15-15:45, in room TODO

Retakes

  • Laboratory retakes will be on December 8th 16:15. TODO

  • The midterm retake will be on December 5th, between 14:15 and 15:45, in TODO. Whoever does not have a midterm retake, does not have to come to class that day.