CISC 5640

NoSQL Database Systems

Final Exam, Spring 2021

Instructor: Ruhul Amin

I declare that I will not copy any part of this questionnaire from the class lecture or online materials. Moreover, I ensure that I will not ask any questions to anyone either directly or indirectly by any means. Overall, I will maintain honesty and integrity during exam time.

Name: Douglas Mensah    ID: A18685161

90 Minutes Exam; Date: May 14, 2021

1. Name three Big Data properties and define each.

   Velocity: the speed at which data is generated and changes.
   Variety: data arrives in different formats.
   Volume: data exists in very large quantities.

2. What do you mean by static vs dynamic, and structured vs unstructured data?

Dynamic data: data that changes over time or is updated periodically, e.g. website content.
Static data: data that does not change after it has been recorded, e.g. an MRI scan.
Structured data: data that is organized according to a predefined model, e.g. rows in a relational table.
Unstructured data: data that has no predefined model or format, e.g. an MP3 file.

3. Which of the 4 types of datasets (static, dynamic, structured, unstructured) are used for RDBMS/NoSQL and why?

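Structured data is used for RDBMS, because an RDBMS stores data in relational tables with a fixed schema. Unstructured and dynamic data are used for NoSQL, because a flexible schema can accommodate data whose format varies or changes over time.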

4. What is the main limitation of Vertical and Horizontal scaling of a database? Explain.

Vertical scaling: limited by the amount of CPU, RAM, and other resources that can be configured on a single machine.
Horizontal scaling: limited by the read-to-write ratio and by the communication overhead between nodes.

5. Horizontal scaling requires Sharding and Replication. Why?

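Sharding: splits the data across nodes so that different nodes can serve requests concurrently.
Replication: keeps copies of the data on several nodes so that reads can scale out and the data stays available if a node fails.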
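
A minimal sketch of how sharding and replication can work together, assuming hash-based shard selection and a fixed number of copies per shard. The names shard_for, NUM_SHARDS, and REPLICAS are hypothetical and not taken from any particular database.

```python
import hashlib

NUM_SHARDS = 3   # assumed number of shards (illustrative)
REPLICAS = 2     # assumed number of copies kept of each shard (illustrative)

def shard_for(key: str) -> int:
    """Pick a shard by hashing the key, so keys spread across shards."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# cluster[shard][replica] is one node's local key-value store.
cluster = [[{} for _ in range(REPLICAS)] for _ in range(NUM_SHARDS)]

def put(key: str, value: str) -> None:
    """Write to every replica of the shard that owns the key."""
    for replica in cluster[shard_for(key)]:
        replica[key] = value

def get(key: str, replica_index: int = 0) -> str:
    """Read from any one replica of the owning shard."""
    return cluster[shard_for(key)][replica_index][key]

put("student:1", "Douglas")
print(get("student:1", replica_index=1))  # the copy on the second replica
```

Sharding decides which group of nodes owns a key; replication decides how many nodes in that group hold a copy, so a read can go to any of them.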

6. What do you mean by ACID properties? Explain with example.

Atomicity: the whole transaction happens at once or not at all. If one operation fails, the entire transaction is aborted. Example: if a transaction runs several join queries followed by a final update and one of the joins fails, the whole transaction is aborted.
Consistency: this follows from atomicity. The database must be in a valid state before and after the transaction, with all of its rules and constraints satisfied. Example: if an account balance may not go negative, an update that would make it negative is rejected, so the rule holds both before and after the transaction.
Isolation: concurrent transactions execute independently of each other. Example: for a delete and an update running at the same time, it appears to the update that the delete either finished before the update started or began after the update finished.
Durability: once a transaction is committed, its changes persist even if the server crashes or restarts. Example: if a client's update was acknowledged as committed, the update must still be present after a server restart.
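
Atomicity and consistency can be illustrated with a short sketch using Python's built-in sqlite3 module. The accounts table, the names alice and bob, and the transfer helper are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"  # consistency rule: no negative balance
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

def balances() -> dict:
    """Return the current balance of every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

def transfer(src: str, dst: str, amount: int) -> bool:
    """Move money between accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer("alice", "bob", 500)  # second update violates the CHECK rule
print(ok, balances())
```

The credit to bob succeeds on its own, but because the debit from alice fails, the rollback undoes both updates and the balances are unchanged.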

7. Which of the ACID properties can be ensured by the 2PC protocol? Explain.

         

8. CAP theorem is used to explain the limitations of a distributed database. Explain each of the three properties used by this theorem:

Consistency: all clients see the same data at the same time. If you write data to the distributed system, a later read from any node of the system returns that same data.
Availability: the system continues to operate even when some nodes fail. Every read or write sent to a non-failing node of the cluster completes successfully.
Partition tolerance: the system continues to operate even when the network is partitioned. If nodes are unable to talk to each other, the system should still function.

9. Why is ‘Loose Consistency’ easier to implement than ‘Strict Consistency’ in a distributed database? Explain with an example.

                 

10. Explain the BASE properties of a distributed database. Give an example of any popular service that uses this property.

BASE:
Basically available: the system guarantees availability. Faults may occur, but the system will still be available to some users.
Soft state: the state of the system may change over time, so copies of a data item may be temporarily inconsistent.
Eventual consistency: the system will eventually become consistent, because changes made on different nodes are propagated until all copies agree.
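Example: an online shopping cart such as Amazon's, which was built on the eventually consistent Dynamo store. (A banking service is usually cited as an ACID use case instead.)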
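
Soft state and eventual consistency can be sketched with two in-memory replicas that are synchronized later. The Replica class, the version numbers, and the "highest version wins" rule are assumptions chosen for illustration; real systems use a range of conflict-resolution strategies.

```python
class Replica:
    """One node's copy of the data: key -> (version, value)."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, version):
        # Keep the incoming value only if it is newer than what we hold.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

def sync(a, b):
    """Exchange entries in both directions; the higher version wins."""
    for key, (version, value) in list(a.data.items()):
        b.write(key, value, version)
    for key, (version, value) in list(b.data.items()):
        a.write(key, value, version)

r1, r2 = Replica(), Replica()
r1.write("likes", 10, version=1)
r2.write("likes", 10, version=1)

r1.write("likes", 11, version=2)  # update reaches only one replica at first
stale = r2.read("likes")          # soft state: r2 still returns the old value
sync(r1, r2)                      # background propagation
fresh = r2.read("likes")          # replicas have converged
print(stale, fresh)
```

Both replicas stay available for reads and writes the whole time; the price is that a read between the write and the sync can return a stale value.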

11. What benefits do the following NoSQL data modeling techniques offer? Explain.

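Denormalization: a database optimization that copies the same data into several documents or tables so that a query can be answered from one place. It speeds up reads and avoids joins, at the cost of extra storage and more work on writes.
Aggregate: groups related data into a single unit, such as a document with nested fields, so that it can be read and written together.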
Application side join       
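
The difference between an application-side join and denormalization can be sketched with plain Python dictionaries standing in for collections. The authors and posts data and both function names are hypothetical.

```python
# Normalized layout: two "collections" linked by author_id.
authors = {1: {"name": "Ada"}}
posts = [{"title": "NoSQL basics", "author_id": 1}]

def posts_with_authors_app_join(posts, authors):
    """Application-side join: the client looks up each author and merges."""
    return [
        {"title": p["title"], "author": authors[p["author_id"]]["name"]}
        for p in posts
    ]

# Denormalized layout: the author name is copied into every post.
posts_denormalized = [{"title": "NoSQL basics", "author_name": "Ada"}]

def posts_with_authors_denormalized(posts):
    """Denormalized read: everything needed is already in one record."""
    return [{"title": p["title"], "author": p["author_name"]} for p in posts]

joined = posts_with_authors_app_join(posts, authors)
direct = posts_with_authors_denormalized(posts_denormalized)
print(joined == direct)
```

Both paths return the same result. The application-side join keeps one copy of each author but needs an extra lookup per post; the denormalized layout reads faster but must update every copy if the author's name changes.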

12. What are the two important principles of Sharding? Explain.

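1. Vertical partitioning: splits a table by columns, so that different columns are stored on different nodes.
2. Horizontal partitioning: splits a table by rows, so that different rows are stored on different nodes.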

13. What is the difference between Peer-to-Peer and Master-Slave replication? Explain with an appropriate architectural diagram for each.

  
Diagram 1:                            Diagram 2:

14. State three differences between NoSQL and RDBMS based on Model, Data, and Schema:

          NoSQL                                    RDBMS
Schema    Schemaless                               Uses schema
Data      Unstructured data                        Structured data
Model     Document, column, key-value, or graph    Relational model

15. Name 4 NoSQL database techniques. For each type, include an example database and a service in which such a database can be used more effectively than the alternatives.

1. Document: a document-oriented NoSQL database stores and retrieves data as key-value pairs in which the value part is stored as a document. DB example: MongoDB. Service example: an attendance system.
2. Key-value: a key-value database stores data as a hash table in which each key is unique and the value can be JSON, a string, or another predefined data type. DB example: Redis. Service example: instant messaging, where a message can be given a time to live and is deleted when it expires.
3. Graph: a graph database stores entities as nodes and the relationships between them as edges. It is mostly used for social networks, logistics, and spatial data, where there are many patterns and mutual relationships. DB example: Neo4j. Service example: social networking apps or platforms.
4. Column family: columns are grouped into predefined families and each column is treated separately, with the values of a single column stored contiguously. DB example: Cassandra. Service example: big data workloads and data warehousing.

16. Write down MongoDB, Cassandra and Neo4J terminologies being used in place of RDBMS terminologies:

RDBMS          MongoDB              Cassandra                     Neo4J
Database       Database             Keyspace                      Database
Table          Collection           Table (column family)         Node label
Row            Document             Row (row key)                 Node
Column         Field (key-value)    Column (column key)           Node property
Join           Embedded document    None (denormalized tables)    Relationship
Primary key    _id                  Primary key                   Unique node property

17. What are the three basic operations in Redis? Include an example for each operation.

Put: HSET key field value, e.g. HSET student:1 name "Douglas"
Get: HGET key field, e.g. HGET student:1 name
Delete: HDEL key field, e.g. HDEL student:1 name
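
The behavior of these three hash commands can be mimicked with a small in-memory stand-in. TinyHashStore is a hypothetical class written for this sketch, not a Redis client; it only mirrors the return values of HSET, HGET, and HDEL for a single field.

```python
class TinyHashStore:
    """In-memory imitation of Redis hashes: key -> {field: value}."""

    def __init__(self):
        self._hashes = {}

    def hset(self, key, field, value):
        """Set a field; return 1 if the field is new, 0 if overwritten."""
        fields = self._hashes.setdefault(key, {})
        is_new = field not in fields
        fields[field] = value
        return 1 if is_new else 0

    def hget(self, key, field):
        """Return the field's value, or None if key or field is missing."""
        return self._hashes.get(key, {}).get(field)

    def hdel(self, key, field):
        """Delete a field; return 1 if it existed, 0 otherwise."""
        fields = self._hashes.get(key, {})
        if field in fields:
            del fields[field]
            if not fields:  # a hash with no fields left is removed entirely
                del self._hashes[key]
            return 1
        return 0

store = TinyHashStore()
created = store.hset("student:1", "name", "Douglas")  # put
name = store.hget("student:1", "name")                # get
removed = store.hdel("student:1", "name")             # delete
print(created, name, removed)
```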

18. Write down the three collection types (with example) you can use in the Column-Family database:

   
   
   

19. What are the Nodes and Edges in the Graph database? Explain a scenario in which graph database results in gaining time and space complexity. 

Nodes: represent entities and hold properties as key-value pairs. A node is comparable to a row in a table, and its label to the table.
Edges: represent relationships; each edge connects two nodes.
Node example: a Student node can have name: "Douglas" and age: 20 as key-value pairs.
Edge example: in the movie database example from class, a Person node is connected to a Movie node by an [:ACTED_IN] relationship.
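
A rough sketch of why stored relationships help: with edges kept as adjacency lists, finding co-actors only follows the edges of the nodes involved instead of scanning every record. The people and movie names here are made up, and this plain-Python structure is only an analogy for how a graph database navigates relationships.

```python
from collections import defaultdict

# (source node, relationship type, target node); all names are hypothetical.
edges = [
    ("alice", "ACTED_IN", "movie_x"),
    ("bob", "ACTED_IN", "movie_x"),
    ("bob", "ACTED_IN", "movie_y"),
]

outgoing = defaultdict(list)  # node -> [(relationship, target)]
incoming = defaultdict(list)  # node -> [(relationship, source)]
for src, rel, dst in edges:
    outgoing[src].append((rel, dst))
    incoming[dst].append((rel, src))

def co_actors(person):
    """People who acted in at least one movie together with `person`."""
    result = set()
    for rel, movie in outgoing[person]:
        if rel != "ACTED_IN":
            continue
        for rel2, other in incoming[movie]:
            if rel2 == "ACTED_IN" and other != person:
                result.add(other)
    return result

print(co_actors("alice"))
```

The work done is proportional to the number of edges touching the person and their movies, not to the total number of people or movies stored.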

20. Write down the flow-chart of data processing steps (Input, Splitting, Mapping, Shuffling, Reducing, Final Results) in Map-Reduce operations for counting the following words:

{Tiger, Deer, Bear, Tiger, Monkey, Bear, Tiger, Deer, Bear}

Use the next page.

Step 1 Input: the values coming into the system, here {Tiger, Deer, Bear, Tiger, Monkey, Bear, Tiger, Deer, Bear}.
Step 2 Splitting: the input is divided into smaller groups of entries that can be processed separately, e.g. {Tiger, Deer, Bear}, {Tiger, Monkey, Bear}, {Tiger, Deer, Bear}.
Step 3 Mapping: each word is emitted as a key-value pair with a count of 1, e.g. Tiger:1, Deer:1, Bear:1.
Step 4 Shuffling: the pairs are rearranged so that all values for the same key form one group: Tiger:[1,1,1], Deer:[1,1], Bear:[1,1,1], Monkey:[1].
Step 5 Reducing: the values in each group produced by the shuffle are summed.
Step 6 Final results: the total count of each word: Tiger:3, Deer:2, Bear:3, Monkey:1.
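
The six steps can be traced in a short single-process sketch. The function names and the chunk size of 3 are assumptions for illustration; a real Map-Reduce framework would run the map and reduce phases in parallel on different machines.

```python
from collections import defaultdict

words = ["Tiger", "Deer", "Bear", "Tiger", "Monkey",
         "Bear", "Tiger", "Deer", "Bear"]            # Step 1: input

def split(items, chunk_size):
    """Step 2: divide the input into chunks of at most chunk_size items."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

def map_phase(chunk):
    """Step 3: emit a (word, 1) pair for every word in the chunk."""
    return [(word, 1) for word in chunk]

def shuffle(mapped_chunks):
    """Step 4: group all values that share the same key."""
    groups = defaultdict(list)
    for chunk in mapped_chunks:
        for key, value in chunk:
            groups[key].append(value)
    return dict(groups)

def reduce_phase(groups):
    """Step 5: sum the values in each group."""
    return {key: sum(values) for key, values in groups.items()}

chunks = split(words, 3)
mapped = [map_phase(chunk) for chunk in chunks]
grouped = shuffle(mapped)
counts = reduce_phase(grouped)                       # Step 6: final results
print(counts)  # {'Tiger': 3, 'Deer': 2, 'Bear': 3, 'Monkey': 1}
```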
