• Genome Data Science

    We develop methods and tools to work with tens of thousands of genomes and analyze and integrate the corresponding data.

    © Universität Bielefeld

Big Data Analytics


392157/392158 Schönhuth/Knop, Summer 2022, Thu 10:15-12:00 with a break (V)

Contents

The lecture Big Data Analytics develops competencies in performing data mining tasks on data sets too large to fit in main memory. It presents the key ideas of similarity search using minhashing and locality-sensitive hashing, and of data stream processing, where data arrives so fast that it must be processed immediately or is lost. It also covers Web-related algorithms such as Google's PageRank; algorithms for mining frequent itemsets, association rules, and frequent subgraphs; algorithms for analyzing the structure of large graphs such as social network graphs; and the MapReduce principle for designing parallel algorithms.

  1. Finding Similar Items
  2. Stream Data Analysis
  3. PageRank
  4. MapReduce
  5. Mining Frequent Itemsets
  6. Mining Frequent Subgraphs
  7. Mining Social Network Graphs
  8. Recommender Systems
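
As an illustration of the first topic, the minhashing idea behind similarity search can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not material from the lecture: the function names, the use of character 3-shingles, the CRC32 base hash, and the hash family (a*x + b) mod p are all choices made here for illustration. The fraction of positions in which two signatures agree estimates the Jaccard similarity of the underlying sets.

```python
import random
import zlib

PRIME = (1 << 61) - 1  # Mersenne prime, larger than any 32-bit CRC value


def shingles(text, k=3):
    """Return the set of character k-shingles of a string."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def make_hash_funcs(n, seed=0):
    """Draw n random (a, b) pairs for hash functions x -> (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]


def minhash_signature(items, hash_funcs):
    """Signature of a non-empty set of strings: the minimum hash value per function."""
    # CRC32 gives a stable integer id per item (built-in hash() of str varies per run).
    ids = [zlib.crc32(s.encode("utf-8")) for s in items]
    return [min((a * x + b) % PRIME for x in ids) for a, b in hash_funcs]


def estimate_similarity(sig1, sig2):
    """Fraction of signature positions that agree; estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


if __name__ == "__main__":
    funcs = make_hash_funcs(200)
    s1 = shingles("mining of massive datasets")
    s2 = shingles("mining massive data sets")
    print("exact Jaccard:   ", jaccard(s1, s2))
    print("minhash estimate:", estimate_similarity(
        minhash_signature(s1, funcs), minhash_signature(s2, funcs)))
```

The signature length trades accuracy for space: more hash functions give a tighter estimate, while each document is still represented by a fixed number of integers regardless of its size.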


Literature

  • A. Silberschatz, H. F. Korth, S. Sudarshan, “Database System Concepts”, 5th edition, McGraw Hill, 2006.
  • R. Elmasri and S. B. Navathe, “Fundamentals of Database Systems”, 5th edition, Pearson/Addison Wesley, 2007.
  • William H. Inmon, “Building the Data Warehouse”, John Wiley & Sons, 1996.
  • Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman, “Mining of Massive Datasets”, 2nd edition, Cambridge University Press, 2014.
  • Tom White, “Hadoop: The Definitive Guide Storage and Analysis at Internet Scale”, 3rd edition, O'Reilly.
  • Viktor Mayer-Schönberger, Kenneth Cukier, “Big Data: A Revolution That Will Transform How We Live, Work and Think”, John Murray, 2013.
  • Eric Redmond, Jim R. Wilson, “Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement”, O'Reilly, 2012.
  • Peter Gulutzan, Trudy Pelzer, “SQL Performance Tuning”, Addison Wesley, 2002.

Lecture timetable

Date Topic
07.04.2022 (Online) Introduction (slides)
14.04.2022 (Online) Finding Similar Items I (slides)
21.04.2022 (Online) Finding Similar Items II (slides)
28.04.2022 MapReduce / Workflow Systems I (slides)
05.05.2022 MapReduce / Workflow Systems II (slides)
12.05.2022 Mining Data Streams I (slides)
19.05.2022 Mining Data Streams II (slides)
26.05.2022 no lecture
02.06.2022 Link Analysis I (slides)
09.06.2022 Link Analysis II / Frequent Itemsets I (slides)
16.06.2022 no lecture
23.06.2022 Recommendation Systems (slides)
30.06.2022 Social Networks (slides)
07.07.2022 no lecture
14.07.2022 Exam

Tutorial timetable

Date Topic
06./07.04.2022
14.04.2022 Upload Exercise 01
20./21.04.2022
27./28.04.2022 Upload Exercise 02; Discussion Exercise Sheet 01
04./05.05.2022
12.05.2022 Upload Exercise 03
19.05.2022 Discussion Exercise Sheet 02
25.05.2022 Discussion Exercise Sheet 02 (Group B)
01./02.06.2022 Upload Exercise 04; Discussion Exercise Sheet 03
08./09.06.2022
15./16.06.2022
22./23.06.2022 Upload Exercise 05; Discussion Exercise Sheet 04
29./30.06.2022
06./07.07.2022 Discussion Exercise Sheet 05
11.07.2022 Exam preparation for all groups (Swen Simon)
14.07.2022 Exam

Examination dates

1st Exam:

In-person exam, Thursday, July 14, 2022, 14:00 to 17:00

2nd Exam:

In-person exam, Thursday, August 4, 2022, 09:00 to 12:00