Homework Assignment 2: Total Points 80

Uploaded by

samriddhi

Homework Assignment 2

From the course book Mining Massive Datasets, chapter 4.

http://infolab.stanford.edu/~ullman/mmds/ch4.pdf
Use your own words. No cut-and-paste from the web or from classmates. Copying from other
sources will be detected and result in 0 points. If assignments by multiple students seem too
similar to be independent work, all of those students will receive 0 points.
It is great to work on solutions in groups! Just prepare the homework report in your own words.
Show all steps of your solution and calculations and explain them briefly in your own words. If
you write just the answer to the question without solution details, it will receive 0 points.
Long answers are not needed; brief and clear explanations for each step of your solution are required.

Total points 80
1. Sampling

(20 points) Exercise 4.2.1: Suppose we have a stream of tuples with the schema Grades(university,
courseID, studentID, grade). Assume universities are unique, but a courseID is unique only within a
university (i.e., different universities may have different courses with the same ID, e.g., “CS101”) and
likewise, studentIDs are unique only within a university (different universities may assign the same ID to
different students). Suppose we want to answer certain queries approximately from a 1/15th sample of
the data. For each of the queries below, indicate how you would construct the sample. That is, tell what
the key attributes should be.

(a) Estimate the average number of courses per university.

(b) Estimate the fraction of students who have a GPA of 3.7 or more.

Explain briefly but clearly how you will create the sample and why.
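
As an illustration of the mechanics only, here is a minimal Python sketch of key-based sampling: hash the chosen key attributes into 15 buckets and keep a tuple only when its key lands in bucket 0. The function names, the use of SHA-256, and the `key_fields` parameter are assumptions made for this sketch; which attributes belong in the key for queries (a) and (b) is the question the exercise asks and is left open here.

```python
import hashlib

def in_sample(key_values, buckets=15):
    """Return True if the key hashes to bucket 0 out of `buckets` buckets."""
    key = "|".join(str(v) for v in key_values).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an integer hash value.
    return int.from_bytes(digest[:8], "big") % buckets == 0

def sample_stream(stream, key_fields, buckets=15):
    """Yield the tuples whose key attributes fall in the sampled bucket.

    `stream` is an iterable of dicts; `key_fields` names the key attributes
    (a hypothetical parameter, to be chosen per query).
    """
    for row in stream:
        if in_sample([row[f] for f in key_fields], buckets):
            yield row
```

Because the decision depends only on the key, all tuples that share a key are either all kept or all dropped, which is what distinguishes this from sampling individual tuples at random.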

2. Bloom Filter

(15 points) Exercise 4.3.1: For the situation of our running example (8 billion bits, 1 billion members of
the set S), calculate the false-positive rate if we use 3 hash functions, and again if we use 5 hash
functions. Briefly explain each step in your solution.
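
For checking arithmetic, the standard Bloom filter approximation can be sketched in Python as below. It assumes the usual model with n bits, m set members, and k independent hash functions, where a given bit stays 0 with probability about e^(-km/n), so the false-positive rate is about (1 - e^(-km/n))^k. The function name and parameter names are chosen for this sketch.

```python
import math

def bloom_false_positive_rate(n_bits, n_members, k_hashes):
    """Approximate false-positive rate (1 - e^(-k*m/n))^k of a Bloom filter."""
    prob_bit_is_zero = math.exp(-k_hashes * n_members / n_bits)
    return (1.0 - prob_bit_is_zero) ** k_hashes

# Sanity check with one hash function on the running example's sizes.
print(round(bloom_false_positive_rate(8e9, 1e9, 1), 4))  # 0.1175
```

The same function can be called with other values of `k_hashes`; the explanation of each step still has to be written out by hand.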

3. FM Algorithm

(10 points) Exercise 4.4.1: Suppose our stream consists of the integers 3, 1, 4, 1, 5, 9, 2, 6, 5. Our hash
functions will all be of the form h(x) = ax + b mod 32 for some a and b. You should treat the result as a
5-bit binary integer. Determine the tail length for each stream element and the resulting estimate of the
number of distinct elements if the hash function is:

(a) h(x) = 2x + 1 mod 32.

(b) h(x) = 3x + 7 mod 32.

Briefly explain each step in your solution.
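
The mechanics of the tail-length computation can be sketched in Python as follows. The sketch reads h(x) as (ax + b) mod 32, takes the estimate to be 2^R where R is the largest tail length seen, and treats a hash value of 0 as having a tail of all 5 bits; that last convention is an assumption of this sketch. The demonstration uses h(x) = x mod 32, which is not one of the hash functions in the exercise.

```python
def tail_length(value, bits=5):
    """Number of trailing 0 bits in `value`, viewed as a `bits`-bit integer."""
    if value == 0:
        return bits  # assumption: an all-zero value counts as a full-length tail
    length = 0
    while value & 1 == 0:
        value >>= 1
        length += 1
    return length

def fm_estimate(stream, a, b, modulus=32, bits=5):
    """Return (tail lengths per element, 2**R) for h(x) = (a*x + b) mod modulus."""
    tails = [tail_length((a * x + b) % modulus, bits) for x in stream]
    return tails, 2 ** max(tails)

# Demonstration with h(x) = x mod 32 (not part of the exercise).
tails, estimate = fm_estimate([3, 1, 4, 1, 5, 9, 2, 6, 5], a=1, b=0)
print(tails)     # [0, 0, 2, 0, 0, 0, 1, 1, 0]
print(estimate)  # 4
```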

4. DGIM Algorithm

1) (15 points) Exercise 4.6.1: Suppose the window is as shown in Fig. 4.2. Estimate the number of 1’s in the
last k positions, for k =

(a) 5

(b) 15

In each case, how far off the correct value is your estimate?
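
A minimal Python sketch of the DGIM query rule may help when organizing the calculation: find the oldest bucket that still overlaps the last k positions, add the sizes of all more recent buckets, and add half the size of that oldest bucket. The bucket representation (right-end timestamp, size) and the bucket values in the demonstration are made up for this sketch; they are not the buckets of Fig. 4.2.

```python
def dgim_estimate(buckets, current_time, k):
    """Estimate the number of 1's in the last k positions.

    `buckets` is a list of (right_end_timestamp, size) pairs in any order.
    """
    cutoff = current_time - k  # timestamps greater than cutoff lie in the last k
    inside = sorted((b for b in buckets if b[0] > cutoff), reverse=True)
    if not inside:
        return 0
    *newer, oldest = inside  # newest first, so the last entry is the oldest bucket
    return sum(size for _, size in newer) + oldest[1] / 2

# Hypothetical buckets (not Fig. 4.2), with the newest bit at time 100.
example = [(100, 1), (98, 1), (95, 2), (92, 2), (87, 4), (80, 8)]
print(dgim_estimate(example, current_time=100, k=10))  # 5.0
```

Comparing such an estimate with the true count of 1's read directly from the window gives the error the exercise asks about.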
2) (20 points) Study the example in Section 4.6.7, Extensions to the Counting of Ones. Use the technique of
Section 4.6.6 to estimate the total error. Show that if each c_i has fractional error at most ε, then the
estimate of the true sum has fractional error at most ε.
Briefly explain each step in your solution.
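
Before writing the argument, the claim can be checked numerically. The Python sketch below assumes the setup of Section 4.6.7, where c_i is the count of 1's in bit position i and the true sum is the total of c_i * 2^i; it perturbs each c_i by a random fraction of at most eps and reports the worst fractional error of the weighted sum. The function names and the sample counts are hypothetical, and a numerical check is not a substitute for the proof the exercise asks for.

```python
import random

def weighted_sum(counts):
    """Sum of c_i * 2**i, where counts[i] is the count for bit position i."""
    return sum(c * (1 << i) for i, c in enumerate(counts))

def max_fractional_error(true_counts, eps, trials=1000, seed=0):
    """Worst observed fractional error of the sum over random perturbations."""
    rng = random.Random(seed)
    true_total = weighted_sum(true_counts)
    worst = 0.0
    for _ in range(trials):
        # Each estimated count is off by a fraction of at most eps.
        estimates = [c * (1 + rng.uniform(-eps, eps)) for c in true_counts]
        error = abs(weighted_sum(estimates) - true_total) / true_total
        worst = max(worst, error)
    return worst

# Hypothetical counts for 4 bit positions, with eps = 0.1.
print(max_fractional_error([5, 3, 7, 2], eps=0.1) <= 0.1)  # True
```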
