Ebook, 483 pages, 2 hours

Spark Cookbook

Name: Spark Cookbook
Author: Rishi Yadav
ISBN: 9781783987078

About this ebook

About This Book

Become an expert at graph processing using GraphX
Use Apache Spark as your single big data compute platform and master its libraries
Learn with recipes that can be run on a single machine as well as on a production cluster of thousands of machines

Who This Book Is For

If you are a data engineer, an application developer, or a data scientist who would like to leverage the power of Apache Spark to get better insights from big data, then this is the book for you.

Language: English

Publisher: Packt Publishing

Release date: Jul 27, 2015


Skip carousel



Book preview

Spark Cookbook - Rishi Yadav

Spark Cookbook

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why Subscribe?

Free Access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Sections

Getting ready

How to do it…

How it works…

There's more…

See also

Conventions

Reader feedback

Customer support

Downloading the color images of this book

Errata

Piracy

Questions

1. Getting Started with Apache Spark

Introduction

Installing Spark from binaries

Getting ready

How to do it...

Building the Spark source code with Maven

Getting ready

How to do it...

Launching Spark on Amazon EC2

Getting ready

How to do it...

See also

Deploying on a cluster in standalone mode

Getting ready

How to do it...

How it works...

See also

Deploying on a cluster with Mesos

How to do it...

Deploying on a cluster with YARN

Getting ready

How to do it...

How it works…

Using Tachyon as an off-heap storage layer

How to do it...

See also

2. Developing Applications with Spark

Introduction

Exploring the Spark shell

How to do it...

Developing Spark applications in Eclipse with Maven

Getting ready

How to do it...

Developing Spark applications in Eclipse with SBT

How to do it...

Developing a Spark application in IntelliJ IDEA with Maven

How to do it...

Developing a Spark application in IntelliJ IDEA with SBT

How to do it...

3. External Data Sources

Introduction

Loading data from the local filesystem

How to do it...

Loading data from HDFS

How to do it...

There's more…

Loading data from HDFS using a custom InputFormat

How to do it...

Loading data from Amazon S3

How to do it...

Loading data from Apache Cassandra

How to do it...

There's more...

Merge strategies in sbt-assembly

Loading data from relational databases

Getting ready

How to do it...

How it works…

4. Spark SQL

Introduction

Understanding the Catalyst optimizer

How it works…

Analysis

Logical plan optimization

Physical planning

Code generation

Creating HiveContext

Getting ready

How to do it...

Inferring schema using case classes

How to do it...

Programmatically specifying the schema

How to do it...

How it works…

Loading and saving data using the Parquet format

How to do it...

How it works…

There's more…

Loading and saving data using the JSON format

How to do it...

How it works…

There's more…

Loading and saving data from relational databases

Getting ready

How to do it...

Loading and saving data from an arbitrary source

How to do it...

There's more…

5. Spark Streaming

Introduction

Word count using Streaming

How to do it...

Streaming Twitter data

How to do it...

Streaming using Kafka

Getting ready

How to do it...

There's more…

6. Getting Started with Machine Learning Using MLlib

Introduction

Creating vectors

How to do it…

How it works...

Creating a labeled point

How to do it…

Creating matrices

How to do it…

Calculating summary statistics

How to do it…

Calculating correlation

Getting ready

How to do it…

Doing hypothesis testing

How to do it…

Creating machine learning pipelines using ML

Getting ready

How to do it…

7. Supervised Learning with MLlib – Regression

Introduction

Using linear regression

Getting ready

How to do it…

Understanding cost function

Doing linear regression with lasso

How to do it…

Doing ridge regression

How to do it…

8. Supervised Learning with MLlib – Classification

Introduction

Doing classification using logistic regression

Getting ready

How to do it…

Doing binary classification using SVM

How to do it…

Doing classification using decision trees

Getting ready

How to do it…

How it works…

Doing classification using Random Forests

Getting ready

How to do it…

How it works…

Doing classification using Gradient Boosted Trees

Getting ready

How to do it…

Doing classification with Naïve Bayes

Getting ready

How to do it…

9. Unsupervised Learning with MLlib

Introduction

Clustering using k-means

Getting ready

How to do it…

Dimensionality reduction with principal component analysis

Getting ready

How to do it…

Dimensionality reduction with singular value decomposition

Getting ready

How to do it…

10. Recommender Systems

Introduction

Collaborative filtering using explicit feedback

Getting ready

How to do it…

Collaborative filtering using implicit feedback

Getting ready

How to do it…

How it works…

There's more…

11. Graph Processing Using GraphX

Introduction

Fundamental operations on graphs

Getting ready

How to do it…

Using PageRank

Getting ready

How to do it…

Finding connected components

Getting ready

How to do it…

Performing neighborhood aggregation

Getting ready

How to do it…

12. Optimizations and Performance Tuning

Introduction

Optimizing memory

Using compression to improve performance

Using serialization to improve performance

How to do it…

Optimizing garbage collection

How to do it…

Optimizing the level of parallelism

How to do it…

Understanding the future of optimization – project Tungsten

Manual memory management by leveraging application semantics

Using algorithms and data structures

Code generation

Index

Spark Cookbook

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: July 2015

Production reference: 1160715

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78398-706-1

www.packtpub.com

Cover image by: InfoObjects design team

Credits

Author

Rishi Yadav

Reviewers

Thomas W. Dinsmore

Cheng Lian

Amir Sedighi

Commissioning Editor

Kunal Parikh

Acquisition Editors

Shaon Basu

Neha Nagwekar

Content Development Editor

Ritika Singh

Technical Editor

Ankita Thakur

Copy Editors

Ameesha Smith-Green

Swati Priya

Project Coordinator

Milton Dsouza

Proofreader

Safis Editing

Indexer

Mariammal Chettiyar

Graphics

Sheetal Aute

Production Coordinator

Nilesh R. Mohite

Cover Work

Nilesh R. Mohite

About the Author

Rishi Yadav has 17 years of experience in designing and developing enterprise applications. He is an open source software expert and advises American companies on big data trends. Rishi was honored as one of Silicon Valley's 40 under 40 in 2014. He finished his bachelor's degree at the prestigious Indian Institute of Technology (IIT) Delhi in 1998.

About 10 years ago, Rishi started InfoObjects, a company that helps data-driven businesses gain new insights into data.

InfoObjects combines the power of open source and big data to solve business challenges for its clients and has a special focus on Apache Spark. The company has been on the Inc. 5000 list of the fastest growing companies for 4 years in a row. InfoObjects was also named the #1 best place to work in the Bay Area in 2014 and 2015.

Rishi is an open source contributor and active blogger.

My special thanks go to my better half, Anjali, for putting up with the long, arduous hours that were added to my already swamped schedule; our 8-year-old son, Vedant, who tracked my progress on a daily basis; InfoObjects' CTO and my business partner, Sudhir Jangir, for leading the big data effort in the company; Helma Zargarian, Yogesh Chandani, Animesh Chauhan, and Katie Nelson for running operations skillfully so that I could focus on this book; and our internal review team, especially Arivoli Tirouvingadame, Lalit Shravage, and Sanjay Shroff, for helping with the review. I could not have written this book without your support. I would also like to thank Marcel Izumi for putting together amazing graphics.

About the Reviewers

Thomas W. Dinsmore is an independent consultant, offering product advisory services to analytic software vendors. He brings to this role 30 years of experience delivering analytics solutions to enterprises around the world. He uniquely combines hands-on analytics experience with the ability to lead analytic projects and interpret results.

Thomas's previous experience includes roles with SAS, IBM, The Boston Consulting Group, PricewaterhouseCoopers, and Oliver Wyman.

Thomas coauthored Modern Analytics Methodologies and Advanced Analytics Methodologies, published in 2014 by Pearson FT Press, and is under contract for a forthcoming book on business analytics from Apress. He publishes The Big Analytics Blog at www.thomaswdinsmore.com.

I would like to thank the entire editorial and production team at Packt Publishing, who work tirelessly to bring out quality books to the public.

Cheng Lian is a Chinese software engineer and Apache Spark committer from Databricks. His major technical interests include big data analytics, distributed systems, and functional programming languages.

Cheng is also the translator of the Chinese edition of Erlang and OTP in Action and Concurrent Programming in Erlang (Part I).

I would like to thank Yi Tian from AsiaInfo for helping me review some parts of Chapter 6, Getting Started with Machine Learning Using MLlib.

Amir Sedighi is an experienced software engineer, a keen learner, and a creative problem solver. His experience spans a wide range of software development areas, including cross-platform development, big data processing and data streaming, information retrieval, and machine learning. He is a big data lecturer and expert, working in Iran. He holds bachelor's and master's degrees in software engineering. Amir is currently the CEO of Rayanesh Dadegan Ekbatan, the company he cofounded in 2013 after several years of designing and implementing distributed big data and data streaming solutions for private sector companies.

I would like to thank the entire team at Packt Publishing, who work hard to bring awesomeness to the books and the readers' professional life.

www.PacktPub.com

Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why Subscribe?

Fully searchable across every book published by Packt

Copy and paste, print, and bookmark content

On demand and accessible via a web browser

Free Access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your login credentials for immediate access.

Preface

The success of Hadoop as a big data platform raised user expectations, both for solving different analytics challenges and for reducing latency. Various tools evolved over time, but when Apache Spark arrived, it provided a single runtime to address all these challenges. It eliminated the need to combine multiple tools, each with its own challenges and learning curve. By using memory for persistent storage as well as compute, Apache Spark eliminates the need to store intermediate data on disk and increases processing speed by up to 100 times. This single runtime also addresses various analytics needs, such as machine learning and real-time streaming, through its libraries.

This book covers the installation and configuration of Apache Spark and building solutions using Spark Core, Spark SQL, Spark Streaming, MLlib, and GraphX libraries.

Note

For more information on this book's recipes, please visit infoobjects.com/spark-cookbook.

What this book covers

Chapter 1, Getting Started with Apache Spark, explains how to install Spark on various environments and cluster managers.

Chapter 2, Developing Applications with Spark, talks about developing Spark applications on different IDEs and using different build tools.

Chapter 3, External Data Sources, covers how to read from and write to various data sources.

Chapter 4, Spark SQL, takes you through the Spark SQL module that helps you to access the Spark functionality using the SQL interface.

Chapter 5, Spark Streaming, explores the Spark


Book preview

Spark Cookbook - Rishi Yadav

Table of Contents

Spark Cookbook

Spark Cookbook

Credits

About the Author

About the Reviewers