PROC SQL: Beyond the Basics Using SAS, Third Edition

Ebook, 747 pages, 7 hours
About this ebook

PROC SQL: Beyond the Basics Using SAS®, Third Edition, is a step-by-step, example-driven guide that helps readers master the language of PROC SQL. Packed with analysis and examples illustrating an assortment of PROC SQL options, statements, and clauses, this book not only covers all the basics, but it also offers extensive guidance on complex topics such as set operators and correlated subqueries. Programmers at all levels will appreciate Kirk Lafler’s easy-to-follow examples, clear explanations, and handy tips to extend their knowledge of PROC SQL.

This third edition explores new and powerful features in SAS® 9.4, including topics such as:

  • IFC and IFN functions
  • nearest neighbor processing
  • the HAVING clause
  • indexes

It also features two completely new chapters on fuzzy matching and data-driven programming. Delving into the workings of PROC SQL with greater analysis and discussion, PROC SQL: Beyond the Basics Using SAS®, Third Edition, explores this powerful database language using discussion and numerous real-world examples.

Language: English
Publisher: SAS Institute
Release date: March 20, 2019
ISBN: 9781635266818
Author

Kirk Paul Lafler

Kirk Paul Lafler is founder and entrepreneur at Software Intelligence Corporation. He has worked with SAS software since 1979 as a consultant, application designer and developer, programmer, SAS solutions provider, data analyst, educator, and author. As a SAS Certified professional, mentor, and educator at Software Intelligence Corporation and as an advisor and SAS programming adjunct professor at the University of California San Diego Extension, Lafler has taught SAS courses, seminars, workshops, and webinars to thousands of users around the world. He received his BS and MS degrees from the University of Miami. Lafler is a frequent speaker at international, regional, special-interest, local, and in-house SAS users group conferences and meetings. He has also authored or co-authored several books, including Google® Search Complete and PROC SQL: Beyond the Basics Using SAS®; hundreds of papers and articles on a variety of SAS topics; and a popular SAS tips column, called “Kirk's Korner.” He has also served as an invited speaker, instructor, keynote speaker, and section leader at SAS users group conferences and meetings around the world and is the recipient of numerous “Best” contributed paper, hands-on workshop (HOW), and poster awards.


    Book preview

    PROC SQL - Kirk Paul Lafler

    Chapter 1: Designing Database Tables

    Introduction

    Database Design

    Conceptual View

    Table Definitions

    Redundant Information

    Normalization

    Normalization Strategies

    Column Names and Reserved Words

    ANSI SQL Reserved Words

    SQL Code

    Data Integrity

    Referential Integrity

    Database Tables Used in This Book

    CUSTOMERS Table

    INVENTORY Table

    INVOICE Table

    MANUFACTURERS Table

    PRODUCTS Table

    PURCHASES Table

    Table Contents

    The Database Structure

    Sample Database Tables

    Summary

    Introduction

    Database design is a vital part of relational processing. Much has been written on the subject, including entire textbooks and thousands of technical papers, and these pages make no pretense of covering it exhaustively. Rather, they offer a quick-start introduction for readers who are unfamiliar with the issues and techniques of basic design. Readers needing more information are referred to the references listed at the back of this book. As you read this chapter, keep the following points in mind.

    Database Design

    Activities related to good database design require the identification of end-user requirements and involve defining the structure of data values on a physical level. Database design begins with a conceptual view of what is needed. The next step, called logical design, consists of developing a formal description of database entities and relationships to satisfy user requirements. Seldom does a database consist of a single table. Consequently, tables of interrelated information are created to enable more complex and powerful operations on data. The final step, referred to as physical design, represents the process of achieving optimal performance and storage requirements of the logical database.

    Conceptual View

    The health and well-being of a database depend on its design. A database must be in balance with all of its components (that is, optimized) to avoid performance and operational bottlenecks. Database design doesn’t just happen, nor is it a process that occurs by chance. It involves planning, modeling, creating, monitoring, and adjusting to satisfy an endless assortment of user requirements without inflating resource requirements. Of central importance to database design is planning. Planning is a valuable component that, when absent, causes a database to fall prey to a host of problems, including poor performance and difficulty of operation. Database design consists of three distinct phases, as illustrated in Figure 1.1.

    Figure 1.1: Three Distinct Phases of Database Design

    Table Definitions

    PROC SQL uses a model of data that is conceptually stored as multisets rather than as physical files. A physical file consists of one or more records ordered sequentially or in some other way. Programming languages such as COBOL and FORTRAN evolved to process files of this type by performing operations one record at a time. These languages were generally designed and used to mimic the way people process paper forms.

    PROC SQL was designed to work with multisets of data. Multisets have no inherent order, and all members of a multiset are of the same type; they are represented using a data structure known as a table. For classification purposes, a table is either a base table, consisting of zero or more rows and one or more columns, or a virtual table (called a view), which can be used the same way that a base table can be used (see Chapter 8, Working with Views).

    Redundant Information

    One of the rules of good database design requires that data not be redundant or duplicated in the same database. The rationale for this conclusion originates from the belief that if data appears more than once in a database, then there is reason to believe that one of the pieces of data is likely to be in error. Furthermore, redundancy often leads to the following:

    ●          Inconsistencies, because errors are more likely to result when facts are repeated.

    ●          Update anomalies where the insertion, modification, or deletion of data may result in inconsistencies.

    Another thing to watch for is the appearance of too many columns containing NULL values. When this occurs, the database is probably not designed properly. To alleviate potential table design issues, a process referred to as normalization is performed. When properly done, normalization eliminates redundant information from a table.
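    A quick way to gauge the "too many NULLs" symptom is to count missing values with PROC SQL's NMISS summary function. The following sketch (not code from the book) counts missing UNITS values in the PURCHASES table described later in this chapter:

    SQL Code

    PROC SQL;
     SELECT NMISS(UNITS) AS MISSING_UNITS, /* rows with a missing UNITS value */
            COUNT(*) AS TOTAL_ROWS         /* all rows, missing or not        */
      FROM PURCHASES;
    QUIT;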

    Normalization

    The development of an optimal database design is an important element in the life cycle of a database. Not only is it critical for achieving maximum performance and flexibility while working with tables and data, it is essential to the organization of data by reducing or minimizing redundancy in one or more database tables. The process of table design is frequently referred to by database developers and administrators as normalization.

    The normalization process is used for reducing redundancy in a database by converting complex data structures into simple data structures. It is carried out for the following reasons:

    ●          To organize the data to save space and to eliminate any duplication or repetition of data.

    ●          To enable simple retrieval of data to satisfy query and report requests.

    ●          To simplify data manipulation requests such as data insertions, updates, and deletions.

    ●          To reduce the impact associated with reorganizing or restructuring data as new application requirements arise.

    The normalization process attempts to simplify the relationship between columns in a database by splitting larger multicolumn tables into two or more smaller tables containing fewer columns. The rationale for doing this is contained in a set of data design guidelines called normal forms. The guidelines provide designers with a set of rules for converting one or two large database tables containing numerous columns into a normalized database consisting of multiple tables and only those columns that should be included in each table. The normalization process consists of multiple steps with each succeeding step subscribing to the rules of the previous steps.

    Normalization helps to ensure that a database does not contain redundant information in two or more of its tables. In an application, normalization prevents the destruction of data or the creation of incorrect data in a database. What this means is that information of fact is represented only once in a database, and any possibility of it appearing more than once is not, or should not be, allowed.

    As database designers and analysts proceed through the normalization process, many are not satisfied unless a database design is carried out to at least third normal form (3NF). Joe Celko, in his popular book SQL for Smarties: Advanced SQL Programming (Morgan Kaufmann, 2014), describes 3NF this way: databases are considered to be in 3NF when a column is dependent on "the key, the whole key, and nothing but the key."

    While the normalization guidelines are extremely useful, some database purists actually go to great lengths to remove any and all table redundancies even at the expense of performance. This is in direct contrast to other database experts who follow the guidelines less rigidly in an attempt to improve the performance of a database by only going as far as third normal form (or 3NF). Whatever your preference, you should keep this thought in mind as you normalize database tables. A fully normalized database often requires a greater number of joins and can adversely affect the speed of queries. Celko mentions that the process of joining multiple tables in a fully normalized database is costly, specifically affecting processing time and computer resources.

    Normalization Strategies

    After transforming entities and attributes from the conceptual design into a logical design, the tables and columns are created. This is when a process known as normalization occurs. Normalization refers to the process of making your database tables subscribe to certain rules. Many, if not most, database designers are satisfied when third normal form (3NF) is achieved and, for the objectives of this book, I will stop at 3NF, too. To help explain the various normalization steps, an example scenario follows.

    First Normal Form (1NF)

    First normal form (1NF) involves the elimination of data redundancy or repeating information from a table. A table is considered to be in first normal form when each column in a row holds a single value, with no repeating groups, and all of its columns describe the table completely. Essentially, every table meets 1NF as long as no array, list, or other repeating structure has been defined. The following table satisfies the 1NF rule because it has only one value at each row‑and‑column intersection. The table is in ascending order by CUSTNUM and consists of customers and the purchases they made at an office supply store.

    Table 1.1: First Normal Form (1NF) Table

    CUSTNUM  CUSTNAME  CUSTCITY       ITEM       UNITS  UNITCOST  MANUCITY
          1  Smith     San Diego      Chair          1   $179.00  San Diego
          1  Smith     San Diego      Pens          12     $0.89  Los Angeles
          1  Smith     San Diego      Paper          4     $6.95  Washington
          1  Smithe    San Diego      Stapler        1     $8.95  Los Angeles
          7  Lafler    Spring Valley  Mouse Pad      1    $11.79  San Diego
          7  Loffler   Spring Valley  Pens          24     $1.59  Los Angeles
         13  Thompson  Miami          Markers        .     $0.99  Los Angeles
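    To follow along in SAS, a table like Table 1.1 can be built with a short DATA step. This is a sketch rather than code from the book; the table name WORK.PURCHASES_1NF and the column lengths are assumptions:

    data work.purchases_1nf;
       infile datalines dlm=',' dsd;   /* comma-separated in-stream data */
       input custnum custname :$10. custcity :$15.
             item :$10. units unitcost manucity :$15.;
       format unitcost dollar8.2;      /* display cost as currency       */
       datalines;
    1,Smith,San Diego,Chair,1,179.00,San Diego
    1,Smith,San Diego,Pens,12,0.89,Los Angeles
    1,Smith,San Diego,Paper,4,6.95,Washington
    1,Smithe,San Diego,Stapler,1,8.95,Los Angeles
    7,Lafler,Spring Valley,Mouse Pad,1,11.79,San Diego
    7,Loffler,Spring Valley,Pens,24,1.59,Los Angeles
    13,Thompson,Miami,Markers,.,0.99,Los Angeles
    ;
    run;

    The misspellings from Table 1.1 (Smithe, Loffler) are kept on purpose, because the 2NF discussion below refers to them.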

    Second Normal Form (2NF)

    Second normal form (2NF) addresses the relationships between sets of data. A table is said to be in second normal form when all the requirements of 1NF are met and a foreign key is used to link data in one table to related data in another table. Leaving a table in first normal form (1NF) may present problems due to the repetition of some information in the table. One noticeable problem is that Table 1.1 repeats information, such as the customer’s name and city, on every row. Another problem is that there are misspellings in the customer name. Although repeating information may be permissible with hierarchical file structures and other legacy-type file structures, it poses a potential data consistency problem as it relates to relational data.

    To describe how data consistency problems can occur, let’s say that a customer takes a new job and moves to a new city. In changing the customer’s city to the new location, it would be very easy to miss one or more occurrences of the customer’s city, resulting in a customer incorrectly residing in two different cities at once. Because each customer is meant to reside in a single city, this would definitely be a data consistency problem. Essentially, second normal form (2NF) is important because it says that every non-key column must depend on the entire primary key.

    Tables that subscribe to 2NF prevent the need to make changes in more than one place. What this means in normalization terms is that tables in 2NF have no partial key dependencies. As a result, our database that consists of a single table that satisfies 1NF will need to be split into two separate tables in order to subscribe to the 2NF rule. Each table would contain the CUSTNUM column to connect the two tables. Unlike the single table in 1NF, the tables in 2NF allow a customer’s city to be easily changed whenever they move to another city because the CUSTCITY column only appears once. The tables in 2NF would be constructed as follows.

    Table 1.2: CUSTOMERS Table

    CUSTNUM   CUSTNAME    CUSTCITY
          1   Smith       San Diego
          1   Smithe      San Diego
          7   Lafler      Spring Valley
         13   Thompson    Miami

    Table 1.3: PURCHASES Table

    CUSTNUM   ITEM       UNITS   UNITCOST   MANUCITY
          1   Chair          1    $179.00   San Diego
          1   Pens          12      $0.89   Los Angeles
          1   Paper          4      $6.95   Washington
          1   Stapler        1      $8.95   Los Angeles
          7   Mouse Pad      1     $11.79   San Diego
          7   Pens          24      $1.59   Los Angeles
         13   Markers        .      $0.99   Los Angeles
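    The split itself is easy to express in PROC SQL. The following sketch (the output table names are invented for illustration) derives both 2NF tables from the WORK.PURCHASES_1NF table built earlier:

    SQL Code

    PROC SQL;
     CREATE TABLE WORK.CUSTOMERS_2NF AS   /* one row per customer             */
      SELECT DISTINCT CUSTNUM, CUSTNAME, CUSTCITY
       FROM WORK.PURCHASES_1NF;
     CREATE TABLE WORK.PURCHASES_2NF AS   /* purchase detail keyed by CUSTNUM */
      SELECT CUSTNUM, ITEM, UNITS, UNITCOST, MANUCITY
       FROM WORK.PURCHASES_1NF;
    QUIT;

    Note that SELECT DISTINCT removes only exact duplicate rows; it cannot repair the Smith/Smithe misspelling, which is why both spellings survive in Table 1.2.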

    Third Normal Form (3NF)

    Referring to the two tables constructed according to the rules of 2NF, you may have noticed that the PURCHASES table contains a column called MANUCITY. The MANUCITY column stores the city where the product manufacturer is headquartered. Keeping this column in the PURCHASES table violates third normal form (3NF) because MANUCITY describes the manufacturer, not the purchase identified by the primary key column (CUSTNUM) in the PURCHASES table. Consequently, tables are considered to be in third normal form (3NF) when each column is dependent on the key, the whole key, and nothing but the key. To construct tables in 3NF, the MANUCITY column is moved to a table of its own as follows.

    Table 1.4: CUSTOMERS Table

    CUSTNUM   CUSTNAME    CUSTCITY
          1   Smith       San Diego
          1   Smithe      San Diego
          7   Lafler      Spring Valley
         13   Thompson    Miami

    Table 1.5: PURCHASES Table

    CUSTNUM   ITEM       UNITS   UNITCOST
          1   Chair          1    $179.00
          1   Pens          12      $0.89
          1   Paper          4      $6.95
          1   Stapler        1      $8.95
          7   Mouse Pad      1     $11.79
          7   Pens          24      $1.59
         13   Markers        .      $0.99

    Table 1.6: MANUFACTURERS Table

    MANUNUM   MANUCITY
        101   San Diego
        112   San Diego
        210   Los Angeles
        212   Los Angeles
        213   Los Angeles
        214   Los Angeles
        401   Washington

    Beyond Third Normal Form

    In general, database designers are satisfied when their database tables subscribe to the rules in 3NF. But, it is not uncommon for others to normalize their database tables to fourth normal form (4NF) where independent one-to-many relationships between primary key and non-key columns are forbidden. Some database purists will even normalize to fifth normal form (5NF) where tables are split into the smallest pieces of information in an attempt to eliminate any and all table redundancies. Although constructing tables in 5NF may provide the greatest level of database integrity, it is neither practical nor desired by most database practitioners.

    There is no absolute right or wrong reason for database designers to normalize beyond 3NF as long as they have considered all the performance issues that may arise by doing so. A common problem that occurs when database tables are normalized beyond 3NF is that a large number of small tables are generated. In these cases, an increase in time and computer resources frequently occurs because small tables must first be joined before a query, report, or statistic can be produced.
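    To make the cost concrete, even a simple question against the normalized customer and purchase tables requires a join. A sketch, reusing the WORK tables created earlier:

    SQL Code

    PROC SQL;
     SELECT C.CUSTNAME, P.ITEM, P.UNITS, P.UNITCOST
      FROM WORK.CUSTOMERS_2NF C, WORK.PURCHASES_2NF P  /* join customer to detail */
       WHERE C.CUSTNUM = P.CUSTNUM                      /* match on the key column */
        ORDER BY C.CUSTNAME;
    QUIT;

    The more finely a design is normalized, the more of these joins each query must perform.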

    Column Names and Reserved Words

    According to the American National Standards Institute (ANSI), SQL is the standard language used with relational database management systems. The ANSI Standard reserves a number of SQL keywords from being used as column names. The SAS SQL implementation is not as rigid, but users should be aware of what reserved words exist to prevent unexpected and unintended results during SQL processing. Column names should conform to proper SAS naming conventions (as described in the SAS Language Reference), and they should not conflict with certain reserved words found in the SQL language. The following list identifies the reserved words found in the ANSI SQL standard.

    ANSI SQL Reserved Words

    You probably will not encounter too many conflicts between a column name and an SQL reserved word, but when you do, you will need to follow a few simple rules to prevent processing errors from occurring. As was stated earlier, although PROC SQL’s naming conventions are not as rigid as other vendors’ implementations, care should still be exercised, particularly when PROC SQL code is transferred to other database environments and expected to run error-free. If a column name in an existing table conflicts with a reserved word, you have three options at your disposal:

    1.       Physically rename the column name in the table, as well as any references to the column.

    2.       Use the RENAME= data set option to rename the desired column in the current query.

    3.       Specify the PROC SQL option DQUOTE=ANSI, and surround the column name (reserved word) in double quotes, as illustrated below.

    SQL Code

    PROC SQL DQUOTE=ANSI;
     SELECT *
      FROM RESERVED_WORDS
       WHERE "WHERE" = 'EXAMPLE';
    QUIT;
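    Option 2, the RENAME= data set option, can be sketched in a similar way; the renamed column WHERE_TXT and the RESERVED_WORDS table are illustrative assumptions:

    SQL Code

    PROC SQL;
     SELECT WHERE_TXT                                  /* the renamed column      */
      FROM RESERVED_WORDS(RENAME=(WHERE=WHERE_TXT));   /* rename WHERE on the fly */
    QUIT;

    Because the rename applies only to the current query, the underlying table is left unchanged.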

    Data Integrity

    Webster’s New World Dictionary defines integrity as "the quality or state of being complete; perfect condition; reliable; soundness." Data integrity is a critical element that every organization must promote and strive for. It is imperative that the data tables in a database environment be reliable, free of errors, and sound in every conceivable way. The existence of data errors, missing information, broken links, and other related problems in one or more tables can impact decision-making and information reporting activities, resulting in a loss of confidence among users.

    Applying a set of rules to the database structure and content can ensure the integrity of data resources. These rules consist of table and column constraints, and will be discussed in detail in Chapter 5, Creating, Populating, and Deleting Tables.

    Referential Integrity

    Referential integrity refers to the way in which database tables handle update and delete requests. Database tables frequently have a primary key where one or more columns have a unique value by which rows in a table can be identified and selected. Other tables may have one or more columns called a foreign key that are used to connect to some other table through its value. Database designers frequently apply rules to database tables to control what happens when a primary key value changes and its effect on one or more foreign key values in other tables. These referential integrity rules apply restrictions on the data that may be updated or deleted in tables.

    Referential integrity ensures that rows in one table have corresponding rows in another table. This prevents lost linkages between data elements in one table and those of another, enabling the integrity of the data to be maintained at all times. Using the 3NF tables defined earlier, a foreign key called CUSTNUM can be defined in the PURCHASES table that corresponds to the primary key CUSTNUM column in the CUSTOMERS table. Readers are referred to Chapter 5, Creating, Populating, and Deleting Tables, for more details on assigning referential integrity constraints.
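    As a preview of Chapter 5, the CUSTNUM primary key / foreign key pair just described might be declared as follows. This is a sketch; the column lengths and constraint names are assumptions:

    SQL Code

    PROC SQL;
     CREATE TABLE CUSTOMERS
      (CUSTNUM NUM,
       CUSTNAME CHAR(25),
       CUSTCITY CHAR(25),
       CONSTRAINT PRIM_KEY PRIMARY KEY (CUSTNUM));
     CREATE TABLE PURCHASES
      (CUSTNUM NUM,
       ITEM CHAR(25),
       UNITS NUM,
       UNITCOST NUM,
       CONSTRAINT FOR_KEY FOREIGN KEY (CUSTNUM)  /* CUSTNUM must exist in CUSTOMERS */
        REFERENCES CUSTOMERS
         ON DELETE RESTRICT ON UPDATE RESTRICT);
    QUIT;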

    Database Tables Used in This Book

    This section describes a database, or library, of tables used by an imaginary computer hardware and software wholesaler. The library consists of six tables: Customers, Inventory, Invoice, Manufacturers, Products, and Purchases. The examples used throughout this book are based on this library (database) of tables. An alphabetical description of each table appears below.

    CUSTOMERS Table

    The CUSTOMERS table contains customers that have purchased computer hardware and software products from a manufacturer. Each customer is uniquely identified with a customer number. A description of each column in the Customers table follows.

    Table 1.7: Description of Columns in the Customers Table

    INVENTORY Table

    The INVENTORY table contains customer inventory information consisting of computer hardware and software products. The Inventory table contains no historical data. As inventories are replenished, the old quantity is overwritten with the new quantity. A description of each column in the Inventory table follows.

    Table 1.8: Description of Columns in the Inventory Table

    INVOICE Table

    The INVOICE table contains information about customers who purchased products. Each invoice is uniquely identified with an invoice number. A description of each column in the Invoice table follows.

    Table 1.9: Description of Columns in the Invoice Table

    MANUFACTURERS Table

    The MANUFACTURERS table contains companies who make computer hardware and software products. Two companies cannot have the same name. No historical data is kept in this table. If a company is sold or stops making computer hardware or software, then the manufacturer is dropped from the table. In the event that a manufacturer has an address change, the old address is overwritten with the new address. A description of each column in the Manufacturers table follows.

    Table 1.10: Description of Columns in the Manufacturers Table

    PRODUCTS Table

    The PRODUCTS table contains computer hardware and software products offered for sale by the manufacturer. Each product is uniquely identified with a product number. A description of each column in the Products table follows.

    Table 1.11: Description of Columns in the Products Table

    PURCHASES Table

    The PURCHASES table contains computer hardware and software products purchased by customers. Each product is uniquely identified with a product number. A description of each column in the Purchases table follows.

    Table 1.12: Description of Columns in the Purchases Table

    Table Contents

    An alphabetical list of tables, variables, and attributes for each table is displayed below.

    Output 1.1: Customers CONTENTS Output

    Output 1.2: Inventory CONTENTS Output

    Output 1.3: Invoice CONTENTS Output

    Output 1.4: Manufacturers CONTENTS Output

    Output 1.5: Products CONTENTS Output

    Output 1.6: Purchases CONTENTS Output

    The Database Structure

    The logical relationship between each table, and the columns common to each, appear below.

    Figure 1.2: Logical Database Structure

    Sample Database Tables

    The following tables (Customers, Inventory, Manufacturers, Products, Invoice, and Purchases) represent the relational database that is illustrated in the examples in this book. These tables are small enough to follow easily, but complex enough to illustrate the power of SQL. The data contained in each table appears below.

    Table 1.13: CUSTOMERS Table

    Table 1.14: INVENTORY Table

    Table 1.15: INVOICE Table

    Table 1.16: MANUFACTURERS Table

    Table 1.17: PRODUCTS Table

    Table 1.18: PURCHASES Table

    Summary

    1.       Good database design improves the ease with which tables can be created and populated in a relational database, and a sound design can be implemented in any database (see the Conceptual View section).

    2.       SQL was designed to work with sets of data and accesses a data structure known as a table, or a virtual table known as a view (see the Table Definitions section).

    3.       Achieving optimal design of a database means that the database contains little or no redundant information in two or more of its tables. This means that good database design calls for little or no replication of data (see the Redundant Information section).

    4.       Good database design avoids data redundancy, update anomalies, costly or inefficient processing, coding complexities, complex logical relationships, long application development times, and/or excessive storage requirements (see the Normalization section).

    5.       Design decisions made in one phase may involve making one or more tradeoffs in another phase (see the Normalization section).

    6.       A database is in third normal form (3NF) when every column is dependent on the key, the whole key, and nothing but the key (see the Normalization section).

    Chapter 2: Working with Data in PROC SQL

    Introduction

    The SELECT Statement and Clauses

    Overview of Data Types

    Numeric Data

    Date and Time Column Definitions

    Character Data

    Missing Values and NULL

    Arithmetic and Missing Data

    SQL Keywords

    SQL Operators, Functions, and Keywords

    Comparison Operators

    Logical Operators

    Arithmetic Operators

    Character String Operators and Functions

    Summarizing Data

    Predicates

    CALCULATED Keyword

    Dictionary Tables

    Dictionary Tables and Metadata

    Displaying Dictionary Table Definitions

    Dictionary Table Column Names

    Accessing a Dictionary Table’s Contents

    Summary

    Introduction

    PROC SQL is essentially a database language as opposed to a procedural or computational language. This chapter’s focus is on working with data in PROC SQL using the SELECT statement. Often referred to as an SQL query, the SELECT statement is the most versatile statement in SQL and is used to read data from one or more database tables (or data sets). It also supports numerous extensions including keywords, operators, functions, and predicates, and returns the data in a table-like structure called a result-set.

    The SELECT Statement and Clauses

    The SELECT statement’s purpose is to retrieve (or read) data from the underlying tables (or views). Although it supports multiple clauses, the SELECT statement has only one clause that is required to be specified: the FROM clause. All the remaining clauses, described below, are optional and used only when needed. Not every query needs all of the clauses, but together they give developers and data analysts a powerful and flexible language for accessing, manipulating, and displaying data without writing large amounts of code.

    During execution, SAS carries out the tasks associated with planning, optimizing, and performing the operations specified in the SELECT statement and its clauses to produce the desired results. To prevent syntax errors when using the SELECT statement, the clauses must be specified in the correct order. To help you remember the order of the SELECT statement’s clauses, recite: "SQL is fun when geeks help others." The first letter of each word corresponds to the name of a SELECT statement clause, as shown in Figure 2.1.

    Figure 2.1: Order of the SELECT Statement Clauses

    When constructed correctly, the SELECT statement and its clauses specify the database table (or data set) in which to find the data, what data to retrieve, and whether any special transformations or processing is needed before the data is returned. The next example shows the correct syntax of a query’s SELECT statement and its clauses.

    SQL Code

    PROC SQL;
     SELECT PRODNAME
           ,PRODTYPE
           ,PRODCOST
      INTO :M_PRODNAME
           ,:M_PRODTYPE
           ,:M_PRODCOST
       FROM PRODUCTS
        WHERE PRODNAME CONTAINS 'Software'
         GROUP BY PRODTYPE
          HAVING COUNT(PRODTYPE) > 3
           ORDER BY PRODNAME;
    QUIT;

    Results


    Now that we’ve explored the order in which each clause is specified in an SQL query, let’s examine the order in which each clause executes. Table 2.1 illustrates and describes the execution order of each SELECT statement clause.

    Table 2.1: Clause Execution Order
