08 Lecture2


[MUSIC PLAYING] DAVID MALAN: All right.

So this is CS50, and this is lecture two, our continuation of C. And for the next several weeks, we're gonna keep using C.
But we're gonna focus less on the language and the syntax, which we'll get
experience with over time by way of the problem sets, and more and more on the
ideas and more and more on the problems that we can solve. But before we forge
ahead with anything new, let's take a quick look back at where we left off and what
we'll sort of assume for today's comfort level, and ask any and all questions along
the way. So in order to program last time, we needed a tool. And that tool was this
thing here, CS50 IDE. If you haven't dived in already, you probably will this
weekend for problem set one. And this will be a web-based programming environment
that's got all the requisite tools you need in order to write code, compile code,
and then, starting today, debug code or find mistakes in code. But this is
requisite because it's not sufficient to just write this. What do we call this,
generally speaking? Yeah, so this is source code. So this is code. When someone
says, I write code, they write stuff like this. And this is, particularly, a
language called C. But, of course, computers don't speak C. And they don't speak
Java, and they don't speak Python or C++ or any of the languages with which you're
familiar. They only understand what at the end of the day? Yeah, so binary. So
binary is, of course, zeros and ones, otherwise known as machine code in this
context, insofar as it is code. It's instructions that implement some problem-solving
techniques. But it's just zeros and ones that computers understand. So we
needed a tool to get from A to B. And that was called what? Yeah, a compiler. So a
compiler, of course, does this for us. Source code is the input. Compiler is the
program or really the algorithm, albeit in the form of a piece of software. And
then the output is machine code, zeros and ones. And for our purposes, we're not
going to worry about how we get from step A to B per se. We'll want to use the
tool. But this is another area unto itself in computer science. If you want to
understand how compilers work and how humans got from literally zeros and ones to
something called assembly code to higher-level languages, you'll see a glimpse of
that in CS50. But it unto itself is a whole field that might prove ultimately of
interest. But this, more mechanically, is how we compiled code. Clang is a
compiler. It stands for C language. And it's just software that some humans wrote
some years ago. And there are alternatives. If you've ever used Visual Studio in
the Windows world or GCC in the Linux and Unix world, there's bunches of other
compilers. We just happened to use clang since it's pretty popular. And then that
second command is even stranger looking. But it represents the act of doing
what? ./a.out. Yes, over here? Yeah, running the program. Exactly. So ./a.out is a
cryptic way of running a program. But it's like the textual equivalent of double-
clicking an icon. And a.out is just like the default name you get, assembler
output, when you compile a program without specifying its name. But we were able to
specify a name. If you introduce a technique called command-line arguments, you can
be a little more precise. clang -o for output, then any word you want. In this
case, I went with "hello." And then the name of the program or the file that you
want to compile. But this, of course, gets pretty tedious. And, in fact, there's a
missing step. Sometimes when you want to write a program, it suffices to compile it
exactly as that. But let me go ahead and do this. Let me go ahead into CS50 IDE,
and let me go ahead and briefly do the following. File, New. And let me go ahead
and save this as "hello.c." And just from memory, I'll quickly recreate that same
program. int main void, and then we had printf, hello, world. And then just for
good measure, backslash n, which means move the cursor to the next line. And now
I'm gonna go ahead and save that. If I go ahead now and run clang hello.c, looks
good. And ./a.out. That looks good, too. But recall that we introduced some other
functions the other day, as well, like get_string and get_int. And we'll see
bunches more before long. And if I do that, notice that I have to do a couple of
things. So if I want to do, like, string name gets get string, and then, quote,
unquote "name" to prompt the user, recall that the left-hand side says, hey,
computer, give me a variable that's gonna store a string. Call it name. Could have
called it anything. I could have called it s. But why might it be arguably better
to call my variable name instead of string, or s, rather? Yeah? AUDIENCE: It's
clearer. DAVID MALAN: It's just clearer, right? It might be a very marginal, nit-
picky detail. But it's just clearer. And when our programs get bigger, it's just
nicer to be able to read words and understand implicitly what they mean without
having to think through what x or y or s or whatever actually is. On the right-hand
side, meanwhile, we had this function get_string, whose purpose in life is to go
get a string from the user, from his or her keyboard, prompting them with a word
like "name," and then return it, just as Sam handed me back a slip of paper with a
name on it. But there's a catch. It's now no longer sufficient to just adapt my
code for this new approach. And I had, again, to change that second line. I had to
give a placeholder, which is %s. And if that's a little cryptic looking, just kind
of think of it like a Mad Lib, if you're familiar with those, where there's just,
like, a fill-in-the-blank here. And all %s is doing is saying, put a word there.
What word? Well whatever comes after the comma, whatever variable or value is
there. So that's good. But a couple of things can go wrong. And let me point those
out so that you don't perhaps trip over it yourself on your own. Let me go ahead
and do "clang hello.c" now that I've made these changes and saved them. Bunches of
errors all of a sudden, even though I've not changed all that much code. Again,
rule of thumb from last time should be even if there's lots of error messages,
always look at the first one first because the rest might just be kind of a
resulting cascade of errors, only one of which is important, which is the first.
Now, it says "use of undeclared identifier string. Did you mean stdin," which is
something else altogether. I didn't. I meant string. And it turns out that string
is a feature of the so-called CS50 library. So this is one of these training wheels
we're just gonna use for a few weeks until we dive in underneath the hood of
strings, too. But in order to use anything from CS50, what did I need to add to my
code, too? AUDIENCE: Source code. DAVID MALAN: Source code, yes. But what else?
AUDIENCE: The library. DAVID MALAN: The library. And the library was the CS50
library. And that means there's a file somewhere on the computer, in the IDE,
called cs50.h, a so-called header file-- more on those in a bit. And so I did
forget that detail. But suppose you don't recall that yourselves. Well, you might
recall from one of our orientation sessions that CS50 has tools with which to help
for this, too. You don't have to turn to this course's online discussion forum. You
don't have to go to office hours, necessarily, for error messages like this. If you
can't quite wrap your mind around what's happening, go ahead and do this instead.
Instead of just running clang hello.c, do something like help50 clang hello.c,
where help50 is a CS50-specific command, sort of a virtual teaching fellow, if you
will. And if we recognize your error messages in yellow at the bottom, we'll
highlight the first one. And then we'll try to give you advice like you might get
in person. So "by undeclared identifier, clang means that you've used a name,
string, on line 5 of hello.c, which hasn't been defined. Did you forget to include
cs50.h, in which string is defined, atop your file?" So we'll generally try to
prompt you with rhetorical questions that hopefully are correct in leading you
toward the right solution. So OK, that jogged my memory, at least, even if I'm not
yet 100% comfortable with what these lines really are doing. And we'll tease that
apart more today. But it feels like I'm trying to do this. But it turns out there's
one other gotcha here. clang hello.c, Enter. Dang it. Well, actually, this is a net
positive, right? Fewer error messages, it would seem. But now one we did not see
the other day-- "undefined reference to get_string." So it's kind of similar in
spirit. Something is not understood. But it's different wording, certainly. And I'm
not quite sure what that means. But it turns out that it's not sufficient just when
using most libraries to #include cs50.h or other header files, as we'll
eventually see. That just teaches the compiler that something exists. It was like
briefly last time, when we talked about prototypes, where I put a little one-liner
that just said, by the way, this function's eventually gonna exist. That's what the
header file's doing. It's like a promise to clang, this shall exist. But there's a
second step. You actually have to feed clang the zeros and ones that implement the
CS50 library that you, of course, did not create yourself, but we did pre-install
in the IDE. And there's a separate way of doing that. Rather than just do clang
hello.c, you have to do what's called linking your code against that library, at
least if it's a third-party library that doesn't come with the computer. It came
from humans like us. So this command, clang hello.c -lcs50, says, hey, clang, compile hello.c, but link it
against the CS50 library, which means take my zeros and ones, take CS50's zeros and
ones, combine them, and then give me my actual program to run. And so that's going
to be a key ingredient. And help50 could guide you toward that solution. But if I hit Enter now, it seems to
compile. ./a.out. My name shall be David. Enter. And it says, "hello, David." So
again, don't get hung up, ultimately, in office hours and problem sets on these
kinds of errors. You're gonna hit these bumps from the get-go. But just realize--
look for sort of familiar words. Use things like help50. Reach out to the course
online. And just get over those hurdles because at the end of the day, the
interesting stuff's gonna be the logic of the programs and the actual problems
we're trying to solve. So what is it that's actually, then, going on here
underneath the hood? And frankly, this is very quickly becoming tedious. So how do
I automate some of these processes? Because it's very easy to forget this, and it's
just very boring to continue to type so many commands. Well, recall that there was
a shortcut we talked about the other day, which just kind of hides all of these
details. You don't need to remember -o. You now don't need to remember -lcs50. What
do you do instead to make a program? Yeah? Yeah, so make, it's not a compiler
itself. It's just kind of a helper program that knows how to run a compiler for
you. So frankly, the simpler approach is just to do that-- make hello. And the
reason it outputs so many more words is just because we have, in anticipation of
teaching the semester, sort of preconfigured it with command-line arguments,
additional words, that we expect you're gonna need at some point. And this just
saves you the trouble of having to futz around with the manual to figure that kind
of thing out. But that's why it looks cryptic. But notice this is really the most
important word-- hello followed by hello.c. And those are your two ingredients. All
right. So what's going on, then, underneath the hood there? Well, it turns out that
even though we can simplify the command structure, it's actually doing quite a bit
for us. And this is the process of compiling. But that was kind of an
oversimplification, or, put more intelligently, like an abstraction. There's
actually quite a few steps that go on underneath the hood, namely
preprocessing, compiling, assembling, and linking. So let's do a quick dive
down here. But then we'll abstract away, just so that you've seen what's going on.
But henceforth, we can just take for granted that all of this is happening. So here
is source code, same program as the simplest version we had a moment ago. And
ultimately, I need to get this to machine code. Well, let's see if we can't just
visualize how we get from point A to B without completely abstracting it away with
just those big arrows. So this is my source code. And it turns out that the very
first step of turning source code into machine code in the world of C is you first
run what's called a preprocessor. You don't do this explicitly, although you could
if you were really low-level and interested in it. But what the preprocessor does,
essentially, is anytime there's a line of code that starts with a pound sign, or a
hashtag these days, that's a special command that gets, essentially, replaced with
the contents of the file, at least in this case. So somewhere in the IDE is a file
called, literally, "stdio.h." And so #include means go get that file and
essentially copy and paste it here. And so when you preprocess your code, this
yellow line here becomes something like this. And I'm using "..." because it's dozens if
not hundreds of lines long. But there's one juicy line in it which is the little
clue to clang that printf shall exist. And that's why you need stdio.h. So that's
essentially, for our purposes today, all the preprocessor does. It does these kind
of find and replace style operations so that now your file, without you knowing it,
suddenly became much bigger because it's got other lines of code that someone else
wrote. And then your code remains right there as it was. But the next step after
preprocessing is something called compiling itself, which technically, the
compiler, if we really want to be nit-picky and look at its formal definition, is
actually taking these yellow lines, your source code and someone else's, perhaps,
and converting it into something called assembly code. And this is a language that
humans kind of sort of still do, but back in the day really did program in. And in
fact, if you have a computer with an Intel CPU, a brain made by Intel inside of
your computer, there was and still is a big user's manual that tells programmers
around the world that this Intel CPU understands the following instructions-- add,
subtract, multiply, divide, all the basics, and then things like move numbers from
here to here, read numbers from here to here, just move stuff around in the
computer's memory. And so even though this really looks cryptic even to me, since I
am by no means an expert at assembly language, certainly, all these years later,
you can see words that kind of sound familiar, like "mov" suggests moving a value
from one location to another. "sub" alludes to subtraction, so subtracting one
number from another. And without really thinking this through carefully, I'm not
really sure what's going on yet. But I do see a familiar word down there called
"printf." And so long story short, what the computer or compiler specifically has
done is it's taken my more user-friendly C code, converted it to something that's a
little closer to what the machine understands. But it's not there yet because the
machine only understands zeros and ones. So there's another step called assembling.
And the assembling process simply takes assembly code and converts it to zeros and
ones. Now we're down to the zeros and ones. And what's amazing-- if it's
interesting in the first place-- is when you run clang and hit Enter, all of this
is just happening instantly. And you're getting these zeros and ones, this output.
But I've left the room on the other side because all we've done is convert my code
from source code to assembly code to machine code. What needs to now be merged in,
so to speak, for that "Hello World" program? Yeah, so still need, like, stdio, the
standard I/O library that has printf. So the next step is to take a whole bunch of
zeros and ones from somewhere else on the system and combine them into one
file, a.out or hello, whatever you called your program. And that,
ultimately, is what the computer understands. So that is a very low-level detail.
Thankfully, we learned in the very first lecture this notion of abstraction, which
means even as you dive in underneath the hood and sort of understand how we're
building up, now, henceforth-- literally every minute hereafter-- that whole
mouthful just becomes compiling. And indeed, that's what most people in the
programming business refer to as compiling, is all of those several steps. But
that's all that's happening. Feels like magic, but it's just one step after another
gets us closer to our goal. Questions? It's about as low-level as we'll get. Yeah?
AUDIENCE: Why do you have to go through the assembly code and then the machine
code? Why not just go straight to machine code? DAVID MALAN: Good question. Why do
you have to go from one step to another, like from source code to assembly code to
machine code? You absolutely could. It just happens to be the case that there's
lots of humans in the world and lots of people working on different projects. And
this notion of layering your software on top of someone else's on top of others'
allows us to build more complex systems much more cleanly, if you will. And there's
different types of computers in the world. There's Macs. There's PCs, which even
though these days, they're a lot more similar underneath the hood, literally, than
they used to be back in the day, there's different CPUs. There's phones that have
very different CPUs. And wouldn't it be nice if I could write my programs in one
language and compile them into zeros and ones that do work on a Mac and on a PC and
on an Android phone and an iPhone and so forth? And that's why by having these sort
of different layers, one set of humans or one person can implement the process of
converting C to assembly code. Then someone else can take it to the zeros and ones,
in some sense. Or even-- there's even intermediate steps. Compilers have front ends
and back ends and all of this complexity. But it gives us advantages because it
means we can sort of decide which types of hardware to support more easily. Really
good question. Other questions? OK. So with that said, let's now consider any
number of ways in which things can go wrong. It's easy for me, certainly, to write
"hello, world," and everything just kind of works. And even when it doesn't, I
quickly know how to fix it. And it's only from experience and practice. But let me
just give you a teaser not just of help50 but two other tools that you'll see,
particularly for the problem sets, that will not necessarily teach you how to write
good code-- good, efficient code. That's where the humans come in and the teaching
fellows' feedback and sections and office hours and more-- but at least to write
correct code that meets our specifications and that's well-styled, at least looks
good. But the third ingredient, recall, besides correctness and style is gonna be
design, which is something we'll learn after practice and examples. So with
check50, this is a tool that comes in CS50 IDE, recall, if unfamiliar, that allows
me to essentially do this. Let me whittle this back down to my simplest hello,
world program like this. I no longer need the CS50 library. I can run make hello.
Seems to work. And how do you go about testing your programs if you've written this
for a problem set? Well, the easiest and most straightforward way, of course, is
just run it. Looks like it's correct. And it is. And there's not too much that can
go wrong in this program. But soon, you'll see, with problem set one and beyond,
anytime you start getting input from the user where he or she has to type their
name or a number or other things, you can absolutely concoct scenarios where
something goes wrong. But if you run a command in this case like check50, we can do
the following. Let me go ahead and first make a directory called-- let me go ahead
and do this and do mkdir-- for make directory-- hello. And then we didn't see this
the other day. And you'll see more of this in today's super sections, or classwide
sections, which will also be filmed. I'm just gonna move this file into a
directory called hello. So that's like on a Mac or PC just dragging and dropping
it. But I'm doing it with my keyboard. What's the command to change into another
directory? cd. So that's like double-clicking on a directory, albeit with my
keystrokes only. And now I'm gonna go ahead and run this. I can run make hello
again. Seems to work. And I can run ./hello. Seems to work. But now let's see if
CS50 agrees. So check50. And then I'm gonna type "cs50/2017/fall/hello," which
looks like a bunch of folders, but it's not. It's just a unique identifier that has
sort of some hierarchy to it. You would only know to type this by reading the problem
set specification online. And what this is gonna do, if you haven't seen it
already, is actually connect to CS50's server. It's gonna authenticate you, if you
haven't already. I'm gonna go ahead and log in as student50. And now hereafter it
will remember my password, for at least some amount of time, so you don't have to
type it in every darn time. Then it's preparing. It's uploading. And what's
happening now is my "hello.c" file is somewhere in the cloud on CS50 servers. We
are running the checks, the tests that the staff wrote. And hopefully, I'm gonna
see a whole bunch of green smiley faces that look a little yellow on this
projector, but those are, in fact, green smiley faces instead of frowny faces,
which would suggest something is wrong. So that's all good. And don't be
discouraged if you see a few frowny faces or a few flat, confused faces if
something else is awry. But style50 does something different. Right now, the style
of my code, I'd argue, looks pretty good because it's kind of hard to go wrong when
it's this short. But we'll see a way. And if I instead run "style50 hello.c," just
the name of the file I want to check-- looks good, but consider adding more
comments. And that's pretty compelling because there's zero at the moment. And so
what kind of comments might you want to add? Well, in this program, it's not that
compelling to add that many comments because the reality is this program's so short
it probably takes me less time to read the code than the comments. But it's very
common, as you'll see in the examples from lecture, to do something like this--
"says hello to user," just a quick one-line summary so that when you're skimming
the file or looking at the code, OK, got it. I know what this does. And if I care
to know how it does that, then I can read the code. And so that would be a comment.
And that will probably make style50 happy in this case. But what if I'm getting a
little sloppy? And I remember vaguely that I was in the habit of hitting Tab or the
space bar in lecture. But I can't be bothered to do that when I'm working on my
problem set. I just want to get the darn thing to work. It's not uncommon for code
to eventually start to look like this, even though this, too, is a simple program.
Now, good style, as you'll see and learn from practice, dictates that just like in
Scratch there were those yellow puzzle pieces that kind of hug the code, similarly,
inside of curly braces, you really should be indenting. And so if I go ahead and
sort of forget that and now run style50 on "hello.c," I'll see my code
outputted in the terminal window at the bottom of the screen. But green suggests, hey,
programmer, add the following characters. So green suggests add here. And if I go
ahead now and reindent that by hitting Tab-- specifically four spaces, which is a
human convention-- it should make it happy again. We can go in the reverse
direction, though. Suppose that I got a little confused as to what I actually am
supposed to indent-- and you might even see in textbooks and some online resources,
some people write their code like this. Let me go ahead now and run style50 on
this. It's gonna print out my code. And red in this case means remove those
characters that you might not otherwise see. So it's not always going to be
perfect. And especially when the programs get long, it might be a little nonobvious
what changes you have to make. But just like with error messages, start at the top.
Make one or few changes. Save it and rerun it, and see what the updated advice is.
And I can't stress this enough, especially with problem set one and any problem set
thereafter-- don't get into the habit of sitting down and trying to bite off the
entirety of a problem. Odds are with Scratch, you didn't sit down and write the
whole thing without once playing it or testing it or adding features to it. Don't
get into that habit, then, in C. Take it step by step, just as we've been doing
with these examples so far. All right. Any questions, then, on those tools? And
we'll come back in just a moment to more sophisticated debugging techniques. All
right. So one of the problems that we were distracted by earlier is there's this
old-school game, "Super Mario Bros.," wherein a character like this jumps around
the screen quite a bit. And it's one from the very first Nintendo game, and there's
lots of obstacles in the way of Mario as he's running left and right and jumping.
And some of these obstacles can be represented with fairly simple constructs like
bricks in this colorful world. And we can approximate this just by using characters
on our screens, as well. So I actually poked around for far too much time last
night looking at old "Super Mario Bros." maps, which if I had them in, like, the
1980s, would have made "Super Mario Bros." a lot easier. But people have captured
all of the imagery from this game. And one snapshot from this game was a screen
like this. So eventually, Mario's supposed to run through this. And he's supposed
to bump his head up against the question marks and get coins and so forth. But for
now, I'm gonna really, really simplify this and propose that all I care about for
the sake of discussion is this line of question marks. How would a computer
program, whether in "Super Mario Bros." or today here in Sanders, go about printing
a line of question marks in a row like that? Well let me go ahead and open up CS50
IDE. I'm gonna go ahead and create a new file here. And I'm just going to go ahead
and call this, say, "mario0.c" because it's the first or the zero version of this
program. And I just want to print, like, four question marks. So let me take a stab
at this first. So #include stdio.h, which I think I need because why? AUDIENCE:
Printf? DAVID MALAN: Yeah, I need printf. I need to be able to print the character.
So int main void is what comes next. And we'll start to tease apart why that is
today. And now I'm going to go ahead and print out "????." And then semi-colon. All
right. Let me go ahead now and make mario0. ./mario0. And I kind of sort of have a
very ugly textual representation of a really fun-- at least 1980s style-- game. But
there's a slight aesthetic bug. And I made this same mistake the other day. How do
I move my cursor onto the next line? Yeah, so backslash n. So backslash is the one
we're about to type, and forward slash or slash is what people would call just the
other direction. So that's backslash n. And that's a special escape character, so
to speak. For now, just know that this starts to confuse the computer if you just
literally hit Enter. Now, your code's on two lines, when really it's just one idea
or one function. So humans decided some time ago, let's just represent that special
character that you would otherwise just hit on a keyboard. So now if I rerun make
mario0. ./mario0. OK, now looks a little better. But we know from Scratch that we
don't just need to do question mark, question mark, question mark, question mark,
especially if I want even more coins to be available on the screen. What's the
right programming construct to just give me more of these? AUDIENCE: A for loop.
DAVID MALAN: Yeah, like a for loop, right? So let me go ahead and tweak this a
little bit. Let me go ahead and in, let's say, "mario1.c"-- so "mario1.c"-- I'm
going to instead do this. So for int-- to give me an integer-- i equals 0 by
default, though it could be 1. But programmers tend to use 0. i is less than-- I'm
not sure, so let's just put a big blank there for a moment. And then i plus plus, I
remember, being the way to increment. And then inside of this loop, I'm going to do
printf "?" semi-colon. And now let's answer this question. If this for loop, which,
recall, has a very methodical process to it-- it initializes, checks the condition,
does something, increments, checks the condition, repeat again and again. What
number should I put on the otherwise blank line here? AUDIENCE: Four. DAVID MALAN:
So four. But if I'm counting from 0 to 4, that feels like it's five numbers. So
three might get me closer, but less than. We have this relational operator. Less
than, could have been greater than in other contexts. So the less than actually
saves us. If I do four here, think about logically what's happening. i gets
initialized to 0, which is less than 4, and so we get a question mark printed.
That's the first one. Then i gets plus plussed, and so it becomes 1, which is still
less than 4. I print another one. i is now 2. I do another. i is now 3, which is not
consistent with the number of fingers I'm holding up because I started at 0. I do
another, and therefore I've already printed my four question marks. The next value
i is gonna take on is 4 itself. Is 4 less than 4? No, so I never get a
chance to print another question mark or put up another finger. And honestly, this
is a waste of intellectual capacity to think through, OK, how many numbers are
between 0 and 4? We could have-- like most of us in this room just think-- could
have just started i at 1 and done i is less than or equal to 4, and that, too, would have worked. This
is even more clear, perhaps. You start at 1, and you count up to and through the
number 4. And that will give me four fingers, as well. Why do we start counting at
0? It's kind of just because, but more technically it's because it's easy to start
counting with all 0 bits per our first lecture. So it's just a habit. And it's fine
if you're more comfortable this way. But before long, get into this habit just
because everyone else does it this way. OK, so now let's go ahead and print this
out. Make mario1. ./mario1. Ah, still that bug. OK, is this gonna fix this? Why
not? Yeah, that's gonna do question mark, new line, question mark, new line. That's
not right. So what line number should the backslash n really go on or between?
AUDIENCE: It should go outside the for loop. DAVID MALAN: OK, so it should go
outside the for loop. So specifically-- I saw a hand in back, too. What line
number? Yeah. AUDIENCE: Eight and nine? DAVID MALAN: Yeah, so between eight and
nine. There's no room there at the moment. That's no big deal. We'll just hit
Enter, printf. And I can certainly just do a single backslash n, even with no words
to the left of it. That, too, is OK. Let me recompile this. And honestly, if you
get bored retyping the same commands, know that you can also hit up and down on
your keyboard, and it will go through your history, so to speak. And that, too,
over time will start to save you time. So there's make mario1. Enter. ./mario1, or
I could just scroll back up as I did before. And now I get those four question
marks. But now let's actually create this a little more interestingly, as follows.
Let me go ahead and not just hard-code 4 into this program. Let me make one more
version of Mario, call it mario2.c, and this time actually get some user input. How
about I do int n, because n is, like, a number, and it's just common convention to call
it that. get_int, and then I can say number, semi-colon. And now instead of hard-
coding 4, why don't I just put n there, which I can certainly do? So let me go
ahead now and run make mario2. Uh-oh. Error. Yeah, I forgot cs50.h. So I have to go
back up here. I'll just do a quick copy-paste, and then change the word, cs50.h.
Now I'm gonna clear my screen. And to clear your screen, you can hit Control-L, for
instance, which will just keep fewer characters on the screen for us. Make mario2.
That worked. I didn't need to worry about the -lcs50 because, again, make does that
for me. That's one of the features. And now I can do ./mario2. Number. How many
question marks do we want? AUDIENCE: Seven. DAVID MALAN: I heard seven first. And
now we have seven question marks. And it's not necessarily gonna look super pretty.
If I do 700, now I'm gonna get a whole lot. But look how quickly it did that for
me. And so we have this power now of loops. So that's good, but you know what?
Let's see, what about this? What about -50 question marks? Is -50 an int? AUDIENCE:
Yes. DAVID MALAN: It is. So get_int will happily return it to us. But that's not
really logically what we want. So think about it. On line seven if n equals -50,
how many times will the for loop execute? AUDIENCE: None. DAVID MALAN: Why none?
AUDIENCE: Because 0's greater than. DAVID MALAN: Yeah, because 0 is in this case
greater than -50. So that condition never lets the loop actually proceed logically.
So we're kind of OK. Nothing seems to happen. I get this sort of ugly blank line.
And maybe that's arguably a bug. But at least it didn't freak out the computer and
just kind of print things infinitely many times, as could actually happen. So let
me go ahead and-- actually, at the risk of losing control over my computer, let's
go ahead and change the logic. Suppose I change the less than to a greater than.
And suppose we again type in -50 for n. And now, is 0 greater than -50? AUDIENCE: Yes. DAVID
MALAN: Yes. And it's gonna be that way for a really long time, most likely, even as
you increment it. And so let me go ahead and do make mario2 and then hold my breath
and do -50. And even the internet and the computer can't really keep up. And that's
why you're just kind of seeing it bursty like this. We're sending thousands, tens
of thousands, millions, ultimately, of question marks across the screen. And that,
too, you might do accidentally. And so just as I did, you can hit the secret
keystroke, which usually works, which is Control-C for cancel. And that will stop a
program in the window from running. All right. So I've gone ahead now and
implemented kind of a very weak approximation of this. So that's great. Let's now
take a step up and consider not just this construct, but if we fast forward in the
game, to this part of the screen, now maybe we have a vertical block, as well. And
let's just consider for a moment what about my code needs to change if I want to
print three or maybe any number of vertical blocks? Fundamentally, how do I want to
change the code? Yeah. AUDIENCE: [INAUDIBLE]
DAVID MALAN: Yeah, I just need a line break, where I accidentally almost did
earlier. But in this case, it would be a good thing. So let me go over there, and
let me just quickly make mario3 by starting at the same point. So mario3.c. And
then let me go down here and change this as follows. Let's make this i is less than
n, the way it's supposed to be and just so that if we upload these later, I don't
forget. And now I'll do a hashtag just because it looks more brick-like. It's not
one of those coin things. And now I do this here. I don't think I need this
anymore. So let me save that and run make mario3. Seems to be OK compiling-wise.
./mario3, number 3. And I get three of those. And now I could do maybe five of those.
That works, and so forth. So there's still an opportunity for improvement here. In
case I want to pester the user to actually cooperate such that if he or she types
in -50, I don't want to just quit. I want to yell at them or somehow give them
feedback and say, give me what I asked for. How do I continue to pester a user
again and again and again until he or she actually gives me the value I want? I'm
sorry? AUDIENCE: While. DAVID MALAN: While. So there's different looping
constructs. While-- and it turns out we could use while or even for, or there's
another one, as well. And let's consider how we might do this. It turns out that
when you want to get user input from someone, you could use for. You could use
while. But you'll find that it's a little annoying to use those constructs. Let me
just jump to the better way first so that we see one other way. It turns out that if,
in a program, you want to do something at least one time, and maybe some more times, you
could use for or while. But it's actually a little more straightforward to
literally just say do this while something is true. Now, this is just a placeholder. Let
me start to fill in some logic here. So I want to do the following-- do the
following while what? If the user does not give me a positive number, I want to
prompt him or her again. The curly braces on lines seven and nine at the moment
connote exactly that. Do this, do this, do this while line 10 is true. So what
Boolean expression, if you will, do I want to type in the parentheses here on line
10 to express the fact keep doing this until the number is positive? Yeah?
AUDIENCE: While n is greater than 0. DAVID MALAN: So while n is greater than 0,
keep doing-- which one? AUDIENCE: Less than. DAVID MALAN: I heard less than. OK, so
let's rethink this. So while n is less than 0, ask the user for a number. Ask the
user for a number. Ask the user for a number. And you know what? This is going to
just confuse the heck out of them. Let's be even more clear with our prompt. Give
me a positive number. But keep prompting him or her until we actually get a
positive number. Now, if we really want to be nit-picky, it's actually not even
less than. We're so close. AUDIENCE: Less than or equal to. DAVID MALAN: It's less
than or equal to. Unfortunately, I don't really remember having a key on my
keyboard that's got, like, an angled bracket and then a line under it like you
might write in math class. So there's a way to do that nonetheless on your
keyboard. I actually just do them side by side. This is less than or equal to. This
would be greater than or equal to. And so now just get comfortable reading these
things left to right. There's no special symbol like you would have in a math book
or a homework assignment on paper. So this, I think, says the right thing. Do this
while n is less than or equal to 0, which is, of course, not positive. And then
down here, the rest of my code, I think, can stay the same. I just have a block of
code up here now that's doing something. And you know what? This is where comments
start to get useful. Prompt user for a positive number. And now down here, print
out that many bricks. So it's kind of obvious if you just read through the code
what I just said. But this helps you if you sort of sleep on it and wake up, and
you want to remember, why did I do this? Why did I do that? It helps the reader of
your code, a colleague, a teaching fellow, and so forth. That's how you kind of
start to add comments to your code. Unfortunately, there's a bug, and we're about
to hit it. So let me try. Let me go ahead and make mario3. Oh, my god. More errors than
I have lines of code, it seems. And this one's weird. Error-- unused variable n.
And now let me dive in deeper to these error messages just so you start to notice
little clues. So over here on the left is, of course, the filename, as you might
have noticed-- mario3.c. Then there's a colon, and then a number, and then a colon
and another number. Turns out this is just a very succinct way of saying that in
mario3.c on line nine at character or column, left to right, 13, you've got a
problem, at least the compiler thinks. So generally, the character is kind of sort
of helpful. It's really the line number that draws your attention to the right
place. Somehow, this is buggy. And specifically, the bug is that I have an unused
variable n. And then very inexplicably, on line 11, now I have a use of n. So it's
unused here, but it's used here. And somehow, the computer doesn't like this. Why
might this be? Yeah. AUDIENCE: [INAUDIBLE] DAVID MALAN: Yeah, that's the trick
here. So it's a little different from Scratch, where when you make a variable, you
can just use it anywhere you want. In C and some other languages, variables only
exist in what's called a certain scope. And a scope you can generally think of as
just the most recently opened and closed curly braces. So what does that imply
here? Well, on line nine, I am on the left-hand side declaring a variable. Hey,
computer, give me a variable called n. And it's gonna store an int. That's the
story we keep telling. But the problem is I am doing that in between lines 8 and
10, curly braces. And I claimed today that that means-- kind of like the way Scratch's
puzzle pieces hug other pieces-- in C, variables are treated a little differently. If you
declare a variable in here, it only exists in here, and you can't use it down here
in your code. And so this would kind of seem to be a catch-22. I need a variable.
And so I can declare it. But I can't declare the variable there if I want to use it
later. It doesn't really seem to be a good situation. So just logically, even if
you've never programmed before, if the fundamental problem is that this variable
exists only in that scope of the curly braces, how intuitively could we solve this?
Yeah. AUDIENCE: Move the curly brace? DAVID MALAN: Remove the curly-- oh, move the
curly braces. Yeah, we could move the curly braces, which is essentially the idea.
The catch is the do-while loop really kind of needs them. At least in general,
you need those curly braces. But you know what? It's been a while since I
typed them. But I do have another pair of curly braces that are sort of outside of,
so to speak, my inner curly braces. So I have another scope here that's essentially
the whole function called main. So what if I somehow declare my variable out there.
And indeed, I can. I can go to, like, line seven-- or even higher, but generally
you want to keep it as close to where you care about it as possible. I can type int
n. And I don't think I want to prompt the user here because then I'm going to
create the same problem as before, where I'm just prompting him or her once. I want
that prompt in a loop, again and again and again, potentially. So that's OK. We've
not seen this before, but you can declare a variable, and then do nothing with it
yet. Just say, hey, computer, give me a variable. I'll deal with this later, just
like in Scratch. You declare a variable, if you want one, and then you deal with it later
as you want. Now, this would be a bug still. I can't say, hey, computer, give me a
variable n. And then, oh, by the way, give me another variable n. So all I have to
do to fix this issue is just don't redeclare it. Just use it. So line seven says,
hey, computer, give me a variable called n that's going to store an int. Line 10,
same story as always except it's slightly shorter. On the left-hand side, it says,
here's my variable. Right-hand side says, here's a value we got from the user. Put
it from right to left. And so now because n is declared or created on line seven,
it exists within the scope of these outermost curly braces. And now I can use it
kind of anywhere I want, including on line 12, which is great, and, most
importantly, on line 15. So let me go ahead and save this and do make mario3, hold
my breath. Good, it actually worked. ./mario3. Positive number. Nuh-uh. I'm gonna
give you -1. OK. I'm gonna give you zero. All right, fine. I'll give you 3. And now
it actually cooperates. And so the do-while construct is still a loop, and Scratch
doesn't really have an analog of this. What sets the do-while loop apart is that
it does something at least once. The difference fundamentally, though, is this-- if
I did this, like, while n is less than or equal to 0, if I change this to a while
loop, which we saw ever so briefly the other day as just an analog of the forever
block in Scratch, if I do this, there's kind of a logical problem. Here's n being
declared on line seven. So we're avoiding the scope issue this time from the get-
go. But line eight is saying while n is less than or equal to zero. But what is n
at this point? It's not yet defined. And, in fact, as we'll soon see in class, it
has some garbage value, typically, some unknown value, remnants of whatever the
computer used that RAM or memory for in the past. So this is literally undefined
behavior, it would seem. I don't know if the loop's gonna execute or not because I
don't know what's in n. So you could hack around this, so to speak. Hacking
generally means kind of sort of figuring out a solution to a problem that might not
be the cleanest. And OK, let me just initialize this to, like, -1,000 because I
know that's less than or equal to zero. So it's a hack in that it fixes the logical
problem because now on line eight, is -1,000 less than or equal to 0? It is. So now
my loop will execute at least once, and it will then change the value of n. But
what the heck is -1,000 coming from? These kind of inelegant solutions would be
horrible, horrible design, even though it logically gets the job done and it's
correct. Bad, bad design. And so that's why we started with a better design, with a
do-while loop. But you'll find there's many different ways to do things. And you
might not, certainly, in problem set one or two do things always the right or best
way the first time. And that's OK because with practice and experience, you'll
begin to see patterns with which you can solve these same kinds of problems. Any
questions on these approximations of Mario? Well, let me do one last one, one last
one involving Mario, and kind of like this. I spent way too much time looking for
parts of Mario that kind of painted these pictures. And I found this, these
additional bricks underground in the fire level. And suppose I wanted to print,
like, a square of hashtags, so not just a horizontal line, not just a vertical line,
but kind of sort of both together. And indeed, you can think of these bricks as
exactly that. It's like hashtag, hashtag, hashtag, hashtag, hashtag, hashtag,
hashtag, hashtag, and so forth, kind of like an old-school typewriter, printing one
line at a time. And if you even remember typewriters, you can actually think of
computers and printf as behaving very similarly. You can print something, then do
the backslash n. Print something else, do the backslash n. So what is a square like
this on the screen? Well, it's really just the process of, like, painting the
screen, if you will, from left to right, moving down, left to right, moving down.
And how do we do this? Well, what was the type of code we used in order to do
something again and again and again? The for loop was the first one. We could use
other constructs, but I'm going to go ahead and use the for loop again. Let me save
this as mario4, as our fourth and final example. I'm gonna keep this code up here
because I still want to prompt the user for some number of blocks, a positive
number at that. And now I don't want to just do this, but let me just see where I
left off. Let me go ahead and make mario4. ./mario4. And let's do, like, a 5-by-5
block. OK, that's not it. That's just a column. So I've got to do a little more. Well,
it turns out that just like in Scratch, I can take one idea and kind of nest it
inside of another. Let me go ahead and do this. How about inside of my for loop
that's going from i to n, let me do another one for int-- and I don't want to reuse
i because I feel like if I use i in two places, something's going to get messed up.
So I'm gonna go with the next one alphabetically, j, which is actually pretty
common. So int j gets 0. j is less than n. And j plus plus. And now in here, I'm
gonna put that brick. I think I need to get rid of this here. And let's see now
what happens. So I've got a for loop inside of a for loop. If I go ahead and do
make mario4, Enter, code is compilable. ./mario4. Let's type in 5. Hmm. I think
that's actually, like, 25, if I really count it out. That's not what I wanted. I
wanted a square. So what's obviously missing aesthetically? A new line. But I kind
of thought that doesn't go here, right? Because if I do this-- just real quick
teaser. If I rerun mario4 after making that change, now I've just made the opposite
problem. So what needs to change? Yeah. Oh, just scratching? OK. What needs to
change? Yeah, over here. AUDIENCE: Another line with a printf and a backslash n.
DAVID MALAN: Yeah, and what line would you propose the printf with the backslash n?
AUDIENCE: 21? DAVID MALAN: Sorry? AUDIENCE: 21. DAVID MALAN: 21. So above or below
it? AUDIENCE: Above it. DAVID MALAN: Above it. OK, so let me go there. So let me go
ahead and printf backslash n. And now let's see. So make mario4. ./mario4, Enter.
5. Beautiful, beautiful. It's not quite a square on the screen because the hashtags
are a little more vertical than they are wide. But that's OK. We've built this sort
of approximation of that level,
too. And now, just for good measure, let me just think about-- this is kind of an
oversimplification, print out that many bricks. So print out this many rows or
columns on the outside? And then in here, where we're going with this, print out
this many-- what should my comment here be on the top? The top one is rows. And then
down here, this should be columns. And it is because on the outermost loop, you've
got i. And it's starting at 0, and it's eventually going to 5. But whenever i is 0,
at the beginning, it's like the cursor is in the top left-hand location by default
on the screen. And then you've got this nested loop, which says, oh, by the way, do
the following five times. What are you doing five times? Hashtag, hashtag, hashtag,
hashtag, hashtag. Then a new line. Then i becomes 1. So it's like moving over--
sorry. Then i becomes 1, which means you're now on the second row because you've
printed out one of those newline characters. So here, too, this is where comments
would be helpful because, frankly, even I had to think about that. And you don't
want to waste time thinking about code you've already written. Just give yourself
the answer to why you made past decisions as in a case like this. All right. So
suppose something's going wrong. And, in fact, we already solved the problem of,
like, a lot of hashtags going this way and a lot of hashtags going that way. But
suppose you want to wrap your mind around what your code is actually doing. It
turns out that we have two other tools we can use. It turns out we have in the CS50
library a function that's almost identical to printf, except we called it eprintf, for
like, error printf just to help you see what's going on inside of your code. And
you should use it as follows. If you kind of want to wrap your mind more clearly
around what your own code is doing or, for that matter, even an example for class
that you downloaded, you can add, certainly, your own lines of this-- like, "hello
there. I'm at home playing with this code" or something like that, right? So
something nonsensical, but at least now when you see that sentence on the screen,
you know on what line the computer was executing your program. So you can be a
little more methodical than this. And with eprintf, notice we can do the following.
I'm going to change this to just eprintf, and it works the same. And I'm going to
go ahead and do this-- "about to prompt user for a number." I just want to provide
an explicit note to myself, temporarily, what should be happening here. And let me
see now what happens. If I do make mario4. OK. ./mario4. Ah. I get a little ugly
output, but it's just diagnostic. It's temporary. It says mario4.c on line 10 is
giving the following message-- "about to prompt user for a number." That's just a
note to self so that I'm comfortable understanding the flow or the structure of my
program. I can still interact with it. Let's type in something like -1. And what
should I see next on the screen if I type -1? Yeah, another prompt. "About to
prompt user for number." So it's just like a sanity check. If you think something's
going to happen, tell yourself that it should in your code, and make sure you see
what you expect to see. And then once you're sure your code is good, then don't
submit it with this because this is not correct per the specification. You can just
get rid of it at that point. But frankly, that gets tedious very quickly. Oh, and
how do I kill my program if I don't want to keep playing? Control-C will terminate
the program. There's one other tool, perhaps the most powerful. And I can't stress
this enough. Get into the habit of using this as needed early on. Even if it
takes you an extra 10 minutes, half hour to play with it, it will save you,
potentially, hours over the course of the semester. And that is a program called a
debugger. So a debugger is a program that helps you remove bugs or mistakes from a
program. And it works like this. I'm going to go ahead and recompile mario4. And
now I would normally run it, of course, with ./mario4. But suppose I have a bug,
and I really want to understand what's going on. I'm going to do the following.
You'll notice that all of my examples thus far have line numbers in the so-called
gutter of the program, left-hand side. And it turns out you can actually click to
the left of those numbers at, like, this point here. And you can put a red dot.
This shall be known as what's called a breakpoint. This is like a little stop sign,
only for yourself, that says, hey, computer, pause my program here, or really stop
my program here, like a stop sign, temporarily. And let me, the human, go at human
speed, not, like, billions of things per second speed. And by this, I mean the
following. I'm going to now run not mario4 but debug50 space mario4, which, again,
is a program we wrote that invokes or starts the IDE's built-in debugger. So notice
magically this right-hand panel just popped out. And it's actually always been
there. It's always said "Debugger," and it just happened to open that window for me
automatically. And let's see what's going on. There's a lot of words, but we're
familiar with many of them already. Notice that down here is the word local
variables. And then there's kind of a table here. And it's not very big because I
only have one local variable. And at this point in the story, my variable n
happens-- I got lucky. It has a default value, it would seem, of 0. I shouldn't
rely on that. But it's just so early in my program that it seems safe-- well
rather, it's so early in my program that it happened to have the value 0 in it for
our purposes today. And it's of type int. But what's cool now is the following. Now
notice that my program is effectively paused on line seven, or, specifically, line
10, which is the first interesting line. That's why it's highlighted in yellow. And
what's cool here is this. Up here in the top right, you have a play button which
will just say, play the rest of my program. Just let it go through without pausing.
Or, if I hover over this thing, you can step over this line, which means, hey,
computer, execute this line, but at my human pace, just one line of code at a time.
If you're really curious, you can step into that line of code, but more on that in
just a moment. Meanwhile, this is step out, which is if we've actually dived in
deeper. So what do I mean by all of this? So I'm currently paused on line 10, which
was the first interesting line of code in my program, so far as the debugger is
concerned. I'm going to go ahead at top right, and I'm going to go ahead and click
Step Over. And notice my terminal window is now prompting me for a number. Why?
Well, because I've stepped over the get_int line, which means execute it. So let me
go ahead and type in that number. Let me go ahead and type in -50, Enter. And keep
an eye on the variable on the right-hand side. Notice now in the debugger, even
without printing it with printf or eprintf, I can see that n has a value of -50.
It's just a sanity check, so to speak. I can see what it is to be sure it's
consistent with my expectations. All right. That's not right, so let me go ahead
and step over. And notice the yellow line moved because it's looping. You can
literally see what I keep doing with my hand. Let me do it again. OK, positive
number. I'm going to cooperate this time. 42, Enter. Notice at the right-hand side,
the value n is indeed 42. And notice the yellow line, if I keep stepping, is about
to jump to the next interesting line of code. And if I keep doing this, keep doing
this, watch what's about to happen in the blue terminal window at the bottom.
There's the first hashtag. There's the second hashtag. So the sort of fake
animation I did the other day with just my slides, and what I try to do verbally
and with my hand going back and forth, you can now see much more methodically. So
even if it's a simple program, and even if it's code you wrote, you can really see
step by step what it is your program's doing. And maybe it's not doing what you
expect. And if it's not, you'll see it visually. All right. Now I'm just gonna go
ahead and say, OK, print the rest of the thing. So I hit Play. You see that the
GDB, the GNU Debugger, server is exiting. It's just quitting. And now I'm back at
my prompt, and the debugger goes away. So do not undervalue those particular tools.
So before we forge ahead, I thought I'd introduce Abhishek here, who you might have
seen on the internet just a couple months ago. He kind of went viral. He's a recent
grad from NYU. And he did this extraordinary thing. He took a device called the
Microsoft HoloLens, which is an augmented-reality device that puts sort of a
goofy-looking screen in front of your eyes. But then it projects images in front of your
eyes. And it's really cool in that much like an Android phone or an iPhone these
days, it knows where you are in a three-dimensional space. And what Abhishek
actually did was he went to a very three-dimensional space, Central Park in
Manhattan. And he had before that spent days recreating "Super Mario Bros." in
augmented reality by recreating one of those maps to which I alluded earlier. And
the end result-- and I'll show you just a glimpse of it, and we'll put it on the
course's website for you to see later in detail-- was this, which was pretty mind-
blowing and a wonderful application of computer science to the real world,
literally. [VIDEO PLAYBACK] - Hi. I'm Abhishek. And I recreated the iconic first
level of "Super Mario Bros." as a first-person, life-size, augmented-reality game
that I'm now going to play as Mario. [MUSIC, "SUPER MARIO BROS. THEME"] [END
PLAYBACK] DAVID MALAN: Abhishek gave a tech talk in CS50 a couple of months ago.
And the funniest part, if you really look closely-- and it is Manhattan-- is some
people look at him. But a lot of New Yorkers don't even look twice
at what he's doing. Let's go ahead here and take a five-minute break. And when we
come back, we'll begin to look at the world of cryptography. So we are back. And,
of course, there are more functions than just printf. And we've seen a glimpse of
these by way of the CS50 library. And there's many, many, many, many more that come
with C itself and that other people around the world have written over the years.
But implied in each of these CS50 functions, notice, are these key words like
string and int and float-- which we talked about the other day, too-- long, and
long long, and double, as we saw the other day, too. So it turns out that C, to be
clear, has what are called data types. And we glimpsed this the other day. Data
types specify what type of data you can put inside of a variable. And that's what's
different from Scratch, too. In C and a few other languages, too, you have to
decide in advance as the programmer what kind of data are you going to put in this
variable so that the computer-- or, really, the compiler-- knows. And so the
compiler knows how to deal with it for you. Well, it turns out that if you want to
print these things out, printf also comes with certain format codes. And we've seen
%s for strings and %i for integers. And there's a bunch of others, too. Perhaps the
most common would be these, just so that you've seen them-- %f for float. We saw
that the other day. %lld for a long long decimal number. That's one I often have
to look up myself. And then there's even more of those, too. So just realize that
as you're getting input from the users, whether for problem sets or any other
purposes, sometimes you have to check the manual or the documentation,
so to speak, for functions that you're using. And so that you know where to turn
for those kinds of things, let me just introduce one thing real quick. And you'll
see more of this in super sections and sections and beyond. If you forget, for
instance, how certain functions work, you can actually type the following-- "man
get_string," where man stands for manual. And this is kind of an old-school command
on Unix and Linux computers that have this text-based keyboard environment. And
you'll see pretty much a standard, structured user's manual for the function in
which you're interested. So if you forget what we talked about in class or you're
not really sure how else you can use it, and the function is something like
get_string, you can simply read about it here. But sometimes, frankly, it's going
to look a little arcane. I mean, we have not talked about what some of these
symbols mean-- the ..., the word const, the asterisk that I've highlighted on the
screen. So frankly, sometimes you will find the man pages, as they're called-- the
manual pages-- just confusing unto themselves, which is a nasty situation to be in.
If you're already confused, and the documentation's not helping, you kind of need a
third option. And so if you go to CS50's website, you'll actually find that there's
a link to a tool that the staff has created over the years called CS50 Reference.
This is a more user-friendly version of those same man pages, where we've gone
through and sort of translated the very arcane English into less comfortable
English, if you will. So if over here I scroll down to, say, printf-- or, rather,
let me just search for it-- I can see printf here. It's inside of this header file,
this h file on the system. And now I can actually read about it here. And notice at
top right, checked is the Less Comfortable box, which means, hey, show me the
language the TFs came up with as opposed to the default language. But it, too, is
meant to be a training wheel. So if and when you're ready to sort of take away some
of those simplifications, you can uncheck that box and now see the much more
verbose technical version that you would actually see in the real world. So keep in
mind those kinds of things, too, especially if it feels like we go through things
quickly in class, which we do, and you need to lean on something authoritative
thereafter. But let's tease apart what actually a string is. Let me go ahead and
start actually, with Stelios here. So Stelios, one of our head TAs in New Haven,
has this name here. And I've written it as a string, S-T-E-L-I-O-S. But I've kind
of drawn boxes, deliberately, around his name to capture the fact that this thing
we call string, like "Stelios," is actually not really a string only. It's really
like an abstraction for something a little lower-level, which is a character after
a character after a character, and so forth. And so here, too, we see an example of
an abstraction. It's not that much fun to call Stelios S-T-E-L-I-O-S. We call him
Stelios. But we, in languages like C, would call that construct a string or, more
technically, a sequence of characters. But it's a string. It's a nice abstraction.
It's a nice simplification. But it turns out there's an opportunity here now to see
how characters and numbers interrelate in a computer and see how powerful computer
programs and software are that we ourselves can write. But first, how do we access
individual characters in a name? I can easily get Stelios's name using the function
get_string, as we've seen, just like Sam did from the audience the other day. But
how do I actually get at, like, the S or the T or the E? Or if maybe he makes a
typo or maybe he, like, doesn't type it very neatly, how do I capitalize his name
or sort of clean up his user input like websites today very commonly do? Well, let
me go ahead and open up CS50 IDE again and just do a pretty simple example that
this time involves strings. Let me go ahead and create a new file. And I'm going to
call this file string0.c. And I'm going to go ahead now and write a short program--
come on-- once I've lost control over my terminal window. Now I've lost control of
my menu. This is my own fault for-- oh, here we go. Well, this is gonna look great.
Very inspiring here. Where'd it go? Oh, oh. Here. OK. That's an example of bad
design, so we will fix that. And now I see that I've misspelled string as strig. So
we're just gonna-- no one on the internet will ever know the following happened.
OK, so string0-- voila. Here we go. All right. So string0.c, and I'm gonna whip up
a really quick program here as follows. So int main void. And now string s gets
get_string. And I'm just gonna ask for the user's input in this way. And now I'm
going to go ahead and print out-- how about just say the word output here. And just
to be nice and tidy, let me put a couple of spaces here in anticipation. And now
let me go ahead and do this-- on line five, my intention is to get, like, Stelios's
name from him or whoever is playing this game. But now I want to go ahead and not
just print out, like, hello Stelios, and plug in his value s, which we've been
doing. I want to do this character at a time. And doing something one at a time
kind of suggests a loop. And indeed, I can do that. So I'm going to do for int i
gets 0, i is less than however long his name is, and then i plus plus. And now I
can introduce one other trick that you can kind of glimpse ever so quickly from the
screen I had up before. It turns out that %c is the placeholder for a character.
Perhaps no surprise. But the catch is I only have access to s, the whole thing, the
string s. But it turns out there's a new piece of syntax here. And as is kind of
sort of implied by our having used boxes to flank Stelios's letters of his name
there, turns out that the equivalent in C is to kind of sort of do the same, use a
box of characters, by using the square brackets, which you might not often use on
your keyboard. On a US keyboard, they're often just above the Enter key. And here I
can go ahead and type in s[i]. And so to speak, this is going to print the i'th
character, if you will, of Stelios's name. So i is going to start at 0. And I keep
doing plus plus, plus plus, plus plus. And using the square bracket notation, so to
speak, I can dive into the individual letters in his name in this case. So when I
run this, what's going to be the net effect? Let me go ahead and make string0. Huh.
OK, that is not valid C code, however long his name is. So I have a problem to
solve here. How do I actually get the length of his name? Well I can kind of cheat.
OK, so one, two, three, four, five, six, seven. All right. So we can just write
this program as follows-- 7. But this should rub you the wrong way. Why is this not
a good solution to the problem? Yeah? AUDIENCE: Because it's not changeable. DAVID
MALAN: Exactly. It's not changeable. I have this dynamism of get_string to get
Stelios's name. But seven is not going to be true of all the humans who might use
this program. I need something dynamic. Well, it turns out there is a function for
that. I can call strlen, for string length, pass in as input the variable whose
length I want to get, and that will return to me a number, which will be, in this
case, it would seem, 7. But it's going to be dynamic. So if I type in, like, David,
that should return 5, hopefully, and any number for any number of other humans
engaging in this. So let me go ahead now and try again. Make string0. A lot of
errors. And "use of undeclared identifier string." Wait a minute. We've seen this
before. How did I solve this last time? Yeah? What's up above missing? AUDIENCE:
The libraries. DAVID MALAN: Yeah, the libraries or the header files, so to speak,
for the libraries. So I need to include, I'm pretty sure, at least stdio.h for
printf. I need to include cs50.h for get_string. And we're almost there. Let me see
if that's enough. Make string0. Oh, "implicitly declaring library function strlen
with type--" I don't really know what that is. But there's kind of an answer hinted
there-- include the header string.h and so forth. So it turns out this is true. And
there are different ways to know this. If I actually go back to reference.cs50.net and do strlen,
there's that function. Let me go back to the less comfortable-- whoops-- to the
less comfortable version. Notice that under synopsis of a man page or
reference.cs50.net is always a quick summary of how you use it. So just the
prototype of the function that gives you a sense of what it is-- size_t is
essentially an unsigned integer type, just saying the size of something as a number.
But include string.h is the ingredient I wanted. So let me go ahead and copy that.
Let me go back to the IDE. I'm gonna be a little nit-picky, and I'm just gonna keep
things alphabetical at the top. But that's not strictly necessary. It just makes it
easier to skim later on when the list gets long. Make string0. Seems good to
go. ./string0. Inputs. Now I'm going to go ahead and type in Stelios's name. And I
got his output, as well. Now, that was a lot of unnecessary work to print his name.
I could have just used %s. But now I can make modifications. What if I wanted to
print it one per line? I can add that. I can make the program again, rerun it, and
type in his name, and now I get it one per line. It's a little ugly. Like, now it
says output s. But that's just an aesthetic bug. I could go in and fix that. But
now I have control over the individual characters in his actual name. So that would
seem to be progress in some form. But if I now have access to the individual
letters, we can kind of come full circle from the very first lecture where we
talked about zeros and ones, and then numbers, and then letters, and now, in turn,
words, otherwise known as strings, by way of a topic called typecasting. Types, of
course, are the types of variables we've been talking about. Casting means to
convert from one to the other. And you might recall from the first lecture that
capital A was the number we know in decimal as 65 and whatever pattern of zeros and
ones that is. Capital B is 66, and so forth. So can I see that now for the first
time? Well it turns out I can. Let me go back to the IDE. And let me go ahead and
create a new program called ascii0.c. And ASCII, again, is just a standard. It's
an acronym, American Standard Code for Information Interchange, which maps letters
to numbers and numbers to letters. So let me go ahead now and whip up a quick
program. Include stdio.h for printf. Int main void. And then let me go here and do
the following. You know what? I'm gonna just go ahead and print out, let's say,
string s gets get_string. Let's just ask for someone's name. And then let me go
ahead and do the following-- for int i gets 0, i is less than the length of that
string-- learning from last time-- i plus plus. So this is gonna iterate over the
whole string. And now what I want to do is this. Let me go ahead and print out the
following. Let me print out the character itself, and then a space, and then how
about an integer, and then a new line. And we'll see what this does in just a
moment. I want to plug in values for these placeholders. So how do I get at the
first character of the name if the string is called s? Yeah, so s[i] for the i'th
character. And that's gonna plug in, literally, S-T-E-L if Stelios is the one
playing the game. But now I put a comma to plug in a second placeholder. And %i--
you know what I'm gonna do? I'm going to do int in parentheses s[i] semi-colon. So
it looks a little cryptic. But let me just remove this for a moment. This is just
the same thing twice-- print the i'th character of the name, i'th character of the
name. But in parentheses, I'm doing what's called typecasting. I'm taking whatever
that is, which is a char or character. And I'm saying, parenthetically, make this
an int instead. So if it's capital A, it becomes 65. Capital B, it becomes 66, and
so forth. And if I now compile this program after preemptively fixing what would
have been a mistake by adding the header-- make ascii0.c. Whoops, sorry. Oh, common
mistake. "Nothing to be done." I'm pretty sure there's something to be done. I need
to compile it. What did I do wrong? Yeah, don't put the .c. It's a little
counterintuitive, but when you want to make a program, you type the name of the
program, not the name of the file. Now, in-- oh, damn it. I almost learned from my
mistakes. What am I missing now? AUDIENCE: String.h. DAVID MALAN: String.h. All
right. Include string.h. Save it. OK, so let me make ascii0. Good. ./ascii0. And
now, Stelios, Enter. And now we see the ASCII codes or the numbers that correspond
to the letters in his name. They're pretty big numbers. They're in the 100s now.
And that's because they're lowercase. We've previously talked only about capital A,
capital B, and so forth. But it turns out that the lowercase letters also have
values associated with them, like some of those here, as well. And now it turns out
now that I know this, now I can kind of do some low-level stuff that we all take
for granted on our phones and websites like when you just type in your name in all
lowercase, and the website just fixes it, or if you type in your phone number with
parentheses, without parentheses, with dashes, without, the website just kind of
fixes it and cleans it up into some cleaner format. We now have kind of the low-
level control to do this. I won't type this one out manually just because it's a
little longer. Let me go ahead and open it up. And among today's examples in
source2, which is on the course's website, is this example here-- capitalize0. So
let me make a little more room for this. It's a little longer. But let's just focus
on just a couple lines at a time. Here's the beginning of my program main. Here is
a line of code where I call get_string. I just say, give me the before string. And
then I claim, now print the string after making some changes to it. So what am I
gonna do? On line 11, I seem to have used the same ideas a moment ago, but with one
change, actually. I've done something a little different. Line 11 is very similar
to what I've been doing to iterate over the characters in a string. But I did
something different, which is what? What looks different now versus what was on the
screen a little bit ago? Anyone a little farther back? Yeah. AUDIENCE: [INAUDIBLE]
DAVID MALAN: Yeah, it looks like I'm declaring two variables all in the same
breath, so to speak. I have my int i equals zero, and we've done this a bunch of
times. But then I have a comma here for the very first time. n equals strlen of s.
But if you think about what these building blocks are-- OK, the comma is new, but n
is on the left, so that's a variable, apparently. It's probably an int because the
word int came before. Equal sign is assignment from right to left. strlen is a
function that returns the length of a string. So this would seem to be storing,
just to be clear, what number in n? Yeah, the number of characters in whatever the
string is the user typed in, "Stelios" or "David" or whatever the name is. So then
I have my condition. Then there's the semi-colon, which we have seen. And then
there's plus plus. So I claim that this is, in a sense, better design. It's a
little more complicated. Like, I typed out more characters. I added another
variable. But why might it be smart and good design to have used an extra variable,
using a little more space to keep a number around, so as to then simply compare i
against n? Why did I jump through these hoops? What do you think? AUDIENCE: Because
then it doesn't have to check what n is each time it goes through the loop. DAVID
MALAN: Exactly. AUDIENCE: [INAUDIBLE] DAVID MALAN: It doesn't have to check what
the length of the string is on every iteration because after all, once I or Stelios
or whoever types in their name, it's not going to change. It is D-A-V-I-D or
Stelios's name or Maria's name or whoever is playing the game. And so why would you
keep calling a function saying, oh, by the way, what's the length? By the way,
what's the length? What's the length? Just remember it the first time because odds
are it takes a little bit of time to do that computation and actually figure out
the length. And so here, we've simply kept that answer around in n and can compare
two variables. Meanwhile, here's a pretty big if-else construct. But if we break it
down into pieces, it's doing something relatively simple. On line 13, I am asking
the question, if the i'th character of s is greater than or equal to a lowercase A,
&&, which we haven't seen before. This means logically and. So if it's greater than
or equal to a and less than or equal to z-- put another way, if it's between a and
z inclusive in lowercase-- what am I doing on line 15, which is super weird-
looking? I'm first printing out %c, which is my placeholder. But then I'm printing
out the result of s[i] minus whatever lowercase A minus capital A is? I mean, this
is just strange now. But let me just point out one clue. Turns out there's a
pattern here. And humans did this deliberately. If you can do the arithmetic
quickly, how far apart is lowercase A from capital A? It's 32, right? And you could
just do the math. 97 minus 65-- oh, 32. How about capital B versus lowercase B?
It's 32. 32. It follows this pattern. So this is-- and it's sort of proof
by example. We're not even seeing all the way to Z, but trust me. 32 is invariant
across all of the letters of the alphabet. They're always 32 away. And I could
hard-code 32, but that feels a little inelegant. Why don't I instead just
arithmetically say whatever the difference is between lowercase A and capital A,
and that's all I'm saying in parentheses here. Whatever that numeric difference is
in the computer's representation of my numbers, just subtract that difference from
the i'th character. Now, what's nice is I kind of sort of should do this first.
Like, I should cast the character to an int. But I don't need to be so explicit.
The computer knows that characters are integers. And the computer knows that integers
can be characters. There's this equivalence. I don't need to be so verbose as to even
say that. It just suffices to let the computer figure it out implicitly that in
this context, I'm doing arithmetic on numbers, and then, in the context of printf,
I'm displaying that number as a character. Nothing is really happening. You're just telling
the compiler in what context to treat these values, numeric or characters. So
long story short, what effect do these four lines of code have on the
characters in the user's input? What does it do? AUDIENCE: [INAUDIBLE] DAVID MALAN:
It capitalizes it. Right? It capitalizes it. And that might have been implied, too,
by the file name, in full disclosure. But it's how you think about solving the
problem of capitalization. Here's the string. Home in on the individual characters.
Figure out if they're within a range you want to deal with. And if so, do some kind
of mutation of them to change from one value to another. I could have done this a
horrible way. I could have had, like, an if-else-if-else-if-else-- I could have,
like, 26 conditions checking is the character A? Is it B? Is it C? And if so, make
it capital A, capital B, capital C. But that code would have been like this big or
bigger. This is now a more algorithmic way of solving the same problem. And if it's
not a lowercase letter between A and Z, just print it out. There's no work to be
done. Now, just so you've seen it, it doesn't have to even be as verbose as this.
In capitalize1.c, which is available also on the course's website, I've made my
code a little better designed. I'm now not reinventing as many wheels. I'm standing
on the shoulders of smart programmers before me. And I've clearly changed at least
one thing. Instead of doing this manual process of comparing against lowercase A
and lowercase Z, I'm just punting and using a function which, beautifully, is
called islower, which just literally answers that question. Because another data
type in C is not just int and char and float and string in CS50's library, but
there's also something called a Boolean. A Boolean, also named after Boole, is
similar in spirit to a Boolean expression, true or false. But a Boolean variable is
literally just the idea true or false. And so islower you can think of as returning
a Boolean value. It returns true or false, yes or no. And the name of this
function, therefore, is very appropriate. Is lower? That's a yes-no or a true-false
question. I don't know how it's implemented. If I really care, I could go to CS50
Reference, or I could use the man command on the IDE. And I could actually check
how this thing works. But I do need one takeaway. To use this, I need to use the
ctype library. So there are other libraries that we're now just scratching the
surface of. And you would only know they exist by reading documentation like that.
But you know what? I can go even further. You know what? If some human years ago
wrote a function to check if something is lower, what did he or she probably do, as
well, for me? AUDIENCE: Uppercase? DAVID MALAN: Yeah, isupper also does exist.
Yeah. So spoiler here. So isupper exists. But if they checked if it's lower or is
it upper, gonna just go out on a limb. toupper? Yeah. So it turns out there's a
function called toupper that converts a letter to uppercase. And indeed, I can now
leverage this in my third version of this program as follows. capitalize2.c gets
even better designed still, if you will. It's even shorter, fewer lines of code,
easier to read, fewer opportunities for bugs. How do I solve it now? I still
iterate over each of the characters, but I just blindly call toupper, toupper,
toupper on every character because I read the documentation. If you pass a
character to toupper that is already uppercase, it just returns it as is. Doesn't
change it. If you pass in a punctuation symbol, it just passes it through. But if
you pass in a lowercase letter, it capitalizes it for you. And so I can now kind of
implement-- I can lean on whoever implemented that before me. It could have been
me. I could have written my own function called toupper. But I don't need to because
in the world of programming, there exists libraries of code that other people have
written for us that we can leverage. Any questions, then, on that? Yeah. AUDIENCE:
So this method, you wouldn't be able to [INAUDIBLE]. DAVID MALAN: This would be
all of them, yeah. So if I only wanted to capitalize Stelios's first-- the first
letter of his name, I probably wouldn't want the loop. I would probably just want
to capitalize s[0], specifically, the first of the letters. But I'd want to make sure that his
name is at least one character long, lest he just have hit, like, Enter
accidentally or maliciously. Absolutely. So let's just dive in to one other detail
here as follows. Suppose that I want to actually know what the length of a string
is. I know that there exists this function called strlen. But it turns out I can
figure out lengths of strings for myself, too. Let me go ahead and write a program
called strlen itself. But I'm not allowed in this example to use string length. I'm
going to go ahead and include the CS50 library. Let me include stdio.h. Let me go
ahead and do int main void. And now let me go ahead here and do string-- bad
style-- string s gets get_string. Name. And now let me go ahead and do int n equals
0. Just give me a variable, call it n, set it equal to zero. And then let me go
ahead and while I'm not at the end of the string-- also not valid code-- n plus
plus. I can use that plus plus trick that we've seen before for i plus plus. And
then I'm going to go ahead and print out whatever the value of that counter is
because I want in my loop to just count the number of characters in Stelios's name
or whoever's name actually ran the program. And just to be clear, this is what's
called syntactic sugar, which is a very sexy way of just saying this is shorthand
notation for doing this, which is just more boring-looking. This does the exact
same thing. It's just a more succinct way of doing it. And you'll see little
features of languages like this just to save us humans keystrokes. This, of course,
is not a solution to a problem. How do I know I'm at the end of the string? Well,
it turns out we need to break the abstraction layer, so to speak, of strings just a
little bit. So it turns out that in your computer, we have this piece of hardware--
RAM. And we saw this the other day. And we talked a little bit about the
limitations of computers and the finite amount of memory that they have. And if you
think about all of the chips on this device-- doesn't matter for today how this
works. But just know that there's lots and lots of bytes that can be stored in your
computer's memory. And you might have 1 billion bytes, 1 gigabyte, 2 billion bytes,
2 gigabytes. But for our purposes today, just think of this RAM inside of your
computer as just a long list of available bytes-- lots of bits, zeroes and ones,
that you can change the values of. And maybe it's kind of a grid, so there's lots
of bytes horizontally, lots of bytes vertically. We can kind of number them all so
that one of the bytes is 0, and the other one way at the bottom is, like, the 2
billionth byte. So just assume that we can number all of the bytes in our
computer's memory. Well, it turns out that when you type in Stelios's name, it of
course ends with an S. But it would probably be a stupid decision to just look for
an S when figuring out the length of someone's name because it's not gonna work on
my name. It's not gonna work on Maria's name or any number of other people in the
room. So we don't know enough yet about what's going on inside of the computer's
memory. It turns out that if you think of this grid now as your computer's RAM,
maybe top-left corner is byte zero. The one next to it is byte one, then byte two,
then dot, dot, dot, byte two billion. So I'm just arbitrarily depicting it as a two-dimensional
grid. It turns out we need to know that there's this special character.
What C does for us, even without our telling it to, is always put a secret
number at the end of any string the human types in. It's specifically represented
as backslash 0. But that's just the special way, like backslash n is special, of
saying that is eight zero bits all together. It's a special value, 0. And so now
that we have this so-called sentinel value, if you will-- sentinel value means this
is just special. The human can't really type this. Like, I can't actually type all
zero bits easily on my keyboard because honestly, even if you hit the number zero,
that is technically the character 0 because it turns out even numbers on your
keyboard map to different integers. But more on that another time. So 00000000 as
bits are what that is. And so if I write a program that calls get_string multiple
times, and Stelios is the first one to type in his name, it might end up in memory
looking like this. But then suppose one other person types in their name, like
Maria. Her name is just going to fit in the next available memory, but also be null
terminated, so to speak. The sentinel value is also called null, N-U-L. But that's
just all zeros. And then if someone else types in his or her name, it's still going
to fit in there. So Zamyla, for instance-- it wraps around, but again, this is an
arbitrary artist's rendition of my computer's memory. Z-A-M-Y-L-A, backslash 0. And
I can keep typing in names until I'm just out of memory. At that point, the
program's going to crash, or I'm gonna have an if condition that says too many
things in memory. Something's gonna have to stop at that point. So what this means
for my implementation, ultimately, is the following. I can now go ahead here and
change this silly English to the following: while the n'th character of the string
does not equal, quote, unquote, backslash 0. And I'm
using single quotes this time because recall from last time that we use single
quotes anytime we talk about single characters. We use double quotes any time we're
talking about strings. And even though s is a string, s[n] is the n'th or i'th--
doesn't matter what letter we use-- the n'th character. So that's a character. And
so we now need to use single quotes. So this is really just doing the following--
it's initializing n to 0. And then it's looking in memory. And it's saying, is this
backslash 0? If not, increment n by 1. Is this backslash 0? No. No. No. No. Damn
it. No. No. Yes. And at that point, I have 7 fingers up, or n is storing the value
7. That's what my program is going to print out. So now we have a complete program
that counts the number of characters in a string. I don't need this program because
strlen exists as a function. But it's now a capability to which I have access. Any
questions, then, on what a string really is underneath the hood as this sequence of
characters with a special null character at the end? Yeah. AUDIENCE: [INAUDIBLE]
DAVID MALAN: Ah, good question. What about other data types, if I can rephrase it,
like ints and floats and so forth? Actually, strings are special. If I scroll back
to the list of data types that C has, for instance, most of these are of fixed
length. And this is why the compiler needs to know what you're putting in them
because the compiler and the computer in turn need to know is it one byte? Is it
two bytes? Is it four? Is it eight? How many bytes should I look at? Strings have
no predetermined length because, of course, we don't know who's going to type in
their name. But an int, it turns out, on most systems is going to be 32 bits or
maybe 64 bits, or, equivalently, four bytes or eight bytes, because there are eight bits
per byte. A bool is often one byte. It's a little wasteful. Even though
technically you need one bit, it's just easier to deal with eight-bit increments.
Chars are, by definition, eight bits or one byte. So almost all of the data types
are a fixed length. So you don't need to have a special null character. But strings
you do. Strings are special. Other questions? All right. So what can we start to do
with this? Well, it turns out that this idea of thinking about things that are
back-to-back-to-back-to-back as being individually accessible is actually a very
powerful idea. Because up until now, we've just had this list of data types-- bool
and float and char and int. It's kind of a short list of very primitive things. But
it turns out if you want to write a program that doesn't just keep asking for one
name but asks for two people's names or 10 people's names or asks for, as you asked
earlier, the name or maybe their house or their dorm or their phone number or their
email address-- a whole bunch of different values-- it would be nice to kind of
store multiple things together. And one way you can store multiple strings is you
could call one string s. You can call the next string t. You could call the next
string whatever-- you could just come up with arbitrary names for your strings. But
that's going to very quickly devolve. Imagine, like, what the registrar uses here
or at Yale to actually keep track of students. They don't have a computer program
with thousands of variables inside of it. They probably have a computer program for
dealing with course registrations with at least one variable called students. And
inside of that students variable can the registrar fit one student, 10 students,
thousands of students. It can kind of grow to fill the number of values we actually
care about. And C isn't quite as powerful as that. We'll need another language like
Python or JavaScript to really get dynamism. But for now, we do have the ability in
C to represent multiple things back to back to back to back to back in memory. So
not just characters in strings. We can borrow that idea from strings and store, if
we really want, student, student, student, student like multiple strings back to
back instead of just individual characters. And what that idea is called is an
array. An array is a contiguous chunk of memory, something back to back to back--
literally physically next to each other, typically, in the RAM that we've presented
as hardware. But it's not just character, character, character. Maybe it's int, int,
int, int, int or string, string, string, string, string or, more generally,
student, student, student, student, student-- multiple things back to back to back.
And so now we can actually give you a glimpse of what this thing here is that we
keep typing sort of on faith. Int main void literally says that your main programs
that you're about to start writing for pset one and beyond will be returning an
int, even if you don't do it yourself. They're going to return by default 0, it
turns out. And we'll see before long why this is useful for a main function to
return a value, even though we humans will rarely, if ever, see that value. But it
is interesting to note that main can take input, and not input in the sense of
get_int and get_string and so forth. You can actually provide your program with
input at the so-called command line. All this time, I've been typing ./mario0,
./mario1, and no words after that. And yet we've shown you clang already, the
compiler itself, which can take in, like, -o and then -lcs50, all of these
additional key words that somehow influence its behavior. So wouldn't it be nice if
I could write a program where I don't prompt the user eventually for his or her
name. Let me just let them type their name at the command line and hit Enter once
and be done with it, just like clang is just one long command, and you're done with
it. There's no prompts. Well, we can do this if we change void to this. And it's a
mouthful, but there is an alternative version of main that does not just take zero
arguments. That's what the key word void all this time has meant. It just means
main takes no input by default. You have to prompt the user explicitly with get_int
or get_string or whatever. But there's an alternative second version of main in C
that takes two inputs. And you don't have to provide them explicitly. We'll see how
to use this in a second. Main can also be handed two inputs. One is an int, and one
is an array of strings. The int is the total number of words that the human has
typed at their keyboard. The second, by convention called argv, for argument vector, though we could
call it anything we want, is an array of words that the user typed at the
prompt before hitting Enter. And so this is useful in the following way. I'm going
to go ahead and in today's source code open up an example called argv-- for
argument vector-- 0 as follows. In argv0, there's not all that much going on. And
if you at least kind of take on faith the concept here, you can perhaps infer
what's going on. So I've changed what main looks like on line six, the signature of
main, so to speak. And then I'm asking a question. If argc equals equals 2, then
print out "Hello, something." Otherwise, just print out the hardcoded "hello,
world." So it looks like argv[1] is kind of being treated like we were treating
strings a moment ago. But this is the special syntax that's new. If you use square
brackets like this, like I've done, with no numbers inside, that's like telling the
computer, hey, computer, this variable argv is going to be an array of some length
of strings. Why strings? Because string is the word immediately to the left--
string argv[]. Now, I don't know how the strings are gonna get in there. The
computer's gonna do that for me. But it gives me this capability. Let me go ahead
and compile this program as follows-- make argv0. ./argv0. Hello, world.
Uninteresting. But if I now type in my name at the prompt and hit Enter, now it's
dynamic. So what must this mean? Even if the syntax is a little new, we can kind of
infer now what this must be doing. Argc happens to stand for argument count. So
argc equaling two apparently implies that the human typed two words at the prompt--
the name of the program, and then whatever else he or she typed. Meanwhile, argv--
argument vector-- is the variable that you can use to go get the first word or the
second word or, if there are more, the third and fourth words. In fact, if I kind
of change this manually, what should probably be, by that logic, in argv[0]?
AUDIENCE: [INAUDIBLE] DAVID MALAN: Yeah, the name of the program, right? So let me
see. So make argv0. ./argv0 David. Hello-- OK. I mean, it's stupid-looking, but
that's all I'm doing. I could be a little bold and say what is in the 100th
location of this array or list, as you can also think of it? Make argv0. ./argv0 David.
Whoa. That is bad. And get used to this because it will start to happen with
greater frequency. Segmentation fault is a very cryptic way of saying you touched
memory, RAM, that you should not have. And you can kind of think of what this
means. So if argv[0]-- let me pull up my picture of an array. If my array looks
like this, and argv[0] is here, and that was safe to print, and argv[1] is here.
That was safe to print. It was my name. And argv-- what did I do-- 100, it's like
way over here. I don't know what's over here. And indeed, touching that memory was
very bad. The program crashed. And segmentation fault is an allusion to how
computers lay out memory. You've got like a segment of memory here, a segment of
memory here, a segment of memory here. Segmentation fault means you touched a chunk
of memory that was not yours to use, to change or to even view. So I got lucky,
though-- well, I didn't get lucky. Sometimes, instead of crashing, I might just see
garbage values. Let me be
a little more conservative. Let me put [2], which is just one past what I typed in.
In general, that's undefined behavior: I don't know what I'm gonna get. Here, it
printed null. So there might be funky characters there, or zeros there. But now
you're playing with fire, so to speak. These are logical bugs in my program. But if
argc is two, then it's OK to look at [0] and [1], two things and only two things.
Any questions on that? All right. So where, in what
domain, is this kind of thing helpful? And there's a couple more examples of argv
that you can look at online. Turns out that in the world of cryptography, this
stuff really starts to get interesting. So the world of cryptography is all about
scrambling information. Maybe back in the day in grade school, you might have
passed notes to a friend or a crush that you had in the classroom. And if you were
really clever, or your teacher was really adversarial, you might have to encode
your message so that you're not just writing, like, "I love you" or whatever. But
you instead change all the A's to B's, and all the B's to C's or hopefully
something a little more cryptic than that so that the teacher can't just change all
the B's to A's and all the C's to B's. But you kind of scramble the words, perhaps in
such a way that it's reversible by the recipient of your encrypted message. So to
encrypt information means to convert
it into some other format, from what's called plaintext to ciphertext, which sounds
really cool, and it's just the scrambled version. But it's not random. It's got to
follow a pattern or, if you will, an algorithm so that he or she on the other end
can reverse the algorithm and undo it. Now, in the simple example I proposed, A
becomes B. B becomes C. What is the secret that you and your crush know? It's
probably just the number one. He or she has to just know, if you added 1 to the
letters, that they should subtract 1 from the letters. And hopefully they know that
if you hit Z, you should probably wrap around to A and not get into a weird
punctuation or something like that. So you can keep an algorithm as simple as that.
So we can think of cryptography, really, as just an example of problem-solving. You
want to send a message from someone, yourself, to someone else, maybe over a very
insecure medium like passing a note through the room. And you want only one person
to know how to access it. That's like providing inputs and wanting outputs, your
plaintext in and your ciphertext out, so that no one can understand it except you and the
recipient. So it turns out that cryptography-- there's different forms of it, but
perhaps the simplest looks like this. There's two inputs, the plaintext, the
message you want to actually send, and then the key, which might be a number like 1
or 2 or 25 or 26. And more than that is probably silly, because you're just wrapping
around the alphabet again, so to speak: a key of 27 scrambles exactly like a key of
1. But the output is going to be something
called ciphertext. And when your crush receives this message, he or she really just
needs to reverse the process. They have to know the key. Otherwise, they're going
to be guessing all day long what your message actually was. But so long as you know
the secret in advance, you can do this. Now, of course, there's a gotcha. You have
to be on speaking terms with this person you're crushing on because he or she needs
to know what the key is in advance. Otherwise, you're just sending them nonsensical
values. So that's kind of a catch-22, too. In order to send a secret message from
A to B, A and B need to be able to confer in advance and agree on this secret. But
if you need to agree in advance on a secret, why don't you just use that time to
send the message directly to the person? Right? So there's this disconnect. And
we'll come back to this before long because most of us probably don't know someone
who works at, like, amazon.com. And yet when I buy something on Amazon, I've been
told all these years that it's secure. It's encrypted. My credit card, my name, and
all of that are somehow encrypted between me and Amazon in Seattle or wherever
their servers are. But I don't know anyone there. And yet somehow, cryptography
still works. So this type of cryptography is just one kind, called secret-key
cryptography. But there's public-key cryptography and yet other things. And so what
you'll find in problem set two in particular is you'll have an opportunity to
explore this world, whereby you'll write software that encrypts and then,
hopefully, decrypts information and even, if you're among those more comfortable,
an opportunity to try writing software that takes passwords that are encrypted--
or, more properly, hashed, so to speak. More on that before long-- and you try to
crack those passwords, actually figure out what the passwords were. And it
all boils down to, ultimately, in the context of C, taking as input a message, like
a plaintext, and somehow converting it to ciphertext by manipulating those
individual characters, or, if you're the recipient, vice versa. And I like to show
a clip from, frankly, a film you can watch, like, literally every hour on the hour
around the holidays, "A Christmas Story," because it has an example of a very
simple form of cryptography. If you ever saw this movie, this is little Ralphie.
And he's really excited because over months or whatever, he saves up and sends in,
like, all of these, like, cereal box covers or something like that, and gets back,
finally, this secret decoder ring. And the secret decoder ring is kind of a nice
mental model to have for the type of cryptography I'm proposing here, this sort of
rotational idea-- A becomes B. B becomes C. Because if you imagine a ring that has
another ring on the outside, you can kind of line up the A's and Z's, so to speak,
differently. And that's what he was saving up for. So I thought we'd take just a
moment to look at this clip to inspire one of the problems ahead. [VIDEO PLAYBACK]
- Be it known to all and sundry that Ralph Parker is hereby appointed a member of
the Little Orphan Annie secret circle and is entitled to all the honors and
benefits accruing thereto. - Signed, Little Orphan Annie! Countersigned, Pierre
Andre! In ink! Honors and benefits already, at the age of nine. - Let's go
overboard! - Come on. Let's get on with it. I don't need all that jazz about
smugglers and pirates. - Listen tomorrow night for the concluding adventure of the
black pirate ship. Now it's time for Annie's secret message for you members of the
secret circle. Remember, kids, only members of Annie's secret circle can decode
Annie's secret message. Remember, Annie is depending on you. Set your pins to B2.
Here is the message. 12, 11-- - I am in. My first secret meeting. - --14, 11, 18,
16-- - Oh, Pierre was in great voice tonight. I could tell that tonight's message
was really important. - --3, 25. That's a message from Annie herself. Remember,
don't tell anyone. - 90 seconds later I'm in the only room in the house where a boy
of nine could sit in privacy and decode. Aha! B! I went to the next. E. The first
word is "be!" S. It was coming easier now. U. - Aw, come on, Ralphie! I got to go!
- I'll be right down, Ma! Gee, whiz. - T. O! "Be sure to"-- be sure to what? What
was Little Orphan Annie trying to say? "Be sure to" what? - Ralphie, Randy has got
to go. Will you please come out? - All right, Ma! I'll be right out! - I was
getting closer now. The tension was terrible. What was it? The fate of the planet
may hang in the balance. [KNOCKING] - Ralph, Randy's got to go! - I'll be right
out, for crying out loud! - Gee, almost there! My fingers flew. My mind
was a steel trap. Every pore vibrated. It was almost clear! Yes! Yes! Yes! Yes! -
"Be sure to drink your Ovaltine." Ovaltine? A crummy commercial? Son of a bitch!
[END PLAYBACK] DAVID MALAN: That's it for CS50. We'll see you next time. [APPLAUSE]
