This lecture discusses C programming and debugging code. The lecturer begins by reviewing the tools used last lecture, including the CS50 IDE for writing code and the clang compiler. He explains that source code is compiled into machine code that computers can understand. When errors occur during compilation, it is important to carefully examine the first error message. The lecturer demonstrates adding string handling and library functions to a simple "Hello, World" program in C. This requires including the CS50 header file and linking against the CS50 library during compilation. The help50 command can assist with understanding error messages.
[MUSIC PLAYING] DAVID MALAN: All right.
So this is CS50, and this is lecture two,
our continuation of C. And for the next several weeks, we're gonna keep using C. But we're gonna focus less on the language and the syntax, which we'll get experience with over time by way of the problem sets, and more and more on the ideas and more and more on the problems that we can solve. But before we forge ahead with anything new, let's take a quick look back at where we left off and what we'll sort of assume for today's comfort level, and ask any and all questions along the way. So in order to program last time, we needed a tool. And that tool was this thing here, CS50 IDE. If you haven't dived in already, you probably will this weekend for problem set one. And this will be a web-based programming environment that's got all the requisite tools you need in order to write code, compile code, and then, starting today, debug code or find mistakes in code. But this is requisite because it's not sufficient to just write this. What do we call this, generally speaking? Yeah, so this is source code. So this is code. When someone says, I write code, they write stuff like this. And this is, particularly, a language called C. But, of course, computers don't speak C. And they don't speak Java, and they don't speak Python or C++ or any of the languages with which you're familiar. They only understand what at the end of the day? Yeah, so binary. So binary is, of course, zeros and ones, otherwise known as machine code in this context, insofar as it is code. It's instructions that implement some problem-solving techniques. But it's just zeros and ones that computers understand. So we needed a tool to get from A to B. And that was called what? Yeah, a compiler. So a compiler, of course, does this for us. Source code is the input. Compiler is the program or really the algorithm, albeit in the form of a piece of software. And then the output is machine code, zeros and ones. And for our purposes, we're not going to worry about how we get from step A to B per se.
We'll want to use the tool. But this is another area unto itself in computer science. If you want to understand how compilers work and how humans got from literally zeros and ones to something called assembly code to higher-level languages, you'll see a glimpse of that in CS50. But it unto itself is a whole field that might prove ultimately of interest. But this, more mechanically, is how we compiled code. Clang is a compiler. It stands for C language. And it's just software that some humans wrote some years ago. And there are alternatives. If you've ever used Visual Studio in the Windows world or GCC in the Linux and Unix world, there's bunches of other compilers. We just happened to use clang since it's pretty popular. And then that second command is even stranger looking. But it represents the act of doing what? ./a.out. Yes, over here? Yeah, running the program. Exactly. So ./a.out is a cryptic way of running a program. But it's like the textual equivalent of double-clicking an icon. And a.out is just like the default name you get, assembler output, when you compile a program without specifying its name. But we were able to specify a name. If you introduce a technique called command-line arguments, you can be a little more precise. clang -o for output, then any word you want. In this case, I went with "hello." And then the name of the program or the file that you want to compile. But this, of course, gets pretty tedious. And, in fact, there's a missing step. Sometimes when you want to write a program, it suffices to compile it exactly as that. But let me go ahead and do this. Let me go ahead into CS50 IDE, and let me go ahead and briefly do the following. File, New. And let me go ahead and save this as "hello.c." And just from memory, I'll quickly recreate that same program. int main void, and then we had printf, hello, world. And then just for good measure, backslash n, which means move the cursor to the next line. And now I'm gonna go ahead and save that.
If I go ahead now and run clang hello.c, looks good. And ./a.out. That looks good, too. But recall that we introduced some other functions the other day, as well, like get_string and get_int. And we'll see bunches more before long. And if I do that, notice that I have to do a couple of things. So if I want to do, like, string name gets get_string, and then, quote, unquote "name" to prompt the user, recall that the left-hand side says, hey, computer, give me a variable that's gonna store a string. Call it name. Could have called it anything. I could have called it s. But why might it be arguably better to call my variable name instead of string, or s, rather? Yeah? AUDIENCE: It's clearer. DAVID MALAN: It's just clearer, right? It might be a very marginal, nit-picky detail. But it's just clearer. And when our programs get bigger, it's just nicer to be able to read words and understand implicitly what they mean without having to think through what x or y or s or whatever actually is. On the right-hand side, meanwhile, we had this function get_string, whose purpose in life is to go get a string from the user, from his or her keyboard, prompting them with a word like "name," and then return it, just as Sam handed me back a slip of paper with a name on it. But there's a catch. It's now no longer sufficient to just adapt my code for this new approach. And I had, again, to change that second line. I had to give a placeholder, which is %s. And if that's a little cryptic looking, just kind of think of it like a Mad Lib, if you're familiar with those, where there's just, like, a fill-in-the-blank here. And all %s is doing is saying, put a word there. What word? Well, whatever comes after the comma, whatever variable or value is there. So that's good. But a couple of things can go wrong. And let me point those out so that you don't perhaps trip over it yourself on your own. Let me go ahead and do "clang hello.c" now that I've made these changes and saved them.
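The %s placeholder described above can be sketched with standard C alone, so the example does not depend on the CS50 library. The helper name format_greeting and its buffer parameters are assumptions made for illustration; the program in the lecture passes the string returned by get_string straight to printf.

```c
#include <stdio.h>

// Hypothetical helper (not from the lecture): fills buf with "hello, <name>\n".
// snprintf substitutes name for the %s placeholder, the same way printf would,
// and never writes more than size bytes into buf.
int format_greeting(char *buf, size_t size, const char *name)
{
    return snprintf(buf, size, "hello, %s\n", name);
}
```

Whatever value follows the comma fills the blank, so passing "David" as name yields the text "hello, David" followed by a newline.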
Bunches of errors all of a sudden, even though I've not changed all that much code. Again, rule of thumb from last time should be even if there's lots of error messages, always look at the first one first because the rest might just be kind of a resulting cascade of errors, only one of which is important, which is the first. Now, it says "use of undeclared identifier string. Did you mean stdin," which is something else altogether. I didn't. I meant string. And it turns out that string is a feature of the so-called CS50 library. So this is one of these training wheels we're just gonna use for a few weeks until we dive in underneath the hood of strings, too. But in order to use anything from CS50, what did I need to add to my code, too? AUDIENCE: Source code. DAVID MALAN: Source code, yes. But what else? AUDIENCE: The library. DAVID MALAN: The library. And the library was the CS50 library. And that means there's a file somewhere on the computer, in the IDE, called cs50.h, a so-called header file-- more on those in a bit. And so I did forget that detail. But suppose you don't recall that yourselves. Well, you might recall from one of our orientation sessions that CS50 has tools with which to help for this, too. You don't have to turn to this course's online discussion forum. You don't have to go to office hours, necessarily, for error messages like this. If you can't quite wrap your mind around what's happening, go ahead and do this instead. Instead of just running clang hello.c, do something like help50 clang hello.c, where help50 is a CS50 specific command, sort of a virtual teaching fellow, if you will. And if we recognize your error messages in yellow at the bottom, we'll highlight the first one. And then we'll try to give you advice like you might get in person. So "by undeclared identifier, clang means that you've used a name, string, on line 5 of hello.c, which hasn't been defined. Did you forget to include cs50.h, in which string is defined, atop your file?" 
So we'll generally try to prompt you with rhetorical questions that hopefully are correct in leading you toward the right solution. So OK, that jogged my memory, at least, even if I'm not yet 100% comfortable with what these lines really are doing. And we'll tease that apart more today. But it feels like I'm trying to do this. But it turns out there's one other gotcha here. clang hello.c, Enter. Dang it. Well, actually, this is a net positive, right? Fewer error messages, it would seem. But now one we did not see the other day-- "undefined reference to get_string." So it's kind of similar in spirit. Something is not understood. But it's different wording, certainly. And I'm not quite sure what that means. But it turns out that it's not sufficient just when using most libraries to use include cs50.h or other header files, as we'll eventually see. That just teaches the compiler that something exists. It was like briefly last time, when we talked about prototypes, where I put a little one-liner that just said, by the way, this function's eventually gonna exist. That's what the header file's doing. It's like a promise to clang, this shall exist. But there's a second step. You actually have to feed clang the zeros and ones that implement the CS50 library that you, of course, did not create yourself, but we did pre-install in the IDE. And there's a separate way of doing that. Rather than just do clang hello.c, you have to do what's called linking your code against that library, at least if it's a third-party library that doesn't come with the computer. It came from humans like us. So this now says, hey, clang, compile hello.c, but link it against the CS50 library, which means take my zeros and ones, take CS50's zeros and ones, combine them, and then give me my actual program to run. And so that's going to be a key ingredient. And help50 could guide you toward that solution. But if I hit Enter now, now it seems to compile. ./a.out. My name shall be David. Enter. 
And it says, "hello, David." So again, don't get hung up, ultimately, in office hours and problem sets on these kinds of errors. You're gonna hit these bumps from the get-go. But just realize-- look for sort of familiar words. Use things like help50. Reach out to the course online. And just get over those hurdles because at the end of the day, the interesting stuff's gonna be the logic of the programs and the actual problems we're trying to solve. So what is it that's actually, then, going on here underneath the hood? And frankly, this is very quickly becoming tedious. So how do I automate some of these processes? Because it's very easy to forget this, and it's just very boring to continue to type so many commands. Well, recall that there was a shortcut we talked about the other day, which just kind of hides all of these details. You don't need to remember -o. You now don't need to remember -lcs50. What do you do instead to make a program? Yeah? Yeah, so make, it's not a compiler itself. It's just kind of a helper program that knows how to run a compiler for you. So frankly, the simpler approach is just to do that-- make hello. And the reason it outputs so many more words is just because we have, in anticipation of teaching the semester, sort of preconfigured it with command-line arguments, additional words, that we expect you're gonna need at some point. And this just saves you the trouble of having to futz around with the manual to figure that kind of thing out. But that's why it looks cryptic. But notice this is really the most important word-- hello followed by hello.c. And those are your two ingredients. All right. So what's going on, then, underneath the hood there? Well, it turns out that even though we can simplify the command structure, it's actually doing quite a bit for us. And this is the process of compiling. But that was kind of an oversimplification, or, put more intelligently, like an abstraction.
There's actually quite a few steps that go on underneath the hood, one of which is called preprocessing and compiling and assembling and linking. So let's do a quick dive down here. But then we'll abstract away, just so that you've seen what's going on. But henceforth, we can just take for granted that all of this is happening. So here is source code, same program as the simplest version we had a moment ago. And ultimately, I need to get this to machine code. Well, let's see if we can't just visualize how we get from point A to B without completely abstracting it away with just those big arrows. So this is my source code. And it turns out that the very first step of turning source code into machine code in the world of C is you first run what's called a preprocessor. You don't do this explicitly, although you could if you were really low-level and interested in it. But what the preprocessor does, essentially, is anytime there's a line of code that starts with a pound sign, or a hashtag these days, that's a special command that gets, essentially, replaced with the contents of the file, at least in this case. So somewhere on the IDE is a file called, literally, "stdio.h." And so #include means go get that file and essentially copy and paste it here. And so when you preprocess your code, this yellow line here becomes something like this. And I'm doing "..."; it's dozens if not hundreds of lines long. But there's one juicy line in it which is the little clue to clang that printf shall exist. And that's why you need stdio.h. So that's essentially, for our purposes today, all the preprocessor does. It does these kind of find and replace style operations so that now your file, without you knowing it, suddenly became much bigger because it's got other lines of code that someone else wrote. And then your code remains right there as it was.
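To make the copy-and-paste idea concrete, here is a sketch in which the one "juicy line" is written by hand instead of being pulled in by #include. The declaration is a simplified form of the one stdio.h provides for printf, and the function name greet is a hypothetical stand-in for the body of main in hello.c. In practice you would keep the #include; running clang with its -E flag prints the preprocessed file if you want to see the real thing.

```c
// Written by hand in place of #include <stdio.h>: a promise to the
// compiler that printf exists. Its zeros and ones are supplied later,
// when the program is linked against the standard library.
int printf(const char *format, ...);

// Hypothetical stand-in for the body of main in hello.c.
// Returns the number of characters printf reports having written.
int greet(void)
{
    return printf("hello, world\n");
}
```

The declaration only tells the compiler what printf looks like. The code that actually prints is still someone else's, which is why a linking step is needed at the end.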
But the next step after preprocessing is something called compiling itself, which technically, the compiler, if we really want to be nit-picky and look at its formal definition, is actually taking these yellow lines, your source code and someone else's, perhaps, and converting it into something called assembly code. And this is a language that humans kind of sort of still do, but back in the day really did program in. And in fact, if you have a computer with an Intel CPU, a brain made by Intel inside of your computer, there was and still is a big user's manual that tells programmers around the world that this Intel CPU understands the following instructions-- add, subtract, multiply, divide, all the basics, and then things like move numbers from here to here, read numbers from here to here, just move stuff around in the computer's memory. And so even though this really looks cryptic even to me, since I am by no means an expert at assembly language, certainly, all these years later, you can see words that kind of sound familiar, like "mov" suggests moving a value from one location to another. "sub" alludes to subtraction, so subtracting one number from another. And without really thinking this through carefully, I'm not really sure what's going on yet. But I do see a familiar word down there called "printf." And so long story short, what the computer or compiler specifically has done is it's taken my more user-friendly C code, converted it to something that's a little closer to what the machine understands. But it's not there yet because the machine only understands zeros and ones. So there's another step called assembling. And the assembling process simply takes assembly code and converts it to zeros and ones. Now we're down to the zeros and ones. And what's amazing-- if it's interesting in the first place-- is when you run clang and hit Enter, all of this is just happening instantly. And you're getting these zeros and ones, this output. 
But I've left the room on the other side because all we've done is convert my code from source code to assembly code to machine code. What needs to now be merged in, so to speak, for that "Hello World" program? Yeah, so still need, like, stdio, the standard I/O library that has printf. So the next step is to take a whole bunch of zeros and ones from somewhere else on the system, combine them until this is the file containing a.out or hello, whatever you called your program. And that, ultimately, is what the computer understands. So that is a very low-level detail. Thankfully, we learned in the very first lecture this notion of abstraction, which means even as you dive in underneath the hood and sort of understand how we're building up, now, henceforth-- literally every minute hereafter-- that whole mouthful just becomes compiling. And indeed, that's what most people in the programming business refer to as compiling, is all of those several steps. But that's all that's happening. Feels like magic, but it's just one step after another gets us closer to our goal. Questions? It's about as low-level as we'll get. Yeah? AUDIENCE: Why do you have to go through the assembly code and then the machine code? Why not just go straight to machine code? DAVID MALAN: Good question. Why do you have to go from one step to another, like from source code to assembly code to machine code? You absolutely could. It just happens to be the case that there's lots of humans in the world and lots of people working on different projects. And this notion of layering your software on top of someone else's on top of others' allows us to build more complex systems much more cleanly, if you will. And there's different types of computers in the world. There's Macs. There's PCs, which even though these days, they're a lot more similar underneath the hood, literally, than they used to be back in the day, there's different CPUs. There's phones that have very different CPUs. 
And wouldn't it be nice if I could write my programs in one language and compile them into zeros and ones that do work on a Mac and on a PC and on an Android phone and an iPhone and so forth? And that's why by having these sort of different layers, one set of humans or one person can implement the process of converting C to assembly code. Then someone else can take it to the zeros and ones, in some sense. Or even-- there's even intermediate steps. Compilers have front ends and back ends and all of this complexity. But it gives us advantages because it means we can sort of decide which types of hardware to support more easily. Really good question. Other questions? OK. So with that said, let's now consider any number of ways in which things can go wrong. It's easy for me, certainly, to write "hello, world," and everything just kind of works. And even when it doesn't, I quickly know how to fix it. And it's only from experience and practice. But let me just give you a teaser not just of help50 but two other tools that you'll see, particularly for the problem sets, that will not necessarily teach you how to write good code-- good, efficient code. That's where the humans come in and the teaching fellows feedback and sections and office hours and more-- but at least to write correct code that meets our specifications and that's well-styled, at least looks good. But the third ingredient, recall, besides correctness and style is gonna be design, which is something we'll learn after practice and examples. So with check50, this is a tool that comes in CS50 IDE, recall, if unfamiliar, that allows me to essentially do this. Let me whittle this back down to my simplest hello, world program like this. I no longer need the CS50 library. I can run make hello. Seems to work. And how do you go about testing your programs if you've written this for a problem set? Well, the easiest and most straightforward way, of course, is just run it. Looks like it's correct. And it is. 
And there's not too much that can go wrong in this program. But soon, you'll see, with problem set one and beyond, anytime you start getting input from the user where he or she has to type their name or a number or other things, you can absolutely concoct scenarios where something goes wrong. But if you run a command in this case like check50, we can do the following. Let me go ahead and first make a directory called-- let me go ahead and do this and do mkdir-- for make directory-- hello. And then we didn't see this the other day. And you'll see more of this in today's super sections, or classwide sections, which will also be filmed. I'm just gonna move this file into a directory called hello. So that's like on a Mac or PC just dragging and dropping it. But I'm doing it with my keyboard. What's the command to change into another directory? cd. So that's like double-clicking on a directory, albeit with my keystrokes only. And now I'm gonna go ahead and run this. I can run make hello again. Seems to work. And I can run ./hello. Seems to work. But now let's see if CS50 agrees. So check50. And then I'm gonna type "cs50/2017/fall/hello," which looks like a bunch of folders, but it's not. It's just a unique identifier that has sort of some hierarchy to it. You would only know to type this by reading the problem set specification online. And what this is gonna do, if you haven't seen it already, is actually connect to CS50's server. It's gonna authenticate you, if you haven't already. I'm gonna go ahead and log in as student50. And now hereafter it will remember my password, for at least some amount of time, so you don't have to type it in every darn time. Then it's preparing. It's uploading. And what's happening now is my "hello.c" file is somewhere in the cloud on CS50 servers. We are running the checks, the tests that the staff wrote.
And hopefully, I'm gonna see a whole bunch of green smiley faces that look a little yellow on this projector, but those are, in fact, green smiley faces instead of frowny faces, which would suggest something is wrong. So that's all good. And don't be discouraged if you see a few frowny faces or a few flat, confused faces if something else is awry. But style50 does something different. Right now, the style of my code, I'd argue, looks pretty good because it's kind of hard to go wrong when it's this short. But we'll see a way. And if I instead run "style50 hello.c," just the name of the file I want to check-- looks good, but consider adding more comments. And that's pretty compelling because there's zero at the moment. And so what kind of comments might you want to add? Well, in this program, it's not that compelling to add that many comments because the reality is this program's so short it probably takes me less time to read the code than the comments. But it's very common, as you'll see in the examples from lecture, to do something like this-- "says hello to user," just a quick one-line summary so that when you're skimming the file or looking at the code, OK, got it. I know what this does. And if I care to know how it does that, then I can read the code. And so that would be a comment. And that will probably make style50 happy in this case. But what if I'm getting a little sloppy? And I remember vaguely that I was in the habit of hitting Tab or the space bar in lecture. But I can't be bothered to do that when I'm working on my problem set. I just want to get the darn thing to work. It's not uncommon for code to eventually start to look like this, even though this, too, is a simple program. Now, good style, as you'll see and learn from practice, dictates that just like in Scratch there were those yellow puzzle pieces that kind of hug the code, similarly, inside of curly braces, you really should be indenting.
And so if I go ahead and sort of forget that and now run style50 on "hello.c," I'll see my code outputted in the terminal window, the bottom of the screen. But green suggests hey, programmer, add the following characters. So green suggests add here. And if I go ahead now and reindent that by hitting Tab-- specifically four spaces, which is a human convention-- it should make it happy again. We can go in the reverse direction, though. Suppose that I got a little confused as to what I actually am supposed to indent-- and you might even see in textbooks and some online resources, some people write their code like this. Let me go ahead now and run style50 on this. It's gonna print out my code. And red in this case means remove those characters that you might not otherwise see. So it's not always going to be perfect. And especially when the programs get long, it might be a little nonobvious what changes you have to make. But just like with error messages, start at the top. Make one or few changes. Save it and rerun it, and see what the updated advice is. And I can't stress this enough, especially with problem set one and any problem set thereafter-- don't get into the habit of sitting down and trying to bite off the entirety of a problem. Odds are with Scratch, you didn't sit down and write the whole thing without once playing it or testing it or adding features to it. Don't get into that habit, then, in C. Take steps and steps, just as we've been doing with these examples so far. All right. Any questions, then, on those tools? And we'll come back in just a moment to more sophisticated debugging techniques. All right. So one of the problems that we were distracted by earlier is there's this old-school game, "Super Mario Bros.," wherein a character like this jumps around the screen quite a bit. And it's one from the very first Nintendo game, and there's lots of obstacles in the way of Mario as he's running left and right and jumping.
And some of these obstacles can be represented with fairly simple constructs like bricks in this colorful world. And we can approximate this just by using characters on our screens, as well. So I actually poked around for far too much time last night looking at old "Super Mario Bros." maps, which if I had them in, like, the 1980s, would have made "Super Mario Bros." a lot easier. But people have captured all of the imagery from this game. And one snapshot from this game was a screen like this. So eventually, Mario's supposed to run through this. And he's supposed to bump his head up against the question marks and get coins and so forth. But for now, I'm gonna really, really simplify this and propose that all I care about for the sake of discussion is this line of question marks. How would a computer program, whether in "Super Mario Bros." or today here in Sanders, go about printing a line of question marks in a row like that? Well, let me go ahead and open up CS50 IDE. I'm gonna go ahead and create a new file here. And I'm just going to go ahead and call this, say, "mario0.c" because it's the first or the zero version of this program. And I just want to print, like, four question marks. So let me take a stab at this first. So #include stdio.h, which I think I need because why? AUDIENCE: Printf? DAVID MALAN: Yeah, I need printf. I need to be able to print the character. So int main void is what comes next. And we'll start to tease apart why that is today. And now I'm going to go ahead and print out "????." And then semi-colon. All right. Let me go ahead now and make mario0. ./mario0. And I kind of sort of have a very ugly textual representation of a really fun-- at least 1980s style-- game. But there's a slight aesthetic bug. And I made this same mistake the other day. How do I move my cursor onto the next line? Yeah, so backslash n. So backslash is the one we're about to type, and forward slash or slash is what people would call just the other direction.
So that's backslash n. And that's a special escape character, so to speak. For now, just know that this starts to confuse the computer if you just literally hit Enter. Now, your code's on two lines, when really it's just one idea or one function. So humans decided some time ago, let's just represent that special character that you would otherwise just hit on a keyboard. So now if I rerun make mario0. ./mario0. OK, now looks a little better. But we know from Scratch that we don't just need to do question mark, question mark, question mark, question mark, especially if I want even more coins to be available on the screen. What's the right programming construct to just give me more of these? AUDIENCE: A for loop. DAVID MALAN: Yeah, like a for loop, right? So let me go ahead and tweak this a little bit. Let me go ahead and in, let's say, "mario1.c"-- so "mario1.c"-- I'm going to instead do this. So for int-- to give me an integer-- i equals 0 by default, though it could be 1. But programmers tend to use 0. i is less than-- I'm not sure, so let's just put a big blank there for a moment. And then i plus plus, I remember, being the way to increment. And then inside of this loop, I'm going to do printf "?" semi-colon. And now let's answer this question. If this for loop, which, recall, has a very methodical process to it-- it initializes, checks the condition, does something, increments, checks the condition, repeat again and again. What number should I put on the otherwise blank line here? AUDIENCE: Four. DAVID MALAN: So four. But if I'm counting from 0 to 4, that feels like it's five numbers. So three might get me closer, but less than. We have this relational operator. Less than, could have been greater than in other contexts. So the less than actually saves us. If I do 4 here, think about logically what's happening. i gets initialized to 0 for the first time. And we get a question mark printed. Then it gets plus plussed, and so it becomes 1, which is less than 4.
And so that's the first time I printed a question mark. Then i becomes 1 next. I print another one. i is now 1. I do another. i is now 2. I do another. i is now three, which is not consistent with the number of fingers I'm holding up because I started at 0. But once i becomes 3, and therefore I've already printed my four question marks, the next value i is gonna take on is 4 itself. Is 4 less than 4? No, so I never get a chance to print another question mark or put up another finger. And honestly, this is a waste of intellectual capacity to think through, OK, how many numbers are between 0 and 4? We could have-- like most of us in this room just think-- could have just done i is less than or equal to 4, and that, too, would have worked. This is even more clear, perhaps. You start at 1, and you count up to and through the number 4. And that will give me four fingers, as well. Why do we start counting at 0? It's kind of just because, but more technically it's because it's easy to start counting with all 0 bits per our first lecture. So it's just a habit. And it's fine if you're more comfortable this way. But before long, get into this habit just because everyone else does it this way. OK, so now let's go ahead and print this out. Make mario1. ./mario1. Ah, still that bug. OK, is this gonna fix this? Why not? Yeah, that's gonna do question mark, new line, question mark, new line. That's not right. So what line number should the backslash n really go on or between? AUDIENCE: It should go outside the for loop. DAVID MALAN: OK, so it should go outside the for loop. So specifically-- I saw a hand in back, too. What line number? Yeah. AUDIENCE: Eight and nine? DAVID MALAN: Yeah, so between eight and nine. There's no room there at the moment. That's no big deal. We'll just hit Enter, printf. And I can certainly just do a single backslash n, even with no words to the left of it. That, too, is OK. Let me recompile this. 
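The loop just described, with the newline moved outside of it, can be sketched roughly as follows. This is only a minimal sketch: the helper name print_row and its return value are assumptions added here so the count is easy to check, and the lecture itself writes the loop directly inside main.

```c
#include <stdio.h>

// Prints n question marks on one line, followed by a single newline.
// print_row is a hypothetical helper name, not something from the lecture.
// Returns how many question marks were printed.
int print_row(int n)
{
    int printed = 0;

    // i takes the values 0, 1, ..., n - 1, so the body runs n times
    for (int i = 0; i < n; i++)
    {
        printf("?");
        printed++;
    }

    // Outside the loop: exactly one newline at the end of the row
    printf("\n");
    return printed;
}
```

Calling print_row(4) from main would print the four question marks and then move the cursor to the next line once, rather than after every question mark.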
And honestly, if you get bored retyping the same commands, know that you can also hit up and down on your keyboard, and it will go through your history, so to speak. And that, too, over time will start to save you time. So there's make mario1. Enter. ./mario1, or I could just scroll back up as I did before. And now I get those four question marks. But now let's actually create this a little more interestingly, as follows. Let me go ahead and not just hard-code 4 into this program. Let me make one more version of Mario, call it mario2.c, and this time actually get some user input. How about I do int n because n is like a number, and it's just common convention to call it that. get_int, and then I can say number, semi-colon. And now instead of hard-coding 4, why don't I just put n there, which I can certainly do? So let me go ahead now and run make mario2. Uh-oh. Error. Yeah, I forgot cs50.h. So I have to go back up here. I'll just do a quick copy-paste, and then change the word, cs50.h. Now I'm gonna clear my screen. And to clear your screen, you can hit Control-L, for instance, which will just keep fewer characters on the screen for us. Make mario2. That worked. I didn't need to worry about the -lcs50 because, again, make does that for me. That's one of the features. And now I can do ./mario2. Number. How many question marks do we want? AUDIENCE: Seven. DAVID MALAN: I heard seven first. And now we have seven question marks. And it's not necessarily gonna look super pretty. If I do 700, now I'm gonna get a whole lot. But look how quickly it did that for me. And so we have this power now of loops. So that's good, but you know what? Let's see, what about this? What about -50 question marks? Is -50 an int? AUDIENCE: Yes. DAVID MALAN: It is. So we will get it for you via get_int. That's not really logically what we want. So think about it. On line seven if n equals -50, how many times will the for loop execute? AUDIENCE: None. DAVID MALAN: Why none?
AUDIENCE: Because 0's greater than. DAVID MALAN: Yeah, because 0 is in this case greater than -50. So that condition never lets the loop actually proceed logically. So we're kind of OK. Nothing seems to happen. I get this sort of ugly blank line. And maybe that's arguably a bug. But at least it didn't freak out the computer and just kind of print things infinitely many times, as could actually happen. So let me go ahead and-- actually, at the risk of losing control over my computer, let's go ahead and change the logic. Suppose I change the less than to a greater than. And we initialize n to -50. And now, is 0 greater than -50? AUDIENCE: Yes. DAVID MALAN: Yes. And it's gonna be that way for a really long time, most likely, even as you increment it. And so let me go ahead and do make mario2 and then hold my breath and do -50. And even the internet and the computer can't really keep up. And that's why you're just kind of seeing it bursty like this. We're sending thousands, tens of thousands, millions, ultimately, of question marks across the screen. And that, too, you might do accidentally. And so just as I did, you can hit the secret keystroke, which usually works, which is Control-C for cancel. And that will stop a program in the window from running. All right. So I've gone ahead now and implemented kind of a very weak approximation of this. So that's great. Let's now take a step up and consider not just this construct, but if we fast forward in the game, to this part of the screen, now maybe we have a vertical block, as well. And let's just consider for a moment what about my code needs to change if I want to print three or maybe any number of vertical blocks? Fundamentally, how do I want to change the code? How do I want to change the code? Yeah. AUDIENCE: [INAUDIBLE] DAVID MALAN: Yeah, I just need a line break, where I accidentally almost did earlier. But in this case, it would be a good thing. 
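Two points from the discussion above can be tried out in one small sketch: a line break printed inside the loop gives a vertical column, and a negative count such as -50 means the loop body never runs, because 0 is not less than -50. The name print_column and the returned count are assumptions for illustration only.

```c
#include <stdio.h>

// Prints n bricks, one per line, and returns how many were printed.
// print_column is a hypothetical helper name, not from the lecture.
int print_column(int n)
{
    int printed = 0;

    // With n == -50 the condition 0 < -50 is false right away,
    // so nothing is printed and the function returns 0.
    for (int i = 0; i < n; i++)
    {
        printf("#\n");   // newline inside the loop: each brick on its own line
        printed++;
    }
    return printed;
}
```

Note that this sketch keeps the less-than comparison. Flipping it to greater-than, as in the runaway demonstration above, is exactly what makes the loop keep going.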
So let me go over there, and let me just quickly make mario3 by starting at the same point. So mario3.c. And then let me go down here and change this as follows. Let's make this i is less than n, the way it's supposed to be and just so that if we upload these later, I don't forget. And now I'll do a hashtag just because it looks more brick-like. It's not one of those coin things. And now I do this here. I don't think I need this anymore. So let me save that and run make mario3. Seems to be OK compiling-wise. mario3, number 3. And I get three of those. And now I could do maybe five of those. That works, and so forth. So there's still an opportunity for improvement here. In case I want to pester the user to actually cooperate such that if he or she types in -50, I don't want to just quit. I want to yell at them or somehow give them feedback and say, give me what I asked for. How do I continue to pester a user again and again and again until he or she actually gives me the value I want? I'm sorry? AUDIENCE: While. DAVID MALAN: While. So there's different looping constructs. While-- and it turns out we could use while or even for, or there's another one, as well. And let's consider how we might do this. It turns out that when you want to get user input from someone, you could use for. You could use while. But you'll find that it's a little annoying to use those constructs. Let me just jump to the better way first so that we see one other way. It turns out if you in a program want to do something at least one time, and maybe some more times, you could use for or while. But it's actually a little more straightforward to literally just do it while something is true. Now, this is just a placeholder. Let me start to fill in some logic here. So I want to do the following-- do the following while what? If the user does not give me a positive number, I want to prompt him or her again. The curly braces on lines seven and nine at the moment connote exactly that. 
Do this, do this, do this while line 10 is true. So what Boolean expression, if you will, do I want to type in the parentheses here on line 10 to express the fact keep doing this until the number is positive? Yeah? AUDIENCE: While n is greater than 0. DAVID MALAN: So while n is greater than 0, keep doing-- which one? AUDIENCE: Less than. DAVID MALAN: I heard less than. OK, so let's rethink this. So while n is less than 0, ask the user for a number. Ask the user for a number. Ask the user for a number. And you know what? This is going to just confuse the heck out of them. Let's be even more clear with our prompt. Give me a positive number. But keep prompting him or her until we actually get a positive number. Now, if we really want to be nit-picky, it's actually not even less than. We're so close. AUDIENCE: Less than or equal to. DAVID MALAN: It's less than or equal to. Unfortunately, I don't really remember having a key on my keyboard that's got, like, an angled bracket and then a line under it like you might write in math class. So there's a way to do that nonetheless on your keyboard. I actually just do them side by side. This is less than or equal to. This would be greater than or equal to. And so now just get comfortable reading these things left to right. There's no special symbol like you would have in a math book or a homework assignment on paper. So this, I think, says the right thing. Do this while n is less than or equal to 0, which is, of course, not positive. And then down here, the rest of my code, I think, can stay the same. I just have a block of code up here now that's doing something. And you know what? This is where comments start to get useful. Prompt user for a positive number. And now down here, print out that many bricks. So it's kind of obvious if you just read through the code what I just said. But this helps you if you sort of sleep on it and wake up, and you want to remember, why did I do this? Why did I do that? 
It helps the reader of your code, a colleague, a teaching fellow, and so forth. That's how you kind of start to add comments to your code. Unfortunately, there's a bug, and we're about to hit it. So let me try. Let me go ahead and make mario3. Oh, my god. More errors than I have lines of code, it seems. And this one's weird. Error-- unused variable n. And now let me dive in deeper to these error messages just so you start to notice little clues. So over here on the left is, of course, the filename, as you might have noticed-- mario3.c Then there's a colon, and then a number, and then a colon and another number. Turns out this is just a very succinct way of saying that in mario3.c on line nine at character or column, left to right, 13, you've got a problem, at least the compiler thinks. So generally, the character is kind of sort of helpful. It's really the line number that draws your attention to the right place. Somehow, this is buggy. And specifically, the bug is that I have an unused variable n. And then very inexplicably, on line 11, now I have a use of n. So it's unused here, but it's used here. And somehow, the computer doesn't like this. Why might this be? Yeah. AUDIENCE: [INAUDIBLE] DAVID MALAN: Yeah, that's the trick here. So it's a little different from Scratch, where when you make a variable, you can just use it anywhere you want. In C and some other languages, variables only exist in what's called a certain scope. And a scope you can generally think of as just the most recently opened and closed curly braces. So what does that imply here? Well, on line nine, I am on the left-hand side declaring a variable. Hey, computer, give me a variable called n. And it's gonna store an int. That's the story we keep telling. But the problem is I am doing that in between lines 8 and 10, curly braces. And I claimed today that that means, kind of like Scratch has the hug the puzzle pieces, in C, variables are treated a little different. 
If you declare a variable in here, it only exists in here, and you can't use it down here in your code. And so this would kind of seem to be a catch-22. I need a variable. And so I can declare it. But I can't declare the variable there if I want to use it later. It doesn't really seem to be a good situation. So just logically, even if you've never programmed before, if the fundamental problem is that this variable exists only in that scope of the curly braces, how intuitively could we solve this? Yeah. AUDIENCE: Move the curly brace? DAVID MALAN: Remove the curly-- oh, move the curly braces. Yeah, we could move the curly braces, which is essentially the idea. The catch is the do-while loop really kind of needs them. At least in general cases, you need those curly braces. But you know what? It's been a while since I typed them. But I do have another pair of curly braces that are sort of outside of, so to speak, my inner curly braces. So I have another scope here that's essentially the whole function called main. So what if I somehow declare my variable out there. And indeed, I can. I can go to, like, line seven-- or even higher, but generally you want to keep it as close to where you care about it as possible. I can type int n. And I don't think I want to prompt the user here because then I'm going to create the same problem as before, where I'm just prompting him or her once. I want that prompt in a loop, again and again and again, potentially. So that's OK. We've not seen this before, but you can declare a variable, and then do nothing with it yet. Just say, hey, computer, give me a variable. I'll deal with this later, just like in Scratch. You declare a variable if you did, and then you deal with it later as you want. Now, this would be a bug still. I can't say, hey, computer, give me a variable n. And then, oh, by the way, give me another variable n. So all I have to do to fix this issue is just don't redeclare it. Just use it.
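The fix described above, declaring n once outside the do-while braces and only assigning to it inside, might look something like the sketch below. The lecture uses CS50's get_int for the prompt. Here a plain fscanf call stands in for it so that the sketch needs only the standard library, and both the function name prompt_positive and the choice to return -1 when no number can be read are assumptions made for this example.

```c
#include <stdio.h>

// Keeps prompting until a positive integer is read from `in`.
// prompt_positive is a hypothetical name; fscanf is only a stand-in
// for CS50's get_int and does not behave identically to it.
// Returns -1 if the input ends or is not a number.
int prompt_positive(FILE *in)
{
    int n;   // declared here, so it is in scope after the loop as well

    do
    {
        printf("Positive number: ");
        if (fscanf(in, "%d", &n) != 1)
        {
            return -1;   // give up instead of looping forever on bad input
        }
    }
    while (n <= 0);      // n is still visible here, outside the braces

    return n;
}
```

Calling prompt_positive(stdin) and typing -1, then 0, then 3 would show the prompt three times and hand back 3.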
So line seven says, hey, computer, give me a variable called n that's going to store an int. Line 10, same story as always except it's slightly shorter. On the left-hand side, it says, here's my variable. Right-hand side says, here's a value we got from the user. Put it from right to left. And so now because n is declared or created on line seven, it exists within the scope of these outermost curly braces. And now I can use it kind of anywhere I want, including on line 12, which is great, and, most importantly, on line 15. So let me go ahead and save this and do make mario3, hold my breath. Good, it actually worked. ./mario3. Positive number. Nuh-uh. I'm gonna give you -1. OK. I'm gonna give you zero. All right, fine. I'll give you 3. And now it actually cooperates. And so the do-while construct is still a loop, and Scratch doesn't really have an analog of this. But the do-while loop is still a loop, but it does something at least once. The difference fundamentally, though, is this-- if I did this, like, while n is less than or equal to 0, if I change this to a while loop, which we saw ever so briefly the other day as just an analog of the forever block in Scratch, if I do this, there's kind of a logical problem. Here's n being declared on line seven. So we're avoiding the scope issue this time from the get- go. But line eight is saying while n is less than or equal to zero. But what is n at this point? It's not yet defined. And, in fact, as we'll soon see in class, it has some garbage value, typically, some unknown value, remnants of whatever the computer used that RAM or memory for in the past. So this is literally undefined behavior, it would seem. I don't know if the loop's gonna execute or not because I don't know what's in n. So you could hack around this, so to speak. Hacking generally means kind of sort of figuring out a solution to a problem that might not be the cleanest. 
And OK, let me just initialize this to, like, -1,000 because I know that's less than or equal to zero. So it's a hack in that it fixes the logical problem because now on line eight, is -1,000 less than or equal to 0? It is. So now my loop will execute at least once, and it will then change the value of n. But what the heck is -1,000 coming from? These kind of inelegant solutions would be horrible, horrible design, even though it logically gets the job done and it's correct. Bad, bad design. And so that's why we started with a better design, with a do-while loop. But you'll find there's many different ways to do things. And you might not, certainly, in problem set one or two do things always the right or best way the first time. And that's OK because with practice and experience, you'll begin to see patterns with which you can solve these same kinds of problems. Any questions on these approximations of Mario? Well, let me do one last one, one last one involving Mario, and kind of like this. I spent way too much time looking for parts of Mario that kind of painted these pictures. And I found this, these additional bricks underground in the fire level. And suppose I wanted to print, like, a cube of hashtags, so not just a horizontal line, not just a vertical line, but kind of sort of both together. And indeed, you can think of these bricks as exactly that. It's like hashtag, hashtag, hashtag, hashtag, hashtag, hashtag, hashtag, hashtag, and so forth, kind of like an old-school typewriter, printing one line at a time. And if you even remember typewriters, you can actually think of computers and printf as behaving very similarly. You can print something, then do the backslash n. Print something else, do the backslash n. So what is a square like this on the screen? Well, it's really just the process of, like, painting the screen, if you will, from left to right, moving down, left to right, moving down. And how do we do this?
Well, what was the type of code we used in order to do something again and again and again? The for loop was the first one. We could use other constructs, but I'm going to go ahead and use the for loop again. Let me save this as mario4, as our fourth and final example. I'm gonna keep this code up here because I still want to prompt the user for some number of blocks, a positive number at that. And now I don't want to just do this, but let me just see where I left off. Let me go ahead and make mario4. ./mario4. And let's do, like, a 5-by-5 block. OK, that's not. That's just a column. So I've got to do a little more. Well, it turns out that just like in Scratch, I can take one idea and kind of nest it inside of another. Let me go ahead and do this. How about inside of my for loop that's going from i to n, let me do another one for int-- and I don't want to reuse i because I feel like if I use i in two places, something's going to get messed up. So I'm gonna go with the next one alphabetically, j, which is actually pretty common. So int j gets 0. j is less than n. And j plus plus. And now in here, I'm gonna put that brick. I think I need to get rid of this here. And let's see now what happens. So I've got a for loop inside of a for loop. If I go ahead and do make mario4, Enter, code is compilable. ./mario4. Let's type in 5. Hmm. I think that's actually, like, 25, if I really count it out. That's not what I wanted. I wanted a square. So what's obviously missing aesthetically? A new line. But I kind of thought that doesn't go here, right? Because if I do this-- just real quick teaser. If I rerun mario4 after making that change, now I've just made the opposite problem. So what needs to change? Yeah. Oh, just scratching? OK. What needs to change? Yeah, over here. AUDIENCE: Another line with a printf and a backslash n. DAVID MALAN: Yeah, and what line would you propose the printf with the backslash n? AUDIENCE: 21? DAVID MALAN: Sorry? AUDIENCE: 21. DAVID MALAN: 21. 
So above or below it? AUDIENCE: Above it. DAVID MALAN: Above it. OK, so let me go there. So let me go ahead and printf backslash n. And now let's see. So make mario4. ./mario4, Enter. 5. Beautiful, beautiful. It's not quite a square on the screen because the hashtags are a little more vertical than they are wide. But that's OK. We've built this sort of approximation of that level, too. And now, just for good measure, let me just think about-- this is kind of an oversimplification, print out that many bricks. So print out this many rows or columns on the outside? And then in here, where we're going with this, print out this many-- what should my comment here be on the top? Top one is rows? And then down here, this should be columns. And it is because on the outermost loop, you've got i. And it's starting at 0, and it's eventually going to 5. But whenever i is 0, at the beginning, it's like the cursor is in the top left-hand location by default on the screen. And then you've got this nested loop, which says, oh, by the way, do the following five times. What are you doing five times? Hashtag, hashtag, hashtag, hashtag, hashtag. Then a new line. Then i becomes 1. So it's like moving over-- sorry. Then i becomes 1, which means you're now on the second row because you've printed out one of those newline characters. So here, too, this is where comments would be helpful because, frankly, even I had to think about that. And you don't want to waste time thinking about code you've already written. Just give yourself the answer to why you made past decisions as in a case like this. All right. So suppose something's going wrong. And, in fact, we already solved the problem of, like, a lot of hashtags going this way and a lot of hashtags going that way. But suppose you want to wrap your mind around what your code is actually doing. It turns out that we have two other tools we can use.
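The nested loops just walked through, with the outer loop handling rows, the inner loop handling columns, and the newline sitting after the inner loop but still inside the outer one, can be sketched like this. The helper name print_square and the returned brick count are assumptions for illustration, and the lecture's own version prompts for n with get_int rather than taking it as a parameter.

```c
#include <stdio.h>

// Prints an n-by-n block of bricks and returns how many bricks were printed.
// print_square is a hypothetical helper name, not from the lecture.
int print_square(int n)
{
    int bricks = 0;

    // Outer loop: one pass per row
    for (int i = 0; i < n; i++)
    {
        // Inner loop: one brick per column in the current row
        for (int j = 0; j < n; j++)
        {
            printf("#");
            bricks++;
        }

        // After the inner loop, still inside the outer one:
        // end the current row before starting the next
        printf("\n");
    }
    return bricks;
}
```

Putting the newline inside the inner loop would give one long column again, and leaving it out altogether would give all 25 bricks on a single line, which are the two mistakes seen above.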
It turns out we have in the CS50 library a function that's almost identical to printf except we called it eprintf for, like, error printf just to help you see what's going on inside of your code. And you should use it as follows. If you kind of want to wrap your mind more clearly around what your own code is doing or, for that matter, even an example for class that you downloaded, you can add, certainly, your own lines of this-- like, "hello there. I'm at home playing with this code" or something like that, right? So something nonsensical, but at least now when you see that sentence on the screen, you know on what line the computer was executing your program. So you can be a little more methodical than this. And with eprintf, notice we can do the following. I'm going to change this to just eprintf, and it works the same. And I'm going to go ahead and do this-- "about to prompt user for a number." I just want to provide an explicit note to myself, temporarily, what should be happening here. And let me see now what happens. If I do make mario4. OK. ./mario4. Ah. I get a little ugly output, but it's just diagnostic. It's temporary. It says mario4.c on line 10 is giving the following message-- "about to prompt user for a number." That's just a note to self so that I'm comfortable understanding the flow or the structure of my program. I can still interact with it. Let's type in something like -1. And what should I see next on the screen if I type -1? Yeah, another prompt. "About to prompt user for number." So it's just like a sanity check. If you think something's going to happen, tell yourself that it should in your code, and make sure you see what you expect to see. And then once you're sure your code is good, then don't submit it with this because this is not correct per the specification. You can just get rid of it at that point. But frankly, that gets tedious very quickly. Oh, and how do I kill my program if I don't want to keep playing?
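For readers without the CS50 library at hand, the same note-to-self idea described above can be approximated with standard C alone. This is not how eprintf itself is implemented. NOTE is a hypothetical macro invented for this sketch, built on fprintf and the standard __FILE__ and __LINE__ macros, and sum_to is just a made-up function to put the notes in.

```c
#include <stdio.h>

// Hypothetical stand-in for a diagnostic print, not part of the CS50 library.
// Writes "file:line: message" to standard error.
#define NOTE(msg) fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, (msg))

// Adds up 1 + 2 + ... + n, leaving temporary notes along the way.
int sum_to(int n)
{
    NOTE("about to start the loop");

    int total = 0;
    for (int i = 1; i <= n; i++)   // counting from 1 up to and through n
    {
        total += i;
    }

    NOTE("loop finished");
    return total;
}
```

As with eprintf, notes like these are temporary and would be removed once the code behaves as expected.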
Control-C will terminate the program. There's one other tool, perhaps the most powerful. And I can't stress this enough. Get into the habit of using this as needed early on. Even if it takes you an extra 10 minutes, half hour to play with it, it will save you, potentially, hours over the course of the semester. And that is a program called a debugger. So a debugger is a program that helps you remove bugs or mistakes from a program. And it works like this. I'm going to go ahead and recompile mario4. And now I would normally run it, of course, with ./mario4. But suppose I have a bug, and I really want to understand what's going on. I'm going to do the following. You'll notice that all of my examples thus far have line numbers in the so-called gutter of the program, left-hand side. And it turns out you can actually click to the left of those numbers at, like, this point here. And you can put a red dot. This shall be known as what's called a breakpoint. This is like a little stop sign, only for yourself, that says, hey, computer, pause my program here, or really stop my program here, like a stop sign, temporarily. And let me, the human, go at human speed, not, like, billions of things per second speed. And by this, I mean the following. I'm going to now run not mario4 but debug50 space mario4, which, again, is a program we wrote that invokes or starts the IDE's built-in debugger. So notice magically this right-hand panel just popped out. And it's actually always been there. It's always said "Debugger," and it just happened to open that window for me automatically. And let's see what's going on. There's a lot of words, but we're familiar with many of them already. Notice that down here is the word local variables. And then there's kind of a table here. And it's not very big because I only have one local variable. And at this point in the story, my variable n happens-- I got lucky. It has a default value, it would seem, of 0. I shouldn't rely on that.
But it's just so early in my program that it seems safe-- well rather, it's so early in my program that it happened to have the value 0 in it for our purposes today. And it's of type int. But what's cool now is the following. Now notice that my program is effectively paused on line seven, or, specifically, line 10, which is the first interesting line. That's why it's highlighted in yellow. And what's cool here is this. Up here in the top right, you have a play button which will just say, play the rest of my program. Just let it go through without pausing. Or, if I hover over this thing, you can step over this line, which means, hey, computer, execute this line, but at my human pace, just one line of code at a time. If you're really curious, you can step into that line of code, but more on that in just a moment. Meanwhile, this is step out, which is if we've actually dived in deeper. So what do I mean by all of this? So I'm currently paused on line 10, which was the first interesting line of code in my program, so far as the debugger is concerned. I'm going to go ahead at top right, and I'm going to go ahead and click Step Over. And notice my terminal window is now prompting me for a number. Why? Well because I've stepped over the get int line, which means execute it. So let me go ahead and type in that number. Let me go ahead and type in -50, Enter. And keep an eye on the variable on the right-hand side. Notice now in the debugger, even without printing it with printf or eprintf, I can see that n has a value of -50. It's just a sanity check, so to speak. I can see what it is to be sure it's consistent with my expectations. All right. That's not right, so let me go ahead and step over. And notice the yellow line moved because it's looping. You can literally see what I keep doing with my hand. Let me do it again. OK, positive number. I'm going to cooperate this time. 42, Enter. Notice at the right-hand side, the value n is indeed 42. 
And notice the yellow line, if I keep stepping, is about to jump to the next interesting line of code. And if I keep doing this, keep doing this, watch what's about to happen in the blue terminal window at the bottom. There's the first hashtag. There's the second hashtag. So the sort of fake animation I did the other day with just my slides, and what I try to do verbally and with my hand going back and forth, you can now see much more methodically. So even if it's a simple program, and even if it's code you wrote, you can really see step by step what it is your program's doing. And maybe it's not doing what you expect. And if it's not, you'll see it visually. All right. Now I'm just gonna go ahead and say, OK, print the rest of the thing. So I hit Play. You see that the GDB, the GNU Debugger, server is exiting. It's just quitting. And now I'm back at my prompt, and the debugger goes away. So do not undervalue those particular tools. So before we forge ahead, I thought I'd introduce Abhishek here, who you might have seen on the internet just a couple months ago. He kind of went viral. He's a recent grad from NYU. And he did this extraordinary thing. He took a device called the Microsoft Hololens, which is an augmented reality device that puts sort of a goofy looking screen in front of your eyes. But then it projects images in front of your eyes. And it's really cool in that much like an Android phone or an iPhone these days, it knows where you are in a three-dimensional space. And what Abhishek actually did was he went to a very three-dimensional space, Central Park in Manhattan. And he had before that spent days recreating "Super Mario Bros." in augmented reality by recreating one of those maps to which I alluded earlier. 
And the end result-- and I'll show you just a glimpse of it, and we'll put it on the course's website for you to see later in detail-- was this, which was pretty mind- blowing and a wonderful application of computer science to the real world, literally. [VIDEO PLAYBACK] - Hi. I'm Abhishek. And I recreated the iconic first level of "Super Mario Bros." as a first-person, life-size, augmented-reality game that I'm now going to play as Mario. [MUSIC, "SUPER MARIO BROS. THEME"] [END PLAYBACK] DAVID MALAN: Abhishek gave a tech talk in CS50 a couple of months ago. And the funniest part, if you really look closely-- and it is Manhattan-- is some people look at him. But a lot of New Yorkers don't even look twice at what he's doing. Let's go ahead here and take a five-minute break. And when we come back, we'll begin to look at the world of cryptography. So we are back. And, of course, there are more functions than just printf. And we've seen a glimpse of these by way of the CS50 library. And there's many, many, many, many more that come with C itself and that other people around the world have written over the years. But implied in each of these CS50 functions, notice, are these key words like string and int and float-- which we talked about the other day, too-- long, and long long, and double, as we saw the other day, too. So it turns out that C, to be clear, has what are called data types. And we glimpsed this the other day. Data types specify what type of data you can put inside of a variable. And that's what's different from Scratch, too. In C and a few other languages, too, you have to decide in advance as the programmer what kind of data are you going to put in this variable so that the computer-- or, really, the compiler-- knows. And so the compiler knows how to deal with it for you. Well, it turns out that if you want to print these things out, printf also comes with certain format codes. And we've seen %s for strings and %i for integers. 
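As a rough illustration of the format codes just mentioned, %s and %i, together with two other standard ones, %f and %lld, here is a small sketch. It uses snprintf, a standard relative of printf that writes into a character buffer instead of onto the screen, purely so the result is easy to inspect. The function name format_values and the sample values are assumptions for this example.

```c
#include <stdio.h>

// Formats one value of each kind into `out` and returns the number of
// characters written (not counting the terminating '\0').
// format_values is a hypothetical helper name, not from the lecture.
int format_values(char *out, size_t size)
{
    const char *word = "world";       // printed with %s
    int count = 42;                   // printed with %i
    float price = 1.5f;               // printed with %f (six decimals by default)
    long long big = 9000000000LL;     // too large for a 32-bit int, printed with %lld

    return snprintf(out, size, "%s %i %f %lld", word, count, price, big);
}
```

With a buffer of 64 characters, the text placed in the buffer would be world 42 1.500000 9000000000.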
And there's a bunch of others, too. Perhaps the most common would be these, just so that you've seen them-- %f for float. We saw that the other day. %lld for a long long decimal number. That's one I often have to look up myself. And then there's even more of those, too. So just realize that as you're getting input from the users, whether for problem sets or any other purposes, realize that sometimes you have to check the manual or the documentation, so to speak, for functions that you're using. And so that you know where to turn for those kinds of things, let me just introduce one thing real quick. And you'll see more of this in super sections and sections and beyond. If you forget, for instance, how certain functions work, you can actually type the following-- "man get_string," where man stands for manual. And this is kind of an old-school command on Unix and Linux computers that have this text-based keyboard environment. And you'll see pretty much a standard, structured user's manual for the function in which you're interested. So if you forget what we talked about in class or you're not really sure how else you can use it, and the function is something like get_string, you can simply read about it here. But sometimes, frankly, it's going to look a little arcane. I mean, we have not talked about what some of these symbols mean-- the ..., the word const, the asterisk that I've highlighted on the screen. So frankly, sometimes you will find the man pages, as they're called-- the manual pages-- just confusing unto themselves, which is a nasty situation to be in. If you're already confused, and the documentation's not helping, you kind of need a third option. And so if you go to CS50's website, you'll actually find that there's a link to a tool that the staff has created over the years called CS50 Reference.
This is a more user-friendly version of those same man pages, where we've gone through and sort of translated the very arcane English into less comfortable English, if you will. So if over here I scroll down to, say, printf-- or, rather, let me just search for it-- I can see printf here. It's inside of this header file, this h file on the system. And now I can actually read about it here. And notice at top right, checked is the Less Comfortable box, which means, hey, show me the language the TFs came up with as opposed to the default language. But it, too, is meant to be a training wheel. So if and when you're ready to sort of take away some of those simplifications, you can uncheck that box and now see the much more verbose technical version that you would actually see in the real world. So keep in mind those kinds of things, too, especially if it feels like we go through things quickly in class, which we do, and you need to lean on something authoritative thereafter. But let's tease apart what actually a string is. Let me go ahead and start actually, with Stelios here. So Stelios, one of our head TAs in New Haven, has this name here. And I've written it as a string, S-T-E-L-I-O-S. But I've kind of drawn boxes, deliberately, around his name to capture the fact that this thing we call string, like "Stelios," is actually not really a string only. It's really like an abstraction for something a little lower-level, which is a character after a character after a character, and so forth. And so here, too, we see an example of an abstraction. It's not that much fun to call Stelios S-T-E-L-I-O-S. We call him Stelios. But we, in languages like C, would call that construct a string or, more technically, a sequence of characters. But it's a string. It's a nice abstraction. It's a nice simplification. 
But it turns out there's an opportunity here now to see how characters and numbers interrelate in a computer and see how powerful computer programs and software are that we ourselves can write. But first, how do we access individual characters in a name? I can easily get Stelios's name using the function get_string, as we've seen, just like Sam did from the audience the other day. But how do I actually get at, like, the S or the T or the E? Or if maybe he makes a typo or maybe he, like, doesn't type it very neatly, how do I capitalize his name or sort of clean up his user input like websites today very commonly do? Well, let me go ahead and open up CS50 IDE again and just do a pretty simple example that this time involves strings. Let me go ahead and create a new file. And I'm going to call this file string0.c. And I'm going to go ahead now and write a short program-- come on-- once I've lost control over my terminal window. Now I've lost control of my menu. This is my own fault for-- oh, here we go. Well, this is gonna look great. Very inspiring here. Where'd it go? Oh, oh. Here. OK. That's an example of bad design, so we will fix that. And now I see that I've misspelled string as strig. So we're just gonna-- no one on the internet will ever know the following happened. OK, so string0-- voila. Here we go. All right. So string0.c, and I'm gonna whip up a really quick program here as follows. So int main void. And now string s gets get_string. And I'm just gonna ask for the user's input in this way. And now I'm going to go ahead and print out-- how about just say the word output here. And just to be nice and tidy, let me put a couple of spaces here in anticipation. And now let me go ahead and do this-- on line five, my intention is to get, like, Stelios's name from him or whoever is playing this game. But now I want to go ahead and not just print out, like, hello Stelios, and plug in his value s, which we've been doing. I want to do this character at a time. 
And doing something one at a time kind of suggests a loop. And indeed, I can do that. So I'm going to do for int i gets 0, i is less than however long his name is, and then i plus plus. And now I can introduce one other trick that you can kind of glimpse ever so quickly from the screen I had up before. It turns out that %c is the placeholder for a character. Perhaps no surprise. But the catch is I only have access to s, the whole thing, the string s. But it turns out there's a new piece of syntax here. And as is kind of sort of implied by our having used boxes to flank Stelios's letters of his name there, turns out that the equivalent in C is to kind of sort of do the same, use a box of characters, by using the square brackets, which you might not often use on your keyboard. On a US keyboard, they're often just above the Enter key. And here I can go ahead and type in s[i]. And so to speak, this is going to print the i'th character, if you will, of Stelios's name. So i is going to start at 0. And I keep doing plus plus, plus plus, plus plus. And using the square bracket notation, so to speak, I can dive into the individual letters in his name in this case. So when I run this, what's going to be the net effect? Let me go ahead and make string0. Huh. OK, that is not valid C code, however long his name is. So I have a problem to solve here. How do I actually get the length of his name? Well I can kind of cheat. OK, so one, two, three, four, five, six, seven. All right. So we can just write this program as follows-- 7. But this should rub you the wrong way. Why is this not a good solution to the problem? Yeah? AUDIENCE: Because it's not changeable. DAVID MALAN: Exactly. It's not changeable. I have this dynamism of get_string to get Stelios's name. But seven is not going to be true of all the humans who might use this program. I need something dynamic. Well, it turns out there is a function for that. 
I can call strlen, for string length, pass in as input the variable whose length I want to get, and that will return to me a number, which will be, in this case, it would seem, 7. But it's going to be dynamic. So if I type in, like, David, that should return 5, hopefully, and any number for any number of other humans engaging in this. So let me go ahead now and try again. Make string0. A lot of errors. And "use of undeclared identifier string." Wait a minute. We've seen this before. How did I solve this last time? Yeah? What's up above missing? AUDIENCE: The libraries. DAVID MALAN: Yeah, the libraries or the header files, so to speak, for the libraries. So I need to include, I'm pretty sure, at least stdio.h for printf. I need to include cs50.h for get_string. And we're almost there. Let me see if that's enough. Make string0. Oh, implicitly declaring library function strlen with type-- I don't really know what that is. But there's kind of an answer hinted there-- include the header string.h and so forth. So turns out this is true. And there's different ways to know this. If I actually go back to reference.cs50.net and do strlen, there's that function. Let me go back to the less comfortable-- whoops-- to the less comfortable version. Notice that under synopsis of a man page or reference.cs50.net is always a quick summary of how you use it. So just the prototype of the function that gives you a sense of what it is-- size_t is essentially equivalent to an int, just saying the size of something as a number. But include string.h is the ingredient I wanted. So let me go ahead and copy that. Let me go back to the IDE. I'm gonna be a little nit-picky, and I'm just gonna keep things alphabetical at the top. But that's not strictly necessary. It just makes it easier to skim later on when the list gets long. Make string0. Seems good to go. ./string0. Inputs. Now I'm going to go ahead and type in Stelios's name. And I got his output, as well. 
Now, that was a lot of unnecessary work to print his name. I could have just used %s. But now I can make modifications. What if I wanted to print it one per line? I can add that. I can make the program again, rerun it, and type in his name, and now I get it one per line. It's a little ugly. Like, now it says output s. But that's just an aesthetic bug. I could go in and fix that. But now I have control over the individual characters in his actual name. So that would seem to be progress in some form. But if I now have access to the individual letters, we can kind of come full circle from the very first lecture where we talked about zeros and ones, and then numbers, and then letters, and now, in turn, words, otherwise known as strings, by way of a topic called typecasting. Types, of course, are the types of variables we've been talking about. Casting means to convert from one to the other. And you might recall from the first lecture that capital A was the number we know in decimal as 65 and whatever pattern of zeros and ones that is. Capital B is 66, and so forth. So can I see that now for the first time? Well it turns out I can. Let me go back to the IDE. And let me go ahead and create a new program called ascii0.c. and ASCII, again is just the standard. It's an acronym, American Standard Code for Information Interchange, which maps letters to numbers and numbers to letters. So let me go ahead now and whip up a quick program. Include stdio.h for printf. Int main void. And then let me go here and do the following. You know what? I'm gonna just go ahead and print out, let's say, string s gets get_string. Let's just ask for someone's name. And then let me go ahead and do the following-- for int i gets 0, i is less than the length of that string-- learning from last time-- i plus plus. So this is gonna iterate over the whole string. And now what I want to do is this. Let me go ahead and print out the following. 
Let me print out the character itself, and then a space, and then how about an integer, and then a new line. And we'll see what this does in just a moment. I want to plug in values for these placeholders. So how do I get at the first character of the name if the string is called s? Yeah, so s[i] for the i'th character. And that's gonna plug in, literally, S-T-E-L if Stelios is the one playing the game. But now I put a comma to plug in a second placeholder. And %i-- you know what I'm gonna do? I'm going to do int in parentheses s[i] semi-colon. So it looks a little cryptic. But let me just remove this for a moment. This is just the same thing twice-- print the i'th character of the name, i'th character of the name. But in parentheses, I'm doing what's called typecasting. I'm taking whatever that is, which is a char or character. And I'm saying, parenthetically, make this an int instead. So if it's capital A, it becomes 65. Capital B, it becomes 66, and so forth. And if I now compile this program after preemptively fixing what would have been a mistake by adding the header-- make ascii0.c Whoops, sorry. Oh, common mistake. Nothing to be done. I'm pretty sure there's something to be done. I need to compile it. What did I do wrong? Yeah, don't put the .c. It's a little counterintuitive, but when you want to make a program, you type the name of the program, not the name of the file. Now, in-- oh, damn it. I almost learned from my mistakes. What am I missing now? AUDIENCE: String.h. DAVID MALAN: String.h. All right. Include string.h. Save it. OK, so let me make ascii0. Good. ./ascii0. And now, Stelios, Enter. And now we see the ASCII codes or the numbers that correspond to the letters in his name. They're pretty big numbers. They're in the 100s now. And that's because they're lowercase. We've previously talked only about capital A, capital B, and so forth. But it turns out that the lowercase letters also have values associated with them, like some of those here, as well. 
And now it turns out now that I know this, now I can kind of do some low-level stuff that we all take for granted on our phones and websites like when you just type in your name in all lowercase, and the website just fixes it, or if you type in your phone number with parentheses, without parentheses, with dashes, without, the website just kind of fixes it and cleans it up into some cleaner format. We now have kind of the low-level control to do this. I won't type this one out manually just because it's a little longer. Let me go ahead and open it up. And among today's examples in source2, which is on the course's website, is this example here-- capitalize0. So let me make a little more room for this. It's a little longer. But let's just focus on just a couple lines at a time. Here's the beginning of my program main. Here is a line of code where I call get_string. I just say, give me the before string. And then I claim, now print the string after making some changes to it. So what am I gonna do? On line 11, I seem to have used the same ideas a moment ago, but with one change, actually. I've done something a little different. Line 11 is very similar to what I've been doing to iterate over the characters in a string. But I did something different, which is what? What looks different now versus what was on the screen a little bit ago? Anyone a little farther back? Yeah. AUDIENCE: [INAUDIBLE] DAVID MALAN: Yeah, it looks like I'm declaring two variables all in the same breath, so to speak. I have my int i equals zero, and we've done this a bunch of times. But then I have a comma here for the very first time. n equals strlen of s. But if you think about what these building blocks are-- OK, the comma is new, but n is on the left, so that's a variable, apparently. It's probably an int because the word int came before. Equal sign is assignment from right to left. strlen is a function that returns the length of a string. 
So this would seem to be storing, just to be clear, what number in n? Yeah, the number of characters in whatever the string is the user typed in, "Stelios" or "David" or whatever the name is. So then I have my condition. Then there's the semi-colon, which we have seen. And then there's plus plus. So I claim that this is, in a sense, better design. It's a little more complicated. Like, I typed out more characters. I added another variable. But why might it be smart and good design to have used an extra variable, using a little more space to keep a number around, so as to then simply compare i against n? Why did I jump through these hoops? What do you think? AUDIENCE: Because then it doesn't have to check what n is each time it goes through the loop. DAVID MALAN: Exactly. AUDIENCE: [INAUDIBLE] DAVID MALAN: It doesn't have to check what the length of the string is on every iteration because after all, once I or Stelios or whoever types in their name, it's not going to change. It is D-A-V-I-D or Stelios's name or Maria's name or whoever is playing the game. And so why would you keep calling a function saying, oh, by the way, what's the length? By the way, what's the length? What's the length? Just remember it the first time because odds are it takes a little bit of time to do that computation and actually figure out the length. And so here, we've simply kept that answer around in n and can compare two variables. Meanwhile, here's a pretty big if-else construct. But if we break it down into pieces, it's doing something relatively simple. On line 13, I am asking the question, if the i'th character of s is greater than or equal to a lowercase A, &&, which we haven't seen before. This means logically and. So if it's greater than or equal to a and less than or equal to z-- put another way, if it's between a and z inclusive in lowercase-- what am I doing on line 15, which is super weird- looking? I'm first printing out %c, which is my placeholder. 
But then I'm printing out the result of s[i] minus whatever lowercase A minus capital A is? I mean, this is just strange now. But let me just point out one clue. Turns out there's a pattern here. And humans did this deliberately. If you can do the arithmetic quickly, how far apart is lowercase A from capital A? It's 32, right? And you could just do the math. 97 minus 65-- oh, 32. How about capital B versus lowercase B? It's 32. 32. It follows this pattern. So this is to say-- and it's sort of proof by example. We're not even seeing all the way to Z, but trust me. 32 is invariant across all of the letters of the alphabet. They're always 32 away. And I could hard-code 32, but that feels a little inelegant. Why don't I instead just arithmetically say whatever the difference is between lowercase A and capital A, and that's all I'm saying in parentheses here. Whatever that numeric difference is in the computer's representation of my numbers, just subtract that difference from the i'th character. Now, what's nice is I kind of sort of should do this first. Like, I should cast the character to an int. But I don't need to be so explicit. The computer knows that characters are integers. And the computer knows that integers are characters. There's this equivalence. I don't need to be so verbose as to even say that. It just suffices to let the computer figure it out implicitly that in this context, I'm doing arithmetic on numbers, and then, in the context of printf, I'm displaying that number as characters. Nothing is happening. You're just telling the compiler what context in which to treat these values, numeric or characters. So long story short, what effect do these four lines of code have on the characters in the user's input? What does it do? AUDIENCE: [INAUDIBLE] DAVID MALAN: It capitalizes it. Right? It capitalizes it. And that might have been implied, too, by the file name, in full disclosure. But it's how you think about solving the problem of capitalization. 
Here's the string. Home in on the individual characters. Figure out if they're within a range you want to deal with. And if so, do some kind of mutation of them to change from one value to another. I could have done this a horrible way. I could have had, like, an if-else-if-else-if-else-- I could have, like, 26 conditions checking is the character A? Is it B? Is it C? And if so, make it capital A, capital B, capital C. But that code would have been like this big or bigger. This is now a more algorithmic way of solving the same problem. And if it's not a lowercase letter between A and Z, just print it out. There's no work to be done. Now, just so you've seen it, it doesn't have to even be as verbose as this. In capitalize1.c, which is available also on the course's website, I've made my code a little better designed. I'm now not reinventing as many wheels. I'm standing on the shoulders of smart programmers before me. And I've clearly changed at least one thing. Instead of doing this manual process of comparing against lowercase A and lowercase Z, I'm just punting and using a function which, beautifully, is called islower, which just literally answers that question. Because another data type in C is not just int and char and float and string in CS50's library, but there's also something called a Boolean. A Boolean, also named after Boole, is similar in spirit to a Boolean expression, true or false. But a Boolean variable is literally just the idea true or false. And so islower you can think of as returning a Boolean value. It returns true or false, yes or no. And the name of this function, therefore, is very appropriate. Is lower? That's a yes-no or a true-false question. I don't know how it's implemented. If I really care, I could go to CS50 Reference, or I could use the man command on the IDE. And I could actually check how this thing works. But I do need one takeaway. To use this, I need to use the ctype library. 
So there's other libraries that we're now just scratching the surface of. And you would only know they exist by reading documentation like that. But you know what? I can go even further. You know what? If some human years ago wrote a function to check if something is lower, what did he or she probably do, as well, for me? AUDIENCE: Uppercase? DAVID MALAN: Yeah, isupper also does exist. Yeah. So spoiler here. So isupper exists. But if they checked if it's lower or is it upper, gonna just go out on a limb. toupper? Yeah. So it turns out there's a function called toupper that converts a letter to uppercase. And indeed, I can now leverage this in my third version of this program as follows. capitalize2.c gets even better designed still, if you will. It's even shorter, fewer lines of code, easier to read, fewer opportunities for bugs. How do I solve it now? I still iterate over each of the characters, but I just blindly call toupper, toupper, toupper on every character because I read the documentation. If you pass a character to toupper that is already uppercase, it just returns it as is. Doesn't change it. If you pass in a punctuation symbol, it just passes it through. But if you pass in a lowercase letter, it capitalizes it for you. And so I can now kind of implement-- I can lean on whoever implemented that before me. It could have been me. I could have written my own function called toupper. But I don't need to because in the world of programming, there exist libraries of code that other people have written for us that we can leverage. Any questions, then, on that? Yeah. AUDIENCE: So this method, you wouldn't be able to [INAUDIBLE]. DAVID MALAN: This would be all of them, yeah. So if I only wanted to capitalize Stelios's first-- the first letter of his name, I probably wouldn't want the loop. I would probably just want to capitalize [0], specifically, of the letters. 
But I'd want to make sure that his name is at least one character long, lest he just have hit, like, Enter accidentally or maliciously. Absolutely. So let's just dive in to one other detail here as follows. Suppose that I want to actually know what the length of a string is. I know that there exists this function called strlen. But it turns out I can figure out lengths of strings for myself, too. Let me go ahead and write a program called strlen itself. But I'm not allowed in this example to use string length. I'm going to go ahead and include the CS50 library. Let me include stdio.h. Let me go ahead and do int main void. And now let me go ahead here and do string-- bad style-- string s gets get_string. Name. And now let me go ahead and do int n equals 0. Just give me a variable, call it n, set it equal to zero. And then let me go ahead and while I'm not at the end of the string-- also not valid code-- n plus plus. I can use that plus plus trick that we've seen before for i plus plus. And then I'm going to go ahead and print out whatever the value of that counter is because I want in my loop to just count the number of characters in Stelios's name or whoever's name actually ran the program. And just to be clear, this is what's called syntactic sugar, which is a very sexy way of just saying this is shorthand notation for doing this, which is just more boring-looking. This does the exact same thing. It's just a more succinct way of doing it. And you'll see little features of languages like this just to save us humans keystrokes. This, of course, is not a solution to a problem. How do I know I'm at the end of the string? Well, it turns out we need to break the abstraction layer, so to speak, of strings just a little bit. So it turns out that in your computer, we have this piece of hardware-- RAM. And we saw this the other day. And we talked a little bit about the limitations of computers and the finite amount of memory that they have. 
And if you think about all of the chips on this device-- doesn't matter for today how this works. But just know that there's lots and lots of bytes that can be stored in your computer's memory. And you might have 1 billion bytes, 1 gigabyte, 2 billion bytes, 2 gigabytes. But for our purposes today, just think of this RAM inside of your computer as just a long list of available bytes-- lots of bits, zeroes and ones, that you can change the values of. And maybe it's kind of a grid, so there's lots of bytes horizontally, lots of bytes vertically. We can kind of number them all so that one of the bytes is 0, and the other one way at the bottom is, like, the 2 billionth byte. So just assume that we can number all of the bytes in our computer's memory. Well, it turns out that when you type in Stelios's name, it of course ends with an S. But it would probably be a stupid decision to just look for an S when figuring out the length of someone's name because it's not gonna work on my name. It's not gonna work on Maria's name or any number of other people in the room. So we don't know enough yet about what's going on inside of the computer's memory. It turns out that if you think of this grid now as your computer's RAM, maybe top-left corner is byte zero. The one next to it is byte one, then byte two, then dot, dot, dot, to byte two billion. So I'm just arbitrarily depicting it as a two-dimensional grid. Turns out we need to know that there's this special character. What C does for us even without our telling it to do, it always puts a secret number at the end of any string the human types in. It's specifically represented as backslash 0. But that's just the special way, like backslash n is special, of saying that is eight zero bits all together. It's a special value, 0. And so now that we have this so-called sentinel value, if you will-- sentinel value means this is just special. The human can't really type this. 
Like, I can't actually type all zero bits easily on my keyboard because honestly, even if you hit the number zero, that is technically the character 0 because it turns out even numbers on your keyboard map to different integers. But more on that another time. So 00000000 as bits are what that is. And so if I write a program that calls get_string multiple times, and Stelios is the first one to type in his name, it might end up in memory looking like this. But then suppose one other person types in their name, like Maria. Her name is just going to fit in the next available memory, but also be null terminated, so to speak. The sentinel value is also called null, N-U-L. But that's just all zeros. And then if someone else types in his or her name, it's still going to fit in there. So Zamyla, for instance-- it wraps around, but again, this is an arbitrary artist's rendition of my computer's memory. Z-A-M-Y-L-A, backslash 0. And I can keep typing in names until I'm just out of memory. At that point, the program's going to crash, or I'm gonna have an if condition that says too many things in memory. Something's gonna have to stop at that point. So what this means for my implementation, ultimately, is the following. I can now go ahead here and change this silly English to the following. While the n'th character of the string does not equal, quote, unquote, backslash 0. And I'm using single quotes this time because recall from last time that we use single quotes anytime we talk about single characters. We use double quotes any time we're talking about strings. And even though s is a string, s[n] is the n'th or i'th-- doesn't matter what letter we use-- the n'th character. So that's a character. And so we now need to use single quotes. So this is really just doing the following-- it's initializing n to 0. And then it's looking in memory. And it's saying, is this backslash 0? If not, increment n by 1. Is this backslash 0? No. No. No. No. Damn it. No. No. Yes. 
And at that point, I have 7 fingers up, or n is storing the value 7. That's what my program is going to print out. So now we have a complete program that counts the number of characters in a string. I don't need this program because strlen exists as a function. But it's now a capability to which I have access. Any questions, then, on what a string really is underneath the hood as this sequence of characters with a special null character at the end? Yeah. AUDIENCE: [INAUDIBLE] DAVID MALAN: Ah, good question. What about other data types, if I can rephrase it like ints and floats and so forth? Actually, strings are special. If I scroll back to the list of data types that C has, for instance, most of these are of fixed length. And this is why the compiler needs to know what you're putting in them because the compiler and the computer in turn need to know is it one byte? Is it two bytes? Is it four? Is it eight? How many bytes should I look at? Strings have no predetermined length because, of course, we don't know who's going to type in their name. But an int, turns out, in most systems is always going to be 32 bits or maybe 64 bits, or, equivalently, four bytes or eight bytes because there's a one to eight ratio. A bool is often one byte. It's a little wasteful. Even though technically you need one bit, it's just easier to deal with eight-bit increments. Chars are, by definition, eight bits or one byte. So almost all of the data types are a fixed length. So you don't need to have a special null character. But strings you do. Strings are special. Other questions? All right. So what can we start to do with this? Well, it turns out that this idea of thinking about things that are back-to-back-to-back-to-back- as being individually accessible is actually a very powerful idea. Because up until now, we've just had this list of data types-- bool and float and char and int. It's kind of a short list of very primitive things. 
But it turns out if you want to write a program that doesn't just keep asking for one name but asks for two people's names or 10 people's names or asks for, as you asked earlier, the name or maybe their house or their dorm or their phone number or their email address-- a whole bunch of different values-- it would be nice to kind of store multiple things together. And one way you can store multiple strings is you could call one string s. You can call the next string t. You could call the next string whatever-- you could just come up with arbitrary names for your strings. But that's going to very quickly devolve. Imagine, like, what the registrar uses here or at Yale to actually keep track of students. They don't have a computer program with thousands of variables inside of it. They probably have a computer program for dealing with course registrations with at least one variable called students. And inside of that students variable can the registrar fit one student, 10 students, thousands of students. It can kind of grow to fill the number of values we actually care about. And C isn't quite as powerful as that. We'll need another language like Python or JavaScript to really get dynamism. But for now, we do have the ability in C to represent multiple things back to back to back to back to back in memory. So not just characters in strings. We can borrow that idea from strings and store, if we really want, student, student, student, student like multiple strings back to back instead of just individual characters. And what that idea is called is an array. An array is a contiguous chunk of memory, something back to back to back-- literally physically next to each other, typically, in the RAM that we've presented as hardware. But it's not just character, character, character. Maybe it's int, int int, int, int or string, string, string, string, string or, more generally, student, student, student, student, student-- multiple things back to back to back. 
And so now we can actually give you a glimpse of what this thing here is that we keep typing sort of on faith. Int main void literally says that your main programs that you're about to start writing for pset one and beyond will be returning an int, even if you don't do it yourself. They're going to return by default 0, it turns out. And we'll see before long why this is useful for a main function to return a value, even though we humans will rarely, if ever, see that value. But it is interesting to note that main can take input, and not input in the sense of get_int and get_string and so forth. You can actually provide your program with input at the so-called command line. All this time, I've been typing ./mario0, ./mario1, and no words after that. And yet we've shown you clang already, the compiler itself, which can take in, like, -o and then -lcs50, all of these additional key words that somehow influence its behavior. So wouldn't it be nice if I could write a program where I don't prompt the user eventually for his or her name. Let me just let them type their name at the command line and hit Enter once and be done with it, just like clang is just one long command, and you're done with it. There's no prompts. Well, we can do this if we change void to this. And it's a mouthful, but there is an alternative version of main that does not just take zero arguments. That's what the key word void all this time has meant. It just means main takes no input by default. You have to prompt the user explicitly with get_int or get_string or whatever. But there's an alternative second version of main in C that takes two inputs. And you don't have to provide them explicitly. We'll see how to use this in a second. Main can also be handed two inputs. One is an int, and one is an array of strings. The int is the total number of words that the human has typed at their keyboard. 
The argv, argument vector, by convention, though we could call it anything we want, that is an array of words that the user typed at the prompt before hitting Enter. And so this is useful in the following way. I'm going to go ahead and in today's source code open up an example called argv-- for argument vector-- 0 as follows. In argv0, there's not all that much going on. And if you at least kind of take on faith the concept here, you can perhaps infer what's going on. So I've changed what main looks like on line six, the signature of main, so to speak. And then I'm asking a question. If argc equals equals 2, then print out "Hello, something." Otherwise, just print out the hardcoded "hello, world." So it looks like argv[1] is kind of being treated like we were treating strings a moment ago. But this is the special syntax that's new. If you use square brackets like this, like I've done, with no numbers inside, that's like telling the computer, hey, computer, this variable argv is going to be an array of some length of strings. Why strings? Because string is the word immediately to the left-- string argv[]. Now, I don't know how the strings are gonna get in there. The computer's gonna do that for me. But it gives me this capability. Let me go ahead and compile this program as follows-- make argv0. ./argv0. Hello, world. Uninteresting. But if I now type in my name at the prompt and hit Enter, now it's dynamic. So what must this mean? Even if the syntax is a little new, we can kind of infer now what this must be doing. Argc happens to stand for argument count. So argc equaling two apparently implies that the human typed two words at the prompt-- the name of the program, and then whatever else he or she typed. Meanwhile, argv-- argument vector-- is the variable that you can use to go get the first word or the second word or, if there are more, the third and fourth words. In fact, if I kind of change this manually, what should probably be, by that logic, in argv[0]? 
AUDIENCE: [INAUDIBLE] DAVID MALAN: Yeah, the name of the program, right? So let me see. So make argv0. ./argv0 David. Hello-- OK. I mean, it's stupid-looking, but that's all I'm doing. I could be a little bold and say what is in the 100th location of this array or list, as you can also think of it? Make argv0. ./argv0 David. Whoa. That is bad. And get used to this because it will start to happen with greater frequency. Segmentation fault is a very cryptic way of saying you touched memory, RAM, that you should not have. And you can kind of think of what this means. So if argv[0]-- let me pull up my picture of an array. If my array looks like this, and argv[0] is here, and that was safe to print, and argv[1] is here. That was safe to print. It was my name. And argv-- what did I do-- 100, it's like way over here. I don't know what's over here. And indeed, touching that memory was very bad. The program crashed. And segmentation fault is an allusion to how computers lay out memory. You've got like a segment of memory here, a segment of memory here, a segment of memory here. Segmentation fault means you touched a chunk of memory that was not yours to use, to change or to even view. So I got lucky, though-- well, I didn't get lucky. I could sometimes see garbage values. Let me be a little more conservative. Let me put [2], which is just one past what I typed in. It's sometimes undefined behavior. I don't know what I'm gonna get. Null. So there's some funky characters there or zeros there. But now you're playing with fire, so to speak. These are logical bugs in my program. But it is OK to check if argc is two, then it's OK to look at 0 and 1, two things and only two things. Any questions on that? All right. So where, in what domain, is this kind of thing helpful? And there's a couple more examples of argv that you can look at online. Turns out that in the world of cryptography, this stuff really starts to get interesting. 
So the world of cryptography is all about scrambling information. Maybe back in the day in grade school, you might have passed notes to a friend or a crush that you had in the classroom. And if you were really clever, or your teacher was really adversarial, you might have to encode your message so that you're not just writing, like, "I love you" or whatever. But you instead change all the A's to B's, and all the B's to C's or hopefully something a little more cryptic than that so that the teacher can't just change all the B's to A's and all the C's to B's. But you kind of scrambled the words. But you scrambled the words, perhaps, in such a way that it's reversible by the recipients, the recipient of your encrypted message. So to encrypt information means to convert it into some other format, from what's called plaintext to ciphertext, which sounds really cool, and it's just the scrambled version. But it's not random. It's got to follow a pattern or, if you will, an algorithm so that he or she on the other end can reverse the algorithm and undo it. Now, in the simple example I proposed, A becomes B. B becomes C. What is the secret that you and your crush know? It's probably just the number one. He or she has to just know, if you added 1 to the letters, that they should subtract 1 to the letters. And hopefully they know that if you hit Z, you should probably wrap around to A and not get into a weird punctuation or something like that. So you can keep an algorithm as simple as that. So we can think of cryptography, really, as just an example of problem-solving. You want to send a message from someone, yourself, to someone else, maybe over a very insecure medium like passing a note through the room. And you want only one person to know how to access it. That's like providing inputs, and you want outputs-- your plaintext and your ciphertext-- so that no one can understand it except you and the recipient. 
So it turns out that cryptography-- there's different forms of it, but perhaps the simplest looks like this. There's two inputs, the plaintext, the message you want to actually send, and then the key, which might be a number like 1 or 2 or 25 or 26. And more than that's probably silly because you're just wrapping around the alphabet even more, so to speak. But the output is going to be something called ciphertext. And when your crush receives this message, he or she really just needs to reverse the process. They have to know the key. Otherwise, they're going to be guessing all day long what your message actually was. But so long as you know the secret in advance, you can do this. Now, of course, there's a gotcha. You have to be on speaking terms with this person you're crushing on because he or she needs to know what the key is in advance. Otherwise, you're just sending them nonsensical values. So that's kind of, too, a catch-22. In order to send a secret message from A to B, A and B need to be able to confer in advance and agree on this secret. But if you need to agree in advance on a secret, why don't you just use that time to send the message directly to the person? Right? So there's this disconnect. And we'll come back to this before long because most of us probably don't know someone who works at, like, amazon.com. And yet when I buy something on Amazon, I've been told all these years that it's secure. It's encrypted. My credit card, my name, and all of that are somehow encrypted between me and Amazon in Seattle or wherever their servers are. But I don't know anyone there. And yet somehow, cryptography still works. So this type of cryptography is just one, called secret-key cryptography. But there's public-key cryptography and yet other things. 
And so what you'll find in problem set two in particular is you'll have an opportunity to explore this world, whereby you'll write software that encrypts and then, hopefully, decrypts information and even, if you're among those more comfortable, an opportunity to try writing software that takes passwords that are encrypted-- or, more properly, hashed, so to speak; more on that before long-- and you try to crack those passwords, actually figure out what the passwords actually were. And it all boils down to, ultimately, in the context of C, taking as input a message, like a plaintext, and somehow converting it to ciphertext by manipulating those individual characters, or, if you're the recipient, vice versa. And I like to show a clip from, frankly, a film you can watch, like, literally every hour on the hour around the holidays, "A Christmas Story," because it has an example of a very simple form of cryptography. If you ever saw this movie, this is little Ralphie. And he's really excited because over months or whatever, he saves up and sends in, like, all of these, like, cereal box covers or something like that, and gets back, finally, this secret decoder ring. And the secret decoder ring is kind of a nice mental model to have for the type of cryptography I'm proposing here, this sort of rotational idea-- A becomes B. B becomes C. Because if you imagine a ring that has another ring on the outside, you can kind of line up the A's and Z's, so to speak, differently. And that's what he was saving up for. So I thought we'd take just a moment to look at this clip to inspire one of the problems ahead. [VIDEO PLAYBACK] - Be it known to all and sundry that Ralph Parker is hereby appointed a member of the Little Orphan Annie secret circle and is entitled to all the honors and benefits occurring thereto. - Signed Little Orphan Annie! Countersigned Pierre Andre! In ink! Honors and benefits already, at the age of nine. - Let's go overboard! - Come on. Let's get on with it. 
I don't need all that jazz about smugglers and pirates. - Listen tomorrow night for the concluding adventure of the black pirate ship. Now it's time for Annie's secret message for you members of the secret circle. Remember, kids, only members of Annie's secret circle can decode Annie's secret message. Remember, Annie is depending on you. Set your pins to B2. Here is the message. 12, 11-- - I am in. My first secret meeting. - --14, 11, 18, 16-- - Oh, Pierre was in great voice tonight. I could tell that tonight's message was really important. - --3, 25. That's a message from Annie herself. Remember, don't tell anyone. - 90 seconds later I'm in the only room in the house where a boy of nine could sit in privacy and decode. Aha! B! I went to the next. E. The first word is "be!" S. It was coming easier now. U. - Aw, come on, Ralphie! I got to go! - I'll be right down, Ma! Gee, whiz. - T. O! "Be sure to"-- be sure to what? What was Little Orphan Annie trying to say? "Be sure to" what? - Ralphie, Randy has got to go. Will you please come out? - All right, Ma! I'll be right out! - I was getting closer now. The tension was terrible. What was it? The fate of the planet may hang in the balance. [KNOCKING] - Ralph, Randy's got to go! - I'll be right out, for crying out loud! - Gee, almost there! My fingers flew. My mind was a steel trap. Every pore vibrated. It was almost clear! Yes! Yes! Yes! Yes! - "Be sure to drink your Ovaltine." Ovaltine? A crummy commercial? Son of a bitch! [END PLAYBACK] DAVID MALAN: That's it for CS50. We'll see you next time. [APPLAUSE]