Interview Q SAS
Answer: To override the default way in which the DATA step writes observations to output, use an OUTPUT statement in the DATA step. An explicit OUTPUT statement overrides the automatic output, so observations are added to a data set only when the explicit OUTPUT statement executes.

Question: What is the function of the STOP statement?
Answer: The STOP statement causes SAS to stop processing the current DATA step immediately and to resume processing with the statements after the end of that DATA step.

Question: What is the difference between using the DROP= data set option in the DATA statement and in the SET statement?
Answer: If you do not want to process certain variables and do not want them to appear in the new data set, specify DROP= in the SET statement. If you want to process certain variables but do not want them to appear in the new data set, specify DROP= in the DATA statement.

Question: Given an unsorted data set, how do you read the last observation into a new data set?
Answer: Use the END= option in the SET statement. For example:

    data work.calculus;
       set work.comp end=last;
       if last;
    run;

Here Calculus is the new data set to be created, Comp is the existing data set, and last is a temporary variable (initialized to 0) that is set to 1 when the SET statement reads the last observation.

Question: What is the difference between reading data from an external file and reading data from an existing data set?
Answer: The main difference is that when an existing data set is read with the SET statement, SAS retains the values of its variables from one observation to the next. Variables read from an external file with the INPUT statement are reset to missing at the start of each iteration.

Question: What is the difference between SAS functions and procedures?
Answer: A function expects its argument values to be supplied across an observation in a SAS data set, whereas a procedure expects one variable value per observation. For example:

    data average;
       set temp;
       avgtemp = mean(of T1-T24);
    run;

Here the arguments of the MEAN function are taken across an observation.

    proc sort;
       by month;
    run;

    proc means;
       by month;
       var avgtemp;
    run;

PROC MEANS is used to calculate the average temperature by month (taking one variable value per observation).

Question: What is the difference between the SUM function and the + operator?
Answer: The SUM function returns the sum of the non-missing arguments, whereas the + operator returns a missing value if any of the arguments is missing. Example:

    data mydata;
       input x y z;
       cards;
    33 3 3
    24 3 4
    24 3 4
    . 3 2
    23 . 3
    54 4 .
    35 4 2
    ;
    run;

    data mydata2;
       set mydata;
       a = sum(x, y, z);
       p = x + y + z;
    run;

In the output, the value of p is missing for the 4th, 5th and 6th observations:

     a    p
    39   39
    31   31
    31   31
     5    .
    26    .
    58    .
    41   41
Question: What would be the result if all the arguments in the SUM function are missing?
Answer: A missing value.

Question: What would be the denominator used by the MEAN function if two out of seven arguments are missing?
Answer: Five.

Question: Give an example where SAS fails to convert a character value to a numeric value automatically.
Answer: Suppose the value of a variable PayRate begins with a dollar sign ($). When SAS tries to convert the values of PayRate to numeric values automatically, the dollar sign blocks the process and the values cannot be converted. Therefore, it is always best to use the INPUT and PUT functions explicitly in your programs when conversions occur.

Question: What numeric value results from automatic character-to-numeric conversion when the character value 1,735.00 is used in an arithmetic calculation?
Answer: A missing value.

Question: What numeric value results from automatic character-to-numeric conversion when the character value 1735.00 is used in an arithmetic calculation?
Answer: 1735

Question: Which SAS statement does not perform automatic conversions in comparisons?
Answer: The WHERE statement.

Question: Briefly explain the INPUT and PUT functions.
Answer:
INPUT function: character-to-numeric conversion, INPUT(source, informat)
PUT function: numeric-to-character conversion, PUT(source, format)

Question: What would be the result of the following SAS functions (given that 31 Dec 2000 is a Sunday)?

    Weeks  = intck('week',  '31dec2000'd, '01jan2001'd);
    Years  = intck('year',  '31dec2000'd, '01jan2001'd);
    Months = intck('month', '31dec2000'd, '01jan2001'd);

Answer: Weeks=0, Years=1, Months=1. INTCK counts interval boundaries crossed, and both dates fall in the same Sunday-to-Saturday week but in different months and years.

Question: What are the parameters of the SCAN function?
Answer: scan(argument, n, delimiters)
argument specifies the character variable or expression to scan
n specifies which word to read
delimiters are special characters that must be enclosed in single quotation marks

Question: Suppose the variable address stores the following value:
209 RADCLIFFE ROAD, CENTER CITY, NY, 92716
What would the SCAN function return in the following cases?

    a = scan(address, 3);
    b = scan(address, 3, ',');

Answer: a=ROAD; b=NY (b keeps a leading blank, because only the comma is treated as a delimiter).

Question: What length does the SCAN function assign to the target variable?
Answer: 200. (This is the behavior of older releases; the documentation for more recent releases states that the target variable is given the length of the first argument.)

Question: Name a few SAS functions.
Answer: SCAN, SUBSTR, TRIM, CATX, INDEX, TRANWRD, FIND, SUM.

Question: What does the TRANWRD function do?
Answer: The TRANWRD function replaces or removes all occurrences of a pattern of characters within a character string.

Question: Consider the following SAS program:

    data finance.earnings;
       Amount = 1000;
       Rate = .075/12;
       do month = 1 to 12;
          Earned + (amount + earned) * (rate);
       end;
    run;

What would be the value of month at the end of DATA step execution, and how many observations would there be?
Answer: The value of month would be 13, and there would be 1 observation.

Question: Consider the following SAS program:

    data finance;
       Amount = 1000;
       Rate = .075/12;
       do month = 1 to 12;
          Earned + (amount + earned) * (rate);
          output;
       end;
    run;

How many observations would there be at the end of DATA step execution?
Answer: 12

Question: How do you use a DO loop if you do not know how many times it should execute?
Answer: Use DO UNTIL or DO WHILE to specify the condition.

Question: What is the difference between DO WHILE and DO UNTIL?
Answer: The DO WHILE expression is evaluated at the top of the DO loop. If the expression is false the first time it is evaluated, the DO loop never executes. The DO UNTIL expression is evaluated at the bottom of the loop, so DO UNTIL always executes at least once.

Question: How do you specify a number of iterations and a specific condition within a single DO loop?
Answer:

    data work;
       do i = 1 to 20 until (Sum >= 20000);
          Year + 1;
          Sum + 2000;
          Sum + Sum * .10;
       end;
    run;

This iterative DO statement executes the DO loop until Sum is greater than or equal to 20000 or until the loop has executed 20 times, whichever occurs first.

Question: How many data types are there in SAS?
Answer: Two: character and numeric.

Question: If a variable contains only numbers, can it be of character data type? Give an example.
Answer: Yes, it depends on how you use the variable. Example: ID and Zip contain only digits but can be stored as character variables.

Question: If a variable contains letters or special characters, can it be of numeric data type?
Answer: No, it must be of character data type.

Question: What can be the size of the largest data set in SAS?
Answer: The number of observations is limited only by the computer's capacity to handle and store them. Prior to SAS 9.1, SAS data sets could contain up to 32,767 variables. In SAS 9.1, the maximum number of variables in a SAS data set is limited by the resources available on your computer.
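The DO WHILE / DO UNTIL difference described above can be seen by giving both loops a condition that is already decided before the first pass. The following is a minimal sketch, not taken from the original material; the variable names n, while_runs and until_runs are illustrative only.

```sas
/* Sketch: compare top-of-loop and bottom-of-loop evaluation. */
data _null_;
   n = 10;

   /* DO WHILE tests at the top: n < 5 is false, so the body never runs. */
   while_runs = 0;
   do while (n < 5);
      while_runs = while_runs + 1;
      n = n + 1;
   end;

   /* DO UNTIL tests at the bottom: the body runs once before n >= 5 is checked. */
   until_runs = 0;
   do until (n >= 5);
      until_runs = until_runs + 1;
      n = n + 1;
   end;

   /* Writes while_runs=0 until_runs=1 to the log. */
   put while_runs= until_runs=;
run;
```

DATA _NULL_ is used here so that the step only writes to the log and creates no data set.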
Question: Give some examples where PROC REPORT's defaults are different from PROC PRINT's defaults.
Answer:
PROC REPORT prints no record (observation) numbers.
PROC REPORT uses labels (not variable names) as column headers.
PROC REPORT needs the NOWINDOWS option.

Question: Give some examples where PROC REPORT's defaults are the same as PROC PRINT's defaults.
Answer:
Question: Highlight the major difference between the two programs below.

a.
    data mydat;
       input ID Age;
       cards;
    2 23
    4 45
    3 56
    9 43
    ;
    run;

    proc report data=mydat nowd;
       column ID Age;
    run;

b.
    data mydat1;
       input grade $ ID Age;
       cards;
    A 2 23
    B 4 45
    C 3 56
    D 9 43
    ;
    run;

    proc report data=mydat1 nowd;
       column Grade ID Age;
    run;
Answer: When all the variables in the input data set are numeric, PROC REPORT sums them by default. Thus the first program generates one record in the list report, whereas the second generates four records.

Question: In the above program, how will you avoid having the sum of numeric variables?
Answer: To avoid having the sum of numeric variables, one or more of the input variables must be defined as DISPLAY. Thus we have to use:

    proc report data=mydat nowd;
       column ID Age;
       define ID / display;
    run;

Question: What is the difference between an ORDER variable and a GROUP variable in PROC REPORT?
Answer:
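If a variable is defined as a GROUP variable, rows that have the same value are collapsed into a single row. An ORDER variable only sorts the rows, and every observation is still printed. GROUP variables therefore produce a summary report, whereas ORDER variables produce a list report.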
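To see the two usages side by side, the sketch below runs PROC REPORT twice on SASHELP.CLASS, a sample data set shipped with SAS (its use here is an assumption for illustration, not part of the original answer). The first step defines Sex as ORDER, the second as GROUP.

```sas
/* ORDER: one row per observation, sorted by Sex. */
proc report data=sashelp.class nowd;
   column sex age height;
   define sex    / order;
   define age    / display;
   define height / display;
run;

/* GROUP: observations with the same Sex collapse into one row,
   and Height is summarized as a mean for each group. */
proc report data=sashelp.class nowd;
   column sex height;
   define sex    / group;
   define height / analysis mean format=6.1;
run;
```

In the second step every other column is an analysis variable; if a DISPLAY variable were left in the COLUMN statement, the rows could not be collapsed.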
Question: Give some ways by which you can define the variables to produce a summary report (using PROC REPORT).
Answer: All of the variables in a summary report must be defined as group, analysis, across, or computed variables.

Question: What are the default statistics for the MEANS procedure?
Answer: N (count), mean, standard deviation, minimum, and maximum.

Question: How do you limit decimal places for a variable using PROC MEANS?
Answer: By using the MAXDEC= option.

Question: What is the difference between the CLASS statement and the BY statement in PROC MEANS?
Answer:
Unlike CLASS processing, BY processing requires that your data already be sorted or indexed in the order of the BY variables.
BY group results have a layout that is different from the layout of CLASS group results.
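The two approaches can be compared with a short sketch. It assumes the SASHELP.CLASS sample data set, and work.class_sorted is a hypothetical name chosen for the sorted copy.

```sas
/* CLASS: no prior sorting is needed; all groups appear in one table. */
proc means data=sashelp.class maxdec=1;
   class sex;
   var height;
run;

/* BY: the data must first be sorted (or indexed) by the BY variable. */
proc sort data=sashelp.class out=work.class_sorted;
   by sex;
run;

/* Each BY group is reported in its own table. */
proc means data=work.class_sorted maxdec=1;
   by sex;
   var height;
run;
```

Sorting into a separate output data set with OUT= leaves the original data set untouched.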
Question: What is the difference between PROC MEANS and PROC SUMMARY?
Answer: PROC MEANS produces a report by default. By contrast, to produce a report with PROC SUMMARY, you must include the PRINT option in the PROC SUMMARY statement.

Question: How do you specify the variables to be processed by the FREQ procedure?
Answer: By using the TABLES statement.

Question: Describe the CROSSLIST option in the TABLES statement.
Answer: Adding the CROSSLIST option to the TABLES statement displays crosstabulation tables in ODS column format.

Question: How do you create list output for crosstabulations in PROC FREQ?
Answer: To generate list output for crosstabulations, add a slash (/) and the LIST option to the TABLES statement in your PROC FREQ step.

    TABLES variable-1*variable-2 <* variable-n> / LIST;

Question: PROC MEANS works for ________ variables and PROC FREQ works for ______ variables?
Answer: Numeric, categorical.

Question: How can you combine two data sets based on the relative position of rows in each data set; that is, the first observation in one data set is joined with the first observation in the other, and so on?
Answer: One-to-one reading.

Question:

    data concat;
       set a b;
    run;

The format of variable Revenue in data set a is dollar10.2, and the format of variable Revenue in data set b is dollar12.2. What would be the format of Revenue in the resulting data set (concat)?
Answer: dollar10.2

Question: If you want to combine two data sets so that observations in each BY group in each data set in the SET statement are read sequentially, in the order in which the data sets and BY variables are listed, which method of combining data sets will work?
Answer: Interleaving.

Question: While match-merging two data sets, you cannot use the __________ option with indexed data sets because indexes are always stored in ascending order.
Answer: DESCENDING

Question: I have a data set concat with variables a, b and c. How do I rename a and b to e and f?
Answer:

    data concat(rename=(a=e b=f));
       set concat;
    run;

Question: What is the difference between a one-to-one merge and a match merge? Give an example.
Answer: If both data sets in the MERGE statement are sorted by id (as shown below) and each observation in one data set has a corresponding observation in the other data set, a one-to-one merge is suitable.

    data mydata1;
       input id class $;
       cards;
    1 Sa
    2 Sd
    3 Rd
    4 Uj
    ;

    data mydata2;
       input id class1 $;
       cards;
    1 Sac
    2 Sdf
    3 Rdd
    4 Lks
    ;

    data mymerge;
       merge mydata1 mydata2;
    run;

If the observations do not match, then match-merging is suitable:

    data mydata1;
       input id class $;
       cards;
    1 Sa
    2 Sd
    2 Sp
    3 Rd
    4 Uj
    ;

    data mydata2;
       input id class1 $;
       cards;
    1 Sac
    2 Sdf
    3 Rdd
    3 Lks
    5 Ujf
    ;

    data mymerge;
       merge mydata1 mydata2;
       by id;
    run;

1> What is the difference between CALL SYMPUT and CALL SYMPUTX?
2> In how many ways can you create macro variables?
3> Explain the compilation and execution phases of the DATA step when reading a SAS data set and a raw data file.
4> What is the difference between multiple SET statements and a single SET statement in the DATA step?
5> What is the difference between appending and concatenation?
6> What is the difference between multiple SET statements and the MERGE statement in the DATA step?
7> What is the difference between the INTCK and INTNX functions?
8> What is the difference between a SQL pass-through query and a LIBNAME statement?
9> What is the difference between %STR and %NRSTR in macros?
10> Which macro options have you used?
11> Which automatic macro variables have you used?
12> How do you save a macro permanently?
13> How do you delete macro programs?
14> What is the difference between the global symbol table and the local symbol table?
Questions asked by QUINTILES:
1) Tell me something about yourself or introduce yourself.
2) What is bioinformatics? (Detailed questions on the background of a student.)
3) How does SAS differ from other languages that you have learned, like C++, Java, and HTML?
4) How comfortable are you with SAS?
5) Why did you select SAS as a career?
6) How do you read and write a raw data file?
7) Explain the SCAN function and the INDEX function in detail.
8) How can you calculate age?
9) What are informats and formats? (In detail.)
10) In how many ways can you create a macro variable?
11) What is the difference between the %LET and %GLOBAL statements?
12) How do you create multiple macro variables in PROC SQL?
13) What is a macro?
14) Can you write a raw data file in CSV format by using the PUT statement?
15) Why do we use DATA _NULL_?
16) Questions on multiple SET statements.
17) Which SAS version have you used?
18) Why have you changed your domain to SAS?
19) Applications of SAS.
20) What will you do if new software comes along after one year?
21) Which steps make up a SAS program?
22) What is the type of a macro variable?
23) How can we apply a user-defined format?
24) How does SAS store dates?
25) Value of 01JAN1960.
26) Explain PROC COMPARE. Why is it used?
27) What is ODS? (Tell me about ODS.)
28) Can we create a text file using ODS? How?
29) Explain PROC IMPORT and PROC EXPORT. (In detail.)
30) Why do we use the QUIT statement in PROC SQL instead of the RUN statement?
31) Which options do we use in PROC SORT to store the duplicate observations?
32) Explain PROC MEANS.
33) Difference between PROC MEANS and PROC SUMMARY.
34) Explain the UNIVARIATE procedure. What is its default output?
35) PLOT option in the UNIVARIATE procedure.
36) Explain CALL SYMPUTX and SYMGET.
37) How can we delete macro variables?
38) ID statement in the TRANSPOSE procedure.
39) BY and CLASS statements in the MEANS procedure.
40) Where did you learn SAS?
41) What is Epoch Research Institute? (In detail.)
42) Why and how did you join this institute?
43) Tell me about your work experience.
44) Have you done any live or demo project?
45) Do you have any questions?
Frequently Asked Advanced SAS Interview Questions
1> What is the use of macros in SAS?
2> What is the autocall facility?
3> In how many ways can we create macro variables in the global symbol table?
4> In how many ways can we create macro variables in the local symbol table?
5> What is the difference between CALL SYMPUTX and CALL SYMPUT?
6> What is the difference between keyword parameters and positional parameters?
7> What is the difference between the %NRSTR and %STR functions?
8> What is SYMGET?
9> How are multiple ampersands resolved?
10> Write a program to delete all the user-defined macro variables from the global symbol table.
11> Explain the syntax of PROC SQL.
12> What is the difference between subqueries and SQL views?
13> What is the difference between inner joins and outer joins? Explain with examples.
14> How do you vertically combine SAS data sets by using PROC SQL, and how is that different from DATA step concatenation?
15> In how many ways can we create a SAS data set by using PROC SQL?
16> How do you create macro variables by using PROC SQL?
17> How do you save a program by using the DATA step?
18> What is the difference between joins and merging?
19> What is benchmarking in SAS?
20> Explain indexes. In how many ways can we create indexes?
21> Explain SAS views. In how many ways can we create SAS views?
Frequently Asked Base SAS Interview Questions
1> In how many ways can we combine SAS data sets by using the DATA step?
2> What is the difference between interleaving and concatenation?
3> What is the difference between the PUT statement and the PUT function?
4> What is the difference between the INPUT statement and the INPUT function?
5> What is the difference between the INFILE statement and the FILE statement?
6> What is the use of _NULL_ in the DATA step?
7> What is the difference between PROC MEANS and PROC SUMMARY?
8> How do you delete a SAS data set?
9> How do you rename a SAS data set?
10> What is the difference between SUBSTR on the left and SUBSTR on the right of the equal sign?
11> What is the difference between the SUBSTR and TRANWRD functions?
12> What is the default output of PROC MEANS?
13> What is a lookup table in SAS?
14> How do you create a lookup table in SAS?
15> Name some array functions.
16> What is the difference between NODUP and NODUPKEY in PROC SORT?
17> What is the use of the BY and ID statements in PROC PRINT?
18> What is the syntax of PROC TRANSPOSE?
19> Explain PROC DATASETS.
20> Explain PROC COPY and PROC COMPARE with examples.
As SAS programmers we use the following resources:
1> CPU time
2> Network bandwidth
3> Virtual memory
4> Programmer's time
5> I/O
6> Data storage space
An efficient SAS programmer should always benchmark SAS programs and conserve these resources.
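One common way to benchmark a step is to turn on the FULLSTIMER system option and read the resource statistics that SAS then writes to the log after each step. The sketch below is illustrative only: work.bench and work.bench_sorted are hypothetical data set names, and the statistics reported depend on the operating environment.

```sas
options fullstimer;   /* request detailed resource statistics in the log */

/* Hypothetical test data: one million random values. */
data work.bench;
   do i = 1 to 1000000;
      x = ranuni(0);
      output;
   end;
run;

/* The log note for this step reports the resources it consumed. */
proc sort data=work.bench out=work.bench_sorted;
   by x;
run;

options nofullstimer; /* return to the shorter default statistics */
```

Running the same step several times and comparing the log notes gives a more reliable picture than a single run.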
We have a SAS data set work._ALL_. How do you see the contents of the SAS data set Work._ALL_?

SAS Interview Questions II
1. What has been your most common programming mistake?
2. What is your favorite programming language and why?
3. What is your favorite operating system? Why?
4. Do you observe any coding standards? What is your opinion of them?
5. What percent of your program code is usually original and what percent copied and modified?
6. Have you ever had to follow SOPs or programming guidelines?
7. Which is worse: not testing your programs or not commenting your programs?
8. Name several ways to achieve efficiency in your program. Explain trade-offs.
9. What other SAS products have you used and consider yourself proficient in using?
10. How do you make use of functions?
11. When looking for data contained in a character string of 150 bytes, which function is the best to locate that data: SCAN, INDEX, or INDEXC?
12. What do the PUT and INPUT functions do?
13. What do the MOD and INT functions do?
14. How might you use MOD and INT on numerics to mimic SUBSTR on character strings?
15. In ARRAY processing, what does the DIM function do?
16. How would you determine the number of missing or non-missing values in computations?
17. What is the difference between: x=a+b+c+d; and x=SUM(a,b,c,d);?
18. There is a field containing a date. It needs to be displayed in the format "ddmonyy" if it's before 1975, "dd mon ccyy" if it's after 1985, and as "Disco Years" if it's between 1975 and 1985. How would you accomplish this in DATA step code? Using only PROC FORMAT?
19. In the following DATA step, what is needed for 'fraction' to print to the log?
    data _null_; x=1/3; if x=.3333 then put 'fraction'; run;
20. What is the difference between calculating the mean using the MEAN function and PROC MEANS?
21. Have you ever used "Proc Merge"? (Be prepared for surprising answers.)
22. If you were given several SAS data sets you were unfamiliar with, how would you find out the variable names and formats of each data set?
23. What SAS PROCs have you used and consider yourself proficient in using?
24. How would you keep SAS from overlaying a SAS data set with its sorted version?
25. In PROC PRINT, can you print only variables that begin with the letter "A"?
26. What are some differences between PROC SUMMARY and PROC MEANS?
27. Code the TABLES statement for a single-level (most common) frequency.
28. Code the TABLES statement to produce a multi-level frequency.
29. Name the option to produce frequency line items rather than a table.
30. Produce output from a frequency. Restrict the printing of the table.
31. Code a PROC MEANS that shows both summed and averaged output of the data.
32. Code the option that will allow MEANS to include missing numeric data in the report.
33. Code the MEANS to produce output to be used later.
34. Do you use PROC REPORT or PROC TABULATE? Which do you prefer? Explain.
35. What happens in a one-to-one merge? When would you use one?
36. How would you combine 3 or more tables with different structures?
37. What is a problem with merging two data sets that have variables with the same name but different data?
38. When would you choose to MERGE two data sets together and when would you SET two data sets?
39. Which data set is the controlling data set in the MERGE statement?
40. How do the IN= variables improve the capability of a MERGE?
41. Explain the message "MERGE HAS ONE OR MORE DATASETS WITH REPEATS OF BY VARIABLES".
42. How would you generate 1000 observations from a normal distribution with a mean of 50 and standard deviation of 20? How would you use PROC CHART to look at the distribution? Describe the shape of the distribution.
43. How do you generate random samples?
44. What is the purpose of the statement DATA _NULL_;?
45. What is the pound sign used for in the DATA _NULL_?
46. What would you use the trailing @ sign for?
47. For what purpose(s) would you use the RETURN statement?
48. How would you determine how far down on a page you have printed in order to print out footnotes?
49. What is the purpose of using the N=PS option?
50. What system options would you use to help debug a macro?
51. Describe how you would create a macro variable.
52. How do you identify a macro variable?
53. How do you define the end of a macro?
54. How do you assign a macro variable to a SAS variable?
55. For what purposes have you used SAS macros?
56. What is the difference between %LOCAL and %GLOBAL?
57. How long can a macro variable be? A token?
58. If you use a SYMPUT in a DATA step, when and where can you use the macro variable?
59. What do you code to create a macro? End one?
60. Describe how you would pass data to a macro.
61. You have five data sets that need to be processed identically; how would you simplify that processing with a macro?
62. How would you code a macro statement to produce information on the SAS log? This statement can be coded anywhere.
63. How do you add a number to a macro variable?
64. If you need the value of a variable rather than the variable itself, what would you use to load the value to a macro variable?
65. Can you execute a macro within a macro? Describe.
66. Can you define a macro within another macro? If so, how would SAS know where the current macro ended and the new one began?
67. How are parameters passed to a macro?
68. Name statements that are recognized at compile time only.
69. Name statements that are execution only.
70. In the flow of DATA step processing, what is the first action in a typical DATA step?
71. Name statements that function at both compile and execution time.
72. What is the smallest length of a numeric and a character variable, respectively?

1) Which date function advances a date, time, or datetime value by a given interval?
A. INTNX

2) How can you call macros within a DATA step?
A. A macro can be invoked from a DATA step with CALL EXECUTE. CALL SYMPUTX, by contrast, creates macro variables from DATA step values.

3) In the flow of DATA step processing, what is the first action in a typical DATA step?
A. When you submit a DATA step, SAS first compiles it: it checks the syntax and creates the input buffer and the program data vector (PDV). Only then does the execution phase begin, in which the data are processed and the new SAS data set is created.

4) How do you identify a macro variable?
A. By the ampersand (&).

5) What are SAS/ACCESS and SAS/CONNECT?
A. SAS/ACCESS provides interfaces to external databases such as Oracle, SQL Server, MS Access, etc. SAS/CONNECT provides connections between SAS sessions on a client and a server.

6) What is the one statement to set the criteria of data that can be coded in any step?
A. The WHERE statement, which can be used in both DATA and PROC steps. (The OPTIONS statement can also be coded anywhere, but it sets system options rather than selection criteria.)

7) What is the purpose of using the N=PS option?
A. The N=PS option creates a buffer in memory that is large enough to store PS (page size) lines and enables a page to be formatted randomly prior to being printed.

8) What are the scrubbing procedures in SAS?
A. PROC SORT with the NODUPKEY option, because it eliminates duplicate values.

9) What are the new features included in the new versions of SAS?
A. The main advantages of version 9 are faster execution of applications and centralized access to data and support.

10) What difference did you find among versions 6, 8 and 9?
A. The architecture is fundamentally different from any prior version of SAS. In the SAS 9 architecture, SAS relies on a new component, the metadata server, to provide an information layer between the programs and the data they access. Metadata, such as security permissions for SAS libraries and where the various SAS servers are running, are maintained in a common repository.

11) What are the advantages of using SAS for clinical data management? Why should we not use other software products to manage clinical data?
A. ADVANTAGES OF USING A SAS-BASED SYSTEM:

Less hardware is required. A typical SAS-based system can use a standard file server to store its databases and does not require one or more dedicated servers to handle the application load. PC SAS can easily be used to handle processing, while data access is left to the file server. Additionally, it is possible to use the SAS product SAS/SHARE to provide a dedicated server to handle data transactions.

Fewer personnel are required. Systems that use complicated database software often require the hiring of one or more DBAs who make sure the database software is running, make changes to the structure of the database, etc. These individuals often require special training or background experience in the particular database application being used, typically Oracle. Additionally, consultants are often required to set up studies, since dedicated servers and specific expertise requirements often complicate the process. Users with even casual SAS experience can set up studies. A programmer can build the structure of the database and design screens. Organizations that are involved in data management almost always have at least one SAS programmer already on staff. That programmer understands how the system actually works, which allows its functionality to be extended by directly accessing SAS data from outside the system.

Speed of setup is dramatically increased. By keeping studies on a local file server and making the database and screen design process extremely simple and intuitive, setup time is reduced from weeks to days.

All phases of the data management process become homogeneous. From entry to analysis, data reside in SAS data sets, often the end goal of every data management group. Additionally, SAS users are involved in each step, instead of specialists from different areas handing off pieces of studies during the project's life cycle.

No data conversion is required. Since the data reside in SAS data sets natively, no conversion programs need to be written.

Data review can happen during the data entry process, on the master databases. As long as records are marked as being double-keyed, data review personnel can run edit check programs and build queries on some patients while others are still being entered.

Tables and listings can be generated on live data. This helps speed up the development of table and listing programs and allows programmers to avoid making continual copies or extracts of the data during testing.

12) What has been your most common programming mistake?
A. Missing semicolons, not checking the log after submitting a program, not using debugging techniques, and not making enough use of FSVIEW were the common errors I made when I started learning SAS and in my initial projects.

13) Have you ever had to follow SOPs or programming guidelines?
A. An SOP describes the process used to assure that standard coding activities, which produce tables, listings and graphs, functions and/or edit checks, are conducted in accordance with industry standards and are appropriately documented.

14) Name several ways to achieve efficiency in your program. Explain trade-offs.
A. Efficiency and performance strategies can be classified into 5 different areas: CPU time, data storage, elapsed time, input/output, and memory. CPU time and elapsed time serve as baseline measurements.
Efficiency-improving techniques:
Use KEEP and DROP statements to retain only the necessary variables.
Use macros to reduce the amount of code.
Use IF-THEN/ELSE statements to process data conditionally.
Use the SQL procedure to reduce the number of programming steps.
Use LENGTH statements to reduce variable sizes and therefore data storage.

15) What other SAS products have you used and consider yourself proficient in using?
A. DATA _NULL_ steps, PROC MEANS, PROC REPORT, PROC TABULATE, PROC FREQ, PROC PRINT, PROC UNIVARIATE, etc.

16) What is the significance of the OF in x=sum(of a1-a4, a6, a9);?
A. If you do not use the OF keyword, the argument list might not be interpreted as expected. Without OF, the function above calculates a1 minus a4, plus a6 and a9, and not the sum of a1 through a4 plus a6 and a9. The same is true for the MEAN function.

17) What do the PUT and INPUT functions do?
A. The INPUT function converts character data values to numeric values. The PUT function converts numeric values to character values. Example for INPUT: input(source, informat)

SAS Interview Questions
1) Can the MERGE statement give output without a BY statement? Give at least one example.
2) I have the following two data sets:

    New        Test
    a  b       a  b
    1  5       3  2
    3  6       1  4
    5  7       5  6

I am submitting the following code. (Assume that the NEW and TEST data sets are sorted by the variables A and B.)

    data nete;
       merge new test;
       by a b;
    run;

What will be my output?
3) Difference between a DATA step merge and a SQL join, with an example.
4) How do you define merging, concatenating and appending?
5) Assume that you have 50 observations in one data set and 3000 observations in another data set. If you use a DATA step MERGE to combine the data sets, how many observations are read from the larger data set (i.e. the second data set)?
6) What is interleaving?
7) How do you insert comments in a macro?
8) Why do we use an index? Is it necessary to create one?
9) When you use a SET statement with multiple data sets, suppose the variable salary is defined as character in one data set and as numeric in another. Does the SET statement work without error?
Interview Questions asked By ICON:
1) Tell me about yourself.
2) What did you do after graduation, and how did you come to SAS?
3) What have you learned in Base SAS?
4) What are the DATA step and the PROC step?
5) Define the DATA statement.
6) What are the DROP= and KEEP= options, and can they be used in a PROC step?
7) We save Word files in .doc format and Excel files in .xls format. Can you tell me the format in which a SAS data set is saved?
8) Can you tell me about the substring function and its syntax?
9) Use of the INPUT and PUT functions.
10) Explain the RETAIN statement.
11) What is merging?
12) If we write two data set names in the SET statement, what will happen?
13) Can I merge 3 data sets and, if yes, tell me the syntax.
14) Difference between merging and concatenating.
15) Difference between the LIKE operator and the CONTAINS operator.
16) NODUPKEY and NODUPS.
17) PROC TRANSPOSE.
18) PROC SQL syntax.
19) What have you done previously?
20) Use and syntax of PROC FORMAT.
21) What functions did you study during your training?
22) What is the difference between functions and procedures?
23) Explain any one function.
24) Difference between WHERE and IF.
25) Questions on formats and informats.
26) I have a raw data file in which the dates are in DATE9. form. How should I read this raw data file into a SAS data set?
27) Questions on raw data files.
28) How do you create macro variables?
29) If we have a raw data file and we use a WHERE condition, what happens?
30) What is the use of the COMPRESS function?
31) What project have you done?
32) What is the difference between the MERGE statement and a PROC SQL join?
33) What is NODUP and what is NODUPKEY?
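Question 26 above asks how to read dates stored in DATE9. form. One possible approach, sketched here with made-up in-stream records (the data set name VISITS, the variable names, and the values are assumptions), is to apply the DATE9. informat on the INPUT statement and a date format for display.

```sas
/* Hypothetical raw records: a name followed by a date such as 15MAR2005. */
data visits;
  /* The colon modifier applies the DATE9. informat with list input. */
  input name $ visit_date :date9.;
  /* Without a format the value prints as a number of days since 01JAN1960. */
  format visit_date date9.;
  datalines;
Ruth 15MAR2005
Jose 01JAN2006
;
run;

proc print data=visits;
run;
```

For an external file, the DATALINES block would be replaced by an INFILE statement pointing at the file.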
Interview Questions asked By GSK:
1) Tell me something about yourself.
2) How do you read an Excel worksheet in SAS?
3) How do you merge two data sets?
4) What methods are available for a many-to-many merge?
5) What is the difference between appending and concatenating?
6) What is interleaving?
7) What do you do so that the standard deviation appears in a listing report?
8) What is the difference between PROC MEANS and PROC SUMMARY?
9) How can we create a macro variable? When can we use CALL SYMPUTX, and why?
10) Mention the macro functions.
11) Describe clinical trial definitions.
12) Definition of safety.
13) What is the mechanism of action of paracetamol?
14) What did you study in Pharmacy?
15) What was your training period and where did you study? (in terms of SAS)
16) What was your course content? (in terms of SAS)
17) What is the use of the OUTPUT statement?
18) How do you read raw data files?
19) How do you find the standard error?
20) Types of merge?
21) PROC MEANS procedure in detail.
22) How do you create a macro in an autocall library?
23) %INCLUDE statement in detail.
24) Define macro variable.
25) ID statement in PROC REPORT.
26) ID statement in PROC TRANSPOSE.
27) RENAME= option.
28) DATA statement options, at least any 3.
29) College percentage and H.S.C. percentage.
30) LOCF, PROC TRANSPOSE, and PROC UNIVARIATE.
31) PROC REPORT ideal design.
32) Difference between APPEND and SET.
33) Difference between SET and MERGE.
34) Macro: how will you create a macro, with an example?
35) Define the different functions that can be accomplished with PROC SQL.
36) On which project have you worked?
37) Definitions of protocol, adverse event, concomitant medicine, phases of a clinical trial, and SAP.
38) Categorical and continuous data, and derived data sets (related to the project).
39) FLOW option in PROC REPORT.
40) You have a statistical background, yet you want to be a programmer. Why?
41) Statistician or programmer: which would you prefer?
42) If you have the opportunity to learn statistics, what will you do?
43) More detailed questions related to the background of the student.
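Several questions above ask how macro variables are created and when CALL SYMPUTX is useful. A minimal sketch of two common ways follows; the macro variable names CUTOFF and MEAN_HEIGHT are made up for illustration, and SASHELP.CLASS is the sample data set shipped with SAS.

```sas
/* 1. %LET assigns a macro variable as soon as the macro processor
      encounters the statement, before any step runs. */
%let cutoff = 100;

/* 2. CALL SYMPUTX assigns one while the DATA step executes, so the
      value can be computed from data. It strips leading and trailing
      blanks and converts numeric values to character. */
data _null_;
  set sashelp.class end=last;
  total_height + height;
  if last then call symputx('mean_height', total_height / _n_);
run;

%put cutoff=&cutoff mean_height=&mean_height;
```

The %PUT statement sits after the RUN statement because a macro variable created by CALL SYMPUTX is available only once the DATA step has executed.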
FREQUENTLY ASKED SAS WRITTEN TEST/INTERVIEW QUESTIONS PART 1

Objective: Gearing up for a SAS interview?

1) The following SAS program is submitted:

    data test;
      set sasuser.employees;
      if 2 le years_service le 10 then amount = 1000;
      else if years_service gt 10 then amount = 2000;
      else amount = 0;
      amount_per_year = years_service / amount;
    run;

Which one of the following values does the variable AMOUNT_PER_YEAR contain if an employee has been with the company for one year?
A. 0
B. 1000
C. 2000
D. . (missing numeric value)

2) The contents of the raw data file AMOUNT are listed below:

    ----|----10---|----20---|----30
    $1,234

The following SAS program is submitted:

    data test;
      infile amount;
      input @1 salary 6.;
      if _error_ then description = 'Problems';
      else description = 'No Problems';
    run;

Which one of the following is the value of the DESCRIPTION variable?
A. Problems
B. No Problems
C. (missing character value)
D. The value can not be determined as the program fails to execute due to errors.

3) The contents of the raw data file NAMENUM are listed below:

    ----|----10---|----20---|----30
    Joe xx

The following SAS program is submitted:

    data test;
      infile namenum;
      input name $ number;
    run;

Which one of the following is the value of the NUMBER variable?
A. xx
B. Joe
C. . (missing numeric value)
D. The value can not be determined as the program fails to execute due to errors.

4) The contents of the raw data file AMOUNT are listed below:

    ----|----10---|----20---|----30
    $1,234

The following SAS program is submitted:

    data test;
      infile amount;
      input @1 salary 6.;
    run;

Which one of the following is the value of the SALARY variable?
A. 1234
B. 1,234
C. $1,234
D. . (missing numeric value)

5) Which one of the following statements is true regarding the SAS automatic _ERROR_ variable?
A. The _ERROR_ variable contains the values ON or OFF.
B. The _ERROR_ variable contains the values TRUE or FALSE.
C. The _ERROR_ variable is automatically stored in the resulting SAS data set.
D. The _ERROR_ variable can be used in expressions or calculations in the DATA step.

6) Which one of the following is true when SAS encounters a data error in a DATA step?
A. The DATA step stops executing at the point of the error, and no SAS data set is created.
B. A note is written to the SAS log explaining the error, and the DATA step continues to execute.
C. A note appears in the SAS log that the incorrect data record was saved to a separate SAS file for further examination.
D. The DATA step stops executing at the point of the error, and the resulting DATA set contains observations up to that point.

7) The following SAS program is submitted:

    data work.totalsales (keep = monthsales{12});
      set work.monthlysales (keep = year product sales);
      array monthsales {12};
      do i = 1 to 12;
        monthsales{i} = sales;
      end;
    run;

The data set named WORK.MONTHLYSALES has one observation per month for each of five years for a total of 60 observations. Which one of the following is the result of the above program?
A. The program fails execution due to data errors.
B. The program fails execution due to syntax errors.
C. The program executes with warnings and creates the WORK.TOTALSALES data set.
D. The program executes without errors or warnings and creates the WORK.TOTALSALES data set.

8) The following SAS program is submitted:

    data work.totalsales;
      set work.monthlysales(keep = year product sales);
      retain monthsales {12};
      array monthsales {12};
      do i = 1 to 12;
        monthsales{i} = sales;
      end;
      cnt + 1;
      monthsales{cnt} = sales;
    run;

The data set named WORK.MONTHLYSALES has one observation per month for each of five years for a total of 60 observations. Which one of the following is the result of the above program?
A. The program fails execution due to data errors.
B. The program fails execution due to syntax errors.
C. The program runs with warnings and creates the WORK.TOTALSALES data set with 60 observations.
D. The program runs without errors or warnings and creates the WORK.TOTALSALES data set with 60 observations.

9) The following SAS program is submitted:

    data work.january;
      set work.allmonths (keep = product month num_sold cost);
      if month = 'Jan' then output work.january;
      sales = cost * num_sold;
      keep = product sales;
    run;

Which variables does the WORK.JANUARY data set contain?
A. PRODUCT and SALES only
B. PRODUCT, MONTH, NUM_SOLD and COST only
C. PRODUCT, SALES, MONTH, NUM_SOLD and COST only
D. An incomplete output data set is created due to syntax errors.

10) The contents of the raw data file CALENDAR are listed below:

    ----|----10---|----20---|----30
    01012000

The following SAS program is submitted:

    data test;
      infile calendar;
      input @1 date mmddyy10.;
      if date = '01012000'd then event = 'January 1st';
    run;

Which one of the following is the value of the EVENT variable?
A. 01012000
B. January 1st
C. . (missing numeric value)
D. The value can not be determined as the program fails to execute due to errors.

11) A SAS program is submitted and the following SAS log is produced:

    2 data gt100;
    3 set ia.airplanes
    4 if mpg gt 100 then output;
    22 202
    ERROR: File WORK.IF.DATA does not exist.
    ERROR: File WORK.MPG.DATA does not exist.
    ERROR: File WORK.GT.DATA does not exist.
    ERROR: File WORK.THEN.DATA does not exist.
    ERROR: File WORK.OUTPUT.DATA does not exist.
    ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, (, ;, END, KEY, KEYS, NOBS, OPEN, POINT, _DATA_, _LAST_, _NULL_.
    ERROR 202-322: The option or parameter is not recognized and will be ignored.
    5 run;

The IA libref was previously assigned in this SAS session. Which one of the following corrects the errors in the LOG?
A. Delete the word THEN on the IF statement.
B. Add a semicolon at the end of the SET statement.
C. Place quotes around the value on the IF statement.
D. Add an END statement to conclude the IF statement.

12) The contents of the raw data file SIZE are listed below:

    ----|----10---|----20---|----30
    72 95

The following SAS program is submitted:

    data test;
      infile size;
      input @1 height 2. @4 weight 2;
    run;

Which one of the following is the value of the variable WEIGHT in the output data set?
A. 2
B. 72
C. 95
D. . (missing numeric value)

13) A SAS PRINT procedure output of the WORK.LEVELS data set is listed below:

    Obs    name     level
      1    Frank      1
      2    Joan       2
      3    Sui        2
      4    Jose       3
      5    Burt       4
      6    Kelly      .
      7    Juan       1

The following SAS program is submitted:

    data work.expertise;
      set work.levels;
      if level = . then expertise = 'Unknown';
      else if level = 1 then expertise = 'Low';
      else if level = 2 or 3 then expertise = 'Medium';
      else expertise = 'High';
    run;

Which of the following values does the variable EXPERTISE contain?
A. Low, Medium, and High only
B. Low, Medium, and Unknown only
C. Low, Medium, High, and Unknown only
D. Low, Medium, High, Unknown, and (missing character value)

14) The contents of the raw data file EMPLOYEE are listed below:

    ----|----10---|----20---|----30
    Ruth  39 11
    Jose  32 22
    Sue   30 33
    John  40 44

The following SAS program is submitted:

    data test;
      infile employee;
      input employee_name $ 1-4;
      if employee_name = 'Ruth' then input idnum 10-11;
      else input age 7-8;
    run;

Which one of the following values does the variable IDNUM contain when the name of the employee is Ruth?
A. 11
B. 22
C. 32
D. . (missing numeric value)

15) The contents of the raw data file EMPLOYEE are listed below:

    ----|----10---|----20---|----30
    Ruth  39 11
    Jose  32 22
    Sue   30 33
    John  40 44

The following SAS program is submitted:

    data test;
      infile employee;
      input employee_name $ 1-4;
      if employee_name = 'Sue' then input age 7-8;
      else input idnum 10-11;
    run;

Which one of the following values does the variable AGE contain when the name of the employee is Sue?
A. 30
B. 33
C. 40
D. . (missing numeric value)

16) The following SAS program is submitted:

    libname sasdata 'SAS-data-library';
    data test;
      set sasdata.chemists;
      if jobcode = 'Chem2' then description = 'Senior Chemist';
      else description = 'Unknown';
    run;

A value for the variable JOBCODE is listed below:

    JOBCODE
    chem2

Which one of the following values does the variable DESCRIPTION contain?
A. Chem2
B. Unknown
C. Senior Chemist
D. (missing character value)

17) The following SAS program is submitted:

    libname sasdata 'SAS-data-library';
    data test;
      set sasdata.chemists;
      if jobcode = 'chem3' then description = 'Senior Chemist';
      else description = 'Unknown';
    run;

A value for the variable JOBCODE is listed below:

    JOBCODE
    CHEM3

Which one of the following values does the variable DESCRIPTION contain?
A. chem3
B. Unknown
C. Senior Chemist
D. (missing character value)

18) Which one of the following ODS statement options terminates output being written to an HTML file?
A. END
B. QUIT
C. STOP
D. CLOSE

19) The following SAS program is submitted:

    proc means data = sasuser.shoes;
      where product in ('Sandal', 'Slipper', 'Boot');
    run;

Which one of the following ODS statements completes the program and sends the report to an HTML file?
A. ods html = 'sales.html';
B. ods file = 'sales.html';
C. ods file html = 'sales.html';
D. ods html file = 'sales.html';

20) The following SAS program is submitted:

    proc format;
      value score 1 - 50 = 'Fail'
                  51 - 100 = 'Pass';
    run;
    proc report data = work.courses nowd;
      column exam;
      define exam / display format = score.;
    run;

The variable EXAM has a value of 50.5. How will the EXAM variable value be displayed in the REPORT procedure output?
A. Fail
B. Pass
C. 50.5
D. . (missing numeric value)

21) The following SAS program is submitted:

    options pageno = 1;
    proc print data = sasuser.houses;
    run;
    proc means data = sasuser.shoes;
    run;

The report created by the PRINT procedure step generates 5 pages of output. What is the page number on the first page of the report generated by the MEANS procedure step?
A. 1
B. 2
C. 5
D. 6

22) Which one of the following SAS system options displays the time on a report?
A. TIME
B. DATE
C. TODAY
D. DATETIME

23) Which one of the following SAS system options prevents the page number from appearing on a report?
A. NONUM
B. NOPAGE
C. NONUMBER
D. NOPAGENUM

24) The following SAS program is submitted:

    footnote1 'Sales Report for Last Month';
    footnote2 'Selected Products Only';
    footnote3 'All Regions';
    footnote4 'All Figures in Thousands of Dollars';
    proc print data = sasuser.shoes;
      footnote2 'All Products';
    run;

Which one of the following contains the footnote text that is displayed in the report?
A. All Products
B. Sales Report for Last Month
   All Products
C. All Products
   All Regions
   All Figures in Thousands of Dollars
D. Sales Report for Last Month
   All Products
   All Regions
   All Figures in Thousands of Dollars

25) The following SAS program is submitted:

    proc means data = sasuser.houses std mean max;
      var sqfeet;
    run;

Which one of the following is needed to display the standard deviation with only two decimal places?
A. Add the option MAXDEC = 2 to the MEANS procedure statement.
B. Add the statement MAXDEC = 7.2; in the MEANS procedure step.
C. Add the statement FORMAT STD 7.2; in the MEANS procedure step.
D. Add the option FORMAT = 7.2 option to the MEANS procedure statement.

26) Unless specified, which variables and data values are used to calculate statistics in the MEANS procedure?
A. non-missing numeric variable values only
B. missing numeric variable values and non-missing numeric variable values only
C. non-missing character variables and non-missing numeric variable values only
D. missing character variables, non-missing character variables, missing numeric variable values, and non-missing numeric variable values

27) The following SAS program is submitted:

    proc sort data = sasuser.houses out = houses;
      by style;
    run;
    proc print data = houses;
    run;

Click on the Exhibit button to view the report produced.

    style       bedrooms    baths     price
    CONDO              2      1.5     80050
                       3      2.5     79350
                       4      2.5    127150
                       2      2.0    110700
    RANCH              2      1.0     64000
                       3      3.0     86650
                       3      1.0     89100
                       1      1.0     34550
    SPLIT              1      1.0     65850
                       4      3.0     94450
                       3      1.5     73650
    TWOSTORY           4      3.0    107250
                       2      1.0     55850
                       2      1.0     69250
                       4      2.5    102950

Which of the following SAS statement(s) create(s) the report?
A. id style;
B. id style;
   var style bedrooms baths price;
C. id style;
   by style;
   var bedrooms baths price;
D. id style;
   by style;
   var style bedrooms baths price;

28) A realtor has two customers. One customer wants to view a list of homes selling for less than $60,000. The other customer wants to view a list of homes selling for greater than $100,000. Assuming the PRICE variable is numeric, which one of the following PRINT procedure steps will select all desired observations?
A. proc print data = sasuser.houses;
     where price lt 60000;
     where price gt 100000;
   run;
B. proc print data = sasuser.houses;
     where price lt 60000 or price gt 100000;
   run;
C. proc print data = sasuser.houses;
     where price lt 60000 and price gt 100000;
   run;
D. proc print data = sasuser.houses;
     where price lt 60000 or where price gt 100000;
   run;

29) The value 110700 is stored in a numeric variable. Which one of the following SAS formats is used to display the value as $110,700.00 in a report?
A. comma8.2
B. comma11.2
C. dollar8.2
D. dollar11.2

30) The SAS data set SASUSER.HOUSES contains a variable PRICE which has been assigned a permanent label of Asking Price. Which one of the following SAS programs temporarily replaces the label Asking Price with the label Sale Price in the output?
A. proc print data = sasuser.houses;
     label price = 'Sale Price';
   run;
B. proc print data = sasuser.houses label;
     label price 'Sale Price';
   run;
C. proc print data = sasuser.houses label;
     label price = 'Sale Price';
   run;
D. proc print data = sasuser.houses label = 'Sale Price';
   run;

31) The SAS data set BANKS is listed below:

    BANKS
    name             rate
    FirstCapital     0.0718
    DirectBank       0.0721
    VirtualDirect    0.0728

The following SAS program is submitted:

    data newbank;
      do year = 1 to 3;
        set banks;
        capital + 5000;
      end;
    run;

Which one of the following represents how many observations and variables will exist in the SAS data set NEWBANK?
A. 0 observations and 0 variables
B. 1 observations and 4 variables
C. 3 observations and 3 variables
D. 9 observations and 2 variables

32) The following SAS program is submitted:

    data work.clients;
      calls = 6;
      do while (calls le 6);
        calls + 1;
      end;
    run;

Which one of the following is the value of the variable CALLS in the output data set?
A. 4
B. 5
C. 6
D. 7

33) The following SAS program is submitted:

    data work.pieces;
      do while (n lt 6);
        n + 1;
      end;
    run;

Which one of the following is the value of the variable N in the output data set?
A. 4
B. 5
C. 6
D. 7

34) The following SAS program is submitted:

    data work.sales;
      do year = 1 to 5;
        do month = 1 to 12;
          x + 1;
        end;
      end;
    run;

Which one of the following represents how many observations are written to the WORK.SALES data set?
A. 0
B. 1
C. 5
D. 60

35) A raw data record is listed below:

    ----|----10---|----20---|----30
    1999/10/25

The following SAS program is submitted:

    data projectduration;
      infile file-specification;
      input date $ 1-10;
    run;

Which one of the following statements completes the program above and computes the duration of the project in days as of today's date?
A. duration = today() - put(date, ddmmyy10.);
B. duration = today() - put(date, yymmdd10.);
C. duration = today() - input(date, ddmmyy10.);
D. duration = today() - input(date, yymmdd10.);

36) A raw data record is listed below:

    ----|----10---|----20---|----30
    Printing    750

The following SAS program is submitted:

    data bonus;
      infile file-specification;
      input dept $ 1-11 number 13-15;
    run;

Which one of the following SAS statements completes the program and results in a value of Printing750 for the DEPARTMENT variable?
A. department = trim(dept) number;
B. department = dept input(number,3.);
C. department = trim(dept) || put(number,3.);
D. department = input(dept,11.) || input(number,3.);

37) The following SAS program is submitted:

    data work.month;
      date = put('13mar2000'd, ddmmyy10.);
    run;

Which one of the following represents the type and length of the variable DATE in the output data set?
A. numeric, 8 bytes
B. numeric, 10 bytes
C. character, 8 bytes
D. character, 10 bytes

38) The following SAS program is submitted:

    data work.products;
      Product_Number = 5461;
      Item = 1001;
      Item_Reference = Item'/'Product_Number;
    run;

Which one of the following is the value of the variable ITEM_REFERENCE in the output data set?
A. 1001/5461
B. 1001/ 5461
C. . (missing numeric value)
D. The value can not be determined as the program fails to execute due to errors.

39) The following SAS program is submitted:

    data work.retail;
      cost = 20000;
      total = .10 * cost;
    run;

Which one of the following is the value of the variable TOTAL in the output data set?
A. 2000
B. '2000'
C. . (missing numeric value)
D. (missing character value)

40) Which one of the following SAS statements correctly computes the average of four numerical values?
A. average = mean(num1 - num4);
B. average = mean(of num1 - num4);
C. average = mean(of num1 to num4);
D. average = mean(num1 num2 num3 num4);

41) The following SAS program is submitted:

    data work.test;
      Author = 'Agatha Christie';
      First = substr(scan(author, 1, ' ,'), 1, 1);
    run;

Which one of the following is the length of the variable FIRST in the output data set?
A. 1
B. 6
C. 15
D. 200

42) The following SAS program is submitted:

    data work.test;
      Author = 'Christie, Agatha';
      First = substr(scan(author, 2, ' ,'), 1, 1);
    run;

Which one of the following is the value of the variable FIRST in the output data set?
A. A
B. C
C. Agatha
D. (missing character value)

43) The following SAS program is submitted:

    data work.test;
      Title = 'A Tale of Two Cities, Charles J. Dickens';
      Word = scan(title, 3, ' ,');
    run;

Which one of the following is the value of the variable WORD in the output data set?
A. T
B. of
C. Dickens
D. (missing character value)

44) The following SAS program is submitted:

    data work.test;
      First = 'Ipswich, England';
      City_Country = substr(First,1,7)!!', '!!'England';
    run;

Which one of the following is the length of the variable CITY_COUNTRY in the output data set?
A. 6
B. 7
C. 17
D. 25

45) The following SAS program is submitted:

    data work.test;
      First = 'Ipswich, England';
      City = substr(First,1,7);
      City_Country = City!!', '!!'England';
    run;

Which one of the following is the value of the variable CITY_COUNTRY in the output data set?
A. Ipswich!!
B. Ipswich, England
C. Ipswich, 'England'
D. Ipswich         , England

46) Which one of the following is true of the RETAIN statement in a SAS DATA step program?
A. It can be used to assign an initial value to _N_ .
B. It is only valid in conjunction with a SUM function.
C. It has no effect on variables read with the SET, MERGE and UPDATE statements.
D. It adds the value of an expression to an accumulator variable and ignores missing values.

47) A raw data file is listed below:

    ----|----10---|----20---|----30
    1901 2
    1905 1
    1910 6
    1925 .
    1941 1

The following SAS program is submitted and references the raw data file above:

    data coins;
      infile file-specification;
      input year quantity;
    run;

Which one of the following completes the program and produces a non-missing value for the variable TOTQUANTITY
in the last observation of the output data set?
A. totquantity + quantity;
B. totquantity = sum(totquantity + quantity);
C. totquantity 0; sum totquantity;
D. retain totquantity 0; totquantity = totquantity + quantity;

48) A raw data file is listed below:

    ----|----10---|----20---|----30
    squash 1.10
    apples 2.25
    juice 1.69

The following SAS program is submitted using the raw data file above:

    data groceries;
      infile file-specification;
      input item $ cost;
    run;

Which one of the following completes the program and produces a grand total for all COST values?
A. grandtot = sum cost;
B. grandtot = sum(grandtot,cost);
C. retain grandtot 0; grandtot = sum(grandtot,cost);
D. grandtot = sum(grandtot,cost); output grandtot;

49) The following SAS program is submitted:

    data work.total;
      set work.salary(keep = department wagerate);
      by department;
      if first.department then payroll = 0;
      payroll + wagerate;
      if last.department;
    run;

The SAS data set WORK.SALARY, currently ordered by DEPARTMENT, contains 100 observations for each of 5 departments. Which one of the following represents how many observations the WORK.TOTAL data set contains?
A. 5
B. 20
C. 100
D. 500

50) The following SAS program is submitted:

    data work.total;
      set work.salary(keep = department wagerate);
      by department;
      if first.department then payroll = 0;
      payroll + wagerate;
      if last.department;
    run;

The SAS data set named WORK.SALARY contains 10 observations for each department, currently ordered by DEPARTMENT. Which one of the following is true regarding the program above?
A. The BY statement in the DATA step causes a syntax error.
B. FIRST.DEPARTMENT and LAST.DEPARTMENT are variables in the WORK.TOTAL data set.
C. The values of the variable PAYROLL represent the total for each department in the WORK.SALARY data set.
D. The values of the variable PAYROLL represent a total for all values of WAGERATE in the WORK.SALARY data set.

ANSWERS:
1: d   2: a   3: c   4: d   5: d   6: b   7: b   8: b   9: d   10: d
11: b  12: a  13: b  14: d  15: d  16: b  17: b  18: d  19: d  20: c
21: d  22: b  23: c  24: b  25: a  26: a  27: c  28: b  29: d  30: c
31: b  32: d  33: c  34: b  35: d  36: c  37: d  38: d  39: a  40: b
41: d  42: a  43: b  44: d  45: d  46: c or d  47: a  48: c  49: a  50: d or c
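One pattern from the test above that is easy to misread is a condition of the form level = 2 or 3. The sketch below, using made-up in-stream data and hypothetical variable names WRONG and RIGHT, contrasts that condition with an explicit comparison against each value.

```sas
data demo;
  input level;
  length wrong right $ 7;

  /* "level = 2 or 3" is evaluated as "(level = 2) or 3".
     The constant 3 is nonzero and nonmissing, so the condition
     is true for every observation. */
  if level = 2 or 3 then wrong = 'Medium';
  else wrong = 'High';

  /* Comparing against each value explicitly gives the intended test. */
  if level in (2, 3) then right = 'Medium';
  else right = 'High';

  datalines;
2
3
4
;
run;

proc print data=demo;
run;
```

In the printed output, WRONG is Medium on all three rows, while RIGHT is Medium for levels 2 and 3 and High for level 4.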
SAS Interview Question

You can go into a SAS interview with more confidence if you know that you are prepared to respond to the kind of technical questions that an interviewer might ask you.

What is the one statement to set the criteria of data that can be coded in any step?
A) The OPTIONS statement.

What is the effect of the OPTIONS statement ERRORS=1?
A) ERRORS=1 limits the complete data-error messages written to the log to the first observation that has an error. (Separately, the automatic variable _ERROR_ has a value of 1 if there is an error in the data for the current observation and 0 if there is not.)

What do the SAS log messages "numeric values have been converted to character" mean? What are the implications?
A) They mean that SAS converted a numeric value to character automatically because it was used where a character value is expected, for example as an argument to a character function. The automatic conversion uses the BEST12. format, so the result can carry leading blanks; converting explicitly with the PUT function avoids surprises.

Why is a STOP statement needed for the POINT= option on a SET statement?
A) Because POINT= reads only the specified observations, SAS cannot detect an end-of-file condition as it would if the file were being read sequentially.

How do you control the number of observations and/or variables read or written?
A) The FIRSTOBS= and OBS= options for observations; the KEEP= and DROP= options for variables.

Approximately what date is represented by the SAS date value of 730?
A) 31st December 1961.

Identify statements whose placement in the DATA step is critical.
A) INPUT, DATA and RUN.

Does SAS 'Translate' (compile) or does it 'Interpret'?
A) Compile.

What does the RUN statement do?
A) RUN marks the end of a DATA or PROC step; when SAS reaches it, the step is compiled and executed. If a step is followed by another DATA or PROC statement, that statement also ends the previous step, so the RUN statement can be omitted there.

Why is SAS considered self-documenting?
A) Because when a data set is created SAS stores descriptive information inside it, such as the date and time of creation, the number of variables, and their labels. You can look at that information with PROC CONTENTS.

What are some good SAS programming practices for processing very large data sets?
A) Sort them only once, and use FIRSTOBS= and OBS= to test on a subset.

What is the difference between functions and PROCs that calculate the same simple descriptive statistics?
A) Functions are used inside the DATA step, on the same data set, whereas with procedures you can create new data sets to hold the results.

If you were told to create many records from one record, show how you would do this using arrays and with PROC TRANSPOSE?
A) I would use PROC TRANSPOSE when there are few variables and arrays when there are many; it depends on the data.

What is a method for assigning first.VAR and last.VAR to the BY group variable on unsorted data?
A) FIRST. and LAST. variables require a BY statement. If the data are grouped but not sorted, add the NOTSORTED option to the BY statement; otherwise sort the data first.

How do you debug and test your SAS program?
A) First look in the log for errors, warnings, and in some cases notes, or use the DATA step debugger.

What other SAS features do you use for error trapping and data validation?
A) Check the log, and for data validation use PROC FREQ, PROC MEANS, or sometimes PROC PRINT to see what the data look like.

How would you combine 3 or more tables with different structures?
A) I would sort them by their common variables and use a MERGE statement, after clarifying what is meant by different structures.

Other questions:

What areas of SAS are you most interested in?
A) BASE, STAT, GRAPH, ETS.

Briefly describe 5 ways to do a "table lookup" in SAS.
A) Match merging, direct access, format tables, arrays, PROC SQL.

What versions of SAS have you used (on which platforms)?
A) SAS 9.1.3, 9.0, and 8.2 on Windows and UNIX; SAS 7 and 6.12.

What are some good SAS programming practices for processing very large data sets?
A) Sampling with the OBS= option or subsetting, commenting the code, and using DATA _NULL_.

What are some problems you might encounter in processing missing values? In DATA steps? Arithmetic? Comparisons? Functions? Classifying data?
A) An arithmetic operation on a missing value yields a missing value, whereas functions such as SUM and MEAN ignore missing arguments. In comparisons, a numeric missing value is smaller than any number. Most SAS statistical procedures exclude observations with any missing variable values from an analysis.

How would you create a data set with 1 observation and 30 variables from a data set with 30 observations and 1 variable?
A) Using PROC TRANSPOSE.

What is the difference between functions and PROCs that calculate the same simple descriptive statistics?
A) Procedures have a wider scope and their results can be sent to a different data set. Functions usually operate on the existing data set.

If you were told to create many records from one record, show how you would do this using an array and with PROC TRANSPOSE?
A) Declare an array for the variables in the record and loop over it with a DO loop, or use PROC TRANSPOSE with a VAR statement.

What are _numeric_ and _character_ and what do they do?
A) They are special variable lists that refer to all numeric variables and all character variables in the data set, for reading or writing.

How would you create multiple observations from a single observation?
A) Using the double trailing @@.

For what purpose would you use the RETAIN statement?
A) The RETAIN statement is used to hold the values of variables across iterations of the DATA step. Normally, variables assigned in the DATA step are set to missing at the start of each iteration.

What is the order of evaluation of the comparison operators: + - * / ** ()?
A) (), **, * and /, + and -.

How could you generate test data with no input data?
A) Using a DATA _NULL_ step and the PUT statement.

How do you debug and test your SAS programs?
A) Using OBS=0 and system options to trace the program execution in the log.

What can you learn from the SAS log when debugging?
A) It displays the execution of the whole program and its logic. It also displays each error with its line number so that you can find and edit the program.

What is the purpose of _error_?
A) It has only two values: 1 for error and 0 for no error.

How can you put a "trace" in your program?
A) By using ODS TRACE ON.

How does SAS handle missing values in: assignment statements, functions, a merge, an update, sort order, formats, PROCs?
A) A missing value is assigned as missing in an assignment statement. In sort order, numeric missing values come before all numbers: ._ (underscore) is the smallest, followed by . and then .A through .Z.
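The behaviour of missing values in assignment statements and functions can be checked with a small sketch such as the following (the data set and variable names are made up for illustration).

```sas
data miss;
  a = 5;
  b = .;
  plus_result = a + b;       /* arithmetic operator: result is missing */
  sum_result  = sum(a, b);   /* SUM ignores missing arguments: result is 5 */
  is_missing  = missing(b);  /* MISSING returns 1 for a missing argument */
run;

proc print data=miss;
run;
```

The log for this step also carries a note that missing values were generated as a result of performing an operation on missing values, which is a useful thing to look for when debugging.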

How do you test for missing values?
A) Using conditional logic such as IF-THEN/ELSE, WHERE, and SELECT.

How are numeric and character missing values represented internally?
A) Character as a blank and numeric as a period (.).

Which date function advances a date, time, or datetime value by a given interval?
A) INTNX.

In the flow of DATA step processing, what is the first action in a typical DATA step?
A) When you submit a DATA step, SAS first compiles it (checking syntax and creating the input buffer and the PDV) and then executes it to create the new SAS data set. The compilation phase therefore comes before the execution phase.

What are SAS/ACCESS and SAS/CONNECT?
A) SAS/ACCESS provides access to databases such as Oracle, SQL Server, and MS Access. SAS/CONNECT provides the connection to a server.

What is the one statement to set the criteria of data that can be coded in any step?
A) The OPTIONS statement, the LABEL statement, and the KEEP / DROP statements.

What is the purpose of using the N=PS option?
A) The N=PS option creates a buffer in memory which is large enough to store PAGESIZE (PS) lines and enables a page to be formatted randomly prior to it being printed.

What are the scrubbing procedures in SAS?
A) PROC SORT with the NODUPKEY option, because it eliminates duplicate values.

What are the new features included in the new version of SAS, i.e. SAS 9.1.3?
A) The main advantage of version 9 is faster execution of applications and centralized access to data and support. Many changes have been made in version 9 compared with version 8. The following are a few:
- SAS version 9 supports format names longer than 8 bytes, which is not possible with version 8.
- A numeric format name can be up to 32 characters in version 9, compared with 8 in version 8.
- A character format name can be up to 31 characters in version 9, compared with 8 in version 8.
- A numeric informat name can be up to 31 characters in version 9, compared with 8 in version 8.
- A character informat name can be up to 30 characters in version 9, compared with 8 in version 8.
- 3 new informats are available in version 9 to convert various date, time and datetime forms of data into a SAS date or SAS time: ANYDTDTEW. converts to a SAS date value, ANYDTTMEW. converts to a SAS time value, and ANYDTDTMW. converts to a SAS datetime value.
- The CALL SYMPUTX routine is added in version 9. It creates a macro variable at execution time in the DATA step, trimming leading and trailing blanks and automatically converting numeric values to character.
- A new ODS option (the column option) is included to create multiple columns in the output.

What difference did you find among versions 6, 8 and 9 of SAS?
A) The SAS 9 architecture is fundamentally different from any prior version of SAS. In the SAS 9 architecture, SAS relies on a new component, the Metadata Server, to provide an information layer between the programs and the data they access. Metadata, such as security permissions for SAS libraries and where the various SAS servers are running, are maintained in a common repository.

What has been your most common programming mistake?
A) Missing semicolons, not checking the log after submitting a program, not using debugging techniques, and not using the FSVIEW option vigorously.

Name several ways to achieve efficiency in your program.
A) Efficiency and performance strategies can be classified into 5 different areas: CPU time, data storage, elapsed time, input/output, and memory. CPU time and elapsed time are the baseline measurements.
A few examples of efficiency violations:
- Retaining unwanted data sets.
- Not subsetting early to eliminate unwanted records.
Efficiency-improving techniques:
- Use KEEP and DROP statements to retain only the necessary variables.
- Use macros to reduce the amount of code.
- Use IF-THEN/ELSE statements to process data.
- Use the SQL procedure to reduce the number of programming steps.
- Use LENGTH statements to reduce variable size and so reduce data storage.
- Use DATA _NULL_ steps when no output data set is needed, to save data storage.
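Two of the techniques listed above, keeping only the necessary variables and subsetting early, can be combined on the SET statement. This is a minimal sketch using the SASHELP.CLASS sample data set; the output data set name and the cutoff value are arbitrary assumptions.

```sas
/* KEEP= limits the variables read into the program data vector, and
   WHERE= filters observations before they reach the DATA step, so
   unwanted columns and rows are never processed. */
data tall_students;
  set sashelp.class(keep=name sex height
                    where=(height > 60));
run;
```

Placing the options on the input data set, rather than using a subsetting IF and a DROP statement later in the step, is one way to reduce the amount of data that is read.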

What other SAS products have you used and consider yourself proficient in using?
A) DATA _NULL_ steps, PROC MEANS, PROC REPORT, PROC TABULATE, PROC FREQ, PROC PRINT, PROC UNIVARIATE, etc.

What is the significance of the 'OF' in X=SUM (OF a1-a4, a6, a9);
A) OF tells the function that a1-a4 is a list of variables. Without OF, the expression is not interpreted as we expect: SUM(a1-a4, a6, a9) calculates a1 minus a4, plus a6 and a9, and not the sum of a1 through a4 plus a6 and a9. The same is true for the MEAN function.
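The effect of OF can be verified with a small sketch. The values 1 through 9 assigned to a1-a9 are made up for illustration, as are the data set and result variable names.

```sas
data of_demo;
  /* Create a1-a9 with initial values 1 through 9. */
  array a{9} a1-a9 (1 2 3 4 5 6 7 8 9);

  with_of    = sum(of a1-a4, a6, a9);  /* 1+2+3+4 + 6 + 9 = 25 */
  without_of = sum(a1-a4, a6, a9);     /* (1-4) + 6 + 9 = 12 */
run;

proc print data=of_demo;
  var with_of without_of;
run;
```

Without OF, a1-a4 is treated as a subtraction, which is why the two results differ.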