Hello readers, over the past three months, I have been taking a class on the edx platform through MIT (Massachusetts Institute of Technology) on Computer Science Fundamentals and Python. As a Salesforce Developer, I will be the first to say that I love the platform and all that is has to offer. However, over my years as an engineer, I have always felt like having a base and foundation that is built on Computer Science fundamentals is extremely important and crucial. With this mindset, I am constantly searching the depths of algorithmic complexities and principles that I can apply to my everyday life as an engineer. I was able to get a certification in Computer Science fundamentals and also a certification in Python which I am now fairly fluent in thanks to this course! The course covered the following:
- Python fundamentals
- Iterative and Recursive algorithms
- Inheritance and OOP with Python
- Computational Complexities: Program Efficiency, Big O Notation, and Complexity Classes
- Searching and Sorting algorithms: Indirection, Linear Search, Bisection Search, Bogo and Bubble sort, Selection sort and Merge sort
Each night we had homework and each week we had Problem Sets that consisted of five to six complex problems that required each student to write custom algorithms to solve. Unfortunately due to the privacy agreement with the class, I can’t post anything here. On top of the homework and problem sets, we had a mid-term and a final exam. Overall, I was able to make a 97% in the class and I am pretty proud of that considering how challenging it was at times.
One of the most interesting concepts we learned about and practiced was Computational Complexity. Here is a sample example of a simple algorithm.
def program1(x): total = 0 for i in range(1000): total += i while x > 0: x -= 1 total += x return total
Suppose you were given a list L of some random length (len(L)). The goal is to find the overall efficiency of the algorithm but how do we do that? Let’s start with what we know:
- Best case: minimum running time over all possible inputs of a given size, len(L)
- First element in any list or the list is empty
- Average case: average running time over all possible inputs of a given size, len(L)
- Worst case: maximum running time over all possible inputs of a given size, len(L)
- Value found is last in list or the value is not found at all and the iteration must go through all values in the list
We want to evaluate programs efficiency when input is very big and express the growth of programs run time as input size grows. This is called Orders of Growth:
Big O notation measures an upper bound on the asymptotic growth. Basically that means an examination of how our application grows as we take larger inputs. We focus on worse case because thats typically where the bottleneck occurs in our programs.
So if we are looking at the diagram I drew above, we know that constant components of an algorithm are the most ideal because they are subject to never change the performance of an algorithm as our input grows larger. From there, we scale downward the more complex our algorithm becomes. Let’s examine the code above:
- Constants: total = 0, return total, for loop counting our range values
- No matter how we increase the size of the input, it takes the same amount of time to perform these procedures
- Linear: while loop and everything within. So, iteration loops n times. Within the loop, we are performing five actions:
- Evaluation of x compared to 1
- Two operations for subtracting and equaling (+-) the x variable
- Two operations for adding and equaling (+=) the total variable
- Therefore, the worse case equation for the complexity of the algorithm is as follows:
- 1 + 3000 + n * 5 + 1 + 1 = 5n + 3003
- So, if we are examining the graphs above to determine the complexity of the algorithm with Big O notation, we would say that this algorithms complexity is O(n). This describes an algorithm whose performance will grow linearly and in direct proportion to the size of the input data set.
I cannot wait to use this practice in my own algorithms and apply complexity analysis to make sure I am being very mindful and intentional of each line that I utilize with my own applications. I thoroughly enjoyed this class and cannot wait to expand my knowledge further into the realm of computer science.
Here are my complete notes on the class for those who are interested:
MIT Computer Science Class Notes Knowledge Declarative Knowledge: is statements of fact. There is a coffee cup on this table Imperative Knowledge: is a recipe for “how to”: drink a cup of coffee 1. Look at the cup in front of you 2. Extend you hand 3. Grasp the handle 4. Lift the cup up to you face 5. Tilt cup at a reasonable angle 6. Drink the coffee What is a recipe? 1. A sequence of simple steps 2. Flow of control process that specifies when each step is executed 3. A means of determining when to stop Machines Computers are machines: how to capture a recipe in a mechanical process Fixed program: calculator, Alan Turing’s bomb -> computer designed specifically to calculate a particular computation Stored program computer: machine stores and executes instructions -> load into it a recipe or instructions and inside the comp will be a set of parts that will do those instructions when asked -Interpreter -Stored program is emulating a fixed program for various options of sequences -arithmetic and logic -simple tests -moving data around -Actual definition: sequence of instructions stored inside computer built from predefined set of primitive instructions -Special program called interpreter executes each instruction in order Basic machine architecture: 1. Memory: data, mechanical steps, place to store things. Could be a sequence of instructions 2. Input/output 3. Arithmetic Logic Unit - Do primitive ops - takes info from memory, reads it in, does an operation, and then store things back into memory 4. Control unit: keep track of what specific operation I want to do in that point in time - Program counter: when load program into machine, sequence of instructions and the program counter points to the location of the first instruction 5. Eventually we will get to a test that will say whether something is true or false Basic Primitives: Turing showed that you can compute anything with 6 primitives. 6 operations: move left, move right, scan, read, write and do nothing Modern programming languages have more convenient set of primitives Can abstract methods to create new primitives Anything computational in one language is computable in any other programming language -> Touring complete: one of the fundamentals of computer science Languages Creating Recipes 1. A programming language proves a set of primitive operations 2. Expressions are complex but legal combinations of primitives in a programming language 3. Expressions and computations have values and meanings in programming languages Aspects of languages 1. Primitive constructs - English: words - programming language: number, string, simple operators 2. Static semantics: is which syntactically valid strings have meanings -English: ‘I are hungry’ -Programming language: 3 + ‘hi’ 3. Semantics: meaning associated with syntactically correct string of symbols with no static errors English: can have many meanings: “Flying a plane is dangerous”, “This reading lamp hasn’t uttered a word since I bought it?” Programming languages: have only one meaning but may not be what the programmer intended Where things go wrong 1. Syntactic errors -common and easily caught 2. Static semantic errors -some languages check for these before running program -can cause unpredictable behavior 3. No semantic errors but different meaning then what programmer intended -program crashes, stops running -program runs forever -program gives an answer but different than expected Types Python Programs 1. A program is a sequence of definitions and commands -definitions evaluated -commands executed by Python interpreter in a shell 2. Commands (statements) instruct interpreter to do something 3. Can be typed directly in a shell or stored in a filed that is read into the shell and evaluated Objects 1. Program manipulate data objects 2. Objects have a type that define the kinds of things program can do with them 3. Objects are: -scalar: cannot be subdivided -non-scalar: have internal structure that can be accessed Scalar Objects 1. Int - represent integers 5 2. Float - represent real numbers 3.27 3. Bool - represent boolean values true 4. NoneType - special Type Conversions 1. Can convert objects of one type to another 2. Float (3) converts integer to 3.0 3. Int (3.9) truncates float 3.9 to integer 3 Printing to console 1. To show output from code to a user, use print command Expressions 1. Combine objects and operators to form expressions 2. An expressions has a value, which has a type 3. Syntax for a simple expression <object> <operator> <object> Operators on ints and floats I + j -> the sum I - j -> the difference I * j -> the product I/j -> division I//j -> int division I%j -> remainder when I is divided by j I**j -> I to the power of j Simple Operations 1. Parenthesis used to tell python to do these operations first 2. Operator precedence without parenthesis - ** - * - / - + and - - executed left to right Binding Variables and Values pi = 3.14 1. Equal sign is an assignment of a value to a variable name 2. Value stored in computer memory 3. An assignment binds name to value 4. Retrieve value associated with name or variable by invoking the name, ex typing pi Abstracting Expressions 1. Why give names to value of expressions? 2. Reuse names instead of values 3. Easier to change code later Programming vs Math 1. In programming, you do not solve for x Assignment 1. Value on the right 2. Name on the left 3. Equivalent is radius += 1 Changing Bindings 1. Can re-bind variable names using new assignment statement 2. Previous value may still stored in memory but lost the handle for it 3. Value for area does not change until you tell the computer to do the calculation again Operators and Branching Comparison Operators on int and float 1. I and j are any variable names - I > j - I >= j - I <= j - I == j -> tests equality - I != j 2. Logic Operators on Bools - a and b are any variable names + not a + a and b + a or b 3. Branching Programs - the simplest branching statement is conditional + a test (expression that evaluates to true or false) + a block of code to execute if test is true + an optional block of code to execute if test is false A simple example x = int(input(‘Enter an inter:’)) If x%2 == 0: <— test print (‘’) -> true block/Users/taylor.ortiz/Documents/slalomProjects/optiv/force-app/main/default/flexipages/Opportunity_Record_Page1.flexipage-meta.xml print(even) Else: print (‘’) -> false block print(‘Odd’) print(‘Don’t with conditional’) Some Observations 1. The expression x%2 == 0 evaluates to true when the remainder of x divided by 2 is 0 2. Note that == is used for comparison, since = is reserved for assignment 3. The indentation is important - each indented set of expressions denotes a block of instructions - for example, if the last statement were indented, it would be executed as part of the else block of code 4. Note how this indentations provides a visual structure that reflects semantic structure of the program Nested Conditionals 1. If x%2 == 0: if x% == 0 print (‘x is here’) Compound Booleans 1. If x < y and x < z print(‘x is least’) elif y < z: print(‘this is equivalent to an else if) Indentation 1. Matters in python 2. How you denote blocks of code = vs == 1. When we do comparison, we need to do the double equal What have we added? 1. Branching programs allow us to make choices and do different things 2. But still the case that at most, each statement gets executed once 3. So maximum time to run the program depends only on the length of the program 4. These programs run in constant time Variables 1. Name -descriptive -meaningful -help you read code -cannot be keywords 2. Value -information stored -can be updated Variable Binding with: 1. Computer the right hand side -> value 2. Store it (bind it) in the left hand side -> variable 3. Left hand side will be replaced with new value 4. = is assignment Binding example: 1. Swap variables 2. Doesn’t swap because there is a sequence 3. Temp variables Strings 1. Sequence of characters: letters, special characters, spaces, digits 2. Enclose in quotation marks or single quotes 3. Concatenate strings -name = “print” -greet = “hi” + name -overloaded string Operations on Strings 1. ‘Ab’ + ‘cd’ -> concatenation 2. 3 + ‘Eric’ -> successive concatenation 3. len[‘Eric’] -> the length 4. ‘Eric’ -> indexing: begins with 0. Attempting to index beyond length -1 is an error 5. ‘Eric’[1:3] -> slicing: Extracts sequence starting at first index and ending before second index :if no value before :, start at 0 :if no value after :, end at length :if just :, make a copy of the entire sequence Input/Output: print 1. Used to output stuff to console 2. Keyword is called print 3. Printing with multiple arguments Input/Output: input (“”) 1. Prints whatever is within the quotes 2. User types in something and hits enter 3. Returns entered sequence 4. Can bind that value to a variable so can reference - text = input(“Type Anything…”) - print(5 * text); 5. Input returns a string so must cast if working with numbers - num = int(input(“Type a number….”)) - print(5 * num) IDEs 1. Painful to just type things into a shell 2. Better to have a text editor - integrated development environment 3. IDLE or Anaconda are examples 4. Comes with -text editor: use to enter, edit and save your programs -Shell - place in which to interact with and run your programs; standard methods to evaluate your programs from the editor or from stored files -Integrated debugger (use later) Branching Programs 1. The simplest branching statement is a conditional -A test (expression that evaluates to True or False) -A block of code to execute if the test is True -An optional block of code to execute if the test is False Using Control in Loops 1. Simple branching programs just make choices, but path through code is still linear 2. Sometimes want to reuse parts of the code indeterminate number of times Control Flow while Loops: 1. While <condition> <expression> <expression> 2. <condition> evaluates to a Boolean 3. If <condition> is True, do all the steps inside the while code block 4. Check <condition> again 5. Repeat until <condition> is False While and for loops: 1. N = 0 while n < 5: print(n) n = n + 1 2. for n in range(5): print(n) For Loops: for <variable> in range(<some_num>): <expression> <expression> … -each time through the loop, <variable> takes a value -first time, <variable> starts at the smallest value -next time, <variable> gets the prev value + 1 Range(start, stop, step) -default values are start = 0 and step = 1 and is optional -loop until value is stop -1 mysum = 0 For I in range(7, 10): my sum += I mysum = 0 for i in range(5, 11, 2) Mysum += 1 Break statement - Immediately exists whatever loop it is in - Skips remaining expressions in code block - Exits only innermost loop For loops vs While loops - For loop: - Known number of iterations - Can end early via break - Uses a counter - Can rewrite a for loop using a while loop - While loop: - Unbounded (unknown) number of iterations - Can end early via break - Can use a counter but must initialize before loop and increment it inside loop - May not be able to rewrite a while loop using a for loop Iteration - Concept of iteration lets us extend simple branching algorithms to be able to write programs of arbitrary complexity - Start with a test - If evaluates to True, then execute loop body once, and go back to reevaluate the test - Repeat until test evaluates to False, after which code following iteration statement is executed Stepping through the code x = 3 Ans = 0 intersLeft = x While (iterleft != 0): ans = ans + x intersLeft = itersLeft - 1 - Some properties of iteration loops - need to set an iteration variable outside the loop -need to test variable to determine when done -need to change variable within the loop, in addition to the other work Iterative Code - Branching structures (conditionals) let us jump to different pieces of code based on a test - Programs are constant time - Looping structures let us repeat pieces of code until a condition is satisfied - Programs now take time that depends on values of variables, as well as length of program Loops and Strings - lexicographic order to use <> to compare two strings - Immutable type, cannot be changed. You can change the definition of s but not change it. Changing binding Approximate Solutions - Suppose we want to find the root of any non negative number? - Can’t guarantee exact answer, but just look for something close enough - Start with exhaustive enumeration - Take small steps to generate guesses in order - Check to see if close enough - Good enough solution - Start with a guess and increment by some small value - |guess3|-cube <= epsilon for some small epsilon - Decreasing increment -> slower program - Increasing epsilon -> less accurate answer Some Observations - Step could be any small number - If too small, takes a long time to find square root - If too large, might skip over answer without getting close enough - In general, will take x/step times through code to find solution - Need a more efficient way to do this Bisection Search - We know that the square root of x lies between 1 and x, from mathematics - Rather than exhaustively trying things starting at 1, suppose instead we pick a number in the middle of this range - If we are lucky, this answer is close enough - If its not close enough, is guess too big or too small? - If g ** 2, then know g is too big, but now search - And if, for example, this new g is such that g ** 2 < x, then know too small, so now search - At each stage, reduce range of values by half Bisection Search Convergence - Search space - First guess: N/2 - Second guess: N/4 - Third guess: N/2^g - Guess converges on the order of log2N steps - Bisection search works when value of function varies monotonically with input - Code as shown only works for positive cubes > 1. Why? - Challenges: - Modify to work with negative cubes - Modify to work with x < 1 x < 1 - If x < 1, search space is 0 to x but cube root is greater than x and less than 1 - Modify the code to choose the search space depending on the value of x Some observations - Bisection search radically reduces computation time - being smart about generating guesses is important - Should work well on problems with “ordering” property - value of function being solved varies monotonically with input value - Here function is g ** 2; which grows as g grows Dealing with floats - Floats approximate real numbers, but useful to understand how - Decimal number: - 302 = 3*10^2 + 0*10^1 + 2*10^0 - Binary number - 10011 = 1*2^4 + 0*2^3 + 0*2^2 + 1*2^1 + 1*2^0 - (Which in decimal is 16 + 2 + 1 = 19) - Internally, computer represents numbers in binary Converting Decimal Integer to Binary - Consider example of - x = 1*2^4 + 0*2^3 + 0*2^2 + 1*2^1 + 1*2^0 = 10011 - If we take remainder relative to 2 (x%2) of this number, that gives us the last binary bit - If we then divide x by 2 (x // 2), all the bits get shifted right - x//2 = 1 *2^3 + 0*2^2 + 0*2^1 + 1*2^0 = 1001 - Keep doing successive divisions; now remainder gets next bit and so on - Lets us convert to binary form Newton-Raphson - General approximation algorithm to find roots of a polynomial in one variable - Want to find r such that p(r) = 0 - For example, to find the square root of 24, find the root of p(x) = x^2 - 24 - Newton showed that if g is an approximation to the root, then - g - p(g)/p’(g) - Is a better approximation; where p’ is derivative of p - Simple case cx^2 + k - First derivative 2cx - So if polynomial is x^2 + k, then derivative is 2x - Newton-raphson says given a guess g for root, a better guess is g - (g^2 - k)/2g Iterative Algorithms - Guess and check methods build on reusing same code - Use a looping construct to generate guesses, then check and continue - Generating guesses - Exhaustive enumeration - Bisection search - Newton-raphson (for root finding) Decomposition, Abstraction, Functions Good Programming - More code not necessarily a good thing - Measure good programmers by the amount of functionality - Introduce functions - Mechanism to achieve decomposition and abstraction Example - Projector - A projector is a black box - Don’t know how it works - Know the interface: input/output - Connect any electronics to it that can communicate with that input - Black box somehow covers image from input source to a wall, magnifying it - Abstraction idea: do not need to know how projector works to use it Example-Projector - Projecting large image for olympics decomposed into separate tasks for separate projectors’ - Each projector takes input and produces separate output - All projectors work together to produce larger image - DECOMPOSITION IDEA: different devices work together to achieve an end goal Applying these ideas to programming - Decomposition: breaking problem into different, self contained pieces - Abstraction: suppress details of method to compute something from use of that computation Create structure with decomposition - In example, separate devices - In programming, divide code into modules: - Self contained - Used to break up code - Intended to be reusable - Keep code organized - Keep code coherent - This lecture, achieve decomposition with functions - In a few weeks, achieve decomposition with classes Create structure with Abstraction - In example, no need to know how to build a projector - In programing, think of a piece of code as a black box - Cannot see details - Do not need to see details - Do not want to see details - Hide tedious coding details - Achieve abstraction with function specifications or docstrings Decomposition and abstraction - Powerful together - Code can be used many times but only has to be debugged once Functions - Write reusable piece/chunks of code, called functions - Functions are not run in a program until they are “called” or “invoked” in a program - Function characteristics: - Has a name - Has parameters (0 or more) - Has a docstring (optional) - Has a body Variable Scope - Formal parameter gets bound to the value of the actual parameter when function is called - New scope/frame/environment created when enter a function - Scope is mapping of names to objects One warning if no return statement - Python returns value None, if no return given - Represents the absence of a value Return - Only has a meaning inside a function - Only one return executed inside a function - Code inside ruction but after return statement not executed - Has a value associated with, given back to function caller Print - Can be used outside functions - Can execute many print statements inside a function - Code inside function can be executed after print statement - Has a value associated to it Functions as arguments - Arguments can take on any type, even functions Scope example - Inside a function, can access a variable defined outside - Inside a function, cannot modify a variable defined outside Specifications - A contract between the implementation of a function and the clients who will use it - Assumptions: conditions that must be met by clients of the function; typically constraints on values of parameters - Guarantees: conditions that must be met by function, providing it has been called in manner consistent with assumptions What is recursion - A way to design solutions to problems by dive and conquer or decrease and conquer - A programming technique where a function calls itself - In programming, goal is to not have infinite recursion - Must have 1 or more base cases that are easy to solve - Must solve the same problem on some other input with the goal of simplifying the larger problem input Iterative algorithms - Looping constructs lead to iterative algorithms - Can capture computation in a set of state variables that can update on each iteration through loop Multiplication - Iterative Solution - “Multiply” a * b is equivalent to “add a to itself b times” - capture state by - An iteration number (I) starts at b - I <- I - 1 and stop when 0 - A current value of computation(result) - Result <- result + a Multiplication - recursive solution - Recursive step - Think how to reduce problem to a simpler/smaller version of same problem - Base case - Keep reducing problem until reach a simple case than can be solved directly Factorial - What n do we know the factorial of? - How to reduce problem? Rewrite in terms of something simpler to reach base case Some Observations - Each recursive call to function creates its own scope/environment - Bindings of variables in a scope is not changed by recursive call - Flow of control passes back to previous scope once function call returns value Inductive Reasoning - How down know that our recursive code will work Mathematical Induction - To prove a statement indexed on integers is true for all values of n - Prove its true when n is smallest value (eg. N = 0 or n = 1) - Then prove that if it is true for an arbitrary value of n, one can show that it must be true for n+1 - Relevance to code? - Base case, we can assume that mult must return correct answer - For recursive cases, we can assume that mult correctly returns an answer for problems of size smaller than b, then by the addition step, it must also return a correct answer for problem size of b Towers of Hanoi - The story: - 3 tall spikes - Stack of 64 different discs - start on one spike - Need to move stack to second spike - Can only move one disc at a time and a larger disc can never cover up a small disc Part 3 of the course Tuples - An ordered sequence of elements, can mix element types - Immutable, cannot change element values - Represented with parenthesis - Te = () - T = (2, ‘one’, 3) - Conveniently used to swap variables -> (x, y) = (y, x) - Used to return more than one value from a function - Can iterate over tuples Lists - Ordered sequence of information, accessible by index - A list is denoted by square brackets - A list contains elements - Usually homogeneous(ie. All integers) - Can contain mixed types (not common) - List elements can be changed so a list is mutable Indices and Ordering - An element of a int is at a position (aka index) in list, indices start at 0 - Index can be a variable or expression, must evaluate to an int Changing elements - Lists are mutable - Assigning to an element at an index changes the value Iterating over a list - Compute the sum of elements of a list - Common pattern is to use Len of L in range to loop over and extract out element via i - However, you can use the L to iterate over directly - List elements are indexed 0 to len(L) - 1 - range(n) goes from 0 to n-1 Operations on lists - Add - Add elements to end of list with L.append(element) - Mutates the list - What is the dot? - Lists are python objects, everything in Python is an object - Objects have data - Objects have methods and functions - Access this information by object_name.do_something() - To combine lists together using concatenation, + operator - Mutate list with L.extend(some_list) Operations on lists - Remove - Delete element at specific index with del(L[index]) - Remove element at end of list with L.pop(), returns the removed element - Remove a specific element with L.remove(element) - Looks for the element and removes it - If element occurs multiple times, removes first occurrence - If element not in list, gives an error Convert Lists to Strings and back - Convert string to list with list(s), returns a list with every character from s an element in L - Can use s.split(), to split a string on a character parameter, splits on spaces if called without a parameter - Use ‘’.join(L) to turn a list of characters into a string, can give a character in quotes to add character between every element Other list operations - sort() and sorted() - reverse() Bring together loops, functions, range and lists - Range is a special procedure - Returns something that behaves like a tuple - Doesn’t generate the elements at once, rather it generates the first element and provides an iteration method by which subsequent elements can be generated - When use range in a for loop, what the loop variable iterates over behaves like a list - For var in range(5) - Expresion - Behind the scenes, gets converted to something that will behave like - For var in (0, 1, 2, 3, 4): - Expression Mutation, Aliasing and Cloning Lists in memory - Lists are mutable - Behave differently than immutable types - Is an object in memory - Variable name points to object - Any variable pointing to that object is affected - Key phrase to keep in mind when working with lists is side affects Alias - Hot is an alias for warm - changing one changes the other - append() has a side effect Print is not == - If two lists print the same thing, does not mean they are the same structure - Can test by mutating one and checking Cloning a list - Create a new list and copy every element using chill = cool[:] Sorting Lists - Calling sort() mutates the list, returns nothing - Calling sorted() does not mutate list, must assign result to a variable Lists of lists of lists of… - Can have nested lists - Side effects still possible after mutation Mutation and iteration - Avoid mutating a list as you are iterating over it - Python uses an internal counter to keep track of index it is in the loop - Mutating changes the list but python doesn’t update the counter - Loop never sees element 2 Functions as Objects, Dictionaries - functions are first class objects - Have types - Can be elements of data structures like lists - Can appear in expressions - As part of an assignment statement - As an argument to a function - Particularly useful to use functions as arguments when couples with lists - Aka higher order programming Generalization of hops - Python provides a general purpose HOP, map -> higher order procedure - Simple form - a unary function and a collection of suitable arguments - map(abs, [1, -2, 3, -4]) - Produces an ‘iterable’, so need to walk down it - General form - an n-ary function and n collections of arguments Testing, Debugging We aim for high quality - an analogy with soup - You are making soup but bugs keep falling in from the ceiling. What do you do? - Checking soup for bugs - Testing - Keep lid closed - Defensive programming - Clean kitchen - Eliminate source of bugs - debugging - Defensive Programming - Write specifications for functions (doc strings) - Modularize programs - Check conditions on inputs/outputs (assertions) - Testing/Validation - Compare input/output pairs to specification - “Its not working!” - “How can I break my program?” - Debugging - Study events leading up to an error - “Why is it not working?” - “How can I fix my program?” Set yourself up for easy testing and debugging - From the start, design code to ease this part - Break program into modules that can be tested and debugged individually - Document contraints on modules - What do you expect the input to be? The output to be? - Document assumptions behind code design When are you ready to test? - Ensure code runs - Remove syntax errors - Remove static semantic errors - Python interpreter can usually find these for you - Have a set of expected results - An input set - For each input, the expected output Classes of tests - Unit testing - Validate each piece of program - Testing each function separately - Regression Testing - Add test for bugs as you find them in a function - Catch reintroduced errors that were fixed previously - Integration testing - Does overall program work? - Tend to rush to do this Testing Approaches - Intuition about natural boundaries to the problem - Def is_bigger(x, y): - Returns True if y is less than x, else False - Can you come up with some natural partitions? - If no natural partitions, might do random testing - Probability that code is correct increases with more tests - Better options below - Black box testing - Explore paths through specification - Glass box testing - Explore paths through code Black box testing - Designed without looking at the code - Can be done by someone other than the implementer to avoid some implementer biases - Testing can be reused if implementation changes - Paths through specification - Build test cases in different natural space partitions - Also consider boundary conditions (empty lists, singleton list, large numbers, small numbers) Debugging - Steep learning curve - Goal is to have a bug free program - Tools - Built in IDLE and Anaconda - Python tutor - Print statement - Use your brain, be systematic in your hunt Print statements - Good way to test hypothesis - When to print - Enter function - Parameters - Function results - Use bisection method - Put print halfway in code - Decide where bug may be depending on values Error Messages - EASY - Trying to access beyond the limits of a list -> IndexError - Trying to convert an inappropriate type -> TypeError - Referencing a non-existent variable -> NameError - Mixing data types without appropriate coercion -> TypeError - Forgetting to close parenthesis, quotation -> SyntaxError Logic Errors - Hard - Think before writing new code - Draw pictures, take a break - Explain the code to - Someone else - A rubber ducky Debugging steps - Study program code: - Ask how did I get the unexpected result - Don’t ask what is wrong - Is it part of a family? - Scientific method - Study available data - Form hypothesis - Repeatable experiments - Pick simplest input to test with Dont - Write entire program - Test entire program - Debug entire program - Change code - Remember where bug was - Forget where the bug was or what change you made - Panic Do - Write a function - Test the function, debug the function - Write a function - Test the function, debug the function - Backup code - Change code - Write down potential bug in a comment - Test code - Compare new version with old version Debugging Skills - Treat as a search problem: looking for explanation for incorrect data - Study available data-both correct test cases and incorrect cases - Form an hypothesis consistent with the data - Design and run a repeatable experiment with potential to refute the hypothesis - Keep record of experiments performed: use narrow range of hypothesis Debugging as a search - Want to narrow down space of possible sources of error - design experiments that expose intermediate stages of computation (use print statements!) and use results to further narrow search - Binary search can be a powerful tool for this Stepping through the tests - Suppose we run this code: - We try the input ‘abcba’ which succeeds - We try the input ‘palinnilap’ which succeeds - But we try the input ‘ab’ which also succeeds - Lets use binary search to isolate bugs - Pick a spot about halfway through code and devise experiment - Pick a spot where easy to examine intermediate values - At this point in the code, we expect (for our test case of ‘ab’), that result should be a list [‘a’, ‘b’] - We run the code and get [‘b’] - Because of binary search, we know that at least one bug must be present earlier in the code - So we add a second print, this time inside of the loop Stepping through - This now shows we are getting the data structure result properly set up, but we still have a bug somewhere - A reminder that there may be more than one problem - This suggests second bug must lie between below print statement; lets look at isPal - Pick a point in middle of code and add print statement again; remove the earlier print statement Some Pragmatic Hints - Look for the usual suspects - Ask why the code is doing what it is, not why it is not doing what you want - The bug is probably not where you think it is - eliminate locations - Explain the problem to someone else - Dont always believe the documentation - Take a break and come back to the bug later Exceptions, Assertions - What happens when procedure execution hits an unexpected condition? - Get an exception…to what was expected - Trying to access beyond list limits - Trying to convert an inappropriate type - Referencing a non-existing variable - Mixing data types without coercion - Already seen common error types: - Syntax error: python can’t parse program - nameError: local or global name not found - AttributeError: attribute reference fails - TypeError: operand doesn’t have correct type - ValueError: operand type okay but value is illegal - IOError: IO system reports malfunction (eg. File not found) What to do with exceptions? - What to do when encounter an error? - Fail silently: - Substitute default values or just continue - Bad idea! - Return an “error” value - What value to choose? - Complicates code having to check for a special value - Stop execution, signal error condition - In Python: raise an exception - Raise exception(“descriptive string”) Dealing with exceptions - Python code can provide handlers for exceptions - Exceptions raised by any statement in body of try are handled by the except statement and execution continues after the body of the except statement Handing specific exceptions - Have separate except clauses to deal with a particular type of exception Other exceptions - Else - Body of this is executed when execution of associated try body completes with no exceptions - Finally - Body of this is always executed after try, else and except clauses, even if they raised another error or executed a break, continue or return - Useful for clean up code that should be run no matter what else happened (eg. Close a file) Control input - Appears to correct read in data, and convert to a list of lists - Now suppose we want to restructure this into a list of names and a list of grades for each entry in the overall list Assertions - Want to be sure that assumptions on state of computation are as expected - Use an assert statement to raise an AssertionError exception if assumptions not met - An example of good defensive programming Assertions as defensive programming - Assertions dont allow a programmer to control response to unexpected conditions - Ensure that execution halts whenever an expected condition is not met - Typically used to check inputs to functions procedures, but can be used anywhere - Can be used to check outputs of a function to avoid propagating bad values - Can make it easier to locate a source of a bug Where to use assertions - Goal is o spot bugs as soon as introduced and make clear where they happened - Use as a supplement to testing - Raise exceptions if uses supplies bad data input - Use assertions to: - Check types of arguments or values - Check that invariants on data structures are met - Check constraints on return values - Check for violations of constraints on procedure (eg. No duplicates in a list) The Power of OOP - Bundle together objects that share - Common attributes and - Procedures that operate on those attributes - Use abstraction to make a distinction between how to implement and object vs how to use the object - Build layers of object abstractions that inherit behaviors from other classes of objects - Create your own classes of objects on top of pythons basic classes Implementing the class - Write code from two different perspectives - All class examples we saw so far were numerical - Implementing a new object type with a class - Define the class - Define data attributes (what IS the object) - Define methods (HOW to use the object) Using the class - Using the new object type in code - Create instances of the object type - Do operations with them Class definition of an object type - Class is the type - A Coordinate type - Class Coordinate(object) - Class is defined generically - Use self to refer to any instance while defining the class - Class defines data and methods common across all instances Instance of a class - Instance is one particular object - mycoo = Coordinate(1,2) - Data values vary between instances - C1 = Coordinate(1,2) - C2 = Coordinate(3,4) - C1 and c2 have different data values because they are different objects - Instance has the structure of the class Why use OOP and Classes of objects? - Mimic real life - Group different objects as part of the same type Groups of objects have attributes - Data attributes - How can you represent object with data? - What it is - >for a coordinate: x and y values? - <for an animal: age, name> - Procedural attributes (behavior/operations/methods) - What kinds of things can you do with the object? - What it does - <for a coordinate: find distance between two> - <for an animal: make a sound> - Getters and setters should always be used outside of the class to access data attributes - An instance and dot notation - Instantiation creates instance of an object - a = Animal(3) - Dot notation used to access attributes (data and methods) though it is better to use getters and setters to access data attributes - a.getAge() - a.get_age() - Separate the internal representation from access to that representation Information Hiding - Author of class definition may change data attribute variable names - If you are accessing data attributes outside of the class and class definition changes, may get errors - Outside of class, use getters and setters instead. Use a.get_age() NOT a.age - Good style - Easy to maintain code - Prevents bugs self and other args - Self determined from instance, passed in as argument - For the method: def __init__(self, age) - Creates self, passes it in automatically as argument - For the method: def get_age(self) - Call method with => a.get_age() - or an alternate way => Animal.get_age() - Default arguments for formal parameters are used if no actual argument is given - For the method: def set_name(self, newname=“”) - Default argument used here => a.set_name() - Argument passed is used here => a.set_name(“fluffy”) Hierarchies - Parent class (super class) - Child class (sub class) - Inherits all data and behaviors of parent class - Add more info - Add more behavior - Override behavior Inheritance - Add new functionality with speak() - Instance of type Cat can be called with new methods - Instance of type animal throws error if called with new methods - __init__ is not missing, uses the animal version Which method to use? - Subclass can have methods with same name as superclass - Subclass can have methods with same name as other subclasses - For an instance of a class, look for a method name in current class definition - If not found, look for method up the hierarchy (in parent, the grandparent, and so on) - Use first method up the hierarchy that you found with that method name Instance variables - We have seen instance variables so far in code - Specific to an instance - Created for each instance, belongs to an instance - Used the generic variable name self within the class definition Class variables - Introduce class variables that belong to the class - Defined inside class but outside any class methods, outside __init__ - Shared among all objects/instances of that class Class variables and the Rabbit subclass - Subclasses inherit all data attributes and methods of the parent class - Tag used to give unique id to each new rabbit instance Special method to compare two rabbits - Decide that two rabbits are equal if they have the same two parents - Comparing ids of parents since ids are unique (due to class var) - Note that comparing objects (self.parent1==other.parent1) will call the __eq__ method over and over until call it on none. Will get an attribute error Summary of classes and OOP - Bundle together objects that share - Common attributes and - Procedures that operate on those attributes - Use abstraction to make a distinction between how to implement an object vs how to use the object - Build layers of object abstractions that inherit behaviors from other classes of objects - Create our own classes of objects on top of pythons basic classes Using Inherited Methods - Add a professor class of objects - Also a kind of MITPerson - But has different behaviors - Use as an example to see how one can leverage methods in the hierarchy Modularity Helps - By isolating methods in cases, makes it easier to change behaviors - Can change base behavior of MITPerson class, which will be inherited by all other subclass of MITPerson - Or can change behavior of a lower class in heirarchy - Change MITPERSONs speak method to: - Def speak(self, utterance): - Return (self.name + “ says: “ + utterances Example class: Gradebook - Create class that includes instances of other classes within it - Concept: - Build a data structure that can hold grades for students - Gather together data and procedures for dealing with them in a single structure, so that users can manipulate without having to know internal details Understanding Program Efficiency Want to understand efficiency of programs - Computers are fast and getting faster — so maybe efficient programs dont matter? - But data sets can be very large - Thus, simple solutions may be simply not scale with size in acceptable manner - So how could we decide which option for program is most efficient? - Separate time and space efficiency of a program - Tradeoff between then — will focus on time efficiency - Challenges in understanding efficiency of solution to a computational problem: - A program can be implemented in many different ways - You can solve a problem using only a handful of different algorithms - Would like to separate choices of implementation from choices of more abstract algorithms How to evaluate efficiency of programs - Measure with a timer - Count the operations - Abstract notion of order oof growth -> most appropriate way of assessing efficiency Timing a program - Use time module - Real that importing means. Too bring that class into your own file - Start clock - Call function - Stop clock Timing programs is inconsistent - Goal: toe value different algorithms - Running time varies between algorithms - Running time varies between implementations - Running time varies between computers - Running time is not predictable based on small inputs - Time varies for different inputs but cannot really express a relationship between inputs and time Counting operations - Assume these steps take constant time: - Mathematical operations - Comparisons - Assignments - Accessing objects in memory - Then count the number of operations executed as function of size of input Counting operations is better but still… - GOAL: to evaluate different algorithms - Count depends on algorithm - Count depends on implementations X - Count independent of computers - No real definition of which operations to count X - Count varies for different inputs and can come up with a relationship between inputs and the count Still need a better way - Timing and counting evaluate implementations - Timing evaluates machines - Want to evaluate algorithm - Want to evaluate scalability - Want to evaluate in terms of input size Need to choose which input to use to evaluate a function - Want to express efficiency in terms of input, so ned to decide what your input is - Could be an integer - Could be length of list - You decide multiple parameters to a function Different inputs change how the program runs - A function that searches for an element in list - When e is first element in the list -> best case - When e is not in list -> worst case - When look through about half of the elements in list -> average case - Want to measure this behavior in a general way Best, average and worst cases - Suppose you are given a list L of some length len(L) - Best case: minimum running time over all possible inputs of a given size, len(L) - Constant for search_for_elmt - First element in any list - Average case: average running time over all possible inputs of a given size, len(L) - Practical measure - Worst case: maximum running time over all possible inputs of a given size, len(L) - Linear in length of list for search_for_elmt - Must search entire list and not find it Orders of Growth - Want to evaluate programs efficiency when input is very big - Want to express the growth of programs run time as input size grows - Want to put an upper bound on growth - Do not need to be precise: “order of” not “exact” growth - We will look at largest factors in run time (which section of the program will take the longest to run?) Types of orders of growth - Constant: no matter how we increase the size of the input, it takes the same amount of time - Linear: double the size of the problem, I double the time it takes - Quadratic: grow with the square of the time - Logarithmic: grow with the log of the size of the problem - N log n: - Exponential Measuring order of growth: big oh notation - Big Oh notation measures an upper bound on the asymptotic growth, often called order of growth - Big Oh or O() is used to describe worst case - Worst case occurs often and is the bottleneck when a program runs - Express rate of growth of program relative to the input of the size - evaluate algorithm not machine or implementation Exact steps vs O() Def fact_iter(n): Answer = 1 While x > 1: Answer *= n N -= 1 Return answer - Computes factorial - Number of steps: 1 + 5n + 1 - Worst case asymptotic complexity: - Ignore additive constants - Ignore multiplicative constants Simplification examples - Drop constants and multiplicative factors - Focus on dominant terms - n^2 + 2n + 2 —> O(n^2) - n^2 + 100000n + 3^1000 —> O(n^2) - log(n) + n + 4 —> O(n) => n grows more rapidly than log n - 0.0001*n*log(n) + 300 n —> O(n log(n)) - 2n^30 + 3^n —> O(3^n) Analyze programs and their complexity - Combine complexity class - Analyze statements inside functions - Apply some rules, focus on dominant term - Law of Addition for O(): - Used with sequential statements - O(f(n)) + O(g(n)) is O(f(n) + g(n)) - For I in range(n): - print(‘a’) - For j in range(n*n): - print(‘b’) - Is O(n) + O(n*n) = O(n+n^2) = O(n^2) because of dominant term - Law of Multiplication for O(): - Used with nested statement/loops - O(f(n)) * O(g(n)) is O( f(n) * g(n) ) - For I in range(n): - print(‘a’) - For j in range(n*n): - print(‘b’) - Is O(n)*O(n) = O(n*n) = O(n^2) because the outer loop goes n times and the inner loop goes n times for every outer loop iteration