Wednesday, April 2, 2014

Week 11 Sorting and Efficiency

This is the last week of lectures for CSC148. To conclude, I am going to summarize some concepts about sorting and efficiency. We've learned mainly five kinds of sorting so far: bubble sort, insertion sort, selection sort, merge sort and quick sort.

The worst case for bubble sort, insertion sort and selection sort is an input that is sorted backwards. In this case, these three sorting methods have a runtime of O(n^2). Merge sort's runtime does not depend on the input: it is O(nlgn) because the original list has to be split lgn times, and every time it is split into two parts, roughly n comparisons have to be made. For quick sort, the best case has a runtime of O(nlgn). The worst case is when the input is already sorted. In this case, the runtime is O(n^2), because every element needs to be chosen as the pivot once, and every time a pivot is chosen, roughly n comparisons need to be made.

However, in the lecture the professor taught us a way to make the expected runtime of quick sort O(nlgn) regardless of the input. The basic idea of this method is to choose the pivot randomly. To see why this helps, take [1, 2, 3, 4, 5] as an example and suppose the pivot chosen is always the first element of the list. According to the quick sort algorithm --- elements smaller than the pivot go to the left and elements greater than the pivot go to the right --- every time the pivot is chosen, the list stays as it is and nothing changes. So the number of recursive calls the algorithm has to make in this case is equal to the length of the input, which is 5, and this causes the runtime to be O(n^2). However, if we choose the pivot randomly, we can effectively avoid this problem. For example, if the first pivot randomly chosen is 3, the list splits evenly and we only need about 3 levels of recursion, which saves a dramatic amount of running time. From this lecture I learned that sometimes randomizing ordered items is also a way to solve a problem.
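The random-pivot idea can be sketched like this (a rough version, not the exact course code; partitioning into three groups also handles duplicate values):

```python
import random

def quick(L):
    """Quick sort with a randomly chosen pivot.

    Choosing the pivot at random makes the expected runtime O(n lg n)
    for any input, including already-sorted lists.
    """
    if len(L) < 2:               # base case: 0 or 1 elements are already sorted
        return L[:]
    pivot = random.choice(L)     # random pivot avoids the sorted-input worst case
    smaller = [x for x in L if x < pivot]
    equal = [x for x in L if x == pivot]    # keeps duplicates of the pivot
    larger = [x for x in L if x > pivot]
    return quick(smaller) + equal + quick(larger)
```

Even if L is [1, 2, 3, 4, 5], the pivot is very unlikely to be the smallest remaining element every single time, so the deep one-sided recursion almost never happens.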

Also, in Dustin's lecture, we discussed a harder problem on efficiency. The problem is as follows.
f1 has a runtime of O(nlogn) in the worst case and f2 has a runtime of O(n^2) in the worst case.
def g(n):
    return f1(f2(n))
def f(n):
    return f2(f1(n))
Discuss the runtime for the best case and the worst case of function g(n) and function f(n).
Because this was the first time I had encountered a question that requires solving the runtime of two composed functions, I had no idea. But after Dustin gave us some hints, I found out that for g(n), f2(n) must be dealt with first, so f2(n)'s worst-case runtime of O(n^2) has to be paid no matter what, and therefore the runtime for the best case of g(n) is O(n^2). As for the worst case of g(n), because the worst case of f2(n) is O(n^2), its output can have size n^2, so f1(f2(n)) in the worst case has a runtime of O(n^2 * log(n^2)).
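To convince myself about the g(n) case, I wrote a toy version of it. The real f1 and f2 from the problem are abstract, so everything here is made up for illustration: f2 is a hypothetical O(n^2) function whose output has size n^2, and f1 is a sort, which runs in O(k log k) on an input of size k:

```python
import random

def f2(n):
    """Hypothetical stand-in: does O(n^2) work and returns n * n items."""
    return [random.random() for _ in range(n * n)]

def f1(data):
    """Hypothetical stand-in: O(k log k) sort, where k = len(data)."""
    return sorted(data)

def g(n):
    # f2 runs first, costing O(n^2) and handing f1 an input of size n^2;
    # f1 then costs O(n^2 * log(n^2)), which dominates the total.
    return f1(f2(n))
```

The key point the toy makes concrete is that f1's cost depends on the *size of f2's output*, not on n directly.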


Wish everybody good marks on the finals!

Tuesday, March 25, 2014

Week 10

We learned some sorting methods this week, especially quick sort and merge sort. At first, the algorithm for quick sort just sets the first item in the list as the pivot, moves everything smaller than the pivot to the left of the pivot, and moves everything greater than the pivot to the right of the pivot, like this:
smaller_than_pivot = [x for x in L[1:] if x < pivot]
larger_than_pivot = [x for x in L[1:] if x >= pivot]
Then use recursion
return quick(smaller_than_pivot) + [pivot] + quick(larger_than_pivot)
to make the items on the left and right sides of the pivot sorted. Then the function is done. However, we ran into a problem in a special case. When the list is already sorted, either greatest to smallest or smallest to greatest, the number of recursive calls needed is the same as the size of the input. So for a very large list, this algorithm will exceed the maximum recursion depth. The way to solve this problem is to choose the pivot randomly. For instance, if the pivot is not chosen randomly, the list [1, 2, 3, 4, 5] will need 5 levels of recursion, but by choosing a random pivot we can easily avoid this problem, since 3 could be chosen as the pivot first.
For merge sort, we wrote a helper function merge2,
def merge2(L1:list, L2:list) -> list:
    """return merge of L1 and L2
    NOTE: modifies L1 and L2

    >>> merge2([1, 3, 5], [2, 4, 6])
    [1, 2, 3, 4, 5, 6]
    >>> merge2([1, 2, 3], [0, 4, 5])
    [0, 1, 2, 3, 4, 5]
    >>> merge2([0], [1, 2, 3, 4])
    [0, 1, 2, 3, 4]
    """
    L = []
    L1.reverse()
    L2.reverse()
    while L1 and L2:
        if L1[-1] < L2[-1]:
            L.append(L1.pop())
        else:
            L.append(L2.pop())
    L1.reverse()
    L2.reverse()
    return L + L1 + L2
This function merges two sorted lists into one list in non-decreasing order. Then it's easier to write the merge_sort function.
def merge_sort(L):
    """Produce copy of L in non-decreasing order

    >>> merge_sort([1, 5, 3, 4, 2])
    [1, 2, 3, 4, 5]
    """
    if len(L) < 2:
        return L
    else:
        left_sublist = L[:len(L)//2]
        right_sublist = L[len(L)//2:]
        return merge2(merge_sort(left_sublist),
                      merge_sort(right_sublist))


Monday, March 24, 2014

Week 9

This week we discussed more on BST. One thing that I found interesting was deleting data from a BST rooted at a node. First is the simple case, in which the data to delete is not at this node; the base case is when the node is None. When the data to delete is less than that at the node, we delete from the left subtree, which is a recursive call, and return this node. When the data to delete is greater than that at the node, we delete from the right subtree and also return the node. The harder case is actually deleting the node whose data is equal, when that node has one or two children. If the node is missing either its left child or its right child, we just replace the node with the other child. If the node has both children, we replace it with either the right-most node of the left subtree or the left-most node of the right subtree. And here comes a point that I didn't fully understand in the lecture: which of the two is the correct replacement? It seems replacing with either of them could be correct.
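Here is a sketch of the deletion I described, using the "largest node of the left subtree" choice (the names are my own, not necessarily the ones from lecture). Either replacement choice works, because by the BST ordering each of those two nodes holds a value that fits between the entire left subtree and the entire right subtree:

```python
class BSTNode:
    """One node of a binary search tree."""
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def delete(node, data):
    """Return the root of the subtree rooted at node with data removed."""
    if node is None:                            # base case: data not in the tree
        return None
    if data < node.data:
        node.left = delete(node.left, data)     # delete from the left, return node
    elif data > node.data:
        node.right = delete(node.right, data)   # delete from the right, return node
    else:
        # Found it.  With at most one child, promote the other child.
        if node.left is None:
            return node.right
        if node.right is None:
            return node.left
        # Two children: copy in the right-most (largest) data of the
        # left subtree, then delete that data from the left subtree.
        largest = node.left
        while largest.right is not None:
            largest = largest.right
        node.data = largest.data
        node.left = delete(node.left, largest.data)
    return node
```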

Also, we discussed the efficiency of BST searching. Since a BST allows us to eliminate either the left or the right subtree at each step, roughly half of the remaining work is eliminated each time in a well-balanced BST. The running time of binary search is actually clog(n), where c is a constant and n is the size of the input. Another thing I learned from lecture is that the running time only depends on the highest-order term. For example, the running times of 0.23n^2 + 100n and 5n^2 increase at the same rate as the input n grows.
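The claim about highest-order terms is easy to check numerically: the ratio of 0.23n^2 + 100n to 5n^2 settles down to the constant 0.23/5 = 0.046 as n grows, which is what "increase at the same rate" means:

```python
def t1(n):
    return 0.23 * n ** 2 + 100 * n

def t2(n):
    return 5 * n ** 2

# The ratio approaches 0.046: the 100n term stops mattering as n grows.
for n in (10, 1000, 100000):
    print(n, t1(n) / t2(n))
```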

Tuesday, March 11, 2014

Week 8

This week we did some practice on LinkedList and learned a new object called binary search tree. For LinkedList, we implemented the class's insertion, deletion and reversing methods. Insertion is quite easy because the method just assigns self.front = LListNode(value, self.front), since self.front is the front of the whole LinkedList. The deletion method is also not difficult. However, the reversing function is harder because it uses recursion, and I could not completely understand and trace it.
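Since the recursive reversal was the part I couldn't trace, here is a sketch of how it can work (LListNode matches my description above, but the lab's exact code may differ):

```python
class LListNode:
    """A node holding a value and the rest of the list."""
    def __init__(self, value, rest=None):
        self.value, self.rest = value, rest

def reverse(front):
    """Return the new front node of the reversed list."""
    if front is None or front.rest is None:   # 0 or 1 nodes: already reversed
        return front
    new_front = reverse(front.rest)  # reverse everything after this node
    front.rest.rest = front          # the old successor now points back here
    front.rest = None                # this node becomes the new tail
    return new_front
```

Tracing it on 1 -> 2 -> 3: the recursion first reverses 2 -> 3 into 3 -> 2, then hooks node 1 on at the end, giving 3 -> 2 -> 1.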

Binary Search Tree is a new object that we learned this week. It is a tree that is sorted: at every node, everything in the left subtree is smaller than the node and everything in the right subtree is greater, so the node's value sits between the left and right sides. This makes some methods of BST more efficient. We also learned how to write the __str__ method for BST. A string representation of a binary search tree is shown as follows.
>>> print(BST([5, 3, 2, 6, 7]))
                  7
         6
5
         3
                  2

The inserting method for BST also uses recursion. If the number given is greater than the data at the node, the function calls the inserting method again on the right child. If the number given is less, it calls the inserting method again on the left child. This continues until there is no child in that position, and the number becomes a new leaf there. We used a helper function in the inserting method to achieve this.
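The insertion I described can be sketched with a helper function like this (a hypothetical version; the class from lecture may be organized differently, for instance in how it treats duplicates):

```python
class BSTNode:
    """One node of a binary search tree."""
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def _insert(node, data):
    """Helper: return the subtree rooted at node with data inserted."""
    if node is None:                           # no child here: new leaf
        return BSTNode(data)
    if data > node.data:
        node.right = _insert(node.right, data)  # recurse on the right child
    elif data < node.data:
        node.left = _insert(node.left, data)    # recurse on the left child
    return node                                # equal data: ignored (my assumption)

class BST:
    def __init__(self, items=()):
        self.root = None
        for item in items:
            self.root = _insert(self.root, item)
```

Building BST([5, 3, 2, 6, 7]) this way gives exactly the shape shown in the string representation above: 5 at the root, 3 and 2 descending on the left, 6 and 7 descending on the right.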

Tuesday, March 4, 2014

Week 7: Recursion

Recently we learned and practiced recursion a lot. Recursion is a structure whose repeating parts have the same structure as the whole. For instance, a tree branch is a recursive structure, because all the smaller branches generated from a bigger branch have the same structure as that bigger branch. Let me explain recursion more clearly with a simple example in Python.

def sum_list(L: list) -> float:
    """
    Return sum of the numbers in L.

    L --- a possibly nested list of numbers.  L, and all sublists, are
       non-empty

    >>> sum_list([1, 2, 3])
    6
    >>> sum_list([1, [2, 3, [4]], 5])
    15
    """
    return sum( [sum_list(x) if isinstance(x, list) else x for x in L])

What this function does is produce a list containing, for every nested list inside the original list, the sum of that nested list's elements, and, for every non-list element, the element itself. Then the function returns the sum of that produced list.

We focused on LinkedList this week. A LinkedList contains a value as its head and another LinkedList as the rest, so it has a recursive structure. I found some methods related to LinkedList are pretty hard, like __len__(), because it has two base cases: one is when the LinkedList is empty, and the other is when self.rest is empty. This cost me lots of time in the lab.
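Here is how I now understand __len__ with its two base cases (a sketch that assumes an empty LinkedList is one constructed with no arguments; the lab's representation may differ):

```python
class LinkedList:
    """A nested LinkedList: either empty, or a value plus a rest list."""
    def __init__(self, value=None, rest=None):
        self.value, self.rest = value, rest
        self.empty = value is None and rest is None

    def __len__(self):
        if self.empty:             # base case 1: the whole list is empty
            return 0
        if self.rest is None:      # base case 2: last node, nothing after it
            return 1
        return 1 + len(self.rest)  # recursive case: this node plus the rest
```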


I think the best way to strengthen my knowledge of recursion is to trace the values in the function more often, and maybe draw some pictures, in order to gain a better understanding of the concepts.

Wednesday, February 26, 2014

Week 6


This week, we learned an object called Tree. In the lecture, the professor taught us some terminology like nodes, leaves, height, path, length and so on. There are three ways to go through a Tree without missing any nodes. The first method is called pre-order traversal; for the example tree from lecture, the path is [5, 4, 3, 2, 1]. For the same example, in-order traversal generates the path in a different way, which is [4, 5, 2, 3, 1]. The third approach is called post-order traversal, and its path is [4, 2, 1, 3, 5]. Actually a tree has something to do with recursion. The reason is that a tree consists of a node and the subtrees of that node, and each subtree is itself a tree. This forms a recursive structure.
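The figure of the example tree didn't survive here, but the three paths pin it down: the root is 5, its left child is 4, its right child is 3, and 3 has left child 2 and right child 1. With a hypothetical BinaryTree class, the three traversals look like this:

```python
class BinaryTree:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def preorder(t):
    """Visit the root first, then the left and right subtrees."""
    return [] if t is None else [t.value] + preorder(t.left) + preorder(t.right)

def inorder(t):
    """Visit the left subtree, then the root, then the right subtree."""
    return [] if t is None else inorder(t.left) + [t.value] + inorder(t.right)

def postorder(t):
    """Visit both subtrees first, then the root."""
    return [] if t is None else postorder(t.left) + postorder(t.right) + [t.value]

# The tree reconstructed from the three paths in the post above.
example = BinaryTree(5, BinaryTree(4),
                     BinaryTree(3, BinaryTree(2), BinaryTree(1)))
```

Running preorder(example), inorder(example) and postorder(example) gives exactly the three paths listed above.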


Another thing I learned this week is that the method __repr__() is used to return a string representation of the object, but in a more useful way: when we copy and paste what __repr__() returns, an equal object will be produced. This is what makes it different from __str__().
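A small made-up example of the difference (Point is hypothetical, not from lecture):

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        # Pasting this string back into the shell rebuilds an equal Point.
        return 'Point({}, {})'.format(self.x, self.y)

    def __str__(self):
        # A friendlier form that print() uses; it can't be pasted back.
        return '({}, {})'.format(self.x, self.y)
```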

Sunday, February 9, 2014

Week 5

This week we did more recursion, both in the lecture and in the labs. And I began the first big assignment in CSC148H, which was about the Tower of Hanoi, with two of my friends. When we began the assignment, we were totally stuck and didn't even know the purpose of the program. We just ignored that and kept on typing code. After some time, we finally realized the purpose of the program and found lots of bugs in our code. For example, what we wrote didn't match what was given by the instructors. The lesson I learned from this assignment is that we need to read through all the materials and the instructions before writing any code. Luckily, I realized this well before the end of the course.


During the process of doing Assignment 1, I found the error-raising part somewhat difficult. For example, when a move of a cheese was detected to be invalid, that cheese got stuck and couldn't be moved again. We spent a lot of time fixing that problem, and then moved on to the next step. Another thing I found confusing was that there were too many files of code, and I thought I could achieve the same result by writing the code in different files, so sometimes I got confused about how to write it in a better and clearer way.