Difference between revisions of "Data Structures and Algorithms for Engineers"

From David Vernon's Wiki
Jump to: navigation, search
(Content details:)
(43 intermediate revisions by the same user not shown)
Line 1: Line 1:
__NOTOC__
+
<!-- __NOTOC__ -->
<span style="color:#AB0000" ><span style="font-size:10px">|</span></span><span style="color:#000000" ><span style="font-size:10px"><span style="text-decoration:underline">CARNEGIE MELLON UNIVERSITY IN RWANDA</span></span></span><span style="color:#AB0000" ><span style="font-size:10px">|</span></span>
+
<div style="float:right;">__TOC__</div>
 +
<span style="color:#AB0000" ><span style="font-size:10px">|</span></span><span style="color:#000000" ><span style="font-size:10px"><span style="text-decoration:underline">CARNEGIE MELLON UNIVERSITY AFRICA</span></span></span><span style="color:#AB0000" ><span style="font-size:10px">|</span></span>
  
 
<span style="color:#AB0000" ><span style="font-size:18px">04-630 <BR></span></span>
 
<span style="color:#AB0000" ><span style="font-size:18px">04-630 <BR></span></span>
 
<span style="color:#AB0000" ><span style="font-size:18px">Data Structures and Algorithms for Engineers</span></span>
 
<span style="color:#AB0000" ><span style="font-size:18px">Data Structures and Algorithms for Engineers</span></span>
  
<span style="color:#AB0000">Course discipline: TBD</span>
+
<!-- <span style="color:#AB0000">Course discipline: TBD</span> -->
 
+
 
<span style="color:#AB0000">Core</span>
 
<span style="color:#AB0000">Core</span>
  
Line 19: Line 19:
 
Students are expected to be familiar with programming in at least one programming language. Formal programming language training is not required. Students may not have any formal background in algorithms, data structures, analysis, or detailed design techniques and methods.
 
Students are expected to be familiar with programming in at least one programming language. Formal programming language training is not required. Students may not have any formal background in algorithms, data structures, analysis, or detailed design techniques and methods.
  
==<span style="color:#AB0000">Course description:</span> ==
+
==<span style="color:#AB0000">Course description</span> ==
  
 
Many organizations today are incorporating computer hardware and software into the products that they design and build. Most of these organizations' primary competencies are not computer science or software engineering, but rather they find that automation makes their products smarter, more capable, and more appealing in the market place. Because deep domain knowledge is needed to build these products, these organizations often hire engineers from traditional engineering disciplines to design and build the product platform, in many cases requiring them to write software to make the product actually work. These are capable engineers from many disciplines other than software engineering and unfortunately they usually learn software engineering on the job. This process typically involves considerable trial and error and often results in poorly designed and documented systems, defect laden software, bloated product development costs, unmaintainable software, and missed opportunities to leverage software development investments.  
 
Many organizations today are incorporating computer hardware and software into the products that they design and build. Most of these organizations' primary competencies are not computer science or software engineering, but rather they find that automation makes their products smarter, more capable, and more appealing in the market place. Because deep domain knowledge is needed to build these products, these organizations often hire engineers from traditional engineering disciplines to design and build the product platform, in many cases requiring them to write software to make the product actually work. These are capable engineers from many disciplines other than software engineering and unfortunately they usually learn software engineering on the job. This process typically involves considerable trial and error and often results in poorly designed and documented systems, defect laden software, bloated product development costs, unmaintainable software, and missed opportunities to leverage software development investments.  
Line 25: Line 25:
 
In addition to developing mere functionality, some application domains are often highly constrained and unforgiving in their quality attribute needs such as performance, safety, and availability. These systems intimately depend upon software to provide these capabilities in addition to basic functionality. Designing software intensive systems with these properties in a cost-effective way requires first-class computer science and software engineering expertise. While many practicing engineers often have many years of industrial experience writing software applications, many lack a formal background in computer science principles. These engineers may have had a few courses or technical training in specific computer languages or technologies, but in general they often lack formal training in algorithms, computing theory, data structures, and design among other key topics. The result is that many of these engineers are not fully realizing their potential as software engineers. This course is designed to bridge these gaps in formal computer science training.
 
In addition to developing mere functionality, some application domains are often highly constrained and unforgiving in their quality attribute needs such as performance, safety, and availability. These systems intimately depend upon software to provide these capabilities in addition to basic functionality. Designing software intensive systems with these properties in a cost-effective way requires first-class computer science and software engineering expertise. While many practicing engineers often have many years of industrial experience writing software applications, many lack a formal background in computer science principles. These engineers may have had a few courses or technical training in specific computer languages or technologies, but in general they often lack formal training in algorithms, computing theory, data structures, and design among other key topics. The result is that many of these engineers are not fully realizing their potential as software engineers. This course is designed to bridge these gaps in formal computer science training.
  
==<span style="color:#AB0000">Learning objectives:</span> ==
+
==<span style="color:#AB0000">Learning objectives</span> ==
  
 
The primary objective of the course is to provide engineers without formal training in computer science, a solid background in the key principles of computer science, in general, and of the algorithms and data-structures, in particular. The key purpose of this course is to complement the experience that engineers may already have in writing software with formal computer science underpinnings, making those engineers more capable in developing software intensive systems.  
 
The primary objective of the course is to provide engineers without formal training in computer science, a solid background in the key principles of computer science, in general, and of the algorithms and data-structures, in particular. The key purpose of this course is to complement the experience that engineers may already have in writing software with formal computer science underpinnings, making those engineers more capable in developing software intensive systems.  
Line 31: Line 31:
 
The course begins by considering the main phases of the software development lifecycle, from requirements elicitation, to computational modelling, system specification, software design, implementation, and software quality assurance, including various forms of testing, verification, and validation. Then, building on the concept of abstract data types, the course provides an in in-depth treatment of the key elements of algorithms and data-structures, beginning with the fundamentals of searching, sorting, lists, stacks, and queues, but quickly progressing to more advanced topics, including trees, graphs, and algorithmic strategies. It also covers the analysis of the performance and tractability of algorithms, finishing with automata theory and computability theory. A key focus of the course is on effective implementation and good design principles.
 
The course begins by considering the main phases of the software development lifecycle, from requirements elicitation, to computational modelling, system specification, software design, implementation, and software quality assurance, including various forms of testing, verification, and validation. Then, building on the concept of abstract data types, the course provides an in in-depth treatment of the key elements of algorithms and data-structures, beginning with the fundamentals of searching, sorting, lists, stacks, and queues, but quickly progressing to more advanced topics, including trees, graphs, and algorithmic strategies. It also covers the analysis of the performance and tractability of algorithms, finishing with automata theory and computability theory. A key focus of the course is on effective implementation and good design principles.
  
==<span style="color:#AB0000">Outcomes:</span> ==
+
==<span style="color:#AB0000">Outcomes</span> ==
  
 
After completing this course, students should be able to:
 
After completing this course, students should be able to:
Line 40: Line 40:
 
* Perform detailed, code-level design and document the design in an understandable way.
 
* Perform detailed, code-level design and document the design in an understandable way.
  
==<span style="color:#AB0000">Content details:</span> ==
+
==<span style="color:#AB0000">Content details</span> ==
Refer to the '''[[Data Structures and Algorithms for Engineers Lecture Plan|Lecture Plan]]''' for information on course delivery, including lectures, labs, assignments, and exercises.
+
Refer to the  
 +
'''[[Data Structures and Algorithms for Engineers Lecture Schedule|Lecture Schedule]]'''  
 +
<!-- '''[[Data Structures and Algorithms for Engineers Lecture Schedule 2|Lecture Schedule]]''' -->
 +
for information on course delivery, including lectures, labs, assignments, and exercises.
  
 
The course will cover the following topics:
 
The course will cover the following topics:
  
* Introduction: The Software Development Lifecycle
+
* Introduction: The Software Development Life Cycle
* Representation of Algorithms
+
* Formalisms for Representing Algorithms
* Algorithmic Complexity
+
* Analysis of Complexity
 
* Searching and Sorting Algorithms
 
* Searching and Sorting Algorithms
 
* Abstract Data Types (ADT)
 
* Abstract Data Types (ADT)
Line 56: Line 59:
 
* Heaps
 
* Heaps
 
* Graphs
 
* Graphs
* Complex networks
+
* Complex Networks
 
* Hashing
 
* Hashing
 
* Algorithmic Strategies
 
* Algorithmic Strategies
 
* Analysis of Correctness  
 
* Analysis of Correctness  
* Databases
+
<!-- * Databases -->
* Programming Paradigms
+
<!-- * Programming Paradigms -->
* Automata Theory & Computability Theory
+
* Automata Theory  
 +
* Computability Theory
  
 
''The detailed content for each of these topics follows.''
 
''The detailed content for each of these topics follows.''
Line 70: Line 74:
 
* Motivation
 
* Motivation
 
* Goals of the course
 
* Goals of the course
* Topic areas
+
* Syllabus and lecture schedule
 +
* Course operation
 +
* Software development tools for assignments
 
* Preview of course material
 
* Preview of course material
* Overview of labs, assignments, and exercises
 
 
* Levels of abstraction in information processing systems
 
* Levels of abstraction in information processing systems
* The software development life cycle  
+
* The software development life cycle: Yourdon Structured Analysis - functional,  data, and behavioural models (hierarchical decomposition trees, architecture diagrams, data flow diagrams DFD, data dictionaries, entity relationship ER diagrams, state transition diagrams)
* Static, dynamic, physical structures
+
* Software process models: waterfall, evolutionary, formal transformation, re-use, hybrid, spiral
* Architectural design: components of design, internal vs external aspects
+
* Top down design and structured design
+
* Yourdon Structured Analysis: data flow diagrams (DFD), data dictionaries, process specification, entity relationship (ER) diagrams, state transition diagrams
+
* Software development tools for assignments
+
  
<span style="color:#AB0000">Representation of Algorithms</span><BR>
+
<span style="color:#AB0000">Formalisms for Representing Algorithms</span><BR>
 
* Definition of an algorithm
 
* Definition of an algorithm
 
* Modelling software
 
* Modelling software
* Representation, communication, and analysis of algorithms
 
 
* Relational modelling
 
* Relational modelling
 
* State modelling
 
* State modelling
 +
* Practical representations
 
* Pseudo code
 
* Pseudo code
 
* Flow charts
 
* Flow charts
Line 94: Line 95:
 
* Analysis
 
* Analysis
  
<span style="color:#AB0000">Algorithmic Complexity</span><BR>
+
<span style="color:#AB0000">Analysis of Complexity</span><BR>
* Complexity analysis
+
 
* Performance of algorithms, time and space tradeoff, worst case and average case performance  
 
* Performance of algorithms, time and space tradeoff, worst case and average case performance  
 
* Big O notation
 
* Big O notation
Line 134: Line 134:
 
<span style="color:#AB0000">Stacks</span><BR>
 
<span style="color:#AB0000">Stacks</span><BR>
 
* Stack (LIFO): push, pop, peek, size, numItems operations
 
* Stack (LIFO): push, pop, peek, size, numItems operations
* Array implementation in pseudo-code (directly and array of pointers to data)
+
* Array implementation (directly and array of pointers to data)
 
* Stack applications, including evaluation of infix, prefix, and postfix expressions
 
* Stack applications, including evaluation of infix, prefix, and postfix expressions
  
 
<span style="color:#AB0000">Queues</span><BR>
 
<span style="color:#AB0000">Queues</span><BR>
 
* Queue (FIFO): enqueue, dequeue, peek, size, numItems operations
 
* Queue (FIFO): enqueue, dequeue, peek, size, numItems operations
* Array implementation in pseudo-code (directly and array of pointers to data)
+
* Array implementation (directly and array of pointers to data)
* Linked list implementation in pseudo-code
+
* Linked list implementation
 
* Circular queues
 
* Circular queues
 
* Performance considerations
 
* Performance considerations
* Deque
+
<!-- * Deque -->
  
 
<span style="color:#AB0000">Trees</span><BR>
 
<span style="color:#AB0000">Trees</span><BR>
Line 153: Line 153:
 
* Height-balanced trees: AVL Trees, RR, RL, LR, LL rotations
 
* Height-balanced trees: AVL Trees, RR, RL, LR, LL rotations
 
* Height-balanced trees: Red-Black Trees, single promotion, zig-zag promotion, recolouring and restructuring
 
* Height-balanced trees: Red-Black Trees, single promotion, zig-zag promotion, recolouring and restructuring
* Non-search trees: parse trees, array implementation, linked list implementation
+
<!-- * Non-search trees: parse trees, array implementation, linked list implementation -->
* Forests
+
<!-- * Forests -->
  
 
<span style="color:#AB0000">Heaps</span><BR>
 
<span style="color:#AB0000">Heaps</span><BR>
Line 164: Line 164:
 
* Operating systems heaps
 
* Operating systems heaps
 
* Implementation of heap
 
* Implementation of heap
* Heap sort (pseudo-code)
+
* Heap sort  
* d-ary heaps
+
<!-- * d-ary heaps -->
* Leftist  heaps
+
<!-- * Leftist  heaps -->
  
 
<span style="color:#AB0000">Graphs</span><BR>
 
<span style="color:#AB0000">Graphs</span><BR>
Line 179: Line 179:
 
* Dijkstra's shortest path algorithm
 
* Dijkstra's shortest path algorithm
 
* Floyd-Warshall's all-pairs algorithm
 
* Floyd-Warshall's all-pairs algorithm
* Graphs problems: routes, Hamilton paths, network flows, covering problems, museum guard problem
+
<!-- * Graphs problems: routes, Hamilton paths, network flows, covering problems, museum guard problem -->
* Fleury's Euler circuit algorithm
+
<!-- * Fleury's Euler circuit algorithm -->
  
 
<span style="color:#AB0000">Complex Networks</span><BR>
 
<span style="color:#AB0000">Complex Networks</span><BR>
* Complex systems and large networks
+
<!-- * Complex systems and large networks
* Random networks
+
* Community detection
* Degree distribution
+
* Network evolution -->
* Clustering
+
* The importance of complex networks and network science
 +
* Euler's theorem: the Bridges of Königsberg
 +
* Networks vs. graphs
 +
* Degree, average degree, and degree distribution
 +
* Bipartite networks
 +
* Path length, BFS, Connectivity, Components
 +
* Clustering coefficient
 +
* Random graph model
 
* Small world phenomena
 
* Small world phenomena
 
* Scale free networks
 
* Scale free networks
 +
* Communities
 +
* Fundamental Hypothesis
 +
* Connectedness and Density Hypothesis
 +
* Strong and weak communities
 +
* Graph partitioning
 
* Community detection
 
* Community detection
* Network evolution
+
* Hierarchical clustering
 +
* Girvan-Newman Algorithm
 +
* Modularity
 +
* Random Hypothesis
 +
* Maximum Modularity Hypothesis
 +
* Greedy algorithm for community detection by maximizing modularity
 +
* Overlapping communities
 +
* Clique percolation algorithm and CFinder
  
 
<span style="color:#AB0000">Hashing</span><BR>
 
<span style="color:#AB0000">Hashing</span><BR>
Line 199: Line 218:
 
* Hash tables: current value tables, direct access tables
 
* Hash tables: current value tables, direct access tables
 
* Managing collisions: chaining, overflow areas, re-hashing, linear probing, quadratic probing
 
* Managing collisions: chaining, overflow areas, re-hashing, linear probing, quadratic probing
* Evaluating hash functions: prime division, mid-square, folding, load factor
+
<!-- * Evaluating hash functions: prime division, mid-square, folding, load factor -->
 
* Example application: dictionaries
 
* Example application: dictionaries
* Generating hash functions and using hash structures
+
<!-- * Generating hash functions and using hash structures -->
  
 
<span style="color:#AB0000">Algorithmic Strategies</span><BR>
 
<span style="color:#AB0000">Algorithmic Strategies</span><BR>
Line 207: Line 226:
 
* Brute force
 
* Brute force
 
* Divide and conquer
 
* Divide and conquer
* Branch and bound
 
* Dynamic programming
 
 
* Greedy algorithms
 
* Greedy algorithms
* Heuristics and heuristic algorithms
+
* Dynamic programming
* Probabilistic algorithms
+
* Combinatorial search and backtracking
 +
* Branch and bound
 +
<!-- * Heuristics and heuristic algorithms -->
 +
<!-- * Probabilistic algorithms -->
  
 
<span style="color:#AB0000">Analysis of Correctness</span><BR>
 
<span style="color:#AB0000">Analysis of Correctness</span><BR>
Line 225: Line 245:
 
* Verification and validation strategies
 
* Verification and validation strategies
  
 +
<!--
 
<span style="color:#AB0000">Databases</span><BR>
 
<span style="color:#AB0000">Databases</span><BR>
 
* Databases: relational databases, hierarchical databases, NoSQL databases
 
* Databases: relational databases, hierarchical databases, NoSQL databases
Line 230: Line 251:
 
* Compression strategies, dictionary algorithm, LZ algorithm
 
* Compression strategies, dictionary algorithm, LZ algorithm
 
* File structure strategies.
 
* File structure strategies.
 
+
-->
 +
<!--
 
<span style="color:#AB0000">Programming Paradigms</span><BR>
 
<span style="color:#AB0000">Programming Paradigms</span><BR>
 
* Imperative programming
 
* Imperative programming
* Object-oriented programming
 
 
* Logic programming
 
* Logic programming
 
* Functional programming
 
* Functional programming
Line 239: Line 260:
 
* OO design methodology: UML class diagram, composite-structure diagram, architecture diagram, package diagram, object diagram, component diagram, deployment diagram, activity diagram, sequence diagram, communication diagram, interaction diagram, timing diagram, use case diagram, state machine diagram
 
* OO design methodology: UML class diagram, composite-structure diagram, architecture diagram, package diagram, object diagram, component diagram, deployment diagram, activity diagram, sequence diagram, communication diagram, interaction diagram, timing diagram, use case diagram, state machine diagram
 
* OO design principles: open/close principle, design by contract principle, dependency inversion principle, other design principles, documentation
 
* OO design principles: open/close principle, design by contract principle, dependency inversion principle, other design principles, documentation
 
+
-->
<span style="color:#AB0000">Automata Theory and Computability Theory</span><BR>
+
<span style="color:#AB0000">Automata Theory</span><BR>
 
* Regular Languages
 
* Regular Languages
 
* Finite Automata
 
* Finite Automata
 
* Nondeterminism
 
* Nondeterminism
 
* Regular Expressions
 
* Regular Expressions
* Nonregular Languages�
+
* Nonregular Languages
 
*Context-free Languages
 
*Context-free Languages
 
* Context-free Grammars
 
* Context-free Grammars
 
* Pushdown Automata
 
* Pushdown Automata
 
* Deterministic Context-Free Languages
 
* Deterministic Context-Free Languages
 +
 +
<span style="color:#AB0000">Computability Theory</span><BR>
 
* The Church-Turing Thesis
 
* The Church-Turing Thesis
 
* Turing Machines
 
* Turing Machines
 
* Variants of Turing Machines
 
* Variants of Turing Machines
* The Definition of Algorithm�
+
* The Definition of Algorithm
 
* Decidability
 
* Decidability
 
* Decidable Languages
 
* Decidable Languages
* Undecidability�
+
* Undecidability
 
* Reducibility
 
* Reducibility
  
==<span style="color:#AB0000">Faculty:</span> ==
+
==<span style="color:#AB0000">Lecture Schedule</span> ==
 +
Refer to the [[Data Structures and Algorithms for Engineers Lecture Schedule|Lecture Schedule]] for information on course delivery, including lectures, labs, assignments, and exercises.
 +
 
 +
==<span style="color:#AB0000">Faculty</span> ==
 
[http://www.vernon.eu David Vernon]
 
[http://www.vernon.eu David Vernon]
  
==<span style="color:#AB0000">Delivery:</span> ==  
+
==<span style="color:#AB0000">Delivery</span> ==  
 
Face-to-face.
 
Face-to-face.
  
==<span style="color:#AB0000">Students assessment:</span> ==  
+
==<span style="color:#AB0000">Student assessment</span> ==  
 
This course includes several hands-on programming and analysis assignments. Students will program mainly in C/C++. The programming assignments include individual assignments and a team capstone project in teams of 2-3 people. In addition to programming assignments, students will be assigned readings to support the lecture material.
 
This course includes several hands-on programming and analysis assignments. Students will program mainly in C/C++. The programming assignments include individual assignments and a team capstone project in teams of 2-3 people. In addition to programming assignments, students will be assigned readings to support the lecture material.
  
 
Marks will be awarded as follows.
 
Marks will be awarded as follows.
  
Individual Assignments 50%
+
Seven individual assignments 70%.
Final Capstone Project 40% (The capstone project will be completed in 2-3 person teams).
+
Final examination 30%.
Instructor Judgement 10% (We will use time tracking and observation to determine this part of the grade).
+
<!-- Instructor Judgement 10% (We will use time tracking and observation to determine this part of the grade). -->
+
 
==<span style="color:#AB0000">Software requirements:</span> ==
+
==<span style="color:#AB0000">Software tools</span> ==
 
We will use Microsoft Visual C++ Express compiler, version 10.0 (also known as Visual C++ 2010) and CMake running on Windows 7 (64 bit).   
 
We will use Microsoft Visual C++ Express compiler, version 10.0 (also known as Visual C++ 2010) and CMake running on Windows 7 (64 bit).   
  
A complete software installation guide will be provided in due course.
+
Please follow the instructions provided in the [[04-630 Software Development Environment|Software Development Environment]] installation guide.
  
==<span style="color:#AB0000">Course texts:</span> ==
+
==<span style="color:#AB0000">Course texts</span> ==
  
''Algorithmics: The Spirit of Computing'', Third Edition, David Harel and Yishai Feldman.
+
David Harel and Yishai Feldman, ''Algorithmics: The Spirit of Computing'', Third Edition.
  
''Data Structures and Algorithms'',  Alfred V. Aho,  Jeffrey D. Ullman, and John E. Hopcroft.
+
Alfred V. Aho,  Jeffrey D. Ullman, and John E. Hopcroft, ''Data Structures and Algorithms''.
 +
 
 +
A selection of examples will be taken from Steven Skiena, "The Algorithm Design Manual", Second Edition.
  
 
A selection of papers and readings will be provided to complement these required textbooks.
 
A selection of papers and readings will be provided to complement these required textbooks.
  
 
+
==<span style="color:#AB0000">Acknowledgments</span> ==
==<span style="color:#AB0000">Acknowledgments:</span> ==
+
This syllabus is based mainly on Course 04-630 Computer Science Principles for Practicing Engineers given by Mel Rosso-Llopart and Anthony J. Lattanze at Carnegie Mellon University, USA, Course CS-CO-412 Algorithms and Data Structures given by David Vernon at Innopolis University, Russia, and Course IT706A Scientific Theory in Informatics given by David Vernon and others at the University of Skövde, Sweden.
This syllabus is based mainly on Course 04-630 Computer Science Principles for Practicing Engineers given by Mel Rosso-Llopart and Anthony J. Lattanze at Carnegie Mellon University.
+
 
+
Additional topics and teaching material have been taken from Course CS-CO-412 Algorithms and Data Structures given by David Vernon at the Innopolis University.
+

Revision as of 11:19, 18 April 2017

|CARNEGIE MELLON UNIVERSITY AFRICA|

04-630
Data Structures and Algorithms for Engineers

Core

Units: 12

Lecture/Lab/Rep hours/week: 4 hours lectures/week, 1.5 hours labs/week (two sessions), 1 hour recitation/week (two sessions)

Semester: Spring

Pre-requisites: programming skills

Students are expected to be familiar with programming in at least one programming language. Formal programming language training is not required. Students may not have any formal background in algorithms, data structures, analysis, or detailed design techniques and methods.

Course description

Many organizations today are incorporating computer hardware and software into the products that they design and build. Most of these organizations' primary competencies are not computer science or software engineering, but rather they find that automation makes their products smarter, more capable, and more appealing in the market place. Because deep domain knowledge is needed to build these products, these organizations often hire engineers from traditional engineering disciplines to design and build the product platform, in many cases requiring them to write software to make the product actually work. These are capable engineers from many disciplines other than software engineering and unfortunately they usually learn software engineering on the job. This process typically involves considerable trial and error and often results in poorly designed and documented systems, defect laden software, bloated product development costs, unmaintainable software, and missed opportunities to leverage software development investments.

In addition to developing mere functionality, some application domains are often highly constrained and unforgiving in their quality attribute needs such as performance, safety, and availability. These systems intimately depend upon software to provide these capabilities in addition to basic functionality. Designing software intensive systems with these properties in a cost-effective way requires first-class computer science and software engineering expertise. While many practicing engineers often have many years of industrial experience writing software applications, many lack a formal background in computer science principles. These engineers may have had a few courses or technical training in specific computer languages or technologies, but in general they often lack formal training in algorithms, computing theory, data structures, and design among other key topics. The result is that many of these engineers are not fully realizing their potential as software engineers. This course is designed to bridge these gaps in formal computer science training.

Learning objectives

The primary objective of the course is to provide engineers without formal training in computer science, a solid background in the key principles of computer science, in general, and of the algorithms and data-structures, in particular. The key purpose of this course is to complement the experience that engineers may already have in writing software with formal computer science underpinnings, making those engineers more capable in developing software intensive systems.

The course begins by considering the main phases of the software development lifecycle, from requirements elicitation, to computational modelling, system specification, software design, implementation, and software quality assurance, including various forms of testing, verification, and validation. Then, building on the concept of abstract data types, the course provides an in in-depth treatment of the key elements of algorithms and data-structures, beginning with the fundamentals of searching, sorting, lists, stacks, and queues, but quickly progressing to more advanced topics, including trees, graphs, and algorithmic strategies. It also covers the analysis of the performance and tractability of algorithms, finishing with automata theory and computability theory. A key focus of the course is on effective implementation and good design principles.

Outcomes

After completing this course, students should be able to:

  • Recognize and analyze critical computational problems, generate alternative solutions to problems, and assess their relative merits.
  • Understand, analyze, and characterize those factors that influence algorithmic computational performance and memory consumption.
  • Design, implement, and document appropriate, effective, and efficient data structures & algorithms for a variety of real-world problems.
  • Understand detailed software structures and their underlying strengths and weaknesses.
  • Perform detailed, code-level design and document the design in an understandable way.

Content details

Refer to the Lecture Schedule for information on course delivery, including lectures, labs, assignments, and exercises.

The course will cover the following topics:

  • Introduction: The Software Development Life Cycle
  • Formalisms for Representing Algorithms
  • Analysis of Complexity
  • Searching and Sorting Algorithms
  • Abstract Data Types (ADT)
  • Containers, Dictionaries, and Lists
  • Stacks
  • Queues
  • Trees
  • Heaps
  • Graphs
  • Complex Networks
  • Hashing
  • Algorithmic Strategies
  • Analysis of Correctness
  • Automata Theory
  • Computability Theory

The detailed content for each of these topics follows.


Introduction: The Software Development Life Cycle

  • Motivation
  • Goals of the course
  • Syllabus and lecture schedule
  • Course operation
  • Software development tools for assignments
  • Preview of course material
  • Levels of abstraction in information processing systems
  • The software development life cycle: Yourdon Structured Analysis - functional, data, and behavioural models (hierarchical decomposition trees, architecture diagrams, data flow diagrams DFD, data dictionaries, entity relationship ER diagrams, state transition diagrams)
  • Software process models: waterfall, evolutionary, formal transformation, re-use, hybrid, spiral

Formalisms for Representing Algorithms

  • Definition of an algorithm
  • Modelling software
  • Relational modelling
  • State modelling
  • Practical representations
  • Pseudo code
  • Flow charts
  • Finite state machines
  • UML
  • Predicate logic
  • Analysis

Analysis of Complexity

  • Performance of algorithms, time and space tradeoff, worst case and average case performance
  • Big O notation
  • Recurrence relationships
  • Analysis of complexity of iterative and recursive algorithms
  • Recursive vs. iterative algorithms: runtime memory implications
  • Complexity theory: tractable vs intractable algorithmic complexity
  • Example intractable problems: travelling salesman problem, Hamiltonian circuit, 3-colour problem, SAT, cliques
  • Determinism and non-determinism
  • P, NP, and NP-Complete classes of algorithm

Searching and Sorting Algorithms

  • Linear and binary search (iterative and recursive)
  • In-place sorts: bubblesort (efficient and inefficient), selection sort, insertion sort.
  • Not-in-place sorts: Quicksort, merge sort.
  • Complexity analysis
  • Characteristics of a good sort
  • Speed, consistency, keys, memory usage, length & code complexity, stability
  • Other sorts ordered by complexity

Abstract Data Types (ADT)

  • Vector example exercise
  • History of abstraction
  • Abstract Data Types (ADT)
  • Information hiding
  • Types and typing
  • Encapsulation
  • Efficiency
  • Design practices

Containers, Dictionaries, and Lists

  • Basic operations
  • Implementation with arrays and linked lists
  • Singly linked lists
  • Doubly linked lists
  • Performance considerations

Stacks

  • Stack (LIFO): push, pop, peek, size, numItems operations
  • Array implementation (directly and array of pointers to data)
  • Stack applications, including evaluation of infix, prefix, and postfix expressions

Queues

  • Queue (FIFO): enqueue, dequeue, peek, size, numItems operations
  • Array implementation (directly and array of pointers to data)
  • Linked list implementation
  • Circular queues
  • Performance considerations

Trees

  • Concepts and terminology: level, height, external and internal nodes, skinny, fat, complete, left-complete, perfect, multi-way, d-ary
  • Types of tree: binary, binary search, B-tree, 2-3 tree, AVL, Red-Black
  • Binary trees and binary search trees
  • Tree traversals: inorder, preorder, postorder
  • Fixed-length codes, variable length codes, optimal code trees, Huffman's algorithm and implementation
  • Height-balanced trees: AVL Trees, RR, RL, LR, LL rotations
  • Height-balanced trees: Red-Black Trees, single promotion, zig-zag promotion, recolouring and restructuring

Heaps

  • Heap basics
  • Types of heap: min heaps and max heap
  • Heap characteristics
  • Heap operations: delete max/min, down heap, up heap, merge, construct, heapify; complexity of operations
  • Priority queues
  • Operating systems heaps
  • Implementation of heap
  • Heap sort

Graphs

  • Types of graphs: directed, undirected, weighted, unweighted, cyclic, acyclic, directed acyclic, simple, non-simple, implicit, explicit, embedded, topological
  • Adjacency matrix representation,
  • Adjacency list representation
  • Topological sort
  • Euler's theorem
  • Graph traversal: breadth-first and depth-first, uses of
  • Depth-first search and maze traversal
  • Spanning trees and minimum spanning trees, Kruskal's algorithm, Prim's algorithm
  • Dijkstra's shortest path algorithm
  • Floyd-Warshall's all-pairs algorithm

Complex Networks

  • The importance of complex networks and network science
  • Euler's theorem: the Bridges of Königsberg
  • Networks vs. graphs
  • Degree, average degree, and degree distribution
  • Bipartite networks
  • Path length, BFS, Connectivity, Components
  • Clustering coefficient
  • Random graph model
  • Small world phenomena
  • Scale free networks
  • Communities
  • Fundamental Hypothesis
  • Connectedness and Density Hypothesis
  • Strong and weak communities
  • Graph partitioning
  • Community detection
  • Hierarchical clustering
  • Girvan-Newman Algorithm
  • Modularity
  • Random Hypothesis
  • Maximum Modularity Hypothesis
  • Greedy algorithm for community detection by maximizing modularity
  • Overlapping communities
  • Clique percolation algorithm and CFinder

Hashing

  • Using keys to address data
  • Mappings: injection, surjection, bijection
  • Map ADT
  • Hash functions
  • Hash tables: current value tables, direct access tables
  • Managing collisions: chaining, overflow areas, re-hashing, linear probing, quadratic probing
  • Example application: dictionaries

Algorithmic Strategies

  • Classes of algorithms
  • Brute force
  • Divide and conquer
  • Greedy algorithms
  • Dynamic programming
  • Combinatorial search and backtracking
  • Branch and bound

Analysis of Correctness

  • Types of software defects
  • Code module design
  • Syntactic, semantic, logical defects
  • (Semi-)formal verification: partial vs. total correctness
  • Invariant assertion method
  • Simple proof strategies: by contradiction, counterexample, induction
  • Dynamic testing: unit tests, test harness, stubs, drivers, integration testing, regression testing.
  • Static tests: reviews, walkthroughs, inspections, reviewing algorithms and software
  • Pair programming
  • Verification and validation strategies

Automata Theory

  • Regular Languages
  • Finite Automata
  • Nondeterminism
  • Regular Expressions
  • Nonregular Languages
  • Context-free Languages
  • Context-free Grammars
  • Pushdown Automata
  • Deterministic Context-Free Languages

Computability Theory

  • The Church-Turing Thesis
  • Turing Machines
  • Variants of Turing Machines
  • The Definition of Algorithm
  • Decidability
  • Decidable Languages
  • Undecidability
  • Reducibility

Lecture Schedule

Refer to the Lecture Schedule for information on course delivery, including lectures, labs, assignments, and exercises.

Faculty

David Vernon

Delivery

Face-to-face.

Student assessment

This course includes several hands-on programming and analysis assignments. Students will program mainly in C/C++. The programming assignments include individual assignments and a team capstone project in teams of 2-3 people. In addition to programming assignments, students will be assigned readings to support the lecture material.

Marks will be awarded as follows.

Seven individual assignments 70%. Final examination 30%.

Software tools

We will use Microsoft Visual C++ Express compiler, version 10.0 (also known as Visual C++ 2010) and CMake running on Windows 7 (64 bit).

Please follow the instructions provided in the Software Development Environment installation guide.

Course texts

David Harel and Yishai Feldman, Algorithmics: The Spirit of Computing, Third Edition.

Alfred V. Aho, Jeffrey D. Ullman, and John E. Hopcroft, Data Structures and Algorithms.

A selection of examples will be taken from Steven Skiena, "The Algorithm Design Manual", Second Edition.

A selection of papers and readings will be provided to complement these required textbooks.

Acknowledgments

This syllabus is based mainly on Course 04-630 Computer Science Principles for Practicing Engineers given by Mel Rosso-Llopart and Anthony J. Lattanze at Carnegie Mellon University, USA, Course CS-CO-412 Algorithms and Data Structures given by David Vernon at Innopolis University, Russia, and Course IT706A Scientific Theory in Informatics given by David Vernon and others at the University of Skövde, Sweden.