Decision Trees

Grade Level: 
High School
Subject: 
algorithms, programming, computer science
 
Lesson Synopsis 
This lesson activity explores how simple computing concepts/algorithms have contributed to solving real life problems. Students will also learn solving problems with decision trees. Students will have the opportunity to work in teams to explore an example of how the decision tree can be used for detecting subscription fraud.
 
Lesson Objectives 
- Learn about decision trees.
 
- Learn about algorithms: Decision tree is one way to display an algorithm.
 
- Learn about subscription fraud and how computing can help solve telecommunication challenges.
 
- Learn the implementation of algorithms using a programming language.
 
- Learn about teamwork and computing problem solving in groups.
 
Internet Connections
- TryComputing (www.trycomputing.org).
 
- ACM/IEEE Curricular task forces (http://csta.acm.org/Curriculum/sub/ACMK12CSModel.html).
 
- Wikipedia: Decision trees (www.wikipedia.org)
 
- Teamwork (Teamworkpm.net)
 
- Wikipedia: Data analysis technique for fraud detection (www.wikipedia.org)
 
Recommended Reading
Introduction to algorithms by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein ISBN: 978-0-262-03384-8.
 
- Advances in security and payment methods for mobile commerce By Wen Chen Hu, Chung-Wei Lee, Weidong Kou ISBN: 1591403456
 
- The decision tree by Kenneth A. Friedman ISBN: 0961586869
   
Optional Writing Activity 
Write a decision tree on the type of decision software developers would need to consider when developing a game. 
 
Materials
 
Procedure
1. Show students the various Student Reference Sheets. These may be read in class, or provided as reading material.
 
2. Divide students into groups of 3 or 4.
 
3. Each team should assign tasks to its members. Members could also be named based on task assigned to the member. E.g.Member1:Your task is the analyze the manual procedure for detecting subscription fraud and build a decision tree from it hence he can be called the “Computer analyst/Team leader” of the team, Member2: Your task will be to build an algorithm in the form of if-else statements from the decision tree hence he can be called the “algorithm designer” of the team, Member3: Your task will be to illustrate how it will be implemented in a programming language” etc. 
 
4. Teams should have the option of using Teamwork project management (www.teamworkpm.net) to complete the project. 
 
5. The “Team leader/Computer analyst” of the team should read about subscription fraud and write a short report briefing he’s other team members about the project.
  
6. All team members should work together to develop a step by step procedure on how they intend to achieve the goal. Starting from understanding how to detect subscription fraud manually, using this method to build a decision tree, translating the decision tree into if-else instructions, implementing it using PHP programming language and testing.
 
7. The First team member should also develop the decision tree for subscription fraud detection. Draw the decision tree, write a report on it, present & explain it to the other team members.
 
8. The Second team member should translate the decision tree into line by line conditional statements, write a report and explain it to he’s other team members.
 
9. The third team member should investigate how this conditional statement can be translated into a program using any programming language and present he’s report to he’s team members.
 
10.Teams complete an evaluation/reflection worksheet, and present their findings to the class.
 
Extended Option (Using teamworkpm.net)
If students decide to use teamworkpm.net:
1. The team leader should open an account with teamworkpm.net using the team’s name.
2. The team leader should invite he’s other members using their email addresses to join the team online.
3. The team can now use teamworkpm.net to submit reports, communicate, and collaborate to build the program using the procedure above.
 
Advanced Option
Have advanced students with knowledge in PHP to write a program using the decision tree to detect fraud.
 
Time Needed
Three to four 45 minute sessions.
   
Student Resource:
Algorithm and Decision Trees
 
Algorithms
In simple words an algorithm is a step-by-step procedure for calculations. In mathematics and computer science, an algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function. Algorithms are used for calculation, data processing, and automated reasoning. Starting from an initial state and initial input (perhaps empty), the instructions describe a computation that, when executed, will proceed through a finite number of well-defined successive states, eventually producing "output" and terminating at a final ending state. The transition from one state to the next is not necessarily deterministic; some algorithms, known as randomized algorithms, incorporate random input. 
 
Decision trees
Decision trees are used for deciding between several courses of action. Decision trees are predictive model, used to graphically organize information about possible options, consequences and end value. Decision trees are easy to understand and interpret. A decision tree is also a method for expressing algorithms. The goal of a decision tree is to create a model that predicts the value of a target variable based on several input variables.
 
Imagine you only ever do four things at the weekend: go shopping, watch a movie, play tennis, go walking or just stay in. What you do depends on three things: the weather (windy, rainy or sunny); how much money you have (rich or poor) and whether your parents are visiting. You say to your yourself: if my parents are visiting, we'll go to the cinema. If they're not visiting and it's sunny, then I'll play tennis, but if it's windy, and I'm rich, then I'll go shopping. If they're not visiting, it's windy and I'm poor, then I will go walking. If they're not visiting and it's rainy, then I'll stay in. To remember all this, you draw a flowchart which will enable you to read off your decision. We call such diagrams decision trees.  
 
We can see why such diagrams are called trees, because, while they are admittedly upside down, they start from a root and have branches leading to leaves (the tips of the graph at the bottom). Note that the leaves are always decisions, and a particular decision might be at the end of multiple branches.
 
Reading Decision Trees
There is a link between decision tree representations and logical representations, which can be exploited to make it easier to understand (read) learned decision trees. If we think about it, every decision tree is actually a disjunction of implications (if ... then statements), and the implications are Horn clauses: a conjunction of literals implying a single literal. Decision trees don't have to be representations of decision making processes, and they can equally apply to categorization problems.
 
Student Resource:
What is Subscription Fraud?
 
Telecommunication fraud
Telecommunication fraud is the theft of telecommunication service (telephones, cell phones, computers etc.) or the use of telecommunication service to commit other forms of fraud. Victims include consumers, businesses and communication service providers. Subscription fraud is a type of telecommunication fraud
 
Subscription fraud
The main form of telecommunication fraud that has taken place to date is subscription fraud (that is, the failure to pay for services used).Detection of such fraud is difficult because it is easily masked as bad debt. In subscription fraud, offenders typically misrepresent their identity in order to avoid payment. Misrepresentation of identity is also important because in the most severe cases, subscription fraud is not an end in itself but rather a platform for several other frauds.
 
Subscription fraud is the act of using telephone services without the intension of paying. Customer subscribe for postpaid services, when the time comes for customer to pay, they don’t. The telecommunication industry has loss billions of dollars to subscription fraud.
 
Detecting Subscription Fraud
From investigations the telecommunication industry discovered a way for detecting subscription fraud. The procedure is illustrated below:
Two new variables were created; Pay Duration (in days) and ratio.
Pay Duration: Date of checking customer’s fraud status (today’s date)-last date of payment by customer
Ratio: balance/credit limit. Where balance=Outstanding balance + unbilled
 
Investigations showed that whenever the pay duration is greater than 31 days the customer is likely to be involved in BAD DEBT.
 
But if it’s less than or equal to 31 days, the ratio of the customer is checked, if the ratio is less than 1 then the customer’s fraud status is NOT FRAUD, but if the ratio is greater than 1, the pay duration is checked again, if the pay duration is less than or equal to 0 days then the customers fraud status is SUBSCRIPTION FRAUD, but if its greater than 0 days it is BAD DEBT.
 
A close study of the manual procedure for detecting subscription should make sense. Examples are available at http://www.tryengineering.org/lesson_detail.php?lesson=117
 
Teamwork/Project Management
This lesson has introduced group working. Software systems are not built by individuals, but by large teams. More advanced groups may be interested in considering how they could manage decision making within the context of building large software systems and thinking about the many decision this would involve. Teams don’t need to be meeting all team members very frequently because most of the work can be done online. Because of the importance of teamwork, computer scientists are developing different simulation programs for teamwork to be achieved online. 
 
Teamworkpm.net is one of the numerous online project management software available online for computer scientist, managers etc to use to achieve their goals. Some companies already using the teamworkpm.net includes;
 
Features:
• Task management.
• Teamwork.
• Quick overview of recent activity.
• Easy access to all you projects.
• Keep team informed with announcements.
• Online communication among team members.
• Assign tasks to team members
 
Instructions for using teamworkpm.net:
Teamworkpm.net is very easy to use. Visit the website and follow the instructions.
 
Mercurial (mercurial.selenic.com):
Another kind of decision that teams need to make is about how they organize individuals in the team to collaboratively work on the same documents without loosing one another’s work by editing the same documents at the same time. More advanced computer scientist use software like the mercurial source control management. Mercurial is a free, distributed source control management tool. It offers you the power to efficiently handle projects of any size while using an intuitive interface. It is easy to use and hard to break, making it ideal for anyone working with versioned files. Visit website to learn more. Such systems effectively encode a decision making protocol in their source code to ensure that teams can work collaboratively without risk of accidently overwriting the work of other team members.
 
Student Assignment:
Design Your Subscription Fraud Detection Program
You are a team of computer scientist and have been approached by a telecommunication company to develop a program where they can check the fraud status. The Information Technology Department of the company requires you to send them the decision tree and algorithm (Conditional Statements) you intend to use.
 
You are to work as a team, and have the option of using teamworkpm.net. Study your student resource and use it to write the algorithm and draw your decision tree.
  
1. Draw a simple decision tree to categorize animals (mammal, fish, reptile, bird etc) using physical attributes (whether it lays eggs, number of legs, etc.). This could easily be phrased as a question of learning a decision tree to decide which category a given animal is in, e.g., if it lays eggs and is homeothermic, then it's a bird, and so on...
 
2. Translate the decision tree below into if/else (Conditional) statements;
 
Teacher Resource: 
Alignment to Curriculum Frameworks
 
Note: All lesson plans in this series are aligned to the Computer Science Teachers Association K-12 Computer Science Standards, the U.S. Common Core State Standards for Mathematics, and if applicable also to the National Council of Teachers of Mathematics' Principles and Standards for School Mathematics, the International Technology Education Association's Standards for Technological Literacy, and the U.S. National Science Education Standards which were produced by the National Research Council.
 
Principles and Standards for School Mathematics
 
Data Analysis and Probability Standard
- Select and use appropriate statistical methods to analyze data 
 
Problem Solving Standard
As a result of activities, all students should develop
- Apply and adapt a variety of appropriate strategies to solve problems. 
- Monitor and reflect on the process of mathematical problem solving. 
 
Representation Standard
- Create and use representations to organize, record, and communicate mathematical ideas
- Use representations to model and interpret physical, social and mathematical phenomena
 
Common Core State Standards for Mathematics
 
Statistics and Probability Standard
- Using Probability to make decisions
- Use probability to evaluate outcomes of decisions
 
Standards for Technological Literacy – All Ages
 
The Nature of Technology
- Standard 1: Students will develop an understanding of the characteristics and scope of technology.
- Standard 2: Students will develop an understanding of the core concepts of technology.  
- Standard 3: Students will develop an understanding of the relationships among technologies and the connections between technology and other fields of study.
 
Technology and Society
- Standard 4: Students will develop and understanding of the cultural, social, economic, and political effects of technology.
 
The Designed World
- Standard 17: Students will develop an understanding of and be able to select and use information and communication technologies.
 
CSTA K-12 Computer Science Standards Grades 9-12 (ages 14-18)
 
5.3 Level 3: Applying Concepts and Creating Real-World Solutions (L3)
 
5.3.A Computer Science in the Modern World (MW)
- Collaboration (CL)
- Community, Global, and Ethical Impacts (CI)
 
5.3.B Computer Science Concepts and Practices (CP)
- Computational Thinking (CT)
- Collaboration (CL)