Debugging - How can I debug my program? - Part 2

Pybeginner

Know what it's supposed to do

The first step in debugging something is knowing what it is supposed to do.

“My program doesn't work” isn't good enough: in order to diagnose and fix problems, we need to be able to differentiate between correct and incorrect output. 


If we can write a test case for the failing case - that is, if we can assert that with these inputs, the function should produce this result - we are ready to start debugging. If we can't, we need to figure out how we'll know when we've fixed things.
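As a rough illustration of "with these inputs, the function should produce this result", here is a minimal sketch. The function rescale and its test are hypothetical, made up for this example; they stand in for whatever function is actually failing.

```python
def rescale(values):
    """Linearly rescale values so the smallest maps to 0.0 and the largest to 1.0.

    Hypothetical function under test, invented for this illustration.
    """
    low, high = min(values), max(values)
    if high == low:
        # All values are identical: avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def test_rescale_known_case():
    # With these inputs, the function should produce this result.
    assert rescale([2.0, 4.0, 6.0]) == [0.0, 0.5, 1.0]


test_rescale_known_case()
```

If a check like this fails before the fix and passes after it, we know when we are done.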


But writing test cases for scientific software is often more difficult than writing test cases for commercial applications, because if we knew what the output of the scientific code should be, we would not be running the software: we would be writing up our results and moving on to the next program. In practice, scientists tend to do the following:

1. Test with simplified data:

Before doing statistics on a real dataset, we should try calculating statistics for a single record, for two identical records, for two records whose values are one step apart, or for some other case where we can work out the right answer by hand.
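The cases listed above could be checked along these lines. This is only a sketch: mean_and_spread is a hypothetical stand-in for the statistics code being tested, and the expected values are the ones we can work out by hand.

```python
import math


def mean_and_spread(values):
    """Return the mean and population standard deviation of values.

    Hypothetical function under test, invented for this illustration.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, math.sqrt(variance)


# A single record: the mean is the record itself and there is no spread.
mean, spread = mean_and_spread([5.0])
assert math.isclose(mean, 5.0)
assert math.isclose(spread, 0.0, abs_tol=1e-12)

# Two identical records: same mean, still no spread.
mean, spread = mean_and_spread([3.0, 3.0])
assert math.isclose(mean, 3.0)
assert math.isclose(spread, 0.0, abs_tol=1e-12)

# Two records one step apart: the mean is halfway, the spread is half a step.
mean, spread = mean_and_spread([1.0, 2.0])
assert math.isclose(mean, 1.5)
assert math.isclose(spread, 0.5)
```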


2. Test a simplified case:

If our program is supposed to simulate magnetic eddies in rapidly rotating, supercooled helium bubbles, our first test should be a helium bubble that is not rotating and is not being subjected to any external electromagnetic fields. Likewise, if we are looking at the effects of climate change on speciation, our first test should hold temperature, precipitation, and other factors constant.
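The same idea can be shown on a much smaller scale. The sketch below is a toy model, not the helium or climate simulation described above: simulate is a hypothetical one-dimensional oscillator, and the simplified case is "start at rest, apply no external force", where we know nothing should move.

```python
def simulate(x0, v0, forcing, steps, dt=0.01, stiffness=1.0):
    """Advance a toy 1-D oscillator and return its final position and velocity.

    Hypothetical model, invented for this illustration.
    """
    x, v = x0, v0
    for _ in range(steps):
        acceleration = -stiffness * x + forcing
        v += acceleration * dt
        x += v * dt
    return x, v


# Simplified case: no initial displacement, no initial velocity, no forcing.
# A system at rest with nothing pushing it should stay at rest.
final_x, final_v = simulate(x0=0.0, v0=0.0, forcing=0.0, steps=1000)
assert final_x == 0.0
assert final_v == 0.0
```

Only once the trivial case behaves is it worth switching on rotation, fields, or changing temperatures one at a time.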


3. Compare with an oracle:

A test oracle is something whose results are reliable, such as experimental data, an older program, or a human expert. We use test oracles to determine if our new program produces the correct results. If we have a test oracle, we should store its output for particular cases so that we can compare it with our new results as many times as we like without rerunning that program.
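One way to store an oracle's output and reuse it is sketched below. The file format (JSON), the function names, and the dictionary of named results are all assumptions made for this example; any format that can be reloaded reliably would do.

```python
import json
import math


def save_oracle_output(path, results):
    """Store the oracle's results (a dict of name -> number) for later comparisons."""
    with open(path, "w") as handle:
        json.dump(results, handle)


def matches_oracle(path, new_results, rel_tol=1e-9):
    """Return True if new_results agree with the stored oracle output."""
    with open(path) as handle:
        expected = json.load(handle)
    if expected.keys() != new_results.keys():
        return False
    # Compare floating-point values with a tolerance rather than with ==.
    return all(
        math.isclose(expected[name], new_results[name], rel_tol=rel_tol)
        for name in expected
    )
```

We would call save_oracle_output once, with the trusted results, and then call matches_oracle after every change to the new program.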


4. Check conservation laws:

Mass, energy, and other quantities are conserved in physical systems, so they should be conserved in programs as well. Likewise, if we are analyzing patient data, the number of records should either stay the same or decrease as we move from one analysis to the next (since we may discard outliers or records with missing values). If "new" patients start appearing out of nowhere as we move through our pipeline, it is probably a sign that something is wrong.
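For the patient-records example, the check can be built into the pipeline itself. The sketch below assumes records are dictionaries with an "age" field; the stage functions and run_pipeline are hypothetical names used only for illustration.

```python
def drop_missing(records):
    """Discard records with no recorded age."""
    return [r for r in records if r.get("age") is not None]


def drop_outliers(records, max_age=120):
    """Discard records whose age is implausibly large."""
    return [r for r in records if r["age"] <= max_age]


def run_pipeline(records, stages):
    """Apply each stage in turn, checking that no stage creates records."""
    for stage in stages:
        before = len(records)
        records = stage(records)
        # Conservation check: the count may stay the same or shrink, never grow.
        assert len(records) <= before, (
            f"{stage.__name__} added records: {before} -> {len(records)}"
        )
    return records


patients = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": 250},
    {"id": 4, "age": 61},
]
cleaned = run_pipeline(patients, [drop_missing, drop_outliers])
```

A stage that accidentally duplicates rows, for example through a bad join, would trip the assertion at the step where it happens rather than several analyses later.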


5. Visualize:

Data analysts often use simple visualizations to check both the science they are doing and the correctness of their code (just as we did in the opening lesson of this tutorial). This should only be used for debugging as a last resort, though, because it is very hard to compare two visualizations automatically.
