Abstractions: a cautionary tale

![](2017_04_23_abstractions/onion.png) Abstractions: a cautionary tale # Abstractions: a cautionary tale

April 23, 2017

> "All problems in computer science can be solved by another level of indirection" > -- David Wheeler > "...except for the problem of too many layers of indirection." > -- Unknown --------------------------------------------------------------------------------------------- At the request of the involved, the names have been changed.
Out of respect for the victims, the rest has been told exactly as it occurred.
(well, close enough). ## Paving with good intentions Martin is a programmer, and a good one at that! He knows all the rules, and his favorite one is DRY: Don't Repeat Yourself. He abhors duplicated code, knowing it is a constant source of bugs. When you fix a bug in one place, you will unfailingly forget the place where the code is duplicated. One day Martin spots the following code: ~~~ python def write_reports(results): write_report(results.summary, "output/summaries/summary.json") write_report(results.basic_stuff, "output/reports/basic_stuff.json") write_report(results.complex_stuff, "output/reports/complex_stuff.json") def read_report_summary(): return read_report("output/summaries/summary.json") ~~~ "This is so bad", Martin thinks. "So much duplication! What if we need to change the paths? How will we remember to change all the lines?". Martin knows what time it is: it's REFACTORING TIME! He starts by breaking out common strings into named variables: ~~~ python OUTPUT_DIR = "output" SUMMARIES_FOLDER = "summaries" REPORTS_FOLDER = "reports" REPORT_EXTENSION = ".json" def write_reports(results): write_report(results.summary, os.path.join(OUTPUT_DIR, SUMMARIES_FOLDER, "summary" + REPORT_EXTENSION)) write_report(results.basic_stuff, os.path.join(REPORT_DIR, REPORTS_FOLDER, "basic_stuff" + REPORT_EXTENSION)) write_report(results.complex_stuff, os.path.join(REPORT_DIR, REPORTS_FOLDER, "complex_stuff" + REPORT_EXTENSION)) def read_report_summary(): return read_report(os.path.join(OUTPUT_DIR, SUMMARIES_FOLDER, "summary" + REPORT_EXTENSION)) ~~~ "Nice!", Martin thinks. "Now if we change the report directory, we only have to change one line of code! And changing format from Json to XML is also just one line!" But the code was beginning to look a bit unwieldy. "I know, let's create a helper function, and let's add the file names as named variables while we're at it." ~~~ python OUTPUT_DIR = "output" SUMMARIES_FOLDER = "summaries" REPORTS_FOLDER = "reports" SUMMARY_FILENAME = "summary" BASIC_STUFF_FILENAME = "basic_stuff" COMPLEX_STUFF_FILENAME = "complex_stuff" REPORT_EXTENSION = ".json" def get_output_path(folder_name, report_file_name): return os.path.join(OUTPUT_DIR, folder_name, report_file_name + REPORT_EXTENSION) def write_reports(results): write_report(results.summary, get_output_path(SUMMARIES_FOLDER, SUMMARY_FILENAME)) write_report(results.basic_stuff, get_output_path(REPORTS_FOLDER, BASIC_STUFF_FILENAME)) write_report(results.complex_stuff, get_output_path(REPORTS_FOLDER, COMPLEX_STUFF_FILENAME)) def read_report_summary(): return read_report(get_output_path(SUMMARIES_FOLDER, SUMMARY_FILENAME)) ~~~ "There, much better! But we should probably separate the code that handles where files are put from the code that actually formats the files! Let's put path handling in its own file:" `paths.py`: ~~~ python OUTPUT_DIR = "output" SUMMARIES_FOLDER = "summaries" REPORTS_FOLDER = "reports" SUMMARY_FILENAME = "summary" BASIC_STUFF_FILENAME = "basic_stuff" COMPLEX_STUFF_FILENAME = "complex_stuff" REPORT_EXTENSION = ".json" def get_output_path(folder_name, report_file_name): return os.path.join(OUTPUT_DIR, folder_name, report_file_name + REPORT_EXTENSION) ~~~ `report.py`: ~~~ python import paths def write_reports(results): write_report(results.summary, paths.get_output_path(paths.SUMMARIES_FOLDER, paths.SUMMARY_FILENAME)) write_report(results.basic_stuff, paths.get_output_path(paths.REPORTS_FOLDER, paths.BASIC_STUFF_FILENAME)) write_report(results.complex_stuff, paths.get_output_path(paths.REPORTS_FOLDER, paths.COMPLEX_STUFF_FILENAME)) def read_report_summary(): return read_report(paths.get_output_path(paths.SUMMARIES_FOLDER, paths.SUMMARY_FILENAME)) ~~~ "Perfection!" ## ...or was it? Eva has never worked in this part of the code before. She is reading some code and find `write_reports` and is curious where the summary report ends up. To decipher `paths.get_output_path(paths.SUMMARIES_FOLDER, paths.SUMMARY_FILENAME)` she has to read the whole `paths.py` file. After doing that she needs to piece it all together, mentally executing the code in hear head to get to the answer (`output/summaries/summary.json`). What a waste of time for poor Eva. *But it gets worse!* John is new at the company, still trying to figure things out. One day he finds a typo in the file `output/reports/basic_stuff.json`. "No problem!" John foolishly things, "Surely this will be a quick fix! First let me find out where the file is written". John begins by searching the code base for `output/reports/basic_stuff.json`: no hits. He searches for just `basic_stuff.json`. No hits. He scratches his head and finally tries `output/reports`: Still no hits. John quickly reconsiders his career choice before getting up to ask a colleague for help, hoping that he can find someone who wrote the code, remembers it, is still working at the company, *and* is at the office today. Luckily the first person John asks is Eva. They run `git blame` (whilst smugly remarking how well named the command is) and traces the code back to Martin. ## Confrontation! Eva asks Martin why he made the code so convoluted, and Martin replies: "Convoluted? It is so clean! What if we want to change where reports are written? In the old code you would have to change the code in many, many places, and you'd probably forget one place, creating a bug. With my version we only change the `OUTPUT_DIR` definition in one place in the code." "I don't think that is quite true." Eva says. "We read some of these reports in C++, so you would have to change the code there to.". "Ok, two places then. The point still stands". "And then there is our bash scripts, and I think we requests this from our web front end, so you probably need to check our JavaScript code too...". "Well, I guess I would have to search for where it is used and change all the places.", Martin says impatiently. "Search for what?" John butts in. "If the C++, Bash and JavaScript code is anything like this Python code we have no hope of finding where the file is named! All this abstraction is really just obfuscation!". "In any case", John continues, "if you are going to search the code, why not leave it like it was to start off with? We would still have to search-and-replace, the only difference it that *we would succeed!*". "It would be a huge diff!" Martin says. "We should aim for minimal diffs to avoid merge conflicts!" The discussion is now getting rather heated, and the three parties can't reach an agreement. At the end of the day, nothing changes. The code stays like it is: difficult to read, impossible to search, but with no repetition. ## The moral of the story While it is important to remove code duplication, it must be balanced against the complexity of adding more layers of abstraction. Each layer adds crust that a programmer must wade through before understanding what the code actually does, so it can hurt code readability. In this real-life example it also hurt *code searchability*. If *readability* is understanding the effect of some code, *searchability* is the opposite: given an effect, find the code that caused it. This is something programmers spend a lot of time doing: figuring out which piece of code produced a file, an object, a line in a log file, or showed that graph that popped up while running some script. TL;DR: balance [DRY](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself) against [KISS](https://en.wikipedia.org/wiki/KISS_principle). Make sure your code is readable and searchable. ## Epilogue One week later another employee changes the root directory name from "output" to "result". The diff read: ~~~ diff --- paths.py +++ paths.py @@ -1,4 +1,4 @@ -RESULT_DIR = "result" +OUTPUT_DIR = "output" SUMMARIES_FOLDER = "summaries" REPORTS_FOLDER = "reports" SUMMARY_FILENAME = "summary" BASIC_STUFF_FILENAME = "basic_stuff" COMPLEX_STUFF_FILENAME = "complex_stuff" REPORT_EXTENSION = ".json" -def get_result_path(folder_name, report_file_name): - return os.path.join(RESULT_DIR, folder_name, report_file_name + REPORT_EXTENSION) +def get_output_path(folder_name, report_file_name): + return os.path.join(OUTPUT_DIR, folder_name, report_file_name + REPORT_EXTENSION) def get_result_path(folder_name, report_file_name): import paths --- reports.py +++ reports.py def write_reports(results): - write_report(results.summary, paths.get_result_path(paths.SUMMARIES_FOLDER, paths.SUMMARY_FILENAME)) - write_report(results.basic_stuff, paths.get_result_path(paths.REPORTS_FOLDER, paths.BASIC_STUFF_FILENAME)) - write_report(results.complex_stuff, paths.get_result_path(paths.REPORTS_FOLDER, paths.COMPLEX_STUFF_FILENAME)) + write_report(results.summary, paths.get_output_path(paths.SUMMARIES_FOLDER, paths.SUMMARY_FILENAME)) + write_report(results.basic_stuff, paths.get_output_path(paths.REPORTS_FOLDER, paths.BASIC_STUFF_FILENAME)) + write_report(results.complex_stuff, paths.get_output_path(paths.REPORTS_FOLDER, paths.COMPLEX_STUFF_FILENAME)) def read_report_summary(): - return read_report(paths.get_result_path(paths.SUMMARIES_FOLDER, paths.SUMMARY_FILENAME)) + return read_report(paths.get_output_path(paths.SUMMARIES_FOLDER, paths.SUMMARY_FILENAME)) ~~~ So much for changing just one line of code.