Refactoring User Input Importer in My Habit Tracker CLI (Intermediate Steps)
This article is part of the main Refactoring User Input Importer article that you can find here. If you haven't read that, you should start reading it from there because this page contains only the intermediate steps between identifying the problem and the refined solution.
Writing new code
I took the liberty to refactor these individual code blocks, make them simpler, reduce unnecessary abstractions and define them in the context of functions. Functions that have arguments with types and a return type.
In regards to parsing the journal entry and activity lines, my initial approach was to construct the Activity
and JournalEntry
dataclasses within the parser functions and return them. But that would violate the single responsibility principle because now the function would have two reasons to change. One when the user_input
file format changes and the other when the dataclass changes. So I think the best way is to return the parsed values as Tuple
or NamedTuple
.
- Extract a
journal_entry
/activity
date
Unpack the extend_date
abstraction and catch date formatting errors.
def parse_habits_date(partial_date: str) -> date:
full_date = f"{partial_date} {date.today().year}"
try:
habits_date = datetime.strptime(full_date, '%d %b %Y').date()
except ValueError:
logger.exception(f"Partial date '{partial_date}' has incorrect format, use '%d %b', i.e. 10 Aug")
return None
return habits_date
- Parse an
activity
line
In the case of the second code block, there is the parse_activity_line
function call that is an unnecessary abstraction considering that the entire code block is about parsing the activity line, so its implementation can just be placed within the function. And guess_life_aspect
is a different concern, so it shouldn't be part of the parser at all. In case the life_aspect
is not defined in the activity line, a None
result should be returned. A solution that declares a so-called ParsedActivity
NamedTuple would look like this:
from collections import namedtuple
ParsedActivityFields = ["activity_name", "life_aspect", "more_info"]
ParsedActivity = namedtuple("ParsedActivity", ParsedActivityFields)
def parse_activity(line: str) -> ParsedActivity:
activity_parts = line.split("|")
raw_activity_props = activity_parts[0]
try:
more_info = activity_parts[1].strip()
except IndexError:
more_info = None
activity_props = raw_activity_props.split(";")
activity_name = activity_props[0].strip()
try:
life_aspect = activity_props[1].strip()
except IndexError:
life_aspect = None
return ParsedActivity(activity_name, life_aspect, more_info)
- Parse multiple lines of
journal_entry
Parsing the journal entry lines is a super simple operation, essentially we want to chain all the lines together separated by a new line (\n
) character.
from typing import List
def parse_journal_entry(lines: List[str]) -> str:
return r"\n".join(lines)
You might think, "Why do we even need a separate function for a one-liner?". The question is valid but considering that this one-liner solves a problem specific to our domain, it deserves its own function. Also if the format of the journal entry in the user_input
file changes we know exactly which part of the codebase has to be altered.
From a different angle
So far we've looked at how individual lines that represent activities and journal entry records are processed. Now let's take a look at how the contents of a user_input
file will be broken down and fed into these parser functions.
After receiving the lines of a user input file from the read_user_input
function we need to split those lines into different lists that represent individual days. The user_input
format defines that each day must be separated by a new line, so between two days there always going to be an empty string and that's how we know what is the delimiter.
I tried to find an already existing function in the python standard library that splits a list using an element as a delimiter, but I couldn't find one so I decided to build my own. An implementation might look something like this:
from typing import List
def split_list(l: List[str], delimiter="", keep_delimiter=False) -> List[List[str]]:
result = []
chunk = []
for element in l:
if element == delimiter:
result.append(chunk)
chunk = []
if keep_delimiter:
chunk.append(element)
else:
chunk.append(element)
result.append(chunk)
return result
Tried to introduce the problem in generically so that it can be used for other things as well, because this is a well-defined algorithm I'm thinking to put this into a utils
library that can be used across packages or even projects.
Now that we have a list of things that happened on each individual day, we can loop through it and parse the days. The name of this function is describing this approach very well:
from typing import List
from collections import namedtuple
ParsedDayFields = ["raw_date", "raw_activities", "raw_journal_entry"]
ParsedDay = namedtuple("ParsedDay", ParsedDayFields)
def parse_day(lines: List[str]) -> ParsedDay:
raw_date = lines[0]
try:
journal_entry_index = lines.index("journal:")
except ValueError:
journal_entry_index = None
if journal_entry_index:
raw_activities = lines[1:journal_entry_index]
# we do not want to include the 'journal:' tag
raw_journal_entry = lines[journal_entry_index + 1:len(lines)]
else:
raw_activities = lines[1:len(lines)]
raw_journal_entry = None
return ParsedDay(raw_date, raw_activities, raw_journal_entry)
The argument of parse_day
is going to be the list of lines that represent that particular day and we want to return a NamedTuple
that will contain each of the sections of a day in separate properties (date, activities, journal entry). Even though these values will hold the raw data just as it was defined in the user input file, this is a useful data transformation because now we hold the sections of the day in different variables. And calling the parsing methods will be a piece of cake.
But before we come full circle we need to define the method that builds an Activity
and JournalEntry
instance from the parsed data. Let's call them builder functions, although they could be dataclass constructor overloads which I'm going to consider implementing in another iteration of this development.
- Building up the
Activity
instance
import logging
from datetime import date
from activities.activity import Activity
from activities.guess_life_aspect import guess_life_aspect
from parse_activity import ParsedActivity
from exceptions import ActivityValueError
logger = logging.getLogger(__name__)
def build_activity(parsed_activity: ParsedActivity, activity_date: date) -> Activity:
if not activity_date:
logger.error(f"activity_date is not defined")
raise ActivityValueError("Activity date is missing")
life_aspect = parsed_activity.life_aspect or guess_life_aspect(parsed_activity.activity_name)
if not life_aspect:
logger.error(f"Parsed activity '{parsed_activity}' is missing life aspect and can't be guessed")
raise ActivityValueError("Life aspect is missing")
return Activity(parsed_activity.activity_name, activity_date, life_aspect, parsed_activity.more_info)
The code is pretty self-explanatory however, I would like to mention that in this function we deal with getting data from different places such as guessing the life_aspect
property from previous records, given that the activity_name
has already been used before. Also, in this part of the code, we are going to raise crucial exceptions, that will indicate to the rest of the program that the user_input
might be written incorrectly and needs intervention.
If everything is good, a fresh new Activity
instance will be returned.
- Building up the
JournalEntry
instance
from datetime import date
from exceptions import JournalEntryValueError
from journal.journal_entry import JournalEntry
def build_journal_entry(parsed_record: str, record_date: date) -> JournalEntry:
if not parsed_record:
return None
if not record_date:
raise JournalEntryValueError
return JournalEntry(parsed_record, record_date)
Building the JournalEntry
object is much simpler however, if the record_date
is not defined and there's a valid parsed_record
we do want to raise an exception and stop the parsing process.
Piece it together
Now that we have all the building blocks necessary, let's put the pieces together and see what would the refactored solution look like.
We use the read_user_input
function to get the contents of the user_input
file as a list of strings (List[str]
) and we call the following parse_user_input
function with that list of strings and wait for a NamedTuple
called UserInput
which contains a list of activities (List[Activity]
) and a list of journal entries (List[JournalEntry]
) in return.
import sys
import logging
from typing import List
from collections import namedtuple
from read_user_input import read_user_input
from split_list import split_list
from exceptions import ActivityValueError, JournalEntryValueError
from parse_day import parse_day
from parse_habits_date import parse_habits_date
from parse_activity import parse_activity
from parse_journal_entry import parse_journal_entry
from build_activity import build_activity
from build_journal_entry import build_journal_entry
UserInputFields = ["activities", "journal_entries"]
UserInput = namedtuple("UserInput", UserInputFields)
def parse_user_input(lines: List[str]) -> UserInput:
if len(lines) == 0:
return UserInput([], [])
days = split_list(lines)
activities = []
journal_entries = []
for day in days:
parsed_day = parse_day(day)
habits_date = parse_habits_date(parsed_day.raw_date)
parsed_activities = [parse_activity(raw_activity) for raw_activity in parsed_day.raw_activities]
parsed_journal_entry = parse_journal_entry(parsed_day.raw_journal_entry)
activities += [build_activity(parsed_activity, habits_date) for parsed_activity in parsed_activities]
journal_entry = build_journal_entry(parsed_journal_entry, habits_date)
if journal_entry:
journal_entries.append(journal_entry)
return UserInput(activities, journal_entries)
if __name__ == "__main__":
filename = "user_input_example.txt"
user_input_lines = read_user_input(filename)
try:
user_input = parse_user_input(user_input_lines)
except ActivityValueError:
logging.error(f"There has been an issue parsing activities in '{filename}'")
sys.exit()
except JournalEntryValueError:
logging.error(f"There has been an issue parsing journal entries is '{filename}'")
sys.exit()
This code block has a bunch of imports, but that's exactly what we wanted, delegate responsibility to different functions so that the code responds better to change.
Conclusion
Check out the main Refactoring User Input Importer article here for reading the conclusion of the work.