# Good
day_one
day_1
# Bad
DayOne
dayone
# Really bad
T <- FALSE
c <- 10
mean <- function(x) sum(x)
15 Writing good code
15.1 Code style
Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread.
All of the following examples are taken from the Tidyverse style guide.
15.1.1 Object names
There are only two hard things in Computer Science: cache invalidation and naming things.
– Phil Karlton
Variable and function names should use only lowercase letters, numbers, and _. Use underscores (_), so-called snake case, to separate words within a name.
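The naming rule above can be checked mechanically. The helper below is a minimal sketch, not part of the Tidyverse style guide: is_snake_case is a hypothetical name, and the pattern additionally assumes a name starts with a letter, since R does not allow syntactic names to begin with a digit or an underscore.

```r
# Hypothetical helper (not from the style guide): TRUE when a name uses only
# lowercase letters, digits, and underscores, and starts with a letter.
is_snake_case <- function(name) {
  # perl = TRUE keeps [a-z] restricted to ASCII lowercase in every locale
  grepl("^[a-z][a-z0-9_]*$", name, perl = TRUE)
}

is_snake_case(c("day_one", "day_1", "DayOne", "dayone"))
```

Note that dayone passes this check: it uses only permitted characters, and a pattern cannot tell that two words were run together. Separating words with underscores still takes human judgment.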
15.1.2 Spacing
Do not put spaces inside or outside parentheses for regular function calls.
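As a small illustration of the spacing rule, the sketch below calls mean() three ways. The vector x and the result names are made up for this example. R evaluates all three calls identically, so the difference is purely one of readability.

```r
x <- c(1, 2, NA, 4)

# Good: no space before the opening parenthesis or just inside either one
good <- mean(x, na.rm = TRUE)

# Bad: a space outside the parentheses
bad_outside <- mean (x, na.rm = TRUE)

# Bad: spaces inside the parentheses
bad_inside <- mean( x, na.rm = TRUE )
```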
15.1.3 Infix operators
Most infix operators (==, +, -, <-, etc.) should always be surrounded by spaces.
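A short sketch of the rule, using made-up values for feet and inches. The last lines show one case where the spacing changes what R does rather than only how the code looks: without spaces, x<-3 is parsed as an assignment, not as a comparison with -3.

```r
feet <- 5
inches <- 7

# Good: spaces around <-, *, and +
height <- (feet * 12) + inches

# Bad: same result, but the operators are hard to pick out
height_bad<-(feet*12)+inches

# Spacing can also change the meaning
x <- 5
is_below <- x < -3   # comparison: is x less than -3?
x<-3                 # assignment: x is now 3
```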
Fun note: many languages have infixes that naturally change the meaning of a word. In English we have many prefixes and suffixes, for example, unhappy (“un” is the prefix) or hopeless (“less” is the suffix). There is only one infix in the English language: “friggin” (and its derivatives), as in: unfrigginbelievable.
15.1.4 Long function calls
If a function call is too long to fit on a single line, use one line each for the function name, each argument, and the closing ). This makes the code easier to read and to change later.
# Good
do_something_very_complicated(
  something = "that",
  requires = many,
  arguments = "some of which may be long"
)

# Bad
do_something_very_complicated("that", requires, many, arguments,
                              "some of which may be long"
                              )
15.1.5 Long lines (piping)
If the arguments to a function don’t all fit on one line, put each argument on its own line and indent:
# Good
iris |>
  summarise(
    Sepal.Length = mean(Sepal.Length),
    Sepal.Width = mean(Sepal.Width),
    .by = Species
  )

# Bad
iris |>
  summarise(Sepal.Length = mean(Sepal.Length), Sepal.Width = mean(Sepal.Width), .by = Species)

# Also bad
summarise(
  iris,
  Sepal.Length = mean(Sepal.Length),
  Sepal.Width = mean(Sepal.Width),
  .by = Species
)
15.1.6 Short lines (piping)
Sometimes it’s useful to include a short pipe as an argument to a function in a longer pipe. Carefully consider whether the code is more readable with a short inline pipe or if it’s better to move the code outside the pipe and give it an evocative name.
# Good
x |>
  semi_join(y |> filter(is_valid))

# Ok
x |>
  select(a, b, w) |>
  left_join(y |> select(a, b, v), join_by(a, b))

# Better
x_join <- x |> select(a, b, w)
y_join <- y |> select(a, b, v)
left_join(x_join, y_join, join_by(a, b))
15.1.7 Style
- The point is that coding happens in community.
- Not only do you want your code to run well, but you want other people to be able to understand it and use it.
- The more that you and others use the same syntax, the better the communication will be.
15.2 Reflection questions
What are some of the general rules for writing clear code?
Give one reason why piping into a data verb is preferred to using the data frame as the first argument.
15.3 Ethics considerations
What could go wrong if your code style doesn’t match what your collaborator expects? Or if your collaborator finds your code hard to read?
Why is good communication important in data science?