image by Andrew Ridley
Use beginning and end of string in regular expressions
We often validate user input using regular expressions.
There are lots of regular expressions on the Internet. Every now and then we might ‘borrow’ one to save ourselves the life-sapping pain of creating one anew.
However, we should beware.
Instead of…
…using ^ and $ to enclose the regular expression.
# A regular expression matching a
# string of lowercase letters
/^[a-z]+$/
Use…
…\A and \z.
# A regular expression matching a
# string of lowercase letters
/\A[a-z]+\z/
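To show the anchored pattern in use, here is a minimal sketch that wraps the check in a small predicate method so the anchors live in one place. The method name lowercase_word? is hypothetical and chosen only for this sketch. It assumes Ruby 2.4 or later, where String#match? is available.

```ruby
# Hypothetical helper: true only if the WHOLE string is lowercase letters.
# \A anchors to the start of the string, \z to the absolute end.
def lowercase_word?(input)
  input.match?(/\A[a-z]+\z/)
end

puts lowercase_word?("word")        # prints true
puts lowercase_word?("word\nmore")  # prints false (the newline is not a-z)
puts lowercase_word?("Word")        # prints false (uppercase W)
```

Because the anchors are fixed inside the method, callers cannot accidentally swap them back to ^ and $.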
But why?
Being specific in this case will reduce potential security holes in your code.
The characters ^ and $ match the beginning and end of a line, not the beginning and end of an entire string, so a multi-line string passes the check as long as any one of its lines matches. If your validations are not precise, you could let potentially dangerous user input through.
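One way to see the line-versus-string difference is to scan a multi-line string with both patterns. In Ruby, ^ and $ are always line anchors (there is no flag that turns them into whole-string anchors), so the first pattern finds a match on every line. The sample string here is made up for illustration.

```ruby
text = "one\ntwo\nthree"

# ^ and $ anchor to each line, so every line is a separate match.
p text.scan(/^[a-z]+$/)    # prints ["one", "two", "three"]

# \A and \z anchor to the whole string, which contains newlines,
# so there is no match at all.
p text.scan(/\A[a-z]+\z/)  # prints []
```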
For example:
> "word\n<script>run_naughty_script();</script>".match?(/^[a-z]+$/)
=> true
> "word\n<script>run_naughty_script();</script>".match?(/\A[a-z]+\z/)
=> false
The string above, with its potentially harmful JavaScript, gets through the looser validation of ^ and $. You certainly don’t want that sort of code to run on your site.
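It is worth using a lowercase \z rather than an uppercase \Z. In Ruby, \Z matches at the end of the string or just before a final trailing newline, so it is slightly looser than \z. The sketch below, using a made-up input, compares the three options on a string that ends in a newline.

```ruby
input = "word\n"

p input.match?(/^[a-z]+$/)    # prints true  ($ matches before the newline)
p input.match?(/\A[a-z]+\Z/)  # prints true  (\Z tolerates one trailing newline)
p input.match?(/\A[a-z]+\z/)  # prints false (\z is the absolute end of string)
```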
Why not?
This is a case where being specific is important. Just do it.
Last updated on June 10th, 2018