SQL (often pronounced as sequel) stands for Structured Query Language and is one of the most commonly known programming languages. It is used as an interface for exchanging information between a person or program and a database, which is often known asquerying. It is also one of the easiest programming languages to understand, as it reads almost like English.
Say you have a table in your database that holds the data for your blog posts. The data looks a lot like a typical Excel spreadsheet, where the columns (also called fields) describe the data and the rows are entries in the table.
Writing a SQL query to get information from this table (called BlogPosts) is really quite easy. For example, if I want to make a list of the titles of all my posts, I can run:
SELECT Title FROM BlogPosts
And it will select the data for the Title column for all blog posts, returning the data:
- SQL Injection
- Lorem ipsum
Most of the time, if I want more than just the title, I’m probably only looking to get the data for one post in particular. So if I wanted to get the content and timestamp for this article, I would write:
SELECT Content, Timestamp FROM BlogPosts WHERE Title = "SQL Injection"
By adding the where clause, I limit the data returned to what matches the criteria.
There are other kinds of queries besides SELECT which allow you to retrieve data from the database — for example, an INSERT query adds another row to the table, an UPDATE query changes the data in an existing row, and a DELETE query… you get the point. There are of course many other queries, like those to create databases, schemas, tables, and columns, generate procedures, and create complex selections using joins, but the underlying idea is the same.
SQL injection is a kind of attack made by hackers to exploit how queries are written in SQL. Of all the common vulnerabilities on the web today, SQL injection is probably the easiest to exploit and one of the most severely damaging, since databases contain sensitive and important information.
Take, for example, a table that holds information for users who visit the website and log in.
(I should definitely have a better password, and these should be hashed, but for simplicity’s sake, let’s assume this is what the users table looks like.)
When a user logs in, we need to check if the user provided the correct username and password. We could do this with like so:
SELECT Password FROM Users WHERE Username = "[the text the user put in the Username field]"
For example, if the user typed matt into the Username box and hit submit, we would run the query
SELECT Password FROM Users WHERE Username = "matt"
And this query would return password123. If the user typed richard, however, the query would not return anything because there is norichard in this table. This is an easy way to check if a user with the inputted username exists in the table at all.
Now that we know matt’s password, we can check it against what he typed into the Password field to see if he entered the correct password.
More commonly, this process is done with just one check instead of two:
SELECT ID FROM Users WHERE Username = "[the text the user put in the Username field]" AND Password = "[the text the user put in the Password field]"
What this does is get the ID of the user if the username exists in the table and the password matched, or it gets nothing if the criteria weren’t satisfied.
So what’s vulnerable about this? What if somebody entered this:
If you follow the query above we wrote above, substituting [the text…] with the user’s input, you get this:
SELECT ID FROM Users WHERE Username = "matt" AND Password = "" OR "1" = "1"
The SQL interpreter reads this as: get the ID of a user if their username is matt and their password is blank, or if 1 is equal to 1. Since 1 is always equal to 1, the result of this query is to get the IDs of every single user in the table regardless. At a minimum, this means that the hacker can log in as anybody as long as he knows their username and enters that into the Password box. At worst, a poorly designed query allows the hacker to basically download all the information in the Users table — every user’s username and password, and essentially any other information stored in the database, including private data like phone numbers, addresses, and even more sensitive information.
Some applications allow more than one SQL query to be executed at once as long as you put a semicolon between them. If this is enabled, the hacker could do even more damage. What if the user entered in the Password field: ” OR “1” = “1”; DELETE FROM Users WHERE Username != “matt?
SELECT ID FROM Users WHERE Username = "matt" AND Password = "" OR "1" = "1"; DELETE FROM Users WHERE Username != "matt"
This is read as: get the ID of a user if their username is matt and their password is blank, or if 1 is equal to 1; then delete users whose username is not (!=) matt. A user typing some text into a textbox just managed to wipe out the entire users table in our database, and could potentially delete all the data in the entire database with a few other commands!
Fortunately, safeguarding against SQL injection is actually very easy. You can add a quotation mark to a query very easily with what is called an escape sequence — basically, if you put a backslash in front of a quotation mark, the SQL interpreter will read it as a quotation mark in the data instead. So simply replacing all ” with \” makes a query safe for the database.
SELECT ID FROM Users WHERE Username = "matt" AND Password = "\" OR \"1\" = \"1"
The above query simply means: get the ID of a user if their username is matt and their password is ” OR “1” = “1 (which is not the right password, of course).
There are other ways of sending data safely besides escaping. One other way is called a prepared statement, in which the query itself is sent separately from the data. It looks something like this:
Send this query to the server:
SELECT ID FROM Users WHERE Username = ? AND Password = ?
Then send the username to the server in another request, followed by the password. Finally, replace the first question mark with the username and the second with the password.
Both methods have their advantages and disadvantages, but with most programming languages, either one is very simple to implement. Unfortunately, many programmers go unaware of this major security hole and publish their applications with this vulnerability.