Create a worksheet of fake sales data in Microsoft Excel
Sometimes there’s a need to create our own data in Excel. Perhaps you want to practice data visualization but need an initial dataset. Maybe you need to make a calculation by looking up already existing values. Or the tutorial you’re watching doesn’t include a workbook to follow along with the video.
Today I will walk you through the creation of fake data, covering concepts like string concatenation, generating random numbers, data look up and more.
What we will create
In this demo we will create a dataset of fake sales data, where each row represents a sale, as seen in the following screenshot.
For the effect, we will mostly make use of random choice of values and number generation. The other thing that will be quite important is to have a worksheet from where we can look up values. For instance, given the customer id, use the name of that customer.
However, before going further, please understand the data included in this table is included just for the sake of practicing Excel. A real sales table could never have this structure. Our purpose with this exercise is to become more familiar with manipulating data, both creating new data and making use of already existing data (via look up).
Create the helper data
Now that you’ve been introduced to the demo, let’s talk about the helper data used for look up.
Since we’ll need to randomly choose values for our table, such as the quantity purchased, or look up values in a given range, like the price of a product, it would be good to have those values available somewhere to refer to. We’ll have that data written in a second worksheet, used exclusively for look up when generating data for our table.
So, let’s create our workbook. Open Microsoft Excel and create a new blank workbook. By default it has one worksheet, but that first one will be used for our dataset. Create a second worksheet by pressing the plus button in the footer
and now let’s start typing some data.
We’ll need to have sample sales personnel names, districts, customer ids and respective names and products and respective prices. What I will show next is that worksheet filled with some sample data:
All of the data you see in the above screenshot can be changed as you see fit, even the quantities. We only need to have some data that we can use as a base when generating the values for our table. And by the way, the first row with the titles is completely optional. I only included it for the sake of readability of the helper data.
If you want to use the same data as me, you can copy paste from this link. If you’re interested, you can also use this link to quickly generate fake names like the ones you see in the first column.
With the helper data out of the way, we can return to the first worksheet and start creating our dataset.
Set the data types
Before we start properly generating our data, we can change the data types of of our columns to facilitate data formatting.
First write down the column headers.
And now let’s change the data types. With that said, we only really need to change three columns: Date, Price and Profit. The first because we want its values to be stored as dates, and the other two because we want them to be stored as currency.
Hence, to change all the values for a column, click the letter above the column (A, B, C, …). Then, while pressing the Ctrl key, click the first cell of the selected column (to deselect the cell, after all, the column headers are just text). Finally, change the data to the appropriate data type:
For the Price and Profits, since you may want to select your preferred currency instead of your system default (by selecting the currency option shown in the previous screenshot), you can change it through this menu instead:
Apply the same changes to the Profit column and we are done with all the setup. We can move on to generating the actual data!
Order ID column
The Order ID column is used to uniquely identify each order. For the effect, we’ll keep it as a regular sequence of numbers.
And for that we’ll make use of the Fill feature to generate and fill as many rows of that column as we want. Thanks to Fill, we can easily generate a sequence of numbers given the initial and final values, as well as the step value (the difference between two consecutive numbers of the sequence).
To use Fill, write a 1 in the first row of the Order ID column. This value will be interpreted as the initial value of the sequence. Select that cell and then open the Fill Series option in the Home menu.
This opens a new dialog where we can set all the information needed to generate our sequence of ids/numbers:
Allow me to explain each value for our Fill:
- Series in: we choose Columns because we want to generate values to fill cells of the selected column rather than cells from the selected row, i.e., we want to generate the ids for the Order ID column
- Type: linear type because we want the number generation to be linear
- Step value: each number generated is incremented by 1 to generate the next one (so we have a consecutive sequence)
- Stop value: the last value of our sequence is 1000, that is, since this is a linear type series with a step value of 1, we are going to fill a thousand rows with the numbers between 1 and 1000, inclusive
Press Enter and voilá. The Order ID column is now complete and we have successfully generated one thousand order ids. If you press Ctrl+Down Arrow while having a cell of this first column selected, you can move to the last cell of the column immediately, the cell with the id 1000.
All columns will have a thousand rows precisely to show that quantity is not a problem in this generation of fake data, we can focus on generating the data we want.
Date column
Generating a random date for each order can be easily achieved by using the RANDBETWEEN function. We give it the interval of dates to choose from and RANDBETWEEN chooses a random date from that interval.
=RANDBETWEEN(first_date, last_date)
For the function to actually choose a date, we need to give it date values, and we do that with the DATE function. This second function creates a date by being given the year, month and day. Thus, what we are doing in this formula is saying that we want to generate a random date between January 1st 2018 and December 31st 2019, inclusive.
By the way, keep in mind that RANDBETWEEN generates an integer, even if we have given it dates to choose from. However, the cell displays the generated value as a date because we formatted this column in the beginning to be of the date data type.
Pressing F9 or any interaction with the workbook recalculates the formulas, i.e., generates another random date. Don’t worry though, at the end of the demo we’ll make these randomly generated values fixed.
Okay, all is well, but we have only generated the date for a single order. What about the remaining 999 orders? Let me show you two ways of achieving this. The first uses the Fill feature we’ve seen previously, the second is a literal copy and paste after selecting the desired range (by typing the range, not by selecting with the mouse!).
( Copy formula to multiple cells with Fill )
For the first method, using the Fill feature, we need to select all the cells which we want to copy the formula into. But we are not going to use the mouse for this. Imagine, selecting a thousand rows, by hand. Since we have an adjacent column of the size we want to select, we can do it with simple keyboard shortcuts.
Move the cursor to the last cell of the Order ID column (Ctrl+Down arrow while having a cell of that column selected) and then press the right arrow. This means now you have the last cell of the Date column selected. To select the remaining cells of that column, press the Ctrl+Shift+Up Arrow key combination. Now you have all the cells of the Date column selected, up to the one that already has a date created, and we can fill the empty ones. This is achieved by either pressing Ctrl+D or by selecting the Fill Down option in the menu.
This will fill all the selected cells with the formula stored in the first selected cell, the one that had a date already generated!
Note: if some dates are shown as number signs (#), just make the column larger.
( Copy formula to multiple cells by selecting the range )
With the second method, we copy a formula to multiple cells by pasting it after having selected the desired range. But we won’t be select the cells with the mouse!
We’ll use the Date column as an example once again. Select the cell which already has a date generated (cell B2) and copy it (Ctrl+C). Now, we want that formula to be used in all the other cells of the Date column, column B. In other words, we want all cells between B3 and B1001, inclusively, to have the random date generation formula.
While having the cell B2 copied (it will have the dotted border while copied), enter the desired range in the Name Box and press Enter.
Now that all the empty cells in the Date column are selected, and we have the formula copied, just press Ctrl+V to paste the formula!
And with these last two sections, we have finished generating the dates for the fake orders and have learned how to copy a formula to multiple cells with two different methods. Whichever one you prefer, please keep in mind these will be the crux of our data generation as they are what allow us to have an arbitrary number of rows of data.
Customer ID column
The generation of data for this next column is straightforwad, as we are just picking a random customer id by generating a random integer. As such, our formula is
=RANDBETWEEN(1, 10)
Friendly tip: if you don’t like using the mouse while writing a formula, press the Tab key to autocomplete the name of functions.
In this case I chose to pick a number between 1 and 10 because the helper data contains ten customer names (the data in the second worksheet). If you’re using different helper data, then adjust these two limits as needed.
So, write the formula in the first cell of the Customer ID column and then use one of the two methods we’ve seen previously to copy the formula to the remaining 999 cells.
Again, with each action, you’ll see the numbers being generated again. We’ll make them static at the end of the demo.
Customer Name column
The customer name will be the first time where we make use of the helper data in the second worksheet. Now that the orders have the id of the customer, we want to know the name of that customer.
For the sake of practice, the names generated will be composed by the word “Company”, followed by a letter/name from the helper data. But, instead of randomly picking a letter, we will look up the letter associated to the customer id. And for that, we need the VLOOKUP function.
When we have value from a range or table and want to find another value that is located in the same row as the value you have, but in a different column, VLOOKUP can save the day. As such, the function needs the following information:
- The value you want to look up (this value needs to available in the first column of the given table/range)
- The table/range in which to search
- The column of table/range in which to find the desired value
- Type of search (true if you want an approximate match or false if you want an exact match; we want the latter)
If we now translate these arguments into our case, we want to look up the customer ids in the columns of customer ids and names of the helper data worksheet, knowing that the value we want (the name) is available in the second column.
Which means our VLOOKUP function, for finding the first customer name, will look something like this:
=VLOOKUP(C2, Sheet2!$D$2:$E$11, 2, 0)
Notice how the reference to the helper data is an absolute reference, that is, no matter from where we call this function, the range looked up on will always be those two columns. C2 is simply the cell address of the first customer id. But why is the id address not an absolute reference too? Because that address changes throughout the rows of the Customer Name column. While the first customer id is found on the cell C2, the second is found on C3, the third on C4 and so on. The last argument, to look for an exact match, can be passed either as FALSE or 0.
Now, if you copy the formula to the remaining cells of the Customer Name column using one of the two methods we are familiar with, we can wrap up this column. As the formula is copied to each cell, Excel will update the cell address of the id so that the correct id is looked up for each row.
Actually, hold on a second. If we finish it now, the Customer Names will look like this:
This way the names don’t start with “Company” as we discussed in the beginning. To do that, we need string concatenation, i.e., to join values/strings, in this case the word “Company” with the letter/name found in the helper data. To perform this operation in Excel, we put an ampersand (&) in between the values we want to join. So, in our case, this is the resulting formula for the first cell of the Costumer Name column:
="Company " & VLOOKUP(C2, Sheet2!$D$2:$E$11, 2, 0)
Notice the space at the end of “Company “ so that there’s a space between the word and the retrieved letter in the resulting name.
Now, copy this formula to the other cells and we can move on to the next column.
Sales Person column
To choose the sales person responsible for closing the order, we’ll make Excel choose a random employee name from the helper data.
Unlike before, where we wanted to find one value given another as reference, this time we just want to pick a random value from a range of data. Perhaps you have heard or used the INDEX-MATCH combination of functions before. The formula for this example will be a variation of that combination, where instead of finding a specific value like the customer name, this one will return a random employee name.
In case you are not familiar with INDEX, this function is used to return a value from a specific range:
=INDEX(range_of_data, row_of_value_desired, column_of_value_desired)
Since our employee names are stored in a one-column range
the range and the column given to the function are static, the row is what gives the formula its randomization. We can use RANDBETWEEN yet again to randomly choose the row from which we’ll get the employee name. But, this time, the number we want to generate is between 1, as the first row of names is a valid option, and the number of names available. In the above screenshot there are ten names, but we want the formulas in this demo to be as dynamic as possible: the ROWS function can easily count the number of rows available.
Thus, the choice of a random row from our range looks like this:
RANDBETWEEN(1, ROWS(Sheet2$A$2:$A$11))
Don’t forget to use an absolute reference for the helper data!
And now the complete formula to pick a random employee name:
=INDEX(
Sheet2!$A$2:$A$11,
RANDBETWEEN(1, ROWS(Sheet2!$A$2:$A$11)),
1)
(I separated the INDEX function arguments for the sake of readability)
With this formula, INDEX will retrieve, from the range of sales people names, the value from a random row in the first column.
Now, copy this formula to the remaining cells of the Sales Person column and that’s another column completed!
District column
I hope you understood what we did with the Sales Person column, because the District column uses the exact same formula, except the helper data is from a different range.
In other words, the District data uses the INDEX function as well, but to pick a random district from the range available in the helper data worksheet:
Thus, to pick a random row from the helper data, the formula is:
RANDBETWEEN(1, ROWS(Sheet2!$B$2:$B$19))
And the complete formula identical to the one from the Sales Person column:
=INDEX(
Sheet2!$B$2:$B$19,
RANDBETWEEN(1, ROWS(Sheet2!$B$2:$B$19)),
1)
Finally, copy the formula to the remaining cells to finish another column of data!
Product column
And, for the third time, we’ll use the INDEX-RANDBETWEEN combination to fill a column. This time, we’ll pick a random product from the helper data:
Thus, to get a random of row from the column of product names we use:
RANDBETWEEN(1, ROWS(Sheet2!$G$2:$G$6))
And the complete formula:
=INDEX(
Sheet2!$G$2:$G$6,
RANDBETWEEN(1, ROWS(Sheet2!$G$2:$G$6)),
1)
Copy the formula to the remaining cells and that’s the the last time we’ll use the INDEX-RANDBETWEEN formula in this demo.
Price column
To know the price of the product ordered, we’ll look up that information in the helper data. Just like with the customer names, we can use VLOOKUP to find out the price in the helper data, given the product.
As a reminder, VLOOKUP receives four arguments:
=VLOOKUP(value_to_lookup, helper_data_range, target_column, exact_or_approximate_match)
This means that we will look up the product name in the range of helper data for the products, knowing that we want an exact match and the value we are looking for is in the second column of the range. Thus, the formula for the first order is the following
=VLOOKUP(G2, Sheet2!$G$2:$H$6, 2, 0)
And if we copy the formula to the remaining cells, Excel updates the cell address of the value we have accordingly:
Quantity column
The quantity, just like the customer id, is a random integer. As such, the formula for the cells of this column is a simple usage of the RANDBETWEEN function:
=RANDBETWEEN(1, 15)
Profit column
This last column calculates the profit obtained from each order. In this case, we will calculate the profit as an arbitrary percentage of the price times the quantity purchased.
So, the formula to calculate the profit is a straightforward multiplication of three values: the price, the quantity and the profit percentage. The resulting formula for the first cell will look like this, for a case where the profit is 80% of the order’s total price:
=H2 * I2 * 0.8
Copy the formula to the remaining cells of the Profit column (Excel updates the cell addresses accordingly) and we are done generating our fake data.
The only thing that’s left is to make the generated values fixed and, optionally, to convert the range of data into a table.
Create fixed data
Finally, we’ll make our dataset fixed. So far, given that the data in each column is created by a formula, moreover formulas that generate random data, each time we make a change to the workbook, the formulas are calculated, that is, the data is generated anew.
So, to make the dataset fixed, select any cell from the table and press Ctrl+A to select the complete table (technically, the data is stored in a range, not a table yet). Now, to store the data statically with the Paste Special feature, you can paste it either on top of the already existing data, or in a new worksheet. If you choose the former, you’ll lose the formulas we’ve written, so I advise creating a new, third worksheet.
When you paste the data (Ctrl+V), you can make use of the Paste Special feature by pressing Ctrl afterwards or clicking the respective icon:
In our case, since we want only the values from the cells copied, we want one of the Paste Values options. And, since we had formatted some columns, we’ll choose the second option to keep that formatting
And with this we have now successfully finished generating our fake data. The only thing you need to change after the Paste Special is the widths of the columns because they keep the initial width.
And here is an example of the finalized dataset:
Of course, your values will be different as the data was generated randomly, but the values you see now after the paste won’t change. If you click on any cell, you’ll see that the values stored in the cells are the actual values, the formulas are nowhere to be seen.
If you chose to create a new worksheet and kept the cells with the formulas in the first worksheet, then those original cells will keep generating new random values. If you copy and pasted the static values on top of the original cells, then you can’t use those formulas anymore.
Conclusion
The demo is now finished but, optionally, we can turn our range of data into an actual table. Put the cursor in one of the cells with data and press Ctrl+A to select all the cells, followed by Ctrl+T. This second key combination will open a prompt for turning our range of data into a table.
Press Enter (the table will use the row with the column names as the headers) and voilà, our dataset is now stored in a table instead of a range and the demo finished for good.
One last tip for future reference: if you change the names of the worksheets, Excel will update the names in the formulas automatically 😎
I hope you found this demo useful. In my opinion this a good exercise when you’re starting out with Excel as it allows you to learn some essential formulas as well as training data look up and creation. Plus, I hope the keyboard shortcuts mentioned throughout the explanations are helpful :)
If you’re interested, you can find a complete workbook demo here.