External data is data that is stored outside of Excel. Excel easily works with data created in different applications if this data is saved in a format Excel can recognize. We may want to import data from another database that meets certain criteria. By importing the data into Excel, we can still manipulate and format the data using the tools in Excel.
Often, when using Excel, you will be getting your data from another source, and it will need to be cleaned up before you can begin analyzing it. The Microsoft Support site has a long list of recommendations for cleaning up data.
The core steps for cleaning data are:
- Import the data from the external data source.
- Make a backup copy of the unmodified imported data in a separate workbook.
- Make sure the data conforms to a tabular format of rows and columns with:
- similar data in each column.
- all columns and rows visible.
- no blank rows or columns within the range.
- Next, do any tasks that don't require manipulation of all of the data in an entire column first (e.g., spell-checking or using Find and Replace).
- Finally, do the tasks that require manipulation of all of the data in an entire column.
Having the backup of the imported data provides a safety net to perform the other, potentially more destructive (intentionally or not) tasks.
Let's see how some of these core steps might actually be performed now.
Spoiling the Game of Thrones Book Series
For this next section, we have a text file containing data about characters from the book series, A Song of Ice and Fire, and the fates of those characters. We will be importing this text file into Excel, so we may do some clean up on the data, and potentially use it for some deeper analysis later.
Let's take a look at the actual contents of the file, in a text editor.
Examining the Source File Contents
The text file contains information about characters in the A Song of Ice and Fire book series, and when they were introduced. There are some SPOILERS in this file regarding some characters' fates.
This data set is derived from data available from the blog Probably Overthinking It.
Let's look more closely at the contents of this text file and what it contains.
Step1. Open a text editor.
Step2. Navigate into the epclass folder on the Desktop.
Step3. Open the subfolder, ExcelDataMgt.
Step4. Open the text file, GoT Books Spoilers.txt.
- Name: character name
- Allegiances: character house
- Death Year: year character died
- Book of Death: book character died in
- Death Chapter: chapter character died in
- Book Intro Chapter: chapter character was introduced in
- Gender: 1 is male, 0 is female
- Nobility: 1 is noble, 0 is a commoner
- GoT: Appeared in first book, A Game of Thrones
- CoK: Appeared in second book, A Clash of Kings
- SoS: Appeared in third book, A Storm of Swords
- FfC: Appeared in fourth book, A Feast of Crows
- DwD: Appeared in fifth book, A Dance with Dragons
Step5. Close GoT Books Spoilers.txt, without saving any changes.
Using the Text Import Wizard
For this exercise, we will import that text file by using the Text Import Wizard to help ensure that the data is imported easily and correctly into Excel.
The data we want to use is in another file, so we have to tell Excel where to find it.
Step1. Create a blank workbook.
Step2. Save the workbook with the name Spoilers Backup.xlsx inside the ExcelDataMgt directory.
Step3. To switch to the Data tab,
Click the Data tab
Step4. To begin to import the file, in the Get External Data group,
Click
Step5. To proceed with the import,
Double-ClickGoT Books Spoilers.txt

Step6. To change the file type to match the portion to import,
Click the Delimited radio button
Step7. Scroll down in the Preview portion of the dialog box, until you see the comma-separated data beginning on line 28.
Step8. To start the import at the desired row, in the "Start import at row:" field,
Double-Click the value, type: 28
Step9. To confirm the settings,
Click
Step10. To change the delimiter settings,
Click the Tab checkbox, Click the Comma checkbox
Step11. To accept the these settings,
Click
Step12. To complete the data import,
Click

Step13. To complete the import,
Click
Step14. Save the workbook, Spoilers Backup.xlsx.
Step15. Save the workbook as Spoilers.xlsx.
Splitting Worksheets
When you are working with larger data sets on a spreadsheet, it may be helpful to view different areas of the same sheet at the same time. Splitting the worksheet enables separate scroll bars for each of the sheet's sections created by the split.
The split will occur immediately above the location of the active cell.
We can use the split to examine how the data is arranged overall (e.g., are there any entirely blank rows or columns), and also aspects of specific rows of data.
Step1. Scroll down the worksheet, to the entry for Arys Oakheart.
Step2. To activate that cell,
Click in cell A58
Step1. Switch to the View command tab.
Step2. To split the worksheet, in the Window group,
Click
NOTE: If the cursor is not positioned in the first column, you will get a 4-way split with the window divided into four sections.
Step3. To scroll in the lower section,
Click in the bottom half of the window and scroll until row 128 (Catelyn Tully) is just below the split bar
Step4. Scroll through the data, and look for any completely blank rows or columns in the data set.
NOTE: To move the split bar, place the cursor over the bar until you see a cursor shape with vertical arrows, then press and drag the bar to a desired location.
Step5. To remove the split, on the View tab of the Ribbon,
Click
Using Find and Replace to Change Data
Now we have confirmed the first few core steps of cleaning data. We have imported the data, made a backup workbook of the unmodified data, and we have verified our data set doesn't have any completely blank rows or columns within it. Now we want to change the data recorded about alleged character gender to use category words instead of the numeric code currently displayed. We can use the Find and Replace feature to do this.
Step1. Scroll to first row of the data set.
Step2. To select the desired data,
Click the column G header
Step3. Switch to the Home tab on the Ribbon.
Step4. To locate the Find and Replace feature, in the Editing group,
Click
NOTE for MacOS Users: To activate the Find and Replace feature, on the Menu bar, Click Edit, Point Find, Click Replace...
Step5. To activate the Find and Replace feature,
Click Replace...
Step6. To indicate the initial data to be found, in the Find what: field, type:
1
Step7. To indicate the initial replacement string,
Click in the "Replace with:" field, type: Male
Step8. To reveal the options, in the Find and Replace dialog box,
Click
NOTE for MacOS Users: The controls are already visible.
Step9. To search by columns, in the Search: drop-down,
Click By Columns
Step10. To ensure only an integer 1 is targeted,
Click the "Match entire cell contents" checkbox
Step11. To replace all of the 1s with the word Male,
Click
Step12. To dismiss the message,
Click
Step13. To indicate the initial data to be found, in the Find what: field, type:
0
Step14. To indicate the initial replacement string, in the Replace with: field, type:
Click in the "Replace with:" field, type: Female
Step15. To replace all of the 0s with the word Female,
Click
Step16. To dismiss the message,
Click
Step17. To close the dialog box,
Click
Step18. Save the workbook, Spoilers.xlsx.
Refining the Titanic Passenger List
Our next dataset contains information about the Titanic passenger list. Currently, the data is distributed across three worksheets in a workbook. Our task will combine the data into a single worksheet, and then make further changes to the data.
This dataset was obtained from the Department of Biostatistics Vanderbilt University.
The data is grouped by worksheet according to the passenger class (1st class, 2nd class,...) of the individuals, and we want to create a single worksheet containing all of the passengers data, so that certain kinds of analysis will be easier and more complete.
Let's open the workbook, TitanicList.xlsx, containing the data.
Step1. To open the desired workbook, press:
Control key+o, Click,
Double-Click TitanicList.xlsx
NOTE for MacOS Users: To open the desired workbook, press: Command key+o, Click On My Mac, Double-Click TitanicList.xlsx.
Copying a Worksheet
Since additional worksheets can be added or deleted easily, copying entire worksheets can be an efficient way to experiment with data without risk. Always make a copy of a worksheet before doing something that could result in an accidental or undesirable change to data.
Step1. To switch to the 1st Class worksheet,
Click the 1st Class worksheet tab
Step2. To begin making a copy of the worksheet,
Right-Click the 1st Class worksheet tab,
Click Move or Copy...
NOTE for MacOS Users: To begin making a copy of the worksheet, press Control key+Click the 1st Class worksheet tab, Click Move or Copy...
Step3. To select the "Create a copy" option,
Click the "Create a copy" checkbox
Step4. To copy the 1st Class worksheet to the end of the other worksheets, in the Before sheet field,
Click (move to end)
Step5. To complete the process,
Click
Modifying a Worksheet Tab
We can give any worksheet a different name. Up to 31 characters are allowed in a sheet name. It is a good idea to keep the sheet names short and concise so that we can see more available tabs at the bottom of the workbook.
Step1. To select the worksheet tab name,
Double-Click the 1st Class (2) worksheet tab
Step2. To rename the tab, type:
AllPass Enter
Step3. To change the tab color,
Right-Click the AllPass worksheet tab
Point to Tab Color..., Clickany color
Step4. Save the workbook, TitanicList.xlsx.
Selecting the Current Region
When working with large data sets, we may want to copy the data itself, without making an entire copy of the worksheet. For some data, it may be fine to use a simple selection followed by copy and paste. For large or very large data sets, this may be impractical, as sometimes Excel can scroll with surprising speed when you are using the mouse to select data. However, Excel has a feature which makes selecting a large, well-structured chunk of data very easy to do, accurately.
If your data is in a range which has no blank rows or columns within the data itself, and the active cell is within that range, that range is considered the Current Region. The Current Region will extend to the right until Excel encounters a blank column, and then it will continue down until there is an entire blank row.
We will select the Current Region on the other two worksheets, then copy and paste the result into the AllPass worksheet to create our full dataset.
Let's see how to do this. We'll begin by copying the data from the 2nd Class worksheet.
Step1. To move to the desired worksheet,
Click the 2nd Class worksheet tab
Step1. To begin to select the Current Region, on the Ribbon,
Click, Click Go To Special...
NOTE for MacOS Users: To activate the Find and Replace feature, on the Menu bar, Click Edit, Point Find, Click Go To..., Click Special...
Step2. To select the Current Region,
Click the Current region radio button, Click
Step3. To copy the current selection, press:
Control key+c
Step4. To move to the desired worksheet,
Click the AllPass worksheet tab
Step5. To move the active cell to be under the last row of data, press:
Control key+Down Arrow key, and then press: Down Arrow key
NOTE for MacOS Users: To move the active cell to be under the last row of data, press: Command key+Down Arrow key, and then press: Down Arrow key.
Step6. To paste the selected data, press:
Control key+v
Step7. To move to the desired worksheet,
Click the 3rd Class worksheet tab
Step8. To select the Current Region, press:
Control key+Shift key+*
NOTE for MacOS Users: To select the Current Region, press: Shift key+Control key+Spacebar.
Step9. To copy the current selection, press:
Control key+c
Step10. To move to the desired worksheet,
Click the AllPass worksheet tab
Step11. To move the active cell to be under the last row of data, press:
Control key+Down Arrow key, and then press: Down Arrow key
NOTE for MacOS Users: To move the active cell to be under the last row of data, press: Command key+Down Arrow key, and then press: Down Arrow key.
Step12. To paste the selected data, press:
Control key+v
Step13. Save the workbook, TitanicList.xlsx.
Removing Duplicates
When we remove duplicate values in Excel, we permanently delete duplicate values. The duplicate value is one where all specified values in the row exactly match all the same specified values in another row. The Remove Duplicates feature permits selection of columns for the specific data to be examined for duplicates. The initial row will be retained, and any other duplicate rows will be deleted. Obviously, this feature can be dangerous to data, so it is a good idea to save your workbook, as we have, before using this feature.
Step1. Switch to the Data command tab on the Ribbon.
Step2. To select the Current Region, press:
Control key+Shift key+*
NOTE for MacOS Users: To select the Current Region, press: Shift key+Control key+Spacebar.
Step3. To begin to remove the duplicate rows, in the Data Tools group,
Click
NOTE for MacOS Users: The behavior of Remove Duplicates is different. The confirmation of duplicates found is displayed before you actually change the data. To remove both duplicate rows, Click Remove Duplicates.
Step4. To permit a duplicate match to the header row,
Click the "My data has headers" checkbox
Step5. To remove the duplicate rows,
Click
Step6. To close the message,
Click
Tiling Multiple Worksheets
Just as splitting a worksheet allows comparison of content in different areas of one sheet, tiling worksheets can enable the comparison of content in different worksheets whether they are in the same workbook or in different workbooks.
Let's tile the AllPass and 2ndClass worksheets for ease of comparison. When comparing two worksheets from the same workbook, we must first create a new window.
Step1. To create a new window for viewing, on the Ribbon,
Click the View tab, Click
NOTE for MacOS Users: To create a new window for viewing, on the Menu bar, Click Window, Click New Window.
Step2. To select the 2nd Class worksheet at the bottom of this window,
Click the 2nd Class worksheet tab
Step3. Switch to the View tab again, if necessary.
Step4. To switch windows, in the Window group,
Click, Click TitanicList.xlsx:1
NOTE for MacOS Users: To switch windows, on the Menu bar, Click Window, Click Arrange.
Step5. To view both worksheets in the same window, in the Window group,
Click
NOTE for MacOS Users: The options presented on the Ribbon are Tiled, Horizontal, Vertical, and Cascade. To restrict the windows being arranged, Click ‘Windows of active workbook".
Step6. To select the desired window,
Click TitanicList.xlsx:2, Click
NOTE: In Microsoft Excel 2016 for Windows, workbooks are shown as separate Excel windows with their own Ribbon, instead of multiple windows inside of a single Excel window. This means you will be able to display worksheets on different monitors.
Step7. Use the scroll bar to move to a different area of the active worksheet.
NOTE for MacOS Users: This feature is not available in Excel 2016 for Mac. The windows will scroll independently.
Step8. To turn off Synchronous Scrolling, in the Window group,
Click
Step9. To verify the last entry on the 2nd Class worksheet, in the TitanicList.xlsx:2 window, press:
Control key+Down Arrow key
NOTE for MacOS Users: To verify the last entry on the 2nd Class worksheet, in the TitanicList.xlsx:2 window, press: Command key+Down Arrow key.
Step10. Scroll to row 601in the AllPass worksheet, in the TitanicList.xlsx:1 window, to verify.
Step11. Scroll in both windows to make certain only the extra duplicate label rows were deleted. The other duplicate was at row 324.
Step12. Close the TitanicList.xlsx:2 window.
Step13. Save the workbook, TitanicList.xlsx.