Monday, July 6, 2009

A Brief Guide to Exporting Data from Geosystem for Windows

Why should I read this guide?

Although extensive documentation for Geosystem’s export tool has not been published, Geosystem for Windows contains some helpful information. To access it, go to the Tools menu, select “Data Summary and Export…,” go to the Help menu, and select “Current Screen.” In addition, VES, Inc. has published a very brief guide to exporting data into a file compatible with Microsoft Excel. It can be found at: http://www.geosystemsoftware.com/support/gfwexprt.htm. New users can try out the Data Summary and Export Tutorial, a piece of bundled software that briefly but effectively demonstrates how to use the export tool. Finally, users may consult Creating, Saving and Archiving Laboratory Test Files with the GEOSYSTEM Data Manager, an interesting—but somewhat outdated—manual available at the Geosystem website.

All of these contain valuable information. However, they do not provide detailed recommendations for creating anything other than a generic spreadsheet. Adventurous users will be perfectly happy to experiment with Geosystem and develop a data export system that matches their unique needs. Everyone else has the opportunity to learn from the mistakes of others by reading this guide, especially the sample export system presented in the appendix.

This guide does not purport to fully explain all of the operations you may be required to carry out prior to exporting data. You should have some familiarity with Geosystem for Windows. Furthermore, this guide was created with Geosystem for Windows v2.1, LD4 module v4.1. While other versions of Geosystem for Windows are very similar, they may not be identical in function.

What should I do first?

You may be tempted to open up the tool and immediately export every field listed in the LD4 data summary and export window. As is so often the case, it is best to restrain this instinct and take a moment to think about what you are trying to achieve. Adding unnecessary fields not only takes time, it also creates a large, messy file that requires pruning prior to use. Furthermore, you can only export fields associated with a single data entry form, so you must be sure that form contains everything you will need. Here are some of the things you will want to think about:

  • What data do I really need to export?

  • Is this data already in my LD4 data file? If not, will I have to add new fields to my data entry form?

  • Will it be easier to use my exported data in separate files for each project or in a single exported file?

  • Do I need to present my results in LD4 boring logs or can I perform some data manipulation outside of the software?

  • In what file format will my exported data need to be?

  • Are there specific field names that my final product will need to have?

  • Do the units in my data file match the units I will be using at the end of the process?

Once you know what you are trying to accomplish, you can get started by making sure that your data file has everything you need.

Is my data file ready to be exported?

Open your project file and go to the Tools menu. Select “Data Summary and Export.” You will see a window with a variety of text boxes (See Figure 1.) Look at the one titled “Dictionary.” This box contains all of the data fields used by the current project’s data entry forms. If you do not have both LD4 and Labsuite, you might not be able to see any laboratory data fields, even if your file was created by someone who entered data using Labsuite. If this is the case, you will need to add the relevant laboratory data fields to the current project’s data entry form.

1 Figure 1. The Data Summary and Export Window

Even if all of the raw data is visible, it may be better to have Geosystem perform some calculations prior to exporting the file. This is the case when there is a chance that future reports will need to use a field not yet created, but required by the final use of the exported data. Of course, you can export your raw data, manipulate it, and type the results into a new field in the Geosystem data file, but this would duplicate work that might be completed by the software. Describing how to create calculated fields within Geosystem is beyond the scope of this guide, but the Customization Guide bundled with LD4 may help.

Sometimes the delimitation required for your final product will not be compatible with the text stored in some fields. Although there is a method for dealing with this issue during export, you may be forced to choose between eliminating all of the delimiting characters from your data file or all of the quotation marks. (This is because many programs use quotation marks to determine how to interpret delimiting characters.) As a general rule, it is easier to replace ““” with “–inches” or “in.”, than it is to find a substitute for the comma. The easiest way to carry out replacement of a specific character is to use Geosystem’s replace function. To do so, go to the Project menu and select “Replace Text…” This brings up the Find/Replace window (See Figure 2.) Enter the character you wish to eliminate and the text you wish to replace it with. Then select “Replace All.” Unlike many other Windows applications, Geosystem will not automatically replace every instance of the character. Instead, it will ask the user to verify every replacement. While this is a slow process, it does give you the opportunity to replace the eliminated character with different strings in different contexts. You must do this for every project data file that is to be exported or, preferably, do it once for a merged project data file.1

2 Figure 2. The Find/Replace Window

Finally, you may wish to merge multiple project data files prior to export. Under some circumstances, it will be preferable to keep all the project files separate. In most cases, merging projects will reduce the overall time to export and simplify subsequent processing.

If none of these concerns applies to your data file and if all of the data has already been entered, you may skip to section 4.

How do I add fields and data to my project file?

In order to add fields to the data file, return to the Data Entry window by closing the Data Summary and Export window (See Figure 3.)

3 Figure 3. The Data Entry Window

Go to the Tools menu, and select “Data Entry Forms.” (You must be using LD4 in order to add data entry fields to the file. Users who have only licensed Geosystem’s laboratory testing modules can only export the standard data fields.) A new window will display. Select the option from the left side of the screen that corresponds to the type of data you wish to export. This will bring up a listing of the data entry fields already associated with your project data file (See Figure 4.) In the yellow box at the top of the screen, Geosystem has a long list of fields that have already been made for you. You may select one of these by dragging it from the yellow box to an appropriate location on the card below. If none of the standard fields seems appropriate, you may create your own by selecting the gray button, “Click if your field isn’t listed” and filling the blanks at the bottom left corner of the screen. Check to make sure that your new data entry field is correctly formatted; then go to the Entry Form menu and select “Exit.” (Geosystem automatically saves any changes you have made.)

4 Figure 4. The Data Entry Form Design Window

To enter more data into your file, simply select the relevant table from the yellow box on the left, select the relevant data entry field, and begin typing.

To add data from other project data files to the current file, go to the Tools menu and select “Project Merge…” Select “Continue,” find the first file you wish to merge, and select “OK.” Repeat until all the necessary project data files have been merged. The details of merging projects with problematic similarities are beyond the scope of this guide. You may wish to consult Geosystem’s Help menu for more information. Now that all of the project files are combined, remember to replace any characters that could interfere with the exported data file’s delimitation.

How should I organize my data export system?

Geosystem stores the arrangement of your exported data in a *.lfg file it calls a “Configuration.” Since you can save a nearly infinite number of configurations, you may export your data in a variety of formats for a variety of goals. This also makes it possible to break up a single project data file into pieces that are more easily assimilated into your final product. Due to the likelihood that your first attempt at data export will not be perfect, I highly recommend that you save a configuration as soon as you have finished arranging your data fields for export.

Creating a configuration is quite easy. Simply drag the data fields you need from the Dictionary text box to the Table box and select “OK” when prompted (See Figure 5.) Note that the names displayed in these two text boxes may not be the same. The name in the Dictionary box represents the string used by Geosystem to label its data fields, while the name in the Table box represents the string that will be exported as a column title. The other boxes can be safely ignored, as anything entered in them will not be exported. Geosystem shows them only because the software developers have used the same window for printing data summaries. You may wish to use the “Title:” box to enter a note describing the purposes for which you created this export configuration. Likewise, the Layout menu only pertains to printed summaries.

5 Figure 5. The Column Definition Window

The order that you drag your data fields determines the order of the columns in the exported data file. Unfortunately, Geosystem does not offer any easy means of changing this order once you have created it. The only way to change the order is to delete all the data fields between the desired location of a data field and its current location. Therefore, you should take care to add fields to the Table box in the exact order you need. To delete a data field from the Table box, double click the field and select “Delete.”

Once all the data fields have been selected and put into the correct order, the column titles should be renamed. If you will be importing the exported data file directly into a program that requires specific strings, this is an absolute necessity. If you will be opening the file in a program that allows text editing, you can change the names of the column titles later. However, keep in mind that most users will export their data using the same configuration file more than once. Saving the new field names to a configuration file will save you the effort of retyping column titles again and again. To rename the data field, double click on it in the Table text box, change the name in the Column Title text box and select “OK.” When writing field names, avoid using quotation marks and whatever characters you will use to delimit the file.

You may have noticed two check boxes in the Column Definition window. These are self-explanatory, but I will note that the format in which the counts are presented is “total=”#. Many users will prefer a more sophisticated set of statistics available from any of the variety of programs that can use the exported data.

Once you have confirmed that all of the data fields are in the correct order and have the correct titles, go to the File menu and select “Save Configuration.” Once you have saved the configuration file, go to the File menu and select “Export to Disk…” Choose your directory and file name and select “OK.” This brings up the Data Export Options window (See Figure 6.) Unfortunately, these options are not saved in the configuration file. Instead, the program itself seems to store the settings from the last use. Thus, you will need to check that these settings match the requirements for the exported data file.

6 Figure 6. The Data Export Options Window

The three most common characters used to delimit (separate) data are probably tab, comma, and space. If you have a choice, tab is usually the best, as no Geosystem fields contain tab characters. This is because the Data Entry window interprets any tab keystroke as the cue to move the cursor to the next data entry field. Most Geosystem project data files will contain both commas and spaces. If you must delimit your exported data file with these characters (or any other character that appears in the data file,) you may check “Surround each item with quotes.” This will allow many programs to interpret the data fields containing delimiters as solid blocks, ignoring any characters that are actually part of your data. Unfortunately, this requires that your project data file contain no quotation marks. (See Section 2 for a description of how to easily remove problematic characters from your file.) Alternatively, you can export any file in a tab-delimited format and use a spreadsheet or text editor to modify it by hand. Since this is likely to be more work, you may wish to work out an export system that uses Geosystem to its full potential.

Unless you will be accessing your exported data using a software program that cannot accept column titles, you should check “First row has column titles.” The contents of the exported data file may be easy for you to interpret today, but they may not be so obvious to anyone who must review your work in the months and years to come. Once you are satisfied that the options will preserve your data structure, select “OK.” Geosystem will now create the exported data file. This process can take quite a while with a large project or a slow computer.

Am I done yet?

Once the file has been exported, you should open it up in a spreadsheet program to make sure that everything is where it should be. Take a look at the data contained in the various columns. Is it reasonable? If not, a couple of things may have happened. The first is that you may have typed the column title into the wrong data field. If you have saved a copy of your configuration, this will be easy to check. The other possibility is that there are delimiting characters or quotation marks within your data that are being misinterpreted by the spreadsheet program. You must eliminate these or change the export options and export your data again. Do not save the file that you have opened up, as the spreadsheet program will change the way that the data is stored. If all of the data appears to be correctly labeled and ordered, close the file. Your data has been exported successfully.

You may find the final product unusable due to Geosystem’s inability to list all relevant data on each line. (For instance, each boring number only appears once.) This problem can only be corrected manually. Most users will be able to quickly fix their files in a spreadsheet program. However, due to the idiosyncrasies of spreadsheet file formatting, some users must use a basic text editor—such as Microsoft Notepad—to amend their data.

Appendix: A Sample Project Exported

For this hypothetical project, a researcher wishes to develop a regional coefficient for correlations between standard penetration test N-values and undrained shear strength in northern Virginia. To do so, she will be using a software program called NCAN to process the raw data into depth-based and geologically based families prior to determining the relevant coefficient. NCAN is very fussy about data formats and will only accept a comma-delimited text file with a *.ndf extension. Furthermore, the data must be stored in a table with the following format:

Depth Stratum SS N
# Text # #
# Text # #
# Text # #

Note that because the data is stored in a text file, the researcher does not have to create any special number format as she would for a spreadsheet file. However, she still needs to ensure that the data itself is stored correctly. NCAN’s manual states that Depth is measured in feet, Stratum can be any one-word description of the geology, SS stands for undrained shear strength measured in tsf, and N is the N-value.

The researcher has collected data from a variety of organizations. Some of the information has already been stored in Geosystem files, but most of it had to be typed into Geosystem. The researcher chose to enter the new data into the project data file which already contained a large quantity of information. Next, she merged the other project files into the main project data file. The researcher found it necessary to create a new data entry field called Stratum. For convenience, she placed Stratum and Material Description at the start of each card (See Figure 7.) Then, she decided upon several geological categories and manually entered the relevant categories into the Stratum field based on the material descriptions available.

7 Figure 7. Creating the Stratum Field

Although the results of laboratory testing for shear strength were stored in Geosystem’s standard field “Shear,” the units are pounds per square foot (psf.) The researcher decides to create a calculated data field to store the data converted to shear strength measured in tons per square foot (tsf.) She would use SS as the field name, but that has already been taken by a custom data entry field in one of the merged Geosystem project data files. So she uses “Shear2” instead (See Figure 8.) Unfortunately, Geosystem treats all calculated fields as subject to manual override. In this case, there is no data in Shear2, but the program understands this to represent a blank value entered by the user. Thus, the researcher must manually calculate and type each tsf value into the new data entry field.2

8 Figure 8. Creating a Second Shear Strength Field

Once all of the data has been entered, the researcher prepares to export it (See Figure 9.) She places the fields in their final order, renames the column titles, and saves the configuration. Next she exports the data, using column titles and comma delimiters. Quotation marks are not needed because none of the data fields are likely to contain commas. Because the project data file is large, Geosystem takes about two minutes to export it. The computer’s resources are stretched by the export process, so the researcher takes a hard-earned break.

9 Figure 9. Saving the NCAN Export Configuration

When she returns, the data has been exported to NorthernVA.txt, the filename specified during the export process. The researcher opens NorthernVA.txt in Microsoft Excel, using the comma-delimited option. She scrolls through the data, making sure that everything is in its proper place. Because none of the data fields contained commas or carriage returns, all of the data is where it should be. The researcher closes the file without saving and renames it to NorthernVA.ndf. Now the data is ready for export to NCAN.

1 In my experience, the Replace tool in Geosystem is buggy, so you should scan the resulting text for errors. Due to this fact and the difficulty of use, it is best to clean data after export, if this is at all possible.

2 It would have been much easier to export the data in “Shear” and use a spreadsheet program to calculate and save the data into “SS.” However, this would have precluded the use of shear strength measured in tsf in boring logs or other reports created in Geosystem.

No comments:

Post a Comment