In-Class Exercise 2 [Joining Tables]
Data organization and management in related tables.
A. Introduction to the Callalli study
B. Problem 1: Ceramics Lab results. Cleaning up the database logic by creating a subset of a table and rejoining to the original.
C. Problem 2: Show relationship between size the archaeological site (m2) and obsidian density using Pivot Tables and Summarize Column, then show with pie charts.
In lab this week we will focus on a-spatial data management techniques in ArcGIS. Joining and relating tables are powerful tools, but it also creates a structure that can make queries and representation more difficult.
River terraces along the Llampa River during the dry season.
We will be working with data from my dissertation survey project in highland Peru. This research area is at about 4200 meters above sea level at the upper limits of agriculture, and about 3 hour walk from a major obsidian source known as "Chivay". People brought obsidian nodules back to their residential valley and reduced the obsidian either for further transport or for immediate use. Further background information to this survey block is provided here Links to an external site. although not necessary for this exercise.
Download Callalli Geodatabase Download Callalli Geodatabase and unzip into your work directory.
Load all of the contents in Arcmap.
Organize the layers in the table of contents in the following order (you must be in "List by Drawing Order" mode in the Table of Contents)
Lithic_p | lithic artifact as a geographical point |
Ceram_p | ceramic artifact as a geographical point |
Lithic_a | lithic concentration as a polygon area |
Ceram_a | ceramic concentration as a polygon area |
Site_a | archaeological site boundary as a polygon area |
Hydro_Lines100k | stream network from 1:100,000 scale map |
Callalli_hs | Hillshade from Digital elevation model from Peru with 30m cells. Shown with 60% transparency to reveal DEM below. |
Callalli_dem | Digital elevation model from Peru with 30m cells |
Note that if you use the "List by Source" mode in the Table of Contents you'll also see non-spatial data and these include
Lithics_Lab1 | Coarse overview of lithics collections differentiating only obsidian and non-obsidian by weight |
Lithics_Lab2 | Detailed investigation of lithics focusing on bifacial implements. |
Ceram_Lab2 | Detailed study of ceramics focusing on temporally diagnostic pottery. |
The recording strategy Links to an external site. for this archaeological survey work will be explained in class and we’ll discuss more extensively during the Mobile GIS week at the end of the semester.
Suffice it to say that rather than simply recording artifacts (point geometry) in sites (polygon areas) we also recorded artifact concentrations with polygons as loci (lithic_a and ceramic_a).
Example of a high density area containing point geometry, loci, and a site boundary. |
The function of the database fields ArchID, LocusID, SiteID
- Open the attribute table for Lithic_P.
You'll see that Row 2 is a projectile point with ArchID 615 inside of a Locus (614) that falls inside of a Site 612. It’s a nesting relationship with Locus being analytically more useful than “site” but site areas are important management tools.
Note that these are spatial. Archaeological work also involves non-spatial data, for example lab analysis results, and these are referenced with one more ID field called “CatID”.
In View by Source mode you can see the spatial and non-spatial data in Arcmap.
Uncheck the All_ArchID (centroids) layer to prevent these from displaying
I’m leaving it up to you to symbolize it but here are suggestions:
- show sites as tan, ceramics as red, lithics as green. Orange circles = pottery sherds (ceram_p), green triangles = lithic_p.
- hydrology = blue. Lg river and smaller stream line width can be symbolized as classified using the Size value
Note that principal GIS file formats include: coverages, shapefiles, Geodatabase (Personal and File). Raster / GRID. This is an example of a personal geodatabase.
B. Problem 1: Ceramics Lab results. Cleaning up the database logic.
Working at the artifact level of analysis the ArchID number system is problematic because it only indexes items that have spatial location and not all artifacts were point-provenienced. Many are recorded in aggregate inside of a Locus or Site polygon. This is a 1-to-Many (1:M) relationship issue
The background: You have collected artifacts and did not have time in the field to point-provenience everything you collect however you do know what locus the collections came from. Subsequent lab analysis of those collections results in a detailed database and you now have analysis information on every artifact but you don’t have a point location for each one. One (locus spatial ) to Many (sherds lab results). This process allows us to link those sherds back to space.
Chevron incised and reed stamped rim sherds from ArchID 679
Pause for a moment and try think of some solutions to this problem of referencing back through a Many-to-One relationship. recall that in Ex. 1 we generated the total count of pyramids per country by using a spatial join to an Administrative boundaries layer. How can we condense or summarize our data in the Many table to link it into the One table?
1. Open the Ceramics_Lab2 attribute table.
- Click Add field… in the Attribute Table menu below.
- Create a Long Integer field and call it “ArchCat”. Make a second one called “ArchCat_temp”
- Do you see both ArchCat fields in the table? Scroll to the right.
- Start an editing session (if you have not, any changes you make are permanent)
2. Make sure no rows are selected before the next step
- Right click the “ArchCat_temp” field title column
- Choose “Field Calculator…” in the dropdown menu.
- Multiply the contents of ArchID by 1000 into this field.(Double click “ARCHID” in the ‘fields’ table)
- Next, Right click the “ArchCat” field title column
- Choose “Field Calculator…” in the menu.
- Add the contents of “CatID” to “ArchCat_temp” into this field.
Do you now have a six digit ID#? That’s your new unique ID number that allows you to refer to individual artifacts.
Step 3. Let's say you're sharing a subset of your data with someone and all you'd like to provide is Diameters and the ArchCat attribute. This exercise will take you through the process of slicing off one portion of your data and then rejoining it back to the source table.
Cut off the “diameter” measure is also a more efficient design since only some rim sherds have diameter values. In the case of huge databases with millions of records it's wasteful to have database fields that are mostly empty, as with our Diameters field.
This is an example of a process called "database normalization".
- Right click the Diameter column and sort Descending.
- Select in blue the rows with values > 0
- Right click the Ceram_Lab2 layer and choose Export Data > Export… and save to a new table called Rim_Diameters. Save it back into the Callalli geodatabase as a table and add new table layer to the map.
- Look at the new table. Where is your ArchCat field? It’s back on the right side. The goal here is to have a very efficient database structure with little redundancy and minimal empty cells.
- Open the toolbox (red toolbox in the top toolbar) and go to Data Management Tools > Fields > Delete Fields…
- Choose the RimDiameters table.
- Select All, then uncheck "ArchID", “Diameters” and “ArchCat” (at the end of the list).
- Go to the Ceramics_Lab2 table and delete the Diameters field.
- Right click the Ceramics_Lab2 table and click “Joins and Relates… > Join” and join the table to Ceramics table to the RimDiameters table. Think about the fields that you will use to Join. In Arcmap Joins are One:One and Relates allow One:Many.
- Next, join the Ceramics_lab2 table with the Ceram_p layer.
Other selection methods. Try selecting data in the Ceram_p (the spatial layer not the lab file) file using Select by Attributes.
The text you enter in the box is a modified SQL query. Try to select all of the Bowls from the LIP period and look at the selection on the map
You should have just one record selected.
The point of this exercise was to show the process of peeling off a subset of the data
C. Problem 2: Show relationship between size the archaeological site (m2) and proportion of obsidian artifacts in the lab2 analysis.
Representing spatial information in Arcmap (symbolizing and labeling) is often much simpler if you pull together relevant information into a single layer that also does the calculations you want to display. The basic problem here is that there are Many artifacts coming from each site. In order to represent the data on the Many side we need to collapse the contents of the Many table into units so the One table can symbolize it.
Steps:
1. Problem. You’ve explored the data in Site_A and in Lithics_Lab1 and you recognize that you have to collapse the values in Lithics_Lab1 before you can show them on the map. You can collapse table values with a tool called Pivot Tables and then by Summarizing on the SiteID column.
First, make sure there are no rows selected in the “Lithics_Lab1” table.
Open the Pivot Tables function in the toolbox under Data Management > Tables > Pivot Tables
2. Pivoting the data table. In the Pivot Tables box select “Lithics_Lab1” as your pivot table. This is the table that has the quantity of data we want to reduce. We are effectively swapping the rows and columns in the original table and summarizing selected quantitative information by the Site field so we can link this information to space.
- Click in the Input fields box and read the Help on the right (click Show Help button if it’s not on).
- “Define records to be included in the pivot table”. Think about this… we can only link these lab results to space if we can tie them to spatial units. Space, in this exercise, is defined by SITE areas. Therefore we ONLY want to deal with data in terms of the site that falls into. If you have records logged by other criteria (too many boxes checked) you would have too many rows in your pivot table and it will not mesh with the Site_A feature. Therefore select only SITEID.
- Click in the Pivot Field box and read the help. “Generate names of the new fields”. Fields are columns in tables therefore you are choosing the field that will become your column headings. We are trying to distinguish Obsidian from non-obsidian therefore pick MATERIAL.
- Click in the Value field and read the help. “Populate the new fields”. In other words this is the value be summed for each site per material type. Choose Peso_Orig (spanish for Weight, measured in grams).
- The result should include the following: for each site with obsidian or non-obsidian we will report the weight of each material in separate columns that can easily be summed.
You can use the default filename. It should end up in your default geodatabase. Have a look at the contents of the output table. Note that there are many SITEID values that are NULL. That is because there were artifacts collected outside of sites in some cases. Also note that the MATERIAL TYPE values have become column headings (Obs, NotObs), but there are still duplicate sites in rows as you can see in the SITEID column.
- If we want a table to show in a report we can generate this with Summarize by the SITEID column in order to collapse these into a single line per Site. Right-click the column header title and go to Summarize on SITEID.
- Next, check sum in “Obsidian” and check sum in “Not Obsidian”. Save it adjacent to your other datasets in the Geodatabse. Name the table “Obsidian_by_Site”.
Look at the resulting table. Is there only one SITEID value per row? Why are there so many with a 0 SITEID value? Where did those data come from?
This summarized table is useful for reports. It can be saved (File > Export Data...) with the format "Text" for a comma-separated table for use in Excel or elsewhere.
3. Representing the data. Now that the results for each site is available on a single line you can Join it to the Site_A feature. Part of the process of Joining it also summarizes against Site so we’ll use the Summarized Pivot Table output.
- Right-Click the Sites_A feature and go down to Joins and Relates… and choose Join…
- Make sure the top field is for “Attributes from a Table”. Then jump down to the #2 question in the middle and choose your Obsidian_by_Site. Next, select the appropriate fields in fields #1 and #3 on which to base a join. You should be able to join ARCHID in the Site table with SITEID in the Obsidian_by_Site table.
- Click “Advanced…” and read the different descriptions. This distinction is also known as the “Outer Join” vs “Inner Join” relationship. Choose “matching records” (i.e., inner join, or exclusive rather than inclusive join). This will result in only showing sites that have values in the "Obsidian_by_Site" table (in other words, only sites with Obsidian will appear in the map.
Look at the resulting table. Does it contain all the information you need collapsed onto rows? Look at the number of records. There were 88 in the original Site_A table, now there are fewer. Why is that? The original question asked about the relationship between site size and presence of obsidian. How are you going to measure site size?
4. Symbolizing the data. You can show these data using a number of tools. For example, for a simple map you could divide the weight of obsidian by the weight of non-obsidian per site and symbolize by the ratio of obsidian over non-obsidian.
Here, we will do something slightly more complex and use a pie chart to show the relationship with respect to size.
- Right click Sites_a choose Properties, and Symbology tab…
- In the Show list choose Charts > Pie
- Choose the bottom two fields (Obsidian, Not Obsidian) on the Field Selection and click the right arrow.
- Change the colors of the two fields you selected to two contrasting fields.
- Click OK and look at the result.
- What about the size of the site? Right click Site_A choose Properties > Symbolize … again
- Click the Size… button at the bottom center. Choose “Vary size using a Field…” and choose “Shape_Area” as the Size.. field and “Log” as the Normalize field.
Zoom in and out and notice how the symbols change as you pan around. Zoom way in to a single busy site. It looks better at that scale doesn't it?
Experiment with other ways of symbolizing these data.
As a final step go to File > Export Map... and choose Save as Type: PDF (300 dpi)
Save this project, we will work on it more later.
Assignment: Upload the PDF you generated immediately above as a file to bCourses for this exercise.