This R markdown document provide an example of how to use the uniform probability density analysis function to model site population history using dated cermic objects. For technical details of the approach see the Ortaman 2016 and Mills et al. 2018 articles cited in the repository.

For an example here I am using data from a series of sites in the San Pedro region of southern Arizona. I provide examples of how this procedure can be implemented for a single site or for a data file containing information on many sites. Note that the scripts used here are designed to take advantage of parallel processing but this procedure may still take several minutes for thousands of sites.

For more information on the sites or data used here, see the cyberSW web platform.

Initializing Scripts and libraries

This line will initialize all of the required libraries and define the two functions to be used here. The "updf" function runs the uniform probability density analysis for user provided data and the "updf.plot" plots the output of the updf function.

source('updf_functions.R')

Reading in Data

Next we read in the data file and create a variable containing site names.

dat.long <- read.csv('data_SP.csv',header=T)
sites <- unique(dat.long$Site)

Running the scripts for a single site

Let's first try running the script for a single site. The updf function expects the following user defined input:

The function also needs a few additional user supplied parameters * interval - the rounding interval to be used to define ceramic modeling periods * min.period - the minimum length of a period to include (periods shorter than this interval will be merged with the largest adjacent period). * cutoff - the minimum probabiilty of occupation required to set the beginning and ending date for the site.

The next chunk of code first extracts a single site from the data we read in above and then runs the updf code for that site and displays the extensive results. The output includes the site name, the prior probabilites by period, the posterior probabilities by period, the conditional probabilities by period, the period designations, the ceramic sample size, and the estimated period of occupation given the cutoff provided.

ex1.site <- dat.long[which(dat.long$Site=='Davis Ranch Site'),]

ex1 <- updf(site=ex1.site$Site,
            cer.type=ex1.site$Type,
            ct=ex1.site$Count,
            start=ex1.site$Begin,
            end=ex1.site$End,
            chron=ex1.site$Chronology,
            interval=25,
            min.period=10,
            cutoff=0.1)

ex1
## $site
## [1] "Davis Ranch Site"
## 
## $prior
##    AD200-400    AD400-500    AD500-650    AD650-700    AD700-750    AD750-800 
## 0.0000000000 0.0000000000 0.0000000000 0.0002096248 0.0041039040 0.0049873227 
##    AD800-850    AD850-900    AD900-950   AD950-1000  AD1000-1100  AD1100-1125 
## 0.0092172511 0.0159102706 0.0137915631 0.0166783957 0.0190274409 0.0037722477 
##  AD1125-1150  AD1150-1200  AD1200-1250  AD1250-1275  AD1275-1300  AD1300-1325 
## 0.0037722477 0.0022464788 0.0013960012 0.0012707254 0.0145572996 0.0211455067 
##  AD1325-1350  AD1350-1375  AD1375-1400  AD1400-1450  AD1450-1600  AD1600-1700 
## 0.1741705918 0.1782365637 0.1718355216 0.3436710431 0.0000000000 0.0000000000 
##  AD1700-1800  AD1800-1900 
## 0.0000000000 0.0000000000 
## 
## $posterior
##    AD200-400    AD400-500    AD500-650    AD650-700    AD700-750    AD750-800 
## 0.000000e+00 0.000000e+00 0.000000e+00 3.143985e-05 1.938122e-03 2.963640e-03 
##    AD800-850    AD850-900    AD900-950   AD950-1000  AD1000-1100  AD1100-1125 
## 5.660352e-03 9.425613e-03 7.333303e-03 1.012831e-02 1.143070e-02 2.585863e-03 
##  AD1125-1150  AD1150-1200  AD1200-1250  AD1250-1275  AD1275-1300  AD1300-1325 
## 2.810109e-03 1.472963e-03 1.055689e-03 1.187753e-03 2.482397e-02 3.298839e-02 
##  AD1325-1350  AD1350-1375  AD1375-1400  AD1400-1450  AD1450-1600  AD1600-1700 
## 2.020959e-01 2.009139e-01 1.727041e-01 3.084499e-01 0.000000e+00 0.000000e+00 
##  AD1700-1800  AD1800-1900 
## 0.000000e+00 0.000000e+00 
## 
## $conditional
##   AD200-400   AD400-500   AD500-650   AD650-700   AD700-750   AD750-800 
## 0.003142044 0.004979133 0.004310641 0.009218423 0.029027044 0.036523867 
##   AD800-850   AD850-900   AD900-950  AD950-1000 AD1000-1100 AD1100-1125 
## 0.037745121 0.036412530 0.032681724 0.037325130 0.036924229 0.042133172 
## AD1125-1150 AD1150-1200 AD1200-1250 AD1250-1275 AD1275-1300 AD1300-1325 
## 0.045786956 0.040300308 0.046480310 0.057450413 0.104811558 0.095887474 
## AD1325-1350 AD1350-1375 AD1375-1400 AD1400-1450 AD1450-1600 AD1600-1700 
## 0.071318371 0.069283848 0.061774414 0.055164603 0.006620858 0.015889046 
## AD1700-1800 AD1800-1900 
## 0.016181328 0.002627454 
## 
## $period
##       [,1] [,2]
##  [1,]  200  400
##  [2,]  400  500
##  [3,]  500  650
##  [4,]  650  700
##  [5,]  700  750
##  [6,]  750  800
##  [7,]  800  850
##  [8,]  850  900
##  [9,]  900  950
## [10,]  950 1000
## [11,] 1000 1100
## [12,] 1100 1125
## [13,] 1125 1150
## [14,] 1150 1200
## [15,] 1200 1250
## [16,] 1250 1275
## [17,] 1275 1300
## [18,] 1300 1325
## [19,] 1325 1350
## [20,] 1350 1375
## [21,] 1375 1400
## [22,] 1400 1450
## [23,] 1450 1600
## [24,] 1600 1700
## [25,] 1700 1800
## [26,] 1800 1900
## 
## $samp.size
## [1] 42714
## 
## $occ
##       lwr  upr
## [1,] 1300 1450

Plotting the results

Since the plotting function was designed with the updf output in mind, plotting is just a single line of code. In the output, the red line represents the uniform prior probability estimates, the blue dotted line represents the conditional, and the filled gray area representes the posterior probability estiamtes. The black dotted lines represent the estimated site beginning and ending dates based on the probability cutoff. Note here that there are low probability intervals both before and after the 1300-1450 interval between the dotted lines in the posterior probability estimates. These may represent smaller distinct components, heirloomed vessels, later revisitation, or other processes. The dotted lines are not meant to suggest we should exclude those intervals from consideration but instead are meant to highlight the most likely primary occupation interval represented by the ceramic assemblage provided.

For this function, you just need to provide an updf object, and select a beginning date (beg.date), ending date (end.date), and labelling interval (plot.interval) to define the years that will be shown on the x-axis.

  updf.plot(ex1,
            beg.date=500,
            end.date=1900,
            plot.interval=50)

Running the analysis for multiple sites

One simple way to run these analyses and generate output for multiple sites is to use loops and lists. In the following chunk of code we define a list and loop over all sites included in our data file to run this analysis site by site. Here we limit ourselves to the first 3 sites in the dataset but this could easily be applied to all. Also, the output could be wrapped in a pdf() function to output plots directly to a file.

out.list <- list()

for (i in 1:3) { # 3 could be replaced with length(sites) to plot and analyze all included sites.
  site.qv <- dat.long[which(dat.long$Site==sites[i]),]
  
  updf.out <- updf(site=site.qv$Site,
                   cer.type=site.qv$Type,
                   ct=site.qv$Count,
                   start=site.qv$Begin,
                   end=site.qv$End,
                   chron=site.qv$Chronology,
                   interval=25,
                   cutoff=0.1,
                   min.period=10)

  out.list[[i]] <- updf.out

  updf.plot(updf.out,
          beg.date=200,
          end.date=1900,
          plot.interval=50)
}

out.list[[1]]
## $site
## [1] "Swingle's Sample"
## 
## $prior
##   AD400-700  AD700-1000 AD1000-1100 AD1100-1200 AD1200-1250 AD1250-1275 
## 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.001854599 
## AD1275-1300 AD1300-1325 AD1325-1350 AD1350-1400 AD1400-1450 AD1450-1700 
## 0.119913099 0.134749894 0.139200933 0.302140738 0.302140738 0.000000000 
## 
## $posterior
##   AD400-700  AD700-1000 AD1000-1100 AD1100-1200 AD1200-1250 AD1250-1275 
## 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000952583 
## AD1275-1300 AD1300-1325 AD1325-1350 AD1350-1400 AD1400-1450 AD1450-1700 
## 0.097324721 0.144433501 0.147806238 0.325509671 0.283973285 0.000000000 
## 
## $conditional
##   AD400-700  AD700-1000 AD1000-1100 AD1100-1200 AD1200-1250 AD1250-1275 
##  0.01094322  0.01524328  0.03070176  0.03948130  0.06666994  0.07734708 
## AD1275-1300 AD1300-1325 AD1325-1350 AD1350-1400 AD1400-1450 AD1450-1700 
##  0.12222153  0.16141009  0.15989754  0.16223546  0.14153354  0.01231528 
## 
## $period
##       [,1] [,2]
##  [1,]  400  700
##  [2,]  700 1000
##  [3,] 1000 1100
##  [4,] 1100 1200
##  [5,] 1200 1250
##  [6,] 1250 1275
##  [7,] 1275 1300
##  [8,] 1300 1325
##  [9,] 1325 1350
## [10,] 1350 1400
## [11,] 1400 1450
## [12,] 1450 1700
## 
## $samp.size
## [1] 1219
## 
## $occ
##       lwr  upr
## [1,] 1275 1450