Abstract: Social scientists investigating how context varies by geographical location and/or how macro-level phenomena affect individual outcomes often make use of U.S. Census Bureau Public Use Microdata Sample (PUMS) files, where micro-units can only be geographically located to Public Use Microdata Area (PUMA) polygons. Most spatial analysis investigations with PUMAs ignore the fact that many of them are multipart polygons: spatially separated polygons that share the same attribute and are stored as a single feature in a vector file. We briefly discuss the theoretical premises of how geographical boundaries are created for macro-units and investigate the quantity, degree, and location of PUMA fragmentation. We argue that the basic contiguity principle (the assumption that spatial analysis uses polygon centroids for solid and contiguous geographic units) in spatial dependence analysis is being violated with many PUMAs in the U.S. mainland, where Texas, California, Tennessee, and Illinois merit special attention. Future research should outline a method for handling multipart polygons in spatial and hierarchical analyses.
Keywords: Spatial analysis, Spatial demography, ACS, PUMS, PUMAs, Clustering.
Article Info: Manuscript Received: March 12, 2013; Revised: November 9, 2013; Accepted: November 12, 2013; Online: November 18, 2013.
Introduction
A general tenet in spatial demography is that "context" (interchangeably referred to as the environment or the macro-unit) matters and should be accounted for in research (see Durkheim, 1951). The interaction between the individual (interchangeably referred to as the micro-unit) and his/her context can operate through social interaction with physically proximal others and the perception of prevailing norms in the immediate context (see Books and Prysby, 1988). The use of geographically referenced ("geo-referenced") information continues to grow in popularity as many research sectors, like public health, begin to expand their use of Geographic Information Systems (GIS; Cromley and McLafferty, 2012). Geo-referencing data has frequently been cited as one of the most important applications of GIS (Mansour et al., 2012) because merging person-level with place-level information may offer researchers the ability to address important questions on how the environment interacts with the individual to affect life chances and health outcomes.
Researchers requiring use of geo-referenced information must either develop their own data (an expensive and time-consuming process) or make use of readily available secondary data. Our project substantively contributes to "place effect" oriented human geographies research by expanding our understanding of how context is measured in a readily available and popular secondary data source. We briefly discuss the theoretical assumptions in geographical units when measuring context. For social sciences, the theoretical importance of environmental measures rests on the largely implicit assumption that relationships among geographically distributed micro-level variables vary as a function of macro-level phenomena and/or geographical location. This view is based on the assumption that everything is related to everything else, but near things are more related than distant things (Tobler, 1970).
Testing the effects and spatial clustering of macro-level phenomena is important and requires the appropriate delineation of areal boundaries (i.e., geographical boundarization) and measurement of context. Consequently, the boundaries of spatial units matter (see Jacquez, 1995). Challenges for estimating "spatial clusters" in human populations date back several decades (Cuzick and Edwards, 1990) and continue to date, with publications highlighting the fact that boundaries in spatial units can encapsulate single- and multi-part geographies (Siordia and Fox, 2013). Multipart units refer to spatially separated polygons that share the same attribute and are stored as a single feature in a vector file. Siordia and Fox (2013) explain that treating multi-part "polygons as contiguous spatial entities erroneously imposes a false structure of contiguity that challenges theoretical and statistical assumptions in geographically aware research" (p. 42). Multi-part polygons may be creating "geospatial mismatch" (Siordia and Fox, 2013) by geographically referencing data to the wrong location. Geospatial mismatch has been documented in other publications where probabilistic methods with regression models are suggested for the classification of "mislabeled points" (Mansour et al., 2012). The importance of "positional accuracy" continues to receive extensive coverage in different fields (Hart and Zandbergen, 2013; Schootman et al., 2007; Zandbergen, 2008).
Although frequently absent in print, we posit that there are five principal components underlying the justification for social sciences' ability to derive meaningful geographic boundaries. Potential for measuring structural phenomena can only exist if macro-measures are: (1) detectable; and made up of geographical polygons that are (2) non-overlapping, (3) contiguous, (4) non-porous, and (5) non-fluid. For example, measuring "neighborhood economic deprivation" (Burdette, 2013; Law and Quick, 2013) requires the ability to conceptualize, operationalize, and develop data that capture the concept. If the ability for measuring the macro-measure is present, investigators must then spatially reference the measure by using solid polygons that are mutually exclusive. "Solid polygons" refers to areas where geographical units do not have gaps, do not constantly shift, and are not fragmented. "Mutually exclusive" refers to geographical units that do not overlap in space. The main point here is that "the contiguity of the geographical polygon is an implicit and necessary condition" (Siordia and Fox, 2013, p. 44) in spatial analysis seeking to geo-reference data (Grubesic and Matisziw, 2006). Our discussion focuses on the contiguity principle.
Social scientists' ability to capture the effects of context on micro-outcomes and the spatial autocorrelation within such measures depends on their ability to avoid violating the postulates outlined above. Spatial autocorrelation refers to the fact that "measurements made at nearby locations may be closer in value than measurements made at locations farther apart" (Srinivasan and Venkatesan, 2013, p. 1). If the contiguity principle proves necessary, then the social significance, reliability, and validity of macro-measures will be determined by our ability to properly demarcate boundaries. The current study only focuses on a particular challenge (i.e., polygon fragmentation, the presence of multi-part polygons) when attempting to demarcate geographical boundaries.
Spatial Reference and Dependence
Geographically referenced demographic data has many analytical uses. Hierarchical (Raudenbush and Bryk, 2002) and spatial approaches like local clustering (Anselin, 1995) and geographically weighted regression (Fotheringham et al., 2002) have been applied across many investigation topics (e.g., Fullerton, 2012; Liu and Painter, 2012; Yang and Matthews, 2012). At the core of these investigative approaches is the idea that context plays a significant role in an individual's behavior: that "place matters" (Laraia et al., 2012).
There are five general uses of spatially referenced demographic data: (1) to create customized macro-measures in a spatial clustering investigation (Wang, 2007); (2) where the micro-unit is the unit of analysis in a model that includes macro-unit measures (e.g., Raphael and Stoll, 2010); (3) where the micro-unit of analysis and its geographical location are required in the estimation of the model (e.g., Barrios et al., 2010); (4) where the macro-unit, as the unit of analysis, requires a customized estimate (e.g., Yu and Myers, 2007); and (5) where the geographic polygon is used in areal interpolation procedures (Salvatore et al., 2007). Polygon fragmentation could pose problems for each of these as follows: (1) fallible polygon centroids; (2) indeterminable inaccuracy of macro-level measures assigned to micro-units across fragments; (3) variation in precision for micro-units' approximate physical location; (4) misleading aggregations; and (5) ambiguous areal interpolations from fragments (for more details see Siordia and Fox, 2012). Our investigation only focuses on the first challenge: showing evidence that multipart polygons exist. The crucial point is that "polygon centroids" in multi-part geographical units may be calculated outside the polygon, thereby creating a geo-spatial mismatch.
Because a polygon's demographic attribute (e.g., percent who do not speak English) is partially a function of the same attribute in neighboring units (see Flint et al., 2000), investigating spatial dependence can complement cross-level modeling (see LeSage and Pace, 2009). Explorations of spatial autocorrelation are fundamentally interested in knowing if and how contexts vary as a function of location, that is, in capturing how demographic attributes can be spatially nonstationary (i.e., vary as a function of geographical space). Others have explained that ignoring spatial dependence has theoretical and statistical implications (Vilalta, 2011). Thus, when possible, researchers should seek to investigate how spatially nonstationary processes play a role in macro-level measures.
Measurements of spatial dependence were developed many years ago. The general approach was based on the strong statistical assumption that deviations in observed point patterns can be detected by comparing them to a condition where random point processes are stationary (non-moving) and isotropic (maintaining the same degree of movement in all directions) (Baddeley and Silverman, 1984). Ripley (1976; 1977; 1981) formally introduced the K function (used in spatial clustering analysis) and proved its reliability as a statistical tool for analyzing the second-order moment of a point pattern process. The K function can be given as $K(h) = E[N_0(h)] / \lambda$, where the numerator is the expected number of events lying within distance $h$ of an arbitrary event of the process, and where the denominator $\lambda$ is the intensity of the process (also see Diggle, 1983). If the benchmark point pattern process is $K(h) = \pi h^2$, then $K(h) < \pi h^2$ signals a "regular" point pattern, while $K(h) > \pi h^2$ indicates a "clustered" process (Galvis et al., 2009), where the intensity in movement is over the threshold.
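The comparison against the benchmark can be illustrated with a minimal sketch. It is not part of our analysis; it assumes a naive estimator with no edge correction, a known study-area size, and hypothetical function names (`ripley_k`, `classify`) and coordinates chosen only for illustration.

```python
import math


def ripley_k(points, h, area):
    """Naive estimate of K(h) = E[N_0(h)] / lambda (no edge correction)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    intensity = n / area  # lambda: events per unit area
    # Count ordered pairs (i, j), i != j, that lie within distance h.
    pairs = sum(
        1
        for i, p in enumerate(points)
        for j, q in enumerate(points)
        if i != j and math.dist(p, q) <= h
    )
    mean_neighbours = pairs / n  # empirical stand-in for E[N_0(h)]
    return mean_neighbours / intensity


def classify(points, h, area):
    """Compare the estimate with the benchmark value pi * h**2."""
    k_obs = ripley_k(points, h, area)
    k_benchmark = math.pi * h ** 2
    if k_obs > k_benchmark:
        return "clustered"
    if k_obs < k_benchmark:
        return "regular"
    return "random"


# Hypothetical point patterns in a 10 x 10 study area (area = 100).
tight_cluster = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
spread_out = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0), (5.0, 5.0)]
```

In practice an edge-corrected estimator would be preferable, because points near the border of the study area have fewer observable neighbours.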
Implicit here is the statistical assumption that benchmark point pattern processes are measured between points representing singlepart spatial units. More simply, spatial clustering techniques may statistically be assuming that "movements" occur over continuous space (Jones and Casetti, 1992) where "local instabilities" are clusters (Openshaw, 1993). For example, measurements of local spatial association (Anselin, 1995) make use of the K function by observing the spatial autocorrelation (the correlation of one variable with itself as a function of location). When defining local indicator of spatial association (LISA) statistics, Anselin (1995) explains that local spatial dependence is present when similar values significantly cluster. With similar approaches using points (below referred to as polygon centroids), analysis of mapped planar point patterns then focuses on the behavior of point attributes and the distances between pairs of points in the pattern (Baddeley and Silverman, 1984). Here again, we have the implicit idea that a solid and singlepart spatial unit is compared with similar spatial entities in a given neighborhood bandwidth.
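To make the role of centroids and the neighborhood bandwidth concrete, the following is a minimal sketch of a local Moran statistic in the form I_i = (z_i / m2) * sum_j w_ij z_j. It assumes row-standardised distance-band weights built from centroid coordinates, omits the significance testing that a full LISA analysis requires, and uses a hypothetical function name (`local_moran`) and made-up values.

```python
import math


def local_moran(centroids, values, bandwidth):
    """Local Moran's I per unit, using distance-band neighbours of centroids."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]        # deviations from the mean
    m2 = sum(d * d for d in z) / n        # second moment of the deviations
    if m2 == 0:
        raise ValueError("attribute has no variation")
    result = []
    for i in range(n):
        # Neighbours are units whose centroid lies within the bandwidth.
        neighbours = [
            j for j in range(n)
            if j != i and math.dist(centroids[i], centroids[j]) <= bandwidth
        ]
        # Row-standardised spatial lag: the mean deviation among neighbours.
        lag = sum(z[j] for j in neighbours) / len(neighbours) if neighbours else 0.0
        result.append(z[i] * lag / m2)
    return result


# Hypothetical centroids and attribute values: two high units, two low units.
example_centroids = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
example_values = [10.0, 10.0, 0.0, 0.0]
example_scores = local_moran(example_centroids, example_values, bandwidth=2.0)
```

Positive scores indicate that a unit resembles its neighbours. Because the neighbour sets depend entirely on centroid locations, a centroid that falls outside its own polygon can change which units are treated as neighbours.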
Polygon Centroids
Most statistical approaches measuring spatial nonstationarity are determined based upon a polygon's geometric centroid. Centroids use planimetric calculations; that is, they make use of spatial references from projected (rather than spherical or geodesic) space. Most centroids represent a polygon's mean center based on the weighted average of its x- and y-geographic coordinates (Mitchell, 2005). A centroid is in essence the center of gravity on which the polygon can be balanced, and it is a common way of summarizing the location of an attribute in spatial analysis (Goodman et al., 2012; Wang, 2010). In most research, the distance between spatial units is measured using these feature centroids. This is most appropriate where the polygons are roughly the same size and shape (Mitchell, 2005), and where polygons are made up of a singlepart shape in which all the points within the polygon can be reached without ever stepping out of the spatial unit.
Since investigating spatial nonstationarity is crucial, accounting for it while using secondary data sources requires that scientists make use of whatever geographic polygons are available with the data. The use of multipart geographical units available in secondary data sources presents a challenge with a spatial dimension. For example, detail-rich Public Use Microdata Sample (PUMS) files (more details below) only allow individual-level units to be spatially referenced to Public Use Microdata Area (PUMA) polygons. Although many researchers may be unaware of it, a substantial number of PUMAs are made up of multipart polygons. We believe burgeoning theories on spatial dependence and the methodologies employed in its measurement must pay careful attention to such a special condition. Our paper makes a substantive contribution to the literature by highlighting this challenge.
A discussion of Figure 1 may help clarify the implications of using geometric centroids with singlepart and multipart polygons. On the left, in Figure 1, we have a singlepart polygon and on the right (enclosed in the dotted line) we have a multipart polygon, where the dark circle for both represents their geometric center. With the singlepart polygon, the centroid is within the spatial unit and adequately represents the center of the unit. In contrast, the centroid in the multipart polygon is outside all of the fragments and is less representative of the unit's center than the centroid of the singlepart polygon. Please note that the fragments that constitute the multipart polygon vary in shape, size, and distance from each other. Although not shown in Figure 1, the space between the multipart-polygon fragments would, in the case of PUMAs, be filled by other PUMA units. For an example of an actual map displaying multipart polygons see Siordia and Fox (2013).
Our core theoretical question is: Is polygon contiguity a necessary condition when investigating spatial nonstationarity? Contiguity refers to geographical units sharing common boundaries (as is the case with singlepart polygons) while noncontiguity refers to spatial units made up of multiple parts that do not share a boundary (as is the case with multipart polygons). A person travelling to all the internal points in a contiguous spatial unit would not have to leave the polygon, while a person travelling in a noncontiguous polygon would have to exit the geometrical unit at some point to enter it again in a different location (Cova and Church, 2000). A fragmented polygon then refers to a multipart spatial unit. If polygon contiguity is a necessary condition for investigating spatial nonstationarity, and we contend this is the case, then multipart polygons pose a problem.
Our core argument is that polygon contiguity is necessary in investigations seeking to account for spatial nonstationarity. Exploring how macro-level measures interact with micro-level outcomes and how statistical relationships may vary as a function of geographical location requires, given theoretical and statistical assumptions, that geographical attributes be derived from contiguous polygons. Polygon noncontiguity can have a significant impact on spatial analysis and theory (Grubesic and Matisziw, 2006), because the treatment of multipart polygons as contiguous units imposes a false structure that may produce systemic errors in theory and methods. Because investigations on spatial nonstationarity (using the center of the polygon to measure location) are theoretically premised on the assertion that a polygon's attribute is constituted by a contiguous spatial unit, the presence of PUMA fragmentation merits research attention.
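The contrast described for Figure 1 can be reproduced with a minimal sketch. It assumes an area-weighted centroid computed with the shoelace formula on projected coordinates, simple rings without holes, and hypothetical square fragments; the function names are illustrative and do not come from any GIS package.

```python
def ring_area_centroid(ring):
    """Signed area and centroid of a simple ring via the shoelace formula."""
    a = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5
    return a, (cx / (6 * a), cy / (6 * a))


def multipart_centroid(parts):
    """Area-weighted centroid of several rings stored as one feature."""
    total = sx = sy = 0.0
    for ring in parts:
        area, (cx, cy) = ring_area_centroid(ring)
        area = abs(area)  # ignore ring orientation
        total += area
        sx += area * cx
        sy += area * cy
    return (sx / total, sy / total)


def contains(ring, point):
    """Ray-casting point-in-polygon test (points on the boundary not handled)."""
    x, y = point
    inside = False
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical geometry: one square, and a feature made of two separated squares.
left = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
right = [(6.0, 0.0), (8.0, 0.0), (8.0, 2.0), (6.0, 2.0)]
single_centroid = multipart_centroid([left])
multi_centroid = multipart_centroid([left, right])
centroid_in_any_fragment = any(contains(r, multi_centroid) for r in (left, right))
```

Here the singlepart centroid lies inside its square, while the multipart centroid lands in the gap between the two fragments, which in the case of PUMAs would belong to a different unit.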
Specific Aims
Our research question is: Is PUMA polygon discontiguity present in US mainland states? The specific aims of this paper are to investigate the quantity and degree of fragmentation at the state level across the contiguous US states and the District of Columbia. Our project quantifies and spatially references where the basic contiguity requirement is violated with PUMA polygons. In closing, we discuss implications of the findings, limitations, and what future research should be undertaken.
Materials and Methods
In the United States (US), the use of US Census Bureau microdata (i.e., individual-level data) aids the allocation of governmental funds and services (see Reamer, 2010). In order to provide non-Census researchers the opportunity to work with information-rich individual-level data, the Census Bureau releases Public Use Microdata Sample (PUMS) files. In order to ensure the confidentiality of survey respondents, the Bureau undertakes several procedures (e.g., capping the age maximum at 99) and only allows public data users to geographically locate individuals to spatial units referred to as Public Use Microdata Areas (PUMAs). Although our investigation explores the geometrical attributes of PUMAs and discusses their implications for spatial analysis, similar analysis could be done with other spatial entities.
The US Census Bureau releases PUMA geographies using a combination of numeric or alphanumeric codes to spatially reference micro-units, and each spatial unit must contain at least 100,000 people. The geographical boundaries of PUMAs are created through collaboration between State Data Centers (SDCs) and the federal government (i.e., US Census Bureau). Criteria for PUMA delineation have changed since 1990, when this approach was first used (Siordia and Fox, 2012). We investigate the quantity and location of multipart PUMAs (across mainland states) by using Topologically Integrated Geographic Encoding and Referencing (TIGER) shapefiles (Zandbergen et al., 2011).
Geographic features can be represented digitally for application in geographic information systems using any of a number of possible geospatial vector formats. For the purposes of this study, all analysis employs geospatial features in shapefile format (ESRI, 1998). Our 2007 TIGER/Line PUMA shapefiles contain the geographic boundaries as of January 1, 2007, which include a Census 2000 vintage geography. In 2000, counties, minor civil divisions, incorporated places, and tracts were used as the building blocks to delineate the geographic boundaries of PUMA polygons. PUMA multipart data is produced using ArcGIS® 9.3 [software by ESRI. ArcGIS® and ArcMap(TM) are the intellectual property of ESRI and are used herein under license (Copyright © ESRI, all rights reserved); for more information about ESRI® software, please visit www.esri.com] (ESRI, 2011).
We use the ArcGIS 9.3 Explode tool in advanced editing to identify the quantity and location of multipart polygons. From the resulting shapefile containing the single-part polygons (which retain their original PUMA identification number along with a new polygon ID), we generated our analytic PUMA-polygon sample.
The quantification and localization of fragmentation, from the data output of the previous step, is done by importing our data and managing it in SAS 9.2 (Copyright, SAS Institute Inc. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc., Cary, NC, USA).
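The logic of the explode step can be sketched outside of ArcGIS. The snippet below is only an illustration under stated assumptions: features are plain dictionaries, and the field names (`puma_id`, `polygon_id`, `parts`) and identifier values are hypothetical rather than taken from the TIGER/Line attribute table.

```python
from collections import Counter


def explode(features):
    """Split each multipart feature into single-part rows that keep the parent ID."""
    rows = []
    for feature in features:
        for part in feature["parts"]:
            rows.append({
                "puma_id": feature["puma_id"],   # original identifier is retained
                "polygon_id": len(rows) + 1,     # new running ID per single part
                "ring": part,
            })
    return rows


def multipart_ids(rows):
    """Return {puma_id: number of parts} for units with more than one part."""
    counts = Counter(row["puma_id"] for row in rows)
    return {puma_id: n for puma_id, n in counts.items() if n > 1}


# Hypothetical input: one two-part unit and one single-part unit.
example_features = [
    {"puma_id": "00100", "parts": [[(0, 0), (2, 0), (2, 2)], [(6, 0), (8, 0), (8, 2)]]},
    {"puma_id": "00200", "parts": [[(3, 3), (5, 3), (5, 5)]]},
]
example_rows = explode(example_features)
example_multipart = multipart_ids(example_rows)
```

Counting rows per original identifier after exploding is what distinguishes multipart units from singlepart ones.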
Our descriptive analysis explores PUMA fragmentation by state. We computed a state's percent of PUMA fragmentation with the following equation: [(number of fragmented PUMAs ÷ total number of PUMAs) × 100]. This measure captures the amount of PUMA fragmentation by state. We also calculated a ratio to measure a state's degree of PUMA fragmentation by using the following equation: (number of fragments from multipart PUMAs ÷ number of fragmented PUMAs). This measure assesses the degree to which multipart PUMAs are fragmented.
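As a worked illustration of the two measures, the sketch below takes the percent as the share of a state's PUMAs that are fragmented, which is the reading consistent with the counts reported in the Results. The function names are hypothetical; the PUMA counts are those reported in the text, while the fragment count passed to the ratio in the usage note is a made-up value.

```python
def percent_fragmented(total_pumas, fragmented_pumas):
    """Share of a state's PUMAs that are multipart, as a percentage."""
    return 100.0 * fragmented_pumas / total_pumas


def fragments_per_multipart(fragments, fragmented_pumas):
    """Average number of fragments per multipart PUMA."""
    return fragments / fragmented_pumas


# (total PUMAs, fragmented PUMAs) as reported in the Results section.
reported_counts = {
    "Oklahoma": (18, 7),
    "Texas": (153, 41),
    "California": (233, 46),
}
percents = {
    state: round(percent_fragmented(total, fragmented), 1)
    for state, (total, fragmented) in reported_counts.items()
}
```

For the ratio, a hypothetical state with 12 fragments spread over 3 multipart PUMAs gives `fragments_per_multipart(12, 3)`, or 4 fragments per multipart PUMA.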
Results and Discussion
From the "PUMAs" column (i.e., total number of PUMAs in state) in Table 1, we saw that the following states had more than 100 PUMA polygons (number given in parentheses): California (233); Texas (153); New York (143); and Florida (127). Our table provides the measures for states where at least one fragmented PUMA was present. The following states were omitted from the table because they have zero fragmented PUMAs (number of PUMAs in state given in parentheses): Delaware (6); Mississippi (23); Montana (7); North Dakota (5); Nebraska (14); Utah (16); West Virginia (12); Wyoming (4); and the District of Columbia (5). Please note that although these non-fragmenting states had a small count of polygons, some of the fragmenting states in our table had even smaller PUMA counts (e.g., Vermont, Maine, and Idaho). The non-fragmenting states fall in the "0 to 5%" category in Figure 2.
Many factors may be at play in how PUMAs become fragmented, since they are the product of federal and local government efforts (see Siordia and Fox, 2012). Even though PUMAs are driven by population size and are shaped by their building blocks (e.g., blocks, tracts), the amount of rural space in the state does not seem to play a role in the presence of fragmented PUMAs.
Although an internal investigation of the procedures involved (between the federal and local government) in the delineation of PUMA geographical boundaries would be required to understand why some states avoided fragmenting PUMAs, non-fragmenting states should be commended for avoiding (whether intentionally or unintentionally) the formation of multipart polygons.
We now evaluate states' percent of fragmentation and then discuss the degree to which multipart polygons are fragmented for each state.
Relative percent of fragmentation
From Table 1, we first dealt with the "% Fragmented" column. Although the table is sorted (from largest to smallest) on the degree of fragmentation, we made note of states where at least 20% of the PUMAs were fragmented. Figure 2 also displays the data for the US states. The state with the largest percent of fragmented PUMAs was Oklahoma, with 7 of its 18 PUMAs (39%) being fragmented. It is followed by Nevada and North Carolina, with one third of their PUMAs fragmented. Texas has 41 of its 153 PUMAs fragmented, while California has 46 of its 233 PUMAs fragmented. The other states (VT, KS, WI, ID, ME) with at least 20% of their PUMAs fragmented have smaller PUMA counts and contain between 20% and 25% multipart PUMA polygons. From our descriptive analysis of the states' percent of PUMA fragmentation, we found that the small PUMA count states of Oklahoma and Nevada had the largest percent of fragmentation, while the large PUMA count states of Texas and California had 27% and 20% multipart PUMA polygons, respectively.
Degree of fragmentation
We now turn our attention to the degree to which multipart PUMAs are fragmented. From Table 1, we see that Tennessee and South Carolina have a ratio of about 24 fragments for each of their multipart PUMAs. Illinois follows them with a ratio of 22, while Michigan and Alabama have about 15 fragments for each of their multipart PUMAs. North Carolina and Kansas have ratios of 12 and 11, respectively. Texas and California have about 7 and 6 fragments, respectively, for each of their many multipart PUMAs.
From this descriptive analysis of the states' degree of PUMA fragmentation, we find that Tennessee and Illinois have the largest degree of multipart PUMA fragmentation within moderate PUMA count states. Because of their large PUMA counts, it is worth mentioning that Texas and California on average split their multipart PUMAs into roughly six to seven fragments. Please note that, in contrast, the large PUMA count states of Florida and New York only have, on average, 3 fragments for their few multipart polygons; this too is a commendable accomplishment.
Conclusions
In answer to our research question, we found that PUMA polygon discontiguity was present in the US mainland. The basic contiguity requirement for spatial analysis is being violated with many PUMAs in 40 of the 49 mainland units examined (the 48 contiguous states and the District of Columbia). Texas and California are of the biggest concern within large PUMA count states, Tennessee and Illinois within moderate PUMA count states, while Oklahoma and Nevada merit attention even though they have a small PUMA count. The fact that PUMA fragments are dispersed over a sea of complete and fractured PUMA polygons may allow the "inappropriate" overlapping of feature centroids, given the geographical proximity of the flawed centroids from fragments, which could cause the detection of a false positive (concluding that a relationship is statistically significant when it is not) when investigating spatial dependency. When researchers are exploring spatial clustering or modeling spatial autocorrelation, they should ensure their geographical units are not made up of multi-part polygons.
One limitation of our study is that we do not provide the reader with information on the size and shape of fragments or the enclosed square-mile area within multipart polygons. For example, some fragments may be made up of one square mile while others may be constituted by a polygon ten times that size. Some may have more homogeneous shape configurations where others are made up of widely dispersed slivers. Also, fragments may be dispersed over a one-mile radius or a 10-mile radius. Our investigation does not offer insights into these important PUMA fragmentation elements. Future research should pursue these questions. Social scientists should also seek to explore whether research in other fields on "edge detection" (e.g., Safner et al., 2011) can help formulate more theory-driven macro-level measures and Bayesian clustering techniques. By employing such approaches, the implicit assumption that macro-level boundaries reliably and validly demarcate where an abrupt change in context occurs can be investigated.
This paper, notwithstanding the limitations, makes a substantive contribution by highlighting the presence, quantity, degree, and location of PUMA discontiguity by state in the US mainland. Building a bridge between geographically aware research and social science endeavors requires that we expand our dialogue on how our rapidly evolving methods and software can reliably and validly investigate (or fail to do so because of multipart polygons) our research questions. Because there may be many social and policy consequences from deriving erroneous conclusions through flawed approaches, more research on the implications of polygon fragmentation in the analysis of spatial nonstationarity is necessary.
References
Anselin, L 1995, 'Local indicators of spatial association-LISA', Geographical Analysis, 27, pp. 93-115.
Baddeley, AJ & Silverman, BW 1984, 'A Cautionary Example on the Use of Second-Order Methods for Analyzing Point Patterns', Biometrics, 40, pp. 1089-1093.
Barrios, T, Diamond, R, Imbens, GW & Kolesar, M 2010, Clustering, Spatial Correlations and Randomization Inference, National Bureau of Economic Research, NBER Working Paper No. 1576.
Books, J & Prysby, C 1988, 'Studying contextual effects on political behavior: a research inventory and agenda', American Politics Quarterly, 16, pp. 211-238.
Burdette, AM 2013, 'Neighborhood context and breastfeeding behaviors among urban mothers', Journal of Human Lactation, 29, pp. 597-604.
Cova, TJ & Church, RL 2000, 'Contiguity constraints for single-region site search problems', Geographical Analysis, 32, pp. 306-329.
Cromley, EK & McLafferty, S 2012, GIS and public health, Guilford Press.
Cuzick, J & Edwards, R 1990, 'Spatial clustering for inhomogeneous populations', Journal of the Royal Statistical Society, 52, pp. 73-104.
Diggle, PJ 1983, Statistical analysis of spatial point patterns, Academic Press, London.
Durkheim, E 1951, Suicide: a study in sociology, Free Press.
ESRI 1998, Shapefile Technical Description: An ESRI White Paper, Redlands, California.
ESRI 2011, ArcGIS Desktop, Release 10, Environmental Systems Research Institute, Redlands, CA.
Flint, C, Harrower, M & Edsall, R 2000, 'But how does place matter? Using Bayesian networks to explore a structural definition of place', paper presented at the New Methodologies of the Social Sciences Conference, University of Colorado Boulder.
Fotheringham, AS, Brunsdon, C & Charlton, ME 2002, Geographically Weighted Regression: The Analysis of Spatially Varying Relationships, John Wiley, West Sussex, UK.
Fullerton, AS 2012, 'Spatial agglomeration and wages in the U.S. biotechnology sector', Sociological Spectrum, 32, pp. 61-80.
Galvis, L, Guertin, PJ & Meyer, WD 2009, Actionable cultural understanding for support to tactical operations: the effect of data quality on spatial analysis results, Report, ERDC/CERL TR-09-15.
Goodman, JM, Owens, PR & Libohova, Z 2012, 'Predicting soil organic carbon using mixed conceptual and geostatistical models', Digital Soil Assessments and Beyond: Proceedings of the 5th Global Workshop on Digital Soil Mapping, 10-13 April 2012, CRC Press, Sydney, Australia.
Grubesic, TH & Matisziw, TC 2006, 'On the use of ZIP codes and ZIP code tabulation areas (ZCTAs) for the spatial analysis of epidemiological data', International Journal of Health Geographies, 5, pp. 1-15.
Hart, TC & Zandbergen, PA 2013, 'Reference data and geocoding quality: Examining completeness and positional accuracy of street geocoded crime incidents', Policing: An International Journal of Police Strategies & Management, 36, pp. 263-294.
Jacquez, GM 1995, 'The map comparison problem: tests for the overlap of geographic boundaries', Statistics in Medicine, 14, pp. 2343-2361.
Jones, JP & Casetti, E 1992, Applications of the expansion method, Routledge, London.
Laraia, BA, Karter, AJ, Warton, EM, et al. 2012, 'Place matters: neighborhood deprivation and cardiometabolic risk factors in the Diabetes Study of Northern California (DISTANCE)', Social Science & Medicine, 74, pp. 1082-1090.
Law, J & Quick, M 2013, 'Exploring links between juvenile offenders and social disorganization at a large map scale: a Bayesian spatial modeling approach', Journal of Geographical Systems, 15, pp. 89-113.
LeSage, JP & Pace, RK 2009, Introduction to spatial econometrics, CRC Press, Boca Raton.
Liu, CY & Painter, G 2012, 'Travel behaviour among Latino immigrants: the role of ethnic concentration and ethnic employment', Journal of Planning Education and Research, 32, pp. 62-80.
Mansour, S, Martin, D & Wright, J 2012, 'Problems of spatial linkage of a geo-referenced Demographic and Health Survey (DHS) dataset to a population census: A case study of Egypt', Computers, Environment and Urban Systems, 36, pp. 350-358.
Mitchell, A 2005, The ESRI Guide to GIS Analysis, Volume 1: Geographic Patterns and Relationships and Zeroing In: Ge- ographic Information Systems at Work in the Community, ESRI Press, US.
Openshaw, S 1993/ Some suggestions concerning the devel- opment of artificial intelligence tools for spatial model- ling and analysis in GIS in MM Fischer & P Nijkamp (eds), Geographic Information Systems, Spatial Modelling and Policy Evaluation, pp. 17-33, Springer Verlag, Berlin.
Raphael, S & Stoll, MA 2010/Job sprawl and the suburbani- zation of poverty. Metropolitan Policy Program at Brook- ings', Metropolitan Opportunity Series, March: 1-21.
Raudenbush, SW & Bryk, AS 2002, Hierarchical Linear Models: Applications and Data Analysis Methods, 2nd edn, Thousand Oaks, Sage Publications, California.
Reamer, AD 2010, Surveying for dollars: the role of the American Community Survey in the geographic distribution of federal funds, Metropolitan Policy Program at Brookings, Washington D.C.
Ripley, B 1976, 'The second-order analysis of stationary point processes', Journal of Applied Probability, 13, pp. 255-266.
Ripley, BD 1977, 'Modelling spatial patterns (with discussion)', Journal of the Royal Statistical Society, Series B, 39, pp. 172-212.
Ripley, BD 1981, Spatial statistics, Wiley, New York.
Safner, T, Miller, MP, McRae, BH, Fortin, M & Manel, S 2011, 'Comparison of Bayesian clustering and edge detection methods for inferring boundaries in landscape genetics', International Journal of Molecular Sciences, 12, pp. 865-889.
Salvatore, S, Chavers, JM, Nixon, LC & McQuiddy, MR 2007, 'From here to there: methods of allocating data between census geography and socially meaningful areas', Social Science Research, 36, pp. 897-920.
Siordia, C & Fox, A 2013, 'Public Use Microdata Area fragmentation: research and policy implications of polygon discontiguity', Spatial Demography, 1(1), pp. 42-56.
Schootman, M, Sterling, DA, Struthers, J et al. 2007, 'Positional accuracy and geographic bias of four methods of geocoding in epidemiologic research', Annals of Epidemiology, 17, pp. 464-470.
Srinivasan, R & Venkatesan, P 2013, 'Bayesian model for spatial dependence and prediction of tuberculosis', International Journal, 3, pp. 2307-2083.
Tobler, W 1970, 'A Computer Movie Simulating Urban Growth in the Detroit Region', Economic Geography, 46, pp. 234-240.
Vilalta, CJ 2011, 'The spatial dependence of judicial data', Applied Spatial Analysis and Policy, pp. 1-17.
Wang, F 2010, Quantitative methods and applications in GIS, CRC Press.
Wang, Q 2007, 'Linking home to work: ethnic labor market concentration in the San Francisco consolidated metropolitan area', Urban Geography, 27, pp. 72-92.
Yang, T & Matthews, SA 2012, 'Understanding the non-stationary associations between distrust of the health care system, health conditions, and self-rated health in the elderly: a geographically weighted regression approach', Health and Place, 18, pp. 576-585.
Yu, Z & Myers, D 2007, 'Convergence or divergence in Los Angeles: three distinctive patterns of immigrant residential assimilation', Social Science Research, 36, pp. 254-285.
Zandbergen, PA 2008, 'Positional Accuracy of Spatial Data: Non-Normal Distributions and a Critique of the National Standard for Spatial Data Accuracy', Transactions in GIS, 12, pp. 103-130.
Zandbergen, PA, Ignizio, DA & Lenzer, KE 2011, 'Positional accuracy of TIGER 2000 and 2009 road networks', Transactions in GIS, 15, pp. 495-519.
Carlos Siordia (a, *), Douglas F. Wunneburger (b)
(a) Department of Epidemiology, University of Pittsburgh, USA
(b) Department of Landscape Architecture and Urban Planning, Texas A&M University, USA
* Corresponding author:
Address: 130 DeSoto Street, Pittsburgh, PA, 15261, USA
Telephone: 1-412-383-1708
Email: [email protected]
Copyright Bucharest University 2013
Abstract
Social scientists investigating how context varies by geographical location and/or how macro-level phenomena affect individual outcomes often make use of the US Census Bureau Public Use Microdata Sample (PUMS) files, where micro-units can only be geographically located to Public Use Microdata Area (PUMA) polygons. Most spatial analysis investigations with PUMAs ignore the fact that many of them are multipart polygons: spatially separated polygons that share the same attribute and are stored as a single feature in a vector file. The authors briefly discuss the theoretical premises of how geographical boundaries are created for macro units and investigate the quantity, degree, and location of PUMA fragmentation. They argue that the basic contiguity principle (the assumption that spatial analysis uses polygon centroids for solid and contiguous geographic units) in spatial dependence analysis is being violated by many PUMAs in the US mainland, where Texas, California, Tennessee, and Illinois merit special attention. Future research should outline a method for handling multipart polygons in spatial and hierarchical analyses.