

 Amazon Redshift will no longer support the creation of new Python UDFs starting Patch 198. Existing Python UDFs will continue to function until June 30, 2026. For more information, see the [blog post](https://aws.amazon.com/blogs/big-data/amazon-redshift-python-user-defined-functions-will-reach-end-of-support-after-june-30-2026/). 

# Querying spatial data in Amazon Redshift
<a name="geospatial-overview"></a>

*Spatial data* describes the position and shape of a geometry in a defined space (a spatial reference system). Amazon Redshift supports spatial data with the `GEOMETRY` and `GEOGRAPHY` data types, which contain spatial data and optionally the data's spatial reference system identifier (SRID). 

Spatial data contains geometric data that you can use to represent geographic features. Examples of this type of data include weather reports, map directions, tweets with geographic positions, store locations, and airline routes. Spatial data plays an important role in business analytics, reporting, and forecasting. 

You can query spatial data with Amazon Redshift SQL functions.

The `GEOMETRY` data type operations work on the Cartesian plane. Although the spatial reference system identifier (SRID) is stored inside the object, this SRID is merely an identifier of the coordinate system and plays no role in the algorithms used to process the `GEOMETRY` objects. Conversely, the operations on the `GEOGRAPHY` data type treat the coordinates inside objects as spherical coordinates on a spheroid. This spheroid is defined by the SRID, which references a geographic spatial reference system. By default, `GEOGRAPHY` data types are created with spatial reference (SRID) 4326, referencing the World Geodetic System (WGS) 84. For more information about SRIDs, see [Spatial reference system](https://en.wikipedia.org/wiki/Spatial_reference_system) in Wikipedia.

You can use the ST\_Transform function to transform coordinates between spatial reference systems. After the coordinates are transformed, you can also use a simple cast between the two data types, as long as the input `GEOMETRY` is encoded with a geographic SRID. This cast simply copies the coordinates without any further transformation. For example:

```
SELECT ST_AsEWKT(ST_GeomFromEWKT('SRID=4326;POINT(10 20)')::geography);
```

```
st_asewkt
------------------------
 SRID=4326;POINT(10 20)
```

To better understand the difference between the `GEOMETRY` and `GEOGRAPHY` data types, consider calculating the distance between the Berlin airport (BER) and the San Francisco airport (SFO) using the World Geodetic System (WGS) 84. Using the `GEOGRAPHY` data type, the result is in meters. Using the `GEOMETRY` data type with SRID 4326, the result is in degrees, which can't be converted to meters because the length of one degree depends on where on the globe the geometries are located.
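To make the difference concrete, the following sketch compares both data types for the same pair of points. The airport coordinates are approximate, illustrative values, not authoritative data. With `GEOGRAPHY` inputs the result is in meters; with SRID 4326 `GEOMETRY` inputs it is in degrees.

```
-- Approximate coordinates (longitude latitude) for BER and SFO; illustrative only.
SELECT
  ST_Distance(ST_GeogFromText('POINT(13.5033 52.3667)'),
              ST_GeogFromText('POINT(-122.3750 37.6189)')) AS distance_meters,
  ST_Distance(ST_GeomFromText('POINT(13.5033 52.3667)', 4326),
              ST_GeomFromText('POINT(-122.3750 37.6189)', 4326)) AS distance_degrees;
```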

Calculations on the `GEOGRAPHY` data type are mostly used for realistic round-earth calculations, such as the precise area of a country without distortion, but they are far more expensive to compute. Therefore, it can be faster to use ST\_Transform to transform your coordinates to an appropriate local projected coordinate system and do the calculation on the `GEOMETRY` data type instead.
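For example, a minimal sketch of such a transformation, assuming EPSG:25833 (ETRS89 / UTM zone 33N, a projected system suitable for the Berlin area) is an appropriate target for your data:

```
SELECT ST_AsEWKT(ST_Transform(ST_GeomFromText('POINT(13.377704 52.516431)', 4326), 25833));
```

After the transformation, distances and areas computed on the `GEOMETRY` type are expressed in the projected system's linear unit (meters for UTM).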

Using spatial data, you can run queries to do the following:
+ Find the distance between two points.
+ Check whether one area (polygon) contains another.
+ Check whether one linestring intersects another linestring or polygon.
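The following illustrative query sketches the last two checks using literal geometries (the coordinates are made up for demonstration):

```
SELECT
  ST_Contains(ST_GeomFromText('POLYGON((0 0,10 0,10 10,0 10,0 0))'),
              ST_GeomFromText('POINT(5 5)')) AS polygon_contains_point,
  ST_Intersects(ST_GeomFromText('LINESTRING(0 0,10 10)'),
                ST_GeomFromText('LINESTRING(0 10,10 0)')) AS linestrings_intersect;
```

Both expressions return true for these inputs: the point lies inside the square, and the two diagonals cross.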

You can use the `GEOMETRY` data type to hold the values of spatial data. A `GEOMETRY` value in Amazon Redshift can define two-dimensional (2D), three-dimensional (3DZ), two-dimensional with a measure (3DM), and four-dimensional (4D) geometry primitive data types:
+ A two-dimensional (2D) geometry is specified by two Cartesian coordinates (x, y) in a plane.
+ A three-dimensional (3DZ) geometry is specified by three Cartesian coordinates (x, y, z) in space.
+ A two-dimensional with measure (3DM) geometry is specified by three coordinates (x, y, m), where the first two are Cartesian coordinates in a plane and the third is a measurement.
+ A four-dimensional (4D) geometry is specified by four coordinates (x, y, z, m), where the first three are Cartesian coordinates in a space and the fourth is a measurement.

For more information about geometry primitive data types, see [Well-known text representation of geometry](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry) in Wikipedia. 
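For example, you can construct geometries of these dimensions from extended well-known text (EWKT). The following sketch assumes the standard `Z`, `M`, and `ZM` EWKT modifiers:

```
SELECT
  ST_AsEWKT(ST_GeomFromEWKT('POINT Z (1 2 3)'))    AS point_3dz,
  ST_AsEWKT(ST_GeomFromEWKT('POINT M (1 2 4)'))    AS point_3dm,
  ST_AsEWKT(ST_GeomFromEWKT('POINT ZM (1 2 3 4)')) AS point_4d;
```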

You can use the `GEOGRAPHY` data type to hold the values of spatial data. A `GEOGRAPHY` value in Amazon Redshift can define two-dimensional (2D), three-dimensional (3DZ), two-dimensional with a measure (3DM), and four-dimensional (4D) geometry primitive data types:
+ A two-dimensional (2D) geometry is specified by longitude and latitude coordinates on a spheroid.
+ A three-dimensional (3DZ) geometry is specified by longitude, latitude, and altitude coordinates on a spheroid.
+ A two-dimensional with measure (3DM) geometry is specified by three coordinates (longitude, latitude, measure), where the first two are angular coordinates on a sphere and the third is a measurement.
+ A four-dimensional (4D) geometry is specified by four coordinates (longitude, latitude, altitude, measure), where the first three are longitude, latitude and altitude, and the fourth is a measurement.

For more information about geographic coordinate systems, see [Geographic coordinate system](https://en.wikipedia.org/wiki/Geographic_coordinate_system) and [Spherical coordinate system](https://en.wikipedia.org/wiki/Spherical_coordinate_system) in Wikipedia. 

The `GEOMETRY` and `GEOGRAPHY` data types have the following subtypes: 
+ `POINT`
+ `LINESTRING`
+ `POLYGON`
+ `MULTIPOINT`
+ `MULTILINESTRING`
+ `MULTIPOLYGON`
+ `GEOMETRYCOLLECTION`

There are Amazon Redshift SQL functions that support the following representations of geometric data:
+ GeoJSON
+ Well-known text (WKT)
+ Extended well-known text (EWKT)
+ Well-known binary (WKB)
+ Extended well-known binary (EWKB)
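For instance, the same point can be produced in several of these representations (an illustrative sketch):

```
SELECT
  ST_AsGeoJSON(ST_GeomFromText('POINT(1 2)'))    AS geojson,
  ST_AsText(ST_GeomFromText('POINT(1 2)'))       AS wkt,
  ST_AsEWKT(ST_GeomFromText('POINT(1 2)', 4326)) AS ewkt;
```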

You can cast between `GEOMETRY` and `GEOGRAPHY` data types.

The following SQL casts a linestring from a `GEOMETRY` to a `GEOGRAPHY`. 

```
SELECT ST_AsEWKT(ST_GeomFromText('LINESTRING(110 40, 2 3, -10 80, -7 9)')::geography);
```

```
 st_asewkt
----------------------------------------------
 SRID=4326;LINESTRING(110 40,2 3,-10 80,-7 9)
```

The following SQL casts a linestring from a `GEOGRAPHY` to a `GEOMETRY`. 

```
SELECT ST_AsEWKT(ST_GeogFromText('LINESTRING(110 40, 2 3, -10 80, -7 9)')::geometry);
```

```
 st_asewkt
----------------------------------------------
 SRID=4326;LINESTRING(110 40,2 3,-10 80,-7 9)
```

Amazon Redshift provides many SQL functions to query spatial data. Except for the `ST_IsValid` function, spatial functions that accept a `GEOMETRY` object as an argument expect this `GEOMETRY` object to be a valid geometry. If the `GEOMETRY` or `GEOGRAPHY` object isn't valid, then the behavior of the spatial function is undefined. For more information about validity, see [Geometric validity](spatial-terminology.md#spatial-terminology-validity). 
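You can check validity before applying other spatial functions. In the following sketch, the polygon's ring is self-intersecting (a bowtie), which the validity definition in the linked topic disallows:

```
-- A self-intersecting ring; ST_IsValid is expected to report it as not valid.
SELECT ST_IsValid(ST_GeomFromText('POLYGON((0 0,2 2,0 2,2 0,0 0))'));
```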

For details about SQL functions to query spatial data, see [Spatial functions](geospatial-functions.md). 

For details about loading spatial data, see [Loading a column of the GEOMETRY or GEOGRAPHY data type](copy-usage_notes-spatial-data.md). 

**Topics**
+ [Tutorial: Using spatial SQL functions with Amazon Redshift](spatial-tutorial.md)
+ [Loading a shapefile into Amazon Redshift](spatial-copy-shapefile.md)
+ [Terminology for Amazon Redshift spatial data](spatial-terminology.md)
+ [Considerations when using spatial data with Amazon Redshift](spatial-limitations.md)

# Tutorial: Using spatial SQL functions with Amazon Redshift
<a name="spatial-tutorial"></a>

This tutorial demonstrates how to use some of the spatial SQL functions with Amazon Redshift. 

To do this, you query two tables using spatial SQL functions. The tutorial uses data from public datasets that correlate location data of rental accommodations with postal codes in Berlin, Germany. 

**Topics**
+ [Prerequisites](#spatial-tutorial-prerequisites)
+ [Step 1: Create tables and load test data](#spatial-tutorial-test-data)
+ [Step 2: Query spatial data](#spatial-tutorial-query)
+ [Step 3: Clean up your resources](#spatial-tutorial-clean-up)

## Prerequisites
<a name="spatial-tutorial-prerequisites"></a>

For this tutorial, you need the following resources:
+ An existing Amazon Redshift cluster and database that you can access and update. In the existing cluster, you create tables, load sample data, and run SQL queries to demonstrate spatial functions. Your cluster should have at least two nodes. To learn how to create a cluster, follow the steps in the [Amazon Redshift Getting Started Guide](https://docs.aws.amazon.com/redshift/latest/gsg/). 
+ To use the Amazon Redshift query editor, make sure that your cluster is in an AWS Region that supports the query editor. For more information, see [Querying a database using the query editor](https://docs.aws.amazon.com/redshift/latest/mgmt/query-editor.html) in the *Amazon Redshift Management Guide*. 
+ AWS credentials for your Amazon Redshift cluster that allow it to load test data from Amazon S3. For information about how to access other AWS services such as Amazon S3, see [Authorizing Amazon Redshift to access AWS services](https://docs.aws.amazon.com/redshift/latest/mgmt/authorizing-redshift-service.html).
+ An AWS Identity and Access Management (IAM) role named `mySpatialDemoRole` that has the managed policy `AmazonS3ReadOnlyAccess` attached to read Amazon S3 data. To create a role with permission to load data from an Amazon S3 bucket, see [Authorizing COPY, UNLOAD, and CREATE EXTERNAL SCHEMA operations using IAM roles](https://docs.aws.amazon.com/redshift/latest/mgmt/copy-unload-iam-role.html) in the *Amazon Redshift Management Guide*. 
+ After you create the IAM role `mySpatialDemoRole`, associate it with your Amazon Redshift cluster. For more information on how to create that association, see [Authorizing COPY, UNLOAD, and CREATE EXTERNAL SCHEMA operations using IAM roles](https://docs.aws.amazon.com/redshift/latest/mgmt/copy-unload-iam-role.html) in the *Amazon Redshift Management Guide*. 

## Step 1: Create tables and load test data
<a name="spatial-tutorial-test-data"></a>

The source data used by this tutorial is in files named `accommodations.csv` and `zipcode.csv`. 

The `accommodations.csv` file is open-source data from insideairbnb.com. The `zipcode.csv` file provides postal codes that are open-source data from the national statistics institute of Berlin-Brandenburg in Germany (Amt für Statistik Berlin-Brandenburg). Both data sources are provided under a Creative Commons license. The data is limited to the Berlin, Germany, region. These files are located in an Amazon S3 public bucket for use with this tutorial. 

You can optionally download the source data from the following Amazon S3 links:
+ [Source data for the `accommodations` table](https://s3.amazonaws.com/redshift-downloads/spatial-data/accommodations.csv). 
+ [Source data for the `zipcode` table](https://s3.amazonaws.com/redshift-downloads/spatial-data/zipcode.csv). 

Use the following procedure to create tables and load test data. 

**To create tables and load test data**

1. Open the Amazon Redshift query editor. For more information on working with the query editor, see [Querying a database using the query editor](https://docs.aws.amazon.com/redshift/latest/mgmt/query-editor.html) in the *Amazon Redshift Management Guide*. 

1. Drop any tables used by this tutorial if they already exist in your database. For more information, see [Step 3: Clean up your resources](#spatial-tutorial-clean-up). 

1. Create the `accommodations` table to store each accommodation's geographical location (longitude and latitude), the name of the listing, and other business data. 

   This tutorial explores room rentals in Berlin, Germany. The `shape` column stores geographic points of the location of accommodations. The other columns contain information about the rental. 

   To create the `accommodations` table, run the following SQL statement in the Amazon Redshift query editor. 

   ```
   CREATE TABLE public.accommodations (
     id INTEGER PRIMARY KEY,
     shape GEOMETRY,
     name VARCHAR(100),
     host_name VARCHAR(100),
     neighbourhood_group VARCHAR(100),
     neighbourhood VARCHAR(100),
     room_type VARCHAR(100),
     price SMALLINT,
     minimum_nights SMALLINT,
     number_of_reviews SMALLINT,
     last_review DATE,
     reviews_per_month NUMERIC(8,2),
     calculated_host_listings_count SMALLINT, 
     availability_365 SMALLINT
   );
   ```

1. Create the `zipcode` table in the query editor to store Berlin postal codes. 

   A *postal code* is defined as a polygon in the `wkb_geometry` column. The rest of the columns describe additional spatial metadata about the postal code. 

   To create the `zipcode` table, run the following SQL statement in the Amazon Redshift query editor. 

   ```
   CREATE TABLE public.zipcode (
     ogc_field INTEGER PRIMARY KEY NOT NULL,
     wkb_geometry GEOMETRY,
     gml_id VARCHAR(256),
     spatial_name VARCHAR(256),
     spatial_alias VARCHAR(256),
     spatial_type VARCHAR(256)
    );
   ```

1. Load the tables using sample data. 

   The sample data for this tutorial is provided in an Amazon S3 bucket that allows read access to all authenticated AWS users. Make sure that you provide valid AWS credentials that permit access to Amazon S3. 

   To load test data to your tables, run the following COPY commands. Replace *`account-number`* with your own AWS account number. The segment of the credentials string that is enclosed in single quotation marks can't contain any spaces or line breaks. 

   ```
   COPY public.accommodations 
   FROM 's3://redshift-downloads/spatial-data/accommodations.csv' 
   DELIMITER ';' 
   IGNOREHEADER 1 REGION 'us-east-1'
   CREDENTIALS 'aws_iam_role=arn:aws:iam::account-number:role/mySpatialDemoRole';
   ```

   ```
   COPY public.zipcode 
   FROM 's3://redshift-downloads/spatial-data/zipcode.csv' 
   DELIMITER ';' 
   IGNOREHEADER 1 REGION 'us-east-1'
   CREDENTIALS 'aws_iam_role=arn:aws:iam::account-number:role/mySpatialDemoRole';
   ```

1. Verify that each table loaded correctly by running the following commands. 

   ```
   select count(*) from accommodations;
   ```

   ```
   select count(*) from zipcode;
   ```

   The following results show the number of rows in each table of test data: 22,248 rows in `accommodations` and 190 rows in `zipcode`.

## Step 2: Query spatial data
<a name="spatial-tutorial-query"></a>

After your tables are created and loaded, you can query them using SQL SELECT statements. The following queries demonstrate some of the information that you can retrieve. You can write many other queries that use spatial functions to satisfy your needs. 

**To query spatial data**

1. Query to get the count of the total number of listings stored in the `accommodations` table, as shown following. The spatial reference system is World Geodetic System (WGS) 84, which has the unique spatial reference identifier 4326.

   ```
   SELECT count(*) FROM public.accommodations WHERE ST_SRID(shape) = 4326;
   ```

   ```
    count
   -------
    22248
   ```

1. Fetch the geometry objects in well-known text (WKT) format with some additional attributes. Additionally, you can verify that this postal code data is also stored in World Geodetic System (WGS) 84, which uses the spatial reference ID (SRID) 4326. Spatial data must be stored in the same spatial reference system to be interoperable. 

   ```
   SELECT ogc_field, spatial_name, spatial_type, ST_SRID(wkb_geometry), ST_AsText(wkb_geometry) 
   FROM public.zipcode 
   ORDER BY spatial_name;
   ```

   ```
   ogc_field  spatial_name  spatial_type  st_srid  st_astext
   ---------------------------------------------------------------
   0           10115        Polygon        4326     POLYGON((...))
   4           10117        Polygon        4326     POLYGON((...))
   8           10119        Polygon        4326     POLYGON((...))
   ...
   (190 rows returned)
   ```

1. Select the polygon of Berlin Mitte (10117), a borough of Berlin, in GeoJSON format, its dimension, and the number of points in this polygon. 

   ```
   SELECT ogc_field, spatial_name, ST_AsGeoJSON(wkb_geometry), ST_Dimension(wkb_geometry), ST_NPoints(wkb_geometry)
   FROM public.zipcode 
   WHERE spatial_name='10117';
   ```

   ```
   ogc_field  spatial_name  st_asgeojson                                   st_dimension  st_npoints
   ------------------------------------------------------------------------------------------------
   4           10117         {"type":"Polygon", "coordinates":[[[...]]]}    2             331
   ```

1. Run the following SQL command to view how many accommodations are within 500 meters of the Brandenburg Gate. 

   ```
   SELECT count(*) 
   FROM public.accommodations 
   WHERE ST_DistanceSphere(shape, ST_GeomFromText('POINT(13.377704 52.516431)', 4326)) < 500;
   ```

   ```
   count
   ------
     29
   ```

1. Get the rough location of the Brandenburg Gate from data stored in the accommodations that are listed as nearby by running the following query.

   This query requires a subselect. It returns a different count than the previous query because the requested location is derived from the listings themselves, so it isn't identical to the point used previously and lies closer to the accommodations. 

   ```
   WITH poi(loc) as (
     SELECT st_astext(shape) FROM accommodations WHERE name LIKE '%brandenburg gate%'
   )
   SELECT count(*) 
   FROM accommodations a, poi p
   WHERE ST_DistanceSphere(a.shape, ST_GeomFromText(p.loc, 4326)) < 500;
   ```

   ```
   count
   ------
     60
   ```

1. Run the following query to show the details of all accommodations around the Brandenburg Gate, ordered by price in descending order. 

   ```
   SELECT name, price, ST_AsText(shape) 
   FROM public.accommodations
   WHERE ST_DistanceSphere(shape, ST_GeomFromText('POINT(13.377704 52.516431)', 4326)) < 500
   ORDER BY price DESC;
   ```

   ```
   name                                                   price  st_astext
   -------------------------------------------------------------------------------------------------------
   DUPLEX APARTMENT/PENTHOUSE in 5* LOCATION! 7583        300    POINT(13.3826510209548 52.5159819722552)
   DUPLEX-PENTHOUSE IN FIRST LOCATION! 7582               300    POINT(13.3799997083855 52.5135918444834)
   ... 
   (29 rows returned)
   ```

1. Run the following query to retrieve the most expensive accommodation with its postal code. 

   ```
   SELECT 
     a.price, a.name, ST_AsText(a.shape), 
     z.spatial_name, ST_AsText(z.wkb_geometry) 
   FROM accommodations a, zipcode z 
   WHERE price = 9000 AND ST_Within(a.shape, z.wkb_geometry);
   ```

   ```
   price   name                                 st_astext                                  spatial_name      st_astext
   -------------------------------------------------------------------------------------------------------------------------------------------------
   9000    Ueber den Dächern Berlins Zentrum    POINT(13.334436985013 52.4979779501538)    10777             POLYGON((13.3318284987227 52.4956021172799,...
   ```

1. Calculate the maximum, minimum, or median price of accommodations by using a subquery. 

   The following query lists the median price of accommodations by postal code. 

   ```
   SELECT 
     a.price, a.name, ST_AsText(a.shape), 
     z.spatial_name, ST_AsText(z.wkb_geometry) 
   FROM accommodations a, zipcode z 
   WHERE 
     ST_Within(a.shape, z.wkb_geometry) AND 
     price = (SELECT median(price) FROM accommodations)
   ORDER BY a.price;
   ```

   ```
   price name                            st_astext                                   spatial_name   st_astext
   ---------------------------------------------------------------------------------------------------------------------------------------------                    
   45    "Cozy room Berlin-Mitte"        POINT(13.3864349535358 52.5292016386514)    10115          POLYGON((13.3658598465795 52.535659581048,...
   ...
   (723 rows returned)
   ```

1. Run the following query to retrieve the number of accommodations listed in Berlin. To find the hot spots, the results are grouped by postal code and sorted by the number of listings. 

   ```
   SELECT z.spatial_name as zip, count(*) as numAccommodations 
   FROM public.accommodations a, public.zipcode z
   WHERE ST_Within(a.shape, z.wkb_geometry)
   GROUP BY zip 
   ORDER BY numAccommodations DESC;
   ```

   ```
   zip  	numaccommodations
   ----------------------------
   10245	872
   10247	832
   10437	733
   10115	664
   ...
   (187 rows returned)
   ```

## Step 3: Clean up your resources
<a name="spatial-tutorial-clean-up"></a>

Your cluster continues to accrue charges as long as it's running. When you have completed this tutorial, you can delete your sample cluster. 

If you want to keep the cluster but recover the storage used by the test data tables, run the following commands to delete the tables. 

```
drop table public.accommodations cascade;
```

```
drop table public.zipcode cascade;
```

# Loading a shapefile into Amazon Redshift
<a name="spatial-copy-shapefile"></a>

You can use the COPY command to ingest Esri shapefiles stored in Amazon S3 into Amazon Redshift tables. A *shapefile* stores the geometric location and attribute information of geographic features in a vector format. The shapefile format can describe spatial objects such as points, lines, and polygons. For more information about shapefiles, see [Shapefile](https://en.wikipedia.org/wiki/Shapefile) in Wikipedia. 

The COPY command supports the data format parameter `SHAPEFILE`. By default, the first column of the shapefile is either a `GEOMETRY` or `IDENTITY` column. All subsequent columns follow the order specified in the shapefile. However, the target table doesn't need to be in this exact layout because you can use COPY column mapping to define the order. For information about the COPY command shapefile support, see [SHAPEFILE](copy-parameters-data-format.md#copy-shapefile).

In some cases, the resulting geometry size might be greater than the maximum for storing a geometry in Amazon Redshift. If so, you can use the COPY option `SIMPLIFY` or `SIMPLIFY AUTO` to simplify the geometries during ingestion as follows:
+ Specify `SIMPLIFY tolerance` to simplify all geometries during ingestion using the Ramer-Douglas-Peucker algorithm and the given tolerance. 
+ Specify `SIMPLIFY AUTO` without tolerance to simplify only geometries that are larger than the maximum size using the Ramer-Douglas-Peucker algorithm. This approach calculates the minimum tolerance that is large enough to store the object within the maximum size limit. 
+ Specify `SIMPLIFY AUTO max_tolerance` to simplify only geometries that are larger than the maximum size using the Ramer-Douglas-Peucker algorithm and the automatically calculated tolerance. This approach makes sure that the tolerance doesn't exceed the maximum tolerance. 
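A sketch of such a COPY command follows. The table name, bucket, and prefix are hypothetical placeholders; replace *`account-number`* with your own AWS account number.

```
COPY public.demo_shapes
FROM 's3://amzn-s3-demo-bucket/shapefiles/demo'
FORMAT SHAPEFILE
SIMPLIFY AUTO
IAM_ROLE 'arn:aws:iam::account-number:role/mySpatialDemoRole';
```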

For information about the maximum size of a `GEOMETRY` data value, see [Considerations when using spatial data with Amazon Redshift](spatial-limitations.md).

In some cases, the tolerance is low enough that the record can't shrink below the maximum size of a `GEOMETRY` data value. In these cases, you can use the `MAXERROR` option of the COPY command to ignore all or up to a certain number of ingestion errors.

The COPY command also supports loading GZIP shapefiles. To do this, specify the COPY GZIP parameter. With this option, all shapefile components must be independently compressed and share the same compression suffix. 

If a projection description file (`.prj`) exists with the shapefile, Amazon Redshift uses it to determine the spatial reference system identifier (SRID). If the SRID is valid, the resulting geometry has this SRID assigned. If the SRID value associated with the input geometry doesn't exist, the resulting geometry has an SRID value of zero. You can disable automatic detection of the spatial reference system identifier at the session level by setting `read_srid_on_shapefile_ingestion` to `OFF`. 
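For example, assuming the standard SET syntax for session-level configuration:

```
SET read_srid_on_shapefile_ingestion TO OFF;
```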

Query the `SYS_SPATIAL_SIMPLIFY` or `SVL_SPATIAL_SIMPLIFY` system views to see which records have been simplified, along with the calculated tolerance. When you specify `SIMPLIFY tolerance`, these views contain a record for each COPY operation. Otherwise, they contain a record for each simplified geometry. For more information, see [SYS\_SPATIAL\_SIMPLIFY](SYS_SPATIAL_SIMPLIFY.md) or [SVL\_SPATIAL\_SIMPLIFY](r_SVL_SPATIAL_SIMPLIFY.md). 
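For example, after a COPY that used `SIMPLIFY AUTO`, you might inspect the simplification results as follows (the available columns are described in the linked topics):

```
SELECT * FROM svl_spatial_simplify;
```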

For examples of loading a shapefile, see [Loading a shapefile into Amazon Redshift](r_COPY_command_examples.md#copy-example-spatial-copy-shapefile).

# Terminology for Amazon Redshift spatial data
<a name="spatial-terminology"></a>

The following terms are used to describe some Amazon Redshift spatial functions. 

## Bounding box
<a name="spatial-terminology-bounding-box"></a>

A bounding box of a geometry or geography is defined as the cross product (across dimensions) of the extents of the coordinates of all points in the geometry or geography. For two-dimensional geometries, the bounding box is a rectangle that completely includes all points in the geometry. For example, a bounding box of the polygon `POLYGON((0 0,1 0,0 2,0 0))` is the rectangle that is defined by the points (0, 0) and (1, 2) as its bottom-left and top-right corners. Amazon Redshift precomputes and stores a bounding box inside a geometry to speed up geometric predicates and spatial joins. For example, if the bounding boxes of two geometries don't intersect, then these two geometries can't intersect, and they can't be in the result set of a spatial join that uses the ST\_Intersects predicate. 

You can use spatial functions to add ([AddBBox](AddBBox-function.md)), drop ([DropBBox](DropBBox-function.md)), and determine support ([SupportsBBox](SupportsBBox-function.md)) for a bounding box. Amazon Redshift supports the precomputation of bounding boxes for all geometry subtypes. 

The following example shows how to update existing geometries in a table to store them with a bounding box. If your cluster is at cluster version 1.0.26809 or later, then all new geometries are created with a precomputed bounding box by default.

```
UPDATE my_table SET geom = AddBBox(geom) WHERE SupportsBBox(geom) = false;
```

After you update existing geometries, we recommend you run the VACUUM command on the updated table. For more information, see [VACUUM](r_VACUUM_command.md). 

To set whether geometries are encoded with a bounding box during a session, see [default\_geometry\_encoding](r_default_geometry_encoding.md). 

## Geometric validity
<a name="spatial-terminology-validity"></a>

Geometric algorithms used by Amazon Redshift assume that the input geometry is a valid geometry. If an input to an algorithm is not valid, then the result is undefined. The following section describes the geometric validity definitions used by Amazon Redshift for each geometry subtype.

Point  
A point is considered to be valid if one of the following conditions is true:   
+ The point is the empty point.
+ All point coordinates are finite floating point numbers.
A point can be the empty point. 

Linestring  
 A linestring is considered to be valid if any of the following conditions are true:   
+ The linestring is empty; that is, it contains no points. 
+ All points in a nonempty linestring have coordinates that are finite floating point numbers. 
+ The linestring, if not empty, must be one-dimensional; that is, it can't degenerate to a point. 
A linestring can't contain empty points.   
A linestring can have duplicate consecutive points.  
A linestring can have self-intersections.

Polygon  
A polygon is considered to be valid if any of the following conditions are true:   
+ The polygon is empty; that is, it contains no rings.
+ If not empty, a polygon is valid if all of the following conditions are true: 
  + All rings of the polygon are valid. A ring is considered to be valid if all the following conditions are true:
    + All points of the ring have coordinates that are finite floating point numbers.
    + The ring is closed; that is, its first point and its last point coincide.
    + The ring doesn't have any self-intersections.
    + The ring is two-dimensional.
  + The rings of the polygon must have consistent orientations. That is, if you traverse any ring, the interior of the polygon is either to your right or to your left. 
  + All interior rings must be within the exterior ring of the polygon.
  + All interior rings must have the same orientation, and it must be the opposite orientation of the exterior ring.
  + Interior rings can't be nested; that is, an interior ring can't be within another interior ring.
  + Interior and exterior rings can only intersect at a finite number of points.
  + The interior of the polygon must be simply connected.
A polygon can't contain empty points. 

Multipoint  
A multipoint is considered to be valid if any of the following conditions are true:   
+ The multipoint is empty; that is, it contains no points. 
+ A multipoint is not empty, and all points are valid according to the point validity definition. 
A multipoint can contain one or more empty points.   
A multipoint can have duplicate points. 

Multilinestring  
A multilinestring is considered to be valid if any of the following conditions are true:   
+ The multilinestring is empty; that is, it contains no linestrings. 
+ All linestrings in a nonempty multilinestring are valid according to the linestring validity definition. 
A nonempty multilinestring that consists of only empty linestrings is considered to be valid.   
An empty linestring in a multilinestring doesn't affect its validity.   
A multilinestring can have linestrings with duplicate consecutive points.  
A multilinestring can have self-intersections.  
A multilinestring can't contain empty points. 

Multipolygon  
A multipolygon is considered to be valid if any of the following conditions are true:   
+ The multipolygon doesn't contain any polygons (it is empty). 
+ The multipolygon is not empty and all of the following are true:
  + All polygons in the multipolygon are valid.
  + No two polygons in the multipolygon can intersect at an infinite number of points. In particular, this implies that the interior of any two polygons can't intersect and that they can only touch at a finite number of points.
An empty polygon in a multipolygon doesn't invalidate a multipolygon.  
A multipolygon can't contain empty points. 

Geometry collection  
A geometry collection is considered to be valid if any of the following conditions are true:   
+ The geometry collection is empty; that is, it doesn't contain any geometries. 
+ All geometries in a nonempty geometry collection are valid. 
This definition still applies, although in a recursive manner, for nested geometry collections.   
A geometry collection can contain empty points and multipoints with empty points. 

## Geometric simplicity
<a name="spatial-terminology-simplicity"></a>

Geometric algorithms used by Amazon Redshift assume that the input geometry is a valid geometry. If an input to an algorithm is not valid, then the simplicity check is undefined. The following section describes the geometric simplicity definitions used by Amazon Redshift for each geometry subtype. 

Point  
A valid point is always considered to be simple. An empty point is also considered to be simple. 

Linestring  
A valid linestring is considered to be simple if any of the following conditions are true:   
+ The linestring is empty. 
+ The linestring is not empty and all of the following conditions are true: 
  + It has no duplicate consecutive points.
  + It has no self-intersections, except possibly for its first point and last point, which can coincide. In other words, the linestring can't have self-intersections except at boundary points.

Polygon  
A valid polygon is considered to be simple if it doesn't contain any duplicate consecutive points. 

Multipoint  
A valid multipoint is considered to be simple if any of the following conditions are true:   
+ The multipoint is empty; that is, it contains no points. 
+ No two nonempty points of the multipoint coincide. 

Multilinestring  
A valid multilinestring is considered to be simple if any of the following conditions are true:   
+ The multilinestring is empty. 
+ The multilinestring is nonempty and all of the following conditions are true: 
  + All its linestrings are simple.
  + Any two linestrings of the multilinestring don't intersect, except at points that are boundary points of the two linestrings.
A nonempty multilinestring that consists only of empty linestrings is considered to be empty.   
An empty linestring in a multilinestring doesn't affect its simplicity.   
A closed linestring in a multilinestring can't intersect with any other linestring in the multilinestring.  
A multilinestring can't have linestrings with duplicate consecutive points.

Multipolygon  
A valid multipolygon is considered to be simple if it doesn't contain any duplicate consecutive points. 

Geometry collection  
A valid geometry collection is considered to be simple if any of the following conditions are true:   
+ The geometry collection is empty; that is, it doesn't contain any geometries. 
+ All geometries in a nonempty geometry collection are simple. 
This definition still applies, although in a recursive manner, for nested geometry collections. 
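You can check a geometry against these simplicity rules with the ST\_IsSimple function. The following is a minimal sketch:

```sql
-- A linestring with no self-intersections is simple.
SELECT ST_IsSimple(ST_GeomFromText('LINESTRING(0 0, 1 1, 2 0)'));

-- A linestring that crosses itself at an interior point (here at (1 1)) is not simple.
SELECT ST_IsSimple(ST_GeomFromText('LINESTRING(0 0, 2 2, 2 0, 0 2)'));
```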

## H3
<a name="spatial-terminology-h3"></a>

H3 is a hierarchical geospatial indexing grid system that offers a way to index spatial coordinates down to square-meter resolution. Indexed data can be joined across disparate datasets and aggregated at different levels of precision. H3 enables a range of grid-based algorithms and optimizations, including nearest neighbors, shortest path, gradient smoothing, and more. H3 indexes refer to cells that can be either hexagons or pentagons. The space is subdivided hierarchically at a given resolution. H3 supports 16 resolutions, numbered 0–15 inclusive, with `0` being the coarsest and `15` being the finest. 

Amazon Redshift provides the following H3 spatial functions:
+ [H3\_Boundary](H3_Boundary-function.md)
+ [H3\_Center](H3_Center-function.md)
+ [H3\_FromLongLat](H3_FromLongLat-function.md)
+ [H3\_FromPoint](H3_FromPoint-function.md)
+ [H3\_IsValid](H3_IsValid-function.md)
+ [H3\_Polyfill](H3_Polyfill-function.md)
+ [H3\_Resolution](H3_Resolution-function.md)
+ [H3\_ToChildren](H3_ToChildren-function.md)
+ [H3\_ToParent](H3_ToParent-function.md)
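For example, you can index a longitude/latitude pair at a chosen resolution and then walk up the hierarchy to a coarser parent cell. The following sketch uses an example coordinate in New York City:

```sql
-- Index a point at resolution 9 (longitude first, then latitude).
SELECT H3_FromLongLat(-73.9857, 40.7484, 9);

-- Recover the enclosing parent cell at the coarser resolution 5.
SELECT H3_ToParent(H3_FromLongLat(-73.9857, 40.7484, 9), 5);
```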

# Considerations when using spatial data with Amazon Redshift
<a name="spatial-limitations"></a>

The following are considerations when using spatial data with Amazon Redshift: 
+ The maximum size of a `GEOMETRY` or `GEOGRAPHY` object is 1,048,447 bytes. 
+ Amazon Redshift Spectrum doesn't natively support spatial data. Therefore, you can't create or alter an external table with a `GEOMETRY` or `GEOGRAPHY` column. 
+ Data types for Python user-defined functions (UDFs) don't support the `GEOMETRY` or `GEOGRAPHY` data type. 
+ You can't use a `GEOMETRY` or `GEOGRAPHY` column as a sort key or a distribution key for an Amazon Redshift table. 
+ You can't use `GEOMETRY` or `GEOGRAPHY` columns in SQL ORDER BY, GROUP BY, or DISTINCT clauses. 
+ You can't use `GEOMETRY` or `GEOGRAPHY` columns in many SQL functions. 
+ You can UNLOAD `GEOMETRY` or `GEOGRAPHY` columns only to text or comma-separated value (CSV) files; other UNLOAD formats aren't supported. Doing so writes the `GEOMETRY` or `GEOGRAPHY` data in hexadecimal EWKB format. If the size of the EWKB data exceeds 4 MB, a warning occurs because the data can't later be loaded back into a table. 
+ The only supported compression encoding for `GEOMETRY` or `GEOGRAPHY` data is RAW. For more information, see [Compression encodings](c_Compression_encodings.md). 
+ When using JDBC or ODBC drivers, use customized type mappings. In this case, the client application must have information on which parameters of a `ResultSet` object are `GEOMETRY` or `GEOGRAPHY` objects. The `ResultSetMetadata` operation returns type `VARCHAR`. 
+ To copy geographic data from a `SHAPEFILE`, first ingest it into a `GEOMETRY` column, and then cast the objects to `GEOGRAPHY` objects. 
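The shapefile workflow described above can be sketched as follows. The table name, column list, Amazon S3 path, and IAM role below are placeholders, not values from this guide:

```sql
-- Placeholder table; the shapefile's geometry maps to the GEOMETRY column.
CREATE TABLE roads (geom GEOMETRY, road_name VARCHAR(100));

-- Placeholder S3 path and IAM role ARN.
COPY roads
FROM 's3://amzn-s3-demo-bucket/roads.shp'
FORMAT SHAPEFILE
SIMPLIFY AUTO
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole';

-- Cast to GEOGRAPHY after ingestion. The cast copies coordinates without
-- transformation, so the data must already use a geographic SRID such as 4326.
SELECT geom::GEOGRAPHY FROM roads;
```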

The following nonspatial functions can accept an input of type `GEOMETRY` or `GEOGRAPHY`, or columns of type `GEOMETRY` or `GEOGRAPHY`:
+ The aggregate function COUNT
+ The conditional expressions COALESCE and NVL
+ CASE expressions