Wednesday, November 24, 2010

Converting shapefiles (SHP) to a different projection

I've written a small perl script to handle converting the coordinates in shapefiles to a different coordinate set. You must specify the source EPSG projection and the destination EPSG projection, and the script will walk through the data and create a new shapefile with the coordinates transformed by using the new projection. The new files are saved next to the original files, with the projection number appended to the file names.

Currently the script supports POINT data and LINESTRING data. All other geometries are ignored.

In order to use this script you will need to have the following packages installed (example for ubuntu): shapelib, proj-bin, gdal-bin

Sample usage:
adrianp@frost:~/bin$ ./ epsg:4326 epsg:31700 ofm_fiber.shp
Going to save data as ofm_fiber_31700.[shp/shx/dbf]
Depending on your data size, this may take a while...
Copying dbf file...

use strict;
use warnings;

# Author: Adrian Popa
# License: GPLv2/3

# This script reprojects a shape file from a source projection to a destination projection by
# reprojecting each coordinate in the file. The destination file will be named like the source
# file with a suffix containing the projection code.

# This script requires the following binaries to be available (adjust the path to fit your system):
# On ubuntu you can get these binaries by running sudo apt-get install shapelib proj-bin gdal-bin
my $shpcreate = '/usr/bin/shpcreate';
my $shpadd = '/usr/bin/shpadd';
my $cs2cs = '/usr/bin/cs2cs';
my $ogrinfo = '/usr/bin/ogrinfo';

if(scalar (@ARGV) != 3){
    print "Incorrect number of arguments\n";

my $epsg_in = $ARGV[0];
my $epsg_out = $ARGV[1];
my $in_file = $ARGV[2];

#validate parameters

if($epsg_in !~/^epsg:[0-9]+$/i){
    print "epsg_in is invalid\n";

my $epsg_out_code = "0";
    print "epsg_out is invalid\n";
    $epsg_out_code = $1;

if(! -f $in_file){
    print "$in_file is not a file\n";

#determine shp file type

my @output = `$ogrinfo -so "$in_file" 2>&1`;
my $geometryType = undef;
foreach my $line (@output){
#    print "$line";
    if($line=~/\s\(Line String\)/){
        $geometryType = "linestring";
        $geometryType = "point";
    if($line=~/Unable to open datasource/){
        die "$in_file is not supported: $line\n";

die "Unsupported geometry for $in_file. The only supported geometries are Line string and Point\n" if(!defined $geometryType);

#prepare destination file
my $file_base = $in_file;
$file_base=~s/\.shp$//i; #cut out the extension (if any)
my $out_file = $file_base;
$out_file.="_${epsg_out_code}"; #append the destination projection code

print "Going to save data as ${out_file}.[shp/shx/dbf]\n";
if($geometryType eq 'linestring'){
    print `$shpcreate "$out_file" arc 2>&1`;
if($geometryType eq 'point'){
    print `$shpcreate "$out_file" point 2>&1`;

print "Depending on your data size, this may take a while...\n";
@output = `$ogrinfo -al "$in_file" 2>&1`;
foreach my $line (@output){
    if($geometryType eq 'linestring'){
        if($line=~/^\s+LINESTRING \((.*)\)$/){
            my $linestring = $1;
#            print $linestring;
            my $projected = reproject($linestring);
            #add the line to the file
            print `$shpadd "$out_file" $projected 2>&1`;
    if($geometryType eq 'point'){
        if($line=~/^\s+POINT \((.*)\)$/){
            my $point = $1;
#            print $linestring;
            my $projected = reproject($point);
            #add the line to the file
            print `$shpadd "$out_file" $projected 2>&1`;

#we're almost done. Copy the dbf file unchanged (the order of the records is the same)
print "Copying dbf file...\n";
print `cp "$file_base.dbf" "$out_file.dbf" 2>&1`;

#take a line of coordinates and convert them. Return a string with the converted coordinates
sub reproject {
    my $string = shift;
    my $converted = "";
    my $coordinates = "";
    my @pairs = $string=~/([-0-9\.]+ [-0-9\.]+),?/g;
    foreach my $pair (@pairs){
#        print "DBG: $pair\n";
        my ($x, $y) = $pair=~/([-0-9\.]+) ([-0-9\.]+)/;
        $coordinates.= "$x $y\n";
#    print "DBG: coordinates: $coordinates\n";
    #convert $coordinates (expensive, but we won't run into shell problems by converting everything at once)
    my $cmd = "echo '$coordinates' | $cs2cs +init=$epsg_in +to +init=$epsg_out -f '%0.6f' 2>&1";
#    print "DBG: $cmd";
    my @output = `$cmd`;
    foreach my $line (@output){
#        print $line;
            next if ($line eq $output[-1]); #always skip the last line. It's not relevant to what we want
            my $x = $1;
            my $y = $2;
            $converted.=" $x $y";
#    print "DBG: converted:$converted\n";
    return $converted;

sub usage {
    print "Usage: $0 epsg_in epsg_out file.shp

epsg_in is the source EPSG projection code (e.g. epsg:31700)
epsg_out is the destination EPSG projection code (e.g. epsg:4326)
file.shp is the source shape file

Example: $0 epsg:31700 epsg:4326 streets.shp