postgis_tiger geocoding extension : France & temporal geocoding

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

postgis_tiger geocoding extension : France & temporal geocoding

Rémi Cura
Hey dear list,
I'm currently considering extending postgis geocoder.

I have two needs:
 - to work with French data
 - to perform temporal geocoding (you give an addresses and a date, you get geographic position).

For the moment I'm trying to understand how the geocoder works with USA data.

I imported Massachusetts for a test, and everything seems okay except that my tiger.state table is empty, which prevents geocoding.

Notwithstanding the import step, and the adress normaliser, it seems to boil down to a large query in the function tiger.geocode_address().


I would be delighted to have some pointers on the design of the postgis_tiger_geocoder extension.
In perticular, I don't understand the adress standardizer business at all (yeah, I found some doc, no, I don't understand it well)


Many thanks for help,
Cheers,
Rémi-C






-------------------------------------
WITH a AS
(
         SELECT   *
         FROM     (
                             SELECT     f.*,
                                        ad.side,
                                        ad.zip,
                                        ad.fromhn,
                                        ad.tohn,
                                        Rank() over(ORDER BY Diff_zip(ad.zip,'01104')*2.00 +
                                        CASE
                                                   WHEN Lower(f.name) = Lower('FRANKLIN') THEN 0
                                                   ELSE Levenshtein_ignore_case(f.name, Lower('FRANKLIN') )
                                        END + Levenshtein_ignore_case(f.fullname, Lower('FRANKLIN'
                                                   || ' '
                                                   || Coalesce('ST','')) ) +
                                        CASE
                                                   WHEN (
                                                                         Greatest_hn(ad.fromhn,ad.tohn) % 2)::INTEGER = ('1' % 2)::INTEGER THEN 0
                                                   ELSE 1
                                        END +
                                        CASE
                                                   WHEN '1'::INTEGER BETWEEN Least_hn(ad.fromhn,ad.tohn) AND        Greatest_hn(ad.fromhn, ad.tohn) THEN 0
                                                   ELSE 4
                                        END +
                                        CASE
                                                   WHEN Lower('ST') = Lower(f.suftypabrv)
                                                   OR         Lower('ST') = Lower(f.pretypabrv) THEN 0
                                                   ELSE 1
                                        END + Rate_attributes(NULL, f.predirabrv, 'FRANKLIN', f.name , 'ST', suftypabrv , NULL, sufdirabrv, prequalabr) ) AS rank
                             FROM       tiger.featnames                                                                                                   AS f
                             inner join tiger.addr                                                                                                        AS ad
                             ON         (
                                                   f.tlid = ad.tlid)
                             WHERE      '25' = f.statefp
                             AND        '25' = ad.statefp
                             AND        (
                                                   Lower(f.fullname) LIKE (Coalesce(NULL
                                                              || ' ','')
                                                              || Lower('FRANKLIN')
                                                              || '%')::text
                                        OR         Lower(f.name) = Lower('FRANKLIN')
                                        OR         Soundex(f.name) = Soundex('FRANKLIN') )
                             AND        (
                                                   ad.zip = ANY('{01102,01103,01104,01105,01106}'::VARCHAR[]) ) ) AS foo
         ORDER BY rank limit 30 )

, b AS (
                  SELECT     a.tlid,
                     a.fullname,
                     a.name,
                     a.predirabrv,
                     a.suftypabrv,
                     a.sufdirabrv,
                     a.prequalabr,
                     a.pretypabrv,
                     b.the_geom,
                     tfidr,
                     tfidl,
                     a.side ,
                     a.fromhn,
                     a.tohn,
                     a.zip,
                     p.name AS place
                  FROM       a
                  inner join tiger.edges AS b
                  ON         (
                            a.statefp = b.statefp
                     AND        a.tlid = b.tlid )
                  inner join tiger.faces AS f
                  ON         (
                            '25' = f.statefp
                     AND        ( (
                                      b.tfidl = f.tfid
                                   AND        a.side = 'L')
                            OR         (
                                      b.tfidr = f.tfid
                                   AND        a.side = 'R' ) ))
                  inner join tiger.place p
                  ON         (
                            '25' = p.statefp
                     AND        f.placefp = p.placefp )
                  WHERE      a.statefp = '25'
                  AND        b.statefp = '25' )

, sub AS  (
    SELECT   tlid,
           predirabrv,
           coalesce(b.prequalabr
                || ' ','' )
                || b.name AS fename,
           suftypabrv,
           sufdirabrv,
           fromhn,
           tohn,
           side,
           zip,
           rate_attributes(NULL, predirabrv, 'FRANKLIN', b.name , 'ST', suftypabrv , NULL, sufdirabrv, prequalabr) +
           CASE
                WHEN '1'::INTEGER IS NULL
                OR       b.fromhn IS NULL THEN 20
                WHEN '1'::    INTEGER >= least_hn(b.fromhn, b.tohn)
                AND      '1'::INTEGER <= greatest_hn(b.fromhn,b.tohn)
                AND      (
                          '1'::INTEGER % 2) = (to_number(b.fromhn,'99999999') % 2)::INTEGER THEN 0
                WHEN '1'::    INTEGER >= least_hn(b.fromhn,b.tohn)
                AND      '1'::INTEGER <= greatest_hn(b.fromhn,b.tohn) THEN 2
                ELSE ((1.0 - (least_hn('1'::text,least_hn(b.fromhn,b.tohn)::text)::NUMERIC / (greatest(1,greatest_hn('1'::text,greatest_hn(b.fromhn,b.tohn)::text))) )) * 5)::INTEGER + 5
           END::INTEGER AS sub_rating,
           '1'::INTEGER >= least_hn(b.fromhn,b.tohn)
      AND      '1'::INTEGER <= greatest_hn(b.fromhn,b.tohn)
      AND      (
                '1' % 2)::NUMERIC::INTEGER = (to_number(b.fromhn,'99999999') % 2) AS exact_address,
           b.name,
           b.prequalabr,
           b.pretypabrv,
           b.tfidr,
           b.tfidl,
           b.the_geom,
           b.place
      FROM     b
      ORDER BY 10 ,
           11 DESC limit 20 )



--, foo AS (

    SELECT DISTINCT
         ON (
                         sub.predirabrv,sub.fename,coalesce(sub.suftypabrv, sub.pretypabrv) ,sub.sufdirabrv,sub.place,s.stusps,sub.zip) sub.predirabrv AS fedirp,
                 sub.fename,
                 coalesce(sub.suftypabrv, sub.pretypabrv) AS fetype,
                 sub.sufdirabrv                           AS fedirs,
                 sub.place ,
                 s.stusps                                                                                                                                             AS state,
                 sub.zip                                                                                                                                              AS zip,
                 interpolate_from_address('1', sub.fromhn, sub.tohn, sub.the_geom, sub.side)                                                                          AS address_geom,
                 (sub.sub_rating + least(coalesce(diff_zip('01104' , sub.zip),0), 20)*2.00 + coalesce(levenshtein_ignore_case('SPRINGFIELD', sub.place),5) )::INTEGER AS sub_rating,
                 sub.exact_address                                                                                                                                    AS exact_address,
                 sub.tohn,
                 sub.fromhn
         FROM           sub 
         join            tiger.state s
         ON              (
                         '25' = s.statefp)
         ORDER BY        1,
                 2,
                 3,
                 4,
                 5,
                 6,
                 7,
                 9 limit 20)
                                        
SELECT   *
FROM foo          
ORDER BY sub_rating,
         exact_address DESC limit 100
-------------------------------------

_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|

Re: postgis_tiger geocoding extension : France & temporal geocoding

Stephen Woodbridge
Hi Remi,

I have started a project to create an address standardizer that can be
internationalized with the longer term goal of redesigning the postgis
geocoder to work with data from any data sources.

https://github.com/woodbri/address-standardizer

The design for a generic geocoder works like this:

* load street data into a table with geometry
* standardize the street data into a standardized table
* given a query address
   * standardize it
   * generate a query to search for candidate records in the standized table
   * score/filter the candidate records
   * fetch the geometry and compute the location

The address-standardizer needs some performance tuning as it has
problems when the lexicon gets large and/or when the grammar gets too
complex.

I'm still fairly new at C++, and would be happy to get pull requests or
a code review.

I have a sample lexicon and grammar for France and it should be straight
forward to add a dimension like date to the scheme above and filter
candidate records based on dates in the scoring process.

I would be happy to work with you on this and discuss how to best solve
these problems.

Thanks,
   -Steve

On 6/16/2016 6:10 AM, Rémi Cura wrote:

> Hey dear list,
> I'm currently considering extending postgis geocoder.
>
> I have two needs:
>  - to work with French data
>  - to perform temporal geocoding (you give an addresses *and *a date,
> you get geographic position).
>
> For the moment I'm trying to understand how the geocoder works with USA
> data.
>
> I imported Massachusetts for a test, and everything seems okay except
> that my tiger.state table is empty, which prevents geocoding.
>
> Notwithstanding the import step, and the adress normaliser, it seems to
> boil down to a large query in the function tiger.geocode_address().
>
>
> I would be delighted to have some pointers on the design of the
> postgis_tiger_geocoder extension.
> In perticular, I don't understand the adress standardizer business at
> all (yeah, I found some doc, no, I don't understand it well)
>
>
> Many thanks for help,
> Cheers,
> Rémi-C
>
>
>
>
>
>
> -------------------------------------
> WITH a AS
> (
>          SELECT   *
>          FROM     (
>                              SELECT     f.*,
>                                         ad.side,
>                                         ad.zip,
>                                         ad.fromhn,
>                                         ad.tohn,
>                                         Rank() over(ORDER BY
> Diff_zip(ad.zip,'01104')*2.00 +
>                                         CASE
>                                                    WHEN Lower(f.name
> <http://f.name>) = Lower('FRANKLIN') THEN 0
>                                                    ELSE
> Levenshtein_ignore_case(f.name <http://f.name>, Lower('FRANKLIN') )
>                                         END +
> Levenshtein_ignore_case(f.fullname, Lower('FRANKLIN'
>                                                    || ' '
>                                                    || Coalesce('ST','')) ) +
>                                         CASE
>                                                    WHEN (
>
> Greatest_hn(ad.fromhn,ad.tohn) % 2)::INTEGER = ('1' % 2)::INTEGER THEN 0
>                                                    ELSE 1
>                                         END +
>                                         CASE
>                                                    WHEN '1'::INTEGER
> BETWEEN Least_hn(ad.fromhn,ad.tohn) AND        Greatest_hn(ad.fromhn,
> ad.tohn) THEN 0
>                                                    ELSE 4
>                                         END +
>                                         CASE
>                                                    WHEN Lower('ST') =
> Lower(f.suftypabrv)
>                                                    OR
> Lower('ST') = Lower(f.pretypabrv) THEN 0
>                                                    ELSE 1
>                                         END + Rate_attributes(NULL,
> f.predirabrv, 'FRANKLIN', f.name <http://f.name> , 'ST', suftypabrv ,
> NULL, sufdirabrv, prequalabr) ) AS rank
>                              FROM
> tiger.featnames
> AS f
>                              inner join
> tiger.addr
> AS ad
>                              ON         (
>                                                    f.tlid = ad.tlid)
>                              WHERE      '25' = f.statefp
>                              AND        '25' = ad.statefp
>                              AND        (
>                                                    Lower(f.fullname)
> LIKE (Coalesce(NULL
>                                                               || ' ','')
>                                                               ||
> Lower('FRANKLIN')
>                                                               || '%')::text
>                                         OR         Lower(f.name
> <http://f.name>) = Lower('FRANKLIN')
>                                         OR         Soundex(f.name
> <http://f.name>) = Soundex('FRANKLIN') )
>                              AND        (
>                                                    ad.zip =
> ANY('{01102,01103,01104,01105,01106}'::VARCHAR[]) ) ) AS foo
>          ORDER BY rank limit 30 )
>
> , b AS (
>                   SELECT     a.tlid,
>                      a.fullname,
>                      a.name <http://a.name>,
>                      a.predirabrv,
>                      a.suftypabrv,
>                      a.sufdirabrv,
>                      a.prequalabr,
>                      a.pretypabrv,
>                      b.the_geom,
>                      tfidr,
>                      tfidl,
>                      a.side ,
>                      a.fromhn,
>                      a.tohn,
>                      a.zip,
>                      p.name <http://p.name> AS place
>                   FROM       a
>                   inner join tiger.edges AS b
>                   ON         (
>                             a.statefp = b.statefp
>                      AND        a.tlid = b.tlid )
>                   inner join tiger.faces AS f
>                   ON         (
>                             '25' = f.statefp
>                      AND        ( (
>                                       b.tfidl = f.tfid
>                                    AND        a.side = 'L')
>                             OR         (
>                                       b.tfidr = f.tfid
>                                    AND        a.side = 'R' ) ))
>                   inner join tiger.place p
>                   ON         (
>                             '25' = p.statefp
>                      AND        f.placefp = p.placefp )
>                   WHERE      a.statefp = '25'
>                   AND        b.statefp = '25' )
>
> , sub AS  (
>     SELECT   tlid,
>            predirabrv,
>            coalesce(b.prequalabr
>                 || ' ','' )
>                 || b.name <http://b.name> AS fename,
>            suftypabrv,
>            sufdirabrv,
>            fromhn,
>            tohn,
>            side,
>            zip,
>            rate_attributes(NULL, predirabrv, 'FRANKLIN', b.name
> <http://b.name> , 'ST', suftypabrv , NULL, sufdirabrv, prequalabr) +
>            CASE
>                 WHEN '1'::INTEGER IS NULL
>                 OR       b.fromhn IS NULL THEN 20
>                 WHEN '1'::    INTEGER >= least_hn(b.fromhn, b.tohn)
>                 AND      '1'::INTEGER <= greatest_hn(b.fromhn,b.tohn)
>                 AND      (
>                           '1'::INTEGER % 2) =
> (to_number(b.fromhn,'99999999') % 2)::INTEGER THEN 0
>                 WHEN '1'::    INTEGER >= least_hn(b.fromhn,b.tohn)
>                 AND      '1'::INTEGER <= greatest_hn(b.fromhn,b.tohn) THEN 2
>                 ELSE ((1.0 -
> (least_hn('1'::text,least_hn(b.fromhn,b.tohn)::text)::NUMERIC /
> (greatest(1,greatest_hn('1'::text,greatest_hn(b.fromhn,b.tohn)::text)))
> )) * 5)::INTEGER + 5
>            END::INTEGER AS sub_rating,
>            '1'::INTEGER >= least_hn(b.fromhn,b.tohn)
>       AND      '1'::INTEGER <= greatest_hn(b.fromhn,b.tohn)
>       AND      (
>                 '1' % 2)::NUMERIC::INTEGER =
> (to_number(b.fromhn,'99999999') % 2) AS exact_address,
>            b.name <http://b.name>,
>            b.prequalabr,
>            b.pretypabrv,
>            b.tfidr,
>            b.tfidl,
>            b.the_geom,
>            b.place
>       FROM     b
>       ORDER BY 10 ,
>            11 DESC limit 20 )
>
>
>
> --, foo AS (
>
>     SELECT DISTINCT
>          ON (
>
>  sub.predirabrv,sub.fename,coalesce(sub.suftypabrv, sub.pretypabrv)
> ,sub.sufdirabrv,sub.place,s.stusps,sub.zip) sub.predirabrv AS fedirp,
>                  sub.fename,
>                  coalesce(sub.suftypabrv, sub.pretypabrv) AS fetype,
>                  sub.sufdirabrv                           AS fedirs,
>                  sub.place ,
>
>  s.stusps
> AS state,
>
>  sub.zip
> AS zip,
>                  interpolate_from_address('1', sub.fromhn, sub.tohn,
> sub.the_geom,
> sub.side)
> AS address_geom,
>                  (sub.sub_rating + least(coalesce(diff_zip('01104' ,
> sub.zip),0), 20)*2.00 + coalesce(levenshtein_ignore_case('SPRINGFIELD',
> sub.place),5) )::INTEGER AS sub_rating,
>
>  sub.exact_address
> AS exact_address,
>                  sub.tohn,
>                  sub.fromhn
>          FROM           sub
>          join            tiger.state s
>          ON              (
>                          '25' = s.statefp)
>          ORDER BY        1,
>                  2,
>                  3,
>                  4,
>                  5,
>                  6,
>                  7,
>                  9 limit 20)
>
> SELECT   *
> FROM foo
> ORDER BY sub_rating,
>          exact_address DESC limit 100
> -------------------------------------
>
>
> _______________________________________________
> postgis-devel mailing list
> [hidden email]
> http://lists.osgeo.org/mailman/listinfo/postgis-devel
>


---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus

_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|

Re: postgis_tiger geocoding extension : France & temporal geocoding

Stephen Woodbridge
Also, I forgot to mention that Reginia worked on windows builds for this
and I maintain Windows development builds here:

http://imaptools.com:8080/dl/address-standardizer2/

-Steve

On 6/16/2016 9:15 AM, Stephen Woodbridge wrote:

> Hi Remi,
>
> I have started a project to create an address standardizer that can be
> internationalized with the longer term goal of redesigning the postgis
> geocoder to work with data from any data sources.
>
> https://github.com/woodbri/address-standardizer
>
> The design for a generic geocoder works like this:
>
> * load street data into a table with geometry
> * standardize the street data into a standardized table
> * given a query address
>   * standardize it
>   * generate a query to search for candidate records in the standized table
>   * score/filter the candidate records
>   * fetch the geometry and compute the location
>
> The address-standardizer needs some performance tuning as it has
> problems when the lexicon gets large and/or when the grammar gets too
> complex.
>
> I'm still fairly new at C++, and would be happy to get pull requests or
> a code review.
>
> I have a sample lexicon and grammar for France and it should be straight
> forward to add a dimension like date to the scheme above and filter
> candidate records based on dates in the scoring process.
>
> I would be happy to work with you on this and discuss how to best solve
> these problems.
>
> Thanks,
>   -Steve
>
> On 6/16/2016 6:10 AM, Rémi Cura wrote:
>> Hey dear list,
>> I'm currently considering extending postgis geocoder.
>>
>> I have two needs:
>>  - to work with French data
>>  - to perform temporal geocoding (you give an addresses *and *a date,
>> you get geographic position).
>>
>> For the moment I'm trying to understand how the geocoder works with USA
>> data.
>>
>> I imported Massachusetts for a test, and everything seems okay except
>> that my tiger.state table is empty, which prevents geocoding.
>>
>> Notwithstanding the import step, and the adress normaliser, it seems to
>> boil down to a large query in the function tiger.geocode_address().
>>
>>
>> I would be delighted to have some pointers on the design of the
>> postgis_tiger_geocoder extension.
>> In perticular, I don't understand the adress standardizer business at
>> all (yeah, I found some doc, no, I don't understand it well)
>>
>>
>> Many thanks for help,
>> Cheers,
>> Rémi-C
>>
>>
>>
>>
>>
>>
>> -------------------------------------
>> WITH a AS
>> (
>>          SELECT   *
>>          FROM     (
>>                              SELECT     f.*,
>>                                         ad.side,
>>                                         ad.zip,
>>                                         ad.fromhn,
>>                                         ad.tohn,
>>                                         Rank() over(ORDER BY
>> Diff_zip(ad.zip,'01104')*2.00 +
>>                                         CASE
>>                                                    WHEN Lower(f.name
>> <http://f.name>) = Lower('FRANKLIN') THEN 0
>>                                                    ELSE
>> Levenshtein_ignore_case(f.name <http://f.name>, Lower('FRANKLIN') )
>>                                         END +
>> Levenshtein_ignore_case(f.fullname, Lower('FRANKLIN'
>>                                                    || ' '
>>                                                    ||
>> Coalesce('ST','')) ) +
>>                                         CASE
>>                                                    WHEN (
>>
>> Greatest_hn(ad.fromhn,ad.tohn) % 2)::INTEGER = ('1' % 2)::INTEGER THEN 0
>>                                                    ELSE 1
>>                                         END +
>>                                         CASE
>>                                                    WHEN '1'::INTEGER
>> BETWEEN Least_hn(ad.fromhn,ad.tohn) AND        Greatest_hn(ad.fromhn,
>> ad.tohn) THEN 0
>>                                                    ELSE 4
>>                                         END +
>>                                         CASE
>>                                                    WHEN Lower('ST') =
>> Lower(f.suftypabrv)
>>                                                    OR
>> Lower('ST') = Lower(f.pretypabrv) THEN 0
>>                                                    ELSE 1
>>                                         END + Rate_attributes(NULL,
>> f.predirabrv, 'FRANKLIN', f.name <http://f.name> , 'ST', suftypabrv ,
>> NULL, sufdirabrv, prequalabr) ) AS rank
>>                              FROM
>> tiger.featnames
>> AS f
>>                              inner join
>> tiger.addr
>> AS ad
>>                              ON         (
>>                                                    f.tlid = ad.tlid)
>>                              WHERE      '25' = f.statefp
>>                              AND        '25' = ad.statefp
>>                              AND        (
>>                                                    Lower(f.fullname)
>> LIKE (Coalesce(NULL
>>                                                               || ' ','')
>>                                                               ||
>> Lower('FRANKLIN')
>>                                                               ||
>> '%')::text
>>                                         OR         Lower(f.name
>> <http://f.name>) = Lower('FRANKLIN')
>>                                         OR         Soundex(f.name
>> <http://f.name>) = Soundex('FRANKLIN') )
>>                              AND        (
>>                                                    ad.zip =
>> ANY('{01102,01103,01104,01105,01106}'::VARCHAR[]) ) ) AS foo
>>          ORDER BY rank limit 30 )
>>
>> , b AS (
>>                   SELECT     a.tlid,
>>                      a.fullname,
>>                      a.name <http://a.name>,
>>                      a.predirabrv,
>>                      a.suftypabrv,
>>                      a.sufdirabrv,
>>                      a.prequalabr,
>>                      a.pretypabrv,
>>                      b.the_geom,
>>                      tfidr,
>>                      tfidl,
>>                      a.side ,
>>                      a.fromhn,
>>                      a.tohn,
>>                      a.zip,
>>                      p.name <http://p.name> AS place
>>                   FROM       a
>>                   inner join tiger.edges AS b
>>                   ON         (
>>                             a.statefp = b.statefp
>>                      AND        a.tlid = b.tlid )
>>                   inner join tiger.faces AS f
>>                   ON         (
>>                             '25' = f.statefp
>>                      AND        ( (
>>                                       b.tfidl = f.tfid
>>                                    AND        a.side = 'L')
>>                             OR         (
>>                                       b.tfidr = f.tfid
>>                                    AND        a.side = 'R' ) ))
>>                   inner join tiger.place p
>>                   ON         (
>>                             '25' = p.statefp
>>                      AND        f.placefp = p.placefp )
>>                   WHERE      a.statefp = '25'
>>                   AND        b.statefp = '25' )
>>
>> , sub AS  (
>>     SELECT   tlid,
>>            predirabrv,
>>            coalesce(b.prequalabr
>>                 || ' ','' )
>>                 || b.name <http://b.name> AS fename,
>>            suftypabrv,
>>            sufdirabrv,
>>            fromhn,
>>            tohn,
>>            side,
>>            zip,
>>            rate_attributes(NULL, predirabrv, 'FRANKLIN', b.name
>> <http://b.name> , 'ST', suftypabrv , NULL, sufdirabrv, prequalabr) +
>>            CASE
>>                 WHEN '1'::INTEGER IS NULL
>>                 OR       b.fromhn IS NULL THEN 20
>>                 WHEN '1'::    INTEGER >= least_hn(b.fromhn, b.tohn)
>>                 AND      '1'::INTEGER <= greatest_hn(b.fromhn,b.tohn)
>>                 AND      (
>>                           '1'::INTEGER % 2) =
>> (to_number(b.fromhn,'99999999') % 2)::INTEGER THEN 0
>>                 WHEN '1'::    INTEGER >= least_hn(b.fromhn,b.tohn)
>>                 AND      '1'::INTEGER <= greatest_hn(b.fromhn,b.tohn)
>> THEN 2
>>                 ELSE ((1.0 -
>> (least_hn('1'::text,least_hn(b.fromhn,b.tohn)::text)::NUMERIC /
>> (greatest(1,greatest_hn('1'::text,greatest_hn(b.fromhn,b.tohn)::text)))
>> )) * 5)::INTEGER + 5
>>            END::INTEGER AS sub_rating,
>>            '1'::INTEGER >= least_hn(b.fromhn,b.tohn)
>>       AND      '1'::INTEGER <= greatest_hn(b.fromhn,b.tohn)
>>       AND      (
>>                 '1' % 2)::NUMERIC::INTEGER =
>> (to_number(b.fromhn,'99999999') % 2) AS exact_address,
>>            b.name <http://b.name>,
>>            b.prequalabr,
>>            b.pretypabrv,
>>            b.tfidr,
>>            b.tfidl,
>>            b.the_geom,
>>            b.place
>>       FROM     b
>>       ORDER BY 10 ,
>>            11 DESC limit 20 )
>>
>>
>>
>> --, foo AS (
>>
>>     SELECT DISTINCT
>>          ON (
>>
>>  sub.predirabrv,sub.fename,coalesce(sub.suftypabrv, sub.pretypabrv)
>> ,sub.sufdirabrv,sub.place,s.stusps,sub.zip) sub.predirabrv AS fedirp,
>>                  sub.fename,
>>                  coalesce(sub.suftypabrv, sub.pretypabrv) AS fetype,
>>                  sub.sufdirabrv                           AS fedirs,
>>                  sub.place ,
>>
>>  s.stusps
>> AS state,
>>
>>  sub.zip
>> AS zip,
>>                  interpolate_from_address('1', sub.fromhn, sub.tohn,
>> sub.the_geom,
>> sub.side)
>> AS address_geom,
>>                  (sub.sub_rating + least(coalesce(diff_zip('01104' ,
>> sub.zip),0), 20)*2.00 + coalesce(levenshtein_ignore_case('SPRINGFIELD',
>> sub.place),5) )::INTEGER AS sub_rating,
>>
>>  sub.exact_address
>> AS exact_address,
>>                  sub.tohn,
>>                  sub.fromhn
>>          FROM           sub
>>          join            tiger.state s
>>          ON              (
>>                          '25' = s.statefp)
>>          ORDER BY        1,
>>                  2,
>>                  3,
>>                  4,
>>                  5,
>>                  6,
>>                  7,
>>                  9 limit 20)
>>
>> SELECT   *
>> FROM foo
>> ORDER BY sub_rating,
>>          exact_address DESC limit 100
>> -------------------------------------
>>
>>
>> _______________________________________________
>> postgis-devel mailing list
>> [hidden email]
>> http://lists.osgeo.org/mailman/listinfo/postgis-devel
>>
>
>
> ---
> This email has been checked for viruses by Avast antivirus software.
> https://www.avast.com/antivirus
>
> _______________________________________________
> postgis-devel mailing list
> [hidden email]
> http://lists.osgeo.org/mailman/listinfo/postgis-devel


---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus

_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|

Re: postgis_tiger geocoding extension : France & temporal geocoding

Regina Obe
In reply to this post by Rémi Cura
> On Thu, Jun 16, 2016 at 6:10 AM, Rémi Cura <[hidden email]> wrote:

   >  Hey dear list,
  >  I'm currently considering extending postgis geocoder.

  >  I have two needs:
 >    - to work with French data
 >    - to perform temporal geocoding (you give an addresses and a date, you
get geographic position).

>    For the moment I'm trying to understand how the geocoder works with USA
data.

>    I imported Massachusetts for a test, and everything seems okay except
that my tiger.state table is empty, which prevents geocoding.

>    Notwithstanding the import step, and the adress normaliser, it seems to
boil down to a large query in the function tiger.geocode_address().


>    I would be delighted to have some pointers on the design of the
postgis_tiger_geocoder extension.
>    In perticular, I don't understand the address standardizer business at
all (yeah, I found some doc, no, I don't understand it well)


>    Many thanks for help,
>    Cheers,
>    Rémi-C

Just adding to what Steve has already said.

Don't try to use the postgis tiger geocoder as a model of how to set up a
geocoder.  
The structure of it is very tied to how the US Census Tiger data is setup.
The irony is because it is setup like US Census, I think people try to use
it for more than a geocoder and I get bug reports on fields I ignored
because they really aren't relevant as far as geocoding is concerned or not
relevant to me (in other moneywise ways).  I can't blame them since I do use
it myself on other projects for more than just geocoding.  E.g. census tract
work or just joining with the states, counties etc when I need a county
list.

If I had started from scratch instead of inheriting the code this is not how
I would do it.  
But without enough traction of people willing to devote time/fund  a more
generic geocoder, the more globally useful solution will not go anywhere.


Thanks,
Regina

_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|

Re: postgis_tiger geocoding extension : France & temporal geocoding

Rémi Cura
Thanks for the anwser Stephen and Regina,
it really helps to provide some broader context about a project history and purpose.

@Stephen I'll contact you in private about that.

Cheers,
Rémi-C

2016-06-17 7:30 GMT+02:00 Regina Obe <[hidden email]>:
> On Thu, Jun 16, 2016 at 6:10 AM, Rémi Cura <[hidden email]> wrote:

   >  Hey dear list,
  >  I'm currently considering extending postgis geocoder.

  >  I have two needs:
 >    - to work with French data
 >    - to perform temporal geocoding (you give an addresses and a date, you
get geographic position).

>    For the moment I'm trying to understand how the geocoder works with USA
data.

>    I imported Massachusetts for a test, and everything seems okay except
that my tiger.state table is empty, which prevents geocoding.

>    Notwithstanding the import step, and the adress normaliser, it seems to
boil down to a large query in the function tiger.geocode_address().


>    I would be delighted to have some pointers on the design of the
postgis_tiger_geocoder extension.
>    In perticular, I don't understand the address standardizer business at
all (yeah, I found some doc, no, I don't understand it well)


>    Many thanks for help,
>    Cheers,
>    Rémi-C

Just adding to what Steve has already said.

Don't try to use the postgis tiger geocoder as a model of how to set up a
geocoder.
The structure of it is very tied to how the US Census Tiger data is setup.
The irony is because it is setup like US Census, I think people try to use
it for more than a geocoder and I get bug reports on fields I ignored
because they really aren't relevant as far as geocoding is concerned or not
relevant to me (in other moneywise ways).  I can't blame them since I do use
it myself on other projects for more than just geocoding.  E.g. census tract
work or just joining with the states, counties etc when I need a county
list.

If I had started from scratch instead of inheriting the code this is not how
I would do it.
But without enough traction of people willing to devote time/fund  a more
generic geocoder, the more globally useful solution will not go anywhere.


Thanks,
Regina

_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel


_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel