How can I avoid double counts from overlapping areas in PostGIS?

I want to compute the impact of events in a town using PostGIS. I have a table with the point locations of the events (event_count_2019_geo) and a table with all buildings of the town (utrecht_2020), also as point locations. For each event I count the inhabited houses within a range of slightly more than 200 meters. See the code below.
-- In a range of ~200 meters
UPDATE event_count_2019_geo
SET gw200 = temp.aantal_woningen
FROM (SELECT locatie, count(event_count_2019_geo.locatie) AS aantal_woningen
FROM event_count_2019_geo
INNER JOIN utrecht_2020 AS bag ON (ST_DWithin(bag.geo_lokatie, event_count_2019_geo.geo_lokatie, 0.002))
WHERE bag.verblijfsobjectgebruiksdoel LIKE '%woonfunctie%'
GROUP BY locatie
) AS temp
WHERE event_count_2019_geo.locatie = temp.locatie;
The trouble is that I end up with far too many houses being impacted by the events. I made a drawing of the 200 m range around each event (see picture below). The overlapping areas are counted twice, three times or even four times. The houses are counted correctly for each event, but I cannot sum the results. Is there a way to correct for these overlaps so that I arrive at a correct total number of houses over all selected events?
Edit: Example
Just a very simple example: a query for event 1 yields the houses A, B, D; a query for event 2 yields C, D, E. The count for each event is 3 and their sum is 6 (which is indeed correct behavior), but what I would like to see is 5, because D is counted twice.
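The example above can be sketched outside the database as a set union. This is only an illustration in Python with the house labels from the example; the dictionary and variable names are made up for the sketch and are not part of the PostGIS schema.

```python
# Hypothetical data taken from the example: houses within range of each event.
houses_by_event = {
    "event 1": {"A", "B", "D"},
    "event 2": {"C", "D", "E"},
}

# Summing the per-event counts counts house D once for every event it is near.
sum_of_counts = sum(len(houses) for houses in houses_by_event.values())

# The union of the sets contains every house exactly once.
unique_houses = set().union(*houses_by_event.values())

print(sum_of_counts)       # 6
print(len(unique_houses))  # 5
```

The per-event counts stay valid on their own; only the grand total has to be computed over distinct houses rather than as a sum of the per-event counts.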

Thanks to the suggestion of @JimJones I found the solution. I defined two views: one that finds all houses the old way (find_houses_all) and one that returns only unique houses (find_houses_unique).
-- Find all houses within a radius of ~200m of an event
DROP VIEW IF EXISTS find_houses_all;
CREATE VIEW find_houses_all AS
SELECT bag.openbareruimte, bag.huisnummer, bag.huisletter, bag.huisnummertoevoeging,
event_count_2019_geo.locatie
FROM event_count_2019_geo
INNER JOIN utrecht_2020 AS bag ON (ST_DWithin(bag.geo_lokatie, event_count_2019_geo.geo_lokatie, 0.002));
-- Find all *unique* houses within a radius of ~200m of an event
-- Each house is uniquely identified by openbareruimte, huisnummer, huisletter
-- and huisnummertoevoeging, so these are the columns to apply DISTINCT ON
DROP VIEW IF EXISTS find_houses_unique;
CREATE VIEW find_houses_unique AS
SELECT DISTINCT ON(bag.openbareruimte, bag.huisnummer, bag.huisletter, bag.huisnummertoevoeging)
bag.openbareruimte, bag.huisnummer, bag.huisletter, bag.huisnummertoevoeging,
event_count_2019_geo.locatie
FROM event_count_2019_geo
INNER JOIN utrecht_2020 AS bag ON (ST_DWithin(bag.geo_lokatie, event_count_2019_geo.geo_lokatie, 0.002));
I ran both scripts and indeed got the output I expected.
SELECT locatie, COUNT (locatie)
FROM find_houses_all -- find_houses_unique
GROUP BY locatie
ORDER BY locatie;
The output for find_houses_all is in all cases greater than or equal to the output for find_houses_unique. Sample output, pasted into a spreadsheet with the difference computed, looks as follows:
Locatie            All      Unique   All - Unique
achter st.-ptr.    617      222      395
berlijnplein       87       87       0
boothstraat        653      175      478
breedstraat        1057     564      493
buurkerkhof        914      163      751
catharijnesngl.    134      38       96
domplein           842      149      693
...
Total              35399    13196    22203
Negative numbers would have indicated an error.

Related

Filter values where combination of two columns in one table matches another table

I'm working in Google Sheets and trying to create a FILTER function that returns only the results from a second table where a pair of values exists in the first table. Here's a simplified example:
SpellsInitial (Table 1)
Level  Name
1      Heal
2      Flaming Sphere
3      Fireball
SpellsHeightened (Table 2)
Level  Name
1      Heal
2      Flaming Sphere
2      Heal
3      Fireball
3      Flaming Sphere
3      Heal
And I want to filter SpellsHeightened to return only the results that are in SpellsInitial—essentially "(Level=Level)*(Name=Name)=1".
I have a FILTER function taking a level value as input to print a list of names, but I can't seem to get the ArrayFormula part to work.
=TRANSPOSE(FILTER(SpellsHeightened_Name, A30=SpellsHeightened_Level, (SpellsHeightened_Name=SpellsInitial_Name)*(A30=SpellsInitial_Level)))
I know what I actually need on the last line is "the value on a given line in SpellsHeightened_Name", because otherwise it's the whole array, but I'm struggling to identify and pass in that value using only a level value as input. I tried nesting one FILTER (to get the list of names from Heightened) inside a second FILTER (to match the names up with Initial) but couldn't get that figured out either.
Here's the actual thing in practice.
Perhaps:
=FILTER( SpellsHeightened,
ISNUMBER( MATCH( SpellsHeightened_Level&SpellsHeightened_Name,
SpellsInitial_Level&SpellsInitial_Name, 0 ) ) )
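The idea behind the formula, keeping only the rows of the second table whose (Level, Name) pair also occurs in the first table, can be sketched in Python as follows. This is an illustration of the matching logic only, using the sample rows from the question; the variable names are invented for the sketch.

```python
# Sample rows from the question, as (level, name) pairs.
spells_initial = [(1, "Heal"), (2, "Flaming Sphere"), (3, "Fireball")]
spells_heightened = [
    (1, "Heal"),
    (2, "Flaming Sphere"),
    (2, "Heal"),
    (3, "Fireball"),
    (3, "Flaming Sphere"),
    (3, "Heal"),
]

# Comparing whole pairs avoids any ambiguity that plain string
# concatenation could introduce (e.g. 1 & "1x" versus 11 & "x").
initial_pairs = set(spells_initial)

# Keep only the heightened rows whose pair exists in the initial table.
matches = [row for row in spells_heightened if row in initial_pairs]

print(matches)  # [(1, 'Heal'), (2, 'Flaming Sphere'), (3, 'Fireball')]
```

In the spreadsheet formula, placing a delimiter between the two concatenated columns on both sides of the MATCH would serve the same purpose as the tuples here.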

SPSS - Filter columns based on specific criteria

I have a dataset (see below) where I want to filter out any observations where there is a 1 only in the McDonalds column, such as for ID #3 (I do not want McDonalds in my analyses). I want to keep any observations where there is a 1 in other columns, even though there is also a 1 in the McDonalds column, such as for IDs #1-2. I have tried using the Select Cases option with McDonalds=0, but this also filters out observations that have 1s in the other columns. Below is a sample of my dataset. I actually have many more columns and was trying to avoid having to individually name every other column in the "Select Cases" option in SPSS. Would anyone be able to help me please? Thanks.
Data:
To avoid naming each of the other columns separately you can use the TO keyword in the syntax. Also, you basically want to keep rows that have a 1 in any of the other columns regardless of the value in the McDonalds column, so there is no need to mention that column in the syntax.
So say for example that your column names are McDonalds, RedBull, var3, var4, var5, TacoBell, you could use either of these following options:
select if any(1, RedBull to TacoBell).
or this :
select if sum(RedBull to TacoBell)>=1.
Note: using the to convention requires that the relevant variables be contiguous in the data.
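The selection rule itself (keep a row when any column other than McDonalds holds a 1) can be sketched in Python. The rows below are hypothetical, shaped after the description of IDs 1 to 3 in the question, since the original data sample is not reproduced here.

```python
# Hypothetical rows modeled on the question: IDs 1-2 have a 1 in another
# column, ID 3 has a 1 only in the McDonalds column.
rows = [
    {"id": 1, "McDonalds": 1, "RedBull": 1, "TacoBell": 0},
    {"id": 2, "McDonalds": 1, "RedBull": 0, "TacoBell": 1},
    {"id": 3, "McDonalds": 1, "RedBull": 0, "TacoBell": 0},
]

# Every column except the ID and McDonalds takes part in the test.
other_columns = [c for c in rows[0] if c not in ("id", "McDonalds")]

# Keep a row when at least one of the other columns equals 1.
kept = [row for row in rows if any(row[c] == 1 for c in other_columns)]
kept_ids = [row["id"] for row in kept]

print(kept_ids)  # [1, 2]
```

With 0/1 data, "any column equals 1" and "the sum of the columns is at least 1" select the same rows.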
You just need to add the "OR" operator (which is the vertical bar: |) between all the mentioned conditions.
So basically, you want to keep the cases when McDonalds = 0 | RedBull = 1 | TacoBell = 1.
You can either copy the above line into the Select Cases -> If option, or write the following lines into an SPSS syntax file, replacing DataSet1 with the name of your dataset:
DATASET ACTIVATE DataSet1.
USE ALL.
COMPUTE filter_$=(McDonalds = 0 | RedBull = 1 | TacoBell = 1).
VARIABLE LABELS filter_$ 'McDonalds = 0 | RedBull = 1 | TacoBell = 1 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.

Find nodes with 3+ occurrences in a 10 minute period

I have a list of nodes with a startTime property. I need to determine if the list contains a clump of 3 or more nodes with a startTime within 10 minutes of each other. I don't need to get the nodes that are in the clump, I just need a boolean indicating the existence of such a clump.
I am at a loss; everything I have tried fails so badly that it is not worth posting.
I feel that I am missing something easy.
This should be doable.
First you'll need to match the nodes, order them by startTime, and collect the start times into a list.
From there, you'll need to get the relevant pairings (each entry, and the entry 2 indices ahead for the end of the duration) that will comprise a group of 3, then see if the start times of that pair occur within 10 minutes of each other.
Assuming for the sake of example :Event nodes with a startTime property, you might use this query to get the results you want:
MATCH (e:Event)
WITH e
ORDER BY e.startTime ASC
WITH collect(e.startTime) as times
WITH times, range(0, size(times) - 3) as indices
RETURN any(index in indices WHERE times[index + 2] <= times[index] + duration({minutes:10}))
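The same sliding comparison (each sorted entry against the entry two positions ahead) can be sketched in Python to make the logic easier to check. The function name has_clump and its parameters are made up for this illustration and are not part of Cypher or Neo4j.

```python
from datetime import datetime, timedelta


def has_clump(start_times, size=3, window=timedelta(minutes=10)):
    """Return True if `size` or more start times fall within `window`."""
    times = sorted(start_times)
    # Compare each entry with the one (size - 1) positions ahead; if that
    # pair is within the window, everything between them is as well.
    return any(
        times[i + size - 1] - times[i] <= window
        for i in range(len(times) - size + 1)
    )


base = datetime(2020, 1, 1, 12, 0)
spread_out = [base + timedelta(minutes=m) for m in (0, 20, 40, 60)]
clumped = [base + timedelta(minutes=m) for m in (0, 30, 34, 39)]

print(has_clump(spread_out))  # False
print(has_clump(clumped))     # True
```

Because the list is sorted, only the first and last member of each candidate group of three need to be compared, which keeps the check linear after sorting.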

LOAD CSV in Neo4j does not create all the relationships

Hello everyone, please help me with this problem :D
when I execute my query:
USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM "file:///Create_all.csv" AS row
MATCH(x:Category{uuid:row.uuid_category})
MERGE (t:Subscriber{name:row.name_subscriber, uuid:row.uuid_subscriber})
CREATE (n:Product{name: row.name_product, uuid: row.uuid_product}),
(Price:AttributeValue{name:'Price', value: row.price_product}),
(Stock:AttributeValue{name:'Stock', value: row.stock_product }),
(Style:AttributeValue{name:'Style', value: 'Pop Art'}),
(Subject:AttributeValue{name:'Subject', value: 'Portrait'}),
(Originality:AttributeValue{name:'Originality', value: 'Reproduction'}),
(Region:AttributeValue{name:'Region', value: 'Japan'}),
(Price)-[:IS_ATTRIBUTEVALUE_OF]->(n),
(Stock)-[:IS_ATTRIBUTEVALUE_OF]->(n),
(Style)-[:IS_ATTRIBUTEVALUE_OF]->(n),
(Subject)-[:IS_ATTRIBUTEVALUE_OF]->(n),
(Originality)-[:IS_ATTRIBUTEVALUE_OF]->(n),
(Region)-[:IS_ATTRIBUTEVALUE_OF]->(n)
WITH (n),(t),(x)
create (n)-[:OF_CATEGORY]->(x)
create (t)-[:SELLS]->(n)
The format of my csv is as follows:
I have 4 categories, 30 products and 10 subscribers. The query reports:
Added 164 labels, created 164 nodes, set 328 properties, created 184
relationships, completed after 254 ms.
I verify the result with:
MATCH p=()-[r:OF_CATEGORY]->() RETURN count(r)
Only 23 relationships were created; the remaining 7 were not.
Please advise how the query should be written so that all relationships are created. In this case there should be 30 relationships between products and categories.
The critical part is MATCH(x:Category{uuid:row.uuid_category})
If that match fails for a row, the row will be wiped out and none of the other operations for that row will execute.
Since your input consists of the same 4 categories (let's call them 1, 2, 3 and 4) repeating 7 times (28 rows so far), followed by two of those occurring one more time each (for your total of 30 rows), it would make sense if some of your matches are failing because :Category nodes with some of those uuid_category values are not actually present in the graph.
Of those uuids (1,2,3, and 4), only 1 and 2 occur at the end (so occurring across 8 rows for these two, as opposed to 7 times for uuids 3 and 4). It would make sense if either uuid 3 or 4 doesn't have a corresponding node in the graph. That would get us 1 * 7 + 2 * 8 = 23, which is the number of relationships that your query is creating.
So there is no :Category node for the uuid_category ending with either 3 or 4.
Check your graph against your data to confirm.
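The arithmetic above can be checked with a short Python sketch. The category column is a hypothetical reconstruction from the description (uuids labeled 1 to 4 repeated 7 times, then 1 and 2 once more), not the actual CSV.

```python
from collections import Counter

# Hypothetical reconstruction of the uuid_category column for the 30 rows.
category_column = ["1", "2", "3", "4"] * 7 + ["1", "2"]
rows_per_category = Counter(category_column)


def relationships_created(existing_categories):
    """Rows survive the MATCH only if their category node exists."""
    return sum(
        count
        for uuid, count in rows_per_category.items()
        if uuid in existing_categories
    )


print(relationships_created({"1", "2", "3", "4"}))  # 30
print(relationships_created({"1", "2", "3"}))       # 23
print(relationships_created({"1", "2", "4"}))       # 23
print(relationships_created({"2", "3", "4"}))       # 22
```

Only a missing category 3 or 4 reproduces the observed 23; a missing category 1 or 2 would leave 22 relationships.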

How to apply content-based filtering in Neo4j

I have data in the format below, where the first column represents the product nodes and all following columns represent properties of the products. I want to apply a content-based filtering algorithm using cosine similarity in Neo4j. For that, I believe, I need to define the fx columns as properties of each product node, read these properties as a vector, and then compute the cosine similarity between the products. I am having trouble doing two things:
1. How to define these columns as properties in one go (as there could be more than 100 columns).
2. How to read all the property values as a vector to be able to apply cosine similarity.
Product f1 f2 f3 f4 f5
P1 0 1 0 1 1
P2 1 0 1 1 0
P3 1 1 1 1 1
P4 0 0 0 1 0
You can use LOAD CSV to input your data.
For example, this query will read in your data file and output for each input line (ignoring the header line) a name string and a props collection:
LOAD CSV FROM 'file:///data.csv' AS line FIELDTERMINATOR ' '
WITH line SKIP 1
RETURN HEAD(line) AS name, [p IN TAIL(line) | TOFLOAT(p)] AS props
Even though your data has a header line, the above query skips over it, as it is not needed. In fact, we don't want to use the WITH HEADERS option of LOAD CSV, since that would convert each data line into a map, whereas it is more convenient for our current purposes to get each data line as a collection of values.
The above query assumes that all the columns are space-separated, that the first column will always contain a name string, and that all other columns contain the numeric values that should be put into the same collection (named props).
If you replace RETURN with WITH, you can append additional clauses to the query that make use of the name and props values.
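To make the goal concrete, here is a minimal sketch of cosine similarity over the props vectors from the sample data. It is plain Python run outside the database and does not use any Neo4j procedure; the dictionary and function names are chosen for illustration only.

```python
import math

# Sample data from the question: product name -> feature vector (f1..f5).
products = {
    "P1": [0, 1, 0, 1, 1],
    "P2": [1, 0, 1, 1, 0],
    "P3": [1, 1, 1, 1, 1],
    "P4": [0, 0, 0, 1, 0],
}


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Convention chosen here for all-zero vectors.
    return dot / (norm_a * norm_b)


# Similarity for every unordered pair of products.
names = sorted(products)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        score = cosine_similarity(products[first], products[second])
        print(first, second, round(score, 3))
```

For example, P1 and P2 share only f4, so their similarity is 1 / (sqrt(3) * sqrt(3)) = 1/3.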
