Neo4j creating nodes and relationships from Bus route CSV - neo4j

I have a CSV file with bus route information that looks like this. I am having trouble creating nodes and path relationships in Neo4j with it in this format.
I would like to have nodes for the stops and routes, and routes between them using the sequence and route detail id to show the direction of the routes.
RouteName,route_detail_id,Stop,Sequence,Arrives,Departs
Bus1,50701,Cherry,1,,9:00
Bus1,50802,Market,2,9:30,10:00
Bus1,59003,Raleigh,3,10:30,10:50
Bus1,59004,Stuart,4,11:05,11:30
Bus1,58006,Possum,5,12:30,
Bus2,67003,Cherry,1,,11:00
Bus2,67004,Market,2,11:30,12:00
Bus2,67009,Raleigh,3,12:30,12:50
Bus2,67010,Stuart,4,13:05,13:30
Bus2,67011,Possum,5,14:30,
Bus3,89004,Highland,1,,9:00
Bus3,88005,McKinley,2,9:30,10:00
Bus3,67098,Jersey,3,10:30,10:50
Bus3,4500,Ridgewood,4,11:05,11:30
Bus3,67890,Osprey,5,12:30,
route_detail_id is the unique identifier for that particular stop on that particular route.
I would like to be able to use the times for shortest path queries in the future, but right now would just like to be able to create a structure and visualize in neo4j.
Eventually it will be used to create connecting routes, and shortest path searching, but right now I am just stumbling over even converting information in this format to Neo4j.

I would start by converting the format into a list of nodes connected by arcs, such as:
Cherry -- Bus1, 50701, n/a, 9:00 --> Market
Market -- Bus1, 50802, 9:30, 10:00 --> Raleigh
...
Cherry -- Bus2, 67003, n/a, 11:00 --> Market
...
This seems to me to be a more natural way of representing the data: the stops are nodes, and the bus routes are directed arcs between them that carry the route details.
You can then query the database by looking for links between the nodes. If you want to find a shortest path, you can also convert the arrival and departure times into the duration of the journey between two nodes.
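The conversion described above can be sketched in Python. This is only an illustration under assumptions: the sample rows assume the first stop of a route has an empty Arrives field, and the function names (`to_minutes`, `build_arcs`) and the arc dictionary keys are made up for this example. The resulting arcs could then be written back to a CSV for import, or turned into relationships between stop nodes.

```python
import csv
import io
from collections import defaultdict

# A few rows shaped like the question's CSV (assumed layout);
# an empty field means no time was given for that stop.
SAMPLE = """RouteName,route_detail_id,Stop,Sequence,Arrives,Departs
Bus1,50701,Cherry,1,,9:00
Bus1,50802,Market,2,9:30,10:00
Bus1,59003,Raleigh,3,10:30,10:50
"""


def to_minutes(hhmm):
    """Convert 'H:MM' text to minutes after midnight; None for empty fields."""
    if not hhmm:
        return None
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def build_arcs(csv_text):
    """Return one arc per pair of consecutive stops on each route."""
    by_route = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_route[row["RouteName"]].append(row)

    arcs = []
    for route, rows in by_route.items():
        # Sequence gives the direction of travel along the route.
        rows.sort(key=lambda r: int(r["Sequence"]))
        for here, there in zip(rows, rows[1:]):
            departs = to_minutes(here["Departs"])
            arrives = to_minutes(there["Arrives"])
            duration = None
            if departs is not None and arrives is not None:
                duration = arrives - departs
            arcs.append({
                "from": here["Stop"],
                "to": there["Stop"],
                "route": route,
                "route_detail_id": here["route_detail_id"],
                "minutes": duration,
            })
    return arcs


arcs = build_arcs(SAMPLE)
for arc in arcs:
    print(arc)
```

Each arc keeps the route_detail_id of the stop it leaves from, matching the "Cherry -- Bus1, 50701, ... --> Market" lines above, and the `minutes` value is one possible weight for a later shortest path query.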

Related

Time function in sheets

I have the data of 4000 employees in google sheets along with their shift timings (9 hour long shift) spread across 24 hours. I wish to use a formula to understand the most common timing these employees are available in the office (09:00 to 18:00). My results would be 09:00 to 11:00, 11:00 to 13:00, 13:00 to 15:00, 15:00 to 18:00, 18:00 to 22:00, 22:00 to 09:00.
I could have used this formula to derive the value:
=IF(AND(TIMEVALUE(A2)>=TIMEVALUE("09:00"), TIMEVALUE(A2)<=TIMEVALUE("11:00")), "09:00 to 11:00",
IF(AND(TIMEVALUE(A2)>=TIMEVALUE("11:00"), TIMEVALUE(A2)<=TIMEVALUE("13:00")), "11:00 to 13:00",
IF(AND(TIMEVALUE(A2)>=TIMEVALUE("13:00"), TIMEVALUE(A2)<=TIMEVALUE("15:00")), "13:00 to 15:00",
IF(AND(TIMEVALUE(A2)>=TIMEVALUE("15:00"), TIMEVALUE(A2)<=TIMEVALUE("18:00")), "15:00 to 18:00",
IF(AND(TIMEVALUE(A2)>=TIMEVALUE("18:00"), TIMEVALUE(A2)<=TIMEVALUE("22:00")), "18:00 to 22:00", "22:00 to 09:00")))))
but the problem is that the timings are stored as text, not in a time format.
Here's my take:
Suppose Column A has clock-ins and Column B has clock-outs. Let Column D hold times starting at 00:00 and going up to 33:00 (9am the next day) in 5 minute (or 30, 60 etc.) increments.
Let Column E be the number of employees who were in the office at the time given in Column D.
We will define E to be =COUNTIFS($A$2:$A$9999,"<="&D2,$B$2:$B$9999,">="&D2).
Next, apply some conditional formatting to highlight the most busy times.
Note that you only need the times of day, which it sounds like you have, but you will need to convert overnight shifts so they do not wrap around midnight (for example, a clock-out of 02:00 the next day becomes 26:00).
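The same counting logic can be sketched outside the sheet. This is a minimal Python illustration, assuming the times are stored as "HH:MM" text; the function names and the three sample shifts are invented for the example. An overnight shift is detected when the clock-out is not later than the clock-in, and its end is pushed past 24:00, which is why the time axis runs to 33:00.

```python
def to_minutes(text):
    """Convert 'HH:MM' text (values past 24:00 allowed) to minutes."""
    hours, minutes = text.strip().split(":")
    return int(hours) * 60 + int(minutes)


def normalize_shift(clock_in, clock_out):
    """Return (start, end) in minutes, moving overnight clock-outs past 24:00."""
    start, end = to_minutes(clock_in), to_minutes(clock_out)
    if end <= start:
        end += 24 * 60  # e.g. 22:00 -> 07:00 becomes 22:00 -> 31:00
    return start, end


def headcount(shifts, at_minute):
    """Same test as the COUNTIFS formula: clock-in <= time and clock-out >= time."""
    return sum(1 for start, end in shifts if start <= at_minute <= end)


# Hypothetical sample data: (clock-in, clock-out) as text.
raw_shifts = [("09:00", "18:00"), ("22:00", "07:00"), ("13:00", "22:00")]
shifts = [normalize_shift(a, b) for a, b in raw_shifts]

# Time axis from 00:00 to 33:00 in 60 minute steps, like Column D.
slots = range(0, 33 * 60 + 1, 60)
counts = {slot: headcount(shifts, slot) for slot in slots}
busiest = max(counts, key=counts.get)
print(f"busiest slot starts at {busiest // 60:02d}:{busiest % 60:02d}")
```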

How can I manually supplement data coming from IMPORTHTML for error handling?

I have a Google Sheet that grabs data from multiple web pages to list the next game for each team in the FIFA World Cup. I use the following to do so; the example below covers 2 teams, but in reality I would like to grab data for 5:
=UNIQUE({
IMPORTHTML("https://www.espn.com/soccer/team/fixtures/_/id/2869/league/FIFA.WORLD#","TABLE",1);
IMPORTHTML("https://www.espn.com/soccer/team/fixtures/_/id/475/league/FIFA.WORLD#","TABLE",1)
})
The problem is that sometimes teams get eliminated and don't have a next game, so the table is deleted from the webpage and the function above returns an error.
How can I trap the error on each IMPORTHTML call and return an empty row instead of having the whole UNIQUE function fail? Here's what the data returns when it works properly.
DATE MATCH TIME COMPETITION TV Opponent
Thu, Dec 1 Canada v Morocco 10:00 AM FIFA World Cup FS1 Canada
Fri, Dec 2 Serbia v Switzerland 2:00 PM FIFA World Cup FS1 Serbia
Here's what I would like it to look like if the first IMPORT fails:
DATE MATCH TIME COMPETITION TV Opponent
Fri, Dec 2 Serbia v Switzerland 2:00 PM FIFA World Cup FS1 Serbia
Have you considered using an IFERROR for each IMPORTHTML so it doesn't affect the other values in the table? Like this:
=UNIQUE({
IFERROR(IMPORTHTML("https://www.espn.com/soccer/team/fixtures/_/id/2869/league/FIFA.WORLD#","TABLE",1));
IFERROR(IMPORTHTML("https://www.espn.com/soccer/team/fixtures/_/id/475/league/FIFA.WORLD#","TABLE",1))
})
You can read more about the IFERROR formula in the Google Sheets function documentation.

How can I make sure every change to dimension is captured in SCD2?

I work for a finance company. We need to track the exact dimension values at the time of each transaction. We load data incrementally into the warehouse roughly every 15 minutes, and within that window a dimension record with the same business key can change multiple times (multiple records are collected). Usually, we write scripts that pick the latest of all the changes in the 15 minute window. But in our case, I want all of those changes to be loaded into the dimension table. How can this be implemented?
EDIT:
Examples in same Batch:
Business Key, Name, email (scd 2), Created_at
1, xyz, xyz#gmail.com, 1/1/21 10:00 AM
1, xyz, abc#gmail.com, 1/1/21 10:05 AM
Expected changes in dimension
SK, BK, Name, Email, Effective_date, Expiration_date, Current
1, 1, xyz, efg#gmail.com, 01/01/1900 0:00 AM, 1/1/21 9:59 AM, N
--- New changes from batch ------
2, 1, xyz, xyz#gmail.com, 01/01/2021 10:00 AM, 01/01/2021 10:05 AM, N
3, 1, xyz, abc#gmail.com, 01/01/2021 10:05 AM, 12/31/9999 00:00 AM, Y
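One way to get the expected rows above is to process every record in the batch in timestamp order instead of keeping only the latest one. The following Python sketch is an assumption about how that could look, not a description of any particular ETL tool: the row layout, the `apply_batch` name, and the choice to close each row at the next change's timestamp are all illustrative (the example above closes the pre-existing row one minute earlier, at 9:59 AM).

```python
from datetime import datetime

OPEN_END = datetime(9999, 12, 31)  # sentinel expiration for the current row


def apply_batch(dimension, batch):
    """Append one SCD2 row per change in the batch, oldest first.

    dimension: list of dict rows already in the table (mutated in place).
    batch: list of dicts with keys bk, name, email, created_at.
    """
    next_sk = max((row["sk"] for row in dimension), default=0) + 1
    for change in sorted(batch, key=lambda c: (c["bk"], c["created_at"])):
        # Close the currently open row for this business key, if any.
        for row in dimension:
            if row["bk"] == change["bk"] and row["current"]:
                row["expiration"] = change["created_at"]
                row["current"] = False
        dimension.append({
            "sk": next_sk,
            "bk": change["bk"],
            "name": change["name"],
            "email": change["email"],
            "effective": change["created_at"],
            "expiration": OPEN_END,
            "current": True,
        })
        next_sk += 1
    return dimension


# Existing dimension row and the two changes from the example batch.
dimension = [{
    "sk": 1, "bk": 1, "name": "xyz", "email": "efg#gmail.com",
    "effective": datetime(1900, 1, 1), "expiration": OPEN_END, "current": True,
}]
batch = [
    {"bk": 1, "name": "xyz", "email": "abc#gmail.com",
     "created_at": datetime(2021, 1, 1, 10, 5)},
    {"bk": 1, "name": "xyz", "email": "xyz#gmail.com",
     "created_at": datetime(2021, 1, 1, 10, 0)},
]
apply_batch(dimension, batch)
for row in dimension:
    print(row["sk"], row["email"], row["effective"], row["expiration"], row["current"])
```

Sorting by business key and then by created_at means the order in which records arrive within the batch does not matter; every intermediate change gets its own row with a closed validity interval.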

HighCharts - Column style, interval of 5 minutes

I want to display a Highcharts.com column-style chart with a column interval of exactly 5 minutes. My database contains data for every minute. Now I see irregular gaps of sometimes 6 or 7 minutes in my chart. So e.g. 10:05 a.m., 10:10 a.m., 10:16 a.m., 10:21 a.m., etc. etc. Where do these gaps come from and why is the data not consistently read from the database from 0:00 a.m. to 12:59 p.m. in 5 minute intervals?

Rails, postgres, timezones, and ActiveRecord order method

I have a query that orders a group of tasks by their start_date (a UTC datetime).
This works perfectly.
The problem comes when displaying the ordered records for the end user in their timezone.
Let's say I have 5 tasks (ordered by their UTC times):
1 2:00
2 3:00
3 12:00
4 16:00
5 22:00
This is the correct order.
But when I convert to EDT:
1 22:00
2 23:00
3 8:00
4 12:00
5 18:00
Is there a way to specify the timezone you're querying in Postgres using ActiveRecord?
Thank you
