parsing a text file into groups using Scala

I have a CSV file that is really a set of many CSV files in one. Something like this:
"First Part"
"Some", "data", "in", "here"
"More", "stuff", "over", "here"
"Another Part"
"This", "section", "is", "not", "the", "same", "as", "the", "first"
"blah", "blah", "blah", "blah", "blah", "blah", "blah", "blah", "blah"
"Yet another section"
"And", "this", "is", "yet", "another"
"blah", "blah", "blah", "blah", "blah"
I'd like to break it into separate components. Given that I know the header for each section, it'd be nice if I could do some kind of groupBy where I pass in a set of regexps representing the header patterns and get back a Seq[Seq[String]] or something similar.

You could do the following:
val groups = List("\"First Part\"", "\"Another Part\"", "\"Yet another section\"")
val accumulator = List[List[String]]()
val result = input.split("\n").foldLeft(accumulator) { (acc, e) =>
  if (groups.contains(e)) {
    // Make a new group when we encounter a string matching one of the groups
    Nil :: acc
  } else {
    // Grab the current group and prepend the line to it
    val newHead = e :: acc.head
    newHead :: acc.tail
  }
}
Each list in result now represents a group. If you want to use regexes to find your matches, just replace the groups.contains(e) with a match test. There are some subtleties here that deserve a mention:
The algorithm will fail if the input does not start with a heading.
If a heading is present several times, each occurrence will generate a new group.
Groups will contain the lines of the input in reverse order.
Empty lines will also be included in the result.
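To illustrate the regex variant, here is a self-contained sketch of the same fold (the patterns and the sample input are assumptions for illustration) that also reverses the accumulated lists so the groups come back in order:

```scala
// Same fold as above, but headings are detected with regex patterns
// instead of exact string equality. Patterns and input are made up.
val headingPatterns = List("\"First.*\"".r, "\"Another.*\"".r)

def isHeading(line: String): Boolean =
  headingPatterns.exists(_.pattern.matcher(line).matches)

val input = List(
  "\"First Part\"",
  "\"Some\", \"data\"",
  "\"Another Part\"",
  "\"More\", \"stuff\""
).mkString("\n")

val result = input.split("\n").toList
  .foldLeft(List.empty[List[String]]) { (acc, line) =>
    if (isHeading(line)) Nil :: acc       // start a new group at a heading
    else (line :: acc.head) :: acc.tail   // prepend line to the current group
  }
  .map(_.reverse) // lines were accumulated in reverse order
  .reverse        // and so were the groups
```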

EDIT: this is similar to the other solution that was posted at the same time. A similar test for the section headings could be used instead of my quick hack of size == 1. This solution has the added benefit of including the section name, so ordering doesn't matter.
val file: List[String] = """
heading
1,2,3
4,5
heading2
5,6
""".split("\n").toList
val splitFile = file
  .map(_.split(",").toList)
  .filterNot(_ == List(""))
  .foldLeft(List[(String, List[List[String]])]()) {
    case (h :: t, l) => if (l.size == 1) (l(0), List()) :: h :: t else (h._1, l :: h._2) :: t
    case (Nil, l)    => if (l.size == 1) List((l(0), List())) else List()
  }
  .reverse
produces
splitFile: List[(String, List[List[String]])] = List((heading,List(List(4, 5), List(1, 2, 3))), (heading2,List(List(5, 6))))
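Note that within each section the rows come out in reverse order (List(4, 5) precedes List(1, 2, 3) above). A self-contained sketch of the cleanup, with the result shape hard-coded so the snippet stands alone (values are strings because they came from split(",")):

```scala
// Rebuild the result shape from above so this snippet is runnable on its own.
val splitFile = List(
  ("heading",  List(List("4", "5"), List("1", "2", "3"))),
  ("heading2", List(List("5", "6")))
)

// Restore the original row order within each section
val ordered = splitFile.map { case (name, rows) => (name, rows.reverse) }

// Since each group carries its section name, lookup by heading is trivial
val sections = ordered.toMap
```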

Related

lua table constructor with keys containing spaces

I know I can construct tables like this:
local t= {
first = "value1",
second = "value2"
}
and I know I can use keys containing spaces, like t["some key"] = "some value",
but is there a way to construct a table like the above with keys containing spaces?
I tried a few things, but I only got errors.
You can declare any expression as a key in a table constructor by putting it in brackets:
local t = {
["some key"] = "some value",
[234567 + 2] = "other value",
}
local t= {
first = "value1",
second = "value2"
}
Is syntactic sugar for
local t= {
["first"] = "value1",
["second"] = "value2"
}
This convenient syntax only works for names. Lua names may only consist of letters, digits, and underscores, and they must not start with a digit.
Since a valid Lua identifier may not contain a space, you cannot use the syntactic sugar here. Hence the only way to do this is to use the full syntax:
local t = {["hello world"] = 1}
This also applies to indexing that table field. So the only way is t["hello world"]

Create relationship with multiple values

How can I create a relationship in Neo4j from one node to another node which has multiple values?
The first node has unique values for the identifier. For example:
Data of the first NodeA:
{
"c": "11037",
"b": 15.4,
"a": 10.0,
"id": 11137100
}
The second node, NodeB, looks like this:
{
"text": "some text",
"prio": 1,
"id": 11137100,
"value": 0.1
}
But we also have data with the same id, like here:
{
"text": "some other text",
"prio": 2,
"id": 11137100,
"value": 2.1
}
Now I want to create a relationship between both nodes. But if I do things like:
MATCH (p:NodeA),(h:NodeB)
WHERE h.id = p.id
CREATE (p)-[dr:Contains{prio:h.prio}]->(h)
RETURN (dr)
I get multiple relationships. I want one NodeA with two outgoing relationships to the NodeB nodes.
How can I do it?
The CREATE statement will create a new node/relationship, irrespective of whether one already exists.
If the intent is to only create a relationship if one does not already exist, I would suggest you do a pre-filter query first, e.g.
MATCH (p:NodeA), (h:NodeB)
WHERE h.id = p.id AND NOT (p)-[:Contains{prio:h.prio}]->(h)
//continue your query here

How to replace entries in a string array with cypher

I have a string array property like so:
["name1", "name2", "name3", "name2", "name4"]
I would like to replace in this array e.g. "name2" with "name5":
["name1", "name5", "name3", "name5", "name4"]
So far I have come up with a query like this:
MATCH (parent)-[rel]->(child)
WHERE 'name2' IN rel.names
SET rel.names = [x IN (rel.names+['name5']) WHERE x<>"name2"]
Which results in nearly what I want:
["name1", "name3", "name4", "name5"]
The problem with this query is obvious: it just adds "name5" once, without checking how often "name2" occurs in the array. For example, if "name2" appears n times, the query still adds "name5" only once instead of n times.
Without the WHERE clause, the query adds a "name5" even to arrays that don't contain a "name2" at all. The right behaviour would be to find "name2" zero times and add "name5" zero times, so the WHERE part shouldn't be required. How would you solve the problem, and is my approach the right way to go?
This should work:
MATCH (parent)-[rel]->(child)
WHERE 'name2' IN rel.names
SET rel.names = [x IN rel.names | CASE WHEN "name2" = x THEN "name5" ELSE x END]
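The list comprehension with CASE is just a map over the array: each "name2" becomes "name5" and everything else passes through unchanged, so no WHERE guard or occurrence counting is needed. A plain-collections analogue of the same idea, sketched in Scala using the arrays from the question:

```scala
// Replace every "name2" with "name5"; all other elements pass through.
val names = List("name1", "name2", "name3", "name2", "name4")
val replaced = names.map(x => if (x == "name2") "name5" else x)
```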

How to filter out Flux<Example> that don't contain some value from Flux<String>

So let's say I have a Flux<String> firstLetters containing "A", "B", "C", "D" and Flux<String> lastLetters containing "X", "Y", "Z"
And I have a Flux containing many:
data class Example(val name: String)
And from the whole Flux<Example> I want to split the elements into two Fluxes: one Flux<Example> containing all elements whose name is in ("A", "B", "C", "D"), and a second Flux<Example> whose name is in ("X", "Y", "Z"), and save those two Fluxes in two variables.
Is it possible to do so in one flow, without doing the same logic first for firstLetters and then for lastLetters?
As the problem stands, I don't believe so, as you'll have to process each element multiple times (once per value in the list, to see if it contains the value you need). You can call cache() on the Flux, though, to ensure the values are only retrieved once, or convert to another data structure entirely.
Given that you have to re-evaluate anyway, and assuming you still want to stick with raw Flux objects, filterWhen() and any() can be used quite nicely here:
Flux<Example> firstNames = names.filterWhen(e -> firstLetters.any(e.name::contains));
Flux<Example> lastNames = names.filterWhen(e -> lastLetters.any(e.name::contains));
You can of course pull the Predicate out into a separate method if you're concerned about code duplication there.
If Flux<String> firstLetters/lastLetters can be replaced with Set<String> firstLetters/lastLetters, then you can easily leverage the Flux::groupBy method on Flux<Example> to split it into different groups.
enum Group {
  FIRST, LAST, UNDEFINED
}

Group toGroup(Example example) {
  if (firstLetters.contains(example.name)) return Group.FIRST;
  else if (lastLetters.contains(example.name)) return Group.LAST;
  else return Group.UNDEFINED;
}

Flux<GroupedFlux<Group, Example>> group(Flux<Example> examples) {
  return examples.groupBy(example -> toGroup(example));
}
You can then get the group by calling GroupedFlux<K, V>::key.
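If the letters really can live in plain Sets, the grouping reduces to an ordinary groupBy over materialized elements. A minimal plain-collections sketch of the same idea (in Scala, with made-up sample data; FIRST/LAST/UNDEFINED mirror the enum above):

```scala
// Group elements by which letter set their name belongs to.
case class Example(name: String)

val firstLetters = Set("A", "B", "C", "D")
val lastLetters  = Set("X", "Y", "Z")

val examples = List(Example("A"), Example("X"), Example("B"), Example("Q"))

val grouped: Map[String, List[Example]] = examples.groupBy { e =>
  if (firstLetters.contains(e.name)) "FIRST"
  else if (lastLetters.contains(e.name)) "LAST"
  else "UNDEFINED"
}
```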

Parse a complex hash and return changes to keys

I'm using json-compare gem to compare two different json files.
Example file 1:
{"suggestions": [
{
"id1": 1,
"title1": "Test",
"body1": "Test"
}
]
}
Example file 2:
{"suggestions": [
{
"id2": 1,
"title2": "Test",
"body2": "Test"
}
]
}
The gem works well and spits out a hash that looks like this:
{:update=>
{"suggestions" =>
{:update=>
{0=>
{:append=>
{"id2"=>1, "title2"=>"Test", "body2"=>"Test"},
:remove=>
{"id1"=>1, "title1"=>"Test", "body1"=>"Test"},
}
}
}
}
}
How can I parse this and return all the places where json Keys were changed? For the sake of simplicity, how would I put to the console:
id1 changed to id2
title1 changed to title2
body1 changed to body2
For the purpose of what I'm building I don't need to know changes to the values. I just need to know that id1 became id2, etc.
Unless you are relying on key ordering, there is no way to tell that id1 got replaced by id2 and title1 by title2, rather than id1 becoming title2 and id2 becoming title1. It sounds like you need logic specific to the actual key names (in this example, searching for the differing integer suffixes).
Maybe this can be enough for the purpose:
def find_what_changed_in(mhash, result = [])
  result << mhash                             # remember every hash we visit
  return if mhash.keys == [:append, :remove]  # stop once we hit the changed pair
  mhash.keys.each { |k| find_what_changed_in(mhash[k], result) }
  result.last                                 # deepest hash pushed: the append/remove pair
end
find_what_changed_in(changes)
#=> {:append=>{"id2"=>1, "title2"=>"Test", "body2"=>"Test"}, :remove=>{"id1"=>1, "title1"=>"Test", "body1"=>"Test"}}
Where:
changes = {:update=>
{"suggestions" =>
{:update=>
{0=>
{:append=>
{"id2"=>1, "title2"=>"Test", "body2"=>"Test"},
:remove=>
{"id1"=>1, "title1"=>"Test", "body1"=>"Test"},
...
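For comparison, the same recursive search sketched in Scala over nested immutable Maps (keys shown as strings here; the structure mirrors the example hash above, so this is an illustrative analogue, not the gem's API):

```scala
// Recursively walk a nested Map looking for the level whose keys are
// exactly "append" and "remove".
def findChange(m: Map[String, Any]): Option[Map[String, Any]] =
  if (m.keySet == Set("append", "remove")) Some(m)
  else m.values.view.flatMap {
    case sub: Map[_, _] => findChange(sub.asInstanceOf[Map[String, Any]])
    case _              => None
  }.headOption

val changes: Map[String, Any] = Map(
  "update" -> Map(
    "suggestions" -> Map(
      "update" -> Map(
        "0" -> Map(
          "append" -> Map("id2" -> 1, "title2" -> "Test", "body2" -> "Test"),
          "remove" -> Map("id1" -> 1, "title1" -> "Test", "body1" -> "Test")
        )
      )
    )
  )
)
```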
