Pandoc's lua filter makes it really easy to iterate over a document and munge the document as you go. My problem is I can't figure out how to isolate list item elements. I can find lists and the block level things inside each list item, but I can't figure out a way to iterate over list items.
For example let's say I had the following Markdown document:
1. One string
Two string
2. Three string
Four string
Lets say I want to make the first line of each list item bold. I can easily change how the paragraphs are handled inside OrderedLists, say using this filter and pandoc --lua-filter=myfilter.lua --to=markdown input.md
local i
OrderedList = function (element)
i = 0
return pandoc.walk_block(element, {
Para = function (element)
i = i + 1
if i == 1 then return pandoc.Para { pandoc.Strong(element.c) }
else return element end
end
})
end
This will indeed change the first paragraph element to bold, but it only changes the first paragraph of the first list item because it's iterating across all paragraphs in all list items in the list, not on each list item, then on each paragraph.
1. **One string**
Two string
2. Three string
Four string
If I separate the two list items into two separate lists again the first paragraph of the first item is caught, but I want to catch the first paragraph of every list item! I can't find anything in the documentation about iterating over list items. How is one supposed to do that?
The pandoc Lua filter docs have recently been updated with more info on the properties of each type. E.g., for OrderedList elements, the docs should say (it currently says items instead of content, which is a bug):
OrderedList
An ordered list.
content: list items (List of Blocks)
listAttributes: list parameters (ListAttributes)
start: alias for listAttributes.start (integer)
style: alias for listAttributes.style (string)
delimiter: alias for listAttributes.delimiter (string)
tag, t: the literal OrderedList (string)
So the easiest way is to iterate over the content field and change items therein:
OrderedList = function (element)
for i, item in ipairs(element.content) do
local first = item[1]
if first and first.t == 'Para' then
element.content[i][1] = pandoc.Para{pandoc.Strong(first.content)}
end
end
return element
end
Let's say I have a PairRDD, students (id, name). I would like to only keep rows where id is in another RDD, activeStudents (id).
The solution I have is to create a PairDD from activeStudents, (id, id), and the do a join with students.
Is there a more elegant way of doing this?
Thats a pretty good solution to start with. If active students is small enough you could collect the ids as a map and then filter with the id presence (this avoids having to a do a shuffle).
Much like you thought, you can do an outer join if both RDDs contain keys and values.
val students: RDD[(Long, String)]
val activeStudents: RDD[Long]
val activeMap: RDD[(Long, Unit)] = activeStudents.map(_ -> ())
val activeWithName: RDD[(Long, String)] =
students.leftOuterJoin(activeMap).flatMapValues {
case (name, Some(())) => Some(name)
case (name, None) => None
}
If you don't have to join those two data sets then you should definitely avoid it.
I had a similar problem recently and I successfully solved it using a broadcasted Set, which I used in UDF to check whether each RDD row (rather value from one of its columns) is in that Set. That UDF is than used as the basis for the filter transformation.
More here: whats-the-most-efficient-way-to-filter-a-dataframe.
Hope this helps. Ask if it's not clear.
I want to replace the value of the 'Amount' key in a map (literal) with the sum of the existing 'Amount' value plus the new 'Amount' value such where both the 'type' and 'Price' match. The structure I have so far is:
WITH [{type:1, Orders:[{Price:10,Amount:100},{Price:11,Amount:200},{Price:12,Amount:300}]},
{type:2, Orders:[{Price:10,Amount:100},{Price:11,Amount:200},{Price:12,Amount:300}]},
{type:3, Orders:[{Price:10,Amount:100},{Price:11,Amount:200},{Price:12,Amount:300}]}] as ExistingOrders,
{type:2, Order:{Price:11,Amount:50}} as NewOrder
(I'm trying to get it to:)
RETURN [{type:1, Orders:[{Price:10,Amount:100},{Price:11,Amount:200},{Price:12,Amount:300}]},
{type:2, Orders:[{Price:10,Amount:100},{Price:11,Amount:250},{Price:12,Amount:300}]},
{type:3, Orders:[{Price:10,Amount:100},{Price:11,Amount:200},{Price:12,Amount:300}]}] as CombinedOrders
If there is no existing NewOrder.type and NewOrder.Price then it should obviously insert the new record rather than add it together.
Sorry, this is possibly really straight forward, but I'm not very good at this yet.
thanks
Edit:
I should add, that I have been able to get this working for a simpler map structure as such:
WITH [{type:1, Amount:100},{type:2, Amount:200},{type:3, Amount:300}] as ExistingOrders,
{type:2, Amount:50} as NewValue
RETURN reduce(map=filter(p in ExistingOrders where not p.type=NewValue.type),x in [(filter(p2 in ExistingOrders where p2.type=NewValue.type)[0])]|CASE x WHEN null THEN NewValue ELSE {type:x.type,Amount:x.Amount+NewValue.Amount} END+map) as CombinedOrders
But I'm struggling I think because of the Orders[array] in my first example.
I believe you are just trying to update the value of the appropriate Amount in ExistingOrders.
The following query is legal Cypher, and should normally work:
WITH ExistingOrders, NewOrder, [x IN ExistingOrders WHERE x.type = NewOrder.type | x.Orders] AS eo
FOREACH (y IN eo |
SET y.Amount = y.Amount + CASE WHEN y.Price = NewOrder.Order.Price THEN NewOrder.Order.Amount ELSE 0 END
)
However, the above query produces a (somewhat) funny ThisShouldNotHappenError error with the message:
Developer: Stefan claims that: This should be a node or a relationship
What the message is trying to say (in obtuse fashion) is that you are not using the neo4j DB in the right way. Your properties are way too complicated, and should be separated out into nodes and relationships.
So, I will a proposed data model that does just that. Here is how you can create nodes and relationships that represent the same data as ExistingOrders:
CREATE (t1:Type {id:1}), (t2:Type {id:2}), (t3:Type {id:3}),
(t1)-[:HAS_ORDER]->(:Order {Price:10,Amount:100}),
(t1)-[:HAS_ORDER]->(:Order {Price:11,Amount:200}),
(t1)-[:HAS_ORDER]->(:Order {Price:12,Amount:300}),
(t2)-[:HAS_ORDER]->(:Order {Price:10,Amount:100}),
(t2)-[:HAS_ORDER]->(:Order {Price:11,Amount:200}),
(t2)-[:HAS_ORDER]->(:Order {Price:12,Amount:300}),
(t3)-[:HAS_ORDER]->(:Order {Price:10,Amount:100}),
(t3)-[:HAS_ORDER]->(:Order {Price:11,Amount:200}),
(t3)-[:HAS_ORDER]->(:Order {Price:12,Amount:300});
And here is a query that will update the correct Amount:
WITH {type:2, Order:{Price:11,Amount:50}} as NewOrder
MATCH (t:Type)-[:HAS_ORDER]->(o:Order)
WHERE t.id = NewOrder.type AND o.Price = NewOrder.Order.Price
SET o.Amount = o.Amount + NewOrder.Order.Amount
RETURN t.id, o.Price, o.Amount;
There's two parts to your question - one with a simple answer, and a second part that doesn't make sense. Let me take the simple one first!
As far as I can tell, it seems you're asking how to concatenate a new map on to a collection of maps. So, how to add a new item in an array. Just use + like this simple example:
return [{item:1}, {item:2}] + [{item:3}];
Note that the single item we're adding at the end isn't a map, but a collection with only one item.
So for your query:
RETURN [
{type:1, Orders:[{Price:10,Amount:100},
{Price:11,Amount:200},
{Price:12,Amount:300}]},
{type:2, Orders:[{Price:10,Amount:100},
{Price:11,Amount:**250**},
{Price:12,Amount:300}]}]
+
[{type:3, Orders:[{Price:10,Amount:100},
{Price:11,Amount:200},{Price:12,Amount:300}]}]
as **CombinedOrders**
Should do the trick.
Or you could maybe do it a bit cleaner, like this:
WITH [{type:1, Orders:[{Price:10,Amount:100},{Price:11,Amount:200},{Price:12,Amount:300}]},
{type:2, Orders:[{Price:10,Amount:100},{Price:11,Amount:200},{Price:12,Amount:300}]},
{type:3, Orders:[{Price:10,Amount:100},{Price:11,Amount:200},{Price:12,Amount:300}]}] as ExistingOrders,
{type:2, Order:{Price:11,Amount:50}} as NewOrder
RETURN ExistingOrders + [NewOrder];
OK now for the part that doesn't make sense. In your example, it looks like you want to modify the map inside of the collection. But you have two {type:2} maps in there, and you're looking to merge them into something with one resulting {type:3} map in the output that you're asking for. If you need to deconflict map entries and change what the map entry ought to be, it might be that cypher isn't your best choice for that kind of query.
I figured it out:
WITH [{type:1, Orders:[{Price:10,Amount:100},{Price:11,Amount:200},Price:12,Amount:300}]},{type:2, Orders:[{Price:10,Amount:100},{Price:11,Amount:200},{Price:12,Amount:300}]},{type:3, Orders:[{Price:10,Amount:100},{Price:11,Amount:200},{Price:12,Amount:300}]}] as ExistingOrders,{type:2, Orders:[{Price:11,Amount:50}]} as NewOrder
RETURN
reduce(map=filter(p in ExistingOrders where not p.type=NewOrder.type),
x in [(filter(p2 in ExistingOrders where p2.type=NewOrder.type)[0])]|
CASE x
WHEN null THEN NewOrder
ELSE {type:x.type, Orders:[
reduce(map2=filter(p3 in x.Orders where not (p3.Price=(NewOrder.Orders[0]).Price)),
x2 in [filter(p4 in x.Orders where p4.Price=(NewOrder.Orders[0]).Price)[0]]|
CASE x2
WHEN null THEN NewOrder.Orders[0]
ELSE {Price:x2.Price, Amount:x2.Amount+(NewOrder.Orders[0]).Amount}
END+map2 )]} END+map) as CombinedOrders
...using nested Reduce functions.
So, to start with it combines a list of orders without matching type, with a list of those orders (actually, just one) with a matching type. For those latter ExistingOrders (with type that matches the NewOrder) it does a similar thing with Price in the nested reduce function and combines non-matching Prices with matching Prices, adding the Amount in the latter case.
I have two cases. The preliminary code:
open Microsoft.Office.Interop.Excel
let xl = ApplicationClass()
xl.Workbooks.OpenText(fileName...)
let wb = xl.Workbooks.Item(1)
let ws = wb.ActiveSheet :?> Worksheet
let rows = string ws.UsedRange.Rows.Count
First, I try the following to sort:
ws.Sort.SortFields.Clear()
ws.Sort.SortFields.Add(xl.Range("A8:A" + rows), XlSortOn.xlSortOnValues, XlSortOrder.xlAscending, XlSortDataOption.xlSortNormal) |> ignore
ws.Sort.SortFields.Add(xl.Range("H8:H" + rows), XlSortOn.xlSortOnValues, XlSortOrder.xlAscending, XlSortDataOption.xlSortNormal) |> ignore
ws.Sort.SetRange(xl.Range("A7:I" + rows))
ws.Sort.Header <- XlYesNoGuess.xlYes
ws.Sort.MatchCase <- false
ws.Sort.Orientation <- XlSortOrientation.xlSortRows
ws.Sort.SortMethod <- XlSortMethod.xlPinYin
ws.Sort.Apply()
This results in "The sort reference is not valid. Make sure that it's within the data you want to sort, and the first Sort By box isn't the same or blank.".
Then I try the following to sort:
ws.Range("A7:I" + rows).Sort(xl.Range("A8:A" + rows), XlSortOrder.xlAscending,
xl.Range("H8:H" + rows), "", XlSortOrder.xlAscending,
"", XlSortOrder.xlAscending, XlYesNoGuess.xlYes,
XlSortOrientation.xlSortRows) |> ignore
This results in my columns being rearranged, though I don't see any logic to the rearrangement. And the rows are not sorted.
Please tell me what I'm doing wrong.
I haven't figured out why the first Sort attempt doesn't work. But I checked out the documentation on the second Sort. Using named parameters and dropping all but necessary parameters, I came up with the following:
ws.Range("A8:H" + rows).Sort(Key1=xl.Range("A8:A" + rows), Key2=xl.Range("H8:H" + rows),
Orientation=XlSortOrientation.xlSortColumns) |> ignore
The default Orientation is xlSortRows. I interpret this as sorting the rows, the normal, default, intuitive sort that Excel does when one sorts manually. Oh no. If you want the normal, default, intuitive sort that Excel does when one sorts manually, specify xlSortColumns.
The first Sort doesn't work because you're probably trying to sort a table (judging by property Header set to XlYesNoGuess.xlYes) by columns located on A and H. In this case you can only sort using "Sort top to bottom"(xlSortColumns).
You can also convince yourself by trying this in Excel first and see what options are available:
1) Select the range you want to apply sorting (e.g. in your case "A7:I" + rows)
2) Click on "Data" menu -> "Sort & Filter" task pane -> Sort
3) Add/Delete columns/rows by using "Add Level"/"Delete Level" buttons
4) Then you can select whatever level you want and click "Options..." button and you will see your available "Sort Options".
You will notice that in case of a table you will have only one available option for "Orientation", which is "Sort top to bottom"
xlSortColumns - sorts by columns (meaning rows are sorted based on those columns)
xlSortRows - sort by rows (meaning columns are sorted based on those rows)