In my file paths I want to check for specific directory names, and process a path only if one of them is found.
My file path values look like this:
force-app\main\default\aura\TestCompDeploy\TestCompDeployHelper.js
force-app\main\default\lwc\testLWCDeployComp\testLWCDeployComp.js
force-app\main\default\staticresources\logo.jpeg
The matches expression below, with a single string pattern, works:
<matches string="${SamplePathTrial}" pattern="/aura/"/>
But a pattern with multiple search strings fails. How can I do something like the following?
<matches string="${SamplePathTrial}" pattern="['/aura/', '/lwc/', '/staticresources/']"/>
The pattern of the matches <condition> is a regular expression, so you likely need something like:
<matches string="${SamplePathTrial}" pattern="/aura/|/lwc/|/staticresources/" />
Pipe characters separate the alternate patterns, effectively saying "match /aura/ OR /lwc/ OR /staticresources/".
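You can check the alternation outside Ant with plain java.util.regex. This is a sketch with its own assumptions. It uses Matcher.find(), so the pattern only has to occur somewhere in the path, as in the working single-pattern example. The class name DirFilter and the ANY_SEP pattern are hypothetical additions. The sample paths use backslashes, and a pattern that contains only forward slashes will not match them. ANY_SEP therefore accepts either separator.

```java
import java.util.List;
import java.util.regex.Pattern;

public class DirFilter {
    // The alternation from the answer: /aura/ OR /lwc/ OR /staticresources/
    static final Pattern SLASH_ONLY = Pattern.compile("/aura/|/lwc/|/staticresources/");

    // Hypothetical variant: same directory names, with either '/' or '\' as separator
    static final Pattern ANY_SEP = Pattern.compile("[/\\\\](aura|lwc|staticresources)[/\\\\]");

    // find() succeeds if the pattern occurs anywhere in the path
    static boolean shouldProcess(Pattern pattern, String path) {
        return pattern.matcher(path).find();
    }

    public static void main(String[] args) {
        List<String> paths = List.of(
                "force-app/main/default/aura/TestCompDeploy/TestCompDeployHelper.js",
                "force-app\\main\\default\\lwc\\testLWCDeployComp\\testLWCDeployComp.js",
                "force-app/main/default/classes/Foo.cls");
        for (String path : paths) {
            System.out.println(path
                    + " slash-only=" + shouldProcess(SLASH_ONLY, path)
                    + " any-separator=" + shouldProcess(ANY_SEP, path));
        }
    }
}
```

A directory name must be bounded by separators on both sides, so a folder such as `auras` does not count as `aura`.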
I have Neo4j Community Edition 1.9.5 running on an EC2 m1.medium instance (around 4 GB RAM). I have around 300 nodes, 800 relationships and some 2000 properties. Neo4j is running in REST mode. Below is my applicationContext.xml:
<beans profile="default">
<bean class="org.springframework.data.neo4j.rest.SpringRestGraphDatabase" id="graphDatabaseService">
<constructor-arg index="0" value="http://localhost:7474/db/data/"/>
</bean>
<neo4j:config graphDatabaseService="graphDatabaseService"/>
</beans>
Now, I have the query below, which shows all the movies your friends have liked, and it takes about 10 seconds to return!
start user=node(*)
match user-[friend_rela:FRIENDS]-friend,friend-[movie_rela:LIKE]->movie
where has(user.uid) and user.uid={0}
return distinct movie,movie_rela,friend
order by movie_rela.timeStamp desc
skip {1}
limit {2}
I have indexed the following; the Indexes page in the admin UI shows:
Nodes:
movieId (from Movie)
__types__
Movie
uid (from User)
User
Relationships:
IsFriends
Like
__rel_types__
timeStamp
I have also changed the neo4j-wrapper.conf file to have the following heap sizes
# Initial Java Heap Size (in MB)
wrapper.java.initmemory=512
# Maximum Java Heap Size (in MB)
wrapper.java.maxmemory=2000
Do you think I am missing anything? I'm wondering why it takes so long. Please advise.
Thanks
Your query is horribly inefficient: it basically traverses the full graph multiple times. You should use an index lookup to find the start point for your query and then traverse from the start point(s), so you're doing a local query instead of a global one:
start user=node:User(uid={0})
match (user)-[friend_rela:FRIENDS]-(friend)-[movie_rela:LIKE]->(movie)
return distinct movie,movie_rela,friend
order by movie_rela.timeStamp desc
skip {1} limit {2}
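The Java sketch below shows why starting from one indexed user costs less. It is not Neo4j code. In-memory maps stand in for the graph, and LocalTraversal, befriend and like are hypothetical names. The work grows with the number of the user's friends and their likes, not with the size of the whole graph, which is what start user=node(*) has to walk.

```java
import java.util.*;
import java.util.stream.Collectors;

/** In-memory stand-in for the graph; names and structure are hypothetical. */
public class LocalTraversal {
    record Like(String friend, String movie, long timeStamp) {}

    private final Map<String, List<String>> friendsByUid = new HashMap<>();
    private final Map<String, List<Like>> likesByUid = new HashMap<>();

    void befriend(String a, String b) {
        // FRIENDS is matched without direction in the query, so store it both ways
        friendsByUid.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        friendsByUid.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    void like(String uid, String movie, long timeStamp) {
        likesByUid.computeIfAbsent(uid, k -> new ArrayList<>()).add(new Like(uid, movie, timeStamp));
    }

    /** Start at one user (the index lookup) and expand only its neighbourhood. */
    List<Like> friendsLikes(String uid, int skip, int limit) {
        return friendsByUid.getOrDefault(uid, List.of()).stream()
                .flatMap(friend -> likesByUid.getOrDefault(friend, List.of()).stream())
                .distinct()                                                   // return distinct ...
                .sorted(Comparator.comparingLong(Like::timeStamp).reversed()) // order by timeStamp desc
                .skip(skip)                                                   // skip {1}
                .limit(limit)                                                 // limit {2}
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        LocalTraversal g = new LocalTraversal();
        g.befriend("u1", "u2");
        g.befriend("u1", "u3");
        g.like("u2", "Alien", 100);
        g.like("u3", "Heat", 300);
        g.like("u2", "Up", 200);
        g.like("u4", "Jaws", 400); // u4 is not a friend of u1, so it is never visited
        System.out.println(g.friendsLikes("u1", 0, 2));
    }
}
```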
I want to write an ETL for rows in a given time range.
I am thinking of passing start_time and end_time in etl.properties. However, I am not sure how to define defaults if the properties file does not define them.
I was thinking of something like the following, but I am not sure whether this is possible:
<script connection-id="in" if="not properties.start_time">
select #starttime := last_day(now() - interval 1 month);
</script>
If properties.start_time is not defined, the start time should default to the value computed above (the last day of the previous month).
How do I go about it?
Thanks
You can set the default value of the property by adding an assignment after the <include> element. Example:
<properties>
<include href="etl.properties"/>
<!-- The new value is set only if it was not defined before -->
start_time=value
</properties>
In case of multiple declarations of the same property, the one that comes first takes precedence over subsequent declarations. This is why <include> comes first in the above example.
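The "first declaration wins" rule can be shown with java.util.Properties. Load the file first, then set the default only when the key is missing. This sketch shows the behaviour only, not how Scriptella implements it. EtlDefaults is a hypothetical class. The default reproduces MySQL's last_day(now() - interval 1 month) from the question with java.time.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.Properties;

public class EtlDefaults {
    /** Same value as MySQL last_day(now() - interval 1 month): last day of the previous month. */
    static LocalDate defaultStartTime(LocalDate today) {
        return YearMonth.from(today).minusMonths(1).atEndOfMonth();
    }

    /** Load etl.properties first, then set start_time only if the file did not define it. */
    static Properties load(Reader etlProperties, LocalDate today) throws IOException {
        Properties props = new Properties();
        props.load(etlProperties);
        // putIfAbsent keeps an existing value, i.e. the earlier declaration wins
        props.putIfAbsent("start_time", defaultStartTime(today).toString());
        return props;
    }

    public static void main(String[] args) throws IOException {
        LocalDate today = LocalDate.of(2024, 3, 15);
        // start_time missing from the file: the default is used (2024-02-29)
        System.out.println(load(new StringReader("end_time=2024-03-31\n"), today).getProperty("start_time"));
        // start_time present in the file: it is kept (2024-01-01)
        System.out.println(load(new StringReader("start_time=2024-01-01\n"), today).getProperty("start_time"));
    }
}
```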
---- Update ----
An alternative option would be to use a ternary expression, e.g. ${start_time==null?'':a}, or the COALESCE SQL function, which is supported by many databases. The latter should be more suitable for your example. Try whether something like this works:
INSERT INTO SomeTable VALUES (COALESCE(?start_time, last_day(now() - interval 1 month)));
Washington
New York
New Delhi
India
United States Of America
In Ant I want to extract all the values as separate words, like washington, new, delhi, india, united, states, of, america. Although I am able to extract them line by line with
<loadfile property="message" srcFile="../Ant_Scripts/Name.csv"/>
<target name="init">
<for list="${message}" delimiter="${line.separator}" param = "val">
<echo message="${val}"/>
but I am not able to extract them as individual units; that is, once I get New Delhi or New York, I should also be able to get New and Delhi separately.
Can you please post your Ant script? – Satya
<loadfile property="message" srcFile="../Ant_Scripts/Name.csv"/>
<target name="init">
<for list="${message}" delimiter="${line.separator}" param = "val">
<sequential>
<echo>$val</echo>
</sequential>
</for>
</target>
</project>
This code will print all the names line by line, but after this I want to split those lines on spaces.
There is one fundamental error:
http://dailyraaga.wordpress.com/2010/12/21/ant-for-loop/
You have to access the list element with @{val}, not ${val}: this loop uses attributes, not properties ;)
There is also a task you might find handy for your line separation needs:
http://ant.apache.org/manual/Tasks/fixcrlf.html
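Once you have each line, splitting it into words is a second split on whitespace. The plain-Java sketch below does both levels: first into lines, then into words. It lower-cases the words to match the expected output. SplitNames is a hypothetical class and is not part of Ant. In Ant, the equivalent would presumably be a second ant-contrib <for> nested inside the first, with list="@{val}" and delimiter=" ".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SplitNames {
    /** Split into lines first, then split each line on runs of whitespace. */
    static List<String> words(String content) {
        List<String> words = new ArrayList<>();
        for (String line : content.split("\\R")) {             // \R matches \n, \r\n or \r
            for (String word : line.trim().split("\\s+")) {     // "New Delhi" -> "New", "Delhi"
                if (!word.isEmpty()) {                          // skip blank lines
                    words.add(word.toLowerCase(Locale.ROOT));   // lower-cased as in the question
                }
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String csv = "Washington\nNew York\nNew Delhi\nIndia\nUnited States Of America\n";
        System.out.println(words(csv));
    }
}
```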
Is there any way to implement JOINS in Lucene?
You can also use the new BlockJoinQuery; I described it in a blog post here:
http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
You can do a generic join by hand: run two searches, get all results (instead of the top N), sort them on your join key and intersect the two ordered lists. But that's going to thrash your heap really hard (if the lists even fit in it).
There are possible optimizations, but under very specific conditions.
E.g., you do a self-join and only use (random-access) Filters for filtering, no Queries. Then you can manually iterate the terms of your two join fields (in parallel), intersect the docId lists for each term and filter them, and there's your join.
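The generic join by hand described above sorts both full result lists on the join key and then intersects them. This is a classic sort-merge join, sketched below in Java. The sketch assumes each hit has already been reduced to a join-key value and a docId. In Lucene you would collect every matching document, not the top N, and read the key from a stored field or doc values. Hit, Pair and SortMergeJoin are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortMergeJoin {
    /** One search hit: its join-key value and its document id (hypothetical shape). */
    record Hit(String joinKey, int docId) {}
    /** A joined pair of documents that share the same join key. */
    record Pair(int leftDoc, int rightDoc) {}

    static List<Pair> join(List<Hit> leftHits, List<Hit> rightHits) {
        // 1. Sort both complete result lists on the join key
        List<Hit> left = new ArrayList<>(leftHits);
        List<Hit> right = new ArrayList<>(rightHits);
        left.sort(Comparator.comparing(Hit::joinKey));
        right.sort(Comparator.comparing(Hit::joinKey));

        // 2. Walk both ordered lists in step and intersect them
        List<Pair> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).joinKey().compareTo(right.get(j).joinKey());
            if (cmp < 0) {
                i++;
            } else if (cmp > 0) {
                j++;
            } else {
                // Equal keys: find the run of this key on each side and emit every combination
                String key = left.get(i).joinKey();
                int iEnd = i, jEnd = j;
                while (iEnd < left.size() && left.get(iEnd).joinKey().equals(key)) iEnd++;
                while (jEnd < right.size() && right.get(jEnd).joinKey().equals(key)) jEnd++;
                for (int a = i; a < iEnd; a++) {
                    for (int b = j; b < jEnd; b++) {
                        out.add(new Pair(left.get(a).docId(), right.get(b).docId()));
                    }
                }
                i = iEnd;
                j = jEnd;
            }
        }
        return out;
    }
}
```

Both input lists are copied and held in full, which is exactly the heap pressure the answer warns about.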
There's an approach that handles the popular use case of simple parent-child relationships with a relatively small number of children per document: https://issues.apache.org/jira/browse/LUCENE-2454
Unlike the flattening method mentioned by @ntziolis, this approach correctly handles cases like the following: you have a number of resumes, each with multiple work_experience children, and you try to find someone who worked at company NNN in year YYY. If the documents are simply flattened, you'll get back resumes of people who worked for NNN in any year and worked somewhere in year YYY.
An alternative for handling simple parent-child cases is to flatten your doc, indeed, but ensure that values for different children are separated by a big position increment (posIncrement) gap, and then use a SpanNear query to prevent your several subqueries from matching across children. There was a LinkedIn presentation about this a few years ago, but I failed to find it.
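You can simulate the effect of the position gap without Lucene. In the sketch below, all names are hypothetical. The tokens of each work_experience child go into one stream, with a gap of 100 positions between children. A simplified distance check stands in for SpanNearQuery. Plain AND-style flattening matches company NNN with a year YYY that belongs to a different child. The proximity check does not.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PositionGapDemo {
    /** Flatten children into one token stream, skipping `gap` positions between children. */
    static Map<String, List<Integer>> flatten(List<List<String>> children, int gap) {
        Map<String, List<Integer>> positions = new HashMap<>();
        int pos = 0;
        for (List<String> child : children) {
            for (String token : child) {
                positions.computeIfAbsent(token, k -> new ArrayList<>()).add(pos++);
            }
            pos += gap; // the big gap separating this child from the next one
        }
        return positions;
    }

    /** Simplified stand-in for SpanNearQuery: some a and b lie within maxDistance positions. */
    static boolean near(Map<String, List<Integer>> positions, String a, String b, int maxDistance) {
        for (int pa : positions.getOrDefault(a, List.of())) {
            for (int pb : positions.getOrDefault(b, List.of())) {
                if (Math.abs(pa - pb) <= maxDistance) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Plain flattening: both terms merely have to occur somewhere in the document. */
    static boolean both(Map<String, List<Integer>> positions, String a, String b) {
        return positions.containsKey(a) && positions.containsKey(b);
    }

    public static void main(String[] args) {
        // One resume with two work_experience children (tokens are illustrative)
        List<List<String>> resume = List.of(
                List.of("company_nnn", "year_2005"),
                List.of("company_xyz", "year_2010"));
        Map<String, List<Integer>> doc = flatten(resume, 100);
        System.out.println(both(doc, "company_nnn", "year_2010"));    // true: a false positive
        System.out.println(near(doc, "company_nnn", "year_2010", 5)); // false: different children
        System.out.println(near(doc, "company_nnn", "year_2005", 5)); // true: same child
    }
}
```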
Lucene does not support relationships between documents, but a join is nothing more than a specific combination of multiple ANDs within parentheses; you will need to flatten the relationship first, though.
Sample (SQL => Lucene):
SQL:
SELECT Order.* FROM Order
JOIN Customer ON Order.CustomerID = Customer.ID
WHERE Customer.Name = 'SomeName'
AND Order.Nr = 400
Lucene:
Make sure you have all the necessary fields and their respective values on the document, like:
Customer.Name => "Customer_Name" and
Order.Nr => "Order_Nr"
The query would then be:
( Customer_Name:"SomeName" AND Order_Nr:"400" )
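Flattening means denormalising at index time, so every order document carries a copy of its customer's fields. The Java sketch below uses plain maps in place of Lucene documents. Customer, Order and FlattenJoin are hypothetical names. The search applies the same AND condition as the query above.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FlattenJoin {
    record Customer(int id, String name) {}
    record Order(int nr, int customerId) {}

    /** Copy the customer's name onto every order, so each order becomes one self-contained document. */
    static List<Map<String, String>> flatten(List<Order> orders, Map<Integer, Customer> customersById) {
        return orders.stream()
                .map(o -> Map.of(
                        "Order_Nr", String.valueOf(o.nr()),
                        // assumes every order references an existing customer
                        "Customer_Name", customersById.get(o.customerId()).name()))
                .collect(Collectors.toList());
    }

    /** Same condition as ( Customer_Name:"SomeName" AND Order_Nr:"400" ). */
    static List<Map<String, String>> search(List<Map<String, String>> docs, String customerName, String orderNr) {
        return docs.stream()
                .filter(d -> customerName.equals(d.get("Customer_Name")) && orderNr.equals(d.get("Order_Nr")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<Integer, Customer> customers = Map.of(1, new Customer(1, "SomeName"), 2, new Customer(2, "Other"));
        List<Order> orders = List.of(new Order(400, 1), new Order(400, 2), new Order(401, 1));
        System.out.println(search(flatten(orders, customers), "SomeName", "400"));
    }
}
```

If a customer's name changes, every order document that contains a copy of it must be reindexed.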
https://issues.apache.org/jira/browse/SOLR-2272
Use JoinUtil. It allows query-time joins.
See: http://lucene.apache.org/core/4_0_0/join/org/apache/lucene/search/join/JoinUtil.html
A little late, but you could use the package org.apache.lucene.search.join: https://lucene.apache.org/core/6_3_0/join/org/apache/lucene/search/join/package-summary.html
From their documentation:
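The index-time joining support joins while searching, where joined documents are indexed as a single document block using IndexWriter.addDocuments().
The JoinUtil example given there, reproduced here, is the query-time variant instead: it collects the terms of the from field in a first pass and matches them against the to field in a second, so the documents do not need to be indexed as blocks.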
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.join.JoinUtil;
import org.apache.lucene.search.join.ScoreMode;
// searchTerm (String), fromSearcher and toSearcher (IndexSearcher) are assumed to be defined elsewhere
String fromField = "from"; // Name of the from field
boolean multipleValuesPerDocument = false; // Set to true only when your fromField has multiple values per document in your index
String toField = "to"; // Name of the to field
ScoreMode scoreMode = ScoreMode.Max; // Defines how the scores are translated into the other side of the join.
Query fromQuery = new TermQuery(new Term("content", searchTerm)); // Query executed to collect from values to join to the to values
Query joinQuery = JoinUtil.createJoinQuery(fromField, multipleValuesPerDocument, toField, fromQuery, fromSearcher, scoreMode);
TopDocs topDocs = toSearcher.search(joinQuery, 10); // Note: toSearcher can be the same as the fromSearcher
// Render topDocs...
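The Lucene join package documentation describes query-time joining as a two-pass search. The first pass collects the from-field terms of all documents that match fromQuery. The second pass returns the documents whose to field contains one of those terms. The plain-Java sketch below shows only that idea. It is not JoinUtil's implementation and it ignores scoring (ScoreMode). TwoPassJoin and its parameters are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class TwoPassJoin {
    static List<Map<String, String>> join(List<Map<String, String>> fromDocs, String fromField,
                                          Predicate<Map<String, String>> fromQuery,
                                          List<Map<String, String>> toDocs, String toField) {
        // Pass 1: values of fromField on every document matching fromQuery
        Set<String> joinValues = fromDocs.stream()
                .filter(fromQuery)
                .map(doc -> doc.get(fromField))
                .filter(Objects::nonNull)
                .collect(Collectors.toSet());
        // Pass 2: documents whose toField holds one of the collected values
        return toDocs.stream()
                .filter(doc -> {
                    String value = doc.get(toField);
                    return value != null && joinValues.contains(value);
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, String>> from = List.of(
                Map.of("content", "lucene join", "from", "p1"),
                Map.of("content", "something else", "from", "p2"));
        List<Map<String, String>> to = List.of(
                Map.of("to", "p1", "title", "A"),
                Map.of("to", "p2", "title", "B"));
        // Rough analogue of a TermQuery on content:"lucene"
        System.out.println(join(from, "from", doc -> doc.getOrDefault("content", "").contains("lucene"), to, "to"));
    }
}
```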
There are some implementations on top of Lucene that make these kinds of joins among several different indexes possible. Numere (http://numere.stela.org.br/) enables that and makes it possible to get results as an RDBMS result set.
Numere provides an easy way to extract analytical data from Lucene indexes. Here is an example, first as SQL:
select a.type, sum(a.value) as "sales", b.category, count(distinct b.product_id) as "total"
from a (index)
inner join b (index) on (a.seq_id = b.seq_id)
group by a.type, b.category
order by a.type asc, b.category asc
The same query built with the Numere Java API:
Join join = RequestFactory.newJoin();
// inner join a.seq_id = b.seq_id
join.on("seq_id", Type.INTEGER).equal("seq_id", Type.INTEGER);
// left
{
Request left = join.left();
left.repository(UtilTest.getPath("indexes/md/master"));
left.addColumn("type").textType().asc();
left.addMeasure("value").alias("sales").intType().sum();
}
// right
{
Request right = join.right();
right.repository(UtilTest.getPath("indexes/md/detail"));
right.addColumn("category").textType().asc();
right.addMeasure("product_id").intType().alias("total").count_distinct();
}
Processor processor = ProcessorFactory.newProcessor();
try {
ResultPacket result = processor.execute(join);
System.out.println(result);
} finally {
processor.close();
}
Result:
<?xml version='1.0' encoding='UTF-8' standalone='yes' ?>
<DATAPACKET Version="2.0">
<METADATA>
<FIELDS>
<FIELD attrname="type" fieldtype="string" WIDTH="20" />
<FIELD attrname="category" fieldtype="string" WIDTH="20" />
<FIELD attrname="sales" fieldtype="i8" />
<FIELD attrname="total" fieldtype="i4" />
</FIELDS>
<PARAMS />
</METADATA>
<ROWDATA>
<ROW type="Book" category="stand" sales="127003304" total="2" />
<ROW type="Computer" category="eletronic" sales="44765715835" total="896" />
<ROW type="Meat" category="food" sales="3193526428" total="110" />
... continue