Java 8 parallelStream forEach iteration [duplicate]

Java code like:
List<Detail> dbDetails = ...; // around 50,000 rows from the database
Map<Long, List<Detail>> details = new HashMap<>();
dbDetails.parallelStream().forEach(detail -> {
    Long id = detail.getId();
    details.computeIfAbsent(id, v -> new ArrayList<>()).add(detail);
});
Then:
details.entrySet().stream().forEach(e -> {
    e.getValue(); // some of the lists are unexpectedly empty
});
I guess it is because HashMap is not thread-safe, so I used Hashtable instead. Then it ran fine and every list had its values, but I don't know why. Can someone explain?

HashMap is not thread-safe, so don't use parallel streams with it.
Besides, why do the grouping by hand when streams can do it for you?
dbDetails.parallelStream().collect(Collectors.groupingBy(Detail::getId))
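As a sketch of what that looks like end to end (Detail and getId are from the question; Collectors is java.util.stream.Collectors and ConcurrentMap is java.util.concurrent.ConcurrentMap):
// each worker thread fills its own intermediate map; the framework merges them
Map<Long, List<Detail>> byId = dbDetails.parallelStream()
        .collect(Collectors.groupingBy(Detail::getId));
// concurrent variant: collects into a single ConcurrentMap instead of merging
ConcurrentMap<Long, List<Detail>> byIdConcurrent = dbDetails.parallelStream()
        .collect(Collectors.groupingByConcurrent(Detail::getId));
This is why groupingBy is safe even on a parallel stream: the per-thread containers are combined after the fact, which is exactly the step the manual forEach into one shared HashMap is missing.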

Related

Applying the `collect()` of a stream for counting. Exercise

I am trying to create a custom collector in order to count the valid elements of a list. I have done it using one of the already provided operations:
arr.stream()
.filter(e -> e.matches("[:;][~-]?[)D]"))
.map(e -> 1)
.reduce(0, Integer::sum);
but as a challenge for myself, I wanted to create my own custom collector in order to understand it better. And this is where I got stuck.
It is probably something trivial, but I am learning this and can't figure out the supplier, the accumulator, and the combiner. I guess I still don't understand something about them. For instance, I have a similar stream:
arr1.stream()
.filter(e -> e.matches("[:;][~-]?[)D]"))
.map(e -> 1)
.collect(temporary array, adding to array, reduce);
AFAIK a supplier is a function without arguments which returns something. I studied standard examples and it is usually a method reference for a new collection, e.g. ArrayList::new. I tried to use the constant 0 (() -> 0) because I want to return a scalar. I think it is wrong because it makes the stream return 0. If I use a method reference for a temporary collection, Java complains about a mismatch between the types of the supplier and the returned object. I am also confused about using an accumulator if the final result is a number, as a combiner would reduce all elements to a number, e.g. (a, b) -> a + b.
I'm completely stumped.
Part of your problem is probably that you cannot create an accumulator for an Integer, since Integer is immutable.
You start with this:
System.out.println(IntStream.of(1,2,3).reduce(0, Integer::sum));
You can extend to this:
System.out.println(IntStream.of(1,2,3).boxed()
.collect(Collectors.reducing(0, (i1,i2)->i1+i2)));
Or even this, which has an intermediate mapping function:
System.out.println(IntStream.of(1,2,3).boxed()
.collect(Collectors.reducing(0, i->i*2, (i1,i2)->i1+i2)));
You can get this far with your own Collector:
Collector<Integer, Integer, Integer> myctry = Collector.of(
    () -> 0,
    (i1, i2) -> {
        // what to do here?
    },
    (i1, i2) -> {
        return i1 + i2;
    }
);
The accumulator is "a function that folds a value into a mutable result container", with mutable being the key word here.
So, make a mutable integer:
public class MutableInteger {
    private int value;

    public MutableInteger(int value) {
        this.value = value;
    }

    public void set(int value) {
        this.value = value;
    }

    public int intValue() {
        return value;
    }
}
And now:
Collector<MutableInteger, MutableInteger, MutableInteger> myc = Collector.of(
    () -> new MutableInteger(0),
    (i1, i2) -> {
        i1.set(i1.intValue() + i2.intValue());
    },
    (i1, i2) -> {
        i1.set(i1.intValue() + i2.intValue());
        return i1;
    }
);
And then:
System.out.println(IntStream.of(1,2,3)
.mapToObj(MutableInteger::new)
.collect(myc).intValue());
Reference:
Example of stream reduction with distinct combiner and accumulator
EDIT: The finisher just performs a final transformation on the result container. If you don't set one explicitly, the collector gets the IDENTITY_FINISH characteristic, i.e. Function.identity(), which simply returns the final result container as is.
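For example, here is a minimal sketch of the same collector with an explicit finisher, reusing the MutableInteger above so the caller gets a plain Integer back instead of the mutable container (this uses the four-argument Collector.of overload: supplier, accumulator, combiner, finisher):
Collector<MutableInteger, MutableInteger, Integer> mycf = Collector.of(
    () -> new MutableInteger(0),                                  // supplier: fresh container
    (acc, v) -> acc.set(acc.intValue() + v.intValue()),           // accumulator: fold a value in
    (a, b) -> { a.set(a.intValue() + b.intValue()); return a; },  // combiner: merge two containers
    MutableInteger::intValue                                      // finisher: unwrap to Integer
);
System.out.println(IntStream.of(1, 2, 3).mapToObj(MutableInteger::new).collect(mycf)); // prints 6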
EDIT: If you're really desperate:
Collector<int[], int[], int[]> mycai = Collector.of(
    () -> new int[1],
    (i1, i2) -> i1[0] += i2[0],
    (i1, i2) -> { i1[0] += i2[0]; return i1; }
);
System.out.println(IntStream.of(1, 2, 3)
    .mapToObj(v -> {
        int[] i = new int[1];
        i[0] = v;
        return i;
    })
    .collect(mycai)[0]);
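For what it's worth, once the mechanics are clear you rarely need to hand-roll this: the counting job from the question is already covered by the built-in Collectors.counting() (or Collectors.summingInt):
System.out.println(arr.stream()
    .filter(e -> e.matches("[:;][~-]?[)D]"))
    .collect(Collectors.counting())); // returns the count as a Long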

How can I use a HashMap of List of String in Vala?

I am trying to use a HashMap of Lists of strings in Vala, but it seems the object lifecycle is biting me. Here is my current code:
public class MyClass : CodeVisitor {
    GLib.HashTable<string, GLib.List<string>> generic_classes =
        new GLib.HashTable<string, GLib.List<string>> (str_hash, str_equal);

    public override void visit_data_type (DataType d) {
        string c = ...
        string s = ...
        if (!this.generic_classes.contains (c)) {
            this.generic_classes.insert (c, new GLib.List<string> ());
        }
        unowned GLib.List<string> l = this.generic_classes.lookup (c);
        bool is_dup = false;
        foreach (unowned string ss in l) {
            if (s == ss) {
                is_dup = true;
            }
        }
        if (!is_dup) {
            l.append (s);
        }
    }
}
Note that I am adding a string value into the list associated with a string key. If the list doesn't exist, I create it.
Let's say I run the code with the same values of c and s three times. Based on some printf debugging, it seems that only one list is created, yet each time it is empty. I'd expect the list to have size 0, then 1, and then 1. Am I doing something wrong when it comes to Vala memory management/reference counting?
GLib.List is the problem here. Most operations on GLib.List and GLib.SList return a modified pointer, but the value in the hash table isn't modified. It's a bit hard to explain why that is a problem, and why GLib works that way, without diving down into the C. You have a few choices here.
Use one of the containers in libgee which support multiple values for the same key, like Gee.MultiMap. If you're working on something in the Vala compiler (which I'm guessing you are, as you're subclassing CodeVisitor), this isn't an option, because the internal copy of Gee that Vala ships with doesn't include MultiMap.
Replace the GLib.List instances in the hash table. Unfortunately this is likely going to mean copying the whole list every time, and even then getting the memory management right would be a bit tricky, so I would avoid it if I were you.
Use something other than GLib.List. This is the way I would go if I were you.
Edit: I recently added GLib.GenericSet to Vala as an alternative binding for GHashTable, so the best solution now would be to use GLib.HashTable<string, GLib.GenericSet<string>>, assuming you're okay with depending on vala >= 0.26.
If I were you, I would use GLib.HashTable<string, GLib.HashTable<unowned string, string>>:
private static int main (string[] args) {
    GLib.HashTable<string, GLib.HashTable<unowned string, string>> generic_classes =
        new GLib.HashTable<string, GLib.HashTable<unowned string, string>> (GLib.str_hash, GLib.str_equal);
    for (int i = 0 ; i < 3 ; i++) {
        string c = "foo";
        string s = i.to_string ();
        unowned GLib.HashTable<unowned string, string>? inner_set = generic_classes[c];
        stdout.printf ("Inserting <%s, %s>, ", c, s);
        if (inner_set == null) {
            var v = new GLib.HashTable<unowned string, string> (GLib.str_hash, GLib.str_equal);
            inner_set = v;
            generic_classes.insert ((owned) c, (owned) v);
        }
        inner_set.insert (s, (owned) s);
        stdout.printf ("container now holds:\n");
        generic_classes.foreach ((k, v) => {
            stdout.printf ("\t%s:\n", k);
            v.foreach ((ik, iv) => {
                stdout.printf ("\t\t%s\n", iv);
            });
        });
    }
    return 0;
}
It may seem hackish to have a hash table with the key and value having the same value, but this is actually a common pattern in C as well, and specifically supported by GLib's hash table implementation.
Moral of the story: don't use GLib.List or GLib.SList unless you really know what you're doing, and even then it's generally best to avoid them. TBH we probably would have marked them as deprecated in Vala long ago if it weren't for the fact that they're very common when working with C APIs.
Vala's new can be a little weird when used as a parameter. I would recommend assigning the new list to a temporary variable, adding that to the hash table, then letting the temporary go out of scope.
I would also recommend using libgee. It has better handling of generics than GLib.List and GLib.HashTable.

Passing query parameters in Dapper using OleDb

This query produces the error "No value given for one or more required parameters":
using (var conn = new OleDbConnection("Provider=..."))
{
    conn.Open();
    var result = conn.Query(
        "select code, name from mytable where id = ? order by name",
        new { id = 1 });
}
If I change the query string to ... where id = @id ..., I will get the error: Must declare the scalar variable "@id".
How do I construct the query string and how do I pass the parameter?
The following should work:
var result = conn.Query(
    "select code, name from mytable where id = ?id? order by name",
    new { id = 1 });
Important: see newer answer
In the current build, the answer to that would be "no", for two reasons:
the code attempts to filter unused parameters - and is currently removing all of them, because it can't find anything like @id, :id or ?id in the sql
the code for adding values from types uses an arbitrary (well, ok: alphabetical) order for the parameters (because reflection does not make any guarantees about the order of members), making positional anonymous arguments unstable
The good news is that both of these are fixable:
we can make the filtering behaviour conditional
we can detect the category of types that has a constructor that matches all the property names, and use the constructor argument positions to determine the synthetic order of the properties - anonymous types fall into this category
Making those changes to my local clone, the following now passes:
// see https://stackoverflow.com/q/18847510/23354
public void TestOleDbParameters()
{
    using (var conn = new System.Data.OleDb.OleDbConnection(
        Program.OleDbConnectionString))
    {
        var row = conn.Query("select Id = ?, Age = ?", new DynamicParameters(
            new { foo = 12, bar = 23 } // these names DO NOT MATTER!!!
        ) { RemoveUnused = false }).Single();
        int age = row.Age;
        int id = row.Id;
        age.IsEqualTo(23);
        id.IsEqualTo(12);
    }
}
Note that I'm currently using DynamicParameters here to avoid adding even more overloads to Query / Query<T> - because this would need to be added to a considerable number of methods. Adding it to DynamicParameters solves it in one place.
I'm open to feedback before I push this - does that look usable to you?
Edit: with the addition of a funky smellsLikeOleDb (no, not a joke), we can now do this even more directly:
// see https://stackoverflow.com/q/18847510/23354
public void TestOleDbParameters()
{
    using (var conn = new System.Data.OleDb.OleDbConnection(
        Program.OleDbConnectionString))
    {
        var row = conn.Query("select Id = ?, Age = ?",
            new { foo = 12, bar = 23 } // these names DO NOT MATTER!!!
        ).Single();
        int age = row.Age;
        int id = row.Id;
        age.IsEqualTo(23);
        id.IsEqualTo(12);
    }
}
I've been trialing the use of Dapper within my software product, which is using ODBC connections (at the moment). However, one day I intend to move away from ODBC and use a different pattern for supporting different RDBMS products. My problem with the solution implementation is twofold:
I want to write SQL code with parameters that conform to different back-ends, and so I want to be writing named parameters in my SQL now so that I don't have to go back and re-do it later.
I don't want to rely on getting the order of my properties in line with my ?s. This is bad. So my suggestion is to please add support for named parameters for ODBC.
In the meantime I have hacked together a solution that allows me to do this with Dapper. Essentially I have a routine that replaces the named parameters with ? and also rebuilds the parameter object, making sure the parameters are in the correct order.
However, looking at the Dapper code, I can see that I've repeated some of what Dapper is doing anyway; effectively each parameter value is now visited once more than would be necessary. This becomes more of an issue for bulk updates/inserts.
But at least it seems to work for me OK...
I borrowed a bit of code from here to form part of my solution...
The ? for parameters was part of the solution for me, but it only works with integers, like ID. It still fails for strings, because the parameter length isn't specified.
OdbcException: ERROR [HY104] [Microsoft][ODBC Microsoft Access Driver]Invalid precision value
System.Data.Odbc.OdbcParameter.Bind(OdbcStatementHandle hstmt, OdbcCommand command, short ordinal, CNativeBuffer parameterBuffer, bool allowReentrance)
System.Data.Odbc.OdbcParameterCollection.Bind(OdbcCommand command, CMDWrapper cmdWrapper, CNativeBuffer parameterBuffer)
System.Data.Odbc.OdbcCommand.ExecuteReaderObject(CommandBehavior behavior, string method, bool needReader, object[] methodArguments, SQL_API odbcApiMethod)
System.Data.Odbc.OdbcCommand.ExecuteReaderObject(CommandBehavior behavior, string method, bool needReader)
System.Data.Common.DbCommand.ExecuteDbDataReaderAsync(CommandBehavior behavior, CancellationToken cancellationToken)
Dapper.SqlMapper.QueryAsync(IDbConnection cnn, Type effectiveType, CommandDefinition command) in SqlMapper.Async.cs
WebAPI.DataAccess.CustomerRepository.GetByState(string state) in Repository.cs
var result = await conn.QueryAsync(sQuery, new { State = state });
WebAPI.Controllers.CustomerController.GetByState(string state) in CustomerController.cs
return await _customerRepo.GetByState(state);
For Dapper to pass string parameters to ODBC I had to specify the length.
var result = await conn.QueryAsync<Customer>(sQuery, new { State = new DbString { Value = state, IsFixedLength = true, Length = 4} });

Multi-level join in Solr

Hi, I have data in a 3-level tree structure. Can I use a Solr JOIN to get the root node when the user searches the 3rd-level node?
For example:
PATIENT1
    -> FirstName1
    -> LastName1
    -> DOCUMENTS1_1
        -> document_type1_1
        -> document_description1_1
        -> document_value1_1
        -> CODE_ITEMS1_1_1
            -> Code_id1_1_1
            -> code1_1_1
        -> CODE_ITEMS1_1_2
            -> Code_id1_1_2
            -> code1_1_2
    -> DOCUMENTS1_2
        -> document_type1_2
        -> document_description1_2
        -> document_value1_2
        -> CODE_ITEMS1_2_1
            -> Code_id1_2_1
            -> code1_2_1
        -> CODE_ITEMS1_2_2
            -> Code_id1_2_2
            -> code1_2_2
PATIENT2
    -> FirstName2
    -> LastName2
    -> DOCUMENTS2_1
        -> document_type2_1
        -> document_description2_1
        -> document_value2_1
        -> CODE_ITEMS2_1_1
            -> Code_id2_1_1
            -> code2_1_1
        -> CODE_ITEMS2_1_2
            -> Code_id2_1_2
            -> code2_1_2
I want to search a CODE_ITEM and return all the patients that match the code item search criteria. How can this be done? Is it possible to apply a join twice, so that the first join gives all the documents for the code_item search and the next join gives all the patients?
Something like this SQL query:
select * from patients where docID IN (select DOCID from DOCUMENTS where CODEID IN (select CODEID from CODE_ITEMS where CODE LIKE '%SEARCH_TEXT%'))
I really don't know how Solr joins work internally, but knowing that multiple joins in an RDB are extremely inefficient on large data sets, I'd probably end up writing my own org.apache.solr.handler.component.QueryComponent that would, after doing the normal search, fetch the root parent (of course, this approach requires that each child doc has a reference to its root patient).
If you choose to go this path I'll post some examples. I had a similar (more complex - ontology) problem in one of my previous Solr projects.
The simpler way to go (simpler when it comes to solving this problem, not the whole approach) is to completely flatten this part of your schema, store all the information (documents and code items) in the parent patient, and just do a regular search. This is more in line with Solr: you have to look at a Solr schema in a different way. It's nothing like your regular normalized RDB schema; Solr encourages data redundancy so that you can search blindingly fast without joins.
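To make the flattening concrete, each patient would be indexed as a single document carrying all of its descendants' values; a rough sketch (the field names here are made up, and the repeated fields would be declared multiValued="true" in schema.xml):
<add>
  <doc>
    <field name="patient_ID">PATIENT1</field>
    <field name="first_name">FirstName1</field>
    <field name="last_name">LastName1</field>
    <field name="document_type">document_type1_1</field>
    <field name="document_type">document_type1_2</field>
    <field name="code">code1_1_1</field>
    <field name="code">code1_2_1</field>
  </doc>
</add>
A plain query like code:SEARCH_TEXT then returns the patient directly, with no join at all.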
The third approach would be to do some join testing on representative data sets and see how search performance is affected.
In the end, it really depends on your whole setup and requirements (and test results, of course).
EDIT 1:
I did this a couple of years back, so you'll have to figure out whether things have changed in the meantime.
1. Create custom request handler
To do a completely clean job, I suggest you define your own request handler (in solrconfig.xml) by simply copying the whole section that starts with
<requestHandler name="/select" class="solr.SearchHandler">
...
...
</requestHandler>
and then changing the name to something meaningful for your users, e.g. /searchPatients.
Also, add this part inside:
<arr name="components">
    <str>patients</str>
    <str>facet</str>
    <str>mlt</str>
    <str>highlight</str>
    <str>stats</str>
    <str>debug</str>
</arr>
2. Create custom search component
Add this to your solrconfig:
<searchComponent name="patients" class="org.apache.solr.handler.component.PatientQueryComponent"/>
Create PatientQueryComponent class:
The following source probably has errors, since I edited my original source in a text editor and posted it without testing, but the important thing is that you get the recipe, not a finished source, right? I threw out caching, lazy loading, and the prepare method, and left only the basic logic. You'll have to see how performance is affected and then tweak the source if needed. My performance was fine, but I had a couple of million documents in total in my index.
public class PatientQueryComponent extends SearchComponent {
    ...
    @Override
    public void process(ResponseBuilder rb) throws IOException {
        SolrQueryRequest req = rb.req;
        SolrQueryResponse rsp = rb.rsp;
        SolrParams params = req.getParams();
        if (!params.getBool(COMPONENT_NAME, true)) {
            return;
        }
        searcher = req.getSearcher();
        // -1 as flag if not set.
        long timeAllowed = (long) params.getInt(CommonParams.TIME_ALLOWED, -1);
        DocList initialSearchList = null;
        SolrIndexSearcher.QueryCommand cmd = rb.getQueryCommand();
        cmd.setTimeAllowed(timeAllowed);
        cmd.setSupersetMaxDoc(UNLIMITED_MAX_COUNT);

        // fire the standard query
        SolrIndexSearcher.QueryResult result = new SolrIndexSearcher.QueryResult();
        searcher.search(result, cmd);
        initialSearchList = result.getDocList();

        // list that will hold the patient IDs found in the matching child docs
        List<String> patientIds = new ArrayList<String>();
        DocIterator iterator = initialSearchList.iterator();
        int id;

        // loop through the search results
        while (iterator.hasNext()) {
            // add your own filtering logic here (doc type, ...)
            id = iterator.nextDoc();
            Document doc = searcher.doc(id); // you can try lazy field loading and fetch only the patientID field
            String patientId = doc.get("patientID"); // field in the child doc that points to its root parent - the patient
            patientIds.add(patientId);
        }

        // add all unique patient IDs to a TermsFilter
        TermsFilter termsFilter = new TermsFilter();
        Term term;
        for (String pid : patientIds) {
            term = new Term("patient_ID", pid); // field that's unique to the patient doc and holds the patient ID
            termsFilter.addTerm(term);
        }

        // get all patients whose ID is in the TermsFilter
        DocList patientsList = searcher.getDocList(new MatchAllDocsQuery(), searcher.convertFilter(termsFilter), null, 0, 1000);
        long totalSize = initialSearchList.size() + patientsList.size();
        logger.info("Total: " + totalSize);

        SolrDocumentList solrResultList = SolrPluginUtils.docListToSolrDocumentList(patientsList, searcher, null, null);
        SolrDocumentList solrInitialList = SolrPluginUtils.docListToSolrDocumentList(initialSearchList, searcher, null, null);

        // append the patients to the end of the result list
        for (SolrDocument parent : solrResultList) {
            solrInitialList.add(parent);
        }

        // replace the initial results in the response
        SolrPluginUtils.addOrReplaceResults(rsp, solrInitialList);
        rsp.addToLog("hitsRef", patientsList.size());
        rb.setResult(result);
    }
}
Take a look at this post: http://blog.griddynamics.com/2013/12/grandchildren-and-siblings-with-block.html
Actually, you can do it in Solr 4.5 with block joins.
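For reference, block joins index the children nested inside their parent, and the parent query parser then maps child hits back to parents. A sketch of such a query, assuming the parent docs are flagged with a field like doc_type:patient (the field names are illustrative, not from the question):
q={!parent which="doc_type:patient"}code:SEARCH_TEXT
Here which tells Solr how to recognize parent documents, and the query after the closing brace is run against the children.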

Find duplicate values in an arraylist without using LINQ

I have an ArrayList with a few duplicate items. I need to know the count of each duplicated item. I am using .NET 2.0, so I cannot use LINQ.
I had posted a similar question earlier, but my question was not clear.
Thanks
Prady
I've done something similar in the past. My solution was to loop through the ArrayList and store the counts in a dictionary, then loop through the dictionary to display the results:
ArrayList list = new ArrayList();
list.Add(1);
list.Add("test");
list.Add("test");
list.Add("test");
list.Add(2);
list.Add(3);
list.Add(2);
Dictionary<Object, int> itemCount = new Dictionary<object, int>();
foreach (object o in list)
{
    if (itemCount.ContainsKey(o))
        itemCount[o]++;
    else
        itemCount.Add(o, 1);
}
foreach (KeyValuePair<Object, int> item in itemCount)
{
    if (item.Value > 1)
        Console.WriteLine(item.Key + " count: " + item.Value);
}
Output:
test count: 3
2 count: 2
Edit
Realized I used the var keyword which is not a 2.0 feature. Replaced it with KeyValuePair.
Option 1: Sort the list and then count adjacent equal items (requires you to override the Equals method for your class).
Option 2: Use your unique identifier (however you're defining two objects to be equal) as the key for a Dictionary and add each of your objects to that entry.
I needed something similar for a project a long time ago and made a function for it:
static Dictionary<object, int> GetDuplicates(ArrayList list, out ArrayList uniqueList)
{
    uniqueList = new ArrayList();
    Dictionary<object, int> dups = new Dictionary<object, int>();
    foreach (object o in list)
    {
        if (uniqueList.Contains(o))
        {
            if (!dups.ContainsKey(o))
                dups.Add(o, 2);
            else
                dups[o]++;
        }
        else
        {
            uniqueList.Add(o);
        }
    }
    return dups;
}
