Updating document in solr cloud 4.4 - solr4

I have a solr cloud 4.4 set up with two shards.The shards are in two different machine. I have successfully indexed documents using CloudSolrServer of solr client 4.2.0. However while updating the documents some of the documents are getting updated while others are not getting updated and the exception is "org.apache.solr.common.SolrException: No active slice servicing hash code f02c79de in DocCollection(myindex)={" where "myindex" is the name of my index.
The code for updating the documents is as follows:
CloudSolrServer server = new CloudSolrServer("myip:9983");
server.setDefaultCollection("myindex");
List<SolrInputDocument> solrInputDocsList = new ArrayList<SolrInputDocument>();
SolrInputDocument solrInputDoc = new SolrInputDocument();
List<String> servicesList = new ArrayList<String>();
servicesList.add("hockey1");
servicesList.add("blogger1");
Map<String, List<String>> services = new HashMap<String, List<String>>();
services.put("set", servicesList);
solrInputDoc.addField("services",services);
solrInputDoc.addField("id",id);
solrInputDocsList.add(solrInputDoc);
server.add(solrInputDocsList);
server.commit();
My Question are:
1) When document are indexed how it is decides which document goes to which shard.
2) Why the update for some of the documents are failing.

For question 1, check this article
http://searchhub.org/2013/06/13/solr-cloud-document-routing/
The idea is that you can tell SolrCloud to put some documents together by generating the Id field following this pattern
shard_key!document_id
Everything that shares the shard_key will be placed in the same shard
For question 2, I had the same problem and I fixed it by cleaning the configuration Zookeeper had stored
Regards,
Randall Rosales

Related

Phonograph2:ReadOnlyTables error in Slate writeback query

I'm working with a sample dataset of airports as I continue exploring Slate features for my team. I copied the default airport dataset into my files, so this is a version that I fully own (so presumably no permission issues there). The dataset is properly available in my Slate application since I'm also using it to display and filter data via Phonograph2 queries.
Based on the Phonograph2 docs, I created a new query to add a new airport to the dataset. I'm using the "Table Storage Service" and the "Post Event" endpoint. As a test, I configured my tableEditedEventPostRequest request as:
{
"primaryKey": {
"airport": "ABC"
},
"payload": {
"type": "rowAdded",
"rowAdded": {
"columns": {
"display_name": "[ABC] My New Airport"
}
}
}
}
(Once I get this working I'd switch the values out with dynamic values from widgets.)
When I run a test of this query, I get this error response:
{
"errorCode":"INVALID_ARGUMENT",
"errorName":"Phonograph2:ReadOnlyTables",
"errorInstanceId":"17ec990d-5d58-479d-a1b6-5ad033c8c808",
"parameters":{
"tableRids":"[ri.phonograph2.main.table.f3f33f6e-801a-4454-98e9-f2df5f170559]",
"dataInputLocatorRids":"[ri.foundry.main.dataset.6add7c46-d3c9-4056-89b6-a19dbe461ed4]"
}
}
I'm not finding anything about this error or anything in the docs (so far) about the target dataset being configured as read-only. There aren't settings I can find on the dataset to make it more permissive and I'm already the owner of the dataset. I'd appreciate any insights or tips to get past this road block.
For a Phonograph Table to be "editable" it needs to be associated with a writeback dataset. If you created the sync through the Ontology, which it seems like you did not, you would do this on the "Datasources" configuration tab.
Since it sounds like you created the sync directly from the Dataset Details view (or maybe through the Slate Datasets tab), you should have an option in that configuration to create a new dataset for writeback. All you should need to do is provide a dataset name and folder location.

Why is lastModifiedTime in DistributedRegionMXBean is not updated if data is deleted from a region?

When I add an entry to a region or when a server is created, my region statistics(Eg. lastModifiedTime) gets updated. But when I delete an entry, my region's lastModifiedTime doesn't change.
Is this a bug with Gemfire's statistics?
While creating region, I also set enable-statistics=true
I am using below command with GFSH
show metrics --region=region_name
With code, I have connected using JMX,
final DistributedRegionMXBean rBean = JMX.newMXBeanProxy(mbeanConnection, objName,DistributedRegionMXBean.class);
final long modifiedTime = rBean.getLastModifiedTime();
Basically I am trying get modified regions by specifying a datetime.
I know this is an old question, but I've just tried this locally with the latest Apache Geode version and it seems to be failing as well. I've created GEODE-7764 to track the issue.
Cheers.

BuildHTTPClient not able to get Build Definition Steps?

We are using the BuildHTTPClient to programmatically create a copy of a build definition, update the variables in memory and then save the updated object as a new definition.
I'm using Microsoft.TeamFoundation.Build2.WebApi.BuildHTTPClient 16.141. The TFS version is 17 update 3 (rest api 3.x)
This is a similar question to https://serverfault.com/questions/799607/tfs-buildhttpclient-updatedefinition-c-example but I'm trying to stay within using the BuildHttpClient libraries and not go directly to the RestAPIs.
The problem is the Steps list is always null along with other properties even though we have them in the build definition.
UPDATE Posted as an answer below
After looking at #Daniel Frosts attempt below we started looking at using older versions of the NuGet package. Surprisingly the supported version 15.131.1 does not support this but we have found out that the version="15.112.0-preview" does.
After rolling back all of our Dlls to match that version the steps were cloned when saving the new copy of the build.
All of the code examples we used work when you are using this package. We were unable to get Daniel's example working but the version of the Dll was the issue.
We need to create a GitHub issue and report it to MS
First Attempt - GetDefinitionAsync:
VssConnection connection = new VssConnection(DefinitionTypesDTO.serverUrl, new VssCredentials());
BuildHttpClient bdClient = connection.GetClient<BuildHttpClient>();
Task <BuildDefinition> resultDef = bdClient.GetDefinitionAsync(DefinitionTypesDTO.teamProjectName, buildID);
resultDef.Wait();
BuildDefinition updatedDefinition = UpdateBuildDefinitionValues(resultDef.Result, dr, defName);
updatedTask = bdClient.CreateDefinitionAsync(updatedDefinition, DefinitionTypesDTO.teamProjectName);
The update works on the variables and we can save the updated definition back to TFS but there are not any tasks in the newly created build definition. When we look at the object that is returned from GetDefinitionAsync we see that the Steps list is empty. It looks like GetDefinitionAsync just doesn't get the full object.
Second Attempt - Specific Revision:
int rev = 9;
Task <BuildDefinition> resultDef = bdClient.GetDefinitionAsync(DefinitionTypesDTO.teamProjectName, buildID, revision: rev);
resultDef.Wait();
BuildDefinition updatedDefinition = UpdateBuildDefinitionValues(resultDef.Result, dr, defName);
Based on SteveSims post we were thinking we are not getting the correct revision. So we added revision to the request. I see the same issue with the correct revision. Similarly to SteveSims post I can open the DefinitionURL in a browser and I see that the tasks are in the JSON in the browser but the BuildDefinition object is not populated with them.
Third Attempt - GetFullDefinition:
So then I thought to try getFullDefinition, maybe that's that "Full" means of course with out any documentation on these libraries I have no idea.
var task2 = bdClient.GetFullDefinitionsAsync(DefinitionTypesDTO.teamProjectName, "MyBuildDefName","$/","TfsVersionControl");
task2.Wait();
Still no luck, the Steps list is always null even though we have steps in the build definition.
Fourth Attempt - Save As Template
var task2 = bdClient.GetTemplateAsync DefinitionTypesDTO.teamProjectName, "1_Batch_Dev");
task2.Wait();
I tried saving the Build Definition off as a template. So in the Web UI I chose "Save as Template", still no steps.
Fifth Attempt: Using the URL as mentioned in SteveSims post:
Finally i said ok, i'll try the solution SteveSims used, using the webclient to get the object from the URL.
var client = new WebClient();
client.UseDefaultCredentials = true;
var json = client.DownloadString(LastDefinitionUrl);
//Convert the JSON to an actual builddefinition
BuildDefinition result = JsonConvert.DeserializeObject<BuildDefinition>(json);
This also didn't work. The build definition steps are null. Even when looking at the Json object (var json) i see the steps. But the object is not loaded with them.
I've seen this post which seems to add the Steps to the base definition, i've tried this but honestly I'm having an issue understanding how he has modified the BuildDefinition Object when referencing that via NuGet?
https://dennisdel.com/blog/getting-build-steps-with-visual-studio-team-services-.net-api/
After looking at #Daniel Frosts attempt below we started looking at using older versions of the NuGet package. Surprisingly the supported version 15.131.1 does not support this but we have found out that the version="15.112.0-preview" does.
After rolling back all of our Dlls to match that version the steps were cloned when saving the new copy of the build.
All of the code examples above work when you are using this package. We were unable to get Daniel's example working but we didn't try hard as we had working code.
We need to create a GitHub issue for this.
Found this in my code, which works.
Use this package, not sure if it could have an impact (joke).
...packages\Microsoft.TeamFoundationServer.Client.15.112.1\lib\net45\Microsoft.TeamFoundation.Build2.WebApi.dll
private Microsoft.TeamFoundation.Build.WebApi.BuildDefinition GetBuildDefinition(string projectName, string buildDefinitionName)
{
var buildDefinitionReferences = _buildHttpClient.GetFullDefinitionsAsync(projectName, "*", null, null, DefinitionQueryOrder.DefinitionNameAscending, top: 1000).Result;
return buildDefinitionReferences.SingleOrDefault(x => x.Name == buildDefinitionName && x.DefinitionQuality != DefinitionQuality.Draft);
}
With the newer clients Steps will always be empty. In newer api-versions (which are used by the newer clients) the steps have moved to Phases. If you use GetDefinitions or GetFullDefinitions and look in
definition.Process.Phases[0].Steps
you'll find them. (GetDefinitions gets shallow references so the process won't be included.)
The Steps collection still exists for compatibility reasons (we don't want apps to crash with stuff like MethodNotFoundExceptions) but it won't be populated.
I was having this problem, although I able to get Phases[0] information at runtime, but could not get it at design time. I solved this problem using dynamic type.
dynamic process = buildDefTemplate.Process;
foreach (BuildDefinitionStep tempStep in process.Phases[0].Steps)
{
// do some work here
}
Not, it is working!
Microsoft.TeamFoundationServer.Client version 16.170.0 I can get build steps through process.Phases[0].Steps only with process and step being dynamic as #whitecore above stated
var definitions = buildClient.GetFullDefinitionsAsync(project: project.Name);
foreach (var definition in definitions.Result)
{
Console.WriteLine(string.Format("\n {0} - {1}:", definition.Id, definition.Name));
dynamic process = definition.Process;
foreach (dynamic step in process.Phases[0].Steps)
{
Console.WriteLine(step.DisplayName);
}
}

Neo4j Java VM Tuning (v2.3 Community)

From what I can tell I'm having an issue with my Neo4j v2.3 Community Java VM adding items to the Old Gen Heap and never being able to Garbage Collecting them.
Here is a detailed outline of the situation.
I have a PHP file which calls the Dropbox Delta API and writes out the file structure to my Neo4j Database. Each call to Delta returns a 2000 Item data sets of which I pull out the information I need, the following is an example of what that query looks like with just one item, usually I send in full batches of 2000 items as it gave me the best results.
***Following is an example Query***
MERGE (c:Cloud { type:'Dropbox', id_user:'15', id_account:''})
WITH c
UNWIND [
{ parent_shared_folder_id:488417928, rev:'15e1d1caa88',.......}
]
AS items MERGE (i:Item { id:items.path, id_account:'', id_user:'15', type:'Dropbox' })
ON Create SET i = { id:items.path, id_account:'', id_user:'15', is_dir:items.is_dir, name:items.name, description:items.description, size:items.size, created_at:items.created_at, modified:items.modified, processed:1446769779, type:'Dropbox'}
ON Match SET i+= { id:items.path, id_account:'', id_user:'15', is_dir:items.is_dir, name:items.name, description:items.description, size:items.size, created_at:items.created_at, modified:items.modified, processed:1446769779, type:'Dropbox'}
MERGE (p:Item {id_user:'15', id:items.parentPath, id_account:'', type:'Dropbox'})
MERGE (p)-[:Contains]->(i)
MERGE (c)-[:Owns]->(i)
***The query is sent via Everyman****
static function makeQuery($client, $qry) {
return new Everyman\Neo4j\Cypher\Query($client, $qry);
}
This works fine and generally from start to finish takes 8-10 seconds to run.
The Dropbox account I am accessing contains around 35000 items, and takes around 18 runs of my PHP to populate my Neo4j Database with the folder/file structure of the dropbox account.
With every run of this PHP, around 50mb of items are added to the Neo4j JVM Old Gen heap, 30mb of that is not removed by GC.
The end result is obviously the VM runs out of memory and gets stuck in a constant state of GC throttling.
I have tried a range of Neo4j VM settings, as well as an update from Neo4j v2.2.5 to v2.3, which actually has appeared to make the problem worse.
My current settings are as follows,
-server
-Xms4096m
-Xmx4096m
-XX:NewSize=3072m
-XX:MaxNewSize=3072m
-XX:SurvivorRatio=1
I am testing on a windows 10 PC with 8GB of ram and an i5 2.5GHz quad core. Java 1.8.0_60
Any info on how to solve this issue would be greatly appreciated.
Cheers, Jack.
Reduce the new size to 1024m
change your settings to:
-server
-Xms4096m
-Xmx4096m
-XX:NewSize=1024m
It is most likely that the size of your tx grows too large.
I recommend sending each of the parents in separately, so instead of the UNWIND sent one statement each.
Make sure to use the new transactional http endpoint, I recommend to go wit neoclient instead of Neo4jPHP
You should also use parameters instead of literal values!!!
And don't repeeat user-id and type etc. properties on every item.
Are you sure you want to connect everything to c not just the root of the directory structure? I would do the latter.
MERGE (c:Cloud:Dropbox { id_user:{userId}})
MERGE (p:Item:Dropbox {id:{parentPath}})
// owning the parent should be good enough
MERGE (c)-[:Owns]->(p)
WITH c
UNWIND {items} as item
MERGE (i:Item:Dropbox { id:item.path})
ON Create SET i += { is_dir:item.is_dir, name:item.name, created_at:item.created_at }
SET i += { description:item.description, size:item.size, modified:items.modified, processed:timestamp()}
MERGE (p)-[:Contains]->(i);
Make sure to use 2.3.0 for best MERGE performance for relationships.

Neo4j: Inserting 7k nodes is slow (Spring Data Neo4j / SpringRestGraphDatabase)

I'm building an application where my users can manage dictionaries. One feature is uploading a file to initialize or update the dictionary's content.
The part of the structure I'm focusing on for a start is Dictionary -[:CONTAINS]->Word.
Starting from an empty database (Neo4j 1.9.4, but also tried 2.0.0M5), accessed via Spring Data Neo4j 2.3.1 in a distributed environment (therefore using SpringRestGraphDatabase, but testing with localhost), I'm trying to load 7k words in 1 dictionary. However I can't get it done in less than 8/9 minutes on a linux with core i7, 8Gb RAM and SSD drive (ulimit raised to 40000).
I've read lots of posts about loading/inserting performance using REST and I've tried to apply the advices I found but without better luck. The BatchInserter tool doesn't seem to be a good option to me due to my application constraints.
Can I hope to load 10k nodes in a matter of seconds rather than minutes ?
Here is the code I came up with, after all my readings :
Map<String, Object> dicProps = new HashMap<String, Object>();
dicProps.put("locale", locale);
dicProps.put("category", category);
Dictionary dictionary = template.createNodeAs(Dictionary.class, dicProps);
Map<String, Object> wordProps = new HashMap<String, Object>();
Set<Word> words = readFile(filename);
for (Word gw : words) {
wordProps.put("txt", gw.getTxt());
Word w = template.createNodeAs(Word.class, wordProps);
template.createRelationshipBetween(dictionary, w, Contains.class, "CONTAINS", true);
}
I resolve such problem by just creating some CSV file and after that read it from Neo4j. It is needed to make such steps:
Write some class which get input data and base on it create CSV file (it can be one file per node kind or even you can create file which will be used to build relation).
In my case I have also create servlet which allow Neo4j to read that file by HTTP.
Create proper Cypher statements which allow to read and parse that CSV file. There are some samples of which I use (if you use Spring Data also remember about labels):
simple one:
load csv with headers from {fileUrl} as line
merge (:UserProfile:_UserProfile {email: line.email})
more complicated:
load csv with headers from {fileUrl} as line
match (c:Calendar {calendarId: line.calendarId})
merge (a:Activity:_Activity {eventId: line.eventId})
on create set a.eventSummary = line.eventSummary,
a.eventDescription = line.eventDescription,
a.eventStartDateTime = toInt(line.eventStartDateTime),
a.eventEndDateTime = toInt(line.eventEndDateTime),
a.eventCreated = toInt(line.eventCreated),
a.recurringId = line.recurringId
merge (a)-[r:EXPORTED_FROM]->c
return count(r)
Try the below
Usw native Neo4j API rather than spring-data-neo4j while performing batch operations.
Commit in batches i.e. may be for each 500 words
NOTE: There are certain properties (type) added by SDN which will be missing when using the native approach.

Resources