Java: is it possible to create two objects during different runtimes (different JVMs) with the same identity? - db4o

I am using DB4o and want to make sure I get one unique object back when I use code like this:
public List<Object> getListOfObjects(final Object o) {
    List<Object> result = db.query(new Predicate<Object>() {
        @Override
        public boolean match(Object arg0) {
            return arg0.equals(o);
        }
    });
    return result;
}
The List "result" should ideally contain no more than one element. However, isn't it possible that Java creates objects with the same identity during different runtimes (different JVMs)? If that could occur, it would mess up my database.
Surely there must be an answer to whether Java objects can have the same identity across JVMs.
-Alex

If you override the .equals() method of your object, then it's easy to have multiple instances of an object which are equal. The whole purpose of the equals() method is to compare two objects for 'semantic/content' equality. It does not guarantee any uniqueness.
Now if you do not override the equals() method, then object identity is compared (as with the == operator). The identity is unique within a JVM, and there are never two objects with the same identity.
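To make the distinction concrete, here is a minimal sketch (the Point class is hypothetical, not from the question): two instances can be equal by content while having distinct identities.
public class EqualityDemo {
    // Hypothetical value class that overrides equals() for content equality.
    static class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }
        @Override
        public int hashCode() { return 31 * x + y; }
    }
    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);
        System.out.println(a.equals(b)); // true  -- same content
        System.out.println(a == b);      // false -- different identities
    }
}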
Btw/Offtopic: If you store thousands of objects in db4o and use your query, it will be quite slow. More about that here.
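If performance becomes an issue, db4o's SODA query API can constrain on a field directly, which lets db4o use a field index instead of instantiating every stored object to run match(). A hedged sketch, assuming a hypothetical stored Person class with a name field:
import com.db4o.ObjectContainer;
import com.db4o.ObjectSet;
import com.db4o.query.Query;

public class SodaExample {
    // Hypothetical stored class; stands in for your own persisted type.
    static class Person {
        String name;
        Person(String name) { this.name = name; }
    }

    // Constrain the class and the field instead of evaluating a predicate
    // against every instance; an index on "name" makes this fast.
    static ObjectSet findByName(ObjectContainer db, String name) {
        Query query = db.query();
        query.constrain(Person.class);
        query.descend("name").constrain(name);
        return query.execute();
    }
}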

Related

What's better, an initially empty non-nullable or an initially nullable and null container?

Let's say you're managing the list of serial numbers of the bicycles owned by residents in your building, with the objective of planning ahead to build additional safe bike storage.
Some people will of course have no bikes.
In Dart 2.12 and later (with null safety) you could
use a non-nullable List<String> and initialize it to an empty list
class Resident {
  late String name;
  List<String> bicycles = [];
}
or you could use a nullable List<String> and use null as a flag to signal that someone has no bikes.
class Resident {
  late String name;
  List<String>? bicycles;
}
Both designs are of course workable, but does it turn out down the road that one is clearly better than the other—more idiomatic to the new Dart, for example? In other words, what's better, an initially empty non-nullable or an initially nullable and null container?
Even if I count the bits needed, it's not quite clear. There would be wasted storage to construct an empty list in the first case, but there is also storage wasted in the second case, though of an unknown and perhaps implementation-dependent amount.
If you want to represent having none of something, then prefer non-nullable container types with an empty state to nullable types.
With a nullable container, you potentially need to deal with the container being empty anyway, and now you have to do extra work to check for null everywhere.
Meanwhile, dealing with an empty container often doesn't involve any extra work. Contrast:
// Nullable case.
final bicycles = resident.bicycles;
if (bicycles != null) {
for (var bicycle in bicycles) {
doSomething(bicycle);
}
}
with
// Non-nullable case.
for (var bicycle in resident.bicycles) {
doSomething(bicycle);
}
You could try to reset references to null when the container becomes empty so that there aren't two cases to deal with, but as noted above, the empty case often is free anyway, so that'd be more work for usually no gain. Furthermore, resetting references can be a lot of extra work:
List<int>? list = [1, 2, 3];
mutateList(list);
if (list.isEmpty) {
  list = null;
}
If mutateList could remove elements from list, then every caller would need to do extra work to replace empty lists with null.
Even if you don't care to replace empty containers with null, you'd still have different behaviors when transitioning to a non-empty container. Consider:
var sam = Resident();
var samsBicycles = sam.bicycles;
sam.addBicycle();
What would you expect samsBicycles to be? If Resident.bicycles is initially null, then samsBicycles will remain null and will no longer refer to the same object as sam.bicycles.
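The same aliasing pitfall can be sketched in Java terms (the Resident class below is a hypothetical analogue of the Dart one; the behavior is identical):
import java.util.ArrayList;
import java.util.List;

public class AliasingDemo {
    static class Resident {
        // Nullable design: null signals "no bicycles yet".
        List<String> bicycles; // starts as null

        void addBicycle() {
            if (bicycles == null) {
                bicycles = new ArrayList<>(); // a brand-new list object
            }
            bicycles.add("serial-123");
        }
    }

    public static void main(String[] args) {
        Resident sam = new Resident();
        List<String> samsBicycles = sam.bicycles; // null
        sam.addBicycle();
        // samsBicycles never saw the new list; the two no longer agree.
        System.out.println(samsBicycles);  // null
        System.out.println(sam.bicycles);  // [serial-123]
    }
}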

Web Service Contributing ID Disambiguation

I work with a Web Service API that can pump through a generic type of Results that all offer certain basic information, most notably a unique ID. That unique ID tends to be--but is not required to be--a UUID defined by the sender, which is not always the same person (but IDs are unique across the system).
Fundamentally, the API results in something along the lines of this (written in Java, but the language should be irrelevant), where only the base interface represents common details:
interface Result
{
String getId();
}
class Result1 implements Result
{
public String getId() { return uniqueValueForInstance; }
public OtherType1 getField1() { /* ... */ }
public OtherType2 getField2() { /* ... */ }
}
class Result2 implements Result
{
public String getId() { return uniqueValueForInstance; }
public OtherType3 getField3() { /* ... */ }
}
It's important to note that each Result type may represent a completely different kind of information. Some of it cannot be correlated with other Results, and some of it can, whether or not they have identical types (e.g., Result1 may be able to be correlated with Result2, and therefore vice versa, but some ResultX may exist that cannot be correlated because it represents different information).
We are currently implementing a system that receives some of those Results and correlates them where possible, which generates a different Result object that is a container of what it correlated together:
class ContainerResult implements Result
{
public String getId() { return uniqueValueForInstance; }
public Collection<Result> getResults() { return containedResultsList; }
public OtherType4 getField4() { /* ... */ }
}
class IdContainerResult implements Result
{
public String getId() { return uniqueValueForInstance; }
public Collection<String> getIds() { return containedIdsList; }
public OtherType4 getField4() { /* ... */ }
}
These are two containers, which present different use cases. The first, ContainerResult, allows someone to receive the correlated details as well as the actual complete, correlated data. The second, IdContainerResult, sacrifices the complete listing in favor of bandwidth by only sending the associated IDs. The system doing the correlating is not necessarily the same as the client, and the client can receive Results that those IDs would represent, which is intended to allow them to show correlations on their system by simply receiving the IDs.
Now, my problem may be non-obvious to some, and it may be obvious to others: if I send only the ID as part of the IdContainerResult, then how does the client know how to match the Result on their end if they do not have a single ID-store? The types of data that are actually represented by each Result implementation lend themselves to being segregated when they cannot be correlated, which means that a single ID-store is unlikely in most situations without forcing a memory or storage burden.
The current solution that we have come up with entails creating a new type of ID, we'll call it TypedId, which combines the XML Namespace and XML Name from each Result with the Result's ID.
My main problem with that solution is that it requires either maintaining a mutable collection of types that is updated as they are discovered, or prior knowledge of all types so that the ID can be properly associated on any client's system. Unfortunately, I cannot come up with a better solution, but the current solution feels wrong.
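For concreteness, such a TypedId might look like the following sketch (a rough illustration, not our actual code; field names are hypothetical):
import java.util.Objects;

// Combines the XML namespace and name of the concrete Result type with
// the instance's ID, so equal IDs from different types cannot collide.
public final class TypedId {
    private final String xmlNamespace;
    private final String xmlName;
    private final String id;

    public TypedId(String xmlNamespace, String xmlName, String id) {
        this.xmlNamespace = xmlNamespace;
        this.xmlName = xmlName;
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof TypedId)) return false;
        TypedId other = (TypedId) o;
        return xmlNamespace.equals(other.xmlNamespace)
                && xmlName.equals(other.xmlName)
                && id.equals(other.id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(xmlNamespace, xmlName, id);
    }
}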
Has anyone faced a similar situation where they want to associate generic Results with their original types, particularly with the limitations of WSDLs in mind, and solved it in a cleaner way?
Here's my suggestion:
1) You want to have "the client know how to match the Result on their end". So include in your response an extra discriminator field called "RequestType", a String.
2) You want to avoid "maintaining a mutable collection of types that is updated as they are discovered, or prior knowledge of all types so that the ID can be properly associated on any client's system". Obviously, each client request call DOES know what area of processing the Result will relate to. So you can have the client pass the "RequestType" string in as part of the request. As long as the RequestType is a unique string for each different type of client request, your service can process and correlate it without hard-coding any knowledge.
3) Here's one possible example of Java classes for request and response messages (i.e. not the actual service endpoint):
interface Request {
String getId();
String getRequestType();
// anything else ...
}
interface Result {
String getId();
String getRequestType();
}
class Result1 implements Result {
    public String getId() { return uniqueValueForInstance; }
    public String getRequestType() { return "RequestType1"; } // example discriminator value
    public OtherType1 getField1() { /* ... */ }
    public OtherType2 getField2() { /* ... */ }
}
class Result2 implements Result {
    public String getId() { return uniqueValueForInstance; }
    public String getRequestType() { return "RequestType2"; } // example discriminator value
    public OtherType3 getField3() { /* ... */ }
}
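To illustrate, a client could route each Result to a handler keyed by its RequestType, with no hard-coded type checks; a hedged sketch (ResultDispatcher and Handler are hypothetical names, building on the Result interface above):
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ResultDispatcher {
    interface Handler {
        void handle(Result result);
    }

    private final Map<String, Handler> handlers = new HashMap<>();

    // Each area of client processing registers for its own RequestType.
    public void register(String requestType, Handler handler) {
        handlers.put(requestType, handler);
    }

    // Route every result to the handler for its discriminator, if any.
    public void dispatch(List<Result> results) {
        for (Result r : results) {
            Handler h = handlers.get(r.getRequestType());
            if (h != null) {
                h.handle(r);
            }
        }
    }
}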
Here's the gotcha. (2) and (3) above do not give a completely dynamic solution. You want your service to be able to return a flexible record structure relating to each different request. You have the following options:
4A) In XSD, declare Result as a singular strongly-typed variant record type, and in WSDL return Result from a single service endpoint and single operation. The XSD will still need to hardcode the values for the discriminator element when declaring variant record structure.
4B) In XSD, declare multiple strongly-typed unique types Result1, Result2, etc. for each possible client request. In WSDL, have multiple uniquely named operations to return each one of these. These operations can be across one or many service endpoints, or even across multiple WSDLs. While this avoids hard-coding the request type as a specific field per se, it is not actually a generic client-independent solution, because you are still explicitly discriminating each request type by creating a unique name for each result type and each operation. So any apparent dynamism is a mirage.
4C) In XSD, define a flexible generic data structure that is not variant but has plenty of generically named fields that can handle all possible results required. Example fields could be "stringField1", "stringField2", "integerField1", "dateField1058", etc. That is, use extremely weak typing and put the burden on the client to magically know what data is in each field. This option may be very generic, but it is usually considered terrible practice: it is inelegant, pretty unreadable, error prone, and has limitations/assumptions built in anyway (how do you know you have enough generic fields included?). In your case, (4A) is probably the best option.
4D) Use flexible XSD schema design tactics - type substitutability and use of "any" element. See http://www.xfront.com/ExtensibleContentModels.html.
4E) Use the @Produces @SomeQualifier annotations against your own factory class method which creates a high-level type. This tells CDI to always use this method to construct the specified bean type and qualifier. Your factory method can have fancy logic to decide which specific low-level type to construct upon each call. @SomeQualifier can have additional parameters to give guidance towards selecting the type. This potentially reduces the number of qualifiers to just one.
If you use (4D) you will have a flexible service endpoint design that can deal with changing requirements quite effectively. BUT your service implementation still needs to implement the flexible behaviour to decide which results fields to return for each request. Fact is, if you have a logical requirement for varying data structures, your code must know how to process these data structures for each separate request, so must depend on some form of RequestType / unique operation names to discriminate. Any goal of completely dynamic processing (without adapting to each client's needs for results data) is over-ambitious.

System.InvalidOperationException when trying to iteratively add objects using EF 4

This question is very similar to this one. However, the resolutions to that question:
do not seem to apply, or
are somewhat suspect, and don't seem like a good approach to resolving the problem.
Basically, I'm iterating over a generic list of objects, and inserting them. Using MVC 2, EF 4 with the default code generation.
foreach(Requirement r in requirements)
{
var car = new CustomerAgreementRequirement();
car.CustomerAgreementId = viewModel.Agreement.CustomerAgreementId;
car.RequirementId = r.RequirementId;
_carRepo.Add(car); //Save new record
}
And the Repository.Add() method:
public class BaseRepository<TEntity> : IRepository<TEntity> where TEntity : class
{
    private TxRPEntities txDB;
    private ObjectSet<TEntity> _objectSet;
    public void Add(TEntity entity)
    {
        SetUpdateParams(entity);
        _objectSet.AddObject(entity);
        txDB.SaveChanges();
    }
}
I should note that I've been successfully using the Add() method throughout my code for single inserts; this is the first time I've tried to use it to iteratively insert a group of objects.
The error:
System.InvalidOperationException: The changes to the database were committed successfully, but an error occurred while updating the object context. The ObjectContext might be in an inconsistent state. Inner exception message: AcceptChanges cannot continue because the object's key values conflict with another object in the ObjectStateManager. Make sure that the key values are unique before calling AcceptChanges.
As stated in the prior question, the EntityKey is set to True, StoreGeneratedPattern = Identity. The actual table that is being inserted into is a relationship table, in that it is comprised of an identity field and two foreign key fields. The error always occurs on the second insert, regardless of whether that specific entity has been inserted before or not, and I can confirm that the values are always different, no key conflicts as far as the database is concerned. My suspicion is that it has something to do with the temporary entitykey that gets set prior to the actual insert, but I don't know how to confirm that, nor do I know how to resolve it.
My gut feeling is that the solution in the prior question, to set the SaveOptions to None, would not be the best solution. (See prior discussion here)
I've had this issue with my repository using a loop as well and thought that it might be caused by some weird race-like condition. What I've done is refactor out a UnitOfWork class, so that the repository.add() method is strictly adding to the database, but not storing the context. Thus, the repository is only responsible for the collection itself, and every operation on that collection happens in the scope of the unit of work.
The catch there is that, in a loop, you run out of memory fast with EF4. So you do need to save the changes periodically; I just don't save after every Add.
public class BaseRepository<TEntity> : IRepository<TEntity> where TEntity : class
{
    private TxRPEntities txDB;
    private ObjectSet<TEntity> _objectSet;
    public void Add(TEntity entity)
    {
        SetUpdateParams(entity);
        _objectSet.AddObject(entity);
    }
    public void Save()
    {
        txDB.SaveChanges();
    }
}
Then you can do something like
int count = 0;
foreach (Requirement r in requirements)
{
    var car = new CustomerAgreementRequirement();
    car.CustomerAgreementId = viewModel.Agreement.CustomerAgreementId;
    car.RequirementId = r.RequirementId;
    _carRepo.Add(car); // Queue new record
    if (++count % 1000 == 0) // batch size is arbitrary; tune it if you have thousands
        _carRepo.Save();     // Save periodically to keep memory in check
}
_carRepo.Save(); // Save any remaining records
Note: I don't really like this solution, but I hunted around to try to find why things break in a loop when they work elsewhere, and that's the best I came up with.
We have had some odd collision issues if the entity is not added to the context directly after being created (before doing any assignments). The only time I've noticed the issue is when adding objects in a loop.
Try adding the newed-up entity to the context, do the assignments, then save the context. Also, you don't need to save the context each time you add a new entity unless you absolutely need the primary key.

Entity Framework 4 Code First and the new() Operator

I have a rather deep hierarchy of objects that I'm trying to persist with Entity Framework 4, POCO, PI (Persistence Ignorance) and Code First. Suddenly things started working pretty well when it dawned on me to not use the new() operator. As originally written, the objects frequently use new() to create child objects.
Instead I'm using my take on the Repository Pattern to create all child objects as needed. For example, given:
class Adam
{
List<Child> children;
void AddChildGivenInput(string input) { children.Add(new Child(...)); }
}
class Child
{
List<GrandChild> grandchildren;
void AddGrandChildGivenInput(string input) { grandchildren.Add(new GrandChild(...)); }
}
class GrandChild
{
}
("GivenInput" implies some processing not shown here)
I define an AdamRepository like:
class AdamRepository
{
    Adam Add()
    {
        return objectContext.Create<Adam>();
    }
    Child AddChildGivenInput(Adam adam, string input)
    {
        var child = new Child(...);
        adam.children.Add(child);
        return child;
    }
    GrandChild AddGrandchildGivenInput(Child child, string input)
    {
        var grandChild = new GrandChild(...);
        child.grandchildren.Add(grandChild);
        return grandChild;
    }
}
Now, this works well enough. However, I'm no longer "ignorant" of my persistence mechanism as I have abandoned the new() operator.
Additionally, I'm at risk of an anemic domain model since so much logic ends up in the repository rather than in the domain objects.
After much ado, a question. Or rather, several questions:
Is this pattern required to work with EF 4 Code First?
Is there a way to retain use of new() and still work with EF 4 / POCO / Code First?
Is there another pattern that would leave logic in the domain object and still work with EF 4 / POCO / Code First?
Will this restriction be lifted in later versions of Code First support?
Sometimes trying to go the POCO / Persistence Ignorance route feels like swimming upstream; other times it feels like swimming up Niagara Falls. Still, I want to believe...
Here are a couple of points that might help answer your question:
In your classes you have a field for the children collection and a method to add to the children. EF in general (not just Code First) currently requires that collections are surfaced as properties, so this pattern is not currently supported. More flexibility in how we interact with classes is a common ask for EF, and our team is looking at how we can support this at the moment.
You mentioned that you need to explicitly register entities with the context; this isn't necessarily the case. In the following example, if GetAdam() returned an Adam object that is attached to the underlying context, then the new child Cain would be automatically discovered by EF when you save and inserted into the database.
var adam = myAdamRepository.GetAdam();
var cain = new Child();
adam.Children.Add(cain);
~Rowan

Repository Interface - Available Functions & Filtering Output

I've got a repository using LINQ for modelling the data, with a whole bunch of functions for getting data out. A very common use for this data is populating drop-down lists, and these drop-down lists can vary. When creating an entry we usually want a drop-down list with all entries of a certain type, which means I need a function that filters by the type of entity. We also have pages that filter data, where the drop-down lists only contain entries that are currently in use, so I need a filter that requires used entries. This means there are at least six different queries to get the same type of data out.
The problem with defining a function for each of these is that there'd be six functions at least for every type of output, all in one repository. It gets very large, very quick. Here's something like I was planning to do:
public IEnumerable<Supplier> ListSuppliers(bool areInUse, bool includeAllOption, int contractTypeID)
{
if (areInUse && includeAllOption)
{
}
else if (areInUse)
{
}
else if (includeAllOption)
{
}
}
Although "areInUse" doesn't seem very English friendly, I'm not brilliant with naming. As you can see, logic resides in my data access layer (repository) which isn't friendly. I could define separate functions but as I say, it grows quite quick.
Could anyone recommend a good solution?
NOTE: I use LINQ for entities only; I don't use it to query. Please don't ask; it's a constraint on the system, not specified by me. If I had the choice I'd use LINQ to query, but unfortunately I don't.
Have your method take a predicate (an Expression<Func<Supplier,bool>>) which can be used in a Where clause, so that you can pass in any type of filter that you would like to construct. You can use a PredicateBuilder to construct arbitrarily complex predicates based on boolean operations.
public IEnumerable<Supplier> ListSuppliers( Expression<Func<Supplier,bool>> filter )
{
    return this.DataContext.Suppliers.Where( filter );
}
var filter = PredicateBuilder.False<Supplier>();
filter = filter.Or( s => s.IsInUse ).Or( s => s.ContractTypeID == 3 );
var suppliers = repository.ListSuppliers( filter );
You can implement
IEnumerable<Supplier> GetAllSuppliers() { ... }
and then use LINQ on the returned collection. This will retrieve all suppliers from the database, which are then filtered in memory using LINQ.
Assuming you are using LINQ to SQL you can also implement
IQueryable<Supplier> GetAllSuppliers() { ... }
and then use LINQ on the returned collection. This will only retrieve the necessary suppliers from the database when the collection is enumerated. This is very powerful, though there are some limits to the LINQ you can use. However, the biggest problem is that you are able to drill right through your data-access layer and into the database using LINQ.
A query like
var query = from supplier in repository.GetAllSuppliers()
            where supplier.Name.StartsWith("Foo")
            select supplier;
will map into SQL similar to this when it is enumerated
SELECT ... WHERE Name LIKE 'Foo%'
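For what it's worth, the same pass-a-predicate idea can be sketched in plain Java, with java.util.function.Predicate playing the role of the filter (the Supplier and repository classes below are hypothetical):
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class SupplierRepository {
    // Hypothetical entity with the two fields the filters care about.
    static class Supplier {
        boolean inUse;
        int contractTypeId;
        Supplier(boolean inUse, int contractTypeId) {
            this.inUse = inUse;
            this.contractTypeId = contractTypeId;
        }
    }

    private final List<Supplier> suppliers;

    public SupplierRepository(List<Supplier> suppliers) {
        this.suppliers = suppliers;
    }

    // One listing method; callers compose predicates instead of the
    // repository growing a new method per filter combination.
    public List<Supplier> listSuppliers(Predicate<Supplier> filter) {
        return suppliers.stream().filter(filter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        SupplierRepository repo = new SupplierRepository(List.of(
                new Supplier(true, 3), new Supplier(false, 1)));
        // Compose filters with or/and, much like PredicateBuilder.
        Predicate<Supplier> inUse = s -> s.inUse;
        Predicate<Supplier> filter = inUse.or(s -> s.contractTypeId == 3);
        System.out.println(repo.listSuppliers(filter).size()); // 1
    }
}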
