data when using DSE Solr - datastax-enterprise

The DataStax documentation (https://docs.datastax.com/en/dse/5.1/dse-dev/datastax_enterprise/search/customizeSchemaSearch.html) states:
Fields with indexed="true" are indexed and stored as secondary files in Lucene so that the fields are searchable. The indexed fields are stored in the database, not in Lucene, regardless of the value of the stored attribute value, with the exception of copy fields. Copy field destinations are not stored in the database.
I'd like to know where the data is read from when running a CQL Solr query (e.g. SELECT first_name, last_name FROM individual WHERE solr_query=...).
Are first_name and last_name fetched from the Cassandra database, or from the Solr index, which stores the fields as well?
I don't understand how "Fields with indexed="true" are indexed and stored as secondary files in Lucene" fits with "The indexed fields are stored in the database, not in Lucene"; the two statements seem contradictory.
Thanks for your help !

When you issue
SELECT first_name, last_name FROM individual WHERE solr_query=...;
first_name and last_name must exist in the Cassandra table individual; they are not stored in Lucene (as the documentation states).
The line of documentation should read:
Fields with indexed="true" are indexed and stored as secondary index files in Lucene because the Solr / Lucene integration in DSE uses Cassandra secondary index implementation.
These index files are searched, and the set of unique IDs they return is used to read the rows from the Cassandra table.
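To make the two-step read path concrete, here is a minimal toy sketch in plain Python. It is not DSE, Solr, or Cassandra code: the table contents, `build_index`, and `search` are made-up names for illustration, and the only assumption is the behaviour described above (the index yields row keys, the column values come from the table).

```python
# Toy model: the "table" holds the row data, the "index" maps terms to row keys only.
table = {
    1: {"first_name": "Ada", "last_name": "Lovelace"},
    2: {"first_name": "Alan", "last_name": "Turing"},
    3: {"first_name": "Ada", "last_name": "Yonath"},
}


def build_index(rows, field):
    """Map each lowercased field value to the set of row keys that contain it."""
    index = {}
    for key, row in rows.items():
        index.setdefault(row[field].lower(), set()).add(key)
    return index


first_name_index = build_index(table, "first_name")


def search(term, columns):
    # Step 1: the index lookup returns row keys, never column values.
    keys = sorted(first_name_index.get(term.lower(), set()))
    # Step 2: the requested columns are read from the table by key.
    return [tuple(table[k][c] for c in columns) for k in keys]


result = search("ada", ["first_name", "last_name"])
print(result)  # [('Ada', 'Lovelace'), ('Ada', 'Yonath')]
```

The point of the sketch is only that the selected columns never need to live in the index; they are looked up in the table once the matching keys are known.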

Related

Get the schema from an SqlEntityConnection

I want to fetch information from one table (using an OdbcConnection) and insert it into a SQL Server database that I connect to via the SqlEntityConnection type provider.
I want to do a simple validation that compares the column names and column types of the tables in both databases. I've been looking around and can't find a way to fetch the column names and column types from
let EntityConnection = SqlEntityConnection<ConnectionString="...">
I am able to fetch data from the table I want, but I don't think I can access INFORMATION_SCHEMA, so I'm at a bit of a loss as to how to fetch this information.
Thank you.

solr join on multiple cores in solr 4.6.1

I'm trying to write a join query between two Solr cores that run in the same JVM. A very simple description of the cores: categories contains id (int field) and keywords (multivalued text field), and firma contains information about companies, with one field categ_id (multivalued int field). What I'm trying to get is the ids of the companies that match the searched keyword, but first I just want to see all companies from a given category by id. The query looks like this:
catDEkw/select?q=*:*&wt=json&indent=true&fl=*,score&fq={!join from=id to=cf_cs_ids fromIndex=searchDEbis}cf_cs_ids:926
where:
catDEkw is the categories core, with id as the category id
searchDEbis is the core that contains info about companies and has the cf_cs_ids field (a multivalued field)
No results. Am I doing something wrong? Or is the problem the field type after the join is made?
Thanks in advance!
I could be wrong, but if I understand correctly, the from should be cf_cs_ids, i.e.:
catDEkw/select?q=*:*&wt=json&indent=true&fl=*,score&fq={!join from=cf_cs_ids to=id fromIndex=searchDEbis}cf_cs_ids:926
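To see why swapping from and to matters, here is a rough Python emulation of what a {!join from=... to=... fromIndex=...} filter does: run the inner query on the fromIndex core, collect the values of the from field, and keep the documents of the queried core whose to field holds one of those values. The sample documents, the company id values, and the `join` helper are invented for illustration; this is a sketch of the semantics, not Solr code.

```python
# Hypothetical sample documents standing in for the two cores.
categories = [  # the core being queried (catDEkw)
    {"id": 926, "keywords": ["plumbing"]},
    {"id": 927, "keywords": ["heating"]},
    {"id": 928, "keywords": ["roofing"]},
]
companies = [  # the fromIndex core (searchDEbis)
    {"id": 1, "name": "Acme", "cf_cs_ids": [926, 927]},
    {"id": 2, "name": "Bolt", "cf_cs_ids": [928]},
]


def as_list(value):
    """Treat missing fields as empty and single values as one-element lists."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def join(from_docs, from_field, to_docs, to_field, matches):
    # 1. Run the inner query against the "from" core.
    hits = [d for d in from_docs if matches(d)]
    # 2. Collect every value of the "from" field on those hits.
    values = set()
    for d in hits:
        values.update(as_list(d.get(from_field)))
    # 3. Keep documents of the queried core whose "to" field holds one of them.
    return [d for d in to_docs if values.intersection(as_list(d.get(to_field)))]


def inner(doc):
    """Stands in for the inner query cf_cs_ids:926."""
    return 926 in doc["cf_cs_ids"]


# from=cf_cs_ids to=id (the suggestion above)
suggested = join(companies, "cf_cs_ids", categories, "id", inner)
# from=id to=cf_cs_ids (the query from the question)
original = join(companies, "id", categories, "cf_cs_ids", inner)

print([d["id"] for d in suggested])  # [926, 927]
print(original)                      # []
```

With these sample documents the original field order matches nothing, because the categories core has no cf_cs_ids field to join on. Note that in both cases the returned documents come from the core being queried (catDEkw), not from fromIndex.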

What are Indexes in the Xcode Core-Data data model inspector

In Xcode you can add "Indexes" for an entity in the data model inspector.
For the screenshot I hit "add" twice, so "comma,separated,properties" is just the default value.
What exactly are those indexes?
Do they have anything to do with indexed attributes? And if so, what is the difference between specifying the Indexes in this inspector and selecting "Indexed" for the individual attribute?
Optimizing Core Data searches and sorts
As the title says, indexing speeds up searching and sorting your database. However, it slows down saving changes to the persistent store. It matters when you use NSPredicate and NSSortDescriptor objects in your queries.
Let's say you have two entities: PBOUser and PBOLocation (many to many). You can see their properties in the image below:
Suppose the database holds 10,000 users and 50,000 locations. Now we need to find every user whose email starts with a. If we run such a query without indexing, Core Data must check every record (all 10,000 users).
But what if it is indexed (in other words, sorted by email in ascending order)? Then Core Data checks only the records that start with a. Once Core Data reaches b it stops searching, because in a sorted index no further record can have an email starting with a.
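The difference between the two lookups can be sketched in a few lines of Python. This is only an illustration of the idea, assuming the index is a list of emails kept in ascending order; it says nothing about how Core Data or SQLite actually store their indexes, and the sample emails and function names are made up.

```python
import bisect

emails = [
    "bob@example.com",
    "alice@example.com",
    "carol@example.com",
    "adam@example.com",
    "zed@example.com",
]

# The "index": the same values kept in ascending order.
index = sorted(emails)


def full_scan(prefix):
    """Without an index every record has to be inspected."""
    return [e for e in emails if e.startswith(prefix)]


def indexed_prefix_search(prefix):
    """With a sorted index, jump to the first candidate and stop at the first miss."""
    start = bisect.bisect_left(index, prefix)
    found = []
    for email in index[start:]:
        if not email.startswith(prefix):
            break  # everything after this point sorts after the prefix
        found.append(email)
    return found


print(indexed_prefix_search("a"))  # ['adam@example.com', 'alice@example.com']
```

Both functions return the same set of emails; the indexed version just touches far fewer entries once the list gets large.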
How to enable indexing on a Core Data model from within Xcode:
or:
Hopefully they are equivalent:-)
But what if you wanted emails starting with a and names starting with b? You can do this by checking INDEXED for the name property of the PBOUser entity, or:
This is how you can optimise your database:-)
Use the Indexes list to add compound indexes to the entity. A compound index is an index that spans multiple attributes or relationships. A compound index can make searching faster. The names of attributes and relationships in your data model are the most common indexes. You must use the SQLite store to use compound indexes.
Adding a row with a single attribute to the Indexes list is equivalent to selecting Indexed for that attribute: It creates an index for the attribute to speed up searches in query statements.
The Indexes list is meant for compound indexes. Compound indexes are useful when you know that you will be searching for values of these attributes combined in the WHERE clause of a query:
SELECT * FROM customer WHERE surname = "Doe" AND firstname = "Joe";
This statement could make use of a compound index surname, firstname. That index would also be useful if you just search for surname, but not if you only search for firstname. Think of the index as if it were a phone book: It is sorted by surname first, then by first name. So the order of attributes is important.
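The phone book analogy can be tried out with a sorted list of (surname, firstname) tuples. This is a plain-Python sketch with invented sample data and function names, not a model of the SQLite store; it only shows why the leading attribute of a compound index is the one that can be searched on its own.

```python
import bisect

customers = [
    ("Doe", "Joe"),
    ("Doe", "Ann"),
    ("Smith", "Joe"),
    ("Adams", "Joe"),
    ("Doe", "Zoe"),
]

# Compound "index": sorted by surname first, then by firstname.
compound_index = sorted(customers)


def has_customer(surname, firstname):
    """Both attributes known: one binary search finds the entry."""
    i = bisect.bisect_left(compound_index, (surname, firstname))
    return i < len(compound_index) and compound_index[i] == (surname, firstname)


def by_surname(surname):
    """Leading attribute only: matches form one contiguous run in the index."""
    start = bisect.bisect_left(compound_index, (surname,))
    run = []
    for entry in compound_index[start:]:
        if entry[0] != surname:
            break
        run.append(entry)
    return run


def by_firstname(firstname):
    """Second attribute only: matches are scattered, so every entry is checked."""
    return [entry for entry in compound_index if entry[1] == firstname]


print(by_surname("Doe"))    # [('Doe', 'Ann'), ('Doe', 'Joe'), ('Doe', 'Zoe')]
print(by_firstname("Joe"))  # [('Adams', 'Joe'), ('Doe', 'Joe'), ('Smith', 'Joe')]
```

The three Joe entries sit at positions 0, 2 and 4 of the sorted list, which is why the sort order gives no shortcut for a firstname-only search.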

Search Engine indexes and types

Being somewhat new to search engines, the notions of indexes and types are not very clear to me. Elastic search has the notion of indexes and types where you can store a document.
Does the notion of an index correlate with a schema in a database?
And does the notion of a type correlate with a table?
Can someone please explain the purpose of having another grouping below indexes?
Why can't we store all documents of the same type on a single index?
Does the notion of an index correlate with a schema in a database? And does the notion of a type correlate with a table?
No and no. First, ElasticSearch is schema free: you don't have to specify upfront the structure of your documents. Just throw some JSON at ElasticSearch and it will happily index it, store it, retrieve it, search it.
The concept of an index correlates to the notion of a database: a database contains many tables, i.e. heterogeneously structured data.
The notion of a type correlates to the notion of a table: various types stored under one index can have different mappings, i.e. different analyzers for fields, etc.
Another way to look at types is to see them as column families in column databases such as HBase or Cassandra.
There is actually a very nice example in the ElasticSearch README: storing two different types of data (users and their tweets) in one index, named “twitter”.
(All that said, nobody forces you to exploit this feature: you can have one type under an index, if it makes sense for you.)
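As a mental model only, the index / type / document hierarchy described above can be pictured as nested dictionaries. The sketch below is plain Python and does not use the ElasticSearch API; `index_document`, `types_in` and the sample documents are hypothetical names and data chosen for illustration.

```python
from collections import defaultdict

# index name -> type name -> document id -> document body
store = defaultdict(lambda: defaultdict(dict))


def index_document(index, doc_type, doc_id, body):
    """File a schema-free document under an index and a type."""
    store[index][doc_type][doc_id] = body


def types_in(index):
    """List the types that currently hold documents in an index."""
    return sorted(store[index])


# Two differently structured kinds of document share one index.
index_document("twitter", "user", "u1", {"name": "Some User"})
index_document("twitter", "tweet", "t1", {"user": "u1", "message": "hello"})

print(types_in("twitter"))  # ['tweet', 'user']
```

In this picture the index plays the role of the database and each type plays the role of a table, which is the correspondence described above.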

MongoDB indexes

I'm in the process of converting my Rails app to use mongodb through mongoid. I have two questions relating to indexes. I think I know the answer, but I want confirmation from someone who has more experience with mongodb.
Let's look at the following example where I have one relational association between Users and Posts.
user.rb
class User
  has_many_related :posts
end
post.rb
class Post
  belongs_to_related :user
end
Now when I look at the indexes created through the MongoHQ interface, I notice the following two:
Key Name: _id_
Indexed Field: _id
Unique: <blank>
Is the id guaranteed to be unique? If so, why isn't unique set? If not, how can I set this, and do I need to?
Key Name: user_id_1
Indexed Field: user_id
Unique: false
Am I correct in assuming the Indexed Field is the field name in the collection? Just want to confirm as Key Name has the _1 after it.
Yes, _id in MongoDB is always unique. It's the primary key, which is why setting UNIQUE isn't necessary.
Indexes are described very clearly in the MongoDB Indexing Overview.
_id
The _id index is a unique index** on the _id field, and MongoDB creates this index by default on all collections. You cannot delete the index on _id.
The _id field is the primary key for the collection, and every document must have a unique _id field. You may store any unique value in the _id field. The default value of _id is an ObjectID on every insert().
** Although the index on _id is unique, the getIndexes() method will not print unique: true in the mongo shell.
If you do not specify the _id value manually in MongoDB, then the type will be set to a special BSON datatype that consists of a 12-byte binary value.
The 12-byte value consists of a 4-byte timestamp, a 3-byte machine id, a 2-byte process id, and a 3-byte counter. Due to this design, this value has a reasonably high probability of being unique.
Reference: The Definitive Guide to MongoDB: The NoSQL Database for Cloud and Desktop Computing (book)
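The 12-byte layout described above can be sketched in Python using only the standard library. This is an illustration of the byte layout as quoted, not the code of any MongoDB driver: `make_object_id` is a made-up name, and hashing the hostname for the machine id and starting the counter at a random value are assumptions made for the example.

```python
import hashlib
import itertools
import os
import socket
import struct
import time

# Assumption: the counter starts at a random value and wraps at 3 bytes.
_counter = itertools.count(int.from_bytes(os.urandom(3), "big"))


def make_object_id(now=None):
    """Build a 12-byte value: 4-byte timestamp, 3-byte machine id,
    2-byte process id, 3-byte counter."""
    seconds = int(time.time() if now is None else now)
    timestamp = struct.pack(">I", seconds)                                # 4 bytes
    machine = hashlib.sha256(socket.gethostname().encode()).digest()[:3]  # 3 bytes
    process = struct.pack(">H", os.getpid() & 0xFFFF)                     # 2 bytes
    counter = (next(_counter) % 2**24).to_bytes(3, "big")                 # 3 bytes
    return timestamp + machine + process + counter


oid = make_object_id()
print(len(oid), oid.hex())
```

Two ids generated in the same second on the same machine and in the same process still differ, because the counter advances on every call.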

Resources