django.db.utils.IntegrityError: (1062, "Duplicate entry '22-add_' for key 'content_type_id'") - django-database

I am using Django's multiple-database router concept: I have multiple sites with different databases, and users from the base database log in to all of the other sub-sites.
When I run syncdb on the base site it works properly every time, but running syncdb on the other sites works only the first time; on any subsequent run it throws an integrity error like the one below:
django.db.utils.IntegrityError: (1062, "Duplicate entry
'22-add_somesame' for key 'content_type_id'")
Once I remove the multiple-DB router settings from the project, syncdb works properly every time.
So is this related to the multiple-database router, or to something else?
Any advice is appreciated, thanks.

The problem here is with the DB router and Django's system objects. I've experienced the same issue with multiple databases and routers. As I remember, the problem is with the auth.permission content types, which get mixed up between databases: syncdb tries to create them in every database, and then it creates a permission content type for some object whose id is already reserved by a local model.
I have the following:
BASE_DB_TYPES = (
    'auth.user',
    'auth.group',
    'auth.permission',
    'sessions.session',
)
and then in the db router:
def db_for_read(self, model, **hints):
    if hasattr(model, '_meta') and str(model._meta) in BASE_DB_TYPES:
        return 'base_db'  # the alias of the base db that will store users
    return None  # the default database, or some custom mapping
EDIT:
Also, the exception might mean that you're declaring a permission 'add_somesame' for your model 'somesame', while Django automatically creates add_, change_ and delete_ permissions for every model.

Related

Rails 6 with multiple databases, auto change connection based on read or create query

The question might be silly and may not be how it is done in the real world; kindly give your thoughts/pros/cons anyway.
Let's say I have two databases: a read-replica database and a master database.
Scenario 1:
Model.all # It should query from read replica database
Scenario 2:
Model.create(attributes) # It should create data in master database
Scenario 3:
Model.where(condition: :some_condition).update(attributes) # It should read data from replica database and update the data in master database
Note: at runtime, the framework should detect the type of query and handle the three scenarios above automatically.
Questions:
Is this a valid expectation?
If yes, how can this be achieved, completely or partially?
If no, what is wrong with this approach and what issues will we face?
Rails 6 provides a framework for auto-routing incoming requests to either the primary database connection, or a read replica.
By default, this new functionality allows your app to automatically route read requests (GET, HEAD) to a read-replica database if it has been at least 2 seconds since the last write request (any request that is not a GET or HEAD request) was made.
The logic that specifies when a read request should be routed to a replica is specified in a resolver class, ActiveRecord::Middleware::DatabaseSelector::Resolver by default, which you would override if you wanted custom behavior.
The middleware also provides a session class, ActiveRecord::Middleware::DatabaseSelector::Resolver::Session that is tasked with keeping track of when the last write request was made. Like the resolver, this class can also be overridden.
To enable the default behavior, you would add the following configuration options to one of your app's environment files - config/environments/production.rb for example:
config.active_record.database_selector = { delay: 2.seconds }
config.active_record.database_resolver =
  ActiveRecord::Middleware::DatabaseSelector::Resolver
config.active_record.database_resolver_context =
  ActiveRecord::Middleware::DatabaseSelector::Resolver::Session
If you decide to override the default functionality, you can use these configuration options to specify the delay you'd like to use, the name of your custom resolver class, and the name of your custom session class, the latter two of which should be descendants of the default classes.
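If you need to route a particular query by hand rather than per request, Rails 6 also lets you switch connections explicitly with connected_to. A minimal sketch, assuming a primary and a replica are configured in database.yml with the default writing/reading roles (Model and attributes are the placeholders from the question):

# Scenario 1: force this read onto the replica.
ActiveRecord::Base.connected_to(role: :reading) do
  Model.all.to_a  # to_a forces the query to run inside the block
end

# Scenario 2: force this write onto the primary.
ActiveRecord::Base.connected_to(role: :writing) do
  Model.create(attributes)
end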

Unable to connect to Neo4j from c# driver Session to fabric database

Using the Neo4j.Driver (4.1.0) I am unable to connect a session to the server's configured fabric database. It works fine in the Neo4j Browser. Is there a trick to setting the context to a fabric database?
This times out:
var session = driver.AsyncSession(o => o.WithDatabase("fabric"));
Actual database names work fine.
Does the c# driver not support setting the Session context to a fabric database?
I'm trying to execute something like the following:
use fabric.graph(0)
match ...
set...
I found a workaround by co-opting a sub-query as follows, but it seems that setting the session context would make more sense.
use fabric
call {
  use fabric.graph(0)
  match ...
  set ...
  return 0
}
return 0
I've not yet worked with Fabric, but I have worked with clusters. You can only add nodes/edges to the one Neo4j database that has the WRITE role. To do this you need a small function that queries the routing table and determines the write database. Here's the key query:
CALL dbms.cluster.routing.getRoutingTable({}) YIELD ttl, servers
UNWIND servers AS server
WITH server WHERE server.role = 'WRITE'
RETURN server.addresses
You then address your write query to that specific database.

How can I query the Spamhaus DBL in Ruby on Rails?

I have a Rails web application. I want to create a class that takes an email address, say "matt@trucksandstuff.com", parses out the domain, and then checks whether the domain is found in the Spamhaus DBL. I am having no luck with the dig or host commands as described on their website, and the Charon gem doesn't seem to work with their sample URL either. Any ideas?
EDIT: Here is what is on the website:
In response to "How can I test the DBL?" they said:
First, the DBL follows RFC5782 for determining whether a URI zone is operational with an entry for TEST. Second, the DBL has a specific domain for testing DBL applications: dbltest.com. To test functionality of the DBL use the host or dig command to do a manual query. (If you need to look up a domain in the DBL via the web, use the domain lookup form at our Blocklist Removal Center. Do not query our website with automated tools.).
I have tried using the Charon gem, which I think should be as simple as running
Charon.query('dbltest.com')
with variations that remove the parentheses, add a space, etc.
Also tried
resolver = Resolv::DNS.new
name = 'dbltest.com'
resolver.getresources("#{name}.zen.spamhaus.org", Resolv::DNS::Resource::IN::A)
in the Rails console.
The Zen database is only for IP addresses; the DBL is for hostnames. Therefore Charon (which queries Zen) only works with IP addresses. To test hostnames, query them with Resolv against dbl.spamhaus.org:
require 'resolv'

def is_spammer?(host)
  !Resolv::DNS.new.getresources("#{host}.dbl.spamhaus.org",
                                Resolv::DNS::Resource::IN::A).empty?
end
is_spammer?('dbltest.com')
=> true
is_spammer?('google.com')
=> false
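To wrap this into the kind of class the question asks for, a small sketch could parse the domain out of the address and run the same DBL lookup (the class name and interface here are illustrative, not an existing library):

require 'resolv'

# Illustrative wrapper: extracts the domain from an email address and checks
# it against the Spamhaus DBL zone.
class SpamhausDomainCheck
  DBL_ZONE = 'dbl.spamhaus.org'.freeze

  def initialize(email)
    # Everything after the last '@' is treated as the domain.
    @domain = email.to_s.split('@').last.to_s.downcase
  end

  # True if the domain has an A record in the DBL, i.e. it is listed.
  def listed?
    Resolv::DNS.open do |dns|
      dns.getresources("#{@domain}.#{DBL_ZONE}",
                       Resolv::DNS::Resource::IN::A).any?
    end
  end
end

SpamhausDomainCheck.new('someone@dbltest.com').listed?  # => true (DBL test domain)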

Put class instance to class constant in initializers

In one of my old apps, I'm using several API connectors, like AWS or Mandrill for example.
For some reason (maybe I saw it somewhere, I don't remember), I'm using class constants to initialize these objects during the application's init stage.
As example:
/initializers/mandrill.rb:
require 'mandrill'
MANDRILL = Mandrill::API.new ENV['MANDRILL_APIKEY']
Now I can access the MANDRILL constant anywhere in my application and use it (full path MyApplication::Application::MANDRILL, or just MANDRILL). All works fine, for example:
def update_mandrill
  result = MANDRILL.inbound.update_route id, pattern, url
end
The question is: is it good practice to use such constants? Or is it better to create a new instance in every method that uses it, as in this example:
def update_mandrill
  require 'mandrill'
  mandrill = Mandrill::API.new ENV['MANDRILL_APIKEY']
  result = mandrill.inbound.update_route id, pattern, url
end
Interesting question.
It's very handy approach but it may have cons in some scenarios.
Imagine you have a constant that either takes a long time to initialize or loads a lot of data into memory. When its initialization takes long, you essentially degrade app boot time (which may or may not be a problem; it usually is in development).
If it loads a lot of data into memory, it may turn out to be a problem when running rake tasks, for example, which load the entire environment. You may hit memory limits in use cases that don't actually need this data at all.
I know one application which loads a lot of data during boot, and it's done very deliberately. Sure, the use case is a bit uncommon, but still.
Another thing to consider: imagine you're trying to establish a connection to an external service like Mongo or anything else. If this service is unavailable (which happens), your application won't be able to boot. Maybe the service is essential for the app to work, and without it the app would be "useless" anyway, but it's also possible that you stop everything just because the storage you keep logs in is down.
I'm not saying you shouldn't use the approach you suggested (I do it in my apps as well), but you should be aware of the potential drawbacks.
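One common way to soften those drawbacks (a sketch of my own, not part of the answer above) is to build the client lazily on first use instead of at boot, so a slow or unreachable service no longer blocks startup; the module name here is just illustrative:

require 'mandrill'

# Illustrative lazy singleton: the Mandrill client is only built the first
# time it is requested, and memoized afterwards.
module EmailClient
  def self.get
    @client ||= Mandrill::API.new(ENV['MANDRILL_APIKEY'])
  end
end

# Usage, mirroring the question's example:
# result = EmailClient.get.inbound.update_route id, pattern, url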
Yes, pre-creating a pseudo-constant object (like that API client) is usually a good idea. However, there are, approximately, a thousand ways to go about it, and the constant is not at the top of my personal list.
These days I usually go with setting it in the env files.
# config/environments/production.rb
config.email_client = Mandrill::API.new ENV['MANDRILL_APIKEY'] # the real thing
# config/environments/test.rb
config.email_client = a_null_object # something that conforms to the same api, but does absolutely nothing
# config/environments/development.rb
config.email_client = a_dev_object # post to local smtp, or something
Then you refer to the client like this:
Rails.application.configuration.email_client
And the correct behaviour will be picked up in each env.
If I don't need this per-env variation, then I either use some kind of singleton object (EmailClient.get) or a global variable in the initializer ($email_client). It can be argued that a constant is better than a global variable, semantically and because it raises a warning when you try to re-assign it. But I like that a global variable stands out more; you see right away that it's something special. (And then again, it's only #3 on my list, so I don't do it very often.)
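As a concrete example of the a_null_object placeholder above, a test-environment client could look like this minimal sketch (the class name is made up; the interface just mirrors the inbound.update_route call from the question):

# Illustrative null client: same chainable surface as the real Mandrill
# client, but every call is a no-op.
class NullEmailClient
  def inbound
    self
  end

  def update_route(*_args)
    nil
  end
end

# config/environments/test.rb
# config.email_client = NullEmailClient.new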

Keeping elasticsearch and database in sync

I am trying to figure out a way to keep my MySQL database and Elasticsearch in sync. I have set up a JDBC river using the jprante/elasticsearch-river-jdbc plugin for Elasticsearch. When I execute the request below:
curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{
  "type" : "jdbc",
  "jdbc" : {
    "driver" : "com.mysql.jdbc.Driver",
    "url" : "jdbc:mysql://localhost:3306/MY-DATABASE",
    "user" : "root",
    "password" : "password",
    "sql" : "select * from users",
    "poll" : "1m"
  },
  "index" : {
    "index" : "test_index",
    "type" : "user"
  }
}'
the river starts indexing data, but for some records I get org.elasticsearch.index.mapper.MapperParsingException. There is a discussion related to this issue here, but I want to know a way to get around it.
Is it possible to fix this permanently by creating an explicit mapping for all fields of the type that I am trying to index, or is there a better way to solve this issue?
Another question that I have: when the JDBC river polls the database again, it seems to re-index the entire data set (given in the SQL query) into ES. I am not sure, but is this done because Elasticsearch wants to add fresh data as well as pick up changes in the existing data? Is it possible to index only the fresh data, if the table's existing data is static?
Did you look at default mapping?
http://www.elasticsearch.org/guide/reference/mapping/dynamic-mapping.html
I think it can help you here.
If you have an insertion date field in your data table, you can use it to filter what needs to be indexed.
See https://github.com/jprante/elasticsearch-river-jdbc#time-based-selecting
HTH
David
Elasticsearch has dropped the river sync concept altogether. It is not a recommended path, because it usually doesn't make sense to keep the same normalized SQL table structure in a document store like Elasticsearch.
Say you have Product as an entity with some attributes, and Reviews on the Product entity in a parent-child table, since there can be multiple reviews for the same product:
Products(id, name, status, ... etc.)
Product_reviews(product_id, review_id)
Reviews(id, note, rating, ... etc.)
In the document store you may want to create a single index, named say product, where each document contains the product attributes together with its reviews: Product{attribute1, attribute2, ..., reviews: [review1, review2, ...]}.
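For illustration only, one such denormalized document for the product index might look like this (field names and values are made-up examples following the tables above):

# A single "product" document carrying its reviews inline.
product_doc = {
  id: 42,
  name: 'Cordless drill',
  status: 'active',
  reviews: [
    { id: 1, note: 'Solid build', rating: 5 },
    { id: 2, note: 'Battery fades quickly', rating: 3 }
  ]
}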
Here is an approach to syncing in such a setup.
Assumptions:
SQL database (the true source of record)
Elasticsearch or any other NoSQL document store
Solution:
As soon as an update happens, publish an event to JMS/AMQP/a database queue/a file-system queue/Amazon SQS etc., containing either the full Product or just the primary object ID (I would recommend just the ID).
The queue consumer should then call a web service to fetch the full object (if only the primary ID was pushed to the queue, otherwise just take the object itself) and send the respective changes to Elasticsearch/the NoSQL database.
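A rough sketch of such a consumer, in Ruby since the stack here is unspecified; the queue client, service URL, and index layout are all assumptions for illustration:

require 'json'
require 'net/http'

# Illustrative consumer: `queue` is any client that yields messages carrying
# just the product id; the full object is re-read from the system of record
# and upserted into Elasticsearch.
def run_sync_consumer(queue, es_host: 'localhost', es_port: 9200)
  queue.each_message do |message|  # hypothetical queue API
    product_id = JSON.parse(message)['product_id']

    # Fetch the full, denormalized product (with reviews) from a web service
    # sitting in front of the SQL database.
    product = JSON.parse(
      Net::HTTP.get(URI("http://products.internal/api/products/#{product_id}"))
    )

    # Use the SQL id as the Elasticsearch document id so that repeated or
    # re-delivered events stay idempotent.
    Net::HTTP.start(es_host, es_port) do |http|
      request = Net::HTTP::Put.new("/product/_doc/#{product_id}",
                                   'Content-Type' => 'application/json')
      request.body = product.to_json
      http.request(request)
    end
  end
end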
