How to specify an __init__ argument that is not a class attribute - python-attrs

With attrs, how can I specify an __init__ argument that is not a class attribute?
e.g. a CRC8 object might be passed some bytes or bytearray in the constructor, but I don't want to store that input, just calculate the CRC and store the result.
Is this pattern an example where using a class method is appropriate (as described in this link)?
http://www.attrs.org/en/stable/init.html#initialization

What you want here is a classmethod factory:
import attr

@attr.s
class CRC8:
    checksum = attr.ib()

    @classmethod
    def from_bytes(cls, data):
        # compute your CRC from `data`
        # checksum = ...
        return cls(checksum)
That allows you to do proper checks and balances.
In this case I'd wonder whether a plain function would do, similar to the standard library's binascii.crc32().
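For instance, a minimal sketch of the function approach (the polynomial and parameters here are one common CRC-8 variant; check them against the one you actually need):

def crc8(data, poly=0x07):
    # Bitwise CRC-8 (polynomial 0x07, init 0x00, no reflection).
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc

print(hex(crc8(b"123456789")))  # 0xf4 for this variant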

Related

Disable leading underscore removal in attrs auto-generated init method signature

attrs strips the leading underscore from attribute names for the generated __init__ method. Is there a way to override that for a particular attribute, short of disabling the auto-generated initialization method for the class entirely?
I'd like to use an attrs class to represent MongoDB documents, which in Python are dictionaries with an _id key recording the unique id. I was hoping to have a from_db(cls, doc) classmethod that's little more than return cls(**doc), but the presence of _id causes "TypeError: __init__() got an unexpected keyword argument '_id'". Right now, I work around it by declaring _id = attr.ib(init=False, default=None) and having:
@classmethod
def from_db(cls, doc):
    _id = doc["_id"]
    del doc["_id"]
    obj = cls(**doc)
    obj._id = _id
    return obj
but that seems really kludgy. Is there a better way?
Currently not possible, sorry.
See:
https://github.com/python-attrs/attrs/issues/391
https://github.com/python-attrs/attrs/issues/619
Long-term you'll probably run into more complicated exceptions and then tools like cattrs make more sense.
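In the meantime, a somewhat less kludgy workaround (a sketch; the name field is a made-up second attribute): since attrs exposes _id to __init__ under the stripped name id, the classmethod can rename the key instead of mutating the instance afterwards:

import attr

@attr.s
class Document:
    _id = attr.ib()
    name = attr.ib()  # hypothetical extra field for illustration

    @classmethod
    def from_db(cls, doc):
        doc = dict(doc)  # copy so the caller's dict isn't mutated
        return cls(id=doc.pop("_id"), **doc)  # _id's init argument is named id

Document.from_db({"_id": 1, "name": "example"})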

How to make custom object available for function passed to dask df.apply (cannot serialize)

All this code works in pandas, but running single-threaded is slow.
I have an object (it's a bloom filter) that's slow to create.
I have dask code that looks something like:
def has_match(row, my_filter):
    return my_filter.matches(a=row.a, b=row.b)

# ....make dask dataframe ddf
ddf['match'] = ddf.apply(has_match, args=(my_filter,), axis=1, meta=(bool))
ddf.compute()
When I try to run this I get an error that starts:
distributed.protocol.core - CRITICAL - Failed to Serialize
My object was created from a C library, so I'm not surprised that it can't be automagically serialized, but I don't know how to work around this.
Distributed expects all intermediate results to be serializable. In your case, you have an object that doesn't implement pickle. In general you have a few options here (in order of best to worst IMHO):
Implement pickle for this object. Note that using the copyreg module you can add pickle support for classes that aren't in your control.
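For example (untested; BloomFilter, to_bytes and from_bytes are stand-ins for whatever your C-backed class actually provides):

import copyreg

def rebuild_bloom_filter(data):
    # Called at unpickle time to reconstruct the filter from raw bytes.
    return BloomFilter.from_bytes(data)

def pickle_bloom_filter(bf):
    # Tell pickle how to reduce the filter: a callable plus its arguments.
    return (rebuild_bloom_filter, (bf.to_bytes(),))

# Register the reducer without touching the BloomFilter class itself.
copyreg.pickle(BloomFilter, pickle_bloom_filter)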
Cache the creation of the filter in your function manually. You could do this with an object, or with a global variable in your module. Note that the code below would need to be part of an imported module, not part of your interactive session (i.e. not in a Jupyter notebook/IPython session).
For example (untested):
myfilter = None

def get_or_load():
    global myfilter
    if myfilter is None:
        myfilter = load_filter()
    return myfilter

def load_filter():
    # expensive bloom-filter construction goes here
    pass

def has_match(row):
    my_filter = get_or_load()
    return my_filter.matches(a=row.a, b=row.b)
And then in your user code:
from my_filter_utils import has_match
ddf['match'] = ddf.apply(has_match, axis=1, meta=('matches', bool))
Use dask to manage the cache. To do this, wrap the object in another class that re-loads the object when serialized. If you then persist that object in the cluster, dask will hold on to it and at most the creation function will be called once on every node.
For example (untested):
from dask import delayed

class Wrapper(object):
    def __init__(self, func):
        self.func = func
        self.filter = func()

    def __reduce__(self):
        # When unpickled, the filter will be reloaded
        return (Wrapper, (self.func,))

def load_filter():
    # expensive bloom-filter construction goes here
    pass

# Create a delayed function to load the filter
wrapper = delayed(Wrapper)(load_filter)

# Optionally persist the wrapper in the cluster, to be reused over multiple computations
wrapper = wrapper.persist()

def has_match(row, wrapper):
    return wrapper.filter.matches(a=row.a, b=row.b)

ddf['match'] = ddf.apply(has_match, args=(wrapper,), axis=1, meta=('matches', bool))
Use only threads
One way is to avoid the problem altogether and just not use separate processes at all. That way you won't need to serialize data between them.
ddf.compute(scheduler='threads')
This does limit you to running in a single process on a single machine though, which may not be what you want.
Figure out how to serialize your object
If you can figure out how to turn your object into a bytestring and back, then you can either implement the pickle protocol on your object (the __getstate__ and __setstate__ methods; see the Python docs) or you can add definitions to the dask_serialize and dask_deserialize dispatchable functions. See Dask's serialization docs for an example.
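For example (untested; BloomFilter, to_bytes and from_bytes are again hypothetical stand-ins for your C library's actual API):

class PicklableFilter:
    def __init__(self, state=None):
        self._filter = BloomFilter.from_bytes(state) if state else BloomFilter()

    def matches(self, **kwargs):
        return self._filter.matches(**kwargs)

    def __getstate__(self):
        # Reduce the C-backed object to a plain bytestring for pickling.
        return self._filter.to_bytes()

    def __setstate__(self, state):
        # Rebuild the C-backed object from the bytestring on the worker.
        self._filter = BloomFilter.from_bytes(state)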
Re-create your objects every time
Maybe it's hard to serialize your object, but cheap to recreate it once per partition?
def has_match(partition):
    my_filter = make_filter(...)
    return partition.apply(
        lambda row: my_filter.matches(a=row.a, b=row.b),
        axis=1,
    )

ddf['match'] = ddf.map_partitions(has_match)

Method arguments with and without def keyword

Suppose we have the following method:
def myMethodWithParameters(param1, def param2, Object param3) {
    ...
}
What are the differences between using the def keyword and using Object as type for an argument?
What are the differences between using the def keyword and not using any type/keyword for an argument?
What I know so far (which does not completely answer the question):
The def keyword allows dynamic typing, so you can even put an Object[] into such a parameter.
The def keyword can be used to make variables available only in the current scope instead of globally.
Quick link to the docs, which do a good job of explaining this:

When defining a method with untyped parameters, you can use def but it's not needed, so we tend to omit them. So instead of:
void doSomething(def param1, def param2) { }
Prefer:
void doSomething(param1, param2) { }
But as we mention in the last section of the document, it's usually better to type your method parameters, so as to help with documenting your code, and also help IDEs for code-completion, or for leveraging the static type checking or static compilation capabilities of Groovy.
The general rule I follow with Groovy is:
If you know what type you expect, or return, then put that type in the definition. If you only accept String, add the type to the parameter (the same with returning a value). This goes doubly for methods which form part of your "public" API (ie: if other classes or people are going to be making use of the method).
If it's just internal, or accepts a range of value types, then leave the argument untyped, and let Groovy sort it out...
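As a quick illustration of that rule (a sketch; greet and firstOf are made-up examples):

String greet(String name) {
    // Part of a "public" API: the explicit type documents intent and
    // helps IDE completion and static checks.
    return "Hello, $name"
}

def firstOf(items) {
    // Internal helper: untyped, so it works for anything indexable.
    return items[0]
}

assert greet('World') == 'Hello, World'
assert firstOf([1, 2, 3]) == 1
assert firstOf('abc') == 'a'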

GraphQL-Ruby: Custom Logic on Type Resolution

My team's been working with GraphQL Ruby, and we've found use cases for applying scopes and sorting to our queries, and have been writing the same code over and over in several places.
I'm wondering: is there a way to implement resolution logic for a type anywhere it is used? I want to add filtering and sorting arguments to every field that returns a specific type, without having to write additional boilerplate whenever I return it from a field. In the example below, I would wrap it with a GraphQL::Function, and the sorting and filtering arguments are passed in on args. I'd like to not have to use the function every time, but just have the type implement the ability to use those arguments for its own resolution.
field :institutionUser, Types::InstitutionUserType do
  with_filtering(resolve ->(obj, args, ctx) {
    ....resolution logic
  })
end
It sounds like you want to use a Resolver.
field :institutionUser, resolver: Resolvers::InstitutionUser

...

module Resolvers
  class InstitutionUser < GraphQL::Schema::Resolver
    type Types::InstitutionUserType, null: false

    def resolve
      ...resolution logic
    end
  end
end
https://graphql-ruby.org/fields/resolvers.html
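To get the shared filtering and sorting arguments the question asks about, they can be declared once on the resolver, so every field using it accepts them. A sketch (sort_by, filter, and the institution_users scope are made-up placeholders):

module Resolvers
  class InstitutionUser < GraphQL::Schema::Resolver
    type Types::InstitutionUserType, null: false

    # Declared once here, available on every field that uses this resolver.
    argument :sort_by, String, required: false
    argument :filter, String, required: false

    def resolve(sort_by: nil, filter: nil)
      scope = object.institution_users        # hypothetical association
      scope = scope.where(status: filter) if filter
      scope = scope.order(sort_by) if sort_by
      scope
    end
  end
end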

attr_accessor strongly typed Ruby on Rails

Just wondering if anyone can shed some light on the basics of getters and setters in Ruby on Rails, with a view toward strong typing. I am very new to Ruby on Rails and come predominantly from a .NET background.
For example, let's consider we have a .NET class called Person:
class Person
{
    public string Firstname { get; set; }
    public string Lastname { get; set; }
    public Address HomeAddress { get; set; }
}

class Address
{
    public string AddressLine1 { get; set; }
    public string City { get; set; }
    public string Country { get; set; }
}
In Ruby, I would write this as
class Person
  attr_accessor :FirstName
  attr_accessor :LastName
  attr_accessor :HomeAddress
end

class Address
  attr_accessor :AddressLine1
  attr_accessor :City
  attr_accessor :Country
end
Looking at the Ruby version of the Person class how do I specify the types for the accessor methods FirstName, LastName and HomeAddress? If I were to consume this class I could feed any type into HomeAddress but I want this accessor method to accept only the TYPE Address.
Any suggestions?
TL;DR: no, it's not possible... and the long answer: yes, it is possible; read the metaprogramming section below :)
Ruby is a dynamic language, which is why you won't get compile-time type warnings/errors as you do in languages like C#.
Just as you can't specify a type for a variable, you can't specify a type for attr_accessor.
This might sound stupid to you coming from .NET, but in the Ruby community people kind of expect you to write tests. If you do so, these types of problems will basically vanish. In Ruby on Rails, you should test your models. If you do, you won't really have any trouble with accidentally assigning something somewhere wrong.
If you're talking about ActiveRecord in Ruby on Rails specifically, assigning a String to an attribute which is defined as an Integer in the database will result in an exception being thrown.
By the way, by convention you shouldn't use CamelCase for attribute names, so the idiomatic class definition would be
class Person
  attr_accessor :first_name
  attr_accessor :last_name
  attr_accessor :home_address
end

class Address
  attr_accessor :address_line1
  attr_accessor :city
  attr_accessor :country
end
One reason for this is that if you capitalize the first letter, Ruby will define a constant instead of a variable.
number = 1 # regular variable
Pi = 3.14159 # constant ... changing will result in a warning, not an error
Metaprogramming hacks
By the way, Ruby also has insanely huge metaprogramming capabilities. You could write your own attr_accessor with a type check, that could be used something like
typesafe_accessor :price, Integer
with definition something like
class Foo
  # 'static', or better said 'class', method ...
  def self.typesafe_accessor(name, type)
    # here we dynamically define the accessor methods
    define_method(name) do
      # instance variable names need a leading @, so string interpolation
      # comes to help
      instance_variable_get("@#{name}")
    end
    define_method("#{name}=") do |value|
      # simply check the type and raise an exception if it's not what we want;
      # since this kind of Ruby block is a closure, we don't have to pass the
      # 'type' variable in, the block will 'remember' its value
      if value.is_a? type
        instance_variable_set("@#{name}", value)
      else
        raise ArgumentError.new("Invalid Type")
      end
    end
  end

  # Yes, we're actually calling a method here, because class definitions
  # aren't different from 'running' code. The only difference is that the
  # code inside a class definition is executed in the context of the class
  # object, which means calling 'self' here would return Foo.
  typesafe_accessor :foo, Integer
end

f = Foo.new
f.foo = 1
f.foo = "bar" # KaboOoOoOoM, an exception is thrown here!
or at least something along these lines :) This code works! Ruby allows you to define methods on the fly, which is how attr_accessor works.
Also, blocks are almost always closures, which means the if value.is_a? type check works without passing type in as a parameter.
It's too complicated to explain here when this is true and when it's not. In short, there are different types of blocks:
Proc, which is created by Proc.new
lambda, which is created by the keyword lambda
One of the differences is that calling return in a lambda only returns from the lambda itself, while doing the same thing in a Proc returns from the whole method surrounding the block. The latter is used when iterating, e.g.
def find(array, something)
  array.each do |item|
    # return will return from the whole 'find()' function
    # we're also comparing 'item' to 'something', because the block passed
    # to the each method is also a closure
    return item if item == something
  end
  return nil # not necessary, but makes it more readable for explanation purposes
end
If you're into this kind of stuff, I recommend you check out PragProg Ruby Metaprogramming screencast.
Ruby is a dynamically typed language; like many dynamically typed languages, it adheres to duck typing -- from the English idiom, "If it walks like a duck and quacks like a duck, then it's a duck."
The upside is that you don't have to declare types on any of your variables or class members. The restrictions on what types of objects you can store in variables or class members come only from how you use them -- if you use << to "write output", then you could use a file or array or string to store the output. This can greatly increase the flexibility of your classes. (How many times have you been upset that an API you had to use required a FILE * C standard IO file pointer rather than allowing you to pass in a buffer?)
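For instance, a small sketch: anything that responds to << can serve as the "output" here:

def write_report(out)
  out << "line 1\n"
  out << "line 2\n"
end

write_report($stdout) # IO responds to <<
write_report("")      # so does String...
write_report([])      # ...and Array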
The downside (and, in my mind, it's a big one) is that there's no easy way to determine what data types you can safely store in any given variable or member. Perhaps only once every leap year is some new method called on a variable or member -- your program might crash there with a NoMethodError, and your testing might have missed it completely because it relied on inputs you didn't realize were vital. (This is a fairly contrived example, but corner cases are where most programming flaws live, and dynamic typing makes corner cases that much harder to discover.)
In short: there's no restriction on what you can store in your Address fields. If it supports the methods you call on those objects, it is -- as far as the language is concerned -- an Address. If it doesn't support the methods you need, then it will crash during sufficiently-exhaustive testing.
Just be sure to use the testing facilities to their fullest, to make sure you're exercising your code sufficiently to find any objects not fully compliant with a required API.
