Identifying sub parts within sequence of text using deep learning

First of all, I am very new to deep learning, so I apologize if this question is not up to the standard of the site.
I have sequences of ARM assembly opcodes, each corresponding to a function (you can view the whole CSV file from here). One such sequence, for a single function, looks as follows.
// This is the disassembly sequence of a function named bit()
// just assume the four opcodes `b0 0a 46 01` belong to a different library
83 b0 0a 46 01 90 02 a8 01 70 ff e7 01 98 01 68 01 22 52 05 91 43 01 60 02 a8 00 78 40 05 00 90 ff e7 01 98 01 68 00 9a 11 43 01 60 01 98 03 b0 70 47
I have already built a small deep learning model (by following an NLP tutorial) that classifies a byte sequence like the one above into one of eight function classes. However, in addition to identifying the function label, I need to identify the byte sub-sequences that belong to a particular library. For example, in the function above, b0 0a 46 01 belongs to a different library. So I want to identify such sub-sequences when a whole function sequence is passed in. I believe this is similar to object detection in images, where instead of just classifying the image itself, the model identifies the objects in the image.
Frankly, I don't know whether this is possible with deep learning. If it is, I would like to know of any resources or tutorials that I could study in order to reach my goal. Once again, sorry if I am asking something that doesn't make sense. I appreciate any help.

Your idea of treating this as an object detection problem seems to make sense. The YOLO model is supposed to be pretty decent for that use case: https://pjreddie.com/darknet/yolo/ . Perhaps you can substitute 1D convolutions for the 2D convolutions to adapt it to your sequences. Additionally, creating embeddings to encode your opcodes as a first step might be helpful, although you might have already implemented this part. I hope this helps.
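Whichever model you pick, you will need training targets that mark where the library bytes sit inside each function, which is the 1D counterpart of a bounding box. Here is a minimal Python sketch of how such targets could be built, assuming the library sub-sequence is already known for your training data. The helper names (`parse_hex`, `find_spans`, `span_mask`) are made up for illustration and do not come from any library.

```python
def parse_hex(seq: str) -> list[int]:
    """Turn a space-separated hex string into a list of byte values."""
    return [int(tok, 16) for tok in seq.split()]


def find_spans(tokens: list[int], pattern: list[int]) -> list[tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, where pattern occurs."""
    n, m = len(tokens), len(pattern)
    return [(i, i + m) for i in range(n - m + 1) if tokens[i:i + m] == pattern]


def span_mask(length: int, spans: list[tuple[int, int]]) -> list[int]:
    """Per-byte labels: 1 inside a library span, 0 elsewhere."""
    mask = [0] * length
    for start, end in spans:
        for i in range(start, end):
            mask[i] = 1
    return mask


# First bytes of the bit() function from the question.
tokens = parse_hex("83 b0 0a 46 01 90 02 a8 01 70 ff e7")
spans = find_spans(tokens, parse_hex("b0 0a 46 01"))
mask = span_mask(len(tokens), spans)
print(spans)  # (start, end) pairs, usable as 1D "boxes"
print(mask)   # per-byte labels, usable for a tagging-style model
```

The `spans` form suits a detection-style model, while the `mask` form suits a model that predicts one label per byte; which one works better for your data is something you would have to try.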

Related

decode lorawan data gps tracker Moko LW001-BG Thethings network

I am new to LoRa. I am trying to connect my LoRa gateway and an LW001-BG GPS tracker to The Things Network (TTN). They connected successfully, but how do I convert or decode the data from the GPS tracker into latitude/longitude format?
Here is the documentation: http://doc.mokotechnology.com/index.php?s=/2&page_id=143
I receive data in a format like this: 02 01 56 F8 0B 45 F4 29 32 46, and I need to convert/decode it to a readable format.
Thanks, I hope someone can help me.

The payload of the message is in bytes 3-6 (the latitude) and 7-10 (the longitude). The first two bytes indicate how many packets there are (two) and which one the current packet is (the first).
Each group of four bytes represents a 32-bit floating point value; in your example this is 2239.5210 for the latitude. This means 22 degrees and 39.5210 minutes, that is, 22 degrees, 39 minutes, and 31.26 seconds (the fractional part of the minutes times sixty).
You can see this in an online converter. As the byte order is lowest byte first, you need to reverse it, convert it to binary, and then tick the corresponding checkboxes in the binary representation:
56 F8 0B 45 becomes 45 0B F8 56, or in binary
01000101000010111111100001010110
Here the first bit is the sign, followed by 8 bits of exponent and 23 bits of mantissa. The decimal representation is 2239.52099609, which you round to four decimal places to get 2239.5210.
Depending on how you process this data, you might be able to simply reinterpret the four bytes as a float variable, as floats generally follow the 32-bit IEEE 754 standard.
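As a minimal sketch of that last point in Python: `struct.unpack` with the `<f` format reads four bytes as a little-endian IEEE 754 float. The function names are made up for illustration, the byte offsets follow the layout described above, and the handling of negative values (southern or western positions) is an assumption you should check against the Moko documentation.

```python
import struct


def decode_ddmm(raw: bytes) -> float:
    """Read 4 little-endian bytes as a float in DDMM.MMMM form."""
    (value,) = struct.unpack("<f", raw)
    return value


def ddmm_to_degrees(ddmm: float) -> float:
    """Convert DDMM.MMMM (degrees and decimal minutes) to decimal degrees."""
    sign = -1.0 if ddmm < 0 else 1.0  # assumption: sign carried by the float
    ddmm = abs(ddmm)
    degrees = int(ddmm // 100)
    minutes = ddmm - degrees * 100
    return sign * (degrees + minutes / 60)


payload = bytes.fromhex("02 01 56 F8 0B 45 F4 29 32 46")
lat_raw = decode_ddmm(payload[2:6])   # bytes 3-6, about 2239.521
lon_raw = decode_ddmm(payload[6:10])  # bytes 7-10
print(ddmm_to_degrees(lat_raw), ddmm_to_degrees(lon_raw))
```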

How can I get "mode & PIDs" from raw OBD2 identifier 11 or 29 bit?

I have connected to OBD2 and am getting the CAN data (11-bit, 500 kbps CAN) using an Atmel CAN controller.
I get data.
Now, how do I get the mode and PIDs from this data?
For example, my data looks like this:
15164A8A-FF088B52 -- Data: 00,00,00,86,9C,FE,9C,FE,
I could see RPM changing, ignition on/off etc... on the data fields.
I don't want to use ELM chips. I need to handle the raw data directly.

HINT: All of my numbers are in HEX.
The OBD2 protocol sends you responses in bytes (8 bits). Responses are subdivided into a header (also called the ID) and data.
The ID is the address of the ECU, and the data is the "response data" from the ECU; in OBD2 over CAN it is always 8 bytes.
The 8 bytes of data are divided into the PCI (which can be one or two bytes) and the values. The PCI tells you the frame type (single, first, consecutive, or flow control frame) and how many bytes are incoming.
To keep it simple, here is an example for a single frame only.
You might send an OBD request to the main ECU like this:
7DF 02 01 0C 00 00 00 00 00
7DF is the functional (broadcast) request ID used by the diagnostic tester.
02 is the number of data bytes being sent.
01 is the mode (which you might be interested in!): 01 is current data, 02 is freeze frame, and so on.
0C is the RPM PID.
The response from the ECU would be something like this (single frame):
7E8 04 41 0C 12 13 00 00 00
7E8 is the ECU that is responding.
04 is the number of incoming data bytes.
41 means the data is a response to mode 01 (the request mode plus 0x40).
0C is the PID being answered.
12 13 are the two bytes returned for PID 0C. Keep in mind that you have to decode these two bytes according to the OBD-II standard; you can find the conversion formulas for many PIDs on Wikipedia.
The remaining bytes are padding.
In short: you have to parse each response from the ECU and convert the useful bytes into a readable decimal value. How you do that depends on your programming language. In C/C++ a good choice, in my opinion, is uint8_t (or unsigned char, which is 8 bits on practically every platform), and in Java it can be byte. Moreover, try to use bitwise operators to make your life a bit easier.
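The single-frame parsing described above can be sketched in Python as follows. This is only an illustration under the assumptions of the example (a single frame, a mode 01 response); the function names are made up, and the RPM formula (256*A + B) / 4 is the one listed for PID 0C in the public OBD-II PID tables.

```python
def parse_single_frame(data: bytes) -> tuple[int, int, bytes]:
    """Split the 8 data bytes of a single-frame response into mode, PID, values."""
    pci = data[0]
    frame_type = pci >> 4      # high nibble: 0 means single frame
    if frame_type != 0:
        raise ValueError("not a single frame")
    length = pci & 0x0F        # low nibble: number of payload bytes that follow
    payload = data[1:1 + length]
    mode = payload[0] - 0x40   # responses echo the request mode plus 0x40
    pid = payload[1]
    return mode, pid, payload[2:]


def decode_rpm(values: bytes) -> float:
    """PID 0C: engine speed in rpm from the two value bytes A and B."""
    return ((values[0] << 8) | values[1]) / 4


# Data bytes of the example response "7E8 04 41 0C 12 13 00 00 00".
mode, pid, values = parse_single_frame(bytes.fromhex("04 41 0C 12 13 00 00 00"))
if mode == 0x01 and pid == 0x0C:
    print(decode_rpm(values))  # 0x1213 / 4 = 1156.75 rpm
```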
If you have more questions, do not hesitate to ask.

Difference between Bluetooth and iBeacon protocols in iOS

I am integrating an iOS app with a Bluetooth chip.
I am trying to understand the differences between a constant BLE connection and iBeacon notifications in terms of the protocol.
So far I am able to connect to the chip and send/receive data to/from the iPhone.
To communicate, you need 3 things:
name
service UUID
characteristic UUID (to write to and read from).
Then, when you want to register for an iBeacon region, you also need the UUID of the characteristic, plus major/minor values.
The thing is, iBeacon apps do not detect the type of broadcast that I use for communication.
So what is the basic difference in the protocol? What if I keep advertising the same service/characteristic that I use for regular communication? Is that also good enough for beacon push notifications? Apparently not, but what is the difference?
Also, I have 2 hardware chips. Neither lets you set major/minor values; they only let you update the characteristic to notify subscribers of a new value, which is hex (not an integer like major/minor). Is that equivalent to an iOS push notification triggered by iBeacon detection?

Difference between constant BLE connection and Beacons
A "constant BLE connection" is a connection: two devices pair with each other.
A BLE iBeacon device does not know about other devices. It simply broadcasts a certain signal at regular intervals. Other devices can then listen for this signal and evaluate its strength to estimate how near the sender might be.
"Is it equivalent to Push Notifications?"
No.
Major & Minor
The major number (2 bytes) is used to group a related set of beacons. For example, all beacons in my flat have the same major number, while my neighbour uses his own. That way the application knows which flat it is in.
The minor number (2 bytes) is used to identify the actual beacon. Each beacon in my flat has a different minor number, so the application knows where within my flat it is.
" value to subscribers ... is hex ... not an integer"
A hexadecimal value is an integer; hexadecimal is just a notation for writing it.
HowTo
You need to insert a specific set of bytes into the optional manufacturer specific Data field (your "new value" for subscribers).
According to this site, you need the following values:
ID (uint8_t)
Data Length (uint8_t) - The number of bytes in the rest of the payload = 0x15
128-bit UUID (uint8_t[16]) - The 128-bit ID identifying the beacon's manufacturer
Major (uint16_t) - The major value
Minor (uint16_t) - The minor value
TX Power (uint8_t) - This value is used to try to estimate distance based on the RSSI value
Example from this site:
0x02 0x0008 1E 02 01 1A 1A FF 4C 00 02 15 00 00 00 00 C8 00
0x02
0x0008
1E 02 01 1A 1A FF 4C 00 02 15 Manufacturer
00 00 Major
00 00 Minor
C8 00 Power
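To make the field list concrete, here is a minimal Python sketch that assembles an advertisement with this layout: the flags, then a manufacturer-specific data field containing Apple's company ID (4C 00), the 02 15 prefix, the UUID, major, minor, and TX power. The function name is made up, and the UUID, major, and minor in the usage lines are placeholders. How you hand these bytes to your chip depends on its firmware and is not shown.

```python
import struct
import uuid


def build_ibeacon_adv(proximity_uuid: uuid.UUID, major: int, minor: int,
                      tx_power: int) -> bytes:
    """Assemble the 30 advertisement bytes of an iBeacon-style broadcast."""
    flags = bytes([0x02, 0x01, 0x1A])
    # ID 0x02, data length 0x15 (21 bytes: UUID + major + minor + TX power).
    beacon = bytes([0x02, 0x15]) + proximity_uuid.bytes
    # Major and minor are big-endian uint16; TX power is a signed byte (dBm).
    beacon += struct.pack(">HHb", major, minor, tx_power)
    # AD type 0xFF (manufacturer specific), company ID 0x004C sent low byte first.
    field = bytes([0xFF, 0x4C, 0x00]) + beacon
    return flags + bytes([len(field)]) + field


# Placeholder values for illustration only.
example_uuid = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")
adv = build_ibeacon_adv(example_uuid, major=1, minor=2, tx_power=-56)
print(adv.hex(" "))
```

The first nine bytes come out as 02 01 1A 1A FF 4C 00 02 15, matching the "Manufacturer" line of the example above, and a TX power of -56 dBm is encoded as C8.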

Why do different data types take more memory for the same data?

For example, if I want to store the number one, I can use an integer type, taking up 32 bits, or a long type, taking up 64 bits. However, both hold the same amount of information (from a symbolic perspective).

A variable occupies space based on its type, not on the value it actually contains.
The type determines the full set of possible values, of which the current value is just one. So it is the set of possible values that requires a certain amount of space, not the value itself.
EDIT:
I sense confusion :)
Let's say we have 2 bits which can be combined in 4 ways:
00
01
10
11
These are all the possible combinations of 2 bits.
What they represent is completely irrelevant. We just have 4 different states, and we can map them to whatever we want:
00 white
01 black
10 red
11 blue
or
00 A
01 B
10 C
11 D
or
00 0
01 1
10 2
11 3
The fact that we can encode those 4 states is bound to the type. Whatever value we store in a variable of that type will always occupy all 2 bits that are necessary to encode all 4 possible values.
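The same point can be checked with a few lines of Python, as a small illustration. The `struct` module packs a value into a fixed-width binary layout, where `<i` is a 32-bit integer and `<q` is a 64-bit integer; the helper name `encoded_size` is made up for this sketch.

```python
import struct


def encoded_size(fmt: str, value: int) -> int:
    """Number of bytes the value occupies when stored with the given type."""
    return len(struct.pack(fmt, value))


for value in (1, 1000, 2**31 - 1):
    # The size depends on the format (the type), never on the value.
    print(value, encoded_size("<i", value), encoded_size("<q", value))
```

Every line reports 4 bytes for the 32-bit type and 8 bytes for the 64-bit type, whether the stored value is 1 or the largest value the 32-bit type can hold.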
A notable exception is strings, whose size grows with their content. They can be seen as a modern implementation of Turing's tape, on which characters from an alphabet are inscribed. Remarkably, we can store all human knowledge with that type (e.g. the totality of all written books could be stored in one single string).

What data structure should I use to store the results of a survey?

Xcode 5/iOS7 No Core Data.
I'm trying to build a survey for users and need some assistance with setting up the data structure.
The survey is basically a tree. Depending on which option the user selects, the next screen can be different.
Here is the format:
Scene 1: Question 1
Answer choices for Question 1:
Answer 1A // This option will take them to scene 2A --> all the answers in 2A will take them to 3A --> all the options in 3A will take them to 4A and the survey will end
Answer 1B // This option will take them to scene 2B --> all the answers in 2B will take them to 3B --> all the options in 3B will take them to 4A and the survey will end
Answer 1C // This option will take them to scene 2C --> all the answers in 2C will take them to 3C --> all the options in 3C will take them to 4A and the survey will end
Answer 1D // This option will take them to scene 4A and the survey will end
Answer 1E // This option will take them to scene 5A and the survey will end
Hopefully the above makes sense.
So my question is: what's the best way to keep track of which options the user selected, especially when the user taps 'Back'?
My thought is to use an array for each scene, where the question is at [NSArray objectAtIndex:0] and the answer choices are at [NSArray objectAtIndex:1], [2], [x].
But I'm not sure how to keep track of which scene to call next, which scenes the user has been shown, and which answer choices were selected.
Has anybody dealt with something like this? Any guidance on data structure would be appreciated!

I think you can map these options to unique characters such as 'X' (Back) and 'y' (Forward), or assign numbers based on the characters' ASCII values (that is, write conditions on the ASCII values). With that you can implement searching or moving to a particular branch of the tree.
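A different way to look at it, offered only as a sketch and not as what the answer above proposes: keep the branching in a lookup table and the user's path in a stack, so that 'Back' is just a pop. The example is in Python for brevity. The scene names mirror the question, while `NEXT_SCENE`, `Survey`, and the "*" wildcard key are made-up names for illustration; the same shape maps onto NSDictionary and NSMutableArray.

```python
# scene -> {answer -> next scene}; "*" means "any answer leads here".
NEXT_SCENE = {
    "1": {"A": "2A", "B": "2B", "C": "2C", "D": "4A", "E": "5A"},
    "2A": {"*": "3A"}, "3A": {"*": "4A"},
    "2B": {"*": "3B"}, "3B": {"*": "4A"},
    "2C": {"*": "3C"}, "3C": {"*": "4A"},
    # "4A" and "5A" have no entry: the survey ends there.
}


class Survey:
    def __init__(self, start: str = "1") -> None:
        self.current = start
        self.history: list[tuple[str, str]] = []  # (scene, chosen answer)

    def answer(self, choice: str) -> None:
        """Record the choice for the current scene and move to the next one."""
        routes = NEXT_SCENE.get(self.current, {})
        nxt = routes.get(choice, routes.get("*"))
        if nxt is None:
            raise ValueError(f"no route from scene {self.current!r}")
        self.history.append((self.current, choice))
        self.current = nxt

    def back(self) -> None:
        """Return to the previous scene and forget the answer given there."""
        if self.history:
            self.current, _ = self.history.pop()


survey = Survey()
survey.answer("B")    # scene 1 -> 2B
survey.answer("yes")  # scene 2B -> 3B (any answer)
survey.back()         # back to 2B
print(survey.current, survey.history)
```

At any moment `history` holds exactly the scenes shown and the answers chosen, which covers both the "what was selected" and the "where does Back go" parts of the question.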
