Package src :: Package com :: Package deviceatlas :: Package device :: Module node :: Class Node
[frames] | no frames]

Class Node


The Trie is created by chains of nested Node objects. It is created by
converting character code points to their byte equivalent. Child nodes are
represented by byte arrays that can be accessed to return the next node in
the chain.

To walk the trie a character code point is passed to a Node. The codepoint is
converted into bytes which are used to walk the trie.

If a code point is larger than 2^16 then it is split into two bytes and added
as two chained nodes. This occurs for characters outside the ascii range.
Characters that are surrogate pairs such as emoji are handled as separate
characters instead of converting to a codepoint higher than 2^16.

Each node can have two types of child node arrays.

1) Direct child nodes - these are direct children that are part of a given token.
2) Token start nodes - these are the first byte of a given token and appear
                       at the root node and at the start of every token in a
                       token chain

The child nodes are represented by arrays of byte values where each item in
the array is either null or the next Node in the chain. An offset is used to
reduce the number of items in the array.


Example: children = 'd','h','i'

==> offset: byte value of d == 100
==> children array has six elements with three of them being empty.
children[0] == 0+100 = 100 = 'd'
children[1] == 1+100 = 101 = 'e' // nothing, not a real node
children[2] == 2+100 = 102 = 'f' // nothing, not a real node
children[3] == 3+100 = 103 = 'g' // nothing, not a real node
children[4] == 4+100 = 104 = 'h'
children[5] == 5+100 = 105 = 'i'

Example
   RUle1: ALCATELVH1
   Rule2: ALCATEL AND V15T

Combined:
   ALCATEL
         \ VH1 (direct child node="V", i.e. part of the original token)
         \ V15T (token start child node="V", i.e. start of next token in rule chain)

FAIL (without direct/token start distinction - no way to tell end of token):
   ALCATELV
          \ H1
          \ 15T


@author DeviceAtlas Limited

Instance Methods
 
__init__(self)
 
is_start_with_allowed_any_match(self)
 
match_candidates(self, match_candidates)
 
has_token_start_node(self)
Returns TRUE if this Node contains any token-starts children.
 
get_token_start_node(self, code_point)
Try and get a token starts child node by using the character code point.
 
update_starts_with_allowed_any_match(self)
Check all matchCandidates to see if any contain a starts-with constraint.
 
get_token_start_node_by_byte(self, seek)
 
get_child_node(self, seek, is_token_start_child)
Try and get a child node using a single byte value and not the usual integer code point that may be converted to more than one byte.
 
add_child(self, child_value, is_token_start_child)
 
get_direct_child_node(self, code_point)
Try and get a direct child node by using the character code point.
 
get_direct_child_node_by_byte(self, seek)
Class Variables
  MAX_UNSIGNED_BYTE_VALUE = 255
Method Details

match_candidates(self, match_candidates)

 
Decorators:
  • @match_candidates.setter

has_token_start_node(self)

 

Returns TRUE if this Node contains any token-starts children. FALSE otherwise. :return:

get_token_start_node(self, code_point)

 

Try and get a token starts child node by using the character code point. The codepoint is converted to one or possibly two byte values to walk the Nodes. If the codepoint is converted to two bytes then this method first tries to get a Node from the token start array and then subsequently from a direct child array. :param code_point: The codepoint value of a character ranges 0-65535 :return: The next node or null if none found.

update_starts_with_allowed_any_match(self)

 

Check all matchCandidates to see if any contain a starts-with constraint. If so set the swAllowedAnyMatch flag to true. :return:

get_child_node(self, seek, is_token_start_child)

 

Try and get a child node using a single byte value and not the usual integer code point that may be converted to more than one byte. :param seek: :param is_token_start_child: :return:

get_direct_child_node(self, code_point)

 

Try and get a direct child node by using the character code point. The codepoint is converted to one or possibly two byte values to walk the Nodes. If the codepoint is converted to two bytes then this method tries to perform to hops in the Trie. :param code_point: The codepoint value of a character ranges 0-655 :return: The next node or null if none found.