
Survey on DeepLog

Posted on 2019-01-27

This is a paper published at CCS 2017 by members of Feifei Li's group. The paper focuses on log analysis for large-scale systems, for the purpose of debugging.

In the real world, we debug a system by analyzing the log records produced by different sources within the system, for example the network log and the storage log of an OpenStack service. In this paper, the authors combine a deep learning model with log analysis; specifically, the model uses an RNN (LSTM) to capture the temporal characteristics of the log sequence across timestamps.
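As a concrete illustration of that idea, here is a minimal sketch (our reconstruction, not the paper's code; the sizes are made up) of DeepLog-style next-log-key prediction: parse logs into a sequence of key ids, train an LSTM to predict the next key from a window of previous ones, and flag an entry as anomalous when the observed key is not among the model's top predictions.

import torch
import torch.nn as nn

class NextKeyLSTM(nn.Module):
    # Predict the next log key from a window of previous log-key ids.
    def __init__(self, num_keys, hidden_size=64, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size,
                            num_layers=num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_keys)

    def forward(self, x):             # x: (batch, window, 1) of log-key ids
        out, _ = self.lstm(x.float())
        return self.fc(out[:, -1])    # logits over the next log key

model = NextKeyLSTM(num_keys=28)
window = torch.randint(0, 28, (4, 10, 1))   # 4 windows of 10 keys each
print(model(window).shape)                  # torch.Size([4, 28])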

Pointer Network

Posted on 2018-11-29

Report

In this report, we introduce a new encoder-decoder architecture to solve the sorting problem in a general-purpose way. The report can be separated into several parts:

  • The Attention mechanism and the seq2seq model

  • The architecture and usage of the pointer network

  • The implementation of the code

  • The performance analysis of the resulting work

Attention mechanism

To talk about the attention mechanism, we must first know what RNNs and the seq2seq model are.

In NLP we meet the translation problem quite often, and we usually use the seq2seq model to solve it. The following architecture is a standard seq2seq model containing both the encoder and the decoder.

[Figure: a standard seq2seq model with encoder and decoder]

We have omitted the loss function design of the seq2seq model, since it varies across implementations; the pointer network section will present the given network in much more detail. Some shortcomings of this architecture are easy to see: once the sentence length grows, the yellow cell (the final encoder state) cannot memorize enough information.

So a new concept, attention, was brought forward. With it, we can fully utilize the encoder's output information; the following picture shows part of the architecture of the attention mechanism.

[Figure: partial architecture of the attention mechanism]

In the attention mechanism, we add a weight matrix A to weight the encoder's outputs. The details of the formula will be given in the implementation part.

[Figure: attention weight formula]

But note that the attention mechanism varies across implementations and models; for a more detailed treatment, you can refer to this survey: https://lilianweng.github.io/lil-log/2018/06/24/attention-attention.html
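As a concrete illustration, here is a minimal sketch of additive attention scoring in PyTorch (our own code, with made-up sizes; the post's W1/W2/vt layers below follow the same pattern): each encoder state is scored against the current decoder state, and a softmax turns the scores into weights.

import torch
import torch.nn.functional as F

def additive_attention_weights(encoder_states, decoder_state, W1, W2, v):
    # encoder_states: (L, hidden); decoder_state: (hidden,)
    # W1, W2: (hidden, weight); v: (weight,) -- all learnable in practice
    scores = torch.tanh(encoder_states @ W1 + decoder_state @ W2) @ v  # (L,)
    return F.softmax(scores, dim=0)   # one weight per encoder timestep, sums to 1

L, hidden, weight = 5, 8, 8
w = additive_attention_weights(
    torch.randn(L, hidden), torch.randn(hidden),
    torch.randn(hidden, weight), torch.randn(hidden, weight), torch.randn(weight))
print(w.shape, float(w.sum()))        # torch.Size([5]) ~1.0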

What is a Pointer Network?

The pointer network can be viewed as a seq2seq model equipped with a special attention mechanism; the picture below describes the architecture of a pointer network.

[Figure: architecture of a pointer network]

At first glance this looks like a seq2seq model, but note that the pointer with the red mark differs slightly from the standard version of seq2seq: the output of the decoder at a certain timestamp is a softmaxed vector, where each element's value is the probability that the output points to the corresponding index of the input sequence. The formula below gives a more detailed picture of how to get the softmax layer.

u_j^i = v_t^T · tanh(W1·e_j + W2·d_i),  j ∈ (1, …, n)
p(C_i | C_1, …, C_{i-1}, P) = softmax(u^i)

Here e_j and d_i denote the encoder's and the decoder's hidden states, W1 and W2 are weight matrices to be trained, and v_t is a weight vector that transforms the result to the proper size.

Applying the softmax function then generates, for the current decoding timestamp, the probability that the output points to input index j. The implementation in PyTorch will give you more detailed insight.

The following 3 lines define W1, W2 and v_t from the formula above.

self.W1 = nn.Linear(hidden_size, weight_size, bias=False) # blending encoder
self.W2 = nn.Linear(hidden_size, weight_size, bias=False) # blending decoder
self.vt = nn.Linear(weight_size, 1, bias=False)

What is the sorting task?

Here we need to clarify what the sorting task is, namely what the input and the output of our neural network are. The sorting task here is slightly different from normal sorting: we output the indices that sort the raw array. For example:

x = [0.54431329, 0.64373097, 0.9927333 , 0.70941862, 0.10016056]
y = [4,0,1,3,2]

Here y[i] is the index in x of the element that should appear at position i of the sorted array; in other words, y = argsort(x).
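To double-check the target construction, a tiny sketch (assuming numpy; our own, not from the post):

import numpy as np

x = np.array([0.54431329, 0.64373097, 0.9927333, 0.70941862, 0.10016056])
y = np.argsort(x)   # y[i] = index in x of the i-th smallest value
print(y)            # [4 0 1 3 2]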

How does the pointer network solve the sorting task?

We use an LSTM cell as the basic element of the encoder and the decoder, and the following parts explain step by step how to "feed" our input data and construct our loss function.

Encoder Steps

At each timestep of the encoder, we feed the number in directly (no embedding, unlike a bag-of-words model!).

The LSTM cell then generates an output vector at each timestamp and passes the hidden state on to the next timestamp.

self.enc = nn.LSTM(input_size, hidden_size, batch_first=True)  # define the encoder
encoder_states, hc = self.enc(new_input.float())  # encoder_states: (batch_size, L, H)

Here encoder_states stacks the encoder state at each timestep.
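For concreteness, a tiny sketch (ours, with made-up sizes) of how a batch of raw numbers is shaped before entering the encoder: each scalar becomes a length-1 feature vector.

import torch
import torch.nn as nn

batch_size, L, input_size, hidden_size = 2, 5, 1, 16
enc = nn.LSTM(input_size, hidden_size, batch_first=True)

x = torch.rand(batch_size, L)              # raw numbers to sort
encoder_states, hc = enc(x.unsqueeze(-1))  # (batch_size, L, 1) -> (batch_size, L, hidden_size)
print(encoder_states.shape)                # torch.Size([2, 5, 16])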

Decoder Steps

For the decoder steps, we cannot use nn.LSTM directly, because we will hack each timestamp's calculation; so we choose nn.LSTMCell and implement the recurrence manually with a for loop.

Before we implement the algorithm, you should recap the formula mentioned in What is a Pointer Network?. For the implementation, we just need the corresponding logic in the forward function.

hidden, cell_state = self.dec(decoder_input, (hidden, cell_state)) # (batch_size, h), (batch_size, h)
## Reference: https://zhuanlan.zhihu.com/p/30860157
blend1 = self.W1(encoder_states) # (L, bs, W)
blend2 = self.W2(hidden) # (bs, W)
blend_sum = F.tanh(blend1 + blend2) # (L, bs, W)
out = self.vt(blend_sum).squeeze() # (L, bs)
out = F.log_softmax(out.transpose(0, 1).contiguous(), -1) # (bs, L)
decoder_input = torch.max(out, 1)[1].view(batch_size, -1).float() # (bs, input_size)
probs.append(out)

The code above is the main logic of the for loop. Pay attention to the fact that decoder_input, the input at the current timestamp, is the index predicted at the previous timestamp (obtained by applying argmax to the probability matrix):

out = F.log_softmax(out.transpose(0, 1).contiguous(), -1) # (bs, L)
decoder_input = torch.max(out, 1)[1].view(batch_size, -1).float() # (bs, input_size)

Network architecture overview

input_size: the size of the input at each timestamp (1 in the sorting task)

answer_seq_len: the length of the array

weight_size: the first dimension of the blending matrices W1 and W2

hidden_size: the hidden size of the encoder's and decoder's LSTM

enc: the encoder network

dec: the decoder network (run step by step)

class PointerNetwork(nn.Module):
    def __init__(self, input_size, answer_seq_len, weight_size=128, hidden_size=128):
        super(PointerNetwork, self).__init__()

        self.hidden_size = hidden_size
        self.answer_seq_len = answer_seq_len
        self.weight_size = weight_size
        self.input_size = input_size

        self.enc = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.dec = nn.LSTMCell(input_size, hidden_size)  # LSTMCell's input is always batch first

        ## Reference: https://zhuanlan.zhihu.com/p/30860157
        self.W1 = nn.Linear(hidden_size, weight_size, bias=False)  # blending encoder
        self.W2 = nn.Linear(hidden_size, weight_size, bias=False)  # blending decoder
        self.vt = nn.Linear(weight_size, 1, bias=False)
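Putting the pieces together, the forward pass could look roughly like the condensed sketch below (our reconstruction from the snippets above, assuming the usual imports torch and torch.nn.functional as F; the post does not show the full forward method):

    def forward(self, input):  # input: (batch_size, L)
        batch_size, L = input.size()
        new_input = input.unsqueeze(-1)                   # (bs, L, 1)
        encoder_states, hc = self.enc(new_input.float())  # (bs, L, H)
        encoder_states = encoder_states.transpose(0, 1)   # (L, bs, H) for blending

        # Zero-initialized decoder state for this sketch.
        hidden = torch.zeros(batch_size, self.hidden_size)
        cell_state = torch.zeros(batch_size, self.hidden_size)
        decoder_input = torch.zeros(batch_size, self.input_size)

        probs = []
        for _ in range(self.answer_seq_len):
            hidden, cell_state = self.dec(decoder_input, (hidden, cell_state))
            blend1 = self.W1(encoder_states)                            # (L, bs, W)
            blend2 = self.W2(hidden)                                    # (bs, W)
            out = self.vt(torch.tanh(blend1 + blend2)).squeeze(-1)      # (L, bs)
            out = F.log_softmax(out.transpose(0, 1).contiguous(), -1)   # (bs, L)
            decoder_input = torch.max(out, 1)[1].view(batch_size, -1).float()
            probs.append(out)
        return torch.stack(probs, dim=1)                  # (bs, answer_seq_len, L)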

Loss Function Calculation

We use the cross-entropy loss for our neural network; applying F.nll_loss to log_softmax outputs computes exactly the cross entropy.

out = F.log_softmax(out.transpose(0, 1).contiguous(), -1) # (bs, L)
loss = F.nll_loss(outputs, y_batch)
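For reference, the two formulations below compute the same value (a small check of our own, not from the post):

import torch
import torch.nn.functional as F

logits = torch.randn(4, 5)             # (batch, classes)
targets = torch.tensor([0, 2, 1, 4])

loss_nll = F.nll_loss(F.log_softmax(logits, -1), targets)
loss_ce = F.cross_entropy(logits, targets)
print(torch.allclose(loss_nll, loss_ce))  # True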

Training Framework

In the training part, we implement three different tasks based on three separately trained models (with the same structure).

I also refactored the test code.

model = PointerNetwork(1, input_seq_len)
# model.train()
optimizer = optim.Adam(model.parameters())
train_X, train_Y = getdata(shiyan, dataset_size)  # shiyan: the experiment/task id
train_x = torch.LongTensor(train_X)
train_y = torch.LongTensor(train_Y)
loader = DataLoader(TensorDataset(train_x, train_y), batch_size=batch_size)
for epoch in range(MAX_EPOCH):
    for x_batch, y_batch in loader:
        ############################
        # compute the prediction
        probs = model(x_batch)
        outputs = probs.view(-1, input_seq_len)
        y_batch = y_batch.view(-1)
        loss = F.nll_loss(outputs, y_batch)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    torch.save(model, 'mytraininglstm' + str(shiyan) + '.pkl')

    if epoch % 10 == 0 and epoch != 0:
        print(epoch, '\t loss is \t', loss.item())

    if epoch % 100 == 0 and epoch != 0:
        evaluate(model, shiyan)
        evaluate_naive(model, train_x, train_y)
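The evaluate and evaluate_naive helpers are not shown in the post; a plausible sketch of evaluate_naive (hypothetical, ours) that measures whole-sequence accuracy on the training data:

import torch

def evaluate_naive(model, train_x, train_y):
    # Hypothetical helper: fraction of arrays whose entire predicted
    # index sequence matches the target permutation.
    with torch.no_grad():
        probs = model(train_x)      # (N, L, L) log-probabilities
        pred = probs.argmax(-1)     # (N, L) predicted indices
        acc = (pred == train_y).all(dim=1).float().mean()
    print('naive accuracy:', acc.item())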

This is the pipeline of training and testing. I use the DataLoader provided by PyTorch, which acts like a generator over a large dataset, and we log the loss and accuracy at certain intervals. The next section shows some results of our experiments under different circumstances.


Result and Conclusion

We used the same network architecture for all 3 tasks, with batch_size set to 250, max_epoch to 1000, and dataset_size to 2500.

We will analyze how the architecture of the network affects performance (training time and accuracy) on the tasks, and also how different sorting tasks affect accuracy.

First, we analyze task 1 with different architectures.

Network Size (weight_size/hidden_size)    Time to Stabilize    Accuracy
128/128                                   522 s                92.3%
512/512                                   1033 s               90.1%
10/10                                     213 s                87.4%

So the 128/128 architecture is the best among those tried.

We also analyzed the 3 tasks using the best architecture (128/128); each task is described as (value range, sequence length).

Task             Time to Stabilize    Accuracy
1 (0~100, 5)     422 s                91.4%
2 (0~1, 5)       300 s                67.3%
3 (0~100, 10)    1400 s               87.2%

Comparing tasks 1 and 3, we see that training becomes harder as the sequence length grows. Task 2 is much more difficult than the other two, because the floating-point numbers can take values from a much larger (continuous) range.

Neural Adaptive Content-aware Internet Video Delivery

Posted on 2018-11-20

CV paper

Generalized Abbreviation

Posted on 2018-11-18

Description

Write a function to generate the generalized abbreviations of a word.

Note: The order of the output does not matter.

Example:

Input: "word"
Output:
["word", "1ord", "w1rd", "wo1d", "wor1", "2rd", "w2d", "wo2", "1o1d", "1or1", "w1r1", "1o2", "2r1", "3d", "w3", "4"]

Solution

Backtracking: at each index, either keep the current character, or abbreviate a run starting there as its length; since two numbers may not be adjacent, an abbreviated run must be followed by a literal character or the end of the word.

class Solution {
public:
    vector<string> generateAbbreviations(string word) {
        vector<string> ret;
        helper(word, word.size(), 0, "", ret);
        return ret;
    }

    void helper(const string& word, int size, int index, string current, vector<string>& ret) {
        if (index == size) {
            ret.push_back(current);
            return;
        }

        // Keep the current character as a literal.
        helper(word, size, index + 1, current + word[index], ret);

        // Abbreviate a run of (i - index + 1) characters as a number; a literal
        // character (or the end of the word) must follow the number.
        for (int i = index; i < size; ++i) {
            string abbreviated = current + to_string(i - index + 1);
            if (i + 1 < size)
                helper(word, size, i + 2, abbreviated + word[i + 1], ret);
            else
                ret.push_back(abbreviated);
        }
    }
};

Group Shifted Strings

Posted on 2018-11-12

Description

Given a string, we can "shift" each of its letters to its successive letter, for example: "abc" -> "bcd". We can keep "shifting", which forms the sequence:

"abc" -> "bcd" -> ... -> "xyz"

Given a list of strings which contains only lowercase alphabets, group all strings that belong to the same shifting sequence.

Example:

Input: ["abc", "bcd", "acef", "xyz", "az", "ba", "a", "z"],
Output:
[
["abc","bcd","xyz"],
["az","ba"],
["acef"],
["a","z"]
]

Solution

Two strings belong to the same shifting sequence exactly when their sequences of adjacent-letter distances (mod 26) are equal, so we can use a hash of those distances as the grouping key.

class Solution {
public:
    vector<vector<string>> groupStrings(vector<string>& strings) {
        unordered_map<int, vector<string>> mapping;
        for (auto& item : strings) {
            // Hash the sequence of adjacent-letter distances; the sequence
            // is invariant under shifting, so shifted strings collide.
            int hashValue = 0;
            int base = 1;
            for (int i = 1; i < item.size(); ++i) {
                hashValue += base * distance(item[i - 1], item[i]);
                base *= 26;
            }
            mapping[hashValue].push_back(item);
        }

        vector<vector<string>> ret;
        for (auto& item : mapping) {
            ret.push_back(item.second);
        }

        return ret;
    }

    // Cyclic distance from a to b, mapped into 1..26 so it is never zero.
    int distance(char a, char b) {
        return b - a + (a < b ? 0 : 26);
    }
};
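The same grouping can be expressed more directly with a tuple of differences as the key instead of the base-26 hash above; a sketch of ours in Python:

from collections import defaultdict

def group_strings(strings):
    groups = defaultdict(list)
    for s in strings:
        # Adjacent-letter differences mod 26 are invariant under shifting.
        key = tuple((ord(b) - ord(a)) % 26 for a, b in zip(s, s[1:]))
        groups[key].append(s)
    return list(groups.values())

print(group_strings(["abc", "bcd", "acef", "xyz", "az", "ba", "a", "z"]))
# [['abc', 'bcd', 'xyz'], ['acef'], ['az', 'ba'], ['a', 'z']]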

Max Consecutive Ones II

Posted on 2018-11-12

Description

Given a binary array, find the maximum number of consecutive 1s in this array if you can flip at most one 0.

Example 1:

Input: [1,0,1,1,0]
Output: 4
Explanation: Flipping the first zero gives the maximum number of consecutive 1s.
After flipping, the maximum number of consecutive 1s is 4.

Note:

  • The input array will only contain 0 and 1.
  • The length of input array is a positive integer and will not exceed 10,000

Follow up:
What if the input numbers come in one by one as an infinite stream? In other words, you can’t store all numbers coming from the stream as it’s too large to hold in memory. Could you solve it efficiently?

Two Pass Solution

First pass: record the length of every run of 1s into slots (with a sentinel step past the end to flush the last run). Second pass: flipping one zero joins two adjacent runs, so the answer is the best slots[i] + slots[i+1] + 1.

class Solution {
public:
    int findMaxConsecutiveOnes(vector<int>& nums) {
        if (nums.size() == 0) return 0;
        vector<int> slots;  // lengths of maximal runs of 1s
        int acc = 0;
        int ret = -1;
        // Run one step past the end as a sentinel to flush the last run.
        for (int i = 0; i <= nums.size(); ++i) {
            if (i < nums.size() && nums[i] == 1)
                acc += 1;
            else {
                ret = max(ret, acc);
                slots.push_back(acc);
                acc = 0;
            }
        }

        // Join two adjacent runs across one flipped zero.
        for (int index = 0; index < slots.size() - 1; ++index)
            ret = max(ret, slots[index] + slots[index + 1] + 1);

        return ret;
    }
};

One Pass Solution

class Solution {
public:
    int findMaxConsecutiveOnes(vector<int>& nums) {
        if (nums.size() == 0) return 0;
        int acc = 0;   // length of the current run of 1s
        int pre = 0;   // length of the previous run of 1s
        int ret = -1;
        int flag = 0;  // 1 once at least one zero has been seen (it can be flipped)
        for (int i = 0; i <= nums.size(); ++i) {
            if (i < nums.size() && nums[i] == 1)
                acc += 1;
            else {
                ret = max(ret, acc + pre + flag);
                pre = acc;
                acc = 0;
                flag = 1;
            }
        }

        return ret;
    }
};
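The one-pass version also answers the follow-up: it only tracks the current run of 1s (acc), the previous run (pre), and whether a zero has been seen (flag), so it works on a stream. A restatement of the same idea in Python (ours, not from the post):

def max_consecutive_ones_flip_one(stream):
    acc = pre = best = flag = 0
    for bit in stream:
        if bit == 1:
            acc += 1
        else:
            best = max(best, acc + pre + flag)
            pre, acc, flag = acc, 0, 1
    return max(best, acc + pre + flag)  # flush the final run

print(max_consecutive_ones_flip_one([1, 0, 1, 1, 0]))  # 4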

Sentence Similarity

Posted on 2018-11-11

Description

Given two sentences words1, words2 (each represented as an array of strings), and a list of similar word pairs pairs, determine if two sentences are similar.

For example, “great acting skills” and “fine drama talent” are similar, if the similar word pairs are pairs = [["great", "fine"], ["acting","drama"], ["skills","talent"]].

Note that the similarity relation is not transitive. For example, if “great” and “fine” are similar, and “fine” and “good” are similar, “great” and “good” are not necessarily similar.

However, similarity is symmetric. For example, “great” and “fine” being similar is the same as “fine” and “great” being similar.

Also, a word is always similar with itself. For example, the sentences words1 = ["great"], words2 = ["great"], pairs = [] are similar, even though there are no specified similar word pairs.

Finally, sentences can only be similar if they have the same number of words. So a sentence like words1 = ["great"] can never be similar to words2 = ["doubleplus","good"].

Note:

The length of words1 and words2 will not exceed 1000.

The length of pairs will not exceed 2000.

The length of each pairs[i] will be 2.

The length of each words[i] and pairs[i][j] will be in the range [1, 20].

Solution

class Solution {
public:
    bool areSentencesSimilar(vector<string>& words1, vector<string>& words2, vector<pair<string, string>> pairs) {
        if (words1.size() != words2.size()) return false;
        // Symmetric lookup table of similar word pairs.
        unordered_map<string, unordered_map<string, bool>> mapping;
        for (auto& p : pairs) {
            mapping[p.first][p.second] = true;
            mapping[p.second][p.first] = true;
        }
        for (int i = 0; i < words1.size(); ++i) {
            if (words1[i] != words2[i] && !mapping[words1[i]][words2[i]]) return false;
        }
        return true;
    }
};

Sort Transformed Array

Posted on 2018-11-10

Description

Given a sorted array of integers nums and integer values a, b and c, apply a quadratic function of the form f(x) = ax^2 + bx + c to each element x in the array.

The returned array must be in sorted order.

Expected time complexity: O(n)

Example 1:

Input: nums = [-4,-2,2,4], a = 1, b = 3, c = 5
Output: [3,9,15,33]

Example 2:

Input: nums = [-4,-2,2,4], a = -1, b = 3, c = 5
Output: [-23,-5,1,7]

Solution

Since f is quadratic, its extreme values over a sorted array always occur at the two ends. Two pointers sweep inward from both ends; when the parabola opens upward (a > 0) we fill the result from the back with the larger value, and when it opens downward, from the front with the smaller one.

class Solution {
public:
    vector<int> sortTransformedArray(vector<int>& nums, int a, int b, int c) {
        vector<int> ret(nums.size());

        if (a == 0) {  // f is linear: already sorted one way or the other
            int index = b >= 0 ? 0 : nums.size() - 1;
            int bias = b >= 0 ? 1 : -1;
            for (auto& item : nums) {
                ret[index] = cal(a, b, c, item);
                index += bias;
            }
            return ret;
        }

        // Parabola: the extremes sit at the two ends of the sorted input.
        int index = a > 0 ? nums.size() - 1 : 0;
        int bias = a > 0 ? -1 : 1;
        int left = 0;
        int right = nums.size() - 1;
        while (left <= right) {
            int lValue = cal(a, b, c, nums[left]);
            int rValue = cal(a, b, c, nums[right]);
            ret[index] = a > 0 ? max(lValue, rValue) : min(lValue, rValue);
            if (ret[index] == lValue) ++left;
            else --right;
            index += bias;
        }

        return ret;
    }

    int cal(int a, int b, int c, int x) {
        return a * x * x + b * x + c;
    }
};

Line Reflection

Posted on 2018-11-09

Description

Given n points on a 2D plane, find if there is a line parallel to the y-axis that reflects the given points.

Example 1:

Input: [[1,1],[-1,1]]
Output: true

Example 2:

Input: [[1,1],[-1,-1]]
Output: false

Solution

Group the x-coordinates of the points by their y value. The reflection axis, if it exists, is pinned down by the min and max x of any one group: every mirrored pair must satisfy left + right == target. Each group is then checked pair by pair from both ends.

class Solution {
public:
    bool isReflected(vector<pair<int, int>>& points) {
        if (points.size() == 0) return true;

        // Group the x-coordinates of all points sharing the same y.
        unordered_map<int, set<int>> mapping;
        for (auto& point : points) mapping[point.second].insert(point.first);

        // The axis is fixed by the min and max x of the first group.
        auto iter = mapping.begin();
        auto left = iter->second.begin();
        auto right = iter->second.rbegin();
        int target = *left + *right;

        while (iter != mapping.end()) {
            auto& lis = iter->second;
            left = lis.begin();
            right = lis.rbegin();
            while (left != lis.end() && right != lis.rend() && *left <= *right) {
                if (*left + *right != target) return false;
                ++left; ++right;
            }
            ++iter;
        }
        return true;
    }
};

Bomb Enemy

Posted on 2018-11-09

Description

https://leetcode.com/problems/bomb-enemy/

Given a 2D grid, each cell is either a wall 'W', an enemy 'E' or empty '0' (the number zero), return the maximum enemies you can kill using one bomb.
The bomb kills all the enemies in the same row and column from the planted point until it hits the wall since the wall is too strong to be destroyed.
Note: You can only put the bomb at an empty cell.

Example:

Input: [["0","E","0","0"],["E","0","W","E"],["0","E","0","0"]]
Output: 3
Explanation: For the given grid,

0 E 0 0
E 0 W E
0 E 0 0

Placing a bomb at (1,1) kills 3 enemies.

Solution

For every empty cell, the bomb kills the enemies visible in its row and column up to the nearest walls. Four dynamic-programming sweeps (from the left, right, top and bottom) accumulate enemy counts per direction; summing them at each empty cell and taking the maximum gives an O(mn) solution.

class Solution {
public:
    int maxKilledEnemies(vector<vector<char>>& grid) {
        int rowSize = grid.size();
        if (rowSize == 0) return 0;
        int columnSize = grid[0].size();
        // Enemies visible from each direction, accumulated until a wall resets the count.
        vector<vector<int>> rowFromLeft(rowSize, vector<int>(columnSize, 0));
        vector<vector<int>> rowFromRight(rowSize, vector<int>(columnSize, 0));
        vector<vector<int>> columnFromUp(rowSize, vector<int>(columnSize, 0));
        vector<vector<int>> columnFromDown(rowSize, vector<int>(columnSize, 0));

        vector<vector<int>> resultMatrix(rowSize, vector<int>(columnSize, 0));

        for (int i = 0; i < rowSize; ++i) {
            for (int j = 0; j < columnSize; ++j) {
                if (grid[i][j] == '0') {
                    rowFromLeft[i][j] = (j > 0 ? rowFromLeft[i][j-1] : 0);
                    columnFromUp[i][j] = (i > 0 ? columnFromUp[i-1][j] : 0);
                    resultMatrix[i][j] += rowFromLeft[i][j] + columnFromUp[i][j];
                }

                if (grid[i][j] == 'E') {
                    rowFromLeft[i][j] = (j > 0 ? rowFromLeft[i][j-1] + 1 : 1);
                    columnFromUp[i][j] = (i > 0 ? columnFromUp[i-1][j] + 1 : 1);
                }
                // The wall case stays zero, cutting off the accumulation.
            }
        }

        for (int i = rowSize - 1; i >= 0; --i) {
            for (int j = columnSize - 1; j >= 0; --j) {
                if (grid[i][j] == '0') {
                    rowFromRight[i][j] = (j < columnSize - 1 ? rowFromRight[i][j+1] : 0);
                    columnFromDown[i][j] = (i < rowSize - 1 ? columnFromDown[i+1][j] : 0);
                    resultMatrix[i][j] += rowFromRight[i][j] + columnFromDown[i][j];
                }

                if (grid[i][j] == 'E') {
                    rowFromRight[i][j] = (j < columnSize - 1 ? rowFromRight[i][j+1] + 1 : 1);
                    columnFromDown[i][j] = (i < rowSize - 1 ? columnFromDown[i+1][j] + 1 : 1);
                }
                // The wall case stays zero, cutting off the accumulation.
            }
        }

        int ret = 0;
        for (int i = 0; i < rowSize; ++i) {
            for (int j = 0; j < columnSize; ++j) {
                if (grid[i][j] == '0') ret = max(ret, resultMatrix[i][j]);
            }
        }
        return ret;
    }
};