Link
https://www.cl.cam.ac.uk/~cm542/papers/Ubicomp17-Georgiev.pdf
Background
This paper proposes a multi-task framework for audio recognition on embedded and mobile devices.
Former work
Earlier work focused either on optimizing a single network or on running multi-task networks on large machines such as servers. This paper combines the two directions.
Feature
1. Changes the input features.
2. Proposes a new search mechanism for the network topology.
3. Consumes less energy.
Implementation
Multi-task with shared network structure. We us
Task Type
The number of tasks can be scaled up and down dynamically with the multi-task framework.
Speaker Identification.
Emotion Recognition
Stress Detection.
Ambient Scene Analysis.
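The shared structure can be pictured as a stack of hidden layers common to all tasks, with one small output layer per task. The sketch below is a minimal illustration in plain Python, not the paper's implementation: the class name `MultiTaskNet`, the ReLU activation, the random initialization, and the class counts are all assumptions made for the example.

```python
import math
import random


def dense(x, weights, biases):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def init_layer(n_in, n_out, rng):
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultiTaskNet:
    """Hypothetical shared-trunk network with one output head per task."""

    def __init__(self, n_in, hidden_sizes, task_classes, seed=0):
        self.rng = random.Random(seed)
        self.shared = []
        prev = n_in
        for h in hidden_sizes:
            self.shared.append(init_layer(prev, h, self.rng))
            prev = h
        self.trunk_out = prev
        self.heads = {}
        for name, n_classes in task_classes.items():
            self.add_task(name, n_classes)

    def add_task(self, name, n_classes):
        """Scale up: attach a new head without touching the shared layers."""
        self.heads[name] = init_layer(self.trunk_out, n_classes, self.rng)

    def remove_task(self, name):
        """Scale down: drop a head; the shared layers stay as they are."""
        del self.heads[name]

    def forward(self, x):
        h = x
        for w, b in self.shared:  # computed once, reused by every task
            h = relu(dense(h, w, b))
        return {name: softmax(dense(h, w, b))
                for name, (w, b) in self.heads.items()}


net = MultiTaskNet(8, [16, 16], {"speaker": 3, "emotion": 4})
net.add_task("stress", 2)
outputs = net.forward([0.5] * 8)
```

Because the hidden layers are evaluated once per input, adding a task only costs one extra output layer, which is the intuition behind running several audio tasks on a constrained device.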
Customize the Input
a. Extract filter bank summaries, normalize the features across the individual datasets, and randomize the samples.
Use filter banks rather than PLP or MFCC features.
Reducing Input Feature Complexity.
Bring the input to the same fixed size regardless of the length of the voice data.
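The input steps above (summarize, normalize, randomize) can be sketched as follows. This is an illustration under stated assumptions rather than the paper's exact pipeline: the choice of summary statistics (mean, standard deviation, minimum, maximum per band), z-score normalization, and the function names `summarize`, `normalize`, and `prepare` are all hypothetical. Each clip is assumed to arrive as a list of per-frame filter bank vectors.

```python
import random
import statistics


def summarize(frames):
    """Collapse a variable-length list of per-frame filter bank vectors
    into one fixed-size vector of per-band statistics."""
    summary = []
    for band in zip(*frames):  # iterate over bands, not frames
        summary += [statistics.fmean(band), statistics.pstdev(band),
                    min(band), max(band)]
    return summary


def normalize(dataset):
    """Z-score every feature column across one dataset."""
    columns = list(zip(*dataset))
    means = [statistics.fmean(c) for c in columns]
    # Fall back to 1.0 so constant columns do not divide by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)]
            for row in dataset]


def prepare(clips, seed=0):
    """Summarize each clip, normalize across the dataset, shuffle."""
    samples = normalize([summarize(frames) for frames in clips])
    random.Random(seed).shuffle(samples)
    return samples


short_clip = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]   # 2 frames, 3 bands
long_clip = [[0.0, 1.0, 2.0]] * 5                  # 5 frames, 3 bands
samples = prepare([short_clip, long_clip])
```

Both clips end up as vectors of length 12 (3 bands times 4 statistics), so the network input size no longer depends on how long the recording is.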
Hidden Layer
Topology of Framework
DBN built from stacked RBMs, trained in two stages:
1. Unsupervised pre-training (RBM)
2. Fine-tuning
DNN trained with backpropagation (BP) and stochastic gradient descent (SGD)
Optimize Configuration of Structure
1. The number of nodes in each layer is a power of 2 (and must not exceed the memory limit).
2. The search scope for the topology is restricted.
3. The accuracy must remain high enough.
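The three constraints above can be read as a small constrained search. The sketch below is one possible interpretation, not the paper's algorithm: the depth and width ranges, the equal width for every hidden layer, the 4 bytes per parameter, and the caller-supplied `evaluate` function (which would train a candidate and return its accuracy) are all assumptions.

```python
def parameter_count(n_in, hidden, n_out):
    """Weights plus biases of a fully connected network."""
    sizes = [n_in, *hidden, n_out]
    return sum((a + 1) * b for a, b in zip(sizes, sizes[1:]))


def candidate_topologies(n_in, n_out, memory_limit_bytes,
                         depths=(1, 2, 3), exponents=range(5, 11),
                         bytes_per_param=4):
    """Enumerate (hidden_layers, size_in_bytes) pairs that fit the budget.
    Layer widths are restricted to powers of 2 (here 32 to 1024)."""
    candidates = []
    for depth in depths:
        for exp in exponents:
            hidden = [2 ** exp] * depth
            size = parameter_count(n_in, hidden, n_out) * bytes_per_param
            if size <= memory_limit_bytes:
                candidates.append((hidden, size))
    return sorted(candidates, key=lambda c: c[1])  # smallest first


def pick_topology(candidates, evaluate, min_accuracy):
    """Return the smallest candidate whose accuracy is high enough."""
    for hidden, _size in candidates:
        if evaluate(hidden) >= min_accuracy:
            return hidden
    return None


candidates = candidate_topologies(n_in=10, n_out=2,
                                  memory_limit_bytes=1_000_000)


# Stand-in for training and validating a candidate (hypothetical).
def fake_evaluate(hidden):
    return 0.9 if sum(hidden) >= 256 else 0.5


best = pick_topology(candidates, fake_evaluate, min_accuracy=0.8)
```

Restricting widths to powers of 2 and capping the depth keeps the candidate list short, so each candidate can realistically be trained and checked against the accuracy requirement.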
Output Layer
The output layer represents class probabilities with a softmax.
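For reference, the standard softmax over the output activations can be written as below. The symbols are assumptions introduced for illustration: $z_k$ is the activation of output node $k$ and $K$ is the number of classes for the task in question.

```latex
P(y = k \mid x) = \frac{\exp(z_k)}{\sum_{j=1}^{K} \exp(z_j)}, \qquad k = 1, \dots, K
```

The outputs are non-negative and sum to 1, so the largest one can be taken as the predicted class.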
Evaluation
We evaluate the performance from different aspects.
a. Accuracy: similar to the simple network.
b. Runtime and energy reductions (runtime and energy are similar to the simple network).
c. Handcrafted features and filter banks.
d. Cluster size.
e. Memory footprint.