Link

https://www.cl.cam.ac.uk/~cm542/papers/Ubicomp17-Georgiev.pdf

Background

This paper proposes a multi-task framework for audio recognition on embedded and mobile devices.

Former work

Earlier work focused either on optimizing a single network or on running multi-task networks on large machines such as servers. Our work combines the two.

Feature

1. Changes the input features.

2. Proposes a new search mechanism.

3. Consumes less energy.

Implementation

Multi-task with shared network structure. We us

Task Type

The number of tasks can be scaled up and down dynamically using our multi-task framework.

Speaker Identification
Emotion Recognition
Stress Detection
Ambient Scene Analysis
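
The shared structure with a dynamic set of tasks can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name SharedMultiTaskNet, the ReLU activation, the layer sizes, and the class counts per task are all assumptions made for the example. It shows one stack of shared hidden layers feeding one softmax output per task, with tasks added or removed at run time.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class SharedMultiTaskNet:
    """Hypothetical sketch: shared hidden layers, one output head per task."""

    def __init__(self, input_dim, hidden_dims, task_classes, seed=0):
        self._rng = np.random.default_rng(seed)
        dims = [input_dim] + list(hidden_dims)
        # Shared layers: every task reuses these weights.
        self.shared = [
            (self._rng.normal(0.0, 0.1, (a, b)), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])
        ]
        self._last = dims[-1]
        self.heads = {}
        for name, n_classes in task_classes.items():
            self.add_task(name, n_classes)

    def add_task(self, name, n_classes):
        # A new task only adds one small output layer.
        self.heads[name] = (
            self._rng.normal(0.0, 0.1, (self._last, n_classes)),
            np.zeros(n_classes),
        )

    def remove_task(self, name):
        del self.heads[name]

    def forward(self, x):
        h = x
        for W, b in self.shared:
            h = relu(h @ W + b)
        # The shared representation is computed once and reused by every head.
        return {name: softmax(h @ W + b) for name, (W, b) in self.heads.items()}
```

Because the hidden layers are evaluated once per input, adding a task costs only one extra output layer, which is the property that lets the number of tasks scale up and down cheaply.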

Customize the Input

a. Extract filter bank summaries, normalize the features across the individual datasets, and randomize the samples.

Use filter bank features rather than PLP or MFCC.

Reduce the input feature complexity.
Adjust the input to the same size regardless of the length of the audio data.
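
The input steps above can be sketched as follows. This is an illustration under assumptions: the filter bank coefficients are taken as already computed per frame, and the choice of summary statistics (mean, standard deviation, minimum, maximum) and the function names are hypothetical, not taken from the paper. The point is that summarizing across frames yields a vector of the same size for any clip length.

```python
import numpy as np

def summarize(frames):
    """Collapse (n_frames, n_filters) coefficients into a fixed-size vector.

    The four statistics used here are an assumption for illustration.
    """
    return np.concatenate([
        frames.mean(axis=0),
        frames.std(axis=0),
        frames.min(axis=0),
        frames.max(axis=0),
    ])

def normalize_and_shuffle(X, seed=0):
    """Standardize each feature over the dataset, then randomize sample order."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0) + 1e-8  # avoid division by zero
    Xn = (X - mu) / sigma
    order = np.random.default_rng(seed).permutation(len(Xn))
    return Xn[order], order
```

For example, clips with 50, 120, and 300 frames of 24 coefficients each all map to vectors of length 96, so they can share one input layer.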

Hidden Layer

Topology of Framework

DBN: stacked RBMs (two stages)

Unsupervised pre-training (RBM)

Fine-tuning

DNN (BP + SGD)
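
As a rough illustration of the unsupervised pre-training stage, the sketch below performs one contrastive divergence (CD-1) update for a single RBM. It assumes binary (Bernoulli) visible and hidden units and uses hypothetical names; the notes do not say which RBM variant or update rule the paper uses, and real-valued audio features would normally call for Gaussian visible units.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(W, b_vis, b_hid, v0, lr, rng):
    """One CD-1 update on a batch v0 of shape (batch, n_visible).

    Updates W, b_vis and b_hid in place and returns the mean squared
    reconstruction error measured before the update.
    """
    # Positive phase: hidden probabilities and a binary sample given the data.
    h0_prob = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(v0.dtype)
    # Negative phase: reconstruct the visible units, then the hidden units.
    v1_prob = sigmoid(h0 @ W.T + b_vis)
    h1_prob = sigmoid(v1_prob @ W + b_hid)
    n = v0.shape[0]
    W += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / n
    b_vis += lr * (v0 - v1_prob).mean(axis=0)
    b_hid += lr * (h0_prob - h1_prob).mean(axis=0)
    return float(np.mean((v0 - v1_prob) ** 2))

def hidden_probs(W, b_hid, v):
    """Hidden activations, used as the input of the next RBM in the stack."""
    return sigmoid(v @ W + b_hid)
```

In a stacked setup, each RBM is trained this way on the hidden activations of the one below it, and the resulting weights initialize the network that is then fine-tuned with backpropagation and SGD.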

Optimize Configuration of Structure

1. The number of nodes in each layer is a power of 2 (and must not exceed the memory limit).

2. The search scope for the topology is restricted.

3. The accuracy must remain high enough.
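
The three constraints above can be sketched as a small search. Everything here is an assumption made for illustration: the function names, the equal width for all hidden layers, the 4 bytes per parameter, and the rule of picking the smallest topology that reaches the accuracy target. The evaluate callback stands in for training and validating a candidate network.

```python
def param_count(input_dim, hidden, output_dims):
    """Weights plus biases for shared hidden layers and one head per task."""
    dims = [input_dim] + list(hidden)
    shared = sum(a * b + b for a, b in zip(dims[:-1], dims[1:]))
    heads = sum(hidden[-1] * o + o for o in output_dims)
    return shared + heads

def candidate_topologies(input_dim, output_dims, layer_counts, widths,
                         max_bytes, bytes_per_param=4):
    """Yield (hidden, size_in_bytes) for topologies within the memory limit.

    widths should contain powers of 2 only, which keeps the search small.
    """
    for n_layers in layer_counts:
        for width in widths:
            hidden = (width,) * n_layers
            size = param_count(input_dim, hidden, output_dims) * bytes_per_param
            if size <= max_bytes:
                yield hidden, size

def pick_topology(candidates, evaluate, min_accuracy):
    """Return the smallest candidate whose accuracy meets the target, or None."""
    for hidden, size in sorted(candidates, key=lambda c: c[1]):
        if evaluate(hidden) >= min_accuracy:
            return hidden, size
    return None
```

Restricting widths to powers of 2 and capping the layer count keeps the number of networks that must be trained small, which matters because each call to evaluate is expensive.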

Output Layer

Represent the class probabilities with a softmax.
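
For reference, the standard softmax for one task can be written as below. The symbols are notation introduced here, not taken from the paper: z_c is the output-layer activation for class c, and C is the number of classes of that task.

```latex
p(c \mid x) = \frac{\exp(z_c)}{\sum_{k=1}^{C} \exp(z_k)}, \qquad c = 1, \dots, C
```

The outputs are non-negative and sum to 1 over the classes of the task, so each task's output layer can be read as a probability distribution.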

Evaluation

We evaluate the performance from different aspects.

a. Accuracy: similar to the simple network

b. Runtime and energy reductions (runtime and energy are similar to the simple network)

c. Handcrafted features and filter bank

d. Cluster size

e. Memory footprint