News

  • 04/23/2018
    I've joined the TPC of IEEE CLUSTER 2018.
  • 02/21/2018
    I've joined the TPC of BDCAT 2018.
  • 10/26/2017
    I've joined the TPC of ICCCN 2018.
  • 06/06/2017
    I've joined the TPC of IPCCC 2017.
  • 04/21/2017
    Two paper submissions to ICCCN 2017 have been accepted.
  • 03/30/2017
    I've joined the TPC of BDCAT 2017.

Research Projects

  • Optimize MapReduce Overlap with a Good Start (Reduce) and a Good Finish (Map)

    We design a dynamic scheduler to optimize the overlap between the map phase and the reduce phase, and implement it in Hadoop.

    The scheduling problem in MapReduce differs from the traditional job scheduling problem because the reduce phase usually starts before the map phase finishes, in order to “shuffle” the intermediate data. This paper develops a new strategy, named OMO, which specifically aims to optimize the overlap between the map and reduce phases. Our solution includes two new techniques, lazy start of reduce tasks and batch finish of map tasks, which capture the characteristics of the overlap in a MapReduce process and achieve a good alignment of the two phases. We have implemented OMO on Hadoop and evaluated its performance with extensive experiments. The results show that OMO's performance is superior in terms of the total completion length (i.e., makespan) of a batch of jobs.

  • Hadoop YARN Scheduling Based on Task-Dependency and Resource-Demand

    We propose a new Hadoop YARN scheduling algorithm that aims to use resources efficiently when scheduling map/reduce tasks and to improve the makespan of MapReduce jobs.

    The Hadoop ecosystem has evolved into its second generation, Hadoop YARN, which adopts fine-grained resource management schemes for job scheduling. In YARN, there is no “slot”, the building block of earlier versions, and the system no longer distinguishes map and reduce tasks when allocating resources. Instead, each task specifies a resource request in the form of <2G,1core> (i.e., requesting 2 GB of memory and 1 CPU core), and it is assigned to a node with sufficient capacity. However, existing schedulers in YARN do not consider the efficiency of resource utilization when multiple jobs run concurrently in a cluster.

    Motivated by this problem, we designed a YARN scheduler, named HaSTE, which can effectively reduce the makespan of MapReduce jobs on the YARN platform by leveraging the information of requested resources, resource capacities, and dependency between tasks. Moreover, we proposed an opportunistic scheduling scheme to reassign reserved but idle resources to other waiting tasks. The major goal of our new scheme is to improve system resource utilization without incurring severe resource contention due to resource over-provisioning.

  • Fair and Efficient Slot Configuration and Scheduling for Hadoop Clusters

    A Hadoop scheduler for multiple jobs that considers fairness and dynamic slot configuration.

    Native Hadoop only allows static slot configuration, i.e., fixed numbers of map slots and reduce slots throughout the lifetime of a cluster. However, we found that such a static configuration may lead to low system resource utilization as well as long completion length. Motivated by this, we developed FRESH, a fair and efficient slot configuration and scheduling scheme for Hadoop clusters. FRESH can derive the best slot setting, dynamically configure slots, and appropriately assign tasks to the available slots, so that it not only improves the makespan but also guarantees fairness among batch jobs.
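
The YARN resource-request model mentioned in the HaSTE project above (a task asks for something like <2G,1core> and is assigned to a node with sufficient capacity) can be illustrated with a minimal sketch. This is not HaSTE or any actual YARN scheduler: the Node class, the place function, and the first-fit rule are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical view of a cluster node: only its free capacity is tracked.
    name: str
    free_mem_gb: int
    free_cores: int


def place(request, nodes):
    """Assign a (mem_gb, cores) request to the first node that can hold it.

    First-fit is an illustrative assumption, not the policy of any
    scheduler described on this page.
    """
    mem_gb, cores = request
    for node in nodes:
        if node.free_mem_gb >= mem_gb and node.free_cores >= cores:
            # Reserve the requested capacity on the chosen node.
            node.free_mem_gb -= mem_gb
            node.free_cores -= cores
            return node.name
    return None  # no node has sufficient capacity; the task has to wait


if __name__ == "__main__":
    cluster = [Node("n1", 1, 4), Node("n2", 4, 2)]
    # A <2G,1core> request skips n1 (only 1 GB free) and lands on n2.
    print(place((2, 1), cluster))
```

A scheduler that also weighs task dependencies and overall utilization, as the project descriptions above discuss, would replace the first-fit loop with its own selection rule.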

19. Automatic and Scalable Data Replication Manager in Distributed Computation and Storage Infrastructure of Cyber-Physical Systems

Zhengyu Yang, Janki Bhimani, Jiayin Wang, David Evans and Ningfang Mi
Journal Paper [SCPE] Scientific International Journal for Parallel and Distributed Computing, Vol. 18, No. 4

18. EA2S2: An Efficient Application-Aware Storage System for Big Data Processing in Heterogeneous Clusters

Teng Wang, Jiayin Wang, Son Nam Nguyen, Zhengyu Yang, Ningfang Mi, and Bo Sheng
Conference Papers [ICCCN '17a] The 26th International Conference on Computer Communications and Networks, Vancouver, Canada, July 2017.

17. AutoPath: Harnessing Parallel Execution Paths for Efficient Resource Allocation in Multi-Stage Big Data Frameworks

Han Gao, Zhengyu Yang, Janki Bhimani, Teng Wang, Jiayin Wang, Ningfang Mi, and Bo Sheng
Conference Papers [ICCCN '17b] The 26th International Conference on Computer Communications and Networks, Vancouver, Canada, July 2017.

16. SEINA: A Stealthy and Effective Internal Attack in Hadoop System

Jiayin Wang, Teng Wang, Zhengyu Yang, Ying Mao, Ningfang Mi, and Bo Sheng
Conference Papers [ICNC '17] International Conference on Computing, Networking and Communications, Silicon Valley, CA, Jan. 2017.

15. eSplash: Efficient Speculation in Large Scale Heterogeneous Computing Systems

Jiayin Wang, Teng Wang, Zhengyu Yang, Ningfang Mi, and Bo Sheng
Conference Papers [IPCCC '16a] 35th IEEE International Performance Computing and Communications Conference, Las Vegas, Nevada, Dec. 2016.

14. GREM: Dynamic SSD Resource Allocation in Virtualized Storage Systems With Heterogeneous IO Workloads

Zhengyu Yang, Jianzhe Tai, Janki Bhimani, Jiayin Wang, Ningfang Mi, and Bo Sheng
Conference Papers [IPCCC '16b] 35th IEEE International Performance Computing and Communications Conference, Las Vegas, Nevada, Dec. 2016.

13. OpERA: Opportunistic and Efficient Resource Allocation in Hadoop YARN by Harnessing Idle Resources

Yi Yao, Han Gao, Jiayin Wang, Ningfang Mi, and Bo Sheng
Conference Papers [ICCCN '16] The 25th International Conference on Computer Communications and Networks, Waikoloa, HI, Aug. 2016.

12. Mobile Message Board: Location-based Message Dissemination in Wireless Ad-Hoc Networks

Ying Mao, Jiayin Wang and Bo Sheng
Conference Papers [ICNC '16] International Conference on Computing, Networking and Communications, Kauai, HI, Feb. 2016.

Abstract

Smartphones play an important role in mobile social networks. This paper presents a Mobile Message Board (MMB) system for smartphone users to post and share messages in a certain area. Our system is built upon an ad-hoc communication model and allows users to browse nearby information without pre-registering with any server. Our algorithm design focuses on message management on each phone, considering its own schedule of turning the wireless device on and off. We present algorithms for two different cases to maximize the availability of the messages. Furthermore, we have implemented our solutions on commercial smartphones and conducted experiments and simulations for evaluation. The results show that the MMB system is efficient and effective for location-based message dissemination.

11. OMO: Optimize MapReduce Overlap with a Good Start (Reduce) and a Good Finish (Map)

Jiayin Wang, Yi Yao, Ying Mao, Bo Sheng, and Ningfang Mi
Conference Papers [IPCCC '15a] 34th IEEE International Performance Computing and Communications Conference, Nanjing, China, Dec. 2015.

Abstract

MapReduce has become a popular data processing framework in the past few years. The scheduling algorithm is crucial to the performance of a MapReduce cluster, especially when the cluster is concurrently executing a batch of MapReduce jobs. However, the scheduling problem in MapReduce differs from the traditional job scheduling problem because the reduce phase usually starts before the map phase finishes, in order to “shuffle” the intermediate data. This paper develops a new strategy, named OMO, which specifically aims to optimize the overlap between the map and reduce phases. Our solution includes two new techniques, lazy start of reduce tasks and batch finish of map tasks, which capture the characteristics of the overlap in a MapReduce process and achieve a good alignment of the two phases. We have implemented OMO on Hadoop and evaluated its performance with extensive experiments. The results show that OMO's performance is superior in terms of the total completion length (i.e., makespan) of a batch of jobs.

10. Building Smartphone Ad-Hoc Networks With Long-range Radios

Ying Mao, Jiayin Wang, Bo Sheng and Fan Wu
Conference Papers [IPCCC '15b] 34th IEEE International Performance Computing and Communications Conference, Nanjing, China, Dec. 2015.

Abstract

This paper investigates routing protocols in smartphone-based mobile Ad-Hoc networks. We introduce a new dual-radio communication model, where a long-range, low-cost, and low-rate radio is integrated into smartphones to assist regular radio interfaces such as WiFi and Bluetooth. We propose to use the long-range radio to carry small management data packets to improve the routing protocols. Specifically, we develop new schemes to improve the efficiency of the path establishment and path recovery processes in on-demand Ad-Hoc routing protocols. We have prototyped our solution LAAR on Android phones and evaluated its performance with small-scale experiments and large-scale simulations implemented in NS2. The results show that LAAR significantly improves the performance.

9. Admission control in YARN clusters based on dynamic resource reservation

Yi Yao, Jason Lin, Jiayin Wang, Bo Sheng and Ningfang Mi
Conference Papers [IM '15] IFIP/IEEE International Symposium on Integrated Network Management, Ottawa, ON, May 2015.

Abstract

Hadoop YARN is an open-source project developed by the Apache Software Foundation to provide a resource management framework for large-scale parallel data processing. However, a resource-waiting deadlock can occur under the Fair scheduler when the resources requested by applications exceed what the cluster can provide. In such a case, the YARN system halts if all resources are occupied by ApplicationMasters, the special per-job tasks that negotiate resources for processing tasks and coordinate job execution. Therefore, we develop a new admission control mechanism that dynamically reserves resources for processing tasks in order to avoid resource-waiting deadlocks while maintaining good performance. We implement and evaluate our new mechanism in Hadoop YARN v2.2.0. The experimental results show the effectiveness of this mechanism under MapReduce benchmarks.

8. Self-Adjusting Slot Configurations for Homogeneous and Heterogeneous Hadoop Clusters

Yi Yao, Jiayin Wang, Bo Sheng, Chiu Tan and Ningfang Mi
Journal Paper [TCC] IEEE Transactions on Cloud Computing, Volume PP, Issue 99, 23 March 2015, Pages 1-14

7. LAAR: Long-range Radio Assisted Ad-Hoc Routing in MANETs

Ying Mao, Jiayin Wang, Bo Sheng, and Mooi Chuah
Conference Papers [ICNP '14] The 22nd IEEE International Conference on Network Protocols (Concise Papers Track), The Research Triangle, NC, Oct. 2014.

Abstract

This paper investigates routing protocols in smartphone-based mobile Ad-Hoc networks. We introduce a new dual-radio communication model, where a long-range, low-cost, and low-rate radio is integrated into smartphones to assist regular radio interfaces such as WiFi and Bluetooth. We propose to use the long-range radio to carry small management data packets to improve the routing protocols. Specifically, we develop new schemes to improve the efficiency of the path establishment and path recovery processes in on-demand Ad-Hoc routing protocols. We have prototyped our solution LAAR on Android phones and evaluated its performance with small-scale experiments and large-scale simulations implemented in NS2. The results show that LAAR significantly improves the performance.

6. DAB: Dynamic and Agile Buffer-control for Streaming Videos on Mobile Devices

Ying Mao, Jiayin Wang and Bo Sheng
Conference Papers [MobiSPC '14] The 11th International Conference on Mobile Systems and Pervasive Computing, Niagara Falls, Ontario, Canada, Aug. 2014.

Abstract

This paper studies video buffer control for streaming video data to mobile devices. We target the design challenge that arises when the wireless link quality is dynamic due to environmental factors or user mobility. We develop a Dynamic and Agile Buffer-control scheme, called DAB, that adaptively adjusts the video buffer size based on measurements of the signal strength (RSSI) and the accelerometer on the smartphone. Our goal is to keep playback smooth while delivering as little data as possible to the end user in order to save bandwidth cost. We have implemented our solution on the Android platform and evaluated it with experiments. Compared to the traditional video buffer scheme, our solution DAB significantly improves the performance in terms of playback quality and buffer efficiency.

5. PASA: Passive Broadcast for Smartphone Ad-hoc Networks

Ying Mao, Jiayin Wang, Joseph Paul Cohen and Bo Sheng
Conference Papers [ICCCN '14] The 23rd International Conference on Computer Communications and Networks, Shanghai, China, Aug. 2014.

Abstract

Smartphones have become more and more popular in the past few years. Motivated by the fact that location plays an extremely important role in mobile applications, this paper develops an efficient local message dissemination system, PASA, based on a new communication model called passive broadcast. It is based on the method of overloading device names described in MDSRoB [14] and Bluejacking [23]. In this new model, each node does not maintain connection state, and data delivery is initiated by a receiver via a ‘scan’ operation. The representative carriers of passive broadcast include Bluetooth and WiFi-Direct, both of which define a mandatory ‘peer discovery’ scan function. Passive broadcast incurs negligible cost for establishing and maintaining direct links and is well suited for short message dissemination in the proximity. In this paper, we present PASA with complete protocols and in-depth analysis for optimization. We have prototyped our solution on commercial phones and evaluated it with comprehensive experiments and simulations.

4. FRESH: Fair and Efficient Slot Configuration and Scheduling for Hadoop Clusters

Jiayin Wang, Yi Yao, Ying Mao, Bo Sheng, and Ningfang Mi
Conference Papers [CLOUD '14a] The 7th IEEE International Conference on Cloud Computing, Anchorage, AK, June 2014.

Abstract

Hadoop is an emerging framework for parallel big data processing. Although it is becoming popular, Hadoop is too complex for regular users to fully understand all the system parameters and tune them appropriately. Especially when processing a batch of jobs, the default Hadoop settings may cause inefficient resource utilization and unnecessarily prolong the execution time. This paper considers an extremely important setting, slot configuration, which by default is fixed and static. We propose an enhanced Hadoop system called FRESH which can derive the best slot setting, dynamically configure slots, and appropriately assign tasks to the available slots. The experimental results show that when serving a batch of MapReduce jobs, FRESH significantly improves the makespan as well as the fairness among jobs.

3. HaSTE: Hadoop YARN Scheduling Based on Task-Dependency and Resource-Demand

Yi Yao, Jiayin Wang, Bo Sheng, Jason Lin and Ningfang Mi
Conference Papers [CLOUD '14b] The 7th IEEE International Conference on Cloud Computing, Anchorage, AK, June 2014.

Abstract

The MapReduce framework has become the de facto scheme for scalable semi-structured and unstructured data processing in recent years. The Hadoop ecosystem has evolved into its second generation, Hadoop YARN, which adopts fine-grained resource management schemes for job scheduling. One of the primary performance concerns in YARN is how to minimize the total completion length, i.e., makespan, of a set of MapReduce jobs. However, the precedence constraint or fairness constraint in currently widely used scheduling policies in YARN, such as FIFO and Fair, can both lead to inefficient resource allocation in the Hadoop YARN cluster. They also omit the dependency between tasks, which is crucial for the efficiency of resource utilization. We thus propose a new YARN scheduler, named HaSTE, which can effectively reduce the makespan of MapReduce jobs in YARN by leveraging the information of requested resources, resource capacities, and dependency between tasks. We implemented HaSTE as a pluggable scheduler in the most recent version of Hadoop YARN and evaluated it with classic MapReduce benchmarks. The experimental results demonstrate that our YARN scheduler effectively reduces the makespans and improves resource utilization compared to the current scheduling policies.

2. Skyfiles: Efficient and secure cloud-assisted file management for mobile devices

Ying Mao, Jiayin Wang and Bo Sheng
Conference Papers [ICC '14] IEEE International Conference on Communications, Sydney, Australia, June 2014.

Abstract

This paper targets the application of cloud storage management for mobile devices. Because of limited bandwidth and other resources, most existing cloud storage apps for smartphones do not keep local copies of files. This efficient design, however, limits the application capabilities. In this paper, our goal is to extend the available file operations for cloud storage services to better serve smartphone users. We develop Skyfiles, an efficient and secure file management system that supports more advanced file operations. Our basic idea is to utilize cloud instances to assist file operations. In particular, Skyfiles supports download, compress, encrypt, and convert operations, as well as file transfer between two smartphone users' cloud storage spaces. In addition, we design a protocol for users to share their idle instances.

1. Using a Tunable Knob for Reducing Makespan of MapReduce Jobs in a Hadoop Cluster

Yi Yao, Jiayin Wang, Bo Sheng and Ningfang Mi
Conference Papers [CLOUD '13] The 6th IEEE International Conference on Cloud Computing, Santa Clara, CA, 2013.

Abstract

The MapReduce framework and its open-source implementation Hadoop have become the de facto platform for scalable analysis on large data sets in recent years. One of the primary concerns in Hadoop is how to minimize the completion length (i.e., makespan) of a set of MapReduce jobs. The current Hadoop only allows static slot configuration, i.e., fixed numbers of map slots and reduce slots throughout the lifetime of a cluster. However, we found that such a static configuration may lead to low system resource utilization as well as long completion length. Motivated by this, we propose a simple yet effective scheme which uses the slot ratio between map and reduce tasks as a tunable knob for reducing the makespan of a given set of jobs. By leveraging the workload information of recently completed jobs, our scheme dynamically allocates resources (or slots) to map and reduce tasks. We implemented the presented scheme in Hadoop V0.20.2 and evaluated it with representative MapReduce benchmarks on Amazon EC2. The experimental results demonstrate the effectiveness and robustness of our scheme under both simple workloads and more complex mixed workloads.

Current Teaching (@ MSU)

  • Fall 2017

    CSIT 111: Fundamentals of Programming I

Teaching History

  • Fall 2016

    CS/IT 114: Introduction to Java, part 1 (@ UMass Boston)

    Lecturer: Deliver two presentations weekly, 75 minutes each. CS/IT 114 is the first course in the two-course introductory Java programming sequence.

  • Fall 2015

    IT443 - Network Security Administration (@ UMass Boston)

    Teaching Assistant

    Duties include helping the instructor develop the course exercises, grading homework, and holding Q&A sessions during office hours.

  • 2014-2015

    CS110 - Introduction to Computing (@ UMass Boston)

    Lab Session Instructor: in charge of the CS110 lab sessions for 40 students each semester.

    Deliver two presentations weekly, 25 minutes each, followed by 50 minutes of hands-on guidance to help students fully understand the lectures and finish the lab project in Java.

  • 2013-2014

    CS630 - Database Management Systems (@ UMass Boston)

    Teaching Assistant

    Duties include giving lectures to introduce SQL, grading homework and projects, and holding the Q&A sessions through office hours.

  • Spring 2013

    CS341 - Computer Architecture and Organization (@ UMass Boston)

    Teaching Assistant

    Duties include assisting students with programming in assembly language, grading homework and projects, and holding the Q&A sessions through office hours.

  • Fall 2012

    IT244 - Introduction to Linux/Unix (@ UMass Boston)

    Teaching Assistant

    Duties include helping the instructor develop the course exercises, grading homework, and holding Q&A sessions during office hours.