ISSN 1847-3938
organizer
39th international convention
May 30 - June 03, 2016, Opatija – Adriatic Coast, Croatia
Lampadem tradere
mipro - path to knowledge and innovation
mipro proceedings
My profession. My organization. My IEEE.
Discover the benefits of IEEE membership.
Join a community of more than 365,000 innovators in over 150 countries. IEEE is the world's largest technical society, providing members with access to the latest technical information and research, global networking and career opportunities, and exclusive discounts on education and insurance products.
Join today
www.ieee.org/join
MIPRO 2016
39th International Convention
May 30 – June 03, 2016
Opatija, Croatia
Proceedings
Conferences:
Microelectronics, Electronics and Electronic Technology /MEET
Distributed Computing, Visualization and Biomedical
Engineering /DC VIS
Telecommunications & Information /CTI
Special Session on Future Networks and Services /FNS
Computers in Education /CE
Computers in Technical Systems /CTS
Intelligent Systems /CIS
Special Session on Biometrics & Forensics & De-Identification and
Privacy Protection /BiForD
Information Systems Security /ISS
Business Intelligence Systems /miproBIS
Digital Economy and Government, Local Government, Public
Services / DE-GLGPS
MIPRO Junior - Student Papers /SP
Edited by:
Petar Biljanović
International Program Committee
Petar Biljanović, General Chair, Croatia
S. Amon, Slovenia
V. Anđelić, Croatia
M.E. Auer, Austria
M. Baranović, Croatia
A. Badnjević, Bosnia and Herzegovina
B. Bebel, Poland
L. Bellatreche, France
E. Brenner, Austria
A. Budin, Croatia
Ž. Butković, Croatia
Ž. Car, Croatia
M. Colnarič, Slovenia
A. Cuzzocrea, Italy
M. Čičin-Šain, Croatia
M. Delimar, Croatia
T. Eavis, Canada
M. Ferrari, Italy
B. Fetaji, Macedonia
T. Galinac Grbac, Croatia
P. Garza, Italy
L. Gavrilovska, Macedonia
M. Golfarelli, Italy
S. Golubić, Croatia
F. Gregoretti, Italy
S. Groš, Croatia
N. Guid, Slovenia
Y. Guo, United Kingdom
J. Henno, Estonia
L. Hluchy, Slovakia
V. Hudek, Croatia
Ž. Hutinski, Croatia
M. Ivanda, Croatia
H. Jaakkola, Finland
L. Jelenković, Croatia
D. Jevtić, Croatia
R. Jones, Switzerland
P. Kacsuk, Hungary
A. Karaivanova, Bulgaria
M. Mauher, Croatia
I. Mekjavić, Slovenia
B. Mikac, Croatia
V. Milutinović, Serbia
V. Mrvoš, Croatia
J.F. Novak, Croatia
J. Pardillo, Spain
N. Pavešić, Slovenia
V. Peršić, Croatia
T. Pokrajčić, Croatia
S. Ribarić, Croatia
J. Rozman, Slovenia
K. Skala, Croatia
I. Sluganović, Croatia
V. Sruk, Croatia
U. Stanič, Slovenia
N. Stojadinović, Serbia
J. Sunde, Australia
A. Szabo, IEEE Croatia Section
L. Szirmay-Kalos, Hungary
D. Šarić, Croatia
D. Šimunić, Croatia
Z. Šimunić, Croatia
D. Škvorc, Croatia
A. Teixeira, Portugal
E. Tijan, Croatia
A.M. Tjoa, Austria
R. Trobec, Slovenia
S. Uran, Croatia
T. Vámos, Hungary
M. Varga, Croatia
M. Vidas-Bubanja, Serbia
B. Vrdoljak, Croatia
D. Zazula, Slovenia
organized by
MIPRO Croatian Society
technical cosponsorship
IEEE Region 8
under the auspices of
Ministry of Science, Education and Sports of the Republic of Croatia
Ministry of Maritime Affairs, Transport and Infrastructure of the Republic of Croatia
Ministry of Entrepreneurship and Crafts of the Republic of Croatia
Ministry of Public Administration of the Republic of Croatia
Croatian Chamber of Economy
Primorsko-goranska County
City of Rijeka
City of Opatija
Croatian Regulatory Authority for Network Industries
Croatian Power Exchange - CROPEX
patrons
University of Rijeka, Croatia
University of Zagreb, Croatia
IEEE Croatia Section
IEEE Croatia Section Computer Chapter
IEEE Croatia Section Electron Devices/Solid-State Circuits Joint Chapter
IEEE Croatia Section Education Chapter
IEEE Croatia Section Communications Chapter
T-Croatian Telecom, Zagreb, Croatia
Ericsson Nikola Tesla, Zagreb, Croatia
Končar - Electrical Industries, Zagreb, Croatia
HEP - Croatian Electricity Company, Zagreb, Croatia
VIPnet, Zagreb, Croatia
University of Zagreb, Faculty of Electrical Engineering and Computing, Croatia
Ruđer Bošković Institute, Zagreb, Croatia
University of Rijeka, Faculty of Maritime Studies, Croatia
University of Rijeka, Faculty of Engineering, Croatia
University of Rijeka, Faculty of Economics, Croatia
University of Zagreb, Faculty of Organization and Informatics, Varaždin, Croatia
University of Rijeka, Faculty of Tourism and Hospitality Management, Opatija, Croatia
Polytechnic of Zagreb, Croatia
EuroCloud Croatia
Croatian Regulatory Authority for Network Industries, Zagreb, Croatia
Croatian Post, Zagreb, Croatia
Erste&Steiermärkische bank, Rijeka, Croatia
Selmet, Zagreb, Croatia
CISEx, Zagreb, Croatia
Kermas energija, Zagreb, Croatia
Rezultanta, Zagreb, Croatia
River Publishers, Aalborg, Denmark
sponsors
Ericsson Nikola Tesla, Zagreb, Croatia
T-Croatian Telecom, Zagreb, Croatia
Končar-Electrical Industries, Zagreb, Croatia
HEP - Croatian Electricity Company, Zagreb, Croatia
InfoDom, Zagreb, Croatia
Hewlett Packard Croatia, Zagreb, Croatia
IN2, Zagreb, Croatia
Transmitters and Communications Company, Zagreb, Croatia
Storm Computers, Zagreb, Croatia
Nokia, Zagreb, Croatia
VIPnet, Zagreb, Croatia
King-ICT, Zagreb, Croatia
Microsoft Croatia, Zagreb, Croatia
Micro-Link, Zagreb, Croatia
Mjerne tehnologije, Zagreb, Croatia
Altpro, Zagreb, Croatia
Danieli Automation, Buttrio, Italy
Selmet, Zagreb, Croatia
ib-proCADD, Ljubljana, Slovenia
Nomen, Rijeka, Croatia
All papers are published in their original form
For Publisher:
Petar Biljanović
Publisher:
Croatian Society for Information and Communication Technology,
Electronics and Microelectronics - MIPRO
Office: Kružna 8/II, P. O. Box 303, HR-51001 Rijeka, Croatia
Phone/Fax: (+385) 51 423 984
Printed by:
GRAFIK, Rijeka
ISBN 978-953-233-087-8
Copyright 2016 by MIPRO
All rights reserved. No part of this book may be reproduced in any form, nor may be stored in
a retrieval system or transmitted in any form, without written permission from the publisher.
CONTENTS
LIST OF PAPER REVIEWERS
LIST OF AUTHORS
FOREWORD
MICROELECTRONICS, ELECTRONICS AND ELECTRONIC
TECHNOLOGY
INVITED PAPER
(Si)GeSn Nanostructures for Optoelectronic Device Applications ..................................... 5
I.A. Fischer, F. Oliveira, A. Benedetti, S. Chiussi, J. Schulze
PAPERS
Thermoelectric Properties of Polycrystalline WS2 and Solid Solutions
of WS2-ySey Types .................................................................................................................. 11
G.E. Yakovleva, A.I. Romanenko, A.S. Berdinsky, A.Yu. Ledneva, V.A. Kuznetsov,
M.K. Han, S.J. Kim, V.E. Fedorov
Piezoresistive Effect in Polycrystalline Bulk and Film Layered Sulphide
W0.95Re0.05S2 ........................................................................................................................ 16
V.A. Kuznetsov, A.I. Romanenko, A.S. Berdinsky, A.Yu. Ledneva, S.B. Artemkina,
V.E. Fedorov
Luminescent Diagnostics in the NIR-region on a Base of Yb-porphyrin
Complexes .............................................................................................................................. 20
V.D. Rumyantseva, I.P. Shilov, Yu.V. Alekseev, A.S. Gorshkova
Simulation Study of the Composite Silicon Solar Cell Efficiency Sensitivity
to the Absorption Coefficients and the Thickness of intrinsic Absorber Layer .............. 24
V. Tudić, N. Posavec
The Investigation of Influence of Localized States on a-Si:H p-i-n Photodiode
Transient Response to Blue Light Impulse with Blue Light Optical Bias ....................... 30
M. Čović, V. Gradišnik, Ž. Jeričević
Analysis of Electrical and Optical Characteristics of InP/InGaAs Avalanche
Photodiodes in Linear Regime by a New Simulation Environment ................................. 34
T. Knežević, T. Suligoj
Design of Passive-Quenching Active-Reset Circuit with Adjustable Hold-Off
Time for Single-Photon Avalanche Diodes ......................................................................... 40
I. Berdalović, Ž. Osrečki, F. Šegmanović, D. Grubišić, T. Knežević, T. Suligoj
Impact of the Emitter Polysilicon Thickness on the Performance of
High-Linearity Mixers with Horizontal Current Bipolar Transistors ............................. 46
J. Žilak, M. Koričić, H. Mochizuki, S. Morita, T. Suligoj
Fully-integrated Voltage Controlled Oscillator in Low-cost HCBT
Technology ............................................................................................................................. 51
M. Koričić, J. Žilak, H. Mochizuki, S. Morita, T. Suligoj
Variable-Gain Amplifier for Ultra-Low Voltage Applications in 130nm CMOS
Technology ............................................................................................................................. 57
D. Arbet, M. Kováč, L. Nagy, V. Stopjaková, M. Šovčík
Relaxation Oscillator Calibration Technique with Comparator Delay
Regulation .............................................................................................................................. 63
J. Mikulić, G. Schatzberger, A. Barić
A Bootstrap Circuit for DC–DC Converters with a Wide Input Voltage
Range in HV-CMOS ............................................................................................................. 68
N. Mitrović, R. Enne, H. Zimmermann
A Fractional-N Subsampling PLL based on a Digital-to-Time Converter ...................... 72
N. Markulic, K. Raczkowski, P. Wambacq, J. Craninckx
Infrared Protection System for High-Voltage Testing of SiC and GaN FETs
used in DC-DC Converters ................................................................................................... 78
F. Hormot, J. Bačmaga, A. Barić
Optimal Conduction Angle of an E-PHEMT Harmonic Frequency
Multiplier ............................................................................................................................... 82
K. Martinčić
Ultra-Wideband Transmitter Based on Integral Pulse Frequency Modulator .......................... 86
T. Matić, M. Herceg, J. Job, L. Šneler
Design of a Transmitter for High-Speed Serial Interfaces in Automotive
Microcontroller ....................................................................................................................... 90
A. Bandiziol, W. Grollitsch, F. Brandonisio, R. Nonis, P. Palestri
Application of the Calculation-Experimental Method in the Design of
Microwave Filters .................................................................................................................. 95
A.S. Geraskin, A.N. Savin, I.A. Nakrap, V.P. Meshchanov
Minimax Design of Multiplierless Sharpened CIC Filters Based on Interval
Analysis ................................................................................................................................ 100
G. Molnar, A. Dudarin, M. Vučić
Minimization of Maximum Electric Field in High-Voltage Parallel-Plate
Capacitor .............................................................................................................................. 105
R. Blečić, Q. Diduck, A. Barić
Modelling SMD Capacitors by Measurements ................................................................. 110
R. Mišlov, M. Magerl, S. Fratte-Sumper, B. Weiss, C. Stockreiter, A. Barić
Impact of Capacitor Dielectric Type on the Performance of Wireless Power
Transfer System ................................................................................................................... 116
D. Vinko, P. Oršolić
Switching Speed and Stress Analysis for Fixed-fixed Beam Based Shunt
Capacitive RF MEMS Switches ......................................................................................... 120
A. Kumar A., R. R
Performance Analysis of Micromirrors - Lift-off and von Mises Stress ........................ 126
S. Finny, R. R
Material and Orientation Optimization for Quality Factor Enhancement
of BAW Resonators ............................................................................................................. 130
R. Raj R.S., R. R
Impact of Propagation Medium on Link Quality for Underwater and
Underground Sensors ......................................................................................................... 135
G. Horvat, D. Vinko, J. Vlaović
Electrical Field Intensity Model on the Surface of Human Body for
Localization of Wireless Endoscopy Pill ........................................................................... 141
B. Lukovac, A. Koren, A. Marinčić, D. Šimunić
Wide Band Current Transducers in Power Measurement Methods - an
Overview .............................................................................................................................. 146
R. Malarić, Ž. Martinović, M. Dadić, P. Mostarac, Ž. Martinović
Laboratory Model for Design and Verification of Synchronous Generator
Excitation Control Algorithms ........................................................................................... 152
S. Tusun, I. Erceg, I. Sirotić
The European Project SolarDesign Illustrating the Role of Standardization
in the Innovation System .................................................................................................... 158
W. Brenner, N. Adamovic
Open Public Design Methodology and Design Process .................................................... 164
D. Rembold, S. Jovalekic
DISTRIBUTED COMPUTING, VISUALIZATION AND
BIOMEDICAL ENGINEERING
INVITED PAPER
Views on the Role and Importance of Dew Computing in the Service and
Control Technology ............................................................................................................. 175
Z. Šojat, K. Skala
PAPERS
DISTRIBUTED COMPUTING AND CLOUD COMPUTING
Parameters That Affect the Parallel Execution Speed of Programs in
Multi-Core Processor Computers ...................................................................................... 185
V. Xhafa, F. Dika
Federated Computing on the Web: the UNICORE Portal .............................................. 190
M. Petrova-El Sayed, K. Benedyczak, A. Rutkowski, B. Schuller
Problem-Oriented Scheduling of Cloud Applications: PO-HEFT Algorithm
Case Study ............................................................................................................................ 196
E.A. Nepovinnykh, G.I. Radchenko
Towards a Novel Infrastructure for Conducting High Productive Cloud-Based
Scientific Analytics .............................................................................................................. 202
P. Brezany, T. Ludescher, T. Feilhauer
An OpenMP Runtime Profiler/Configuration Tool for Dynamic Optimization
of the Number of Threads .................................................................................................. 208
T. Dancheva, M. Gusev, V. Zdravevski, S. Ristov
An Effective Task Scheduling Strategy in Multiple Data Centers in Cloud
Scientific Workflow ............................................................................................................. 214
E.I. Djebbar, G. Belalem
Visualisation in the ECG QRS Detection Algorithms ...................................................... 218
A. Ristovski, A. Guseva, M. Gusev, S. Ristov
Analysis and Comparison of Algorithms in Advanced Web Clusters
Solutions ............................................................................................................................... 224
D. Alagić, K. Arbanas
Metamodeling as an Approach for Better Computer Resources Allocation in
Web Clusters ........................................................................................................................ 230
D. Alagić, D. Maĉek
Showers Prediction by WRF Model above Complex Terrain ......................................... 236
T. Davitashvili, N. Kutaladze, R. Kvatadze, G. Mikuchadze, Z. Modebadze,
I. Samkharadze
Methods and Tools to Increase Fault Tolerance of High-Performance
Computing Systems ............................................................................................................. 242
I.A. Sidorov
Logical-Probabilistic Analysis of Distributed Computing Reliability ............................ 247
A.G. Feoktistov, I.A. Sidorov
Distributed Graph Reduction Algorithm with Parallel Rigidity Maintenance ............. 253
D. Sušanj, D. Arbula
Architecture of Virtualized Computational Resource Allocation on
SDN-enhanced Job Management System Framework .................................................... 257
Y. Watashiba, S. Date, H. Abe, K. Ichikawa, Y. Kido, H. Yamanaka, E. Kawai,
S. Shimojo
Near Real-time Detection of Crisis Situations .................................................................. 263
S. Girtelschmid, A. Salfinger, B. Pröll, W. Retschitzegger, W. Schwinger
Automatic Protocol Based Intervention Plan Analysis in Healthcare ............................ 269
M. Kozlovszky, L. Kovács, K. Batbayar, Z. Garaguly
Using Fourier and Hartley Transform for Fast, Approximate Solution of Dense
Linear Systems ..................................................................................................................... 274
Ž. Jeričević, I. Kožar
Procedural Generation of Mediterranean Environments ............................................... 277
N. Mikuličić, Ž. Mihajlović
Energy-Aware Power Management of Virtualized Multi-core Servers
through DVFS and CPU Consolidation ............................................................................ 283
H. Rostamzadeh Hajilari, M.M. Talebi, M. Sharifi
Human Posture Detection Based on Human Body Communication with
Multi-carrier Modulation .................................................................................................... 289
W. Ni, Y. Gao, Ž. Lučev Vasić, S.H. Pun, M. Cifrek, M.I. Vai, M. Du
SAT-Based Search for Systems of Diagonal Latin Squares in Volunteer
Computing Project SAT@home ........................................................................................ 293
O. Zaikin, S. Kochemazov, A. Semenov
Architectural Models for Deploying and Running Virtual Laboratories in the
Cloud .................................................................................................................................... 298
E. Afgan, A. Lonie, J. Taylor, K. Skala, N. Goonasekera
A CAD Service for Fusion Physics Codes ......................................................................... 303
M. Telenta, L. Kos
Correlation between Attenuation of 20 GHz Satellite Communication Link and
Liquid Water Content in the Atmosphere ........................................................................ 308
M. Kolman, G. Kosec
Practical Implementation of Private Cloud with Traffic Optimization ......................... 314
D.G. Grozev, M.P. Shopov, N.R. Kakanakov
Improving Data Locality for NUMA-Agnostic Numerical Libraries ............................. 320
P. Zinterhof
Use Case Diagram Based Scenarios Design for a Biomedical Time-Series
Analysis Web Platform ....................................................................................................... 326
A. Jović, D. Kukolja, K. Jozić, M. Cifrek
Augmented Reality for Substation Automation by Utilizing IEC 61850
Communication ................................................................................................................... 332
M. Antonijević, S. Sučić, H. Keserica
Innovation of the Campbell Vision Stimulator with the Use of Tablets ........................ 337
J. Brozek, M. Jakes, V. Svoboda
Classification of Scientific Workflows Based on Reproducibility Analysis ................... 343
A. Bánáti, P. Kacsuk, M. Kozlovszky
Dynamic Execution of Scientific Workflows in Cloud ..................................................... 348
E. Kail, J. Kovács, M. Kozlovszky, P. Kacsuk
FPGA Kernels for Classification Rule Induction ............................................................. 353
P. Škoda, B. Medved Rogina
VISUALIZATION SYSTEMS
Prototyping of Visualization Designs of 3D Vector Fields Using POVRay
Rendering Engine ................................................................................................................ 361
J. Opiła
New Cybercrime Taxonomy of Visualization of Data Mining Process .......................... 367
M. Babič, B. Jerman-Blažič
Visual Representation of Predictions in Software Development Based on
Software Metrics History Data .......................................................................................... 370
B. Popović, A. Balota, Dž. Strujić
Interaction with Virtual Objects in a Natural Way ......................................................... 376
I. Prazina, K. Balić, K. Pršeš, S. Rizvić, V. Okanović
Bone Shape Characterization Using the Fourier Transform and Edge
Detection in Digital X-Ray Images ..................................................................................... 380
D. Sušanj, G. Gulan, I. Kožar, Ž. Jeričević
GIS in the e-Government Platform to Enable State Financial Subsidies Data
Transparency ....................................................................................................................... 383
M. Kranjac, U. Sikimić, I. Simić, M. Paroški, S. Tomić
Evaluation of Caching Techniques for Video on Demand in Named Data
Networks .............................................................................................................................. 388
K. Jakimoski, S. Arsenovski, L. Gorachinova, S. Chungurski, O. Iliev, L. Djinevski,
E. Kamcheva
BIOMEDICAL ENGINEERING
Diagnostic of Asthma Using Fuzzy Rules Implemented in Accordance with
International Guidelines and Physicians Experience ...................................................... 395
A. Badnjević, L. Gurbeta, M. Cifrek, D. Marjanović
Robust Beat Detection on Noisy Differential ECG .......................................................... 401
P. Lavrič, M. Depolli
Classification of Asthma Using Artificial Neural Network ............................................. 407
A. Badnjević, L. Gurbeta, M. Cifrek, D. Marjanović
Brain-Computer Interface Based on Steady-State Visual Evoked Potentials ............... 411
K. Friganović, M. Medved, M. Cifrek
Comparison of Wireless Electrocardiographic Monitoring and Standard
ECG in Dogs ........................................................................................................................ 416
A. Krvavica, Š. Likar, M. Brložnik, A. Domanjko-Petrič, V. Avbelj
A Medical Cloud .................................................................................................................. 420
J. Tasič, M. Gusev, S. Ristov
A Hospital Cloud-Based Open Archival Information System for the Efficient
Management of HL7 Big Data ........................................................................................... 426
A. Celesti, M. Fazio, A. Romano, M. Villari
Recognition and Adjustment for Strip Background Baseline in Fluorescence
Immuno-chromatographic Detection System ................................................................... 432
Y. Gao, C. Lin, S.H. Pun, M.I. Vai, M. Du
Agile Development of a Hospital Information System ..................................................... 436
S.L.R. Vrhovec
SOA Based Interoperability Component for Healthcare Information System ............. 442
D. Kučak, G. Đambić, V. Kokanović
Wireless Intrabody Communication Sensor Node Realized Using PSoC
Microcontroller .................................................................................................................... 446
F. Grilec, Ž. Lučev Vasić, W. Ni, Y. Gao, M. Du, M. Cifrek
Detection of Heart Rate Variability from a Wearable Differential ECG
Device .................................................................................................................................... 450
J. Slak, G. Kosec
Penetration of the ICT Technology to the Health Care Primary Sector –
Ljubljana PILOT ................................................................................................................ 456
T. Poplas Susič, U. Stanič
Image-Based Metal Artifact Reduction in CT Images ..................................................... 462
A. Šerifović-Trbalić, A. Trbalić
New Algorithm for Automatic Determination of Systolic and Diastolic
Blood Pressures in Oscillometric Measurements ............................................................. 467
V. Jazbinšek
TGTP-DB – a Database for Extracting Genome, Transcriptome and
Proteome Data Using Taxonomy ....................................................................................... 472
K. Križanović, M. Marinović, A. Bulović, R. Vaser, M. Šikić
Development and Perspectives of Biomedical Engineering in South
East European Countries ...................................................................................................... 477
A. Badnjević, L. Gurbeta
Clustering of Heartbeats from ECG Recordings Obtained with Wireless
Body Sensors ........................................................................................................................ 481
A. Rashkovska, D. Kocev, R. Trobec
Heart Rate Analysis with NevroEkg .................................................................................. 487
M. Mohorčič, M. Depolli
TELECOMMUNICATIONS & INFORMATION
FNS • SPECIAL SESSION ON FUTURE NETWORKS AND SERVICES
PAPERS
A Survey of IoT Cloud Providers ...................................................................................... 497
T. Pflanzner, A. Kertesz
QoS-Aware Deployment of Data Streaming Applications over Distributed
Infrastructures ..................................................................................................................... 503
M. Nardelli
QoS-Aware Application Placement Over Distributed Cloud .......................................... 509
F. Bianchi, F. Lo Presti
Energy-Aware Control of Server Farms ........................................................................... 515
M.E. Gebrehiwot, S. Aalto, P. Lassila
SDN Based Service Provisioning Management in Smart Buildings ............................... 521
M. Tošić, O. Iković, D. Bošković
TELECOMMUNICATIONS & INFORMATION
INVITED PAPER
Time Series Analysis and Possible Applications ............................................................... 531
M. Ivanović, V. Kurbalija
PAPERS
WIRELESS COMMUNICATIONS AND TECHNOLOGIES
Wireless Resonant Power Transfer – An Overview ......................................................... 543
Ž. Martinović, M. Dadić, R. Malarić, Ž. Martinović
Investigation of a Small Handheld PCB Nesting Two Antennas NFC
13.56 MHz and to RF 868 MHz .......................................................................................... 550
L.A. Iliev, I.S. Stoyanov, T.B. Iliev, E.P. Ivanova, Gr.Y. Mihaylov
The Coverage Belt for Low Earth Orbiting Satellites ..................................................... 554
S. Cakaj
The Investigation of the Effect of the Carrier Frequency Offset (CFO) in
SC-FDMA System ............................................................................................................... 558
N. Taşpinar, M. Balki
Performance Analysis of Low Density Parity Check Codes Implemented
in Second Generations of Digital Video Broadcasting Standards .................................. 562
Gr.Y. Mihaylov, T.B. Iliev, E.P. Ivanova, I.S. Stoyanov, L.A. Iliev
DATA AND IMAGE ANALYSIS
Iterative Denoising of Sparse Images ................................................................................ 569
I. Stanković, I. Orović, S. Stanković, M. Daković
Compressive Sensing Based Image Processing in TrapView Pest
Monitoring System .............................................................................................................. 574
M. Marić, I. Orović, S. Stanković
Big Data Analytics for Communication Service Providers ............................................. 579
D. Šipuš
Role of Data Analytics in Utilities Transformation .......................................................... 584
V. Čačković, Ž. Popović
Using MEAN Stack for Development of GUI in Real-Time Big Data
Architecture ......................................................................................................................... 590
M. Štajcer, M. Štajcer, D. Oreščanin
NETWORK TECHNOLOGIES
A Survey on Physical Layer Impairments Aware Routing and
Wavelength Assignment Algorithms in Transparent Wavelength
Routed Optical Networks ................................................................................................... 599
H. Dizdarević, S. Dizdarević, M. Škrbić, N. Hadžiahmetović
A Survey on Transition from GMPLS Control Plane for Optical
Multilayer Networks to SDN Control Plane ..................................................................... 606
S. Dizdarević, H. Dizdarević, M. Škrbić, N. Hadžiahmetović
About the Telco Cloud Management Architectures ........................................................ 614
I. Nenadić, D. Kobal, D. Palata
CPE Virtualization by Unifying NFV, SDN and Cloud Technologies ........................... 622
P. Cota, J. Šabec
Soft Sensors in Wireless Networking as Enablers for SDN Based
Management of Content Delivery ...................................................................................... 628
M. Tošić, O. Iković, D. Bošković
A FIRM Approach for Software-Defined Service Composition ..................................... 634
P. Kathiravelu, T. Galinac Grbac, L. Veiga
Test Environment & Application as a Service .................................................................. 640
T. Žitnik, M. Galin, G. Pauković, I. Dević, R. Čižmar, Z. Bosić
Workaround Solutions Used During PSTN Migration of Customers to
IMS Network ....................................................................................................................... 644
N. Štokić
Development of the Generic OFDM Based Transceiver in the LabView
Software Environment ........................................................................................................ 650
D. Hamidović, N. Suljanović
The Challenge of Cellular Cooperative ITS Services Based on 5G
Communications Technology ............................................................................................. 656
Z. Kljaić, P. Škorput, N. Amin
NETWORKS PERFORMANCES
Performance Evaluation of Different Scheduling Algorithms in LTE
Systems ................................................................................................................................. 667
A. Marinčić, D. Šimunić
Performance Analysis of LTE Networks with Random Linear Network
Coding .................................................................................................................................. 673
T.D. Assefa, K. Kralevska, Y. Jiang
VoLTE E2E Performance Management ........................................................................... 679
D. Klobučarević, Ž. Klobučarević, D. Belošić
Balancing Security and Blocking Performance with Reconfiguration
of the Elastic Optical Spectrum ......................................................................................... 684
S. Kumar Singh, W. Bziuk, A. Jukan
Ensuring Continuous Operation of Critical Process of Remote Control
System at the Level of Network Connectivity ................................................................... 690
I. Fosić, D. Budiša
IoT PLATFORM AND APPLICATIONS
Requirements and Challenges in Wireless Network's Performance
Evaluation in Ambient Assisted Living Environments .................................................... 699
A. Koren, D. Šimunić
Long Term Evolution as a Precondition for Internet of Postal Things .......................... 703
A. Kosovac, A. Veispahić, M. Berković
Advanced Sensing and Internet of Things in Smart Cities ............................................. 707
D. Capeska Bogatinoska, R. Malekian, J. Trengoska, W. Asiama Nyako
Security Challenges of the Internet of Things .................................................................. 713
M. Weber, M. Boban
Promoting Health for Chronic Conditions: a Novel Approach That
Integrates Clinical and Personal Decision Support ......................................................... 719
I. Lasorsa, M. Ajčević, P. D'Antrassi, G. Carlini, A. Accardo, S. Marceglia
A Taxonomy of Localization Techniques Based on Multidimensional
Scaling .................................................................................................................................. 724
B. Risteska Stojkoska
Distributed Real-Time Lift Kinematic Monitoring Using COTS
Smartphones ........................................................................................................................ 730
N. Miškić-Pletenac, K. Lenac
Mobile Devices as Authentic and Trustworthy Sources in Multi-Agent
Systems ................................................................................................................................. 736
V. Vyroubal, A. Stančić, I. Grgurević
SOFTWARE ENGINEERING
Comparative Analysis of Functional and Object-Oriented Programming .................... 745
D. Alić, S. Omanović, V. Giedrimas
Improving the Composition and Assembly of APIs in Service Dominant
Ecosystem Environments .................................................................................................... 751
D. Ramljak
Service Level Agreement – SLA, Service Availability and System Quality in
Telecommunications ............................................................................................................ 755
D. Glamočanin
Challenges of a Service Transition in Multi Domain Environment ............................... 761
I. Golub, B. Radojević
Teaching “Ten Commandments” of Software Engineering ............................................ 766
Z. Putnik, M. Ivanović, Z. Budimac, K. Bothe
Methodologies for Development of Mobile Applications ................................................. 772
Z. Stapić, M. Mijač, V. Strahonja
Change Management: the Example of a Telecom Operator in South-East
Europe ................................................................................................................................... 777
A. Gabela
Information System Integration Technologies ..................................................................... 783
A. Stojanović, N. Lazić, Ž. Kovačević
TELECOM PRODUCTS, SERVICES AND MARKET
Restructuring of Telco Products ........................................................................................ 791
I. Vrbovčan, T. Pavić, M. Šoša Anić
Moving from Network-Centric toward Customer-Centric CSPs in Bosnia
and Herzegovina .................................................................................................................. 794
N. Banović-Ćurguz, D. Ilišević
Future Communication Model: Challenges and Opportunities for Society
as a Whole ............................................................................................................................ 800
D. Ilišević, N. Banović-Ćurguz
Some Aspects of Network Management System for Video Service ................................ 805
O. Jukić, I. Heđi
Influence of OTT Service Providers on Croatian Telecommunication
Market .................................................................................................................................. 809
I. Dražić Lutilsky, M. Ivić
Implementing Shared Service Center in Telecom Environment as More
Efficient and More Cost Effective Business Model .......................................................... 814
T. Žilić, V. Čošić
ICT APPLICATIONS
The m:Pay Mobile Bill Payment Service ............................................................................. 821
V. Žlof, S. Salapura
Tracking Property-Rights Affairs on Electronic Communication
Infrastructure through a Web GIS Application .................................................................... 826
D. Salopek, T. Đigaš, F. Ambroš, M. Štimac
Fieldbus Diagnostic Online Solution Program Establishment at Rijeka Oil
Refinery ................................................................................................................................ 831
B. Žeželj, H. Hajdo
Automatic Communication System Ship to Shipping Terminal, for
Reporting Potential Malfunctions of a Ballast Water Treatment System
Operation ............................................................................................................................. 836
G. Bakalar, M. Baggini
AdriaHUB ICT platform .................................................................................................... 841
T. Škorjanc, R. Žigulić, N. Anđelić
Measuring the Quality of the m:Pay Mobile Payment Service ............................................ 847
S. Salapura, V. Žlof
Establishing an Identity Management System in the Customs Administration ................... 852
M. Hajnić, D. Cmuk
COMPUTERS IN EDUCATION
INVITED PAPER
New Informatics Curriculum - Croatian Tradition with World Trends ....................... 863
L. Kralj
PAPERS
Creativity, Communication and Collaboration: Grading with Open
Badges ................................................................................................................................... 869
I. Salopek Čubrić, G. Čubrić
A Study of Students' Attitudes and Perceptions of Digital Scientific
Information Landscape ....................................................................................................... 875
R. Vrana
Researcher Measured - Towards a Measurement-driven Academia ............................. 881
H. Jaakkola, J. Henno, J. Mäkelä, K. Ahonen
Use of 'Learning Analytics' .................................................................................................. 888
J. Henno, H. Jaakkola, J. Mäkelä
Smart Immersive Education for Smart Cities with Support via Intelligent
Pedagogical Agents .............................................................................................................. 894
M. Soliman, A. Elsaadany
Review of Source-Code Plagiarism Detection in Academia ............................................ 901
M. Novak
The Comparison of Impact Offline and Online Presentation on Student
Achievements: A Case Study .............................................................................................. 907
P. Esztelecki, G. Kőrösi, Z. Námestovski, L. Major
Digital Competences for Teachers: Classroom Practice .................................................. 912
M. Filipović Tretinjak, V. Anđelić
Introducing Collaborative e-Learning Activities to the e-Course
“Information Systems” ....................................................................................................... 917
M. Ašenbrener Katić, S. Čandrlić, M. Holenko Dlab
A Curriculum for Unified Embedded Engineering Education ....................................... 923
I. Kaštelan, M. Temerinac
Individual versus Collaborative Learning in a Virtual World ....................................... 929
P. Pürcher, M. Höfler, J. Pirker, L. Tomes, A. Ischebeck, C. Gütl
Preparation of a Hybrid e-Learning Course for Gamification ....................................... 934
D. Kermek, D. Strmečki, M. Novak, M. Kaniški
Implementation of Fundamental Ideas into the Future Managers'
Informatics Education ........................................................................................................ 940
L. Révészová
Fostering Creativity in Technology Enhanced Learning ................................................ 946
A. Žižić, A. Granić, I. Šitin
Teaching Physics in Primary Schools with Tablet Computers: Key
Advantages ........................................................................................................................... 952
V. Grubelnik, L. Grubelnik
Project Based Learning (PBL) in the Teachers' Education ................................................... 957
M. Krašna
Didactical Suitability of e-Generated Drill Tests for Physics .......................................... 962
R. Repnik, M. Sovič
Utilizing MOOCs in the Development of Education and Training
Programs .............................................................................................................................. 966
P. Linna, T. Mäkinen, H. Keto
Distance Delivery and Technology-Enhanced Learning in Information
Technology and Programming Courses at RIT Croatia ................................................. 970
K. Marasović, B. Mihaljević, I. Bačić
Overview of IT Solutions for Career Services and Quality Assurance
at Higher Education ............................................................................................................ 976
E. Gjorgjevska, P. Tonkovikj, M. Gusev
Selecting the Most Appropriate Web IDE for Learning Programming
Using AHP ............................................................................................................................ 982
I. Škorić, B. Pein, T. Orehovački
Using Real Projects as Motivators in Programming Education ..................................... 988
M. Konecki, S. Lovrenčić, M. Kaniški
Making Programming Education More Accessible for Visually
Impaired ............................................................................................................................... 992
M. Konecki, N. Ivković, M. Kaniški
Use of Computer Programs in Teaching Photography Courses
at Schools of Applied Arts and Design in Croatia ............................................................ 996
Z. Prohaska, Z. Prohaska, I. Uroda
University Search Engine ................................................................................................... 1002
Ž. Knok, M. Marčec
Experience with Usage of LMS Moodle not Only for the Educational
Purposes at the Educational Institution .......................................................................... 1006
D. Paľová
Using Robot Simulation Applications at the University – Experiences with
the KUKA Sim ................................................................................................................... 1012
D. Lukac
Implementation and Analysis of Open Source Information Systems in
Electronic Business Course for Economy Students ....................................................... 1017
H. Jerković, P. Vranešić, G. Slamić
Virtual Firms as Education Tool in the Field of eCommerce ....................................... 1023
M. Vejaĉka
Systems and Software Assurance - A Model Cyber Security Course .......................... 1028
V. Jovanović, J. Harris
Analysis of Learning Management Systems Features and Future
Development Challenges in Modern Cloud Environment ............................................. 1033
H. Jerković, P. Vranešić, A. Radan
Markov Model of Mathematical Competences in Elementary Education ................... 1039
G. Paić, B. Tepeš, K. Pavlina
PYTHON as Pseudo Language for Formal Language Theory ..................................... 1045
Z. Dovedan Han, K. Kocijan, V. Lopina
Croatian Students' Attitudes Towards Technology Usage in Teaching
Asian Languages – a Field Research ............................................................................... 1051
M. Janjić, S. Librenjak, K. Kocijan
Adaptive e-Learning System for Language Learning: Architecture
Overview ............................................................................................................................ 1056
V. Slavuj, B. Kovačić, I. Jugo
L2L – Learn to Learn: Teach to Learn: CARTOON ENGLISH
(A constructivist approach to teaching and learning) .................................................... 1061
K. Bedi
Facilitating Mobile Learning by Use of Open Access Information
Resources ............................................................................................................................ 1067
R. Vrana
Work-Based Learning: New Skills for New Technologies ............................................ 1072
M. Lamza Maronić, I. Ivančić
Creating Assets as a Part of Tertiary Education of Technical Domains ...................... 1078
J. Brozek, D. Hamernik, Z. Kopecky
Software Solution Incorporating the Steganographic Principle for Hiding
Pictures within Pictures .................................................................................................... 1084
J. Brozek, J. Marek, V. Svoboda
A Platform Independent Tool for Programming, Visualization and
Simulation of Simplified FPGAs ...................................................................................... 1091
M. Čupić, K. Brkić, Ž. Mihajlović
Digital Risks and Experiences of Future Teachers ........................................................ 1097
T. Bratina
A Study of Factors Influencing Higher Education Teachers' Intention to
Use e-Learning in Hybrid Environments ........................................................................ 1103
S. Babić, M. Čičin-Šain, G. Bubaš
Development and Implementation of E-Learning System in Smart
Educational Environment ................................................................................................. 1109
A. Elsaadany, K. Abbas
Introducing Inquiry-Based Learning to Estonian Teachers: Experiences
from the Creative Classroom Project .............................................................................. 1115
N. Hoić-Božić, M. Laanpere, K. Pata, I. Franković, S. Teder
Mobile Robots Approach for Teaching Programming Skills in Schools ..................... 1121
W. Werth, C. Ungermanns
Age Independent Examination of Algorithm Creating Abilities .................................. 1125
Z.A. Godó, D. Kocsis, G. Kiss, G. Stóka
The Digitalization Push in Universities ........................................................................... 1130
H. Jaakkola, H. Aramo-Immonen, J. Henno, J. Mäkelä
Toby the Explorer – an Interactive Educational Game for Primary School
Pupils .................................................................................................................................. 1137
N. Kaevikj, A. Kostadinovska, B. Risteska Stojkoska, M. Mihova, K. Trivodaliev
The Use of Contemporary e-Services and e-Contents at Mother Tongue
Classes ................................................................................................................................ 1142
V. Jesenek
Migration from in-House LMS to Google Classroom: Case of SEEU ......................... 1145
L. Abazi Bexheti, A. Kadriu, M. Apostolova Trpkovska
Survey Analyses of Impacting Factors in ICT Usage in School
Management: Case Study ................................................................................................. 1149
B. Fetaji, M. Fetaji, R. Azemi, M. Ebibi
Case Study Analyses of Semantic Security Using SQL Injection in Web
Enabled ORACLE Database ............................................................................................ 1155
M. Fetaji, B. Fetaji, M. Ebibi
Using Web Applications in Education ............................................................................. 1161
A. Babić, S. Vukmirović, Z. Čapko
Qualitative Approach to Determining the Relevant Facets of Mobile
Quality of Educational Social Web Applications ........................................................... 1165
T. Orehovački, S. Babić
Cross-Curricular Projects in Informatics Classes at the School of Economics –
Examples of Good Practice ................................................................................................ 1171
S. Bulešić Milić
Traditional or Hybrid Model of Teaching Computer Science ........................................... 1174
M. Sertić, K. Šolić
Knowledge Assessment with Classroom Manager in the Classrooms of the
Future .................................................................................................................................. 1180
M. Korać
Application of e-Learning in Croatian Military Education ................................................ 1184
D. Možnik
Presentation Tools for Displaying Mathematical Content ................................................. 1190
M. Štefan Trubić, I. Radošević
A Safe Way to School ......................................................................................................... 1196
D. Šokac, I. Biuklija
Use of the Educational Social Network Edmodo in Teaching at the III Primary
School Čakovec .................................................................................................................. 1199
N. Boj
Digital Learning Scenarios ................................................................................................. 1203
M. Mirković
Detection of the Most Common Syntactic and Logical Errors Made by Pupils
Writing Programs in the Early Years of Learning to Program ........................................... 1209
K. Blažeka
Teaching Mathematics on the SageMathCloud Platform ................................................... 1215
Ž. Tutek
Introduction to Robotics – the Arduino Platform and a Web Application ......................... 1218
A. Lacković, B. Fulanović
Information System of Higher Education Institutions – a Case Study of the
Polytechnic of Šibenik ........................................................................................................ 1222
S. Krajačić, L. Topolčić, F. Urem
Mobile Applications in Higher Education .......................................................................... 1225
M. Blašković, M. Fumić, F. Urem
Methodology for Creating e-Learning Content for Education on Developing
Occupational Standards ...................................................................................................... 1230
I. Vunarić, S. Grgić, T. Babić
The Role of ICT in Developing Children's Financial Literacy .......................................... 1235
I. Ružić
Information and Communication Sciences in Teaching – Digitized Learning
Materials ............................................................................................................................. 1239
T. Babić, A. Ogrin, M. Babić
Researching Student Attitudes and Expectations at Enrolment as a Method
of Improving Service Quality in Higher Education ........................................................... 1245
T. Babić, S. Grgić, E. Rajković
Through e-Education to a Flexible Learning Model .......................................................... 1250
M. Božurić, R. Bogut, M. Tretinjak
Recommendations and Examples of Good e-Learning Practice in Croatian
Higher Education ................................................................................................................ 1254
D. Junaković, I. Paćelat, F. Urem
An Illustration of Applying the New Informatics Curriculum, Specifically the
Computational Thinking and Programming Domain, Using the Example of the
Početnica Mema Primer Method for the First Grade of Primary School ........................... 1258
M. Čičin-Šain, S. Babić, L. Kralj
Exposure to and Habits of Using Media and Computers among Children in
Lower Primary Grades ........................................................................................................ 1262
T. Pavičić, J. Šurić
COMPUTERS IN TECHNICAL SYSTEMS
INVITED PAPERS
Architecture and Application of Virtual Desk and 3D Process Simulation for
Wire Rod Rolling Mills ..................................................................................................... 1271
A. Venuti
Use of Offline Computational Tools for Plant Data Analysis and Setup
Model Calibration: a Perspective in the Industry of Flat Metal
Production .......................................................................................................................... 1276
C. Aurora, F.A. Cuzzola
Architecture and Implementation of a MES System in a Large Scale Steel
Plant: Severstal Cherepovets Success Story ................................................................... 1280
G. Brunetti
PAPERS
ANFIS as a Method for Determining MPPT in the Photovoltaic System
Simulated in Matlab/Simulink ............................................................................................ 1289
D. Mlakić, S. Nikolovski
Linear Motion Calculation of the High Voltage Circuit Breaker Contacts
Using Rotary Motion Measurement with Nonlinear Transfer Function ..................... 1294
K. Obarčanin, R. Ostojić
Robot Arm Teleoperation via RGBD Sensor Palm Tracking ....................................... 1300
F. Marić, I. Jurin, I. Marković, Z. Kalafatić, I. Petrović
A Proposal for a Fully Distributed Flight Control System Design ............................... 1306
M. Šegvić, K. Krajček Nikolić, E. Ivanjko
Control of Thermal Process with Simulink and NI USB-6211 in
Real Time ........................................................................................................................... 1311
I. Tikvić, G. Vujisić, M. Fruk
Stabilization of Multi-AUV Formation with Digital Control ........................................ 1315
S.A. Ul’yanov, N.N. Maksimkin
A Hybrid Approach to Solve the Dynamic Patrol Routing Problem for
Group of Underwater Robots ........................................................................................... 1321
M.Yu. Kenzin, I.V. Bychkov, N.N. Maksimkin
Multi - Heater Induction Heating System with Sandwich Material
Heater ................................................................................................................................. 1327
A. Smrke
Two-Rate Motion Control of VTAV by NARMA-L2 Controller for
Enhanced Situational Awareness ..................................................................................... 1333
I. Astrov
LADDER Program Solution for Multi-probe Monitoring and
Control in Simple Cooling Process .................................................................................. 1339
T. Špoljarić, M. Špoljarić
An M2M Solution for Smart Metering in Electrical Power Systems ........................... 1348
M.P. Shopov
Noise within a Data Center ............................................................................................... 1352
D. Miljković
Active Noise Control: From Analog to Digital – Last 80 Years .................................... 1358
D. Miljković
Responding to Stakeholders' Resistance to Change in Software Projects –
A Literature Review .......................................................................................................... 1364
S.L.R. Vrhovec
Object-Oriented Programming Model for Synthesis of Domain-Specific
Application Development Environment .......................................................................... 1369
T. Lugarić, Z. Pavlić, D. Škvorc
Logistic and Production Computer Systems in Small-Medium
Enterprises ......................................................................................................................... 1375
M. Pighin
The Implications of Employing Component Based Software Design in
Non-Commercial Applications ......................................................................................... 1380
B. Zorić, G. Martinović, I. Crnković
Extended Approach to Selecting a Project-specific Reliability Growth
Model .................................................................................................................................. 1386
J. Krini, A. Krini, O. Krini, J. Börcsök
Embedded Linux Controlled Sensor Network ............................................................... 1392
M. Saari, A.M. Baharudin, P. Sillberg, P. Rantanen, J. Soini
Portable Sensor System for Reliable Condition Measurement ..................................... 1397
J. Soini, P. Sillberg, P. Rantanen, J. Nummela
Architecture of an Interoperable IoT Platform Based on
Microservices ..................................................................................................................... 1403
T. Vresk, I. Čavrak
Performance Estimation in Heterogeneous MPSoC Based on
Elementary Operation Cost .............................................................................................. 1409
N. Frid, D. Ivošević, V. Sruk
A Lightning Location System in the Identification of Telecommunication
Network Faults Caused by Atmospheric Overvoltages ...................................................... 1413
V. Milardić, B. Franc, M. Budimirović
SNUPI – a System for Monitoring and Managing Data Centre Infrastructure
Processes ............................................................................................................................. 1420
M. Zmijanac
INTELLIGENT SYSTEMS
BiForD • SPECIAL SESSION ON BIOMETRICS & FORENSICS & DE-IDENTIFICATION AND PRIVACY PROTECTION
KEYNOTE SPEECH
Face Alignment: Addressing Pose Variability in Face Recognition
Systems ............................................................................................................................... 1433
V. Štruc
PAPERS
Shape and Texture Combined Face Recognition for Detection of
Forged ID Documents ....................................................................................................... 1437
D. Sáez-Trigueros, H. Hertlein, L. Meng, M. Hartnett
Simple Method Based on Complexity for Authorship Detection of Text ..................... 1443
L. Meluch, I. Tokárová, P. Farkaš, F. Schindler
Privacy Protection Performance of De-identified Face Images with and
without Background .......................................................................................................... 1448
Z. Sun, L. Meng, A. Ariyaeeinia, X. Duan, Z.-H. Tan
Deep Metric Learning for Person Re-Identification and
De-Identification ................................................................................................................ 1454
I. Filković, Z. Kalafatić, T. Hrkać
Deformable Part-Based Robust Face Detection under Occlusion by
Using Face Decomposition into Face Components ......................................................... 1459
D. Marčetić, S. Ribarić
Creating a Face Database for Age Estimation and Classification ................................ 1465
P. Grd, M. Bača
Forensic Anthropometry from Voice: An Articulatory-Phonetic
Approach ............................................................................................................................ 1469
R. Singh, B. Raj, D. Gencaga
INTELLIGENT SYSTEMS
PAPERS
Computer Vision for the Blind: a Dataset for Experiments on Face
Detection and Recognition ................................................................................................ 1479
S. Carrato, S. Marsi, E. Medvet, F.A. Pellegrino, G. Ramponi, M. Vittori
Impact of Light Conditions on the Vertical Traffic Signs Detection in
Vertical Traffic Signs Recognition System ..................................................................... 1485
D. Solus, Ľ. Ovseník, J. Turán
Wound Detection and Reconstruction Using RGB-D Camera ..................................... 1490
D. Filko, E.K. Nyarko, R. Cupec
Clustering of Affective Dimensions in Pictures: An Exploratory Analysis
of the NAPS Database ....................................................................................................... 1496
M. Horvat, K. Jednoróg, A. Marchewka
Challenges in Adopting Big Data Strategies and Plans in
Organizations ..................................................................................................................... 1502
A. Budin, S. Krajnović
A Survey of Intelligent System Techniques for Indian Stock Market
Forecasting ......................................................................................................................... 1508
S. Panwar, V.P. Upadhyay, S.K. Bishnoi
The Effect of Class Distribution on Classification Algorithms in Credit
Risk Assessment ................................................................................................................. 1514
K. Andrić, D. Kalpić
Software Solution for Optimal Planning of Sales Persons Work Based on
Depth-First Search and Breadth-First Search Algorithms ........................................... 1521
E. Žunić, A. Djedović, B. Žunić
Iterated Local Search Algorithm for Planning the Sequence of Arrivals
and Departures at Airport Runways ............................................................................... 1527
E. Bytyçi, K. Sylejmani, A. Dika
Energy Efficiency with Intelligent Light Management Systems ................................... 1532
I. Britvić, A. Nikitović
Adaptive and Modular Urban Smart Infrastructure .................................................... 1538
M. Klarić, I. Kuzle, I. Livaja
Automatic Pothole and Speed Breaker Detection Using Android
System ................................................................................................................................. 1543
V. Rishiwal, H. Khan
The Influence of the CAPTCHA Types to Its Solving Times ........................................ 1547
D. Brodić, S. Petrovska, M. Jevtić, Z.N. Milivojević
Techniques and Applications of Emotion Recognition in Speech ................................ 1551
S. Lugović, I. Dunđer, M. Horvat
Word Occurrences and Emotions in Social Media: Case Study on a
Twitter Corpus .................................................................................................................. 1557
I. Dunđer, M. Horvat, S. Lugović
The Application of Parameterized Algorithms for Solving SAT to the
Study of Several Discrete Models of Collective Behavior .............................................. 1561
S. Kochemazov, A. Semenov, O. Zaikin
Logical-Algebraic Equations Application in Discrete-Event Systems
Studying ............................................................................................................................. 1566
N. Nagul
An Evaluation Framework and a Brief Survey of Decision Tree
Tools .................................................................................................................................... 1572
N. Vlahović
Positive Constructed Formulas Preprocessing for Automatic
Deduction ........................................................................................................................... 1578
E. Cherkashin, A. Davydov, A. Larionov
Monte-Carlo Randomized Algorithm: Empirical Analysis on
Real-World Information Systems .................................................................................... 1582
R. Kudelić, D. Oreški, M. Konecki
Control Flow Graph Visualization in Compiled Software Engineering ...................... 1586
A. Mikhailov, A. Hmelnov, E. Cherkashin, I.V. Bychkov
Bottom-Left and Sequence Pair for Solving Packing Problems ................................... 1591
T. Rolich, D. Domović, M. Golub
Automatic Image Annotation Refinement ...................................................................... 1597
M. Pobar, M. Ivašić-Kos
Defining Ontology Combining Concepts of Massive Multi-Player Online
Role Playing Games and Organization of Large-Scale Multi-Agent
Systems ............................................................................................................................... 1603
B. Okreša Đurić, M. Schatten
Comparison of Solution Representations for Scheduling in the Unrelated
Machines Environment ..................................................................................................... 1609
M. Đurasević, D. Jakobović
INFORMATION SYSTEMS SECURITY
PAPERS
TECHNICAL TRACK
Technical Recommendations for Improving Security of Email
Communications ................................................................................................................ 1623
A. Malatras, I. Coisel, I. Sanchez
Performance Analysis of Two Open Source Intrusion Detection
Systems ............................................................................................................................... 1629
B. Brumen, J. Legvart
Challenges of Mobile Device Use in Healthcare ............................................................. 1635
S.L.R. Vrhovec
Safe Use of Mobile Devices in the Cyberspace ............................................................... 1639
S.L.R. Vrhovec
Securing Web Content and Services in Open Source Content
Management Systems ........................................................................................................ 1644
H. Jerković, P. Vranešić, S. Dadić
Can Malware Analysts be Assisted in Their Work Using Techniques
from Machine Learning? .................................................................................................. 1650
I. Novković, S. Groš
Performance Evaluation of a Rule-Based Access Control Framework ....................... 1656
S.A. Afonin
SOCIAL ENGINEERING TRACK
Going White Hat: Security Check by Hacking Employees Using Social
Engineering Techniques ................................................................................................... 1663
Z. Lovrić Švehla, I. Sedinić, L. Pauk
Analysis of Phishing Attacks against Students ............................................................... 1667
J. Andrić, D. Oreški, T. Kišasondi
What Do Students Do with Their Assigned Default Passwords? ................................. 1674
L. Bošnjak, B. Brumen
Analysing Real Students' Passwords and Students' Passwords
Characteristics Received from a Questionnaire ............................................................... 1680
V. Taneski, M. Heričko, B. Brumen
MISC TRACK
Using DEMF in Process of Collecting Volatile Digital Evidence .................................. 1689
M. Bača, J. Ćosić, P. Grd
From Safe Harbour to European Data Protection Reform ........................................... 1694
T. Katulić, G. Vojković
Information Security Assessment in Nature Parks ........................................................ 1699
S. Aksentijević, T. Đugum, K. Šakić
Clustering Approach for User Location Data Privacy in
Telecommunication Services ............................................................................................ 1706
M. Vuković, M. Kordić, D. Jevtić
Analysis of Security Vulnerabilities of Intelligent Data Center
Management Interfaces (in Croatian) ................................................................................ 1711
M. Ramljak
BUSINESS INTELLIGENCE SYSTEMS
PAPERS
Analyzing Air Pollution on the Urban Environment ..................................................... 1723
E. Baralis, T. Cerquitelli, S. Chiusano, P. Garza, M.R. Kavoosifar
Application of Model Driven Architecture for Development of Data
Consolidation Web-System ............................................................................................... 1729
A.A. Korobko, L.F. Nozhenkova
Business Process Management Systems Selection Guidelines: Theory and
Practice ............................................................................................................................... 1735
V. Bosilj Vukšić, L. Brkić, M. Baranović
Organization of Tax Data Warehouse for Legal Entities .............................................. 1741
M. Sretenović, B. Kovačić, V. Jovanović
Predictive Analytics in Big Data Platforms – Comparison and
Strategies ............................................................................................................................ 1747
M. Zekić-Sušac, A. Has
The Analysis of CSFs in Stages of ERP Implementation - Case Study in
Small and Medium - Sized (SME) Companies in Croatia ............................................. 1753
M. Nikitović, V. Strahonja
A Process Optimization Model with an Application to ATM
Replenishment (in Croatian) ............................................................................................... 1759
I. Osman, K. Bokulić
DIGITAL ECONOMY AND GOVERNMENT, LOCAL
GOVERNMENT, PUBLIC SERVICES
PAPERS
The Modern Approach to the Analysis of Logistics Information
Systems ............................................................................................................................... 1769
A. Iskra, E. Tijan, S. Aksentijević
Development of the Data Warehouse Model for Public Authorities
Accounts in Croatia ........................................................................................................... 1774
M. Sretenović, B. Kovačić, V. Jovanović
The Future of Digital Economy in Some SEE Countries (Case study:
Croatia, Macedonia, Montenegro, Serbia, Bosnia and Herzegovina) .......................... 1780
M. Vidas-Bubanja, I. Bubanja
Effects and Evaluation of Open Government Data Initiative in
Croatia ................................................................................................................................ 1786
T. Vračić, M. Varga, K. Ćurko
ICT Technologies and Structured Dialogue: Experience of
"Go, go, NGO!" Project .................................................................................................... 1792
N. Kadoić
Using ICT Tools for Decision Making Support in Local Government
Units .................................................................................................................................... 1798
N. Kadoić, I. Kedmenec
The Conceptual Risk Management Model - A Case Study of Varazdin
County ................................................................................................................................ 1804
R. Kelemen, M. Biškup, N. Begičević Ređep
Electronic Commerce in Croatia and a Comparison of Open Source
Tools for the Development of Electronic Commerce ..................................................... 1811
J. Tomljanović, T. Turina, E. Krelja Kurelović
The Social Marketing as Prerequisite for the Competitiveness of
South-East European Companies .................................................................................... 1817
I. Bubanja
Homeostasis and Collaborative Decision Making for Smart and Cognitive
Cities ................................................................................................................................... 1822
J. Klasinc
Can the Bank Payment Obligation Replace the International
Documentary Letter of Credit? ........................................................................................ 1828
R. Bergami
Implementation and Design of Cool'n'Project - Web-Based Project
Management Software ....................................................................................................... 1834
I. Špeh
Analysis of ICT Use in Private Accommodation Rentals in Croatia ............................ 1841
Lj. Zekanović-Korona, J. Grzunov
Records Management Challenges and Opportunities: An Australian
Perspective ......................................................................................................................... 1847
A. Davies, R. Bergami
Effectiveness Analysis of Using Solid State Disk Technology ....................................... 1852
A. Skendžić, B. Kovačić, E. Tijan
Information and Communication Technologies and the New Forms
of Organized Crime in Network Society ......................................................................... 1857
M. Boban
Digitalization of Local Government: The Example of Istria County
(in Croatian) ......................................................................................................................... 1863
L. Ordanić, N. Šarić-Kekić
Digital Economy - The Resultant of Disruptive Technologies (in Croatian) ................. 1869
M. Mauher
MIPRO Junior – STUDENT PAPERS
PAPERS
Technical Diagnosis of Basic Logic Gates ....................................................................... 1879
Z. Tucaković
Developing a Parking Monitoring System Based on the Analysis of
Images from an Outdoor Surveillance Camera .............................................................. 1884
I.V. Sukhinskiy, E.A. Nepovinnykh, G.I. Radchenko
Laboratory Model of an Elevator: Control with Three Speed
Profiles ................................................................................................................................ 1889
A. Jozić, T. Špoljarić, D. Gadže
Security and Privacy in an IT Context – a Low-Cost WIDS Employed
against MITM Attacks (concept) ..................................................................................... 1895
N. Poljak, M. Ševo, I. Livaja
Use of HLA During Customer Flow Simulation in a Polyclinic ................................... 1899
J. Brozek, J. Fikejz, V. Samotan, L. Gago
Revealing the Structure of Domain Specific Tweets via Complex
Networks Analysis ............................................................................................................. 1904
E. Močibob, S. Martinčić-Ipšić, A. Meštrović
Counting Prime Numbers in Parallel - Faster by Reducing the
Synchronization Overhead ................................................................................................. 1909
A. Duraković, E. Pajić, I. Branković, E. Kušundžija, S. Karkelja
Parallelization Challenges of BFS Traversal on Dense Graphs
Using the CUDA Platform ................................................................................................ 1914
H. Milišić, D. Ahmić, H. Sinanović, E. Šarić, A. Asotić, A. Huseinović
Buck Converter Controlled by Arduino Uno ................................................................. 1919
H. Kovačević, Ž. Stojanović
Audio Phonebook for the Blind People ........................................................................... 1924
G. Popović, U. Pale
Heart Rate Variability Analysis Using Different Wavelet
Transformations ................................................................................................................ 1930
U. Pale, F. Thürk, E. Kaniusas
A Study of Ransomware Attacks and Proposals for Better Protection
(in Croatian) ......................................................................................................................... 1936
M. Rak, M. Žagar
LIST OF PAPER REVIEWERS
Aksentijević, S. (Croatia)
Alexin, Z. (Hungary)
Antolić, Ž. (Croatia)
Antonić, A. (Croatia)
Aramo-Immonen, H. (Finland)
Arbula, D. (Croatia)
Ašenbrener Katić, M. (Croatia)
Avbelj, V. (Slovenia)
Babić, D. (Croatia)
Babić, S. (Croatia)
Bačmaga, J. (Croatia)
Bako, N. (Croatia)
Balaž, A. (Serbia)
Banek, M. (Croatia)
Banek Zorica, M. (Croatia)
Barić, A. (Croatia)
Basch, D. (Croatia)
Bebel, B. (Croatia)
Begušić, D. (Croatia)
Bellatreche, L. (France)
Bibuli, M. (Italy)
Bilas, V. (Croatia)
Blažević, D. (Croatia)
Blažević, Z. (Croatia)
Blečić, R. (Croatia)
Bogunović, N. (Croatia)
Bonastre, J. (France)
Bosiljevac, M. (Croatia)
Brčić, M. (Croatia)
Bregar, K. (Slovenia)
Brestovec, B. (Croatia)
Brezany, P. (Austria)
Britvić, I. (Croatia)
Brkić, K. (Croatia)
Brkić, L. (Croatia)
Brkić, M. (Croatia)
Broz, I. (Croatia)
Budin, A. (Croatia)
Budin, L. (Croatia)
Bujan, I. (Croatia)
Bujas, G. (Croatia)
Buković, M. (Croatia)
Butković, Ž. (Croatia)
Car, Ž. (Croatia)
Cifrek, M. (Croatia)
Crnković Stumpf, B. (Croatia)
Čačković, V. (Croatia)
Čandrlić, S. (Croatia)
Čeperić, V. (Croatia)
Čičin-Šain, M. (Croatia)
Čubrilo, M. (Croatia)
Čupić, M. (Croatia)
Davidović, M. (Croatia)
Delač, G. (Croatia)
Depolli, M. (Slovenia)
Dešić, S. (Croatia)
Dobrijević, O. (Croatia)
Domazet-Lošo, M. (Croatia)
Duarte, M. (Portugal)
Džanko, M. (Croatia)
Džapo, H. (Croatia)
Đerek, V. (Croatia)
Erceg, I. (Croatia)
Eškinja, Z. (Croatia)
Fertalj, K. (Croatia)
Filjar, R. (Croatia)
Fischer, D. (Croatia)
Frid, N. (Croatia)
Galinac Grbac, T. (Croatia)
Gamulin, O. (Croatia)
Garza, P. (Italy)
Glavaš, G. (Croatia)
Glavaš, J. (Croatia)
Gojanović, D. (Croatia)
Golfarelli, M. (Italy)
Golub, M. (Croatia)
Golubić, S. (Croatia)
Gomez Chavez, A. (Germany)
Gracin, D. (Croatia)
Granić, A. (Croatia)
Grd, P. (Croatia)
Grgić, K. (Croatia)
Grgurić, A. (Sweden)
Groš, S. (Croatia)
Grubišić, D. (United States)
Gržinić, T. (Croatia)
Gulić, M. (Croatia)
Hadjina, T. (Croatia)
Henno, J. (Estonia)
Hoić-Božić, N. (Croatia)
Holenko Dlab, M. (Croatia)
Horvat, G. (Croatia)
Horvat, M. (Croatia)
Hrabar, S. (Croatia)
Hrkać, T. (Croatia)
Humski, L. (Croatia)
Hure, N. (Croatia)
Ilić, Ž. (Croatia)
Inkret, R. (Croatia)
Ipšić, I. (Croatia)
Ivanjko, E. (Croatia)
Ivašić-Kos, M. (Croatia)
Ivković, N. (Croatia)
Ivošević, D. (Croatia)
Jaakkola, H. (Finland)
Jakobović, D. (Croatia)
Jakopović, Ž. (Croatia)
Jakupović, A. (Croatia)
Jardas, M. (Croatia)
Jarm, T. (Slovenia)
Jelenković, L. (Croatia)
Jevtić, D. (Croatia)
Ježić, G. (Croatia)
Joler, M. (Croatia)
Jovanovic, V. (United States)
Jović, A. (Croatia)
Kalafatić, Z. (Croatia)
Kalpić, D. (Croatia)
Kapus-Kolar, M. (Slovenia)
Karan, M. (Croatia)
Kaštelan, I. (Serbia)
Katanić, N. (Croatia)
Kaučič, B. (Slovenia)
Keto, H. (Croatia)
Kišasondi, T. (Croatia)
Klemenc-Ketiš, Z. (Croatia)
Kocev, D. (Slovenia)
Kocijan, K. (United States)
Kopčak, G. (Sweden)
Koričić, M. (Croatia)
Kosec, G. (Slovenia)
Kovačić, A. (Croatia)
Kovačić, B. (Croatia)
Krašna, M. (Slovenia)
Krhen, M. (Croatia)
Krivec, S. (Croatia)
Krois, I. (Croatia)
Krpić, Z. (Croatia)
Kudelić, R. (Croatia)
Kunda, I. (Croatia)
Kušek, M. (Croatia)
Lacković, I. (Croatia)
Lipovac, A. (Croatia)
Lo Presti, F. (Italy)
Lončarić, S. (Croatia)
Lovrenčić, A. (Croatia)
Lučev Vasić, Ž. (Croatia)
Lučić, D. (Croatia)
Lugarić, T. (Croatia)
Lukac, D. (Germany)
Ljubić, S. (Croatia)
Maček, M. (Slovenia)
Magdalenić, I. (Croatia)
Malarić, R. (Croatia)
Mandić, F. (Croatia)
Mandić, T. (Croatia)
Maračić, M. (Croatia)
Marčetić, D. (Croatia)
Marinović, I. (Croatia)
Marinović, M. (Croatia)
Marjanović, M. (Croatia)
Markuš, N. (Croatia)
Martinčić-Ipšić, S. (Croatia)
Matić, T. (Croatia)
Mauša, G. (Croatia)
Mekovec, R. (Croatia)
Mekterović, I. (Croatia)
Meng, L. (United Kingdom)
Mezak, J. (Croatia)
Mihajlović, Ž. (Croatia)
Mikac, B. (Croatia)
Mikuc, M. (Croatia)
Milanović, I. (Serbia)
Miličević, K. (Croatia)
Mišković, N. (Croatia)
Mlinarić, H. (Croatia)
Močinić, D. (Croatia)
Modlic, B. (Croatia)
Molnar, G. (Croatia)
Mošmondor, M. (Croatia)
Mrakovčić, T. (Croatia)
Mrković, B. (Croatia)
Nađ, Đ. (Croatia)
Nikitović, M. (Croatia)
Očko, M. (Croatia)
Oletić, D. (Croatia)
Orsag, M. (Croatia)
Pale, P. (Croatia)
Palestri, P. (Italy)
Paspallis, N. (United Kingdom)
Pavlić, Z. (Croatia)
Pečar-Ilić, J. (Croatia)
Pelin, D. (Croatia)
Perić Hadžić, A. (Croatia)
Perkovac, M. (Croatia)
Perković, T. (Croatia)
Petrović, G. (Croatia)
Pintar, D. (Croatia)
Pivac, B. (Croatia)
Pobar, M. (Croatia)
Pocta, P. (Slovakia)
Poljak, M. (Croatia)
Poplas Susič, T. (Slovenia)
Pribanić, T. (Croatia)
Pripužić, K. (Croatia)
Ptiček, M. (Croatia)
Rashkovska, A. (Slovenia)
Repnik, R. (Slovenia)
Resnik, D. (Slovenia)
Ribarić, S. (Croatia)
Rimac-Drlje, S. (Croatia)
Ristić, D. (Croatia)
Rupčić, S. (Croatia)
Seva, J. (United Kingdom)
Sillberg, P. (Finland)
Skala, K. (Croatia)
Skočir, P. (Slovenia)
Skorin-Kapov, L. (Croatia)
Soini, J. (Finland)
Soler, J. (Denmark)
Sorić, K. (Croatia)
Sruk, V. (Croatia)
Stanič, J. (Slovenia)
Stanič, U. (Slovenia)
Stapić, Z. (Croatia)
Stojković, N. (Croatia)
Stupar, I. (Croatia)
Sučić, S. (United States)
Suligoj, T. (Croatia)
Sužnjević, M. (Croatia)
Sviličić, B. (Croatia)
Šarolić, A. (Croatia)
Šegvić, S. (Croatia)
Ševrović, M. (Croatia)
Šikić, M. (Croatia)
Šilić, M. (Croatia)
Škvorc, D. (Croatia)
Štajduhar, I. (Croatia)
Štih, Ž. (Croatia)
Šunde, V. (Croatia)
Švedek, T. (Croatia)
Švogor, I. (Croatia)
Tanković, N. (Croatia)
Tijan, E. (Croatia)
Tomczak, J. (Austria)
Tomić, M. (Croatia)
Tralić, D. (Croatia)
Trancoso, I. (Portugal)
Trobec, R. (Slovenia)
Tržec, K. (Croatia)
Tuomi, P. (Finland)
Uroda, I. (Croatia)
Varga, M. (Croatia)
Vasić, D. (Croatia)
Vidaček-Hainš, V. (Croatia)
Vladimir, K. (Croatia)
Vlahović, N. (Croatia)
Vojković, G. (Croatia)
Vrančić, K. (Croatia)
Vranić, M. (Croatia)
Vrdoljak, B. (Croatia)
Vrhovec, S. (Slovenia)
Vrlika, V. (Croatia)
Vukadinović, D. (Croatia)
Vuković, M. (Croatia)
Weber, M. (Croatia)
Werth, W. (Austria)
Zaluški, D. (Croatia)
Zereik, E. (Italy)
Zinner, T. (Germany)
Žonja, S. (Croatia)
Zulim, I. (Croatia)
Žgank, A. (Slovenia)
Žilak, J. (Croatia)
Živković, M. (Croatia)
Žulj, S. (Croatia)
AUTHOR INDEX
Aalto, S. 515
Abazi Bexheti, L. 1145
Abbas, K. 1109
Abe, H. 257
Accardo, A. 719
Adamovic, N. 158
Afgan, E. 298
Afonin, S.A. 1656
Ahmić, D. 1914
Ahonen, K. 881
Ajčević, M. 719
Aksentijević, S. 1699, 1769
Alagić, D. 224, 230
Alekseev, Yu.V. 20
Alić, D. 745
Ambroš, F. 826
Amin, N. 656
Andrić, J. 1667
Andrić, K. 1514
Anđelić, N. 841
Anđelić, V. 912
Antonijević, M. 332
Apostolova Trpkovska, M. 1145
Aramo-Immonen, H. 1130
Arbanas, K. 224
Arbet, D. 57
Arbula, D. 253
Ariyaeeinia, A. 1448
Arsenovski, S. 388
Artemkina, S.B. 16
Asiama Nyako, W. 707
Asotić, A. 1914
Assefa, T.D. 673
Astrov, I. 1333
Ašenbrener Katić, M. 917
Aurora, C. 1276
Avbelj, V. 416
Azemi, R. 1149
Babič, M. 367
Babić, A. 1161
Babić, M. 1239
Babić, S. 1103
Babić, S. 1165, 1258
Babić, T. 1230, 1239, 1245
Bača, M. 1465, 1689
Bačić, I. 970
Bačmaga, J. 78
Badnjević, A. 395, 407, 477
Baggini, M. 836
Baharudin, A.M. 1392
Bakalar, G. 836
Balić, K. 376
Balki, M. 558
Balota, A. 370
Bánáti, A. 343
Bandiziol, A. 90
Banović-Ćurguz, N. 794, 800
Baralis, E. 1723
Baranović, M. 1735
Barić, A. 63, 78, 105, 110
Batbayar, K. 269
Bedi, K. 1061
Begičević Ređep, N. 1804
Belalem, G. 214
Belošić, D. 679
Benedetti, A. 5
Benedyczak, K. 190
Berdalović, I. 40
Berdinsky, A.S. 11, 16
Bergami, R. 1828, 1847
Berković, M. 703
Bianchi, F. 509
Bishnoi, S.K. 1508
Biškup, M. 1804
Biuklija, I. 1196
Blašković, M. 1225
Blažeka, K. 1209
Blečić, R. 105
Boban, M. 713, 1857
Bogut, R. 1250
Boj, N. 1199
Bokulić, K. 1759
Börcsök, J. 1386
Bosić, Z. 640
Bosilj Vukšić, V. 1735
Bošković, D. 521, 628
Bošnjak, L. 1674
Bothe, K. 766
Božurić, M. 1250
Brandonisio, F. 90
Branković, I. 1909
Bratina, T. 1097
Brenner, W. 158
Brezany, P. 202
Britvić, I. 1532
Brkić, K. 1091
Brkić, L. 1735
Brložnik, M. 416
Brodić, D. 1547
Brozek, J. 337, 1078, 1084, 1899
Brumen, B. 1629, 1674, 1680
Brunetti, G. 1280
Bubanja, I. 1780, 1817
Bubaš, G. 1103
Budimac, Z. 766
Budimirović, M. 1413
Budin, A. 1502
Budiša, D. 690
Bulešić Milić, S. 1171
Bulović, A. 472
Bychkov, I.V. 1321, 1586
Bytyçi, E. 1527
Bziuk, W. 684
Cakaj, S. 554
Capeska Bogatinoska, D. 707
Carlini, G. 719
Carrato, S. 1479
Celesti, A. 426
Cerquitelli, T. 1723
Cherkashin, E. 1578, 1586
Chiusano, S. 1723
Chiussi, S. 5
Chungurski, S. 388
Cifrek, M. 289, 326, 395, 407, 411, 446
Cmuk, D. 852
Coisel, I. 1623
Cota, P. 622
Craninckx, J. 72
Crnković, I. 1380
Cupec, R. 1490
Cuzzola, F.A. 1276
Čačković, V. 584
Čandrlić, S. 917
Čapko, Z. 1161
Čavrak, I. 1403
Čičin-Šain, M. 1103, 1258
Čižmar, R. 640
Čošić, V. 814
Čović, M. 30
Čubrić, G. 869
Čupić, M. 1091
Ćosić, J. 1689
Ćurko, K. 1786
D'Antrassi, P. 719
Dadić, M. 146, 543
Dadić, S. 1644
Daković, M. 569
Dancheva, T. 208
Date, S. 257
Davies, A. 1847
Davitashvili, T. 236
Davydov, A. 1578
Depolli, M. 401, 487
Dević, I. 640
Diduck, Q. 105
Dika, A. 1527
Dika, F. 185
Dizdarević, H. 599, 606
Dizdarević, S. 599, 606
Djebbar, E.I. 214
Djedović, A. 1521
Djinevski, L. 388
Domanjko-Petrič, A. 416
Domović, D. 1591
Dovedan Han, Z. 1045
Dražić Lutilsky, I. 809
Du, M. 289, 432, 446
Duan, X. 1448
Dudarin, A. 100
Dunđer, I. 1551, 1557
Duraković, A. 1909
Đambić, G. 442
Đigaš, T. 826
Đugum, T. 1699
Đurasević, M. 1609
Ebibi, M. 1149, 1155
Elsaadany, A. 894, 1109
Enne, R. 68
Erceg, I. 152
Esztelecki, P. 907
Farkaš, P. 1443
Fazio, M. 426
Fedorov, V.E. 11, 16
Feilhauer, T. 202
Feoktistov, A.G. 247
Fetaji, B. 1149, 1155
Fetaji, M. 1149, 1155
Fikejz, J. 1899
Filipović Tretinjak, M. 912
Filko, D. 1490
Filković, I. 1454
Finny, S. 126
Fischer, I.A. 5
Fosić, I. 690
Franc, B. 1413
Franković, I. 1115
Fratte-Sumper, S. 110
Frid, N. 1409
Friganović, K. 411
Fruk, M. 1311
Fulanović, B. 1218
Fumić, M. 1225
Gabela, A. 777
Gadže, D. 1889
Gago, L. 1899
Galin, M. 640
Galinac Grbac, T. 634
Gao, Y. 289, 432, 446
Garaguly, Z. 269
Garza, P. 1723
Gebrehiwot, M.E. 515
Gencaga, D. 1469
Geraskin, A.S. 95
Giedrimas, V. 745
Girtelschmid, S. 263
Gjorgjevska, E. 976
Glamočanin, D. 755
Godó, Z.A. 1125
Golub, I. 761
Golub, M. 1591
Goonasekera, N. 298
Gorachinova, L. 388
Gorshkova, A.S. 20
Gradišnik, V. 30
Granić, A. 946
Grd, P. 1465, 1689
Grgić, S. 1230, 1245
Grgurević, I. 736
Grilec, F. 446
Grollitsch, W. 90
Groš, S. 1650
Grozev, D.G. 314
Grubelnik, L. 952
Grubelnik, V. 952
Grubišić, D. 40
Grzunov, J. 1841
Gulan, G. 380
Gurbeta, L. 395, 407, 477
Gusev, M. 208, 218, 420, 976
Guseva, A. 218
Gütl, C. 929
Hadžiahmetović, N. 599, 606
Hajdo, H. 831
Hajnić, M. 852
Hamernik, D. 1078
Hamidović, D. 650
Han, M.K. 11
Harris, J. 1028
Hartnett, M. 1437
Has, A. 1747
Heđi, I. 805
Henno, J. 881, 888, 1130
Herceg, M. 86
Heričko, M. 1680
Hertlein, H. 1437
Hmelnov, A. 1586
Höfler, M. 929
Hoić-Božić, N. 1115
Holenko Dlab, M. 917
Hormot, F. 78
Horvat, G. 135
Horvat, M. 1496, 1551, 1557
Hrkać, T. 1454
Huseinović, A. 1914
Orović, I. 574
Ichikawa, K. 257
Iković, O. 521, 628
Iliev, L.A. 550, 562
Iliev, O. 388
Iliev, T.B. 550, 562
Ilišević, D. 794, 800
Ischebeck, A. 929
Iskra, A. 1769
Ivančić, I. 1072
Ivanova, E.P. 550, 562
Ivanović, M. 531, 766
Ivanjko, E. 1306
Ivašić-Kos, M. 1597
Ivić, M. 809
Ivković, N. 992
Ivošević, D. 1409
Jaakkola, H. 881, 888, 1130
Jakes, M. 337
Jakimoski, K. 388
Jakobović, D. 1609
Janjić, M. 1051
Jazbinšek, V. 467
Jednoróg, K. 1496
Jeričević, Ž. 30, 274, 380
Jerković, H. 1017, 1033, 1644
Jerman-Blažič, B. 367
Jesenek, V. 1142
Jevtić, D. 1706
Jevtić, M. 1547
Jiang, Y. 673
Job, J. 86
Jovalekic, S. 164
Jovanović, V. 1028, 1741, 1774
Jović, A. 326
Jozić, A. 1889
Jozić, K. 326
Jugo, I. 1056
Jukan, A. 684
Jukić, O. 805
Junaković, D. 1254
Jurin, I. 1300
Kacsuk, P. 343, 348
Kadoić, N. 1792, 1798
Kadriu, A. 1145
Kaevikj, N. 1137
Kail, E. 348
Kakanakov, N.R. 314
Kalafatić, Z. 1300, 1454
Kalpić, D. 1514
Kamcheva, E. 388
Kaniški, M. 934, 988, 992
Kaniusas, E. 1930
Karkelja, S. 1909
Kaštelan, I. 923
Kathiravelu, P. 634
Katulić, T. 1694
Kavoosifar, M.R. 1723
Kawai, E. 257
Kedmenec, I. 1798
Kelemen, R. 1804
Kenzin, M.Yu. 1321
Kermek, D. 934
Kertesz, A. 497
Keserica, H. 332
Keto, H. 966
Khan, H. 1543
Kido, Y. 257
Kim, S.J. 11
Kiss, G. 1125
Kišasondi, T. 1667
Klarić, M. 1538
Klasinc, J. 1822
Klobučarević, D. 679
Klobučarević, Ž. 679
Kljaić, Z. 656
Knežević, T. 34, 40
Knok, Ž. 1002
Kobal, D. 614
Kocev, D. 481
Kochemazov, S. 293, 1561
Kocijan, K. 1045, 1051
Kocsis, D. 1125
Kokanović, V. 442
Kolman, M. 308
Konecki, M. 988, 992, 1582
Kopecky, Z. 1078
Korać, M. 1180
Kordić, M. 1706
Koren, A. 141, 699
Koričić, M. 46, 51
Korobko, A.A. 1729
Kőrösi, G. 907
Kos, L. 303
Kosec, G. 308, 450
Kosovac, A. 703
Kostadinovska, A. 1137
Kovács, J. 348
Kovács, L. 269
Kováč, M. 57
Kovačević, H. 1919
Kovačević, Ž. 783
Kovačić, B. 1056, 1741, 1774, 1852
Kozlovszky, M. 269, 343, 348
Kožar, I. 274, 380
Krajačić, S. 1222
Krajček Nikolić, K. 1306
Krajnović, S. 1502
Kralevska, K. 673
Kralj, L. 863, 1258
Kranjac, M. 383
Krašna, M. 957
Krelja Kurelović, E. 1811
Krini, A. 1386
Krini, J. 1386
Krini, O. 1386
Križanović, K. 472
Krvavica, A. 416
Kučak, D. 442
Kudelić, R. 1582
Kukolja, D. 326
Kumar A., A. 120
Kumar Singh, S. 684
Kurbalija, V. 531
Kušundžija, E. 1909
Kutaladze, N. 236
Kuzle, I. 1538
Kuznetsov, V.A. 11, 16
Kvatadze, R. 236
Laanpere, M. 1115
Lacković, A. 1218
Lamza Maronić, M. 1072
Larionov, A. 1578
Lasorsa, I. 719
Lassila, P. 515
Lavrič, P. 401
Lazić, N. 783
Ledneva, A.Yu. 11, 16
Legvart, J. 1629
Lenac, K. 730
Librenjak, S. 1051
Likar, Š. 416
Lin, C. 432
Linna, P. 966
Livaja, I. 1538, 1895
Lo Presti, F. 509
Lonie, A. 298
Lopina, V. 1045
Lovrenčić, S. 988
Lovrić Švehla, Z. 1663
Lučev Vasić, Ž. 289, 446
Ludescher, T. 202
Lugarić, T. 1369
Lugović, S. 1551, 1557
Lukac, D. 1012
Lukovac, B. 141
Marić, M. 574
Maček, D. 230
Magerl, M. 110
Major, L. 907
Mäkelä, J. 881, 888, 1130
Mäkinen, T. 966
Maksimkin, N.N. 1315, 1321
Malarić, R. 146, 543
Malatras, A. 1623
Malekian, R. 707
Marasović, K. 970
Marceglia, S. 719
Marchewka, A. 1496
Marčec, M. 1002
Marčetić, D. 1459
Marek, J. 1084
Marić, F. 1300
Marinčić, A. 141, 667
Marinović, M. 472
Marjanović, D. 407, 395
Marković, I. 1300
Markulic, N. 72
Marsi, S. 1479
Martinčić, K. 82
Martinčić-Ipšić, S. 1904
Martinović, G. 1380
Martinović, Ž. 146, 543
Martinović, Ž. 146, 543
Matić, T. 86
Mauher, M. 1869
Medved Rogina, B. 353
Medved, M. 411
Medvet, E. 1479
Meluch, L. 1443
Meng, L. 1437, 1448
Meshchanov, V.P. 95
Meštrović, A. 1904
Mihajlović, Ž. 277, 1091
Mihaljević, B. 970
Mihaylov, Gr.Y. 550, 562
Mihova, M. 1137
Mijač, M. 772
Mikhailov, A. 1586
Mikuchadze, G. 236
Mikuličić, N. 277
Mikulić, J. 63
Milardić, V. 1413
Milišić, H. 1914
Milivojević, Z.N. 1547
Miljković, D. 1352, 1358
Mirković, M. 1203
Miškić-Pletenac, N. 730
Mišlov, R. 110
Mitrović, N. 68
Mlakić, D. 1289
Mochizuki, H. 46, 51
Močibob, E. 1904
Modebadze, Z. 236
Mohorčič, M. 487
Molnar, G. 100
Morita, S. 46, 51
Mostarac, P. 146
Možnik, D. 1184
Nagul, N. 1566
Nagy, L. 57
Nakrap, I.A. 95
Námestovski, Z. 907
Nardelli, M. 503
Nenadić, I. 614
Nepovinnykh, E.A. 196, 1884
Ni, W. 289, 446
Nikitović, A. 1532
Nikitović, M. 1753
Nikolovski, S. 1289
Nonis, R. 90
Novak, M. 901, 934
Novković, I. 1650
Nozhenkova, L.F. 1729
Nummela, J. 1397
Nyarko, E.K. 1490
Obarčanin, K. 1294
Ogrin, A. 1239
Okanović, V. 376
Okreša Đurić, B. 1603
Oliveira, F. 5
Omanović, S. 745
Opiła, J. 361
Ordanić, L. 1863
Orehovački, T. 982, 1165
Oreščanin, D. 590
Oreški, D. 1582, 1667
Orović, I. 569
Oršolić, P. 116
Osman, I. 1759
Osrečki, Ž. 40
Ostojić, R. 1294
Ovseník, Ľ. 1485
Paćelat, I. 1254
Paić, G. 1039
Pajić, E. 1909
Palata, D. 614
Pale, U. 1924, 1930
Palestri, P. 90
Paľová, D. 1006
Panwar, S. 1508
Paroški, M. 383
Pata, K. 1115
Pauk, L. 1663
Pauković, G. 640
Pavičić, T. 1262
Pavić, T. 791
Pavlić, Z. 1369
Pavlina, K. 1039
Pein, B. 982
Pellegrino, F.A. 1479
Petrova-El Sayed, M. 190
Petrović, I. 1300
Petrovska, S. 1547
Pflanzner, T. 497
Pighin, M. 1375
Pirker, J. 929
Pobar, M. 1597
Poljak, N. 1895
Poplas Susič, T. 456
Popović, B. 370
Popović, G. 1924
Popović, Ž. 584
Posavec, N. 24
Prazina, I. 376
Prohaska, Z. 996
Prohaska, Z. 996
Pröll, B. 263
Pršeš, K. 376
Pun, S.H. 289, 432
Pürcher, P. 929
Putnik, Z. 766
R, R. 120, 126, 130
Raczkowski, K. 72
Radan, A. 1033
Radchenko, G.I. 196, 1884
Radojević, B. 761
Radošević, I. 1190
Raj R.S., R. 130
Raj, B. 1469
Rajković, E. 1245
Rak, M. 1936
Ramljak, D. 751
Ramljak, M. 1711
Ramponi, G. 1479
Rantanen, P. 1392, 1397
Rashkovska, A. 481
Rembold, D. 164
Repnik, R. 962
Retschitzegger, W. 263
Révészová, L. 940
Ribarić, S. 1459
Rishiwal, V. 1543
Risteska Stojkoska, B. 724, 1137
Ristov, S. 208, 218, 420
Ristovski, A. 218
Rizvić, S. 376
Rolich, T. 1591
Romanenko, A.I. 11, 16
Romano, A. 426
Rostamzadeh Hajilari, H. 283
Rumyantseva, V.D. 20
Rutkowski, A. 190
Ružić, I. 1235
Saari, M. 1392
Sáez-Trigueros, D. 1437
Salapura, S. 821, 847
Salfinger, A. 263
Salopek Čubrić, I. 869
Salopek, D. 826
Samkharadze, I. 236
Samotan, V. 1899
Sanchez, I. 1623
Savin, A.N. 95
Schatten, M. 1603
Schatzberger, G. 63
Schindler, F. 1443
Schuller, B. 190
Schulze, J. 5
Schwinger, W. 263
Sedinić, I. 1663
Semenov, A. 293, 1561
Sertić, M. 1174
Sharifi, M. 283
Shilov, I.P. 20
Shimojo, S. 257
Shopov, M.P. 314, 1348
Sidorov, I.A. 242, 247
Sikimić, U. 383
Sillberg, P. 1392, 1397
Simić, I. 383
Sinanović, H. 1914
Singh, R. 1469
Sirotić, I. 152
Skala, K. 175, 298
Skendžić, A. 1852
Slak, J. 450
Slamić, G. 1017
Slavuj, V. 1056
Smrke, A. 1327
Soini, J. 1392, 1397
Soliman, M. 894
Solus, D. 1485
Sovič, M. 962
Sretenović, M. 1741, 1774
Sruk, V. 1409
Stančić, A. 736
Stanič, U. 456
Stanković, I. 569
Stanković, S. 574, 569
Stapić, Z. 772
Stockreiter, C. 110
Stojanović, A. 783
Stojanović, Ž. 1919
Stóka, G. 1125
Stopjaková, V. 57
Stoyanov, I.S. 550, 562
Strahonja, V. 772, 1753
Strmečki, D. 934
Strujić, Dž. 370
Sučić, S. 332
Sukhinskiy, I.V. 1884
Suligoj, T. 34, 40, 46, 51
Suljanović, N. 650
Sun, Z. 1448
Sušanj, D. 253, 380
Svoboda, V. 337, 1084
Sylejmani, K. 1527
Šabec, J. 622
Šakić, K. 1699
Šarić, E. 1914
Šarić-Kekić, N. 1863
Šegmanović, F. 40
Šegvić, M. 1306
Šerifović-Trbalić, A. 462
Ševo, M. 1895
Šikić, M. 472
Šimunić, D. 141, 667, 699
Šipuš, D. 579
Šitin, I. 946
Škoda, P. 353
Škorić, I. 982
Škorjanc, T. 841
Škorput, P. 656
Škrbić, M. 599, 606
Škvorc, D. 1369
Šneler, L. 86
Šojat, Z. 175
Šokac, D. 1196
Šolić, K. 1174
Šoša Anić, M. 791
Šovčík, M. 57
Špeh, I. 1834
Špoljarić, M. 1339
Špoljarić, T. 1339, 1889
Štajcer, M. 590
Štajcer, M. 590
Štefan Trubić, M. 1190
Štimac, M. 826
Štokić, N. 644
Štruc, V. 1433
Šurić, J. 1262
Talebi, M.M. 283
Tan, Z.-H. 1448
Taneski, V. 1680
Tasič, J. 420
Taşpinar, N. 558
Taylor, J. 298
Teder, S. 1115
Telenta, M. 303
Temerinac, M. 923
Tepeš, B. 1039
Thürk, F. 1930
Tijan, E. 1769, 1852
Tikvić, I. 1311
Tokárová, I. 1443
Tomes, L. 929
Tomić, S. 383
Tomljanović, J. 1811
Tonkovikj, P. 976
Topolčić, L. 1222
Tošić, M. 521
Tošić, M. 628
Trbalić, A. 462
Trengoska, J. 707
Tretinjak, M. 1250
Trivodaliev, K. 1137
Trobec, R. 481
Tucaković, Z. 1879
Tudić, V. 24
Turán, J. 1485
Turina, T. 1811
Tusun, S. 152
Tutek, Ž. 1215
Ul'yanov, S.A. 1315
Ungermanns, C. 1121
Upadhyay, V.P. 1508
Urem, F. 1222, 1225, 1254
Uroda, I. 996
Vai, M.I. 289, 432
Varga, M. 1786
Vaser, R. 472
Veiga, L. 634
Veispahić, A. 703
Vejačka, M. 1023
Venuti, A. 1271
Vidas-Bubanja, M. 1780
Villari, M. 426
Vinko, D. 116, 135
Vittori, M. 1479
Vlahović, N. 1572
Vlaović, J. 135
Vojković, G. 1694
Vračić, T. 1786
Vrana, R. 875, 1067
Vranešić, P. 1017, 1033, 1644
Vrbovčan, I. 791
Vresk, T. 1403
Vrhovec, S.L.R. 436, 1364, 1635, 1639
Vučić, M. 100
Vujisić, G. 1311
Vukmirović, S. 1161
Vuković, M. 1706
Vunarić, I. 1230
Vyroubal, V. 736
Wambacq, P. 72
Watashiba, Y. 257
Weber, M. 713
Weiss, B. 110
Werth, W. 1121
Xhafa, V. 185
Yakovleva, G.E. 11
Yamanaka, H. 257
Zaikin, O. 293, 1561
Zdravevski, V. 208
Zekanović-Korona, Lj. 1841
Zekić-Sušac, M. 1747
Zimmermann, H. 68
Zinterhof, P. 320
Zmijanac, M. 1420
Zorić, B. 1380
Žagar, M. 1936
Žeželj, B. 831
Žigulić, R. 841
Žilak, J. 46, 51
Žilić, T. 814
Žitnik, T. 640
Žižić, A. 946
Žlof, V. 821, 847
Žunić, B. 1521
Žunić, E. 1521
FOREWORD
The 39th International ICT Convention MIPRO 2016 was held from May 30 to June 3, 2016
in Opatija on the Adriatic coast of Croatia. The Convention consisted of nine conferences:
Microelectronics, Electronics and Electronic Technology (MEET), Distributed Computing,
Visualization and Biomedical Engineering (DC VIS), Telecommunications & Information
(CTI), Computers in Education (CE), Computers in Technical Systems (CTS), Intelligent
Systems (CIS), Information Systems Security (ISS), Business Intelligence Systems
(miproBIS), and Digital Economy and Government, Local Government, Public Services
(DE/GLGPS). A special conference was dedicated to the work of students: MIPRO Junior -
Student Papers (SP). In addition, special sessions on Biometrics & Forensics &
De-Identification and Privacy Protection (BiForD) and Future Networks and Services (FNS)
were held as part of the convention.
The papers presented at these conferences and special sessions are contained in this
comprehensive Book of Proceedings. All papers were reviewed by an international review
board, whose members are listed in the Book of Proceedings, and all positively reviewed
papers are included. The papers were written by authors from industry, scientific
institutions, educational institutions, and state and local administration.
The convention was organized by the Croatian ICT Society MIPRO with the help of
numerous patrons and sponsors, to whom we owe our sincere thanks. We especially single
out our golden sponsors Ericsson Nikola Tesla, T-Croatian Telecom and Končar-Electrical
Industries, and our silver sponsor InfoDom. Our bronze sponsors are HEP–Croatian
Electricity Company, Hewlett Packard, IN2, Transmitters and Communications and Storm
Computers.
To all who helped organize the 39th International ICT Convention MIPRO 2016 and edit
this Book of Proceedings, we extend our heartfelt thanks.
Prof. Petar Biljanović, PhD
International Program Committee
General Chair
MEET
International Conference on
MICROELECTRONICS, ELECTRONICS AND ELECTRONIC
TECHNOLOGY
Steering Committee
Chairs:
Željko Butković, University of Zagreb, Croatia
Marko Koričić, University of Zagreb, Croatia
Petar Biljanović, University of Zagreb, Croatia
Members:
Slavko Amon, University of Ljubljana, Slovenia
Dubravko Babić, University of Zagreb, Croatia
Maurizio Ferrari, CNR-IFN, Povo-Trento, Italy
Mile Ivanda, Ruđer Bošković Institute, Zagreb, Croatia
Branimir Pejčinović, Portland State University, USA
Tomislav Suligoj, University of Zagreb, Croatia
Aleksandar Szabo, IEEE Croatia Section
INVITED PAPER
(Si)GeSn Nanostructures for Optoelectronic
Device Applications
I. A. Fischer*, F. Oliveira*,**, A. Benedetti§, S. Chiussi§§ and J. Schulze*
* Institute of Semiconductor Engineering, Pfaffenwaldring 47, 70569 Stuttgart, Germany
** Centre of Physics, University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal
§ CACTI, Univ. de Vigo, Campus Universitario Lagoas Marcosende 15, Vigo, Spain
§§ Dpto. Fisica Aplicada, Univ. de Vigo, Rua Maxwell s/n, Campus Universitario Lagoas Marcosende, Vigo, Spain
fischer@iht.uni-stuttgart.de
Abstract – We present an overview of recent results on the fabrication of GeSn and SiGeSn nanostructures for optoelectronic device applications.
I. INTRODUCTION
Recent years have seen numerous experimental efforts
directed at integrating photonics with electronics based on
Group-IV optoelectronic devices. The use of Si and Ge in
optoelectronics has been limited by their indirect bandgap
resulting in low efficiency. More recently, significant
progress has been made in enhancing optical properties of
Group-IV alloys by including Sn. The unstrained binary
alloy Ge1-ySny is predicted to become a direct bandgap
semiconductor for y > 0.073 [1], while y > 0.17 is needed
to obtain a direct bandgap material for Ge1-ySny grown
pseudomorphically on Ge [2]. The growth of Ge1-ySny on
Ge is challenging because of the 14.7 % lattice mismatch
between α-Sn (with lattice constant aSn = 6.493 Å) and Ge
(aGe = 5.658 Å). The existence of a direct bandgap
material has been confirmed for a partially relaxed
Ge0.874Sn0.126 layer on Ge [3].
Electrical and optical properties can be tuned further by adding Si (aSi = 5.431 Å), i.e., by investigating the ternary alloy SixGe1-x-ySny, which can be grown without lattice mismatch on Ge for

x / y = (aSn − aGe) / (aGe − aSi).
Compared to Ge1-ySny, material properties such as the composition-dependent bandgap of the ternary alloy SixGe1-x-ySny are much less understood and are the subject of ongoing experimental efforts.
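Evaluating this condition with the lattice constants quoted above (a back-of-the-envelope evaluation on our part, not a number given in the paper) yields

x / y = (6.493 − 5.658) / (5.658 − 5.431) ≈ 3.7,

i.e., roughly 3.7 Si atoms are needed per Sn atom for lattice matching to Ge; compositions with x/y below (above) this ratio should be compressively (tensilely) strained on Ge.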
Nanostructures such as quantum wells and islands are
well suited for application in optoelectronic devices
because of the improved optical properties that originate
from carrier confinement in one or more directions. The
properties of nanostructures such as quantum wells and
quantum dots for application in III-V semiconductor-based optical devices have led to the fabrication and commercialization of devices such as quantum-well and quantum-dot lasers, quantum cascade lasers and quantum-well infrared photodetectors. (This work was partly supported by the Portuguese Foundation for Science and Technology (FCT) through Strategic Project PEst-C/FIS/UI0607/2013 and a PhD Fellowship (F. Oliveira).) SiGe nanostructures such as quantum wells [4] and islands [5] have also been intensively investigated for applications in modulators or photodetectors. The addition of Sn opens up exciting new
possibilities for the use of Group-IV-nanostructures in
optical device applications. There are a number of
theoretical proposals for lasers and infrared photodetectors
based on (Si)GeSn multi-quantum-well structures [6]–[8].
While nanostructures are interesting for optoelectronic
applications in their own right, fabricating and
functionalizing nanostructures in the context of group-IV optoelectronics could be a possible route towards
obtaining high Sn content in such structures, with the
concomitant advantages for optical efficiency.
Group-IV nanostructures containing Sn have
previously been fabricated using different techniques. Sn
dots were grown on Ge [9], Ge1-ySny dots were grown on
thin SiO2 layers on top of Si (111) substrates [10] and Sn
nanostructures were embedded into a Si or Ge matrix by
annealing of SiSn or GeSn films [11]. Here, we present a
brief overview of our recent progress in fabricating
(Si)GeSn-nanostructures that can be fully embedded in
Group-IV-based devices.
II. NANOSTRUCTURE FABRICATION
A. Molecular Beam Epitaxy
Molecular Beam Epitaxy (MBE) is a growth method
that is ideally suited to material deposition with
monolayer precision. The growth of GeSn and SiGeSn
alloys is usually performed at very low substrate
temperatures in order to prevent precipitation and
segregation of Sn. Deposition temperatures of 100 °C –
160 °C are often selected for GeSn, while SiGeSn has also
been grown at higher temperatures [12]. In bulk materials,
the substrate temperature has a strong impact on layer
quality and, thus, on device quality. For nanostructures,
material diffusion and segregation is an additional concern
that needs to be addressed when selecting growth
temperatures [12]. While a higher growth temperature
could also yield better layer quality, it can be expected to
negatively affect the abruptness of heterotransitions.
All structures discussed here were grown using solid
source MBE at a base pressure lower than 10⁻¹⁰ mbar. An
electron beam evaporator was used for Si evaporation,
while Knudsen cells were used for Ge and Sn deposition.
Figure 1. (a) Schematic layer structure (400 nm p++-Si / 100 nm p++-Ge (VS) / Ge / SiGeSn wells and barriers / Ge / 100 nm n++-Ge / 100 nm n++-Si) and (b) TEM image of SiGeSn multi-quantum-well structures. The contrast indicates material transitions.
The Si as well as the Ge flux are monitored in situ with a quadrupole mass spectrometer with a feedback loop for
flux stabilization. The Sn flux is controlled by the cell
temperature. The Si flux was calibrated by growing Si
films on Si (100) wafers with a growth rate of 1 Å/s and
measuring film thickness with a profilometer. The Ge flux
was calibrated by growing a relaxed epitaxial Ge film on a
Si (100) wafer with a slow flux of ≈ 0.1 Å/s and
measuring film thickness by ellipsometric spectroscopy.
The flux of Sn was calibrated by growing thin epitaxial
films of Ge1-xSnx on Ge buffer layers on Si (100) wafers.
The absolute concentration of Sn was subsequently
measured using Rutherford backscattering spectroscopy.
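The alloy composition expected from a given pair of calibrated rates can be estimated with a simple flux-ratio calculation. The following is a minimal illustrative sketch, not the authors' calibration procedure: it assumes the growth rates are converted to atomic arrival fluxes using bulk atomic densities and that all arriving Sn is incorporated (no desorption or segregation).

# Rough estimate of the Sn fraction y in a Ge(1-y)Sn(y) film from
# calibrated deposition rates (illustrative sketch only).
N_GE = 4.4e22  # atoms/cm^3, diamond-cubic Ge (a = 5.658 A)
N_SN = 2.9e22  # atoms/cm^3, alpha-Sn (a = 6.493 A)

def sn_fraction(rate_ge: float, rate_sn: float) -> float:
    """Sn atomic fraction for growth rates given in Angstrom per second."""
    flux_ge = rate_ge * 1e-8 * N_GE  # atoms per cm^2 per second
    flux_sn = rate_sn * 1e-8 * N_SN
    return flux_sn / (flux_ge + flux_sn)

# Example: Ge at the slow calibration flux of 0.1 A/s quoted in the text,
# combined with a hypothetical Sn rate of 0.02 A/s:
print(f"y = {sn_fraction(0.1, 0.02):.2f}")  # prints y = 0.12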
All samples discussed in the following subsections
were fabricated on 4” Si (100) wafers. After placing the
wafers into the MBE chamber they were first subjected to
a thermal desorption step at 900 ºC for 5 minutes to
remove the surface SiO2 layer. This was followed by the
growth of 50 nm of Si to cover remaining surface
contaminants and obtain a smooth surface for the
subsequent growth steps. For (Si)GeSn layers grown on Ge it was necessary to form a Ge virtual substrate (VS) on the Si wafer to accommodate the lattice difference between Si and Ge and enable the growth of high-quality layers on Ge. This virtual substrate was formed in two steps. A Ge buffer layer was grown at 1 Å/s and 330 ºC, followed by an annealing treatment at 850 ºC to reduce the threading dislocation density and form a virtual substrate. A second layer of epitaxial Ge was then deposited to provide a smooth surface onto which high-quality layers could be grown.
B. SiGeSn Multi-Quantum Well Structures
Fabricating SiGeSn multi-quantum-well (MQW)
structures on Ge has the advantage that while both well
and barrier layers can be strained with respect to Ge, the
compositions of barrier and well layers can be chosen in
such a way that the net strain of the MQW structure on Ge
is zero or close to zero. Such structures can, thus, make
use of well-established Ge VS technology for integration
on Si. We, therefore, investigated SiGeSn MQW
structures containing two and four wells grown on Ge
buffer layers at a substrate temperature of 160 °C [13].
The presence of the well and barrier layers can clearly be
seen in TEM images (Fig. 1 (b)). Both barrier and well
layers, each 10 nm thick, are composed of SixGe1-x-ySny
with different fractions of the semiconductor materials
(Si0.31Ge0.62Sn0.07 for the barrier and Si0.25Ge0.63Sn0.12 for
the well layers). The material compositions were chosen
for their bandgaps and such that the lattice constant of the
unstrained well (barrier) ternary alloy is larger (smaller)
than that of Ge in order to obtain a MQW layer stack with
small residual strain on the Ge VS.
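As a quick cross-check against the lattice-matching ratio derived in the introduction (x/y ≈ 3.7): the barrier composition Si0.31Ge0.62Sn0.07 has x/y ≈ 4.4, so its unstrained lattice constant lies below that of Ge (tensile on the Ge VS), while the well composition Si0.25Ge0.63Sn0.12 has x/y ≈ 2.1, hence a lattice constant above that of Ge (compressive). With equal thicknesses tb = tw = 10 nm, an idealized thickness-weighted zero-net-strain condition (our reading, assuming Vegard's law; not a formula stated in the paper) is

tb · εb + tw · εw ≈ 0, with εi = (aGe − ai) / ai.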
Optoelectronic device functionality has been
demonstrated for these structures by placing them in the
intrinsic layer of a PIN-photodiode layer stack. The
presence of the SiGeSn-MQW structure can be seen to
influence both photocurrent and electroluminescence
measurements [13].
C. Sn-rich GeSn Multi-Quantum-Well-Structures in Ge
Similarly to the growth of SiGeSn MQW structures,
the fabrication of Ge/GeSn MQW structures can be
achieved by sandwiching GeSn layers with thicknesses of
a few nanometers between Ge spacer layers. Light-emitting diodes (LEDs) containing Ge/Ge0.93Sn0.07 MQW layers with varying thicknesses have been shown to emit light with much higher intensity than a reference Ge LED [14].
Figure 2. (a) Schematic layer structure (Ge/GeSn wells on a 200 nm Ge (VS) on a p-Si substrate) and (b) TEM BF as well as (c) HR-TEM images of GeSn multi-quantum-well structures. The contrast indicates material transitions.
Figure 3. AFM measurements of (a) 2.0 ML and (b) 3.0 ML of Sn deposited on Ge (100), scan area (1 × 1) μm². Dot densities are 32 dots/μm² and 1,214 dots/μm², respectively; large-scale dot formation is observed for 3.0 ML of Sn.
A particularly high Sn content in Ge/GeSn MQW
structures can be achieved by depositing pure Sn layers on
Ge with a total layer thickness that is below the critical
layer thickness tc at which the transition from 2D to 3D
growth (Stranski-Krastanov growth) sets in. Similarly to
the growth of Ge on Si, which has been investigated for
nearly three decades, the growth of Sn on Ge proceeds in
two stages. In the first stage, Sn forms a fully strained 2D wetting layer on Ge. At a critical layer thickness,
relaxation sets in via formation of local, Sn-rich material
accumulations: 3D island growth is obtained. Using Sn
layers with thicknesses below tc and overgrowing them
with 5 – 10 nm of Ge, ultra-thin multi-quantum well
layers with high Sn-content can be fabricated [15].
Fig. 2 shows cross sectional TEM images of a sample
with 10 Sn-rich wells separated by 10 nm Ge spacer
layers. The sample was fabricated by repeatedly
depositing 2 ML of Sn and overgrowing them with 10 nm
Ge at a constant substrate temperature of 100 °C. The
resulting position-dependent Sn composition is likely to
be influenced by material diffusion and Sn segregation but
can be expected to be high.
D. Sn-rich GeSn islands on Ge
When the critical thickness is exceeded during the
deposition of pure Sn on Ge, we can observe 3D island
growth. We investigated the transition from 2D to 3D
growth of pure Sn on Ge by depositing 0 – 5 ML of Sn on
Ge at a substrate temperature of 100 °C [15]. AFM images
of two selected samples are shown in Figure 3. At these
growth temperatures, we determined the critical thickness
to be tc = 2.25 ML. For Sn layers with thicknesses larger
than tc we observe the onset of large-scale dot formation
as shown in Fig. 3 (b). Only a few dots can be found for Sn
layers with thicknesses below tc as shown in Fig. 3 (a). No
facets can be observed in AFM measurements. An
interesting question is whether the Sn islands contain α-Sn, whose crystal structure is identical to that of Si and
Ge, or β-Sn, which is the thermodynamically stable phase
for temperatures above 13 °C. The fact that such layers
overgrown with Ge show perfect crystallinity in TEM
measurements seems to indicate that the islands observed
in AFM measurements are indeed composed of α-Sn [15].
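For scale: if one monolayer on a (100) diamond-lattice surface is taken as a quarter of the lattice constant (an assumption about the ML convention, not stated in the paper), the measured critical thickness corresponds to

tc = 2.25 ML × aSn / 4 ≈ 2.25 × 1.62 Å ≈ 3.7 Å

of deposited Sn.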
E. GeSn islands in Si
Finally, a strategy of producing GeSn islands on Si
that closely mirrors the growth of self-assembled Ge
islands on Si consists of growing a few MLs of Ge1-ySny on
Si. Self-assembled GeSn islands were fabricated by
depositing 5.5 ML of Ge0.96Sn0.04 on Si at a constant
substrate temperature of 350 °C and overgrowing them
with 10 nm of Si [16]. This growth sequence could be
repeated up to four times until layer quality started to
deteriorate [16]. TEM images of a sample with three
stacked layers of self-assembled GeSn islands are shown
in Fig. 4. At a thickness of 5.5 ML the deposited
Ge0.96Sn0.04 layers exceed the critical thickness for island
formation and relaxation sets in via the local accumulation
of GeSn. The resulting structures are clearly visible in the
cross sectional TEM images. Local material accumulation
can be seen to be accompanied by a thinning of the
wetting layer in the vicinity of the islands. The resulting
island composition can, thus, be expected to be a result of
position-dependent intermixing as is the case with Ge
layers deposited on Si. An experimental technique with
sub-nanometre precision would be required to investigate
the position-dependent composition of the resulting
islands in detail.
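The driving force for this island formation can be estimated with Vegard's law (a linear-interpolation estimate on our part, not a value from the paper):

a(Ge0.96Sn0.04) ≈ 0.96 × 5.658 Å + 0.04 × 6.493 Å ≈ 5.69 Å,

giving a mismatch to Si of (5.69 − 5.431) / 5.431 ≈ 4.8 %, slightly larger than the ≈ 4.2 % of pure Ge on Si and thus consistent with the Stranski-Krastanov behavior familiar from self-assembled Ge islands.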
Figure 4. (a) Schematic layer structure and (b) STEM as well as (c) HR-TEM images of GeSn dots grown on Si and capped with Si. Local GeSn-rich islands form as a result of the lattice mismatch between Ge0.96Sn0.04 and Si.
III. CONCLUSION
We have explored several growth strategies for
growing Sn-rich quantum wells and islands that can be
fully embedded either in Si or Ge for future possible
applications in electrooptic devices. We are able to
fabricate Sn-rich nanostructures, some of which have
already been integrated into diodes to demonstrate
optoelectronic functionality. Future steps consist of improving layer growth and, most importantly, exploring experimental strategies to determine composition and strain on a nanometer and even sub-nanometer scale in order to be able to tailor those nanostructures for application needs.
ACKNOWLEDGMENT
We thank G. Capellini, M. Virgilio, T. Wendav and K. Busch for insightful discussions.
REFERENCES
[1] L. Jiang, J. D. Gallagher, C. L. Senaratne, T. Aoki, J. Mathews, J. Kouvetakis, and J. Menéndez, "Compositional dependence of the direct and indirect band gaps in Ge1-ySny alloys from room temperature photoluminescence," Semicond. Sci. Technol., vol. 29, p. 115028, 2014.
[2] A. A. Tonkikh, C. Eisenschmidt, V. G. Talalaev, N. D. Zakharov, J. Schilling, G. Schmidt, and P. Werner, "Pseudomorphic GeSn/Ge(001) quantum wells: Examining indirect band gap bowing," Appl. Phys. Lett., vol. 103, p. 32106, Jul. 2013.
[3] S. Wirths, R. Geiger, N. von den Driesch, G. Mussler, T. Stoica, S. Mantl, Z. Ikonic, M. Luysberg, S. Chiussi, J. M. Hartmann, H. Sigg, J. Faist, D. Buca, and D. Grützmacher, "Lasing in direct-bandgap GeSn alloy grown on Si," Nat. Photonics, vol. 9, pp. 88–92, Feb. 2015.
[4] Y.-H. Kuo, Y. K. Lee, Y. Ge, S. Ren, J. E. Roth, T. I. Kamins, D. A. B. Miller, and J. S. Harris, "Quantum-Confined Stark Effect in Ge/SiGe Quantum Wells on Si for Optical Modulators," IEEE J. Sel. Top. Quantum Electron., vol. 12, pp. 1503–1513, Dec. 2006.
[5] K. L. Wang, Dongho Cha, Jianlin Liu, and C. Chen, "Ge/Si Self-Assembled Quantum Dots and Their Optoelectronic Device Applications," Proc. IEEE, vol. 95, pp. 1866–1883, Sep. 2007.
[6] G.-E. Chang, S.-W. Chang, and S.-L. Chuang, "Strain-Balanced Multiple-Quantum-Well Lasers," IEEE J. Quantum Electron., vol. 46, pp. 1813–1820, 2010.
[7] G. Sun, R. A. Soref, and H. H. Cheng, "Design of a Si-based lattice-matched room-temperature GeSn/GeSiSn multi-quantum-well mid-infrared laser diode," Opt. Express, vol. 18, pp. 19957–19965, Sep. 2010.
[8] G. Sun, R. A. Soref, and H. H. Cheng, "Design of an electrically pumped SiGeSn/GeSn/SiGeSn double-heterostructure midinfrared laser," J. Appl. Phys., vol. 108, p. 33107, Aug. 2010.
[9] W. Dondl, P. Schittenhelm, and G. Abstreiter, "Self-assembled growth of Sn on Ge (001)," Thin Solid Films, vol. 294, pp. 308–310, Feb. 1997.
[10] Y. Nakamura, A. Masada, S.-P. Cho, N. Tanaka, and M. Ichikawa, "Epitaxial growth of ultrahigh density Ge1−xSnx quantum dots on Si (111) substrates by codeposition of Ge and Sn on ultrathin SiO2 films," J. Appl. Phys., vol. 102, p. 124302, Dec. 2007.
[11] R. Ragan, K. S. Min, and H. A. Atwater, "Direct energy gap group IV semiconductor alloys and quantum dot arrays in SnxGe1−x/Ge and SnxSi1−x/Si alloy systems," Mater. Sci. Eng. B, vol. 87, pp. 204–213, Dec. 2001.
[12] N. Taoka, T. Asano, T. Yamaha, T. Terashima, O. Nakatsuka, I. Costina, P. Zaumseil, G. Capellini, S. Zaima, and T. Schroeder, "Non-uniform depth distributions of Sn concentration induced by Sn migration and desorption during GeSnSi layer formation," Appl. Phys. Lett., vol. 106, p. 61107, Feb. 2015.
[13] I. A. Fischer, T. Wendav, L. Augel, S. Jitpakdeebodin, F. Oliveira, A. Benedetti, S. Stefanov, S. Chiussi, G. Capellini, K. Busch, and J. Schulze, "Growth and characterization of SiGeSn quantum well photodiodes," Opt. Express, vol. 23, p. 25048, Sep. 2015.
[14] B. Schwartz, M. Oehme, K. Kostecki, D. Widmann, M. Gollhofer, R. Koerner, S. Bechler, I. A. Fischer, T. Wendav, E. Kasper, J. Schulze, and M. Kittler, "Electroluminescence of GeSn/Ge MQW LEDs on Si substrate," Opt. Lett., vol. 40, p. 3209, Jul. 2015.
[15] F. Oliveira, I. A. Fischer, A. Benedetti, P. Zaumseil, M. F. Cerqueira, M. I. Vasilevskiy, S. Stefanov, S. Chiussi, and J. Schulze, "Fabrication of GeSn-multiple quantum wells by overgrowth of Sn on Ge by using molecular beam epitaxy," Appl. Phys. Lett., vol. 107, p. 262102, Dec. 2015.
[16] F. Oliveira, I. A. Fischer, A. Benedetti, M. F. Cerqueira, M. I. Vasilevskiy, S. Stefanov, S. Chiussi, and J. Schulze, "Multi-stacks of epitaxial GeSn self-assembled dots in Si: Structural analysis," J. Appl. Phys., vol. 117, p. 125706, Mar. 2015.
PAPERS
Thermoelectric Properties of Polycrystalline WS2
and Solid Solutions of WS2-ySey Types
G.E. Yakovleva*, A.I. Romanenko*, A.S. Berdinsky**, A.Yu. Ledneva*, V.A. Kuznetsov**, M.K. Han***, S.J. Kim*** and V.E. Fedorov*
* Nikolaev Institute of Inorganic Chemistry, Russian Academy of Sciences, Novosibirsk, Russia
** Novosibirsk State Technical University / Semiconductor Devices & Microelectronics, Novosibirsk, Russia
*** Ewha Womans University / Dept. Chemistry and Nano Science, Seoul, Korea
e-mail: fed@niic.nsc.ru
Transition metal chalcogenides are promising thermoelectric materials of great interest for applications. In this work, polycrystalline bulk WS2 and solid solutions of the WS2-ySey type have been studied. In contrast to the literature data obtained at higher temperatures, we have investigated the thermoelectric properties of these materials at low and middle temperatures (77-450 K). The temperature dependences of the electrical conductivity and the Seebeck coefficient were obtained from experimental data. The Seebeck coefficients of these materials have high values; a maximum value of up to 2000 µV/K has been obtained.
I. INTRODUCTION
Nowadays, a major ecological problem is environmental pollution. Over 60% of energy worldwide is wasted, mostly in the form of waste heat [1]. Therefore, thermoelectric power sources are one of the promising fields of study. Thermoelectric materials have the ability to convert heat into electrical energy. The efficiency of thermoelectric materials is characterized by the dimensionless thermoelectric figure of merit ZT = S²σT/λ, which depends on the electrical conductivity (σ), the Seebeck coefficient (S) and the thermal conductivity (λ).
Layered transition metal chalcogenides are typical 2-D solid materials in bulk form. Such materials have been used for many years as solid lubricants, photovoltaic and photocatalytic solar energy converters, and catalysts in many industrial applications [2]. Furthermore, the thermoelectric properties of layered transition metal chalcogenides are of great interest due to their low thermal conductivity.
One of the brightest representatives of such materials is tungsten disulfide. Single-layer tungsten disulfide is a two-dimensional quasi-crystal consisting of a close-packed layer of tungsten between two close-packed layers of sulfur (S-W-S). The layers are held together by weak van der Waals forces, whereas within each layer the atoms are bound by strong covalent forces.
There are some papers devoted to the study of the thermoelectric properties of transition metal chalcogenides such as WS2 and WSe2, in which the authors mainly studied single crystals [3]. In [4] the authors investigated the transport properties of ternary mixed WS2-ySey single crystals. In our work we have studied the thermoelectric properties of polycrystalline solid solutions WS2-ySey and W1-xNbxS2 for thermoelectric applications.
II. EXPERIMENTAL RESULTS AND DISCUSSION
A. Synthesis and characterization of the compositions
A series of samples of compositions WS2-ySey (y = 0.1, 0.2, 0.25) and W1-xNbxS2 (x = 0.05 and 0.15) were synthesized by the high-temperature ampoule method. High-purity elements were used for the syntheses. The starting metal powders were annealed in a hydrogen flow at 1000°C for 1 hour in order to remove adsorbed water and traces of oxides. Stoichiometric amounts of metal powder and chalcogens were placed in quartz ampoules. The ampoules were evacuated and sealed, heated to 800°C over 5 hours, kept at 800°C for 4 days, then cooled and opened.
According to XRD analysis, the samples were single-phase, corresponding to the 2H-WS2 type (hexagonal, P63/mmc), with some broadening of the reflections in comparison to pure WS2 (Fig. 1).
Figure 1. XRD powder patterns of WS1.90Se0.10, WS1.80Se0.20,
WS1.75Se0.25
The work was supported by the Russian Science Foundation, grant 14-13-00674.
The crystal system is hexagonal, space group P63/mmc (no. 194). In the chemistry of dichalcogenides this symmetry is also called the 2H type of WQ2 (or MoQ2, where Q = S, Se, Te). There also exists a 3R type corresponding to the rhombohedral R3m (no. 160) space group. All obtained compounds are of the 2H type, which means they have the P63/mmc space group. If our samples were single crystals, the diffraction reflections would be narrow. Since the samples are fine powders, the reflections are broadened. The doping with Nb or Se leads to insignificant changes in the unit cell parameters and a small shift (about 0.1 degree) of the reflections in comparison with pure WS2. The combination of these two factors leads to a broadening of the reflections compared to pure WS2.
B. Preparation of the samples and measurement technique
Before the measurements all samples were kept under vacuum for 2 hours at 300°C in order to remove absorbed water and oxygen from air. The XRD powder patterns of the annealed samples completely agree with those taken before evacuation. A thermal study of the samples shows stability of the compounds up to 400°C (673 K).
The powdered materials were pressed into pellets 10 mm in diameter. Samples 10 × 2 × 2 mm in size were cut from the pellets. Silver paste was used in order to obtain ohmic contacts with the samples.
The measurements of the temperature dependence of the electrical conductivity were performed by the four-contact method. The thermopower was measured by the static dc method. The thermal conductivity was determined by combining the thermal diffusivity D(T), specific heat Cp(T) and sample density ρ(T) according to κtot(T) = D(T) × Cp(T) × ρ(T). The thermal diffusivity D(T) and specific heat Cp(T) of several specimens were determined by the flash diffusivity-heat capacity method using a NETZSCH LFA 457 MicroFlash™ instrument.
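As an illustration of this combination, a minimal Python sketch (all numerical values below are invented for the example, not measured data; with D in mm²/s, Cp in J/(g·K) and ρ in g/cm³ the unit prefactors cancel, giving κ directly in W/(m·K)):

import numpy as np

# Hypothetical values on a common temperature grid (illustration only).
T   = np.array([300.0, 350.0, 400.0, 450.0])  # K
D   = np.array([0.65, 0.60, 0.55, 0.50])      # thermal diffusivity, mm^2/s
Cp  = np.array([0.25, 0.26, 0.27, 0.28])      # specific heat, J/(g*K)
rho = np.array([7.0, 7.0, 7.0, 7.0])          # density, g/cm^3

# kappa_tot = D*Cp*rho; mm^2/s * J/(g*K) * g/cm^3 = 1e-6 * 1e3 * 1e3 W/(m*K) = W/(m*K)
kappa = D * Cp * rho
for t, k in zip(T, kappa):
    print(f"T = {t:5.0f} K  kappa_tot = {k:.2f} W/(m*K)")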
The temperature ranges for the temperature dependences of the Seebeck coefficient and the electrical conductivity are different due to the technical capabilities of the measuring instruments.
C. Electrical properties
Our bulk samples consist of a large number of pressed nanoparticles. The current in such a bulk sample flows both inside the nanoparticles and through the contacts between them. This leads to additional variation in the electron transport properties of massive samples that consist of numerous nanoparticles. In addition, a poorly conducting layer is formed on the surface of most nanodimensional electroconductive particles. In many cases, the electrical conductivity in arrays of such nanoparticles is determined mainly by the contact resistance. It has been established experimentally that the electron transport properties (electrical conductivity, current-voltage characteristics, magnetoresistance, etc.) of materials vary substantially as their sizes decrease to nanometers [5-10].
Our investigations of the electrical properties of materials containing nanoparticles with characteristic sizes of the order of several nanometers in different dielectric matrices revealed variation not only of the electrical conductivity but of the electron transport mechanisms as well [11-14]. It was established for numerous systems that conduction in polycrystalline materials with a high contact resistance proceeds by tunneling of charge carriers between the crystallites separated by conducting barriers (contacts between the crystallites) [15, 16].
If the sizes of the crystalline islands are sufficiently large, the temperature dependence of the conductivity σ(T) is described by the fluctuation model of tunneling - fluctuation induced tunneling (FIT) conduction [17]:

σ(T) = σ1 ∙ exp[-Tt / (T + Ts)],   (1)

where the temperature Tt corresponds to the energy necessary for the electron transition between the crystallites (in fact, this transition is associated with overcoming an energy gap Eg ~ kB∙Tt); σ1 is the intrinsic conductivity and Ts is the temperature below which the conductivity reaches saturation [17].
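A minimal sketch of how such data can be fitted to Eq. (1) (Python with NumPy/SciPy; the data below are synthetic, not the measurements of this work, and Ts is fixed at 5 K as used for the fits described below):

import numpy as np
from scipy.optimize import curve_fit

def fit_model(T, sigma1, Tt, Ts):
    # Eq. (1): fluctuation induced tunneling conduction
    return sigma1 * np.exp(-Tt / (T + Ts))

# Synthetic sigma(T) in S/cm over 77-450 K (illustration only)
T = np.linspace(77.0, 450.0, 40)
rng = np.random.default_rng(0)
sigma = fit_model(T, 0.05, 400.0, 5.0) * (1 + 0.03 * rng.standard_normal(T.size))

# Fit sigma1 and Tt with Ts held at 5 K
popt, _ = curve_fit(lambda T, s1, Tt: fit_model(T, s1, Tt, 5.0), T, sigma, p0=(0.1, 300.0))
sigma1, Tt = popt
kB = 8.617e-5  # Boltzmann constant, eV/K
print(f"sigma1 = {sigma1:.3g} S/cm, Tt = {Tt:.0f} K, Eg ~ kB*Tt = {kB * Tt * 1e3:.1f} meV")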
The electrical conductivity of the samples was measured in a helium atmosphere in the temperature range from 77 K to 450 K. The measured results are displayed in Fig. 2. The straight lines are approximations of the low-temperature data by Eq. (1) with Ts = 5 K for all samples. The temperature dependences of the conductivity demonstrate that the FIT mechanism makes the dominant contribution to the low-temperature conductivity for all samples. This fact leads us to the conclusion that the contact resistance between particles makes the main contribution to the resistivity of all samples. In Fig. 3 the energy gaps (Eg ~ kB∙Tt) of the WS2-ySey samples are presented.
Figure 2. Temperature dependence of the conductivity of the WS2-ySey samples in the coordinates of Eq. (1).
Figure 3. Material composition dependence of the WS2-ySey energy
gap
D. Thermoelectric properties
The Seebeck coefficient of the samples was measured in the temperature range of 190 to 450 K. The results are presented in Fig. 4.
Figure 4. Temperature dependence of Seebeck coefficient of the
WS2-ySey samples. The inset shows the results of measurements at low
temperatures.
According to the obtained data, the maximum value of the Seebeck coefficient, S = 2000 µV/K, is reached for the material with composition WS1.8Se0.2. It has extreme values at low (S = -525 µV/K at T = 205 K) and middle temperatures (S = 1950 µV/K at T = 445 K).
Such replacement of the chalcogen atoms leads to an increase in the Seebeck coefficient. According to our preliminary measurements, the thermal conductivity of WS2-ySey is about 1-1.2 W/(m∙K). These materials have a large value of the Seebeck coefficient on the one hand, and a very low value of the electrical conductivity on the other hand. Therefore, in order to increase the electrical conductivity, we have partially replaced the transition metal atoms W by Nb atoms.
As we have shown in paper [18], such replacements significantly increase the electrical conductivity. The results of our measurements on samples with the transition metal atoms in WS2 partially replaced by Nb are presented in Fig. 5. Such partial substitution of the metal atoms in WS2 decreases the energy gap and changes the conductivity behavior to a metallic one. As seen from Fig. 5, the material with composition W0.85Nb0.15S2 has an electrical conductivity approximately 10³ times greater than that of the WS2-ySey materials at T = 320 K. The materials W0.85Nb0.15S2 and W0.95Nb0.05S2 have energy gap values of 0.0001 eV and 0.02 eV, respectively.

Figure 5. Temperature dependence of the electrical conductivity of the W1-xNbxS2 samples.

E. Power factor

In the present work we studied the temperature dependences of the Seebeck coefficient and the electrical conductivity, and we used the power factor to characterize the thermoelectric efficiency.
To evaluate the effectiveness of the thermoelectric materials, the power factor was calculated by formula (2):

P = S²σ,   (2)

where S is the Seebeck coefficient (V/K) and σ is the electrical conductivity (S/cm).
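A short numerical illustration of formula (2), including the unit conversion (the Seebeck value corresponds to the maximum reported above, while the conductivity value is an assumed placeholder, not a measured one):

def power_factor(seebeck_uV_per_K, sigma_S_per_cm):
    # P = S^2 * sigma, returned in uW/(m*K^2)
    S = seebeck_uV_per_K * 1e-6   # V/K
    sigma = sigma_S_per_cm * 1e2  # S/cm -> S/m
    return S * S * sigma * 1e6    # W/(m*K^2) -> uW/(m*K^2)

print(power_factor(2000.0, 0.01))  # S = 2000 uV/K, sigma = 0.01 S/cm -> 4.0 uW/(m*K^2)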
The results of the thermoelectric power factor
calculation are presented in Fig. 6.
One can see that these materials do not have a very high power factor in comparison with modern thermoelectric materials, due to their low electrical conductivity. However, the polycrystalline solid solutions of WS2-ySey have a very high Seebeck coefficient. These parameters are interdependent quantities [19], so a compromise between electrical conductivity and Seebeck coefficient needs to be found. We believe that partial replacement of the metal atoms W in WS2 by Nb atoms should be an efficient way of tuning the thermoelectric properties towards the system's optimum.
Figure 6. Temperature dependence of power factor of the WS2-ySey
samples. The inset shows the results of measurements at low
temperatures.
III. CONCLUSION
A series of samples of compositions WS2-ySey (y = 0.1, 0.2, 0.25) and W1-xNbxS2 (x = 0.05 and 0.15) were synthesized by the high-temperature ampoule method. The electron transport properties of the obtained nanocomposite bulk materials WS2 and solid solutions W1-xNbxS2 and WS2-ySey were investigated in the temperature range from 77 K to 450 K.
We have found that the Seebeck coefficients of these materials have high values. The maximum value of the Seebeck coefficient, up to 2000 µV/K, was obtained for the composition WS1.8Se0.2. However, the power factor P was found to be low. We have found that in the nanocomposite bulk material W0.85Nb0.15S2 the electrical conductivity increased by 10³ times at room temperature, but the Seebeck coefficient was lower than in any of the WS2-ySey samples.
It was shown that it is possible to change the thermoelectric and transport properties by controlling the Se and Nb contents in the composites WS2-ySey and W1-xNbxS2.
Within the scope of the present work we succeeded in increasing the Seebeck coefficient by a factor of 2 by replacing chalcogen atoms in WS2. But despite such a significant change of the Seebeck coefficient, these bulk polycrystalline composite materials still cannot compete with modern thermoelectric materials such as Bi2Te3 thin films with P = 2000 µW/(m∙K²) at room temperature [20].
ACKNOWLEDGMENT
The work was supported by Russian Science
Foundation, grant 14-13-00674.
References
[1] M. G. Kanatzidis, "Nanostructured thermoelectrics: the new paradigm?," Chemistry of Materials, vol. 22, no. 3, pp. 648-659, 2010.
[2] H. Wang, H. Yuan, S. S. Hong, Y. Li and Y. Cui, "Physical and chemical tuning of two-dimensional transition metal dichalcogenides," Chem. Soc. Rev., vol. 44, pp. 2664-2680, 2015.
[3] J.-Y. Kim, S. M. Choi, W.-S. Seo and W.-S. Cho, "Thermal and electronic properties of exfoliated metal chalcogenides," Bull. Korean Chem. Soc., vol. 31, pp. 3225-3227, 2010.
[4] G. K. Solanki, D. N. Gujarathi, M. P. Lakshminarayana and M. K. Agarwal, "Transport property measurement in tungsten sulphoselenide single crystals grown by a CVT technique," Crystal Research and Technology, vol. 43, pp. 179-185, 2008.
[5] S. V. Zaitsev-Zotov, V. Y. Pokrovskii and P. Monceau, "Transition to 1D conduction with decreasing thickness of the crystals of TaS3 and NbSe3 quasi-1D conductors," JETP Lett., vol. 73, pp. 29-32, 2001.
[6] Y. S. Hor, Z. L. Xiao, U. Welp, Y. Ito, J. F. Mitchell, R. E. Cook, et al., "Nanowires and nanoribbons of charge-density-wave conductor NbSe3," Nano Letters, vol. 5, pp. 397-401, 2005.
[7] A. A. Stabile, L. Whittaker, T. L. Wu, P. M. Marley, S. Banerjee and G. Sambandamurthy, "Synthesis, characterization, and finite size effects on electrical transport of nanoribbons of the charge density wave conductor NbSe3," Nanotechnology, vol. 22, pp. 1-12, 2011.
[8] P. Monceau, "Electronic crystals: an experimental overview," Advances in Physics, vol. 61, pp. 325-581, 2012.
[9] A. I. Romanenko, O. B. Anikeeva, V. L. Kuznetsov, A. N. Obrastsov, A. P. Volkov and A. V. Garshev, "Quasi-two-dimensional conductivity and magnetoconductivity of graphite-like nanosize crystallites," Solid State Communications, vol. 137, pp. 625-629, 2006.
[10] A. I. Romanenko, O. B. Anikeeva, T. I. Buryakov, E. N. Tkachev, K. R. Zhdanov, V. L. Kuznetsov, et al., "Electrophysical properties of multiwalled carbon nanotubes with various diameters," Phys. Status Solidi B, vol. 246, pp. 2641-2644, 2009.
[11] J. Chen, G. Zhang and B. Li, "Impacts of atomistic coating on thermal conductivity of germanium nanowires," Nano Lett., vol. 12, pp. 2826-2832, 2012.
[12] A. I. Romanenko, O. B. Anikeeva, T. I. Buryakov, E. N. Tkachev, K. R. Zhdanov, V. L. Kuznetsov, et al., "Influence of surface layer conditions of multiwall carbon nanotubes on their electrophysical properties," Diamond & Related Materials, vol. 19, pp. 964-967, 2010.
[13] I. N. Mazov, V. L. Kuznetsov, S. I. Moseenkov, A. V. Ishchenko, N. A. Rudina, A. I. Romanenko, et al., "Structure and electrophysical properties of multiwalled carbon nanotube/polymethylmethacrylate composites prepared via coagulation technique," Nanoscience and Nanotechnology Letters, vol. 3, pp. 18-23, 2011.
[14] A. I. Romanenko, V. E. Fedorov, S. B. Artemkina, O. B. Anikeeva and P. A. Poltarak, "Temperature dependences of transport properties of films, bulk samples of nanocrystals, and single crystals of niobium triselenide," Physics of the Solid State, vol. 57, pp. 1850-1854, 2015.
[15] Y. Zhao and W. Li, "Fluctuation-induced tunneling dominated electrical transport in multi-layered single-walled carbon nanotube films," Thin Solid Films, vol. 519, pp. 7987-7991, 2011.
[16] A. I. Romanenko, D. N. Dybtsev, V. P. Fedin, S. B. Aliev and K. M. Limaev, "Electric-field-induced metastable state of electrical conductivity in polyaniline nanoparticles polymerized in nanopores of a MIL-101 dielectric matrix," JETP Letters, vol. 101, pp. 59-63, 2015.
[17] P. Sheng, "Fluctuation-induced tunneling conduction in disordered materials," Physical Review B, vol. 21, pp. 2180-2195, 1980.
[18] V. E. Fedorov, N. G. Naumov, A. N. Lavrov, M. S. Tarasenko, S. B. Artemkina, A. I. Romanenko and M. V. Medvedev, "Tuning electronic properties of molybdenum disulfide by a substitution in metal sublattice," in 36th International Convention on Information & Communication Technology, Electronics & Microelectronics (MIPRO), pp. 11-14, 2013.
[19] M. Ocko, S. Zonja and M. Ivanda, "Thermoelectric materials: problems and perspectives," in MIPRO 2010, pp. 16-21, 2010.
[20] J.-H. Kim, J.-Y. Choi, J.-M. Bae, M.-Y. Kim and T.-S. Oh, "Thermoelectric characteristics of n-type Bi2Te3 and p-type Sb2Te3 thin films prepared by co-evaporation and annealing for thermopile sensor application," Materials Transactions, vol. 54, pp. 618-625, 2013.
Piezoresistive Effect in Polycrystalline
Bulk and Film Layered Sulphide W0.95Re0.05S2
V. A. Kuznetsov*, **, a, A. I. Romanenko*, A. S. Berdinsky**,
A. Yu. Ledneva**, S. B. Artemkina**, V. E. Fedorov**, ***, b
* Novosibirsk State Technical University / Semiconductor Devices and Microelectronics, Novosibirsk, Russia
** Nikolaev Institute of Inorganic Chemistry, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia
*** Novosibirsk State University, Novosibirsk, Russia
a e-mail address: vitalii.a.kuznetsov@gmail.com
b e-mail address: fed@niic.nsc.ru
Abstract - The paper reports the results of an experimental investigation of the electron transport properties and the piezoresistive effect of polycrystalline film and bulk samples of tungsten-rhenium disulphide W0.95Re0.05S2. Polycrystalline powder of the composition was synthesized by direct high-temperature reaction of the elements in stoichiometric ratio. The film samples were prepared by ultrasonication of the powder in 35% ethanol with subsequent spraying of the colloidal dispersion onto preheated substrates. The bulk samples were formed by conventional pressing at a pressure of 1.25 GPa. The strain gauge factor is equal to 13 and 19 for film and bulk samples, respectively. The band gaps were estimated from the temperature dependences of conductivity to be about 470 and 360 meV, respectively.
I. INTRODUCTION
New functional materials are very promising for microelectronic applications. Sensor electronics is one of the most interesting branches in which to use the unusual properties of functional materials. Traditionally, metal and semiconductor strain sensors are used to measure mechanical quantities. The strain gauge factor (SGF) is one of the main parameters used to evaluate the efficiency of strain sensors. The SGF of semiconductor sensors is one to two orders of magnitude higher than that of metallic ones, and the SGF of monocrystalline semiconductor samples is several times higher than that of polycrystalline ones [1, 2].
Among the well-known layered materials are graphene and graphene-based compounds. The electron transport and strain sensing properties of such materials have been investigated quite well [3-10]. Similar to graphene, transition metal dichalcogenides (TMDC) have a layered structure, but their electron transport and strain sensing properties are not well known.
The electron transport properties of thin polycrystalline films differ from those of bulk samples, because when passing from separate micro- or nanosized monocrystalline particles to polycrystalline arrays of them, the current in a bulk sample flows both inside the particles and through the contacts between them. That leads to additional variation in the electron transport properties of massive samples that consist of numerous nano- and microcrystallites. In addition, a poorly conducting layer is formed on the surface of most nanodimensional electroconductive particles. In many cases, the electrical conductivity in arrays of such particles is determined mainly by the contact resistance. It has been established experimentally that the electron transport properties (electrical conductivity, current-voltage characteristics, magnetoresistance, etc.) of materials vary substantially as their dimensions decrease to nanometre sizes [11-15].

The study was supported by the Russian Science Foundation (Grant no. 14-13-00674).
It has been established for numerous systems that conduction in polycrystalline materials with a high contact resistance proceeds by tunnelling of charge carriers between the crystallites separated by conducting barriers (contacts between the crystallites) [16, 17]. Such electron transport is described by the fluctuation induced tunnelling conduction (FITC) model with several parameters to be estimated [18]:

σ(T) = σ1 ∙ exp[-Tt / (T + Ts)],   (1)

where Tt is the temperature which corresponds to the energy necessary for the electron transition between the crystallites, and Ts is the temperature below which the conductivity reaches saturation.
Recently we investigated the electron transport properties of bulk polycrystalline samples of transition metal dichalcogenides (TMDC) and the strain sensing properties of film samples of molybdenum-rhenium disulphide [19, 20], but we have not investigated bulk samples for strain sensing properties. Tungsten disulphide WS2 is one of the bright representatives of the layered materials family. In 2H-WS2, polymeric three-atom-thick (S-W-S) layers are bound to the neighbouring layers via van der Waals S…S bonding. Due to the presence of this weak van der Waals interaction, MoS2 and WS2 can be dispersed under ultrasonic treatment in a liquid medium, forming colloidal dispersions with nanosized sheets. That exfoliation method allows production of stable colloidal dispersions with particles a few layers thick and tens to hundreds of nanometres in lateral size [21], and such dispersions can be used to form thin films consisting of the nanoparticles.
Figure 1. X-ray powder diffraction pattern of W0.95Re0.05S2 synthesized by the high-temperature ampoule approach
Figure 2. X-ray powder diffraction pattern of the W0.95Re0.05S2 film sprayed onto a preheated amorphous quartz glass substrate
The goal of this work is to investigate the strain gauge factor and electron transport properties of film and bulk samples of tungsten disulphide doped with rhenium.
II. EXPERIMENTAL
A. Preparation of W0.95Re0.05S2 film and bulk samples
We used the same methods and apparatus as in [22]. W0.95Re0.05S2 was synthesized via the high-temperature ampoule method from a stoichiometric mixture of the elements. The synthesized W0.95Re0.05S2 was analysed by X-ray powder diffraction. XRD analysis showed a single phase with some broadening of the reflections (see Fig. 1). To prepare film samples, 1.0 g of the powder was placed into a glass flask with 250 ml of ethanol-water solution (35/65% by volume) and ultrasonicated for 48 h. The resulting mixture was centrifuged at 1600 rpm for 25 min. The colloidal dispersion was sprayed onto a steel beam preheated to 180°C and covered with the polymer glue VL-931, as in [22]. The thickness of the resulting samples was estimated by weighing to be about 1 μm. For studying the electron transport properties, the dispersion was sprayed in the same way onto polished Al2O3 substrates. The XRD pattern of a thin film sample sprayed onto polished quartz glass is shown in Fig. 2. As one can see from Fig. 2, only 00l peaks are visible, which is indirect proof that the particles in the sprayed films are oriented generally parallel to the substrates.
Bulk samples were formed from the powder with a laboratory hydraulic press at a pressure of 1.25 GPa. The thickness of the tablets was about 0.25 mm. To investigate the properties, the tablets were cut into strips 2 mm in width. Samples for studying the strain sensing properties were glued to a beam of uniform strength (in bending), as in [22]. The glue was polymerized according to its bonding technology with a maximum temperature of 180°C. Before heating, the samples were additionally covered with the glue to reduce environmental influence.
TABLE I. SAMPLES' PARAMETERS

Parameter                  | Sprayed | Pressed
Thickness, d (μm)          | 1       | 250
Resistivity, ρ (Ohm·cm)    | 11000   | 166
Band gap, Δ (meV)          | 470     | 360
Strain gauge factor, K     | 13.0    | 19.5
B. Electron transport properties
The temperature dependences of resistivity were measured using the four-point probe method for bulk samples and the two-point probe method for film ones. Contacts to the samples were made of silver paste and thin gold wires. The two-point probe method was used for the films instead of the four-probe one because of the high resistance of the films, but it should be noted that the contacts were ohmic. The measured dependences are shown in Fig. 3 and Fig. 4 in different axes. As one can see from Fig. 4, the experimental curves are well fitted by equation (1). That exponential dependence is typical for fluctuation induced tunnelling conduction, with the temperature Tt corresponding to the energy necessary for the electron transition between the crystallites. In fact, that transition is associated with overcoming an energy gap Δ ~ kB ∙ Tt, the gap an electron has to tunnel through from one crystallite to another. The energy gaps Δ were estimated from the slopes of the fitting straight lines (see Fig. 4). The results are shown in Table 1. As the intercrystalline boundaries are most likely smaller in the tablets than in the films due to pressing, it seems reasonable to say that the difference in band gaps is related to the difference in the contacts.
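A minimal sketch of this slope-based estimation (Python; the synthetic data below are generated, not measured, and the saturation temperature Ts is an assumed parameter): Eq. (1) linearizes as ln σ = ln σ1 - Tt/(T + Ts), so the slope of ln σ versus 1/(T + Ts) is -Tt and Δ ~ kB∙Tt.

import numpy as np

def energy_gap_meV(T, sigma, Ts=5.0):
    # Slope of ln(sigma) versus 1/(T+Ts) gives -Tt; the gap is delta ~ kB*Tt
    xvals = 1.0 / (np.asarray(T) + Ts)
    slope, _ = np.polyfit(xvals, np.log(sigma), 1)
    kB = 8.617e-5  # eV/K
    return -slope * kB * 1e3  # meV

# Synthetic check: data generated with Tt = 5450 K should return ~470 meV
T = np.linspace(100.0, 450.0, 30)
sigma = 2.0 * np.exp(-5450.0 / (T + 5.0))
print(f"delta ~ {energy_gap_meV(T, sigma):.0f} meV")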
C. Strain sensing properties
The device for measuring the SGF was the same as we used in [20, 22]. Electrical contacts to the samples were made of silver paste "Dotite D-500" and thin copper wires. The contacts were ohmic. Each compression-tension cycle lasted 10 minutes, with equal loading and unloading periods. The dependences of resistance on strain are shown in Fig. 5 for film samples in comparison to bulk ones. The SGF was defined as the slope of the line connecting the two extreme values. The results are shown in Table 1.
Figure 3. Temperature dependences of conductivity of the film and
bulk samples
Figure 4. Temperature dependences of conductivity with the fitting
lines for energy gaps estimation by equation (1)
One can see that the SGF of the bulk samples is greater than the SGF of the thin films. It was shown above that the intercrystalline contacts are different in the bulk and film samples, and most likely the reason for the difference in the SGF is the different contribution of the crystallites themselves and of the contacts between them to the change of the electrical resistance of the bulk and film samples when mechanical strain is applied, especially as particles of layered compounds a few layers thick, like the ones the films consist of, have a large Young's modulus [23]. In any case, the difference should be estimated with a conventional beam because of the small thickness of the beams used and the relatively large thickness of the bulk samples.
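A sketch of the SGF definition used here, the slope of the line connecting the two extreme points of the resistance-strain curve (the numbers are illustrative only, chosen to reproduce the bulk value from Table 1):

def gauge_factor(strain, resistance):
    # K = (dR/R0) / d(epsilon) between the two extreme strain points
    i_min = min(range(len(strain)), key=strain.__getitem__)
    i_max = max(range(len(strain)), key=strain.__getitem__)
    R0 = resistance[i_min]                     # reference resistance at the lowest strain
    dR = resistance[i_max] - resistance[i_min]
    de = strain[i_max] - strain[i_min]
    return (dR / R0) / de

print(gauge_factor([0.0, 0.001], [1000.0, 1019.5]))  # -> 19.5, cf. Table 1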
III. CONCLUSION
We have studied the piezoresistive effect and electron transport properties of film and bulk polycrystalline tungsten-rhenium disulphide W0.95Re0.05S2 samples. The temperature dependences of conductivity are well described by the fluctuation induced tunnelling conduction model. The band gaps have been estimated to be about 470 and 360 meV for film and bulk samples, respectively. The strain gauge factors of film and bulk samples have been estimated from the experimental curves to be about 13 and 19, respectively. The difference in energy gaps can be explained by the smaller contribution of the contact resistance to the electrical conductivity of the bulk samples in contrast to the films, as a result of better densification of the crystallites due to pressing. The SGF of the bulk samples is higher than that of the films; this can be circumstantial evidence that the intercrystalline boundaries make a great contribution to the piezoresistive effect.
Figure 5. Resistance-strain characteristic of W0.95Re0.05S2 film and bulk
samples at room temperature in ambient
REFERENCES
[1] N. Maluf and K. Williams, An Introduction to Microelectromechanical Systems Engineering. Boston: Artech House Inc., 2004.
[2] M. Elwenspoek and R. J. Wiegerink, Mechanical microsensors.
Berlin: Springer-Verlag Berlin Heidelberg, 2001.
[3] S. D. Sarma, A. K. Geim, P. Kim, and A. H. MacDonald, "Special
edition: Exploring graphene - Recent research advances," Solid
State Communications, vol. 143, pp. 1-126, 2007.
[4] B. Partoens and F. M. Peeters, "From graphene to graphite:
Electronic structure around the K point," Physical Review B,
vol. 74, pp. 075404-1-10, 2006.
[5] C. Gomez-Navarro, R. T. Weitz, A. M. Bittner, M. Scolari, A.
Mews, M. Burghard, et al., "Electronic transport properties of
individual chemically reduced graphene oxide sheets," Nano
Letters, vol. 7, pp. 3499-3503, 2007.
[6] X. Du, I. Skachko, A. Barker, and E. Y. Andrei, "Approaching ballistic transport in suspended graphene," Nature Nanotechnology, vol. 3, pp. 491-495, 2008.
[7] G. Eda, G. Fanchini, and M. Chhowalla, "Large-area ultrathin
films of reduced graphene oxide as a transparent and flexible
electronic material," Nature Nanotechnology, vol. 3, pp. 270-274,
2008.
[8] G. T. Pham, Y.-B. Park, Z. Liang, C. Zhang, and B. Wang,
"Processing and modeling of conductive thermoplastic/carbon
nanotube films for strain sensing," Composites Part
B-Engineering, vol. 39, pp. 209-216, 2008.
[9] N. Hu, Y. Karube, M. Arai, T. Watanabe, C. Yan, Y. Li, et al.,
"Investigation on sensitivity of a polymer/carbon nanotube
composite strain sensor," Carbon, vol. 48, pp. 680-687, 2010.
[10] A. Bessonov, M. Kirikova, S. Haque, I. Gartseev, and M. J. A.
Bailey, "Highly reproducible printable graphite strain gauges for
flexible devices," Sensors and Actuators A-Physical, vol. 206,
pp. 75-80, 2014.
[11] P. Monceau, "Electronic crystals: an experimental overview,"
Advances in Physics, vol. 61, pp. 325-581, 2012.
[12] A. A. Stabile, L. Whittaker, T. L. Wu, P. M. Marley, S. Banerjee,
and G. Sambandamurthy, "Synthesis, characterization, and finite
size effects on electrical transport of nanoribbons of the charge
density wave conductor NbSe3," Nanotechnology, vol. 22,
p. 485201 (6pp), 2011.
[13] A. I. Romanenko, O. B. Anikeeva, V. L. Kuznetsov, A. N. Obrastsov, A. P. Volkov, and A. V. Garshev, "Quasi-two-dimensional conductivity and magnetoconductivity of graphite-like nanosize crystallites," Solid State Communications, vol. 137, pp. 625-629, 2006.
[14] Y. S. Hor, Z. L. Xiao, U. Welp, Y. Ito, J. F. Mitchell, R. E. Cook,
et al., "Nanowires and Nanoribbons of Charge-Density-Wave
Conductor NbSe3," Nano Letters, vol. 5, pp. 397-401, 2005.
[15] S. V. Zaitsev-Zotov, V. Y. Pokrovskii, and P. Monceau,
"Transition to 1D Conduction with Decreasing Thickness of the
Crystals of TaS3 and NbSe3 Quasi-1D Conductors," JETP Lett.,
vol. 73, pp. 29-32, 2001.
[16] A. I. Romanenko, D. N. Dybtsev, V. P. Fedin, S. B. Aliev, and K.
M. Limaev, "Electric-Field-Induced Metastable State of Electrical
Conductivity in Polyaniline Nanoparticles Polymerized in
Nanopores of a MIL-101 Dielectric Matrix," JETP Letters,
vol. 101, pp. 59-63, 2015.
[17] Y. Zhao and W. Li, "Fluctuation-induced tunneling dominated
electrical transport in multi-layered single-walled carbon nanotube
films," Thin Solid Films, vol. 519, pp. 7987-7991, 2011.
[18] P. Sheng, "Fluctuation-Induced Tunneling Conduction in Disordered Materials," Physical Review B, vol. 21, pp. 2180-2195, 1980.
[19] V. E. Fedorov, N. G. Naumov, A. N. Lavrov, M. S. Tarasenko, S.
B. Artemkina, and A. I. Romanenko, "Tuning electronic properties
of molybdenum disulfide by a substitution in metal sublattice," in
36th International Convention on Information and Communication
Technology, Electronics and Microelectronics (MIPRO), Opatija,
Croatia, 2013, pp. 11-14.
[20] V. A. Kuznetsov, A. S. Berdinsky, A. Y. Ledneva, S. B.
Artemkina, M. S. Tarasenko, and V. E. Fedorov, "Strain-sensing
element based on layered sulfide Mo0.95Re0.05S2," in 38th
International Convention on Information and Communication
Technology, Electronics and Microelectronics (MIPRO), Opatija,
Croatia, 2015, pp. 15-18.
[21] V. Nicolosi, M. Chhowalla, M. G. Kanatzidis, M. S. Strano, and J.
N. Coleman, "Liquid Exfoliation of Layered Materials," Science,
vol. 340, pp. 1420-+, Jun 21 2013.
[22] V. A. Kuznetsov, A. S. Berdinsky, A. Y. Ledneva, S. B.
Artemkina, M. S. Tarasenko, and V. E. Fedorov, "Film
Mo0.95Re0.05S2 as a strain-sensing element," Sensors and Actuators
A: Physical, vol. 226, pp. 5-10, 2015.
[23] A. Castellanos-Gomez, M. Poot, G. A. Steele, H. S. J. van der
Zant, N. Agrait, and G. Rubio-Bollinger, "Elastic Properties of
Freely Suspended MoS2 Nanosheets," Advanced Materials,
vol. 24, pp. 772-775, 2012.
Luminescent diagnostics in the NIR-region on a
base of Yb-porphyrin complexes
V.D. Rumyantseva1,2, I.P. Shilov2, Yu.V. Alekseev3, A.S. Gorshkova1
1. Moscow Technological University, 119454 Moscow, Russia. E-mail: vdrum@mail.ru
2. Kotel’nikov Institute of Radioengineering and Electronics RAS, 141190 Fryazino,
Moscow region, Russia. E-mail: ipshilov@ms.ire.rssi.ru
3. State Scientific Center of Laser Medicine, Moscow, Russia. E-mail: ural377@mail.ru
The problem of early diagnostics, necessary for the successful therapy of oncological diseases, is well known. In this field, optical diagnostic methods are the most effective. The optical response of a luminescent label can indicate the condition of biological tissues and the biochemical processes occurring in them in real time. The most promising luminescent labels for non-invasive diagnostics are those that absorb and emit light in the NIR region (~700-1100 nm), where the absorption and auto-fluorescence of biological tissues are minimal. One of the most promising labels studied by us is the Yb complex of 2,4-dimethoxyhematoporphyrin IX; it possesses optimal chemical and photophysical properties, such as tumorotropicity, a high molar extinction coefficient, a large Stokes shift, efficient luminescence, chemical and light stability, and the ability to be used in aqueous media. Therefore, special attention is given to the development of a luminescent diagnostics method for endoscopic and visually accessible cancer forms on the basis of the Yb complex of 2,4-dimethoxyhematoporphyrin IX and a laser-fiber NIR-range fluorimeter.
I. INTRODUCTION
Various porphyrin compounds perform important functions which are indispensable for the existence and development of flora and fauna on Earth. Since the end of the 20th century, porphyrins have been used in photodynamic therapy and diagnostics of malignant tumors due to their ability to accumulate in various types of cancer cells and tumor microvessels. To date, a whole series of photosensitizers effectively generating singlet oxygen has been synthesized: Photofrin II, Photogem, Foscan, Fotoditazin, Photolon, Radachlorin, Photosens, Alasens, Tookad and others.
However, the free bases of these macroheterocycles have a side effect - phototoxicity - which is unfavorable during diagnostic procedures. It is possible to overcome that drawback by using diagnostic photosensitizers which practically do not generate singlet oxygen, while maintaining a high affinity to malignant tumors. Ytterbium complexes of natural and synthetic porphyrins are such compounds [1, 2].
II. RESULTS AND DISCUSSION
Lanthanide luminescence, weak enough by itself, is significantly enhanced in porphyrin metallocomplexes. This is connected to the transfer of the macrocycle excitation energy to the Ln3+ ion [3]. Among the lanthanides, the erbium, neodymium and ytterbium complexes, whose Ln3+ luminescent level lies below the T1 porphyrin level, are of the most interest.
Porphyrin complexes with Er, Nd and Yb possess 4f-luminescence in the near-infrared region (NIR region) of the spectrum, which becomes possible because of an intramolecular energy transfer from the triplet state of the porphyrin (located in the range of 12 500-13 500 cm-1) to the lower resonance levels of Er3+, Nd3+ and Yb3+ (6 450, 11 500 and 10 200 cm-1, respectively) (Fig. 1) [4].
Figure 1. Diagram of energy levels and luminescence spectra of Er3+, Nd3+ and Yb3+ ions.
Ytterbium porphyrin complexes have been chosen as research objects due to the fact that, under excitation of the π-electron part of the molecule, luminescence is observed from the 2F5/2 → 2F7/2 transitions of the Yb3+ 4f-electron levels (2F5/2 - excited state, 2F7/2 - ground state). Introduction of an ytterbium ion into a porphyrin reduces the photochemical activity, but the selectivity of accumulation in tumors, characteristic of most porphyrins, still remains. The reduction of the singlet oxygen quantum yield is caused by the fact that the luminescent level of the Yb3+ ion lies below the triplet state of the organic part of the molecule, but higher than that of singlet oxygen. As a result, the porphyrin matrix excitation under the influence of external light radiation is not transferred to oxygen, but is intercepted by the Yb3+ ion, thereby strongly reducing the porphyrin-sensitized singlet oxygen generation [5]. These transformations are shown in Fig. 2.
Figure 2. Scheme of electronic transitions of porphyrin sensitizers and singlet oxygen generation: (1) absorption, (2) fluorescence, (3) intersystem crossing, (4) phosphorescence, (5) excitation transfer to oxygen and the transition of triplet oxygen 3O2 to singlet oxygen, (6) excitation transfer to the Yb3+ ion, and (7) luminescence of the Yb3+ ion.
Porphyrin molecules form stable complexes with ytterbium ions which have intense absorption in the near-infrared region (NIR region) of the spectrum [6]. The extinction coefficient (ε) for the Yb complexes is 10⁴-10⁵ M⁻¹cm⁻¹, which is almost 4 orders of magnitude higher than the ε value for direct excitation of the Yb3+ ion itself, so one can assume that Yb3+ excitation through the porphyrin matrix provides a much more effective route to strong 4f-luminescence than direct excitation of Yb3+. At the same time, the introduction of various substituents in the meso and/or β-positions of the macrocycle allows drastic modification of the physicochemical properties of lanthanide porphyrin complexes, which plays an important role in their use in medicine and photochemistry [7].
Figure 3. Structural formula of the dipotassium salt of the Yb-2,4-dimethoxyhematoporphyrin IX complex.
In studies of the infrared luminescence of ytterbium ions in solutions of complexes with organic reagents, the main purpose is to minimize non-radiative losses of excitation energy. In macrocyclic ligands the complexing ion is effectively protected from the effects of the high-frequency O-H and C-H oscillations of solvent molecules, which play an important role in the non-radiative degradation of electronic excitation energy.
The first studies of ytterbium porphyrin complexes as luminescent markers on animals with malignant tumors were carried out on liposomal forms of coproporphyrin III, protoporphyrin IX and hematoporphyrin IX as their methyl esters [8], as well as on water-soluble synthetic derivatives of tetraphenylporphyrin [9].
Ytterbium porphyrin complexes have the narrow and rather bright luminescence line characteristic of rare-earth ions, which for the Yb3+ ion is located in the infrared range of 975-985 nm, in the "therapeutic window" of biological tissue transparency, where the tissues' own luminescence is practically absent. The lifetime (τ) for the Yb-2,4-dimethoxyhematoporphyrin IX complex was 11 μs [13]; the luminescence decay has a non-exponential pattern, which is due to strong luminescence quenching by vibrations of OH groups from the inner environment of the ytterbium ion. A significant difference in the lifetimes of the excited state between the ytterbium complexes of hydrophobic porphyrin esters and their acids is caused by the presence of intermolecular hydrogen bonds and luminescence quenching by water [14].
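A minimal sketch of how an effective lifetime of this kind can be extracted from a decay curve (Python; the trace below is synthetic and mono-exponential for simplicity, whereas the real decay is non-exponential as noted above):

import numpy as np
from scipy.optimize import curve_fit

def decay(t, I0, tau, bg):
    # Simple mono-exponential luminescence decay with a constant background
    return I0 * np.exp(-t / tau) + bg

t = np.linspace(0.0, 100e-6, 200)      # s
I = decay(t, 1.0, 11e-6, 0.01)         # synthetic trace with tau = 11 us
popt, _ = curve_fit(decay, t, I, p0=(1.0, 5e-6, 0.0))
print(f"tau = {popt[1] * 1e6:.1f} us")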
Continuing these studies, more than two dozen ytterbium complexes of natural and synthetic porphyrins were synthesized [10]. The analysis of their physical, chemical and luminescence characteristics and the results of biological tests revealed that one of the most promising compounds for diagnostic purposes is the dipotassium salt of the Yb-2,4-dimethoxyhematoporphyrin IX complex [11] (Fig. 3).
To increase the diagnostic potential of ytterbium porphyrin complexes, it is necessary to isolate them from the quenching effect of the water environment whenever possible. A preferred solvent for such compounds may be DMSO, which has unique biomedical and pharmacological properties: it penetrates through biological membranes, improves the transport properties of drugs, and stimulates the immune system.
This complex is similar in structure to natural protoporphyrin IX, whose iron complex is the prosthetic group of hemoglobin. The substance has low toxicity, is well soluble in water, and its synthesis, based on blood hemin, is simple and cheap [12].
The emission luminescence spectra of the Yb-2,4-dimethoxyhematoporphyrin IX complex in aqueous solutions with different concentrations of DMSO are shown in Fig. 4.
Figure 4. Emission luminescence spectra of the Yb-2,4-dimethoxyhematoporphyrin IX in aqueous solutions with different
DMSO concentrations: 1 - 100% DMSO, 2 - 50% DMSO,
3 - 20% DMSO, 4 - 100% H2O.
Under conditions of lower polarity (solutions with growing concentration of DMSO), the emission maxima are shifted toward the long-wavelength region of the spectrum (the solvatochromism phenomenon) [15]. From Fig. 4 one can see that the luminescence intensity increases significantly with increasing DMSO concentration (by more than 10 times in the transition from the aqueous Yb-complex solution to the 100% DMSO solution), and at the same time the emission spectrum maximum shifts by almost 10 nm. The lifetime in the 100% DMSO solution was ~22 μs. Aqueous 20-30% DMSO solutions are allowed in medicine and are of practical interest for intravenous injections; for them τ ~ 5-10 μs.
A study of the photosensitized luminescence kinetic signals of singlet oxygen in aqueous solutions showed that the quantum yield of singlet oxygen generation for the Yb-2,4-dimethoxyhematoporphyrin IX complex is reduced almost 4 times (to 11%, from 40% for the free-base porphyrin), which experimentally confirms its low phototoxicity.
Previously, in in vivo experiments on mice with subcutaneously grafted sarcoma S-37, preferential Yb-complex accumulation in tumor tissue compared with normal tissue was found by luminescence in the NIR region [16]. The dipotassium salt of the Yb complex was injected intravenously; the luminescent accumulation contrast was determined after 48 hours.
Continuing the studies, we designed amphiphilic pharmaceutical compositions in the form of gels for epicutaneous use as well as for application to mucous membranes. The optimum concentration of the Yb-2,4-dimethoxyhematoporphyrin IX complex (~0.05%, w/w) was found; gels Tisolum, Kalgel, Cremophor and their mixtures in various proportions were used. The Yb-complex pharmaceutical composition was found to accumulate quickly enough (in less than 1 hour) in places with pathologically changed skin and mucous membranes. A clear luminescence intensity difference compared to healthy tissue was thereby established (900-1100 nm range). The change of the luminescence parameters depends on the measurement time and the character of the pathological process [17].
This method can be successfully applied in dermatology, dentistry, gynecology, veterinary and other fields of medicine; it is characterized by simplicity of performance, availability, informativeness and low toxicity. The best results of accumulation in pathologically changed skin were obtained for the pharmaceutical composition based on Tisolum. The contrast index value was in the range 3.0-15.0.

III. CONCLUSION
Thus, ytterbium porphyrin complexes are promising diagnostic markers of malignant tumors and pathological changes of skin and mucous membranes in the NIR region of the spectrum, and they possess low phototoxicity.
ACKNOWLEDGMENT
This work was supported by the Russian Federation
Ministry of Education and Science, the project №
4.128.2014/K.
REFERENCES
[1] M.I. Gaiduk, V.V. Grigoryants, A.F. Mironov, L.D. Roytman, V.I. Chissov, V.D. Rumyantseva, G.M. Sukhin, Dokl. AN SSSR, Seriya Biofizika, 1989, vol. 309, N 4, pp. 980-983 (in Russian).
[2] M.I. Gaiduk, V.V. Grigoryants, A.F. Mironov, V.D. Rumyantseva, V.I. Chissov, G.M. Sukhin, J. Photochem. Photobiol. B: Biology, 1990, vol. 7, pp. 15-20.
[3] A.F. Mironov, Uspekhi Khimii, 2013, vol. 82, N 4, pp. 333-351 (in Russian).
[4] I.P. Shilov, A.V. Ivanov, V.D. Rumyantseva, A.F. Mironov, in: Fundamental Sciences for Medicine. Biophysical Medical Technologies (Ed. A.I. Grigorev and Yu.A. Vladimirov), M.: Maks-Press, vol. 2, 2015, pp. 110-144 (in Russian).
[5] A.V. Ivanov, V.D. Rumyantseva, K.S. Shchamkhalov, I.P. Shilov, Laser Physics, 2010, vol. 20, N 12, pp. 2056-2065.
[6] M. Gouterman, in: The Porphyrins (Ed. D. Dolphin), San Francisco, London: Academic Press, 1978, vol. 3, pp. 1-156.
[7] M.I. Gaiduk, V.V. Grigoryants, A.F. Mironov, V.D. Rumyantseva, Proc. Estonian Acad. Sci. Phys. Math., 1991, vol. 40, N 3, pp. 198-204.
[8] RF Patent 1340087. Byull. Izobret. N 18, filed 27.06.1995, p. 273.
[9] M.I. Gaiduk, V.V. Grigoryants, V.D. Menenkov, A.F. Mironov, V.D. Rumyantseva, Izv. AN SSSR, Seriya Fizika, 1990, vol. 54, N 10, pp. 1904-1908 (in Russian).
[10] V.D. Rumyantseva, A.S. Gorshkova, A.F. Mironov, Fine Chem. Tech., 2014, vol. 9, N 1, pp. 3-17.
[11] RF Patent 2411243. Byull. Izobret. N 4, filed 10.02.2011, p. 731.
[12] V.D. Rumyantseva, A.F. Mironov, I.P. Shilov, K.S. Shchamkhalov, A.S. Ryabov, A.V. Ivanov, Abstracts of the XIII Int. Sci.-Techn. Conf. "High-Tech in Chemical Engineering-2010", Russia, Suzdal, 29.06-02.07.2010, p. 192 (in Russian).
[13] V.D. Rumyantseva, A.F. Mironov, K.S. Shchamkhalov, G.M. Sukhin, I.P. Shilov, V.M. Markushev, Z.V. Kuzmina, N.I. Polyanskaya, A.V. Ivanov, Lazernaya Medicina, 2010, vol. 14, N 1, pp. 20-25 (in Russian).
[14] A.S. Stachevski, V.N. Knyukshto, A.V. Ivanov, V.D. Rumyantseva, I.P. Shilov, V.A. Galievsky, B.M. Dzhagarov, Abstracts of the Int. Conf. "Molecular and Cellular Basics of Biosystems Functioning", 17.06-20.06.2014, Belarus, Minsk: Book of Abstracts in 2 ch., Ch. 1, 2014, pp. 128-130 (in Russian).
[15] A.S. Stachevski, V.N. Knyukshto, A.V. Ivanov, V.D.
Rumyantseva, I.P. Shilov, V.A. Galievsky, B.M. Dzhagarov. J.
Appl. Spectr., 2014, vol. 81, N 6, pp. 938-942.
[16] V.D. Rumyantseva, K.S. Shchamkhalov, I.P. Shilov, L.U. Kochmarev, V.M. Markushev, Z.V. Kuzmina, N.I. Polyanskaya, A.S. Ryabov, A.V. Ivanov, Medicinskaya fizika, 2011, N 2, pp. 67-73 (in Russian).
[17] Yu.V. Alekseev, A.V. Ivanov, A.S. Ryabov, N.M. Shumilova, I.P. Shilov, Ros. Bioterapevticheskiy Zhurnal, 2015, vol. 14, N 1, p. 59 (in Russian).
Simulation Study of the Composite Silicon Solar
Cell Efficiency Sensitivity to the Absorption
Coefficients and the Thickness of intrinsic
Absorber Layer
V. Tudić*, N. Posavec*
* Karlovac University of Applied Sciences / Department of Mechanical Engineering, Karlovac, Croatia
vladimir.tudic@vuka.hr
nikola.posavec.1@gmail.com
ABSTRACT - In this paper, two p+-i-i-n+ silicon solar cells with homogeneous and heterogeneous intrinsic absorber layers based on hydrogenated amorphous-nanocrystalline-microcrystalline silicon (a-Si:H/nc-Si:H/µc-Si:H) have been studied with the computer modeling and simulation program AMPS-1D (Analysis of Microelectronic and Photonic Structures). Various factors that affect cell efficiency have been studied, such as the layers' absorption coefficients, band gap and layer thickness up to 1200 nm. It was found that under standard solar cell conditions the layers' absorption coefficient makes a major contribution to solar cell performance, according to measurements on actual solar cell samples. It is demonstrated that for a homogeneous a-Si:H/nc-Si:H intrinsic absorber layer with a constant crystal fraction of Xc = 30% the cell efficiency is higher than in the case of a heterogeneous intrinsic absorber layer which contains various crystal fractions depending on the absorber layer thickness. The second scenario of the silicon thin-film composite structure is more common in solar cell production using PECVD and HWCVD deposition techniques, as proven by X-ray diffraction and high-resolution electron microscopy measurements.
I. INTRODUCTION
In previous work, the properties of semiconducting silicon, heterojunctions and the photo effect in intrinsic silicon thin films were studied, and the optical generation and recombination of free carriers were investigated [1, 2]. Also, the basic physical principles of simple silicon solar cells were studied, with an accent on amorphous-nanocrystalline layers as a promising composite material for high-efficiency solar cells of the so-called third generation. In the present work, the basic principles of pin-structure solar cells are studied with the one-dimensional computer modelling program AMPS-1D. The suggested solar cell models are based on a simple pin structure in order to compare the simulation results with others in the references. The computer modelling program allows solar cell parameter calculations and structure design simulations. By varying a characteristic set of solar cell parameters, such as the illumination spectrum, photon flux, absorption coefficient, boundary conditions, front and back contact parameters, and general silicon layer parameters such as doping and free carrier concentrations, mobility and gap-state defect distribution, the I-V characteristic, fill factor (FF) and efficiency (η) of a solar cell can be determined. In this work a few groups of simulations are carried out, with a homogeneous distribution of Si nanocrystals in the layer and with layers where the crystallinity and crystal sizes change across the layer. The calculated solar cell efficiencies of the modelled single and multilayer absorber structures are graphically presented and discussed.
The goal of this work is to suggest a possible application of a-Si:H/nc-Si:H/µc-Si:H film as the active part of a typical pin solar cell with respect to overall performance. A detailed comparison of the calculated data extracted from the defined model suggests an optimal absorber layer thickness in a composite silicon solar cell and leads to a better understanding of the effective solar cell thickness.
II. INTRINSIC LAYER MICROSTRUCTURE
Amorphous-nanocrystalline silicon (a-nc-Si) films a few hundred nanometers in thickness consist of a matrix of amorphous silicon with embedded silicon crystals of nanometric dimensions [3, 4]. This material has improved properties with respect to pure amorphous Si (a-Si), micro-crystalline Si (µc-Si) and bulk crystalline Si (c-Si) owing to the quantum confinement effect [5]. Compared to amorphous silicon, nc-Si has better electrical transport characteristics [6], the possibility to tailor the optical band gap [7] and resistance to light-induced degradation [8].
Micro-crystalline silicon (µc-Si) films of 1-2 micrometers in thickness consist of small crystal aggregates in the deeper layers and large grains/columns up to the film surface [9, 10]. On the other hand, micro-crystalline Si (µc-Si) and bulk crystalline Si (c-Si), compared to nc-Si, have better conductivity along the crystal grains, owing to higher free carrier mobility and a smaller number of grain boundaries. Therefore µc-Si layers have better electrical transport characteristics compared to a-nc-Si [11].
The main advantage of nc-Si in comparison with crystalline silicon is its higher absorption, which allows efficient solar cells in a thin-film device design. This material could contain a fraction of "small crystals" expected to show an increase in the optical gap due to quantum confinement, predictable according to effective mass theory or quantum dots.
As a result of PECVD and HWCVD deposition techniques, micro-crystalline silicon thin films typically form a microstructure as presented in Figure 1: a complicated mixture of crystalline silicon (c-Si) grains, grain boundaries and/or amorphous-nano-crystalline hydrogenated silicon (a-nc-Si:H), often called "tissue". The approach followed in this paper is to compare, over as wide a range as possible of a-nc-µc-Si:H layer samples, the microstructure (grain size, crystallinity and roughness) with the optical and transport properties, and to find the overall performance. We have used the fact that the a-nc-µc-Si:H microstructure changes with the thickness of the sample [11].
AFM and SEM micrographs of some authors [10, 12] reveal the surface morphology of micro-crystalline silicon films; a sample 1.4 micrometers thick is published in [12].
Figure 1. Typical SEM micrograph of a composite silicon (a-nc-µc-Si:H) thin film 1.4 µm thick produced with the Cat-CVD (Hot Wire CVD) technique, published in [12].
The large complexity of the microstructure in hydrogenated microcrystalline silicon and the existence of tissue with at least two different sizes of crystallites determine the optical properties, and the modeled complex free carrier mobility is therefore based on the mechanism of transport.
III. OPTICAL PROPERTIES
The optical properties of composite silicon thin films strongly depend on the production conditions and on the structural properties of the thin film. The spectra of the absorption coefficient, α(E), of a typical nc-Si thin film calculated from transmittance, Fourier transform photocurrent spectroscopy (FTPS) and photothermal deflection spectroscopy (PDS) are shown in Figure 2. The absorption coefficient data used in our calculations and device simulations were published in the literature [3, 4]. Amongst others, sample number two (K2) was chosen in the simulation modeling because of the convenient value of its band gap energy (Eg = 1.82 eV), caused by the lower crystalline fraction (vol. 30%).

Figure 2. Distribution of the absorption coefficient of nc-Si (black line), a-Si (green line) and µc-Si (red line), calculated from FTPS and PDS [4].

IV. SIMULATION MODEL

The computer modeling software AMPS-1D can simulate all of the modeled semiconductor and photovoltaic device structures. In the present version of AMPS-1D the user can choose one of two different calculation models: the Density of States (DOS) approach and the Carrier Lifetime Model (CLM). The DOS approach is better suited for amorphous and nano-crystalline silicon thin-film layers due to the large defect densities in midgap states [13, 14].
A model of a photovoltaic device can be simulated through the optical and material parameters of the designed device structure. The material and optical properties affect the electrical parameters in three correlated differential equations: Poisson's equation, the electron continuity equation and the hole continuity equation. Those three equations are solved simultaneously under non-equilibrium steady-state conditions (i.e., under the effect of light, voltage bias or both) by using the method of finite differences and the Newton-Raphson technique. The used equations are:
Poisson's equation,

(d/dx)[ε(x) ∙ dψ/dx] = -ρ(x),   (1)

the electron continuity equation,

(1/q) ∙ dJn(x)/dx = R(p(x), n(x)) - Gop(x),   (2)

and the hole continuity equation,

(1/q) ∙ dJp(x)/dx = Gop(x) - R(p(x), n(x)).   (3)
All carriers in the semiconductor layers can be described by the net charge density ρ(x), expressed as

ρ(x) = q ∙ [p(x) - n(x) + pT(x) - nT(x) + ND - NA],   (4)

and the electrostatic field E is defined as

E = -dψ/dx.   (5)
In these equations ε is the dielectric constant, E the electrostatic field, ψ(x) the position of the local vacuum level, x the position in the device, n and p the extended-state densities in the conduction and valence bands, respectively, pT and nT the trapped hole and electron population densities, NA the acceptor doping density, ND the donor doping density (if it exists), q the electron charge, R(x) the recombination rate, Gop(x) the optical generation rate of free electron-hole pairs, and Jn and Jp the electron and hole current densities, respectively. The term R(x) is the net recombination rate resulting from band-to-band (BTB) direct recombination and Shockley-Read-Hall (SRH) indirect recombination traffic through gap states. The model used in AMPS for indirect recombination assumes that the traffic back and forth between the delocalized bands and the various types of localized gap states is controlled by SRH capture and emission mechanisms. Since AMPS has the flexibility to analyze device structures which are under light bias (illumination) as well as voltage bias, the continuity equations include the term Gop(x), which is the optical generation rate as a function of x due to externally imposed illumination.
Generally, three state variables completely define the state of a device: the local vacuum energy level ψ and the quasi-Fermi levels EFp and EFn. Once those three dependent variables are calculated as functions of position in the device, all other parameters can be determined as functions of them. In thermodynamic equilibrium the Fermi level is constant as a function of position, and hence the three equations (1)-(3) essentially reduce to Poisson's equation. Therefore, the local vacuum energy level is the only variable to solve for in thermodynamic equilibrium.

Otherwise, in non-equilibrium steady state, a system of three coupled non-linear second-order differential equations in the three unknowns (ψ, EFp, EFn) is obtained. Further calculation needs six boundary conditions, two for each dependent variable. The first two boundary conditions are modified versions of the ones used for solving Poisson's equation in thermodynamic equilibrium:

ψ(0) = ψb0 - ψbL - V   (6)

and

ψ(L) = 0,   (7)

where L is the total length of the modeled device, ψb0 and ψbL are determined by the electron affinities at x = 0 and x = L, respectively, and V is the applied voltage. The zero of the local vacuum energy level ψ is at the boundary point x = L.
four other boundary conditions are obtained from
imposing constraints on the currents at the boundaries at
26
Where Sn0Sp0 are surface recombination velocities for
electrons and holes respectively at the x=0 interface and
the quantities are the corresponding velocities at the x=L
interface. Strongly limited by thermionic emission largest
value of recombination velocities cannot over cross
107cms-1. In equations (8-9) n(0) and p(0) are the electron
and hole density at x=0, n(L) and p(L) are the same
values at x=L. Analogy, n0(0) and p0(0), n0(L) and p0(L)
are the electron and hole density at the thermodynamic
equilibrium at boundaries x=0 and x=L, respectively.
Now, when all conditions are defined, simultaneously
calculation of , EFp and EFn can be obtained.
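To make the numerical scheme concrete, the following is a minimal sketch of the equilibrium limit of this procedure (where, as noted above, the system reduces to Poisson's equation alone): a finite-difference discretization solved with damped Newton-Raphson steps. It is not the AMPS-1D implementation; the abrupt p-i-n doping profile, mesh and silicon-like constants are illustrative assumptions.

    import numpy as np

    # Minimal sketch: equilibrium band bending from Poisson's equation (1) with
    # the charge density of eq. (4), discretized by finite differences and solved
    # with damped Newton-Raphson steps. Doping, mesh and constants are assumed.
    q   = 1.602e-19              # C
    kT  = 0.02585                # eV at 300 K; numerically equals volts here
    eps = 11.9 * 8.854e-14       # F/cm, silicon-like dielectric constant
    ni  = 1.0e10                 # cm^-3, intrinsic carrier density
    L   = 1.2e-4                 # cm, 1200 nm device length
    N   = 601
    x   = np.linspace(0.0, L, N)
    h   = x[1] - x[0]
    dop = np.where(x < 0.1 * L, -1e18, np.where(x > 0.9 * L, 1e18, 0.0))  # ND - NA

    psi = kT * np.arcsinh(dop / (2.0 * ni))     # charge-neutral initial guess
    for it in range(200):
        n = ni * np.exp( psi / kT)              # free electrons (Boltzmann)
        p = ni * np.exp(-psi / kT)              # free holes (Boltzmann)
        rho = q * (p - n + dop)                 # eq. (4), trapped charge omitted
        # residual of eps * d2(psi)/dx2 + rho = 0 on interior nodes
        F = eps * (psi[:-2] - 2.0 * psi[1:-1] + psi[2:]) / h**2 + rho[1:-1]
        main = -2.0 * eps / h**2 - q * (n[1:-1] + p[1:-1]) / kT   # dF/dpsi_i
        off  = (eps / h**2) * np.ones(N - 3)
        J = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
        dpsi = np.linalg.solve(J, -F)
        psi[1:-1] += np.clip(dpsi, -kT, kT)     # damping; the contact nodes stay
        if np.max(np.abs(dpsi)) < 1e-9:         # fixed, mirroring conditions (6)-(7)
            break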
The model used in our simulation for the gap states consists of two exponential tail-state distributions and two Gaussian distributions of deep defect states [13].
V. SOLAR CELL PARAMETERS

In this paper we have simulated two types of single-junction solar cells, both with an absorber layer thickness of 1200 nm. The first type has the standard pin structure with one homogeneous absorber, (p-) a-Si:H/ (i-) a-Si:H/ (n-) a-Si:H, and the second type, with an inhomogeneous absorber, has a multi-layer structure with 9 intrinsic absorbers of various structural and therefore optical properties (Fig. 2). The layer thicknesses for the standard pin structure are as follows: (p-) 8 nm/ (i-) 1400 nm/ (n-) 15 nm.
We propose the implementation of a chemically textured zinc-oxide ZnO:Al film or SnO2 as the front TCO in our p-i-n solar cells, in combination with Ag as a textured back reflector and a reflection-enhancing dielectric layer for additional efficiency. Such modeled cells exhibit excellent optical and light-trapping properties, demonstrated by high short-circuit current densities.
A. Boundary conditions

The three governing equations (1), (2) and (3) must hold at every position in the device, and solving them involves determining the state variables ψ(x), EFp(x) and EFn(x). Since the equations are non-linear and coupled, they cannot be solved analytically; numerical methods must be utilized. Boundary conditions must be imposed on the set of equations. These are expressed in terms of conditions on the local vacuum level and on the currents at the contacts. To be specific, the solution to equations (1), (2) and (3) must satisfy the boundary conditions (6-11).
In the computer modeling program AMPS-1D, φb(0) is PHIB0, φb(L) is PHIBL, and S is the recombination speed for holes or electrons, depending on the carrier and its position. Parameter RF is the reflection coefficient at x=0 and RB the reflection coefficient at x=L. In the actual device, RF parameterizes the photon flux lost in transmission through the glass substrate and the TCO layer. Measurements of the internal optical losses of solar cells (in the visible spectrum) show a wavelength (λ) or energy dependence [5, 6, 15]. The reflection coefficient RB depends on the optical characteristics of the back electrode, which acts as an optical mirror and provides efficient light trapping. The properties of the front and back contacts used as model parameters are shown in Table I.
Table I. Boundary conditions in AMPS-1D: PHIB0 is φb(0) at x=0; PHIBL is φb(L) at x=L (total device length); SN0, SNL, SP0 and SPL are surface recombination speeds at x=0 and x=L (N = electron, P = hole); RF is the reflection coefficient at x=0; RB is the reflection coefficient at x=L.

    Front Contact              Back Contact
    PHIB0 = 1.730 eV           PHIBL = 0.120 eV
    SN0 = 1x10^7 cm/s          SNL = 1x10^7 cm/s
    SP0 = 1x10^7 cm/s          SPL = 1x10^7 cm/s
    RF = 0.250                 RB = 0.600

B. Solar cell design
The second type of solar cell model, with multilayer absorbers, is explained in detail in this part. The solar cell design in fact represents an a-Si:H/nc-Si:H/µc-Si:H inhomogeneous intrinsic absorber of a pin silicon solar cell with a thickness of 1200 nm (Fig. 3).

Figure 3. Drawing of the composite silicon (a-nc-µc-Si:H) thin film cross section of 1.2 µm thickness; it consists of multiple layers with different structure and absorption properties.

The design comprises a multi-layer structure with 9 individual intrinsic absorbers of arbitrarily modeled thicknesses and various structural and optical properties (Fig. 4). The solar cell model structure is as follows: a Transparent Conductive Oxide (TCO) front contact for collecting holes (0.3 µm), an a-Si:H (p-1) window layer (8 nm), a-nc-Si:H absorbers i1-2/i4-5 (100 nm), nc-Si:H absorbers i5-6/i6-7 (100 nm), nc-µc-Si:H absorbers i7-8/i8-9 (200 nm), a µc-Si:H absorber i9-10 (400 nm), an a-Si:H (n-11) back layer (15 nm), an Aluminum-doped Zinc Oxide (AZO) reflection layer (0.1 µm), Ag as a textured back reflector (0.5-2 µm), and an Aluminum back contact electrode (2-3 mm).
Figure 4. Model of the solar cell with multi-layer absorbers: (p-) is layer 1, (n-) is layer 11, and layers 2-10 are 9 individual intrinsic absorbers with various thicknesses, structural and optical properties [18].
The input parameters of all modeled solar cell layers needed to simulate the efficiency properties of an actual solar cell are not given in full because of their complexity. Selected published mobility parameters are taken from references [16, 17], and absorption coefficients from a-nc-Si:H and µc-Si:H samples. We used the standard boundary conditions and standard global illumination conditions (Table I): air mass 1/cos θ (AM1.5 spectrum), 1000 W/m², at a 300 K reference temperature.
C. Absorption coefficients

One set of absorption coefficient data used in our calculation and device simulation was measured on samples and published in the literature [3, 9]: the coefficients for an a-nc-Si:H sample and a µc-Si:H silicon layer. Amongst others, the a-nc-Si:H sample is chosen in this modeling because of the convenient value of its band gap energy (Eg = 1.82 eV), caused by the lower crystalline fraction (30 vol.%), its layer thickness (100 nm), and its calculated DC conductivities and free carrier mobility [2]. The presented modeled solar cell device (Fig. 4) consists of 9 absorber layers with 2 actual absorption coefficients and 7 approximated absorption coefficients between the a-nc-Si:H (front) and µc-Si:H (back) silicon layers. Therefore, the absorption coefficients of each individual absorber layer are modeled and proposed by a linear approximation-superposition according to the different microstructure (Fig. 3) along the solar cell's variable length (parameter x). According to the layer thickness, i.e. the calculated length x, each individual absorber layer has a different superposition ratio of the absorption coefficients between the a-nc-Si:H and µc-Si:H layers, as in the actual solar cell. For example: absorption layer 2 (i1-) in the modeled solar cell is the actual a-nc-Si:H sample with the measured absorption coefficient; absorption layer 3 (i2-) is an a-nc-Si:H modeled layer with a proposed absorption coefficient in the ratio a-nc-Si/µc-Si (0.25/0.75); absorption layer 4 (i3-) is an a-nc-Si:H modeled layer with a proposed absorption coefficient in the ratio a-nc-Si/µc-Si (0.35/0.65), etc. The last layers are: absorption layer 9 (i8-), an nc-µc-Si:H layer with a proposed absorption coefficient in the ratio nc-Si/µc-Si (0.85/0.15), and absorption layer 10 (i9-), the actual µc-Si:H sample with the measured absorption coefficient (Fig. 5).
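The superposition just described can be written in a few lines. In the sketch below the two end-member spectra are illustrative placeholders standing in for the measured a-nc-Si:H and µc-Si:H absorption data, while the layer weights follow the ratios quoted above (0.25, 0.35, ..., 0.85).

    import numpy as np

    # Linear approximation-superposition of absorption coefficients: each modeled
    # absorber mixes the two measured end-member spectra with a layer-specific
    # weight. The spectra below are placeholders, not the measured data.
    lam = np.linspace(300.0, 900.0, 61)                 # wavelength, nm
    alpha_anc = 1.0e5 * np.exp(-(lam - 300.0) / 150.0)  # stand-in for a-nc-Si:H, 1/cm
    alpha_uc  = 1.0e4 * np.exp(-(lam - 300.0) / 300.0)  # stand-in for uc-Si:H, 1/cm

    def mixed_alpha(w):
        """Absorption coefficient of a modeled layer with a-nc-Si fraction w."""
        return w * alpha_anc + (1.0 - w) * alpha_uc

    # layers 3..9 interpolate between the two measured end members
    weights = np.linspace(0.25, 0.85, 7)                # 0.25, 0.35, ..., 0.85
    layer_alpha = {layer: mixed_alpha(w) for layer, w in zip(range(3, 10), weights)}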
[Plot: absorption coefficient α (1/cm, logarithmic scale) vs. wavelength λ (300-900 nm): the a-nc-Si:H sample, linear approximations 1-7, and the µc-Si:H sample.]
Figure 5. Absorption coefficients (α) of the nine absorber layers in the modeled solar cell [18].
D. Free carrier mobility

In mixed-phase silicon thin film layers the transport mechanism strongly depends on the carrier mobility (cm² V⁻¹ s⁻¹). The electron mobility µn and hole mobility µp depend on the crystal lattice temperature as well as on the donor-like and acceptor-like doping concentrations [18], the defect density [11] and the DC conductivity [19], suggesting at 300 K maximal electron mobility values of 1250 cm² V⁻¹ s⁻¹ and maximal hole mobility values of 400 cm² V⁻¹ s⁻¹ in bulk (intrinsic) crystalline silicon. For a-Si:H intrinsic silicon layers, the simulation values at 300 K are modeled as follows: (MUN) 10-20 cm² V⁻¹ s⁻¹, (MUP) 2-4 cm² V⁻¹ s⁻¹. For a-nc-Si:H thin film layers the values are (MUN) 100-250 cm² V⁻¹ s⁻¹, (MUP) 8-60 cm² V⁻¹ s⁻¹; for nc-Si:H layers (MUN) 400-650 cm² V⁻¹ s⁻¹, (MUP) 100-180 cm² V⁻¹ s⁻¹; for nc-Si:H/µc-Si:H layers (MUN) 800-1000 cm² V⁻¹ s⁻¹, (MUP) 200-300 cm² V⁻¹ s⁻¹; and for µc-Si:H layers (MUN) 1200-1250 cm² V⁻¹ s⁻¹, (MUP) 300-400 cm² V⁻¹ s⁻¹.
VI. SIMULATION RESULTS AND DISCUSSION
The performance of the modeled solar cells was analyzed with respect to the current density (JSC) and efficiency (η) by incorporating the layer parameters into AMPS-1D. The first pin structure consists of a homogeneous intrinsic a-nc-Si:H absorber 1200 nm thick with a constant crystal fraction of Xc = 30%. In the second pin structure, of the same thickness, we modeled an inhomogeneous intrinsic absorber consisting of 9 individual homogeneous layers with different optical and electrical properties, reflecting the experimentally proven structural inhomogeneity. Under the standard simulation conditions defined earlier, the first structure's simulation results show predictable curves, with a maximal value of JSC = 17.421 mA/cm² at 1200 nm absorber thickness and a maximal efficiency of η = 13.992% at 935 nm (Fig. 6). In the second modeled structure the calculated values are different: JSC = 14.781 mA/cm² at 1200 nm absorber thickness and a maximal solar cell efficiency of η = 12.32% at 492 nm (Fig. 7). For both types of modeled devices the simulation shows a typically exponential rise of the current density in the first 300 nm of thickness, owing to excellent photon absorption and collection of photo-generated electron-hole pairs in an absorber structure with a 10-100 ns free carrier lifetime in the first few hundred nanometers of the a-nc-Si:H tissue. The current densities in our simulations never reached their calculated maximum in either case. In the first solar cell model simulation, the improvement in efficiency is supported by at least one order of magnitude better absorption in the a-nc-Si:H homogeneous absorber, good conductivity and therefore high free carrier mobility. The efficiency curve also points to saturation at 600 nm and a decrease at thicknesses higher than 900 nm. The calculated solar cell performance in the first model simulation is as expected from the physical nature of photo-generation and recombination of electron-hole pairs in intrinsic silicon with controlled general silicon layer parameters such as doping and free carrier concentrations, mobility and gap-state defect distribution.
[Plot: JSC (mA/cm²) and efficiency (%) vs. absorber thickness (0-1400 nm): current density JSC and efficiency curves for the homogeneous absorber.]
Figure 6. Graphical presentation of the calculated solar cell current density JSC (mA/cm²) and efficiency (η) in the case of a homogeneous absorber with a crystal fraction of 30%.
The second pin model solar cell design incorporates the experimentally proven structural inhomogeneity of silicon CVD thin films. The absorption coefficient decreases drastically through the structure, while the free carrier mobility increases, in accordance with the changes of the tissue structure. As a result of the different optical and electrical properties of the layers of the inhomogeneous solar cell structure, the calculated performance is as expected. The optimum efficiency of η = 12.32% is reached at 492 nm (Fig. 7), with a current density of JSC = 12.21 mA/cm² at the same absorber thickness.
[Plot: JSC (mA/cm²) and efficiency (%) vs. absorber thickness d (0-1400 nm) for the inhomogeneous absorber; the efficiency curve carries a third-degree polynomial fit, y = 2·10⁻⁸x³ − 4·10⁻⁵x² + 0.0307x + 5.8456.]
Figure 7. Graphical presentation of the calculated solar cell current density JSC (mA/cm²) and efficiency (η) in the case of an inhomogeneous absorber represented by 9 different absorber layers.
In Figure 7 the additional curve (black line) represents the third-degree polynomial fit of the efficiency curve, which allows the efficiency to be estimated at any absorber thickness (absorber dimension x).
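For instance, the fitted polynomial can be evaluated directly; note that the coefficients displayed on the plot are rounded by the charting software, so the values below only approximate the simulated efficiency curve.

    import numpy as np

    # Third-degree polynomial fit to the efficiency curve of Fig. 7;
    # x is the absorber thickness in nm, y the efficiency in %.
    # The displayed coefficients are rounded, so values are approximate.
    eff = np.poly1d([2e-8, -4e-5, 0.0307, 5.8456])
    for d in (300.0, 492.0, 900.0, 1200.0):
        print(f"d = {d:6.1f} nm -> eta ~ {eff(d):.2f} %")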
VII. CONCLUSION

In this study we have simulated two types of single-junction solar cells, with absorbers of a-nc-Si:H and a-Si:H/nc-Si:H/µc-Si:H tissues. A series of simulations was carried out to calculate the efficiency of the modeled solar cells by varying the properties of each of the layers within the ranges published in the literature. The obtained results clearly show that the cells reach their optimum efficiency at different thicknesses. In the case of the homogeneous silicon a-nc-Si:H tissue the expected efficiency is around 14%, and this value correlates with the crystal fraction, the absorption coefficient and the free carrier mobility, respectively. In the case of the inhomogeneous a-Si:H/nc-Si:H/µc-Si:H absorber the expected efficiency is around 12% or lower, depending strongly on the structure characteristics.

The observed specificity of the silicon layers in their optical and electrical properties can be explained as a consequence of the thin film deposition techniques forming regions of nano- and micro-crystals with arbitrary concentrations in an amorphous matrix, which determines the free carrier transport model.

The goal of this work is, however, quite ambitious in its aim to enable a predictive efficiency calculation of composite silicon solar cells before their production, provided the deposition techniques are known, constant and stable. By building a data matrix covering arbitrary crystal concentrations and deposition layer thicknesses, it could be possible to predict the solar cell efficiency using, for example, a principal method and a third-degree polynomial approximation such as the one derived here.
REFERENCES
[1] D. Gracin, K. Jurajić, I. Djerdj, A. Gajović, S. Bernstorff, V. Tudić, M. Čeh, "Amorphous-nanocrystalline silicon thin films for single and tandem solar cells", 14th Photovoltaic Technical Conference - Thin Film & Advanced Silicon Solutions, June 2012, Aix-en-Provence, France.
[2] V. Tudić, "AC Impedance Spectroscopy of a-nc-Si:H Thin Films", Scientific Research Engineering, July 2014, vol. 6, no. 8, pp. 449-461, doi: 10.4236/eng.2014.68047.
[3] J. Sancho-Parramon, D. Gracin, M. Modreanu, A. Gajović, "Optical spectroscopy study of nc-Si-based p-i-n solar cells", Solar Energy Materials & Solar Cells, 2009, vol. 93, pp. 1768-1772.
[4] D. Gracin, A. Gajović, K. Juraić, J. Sancho-Parramon, M. Čeh, "Correlating Raman-spectroscopy and high-resolution transmission-electron-microscopy studies of amorphous-nanocrystalline multilayered silicon thin films", Thin Solid Films, 2009, vol. 517, no. 18, pp. 5453-5458.
[5] A. M. Ali, "Origin of photoluminescence in nanocrystalline Si:H films", Journal of Luminescence, 2007, vol. 126, pp. 614-622.
[6] A. V. Shah, J. Meier, E. Vallat-Sauvain, N. Wyrsch, U. Kroll, C. Droz, U. Graf, "Material and solar cell research in microcrystalline silicon", Solar Energy Materials & Solar Cells, 2003, vol. 78, pp. 469-491.
[7] A. M. Ali, "Optical properties of nanocrystalline silicon films deposited by plasma-enhanced chemical vapor deposition", Optical Materials, 2007, vol. 30, pp. 238-243.
[8] S. Hazra, S. Ray, "Nanocrystalline silicon as intrinsic layer in thin film solar cells", Solid State Communications, 1999, vol. 109, pp. 125-128.
[9] D. Gracin, A. Gajović, K. Juraić, J. Sancho-Parramon, M. Čeh, "Correlating Raman-spectroscopy and high-resolution transmission-electron-microscopy studies of amorphous-nanocrystalline multilayered silicon thin films", Thin Solid Films, 2009, vol. 517, no. 18, pp. 5453-5458.
[10] J. Kočka, H. Stuchlíková, J. Stuchlík, B. Rezek, T. Mates, V. Švrček, P. Fojtík, I. Pelant, A. Fejfar, "Microcrystalline silicon - relation of transport and microstructure", Solid State Phenomena, 2001, vol. 80-81, pp. 213-224.
[11] J. Kočka, H. Stuchlíková, J. Stuchlík, B. Rezek, T. Mates, V. Švrček, P. Fojtík, I. Pelant, A. Fejfar, "Model of transport in microcrystalline silicon", Journal of Non-Crystalline Solids, 2002, vol. 299-302, pp. 355-359.
[12] H. R. Moutinho, C.-S. Jiang, J. Perkins, Y. Xu, B. P. Nelson, K. M. Jones, M. J. Romero, M. M. Al-Jassim, "Effects of dilution ratio and seed layer on the crystallinity of microcrystalline silicon thin films deposited by hot-wire chemical vapor deposition", Thin Solid Films, 2003, vol. 430, no. 1-2, pp. 135-140.
[13] A. Belfar, R. Mostefaoui, "Simulation of n1-p2 Microcrystalline Silicon Tunnel Junction with AMPS-1D in a-SiC:H/µc-Si:H Tandem Solar Cell", Journal of Applied Science, 2011, pp. 10.3923.
[14] S. Tripati, R. O. Dusane, "AMPS-1D simulation studies of electron transport in µc-Si:H thin films", Journal of Non-Crystalline Solids, 2006, vol. 352, pp. 1105-1108.
[15] R. H. Franken, R. L. Stolk, H. Li, C. H. M. van der Werf, J. K. Rath, R. E. I. Schropp, "Understanding light trapping by light scattering textured back electrodes in thin film n-i-p-type silicon solar cells", Journal of Applied Physics, 2007, vol. 102, p. 014503.
[16] D. Stieler, V. D. Dalal, K. Muthukrishnan, M. Noack, E. Schares, "Electron mobility in nanocrystalline devices", Journal of Applied Physics, 2006, vol. 100, doi: 10.1063/1.2234545.
[17] B. Van Zeghbroeck, "Mobility - Carrier Transport", Principles of Semiconductor Devices, ECEE University of Colorado, 2011.
[18] V. Tudić, "Modeling of Electric Characteristics of the Photovoltaic Amorphous-Nanocrystalline Silicon Cell", doctoral thesis, FER Zagreb, 2014, 245 pp.
[19] K. Shimakawa, "Photo-carrier transport in nanocrystalline silicon films", Journal of Non-Crystalline Solids, 2006, vol. 352, no. 9-29, pp. 1180-1183.
The investigation of influence of localized states
on a-Si:H p-i-n photodiode transient response to
blue light impulse with blue light optical bias
Marko Čović#, Vera Gradišnik# and Željko Jeričević*
# Engineering Faculty, Department of Electrical Engineering, Rijeka, Croatia
* Engineering Faculty, Department of Computer Engineering, Rijeka, Croatia
zeljko.jericevic@riteh.hr
Abstract - A series of experiments measuring the transient response of an a-Si:H pin photodiode to light impulses superimposed on constant light (optical bias dependence of the modulated photocurrent method - OBMPC) of the same wavelength (430 nm), at various reverse voltages on the photodiode, was performed in order to characterize the localized states of the energy gap of amorphous silicon and their influence on photocurrent degradation. The responses were analyzed as a sum of decaying exponential functions using the least squares method and a generalized Foss algorithm. This type of response is typical for independent relaxation processes running at the same time. The experiments and the subsequent data processing illustrate the feasibility of the method and give results for the transient response of the a-Si:H pin photodiode. The results strongly suggest two energy levels between 0.32 eV and 0.45 eV. These results were obtained by applying optical ac blue and dc blue bias light in a low frequency regime.
I. INTRODUCTION

Exponential decay is typical for single relaxation processes in physics and first-order chemical reactions in chemical kinetics, as well as in biology. In more complex situations, where a few independent processes of this type run in parallel, the summary measurement from the system consists of a sum of exponential functions. For example, in a mixture of radionuclides, each one decays independently and at a different rate controlled by its half-life. Although the separation of exponentials from a summary signal looks deceptively simple, it is actually a tough numerical problem because of the non-orthogonality of exponentials. Attempts to separate exponentials with close half-times iteratively by nonlinear least squares usually result in a large number of iterations and no convergence. For the analysis reported here we used the least squares method with a linearization step based on numerical integration. After the linearization, the solution of the multi-exponential problem is obtained by solving an over-determined system of linear equations followed by finding the roots of a polynomial. The number of exponentials in the signal dictates the degree of the polynomial, the rank of the linear system, and the multiplicity of the numerical integration. The advantage of accurate linearization is that the separation of exponentials becomes a non-iterative procedure and the condition number of the linear system can be used to control the quality of the solution. The procedure is completely general and initial guesses are not necessary. It is also simple to implement non-negativity constraints on a solution.
The complex nature of localized states, such as native
and metastable defects in a-Si:H [1], has an influence on
a-Si:H p-i-n photodiode transient response. Other authors
used the multiexponential trapping rate and modulated
photocurrent (MPC) technique [2] to determine
parameters of localized states throughout the entire energy
gap by employing frequency and temperature scans. We
examined the nature and the kinetics of light-induced
defects creation in a-Si:H films and photodiode and their
influence on photocurrent degradation. We measured and
analyzed the transient response of a-Si:H p-i-n photodiode
to blue light impulse superimposed to the blue light
optical bias (optical bias dependence of modulated
photocurrent method – OBMPC [2, 3]) at various reverse
bias voltages and one frequency. By means of OBMPC,
the trap and recombination localized states parameters
throughout the entire energy gap can be identified. To the
low-frequency MPC data the deeper recombination
centers also contribute [2]. The purpose of this work is to
identify the nature and role of trapping and recombination
process of mobile carriers. This was done under condition
of bias and modulated blue light of weak illumination
intensity, at low modulation frequency and at applied low
reverse bias voltages on a-Si:H p-i-n photodiode. In this
regime, the localized states with deeper energy levels in
low frequency regime can be identified. Based on
experimentally obtained results, the photodiode transient
responses are reconstructed with the help of numerical
modeling.
Under the described experimental conditions, the measured transient responses show the presence of one or two decaying exponential functions corresponding to two energy levels between 0.32 eV and 0.45 eV.
II. THEORY AND RESULTS
The basic idea of the computing method was first proposed by Foss [4], and was later used by Matheson [5] in chemistry and Jericevic [6] in biology. In all the above mentioned papers the method was developed and applied to special cases; no general solution was developed, although Foss claimed it is possible to construct one. The general solution was finally developed by Jericevic [7], and it represents the foundation on which the data processing described here is based.
The approach outlined here offers unprecedented flexibility in addressing the important problems of accurate and fast (real-time) processing for multiple models (an arbitrary number of exponential terms) using the same computing methodology. Previous implementations required that specialized subroutines be written for each individual case (mono-exponential, bi-exponential, tri-exponential, etc).
The major steps in our methodology are:
1. Linearization by the numerical integration method.
2. Solution of the linear system of equations, with or without non-negativity constraints.
3. Determination of the coefficients and roots of the polynomial equation(s) based on the linear system solutions.
4. Computation of the pre-exponential terms from the decay constants obtained in the previous step.
5. Testing and verification of the results by detailed error analysis.
The detailed development of the general solution is presented in [7]; here we give only a brief description of the final result.
General solution for

$$y = \sum_{i=1}^{N} A_i e^{-k_i t}$$

$$p_1 = -\sum_{i=1}^{N} k_i \qquad (1)$$

$$p_n = -\sum_{i=1}^{N!/[(N-n)!\,n!]} \; \prod_{m \in {}^{N}C_n} k_m \qquad (2)$$

$$p_N = -\prod_{i=1}^{N} k_i \qquad (3)$$

$$p_{N+1} = \sum_{i=1}^{N} A_i \qquad (4)$$

$$p_{N+2} = \sum_{i=1}^{N} A_i \sum_{j=1,\, j \neq i}^{N} k_j \qquad (5)$$

$$p_{N+n} = \frac{1}{(n-1)!} \sum_{i=1}^{N} A_i \sum_{j=1}^{(N-1)!/[(N-n)!\,(n-1)!]} \; \prod_{m \in {}^{N-1}C_{n-1},\, m \neq i} k_m \qquad (6)$$

$$p_{2N} = \frac{1}{(N-1)!} \sum_{i=1}^{N} A_i \prod_{j=1,\, j \neq i}^{N} k_j \qquad (7)$$
where the vector of parameters p is the solution of a linear system based on consecutive N-fold numerical integration of the multi-exponential equation in time t (N is the number of exponential terms in the summary signal y). From the first N equations, the vector of decay constants k is calculated as the roots of a polynomial; this set of polynomial relations is presented in brief as (1) to (3). Once the vector k is known, computing the parameter vector A (the vector of populations of states) from the last N equations, (4) to (7), becomes a linear problem. The details and complete development of the solution are in [7]. Our rationale for using the general multi-exponential solution is that we did not postulate the number of components (N) in advance. Instead, we let the data determine N by fitting with an increasing number of exponentials and subsequently used the best fit for photodiode characterization.
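A minimal sketch of steps 1-4 is given below. It assumes clean, densely sampled data and is only an illustration of the linearization-by-integration idea, not the authors' generalized implementation from [7] (no constraints, no error analysis).

    import numpy as np
    from scipy.integrate import cumulative_trapezoid

    def fit_sum_of_exponentials(t, y, n_terms):
        """Fit y(t) ~ sum_i A_i exp(-k_i t) by linearization through repeated
        numerical integration: a linear least-squares solve yields the
        characteristic-polynomial coefficients, whose roots give the decay
        constants; the amplitudes then follow from a second linear solve."""
        # repeated integrals Y1..YN of the signal (step 1)
        integrals, cur = [], y
        for _ in range(n_terms):
            cur = cumulative_trapezoid(cur, t, initial=0.0)
            integrals.append(cur)
        # regress y on [Y1..YN, 1, t, ..., t^(N-1)]; the polynomial columns
        # absorb the integration constants (step 2)
        M = np.column_stack(integrals + [t**m for m in range(n_terms)])
        coef, *_ = np.linalg.lstsq(M, y, rcond=None)
        beta = coef[:n_terms]
        # characteristic polynomial l^N - beta_1 l^(N-1) - ... - beta_N = 0
        # has roots l = -k_i (step 3); real for well-separated clean decays
        k = -np.roots(np.concatenate(([1.0], -beta))).real
        # pre-exponential terms from an ordinary linear fit (step 4)
        A, *_ = np.linalg.lstsq(np.exp(-np.outer(t, k)), y, rcond=None)
        return k, A

    # synthetic check with two components
    t = np.linspace(0.0, 5.0, 2000)
    y = 2.0 * np.exp(-1.3 * t) + 0.5 * np.exp(-4.0 * t)
    print(fit_sum_of_exponentials(t, y, 2))   # k ~ (1.3, 4.0), A ~ (2.0, 0.5)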
The a-Si:H p-i-n photodiode, due to the presence of localized states in the energy gap, shows a multi-exponential decay in its transient response to a light pulse when the light is switched off. The number of components included in the response was not known in advance.
In our experiments, the a-Si:H p-i-n photodiode is illuminated with monochromatic blue LED light (Kingbright FULL COLOR RGB LAMPS, 430 nm, IF = 20 mA) consisting of a constant (bias) and a pulsed (probe) light beam of the same, relatively weak, intensity and low frequency. Consequently, the electron-hole pair photogeneration rate contributes a constant and a pulsed (transient) mobile carrier density component. For blue light pulse and bias illumination, electron-hole pairs are generated near the front surface. Our sample of a-Si:H p-i-n is illuminated from the p-type layer, and we assume that electrons are the majority carriers in the i-type layer. Due to the trapping and release interaction of free carriers with band-gap localized states, the resulting transient photocurrent has a time delay with respect to the light excitation. Also, the photocurrent decay happens long before the photocurrent reduction that is due to free carrier recombination. Carriers from shallow traps are re-emitted soon after capture, but carriers in deep traps reside longer, are practically lost, and are the major cause of the transient photocurrent decay (base-line current tail). The transition of mobile carriers detected in the experiment relies only on the exchanges between localized gap states and extended states. The holes, upon blue light absorption, move directly to the front contact, and their contribution to the transient photocurrent is small [3]. The dc illumination determines the positions of the quasi-Fermi levels for electrons, Etn, and holes, Etp, which can be deduced from the measurement of the dc photocurrent from [8, 9]:
    EC − Etn = kT · ln(µn·NC·A·q·ξ / Iph,dc)    (8)

where the mobility is µn = 10 cm²V⁻¹s⁻¹, the effective density of states NC = 10²⁰ cm⁻³, ξ is the bias-voltage-dependent electric field, and the attempt-to-escape frequency is ν0 = 1·10¹² s⁻¹.

Figure 1. Measured and calculated switch-off transient response of the a-Si:H p-i-n photodiode to a blue light pulse at blue bias light and 1.5 V reverse bias voltage. "Measured" is the summary (experimental) signal, "Theory" is the fitted function, "Energies" are the fitted components and "Measured − Theory" is the difference between the experimental measurements and the fitted function.

Figure 2. Energies of the two energy levels obtained from the measured transient responses of the a-Si:H p-i-n photodiode by blue light pulse at blue bias light, at reverse bias voltages from 0 V to 2 V.

The time response characteristic of the dc part of the generation rate of a gap state at energy E is given by [9] as 1/τ(E) = cn·ndc + cp·pdc + en(E) + ep(E), where ndc and pdc are the free electron and hole densities, cn and cp the capture coefficients of electrons and holes, and en(E) and ep(E) the emission frequencies toward the conduction and valence band, respectively. The occupation function of the gap states fdc at constant illumination changes from 1 to 0 in two steps, which occur at the two quasi-Fermi levels of trapped carriers, Etn and Etp, at which the emission rates of electrons and holes equal the characteristic capture frequency ωc. Depending on the energy position of the localized states, the two quasi-Fermi levels divide the energy gap into electron trapping states (E > Etn), hole trapping states (E < Etp) and recombination states (Etp < E < Etn).

At bias voltage Vi, the characteristic frequency ωci, which is the capture rate of electrons and holes into each type of probed gap state, is given by [2]

    ωci = ndc·cni + pdc·cpi    (9)

where ndc (pdc) is the free electron (hole) density and cni (cpi) the capture coefficient for an electron (hole). The applied reverse bias voltage scan thus provides the spectroscopy of the gap states instead of a temperature scan.

To characterize the transient behavior, two other energies have to be introduced, comparing the characteristic time response of the gap states with the characteristic time given by the period of the ac signal. In the MPC (Modulated PhotoCurrent) experiment the ac behavior is characterized by comparing the inverse of the characteristic time response of the localized gap states, 1/τ(E), with the angular frequency ω of the ac signal. If ω < ωc, then 1/τ(E) > ωc > ω. The distribution energy is described by the relation from [2, 8]

    EC − Eωn = kT · ln(ν0/ω)    (10)

At bias voltage Vi, Eωni is close to Etni, i.e. Eωni ≈ Etni. The region where ω << ωci, with ω comparable to ωci, is the so-called low frequency regime.

All the states have a characteristic response time shorter than the period of the ac signal. The phase shift is low and is not induced by the trapping-and-release events of the MPC. The energies fall among the recombination centers, and information on the gap states at Etp and Etn is obtained. In the low frequency (LF) regime, the DOS (Density Of States) at the quasi-Fermi level of trapped carriers is, as we propose, given by a modified form of the expression from [9]:

    N(Etn) ≈ Gdc·td / (kB·T·(Tp/2))    (11)

where td is the delay time, Tp the pulse period, Gdc the dc generation rate, kB the Boltzmann constant and T the temperature.

The transient photocurrent decay is due to the transit time, which is voltage dependent. Due to the localized states, the transit time has two components, corresponding to the two energies, and is expressed [3] as

    tt = t0 + Σi N(Ei)·σ·vth·(d/(µn·ξ))·νi⁻¹·e^(Ei/kT)    (12)

where t0 is the time the electron spends in the conduction band, vth the thermal velocity, σ the electron capture cross-section, N(Ei) the discrete localized states at energy levels Ei, ξ the electric field, µn the electron mobility, and d the intrinsic layer width.
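As a worked check of eq. (10) against the values quoted below: with ν0 = 1·10¹² s⁻¹, the 3 ms pulse period used in the experiment, and kT at room temperature, the probed energy comes out near the reported EC − Eωn = 0.5 eV.

    import math

    # Eq. (10): EC - E_wn = kT ln(nu0 / omega), with values from the text.
    kT    = 0.02585                    # eV at 300 K
    nu0   = 1.0e12                     # attempt-to-escape frequency, 1/s
    omega = 2.0 * math.pi / 3.0e-3     # angular frequency of the 3 ms pulse, rad/s
    print(kT * math.log(nu0 / omega))  # ~ 0.52 eV, close to the reported 0.5 eV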
The a-Si:H p-i-n photodiode structure used in our experiment is well described in [10], and its transient response to light pulses of blue, green and red light at 2 V reverse bias in [11].

The p-i-n structure was deposited on transparent conductive oxide (TCO) coated glass from undiluted SiH4 by plasma-enhanced CVD and is as follows: glass/TCO/p-type (5 nm)/i-type (300 nm)/n-type (5 nm)/Al back contact, as described in [10]. The n-type layer was made by adding phosphine and the p-type by adding diborane to the gas mixture. The back contact was aluminum deposited by evaporation. The active surface area of the photodiode was 0.81 cm². Photo-illumination was
obtained through the bottom p-type layer. The transient response of the a-Si:H photodiode was measured as a response to light pulses superimposed on the constant light (optical bias dependence of the modulated photocurrent method - OBMPC) of the same wavelength of 430 nm, at reverse voltages from 0 V to 2 V.
Samples were measured at room temperature. The photocurrent was measured directly on a 10 kΩ load resistor. Two blue (B) LEDs from a Multicolor LED lamp, one for the probe (ac) and one for the pump (dc bias), were used in the experiments, emitting at 430 nm. The emitted photons have energies higher than the band gap energy. The optical light powers were defined at 20 mA LED bias current, and the probe pulse period was 3 ms with a 50% duty cycle.
The measured a-Si:H photodiode switch-off transient response to 3 ms pulses of blue light at bias blue light and 1.5 V reverse bias is shown in Fig. 1. The responses at bias voltages of 0 V, 0.5 V, 1 V, 1.5 V and 2 V were analyzed as a sum of decaying exponential functions using the least squares method and the generalized Foss algorithm described above. From Fig. 1 it is evident that for 1.5 V reverse bias voltage the state with the shallower energy (E1 = 0.4397 eV) and corresponding pre-exponential term 2.133·10⁻⁶ has a small contribution to the total photocurrent. The deeper energy state (E2 = 0.4492 eV), with a 1.2886·10⁻⁵ pre-exponential term, prevails in the transient response. It can be concluded that the second energy state behaves as a deep acceptor localized state. The calculated value of the DOS is 5·10¹⁴ cm⁻³eV⁻¹, centered at EC−Ei = 0.45 eV, with EC−Etn = 0.66 eV and EC−Eωn = 0.5 eV. The calculated energies of the localized energy levels are presented in Fig. 2. The summary measurements from the system consist of the sum of two exponential functions in all the cases, as shown in Fig. 1. The pre-exponential factor, which yields the information about the present species of localized states, is shown in Fig. 3. The pre-exponential term of the deeper gap states, those nearer the midgap, was higher than that of the shallower energy levels. These results agree with those obtained by other authors [2], where the capture coefficients of the gap states closer to the midgap were higher than those of the shallow energy levels.

Figure 3. Pre-exponential terms obtained from the measured transient responses of the a-Si:H p-i-n photodiode by blue light pulse at blue bias light, at reverse bias voltages from 0 V to 2 V.
III. CONCLUSION

Our experiments indicate that the used frequency of the modulated light is lower than the critical frequency and therefore falls in the low frequency regime. The experiments were done using the same light intensity for the bias and the modulated light. Under those conditions the trap centers act as recombination centers and influence the a-Si:H p-i-n photodiode transient response.

The results of the OBMPC experiment have been collected and analyzed in the time domain to characterize the behavior of the a-Si:H p-i-n photodiode. For the selected light pulse frequency and the low bias voltages on the photodiode, the LF regime can be used to determine the DOS values and energy levels. Our results show the presence of two energy levels and their influence on the a-Si:H p-i-n transient response to blue light pulses at bias blue light. The pre-exponential term of the deeper gap states, those nearer the midgap, was higher than that of the shallower energy levels. These results agree with those obtained by other authors [2], where the capture coefficients of the gap states near the midgap were found to be higher than those of the shallow energy levels.
ACKNOWLEDGMENT

We thank the referees, whose comments improved the paper, and DKJ for helping with the English.
REFERENCES
[1] J. Melskens et al., IEEE J. Photovolt., vol. 4, no. 6, pp. 1331-1336, 2014, doi: 10.1109/JPHOTOV.2014.2349655.
[2] M. Pomoni, P. Kounavis, Phil. Mag., vol. 94, no. 21, pp. 2447-2471, 2015.
[3] D. S. Shen, S. Wagner, J. Appl. Phys., vol. 79, pp. 794-801, 1996.
[4] S. D. Foss, Biometrics, vol. 26, pp. 815-821, 1970.
[5] I. B. C. Matheson, Anal. Instr., vol. 16, pp. 345-373, 1987.
[6] Z. Jericevic et al., Adv. Cell Biol., vol. 3, pp. 111-151, 1990.
[7] Z. Jericevic, "Method for Fitting a Sum of Exponentials to Experimental Data by Linearization Using Numerical Integration Approximation, and Its Application to Well Log Data", US Patent 7,088,097, 2006.
[8] P. Kounavis, J. Appl. Phys., vol. 77, pp. 3872-3878, 1995.
[9] J.-P. Kleider et al., Phys. Stat. Sol. C, vol. 5, pp. 1208-1226, 2004.
[10] V. Gradisnik et al., IEEE Trans. Electron Devices, vol. 49, pp. 550-556, 2002.
[11] V. Gradisnik et al., IEEE Trans. Electron Devices, vol. 53, pp. 2485-2491, 2006.
Analysis of Electrical and Optical Characteristics
of InP/InGaAs Avalanche Photodiodes in Linear
Regime by a New Simulation Environment
Tihomir Knežević and Tomislav Suligoj
University of Zagreb, Faculty of Electrical Engineering and Computing, Micro and Nano Electronics Laboratory,
Croatia
tihomir.knezevic@fer.hr
Abstract - The linear characteristics of InP/InGaAs avalanche detectors are modeled and numerically analyzed by developing a new TCAD-based simulation environment. The temperature dependency of the impact ionization coefficients in InP is fitted for the 200 K to 300 K temperature range. Adjustment of the model parameters for the simulation of the dark current sources in the InP and InGaAs materials is performed in the same temperature range. The optical constants of the InGaAs material used in the layer stack are fitted to account for the absorption in the material over the wavelength range between 0.9 and 1.7 µm. The dark current and the I-V characteristics under illumination are simulated and analyzed. The impact of the operating temperature on responsivity, breakdown voltage and dark current is analyzed. The excess noise factor is also calculated. Process simulations of Zn diffusion into InP are included in the TCAD simulator and the impact of realistic diffusion profiles on the diode characteristics is assessed. The dark current for the structure with a diffused Zn p+ region, extracted at an operating temperature of 200 K at 90% of VBR, decreases by a factor of 1.7 compared to the structure with a box-like constant concentration p+ region.
I. INTRODUCTION
Low-light detection in the near-infrared range is
commonly achieved by InP/InGaAs heterostructures
employed in separate absorption, grading, charge and
multiplication (SAGCM) avalanche photodiodes (APD).
Avalanche photodiodes can be operated in linear or "Geiger" mode. In comparison to standard pin photodiodes, InP/InGaAs APDs operated in linear mode have a higher sensitivity, which makes them the device of choice for optical communication systems [1].
Photodiode detectors operated in “Geiger” mode, above
breakdown voltage, are called single-photon avalanche
diodes (SPADs). Single-photon detection in the
wavelength range above 1 µm is important for quantum
cryptography [2], eye-safe laser ranging (Light Detection
And Ranging – LIDAR) [3], time-resolved spectroscopy,
photon-counting optical communication [4], etc. In "Geiger" mode, photogenerated carriers can produce a self-sustaining avalanche. The avalanche is stopped by using a quenching circuit [5] and the SPAD is then ready to detect a new photon.
In SAGCM structures, InGaAs (In0.53Ga0.47As) is used as the absorbing region and its lattice is matched to that of InP. The bandgap of the InGaAs layer is 0.75 eV at room temperature and the layer is used for detection of light with wavelengths in the range between 0.9 and 1.7 µm. The SAGCM
structure provides a way to limit the dark current due to the
tunneling in a narrow bandgap InGaAs layer. High-field
multiplication region is located in the low-doped InP layer
while the charge layer limits the spread of the electric field
to the absorption region. Reduced tunneling and impact
ionization in the InGaAs layer contribute to the decreased
dark current and improved overall performance of the
photodetector.
Thermally generated carriers, tunneling currents and
background photons all give rise to dark current, which
decreases the sensitivity of the APD operated in the linear
mode. The same is true for SPADs where the unwanted
carriers can trigger an avalanche. This introduces noise in
the operation of the SPAD called Dark Count Rate (DCR).
Device structure, e.g. layer stack thicknesses and doping concentrations, can impact the performance of the APDs by changing the magnitude of the dark current coming from trap-assisted tunneling (TAT) [6]-[8]. On the other hand,
thermally generated dark current is commonly reduced by
decreasing the operating temperature of the InP/InGaAs
APDs.
The key element in designing high-performance APDs is the ability to predict the device behavior for different structure parameters. There are simulator environments and analytical models capable of analyzing and simulating the optical and electrical characteristics of InP/InGaAs photodiodes [6], [9]-[13]. However, none of them exploits the functionality of TCAD software to model the linear InP/InGaAs APD characteristics in the 200 to 300 K temperature range for realistic p+ region doping profiles.
A new simulation environment using Sentaurus TCAD
is developed. This TCAD-based environment is capable of
simulating electrical and optical characteristics of the
InP/InGaAs APDs. In this paper, fitted TCAD models
enable the simulations of the avalanche generation in InP, thermal generation in InP, InGaAs and InGaAsP, and trap-assisted tunneling in InP. Process simulations of Zn diffusion into InP are also implemented in the simulation
environment. Optical and electrical simulations of the
active part of the structure proposed by Liu et al. [6] are
performed for temperatures of 200 K and 300 K. Excess
noise factor is simulated for the same structure. Impact of
MIPRO 2016/MEET
the realistic p+ region diffusion profiles on the electrical
characteristics is also assessed.
II. DEVICE SIMULATIONS
A. Fitting of the physical model parameters for
InP/InGaAs APD device simulations
Commercially available Sentaurus TCAD software from Synopsys is used for the simulations of the InP/InGaAs structure. The TCAD software is capable of performing both device and process simulations. The device simulator [14] can be used for electrical, optical and thermal simulations of a user-defined geometry with different materials. However, the simulator provides full functionality of all models only for silicon. In order to use the physical models for simulations of InP, InGaAs and InGaAsP materials at low temperatures, the physical model parameters should be fitted and properly tested.
The doping profile of the analyzed InP/InGaAs APD structure is shown in Fig. 1 (a). The parameters of the structure, such as doping concentrations and layer thicknesses, are almost identical to those in [6]. A buffer layer with a thickness of 2 µm is defined on top of the bulk n+ InP region. The buffer layer is followed by the InGaAs absorption region with a thickness of 3 µm. Between the InGaAs layer and the InP there is an InGaAsP grading layer, which serves to improve the transient characteristics of the device. The field stop or charge region, with a thickness of 1.2 µm and a doping
concentration of 2.1 · 1016 cm-3 is used to limit the spread of
the electric field into the InGaAs layer. Low-doped n- InP
layer is defined on top of the charge layer. P+ region is
defined as a highly doped box-like p+ region with the
constant doping concentration of 2 · 1018 cm-3, in the first
approximation. Multiplication region is defined to be
0.5 µm thick. Band diagram of the structure at 0 V, 300 K
is shown in Fig. 1 (b). The difference in the bandgap of the
InP and InGaAs materials causes valence and conduction
band discontinuities. Valence band discontinuity can limit
the speed of the device since the generated holes have to
overcome the potential barrier. Grading layer decreases the
valence discontinuity increasing the response speed of the
device.
In order to obtain the current-voltage characteristics of
the InP/InGaAs diode, the impact ionization coefficients
for InP material must be fitted first. TCAD simulator
provides different models capable of modeling impact
ionization coefficients in wide electric field and
temperature range. However, the temperature dependency
of the used Okuto-Crowell model for impact ionization in
InP could not be obtained just by fitting the appropriate
coefficients. Therefore, the parameters for the electron and
hole impact ionization coefficients a, b and δ are fitted in
the different temperature steps to the impact ionization
values from the analytical model from [10]. In [10] the
analytical expression was constructed to obtain best fit to
the measured data and represents a quasi-physical model.
The comparison of the fitted impact ionization coefficients
for holes and electrons to the values obtained by the
analytical model for temperatures of 200 K and 300 K is
plotted in Fig. 2. Excellent fit to the experimental data can
be achieved. Using the fitted impact ionization coefficients,
the current-voltage characteristics of the structure proposed
in Fig. 1 are simulated. The extracted breakdown voltage (VBR) at 300 K is 79 V. The measured VBR for the same structure is 75 V [6], which is in good agreement with the simulations.
Identifying the dark current sources at different temperatures is very useful for proper modeling and optimization of the electrical and optical characteristics of APDs operated in linear mode. The current-voltage characteristics of linear APDs contain the information on the dark current sources. Fitting the model parameters of TAT for InP and of SRH for InP, InGaAs and InGaAsP at various temperatures is a difficult task due to the lack of experimental data. However, there are plenty of literature sources describing the impact of these dark current sources on the DCR of SPADs at various temperatures. We used numerical computation to calculate the probabilities that a hole or an electron will initiate an avalanche in [15]. The electron and hole avalanche probabilities are used together with the carrier generation rate profiles obtained from TCAD simulations to calculate the DCR. The parameters of the TCAD models for SRH in InP, InGaAs and InGaAsP and for TAT in InP are fitted, and their contributions to the DCR show excellent agreement with the measured and simulated data for various temperatures and device geometry parameters.
Figure 1. Cross-section of the simulated structure. (a) Doping profile
of the SAGCM InP/InGaAs APD. (b) Band diagram of the structure at
0 V; 300 K.
[Plot: ionization rate (cm⁻¹, logarithmic scale) vs. 1/E (cm/MV, 2.0-4.5): analytical model [10] and Okuto-Crowell fit, for electrons and holes at T = 200 K and 300 K.]
Figure 2. Fitted Okuto-Crowell impact ionization coefficients for electrons and holes for temperatures of 200 K and 300 K.
B. Electrical and optical analysis of the InP/InGaAs
APD
The fitted parameters for the dark current generation are then used in current-voltage simulations of the structure depicted in Fig. 1. Simulation results of the current-voltage characteristics for temperatures of 200 K and 300 K are shown in Fig. 3. For both temperatures, the majority of the dark current originates in InP. The contribution to the dark current from InGaAs and InGaAsP starts at approximately 40 V, the punchthrough voltage at which those regions become fully depleted. The contribution to the dark current from InGaAsP is negligible at both temperatures. At 300 K, the contribution to the total dark current from InGaAs is comparable to that from InP. On the other hand, the contribution of the dark current from InGaAs at 200 K is almost two orders of magnitude lower than that from InP. Furthermore, there is an increase of the dark current coming from InP starting at around 35 V; its origin is TAT in the InP layer. These results are in agreement with results reported in the literature, where the dominant mechanism determining the DCR at lower temperatures is TAT. TAT is caused by the high electric field in the multiplication region and can be controlled by decreasing the trap concentration in the region or by adjusting the geometry of the device and increasing the multiplication region thickness.
Optical simulations of the structure are performed and the results of the current-voltage characteristics are shown in Fig. 4 (a). Simulations are done for temperatures of 200 K and 300 K.
[Plot: current density (A/µm², logarithmic scale) vs. reverse voltage (20-80 V) at T = 200 K and 300 K: total dark current and the contributions from InP, InGaAs and InGaAsP.]
Figure 3. Current-voltage characteristics of the proposed InP/InGaAs
APD structure at temperatures of 200 K and 300 K. Symbols: total
current; lines: contribution to the dark current from InP, InGaAs and
InGaAsP.
[Plots: (a) current density (A/µm², logarithmic scale) vs. reverse voltage (0-80 V): dark current and current under illumination (optical generation at λ = 1.5 µm, I = 10⁻³ W/cm²) at T = 200 K and 300 K; (b) gain vs. reverse voltage (20-80 V) at T = 200 K and 300 K.]
Figure 4. (a) Current-voltage and (b) gain characteristics of the
illuminated InP/InGaAs APD for temperatures of 200 K and 300 K.
The complex refractive index of the In0.53Ga0.47As material is not available in the simulator, so a fit to the complex refractive index from [16] is performed. The structure is exposed to light with a wavelength of 1.5 µm and an intensity of 10⁻³ W/cm². For both temperatures, the punchthrough voltage is around 40 V. On the other hand, the breakdown voltage changes from 58 V to 79 V for the temperatures of 200 K and 300 K, respectively. A punchthrough voltage of 60 V is reported in [6]. The difference can be attributed to variations of the doping profiles and of the thickness of the multiplication region. The current-voltage characteristics of the illuminated structure are used to calculate the gain characteristics of the InP/InGaAs diode. Unity gain is defined at the diode voltage where the quantum efficiency reaches 80%. The gain characteristics are depicted in Fig. 4 (b). Since the punchthrough voltage is almost the same for both temperatures and the breakdown voltage is smaller at 200 K, the multiplication gain increases more swiftly for the device operated at 200 K.
C. Excess noise characteristics
Statistical fluctuation of the avalanche process generates noise in the electrical current. In the operation of a linear device, knowledge of the noise that the APD introduces into the electronic system is of utmost importance. The analytical expression for the calculation of the excess noise factor (F) is derived in [17]:
factor (F) is derived in [17]:
= 1 − (1 − ) ∙
ሺெିଵሻమ
ெమ
,
(1)
where M is multiplication gain and k is the ratio of the
maximum ionization coefficients of electrons and holes.
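Equation (1) is easy to evaluate numerically. In the snippet below the k values are back-solved assumptions chosen so that F(10, k) reproduces the excess noise factors reported in the next paragraph; they are not values stated by the authors.

    def excess_noise_factor(M, k):
        """Excess noise factor from eq. (1)."""
        return M * (1.0 - (1.0 - k) * ((M - 1.0) / M) ** 2)

    print(excess_noise_factor(10.0, 0.30))   # ~ 4.3
    print(excess_noise_factor(10.0, 0.33))   # ~ 4.6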
The excess noise factor is calculated for the InP/InGaAs APD structure for temperatures of 200 K and 300 K and is depicted in Fig. 5. The multiplication gain as a function of voltage is determined by exposure to light with a wavelength of 1.5 µm and intensity of 10⁻³ W/cm². The ionization coefficient profiles for electrons and holes are extracted from 1D simulations at the same bias voltages. The excess noise factors at M = 10 are 4.3 and 4.6 for temperatures of 200 K and 300 K, respectively. Using the APD at lower temperatures improves the overall sensitivity of the device for optical detection due to the decrease of the dark current. Lowering the temperature also decreases the excess noise factor. Besides the temperature, the excess noise factor depends on the structure of the layer stack of the APD, which is not analyzed in this paper.
III. PROCESS SIMULATIONS
A. Fitting of the Zn diffusion model parameters
Zn diffusion is commonly used in the formation of the
pn-junctions in InP/InGaAs APDs. Process simulations of
the Zn diffusion into InP are also added to the TCAD
simulation environment. Currently, Zn diffusion into InP is
not modeled in the Sentaurus Process [21]. Therefore, we
performed calibration of the diffusivity models in order to
simulate the Zn diffusion. Device simulations using the Zn
diffusion profiles are important for obtaining realistic
electrical and optical characteristics of APDs. Process
simulations of the Zn diffusion along with device
simulations can also be used in the design of the guard rings
of InP/InGaAs APDs.
SIMS profiles of Zn diffusion into InP are reported in
various literature sources [18]-[20]. Mechanisms governing
Zn diffusion and the diffusion models are discussed in [18].
The diffusion is dominated by interstitial-substitutional
mechanism where Zn diffuses as a singly ionized
interstitial. Zn diffusivity is proportional to the hole
concentration and the background concentration can
significantly reduce the Zn diffusion. Using the SIMS
profiles from the literature and calibrating the simulator
parameters, Zn diffusion was simulated in Sentaurus
Process simulator.
Sentaurus Process offers a number of different diffusion and impurity activation models that could be used for modeling the Zn diffusion. We focused on the constant diffusion model and the Fermi diffusion model. Contrary to the constant diffusion model, the Fermi diffusion model takes into account the dependency of the diffusivities on the electron (hole) concentration [21]:
    ∂CA/∂t = ∇ · Σ(X,c) [ DAXc · (n/ni)^(−c−z) · ∇( CA+ · (n/ni)^z ) ]    (2)

    DAXc = D0 · exp(−EA/kT)    (3)

where CA is the concentration of substitutional dopants, CA+ the active portion of CA, c the charge state of the point defect, z the charge state of dopant A, ni the intrinsic concentration, n the electron (hole) carrier concentration, X either an interstitial or a vacancy, k the Boltzmann constant, T the temperature, and D0 and EA the Fermi diffusion constants that can be calibrated in the process simulator. Dopant activation is modeled by a solid solubility model whose parameters are taken from [18].
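The hole-enhanced diffusivity is the essential nonlinearity here, and even a toy version of it reproduces the box-like profile qualitatively. The sketch below is an explicit finite-difference model, not Sentaurus Process; the intrinsic density at the diffusion temperature, the mesh and the initial profile are assumptions, and the hole density is crudely approximated by the Zn concentration itself.

    import numpy as np

    # Toy model of concentration-enhanced Zn diffusion: D_eff ~ D * (p / ni),
    # with p approximated by the Zn concentration. D0 and EA are the fitted
    # constants quoted later in the text (1e-3 cm^2/s, 1.75 eV); ni is assumed.
    kT = 8.617e-5 * 773.0                  # eV at 500 C
    D  = 1.0e-3 * np.exp(-1.75 / kT)       # cm^2/s, eq. (3)
    ni = 1.0e15                            # cm^-3, assumed at 773 K
    Cs = 8.0e18                            # cm^-3, fixed surface concentration
    N, Lx = 201, 2.0e-4                    # mesh points, 2 um domain (cm)
    dx = Lx / (N - 1)
    C  = np.full(N, 1.0e14); C[0] = Cs
    dt = 0.2 * dx**2 / (D * Cs / ni)       # explicit stability limit
    for _ in range(int(30 * 60 / dt)):     # 30 min diffusion
        Deff = D * C / ni                  # hole-enhanced diffusivity
        Dmid = 0.5 * (Deff[:-1] + Deff[1:])
        flux = -Dmid * np.diff(C) / dx
        C[1:-1] -= dt * np.diff(flux) / dx
        C[0] = Cs                          # constant surface concentration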
[Plot: excess noise factor vs. multiplication gain (0-40) at T = 200 K and 300 K.]
Figure 5. Excess noise factor of the simulated InP/InGaAs APDs for temperatures of 200 K and 300 K.
Process simulations of the Zn diffusion using the proposed models are compared to the SIMS data from [18] and depicted in Fig. 6. A sulfur background concentration of 2·10¹⁶ cm⁻³ in InP is used. The diffusion assumes a constant surface concentration of 8·10¹⁸ cm⁻³. The diffusion time is 30 min at 475 °C. The Zn diffusion profile obtained by the constant diffusion model largely underestimates the shape of the real diffusion profile obtained from SIMS. The Fermi diffusion model with c = 2 has a sharper decrease of the Zn concentration near the pn-junction than the real SIMS profile. On the other hand, the Fermi diffusion model with c = 1 fits the SIMS data excellently. The values of the fitting constants D0 and EA for the interstitial are 10⁻³ cm²/s and 1.75 eV, respectively.
[Plot: Zn and sulfur concentration (cm⁻³, logarithmic scale) vs. depth (0-2 µm) for 30 min at 475 °C: SIMS profile from [18] compared with the constant diffusion model and the Fermi model with c = 1 and c = 2.]
Figure 6. Comparison of the Zn diffusion profile obtained by constant
diffusion model, Fermi model with c = 1 and c =2 with SIMS Zn
profiles from [18]
37
Process simulations:
30min @ 475° C
15min @ 500° C
Zn
10
19
10
18
10
17
-3
SIMS from [18]:
30min @ 475° C
15min @ 500° C
15 min @ 500°C
30 min @ 475°C
Sulfur
0.5
1.0
1.5
2.0
2.5
10
17
10
16
10
15
10
14
10
T = 500° C
Measurements from [18]
Linear fit
8
7
6
5
4
3
2
1
Process simulations
0
0
20
40
60
1/2
Time
Absorption
region
Buffer layer
Bulk InP
Diffused Zn
Zn concentration:
Constant concentration
Diffused profile
Constant Zn
n-type doping
Sulphur concentration
0
1
2
3
4
5
6
7
8
9
10
Depth (µm)
Figure 8. Cross section of the structure with p+ region realized as a
box-like constant concentration profile and diffused Zn profile
(b)
9
Multip.
18
Charge
region
3.0
Depth (µm)
Junction depth (µm)
10
p InP
80
100
1/2
(s )
Figure 7. Verification of the fitted Fermi diffusion model parameters.
(a) Comparison of simulated doping concentration profile with Zn
SIMS profile from [18]. (b) Comparsion of the simulated junction depth
with the extracted junction depth from [18].
measurement can be achieved for both sets of diffusion parameters. The pn-junction depth is extracted versus the square root of the diffusion time for a diffusion temperature of 500 °C and depicted in Fig. 7 (b). The simulated pn-junction depth is compared to the measurements [18]. The junction depth displays the same behavior versus t^1/2 and is a linear function of t^1/2. The simulation results show that the calibrated Fermi diffusion model provides a realistic model capable of assessing the Zn diffusion profile.
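As a quick numerical illustration of eq. (3), the minimal Python sketch below evaluates the interstitial Zn diffusivity with the fitted constants quoted above (D_AI,0 = 10^-3 cm²/s, E_AI = 1.75 eV). Note that under the Fermi model the effective diffusivity is additionally enhanced by the (n/n_i) terms, so these bare Arrhenius values alone do not determine the junction depth.

```python
import math

K_B = 8.617e-5  # Boltzmann constant, eV/K

# Fitted Fermi-model constants quoted in the text
D0 = 1e-3   # cm^2/s, pre-exponential diffusion constant D_AI,0
EA = 1.75   # eV, activation energy E_AI

def arrhenius_diffusivity(t_celsius):
    """Eq. (3): D = D0 * exp(-EA / (kB * T))."""
    temp_k = t_celsius + 273.15
    return D0 * math.exp(-EA / (K_B * temp_k))

for t_c in (475.0, 500.0):  # the two diffusion temperatures used above
    print(f"D({t_c:.0f} degC) = {arrhenius_diffusivity(t_c):.2e} cm^2/s")
```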
B. Device simulations of APDs with realistic p+ region
doping profile
The APD fabrication is simulated with the fitted
parameters of the Fermi model for Zn and the obtained
diffusion profiles are later used in the device simulations.
The doping concentration profiles of the structures where the p+ region is realized as a box-like constant concentration profile and as a diffused Zn profile are depicted in Fig. 8. The junction depth in both cases is 2 µm. The temperature used in the simulations of the Zn diffusion is 500 °C, and the simulated diffusion time of 11.2 min is needed to achieve the 2 µm junction depth. A constant surface concentration of 2·10^19 cm^-3 is assumed.
The current-voltage characteristics of the structures with the p+ region defined as a constant concentration and as a diffused Zn region are depicted in Fig. 9. The breakdown voltage of the structure with the p+ region simulated with the diffused Zn profile increases by approximately 1.5 V at both temperatures. The dark current at 90 % of VBR for the device operated at 300 K is 1.26·10^-13 A/µm² and 1.27·10^-13 A/µm² for the structures with the box-like constant concentration p+ region and the diffused Zn p+ region, respectively. On the other hand, the dark current at 90 % of VBR at 200 K is 1.5·10^-17 A/µm² for the structure with the box-like constant concentration p+ region and 8.9·10^-18 A/µm² for the structure with the diffused Zn p+ region. The dark current of the structure with the diffused Zn p+ region thus decreases by a factor of 1.7 compared to the structure with the box-like constant concentration p+ region. The decrease of the dark current is a result of the reduced TAT. For the structure with the realistic p+ region, the maximum electric field decreases, resulting in a smaller TAT. The reduced dark current can increase the sensitivity of the APD. The change in the breakdown voltage and current-voltage characteristics shows the importance of using realistic diffusion profiles in the simulations of the device, especially at lower temperatures. This is expected to become even more important in 2D simulations of the APD devices.
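As a quick arithmetic check of the quoted factor, the ratio of the two 200 K dark current densities indeed comes out at about 1.7; the snippet below simply restates the numbers from the text.

```python
# Dark current densities at 90 % of V_BR and T = 200 K (from the text)
j_box = 1.5e-17       # A/um^2, box-like constant concentration p+ region
j_diffused = 8.9e-18  # A/um^2, diffused Zn p+ region

print(f"reduction factor = {j_box / j_diffused:.2f}")  # ~1.69, i.e. ~1.7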
IV.
CONCLUSION
A comprehensive TCAD simulation environment capable of simulating both the device and process characteristics of InP/InGaAs APDs is demonstrated. This simulation procedure can be used in the analysis and optimization of complex InP/InGaAs APD structures both
in 1D and 2D. The knowledge of the realistic diffusion parameters is essential for a successful design and analysis of the guard rings at the periphery of the APD.

Figure 9. Comparison of the current-voltage characteristics for the structure with the box-like constant concentration p+ region and the structure with the diffused Zn p+ region at temperatures of 200 K and 300 K.
Analysis of the linear characteristics of an InP/InGaAs APD is demonstrated. Fitted model parameters for the impact ionization model, the complex refractive index for InGaAs, the TAT model for InP and the SRH models for all the materials in the layer stack are used in this analysis. Process simulations of the diffusion of Zn into InP are added to the TCAD simulation environment. The impact of the real diffusion profile on the linear characteristics is also analyzed. The real diffusion profile changes the breakdown voltage and impacts the TAT in the InP region. A decrease of the dark current by a factor of 1.7 at 90 % of VBR at 200 K is obtained by using the realistic diffused Zn profile. The importance of using realistic Zn diffusion profiles is expected to be even more critical in 2D simulations of the InP/InGaAs APDs.
REFERENCES
[1] J. C. Campbell, "Recent Advances in Telecommunications Avalanche Photodiodes," Journal of Lightwave Technology, vol. 25, no. 1, pp. 109-121, Jan. 2007.
[2] N. Gisin, G. Ribordy, W. Tittel, and H. Zbinden, "Quantum cryptography," Reviews of Modern Physics, vol. 74, no. 1, pp. 145-195, Jan. 2002.
[3] U. Schreiber and C. Werner, "Laser Radar Ranging and Atmospheric Lidar Techniques," Proceedings of SPIE, Dec. 1997.
[4] S. Verghese et al., "Geiger-mode avalanche photodiodes for photon-counting communications," Digest of the LEOS Summer Topical Meetings, 2005, pp. 15-16.
[5] S. Cova, M. Ghioni, A. Lacaita, C. Samori, and F. Zappa, "Avalanche photodiodes and quenching circuits for single-photon detection," Applied Optics, vol. 35, no. 12, pp. 1956-1976, 1996.
[6] Y. Liu, S. R. Forrest, J. Hladky, M. J. Lange, G. H. Olsen, and D. E. Ackley, "A planar InP/InGaAs avalanche photodiode with floating guard ring and double diffused junction," Journal of Lightwave Technology, vol. 10, no. 2, pp. 182-193, Feb. 1992.
[7] S. R. Forrest, R. G. Smith, and O. K. Kim, "Performance of In0.53Ga0.47As/InP avalanche photodiodes," IEEE J. Quantum Electron., vol. QE-18, pp. 2040-2048, 1982.
[8] F. Acerbi, M. Anti, A. Tosi, and F. Zappa, "Design Criteria for InGaAs/InP Single-Photon Avalanche Diode," IEEE Photonics Journal, vol. 5, no. 2, p. 6800209, Apr. 2013.
[9] C. L. F. Ma, M. J. Deen, and L. E. Tarof, "Multiplication in separate absorption, grading, charge, and multiplication InP-InGaAs avalanche photodiodes," IEEE Journal of Quantum Electronics, vol. 31, no. 11, pp. 2078-2089, Nov. 1995.
[10] J. P. Donnelly et al., "Design Considerations for 1.06-µm InGaAsP-InP Geiger-Mode Avalanche Photodiodes," IEEE Journal of Quantum Electronics, vol. 42, no. 8, pp. 797-809, Aug. 2006.
[11] X. Jiang, M. A. Itzler, R. Ben-Michael, and K. Slomkowski, "InGaAsP-InP Avalanche Photodiodes for Single Photon Detection," IEEE Journal of Selected Topics in Quantum Electronics, vol. 13, no. 4, pp. 895-905, Jul.-Aug. 2007.
[12] M. Anti, F. Acerbi, A. Tosi, and F. Zappa, "2D simulation for the impact of edge effects on the performance of planar InGaAs/InP SPADs," Proc. SPIE 8550, Optical Systems Design 2012, 855025.
[13] M. Anti, F. Acerbi, A. Tosi, and F. Zappa, "Integrated simulator for single photon avalanche diodes," in Proc. 11th International Conference on Numerical Simulation of Optoelectronic Devices (NUSOD), Rome, 2011, pp. 47-48.
[14] Sentaurus Device User Guide, Synopsys, Mountain View, CA, USA, Mar. 2016.
[15] T. Knežević and T. Suligoj, "Examination of the InP/InGaAs Single-Photon Avalanche Diodes by Establishing a New TCAD-based Simulation Environment," submitted for publication.
[16] S. Adachi, Physical Properties of III-V Semiconductor Compounds: InP, InAs, GaAs, GaP, InGaAs, and InGaAsP, John Wiley & Sons, 1992.
[17] R. J. McIntyre, "Multiplication Noise in Uniform Avalanche Diodes," IEEE Trans. Electron Devices, vol. ED-13, pp. 164-168, 1966.
[18] G. J. van Gurp, P. R. Boudewijn, M. N. C. Kempeners, and D. L. A. Tjaden, "Zinc diffusion in n-type indium phosphide," Journal of Applied Physics, vol. 61, pp. 1846-1855, 1987.
[19] S. Y. Yang and J. B. Yoo, "Characteristics of Zn diffusion in planar and patterned InP substrate using Zn3P2 film and rapid thermal annealing process," Surface and Coatings Technology, vol. 131, no. 1-3, pp. 66-69, 2000.
[20] H. S. Marek and H. B. Serreze, "Diffusion coefficients and activation energies for Zn diffusion into undoped and S-doped InP," Applied Physics Letters, vol. 51, pp. 2031-2033, 1987.
[21] Sentaurus Process User Guide, Synopsys, Mountain View, CA, USA, Mar. 2016.
Design of Passive-Quenching Active-Reset
Circuit with Adjustable Hold-Off Time for
Single-Photon Avalanche Diodes
I. Berdalović*, Ž. Osrečki*, F. Šegmanović*, D. Grubišić**, T. Knežević* and T. Suligoj*
*
Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia
** Laser Components DG, Inc., Tempe, Arizona, USA
tomislav.suligoj@fer.hr
Abstract - Single-photon avalanche diodes (SPADs) are
gaining popularity in applications where low intensity light
needs to be detected. Since they are used in Geiger mode,
where the self-sustaining avalanche needs to be quenched,
an important part of the detection circuitry is the quenching
circuit. First, we examine the operation of a basic passive
quenching circuit consisting of the SPAD and two series
resistors and measure the SPAD’s dark count rate. Then we
implement a passive quenching circuit with active reset
(PQAR). Without a sufficiently long hold-off time between
quenching and reset the circuit does not operate properly.
Because of that, a hold-off time is introduced by means of an
adjustable time delay circuit. The behavior of the PQAR
circuit for different hold-off times is then examined, and the
minimum hold-off time of 1 μs, which still allows for correct operation with the given circuitry, is determined. Finally, a
comparison is made between the passive and the PQAR
circuit, focusing on the advantages of active over passive
reset.
I.
INTRODUCTION
A single-photon avalanche diode (SPAD) is a solid-state photodetector used for detecting low-intensity optical
signals [1]. Low cost, miniature size, higher quantum
efficiency and low voltage operation made the SPAD a
replacement for photomultiplier tubes (PMT) in many
applications today. The SPAD is basically a p-n junction
reverse biased above the breakdown voltage (as opposed
to avalanche photodiodes operated below the breakdown
voltage) and thus operated in Geiger-mode, where each
electron-hole pair can trigger an avalanche multiplication
process [2]. The avalanche current rises swiftly until
quenched by an external circuit. The leading edge of the
current pulse gives information about photon arrival time.
However, the avalanche multiplication process can also be
triggered by thermally generated electron-hole pairs inside
the active region or by the charge released from deep-level
traps. These mechanisms give rise to dark count rate
(DCR) and represent the noise of a SPAD. Charge
released from deep-level traps can cause afterpulsing [3].
The avalanche, once triggered, is self-sustained until
quenched by a quenching circuit, during an interval called
the quenching time. The quenching circuit also has to detect the avalanche, produce a readable output signal and prepare the diode for new detections, all during a time interval called the dead time [4, 5]. (This work was supported by the Croatian Science Foundation under contract no. 9006.) The dead time limits
the SPAD’s maximum operating frequency, because the
diode is not able to detect photons in that period. There
are two basic types of quenching circuits: passive and
active [4]. Passive quenching circuits shut down the
avalanche process by reducing the diode’s voltage below
the breakdown voltage by means of a ballast resistor
connected in series with the diode. The avalanche current
creates a voltage drop on the resistor, thereby reducing
the voltage on the diode below the breakdown voltage
[6]. Active quenching circuits use external circuitry to
shut down the avalanche, and an improved performance
can be achieved, with circuit complexity as a
disadvantage. After quenching is complete, the diode
voltage must be restored to its operating value. This can be done passively (long reset time, simple circuit) or actively (short reset time, complex circuit).
The type of quenching circuit determines the type of
SPAD operation: free-running or gated. In free-running
mode, the SPAD is constantly biased above the
breakdown voltage, as opposed to gated mode operation,
where the bias voltage is periodically lowered. In gated
mode, the diode is only active when a photon needs to be
detected, therefore, the frequency of incoming photons
must be known before detection. In free-running mode,
the diode can detect randomly incoming photons and is
only limited by the dead-time needed to quench the
avalanche from a previous detection and restore the
operating voltage. The diode voltage is restored to its
operating value in a period called the reset time. During
the reset time interval, the diode is still biased above the
breakdown voltage and the avalanche can be triggered, so
there must be a certain period after quenching and before
reset where leftover charge is removed from the diode.
This time interval is called hold-off time [4]. All residual
charge can cause afterpulsing during and after reset,
thereby reducing the possibility of detecting the next
photon. As a trade-off between the circuit complexity and
performance, a passive quenching active reset (PQAR)
circuit passively quenches the avalanche and actively
restores the SPAD’s operating voltage, shortening the
reset time [7]. There are different techniques for
afterpulsing reduction in PQAR circuits, some of them
being variable-load [8] and gated mode PQAR quenching
circuits [9]. Monolithic quenching circuit design results in
lower parasitic capacitances and higher performance [10].
In this paper, a PQAR circuit with adjustable hold-off
time is designed. First, with a simple passive quenching
circuit, the diode’s voltage and avalanche current
waveforms are measured. An active reset mechanism is
then implemented using a switching transistor in parallel
to the ballast resistor. However, if the active reset starts
during passive quenching, the avalanche is not properly
quenched and the circuit does not function as intended. The introduction of a hold-off time between quenching and reset results in a properly functioning quenching circuit. As a means of hold-off time optimization, a quenching circuit with adjustable hold-off time is designed with off-the-shelf components. With negligible complexity, the circuit can be used to determine the minimum hold-off time
for a certain SPAD with respect to afterpulsing probability
and counting frequency. Dark counts are used as a trigger
for avalanche multiplication in all measurements.
II.
QUENCHING CIRCUITS
A. Passive Quenching Circuit
The passive quenching circuit with passive reset is the
basic circuit to shut down the avalanche of the
photodiode. As shown in Fig. 1, it consists of a reverse
biased SPAD in series with a large ballast resistor RA=560
kΩ and a small resistor RC=1 kΩ.
The SPAD used in this and all subsequent circuits is a
SAP500 from Laser Components DG [11]. It is operating
at a voltage VOP=141.1 V, which is higher than the
breakdown voltage VBR=139.1 V by the overvoltage
VEX=VOP–VBR=2 V. When an avalanche is triggered, the
current through the diode increases and the voltage drop
across the resistor RA increases, causing the diode voltage
to decrease below the breakdown voltage. As that
happens, the avalanche is quenched and the diode current
becomes negligible. The voltage drop across the resistor
RC, shown in Fig. 2 a), is in fact caused by the diode
current, so the cathode voltage waveform is the same as
the diode current waveform, shown enlarged in Fig. 3
(curve a). We can see that the quenching time, i.e. the time
until the avalanche current drops, is approximately 50 ns.
Figure 1. Schematic of the passive quenching circuit.

Figure 2. Transient response of the passively quenched SPAD: (a) AC coupled cathode voltage and (b) anode voltage.

Since RA is loaded with a significant capacitance, namely the oscilloscope input capacitance and other
parasitic capacitances, the currents through RC and RA are
different. Most of the diode current is actually
capacitance-charging current, and only a fraction of the
diode current flows through RA, producing a voltage drop
of around 5 V only, as seen in Fig. 2 b). After the
avalanche is quenched, the anode capacitance has to
discharge through the ballast resistor RA and the voltage
decreases to zero with an RC time constant determined by
the anode capacitance and RA. As shown in Fig. 2. b), this
time constant is rather large so the reset time of the circuit
is around 30 μs.
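A rough estimate of the anode capacitance follows from the measured reset time. The 5-tau criterion below is our assumption, not a value from the paper; only R_A and the ~30 µs reset time are quoted in the text.

```python
# Estimate the anode capacitance from the passive reset time, assuming
# reset completes in about five RC time constants (our assumption).
R_A = 560e3      # ohm, ballast resistor from Fig. 1
T_RESET = 30e-6  # s, measured passive reset time

tau = T_RESET / 5.0
c_a = tau / R_A
print(f"tau ~ {tau * 1e6:.0f} us -> C_A ~ {c_a * 1e12:.0f} pF")
```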
Thus far, we have described the case with an
oscilloscope probe connected to the anode. The anode
capacitance also determines the peak avalanche current.
As shown in Fig. 3 (curve b), the peak current is only
about 0.3 mA without the oscilloscope probe connected to
the anode, since less current is needed to charge a smaller
CA.
B. PQAR Circuit without Hold-Off
To improve the passive reset, we wanted to implement
a passive quenching circuit with active reset. The first idea
was to use a constant level discriminator (CLD) to detect
the cathode voltage drop. The output of the CLD is
connected to the gate of a MOSFET, and when it goes
high, it switches the MOSFET on, effectively connecting
the anode to the ground and immediately resetting the
circuit. The schematic of this circuit is shown in Fig. 4.
Figure 3. Transient response of passive quenched SPAD: Diode
current a) with and b) without oscilloscope probe connected to the
anode.
Figure 4. Schematic of the PQAR circuit without hold-off.
The threshold voltage of the CLD is determined using
two resistors, R2=1 kΩ and R3=4.7 kΩ, connected as a
voltage divider from the negative supply to the positive
input. This gives us a threshold of about VTR=–0.85 V. The
negative input of the CLD is connected to the AC coupled
cathode. Thus, when the cathode voltage drops by more
than 0.85 V, the CLD is triggered. The output capacitance
of the MOSFET (BS107) is 30 pF, which is larger than the
oscilloscope capacitance, so this is the dominant
capacitance in CA. As a result, the peak avalanche current
is now about 2 mA, producing a cathode voltage drop of
nearly 2 V, which is more than enough to trigger the CLD.
Fig. 5 shows the transient response of this circuit.
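The divider arithmetic can be checked directly. The -5 V supply value below is our assumption (the supply rail is not stated explicitly in the text), but with it the divider reproduces the quoted threshold of about -0.85 V.

```python
# CLD threshold from the R2/R3 divider off the negative supply.
V_NEG = -5.0  # V, assumed negative supply rail
R2 = 1e3      # ohm
R3 = 4.7e3    # ohm

v_tr = V_NEG * R2 / (R2 + R3)
print(f"V_TR = {v_tr:.2f} V")  # -> -0.88 V, close to the quoted -0.85 V
```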
Figure 5. Transient response of the PQAR circuit without hold-off: (a) AC coupled cathode, (b) anode, (c) output of the CLD (node 1).
It can be clearly seen that the anode voltage is indeed
reset when the MOSFET is switched on. However, after
the MOSFET is switched off, the anode voltage rises
again and this time, resets slowly, as in the case of passive
reset. This is caused by the fact that reset occurs too soon,
before the avalanche is properly quenched. Because of
that, a hold-off time is needed between quenching and
reset.
Figure 6. Schematic of the PQAR circuit with adjustable hold-off time, including the pulse generator circuit with the adjustable time delay.
C. PQAR Circuit with Adjustable Hold-Off Time
We have concluded in Section II. B that the PQAR
circuit without hold-off between quenching and reset does
not operate as wanted. Therefore, we have implemented a
pulse generator circuit with adjustable time delay between
the output of the CLD and the gate of the MOSFET. The
purpose of this circuit is to create a voltage pulse at the
gate with adjustable duration after the output of the CLD
goes high. The schematic of the whole quenching circuit
including the pulse generator is shown in Fig. 6. As in the
previous circuit, the cathode voltage drop caused by an
avalanche event triggers the CLD (NE521), and its output
goes high while the AC coupled cathode voltage is below
its threshold voltage. This in turn triggers the J-K flip-flop
(74LS73), its Q output goes high and charges the
capacitor C2 with an RC time constant determined by R5
and C2. When the capacitor voltage reaches the threshold
voltage of the inverting Schmitt triggers (74HC14), the JK flip-flop is reset asynchronously. The voltage at node 3
goes high and triggers the pulse shaper (74HC74), which
subsequently creates a 12 ns pulse at the gate of the
MOSFET, switching it on. The RC network consisting of
R6 and C3 suppresses the ringing at node 3 and ensures the
proper operation of the pulse shaper. The pulse shaper is
needed to generate a short pulse at the gate, since node 3
stays high considerably longer, until the voltage of the
discharging capacitor C2 falls below the Schmitt trigger
threshold. For a higher count rate, it is desirable that the
MOSFET is switched on for as short a period of time as
possible, in our case about 12 ns. By changing the
resistance R5 we can charge C2 with different time
constants, which gives us a simple way of adjusting the
delay of the gate pulse, i.e. the hold-off time.
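A minimal sketch of the resulting delay, assuming a single-pole RC charge toward the logic supply: VDD, VTH and C2 below are illustrative assumptions, while R5 = 10 kΩ and the ~10 µs maximum hold-off are quoted in the text.

```python
import math

# Hold-off delay from charging C2 through R5 to the Schmitt trigger
# threshold: t = R5 * C2 * ln(VDD / (VDD - VTH)).
VDD = 5.0    # V, assumed logic supply
VTH = 2.5    # V, assumed 74HC14 positive-going threshold
C2 = 1.5e-9  # F, assumed timing capacitor

def hold_off_time(r5):
    return r5 * C2 * math.log(VDD / (VDD - VTH))

for r5 in (1e3, 5e3, 10e3):  # 10 kOhm is the maximum R5 from the text
    print(f"R5 = {r5 / 1e3:4.0f} kOhm -> hold-off ~ {hold_off_time(r5) * 1e6:4.1f} us")
```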
The voltage waveforms at certain nodes of the
described circuit for a hold-off time of approximately 5 μs
are shown in Fig. 7.
Fig. 8 shows the anode and cathode voltages of the
same circuit, but using the maximum value of resistor R5,
which is 10 kΩ. This value gives us a maximum hold-off
time of approximately 10 μs. Obviously, the longer the
hold-off time, the greater the possibility that another
avalanche event may occur during that time, i.e. before
reset. If that is indeed the case, the new avalanche will
occur while the diode is still below its operating voltage,
which means that the avalanche current will be
significantly smaller. That in turn means that the voltage
drop on the cathode will be smaller as well, and this
voltage drop may not be sufficient to trigger the constant
level discriminator. Therefore, if an avalanche occurs
during hold-off, it will not be detected by the circuitry. An
example of that can be seen in Fig. 8. The avalanche
current caused by the second dark count results in a
voltage drop of around 0.5 V across the resistor RC, which
is not enough to trigger the CLD. As a result, that count is
lost. The diode resets 10 μs after the first dark count.
In order to achieve a high counting frequency, the
hold-off time must be as short as possible. Because of the
intrinsic time delays of the components used, the
minimum hold-off time possible with our circuit was
determined to be around 300 ns. However, for such short
hold-off times, the afterpulsing becomes a possible issue.
Afterpulsing occurs when carriers trapped from an avalanche are subsequently released, triggering new unwanted avalanches [12].

Figure 7. Transient response of the PQAR circuit with a hold-off time adjusted to 5 μs: (a) SPAD cathode (AC coupled), (b) SPAD anode, (c) output of the CLD (node 1), (d) capacitor C2 (node 2), (e) output of the Schmitt triggers (node 3), (f) MOSFET gate (node 4).

Figure 8. An example of an avalanche pulse occurring during hold-off for the PQAR circuit with a hold-off time adjusted to 10 μs: (a) AC coupled cathode voltage and (b) anode voltage.

That could explain the
waveforms obtained with a hold-off time of around 500
ns, shown in Fig. 9. A new unexpected avalanche occurs
immediately after reset in some cases, which could be an
afterpulse. This effect is much more pronounced for
shorter hold-off times. Thus, for a hold-off time of 500 ns,
we have obtained a cascade of multiple avalanches, as
shown in Fig. 9. The hold-off time could be further
reduced by lowering the overvoltage VEX and thus
decreasing the afterpulsing probability. However, a lower
VEX results in a lower detection efficiency, so there is a
trade-off between shortening the hold-off time and
keeping the detection efficiency high.
For all the reasons above, we choose a hold-off time of
1 μs as the optimum hold-off time for the used SPAD.
This time is long enough for the afterpulsing probability to
be negligible, but not too long, to minimize the number of
dark counts during hold-off and to increase the maximum
frequency of detected photons. Obviously, for different diodes, this optimum value will vary, and our circuit provides a simple solution for adjusting the desired hold-off time. The voltage waveforms for this final version of the circuit are shown in Fig. 10.

Figure 10. Transient response of the PQAR circuit with the optimum hold-off time of 1 μs: (a) SPAD cathode (AC coupled), (b) SPAD anode, (c) output of the CLD (node 1), (d) MOSFET gate (node 4).
Figure 9. Single shot capture of waveforms of the PQAR circuit with a
hold-off time of 500 ns with a cascade of afterpulses: (a) AC coupled
cathode voltage and (b) Anode voltage. This event occurs in about 1 in
20 measurements.
If we compare this PQAR circuit with a hold-off time
of 1 μs to the passive quenching circuit using the same
resistors in series with the SPAD, we can observe the main
advantage of the active over passive reset. While the
passive reset time is measured to be around 30 μs, the
SPAD with the PQAR circuit is fully reset to its operating
voltage in just over 1 μs. That means that the maximum
counting frequency of the PQAR circuit is almost 30 times
higher than that of the passive circuit, which is a
significant improvement. The problems of active reset,
such as larger afterpulsing probability, possible detections
during hold-off or while the MOSFET is on, are overcome
by choosing a suitable hold-off time and by keeping the
MOSFET on for very short periods of time, which is
achieved by applying short pulses to the gate of the
MOSFET.
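Treating the reset (or hold-off plus reset) interval as the dead time gives a first-order estimate of the maximum counting rates, consistent with the roughly 30-fold improvement stated above.

```python
# First-order maximum counting rate limited by dead time: f_max ~ 1 / t_dead.
T_PASSIVE = 30e-6  # s, measured passive reset time
T_PQAR = 1e-6      # s, PQAR hold-off plus fast active reset (just over 1 us)

f_passive = 1.0 / T_PASSIVE
f_pqar = 1.0 / T_PQAR
print(f"passive: ~{f_passive / 1e3:.0f} kcounts/s, "
      f"PQAR: ~{f_pqar / 1e6:.1f} Mcounts/s (~{f_pqar / f_passive:.0f}x)")
```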
III.
CONCLUSION
The aim of this paper was to analyze the impact of
hold-off time on the performance of a PQAR circuit for
SPADs. The behavior of the SPAD was examined during
the operation with a simple passive quenching circuit. A
need for a hold-off time is demonstrated on a simple
PQAR circuit where reset starts during quenching. Then, a
PQAR circuit with adjustable hold-off time is designed as
a means of performance optimization for different SPADs.
Finally, the impact of different hold-off times on the
performance of the quenching circuit is described,
showing that a shorter hold-off time provides a higher
counting frequency, but that the hold-off time has to be
long enough to prevent false detections. Also, the
advantages of active over passive reset, in particular the
shorter reset time, are demonstrated.
REFERENCES
[1] H. Dautet et al., "Photon-counting techniques with silicon avalanche photodiodes," Applied Optics, vol. 35, pp. 3894-3900, 1993.
[2] J. Zhang, M. A. Itzler, H. Zbinden, and J. W. Pan, "Advances in InGaAs/InP single-photon detector systems for quantum communication," Light: Science & Applications, vol. 4, e286, 2015.
[3] M. Stipčević, D. Q. Wang, and R. Ursin, "Characterization of a commercially available large area, high detection efficiency single-photon avalanche diode," IEEE Journal of Lightwave Technology, vol. 31, no. 23, pp. 3591-3596, 2013.
[4] A. Gallivanoni, I. Rech, and M. Ghioni, "Progress in quenching circuits for single photon avalanche diodes," IEEE Transactions on Nuclear Science, vol. 57, no. 6, pp. 3815-3826, 2010.
[5] M. Stipčević, H. Skenderović, and D. Gracin, "Characterization of a novel avalanche photodiode for single photon detection in VIS-NIR range," Optics Express, vol. 18, no. 16, 2010.
[6] B. F. Aull et al., "Geiger-mode avalanche photodiodes for three-dimensional imaging," Lincoln Laboratory Journal, vol. 13, no. 2, pp. 335-350, 2002.
[7] S. Cova, M. Ghioni, A. Lacaita, C. Samori, and F. Zappa, "Avalanche photodiodes and quenching circuits for single-photon detection," Applied Optics, vol. 35, no. 12, pp. 1956-1976, 1996.
[8] S. Tisa, F. Guerrieri, and F. Zappa, "Variable-load quenching circuit for single-photon avalanche diodes," Optics Express, vol. 16, no. 3, pp. 2232-2244, 2008.
[9] M. Liu, C. Hu, J. C. Campbell, Z. Pan, and M. M. Tashima, "A novel quenching circuit to reduce afterpulsing of single photon avalanche diodes," Proc. SPIE, vol. 6900, no. 5, 2008.
[10] D. Bronzi et al., "Fast sensing and quenching of CMOS SPADs for minimal afterpulsing effects," IEEE Photonics Technology Letters, vol. 25, no. 8, pp. 776-779, 2013.
[11] Laser Components DG, Inc., Pulsed Laser Diodes - Avalanche Photodiodes Catalog, available at: http://www.lasercomponents.com/fileadmin/user_upload/home/Datasheets/lc/kataloge/pld-apd.pdf
[12] M. G. Liu, C. Hu, J. C. Campbell, Z. Pan, and M. M. Tashima, "Reduce afterpulsing of single photon avalanche diodes using passive quenching with active reset," IEEE Journal of Quantum Electronics, vol. 44, no. 5, pp. 430-434, 2008.
Impact of the Emitter Polysilicon Thickness on
the Performance of High-Linearity Mixers with
Horizontal Current Bipolar Transistors
J. Žilak*, M. Koričić, H. Mochizuki**, S. Morita** and T. Suligoj*
*
University of Zagreb, Faculty of Electrical Engineering and Computing, Department of Electronics, Microelectronics,
Computing and Intelligent Systems, Micro and Nano Electronics Laboratory, Zagreb, Croatia
** Asahi Kasei Microdevices Co. 5-4960. Nobeoka, Miyazaki, 882-0031, Japan
jzilak@zemris.fer.hr
Abstract - The impact of the emitter polysilicon etching in
Tetramethyl Ammonium Hydroxide (TMAH) on the
characteristics of high-linearity mixers fabricated with the
low-cost Horizontal Current Bipolar Transistor (HCBT) is
analyzed. During emitter formation, the thick layer of α-Si
is deposited over the whole wafer, which is then etched-back
in the TMAH. The emitter thickness depends on the TMAH
etching time and impacts the HCBT's electrical
characteristics. Active down-converting mixers with an open-collector topology based on a Gilbert cell are fabricated with two types of HCBTs with different TMAH etching times, using the lowest-cost HCBT technology with the CMOS n-well region for the n-collector. Measurements of the mixers' characteristics are done on-wafer by using multi-contact probes. The mixers achieve a maximum IIP3 of 20.2 dBm and a conversion gain of 4 dB. The differences in performance characteristics between the two mixer types are small, indicating that the sensitivity of the HCBT circuit performance to emitter thickness variations is relatively small.
I.
INTRODUCTION
The improvement of the high-frequency response of
the CMOS devices, enabled by the downscaling, led to the
widening of the CMOS technology usage in wireless and
other radio frequency (RF) analog circuit applications. In
order to keep its low-cost and high-volume production,
scaling techniques and further CMOS development
require increasing investments [1]. On the other hand,
bipolar technologies are suitable for the mentioned RF
applications at coarser technology nodes due to better
high-frequency characteristics, noise factor and higher
gain [2], [3]. Hence, the solution to the very cost-sensitive
demands of the RF integrated circuits (RFICs) market is
the addition of bipolar devices to the coarser lithography
CMOS technology. It is critical that such integration is
done with minimum number of new masks and process
steps, keeping the fabrication costs as low as possible.
The Horizontal Current Bipolar Transistor (HCBT) technology, developed with a novel technological approach as an add-on to a 180 nm CMOS process, is an example of such low-cost integration. The integration is accomplished with the addition of only 2 or 3 lithography masks and a few process steps. (This work was supported by the Croatian Science Foundation under contract no. 9006.) Fabricated HCBTs with a very high peak cutoff frequency (fT) of 51 GHz and a maximum frequency of oscillations (fmax) of 61 GHz, along with a collector-emitter breakdown voltage (BVCEO) of 3.4 V, have been demonstrated [4]. Moreover, high-voltage transistors are integrated in the process at zero cost [5].
RF mixers are widely used in mobile and base station transceivers, as well as in general-purpose RF systems. Spectral efficiency and intermodulation distortion, generated due to the nonlinear nature of active elements, are important effects in modern systems, imposing high demands on RF mixers [6]. Hence,
high-linearity mixers with low power consumption are
preferable in the integrated mixer design. On the other
hand, high performance mixers can be fabricated with a
minimum number of large area passive components
placed on the chip. Therefore, mixer design has proven to be suitable for RFIC design in the novel HCBT technology, and high-linearity active mixers with an input 3rd order
intercept point (IIP3), IIP3 = 23.8 dBm, have recently
been demonstrated [7].
In this paper, the impact of the emitter polysilicon
etching in Tetramethyl Ammonium Hydroxide (TMAH)
on the characteristics of high-linearity mixers fabricated
with the lowest-cost HCBTs using the CMOS n-well region as the n-collector with only 2 additional masks is analyzed.

Figure 1. Cross-section of the HCBT: a) α-Si deposition prior to TMAH etching, b) final structure, c) TEM of the intrinsic region with the short-TMAH, d) TEM of the intrinsic region with the long-TMAH.

II.
FABRICATED HCBT STRUCTURES

The HCBT fabrication sequence and its integration with the CMOS baseline process using a base-after-gate scheme are described in detail in [8]. All examined mixers are designed and fabricated by using the CMOS n-well collector HCBT with a single polysilicon region. It is the lowest-cost version of the HCBT, since it uses only 2 additional lithography masks and skips the additional n-collector implantations. A cross-section of the HCBT structure is shown in Fig. 1b. The emitter formation, as the subject of this study, is described in more detail as follows. The in situ phosphorus-doped amorphous silicon (α-Si) (Fig. 1a) is deposited and etched back by using Tetramethyl Ammonium Hydroxide. The deposited polysilicon layer has to be thick enough to achieve a planar surface of the deposited film. A thin native oxide layer, grown during the pre-deposition annealing, serves as a protection layer keeping the n-hill from the TMAH etching. At the same time, the oxide is thin enough to allow current flow in and out of the n-hill. The TMAH etching is time-controlled and the final polysilicon thickness is determined by the etching duration. Two TEM micrographs of fabricated HCBT structures with different TMAH etching times are shown in Fig. 1. The shorter etching time (short-TMAH) results in a thicker polysilicon layer (Fig. 1c) and the longer etching time (long-TMAH) results in a thinner polysilicon layer (Fig. 1d).

Electrical characteristics of unit transistors, which differ only in the TMAH etching times, are measured. The Gummel characteristics of both HCBT types are shown in Fig. 2. The HCBT with long-TMAH etching has a smaller base current (IB) at a base-emitter voltage VBE = 0.9 V, which is the bias point around peak fT. Consequently, the long-TMAH HCBT has a higher peak current gain, βlong-TMAH = 133, in comparison to βshort-TMAH = 117. This is due to the longer distance between the intrinsic emitter and the extrinsic base (p+ base) and the consequently reduced electron injection into the base contact. Moreover, the charge sharing between intrinsic and extrinsic base acceptors is reduced if the intrinsic transistor is at a larger distance from the extrinsic base, as explained in [8]. The reduced charge sharing effect is also beneficial for the fT and fmax (Fig. 3), resulting in a higher fT in the case of the long-TMAH HCBT (thinner emitter polysilicon). The fmax benefits from the increased fT, but its increase is offset by an increased base resistance in the case of the thinner emitter of the long-TMAH HCBTs [8].

Figure 2. Measured Gummel plots of the HCBTs with the short- and long-TMAH etching times with an emitter area of 0.1×1.8 µm² at VCE = 2 V.

Figure 3. Cutoff frequency (fT) and maximum frequency of oscillations (fmax) versus collector current of the HCBTs with the short- and long-TMAH etching times, with an emitter area of 0.1×1.8 µm² at VCE = 2 V.

III.
HCBT MIXER DESIGN
High-linearity mixers are designed and fabricated
using both unit HCBTs with short- and long-TMAH
etching. A down-converting active mixer, using a doublebalanced Gilbert cell topology [9] with an open-collector
output, is designed and its schematic is shown in Fig. 4.
Main circuit parts are a differential pair (Q1 and Q2) as a
transconductance stage that transforms the input voltage
signal to the current, and a switching quad consisting of 4
transistors (Q3~Q6), that commutates the current
providing frequency conversion. The differential Local
Oscillator (LO) signal, is generated by the LO buffer
circuit. It should be noted that 128 unit HCBT transistors are connected in parallel for each of the Q1~Q6 transistors in the scheme. The current mirror (Q7
and Q8) sets the bias current. Degeneration emitter
resistors (RE) are used in order to improve linearity, since
it is a critical parameter of mixers in wireless transceivers.
The input impedances are designed to be 50 Ω.
The open-collector design provides flexibility in
setting parameters such as output impedance, conversion
gain and linearity.

Figure 4. Double-balanced active mixer based on a Gilbert cell with an open collector, which is implemented on-wafer, except for the output network, which is added on an external PCB, as marked.

Figure 5. Measurement setup used for the on-wafer measurements, with the mixer chip photo and a photo of the additional PCB.

Moreover, it can be used effectively
with both differential and single-ended filter
configurations that are usually connected to the mixer
outputs. Mixer performance is, in this case, highly
dependent on its output network. It has to reduce the high
output impedance of the switching quad transistors and to
provide an interface to the load, which assures adequate
power transfer with a minimum number of external
components [10].
The voltage swing at the output of the LO buffer
circuit has to be large enough to switch on and off the
switching quad transistors completely (Fig. 4). The buffer amplifies the input LO signal and provides the single-ended to differential conversion. That is accomplished by the differential amplifier, which is the main part of the buffer circuit. Moreover, the buffer improves the isolation between the LO and the other mixer ports (RF and output IF) and sets the proper DC voltage level for the switching quad transistors (Q3~Q6 in Fig. 4). The output stage of the buffer is built of emitter followers, which need to supply sufficient current for driving the LO switching quad capacitances.
The conversion gain (CG) and the IIP3 are the most common figures of merit in mixer performance
characterization. They are determined by the
transconductance of the differential pair (gm) and the value
of the emitter degeneration resistance RE, as well as by the
shape and the magnitude of the LO signal generated by
the LO buffer. The CG can be approximated by [11]
$$\mathrm{CG} \approx \frac{2}{\pi} \cdot \frac{R_L}{1/g_m + R_E}, \qquad (1)$$
where RL is the load resistance. A higher gm and RL result in a higher CG, while a higher RE reduces it. On the other hand, both a high gm and a high RE are beneficial for the IIP3 as a linearity figure of merit, as approximated in [11]

$$\mathrm{IIP3} \approx 4\sqrt{2}\, v_T \left( 1 + g_m R_E \right)^{3/2}, \qquad (2)$$

where vT is the thermal voltage. Hence, in order to achieve a CG above 0 dB, the mixer current is increased to obtain a higher gm by putting the unit HCBTs in parallel. The chosen value of RE is 20 Ω, which assures high-linearity behavior.
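The trade-off encoded in (1) and (2) can be made concrete with a short sketch. The transconductance value and the RE sweep below are illustrative assumptions, RL = 400 Ω anticipates the balun-presented load described in Section IV, and the effective gm of the paralleled HCBTs and the output network losses are not modeled, so the numbers are qualitative only.

```python
import math

V_T = 0.0259  # V, thermal voltage at ~300 K

def cg_db(gm, r_l, r_e):
    """Eq. (1): conversion gain of the degenerated Gilbert cell, in dB."""
    return 20.0 * math.log10((2.0 / math.pi) * r_l / (1.0 / gm + r_e))

def iip3_amplitude(gm, r_e):
    """Eq. (2): input 3rd order intercept as a voltage amplitude."""
    return 4.0 * math.sqrt(2.0) * V_T * (1.0 + gm * r_e) ** 1.5

GM = 0.2  # S, assumed transconductance of the input pair
for r_e in (0.0, 10.0, 20.0):  # ohm, emitter degeneration values
    print(f"RE = {r_e:4.1f} ohm: CG = {cg_db(GM, 400.0, r_e):5.1f} dB, "
          f"IIP3 amplitude = {iip3_amplitude(GM, r_e):5.2f} V")
```

As expected from (1) and (2), increasing RE lowers the conversion gain while improving linearity.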
Figure 6. Measured input 3rd order intercept point (IIP3) and
conversion gain (CG) vs. LO power level (PLO) of the mixers with
RE = 20 Ω fabricated with the short- and long-TMAH HCBTs at the
mixer current of 50 mA. Measurement setup of the two-tone test:
PRF = -10 dBm, fRF = 900 MHz, fIF = 20 MHz, ∆f = 2 MHz, VCC = 5 V.
IV.
HCBT MIXER MEASUREMENTS
The fabricated mixers are measured on-wafer by using
the multi-contact probes. The measurement setup is shown
in Fig. 5. The used probes have in total 4 ground-signal-ground RF ports connected to the signal pads and several DC probes for the power supply, reference voltages and current connections. Both input ports (RF and LO) are fully differential, but they are driven by a single-ended signal with the other input grounded. Two RF signal generators are used for driving the RF port, since a two-tone test is required for linearity measurements. The
signals are added together by using a power combiner.
The mixer current is adjustable via a current source (ISET)
connected to the current mirror input. Additional mixer's
DC operating point tuning for the optimal performance
can be achieved by DC voltages connected to the DC
probes.
The open-collector IF outputs are connected to a balun
transformer. It is an off-chip component mounted on a
small PCB designed for the measurement purpose. The
balun is an 8:1 impedance transformer used to convert
the differential IF outputs to single-ended. When loaded with 50 Ω, the balun presents a 400 Ω load to the mixer. The open-collector outputs are biased through two inductors, also placed on the external PCB, with the power supply voltage VCC = 5 V. They are required since the maximum DC current of the balun is specified not to exceed 30 mA, while the total mixer current is larger. For a smaller mixer current consumption, the bias could be set through the center pin of the balun's primary winding.
The output power is measured by a spectrum analyzer
connected to the secondary winding of the balun.
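The 8:1 impedance transformation can be verified directly; with the 50 Ω spectrum analyzer load, the primary-side load seen by the mixer is the 400 Ω quoted above.

```python
import math

# 8:1 impedance-ratio balun: Z_primary = ratio * Z_secondary.
Z_LOAD = 50.0  # ohm, spectrum analyzer input impedance
RATIO = 8.0    # impedance transformation ratio

print(f"Z presented to mixer = {RATIO * Z_LOAD:.0f} ohm "
      f"(turns ratio ~ {math.sqrt(RATIO):.2f}:1)")
```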
The CG and IIP3 of the examined mixers are
measured at 900 MHz RF (fRF) frequency and -10 dBm
input power (PRF). The output frequency (fIF) is 20 MHz
and the LO frequency (fLO) is set accordingly. The
frequency spacing (∆f) used for the two-tone testing is
2 MHz. The used power drive of the LO buffer (PLO) is
0 dBm. The power combiner loss and the losses of the cables used for driving the RF and LO ports are the only losses included in the de-embedding of the results.
The LO buffer circuit function is verified with the
measurement of the IIP3 and CG dependence on the LO
power level shown in Fig. 6. Results are shown for the mixers with RE = 20 Ω fabricated with both the short- and long-TMAH HCBTs. The IIP3 and CG values are constant around PLO = 0 dBm, which is the standard LO drive in commercial mixers, while a small decrease of CG is observed for LO power levels below -5 dBm. Since the input impedances are designed to be 50 Ω, there is no need for input matching networks at the RF and LO ports. The return loss measurements, necessary for testing the input impedances, are done by using a vector network analyzer. The measured return loss is better than 10 dB in both mixer types, and the result for the mixer with RE = 20 Ω fabricated with the long-TMAH HCBTs is shown in Fig. 7. Hence, more than 90 % of the incident power coming from the RF signal generators is absorbed by the mixer ports in the frequency range from 10 MHz to 1 GHz.
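The relation between return loss and absorbed power used above is a one-line computation: a 10 dB return loss means 10 % of the incident power is reflected, i.e., 90 % absorbed.

```python
# Fraction of incident power absorbed for a given return loss (dB):
# |reflection|^2 = 10^(-RL/10).
for rl_db in (6.0, 10.0, 15.0):
    reflected = 10.0 ** (-rl_db / 10.0)
    print(f"RL = {rl_db:4.1f} dB -> absorbed = {100.0 * (1.0 - reflected):.0f} %")
```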
The measured IIP3 and CG dependence on the mixer current (IMIX) (excluding the LO buffer and bias circuitry current consumption) of the mixers fabricated with both the short- and long-TMAH HCBT types is shown in Fig. 8. The
total power consumption is 425 mW at a mixer current of
40 mA, including the LO buffer and bias circuitry
consumption. The highest IIP3 value of 20.2 dBm is
obtained in the short-TMAH HCBT mixer at the mixer
current of 45 mA. The mixer with the long-TMAH
HCBTs has an IIP3 of 19.6 dBm at the same mixer current.
Figure 8. Measured input 3rd order intercept point (IIP3) and
conversion gain (CG) vs. mixer current (IMIX) of the mixers with
RE = 20 Ω fabricated with the short- and long-TMAH HCBTs.
Measurement setup of the two-tone test: PRF = -10 dBm, PLO = 0 dBm,
fRF = 900 MHz, fIF = 20 MHz, ∆f = 2 MHz, VCC = 5 V.
Figure 7. Measured return loss at RF and LO ports of the mixer with
RE = 20 Ω fabricated with the long-TMAH HCBTs at the mixer current
of 50 mA.
Figure 9. Measured output power (POUT) and 3rd order intermodulation
distortion power (PIM3) vs. input power (PIN) of the mixers with
RE =20 Ω fabricated with the short- and long-TMAH HCBTs at the
mixer current of 50 mA. Measurement setup: fRF = 900 MHz,
fIF = 20 MHz, PLO = 0 dBm, VCC = 5 V.
The shape of the linearity dependence on the mixer current is similar in both HCBT mixers. On the other hand, the CG is higher by around 0.4 dB in the case of the mixer with the long-TMAH HCBTs and has a value of 4 dB at the mixer current of 55 mA. The measured IIP3 values of both mixer types are smaller in comparison to the previously reported high-linearity mixers [7], where IIP3 = 23.8 dBm, since those were fabricated with the optimized n-hill collector HCBTs. In this work, the mixers are designed with CMOS n-well region n-collector HCBTs that are fabricated with only 2 additional lithography masks, unlike the 3 masks used for the optimized n-hill collector HCBTs.
The output and 3rd order intermodulation distortion
power versus input power of both mixer types are shown
in Fig. 9. The IIP3 values can be observed as the theoretical intercept point of the fundamental tone (POUT) and the 3rd order intermodulation tone (PIM3), while the CG values correspond to the difference between POUT and PIN. The IIP3 and CG start to degrade for PIN above 0 dBm, where large-signal effects occur. These are characterized by the 1 dB compression point (P1dB), which is defined as the input signal power at which the power gain is degraded by 1 dB. The measured P1dB values are 2 dBm and 2.7 dBm in the mixers with the short- and long-TMAH HCBTs, respectively.
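The standard two-tone extraction behind Fig. 9 follows the textbook relation IIP3 = PIN + (POUT - PIM3)/2. The sketch below uses illustrative toy numbers, chosen merely to be consistent with a ~4 dB conversion gain and a ~20 dBm intercept, not measured data.

```python
def iip3_dbm(p_in_dbm, p_out_dbm, p_im3_dbm):
    """Two-tone intercept estimate: IIP3 = Pin + (Pout - Pim3) / 2."""
    return p_in_dbm + (p_out_dbm - p_im3_dbm) / 2.0

# Illustrative point on the small-signal part of the curves in Fig. 9
p_in, p_out, p_im3 = -10.0, -6.0, -66.0  # dBm (toy values)
print(f"CG = {p_out - p_in:.0f} dB, IIP3 = {iip3_dbm(p_in, p_out, p_im3):.0f} dBm")
```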
The similar values of IIP3 and CG in both the short- and long-TMAH mixers (Figs. 8 and 9) suggest that the impact of the TMAH etching time on the mixer performance is relatively small.
(RE = 20 Ω), necessary for the high-linearity performance,
has a great impact on the mixer characteristics, according
to (1) and (2). Hence, the differences in HCBT's TMAH
etching times and their impact are masked by the RE. In
order to gain a deeper insight into the effect of TMAH
etching time, the mixers without emitter degeneration
resistors are also fabricated and the IIP3 and CG
dependence on the mixer current (IMIX) are measured.
They have identical LO buffers as the mixers with RE. The
results of both short- and long-TMAH HCBT mixers are
shown in Fig. 10. The mixers without degeneration have
larger CG values, but they are less linear, as expected, in
comparison to the ones with degeneration. Since there is
no RE, the measured mixer characteristics are more
dependent on the HCBT transistor characteristics. The
long-TMAH HCBTs have smaller IB at the same VBE in
comparison to the short-TMAH HCBTs. This results in better current mirroring, meaning that for the same reference current (ISET), the total mixer current (IMIX) is higher
in the case of the long-TMAH mixer. For ISET of 5 mA, the
short-TMAH mixers have IMIX around 44 mA and the
long-TMAH mixers have IMIX around 58 mA. Considering
IIP3 and CG, the differences in the measured values
between two mixer types are higher in comparison to the
mixers with RE = 20 Ω, but they are still relatively small.
The difference in CG is ∆CG=1.1 dB and in IIP3 is
∆IIP3 = 1.0 dB at the mixer current of 50 mA.
Figure 10. Measured input 3rd order intercept point (IIP3) and conversion gain (CG) vs. mixer current (IMIX) of the mixers without RE fabricated with the short- and long-TMAH HCBTs. Measurement setup of the two-tone test: PRF = -10 dBm, PLO = 0 dBm, fRF = 900 MHz, fIF = 20 MHz, ∆f = 2 MHz, VCC = 5 V.

V.
CONCLUSION
The impact of the TMAH etching time, as an important HCBT technology fabrication step, on the performance of the high-linearity mixers is investigated. The TMAH time variation results in different polysilicon emitter thicknesses. The HCBT with the thinner emitter polysilicon, resulting from the long-TMAH etching time, has a smaller IB and a higher β and fT. The peak IIP3 and CG values obtained in the mixers fabricated in this lowest-cost HCBT technology are IIP3 = 20.2 dBm and CG = 4 dB. The differences in IIP3 and CG of the mixers with HCBTs with the 2 TMAH etching times at IMIX = 50 mA are 0.6 dB and 0.4 dB, respectively. The additional
measurements on the mixers without emitter degeneration
resistor RE also showed small differences in CG and IIP3
(∆CG = 1.1 dB and ∆IIP3 = 1.0 dB), suggesting that the differences in electrical characteristics at the transistor level have a relatively small impact at the circuit level in the
case of high-linearity mixers. Furthermore, the overall
mixer performance demonstrates the suitability of the
lowest-cost HCBT technology with CMOS n-well region
as n-collector for the wireless communication market.
REFERENCES
[1]
M. Feng, S.-C. Shen, D.C. Caruth, J.-J. Huan, “Device
technologies for RF Front-End circuits in next-generation wireless
communications,” Proceedings of the IEEE, vol. 92, pp. 354-375,
Feb 2004.
[2] P. Deixler et al., "QUBiC4G: a fT/fmax = 70/100 GHz 0.25 µm low
power SiGe-BiCMOS production technology with high quality
passives for 12.5 Gb/s optical networking and emerging wireless
applications up to 20 GHz," in Proc. BCTM, 2002, pp. 201-204.
[3] H. S. Bennet et al., “Device and Technology Evolution for Sibased RF Integrated Circuits,” IEEE Trans. Electron. Devices, vol.
52, pp. 1235-1258, Jul 2005.
[4] T. Suligoj et al., “Horizontal Current Bipolar Transistor (HCBT)
with a Single Polysilicon Region for Improved High-Frequency
Performance of BiCMOS ICs,” IEEE Electron Device Lett., vol.
31, pp 534-536, Jun 2010.
[5] M. Koričić, J. Žilak, T. Suligoj, "Double-Emitter Reduced-Surface-Field Horizontal Current Bipolar Transistor with 36 V
Breakdown Integrated in BiCMOS at Zero-Cost,” IEEE Electron.
Device Lett., vol. 36, pp. 90 – 92, Feb 2015.
[6] B. Razavi, RF Microelectronics, 2nd ed., New York, USA, Pearson Education, Inc., 2012.
[7] J. Žilak, M. Koričić, H. Mochizuki, S. Morita, T. Suligoj, “Impact
of Emitter Interface Treatment on the Horizontal Current Bipolar
Transistor (HCBT) Characteristics and RF Circuit Performance,”
in Proc. BCTM, 2015, pp. 31-34.
[8] M. Koričić, “Horizontal Current Bipolar Transistor Structures for
Integration with CMOS Technology”, doctoral thesis, FER,
University of Zagreb, Croatia, 2008.
[9] B. Gilbert, “A precise four-quadrant multiplier with
subnanosecond response”, IEEE Journal of Solid-State Circuits,
vol. 3, pp. 365-373, 1968.
[10] M. B. Judson, "Low-voltage front-end circuits: SA601, SA602," Philips Semiconductors, Application Note AN1777, Aug. 1997.
[11] J. Rogers and C. Plett, "Mixers," in Radio Frequency Integrated Circuit Design, Boston, MA, USA: Artech House, 2003.
Fully-integrated Voltage Controlled Oscillator in
Low-cost HCBT Technology
M. Koričić *, J. Žilak *, H. Mochizuki **, S. Morita ** and T. Suligoj*
*
University of Zagreb, Faculty of Electrical Engineering and Computing, Department of Electronics, Microelectronics,
Computing and Intelligent Systems, Micro and Nano Electronics Laboratory, Zagreb, Croatia
** Asahi Kasei Microdevices Co. 5-4960. Nobeoka, Miyazaki, 882-0031, Japan
marko.koricic@fer.hr
Abstract - Design of cross-coupled voltage controlled
oscillator in a low-cost HCBT technology is presented. Besides the low-complexity front-end devices, only 2 metal layers are used and the passives are implemented in the available on-chip structures. Varactors are fabricated as pn-junctions by using the ion implantation from the technology. Symmetric inductors are fabricated by using the topmost metal layer. Since only 2 aluminum metal layers are available, the small thickness of the aluminum layers and the proximity of the silicon substrate limit the inductor quality factor. Varactor and inductor models for circuit simulations
are developed by using the device and electromagnetic
simulations, respectively, and are compared to measured
characteristics of fabricated devices.
I.
INTRODUCTION
Integration of bipolar transistors to the baseline coarser
lithography Complementary-Metal-Oxide-Semiconductor
(CMOS) processes has become an attractive technological
approach for the extension of the technology applications,
without a significant increase of fabrication cost [1]. A
typical application is in the wireless communication
circuits where benefits are gained by the improved high
frequency and noise performance of bipolar transistors.
Such integration should be cost-effective to meet the
demands of the very cost-sensitive market. Horizontal current
bipolar transistor (HCBT) is integrated with standard 180
nm CMOS technology with only 2 or 3 additional
lithography masks and a small number of additional
processing steps [2], resulting in the very low-cost
BiCMOS technology. At the same time, electrical
performance is comparable to more expensive vertical
bipolar transistors with implanted base. Furthermore,
high-voltage devices are added to the technology at zero cost [3], [4], extending the portfolio of the available
devices.
The RF mixer is so far the only published RF circuit
implemented in HCBT technology [5]. In order to
fabricate other RF front-end circuits, high quality passive
components are needed, which: (i) are not available in the
technology in its experimental and development phase,
and (ii) increase the fabrication cost. In this paper, the
design of a fully integrated cross-coupled voltage controlled oscillator (VCO) is presented. The goal is to design the VCO at the lowest possible cost by using only the available front-end process steps needed for HCBT fabrication, i.e., in the bipolar-only version of the HCBT technology, with only the first two metal layers of standard CMOS interconnect processing. (This work was supported by the Croatian Science Foundation under contract no. 9006.)

Figure 1. TEM cross-section of the fabricated HCBT.

In this paper, design, modeling and fabrication
of passive components in HCBT technology are done for
the first time.
II.
VCO DESIGN IN HCBT TECHNOLOGY
The VCO is fabricated by using only the process steps needed for the fabrication of HCBT devices. A TEM cross-section of the fabricated HCBT, with the transistor regions marked, is shown in Fig. 1. Details about the process are given elsewhere [2]. A steep collector profile optimized for the performance of high-speed HCBT devices [6] is used for the fabrication of the VCO.
The electrical schematic of the designed VCO is presented in Fig. 2a and a photograph of the fabricated chip in Fig. 2b.
The core of the VCO consists of the cross-coupled
differential pair biased by the tail current source (I0)
realized by the simple current mirror. Bias current (IBIAS) is
supplied off-chip. Resonant LC-tank of the circuit consists
of a symmetric inductor and varactors which are
fabricated by the front-end process steps available in the
technology. Output buffers are implemented as two stages
of the emitter followers in order to reduce the parasitic
capacitance at the collectors of T1 and T2 and to provide
capability of driving the output pad and the input
impedance of the spectrum analyzer. Also, a separate
biasing is used in order to monitor the power dissipation
of the VCO core.
In order to sustain the oscillations, the cross-coupled
pair generates negative conductance at the collectors of T1
and T2, which replenishes the energy lost in the resistive
component of LC-tank. The transconductance of
transistors (gm) should be high enough to satisfy the
condition:
gm ≥ 1/Rp (1)
where Rp is the equivalent parallel resistance of the inductor. In practice, gm is chosen higher than the marginal case in (1), to allow complete current steering and higher output voltage swings.

TABLE I. SIMULATED ELECTRICAL CHARACTERISTICS OF INTEGRATED INDUCTORS WITH DIFFERENT GEOMETRICAL PARAMETERS

Parameter                  Inductor 1       Inductor 2
Wire width, W [µm]         12               5.5
Wire spacing, S [µm]       3                0.8
Number of turns, N         3                10
Output diameter, D [µm]    150              170
L@fosc [nH]                1.12             10
Q@fosc                     1.487            1
Qmax                       3.6 (@10 GHz)    1 (@2.3 GHz)
L@f(Qmax) [nH]             1.293            10
Rp@fosc [Ω]                26.2             157
I0@Voutpp=0.6 Vpp [mA]     12.4             1.5

Figure 2. a) Electrical schematic of designed cross-coupled VCO. b) Chip photograph.
As design constraints we choose: a) a tail bias current I0 smaller than 2 mA in order to have a power dissipation of less than 10 mW including the bias circuitry and excluding the output buffers, b) an output voltage swing around 0.6 Vpp in order to have complete current steering in the cross-coupled pair, c) a central frequency fc=2.5 GHz and d) a tuning range of 10% of the central frequency, i.e., ∆f=250 MHz.
In the case of complete current steering in the differential pair, the output voltage swing is [7]:
Vout,pp = (4/π) · I0 · Rp (2)
where Rp is equivalent parallel resistance of the resonant
tank. In the first approximation, it is defined by the
inductor parameters at the frequency of oscillation:
Rp = 2π · fosc · L · (Q + 1/Q) (3)
where fosc is the frequency of oscillation, L the inductance and Q the inductor quality factor. A high inductor Q is desirable due to smaller losses, which translates into smaller power consumption. Furthermore, the phase noise, an important oscillator parameter, is proportional to 1/Q² and benefits from the usage of high-quality inductors.

Figure 3. Layout of the symmetric inductor used in the design of VCO.
A. Inductor design and modeling
Symmetric inductors shown in Fig. 3 are used in the
LC-tank of VCO. Inductors are fabricated by using only 2
aluminum layers of 0.5 µm thickness. Since the sheet
resistance of these layers is rather high, wide metal lines
are used in order to minimize wire resistance and improve
Q-factor. On the other hand, this increases the capacitive
coupling to substrate limiting the maximum achievable Q
due to substrate losses. Additionally, wider metal lines require a larger spacing between the lines due to the reliability of the interconnect processing, yielding a smaller mutual inductance and hence a smaller overall inductance. Furthermore, the total inductor area, which is usually specified in terms of an output diameter, increases and the achievable inductance is limited. Results obtained by electro-magnetic (EM)
simulations for 2 inductors realized with wide and
moderately wide metal lines and with comparable output
diameter (D) are shown in Table 1. The equivalent parallel
resistance (Rp) is calculated at a target oscillation
frequency (fosc=2.5 GHz) from (3) and the tail bias current
(I0) for desired output voltage swing Voutpp=0.6 Vpp is
Figure 4. Compact model of the symmetric inductor extracted for the
use in the circuit simulation.
Figure 6. Varactor: a) 2D device simulation model, b) simple π-model
of capacitances. Due to symmetry, half of the structure is simulated.
Figure 5. Results of the optimization used for the extraction of inductor
model parameters from Fig. 3.
TABLE II. PARAMETERS OF THE INDUCTOR COMPACT MODEL OBTAINED FROM THE OPTIMIZATION

Ls [nH]     4.44      Rs [Ω]       36        CF [fF]      203
Cox [fF]    44.6      Cox1 [fF]    119       Rsub [Ω]     602
Rsub1 [Ω]   1506      Csub [fF]    1.2e-9    Csub1 [fF]   3.1e-6
calculated by combining (2) and (3). It can be seen that it is difficult to obtain a high-Q inductor at a frequency as low as 2.5 GHz. In the case of the inductor 1, Q=1.487, which is higher compared to the inductor 2. However, the larger width of the metal lines (W) leads to a smaller wire length for a comparable D, which results in a much smaller inductance of the inductor 1. This translates into a much higher I0 and an increased power dissipation. The inductor 2 is designed to have the peak Q-factor around the targeted fosc=2.5 GHz. Due to the higher inductance, the I0 is within the power dissipation specification for the given output voltage swing. Therefore, the inductor 2 is used for the design of the VCO.
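As a rough cross-check of how (2) and (3) drive the inductor choice, the short Python sketch below (our own illustrative calculation, not part of the original design flow) recomputes Rp from (3) and the required tail current I0 from (2) for the two inductors of Table 1; it reproduces the I0 values of 12.4 mA and 1.5 mA listed there.

```python
import math

def rp_tank(f_osc, L, Q):
    # Eq. (3): equivalent parallel resistance of the inductor
    return 2.0 * math.pi * f_osc * L * (Q + 1.0 / Q)

def tail_current(v_out_pp, rp):
    # Eq. (2) solved for I0: Vout,pp = (4/pi) * I0 * Rp
    return math.pi * v_out_pp / (4.0 * rp)

f_osc = 2.5e9   # target oscillation frequency [Hz]
for name, L, Q in (("Inductor 1", 1.12e-9, 1.487),
                   ("Inductor 2", 10e-9, 1.0)):
    i0 = tail_current(0.6, rp_tank(f_osc, L, Q))   # Voutpp = 0.6 Vpp
    print(f"{name}: I0 = {i0 * 1e3:.1f} mA")
```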
It should be noted that in the case of the inductor 1, a
moderately high Q of 3.6 is obtained at f=10 GHz. Since
the higher Q can be obtained at higher frequencies, VCO
can be designed to operate at a higher frequency and then
a frequency divider can be used to obtain the specified
lower frequency. For example, if the inductor 1 is used for
the design of a VCO operating at 10 GHz with Voutpp=0.6 Vpp, from (2) and (3) we can calculate the required tail current to be I0=1.5 mA. Furthermore, the
phase noise performance would be improved as explained
earlier. However, the design of the VCO at f=2.5 GHz without a frequency divider is chosen to put the focus only on the performance of the VCO.
S-parameters obtained by the EM simulations are used
to extract a compact model of the inductor, which is
shown in Fig. 4. Optimization goals are defined as
absolute differences between S11, S12 and S22 obtained
from the EM simulation and the ones obtained from the
compact model from Fig. 4 in the frequency range from
1 GHz to 8 GHz. The relative difference after the optimization is shown in Fig. 5; the error is lower than 2 % in the frequency band of interest around 2.5 GHz, indicating the suitability of the model for the circuit simulation. The parameters of the compact model obtained by the optimization are listed in Table 2.
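The optimization loop described above can be sketched in a few lines. The snippet below is a simplified stand-in, not the authors' actual tooling: it fits a reduced one-port inductor model to synthetic target data generated from the Table 2 values, standing in for the EM-simulated S-parameters.

```python
import numpy as np
from scipy.optimize import least_squares

def y_model(f, ls, rs, cf, cox, rsub):
    # Simplified one-port inductor admittance: series Ls-Rs branch,
    # inter-winding capacitance CF, and an oxide/substrate branch.
    w = 2 * np.pi * f
    y_series = 1.0 / (rs + 1j * w * ls)
    y_cf = 1j * w * cf
    z_sub = 1.0 / (1j * w * cox) + rsub
    return y_series + y_cf + 1.0 / z_sub

f = np.linspace(1e9, 8e9, 101)
# Stand-in "EM simulation" target built from the Table 2 values;
# in the real flow this would come from the EM solver's S-parameters.
target = y_model(f, 4.44e-9, 36.0, 203e-15, 44.6e-15, 602.0)

def residual(p):
    d = y_model(f, *p) - target
    return np.concatenate([d.real, d.imag])

p0 = [3e-9, 20.0, 100e-15, 30e-15, 400.0]   # rough initial guess
fit = least_squares(residual, p0, x_scale="jac")
print(fit.x)
```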
B. Varactor design and modeling
In order to tune the frequency of the VCO, variable
capacitors are needed. The frequency of oscillation is:
fosc = 1 / (2π · √(L · (Cp + Cv))) (4)
where Cv is the variable capacitance and Cp is the overall parasitic capacitance at the output node, including the collector-base, the base-emitter and the collector-substrate capacitances of the cross-coupled pair, as well as the buffer input capacitance, the inductor parasitic capacitance and the interconnect capacitance (see Fig. 2a). Equation (4) is
used to set the value of the varactor minimum and
maximum capacitance for the target central frequency (fc)
and the tuning range (∆f). Frequency tuning is done by
reverse-biased diodes, which are fabricated by using the ion implantations for the fabrication of the n-collector (n-hill in Fig. 1) and of the p+ extrinsic base of HCBT.
Electrical characteristics and the circuit model of the diode
are obtained by the TCAD device simulations. Doping
profiles under the extrinsic base, which are measured by
Secondary Ion Mass Spectrometry (SIMS) are loaded in
the device simulator. Cross-section of the simulated
Figure 7. Measured SIMS profile used in the device simulation. Doping profile at the cutline VCL from Fig. 6a.
varactor is shown in Fig. 6a and the profiles used in the
simulations obtained at the cutline VCL are shown in
Fig. 7. AC simulations are performed at different bias
points in order to obtain the capacitance-voltage (CV)
characteristics. Capacitances of the structure from Fig. 6a
can be represented by a simple π-network shown in
Fig. 6b. It consists of two junction capacitances: the cathode-anode capacitance (CjCA) and the cathode-substrate capacitance (CjCSUB), as well as the parasitic anode-substrate capacitance (CASUB), which is shown to be negligible from the simulation results. In SPICE-like programs the pn-junction capacitance is modeled by:
CjB = CjB0 / (1 + Vapp/Vbi)^M (5)
where CjB0 is the zero-bias capacitance, Vbi the built-in potential, M the grading coefficient and Vapp the voltage applied between the cathode and the anode. SPICE model parameters are
extracted from the device simulation results which are
then scaled in the circuit simulations to obtain target fc and
∆f. Finally, the actual physical design of varactor layout is
carried out with the area defined by the circuit simulation
results. Fitting of the SPICE model to the device simulation after scaling is shown in Fig. 8. The quality factor of the varactors (QC) is important for the total quality factor of the resonant LC-tank. The total quality factor is:
1/QTOT = 1/QL + 1/QC (6)
where QL and QC are quality factors of the inductor and
the varactor, respectively. QC depends on the series
resistances and hence is difficult to model by the device
simulations. In the physical design of the circuit layout,
p+ and n+ regions from Fig. 6a are kept as close as
possible with the constraint not to have too low
breakdown voltage between the cathode and the anode.
The cathode and the anode regions are realized as long thin slices, which yields slightly larger series resistances (i.e., a lower QC) compared to the case when n+ rings are formed around smaller p+ square-shaped regions. The structure with slices is chosen because it is dominated by
Figure 8. Comparison of varactor CV-characteristics obtained by the
device simulations and from the extracted SPICE model.
TABLE III. TRANSISTOR SIZES USED IN VCO DESIGN
Transistor                 TB1   TB2   T1   T2
No. of unit transistors    1     8     8    8
TABLE IV. SIMULATED ELECTRICAL CHARACTERISTICS OF VCO
Parasitic capacitance at the output        Cps=0    Cps=80 fF
VCC [V]                                    3.3      3.3
IBIAS [µA]                                 200      200
I0 [mA]                                    1.6      1.6
Minimum frequency, fmin [GHz]              2.481    2.292
Maximum frequency, fmax [GHz]              2.818    2.575
Tuning range, ∆f [MHz]                     337      283
Central frequency, fc [GHz]                2.650    2.434
Output signal power, Pout [dBm]            0.81     -0.47
Phase noise @100 kHz offset, PN [dBc/Hz]   -73.5    -73.2
the cross-section from Fig. 6a and is basically a 2D structure, which is better predicted by the 2D device simulations.
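The CV-extraction step can be illustrated with a short fit of (5) to simulated CV points. The following sketch uses hypothetical CjB0, Vbi and M values and synthetic data in place of the TCAD results:

```python
import numpy as np
from scipy.optimize import curve_fit

def c_junction(v, cj0, vbi, m):
    # Eq. (5): junction capacitance vs. applied reverse voltage
    return cj0 / (1.0 + v / vbi) ** m

np.random.seed(0)
v = np.linspace(0.0, 3.0, 13)
# Stand-in CV points; in the real flow these come from the TCAD AC
# simulations at different bias points (parameter values hypothetical).
c_sim = c_junction(v, 500e-15, 0.75, 0.45) * (1 + 0.01 * np.random.randn(v.size))

popt, _ = curve_fit(c_junction, v, c_sim, p0=[400e-15, 0.7, 0.5])
print("CjB0 = %.1f fF, Vbi = %.2f V, M = %.2f"
      % (popt[0] * 1e15, popt[1], popt[2]))
```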
C. VCO design
VCO is designed by using the inductor and the
varactor models described in the previous sections. The
size of the varactors is scaled in order to obtain the desired
oscillation frequency (fosc) and the tuning range (∆f).
The HCBT model used in the design is a previously developed standard Gummel-Poon model, which has already been used in the design of RF mixers [5].
and are connected in parallel depending on the required
collector current. Transistor sizes in terms of number of
parallel unit transistors are given in Table 3. The bias current supplied off-chip in Fig. 2a is IBIAS=200 µA. The
size ratio of TB1 and TB2 sets the tail bias current
I0=1.6 mA. The sizes of the transistors T1 and T2 are
chosen the same as in the tail current source in order to
avoid large current density when current is completely
steered in one of the branches. The results of simulated
electrical performance of the VCO are summarized in Table 4. The table includes a column with the assumed parasitic capacitance at the output node used in the simulations (Cps). Cps accounts for the wire capacitance and all possible discrepancies in the capacitances of the transistor and varactor models. Its value is set to fit the measured data.
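As an illustration of how (4) and (5) together set the tuning range, the sketch below sweeps the control voltage for a hypothetical varactor; the SPICE parameters and the single lumped-varactor tank are assumptions for illustration, not the fabricated design:

```python
import math

def cj(v, cj0, vbi, m):
    # Eq. (5): reverse-bias junction capacitance of the varactor
    return cj0 / (1.0 + v / vbi) ** m

def f_osc(L, c_total):
    # Eq. (4): oscillation frequency of the LC-tank
    return 1.0 / (2.0 * math.pi * math.sqrt(L * c_total))

L = 10e-9             # inductor 2 from Table 1 [H]
c_ps = 80e-15         # assumed output parasitic, as in Table 4 [F]
cj0, vbi, m = 500e-15, 0.8, 0.45   # hypothetical varactor parameters

for v_ctrl in (0.0, 1.5, 3.0):     # control-voltage sweep
    f = f_osc(L, c_ps + cj(v_ctrl, cj0, vbi, m))
    print(f"VCTRL = {v_ctrl:.1f} V -> fosc = {f / 1e9:.2f} GHz")
```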
Figure 9. Comparison of measured and modeled inductor Q-factor.
Figure 10. Comparison of measured and modeled inductance of the
inductor.
III. MEASUREMENT RESULTS
A comparison of the measured inductor Q-factor and inductance up to 8 GHz with the ones obtained by the EM simulations and by the compact model used in the design phase is shown in Figs. 9 and 10, respectively. A good fit between the EM simulation and the compact model can be observed, as expected from the optimization results shown in Fig. 5. Slightly higher values of inductance and Q-factor are obtained in the measurements. De-embedding of the measurement results is done by using an open-pad structure only, and the portion of the wire connecting the inductor to the pad adds a small series inductance. Furthermore, the stack of the substrate, metal and dielectric layers and their electrical parameters, which are set as the input to the EM simulator, might introduce some error. Nevertheless, a satisfactory fit of the measured electrical characteristics by the modeled ones is achieved.
A comparison of the CV characteristics obtained by the SPICE model used in the simulations and those extracted from the measured S-parameters of the fabricated varactor is shown in Fig. 11. A good fit of the measured CjCA by the model is accomplished. The measured CjCSUB is underestimated in the simulation due to the underestimated peripheral component, which is not included in the 2D device simulation structure employed for the model development (see
Figure 11. Comparison of measured and modeled CV-characteristics of the varactor.
Figure 12. Measured quality factors of varactor at the anode and the cathode terminals at frequency f = 2.5 GHz.
Fig. 6a). On the other hand, the voltage dependency is well reproduced, and the model can easily be calibrated by adjusting the parameter CjB0 in (5). The quality factor of the varactor (QC) extracted from the measured S-parameters at 2.5 GHz is shown in Fig. 12. The QC measured at the cathode plays a more significant role, since the cathode is connected to the output node. A QC above 10 is achieved for all applied control voltages, which is a rather low value, but according to (6), the LC-tank quality factor (QTOT) is still dominated by the inductor in our design.
Measured output signal power and frequency of
oscillation dependency on control voltage (VCTRL) at
IBIAS=200 µA of fabricated VCO are shown in Fig. 13 and
the results are summarized in Table 5. Measurements are
made directly on-chip by using multi-contact probes.
Maximum and minimum fosc for VCTRL of 0 V and 3 V are
2.53 GHz and 2.295 GHz, respectively, resulting in the
tuning range (∆f) of 235 MHz, which is 9.75 % of the
central frequency (fc=2.41 GHz). Measurement results fit
well to the results of the simulated VCO with assumed
parasitic capacitance Cps=80 fF reported in Table 4. The
assumed Cps takes into account the discrepancy of CjCSUB
in Fig. 11 as well as the wire capacitance at the output
node. The measured output power shows discrepancy
compared to the simulation results, which is attributed to
the actual buffer output impedance driving the 50 Ω input
impedance of the spectrum analyzer, as well as to the reduction of QTOT by the resistance of the interconnect wires, which is not taken into account in the circuit design. The reduction of Pout for larger VCTRL can also be partly attributed to the reduction of the QTOT of the LC-tank. VCTRL is applied at the anode of the varactor, and since the cathode voltage is approximately VCC, the cathode-anode voltage decreases when VCTRL is increased, resulting in a decrease of QC (see Fig. 12). Hence, according to (6), QTOT decreases. The measured Pout and fosc dependencies on the tail bias current (I0) of the cross-coupled pair, for the two marginal control voltages (VCTRL) which define the tuning range, are shown in Fig. 14. The frequency tuning range and Pout are relatively constant for I0 between 1.5 mA and 2.5 mA. Pout rises for larger I0, which agrees with (2), and then falls off when the transistors enter the high-current regime.
The direct phase noise measurement method by using the spectrum analyzer, which is available at the moment, is not suitable for the characterization of free-running VCOs [8]. Therefore, the simulation results are given in this paper. The results show a phase noise of -73.2 dBc/Hz at the 100 kHz offset from the carrier frequency (see the result for Cps=80 fF in Table 4), which is a rather high value. Since the phase noise is proportional to 1/QTOT², it can mainly be improved by incorporating high-Q inductors and varactors.

TABLE V. MEASURED ELECTRICAL CHARACTERISTICS OF VCO

VCC [V]                                    3.3
IBIAS [µA]                                 200
I0 [mA]                                    1.74
Minimum frequency, fmin [GHz]              2.295
Maximum frequency, fmax [GHz]              2.53
Tuning range, ∆f [MHz]                     235
Central frequency, fc [GHz]                2.41
Output signal power, Pout [dBm]            -3.3
Phase noise @100 kHz offset, PN [dBc/Hz]   Not available

Figure 14. Measured oscillation frequency and output signal power dependency on tail bias current of fabricated VCO.
Figure 13. Measured oscillation frequency and output signal power
dependency on control voltage of fabricated VCO.
IV. CONCLUSION
The design of a cross-coupled voltage-controlled oscillator in the low-cost HCBT technology is presented. Varactors and inductors are designed and modeled by device and electromagnetic simulations and fabricated by using the available process steps without altering the technology. Good agreement between the modeled and measured electrical characteristics of the fabricated passive components is accomplished. It is shown that the VCO performance is limited by the low Q-factor of the inductor, which is fabricated in the 2nd aluminum metal layer of the standard CMOS interconnect.
REFERENCES
[1] H. S. Bennett, R. Brederlow, J. C. Costa, P. E. Cottrell, W. M. Huang, A. A. Immorlica, Jr., J.-E. Mueller, M. Racanelli, H. Shichijo, C. E. Weitzel, and B. Zhao, "Device and technology evolution for Si-based RF integrated circuits," IEEE Trans. Electron Devices, vol. 52, no. 7, pp. 1235–1258, Jul. 2005.
[2] T. Suligoj, M. Koričić, H. Mochizuki, S. Morita, K. Shinomura, and H. Imai, "Horizontal Current Bipolar Transistor (HCBT) with a Single Polysilicon Region for Improved High-Frequency Performance of BiCMOS ICs," IEEE Electron Device Lett., vol. 31, no. 6, pp. 534-536, June 2010.
[3] M. Koričić, T. Suligoj, H. Mochizuki, S. Morita, K. Shinomura, and H. Imai, "Double-Emitter HCBT Structure—A High-Voltage Bipolar Transistor for BiCMOS Integration," IEEE Trans. Electron Devices, vol. 59, no. 12, pp. 3647–3650, Dec. 2012.
[4] M. Koričić, J. Žilak, T. Suligoj, "Double-Emitter Reduced-Surface-Field Horizontal Current Bipolar Transistor With 36 V Breakdown Integrated in BiCMOS at Zero Cost," IEEE Electron Device Lett., vol. 36, no. 2, pp. 90-92, Feb. 2015.
[5] J. Žilak, M. Koričić, H. Mochizuki, S. Morita, T. Suligoj, "Impact of Emitter Interface Treatment on the Horizontal Current Bipolar Transistor (HCBT) Characteristics and RF Circuit Performance," in Proc. BCTM, 2015, pp. 31-34.
[6] T. Suligoj, M. Koričić, H. Mochizuki, S. Morita, K. Shinomura, and H. Imai, "Collector Region Design and Optimization in Horizontal Current Bipolar Transistor (HCBT)," in Proc. IEEE Bipolar/BiCMOS Circuits and Technology Meeting, 2010, pp. 212-215.
[7] B. Razavi, "Oscillators," in RF Microelectronics, 2nd edition, Boston, MA, USA: Prentice Hall, 2011.
[8] Keysight Technologies, "Phase Noise Measurement Solutions," Selection guide, USA, 5990-5729EN, Aug. 2014.
Variable-Gain Amplifier for Ultra-Low Voltage
Applications in 130nm CMOS Technology
Daniel Arbet, Martin Kováč, Lukáš Nagy, Viera Stopjaková and Michal Šovčı́k
Department of IC Design and Test
Faculty of Electrical Engineering and Information Technology
Slovak University of Technology
Bratislava, Slovakia
e-mail: daniel.arbet@stuba.sk
Abstract—The paper deals with the design and analysis of a variable-gain amplifier (VGA) working with a very low supply voltage, targeted for low-power applications. The proposed amplifier was designed using the bulk-driven approach, which is suitable for ultra-low voltage circuits. Since the power supply voltage is less than 0.6 V, there is no risk of the latch-up that is usually the main drawback of bulk-driven topologies. The proposed VGA was designed in a 130 nm CMOS technology with a supply voltage of 0.4 V. The achieved results indicate that the gain of the designed VGA can be varied from 0 dB to 18 dB. Therefore, it can be effectively used in many applications, such as an automatic gain control loop with an ultra-low supply voltage, where the dynamic range is an important parameter.
I. INTRODUCTION
Advanced nanotechnologies enable very-large-scale
integration and bring the opportunity to design ultra
low-power analog and mixed-signal integrated systems. In
these modern nanoscale CMOS processes, the power supply voltage is also continuously scaled down. However, the threshold voltage of MOS devices does not decrease with the same slope as the supply voltage.
value of power supply voltage can effectively increase battery
life of portable electronics, biomedical implanted devices,
hearing-aid devices, etc.
One of the most important building blocks of analog
integrated circuits (IC) is a variable gain amplifier (VGA),
which is used in many applications in order to stabilize the
voltage amplitude of a signal at its output. VGA is usually
employed in an automatic gain control (AGC) circuit to
maximize the dynamic range of the whole system.
Since the proposed VGA is meant to be used in low-voltage applications, the possibility of using standard amplifier topologies is limited (because the supply voltage is usually less than 1 V). Standard VGA topologies are usually based on the conventional differential structure [1]–[3]. Since there are four or more stacked transistors, these topologies require a high supply voltage and are not suitable for ultra-low voltage applications. One possible topology of a VGA suitable for a low supply voltage is based on the pseudo-differential difference amplifier (PDDA). The PDDA topology can effectively increase the input and output voltage ranges. On the other hand, the disadvantages of the PDDA include a high sensitivity to process, voltage and temperature (PVT) variations and a low value of the CMRR (Common-Mode Rejection Ratio) due to the missing tail current source. Therefore, a Common-Mode Feedback (CMFB) or Common-Mode Feedforward (CMFF) circuit is usually employed to stabilize the operating point and to increase the CMRR of the PDDA [4], [5].
In the case when the PDDA is based on two common-source amplifiers [4], the input voltage range is limited by the threshold voltage of the input transistors. In order to increase the input voltage range, a rail-to-rail input stage has to be used, which requires unconventional design techniques. In order to overcome this limitation and to increase the input voltage range of the VGA, the so-called bulk-driven technique (MOS devices are controlled by the bulk instead of the gate) was used to design the proposed VGA. The bulk-driven approach has been employed to design a number of analog building blocks [7], [8].
In this paper, design of ultra low-voltage bulk-driven VGA
based on pseudo-differential topology is presented and its main
parameters are analysed. In Section II, the proposed topology
of the VGA is presented. The small-signal analysis of the VGA is then described in Section III. The achieved results are
presented in Section IV. In Section V, the achieved parameters
of the developed VGA are summarized and shortly discussed.
II. PROPOSED VARIABLE GAIN AMPLIFIER
A. VGA general description
Fig. 1. Block diagram of the proposed VGA
Fig. 1 shows the block diagram of the proposed VGA.
The main block representing the VGA core is a bulk-driven
fully PDDA (FPDDA) with the gain control input terminal.
FPDDA is based on two common-source amplifiers, where
a bulk-driven input transistor is used. In order to stabilize
the operational point and increase the CMRR of the proposed
VGA, CMFB and CMFF circuits were employed. To achieve
good stability of the CMFB loop, frequency compensating
capacitors have been used (not depicted in Fig. 1).
B. Proposed Pseudo-Differential VGA topology
Schematic diagram of the proposed ultra-low voltage VGA
core circuit is depicted in Fig. 2. The VGA was designed in
130 nm CMOS technology and can reliably operate at the
supply voltage of 0.4 V.
Fig. 2. Schematic diagram of the bulk-driven FPDDA-based VGA

Fig. 3. Schematic diagram of CMFF circuit
The proposed topology is based on the cascode FPDDA,
where bulk-driven (BD) input MOS transistors were used.
Input BD transistors are used to obtain the rail-to-rail input
voltage range. Unfortunately, bulk transconductance of a MOS
transistor is given by the following expression gmb ≈ 0.2gm ,
which means that the gain and gain-bandwidth product (GBW)
of the proposed VGA will be decreased. On the other hand,
the circuits designed by the bulk-driven approach are useful
for ultra low-voltage and low power applications.
Generally, gain of the VGA can be varied by controlling
its total conductance or the total output resistance. Thus, in
our case, transistors M5 and M6 were employed to control
the VGA gain. Voltage change at the VGA control terminal
(CTRL) causes a change in current flowing through the
input transistors M1 and M2, and thereby regulates their
transconductance. Thus, changes in transconductance of the
input transistors lead to variation of the total VGA gain.
Therefore, the total gain of VGA is directly proportional to
transconductance of the gain control transistors (gm5 and gm6 ).
Additionally, the gain control transistors together with the
input transistors represent a cascode stage and therefore, gm5
and gm6 influence also the output resistance of VGA. Detailed
small-signal analysis of the designed VGA is described in
Section III.
C. CMFF circuit
In order to adjust the bias voltage (CMFF out) of input
transistors (M1-M4), a CMFF circuit depicted in Fig. 3 was
used. Transistors M9 and M10 represent the input transistors
of the CMFF circuit, which follow the signal at the input
of the VGA. In the case of a differential input signal, there is no change in the current flowing through the diode-connected transistor M11, and the voltage at the CMFF output (the voltage drop on the diode-connected transistor M13) is stable. A difference between the common-mode voltage values at the VGA inputs changes the current through transistor M11. This current is mirrored by transistor M12 to transistor M13, whose voltage drop then changes. This principle is used to regulate the (gate) bias voltage of the input transistors M1-M4 when the common-mode voltage at the VGA inputs changes due to PVT. Finally, the voltage drop on transistors M1 and M2 serves as the bias voltage for the input transistors of the CMFF circuit (M9 and M10). Thanks to this biasing technique, the VGA sensitivity to the common-mode voltage as well as to PVT is reduced.
D. CMFB circuit
In general, a CMFB is needed to prevent the output voltages from saturating to one of the power rails when the input common-mode voltage varies or there is a change in the circuit operating point due to the VGA control voltage. A CMFB usually consists of a common-mode voltage detector and an error amplifier, as presented in [9]. In general, this approach suffers from a chip area overhead in terms of the resistor implementation and the separation of the two main active blocks. In addition, the potential incompatibility of conventional structures with ultra-low voltage applications is increased (especially with the supply voltage deep below 1 V), which results in frequently used low-voltage realizations such as the current-based operation CMFB (CB-CMFB) or its improved version [10], which combines the common-mode detection and gain functionality, in contrast to many published common-mode detection-only realizations [9].
The novel self-biased bulk-driven CMFB circuit that combines common-mode voltage detection and error amplification capabilities is introduced (Fig. 4), and used as the most important part of the low-voltage VGA based on the pseudo-differential structure. The main idea is slightly similar to the CB-CMFB, where two differential pairs (N2x, N3x) with purposely degraded current sources (N1x) of the same polarity are connected in parallel, but in our design, the current mixing is accomplished in the output node CMFB_out across two current mirrors (N4x, N5x). Degraded tail current sources (diode-connected transistors in this case, in contrast to the CB-CMFB) were employed to improve the overall common-mode gain of the VGA. We have to note that the blue dashed feedback consisting of these two transistors can work as a negative feedback (across N3x) or a partially positive feedback (across N4x, N5x). Additionally, the output node
is jammed between two current sources (N3x, N5x) with high output impedance. Therefore, a further gain improvement can be expected, which, in the case of the inherently low-gain bulk-driven configuration, is the main effort. Finally, similarly to the PDDA, a self-biased approach was also employed in the CMFB design, which slightly contributes to the improvement of the entire PSRR of the VGA and eliminates the need for a separate bias circuit. Originally, the designed CMFB needs only one bias voltage for transistors N2x and N3x, which in the case of the self-biased configuration is formed by the anti-series diode-like connected transistors N6x and N7. This forms two additional feedback loops, where the green dashed and the red dashed loop introduce a partially positive and a negative
feedback, respectively. Therefore, the design of the CMFB circuit must be carried out carefully and investigated as part of the whole VGA, especially if suitable common-mode phase margin requirements have to be fulfilled. To improve its common-mode and differential-mode performance, the topology was extended by a couple of bridged transistors N8x that lightly balance the current between the two differential pairs.

Fig. 4. Schematic diagram of the bulk-driven CMFB: original design (black), modification (red), N-type realization
III. SMALL-SIGNAL ANALYSIS
A. Small-Signal Analysis of VGA
For better understanding of the gain control technique
used in the developed VGA, the low-frequency small-signal
analysis has been performed. The small-signal model of the
proposed VGA (without CMFB circuit and second stage) is
shown in Fig. 5. Since the selected topology is symmetrical, in the small-signal analysis we can consider the half-circuit model and evaluate the impact of +Vin and −Vin separately, and then simply sum the contributions of both parts.
The total low-frequency gain of the proposed VGA circuit
can be written as
A = Gm · Rout (1)
where Gm and Rout represent the total transconductance and
the total output resistance of the proposed VGA, respectively.
Fig. 5. Small-signal model of proposed VGA

Using the small-signal model (Fig. 5), the total VGA transconductance Gm can be expressed as follows:
Gm = gmb1 · (gm5 + gds5)/(gds1 + gm5 + gds5) − gmb3 (2)
where gmb1 and gmb3 are the bulk transconductances of transistors M1 and M3, gm5 is the transconductance of transistor M5, while gds1 and gds5 are the output conductances of transistors M1 and M5. If we consider that gm5 >> gds5, then equation 2 can be rewritten as follows:
Gm1 = gmb1 · gm5/(gds1 + gm5) − gmb3 = gmb1 · K − gmb3 (3)
where
K = gm5/(gds1 + gm5) = 1/(1 + gds1/gm5) (4)
One can observe that the total transconductance of the VGA (Gm) can be controlled by the transconductance gm5, which depends on the control voltage (CTRL). The coefficient K can vary in the range from 0 to 1. If gm5 << gds1, the coefficient K from equation 4 becomes zero and the total transconductance of the VGA equals −gmb3. In the opposite case (when gm5 >> gds1), the coefficient K equals 1 and Gm1 equals gmb1 − gmb3. This means that by increasing the transconductance gm5, the total transconductance Gm1 is decreased.
Besides, the total transconductance Gm also depends on the gds1/gm5 ratio. If the numerator and the denominator of the ratio are divided by Ids1, the following expression can be written:
gds1/gm5 = (gds1/Ids1) / (gm5/Ids1) = (VCTRL − Vth1) / (2 · (VA1 + Vds1)) (5)
where VCTRL is the control voltage of the VGA, and Vth1, VA1 and Vds1 are the threshold voltage, the Early voltage and the drain-source voltage of transistor M1, respectively. From equation 5, it can be observed that the gds1/gm5 ratio, which controls the total transconductance Gm, depends on the control voltage of the proposed VGA.
Using the small-signal model depicted in Fig. 5, the output
resistance Rout of the proposed VGA is expressed as
Rout1 = [(1 + gm5·rds1)·rds5 + rds1] || rds7 || rds4 (6)
where rds1, rds4, rds5 and rds7 are the output resistances of transistors M1, M4, M5 and M7, respectively. Since the term gm5·rds1·rds5 > rds1, equation 6 can be simplified to
Rout = (gm5·rds1·rds5) || (rds4 || rds7) = A || B (7)
In equation 7, terms A and B represent a parallel combination
of two resistances. From the VGA design point of view, if term
B is maximized, the total resistance Rout of the VGA will
depend on term A. In such a case, it is possible to vary Rout
by transconductance gm5 that is proportional to the control
voltage of the VGA.
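The gain-control mechanism expressed by (1), (3), (4) and (7) can be illustrated numerically. The sketch below uses assumed small-signal values (not taken from the paper) with matched gmb1 = gmb3; consistent with the discussion above, the gain magnitude drops as gm5 (set by the CTRL voltage) increases:

```python
import math

def par(*r):
    # parallel combination of resistances
    return 1.0 / sum(1.0 / x for x in r)

# Assumed small-signal values for illustration:
gmb1 = gmb3 = 8e-6                                # matched bulk transconductances [S]
gds1 = 2e-6                                       # output conductance of M1 [S]
rds1, rds4, rds5, rds7 = 0.5e6, 1e6, 0.8e6, 1e6   # output resistances [Ohm]

for gm5 in (1e-6, 10e-6, 100e-6):                 # gm5 is set by the CTRL voltage
    K = 1.0 / (1.0 + gds1 / gm5)                  # Eq. (4)
    Gm = gmb1 * K - gmb3                          # Eq. (3)
    Rout = par(gm5 * rds1 * rds5, rds4, rds7)     # Eq. (7): A || B
    Av = Gm * Rout                                # Eq. (1)
    print(f"gm5 = {gm5 * 1e6:5.1f} uS -> |Av| = {20 * math.log10(abs(Av)):6.1f} dB")
```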
B. Small-Signal Analysis of CMFB
To investigate the low-frequency common-mode gain of the CMFB circuit (i.e., its error-amplifier character), an exhaustive small-signal model has been derived (Fig. 6), where the individual parameters are also listed.

Fig. 6. Low frequency small-signal model of CMFB

The exact closed-loop common-mode gain of the CMFB is, however, rigorous and too difficult to interpret or to extract any handy information from, because of the three internal feedback loops already discussed. Fortunately, if the operational modes of the individual transistors are considered (i.e., N1 works like a linear resistor, while N4, N6, N7 and N2, N3, N5 represent diode-connected transistors and current sources, respectively), an approximated expression for the common-mode gain of the CMFB can be derived:
AvCMFB,app = (gmb2/(gds3 + gds5)) · ((gm5 + gmb5)/(gm4 + gmb4)) (8)
Equation 8 is based on the following presumptions:
GM1 = GM2 ∧ RO1 = RO2 ∧ RO1 ≫ RO6 ∧ GM4·RO6 = 1 (9)
Thus, equation 8 reveals a few interesting observations: the common-mode gain depends only on the properties of the differential pairs and the current mirrors, with no influence of the bias circuit and transistors N1x. Such a degree of freedom can be useful for finding a solution that complies with the sensitive restrictions in (9), of which the last one is the most difficult to fulfill. In our case, its value was approximately 1.029 in all corners, which leads to a calculation inaccuracy of up to circa 2.5 dB. Additionally, one can observe that with a gate-driven approach, the common-mode gain of the novel self-biased CMFB could be further improved by a factor of 4 to 5.
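For orientation, the approximated expression (8) can be evaluated directly; the small-signal values below are assumptions for illustration, not the designed circuit:

```python
import math

# Assumed small-signal values for illustration:
gmb2, gmb5 = 2e-6, 2e-6        # bulk transconductances [S]
gm4, gmb4, gm5 = 10e-6, 2e-6, 10e-6
gds3, gds5 = 0.5e-6, 0.5e-6    # output conductances [S]

# Eq. (8): approximated common-mode gain of the CMFB
av = (gmb2 / (gds3 + gds5)) * ((gm5 + gmb5) / (gm4 + gmb4))
print(f"AvCMFB,app = {20 * math.log10(av):.1f} dB")
```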
IV. SIMULATION RESULTS
In this section, simulation results achieved for the designed
VGA (including the CMFB and CMFF circuits) using the
supply voltage of 0.4 V are presented. The results were
obtained from Corner and Monte Carlo (MC) analyses, where
the process variation as well as mismatch of devices were
taken into account. Since the proposed VGA is suitable for
low-frequency applications, there is an assumption that results
obtained by post-layout simulation will not differ significantly.
The frequency response of VGA for different values of the
control voltage is shown in Fig. 7. It can be observed that for
the control voltage of 0.1 V, the VGA gain of about 18 dB
and GBW of 1.2 MHz were achieved. These parameters were
obtained for the load capacitance of 1 pF. For higher values
of the control voltage, the gain decreases down to 6 dB. It is
important to note that the bandwidth (BW) does not change
with the CTRL voltage, which is one of the advantages of the
proposed VGA.
Fig. 7. Frequency response of the VGA for VCTRL = 0.1 V, 0.25 V and 0.3 V (f−3dB ≈ 173.6–173.8 kHz; f0dB from 289.3 kHz to 1.2 MHz)
As can be observed from Fig. 8, the VGA gain is varied in
the range of the control voltage from 0 V to 0.33 V but it is
linear only in the range from 0.3 V to 0.33 V. In the whole
range, gain varies from 18 dB to 0 dB. The gain change (for
low values of CTRL) caused by process variations is 3.34 dB,
which represents a significant result taking into account that
the proposed VGA was designed in 130 nm technology. On
the other hand, variance of gain for high values of CTRL is
substantially higher because the slope of the characteristics changes with the process corner. The worst case can be observed in the fast-slow (FS) and slow-fast (SF) corners. Although the supply voltage is very low, the gain is relatively stable over temperature for low values of CTRL. For high values of CTRL, the gain deviation caused by temperature increases because the bias voltage of the input transistors is temperature dependent. This deviation can be reduced by using temperature compensation of the bias voltage.
An important parameter of the PDDA is the CMRR. Fig. 9 shows the variations of the CMRR parameter obtained from the MC analysis. It can be observed that the CMRR varies in the range
2 0
1 5
[d B ]
1 0
A
V
5
fa s
fa s
s lo
s lo
ty p
0
-5
-1 0
0 .1 0
t n m
t n m
n m
w n m
ic a l
w
o s o s o s
o s
fa s t
s lo w
- fa s
- s lo
0 .1 5
0 .2 0
p m o s
p m o s
t p m o s
w p m o s
V
C T R L
[V ]
0 .2 5
0 .3 0
0 .3 5
Fig. 8. VGA gain vs the control voltage in all process corners
from -40 dB to -85 dB, while the mean value is about -60 dB. This is generally a very good result, which was achieved thanks to both the CMFF and CMFB circuits being employed.
Since the designed VGA contains a novel bulk-driven CMFB, we investigated its performance individually. The two most important characteristics are depicted in Fig. 11 and Fig. 12. Fig. 11 shows the frequency response across all corners, where a load capacitance of 0.5 pF was considered and the reference voltage VREF was set to Vdd/2. A common-mode gain between 17.38 dB and 21.29 dB was achieved, which is further increased by the gmM7 and gmM8 transconductances. The achieved value of the common-mode gain is sufficient to hold the common-mode output voltage approximately in the range from 188.5 mV to 199.8 mV in all corners. However, the phase of the common-mode gain transfer function (not shown) moves in the range from 0 to 360 deg and therefore the compensation task could be quite challenging. One can also observe that the minimum bandwidth of 23.43 kHz was achieved at the slow-slow (SS) corner. This value is sufficient for biomedical and audio applications, where a bandwidth of approximately 1 kHz and 20 kHz is required, respectively.
Fig. 9. Variations of CMRR evaluated by MC analysis (mean ≈ −61.3 dB, std. dev. = 9.77 dB)
Since the self-biased technique was used in the proposed
VGA, the PSRR parameter is also improved. Fig. 10 shows the
variation of the PSRR parameter, where the process variation
and mismatch of all devices were taken into account. One
can observe that the PSRR parameter varies in the range from
-75 dB to -25 dB, while the mean value of PSRR is about
-45 dB.
Fig. 12 shows the CMFB output control voltage VCMFB_out versus the VGA output voltage. It is obvious that the CMFB exhibits good performance in terms of the output control voltage range, with a 0.12 V linear range at the 0.4 V supply voltage. We also note that the FS corner is the worst-case corner in terms of the common-mode gain and the CMFB output voltage operating region.
Fig. 11. Frequency response of CMFB
Fig. 10. Variation of PSRR obtained from MC analysis (mean = −48.91 dB, std. dev. = 10.7 dB)
Fig. 12. CMFB output voltage vs. common output voltage of VGA
The main parameters of the VGA and CMFB circuits obtained from the corner analysis are summarized in Fig. 13 and Fig. 14, respectively. In the case of the proposed VGA, the FF corner represents the worst case. In the FF corner, a gain of 15.98 dB and a current consumption of 19.57 µA were achieved. A summary of all characteristic CMFB parameters, i.e., the static current consumption, BW, GBW and the maximum ripple of the CMFB output voltage ΔVCMFB_out driven by a ±200 mV differential output voltage of the CMFB, can be found in Fig. 14. Because diode-connected transistors were used instead of current-mode operation transistors, one can observe a substantial current fluctuation across the individual corners, affecting also the BW and GBW. The worst-case static consumption of approximately 12.5 µA was observed for the fast-fast (FF) corner. Since the proposed VGA is based on two input differential pairs and uses the cross-coupled topology, the total current consumption does not depend on the control voltage CTRL, which is considered an advantage.
Fig. 13. Summary of the VGA main parameters across all corners.

Fig. 14. Summary of the CMFB main parameters across all corners.
V. CONCLUSION
The novel VGA circuit based on PDDA topology
for low-voltage applications, designed in 130 nm CMOS
technology, was presented. As demonstrated, the designed
VGA represents a building block for low-power and
low-voltage applications, where the supply voltage of less than
0.6 V, differential signal processing as well as high dynamic
range and low distortion are required. The future work will be directed towards the design of a logarithmic function generator in order to increase the linear-in-decibel range, as well as towards further improvement of the main VGA parameters in terms of ExG requirements.

TABLE I. MAIN PARAMETERS OF THE PROPOSED VGA

Parameter                   Condition     Min      Typ     Max
VCTRL [V]                   VDD = 0.4 V   0        -       0.33
Av tun. range [dB]          0 ÷ 0.33      0        -       18
Av Lin-in-dB [dB]           0.3 ÷ 0.33    0        -       8
GBW [Hz]                    CL = 1 pF     289k     -       1.2M
BW [Hz]                     CL = 1 pF     173.6k   -       173.8k
CMRR [dB]                   VDD = 0.4 V   −40      −60     −85
PSRR [dB]                   VDD = 0.4 V   −25      −45     −75
Av, cmfb [dB]               CL = 0.5 pF   17.4     20.1    21.29
GBW, cmfb [kHz]             CL = 0.5 pF   290      570     820
BW, cmfb [kHz]              CL = 0.5 pF   20       50      90
Lin. op. range, cmfb [V]    -             0.12     0.13    0.15
Total power [µW]            VDD = 0.4 V   1.39     3.45    7.84
ACKNOWLEDGMENT
This work was supported by the Slovak Republic under
grants VEGA 1/0762/16 and VEGA 1/0823/13.
REFERENCES
[1] H. D. Lee, K. A. Lee, and S. Hong, “A Wideband CMOS Variable Gain
Amplifier With an Exponential Gain Control,” Microwave Theory and
Techniques, IEEE Transactions on, vol. 55, no. 6, pp. 1363–1373, June
2007.
[2] P.-C. Huang, L.-Y. Chiou, and C.-K. Wang, “A 3.3-V CMOS
wideband exponential control variable-gain-amplifier,” in Circuits and
Systems, 1998. ISCAS ’98. Proceedings of the 1998 IEEE International
Symposium on, vol. 1, May 1998, pp. 285–288 vol.1.
[3] T. Yamaji, N. Kanou, and T. Itakura, “A temperature-stable CMOS
variable-gain amplifier with 80-dB linearly controlled gain range,”
Solid-State Circuits, IEEE Journal of, vol. 37, no. 5, pp. 553–558, May
2002.
[4] A. Suadet and V. Kasemsuwan, “A 1 Volt CMOS Pseudo Differential
Amplifier,” in TENCON 2006. 2006 IEEE Region 10 Conference, Nov
2006, pp. 1–4.
[5] M. Shahabi, R. Jafarnejad, J. Sobhi, and Z. Daei Kouzehkanani,
“A novel low power high CMRR pseudo-differential CMOS OTA
with common-mode feedforward technique,” in Electrical Engineering
(ICEE), 2015 23rd Iranian Conference on, May 2015, pp. 1290–1295.
[6] F. Khateb and S. Vlassis, “Low-voltage Bulk-driven Rectifier for
Biomedical Applications,” Microelectron. J., vol. 44, no. 8, pp. 642–648,
Aug. 2013.
[7] G. Raikos, S. Vlassis, and C. Psychalinos, “0.5 V bulk-driven
analog building blocks,” International Journal of Electronics and
Communications, vol. 66, no. 11, pp. 920 – 927, 2012.
[8] J. Carrillo, G. Torelli, R. Prez-Aloe, and J. Duque-Carrillo, “1-V
rail-to-rail CMOS OpAmp with improved bulk-driven input stage,” IEEE
Journal of Solid-State Circuits, vol. 42, no. 3, pp. 508–516, 2007.
[9] J. Carrillo, G. Torelli, M. Dominguez, R. Perez-Aloe, J. Valverde,
and J. Duque-Carrillo, “A Family of Low-Voltage Bulk-Driven CMOS
Continuous-Time CMFB Circuits,” Circuits and Systems II: Express
Briefs, IEEE Transactions on, vol. 57, no. 11, pp. 863–867, Nov 2010.
[10] F. Castano, G. Torelli, R. Perez-Aloe, and J. Carrillo, “Low-voltage
rail-to-rail bulk-driven CMFB network with improved gain and
bandwidth,” in Electronics, Circuits, and Systems (ICECS), 2010 17th
IEEE International Conference on, Dec 2010, pp. 207–210.
Relaxation Oscillator Calibration Technique with
Comparator Delay Regulation
J. Mikulić*, G. Schatzberger* and A. Barić**
* ams AG, Graz, Austria
** University of Zagreb/Faculty of Electrical Engineering and Computing, Zagreb, Croatia
josip.mikulic@ams.com
Abstract – This paper presents an improved technique for the calibration of relaxation oscillators with respect to the delay of the comparators. The drawbacks of the conventional relaxation oscillator topology are analyzed. Based on the analysis, a circuit modification which resolves the effects of the comparator delay in the trimming procedure is proposed. Simulations in the ams 0.18 μm CMOS technology exhibit a more than 5x improvement in precision compared to the conventional topology, evaluated in the temperature range from -40 to 125 °C.
I. INTRODUCTION
A stable clock reference is one of the basic building
blocks of every digital and mixed-signal circuit. Following the ever-increasing integration trend of the semiconductor industry, clock references tend to be implemented as full on-chip solutions in systems that require low power consumption and low production cost [1-5]. As a drawback compared to crystal oscillators, they suffer from reduced accuracy, typically in the range of several percentage points [1].
Although some techniques have been adopted in order to minimize the influence of the process and supply voltage variations [1-3], they rely on references that are stable over temperature, which are not always at disposal for low-cost, full on-chip solutions. On the other hand, techniques for the process and temperature compensation of the references, such as [6], have limited success and are always inferior to the performance of crystal oscillators. Therefore, it is obvious that a trimming procedure should be considered for increased accuracy.
In this paper, a digital trimming technique is described, similar to the technique presented in [4]. Combined with a LUT (Look-Up Table) and a temperature sensor, it enables the calibration of the oscillator over the entire temperature range. This principle is illustrated using the conventional relaxation oscillator topology, shown in Fig. 1, which has the advantage of allowing calibration through the referent currents, in contrast to ring and harmonic oscillators where trimming of the resistors and capacitors would be needed. Finally, a modification of the topology which ensures more accurate results of the calibration procedure is proposed, with a negligible increase of the circuit complexity and power consumption.
This paper is organized as follows. In Section II the
conventional topology is described and analyzed. Section III proposes the improvement of the design, while in Section IV the improved topology is verified with simulations. The final conclusions are given in Section V.

Figure 1. The conventional relaxation oscillator topology.
II. CONVENTIONAL OSCILLATOR ARCHITECTURE
A. Timing Analysis
The conventional topology of the relaxation oscillator is shown in Fig. 1. As seen in Fig. 2, the capacitor C charges and discharges with the current IREF. The switches S1 and S2 alternate the charging and discharging process every half-cycle. The two comparators, biased with the current IB, activate the set and reset signals of the SR flip-flop at the moments when the capacitor voltage VC gets higher or lower than the referent voltages VREFhigh and VREFlow, respectively. From Fig. 2 it can be seen that the duration of one period is determined by the slew rate of the capacitor voltage (SRVC = IREF/C), the difference of the
Figure 2. One period of the capacitor voltage VC.
referent voltages (ΔVREF = VREFhigh – VREFlow), and the time
needed for the comparators to activate the set and reset
signals (td1, td2).
If we neglect the switching delays of the flip-flop and
the switches, and assume that td1 = td2 = td, the expression
for the period is then calculated as follows:
T = 2·C·(VREFhigh − VREFlow)/IREF + 2·(td1 + td2) = 2·C·ΔVREF/IREF + 4·td (1)
The comparator delay td observed in (1) represents a
serious problem in low-power and high-precision designs
[1]. First of all, it takes a significant portion of the total
period duration, unless an excessive amount of power is
consumed. Furthermore, it is never precisely known, being influenced by parasitics, as well as the process, temperature, supply and referent voltage changes. It also has a
negative influence on the calibration procedure, explained
in the following section.
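To put the delay term into perspective, the short sketch below evaluates (1); C, ΔVREF and IREF follow the simulation setup given later in Section IV, while td = 20 ns is an assumed value:

```python
# Share of the comparator delay in the oscillation period, Eq. (1).
# C, dV_ref and I_ref follow the Section IV setup; td = 20 ns is assumed.
C, dV_ref, I_ref, td = 1e-12, 0.6, 1.2e-6, 20e-9

T = 2 * C * dV_ref / I_ref + 4 * td   # Eq. (1) with td1 = td2 = td
print(f"T = {T * 1e6:.2f} us, f = {1 / T / 1e6:.2f} MHz, "
      f"delay share = {4 * td / T:.1%}")
```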
B. Calibration Method
From (1) we can observe that the oscillation period can
be modified with the current IREF. A straightforward way
to realize this is a current DAC (Digital to Analog Converter) controlled by the temperature sensor. The block
scheme of such a system is shown in Fig. 3. As a result,
the current IREF is defined as
IREF = B·IREF0 (2)
where IREF0 is the value of the unaltered referent current, while B is the referent current multiplication factor. Equation (1) then becomes:
T = 2·C·ΔVREF/(B·IREF0) + 4·td (3)
From (3) we can derive the expression for the factor B needed to trim the oscillator to the desired period (Tdes) at a certain temperature:
B = (T0 − 4·td)/(Tdes − 4·td) (4)
where T0 represents the measured period with B = 1. From (4) we can observe that for a precise calibration, the parameter td has to be known upfront. As this is never the case, a method for bypassing this effect is developed in the following section.
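The consequence of an unknown td can be illustrated by comparing the exact trim factor from (4) with a naive trim that ignores td (same assumed values as in the sketch above):

```python
# Trim factor with and without accounting for td (Eqs. (3) and (4)).
# Same illustrative values as above; Tdes is the desired period.
C, dV_ref, I_ref0, td = 1e-12, 0.6, 1.2e-6, 20e-9
T_des = 1.0e-6

def period(B):
    return 2 * C * dV_ref / (B * I_ref0) + 4 * td   # Eq. (3)

T0 = period(1.0)                             # measured period at B = 1
B_exact = (T0 - 4 * td) / (T_des - 4 * td)   # Eq. (4), td known
B_naive = T0 / T_des                         # trim that ignores td
for name, B in (("exact", B_exact), ("naive", B_naive)):
    err = (period(B) - T_des) / T_des
    print(f"{name}: B = {B:.4f}, residual period error = {err:.2%}")
```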
III. MODIFIED OSCILLATOR ARCHITECTURE
A. Comparator Delay Analysis
At the beginning, the analysis of the comparator delay
is conducted. For this purpose, two complementary symmetrical OTA (Operational Transconductance Amplifier)
topologies shown in Fig. 4 are employed. Using the symmetrical OTA topology has the advantage of reduced systematic offset and better transfer characteristic in contrast
to the basic, asymmetrical OTA topology. The complementary topologies are suitable because of the different
voltage levels at the inputs of each comparator.
In Fig. 5 the time diagram of the low-to-high transition
of the clock voltage VCLK is shown. Currents and voltages
correspond to the OTA in Fig. 4(a), i.e. to COMP1 in
Fig. 1. For t < 0, the voltage at the inverting input of the
comparator (VREFhigh) is larger than the voltage at the non-inverting input (VC). From this it follows that the current ID1 is larger than ID2, and the transistor M10 is in the linear region,
Figure 3. The referent current and voltage generator, together with the calibration circuitry.
Figure 4. The symmetrical OTA (a) with nMOS input pair (COMP1), (b) with pMOS input pair (COMP2).
Figure 5. The time diagram of the comparator signals during the low-to-high transition of the clock voltage VCLK.
while the transistor M8 is in the saturation region. As
a consequence, the voltage VOUT is at the ground level. At
the moment t = 0, the voltages at the inputs are equal, as
well as the currents ID1 and ID2. At this moment, the transistor M10 starts to enter the saturation region, while the
output current IOUT, which charges the parasitic capacitance Cp at the output, becomes equal to the difference of
the currents ID8 and ID10. The voltage VOUT then rises and
reaches the ΔVthr level of the flip-flop at t = td, which then
changes the output state of the flip-flop from low to high.
At this moment, the states of the switches S1 and S2 also
reverse, and the voltage VC now starts to decrease with the
identical slew rate. As a result, the waveforms of the currents ID1 and ID2 are mirrored around t = td. They become
equal again at t = 2td, after which they discharge the voltage VOUT to the ground level.
In order to describe the behavior quantitatively, first we use the equation for the strong inversion of the MOS transistor, with the channel length modulation effect neglected:
ID = KT·(VGS − VT)² (5)
where ID is the drain current, KT is the technology and design constant, VGS is the gate-source voltage, and VT is the threshold voltage. Using (5), we can derive the following equation for the differential currents of the transistors M1 and M2:
ID1,2 = (IB/2)·[1 ± (KT·ΔV/IB)·√(2·IB/KT − ΔV²)] (6)
where
ΔV = VIN+ − VIN− = VC − VREFhigh (7)
Furthermore, we can safely neglect the higher order effects in (6) related to ΔV², as the voltage ΔV in this application will be low. As a result, the expression in (6) is reduced to
ID1,2 = (IB/2)·[1 ± ΔV·√(2·KT/IB)] (8)
For t > 0, the current IOUT, which charges and discharges the parasitic capacitances at the output node of the comparator, can be written as
IOUT = ID8 − ID10 = ID2 − ID1 (9)
Equation (9) combined with (8) leads to
IOUT = ΔV·√(2·KT·IB) (10)
For 0 < t < td, the differential voltage ΔV is equal to
ΔV(t) = (IREF/C)·t (11)
Combining (10) with (11), we obtain the current IOUT as a function of time for 0 < t < td:
IOUT(t) = (IREF/C)·√(2·KT·IB)·t (12)
The delay td of the comparator is experienced as the time needed for the output voltage to charge up to the threshold voltage ΔVthr of the SR flip-flop (approximately half of the supply voltage). As the following is valid:
ΔVthr = (1/Cp) · ∫[0,td] IOUT(t) dt (13)
the expression (13) combined with (12) then gives
ΔVthr = (1/Cp) · ∫[0,td] (IREF/C)·√(2·KT·IB)·t dt (14)
After the integration, we obtain the following expression for the delay td of the comparator:
td = √(2·C·ΔVthr·Cp / (IREF·√(2·KT·IB))) (15)
A similar analysis can be performed for the high-to-low transition. Although in this case the parameters Cp, ΔVthr and KT slightly differ, this difference can be neglected for the purpose of this simplified analysis.
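A quick numeric check of (15) combined with the scaling introduced below in (17) shows the property the modified architecture exploits: with IB = κ·B²·IREF0, the product B·td stays constant, i.e., td scales as 1/B just like the first term of (3). All values below are assumptions for illustration, not the actual design:

```python
import math

# Numeric check of Eq. (15) with the scaling of Eq. (17).
# All values are assumptions for illustration.
C, Cp = 1e-12, 50e-15      # timing and comparator output capacitance [F]
dV_thr = 0.9               # flip-flop threshold, roughly VDD/2 [V]
KT = 1e-4                  # technology/design constant [A/V^2]
I_ref0, kappa = 1.2e-6, 1.0

def t_delay(B):
    i_ref = B * I_ref0                      # Eq. (2)
    i_b = kappa * B ** 2 * I_ref0           # Eq. (17)
    return math.sqrt(2 * C * dV_thr * Cp /
                     (i_ref * math.sqrt(2 * KT * i_b)))   # Eq. (15)

for B in (0.5, 1.0, 2.0):
    td = t_delay(B)
    print(f"B = {B}: td = {td * 1e9:.1f} ns, B*td = {B * td * 1e9:.1f} ns")
```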
B. Calibration Method with Modified Architecture

After combining (1) and (15), the expression for the duration of the clock period becomes

T = 2·C·VREF/(B·IREF0) + 4·sqrt( 2·C·ΔVthr·Cp / (B·IREF0·sqrt(2·KT·IB)) )   (16)

If we set the bias current IB in the following relation with IREF0:

IB = κ·B^2·IREF0   (17)

where κ is an arbitrary constant, the expression for the clock period then becomes equal to

T = (1/B)·( 2·C·VREF/IREF0 + 4·sqrt( 2·C·ΔVthr·Cp / (IREF0·sqrt(2·κ·KT·IREF0)) ) )   (18)

Each parameter in (18), excluding the factor B, is a constant determined by the design and process at a certain temperature. If we compare (18) with (3), we can see that the delay of the comparators can be modified by varying the factor B in the same way as the rest of the expression. As a result, the expression (4) reduces to

B = B0·T0/Tdes   (19)

from which it follows that an improved precision will be achievable, since the only parameter required for the calibration is the measured period T0.

C. Square Function Realization

The improved calibration method, according to (17), requires a square relationship between the bias current IB and the factor B. The exponential behavior of the MOS transistors in weak inversion will be exploited in order to realize this relationship. The schematic of the proposed circuit is shown in Fig. 6.

First, we start from the current equation valid for MOS transistors in weak inversion. The currents IB1 and IB2 are equal to

IB1,2 = I0·exp( VGS1,2 / (nkT/q) )   (20)

where I0 is a constant determined by the design and process, n is the emission coefficient, k is the Boltzmann constant, T is the temperature in kelvin and q is the charge of an electron. From (20), we can calculate the relationship of the two currents:

IB2 = IB1·exp( IB1·RB / (nkT/q) ) ≈ IB1·( 1 + IB1·RB / (nkT/q) )   (21)

for IB1·RB/(nkT/q) < 1. If we consider that

IB1 = IREF = B·IREF0   (22)

together with

IB = IB2 − IREF   (23)

the expression for the bias current IB turns into

IB = κ·B^2·IREF0   (24)

where

κ = IREF0·RB / (nkT/q)   (25)

Although in theory a κ close to or larger than one would induce higher-order effects in the approximation made in (21), in practice it will push the transistor MB2 towards the strong inversion region, compensating for these effects and making the approximation valid again. Also note that the circuit in Fig. 6 can be scaled down to reduce the power consumption.
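The short Python sketch below checks the approximation (20)-(24) numerically; IREF0 is taken from Section IV, while n and RB are assumed values chosen only so that κ stays below one:

from math import exp

# I_REF0 follows Section IV; n and R_B are assumed so that kappa < 1 holds.
Vt     = 0.026     # thermal voltage kT/q at ~300 K [V]
n      = 1.3       # emission coefficient (assumed)
I_REF0 = 1.2e-6    # [A]
R_B    = 5e3       # [Ohm] (assumed)

kappa = I_REF0 * R_B / (n * Vt)    # equation (25), ~0.18 here

for B in (0.5, 1.0, 1.5, 2.0):
    I_B1 = B * I_REF0                            # (22)
    I_B2 = I_B1 * exp(I_B1 * R_B / (n * Vt))     # (21), exact exponential
    I_B  = I_B2 - I_B1                           # (23), with I_REF = I_B1
    # (24) predicts I_B ~ kappa*B^2*I_REF0 while I_B1*R_B/(n*kT/q) < 1
    print(B, I_B, kappa * B**2 * I_REF0)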
IV. SIMULATIONS
To confirm the observations made in the previous sections, simulations in the ams 0.18 μm CMOS technology are performed. The basic topology from Fig. 1 is combined with the reference generators and the calibration circuitry shown in Fig. 3. The improved topology also utilizes the bias current generator from Fig. 6.
Figure 6. The square function generator.
For the simulation purposes, the supply voltage VDD is set to 1.8 V, the capacitor C to around 1 pF, the currents IREF0 and IB to 1.2 μA, while the reference voltages VREFhigh and VREFlow are equal to 1.2 V and 0.6 V, respectively. The factor κ is set to approximately 1. Considering everything mentioned before, it is clear from (1) that the resulting frequency will be somewhere around 1 MHz, not counting the effects of the comparator delay.

Figure 7. Absolute frequency error after trimming vs. the temperature, plotted for 5 process corners of the basic (1) and the improved (2) topology.
The simulation results of the calibration procedure, shown in Fig. 7, demonstrate the improvement. The results are plotted over the temperature range from -40 to 125 °C, with 5 different process corners considered for both topologies. The basic topology exhibits an absolute frequency error from -0.9% to 0.4% over the complete temperature range. The error falls to a range from -0.05% to 0.2% with the improved topology, which corresponds to an error reduction of more than five times. Moreover, the improved topology exhibits almost ideal behavior in the temperature range from -40 to 75 °C, with an absolute frequency error of around ±0.05%. The frequency error for the improved topology is shown separately in Fig. 8.
V. CONCLUSION
A new topology for the improved calibration technique of relaxation oscillators is proposed. Based on analytical calculations, a method for neutralizing the comparator delay is developed. The proposed technique is verified by simulations in the ams 0.18 μm CMOS process.
Figure 8. Absolute frequency error after trimming vs. the temperature,
plotted for 5 process corners of the improved topology.
The results reveal the possibility of achieving a low-power, fully on-chip solution with an accuracy comparable to that of crystal oscillators.
A Bootstrap Circuit for DC–DC Converters with
a Wide Input Voltage Range in HV-CMOS
N. Mitrovic, R. Enne and H. Zimmermann
Institute of Electrodynamics, Microwave and Circuit Engineering
Vienna University of Technology
Vienna, Austria
e-mail: natasa.mitrovic@tuwien.ac.at, reinhard.enne@tuwien.ac.at and horst.zimmermann@tuwien.ac.at
Abstract - Bootstrap circuits are essential parts of integrated DC–DC converters with an NMOS transistor as high-side switch; they provide the voltage overdrive for the gate drivers that drive the high-side switch's gate. This paper presents a bootstrap circuit which does not need an additional supply voltage for charging the bootstrap capacitor to a desired voltage level, but uses the input voltage of the converter. Other advantages of this circuit are proper operation over a wide range of duty ratios and input voltages (from 7.5 V to 45 V). The circuit is designed in a 0.18 µm 50 V high-voltage (HV) CMOS technology as part of a buck DC–DC converter for high output power. Post-layout simulations show a voltage dip lower than 80 mV when the high-side NMOS transistor is switched on and a power loss of the bootstrap circuit equal to 160 mW for nominal conditions (input voltage 36 V, duty ratio 15%, bootstrap voltage 4.8 V, input power of drivers 115 mW). The layout dimensions of the bootstrap circuit are 96 µm × 251 µm.
I. INTRODUCTION
Switch-mode step-down DC–DC converters are widely used due to their high efficiency even at high input power, compared to low-drop-out (LDO) regulators. The use of an NMOS transistor as high-side (HS) switch is preferable to a PMOS HS switch, since it can achieve a low on-resistance with a transistor width usually three times smaller than that of a PMOS. This means that considerably less area is needed for the large-area high-voltage transistors and, more importantly, a better efficiency is achieved. The main disadvantage is that a bootstrap circuit together with the bootstrap capacitor has to be implemented. The bootstrap circuit has to provide enough energy to the bootstrap capacitor so it can properly supply the HS drivers to switch the HS NMOS correctly.
The conventional bootstrap circuits consist of an external diode connected to an additional voltage supply and a capacitor, which is not area efficient and can potentially be a source of instability. Depending on the technology, a bootstrap circuit together with a smaller bootstrap capacitor can be implemented on chip, but additional voltages must be provided [1]-[5].
This paper presents a bootstrap circuit which is supplied from the input voltage of the converter, so the need for additional high-voltage sources is eliminated. The voltage on the bootstrap capacitor is not determined by the input voltage, which is one big advantage, so this circuit topology is easily adjustable to fit the requirements of wide-range DC–DC converter specifications.

The authors would like to thank the Austrian BMVIT via FFG, the ENIAC Joint Undertaking and AMS AG for financial funding in the project eRAMP.

Figure 1. Structure of the buck converter
II. CONVERTER OVERVIEW
Fig. 1 illustrates the simplified structure of the investigated buck converter. The chip design is divided into the high-side (HS) switch transistor MH and the low-side (LS) switch transistor ML with drivers DRH and DRL, respectively, and the regulation loop. The switching node (SN), whose potential is denoted as VSSH, is connected to the off-chip smoothing inductor (LSM) and capacitor (CSM). The drivers for the HS and LS switching transistors are cascade-connected tapered inverters. When switching on the power transistors, the drivers have to inject a significant amount of charge into the gates of the power transistors in order to enable their fast turn-on and to minimize the switching losses. The regulation loop is based on a current-programmed controller, i.e. peak current-mode control is used, which determines and controls the duty cycle of the converter.
All blocks in the dotted box are realized in an N-well isolated 0.18-μm high-voltage CMOS technology. The circuit is designed for 36 V input voltage and 5.5 V output voltage with 1 A output current. An external inductor and capacitor are used as output filter.
A. Power MOSFETs sizing
The main losses of the converter are usually caused by the drivers of the switch transistors and by their on-resistance [6]. The power loss in the gate drivers is given by (1), where eG,MH and eG,ML are the energies per transistor width needed to switch on the switch transistors, wMH and wML are the widths of the power MOSFETs and f is the switching frequency. Equation (2) represents the conduction losses of MH and ML, assuming continuous conduction mode (CCM), where ron,MH and ron,ML represent the on-resistances related to the widths, I is the mean output current and δ is the duty ratio.

Pdriver = f·(eG,MH·wMH + eG,ML·wML)   (1)

Pres = I^2·( (ron,MH/wMH)·δ + (ron,ML/wML)·(1 − δ) )   (2)
By minimizing the sum of these two loss contributions, the relations for the transistor widths given in (3) and (4) are obtained:

wMH = I·sqrt( δ·ron,MH / (f·eG,MH) )   (3)

wML = I·sqrt( (1 − δ)·ron,ML / (f·eG,ML) )   (4)

Figure 2. Circuit diagram of the proposed bootstrap circuit
After inserting the values obtained from simulations and the values extracted from the specifications of the converter, the width of the MH transistor is calculated to be wMH = 7 cm, and the width of ML is wML = 14 cm. The drivers are designed in such a manner that the last inverter in the chain can drive the gate capacitance of the switching transistor; therefore, the tapering factor and the number of stages of the inverters are determined relative to the width of the power transistors.
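A minimal Python sketch of this sizing procedure is given below, assuming illustrative per-width parameters (the simulation-extracted values of eG and ron are not quoted numerically in the paper); at the optimum of (3) and (4) each driver loss balances the corresponding conduction loss:

from math import sqrt

# Illustrative per-width parameters (assumed); converter data follow the text.
f       = 1e6      # switching frequency [Hz]
I       = 1.0      # mean output current [A]
delta   = 0.15     # duty ratio
e_G_MH  = 6e-7     # switching energy per width of MH [J/m] (assumed)
e_G_ML  = 6e-7     # switching energy per width of ML [J/m] (assumed)
r_on_MH = 0.02     # width-related on-resistance of MH [Ohm*m] (assumed)
r_on_ML = 0.02     # width-related on-resistance of ML [Ohm*m] (assumed)

# Closed-form optima (3) and (4) from d(P_driver + P_res)/dw = 0:
w_MH = I * sqrt(delta * r_on_MH / (f * e_G_MH))         # ~7 cm here
w_ML = I * sqrt((1 - delta) * r_on_ML / (f * e_G_ML))   # ~17 cm here

# Losses (1) and (2) evaluated at the optimum widths.
P_driver = f * (e_G_MH * w_MH + e_G_ML * w_ML)
P_res = I**2 * (r_on_MH / w_MH * delta + r_on_ML / w_ML * (1 - delta))
print(w_MH, w_ML, P_driver, P_res)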
B. Sizing of the bootstrap capacitor
From simulations, the electrical charge QG = 5 nC needed for the HS MOSFET to switch on is calculated by integrating its gate current during its transition time between the off and on state. This charge has to be provided by the bootstrap capacitor, which releases some charge into the DRH driver which redirects it to the MH gate. The capacitance value of the bootstrap capacitor CBS is determined by setting the voltage drop of VBC that is tolerable for this charge loss. In this case, the initial voltage VBC,1 is 5 V, the voltage drop was set to be 100 mV, and the capacitance CBS was calculated using the formula for the stored charge of a capacitor, Q = C·V. The initial charge stored in the bootstrap capacitor is Qstored,1 = CBS·VBC,1 and after the discharge we have Qstored,2 = CBS·VBC,2, with VBC,2 = 4.9 V, where QG = Qstored,1 − Qstored,2. The bootstrap capacitance is then calculated as:

CBS = QG / (VBC,1 − VBC,2) = 50 nF.   (5)

Since an increase in the power consumption of the produced chip compared to the simulated design is expected, the finally chosen bootstrap capacitor is double the calculated value, i.e. CBS = 100 nF.
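The arithmetic of (5) is restated in the short Python sketch below, using only values quoted in this subsection:

# Bootstrap capacitor sizing of Section II-B: the gate charge Q_G removed at
# each HS turn-on may only cause a 100 mV dip of the bootstrap voltage.
Q_G    = 5e-9   # gate charge of the HS MOSFET [C]
V_BC_1 = 5.0    # initial bootstrap voltage [V]
V_BC_2 = 4.9    # allowed voltage after one turn-on [V]

C_BS = Q_G / (V_BC_1 - V_BC_2)   # equation (5): 50 nF
C_BS_chosen = 2 * C_BS           # doubled as design margin: 100 nF
print(C_BS, C_BS_chosen)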
III. CIRCUIT IMPLEMENTATION
Fig. 2 shows the bootstrap circuit. The core of this circuit topology consists of the high-voltage NMOS transistor M4, the diodes D1 to D9 and the current mirror M1–M2. The drain current of transistor M4 directly charges the bootstrap capacitor. The stack of diodes, D1 to D9, indirectly defines the maximum achievable bootstrap capacitor voltage, which can be represented as 9·VD − Vth,M4, where Vth,M4 is the threshold voltage of M4. The high-voltage NMOS transistor M3 works as a current source, providing the bias current for the current mirror M1–M2. The current of the transistor M3 is determined by the size of the transistor and the VBIAS voltage, which can be any low-voltage DC source that is already used by other regulation circuits, or the bandgap voltage which is usually used as the reference voltage in the regulation loop of the converter.
When MH is turned on, the switching node SN is at a high potential, close to the input voltage of the converter, VIN. Transistor M4 is off and the charging of the bootstrap capacitor is disabled. Only the transistors M1 and M3 conduct a very small reference current, which cannot be mirrored due to the high potential of the node VG4. Capacitor C1 keeps the gate potential of M4 during its off period, i.e. it holds the VGS4 voltage positive and prevents damage to the transistor. The diode D10 also has a significant role during this interval: it blocks the reverse current of M4, i.e. it prevents M4 from discharging the bootstrap capacitor. The diode D11 has a similar function, as it prevents C1 from discharging through transistor M2.
In the opposite interval, when the ML power switch is conducting, the charging of the bootstrap capacitor takes place. Transistor M2 is now conducting, mirroring the current of M1. One part of the M2 current serves to charge the capacitor C1. The diodes D1–D9 conduct the other part of the current of M2. The gate-source voltage of M4 is now positive and above the threshold voltage, so M4 conducts a significant current which charges the bootstrap capacitor.
Figure 3. a)-c) Transient waveforms of the voltages and currents of the bootstrap circuit: a) IM4 – drain current of the M4 transistor, VSSH and VDDH – potentials at the bootstrap capacitor terminals; b) IM2 – source current of the M2 transistor, IC1 – current of the capacitor C1, ID1 – current of the diode D1; c) VBC – voltage across the bootstrap capacitor.

Transient simulation traces of the most important signals for nominal conditions are shown in Fig. 3. During the time when MH is conducting, the voltages VSSH and VDDH are at a high potential, so the bootstrap circuit is inactive and M4 is not conducting, which is clearly visible in the waveform in Fig. 3a. When MH is switched off, the drain current of M4 increases due to the rise of its gate voltage, and the bootstrap capacitor is being charged. Fig. 3b also shows how the M2 current is distributed over time between the capacitor C1 and the diodes D1–D9. At first, most of the current flows into the capacitor C1, so the voltage VG4 rises, causing a rise of the diode current, which becomes dominant in the second part of the interval. The voltage dip of VBC, defined as the difference between the voltages when MH is off and on, is very low, in the range of 80–100 mV, as can be seen in Fig. 3c.

The layout of the bootstrap circuit is given in Fig. 4. The capacitor C1, which had to be divided into two capacitors connected in series because of the technology rule on the maximum allowed voltage between terminals, occupies most of the area of the circuit.

IV. SIMULATION RESULTS

Post-layout transient simulations of the circuit were run while varying two parameters: the input voltage of the converter and the duty ratio. In both cases, two outcomes are of interest: the final voltage across the bootstrap capacitor, VBC, and the power loss of the bootstrap circuit. All simulations are performed at a switching frequency of 1 MHz.

Figure 5. Power loss and bootstrap voltage dependence on the input voltage (duty ratio 0.15)
The goal for the final voltage across the bootstrap capacitor is set to be 4.5 V for two reasons. Firstly, this way the proper operation of the drivers is guaranteed, so that they can always switch reliably. The other reason is the reduction of the conduction losses of the MH transistor, since its gate-source voltage is equal to VBC while it is conducting. The goal could have been set even lower, since the drivers can operate properly even with a VBC voltage as low as 2.5 V and the threshold voltage of MH is 0.7 V, but at the cost of efficiency and reliability.
In Fig. 5 it can be observed that the power losses are strongly correlated with the input voltage of the converter, with an almost linear dependency. The main part of the circuit losses is due to the fact that M4 and D10 conduct during the period when the switching node is at a low potential, so a current of some milliamps at such a large voltage difference causes a significant loss at the 36 V target input voltage of this converter. Even so, the loss of the circuit, which is equal to 160 mW at this operating condition, is acceptable in comparison to the 5.5 W converter output power, i.e. the loss of the bootstrap circuit will reduce the efficiency of the whole converter by less than 3%.
Figure 4. Layout view of the bootstrap circuit
Figure 6. Power loss and bootstrap voltage dependence on the duty ratio (input voltage 36 V)
The maximum achievable bootstrap voltage is mostly limited by the input voltage, since it can be roughly defined as VBC = Vin − VD − VOV, where VD is the diode forward voltage and VOV is the overdrive voltage of the M4 transistor. So, as can be noticed from the graph, there is a minimum input voltage of 7 V required for the bootstrap circuit to operate properly. On the other hand, when the input voltage is larger than 7 V, the dependence of the bootstrap voltage on the input voltage is negligible, since it is then defined by the forward voltage of the diodes and the gate-source voltage of M4.
The bootstrap voltage VBC depends strongly on the variation of the duty ratio (Fig. 6), since the duty ratio determines the time interval in which the capacitor is not charged but is being discharged. The power loss of the circuit is less sensitive, but it shows a slight increase with the duty ratio, which is a consequence of a lower value of VBC and hence of the voltage VDDH, so the voltage drop across M4 and D10 is larger.
V. CONCLUSION
This paper presents a bootstrap circuit for DC–DC converters which is supplied from the input voltage of the converter. The voltage value across the capacitor is not directly related to the input voltage, so the circuit can be used for a wide range of converters. Also, the calculation of the widths of the HS and LS power switch transistors is shown, based on which the value of the bootstrap capacitance is then calculated. A voltage across the bootstrap capacitance larger than 4.5 V is obtained in the realized circuit for input voltages already above 7 V and duty ratios smaller than 55%. Simulations also show a good stability of the bootstrap voltage when the input voltage and the duty ratio are varied.

ACKNOWLEDGMENT
The authors would like to thank A. Steinmair and F.
Schrank from AMS AG in Unterpremstätten, Austria, for
technical support. The work has been performed in the
project eRamp (Grant Agreement N°621270), co-funded
by grants from Austria and the ENIAC Joint Undertaking.
REFERENCES
[1] A. Seidel, M. Costa, J. Joos, B. Wicht, "Bootstrap circuit with high-voltage charge storing for area efficient gate drivers in power management systems," in Proc. 40th European Solid-State Circuits Conf. (ESSCIRC 2014), Sep. 2014, pp. 159-162.
[2] A. Seidel, M.S. Costa, J. Joos, B. Wicht, "Area Efficient Integrated Gate Drivers Based on High-Voltage Charge Storing," IEEE Journal of Solid-State Circuits, vol. 50, no. 7, pp. 1550-1559, July 2015.
[3] J. Xu, Lin Sheng, Xianhui Dong, "A novel high speed and high current FET driver with floating ground and integrated charge pump," in Proc. 2012 IEEE Energy Conversion Congress and Exposition (ECCE), Sep. 2012, pp. 2604-2609.
[4] K. Abe, K. Nishijima, K. Harada, T. Nakano, T. Nabeshima, and T. Sato, "A novel three-phase buck converter with bootstrap driver circuit," in Proc. Power Electronics Specialists Conf., Jun. 2007, pp. 1864-1871.
[5] M. Huque, R. Vijayaraghavan, M. Zhang, B. Blalock, L. Tolbert, and S. Islam, "An SOI-based high-voltage, high-temperature gate-driver for SiC FET," in Proc. IEEE Power Electronics Specialists Conf. (PESC 2007), Jun. 2007, pp. 1491-1495.
[6] R. Enne and H. Zimmermann, "An integrated low-power buck converter with a comparator controlled low-side switch," in Proc. IEEE 13th Int. Symp. Design and Diagnostics of Electronic Circuits and Systems, 2010, pp. 84-87.
A Fractional-N Subsampling PLL based on a
Digital-to-Time Converter
N. Markulic(1,2), K. Raczkowski(1), P. Wambacq(1,2) and J. Craninckx(1)
(1) imec, Leuven, Belgium
(2) Vrije Universiteit Brussel, Brussels, Belgium
Abstract - The paper presents a subsampling PLL which uses a 10-bit, 0.5 ps unit step Digital-to-Time Converter (DTC) in the phase-error comparison path for the fractional-N lock. The gain and nonlinearity of the DTC can be digitally calibrated in the background while the PLL operates normally. During fractional multiplication of a 40 MHz reference to frequencies around 10 GHz, the measured jitter is in the range from 176 to 198 fs. The worst measured fractional spur is -57 dBc and the in-band phase noise of the PLL is -108 dBc/Hz. The presented analog PLL in advanced 28 nm CMOS achieves a figure-of-merit (FOM) of -246.6 dB that compares well with the recent state-of-the-art.
I. INTRODUCTION
Frequency synthesizers, typically implemented as phase-locked loops (PLLs), are omnipresent building blocks used for local oscillator (LO) generation in radio frequency (RF) communication, accurate clock generation in digital circuits, clock recovery, etc. In wireless transceivers, they serve for the up/down conversion of the baseband data. A very peculiar aspect of a wireless LO synthesizer is its phase noise and spurious performance. Namely, the system-level performance in both the receive (RX) and transmit (TX) chains is fundamentally limited by the LO phase noise. For example, the error vector magnitude (EVM) in an RX and a TX for high-order modulation schemes is limited by the LO in-band phase noise. In an RX, adjacent channel interferers are reciprocally mixed down onto the desired signal by the LO phase noise and spurs. Moreover, the TX spectral output mask in the receive band is limited by the TX LO far-out phase noise, etc. A considerable amount of energy and chip area is therefore typically spent to guarantee a low phase-noise LO operation.
The analog subsampling PLL [1], introduced in 2009, shows even today an unparalleled synthesizer efficiency amongst the CMOS frequency-synthesis state-of-the-art: the lowest integrated phase noise (i.e. RMS jitter) vs. power consumption. The extreme phase-error detection gain of this architecture reduces the in-band phase noise and, thanks to that, allows wide bandwidths for efficient VCO noise filtering. In this way, a subsampling architecture achieves the PLL "utopia" in which the output phase noise is mainly dominated by the reference noise [1]. At the same time, power consumption is reduced thanks to the divider-less operation. However, the architecture's inherent integer-N operation prevents the adoption of this approach in practical wireless transceivers.
Within the fractional-N PLLs, the all-digital systems (for example [2], [3]) show the potential of achieving performance similar to the subsampling PLL, although often at the cost of the large power consumption of the time-to-digital converter (TDC). Moreover, to achieve the highest performance, these systems depend on increasingly complex and power-consuming calibration techniques.
We propose a solution that enables fractional-N operation of a subsampling PLL [4], [7]. Instead of measuring the fractional residue phase error with a TDC, we recognize that this error is known a priori and can be compensated for by a Digital-to-Time Converter (DTC). In this manner, we are able to achieve a fractional-N lock, while retaining the key benefits of subsampling operation. The mixed-signal solution that we propose takes advantage of nanoscale CMOS and is not limited by its analog performance. Phase-error detection is done in essence by a switch and a capacitor that benefit from scaling, and the charge pump/transconductor can be very simple since, in a subsampling PLL, their noise and linearity do not impact the overall system performance [5].
II. FRACTIONAL-N OPERATION OF THE SUBSAMPLING PLL ENABLED BY A DTC
The integer-N subsampling PLL [1] operates by
sampling the (differential) voltage-controlled oscillator
(VCO) sinusoid with a repetition rate set by the reference
frequency (Figure 1). In a phase-locked state the sampling
happens precisely at the zero crossings of the differential
sinewave. Any deviation from the timing of the zero
crossing results in a non-zero voltage being sampled, which
in turn is converted to a correction current fed to the loop
filter. Because the sampling events occur only at the edges
of the reference, the VCO zero crossings are aligned to
these edges and the VCO produces a frequency which is an
exact integer-N multiple of the reference.
The basic subsampling PLL cannot synthesize
fractional-N frequencies, because it lacks any phase
modulation mechanism in the loop (for example a divider).
We therefore add a DTC in the reference path of the PLL
as depicted in Figure 1. The DTC is used to control the
exact moment of sampling, such that it always falls on the
expected zero crossing of the VCO, even for non-integer
ratios between the VCO frequency and the reference clock.
A simple example with a fractional-N multiplication factor that differs from integer-N by 0.25 is shown in Figure 1. In the first cycle the sampling edge appears at the same moment as in the integer-N mode. Then, in the second cycle, the sampling edge is delayed by 0.25·Tvco after the reference edge. In the third cycle, the sampling edge is delayed by 0.5·Tvco, then by 0.75·Tvco. Finally, in the fifth cycle, the sampling should happen 1·Tvco after the reference edge; however, simply skipping a VCO cycle and sampling at the original reference edge yields the same effect. Since the required PLL frequency and the reference frequency are known, it is possible to calculate the position of any following zero crossing with absolute precision. The DTC should cover at least 1 VCO period of delay.
The digital computation of the necessary phase adjustment, i.e. of the delay that needs to be inserted in the reference path, is depicted in Figure 2. The difference between the multiplication factor N and its integer-quantized value is extracted first. A ∆Ʃ modulator is used to generate the integer quantization of N, so that the signal Diff is a zero-mean stream which is accumulated with the desired "phase wrapping" behavior. The accumulated value Acc is then appropriately scaled to the available DTC input quantization range.
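A minimal behavioral sketch of this computation path is given below in Python; it replaces the ∆Ʃ/MASH modulators of Figure 2 with a plain first-order accumulator, and the frequency and variable names are illustrative assumptions:

# Behavioral sketch of the DTC input computation of Figure 2, with a plain
# first-order accumulator instead of the paper's Delta-Sigma/MASH modulators.
N       = 250.25           # fractional multiplication factor (illustrative)
frac    = N - int(N)       # fractional residue added every reference cycle
T_vco   = 1.0 / 10.01e9    # VCO period [s] (illustrative)
lsb_dtc = 0.5e-12          # DTC unit step [s]

acc = 0.0
for cycle in range(8):
    acc = (acc + frac) % 1.0         # phase wrapping: skip one VCO cycle at 1.0
    delay = acc * T_vco              # required delay of the sampling edge
    code = round(delay / lsb_dtc)    # quantized DTC input code
    print(cycle, round(acc, 2), code)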
III. IMPLEMENTATION CONSIDERATIONS
If a fractional-N subsampling PLL, as described in the previous section, were implemented with an ideal DTC, it would have the same performance as an integer-N subsampling PLL. This lies in stark contrast to the case of a traditional mixed-signal ∆Ʃ PLL, where there is an unavoidable penalty associated with the modulation through the divider. Any practical implementation of the fractional-N subsampling PLL system will, however, be limited by the non-idealities of the DTC.
Figure 1: Fractional-N subsampling PLL operation.
Figure 2: DTC input calculation path.
A. DTC quantization error
A DTC has finite resolution. To scale the output of the accumulated phase error to a digital tuning code, the output of the accumulator Acc in Figure 2 needs to be multiplied by a factor TREF/(LSB_DTC·N_FRAC). The sampling moments hence occur with an accuracy limited by the LSB of the DTC, and the resulting error is fed into the low-pass filter (LPF), thereby instantaneously modulating the VCO and creating spurs.
System level simulations show that choosing a DTC
LSB of 0.5 ps ensures that the quantization noise appears
below other loop noise. Moreover, a second ∆Ʃ modulator
(Figure 2) is used in front of the DTC to shape the
associated quantization noise beyond the PLL bandwidth.
Thanks to the fact that the stream is perfectly accurate on
average, the average PLL frequency is also accurate, with
no visible modulation.
Another modification to the basic system that helps to mitigate the problem of limited DTC resolution is to use a MASH modulator at the beginning of the delay computation path (the initial ∆Ʃ modulator in Figure 2). A MASH modulator provides a better randomization of the generated code, which helps with reducing the spurious content. Compared to a first-order ∆Ʃ, the generated codes have a larger range, which results in a larger required delay range of the DTC. For this reason we choose a 10-bit DTC implementation. By generating delays larger than one VCO period, it is possible to effectively de-color the sampling data. Moreover, randomizing the DTC codes provides an effect similar to dynamic element matching. Since e.g. four DTC codes are used in MASH 1-1-1 mode to generate the same effective sampling phase, the apparent DTC nonlinearity is randomized.
B. DTC gain error
The DTC gain can be defined as the amount of delay per least-significant-bit (LSB) code. The DTC is analog in its nature and susceptible to PVT variations, hence its absolute gain is unknown and varies with time and temperature. A gain error in the delay steps introduces problematic spurs in the spectrum of the PLL. An automatic background calibration which tracks the gain variations therefore becomes absolutely necessary.
An automatic DTC gain calibration can be designed similarly to the popular least-mean-square (LMS) based mechanisms used in digital PLLs [6] (Figure 3). Simply stated, it is possible to extract the sign of the sampled voltage and correlate it with a change in the direction of the DTC word. Intuitively, if the modulator "tells" the DTC to sample later, but due to a gain error "early" samples get consecutively detected, it is possible to deduce that the DTC gain is too low. After accumulation, the correction word can be applied as a scaling factor to the computation path of Figure 2. When the correction loop converges, there is no penalty on the phase noise. Figure 4 shows a simulation result where a 10% gain error was applied to the DTC. This error introduces a large ripple in the sampled voltage, which in turn results in large spurs at the output of the PLL. After the DTC gain is corrected, the sampled voltage converges back to zero.
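A hedged behavioral sketch of such a sign-sign LMS correlation loop is given below in Python; the loop gain, the noise level and all names are illustrative assumptions rather than the implemented values:

import random

# Sign-sign LMS sketch of the gain calibration of Figure 3 (assumptions only).
sign = lambda x: 1.0 if x >= 0 else -1.0

true_gain = 1.10   # actual DTC gain: 10% error, as in the Figure 4 experiment
gain_est  = 1.00   # digital correction word applied to the computation path
mu        = 1e-4   # LMS update gain (assumed)

for _ in range(100000):
    code_step = random.choice((-1.0, 1.0))   # direction of the DTC word change
    # Sign of the sampled voltage: an under-estimated gain makes "early"
    # samples appear when the DTC is told to sample later, and vice versa.
    v_sign = sign(code_step * (true_gain - gain_est) + random.gauss(0.0, 0.05))
    gain_est += mu * v_sign * sign(code_step)   # correlate and accumulate

print(gain_est)   # dithers closely around 1.10 after convergence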
Figure 3: DTC gain digital background calibration.
Figure 4: DTC gain background calibration simulation with 10% gain
error on the DTC.
Figure 5: DTC INL calibration loop [7].
C. DTC nonlinearity
The DTC nonlinearity will naturally increase the spurious content at the output of the PLL. Many techniques for improving linearity which exist for digital-to-analog converters (DACs) also apply to the DTC. For example, careful layout of the tuning element is of highest priority. Advanced nanometer-scale technologies offer a significant advantage in this regard, thanks to the ever-improving lithography resolution.
Nevertheless, to ensure the linearity of the phase-error detection path, we employ a DTC nonlinearity calibration loop. The calibration is based on observation of the Error Sign signal, which represents the sign of the instantaneous phase error. This 1-bit phase error is random and zero on average in a linear system, but in the presence of DTC INL it becomes "colored" by the nonlinearity. Essentially, we exploit the correlation between the Error Sign signal and the particular DTC input code which induced it, to restore the INL curve and pre-compensate for it. A look-up table (LuT) with a set of coefficients c(0:k-1) is used to approximate the 10-bit DTC INL curve (where k is 32). In every clock cycle, the input code addresses two neighboring LuT coefficients which piece-wise linearly approximate the expected, instantaneous INL error. The INL compensation value is simply subtracted from the original code, which ideally forces the DTC to produce no error (zero-mean Error Sign) for the given code. The LuT correction coefficients are updated gradually, by integrating the scaled Error Sign value into the appropriate address (defined by the input code) in every cycle. When the calibration is initialized, the LuT is reset to zero. While the PLL operates, the coefficients c(0:k-1) slowly change towards their optimal calibrated values that cancel the INL error. At this moment, the coefficient updating can be disabled. The algorithm convergence speed is determined by the tap gain G, where a typical value of 2^-13 results in an approximate calibration time of 10 ms. The DTC INL calibration loop is enabled after the gain error is corrected. An important detail is that the offset in the extracted error sign needs to be digitally compensated for the loop to converge.
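The LuT addressing and update described above can be sketched as follows in Python; k = 32 and G = 2^-13 follow the text, while the interpolation details are assumptions:

# Sketch of the piece-wise-linear LuT pre-compensation; k and G follow the
# text, the addressing and error-sign conventions are assumptions.
k, n_codes = 32, 1024
c = [0.0] * k        # LuT coefficients c(0:k-1), reset to zero at start
G = 2.0 ** -13       # tap gain

def segment(code):
    """Map a 10-bit DTC code onto two neighboring LuT coefficients."""
    pos = code * (k - 1) / (n_codes - 1)
    i = min(int(pos), k - 2)
    return i, pos - i          # lower index and interpolation weight

def inl_correction(code):
    i, w = segment(code)
    return (1.0 - w) * c[i] + w * c[i + 1]

def update(code, error_sign):
    """Integrate the scaled Error Sign into the two addressed coefficients."""
    i, w = segment(code)
    c[i] += G * error_sign * (1.0 - w)
    c[i + 1] += G * error_sign * w

# Every clock cycle: corrected_code = code - inl_correction(code), then
# update(code, measured_error_sign) until the coefficients settle.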
D. DTC phase noise
In this paper we propose a solution to enhance an integer-N subsampling PLL by placing a phase modulator (DTC) in the path of the reference. Unfortunately, the phase noise contribution of the DTC adds directly to the phase noise of the reference. Ultimately, the in-band phase noise of the subsampling PLL is limited by the phase noise of both the reference and the DTC, since both pass through the system in the same way. Therefore, great care must be taken to minimize the DTC's contribution to the phase noise, otherwise the unique phase noise advantages of the subsampling architecture will be lost. Here, the scaling of CMOS technology is again on our side, since transistors are getting faster with every node, reducing the jitter and phase noise.
IV. CIRCUIT IMPLEMENTATION
The subsampling phase-locked loop can only detect a phase error, which makes it susceptible to false locking at any N. Therefore, a frequency acquisition loop is required in addition to the subsampling loop [1] (see Figure 6). A conventional PLL easily fulfills this requirement. It can be disabled once the frequency has been acquired in order to save power.
Common to both the frequency and phase acquisition loops are the low-pass filter (LPF) and the VCO. For the purpose of demonstrating the concept of the fractional-N subsampling PLL we have chosen the simplest LPF design: a passive third-order lead-lag filter. A tunable resistance in the LPF has been implemented to be able to change the bandwidth of the PLL. Such a simple filter can cause an increase in reference spurs and is often avoided in classical charge-pump-based PLLs. Spurious content can increase because the varying level of the tuning voltage can introduce mismatches between the currents of the charge pump. In a subsampling PLL, however, any offset in these currents is compensated by a slight modification of the locking point (Figure 7). A locked condition always means zero output current of the transconductor (GM). If changes to the output level cause an input-referred offset of the GM, the PLL will adapt its phase to compensate for this offset.
A. Implementation of the Subsampling Loop
The subsampling loop consists of a VCO buffer, a sampler and a GM. Additionally, the DTC provides the required phase modulation. Figure 8 shows the circuits along the subsampling path.
A VCO buffer is required in order to reduce the kickback effect from the sampler to the VCO [5] and to interface the signal levels between the blocks. In this test chip, to accommodate the changing phase noise requirements of a software-defined radio, we have implemented a low-noise VCO that can be operated from a variable supply as high as 1.8 V. Therefore, the input buffer needs to convert the level between the high-voltage VCO domain (max. 1.8 V) and the core domain (0.9 V). The buffer is implemented with a tunable capacitive attenuator and a source follower pair (Figure 8). The tunable attenuator is built with metal-oxide-metal (MOM) capacitors and provides an additional tuning of the loop gain. The buffer is also the largest contributor to the power consumption of this loop, as it needs to process a GHz-range signal.
The sampler is built around an NMOS switch and a small MOM capacitor. In total, taking into account the input capacitance of the GM, the sampling capacitance is 20 fF. The thermal kT/C noise can be neglected because it is already suppressed by the large detection gain. The implemented sampler uses an auxiliary sampler operating in inverted phase to the primary sampler in order to reduce the load variability of the VCO.
Figure 6: Complete system overview.
Figure 8: Simplified block diagram of the subsampling loop.
Since the implemented VCO can operate from the IO voltage (1.8 V), the tuning voltage also has a range larger than the core voltage. Therefore, the output stage of the GM needs to provide the translation from the low-voltage domain of the sampler to the high-voltage domain of the LPF and the VCO. Identically to [1], the phase-error detection gain is so large that duty-cycling is required in the output stage of the GM. The pulsing is done with a simple digital pulse generator that opens the output switches of the GM.
An important part of the system is the background correction of the DTC gain and nonlinearity. As said earlier, the error signal from within the PLL is present in the sign of the sampled voltage. However, this is true only if no mismatches are present in the system. If there are any mismatches in the phase detection circuitry, the PLL will adjust the locking phase (and sampled voltage) so that the output current of the GM is zeroed (Figure 7). Therefore, the gain correction mechanism requires the detection of the sign of the output current. Using a simple clocked comparator to detect the sign of the swing in relation to the Vtune voltage is sufficient to obtain the information about the sign of the output current.
B. Implementation of the Digital-to-Time Converter [8]
Since the DTC is at the input of the system, its phase noise is multiplied by the square of the PLL multiplication number when transferred to the output (here: 48 dB, as N = 250). On top of that, any kind of non-linearity present in the phase-error comparison path leads to potential noise folding or spurs [9].
From the PLL system perspective, we target a 10-bit DTC with a 0.5 ps unit step. This delay range covers multiple VCO periods, allowing operation with the third-order MASH 1-1-1 modulator. Because the 0.5 ps step is very small and we know from system simulations that the PLL is sensitive to its disturbance, we can suspect that the DTC needs a good isolation from any noise coming from the supply.
Implementation of the delay generator is shown in Figure 9 [8]. The first inverters in the chain serve as an input buffer towards the delay circuit, which is loaded with a tunable MOM capacitance CL.

Figure 7: The subsampling PLL always locks into a state that guarantees zero output current, even in the presence of offset and mismatch.
Figure 9: Implementation of the DTC.
To suppress mismatch-based errors for the chosen unit size, the capacitor array employs a 5-bit binary/thermometer segmentation. With the unit capacitor size of 3 fF, this ensures statistical DNL errors below 0.5 LSB. The 10-bit array is placed in a common-centroid layout to avoid systematic nonlinearities. Only the high-to-low transition of the VX voltage is important, because the subsampling loop reacts only on the closing of the sampling switch. One could realize the discharging of the load capacitance using a simple NMOS transistor; however, this would lead to an excessive 1/f noise contribution, which would dominate the PLL phase noise. To reduce this effect we introduce a resistor above the NMOS. The exponential discharging is then determined by the corresponding RC time constant. The delay is, however, a linear function of the capacitance. The resistor sets the discharging slope and hence contributes to the output phase noise; however, it generates no 1/f noise. Furthermore, any supply ripple coming from the preceding buffer only modulates the NMOS switch resistance, which is an order of magnitude smaller than the discharging resistor and does not affect the delay. The phase noise level introduced by the delay generator can be derived as
introduced by the delay generator can be derived as
ℒ 𝑤ℎ𝑖𝑡𝑒 ~10 log ( 𝑓𝑜𝑢𝑡
𝑘𝑇 𝐶𝐿𝑜𝑎𝑑 𝑅2
2
𝑉𝐷𝐷
) ~ 10log ( 𝑓𝑜𝑢𝑡
𝑘𝑇 𝜏𝑑𝑒𝑙𝑎𝑦 𝑅
2
𝑉𝐷𝐷
),
where R is the resistor value, VDD is the supply voltage of the delay element and f_out is the output frequency. Based on (1) and the targeted minimal delay step of 0.5 ps, we size R = 180 Ohm and C = 3 fF to lower the noise of this stage to -160 dBc/Hz for the maximal delay. The RC delay control block is followed by a CMOS inverter serving as a comparator to restore steep slopes. The toggling moment of this circuit is unfortunately dependent on the input slope shape, which degrades the linearity of a high-range DTC. Care must also be given to the fact that the regeneration of the RC-delayed slope is most vulnerable to supply modulation. A tunable regulated supply shown in Figure 9 is used to protect the supply of the comparator and the following buffer. The regulated supply consists of a constant current source biasing a diode-connected transistor. A capacitor of 4 pF is used for additional decoupling of the regulated supply node. At the moment of toggling, charge is instantaneously pulled from this capacitor, and not from the VDD. The dip in the regulated supply voltage is suppressed by the gain of the current source before reaching the top supply. The dynamic charge flow is in this way kept within the structure itself.
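Assuming the RC product directly sets the incremental delay, a one-line Python check of the quoted element values reproduces the targeted unit step:

# The delay is a linear function of the switched capacitance (tau = R*C),
# so one 3 fF unit on the 180 Ohm discharge resistor gives the ~0.5 ps step.
R = 180.0        # discharge resistor [Ohm]
C_unit = 3e-15   # unit capacitor of the 10-bit array [F]

print(R * C_unit)   # 5.4e-13 s, i.e. ~0.54 ps per LSB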
C. Implementation of the VCO [10]
The VCO is a thick-oxide NMOS cross-coupled core with current limiting realized using a tunable resistor. Simulations have shown that this architecture yields a favorable 1/f noise compared to the traditional current-source-based architecture. The VCO has been designed to meet the stringent GSM900 specification for out-of-band phase noise. The inductor coil is created using a stack of two top copper metals (each <1 μm thick) and an aluminum redistribution layer. Digital tuning is realized using a bank of NMOS-only switched-capacitor cells. The simulated Q of the tank reaches 18. The VCO is designed to operate with a low drop-out linear regulator (not present on chip) with a supply between 0.9 V and 1.5 V, depending on the required phase noise performance and the available power.
D. Frequency acquisition loop
The frequency acquisition loop has been implemented with a chain of divide-by-2/3 circuits, a traditional 3-state phase frequency detector (PFD) enhanced with a large dead zone [1], and a very simple charge pump. The first stage of the divider is made with current-mode logic, since the VCO frequency can reach 12 GHz, but the following stages of the divider are standard CMOS gates. Once the frequency acquisition is complete, the loop automatically becomes inactive thanks to the increased dead zone in the PFD and can be completely shut down, saving power. In general, the loop components for both the phase and the frequency acquisition loop can be made very simple and require neither good precision, nor good matching, nor low noise.
V. EXPERIMENTAL RESULTS
The prototype IC was fabricated in a TSMC 28 nm bulk digital CMOS technology, and its size is 0.77 mm² (excluding the IO ring). It operates on 0.9 V and 1.8 V supplies (the latter for the IO interface and the Gm stage).
Figure 10: Die microphotograph.
The measured power consumption is 5.6 mW in total, of which 1.8 mW is for the loop components, 2.7 mW for the VCO, and 1.1 mW for the digital circuitry that all runs on the reference clock of 40 MHz. The VCO tuning range is 10.1–12.4 GHz. The measured in-band phase noise around a close-to-integer fractional 11.72 GHz carrier is -107.9 dBc/Hz (Figure 11). The measured RMS jitter is 198 fs after calibration is enabled, with an integration range from 10 Hz to 40 MHz and all spurs (worst fractional) included. The measured spurious performance is shown in Figure 12. The worst fractional spur before calibration appears at -41 dBc but drops by 15.6 dB after calibration to -56.6 dBc. The integer spur is at -69 dBc. The PLL achieves one of the best reported FOMs in the recent state-of-the-art: -246.6 dB.
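The quoted FOM can be reproduced from the measured jitter and power using the standard jitter-power figure-of-merit definition, as the short Python check below shows:

from math import log10

# FOM = 20*log10(jitter/1 s) + 10*log10(P/1 mW), the usual jitter-power FOM.
jitter_rms = 198e-15   # measured RMS jitter [s]
power = 5.6e-3         # measured total power consumption [W]

fom = 20 * log10(jitter_rms) + 10 * log10(power / 1e-3)
print(round(fom, 1))   # -246.6 dB, matching the quoted value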
Table 1: Performance summary and comparison to the state-of-the-art.

VI. CONCLUSION
A subsampling PLL is an architecture that offers extremely low phase noise; however, in its original form it is limited to integer-N frequency multiplication. We presented a fractional-N subsampling PLL which operates based on a low-noise, low-quantization-error DTC. The DTC is enhanced by a digital background calibration which suppresses the gain and nonlinearity issues. In this way we extend the original phase noise performance of an integer-N subsampling PLL to fractional-N synthesis, without the drawbacks of additional noise folding and spurs.
REFERENCES
[1] X. Gao et al., "A Low Noise Sub-Sampling PLL in Which Divider Noise is Eliminated and PD/CP Noise is Not Multiplied by N²," IEEE Journal of Solid-State Circuits, vol. 44, no. 12, pp. 3253–3263, 2009.
[2] C.-W. Yao et al., "A low spur fractional-N digital PLL for 802.11 a/b/g/n/ac with 0.19 ps RMS jitter," in 2011 Symposium on VLSI Circuits (VLSIC), 2011, pp. 110–111.
[3] C.-W. Yao et al., "A 2.8-3.2-GHz Fractional-N Digital PLL With ADC-Assisted TDC and Inductively Coupled Fine-Tuning DCO," 2013.
[4] K. Raczkowski et al., "A 9.2-12.7 GHz wideband fractional-N subsampling PLL in 28 nm CMOS with 280 fs RMS jitter," IEEE Journal of Solid-State Circuits, vol. 50, no. 5, pp. 1203–1213, 2015.
[5] X. Gao et al., "Spur reduction techniques for phase-locked loops exploiting a sub-sampling phase detector," IEEE Journal of Solid-State Circuits, vol. 45, no. 9, pp. 1809–1821, 2010.
[6] D. Tasca et al., "A 2.9-4.0-GHz Fractional-N Digital PLL With Bang-Bang Phase Detector and 560-fs_rms Integrated Jitter at 4.5-mW Power," IEEE Journal of Solid-State Circuits, vol. 46, no. 12, pp. 2745–2758, 2011.
[7] N. Markulic et al., "9.7 A self-calibrated 10Mb/s phase modulator with -37.4dB EVM based on a 10.1-to-12.4GHz, -246.6dB-FOM, fractional-N subsampling PLL," in 2016 IEEE International Solid-State Circuits Conference (ISSCC), 2016, pp. 176–177.
[8] N. Markulic et al., "A 10-bit, 550-fs step Digital-to-Time Converter in 28nm CMOS," in European Solid-State Circuits Conference (ESSCIRC 2014), 2014, pp. 79–82.
[9] S. Levantino et al., "An adaptive pre-distortion technique to mitigate the DTC nonlinearity in digital PLLs," IEEE Journal of Solid-State Circuits, vol. 49, no. 8, pp. 1762–1772, 2014.
[10] B. Hershberg et al., "A 9.1-12.7 GHz VCO in 28nm CMOS with a bottom-pinning bias technique for digital varactor stress reduction," in European Solid-State Circuits Conference (ESSCIRC 2014), 2014, pp. 83–86.
Figure 11: Measured phase noise at the PLL output.
Figure 12: Measured fractional spur level at the PLL output.
Infrared Protection System for High-Voltage
Testing of SiC and GaN FETs used in DC-DC
Converters
Filip Hormot, Josip Bačmaga, Adrijan Barić
University of Zagreb, Faculty of Electrical Engineering and Computing, Unska 3, 10000 Zagreb, Croatia
Tel: +385 (0)1 6129547, Fax: +385 (0)1 6129653, e-mail: filip.hormot@fer.hr
Abstract—This technical paper presents the design and testing of a protection system for the evaluation of high-voltage devices and circuits, e.g. SiC and GaN devices or high-voltage switching DC-DC converters. The high-voltage testing area is protected from invasion by an array of infrared (IR) emitters and detectors. Whenever an object passes through the IR protected space, a high-voltage DC source is disconnected from its power supply and the harmful effects of the high DC voltage are avoided. The functionality of the developed system is tested and its characteristic response timings are measured.
Index Terms—hazardous DC voltage, high-voltage
switch-mode power converter
I. INTRODUCTION
Silicon FETs have been used for many years as power devices in switching DC-DC converters. However, power converters operating at high power levels and voltages of several hundreds of volts are not achievable using the existing silicon power devices [1]. Instead, silicon-carbide (SiC) power devices are being used in high-voltage power converters due to their high breakdown voltage and thermal conductivity [2]. Furthermore, due to their extremely low gate charge and output capacitance, gallium-nitride (GaN) FETs can operate at higher switching frequencies [3] compared to silicon FETs. Increasing the switching frequency reduces the size of the external passive components, which results in a higher power density and efficiency of the DC-DC converter.
High-voltage SiC and GaN FETs used in DC-DC converters are subjected to voltage levels of several hundreds of volts during the evaluation of their operation [4], [5]. If a human body comes into contact with the device under test (DUT) during a high-voltage testing procedure, harmful effects of the high voltage can occur. The effect of a DC electric shock is determined by the amplitude of the current through the human body and the duration of the shock. A DC current amplitude of 300 mA is considered as the safety limit for a human body [6]. The DC voltage that will produce a current of 300 mA through the human body depends on the human body resistance [7]. It can be seen in Fig. 1 that if 50% of the human body serves as the current path, a DC voltage of approximately 400 V will cause the maximum allowable current of 300 mA through a human body. Therefore, the DC voltages typically used to test the operation of DC-DC converters based on SiC and GaN power devices are hazardous for a human.
Fig. 1. Human body resistance as a function of body voltage [7].
In this paper, a protection system for the high-voltage testing of SiC and GaN power switches in DC-DC converter applications is presented. The system is based on the infrared (IR) detection of an invasion into the high-voltage testing area. If the testing area is invaded, the protection system disconnects the high DC voltage supply from the DUT and the harmful effects of the DC voltage on a human body are avoided.
The overview of the developed infrared protection system is given in Section II. Section III presents the evaluation of the protection system. Section IV concludes the paper.
II. OVERVIEW OF THE PROTECTION SYSTEM
A. Infrared Protection of the Testing Area
The top view of the high-voltage testing area is shown
in Fig. 2. The testing area is enclosed by the 700-mm high
fence attached to the 40-mm square tubes. The 820-mm
wide entrance monitored by the IR protection system is
left open in the fence for a user to arrange a measurement
set-up. The mechanical sketch of the IR protected space is
shown in Fig. 3. The 16 IR emitters [8] and 16 photodiodes [9] are uniformly distanced and placed in a “zigzag”
order for effective monitoring of the entrance to the testing
area. During the normal operation of the protection system,
the photodiodes receive IR signal generated by the IR
emitters. Once the IR protected space is invaded, the
MIPRO 2016/MEET
820
820
850
high-voltage
testing area
IR protected space
40×40
Fig. 2. Top view of the testing area. All dimensions are in milimeters.
W = 820
21.8
43.7
H = 700
14×43.7
21.8
Fig. 3. Mechanical sketch of the space monitored by the IR protection
system. All dimensions are in milimeters. The distances between the
diodes are rounded to the first decimal.
transmission of the IR signal between the IR emitters
and one or more photodiodes will be broken. This will
cause a decrease of the current through the non-illuminated
photodiodes large enough to produce a voltage difference
that is detected by the input stage of the electronic control
system.
B. Principle of Operation of the Protection System
The simplified schematic of the protection system
shown in Fig. 4 can be divided into three function blocks:
• input stage that detects an invasion into the testing
area,
• logic stage that sets the operation mode of the system
and indicates the invasion of the testing area,
• output stage that disconnects the high-voltage DC
supply from the DUT.
The input stage of the developed system consists of a voltage divider with variable resistors R1 to R16 and reverse-biased photodiodes DPH1 to DPH16. The current through each of the 16 reverse-biased photodiodes, IPHi (i = 1, 2, ..., 16), is larger than zero when all the photodiodes are illuminated by the IR signal generated by the IR emitters. If any of the photodiodes loses the IR signal from the emitter, its current falls to zero. The voltage across each of the 16 photodiodes is:

VPHi = VDD − IPHi·Ri,   i = 1, 2, ..., 16   (1)

where VDD is a 5-V supply, VPHi is the voltage across the photodiode, IPHi is the current through the photodiode and Ri is the resistance of the corresponding variable resistor. When the photodiode is not illuminated
by the IR signal, the voltage across it equals VDD. The voltage across the photodiode decreases as the photodiode gets more illuminated. The voltages VPHi are compared to the reference voltage VREF using 16 comparator circuits. The value of VREF can be adjusted to achieve the required sensitivity of the IR detection. The voltages at the comparator outputs, VCOMPi (i = 1, 2, ..., 16), are:

VCOMPi = VDD   if VPHi ≤ VREF,
VCOMPi = 0 V   if VPHi > VREF.   (2)
Since an invasion into the testing area has to be detected by any of the 16 photodiodes, all 16 VCOMPi signals are combined into one control signal using two 8-input NAND gates and a 2-input OR gate. The control signal VCTRL is connected to the "set" input of the SR-latch, as shown in Fig. 4. The relationship between the control signal VCTRL and the VPHi is determined as:

VCTRL = 0 V   if VPHi ≤ VREF (DPHi illuminated),
VCTRL = VDD   if VPHi > VREF (DPHi not illuminated).   (3)
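A behavioral sketch of (1)-(3) in Python is given below; VDD and VREF follow the text, while the resistor and photocurrent values are illustrative assumptions:

# Behavioral sketch of equations (1)-(3): 16 photodiode branches, the
# comparators and the NAND/OR combination producing V_CTRL.
VDD, V_REF = 5.0, 3.73

def v_ph(i_ph, r_i):
    # Equation (1): voltage across one reverse-biased photodiode branch.
    return VDD - i_ph * r_i

def v_ctrl(photo_currents, resistors):
    # Equations (2)-(3): any dark photodiode (V_PH > V_REF) raises V_CTRL.
    comps = [VDD if v_ph(i, r) <= V_REF else 0.0
             for i, r in zip(photo_currents, resistors)]
    return 0.0 if all(c == VDD for c in comps) else VDD

R = [68e3] * 16          # calibrated variable resistors [Ohm] (assumed)
I_ph = [30e-6] * 16      # illuminated photodiode currents [A] (assumed)
print(v_ctrl(I_ph, R))   # 0.0 V: normal operation
I_ph[7] = 0.0            # one IR beam interrupted
print(v_ctrl(I_ph, R))   # 5.0 V: protection triggered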
When the transmission of the IR signal to any one of the 16 photodiodes is interrupted, the control voltage VCTRL rises to VDD. This causes the turn-on of the MOSFET Q1 and the closing of the normally-open contact of the relay K1 shown in Fig. 4. Closing the contact of K1 will cause the closing of the normally-open contacts of the relay K2 and the disconnection of the high-voltage DC source VDC from its power supply. The invasion into the testing area is indicated by the LED error indicator, while the disconnection of VDC from its power supply is indicated by the LED protection indicator shown in Fig. 4. The transistor Q1 is turned off again by pressing the push-button SRST, after which VDC is reconnected to its power supply.
The emergency stop switch in the output stage can
be used independently of the part for the IR detection.
Pressing the emergency stop switch will also cause the
relay K2 to disconnect the high-voltage DC source from
its power supply.
The top view of the assembled electronic control circuit
with indicated main blocks is shown in Fig. 5.
III. EVALUATION OF THE INFRARED PROTECTION SYSTEM
The functionality of the control circuitry is verified by applying a short circuit between the inverting comparator input and a ground terminal to imitate a fully illuminated photodiode, as shown in Fig. 6. When any of the comparator inputs is left open to imitate an invasion into the testing area, the control system triggers the on-state of the protection circuit and the voltage source that supplies the DUT is switched off.
In order to test the functionality of the complete protection system, the IR emitters and photodiodes are placed on two 700-mm high tubes distanced by 800 mm, as shown in Fig. 7.
Fig. 4. Simplified schematic of the protection system. The system is divided into three function blocks: input, logic and output stage. The developed protection system contains 16 identical input stages (i = 1, 2, ..., 16). The array of 16 photodiodes is illuminated by the IR signal generated by 16 IR emitters.

Fig. 5. The top view of the assembled control circuit.

Fig. 6. Set-up to test the functionality of the control circuitry. The voltage waveforms are sensed using an oscilloscope (osc.).

Fig. 7. Set-up to test the functionality of the complete protection system.
The system is calibrated by adjusting the variable resistors Ri (i = 1, 2, ..., 16) shown in Fig. 4 in such a way that all the voltages across the photodiodes are the same during the normal operation of the system, i.e. when the testing area is not invaded. The reference voltage VREF is adjusted to a value of 3.73 V that is high enough to avoid the impact of noise on the illumination of the photodiodes caused by other sources of light, such as daylight or room lighting.
The system is initially set in the normal mode of operation in which all the photodiodes are illuminated by the IR signal. The IR protected space is then invaded and the characteristic transient response timings of the protection system are measured. A pre-trigger of 1-5 ms is set at the oscilloscope to record the complete transition of the system response when the space protected by the IR emitters and photodiodes is invaded. All waveforms are recorded using the Agilent MSO7034B oscilloscope and the Agilent 10073D passive probe configuration as follows: 1:1 ratio, DC coupling, bandwidth limit: OFF, input impedance: 10 MΩ.
The transient response waveforms of the control system when the IR protected space is invaded are shown in Fig. 8. The voltage across the photodiode VPH rises to the value of VDD after the invasion is detected. When VPH crosses the value of VREF, the voltage VCTRL at the comparator output and the voltage VG at the gate of Q1 instantaneously rise to the value of VDD. The time-delay of the control circuitry, i.e. the time between the detection of the invasion into the testing area and the start of the turn-on process of Q1, is denoted as ∆tCTRL1 in Fig. 8. The major part of ∆tCTRL1 is the rise time of the voltage across the photodiode VPH, which depends on how fast an object passes through the IR protected space. If the object invades the testing area slowly, it will take more time for the photodiode to become completely non-illuminated and for VPH to reach VREF. The time-delays of the digital
logic parts and the SR latch are negligible.

Fig. 8. Transient response of the control circuitry when the IR protected space is invaded. Top to bottom: voltage across the photodiode, voltage at the output of the comparator and voltage at the gate of Q1 (∆tCTRL1 = 1.75 ms, VREF = 3.73 V).
Fig. 9. Transient response of the complete system when the IR protected space is invaded. Top to bottom: voltage across the photodiode, voltage at the output of the comparator and current through the dummy load (∆tCTRL2 = 32 ms, ∆tTOT = 40 ms).
A test case in which a dummy load, used to imitate the output stage, is disconnected from the voltage source is designed to evaluate the response of the IR protection system. An invasion into the testing area is simulated, and the voltage across the photodiode VPH, the voltage VG at the gate of Q1 and the current through the dummy load IL are recorded and shown in Fig. 9. The total time-delay of the protection system ∆tTOT is the sum of the time-delay of the control circuitry ∆tCTRL2 and the time ∆tK between the beginning of the turn-on process of Q1 and the fall of IL to 50% of its value. Since the simulated invasion into the testing area is much slower than in the test case shown in Fig. 8, in order to reproduce the worst-case scenario, the time-delay of the control circuitry ∆tCTRL2 is much larger than ∆tCTRL1. The value of ∆tK is independent of the speed of the invasion into the testing area and equals 8 ms. The total time-delay ∆tTOT between the detection of the invasion into the testing area and the break of the output high-voltage circuit is approximately 40 ms. The longest permissible duration of an electric shock with harmless effects on the human body is 50 ms, as specified in [7].
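As a quick check, the measured components of the delay can be summed and compared against the 50 ms limit of [7]; a minimal Python sketch using the values reported above:

```python
# Timing budget of the protection system (values from the measurements above).
dt_ctrl2 = 32e-3   # control-circuitry delay for the slow, worst-case invasion [s]
dt_k = 8e-3        # delay from Q1 turn-on to the 50 % fall of the load current [s]

dt_tot = dt_ctrl2 + dt_k             # total protection delay
assert dt_tot <= 50e-3               # limit on harmless shock duration from [7]
print(f"total delay: {dt_tot * 1e3:.0f} ms")   # prints: total delay: 40 ms
```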
The power dissipation of all 16 IR emitters and 16
photodiodes when they are illuminated is 1 W while
the dissipation of the control circuitry is approximately
35 mW.
IV. CONCLUSION
A protection system is developed to ensure the safety of the user during high-voltage testing of power devices such as SiC and GaN FETs used in switching DC-DC converters. The system is based on infrared detection of an invasion into the testing area, in order to turn off the high-voltage source that supplies the device under test. The functionality of the system is evaluated for different test cases and the characteristic response timings are measured. The time-delay between the detection of an invasion and the activation of the protection circuit is lower than the longest permissible duration of the electric shock specified by safety regulations.
ACKNOWLEDGMENT
This work was supported in part by the Croatian Science
Foundation (HRZZ) within the project Advanced design
methodology for switching dc–dc converters.
REFERENCES
[1] J. Biela, M. Schweizer, S. Waffler, and J. W. Kolar,
“SiC versus Si—Evaluation of Potentials for Performance
Improvement of Inverter and DC–DC Converter Systems by
SiC Power Semiconductors,” IEEE Transactions on Industrial
Electronics, vol. 58, no. 7, pp. 2872–2882, July 2011.
[2] B. J. Baliga, Fundamentals of Power Semiconductor Devices, 1st ed.
Springer, 2008.
[3] Narendra Mehta, GaN FET module performance advantage over
silicon. Texas Instruments, March 2015, Application Note.
[4] O. Mostaghimi, N. Wright, and A. Horsfall, “Design and Performance Evaluation of SiC Based DC-DC Converters for PV
Applications,” in IEEE Energy Conversion Congress and Exposition
(ECCE), Sept 2012, pp. 3956–3963.
[5] R. Mitova, R. Gosh, U. Mhaskar, D. Klikic, M. Wang, and A. Dantella, “Investigations of 600-V GaN HEMT and GaN Diode for
Power Converter Applications,” in IEEE Transactions on Power
Electronics, vol. 29, no. 5, October 2013, pp. 2441 – 2452.
[6] L. Gordon, “The physiological effects of electric shock in the pulsed
power laboratory,” in Pulsed Power Conference, 1991. Digest of
Technical Papers, June 1991, pp. 377 – 380.
[7] C. H. Lee and A. P. S. Meliopoulos, “Comparison of touch and step
voltages between IEEE Std 80 and IEC 479-1,” IEE Proceedings
- Generation, Transmission and Distribution, vol. 146, no. 6, pp.
593–601, Nov 1999.
[8] OSRAM. (2014) High Power Infrared Emitter SFH 4550 FA Product Details. [Online]. Available: http://www.osram-os.com/Graphics/XPic3/00116140_0.pdf
[9] ——. (2014) Silicon PIN Photodiode SFH 213 FA Product Details. [Online]. Available: http://www.osram-os.com/Graphics/XPic5/00101689_0.pdf
Optimal Conduction Angle of an E-pHEMT
Harmonic Frequency Multiplier
Krunoslav Martinčić, Zagreb University of Applied Sciences, Electrical Engineering dept., Zagreb, Croatia
e-mail: krunoslav.martincic@tvz.hr
Abstract - This paper describes a classic method of an analog harmonic multiplier which uses a C-class amplifier topology. The goal of this work is to propose a method of finding the optimal conduction angle of an E-pHEMT transistor so that a better spectral purity of the second or third harmonic can be obtained. MATLAB simulations are performed and an electronic circuit with an E-pHEMT transistor is designed. Measurements are made in both the time and frequency domain and a comparison of the results is presented.
INTRODUCTION
With the development of modern technologies, the
characteristics of electronic components are greatly
enhanced along with increasing operating frequency of
electronic circuits. The rapid expansion of information and
communication technology leads to the congestion of
currently used frequency bands and also to a need for higher
operating frequencies. Highly stable frequencies with a pure harmonic spectrum in the higher GHz [1], [2] and THz [3] bands are practically impossible to generate directly at present.
fundamental frequency can be simply obtained by applying
a sine voltage onto an electronic component having
nonlinear current-voltage characteristics. Unfortunately,
the byproduct of the wanted higher harmonics is also a large
number of unwanted harmonic components of the
fundamental frequency.
The paper describes a sine signal distortion method
applied to finding the optimal waveform with the aim of
gaining a single, specific higher harmonic component.
Semiconductor electronic components possess highly nonlinear current-voltage properties and as such are suitable for
generating higher harmonics of fundamental frequency [4],
[5]. The aim of this paper is to apply simulation tools and
measurement of devices in order to obtain the defined
harmonic component. The focus in this paper is not on the
circuit topology, conversion efficiency, bandwidth or
optimisation of high frequency matching under the nonlinear operating conditions of semiconductor devices.
QUANTITATIVE MODEL
The quantitative model proposed here can be applied to any harmonically pumped nonlinear system. If we focus solely on the concept of multiplication of a harmonic frequency of a sine signal, then the objective is to obtain one particular, higher harmonic kω (k ∈ N, k > 1), with the power P(kω) derived from the initial signal. Nonlinear sine wave distortion generates a multitude of harmonic products nω (n = 1, 2, 3, ...). It is possible to extract the target component kω from such a spectrum by using a suitable filter. It is better if the side spectrum components (k−2)ω, (k−1)ω, (k+1)ω and (k+2)ω are as small as possible. In this way, the unwanted components of the spectrum can be well suppressed by simpler, low-order filters. Since the concept of spectral purity in the literature is frequently linked to the notion of the phase noise of an oscillator, the quantitative factor that describes the issues studied here can be named the Harmonic Spectral Purity Factor (HSPF). Similarly to the distortion of linear systems, the HSPF can be defined via the power of the spectral components as:

HSPF(ωk) = P(ωk) / Σn P(ωn);  n, k ∈ N, k > 1,  (1)

where P(kω) = P(ωk) is the power of the k-th harmonic ωk. If the signal contains only the component ωk, then HSPF(ωk) = 1, i.e. the signal is 100% spectrally "clean". The same method of quantitative description can be applied to other nonlinear systems, e.g. to mixers. Symmetrical (balanced) structures will suppress some of the unwanted products better than unbalanced structures. Also, with the same topology, a well-balanced circuit will have a higher HSPF than an unbalanced circuit.

The relation (1) is unsuitable for practical application; therefore, in practice, it can be used in a simplified form taking into account the spectrum of only two or four symmetric octaves with respect to ωk. The components of the spectrum which are more distant from ωk are more easily filtered out and generally have smaller amplitudes. HSPF2 and HSPF4, defined over two and four octaves, are expressed in Equations (2) and (3); for a rational sum index, the nearest lower whole number (integer) is to be taken:

HSPF2(ωk) = P(ωk) / Σ_{n=⌊k/2⌋}^{2k} P(ωn);  n, k ∈ N, k > 1.  (2)

HSPF4(ωk) = P(ωk) / Σ_{n=⌊k/4⌋}^{4k} P(ωn);  n, k ∈ N, k > 1.  (3)
In a specific situation, HSPF will be a function of the selected electronic component, the static operating point, the circuit topology, the amplitude of the driving signal and the temperature.
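As an illustration of how (2) can be evaluated in practice, the sketch below computes HSPF2 from an FFT of a simulated C-class drain current in Python; the half-cosine clipping model and the chosen conduction angle are illustrative assumptions, not the measured device data:

```python
import numpy as np

def hspf2(power, k):
    """HSPF2 per Eq. (2): power of the k-th harmonic divided by the total
    power of harmonics floor(k/2)..2k, the two octaves centred on it."""
    return power[k] / power[k // 2 : 2 * k + 1].sum()

# Illustrative C-class drain current: cosine clipped below the bias level,
# giving a conduction angle theta (here 120 degrees).
theta = 2 * np.pi * 120 / 360
n = 4096                                   # samples per fundamental period
t = np.arange(n) / n                       # one period, normalized time
i_d = np.maximum(np.cos(2 * np.pi * t) - np.cos(theta / 2), 0.0)

power = np.abs(np.fft.rfft(i_d) / n) ** 2  # P(omega_n), index = harmonic number
print(f"HSPF2(omega_2) = {hspf2(power, 2):.3f}")
print(f"HSPF2(omega_3) = {hspf2(power, 3):.3f}")
```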
PHYSICAL REALIZATION
Due to the simplicity of setting the quiescent condition and the theoretically simple transfer characteristic, the circuit is realized using an enhancement-mode pHEMT [6] transistor of high transconductance. The schematic of the circuit and the measurement setup is shown in Fig. 1. First, the static transfer characteristic of the transistor at Uds = 4 V has been measured. HF and microwave transistors generally possess a narrow gate and, consequently, a linear transfer characteristic in the saturation region. Due to the nonlinearity around the threshold voltage UGS0 (VTH), the transfer characteristic is interpolated by a semi-parabola:

Id = 0.3·(Ugs − 0.27)² A for Ugs > 0.27 V, and Id = 0 for Ugs ≤ 0.27 V.  (4)

The AC driving signal is sinusoidal,

ug = Ug·cos(ωt),  (5)

and the gate voltage Ugs is the sum of the DC bias UGSQ and the AC driving signal ug (Fig. 1):

Ugs = UGSQ + ug.  (6)

Fig. 1, Measurement setup

According to (4), (5) and (6), Id will depend on the gate bias voltage UGSQ and the amplitude of the driving signal ug. Three fundamentally different operating modes are possible:
a) UGSQ ≤ UGS0: C-class, the transistor conducts for less than half of a period, with low-side clipping;
b) UGSQ > UGS0: non-linear operation, the transistor conducts for more than half of a period; depending on the amplitude of the driving signal, clipping is possible both on the bottom and on the top side;
c) UGSQ >> UGS0: depending on the amplitude of the driving signal, mainly saturation and high-side clipping.

In frequency multiplication circuits with transistors, the most commonly used mode is the C-class, since such a waveform contains a multitude of higher harmonic components of the spectral distribution [7]:

id(ωt) = Imax·(cos ωt − cos(θ/2)) / (1 − cos(θ/2)) for |ωt| ≤ θ/2, id = 0 otherwise,  (7)

where θ is the conduction angle of the transistor.

The measured and interpolated transfer functions are shown in Fig. 2. Some smaller differences are observed between the two curves; therefore, some deviations are expected in the measurement results compared to the simulation results. In a realistic setting under dynamic conditions, there are parasitic series resistances, capacitances and inductances in the active component which influence the shape of the signal in the time domain as well as the spectrum. Therefore, one can expect a small difference between the data obtained from simulation and the results of measurement, especially at higher frequencies.

Fig. 2, E-pHEMT, Id(Ugs), Uds = const. (UGS0 = VTH = 0.27 V, Uds = 4 V)
Fig. 3, Spectrum normalized to P(f1) for θ = 30º

The MATLAB simulation (FFT) of the spectrum for a narrow half-
sinusoidal pulse train (θ << 2π) is shown in Fig. 3, and for a large conduction angle (θ ≈ π) in Fig. 4. The waveform and the spectrum depend on the gate bias voltage UGSQ and the amplitude of the driving signal ug. According to (4), (5) and (6), both of these variables unambiguously determine the conduction angle θ of the transistor. If the amplitude of the driving signal ug is held constant and the gate bias voltage UGSQ is changed (Fig. 1), the clipping level and
consequently the conduction angle can be adjusted. For this reason, the simulation and the spectrum analysis in MATLAB have been done with the conduction angle θ as an independent variable.

Fig. 4, Spectrum normalized to P(f1) for θ = 170º
Fig. 6, Id(t), θ = 120º, UGSQ = 0.12 V, Pin = −10 dBm, fin = 100 MHz
SIMULATIONS AND MEASUREMENTS
Fig. 7, Spectrum, measured for θ = 120º
Fig. 5, HSPF2(ω2) as a function of θ

With the amplitude ug held constant, and by changing the gate bias voltage UGSQ with the aid of the resistive divider R1 – POT1 (Fig. 1), it is possible to adjust the conduction angle from 0º to 360º. With smaller conduction angles, the total power of the spectral components is
smaller, too. The aim of this work is not the optimisation
of power or efficiency, but the optimisation of the spectral
composition. Once the wanted component has been
separated from the spectrum and the unwanted
components have been sufficiently suppressed, the signal
level can be amplified to the defined level by a linear
amplifier. In Fig. 4 (θ = 170°), the fundamental harmonic
dominance in the spectrum can be observed. With a
frequency doubler, it would be desirable that the
amplitudes of the fundamental and third harmonics are as
low as possible. The simulation of the conduction angle
ranging from 0º to 180º has been performed in MATLAB.
Fig. 5 shows HSPF2(ω2) as a function of the conduction angle. The optimal ratio of the amplitudes of the spectral components in terms of generating the second harmonic occurs at a conduction angle of θ ≈ 117º, i.e. one third of the period. This case is shown in the time domain in Fig. 6, and in the frequency domain in Fig. 7. There is a relative increase of the amplitude of the second harmonic with respect to the amplitude of the fundamental component
compared to the prior case with the conduction angle θ = 170° (Fig. 4).

Fig. 8, HSPF2(ω3) as a function of θ
Fig. 9, Spectrum normalized to P(f1) for θ = 70º

The same procedure has been performed in the case of a frequency tripler, and Fig. 8 shows the obtained dependence of HSPF2(ω3) on the conduction angle.
graph shows the maximum for a conduction angle of 71º,
i.e. ≈ 1/5 of the period. Fig. 9 shows the spectrum obtained by computer simulation for the referred case. From the graphs, a relative increase can be observed in the power of the third harmonic relative to the first and second harmonics, compared with the prior case of the frequency doubler and the conduction angle θ = 120°.
CONCLUSION
This paper presents a model for the quantitative description of harmonically pumped nonlinear systems. With the aid of MATLAB, an analysis is conducted, including the calculation of the proposed factors for the case of a frequency multiplier implemented using an E-pHEMT in a C-class amplifier. The measurement results have confirmed the computer simulation results and have also justified the use of the proposed factors. In future work, the intention is to carry out an analysis of circuits with an exponential current-voltage characteristic, as well as of symmetrical topologies with suppression of the fundamental component.
REFERENCES
[1] Yuan Chun Li, "20–40 GHz dual-gate frequency doubler using 0.5 µm GaAs pHEMT technology", Electronics Letters, vol. 50, no. 10, 2014.
[2] C. Rauscher, "High Frequency Doubler Operation of GaAs Field Effect Transistors", IEEE Trans. on Microwave Theory and Techniques, vol. 31, no. 8, June 1983.
[3] T. W. Crowe, J. L. Hesler, S. A. Retzloff, C. Pouzou, G. S. Schoenthal, "Solid-State LO Sources for Greater than 2 THz", Virginia Diodes Inc., Charlottesville, VA, 2011 ISSTT Digest, 22nd Symposium on Space Terahertz Technology, April 26-28, 2011, Tucson, Arizona, USA, www.vadiodes.com
[4] S. A. Maas, "The RF and Microwave Circuit Design Cookbook", Artech House, Norwood, MA, 1998.
[5] E. Camargo, "Design of FET Frequency Multipliers and Harmonic Oscillators", Artech House, Norwood, MA, 1998.
[6] S. Y. Liao, "Microwave Devices and Circuits", Third Edition, Prentice Hall, Englewood Cliffs, NJ, 1996.
[7] S. A. Maas, "Nonlinear Microwave and RF Circuits", Artech House, Norwood, MA, 2003.
Ultra-Wideband Transmitter Based on Integral
Pulse Frequency Modulator
T. Matić*, M. Herceg*, J. Job* and L. Šneler**
* Faculty of Electrical Engineering Osijek, Osijek, Croatia
** Supracontrol, Zagreb, Croatia
tmatic@etfos.hr
Abstract - The paper presents a novel short-range wireless sensor node architecture, based on an Integral Pulse Frequency Modulator (IPFM) and an Ultra-Wideband (UWB) pulse generator. Because it requires no internal clock signal source and implements multi-user operation without a microprocessor, the architecture is simple and energy efficient. Multi-user coding is performed using delay elements, where each user has a unique delay time value. The output of the IPFM is fed to the delay element. The delayed and the original IPFM signals feed the UWB pulse generator. The paper presents the transmitter architecture and measurements of the transmitter signals in the time and frequency domains.
I. INTRODUCTION
Wireless sensor networks have gained great attention in the last two decades. Following the development of semiconductor technology and modern CMOS mixed-signal circuit design, wireless sensor networks are becoming a core part of various modern communication systems [1-4]. Particularly important is their application in systems where the node size and the power consumption are critical demands, like Wireless Body Area Networks (WBAN) [5-8]. In such networks, the size of the wireless node and its power consumption are limited, and it is important to achieve miniaturization and the lowest possible consumption [9].
Ultra-wideband (UWB) communication systems have shown great promise for low-power communication. Since UWB communication systems are based on narrow pulses, they provide energy consumption below 1 nJ per transmitted pulse. Depending on the application area, for slowly varying signal measurements, where low duty-cycle operation is enabled, a wireless sensor node can provide extremely low-power operation. In such a system, there is a promising application of energy-harvesting power supplies that would enable battery-free operation.
Wireless sensor nodes aimed at acquiring a signal from an analog sensor are relaxed in terms of computational complexity. Due to the analog input signal, they could consist of a simple analog-to-digital converter followed by a UWB generator employing a suitable modulation for multi-user coding. Such a sensor could employ simple 1-bit analog-to-time conversion and delay-based multi-user coding, as proposed in [10], [11]. The proposed multi-user coding is based on transmitted-reference ultra-wideband modulation. The major difference is that it employs asynchronous transmission, and neither a clock nor a microprocessor is required at the transmitter side.
Clock-less analog-to-time conversion can easily be implemented with a Time Encoding Machine (TEM) [12] at the analog sensor output. The analog input signal is transformed to an output pulse train that carries the information on the analog input signal in the time domain, e.g. in the output pulse frequency, pulse distance or duty cycle, depending on the type of TEM [11].
For energy-efficient time encoding, the best choice is a pulse-based TEM, like the Integrate and Fire (IAF) or the Integral Pulse Frequency Modulator (IPFM), which transform the analog input signal to a pulse train with the information carried in the pulse distance/frequency. The maximum output signal value is present only during the short pulse duration, while for the rest of the period the output signal is equal to zero. Such an operation enables low duty cycling and therefore low energy consumption.
In the next section, the paper presents the TEM theory, focusing on the IPFM application for UWB wireless sensors. The UWB IPFM transmitter is implemented as a discrete-type laboratory prototype which can be used as a wireless sensor node at the analog sensor output. The third section presents the measurement results.
II. IPFM UWB TRANSMITTER
A. Transmitter architecture
The presented wireless transmitter is aimed at application at the analog sensor output for remote signal acquisition. As can be seen from Fig. 1, it transforms the analog input signal to time information using a TEM as a modulator. The pulse train y(t) is fed to the delay line that is used for multi-user coding and optionally fed back to the IPFM. Each sensor has a unique delay value that is used for user identification at the receiver side. The delayed and the original pulse trains (the y'(t) and y(t) signals) are fed to the UWB pulse generator, where the pulse pair is formed for direct wireless transmission. The ultra-wideband generator output can optionally be fed to a transmission line or an antenna to achieve wired or wireless communication.
In the case of wireless transmission, the pulses are in high frequency bands above 3 GHz, while in wired-mode communication the frequency should be significantly lower to minimize losses and reflections.
B. Time Encoding Machine
Time Encoding Machines transform the analog input signal to time information at the output. The basic TEM circuits, according to [12], are the Asynchronous Sigma-Delta Modulator (ASDM) and the Integrate and Fire (IAF) modulator. Besides the ASDM and IAF, the Integral Pulse Frequency Modulator can also be considered a TEM [11], since the distance between its consecutive output pulses is proportional to the analog input voltage value.
For the ASDM application as a TEM, the information on the analog input voltage is transformed to the duty cycle of the output pulse train [13]. Therefore, the information on both the rising and the falling edges is required for the recovery of the analog information. Due to the more complex transmitter and receiver architecture required for the transmission of positive and negative pulses, the IPFM is more attractive for the transmitter implementation. Since the information is carried only in the pulse distances, only positive UWB pulses need to be transmitted for reliable analog signal demodulation at the receiver side.
Figure 1. Block scheme of the IPFM UWB transmitter
The block scheme of the Integral Pulse Frequency Modulator is depicted in Fig. 2. The analog input signal defines the comparator threshold voltage. To enable a unipolar supply for bipolar input voltages, the input voltage span is transformed from [−VCC, VCC] to [0, VCC], where c = 0.5, while Sref is equal to VCC/2.
The integrator L(s) from Fig. 2 integrates continuously until the integrator output l(t) reaches the comparator threshold. The transition of the comparator output y(t) occurs, and the feedback loop signal y'(t) resets the integrator to integrate from zero voltage until the next crossing occurs. The delay block can be implemented outside the feedback loop or inside it, as in Fig. 2. The consecutive pulse distance is proportional to the comparator threshold uTH(t) = x(t)/2 + VCC/2 for an ideal integrator L(s) with time constant τi. The integrator output signal in the time domain can be expressed by the following equation:
l(tk) = ∫_{tk−1}^{tk} (VCC/τi) dt + Ck−1.  (1)
At the time tk when the integrator triggers the comparator, under the assumption that at the beginning of the integration slope tk−1 the integrator output was equal to 0 (l(tk−1) = Ck−1 = 0), the following equation holds:
l(tk) = ∫_{tk−1}^{tk} (VCC/τi) dt = uTH = x(tk)/2 + VCC/2,  (2)
if τi << (tk − tk−1). According to (2), the distance between the consecutive output pulses Tk and Tk−1 of the signal y(t), if τn = 0, is equal to:

Tk − Tk−1 = tk − tk−1 + TON + TOFF = (τi/(2VCC))·x(tk) + τi/2 + TON + TOFF,  (3)
Figure 2. Block scheme of the IPFM modulator
where TON and TOFF are the on and off switching times of the integrator L(s), and their sum is equal to the output pulse width (TP = TON + TOFF). For τn ≠ 0, equation (3) becomes:

Tk − Tk−1 = (τi/(2VCC))·x(tk) + τi/2 + 2τn + TON + TOFF.  (4)
If the integrator time constant τi, the multi-user delay τn and the switching delay times TON and TOFF are known at the receiver side then, according to (4), it is easy to demodulate the pulse train and obtain the analog input voltage value x(tk). The difference between two multi-user delays τn − τn−1 should be large enough that variations in the integrator time constant τi and in the delay times TON and TOFF do not affect the detection at the receiver side.
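A minimal numerical sketch of the demodulation implied by (4) simply inverts the linear mapping between pulse distance and input voltage; all constants below are illustrative placeholders, not the values of the actual prototype:

```python
import numpy as np

# Illustrative IPFM constants, assumed known at the receiver side (Eq. (4)).
VCC = 3.3                    # supply voltage [V]
TAU_I = 1e-4                 # integrator time constant [s]
TAU_N = 5e-6                 # multi-user delay of this node [s]
T_ON, T_OFF = 1e-7, 1e-7     # integrator switching times [s]

def pulse_distance(x):
    """Encoder side: pulse distance Tk - Tk-1 for input sample x, per Eq. (4)."""
    return TAU_I / (2 * VCC) * x + TAU_I / 2 + 2 * TAU_N + T_ON + T_OFF

def demodulate(d):
    """Receiver side: invert Eq. (4) to recover x(tk) from a measured distance d."""
    return (d - TAU_I / 2 - 2 * TAU_N - T_ON - T_OFF) * 2 * VCC / TAU_I

x = np.array([0.5, 1.0, 2.2])                          # example input samples [V]
assert np.allclose(demodulate(pulse_distance(x)), x)   # exact for ideal blocks
```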
C. Circuit implementation
The pulse train y(t) is applied to the delay line, which generates the delayed replica y'(t). In that way, two pulse trains are formed to feed the UWB pulse generator for the wireless pulse transmission. A discrete-type prototype is implemented, based on a modified version of the UWB generator published in [14]. The modification allows impulse pair generation independently of the rectangular pulse width.
The implemented circuit is depicted in Fig. 3. Instead of an operational amplifier (OPAMP) implementation, a simple RC filter is used as the integrator. One OPAMP is used as the comparator within the IPFM, while the other one is used as a comparator within the delay circuit. Both
Figure 3. The circuit implementation of the IPFM UWB transmitter
Figure 4. The output of the comparator y’(t)
Figure 5. The step recovery diode voltage
comparators trigger the UWB generator inputs, which are implemented with high-frequency transistors that trigger a step recovery diode connected to a microstrip line. At each rising edge of both the original and the delayed pulse train, the UWB generator forms an ultra-wideband pulse; these pulses are summed and form the signal yUWB(t) to be sent wirelessly. The microstrip line is connected to the antenna via a capacitor. Since the analog input voltage is a unipolar signal, the constant c = 1 and Sref is not connected.
III. RESULTS
The laboratory prototype from Fig. 3 was implemented with discrete components, and a step recovery diode has been used for UWB pulse generation. Due to the constant forward biasing and the resulting constant current consumption, this implementation is not energy efficient. To achieve low-power operation, IC-based pulse generation techniques have to be used to ensure low duty cycling and zero output current outside of the UWB pulse [15], [16]. The delayed comparator output pulse train y'(t) is depicted in Fig. 4. The information on the input signal level is contained in the pulse distance Tk − Tk−1, which is equal to 500 ns in the present setup.
Figure 6. The output spectrum of the UWB pulse train
Fig. 5 presents the step recovery diode voltage. It is visible that the pulse pair reverse-polarizes the diode. At each reverse polarization pulse, the step recovery diode forms a narrow Gaussian pulse. The spectrum of the transmitted pulse is measured with a receiving antenna at a 30 cm distance without an LNA. The received power spectrum is depicted in Fig. 6. Due to the limitations of the step recovery diode based UWB pulse shaper, the pulse is not positioned entirely in the unlicensed spectrum.
IV. CONCLUSION
The paper presents a novel UWB IPFM transmitter for short-range wireless communications. The architecture of the transmitting system requires neither an internal clock nor a microprocessor for multi-user coding. A unique delay per user enables simple and efficient multi-user coding, and the UWB pulses provide low energy consumption per transmitted pulse. To achieve low-power operation, future work is directed towards a CMOS IC implementation of the system, which will enable low-power operation in the periods of inactivity between two consecutive UWB pulses. By downscaling the pulse width to the sub-10 ps range, the integrator time constant could be lowered to enable the integration of the capacitors on silicon as well. Also, the IC design will provide more efficient spectral shaping and spectral positioning in the unlicensed spectrum above 3 GHz, thanks to the delay-based UWB pulse generation and more accurate pulse shaping compared to the SRD implementation.
ACKNOWLEDGMENT
This work has been supported in part by Croatian
Science Foundation under the project UIP-2014-09-6219
Energy efficient asynchronous wireless transmission.
REFERENCES
[1] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, "A survey on sensor networks," IEEE Commun. Mag., vol. 40, no. 8, pp. 102-114, Aug. 2002.
[2] Li-Yuan Chang, Pei-Yin Chen, Tsang-Yi Wang, Ching-Sung Chen, "A Low-Cost VLSI Architecture for Robust Distributed Estimation in Wireless Sensor Networks," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 58, no. 6, pp. 1277-1286, June 2011.
[3] D. E. Bellasi, L. Benini, "Energy-Efficiency Analysis of Analog and Digital Compressive Sensing in Wireless Sensors," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 62, no. 11, pp. 2718-2729, Nov. 2015.
[4] D. E. Bellasi, R. Rovatti, L. Benini, G. Setti, "A Low-Power Architecture for Punctured Compressed Sensing and Estimation in Wireless Sensor-Nodes," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 62, no. 5, pp. 1296-1305, May 2015.
[5] S. Diao, Y. Zheng, and C.-H. Heng, "A CMOS ultra low-power and highly efficient UWB-IR transmitter for WPAN applications," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 56, no. 3, pp. 200-204, 2009.
[6] C. C. Y. Poon, B. P. L. Lo, M. R. Yuce, A. Alomainy, Yang Hao, "Body Sensor Networks: In the Era of Big Data and Beyond," IEEE Reviews in Biomedical Engineering, vol. 8, pp. 4-16, 2015.
[7] S. Sarkar, S. Misra, "From Micro to Nano: The Evolution of Wireless Sensor-Based Health Care," IEEE Pulse, vol. 7, no. 1, pp. 21-25, Jan.-Feb. 2016.
[8] K. Ozols, "Implementation of reception and real-time decoding of ASDM encoded and wirelessly transmitted signals," 2015 25th International Conference Radioelektronika (RADIOELEKTRONIKA), pp. 236-239, 21-22 April 2015.
[9] R. Sarpeshkar, "Universal Principles for Ultra Low Power and Energy Efficient Design," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 59, no. 4, pp. 193-198, April 2012.
[10] T. Matic, M. Herceg, J. Job, "Energy-efficient system for distant measurement of analogue signals," WO/2014/195739, 11.12.2014.
[11] T. Matic, M. Herceg, J. Job, "Energy-efficient system for distant measurement of analogue signals," WO/2014/195744, 11.12.2014.
[12] A. A. Lazar and L. T. Toth, "Time encoding and decoding of a signal," US 7,573,956 B2, 11 August 2009.
[13] S. Ouzounov, E. Roza, J. A. Hegt, G. van der Weide, A. H. M. van Roermund, "Analysis and design of high-performance asynchronous sigma-delta modulators with a binary quantizer," IEEE Journal of Solid-State Circuits, vol. 41, no. 3, pp. 588-596, March 2006.
[14] P. Protiva, J. Mrkvica, J. Macháč, "Universal Generator of Ultra-Wideband Pulses," Radioengineering, vol. 17, no. 4, December 2008.
[15] H. L. Xie, X. Wang, A. Wang, B. Qin, H. Chen, B. Zhao, "An ultra low-power low-cost Gaussian impulse generator for UWB applications," 2006 8th International Conference on Solid-State and Integrated Circuit Technology (ICSICT '06), pp. 1817-1820, 23-26 Oct. 2006.
[16] J. B. Radic, A. M. Djugova, M. S. Videnovic-Misic, "A 3.1–10.6 GHz impulse-radio UWB pulse generator in 0.18 µm," 2011 IEEE 9th International Symposium on Intelligent Systems and Informatics (SISY), pp. 335-338, 8-10 Sept. 2011.
Design of a transmitter for high-speed serial
interfaces in automotive micro-controller
A. Bandiziol*, W. Grollitsch**, F. Brandonisio**, R. Nonis**, P. Palestri*
* Università degli Studi di Udine, Udine, Italy
** Infineon Technologies, Villach, Austria
palestri@uniud.it
Abstract - This work reports on the system-level design of a transmitter for the next generation of High-Speed Serial Interfaces (HSSI) to be implemented in a micro-controller for automotive Electronic Control Unit (ECU) applications, pushing the transmission speed up to 10 Gbps over a 10 cm long cable. A voltage-mode architecture is selected for low-power reasons. We focus our analysis here on the system-level implementation of Feed-Forward Equalization as an FIR filter consisting of different transmitter slices driven by different bits in the data sequence. We consider different data rates and numbers of taps and analyze how the performance of the equalizer is affected by the quantization of the tap values in the practical implementation of the FIR filter.
I. INTRODUCTION
High-speed digital I/O is becoming increasingly popular and has emerged as one of the hot research topics in microelectronics. In this context, high-speed serial links are widely used in a variety of different fields [1]. Moreover, in the last twenty years the speed of these links has been constantly increasing, leading to the adoption of a broad variety of standards [2]. Since the baud rates at which these links work today are extremely high, inter-symbol interference (ISI) has become the most limiting factor. In order to mitigate ISI, the most effective solution is channel equalization [3]. Equalization can be done either at the transmitter side, just before the channel (where it is called Feed-Forward Equalization, FFE), or at the receiver (for example in the form of Decision-Feedback Equalization, DFE). Figure 1 shows the general structure of a transmitter with feed-forward equalization: delayed versions of the bit stream are fed to drivers with different strengths, thus implementing an FIR filter.
II. DRIVER ARCHITECTURE
For the transmitter, a voltage-mode architecture has been chosen, because it consumes less power than a current-mode one with the same output swing [4]. Figure 2 shows the general structure of a voltage-mode differential driver: two inverters are driven by the bit bi and its complement to produce a differential voltage vo. The MOSFETs are large enough to have a negligible voltage drop when on. Impedance matching is implemented by the resistances RDi.
Figure 2 is also the starting point to explain the basic principle of FFE. A single driver can be split into many slices, and the same bit bi of the serial data input can drive many of these slices, i.e. the different drivers in Figure 1 are each formed by many slices with the schematics of Figure 2. This can be better understood in Figure 3. The index "i" of a bit indicates its position in the bit stream.
When dividing the driver into slices, one must still guarantee impedance matching. In fact, when FFE is not implemented, only one slice might be used, and its output resistance should match R0 = 50 Ω. If more slices are used, it has to be considered that the total resistance of all the slices in parallel should be 50 Ω, leading therefore to bigger values of RDi for each slice. This results in the following equation:

Σi (1/RDi) = 1/R0.  (1)
Under this assumption, one can then write a closed
form expression for the output voltage vo, which reads

v0 = (VDD/2) · Σi (ni/M) · bi · sgni,  (2)
where (see Figure 3) ni is the number of slices connected to the bit bi, M is the total number of slices and sgni is the sign of bi (i.e. putting bi at the left or right part of Figure 2, making the driver inverting or non-inverting). For the bi, '1' corresponds to bi = 1 and '0' means bi = −1.
Eq. (2) can be rewritten as
v0[j] = (VDD/2) · Σi wi · bi[j] = (VDD/2) · Σi wi · data[j − i],  (3)

where j indicates a bit period and wi is the strength of the slices connected to the i-th bit, normalized to the full driver strength. From now on, we will refer to wi as the "weight" of the i-th tap. From Eq. (3) we see that the structure realizes an FIR filter, which implements a convolution between the bit stream data[j] and the tap vector wi.

Figure 1. General scheme of a serial link implementing FFE using driver slices
Figure 2. General scheme of a voltage-mode transmitter.
It should be noted that in order to obtain fine impedance tuning and compensate PVT (Process, Voltage and
Temperature) variations, one does not strictly follow Eq.
(1): the driver (divided in slices) is sized to obtain a resistance much larger than 50Ω; then many replicas of the
structure are duplicated and the number of such replicas
put in parallel is adjusted to match the 50Ω target [5].
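Numerically, Eq. (3) is just a convolution of the ±1-mapped bit stream with the tap vector; a short Python check (the supply voltage and weights below are arbitrary examples, not design values):

```python
import numpy as np

VDD = 0.9                                  # assumed supply voltage [V]
w = np.array([0.75, -0.125, -0.125])       # example tap weights w_i
bits = np.array([1, -1, 1, 1, -1, 1, -1])  # data[j] mapped to +/-1

v0 = VDD / 2 * np.convolve(bits, w)        # Eq. (3), edge transients included
print(np.round(v0, 4))
```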
III. CHOICE OF EQUALIZATION TAPS
In this section, we cover the basic steps required to determine the weights wi for FFE. The approach that we follow in our equalization process is the zero-forcing method [6]. Here the discussion will be at a tutorial level, to bridge the gap between the theory in [6] and the actual hardware implementation (Figs. 2, 3). The goal is to minimize the distance between the desired response of the transmitter+channel (i.e. a signal without ISI) and the response actually received; this is done via a least-squares minimization problem that reads

min_wZFE ||zDES − HCH · wZFE||²,  (4)

Figure 3. Example of FFE implementation using slices (with schematics as in Figure 2) put in parallel.
where wZFE is the weights vector [w0, w1, …, wi, …, wN] to be determined by the minimization problem and zDES is the desired output response. In other words, if we consider a bit stream '10000', the transmitter will generate a sequence of pulses of height [w0, w1, …, wi, …, wN], each one stimulating the channel. We want to set wi in such a way that the receiver samples the original sequence '10000' without ISI, meaning that zDES = [1, 0, 0, …, 0]. HCH is a matrix that rearranges the channel pulse response (h) in order to transform the convolution with the different pulses of height wi into a matrix product. For example, if the channel pulse response is non-null only for the first three samples and we decide to equalize with 5 post-cursor taps, then we have
HCH =
| h0   0    0    0    0    0  |
| h1   h0   0    0    0    0  |
| h2   h1   h0   0    0    0  |
| 0    h2   h1   h0   0    0  |
| 0    0    h2   h1   h0   0  |
| 0    0    0    h2   h1   h0 |
| 0    0    0    0    h2   h1 |
| 0    0    0    0    0    h2 |   (5)
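For illustration, the matrix of Eq. (5) can be assembled for an arbitrary pulse response and the least-squares problem (4) solved numerically; a minimal Python sketch, with an assumed three-sample pulse response standing in for the extracted one:

```python
import numpy as np

def zero_forcing_weights(h, n_taps):
    """Build the convolution matrix of Eq. (5) for pulse response h and solve
    the least-squares problem of Eq. (4) for main + post-cursor tap weights."""
    n_h = len(h)
    H = np.zeros((n_h + n_taps - 1, n_taps))
    for col in range(n_taps):
        H[col:col + n_h, col] = h      # one shifted copy of h per tap
    z_des = np.zeros(H.shape[0])
    z_des[0] = 1.0                     # desired response: main cursor, no ISI
    w, *_ = np.linalg.lstsq(H, z_des, rcond=None)
    return w

h = np.array([1.0, 0.45, 0.18])        # assumed channel pulse response
w = zero_forcing_weights(h, n_taps=6)  # main cursor + 5 post-cursor taps
print(np.round(w, 4))                  # tap weights wZFE
print(np.round(np.convolve(h, w), 4))  # equalized pulse, close to [1, 0, ...]
```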
The solution of the minimization problem introduced by Eq. (4) requires extracting the channel pulse response. Therefore, the simulation setup shown in Figure 4 is used. This setup is implemented in Ansys Electronic Desktop [7] and consists of a differential link with a pulse generator (with ideally steep rise and fall times) and an S-parameter block. This block changes from system to system and represents all the elements that compose the system after the transmitter. In this work we analyze two different systems, which we will refer to as the "BGA system" and the "Leadframe system". The Ball Grid Array (BGA) system is composed of a via, approximately 5 mm long, which connects the transmitter output signal to the package output, a BGA package, a Printed Circuit Board (PCB) and a cable (10 cm long). At the receiver end, the impedance is matched at 100 Ω differential and the differential output is measured via voltage probes. The Leadframe system is similar to the BGA one, but instead of a BGA package it uses a leadframe one, and the cable low-pass characteristic has been mimicked by lumped elements. Figs. 5 and 6 show the S21 of both systems.

The solution of Eq. (4) reads

wZFE = (HCH^T · HCH)^(−1) · HCH^T · zDES,  (6)

which is the well-known solution for the LS objective function of Eq. (4). Once the weights wZFE have been obtained, one can evaluate how close the overall transmitter+channel response fits the wanted zDES by checking it against the product HCH · wZFE (i.e. the overall response of the FFE+channel). This is shown in Figure 7, which compares the effect that equalization with different numbers of taps has on the overall pulse response of the channel of Figure 6. In this particular case (consistent with the description above), only post-cursor taps were used; therefore the desired impulse channel response would be a 1 followed by a number of zeroes equal to the number of post-cursor taps used. So, for a 4-tap equalization (main and three post-cursors) the desired channel pulse response would be zDES = [1, 0, 0, 0]. If pre-cursor taps are used, then the zDES elements are shifted to the right by a number of places equal to the number of pre-cursor taps.

Figure 4. Setup implemented in Ansys Electronic Desktop in order to extract the channel pulse response
Figure 5. S21 of the BGA system
Figure 6. S21 of the Leadframe system
Figure 7. Response of the channel of Figures 4-6 operating at 10 Gbps, along with responses after equalization for various numbers of taps (pre-FFE and 2- to 6-post-cursor cases; samples h(−2) to h(5))

IV. EXAMPLE WITH REALISTIC CHANNELS

Once the optimal weights have been found, one can then evaluate the effectiveness of FFE not only by looking at the channel pulse response (as in Figure 7), but also by simulating a structure equivalent to the system composed of transmitter, package, chip and channel, and evaluating the improvements obtained in the eye diagram. The software used to this end is Ansys Electronic Desktop. Figure 8 illustrates a typical simulation setup: the only difference to the structure presented in Figure 4 is the use of a PRBS (Pseudo Random Bit Sequence) generator instead of a pulse generator. The PRBS generator offers the possibility to adapt its output based on equalization weights inserted by the user: in this case the vector wZFE obtained with Eq. (6). The low and the high voltage levels have been set to −450 mV and 450 mV, respectively. Thermal noise typical of MOSFETs in the TX has not been modeled and will be one of the next steps of our work.

Figure 8. Setup implemented in Ansys Electronic Desktop in order to obtain the eye diagram of the system
Figure 9. Eye diagram at 2.5 Gbps for the BGA system obtained with a transistor-level simulation and one-tap de-emphasis. The transmitter is described in [8].
Figure 10. Eye diagram at 2.5 Gbps for the BGA system obtained with a PRBS generator and one-tap de-emphasis.
The channel is terminated with a 50 Ohm resistance.
In order to confirm the validity of our simplified approach, Figs. 9 and 10 show two eye diagrams of the same system (at 2.5 Gbps), one obtained with a PRBS generator and the other one with a transistor-level model of the transmitter: the eye diagram parameters are very similar. Since the system-level design is performed before actually designing the TX at transistor level, in the following, for the 5 Gbps and the 10 Gbps cases, only the PRBS source is used. Figs. 11, 12, 13 and 14 report the eye diagrams in such cases with and without equalization. For the 5 Gbps situation, the improvement due to FFE is marginal, whereas to work at 10 Gbps with the Leadframe system, FFE is mandatory. Note that, at a given VDD, the inclusion of FFE lowers the high and low levels of the eye (e.g. 400 mV vs. 300 mV in Figs. 9 and 10): this is because when FFE is implemented, some slices will be driven by bits having opposite sign with respect to the main one, and this implies that the driver is not working at full strength.
Figure 11. Eye diagram at 5 Gbps for the Leadframe system without Feed-Forward Equalization.

V. EFFECT OF TAP QUANTIZATION
Eq. (6) provides optimum tap weights, but one must also think about a real-world implementation, which obviously implies quantizing the obtained weights, since each bit will be connected to a finite number of slices. This problem is peculiar to the voltage-mode transmitter divided into slices. In fact, previous works already introduced equalization implemented with sliced drivers, but mainly in current-mode logic [9]-[10], which makes equalization easier to implement with high granularity.
For this reason, we analyzed the effect of quantization with two different granularities, 8 and 16 levels (i.e. M = 8 or M = 16 as in Figure 3). These two different granularities
offer quantization steps of 0.125 and 0.0625, respectively.

Figure 12. Eye diagram at 5 Gbps for the Leadframe system with 6 post-cursor taps. The weights that generate this eye are w0=0.8009, w1=0.0403, w2=−0.1018, w3=0.0395, w4=−0.0045, w5=−0.013.
Figure 13. Eye diagram at 10 Gbps for the Leadframe system without Feed-Forward Equalization.
Figure 14. Eye diagram at 10 Gbps for the Leadframe system with 6 post-cursor taps. The weights that generate this eye are w0=0.577, w1=−0.1526, w2=0.0148, w3=0.0579, w4=−0.1392 and w5=0.0585.
Figure 15. Eye diagram at 10 Gbps for the Leadframe system with 6 post-cursor taps and a quantization step of 1/8. The weights are w0=0.625, w1=−0.125, w2=0, w3=0, w4=−0.125 and w5=0.125, which correspond to 5 slices connected to b0, while b1, b4 and b5 require one slice each.
Figure 16. Eye diagram at 10 Gbps for the Leadframe system with 6 post-cursor taps and a quantization step of 1/16. The weights are w0=0.5625, w1=−0.1875, w2=0, w3=0.0625, w4=−0.125 and w5=0.0625, which correspond to 9 slices connected to b0, three connected to b1, one slice connected to b3, one to b5 and two to b4.
Figs. 15 and 16 show the effect of quantization on the operation at 10 Gbps of the Leadframe system when equalized with 6 post-cursor taps, which without quantization has already been shown in Figure 14. With 16 slices we obtain eye parameters very close to those obtained from Eq. (6), whereas with 8 slices there is a degradation of the eye.
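The quantization step itself is easy to reproduce: with M slices, each weight can only take values ni/M. The sketch below rounds a set of optimal weights to the nearest realizable slice counts (the weight vector is the 10 Gbps Leadframe example from Figure 14; the simple nearest-value rounding used here is an assumption and need not match the exact slice allocation of Figs. 15 and 16):

```python
import numpy as np

def quantize_taps(w, m):
    """Round each tap weight to the nearest multiple of 1/M, i.e. an integer
    number of driver slices; the sign is kept by the slice polarity (sgn_i)."""
    slices = np.round(np.abs(w) * m).astype(int)
    return np.sign(w) * slices / m, slices

w = np.array([0.577, -0.1526, 0.0148, 0.0579, -0.1392, 0.0585])
for m in (8, 16):
    wq, slices = quantize_taps(w, m)
    print(f"M={m:2d}: weights {np.round(wq, 4)}, slices per tap {slices}")
```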
VI. CONCLUSIONS
We have reported on the design of a high-speed transmitter at 10 Gbps, focusing on the system-level planning of the feed-forward equalization. It has been shown that for a realistic channel typical of automotive ECU applications, communication at 10 Gbps requires 6 taps. The driver has to be partitioned into at least 16 slices in order to have the requested granularity of the taps for the FIR filter. If the transmission speed gets even higher, then a higher number of taps is needed and the effect of quantization becomes more and more relevant. The eye parameters with and without FFE, and including different granularities in the tap quantization, are summarized in Figs. 17-20. These will be checked against experimental data once the test chip is available. Transistor-level design with an advanced CMOS technology is ongoing, and the results will be published later on. We include in these figures also a 15 Gbps situation that requires 9 taps for equalization when considering the Leadframe system (7 for the BGA system). In Figs. 17-20 we see that FFE improves the eye height and width, although part of the improvement is lost if a too coarse granularity is used for the wi.
ACKNOWLEDGMENT
The authors would like to thank Prof. L. Selmi (University of Udine) for support and for many helpful discussions.
REFERENCES
[1] Carusone, Tony Chan. "Introduction to Digital I/O: Constraining I/O Power Consumption in High-Performance Systems." Solid-State Circuits Magazine, IEEE, vol. 7, no. 4 (2015): 14-22.
[2] Chang, Ken, Geoff Zhang, and Christopher Borrelli. "Evolution of
Wireline Transceiver Standards: Various, Most-Used Standards
for the Bandwidth Demand." Solid-State Circuits Magazine, IEEE,
vol. 7, no. 4 (2015): 47-52.
[3] Bulzacchelli, John F. "Equalization for Electrical Links: Current
Design Techniques and Future Directions." Solid-State Circuits
Magazine, IEEE, vol. 7, no. 4 (2015): 23-31.
[4] Razavi, Behzad. "Historical Trends in Wireline Communications:
60X Improvement in Speed in 20 Years." Solid-State Circuits
Magazine, IEEE, vol. 7, no. 4 (2015): 42-46.
[5] Kossel, Marcel, et al. "A T-coil-enhanced 8.5 Gb/s high-swing
SST transmitter in 65 nm bulk CMOS with 16 dB return loss over
10 GHz bandwidth." Solid-State Circuits, IEEE Journal of, vol.
43. no. 12 (2008): 2905-2920.
[6] Proakis, John G. Intersymbol interference in digital
communication systems. John Wiley & Sons, Inc., 2001.
[7] Ansys Inc. PDF Documentation for Release 15.0
[8] Cossettini, A., et al. "Design, characterization and signal integrity
analysis of a 2.5 Gb/s High-Speed Serial Interface for automotive
applications overarching the chip/PCB wall." Research and
Technologies for Society and Industry Leveraging a better
tomorrow (RTSI), 2015 IEEE 1st International Forum on. IEEE,
2015.
[9] Bulzacchelli, J. F., et al. "4.1 A 10Gb/s 5-Tap-DFE/4-Tap-FFE
Transceiver in 90nm CMOS." Solid-State Circuits, IEEE Journal
of, vol. 41. no. 12 (2006): 2885-2900.
[10] Beukema, Troy, et al. "A 6.4-Gb/s CMOS SerDes core with feed-forward and decision-feedback equalization." Solid-State Circuits, IEEE Journal of 40.12 (2005): 2633-2645.
Figure 17. Eye height versus transmission speed for the Leadframe system when not equalized, optimally equalized and when quantization (step 1/16 or 1/8) is applied to wi.
Figure 18. Eye width versus transmission speed for the Leadframe system when not equalized, optimally equalized and when quantization (step 1/16 or 1/8) is applied to wi.
Figure 19. Eye height versus transmission speed for the BGA system when not equalized, optimally equalized and when quantization (step 1/16 or 1/8) is applied to wi.
Figure 20. Eye width versus transmission speed for the BGA system when not equalized, optimally equalized and when quantization (step 1/16 or 1/8) is applied to wi.
Application of the calculation-experimental
method in the design of microwave filters
A.S. Geraskin*, A.N. Savin*, I.A. Nakrap**, V.P. Meshchanov***
* National Research Saratov State University/ Faculty of Computer Science and Information Technologies, Saratov,
Russian Federation
** National Research Saratov State University/ Faculty of Physics, Saratov, Russian Federation
*** Yuri Gagarin State Technical University of Saratov/ Institute of electronic engineering and mechanical engineering,
Saratov, Russian Federation
Gerascinas@mail.ru, savinan@info.sgu.ru, nakrapia@info.sgu.ru, nika373@bk.ru
The applicability of the calculation-experimental method to optimizing the design of microstrip microwave filters is estimated on the example of a band-pass filter on half-wave resonators with a Chebyshev characteristic. The method is based on an iterative process of correcting the results of the synthesis of the designed device parameters using its experimental output characteristics.
It is shown that the method allows taking into account the influence of various factors relating to the manufacturing process and the features of the materials used. However, an additive method of accounting, in the target function, for the deviations from the given output characteristics does not provide a rapidly converging process at large deviations.
In order to increase the efficiency of the calculation-experimental optimization method, a modification is offered, consisting of an additional correction of the material parameters and of the mathematical models of the components based on the experimental output characteristics of the device.
Accounting for the inhomogeneities of the relative dielectric permittivity and of the thickness of the microstrip filter components in its design with the help of the modified method has allowed, in a single iteration, the reduction of the ripple of the filter reflection coefficient magnitude from 7.1 dB to 1.7 dB.
I. INTRODUCTION
At the present stage of the development of science and technology, various microstrip microwave devices are widely used. The development of such devices assumes the numerical or physical modeling of the processes occurring in them, and the subsequent optimization of the device in order to achieve the required values of its output parameters [1].
The design of microwave devices, as a rule, is carried out with the use of specialized computer-aided design tools [2]. However, the real parameters of microstrip microwave devices can differ significantly from the designed parameters. This can be due to various factors: the inhomogeneity of the parameters of the materials used, deviations in the microstrip dimensions and in the quality of their surface, and inaccurate mathematical models of the device components.
In this connection, the task of accounting for the indicated factors during development arises. In [3], a calculation-experimental method of parametric optimization (CEMO) is proposed, which allows solving the given task based on the use of the experimental output characteristics of microwave devices.
The purpose of the given work is to study the possibility of applying the CEMO to the design of microstrip microwave filters.
II. THE ALGORITHM OF THE CALCULATION-EXPERIMENTAL METHOD BASED ON CHEBYSHEV APPROXIMATION OF THE OUTPUT CHARACTERISTICS
In the process of designing microwave devices, as a rule, the problem of parametric synthesis of the vector v of variable parameters of the device is solved in order to minimize the deviation of the output characteristic of the device from the given characteristic in the frequency domain:
(1)
The synthesis is usually carried out with the use of specialized computer-aided design tools (for example, NI AWR Microwave Office [2]) using various approximate mathematical models of the components of the designed microwave device. Accordingly, the resulting output characteristic may differ from the experimental characteristic of the real device (for the optimal vector of parameters synthesized in the solution of task (1)). The CEMO proposed in [3] is based on the correction of the results of the synthesis using the experimental device characteristics. Thus, the physical implementation of the device is considered an intermediate step of the iterative process of its development.
The following steps of the algorithm of this method can be distinguished [3, 4]:
1. Synthesis of the optimal vector of the microwave device parameters for the given characteristic in the given frequency domain, and calculation of the output characteristic (i.e. the solution of task (1)) with the use of computer-aided design.
2. Manufacturing of the microwave device designed from the model, and measurement of the real characteristics at the points of the frequency range corresponding to the Chebyshev approximation of the output characteristic. The use of the Chebyshev approximation allows using, in the synthesis of the vector, the minimum number of points for a given accuracy of the approximation and, correspondingly, the minimum amount of experimental data.
3. Construction of the new target characteristic, which takes into account the difference between the calculated and the experimental output characteristics; it is carried out by the specular reflection of the correction relative to the originally given characteristic: (2) The use of the output characteristic corrected for the given correction characteristic allows taking into account the inaccuracies of the mathematical model of the output characteristic, the features of the materials used, and the manufacturing process of the microwave device.
4. Synthesis of the new optimal vector of microwave device parameters for the new given characteristic, and calculation of the output characteristic (i.e. the solution of task (1) with a modified target function). The previously obtained vector is used as the initial approximation.
5. Manufacturing of the new device with the new parameter vector, and measurement of its output characteristic.
6. If the obtained output characteristic coincides with the given output characteristic to the required accuracy, the development of the device ends; otherwise the process returns to step 3, with the use of the correction value of (2) in the formation of the new characteristic, and with the given vector as a first approximation for synthesizing the device parameters at step 4.
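Under one plausible reading of step 3 and Eq. (2) (the new target is the given characteristic with the experimentally observed deviation mirrored, i.e. subtracted), a single CEMO cycle can be sketched as below; the synthesize and measure callables stand in for the CAD synthesis and the prototype measurement and are purely hypothetical placeholders:

```python
import numpy as np

def cemo(given, synthesize, measure, tol=0.1, max_iter=5):
    """Iterative CEMO loop (steps 1-6) over a sampled characteristic (numpy
    array), assuming Eq. (2) mirrors the experimental deviation."""
    target = given.copy()
    params = None
    for _ in range(max_iter):
        params = synthesize(target)          # steps 1/4: CAD synthesis of the parameter vector
        experimental = measure(params)       # steps 2/5: make the prototype and measure it
        deviation = experimental - given
        if np.max(np.abs(deviation)) < tol:  # step 6: required accuracy reached
            break
        target = given - deviation           # step 3: specular correction, Eq. (2) as assumed
    return params
```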
In [3] it is shown that in most cases only one calculation-experimental cycle suffices for a satisfactory coincidence of the obtained functions with the given one.
In order to assess the applicability of this method to the design of microstrip microwave filters, a fourth-order band-pass filter on half-wave resonators was developed by the given algorithm.
III. DEVELOPMENT OF A MICROSTRIP MICROWAVE FILTER USING THE CALCULATION-EXPERIMENTAL METHOD
The developed microstrip microwave filter should have the following characteristics:
- type of filter: band-pass;
- center frequency: ;
- boundary frequencies of the passband: , ;
- non-uniformity of the modulus of the transmission coefficient of the filter in the passband: not more than ;
- the modulus of the reflection coefficient of the filter in the passband must have the minimum possible deviation from the theoretically calculated one and the minimum non-uniformity, measured by the maximum in the bandwidth;
- the filter should be connected to microstrip lines with a characteristic impedance of ;
- the required transfer function must be a Chebyshev polynomial of the fourth order;
- the mathematical models of the components of the filter must have the ultimate shape;
- the substrate material of the filter: double-sided copper-clad fibreglass FR4-2 with parameters , , , .
The topology of the filter on half-wave resonators, the working principle of which is discussed in detail in [5], is shown in Fig. 1.

Figure 1. Topology of the developed filter.
In accordance with steps 1 and 2 of the CEMO, a fourth-order microstrip band-pass filter on half-wave resonators with a Chebyshev transfer function satisfying the above requirements was synthesized in the program NI AWR Microwave Office (trial version), and its prototype was manufactured. The vector of optimal filter parameters is given in section a) of Table I. The calculated and experimental S-parameters of the filter are shown in Fig. 2.
The rather large difference between the experimental and calculated characteristics of the filter (Fig. 2) may be related to features of the technology used in the manufacture of the filter: the inhomogeneity of the relative permittivity of the substrate and of the thickness of the microstrips (Fig. 1b), the quality of the surface, and the inaccuracy of the mathematical model of the filter, which takes only the T-wave into account in the calculation.
On the basis of the detailed analysis, given in [5], of the influence of the dimensions of the half-wave resonators on the output characteristics of the filter, the varied parameters in the vector v were defined: the lengths of the microstrips of all components of the filter and the gaps in the segments of the coupled lines (Fig. 1).
The new vector of optimal filter parameters, obtained by synthesis for the new given characteristic, is presented in section b) of Table I. The output characteristics of the filter layouts with the parameters defined by these vectors are shown in Fig. 3.
Figure 2. Calculated (- - - -) and experimental (- -♦- -) S-parameters of the filter synthesized in NI AWR according to the specified requirements.
Further optimization of the filter design, aimed at bringing the experimental reflection-coefficient modulus of the layout closer to the calculated one, was carried out in accordance with steps 3–5 of the CEMO. In accordance with the requirements, when forming the new given characteristic in expression (2), the modulus of the reflection coefficient of the calculated filter was taken as the output characteristic, and the experimentally measured one of the experimental filter, obtained in steps 1 and 2, as the given characteristic. Both the calculated and the experimental characteristics were specified at the points of the frequency range used by the Chebyshev approximation (Fig. 2).
TABLE I. PARAMETERS OF THE SYNTHESIZED FILTERS

a) The filter parameters synthesized for the given characteristic:

№ of filter component | h, mm | W, mm | S, mm | l, mm  | t, mm | εr
1, 5                  | 1.5   | 1.9   | 0.326 | 17.24  | 0.035 | 4.4
2, 4                  | 1.5   | 2.7   | 1.383 | 16.73  | 0.035 | 4.4
3                     | 1.5   | 2.7   | 1.883 | 17.08  | 0.035 | 4.4
6, 7                  | 1.5   | 2.856 |   –   | 8.606  | 0.035 | 4.4

b) The filter parameters synthesized for the new given characteristic:

1, 5                  | 1.5   | 1.9   | 0.326 | 17.25  | 0.035 | 4.4
2, 4                  | 1.5   | 2.7   | 1.386 | 16.72  | 0.035 | 4.4
3                     | 1.5   | 2.7   | 1.914 | 17.09  | 0.035 | 4.4
6, 7                  | 1.5   | 2.856 |   –   | 8.83   | 0.035 | 4.4

c) The filter parameters synthesized for the given characteristic in view of the inhomogeneity of εr and t:

1, 5                  | 1.5   | 1.9   | 0.326 | 17.24  | 0.039 | 4.283
2, 4                  | 1.5   | 2.7   | 1.383 | 16.73  | 0.033 | 4.507
3                     | 1.5   | 2.7   | 1.883 | 17.08  | 0.040 | 4.348
6, 7                  | 1.5   | 2.856 |   –   | 8.606  | 0.035 | 4.727

d) The filter parameters synthesized for the new given characteristic in view of the inhomogeneity of εr and t:

1, 5                  | 1.5   | 1.9   | 0.345 | 17.351 | 0.039 | 4.283
2, 4                  | 1.5   | 2.7   | 1.371 | 16.605 | 0.033 | 4.507
3                     | 1.5   | 2.7   | 1.880 | 17.113 | 0.040 | 4.348
6, 7                  | 1.5   | 2.856 |   –   | 6.900  | 0.035 | 4.727
The difference between the outer lateral maxima of the characteristic of the filter with the initial parameter vector is 5 dB, while for the characteristic with the vector of parameters obtained after one CEMO iteration it is no smaller (Fig. 3). As can be seen, one iteration of the CEMO failed to improve the characteristics of the filter. This may be explained by the wrong choice, at step 4, of the varied filter parameters, which in this case do not account for the factors discussed above that affect the output characteristic.
Figure 3. Experimental S-parameters of the filter with the initial parameters (•, - -•- -) and with the parameters obtained by the CEMO (♦, - -♦- -).
For the purpose of solving this task, it was proposed to modify steps 3 and 4 of the CEMO by performing a preliminary synthesis to adjust the parameters of the materials used and, if necessary, refining the mathematical models of the device components according to the experiment, as proposed in [6] for the synthesis of waveguide microwave systems [7].
This idea was used to optimize the developed filter by adjusting the technological parameters of the experimental layout: the relative dielectric constant εr of the substrate and the thickness t of the microstrips.
The modified CEMO step 3 (step 3') consists in solving problem (1) for the synthesis of an optimal correcting vector in which the varied parameters are only the εr and t elements, while the remaining elements correspond to the vector obtained in step 1. The experimental output characteristic measured in step 2 is used as the given characteristic.
The modified CEMO step 4 (step 4') solves problem (1) for the synthesis of the new optimal vector where, as in the original CEMO, the varied parameters are the lengths of the microstrips of all components of the filter and the gaps in the segments of the coupled lines [5], while the εr and t elements correspond to the vector obtained in step 3'. The calculated output characteristic obtained in step 1 is used as the given characteristic.
Further, in accordance with steps 5 and 6 of the CEMO, a new layout of the filter with the new parameter vector was manufactured and its output characteristics were measured. The correcting vector obtained in step 3' and the new vector of optimal filter parameters obtained in step 4' are given in sections c) and d) of Table I. The output characteristics of the filter layouts with these parameters are shown in Fig. 4.

Figure 4. Experimental S-parameters of the filter with the initial parameters (•, - -•- -) and with the parameters obtained by the modified CEMO (♦, - -♦- -).

TABLE II. DEVIATIONS OF SUBSTRATE PARAMETERS FROM THE NOMINAL

№ of filter component | Δεr, % | Δt, %
1, 5                  | −2.7   | 11.4
2, 4                  | 2.4    | −5.7
3                     | −1.2   | 14.3
6, 7                  | 7.3    | 0

The difference between the outer lateral maxima of the characteristic of the filter with the initial parameter vector is 5 dB, whereas for the characteristic with the vector of parameters obtained by the modified method it is 1.7 dB (Fig. 4). As can be seen, adjusting the substrate parameters of the mathematical model of the filter according to the experimental data made it possible to align the reflection-coefficient characteristic of the filter. The relative deviations of the substrate parameters from the nominal values for the filter components are small (Table II) and correspond to the tolerances on the characteristics of the material used. Thus, the proposed modification, based on correcting the mathematical model of the filter with its real parameters, greatly improves the efficiency of the CEMO.
IV. CONCLUSION

Using the development of a fourth-order band-pass microstrip microwave filter on half-wave resonators with a Chebyshev characteristic as an example, the effectiveness of the calculation-experimental optimization method, founded on an iterative correction of the results of filter synthesis using its experimental output characteristics, was evaluated.

It is shown that the method allows the influence of various factors relating to the manufacturing process and the features of the materials used to be taken into account. However, additive methods of accounting for deviations from the given output characteristics in the target function do not provide rapid convergence at large deviations: reducing the non-uniformity of the reflection-coefficient modulus of the filter in one iteration failed.
With the aim of increasing the effectiveness of the calculation-experimental optimization method, a modification is proposed that consists in an additional adjustment of the material parameters and of the mathematical models of the components to the experimental output characteristics of the device. The proposed modification of the method can be used for a wide class of microstrip devices.

Accounting for the irregularities of the relative dielectric permittivity and of the thickness of the microstrip filter components in the design with the help of the modified method allowed the non-uniformity of the reflection-coefficient modulus of the filter to be reduced from 7.1 dB to 1.7 dB in a single iteration.
ACKNOWLEDGMENT

This work was performed with the financial support of the Ministry of Education and Science of the Russian Federation within the project part of the state assignment in the field of scientific activity #3.1155.2014/K.
REFERENCES
[1] T. C. Edwards and M. B. Steer, Foundations for Microstrip Circuit Design, 4th ed., Wiley & Sons Ltd, 2016.
[2] http://www.awrcorp.com/products/microwave-office.
[3] B. M. Kac, V. P. Meshchanov, and A. L. Feldshtein, Optimalnyj sintez ustrojstv svch s T volnami, V. P. Meshchanov, Ed., M.: Radio i svyaz, 1984, 288 s.
[4] A. M. Bogdanov, M. V. Davidovich, B. M. Kac et al., Sintez sverhshirokopolosnyh mikrovolnovyh struktur, A. P. Krenickij and V. P. Meshchanov, Eds., M.: Radio i svyaz, 2005, 514 s.
[5] M. I. Nikitina, Sistema proektirovaniya mikropoloskovyh polosno-propuskayushchih filtrov, dis. kand. tekh. nauk, Krasnoyarsk, Krasnoyarskij gos. tekh. un-t, 1998, 151 s.
[6] A. N. Savin, Issledovanie ehlektrodinamicheskih harakteristik struktur vakuumnoj ehlektroniki i magnitoehlektroniki svch na osnove regressionnyh modelej, dis. kand. fiz.-mat. nauk, Saratov, Saratovskij gos. un-t, 2003, 184 s.
[7] I. A. Nakrap, A. N. Savin, and Yu. P. Sharaevskii, “Modeling of Wideband Slow-Wave Structures of Coupled-Cavities-Chain Type by the Impedance Design of Experiments,” Journal of Communications Technology and Electronics, vol. 51, no. 3, pp. 316–323, 2006.
Minimax Design of Multiplierless Sharpened
CIC Filters Based on Interval Analysis
Goran Molnar¹, Aljosa Dudarin¹, and Mladen Vucic²
¹ Ericsson Nikola Tesla d. d., Krapinska 45, 10000 Zagreb, Croatia
² University of Zagreb, Faculty of Electrical Engineering and Computing, Department of Electronic Systems and Information Processing, Unska 3, 10000 Zagreb, Croatia
E-mails: goran.molnar@ericsson.com, aljosa.dudarin@ericsson.com, mladen.vucic@fer.hr
Abstract—Polynomial sharpening of the amplitude responses of cascaded-integrator-comb (CIC) filters is often used to improve folding-band attenuations. The design of sharpened CIC (SCIC) filters is based on searching for the polynomial coefficients ensuring the required magnitude response. Several design methods have been developed, resulting in real, integer, and sum-of-powers-of-two (SPT) coefficients. The latter are preferable since they result in multiplierless structures. In this paper, we present a method for the design of SCIC filters with SPT polynomial coefficients by using the minimax error criterion in folding bands. To obtain the coefficients, we use a global optimization technique based on interval analysis. The features of the presented method are illustrated with the design of wideband SCIC filters.
I. INTRODUCTION
The cascaded-integrator-comb (CIC) filter [1] is a common building block in digital down-converters. However, in the processing of wideband signals, the CIC filter is often incapable of meeting the requirement for high folding-band attenuations. To improve the CIC-filter folding-band response, various structures have been developed. An efficient structure arises from the polynomial sharpening of the CIC response [2]. This structure implements the so-called sharpened CIC (SCIC) filter.
The design of sharpened CIC filters is based on searching for the polynomial coefficients ensuring the required magnitude response. Several design methods have been developed, resulting in real, integer, and sum-of-powers-of-two (SPT) coefficients. The latter are preferable since they result in multiplierless structures.

A well-established sharpening method was developed by Kaiser and Hamming [3]. It gives integer coefficients using an analytic expression. The incorporation of the Kaiser-Hamming polynomial in the SCIC filter was first presented in [4]. This structure was further improved in [5]−[8]. Recently,
the Chebyshev polynomials have been used in the design
of SCIC filters with very high folding-band attenuations
[9]. Furthermore, closed-form methods for the design of
This paper was supported in part by Ericsson Nikola Tesla d.d. and
University of Zagreb, Faculty of Electrical Engineering and Computing
under the project Improvement of Competences for LTE Radio Access
Equipment (ILTERA), and in part by Croatian Science Foundation under
the project Beyond Nyquist Limit, grant no. IP-2014-09-2625.
SCIC filters using the weighted least-squares [10] and
minimax [11] error criterion in the passband and folding
bands have been proposed. These methods provide SCIC
filters ensuring small passband deviations and rather high
folding-band attenuations. However, the coefficients
obtained take real values, what results in the structure
employing multipliers. In [12], the particle swarm
optimization has been used to calculate SPT polynomial
coefficients in order to achieve a given passband deviation.
In [13], partially sharpened CIC filters have been
developed, employing the polynomials in Bernstein's form
[14], [15]. The proposed filters are multiplierless, but they
only support power-of-two decimation factors. In [16] and
[17], the sharpening has been combined with passband
compensation.
In this paper, we present a method for the design of
multiplierless SCIC filters by using the minimax error
criterion in folding bands only. The design is performed
over the SPT coefficient space. To obtain the coefficients,
we use a global optimization technique based on the
interval analysis [18]. The presented method brings
significant reduction in the filter complexity compared to
the Chebyshev SCIC filters [9].
The paper is organized as follows. Section II briefly
describes the sharpening of CIC filters. The method for the
minimax design of multiplierless SCIC filters is presented
in Section III. Section IV contains examples illustrating the
features of the proposed filters.
II. SHARPENED CIC FILTER
The amplitude response of the CIC filter of the Nth order is given by [1]

    H_CIC(ω) = [ (1/R) · sin(ωR/2) / sin(ω/2) ]^N    (1)

where R denotes the decimation factor. The sharpening polynomial of the Mth order is given by
    f(x) = Σ_{m=0}^{M} a_m x^m    (2)

By substituting x = H_CIC(ω) into (2), we obtain the amplitude response of the sharpened CIC filter. It is given by

    H_SCIC(ω) = Σ_{m=0}^{M} a_m H_CIC^m(ω)    (3)

It is well known that the amplitude response of a CIC filter has zeros placed at the central frequencies of the folding bands. This property is ensured if a_0 = 0. Therefore, we deal with the amplitude response

    H_SCIC(ω) = Σ_{m=1}^{M} a_m H_CIC^m(ω)    (4)

Figure 1. Structure of sharpened CIC decimation filter.
The structure of the SCIC decimation filter is shown in Figure 1 [2]. It is clear from the figure that one multiplier is necessary for each non-zero polynomial coefficient. This makes the structure complex, especially for polynomials of high orders. However, by using SPT coefficients, the multipliers are replaced by adders, resulting in an efficient structure. It should be noted that the structure is suitable if the delay elements introduce integer delays, which is achieved if N(R+1) is an even number. The structure can be further simplified such that elements with delays greater than R are split into two blocks by applying the noble identity. Consequently, one element operates at the high, whereas the other operates at the low sampling rate.
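As an illustration only (not part of the original paper), the responses (1) and (4) can be evaluated numerically with a few lines of NumPy; all names here are our own:

import numpy as np

def h_cic(w, R, N):
    # Amplitude response (1); w is an array of frequencies in rad/sample.
    w = np.asarray(w, dtype=float)
    num = np.sin(w * R / 2.0)
    den = np.sin(w / 2.0)
    ratio = np.full_like(w, float(R))   # limit of sin(wR/2)/sin(w/2) at w = 0
    ok = np.abs(den) > 1e-12
    ratio[ok] = num[ok] / den[ok]
    return (ratio / R) ** N

def h_scic(w, a, R, N):
    # Sharpened response (4): sum of a_m * H_CIC^m(w) for m = 1..M (a_0 = 0).
    h = h_cic(w, R, N)
    return sum(a_m * h ** m for m, a_m in enumerate(a, start=1))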
III. MINIMAX DESIGN OF MULTIPLIERLESS
SHARPENED CIC FILTERS
A. Minimax Approximation
Two approaches to the design of sharpened CIC filters have been considered in the literature. The first
approach simultaneously sharpens the passband and the
folding-band responses. The sharpening polynomial is
obtained using the maximally flat [4], least squares [10], or
minimax [11] approximation. The second approach
includes sharpening only within the folding bands. Such an
approach includes the design based on the Chebyshev
polynomials [9]. To obtain high folding-band attenuations,
the second approach is preferable.
Here, we perform the minimax design of SCIC filters within the folding bands. We start the design with the weighted absolute error

    ε(a) = max_{ω ∈ Ω} w(ω) |H_SCIC(ω, a) − H_d(ω)|    (5)

where a is the vector of polynomial coefficients given by

    a = [a_1 a_2 … a_M]^T    (6)

w(ω) is a positive weighting function, and H_d(ω) is the desired amplitude response defined within the band Ω. The error function in (5) should take into account only the folding bands. They are given by
folding bands. They are given by
2n
2 n
p
p
R
R
p
;
n 1, ,
R
1
2
(7)
for an even R, and by
2 n
2n
p
p
R
R
; n 1, ,
R 1
2
(8)
for an odd R, where p is the edge of filter's passband.
In the folding bands, the desired response is zero. In addition, we assume the unity weighting function. By substituting H_d(ω) = 0 and w(ω) = 1 into the error function in (5), we arrive at

    ε(a) = max_{ω ∈ Ω} |H_SCIC(ω, a)|    (9)

Note that the function in (9) does not take into account the passband gain of the filter. Therefore, we normalize it to a constant passband gain at one frequency. Here, we choose the unity gain at ω = 0. The error function thus takes the form

    ε(a) = max_{ω ∈ Ω} |H_SCIC(ω, a)| / |H_SCIC(0, a)|    (10)

The expression for H_SCIC(0, a) is easily obtained by substituting ω = 0 into (4), resulting in

    H_SCIC(0, a) = Σ_{m=1}^{M} a_m    (11)
Finally, by substituting (11) into (10), we arrive at

    ε(a) = max_{ω ∈ Ω} | Σ_{m=1}^{M} a_m H_CIC^m(ω) | / | Σ_{m=1}^{M} a_m |    (12)
In the design, we calculate ε(a) using the CIC response evaluated on the uniformly spaced frequency grid Q = {ω_k; k = 0, …, K−1} defined within the folding bands Ω. Hence, the objective function is obtained as

    ε(a) = max_{ω_k ∈ Q} | Σ_{m=1}^{M} a_m H_CIC^m(ω_k) | / | Σ_{m=1}^{M} a_m |    (13)
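A direct NumPy transcription of (7), (8) and (13), continuing the sketch above, may help to see how the objective is evaluated; this is our own illustration, not the authors' implementation:

def folding_band_grid(R, wp, K):
    # Uniform grid over the folding bands (7) for even R or (8) for odd R.
    bands = [(2 * np.pi * n / R - wp, 2 * np.pi * n / R + wp)
             for n in range(1, (R - 1) // 2 + 1)]
    if R % 2 == 0:
        bands.append((np.pi - wp, np.pi))   # half-width band around pi
    pts = max(2, int(np.ceil(K / len(bands))))
    return np.concatenate([np.linspace(lo, hi, pts) for lo, hi in bands])

def objective(a, R, N, grid):
    # Normalized minimax error (13) over the frequency grid Q.
    return np.max(np.abs(h_scic(grid, a, R, N))) / abs(sum(a))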
B. Problem Formulation
Our goal is to find the optimum SPT polynomial coefficients of the sharpened CIC filter in the minimax sense. Such a design is described by the optimization problem

    â = arg min_a ε(a),  subject to: a is SPT representable    (14)

According to the real polynomial coefficients obtained with the methods in [9] and [11], it is reasonable to assume that the SPT coefficients have integer and fractional parts. Therefore, we deal with a_m expressed as

    a_m = Σ_{k=−F}^{S−1} b_{m,k} 2^k ,  m = 1, …, M    (15)
(15)
where bm,k{1,0,1}, and S and F are the wordlengths of
the integer and fractional part. Each bm,k 0 is called term.
Generally, it occupies one adder in implementation.
Therefore, the number of terms per polynomial or the
number of terms per coefficient is limited to a prescribed
value, P. For simplicity, here we use the latter approach.
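For instance, a coefficient given in Table I as 2^0 − 2^-5 consists of two terms. A tiny helper (ours, for illustration) converts such a term list into the coefficient value:

def spt_value(terms):
    # Value of one SPT coefficient (15); terms = [(b, k), ...], b in {-1, +1}.
    return sum(b * 2.0 ** k for b, k in terms)

a3 = spt_value([(+1, 0), (-1, -5)])   # 2^0 - 2^-5 = 0.96875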
From (13) it is clear that ε(a) = ε(−a). Apparently, two global minimizers with opposite signs exist. However, we need the minimizer that provides a positive gain of the filter. It is obtained by adding the constraint H_SCIC(0, a) > 0 to the objective function. This simplifies the search because the optimization deals with only a half of the overall SPT coefficient space.
To obtain the optimum SPT coefficients, we employ the global optimization technique based on interval analysis. Recently, such a technique has been used in the minimax design of symmetric non-recursive filters [18]. Following that paper, solving the problem in (13)−(15) is based on the interval extension of the objective function. In our design of the SCIC filter, the interval extension of the objective function in (13) can be easily obtained by using the extensions of elementary operations and functions.
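The interval-analysis machinery of [18] is beyond the scope of a short sketch, but the search space itself is easy to picture. The naive exhaustive search below, over single-term (P = 1) coefficients and under the positive-gain constraint, only illustrates the problem (13)−(15) on a small scale; it is emphatically not the method used in the paper:

from itertools import product

def naive_spt_search(R, N, M, wp, S, F, K=900):
    # Brute-force illustration of (13)-(15) for P = 1 terms only.
    grid = folding_band_grid(R, wp, K)
    candidates = [0.0] + [b * 2.0 ** k for b in (-1, 1) for k in range(-F, S)]
    best, best_err = None, np.inf
    for a in product(candidates, repeat=M):
        if sum(a) <= 0:            # keep the minimizer with positive gain
            continue
        err = objective(a, R, N, grid)
        if err < best_err:
            best, best_err = a, err
    return best, best_err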
IV. DESIGN EXAMPLES
To illustrate the features of the proposed
multiplierless SCIC filters, two examples of wideband
filters are described. The first example presents simple
sharpened CIC filters, whereas the second example
describes a filter offering very high folding-band
attenuations.
The optimum SPT polynomial coefficients of the
SCIC filters are given in Table I, together with their
maximum passband deviations and minimum folding-band
attenuations. The coefficients are tabulated for various
orders of sharpening polynomials, passband edge
frequencies, and filter complexities, assuming R = 10,
K = 900, and S+F = 20.
It is well known that, for a given order of the CIC filter, the shape of the amplitude response within the passband and folding bands changes negligibly for R ≥ 10 [1]. In that sense, the tabulated SPT coefficients can be used for sharpening any CIC filter with R ≥ 10.
A. Simple SCIC Filters

Here, we describe the multiplierless sharpening of the CIC response with N = 2 and R = 10, by using the polynomial with M = 3 and P = 1, and the passband edge frequencies given in Table I. Figure 2 shows the magnitude responses of the SCIC filters obtained for ω_p = 0.02π and ω_p = 0.05π. The filter with ω_p = 0.02π ensures the minimum folding-band attenuation of 81 dB, whereas the filter with ω_p = 0.05π exhibits the minimum attenuation of 106 dB and, as expected, a higher passband droop. Other simple SCIC filters have the passband and folding-band responses placed somewhere between the described responses. From Table I, it is clear that the provided filters exhibit rather high folding-band attenuations. However, they have a structure in which three general-purpose multipliers are replaced by only one adder. In addition, they have similar responses within |ω| ≤ 0.5π/R, which makes them suitable for uniform passband compensation.

Figure 2. Magnitude responses of sharpened CIC filters with N = 2 and R = 10, obtained by minimax SPT sharpening with M = 3 and P = 1, assuming passbands are |ω| ≤ 0.02π and |ω| ≤ 0.05π.
B. SCIC Filter Offering High Alias Rejection

In this example, we illustrate the design of multiplierless SCIC filters with high alias rejection. Such a filter can be obtained by the sharpening of the CIC response with N = 2 and R = 10, by using the polynomial with M = 4 and P = 2. The passband edge frequency ω_p = 0.04π is chosen.

The optimum multiplierless filter is compared with the filter obtained by the Chebyshev sharpening of the same CIC response, but with real coefficients of the sharpening polynomial of the same order [9]. Figure 3 shows the magnitude responses of both filters. Generally, filters with SPT coefficients contain a real gain constant, which is usually not implemented in practice. Therefore, only for comparison purposes, our response is normalized to 0 dB at ω = 0.

It is clear from Figure 3 that the responses are very similar. The Chebyshev response ensures the minimum folding-band attenuation of 141 dB, whereas the multiplierless filter exhibits a somewhat smaller attenuation of 139 dB. In the passband, the filters introduce nearly the same droop of 4.73 dB.

Figure 3. Magnitude responses of sharpened CIC filters with N = 2 and R = 10, obtained by minimax SPT sharpening with M = 4 and P = 2, and by Chebyshev sharpening of the fourth order. Passband is |ω| ≤ 0.04π.
TABLE I
SPT POLYNOMIAL COEFFICIENTS, PASSBAND DROOP (A_P), AND MINIMUM FOLDING-BAND ATTENUATION (A_S) OF VARIOUS MINIMAX SHARPENED CIC FILTERS WITH N = 2 AND R ≥ 10

M = 3:
ω_p     | P | a_1          | a_2          | a_3       | A_P, dB | A_S, dB
0.2π/R  | 1 | 2^-14        | -2^-6        | 2^0       | 0.86    | 132
0.2π/R  | 2 | 2^-14+2^-19  | -2^-6-2^-10  | 2^0-2^-5  | 0.86    | 142
0.25π/R | 1 | 2^-12        | -2^-5        | 2^0       | 1.35    | 125
0.25π/R | 2 | 2^-12-2^-14  | -2^-5+2^-11  | 2^0+2^-3  | 1.35    | 129
π/(3R)  | 1 | 2^-10        | -2^-4        | 2^0       | 2.43    | 106
π/(3R)  | 2 | 2^-10-2^-12  | -2^-4-2^-9   | 2^0+2^-2  | 2.42    | 113
0.4π/R  | 1 | 2^-11        | -2^-4        | 2^0       | 3.52    | 94.9
0.4π/R  | 2 | 2^-9+2^-14   | -2^-3+2^-7   | 2^1-2^-1  | 3.54    | 102
0.5π/R  | 1 | 2^-8         | -2^-3        | 2^0       | 5.69    | 81.0
0.5π/R  | 2 | 2^-8-2^-12   | -2^-3-2^-8   | 2^0+2^-7  | 5.70    | 89.3

M = 4:
ω_p     | P | a_1           | a_2         | a_3         | a_4       | A_P, dB | A_S, dB
π/(3R)  | 1 | 0             | 2^-10       | -2^-4       | 2^0       | 3.23    | 144
π/(3R)  | 2 | -2^-16+2^-19  | 2^-9-2^-14  | -2^-4-2^-6  | 2^0+2^-8  | 3.24    | 150
0.4π/R  | 1 | 2^-15         | -2^-14      | -2^-4       | 2^0       | 4.67    | 128
0.4π/R  | 2 | -2^-15-2^-17  | 2^-8+2^-14  | -2^-3+2^-8  | 2^0+2^-3  | 4.73    | 139
0.5π/R  | 1 | 2^-13         | 2^-9        | -2^-3       | 2^0       | 7.50    | 110
0.5π/R  | 2 | -2^-12+2^-18  | 2^-6+2^-10  | -2^-2-2^-4  | 2^1-2^-2  | 7.62    | 120
0.6π/R  | 1 | -2^-14        | 2^-6        | -2^-2       | 2^0       | 11.4    | 96.4
0.6π/R  | 2 | -2^-11-2^-14  | 2^-5-2^-7   | -2^-2-2^-5  | 2^0+2^-6  | 11.5    | 105
2π/(3R) | 1 | -2^-12        | 2^-6        | -2^-2       | 2^0       | 14.3    | 87.7
2π/(3R) | 2 | -2^-9+2^-14   | 2^-4+2^-10  | -2^-1-2^-3  | 2^1-2^-3  | 14.7    | 97.6

M = 5:
ω_p     | P | a_1          | a_2           | a_3         | a_4         | a_5       | A_P, dB | A_S, dB
0.5π/R  | 1 | -2^-18       | 2^-12         | 2^-10       | -2^-3       | 2^0       | 9.32    | 139
0.5π/R  | 2 | 2^-18        | -2^-11+2^-17  | 2^-6+2^-10  | -2^-2+2^-5  | 2^0-2^-6  | 9.52    | 150
0.6π/R  | 1 | -2^-16       | 2^-13         | 2^-6        | -2^-2       | 2^0       | 14.0    | 122
0.6π/R  | 2 | 2^-14        | -2^-8         | 2^-4+2^-6   | -2^-1-2^-3  | 2^1-2^-2  | 14.4    | 132
2π/(3R) | 1 | -2^-14       | 2^-9          | 2^-8        | -2^-2       | 2^0       | 17.7    | 109
2π/(3R) | 2 | 2^-13+2^-18  | -2^-7+2^-12   | 2^-3+2^-6   | -2^0-2^-6   | 2^1+2^-1  | 18.3    | 123
0.75π/R | 1 | 2^-12        | 2^-11         | -2^-13      | -2^-2       | 2^0       | 22.9    | 93.6
0.75π/R | 2 | 2^-11-2^-13  | -2^-6+2^-10   | 2^-2-2^-4   | -2^0+2^-5   | 2^1-2^-2  | 24.7    | 111
0.8π/R  | 1 | -2^-10       | 2^-8          | 2^-3        | -2^0        | 2^1       | 28.7    | 91.7
0.8π/R  | 2 | 2^-12-2^-17  | -2^-6-2^-10   | 2^-2-2^-7   | -2^0-2^-2   | 2^1+2^-3  | 29.2    | 102
From the complexity point of view, the Chebyshev SCIC filter needs five general-purpose multipliers to incorporate the sharpening polynomial in the structure. However, our filter employs only six adders instead.
V. CONCLUSION

The design of multiplierless sharpened CIC filters based on the minimax approximation in folding bands was presented. The design was formulated as an unconstrained optimization problem. The optimum polynomial coefficients are obtained by using a global optimization technique based on interval analysis. The proposed filters exhibit similar amplitude behavior to the Chebyshev sharpened CIC filters. From the complexity point of view, the multiplierless filters are favorable, because they bring a significant reduction in the structure.
VI. REFERENCES
[1] E. B. Hogenauer, “An economical class of digital filters for
decimation and interpolation,” IEEE Trans. Acoust., Speech,
Signal Process., vol. 29, no. 2, pp. 155–162, Apr. 1981.
[2] T. Saramäki and T. Ritoniemi, “A modified comb filter
structure for decimation,” in Proc. IEEE Int. Symp. Circuits
Syst., vol. 4, Hong Kong, Jun. 1997, pp. 2353–2356.
[3] J. Kaiser and R. Hamming, “Sharpening the response of a
symmetric nonrecursive filter by multiple use of the same
filter,” IEEE Trans. Acoust., Speech, Signal Process.,
vol. ASSP-25, no. 5, pp. 415–422, Oct. 1977.
[4] A. Y. Kwentus, Z. Jiang, and A. N. Willson Jr., “Application
of filter sharpening to cascaded integrator-comb decimation
filters,” IEEE Trans. Signal Process., vol. 45, no. 2,
pp. 457–467, Feb. 1997.
[5] G. Stephen and R. W. Stewart, “High-speed sharpening of
decimating CIC filter,” Electron. Lett., vol. 40, no. 21,
pp. 1383–1384, Oct. 2004.
[6] G. Jovanovic Dolecek and S. K. Mitra, “A new two-stage
sharpened comb decimator,” IEEE Trans. Circuits Syst. I:
Reg. Papers, vol. 52, no. 7, pp. 1414–1420, Jul. 2005.
[7] Q. Liu and J. Gao, “Efficient comb decimation filter with
sharpened magnitude response,” in Proc. 5th Int. Conf.
WICOM, 2009, pp. 1–4.
[8] H. Zaimin, H. Yonghui, K. Wang, J. Wu, J. Hou, and L. Ma,
“A novel CIC decimation filter for GNSS receiver based on
software defined radio,” in Proc. 7th Int. Conf. WICOM,
2011, pp. 1–4.
[9] J. O. Coleman, “Chebyshev stopbands for CIC decimation
filters and CIC-implemented array tapers in 1D and 2D,”
IEEE Trans. Circuits Syst. I: Reg. Papers, vol. 59, no. 12,
pp. 2956–2968, Dec. 2012.
[10] G. Molnar, M. Glavinic Pecotic, and M. Vucic, “Weighted
least-squares design of sharpened CIC filters,” in Proc. 36th
Int. Conf. MIPRO, vol. MEET, Opatija, Croatia, May 2013,
pp. 104–108.
[11] G. Molnar and M. Vucic, “Weighted minimax design of
sharpened CIC filters,” in Proc. IEEE 20th ICECS,
Abu Dhabi, UAE, Dec. 2013, pp. 869–872.
[12] M. Laddomada, D. E. Troncoso, and G. Jovanovic Dolecek,
“Improved sharpening of comb-based decimation filters:
Analysis and design,” in Proc. IEEE 11th Int. Conf. CCNC,
2014, pp. 11–16.
[13] M. G. C. Jiménez, V. C. Reyes, and G. Jovanovic Dolecek,
“Sharpening of non-recursive comb decimation structure,”
in Proc. 13th ISCIT, 2013, pp. 458–463.
[14] R. J. Hartnett and G. F. Boudreaux-Bartels, “Improved filter sharpening,” IEEE Trans. Signal Process., vol. 43, no. 12, pp. 2805–2810, Dec. 1995.
[15] S. Samadi, “Explicit formula for improved filter sharpening
polynomial,” IEEE Trans. Signal Process., vol. 48, no. 10,
pp. 2957–2959, Nov. 2000.
[16] D. E. Troncoso Romero, M. Laddomada, and G. Jovanovic
Dolecek, “Optimal sharpening of compensated comb
decimation filters: analysis and design,” Sci. World J.,
Hindawi, vol. 2014, ID 950860, 9 pages.
[17] M. G. C. Jiménez, D. E. Troncoso Romero and G. Jovanovic
Dolecek, “On simple comb decimation structure based on
Chebyshev sharpening,” in Proc. IEEE 6th LASCAS,
Feb. 2015, pp. 1–4.
[18] M. Vucic, G. Molnar, and T. Zgaljic, “Design of FIR filters
based on interval analysis,” in Proc. 33rd Int. Conf. MIPRO,
vol. MEET & GVS, Opatija, Croatia, May 2010,
pp. 197–202.
Minimization of Maximum Electric Field in
High-Voltage Parallel-Plate Capacitor
Raul Blecic∗†, Quentin Diduck‡, Adrijan Baric∗
∗ University of Zagreb, Faculty of Electrical Engineering and Computing, Unska 3, 10000 Zagreb, Croatia
Tel: +385 (0)1 6129547, fax: +385 (0)1 6129653, e-mail: raul.blecic@fer.hr
† KU Leuven, ESAT-TELEMIC, Kasteelpark Arenberg 10, 3001 Leuven, Belgium
‡ Ballistic Devices Inc, 904 Madison St, Santa Clara, CA, 95050 United States
Abstract—Minimization of the maximum electric field of a parallel-plate capacitor for high-voltage and temperature-stable applications is presented. Cubic zirconia is used as the dielectric material because of its high relative permittivity, high dielectric strength and high temperature stability. The maximum electric field present in the structure limits the maximum achievable capacitance of the capacitor structure. Reducing the maximum electric field of the capacitor makes it possible to reduce the thickness of the dielectric material, which increases its capacitance. The impact of the geometrical and electrical parameters of the parallel-plate capacitor on the maximum electric field is analyzed by a 2D multiphysics solver. Guidelines for the minimization of the maximum electric field are given.

Index Terms—cubic zirconia, dielectric strength, edge effects, electrostatics, fringing fields.
I. INTRODUCTION

High relative permittivity, high dielectric strength and high temperature stability of cubic zirconia [1], [2], [3] make it a candidate dielectric material for high-voltage, temperature-stable capacitors.

Most of the capacitance of a parallel-plate capacitor is contained in the central part of the structure. Yet, the electric field at the edges of the structure can be several times higher than the field in its center [4], which reduces the maximum achievable capacitance of the capacitor. Reducing the maximum electric field present in the capacitor makes it possible to reduce the thickness of the dielectric material, which increases the capacitance.

The objective of this paper is to analyze the impact of the geometrical and electrical parameters of a parallel-plate capacitor on the maximum electric field, in order to minimize the maximum electric field present in the capacitor and to consequently maximize its capacitance. The analysis is performed by a 2D multiphysics solver.

This paper is structured as follows. Section II introduces a parallel-plate capacitor. Section III presents a numerical analysis of the impact of the electrical and geometrical parameters of the structure on the maximum electric field. Section IV concludes the paper.
II. CAPACITANCE OF A PARALLEL-PLATE CAPACITOR

The capacitance of a parallel-plate capacitor, neglecting the contribution of the fringing fields, can be expressed as:

    C_pp = ε · S / d ,    (1)

where ε is the permittivity of the material between the plates, S is the area of the plates and d is the separation between the plates.
The total capacitance of a parallel-plate capacitor, taking the fringing fields into account (assuming that the plates are completely surrounded by the same material), can be expressed by Palmer's equation [5]:

    C_total = ε · (S / d) · K_a · K_b ,    (2)

    K_a = 1 + (d / (π·a)) · (1 + ln(2·π·a / d)) ,    (3)

    K_b = 1 + (d / (π·b)) · (1 + ln(2·π·b / d)) ,    (4)

where ε is the permittivity of the material which surrounds the plates, while a and b are the width and the length of the plates (S = a · b).
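As a quick numerical check (our own sketch, not from the paper), equations (1)-(4) can be evaluated directly:

import numpy as np

def palmer_capacitance(eps, a, b, d):
    # Total capacitance (2) with the fringing factors (3) and (4); SI units.
    ka = 1.0 + d / (np.pi * a) * (1.0 + np.log(2.0 * np.pi * a / d))
    kb = 1.0 + d / (np.pi * b) * (1.0 + np.log(2.0 * np.pi * b / d))
    return eps * (a * b / d) * ka * kb   # reduces to Eq. (1) for ka = kb = 1

# Example: 50 mm x 50 mm plates, d = 0.15 mm, cubic zirconia (eps_r = 17)
eps0 = 8.854e-12
print(palmer_capacitance(17 * eps0, 50e-3, 50e-3, 0.15e-3))  # approx. 2.55 nF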
The relative contribution of the fringing fields to the total capacitance of a square parallel-plate capacitor, calculated as C_ff = C_total − C_pp, is shown in Fig. 1. The contribution of the fringing-field capacitance to the total capacitance of the parallel-plate capacitor is negligible if the size of the plates is large relative to the separation of the plates (a >> d and b >> d), which is typically true for high-capacitance capacitors.
Although the contribution of the fringing fields to the capacitance of a parallel-plate capacitor is negligible for large plate sizes, the fringing fields can be several times larger than the field in the center of the capacitor [4]. The fringing fields limit the minimum separation of the plates and the maximum achievable capacitance of the capacitor. Reducing the maximum electric field enables the reduction of the separation between the plates, which increases the capacitance.
III. 2D SIMULATIONS

A. Simulation Domain

The simulation domain of a parallel-plate capacitor in a 2D multiphysics solver is shown in Fig. 2. It consists of two 0.1-mm thick copper plates. The material between the copper plates is a 0.15-mm thick cubic zirconia, which has a high relative permittivity (εr,CZ = 17), high dielectric strength and high temperature stability. The capacitor is enclosed by a surrounding material. The surrounding material has to be able to sustain the large electric fields present at the edges of the copper plates (it has to have a high dielectric strength); however, its relative permittivity and temperature stability do not significantly contribute to the overall capacitance and temperature stability of the capacitor.
Fig. 1. Capacitance of a square parallel-plate capacitor as a function of the size of the plates: (a) total capacitance, (b) fringing-field capacitance (C_ff = C_total − C_pp), and (c) relative contribution of the fringing fields to the total capacitance.

TABLE I
PARAMETERS OF THE SIMULATED STRUCTURE

Parameter                                           | Value
Width of the copper plates, wCu [mm]                | 1
Width of the surrounding material, wsm [mm]         | 5
Thickness of the cubic zirconia, tCZ [mm]           | 0.15
Thickness of the copper plates, tCu [mm]            | 0.1
Relative permittivity of the cubic zirconia, εr,CZ  | 17
Losses of the cubic zirconia, tan δCZ               | 0
Losses of the surrounding material, tan δsm         | 0
Conductivity of copper, σCu [S/m]                   | 6e7

TABLE II
VARIABLES OF THE SIMULATED STRUCTURE

Variable                                                 | Values
Width of the cubic zirconia, wCZ [mm]                    | 0.8–1.6
Relative permittivity of the surrounding material, εr,sm | 1, 4, 20, 100
Fig. 2. Simulation domain in a 2D multiphysics solver.
B. Methodology

A stationary electrostatic analysis is performed. The boundary condition is set to a perfect insulator (modelled as the zero charge boundary condition [6]) and it is placed sufficiently far away from the parallel-plate capacitor to have a negligible impact on the simulation results. The top copper plate is set to 1 V, while the bottom copper plate is set to 0 V. This gives the electric field in the center of the structure equal to E_center = (1 V)/tCZ = 6.67 kV/m. The parameters and variables of the simulated structure are given in Table I and Table II, respectively. The width of the copper plates is set to a small value (wCu = 1 mm) to reduce the simulation domain. In an actual application, it is much larger, to provide a sufficiently large capacitance.
C. Impact of the Width of Cubic Zirconia and the Dielectric Constant of the Surrounding Material

Parametric simulations for different widths of the cubic zirconia wCZ and for four values of the dielectric constant of the surrounding material εr,sm are performed. The maximum electric fields in the cubic zirconia and in the surrounding material are shown in Fig. 3.
The results can be divided into three regions: the first region, in which the width of the cubic zirconia is smaller than the width of the copper plates (wCZ < 0.95 mm); the second region, in which the width of the cubic zirconia is larger than the width of the copper plates (wCZ > 1.1 mm); and the third region, in which the two widths are comparable (0.95 mm < wCZ < 1.1 mm).
The results in the first region show that the maximum electric field in the cubic zirconia decreases as its width decreases. It also shows a weak dependence on the relative permittivity of the surrounding material. The maximum field in the surrounding material depends neither on its relative permittivity nor on the width of the cubic zirconia.

In the second region, the maximum electric fields in the cubic zirconia and in the surrounding material depend on the relative permittivity of the surrounding material. Increasing it decreases the maximum electric field in both materials.
Fig. 3. Maximum electric field as a function of the width of the cubic zirconia and the relative permittivity of the surrounding material: (a) in the cubic zirconia and (b) in the surrounding material.
In the third region, the maximum electric fields in both materials depend on the relative permittivity of the surrounding material. Increasing it increases the maximum electric field in both materials. For a relative permittivity of the surrounding material larger than that of the cubic zirconia (εr,sm > εr,CZ), the peak maximum electric field in both materials is obtained for wCZ = wCu.

Comparing all three regions, the first region shows the smallest maximum electric field in the cubic zirconia. The second region, for a relative permittivity of the surrounding material larger than that of the cubic zirconia (εr,sm > εr,CZ), shows the smallest maximum field in the complete structure (in the dielectric material and in the surrounding material). Also, the impact of the surrounding material on the temperature stability of the capacitor is expected to be the smallest in the second region, since the surrounding material is not present between the plates. Therefore, the second region (for a relative permittivity of the surrounding material larger than that of the cubic zirconia, εr,sm > εr,CZ) is considered to be the best solution for temperature-stable applications. The third region does not provide any significant advantage over the first and the second region.
The electric field distributions in the simulated structure for three characteristic cases (wCZ = 0.8 mm, εr,sm = 100; wCZ = 1.6 mm, εr,sm = 100; and wCZ = 1.6 mm, εr,sm = 1) are shown in Fig. 4. The maximum electric fields in the cubic zirconia and in the surrounding material for wCZ = 0.8 mm and for wCZ = 1.6 mm are given in Table III.
D. Impact of the Shape of the Copper Plates

The impact of the shape of the copper plates on the maximum electric field is analyzed as follows. Four structures are analyzed and compared. The first one is the structure with rectangular edges, as shown in Fig. 2. The second structure has triangular edges (cut at 45°) (Fig. 5(a)), the third has round edges (with the radius r = tCu/2 = 0.05 mm) (Fig. 5(b)), while the fourth has Rogowski profile edges (ψ = π/2) terminated with a circular section [7] (with the radius r = 0.031 mm) (Fig. 5(c)). The parameters of the simulated structures are given in Table I. The width of the cubic zirconia is set to wCZ = 1.6 mm.
Fig. 4. Electric field in the simulated structure for: (a) wCZ = 0.8 mm, εr,sm = 100; (b) wCZ = 1.6 mm, εr,sm = 100; and (c) wCZ = 1.6 mm, εr,sm = 1. The maximum electric field can be observed at the edges of the copper plates.
TABLE III
MAXIMUM ELECTRIC FIELD IN THE CUBIC ZIRCONIA AND IN THE SURROUNDING MATERIAL FOR THE WIDTH OF THE CUBIC ZIRCONIA wCZ = 0.8 mm AND wCZ = 1.6 mm, AND FOR THE RELATIVE PERMITTIVITY OF THE SURROUNDING MATERIAL εr,sm = 1, 4, 20, 100

εr,sm | wCZ [mm] | max(ECZ) [kV/m] | max(Esm) [kV/m]
1     | 0.8      | 6.67            | 38.24
1     | 1.6      | 98.42           | 92.41
4     | 0.8      | 6.67            | 38.24
4     | 1.6      | 75.79           | 69.31
20    | 0.8      | 6.70            | 38.24
20    | 1.6      | 35.21           | 28.17
100   | 0.8      | 6.72            | 38.24
100   | 1.6      | 15.43           | 11.48
Fig. 5. The shapes of the copper plates of the parallel-plate capacitor: (a) triangular edges, (b) round edges (r = tCu/2 = 0.05 mm) and (c) Rogowski profile edges (ψ = π/2) terminated with a circular section (r = 0.031 mm).

TABLE IV
MAXIMUM ELECTRIC FIELD IN THE CUBIC ZIRCONIA AND IN THE SURROUNDING MATERIAL FOR THE RECTANGULAR, TRIANGULAR, ROUND AND ROGOWSKI PROFILE EDGES OF THE COPPER PLATES, AND FOR THE RELATIVE PERMITTIVITY OF THE SURROUNDING MATERIAL εr,sm = 1, 4, 20, 100 (wCZ = 1.6 mm)

εr,sm | shape of edges   | max(ECZ) [kV/m] | max(Esm) [kV/m]
1     | rectangular      | 98.42           | 92.41
1     | triangular       | 91.67           | 115.72
1     | round            | 20.07           | 332.40
1     | Rogowski profile | 7.37            | 122.08
4     | rectangular      | 75.79           | 69.31
4     | triangular       | 54.56           | 64.04
4     | round            | 12.25           | 51.03
4     | Rogowski profile | 6.81            | 29.03
20    | rectangular      | 35.21           | 28.17
20    | triangular       | 19.26           | 13.63
20    | round            | 8.54            | 7.65
20    | Rogowski profile | 6.67            | 9.04
100   | rectangular      | 15.43           | 11.48
100   | triangular       | 9.34            | 7.08
100   | round            | 7.45            | 3.17
100   | Rogowski profile | 6.72            | 4.63
The maximum electric fields in the cubic zirconia and in the surrounding material for the four simulated structures are given in Table IV. The results show that the Rogowski profile edges provide the smallest maximum electric field in the cubic zirconia among the four simulated structures for any value of the relative permittivity of the surrounding material. The maximum electric field in the surrounding material for the structure with the triangular, round or Rogowski profile edges is significantly reduced only if the relative permittivity of the surrounding material is larger than that of the cubic zirconia (εr,sm > εr,CZ).
Although the structure with the Rogowski profile edges shows the smallest maximum electric field, the design of such a structure with solid dielectric materials presents an additional challenge from a mechanical point of view. The maximum electric field in the cubic zirconia for the triangular edges and for the relative permittivity of the surrounding material equal to εr,sm = 100 is 40% larger than the electric field in the center of the structure, while for the round edges and the same relative permittivity it is only 12% larger, which is a significant improvement compared to the structure with the rectangular edges.
IV. CONCLUSION

The impact of the geometrical and electrical parameters of the parallel-plate capacitor, with cubic zirconia used as the dielectric material, on the maximum electric field present in the capacitor structure is analyzed by a 2D multiphysics solver. The maximum electric field present in the capacitor can be significantly reduced if the width of the dielectric is designed to be larger than the width of the copper plates and if the capacitor is enclosed by a material with a large relative permittivity, i.e. larger than the relative permittivity of the dielectric material. The maximum electric field can be further reduced by patterning the edges of the copper plates in a triangular, round or Rogowski profile shape.
REFERENCES

[1] C.-H. Chang, B. Hsia, J. P. Alper, S. Wang, L. E. Luna, C. Carraro, S.-Y. Lu, and R. Maboudian, “High-temperature All Solid-state Microsupercapacitors Based on SiC Nanowires Electrode and YSZ Electrolyte,” ACS Appl. Mater. Interfaces, vol. 7, pp. 26658–26665, 2015.
[2] O. Jongprateep, V. Petrosky, and F. Dogan, “Yttria Stabilized Zirconia as a Candidate for High Energy Density Capacitor,” Kasetsart Journal, vol. 42, pp. 373–377, 2008.
[3] ——, “Effects of Yttria Concentration and Microstructure on Electric Breakdown of Yttria Stabilized Zirconia,” Journal of Metals, Materials and Minerals, vol. 18, no. 1, pp. 9–14, 2008.
[4] K. P. P. Pillai, “Fringing field of finite parallel-plate capacitors,” Proceedings of the Institution of Electrical Engineers, vol. 117, no. 6, pp. 1201–1204, June 1970.
[5] M. Hosseini, G. Zhu, and Y.-A. Peter, “A new formulation of fringing capacitance and its application to the control of parallel-plate electrostatic micro actuators,” Analog Integr. Circ. Sig. Process., 2007, pp. 119–128.
[6] COMSOL, AC/DC Module User’s Guide. COMSOL, Nov. 2013.
[7] N. G. Trinh, “Electrode Design for Testing in Uniform Field Gaps,”
IEEE Transactions on Power Apparatus and Systems, vol. PAS-99,
no. 3, pp. 1235–1242, May 1980.
Modelling SMD Capacitors by Measurements
Roko Mišlov∗ , Marko Magerl∗ , Sandra Fratte-Sumper† , Bernhard Weiss† , Christian Stockreiter† and Adrijan Barić∗
∗ University of Zagreb, Faculty of Electrical Engineering and Computing, Unska 3, 10000 Zagreb, Croatia
Email: roko.mislov@fer.hr
† ams AG, Tobelbader Strasse 30, Premstaetten 8141, Austria
Email: christian.stockreiter@ams.com
Abstract—The lumped models of several capacitors in the SMD-0603 package are extracted from S-parameter measurements. The S-parameters of the SMD components are obtained: (i) by modelling the RF connector and transmission lines and de-embedding, and (ii) by directly calibrating the reference plane to the SMD component using on-board calibration standards. The extracted parasitics related to the SMD component package are compared for capacitors of different nominal values and for the two calibration methods in the frequency range up to 8 GHz.

Index Terms—SMD capacitor modelling, SMA connector modelling, CPW-CB modelling, S-parameter measurements
I. INTRODUCTION

Surface-mount (SMD) capacitors are used in many electronics applications, e.g. decoupling in power delivery networks (PDN) [1], injection of the RF disturbance in electromagnetic immunity measurements, such as direct power injection (DPI) [2], or in the output filters for DC/DC converters [3]. A high-frequency capacitor model available in the design phase can significantly improve the performance of the first-pass design.

Several modelling methods for passive SMD components are reported in the literature [4], [5], [6], where lumped or lumped-distributed models are built using the S-parameters of the SMD components. The S-parameters are measured using a vector network analyzer (VNA), primarily because of its high dynamic range and broad frequency range. Alternatively, the S-parameters can be obtained by applying the inverse Fourier transform to time-domain reflectometry (TDR) measurements [7].

The reference plane of the VNA measurements is calibrated to the device-under-test (DUT) by measuring well-defined short, open, load and through (SOLT) calibration standards. The VNA calibration kit enables calibration up to the end of the coaxial cables. In [4], on-board calibration standards are used to calibrate the reference plane up to the DUT on the PCB. The on-board open calibration standard is characterized using VNA port extension, based on the assumption that the remaining on-board calibration standards are well-defined.

An alternative calibration approach is to model the fixture between the coaxial cables and the DUT: the radio-frequency (RF) connectors and the transmission lines. The model can be used to de-embed the fixture from the measurements obtained by calibrating the reference
plane using the well-defined calibration standards in the VNA calibration kit.

Fig. 1: Manufactured test structures: (a) test structure (i): SMA connectors connected by CPWG lines; (b) test structure (ii): SMA connectors, CPWG lines and the DUT; (c) test structure (ii): SMA connectors, CPWG lines and the OPEN standard.
In this paper, both of these calibration methods are used and compared. Instead of using port extension, the on-board calibration standards are characterized by de-embedding their S-parameters using the fixture model. Additional test structures are manufactured and measured in order to characterize the substrate, the RF connectors and the transmission lines. The transmission lines are designed as conductor-backed coplanar waveguides (CPWG) according to the design guidelines given in [5]. Three capacitors of different nominal values, but in the identical SMD-0603 package, are modelled.

This paper is organised as follows. The measurement setup is presented in Section II, and the measurement results are shown in Section III. The modelling methodology and results are presented in Section IV, and discussed in Section V. The paper is concluded in Section VI.
II. MEASUREMENT SETUP

A. Test structures

Two types of test structures are manufactured on printed circuit boards (PCB): (i) structures for FR4 substrate characterization and RF connector modelling, and (ii) structures for measuring the SMD capacitors and on-board calibration standards.
TABLE I: The nominal substrate values and the 50 Ω CPWG line geometry calculated using [11].

parameter              | nominal value
εr                     | 4.5
tan δ                  | 0.02
substrate height [mm]  | 1.5
Cu thickness [µm]      | 35
trace width [mm]       | 1.548
clearance to GND [mm]  | 0.3
TABLE II: Substrate permittivity calculated using the phase difference of CPWG lines: l1 = 25 mm, l2 = 50 mm, l3 = 100 mm.

used lines | εeff,avg | εr
l1, l2     | 2.973    | 4.848
l1, l3     | 3.017    | 4.935
l2, l3     | 3.039    | 4.980
TABLE III: Optimized values of the SMA connector and transmission line models.

parameter         | value
C1 [pF], L1 [nH]  | 0.0044, 0.0326
C2 [pF], L2 [nH]  | 0.2798, 0.0010
C3 [pF], L3 [nH]  | 0.1573, 0.0466
lCPWG [mm]        | 23.65
tan δ             | 0.0234
Fig. 2: The effective permittivity of the FR4 substrate
extracted from measurements of test structures (i).
Fig. 3: The SMA connector model.
Test structures (i), shown in Fig. 1a, consist of two SubMiniature version A (SMA) connectors [8] connected by CPWG transmission lines of several different lengths. The vias connecting the ground planes are placed along the trace, as recommended in [5].
Test structures (ii), shown in Figs. 1b and 1c, consist of pads for 2-port measurements of SMD-0603 components connected to SMA connectors by CPWG transmission lines. In Fig. 1b this test structure is used to characterize the DUT as a 2-port component, while in Fig. 1c the same test structure is used to measure the on-board open calibration standard. The on-board open consists of two parallel 1 pF capacitors soldered between each DUT pad and ground. All components of the on-board calibration standards have the same SMD-0603 package as the modelled SMD capacitors.

The CPWG transmission lines are designed to have a characteristic impedance of 50 Ω on the FR4 substrate. The transmission line geometric parameters, as well as the nominal substrate parameters used for their calculation, are summarized in Table I. The S-parameters are measured in the frequency range from 300 kHz to 8 GHz using a 2-port VNA [9]. Each measurement is taken at 1600 frequency points on a logarithmic scale, using a 500 Hz resolution bandwidth and an averaging factor of 3. The modelling is done using Advanced Design System 2015.01 (ADS) [10].
B. Substrate characterization

The effective permittivity of the FR4 substrate is calculated using the S-parameter measurements of the test structures (i) described in Section II-A. The following transmission line lengths are manufactured: l1 = 25 mm, l2 = 50 mm, l3 = 100 mm. The phase difference between the S21 parameters of any two of these test structures is related to the difference of the physical line lengths by the equation:

    Δϕ = β·Δl = (2π/λ_f)·Δl = (2πf·√εeff / c)·Δl ,    (1)

where Δl is the difference in the physical line lengths of the two test structures, β is the wave number, Δϕ is the phase difference of the S21 parameters at each frequency f, and εeff is the effective permittivity. This relation enables the calculation of the effective permittivity as a function of frequency:

    εeff(f) = ( c·Δϕ / (2πf·Δl) )² ,    (2)
and it is shown in Fig. 2. The average value of the substrate permittivity over the frequency range from 1 GHz to 8 GHz is given in Table II for all three pairs of transmission line measurements. The value of the substrate permittivity εr is calculated from the average effective permittivity εeff,avg using the CPWG transmission line model [12] and the frequency-independent dielectric loss model of the substrate in ADS.
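The extraction in (1) and (2) is straightforward to script. The following NumPy sketch is our own illustration, where s21_short and s21_long are assumed to be the measured complex S21 arrays of two lines whose lengths differ by dl:

import numpy as np

def eps_eff(s21_short, s21_long, f, dl, c=299792458.0):
    # Effective permittivity (2) from the unwrapped S21 phase difference.
    dphi = np.abs(np.unwrap(np.angle(s21_long)) - np.unwrap(np.angle(s21_short)))
    return (c * dphi / (2.0 * np.pi * f * dl)) ** 2

# Example for the l1/l3 pair: eps_eff(s21_l1, s21_l3, f, dl=75e-3)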
C. SMA connector modelling

The topology of the SMA connector model is a simplified version of the lumped-distributed connector model presented in [13]. The model is shown in Fig. 3. The components L1 and C1 represent the transition between the coaxial cable and the SMA connector.
Fig. 4: Comparison between measurements and model of test structure (i) with the transmission line length l1 = 25 mm: (a) S11, (b) S21.
Fig. 5: Comparison of the de-embedded S-parameters of the 68 pF capacitor for the SOLT and the on-board calibration: (a) S11, (b) S21.
The transmission line Z1 models the coaxial section of the connector. The right-angle transition between the SMA connector and the CPWG line on the PCB is represented by the components C2 and L2, and the part of the transmission line that is directly below the connector is modelled by the components C3 and L3.
The values of the model parameters are obtained by minimizing the absolute difference between the measured and modelled S11 and S21 parameters of the test structures (i), described in Section II-A, using the random and gradient optimization solvers in ADS. The initial values of the optimized variables are chosen according to [13]. The optimization takes several minutes on an Intel Core i7 CPU @ 3.0 GHz with 12 GB of RAM.

The transmission lines l1, l2, l3 are modelled by optimizing the CPWG model [12]. The lengths of these transmission lines are defined by a single variable lCPWG, such that the modelled line lengths are equal to lCPWG, (lCPWG + 25 mm), and (lCPWG + 75 mm). In this way, the coupling effects between the SMA connector and the CPWG lines are taken into account consistently. The dielectric loss is modelled by optimizing the loss tangent of the substrate tan δ. The variables that are not optimized are the nominal impedance Z1 = 50 Ω of the SMA connector, the previously obtained substrate permittivity εr = 4.848, the copper conductivity σCu = 58.5·10^6 S/m and the copper thickness tCu = 35 µm. The random optimizer is used, followed by the gradient optimizer, with 100 iterations each. The final values of the optimized variables of the SMA connector model are given in Table III. The comparison between the measurements and the model of the test structures (i) with the line length l1 = 25 mm is given in Fig. 4.
III. MEASUREMENT RESULTS

The SMA connector and the CPWG line on each side of the DUT in the test structures (ii) are referred to as the fixture. In order to de-embed the fixture from the S-parameter measurements, the model of the fixture is built using the extracted characteristics of the substrate and the model of the SMA connector. The fixture return loss is better than 10 dB in the frequency range up to 4.9 GHz (the worst-case return loss is 6.4 dB). The fixture insertion loss is less than 1.8 dB in the entire frequency range up to 8 GHz. The fixture model enables de-embedding of the fixture from the S-parameter measurements obtained by the VNA calibration kit, i.e. SOLT calibration up to the end of the coaxial cables.

The on-board calibration standards, i.e. the custom calibration kit, are soldered on the PCB pads where the component to be characterized will be soldered later. Then, each on-board standard is measured separately according
to the SOLT procedure and stored into the VNA memory.
The short and through standards are implemented using 0 Ω resistors; the load standard is two 100 Ω resistors in parallel, and the open standard is two 1 pF capacitors in parallel. These SMD components have the same 0603 package as the characterized capacitors.
In the remainder of this paper, the measurements obtained using the custom calibration kit are referred to
as “on-board calibration”, while the DUT measurements
obtained by de-embedding the fixtures are referred to as
“SOLT calibration”. The comparison between these two
measurement methods is shown in Fig. 5, for the 68 pF capacitor in the SMD-0603 package.
IV. MODELLING RESULTS
The models of the SMD capacitors are extracted using
the π-model shown in Fig. 6. The admittances Y1 , Y2
represent the parasitic coupling between the soldering pads
and ground, while the admittance Y3 represents the intrinsic SMD component. These admittances can be expressed
in terms of the y-parameters using: Y3 = −(y12 + y21 )/2,
Y1 = y11 − Y3 , Y2 = y22 − Y3 . The y-parameters
are obtained from the de-embedded S-parameters using
well-known relations [14].
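As an illustration of this step, a minimal Python sketch of the conversion and the π-model split is given below; it assumes the 50 Ω reference impedance of the measurement setup, and the function names are illustrative, not part of the paper's ADS workflow:

```python
# Sketch of the pi-model extraction: convert a de-embedded 2-port
# S-parameter matrix to y-parameters and split them into Y1, Y2, Y3.
# Assumes a 50-ohm reference impedance; s is a 2x2 complex matrix
# for a single frequency point.
import numpy as np

def s_to_y(s, z0=50.0):
    """Standard 2-port S -> Y conversion (see e.g. Pozar [14])."""
    i = np.eye(2)
    # Y = (1/z0) * (I - S) (I + S)^-1
    return (1.0 / z0) * ((i - s) @ np.linalg.inv(i + s))

def pi_model(y):
    """Split the y-parameters into the pi-model admittances."""
    y3 = -(y[0, 1] + y[1, 0]) / 2.0   # intrinsic SMD component
    y1 = y[0, 0] - y3                 # pad-to-ground coupling, port 1
    y2 = y[1, 1] - y3                 # pad-to-ground coupling, port 2
    return y1, y2, y3
```

Applying pi_model to every frequency point yields the Y1, Y2 and Y3 curves discussed next.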
The admittances Y1 and Y2 are modelled first. The
results show that these admittances are similar. In order to
simplify the model, they are modelled as identical components according to the mean value Y1,avg = (Y1 + Y2 )/2.
The admittance Y1,avg is shown in Fig. 7a for the 68 pF capacitor measured using SOLT calibration. The capacitive
character of this admittance can be observed. The value
of Y1,avg admittance is similar for on-board calibration in
magnitude and phase. Both Y1 and Y2 are modelled as
the capacitances CP . The value of the capacitance can be
determined from the imaginary part of Y1,avg using the expression:

CP = Im{Y1,avg} / ω = Im{Y1,avg} / (2πf).    (3)
The measured value of CP obtained using Eq. (3) is frequency-dependent, as shown in Fig. 8. In order to
obtain an optimal value of this parameter in the model,
the value of CP is optimized to minimize the mean
square error with respect to the measured value in the
frequency range where the frequency dependence of CP
is approximately flat: from 10 MHz to 8 GHz.
The admittance Y3 is shown in Fig. 7b. The series
resonance behaviour can be observed. Therefore, Y3 is
modelled using the series RLC model. Its impedance is
equal to:

Z3 = 1/Y3 = RS + j(ωLS − 1/(ωCN)),    (4)
where CN is the nominal capacitance, LS is the series inductance, and RS is the equivalent series resistance (ESR).
The value of the nominal capacitance is calculated by
neglecting the inductive component ωLS in Eq. (4) for
low frequencies:
Im{Z3} ≈ −1/(ωCN) = −1/(2πf CN).    (5)
Fig. 6: π-model for SMD components extracted using y-parameters.
The nominal capacitance CN can be expressed as:
CN = −1 / (2πf Im{Z3}).    (6)
The value of CN in the model is optimized in the
frequency range from 1 MHz to 10 MHz. Analogously,
the value of the series inductance LS is calculated by
neglecting the capacitive component 1/ωCN in Eq. (4)
for high frequencies:
Im{Z3} ≈ ωLS = 2πf LS.    (7)
The value of LS is equal to:
LS = Im{Z3} / (2πf).    (8)
(8)
The value of LS in the model is optimized in the
frequency range from 1 GHz to 8 GHz. The value of the
modelled series resistance RS is optimized with respect to
the real part of Z3 in Eq. (4). The resistance value is optimized in the frequency range from 10 MHz to 100 MHz.
All optimizations are run in ADS using the random
optimizer with 100 iterations. The frequency range for the
optimization is chosen manually for each element of the
model, and for each modelled capacitor.
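To make the band-limited fitting concrete, the sketch below estimates CN, LS and RS directly from Z3 = 1/Y3 over the same frequency bands; where the paper runs the ADS random and gradient optimizers, this simplified stand-in merely averages the per-frequency estimates of Eqs. (6) and (8) and of Re{Z3} (freq and z3 are assumed measured arrays):

```python
# Band-limited extraction of the series-RLC elements from Z3 = 1/Y3.
# Averaging is used here as a simple stand-in for the ADS optimizer.
import numpy as np

def band(freq, lo, hi):
    return (freq >= lo) & (freq <= hi)

def extract_rlc(freq, z3):
    w = 2 * np.pi * freq
    m = band(freq, 1e6, 10e6)            # 1 MHz - 10 MHz, Eq. (6)
    cn = np.mean(-1.0 / (w[m] * np.imag(z3[m])))
    m = band(freq, 1e9, 8e9)             # 1 GHz - 8 GHz, Eq. (8)
    ls = np.mean(np.imag(z3[m]) / w[m])
    m = band(freq, 10e6, 100e6)          # 10 MHz - 100 MHz, Re{Z3}
    rs = np.mean(np.real(z3[m]))
    return cn, ls, rs
```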
The final schematic of the SMD capacitor model is
shown in Fig. 9. The optimized values of the model
elements are summarized in Table IV for the different
capacitor values, as well as for both calibration methods.
The comparison between the values of the model elements
in Fig. 9 obtained by measurements and the optimized
values is shown in Fig. 10, and the comparison of the
S-parameters is shown in Fig. 11.
V. DISCUSSION
The measured S-parameters of the test structures can be
successfully used to build models of the SMA connector,
the CPWG transmission lines and the SMD capacitors.
The S-parameters obtained using the different calibration
methods (SOLT and on-board), shown in Fig. 5, are
consistent for frequencies up to 500 MHz. In the higher
frequency range, the general trend of the S-parameters is
consistent. The maximum model error in the magnitude
of S21 is under 1 dB.
The model topology presented in Section IV is built
based on the y-parameters obtained from measurements.
It can be considered a physical model, since each element
[Fig. 7 panels: (a) Y1,avg = (Y1 + Y2)/2; (b) Y3 — magnitude and phase vs. frequency]
Fig. 7: Values of the admittances in the π-model obtained from the de-embedded S-parameters measured using
SOLT calibration for the 68 pF capacitor.
Fig. 8: Extracted value of the capacitance CP in Eq. (3) for the 1 nF capacitor obtained using both calibration methods.

Fig. 9: Model of the SMD capacitors.
has a physical interpretation. The capacitors CP model
the parasitic capacitive coupling between the soldering
pads and ground planes. The values for all capacitors
and calibration methods are around 200 fF. This is in the
expected order of magnitude for the SMD-0603 package.
The capacitance to ground CP is similar for both calibrations, with the biggest difference between the SOLT and on-board calibrations being 20% for the 10 nF capacitor. The element LS represents the equivalent series inductance; its value is consistent and lies between 1.1 nH and 1.6 nH.
The element RS represents the equivalent series resistance,
and its value is between 0.1 Ω and 0.3 Ω.
VI. CONCLUSION
The SMD capacitor models are extracted from the measured S-parameters. The fixture consisting of SMA connectors and CPWG transmission lines is de-embedded, firstly, by standard VNA calibration enhanced by fixture modelling and, secondly, by on-board calibration standards. The two calibration methods are compared. The on-board calibration is consistent with the de-embedded results for frequencies up to 8 GHz. The parasitic element values of
the modelled capacitors are similar for different capacitors
in the same SMD-0603 package. The extracted models can
be used to improve the design time for electronic circuits
in high frequency applications.
TABLE IV: Model component values for the SOLT and on-board calibration.

nominal C | calibration | CP/fF | LS/nH | RS/mΩ | CN/pF | fY3,res/MHz
C = 68 pF | SOLT | 283.2 | 1.144 | 135.9 | 68.20 | 517.2
C = 68 pF | on-board | 273.5 | 1.155 | 117.9 | 68.21 | 517.2
C = 1 nF | SOLT | 247.1 | 1.427 | 157.5 | 995.6 | 133.4
C = 1 nF | on-board | 202.9 | 1.417 | 149.2 | 993.3 | 134.2
C = 10 nF | SOLT | 195.1 | 1.615 | 126.1 | 9187.4 | 40.65
C = 10 nF | on-board | 221.4 | 1.617 | 127.4 | 9181.2 | 41.12
ACKNOWLEDGMENT
This research is funded by ams AG, Premstaetten,
Austria.
REFERENCES
[1] P. Muthana, A. E. Engin, M. Swaminathan, R. Tummala, V. Sundaram, B. Wiedenman, D. Amey, K. H. Dietz, and S. Banerji,
“Design, Modeling, and Characterization of Embedded Capacitor
Networks for Core Decoupling in the Package,” IEEE Trans. Adv.
Packag., vol. 30, no. 4, pp. 809–822, Nov 2007.
[2] A. Alaeldine, R. Perdriau, M. Ramdani, J. Levant, and M. Drissi,
“A Direct Power Injection Model for Immunity Prediction in
Integrated Circuits,” IEEE Trans. Electromagn. Compat., vol. 50,
no. 1, pp. 52–62, Feb 2008.
[3] Infineon, PFC boost converter design guide, Appl. Note, February
2016.
[4] K. Naishadham, “Experimental Equivalent-Circuit Modeling of
SMD Inductors for Printed Circuit Applications,” IEEE Trans.
Electromagn. Compat., vol. 43, no. 4, pp. 557–565, Nov 2001.
[5] B. Pejcinovic, V. Ceperic, and A. Baric, “Design and Use of FR-4
CBCPW Lines In Test Fixtures for SMD Components,” in ICECS,
Dec 2007, pp. 375–378.
[6] P. R. B. Vitor, M. J. Rosario, and J. C. Freire, “Modelling SMD
Capacitor for Microstrip Circuits,” in MELECON, Apr 1989, pp.
717–720.
[7] K. Technologies, Using the Time-Domain Reflectometer, Appl. Note,
August 2014.
[Fig. 10 panels: (a) CP, (b) RS, (c) LS, (d) CN vs. frequency, for both calibrations and the SOLT model value]
Fig. 10: Comparison between the values of the parasitic elements of the 1 nF capacitor obtained using both calibration methods, and the optimized value used in the model.
[Fig. 11 panels: (a) S11, (b) S21 — magnitude [dB] vs. frequency, measured and modelled]
Fig. 11: Comparison between measured and modelled S-parameters for the on-board calibration method for the 1 nF capacitor.
[8] Cinch Connectivity. RF coaxial, SMA, straight jack, 50 Ohm. [Online]. Available: cinchconnectivity.com/OA MEDIA/specs/pi142-0711-201.pdf
[9] Rohde & Schwarz. (2013) R&S ZVA / R&S ZVB / R&S ZVT Vector Network Analyzers Operating Manual. [Online]. Available: www.fer.unizg.hr/download/repository/ZVA ZVB ZVT Operating.pdf
[10] K. Technologies, ADS 2015, Simulation-Analog RF, 2015.
[11] Wcalc. (2009) Coplanar Waveguide Analysis/Synthesis Calculator. [Online]. Available: http://wcalc.sourceforge.net/cgi-bin/coplanar.cgi
[12] G. Ghione and C. Naldi, "Parameters of coplanar waveguides with lower common planes," Electronics Letters, vol. 19, no. 18, pp. 734–735, September 1983.
[13] T. Mandic, R. Gillon, B. Nauwelaers, and A. Baric, "Characterizing the TEM Cell Electric and Magnetic Field Coupling to PCB Transmission Lines," IEEE Trans. Electromagn. Compat., vol. 54, no. 5, pp. 976–985, Oct 2012.
[14] D. Pozar, Microwave Engineering. Wiley, 2004.
Impact of Capacitor Dielectric Type on the
Performance of Wireless Power Transfer System
D. Vinko and P. Oršolić
University of Osijek, Faculty of Electrical Engineering, Department of Communications,
Osijek, Croatia
davor.vinko@etfos.hr
Abstract – In loosely coupled wireless power transfer
systems, the efficiency is directly affected by the quality
factor of the resonant LC tank in both the transmitter and
the receiver. This paper studies the impact that the
capacitor dielectric type has on the quality factor of the
resonant LC tank. Experimental investigation is conducted
for low ESR capacitor types and the performance of the
wireless power transfer system is evaluated. Focus of the
experimental evaluation is placed on the receiver’s end, i.e.
the evaluated parameters are current-voltage characteristics
of the receiver and maximum power values. The capacitors are also compared with respect to their ESR values at different frequencies and their price. The results show that the capacitor
type has a significant impact on the performance of the
wireless power transfer system. The correct choice of
capacitor type can increase the efficiency of power transfer
and the maximum achievable power up to 400 %.
I.
INTRODUCTION
Wireless power transfer (WPT) has lately been gaining more and more attention, with a wide spectrum of possible applications [1]-[3]. Applications differ in the amount of transferred power, the impedance of the load circuit, the distance of the wireless power transfer and the operating frequency, and these are just some of the parameters that can be altered to optimize the power transfer. The analysis and design of a WPT system for a given application is most often based on power and efficiency maximization [4], [5]. In systems that utilize wireless power transfer, the quality factor (Q factor) of the LC resonant tank has a significant impact on system performance [6]-[8].
A WPT system consists of a transmitter and a receiver, Fig. 1. In a WPT system the power is wirelessly transferred from the transmitter to the receiver through an alternating magnetic field. The transmitter is represented by an AC voltage source U which drives the resonant tank formed by C1 and L1. The receiver is represented by the resonant tank formed by L2 and C2, and the resistive load RLOAD. M represents the mutual inductance of L1 and L2. In this paper we focus on a loosely coupled inductive WPT system [9], [10], which is characterized by a coupling coefficient k below 0.1 [11].
With such a setup, the resonant frequency of the WPT system is not significantly affected by the mutual inductance of the loosely coupled coils of the transmitter and the receiver. Therefore, with a low coupling coefficient k, both LC tanks can be separately designed for the same desired resonant frequency, and the changes in the mutual inductance M (due to different physical placement and alignment of the coils) will not significantly affect the efficiency of the wireless power transfer. Factors that do affect the performance of a WPT system are the parasitic parameters in the resonant tank. Fig. 1 shows these parameters for the resonant tank of the receiver. Inductor L2 has a parasitic resistance RL and a parasitic capacitance CL, while capacitor C2 has the following parasitic parameters: the parallel resistance RLeak, the equivalent series inductance ESL and the equivalent series resistance ESR. Both ESL and ESR are frequency dependent. The values of the parasitic capacitance CL and the equivalent series inductance ESL affect the resonant frequency, while the values of the parasitic resistances RL, RLeak and ESR affect the quality factor of each component (L2 and C2), and consequently the quality factor of the resonant tank. With that in mind, the efficiency of the wireless power transfer is predominantly affected by the quality factor of the components used.
The quality factor of a component (L or C) is the ratio of the stored energy to the dissipated energy, which also corresponds to the ratio of its reactance to its resistance at a given frequency f.
k = M / √(L1 L2) < 0.1    (1)

Figure 1. Wireless power transfer system
This work was sponsored by J. J. Strossmayer University of Osijek
under project IZIP-2014-104 “Wireless power transfer for underground
and underwater sensors”.
QL = 2π × f × L / RL    (2)

QC = 1 / (2π × f × C × ESR)    (3)
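As a quick numeric illustration of (2) and (3), the sketch below evaluates both component Q factors near the resonant frequencies used later in this paper; the component values are representative of this setup (100 µH coil, 10 nF capacitor with a C0G-class ESR), not a specific measured row:

```python
# Numeric check of Eqs. (2)-(3): component Q factors at frequency f.
import math

def q_inductor(f, L, RL):
    return 2 * math.pi * f * L / RL            # Eq. (2)

def q_capacitor(f, C, esr):
    return 1.0 / (2 * math.pi * f * C * esr)   # Eq. (3)

f = 163e3                                      # near the ~163 kHz resonance
print(q_inductor(f, 100e-6, 0.33))             # ~310 for the air coil
print(q_capacitor(f, 10e-9, 1.58))             # ~62 for a low-ESR capacitor
```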
Papers that address the impact of capacitor selection on system performance [12], [13] identify the ESR as the main source of losses. In the WPT system (Fig. 1), the influence of RLeak on system performance is negligible in comparison to the influence of the ESR, which is why RLeak is not included in (3).
II. ESR VS. CAPACITOR DIELECTRIC TYPE
The parasitic parameters of a capacitor (RLeak, ESL and ESR) depend on the type of the dielectric used. In this paper we investigate the impact of the ESR on the performance of the WPT system. In order to be used in the LC tank of a WPT system, a capacitor must be non-polarized. The most common non-polarized capacitor types are FILM and ceramic capacitors, and both types are characterized as low-ESR capacitor types. Table I gives the list of all capacitors used in this paper. All capacitors are 10 nF capacitors with a voltage rating of 100 V. Three capacitor types are evaluated: Ceramic, MLCC (Multi-Layer Ceramic Capacitor) and FILM capacitors. Besides the capacitor type, the dielectric type is stated for each capacitor. Three FILM capacitors with the same dielectric type are evaluated (dielectric types marked PET1, PET2 and PET3 in Table I). They differ in manufacturer, which leads to a significant difference in unit price, which is also given for each capacitor.
The ESR value for each capacitor is measured using a handheld LCR meter UT612, which can measure the ESR value at predefined frequencies up to 100 kHz. These values are given in Table I and are also shown in Fig. 2. It is important to note that the ESR varies with frequency, and that this change is not linear and varies with the type of dielectric.

Figure 2. ESR-frequency characteristics of different capacitor types
Manufacturers commonly supply a single ESR value at a frequency which differs from manufacturer to manufacturer. These common frequencies range from 1 kHz up to 100 MHz. With a frequency dependent ESR, a single provided ESR value does not give complete insight into the capacitor performance.
III. MEASUREMENTS
During the measurements, the evaluated capacitors were placed (one at a time) in an LC tank, in parallel with a 100 µH air coil. Due to the tolerance of the capacitance value, the resonant frequency fR of the LC tank varied for each capacitor; it is given in Table II. The maximum deviation of the resonant frequency is within 10%, which is the maximum tolerance of the tested capacitors.
In the WPT system (Fig. 1), both LC tanks (L1, C1 and L2, C2) must be adjusted to the same resonant frequency. The resonant frequency of the receiver's LC tank (L2, C2) changes for different capacitor types. To avoid the necessity of adjusting the resonant frequency of the transmitter's LC tank (L1, C1 in Fig. 1) for each tested capacitor, the measurement setup shown in Fig. 3 is used. An LC tank is not used in the transmitter, but only a transmitting coil L1. This also reduces the impact of the mismatch in the resonant frequencies of the transmitter and the receiver on the measurement results.
For an LC tank, the Q factor represents the ratio of the stored and the dissipated energy in each period (cycle) of the resonant frequency. The dissipated energy increases with an increase of the parasitic resistances (RL, RLeak and ESR), resulting in a lower Q factor of the resonant tank, which decreases the efficiency of the wireless power transfer.

TABLE I. COMPARISON OF USED CAPACITORS (C = 10 nF; 100 V)

Capacitor type | Dielectric type | Price [€] | ESR @1 kHz [Ω] | ESR @10 kHz [Ω] | ESR @100 kHz [Ω]
MLCC | C0G,NP0 | 1.48 | 3.36 | 1.3 | 1.58
MLCC | X8R | 0.82 | 1.73 | 1.32 | 1.60
FILM | PP | 0.52 | 5.75 | 1.92 | 1.64
FILM | PET1 | 2.55 | 60.1 | 13.24 | 3.47
FILM | PET2 | 0.76 | 71.8 | 16.7 | 4.09
FILM | PET3 | 0.11 | 67.7 | 16.62 | 4.13
MLCC | X7R | 0.49 | 137.6 | 18.9 | 3.61
MLCC | Z5U | 0.54 | 152.9 | 17.97 | 3.76
Ceramic | Y5P | 0.29 | 207 | 27.6 | 4.86

TABLE II. MAXIMUM POWER POINT ANALYSIS

Dielectric type | RRES = RLOAD @MPP [Ω] | fR [kHz]* | RLC [Ω] | ESR @100 kHz [Ω]** | QLC @MPP
C0G,NP0 | 6500 | 159.9 | 1.55 | 1.58 | 64.7
X8R | 6000 | 161.6 | 1.72 | 1.60 | 59.1
PP | 6000 | 161.2 | 1.71 | 1.64 | 59.2
PET1 | 4000 | 163.9 | 2.65 | 3.47 | 38.8
PET2 | 3500 | 163.7 | 3.02 | 4.09 | 34.0
PET3 | 3500 | 165.9 | 3.10 | 4.13 | 33.6
X7R | 2000 | 161.0 | 5.12 | 3.61 | 19.8
Z5U | 2000 | 167.0 | 5.51 | 3.76 | 19.1
Y5P | 1000 | 163.4 | 10.54 | 4.86 | 9.7

* Resonant frequency fR is measured for the parallel LC tank with L = 100 μH
** ESR values correspond to the values given in Table I
Figure 3. Measurement setup: a) schematic and b) photograph

A current-voltage output characteristic of the WPT receiver is measured for each tested capacitor (Fig. 4). The results show that with different dielectric types the maximum output voltage of the WPT receiver varies from 25 V to 105 V. The variations of the maximum output current are less prominent, ranging from 17 mA to 20 mA. Three capacitor types performed significantly better than the rest: the MLCC capacitors with C0G, NP0 and X8R dielectric types, and the FILM capacitor with PP dielectric type. When compared to the ESR measurement (Fig. 2), the same capacitors have the lowest ESR values at all tested frequencies.

An important parameter in WPT systems is the maximum instantaneous power that can be supplied by the receiver. Fig. 5 shows the output power of a WPT receiver for different values of the load resistance RLOAD and different capacitor types. The highest output power is obtained for the same three capacitor types that have the lowest ESR values.

For each tested capacitor, the maximum power is available at a different value of the load resistance. At the maximum power point the load resistance is matched to the resistance of the LC tank. At the resonant frequency, the resistance of the LC tank (RRES) can be expressed as:

RRES = RLOAD @MPP = ωL × QLC = QLC / (ωC)    (4)
Figure 4. Current-voltage output characteristics of a WPT receiver for different capacitor types
QLC = ωL / RLC = 1 / (ωC × RLC)    (5)

where QLC represents the quality factor of the resonant tank, RLC represents the parasitic resistance of the LC tank (RL + ESR), and ω is the angular frequency 2πf. Since the capacitor and the inductor have equal reactances at the resonant frequency, two equivalent expressions are given in (4) and (5).
To extract the resistive losses of the LC tank from the measurement results, the following expression is derived from (4) and (5):

RLC = (ωL)² / RRES = 1 / ((ωC)² × RRES)    (6)
The total parasitic resistance RLC can be calculated using the measured values of the resonant frequency fR, while the resistance of the LC tank RRES is obtained experimentally by matching the load resistance at the maximum power point (4). The output power plots for each capacitor (Fig. 5) are drawn using 13 measurement points that do not necessarily correspond to the maximum power point. Therefore the values for RRES given in Table II are extracted from the output power plots (Fig. 5) and are the best approximation of the maximum power point.
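A short sketch of this post-processing is given below; the example input row (fR = 159.9 kHz and RRES = 6500 Ω for the C0G,NP0 capacitor) is taken from Table II and reproduces its RLC ≈ 1.55 Ω and QLC ≈ 64.7:

```python
# Post-processing per Eqs. (5)-(6): recover the tank's parasitic
# resistance RLC and quality factor QLC from fR and RRES.
import math

L = 100e-6                           # receiver coil, 100 uH

def r_lc(f_r, r_res):
    w = 2 * math.pi * f_r
    return (w * L) ** 2 / r_res      # Eq. (6)

def q_lc(f_r, rlc):
    return 2 * math.pi * f_r * L / rlc   # Eq. (5)

rlc = r_lc(159.9e3, 6500.0)          # ~1.55 ohm
print(rlc, q_lc(159.9e3, rlc))       # ~1.55, ~64.7
```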
The following component values are used for the measurement: L1 = 180 µH (single layer air coil with a diameter of 60 cm, 10 turns of 1.6 mm wire), L2 = 100 µH (single layer air coil with a diameter of 11 cm, 25 turns of 1 mm wire), C2 = 10 nF (different capacitor types as given in Table I). The RLOAD value is varied from 100 Ω to 1 MΩ in 13 discrete steps (100 Ω, 220 Ω, 560 Ω, 1 kΩ, 2.2 kΩ, 4.7 kΩ, 10 kΩ, 22 kΩ, 47 kΩ, 100 kΩ, 220 kΩ, 470 kΩ, 1 MΩ). As the voltage source U, an Agilent 33250A arbitrary waveform generator is used, generating a sine voltage waveform with a peak-to-peak amplitude of 20 V. The frequency of the voltage source U is manually adjusted for each tested capacitor and corresponds to the resonant frequency fR values given in Table II. The voltage across RLOAD is measured using an oscilloscope (represented as a voltmeter in Fig. 3). Coils L1 and L2 are placed in the same plane, with a coupling coefficient k under 0.1, which results in a mutual inductance M under 14 µH.
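For reference, the coupling coefficient in Eq. (1) can be checked directly from these coil values:

```python
# Quick check of Eq. (1) for the coils above: with L1 = 180 uH,
# L2 = 100 uH and M under 14 uH, the coupling stays at about k <= 0.1.
import math

L1, L2, M = 180e-6, 100e-6, 14e-6
print(round(M / math.sqrt(L1 * L2), 3))   # ~0.104, loosely coupled
```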
Figure 5. Output power of a WPT receiver for different capacitor types
The calculated values for RLC are given in Table II.
The air coil used in the LC tank has a DC resistance of 0.2 Ω. At the resonant frequency of the LC tank, due to the skin effect, the coil resistance RL increases. The calculated value of the coil resistance RL at the resonant frequency equals approx. 0.33 Ω. When compared with the RLC values given in Table II, the coil resistance makes up from 3 % to 20 % of the total parasitic resistance RLC. The remaining 80 % or more of RLC is due to capacitor losses.
The ESR values measured at 100 kHz are the closest to the resonant frequency (163 kHz ± 2.5%) and are also given in Table II. It can be noted that there is a good correlation between the ESR values at 100 kHz and the total parasitic resistance RLC. Some discrepancies that occur can be attributed to the non-linear behavior of the ESR. The quality factor of the resonant LC tank QLC is calculated using (5) and the values are given in Table II.
IV. CONCLUSION
The conducted experimental investigation has shown that the choice of capacitor type has a significant impact on the performance of a wireless power transfer system. The equivalent series resistance (ESR) of the capacitor was identified as the dominant factor determining the performance of the WPT system. Three capacitor types were tested: Ceramic, MLCC (Multi-Layer Ceramic Capacitor) and FILM capacitors, all considered to be capacitors with a low ESR value. The measurements also showed that the ESR value is more affected by the dielectric material than by the capacitor type.

The best results are achieved by using the C0G, NP0 and X8R dielectrics for the MLCC capacitors, and the PP dielectric for the FILM capacitors. With the use of those dielectric types the system performance is improved by 400 %, with respect to the Ceramic capacitor with the Y5P dielectric, and the quality factor of the resonant LC tank shows a six-fold improvement. All tested capacitors were in the price range from 0.11 to 2.55 €, and the top three range from 0.52 to 1.48 €. The best performance to price ratio is obtained for the FILM capacitor with the PP dielectric.
REFERENCES
[1] Y. Yang, X. Xie, G. Li, Y. Huang, Z. Wang, "A Combined Transmitting Coil Design for High Efficiency WPT of Endoscopic Capsule," IEEE International Symposium on Circuits and Systems (ISCAS), 2015, pp. 97–100.
[2] D. Futagami, Y. Sawahara, T. Ishizaki, I. Awai, "Study of high efficiency WPT underseas," IEEE Wireless Power Transfer Conference (WPTC), 2015, pp. 1–4.
[3] L. Olvitz, D. Vinko, T. Švedek, "Wireless Power Transfer for Mobile Phone Charging Device," Proceedings of the 35th International Convention MIPRO, 2012, pp. 141–145.
[4] G. Kim, B. Lee, "Analysis of Magnetically Coupled Wireless Power Transfer between Two Resonators Based on Power Conservation," IEEE Wireless Power Transfer Conference (WPTC), 2014, pp. 231–234.
[5] Y.-K. Jung, B. Lee, "Design of Adaptive Optimal Load Circuit for Maximum Wireless Power Transfer Efficiency," Asia-Pacific Microwave Conference Proceedings (APMC), 2013, pp. 1221–1223.
[6] O. Jonah, S. V. Georgakopoulos, M. M. Tentzeris, "Optimal Design Parameters for Wireless Power Transfer by Resonance Magnetic," IEEE Antennas and Wireless Propagation Letters, Vol. 11, pp. 1390–1393, 2012.
[7] I. Awai, T. Ishizaki, "Transferred Power and Efficiency of a Coupled-resonator WPT system," IEEE MTT-S International Microwave Workshop Series on Innovative Wireless Power Transmission: Technologies, Systems, and Applications (IMWS), 2012, pp. 105–108.
[8] M. Dionigi, M. Mongiardo, "Coaxial Capacitor Loop Resonator for Wireless Power Transfer Systems," The 7th German Microwave Conference (GeMiC), 2012, pp. 1–4.
[9] F. Lu, H. Zhang, H. Hofmann, C. Mi, "A High Efficiency 3.3 kW Loosely-Coupled Wireless Power Transfer System Without Magnetic Material," IEEE Energy Conversion Congress and Exposition (ECCE), 2015, pp. 2282–2286.
[10] J. A. Russer, P. Russer, "Design Considerations for a Moving Field Inductive Power Transfer System," IEEE Wireless Power Transfer (WPT) Conference, 2013, pp. 147–150.
[11] K. A. Grajski, R. Tseng, C. Wheatley, "Loosely-Coupled Wireless Power Transfer: Physics, Circuits, Standards," Proceedings of IMWS-IWPT, 2012, pp. 9–14.
[12] S. Karys, "Selection of Resonant Circuit Elements for the ARCP Inverter," 10th International Conference on Electrical Power Quality and Utilisation EPQU, 2009, pp. 1–6.
[13] H. J. H. Since, S. Taninder S., B. J. Kai, "The Impact of Capacitors Selection and Placement to the ESL and ESR," International Symposium on Electronics Materials and Packaging EMAP, 2005, pp. 258–261.
Switching Speed and Stress Analysis for Fixed-fixed
Beam Based Shunt Capacitive RF MEMS Switches
Anoushka Kumar A, Resmi R
LBS Institute of Technology for Women, Thiruvananthapuram, India
anoushkakumar4@gmail.com, resmilbs@gmail.com
Abstract—In this paper, the effect of different materials on the reliability of an RF MEMS shunt capacitive switch is analyzed. The effective von-Mises stress analysis is done on a fixed-fixed beam switch for materials like Titanium, Platinum, Gold, Aluminium and Copper. The maximum value of the von-Mises stress obtained for each material was less than the corresponding ultimate tensile strength. Membranes using Titanium, Copper and Platinum can withstand a larger number of switching cycles. The variation in the switching speed of a fixed-fixed beam structure using Aluminium, Platinum, Titanium, Copper and Gold is also analysed. Membranes using Titanium and Copper give better performance with respect to switching speed and reliability.
Keywords—Fixed-fixed beam; von-Mises stress; Tensile strength

I. INTRODUCTION
Radio Frequency Micro Electro Mechanical System (RF MEMS) switches have been an attractive field for both scientific research and industry due to their promising applications in RADAR systems, satellite communication systems, wireless communication systems and instrumentation systems. Compared to traditional GaAs FET and p–i–n diode switches, RF MEMS switches have negligible power consumption (a few µW), low insertion loss, high isolation, much lower intermodulation distortion, low cost and light weight [1]. RF MEMS switches have replaced the traditional GaAs FET and p–i–n diode switches in RF and microwave systems. Typically, MEMS switches are manufactured using surface micro-machining processes.
In RF MEMS switches, the mechanical movement of the switch membrane, which can be either a fixed-fixed beam or a cantilever beam, creates a short circuit or an open circuit in the transmission line. This mechanical movement is achieved using electrostatic, piezoelectric, magnetostatic or thermal actuation. Even though the electrostatic method requires a high actuation voltage, it is the most prevalent one due to its near zero power consumption, small electrode size, thin layers and short switching time. In electrostatic actuation, an electrostatic force is generated between the fixed electrode and the movable membrane for the switching operation. The main limitations of RF MEMS switches include slow switching speed, high actuation voltage and hot switching. Fig. 1 shows the classification of RF MEMS switches.

Figure 1. Classification of RF MEMS switches

Fig. 2 shows the shunt and series configurations of RF MEMS switches [2].

Figure 2. Shunt and series circuit configuration of MEMS switches
Various designs of RF MEMS switches exist to tackle the limitations faced by the switches. Stiff ribs around the membrane help to reduce the stiction and buckling effects in switches [3]. A meander in the membrane reduces the required actuation voltage of an RF MEMS switch [4]. For highly reliable operation a twin-layered membrane is preferred [5]. Holes in the switch membrane help to reduce the actuation voltage and also reduce the squeeze-film air damping [6].
II. SHUNT CAPACITIVE SWITCH
A. General Structure

The RF MEMS shunt capacitive switch generally consists of a movable metal bridge, suspended at a height g0 above the center conductor. A dielectric layer is used above the center conductor, so that the switch membrane does not come into contact with the centre electrode during the actuation. Fig. 3 and Fig. 4 show the switch in the OFF and ON states respectively [7].
Figure 5. Equivalent C–L–R circuit model of shunt capacitive switch
B. Device Geometry

A fixed-fixed beam is usually preferred for the beam structure in MEMS switches because it provides better stability and much lower sensitivity to stress. Fig. 6 shows the 3D structure of a fixed-fixed beam switch modeled in COMSOL Multiphysics. The switch consists of a square plate which is suspended above a thin film of Silicon nitride (relative dielectric constant 7.5). Silicon nitride is the commonly used dielectric layer in shunt capacitive switches. There is a grounded silicon counter-electrode below the substrate. Four rectangular flexures are used at the corners to anchor the plate to the substrate. The switch dimensions and material parameters used for simulating the switch membrane are shown in Table I and Table II.
Figure 3. Cross sectional view of RF MEMS Shunt Switch (OFF)
Figure 4. Cross sectional view of actuated MEMS Shunt switch (ON)
When a dc voltage is applied, an electrostatic force is produced between the beam and the centre conductor. When this voltage exceeds the pull-in voltage, the electrostatic forces overcome the elastic recovery forces and pull the membrane down onto the dielectric layer on the signal line. This results in an RF short between the transmission line and ground, which prohibits the transmission of the signal. The pull-in voltage of a fixed-fixed beam switch is given by
VP = √(8k g0³ / (27 ε0 A))    (1)
Here k is the spring constant in N/m, g0 is the initial gap
in μm, ε0 is the free space permittivity and A is the
electrostatic area in μm². Fig. 5 shows the equivalent circuit model of the shunt capacitive switch [1]. The sections of the transmission line are of length l + w/2, where l is the distance from the reference plane to the edge of the membrane and w is the CPW centre conductor width.
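A numeric sketch of Eq. (1) is given below; the gap (g0 = 0.9 µm) and the 100 µm × 100 µm contact area follow Table I, while the spring constant k is an assumed illustrative value, since it is not listed in the paper:

```python
# Pull-in voltage per Eq. (1): VP = sqrt(8*k*g0^3 / (27*eps0*A)).
import math

EPS0 = 8.854e-12              # free-space permittivity, F/m

def pull_in_voltage(k, g0, area):
    return math.sqrt(8 * k * g0**3 / (27 * EPS0 * area))

k = 10.0                      # N/m, assumed spring constant
g0 = 0.9e-6                   # initial gap, m (Table I)
area = 100e-6 * 100e-6        # electrostatic area, m^2 (Table I)
print(pull_in_voltage(k, g0, area))   # ~4.9 V for these values
```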
Figure 6. Modeled 3D structure of fixed-fixed beam switch in COMSOL
Multiphysics
TABLE I. SWITCH SPECIFICATIONS

Component | Length (µm) | Width (µm) | Depth (µm)
Membrane frame | 220 | 100 | 1
Contact area | 100 | 100 | 1
Flexures | 60 | 5 | 1
Dielectric layer | 220 | 100 | 0.1
Gap | - | - | 0.9
III. STRESS ANALYSIS USING FEM
The von-Mises stress analysis is mandatory to check whether the membrane will withstand a given load condition. If the maximum stress for a given gap height exceeds the stress the material can withstand, it indicates a failure of the design [8]. The effective von-Mises stress in the membrane for the required force at a specific spring constant is analyzed for some commonly available metals like Titanium, Platinum, Gold, Aluminium and Copper. The simulation of the stress gradient is done using the Finite Element Method in COMSOL Multiphysics.
Fig. 7 to Fig. 11 demonstrate the stress distribution in the membrane using different materials. In Fig. 7 to Fig. 11, the colour variation across the surface of the switch membrane indicates the corresponding stress as provided in the colour chart at the right side of the plot. The von-Mises stress profile obtained is different for different materials. The maximum stress can be seen along the flexures and the minimum stress is obtained in the contact area.
Figure 7. von-Mises stress demonstrating maximum stress of 68MPa in
Copper membrane
Fig. 8 shows the maximum and minimum value of von-Mises stress obtained in the membrane using Titanium.
Figure 8. von-Mises stress demonstrating maximum stress of 68MPa in
Titanium membrane
TABLE II. SWITCH PARAMETERS

Material | Young's Modulus (GPa) | Poisson's ratio | Density (kg/m³)
Copper | 110 | 0.35 | 8700
Titanium | 113 | 0.29 | 4540
Gold | 79 | 0.44 | 19320
Aluminium | 70 | 0.32 | 2700
Platinum | 170 | 0.39 | 21450
Fig. 7 shows the maximum and minimum value of von-Mises stress obtained in the membrane using Copper. Fig. 9 shows the maximum and minimum value of von-Mises stress obtained in the membrane using Gold.
Figure 9. von-Mises stress demonstrating maximum stress of 48MPa in Gold
membrane
Fig. 10 shows the maximum and minimum value of von-Mises stress obtained in the membrane using Aluminium.
The maximum and minimum values of von-Mises stress obtained for the membrane using different materials, along with the corresponding ultimate tensile strengths, are shown in Table III. The ultimate tensile strength is the value of the maximum stress that a material can withstand while being stretched or pulled before breaking. From Table III it is clear that the maximum von-Mises stress obtained for Titanium, Copper, Gold, Aluminium and Platinum is less than the corresponding ultimate tensile strength. It indicates that the design of the membrane using these materials is safe for the switching operation. Moreover, the maximum von-Mises stress obtained for Titanium, Copper and Platinum was much less than the corresponding ultimate tensile strength. Therefore membranes using these materials can be used for many more switching cycles than a membrane made using Gold or Aluminium.
Figure 10. von-Mises stress demonstrating maximum stress of 43MPa in Aluminium membrane

Fig. 11 shows the maximum and minimum value of von-Mises stress obtained in the membrane using Platinum.

IV. PULL-IN TIME ANALYSIS
One of the important parameters to be considered in the design of RF MEMS switches is the switching rate. The switching rate, also referred to as the switching speed, is the time required for the switch to respond at the output due to the change in the control voltage. When a voltage is applied, an electrostatic force builds up and tends to pull down the membrane; when this electrostatic force exceeds the elastic recovery force of the membrane, pull-in occurs [9]. The pull-in time required by the membrane using different materials is simulated on a fixed-fixed beam based shunt capacitive switch. As shown in Fig. 12, the membrane using Platinum takes more than 50 µs for a displacement of 0.65 µm.
Figure 11. von-Mises stress demonstrating maximum stress of 40MPa in
Platinum membrane
TABLE III. COMPARISON OF STRESS ANALYSIS

Material | von-Mises stress Max (MPa) | von-Mises stress Min (MPa) | Ultimate tensile strength (MPa)
Copper | 68 | 0.59 | 210
Titanium | 68 | 0.58 | 240-370
Gold | 48 | 0.39 | 100
Aluminium | 43 | 0.36 | 40-50
Platinum | 40 | 0.27 | 200-300
Figure 12. Displacement of the center of Platinum membrane in the modeled
switch
When the Copper membrane is used, the time required for pull-in was obtained as 41 µs for a displacement of 0.9 µm. Fig. 13 shows the switching time required by the membrane using Copper.
When the Titanium membrane is used, the time required for pull-in was obtained as 39 µs for a displacement of 0.9 µm. Fig. 16 shows the switching time required by the membrane using Titanium.
Figure 13. Displacement of the center of Copper membrane in the modeled
switch
When the Aluminium membrane is used, the time required for pull-in was obtained as 37 µs for a displacement of 0.9 µm. Fig. 14 shows the switching time required by the membrane using Aluminium.
Figure 16. Displacement of the center of Titanium membrane in the modeled switch
Figure 14. Displacement of the center of Aluminium membrane in the
modeled switch
When the Gold membrane is used, the time required for pull-in was obtained as 45 µs for a displacement of 0.9 µm. Fig. 15 shows the switching time required by the membrane using Gold.
Figure 15. Displacement of the center of Gold membrane in the modeled
switch
TABLE IV. SWITCHING TIME REQUIREMENT FOR DIFFERENT MATERIALS

Material used | Switching time (µs)
Titanium | 39
Gold | 45
Aluminium | 37
Copper | 41
Platinum | >50
From Table IV it is clear that, among the five materials used for the simulation of the membrane, Platinum gives the worst switching speed. The switching time required by the membranes using Titanium, Copper and Aluminium is low compared to the membranes using Platinum and Gold. Thus, for better switching speed, Platinum and Gold are not a good choice.
V. CONCLUSION

Stress analysis is done for a fixed-fixed beam shunt capacitive RF MEMS switch for materials like Titanium, Platinum, Gold, Aluminium and Copper. Membranes using Titanium, Copper and Platinum can withstand many more switching cycles than a membrane made using Gold or Aluminium. Thus membranes using Titanium, Copper and Platinum are more reliable compared to membranes using Gold and Aluminium. Platinum and Gold are not a good choice for better switching speed. Membranes using Titanium and Copper give better performance with respect to switching speed and reliability. When both reliability and switching speed are considered simultaneously, membranes made of Titanium are found to be the better choice.
VI. REFERENCES
[1] Gabriel M. Rebeiz, "RF MEMS Theory, Design and Technology", John Wiley and Sons Limited, New Jersey, 2002.
[2] J. Jason Yao, "Topical review: RF MEMS from a device perspective", Journal of Micromechanics and Microengineering, 10 (2000), R9–R38.
[3] Goldsmith C.L., Lin T., Powers B., Wu W., Norvell B., "Micromechanical membrane switches for microwave applications", IEEE international MTT-S symposium digest, vol. 1, pp. 91–94, 1995.
[4] Muldavin J.B., Rebeiz G.M., "High-isolation CPW MEMS shunt switches, part 1: modelling", IEEE Trans. Microw. Theory Tech., pp. 1045–1052, 2000.
[5] Singh T., Khaira N., Sengar J., "Stress analysis using multiphysics environment of a novel RF MEMS shunt switch designed on quartz substrate for low voltage applications", Trans. Electr. Electron. Mater., vol. 2, 2013.
[6] Singh A., Dhanjr, "Design and modeling of a robust wideband Poly-Si and Au based capacitive RF MEMS switch for millimeter wave applications", Proc. 2nd Int. Conf. Comput. Sci., vol. 1, Elsevier, pp. 108–114, 2014.
[7] Bhadri, Baiju, "Electromagnetic Analysis of RF MEMS Switch", International Journal of Engineering Research & Technology, Vol. 3, Issue 9, September 2014.
[8] Tejinder Singh, "Computation of Beam Stress and RF Performance of a Thin Film Based Q-Band Optimized RF MEMS Switch", Transactions on Electrical and Electronic Materials, vol. 16, no. 4, pp. 173-178, August 25, 2015.
[9] S. Shekhar, K. J. Vinoy, G. K. Ananthasuresh, "Switching and Release Time Analysis of Electrostatically Actuated Capacitive RF MEMS Switches", Sensors & Transducers Journal, Vol. 130, Issue 7, pp. 77-90, July 2011.
Performance Analysis of Micromirrors - Lift-off and
von Mises stress
Sharon Finny, Resmi R
LBS Institute of Technology for Women, Thiruvananthapuram, India
sharon.finny2010@gmail.com, resmilbs@gmail.com
Abstract—Micromirrors are used in numerous applications such as optical switching, projection displays, biomedical imaging and adaptive optics. In this paper, the structural mechanical properties of an electrostatically controlled square shaped micromirror structure are studied. Lift-off analysis and stress analysis are also conducted, and materials which can boost the performance of the micromirror are identified. Higher lift-off is observed for the silicon-aluminium structure, but it is not recommended due to severe edge displacement and surface deformation. The structural steel-aluminium material combination shows maximum lift-off with minimal surface deformation, and the maximum value of von Mises stress obtained is less than the yield strength of the material, thus ensuring safe operation of the structure. The modeling and analysis are done using COMSOL Multiphysics 4.3b.
Keywords—micromirror; lift-off; von Mises stress;

I. INTRODUCTION
The progress of MEMS technology has led to the development of miniaturized optical devices, which has a huge impact on a large number of optical applications. These include movable and tunable mirrors, lenses, filters and other optical structures [1]. Micromirrors are a much-researched area in Micro-Opto-Electro-Mechanical Systems (MOEMS). The movable micromirrors are micro optical components used for the spatial manipulation of light. The incident light can be reflected to an expected direction by moving the mirror plate, so as to modulate the phase and/or the amplitude of the incident light [2]. Micromirrors have been used in a wide variety of applications including optical switching, projection displays, endoscopic imaging, barcode readers, laser beam steering etc. [3].
Micromirrors are commonly classified on the basis of their method of actuation. The most common actuation methods include electrostatic actuation, electrothermal actuation, electromagnetic actuation and piezoelectric actuation. The electrostatic method of actuation is preferred over the other methods because of its low power consumption, faster response and simplicity of implementation. One of the drawbacks of electrostatically actuated mirrors is that they exhibit pull-in, hysteresis and a smaller vertical displacement range [4]. Prestressed actuators have been developed to address the issues of pull-in and hysteresis and to obtain a large vertical displacement [5]. These actuators are operated electrostatically and their electromechanical behavior is influenced by the residual stress developed during the fabrication process [6]. The plating process usually controls this stress, and it can be either compressive or tensile. The prestress level is normally set depending on the lift-off required.

II. MODELING OF MICROMIRRORS

The micromirror is modeled using the COMSOL Multiphysics software. The geometry consists of a centre mirror plate surrounded by cantilever springs on each of the four sides. The 2D geometry of the micromirror is shown in Fig. 1. Appropriate boundary conditions are selected, and then meshing is performed on the model to obtain the final refined mesh with hexahedral elements. The meshed geometry is shown in Fig. 2.

Figure 1. 2D Geometry of Micromirror.

Figure 2. 3D Meshed Geometry of the micromirror.
III. SIMULATION RESULTS

A. Lift-off Analysis

The lift-off analysis was performed in COMSOL Multiphysics to examine the effect of changing material combinations on the lift-off process. Fig. 3 - Fig. 8 show the lift-off when different materials such as aluminium, silicon, structural steel, iron, steel AISI 4340 and tungsten were used as the centre mirror plate. Aluminium was the material used for the cantilever beams. The initial normal stress was set to 5 GPa.

In Fig. 3 and Fig. 4, the maximum lift-off is present at the edges and the edge displacement relative to the centre plate is more than 0.2 mm. Also, there is surface deformation at the edges, which makes the structure unsuitable for a mirror plate.
Figure 5. Total surface displacement for structural steel.
The maximum lift-off for the iron and steel AISI 4340 plates is less than that of structural steel, as shown in Fig. 6 and Fig. 7 respectively.
Figure 3. Total surface displacement for aluminium.
Figure 6. Total surface displacement for iron.
Figure 4. Total surface displacement for silicon.
For the structural steel plate, the edge displacement is less compared to the aluminium and silicon mirror plates, as shown in Fig. 5. Steel deforms less because it is stiffer compared to aluminium and silicon. Structural steel has a Young's modulus of 200 GPa, whereas that of aluminium and silicon is 69 GPa and 150 GPa respectively.
Figure 7. Total surface displacement for steel AISI 4340.
Tungsten has a Young's modulus of 411 GPa and is stiffer compared to the other materials. Therefore, the tungsten-aluminium combination exhibits the least displacement, as shown in Fig. 8.
Figure 8. Total surface displacement for tungsten.
In all the material combinations analysed, the maximum displacement was observed at the ends of the cantilever beams connected to the mirror plate, and the minimum at the opposite ends.
Table I shows a comparison of maximum lift-off for
different material combinations of the centre mirror plate and
the cantilever beams.
TABLE I. COMPARISON OF LIFT-OFF FOR DIFFERENT MATERIAL COMBINATIONS

Material of centre plate | Maximum Displacement or Lift-off (mm)
Aluminium | 0.3116
Silicon | 0.287
Structural Steel | 0.2545
Iron | 0.2509
Steel AISI 4340 | 0.2447
Tungsten | 0.1476

Fig. 9 shows the total edge displacement for the structural steel-aluminium combination, which has a maximum value of 0.15 mm; therefore this material combination gives a comparatively stable structure with less surface deformation. In the case of the other combinations the lift-off is too low.

B. Displacement v/s Prestress Analysis

Fig. 10 shows the variation of the total centre point deflection with different prestress levels for different centre plate materials. The mirror response is nearly linear in the case of all the centre plate materials used. Silicon exhibits greater deflection compared to the other materials for all the values of applied prestress, but it shows high displacement at the edges of the centre plate.
Figure 9. Edge: Total displacement: structural steel
Figure 10. Total centre point displacement v/s prestress for different material
combinations.
Fig. 11 shows the mirror's curvature along its centerline for a range of values of prestress. As per the graph, the mirror bends more with increasing stress.
Figure 11. Total centre point deflection for different prestress levels: structural
steel plate.
C. von Mises Stress Analysis
C. von Mises Stress Analysis

The effective stress at which yielding of a ductile material occurs is known as the yield strength. In order to avoid static or fatigue failure of the structure, the von Mises stress induced in the material should be less than its yield strength. In (1), σv and σy denote the von Mises stress and the yield strength respectively.

σv ≤ σy    (1)
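A minimal sketch of this yield check is shown below, computing the von Mises stress from the principal stresses with the standard formula given in (2); the stress values in the example are illustrative, not simulation output:

```python
# Yield check per (1), with sigma_v computed from principal stresses (2).
import math

def von_mises(s1, s2, s3):
    return math.sqrt(((s1 - s2)**2 + (s2 - s3)**2 + (s3 - s1)**2) / 2.0)

def is_safe(s1, s2, s3, yield_strength=250e6):
    # 250 MPa is the structural-steel yield strength used in the text.
    return von_mises(s1, s2, s3) <= yield_strength

# Illustrative principal stresses in Pa (not COMSOL output):
print(is_safe(120e6, 40e6, 10e6))   # True: sigma_v ~ 98 MPa < 250 MPa
```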
The von Mises stress σv can be expressed in terms of the principal stresses σ1, σ2, σ3 as given in (2):

σv = [((σ1 − σ2)² + (σ2 − σ3)² + (σ3 − σ1)²) / 2]^(1/2)    (2)

Fig. 12 shows the distribution of von Mises stress over the surface of the mirror structure with the structural steel plate. The von Mises stress is high at the edges and decreases towards the centre of the mirror plate. To ensure the safety of the structure against failure, the von Mises stress must be less than the yield strength of the material (250 MPa). As per Fig. 13, the structure can be safely operated for prestress values less than 1 GPa.

Figure 12. Surface von Mises stress: structural steel plate.

Figure 13. Surface von Mises stress for different prestress levels.

IV. CONCLUSION

In this paper, an electrostatically controlled micromirror was modeled and analysed using the COMSOL Multiphysics software. The lift-off analysis was carried out for different mirror plate materials. The effect of different prestress levels on the deflection of the mirror plate was also studied. The results obtained show that the structural steel-aluminium micromirror has high lift-off with minimal surface deformation at the edges, which improves the stability of the structure. The maximum value of von Mises stress is less than the yield strength of the material, which ensures the safe operation of the micromirror structure.
V. REFERENCES
[1] Olav Solgaard, Asif A. Godil, Roger T. Howe, Luke P. Lee, Yves-Alain Peter, Hanes Zappe, "Optical MEMS: From micromirrors to complex systems," Journal of MicroElectroMechanical Systems, vol. 23, no. 3, June 2014.
[2] Xingguo Xiong and Hanyu Xie, "MEMS Dual-mode Electrostatically Actuated Micromirror," Proceedings of the 2014 Zone 1 Conference of the American Society for Engineering Education (ASEE Zone 1).
[3] Y. P. Zhu, W. J. Liu, K. M. Jia, W. J. Liao, and H. K. Xie, "A piezoelectric unimorph actuator based tip-tilt-piston micromirror with high fill factor and small tilt and lateral shift," Sens. Actuators A, Phys., vol. 167, no. 2, pp. 495–501, 2011.
[4] Yuan Ma, Shariful Islam, and Ya-Jun Pan, "Electrostatic torsional micromirror with enhanced tilting angle using active control methods," IEEE/ASME Transactions on Mechatronics, Vol. 16, No. 6, December 2011.
[5] M. Maheswaran and Har Narayan Upadhay, "A Study on Performance Driven Microfabrication Methods for MEMS Comb-drive Actuators," Journal of Applied Sciences, 12: 920-928, 2012.
[6] R. Sulima and S. Wiak, "Modelling of vertical electrostatic comb-drive for scanning micromirrors," Int. J. Comput. Math. Electr. Electron. Eng., vol. 27, no. 4, pp. 780–787, 2008.
Material and Orientation Optimization for
Quality Factor Enhancement of BAW Resonators
Reshma Raj R S and Resmi R
L B S Institute of Technology for Women, Thiruvananthapuram, India
reshmarajrs1991@gmail.com, resmilbs@gmail.com
Abstract - In this paper the Quality factors of different Bulk Acoustic Wave resonators using various piezoelectric materials are analyzed. The higher the Quality factor, the better the resonator. A high Quality factor provides a higher signal to noise ratio, higher resolution and low power consumption. The results indicate that Rochelle Salt possesses the maximum value of the Quality factor, of about 2999.729. By changing the orientation, the maximum value of the Quality factor obtained is about 42696.38 at 81°. A MEMS based Bulk Acoustic Wave resonator is designed using the COMSOL Multiphysics software.
Keywords - Bulk Acoustic Wave, Eigen frequency analysis,
Quality Factor, BAW resonator
I. INTRODUCTION
The BAW resonator is the core element of the BAW technology. A BAW resonator is an electromechanical device in which, with the application of an electrical signal, a standing acoustic wave is generated in the bulk of a piezoelectric material. In simple words, it is a device in which a piezoelectric material is sandwiched between two metallic electrodes. To obtain the desired operating frequency, the natural frequency of the material and the thickness are used as design parameters. When the voltage is applied to the top electrode of the resonator, the bulk acoustic mode of the resonator is obtained from the eigenfrequency analysis.
Bulk acoustic wave technology promises frequencies in the GHz range when integrated with RF circuits, along with small size resonators and filters [4]. The resonator considered in the present paper is of the thin film [5], [6] type, in which the substrate is partially etched away on the back. The natural frequency of the material and the Quality factor are used as design parameters to obtain the desired resonant frequency.

FBAR components offer small size, low cost, high quality factor, large power operation and compatibility with the silicon low-cost process, enabling mass production and filter integration [7].
Fig. 1 shows the geometry of the modelled resonator. The lowest layer of the resonator is a silicon substrate, on top of
which is the aluminum layer that operates as the ground
electrode. A piezoelectric layer is laid over the ground
electrode and above this piezoelectric layer is the metal
electrode. The Perfectly Matched Layer (PML) effectively
simulates the effect of propagation and absorption of elastic
waves in the adjoining regions. The resonator dimensions used
for simulating the resonator are shown in Table I.
The main mode of operation of the BAW resonator [1] is the thickness or longitudinal mode, meaning that the bulk acoustic wave reflects off the large plate surfaces and the resonance is caused by the wave excited in the thickness direction.
A. Thin Film Bulk Acoustic Resonator

A Thin Film Bulk Acoustic Resonator (FBAR or TFBAR) [2] is a device consisting of a piezoelectric material sandwiched between two electrodes and acoustically isolated from the surrounding medium; with thicknesses ranging from several micrometres down to tenths of micrometres, it resonates in the frequency range of roughly 100 MHz to 10 GHz. In the wireless telecommunication world, film bulk acoustic wave resonators [3] are very promising for use as RF MEMS filters, since they can be combined to make up duplexers (transmitter/receiver modules).

The FBAR is the device chosen in this study as an example of a high frequency MEMS with thin film, targeted for use in the ever growing wireless communications industry.
Figure 1. Geometry of Thin Film Bulk Acoustic Wave Resonator
TABLE I. RESONATOR DIMENSIONS

Parameters | Thickness | Width
Silicon layers | 7 μm | 1.7 mm
Metal layers (Aluminium), top | 0.2 μm | 500 μm
Metal layers (Aluminium), middle | 0.2 μm | 1.7 mm
Piezoelectric layer (Rochelle Salt) | 9.5 μm | 1.7 mm
II. EQUIVALENT CIRCUIT REPRESENTATION
For making circuit designs using MEMS resonators, an equivalent electrical circuit that describes their frequency dependent characteristics is needed. The equivalent Butterworth Van Dyke (BVD) lumped equivalent circuit of the BAW resonator [8] is shown in Fig. 2.

Figure 2. Equivalent Butterworth Van Dyke lumped element equivalent circuit of a BAW resonator

Each branch contains a resistor, an inductor, and a capacitor, representing a resonance in the frequency response of the resonator. The admittance of the equivalent circuit is

Y(ω) = Σk [ 1 / (Rk + iωLk + 1/(iωCk)) ] + iωC0    (1)
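A direct sketch of the BVD admittance in (1) is shown below; the branch values are illustrative placeholders chosen to place a resonance near 220 MHz, not fitted parameters of the modelled resonator:

```python
# Butterworth-Van Dyke admittance, Eq. (1): one motional R-L-C branch
# per resonance plus the static capacitance C0.
import numpy as np

def bvd_admittance(w, branches, c0):
    """w: angular-frequency array; branches: list of (R, L, C) tuples."""
    y = 1j * w * c0
    for r, l, c in branches:
        y = y + 1.0 / (r + 1j * w * l + 1.0 / (1j * w * c))
    return y

w = 2 * np.pi * np.linspace(215e6, 225e6, 1001)
y = bvd_admittance(w, [(2.0, 80e-6, 6.5e-15)], c0=1e-12)  # ~220.7 MHz branch
```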
An obvious way to determine the frequency dependent
admittance Y(ω) of a MEMS resonator is using a frequency
response analysis. In this type of analysis, an actuation force at a given frequency is applied to the surface of the resonator and
the amplitude response is simulated. Because the Quality
factor of MEMS resonators is usually high, this method
requires a high density of frequency points to resolve a single
resonance of the resonator. Moreover, fitting of Y(ω) is
required to extract the equivalent circuit parameters.
In this paper an alternative method is described to directly
extract the Quality factor from an eigen frequency analysis.
The method will be applied to the resonator in Fig.2.
Eigenfrequency and frequency domain analyses are different studies that are independent of each other. An eigenfrequency step gives a list of all the natural frequencies of the system. Then we have to examine the deformed mode shape to see qualitatively what the displacement will be. The normalization used in an eigenvalue problem is somewhat arbitrary. However, it is likely that the largest displacement will correspond to the lowest mode.
III. QUALITY FACTOR ANALYSIS
The Quality factor is one of the most important characteristics of MEMS resonators, especially for vibrating structures where the resonant frequency variation is monitored. The higher the Quality factor value, the better the resonator performance: the signal to noise ratio increases and the power dissipation decreases. High Quality factor circuits can be used for various wireless applications. The Quality factor is a dimensionless parameter.
As the resistance value increases, the frequency range over which the dissipative characteristics dominate the behaviour of the circuit increases. The Quality factor is related to the sharpness of the resonance peak and is an expression of the cyclic energy loss in an oscillating system. In terms of energy, it is expressed as the total energy stored in the system divided by the energy lost per cycle:

Q = 2π × (total energy stored) / (energy lost per cycle)    (2)
The Quality factor analysis using different materials is
explained below. In the eigenfrequency analysis there are six
frequency modes. The lowest frequency at which these
stationary waves are formed is called the fundamental
harmonic. The modes of oscillation have different shapes for
different frequencies [9].

In this paper the Quality factor is evaluated from the
complex eigenfrequency f, whose real part is the resonant
frequency and whose imaginary part indicates the damping
(see Section IV):

Q = Re(f) / (2 · Im(f))          (3)
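A minimal sketch of (3) follows; the complex eigenfrequency is an illustrative placeholder whose real and imaginary parts were chosen to roughly reproduce the highest ZnO mode of Table II, not a value taken from the solver.

# Minimal sketch: Quality factor from a complex eigenfrequency, as in (3).
# The value below is an illustrative placeholder, not solver output.
f_eig = 221.42e6 + 83.24e3j   # Hz; Re = displacement frequency, Im = damping

Q = abs(f_eig.real) / (2 * abs(f_eig.imag))
print(f"f = {f_eig.real / 1e6:.2f} MHz, Q = {Q:.1f}")   # ~1330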
Table II shows the Quality factor analysis of Zinc Oxide.
Using Zinc Oxide, the maximum Quality factor obtained is
about 1330.08, at an eigenfrequency of 221.42 MHz.

TABLE II. QUALITY FACTOR VALUES OBTAINED AT
DIFFERENT EIGENFREQUENCY VALUES USING ZINC OXIDE

Eigenfrequency (MHz)    Quality factor
217.73                  1199.7624
218.70                  936.23781
219.55                  1188.7183
220.63                  936.51987
221.42                  1330.0896
222.13                  937.4436
Table III shows the Quality factor analysis of Aluminium
Nitride. Using Aluminium Nitride, the maximum Quality
factor obtained is about 1808.26, at an eigenfrequency of
225.42 MHz.

TABLE III. QUALITY FACTOR VALUES OBTAINED AT
DIFFERENT EIGENFREQUENCY VALUES USING ALUMINIUM
NITRIDE

Eigenfrequency (MHz)    Quality factor
214.80                  1616.2625
216.03                  1784.7664
221.30                  1662.5064
222.09                  1784.1761
225.42                  1808.2671
Table IV shows the Quality factor analysis of Rochelle
Salt. Using Rochelle Salt, the maximum Quality factor
obtained is about 2999.72, at an eigenfrequency of 218.95
MHz.

TABLE IV. QUALITY FACTOR VALUES OBTAINED AT
DIFFERENT EIGENFREQUENCY VALUES USING ROCHELLE
SALT

Eigenfrequency (MHz)    Quality factor
218.76                  1165.6162
218.95                  2999.7291
219.38                  1185.5266
220.26                  1083.3847
220.99                  1228.3316
Table V shows the Quality factor analysis of Lithium
Niobate. Using Lithium Niobate, the maximum Quality factor
obtained is about 1360.29, at an eigenfrequency of
222.88 MHz.

TABLE V. QUALITY FACTOR VALUES OBTAINED AT DIFFERENT
EIGENFREQUENCY VALUES USING LITHIUM NIOBATE

Eigenfrequency (MHz)    Quality factor
217.21                  665.18125
219.17                  1152.154
219.59                  619.25414
221.03                  1309.1761
222.57                  681.24313
222.88                  1360.2926
The Quality factor vs. eigenfrequency plot for the different
materials is shown in Fig. 3 below.

Figure 3. Quality factor analysis using different piezoelectric materials

From the Quality factor analysis it is clear that using
Rochelle Salt a maximum Quality factor of about 2999.729 is
obtained. Further analysis is therefore done using Rochelle
Salt as the piezoelectric material, varying its orientation: the
piezoelectric layer is rotated about the X axis.
The eigenfrequencies and the respective Quality factors
change when the piezoelectric material inside the resonator is
rotated. This is due to the crystallographic effect [10] of
piezoelectric materials. Any angular misalignment from the
axis of transduction disrupts the atomic linearity, which might
lead to acoustic losses at the atomic level; the ensemble of
these effects might be reflected in an increase in the Quality
factor of the resonator.
Table VI shows the Quality factor vs. eigenfrequency
values obtained using Rochelle Salt with the piezoelectric
layer rotated by 30°. Here the maximum Quality factor
obtained is about 17705.13, at an eigenfrequency of
220.6 MHz.

TABLE VI. QUALITY FACTOR ANALYSIS BY ROTATING
PIEZOELECTRIC LAYER BY 30°

Eigenfrequency (MHz)    Quality factor
215.47                  5601.25809
218.12                  12669.76214
219.60                  13173.77809
220.60                  17705.13955
222.62                  11062.03554
Table VII shows the Quality factor vs. eigenfrequency
values obtained using Rochelle Salt with the piezoelectric
layer rotated by 50°. Here the maximum Quality factor
obtained is about 40998.42, at an eigenfrequency of
222.18 MHz.

TABLE VII. QUALITY FACTOR ANALYSIS BY ROTATING
PIEZOELECTRIC LAYER BY 50°

Eigenfrequency (MHz)    Quality factor
214.95                  19496.29908
216.47                  32516.34201
218.10                  35075.73867
219.56                  29980.49335
222.18                  40998.42342
Table VIII shows the Quality factor vs. eigenfrequency
values obtained using Rochelle Salt with the piezoelectric
layer rotated by 60°. Here the maximum Quality factor
obtained is about 32554.42, at an eigenfrequency of
220.42 MHz.

TABLE VIII. QUALITY FACTOR ANALYSIS BY ROTATING
PIEZOELECTRIC LAYER BY 60°

Eigenfrequency (MHz)    Quality factor
214.36                  32137.45978
217.70                  28887.5745
220.42                  32554.42619
221.27                  18363.90608
224.99                  17416.11227

IV. RESULTS
The eigenfrequency analysis is used to determine the
Quality factor. All the design and modeling is done using the
COMSOL Multiphysics software. The eigenfrequency
analysis plot shown in Fig. 5 presents the lowest BAW mode
of the structure, which occurs at a particular frequency
obtained via material optimization.

The eigenfrequency values are shown in the results as
complex numbers, wherein the real part provides the actual
displacement frequency while the imaginary part is an
indication of the extent of damping.
Table IX shows the Quality factor vs. eigenfrequency
values obtained using Rochelle Salt with the piezoelectric
layer rotated by 81°. Here the maximum Quality factor
obtained is about 42696.38, at an eigenfrequency of
218.84 MHz.

TABLE IX. QUALITY FACTOR ANALYSIS BY ROTATING
PIEZOELECTRIC LAYER BY 81°

Eigenfrequency (MHz)    Quality factor
216.15                  23987.13199
216.76                  13140.93297
218.84                  42696.38989
220.40                  38909.29835
223.42                  26911.90005
The Quality factor analysis for the various orientations is
shown in Fig. 4.

Figure 4. Quality factor analysis at different orientations

Figure 5. The lowest bulk acoustic mode of the resonator identified from the
solutions of the eigenfrequency analysis, using Rochelle Salt as the
piezoelectric layer
Fig. 6 shows the lowest BAW mode of the structure that
occurs at a particular frequency obtained by changing the
orientation. Across the tested orientations, the maximum
Quality factor (42696.38) is obtained by rotating the
piezoelectric layer by 81°. This is due to the crystallographic
effect of piezoelectric materials. The colour variation across
the surface of the resonator and the corresponding
displacement for these eigenfrequencies are shown in the
colour chart at the right side of the plots in Fig. 5 and Fig. 6.
Figure 6. The lowest bulk acoustic mode of the resonator identified from the
solutions of the eigenfrequency analysis, with the piezoelectric layer rotated
by 81°
V. CONCLUSION

The Quality factors of Bulk Acoustic Wave resonators
using various piezoelectric materials were analyzed. It was
found that using Rochelle Salt a Quality factor of about
2999.729 is obtained at a frequency of 218.955 MHz; this is
the highest value obtained among the analysed piezoelectric
materials. By changing the orientation, it was found that the
maximum Quality factor (42696.38) is obtained at 81°. The
higher the Quality factor, the better the resonator; by
improving the Quality factor, the signal-to-noise ratio and the
sensitivity are also increased.

REFERENCES
[1] Sneha Dixit, "Film Bulk Acoustic Wave (FBAR) Resonator," Scientific
and Research Publications, vol. 5, iss. 2, 2015.
[2] Gabriel M. Rebeiz, RF MEMS: Theory, Design and Technology, 2003.
[3] Tapani Makkonen, Antti Holappa, Juha Ellä, and Martti M. Salomaa,
"Finite element simulations of Thin-Film Composite BAW Resonators,"
IEEE Trans. on Ultrasonics, Ferroelectrics, and Frequency Control, vol. 48,
no. 5, pp. 1241-1258, 2001.
[4] W.A. Burkland, A.R. Landlin, G.R. Kline, and R.S. Ketcham, "A thin-
film-bulk-acoustic-wave resonator-controlled oscillator on silicon," IEEE
Electron Device Lett., vol. EDL-8, no. 11, pp. 531-533, 1987.
[5] R.B. Stokes, J.D. Crawford, and D. Cushman, "Monolithic bulk acoustic
filters to X-band in GaAs," in Proc. IEEE Ultrason. Symp., pp. 547-551, 1993.
[6] T.W. Grudkowski, J.F. Black, T.M. Reeder, D.E. Cullen, and R.A.
Wagner, "Fundamental-mode VHF/UHF miniature acoustic resonators and
filters on silicon," Appl. Phys. Lett., vol. 37, no. 11, pp. 993-995, 1980.
[7] R.F. Milsom et al., "Effect of mesa-shaping on spurious modes in
ZnO/Si bulk-wave composite resonators," in Proc. IEEE Ultrason. Symp., pp.
498-503.
[8] K.M. Lakin, "Modelling of thin film resonators and filters," IEEE MTT-S
Digest, vol. 1, pp. 149-152, 1992.
[9] Tapani Makkonen, Antti Holappa, Juha Ellä, and Martti M. Salomaa,
"Finite Element Simulations of Thin-Film Composite BAW Resonators,"
IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control,
vol. 48, no. 5, 2001.
[10] Ranjan Dey, D. Anumeha, "Design and Simulation of Portable Fuel
Adulteration Detection Kit," Journal of Energy and Chemical Engineering,
vol. 2, iss. 2, pp. 74-80, 2014.
Impact of Propagation Medium on Link Quality
for Underwater and Underground Sensors
G. Horvat, D. Vinko and J. Vlaović
University of Osijek, Faculty of Electrical Engineering, Department of Communications
Osijek, Croatia
goran.horvat@etfos.hr
Abstract - With the rapid development of wireless networks,
new application domains are continuously proposed. One of
these applications includes underwater and underground
sensors, which must cope with diminished RF performance
when sending the gathered information to a base station.
Although many researchers deal with the theoretical
background of the channel and propagation models, very few
experimental studies regarding the link quality have been
conducted. Due to the fact that new and more powerful
sensor nodes are continuously being developed, the question
that arises is how these new and more powerful classes of
wireless sensors can handle the harsh environment of
underwater and underground networks. Therefore, in this
paper the authors analyze the impact of soil, water, moisture
and other parameters on the link quality between two sensor
nodes in various scenarios. The obtained results present a
basis for designing and planning an underground/underwater
sensor network.
I. INTRODUCTION
The performance of wireless sensor nodes depends on the
environment and the medium in which they operate. The
underwater and underground environments differ
significantly from the terrestrial environment where wireless
sensor networks (WSNs) are commonly used. There are a
number of papers proposing new protocols to cope with the
propagation problems in both the underwater [1] - [4] and the
underground [5] environment. Comparing the underwater and
underground environments, the research focuses on different
sets of problems. In underwater applications, the focus is set
on problems such as deployment algorithms for node
placement [6], [7] and powering and energy efficiency issues
[8], [9]. In underground communications, the focus is
mainly set on various applications in coal and metal mines
[10] - [13] and underground train tunnels [14]. The security
aspect [15], [16] and localization/tracking [17] are also being
discussed, but mainly for WSN applications in underground
environments.
When it comes to communication, researchers are
exploring different approaches to extend the
communication range in the underwater environment by
using wireless sensor nodes with optical [18], [19] and
acoustic communication [20], [21]. This is due to the high
attenuation of the electromagnetic (EM) wave [22] in
water. The additional circuitry (optical and acoustic
transducers) used to extend the communication range has a
negative impact on the power consumption and battery life of
the sensor node. In both the underground environment
(such as coal mines and train tunnels) and the terrestrial
environment, the propagation medium is still air; thus,
standard EM wave propagation is used for communication.
This paper focuses on sensor node performance when
the node is transmitting EM waves through water or
ground. There are some papers that deal with such channel
modelling and there are a few experimental studies in the
underwater [23] and underground [24] environment, but the
link quality has not yet been thoroughly evaluated.
Furthermore, as new sensor nodes are continuously
being developed (higher RF sensitivity, higher RF power),
the question that arises is how these new and more powerful
classes of wireless sensors can handle the harsh
environment in underwater and underground networks.
Therefore, in this paper the following parameters are
analyzed in various scenarios: Received Signal Strength
Indicator (RSSI), Link Quality Indicator (LQI), Round Trip
Time (RTT), and packet delivery probability (prr). For
underground communication, the main objective is to
analyze the impact of communication distance and soil
moisture on the monitored parameters. For underwater
communication, the goal is to analyze how the distance
and additional impurities (salinity) affect the observed
parameters.
This paper is structured as follows: the measurement
testbed for measuring underwater and underground
communication is proposed in Section II. In Section III the
measurement results for underground communication are
presented, whereas in Section IV the measurement results
of underwater communication are presented. Section V
gives the conclusion of the proposed work and the
guidelines for future work.
II. MEASUREMENT TESTBED
In order to examine the link quality parameters in
underwater and underground networks, a testbed consisting
of a point-to-point WSN and a testing chamber was proposed.
The WSN topology used is a star topology, with one network
Coordinator and n End nodes forming the WSN. In the
proposed topology the first node is the network Coordinator,
whereas the second node is an End node. The diagram of the
proposed measurement testbed is shown in Fig. 1.
This work was sponsored by the J. J. Strossmayer University of
Osijek under the project IZIP-2014-104 "Wireless power transfer for
underground and underwater sensors"
Figure 1. The measurement testbed for underwater and underground
sensors
The network Coordinator is connected to a personal
computer (PC) via a USB cable, which is also used for
powering the Coordinator node. The End node is battery
powered. An application designed for testing and
measurement purposes runs on the PC and communicates
with the Coordinator. Both nodes are housed in waterproof
containers to protect them from moisture. The distance
between the nodes' antennas and the test chamber wall is
greater than the diameter of the 1st Fresnel zone (> 0.18 m),
so no relevant part of the RF radiation propagates outside the
test chamber. One of the analyzed testbed parameters is the
distance between the nodes, which refers to the distance
between the node antennas. The nodes are assumed to be
placed at the same height with both antennas directed toward
each other; the antenna is a PIFA (PCB Inverted-F Antenna)
with a gain of 2.28 dBi. A block diagram of the WSN node is
shown in Fig. 2.
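As a check on the chamber sizing, the sketch below evaluates the standard first Fresnel zone radius for the 2.4 GHz link; the 1.2 m separation is taken from the maximum measured distance, while the exact clearance criterion is the authors' own.

# Minimal sketch: first Fresnel zone radius for the 2.4 GHz link,
# illustrating the chamber clearance requirement.
import math

def fresnel_radius(d1, d2, f_hz):
    # radius of the 1st Fresnel zone at distances d1, d2 [m] from the ends
    lam = 3e8 / f_hz
    return math.sqrt(lam * d1 * d2 / (d1 + d2))

d = 1.2  # maximum node separation [m]
print(f"radius at midpoint: {fresnel_radius(d / 2, d / 2, 2.4e9):.3f} m")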
The proposed methodology for measuring the link
quality is based on the convergecasting property of the
WSN: multiple nodes transmit measured values to a single
network coordinator, but with round-trip communication
taken into account. In the proposed methodology the
sensor node sends a packet towards the Coordinator and
waits for the reception of an acknowledgement packet. In
this manner each sensor node has feedback information on
whether the measured data reached the destination, what the
delay (Round Trip Time, RTT) of the packet is and what the
packet delivery probability (prr) is. During testing, the node
sends 100 packets, each separated by a time delay drawn
from the uniform distribution U(0.1 s, 1 s). After the
packets are sent, the End node analyzes the data and
transmits the link quality information to the Coordinator.
This methodology is proposed because in harsh
environments End nodes often lose communication with the
Coordinator due to changes in environmental factors. With
the proposed methodology a sensor node is aware of the link
quality information and can therefore decide whether to
transmit the data via RF or to store the data for future
transmission (upon link quality improvement).
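The measurement logic can be sketched as follows; send_packet_and_await_ack() is a hypothetical stand-in for the node's radio driver (it is not an API of the used stack) and is stubbed here so the sketch runs on its own.

# Minimal sketch of the proposed round-trip link-quality measurement.
import random
import time

def send_packet_and_await_ack():
    # hypothetical stand-in for real radio I/O; stubbed as a lossy link
    return random.random() < 0.98

def measure_link_quality(n_packets=100):
    rtts, delivered = [], 0
    for _ in range(n_packets):
        time.sleep(random.uniform(0.1, 1.0))   # inter-packet delay ~ U(0.1 s, 1 s)
        t0 = time.monotonic()
        if send_packet_and_await_ack():
            rtts.append(time.monotonic() - t0)
            delivered += 1
    prr = delivered / n_packets                # packet delivery probability
    mean_rtt = sum(rtts) / len(rtts) if rtts else None
    return prr, mean_rtt

print(measure_link_quality(n_packets=10))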
In the course of the experiment two scenarios were
analyzed: underwater and underground communication by
means of filling the test chamber with soil or water.
III. MEASUREMENT RESULTS – UNDERGROUND COMMUNICATION
Figure 2. Block diagram of the used WSN node
The node is composed of a microcontroller and an RF
section, which consists of a transceiver and a Power
Amplifier / Low Noise Amplifier (PA/LNA). With the
proposed architecture the used nodes have higher TX power
and higher RX sensitivity. The characteristics of the used
WSN node are shown in Table I.
The first scenario estimates the link quality in underground
WSN communication. The proposed methodology of
the experiment consists of several steps. The first step
consists of measuring the link quality for various distances
at constant soil moisture, for uplink and downlink
communication. In this step the testing chamber was filled
with commercial soil with a constant soil moisture of 56.2 %.
The approximate density of the soil was 300 kg/m3.
Measurements were performed every 10 cm up to a distance
of 120 cm. The measurement results for the RSSI value are
shown in Fig. 3.
TABLE I
WSN NODE PARAMETERS

Parameter                    Value
Frequency and modulation     2.4 GHz, DSSS, 250 kbit/s
Transceiver                  AT86RF231
Microcontroller              Atmel ATxmega256A3U
Communication stack          Atmel LightWeight Mesh
Transmission power           20 dBm
Receiver sensitivity         -105 dBm
Packet generation interval   uniform (1 s)
Number of packets sent       100
Packet size                  100 B
macMaxCSMABackoffs           4
macMaxFrameRetries           3
Figure 3. RSSI for various distances vs. Downlink and Uplink channel
(underground – soil moisture 56.2 %)
As seen in Fig. 3, as the distance between the nodes
increases, the RSSI value decreases accordingly. The trend
follows a log-distance model, widely known as the
log-normal shadowing model. Plotting the data on a
logarithmic distance axis makes it possible to determine the
path loss exponent for underground communication [26].
First it is important to define the path loss of a
communication link as

PL [dB] = Pt [dBm] − Pr [dBm] + Gt [dB] + Gr [dB]          (1)
where PL refers to the path loss, Pt refers to the transmitted
power of the node (20 dBm), Pr refers to the received
power (RSSI), and Gt and Gr refer to the antenna gains of
the transmitter and receiver, respectively (2.21 dB). From
(1) it is possible to calculate a linear regression line and
determine the regression coefficients (Fig. 4).
Figure 4. Path loss for logarithmic distance with linear regression
(underground)
It can be seen from Fig. 4 that there is a high correlation
between the linear regression line and the measurement data
(coefficient of determination of 0.96) versus the
logarithmic distance between the nodes. As there are some
variations from the linear model, these differences can be
attributed to multipath fading, reflections, diffractions and
similar radio irregularity phenomena [25]. These
variations can be modeled using the widely used log-normal
shadowing model:
PL(d) = PL(d0) + 10 · n · log(d/d0) + Xσ          (2)

where PL(d0) is the path loss at the reference distance d0, n
is the path loss exponent that depends on the propagation
medium and Xσ is a normally distributed random variable
with zero mean and standard deviation σ. The derived linear
regression equation from Fig. 4 is

y = 54.969 x + 105.28          (3)

To determine the path loss exponent of the underground
communication we correlate equations (3) and (2), which
results in the following path loss model:

PL(d) = 105.28 dB + 10 · 5.496 · log(d)          (4)

From the analysis of the other link quality parameters in
this step (LQI, RTT, prr), it can be concluded that the LQI
ranges from 98.8 % to 100 % of the link quality and the RTT
ranges from 99 ms to 118 ms (averaging 106.6 ms), without
any direct correlation with the distance or the RSSI value.
Also, the packet delivery probability averages 98.6 %, with
no direct correlation with RSSI or distance. It is important to
notice that the used protocol stack uses packet retransmission
on the MAC layer, which improves the reliability of the
communication and achieves a high packet delivery
probability.

The next step in the measurements includes increasing the
soil moisture and measuring the link quality parameters.
Soil moisture was controlled by adding tap water, with
quality parameters described in Section IV. The first
measured link quality parameter is RSSI (Fig. 5).

Figure 5. RSSI for various distances and soil moisture content
(underground)

In Fig. 5 the previous measurement with 56 % moisture is
shown alongside the increased soil moisture measurement of
61 %. It can be concluded that an increase in soil moisture of
only 5 % drastically increases the signal attenuation and the
path loss. This can be seen in Fig. 6, where the RSSI value is
shown for a distance of 50 cm with regard to soil moisture.
Figure 6. RSSI vs. soil moisture content at a distance of 50 cm
(underground)
From Fig. 6 it is evident that an increase in soil moisture
above 70 % will result in communication link failure at
distances greater than 50 cm. This presents a very
important factor that needs to be taken into consideration
when planning an underground WSN. The influence of
soil moisture and distance on the round trip time (RTT) was
analyzed next (Fig. 7).
TABLE II
UNDERGROUND EMPIRICAL PATH LOSS MODEL

Parameter            Value
Path loss equation   10·5.496·log(d) + 105.28
n                    5.496
PL(d0 = 1 m)         105.28 dB
Xσ                   2.6 dB
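The fitting procedure behind (1)-(4) can be sketched as below; the distance/RSSI pairs are illustrative values generated to be consistent with the reported model, not the paper's raw measurements.

# Minimal sketch: estimating the path loss exponent n and PL(d0 = 1 m)
# from RSSI measurements, cf. (1)-(3). RSSI values are illustrative.
import numpy as np

P_T = 20.0        # transmit power [dBm], Table I
G_T = G_R = 2.21  # antenna gains [dB], as used with (1)

d = np.array([0.2, 0.4, 0.6, 0.8, 1.0, 1.2])                 # distance [m]
rssi = np.array([-42.4, -59.0, -68.7, -75.5, -80.9, -85.2])  # [dBm]

pl = P_T - rssi + G_T + G_R                   # path loss from (1) [dB]
slope, intercept = np.polyfit(np.log10(d), pl, 1)
print(f"n = {slope / 10:.3f}, PL(1 m) = {intercept:.2f} dB")  # ~5.5, ~105 dB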
Figure 7. Measurements of RTT vs. distance vs. soil moisture
(underground)
From the measurements shown in Fig. 7 it can be
concluded that the distance has no significant effect on the
RTT parameter. However, soil moisture has a significant
impact on the RTT, with a negative trend, meaning that an
increase in moisture (i.e. an increase in RF attenuation)
reduces the average RTT of the sent packets. This can be
attributed to the reduction of reflections and diffractions due
to the increased path loss of the medium. This can also be
seen at higher moisture contents, where the RTT value settles
at its minimum (Fig. 8).
Figure 8. Measurements of RTT vs. distance vs. soil moisture
(underground)
By increasing the soil moisture beyond 65 %, the
communication is drastically affected and the packet
delivery probability drops to only 9 % (91 % of the packets
do not reach the destination, the Coordinator). In these
conditions the RTT increases due to the increase in
retransmissions (Fig. 9).
Figure 9. RTT and prr for various soil moisture content (underground)
From the measurement results it can be concluded that
for underground communication the most influential
parameter is soil moisture. Soil moisture can significantly
degrade the link quality and cause communication link
failure. On the other hand, with increasing soil moisture a
drop in RTT was observed, which could be used as a model
for advanced communication protocols.

In comparison with propagation in air: at the maximum
distance (120 cm) the RSSI value for LOS air propagation
equals -47 dBm, the RTT equals 71 ms and the packet
delivery probability equals 100 %, representing perfect
communication conditions. This shows the amount of
attenuation of RF propagation induced by the soil or a
similar propagation medium.
IV. MEASUREMENT RESULTS – UNDERWATER COMMUNICATION

The second scenario involves the analysis of underwater
communication in WSNs with regard to the link quality
parameters. The proposed methodology of the experiment
consists of several steps. The first step consists of measuring
the link quality for various distances in water, for uplink and
downlink communication. In this step the test chamber is
filled with water (approximately 110 L) and the nodes are
positioned so that the distance between the antenna and the
edge of the chamber is greater than the size of the 1st Fresnel
zone. The nodes with antennas are placed in a hermetically
sealed enclosure made from 0.1 mm nylon material, with
PVC support. Measurement was conducted using tap water at
a temperature of 15 °C with a conductivity of 870 µS/cm,
hardness of 295 mg CaCO3/L (16.5 °D) and TDS of
460 mg/L [27]. This conductivity value is very close to the
limit for brackish water (1000 µS/cm).

Measurement was conducted at various distances, with the
antennas positioned in a straight line. The measurement
results for the Uplink and Downlink values of the RSSI link
quality parameter are shown in Fig. 10.

Figure 10. RSSI for various distances vs. Downlink and Uplink channel
(underwater)
As seen from the measurement results, the Uplink and
Downlink channel results are consistent, with little
deviation, resulting in an almost symmetrical
communication link. On the other hand, the path loss in
underwater communication exhibits a large amount of
attenuation, which results in a maximum communication
distance of only 15 cm. This can be attributed to the large
values of water conductivity, CaCO3 content and TDS. The
other link quality parameters are not significantly affected by
the change in distance, resulting in a mean packet delivery
probability of 98.3 %, a mean RTT of 115.8 ms and an
average LQI of 254.5, with no correlation with RSSI or the
distance between the nodes.

As the water quality parameters have a profound impact
on the communication and link quality, the second step
consists of adding impurities (salt) in order to increase the
conductivity of the water. The salinity was increased up to
3 ‰ and the RSSI was measured at a distance of 10 cm
between the End node and the Coordinator. The results are
shown in Fig. 11.
Figure 11. RSSI for different values of salinity at a 10 cm distance
(underwater)
From the measurement results it can be concluded that
an increase in salinity drastically affects the RSSI value, i.e.
it increases the path loss in the channel. However, the other
link quality parameters are not significantly affected,
showing no correlation with the salinity concentration.
V. CONCLUSION
This paper presents a study of the impact of the
propagation medium on link quality parameters for
underwater and underground Wireless Sensor Networks.
As the propagation medium can vary, this paper analyzes
the influence of soil moisture on propagation in
underground networks and of impurity levels in underwater
communication. Measurements were performed with the
proposed testbed composed of two WSN nodes: a
Coordinator node and an End node. The nodes communicate
in the 2.4 GHz band with a TX power of 20 dBm, a receiver
sensitivity of -105 dBm and a PIFA antenna with a gain of
2.21 dBi. The testbed is composed of a testing chamber filled
with the medium (soil/water) and a PC used to analyze the
communication. The link quality parameters analyzed within
this paper are RSSI, LQI, RTT and prr.
Underground communication: By performing detailed
measurements at various distances, an empirical
propagation model is proposed in which the path loss
exponent n and the path loss were calculated. From the
measurement results it can be concluded that soil moisture
has a profound effect on the link quality: an increase in soil
moisture of only 10 % can reduce the communication range
by a factor of two. An interesting phenomenon was observed
whereby increasing the soil moisture reduced the
communication delay (RTT).
Underwater communication: Similarly to underground
communication, an analysis of the link quality parameters
was performed at various distances. Tap water with the
described water quality parameters was used. It can be
concluded that the path loss in water with relatively high
conductivity (850 µS/cm) is significant, resulting in a
maximum communication distance of only 15 cm. As the
content of impurities significantly affects communication,
the next step involved adding impurities (salt) in order to
increase the conductivity. The increase in water salinity
drastically affects the RSSI value, i.e. it increases the path
loss in the channel. However, the other link quality
parameters are not significantly affected, showing no
correlation with the salinity concentration. At a salinity of
3 ‰, the maximum communication distance drops to 10 cm.
Future work on the topic of underground
communication targets calculating the path loss exponent
for various soil moisture contents and various soil densities,
in order to present an empirical path-loss model for
underground communication. Also, as soil moisture
affects RSSI, soil composition, such as the presence of salts
or ions in the soil, will change the soil's electromagnetic
properties, affecting radio propagation. These properties
will be investigated in future work. Furthermore, a similar
approach will be taken for underwater communication,
where a path-loss model will be proposed based on
different water quality factors (conductivity, salinity, TDS
etc.), isolating the most influential factor for underwater
communication. Also, the measurement data will be
compared with theoretical expectations from the available
literature, i.e. existing propagation models.
ACKNOWLEDGMENT
The authors would like to thank Krunoslav Aladić, PhD,
from the Croatian Veterinary Institute, Branch Veterinary
Department Vinkovci, for the analysis of the soil moisture
content.
REFERENCES
[1]
Umar, A.; Akbar, M.; Iqbal, Z.; Khan, Z.A.; Qasim, U.; Javaid, N.,
"Cooperative partner nodes selection criteria for cooperative
routing in underwater WSNs," in Information Technology: Towards
New Smart World (NSITNSW), 2015 5th National Symposium on ,
vol., no., pp.1-7, 17-19 Feb. 2015
[2] Fahim, H.; Javaid, N.; Qasim, U.; Khan, Z.A.; Javed, S.; Hayat, A.;
Iqbal, Z.; Rehman, G., "Interference and Bandwidth Aware Depth
Based Routing Protocols in Underwater WSNs," in Innovative
Mobile and Internet Services in Ubiquitous Computing (IMIS),
2015 9th International Conference on , vol., no., pp.78-85, 8-10
July 2015
[3] Liaqat, T.; Javaid, N.; Ali, S.M.; Imran, M.; Alnuem, M., "Depth-Based
Energy-Balanced Hybrid Routing Protocol for Underwater WSNs," in
Network-Based Information Systems (NBiS), 2015 18th International
Conference on, pp. 20-25, 2-4 Sept. 2015
[4] Shah, M.; Javaid, N.; Imran, M.; Guizani, M.; Khan, Z.A.; Qasim,
U., "Interference Aware Inverse EEDBR protocol for Underwater
WSNs," in Wireless Communications and Mobile Computing
Conference (IWCMC), 2015 International , vol., no., pp.739-744,
24-28 Aug. 2015
[5] Zhiping Zheng; Shengbo Hu, "Research challenges involving cross-
layered communication protocol design for underground WSNs," in
Anti-counterfeiting, Security and Identification, 2008. ASID 2008. 2nd
International Conference on, pp. 120-123, 20-23 Aug. 2008
[6] Felamban, M.; Shihada, B.; Jamshaid, K., "Optimal Node
Placement in Underwater Wireless Sensor Networks," in Advanced
Information Networking and Applications (AINA), 2013 IEEE 27th
International Conference on , vol., no., pp.492-499, 25-28 March
2013
[7] Khalfallah, Z.; Fajjariz, I.; Aitsaadiz, N.; Langar, R.; Pujolle, G.,
"2D-UBDA: A novel 2-Dimensional underwater WSN barrier
deployment algorithm," in IFIP Networking Conference (IFIP
Networking), 2015 , vol., no., pp.1-8, 20-22 May 2015
[8] Amruta, M.K.; Satish, M.T., "Solar powered water quality
monitoring system using wireless sensor network," in Automation,
Computing, Communication, Control and Compressed Sensing
(iMac4s), 2013 International Multi-Conference on , vol., no.,
pp.281-285, 22-23 March 2013
[9] Parmar, J.K.; Mehta, M., "A cross layered approach to improve
energy efficiency of underwater wireless sensor network," in
Computational Intelligence and Computing Research (ICCIC),
2014 IEEE International Conference on , vol., no., pp.1-10, 18-20
Dec. 2014
[10] Dohare, Y.S.; Maity, T.; Paul, P.S.; Das, P.S., "Design of
surveillance and safety system for underground coal mines based
on low power WSN," in Signal Propagation and Computer
Technology (ICSPCT), 2014 International Conference on , vol.,
no., pp.116-119, 12-13 July 2014
[11] Jin-ling Song; Heng-wei Gao; Yu-jun Song, "Research on
Transceiver System of WSN Based on V-MIMO Underground Coal
Mines," in Communications and Mobile Computing (CMC), 2010
International Conference on , vol.2, no., pp.374-378, 12-14 April
2010
[12] Longsheng Liu; Yue Li; Zhijun Zhang; Zhenghe Feng; Wenming
Li; Da Zhang, "Experiment on underground propagation
characteristic using CC110-based WSN," in Antennas and
Propagation Society International Symposium (APSURSI), 2013
IEEE , vol., no., pp.1922-1923, 7-13 July 2013
[13] Xu Huping; Wu Jian, "Metal mine underground safety monitoring
system based on WSN," in Networking, Sensing and Control
(ICNSC), 2012 9th IEEE International Conference on , vol., no.,
pp.244-249, 11-14 April 2012
[14] Cammarano, A.; Spenza, D.; Petrioli, C., "Energy-harvesting
WSNs for structural health monitoring of underground train
tunnels," in Computer Communications Workshops (INFOCOM
WKSHPS), 2013 IEEE Conference on , vol., no., pp.75-76, 14-19
April 2013
[15] Guofang Dong; Bin Yang; Yang Ping; Wenbo Shi, "A secret
handshake scheme for mobile-hierarchy architecture based
underground emergency response system," in Advanced
Communication Technology (ICACT), 2015 17th International
Conference on , vol., no., pp.54-58, 1-3 July 2015
[16] Li Rong, "A study of the security monitoring system in coal mine
underground based on WSN," in Communication Software and
Networks (ICCSN), 2011 IEEE 3rd International Conference on ,
vol., no., pp.91-93, 27-29 May 2011
[17] Li Zhang; Xunbo Li; Liang Chen; Sijia Yu; Ningcong Xiao,
"Localization system of underground mine trackless facilities based
on Wireless Sensor Networks," in Mechatronics and Automation,
2008. ICMA 2008. IEEE International Conference on , vol., no.,
pp.347-351, 5-8 Aug. 2008
[18] Anguita, D.; Brizzolara, D.; Ghio, A.; Parodi, G., "Smart Plankton:
a Nature Inspired Underwater Wireless Sensor Network," in
Natural Computation, 2008. ICNC '08. Fourth International
Conference on , vol.7, no., pp.701-705, 18-20 Oct. 2008
[19] Ghelardoni, L.; Ghio, A.; Anguita, D., "Smart underwater wireless
sensor networks," in Electrical & Electronics Engineers in Israel
(IEEEI), 2012 IEEE 27th Convention of , vol., no., pp.1-5, 14-17
Nov. 2012
[20] Misra, S.; Ghosh, A., "The effects of variable sound speed on
localization in Underwater Sensor Networks," in Australasian
Telecommunication Networks and Applications Conference
(ATNAC), 2011 , vol., no., pp.1-4, 9-11 Nov. 2011
[21] Yong-sheng Yan; Hai-yan Wang; Xiao-hong Shen; Fu-zhou Yang;
Zhao Chen, "Efficient convex optimization method for underwater
passive source localization based on RSS with WSN," in Signal
Processing, Communication and Computing (ICSPCC), 2012 IEEE
International Conference on , vol., no., pp.171-174, 12-15 Aug.
2012
[22] Uribe, C.; Grote, W., "Radio Communication Model for
Underwater WSN," in New Technologies, Mobility and Security
(NTMS), 2009 3rd International Conference on , vol., no., pp.1-5,
20-23 Dec. 2009
[23] Abdou, A.A.; Shaw, A.; Mason, A.; Al-Shamma'a, A.; Cullen, J.;
Wylie, S., "Electromagnetic (EM) wave propagation for the
development of an underwater Wireless Sensor Network (WSN),"
in Sensors, 2011 IEEE , vol., no., pp.1571-1574, 28-31 Oct. 2011
[24] Stuntebeck, E.P.; Pompili, D.; Melodia, T., "Wireless underground
sensor networks using commodity terrestrial motes," in Wireless
Mesh Networks, 2006. WiMesh 2006. 2nd IEEE Workshop on , vol.,
no., pp.112-114, 25-28 Sept. 2006
[25] G. Horvat, D. Šoštarić and D. Žagar, "Using radio irregularity for
vehicle detection in adaptive roadway lighting," MIPRO, 2012
Proceedings of the 35th International Convention, Opatija, 2012,
pp. 748-753.
[26] A. Alsayyari, I. Kostanic and C. E. Otero, "An empirical path loss
model for Wireless Sensor Network deployment in an artificial turf
environment," Networking, Sensing and Control (ICNSC), 2014
IEEE 11th International Conference on, Miami, FL, 2014, pp. 637-642
[27] Dadić, Ž., Pregled kvalitete pitke vode u Hrvatskoj (Overview of
Drinking Water Quality in Croatia), Priručnik o temeljnoj kakvoći vode u
Hrvatskoj (Handbook on Basic Water Quality in Croatia), WaterLine,
Quality water systems.
Electrical Field Intensity Model on the Surface of
Human Body for Localization of Wireless Endoscopy
Pill
Bojan Lukovac, Ana Koren, Antonija Marinčić, Dina Šimunić
University of Zagreb, Faculty of Electrical Engineering and Computing
Unska 3, 10000
Zagreb, Croatia
E-mail(s): bojan.lukovac@ericsson.com, {ana.koren, antonija.marincic, dina.simunic}@fer.hr
Abstract—Medical science is in a constant struggle to develop
new, more advanced methods of medicine application and to
reduce the invasiveness of a range of tests and procedures.
To achieve this ambitious goal, it is necessary to find,
develop and apply new methods and technologies. The issue
of localizing a range of devices inside the human body is one
of them. A solution to this problem would enable the upsurge
of nanorobots (nanobots) for micro-operations, the
application of strong medicine to limited areas to lessen its
side effects, the conducting of nanotoxicology reports or
reducing the discomfort of some invasive procedures like
endoscopy. In this paper the focus is on the latter, and an
example of endoscopy with a pill is given. To maximize the
effectiveness of pill-based endoscopy, the technology used
for its localization needs to be optimized. The SEMCAD tool
is used for modelling the scenario and executing the
simulations.

Keywords— WBAN; Electrical Field; Dipole; Endoscopy;
Localization; SEMCAD
I. INTRODUCTION
methods of medicine application and reduction of the
invasiveness of a range of tests and procedures , one of which
is endoscopy. Endoscopy is a procedure used for viewing the
internal organs and vessels of a patient’s body without
making incisions. It is a common and powerful diagnostic
tool for digestive diseases . Classical endoscopy is described
as an extremely unpleasant and painful procedure which
doesn’t allow the exploration of parts of digestive tract (i.e.
small intestine). The endoscopy, in which a pill is swallowed
by the patient, travels through digestive system and
documents it by taking photographs already exists, however
it hasn’t replaced the classical method. This is due to it being
less exact as the localization of the pill (and any device in
general) inside of human body is not precise enough due to
several challenges. Current procedure of diagnosis via
endoscopy pill consists of the doctor manually viewing all
the images provided by the pill, which can sometimes be u p
to 100.000 images, depending on the manufacturer.
Currently there is no automatic filtering of non-relevant
images. When the image where there’s an indication of
potential problems is found, the difficulty of deducing the
exact location in the digestive tract presents itself, using
solely the surrounding images. Thus, the process is not
precise enough to use in any operative procedures and it is
usually necessary to repeat the endoscopy with classical
methods in order to confirm the diagnosis. Localization of
the endoscopy pill implies determining its location, whether
in real time or retroactively, with respect to a reference point.
The solution is expected to be portable and not to require
hospitalization of the patient for the duration of the procedure
(the time necessary for the pill to travel through the digestive
tract is approximately 12 hours). Furthermore, the human
body consists of bones and tissue which are arranged
unevenly. Tissue has different density in different locations
and is heterogeneous, which is why the issue of precise
localization inside the human body remains open. In the next
chapter (Chapter 2), the explanation of the selected antenna
model is given, followed by the simulation and results of the
model using SEMCAD (Chapter 3). Finally, conclusions are
drawn in Chapter 4.
II. PROPOSED ANTENNA MODEL
Endoscopic capsules are medical diagnostic devices,
32 mm long and 11 mm in diameter, consisting of a camera
module, battery, antenna and transmitter, along with
additional sensors that depend on the manufacturer. They
were developed in order to reduce the discomfort inherent to
traditional endoscopic procedures and to allow the detection
of medical issues of the otherwise unobservable small
intestine. The design of the antenna is an iterative process
which demands multiple calculations, with the goal of
optimizing all the relevant parameters for operation at the
specified frequency. The antenna contained in the pill used
in the simulations is modelled in the SEMCAD tool as a
simple dipole antenna, consisting of two cylindrical parts
with a diameter of 1 mm, a length of 10 mm and a spacing of
1 mm. The antenna in the simulations is assumed to be a
perfect electric conductor (PEC) in order to simplify the
experiment. As shown in Figure 1, the antenna is sheathed
with a dielectric cylinder to protect it from contact with the
surrounding tissue. The material used for the cylinder is
Rogers TMM-10 substrate, which has an electrical
conductivity of 0.00087 S/m, a relative permittivity of 9.2
and a density of 1000 kg/m3.
Figure 1. Isometric view of the antenna modelled in
SEMCAD
The model of the pill is placed at several locations in the
body, though the majority of the measurements are executed
at one of three locations.
III. SIMULATION AND RESULTS
For the simulations, the SEMCAD (Sim4Life) software
has been used, namely its most detailed model of the human
body, Duke, a 36-year-old man. The first location of the
antenna is in the large intestine, in a position surrounded by
the pelvic bones (Figure 2a). The second location is the lumen
of the small intestine in the proximity of the navel, on the front
part of the abdomen, which is a location that enables us to
observe the difference in attenuation between the front part of
the body (with just a thin layer of muscle, fat and skin) and
the back part (where attenuation is caused by intestine,
organs, spine, muscles, fat and skin). This is shown in
Figure 2b. The final location measured is the middle of the
stomach lumen, which is characterized by the presence of
the rib cage and multiple organs which significantly
influence the strength of the signal (Figure 2c). These
locations are selected as they represent well the various
conditions inside the human body. Furthermore, in these
three locations, measurements have been performed with a
harmonic signal and a Gaussian impulse; the pill has been
rotated parallel to all three axes, and measurements in the
space diagonal have been performed. The intensity of the
electrical field and the Poynting vector have been measured.
The Poynting vector was selected as an additional method for
practical reasons.
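For orientation, the quantity compared here can be sketched under a locally-plane-wave, lossless-tissue simplification; the muscle permittivity value is an approximate literature number, and none of this reproduces the SEMCAD computation.

# Minimal sketch: time-averaged Poynting magnitude from a local E-field
# amplitude, assuming a plane wave and lossless tissue (simplification).
import math

def poynting_avg(e_amp, eps_r):
    # |S| = |E|^2 / (2 * eta), with eta = 377 / sqrt(eps_r) [ohm]
    return e_amp ** 2 / (2.0 * 377.0 / math.sqrt(eps_r))

print(f"{poynting_avg(10.0, 52.7):.2f} W/m^2")  # 10 V/m in muscle-like tissue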
Figure 2. Locations of the antenna marked with a red circle:
pelvic bones (2a), lumen of small intestine (2b) and middle
of stomach lumen (2c)
The expected radiation diagram of a dipole antenna in free
space is shown in Figure 3. Areas in the red part of the
spectrum present lower relative power, while the green areas
present higher relative power. All the simulations have been
performed with the antenna transmitting at a frequency of
2.4 GHz, with the source characteristics being a voltage of
5 V and an internal resistance of 50 Ω.
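The "doughnut" shape in Figure 3 follows the standard textbook pattern of a thin finite-length dipole; the sketch below evaluates that closed-form expression for the 21 mm antenna at 2.4 GHz (it is not a SEMCAD result).

# Minimal sketch: normalized far-field pattern of a thin finite-length
# dipole in free space, using the standard textbook expression.
import numpy as np

k = 2 * np.pi * 2.4e9 / 3e8   # wavenumber [1/m]
L = 0.021                     # total dipole length [m] (2 x 10 mm + 1 mm gap)

theta = np.radians(np.arange(5, 176, 5))
F = (np.cos(k * L / 2 * np.cos(theta)) - np.cos(k * L / 2)) / np.sin(theta)
F = np.abs(F) / np.abs(F).max()
print(f"maximum at theta = {np.degrees(theta[F.argmax()]):.0f} deg")  # broadside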
Figure 3. Diagram of dipole radiation in free space

The following figures (Figures 4a and 4b) show the legend
for the subsequent simulations (Figures 5 through 7).

Figure 4. Reference values of electrical field intensity (left)
and Poynting vector value (right); legend applicable for all
simulations

Furthermore, the radiation of the dipole, as well as the
position (localization) of the pill, can be observed in the
following figures. In Figures 5a and 5b it is clear that the
intensity of the electrical field suffers from strong attenuation
caused by the pelvic bones.

Figure 5. Electrical field intensity (front and side view)

As an additional method, the Poynting vector was measured.
The results for the same scenario are given in Figures 5c and
5d below.

Figure 5. Poynting vector value in the first scenario (front
and side view)
This leads to the following conclusions. First, the electrical
field intensity is less indicative than the value of the Poynting
vector, due to its lesser loss of intensity inside the body.
Secondly, the area in the pelvic region strongly affects the
radiation of the dipole antenna; the pelvic bones cause
reflection and attenuation, which makes the radiation
diagram start to look almost isotropic. The results of the
second simulation scenario are presented in Figure 6.
Figure 6. Poynting vector value in the second scenario (side
view)
Again, several conclusions can be drawn. The proximity of
the radiation source to the skin causes propagation of the
electromagnetic wave through the skin and distorts the
radiation diagram. Organs, bones and tissue cause the
expected loss of wave strength in all other directions. The
surroundings have a much larger influence on the radiation
diagram than the orientation of the pill. Thus, precise
localization of the pill would require an algorithm which
would define measurements at a large number of locations on
the body, so that it would be possible to eliminate the
influence of the wave propagation in the skin. The results of
the third and last scenario are given below in Figures 7a
and 7b.
Figure 7. Electrical field intensity and Poynting vector value
in the third scenario

The stomach has proved to be the best location for all tests
of the dipole antenna, since the received result has the
expected doughnut diagram of dipole antenna radiation, most
likely due to the homogeneous nature of the stomach lumen
where the antenna is located. The importance of this result is
great, since recognition of the expected value potentially
enables the assessment of the pill's orientation.

Closer analysis of the radiation gives an insight into the
specific interaction of the wave and the skin and is shown in
Figure 8. In the area marked with a red circle, it is clearly
visible how weak the signal is in the region between the pill
and the skin (in line with expectations), but the signal then
greatly increases through the skin surface, due to the
refractive index contrast between air and skin. Thus, it would
be advisable to perform the measurements at approximately
1 cm distance from the skin surface.

Figure 8. Radiation diagram distortion on the skin surface

IV. CONCLUSION
The issue of localizing a range of devices inside the human
body is extremely relevant in today's medical science, as a
solution to this problem would enable the upsurge of
nanorobots (nanobots) for micro-operations, the application
of strong medicine to limited areas to lessen its side effects,
the conducting of nanotoxicology reports or reducing the
discomfort of some invasive procedures, endoscopy
included. Several conclusions were drawn from the
performed simulations in SEMCAD. The greatest impacts on
the propagation of the electromagnetic wave inside the
human body come from the transitions between bones and
other tissue and the transitions between skin and air. The low
electrical conductivity and density of the bones have the
largest effect on the wave propagation, having in mind the
ratio of bone volume to total body volume. The choice of
antenna is of crucial importance: the dipole antenna sheathed
in a Rogers TMM-10 substrate cylinder gives realistic
radiation diagrams. In some of the simulated scenarios
(locations) the parameters of the environment (surrounding
area) impact the radiation diagram severely, to the point
where it is impossible to draw conclusions about the pill's
position or orientation. Thus, it is necessary to develop
algorithms which would take into account the environment
and eliminate its effects (e.g. wave propagation through the
skin) in
order to ensure precision. Furthermore, measurements
should be conducted without skin contact: the refractive
index discontinuity between skin and air deforms the
radiation diagram and makes the assessment of the position
and orientation of the antenna more difficult. Hence, it is
advisable to move the measuring instrument at least 1 cm
away from the skin surface to avoid diagram deformation.
Lastly, the graphs of the electrical field intensity and of the
Poynting vector both describe similar radiation diagrams.
However, since measuring the received power (which is
directly proportional to the Poynting vector) with a sensor is
by far easier and more practical than measuring the electrical
field intensity, it is advisable to use measurements of the
Poynting vector (i.e. received power) for calculations.
Wide band current transducers in power measurement
methods - an overview
Roman Malarić, Željko Martinović *, Martin Dadić, Petar Mostarac, Žarko Martinović **
Faculty of Electrical Engineering and Computing/
Department of Electrical Engineering, Fundamentals and Measurements
Unska 3, 10000 Zagreb, Croatia
*COMBIS / Baštijanova 52A, Zagreb, Croatia
**Danieli Systec, Katuri 17, Labin, Croatia
e-mail: roman.malaric@fer.hr, zeljko.martinovic@combis.hr, martin.dadic@fer.hr, petar.mostarac@fer.hr,
z.martinovic@systec.danieli.com
Precise power measurement is usually done at national
metrology institutes. In recent years researchers have devoted
substantial time to the measurement of power under
non-sinusoidal conditions, especially in power grids, because
of the increase in renewable energy sources. The challenge is
not only to measure power at 50 Hz, but also to measure
harmonics and interharmonics. To measure power at the
needed accuracy, accurate and frequency-independent current
and voltage transducers are needed up to 100 kHz. In this
paper an overview and the state of the art are given for the
current transducers used for this purpose. This mainly
includes AC shunts of coaxial design, calculable resistors,
precise transformers and methods for their characterization.
I. INTRODUCTION
In order to obtain power and energy, the currents must be
transformed into voltages suitable for analog-to-digital
conversion (usually 1 V RMS); precise current shunts or
shunt/transformer combinations are used for this purpose.
This is particularly important for the measurement of power
and energy under distorted waveform conditions. Current
shunts therefore represent one of the most important parts of
a power measuring system (Fig. 1).
Fig. 1. Precise power measurement system: U/I source →
shunts / dividers → A/D conversion → algorithms → calculation
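The role of the transducers in Fig. 1 can be illustrated by a sampling sketch; the waveforms and the 0.8 Ω shunt value are synthetic stand-ins for digitized divider and shunt outputs, not values from any cited design.

# Minimal sketch of the chain in Fig. 1: the shunt converts current to
# voltage, both channels are sampled, and active power is the mean of
# u*i over an integer number of mains periods. Waveforms are synthetic.
import numpy as np

fs, f0, N = 10_000, 50.0, 2000   # sample rate [Hz], mains [Hz], 10 periods
t = np.arange(N) / fs
R_shunt = 0.8                    # shunt resistance [ohm]

u = 230 * np.sqrt(2) * np.sin(2 * np.pi * f0 * t)           # voltage [V]
v_sh = 0.8 * np.sqrt(2) * np.sin(2 * np.pi * f0 * t - 0.3)  # shunt output [V]
i = v_sh / R_shunt               # reconstructed current [A]

print(f"P = {np.mean(u * i):.1f} W")   # ~219.7 W (power factor cos 0.3)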
II. CURRENT SHUNTS
Coaxial shunts have a better frequency behavior than shunts
which are not coaxially connected [1], and smaller AC-DC
current transfer differences up to 30 kHz or even 100 kHz.
Shunt designs have mostly been based on work done at the
D. I. Mendeleyev Institute for Metrology (VNIIM), but there
have been a few exceptions and different designs. The most
often used resistors are of the low-inductance metal foil type.
The NRC [2] shunts for currents up to 200 mA are a star
configuration of resistors soldered on a single-sided printed
circuit board. For the current ranges from 0.5 to 10 A, the
shunts are of the Mendeleev
type, built of three plates connected by a number of ribs. The
plates and the ribs are cut from double-sided copper-clad
boards. The shunt resistor consists of a number of Vishay
type S102C metal film resistors, concentrically mounted
between the output and the middle plates, and connected in
parallel by the copper layers of these plates. The input plate
is separated from the two output plates by the length of the
ribs, which minimizes the inductive coupling between the
input and the output circuits. This design is similar to the
SP-type shunts [3][4], which are designed from 50 mA to
100 A, the CSIRO shunts [5] and the SIQ shunts [6], which
are of coaxial type ranging from 100 mA to 20 A, with a
frequency range from 10 Hz to 30 kHz. Their input and
output sides are separated by 17-cm-long crossbars that are
made of double-sided printed circuit boards. The number of
crossbars and the diameter of the circular plates depend on
the current shunt's nominal current. The CMI shunts [7] are
also of the cage type, designed to be well suited for use with
planar multi-junction thermal converters (PMJTCs). The
voltage drop across every shunt (30 mA to 10 A) in parallel
with a 90 Ω PMJTC is 1 V at nominal current. A calculable
model was developed for these shunts using lumped circuit
elements. Temperature coefficients were reduced by using a
suitable combination of resistor types (S102C, S102K and
Z201). A new generation of shunts was produced based on
the results obtained from the analysis of the lumped element
model of the original shunt [8]. At BEV [9]
a manganin foil is wound as the resistive element around a
cylindrical core made of glass fiber epoxy. One copper foil on
the outside brings the low potential back to the current input;
another copper foil underneath the manganin brings the high
potential from the input to the output side. All foils are
soldered onto copper discs which are mechanically connected
to the core. This principle applies for shunts in the range from
2 A to 100 A, except that for shunts from 2 A to 10 A the core
is made of a brass tube instead of copper foil. Shunts from
100 mA to 1 A are made with metal film resistors soldered in
parallel in a 4-terminal arrangement, with a small loop in both
the current and the potential path. The output voltage at
nominal current is 0.4 V. The INRIM [10] current shunts
cover the range
from 5 mA to 20 A with a nominal voltage of 1 V, to match
the input voltage of the planar multi-junction thermal
converter (PMJTC). The shunts with values from 5 mA to
100 mA have a single SMD resistor connected between the
central
conductor and a return wire at the external coaxial screen.
Shunts up to 2 A are made with a disk of double-sided
printed circuit board with the conductive layer removed from
one side except at the edge; on the other side, a ring is
machined out of the conductive layer. The conductive parts
and the external edges connecting the two sides are coated
with gold, with 10 SMD resistors soldered across the
insulating ring. For ranges up to 20 A the disks are made
of a thermally conductive ceramic material 10 mm thick with
a diameter of 80 mm. The resistive parts are made with
manganin foils that are fixed by a thermally conductive
adhesive resin. The foils are then patterned to the proper
shape. At JV [11], a new type of shunt for AC-DC current
shape. At JV[11], a new type of shunt for ac–dc current
transfer was developed for frequencies from 10 to 100 kHz
and current ranges of 30 mA–10 A with uncertainties smaller
than ±9 μA/A using surface-mount resistors of the cylindrical
metal-electrode face-bonding (MELF) type which are cheap
and easily available with temperature coefficients below 10
μΩ/Ω per K. The shunts have very low and calculable ac–dc
current transfer differences in the frequency range of 10 Hz–
100 kHz. INTI [12] presented three different shunt designs
for AC-DC current transfer which has been evaluated. The
nominal current values are 5 A and 10 A with nominal output
of 1 V. One of the shunt is similar to cage type design and
two other differ from known designs, however the paper
lacks the comparison results. Commercially available shunts
include Fluke A40B series, Guildline 7340 and 7350 series,
Transmille, and some of the already mentioned in this paper
from National Metrology Institutes. A new AC current shunt
was also designed at NIM that uses minalpha wires instead of
commercial available resistors [13]. Shunt showed good
results up to 100 kHz.
A. Use of transformers to lower current level
Sometimes current transformers are used to lower the currents to acceptable levels and are then burdened with low-current shunts. INMETRO [14] developed a special two-stage transformer, which is coupled to a standard shunt of 10 Ω. At NIM [15] a set of shunts is used for current ranges of 0.1, 0.2, 0.5, 1, 2, 5, 10, and 20 A, while a current transformer is adopted for 50 A. The full-scale output signals of the voltage dividers and the shunts are 0.8 V regardless of the voltage/current range; these are the input signals of the corresponding two DVMs. INTI [16] developed current transformers with a 100 mA output value which are used together with a 10 Ω AC shunt to obtain a 1 V output. In the PTB primary AC power standard [17], current transformers are likewise used together with AC shunts. The INRIM [18] current-to-voltage converter consists of a double-stage current transformer with a precision 1 Ω AC resistor connected to its output.
B. High current shunts
Recent developments include the realization of 100 A shunts with considerably improved uncertainty, so that a current transformer is no longer needed up to 100 A: the shunt in [19] has an AC-DC difference below 20 μA/A up to 100 kHz, while the one in [4] has an AC-DC difference of -36 μA/A up to 100 kHz. These are designed in a similar manner to the lower-value
shunts, using the Mendeleev cage type. NRC high-current shunts [20] have been produced in cooperation with BEV. Each shunt consists of three coaxial cylinders mounted between four copper plates on a glass fiber epoxy cylinder. The middle cylinder, the resistor of the shunt, is manufactured from a thin manganin foil. The AC-DC difference is lower than 100 μA/A at 100 kHz. The BEV Institute [21] also manufactures the high-current TEE connector needed for calibration of these shunts. There are also some commercial shunts, like the Guildline 7340 and Fluke A40B, that are available up to 100 A.
C. Calculable AC-DC resistors
The AC-DC difference of shunts can be obtained by modelling the shunts with lumped elements and calculating the frequency response, by calibrating the shunts with an AC bridge (which requires a calculable resistance standard), or by comparing them against thermal voltage transfer standards. Several types of calculable ac/dc resistors have been presented in the literature: coaxial, bifilar, quadrifilar and octofilar designs. The time constant for coaxial and bifilar resistors was analyzed in [22]. Usually a simple geometrical
structure is used. As for more recent research, a calculable coaxial resistor was designed [23] based on a coaxial line with a cylindrical shield, which can be described by relatively simple equations for the real and imaginary parts of the impedance. The resistor consists of a straight Evanohm resistance wire surrounded by a 51 mm diameter coaxial brass case, with a nominal value of 1000 Ω. However, the resistor was evaluated, with good results, only at the single frequency of 1592 Hz. An inter-comparison [24] of calibration systems for AC shunts up to audio frequencies (10 kHz) between NRC, JEMIC, and NIST is presented. The comparison was implemented with a calculable transfer ac/dc shunt designed by JEMIC, at frequencies up to 10 kHz in each laboratory. Each laboratory uses for this purpose its own calculable AC-DC resistors and AC bridges. The NRC uses two calculable ac/dc quadrifilar resistors of 100 and 1000 Ω described by Gibbings [25]. JEMIC [26] and NIST [27] compare against a 0.1 Ω bifilar reference current shunt with calculable amplitude and phase response. A new design for calculable coaxial resistors [28] with values of 1 kΩ, 10 kΩ and 12.906 kΩ has been modeled using the Mathcad computer program and evaluated. Also, new coaxial resistors [29] for use with quantum Hall effect-based impedance measurements were designed at PTB and CMI. The improvement compared with existing resistors is the easier removal of the inner resistive wire from the shield.
D. AC Shunt modelling
Shunts differ from calculable AC-DC resistors in that they are not as simple to model, having many elements such as ribs, plates and resistors. Some of the shunts were just calibrated using thermal voltage converters, some have been evaluated using simple models, and some have been modeled using detailed equivalent schemes. A calculable model [30] for the Mendeleev-type shunt was developed using lumped circuit elements, which can be used to calculate the transimpedance, ac–dc difference, and phase angle error of a shunt. It is based only on calculations of all component
values from the geometry and material properties (except
resistors). Compared with measured results, the difference is less than 6 μΩ/Ω in the AC-DC difference and 110 μrad in the phase angle error at frequencies up to 100 kHz. The SIQ shunts [31] model was made using lumped element modeling; the analytical transfer function and input impedance of the shunt were then derived from the model and used to calculate the frequency response and input impedance as functions of frequency. These calculations were then compared to the calibrated values of the AC/DC difference and against the measured input impedance. A simplified model of the BEV foil shunts [32] has been presented. The simplified circuit of the INRIM shunts [10] has also been presented, but without mathematical modeling. NRC shunts [2] have been modeled assuming that the
shunt consists of several identical components. The input and
output front plates were modeled as R−C−R T-networks, the
ribs as (R + L−C−R + L) T-networks, and the shunt resistor
as (R + L)−C, assuming that the shunt is a two-terminal-pair device. The model, even though too imprecise for a theoretical characterization of a “calculable” shunt, was used in selecting the length of the ribs. Longer ribs increase the distance between the input and the output, thus reducing the coupling between the two circuits, which is a major source of the shunt ac–dc difference. Shorter ribs, which were eventually used in the design, decrease the internal capacitance of the shunt. The number of ribs and resistors is again a compromise: increasing the number of ribs increases the nominal power rating of the shunt, but also increases the internal capacitance. JV shunts
[11] are modeled using lumped circuit elements. The model
includes approximations to the parasitic reactive and resistive
components from the chosen shunt geometry. Component
values are measured and/or calculated from the geometry and
material properties. Finally, an important paper is the analysis of the shunt–thermal converter (TVC) combination [33]. This paper describes the relationships between the overall ac-dc difference of a shunt–thermal converter combination and the differences of its components.
E. Measurement of AC-DC current difference of shunts

The AC-DC difference is defined as:

    δ_AC-DC = (V_AC − V_DC) / V_DC        (1)
The AC-DC current transfer difference is the most important characteristic of an AC shunt and can be measured in three ways: by inter-comparison with a reference shunt, by calibration against thermal voltage standards, or by measurement with a precise AC bridge. For example, SIQ shunts have been compared to current shunts [6] with a known AC-DC difference. The AC-DC difference was less than 20 μA/A for frequencies up to 30 kHz and currents up to 20 A. However, to properly characterize current shunts it is necessary to compare them with thermal voltage converters, which have a flat AC-DC characteristic up to 100 kHz. At CMI [34] a step-up procedure is used to measure the AC-DC current transfer difference from 1 mA up to 10 A in the frequency range 10 Hz – 100 kHz by comparing it with the PMJTC 10 mA AC standard. The AC-DC difference was less than 50 μA/A for all frequencies and currents. The same method was also used to determine the frequency characteristics of NRC shunts [20]. The AC-DC current difference for the in-house built shunts from 100 mA to 10 A is less than 100 μA/A up to 100 kHz, but for the commercial shunts from 30 A up to 100 A the AC-DC difference is considerably larger, particularly at frequencies above 10 kHz. JV shunts [11] were evaluated for the AC-DC difference using PMJTCs in a digital bridge setup for comparing ac–dc current differences, similar to the system described by Rydler [35]. A complete step-up from 10 mA to 10 A was made, where the unknown shunt/PMJTC for the next higher current is calibrated at the current level of the known converter and then used at its rated current. At 10 mA, the shunt/PMJTC was calibrated directly against the primary reference for the ac–dc current difference. The other steps were 30 mA, 100 mA, 300 mA, 1 A, 3 A, 5 A, and 10 A. The AC-DC difference was determined to be less than 20 μΩ/Ω for currents from 50 mA to 10 A and frequencies up to 100 kHz. The frequency characteristic of a shunt–thermal converter combination depends on the frequency characteristic of the shunt [2], the frequency characteristic of the thermal converter, and the mutual inductance between the shunt and the thermal converter, which can be neglected for this particular shunt design. With some simplifications, the AC-DC transfer difference of a shunt–thermal converter combination δi can be presented [32] as follows:

    δ_i ≈ [R_TVC / (R_S + R_TVC)] · δ_V + [R_S / (R_S + R_TVC)] · δ_I + δ_R        (2)

from which the AC-DC current difference for shunts can be calculated. In addition, the AC-DC difference of current shunts can be evaluated using different AC bridges and resistors with calculable AC response [36].
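To make the use of Eq. (2) concrete, a minimal numerical sketch in Python follows; every component value and difference below is a hypothetical example, not data from any of the cited shunts:

    # Numerical sketch of Eq. (2); all values are hypothetical examples.
    R_S = 0.8        # shunt resistance in ohms (example)
    R_TVC = 90.0     # thermal converter (PMJTC) input resistance in ohms
    delta_V = 2e-6   # AC-DC voltage difference of the thermal converter
    delta_I = 5e-6   # AC-DC current difference of the shunt
    delta_R = 1e-6   # AC-DC difference of the resistance term

    # Eq. (2): resistance-weighted combination of the individual differences.
    w = R_TVC / (R_S + R_TVC)
    delta_i = w * delta_V + (1.0 - w) * delta_I + delta_R
    print(f"combined AC-DC difference: {delta_i * 1e6:.2f} uA/A")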
F. Phase angle error
The phase angle error calculated from the model can be validated by measurement, even though this is a difficult task, as the shunts are designed to have very low phase angle errors. Most commercial shunts do not even specify the phase angle error. Measurements of shunt phase angle errors have been performed [37] in various ways at the SP research center. Four-terminal inductances are determined using an LCR meter and calculable inductance standards, and the calculated phase angle errors are then verified at frequencies up to a few kHz by comparison with a digital sampling wattmeter [38] for shunts from 0.05 A to 10 A. Later [39] this was extended to 1 MHz using a phase comparator comprising fast and accurate digitizers. The agreement between the different methods, for measurement of the phase angle difference of two shunts, was within a few μrad at 1500 Hz. Phase angle errors of shunts [40] with rated currents of 50 and 100 A at frequencies from 25 to 100 kHz have been determined using a three-branch binary inductive current divider, which measures the phase angle errors of high-current shunts against a phase angle reference standard in only one step. Later [41] this was extended to 200 kHz. A new method [42] was described to determine the
phase angle errors of ac shunts by measuring the inductance
and distributed capacitance. For this purpose several units have been developed: a 1 Ω shunt of coaxial design as the time constant standard, a coaxial inductor with a structure identical to the time constant standard, and a four-terminal mutual inductor for the measurement of the inductance of the time constant standard.
In [43] a method is proposed to determine the phase angle of shunts using a group of micropotentiometer resistors designed at CSIRO, whose phase angle error can be described, to a first approximation, by a formula. With the build-up process the phase angle can be determined for currents from 100 mA up to 20 A and frequencies from 40 Hz to 200 kHz. A method addressing the level dependence of the phase angle error has been proposed [44], because in build-up procedures the unknown shunt for the next higher rated current is calibrated at the current level of the known shunt and then used at its rated current. A wideband phase comparator has been developed at INRIM for high-current shunts [45]. The two-input digital phase detector is realized with a precision wideband digitizer connected through a pair of symmetric active guarded transformers to the outputs of the shunts under comparison. The system is suitable for comparing shunts in a wide range of currents, from several hundred milliamperes up to 100 A, and frequencies between 500 Hz and 100 kHz. The system has been used for an international comparison of current shunts [46] from 10 A to 100 A.
G. DC characterization of shunts
To properly characterize AC shunts and calculate the measurement uncertainty contribution of shunts in power measurement, in addition to the phase angle errors and AC-DC difference it is also necessary to evaluate shunt performance in terms of the power coefficient (level dependence), the temperature coefficient and the drift of the shunt's DC resistance.

The level dependence of the step-up procedure has been evaluated [47]. The step-up procedure is based on the assumption that the ac–dc difference of a shunt and thermal converter combination is the same at the two current levels used in the step-up procedure; violations of this assumption produce systematic errors that are added in each step and can be considerable at the end of the chain. The authors argue that, because of their design, an appreciable low-frequency level dependence was not expected in the shunts, but rather in the PMJTCs. In this paper the TEE connector used to connect two shunts has been analyzed as well. The level dependence of two different shunts has also been evaluated [48], in order to reduce the contribution of the current level dependence in the uncertainty budget.
The temperature coefficients (TCRs) of the shunts and their drift were evaluated in [6]. The TCR was measured in a temperature chamber, where the temperature was set to 23 °C, 18 °C, and 28 °C. This temperature change was large enough to determine the TCRs at the shunt's working point. The TCRs proved to be linear in the 18 °C–28 °C temperature range within the measurement uncertainty. The temperature rise due to the load current was calculated based on the resistor power coefficient specification. Drift was measured during several-month periods by comparison with a known reference resistor using a direct-current-comparator resistance bridge. Measurements showed that this drift was fairly linear with time and that a linear drift slope could be calculated for each current shunt. The measured drift was from 2.8 to 13.8 μΩ/Ω per year for all shunts.
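As an illustrative sketch of how such a TCR and drift slope can be extracted by least-squares fitting (the readings below are hypothetical, not the measured data of [6]):

    import numpy as np

    # Hypothetical resistance readings at the three chamber set points.
    T = np.array([18.0, 23.0, 28.0])               # temperature in deg C
    R = np.array([0.799994, 0.800000, 0.800007])   # resistance in ohms

    # Linear fit R(T) = R23 * (1 + alpha*(T - 23)); alpha is the TCR.
    slope, R23 = np.polyfit(T - 23.0, R, 1)
    print(f"TCR: {slope / R23 * 1e6:.1f} (uOhm/Ohm) per K")

    # Drift: linear fit of relative deviation versus time (hypothetical).
    t = np.array([0.0, 3.0, 6.0, 9.0, 12.0])           # months
    dev = np.array([0.0, 1.1, 2.0, 3.2, 4.1]) * 1e-6   # relative deviation
    drift_per_year, _ = np.polyfit(t / 12.0, dev, 1)
    print(f"drift: {drift_per_year * 1e6:.1f} uOhm/Ohm per year")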
A complete DC characterization of shunts at the ppm level has been conducted [49]. A set of four working resistance standards (0.1 Ω – 0.001 Ω) and a Fluke 8508A reference multimeter were used for comparison purposes. The system established at CMI, which also includes an oil bath and a thermostatic chamber, was used to characterize foil and cage shunts up to 100 A. A second paper [50] gives a full DC characterization of AC shunts in the range of 30 mA to 10 A. The characterization includes the long-term drift, the temperature coefficient, and the power coefficient, which all appear to have effects at the level of 1-10 μΩ/Ω. The shunt investigated was designed and built by JV [11]. In Table 1, the current and frequency ranges of all shunts manufactured at different national metrology institutes and companies are summarized:
Table 1: Available current shunts

    Manufacturer   Current range
    SIQ            100 mA – 100 A
    NRC            1 mA – 100 A
    INRIM          5 mA – 20 A
    SP             10 mA – 100 A
    CMI            30 mA – 10 A
    JV             30 mA – 10 A
    CSIRO          100 mA – 20 A
    BEV            10 mA – 100 A
    VNIIM          1 A – 10 A
    INTI           5 A – 10 A
    Fluke          1 mA – 100 A
    Transmille     1 mA – 100 A
    Guildline      10 mA – 100 A
III. CONCLUSION
In this paper an overview of wideband current transducers is presented. All available papers presenting the available AC shunts and transformers are included, and commercial AC shunts are also mentioned. A special section presents the methods and procedures for the calibration and characterization of shunts and transformers with respect to their AC-DC difference, phase angle error and DC characterization.
IV. ACKNOWLEDGEMENT
In the name of our research group (Department of Electrical Engineering Fundamentals and Measurements, Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia) I would like to thank the Croatian Science Foundation for their support and motivation. This work has been fully supported by the Croatian Science Foundation under the project Metrological infrastructure for smart grid IP-2014-09-8826.
LITERATURE
[1] M. M. Halawa, Implementation and Verification of Building-up AC-DC
Current Transfer Up to 10 A, International Journal of Electrical Engineering
& technology (IJEET), 2012, pp. 262 – 273
[2] P. S. Filipski, and M. Boecker, AC–DC Current Shunts and System for
Extended Current and Frequency Ranges, IEEE Transactions on
Instrumentation and Measurement, 2006, pp. 1222 – 1227
[3] S. Svensson and K. E. Rydler, A Measuring System for the Calibration
of Power Analyzers, IEEE Transactions on Instrumentation and
Measurement, 1995, pp. 316 – 317
[4] K. E. Rydler and V. Tarasso, Extending ac-dc current transfer
measurement to 100 A, 100 kHz, Precision Electromagnetic Measurements
Digest, 2008, pp. 28 - 29
[5] I. Budovsky, A micropotentiometer-based system for low voltage
calibration of alternating voltage measurement standards, Precision
Electromagnetic Measurements Digest, 1996, pp. 497 - 498
[6] B. Voljc, M. Lindic, and R. Lapuh, Direct Measurement of AC Current
by Measuring the Voltage Drop on the Coaxial Current Shunt, IEEE
Transactions on Instrumentation and Measurement, 2009, pp. 863 – 867
[7] V. N. Zachovalova, AC-DC current transfer difference in CMI, Precision
Electromagnetic Measurements Digest, 2008, pp. 362 – 363
[8] V. N. Zachovalova, M. Síra and P. Bednari, New generation of cage-type current shunts at CMI, 20th IMEKO TC4 International Symposium,
2014, pp. 59 – 64
[9] M. Garcocz, P. Scheibenreiter, W. Waldmann, and G. Heine, Expanding
the measurement capability for AC-DC current transfer at BEV, Precision
Electromagnetic Measurements Digest, 2004, pp. 461-462
[10] U. Pogliano, G. C. Bosco, and D. Serazio, Coaxial Shunts as AC–DC
Transfer Standards of Current, Precision Electromagnetic Measurements
Digest, 2008, pp. 30-31
[11] K. Lind, T. Sørsdal and H. Slinde, Design, Modeling, and Verification
of High-Performance AC–DC Current Shunts From Inexpensive
Components, IEEE Transactions on Instrumentation and Measurement,
2008, pp. 176 - 181
[12] L. D. Lillo, H. Laiz, E. Yasuda and R. García, Comparison of three
different shunts design for AC-DC current transfer, 17th IMEKO TC4
International Symposium, 2004
[13] J. Zhang, X. Pan, H. Huang, L. Wang and D. Zhang, A coaxial AC shunt
with calculable AC-DC difference, Precision Electromagnetic
Measurements (CPEM), 2010, pp. 597 – 598
[14] A. M. R. Franco, E. Tóth, R. M. Debatin, and R. Prada, Development
of a power analyzer, 11th IMEKO TC-4 Symposium, Trends in electrical measurements and instrumentation, 2001, pp. 1-5
[15] L. Zuliang, W. Lei, L. Min, L. Lijuan, and Z. Hao, Harmonic Power
Standard at NIM and Its Compensation Algorithm, IEEE Transactions on
Instrumentation and Measurement, 2009, pp. 180-187
[16] R. Carranza, S. Campos, A. Castruita, T. Nelson, A. Ribeiro, E. So, L.
D. Lillo, A. Spaggiari, D.Slomovitz, D. Izquierdo, C. Faverio, H. Postigo, H.
Díaz, H. Sanchez, J. Gonzalez and Á. Z. Triana, Precision Electromagnetic
Measurements (CPEM 2014), pp. 302-303
[17] E. Mohns, G. Ramm, W.G.K. Ihlenfeld, L. Palafox and H. Moser, The
PTB Primary Standard for Electrical AC Power, MAPAN - Journal of
Metrology Society of India, 2009, pp. 15 - 19
[18] U. Pogliano, Use of Integrative Analog-to-Digital Converters for High-Precision Measurement of Electrical Power, IEEE Transactions on
Instrumentation and Measurement, 2002, pp. 1315 - 1318
[19] B. Voljc, M. Lindic, B. Pinter, M. Kokalj, Z. Svetik, and R. Lapuh, Evaluation of a 100 A current shunt for the direct measurement of AC current, IEEE Transactions on Instrumentation and Measurement, 2013, pp. 1675–1680
[20] P. S. Filipski and M. Boecker, AC-DC current transfer standards and
calibrations at NRC, Simposio de Metrología, Mexico, 2006
[21] M. Garcocz, P. Scheibenreiter, W. Waldmann, G. Heine, Expanding the
measurement capability for AC-DC current transfer at BEV, Precision
Electromagnetic Measurements Digest, 2004, pp. 461-462
[22] H. Fujiki, A. Domae and Y. Nakamura, Analysis of the time constant for
the bifilar calculable ac/dc resistors, Precision Electromagnetic
Measurements, 2002, pp. 342-343
[23] R. E. Elmquist, Calculable Coaxial Resistors for Precision
Measurements, Instrumentation and Measurement Technology Conference,
1999, pp. 1468 - 1471
[24] E. So, D. Angelo, T. Tsuchiyama, T. Tadokoro, B. C. Waltrip, and T.
L. Nelson, Intercomparison of Calibration Systems for AC Shunts Up to
Audio Frequencies, IEEE Transactions on Instrumentation and
Measurement, 2005, pp.507 – 511
[25] D. L. H. Gibbings, A design for resistors of calculable a.c./d.c. resistance ratio, Proceedings of the Institution of Electrical Engineers, 1963, pp. 335 - 347
[26] T. Tsuchiyama and T. Tadokoro, Development of a high precision AC
standard shunt for AC power measurement, Precision Electromagnetic
Measurements, 2002, pp. 254 - 255
[27] O. B. Laug, T. M. Souders, and B. C. Waltrip, A Four-Terminal Current
Shunt With Calculable AC Response, National Institute of Standards and
Technology, 2004, pp. 1-56
[28] Y. Gülmez, G. Gülmez, E. Turhan, T. Ozkan, M. Cinar, L. Sozen, A new design for calculable resistor, Precision Electromagnetic Measurements, 2002, pp. 348 – 349
[29] J. Kucera, E. Vollmer, J. Schurr and J. Bohacek, Calculable resistors of coaxial design, Measurement Science and Technology, 2009
[30] V. N. Zachovalova, On the Current Shunts Modeling, IEEE Transactions on Instrumentation and Measurement, 2014, pp. 1620-1627
[31] B. Pinter, M. Lindic, B. Voljc, Z. Svetik, and R. Lapuh, Modeling of
AC/DC current shunts, Precision Electromagnetic Measurements (CPEM),
2010, pp. 599–600.
[32] M. Garcocz, P. Scheibenreiter, W. Waldmann, and G. Heine, Expanding
the measurement capability for AC-DC current transfer at BEV, Precision
Electromagnetic Measurements Digest, 2004, pp. 461–462
[33] J. R. Kinard, T. E. Lipe, and C. B. Childers, AC-DC difference
relationships for current shunt and thermal converter combinations,
Precision Electromagnetic Measurements, 1991, pp. 352–355
[34] V. N. Zachovalova, M. Sira and J. Streit, Current and frequency range extension of AC-DC current transfer difference measurement system at CMI, Precision Electromagnetic Measurements (CPEM), 2010, pp. 605 – 606
[35] K. E. Rydler, High precision automated measuring system for AC–DC
current transfer standards, IEEE Transactions on Instrumentation and
Measurement, 1993, pp. 608–611
[36] E. So, D. Angelo, T. Tsuchiyama, T. Tadokoro, B. C. Waltrip, and T. L. Nelson, Intercomparison of Calibration Systems for AC Shunts Up to Audio Frequencies, IEEE Transactions on Instrumentation and Measurement, 2005, pp. 507 – 511
[37] K.E. Rydler and V. Tarasso, A method to determine the phase angle
errors of an impedance meter, Precision Electromagnetic Measurements
Digest, 2004, pp. 123-124
[38] S. Svensson, K.E. Rydler, and V. Tarasso, Improved model and phase-angle verification of current shunts for ac and power measurements, Precision Electromagnetic Measurements Digest, 2004, pp. 82-83
[39] K. E. Rydler, T. Bergsten and V. Tarasso, Determination of Phase Angle
Errors of Current Shunts for Wideband Power Measurement, Precision
Electromagnetic Measurements (CPEM), 2012, pp. 284-285
[40] X. Pan, J. Zhang, H. Shao, W. Liu, Y. Gu, X. Ma, B. Wang, Z. Lu, and
D. Zhang, Measurement of the Phase Angle Errors of High Current Shunts
at Frequencies up to 100 kHz, IEEE Transactions on Instrumentation and
Measurement, 2013, pp. 1652 - 1657
[41] J. Zhang, X. Pan, W. Liu, Y. Gu, B. Wang, and D. Zhang, Determination
of Equivalent Inductance of Current Shunts at Frequency Up to 200 kHz,
IEEE Transactions on Instrumentation and Measurement, 2013, pp. 1664 –
1668
[42] X. Pan, J. Zhang, X. Ma, Y. Gu, W. Liu, B. Wang, Z. Lu, and D. Zhang,
A Coaxial Time Constant Standard for the Determination of Phase Angle
Errors of Current Shunts, IEEE Transactions on Instrumentation and
Measurement, 2013, pp. 199-204
[43] I. Budovsky, Measurement of Phase Angle Errors of Precision Current
Shunts in the Frequency Range From 40 Hz to 200 kHz, IEEE Transactions
on Instrumentation and Measurement, 2007, pp. 284 – 288
[44] X. Pan, Q. Wang, J. Zhang, S. Zeng, W. Chen, Measurement of the level
dependence in phase angle errors of the high current shunts, Precision
Electromagnetic Measurements (CPEM), 2012, pp. 160-161
[45] U. Pogliano, B. Trinchera and D. Serazio, Wideband digital phase comparator for high current shunts, Precision Electromagnetic Measurements (CPEM), 2010, pp. 135-136
[46] G. C. Bosco, M. Garcocz, K. Lind, U. Pogliano, G. Rietveld, V. Tarasso,
B. Voljc, and V. N. Zachovalova, Phase Comparison of High-Current Shunts
up to 100 kHz, Precision Electromagnetic Measurements (CPEM), 2010, pp.
229-230
[47] J. D. de Aguilar, R. Caballero, and Y. A. Sanmamed, Realization and
Validation of the 10 mA–100 A Current Standard at CEM, IEEE
Transactions on Instrumentation and Measurement, 2014, pp. 1753 – 1759
[48] T. Funck and M. Klonz, Improved AC–DC Current Transfer Step-Up
With New Current Shunts and Potential Driven Guarding, IEEE
Transactions on Instrumentation and Measurement, 2007, pp. 361-364
[49] V. N. Zachovalova, M. Sira, J. Streit, and L. Indra, Measurement system
for high current shunts DC characterization at CMI, Precision
Electromagnetic Measurements (CPEM), 2010, pp. 607-608
[50] G. Rietveld, J. H. N. van der Beek, and E. Houtzager, DC
Characterization of AC Current Shunts for Wideband Power Applications,
IEEE Transactions on Instrumentation and Measurement, 2011, pp. 2191 –
2194
Laboratory model for design and verification of
synchronous generator excitation control
algorithms
S. Tusun*, I. Erceg* and I. Sirotić*
* Faculty of Electrical Engineering and Computing/Department of Electric Machines, Drives and Automation, Zagreb, Croatia
stjepan.tusun@fer.hr, igor.erceg@fer.hr, igor.sirotic@fer.hr
Abstract - This paper presents a laboratory model of a synchronous generator excitation system based on a National Instruments cRIO real-time industrial controller. It was specially designed for the development and verification of classical linear and modern nonlinear excitation control algorithms. Real-time Clarke and Park transformations were implemented on the FPGA for measurements of the generator load angle, voltages and currents in the generator dq-frame. An automatic voltage regulator (AVR) and power system stabilizer (PSS2A) were implemented and experimentally verified on an 83 kVA synchronous generator. Tests of step changes of the generator voltage and mechanical power references, and a test of transmission line disconnection, were conducted. Experimental results were compared to simulation results of the designed model in Matlab/Simulink.
I. INTRODUCTION
The electrical power system is one of the largest technical systems. Increasing power demands [1] and the introduction of new technologies, mostly based on power electronics [2], are the main reasons for changes and growth of the power system. Most of the produced electric energy comes from synchronous generators installed in hydro, thermal and nuclear power plants. Long distances between power plants and operation of synchronous generators near their capability limits can lead to poor damping of electromechanical oscillations among groups of generators or between specific areas of the power system [3]. These oscillations reduce power transfer capability limits and can lead to a collapse of the power system [4].

One way to enhance the damping is by controlling the synchronous generator excitation. Modern excitation systems are equipped with an automatic voltage regulator (AVR) and a power system stabilizer (PSS). Design and tuning of the PSS is accomplished via linear techniques around an operating point [5-6]. However, the synchronous machine and power system are highly nonlinear, and the generator operating point can change significantly due to load and topology changes and large disturbances. Therefore, nonlinear excitation strategies have been proposed, such as adaptive and intelligent control, feedback linearization, and Lyapunov theory and energy-shaping based controllers. Nonlinear excitation control strategies mostly have complex control algorithms and require measurement or estimation of the generator load angle and network parameters. Commercially available excitation systems are closed to
modifications of the control algorithms and have predefined control options based on standard excitation types [7]. Implementation and testing of new excitation control strategies on these systems is very limited, which was the main motivation for the development of a new and open digital control system for excitation control.

The excitation control system DIRES21 was the first digital system developed at the Faculty of Electrical Engineering and Computing in Zagreb (Department of Electric Machines, Drives and Automation). It was based on four DSP ADMC300 processors [8]. The programming and configuration of control algorithms was accomplished with a graphically oriented software development tool. There were more than 300 predefined blocks, manually implemented in assembly language. After DIRES21, another system was developed, based on the Texas Instruments DSP TMS320F281 [9]. These two digital systems were specially designed for excitation control, and special skills and programming knowledge were required for the implementation of control algorithms. They had limited communication capabilities and there was no simple way to transfer large blocks of measurement data.

Over time, the need for a new and standardized development environment, which would allow fast prototyping and testing through a graphical development environment, grew. In [10] the first prototype of an excitation control system based on the National Instruments cRIO platform was experimentally tested. This paper is a continuation of [10]; it presents a new laboratory model for the design, verification and fast prototyping of excitation control algorithms in graphically based software. A classical automatic voltage regulator (AVR) and power system stabilizer (type PSS2A) were implemented on the digital system using standard LabVIEW blocks, and experimental tests were conducted to show the system performance.
II. LABORATORY MODEL
The laboratory model used for verification of the excitation control algorithms is depicted in Fig. 1. It enables the verification of the control algorithms in various control regimes (reference voltage and mechanical power step changes, turning transmission lines on and off, short-circuit experiments, etc.).
Figure 1. Laboratory model of the 83 kVA synchronous generator
The laboratory model is located at the Faculty of Electrical Engineering and Computing in Zagreb and consists of:
• a salient-pole synchronous generator mechanically coupled with two DC motors,
• a transformer and inductances that simulate two parallel transmission lines,
• a thyristor rectifier for control of the DC machines' armature current,
• an IGBT rectifier for the synchronous machine excitation,
• a digital control system for the excitation control algorithms.

The synchronous generator (83 kVA) is driven by two DC motors (44 kW each) which are connected in series. Parameters of the synchronous generator and DC motors are given in Appendix A (Tab. III and IV).

Inductors L1, L2 and L3 are used to represent parallel transmission lines. Disconnection of a transmission line and three-phase short circuit tests can be performed by operating circuit breakers Q2 and Q1. Transformer T1 is used for simulation of network voltage changes in the range of ±10%. Connection of T1 to the 10 kV grid is accomplished by transformer T2. Parameters of the inductors and transformers are given in Appendix A (Tab. V and VI).

The DC motors are powered from the thyristor rectifier. Protection functions and parameterization of the rectifier are done by standard procedures [11]. The built-in speed and torque controllers are configured as two-stage controllers. In the first stage, when the generator is not connected to the grid, the speed controller maintains synchronous speed (near nominal). In the second stage, when the generator circuit breaker is closed (the generator is connected to the grid), torque control is activated. Speed and torque references are set by a Siemens S7-1200 PLC. The PLC is also used as a communication interface between the thyristor rectifier and the cRIO digital system that controls the generator excitation.
III. DIGITAL CONTROL SYSTEM

The digital control system, used for control algorithm programming and data acquisition, is based on the National Instruments CompactRIO platform (cRIO-9014). cRIO is a combination of a real-time controller, reconfigurable I/O modules (RIO) and an FPGA module. The I/O modules are used for data acquisition and analog-to-digital conversion (ADC). Generator currents and voltages are first measured by voltage and current transducers and then passed through analog anti-aliasing filters (Fig. 1).

The cRIO program consists of microcontroller program loops and FPGA program loops. Microcontroller program loops are executed in real time and monitored by the real-time operating system (RT OS), while FPGA program loops are executed on dedicated hardware (Fig. 2).
A. Microcontroller program loops
Two program loops are periodically executed on the microcontroller: a communication loop and a control loop (Fig. 2). For the control and communication loop implementation, standard LabVIEW library programming blocks were used; using the standard blocks guarantees optimal code execution.

The main task of the communication loop is the transfer of the measured data, control signals and user interface commands. The loop is executed periodically with a frequency of 10 Hz. The Modbus/TCP communication protocol is used for communication with the Siemens PLC S7-1200.

The control loop is used for execution of the excitation control algorithms. The execution frequency of this loop is 500 Hz. Input measurements are converted by the FPGA ADCs, scaled in the FPGA, and used for calculation of the output control value.
B. FPGA program loops
Three time-critical program loops are implemented on the FPGA module: a counter loop, a PWM loop and an ADC measurement loop.

The counter loop is used for counting the digital encoder impulses. Positive and negative edges of the encoder's A and B pulses are counted. From the counter value, the generator rotor angle and speed are calculated.
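As an illustrative software sketch of this counting logic (the real system counts edges in FPGA hardware; the Python version below, with a hypothetical encoder resolution, only demonstrates the 4x quadrature decoding principle):

    import math

    # 4x quadrature decoding: each edge of A or B changes the count by +/-1.
    # Transition table indexed by (previous A/B state, new A/B state).
    QUAD_DELTA = {
        (0b00, 0b01): +1, (0b01, 0b11): +1, (0b11, 0b10): +1, (0b10, 0b00): +1,
        (0b00, 0b10): -1, (0b10, 0b11): -1, (0b11, 0b01): -1, (0b01, 0b00): -1,
    }

    def update(count, prev_state, a, b):
        # Update the edge count from newly sampled A and B levels.
        state = (a << 1) | b
        count += QUAD_DELTA.get((prev_state, state), 0)  # 0: no valid edge
        return count, state

    PPR = 1024  # hypothetical pulses per revolution of the encoder

    def rotor_angle(count):
        # Mechanical rotor angle in radians; speed is its time derivative.
        return 2.0 * math.pi * count / (4 * PPR)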
The control signal for the IGBT rectifier is generated in the PWM loop. A triangular carrier signal with a frequency of 1250 Hz is used for the PWM signal generation.

The generator excitation is powered from an independent voltage source through the IGBT rectifier (Fig. 1). The output voltage of the IGBT rectifier is controlled by the PWM signal from the digital control system.
Figure 2. Structure of digital control system
Figure 3. Structure of the ADC measurement loop
The ADC loop is used for signal conditioning and calculation of the Clarke and Park transforms (Fig. 3). This loop is executed at a frequency of 2500 Hz and is synchronized with the PWM loop.

Signals from the ADCs are scaled to relative units (pu) and the Clarke transform is used for calculation of the αβ components:
    i_αβ = [i_α, i_β]ᵀ = (2/3) · [[1, −1/2, −1/2], [0, √3/2, −√3/2]] · [i_a, i_b, i_c]ᵀ
Furthermore, the Park transform is used to calculate the generator voltage dq components:

    u_dq = [u_d, u_q]ᵀ = [[sin θ, −cos θ], [cos θ, sin θ]] · [u_α, u_β]ᵀ
The generator load angle is then calculated from the voltage dq components [12].
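A compact numerical sketch of this measurement chain (NumPy; the two transforms follow the equations above, while the load-angle expression and its sign convention are an assumption here, see [12] for the actual method):

    import numpy as np

    def clarke(a, b, c):
        # Amplitude-invariant Clarke transform of three phase samples.
        alpha = (2.0 / 3.0) * (a - 0.5 * b - 0.5 * c)
        beta = (2.0 / 3.0) * (np.sqrt(3.0) / 2.0) * (b - c)
        return alpha, beta

    def park(u_alpha, u_beta, theta):
        # Park transform with the sin/cos convention of the equation above.
        ud = np.sin(theta) * u_alpha - np.cos(theta) * u_beta
        uq = np.cos(theta) * u_alpha + np.sin(theta) * u_beta
        return ud, uq

    def load_angle(ud, uq):
        # Load angle from the dq voltage components (assumed convention).
        return np.arctan2(ud, uq)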
The generator active and reactive power and the effective values of the generator currents and voltages are also calculated from the αβ components. These signals are then filtered by a second-order Butterworth filter. Additionally, the generator angular speed and load angle are filtered with a 10 Hz notch filter to damp the base mechanical frequency. These two digital filters are part of the standard LabVIEW FPGA library. Finally, the values of the filtered signals are transferred to the microcontroller memory and used as inputs in the control loop.
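The filter parameters themselves are not given in the paper; as an illustrative sketch, comparable filters can be designed with SciPy (the low-pass cutoff and the notch quality factor below are assumptions):

    import numpy as np
    from scipy import signal

    fs = 2500.0  # ADC loop rate in Hz

    # Second-order Butterworth low-pass; the 50 Hz cutoff is an assumption.
    b_lp, a_lp = signal.butter(2, 50.0, btype="low", fs=fs)

    # 10 Hz notch to damp the base mechanical frequency; Q is an assumption.
    b_n, a_n = signal.iirnotch(10.0, Q=5.0, fs=fs)

    x = np.random.randn(2500)  # placeholder for a measured signal
    y = signal.lfilter(b_n, a_n, signal.lfilter(b_lp, a_lp, x))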
The generator speed, rotor angle and αβ components of the measured signals are transferred to the microcontroller shared memory. These values are then collected by a GUI application which is executed on a PC. The GUI application is used for the transfer and storage of measurement data. Furthermore, the GUI is used for reference setup and user control of the digital control system.
IV. CLASSICAL EXCITATION CONTROL SYSTEM

The classical generator excitation control, the synchronous generator [3-4], the transmission lines and the power system were modeled in Matlab/Simulink (Fig. 4).

The excitation control system consists of the AVR and the PSS. The AVR control loop consists of an inner field current control loop and an outer generator voltage control loop. The field current controller is proportional (P) and the generator voltage controller is proportional-integral (PI). The parameters of the generator voltage and field current controllers used for the simulations and experiments are given in Tab. I.

The PSS was modeled with transfer function blocks from Matlab/Simulink as the PSS2A type [7]. The main task of the PSS is the damping of synchronous generator electromechanical oscillations [3-4]. The parameters of the PSS are given in Tab. II.
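A minimal discrete-time sketch of this cascade with the Tab. I parameters (deliberately simplified to a plain PI with output clamping; the actual implementation uses the standard LabVIEW PID block, which also provides anti-windup and related features):

    def make_pi(kp, ki, ts, lo, hi):
        # Discrete PI controller (forward Euler) with output limits.
        state = {"i": 0.0}
        def step(err):
            state["i"] += ki * ts * err
            out = kp * err + state["i"]
            return min(max(out, lo), hi)
        return step

    ts = 1.0 / 500.0                                # 500 Hz control loop
    voltage_pi = make_pi(10.0, 20.0, ts, 0.0, 4.0)  # outer loop, Tab. I

    def avr_step(u_ref, u_gen, i_field):
        # Outer voltage PI sets the field-current reference; the inner
        # field-current controller is proportional only (gain 3, Tab. I).
        i_ref = voltage_pi(u_ref - u_gen)
        u_field = 3.0 * (i_ref - i_field)
        return min(max(u_field, 0.0), 3.0)          # limits from Tab. I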
V. EXPERIMENTAL VERIFICATION
For experimental verification of the presented laboratory model, the AVR and PSS2A were implemented on the digital control system. The discrete PI controller was implemented using the standard PID block from LabVIEW [13].

To implement the PSS2A on the digital control system it was necessary to convert its transfer function from the continuous to the discrete time domain. This step was accomplished by use of the bilinear transformation in Matlab/Simulink.
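The same discretization can be reproduced outside Simulink; as a sketch, a single lead-lag stage (1 + T1·s)/(1 + T2·s) of the PSS2A, with the Tab. II time constants, can be converted with SciPy's bilinear (Tustin) transform:

    from scipy import signal

    fs = 500.0            # control loop frequency in Hz
    T1, T2 = 0.3, 0.05    # lead-lag time constants from Tab. II

    # Continuous-time transfer function (T1*s + 1) / (T2*s + 1).
    num, den = [T1, 1.0], [T2, 1.0]

    # Bilinear (Tustin) transformation to the discrete-time domain.
    numd, dend = signal.bilinear(num, den, fs=fs)
    print(numd, dend)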
To validate the AVR and PSS2A, the following tests were performed on the laboratory model:
• voltage reference change,
• mechanical power change, and
• disconnection of a transmission line.

For each experiment the generator terminal voltage, active and reactive power, load angle, angular speed and field current are shown (Figs. 5 to 8).
A. Voltage reference change
A voltage reference step change experiment from 1.0 to 0.9 pu was conducted for the case of the synchronous generator connected to the distribution network. The mechanical power reference of the generator was constant (0.5 pu). Also, a simulation in Matlab/Simulink was
TABLE I. PARAMETERS OF THE GENERATOR VOLTAGE AND FIELD CURRENT CONTROLLERS

    Controller          P gain   I gain   Upp. lim.   Low. lim.
    Voltage [pu]        10       20       4           0
    Field curr. [pu]    3        -        3           0
Figure 4. Matlab/Simulink model of the classical excitation control system

TABLE II. PARAMETERS OF PSS2A

    Tw1,w3   T1      T2       T6    T7    T8      T9
    1 s      0.3 s   0.05 s   0 s   1 s   0.2 s   0.09 s

    Ks1   Ks2   Ks3   N    M    Upp. lim.   Low. lim.
    5     0.5   1     1    4    0.1 pu      -0.1 pu
Figure 5. Comparison of experimental (blue) and simulation (red dash) results for a voltage reference change for the system with AVR
Figure 6. Comparison of experimental (blue) and simulation (red dash) results for a voltage reference change for the system with PSS2A
performed under the same conditions for validation of the proposed laboratory model.

As can be seen from Figs. 5 and 6, there is no significant difference between the simulation and experimental results, either in the transient or in the steady state. The greatest difference is in the field current signal, due to hysteresis that is not modeled in the simulation.

Oscillations in the active power, angular speed and load angle signals were present for the system with the AVR (Fig. 5). These oscillations were successfully damped by adding the PSS2A signal to the AVR input (Fig. 6). An active power feedback loop was not used in the laboratory model, so there are differences between the pre- and post-step steady-state values of the active power (Figs. 5 and 6).
B. Mechanical reference change

A mechanical power reference step change experiment from 0.3 to 0.6 pu was conducted for the case of the synchronous generator connected to the distribution network. The generator voltage reference was constant (1.0 pu).

In Fig. 7 a comparison of experimental results for the system without and with the PSS2A is shown. As expected, the system with the PSS2A damps electromechanical oscillations better than the system with the AVR alone (without PSS2A). It is also important to note that the system with the PSS2A has a greater overshoot in the generator voltage signal, which is the tradeoff for better oscillation damping.
C. Disconnection of transmission line

The experiment of disconnecting one of the transmission lines was conducted while the generator voltage reference (0.9 pu) and mechanical power reference (0.4 pu) were constant. Disconnection of the line (inductors L1 and L2) was done by circuit breaker Q2 (Fig. 1).

As can be seen from Fig. 8, the system with the PSS2A damps electromechanical oscillations better than the system with the AVR (without PSS2A).
Figure 7. Experimental results for a mechanical power reference change for the system with AVR (blue) and system with PSS2A (red dash)
Figure 8. Experimental results for disconnection of one transmission line for the system with AVR (blue) and system with PSS2A (red dash)
VI. CONCLUSION
A laboratory model for the design and verification of synchronous generator excitation control algorithms was presented. The proposed model can be efficiently used for the development and testing of classical linear and modern nonlinear excitation control algorithms. Implementation of control algorithms on the digital control system is accomplished by use of well-known graphically based development software. Furthermore, the graphically based software reduces the time for control law development and verification.

An automatic voltage regulator and power system stabilizer were implemented and experimentally tested. Experimental tests of the generator voltage change, mechanical power change, and disconnection of a transmission line showed that the PSS2A successfully damps electromechanical oscillations. Furthermore, the experimental results were compared to results from the Matlab/Simulink model. The developed simulation model can be used for parameter estimation and testing of new control algorithms before prototyping on the laboratory model.
APPENDIX A. LABORATORY MODEL PARAMETERS
TABLE III. NOMINAL DATA OF THE SYNCHRONOUS GENERATOR

    Voltage     400 V     Speed                  600 r/min
    Current     120 A     Power factor (cos φ)   0.8
    Power       83 kVA    Excitation voltage     100 V
    Frequency   50 Hz     Excitation current     11.8 A
TABLE IV. NOMINAL DATA OF THE DC MOTORS

    Voltage   220 V      Current   192 A
    Power     44.24 kW   Speed     600 r/min
TABLE V. NOMINAL DATA OF THE INDUCTORS

    Inductor   Inductance   Current
    L1         3.5 mH       86 A
    L2         1.35 mH      180 A
    L3         0.45 mH      226 A
TABLE VI. NOMINAL DATA OF THE TRANSFORMERS

    Transformer   T1              T2
    Voltage       380/380±10% V   10/0.4 kV
    Power         145 kVA         1 MVA
    uk            6%              6%
REFERENCES

[1] A. A. Bayod-Rújula, “Future development of the electricity systems with distributed generation,” Energy, vol. 34, no. 3, pp. 377–383, Mar. 2009.
[2] F. Blaabjerg, Z. Chen, and S. B. Kjaer, “Power electronics as efficient interface in dispersed power generation systems,” Power Electronics, IEEE Transactions on, vol. 19, no. 5, pp. 1184–1194, 2004.
[3] P. Kundur, N. J. Balu, and M. G. Lauby, Power System Stability and Control, 1st edition. New York: McGraw-Hill, 1994.
[4] G. Andersson, P. Donalek, R. Farmer, N. Hatziargyriou, I. Kamwa, P. Kundur, N. Martins, J. Paserba, P. Pourbeik, J. Sanchez-Gasca, R. Schulz, A. Stankovic, C. Taylor, and V. Vittal, “Causes of the 2003 major grid blackouts in North America and Europe, and recommended means to improve system dynamic performance,” IEEE Transactions on Power Systems, vol. 20, no. 4, pp. 1922–1928, Nov. 2005.
[5] M. J. Gibbard, “Robust design of fixed-parameter power system stabilisers over a wide range of operating conditions,” Power Systems, IEEE Transactions on, vol. 6, no. 2, pp. 794–800, 1991.
[6] D. Sumina, N. Bulić, and M. Mišković, “Parameter tuning of power system stabilizer using eigenvalue sensitivity,” Electric Power Systems Research, vol. 81, no. 12, pp. 2171–2177, Dec. 2011.
[7] “IEEE Recommended Practice for Excitation System Models for Power System Stability Studies,” IEEE Std 421.5-2005 (Revision of IEEE Std 421.5-1992), pp. 0_1–85, 2006.
[8] T. Idžotić, D. Sumina, and I. Erceg, “DSP based excitation control system for synchronous generator,” presented at EDPE 2005, 2005.
[9] D. Sumina, N. Bulić, and M. Mišković, “Application of a DSP-Based Control System in a Course in Synchronous Machines and Excitation Systems,” International Journal of Electrical Engineering Education, vol. 49, no. 3, pp. 334–348, Jul. 2012.
[10] I. Erceg, S. Tusun, and G. Erceg, “The use of programmable automation controller in synchronous generator excitation system,” in IECON 2012 - 38th Annual Conference on IEEE Industrial Electronics Society, 2012, pp. 2055–2060.
[11] Siemens, Simoreg DC-MASTER, “6RA70 Microprocessor-Based Converters from 6kW to 2500kW for Variable-Speed DC Drives,” 13th ed., Siemens, Nov. 2007.
[12] S. Tusun, I. Erceg, and G. Erceg, “Synchronous generator load angle estimation using Park’s transformation,” 9. savjetovanje HRO CIGRE 2009, 2009.
[13] National Instruments (2009), PID and Fuzzy Logic Toolkit User Manual, available at: www.ni.com/pdf/manuals/372192d.pdf
The European Project SolarDesign Illustrating the
Role of Standardization in the Innovation System
W. Brenner*, N. Adamovic*
* Vienna University of Technology, Institute of Sensor and Actuator Systems, Vienna, Austria
werner.brenner@tuwien.ac.at
Abstract - The Framework Programme 7 project SolarDesign is focused on new photovoltaic (PV) integrated product solutions. To achieve this, new materials and flexible production and business processes in PV-powered product design and architecture had to be developed. Promising markets such as sustainable housing, temporary building structures, outdoor activities, electro-mobility, road lighting and mobile computing drive the demand for decentralized, attractive energy solutions. As products have to respect existing standards, SolarDesign decided from the beginning to proactively integrate standardization into the project’s efforts. The example of a PV-driven streetlamp demonstrates the influence of standards on the performance requirements of the innovative PV materials.
I. INTRODUCTION
Designers, architects and industrial manufacturers share a common interest in using photovoltaics (PV) as a decentralized and sustainable source of energy in their designs. Indeed, photovoltaic modules are the only viable renewable energy solution that can be integrated directly into objects such as devices, textiles and the surfaces of buildings. Photovoltaics is widely recognized as one of the key technologies for the future energy supply. The cost reduction within the last 15 years is a result of the utilization of learning effects, expansion of production capacities, extensive automation and standardization efforts [1].
SolarDesign reflects the consortium’s commitment to
actively contribute to a solid and sustainable
standardization system that fosters added value for the
European industry.
II. SOLARDESIGN VISION
By adjusting the compositions and spectral responses of the functional layers of thin-film photovoltaics, not only can the overall efficiency be improved, but the energy yields under diffuse light conditions or at higher operating temperatures can also be changed. This uniqueness of CIGS had potential for exploitation. The scientific and technical objectives (STOs) of this project are to develop:

STO1: A flexible scribing and printing technology that allows producing a given photovoltaic module according to specific design requirements “on-the-fly”. This flexible interconnection is applied on the solar foil (i.e. an endless solar cell) with a minimum width of 300 mm and allows curved solar cells and interconnection patterns with a minimum radius of 10 mm.
STO2: Novel materials for the underlying flexible solar cell technology, to extend the design-related degrees of freedom and to optimize the materials used for integrative solar applications.

STO3: Novel materials for satisfying design-related requirements at the solar module level (the part that is most visible to the beholder). The focus will be on materials for the electrically conducting front grid, to allow a high design freedom of patterns and colour variations, as well as on the use of different novel encapsulants allowing a custom-designed optical appearance.
STO4: A methodological toolbox to provide design
rules for the best solar cell super-structure and module
design layout for a given application by using numerical
modelling and simulation.
STO5: The following applications demonstrate the developed technologies: solar charging (a cover for a tablet PC and a solar-powered radio), solar-powered lighting, solar-powered sensor networks for the detection of forest fires, and urban solar lighting (a compact solar street lighting system), covering Product Integrated Photovoltaics (PIPV), Building Integrated Photovoltaics (BIPV) and the integration of PV in textile supports.
Figure 1. “Technical constraints” versus “design freedom”

III. STANDARDIZATION AND INNOVATION
Driven by increased competition, many countries and companies have started efforts focused on the implementation of international standardization in an early phase of research and development (R&D) [9].
Extensive research on the economic evaluation of new technology development and international standardization has been conducted [10], [11], which evidences that R&D that takes standardization into consideration can enhance the efficiency of investment and stimulate the introduction of developed technologies and solutions.

Accordingly, the worldwide economy and mass production are typically associated with standardization. Standardization has meanwhile extended its scope towards the research community, in order to ease market access for innovations and to provide interoperability between new and existing products, services and processes.
Companies and research institutions can profit from participation in a Technical Committee (TC), which enables them to benefit from the ability to internalize external information resources and thus to increase innovation skills [12]. In this light, the positive effect of participation can foster an enhancement of existing, and a creation of new, competences and innovations. On the other hand, it evidences interest in new technologies and solutions, shows the willingness to co-operate with other prospective users, and indicates the intention to enter innovative markets. Early access to information, the degree of strategic influence on new projects and the influence on technological development can be affected by the participants’ traditional or developing positioning (modes of participation, roles taken over, functions, social rank etc.) [13]. Standards enable companies and research institutions to comply with relevant laws and regulations and help to ensure methodological robustness and wide acceptance [18].
Standards provide a guiding framework for FP and H2020 research projects, ensuring that tests and analytical work are carried out according to established norms, and that developed technologies are interoperable with existing technologies and compliant with industry standards. By working to existing standards, research projects have a higher chance of their outputs being accepted by the scientific and industrial communities. Working with existing standards also enables researchers to recommend and contribute to new standards development, thus increasing their technical knowledge, widening their business networks and strengthening the market exploitation of their results [6].
There are many thousands of standards of various types. Standards can be categorized into four major types:
• Fundamental standards - concerning terminology, conventions, signs, units etc.
• Test methods and analysis standards - which measure characteristics such as temperature, size, force and chemical composition
• Specification standards - which define a product’s characteristics (product standards) or a service (service activities standards) and their performance thresholds, such as fitness for use, interfaces and interoperability, health and safety, environmental protection, etc.
• Organisation standards - which describe the functions and relationships of a company, as well as elements such as quality management and assurance, maintenance, value analysis, logistics, project or system management, production management, etc.
SolarDesign has decided to integrate standardization into the project’s efforts, as standards can enhance the economic value of research and innovation projects [3]. An early step was the decision which route answered SolarDesign’s needs best. There are several ways in which standards and standardization can be integrated:
• Integration of standardization bodies into the project’s consortium
• Identification of a national, European or international standards body which can be an associate in the project, for example in a Steering Group
• Informal participation of the CEN-CENELEC Management Centre as an associate in a Steering Group; this is possible in projects with specific work packages on standardization
• Identification of the links SolarDesign has with ongoing standardization (e.g. “Photovoltaics in buildings”, prEN 50583)
• Requesting a Project Liaison with an existing Technical Committee; as soon as this status has been granted, the project can participate in the Technical Committee’s plenary meetings and contribute to the working groups.

A Project Liaison brings several benefits:
• The project requesting the Project Liaison can demonstrate formal collaboration with the European Standardization System (usually a requirement in FP7 calls);
• The project representative can participate in the TC directly, thus ensuring synergies between the research and standardization work (avoiding duplication of standardization work);
• The project representative can propose a new work item (standard) directly to the TC without going through a national delegation (a direct impact on the standardization work program).
Figure 2. Benefits to Research of Using Standards [5]

SolarDesign, like many FP7 and H2020 projects, started with an analysis and review of existing standards, either to identify gaps in coverage or as the basis for revising or extending them.
International standardization is a highly social activity involving specialists, detailed observation of the written standardization process, and a clear strategic agenda. Often it is a highly organizational activity, and in many cases it is a highly individual-dependent activity. Consensus building is an important process which demands the following skills [14]:
• sharing goals, costs, risks, quality requirements, measures, and alternatives
• sharing awareness of each party’s positions, expectations and backgrounds
• skills in communication, listening, persuasion and facilitation
• the ability to foster compromises

It is impacted by the following issues:
• ongoing explicit or hidden business models
• human networks and trust among participants
• issues due to external organizational aspects
• technology trends
In the frame of SolarDesign the coordinating institution
TU Wien motivates the partners to proactively contribute
to the standards development processes. Standardization
within Task 5.3 aims at providing a bridge connecting
research to industry by promoting innovation and
commercialization through dissemination of new ideas
and best practice. This comprises circulating of new
measurement and evaluation methods, implementation of
new processes and procedures created by bringing
together all interested parties such as manufacturers,
researchers, designers and regulators concerning
products, raw materials, processes or services.
SolarDesign’s consortium puts strong efforts on including
sessions on standardization in the frame of meetings for
increasing the internal and external awareness of the topic
(e.g. a standardization workshop was held at midterm).
Integrating photovoltaics in architecture and industrial
design is in its infancy and taking into account different
European perspectives is crucial for its success. This is
especially true for Building Integrated PV where legal
conditions like construction codes or feed-in regimes are
differing from one European country to another.
Seen from a stylistic or sensory perspective, design exhibits regional characteristics. Bringing together product designers and PV experts from different European countries can ensure the incorporation of varying design manifestations.
Dissemination, implementation and standardization cannot be seen as separate activities. They must be fully coordinated and represent two sides of the same plan for success. Networking and documentary standards are promising and vital tools for disseminating SolarDesign's research to the market place.
All documents of standardization relevance are jointly prepared by the partners and are continuously bundled and released by the task leader TU Wien.
IV. SOLARDESIGN'S STREETLAMP - AN EXAMPLE OF HOW INNOVATION AND STANDARDIZATION INFLUENCE EACH OTHER
In an early stage of the project the SolarDesign
consortium identified where standards can benefit the
project. In the next step the participants defined
standardization issues.
A typical problem of PV systems is the power loss
due to temperature increase, because modules often
operate close to the product envelope with low
ventilation. SolarDesign’s partner institution EURAC
therefore evaluated and compared the PV temperature
conditions of different PV module categories (in terms of
PV technology and material type). A simple linear expression for the evaluation of the PV module temperature is $T_{mod} = T_{amb} + k \cdot G$, which links $T_{mod}$ with the ambient temperature $T_{amb}$ and the incident solar radiation flux $G$. Within this expression the value of the dimensional parameter $k$, known as the Ross coefficient, depends on several aspects (i.e. module type, wind velocity and integration characteristics). However, dispersed values for this parameter can be found in the literature (in the range of 0.02-0.06 K·m²/W) for different module types [7].
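As a rough numerical illustration of this expression (not taken from the paper), the following Python sketch evaluates the Ross model and the resulting relative power for an assumed, negative power temperature coefficient; the ambient temperature, irradiance, Ross coefficient and temperature coefficient used below are example assumptions.

def module_temperature(t_amb_c, irradiance_w_m2, ross_k):
    # Ross model: the module temperature rises linearly with irradiance.
    return t_amb_c + ross_k * irradiance_w_m2

def relative_power(t_mod_c, temp_coeff_per_k, t_ref_c=25.0):
    # Relative power versus the 25 degC STC rating for a given (negative)
    # power temperature coefficient in 1/K.
    return 1.0 + temp_coeff_per_k * (t_mod_c - t_ref_c)

# Assumed values: k in the cited 0.02-0.06 K m2/W range, 30 degC ambient,
# 800 W/m2 irradiance, and a -0.36 %/K coefficient for a CIGS module.
t_mod = module_temperature(30.0, 800.0, 0.035)
print(t_mod, relative_power(t_mod, -0.0036))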
Figure 3. Overview of temperature coefficients of different thin film
photovoltaic technologies [9]
As can be seen in Figure 3, the efficiency decrease of ZnS-buffered CIGS (copper indium gallium diselenide) is smaller than for CdS-buffered CIGS. With further improvement of alternative buffers (like ZnS, ZnMgO, Zn(O,S)), the efficiency values of CdS-buffered CIGS at 25°C can be reached (demonstrated by Helmholtz Zentrum Berlin [8]). Due to the lower temperature coefficient, even higher efficiencies at operating temperatures (typically ~60°C on a midsummer day) can be expected for alternative buffers - without using toxic Cd. For the deposition of the different buffer layer materials, both sputtering and wet chemical processes are available.
Manufacturers typically rate PV modules at standard test
conditions. However, the actual energy production of
field installed PV modules is a result of a range of
operating temperatures, irradiances, and sunlight spectra.
Therefore, there is an urgent need to characterize PV
modules at different temperatures and irradiances to
provide comprehensive rating information [15]. One of
the most relevant PV standards issued by the International Electrotechnical Commission (IEC)
Technical Committee 82 Working Group 2 (IEC/TC82/
WG2) is the IEC 61853 standard titled Photovoltaic
Module Performance Testing and Energy Rating (IEC,
2011). This can be seen as relevant for the CIGS PV modules. Why is their high efficiency relevant in this specific case of an autonomous streetlamp? The technical parameters of streetlamps are defined in four standards which guarantee a common base of light conditions in European cities:
• PD CEN/TR 13201-1:2014 Road lighting. Guidelines on selection of lighting classes (keywords: road lighting, lighting systems, roads, lighting levels, classification systems, road safety, traffic flow)
• BS EN 13201-2:2015 Road lighting. Performance requirements (keywords: road lighting, lighting systems, roads, performance, classification systems, luminance, glare, lighting levels, road safety, environmental engineering, luminaires, pedestrian-crossing lights)
• BS EN 13201-3:2015 Road lighting. Calculation of performance (keywords: road lighting, lighting systems, roads, performance, mathematical calculations, photometry (light measurement), light distribution, lighting levels, luminance, road safety, luminaires)
• BS EN 13201-4:2015 Road lighting. Methods of measuring lighting performance (keywords: road lighting, lighting systems, roads, performance, photometry (light measurement), performance testing, test equipment, luminance, lighting levels, luminaires, reports)
To meet all these requirements, research undertaken in the frame of the FP7 NMP project SolarDesign had to concentrate on an appropriate level of CIGS efficiency to guarantee the luminance defined in the respective standard.
To save electric energy, which can be harvested only during the day, the luminance of the street lamp is modulated according to the distance of approaching passengers (detected by an integrated proximity sensor): the luminance increases when a passenger approaches and drops back when no passenger is present. This special performance, characterized by changing lighting levels, makes a new standard necessary that also covers the related calculation procedures (standardization proposal 1 to Road Lighting CEN/TR 13201-1,2,3,4: the luminance is adjusted automatically to the actual distance between passenger and lamp; required: calculation and measurement of photometric performance for this case).
Figure 4. Energy saving performance of a PV driven street lamp as example of need for standards development in the frame of the FP7 NMP project SolarDesign [2]
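The dimming behaviour addressed by proposal 1 can be illustrated with a small sketch; the detection distances and the linear interpolation law below are purely illustrative assumptions, since the exact calculation procedure is precisely what the proposed standard would have to define.

def lamp_luminance(distance_m, min_level=0.2, max_level=1.0,
                   full_on_m=5.0, detect_m=30.0):
    # Full output when a passenger is within full_on_m, the standby level
    # beyond detect_m, and a linear ramp in between (assumed behaviour).
    if distance_m <= full_on_m:
        return max_level
    if distance_m >= detect_m:
        return min_level
    frac = (detect_m - distance_m) / (detect_m - full_on_m)
    return min_level + frac * (max_level - min_level)

print(lamp_luminance(17.5))  # halfway between standby and full output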
V. CHARACTERISTICS OF STANDARDIZATION PROCESS
International standardization from the viewpoint of
social structure shows several characteristics. First, the
level of knowledge required to be a leader is significantly
high. This means that acting as the main editor of a technical standard requires experience and extensive work, which is difficult to carry out as a side job.
Standardization on the European level turns out to be a long process [16], different from national standards, which often can be finalized and issued within a few months. At the international level, the full process takes two to five years. There are six main stages in publishing a new European or international standard. Proposal stage: originates from a National Committee; represented countries vote on the interest and nominate experts, which define a work programme with target dates. Preparatory stage: a Working Draft (WD) is prepared by a project team. Committee stage: a Committee Draft (CD) is submitted to the National Committees for comment. Enquiry stage: a Committee Draft for Vote (CDV) is submitted to all National Committees; a majority of two thirds is required. Approval stage: a Final Draft International Standard (FDIS) is prepared. Publication stage: if the FDIS is approved by a two-thirds majority, the document is published by the International Electrotechnical Commission (IEC) Central Office as an international standard.
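The stage sequence can be condensed into a minimal sketch; the two-thirds voting gates come from the text above, while the data representation itself is purely illustrative.

STAGES = ["Proposal", "Preparatory (WD)", "Committee (CD)",
          "Enquiry (CDV)", "Approval (FDIS)", "Publication"]

def advance(stage_index, approval_ratio=None):
    # Move a draft one stage forward; the CDV and FDIS votes require a
    # two-thirds majority of the National Committees.
    stage = STAGES[stage_index]
    if stage in ("Enquiry (CDV)", "Approval (FDIS)"):
        if approval_ratio is None or approval_ratio < 2 / 3:
            raise ValueError(stage + " vote failed: two-thirds majority required")
    return min(stage_index + 1, len(STAGES) - 1)

# Example: a CDV vote with 70 % approval moves the draft to the FDIS stage.
print(STAGES[advance(3, approval_ratio=0.70)])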
To avoid duplication of efforts, speed up standards preparation and ensure the best use of the available resources, particularly of experts' time, IEC and CEN/CENELEC [5] agreed: if the results of parallel voting are positive in both the IEC and CEN/CENELEC, the IEC will publish the International Standard, while the CEN/CENELEC Technical Board will ratify the European Standard [17].
In technology-based markets, to absorb external
knowledge and conversely to stake their claims,
stakeholders participate in the global socio-technical
network of standardization [12]. Main target stakeholders
of SolarDesign to be addressed are:
• Standardization bodies: international standardization organisations (CEN/CENELEC, IEEE, etc.) and national ones such as HZN, AFNOR, DIN, SNV, SIST. They develop and establish product and/or process standards to be followed by producers and application developers
• Certification entities (EEPCA - Professional Association of the European Certification Bodies, EECC - The European Certification Council, etc.): provide confidence to users that a certain element of the SolarDesign project is produced and/or operated according to a defined set of practices or standards
• Photovoltaic systems suppliers, installers and service providers for energy efficient buildings (building integration and architecture) and solar-powered consumer products (e.g. solar lighting,
textile integration) through national, European
and international associations related with the
technologies developed in the project
• Training providers: provide training to qualify developers and other professionals to work with the SolarDesign project results, such as universities, RTD and devoted training organisations
• Local authorities & national/regional public bodies: key players as policy makers, in favourable legislative framework creation and public procurement
• Architects' associations: architects need to be provided with appropriate training, tools and guidelines for them to consider the integration of photovoltaic materials
• Solar-powered consumer users: to be provided with appropriate training, tools and guidelines for them to consider the integration of photovoltaic materials in their products
• Construction companies' associations and related research associations: should be aware of the new technologies that will be installed
• Public and private real estate promoters: they can offer their clients the advantage of the developed system, therefore they should be informed about it and about new business models
• Clients and users (citizens): key actors interested in cooperative working systems or applications, providing their perspectives in the formulation and assessment of the project results
• Network operators: may act as a channel for offering and billing services and/or access devices to users
• Energy management agencies: regional and/or national energy agencies are promoting efficient and innovative energy technologies
A basic requirement to minimize the duration of the whole consensus process is a consistently written specification, which requires a skilled and dedicated editor or a team of editors. The consensus process is often characterized by hidden structures.
Barriers and problems encountered when contributing
to new or revised standards can be summarized as follows
[5]: first, timetabling issues, wherein the timeframes for
the research project and the standards development work
are not aligned sufficiently and coordination between the
two processes becomes difficult because of the differing
stages of progress. Second, difficulties in gaining acceptance for the inputs put forward by the project team, given competing research or industrial interests, lobby groups, or results too 'innovative' to be accepted by the
standardization groups. Another problem is often based on resource availability over a longer period, given the short-term nature of project funding, often resulting in a funding gap. In cases of limited experience in standardization, additional difficulties might arise from unclear access to
standardization bodies and their technical committees, due
to membership rules, lack of direct participation, etc.
Researchers often see standards development as a time
consuming and difficult process which limits the available
time for research and teaching. Project teams might find it
difficult to identify suitably qualified experts able and
willing to work within the standardization process to
implement the project results. In any case, the consortium has
to be aware of a learning curve associated with
understanding the world of standardization (e.g. how to
identify already existing standards, how to propose and
make changes to standards, how to gain acceptance, etc.).
The consensus process will never be an equality-based decision-finding and communication process.
Some stakeholders contribute significant efforts for
editing the specification in order to drive some favoured
business goals [14]. But without such contributions, it is
difficult to carry out today’s international standardization.
This situation results in a clear diversification of roles and
responsibilities in the complex standardization process as
shown in Table I.
TABLE I. ROLES OF PLAYERS IN INTERNATIONAL STANDARDIZATION [14]
Role - Description
Editor - Coordinates all inputs and updates; has to be well prepared to provide editorship and can replace other editors if this should become necessary
Contributors - Experts who create new inputs and support discussions; they do not take over responsibility as editors
Driver - The workforce that drives and accelerates the standard with collaborative writing
Resisters - Try to delay or obstruct the progress of a standard
Watcher - The people that watch the trends
Facilitator - Facilitates the consensus process
Process Observer - Watches the consensus process and expresses opinions on process violations, but avoids responsibility for consensus finding
Lobbyist - People that negotiate under the table for their own business models
Technology Observer - Watches the consensus process for technical quality
Figure 5. Barriers of Links between Research and Standardization [5]
VI. CONSORTIUM
The SolarDesign consortium is constituted by 11
participants (6 SMEs and 5 research institutions) from 6
countries who gather all the necessary background and
expertise to achieve the ambitious research and
standardization objectives of the project.
Consortium: Technische Universität Wien (Austria),
Sunplugged GmbH (Austria), Faktor 3 ApS (Denmark),
Innovatec Sensorisatión y Comunicación S.L (Spain),
Studio Itinerante Arquitectura S.L. (Spain), RHP
Technology GmbH (Austria), Asociación de Industrias de
las Technologias Electrónicas y de la Información del País
Vasco (Spain), Munich University of Applied Sciences
(Germany), Accademia Europea Bolzano (Italy),
Università degli Studi di Milano-Bicocca (Italy),
Commissariat à l’energie atomique et aux energies
alternatives (France).
VII. CONCLUSION
Integrating photovoltaics into industrial design is in
its infancy and taking into account different European
perspectives is crucial for its success. Seen from a stylistic or sensory perspective, design exhibits regional characteristics. This is especially true where legal conditions differ from one European country to another.
Dissemination, implementation and standardization
cannot be seen as separate activities. They must be fully
coordinated and represent two sides of the same plan for
success. SolarDesign decided from the beginning to contribute to the role that standardization plays, such as:
• ensuring broad applicability of SolarDesign's outcome
• fostering an increase of the efficiency of research and development work
• supporting the interoperability of developed solutions with already existing technologies or regulations
• contributing to standards development where a need was identified
ACKNOWLEDGMENT
The FP7 NMP SolarDesign Project has received funding from the European Union's Seventh Framework Programme under grant agreement n° 310220.
REFERENCES
[1] B. Parida, S. Iniyan and R. Goic, "A review of solar photovoltaic technologies," Renewable and Sustainable Energy Reviews 15 (2011), pp. 1625-1636.
[2] SolarDesign Newsletter 4, January 2015, http://www.solar-design.eu/.
[3] CEN (European Committee for Standardization)/CENELEC (European Committee for Electrotechnical Standardization), "Why Standards," http://www.cencenelec.eu/research/WhyStandards/Pages/default.aspx.
[4] S. Ishizuka, T. Yoshiyama, K. Mizukoshi, A. Yamada and S. Niki, "Monolithically integrated flexible Cu(In,Ga)Se2 solar cell submodules," Solar Energy Materials and Solar Cells 94 (2010), pp. 2052-2056.
[5] CEN/CENELEC, "Research Study on the Benefits of Linking Innovation and Standardization," Dec 2014, Ref: J2572/CEN.
[6] J. Stroyan and N. Brown, "Study on the contribution of standardization to innovation in European-funded research projects," technopolis group, Sept 2013.
[7] L. Maturi, G. Belluardo, D. Moser and M. Dell Buono, "BiPV system performance and efficiency drops: overview on PV module temperature conditions of different module types," Energy Procedia 48 (2014), pp. 1311-1319.
[8] H. Papathanasiou, "ROHS compatible buffer layers for CIGS based solar modules - first results from the Photovoltaic Alliance Project," 3rd International Workshop on CIGS Solar Cell Technology, Berlin, 17.-19.04.2012.
[9] S. Y. Sohn and Y. Kim, "Economic evaluation model for international standardization of correlated technologies," IEEE Transactions on Engineering Management, vol. 58, no. 2, May 2011, pp. 189-198.
[10] DIN (Deutsches Institut fuer Normung e.V.), "The economic benefits of standardization," Research Report, 2000, http://www2.din.de/.
[11] Standards Australia, "Standards and the economy," Research Report, 2006, http://www.standards.org.au/.
[12] E. N. Filipovic, "How to support a standard on a multi-level playing field of standardization: propositions, strategies and contributions," Proceedings of the 2014 ITU Kaleidoscope Academic Conference: Living in a converged world - Impossible without standards?, 2014, pp. 207-214, DOI: 10.1109/Kaleidoscope.2014.6858464.
[13] S. Wurster, K. Blind and S. Fischer, "Born global standard establishers: identification of a new research field and contribution to network theory," Proceedings of the 8th IEEE Conference on Standardisation and Innovation in Information Technology (SIIT), Sophia-Antipolis, Sept 2013, DOI: 10.1109/SIIT.2013.6774583, pp. 1-12.
[14] T. Yamakami, "A three-dimensional view model of international standardization using techno-sociological analysis," 4th International Conference on Interaction Sciences (ICIS), 2011, pp. 13-18.
[15] G. TamizhMani, K. Paghasian, J. Kuitche, M. Gupta Vemula and G. Sivasubramanian, "Photovoltaic module power rating per IEC 61853-1 standard: a study under natural sunlight," Arizona State University Photovoltaic Reliability Laboratory (PRL), March 2011, www.solarabcs.org/ratingper61853.
[16] D. Blanquet, P. Boulanger, A. Guerin de Montgareuil, P. Jourde, Ph. Malbranche and F. Mattera, "Advances needed in standardisation of PV components and systems," Proceedings of the 3rd World Conference on Photovoltaic Energy Conversion, May 11-18, 2003, Osaka, Japan, pp. 1877-1881.
[17] IEC (International Electrotechnical Commission), "Inside the IEC - Information," www.iec.ch.
[18] CEN/CENELEC, "The importance of standards," http://www.cencenelec.eu/research/tools/ImportanceENs/Pages/default.aspx.
Open public design methodology and design
process
D. Rembold*, S. Jovalekic**
* IES, Albstadt, Germany, rembold@hs-albstadt.de
** IES, Albstadt, Germany, jovalekic@hs-albstadt.de
Abstract - This paper proposes a design methodology and design process for mechatronic systems incorporating the public community. As a proof of concept, we are building up a robot system consisting of the following components: software code files, hardware design files and controller hardware. Every component will be available for review on an open repository and review tool (GIT/Gerrit). Designers release components into the repository in the form of commits. Commits can be downloaded by every community member and examined and reviewed in detail by the public community. At certain time spots we pick a number of commits with positive review results and review the commits from an overall system perspective. If this review passes, we create a real system from the commits. The steps are: software code compilation, hardware component creation with a 3D printer, and controller hardware ordering. Then we build the new robot upon these components. If we can successfully build the robot system (which is expected due to the overall review) and if all testcases pass, a new release is created from the picked commits. Every community member can download the latest release for their own usage. The business case for this design methodology and design process is that we offer the customer a service to conduct tests from a set of reviewed commits. We provide the resources and the knowledge for this service. We inform the customer about the review and test results, and the customer can proceed with improving the system based on the feedback we provide.
I.
INTRODUCTION
Recently a new maker community has developed around machine development. The designs are often free to download, and conversely everybody can upload his machine design to dedicated websites. The public community is able to give feedback in the form of remarks next to the upload. Unfortunately, this does not really follow a real process in the way we know it from companies. The originator of the upload is free to take a look at the remarks or comments, but there is no requirement to pursue the feedback.
Every component will be available for review on a
public repository and an open review tool such as
GIT/Gerrit, see Figure 1.
Figure 1. Design Parts in Repository
The public community can review every single
component and add its approval or rejection to it, see
Figure 2. In detail this means that all components to be reviewed are available in the repository in the form of commits, see Figure 3. Commits can be downloaded by
every community member, examined and reviewed in
detail. At certain time spots we pick a number of commits
with positive review results, create a system from them,
test it and eventually release the commits if successful, see
Figure 4.
Often the uploaded designs lack sufficient description; e.g. we quite often see that a bill of materials (BOM) is missing. So we think that more effort must be spent on improving the build description.
Our goal in this project is to create a process to build machines (in our case a robot system) consisting of components such as software code, design files, controller hardware and build instructions in the form of videos and BOMs.
Figure 2. Review
During creation of the system, software components are compiled, parts are created from design files by a 3D printer, and controller hardware is ordered automatically. Then we build the new robot upon these components. If we are able to successfully build the robot system and the robot passes all testcases, a new release is created from the picked commits. Then the picked commits are moved into the master repository of GIT for public download.
II. COMMIT DEPENDENCIES
There is a dependency between design files and software code. Updated design files for the robot can result in a new robot geometry layout, in which case the robot's inverse transformation algorithm needs to be adapted inside the software code. So if there is a change in robot geometry but the person who created the commit did not change the inverse transformation, we do not pick the design file change, because this violates the process rule. The person who commits must indicate the dependency between the design file commit and the software code commit.
Another dependency lies in the choice of the motor
driver, motor and the robot layout. The motor driver and
motor need to be able to handle the inertia of the robot
which is given by the robot layout. In case there is a new commit with a new motor driver but no commit with a robot layout change, there must be at minimum a justification that a new robot layout is not needed. Also, a new motor driver might have consequences for the software which drives the motor driver.
There are also dependencies between commits such as software code and testcases. E.g. if a motor driver is updated, we expect a new release of a testcase which reflects the testing of the new attributes of the motor driver. Otherwise it would make little sense to introduce a new motor driver if there are no testcases to verify it. A sketch of such a dependency check follows.
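The following minimal sketch illustrates how such a picking rule could be checked automatically; the file locations, the commit representation and the justification tag are hypothetical, as the paper does not prescribe a repository layout.

GEOMETRY_PREFIXES = ("design/",)                 # assumed location of design files
KINEMATICS_PREFIXES = ("src/inverse_transform",) # assumed software location

def may_pick(commit):
    # Reject a commit that changes the robot geometry without touching the
    # inverse transformation, unless a justification tag declares that no
    # kinematics change is needed.
    files = commit["files"]
    changes_geometry = any(f.startswith(GEOMETRY_PREFIXES) for f in files)
    changes_kinematics = any(f.startswith(KINEMATICS_PREFIXES) for f in files)
    if changes_geometry and not changes_kinematics:
        return "no-kinematics-change-needed" in commit.get("tags", ())
    return True

print(may_pick({"files": ["design/carriage.stl"], "tags": []}))  # False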
Figure 3. Commits
The business case for this design methodology and design process is to offer the customer a service to conduct tests from a set of reviewed commits. We provide the resources and the knowledge for this service. After executing the process described above, we inform the customer about the test results, and the customer can proceed with improving the system based on the feedback we provide.
We offer simulation code for simulation tests of newly committed software code. However, new design files can cause a change in the robot geometry layout, so the simulation program must be adapted here as well. We are working on providing a configuration file for the robot's geometry. In future we will not pick any design file commits if there was no adaptation of the robot's geometry configuration file.
III. AUTOMATIC TESTING
As a continuous integration tool we use Jenkins to gather the commits and to process the components in an automatic fashion. Jenkins works on the commits in several steps, which depend on the content of the commits:
In the first step, Jenkins extracts reviewed commits containing software driver changes and compiles a new driver level. Jenkins then extracts the testcases and the robot geometry layout and updates the simulation program. The driver is then tested against the simulation.
In case the commits contain new robot components in the form of design files, Jenkins loads the design files automatically into the 3D printer and creates them one by one.
Jenkins passes the content of commits with purchasable parts, such as new robot controller parts, to an ordering software and automatically orders the new robot controller parts. A simplified sketch of this flow follows.
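The following plain-Python sketch mirrors these three steps; it is not the project's actual Jenkins configuration, and every function is a stand-in that only logs what the corresponding pipeline step would do.

def compile_driver(commits): print("compiling driver from", len(commits), "commit(s)")
def simulate_and_test():     print("updating simulation, running testcases against driver")
def print_parts(files):      print("3D printing:", files)
def place_order(data):       print("ordering parts:", data)

def process_commits(commits):
    software = [c for c in commits if c["kind"] == "software"]
    designs  = [c for c in commits if c["kind"] == "design"]
    parts    = [c for c in commits if c["kind"] == "purchasable"]
    if software:                    # step 1: compile the driver, test in simulation
        compile_driver(software)
        simulate_and_test()
    for c in designs:               # step 2: create the printed components
        print_parts(c["files"])
    for c in parts:                 # step 3: order the purchasable components
        place_order(c["ordering_data"])

process_commits([{"kind": "design", "files": ["carriage.gcode"]}])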
Figure 4. New release from commits
We cannot do the assembly automatically due to its
complexity, so this is done manually. The tester of the
robot system assembles the robot with the newly created
components (design files and robot controller parts), loads
the compiled software to the controllers and tests the
system.
IV. BUSINESS IDEA
We are working toward the goal of using this process to create a business opportunity: offering a service to customers. There might be customers who are interested in one of our designs, but in general the designs do not completely fulfill their requirements. The customer can download the design and create their own system from it, but he often has a good understanding of what needs to be done differently. Often the customer does not have the knowledge to build a new system, or he is not willing to spend the effort himself. So he can ask us for a service to conduct tests from a set of reviewed commits. The commits can be existing commits, commits provided by the customer himself, or requested design changes we released for the customer.
We provide the resources to do the required design changes and to conduct tests. Every design change will be uploaded to git in the form of a commit. We retrieve the requested commits and apply them to a process in three phases:
A. First Phase
In the first phase we pick components and test them solely in simulation; Figure 5 depicts this process.
Figure 5. First Phase
1. The customer selects, together with our consultation, the committed software code from our repository, and we pull the software code and create a driver from it.
2. We pull simulation software from the repository if e.g. updated design files require this. This is the case when the design files cause a new robot geometry layout. A new simulation system is built from the new design files.
3. We run and test the driver against the simulation with testcases located in the repository. In some cases the testcases need to be updated, due to a new robot layout or new robot controller parts.
4. In case the simulation fails, we create a list of errors and a list of suggestions for improvement (e.g. with a ticketing system) and hand them back to the customer.
If all tests pass in the first phase, we can move to the second phase:
B. Second Phase
The purpose of the previous phase was to test the software code and the hardware in simulation only. The second phase is concerned with creating the real hardware of the robot system, see Figure 6.
Figure 6. Second Phase
1. In case the customer requires new controller hardware described in the commits, and the commits have passed the reviews, Jenkins pulls the content of the commits from the repository and automatically orders the hardware components using a consumer goods ordering system.
2. If there are new design files, Jenkins pulls the commits with design files from the repository, loads them into the 3D printer and prints the updated parts.
3. From all newly created parts, we manually build the new robot system.
4. We run testcases against the real hardware. These are the same testcases which we previously ran against the simulation in phase 1.
5. If the test against the real hardware fails, we create a list of errors and a list of suggestions for improvement and hand them back to the customer.
If the second phase passes all tests, then we move to the third phase:
C. Third Phase
The second phase was about testing the system in a real hardware environment. If all testcases pass the second phase, then we proceed with the repository git itself, see Figure 7:
1. We gather all commits contributed in phase 1 and phase 2, and create a new release level from them.
2. All commits are merged into the master branch of the repository.
3. The system is demonstrated to the customer. The customer accepts the system if he is satisfied, and we bill him.
Figure 7. Third Phase
V. SYSTEM COMPONENTS
In the following you find a summary of the components which will be released into the git repository and a list of software components to set up the process. All components can be openly pulled. Everybody can also review all components and add comments; the comments are shown publicly.
A. Control Software
The control software is code which drives the motors and contains control interfaces to move the robot head from one point to another. There are currently two sorts of movement possible: linear movement and point-to-point movement. To translate the movements into motor signals, we need to transform Cartesian coordinates into motor positions. The software component "inverse transformation" deals with this task; a sketch follows below.
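As an illustration of such an inverse transformation, the following sketch computes the three carriage heights of a linear delta robot (one plausible layout for the delta robots discussed later in this paper) from a Cartesian head position; the tower radius, tower angles and rod length are assumed example values, not the project's actual geometry.

import math

TOWER_ANGLES_DEG = (90.0, 210.0, 330.0)  # tower positions on the base circle (assumed)
TOWER_RADIUS = 150.0                     # mm, assumed
ROD_LENGTH = 250.0                       # mm, assumed

def inverse_transform(x, y, z):
    # Return the three carriage heights that place the head at (x, y, z).
    heights = []
    for ang in TOWER_ANGLES_DEG:
        tx = TOWER_RADIUS * math.cos(math.radians(ang))
        ty = TOWER_RADIUS * math.sin(math.radians(ang))
        horizontal_sq = (tx - x) ** 2 + (ty - y) ** 2
        if horizontal_sq > ROD_LENGTH ** 2:
            raise ValueError("point outside the reachable work area")
        # Pythagoras: the carriage sits above the head by the rod's vertical span.
        heights.append(z + math.sqrt(ROD_LENGTH ** 2 - horizontal_sq))
    return tuple(heights)

print(inverse_transform(0.0, 0.0, 10.0))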
We purposely use UML to automatically generate code from the UML diagrams. The advantage of UML is that a graphical representation of code is easier to review than pure C++ code. There are certainly drawbacks with UML, such as that fixes in pure C++ code must be phased back to UML, which is often quite cumbersome. There are two sorts of diagrams we store in the git repository: UML Class Diagrams and UML Object Diagrams.
Currently we have a linear (i.e. purely sequential) programming approach for our control software. This means that each task is executed in sequence. We will enhance this to a real-time operating system containing a scheduler, dispatcher etc. Then we will be able to execute tasks concurrently and to set priorities for tasks; e.g. a task with low priority can be the driver for the display. Below you find a list of software components stored in the git repository.
• UML Class Diagrams
• UML Object Diagrams
• inverse transformation algorithm
• point to point movement algorithm
• linear movement algorithm
• control program
• future topic: scheduler
• future topic: preemptive multitasking code
• testcases
B. Simulation Software
The reason to use simulation software before testing on real hardware is to save the effort of building up the real hardware just for testing the software. Revealed software problems might lead to the decision to stop building up the hardware, which saves time and effort. Another reason is to continuously test the software released inside the repository: single commits can be pulled in an automatic fashion and drivers can be built from them. Tools such as Jenkins are suited for this kind of task.
The simulation system we build up will be a Mathematica model, which offers the feature of generating source code from it. Therefore we release Mathematica models into the git repository. The model is currently purely static, meaning that the motor dynamics and the robot system's inertia are not taken into account. In future we will implement a dynamic model as well. We will configure the static model of the robot in a configuration file. This means that if we have a design change of the robot, we often do not have to touch the Mathematica model, but just need to update the configuration file; a sketch of such a file follows.
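The following sketch shows what such a geometry configuration file could look like and how it could be read; the file name and the keys are hypothetical, not the project's actual format.

import json

EXAMPLE_CONFIG = {            # hypothetical geometry parameters
    "tower_radius_mm": 150.0,
    "rod_length_mm": 250.0,
    "tower_angles_deg": [90.0, 210.0, 330.0],
    "carriage_travel_mm": 300.0,
}

with open("robot_geometry.json", "w") as fh:
    json.dump(EXAMPLE_CONFIG, fh, indent=2)

# Control software and the generated simulation could read the same file, so
# a geometry change usually only touches this configuration.
with open("robot_geometry.json") as fh:
    geometry = json.load(fh)
print(geometry["rod_length_mm"])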
In future we will also combine the stl design elements
with the simulation, so the robot system can be tested
visually. The following simulation components are
released into the repository.
• mathematical models (Mathematica export files, source code)
• pure static model
• dynamic model (future topic)
• configuration file for the robot's geometry
• future topic: graphical simulation elements
C. Robot Controller
We have set ourselves the requirement that we do not design any component of the robot controller, such as the ARM32 board and the step motor driver board, besides the interface connector between the components. We put weight on the fact that these components must be purchasable. Here the consumer goods ordering system comes into effect: Jenkins triggers the ordering process with the ordering data inside the commit. The following descriptions for automatic ordering are stored in the repository:
• power supply
• controller hardware: ARM32 board, step motor controller and the ordering data
Other files going into the repository:
• interface connector for the ARM32 board and step controller board
• design description
• complete layout plan with purely purchasable standard components and their interface connectors
D. Robot
The component files (design files) of the robot are released as Sketchup files, files in stl format and files in gcode format. Downloading the gcode files makes it easy for us to automatically move the files to the 3D printer and print them out. This again can be partially triggered by Jenkins. There are a few parts of the robot which cannot (or should not) be printed, such as bearings, screws, carriages, step motors etc. Here the consumer goods ordering system comes into play again: the person who releases purchasable parts must fill in the commit comments with data on where to buy the part. Jenkins triggers the ordering application, gathers the ordering data from the commit and automatically orders the parts (a sketch of this follows below). The following is released in git:
• components (design files)
• 3D printer parts
• carriages
• step motors
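The following sketch shows how ordering data embedded in a commit comment could be extracted and handed over to an ordering system; the "Order:" line format is a hypothetical convention, not one defined by the project.

def extract_orders(commit_message):
    # Collect "Order: <part> | <shop-url>" lines from a commit message.
    orders = []
    for line in commit_message.splitlines():
        if line.startswith("Order:"):
            part, _, url = line[len("Order:"):].partition("|")
            orders.append({"part": part.strip(), "url": url.strip()})
    return orders

msg = "Add new carriage bearings\n\nOrder: 608ZZ bearing x12 | https://example.com/608zz"
print(extract_orders(msg))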
E. Software
We mostly use publicly available software to accomplish the system we are proposing. First, the git repository, which is more and more becoming a popular repository for code and other design files. We use Gerrit as a review tool, so the community can check for newly released components to review. Every community member can download the new components and check them for their own purpose. Gerrit offers a mechanism to evaluate a new component by pressing radio buttons with attributes such as "ok to release" or "prefer not to release". In the end, the administrator has the right to overrule the comments of the public community if required, but in general the comments are taken very seriously.
Jenkins is a continuous integration tool using a plugin for Gerrit. E.g. if there is a control software release, Jenkins can automatically download the files, compile them and test them against a simulation. Jenkins can then even automatically set an evaluation of the released code, depending on whether the testcases which have been run against the simulation were successful or not.
The following software tools are currently used to accomplish the proposed system:
• git repository
• Gerrit review tool
• Jenkins, a continuous integration tool
• consumer goods ordering system
VI. OUTCOME AND CONCLUSION
In this paper we presented a new design methodology for machinery which includes the public community in reviewing the machinery parts, consisting of software code, controllers, mechanical parts and others. We are still at an early stage: GIT, Jenkins etc. are set up, but the team is struggling to get used to the work process. Also, some automatic features are still missing, such as downloading code to flash the memory.
In our case we mainly studied the design of a delta robot, but we believe that the concept can be extended to any kind of machinery. In Figure 8 you can see an improved version of the robot "Cherry Pi II", released on Thingiverse, which we have redesigned to try out the proposed concept. This is the first robot which is undergoing the proposed process. The redesign involved the removal of mechanical play of the carriages. Another delta robot is currently being created, see Figure 9, where you can see a redesign of the carriage using only low-cost parts. Our study currently does not involve the open public, but only a selected number of persons.
Figure 8. Cherry Pi II robot
In future we will involve the public community too. The incentive for the public community to review the components is that they can download all parts of the machinery for free to build devices for their own purpose. Therefore the open public has an interest in continuously improving the design located in the public repository.
Figure 9. Redesign of the carriage
The administration of the repository needs funding, so we came up with the business idea of offering a purchasable service to our partners to test out the machinery according to their demands. Customers can take a look at the machines we offer on the website and convince themselves that we have the competence to build such machinery. The service includes consulting, simulating and testing.
ACKNOWLEDGMENT
We would like to thank the University of Applied Sciences Albstadt-Sigmaringen for sponsoring this project within the Program Fit4Research. Without the funding we would have had no opportunity to start the project described in this paper. We would also like to thank Florian Wiest for showing so many new research ideas within this area. We also thank our students Sinem Guruplar and Goncagül Albayrak, who worked on their theses providing the outcomes presented in this paper.
DC VIS
International Conference on
DISTRIBUTED COMPUTING, VISUALIZATION SYSTEMS AND
BIOMEDICAL ENGINEERING
Steering Committee
Chairs:
Karolj Skala, Ruđer Bošković Institute, Zagreb, Croatia
Roman Trobec, Jožef Stefan Institute, Ljubljana, Slovenia
Uroš Stanič, Biomedical Research Institute-BRIS, Ljubljana,
Slovenia
Members:
Enis Afgan, Ruđer Bošković Institute, Zagreb, Croatia
Almir Badnjević, International Burch University, Sarajevo, Bosnia
and Herzegovina
Piotr Bala, Nicolaus Copernicus University, Toruń, Poland
Leo Budin, University of Zagreb, Croatia
Borut Geršak, University Medical Center, Ljubljana, Slovenia
Simeon Grazio, “Sisters of Mercy” University Hospital Centre,
Zagreb, Croatia
Gordan Gulan, University of Rijeka, Croatia
Yike Guo, Imperial College London, UK
Ladislav Hluchý, Slovak Academy of Sciences, Bratislava, Slovakia
Željko Jeričević, University of Rijeka, Croatia
Peter Kacsuk, Hungarian Academy of Science, Budapest, Hungary
Aneta Karaivanova, Bulgarian Academy of Sciences, Sofia, Bulgaria
Zalika Klemenc-Ketiš, University of Maribor, Slovenia
Charles Loomis, Laboratory of the Linear Accelerator, Orsay, France
Luděk Matyska, Masaryk University, Brno, Czech Republic
Željka Mihajlović, University of Zagreb, Croatia
Damjan Miklavčič, University of Ljubljana, Slovenia
Tonka Poplas Susič, Health Centre Ljubljana, Slovenia
Laszlo Szirmay-Kalos, Technical University of Budapest, Hungary
Tibor Vámos, Hungarian Academy of Sciences, Budapest, Hungary
Matjaž Veselko, University Medical Center, Ljubljana, Slovenia
Yingwei Wang, University of Prince Edward Island, Charlottetown, Canada
INVITED PAPER
Views on the Role and Importance of Dew
Computing in the Service and Control Technology
Zorislav Šojat, Karolj Skala
Ruđer Bošković Institute, Centre for Informatics and Computing, Zagreb, Croatia
sojat@irb.hr, skala@irb.hr
Abstract - Modern day computing paradigms cater for a huge community of involved participants from almost the entire spectrum of human endeavour. For computing and data processing there are individual Computers, their Clusters, Grids, and, finally, the Clouds. For pure data communication there is the Internet, and for human-understandable information communication, for example, the World Wide Web. The rapid development of hand-held mobile devices with high computational capabilities and Internet connectivity enabled certain parts of Clouds to be "lowered" into the so-called "thin clients". This led to the development of the Fog Computing Paradigm as well as the development of the Internet of Things (IoT) and Internet of Everything (IoE) concepts.
However, the most significant amount of information processing all around us is done on the lowest possible computing level, outright connected to the physical environment and mostly directly controlling our immediate human surroundings. These "invisible" information processing devices we find in our car's motor, in the refrigerator, the gas boiler, air-conditioners, vending machines, musical instruments, radio receivers, home entertainment systems, traffic controls, theatres, lights, wood-burning stoves, and ubiquitously all over industry and in industrial products. These devices, which are neither at the cloud/fog edge, nor even at the mobile edge, but rather at the physical edge of computing, are the basis of the Dew Computing Paradigm.
The merits of seamlessly integrating those "dew" devices
into the Cloud - Fog - Dew Computing hierarchy are
enormous, for individuals, the public and industrial sectors,
the scientific community and the commercial sector, by
bettering the physical and communicational, as well as the
intellectual, immediate human environment.
In the possibility of developing integrated home management/entertainment/maintenance systems, self-organising traffic control systems, intelligent driver suggestion systems, coordinated building/car/traffic pollution control systems, real-time hospital systems with all patient and equipment status and control collaborating with the medical staff, fully consistent synaesthetic artistic performances including artists and independent individuals ("active public") from wide apart, power distribution peak filtering, self-reorganisation and mutual cooperation systems based on informed behaviour of individual power consumption elements, emergency systems which cooperate with the town traffic, etc., the Dew Computing paradigm shows the way towards the Distributed Information Services Environment (DISE), and finally towards the present civilisation's aim of establishing a Global Information Processing Environment (GIPE).
It is therefore essential, through Research, Innovation and Development, to explore the realm of possibilities of Dew Computing, to solve the basic problems of integration of the "dew" level with the higher-level Dew-Fog-Cloud hierarchy, with special attention to the necessity of information (not only data) processing and communication, and to demonstrate the viability and high effectiveness of the developed architecture in several areas of human endeavour through real-life implementations. The present main scientific and technological objective is to provide the concepts, methods and proof-of-concept implementations that move Dew Computing from a theoretical/experimental concept to a validated technology. Finally, it will be necessary to define and standardise the basics of the Dew Computing Architecture, Language and Ontology, which is a necessity for the seamless integration of the emerging new Global Information Processing Architecture into the Fog and Cloud Paradigms, as a way towards the above mentioned civilisation goals.
I. INTRODUCTION
From the very beginning of the "computer era" our civilisation has tended to become a global information society, and the early developments were already aimed towards computer and data/information interconnectedness. The science fiction of the 1960s/1970s, as well as many scientists predicting possible futures, actually envisioned the present day informatisation. The primary predisposition for that development was achieving substantial speed improvements of processors and their interconnections, and the ability to process increasing amounts of data in memory. Through considerable advances in computer science, high-performance distributed computing systems were founded on the Grid computing paradigm, while scalable distributed computing systems evolved through the Cloud, later the Fog and now the new computing paradigm called Dew computing. The exponential growth of the Internet of Things (IoT), Internet of Everything (IoE) and Big Data processing led to the necessity of horizontal and vertical scalability of distributed resources at multiple levels or layers. To facilitate the rapidly developing complex distributed computer systems and meet the performance, availability, reliability, manageability and cost requirements of today's, and especially tomorrow's, users, a three-layer scalable distributed platform evolved (Cloud - Fog - Dew computing).
Dew computing is a new computing paradigm which appeared after the widespread acceptance of Cloud Computing. The initial research on Dew computing started from the Cloud Computing architecture, and the natural metaphor of a cloud led to the metaphor of the dew. Dew computing can be seen as the ground, or physical, edge of Cloud Computing.
II. THE STORY OF THE DEW
The spread of data processing devices into all segments and all layers of modern life in our civilisation has to be, naturally, followed by specific system architectures. By realising that there is a vast amount of processing power and storage spread around the world in quite minute hardware systems, it became obvious that this widespread distribution of resources could be used on a more global scale. Actually, this type of system architecture development - the development of architectures which are not computer-oriented, but oriented towards reasonably effective exploitation of distributed resources - can be seen to have proceeded in several stages.
The first of these stages we can already recognise in the development of the environment of the first large mainframes and supercomputers during the 1960s. A typical supercomputer (often a multiprocessor setup, especially from the second half of the 1960s) would actually use quite a few other computers to perform the input/output and telecommunications functions, as well as to prepare data and analyse the results. The cost and speed of the supercomputer were such that, although generally able to stand as a fully self-standing computer system, it would not be used for menial tasks, except during installation and testing, and therefore would actually never be used without its distributed computer processing environment. The other important distribution was through the data communication channels towards other computing systems. A typical large university, institution or multinational company would have had all their major computing resources communicationally interconnected at all times. Those were the starting points of the development of the mentioned higher-level communication and processing distribution (i.e. not computer-oriented) architectures, of which probably the Internet is the most generally known.
The Era of Supercomputers lasted till the second half of the 1990s. In the meantime the interconnection of computers in the world had already led to the Internet, and a lot of distribution problems were tackled and solved - like the distribution of storage, through the Network File System (NFS), presently probably the most used distributed file system; the distribution of tasks and requests, as for example in the Remote Procedure Call protocol (RPC); the distribution of time, through the community-developed Network Time Protocol (NTP); and the distribution of hypertext human-readable information through the World Wide Web (WWW), invented by Tim Berners-Lee in 1989. The vast spread of computing (i.e. mostly model, data and less information processing), the availability of faster and faster individual hardware processing devices, the availability of high-quality inter-computer communication links and protocols, and,
finally, the huge availability of extremely cheap "personal" computers naturally led to a lot of investigation into the possibility of using a bunch of so-called "off the shelf" "personal" computers in parallel, so as to enable high-speed parallel processing.
The task of organising a bunch of heterogeneous interconnected computers into a stable, usable, fast and easily programmable and maintainable system has shown itself to be quite complex. Though the interconnection of the cluster elements was solved through the long-stabilised Ethernet standards and internet protocols, the major problem, not solved to this very day, was and is the already mentioned organisation of a heap of individual computers into a consistent and easily controllable, programmable, fault-tolerant and reconfigurable environment of sometimes hugely heterogeneous individual elements.
To get a reliable network of computer clusters on which computational jobs could be executed at the best place when it is most convenient, the Grid Computing paradigm was developed. A lot of scientific areas benefit from the Grid paradigm, as many of their jobs can be workflows of quite different data sizes and processing complexity, either overall or at the level of workflow elements. Huge experimental data sets, the general cycle of scientific exploration and computer use (e.g. experimentation, thinking, writing, programming...), etc. generally do not require the computer results interactively; therefore the Grid architecture enables them much higher processing equipment utilization than would be possible on any local system.
Unfortunately, outside some scientific and other complex computing environments (like e.g. statistics, business information processing, film rendering etc.), the Grid paradigm, due to its "batch-processing" general approach, is not easily or at all directly applicable. Obviously a non-interactive environment is appropriate neither for the general public, nor for artists and scientists who deal with real-time phenomena.
Out of that situation a new notion in computing paradigms and appropriate architectural structures was born: the Cloud. The Cloud Paradigm (or, if you prefer, actually the Cloud Allegory) means the organisation of such a processing structure that computing, data retention, data retrieval etc. are done somewhere-anywhere in the world by systems of clusters, and then provided to anybody needing data storage and/or processing for any reason whatsoever, by the so-called "service providers". The user interfaces changed and adapted themselves to the needs of a more general public, and a lot of cloud-compatible programmes have been written to accommodate many differing end-user needs. However, the Clouds themselves, as opposed to the Grids, are not user-level controllable or programmable, and therefore there are a lot of proprietary solutions; though, in the manner of a Cloud, this huge heterogeneity of the individual computers, clusters and grids of clusters inside any particular available Cloud (provided by a cloud provider) is, or at least tends to be, much hidden from the end-user.
As with all the nomenclature of different generations or system architectures mentioned, all of them started being implemented and used often much earlier than their names were invented. The development process in science has in this sense a two-sided push. Something is invented, starts being used, becomes a seed for more global thought; a name and paradigm are invented, scientifically investigated and then applied to generalise the usage of such inventions. Solving the problems of generalisation of a specific paradigm leads to a new circle of invention... in a kind of spiralling development.
After seeing the Cloud we realised that we have to
bring processing also closer to the Earth, to the
user/consumer edge. So the Cloud slowly settled into the
Fog.
Probably the first actual fully-fledged Fog paradigm idea was the development of the Postscript language - a language specifically designed for printers, not for computers. Actually there is no reason whatsoever (except speed) not to use your printer to calculate, for example, appropriate sine waves and musical envelopes for your computer-resident audio generation programme. So here we immediately see the Fog Paradigm - using "lowest" level equipment to process part of the wider processing need. The adoption of Postscript for real-time calculations and rendering in the SUN NeWS and the NeXTStep display systems, which allowed Display Postscript applications to run on remote viewing foreground stations (the Display Servers) controlled and programmed from the background servers (the Display Clients), enabled splitting the processing of certain applications seamlessly onto distinct computers, the central ones (the Clients) and the peripheral ones (the Servers)1.
This idea, combined with the, at that time already ubiquitous, HyperText Markup Language (HTML), led to the development, again by SUN Microsystems, of the Java programming language and the JavaScript branch. Also worth mentioning is the standardisation development done for the Wireless Application Protocol (WAP) and the associated Wireless Mark-up Language (WML), which, though in intensive use only for a short time, did define some important points for further development. Many other such user-station execution languages/environments have also been developed, like e.g. Flash etc.
So nowadays the situation is that we do have a kind of primitive "operating system based on device-distributed 'intelligence'" common to all modern web browsers, which allows a lot of processing to be done on the end-user (i.e. hierarchically lowest processing level) equipment - from rendering properly human-readable text, through decoding video and audio streams, up to executing fully functional, full-fledged important applications, or even parts of scientific crowd-supported distributed computing environment jobs.
Now we are in the situation mentioned above: we have
to scientifically develop that which we already
perceive as a computing paradigm, and to which we gave the name
of Clouds Laying Low - the Fog. In this effort we will
have to solve a lot of problems to make the usage of
Things (as predicted by the buzz-name Internet of Things)
seamlessly coordinated through the Fog Architecture with
the Clouds, Grids, Clusters and individual Computers, so
as to be really usable in the emerging Global Information
Processing Environment.
So, with the Fog we came down from the Clouds to
the "intelligent" Things in our everyday environment, like
our mobile communicators, or our home-entertainment
television-sets, or even our refrigerators and ovens, many
of which show the tendency to be connected to the
Internet. The real challenge now is to find a common
communication framework, information structure and
language to be able to have a grasp over the uncountable
possibilities of such an integrated global system - the
addition of the Internet of Things and the Fog paradigm
onto the already existing huge global processing
infrastructure. And one could imagine a highly refined
home oven helping, in its free time, to calculate the
ephemeris of some just-discovered asteroid, or helping
some newly opened restaurant on the other side of the
world with its own accumulated cooking experience.
Believe it or not, this is the goal our present-day
development is aiming at.
III. WHAT DEW CAN DO
However, just as not all cells in our body are primarily
information processors, like neurons - the vast majority
of them physically control their common global
environment, the human body (though also processing
and exchanging information) - the same applies to all of
our environment, natural or human-made.
The vast majority of things we see all around us have
no processing power at all: the house, the car motor, the
bathtub, the road or the chair. Or do they?
As opposed to your internet-connected oven, which
already has to have quite high processing power due to
the complexity of internet communication itself, your car
motor can work without any computing equipment at all -
although presently we use more and more processors to
fine-tune the mechanical behaviour of the motor based on
environmental conditions. However, there is actually no
possibility for this kind of processor to calculate Pi for
some scientist on the other side of the globe. But they do
process a lot of environment information and generate a
lot of control and status data.
The road does not at first sight seem like something
which would process information. However, a lot of
modern roads do sense traffic, know their surface
temperature, and some of them can even sense whether
they are covered by rain, snow, ice or mud. That is, in
other words, even our modern roads collect environment
data and distribute it to higher levels, primarily for traffic
counting and direct traffic control if necessary.
"Intelligent homes", massage chairs, hydro-massage
bathtubs, all of those have more and more integrated
control processors. The water and heating boilers, the airconditioners, the washing-machines... are all controlled by
177
processors gathering environment information
controlling the same environment in a specific way.
and
This is the very Dew of Computing. These simple
computers influence our everyday physical life in an
extremely strong way, with the aim of keeping our immediate
environment as close to our wishes as possible. Their
physical influence is truly immediate and tangible, an
influence on the physical quality of human life which no
Computer, Cluster, Grid, Cloud or Fog device could ever
dream of having.
The viability and high effectiveness of the Dew
computing paradigm and the emerging architecture are
quite obvious in many important areas of real-life
implementation (weather/soil conditions, traffic, medical
emergency, smart homes, distribution balancing,
environment protection etc.).
The merits of seamlessly integrating "dew" devices
into the Cloud - Fog - Dew computing hierarchy are
enormous - for individuals, the public and industrial
sectors, the scientific community and the commercial
sector - improving the immediate human environment in
its physical, communication and intellectual aspects.
In the possibility of developing integrated home
management/entertainment/maintenance systems,
self-organising traffic-control systems, intelligent driver
suggestion systems, coordinated building/car/traffic
pollution control systems, real-time hospital systems with
all patient and equipment status and control collaborating
with the medical staff, load-balancing energy and water
distribution systems, emergency systems which cooperate
with the town traffic, fully consistent synaesthetic artistic
performances including artists and independent
individuals ("active public") from far apart, power
distribution peak filtering, and self-reorganisation and
mutual cooperation systems based on the informed
behaviour of individual power consumption elements, the
Dew computing paradigm shows the way towards the
Distributed Information Services Environment (DISE),
and finally towards the present civilisation's aim of
establishing a Global Information Processing
Environment (GIPE). (It is essential to understand that on
the level of Dew computing the Human is the prime and
final decision maker - it is up to a person, the human
being, whether to heed the warnings of the car, or to draw
water even while the system is trying to reduce overall
consumption!)
IV. THE DEW
The mission of Dew computing is to fully realize the
potential of personal/body and human environment-control
computers in symbiosis with Cloud services.
However, the complexity of interconnectivity, and even
more the heterogeneity of the equipment used throughout
these paradigms, grows drastically as we approach the
Dew computing level.
Whereas the Cloud and Fog computing paradigms
address the principles of operation on complex
computations using massive amounts of data, which is
context-free, Dew computing is context-aware, giving the
meaning to data being processed. Data is context-free,
while information is data with accompanying meta-data.
The meta-data places the data in a specific context.
Information processing (data + meta-data) enables the
application of self-organisation on all levels, providing a
much wider scope of problems which can be solved, and
also allows more comprehensive analysis and deeper
knowledge discovery. The Cloud and Fog computing
paradigms operate on huge quantities of raw data
generated by specific Things, via predefined services.
Since the raw data is out of context, the services need to
be tailored and application-specific, requiring data-driven
decisions. Building an integrated scalable heterogeneous
distributed computing environment from the level of the
Cloud or Fog is currently not plausible (or viable), as the
lack of contextual information disallows generic
integration of all processing elements.
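As a minimal illustration of this data/information distinction (the types and the road-sensor scenario are hypothetical, chosen only for this sketch), raw data is a bare value, while information carries the meta-data that places the value in a context a self-organising system can act on:

```csharp
using System;

// Hypothetical sketch: "data" is a bare, context-free value;
// "information" is the same value plus meta-data fixing its context.
public readonly struct RawSample
{
    public RawSample(double value) { Value = value; }
    public double Value { get; }            // means nothing by itself
}

public sealed class Information
{
    public Information(double value, string quantity, string unit,
                       string source, DateTime taken)
    { Value = value; Quantity = quantity; Unit = unit; Source = source; Taken = taken; }

    public double Value { get; }
    public string Quantity { get; }         // e.g. "surface temperature"
    public string Unit { get; }             // e.g. "degrees Celsius"
    public string Source { get; }           // e.g. "road sensor RS-17"
    public DateTime Taken { get; }
}

public static class Demo
{
    public static void Main()
    {
        var data = new RawSample(-2.5);     // context-free: -2.5 of what?
        var info = new Information(data.Value, "surface temperature",
            "degrees Celsius", "road sensor RS-17", DateTime.UtcNow);

        // Only the information form supports a generic, meaningful decision:
        if (info.Quantity == "surface temperature" && info.Value < 0.0)
            Console.WriteLine("Possible ice reported by " + info.Source);
    }
}
```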
To solve the problems of everyday human/computer
interaction (communication) through ergonomic and
human-oriented end-user interfaces, and therefore also to
allow the adaptability of the whole user environment
system to human needs and wishes, Dew computing has to
be based on Information-oriented processing rather than
being Data-oriented.
In the Dew computing scenario individual Things are
responsible for collecting/generating the raw data, and are
thus the only components of the computing ecosystem
which are completely aware of the context the data were
generated in; therefore, dew-devices actually must
produce and exchange Information.
The idea behind Dew computing is primarily to use
the resources at the lowest level as self-organising systems
solving everyday human and industrial environment
problems. Significant advances and savings can be
achieved by using such physical-edge computing in the
areas of, for example, traffic control, power distribution
(balancing, peak protection, etc.), integrated home
systems, medical systems, emergency public services
systems, and industrial control systems (specifically those
necessitating high-level, e.g. Cloud, services).
On the physical-edge level the vast majority of
Dew-devices will not be connected, primarily or at all, to the
Internet or to the mobile phone network for
communication with other global computing environment
elements. As the impact of Dew-devices is primarily
space-oriented, i.e. the device's actions are directly
connected, by action or communication, with its
immediate physical and informational surroundings, i.e.
environment, those devices will mostly use near-range
communication means such as direct cabling, free-line
optical communication, short- to medium-distance radio
(e.g. Bluetooth) or even electricity distribution line
communication (e.g. X-10).
Actually the "dew" devices are exactly on the opposite
side from High Performance Computing (HPC); we
could call them Low Performance Information Processing
(LPIP). HPC presupposes high data and computing rates,
whereas LPIP presupposes low information and decision
rates. However, just to mention, the sum of the processing
power of all LPIP devices on our planet greatly exceeds
the processing power of all HPC systems, though their
specialisations are on opposite sides of the spectrum of
human endeavour, and they are therefore actually
incommensurable.
It is obvious from the above that Dew computing, in the
sense of information processing, actually presupposes
extremely fine granulation of parallelism. On the lowest
level there are devices with slow serial processing speeds
and small to extremely small data/information retention
and processing capabilities. Therefore the Dew paradigm
is radically different from the Fog/Cloud or Cluster/Grid
paradigms, as its lowest- and low-level devices cannot be
used in a "conventional" programmable way; they have
to cooperate on that lowest level to solve human
environment problems or needs, and be able to pass (and
consume) information from all hierarchical levels.
Dew computing must be a generic and general
architecture, able to include all of the present and future
hybrid and extremely heterogeneous information
gatherers, information distributors, information
processors, information presenters and information
consumers - at the lowest level directly connected to
everyday machines which are part of the common modern
human physical environment, and at the highest level
interconnected into the global information processing and
distribution system.
To achieve this it is essential to develop and define a
generic communication language and a full ontology, and
not just communication protocols, which are applicable in
data processing environments but not in information
processing systems. Only in such a way will we be able to
cope with the complexity and heterogeneity of future
equipment, and be able to harness the vast possibilities
which a Global Information Processing System can offer.
V. CONCLUSION
The Dew computing paradigm may well be the final
missing ingredient in the development of computing,
transforming the all-pervading clusters, grids, clouds and
fogs of computers into a human-helping Global
Information Processing Environment. Properly solving
many of the envisioned problems at this very beginning of
a New Information Processing Era may well spare our
children from coping with myriads of incompatible and
pseudo-compatible systems which necessitate "higher
magical skills" to get them to do what is necessary, needed
or wanted.
ACKNOWLEDGEMENTS
This work was, in part, supported by the European
Commission through the Horizon 2020 EGI-Engage and
INDIGO-DataCloud projects.
PAPERS
DISTRIBUTED COMPUTING AND CLOUD COMPUTING
Parameters that affect the parallel execution speed of programs in multi-core processor computers
Valon Xhafa*, Flaka Dika**
*University of Prishtina/Department of Computer Engineering, Prishtina, Kosovo
**University of Vienna/Faculty of Computer Science, Vienna, Austria
valon.xhafa@uni-pr.edu
Abstract - The speed of a computer's work depends
directly on the number of processor cores used in the
parallel execution of programs. But there are other
parameters that have an impact on computer speed, such
as the maximum allowed number of threads, and the size
and organization of the cache memory inside the
processors. To determine the impact of each of these
parameters, we have experimented with different types of
computers, measuring the speed of their work while solving
a certain problem and during image processing. To
compare the impacts of particular parameters on the
computers' speed, the results are presented graphically,
through joint diagrams for each parameter.
I. INTRODUCTION
Currently there are different types of computers with
multi-core processors on the market. The use of computers
with these features has an important impact on their
parallel work. Users have difficulty distinguishing which
is the best solution for them when knowing only the clock
speed of the processor, or the type of the processor, like
Core i3, Core i5 or Core i7 (in this case it is often wrongly
assumed that these processors have 3, 5 or 7 cores
respectively) [5].
Programs that are written to be executed on a single
processor or on one processor core, without the
possibility of executing more instructions simultaneously,
are known as serial, or sequential, programs [1]. On
the other side, programs that are written for simultaneous
execution on more cores, or more logical processors, are
known as parallel programs [2][13]. Parallel execution of
programs in a computer is based on the parallel work of
the processor cores. But, using Intel's technology known
as Hyper-Threading, it is possible for each core to
work as two logical processors [5]; for example, a
processor with 4 cores can work in parallel with 8 logical
processors.
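As a small illustration (not from the paper), the number of logical processors that Hyper-Threading exposes can be read directly in C# via Environment.ProcessorCount, which reports logical rather than physical processors:

```csharp
using System;

class CoreInfo
{
    static void Main()
    {
        // Environment.ProcessorCount returns the number of LOGICAL
        // processors, so on a 4-core CPU with Hyper-Threading enabled
        // it typically prints 8, not 4.
        Console.WriteLine("Logical processors: " + Environment.ProcessorCount);
    }
}
```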
The parallel work of microprocessors has a great
influence on the calculation speed of computers, which is
important for the massive use of computers in
communication through the Internet, searching the Web,
processing various biometric information, monitoring
climate changes in real time, scientific research,
complicated three-dimensional computer games, etc.
Measuring the performance of computers during
parallel execution of programs is not an easy task,
because the speed of their work is influenced not only by
the clock speed but also by the number of cores, the
model, the support for Hyper-Threading technology, the
organization of the cache memory in separate levels [7],
etc.
II. EXPERIMENTAL RESULTS
In order to get relevant conclusions, we initially
experimented with numerous computers with different
processor types from the producer Intel. To measure the
speed of the computers' work during experimentation we
used an ordinary algorithm for multiplication of two
square matrices of different dimensions, from 100 x 100
to 1000 x 1000 (10 cases). For the parallelisation of the
loops, methods of the programming language C#
[3][10][12] included in the Microsoft Visual Studio 2010
package were used.
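The paper does not reproduce its test code; a minimal sketch of the kind of testing program described, using the Task Parallel Library's Parallel.For (available in the .NET 4 toolchain shipped with Visual Studio 2010), might look as follows:

```csharp
using System.Threading.Tasks;

class MatrixTest
{
    // Multiplies two n x n matrices, distributing the rows of the
    // result over the available cores/logical processors.
    static double[,] MultiplyParallel(double[,] a, double[,] b, int n)
    {
        var c = new double[n, n];
        Parallel.For(0, n, i =>            // rows run concurrently
        {
            for (int j = 0; j < n; j++)
            {
                double sum = 0.0;
                for (int k = 0; k < n; k++)
                    sum += a[i, k] * b[k, j];
                c[i, j] = sum;
            }
        });
        return c;
    }
}
```

The serial variant simply replaces Parallel.For with an ordinary for loop; timing either call (e.g. with System.Diagnostics.Stopwatch) yields the serial and parallel execution times compared below.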
We started the experimentation with computers built
with the earlier 2 Duo processors, which have cache
memory only on the L2 level, then with Core i3, Core i5
and Core i7 processors (short: i3, i5, i7) [16], whose cache
memories are on the L2 and L3 levels, and with processors
used for servers, such as the Xeon type [15]. Finally, from
all the computers used for experimentation, we singled
out only the results of 15 computers with Intel processors,
whose characteristics are given in TABLE I, without
differentiating between desktop and notebook computers.
TABLE I. LIST OF USED COMPUTERS
From the given table we can see that the model of the
processor does not automatically determine the number
of cores, nor the number of logical processors. Thus, for
example, computer C7 belongs to the i7 model and
contains 4 cores and 8 logical processors [8], but its L2
and L3 memories are smaller than, e.g., those of the i5
processor of computer C2. On the other side, computer
C14 has an i7 model processor, but with two cores and
four logical processors [9]. At the same time, the
earlier-generation 2 Duo processors, which are included in
computers C9 and C12, have a relatively large cache
memory on the L2 level, but no L3 level memory, and in
parallel work they have shown lower speed than
computers with processors of the next generation.
A. Influence of the number of cores

To get a clearer overview of how much the parallel
execution of programs matters, we initially experimented
with serial execution of the testing programs for matrix
multiplication, horizontal image flipping, and object
tracking, activating only one core of the processor. Then
we measured the execution time of the parallel program
when all cores, or rather all logical processors, are active.
Finally, using the results of the measurements, we found
the execution speedup and the level of parallelism of the
program. Having no room to present all the results
obtained, in the following section we give only the
graphical results for four computers from the list given in
TABLE I, which have approximately the same clock
speed (above 3 GHz).
1. Serial execution time

The results of the serial execution time (Ts) for the four
specific computers are shown in the diagram given in Fig. 1.
The diagram shows that computer C11 is the fastest; it
uses a processor ranked in the group of currently fastest
processors [8], the i7 model, with a maximum clock
speed of 3.60 GHz, 4 cores (short: 4-c) and 8 logical
processors (short: 8-lp). Computer C2, although it is the
i5 model, has a speed competitive with computer C11.
This is seen in the daily ranking of Intel processors [9],
where it holds a high position in the list. At the same
time, computer C6 with an i5 processor is slower than
computer C8, which has an i3 processor. From the given
diagram it is clear that the curves are ranked according to
the clock speeds of the processors.
Figure 1. Serial execution time of the program

2. Parallel execution time

During the parallel execution of the testing program on the separate computers, the parallel execution time (Tp) differs greatly from the serial execution time (Ts), as can be seen from the diagram given in Fig. 2.
Figure 2. Parallel execution time of testing program
Here it is seen that computer C11 has the highest speed
and computer C6 is the slowest, which is consistent with
their speeds in serial work. But in this case the execution
times are significantly shorter. Thus, for example,
computer C11 performs the multiplication of matrices
with dimensions 1000 x 1000 in a little more than 2
seconds, while its serial execution needs more than 10
seconds. The parallel execution times of computers C11
and C2 do not differ much, despite them having different
models of processors (i7 and i5): during parallel work 8
logical processors are active in computer C11, and 4
logical processors in computer C2. On the other side, as
shown in the diagram, with increased dimensions of the
matrices the speed of computer C6 drops dramatically,
despite it having an i5 processor with a clock speed of
3.10 GHz. From what was said, and from the
experimentation we did with other computers, we can
conclude that the speed of parallel execution of a program
can't be determined based on the model of the processor
(i3, i5 or i7) or the number of cores alone, because it is
also influenced by the clock speed of the processor. The
same results were achieved during image processing. In
our case, we took an image to process using
image-flipping algorithms. First of all, we measured the
execution time of the serial image processing. After that,
the algorithm used for image flipping was parallelized in
order to process the image on more cores. Later on, we
measured the execution time of the parallel image
processing too. The achieved results are shown in Fig. 3.
We experimented with computers with 1, 2, and 4 cores;
the average execution times of horizontal image flipping
are also shown in the figure.

Figure 3. Execution time for horizontally image flipping
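Again the flipping code is not given in the paper; one way the row-parallel horizontal flip could be sketched, assuming the image is held as a two-dimensional pixel array, is:

```csharp
using System.Threading.Tasks;

class Flip
{
    // Horizontally flips an image stored as a 2D array of pixel
    // values, processing the rows in parallel: each row swaps its
    // pixels around the vertical centre line independently.
    static void FlipHorizontalParallel(int[,] pixels)
    {
        int rows = pixels.GetLength(0);
        int cols = pixels.GetLength(1);
        Parallel.For(0, rows, r =>
        {
            for (int left = 0, right = cols - 1; left < right; left++, right--)
            {
                int tmp = pixels[r, left];
                pixels[r, left] = pixels[r, right];
                pixels[r, right] = tmp;
            }
        });
    }
}
```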
In computers with a 1-core processor we used the serial
image-flipping algorithm, and the average execution time
was 783 ms. In computers with a 2- or 4-core processor
the parallel image-flipping algorithm was used, and the
average execution times were 456 ms and 234 ms
respectively. Based on this, the execution speed is boosted
when processors with more cores are used. The interface
of the application used for testing and measuring the
execution time of the horizontal image-flipping
algorithms is shown in Fig. 5. Part of the interface is the
option to choose the number of processor cores to be used
during the processing, as well as the choice between the
Parallel and Serial execution modes.

Figure 5. The application interface used in measuring the execution speed of image flipping algorithms

In another experiment we took the case of an
object-tracking application, which is a time-sensitive
application. The application splits the video-camera
stream into frames, and for every frame it uses
contour-tracking algorithms to detect object contours. The
application interface is shown in Fig. 4. The detected
contours of an object are transferred to and displayed in
the Object tracking panel of the application, from which
the relative position of the object's centre is measured.

Figure 4. The application interface used in measuring the execution speed of the Contour tracking algorithm

During the experiments we measured the serial
execution of the Contour Tracking algorithm as well as
the parallel execution. The results are shown in Fig. 6.
Based on the diagram, there is an increase of speed during
parallel execution in finding the object contours. As
mentioned above, object-tracking applications are
time-sensitive because their aim is to detect and find the
object's position within a specific time, therefore delays
between the framing and the contour tracking of an object
should be decreased. From the diagram we can conclude
that the delays decrease with parallel processing of the
framed image using multi-core processors. But the rate of
the delay decrease can't be determined based on the model
of the processor (i3, i5 or i7) or the number of cores,
because it is also influenced by the clock speed of the
processor; this is the same as during matrix multiplication.

Figure 6. The average execution time of the Object Tracking algorithm in processors with different numbers of cores
3. Execution speedup

If the serial execution time (Ts) and the parallel
execution time (Tp) of the testing program are known, the
execution speedup (S) can be calculated through the
expression [6]

S = Ts / Tp    (1)
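For illustration only: taking the approximate times reported above for computer C11 in the 1000 x 1000 case (a little over 10 s serially and a little over 2 s in parallel, e.g. Ts ≈ 10 s and Tp ≈ 2.2 s), expression (1) gives

```latex
S = \frac{T_s}{T_p} \approx \frac{10\ \text{s}}{2.2\ \text{s}} \approx 4.5
```

which is consistent with the peak speedup of about 4.5 reported below for C11.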
Using the experimental results mentioned above, the
execution speedup of the 4 computers taken into
consideration looks as given in Fig. 7. The execution
speedup maintains the ordering of the curves for the
specific computers, as in the execution-time diagrams,
but the curves vary, with increases and decreases. At the
end, only computer C11 clearly continues accelerating its
growth (finally reaching a value of 4.5 times), while the
values of the three other computers stay the same or
decrease. As seen from the given diagram, and from the
experimentation we did with other computers, the
execution speedup grows as the number of executed
operations (e.g. multiplication of matrices with large
dimensions) increases, up to a certain threshold. There is
no continual growth of the execution speedup, which
shows that the parallelism is not perfect.

Figure 7. Execution speedup of the execution process
B. The impact of cache

The size of the cache memory is also important for the
speed of the computer's work. The models of the last
generations of processors contain cache memory on three
levels. Initially we experimented with computers of prior
generations, with 2 Duo processors. In TABLE I we
included computers C9 and C12, whose processors have
cache memory only on the L2 level, and we noticed that,
compared with the other computers, they are extremely
slow in parallel work, despite having relatively high clock
speeds. Then we experimented with 6 other computers,
also included in TABLE I, and the results obtained for
them are presented in the diagram given in Fig. 8.

Figure 8. Parallel execution time of computers with processors that have different cache memory

The given diagram shows that computer C7 with an i7
processor is the fastest and has a large cache memory on
the third level, even though it has the lowest clock speed.
Computers C2 and C1 have approximately the same
speed, with i5 processors, clock speeds much greater than
that of computer C7, and maximum cache memory on two
levels; in the overall ranking [9], the processor of
computer C1 has a higher rank than that of computer C2.
The speeds of computers C8 and C6, which have i3 and i5
processors, follow their clock speeds and cache memory
sizes in the ranking. Computer C14, with an i7 processor,
has the highest execution time.

C. The level of parallelism

The maximum execution speedup of a program can be
determined by Amdahl's Law [4][6]

S = 1 / ((1 - P) + P/N)    (2)

Here, N is the number of processors, and P represents the
relative percentage of the program's parallelism. Starting
from the given law: inverting (2) gives (1 - P) + P/N = 1/S,
so P(1 - 1/N) = 1 - 1/S = (S - 1)/S, from which the
expression for the calculation of the level of parallelism
of a program follows:

P = N(S - 1) / (S(N - 1))    (3)

Since the measured execution speedup values are
calculated using (1), and the number of cores, i.e. the
number of logical processors (N), is known, the level of
parallelism can be calculated through (3). Thus, for the
computers with i7 processors, the level of parallelism
appears as given in Fig. 9. From the diagram provided,
and from the experimentation that we have done with
other computers, it is shown that the relative level of
parallelism of the program execution is over 0.65 and just
a bit under 1.00, which is directly linked with the speed of
parallel execution of the program.

Figure 9. The level of parallelism of program execution in computers with i7 processors

D. The maximum rate of parallelism

During the parallel work of a computer, the operating
system determines the number of threads that are
activated [10], allocating as needed a certain number of
threads to the programs that run in parallel [11]. However,
this can also be done by the user, by limiting the number
of threads. If the parallel execution relates to only one
program, the computer's work can be optimized if the
number of threads equals the number of cores, i.e. the
number of logical processors.

To show this we experimented with several computers,
but in Fig. 10 we give only the diagram of the parallel
execution time of the testing computer C4, limiting the
number of threads between the values 4 and 20. The given
diagram clearly shows that the execution time is minimal
if the maximum degree of parallelism is 4, which
corresponds to the number of processor cores of the
computer where the program is executed. The execution
time increases if a higher degree of parallelism is allowed,
which was observed in most of the computers with which
we experimented. From this we can conclude that the
highest speed of parallel program execution is usually
achieved when the maximum degree of parallelism is set
equal to the number of processor cores.

Figure 10. The rate of parallelism
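In the C# environment used here, limiting the allowed number of threads is done through ParallelOptions.MaxDegreeOfParallelism [11]; a minimal sketch:

```csharp
using System;
using System.Threading.Tasks;

class LimitedParallelism
{
    static void Main()
    {
        // Cap the degree of parallelism at the number of logical
        // processors; larger caps usually only add scheduling overhead
        // for a single CPU-bound program, as observed in Fig. 10.
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };

        Parallel.For(0, 1000, options, i =>
        {
            // ... CPU-bound work for iteration i ...
        });
    }
}
```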
III. CONCLUSION
The speed of serial execution of a program on different
computers is mainly determined directly by the processor
clock speed. But the speed of parallel execution of a
program can't be determined based on the model of the
processor or the number of cores, because the cache
memory size and the clock speed also have an important
impact. The execution speedup of a computer depends on
the number of processor cores, as well as on the level of
parallelism of the program being executed; these
conclusions were reached during the experiments with
matrix multiplication as well as during image processing.
The level of parallelism of a program's execution, in turn,
is directly linked to the speed of the parallel work of the
computer. In the computers with which we experimented,
the level of parallelism is over 0.65 and just under 1. In
order to optimize the parallel speed of a computer we can
manipulate the allowed number of threads. Experimental
results show that if a single program is executed, the
speed of the computer's work is usually best if the number
of threads equals the number of processor cores.
REFERENCES
[1] Blaise Barney. Introduction to Parallel Computing. Lawrence Livermore National Laboratory. https://computing.llnl.gov/tutorials/parallel_comp/ [Accessed on 10.9.2014]
[2] Ananth Grama, Anshul Gupta, George Karypis, Vipin Kumar. Introduction to Parallel Computing, Second Edition. Addison Wesley, January 16, 2003
[3] Parallel Programming with .Net. http://blogs.msdn.com/b/pfxteam/ [Accessed on 17.8.2014]
[4] Frank Willmore. Parallel Programming. The University of Texas at Austin. https://portal.tacc.utexas.edu/c/document_library/get_file?uuid=d7f742a0-42c8-4407-b3c6-9fe1cbda14f1&groupId=13601 [Accessed on 25.11.2014]
[5] http://www.pcworld.idg.com.au/article/386100/what_difference_between_an_intel_core_i3_i5_i7_/ [Accessed on 18.12.2014]
[6] Jan Zapletal. Amdahl's and Gustafson's laws. VSB - Technical University of Ostrava, 2009
[7] Shared Memory Multiprocessors. http://www.inf.ed.ac.uk/teaching/courses/pa/Notes/lecture04-multi.pdf [Accessed on 23.12.2013]
[8] http://www.pcworld.idg.com.au/article/386100/what_difference_between_an_intel_core_i3_i5_i7_/ [Accessed on 16.12.2014]
[9] http://www.cpubenchmark.net [Accessed on 18.1.2015]
[10] Optimal number of threads per core. http://stackoverflow.com/questions/1718465/optimal-number-of-threads-per-core [Accessed on 12.3.2013]
[11] https://msdn.microsoft.com/en-us/library/system.threading.tasks.paralleloptions.maxdegreeofparallelism%28v=vs.110%29.aspx [Accessed on 23.1.2014]
[12] Gaston C. Hillar. Professional Parallel Programming with C#, Wiley Publishing, Inc., Indianapolis, Indiana, 2011
[13] Peter S. Pacheco. An Introduction to Parallel Programming, University of San Francisco, Morgan Kaufmann Publishers, 2011
[14] Hans-Wolfgang Loidl. Parallel Programming in C#. Heriot-Watt University, Edinburgh, 2012
[15] http://www.intel.com/content/www/us/en/processors/xeon/xeon-processor-e7-family.html [Accessed on 18.1.2015]
[16] http://www.intel.com/content/www/us/en/processors/core/5th-gen-core-processor-family.html [Accessed on 15.7.2014]
Federated Computing on the Web: the UNICORE Portal
Maria Petrova-El Sayed∗, Krzysztof Benedyczak†, Andrzej Rutkowski† and Bernd Schuller∗
∗Jülich Supercomputing Centre, Forschungszentrum Jülich GmbH, Germany
Email: {m.petrova, b.schuller}@fz-juelich.de
†Interdisciplinary Centre for Mathematical and Computational Modelling, Warsaw University, Poland
Email: golbi@icm.edu.pl, rudy@mat.umk.pl
Abstract—As modern science requires modern approaches,
vast collaborations comprising federated resources on heterogeneous computing systems rise to meet the current scientific
challenges. Due to their size and complexity these computing
systems become demanding and could further complicate the
scientific process. As a result, scientists are burdened with the
need for additional technical expertise that lies outside their
domain.
UNICORE is a middleware which serves as an abstraction
layer to mask the technical details and ensure an easy and unified
access to data and computation over federated infrastructures.
The Portal is the newest client in the UNICORE portfolio providing web access to data and computing systems. With the rising
demand of having an up-to-date user-friendly graphical interface
and access to computational resources and data from any device
at any point in time, the Portal meets the contemporary needs
of the dynamic world wide web.
This paper describes the functionality of the Portal and its
advantages over other clients. It also discusses security and
authentication methods offered by the Portal and presents use
cases and client customisation from our practice. It concludes
with ideas about future work and extension points for further
simplification for the scientific use.
I. INTRODUCTION
Complex scientific projects involve multiple computer simulations and calculations. In order to achieve their goals,
scientists need to benefit from heterogeneous computing systems and resources, which can be distributed all over the
world. The working process in such an environment is very
challenging for scientists with limited technical background.
UNICORE is middleware that ensures a smoother
path for researchers, allowing them to concentrate on the
actual scientific problem by reducing the amount of required
technical knowledge. The development of UNICORE [1] dates
back to 1996 and was initiated in an effort to provide access to
the three largest German High-Performance Computing (HPC)
centres. Its core design principles comprise: abstraction of
resource-specific details, openness and extensibility, security,
operating system independence and autonomy of resource
providers. The software is available as open source from
the SourceForge repository [2] under a commercially friendly
licence.
The traditional end-user clients in the UNICORE portfolio
are the UNICORE Commandline Client (UCC) and the desktop
UNICORE Rich Client (URC). However, in the current
era of technological advancement the world wide web is
the main means of enabling access to data. In answer to
contemporary demands, the UNICORE Portal provides a
web-based Graphical User Interface (GUI). It offers not only a
user-friendly GUI for work with distributed computing systems,
but also serves as an access point to compute centres. Web
technologies have multiple advantages, for instance constant
availability from an arbitrary location.
The remainder of this paper is structured as follows. Section II describes the functionality of the Portal. Supported
authentication methods are covered in section III, which includes a concise presentation of the new Unity [3] component
that plays a crucial role in many of the new development and
integration scenarios. Section IV focuses on client customisations and extensions from our practice. In section V we
discuss some advantages of the Portal by making a high-level
comparison between the different UNICORE clients as well
as by comparing the Portal to other existing web solutions.
Finally, in section VI we summarise and conclude with future
work.
II. FUNCTIONALITY OF THE UNICORE PORTAL
The UNICORE Portal facilitates computation in federated
systems and makes the grid more accessible. It is also used as
a convenient access point to a particular compute centre.
The functionality of the Portal covers generic use cases.
Before being able to compute or manage data via the Portal,
the user needs to authenticate himself at login. More information on available authentication methods is presented in
section III. After successfully logging in, the user is directed
to the home page of the Portal. The home page is to a
great extent configurable by an administrator and can contain
either a Really Simple Syndication (RSS) feed by choice, or a
personalised design from a local Hypertext Markup Language
(HTML) file. The Portal allows multiple customisations (done
by an administrator only) like hiding certain views, branding
the GUI, adding and exchanging logos and much more.
A. Job Submission
Focus points of the Portal functionality are execution of
jobs, fetching of results and presentation of outcomes. By
clicking on New Job from a flat navigation menu, the user
is forwarded to a view for job submission (Fig. 1). Here he
can select an application from a pre-filled list, which is being
dynamically updated. The information about the applications
in this list is fetched from the configuration files of the
UNICORE sites which are accessible to the user.
In this view, job descriptions and application specific input
parameters can be entered in a graphical form. The latter would
have normally been put into a script and manually submitted
for execution. A GUI client eliminates the need to look into a
detailed job script by automatically generating it in a ready to
be submitted state. The user has to only prepare the necessary
data and set the desired arguments. He can as well upload
files from a remote or local storage. When clicking on the
corresponding button, a window with the accessible storages
will be open. Then the user can browse a selected storage
and navigate through its folders in a manner resembling work
within a file system. He also has the possibility to create new
files where he can write necessary algorithms or other required
input information. Most of the local or remote files can be
changed on the fly through a file editor. However, restrictions
on the size prohibit opening and editing of too large files from
within the Portal.
A stage-out destination of objects to a desired output
location can also be specified for every job. For example, the
user can define that the job output be copied to a remotely
located folder of choice.
Already in this view the user can see which sites he has
access to and can choose where to send his job for execution.
If none is selected, the UNICORE server will take the decision
based on a thorough brokering algorithm, which takes into
account the needed resources, the load of the sites, etc. The
resources required for the successful completion of the job
can also be specified by the user in a comfortable GUI form.
Furthermore, multiple jobs can be edited at the same time
in multiple windows. Full job definitions can be saved and
restored, thus allowing for parameter fitting and re-submission.
Fig. 1. The view for preparation and submission of generic jobs.
B. Monitoring
The monitoring of the job results is represented in a
dedicated job table on a separate view. The table contains
the job name, status, submission time, tags and other details.
The update for each status change or a new job submission
happens automatically. The user can sort, filter and search the
table entries, as well as re-submit, abort or delete jobs. He
can also browse the working directory. If the job’s execution
completed successfully, the working directory contains all job
related files including the fetched output results. These can
normally be viewed in the browser or downloaded to the local
machine. As expected, the user can monitor and operate only
with his own data and only after a successful authentication. In
case of an unsuccessful job execution, the working directory
stays empty as no output has been produced. However, the
user can view the reason for failure by clicking on a button
which opens a window with more detailed information.
C. Data Manager
The Portal offers a powerful data manager which assists
users in data-related tasks. The intuitive look & feel ensures a
comprehensive GUI for transfer, upload or download of data
within different locations. Users are able to create, preview
and delete files in the browser as well as edit their content
on the fly. The data manager supports both client-server
upload/download and server-server transfers with several available data transport protocols. High throughput protocols such
as UNICORE File Transfer Protocol (UFTP) [4] are also
included.
D. User Workspace
Another typical characteristic is the so called user
workspace. It enables the storage of data directly in the Portal
in order for them to be accessible from anywhere on the
web. For example, the data from submitted jobs are being
preserved in the workspace. Each user has his own separate
workspace, which is always created by default. However, for
security reasons the Portal administrator has the possibility to
hide the workspace from the user, thus restricting his access.
If access is granted, the user has full control over the data
from his workspace with the help of the data manager.
E. Workflows
The Portal supports the creation and submission of elementary workflows. With a few mouse clicks the user can compose
a graph of jobs with the required connections, parameters,
imports and exports. The workflow editor is meant to be simple
and intuitive and serves only the basic needs, i.e. only task
sequences. All the advanced features, such as conditionals and
loops, are currently omitted.
Identical to the job table, there is a separate table, containing all submitted workflows and their details. The user can
browse through a workflow’s working directory in a manner
resembling browsing through a file system.
F. Sites view
The sites view contains information about all the sites
accessible to the user. The user can track each site's availability,
the number of its cores, its installed applications and more. If the
geographic location is configured, it will be marked on Google
Maps for a clear representation of the infrastructure. The sites
view is purely informative and has no influence on computing.
G. Configurations
The Portal is designed to be flexible and allows for multiple
configurations. All settings including authentication modes,
accessible compute centres, graphical customizations of logos,
home page, main menu entries and more are defined in a
properties file. This file is managed by a so called Portal
administrator who is responsible for the initial set up of a
Portal instance. The end user normally does not have access
to the configuration file and does not need to know or manage
the Portal instance itself. His working process starts from the
authentication point.
H. Internationalisation
Last but not least, the Portal supports internationalisation.
The default language is English. However, adding a new
language to the GUI is uncomplicated and does not require any
programming effort. All the messages are defined in properties
files which can be translated. With a simple configuration
from a Portal administrator, the new language will appear
for selection together with all available languages at the login
screen.
III. SECURITY AND AUTHENTICATION
The certificates of the X.509 Public Key Infrastructure
system are commonly used by grid middleware as a base
of authentication and trust delegation, needed to coordinate
distributed job execution. This approach turned out to be a
major obstacle in grid adoption: certificates are difficult to
obtain, distribute, refresh and manage. Conversion between
different formats, installation in clients, etc. are problematic
as well.
Web scenarios are even more troublesome. The Portal is a
middleware client and is accessed itself by the user’s browser.
Therefore, the user’s private key cannot be retrieved by the grid
client—i.e. the Portal. Even if the user installs a certificate in
the browser, it can only be used for the authentication step. As
browsers do not support grid-specific trust delegation, which
is needed by the grid, this has to be handled differently.
The UNICORE Portal approach to authentication and trust
delegation addresses the aforementioned issues. It preserves
backwards compatibility and does not sacrifice overall security. Authentication in the Portal is designed in a modular way.
Each authentication module provides support for a different
authentication mechanism. In general these mechanisms can
be grouped into two categories: local mechanisms and remote
mechanisms.
The local authentication is handled fully by the Portal.
Currently implemented local authentication modes comprise:
• An X.509 certificate—pre-installed in the user’s web
browser.
• User name and password—for simple installations as it
lacks advanced features such as password reset.
• Demonstration account—a shared pseudo account that
can be enabled for demonstrative access.
In order to use the UNICORE infrastructure after any of
the local authentication mechanisms, the user is prompted to
upload a so called trust delegation token to the Portal. The only
exception is the demonstrative account. This is a solution to
the delegation problem, as the private key stays at the user’s
machine. The drawback is that it requires the user to start a
Java Web Start application (served by the Portal) and provide
X.509 credentials to this application. The application generates
a time-limited delegation token and uploads it to the Portal.
This extra step is executed roughly once a month depending
on the user’s trust in the Portal. Nevertheless, it can be seen
as an obstacle for portal users as it requires possession of a
certificate and a potentially difficult action.
To offer first class authentication, the UNICORE Portal also
supports a remote authentication over the Security Assertion
Markup Language (SAML) protocol. The SAML authentication using the web POST profile is described in the OASIS
SAML Standard [5]. Besides the authentication itself, support
for an additional trust delegation assertion was added. Such
assertion must be generated by a universally trusted third party
service and returned to the Portal together with the SAML
authentication assertion.
The role of a trusted delegation and authentication authority
is fulfilled by a Unity service [3]. Unity is a powerful identity
provider, featuring support for outsourcing authentication to
different external identity providers, using both OAuth and
SAML protocols. Through Unity, a middleware can be configured with an arbitrary login mechanism. Unity can enable
one or more from the standard solutions such as password,
authentication with data centre Lightweight Directory Access
Protocol (LDAP), home Identity Provider (IdP) of a SAML
federation, etc. It can even use a social identity provider like
Google, Facebook or Github to name a few.
It should be underlined that the configuration of the selected
authentication mechanisms is done by a Portal administrator
and is transparent for the end user. Thus the choice of allowed
authentication sources is fully controllable and the Portal’s
authentication uses UNICORE trust delegation without any
tricks like generation of short lived certificates or storing user’s
private key in a web server.
IV. USE CASES AND USER CUSTOMISATION
A. Generic use case
Already in its default configuration, the UNICORE Portal
can be used for a wide variety of use cases. Access to a
federated infrastructure of HPC resources is one of the primary
design goals.
As a concrete example for this use case, we would like
to mention the Human Brain Project (HBP) [6], a European
FET flagship project. The HBP targets a wide range of topics,
aiming at a deeper understanding of the human brain and attempting to leverage this understanding for new technological
advancements. The HBP operates on HPC resources, integrating major supercomputing sites in Jülich, Barcelona, Bologna
and Lugano as well as cloud storage and other resources.
The HBP uses a single-sign-on system based on OpenID
Connect (OIDC) [7]. With UNICORE deployed as underlying
middleware, the UNICORE Portal is used as a simple way to
access the HPC and data resources. The Portal is configured
to use Unity for authentication, and is thus integrated with the
single-sign-on functionality. The user benefits already in this
simple scenario from the abstraction provided by UNICORE
and the single-sign-on functionality; it is no longer required to
know and understand how to log into the various resources, or
be aware of the various batch schedulers that are used at the
various sites (e.g., Slurm and IBM LoadLeveler). The Portal
is used to prepare and submit so called generic jobs. The user
interface for these generic jobs is generated at runtime based
on metadata provided by the UNICORE server.
B. Custom plugins
The generic approach is sufficient for the majority of
applications, especially those whose users were traditionally
preparing input files in some well-established format. However, there are cases when a more sophisticated interface is
welcomed. For example, some applications require special
resource allocations such as CPU, number of nodes, used
memory or architectures, which might all depend on other
application parameters. A specialised application interface can
assist with these complex resource settings. Customisations to
the interface can also improve the preparation of input. For
instance, selecting parts of the image for a visual analysis
is much more convenient when clicking on the image itself
than by providing numeric pixel coordinates. Even though the
generic application description is quite powerful and allows
for expressing the most common dependencies, some complex relationships between input settings may require special
implementation. Furthermore, for many applications output
visualisation is a key feature and its proper placement in the
Portal is crucial for effective research.
As the above aspects were known from the very beginning,
the core Portal is designed as a base for development of
applications or domain specific plugins. A plugin is free to
implement an arbitrary user interface as well as to contribute
with one or more entries to the main menu. The administrator
has control over all menu entries and can select which of them
to be shown on the GUI. This allows for tailoring of the Portal
to particular domain needs even to a degree where a generic
job interface is not presented at all.
The development of a portal plugin is restricted to the base
technology of the Portal, which includes Vaadin [8] as the
underlying framework for building user interfaces in Java and
Spring [9] as a container for objects. These restrictions are
counterbalanced by the possibility to use internal portal components and features to speed up the plugin development. For
example, multiple graphical elements such as the job table can
be re-used from the existing implementation. Already present
are also the various user authentication methods as well as
a grid security context for grid interaction like asynchronous
discovery of resources and jobs. Thus, the development can
be focused purely on application domain aspects.
We present here two examples of such custom modules
developed in PL-Grid Plus and PL-Grid NG projects. The first
one is named SinusMed. SinusMed processes an input consisting
of a series of computed tomography (CT) images, which
form a 3D image of a human head. It detects all air-filled
head areas. Areas are subsequently marked and categorised
using reference data sets prepared by medical doctors. Results
of the simulation comprise several sets of layered masks on
the input image, which must be presented in a visual form.
Another example is the AngioMerge application. It is used to
synchronise several angiograms, which are taken periodically
with an interval of several minutes. The output is supposed to
be displayed in 3D, optionally as an animation.
The portal modules for SinusMed and AngioMerge allow for
a simplified job preparation, which focuses on the actual application input and automatically pre-sets the required resources
for each application. Therefore, the user is not burdened with
additional knowledge about the computing infrastructure being
used. What is more, applications submit the jobs with the help
of a UNICORE broker so that site selection is fully automated.
The most attractive features of the above application-specific interfaces are found in the output visualisation part.
SinusMed directly shows the original CT images in the
browser (Fig. 2), enabling the overlay of synchronised masking
layers, computed by the application. The AngioMerge module
embeds a WebGL 3D interface (Fig. 3) allowing for interactive
viewing of the output and even playing the resulting animation.
Fig. 2. Results of sinuses detection embedded in the Portal interface after a simulation in a low quality (fast) mode.
Fig. 3. Output visualisation of the AngioMerge module directly in the portal web interface.
V. COMPARISON WITH OTHER SOLUTIONS
A. With Other UNICORE Clients
The UCC is a commandline tool that enables job and workflow management from a shell or scripting environment [10].
It covers all features of the UNICORE service layer including
data transfers, job and workflow submission and monitoring,
results fetching, etc. Among its characteristic qualities is the
possibility to submit multiple jobs in an automated fashion
via a batch mode. To speed up the work, new customised
commands can be defined by the user. However, the UCC is
oriented towards experienced computer users, who find working with a terminal faster and more convenient. In contrast,
there are scientists who welcome an easy to use GUI.
The URC [11] is a standalone client that provides a detailed graphical interface for the UNICORE functionality. Similar to the UCC, the URC supports all features, like job and workflow creation and submission, retrieval of results, data transfers and grid browsing. In its essence, the software is based on Eclipse [12], which makes it easily extensible via the plugin mechanism. Specific to the URC are the different perspectives, which hide or show features of the UI according to how advanced the user is. The major strength of this client is its powerful workflow editor, with constructs like loops and conditional statements for steering the control flow and enabling a fully automated work process.
In comparison to the URC, the functionality of the Portal's workflow editor is elementary. We have taken this approach due to feedback from our users: in order to use the advanced structures, many users still need the assistance of a UNICORE specialist. This is one of the main reasons why we decided to keep the design of the Portal's workflow editor simple and lightweight. Thus, we ensure a comprehensible and intuitive interface that covers the basic needs, whereas the complex use cases can benefit from the well-developed and well-established editor of the URC.
Federated computing on the web has multiple advantages over desktop clients. Unlike the UCC and the URC, the Portal is available from any location and can be accessed by various devices. While standalone clients require installation, administration and regular updates done by the users themselves, a web portal relieves the user of this responsibility, thus saving time and effort.
B. With Other Existing Web Clients
There are several existing web portals that concentrate on HPC. One such solution is InSilicoLab [13]. It is an environment built to assist researchers in their specific domain. In principle it has been designed to be generic, but it is currently used only by projects in chemistry [14]. Similar to the UNICORE Portal, this tool offers access to distributed resources on the web, job preparation, submission and monitoring, as well as fetching of results and data management. However, InSilicoLab has rather specialised in domain-specific interfaces. In contrast, the Portal is more generic, with the intention to be an easily extensible solution that can be beneficial in various research domains. The Portal is also more flexible with its numerous authentication possibilities through the integration with Unity.
Another web client for HPC is Adaptive Computing's Moab Viewpoint [15]. Moab Viewpoint concentrates solely on job and data management and does not cover any workflow features. However, it offers an interface for creating application templates, which helps in optimizing application run times and reduces errors. Another enticing feature, which is not present in the UNICORE Portal, is the existence of two GUI modes: an end-user one and an administrator one. The administrator mode enables supplementary functionality like reporting on resource utilisation, troubleshooting of the workload, editing or cancellation of jobs, and tracking of node usage. The Portal can be customized by an administrator through a properties file, but this should not be confused with the administrator mode from Viewpoint. The administration of the Portal is not job based and offers no GUI; it concerns the enabled options, such as languages, authentication methods, menu entries, etc., for a particular Portal installation. One drawback of Moab Viewpoint is that it is commercial software, which means extra costs for the users. Furthermore, Viewpoint does not operate in a distributed environment, but only supports work on a single site.
VI. CONCLUSION
The UNICORE Portal is an open source solution that
facilitates scientific work by enabling federated computing on
the web. It offers a convenient and friendly GUI for job and
workflow submission, monitoring and data management. The
authentication to the Portal is done either locally or with the
help of Unity, which allows for using numerous authentication
mechanisms, including social and federated ones, without the
need to possess an X.509 certificate.
The default Portal implementation covers the generic use
case, which is sufficient for most projects. We presented as an
example the integration of the Portal in the HBP platform. The
Portal is also flexible and, due to its modular design, it is easily
extensible by self-developed plugins for specific applications.
The creation of a new plugin requires solely the effort that
is necessary to add the particularities of the targeted domain,
whereas the base portal components and features can be easily
re-used. We described two use cases with custom modules,
namely the SinusMed and the AngioMerge applications.
We also made a concise comparison between the Portal and the other clients in the UNICORE portfolio, outlining their strong points and disadvantages, and discussed related web solutions for HPC environments.
The Portal’s development does not stop here. It faces
multiple milestones on its future path. One of them, which is
under current development, is the workflow template feature.
The Portal will offer the option to import a ready template
file. The latter will be parsed and a familiar graphical form
will be automatically generated. Then the user will only need to fill in the data and submit the updated workflow without having to compose a new one from scratch. This approach will ensure comfortable parameter fitting, simple re-use of workflows and uncomplicated workflow submission.
Another idea that is also in progress is the integration with
data sharing solutions. Scientists will be able to share with
others their files, such as workflow templates, from within the
Portal’s interface.
REFERENCES
[1] A. Streit, P. Bala, A. Beck-Ratzka, K. Benedyczak, S. Bergmann,
R. Breu, J. M. Daivandy, B. Demuth, A. Eifer, A. Giesler, B. Hagemeier,
S. Holl, V. Huber, N. Lamla, D. Mallmann, A. S. Memon, M. S.
Memon, M. Rambadt, M. Riedel, M. Romberg, B. Schuller, T. Schlauch,
A. Schreiber, T. Soddemann, and W. Ziegler, “UNICORE 6 – recent
and future advancements,” Annals of telecommunications-annales des
télécommunications, vol. 65, no. 11-12, pp. 757–762, 2010.
[2] “UNICORE Open Source project page,” [accessed: 2016-02-01].
[Online]. Available: http://sourceforge.net/projects/unicore/
[3] “Unity project website,” February 2016, [accessed: 2016-02-01].
[Online]. Available: http://www.unity-idm.eu
[4] B. Schuller and T. Pohlmann, “UFTP: High-performance data transfer
for UNICORE,” in Proceedings of 7th UNICORE Summit 2011, ser. IAS
Series, no. 9. Forschungszentrum Jülich GmbH, 2011, pp. 135–142.
[5] J. Hughes, S. Cantor, J. Hodges, F. Hirsch, P. Mishra, R. Philpott, and
E. Maler, “Profiles for the OASIS Security Assertion Markup Language
(SAML) V2.0 OASIS Standard,” 3 2005, [accessed: 2016-02-01].
[Online]. Available: http://docs.oasis-open.org/security/saml/v2.0/
[6] “Human Brain Project,” [accessed: 2016-02-01]. [Online]. Available:
http://www.humanbrainproject.eu/
[7] “OpenID Connect,” [accessed: 2016-02-01]. [Online]. Available:
http://openid.net/connect
[8] “Vaadin framework website,” [accessed: 2016-02-01]. [Online].
Available: https://vaadin.com/
[9] R. Johnson, J. Höller, K. Donald, C. Sampaleanu, R. Harrop, T. Risberg,
A. Arendsen, D. Davison, D. Kopylenko, M. Pollack et al., “The spring
framework–reference documentation,” Interface, vol. 21, 2004.
[10] “UNICORE commandline client: User Manual,” [accessed: 2016-02-01]. [Online]. Available: http://unicore.eu/documentation/manuals/unicore/files/ucc/uccmanual.html
[11] B. Demuth, B. Schuller, S. Holl, J. Daivandy, A. Giesler, V. Huber,
and S. Sild, “The UNICORE Rich Client: Facilitating the automated
execution of scientific workflows,” in e-Science (e-Science), 2010 IEEE
Sixth International Conference on, Dec 2010, pp. 238–245.
[12] “The Eclipse Foundation open source community website,” [accessed:
2016-02-01]. [Online]. Available: https://eclipse.org/
[13] J. Kocot, T. Szepieniec, D. Harężlak, K. Noga, and M. Sterzel, “InSilicoLab – managing complexity of chemistry computations,” in Building a National Distributed e-Infrastructure – PL-Grid. Springer, 2012, pp. 265–275.
[14] A. Eilmes, M. Sterzel, T. Szepieniec, J. Kocot, K. Noga, and M. Golik, “Comprehensive support for chemistry computations in PL-Grid
infrastructure,” in eScience on Distributed Computing Infrastructure.
Springer, 2014, pp. 250–262.
[15] “Adaptive computing: Viewpoint portal,” [accessed: 2016-02-01].
[Online]. Available: http://www.adaptivecomputing.com/products/hpcproducts/viewpoint/
ACKNOWLEDGEMENT
This work was made possible with assistance of the PL-Grid
Plus project, contract number: POIG.02.03.00-00-096/10, and
the PL-Grid NG project POIG.02.03.00-12-138/13, website:
www.plgrid.pl. The projects are co-funded by the European
Regional Development Fund as part of the Innovative Economy program.
The research leading to these results has also received funding from the European Union Seventh Framework Programme
(FP7/2007-2013) under grant agreement no. 604102 (Human
Brain Project).
Problem-Oriented Scheduling of Cloud
Applications: PO-HEFT Algorithm Case Study
E.A. Nepovinnykh and G.I. Radchenko
South Ural State University, Chelyabinsk, Russia
nepovinnykhea@susu.ru, gleb.radchenko@susu.ru
Abstract - Today we see a significantly increased use of the problem-oriented approach in the development of scheduling algorithms for cloud computing environments. Several such algorithms already exist. However, many of them require the tasks within a single job to be independent, and they do not account for the execution time of each task or the volume of data transmitted. We propose a model of a problem-oriented cloud environment. Using this model, we propose a list-based algorithm for problem-oriented planning of the execution of applications in a cloud environment that takes the applications' execution profiles into account, based on the Heterogeneous Earliest-Finish-Time (HEFT) algorithm.
Keywords – scheduling, execution planning, cloud computing,
grid computing, HEFT
I.
INTRODUCTION
Today many complex e-Science tasks are solved using computer simulation, which usually requires significant computational resources [1]. Moreover, the solutions developed for such tasks are often characterized by structural complexity, which causes different resources (informational, software or hardware) to be integrated within a single solution. The complexity of the solutions grows as multidisciplinary tasks are considered. Today's common approach for building composite solutions is based on Service-Oriented Architecture [2], which forms the basis for interconnecting services and hiding their complexity behind their interfaces. The interconnection of the services within complex tasks is usually implemented in the form of workflows, which exploit graph-based structures to describe the interconnection of the used services. On the other hand, today the Cloud Computing concept is developed as a business framework for providing on-demand services supporting computing resources' consolidation, abstraction, access automation and utility within a market environment. The service-oriented architecture in the cloud is best implemented using the microservice approach. The microservice model describes a cloud application as a suite of small independent services, each running in its own container and communicating with other services using lightweight mechanisms. These services are built around separate business capabilities, are independently deployable, and may be written by different development teams using different programming languages and frameworks [3].
To provide scientists and engineers transparent access to the computing resources, a “Problem Solving Environment” (PSE) concept is commonly used. (The reported study was partially supported by RFBR, research project No. 14-07-00420-a, and by Grant of the President of the Russian Federation No. МК-7524.2015.9.) A PSE is
a system that provides all the computational facilities
necessary to solve a target class of problems. It uses the
language of the target class and users need not have
specialized knowledge of the underlying hardware or
software [4]. At present, PSE researchers are investigating
a variety of fields, e.g., Cloud computing support,
education support, CAE usage support, document
generation support, and so on.
Today most of the systems that provide a problem-oriented approach to e-Science problems on the basis of high performance computing resources use workflows to organize a computational process [5]. Nodes of such workflows represent separate tasks implemented by individual services, and the edges define the data or control flow. In this paper, under the term “Problem Solving Environment” we understand a set of services, software and middleware focused on the implementation of workflows to solve e-Science problems in a specific problem domain, using the resources of a cloud computing system [6].
Within the problem domain of a PSE, the set of tasks forming the workflow is predetermined. These tasks can be grouped into a finite set of classes. A task class is a set of tasks that have the same semantics and the same set of input parameters and output data. On the one hand, this imposes restrictions on the class of problems that can be solved using the PSE. On the other hand, such a restriction allows the use of domain-specific information (such as task execution time on one processor core, scalability limits, and the amount of generated data) during resource allocation and scheduling, increasing the efficiency of use of the available computational resources.
So, in order to increase the efficiency of distributed problem-oriented computing environments, it is feasible to use problem-oriented task scheduling methods that use domain-specific information to predict the computational attributes of a particular workflow.

The main goal of this research is to develop a scheduling algorithm for a workflow-based problem-solving environment that would effectively use domain-specific information (such as task execution time, scalability limits, and the amount of data transferred) for prediction of the cloud computing environment's resource load.
This paper is organized as follows. In Section II we present the concept and the basic idea of scheduling applications in cloud environments. In Section III we describe the cloud-based problem solving environment model. In Section IV we describe the HEFT and PO-HEFT cloud scheduling algorithms, complete with a mathematical task model. In Section V we describe the implementation of the PO-HEFT algorithm in the WorkflowSim cloud environment simulation package. In Section VI we summarize the results of our research and give further research directions.
II.
SCHEDULING APPLICATIONS IN CLOUD
ENVIRONMENTS
Analysis of the main trends in resource scheduling research in distributed problem-oriented environments shows that problem-oriented scheduling and prediction of the environment load is an urgent task.

For virtualized cloud computing data centers, a Holistic Model for Resource Representation has been proposed [7]. This model is designed to represent physical resources, virtual machines, and applications in cloud computing environments; it can be applied to represent cloud applications, VMs, and physical hosts. Each of these entities is described by multiple resources: computing, memory, storage, and networking. The holistic model increases the precision of cloud environment simulation and enables a number of new simulation scenarios focused on the heterogeneity of the hardware resources and on virtualization. The model distinguishes between computing, memory, storage, and networking types of resources; however, it can easily scale to include other types of resources as well, e.g., additional GPGPU units.
New cloud-related techniques for resource virtualization and sharing, and the corresponding service level agreements, call for new optimization models and solutions. Computational Intelligence proves to be applicable to multiple resource management problems that exist at all layers of Cloud computing. Standard optimization objectives for scheduling are to minimize makespan and cost, but additional objectives may include optimization of energy consumption or communications. Solutions to this multi-objective optimization problem include, but are not limited to: an Improved Differential Evolutionary Algorithm combined with the Taguchi method, a Multi-Objective Evolutionary Algorithm based on NSGA-II, a Case Library and Pareto Solution based hybrid GA Particle Swarm Optimization, an Auction-Based Biobjective Scheduling Strategy, etc. [8]. The main drawback of the mentioned algorithms is that they do not use information about previous executions.
The main reason that traditional cluster and grid resource allocation approaches fail to provide efficient performance in clouds is that most cloud applications require the availability of communication resources for information exchange between tasks, with databases, or with end users [10]. The CA-DAG model for cloud computing applications overcomes the shortcomings of existing approaches by using communication awareness. This model is based on Directed Acyclic Graphs that, in addition to computing vertices, include separate vertices to represent communications. Such a representation allows making separate resource allocation decisions: assigning processors to handle computing jobs, and network resources for information transmissions. A case study is given, and the corresponding results indicate that DAG scheduling algorithms designed for single-DAG and single-machine settings are not well suited for Grid scheduling scenarios where user run time estimates are available.
For practical purposes, the quite simple MaxAR scheduler with minimal information requirements can provide good performance for multiple workflow scheduling [11]. In real Grid environments this strategy might perform similarly to the best strategies when considering the approximation factor, the mean critical path waiting time, and the critical path slowdown. Besides the performance aspect, the use of MaxAR does not require additional management overhead such as DAG analysis, site-local queue ordering, and the construction of preliminary schedules by the Grid broker. It has small time complexity. This approach is related to offline scheduling, which can be used as a starting point for addressing the online case. Online Grid workflow management brings new challenges to the above problem, as it requires more flexible balancing of the load of workflows and their tasks over time.
Nowadays the shifting emphasis of clouds towards a service-oriented paradigm has led to the adoption of Service Level Agreements (SLAs) [12]. The use of SLAs has a strong influence on job scheduling, as schedules must observe quality of service constraints. In terms of minimizing power consumption and maximizing provider income, Min-e outperforms other allocation strategies. The strategy is stable even in significantly different conditions. The information about the speed of machines does not help to improve the allocation strategies significantly. When examining the overall system performance on real data, it is determined that an appropriate distribution of energy requirements over the system provides more benefits in income and power consumption than other strategies. Min-e is a simple allocation strategy requiring minimal information and little computational complexity. Nevertheless, it achieves good improvements in both objectives and quality of service guarantees. However, its actual efficiency and effectiveness have not been assessed.
One of the most popular list-based scheduling algorithms is Min-min [13], which gives high scheduling priority to tasks with the shortest execution time. The main drawback of list-based scheduling algorithms is that they do not analyze the whole task graph.
One of the important classes of computational problems is problem-oriented workflow applications executed in a distributed computing environment [14]. A problem-oriented workflow application can be represented by a directed graph whose vertices are tasks and whose arcs are data flows. A problem-oriented scheduling (POS) algorithm has been proposed. The POS algorithm takes into account both the specifics of problem-oriented jobs and the multi-core structure of the computing system nodes. It is designed for use in distributed computing systems with many-core processors and allows one to schedule the execution of one task on several processor cores with regard to constraints on the scalability of the task.
Cloud computing can satisfy different service requests with different configurations, deployment conditions and service resources of various users at different points in time. With the influence of multidimensional factors, it is unrealistic to test with different parameters in an actual cloud computing center. Typical tools for cloud workflow scheduling research are CloudSim and WorkflowSim [9]. CloudSim is a toolkit (library) for the simulation of cloud computing scenarios. It provides basic classes for describing data centers, virtual machines, applications, users, computational resources, and policies for the management of diverse parts of the system (e.g., scheduling and provisioning).
WorkflowSim extends the CloudSim simulation toolkit by introducing support for workflow preparation and execution, with an implementation of a stack consisting of a workflow parser, a workflow engine and a job scheduler. WorkflowSim is used for validating graph algorithms, distributed computing, workflow scheduling, resource provisioning and so on. Compared to CloudSim and other workflow simulators, WorkflowSim provides support for task clustering, which merges tasks into a cluster job, and a dynamic scheduling algorithm that matches jobs to a worker node whenever a worker node becomes idle.
In the following sections, we present a new problem-oriented resource-scheduling algorithm for distributed computing environments, which uses a heuristic score-based approach based on the HEFT algorithm for the task of problem-oriented scheduling in cloud environments.
III.
CLOUD-BASED PROBLEM SOLVING ENVIRONMENT MODEL

Let us define a model of a cloud problem solving environment, so that we can simulate the task scheduling algorithm. We work under the conditions where a set $\mathfrak{M}$ of virtual machines $\mathfrak{m} \in \mathfrak{M}$ is distributed between all available nodes $\mathfrak{n} \in \mathfrak{N}$ of the cloud platform.

Let us define a virtual machine image performance factor as
$$\pi: \mathfrak{m} \to \mathbb{Z}^{+},$$
where $\mathfrak{m}$ is a virtual machine image. Numerical characteristics of a virtual machine image, synthetic test results, or test execution results of existing functions can serve as examples of such performance characteristics [15, 16].

In order to maximize the quality of the prediction of task characteristics on a specified machine, we need to take into account several performance characteristics, including the number of available processors and the amount of memory, the CPU frequency, the hard drive data exchange speed, LINPACK test results, and so on. Thus, let us define a vector $\Pi$ of the performance characteristics of the virtual machines deployed in the cloud-based PSE:
$$\Pi = (\pi_1, \pi_2, \dots, \pi_r).$$
Each machine $\mathfrak{m} \in \mathfrak{M}$ in a cloud-based PSE is associated with a performance characteristics vector $\Pi$, which reflects the performance values of the machine:
$$\Pi: \mathfrak{M} \to \mathbb{Z}_{+}^{r}.$$

Let us define the set of tasks that can be executed in a PSE as a set $\mathcal{F}$ of functions $f \in \mathcal{F}$. Each function $f: \mathcal{C}^{in} \to \mathcal{C}^{out}$ receives $n$ information objects $\mathcal{I}^{in} = (I_1^{in}, \dots, I_n^{in})$ of classes $\mathcal{C}^{in} = (C_1, \dots, C_n)$. The result of the function is $m$ new information objects $\mathcal{I}^{out} = (I_1^{out}, \dots, I_m^{out})$ of classes $\mathcal{C}^{out} = (C_1', \dots, C_m')$. We assume that in our model each task of a workflow is allocated to one virtual machine; direct access to the components of the computing system is not provided.

One particular feature of a problem-oriented computing environment is the fact that it uses information about the features of the task classes during scheduling and resource provisioning. We require every task class to have the following functions defined for the prediction of the task execution process depending on the input parameters: (1) an output data volume estimation function, and (2) a task execution time estimation function for a machine with a given performance characteristics vector.

To implement the problem-oriented scheduling, let us define two operators, which should be implemented in the PSE:

1) The operator of the expected output, $\nu(f, \mathcal{I}^{in})$, which returns the expected total size in bytes of the output data objects $\mathcal{I}^{out}$ of the function $f: \mathcal{C}^{in} \to \mathcal{C}^{out}$:
$$\nu(f, \mathcal{I}^{in}) = |\mathcal{I}^{out}|.$$

2) The operator of the expected function execution time, $\tau(f, \mathcal{I}^{in}, \Pi)$, which returns the estimated run time (in seconds) of the function $f: \mathcal{C}^{in} \to \mathcal{C}^{out}$ for a given set of input data $\mathcal{I}^{in}$ on a given machine with the performance characteristics vector $\Pi$:
$$\tau: (f, \mathcal{I}^{in}, \Pi) \to \mathbb{N}.$$
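A possible Java rendering of these two operators follows; the interface and method names are illustrative assumptions, not the paper's PSE implementation.

// A possible Java rendering of the two operators; the interface and method
// names are illustrative assumptions, not the paper's PSE implementation.
interface TaskClass {

    // ν(f, I_in): expected total size, in bytes, of the output objects
    // produced for the given input parameters.
    long expectedOutputBytes(double[] inputParams);

    // τ(f, I_in, Π): estimated run time, in seconds, on a machine with the
    // given performance characteristics vector Π.
    double expectedRunTimeSeconds(double[] inputParams, double[] performanceVector);
}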
The execution time of a function $f: \mathcal{C}^{in} \to \mathcal{C}^{out}$ on a given machine with a performance values vector $\Pi$ can be defined as an operator that takes the input information objects vector $\mathcal{I}^{in}$. Unfortunately, it is impossible to estimate a function's execution time with absolute accuracy, because the computations involved in preparing the output information objects $\mathcal{I}^{out}$ may indirectly depend on multiple factors that our model does not account for, including, but not limited to, background processes, the available cache volume, the branch prediction rate, etc. To take this inherent inaccuracy into account, the execution time estimate can be modelled as a random value that is a sum of two parts:
$$\chi(f, \Pi, \mathcal{I}^{in}) = \tau(f, \Pi, \mathcal{I}^{in}) + \alpha,$$
where $\tau(f, \Pi, \mathcal{I}^{in})$ is a deterministic function that represents the dependency of the execution time of the function $f$, running on a computer with a performance values vector $\Pi$, on the input information objects vector $\mathcal{I}^{in}$, and $\alpha$ is a stochastic value with expected value $M(\alpha) = 0$ that represents the factors our model does not account for.
In conditions where a specific function (task) of the PSE is executed on a virtual machine with a pre-defined performance characteristics vector $\Pi_0$, to evaluate the expected value $E[\chi(f, \Pi_0, \mathcal{I}^{in})]$ we can use the k-nearest neighbor method, based on the records of the execution times of previous launches of the function $f$ on the same machine with close values of the input parameters:
$$E[\chi(f, \Pi_0, \mathcal{I}^{in})] = \bar{\tau}(f, \Pi_0, \mathcal{I}^{in}) = \frac{1}{k} \sum_{i=1}^{k} W_i(\mathcal{I}^{in})\, t_{\mathcal{I}_i^{in}, \Pi_0}, \qquad (1)$$
where $\bar{\tau}(f, \Pi_0, \mathcal{I}^{in})$ is a weighted average execution time estimate for the function $f$ with the input parameters $\mathcal{I}^{in}$, computed on the basis of $k$ previous observations of the execution time $t_{\mathcal{I}_i^{in}, \Pi_0}$ of the function $f$ with the input parameters $\mathcal{I}_i^{in}$ on a machine with the performance characteristics vector $\Pi_0$. The weighting function $W_i(\mathcal{I}^{in})$ assigns greater weight to the execution time records whose input parameter values are closer to $\mathcal{I}^{in}$.
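As an illustration of how estimate (1) could be computed, the following sketch implements a weighted k-nearest-neighbor average over past runs; the Observation type and the inverse-distance weighting are our own choices (the paper does not fix a concrete $W_i$), and the weights here are normalised by their sum rather than by $k$.

import java.util.Comparator;
import java.util.List;

// Illustrative sketch of the weighted k-NN estimate (1); the Observation
// type and the inverse-distance weighting are our own choices, and the
// weights are normalised by their sum rather than by k.
public final class KnnRuntimeEstimator {

    public record Observation(double[] inputParams, double runTimeSeconds) {}

    // Estimate the expected run time of f for the given input parameters,
    // based on the k most similar previous runs on the same machine.
    public static double estimate(List<Observation> history, double[] input, int k) {
        List<Observation> nearest = history.stream()
                .sorted(Comparator.comparingDouble(
                        (Observation o) -> distance(o.inputParams(), input)))
                .limit(k)
                .toList();
        double weightSum = 0.0;
        double weighted = 0.0;
        for (Observation o : nearest) {
            // Runs whose inputs are closer to the query get larger weights.
            double w = 1.0 / (1.0 + distance(o.inputParams(), input));
            weightSum += w;
            weighted += w * o.runTimeSeconds();
        }
        return weighted / weightSum;
    }

    // Euclidean distance; assumes both vectors have the same length.
    private static double distance(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            s += (a[i] - b[i]) * (a[i] - b[i]);
        }
        return Math.sqrt(s);
    }
}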
To take into account the possibility of executing the function $f$ on a virtual machine with a characteristics vector different from $\Pi_0$, we can extend the definition of the evaluation parameter vector by adding the virtual machine performance vector $\Pi = (\pi_1, \pi_2, \dots, \pi_r)$ to the vector $\mathcal{I}^{in}$ of the input parameters. Thus, we assume that as a characteristic we use a vector $P$ of dimension $n + r$, where $n$ is the number of input parameters of the function $f$ and $r$ is the number of virtual machine performance characteristics, defined as follows:
$$P = [I_1^{in}, \dots, I_n^{in}, \pi_1, \dots, \pi_r].$$
In this case, (1) can be transformed into the following form:
$$E[\chi(f, \Pi, \mathcal{I}^{in})] = \bar{\tau}(f, \Pi, \mathcal{I}^{in}) = \frac{1}{k} \sum_{i=1}^{k} W_i(P)\, t_{P_i},$$
where $t_{P_i}$ is the value of the previous observation of the execution time of the function $f$ with the evaluation parameter vector $P_i$.
Similarly, we can estimate the amount of output data of the function $f$:
$$\bar{\nu}(f, \mathcal{I}^{in}) = \frac{1}{k} \sum_{i=1}^{k} W_i(\mathcal{I}^{in})\, v_{\mathcal{I}_i^{in}},$$
where $\bar{\nu}(f, \mathcal{I}^{in})$ is an estimate of the volume of the output parameters of the function $f$, computed on the basis of information on the output volumes of $k$ previous runs of the same function with the input parameters $\mathcal{I}_i^{in}$, using the weighting function $W$.
IV.
HEFT ALGORITHM FOR THE PROBLEM-ORIENTED SCHEDULING

We propose a list-based algorithm for problem-oriented scheduling in cloud environments based on computing profiles. List-based scheduling involves defining priorities for the computational units and starting the execution according to the received priority; resources are bound to the high-priority tasks first. The proposed approach allows us to take into account the costs of data transmission between nodes, thereby reducing the total execution time of the workflow. The proposed algorithm is based on the Heterogeneous Earliest-Finish-Time (HEFT) algorithm [17], but contains modifications in the node level computation phase and takes into account the incoming communication volume from a task's parent tasks.
Let $\bar{T}_w$ be the size of the problem $T_w$, and let $R$ be the set of computing resources with an average processing power $\bar{R} = \sum_{i=1}^{n} R_i / n$. Then the average time to complete the task over all available resources is calculated as
$$E(T_w) = \frac{\bar{T}_w}{\bar{R}}. \qquad (2)$$
Let $\bar{T}_{wv}$ be the amount of data transferred between tasks $T_w$ and $T_v$, and let $R$ be the set of available resources with an average data transfer capacity $\bar{R} = \sum_{i=1}^{n} R_i / n$. Then the average estimate of the data transfer cost between tasks $T_w$ and $T_v$ over all pairs of resources is
$$D(T_{wv}) = \frac{\bar{T}_{wv}}{\bar{R}}. \qquad (3)$$
Thus, the priority of a computational unit may be defined as
$$rank(T_w) = E(T_w) + \max_{T_v \in succ(T_w)} \big( D(T_{wv}) + rank(T_v) \big), \qquad (4)$$
where $succ(T_w)$ is the set of tasks that depend on the task $T_w$.

The task priority is thus directly determined by the priorities of all the tasks that depend on it. Tasks are assigned to resources as follows: once all the tasks it depends on have finished, the task with the highest priority is assigned to the computing resource providing the shortest time for the task [18].
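A minimal sketch of how rank (4) could be computed over a task graph by memoised recursion follows; the Task type and the maps holding the $E(T_w)$ and $D(T_{wv})$ estimates are assumptions for this example, not the paper's data structures.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the rank computation (4) by memoised recursion
// over the task graph; Task and the estimate maps are assumptions.
public final class RankCalculator {

    public record Task(int id, List<Task> successors) {}

    private final Map<Integer, Double> rankCache = new HashMap<>();
    private final Map<Integer, Double> execEstimate;   // E(T_w), cf. (5)
    private final Map<Long, Double> transferEstimate;  // D(T_wv), cf. (6)

    public RankCalculator(Map<Integer, Double> execEstimate,
                          Map<Long, Double> transferEstimate) {
        this.execEstimate = execEstimate;
        this.transferEstimate = transferEstimate;
    }

    // rank(T_w) = E(T_w) + max over successors of (D(T_wv) + rank(T_v));
    // tasks without successors simply get rank(T_w) = E(T_w).
    public double rank(Task t) {
        Double cached = rankCache.get(t.id());
        if (cached != null) {
            return cached;
        }
        double best = 0.0;
        for (Task succ : t.successors()) {
            double d = transferEstimate.getOrDefault(edgeKey(t, succ), 0.0);
            best = Math.max(best, d + rank(succ));
        }
        double r = execEstimate.get(t.id()) + best;
        rankCache.put(t.id(), r);
        return r;
    }

    // Packs an edge (from, to) into a single map key.
    private static long edgeKey(Task from, Task to) {
        return ((long) from.id() << 32) | (to.id() & 0xFFFFFFFFL);
    }
}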
Taking into account the specifics of the problem-oriented cloud computing environment, the following modifications apply to this algorithm.

Let $\mathcal{F}$ be the set of all functions that can be implemented in the subject area. Then a separate problem $T_w$ is a function $f_w \in \mathcal{F}$ with a set of input data objects $\mathcal{I}_w^{in} = (I_1^{in}, \dots, I_n^{in})$:
$$T_w = f_w(\mathcal{I}_w^{in}).$$
We define $R$ as the set of virtual machines available for deployment, with mean production capacity
$$\bar{R} = \sum_{i=1}^{n} \frac{R_i}{n} = \sum_{i=1}^{n} \frac{\Pi_i}{n}.$$
In this case, for evaluating the execution time we can apply the following formula:
$$E(T_w) = \bar{\tau}(f_w, \bar{R}, \mathcal{I}_w^{in}), \qquad (5)$$
where $\bar{\tau}(f_w, \bar{R}, \mathcal{I}_w^{in})$ is the average execution time estimate for the function $f_w$ on a set of machines with mean production capacity $\bar{R}$, given the set of known values of the input parameters $\mathcal{I}_w^{in}$.
The model of problem-oriented services should take into account the amount of data returned by each task $T_w$. For this, the operator of the expected output $\nu(f, \mathcal{I}^{in})$ can be used, which returns the expected total size in bytes of the output data objects $\mathcal{I}^{out}$. Consequently, within the framework of the problem-oriented model, the following estimation can be used for the evaluation of the data transmission time between two tasks:
$$D(T_{wv}) = \bar{\nu}(f_w, \mathcal{I}_w^{in}) \cdot R_{wv}, \qquad (6)$$
where $R_{wv}$ is the bandwidth of the data transmission channel in the cloud computing system. During the execution of a task it can be estimated as one of the following values:

1) $R = 0$, when the data transmission channel consists of a single node;
2) $R = \beta_{group}$, when the data transmission channel is shared by a group of nodes;
3) $R = \beta_{cluster}$, when the data transmission channel is shared by a cluster of compute nodes.
Figure 1 shows the pseudo-code of the algorithm for problem-oriented workflow scheduling in a cloud computing environment based on computing profiles.

PROCEDURE: PO-HEFT
INPUT: TaskGraph G(T, E), TaskDistributionList, ResourcesSet R
BEGIN
  for each t in T from task graph G
    approximate the task execution time according to (5)
  for each e in E from task graph G
    approximate the data transfer time according to (6)
  start a breadth-first search in reverse task order and
    calculate a rank for each task according to (4)
  while T has unfinished tasks
    TaskList <- get completed tasks from task graph G
    ScheduleTask(TaskList, R)
    update TaskDistributionList
END

PROCEDURE: ScheduleTask
INPUT: TaskList, ResourcesSet R
BEGIN
  sort TaskList in reverse task rank order
  for each t from TaskList
    r <- get the resource from R that can complete t earliest
    schedule t on r
    update the status of r
END

Fig. 1. Problem-oriented heuristic scheduling algorithm PO-HEFT.
V.
ALGORITHM IMPLEMENTATION AND PERFORMANCE EVALUATION

In order to assess the proposed algorithm's efficiency, we developed a benchmark using the WorkflowSim cloud environment simulation platform. We implemented the PO-HEFT algorithm itself, as well as a naive brute-force algorithm that finds an ideal scheduling solution.

The algorithm was implemented as a number of Java classes, so that WorkflowSim can use it as the simulated cloud environment's scheduler. We implemented both a custom DatacenterBroker, in order to schedule VMs in a data center, and a custom CloudletScheduler, in order to schedule tasks (cloudlets, in CloudSim's and WorkflowSim's terminology) within a single VM.
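For orientation, the following sketch shows one possible way to feed precomputed PO-HEFT ranks into a custom CloudSim broker; DatacenterBroker, Cloudlet, submitCloudletList and getCloudletId are standard CloudSim 3.x API, while PoHeftBroker and the rank map are our illustrative assumptions and do not reproduce the authors' actual class design.

import java.util.Comparator;
import java.util.List;
import java.util.Map;
import org.cloudbus.cloudsim.Cloudlet;
import org.cloudbus.cloudsim.DatacenterBroker;

// Illustrative sketch only: PoHeftBroker and the rank map are assumptions;
// the CloudSim types and methods used here are real.
public class PoHeftBroker extends DatacenterBroker {

    private final Map<Integer, Double> rankByCloudletId;

    public PoHeftBroker(String name, Map<Integer, Double> rankByCloudletId)
            throws Exception {
        super(name);  // CloudSim brokers are named simulation entities
        this.rankByCloudletId = rankByCloudletId;
    }

    // Submit cloudlets in decreasing rank order, so that the
    // highest-priority tasks are dispatched to resources first.
    public void submitRanked(List<Cloudlet> cloudlets) {
        cloudlets.sort(Comparator.comparingDouble(
                (Cloudlet c) -> rankByCloudletId.getOrDefault(c.getCloudletId(), 0.0))
                .reversed());
        submitCloudletList(cloudlets);
    }
}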
The algorithm was tested in a simulation in which virtual machines with homogeneous characteristics were deployed. The simulated system was given the same workflow 60 times, which greatly exceeds the capacity of the system. For the distribution of the workflow we used: the scheduler built into WorkflowSim itself, which does not use information about previous system runs; a perfect scheduler, which implements ideal scheduling through complete enumeration of the search space; and a scheduler based on the PO-HEFT algorithm, which uses information about previous runs. The computational complexity of the perfect scheduler does not allow its usage in any non-trivial simulation; therefore, this algorithm is not present in the comparison. We have also tested several other algorithms, such as plain HEFT, particle swarm optimization and a genetic algorithm; these will be a topic for further research.

We plan to implement the developed cloud system and to model the behavior of the DAG, POS and Min-min algorithms in order to assess their efficiency.
VI.
CONCLUSION

In this article, we assessed current scheduling algorithms and defined a model that allows evaluating various metrics of a cloud computing environment for problem-oriented scheduling. Next, we described the PO-HEFT scheduling algorithm, which aims to provide workflow scheduling in heterogeneous cloud environments. The main distinctive feature of this algorithm is its ability to adapt the solution based on previous runs, which allows it to provide better resource utilization.

The algorithm's efficiency was assessed in CloudSim with the help of the WorkflowSim extension. As a benchmark we used CloudSim's built-in scheduler, the "space-shared scheduling policy", which uses round-robin for resource provisioning and virtual machine creation. Our proposed algorithm has shown significant efficiency gains over this simple scheduler.

As a further development, we will investigate the possibility of deploying this algorithm on a real cluster in order to assess its real-life, non-simulated performance. We will also compare this algorithm against different algorithms that do not use information about previous runs, in order to give empirical proof that this is a viable heuristic for workflow scheduling. We plan to extend the algorithm to schedule not only tasks on machines but also machine provisioning on virtual nodes.
REFERENCES
[1] S. V. Kovalchuk, P. A. Smirnov, K. V. Knyazkov, A. S. Zagarskikh,
and A. V. Boukhanovsky, “Knowledge-based expressive
technologies within cloud computing environments,” Adv. Intell.
Syst. Comput., vol. 279, pp. 1–11, 2014.
[2] D. I. Savchenko, G. I. Radchenko, and O. Taipale, “Microservices
validation: Mjolnirr platform case study,” 2015 38th Int. Conv. Inf.
Commun. Technol. Electron. Microelectron. MIPRO 2015 - Proc.,
pp. 235–240, 2015.
[3] J. Thones, “Microservices,” IEEE Softw., vol. 32, no. 1, pp. 116–116,
2015.
[4] H. Kobashi, S. Kawata, Y. Manabe, M. Matsumoto, H. Usami, and
D. Barada, “PSE park: Framework for problem solving
environments,” J. Converg. Inf. Technol., vol. 5, no. 4, pp. 225–239,
2010.
[5] E. Deelman, D. Gannon, M. Shields, and I. Taylor, “Workflows and e-Science: An overview of workflow system features and capabilities,” Future Gener. Comput. Syst., vol. 25, no. 5, pp. 528–540, 2009.
[6] A. Shamakina, “Brokering service for supporting problem-oriented
grid environments,” UNICORE Summit 2012, Proc., vol. 15, pp. 67–
75, 2012.
[7] M. Guzek, D. Kliazovich, and P. Bouvry, “A Holistic Model for
Resource Representation in Virtualized Cloud Computing Data
Centers,” IEEE Int. Conf. Cloud Comput. Technol. Sci., pp. 590–598,
2013.
[8] M. Guzek, P. Bouvry, and E.-G. Talbi, “A survey of evolutionary computation for resource management of processing in cloud computing,” IEEE Comput. Intell. Mag., vol. 10, no. 2, pp. 53–67, 2015.
[9] C. Chen, J. Liu, Y. Wen, and J. Chen, “Research on workflow scheduling algorithms in the cloud,” Commun. Comput. Inf. Sci. (CCIS), vol. 495, pp. 35–48, 2015.
[10] D. Kliazovich, J. E. Pecero, A. Tchernykh, P. Bouvry, S. U. Khan,
and A. Y. Zomaya, “CA-DAG: Modeling Communication-Aware
Applications for Scheduling in Cloud Computing,” J. Grid Comput.,
2015.
[11] A. Hirales-Carbajal, A. Tchernykh, R. Yahyapour, J.-L. Gonzalez-Garcia, T. Roblitz, and J. M. Ramirez-Alcaraz, “Multiple workflow scheduling strategies with user run time estimates on a Grid,” J. Grid Comput., vol. 10, no. 2, pp. 325–346, 2012.
[12] A. Tchernykh, L. Lozano, U. Schwiegelshohn, P. Bouvry, J. E.
Pecero, S. Nesmachnow, and A. Y. Drozdov, “Online Bi-Objective
Scheduling for IaaS Clouds Ensuring Quality of Service,” J. Grid
Comput., 2015.
[13] J. Yu, R. Buyya, and K. Ramamohanarao, “Workflow Scheduling
Algorithms for Grid Computing,” Springer Berlin Heidelb., vol. 146,
pp. 173–214, 2008.
[14] L. B. Sokolinsky and A. V. Shamakina, “Methods of resource
management in problem-oriented computing environment,”
Program. Comput. Softw., vol. 42, no. 1, pp. 17–26, 2016.
[15] J. J. Dongarra, P. Luszczek, and A. Petitet, “The LINPACK benchmark: Past, present and future,” Concurr. Comput. Pract. Exp., vol. 15, no. 9, pp. 803–820, 2003.
[16] “WPrime Systems. Super PI. 2013.” [Online]. Available:
http://www.superpi.net/ . [Accessed: 14-Nov-2015].
[17] H. Topcuoglu, S. Hariri, and M.-Y. Wu, “Performance-effective and low-complexity task scheduling for heterogeneous computing,” IEEE Trans. Parallel Distrib. Syst., vol. 13, no. 3, pp. 260–274, 2002.
[18] S. Chen, Y. Wang, and M. Pedram, “Concurrent placement, capacity
provisioning, and request flow control for a distributed cloud
infrastructure,” p. 279, 2014.
Towards a Novel Infrastructure for Conducting High Productive Cloud-Based Scientific Analytics

Peter Brezany∗, Thomas Ludescher† and Thomas Feilhauer‡
∗ Research Group Scientific Computing, Faculty of Computer Science, University of Vienna, Vienna, Austria, and SIX Research Centre, Brno University of Technology, Brno, Czech Republic. Email: peter.brezany@univie.ac.at
† Department of Computer Science, University of Applied Sciences, Dornbirn, Austria. Email: thomas@ludescher.at
‡ Department of Computer Science, University of Applied Sciences, Dornbirn, Austria. Email: thomas.feilhauer@fhv.at
Abstract—The life-science and health care research environments offer an abundance of new opportunities for improvement of their efficiency and productivity using big data
in collaborative research processes. A key component of this
development is e-Science analytics, which is typically supported
by Cloud computing nowadays. However, the state-of-the-art
Cloud technology does not provide an appropriate support
for high-productivity e-Science analytics. In this paper, we
show how productivity of Cloud-based analytics systems can
be increased by (a) supporting researchers with integrating
multiple problem solving environments into the life cycle of
data analysis, (b) parallel code execution on top of multiple
cores or computing machines, (c) enabling safe inclusion of
sensitive datasets into analytical processes through improved
security mechanisms, (d) introducing the scientific dataspace – a novel data management abstraction, and (e) automatic analysis services enabling a faster discovery of scientific insights and providing hints to detect potential new topics of interest. Moreover, an appropriate formal productivity model for evaluating infrastructure design decisions was developed. The result of the realization of this vision, a key contribution of this effort, is called the High-Productivity Framework; it was tested and evaluated using a real life-science application domain addressing breath gas analysis, applied e.g. in cancer treatment.
Keywords-productivity model, scientific analysis, scientific
studies, cloud computing, security, breath gas analysis
I. INTRODUCTION
e-Science refers to the modern large scale science that
is increasingly being carried out through distributed global
collaborations enabled by the Internet. Typically, a feature of such collaborative scientific enterprises is that they require access to very large data collections and very large scale computing resources, and that they bring the results back to the individual user scientists.
A key component of this development is e-Science analytics, a dynamic research field that includes rigorous and sophisticated scientific methods of data preprocessing, integration, analysis, and visualization. Unlike traditional business analytics, e-Science analytics has to deal with huge, complex, heterogeneous, and very often geographically distributed datasets, the volume of which is already
measured in petabytes. Because of the huge volume and high dimensionality of the data, the associated analytics tasks are typically data and compute intensive; therefore, they are characterized together as resource-intensive analytics. Moreover, a high level of security has to be guaranteed for many analytical applications, e.g., in the finance and medical sectors.
So far, the main focus has been on the functionality of Cloud-based analytics systems, and not on the productivity aspects associated with their development, use and impact on expected scientific discoveries. In the high performance computing domain, a special US DARPA program and national programs in several countries have focused on providing a new generation of economically viable high productivity computing systems for national security and for the industrial and research user communities. We believe that a similar effort is needed in the Cloud research domains, especially in e-Science analytics.
In scientific analysis, when time-consuming calculations are performed, the execution time can be improved by using more and more CPU power and calculating the results in parallel, but the costs of execution will often increase with the reduced time consumption. Evaluating the productivity helps companies and research institutes to achieve the best possible output with minimal costs and time. Our research presented in this paper addresses exactly this issue. Its main contribution is a formal model that enables evaluating and predicting key productivity parameters associated with the development and execution of modern scientific analysis software systems. Furthermore, we address several other potential sources of productivity increase and provide appropriate original solutions. For better understanding, we provide several example scenarios. The productivity model and the different approaches are combined in the High-Productivity Framework (HiProF) that we designed and implemented. Its prototype software, documentation, user's guide, and a test infrastructure are available on the Internet.
Figure 1. Research Questions and the HiProF Response.
During our research, while developing a high productivity infrastructure for breath gas analysis called the ABACloud¹ [1], we came across two major research questions, which are shown in Figure 1 in the HiProF context. The first question, “How can the productivity be estimated?”, is very important for evaluating the effect of decisions (e.g., increasing the number of worker nodes to achieve the results in less time). Without a generally usable model, these changes cannot be measured or used to make the best decision in terms of increasing the overall productivity. The second question, “How can the productivity be improved by technology?”, deals with several different ways to increase the overall productivity in scientific analysis. The Security Concept, the Code Execution Framework, the Dataspace Concept, and the Automatic Analysis Framework are four of the most promising contributions, which are used in the different sample scenarios introduced in the subsequent paragraphs. The Security Concept compares the productivity influence with and without suitable security mechanisms. The Code Execution Framework (CEF) enables researchers to execute different Problem Solving Environment (PSE) codes in parallel in a cloud-based infrastructure, even if the corresponding PSE is not installed locally. We proposed and realized the concept of the scientific dataspace that involves and manages numerous and often complex types of primary, derived and background data in an intelligent way based on Semantic Web principles. The Automatic Analysis Framework (AAF) can be used to pre-analyze newly collected data to detect dependencies or data quality issues. As a next step, the AAF can be used to find new insights into existing research areas or even to find new topics of interest. All these components (Formal Productivity Model, Security Concept, Dataspace Concept, Code Execution Framework, Automatic Analysis Framework) are part of HiProF.

¹ It primarily supports the development and reproducibility of scientific studies in the context of Problem Solving Environments, such as MATLAB, R, and Octave, and Cloud-enabled workflows.
Developers of high-productivity infrastructures can select the HiProF technologies and integrate them in their own infrastructures. All provided technologies can be used optionally. If private data (e.g., patient data) is involved and user interactions in the infrastructure must be traceable down to the lowest level of the system (e.g., the database), the provided cloud-enabled security concept simplifies the development of an infrastructure that fulfills such a requirement. Installing and using the CEF in a high-productivity infrastructure enables researchers to execute problem solving environment codes (MATLAB, R, and Octave) in a cloud-based infrastructure. The CEF focuses on saving time and cost to increase the productivity. The AAF supports researchers in their daily work when data is collected continuously: existing algorithms can then be applied to the newly arriving data. Furthermore, the developer or administrator of the infrastructure can use the provided formal productivity model to make infrastructure decisions (e.g., the size of the private cloud infrastructure). Depending on the requirements of the new infrastructure, the model can be simplified by removing unnecessary cost variables (e.g., the license costs if only open source software is used or the required licenses are already available and do not generate additional costs).
The rest of the paper is organized as follows. Section Background and Related Work contains a well-known definition of productivity and discusses the related work. Section Formal Productivity Model presents a general formula to calculate the productivity. The expected productivity improvements of the most promising approaches are then discussed theoretically in Section Productivity Model Usage Scenarios. Finally, we briefly summarize the achieved results in Section Conclusions.
II. BACKGROUND AND RELATED WORK
This section provides a general definition of the term
productivity and the related work addressing productivity
in the field of computer science.
A very common and general definition of productivity is
listed at [2]: “Productivity is a measure relating a quantity
or quality of output to the inputs required to produce it.”
Productivity plays an important role in almost every discipline. It can be used to generate a better result, to reduce the accruing costs, to evaluate changes of the infrastructure used, and to evaluate the impact of the quality of the produced final results (e.g., scientific discoveries) on the appropriate domains.
Productivity was originally informally defined in economics² [3] as the amount of output per unit of input; in other words, productivity is the ratio of production output to what is required to produce it (input). For example, in a computer factory the productivity can be calculated as the number of computers divided by the working hours needed, or, inversely, as the number of working hours to assemble a computer. The productivity can be increased by raising the output or by decreasing the working hours needed. This can be done, for example, by (a) increasing the skills or motivation of the employees, (b) providing better working conditions (e.g., equipment), or (c) providing an automated plant.
In the year 2004, a series of ten papers about the topic
High Performance Computing (HPC) productivity was published in the International Journal of High Performance
Computing Applications [4]. The different authors share
their points of view on this topic. H. Zima arranged the
workshop “High-Productivity Programming Languages and
Models” [5] in 2004. The goal of this workshop was to
design a new high-productivity language system and to find
a consensus on a set of common research strategies.
Figure 2. Different utility functions depending on time; modified from [6]. This figure shows different time-dependent utility functions. The users of the productivity model can choose the best fitting function for their needs.
III. FORMAL PRODUCTIVITY MODEL

J. Kepner [6] provides the following productivity formula:
$$\text{Productivity} = \Psi = \frac{\text{Utility}}{\text{Costs}} = \frac{U(T)}{C} = \frac{U(T)}{c(T)} \qquad (1)$$
$U(T)$ represents the time-dependent utility function, which is described in Section Utility. $C$ stands for the total costs; the total costs can equally be described by a function of the required time, $c(T)$. The costs are further described in Section Costs. $T$ is the vector of all relevant time values (e.g., maintenance time, execution time, etc.). This formula is used in this paper to illustrate its support for decision making and for improving the overall productivity.

² The terms productivity and economy pursue partly the same goals. The term economy mainly tries to reduce the costs; the term productivity additionally includes the required time and the result/output.
Utility

The Utility is the value a specific user or organization places on getting a certain answer in a certain time [6]. In general, the utility function is time dependent. A computer manufacturer has a higher benefit if the production process of a computer takes a couple of minutes instead of several hours. Furthermore, the benefit increases if the quality is improved or a more powerful computer is produced.
Figure 2 shows different time-dependent utility functions. Figure 2(a) shows a continuously decreasing utility function. This function can be used for many different application areas. Just imagine that you are waiting for a bus: if the bus arrives exactly at the same time as you arrive at the bus station, you have the highest benefit; if you must wait for a while, the benefit continuously decreases. Figure 2(b) shows a step utility function. This kind of curve can, for example, be used for a weather forecast simulation: if the simulation takes longer than the time for which the weather should be predicted, your output is useless, because you have already seen the real weather. Figure 2(c) shows a constant function that can, for example, be used in a non-time-critical scientific analysis. Figure 2(d) shows a multi-step curve; consider, for example, a 3D movie production that needs to render its scenes after all changes have been made. The benefit is highest when the user can see the result right away or after a short coffee break. If he/she needs to wait overnight, the utility decreases, but at least the work can be continued on the next working day. It is useless if the user must wait, for example, one week to see the output of his/her changes; in this case the utility is zero. A step function can be the right choice for a researcher who works with scientific analysis: if the analysis process needs only a couple of seconds, the researcher has the highest benefit; if the computation needs some hours, the utility decreases, but at least the researcher has access to the results on the next working day. Nevertheless, the benefit is probably zero if the complete analysis needs much longer (e.g., years). Depending on the given problem, it is very important to choose the right utility function.
Costs

The total costs can be divided into total software costs $C_S$, ownership costs $C_O$, hardware/machine costs $C_M$, personnel costs $C_P$, and data costs $C_D$. We can express the same costs dependent on the time as $(c_S + c_O + c_M + c_P + c_D) \times T$, where the $c_x$ are the costs per unit of time and $T$ is a vector of all relevant time values. The software costs can further be divided into license costs $C_L$ for licensed software and full development costs $C_{DEV}$ for self-developed software ($C_S = C_L + C_{DEV}$). The ownership costs contain the energy costs $C_E$, the costs for buildings $C_B$, and the maintenance costs $C_{MA}$ ($C_O = C_E + C_B + C_{MA}$). Dealing with data usually implies additional costs. These costs accumulate during (a) data gathering $C_{GA}$ (e.g., collecting sensor data), (b) data generation $C_{GE}$ (e.g., generating a random time series), (c) data storage $C_{ST}$ (e.g., storing and backing up data), (d) data transfer $C_T$ (e.g., transmitting data to Amazon S3), and (e) data transformation or conversion $C_C$ (e.g., converting an HDF5 file [7] to a CSV file), which results in $C_D = C_{GA} + C_{GE} + C_{ST} + C_T + C_C$.
$$C_{total} = C_S + C_O + C_M + C_P + C_D \qquad (2)$$
$$c_{total} = c_S + c_O + c_M + c_P + c_D \qquad (3)$$
$$C_{total} = c_{total} \times T \qquad (4)$$
The formula for the total productivity:
$$\Psi = \frac{U(T)}{C_S + C_O + C_M + C_P + C_D} \qquad (5)$$
$$\Psi = \frac{U(T)}{(c_S + c_O + c_M + c_P + c_D) \times T} \qquad (6)$$
The last two overall formulas can be used to calculate the productivity in computer science. The costs contain all accumulated costs (e.g., development, usage, maintenance, license, data). Depending on the given scenario, several particular costs are zero or can be neglected. This model can be used to calculate the productivity in resource-intensive scientific analysis (data-intensive and/or compute-intensive).
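As a worked illustration of formula (6), the following snippet computes Ψ for a hypothetical analysis run with a step utility function; all cost and utility numbers are invented for the example.

// All numbers below are hypothetical, chosen only to show how formula (6)
// is applied; the step utility corresponds to Figure 2(b).
public class ProductivityExample {

    // Step utility: full value if the result arrives within the deadline,
    // nothing afterwards.
    static double utility(double hours, double deadlineHours, double value) {
        return hours <= deadlineHours ? value : 0.0;
    }

    public static void main(String[] args) {
        double cS = 2.0, cO = 1.0, cM = 3.0, cP = 40.0, cD = 0.5; // costs per hour
        double t = 6.0;                                           // execution time in hours
        double u = utility(t, 24.0, 10_000.0);                    // utility of the answer
        double psi = u / ((cS + cO + cM + cP + cD) * t);          // formula (6)
        System.out.printf("Productivity Psi = %.2f%n", psi);      // prints about 35.84
    }
}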
IV. PRODUCTIVITY MODEL USAGE SCENARIOS
The following sections describe how productivity can be measured and increased by means of several example scenarios.
Using a Security Concept
The utility method does not only depend on the time, as
shown in Section Utility. The quality of the output depends
on the quality, volume, and availability of the input data as
well. The poor quality results in a lower valuable output
of the data analysis (utility function) and the additional
costs increase the total costs [8]. Both affect the overall
productivity negatively.
In many e-Science studies, personal data is involved. The privacy of this data must be assured, and all data has to be stored in a secure place during its whole lifetime. Without security, either this sensitive data must be anonymized or the researcher is not allowed to use the private data for the analysis. The original sensitive data is then not available for the analysis procedure, which possibly affects the volume of the available input data. The anonymization process can be almost impossible (e.g., if the date of birth and the patient's address are needed for further analysis), or it can be time-intensive, which increases the total costs. If a researcher is not allowed to access all relevant data, the availability of the input data decreases, which determines the quality of the output as well. Both disadvantages decrease the total productivity.

The data quality, data volume, or data availability related utility function can, for example, be a continuously increasing function or a step function (Figure 3). The step function can be selected, for example, in the following scenario: if security is available, all data can be used and the utility value is maximized; otherwise, only a subset of the data is allowed to be used and the utility value is reduced.
Figure 3. Different data-dependent utility functions, depending on data quality, data volume, or data availability.
It is hard to determine a precise rate for the utility in practice. The utility method of e-Science studies can be defined depending on the availability (with a security mechanism) of the input data and the time (U(Time, Data) = U(t, D)). To simplify the diagram, a continuously decreasing utility function for the time dependencies (see Figure 2(a)) and a step function for the data quality (see Figure 3(b)) are used.
Figure 4 shows the total utility function of the supposed
scenario. The highest output can be reached if the result is
available as soon as possible and if the complete data is
available.
Nevertheless, the security overhead increases the total costs as well (Ψ = U↑/C↑).³ The security overhead (e.g., Kerberos overhead) mainly depends on the infrastructure used (e.g., network connection, execution time). The following example shows how the productivity can be compared with and without security in general. To calculate the productivity threshold of a system with and without security, we defined formula 7. For simplicity, we assume that the security overhead influences all costs equally (e.g., personnel costs, energy costs). U(t, D) is the utility function without security features, US(t, D) describes the function with security, and tS represents the additional time required for the security (overhead).

³ The up arrows indicate that the expected value increases. A down arrow would indicate that the expected value decreases.
Figure 4. Total utility function. In this figure the utility function depends on time (continuously decreasing) and on the data (step function).
$$\frac{U(t, D)}{c_{total} \cdot t} < \frac{U_S(t + t_S, D)}{c_{total} \cdot (t + t_S)} \quad (7)$$

Formula 7 can be rewritten as:

$$t_S < t \left( \frac{U_S(t + t_S, D)}{U(t, D)} - 1 \right) \quad (8)$$
If we assume that (a) the utility value with security is larger than the utility value without security, US(t + tS) > U(t), and (b) the execution time is much higher than the security overhead, t ≫ tS, then the productivity of the secure system is always better than the productivity of the insecure system. In e-Science studies where sensitive data (e.g., personal data) is involved, the productivity increases when security is provided and more data is available for the analysis. Without a security concept, the data must be copied by hand, which is time-intensive and error-prone. The extra time increases the personnel costs, and invalid data decreases the utility function. Both disadvantages affect the productivity in a negative way. More details of a security concept we have developed are presented in [9].
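A small C++ sketch of the threshold test in formula (8) follows, under the paper's simplifying assumption that the security overhead inflates all costs equally; the utility and time values are hypothetical placeholders:

#include <iostream>

// Formula (8): the secure system is more productive when
// t_S < t * (U_S(t + t_S, D) / U(t, D) - 1).
bool secureIsMoreProductive(double t, double tS, double u_plain, double u_secure) {
    return tS < t * (u_secure / u_plain - 1.0);
}

int main() {
    double t = 3600.0;   // analysis time without security, in seconds (hypothetical)
    double tS = 12.0;    // additional security overhead (hypothetical)
    double u = 100.0;    // U(t, D): utility without security (hypothetical)
    double uS = 140.0;   // U_S(t + t_S, D): utility with security, more data usable
    std::cout << (secureIsMoreProductive(t, tS, u, uS) ? "secure wins" : "insecure wins") << '\n';
}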
Dataspace Concept
Our dataspace system [10] involves and manages the
primary data captured from the application, data derived by
curation and analytics processes, background data including
ontology and workflow specifications, semantic relationships between dataspace items based on scientific research
ontologies, and available published data. Additionally, it
provides advanced querying mechanisms about the contents
and relationships in the dataspace and enables productive
reproducibility of scientific studies. Our formal productivity model can be applied to this HiProF component in a similar way as shown in the previous subsection.
Using a cloud-based data analysis infrastructure
A cloud-based data analysis infrastructure enables researchers to execute different problem solving environment (PSE) codes (e.g., MATLAB, R, and Octave code) in parallel in the cloud. The Code Execution Framework (CEF) [13] component of HiProF is a kernel system used in such an infrastructure. The same PSE method can be executed with different parameter sets in parallel (embarrassingly parallel problems). The CEF can be installed on a local Eucalyptus infrastructure, Amazon EC2, or any other Amazon-compatible cloud-based infrastructure. Different infrastructures can be combined into a hybrid cloud (e.g., a local Eucalyptus installation for general usage, with additional EC2 worker nodes for high workloads). Furthermore, it is possible to combine different PSE types within one single analysis (e.g., MATLAB code can be transmitted for execution at the CEF with the R client, and vice versa). In [12], various scenarios are analyzed in detail using our formal productivity model. It is shown there that using the CEF results in faster execution and increased productivity.
Automatic Analysis System
An automatic analysis system helps researchers to automatically receive information about existing data. For
example, the Automatic Analysis Framework (AAF) [11]
uses classification, prediction, and clustering algorithms to
automatically analyze research data. The AAF is built upon the workflow management system Taverna, which is widely used in many different domains, and uses a data management system based on the Dataspace Concept [10]. It helps researchers to (a) find new topics of interest or (b) gain new insights into their existing research area. Figure 5 shows the general workflow of this automatic analysis. The AAF can be executed by the CEF continuously or at regular intervals (e.g., once per week).
Figure 5. General AAF Workflow. This data mining workflow is divided into (a) Data Preparation (input data, data selection), (b) Data Analysis (e.g., linear methods, neural networks, principal component analysis), and (c) Result Presentation (report generation) sections.
For calculating the productivity, it is important to know that (a) all automatic analysis calculations have the lowest priority, meaning that they will only be executed at the CEF if no other calculation is waiting, and (b) the output will be evaluated and ranked automatically, depending on the percentage of correctly classified test samples.
The productivity of the AAF can be calculated as follows:
$$\Psi = \frac{U(T)\uparrow}{C_S + C_E\uparrow + C_B + C_{MA}\uparrow + C_M + C_P\uparrow} \quad (9)$$
The AAF uses the CEF only when no other calculation is waiting; otherwise, it waits until the CEF is idle. Therefore, the software costs (CS), building costs (CB), and machine costs (CM) are zero. The energy costs (CE) and the maintenance costs of the AAF increase because of the additional usage and further AAF configurations. After the analysis, a researcher must evaluate the result, which means that the personnel costs (CP) rise. The system automatically ranks the results, which allows the researcher to skip non-promising ones.
$$\Psi = \frac{U(T)\uparrow}{C_E\uparrow + C_{MA}\uparrow + C_P\uparrow} \quad (10)$$

To increase the productivity, only the utility value must be larger than zero. The project leader must predict the utility function. If the project leader expects a valuable output of the system, the productivity will be increased when using the AAF in comparison to using no automatic analysis.

$$U(T) > 0 \implies \Psi > 0 \quad (11)$$
V. CONCLUSIONS

This paper introduces a novel productivity model that can be used in resource-intensive scientific analysis software development processes. Productivity plays an important role in achieving the best results in terms of time, costs, and quality. Our model extends the productivity formula of J. Kepner with additional cost variables (personnel costs, license costs, energy costs, building costs, maintenance costs, and data costs). These adaptations are necessary in order to be able to calculate the productivity of time-intensive and data-intensive scientific analyses. We defined different approaches to increase the overall productivity for these types of applications. Furthermore, we explained the usage of the productivity model with several example scenarios. The scenarios are mathematically described and the threshold formulas are provided. With these formulas and examples, an administrator can evaluate system adaptations (e.g., changing the infrastructure by adding additional worker nodes, using faster computers, using a hybrid Code Execution Framework, etc.) and determine how the changes affect the productivity.

The presented framework was tested and evaluated using a real life-science application domain addressing breath gas analysis, applied, e.g., in cancer treatment.
ACKNOWLEDGMENT

The research leading to these results has received funding from the Austrian Science Fund (Project No. TRP 77-N13) and the Czech National Sustainability Program supported by grant LO1401.
REFERENCES

[1] I. Elsayed et al., "ABA-Cloud: support for collaborative breath research," Journal of Breath Research, vol. 7, no. 2, p. 026007, 2013. [Online]. Available: http://eprints.cs.univie.ac.at/3981/
[2] About.com Economics, "Productivity - Dictionary Definition of Productivity," http://economics.about.com/od/economicsglossary/g/productivity.htm, Accessed July 2013.
[3] OECD, "Measuring Productivity - OECD Manual," http://www.oecd-ilibrary.org/content/book/9789264194519-en, 2001.
[4] International Journal of High Performance Computing Applications, "Table of contents, winter 2004, 18 (4)," http://hpc.sagepub.com/content/18/4, 2004.
[5] H. Zima, "Workshop on high-productivity programming languages and models," http://www.cs.illinois.edu/homes/wgropp/bib/reports/hpl_report.pdf, 2004.
[6] J. Kepner, "High Performance Computing Productivity Model Synthesis," International Journal of High Performance Computing Applications, vol. 18, no. 4, pp. 505–516, 2004. [Online]. Available: http://hpc.sagepub.com/content/18/4/505.abstract
[7] The HDF Group, "HDF Group HDF5," http://www.hdfgroup.org/HDF5, Accessed Dec 2012.
[8] D. M. Strong et al., "Data quality in context," Commun. ACM, vol. 40, no. 5, pp. 103–110, May 1997.
[9] T. Ludescher, T. Feilhauer, and P. Brezany, "Security Concept and Implementation for a Cloud Based E-science Infrastructure," 2012 Seventh International Conference on Availability, Reliability and Security, pp. 280–285, 2012.
[10] I. Elsayed and P. Brezany, "Dataspace support platform for e-science," Computer Science, vol. 13, no. 1, 2012. [Online]. Available: http://journals.agh.edu.pl/csci/article/view/14
[11] T. Ludescher et al., "Towards a high productivity automatic analysis framework for classification: An initial study," in Advances in Data Mining. Applications and Theoretical Aspects, P. Perner, Ed. LNCS, Springer, 2013, vol. 7987, pp. 25–39.
[12] T. Ludescher, "Towards High-Productivity Infrastructures for Time-Intensive Scientific Analysis," Ph.D. Thesis, Faculty of Computer Science, University of Vienna, Austria, 2013.
[13] T. Ludescher, T. Feilhauer, and P. Brezany, "Cloud-Based Code Execution Framework for scientific problem solving environments," Journal of Cloud Computing: Advances, Systems and Applications, vol. 2, no. 1, p. 11, 2013. [Online]. Available: http://www.journalofcloudcomputing.com/content/2/1/11
An OpenMP runtime profiler/configuration tool for
dynamic optimization of the number of threads
Tamara Dancheva, Marjan Gusev, Vladimir Zdravevski, Sashko Ristov
Ss. Cyril and Methodius University, Faculty of Computer Science and Engineering
1000 Skopje, Macedonia
Email: tamaradanceva19933@gmail.com, {marjan.gushev, vladimir.zdraveski, sashko.ristov}@finki.ukim.mk
Abstract—This paper describes the implementation and the
experimental results of a tool for dynamic configuration of the
number of threads used in the OpenMP environment based on
the current state of the runtime at the time of the call. For
this purpose, we use a mix of profiling and machine learning
techniques to determine the number of threads. The decision to
set the number of threads is made at the time of the call.
The proposed approach is designed to be cost effective in the
scenario of a highly dynamic runtime primarily when running
relatively long tasks consisting of a number of parallel constructs.
I. INTRODUCTION
The OpenMP API was designed with a goal to provide
a simple, fast, compact and portable way of parallelizing
programs [1]. Computer architectures have diversified and
evolved a great deal since the first release requiring OpenMP to
evolve as well. Today, considering the prevalence of SMPs and
SMT on the market, the massive virtualization and ccNUMA
architectures, the default schemes that have sufficed in the
past are no longer producing the desired results with the same
consistency as before.
Multi-cache architectures, hyper-threading and multi-core techniques helped prevent the potential stagnation in performance due to the limits imposed by physics, but at a cost. The increasingly intensive exploitation of virtualization techniques (of memory, computing power, applications, operating systems) only adds to the complexity introduced.
In order to get good performance, programmers as well as scientists have had to start studying hardware characteristics, cache associativity, swapping, virtualization and many other low-level hardware and software characteristics. The main problem is the diversity of resource sharing mechanisms in different architectures, most notably SMT [2]. OpenMP has a default scheme that is implementation specific, meaning that OpenMP implementations use different default policies to assign and interpret the internal control variables (ICVs). Additionally, in virtual machines, the number of threads is affected by the way different hypervisors present and assign processors and virtual cores to the virtual machine. The default scheme is static and tends to use the number of physical cores as the default number of threads. This does not exploit hyper-threading automatically, and whether that is a good or bad decision depends on the interconnection, location, and resource sharing policies and mechanisms of the processors involved, as well as how the program utilizes them.
Having a tool that can automatically learn the formula for the optimal number of threads, without previously gathering explicit user feedback on the program to be run, can significantly automate the process of producing better performing algorithms.
Initially, a suite of benchmarks is run concurrently using pThreads. Using profiling to capture the state of the runtime while running the suite, an output dataset is created. This dataset is used as input for the creation of random trees [3] (the random forest algorithm [4]). This machine learning algorithm uses a supervised approach to train a number of decision trees that all vote on the number of threads, i.e., classify the current runtime state; it is designed in such a way as to avoid overfitting (generating bad predictions because a tree is too specific to a subset of all the possible inputs or runtime states).
In this paper we aim at testing the validity of the following
hypotheses:
H1 There exists a subset or multiple subsets of dataset attributes for which the predictions generated by the random forest/tool do not degrade performance in any of the test cases, relative to the performance achieved by assigning the number of cores to the num_threads OpenMP variable.
H2 There exists a subset or multiple subsets of dataset attributes for which the tool boosts performance in comparison with the default OpenMP scheme, either by increasing or decreasing the number of threads.
Further details are disclosed in the following sections.
Implementation details of the proposed approach are given
in Section II. The experiments are described in Section III
and results in Section IV. Relevant discussion is given in
Section V, related work in Section VI and conclusions in
Section VII.
II. IMPLEMENTATION DETAILS
This section describes the approach used to optimize the
number of threads for an OpenMP program.
Retrieval of the runtime parameters of an OpenMP program is done using the SIGAR API [5]. The SIGAR API is an open source cross-platform library that helps in locating and parsing major system parameters, including CPU usage, RAM usage, and virtual memory I/O traffic (total and per process), as well as more detailed information about the processes, the hardware features and the operating system running on the machine.
TABLE I
LIST OF ATTRIBUTES USED

num_threads           number of threads for one run
cpu_usage             CPU usage for all processes
pid_cpu_usage         CPU usage for the current process
ram_usage             RAM usage for all processes
pid_ram_usage         RAM usage for the current process
pid_page_faults       page faults for the current process
vm_usage              virtual memory usage for all processes
pid_vm_usage          virtual memory usage for the current process
pid_vm_read_usage     virtual memory reads percentage for the current process
pid_vm_write_usage    virtual memory writes percentage for the current process
processes             # of processes and threads created
procs_running         # of running processes and threads
procs_blocked         # of blocked processes and threads
An indicator of its quality is that it is a framework suitable for use in cloud environments for measuring hypervisor performance [6]. It offers an extensive set of utilities to monitor the state of a highly dynamic runtime, which makes it a suitable instrument for the kind of measurements needed by this tool. A sample of all potential dataset attributes is given in Table I.
pThreads is an implementation of thread-level parallelism [7]. It is used to concurrently run a suite of OpenMP benchmarks. The benchmarks are run for a range of thread counts [1, max_num_threads], constituting one run. This is done 50 times per benchmark by default. The optimal number of threads in the specified range is the one that yields the best execution time during each consecutive run. The corresponding runtime state is then written to disk.
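A condensed C++ sketch of this profiling loop is given below; the runtime-state type and the helper functions are hypothetical stand-ins for the SIGAR-based state capture and the real benchmark executions:

#include <chrono>
#include <limits>
#include <thread>

// Hypothetical stand-ins: the actual tool captures the Table I attributes
// via the SIGAR API and runs real OpenMP benchmarks.
struct RuntimeState { double cpu_usage = 0.0, ram_usage = 0.0; /* ... */ };
RuntimeState captureRuntimeState() { return {}; }         // SIGAR snapshot (stub)
void runBenchmark(int /*numThreads*/) { std::this_thread::sleep_for(std::chrono::milliseconds(10)); }
void appendRecord(const RuntimeState&, int /*label*/) {}  // dataset output (stub)

// One profiling run: try every thread count in [1, maxThreads] and record
// the runtime state together with the best-performing thread count.
void profileOneRun(int maxThreads) {
    RuntimeState state = captureRuntimeState();
    int best = 1;
    double bestTime = std::numeric_limits<double>::max();
    for (int n = 1; n <= maxThreads; ++n) {
        auto t0 = std::chrono::steady_clock::now();
        runBenchmark(n);
        std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
        if (dt.count() < bestTime) { bestTime = dt.count(); best = n; }
    }
    appendRecord(state, best);  // label: thread count with the best elapsed time
}

int main() { for (int run = 0; run < 50; ++run) profileOneRun(4); }  // 50 runs per benchmark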
The OpenMP algorithm which utilizes the tool has to call the omp_set_num_threads function before each parallel block, to invoke the wrapper function. The number of threads is predicted using the random forest and used as a parameter in the new handle which is returned.
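A sketch of how such a wrapper might look is shown below; predictNumThreads is a hypothetical stand-in for the trained random forest classifier:

#include <omp.h>

// Hypothetical stand-in for the trained random forest: classifies the
// current runtime state into a number of threads.
int predictNumThreads() { return omp_get_num_procs(); /* placeholder prediction */ }

// Wrapper invoked before each parallel block: the decision on the number
// of threads is made at the time of the call, based on the runtime state.
void setAdaptiveNumThreads() { omp_set_num_threads(predictNumThreads()); }

int main() {
    setAdaptiveNumThreads();   // called before the parallel region
    #pragma omp parallel
    {
        // parallel work runs with the predicted number of threads
    }
    return 0;
}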
III. DESCRIPTION OF EXPERIMENTS
The goal of running the experiments is to simulate as many states of the runtime as possible, so that a representative decision tree can be constructed out of the provided dataset. This is closely related to the parameters that are used to represent the state, and it is essential that they are carefully configured so that a performance boost can be obtained. Random delays are enforced in between runs in order to accomplish this better.
The dataset is boosted by generating additional dataset records from the existing dataset records with repetition. Basically, the experiment is repeated a number of times, and this helps to break the dependency between records, or decrease its negative impact on the decision trees that are to be generated out of the dataset, by adding variance to the output that helps remove some of the initial bias. This works for small biases only. Additionally, boosting enables us to get p-values and provides a good basis for running statistical tests and deriving conclusions about our data. Therefore, the parameters have to be very carefully selected. Additionally, random forest does not require any normalization of the dataset, since it considers each attribute independently [4].
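This boosting step is essentially resampling with repetition (a bootstrap); a minimal C++ sketch with a hypothetical record type:

#include <random>
#include <vector>

// Hypothetical record type: one captured runtime state plus its label.
struct Record { std::vector<double> attributes; int best_num_threads; };

// Generate extra records by sampling the existing ones with repetition.
std::vector<Record> bootstrap(const std::vector<Record>& data, std::size_t extra) {
    std::mt19937 rng(std::random_device{}());
    std::uniform_int_distribution<std::size_t> pick(0, data.size() - 1);
    std::vector<Record> boosted = data;
    for (std::size_t i = 0; i < extra; ++i)
        boosted.push_back(data[pick(rng)]);  // repetition is allowed by design
    return boosted;
}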
TABLE II
EXPERIMENTAL ENVIRONMENT FOR PROFILING

Benchmark programs   Dijkstra, Multitask, Poisson [8]
Hardware             HP ProBook 650, i5-4210M, 4 GB RAM
OS                   Debian GNU/Linux 6.0
TABLE III
LIST OF ATTRIBUTES INCLUDED IN THE DATASETS

Dataset 1   all attributes listed in Table I
Dataset 2   num_threads, cpu_usage, pid_cpu_usage, lst_cpu_usage, ram_usage, pid_page_faults, procs_blocked, user, sys, idle, irq
Dataset 3   num_threads, pid_cpu_usage, lst_cpu_usage, nice, idle, wait, irq, soft_irq
Dataset 4   num_threads, pid_ram_usage, ram_usage, pid_cpu_usage, cpu_usage, procs_blocked, nice, idle, user, sys, wait, irq, soft_irq, lst_cpu_usage
Each forest is created the first time the call to set_num_threads is made.
After the initial configuration, the forest is used to predict the number of threads for the parallel region in the OpenMP algorithm. The overall results are 50–150 elapsed-time values obtained in both cases: by using the random forest for prediction, and by utilizing the OpenMP default schemes.
The experimental environment is presented in Table II.
The experiments are based on using 4 subsets of the
parameters listed in Table I. The corresponding parameters
are listed in Table III.
The overall performance is analyzed in two environments. The no stress environment is one where the experiments are executed with no additional workload. A stress environment is simulated using the stress program package for POSIX systems, which imposes stress on the system (CPU, memory, I/O, disk) by starting various numbers of threads concurrently with the experiments.
Three experiments are conducted with OpenMP default schemes and four more experiments using the tool with different datasets. To evaluate the speedup of using the tool over the default configuration, we compare the corresponding elapsed times.
Algorithm I conducts an exhaustive search to find a combination of input gates that produces a logical circuit with output 1 [8]. It has a repetitive structure without complex data dependencies and can exploit a high degree of parallelism. Algorithm II is an implementation of the Fast Fourier Transform (FFT) algorithm, with many data-flow dependencies which do not allow a high level of parallelism.
IV. RESULTS
A. Algorithm I in a No Stress environment
Fig. 1 presents the results (elapsed times) obtained with the default OpenMP scheme, setting the number of threads variable to 1, 2 and 4. The X-axis represents the number of conducted test runs.
Fig. 1. Performance of executing Algorithm I in a no stress environment

Fig. 2. Histograms for the 4 datasets for executing Algorithm I in a no stress environment

Fig. 3. Performance of executing Algorithm II in a no stress environment with default OpenMP schemes

Fig. 4. Performance of executing Algorithm II in a no stress environment using the tool with Datasets 1–4
Four separate measurements are conducted to evaluate the tool performance results. Each measurement uses a specific dataset defined in Table III to construct the forest, which is used to predict the number of threads, labeled in the graph as Dataset 1–4.
More detailed information about the number of threads predicted by the tool is presented in Fig. 2. A total of 150 test runs of Algorithm I utilizes the same 4 forests used for the performance measurements in Fig. 1, labeled as Dataset 1–4, in the decision making.
The default OpenMP scheme yields the best results with 4 threads (the value of the OpenMP num_threads variable). Using this reference value for the optimal choice of the number of threads, Dataset 2 and Dataset 4 result in forests that most accurately predict the optimal number of threads. This is presented in Fig. 2, showing 100% accuracy for Dataset 2 and 96% accuracy for Dataset 4.
For machines with the same hardware configuration that by default use the number of physical cores (2 in this case, i.e., 2 threads), the best speedup is Sp(2T, D2), achieved using Dataset 2, with an average value of 1.876241, which confirms Hypothesis H2.
Additionally, the predictions do not include a number of threads less than the physical number of cores, neither for the best performing Dataset 2 nor for any of the other datasets, confirming Hypothesis H1.
B. Algorithm II in a No Stress environment
The results obtained by executing Algorithm II using the OpenMP default scheme are presented in Fig. 3, and those of executing the tool with Datasets 1–4 in Fig. 4.
The histogram of predicted threads for executing Algorithm II in a no stress environment is presented in Fig. 5.
The tool using Dataset 2 confirms Hypothesis H2, boosting performance in the majority of the runs in comparison with the most common OpenMP default number of threads, which corresponds to the number of cores, i.e., 2. It can be concluded that Dataset 4 results in slightly better performance, according to the frequency of Dataset 4 points that lie below the Dataset 2 points, which agrees with the minimum average out of these two datasets.
Dataset 3 and Dataset 4 give the most accurate results out of the four datasets, according to the average elapsed time. However, the average elapsed time for this algorithm is generally not a good indicator of whether the predictions optimize the performance, since the intervals [mean − sd, mean + sd] of the measurements listed in Table IV all overlap with each other, except for 1 thread. Therefore, the graphs along with the significant attributes should be analyzed in order to determine which dataset results in the best optimization of the algorithm.

Additionally, no dataset prediction degrades the performance or results in a number of threads less than the number of physical cores (Fig. 5).

Fig. 5. Histograms of predicted threads for executing Algorithm II in a no stress environment

Fig. 6. Performance of executing Algorithm I in a stress environment with OpenMP default schemes

Fig. 7. Performance of executing Algorithm I in a stress environment with Datasets 1–4
C. Algorithm I in a Stress environment

The performance of executing Algorithm I in a stress environment is presented in Fig. 6 for the default OpenMP schemes and in Fig. 7 for Datasets 1–4.

Fig. 8 presents the histogram of predicted threads for executing Algorithm I in a stress environment and contains information about the frequency associated with the predictions.

Fig. 8. Histogram of predicted threads for executing Algorithm I in a stress environment
Algorithm I exploits parallelism to a high degree, enough that in a stress environment the OpenMP default scheme utilizes more cores. Eventually, this yields better performance, except in very rare cases with some overlap. This overlap is located in the interval [x − sd, x + sd] for a certain value x of the elapsed time. It is a case where the performance of executing the algorithm with a given dataset overlaps with the same interval for the execution of an OpenMP scheme.
The execution of Algorithm I with the maximum number of threads in the OpenMP default scheme yields the best performance, as presented in Table IV. The maximum number of threads is consequently used as a reference value when comparing the tool results with the OpenMP default scheme results.
Fig. 9. Performance of executing Algorithm II in a stress environment with default OpenMP schemes

Fig. 10. Performance of executing Algorithm II in a stress environment with Datasets 1–4

Fig. 11. Histogram of predicted threads for executing Algorithm II in a stress environment
D. Algorithm II in a Stress environment

Fig. 9 presents the performance of executing Algorithm II in a stress environment with OpenMP schemes, and Fig. 10 with Datasets 1–4.

The best performance among the OpenMP schemes results from the configuration with 4 threads, judging by the average. The majority of input states benefit more from 2 threads than from 4 threads when comparing the overlaps shown in the performance graphs. This happens due to the variation. It is not a negligible percentage, so it should be considered when comparing the tool results with the OpenMP default scheme results. The reference optimal number of threads is therefore not uniquely determined, but is closely related to the runtime.

The results presented in Fig. 9 and Fig. 11 indicate that, with the OpenMP default scheme, the optimal number of threads leans towards 4 threads. However, the same as with Algorithm I in a stress environment, this preference must be reviewed more closely due to the higher variation of the results, i.e., the unpredictability caused by the highly dynamic runtime state changes.
V. DISCUSSION

Table IV presents the average values of elapsed times for executing Algorithm I in a no stress environment (A1NS), Algorithm II in a no stress environment (A2NS), Algorithm I in a stress environment (A1S), and Algorithm II in a stress environment (A2S).
TABLE IV
AVERAGE ELAPSED TIMES FOR THE EXPERIMENTS

Test run     A1NS       A2NS       A1S        A2S
1 Thread     35.65783   7.752695   94.67928   22.99271
2 Threads    16.79859   5.192298   41.35630   13.52446
4 Threads    08.89994   4.724949   25.33419   13.58029
Dataset 1    11.95526   4.831201   39.43302   13.70530
Dataset 2    08.95332   4.770104   25.10371   12.70467
Dataset 3    12.16092   4.640983   40.83514   13.42115
Dataset 4    09.00934   4.556192   31.29540   12.54097
TABLE V
SPEEDUPS OBTAINED FOR THE EXPERIMENTS

       Test run    Dataset 1   Dataset 2   Dataset 3   Dataset 4
A1NS   1 Thread    2.982606    3.982636    2.932165    3.957874
       2 Threads   1.405121    1.876241    1.381358    1.864575
       4 Threads   0.744437    0.994037    0.731847    0.987857
A2NS   1 Thread    1.604713    1.625267    1.670486    1.701573
       2 Threads   1.074743    1.088508    1.118793    1.139614
       4 Threads   0.978007    0.990534    1.018092    1.037039
A1S    1 Thread    2.401015    3.771525    2.318574    3.025342
       2 Threads   1.048773    1.647418    1.012763    1.321481
       4 Threads   0.541023    0.849842    0.522447    0.681704
A2S    1 Thread    1.677651    1.809784    1.713170    1.833408
       2 Threads   0.986805    1.064527    1.007698    1.078422
       4 Threads   0.990874    1.068921    1.011857    1.082874
The obtained values of speedups for the experiments are
displayed in Table V.
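Each entry of Table V is the ratio of an OpenMP default-scheme elapsed time from Table IV to the corresponding dataset elapsed time; for example, for A1NS and Dataset 1 versus 1 thread, 35.65783 / 11.95526 ≈ 2.982606. A one-line C++ sketch of this computation:

// Speedup of a dataset-driven run relative to a fixed-thread-count run:
// Sp = elapsed(default scheme, k threads) / elapsed(tool with dataset j).
double speedup(double elapsedDefault, double elapsedDataset) {
    return elapsedDefault / elapsedDataset;
}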
One can conclude from Table V that no dataset produces a speedup factor of less than 1 when the number of threads equals the number of physical cores. The exception is Dataset 1 in the stress environment, with a value of 0.986805. Considering the standard deviation of the results, this value can be considered acceptable due to the stress environment. This evidence supports Hypothesis H1 for all the datasets.
In summary, the performance evaluation yields the following conclusions. The speedups measured in the stress and no stress environments for both Algorithms I and II suggest that the predictions result in optimized performance when using either Dataset 2 or Dataset 4. The indicators are the mean elapsed times, combined with the conclusions drawn by analyzing the performance graphs that compare the results of executing with the Datasets against the OpenMP default scheme results.
The maximum number of threads results in the best performance in both the no stress and stress environments. Since Algorithm I scales vertically more successfully than Algorithm II, stress testing does not result in a different optimal number of threads. The tool predictions that yield the best performance with the maximum number of threads are those of Dataset 2, with Dataset 4 following closely behind with slightly lower speedups. The optimization/performance boost is achieved by taking advantage of hyper-threading. The successful optimization using Dataset 2 and Dataset 4 confirms Hypothesis H2 for Algorithm I.
The optimal number of threads for Algorithm II, in both the no stress and stress environments, is not unique as in the previous case with Algorithm I. Therefore, it requires a more thorough graph analysis of the resulting performances. This analysis shows evidence that the tool succeeds in finding an optimal number of threads that varies from 2 to 4. Despite the high variance inevitably caused by the intensive stress testing, close examination of the graphs and the statistics of the datasets and the predictions yields the same conclusion: Dataset 2 and Dataset 4 again result in better performance than the other datasets. This evidence confirms Hypothesis H2 for Algorithm II.
VI. RELATED WORK
Among the research work done in the area of adaptive or dynamic runtime OpenMP configuration, one of the most recent works is based on creating a Multilayer Perceptron neural network with one hidden layer, trained with back propagation, to determine the number of threads for a parallel region based on the external workload [9].

Even though different machine learning algorithms, benchmarks and profiling tools are used, there are similarities between the goals and methodologies of this paper and [9]. The benefits of the optimization approaches presented in both papers are mainly demonstrated under increased workload. In both, a machine learning technique is used to predict the number of threads and is tested in an unknown setting.

Other similar work builds on [9] using approaches other than neural networks in dynamic workload settings, such as reinforcement learning and Markov decision processes for unsupervised learning [10]. Random forests are mentioned there as a suggestion for another machine learning algorithm that can solve the problem.
A substantial amount of work has been done on creating loop scheduling policies using runtime information. Other research demonstrates promising results by using an adaptive loop scheduling strategy targeting SMT machines [11].
VII. CONCLUSION
The conducted experiments and performance analysis provide evidence that supports the correctness of hypotheses H1 and H2.

The main achievement is that the analysis tool provides a way to map the dynamic runtime state to the optimal number of threads in real time, being able to respond to changes in the runtime. The automation of this task becomes a greater and greater advantage as the number of processors and the hyper-threading scale of a machine increase.

The significance of this paper is therefore to explore a way of automating optimization in OpenMP and to set up the groundwork for developing a truly generic cross-platform optimization tool that requires minimum configuration to produce good predictions of the optimal number of threads.

The results so far are encouraging, and the tool is able to correct the default scheme and find the optimal number of threads in cases of a highly dynamic runtime, when the variation of the parameters is significant enough to provide an indication of it.
REFERENCES

[1] B. Barney. (2015, Aug) OpenMP. [Online]. Available: https://computing.llnl.gov/tutorials/openMP/
[2] M. Curtis-Maury, X. Ding, C. D. Antonopoulos, and D. S. Nikolopoulos, "An evaluation of OpenMP on current and emerging multithreaded/multicore processors," in OpenMP Shared Memory Parallel Programming. Springer, 2008, pp. 133–144.
[3] OpenCV dev team. (2014, Nov) OpenCV documentation, Random Trees. [Online]. Available: http://docs.opencv.org/3.0-beta/modules/ml/doc/random_trees.html
[4] L. Breiman and A. Cutler. (2004) Random Forests. [Online]. Available: http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm
[5] R. Morgan and D. MacEachern. (2010, Dec) Sigar - system information gatherer and reporter. [Online]. Available: https://support.hyperic.com/display/SIGAR/Home
[6] P. V. V. Reddy and L. Rajamani, "Evaluation of different hypervisors performance in the private cloud with sigar framework," International Journal of Advanced Computer Science and Applications, vol. 5, no. 2, 2014.
[7] B. Barney. (2015, Aug) POSIX Threads Programming. [Online]. Available: https://computing.llnl.gov/tutorials/pthreads/
[8] John Burkardt. (2011, May) C++ Examples of Parallel Programming with OpenMP. [Online]. Available: http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm
[9] M. K. Emani, Z. Wang, and M. F. O'Boyle, "Smart, adaptive mapping of parallelism in the presence of external workload," in Code Generation and Optimization (CGO), 2013 IEEE/ACM International Symposium on. IEEE, 2013, pp. 1–10.
[10] M. K. Emani, "Adaptive parallelism mapping in dynamic environments using machine learning," Ph.D. dissertation, The University of Edinburgh, 2015.
[11] Y. Zhang, M. Burcea, V. Cheng, R. Ho, and M. Voss, "An adaptive OpenMP loop scheduler for hyperthreaded SMPs," in ISCA PDCS, 2004, pp. 256–263.
An effective Task Scheduling Strategy in multiple
Data centers in Cloud Scientific Workflow
Esma Insaf Djebbar
Department of Computer Science
University of Oran 1, Ahmed Ben Bella
Oran, Algeria
esma.djebbar@gmail.com

Ghalem Belalem
Department of Computer Science
University of Oran 1, Ahmed Ben Bella
Oran, Algeria
ghalem1dz@gmail.com
Abstract— Cloud computing is currently the most hyped and popular paradigm in the domain of distributed computing. In this model, data and computation are operated somewhere in a cloud, which is a collection of data centers owned and maintained by a third party. Scheduling is one of the most prominent activities executed in the cloud computing environment. The goal of cloud task scheduling is to achieve high system throughput and to allocate various computing resources to applications. The complexity of the scheduling problem increases with the size of the task set and becomes highly difficult to solve effectively. In this research, we propose a task scheduling strategy for Cloud scientific workflows based on gang scheduling in multiple Data centers. The experimentation shows the performance of the proposed strategy on response time and average cost of cloudlets.
Keywords— Cloud scientific workflow, task scheduling, gang
scheduling, multiple Data centers.
I. INTRODUCTION
Cloud computing is a distributed computing paradigm that mixes aspects of Grid computing, Internet computing, Autonomic computing, Utility computing, and Green computing. Cloud computing is derived from the service-centric perspective that is quickly and widely spreading in the Internet Technology world. From this perspective, all capabilities and resources of a Cloud (usually geographically distributed) are provided to the users as a service, to be accessed through the Internet without any specific knowledge of, expertise with, or control over the underlying technology infrastructure that supports them. Cloud computing offers a user-centric interface that acts as a unique point of access for users' needs and requirements. Moreover, it provides on-demand service provision, a Quality of Service (QoS) guaranteed offer, and an autonomous system for managing hardware, software, and data transparently to the users.
Cloud computing has recently received considerable
attention, as a promising approach for delivering Information
and Communication Technologies (ICT) services as a utility.
In the mechanism of providing these services it is necessary to
improve the utilization of data center resources which are
operating in most dynamic workload environments. Data
centers are the essential parts of cloud computing. In a single
data center generally hundreds and thousands of virtual
servers run at any instance of time, hosting many tasks and at
the same time the cloud system keeps receiving batches of task requests. In this context, one has to select a few target servers, out of the many powered-on servers, which can fulfil a batch of incoming tasks. Task scheduling is therefore a valuable issue which greatly influences the performance of a cloud service provider. Traditional approaches used in optimization are deterministic, fast, and give perfect answers, but often tend to get stuck in local optima. The complexity of the task scheduling problem stems from its extremely large search space, with a correspondingly large number of potential solutions, so it takes much longer to find the optimal answer. There is no ready-made and well-outlined methodology to solve the problems under such circumstances. However, in a cloud, it is tolerable to find a near best solution, preferably in a short period of time. This work is a continuation of our work presented in [7].
The rest of the article is organized as follows. Section 2
presents the related work. Section 3 introduces the basic
strategy of the proposed approach, gives an example and
analyzes the research problem. Section 4 demonstrates the
simulation results and the evaluation. Finally, Section 5
addresses our conclusions and future works.
II. RELATED WORKS
There are many algorithms for scheduling in Cloud computing. The main goal of a scheduling algorithm is to obtain high performance. The main examples of scheduling algorithms are First Come First Served (FCFS), Round-Robin (RR), the Min-Min algorithm, and the Max-Min algorithm.
A. FCFS Algorithm
The First Come First Served algorithm means that the task that comes first will be executed first.

B. Round-Robin Algorithm (RRA)
In this scheduling algorithm, time is given to resources in a time-slice manner.

C. Min-Min Algorithm
The Min-Min algorithm selects the smaller tasks to be executed first.

D. Max-Min Algorithm
The Max-Min algorithm selects the larger tasks to be executed first.
Scheduling in cloud computing can be categorized into three stages:
• Discovering resources and filtering them.
• Selecting a target resource (decision stage).
• Submitting a particular task to the target resource.
Lin and Lu [1] developed an algorithm for scheduling workflows in service-oriented environments. Unlike algorithms for Grid systems, it is able to utilize dynamic provisioning of resources. However, it misses the capability of considering the cost of resource utilization, which is required for its use in Cloud environments.

In [2], a scheduling algorithm based on the cost of workflows for real-time applications is presented. The purpose of the algorithm is to develop a scheduler that minimizes the cost and still meets the time constraints imposed by the user. The workflow is divided into subsets of tasks establishing a single flow. Tasks that do not form a single flow are separated, and each of them runs as an independent subset.

In [3], a dynamic workflow scheduling strategy that treats the user/resource relationship is presented. In this approach the resources are not seen individually, but grouped. The scheduler selects the sites, and this selection is made by an opportunistic strategy. It aims to spread the tasks of the workflow across Grid sites based on their performance in previous submissions.
III. PROPOSED STRATEGY

In the Cloud computing system, several tasks run simultaneously and have frequent access to data. To run a task, the data must be aggregated, and this requires data movement. Therefore, if several tasks use the same data, they must be placed together to minimize the frequency of data movement. The proposed approach includes two important stages, each of which contains a set of operations to be performed. Figure 1 shows a global view of the approach.

Fig. 1. A global view of the proposed approach
A. Stage of construction

During the construction phase, we use a matrix model to represent the existing tasks. We form clusters in the task set by the transformation of the matrix, and then we distribute the data set to different data centers as the original partitions to be used in the next step.

First, we compute the task dependencies of all tasks and accumulate a matrix TM whose elements are TMij = dependencyij. Each value is the dependency between the tasks Ti and Tj, and can be calculated by counting the data in common between the data sets used by the tasks Ti and Tj. For the diagonal elements of TM, each value is the number of data items used by the corresponding task. TM is a symmetric matrix of dimension n × n, where n is the total number of existing tasks.
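A minimal C++ sketch of this construction is given below, representing each task by the set of Dataset ids it uses; the type and function names are illustrative, not from the paper:

#include <set>
#include <vector>

// Each task is represented by the set of Dataset ids it uses.
using DatasetIds = std::set<int>;

// TM[i][j] = number of Datasets shared by tasks i and j;
// TM[i][i] = number of Datasets used by task i. TM is symmetric.
std::vector<std::vector<int>> buildDependencyMatrix(const std::vector<DatasetIds>& tasks) {
    const std::size_t n = tasks.size();
    std::vector<std::vector<int>> TM(n, std::vector<int>(n, 0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i; j < n; ++j) {
            int common = 0;
            for (int d : tasks[i]) common += tasks[j].count(d);  // |T_i ∩ T_j|
            TM[i][j] = TM[j][i] = common;
        }
    }
    return TM;
}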
The Bond Energy Algorithm (BEA) is applied to the matrix TM in order to group similar values together. Two measures, BEC and BEL, are defined for this algorithm. The permutation is done so that these measures (see Formulas 1 and 2) are maximized.

After applying the BEA to group the similar values in the matrix, the TM matrix is partitioned according to the number of data centers into multiple clustered arrays named CM1, CM2, CM3, ...
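The exact BEC and BEL expressions are not legible in the source; the C++ sketch below therefore assumes the classical BEA bond-energy measure over horizontally and vertically adjacent matrix entries, which the BEA permutation maximizes:

#include <vector>

// Assumed classical bond-energy measure: the paper's BEC/BEL presumably
// split it into column-adjacent and row-adjacent contributions.
double bondEnergy(const std::vector<std::vector<int>>& TM) {
    const std::size_t n = TM.size();
    double energy = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j + 1 < n; ++j) {
            energy += TM[i][j] * TM[i][j + 1];  // horizontal (column) bonds
            energy += TM[j][i] * TM[j + 1][i];  // vertical (row) bonds
        }
    return energy;
}
// BEA permutes rows and columns of TM to maximize this measure, grouping
// similar values together before partitioning into CM1, CM2, ...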
B. Stage of scheduling

The dependency matrix (i.e., TM) is dynamically maintained in this phase. When new tasks are generated or added to the system by a user, we calculate their dependencies with all existing tasks and add them to the matrix TM. We take the dependency matrix (TM) as input and generate the clustering dependency matrix (CM). In the CM, the items with the same values are grouped together.

Before worrying about the Datasets that will be generated, we must first run the existing tasks, since moving Datasets from one data center to another is more expensive than scheduling tasks to a Datacenter. A job scheduling algorithm is used.
In this algorithm, the technique used is based on the placement of Datasets; the ready tasks are scheduled to the Datacenter that contains the majority of the required Datasets. A task is said to be ready if all its required Datasets belong to the set of existing Datasets. Once the tasks are completed, new Datasets are generated.
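A C++ sketch of this placement rule follows; the types are illustrative, and the real implementation runs inside the CloudSim environment:

#include <set>
#include <vector>

using DatasetIds = std::set<int>;

// Schedule a ready task to the data center holding the majority of the
// Datasets it requires (illustrative sketch of the placement rule).
std::size_t chooseDataCenter(const DatasetIds& required,
                             const std::vector<DatasetIds>& dataCenters) {
    std::size_t best = 0;
    int bestCount = -1;
    for (std::size_t dc = 0; dc < dataCenters.size(); ++dc) {
        int have = 0;
        for (int d : required) have += dataCenters[dc].count(d);
        if (have > bestCount) { bestCount = have; best = dc; }
    }
    return best;
}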
IV. EXPERIMENTATIONS AND RESULTS

In this section, we describe the experiments that we conducted to evaluate the proposed approach. The experiments were conducted with the CloudSim toolkit [4, 5, 6]. The objective of the CloudSim framework is to provide a generalized and extensible simulation environment that allows the modeling, simulation and experimentation of new cloud infrastructure and associated application services. We used CloudSim version 3.0.3 in our work.
A. Average Response time
In this first simulation, we calculated the average response time with TimeShared and TimeShared Clustering (the proposed strategy), for different numbers of Cloudlets (20, 40, 60, 80, 100) with corresponding Cloudlet lengths. In CloudSim, the TimeShared algorithm can handle multiple requests (Cloudlets) at the same time, but the cloudlets must share the computing power of the machine; the functioning of the TimeShared algorithm is equivalent to the Round Robin algorithm. The following figures (Figure 2 and Figure 3) show the resulting average response times.
Fig. 2. Average response time of cloudlets

Fig. 3. Average response time of cloudlets
Based on these results, we note that the average response time in TimeShared increases as the number of cloudlets rises, because many cloudlets are processed at once and it takes time to process all of them, so the average response time grows. In TimeShared Clustering, the average response time is low because the Cloudlets are divided among different Datacenters.
B. Average Cost of cloudlets

In this series of simulations, we calculated the average processing cost of Cloudlets with both algorithms (TimeShared and TimeShared Clustering). Figures 4 and 5 present, as bar charts, the main executions performed in this scenario.
Fig. 4. Average cost of cloudlets
Fig. 5. Average cost of cloudlets
The objective of this series of simulations is to study the impact of our application on the cost of processing cloudlets. Based on these results, we note that the average cost with the TimeShared algorithm is very high compared to TimeShared Clustering, because the CPU utilization is lower in TimeShared.
V. CONCLUSION AND FUTURE WORKS

Cloud computing implies that data processing is not performed only on local computers, but in third-party data centers. It refers equally to applications delivered as a service over the Internet and to infrastructure as a service, i.e., compute resources and/or storage. These technologies appear to be key solutions for companies and research teams with modest budgets. Cloud computing makes it possible to provide users with a set of applications without requiring a lot of resources: the user connects to the cloud service provider's site and uses the proposed applications without realizing that different machines (virtual or not) are being accessed, and can also store personal data on remote servers. All of these services, which are available to users, are provided through the three cloud models, namely SaaS, PaaS and IaaS.
This work deals with the problem of task scheduling and its influence on the performance and cost of execution on the IaaS type of cloud. The initial objectives were to propose a method to distribute costs and improve runtime execution, and to implement and study the impact of our application on the performance of Cloud computing. We have established simulations under the CloudSim simulator environment because it is the best and the most popular simulator for Clouds. The objective of this simulation was to experiment with some metrics (the wait time, the response time, and the cost of processing) with the two algorithms, TimeShared (based on First Come First Served) and SpaceShared (based on Round Robin).

This work seeks primarily to improve cloud services, including the execution time of applications and the financial cost. For this, we compared the two algorithms, TimeShared and TimeShared Clustering. The results of the experiments carried out under the CloudSim simulator are encouraging and have met our expectations.
References

[1] C. Lin and S. Lu, "SCPOR: An Elastic Workflow Scheduling Algorithm for Services Computing," in Proc. Int'l Conf. SOCA, pp. 1-8, 2011.
[2] J. Yu, R. Buyya, and C. K. Tham, "Cost-based scheduling of scientific workflow application on utility grids," in E-SCIENCE '05: Proceedings of the First International Conference on e-Science and Grid Computing, pages 140-147, Washington, DC, USA, 2005.
[3] L. A. V. C. Meyer, D. Scheftner, J.-S. Vöckler, M. Mattoso, M. Wilde, and I. T. Foster, "An opportunistic algorithm for scheduling workflows on grids," in VECPAR, volume 4395 of Lecture Notes in Computer Science, pages 1-12, 2006.
[4] R. N. Calheiros, R. Ranjan, A. Beloglazov, C. A. F. D. Rose, and R. Buyya, "CloudSim: A Toolkit for Modeling and Simulation of Cloud Computing Environments and Evaluation of Resource Provisioning Algorithms," Softw., Pract. Exper., vol. 41, no. 1, pp. 23-50, Jan. 2011.
[5] R. Buyya, R. Ranjan, and R. N. Calheiros, "Modeling and Simulation of Scalable Cloud Computing Environments and the CloudSim Toolkit: Challenges and Opportunities."
[6] R. N. Calheiros, R. Ranjan, A. Beloglazov, C. A. F. De Rose, and R. Buyya, "CloudSim: A Toolkit for Modeling and Simulation of Cloud Computing Environments and Evaluation of Resource Provisioning Algorithms."
[7] E. I. Djebbar and G. Belalem, "Optimization of Tasks Scheduling by an Efficacy Data Placement and Replication in Cloud Computing," in International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP), pages 22-29, December 2013.
Visualization in the ECG QRS Detection
Algorithms
A. Ristovski*, A. Guseva*, M. Gusev**, S. Ristov**
* Innovation Dooel, Skopje, Macedonia
** Ss. Cyril and Methodius University, Skopje, Macedonia
info@innovation.com.mk
Abstract—Digital ECG data analysis is a trending concept in the field where applied computer science and medicine coincide. Therefore, in order to meet the requirements that arise, our R&D team has created an environment where developers can test different approaches to data processing. To assist this objective, the platform offers a number of features with the following main target goals: 1) to increase the effectiveness of conducting a proper medical diagnosis, 2) to incorporate a unified format for storing the results of the conducted diagnosis, and 3) to test various ECG QRS detection algorithms.
Index Terms—ECG Analysis; Modular Software Architecture;
AI Agent.
I. INTRODUCTION
The striding process of data digitalization has not left ECG data untouched. For instance, modern professional ECG instruments often feature a digital output in addition to hard-copied records. While hard-copied records serve as feasible hand-outs and provide medical personnel an immediate overview of a patient's condition, the digitalized records have proven to be convenient for storage. There are also devices known as Holter monitors, which take continuous readings for a period of one or several days. Holter monitors assist in the diagnosis of conditions that require extended monitoring of the heart's physiology. Their output is inevitably digitalized, since it is impossible to keep such lengthy recordings on paper. Not long ago, the medical gadget market became abundant with devices that provide the user with simplified ECG readings, again taking advantage of the most convenient storage form - digitalized data.
No matter the level of exhaustiveness and accuracy of the information contained in the digitalized data, many opportunities arise in the field of digital data processing. Of greatest significance is the opportunity to produce a diagnosis from the recordings without the assistance of medical personnel, i.e., to replace human resources with a virtual Artificial Intelligence (AI) agent [1].
The extent of detail provided in a conducted diagnosis varies according to the needs. In some cases, only statistical data targeting a specific medical condition or property is collected; in others, a thorough analysis of the ECG recording is performed, to deliver an accurate cardiac profiling.
Regardless of the scope of the diagnosis, visualizing the outcome of the signal processing while testing different algorithms is of substantial benefit. Applying a filter to the signal before it is processed can increase the effectiveness of the algorithm used for the processing. Therefore, for the purpose of gaining insight into how the filter has affected the initial signal, a graphic representation of the processed signal, in contrast to a representation of the original one, is a must. That way, by modifying the filters' parameters and immediately observing the results, the process of improving the effectiveness gains a significant advance.
An ECG signal consists of a sequence of form-specific typical waves that appear in an unambiguous pattern, at least unambiguous most of the time. These typical waves are referred to as components of the signal, or characteristic points. Although the signal's components can easily be distinguished by the human eye, it is quite a challenge to teach an AI agent to do so. By visually marking which signal components have been detected and their exact points of detection, one can determine to what extent the detection algorithm has been successful. That way, the left-outs can be subsequently analyzed and the algorithm can be modified accordingly, hence making corrections in a favorable direction.
Some algorithms may even use pattern recognition in the detection of the signal's components. By visualizing how a component fits the pattern it is supposed to fit, the pattern can be improved so that it better detects the components that have been left out during the detection. In addition, one of the more advanced features when using pattern recognition is the ability to assess which part of a component deviates from the general form, and in what way.
Therefore, using visualization in the development of algorithms for ECG component detection is of great interest. The goal is to design an AI agent that will mimic the way a medic would interpret a reading. Although the AI agent itself does not need visualization of any kind to do its job, it is practically inevitable for the development process to feature the use of visualization modules, i.e., to provide developers visual insight into how the approach they are using affects the correlation between the input data (the ECG signal) and the output data (a legitimate diagnosis). Whereas the paper is focused on the role of visualization in the AI agent design process, it also encloses an architectural design of an ECG diagnosis application that aims not only to deliver accurate results, but also to incorporate functionalities for testing new approaches and improving overall effectiveness.
The paper is organized as follows. The architectural design of the solution is given in Section II. Section III describes the architectural modules in detail and explains how the visualization has been implemented and how it assists the process of devising the detection algorithms. An overview of existing related work is given in Section IV. Section V discusses the conclusions and directions for future work.
II. ARCHITECTURAL DESIGN
The features of a tool designed for delivering an ECG diagnosis are incorporated into the modules of a system with a modular software architecture. Such a design brings the advantages of distributed development, code re-usability and modular transparency; hence, the integration of new functionalities and the upgrade of existing ones can be done easily.
The process of AI-agent-assisted diagnosis closely follows how a medic would proceed, with the notable exception of the steps needed to handle the input and output formats. The whole routine consists of several stages:
• Unpacking and interpreting the ECG recording, whatever the file format;
• Pinpointing the signal components:
– P waves;
– QRS complexes;
– T waves;
• Analyzing the constellation and form of the signal components;
• Setting a medical diagnosis;
• Saving the annotations.
In addition, the system is capable of evaluating the effectiveness of the applied algorithm on an extensive database of patients. It responds with several key effectiveness indicators, such as the detection rate (percentage of detected features), hit rate (percentage of correctly detected features), miss rate (percentage of incorrectly detected features), and the extra rate (percentage of extra determined features, which are not recognized by a human and have to be eliminated). This module is efficient in recognizing those parts of the ECG which caused problems (extra features or misses), so a user can automatically analyze them and improve the algorithm by customizing and adapting its parameters.
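As an illustration only, these indicators could be computed from raw counts along the following lines; this is a minimal C# sketch, and since the paper does not spell out the exact formulas, normalizing by the reference (human-annotated) feature count is an assumption:

    using System;

    static class Effectiveness
    {
        // Sketch: rates expressed as percentages of the reference features.
        // hits = correctly detected, misses = incorrectly detected,
        // extras = detections with no human-annotated counterpart (assumed).
        public static void Report(int reference, int hits, int misses, int extras)
        {
            double detectionRate = 100.0 * (hits + misses) / reference;
            double hitRate = 100.0 * hits / reference;
            double missRate = 100.0 * misses / reference;
            double extraRate = 100.0 * extras / reference;
            Console.WriteLine($"detection {detectionRate:F1}%  hit {hitRate:F1}%  " +
                              $"miss {missRate:F1}%  extra {extraRate:F1}%");
        }
    }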
The product reviewed in this paper has been designed in accordance with the stages of ECG analysis listed above and uses a separate software module for each stage. Guidelines for pragmatic processing and several assisting visualization modules are also designed in accordance with the concept of modular architecture. They are provided in order to make the whole process simpler and more transparent for developers. Although the architecture makes for a notably pragmatic solution to the problem, more importantly, it excels at increasing the effectiveness of the diagnostic capabilities.
The overall architecture of the system is presented in Fig. 1.
Fig. 1. An overview of the system's architecture
The next section describes how each module deals with its respective problems and, moreover, focuses on the benefits of implementing the visualization modules.
III. FUNCTIONAL DESCRIPTION
One of the fundamental ideas for an application with such an architecture is interoperability. That is achieved by allowing data input in different file formats, making it possible for the application to process output data from a variety of ECG sensors. The application's input module handles the different data representations containing the actual ECG signal, so that the data is given a unified composition - an essential prerequisite for the next stage of the analysis.
There are several standard medical record data formats for storing and retrieving ECG files. Although most of them are XML based, there are also examples of binary files, such as those used by the PhysioNet open source library [2]. Perhaps the most prominent is the US FDA standard HL7 [3], which specifies annotated ECGs and provides means to systematically evaluate the ECG waveforms and measurement locations. Bond et al. [4] review nine different formats used to store the ECG, such as SCP-ECG, DICOM-ECG, and HL7 aECG. The input module also consolidates additional record information, such as the sensor ID, time stamp, location, type of ECG lead being analyzed [5] and alike.
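For illustration, the unified composition produced by the input module could take a shape like the sketch below; the C# field names are hypothetical and do not come from the paper:

    using System;

    // One possible shape for a consolidated ECG record, independent of whether
    // it was parsed from SCP-ECG, DICOM-ECG, HL7 aECG or a binary PhysioNet file.
    public class EcgRecord
    {
        public string SensorId;      // additional record information
        public DateTime Timestamp;
        public string Location;
        public string Lead;          // the ECG lead being analyzed, e.g. "II"
        public double SamplingRate;  // samples per second
        public double[] Samples;     // the consolidated signal itself
    }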
The ECG signal visualization module is incorporated into the application's GUI. By means of this feature, the developers have immediate insight into how the analog representation of the digital signal looks, as if looking at a hard-copy ECG recording.
This visualization module is therefore responsible for displaying the consolidated ECG signal. The OpenGL libraries [6] have been used for visualizing the data. Since the development environment is based on the C# programming language, a wrapper for the OpenGL libraries is needed; the choice came down to OpenTK [7], which supersedes the Tao Framework [8].
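As a hedged illustration, a render callback in the immediate-mode style exposed by OpenTK's OpenGL bindings might look as follows; the mapping of samples to normalized device coordinates is an assumption, not the application's actual rendering code:

    using OpenTK.Graphics.OpenGL;

    static class SignalRenderer
    {
        // Draws the signal as a line strip across the viewport; amplitudes
        // are assumed to be pre-scaled to the [-1, 1] range by the caller.
        public static void Render(double[] samples)
        {
            GL.Clear(ClearBufferMask.ColorBufferBit);
            GL.Begin(PrimitiveType.LineStrip);
            for (int i = 0; i < samples.Length; i++)
            {
                double x = 2.0 * i / (samples.Length - 1) - 1.0; // index -> [-1, 1]
                GL.Vertex2(x, samples[i]);
            }
            GL.End();
        }
    }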
The application interface displays the additional information of the ECG signal and renders the analog representation of the ECG signal against a standard ECG graph-paper background [9]. Record durations of 10 and 30 seconds are the most common, as the information contained within a 10-second reading is sufficient for a standard diagnosis. Moreover, records of these lengths are optimal in terms of appearance, since longer records would make the visualized signal appear cluttered.
Fig. 2. A snapshot of the signal visualization GUI component
However, longer records do exist; certain diagnosis types require readings that last up to half an hour. Visualization of such records is also supported, by enabling transitions between consecutive 10-second strips, thus maintaining optimal appearance.
The size of the visualization component in the Graphical User Interface (GUI) is scalable, so it adapts to the dimensions allotted to it; the displayed ECG recording scales up accordingly to provide maximum clarity and visibility. The height of the graph-paper squares scales as well, to conform to the signal's amplitude. Optionally, a zero-amplitude line can be displayed.
The DSP module is another auxiliary application module, used for filtering noise out of the digital signal. The effectiveness of the upcoming second phase of ECG analysis can be significantly improved by applying a filter to the signal. The filtering can employ various DSP methods, such as interpolation, moving averages, regressive filters, differential filtering, wavelets, discrete Fourier transforms, etc. The module allows the use of any combination of these filters and the customization of their parameters.
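As a minimal example of one of the listed methods, a causal moving-average filter could be sketched in C# as follows; the window size and boundary handling are illustrative choices, not the module's actual implementation:

    using System;

    static class Dsp
    {
        // Each output sample is the mean of the current sample and the
        // (window - 1) samples preceding it; early samples use a shorter window.
        public static double[] MovingAverage(double[] signal, int window)
        {
            var filtered = new double[signal.Length];
            double sum = 0;
            for (int i = 0; i < signal.Length; i++)
            {
                sum += signal[i];
                if (i >= window) sum -= signal[i - window];
                filtered[i] = sum / Math.Min(i + 1, window);
            }
            return filtered;
        }
    }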
The processed signal can, on request, be displayed within the same visualization module next to the original signal, in order to give the developer insight into how applying the filter and changing its parameters affects the signal. The GUI component is shown in Fig. 2. The way the ECG signal visualization module and the DSP module improve the effectiveness is shown in Fig. 3.
Fig. 3. An iterative process on how the ECG signal visualization improves the overall effectiveness
Fig. 4. An iterative process on how the pattern visualization model improves the overall effectiveness
Fig. 5. A snapshot of the ECG component detection visualization
The second phase involves the feature extraction module, used for the detection and localization of the signal components. The easiest way of doing this is to conduct QRS complex detection first, as this signal component has the most distinctive form and is the easiest to pick out of the ECG recording. The other two components, the P waves and the T waves, can optionally be detected as well, by conducting a lookup for one of each in the interval between two neighboring QRS complexes. This stage can be done in a number of different ways: by analyzing the signal's curve slopes, by using pattern recognition, by looking for a local maximum at predefined intervals, etc. It is up to the developer to assess which approach is of greatest interest. All the detected signal components are then collected in a logical structure, referred to as the ECG signal component annotations, where an annotation consists of the following couple: component index (time stamp) and component amplitude.
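To make the data flow concrete, the sketch below pairs the annotation couple described above with a naive detector of the local-maximum kind mentioned earlier; the threshold and refractory interval are illustrative assumptions, not the algorithms evaluated in this paper:

    using System.Collections.Generic;

    public struct EcgAnnotation
    {
        public int SampleIndex;   // component index (time stamp)
        public double Amplitude;  // component amplitude
    }

    static class QrsDetector
    {
        // Naive R-peak detection: a sample is annotated when it exceeds a
        // threshold, is a local maximum, and lies outside a refractory window.
        public static List<EcgAnnotation> Detect(double[] x, double threshold, int refractory)
        {
            var annotations = new List<EcgAnnotation>();
            int last = -refractory;
            for (int i = 1; i < x.Length - 1; i++)
            {
                bool localMax = x[i] >= x[i - 1] && x[i] > x[i + 1];
                if (localMax && x[i] > threshold && i - last >= refractory)
                {
                    annotations.Add(new EcgAnnotation { SampleIndex = i, Amplitude = x[i] });
                    last = i;
                }
            }
            return annotations;
        }
    }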
When pattern recognition is used, the feature extraction module displays the pattern and the signal component matched against it. Because multiple components are detected, this module enables selection of which signal component is displayed in the ECG pattern visualization module. In the case of an undetected component, the specific part of the signal can be put on display under the pattern, so that the pattern can be modified to better detect the components it missed. This process is explained in Fig. 4.
When the expert system is used as a web service diagnosing streaming data from an ECG sensor, the feature extraction module offers an excellent opportunity to tune and customize the various filter and feature extraction parameters of the algorithms to match the specifics of the person being monitored.
The ECG signal visualization module can also help with testing the effectiveness of stage two by specifically marking the signal components, i.e. by indicating which components have been successfully detected and which have not. The application's interface offers a selection of the signal components to be displayed. For each component type selected for display, an indicator is drawn right above its positions. The feature is shown in Fig. 5.
After the signal's components have been temporally pinpointed (localized), the third phase is realized by the parametrization module. It features two types of analysis: obtaining temporal parameters and obtaining vector polarization parameters. The temporal analysis starts with the calculation of typical component intervals; these optionally include the PR, QRS, QT, PP and RR interval durations. Out of that data, additional temporal parameters are derived, such as Beats Per Minute (BPM), Heart Rate Variability (HRV), Beat Fluctuation (BFx), SDNN and other related parameters [10]. Then the form of the signal components is investigated for the vector polarization analysis. Although there is a small number of options for how the parametrization can be done, since it comes down to a standard medical approach, the way the component extraction is handled offers significant flexibility and is, in fact, very similar to how the second phase (the component detection) is accomplished, as the signal's amplitude is taken into consideration. When the ECG recording contains only one lead, very little relevant information can be given for the vector polarization, because the vector polarization is a compound derived from the several leads that make up the QRS phasor [5]. Therefore, when processing a one-lead record, the vector polarization comprises only parameters such as abnormal amplitude levels and component width.
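As a worked example of the temporal analysis, BPM and SDNN can be derived from the R-peak annotations roughly as follows; the sketch assumes rPeaks holds R-peak sample indices recorded at sampling rate fs, and uses the standard definitions (mean heart rate; standard deviation of the RR intervals):

    using System;
    using System.Linq;

    static class TemporalParameters
    {
        // Returns the mean heart rate in beats per minute and SDNN in ms.
        // Assumes rPeaks contains at least two R-peak sample indices.
        public static (double Bpm, double SdnnMs) FromRPeaks(int[] rPeaks, double fs)
        {
            var rr = new double[rPeaks.Length - 1]; // RR intervals in seconds
            for (int i = 1; i < rPeaks.Length; i++)
                rr[i - 1] = (rPeaks[i] - rPeaks[i - 1]) / fs;

            double mean = rr.Average();
            double variance = rr.Select(v => (v - mean) * (v - mean)).Average();
            return (60.0 / mean, Math.Sqrt(variance) * 1000.0);
        }
    }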
The fourth phase consists of the diagnosis module, which closely mirrors how a cardiologist would conduct the ECG analysis. The information collected in the third phase is used: a combination of a number of parameters triggers a medical condition marker. All the triggered markers are then reasoned into a closing composite diagnosis.
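Purely as an illustration of the marker idea (the paper does not list its actual rules), two textbook thresholds could be encoded like this; the condition names and limits are standard examples, not the diagnosis module's rule set:

    using System.Collections.Generic;

    static class Diagnosis
    {
        // Illustrative markers only; each triggered marker would later be
        // reasoned into the composite diagnosis.
        public static List<string> TriggerMarkers(double bpm, double qtSeconds)
        {
            var markers = new List<string>();
            if (bpm > 100) markers.Add("tachycardia suspected");
            if (bpm < 60) markers.Add("bradycardia suspected");
            if (qtSeconds > 0.45) markers.Add("prolonged QT interval");
            return markers;
        }
    }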
Eventually, all the information gathered in the previous phases, namely the ECG signal and the additional ECG record data (phase one), the ECG component annotations (phase two), the typical ECG parameters (phase three) and the ECG diagnosis (phase four), is consolidated within the output module into an output file. Once again, it is up to the developer to determine which information is of interest and needs to be included in the output. The output data can then be stored in a database, sent through a network channel or stored locally as an individual file.
An additional functionality of the final phase is the effectiveness check. There are databases that, in addition to the ECG recordings, offer supplemental component annotations. The annotations generated by the developed system and the pre-generated annotations supplied with the ECG recordings from such databases can be automatically compared via another auxiliary module built into the output module. An extensive report of such a comparison optionally accompanies the output file; it includes an analysis of the effectiveness indicators, such as the detection rate, hit rate, miss rate and extra rate.
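One conventional way to implement such a comparison is to match each reference annotation to at most one generated annotation within a tolerance window; the sketch below assumes annotations reduced to sample indices and a caller-chosen tolerance, both of which are assumptions:

    using System;

    static class EffectivenessCheck
    {
        // Greedy matching of detected annotations against reference ones.
        // Returns the raw counts from which the four rates can be derived.
        public static (int Hits, int Misses, int Extras) Compare(
            int[] reference, int[] detected, int tolerance)
        {
            var matched = new bool[detected.Length];
            int hits = 0;
            foreach (int r in reference)
            {
                for (int j = 0; j < detected.Length; j++)
                {
                    if (!matched[j] && Math.Abs(detected[j] - r) <= tolerance)
                    {
                        matched[j] = true;
                        hits++;
                        break;
                    }
                }
            }
            return (hits, reference.Length - hits, detected.Length - hits);
        }
    }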
The application has a fully functional Graphical User Interface (GUI) that allows the developer to get an actual overview of the given diagnosis, as well as interfaces for handling the application's input and output, a debug window and a status pane, as shown in Fig. 6. There are additional interfaces for customizing the parameters used during the component detection process and for customizing the parameters used for the signal filtering.
IV. RELATED WORK
There are several tools that assist ECG diagnosis, and most of them are open source software packages or are offered as web applications. ECG has been attracting researchers a lot, and even the mathematical toolkits Mathematica [11] and MATLAB [12] offer extensions for ECG interpretation.
PhysioNet [13] has developed a large collection of software for viewing, analyzing, and creating recordings of physiologic signals; their product WAVE is a visual tool that provides an extensible interactive graphical environment for manipulating sets of digitized signals with optional annotations.
An ECG simulator is a device which can simulate previously recorded ECG signals in real time [14]. The main goal of ECG simulators is to convert digital ECG signals, created in a computer graphics environment, to their analogue counterparts [15]. An ECG simulator enables the analysis and study of normal and abnormal ECG waveforms without actually using an ECG machine, and any given ECG waveform can be simulated [16]. Such a simulator can be used both for the clinical training of doctors and for the design, development and testing of automatic ECG machines without requiring a real subject.
Most of the simulators on the market offer services for training in ECG reading and similar practical clinical skills. Nilsson et al. [17] propose a web-based ECG-interpretation program for undergraduate medical students. Lu et al. [18] designed a tool for the modeling and simulation of electrophysiology. Kaur describes a simulator [14] that produces realistic ECG with the possibility of including other biological signals, e.g. blood pressure.
In the literature, software-based ECG simulators have been developed with the help of different programming languages, such as Java, C++, C#, and MATLAB. Oefinger et al. proposed a Java- and CGI-based web service which can plot ECG signals from the PhysioNet ECG database [19]. Karthik designed a MATLAB-based ECG simulator [16] which can generate normal lead II ECG waveforms.
Another example of a web-based ECG simulator is the "Six Second ECG Simulator" [20]. The simulator simECG was developed using the C++ language [21]. Additionally, an ECG simulator of particular interest is WebECG [22], where various ECG signals can be generated and plotted in a 3-dimensional view with zooming and moving.
Therefore, most of the related available products are in fact ECG simulators, meant for the medical training of medical personnel or medical software developers. The system proposed in this paper, however, assists medical software developers in a unique way: not by training individuals to read and interpret ECG recordings, but by enabling developers to design approaches for digital ECG processing and analysis. On the other hand, the proposed solution has functional similarities to other applications meant for diagnosis, such as the products by PhysioNet.
V. CONCLUSION AND FUTURE WORK
This developers' environment has been built to meet the needs of an ongoing project. The project involves the analysis of ECG data recorded by a set of wearable ECG sensors. An AI agent serves the sensors and is capable of detecting any abnormalities in the subject's cardiac condition, as well as preventing critical scenarios in time.
So far, the application has substantially contributed to improving the effectiveness of the component detection in a number of ways: by improving the pattern recognition approach and enabling different component detection methods to be tested, by contributing to the analysis of how different filters affect the original signal, and by following a strict component-based architecture which increases code re-usability in case of transitioning to other platforms.
One of the features this tool offers is the transition from a desktop application to a web service. This is of high interest because, that way, our R&D team could continue working on a fully web-based platform and, hence, make corrections to the web server in charge of processing incoming ECG records without much effort.
Furthermore, a greater variety of filters can be implemented, so that users can choose one or a combination of several, all in order to increase the accuracy of the diagnostics. At the same time, most of the added features have an effect that can be seen "on the go", all thanks to the visualizer module.
To conclude, although the software was designed for a specific cause, it can easily be re-purposed or upgraded with new functionalities, owing to its architectural modularity. It is a product that is not of exclusive interest to the team that built it, but also assists the work of others who deal with similar challenges. The main benefit of this tool is the possibility to fine-tune the parameters for DSP filtering and feature extraction in the ECG detection and, therefore, customize it towards a specific patient, especially in the case of a streaming ECG sensor that continuously sends data to a diagnosis system. A machine learning approach can be used for the pattern parametrization in order to boost the effectiveness of the expert system.
Fig. 6. A snapshot of the visual tool for ECG QRS detection
As future work, we plan to develop more filters and feature extraction algorithms. Additionally, the system is planned to support simultaneous processing of multiple recordings; hence, it will allow the handling of entire databases and record sets. The effectiveness check, which currently covers the annotations generated from a single file, will be extended to cover the annotations of multiple files and subject them to extensive statistical analysis. We believe that such an expert system would be of particular interest to students and medical researchers.
REFERENCES
[1] F. Gritzali, “Towards a generalized scheme for qrs detection in ecg
waveforms,” Signal processing, vol. 15, no. 2, pp. 183–192, 1988.
[2] A. L. Goldberger, L. A. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov,
R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley,
“Physiobank, physiotoolkit, and physionet components of a new research
resource for complex physiologic signals,” Circulation, vol. 101, no. 23,
pp. e215–e220, 2000.
[3] Health Level Seven International. (2015, June) HL7 version 3 implementation guide: Annotated ECG (aECG). [Online]. Available: http://www.hl7.org/implement/standards/product_brief.cfm?product_id=102
[4] R. R. Bond, D. D. Finlay, C. D. Nugent, and G. Moore, “A review
of ECG storage formats,” International journal of medical informatics,
vol. 80, no. 10, pp. 681–697, 2011.
[5] T. B. Garcia et al., 12-lead ECG: The art of interpretation. Jones &
Bartlett Publishers, 2013.
[6] D. Shreiner, M. Woo, J. Neider, T. Davis et al., OpenGL (R) programming guide: The official guide to learning OpenGL (R), version 2.1.
Addison-Wesley Professional, 2007.
[7] S. Apostolopoulos, “The Open Toolkit library online documentation,” http://www.opentk.com/doc.
[8] D. Hudson, R. Ridge, R. Loach et al., “The Tao Framework,” http://sourceforge.net/p/taoframework/wiki/Home/.
[9] E. Einthoven’s, “Ecg graph paper.”
[10] M. Malik, “Heart rate variability,” Annals of Noninvasive Electrocardiology, vol. 1, no. 2, pp. 151–181, 1996.
[11] T. Pogorelov. (1998, Sep) Ekg qrs detection algorithm. [Online].
Available: http://library.wolfram.com/infocenter/Demos/4476/
[12] R. Gupta, J. Bera, and M. Mitra, “Development of an embedded system
and matlab-based gui for online acquisition and analysis of ecg signal,”
Measurement, vol. 43, no. 9, pp. 1119–1126, 2010.
[13] PhysioNet.org. (2015, July) The WFDB software package, software
for viewing, analyzing, and creating recordings of physiologic signals.
[Online]. Available: https://www.physionet.org/physiotools/wfdb.shtml
[14] G. Kaur, “Design and development of dual channel ecg simulator and
peak detector,” Thapar Institute of Engineering and Technology, Deemed
University, Patalia, 2006.
[15] C. Caner, M. Engin, and E. Z. Engin, “The programmable ecg simulator,”
Journal of Medical Systems, vol. 32, no. 4, pp. 355–359, 2008.
[16] R. Karthik. (2003) Ecg simulation using matlab. [Online]. Available:
www.mathworks.com/matlabcentral/fileexchange/10858
[17] M. Nilsson, G. Bolinder, C. Held, B.-L. Johansson, U. Fors, and
J. Östergren, “Evaluation of a web-based ecg-interpretation programme
for undergraduate medical students,” BMC medical education, vol. 8,
no. 1, p. 25, 2008.
[18] W. Lu, D. Wei, X. Zhu, and W. Chen, “A computer model based on
real anatomy for electrophysiology study,” Advances in Engineering
Software, vol. 42, no. 7, pp. 463–476, 2011.
[19] M. Oefinger and R. Mark, “A web-based tool for visualization and
collaborative annotation of physiological databases,” in Computers in
Cardiology, 2005. IEEE, 2005, pp. 163–165.
[20] S. L. Canada. (2016) The six second ecg (cardiac rhythm simulator).
[Online]. Available: http://skillstat.com/tools/ecg-simulator
[21] M. J. Martins AC, Costa PD. (2016) simecg: Ecg simulator. [Online].
Available: http://simecg.sourceforge.net/
[22] E. Güney, Z. Ekşi, and M. Çakıroğlu, “Webecg: A novel ecg simulator
based on matlab web figure,” Advances in Engineering Software, vol. 45,
no. 1, pp. 167–174, 2012.
Analysis and comparison of algorithms in
advanced Web clusters solutions
D. Alagić* and K. Arbanas**
* NTH Media, Varaždin, Croatia
** Paying Agency for Agriculture, Fisheries and Rural Development, Zagreb, Croatia
da@nth.ch
krunoslav.arbanas@gmail.com
Abstract – Today's websites (applications) represent an essential part of nearly every business system; therefore, it is unacceptable for them to be unavailable, given the ever-increasing competition on the global market. Consequently, such systems are becoming more and more complex in order to provide high availability. To achieve higher system availability, greater scalability is employed by creating so-called Web farms or Web clusters. Such a system requires much more computing power than traditional solutions. Since these systems are very expensive and complex in nature, the question is how to obtain the best possible results with the least amount of investment. To achieve that, it is necessary to take a look at the component which holds the information on the requests, i.e. the traffic amount: the HTTP/HTTPS load balancer. This system is based on several algorithms; however, there are no comprehensive analyses that indicate which algorithm to use, depending on the Web cluster and the expected amount of traffic. For this reason, this paper provides a detailed comparison of several frequently used algorithms in several different Web cluster scenarios, i.e. loads. Examples are also given as to when to use a certain algorithm.
I. INTRODUCTION
Among the first cloud computing solutions were simple Web sites, i.e. portals, which have become a lot more complex in the past two decades due to the globalization of the market. Today, companies and organizations increasingly base their operations on forms of computing that make information available anytime and anywhere. That is why no company or organization can afford for their Web sites to be unavailable, since that is detrimental to their reputation, business opportunities and revenue [4][14]. In order to provide constant availability of the system, both the architecture and the infrastructure of these solutions have become a lot more complex and, consequently, more expensive. Therefore, it is almost impossible to find a traditional solution for Web sites; instead, everything is done with the help of Web clusters or Web farms [17]. Such cluster solutions present a kind of parallel and distributed computer system consisting of interconnected computers (in this case, Web servers) which represent one computing resource whose performance is far greater than that of regular computers. This allows for a horizontal way of spreading computing resources across more instances of Web servers, which are usually in the form of virtual machines (multiple usage of the existing resources) and a lot cheaper than vertical growth, which entails the purchase of powerful and expensive standalone servers [16][20].
Such solutions offer many advantages such as a
scalable growth, meaning that it is possible to add more
Web servers to handle HTTP / HTTPS requests depending
on the increase of the traffic. In addition, such a system
allows for greater availability because the requests - with
the help of the HTTP / HTTPS load balancer (hereinafter
referred to as the LB) - are being distributed on several
Web servers (multi-node architecture). Due to that, the
chance of a single point of failure is decreased. Since the
system consists of several Web servers, such solutions are
more favorable when dealing with necessary system
changes and upgrades, because it is possible to work on
the servers one by one without causing any downtime
unlike the traditional concept [8][19].
In order for that to be possible, the architecture of such systems is notably more complex, i.e. it consists of multiple components, thus entailing a greater risk that one of them will undergo an "operational dropout" or a malfunction that would result in the collapse of the entire system. In addition to a larger number of components (servers, storage, etc.), the system is more complex because it uses more sophisticated equipment with various distributed systems, as well as a large number of protocols, for all of that to function without a flaw. Besides, such complex systems are quite expensive because they generate large OPEX and CAPEX costs, while their usage is not always brought to the maximum. An example of OPEX costs is the maintenance of the infrastructure (servers, network, etc.), as well as other monthly expenses: Internet connection, electricity, air conditioning, renting of the space, etc. An example of CAPEX costs is an investment in new servers or network equipment necessary for the operation of Web farms [5].
Figure 1. Schematic representation of the communication between end
user and the Web server (the traditional model)
Figure 2. Schematic representation of the communication between the end user and the Web servers in the Web farm
The main problem here is that the equipment amortization rate is very high; therefore, the aim is to put the equipment to multiple use [10][21].
Due to all these costs, the already existing resources need to be used in the best possible way in order to minimize the costs, all the while making sure not to hinder the results. To do so, we will take a look at one of the main components of a Web farm: the LB system, based on a number of algorithms that have been in use for many years. Considering the fact that these algorithms are widely accepted in LB systems and elsewhere, the question is which algorithm to use, and in which case, for the optimal utilization of the system.
In order to find out which algorithm will provide the best results in certain cases, it is necessary to fully understand their operating principles. Depending on the type of distribution of the tasks, i.e. the requests, the LB algorithms are divided into static (SLB) and dynamic (DLB) ones [6][13][14].
In SLB algorithms, the requests are assigned to the executors (i.e. nodes) from the start, depending on their characteristics. During operation, it is not possible to change the tasks or increase the number of nodes. Another disadvantage is that the SLB algorithms do not collect any feedback on the nodes' status [1][9]. Due to these limitations, we will further consider only the DLB algorithms, since they are quite flexible and can be modified and improved easily. The following table shows a comparison between the existing DLB algorithms. This paper will not compare and test all of those algorithms, but only those which are commonly used and generally available (open source).
TABLE I. COMPARISON BETWEEN DIFFERENT ALGORITHMS FOR THE DYNAMIC LOAD DISTRIBUTION [1]

Algorithm     | Possibility of verifying the state | Performance
Rand          | No        | Excellent
Prand         | Partially | Excellent
Acwn          | Yes       | Good
Pacwn         | Yes       | Good
Cyclic        | Partially | A little better than Rand
Probabilistic | Partially | Good
Threshold     | Partially | Better
Least         | Partially | Better
Reception     | Partially | Not so good
Central       | Yes       | Excellent
Global        | Yes       | Good
Offer         | Yes       | Bad
Radio         | Yes       | Good
Sed           | Yes       | Good
Nq            | Yes       | Good
As previous research showed (see the table above), there is not a large number of DLB algorithms used in different situations, i.e. there is no universal DLB algorithm applicable to all distributed computing systems [1]. Although almost all DLB algorithms have the ability to verify the node status, i.e. the flow of information and the nodes' load, none of them has complete information on the status of the compute power. Their verification applies only to the status and availability at the application level, not at the level of the operating system. Thus, it is presumed that an LB system could be more effective if it had a complete overview of the system, i.e. an overview of its resources and the actual load. However, that is a topic for another research.
II. RESEARCH PROBLEM
Previous research in cloud computing is related to the load balancer systems in Web clusters [2][3][11][16][17][18][19], but also to the algorithms used in such arrangements [1][7][15][17][20]. There are many studies that deal with the topic, but they exhibit two problems:
1) Testing repeatability - the measurements in the tested environment are described poorly or not at all, while the test samples are quite specific. In other words, it is not possible to reproduce those results in order to further expand the existing research.
2) Partial analysis - the comparisons of load balancer algorithms are incomplete, i.e. a small number of algorithms was compared, and in many cases the observed variables are not clearly stated or defined.
For this reason, this paper provides a complete analysis of the LB system as well as a comparison between its algorithms. The test environment is described in detail and the applied technologies are open source, which makes the solution more accessible and enables the obtained data to be used in future research.
III. DATA SOURCES
The relevance of the data is ensured by including data from a number of IT companies which provide cloud computing services or, more accurately, Web hosting services. These solutions are based on the Web clusters explained previously.
All research and testing was carried out in one of the data centers, with full access to the equipment and the necessary financial data, i.e. the indicators essential for this research.
In the given example, it is evident that the company has a number of Web clusters, categorized according to the type of business or content. There are several reasons why this type of architecture was opted for; one of them is to achieve high system availability, since a certain group of Web sites is constantly exposed to security attacks causing hindrance and potential unavailability of the system. The chart also makes it possible to conclude that the resource utilization is not linear and that it depends on the type of content. In other words, informative and business Web sites are visited more during the day, while adult sites and sites containing video games are frequented by night.
Given that a large part of the computing resources is hibernating (not used optimally in the course of 24 hours), the question is how to increase the level of utilization of the computing resources. If we present all this data on one chart, i.e. on a single Web cluster, it can be noted that an overload is present on the entire cluster. In other words, not even a larger system could handle the entire load, while there would still be time periods with very little traffic or load. Thus, whether with a number of separate Web clusters or one major cluster, the computing resources would not be used properly; and, going back to the beginning of this topic, these resources generate large monthly and capital costs.
Thus, the goal is to make the most of the existing resources without increasing the number of Web servers, as is done with commercial tools for the virtualization and configuration of Web clusters, such as VMware. Such tools perform continuous monitoring of all Web servers and their resources, deploying virtual machines or Web servers in critical situations (e.g. in the case of a sudden increase of traffic and the subsequent system overload). This guarantees the stability and availability of the system, but it also generates additional costs due to the increased number of servers. One way to make better use of computer resources is to choose the best LB algorithm depending on the situation and the amount of traffic that is present or expected.
This may not necessarily be true, but in order to remove any doubt, this paper provides the analysis and comparison of a number of the most commonly used algorithms across several cases varying in the amount of requests. The results will be gathered and compared from a number of computer resource monitoring tools, but also from the LB system, in which it is possible to monitor the load and the number of visits in real time. The purpose is to determine whether considerable savings are possible with regard to the computing resources, or whether the results are so similar that it matters neither which algorithm is applied nor in which case.
Figure 3. The system load by Web site category over 24 hours
Figure 4. The total system load across all Web site categories over 24 hours
IV. TEST ENVIRONMENT AND CASES
The entire test environment is based on open source tools, services and operating systems, allowing for reproducibility and greater availability for further analysis and research. Given that the test environment's main objective is to test the load of the Web nodes, complete redundancy of all services was not employed: e.g. only one LB was configured, without its passive replica. Also, in order to simplify the reproducibility of the test, classic local hard drives were used rather than central storage, since the main focus is on the computing resources of the Web servers and not on the data processing performance of the storage. However, those local disks contain a database configured in master-master replication, i.e. with active synchronization of data between the two databases.
The environment consists of two Web clusters with three Web servers each, one cluster running Apache and the other nginx. Both Web services are included in order to fully test both of them, depending on the type of content (static/dynamic). As for the Web site, or the content, a generic Web site based on the WordPress CMS (Content Management System) was used (available at the following URL: https://wordpress.org/latest.zip). For the testing purposes, one Web site was installed on each Web cluster. The content was not altered, so that the test could be repeated as well as used for further research.
As far as the infrastructure is concerned, the solution consists of three physical servers with the following virtual machines:
Physical server host1 with virtual machines: dt-lb1 (LB), dt-db1 (first database) and dt-db2 (second database).
Physical server host2 with virtual machines: dt-web1, dt-web2, dt-web3 (nginx Web cluster).
Physical server host3 with virtual machines: dt-web4, dt-web5, dt-web6 (Apache Web cluster).
Server specifications:
host1: 2x quad-core CPU, 32 GB RAM, 140 GB local hard drive.
host2: 2x quad-core CPU, 32 GB RAM, 140 GB local hard drive.
host3: 2x quad-core CPU, 32 GB RAM, 140 GB local hard drive.
dt-lb1: quad-core CPU, 4 GB RAM, 10 GB local hard drive.
dt-web1: quad-core CPU, 4 GB RAM, 10 GB local hard drive.
dt-web2: quad-core CPU, 4 GB RAM, 10 GB local hard drive.
dt-web3: quad-core CPU, 4 GB RAM, 10 GB local hard drive.
dt-web4: quad-core CPU, 4 GB RAM, 10 GB local hard drive.
dt-web5: quad-core CPU, 4 GB RAM, 10 GB local hard drive.
dt-web6: quad-core CPU, 4 GB RAM, 10 GB local hard drive.
dt-db1: quad-core CPU, 4 GB RAM, 20 GB local hard drive.
dt-db2: quad-core CPU, 4 GB RAM, 20 GB local hard drive.
Test environment specifications:
Physical servers: HP ProLiant DL360 G7
OS: CentOS 6.7
Virtualization platform: Foreman 1.9.1 and Libvirt (virsh) 0.10.2
Database: MySQL Ver 14.14 Distrib 5.6.12
LB: HAProxy 1.5.4 (2014/09/02)
Scripting language: PHP 5.6.13
Web servers: Apache/2.2.15 and nginx/1.8.0
CMS (Web site): WordPress 4.3
Tools used for the assessment and comparison of the results:
ApacheBench - enables traffic generation for a specific Web site
Observium 0.15.12.7231 - monitoring of computer resources and service load
HAProxy 1.5.4 - monitoring of the traffic, i.e. the load of the Web servers, in real time
Three different cases were employed for test purposes (a rough load-generator sketch follows the list):
Case 1 - Linear load of 100 consecutive requests
Case 2 - Load of 1000 requests, 200 executed simultaneously
Case 3 - Load of 10000 requests, 500 executed simultaneously over the course of 10 seconds
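For intuition, the second case corresponds to what ApacheBench does with "ab -n 1000 -c 200"; a rough C# equivalent of such a load generator is sketched below (the URL and the counts are the test's parameters, while the code itself is purely illustrative):

    using System;
    using System.Linq;
    using System.Net.Http;
    using System.Threading;
    using System.Threading.Tasks;

    static class LoadCase
    {
        // Issues `total` GET requests with at most `concurrency` in flight,
        // and returns the number of successfully executed requests.
        public static async Task<int> Run(string url, int total, int concurrency)
        {
            using var client = new HttpClient();
            using var gate = new SemaphoreSlim(concurrency);
            int ok = 0;
            var tasks = Enumerable.Range(0, total).Select(async _ =>
            {
                await gate.WaitAsync();
                try
                {
                    var response = await client.GetAsync(url);
                    if (response.IsSuccessStatusCode) Interlocked.Increment(ref ok);
                }
                catch (HttpRequestException) { /* counted as failed */ }
                finally { gate.Release(); }
            });
            await Task.WhenAll(tasks);
            return ok;
        }
    }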
Tested algorithms [12]:
Roundrobin - according to this algorithm, each server is used in turns, according to the weights. It works best when the processing time on the servers is evenly distributed. It is dynamic in nature, meaning that a server's weight can be adjusted on the fly. It is limited to 4095 active servers per backend. In some particular (and very rare) cases, it can take several hundred requests for a server to be re-integrated after being down for a short time.
Static-rr - just as with the previous algorithm, each
server is used in turns according to the weights.
However, this algorithm is static, meaning that if the
server’s weight is changed during the process, it will
have no effect whatsoever. Next, this algorithm is not
limited regarding the number of servers, and when the
server goes up, it is always immediately reintroduced
into the farm. It also uses slightly less CPU to run
(approximately 1% less CPU).
Leastconn - with this algorithm, the server with the
lowest number of connections will receive the
connection. Round-robin is performed within server
groups working with the same load to ensure that all
servers will be used. It is recommended to use this
algorithm with very long sessions, such as LDAP,
SQL, TSE, etc. Yet, it is not very well suited for
protocols using short sessions, such as HTTP. The
algorithm is dynamic, meaning that server weights
may be adjusted during the process.
First - here, the first server with available connection
slots will receive the connection. Once a server
reaches its maxconn value, the next server will be
used. The purpose of this algorithm is to always use
the smallest number of servers so that extra servers
can be powered off during non-intensive hours. The
server weight is not important, and the algorithm is more efficient for long sessions, such as RDP or IMAP, than for HTTP, though it can be used there as well. To
efficiently use this algorithm, it is recommended to
check server usage regularly and turn off unused
servers, as well as to regularly check backend queue
to turn servers on when the queue inflates.
Alternatively, using "http-check send-state" may
provide information on the load.
Source - The source IP address is hashed and divided
by the total weight of servers to determine which
server will receive the request, ensuring that the same
client IP address will always reach the same server if
the number of servers remains the same. If the hash
result changes, many clients will be directed to a
different server. The algorithm is generally used in
TCP mode where no cookie may be inserted, but can
also be used to provide a best-effort stickiness to
clients refusing session cookies. It is static by default,
meaning that changing a server's weight in the process
will have no effect, but this can be altered by using
"hash-type".
Uri - This algorithm hashes either the left part of the
URI or the whole URI and divides the hash value by
the total weight of servers, determining which server
will receive the request. This is used with proxy
caches and anti-virus proxies in order to maximize the
cache hit rate. Note that this algorithm may only be
used in an HTTP backend. It is static by default,
meaning that changing a server's weight in the process
will have no effect, but this can be changed using
"hash-type". The algorithm supports two optional
parameters - "len" and "depth", both followed by a
positive integer number, which can be helpful when
you need to balance the servers based on the
beginning of the URI only. With the "len" parameter,
the algorithm only considers that many characters at
the beginning of the URI to compute the hash. The
"depth" parameter indicates the maximum directory
depth to be used to compute the hash.
rdp-cookie - The RDP cookie <name> (or "mstshash"
if omitted) will be looked up and hashed for each
incoming TCP request. This is useful as a degraded
persistence mode, always sending the same user (or
session ID) to the same server. If the cookie is not
found, the normal roundrobin algorithm is used
instead. Note that the frontend must ensure that an
RDP cookie is already present in the request buffer.
For this you must use 'tcp-request content accept' rule
combined with 'req_rdp_cookie_cnt' ACL. This
algorithm is static by default, meaning that changing a
server's weight in the process will have no effect, but
this can be changed using "hash-type".
url_param - The URL parameter specified in
argument will be looked up in the query string of each
HTTP GET request. If the modifier "check_post" is
used, an HTTP POST request entity will be searched
for the parameter argument, when it is not found in a
query string after a question mark. The message body
will be analyzed only after the advertised amount of
data has been received or if the request buffer is full.
In the rare case of using chunked encoding, only the
first chunk is searched. If the parameter is followed by
the “=” sign and a value, the value is hashed and
divided by the total weight of the running servers,
designating which server will receive the request. This
is used to track user identifiers in requests and ensure
that a same user ID will always be sent to the same
server if the number of servers remains the same. If no
value or parameter is found, a round robin algorithm
is applied. Note that this algorithm may only be used
in an HTTP backend. It is static by default, meaning
that changing a server's weight in the process will have no
effect, but this can be changed using "hash-type".
hdr - The HTTP header <name> will be looked up in
each HTTP request. Just as with the equivalent ACL
'hdr()' function, the name in parenthesis is not case
sensitive. If the header is absent or does not contain
any value, the roundrobin algorithm will be used
instead. An optional 'use_domain_only' parameter is
available for reducing the hash algorithm to the main
domain part with some specific headers such as 'Host'.
This algorithm is static by default, which means that
changing a server's weight on the fly will have no
effect, but this can be changed using "hash-type".
Figure 5. Test environment architecture
The algorithms url_param and hdr were left out since
they are very specific and rarely used in clusters with a
larger number of domains.
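To make the selection rules concrete before the measurements, the C# sketch below imitates the roundrobin and source policies described above; the server names, weights and the string hash are illustrative assumptions, not HAProxy's implementation (in HAProxy itself a policy is chosen per backend with the "balance" directive):

    using System;
    using System.Linq;

    class BalancerSketch
    {
        readonly (string Name, int Weight)[] servers;
        int next; // rotating counter for round robin

        public BalancerSketch(params (string Name, int Weight)[] servers)
        {
            this.servers = servers;
        }

        // Weighted round robin: servers are used in turns, each appearing in
        // the rotation proportionally to its weight.
        public string RoundRobin()
        {
            var rotation = servers
                .SelectMany(s => Enumerable.Repeat(s.Name, s.Weight))
                .ToArray();
            return rotation[next++ % rotation.Length];
        }

        // Source: the client address is hashed and mapped onto the cumulative
        // weights, so the same client keeps reaching the same server as long
        // as the server set is unchanged. (String.GetHashCode is process-local
        // in modern .NET; a stable hash would be used in practice.)
        public string Source(string clientIp)
        {
            int slot = (clientIp.GetHashCode() & 0x7fffffff) % servers.Sum(s => s.Weight);
            foreach (var (name, weight) in servers)
            {
                if (slot < weight) return name;
                slot -= weight;
            }
            return servers[servers.Length - 1].Name;
        }
    }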
V. COMPARISON AND RESULTS
The testing was done on the Apache and nginx Web clusters for all the above mentioned cases. During testing, two main variables were monitored: the time necessary for processing all the requests (measured in seconds) and the system load, measured by the amount of computational work performed by the computer system.
TABLE II. APACHE WEB CLUSTER – CASE 1

Algorithm  | Time of execution (s) | Average system load
leastconn  | 22.448  | 0.5566
roundrobin | 17.117  | 1.10333
static-rr  | 16.098  | 0.71000
first      | 21.604  | 0.07000
source     | 3.128   | 0.02666
uri        | 2.870   | 0.000001
rdp-cookie | 19.358  | 0.07666

TABLE III. APACHE WEB CLUSTER – CASE 2

Algorithm  | Time of execution (s) | Average system load
leastconn  | 44.078  | 29.63667
roundrobin | 34.655  | 23.50667
static-rr  | 34.802  | 22.73667
first      | 103.678 | 45.97333
source     | 13.239  | 1.55666
uri        | 8.979   | 2.59
rdp-cookie | 33.612  | 21.41333

TABLE IV. APACHE WEB CLUSTER – CASE 3

Algorithm  | Successfully executed requests | Average system load
leastconn  | 581    | 9.65
roundrobin | 402    | 5.26
static-rr  | 252    | 23.65666
first      | 22     | 5.40334
source     | 1009   | 2.87666
uri        | 1139   | 3.11
rdp-cookie | 10.037 | 4.24333
TABLE V. NGINX WEB CLUSTER – CASE 1

Algorithm  | Time of execution (s) | Average system load
leastconn  | 17.599 | 0.00667
roundrobin | 16.301 | 0.07000
static-rr  | 16.336 | 0.10333
first      | 23.504 | 0.02
source     | 0.087  | 0.000001
uri        | 0.084  | 0.000001
rdp-cookie | 19.844 | 0.00666

TABLE VI. NGINX WEB CLUSTER – CASE 2

Algorithm  | Time of execution (s) | Average system load
leastconn  | 6.771  | 0.38333
roundrobin | 35.342 | 20.8
static-rr  | 33.843 | 20.221666
first      | 68.544 | 27.9666666
source     | 0.309  | 0.000002
uri        | 0.188  | 0.000001
rdp-cookie | 35.993 | 22.95333

TABLE VII. NGINX WEB CLUSTER – CASE 3

Algorithm  | Successfully executed requests | Average system load
leastconn  | 25272 | 0.61333
roundrobin | 404   | 6.23
static-rr  | 351   | 3.53667
first      | 69    | 4.63667
source     | 50000 | 0.000001
uri        | 50000 | 0.000001
rdp-cookie | 403   | 6.33

VI. CONCLUSION
The analysis shows that some algorithms provide much better results than others. Namely, when looking at the ratio of spent resources to execution time for the first case on the Apache cluster, the best results were obtained by the source and uri algorithms. These algorithms also provided the best results in the second case, while the worst second-case result was given by the algorithm first (two to three times worse). In the case of a higher load, i.e. in the third case for the Apache cluster, the best results were again obtained by the source and uri algorithms: with their help, the largest number of successfully processed requests at a lower system load was obtained in comparison with the remaining algorithms.
A similar situation was found on the nginx cluster, where the best results in all the cases were again obtained with the source and uri algorithms, while the algorithm first again gave the worst result. The other algorithms provided more or less the same results, except for the algorithm leastconn, which provided solid results in the third case compared to the algorithms other than source and uri.
The purpose of the testing was to identify the algorithms which provide the best results in various cases. This research singles out the algorithms source and uri, since they are multiple times more successful than the rest, which is a surprising fact given that algorithms such as leastconn and roundrobin are the most frequently used. It was noted not only that these algorithms can provide excellent results, but also that they can enable multiple savings of computer resources, which in turn positively reflects on the CAPEX and OPEX costs mentioned at the beginning.
REFERENCES
[1] Ali, M.F.: The Study On Load Balancing Strategies In Distributed Computing System. Int. J. Comput. Sci. Eng. Surv. 3, 2, 19–30 (2012).
[2] Aversa, L., Bestavros, A.: Load balancing a cluster of web servers: using distributed packet rewriting. Conf. Proc. 2000 IEEE Int. Performance, Comput. Commun. Conf. (Cat. No.00CH37086). (2000).
[3] Chen, W. et al.: Design and implementation of server cluster dynamic load balancing in virtualization environment based on OpenFlow. Proc. Ninth Int. Conf. Futur. Internet Technol. 691–697 (2014).
[4] Chung-Cheng Li, Kuochen Wang: An SLA-aware load balancing scheme for cloud datacenters. Int. Conf. Inf. Netw. 2014. 58–63 (2014).
[5] Gruber, C.G.: CAPEX and OPEX in Aggregation and Core Networks. 2009 Conf. Opt. Fiber Commun., includes post deadline papers. 9–11 (2009).
[6] Iyer, S.: Load balancing and parallelism for the internet. July, (2008).
[7] Ji, Z., He, B.: A dynamic load balancing method for parallel rendering and physical simulation system based sort-first architecture. Proc. 2011 Int. Conf. Comput. Sci. Netw. Technol. ICCSNT 2011. 3, 1792–1796 (2011).
[8] Jung, J. et al.: Self-Adapting Load Balancing for DNS. 564–571 (2001).
[9] Khiyaita, A. et al.: Load balancing cloud computing: State of art. Netw. Secur. Syst. (JNS2), 2012 Natl. Days. 106–109 (2012).
[10] Knoll, T.M.: A combined CAPEX and OPEX cost model for LTE networks. (2014).
[11] Lin, Z. et al.: A content-based dynamic load-balancing algorithm for heterogeneous web server cluster. Comput. Sci. Inf. Syst. 7, 1, 153–162 (2010).
[12] Luke, S. et al.: Essentials of Metaheuristics: A Set of Undergraduate Lecture Notes. (2011).
[13] Mcheick, H. et al.: Load Balancing Mathematical Model. 2011 Dev. E-systems Eng. 581–586 (2011).
[14] Pham, V. et al.: Gateway load balancing in future tactical networks. Proc. - IEEE Mil. Commun. Conf. MILCOM. 1844–1850 (2010).
[15] Popa, L. et al.: A Cost Comparison of Data Center Network Architectures. Proc. 6th Int. Conf. Co-NEXT ’10. (2010).
[16] Rajavel, R.: De-Centralized Load Balancing for the Computational Grid Environment. Comput. Intell. 419–424 (2010).
[17] Teodoro, G. et al.: Load balancing on stateful clustered Web servers. Proceedings. 15th Symp. Comput. Archit. High Perform. Comput. (2003).
[18] Ungureanu, V. et al.: Effective load balancing for cluster-based servers employing job preemption. Perform. Eval. 65, 8, 606–622 (2008).
[19] Werstein, P. et al.: Load Balancing in a Cluster Computer. (2006).
[20] Yong Meng Teo, Ayani, R.: Comparison of Load Balancing Strategies on Cluster-based Web Servers. Simulation. 77, 5-6, 185–195 (2001).
[21] Opex-based data centre services: Co-location, managed services and private cloud business support.
Metamodeling as an Approach for Better
Computer Resources Allocation in Web Clusters
D. Alagić* and D. Maček**
* NTH Media, Varaždin, Croatia
** UniCredit S.p.A. Zweigniederlassung, Wien, Austria
da@nth.ch
davor.macek@foi.hr
Abstract - Constant changes are inherent to information technology, the proof of which can be found in many challenges, such as the business sector's cloud computing. Because of the recent economic and financial crisis, companies are forced to provide the best possible results at the lowest possible cost. The main issue of cloud computing is the computing resources (i.e. compute power): due to their less than optimal usage, they generate high costs. The term cloud computing encompasses a wide range of technologies, so this paper will focus only on Web clusters. It will describe the issue of improper use of resources in these systems and its main cause, and it will provide suggestions for further optimization. Additionally, the paper will present a new concept of HTTP / HTTPS traffic analysis which should enable a more efficient usage of computing resources. The concept has emerged from the metamodeling of two methods: the first one is a method for the distribution of HTTP / HTTPS requests, and the other one is the Analytic Hierarchy Process (AHP) method, used for the classification and prioritization of requests so that the entire Web cluster can operate as well as possible with the least amount of resources.
I. INTRODUCTION
Note: The views expressed in this article are those of the
authors and do not necessarily reflect the views of the
NTH Media, UniCredit S.p.A., or Zagrebačka banka d.d.
In the last decade, the concept of cloud computing has been used more and more frequently, and we could say that it has become an important part of any modern business system. Cloud computing is a type of computing that relies on sharing computing resources among computers, starting with the applications and including a variety of related services [4][9][10].
In order for cloud computing to work, an IT (Information Technology) infrastructure is necessary, i.e. a data center. It is a well-known fact that such centers are not cheap, considering their direct, indirect and general costs. For a data center to survive in today's market, continuous investments in the growth and improvement of the system are required. In other words, both capital expenditures (CAPEX) and operating costs (OPEX) are present [8][21]. The following chart represents the total cost of ownership of a single data center.
Figure 1. Total Cost of Ownership for DC [16]
An example of a CAPEX cost in a single data center is an investment in new servers or network equipment, while OPEX costs refer to the costs of maintaining the system, such as: Internet connection, electricity, air conditioning, maintenance and replacement of hardware, rental of space, etc. [11][16].
Due to these costs, cloud computing providers are trying to engage in multiple usage of the existing resources. In other words, they strive to achieve the automation and optimization of the system. Previous results have shown that servers represent the biggest cost because they entail high capital expenditure and, at the same time, have a high amortization rate; they also generate high maintenance costs [16]. Therefore, the question is how to apply multiple usage to such infrastructure, i.e. to the servers, in order to minimize the mentioned costs. In response to this problem, the virtualization of computing resources appeared thirty years ago, making it possible to distribute the resources of a single physical device (in this case, the server) between several smaller virtual environments, i.e. between a number of logical or application processes [15][19].
However, virtualization is now present in all data centers and is no longer "sufficient" as such [18]. In other words, it is necessary to search for new ways and technologies for the multiple utilization and optimization of computing resources beyond virtualization, since that would result in a smaller number of servers and ultimately reduce the consumption of electricity and other operating expenses. Thus, by employing a single variable - computer power - multiple savings are possible. To achieve this, it is necessary to define the limits, i.e. to define what kind of technology and services will be considered, since cloud computing includes a number of them and they are rather specialized in terms of the allocation and utilization of resources.
Cloud computing services are categorized as follows: Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS) [3][5][9]. This paper mainly focuses on PaaS and SaaS services based on the HTTP / HTTPS protocol. An example of such services are Web sites, where the user may or may not have complete access to and control over the application and its contents or data. Sets of such combined solutions, i.e. Web sites, are called Web farms, whose architecture and design depend on the technology and the type of content (static or dynamic). For a better understanding of the main cause of computer power consumption, we have to start from scratch, i.e. from the amount of information that must be processed in the allocated time.
II. RESEARCH PROBLEM
In the past decade, there has been a rapid increase in the use of cloud computing. To enable the existence of such infrastructure, a large number of data centers was built [6][10]. Such infrastructure is quite expensive, considering that its primary goal is to allow the continuous operation of all cloud computing services. In order to allow for it, the entire system must be redundant (power installations, UPS, air conditioning, network infrastructure, etc.) and it has to have a spare solution or system (a power generator, a BGP connection, another geographic collocation for business continuity purposes, etc.). All these items generate extremely high CAPEX and OPEX costs, which represent the main motivation for improvements that would allow savings at all levels of the infrastructure and services [16].
Furthermore, this paper pinpoints the problems as well
as open issues representing the main motivation for this
research:
• High costs of data centers - High operating costs, such as maintenance and electricity consumption, as well as the renting of the DC area, which is usually charged by the square meter. Large capital costs arise with every growth of the system, combined with a high rate of depreciation on this type of goods. In other words, the question is: how to allow for a multiple usage of resources so as to decrease the amount of investment.
• System complexity and availability - Nowadays, all important business systems (but also all others) are located in data centers. The vast majority of these systems must not have any interruptions in their work, meaning that the information systems and infrastructure of a data center must be available at all times. In order to achieve that, information systems are becoming increasingly complex, i.e. they have an increasing number of components to be installed, configured and maintained (e.g. redundancy, backup, etc.). Thus, the question is how to simplify such systems without jeopardizing their functionality and availability. There are a number of commercial solutions, but they are complex and expensive, and they also require far more resources and investments in infrastructure for the availability of the information systems.
• Computer power utilization - As already mentioned, the virtualization of resources has existed for the past few decades. However, systems are becoming larger and more complex, so the current solutions are no longer sufficient if we want to achieve a greater utilization of computing resources. Modern solutions for the multiple utilization of resources are mostly specialized, i.e. their application scope is limited, and thus the savings are not large-scale. In other words, the solution should be scalable and adaptable to multiple systems.
Due to all these problems, the question is how to achieve a more efficient utilization of computing resources in order to reduce costs while simplifying the system. Today, there are several models of Web clusters; however, all of them share a system for the allocation of HTTP/HTTPS requests (hereinafter referred to as the LB system). Since the LB system contains information on the real amount of traffic, which ultimately affects the computing resources, it can be assumed that the goals of this research can be achieved by improving the operation of the LB system. To achieve that, this paper analyzes two methods whose combination, or metamodel, should allow for more efficient results.
III. METHODS AND THEIR MODELS
This paper presents two methods: the first method
represents the basis for the operation of the LB system,
while the second method serves for the prioritization of
the type of Web traffic in the cloud infrastructure.
1) Load Balancing method
The method which represents the basis of the LB system is always the same; only the operating algorithms change. The following algorithms are well known:
Round robin – This method for the allocation of HTTP/HTTPS traffic is based upon one of the simplest and most frequently used algorithms. The entire procedure works by allocating a time interval to each process in which it has to be completed. Upon the expiry of the allocated time interval, the system will stop the request and go on to the next process on the list. If the process is completed before the time runs out, the system will process another request. Having used its allocated time interval, the process goes to the end of the list. It is important to note that the algorithm ensures that a time interval is assigned to each and every request. The main issue here is the determination of the optimal interval length. If it is shorter, the system assigns a larger number of requests, thus consuming more processor time while transferring requests from one process to the other. If the time interval is longer than necessary, the response time for the execution of requests is also longer, resulting in a decreased number of processed requests [12] (see the first sketch after this list).
Weighted Least Connections – As with the previous method, it is likewise necessary to have knowledge of the system and the expected number of requests, since that is the basis for determining the maximum number of requests (connections) which can be processed by a single node. The process is done in cycles, wherein requests first go to the node with the largest number of free connections, i.e. the largest amount of available resources [12] (see the second sketch after this list).
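To make the two allocation policies concrete, two minimal Python sketches follow. They are illustrations only, not the paper's implementation; the quantum value, node names and capacities are assumed for the example. The first sketch shows the time-sliced round-robin loop described above:

    from collections import deque

    QUANTUM = 0.1  # allocated time interval; too short wastes processor
                   # time on switching, too long slows down the responses

    def round_robin(requests):
        """requests: list of (name, required_processing_time) pairs."""
        queue = deque(requests)
        completed = []
        while queue:
            name, remaining = queue.popleft()
            remaining -= min(QUANTUM, remaining)  # process one interval
            if remaining > 0:
                queue.append((name, remaining))   # unfinished: end of list
            else:
                completed.append(name)
        return completed

    print(round_robin([("req-a", 0.25), ("req-b", 0.05)]))  # ['req-b', 'req-a']

The second sketch chooses a node under the Weighted Least Connections policy: each node has a configured maximum number of connections, and a new request goes to the node with the most free connections:

    # Node capacities and current loads are assumed example values.
    nodes = {
        "node1": {"max_conn": 100, "active": 80},   # 20 free connections
        "node2": {"max_conn": 200, "active": 120},  # 80 free connections
    }

    def pick_node(nodes):
        # take the node with the largest number of free connections,
        # i.e. the largest amount of available resources
        return max(nodes, key=lambda n: nodes[n]["max_conn"] - nodes[n]["active"])

    target = pick_node(nodes)
    nodes[target]["active"] += 1  # assign the incoming request to that node
    print(target)                 # -> node2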
The metamodel of the method for the allocation of HTTP/HTTPS requests is shown through the example of a company dealing with Web hosting. The following table shows all key entities and their attributes.
TABLE I. METAMODEL METHOD OF THE LB SYSTEM

Entity (data model) | Attributes | Appearance data | Real world appearance
department | department_id, name, team | Department: "213"; Name: "ICT"; Team: "IT" | ICT
person | person_id, first_name, last_name, type_of_job | Person: "422432"; First name: "Tom"; Last name: "Clark"; Type of job: "Technician" | Tom Clark
method | method_id, lb_algorithm_id | Method: "Load Balancing"; Algorithm: "Round Robin" | Round Robin
knowledge | knowledge_id, department_id | Knowledge: "131"; Department: "213" | Web Clustering
it_service | service_id | Service: "562" | Hosting Web Site
process | process_id | Process: "HTTP GET request" | http://www.company.com/contact_form.asp?name1=value1&name2=value2
resource | resource_id, web_cluster_id | Resource: "server specification"; Web cluster: "web01corporate" | 10 CPU cores and 50 GB of memory
result | result_id | Result: "url" | http://www.company.com/contact
external_sources | end_user_id, http_method, http_request | End user: "33.176.216.1"; HTTP method: "POST"; HTTP request: "http://www.company.com" | 33.176.216.1

Figure 2. Business model - Load Balancing method

Figure 3. ER model - Load Balancing method
2) Method for the prioritization of the type of Web traffic in the cloud infrastructure

It is clear that load-balancing algorithms for the utilization of some parts of the cloud infrastructure are already developed and well known. However, there is a problem regarding the efficient prioritization of the usage of Web farm resources according to the type of Web traffic. For example, due to high CAPEX and OPEX costs, it would be quite unacceptable to allocate a Web farm with plenty of resources (CPU, RAM) to a less critical type of HTTP or HTTPS traffic. This would present an imminent financial risk due to inadequate management of the available resources. Operational risk would likewise be present, as well as reputational risk, if a critical type of traffic were left without enough resources. Therefore, in the context of cloud computing and Web traffic, the aim is to choose the optimal load-balancing system depending on the type of Web traffic. This leads us to a multi-criteria decision-making problem under conditions of uncertainty and risk. In order to address this complex problem, it is necessary to use one of the multiple-criteria decision-making (MCDM) or decision analysis (MCDA) techniques available today.

The Analytic Hierarchy Process (AHP) is a structured technique for organizing, analyzing and making complex decisions, based on mathematics and psychology. The AHP is a multi-criteria decision-making (MCDM) approach introduced by the mathematician Thomas L. Saaty. It is a decision-making support tool which can be used to solve complex decision problems. It uses a multi-level hierarchical structure of objectives, criteria, sub-criteria and alternatives. Important data are derived by using a set of pairwise comparisons. These comparisons are further used to obtain the weights of importance of the decision criteria and the relative performance measures of the alternatives in terms of each individual decision criterion [7].

So, the AHP is an effective MCDM tool that meets the objective of decision making by ranking the choices according to their merits. Now, having clarified which MCDM method will be used, it is necessary to define the AHP criteria for the prioritization of types of Web traffic in order to choose the optimal load-balancing system. To make a set of necessary criteria for the types of Web traffic, it is also necessary to know which kind of business environment it entails. In this case, we concentrated on the banking business environment and its Web traffic. The selected criteria are influenced by business requirements and the criticality levels of the individual network services that support certain banking business models. Thus, for bank Web traffic we defined the following evaluation and prioritization criteria, in addition to the selection of the optimal cluster node: public Web, e-banking, m-banking, e-commerce and IP POS traffic. The organization uses the AHP (Analytic Hierarchy Process) technique as a method for the prioritization of Web traffic (domain) in the cloud environment. The actual goal is to prioritize the cloud Web traffic based on its type, and to select an adequate (or optimal) alternative (Web cluster node) according to the defined criteria - a kind of Web traffic tagging.

The reason for choosing a multi-criteria decision-making technique rather than a technique such as the SWOT analysis (Strengths, Weaknesses, Opportunities, and Threats) lies in its applicability. The SWOT analysis indeed provides a sufficient amount of quite descriptive data in regard to each element of the analysis. Yet, it does not include defined criteria by which individual systems or solutions could be evaluated and/or prioritized. In this specific case, that would mean choosing an appropriate Web cluster system according to the type of Web traffic. Also, the final result of the evaluation done with the help of an MCDM technique provides numeric/quantitative ratios among the alternatives, i.e. the Web cluster nodes. This is certainly measurable (and absolutely necessary), unlike the descriptive information provided by the SWOT analysis, which does not facilitate the selection of the optimal cluster node. There are a few MCDM techniques available for complex decision-making issues, but the AHP and TOPSIS techniques are among the most frequently used today [2]. The reason for their extensive usage is their proven mathematical foundation.

The Analytic Hierarchy Process is among the most widely exploited decision-making techniques when the decision is based on several tangible and intangible criteria and sub-criteria. It is recognized as one of the leading theories in the multi-criteria decision-making field. The application of the AHP technique has increased significantly in recent years, especially in the field of IT, due to its mathematically proven basis. The reason for choosing the AHP and not some other MCDM technique lies in the fact that there are already many proven studies applying the AHP to load-balancing issues [1][20][22]. It is true that the TOPSIS MCDM technique also has a certain number of proven applications in multi-criteria decision-making problems related to load-balancing issues, but that number is still smaller when compared to the AHP technique [13][17]. Also, according to the research, the AHP has more applications in the banking industry [2].
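To illustrate the pairwise-comparison step just described, the following minimal Python sketch reduces a comparison matrix over the paper's five traffic criteria to an approximate priority vector. The comparison values (on Saaty's 1-9 scale) are assumed for illustration only and do not come from the paper:

    # Pairwise comparison matrix: matrix[i][j] says how much more
    # important criterion i is than criterion j (values assumed).
    criteria = ["public Web", "e-banking", "m-banking", "e-commerce", "IP POS"]
    matrix = [
        [1,   1/5, 1/4, 1/3, 1/7],
        [5,   1,   2,   3,   1/2],
        [4,   1/2, 1,   2,   1/3],
        [3,   1/3, 1/2, 1,   1/4],
        [7,   2,   3,   4,   1  ],
    ]

    n = len(matrix)
    col_sums = [sum(row[j] for row in matrix) for j in range(n)]
    # normalize every column, then average each row: the usual simple
    # approximation of the principal eigenvector (the priority weights)
    weights = [sum(matrix[i][j] / col_sums[j] for j in range(n)) / n
               for i in range(n)]
    for name, w in sorted(zip(criteria, weights), key=lambda p: -p[1]):
        print(f"{name}: {w:.3f}")

The criterion with the largest weight (here IP POS, followed by e-banking) would be served first when the load balancer selects a Web cluster node for the corresponding traffic type.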
The following table (Table II) shows the metamodel of the AHP method for the prioritization of cloud Web traffic, according to which the business model was designed (Figure 4). From the defined business model shown in Figure 4, the ERA (Entity-Relationship-Attribute) model was derived with all of its tables, attributes and necessary relationships (Figure 5).

TABLE II. METAMODEL OF THE AHP METHOD FOR WEB TRAFFIC PRIORITIZATION
Entity (data model) | Attributes | Appearance data | Real world appearance
organization | organization_id, name, address | Organization: "213"; Name: "CloudOcean"; Address: "30 Algonquin, New York, NY, USA" | CloudOcean
method | method_id, name | Method: "7"; Name: "Analytic Hierarchy Process (AHP)" | Analytic Hierarchy Process (AHP)
domain | domain_id, description | Domain: "58"; Description: "Web Site (url)" | http://www.company.com
goal | goal_id, description | Goal: "14"; Description: "Web Traffic Prioritization" | Web Traffic Prioritization
criteria | criteria_id, name | Name: "e-banking" | e-banking
alternative | alternative_id, name | Alternative: "22"; Name: "Web Cluster web01corporate" | Web Cluster web01corporate
Figure 4. Business model for the AHP method for Web traffic prioritization

Figure 5. ER model for the AHP method for Web traffic prioritization

IV. COMMON METAMODEL METHODOLOGY

The Load Balancing (LB) method and the AHP method for Web traffic prioritization have only two common points - the method entities and the Web traffic type (IT_service in the LB_method and its corresponding table Domain in the AHP_method). However, these two common points are actually crucial for the existence of this new hybrid metamodel. By merging the LB method and the AHP method, we succeeded in developing a new hybrid metamodel capable of prioritizing Web traffic according to its type, as sketched below. The metamodel can also help load balancers to direct the traffic to the appropriate Web cluster node, ultimately resulting in a better resource allocation and a reduction of costs, which represent the ultimate goals of this research.
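As a purely illustrative sketch of these two common points (all names, weights and capacities below are assumed, not taken from the paper's models), the following Python fragment links a traffic type, which is shared between IT_service and Domain, to its AHP weight and then lets the LB step pick a Web cluster node:

    # The traffic type is the shared key between the two metamodels:
    # an IT_service in the LB method, a Domain in the AHP method.
    ahp_weights = {"e-banking": 0.42, "public Web": 0.08}  # from the AHP step

    web_clusters = {
        # traffic type -> candidate nodes as (name, free_connections)
        "e-banking":  [("web01corporate", 35), ("web02corporate", 60)],
        "public Web": [("web09shared", 12)],
    }

    def route(traffic_type):
        weight = ahp_weights[traffic_type]          # common point 1: priority
        nodes = sorted(web_clusters[traffic_type],  # common point 2: LB choice
                       key=lambda node: -node[1])
        # high-priority traffic gets the node with the most free capacity,
        # low-priority traffic is pushed to the least free node
        return nodes[0] if weight >= 0.2 else nodes[-1]

    print(route("e-banking"))   # -> ('web02corporate', 60)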
Figure 6. ER model for the common metamodel

V. EXPECTED CONTRIBUTIONS

The purpose of this research is to propose a concept model combining two methods, the implementation of which would entail multiple social and business justifications for the following reasons:
• Wide availability - Today there are many commercial solutions that are very expensive and demanding at the same time, i.e. they take a lot of compute power for their proper operation. The purpose of this paper is to use open source technology and solutions so that everyone can benefit from them: not only ICT businesses, but also the scientific community for research purposes. The new algorithm will also be publicly available to everyone.
• Improved utilization of resources - One of the main objectives is: "How to achieve multiple usage of the existing compute power?" As noted above, the virtualization of computing resources is present almost everywhere; however, it requires so-called bare bone (physical) resources, whose price range and amortization are quite high. The new algorithm should orchestrate Web servers and their free resources on its own and allocate them where necessary. This is a drastic change compared to the commercial solutions, which solve the lack of resources by configuring and delivering a new instance of a Web server, which ultimately consumes even more compute power.
• Increased system availability - Since compute power can be better utilized, higher scalability and system availability are also possible.
• Wide use - The goal of this solution is its broader application and the fact that the new algorithm can be applied to all types of Web servers (Nginx and Apache), but also to both types of content, that is, static and dynamic.
• Increased financial profitability (lower OPEX) - The main requirement for any cloud computing is greater availability. In order to achieve this, a larger and more complex infrastructure is used, which is expensive to maintain. The new algorithm should allow a multiple utilization of computing resources that provides even greater scalability with a lower server density, i.e. less of the necessary infrastructure. Since the computing resources can be better utilized, we can assume that the new algorithm should reduce the number of physical servers, which would result in savings through lower maintenance costs (less equipment and network infrastructure), a smaller amount of equipment (costs of maintaining and replacing hardware), lower monthly expenses of the data center (electricity and air conditioning) and the like.
• Environmental protection - Due to the already mentioned high costs of a single data center, very few of them are in compliance with environmental regulations. In other words, the electricity they use for their infrastructure does not come from renewable energy sources, therefore the aim is to reduce their electricity consumption. The new algorithm should reduce the number of servers due to the possibility of reallocating computing resources, which would ultimately result in lower energy consumption.
VI. CONCLUSION
In the last decade, there has been a rapid increase in the usage of cloud computing. In order for such an infrastructure to exist, a large number of data centers has been built [6][10]. Such infrastructure is quite expensive, since its primary goal is to enable the continuous operation of all cloud computing services. To achieve that, the entire system must be redundant (power installations, UPS, air conditioning, network infrastructure, etc.), with a secured backup system (a power generator, a BGP connection, another geographic collocation, etc.). All these items generate extremely high CAPEX and OPEX costs, which is the main motivation for improvement so as to allow for savings at all levels of the infrastructure and services [16].
Consequently, this paper proposes a new model created through a combination of two methods, which could lead to achieving the set goals along with further upgrading and improvement. For future research, we propose the development of some scripts (e.g. Python scripts) in the first phase. In the second phase, we propose the development of an entire software solution in a certain programming language to support this new metamodel. We also propose to investigate whether it is possible to use some other multi-criteria decision-making tool, such as TOPSIS, instead of the AHP. In case of success, that would be an additional strength of this hybrid metamodel because of its modularity and scalability.
REFERENCES
[1] Aghazarian, V.: RQSG-I: An Optimized Real time Scheduling Algorithm for Tasks Allocation in Grid Environments. 205–210 (2011).
[2] Aruldoss, M.: A Survey on Multi Criteria Decision Making Methods and Its Applications. Am. J. Inf. Syst. 1, 1, 31–43 (2013).
[3] Celesti, A. et al.: An approach to enable cloud service providers to arrange IaaS, PaaS, and SaaS using external virtualization infrastructures. Proc. - 2011 IEEE World Congr. Serv. 607–611 (2011).
[4] Chung-Cheng Li, Kuochen Wang: An SLA-aware load balancing scheme for cloud datacenters. Int. Conf. Inf. Netw. 2014. 58–63 (2014).
[5] Detal, G. et al.: Multipath in the middle (box). Work. Hot Top. middleboxes Netw. Funct. virtualization. 1–6 (2013).
[6] Dixon, D., Basiliere, P.: Cloud Computing for the Sciences. Anal. June, 1–6 (2009).
[7] Engineering, I. et al.: Using the Analytic Hierarchy Process for Decision Making in Engineering Applications: Some Challenges. Int. J. Ind. Eng. Theory, Appl. Pract. 2, 1, 35–44 (1995).
[8] Gruber, C.G.: CAPEX and OPEX in Aggregation and Core Networks. 2009 Conf. Opt. Fiber Commun. 9–11 (2009).
[9] Islam, S.S. et al.: Cloud computing for future generation of computing technology. 2012 IEEE Int. Conf. Cyber Technol. Autom. Control. Intell. Syst. 129–134 (2012).
[10] Jadeja, Y., Modi, K.: Cloud computing - Concepts, architecture and challenges. 2012 Int. Conf. Comput. Electron. Electr. Technol. ICCEET 2012. 877–880 (2012).
[11] Knoll, T.M.: A combined CAPEX and OPEX cost model for LTE networks. (2014).
[12] Luke, S. et al.: Essentials of Metaheuristics: A Set of Undergraduate Lecture Notes. (2011).
[13] Ma, F. et al.: Distributed load balancing allocation of virtual machine in cloud data center. Softw. Eng. Serv. Sci. (ICSESS), 2012 IEEE 3rd Int. Conf. 20–23 (2012).
[14] Popa, L. et al.: A Cost Comparison of Data Center Network Architectures. Proc. 6th Int. Conf. Co-NEXT '10. (2010).
[15] Rubenstein, B., Faist, M.: Data Center Cold Aisle Set Point Optimization through Total Operating Cost Modeling.
[16] Sampson, J., Tullsen, D.M.: Battery Provisioning and Associated Costs for Data Center Power Capping. (2012).
[17] Sayari, Z., Harounabadi, A.: Evaluation and Making a Tradeoff between Load Balancing and Reliability in Grid Services using Formal Models. 127, 9, 5–10 (2015).
[18] Singh, A. et al.: Server-storage virtualization: Integration and load balancing in data centers. 2008 SC - Int. Conf. High Perform. Comput. Networking, Storage Anal. (2008).
[19] Soundararajan, V., Herndon, B.: Benchmarking a Virtualization Platform. 99–109 (2014).
[20] Wang, M. et al.: Analytic hierarchy process in load balancing for the multicast and unicast mixed services in LTE. 2012 IEEE Wirel. Commun. Netw. Conf. 2735–2740 (2012).
[21] Wiboonrat, M.: Life Cycle Cost Analysis of Data Center Project. (2014).
[22] Zhang, J.-D. et al.: Load Balancing Based on Group Analytic Hierarchy Process. 2013 Ninth Int. Conf. Comput. Intell. Secur. 758–762 (2013).
Showers Prediction by WRF Model above
Complex Terrain
T. Davitashvili *, N. Kutaladze **, R. Kvatadze ***, G. Mikuchadze **, Z. Modebadze *, and I. Samkharadze*
* Iv. Javakhishvili Tbilisi State University / I. Vekua Institute of Applied Mathematics, Tbilisi, Georgia
** Georgian Hydro-meteorology Department, Tbilisi, Georgia
*** Georgian Research and Educational Networks Association, Tbilisi, Georgia
**** Iv. Javakhishvili Tbilisi State University/ Exact and Natural Sciences Faculty, Tbilisi, Georgia
tedavitashvili@gmail.com
Abstract - In the present article we have configured the nested grid WRF v3.6 model for the Caucasus region, taking into consideration the geographical-landscape character, topography heights, land use, soil type, temperature in deep layers, monthly vegetation distribution, albedo and other factors. Computations were performed using the Grid system GE-01-GRENA with working nodes (16+ cores and 32 GB RAM each) located at the Georgian Research and Educational Networking Association (GRENA), which has been included in the European Grid infrastructure. This provided a good opportunity for running the model on a larger number of CPUs and storing large amounts of data on the Grid storage elements. Two particular cases of unexpected heavy showers, which took place on 13 June 2015 in Tbilisi and 20 August 2015 in Kakheti (eastern Georgia), were studied. Simulations were performed with two sets of domains with horizontal grid-point resolutions of 6.6 km and 2.2 km. The ability of the WRF model to predict precipitation with different microphysics and convective scheme components, taking into consideration the complex terrain of the Georgian territory, was tested. Some results of the numerical calculations performed by the WRF model are presented.
I. INTRODUCTION
Regional climate formation above territories with complex terrain is conditioned by the joint action of large-scale synoptic and local atmospheric processes, where the latter are basically driven by the complex topographic structure of the terrain. The territory of the Caucasus, and especially the territory of Georgia, is a good example. Indeed, compound topographic sections (85% of the total land area of Georgia is mountainous) play an important role in the spatial-temporal distribution of meteorological fields. As is known, global weather prediction models can characterize large-scale atmospheric systems well, but not sufficiently the mesoscale processes, which are mainly associated with regional complex terrain and land cover. For modeling these smaller-scale atmospheric processes and their characteristic features, it is necessary to take into consideration the main features of the local terrain and its heterogeneous surfaces, as well as the influence of large-scale atmospheric processes on local-scale processes. The Weather Research and Forecasting (WRF) models are widely used by many operational services for short and medium range weather forecasting [1]. The WRF model version 3.6 represents a suitable means for studying regional and mesoscale atmospheric processes, such as regional climate, extreme precipitation, hail, the influence of orography on mesoscale atmospheric processes, the sensitivity of WRF to physics options, etc. [1].
As a matter of fact, the Advanced Research Weather Research and Forecasting Model (WRF-ARW) is a convenient research tool, as it offers multiple physics options which can be combined in many ways [2]-[9]. Indeed, in the WRF-ARW model the main categories of physics parameterizations (microphysics, cumulus parameterizations, surface physics, planetary boundary layer physics and atmospheric radiation physics) are mutually connected via the model state variables (potential temperature, moisture, wind, etc.), their tendencies, and the surface fluxes [1]-[5]. Given this broad availability of parameterizations, it is not easy to define the right combination that best describes a meteorological phenomenon dominant above the investigated region. Many works have been dedicated to the problem of identifying the combination of parameterizations in the WRF model that best represents the atmospheric conditions above the investigated region [2]-[26]. Three microphysics parameterization schemes (WRF Single-Moment 3-class (WSM3) scheme [11], Eta Ferrier scheme [12], Purdue Lin scheme [13]) and three cumulus schemes (Kain-Fritsch [7], Betts-Miller-Janjic [14]-[15], Grell-Devenyi ensemble [16]) were chosen to identify which combination in the WRF model best simulated the atmospheric lightning conditions in southeastern Brazil [5]. Analysis of the numerical calculations in [5] showed that the sets of WSM3 with Kain-Fritsch and of Eta Ferrier with Betts-Miller-Janjic represented temperature and wind at the surface and at low and medium levels of the atmosphere better than Purdue Lin with the Grell-Devenyi ensemble. The sets of WSM3 with Kain-Fritsch and of Eta Ferrier with Betts-Miller-Janjic were highlighted again among all the combinations, and the set of WSM3 with Kain-Fritsch presented the best results for the meteorological variables [5]. Using the WRF model, six different convective parameterization schemes (CPs) were investigated to study the impact of CPs on the quality of rainfall forecasts over Tanzania for two rainy seasons [2]. The results of the numerical experiments showed that for extreme rainfall prediction the Betts-Miller-Janjic (BMJ) and ensemble Grell-Devenyi (GD) schemes gave better results than the other CPs for different regions and seasons in Tanzania; the WRF model also to some extent performs better in cases of extreme rainfall [2]. Eight microphysics schemes (Lin, WSM5, Eta, WSM6, Goddard, Thompson, WDM5, WDM6) and three DA techniques were examined in WRF-ARW for reproducing the observed strong convection and three heavy precipitation events over the US SGP for 27-31 May 2001 [3]. To reproduce the vertical structure and evolution of these warm-season convective events, high temporal resolution data and millimetre cloud radar (MMCR) measurements were used. An important deduction from all the simulation results was that quite accurate reproductions of lower tropospheric moisture, temperature and wind profiles were necessary for the successful application of cloud-resolving regional models to simulate deep convection [3]. In [4], results of 24 h predictions by the WRF model showed that the Kain-Fritsch, BMJ and Grell schemes were not of satisfactory quality, while the simplified Arakawa cumulus parameterization scheme with the Lin et al. microphysics scheme, the RUC land surface model and the Asymmetric Convective Model (ACM2) planetary boundary layer physics gave better results than the others for producing reasonable predictions in the Kelani River basin in Sri Lanka [4]. The sensitivity of quantitative precipitation forecasts to various modifications of the KF scheme, and the determination of the grid spacing values at which the KF scheme may no longer be needed for simulated precipitation, were studied in [17]. The Kain-Fritsch scheme [7],[18],[19] is frequently used to improve forecasts through convective parameterization at grid spacings below 20 km, likely because the KF scheme has been shown to perform convective parameterization better than other CPSs such as the Betts-Miller-Janjic and Grell-Devenyi schemes [8],[9],[10]. The KF scheme also outperformed a 4 km simulation that used no convective scheme, so the KF scheme can be used to improve forecasts even at such high resolutions [8],[17],[20]. Atmospheric processes and physics parameterizations for the Caucasus territory have previously been tested with the WRF model using the following schemes: WSM 3-class simple ice scheme, RRTM scheme, Dudhia scheme, unified Noah land-surface model, Yonsei University scheme, and Kain-Fritsch (new Eta) scheme [25],[26]. In this study, WRF is used to predict heavy showers and hail with different sets of physics options over regions characterized by complex topography. Mesoscale Convective Systems (MCS) have been studied using real data and WRF simulations based on grid spacings in the range from 2.2 km to 19.8 km, with an emphasis on 2.2 km. The ability of the WRF model to predict precipitation with different microphysics and convective scheme components, taking into consideration the complex terrain of the Georgian territory, has been tested.
II. DATA AND METHODOLOGY

A. Observational data
Hydro-meteorological observations (HMO) and research in Georgia began in 1887, and by 1974 observations were performed at 33 HMO stations located in 11 towns of Georgia. Since 1992, some problems arose in the HMO system due to the political and economic processes that developed on the territory of Georgia: the number of HMO stations decreased, and the functioning stations lacked modern devices. At present, air quality monitoring is performed by the National Agency of Environment, under whose jurisdiction there are 7 observation stations distributed over 5 cities of Georgia: Tbilisi and Rustavi (eastern Georgia), and Kutaisi, Zestafoni and Batumi (western Georgia). Each city has only 1 or 2 observation stations; the only exception is the capital of Georgia, Tbilisi, where for the last ten-year period observations were carried out at 8 posts located in different districts of the city. It is obvious that this number of stations is not enough for the assessment of the hydro-atmospheric state over the territory of Georgia. In fact, we have hydrometeorological information only for the separate areas where the stations are located. We obtained and analyzed only scant information on air temperature, air humidity, precipitation amount and wind (speed, direction) for 13-14 June and 20-21 August 2015 in Tbilisi. All the above data were obtained from the Hydro-meteorology Department of Georgia and from the meteorological post of Tbilisi State University. We also analyzed radar information on the structure of clouds located over the Kakheti region, and these data were used for the assessment of the WRF model results.
B. Observed convective events during 13-14 June 2015
The weather on the night of 13 to 14 June 2015 in Tbilisi was terrible, with showers, thunderstorms and lightning. According to the Department of Hydrometeorology of Georgia, the maximal temperature on 13 June reached 29°C. There was a transfer of heat from the south by a wave, and it caused high temperatures and showers with thunderstorms and lightning in Tbilisi. A shocking accident took place on that night. Namely, late on 13 June 2015 there was a heavy shower lasting 1.5-2 hours, and following the heavy rainfall a landslide was released above the village of Akhaldaba, about 20 km southwest of Tbilisi. About 1 million m³ of collapsed land, mud, rocks and trees moved down from the Akhaldaba mountain towards Tbilisi and dammed up the Vere river (the Vere river flows from the Akhaldaba mountain through the territory of the Tbilisi zoo and further discharges into the Kura river through a tunnel under Heroes' Square). A big wave (formed by a mass of slush, rocks and trees) ran across the Vere canyon and washed everything away up to Heroes' Square. The resulting flood inflicted severe damage on the Tbilisi Zoo, Heroes' Square and nearby streets and houses (see Fig. 1). Unfortunately, this process resulted in at least 20 deaths, including three zoo workers, and left half of the Tbilisi Zoo's animal inhabitants either dead or on the loose.
Figure 1. After the flood on 14 June 2015 in Tbilisi
C. Observed convective events 20-21 August 2015
Western atmospheric processes dominated above the territory of Georgia from 19 to 21 August 2015. Inner-mass processes developed above the territory of Tbilisi, and it hailed in the evening of 19 August 2015. In the evening of 20 August 2015 it was raining with thunderstorms and lightning in Tbilisi. The maximal temperature on 20 August reached 36°C, and on 21 August it reached 31°C. Also, on 20 August 2015 a heavy rainfall was observed above the Kakheti region of Georgia (Kakheti is a famous wine-making region in eastern Georgia). Downpours with hail caused destruction in some districts of Kakheti and in the resort suburbs of Tbilisi, Kojori and Kiketi, where the ground floors of many houses were flooded in the evening of 20 August 2015. Namely, caused by the violent weather, the rain with hail lasted for half an hour and in some settlements of the Gurjaani, Lagodekhi and Kvareli districts broke roofs and even walls of houses. As a result, 50 percent of crops were destroyed and many fruit trees in orchards were damaged (source: http://eng.kavkaz-uzel.ru/articles/21952/). The last largest hail storm in Kakheti was recorded in July 2012, when 100 percent of all crops were destroyed in several villages. Furthermore, hundreds of houses were left without roofs and many cattle died. Hail has always been a major issue for people in Kakheti. Every year a large portion of the agricultural sector, particularly grapes, is damaged by hail, leaving farmers with no crops. It became necessary to put some solution to the hail problem into practice; more than 80 anti-hail firing points and pieces of equipment (radars) have been installed in Kakheti to reduce the damage to crops.

According to the radars allocated in the Kakheti region, we obtained the following information concerning the progression of cloud characteristics in the atmospheric column over the investigated region. Namely, at 19:00 on 20 August, to the south-west of the radar system there was an outbreak of cloud systems having the appearance of an atmospheric front, which was moving towards the north-east and began weakening from 19:42. At 19:20 a new cloud system formed and began moving from the north-west towards the town of Akhmeta; at 19:49 it reached Akhmeta, and the atmospheric column over the region had a height of 15 km with a maximal reflection of 60 dB. At 20:18 the cloud system, with a height of 15 km and a maximal reflection of 60 dB, continued moving from the territory of Akhmeta in the south-east direction, and at 21:13 it reached the territory of Kvareli with a height of 16 km and a maximal reflection of 60 dB. The cloud system continued migrating, and at 23:07 it shifted to the Lagodekhi territory, where the height and reflection of the cloud system began to decrease. At 01:59 a new cloud system formed and began moving from the north-west to the north-east; at 02:19 it had a height of 10 km and a maximal reflection of 50 dB, at 02:42 it reached the Akhmeta territory, continued moving towards the north-east, and left the investigated region.
D. WRF model simulation design
The Advanced Research WRF (WRF-ARW) model version 3.6 was used to simulate the warm-season heavy precipitation cases of 13-14 June 2015 and 19-21 August 2015. Some details of the numerical schemes are presented below. In our study we used one-way nested domains centered on the territory of Georgia. Namely, simulations were performed using a set of 2 domains with horizontal grid-point resolutions of 6.6 km and 2.2 km, both defined as those currently being used for operational forecasts. The coarser domain has a grid of 94x102 points, which covers the South Caucasus region, while the nested inner domain has a grid size of 70x70 points covering mainly the territory of Georgia. Both use 54 vertical levels, including 8 levels below 2 km. A time step of 10 seconds was used for the nested domain. The WRF model contains a number of different physics options, such as microphysics, cumulus parameterization physics, radiation physics, surface layer physics, land surface physics, and planetary boundary layer physics. Microphysics explicitly treats water vapor, cloud, and precipitation processes [1]. There are a number of microphysics schemes, such as the Kessler scheme, Purdue Lin scheme, WRF Single-Moment 3-class (WSM3) scheme, WSM5 scheme, WSM6 scheme, Eta grid-scale cloud and precipitation scheme, Thompson et al. scheme, Goddard cumulus ensemble model scheme and Morrison et al. 2-Moment scheme. In our study we have chosen the WSM6, Thompson, Purdue Lin, Morrison 2-Moment and Goddard schemes. Cumulus parameterization schemes are responsible for the sub-grid-scale effects of convective and/or shallow clouds and are theoretically valid only for coarser grid sizes [1]. The types of cumulus parameterization schemes are the Kain-Fritsch scheme, Betts-Miller-Janjic scheme, Grell-Devenyi ensemble scheme, Grell-3d ensemble scheme and simplified Arakawa scheme. We have chosen the Kain-Fritsch, Betts-Miller-Janjic and Grell-Devenyi ensemble schemes for our experiments. The planetary boundary layer (PBL) is responsible for vertical sub-grid-scale fluxes due to eddy transports in the whole atmospheric column [1]. Parameterization of the PBL directly influences vertical wind shear, as well as precipitation evolution [21],[22]. In [23] the main characteristics that explain the differences among WRF PBL schemes are summarized, and it was also investigated there how the PBL evolves within the ARW using 4-km grid spacing. There are a number of PBL schemes, of which the Yonsei University scheme, Mellor-Yamada-Janjic scheme, MRF scheme, Asymmetric Convective Model, Quasi-Normal Scale Elimination, and Mellor-Yamada Nakanishi and Niino schemes are the varieties. Following [23], we have mainly chosen the Yonsei University scheme. The land-surface models use atmospheric information from the surface layer scheme, radiative forcing from the radiation scheme, and precipitation forcing from the microphysics and convective schemes, together with internal information on the land's state variables and land-surface properties, to provide heat and moisture fluxes over land points and sea-ice points [2]. The five-layer thermal diffusion model, Noah Land Surface Model, RUC Land Surface Model and Pleim-Xiu Land Surface Model are the species of land-surface models; we have chosen the Noah Land Surface Model. After considering various combinations of microphysics, cumulus parameterization schemes, land-surface physics and planetary boundary layer physics, the combinations used in our experiments are given in Table 1.
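For illustration, the fragment below renders the domain setup just described as a WRF namelist.input excerpt generated from Python. The physics option indices follow the standard WRF V3 numbering (e.g. mp_physics = 6 for WSM6), and the coarse-domain time step is inferred from the 10 s nest step and the 1:3 grid ratio; both are our assumptions, not values quoted from the paper:

    # Sketch of a namelist.input fragment for the Set 1 configuration
    # described above. Option indices assume standard WRF V3 numbering:
    # WSM6 = 6, Kain-Fritsch = 1, MM5 similarity = 1, YSU PBL = 1,
    # Noah LSM = 2, RRTM = 1, Dudhia = 1.
    namelist = """&domains
     time_step = 30,              ! assumed: gives 10 s on the 2.2 km nest
     max_dom   = 2,
     e_we      =   94,   70,
     e_sn      =  102,   70,
     e_vert    =   54,   54,
     dx        = 6600, 2200,
     dy        = 6600, 2200,
     feedback  = 0,               ! one-way nesting
    /
    &physics
     mp_physics         = 6, 6,   ! WSM6 microphysics
     cu_physics         = 1, 1,   ! Kain-Fritsch cumulus scheme
     sf_sfclay_physics  = 1, 1,   ! MM5 similarity surface layer
     bl_pbl_physics     = 1, 1,   ! Yonsei University PBL
     sf_surface_physics = 2, 2,   ! Noah land-surface model
     ra_lw_physics      = 1, 1,   ! RRTM longwave radiation
     ra_sw_physics      = 1, 1,   ! Dudhia shortwave radiation
    /
    """
    with open("namelist.input.set1", "w") as f:
        f.write(namelist)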
TABLE 1. Five sets of the WRF parameterizations used in this study.

WRF Physics | Microphysics | Cumulus Parameterization | Surface Layer | Planetary Boundary Layer | Land-Surface | Atmospheric Radiation
Set 1 | WSM6 | Kain-Fritsch | MM5 Similarity | YSU PBL | Noah LSM | RRTM/Dudhia
Set 2 | Thompson | Betts-Miller-Janjic | MM5 Similarity | YSU PBL | Noah LSM | RRTM/Dudhia
Set 3 | Purdue Lin | Kain-Fritsch | MM5 Similarity | YSU PBL | Noah LSM | RRTM/Dudhia
Set 4 | Morrison 2-Moment | Grell-Devenyi ensemble | (PX) Similarity | ACM2 PBL | Noah LSM | RRTM/Dudhia
Set 5 | Goddard | Kain-Fritsch | MM5 Similarity | YSU PBL | Noah LSM | RRTM/Dudhia

III. RESULTS AND DISCUSSION
Simulations of the 2 accidental atmospheric events occurring during June and August 2015 over the territory of Georgia were performed by the WRF-ARW model on the basis of the five different combinations of physics options (microphysics, cumulus parameterization physics, radiation physics, surface layer physics, land surface physics, and planetary boundary layer physics) represented in Table 1. The results of the numerical calculations showed that none of the combinations listed in Table 1 was able to model the true atmospheric event which took place on 13 June 2015. Namely, the results showed that the 24 h predictions by these schemes were not of satisfactory quality, as they were not able to account for the small-scale processes that lead to the development of deep convection. For example, Fig. 2 and Fig. 3 present the predicted fields of the relative humidity at 850 hPa for 13 June (21 UTC) and 14 June (00 UTC) 2015, respectively, simulated by WRF Physics Options Set 1 (which gave better results than the others). The calculated amounts of water vapor presented in Fig. 2 and Fig. 3 (nested domain with 6.6 km resolution) at the moments when the atmospheric event was in full swing are not in satisfactory agreement with the real situation which took place in Tbilisi and its surroundings on 13 June 2015.

Figure 2. Map of the relative humidity at 850 hPa for 13 June 2015 (21 UTC), simulated for the nested domain with 6.6 km resolution.

Figure 3. Map of the relative humidity at 850 hPa for 14 June 2015 (00 UTC), simulated for the nested domain with 6.6 km resolution.
Figures 4 and 5 present the forecasted precipitation fields at the 850 hPa height for 13 June 2015 (21 UTC) and for 14 June 2015 (00 UTC), respectively. Both figures demonstrate the failure of the 24 h WRF-ARW forecast, especially in the investigated region, where both nested models predicted almost dry conditions (insignificant precipitation). Namely, a comparison of Fig. 4 with Fig. 5 shows a considerably increased amount of accumulated precipitation in the coastal area of the Black Sea near the city of Poti, but a diffuse spectrum of accumulated precipitation in Fig. 5 in comparison with Fig. 4 in the investigated area. Unfortunately for this case study, the precipitation simulated in the region of interest (Tbilisi and suburbs) was not convective in nature, and only a small amount of precipitation was produced by the model. As is known, it is the CPSs that produce such precipitation, not the microphysics [8], so this indicates that the chosen CPSs were not producing precipitation for this mesoscale case study. An important deduction from all the simulation results was that quite accurate reproductions of the lower tropospheric temperature and wind profiles were achieved, but these alone were not sufficient for the successful simulation of the mesoscale deep convection which took place on 13 June 2015 in Tbilisi and its suburbs. In our opinion, it is necessary to strengthen the initial and boundary conditions through data assimilation, and to improve the physical linkages between the radiation physics, surface layer physics and land surface physics.

Fig. 4. Forecasted (13 June 00 UTC) accumulated precipitation, 12 h sum, simulated for the nested domain with 6.6 km resolution.

Fig. 5. Forecasted (14 June 00 UTC) accumulated precipitation, 12 h sum, for the nested domain with the 1-way nesting method and 6.6 km resolution.
The WRF-ARW model, version 3.6, was also used to simulate the shower which took place on 20 August, with the 5 different combinations of physical schemes (see Table 1). The numerical calculations showed that in all cases orographic forcing plays an important role in the localization and intensification of precipitation in and near complex terrain. In the pre-convective environment, caused by the differential heating of valleys and mountain slopes, pronounced small-scale secondary circulations develop during the day. They lead to the development of convergence lines, e.g. over mountain crests, and mesoscale vortices influencing the atmospheric boundary layer evolution as well as the transport of humidity and other tracers. The numerical calculations showed that the combination of the Purdue Lin scheme with the Kain-Fritsch scheme and the MM5 Similarity surface layer (Set 3), and of the Goddard scheme with the Kain-Fritsch scheme and the MM5 Similarity surface layer scheme (Set 5), gave better results than the others. The selected convective cases for Set 3 and Set 5 are shown in Fig. 6 and Fig. 7, respectively. The numerical calculations showed that there are indeed 'natural' scales of activity for the convective parameterization within WRF.

Fig. 6. Forecasted (Set 3, 20 August 21 UTC) accumulated precipitation, 12 h sum, for the nested domain with 2.2 km resolution.

Fig. 7. Forecasted (Set 5, 20 August 21 UTC) accumulated precipitation, 12 h sum, for the nested domain with 2.2 km resolution.
A comparison of Fig. 6 with Fig. 7 shows that the main features of the accumulated precipitation are predicted almost identically, but a closer study of the dynamics and its comparison with the observational data showed that Set 3 was able to model the true atmospheric event which took place on 20-21 August 2015. In summary, it can be said that the above-mentioned model can be successfully used for local weather extremes prediction for western-type synoptic processes, such as the 19-21 August atmospheric circulation above the Georgian territory.
IV. CONCLUSION
In this study, some comparisons between WRF forecasts were made in order to check the consistency and quality of the WRF model for days with heavy precipitation over the territory of Georgia. This first analysis allowed us to verify that, in general, the combinations of the Purdue Lin scheme with the Kain-Fritsch scheme and MM5 Similarity surface layer (Set 3) and of the Goddard scheme with the Kain-Fritsch scheme and MM5 Similarity surface layer scheme (Set 5) gave better results than the others for the western atmospheric processes dominant above the territory of Georgia. Also, for the evolution and improvement of model skill at different temporal and spatial scales, verification and assimilation methods should be used for further tuning and fitting of the model to local conditions.
ACKNOWLEDGMENT
The authors are supported by the EU Commission
Project H2020 “VRE for regional Interdisciplinary
communities in Southeast Europe and the Eastern
Mediterranean” VI-SEEM №675121
REFERENCES
[1] W. C. Skamarock, J. B. Klemp, J. Dudhia, D. O. Gill, D. M. Barker, W. Wang, J. G. Powers, "A description of the Advanced Research WRF Version 2", NCAR Tech. Notes, Natl. Cent. for Atmos. Res., Boulder, Colorado, 2005.
[2] A. L. Kondowe, "Impact of Convective Parameterization Schemes on the Quality of Rainfall Forecast over Tanzania Using WRF-Model", Natural Science, 2014, Vol. 6, pp. 691-699.
[3] Z. T. Segele, L. M. Leslie and P. J. Lamb, "Weather Research and Forecasting Model simulations of extended warm-season heavy precipitation episode over the US Southern Great Plains: data assimilation and microphysics sensitivity experiments", Tellus A, 2013, Vol. 65, pp. 1-26.
[4] G. T. De Silva, S. Herath, S. B. Weerakoon and U. R. Rathnayake, "Application of WRF with different cumulus parameterization schemes for precipitation forecasting in a tropical river basin", Proceedings of the 13th Asian Congress of Fluid Mechanics, 17-21 December 2010, Dhaka, Bangladesh, pp. 513-516.
[5] G. S. Zepka and O. Pinto Jr., "A Method to Identify the Better WRF Parameterizations Set to Describe Lightning Occurrence", 3rd Meteorological Lightning Conference, 21-22 April 2010, Orlando, Florida, USA, pp. 1-10.
[6] Y. Yair, B. Lynn, C. Price, V. Kotroni, K. Lagouvardos, E. Morin, A. Mugnai and M. d. C. Llasat, "Predicting the potential for lightning activity in Mediterranean storms based on the Weather Research and Forecasting (WRF) model dynamic and microphysical fields", J. Geophys. Res., 115, D04205, doi:10.1029/2008JD010868.
[7] J. S. Kain, "The Kain-Fritsch Convective Parameterization: An Update", J. Appl. Meteor., 2004, 43, 170-181.
[8] E. K. Gilliland and C. M. Rowe, "A comparison of cumulus parameterization schemes in the WRF model", Preprints, 21st Conf. on Hydrology, San Antonio, TX, Amer. Meteor. Soc., 2007, P2.16.
[9] W. Wang and N. L. Seaman, "A comparison study of convective parameterization schemes in a mesoscale model", Mon. Wea. Rev., 1997, 125, 252-278.
[10] L.-M. Ma and Z.-M. Tan, "Improving the behavior of the cumulus parameterization for tropical cyclone prediction: convection trigger", Atmos. Res., 2009, 92, 190-211.
[11] S. Y. Hong, J. Dudhia, and S. H. Chen, "A Revised Approach to Ice Microphysical Processes for the Bulk Parameterization of Clouds and Precipitation", Mon. Wea. Rev., 2004, 132, 103-120.
[12] B. S. Ferrier, Y. Jin, Y. Lin, T. Black, E. Rogers, G. DiMego, "Implementation of a new grid-scale cloud and precipitation scheme in the NCEP Eta model", Preprints, 15th Conf. on Numerical Weather Prediction, San Antonio, TX, Amer. Meteor. Soc., 2002, 280-283.
[13] Y. L. Lin, R. D. Farley, H. D. Orville, "Bulk Parameterization of the Snow Field in a Cloud Model", J. Appl. Meteor., 1983, 22, 1065-1092.
[14] Z. I. Janjić, "The Step-Mountain Eta Coordinate Model: Further Developments of the Convection, Viscous Sublayer, and Turbulence Closure Schemes", Mon. Wea. Rev., 1994, 122, 927-945.
[15] Z. I. Janjić, "Comments on 'Development and Evaluation of a Convection Scheme for Use in Climate Models'", J. Atmos. Sci., 2000, 57, 3686.
[16] G. A. Grell and D. Dévényi, "A generalized approach to parameterizing convection combining ensemble and data assimilation techniques", Geophys. Res. Lett., 2002, 29 (14), 1693, doi:10.1029/2002GL015311.
[17] J. D. Duda, "WRF simulations of mesoscale convective systems at convection-allowing resolutions", Graduate Theses and Dissertations, 2011, Paper 10272.
[18] J. S. Kain and J. M. Fritsch, "The role of the convective 'trigger function' in numerical forecasts of mesoscale convective systems", Meteor. and Atmos. Phys., 1992, 49, 93-106.
[19] J. S. Kain and J. M. Fritsch, "Convective parameterization for mesoscale models: The Kain-Fritsch scheme", The Representation of Cumulus Convection in Numerical Models, Meteor. Monogr., No. 24, Amer. Meteor. Soc., 1993, 165-170.
[20] S. Yavinchan, R. H. B. Exell, and D. Sukawat, "Convective parameterization in a model for the prediction of heavy rain in southern Thailand", J. Meteor. Soc. Japan, 2011, 89A, pp. 201-224.
[21] A. S. Monin, A. M. Obukhov, "Basic laws of turbulent mixing in the surface layer of the atmosphere", Tr. Geofiz. Inst., Akad. Nauk SSSR, 24, 1954, pp. 1963-1967 (in Russian).
[22] S. Y. Hong and J. O. J. Lim, "The WRF single-moment 6-class microphysics scheme (WSM6)", J. Korean Meteor. Soc., vol. 42, 2006, pp. 129-151.
[23] A. E. Cohen, S. M. Cavallo, M. C. Coniglio, H. E. Brooks, "A Review of Planetary Boundary Layer Parameterization Schemes and Their Sensitivity in Simulating Southeastern U.S. Cold Season Severe Weather Environments", vol. 30, 2015, pp. 591-612.
[24] I. Jankov, W. A. Gallus, M. Segal, B. Shaw, and S. E. Koch, "The impact of different WRF Model physical parameterizations and their interactions on warm season MCS rainfall", Wea. Forecasting, 2005, 20, 1048-1060, doi:10.1175/WAF888.1.
[25] T. Davitashvili, R. Kvatadze, N. Kutaladze, "Weather Prediction Over Caucasus Region Using WRF-ARW Model", Proceedings of the 34th International Convention MIPRO, 2011, Opatija, Croatia, Print ISBN: 978-1-4577-0996-8, pp. 326-330.
[26] T. Davitashvili, G. Kobiashvili, R. Kvatadze, N. Kutaladze, G. Mikuchadze, "WRF-ARW Application for Georgia", Report of SEE-GRID-SCI User Forum, 2009, Istanbul, Turkey, pp. 7-10.
Methods and Tools to Increase Fault Tolerance of
High-Performance Computing Systems
I.A. Sidorov
Matrosov Institute for System Dynamics and Control Theory of SB RAS, Irkutsk, Russia
ivan.sidorov@icc.ru
Abstract - This work focuses on the development of models, methods, and tools to increase the fault tolerance of high-performance computing systems. The described models and methods are based on automatic diagnostics of the basic software and hardware components of these systems, the use of automatic localization and correction of faults, and the use of automatic HPC-system reconfiguration mechanisms. The originality and novelty of the offered approach consist in creating a multi-agent system with universal software agents, capable of collecting node state data for analysis and thereby enabling the agent to make the necessary decisions directly.
I. INTRODUCTION
The solution of resource-intensive scientific problems using high-performance computing (HPC) systems is complicated by a number of difficulties faced by the applied specialist. Examples of such difficulties include requirements to optimize parallel programs (PP) for the architecture of the HPC-system; complexity in the decomposition of the initial data; the necessity to select a specific queue type from the list of the distributed resource manager; requirements with respect to the limitations of the administrative policies of the HPC-system (priorities for tasks, limitations on the number of tasks in the queue, limitations on the maximum execution time, etc.), and other conditions required for the preparation and execution of the PP. One of the main problems of large-scale computational experiments that require a large number of simultaneously involved computing resources is the task of providing fault tolerance to the execution of the computing process, because a PP execution failure directly affects economic performance: fees for the computing resources, the costs of disrupted experiment deadlines, the labor spent searching for problems and failures, etc.

The reasons for PP execution failures can be classified into the following categories: mistakes made by the developer in the PP code; mistakes made by the user when preparing the initial data; and failures in the hardware and software components of the HPC-system. The first two categories of failures can in some cases be identified and eliminated during the debugging process on a small number of nodes. The subsequent launch of large-scale computational experiments on a large number of nodes often leads to failures due to the technical problems of computational nodes.
The failure of hardware or software components on one node (processor, memory, hard drives, network devices, system services, etc.) will lead to the total failure of the PP on all involved nodes. A checkpoint mechanism can improve the reliability of the calculations; however, when the program executes on a large number of computing nodes, the effective load is significantly reduced, because the PP spends most of the CPU time on the organization of checkpoints and on restoration after failures [1, 2]. In this regard, creating and applying effective methods and tools that will increase the fault tolerance of HPC-systems, and in turn reduce the number of failures of the PPs executing on these systems, is a pressing task [3]. One approach to solving this problem is the use of monitoring and diagnostics systems.
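A back-of-the-envelope Python illustration of this effect (all numbers are assumed): if a single node fails on average once per MTBF hours, a program spread over N nodes is interrupted roughly N times as often, and the share of CPU time left for useful work shrinks accordingly:

    # Effective load of a checkpointed parallel program; illustrative only.
    node_mtbf_h  = 10_000   # mean time between failures of one node, hours
    checkpoint_h = 0.1      # time needed to write one checkpoint, hours
    interval_h   = 1.0      # checkpoint period, hours

    for nodes in (100, 1_000, 10_000):
        system_mtbf = node_mtbf_h / nodes           # a failure on any node stops the PP
        write_overhead = checkpoint_h / interval_h  # fraction spent writing checkpoints
        # after a failure, roughly half an interval of work is redone, plus a restore
        rework = (interval_h / 2 + checkpoint_h) / system_mtbf
        efficiency = max(0.0, 1.0 - write_overhead - rework)
        print(f"{nodes:>6} nodes: effective load ~ {efficiency:.0%}")
        # 100 nodes ~ 89%, 1 000 nodes ~ 84%, 10 000 nodes ~ 30%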
Known monitoring systems (Ganglia [4], Nagios, Zabbix) focus primarily on the collection of data and on providing aggregated information to the operator about the status of the HPC-system components. More attention is given to tools monitoring the PP execution in order to collect comprehensive data about the effectiveness of the use of computing resources (Lapta [5], mpiP, IPM). There are a number of commercial tools for the control of an HPC-system engineering infrastructure (ClustrX Watch, EMC ViRP SRM, HP OpenView [6]) and open-source tools for the control of an HPC-system computing infrastructure (Iaso [7] and Octorun [8]).
However, the level of automation at all stages of monitoring, diagnostics, and troubleshooting is unacceptably low and does not support the required level of HPC-system reliability. Moreover, a big problem for existing approaches to creating monitoring and diagnostics systems for large-scale HPC-systems (and, in the future, for exascale systems) is their strictly hierarchical architecture. In accordance with this architecture, the clients of the monitoring system send data to the central (or intermediate) nodes, where the data are collected and processed and, if necessary, control actions are generated. The hierarchical architecture imposes significant limitations on the scalability of the monitoring system, leading to overload of the network components and data storage systems and, as a result, to late reaction to important and critical events.
This paper proposes a model, method, and tools to increase the fault tolerance of HPC-systems through automatic monitoring and intelligent diagnostics of software and hardware components. The originality and novelty of the offered approach consist in creating a multi-agent system with universal software agents capable of collecting node state data and performing analysis and decision-making directly on the agent side.
II. MODEL AND METHODS OF COMPUTING NODE DIAGNOSTICS

The model of computing node diagnostics can be represented by the following structure:

$S = \langle O, Z, T, C, F, R, P, Q, L, I \rangle$,

where:
O – the set of node components;
Z – the set of measured characteristics of each node component;
T – the set of measured characteristic types;
C – the set of predicates;
F – the set of control-diagnostics operations;
R – the set of control-diagnostics operation types;
P – the set of production rules;
Q – the queue of control-diagnostics operations that are ready for execution;
L – the log of the diagnostics process, and
I – the set of intervals for starting the diagnostics process on the node.

Elements of the set O are the various components of an HPC-system node whose failure may result in failure of the PP instances being executed on this node. The elements of O can be hardware components (active cooling system, processor, memory banks, hard disk drives, network devices, etc.) and software components (operating system services and processes, operating system network interfaces, connections to network storages, etc.).

The characteristics of Z are divided into the following categories:
workload characteristics of node components (workload of the CPU and CPU cores, memory, network and hard-drive input/output workload, etc.);
characteristics of the physical state of node components (temperature of the CPU and motherboard, uninterruptible power supply state, hard drive state, etc.);
characteristics of the executed program state (priority, CPU-usage time, memory usage, hard-drive and network storage utilization, etc.).

Elements of T include the following value types of measured characteristics: logical (Boolean); integer; float; percent (unsigned integer in the interval [0...100]); chars; string (up to 256 chars); text (up to $2^{20}$ chars).

Elements of F include operations designed to perform control actions that collect data on the states of hardware and software components, and diagnostic actions that localize and troubleshoot the identified faults. Control-diagnostic operations implement the mappings $F: O \times Z \to Z$ or $F: O \to Z$.

Elements of R include the following control-diagnostic operation types:
r1 – data collection operations (obtaining data about the current state of active cooling systems, the temperature of the motherboard and processors, hard-drive SMART info, the amount of free memory on storage devices, the status of network devices, the status of operating system services, the status of connections to the network drives, etc.);
r2 – test operations (fast or full memory tests, testing of hard drives for bad blocks, testing of network components, etc.);
r3 – fault localization operations (identification of the number of a failed RAM bank; identification of processes using a lot of memory or generating a huge workload for hard-disk drives or network components; identification of the physical location of a failed hard drive; identification of the number of a failed cooler on the motherboard, etc.);
r4 – troubleshooting operations (cleaning temporary files from storage devices, restarting operating system services, remounting network drives, etc.);
r5 – troubleshooting verification operations (checking the free disk space, checking the free amount of RAM, checking stopped processes of the operating system, checking the temperature of the CPU, etc.);
r6 – critical situation operations (node shutdown, disconnecting the node from the HPC-system resource manager, killing all calculating processes to decrease the CPU temperature, sending a notification to the system administrator, etc.).

The type of an operation defines the priority of its execution in the operations queue Q: r1 – priority 1, r2 – priority 2, r3 – priority 3, r4 – priority 4, r5 – priority 5, r6 – priority 0 (maximal).

It should be noted that operations of types r1, r2, r3, and r5 are classified as data obtaining operations, which collect information about the state of node components without making any changes. Operations of types r4 and r6 are designed to perform automatic control actions aimed at troubleshooting problems in software and hardware components. Control actions can change the mode of node components in order to provide fault-tolerant functioning of the node in the HPC-system. If a triggered critical situation cannot be resolved, control actions should automatically withdraw the defective node from the pool of available resources of the HPC-system.
Analysis of the control-diagnostic characteristics from the set Z is performed by predicates from the set C, written as follows:

$c_i: z_j \; \langle\text{logical operator}\rangle \; \text{const}$,

where $c_i \in C$, $z_j \in Z$, $i = \overline{1, n_C}$, $j = \overline{1, n_Z}$. On the left side, a characteristic from the set Z whose value is to be analyzed is specified. The logical operators that can be used are $<$, $>$, $\le$, $\ge$, $=$, and $\ne$. The value type of the const on the right side and the type of the characteristic value $z_j$ must be the same. The model allows the construction of complex expressions using the operators AND, OR, XOR, and NOT.
The set P contains production rules of the form IF $c_k$ THEN $f_i$. This structure is interpreted as follows: if the predicate $c_k$ is true, it is necessary to perform the operation $f_i$. A production is ready when the values of all characteristics included in its predicate are defined. A production is completed if its predicate is true and the operation on its right side has been added to the queue.
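To make the predicate and production semantics concrete, the following minimal Python sketch models them (this is an illustration of the model only, not the authors' C++/ECMAScript implementation; all identifiers are ours):

import operator

# Logical operators allowed on the right side of a predicate.
OPS = {"<": operator.lt, ">": operator.gt, "<=": operator.le,
       ">=": operator.ge, "==": operator.eq, "!=": operator.ne}

class Predicate:
    """c_i: z_j <logical operator> const."""
    def __init__(self, zj, op, const):
        self.zj, self.op, self.const = zj, OPS[op], const

    def ready(self, values):
        # A production is ready when all characteristics in its
        # predicate have defined values.
        return self.zj in values

    def check(self, values):
        return self.op(values[self.zj], self.const)

class Production:
    """IF c_k THEN f_i."""
    def __init__(self, predicate, operation):
        self.predicate, self.operation = predicate, operation
        self.completed = False

# Example: characteristic z2 (used RAM, percent) exceeds 99 %.
values = {"z2": 100}
p = Production(Predicate("z2", ">", 99), "f4")
if p.predicate.ready(values) and p.predicate.check(values):
    print("enqueue", p.operation)   # the operation is added to the queue Q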
The process of diagnostics of a computing node includes the following steps. At some interval from I, the monitoring agents working on the nodes run the diagnostic procedure. The initial data of the diagnostics is a vector containing the values of the logical characteristics from Z that depend on the current interval from I; these values define the modes of the diagnostic process (e.g., whether or not troubleshooting operations are used) and the quality of the diagnostic process (quick, full, or advanced testing).
There is a certain order of interpreting productions and adding operations to the queue. As stated earlier, six operation types are defined in the model. In the first step, ready productions are chosen whose right side contains control-diagnostic operations of types r1, r2, or r3 (data obtaining operations). The selected productions are interpreted: their predicates are calculated and, if a predicate's value is true, the corresponding operation is added to the queue Q.
Operations in the queue Q are processed in the following mode. One of the requirements for the diagnostics tools is to reduce the total execution time of operations on the computing node. To achieve this, a strategy of executing the maximum number of operations per unit time was selected. According to this strategy, every control-diagnostic operation added to the queue Q is executed immediately, in a separate thread, in parallel with other operations. The only condition for starting an operation from the queue is a check that no other operation is being performed on the same component from O. This condition is checked in order to avoid the execution of mutually contradictory operations.
After completion of an operation from the queue Q, the set of defined characteristic values expands and, as a result, the list of ready productions expands. The procedure for interpreting ready and not previously completed productions then starts again.
It should be noted that in the second and subsequent steps of production interpretation, in addition to productions with operations of types r1, r2, and r3, productions with operations of type r6 on the right side (actions in critical situations) are also selected. If the predicate of such a production is true, its operation is placed at the first position in the queue. This approach ensures an immediate response to critical situations.
After all operations of types r1, r2, and r3 are completed, the next stage is executed. At this stage, productions with operations of type r4 (troubleshooting operations) on the right side are selected. After completion of all r4 operations, the next stage is executed, selecting productions with operations of type r5 (troubleshooting verification operations) on the right side. At the last stage, operations of type r6 are performed; when all other operations are completed, the diagnostics process is considered complete.
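The staged interpretation described above can be summarized in a short sketch (a sequential simplification under the stated model: the real tools execute each queued operation in its own thread and start it only if its component is not busy; all names are illustrative):

import heapq

PRIORITY = {"r6": 0, "r1": 1, "r2": 2, "r3": 3, "r4": 4, "r5": 5}
STAGES = [("r1", "r2", "r3"), ("r4",), ("r5",), ("r6",)]

def run_diagnostics(productions, values):
    """productions: list of dicts with keys 'rtype', 'done',
    'ready'(values) -> bool, 'true'(values) -> bool, 'execute'() -> dict."""
    for stage in STAGES:
        progressed = True
        while progressed:              # re-interpret after every batch:
            progressed = False         # finished operations define new values
            batch = []
            for i, p in enumerate(productions):
                allowed = p["rtype"] in stage or p["rtype"] == "r6"
                if allowed and not p["done"] \
                        and p["ready"](values) and p["true"](values):
                    heapq.heappush(batch, (PRIORITY[p["rtype"]], i))
                    p["done"] = True
            while batch:               # r6 (priority 0) is served first
                _, i = heapq.heappop(batch)
                values.update(productions[i]["execute"]())
                progressed = True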
It should be noted that all elements of the sets O, Z, C, F, P, and I of the structure S are provided with informative descriptions. During the interpretation of predicates and operations, the actions taken are sequentially appended to the diagnostics log file L; the diagnostics log is constructed according to this principle.
III. EXAMPLE OF NODE DIAGNOSTICS
As an example, consider a simple diagnostics process for the RAM of a computing node. Let the elements of the sets of the structure S be as follows:

O:
o1 – RAM of the computing node;
o2 – user task executed on the node.

Z:
z0 – initial condition of the diagnostics (type: string);
z1 – total RAM size (type: integer);
z2 – used RAM size (type: percent);
z3 – identification number of the failed RAM bank (type: integer);
z4 – status of sending a notification to the administrator of the HPC-system (type: logical);
z5 – status of sending a notification to the user of task o2 (type: logical);
z6 – status of the command for stopping the execution of task o2 (type: logical);
z7 – status of finishing task o2 (type: logical).

C:
c0: z0 == "quick" – perform quick diagnostics;
c1: z1 != $2^{33}$ – the total size of RAM should be 8 GB;
c2: z3 > 0 – the number of the failed RAM bank was successfully identified;
c3: z2 > 99 – node RAM usage exceeds 99%.
F:
f1: {o1} → {z1, z2} – operation to obtain node RAM information (type r1);
f2: {o1} → {z3} – operation to localize a failed RAM bank (type r3);
f3: {o1, z3} → {z4} – operation to notify the HPC-system administrator about the failed RAM bank (type r6);
f4: {o2, z2} → {z5} – operation to notify the user of task o2 about exceeding the RAM limit (type r6);
f5: {o2} → {z6} – stop task execution (type r4);
f6: {o1, o2} → {z7} – check the status of stopping the execution of task o2 (type r5).

P:
p0: IF c0 THEN f1 – perform quick node diagnostics;
p1: IF c1 THEN f2 – the RAM size has changed, a faulty RAM bank needs to be localized;
p2: IF c2 THEN f3 – notify the administrator about the identified failed RAM bank;
p3: IF c3 THEN f4 – notify the user of task o2 about exceeding the RAM limit;
p4: IF c3 THEN f5 – stop task o2, because the RAM limit was exceeded.
The diagnostics process for this structure may proceed as follows. Let z0 = "quick". After interpretation of the ready productions, the operations queue will include the following operation: Q = {f1}. When the operation f1 completes, the values of the parameters z1 and z2 become defined. Suppose that the node has two 4 GB RAM banks and one bank has failed; then z1 = $2^{32}$ (in such cases, the operating system often continues to work, excluding the address space of the failed bank). Also assume that z2 > 99 (critical memory usage has been reached, with the possibility of swap usage). In the next step, the predicates c1 and c3 are true, and the operations Q = {f4, f2} are added to the queue in accordance with the order of production processing. In the next step, the predicates c2 and c3 are true, and the operations Q = {f3, f5} are added to the queue. In the last step, the operation Q = {f6}, which checks that the task has been stopped, is added to the queue.
It should be noted that this example merely illustrates the above model and methods and does not contain a complete list of operations and productions for comprehensive diagnostics of computing node RAM.
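Under the same simplifications, the predicates of this example can be written down directly; the stub below (illustrative values only, not real probes) shows which operations become ready after f1 completes:

# Predicates of the RAM example with stubbed characteristic values.
def c0(v): return v.get("z0") == "quick"
def c1(v): return "z1" in v and v["z1"] != 2**33   # total RAM should be 8 GB
def c2(v): return v.get("z3", 0) > 0               # failed bank identified
def c3(v): return v.get("z2", 0) > 99              # RAM usage above 99 %

productions = [("p0", c0, "f1"), ("p1", c1, "f2"), ("p2", c2, "f3"),
               ("p3", c3, "f4"), ("p4", c3, "f5")]

values = {"z0": "quick"}
print([f for _, c, f in productions if c(values)])      # ['f1']

values.update(z1=2**32, z2=100)   # f1 finished: one of two 4 GB banks lost
print([f for _, c, f in productions if c(values)])
# ['f1', 'f2', 'f4', 'f5'] - the engine filters out completed productions
# (p0), serves f4 (type r6) before f2, and defers f5 (type r4) to the
# troubleshooting stage, matching the order in the walkthrough above.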
IV. IMPLEMENTATION OF THE DIAGNOSTICS TOOLS
The software implementation of the diagnostics tools is realized as part of the meta-monitoring toolkit [9, 10] designed for heterogeneous large-scale HPC-systems that include hundreds of thousands of computing nodes. This toolkit is based on the use of service-oriented technology, multi-agent technology, methods of creating expert systems, methods of decentralized data processing, distributed data storage, and decentralized decision-making.
The above model and methods were implemented as part of the autonomous software agents of the meta-monitoring system that function on the compute nodes in background mode. The agents perform the functions of node state data collection, analysis of the collected data, generation and execution of control actions, and communication with other agents. The ability to analyze data and make the necessary decisions on the node side is the key feature of the implemented approach. As discussed above, in most monitoring systems, the collected data are always sent to the central node, where they are processed and stored. Such an implementation significantly reduces the number of data analysis operations on the node side; however, regularly sending data to the master node creates an additional load on the network protocol stack, which also requires CPU time (formation, analysis, and control of network packets). In our approach, analyzing the data on the node side allows sending data to the central node only when necessary. A comparative analysis of the Ganglia monitoring client (gmond) and the agent of the developed system showed that, at the same data gathering frequency (every 30 s), our agent with local data analysis and local decision-making functions consumes 37% less CPU time.
Figure 1. The architecture of the base subsystems of the diagnostics tools

The diagnostics tools include the following base subsystems (Fig. 1):
decentralized data storage, which includes the knowledge base (containing specifications of characteristics, productions, predicates, the list of previously detected problems and faults, etc.), the control-diagnostics operations library (containing modules realized using the specialized language), the round-robin database for storing periodically obtained monitoring data, and log files;
the control subsystem, which coordinates node diagnostics and includes mechanisms for starting the diagnostics process at certain time intervals or at moments when the computing node is idle;
the subsystem for interpretation of productions and predicates;
the control-diagnostics operations queue manager;
the execution subsystem for control-diagnostics operations, and
the logging subsystem.
The agent and all base subsystems were implemented in the C++ programming language. Collection of information about the node state is implemented using the SIGAR [11] library. Control-diagnostics operations are implemented using a specialized language developed by the author. This language is a subset of ECMAScript [12], extended to support calls of external commands, output stream processing, and a number of other mechanisms that allow obtaining data about the state of non-standard software and hardware devices. The interpreter for productions and predicates is based on the ECMAScript interpreter as well. All specifications of the diagnostics tools are described in the JSON format.
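Since the specifications are described in JSON, a loading step might look as follows (the schema below is hypothetical, for illustration only; the actual format used by the tools is not given in the paper):

import json

SPEC = """
{
  "predicates":  [{"id": "c3", "characteristic": "z2",
                   "operator": ">", "const": 99}],
  "productions": [{"id": "p3", "if": "c3", "then": "f4"}]
}
"""

spec = json.loads(SPEC)
for rule in spec["productions"]:
    print("IF", rule["if"], "THEN", rule["then"])   # IF c3 THEN f4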
To store and access the collected data, a solution based on round-robin database principles was implemented. The developed solution showed higher performance for our tasks in comparison with the universal tools (RRDtool and MRTG [13]). For the storage of productions, predicates, the list of previously detected problems, and other information, the light-weight embeddable relational database SQLite [14] is used.
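The round-robin storage principle can be illustrated in a few lines (a toy fixed-capacity buffer, not the authors' storage engine; all names are ours):

from collections import deque

class RoundRobinStore:
    """Keeps a fixed number of the most recent samples per characteristic;
    old samples are overwritten, so storage usage never grows."""
    def __init__(self, slots=2880):            # e.g. 24 h of 30-second samples
        self.slots = slots
        self.series = {}

    def append(self, characteristic, timestamp, value):
        ring = self.series.setdefault(characteristic,
                                      deque(maxlen=self.slots))
        ring.append((timestamp, value))        # oldest sample drops off

store = RoundRobinStore()
store.append("z2", 1454227200, 97)             # used RAM, percent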
V. EXPERIMENTAL RESULTS

The developed methods and tools of diagnostics were successfully tested in the Supercomputer Center of ISDCT SB RAS [15]. The configuration of this heterogeneous HPC-system includes the following resources:
110 nodes with AMD Opteron 6276 processors (3520 cores in total);
20 nodes with Intel Xeon 5345 processors (160 cores in total);
2 nodes with Nvidia Tesla C1060 GPU accelerators (1920 cores in total).

During testing of the diagnostics tools, several software and hardware resources in critical condition or in an error state were detected:
[node011.mat.icc.ru]: RAM bank #4 failed;
[node-12.bf.icc.ru]: CPU temperature is above 87°C;
[node-04.bf.icc.ru]: SMART health status bad;
[node103.mat.icc.ru]: /store used 100%;
[node087.mat.icc.ru]: iface ib0 errors detected;
[tesla01.icc.ru]: CPU warnings in the system log file.

The automatic testing of node components, fault localization and troubleshooting, isolation of unreliable nodes, and HPC-system reconfiguration ensured the prevention of many computational process failures.

VI. CONCLUSION

The described approach makes it possible to control, diagnose, and troubleshoot software and hardware components of HPC-system nodes in a finite number of steps. It minimizes the run time of the diagnostics and troubleshooting processes by using mechanisms of parallel operation execution, and it also increases the fault tolerance of nodes, thereby increasing the fault tolerance of the PPs executed on these nodes. All these features make it possible to enhance the fault tolerance of the HPC-system as a whole.

ACKNOWLEDGMENT

The study was supported by the Russian Foundation for Basic Research, projects no. 15-29-07955-ofi_m and no. 16-07-00931. The author is grateful to A. G. Feoktistov for discussions and valuable comments.

REFERENCES

[1] F. Cappello, A. Geist, W. Gropp, S. Kale, B. Kramer, M. Snir, "Toward exascale resilience: 2014 update," Supercomputing Frontiers and Innovations, vol. 1, no. 1, pp. 5-28, April 2014.
[2] P. Kogge, J. Shalf, "Exascale computing trends: Adjusting to the 'new normal' for computer architecture," Computing in Science & Engineering, vol. 15, no. 6, pp. 16-26, November 2013.
[3] B. Mohr, "Scalable parallel performance measurement and analysis tools – state-of-the-art and future challenges," Supercomputing Frontiers and Innovations, vol. 1, no. 2, pp. 108-123, July 2014.
[4] M. Massie, B. Li, B. Nicholes, V. Vuksan, Monitoring with Ganglia: Tracking Dynamic Host and Application Metrics at Scale, O'Reilly Media, November 2012.
[5] A. Adinets, P. Bryzgalov, V. Voevodin, S. Zhumatii, D. Nikitenko, K. Stefanov, "Job Digest: an approach to dynamic analysis of job characteristics on supercomputers," Numerical Methods and Programming: Advanced Computing, vol. 13, no. 4, pp. 160-166, 2012 (in Russian).
[6] "HP OpenView," http://www.openview.hp.com/solutions/ams/ams_bb.pdf [online, accessed: 31-Jan-2016].
[7] K. Lu, X. Wang, G. Li, R. Wang, "Iaso: an autonomous fault-tolerant management system for supercomputers," Frontiers of Computer Science, vol. 6, no. 3, pp. 378-390, May 2014.
[8] A. Antonov, V. Voevodin, A. Daugel-Dauge, S. Zhumatii, D. Nokonenko, "Ensuring effective operational control and battery life of the MSU supercomputer," Bulletin of South Ural State University, vol. 4, no. 2, pp. 33-43, 2015 (in Russian).
[9] I. V. Bychkov, G. A. Oparin, A. P. Novopashin, I. A. Sidorov, "Agent-based approach to monitoring and control of distributed computing environment," Lecture Notes in Computer Science, vol. 9251, pp. 253-257, September 2015.
[10] I. A. Sidorov, A. P. Novopashin, G. A. Oparin, V. V. Skorov, "Methods and tools of meta-monitoring of distributed computing environments," Bulletin of South Ural State University, vol. 3, no. 2, pp. 30-42, 2014 (in Russian).
[11] "System Information Gatherer and Reporter API," https://github.com/hyperic/sigar [online, accessed: 31-Jan-2016].
[12] "Standard ECMA-262," http://www.ecma-international.org/publications/standards/Ecma-262.htm [online, accessed: 31-Jan-2016].
[13] H. Allen, P. Regnauld, "Network management workshop: MRTG/RRDtool," APNIC 29, Kuala Lumpur, March 2010.
[14] G. Allen, M. Owens, The Definitive Guide to SQLite, Apress, November 2010.
[15] "Irkutsk Supercomputer Center of SB RAS," http://hpc.icc.ru/ [online, accessed: 31-Jan-2016].
Logical-Probabilistic Analysis of Distributed
Computing Reliability
A.G. Feoktistov and I.A. Sidorov
Matrosov Institute for System Dynamics and Control Theory of SB RAS, Irkutsk, Russia
agf@icc.ru
Abstract - The aim of the study is to develop tools for increasing problem solving reliability in a heterogeneous distributed computing environment by applying diagnostics of computing resource components and analysis of problem solving schemes. A scheme (a plan) is an abstract program for solving a problem. Special attention is paid to the calculation of problem solving scheme reliability on the basis of a logical-probabilistic method. This method is based on transitioning from Boolean functions that describe the reliability of a problem solving scheme to probability functions that determine the indicators of such reliability. The reliability of a problem solving scheme is improved by resource reservation. The resource reservation applied in a problem solving scheme provides a reliability indicator that approximates the predetermined reliability criterion as closely as possible, taking into account limitations on the number of reserve resources. An example of a problem solving scheme and the calculation of its reliability is presented.
I. INTRODUCTION
Today, specialists in the field of cloud infrastructures and Grid-systems pay special attention to developing the fundamentals of distributed computing for solving large-scale scientific problems in various subject areas. The development of new methods and algorithms for computing management in a heterogeneous distributed computing environment (DCE), where nodes have a complex hybrid structure, is actively carried out. The heterogeneity of a DCE, the presence of hybrid components in its structure, the wide range of fundamental and applied problems, and the necessity of scalable computing motivate research related to improving the reliability of computational processes.
In the theory of computer system reliability, there are two traditional approaches to providing the reliability of computational processes [1]. The first approach attempts to restore a computational process after a fault of software or hardware components of a computer system. The second approach aims to preserve the operability (fault tolerance) of a computer system in the presence of faults of its software or hardware components.
The first approach is typically realized using checkpoint mechanisms. Unfortunately, these mechanisms do not allow a computational process to be correctly restored after a fault for some classes of parallel programs. In addition, the total time of checkpointing and restarting a computational process can be comparable to the time of solving the problem in a heterogeneous DCE [2].
The second approach is based on redundancy of hardware, processes, and data. Within this approach, there are problems with switching from a faulty component to a reserve component [3, 4].
A promising approach to improving computer system reliability [5, 6] is proactive detection and troubleshooting of current and potential hardware faults and the use of this knowledge for planning computational processes and allocating computing resources for the execution of these processes.
However, the implementation of this approach for a heterogeneous DCE raises some important problems, among them: the lack of universal models and methods for determining reliability indicators for a heterogeneous DCE; the lack of methods and tools for applying reliability indicators in traditional resource managers; and the insufficient automation of troubleshooting in traditional monitoring systems for DCEs.
In this paper, some aspects of an approach to ensuring the reliability of problem solving in a heterogeneous DCE are considered. This approach is developed for a DCE characterized by the following features:
Computer clusters act as nodes of the DCE. These clusters can consist of hardware and software components (computational elements) of various architectures and configurations for different parallel programming technologies. Clusters are organized based on both allocated and non-allocated computational elements and, hence, differ significantly in the reliability of their computing resources.
Different levels of the DCE have various categories of users working at them. Clusters are used both by local users of these clusters and by global users of the DCE.
To solve a problem, a user should form a job for the DCE. The job is a specification of the problem to be solved that includes information on the required computing resources, the executable applied programs, the input/output data, and other required information.
The specialized multi-agent system (MAS) for distributed computing management [7] is used for formulating problems, planning problem solving schemes, and forming the jobs of global users. Formed jobs are decomposed by the MAS into a set of subjobs for clusters. These subjobs and the jobs of local users are managed by traditional local resource managers at the cluster level. Agents of the MAS observe the subjob execution processes.
Clusters do not have enough free resources to process all jobs in their queues simultaneously.
The features of our approach are listed below.
High competition among users for the common resources of the DCE makes it necessary to take into account all characteristics of these resources during their allocation in order to achieve the required quality of job execution [8]. Usually, the time or cost of job execution, indicators of resource load balancing, and coefficients of resource efficiency or performance are used as the main efficiency criteria for distributed computing management. Today, besides the listed criteria, the reliability of job execution in the DCE is a criterion of practical importance. Increasing distributed computing reliability helps guarantee the fulfillment of the other job execution criteria to a greater degree.
The calculation of problem solving scheme reliability is carried out on the basis of a logical-probabilistic method [9]. This method provides a transition from Boolean functions describing the reliability of the system under study to probability functions determining the indicators of such reliability.
Reservation is one of the simplest and most effective methods of improving computer system reliability [10]. In our case, the reservation consists in using additional (reserve) nodes. These additional nodes are expected to assume the functions of faulty nodes during the execution of a problem solving scheme. Additional nodes form a loaded reserve; these reserve nodes are also used as main nodes. A loaded reserve is used because other forms of reservation [11], such as an unloaded reserve, a hot reserve, and multi-version job realization, lead to significant overhead and are more suitable for real-time systems. The reservation is carried out for single elements (nodes), and a large number of nodes may be required for such reservation. Hence, this paper addresses the problem of forming a set of reserve nodes that provides a reliability indicator approximating the predetermined reliability criterion as closely as possible, taking into account limitations on the number of reserve resources.
Data about node reliability are provided by the authors' meta-monitoring system considered below.
Successful results of using the logical-probabilistic method for studying distributed systems are known [12, 13]. A practical comparison of similar methods is discussed in [14].
II. META-MONITORING SYSTEM
In the considered DCE, the meta-monitoring system is used to test the DCE nodes, to collect data about their current state, to detect node faults, and to diagnose and partially repair these faults. In this system, the processes of collecting data, monitoring nodes, and detecting and diagnosing faults are based on multi-agent technologies and unique methods for decentralized processing and distributed storage of data. Unlike known systems, the meta-monitoring system provides:
an ability to obtain data from the most popular high-performance monitoring systems (Ganglia, Nagios, Zabbix, etc.);
a wide range of traditional and original functions for obtaining and collecting data about hardware and software components of the DCE nodes;
a high-level toolkit for implementing the original functions for obtaining and collecting data as modules in various programming languages;
specialized tools for collecting and analyzing data of the engineering infrastructure of computer clusters and data centers;
auxiliary tools for the unification and aggregation of data received from different sources;
new intelligent tools for automated expert analysis of data and generation of control actions for changing node states;
new service tools for periodically testing nodes and for detecting, diagnosing, and partially repairing node faults;
a special application programming interface based on open standards that provides external software systems with access to the monitoring data.
Data about faults are accumulated in both automatic and manual modes. The registration of faults in automatic mode is performed by means of the meta-monitoring tools listed above. Registration of faults in manual mode allows the DCE administrator to describe hardware and software faults that were not recognized in automatic mode.
The basic information about a fault includes:
the object (computing node, storage system, facility of the engineering infrastructure of a computer cluster, etc.) containing the faulty component;
the faulty component (hard drive, RAM bank, network adapter, etc.);
the degree of the component fault impact: without consequences for the object, a fault of a computational process in the object, or a critical fault and, as a result, non-operability of the object;
the time of fault detection;
the time of troubleshooting;
the result of troubleshooting: complete correction of the fault; partial correction of the fault and further use of the object in the DCE; or exclusion of the object from the DCE configuration and reconfiguration of the DCE.
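A fault record carrying the listed attributes could be sketched as follows (field names and value vocabularies are illustrative, not taken from the system itself):

from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class FaultRecord:
    obj: str                      # computing node, storage system, ...
    component: str                # hard drive, RAM bank, network adapter, ...
    impact: str                   # "none" | "process fault" | "critical"
    detected_at: datetime
    fixed_at: Optional[datetime]  # None while troubleshooting is in progress
    result: Optional[str]         # "complete" | "partial" | "excluded"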
The estimation of reliability indicators of DCE objects is based on the accumulated data about object component faults.
The meta-monitoring system mentioned above is described in detail in [15].
III. LOGICAL-PROBABILISTIC ANALYSIS
A. Conceptual Model
Denote by the symbols F, Z, N, and A the sets of program modules, module parameters, computing nodes, and agents, respectively. The relations between these sets are denoted by the symbols $R_{in}$, $R_{out}$, $R_{af}$, and $R_{an}$, such that $R_{in} \subseteq Z \times F$, $R_{out} \subseteq Z \times F$, $R_{af} \subseteq A \times F$, and $R_{an} \subseteq A \times N$. Then a conceptual model of the DCE can be described by the structure

$M = \langle F, Z, N, A, R_{in}, R_{out}, R_{af}, R_{an} \rangle$.

The relations $R_{in}$ and $R_{out}$ define, respectively, the sets of input and output parameters of modules and thus determine the information-logical connections between modules. The relation $R_{af}$ determines the connections between agents and the modules that can be performed by these agents. The relations $R_{in}$, $R_{out}$, and $R_{af}$ are of the "many-to-many" type. The relation $R_{an}$ determines the agents' nodes and is of the "one-to-many" type. Nodes of various clusters have different degrees of reliability. Computer clusters are represented by agents in the management system of the DCE.
The user's request for computational services of the DCE is defined as a problem in nonprocedural form for the management system: "calculate the values of the parameters of the subset $Z_{out} \subseteq Z$ knowing the values of the parameters of the subset $Z_{in} \subseteq Z$". In general, a set of schemes for solving this problem can exist in a model of the DCE. Each of these schemes determines which modules should be executed and in what order. Denote by S the set of schemes.
It is required to form the set S, to choose a single scheme $s \in S$, to determine the agents that will execute the modules of the scheme s, and to allocate the nodes for the execution of these modules. The indicators of the scheme performance must satisfy such problem solving criteria as time or cost. An additional criterion is the scheme reliability: the probability $p^*(t)$ of the scheme performance at the time moment t. All these criteria are set by the user. The degree to which the scheme performance indicators satisfy the user's criteria defines the level of quality of service in the DCE. In this formulation, the problem of computation planning and resource allocation is NP (Non-deterministic Polynomial)-hard [16].
In order to maintain generality of further reasoning, two fictitious modules $f_1$ and $f_2$ are introduced into the set F. The initial module $f_1$ has the empty set of input parameters and the subset $Z_{in} \subseteq Z$ as the set of output parameters; this module defines the initial data of the problem. The target module $f_2$ has the subset $Z_{out} \subseteq Z$ as the set of input parameters and the empty set of output parameters; this module defines the target parameters of the problem.
The set S can be represented by the bipartite directed acyclic graph $G = \langle V, U \rangle$. The set V includes the subsets $V_Z$ and $V_F$ of vertices corresponding to the parameters from the set Z and the modules from the set F. Denote by $n_f$ the number of modules included in the scheme s. Assume that the information-logical connections between the modules of the scheme s are described by the Boolean matrix W of dimensions $n_f \times n_f$. The matrix element $w_{i,k} = 1$ means that the module $f_i$ depends on the module $f_k$.
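For illustration, the matrix W and a dependency check can be written down directly (a sketch with an arbitrary four-module scheme; the 0-based numbering convention is ours):

# Boolean dependency matrix W for a scheme with n_f = 4 modules,
# indexed 0..3 for f1..f4: w[i][k] = 1 means module f_{i+1} depends
# on module f_{k+1}.
n_f = 4
w = [[0, 0, 0, 0],   # f1 (initial) depends on nothing
     [0, 0, 0, 1],   # f2 (target) depends on f4
     [1, 0, 0, 0],   # f3 depends on f1
     [0, 0, 1, 0]]   # f4 depends on f3

def prerequisites(i):
    """Indices of the modules that must be executed before module i."""
    return [k for k in range(n_f) if w[i][k] == 1]

print(prerequisites(1))   # [3] -> the target module waits for f4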
B. Multi-agent System
A hierarchical structure of the MAS can include two or more layers of functioning agents. Agents can play different roles and perform different functions at their levels of the MAS hierarchy. The agents' roles can be constant or temporary, the latter arising at discrete time moments due to the need to organize collective interaction. The hierarchy levels of agents differ in the amount of their knowledge: an agent of a higher hierarchy level has a larger amount of knowledge in comparison with agents of a lower hierarchy level.
A subsystem of the MAS for distributed computing management includes the agents for computation planning and resource allocation. These agents are designed for creating problem solving schemes and allocating resources for scheme performance. Any agents may be united into a virtual community of agents (VCA). In such a VCA, the agents interact with each other on the basis of competition or cooperation. The multi-agent algorithm of computation planning and resource allocation is considered in detail in [17]. This algorithm provides a scheme s satisfying the time or cost criteria defined by the user. The data from the meta-monitoring system are used in the MAS.
Let us now consider the new aspects of multi-agent management related to the reliability of the scheme s.
C. Logical-Probabilistic Model
Denote by $n_a$ the number of agents in some VCA that are expected to participate in the performance of the scheme s. Denote by x the set of Boolean variables $x_{i,j,k}$ representing the events of execution ($x_{i,j,k} = 1$) or non-execution ($x_{i,j,k} = 0$) of the module $f_i$ in the k-th node allocated by the agent $a_j$, $i = \overline{1, n_f}$, $j = \overline{1, n_a}$. The index k of the variable $x_{i,j,k}$ distinguishes the main ($k = 1$) and reserve ($k \ge 2$) nodes of the agent $a_j$ allocated for the execution of the module $f_i$. Each agent $a_j$ can allocate $c_j$ reserve nodes in the performance of the scheme s. Assume that all nodes of the same agent are homogeneous. The possibility of running modules in the main or reserve nodes gives rise to various scenarios of the scheme s performance.

Let the Boolean function $y_i(x)$ determine the conditions of the execution of the module $f_i$ in the scheme s performance, and let the function $p_{i,j,k}(t)$ denote the probability of executing the module $f_i$ in the k-th node of the agent $a_j$. The recurrent formulas (1)-(3) describe the logical circuit of the execution reliability of the problem solving process:

$y_1(x) = 1$, (1)

$y_i(x) = \begin{cases} h(x), & \text{if } i = 2, \\ x_{i,j_i,1} \wedge h(x), & \text{if } i \ge 3, \end{cases}$ (2)

$h(x) = \bigwedge_{k:\, w_{i,k} = 1} y_k(x)$, (3)

where $j_i \in \overline{1, n_a}$. Initially, the agent $a_{j_i}$ allocates the main ($k = 1$) node for the module $f_i$.

The Boolean function $y_2(x)$ corresponding to the target module $f_2$ determines the scheme s reliability indicator. After all substitutions in (1)-(3), the function $y_2(x)$ takes the form

$y_2(x) = \bigwedge_{i=3}^{n_f} x_{i,j_i,1}$. (4)

A transition [18] from the Boolean function $y_2(x)$ to the probability function $P(t)$ is implemented through the rules in Table I. The probability function $P(t)$ has the following form:

$P(t) = \prod_{i=3}^{n_f} p_{i,j_i,1}(t)$.

TABLE I. RULES OF TRANSITION FROM BOOLEAN FUNCTIONS TO PROBABILITY FUNCTIONS

Boolean function element | Probability function element
$x_{i,j,k}$ | $p_{i,j,k}(t)$
$\bar{x}_{i,j,k}$ | $1 - p_{i,j,k}(t)$

The function $P(t)$ calculates the probability of a single scenario of the scheme s performance. If $P(t) < p^*(t)$, then the function $y_2(x)$ is transformed by improving the probability indicators of the elements of the function structure. The function $y_2(x)$ determined in (4) is represented in disjunctive normal form and is therefore monotonic. This property guarantees the preservation or improvement of the scheme s probability when the probability indicators of the elements of the function structure are improved.

Denote by J the set of indexes of the agents with $c_j > 0$: $J = \{\, j : c_j > 0,\; j = \overline{1, n_a} \,\}$. The algorithm for transforming the function $y_2(x)$, which corresponds to the scheme s without node reservation, into the function $y'_2(x)$, which corresponds to the scheme s with node reservation, includes the following stages.

1. If the condition $\sum_{j=1}^{n_a} c_j = 0$ is fulfilled, the transformation process is finished. Otherwise, the index of the module with the minimum probability of execution is determined:

$k = \arg\min_{i = \overline{3, n_f},\, j_i \in J} \left( 1 - \prod_{l=1}^{e} (1 - p_{i,j_i,l}) \right)$,

where the number of nodes allocated by the agent $a_{j_i}$ for the module $f_i$ is denoted by $n_{i,j_i}$ and $e = n_{i,j_i}$.

2. The number $c_{j_k}$ is reduced by one.

3. The reservation of an additional node by the agent $a_{j_k}$ for the module $f_k$ is carried out by replacing the element $x_{k,j_k,e}$ of the transformable formula of the function $y_2(x)$ with the expression $x_{k,j_k,e} \vee x_{k,j_k,e+1}$, where the number of nodes allocated by the agent $a_{j_k}$ for the module $f_k$ is denoted by $n_{k,j_k}$ and $e = n_{k,j_k}$. The number $n_{k,j_k}$ is increased by one.

4. The function $y'_2(x)$ obtained in the transformation process of the function $y_2(x)$ is represented in the following form:

$y'_2(x) = \bigvee_{l=1}^{n} K_l$, $\quad K_l = \bigwedge_{i=3}^{n_f} x_{i,j_i,e}$, $\quad n = \prod_{i=3}^{n_f} \sum_{j=1}^{n_a} n_{i,j}$, (5)

where $e = k_{j_i}$, $k_{j_i} = \overline{1, n_{i,j_i}}$. Each elementary conjunction in (5) represents one of the possible scenarios of the scheme s performance. All elementary conjunctions are numbered from 1 to n in ascending order of their rank. The orthogonalization of the function $y'_2(x)$ is carried out on the basis of the algorithm offered in [18]. Such orthogonalization of the function $y'_2(x)$ provides the incompatibility of the possible scenarios of the scheme s performance. The orthogonal function $\tilde{y}'_2(x)$ has the following form:

$\tilde{y}'_2(x) = K_1 \vee \bar{K}_1 K_2 \vee \ldots \vee \bar{K}_1 \bar{K}_2 \ldots \bar{K}_{n-1} K_n$.

5. The function $\tilde{y}'_2(x)$ is simplified by removing the conjunctions that are identically zero and the redundant conjunctions.

6. A transition from the Boolean function $\tilde{y}'_2(x)$ to the probability function $P'(t)$ is implemented through the rules in Table I. The probability function $P'(t)$ has the following form:

$P'(t) = \sum_{i \in I} p'_i(t)$,

where the probability of the i-th scenario of the scheme s performance is denoted by $p'_i(t)$. The scheme s reliability is calculated using the function $P'(t)$.

7. If $P'(t) < p^*(t)$, then a transition to the first stage is carried out. Otherwise, the transformation process is finished.

The node reservation process considered above provides a scheme reliability indicator that is as close as possible to the given criterion of scheme reliability, taking into account the limitations on the number of nodes allocated by each agent. These limitations ensure the convergence of the node reservation process.
IV. SCHEME RELIABILITY CALCULATION EXAMPLE

Let the set S (Fig. 1) include the two schemes $s_1: f_1, f_3 \parallel f_4, f_6, f_2$ and $s_2: f_1, f_3 \parallel f_5, f_6, f_2$ for the problem "calculate the value of the parameter $z_4$ knowing the value of the parameter $z_1$".

Figure 1. Bipartite directed acyclic graph for the set S

Constructions such as "$f_i, f_j$" and "$f_i \parallel f_j$" mean, respectively, that the modules $f_i$ and $f_j$ are executed strictly sequentially or may be executed in parallel. The scheme reliability criterion is defined as follows: $p^*(t) = 0.995$. All Boolean functions $y_i(x)$ for the problem solving schemes are shown in Table II.

TABLE II. BOOLEAN FUNCTIONS $y_i(x)$ FOR THE PROBLEM SOLVING SCHEMES

Scheme $s_1$:
$y_1(x) = 1$,
$y_3(x) = y_1(x)\, x_{3,j_3,1}$,
$y_4(x) = y_1(x)\, x_{4,j_4,1}$,
$y_6(x) = y_3(x)\, y_4(x)\, x_{6,j_6,1}$,
$y_2(x) = y_6(x) = x_{3,j_3,1}\, x_{4,j_4,1}\, x_{6,j_6,1}$.

Scheme $s_2$:
$y_1(x) = 1$,
$y_3(x) = y_1(x)\, x_{3,j_3,1}$,
$y_5(x) = y_1(x)\, x_{5,j_5,1}$,
$y_6(x) = y_3(x)\, y_5(x)\, x_{6,j_6,1}$,
$y_2(x) = y_6(x) = x_{3,j_3,1}\, x_{5,j_5,1}\, x_{6,j_6,1}$.

Let the set $A = \{a_1, a_2, a_3, a_4, a_5\}$ include 5 agents whose node reliabilities are, respectively, 0.999, 0.999, 0.9999, 0.9999, and 0.99. The distribution of the modules among the agents is shown in Table III. The performance probabilities of the schemes $s_1$ and $s_2$ without node reservation are, respectively, 0.9988 and 0.9898. Thus, the reservation of an additional node in the scheme $s_2$ is necessary.

TABLE III. MODULES DISTRIBUTION BY AGENTS

Scheme | Module | Agent 1 | Agent 2 | Agent 3 | Agent 4 | Agent 5
$s_1$ | $f_3$ | – | – | – | + | –
$s_1$ | $f_4$ | – | + | – | – | –
$s_1$ | $f_6$ | – | – | + | – | –
$s_2$ | $f_3$ | – | – | – | + | –
$s_2$ | $f_5$ | – | – | – | – | +
$s_2$ | $f_6$ | – | – | + | – | –

After the distribution of the modules among the agents, the function $y_2(x)$ corresponding to the scheme $s_2$ has the following form:

$y_2(x) = x_{3,4,1}\, x_{5,5,1}\, x_{6,3,1}$.

The reservation is implemented by the agent $a_5$, because the resources of this agent provide the minimum probability of execution of the module $f_5$ in comparison with the probabilities of execution of the modules $f_3$ and $f_6$. After the reservation, the functions $y'_2(x)$ and $\tilde{y}'_2(x)$ have the forms:

$y'_2(x) = x_{3,4,1}\, x_{5,5,1}\, x_{6,3,1} \vee x_{3,4,1}\, x_{5,5,2}\, x_{6,3,1}$,

$\tilde{y}'_2(x) = x_{3,4,1}\, x_{5,5,1}\, x_{6,3,1} \vee x_{3,4,1}\, x_{5,5,2}\, x_{6,3,1}\, \bar{x}_{5,5,1}$.

A transition from the Boolean function $\tilde{y}'_2(x)$ to the probability function $P'(t)$ and the calculation of the scheme $s_2$ probability are then implemented:

$P'(t) = p_{3,4,1}\, p_{5,5,1}\, p_{6,3,1} + p_{3,4,1}\, p_{5,5,2}\, p_{6,3,1}\, (1 - p_{5,5,1}) \approx 0.9997$.

Since $P'(t) \ge p^*(t)$, the transformation process is finished.
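The numbers of this example are easy to verify (a check in Python using the node reliabilities from Table III; the orthogonalized scenarios are mutually exclusive, so their probabilities simply add up):

# Node reliabilities of the agents a1..a5.
p = {1: 0.999, 2: 0.999, 3: 0.9999, 4: 0.9999, 5: 0.99}

# Schemes without reservation: y2 is a plain conjunction (a series system).
s1 = p[4] * p[2] * p[3]            # x_{3,4,1} x_{4,2,1} x_{6,3,1}
s2 = p[4] * p[5] * p[3]            # x_{3,4,1} x_{5,5,1} x_{6,3,1}
print(round(s1, 4), round(s2, 4))  # 0.9988 0.9898 -> s2 needs a reserve

# s2 with one reserve node of agent a5 for f5; orthogonal scenarios:
#   K1          = x_{3,4,1} x_{5,5,1} x_{6,3,1}
#   (not K1) K2 = x_{3,4,1} x_{5,5,2} x_{6,3,1} (1 - p_{5,5,1})
P = p[4] * p[5] * p[3] + p[4] * p[5] * p[3] * (1 - p[5])
print(round(P, 4))                 # 0.9997 >= p*(t) = 0.995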
V. CONCLUSION
The results of the study considered above suggest the following conclusions.
Until now, when a logical-probabilistic analysis is carried out, the problem of constructing the Boolean functions that describe the reliability logic circuit of the studied system has arisen. In this paper, new universal recurrent formulas for constructing such functions on the basis of a conceptual model of a heterogeneous DCE are obtained.
The use of the developed models and algorithm for logical-probabilistic analysis of distributed computing reliability in the multi-agent management system improves the quality of service for users' jobs in the DCE.
The integrated application of the diagnostic methods and tools for computing nodes, together with the tools for analyzing problem solving processes in these nodes at the scheme level, provides a substantial increase in the reliability of the DCE as a whole.
ACKNOWLEDGMENT
The study was supported by the Russian Foundation for Basic Research, projects no. 15-29-07955-ofi_m and no. 16-07-00931. The authors are grateful to G. A. Oparin for discussions and valuable comments.
REFERENCES

[1] D. P. Siewiorek, R. S. Swarz, Reliable Computer Systems: Design and Evaluation, Natick, MA: CRC Press, 1998.
[2] F. Cappello, A. Geist, W. Gropp, S. Kale, B. Kramer, M. Snir, "Toward exascale resilience: 2014 update," Supercomputing Frontiers and Innovations, vol. 1, no. 1, pp. 5-28, April 2014.
[3] E. Elmroth, J. Tordsson, "A standards-based Grid resource brokering service supporting advance reservations, coallocation, and cross-Grid interoperability," Concurrency and Computation: Practice & Experience, vol. 21, no. 18, pp. 2298-2335, June 2008.
[4] E. Bauer, R. Adams, Reliability and Availability of Cloud Computing, Wiley-IEEE Press, September 2012.
[5] C. Engelmann, G. Vallee, T. Naughton, "Proactive fault tolerance using preemptive migration," 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing, pp. 252-257, February 2009.
[6] G. Da Costa, T. Fahringer, J. Rico-Gallego, I. Grasso, A. Hristov, H. Karatza, A. Lastovetsky, F. Marozzo, D. Petcu, G. Stavrinides, D. Talia, P. Trunfio, H. Astsatryan, "Exascale machines require new programming paradigms and runtimes," Supercomputing Frontiers and Innovations, vol. 2, no. 2, pp. 6-27, September 2015.
[7] I. V. Bychkov, G. A. Oparin, A. G. Feoktistov, V. G. Bogdanova, A. A. Pashinin, "Service-oriented multiagent control of distributed computations," Automation and Remote Control, vol. 76, no. 11, pp. 2000-2010, November 2015.
[8] D. A. Menasce, E. Casalicchio, "QoS in Grid computing," IEEE Internet Computing, vol. 8, no. 4, pp. 85-87, July 2004.
[9] I. A. Ryabinin, "Logical-probabilistic calculus: A tool for studying the reliability and safety of structurally complex systems," Automation and Remote Control, vol. 64, no. 7, pp. 1177-1185, July 2003.
[10] J. Li, "A model of resource reservation in Grid," International Conference on Environmental Science and Information Application Technology, Wuhan, China: IEEE CS Publ., pp. 199-202, July 2009.
[11] D. A. Zorin, V. A. Kostenko, "Algorithm for synthesis of real-time systems under reliability constraints," Journal of Computer and Systems Sciences International, vol. 51, no. 3, pp. 410-417, May 2012.
[12] S. Rai, M. Veeraraghavan, K. Trivedi, "A survey of efficient reliability computation using disjoint products approach," Networks, vol. 25, pp. 147-165, May 1995.
[13] J. Xing, C. Feng, X. Qian, P. Dai, "A simple algorithm for sum of disjoint products," Reliability and Maintainability Symposium, Reno: IEEE CS Publ., pp. 1-5, January 2012.
[14] A. Rauzy, E. Chatelet, Y. Dutuit, C. Berenguer, "A practical comparison of methods to assess sum-of-products," Reliability Engineering & System Safety, vol. 79, no. 1, pp. 33-42, January 2003.
[15] I. V. Bychkov, G. A. Oparin, A. P. Novopashin, I. A. Sidorov, "Agent-based approach to monitoring and control of distributed computing environment," Lecture Notes in Computer Science, vol. 9251, pp. 253-257, September 2015.
[16] M. R. Garey, D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, New York: W. H. Freeman and Company, 1979.
[17] V. G. Bogdanova, I. V. Bychkov, A. S. Korsukov, G. A. Oparin, A. G. Feoktistov, "Multiagent approach to controlling distributed computing in a cluster Grid system," Journal of Computer and Systems Sciences International, vol. 53, no. 5, pp. 713-722, September 2014.
[18] A. A. Pospelov, Logical Methods of Circuit Analysis and Synthesis, Moscow-Leningrad: Energiya, 1964 (in Russian).
Distributed Graph Reduction Algorithm with
Parallel Rigidity Maintenance
Diego Sušanj* and Damir Arbula*
*Faculty of Engineering, Department of Computer Engineering, Rijeka, Croatia
dsusanj@riteh.hr, damir.arbula@riteh.hr
Abstract - Precise localization in wireless sensor networks depends on distributed algorithms running on a large number of wireless nodes with limited energy, processing, and memory resources. A prerequisite for the estimation of unique node locations is a rigid network graph. To satisfy this prerequisite, the network graph needs to be well connected. On the other hand, execution of a distributed algorithm in graphs with a large number of edges can have a significant impact on the scarce resources of wireless nodes. In order to reduce the number of edges in the network graph, a novel distributed algorithm for network reduction is proposed. The main objective of the proposed algorithm is to remove as many edges as possible while maintaining the graph rigidity property. In this paper, a special case of graph rigidity, namely parallel rigidity, is considered.
I. INTRODUCTION
In today's era of rapid technological development and human progress, the concept of ubiquitous computing has become a reality in the form of computers, mobile phones, home appliances, and other so-called smart devices. Directly driven by such progress is the need for simple and fast connection of all those devices without relying on any existing fixed infrastructure.
Wireless ad hoc networks were created in response to the above-mentioned needs. They are self-configuring, decentralized dynamic networks in which devices, or nodes, can move freely without depending on existing fixed infrastructure. In other words, the devices are both the users and the infrastructure of these networks.
A. Wireless sensor networks
A certain type of ad hoc wireless network, consisting of spatially distributed autonomous sensors used to monitor various environmental conditions such as temperature, pressure, and illumination, is called a wireless sensor network. These networks are made up of several to hundreds of interconnected nodes that cooperatively relay information through the network to input/output gateway nodes.
The size and price of these nodes directly depend on constraints such as the type of power source, the amount of memory, processing power, and communication bandwidth. The ultimate goal is to have nodes that are as independent as possible, which implies low or no maintenance and no reliance on existing infrastructure, most notably in terms of power.
Nodes are usually equipped with batteries, and optionally they have some means of energy harvesting, i.e., of obtaining energy from the environment. Consequently, it is essential to pay attention to energy consumption both at the hardware and at the software level. There are two basic rules: (1) the energy needed to send a unit of data is far greater than the energy needed to process the same amount of data, and (2) sending messages in smaller steps with lower transmission power is more energy efficient than sending them in a single step with greater transmission power [1].
B. Problem definition
The motivation for this work primarily stems from the scalability issues of wireless sensor networks, more precisely from excessive network density. Problems arise when there is a need for more advanced calculations, such as localization of a dense network on low-end nodes; because of that, the successful execution of the algorithm is limited by the maximum number of neighbors of each node [2].
In this sense, an adequate selection of neighbors enables the implementation of distributed algorithms in dense wireless networks without losing the network rigidity property, thus preserving the information required for the estimation of unique node locations.
II. LOCALIZATION
Location service is one of the basic services of many emerging computing and networking paradigms [5]. In the case of wireless sensor networks, sensor nodes need to know their locations in order to put detected and recorded events into proper spatial context or, for example, to enable more informed and thus more efficient routing.
Sensors are usually deployed without any previous knowledge of their absolute or relative location, and there is no infrastructure that would enable them to localize themselves after deployment. One method of informing a node of its location is manual measurement and location calculation. Such a procedure is not feasible for larger networks or in the case of deployment in an inaccessible environment.
Another method involves the use of GPS, the global positioning system. To be able to localize a node using GPS, it is necessary to add a GPS receiver, which significantly increases hardware costs. Its size and energy consumption are also oftentimes prohibitive, as well as the requirement of being within range of satellite signals, i.e., outdoors.
A. Location estimation
Localization methods are classified by measurement modality; the two most popular modalities are distance and azimuth between neighboring nodes. In this paper, only methods based on azimuth measurements are considered.
Azimuth measurements between neighboring nodes are usually performed using a few different techniques. One is based on using the known radiation pattern of one or more directional antennas and measuring the received signal strength; by comparing the intensity of the signal received on multiple antennas, the angle of arrival of the signal can be determined [3].
Another approach is based on measuring the phase differences of radio signals using an array of omnidirectional antennas. There is also a similar technique that uses ultrasound signals [4].
B. Anchored and anchor-free localization
The location information can be defined in the absolute coordinate system of the environment or in a relative coordinate system, one that is agreed upon between the nodes themselves. Although the actual absolute locations in the latter case are not known, the network topology matches and, by using translation, rotation, and scaling transformations, the relatively positioned network graph can be aligned with its absolute counterpart.
The basic premise of anchored localization is having a set of nodes (anchors) that know their actual location in the environment, e.g., obtained by GPS or manually. Using at least three (in the two-dimensional space $\mathbb{R}^2$) or four (in the case of $\mathbb{R}^3$) non-collinear anchors, it is possible to localize the entire network.
In the case of anchor-free localization, there is no information about the actual location of any node in the network. The goal of this localization is to construct a new relative coordinate system, usually pinned by placing one node at the origin and one of its neighbors at the coordinate (1, 0). In this paper, only the two-dimensional case is considered.
III. GRAPH RIGIDITY
The basic definitions of graph theory and rigidity are given in [9]; they form the basis of the modern theory of rigidity and localization of wireless sensor networks that is developed further in [5], [6], and [7].
The wireless network topology is usually described using a graph, but to be able to include node locations, it is necessary to augment this description with the positions of the graph vertices. This concept is called a point formation (or framework), and it represents one of the possible spatial realizations of the network graph.
Congruent formations are those that have equal distances between all vertices, while equivalent formations have equal distances between connected vertices. A formation is defined as globally rigid if, and only if, all equivalent formations are also congruent. The key property of networks with globally rigid formations is that it is possible to estimate the unique locations of all nodes.
Figure 1. Generic (a) and non-generic (b) formation

Generic formations are point formations in which the coordinates of the vertices are algebraically independent over the set of rational numbers. If a formation is not generic, there is a possibility that a set of neighboring nodes is part of the same $(d-1)$-dimensional space, which results in the existence of equivalent but not congruent formations.
Fig. 1 shows two different formations of the same graph with vertex locations that are: (a) generic, and (b) non-generic, because the vertices are collinear and it is possible to find an equivalent non-congruent formation.
The problem of determining whether an individual formation is globally rigid is NP-hard, while determining whether a formation is generic is easier. In addition, an important feature, and a reason why graph rigidity theory is so thoroughly studied, is that the global rigidity of a generic formation depends only on the network graph.
A. Parallel rigidity
Equation (1) defines the mapping of measured azimuths between nodes to a set of values which represent the global azimuths θ_ij under which node i sees node j (2). Two point formations p and q are parallel if their corresponding graphs and azimuth functions are equal for all vertices i and j, provided i ≠ j:

θ : L → [0, 2π)    (1)
θ(i, j) = θ_ij    (2)
An example of two parallel formations is shown in Fig. 2. Subfigure (a) shows the basic formation, while in (b) one can see a translation of that formation. Subfigures (c) and (d) show the preservation of azimuth measurements, i.e. angles between edges of the graph, while scaling the network. Subfigure (e) shows a non-congruent realization of the formation in (a) while maintaining the constraints set by the measured azimuths. A formation which has more than one non-congruent solution under constraints based on azimuth measurements is called flexible, while one that has only one congruent solution is called parallel rigid. One such parallel rigid formation is given in subfigure (f) [5], [9].
Figure 2. Parallel formations

IV. PROPOSED METHOD

In this paper we propose a method for edge removal based on a modified depth-first search.

The algorithm starts in the root node. Distributed election of the root node is a specific problem that was not the focus of the proposed method, so the node with the lowest ID was always selected as the root node.

The root node sends a message to all of its neighbors requesting that they respond with the number of their neighbors. This number is defined as the node weight. After the root receives the weights from all of its neighbors, a special message called a token is sent to the neighbor with the lowest weight. If several nodes have the same weight, the token is sent to the one with the lowest ID. A node receiving the token starts the same process as the root and requests the weights of all of its neighbors.

Every node, after receiving the token, records the sender's ID on a stack. If a node has no more neighboring nodes from which it has not received the token, it returns the token to the node on top of the stack. When a node receives the token from three different neighbors it is marked as rigid, but continues to process information in the same way as before.

The algorithm terminates globally when the root node receives a returned token and has no neighbors left to forward it to. At that moment, all nodes should be flagged as rigid, with a newly defined set of traversed edges L_q which is a subset of the initial set L_p.

Fig. 3 shows two examples of network graphs before (on the left) and after (on the right) execution of the edge removal algorithm. Network graphs after execution of the algorithm have far fewer edges in the dense parts of the graph.
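The published description leaves some distributed details open (e.g. exactly when forwarding the token to a neighbor is suppressed), so the following centralized Python simulation is only a sketch of the token-forwarding skeleton under one plausible reading, namely that the token is never forwarded to a neighbor already marked as rigid. It is not the authors' Pymote implementation.

def edge_removal(graph):
    """graph: dict node_id -> set of neighbor ids (undirected).
    Returns L_q, the set of edges traversed by the token."""
    root = min(graph)                                 # lowest ID acts as root
    weight = {v: len(graph[v]) for v in graph}        # weight = number of neighbors
    done = {v: set() for v in graph}                  # neighbors already exchanged with
    received = {v: 0 for v in graph}                  # tokens received from distinct neighbors
    rigid = set()
    traversed = set()                                 # L_q
    stack, current = [], root
    while True:
        # forward to the non-rigid neighbor with the lowest (weight, ID)
        candidates = [n for n in graph[current]
                      if n not in done[current] and n not in rigid]
        if candidates:
            nxt = min(candidates, key=lambda n: (weight[n], n))
            done[current].add(nxt)
            done[nxt].add(current)
            traversed.add(frozenset((current, nxt)))
            received[nxt] += 1
            if received[nxt] >= 3:                    # token from three neighbors
                rigid.add(nxt)
            stack.append(current)                     # sender ID goes on the stack
            current = nxt
        elif stack:
            current = stack.pop()                     # return token to last sender
        else:
            return traversed                          # token back at root: terminate

Whether returned tokens count toward the three-neighbor rigidity mark is not specified in the text; the sketch counts only forwarded deliveries.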
V. RESULTS
Testing of the algorithm was done in three phases. The first two phases were performed using the Pymote simulator [8]. The first phase included automated testing on two sets of networks. The first set was made of networks with specific topologies on which this, and other algorithms, did not perform well. The second set was based on randomly generated networks with between 8 and 128 nodes. In this phase the proposed algorithm was tested on 8000 networks.
In the second phase, benchmark testing was performed in order to compare several versions of the proposed algorithm. The benchmark set was made up of 100 networks with between 8 and 16 nodes. The comparison of the algorithms is based on two parameters: the maximum node weight after algorithm execution compared to the value before execution, and the total number of remaining edges.
In the third phase, five network topologies selected from the benchmark set were implemented and tested on real nodes in laboratory conditions. The proposed algorithm passed all three phases, always producing rigid networks.
Figure 3. Network graphs before and after edge removal algorithm
Figure 4. Maximum number of neighbours (weight) before and after edge removal algorithm for benchmark set networks
Figure 5. Number of edges before and after edge removal algorithm for benchmark set networks
The graph in Fig. 4 shows the maximum node weight per network before and after execution of the algorithm, against the number of nodes in the network. The algorithm was executed on the 100 networks from the benchmark set with random topologies. An important observation is that the maximum weight after the edge removal algorithm (solid line) does not increase proportionally with the number of nodes in the network. Another measure of the algorithm's success is the number of remaining edges after execution of the edge removal algorithm, as shown in Fig. 5. The number of removed edges increases with increasing density of the network.
VI. CONCLUSION
Wireless sensor networks consist of a few dozen up to several thousand spatially distributed autonomous sensors which are used to monitor various environmental conditions. In these networks location information is required for many services, e.g. to give measurements their proper spatial context or to localize recorded events. Localization methods have their limitations. One of them is the need for higher computing resources and the higher energy demand on low-end nodes. The energy demand is notable in dense networks because of the higher communication complexity. A solution to that problem is reducing the density, e.g. by creating a communication tree or a reduced communication graph.
On the other hand, a requirement for estimating a unique location for all nodes is the global rigidity of the network formation.
To alleviate the communication complexity while retaining the rigidity of the network graph, a distributed algorithm was designed and implemented. This edge removal algorithm was tested using both the Pymote simulator and real wireless sensor nodes. The results obtained in the simulator and in the implementation on real nodes in the laboratory have shown a significant reduction in the number of edges and in the maximum node weight while retaining graph rigidity.
The field of localization in wireless sensor networks, including the development of new algorithms and the improvement of existing ones, is extremely broad. Future work includes the improvement of the proposed algorithm by creating a minimal rigid graph (a graph with the theoretical minimum number of edges), as well as creating a method for selecting a subset of neighbors that results in the smallest possible localization errors. Furthermore, it would be interesting to study rigidity with directed graphs, i.e. networks where mutual visibility of nodes is not guaranteed.
REFERENCES
[1] F. Zhao and L. J. Guibas, Wireless Sensor Networks: An Information Processing Approach. Morgan Kaufmann Publishers, Elsevier Inc., 2004.
[2] D. Arbula, "Distributed algorithm for node localization in anchor free wireless sensor network," Faculty of Electrical Engineering and Computing, University of Zagreb, 2014.
[3] J.-R. Jiang, C.-M. Lin, F.-Y. Lin, and S.-T. Huang, "ALRD: AoA Localization with RSSI Differences of Directional Antennas for Wireless Sensor Networks," International Journal of Distributed Sensor Networks, vol. 2013, Mar. 2013.
[4] N. B. Priyantha, A. K. L. Miu, H. Balakrishnan, and S. Teller, "The Cricket Compass for Context Aware Mobile Applications," in 7th ACM Conf. Mobile Computing and Networking (MOBICOM), 2001.
[5] T. Eren, W. Whiteley, and P. N. Belhumeur, "A Theoretical Analysis of the Conditions for Unambiguous Node Localization in Sensor Networks," Department of Computer Science, Columbia University, 2004.
[6] T. Eren, W. Whiteley, and P. N. Belhumeur, "Using Angle of Arrival (Bearing) Information in Network Localization," in 2006 45th IEEE Conference on Decision and Control, 2006, pp. 4676–4681.
[7] D. Zelazo, A. Franchi, and P. R. Giordano, "Rigidity Theory in SE(2) for Unscaled Relative Position Estimation using only Bearing Measurements," arXiv:1311.1044 [math], Nov. 2013.
[8] D. Arbula and K. Lenac, "Pymote: High Level Python Library for Event-Based Simulation and Evaluation of Distributed Algorithms," International Journal of Distributed Sensor Networks, vol. 2013, Mar. 2013.
[9] G. Laman, "On graphs and rigidity of plane skeletal structures," J. Eng. Math., vol. 4, no. 4, pp. 331–340, Oct. 1970.
Architecture of Virtualized Computational Resource
Allocation on SDN-enhanced Job Management
System Framework
Yasuhiro Watashiba∗ , Susumu Date† , Hirotake Abe‡ , Kohei Ichikawa∗ ,
Yoshiyuki Kido† , Hiroaki Yamanaka§ , Eiji Kawai§ , and Shinji Shimojo†
∗ Nara Institute of Science and Technology, Nara, Japan
Email: {watashiba, ichikawa}@is.naist.jp
† Osaka University, Osaka, Japan
Email: {date, kido, shimojo}@cmc.osaka-u.ac.jp
‡ University of Tsukuba, Ibaraki, Japan
Email: habe@cs.tsukuba.ac.jp
§ National Institute of Information and Communications Technology (NICT), Tokyo, Japan
Email: {hyamanaka, eiji-ka}@nict.go.jp
Abstract—Nowadays, users' computation requests to a high-performance computing (HPC) environment have been increasing and diversifying, as large-scale simulations and analyses are required in various science fields. In order to efficiently and flexibly handle such computation requests, the allocation of virtualized computational resources on an HPC cluster system, as in a Cloud Computing service, is attracting attention. Currently, we aim to realize a novel resource management system (RMS) able to handle the various resources of an HPC cluster system, and have been studying and developing the SDN-enhanced Job Management System (JMS) Framework, which can manage an interconnect as network resources by integrating the Software Defined Networking (SDN) concept into a traditional JMS. However, the current SDN-enhanced JMS Framework cannot allocate virtualized computational resources to a job because computational resource management is performed by the mechanism of a traditional JMS. In this paper, we propose a mechanism to handle virtualized computational resources on the SDN-enhanced JMS Framework. This mechanism enables the deployment of virtual machines (VMs) requested by the user on the computing nodes allocated to a job, and the execution of the job's processes in the VMs.
I. INTRODUCTION
Nowadays, a high performance computing (HPC) center, which operates an HPC environment to provide resources for users, is required to manage its resources efficiently because users' computations have been increasing and diversifying. In various science fields, the need for HPC resources to perform scientific simulations and analyses has been growing for the purpose of gaining high computing performance and solving complex and/or large-scale scientific problems. Since each such computation has a different resource usage pattern and different resource requirements, the HPC center has to accommodate many and varied user computations in its HPC environment. Thus, it is essential that the resources in an HPC environment are handled by considering the characteristics of each computation.
The dominant architecture of the HPC environment has become the cluster system, which is composed of many computing nodes connected by a high-speed network called an interconnect. A general HPC cluster system has two types of resources: computational resources, such as CPU and memory, and network resources in its interconnect. Thus, it is important to efficiently and flexibly manage these resources in order to provide an appropriate set of resources for a user's computation. In many HPC cluster systems, a Job Management System (JMS), such as NQS [1], PBS [2] or Open Grid Scheduler/Grid Engine (OGS/GE) [3], is widely adopted as the Resource Management System (RMS), whose role is to administer users' computations as jobs and efficiently allocate resources to them for the purpose of workload balancing. However, most traditional JMSs available today can only handle computational resources. This can degrade computing performance through inefficient allocation of the other resources to a job.
With regard to the computational resource allocation to a job managed by a traditional JMS, today's diversified user computations may require more flexible resource provision from an HPC cluster system. Generally, the computational resources allocated to a job by a traditional JMS are physical computational resources, that is, the computation of the job is performed directly on the computing nodes. Since the computing environment of a computing node, such as the versions of the kernel and libraries, is static, it may not be suitable for the computation of a job with specific resource requirements and restrictions. Therefore, a mechanism to provide computational resources with the user-required computing environment is also needed. In order to realize such flexible computational resource management, virtualized computational resources, in which computational resources are allocated to a job as virtual machines (VMs), have attracted attention.
However, since virtualized computational resources have some overhead caused by virtualizing the physical computational resources, the computing performance on VMs may be lower than in the case of physical resource allocation. Thus, a user who submits a job suited to the allocation of physical computational resources prefers traditional resource allocation over the provision of virtualized computational resources. To achieve efficient and flexible resource management, a JMS on an HPC cluster system should be able to allocate both physical and virtualized computational resources according to the features of jobs. However, a traditional JMS in an HPC cluster system does not have any mechanism to handle VMs as virtualized computational resources, while the RMS of a Cloud Computing service targets the allocation of resources to users' computations only as virtualized computational resources.
From the considerations above, we aim to realize a novel flexible and efficient JMS capable of handling various resources in addition to computational resources on an HPC cluster system. So far, we have addressed the issue of managing the interconnect of an HPC cluster system as network resources, and have been studying and developing a network-aware JMS, called the SDN-enhanced JMS framework [4], [5]. The SDN-enhanced JMS framework can allocate an appropriate set of computational and network resources by integrating Software Defined Networking (SDN) into a traditional JMS. However, in the SDN-enhanced JMS framework, since a traditional JMS takes charge of the management of computational resources, the framework does not have the functionality to control virtualized computational resources. In this paper, we propose a resource management mechanism to handle VMs as virtualized computational resources on the SDN-enhanced JMS framework alongside physical computational resources.
This paper is structured as follows. Section II briefly introduces the SDN-enhanced JMS framework, which can manage not only physical computational resources, but also the interconnect of an HPC cluster system as network resources. Section III analyzes the functionalities required for handling VMs as virtualized computational resources, and then explains the implementation of the corresponding enhancement of the SDN-enhanced JMS framework. In Section IV, we evaluate the behavior of the proposed mechanism to control virtualized computational resources. In Section V, we conclude this paper and discuss future work.
II. SDN-ENHANCED JMS FRAMEWORK
In this section, we briefly describe the system structure and mechanism of the SDN-enhanced JMS framework [4], [5]. The SDN-enhanced JMS framework enables the management and allocation of both computational and network resources by integrating OpenFlow [6], a technology realizing the SDN concept, into a traditional JMS. OpenFlow enables the administrator to manage the whole network in a centralized and programmable manner through an SDN controller, which can be implemented in software. Figure 1 shows the architecture of the SDN-enhanced JMS framework. In designing the SDN-enhanced JMS framework, low-cost reusability and a high degree of program portability were considered in order to facilitate cooperation with various traditional JMSs. As a result, the functionality to handle network resources was implemented as an external module called the Network Management Module (NMM). The mechanism to manage computational resources in the SDN-enhanced JMS framework is therefore provided by the traditional JMS, as illustrated in Fig. 1.
In the implementation of the SDN-enhanced JMS framework, OGS/GE [3] was adopted as the traditional JMS into which the network resource management functionality is integrated. In the SDN-enhanced JMS framework, an interface to cooperate with the NMM is required from the traditional JMS. The OGS/GE provides the Parallel Environment Queue Sort (PQS) API, which allows an administrator to define a new policy for deciding the allocation of computational resources. The SDN-enhanced JMS framework realizes the cooperation between the traditional JMS and the NMM by enhancing the PQS API.
The NMM has two components: the network control component and the brain component. The network control component has two functionalities for managing network resources in the interconnect of the HPC cluster system: retrieving information about the interconnect, and controlling the communication paths among the computing nodes allocated to a job. In order to implement these two functionalities by leveraging the functions of OpenFlow, Trema [7], a development framework for OpenFlow controllers, has been included in the network control component.
The role of the brain component is to decide how to allocate resources to a job based on the resource usage of the HPC cluster system and the user requirements of the job. To gather the information about a target job and the usage of both computational and network resources, the brain component is connected with the traditional JMS and the network control component through XML-RPC [8]. Moreover, a traditional JMS does not have any algorithm to determine appropriate resources for a job based on the usage and requirements of both computational and network resources. Thus, the brain component has the resource assignment policy class module, through which the administrator can define how to allocate resources to a job as a resource assignment policy. The resource assignment policy class module provides a set of APIs that facilitate the flexible design of a resource assignment policy as a Ruby script. In a job script, a user can request an appropriate resource allocation algorithm by indicating the name of a resource assignment policy as a parameter.
These mechanisms enable the SDN-enhanced JMS framework to manage and allocate both computational and network resources. In the SDN-enhanced JMS framework, network resources are handled as flow entries, which define how computing nodes communicate. That is, all communication paths between the computing nodes allocated to each job are administered as allocated network resources. Moreover, the information about the network resource allocation and the usage of each link in the interconnect of the HPC cluster system retrieved by the network control component are stored in a database within the NMM.
Fig. 1. Architecture of the original SDN-enhanced JMS framework.
III. ARCHITECTURE AND IMPLEMENTATION
In this section, we derive the functionalities required for handling VMs as virtualized computational resources in the process flow of a JMS, and then explain the implementation of the novel SDN-enhanced JMS framework with a mechanism to control virtualized computational resources.
A. Virtualized Computational Resources
Virtualized computational resources based on VMs enable flexible control of the computing environment of the computational resources allocated to a job. In a VM, a user can construct an arbitrary computing environment that does not depend on the configuration of the computing nodes of an HPC cluster system. Thus, the computation of a job can be performed in an appropriate computing environment according to its resource requirements and restrictions. Resource provision leveraging VMs as virtualized computational resources has been adopted in Cloud Computing [9], [10]. In the HPC field, resource allocation leveraging virtualized computational resources has attracted much attention due to the growth of virtualization technology, and much research on utilizing virtualized computational resources has been conducted [11], [12], [13], [14].
To run a VM on a computing node, a hypervisor is generally required. The hypervisor works as the middleware connecting a VM with the physical hardware (e.g. Kernel-based Virtual Machine (KVM), Xen, VMware vSphere Hypervisor, and Hyper-V). The VMs on a computing node do not share their internal resources with each other, because they are managed as different computing environments. Since today's computing nodes have many CPU cores and large memory, this characteristic is useful for insulating the performance of the computational resources allocated to a job from the behavior of other jobs. From the viewpoint of resource management, virtualized computational resources thus have some advantages over resource management based on physical computational resources.
B. Functionalities for Handling Virtualized Computational Resources

In this section, we analyze the functionalities required for controlling virtualized computational resources in addition to the physical computational resources and network resources on the SDN-enhanced JMS framework. For virtualized computational resources to be handled on the SDN-enhanced JMS framework, the following functionalities are considered necessary.
1) A user interface to indicate an arbitrary user-required VM for executing a program in a job.
2) Controlling the behavior of VMs on the computing nodes allocated to a job.
3) Installing the modules and configuration needed to deploy job processes in VMs.
4) Managing the information about the virtualized computational resources allocated to jobs.
The first functionality is a user interface to indicate the user-required VM for executing the computation of a job. This functionality is necessary in order to allocate an arbitrary computing environment constructed by a user to a job on an HPC cluster system. Additionally, parameters defining the number of CPU cores and the amount of memory of a VM should be part of this functionality. The second functionality is required for controlling the behavior of VMs, because the SDN-enhanced JMS framework cannot otherwise operate VMs on the computing nodes. Before job processes can be assigned to VMs, the VMs have to boot up. Moreover, after the job is finished, the VMs must shut down to release the resources allocated to the job. Since the behavior of a VM is generally controlled by a hypervisor, this functionality enables the hypervisor to be operated from the SDN-enhanced JMS framework. The third functionality is a mechanism to set up the environment of a VM as a computing node of the HPC cluster system. In the proposed virtualized computational resource management, since the computing environment in a VM is assumed to be constructed by a user, the VM does not have the information and functions needed to control job processes from the SDN-enhanced JMS framework (e.g. environment variables, the module to manage job processes, user information for authentication and authorization, a path to access the user's home directory in the HPC cluster system, and so on). Thus, the functionality to construct a computing environment in which a VM can be managed from the SDN-enhanced JMS framework is essential. The fourth functionality is a mechanism to manage the information about the virtualized computational resources allocated to jobs. Although the amount of used virtualized computational resources can be administered as the usage of physical computational resources, this information is necessary for administrators and users to check the status of the virtualized computational resources.
C. Implementation of Virtualized Computational Resource Management

Fig. 2. Architecture of the SDN-enhanced JMS framework with the functionalities for virtualized computational resource management.

In this section, we describe the implementation of the novel SDN-enhanced JMS framework, which is equipped with the functionalities for handling virtualized computational resources. As for
the hypervisor on the computing nodes, the KVM hypervisor was adopted because the OS of the computing nodes in many current HPC cluster systems is Linux. In order to manage virtualized computational resources as well as physical computational resources on the SDN-enhanced JMS framework, we adopted the strategy of enhancing the computational resource management mechanism of the OGS/GE. Figure 2 illustrates the architecture of our proposed SDN-enhanced JMS framework with the functionalities for managing virtualized computational resources.
For the first functionality listed in Section III-B, we implemented a user interface by extending the parameters of the job script offered by the OGS/GE, similar to how network resources are requested. An example of an extended job script is shown in Fig. 3. The extended job script allows a user to embed the job's requirements for virtualized computational resources as well as for physical computational resources. In this example, the requirements for virtualized computational resources are composed of a path to a VM image and VM configuration parameters such as the amount of memory of the VM. The number of CPUs configured in a VM is limited to one in this implementation in order to equalize the number of allocated slots.
The second functionality listed in Section III-B was achieved by operating the KVM hypervisor on each computing node from the SDN-enhanced JMS framework.
#!/bin/csh
#$ -q QUEUE_NAME
#$ -pe orte 32
#$ -l netprio=policy_name
#$ -l vm_image=/vm/my-vm
#$ -l vm_memory=8gb
mpirun -np $NSLOTS ./a.out

Fig. 3. Example of an extended job script in the proposed resource management system.
In order to control the KVM hypervisor on each computing node, we developed a new control module, called "vm_control", and deployed it on each computing node. This module is called by the "execd" module managed by the qmaster of the OGS/GE, and performs the start and stop control of the VMs on a computing node. In the OGS/GE, the "execd" module usually calls the "shepherd" module to deploy job processes on a computing node. In the virtualized computational resource management of the proposed SDN-enhanced JMS framework, the "execd" module calls the "vm_control" module instead of the "shepherd" module in order to manage the behavior of the VMs. Since this change of the process flow controlling job processes on a computing node is achieved through a setting in the OGS/GE, the source code of the OGS/GE is not modified.
Fig. 4. Process flow of the "vm_control" module: deciding resource allocation to a job; specification of computing nodes; calling "vm_control"; pre-processing: (1) getting VM information, (2) replacing node information, making a CD image for VM setup, booting up the VM, (3) configuring the VM environment; assigning the job process; post-processing: finishing the job process, stopping the VM.

Fig. 5. Specification of the cluster system for evaluation: OS CentOS 6.3; CPU Intel Xeon E5-2620 (2.00 GHz) x2; Memory 64 GB; Network on-board Intel I350 GbE.
The third functionality listed in Section III-B is implemented as scripts included in the "vm_control" module. The functions of the "vm_control" module are composed of three steps: (1) getting the VM information, (2) modifying the list of computing nodes allocated to the job, and (3) setting up the environment in a VM based on the retrieved information. Figure 4 shows the process flow of the "vm_control" module.

Step (1) obtains the hostnames of the VMs allocated to a job. In this implementation, the hostname of each VM is derived from that of the allocated computing node. Since the VM hostnames are not registered with the OGS/GE as computing nodes of the HPC cluster system, the SDN-enhanced JMS framework cannot handle them directly, and it is therefore necessary to retrieve the hostnames of all VMs allocated to the job. In step (2), the list of allocated computing nodes generated by the OGS/GE is rewritten into the list of VMs. In the OGS/GE, the allocation of job processes is performed according to the list of allocated computing nodes; in order to assign job processes to VMs using the mechanism of the OGS/GE, the list must be replaced based on the hostnames retrieved in step (1). Step (3) boots the allocated VMs on the computing nodes and then reconfigures the computing environment of each VM for executing a job process. In this step, the configuration process performed after a VM starts is prepared as a CD image together with a script that runs it, and the CD image is executed at each VM start. Moreover, since the CD image includes a script that launches the "shepherd" module, the OGS/GE can assign a job process to the VM.
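For illustration, the three steps could look roughly as follows in Python. Every path, hostname convention and command here is an assumption for the sake of the sketch (the paper does not publish the module's code), with the virsh command line standing in for KVM hypervisor control.

#!/usr/bin/env python
"""Illustrative sketch of the "vm_control" steps; not the authors' implementation."""
import subprocess

def vm_hostname(node):
    # (1) Derive each VM's hostname from its hosting computing node
    #     (the paper states only that it is based on the node's name).
    return node + "-vm"                      # naming convention assumed

def rewrite_hostfile(hostfile):
    # (2) Replace the JMS-generated list of computing nodes with the
    #     list of VMs so job processes are assigned to the VMs.
    with open(hostfile) as f:
        nodes = [line.split()[0] for line in f if line.strip()]
    with open(hostfile, "w") as f:
        for n in nodes:
            f.write(vm_hostname(n) + "\n")
    return nodes

def boot_vms(nodes):
    # (3) Boot the VM on each allocated node; the configuration CD image
    #     (environment setup and shepherd start-up script) is assumed to
    #     be attached to the VM beforehand.
    for n in nodes:
        subprocess.check_call(["ssh", n, "virsh", "start", vm_hostname(n)])

def stop_vms(nodes):
    # Post-processing: release the virtualized resources after the job ends.
    for n in nodes:
        subprocess.check_call(["ssh", n, "virsh", "shutdown", vm_hostname(n)])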
Regarding the fourth functionality listed in Section III-B, the brain component of the SDN-enhanced JMS framework stores the information about the VMs allocated to each job in a database. Since a traditional JMS provides a command to show the status of resource allocation to jobs, a command to display the information about the VMs allocated to each job was implemented. Moreover, in this enhanced SDN-enhanced JMS framework, a user can also request network resources in the same way.

Fig. 6. Situation of resource allocation to a job.
IV. EVALUATION
In this section, we evaluate the behavior of the proposed SDN-enhanced JMS framework with the mechanism to control virtualized computational resources in our development environment. The evaluation environment is a cluster system composed of 4 computing nodes and 3 OpenFlow switches (NEC PF5240), as described in Fig. 5. To observe the allocation of virtualized computational resources by the proposed SDN-enhanced JMS framework, we conducted an experiment in which multiple jobs with different numbers of processes were submitted to the cluster system. In the experiment, as the requirement for virtualized computational resources, every job requested the same VM image, which consists of a single CPU, 2 GB of memory, and CentOS 6.3. Moreover, the user's home directory was mounted in the VM through the Network File System (NFS) by the configuration performed by the "vm_control" module.
Figure 6 shows the results of two commands that display the status of the allocation of physical and virtualized computational resources. The upper part of Fig. 6 is the result of the qstat command, which is provided by the OGS/GE for displaying the usage of computing nodes. The lower part of Fig. 6 shows the situation of the allocation of VMs, displayed by the qstat+ command developed in this implementation of the fourth functionality described in Section III-C. From Fig. 6, it was confirmed that the proposed SDN-enhanced JMS framework makes it possible to assign job processes to user-requested VMs in the same way as when allocating physical computational resources. The proposed SDN-enhanced JMS framework allows a user to request either physical or virtualized computational resources from the HPC cluster system. Furthermore, a user can simultaneously request network resources through the functionality of the original SDN-enhanced JMS framework, because a communication path between VMs is the same as the path between the computing nodes in which the respective VMs are accommodated.

TABLE I
OVERHEAD OF PROCESSES TO CONTROL VIRTUALIZED COMPUTATIONAL RESOURCES

Process Num   Pre-process (sec)   Post-process (sec)
1             1.299               0.004
2             1.315               0.004
3             1.140               0.004
4             1.196               0.004
Next, we measured the overhead caused by the "vm_control" module for controlling VMs on the proposed SDN-enhanced JMS framework. The overhead of the "vm_control" module comprises the pre-process and the post-process shown in Fig. 4: booting up the VMs on the allocated computing nodes and configuring the computing environment in the VMs before the job execution starts, and shutting down the VMs allocated to the job after it finishes. Since these processes increase the processing time needed to allocate resources compared with the original SDN-enhanced JMS framework, it is essential to evaluate the influence of the overhead introduced by virtualized computational resource management. Table I shows the measured times of each process in the "vm_control" module, classified according to the number of processes in the jobs. From these results, it was confirmed that the additional resource allocation time needed to control the behavior of the VMs is too small to affect efficient resource management. In general, since calculating an appropriate set of resources takes several tens of seconds, the effect on system throughput is small. Moreover, since the number of processes did not affect the processing time of resource allocation, the allocation of virtualized computational resources can also be considered for large-scale computations.
V. CONCLUSION
In the service of today's HPC centers, providing virtualized computational resources for a job, as in Cloud Computing, is required in order to handle various users' computational requests. In this paper, we proposed a mechanism for virtualized computational resource management on top of the SDN-enhanced JMS framework, which we have been studying and developing for efficiently and flexibly managing various resources, and described the architecture of the novel SDN-enhanced JMS framework capable of handling VMs as virtualized computational resources. In the evaluation, we demonstrated the behavior of virtualized computational resource allocation by the proposed SDN-enhanced JMS framework alongside physical computational resource allocation.
As future work, we will conduct evaluation experiments on a larger computing environment and confirm the effectiveness of this resource management technology. Moreover, we will investigate more flexible network resource management by leveraging the functions of vSwitch.
ACKNOWLEDGMENTS

This research was supported in part by the collaborative research of the National Institute of Information and Communications Technology (NICT) and Osaka University (Research on High Functional Network Platform Technology for Large-scale Distributed Computing).
REFERENCES
[1] B. A. Kingsbury, The Network Queuing System. Sterling Software, Palo Alto, 1986.
[2] R. Henderson, "Job Scheduling under the Portable Batch System," in Job Scheduling Strategies for Parallel Processing, D. Feitelson and L. Rudolph, Eds. Springer, 1995, vol. 949, pp. 279–294.
[3] "Open Grid Scheduler: The official Open Source Grid Engine." [Online]. Available: http://gridscheduler.sourceforge.net/
[4] Y. Watashiba, S. Date, H. Abe, Y. Kido, K. Ichikawa, H. Yamanaka, E. Kawai, S. Shimojo, and H. Takemura, "Performance Characteristics of an SDN-enhanced Job Management System for Cluster Systems with Fat-tree Interconnect," in Emerging Issues in Cloud (EIC) Workshop, The 6th IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2014), December 2014, pp. 781–786.
[5] ——, "Efficacy Analysis of a SDN-enhanced Resource Management System through NAS Parallel Benchmarks," The Review of Socionetwork Strategies, vol. 8, no. 2, pp. 69–84, December 2014.
[6] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner, "OpenFlow: Enabling Innovation in Campus Networks," SIGCOMM Computer Communication Review, vol. 38, no. 2, pp. 69–74, Mar. 2008.
[7] "Trema: Full-Stack OpenFlow Framework for Ruby/C." [Online]. Available: http://trema.github.com/trema/
[8] S. S. Laurent, E. Dumbill, and J. Johnston, Programming Web Services with XML-RPC. O'Reilly & Associates, Inc., 2001.
[9] B. P. Rimal, E. Choi, and I. Lumb, "A taxonomy and survey of cloud computing systems," in 2009 Fifth International Joint Conference on INC, IMS and IDC, 2009, pp. 44–51.
[10] W.-T. Tsai, X. Sun, and J. Balasooriya, "Service-oriented cloud computing architecture," in 2010 Seventh International Conference on Information Technology: New Generations (ITNG), 2010, pp. 684–689.
[11] Y. Chen, T. Wo, and J. Li, "An efficient resource management system for on-line virtual cluster provision," in 2009 IEEE International Conference on Cloud Computing (CLOUD '09), 2009, pp. 72–79.
[12] X. Li, H. Palit, Y. S. Foo, and T. Hung, "Building an HPC-as-a-service toolkit for user-interactive HPC services in the cloud," in 2011 IEEE Workshops of International Conference on Advanced Information Networking and Applications (WAINA), 2011, pp. 369–374.
[13] M. Taifi, A. Khreishah, and J. Y. Shi, "Building a Private HPC Cloud for Compute and Data-Intensive Applications," International Journal on Cloud Computing: Services and Architecture (IJCCSA), pp. 1–20, 2013.
[14] J.-L. Yu, C.-H. Choi, D.-S. Jin, J. R. Lee, and H.-J. Byun, "A Dynamic Virtual Machine Allocation Technique Using Network Resource Contention for a High-performance Virtualized Computing Cloud," International Journal of Software Engineering & Its Applications, vol. 8, no. 9, 2014.
Near Real-time Detection of Crisis Situations
Sylva Girtelschmid∗ , Andrea Salfinger† , Birgit Pröll‡ , Werner Retschitzegger§ , Wieland Schwinger¶
∗‡
Inst. for Application Oriented Knowledge Processing, †§¶ Dept. of Cooperative Information Systems
Johannes Kepler University Linz
Altenbergerstr. 69, 4040 Linz, Austria
K0956640@students.jku.at∗ , {andrea.salfinger† , werner.retschitzegger§ , wieland.schwinger¶ }@cis.jku.at, bproell@faw.jku.at‡
Abstract—When disaster strikes, be it natural or man-made, the immediacy of notifying emergency professionals is critical for initiating the best possible response. As social media has become ubiquitous in recent years, affected citizens have become fast reporters of incidents. However, exploiting such 'citizen sensors' for identifying a crisis situation comes at the price of having to sort, in near real-time, through vast amounts of mostly unrelated and highly unstructured information exchanged among individuals around the world. Identifying bursts in conversations can, however, decrease this burden by pinpointing events of potential interest. Still, the sheer volume of information keeps the computational requirements of such procedures, even if optimized, too high for a non-distributed approach. This is where currently emerging, real-time focused distributed processing systems may excel. This paper elaborates on possible practices, caveats, and recommendations for engineering a cloud-centric application on one such system. We used the distributed real-time computation system Apache Storm and its Trident API in conjunction with detecting crisis situations by identifying bursts in streamed Twitter communication. We contribute a system architecture for the suggested application, and a high-level description of its components' implementation.
I. INTRODUCTION
During the last decade, social media channels have evolved to the point of becoming omnipresent in our everyday lives. Platforms such as Twitter play a vital role in sharing information fast. It should come as no surprise that the Twitter data pool is popular for information mining, as it can reveal valuable insights, be it for crisis monitoring applications or for marketing, e.g. to monitor the perception of a new product. Moreover, being able to utilize Twitter data to promptly detect a situation that requires a responsive action from a rescue crew, e.g. during natural disasters, can often save lives. However, even the situation detection alone is non-trivial. For a human, processing Twitter conversations for this purpose would mean having to look into the content of each message and extract the interesting information from it. A more efficient approach, better suited for a computer, is to cluster similar posts without first having to consider the semantics of the content. To satisfy the real-time detection demands, techniques for bursty topic identification are applicable here.

Bursts are, in this context, collections of posts mentioning the same topic more often within the current period than within the previous one. The problem of detecting bursty topics falls under a broader category of research dealing with Topic Detection and Tracking (TDT) that has been extensively addressed in academic articles.
Of the various TDT approaches, clustering is a popular one for identifying bursts. Other methods for burst detection include finite state automata, the Fourier transform, time series, or the Wavelet transform [1]–[3]. For our first setup, the clustering approach appealed to us the most thanks to its inherent property of being readily parallelizable.

However, the enormous throughput of real-world streams of Twitter posts prohibits timely online processing with a sequential implementation of state-of-the-art methods for topic clustering. For acceptable performance, this problem requires single-pass, optimized methods, as well as a scalable implementation. In the recent past there has been work on successfully applying Cloud technologies to the online clustering problem to guarantee scalability and near real-time processing of streaming data [4]–[6]. In this paper we show that Cloud technologies have the potential to improve responsiveness during emergency situations through mining social media content.
Since our system also needs to work with persistent data, we decided to make use of the Trident API, a high-level abstraction of Apache Storm, as it makes stateful processing more manageable than Storm alone. Although Trident is gaining popularity, to the best of our knowledge no other systems employ Trident for Burst Topic Detection. In our work, we evaluate Trident's applicability to detecting the outburst of a crisis situation in near real-time. We propose an architecture for such an online disaster detector system and contribute a high-level description of a Trident topology implementation.

The rest of this paper is organized as follows: In the next section we discuss a number of research areas related to our work and identify the specific ideas from which our devised system borrows. In Sec. III, we detail our approach to detecting new emergency situations from Twitter data streams. Sec. IV discusses our test cases. Finally, in Sec. V, we conclude on our findings and provide an outlook on future work.
II. RELATED WORK
Due to the multi-disciplinarity of the envisioned application domain, related approaches and valuable preparatory work need to be drawn from several areas. Techniques essential to addressing the requirements imposed by the crisis management application domain can be found in the research fields of Event Detection, First Story Detection, Burst Detection, Knowledge Bases, Keyword and Topic Extraction, and Parallelization, which we motivate and discuss in the following.
a) Event Detection: Finding a common topic within a set of documents has been covered extensively in research (TDT initiatives). This research also finds application in crisis monitoring, as event/story detection is at the heart of this subject. There are various approaches to the problem: e.g. in [7], [8], the method of clustering documents based on word similarity using the Vector Space Model (VSM) is presented. In [9], the clustering of the documents is improved by considering locality information in the process of discriminating the events into clusters. In [10], the similarity score calculation also incorporates the social network structure besides the content of the message. Yet another class of solutions for event detection and tracking is based on probabilistic methods, such as those presented in [11]–[13]. There, the actually encountered message density w.r.t. specific topics is compared to an expected density. A valuable work summarizing research on event detection in the realm of Twitter can be found in [14].
b) Detection of Novel Events: Although organizing messages into clusters brings useful insight to our work, it is not sufficient for our problem solution. We have to consider methods capable of identifying those topics that have not been discussed before within some sufficiently long time period. In other words, we need to identify posts that discuss a new story. First Story Detection (FSD) methods are therefore well applicable here, as they are designed to detect when a document discusses previously unseen content. Examples of FSD implementations are presented in the works [15], [16]. The authors propose an optimized approach to FSD, which makes it well suited for identifying events online, i.e. for identifying events from real-time streaming text such as tweets. The ideas set out in these works form the basis of our new event detection algorithm.
c) Identifying Bursts: Typically, whenever a newsworthy event occurs, many people tend to share information about it on social media. This causes a temporal burst of closely related messages which can be captured [1], [17]. Detecting bursts has also been well studied [1]–[3], [18]. For example, in [3], the author achieves real-time event detection by clustering wavelet-based signals. The wavelet transformation technique is applied to build signals for individual words, which are cross-correlated, and the signals are then clustered using a scalable eigenvalue algorithm. In our approach, we use a bucketing method similar to that proposed by [15], [16], since it directly fits the parallel programming paradigm we adopted.
d) Knowledge-based Sensing: In our system, we also need to consider techniques that allow us to report only events which we are interested in. In this context, an event is defined as a topic that suddenly draws the attention of the public. As such, it may often be of no value for our purposes, since, for example, news about celebrity deaths also causes an abrupt increase in related posts being sent. Our application needs to identify only those events that are related to disasters. To achieve this, a disaster ontology may be employed. There have been many attempts at building an ontology related to disasters. It is, however, often the case that existing disaster ontologies focus only on a specific type of disaster, which they explore and describe in detail [19], [20]. For our purposes, it is sufficient to have a high-level disaster dictionary to identify the particular disaster situation referred to in the tweets. Working with multiple languages would, therefore, also be feasible.
e) Extracting Keywords: Document keyword and keyphrase extraction is another complex problem addressed in the research [21]. Frequent techniques include word frequency analysis, distance between words, or lexical chains to rank the keywords. When it comes to keyphrase extraction, graph-based ranking methods are being used successfully. However, such methods are proposed primarily for a single document or a document collection. This is not directly applicable when it comes to extracting representative words from a document as short as a tweet. The large variety of topics found in tweets also makes this task more challenging. In [22], the authors propose a method specifically for keyword extraction from Twitter. The approach is based on organizing keyphrases by topics learnt from Twitter. For our purposes, the best approach is to apply content-sensitive keyword extraction, as the tweet collection from which we need to extract the keywords is already formed of similar content.
f) Meeting Real-time Demands: The algorithms that can achieve our task of detecting a new story from a real-time stream of data are comparatively computationally expensive. To safely achieve real-time response of a system processing the full streaming data flow from Twitter (the so-called 'firehose'), parallelization of the algorithm is necessary. Some recent studies have shown that the implementation of clustering algorithms on the Storm distributed framework is reasonable [4], [5]. In our implementation, however, we utilize the higher-level Storm API called Trident. This design benefits from the guarantee of exactly-once semantics1, as well as the ability to process streaming messages in batches, which performs better when querying a database system.
III. IMPLEMENTATION
Extracting useful information from the Twitter data flow is without doubt a challenge. The posts are not only short (140 characters) and varied in topic, but also highly noisy, in that most of the conversations contain a lot of useless babble. The task of identifying bursts offers itself as the best approach to finding information of value, as more people will be drawn into a conversation about something vital than about a certain individual reporting on currently drinking the most delicious cocoa.
In this project, we focus on utilizing one of the FSD techniques, which applies a Vector Space model and is optimized using Locality Sensitive Hashing. This technique is described in detail in [16], and we provide a brief overview in subsection III-B1. This FSD algorithm outputs a similarity score for each incoming message, together with the ID of the message to which it is most similar. The next stage involves the actual clustering, or bucketing, to identify bursts. This is performed based on the message's content similarity score and on monitoring the growth rate of the buckets in which similar tweets are gathered. The components of our system are described in Sec. III-B.

1 During a node failure, only the messages that have not been fully processed will get resent. This is in contrast to Storm's at-least-once semantics, where some messages may get processed twice.
A. Distributed Platform

Besides Spark Streaming2, the Apache Storm Trident API is currently the best suited framework for our application. It simplifies the implementation of parallel tasks by providing a programming interface suitable for stream processing and continuous computation3.

As already mentioned in Sec. I, the successful use of Apache Storm in data stream clustering applications has recently been reported [4], [5]. The advantages of using the Trident API over the core Storm API in our application are threefold: First, Trident can guarantee exactly-once semantics as opposed to the at-least-once semantics of Storm. This means that we do not have to worry about already processed messages being resent in case of a failure in the computing cluster, and therefore we will not be faced with potentially erroneous bursty event reports. Another advantage of using Trident is the fact that it handles streams of batches of tuples as opposed to streams of individual tuples. This achieves better performance for communication with a database. Lastly, state persistence handling is incorporated generically in the Trident API, which allows us to be flexible in the selection of our back-end technology. On the other hand, using Trident requires some additional checks in our algorithm, which we point out at the end of subsection III-B2.
B. System Components

Fig. 1 shows a component overview of the proposed system. In our setup, the topology is fed by data from a KafkaSpout, where the source of the stream is 1% of the Twitter firehose accessed through a Hosebird client4.

The first component of our system implements the core parts of the open source project "First Story Detection on Twitter using Storm"5 and adapts it for running on an actual distributed cluster with real-life data streams. For later queries, tweets are saved in the NoSQL distributed storage mechanism Apache Cassandra. The next component takes care of the grouping of related tweets into buckets, which are monitored for bursts based on their growth rate. References to the bulk of tweets from the fastest growing bucket, i.e. the tweets from the burst, are also made persistent. The task of the third component is to decide whether the topic of the burst is of interest. The final component executes only if a crisis-related new event was detected. Its task is to notify responsible operators from
the area of the event occurrence, and to automatically spawn a new instance of a Twitter search tracking this event.

2 http://spark.apache.org/streaming/
3 http://storm.apache.org/
4 A robust Java HTTP library for consuming Twitter's Streaming API.
5 https://github.com/mvogiatzis/first-stories-twitter
1) FSD using Locality Sensitive Hashing: In this section, we briefly explain the gist of the FSD component.6 This component outputs additional information for each incoming message: the Twitter ID of the closest neighbour message (the most similar post), and a similarity score in the range [0.0, 1.0], where 1.0 identifies an identical post.

Every newly arriving message from the Twitter data stream is split into words and a number of preprocessing steps are applied (e.g. removal of URLs and mentions, replacement of some, often intentionally, misspelled words, as well as simplistic7 word stemming). The cleaned-up corpus is then submitted for calculation of the nearest neighbour.

First, this process involves determining the TF-IDF (Term Frequency–Inverse Document Frequency) weighting for each term in the message, to convert the representation of the tweet message into a vector normalized by the Euclidean norm. The new vector must then be compared to the vectors of the preceding messages. An approximate near-neighbour among the seen documents can be found fast using the Locality Sensitive Hashing algorithm. It works on the assumption that similar tweets tend to hash to the same value. This allows for an efficient optimization in which the number of documents against which a comparison has to be made is greatly reduced. Namely, it uses hash tables to bucket similar tweets, so that each incoming tweet will be compared only with the tweets that have the same hash. Finally, the cosine similarity measure is applied to compute the distance of the nearest neighbour from the incoming tweet. If the calculated distance to the closest tweet is below a predefined threshold, the algorithm also performs an additional step of comparing the distance to a fixed number of the latest tweets. This step alleviates the problem of possibly overlooking a closer tweet posted in the immediate time proximity. Such a problem can easily arise given the method's randomness in selecting tweets for comparison in the first place.
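To make the hashing idea concrete, here is a minimal random-hyperplane LSH sketch for cosine similarity over TF-IDF vectors. It is an illustrative toy (scikit-learn is assumed for TF-IDF; bit and table counts are arbitrary), not the authors' Storm implementation, and it omits the additional comparison against a fixed number of recent tweets.

import numpy as np
from collections import defaultdict
from sklearn.feature_extraction.text import TfidfVectorizer

class CosineLSH:
    """Random-hyperplane LSH: similar (small-angle) vectors tend to share
    the same sign pattern of random projections, i.e. the same hash."""
    def __init__(self, dim, n_bits=4, n_tables=8, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = [rng.standard_normal((n_bits, dim)) for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.vectors = []

    def _keys(self, v):
        return [tuple(p @ v > 0) for p in self.planes]

    def add(self, v):
        self.vectors.append(v)
        for table, key in zip(self.tables, self._keys(v)):
            table[key].append(len(self.vectors) - 1)

    def nearest(self, v):
        # compare only against tweets hashed into the same buckets
        cand = {i for t, k in zip(self.tables, self._keys(v)) for i in t[k]}
        if not cand:
            return None, 0.0                      # LSH is probabilistic: may miss
        best = max(cand, key=lambda i: float(v @ self.vectors[i]))
        return best, float(v @ self.vectors[best])  # cosine similarity (unit vectors)

tweets = ["fire in the city center", "huge fire in city center now", "best cocoa ever"]
X = TfidfVectorizer().fit_transform(tweets).toarray()   # rows are L2-normalized
lsh = CosineLSH(dim=X.shape[1])
lsh.add(X[0])
lsh.add(X[2])
neighbour_id, score = lsh.nearest(X[1])                 # likely finds the fire tweet

Because only bucket collisions are inspected, the neighbour search is approximate; the recent-tweet comparison described above compensates for exactly this randomness.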
2) Bucketing and Identifying Bursts: In effect, the process of grouping similar posts also identifies a new event. In other words, if an incoming message was found to have a low similarity score, this message will be marked as a potential new event (by placing it in a new bucket). Then, given that enough related/similar posts follow it, this event will grow in its own cluster. In [16], this process is referred to as threading, whereas in [10] it is called the cluster summary. We chose to describe this process as two subtasks: one of bucketing and the other of identification of the bursts.
The first task of this component is to gather similar tweets in the same bucket. Before bucketing, tweets whose similarity score is unfavorable are filtered out, i.e. a tweet is included in further processing only if (1 − cosineDistance) < threshold.

6 For a detailed explanation, refer to the original author's website at http://micvog.com/2013/09/08/storm-first-story-detection/
7 For performance reasons, we only apply stemming to English words that have a common stem in all their inflected variants.
Figure 1: System architecture overview. Trident topology part 1 – clustering new bursty stories: from the Kafka spout, the real-time stream of tweets (1% of the Firehose of roughly 350,000 tweets per minute) flows into the FSD algorithm (an optimized Locality Sensitive Hashing algorithm emitting the incoming tweet ID, the colliding tweet ID and a proximity score) and on to burst event detection (bucketing and burst identification); from the selected FSD results, the topic, keywords and burst content (tweet IDs, texts, users, and locations) are saved into C*. Trident topology part 2 – disaster detection and action: the topic's keywords extractor, topic matching against the disaster dictionary and localizing the event feed the Disaster Detector, which drives the streaming filtered search, the historical filtered search and the alert notifier for localized operator notification.
The predefined threshold was found to give
the best result if kept in the range of [0.5, 0.6]. As explained
in [16], higher threshold values cause the topics in a bucket
to be diverse and plentiful, while lower threshold values make the
topics very specific and scarce. A passing tweet will then be
placed either in a new bucket, if its nearest neighbour is not
already in one of the existing buckets, or it will join the already
existing cluster of similar tweets in their bucket. If it ends up
in a new bucket, it is considered to potentially contribute to a
new story discussing a previously unseen content. The number
of buckets is finite, so whenever all buckets are occupied and
a new story tweet streams in, the bucket that has the lowest
timestamp of its most recent content update is freed up for
reuse. The size of a bucket is also limited. Whenever a new
tweet is determined to belong to a full bucket, we simply
remove the older half of the bucket to free up space. The
assumption is that the tweets that the removed IDs refer to
are not of importance (too far apart to be able to form a burst)
since otherwise they would have been already reported as a
burst.
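The following minimal sketch captures one consistent reading of the filtering, bucketing and eviction rules described above; the threshold, the capacity limits and the use of a raw distance value are illustrative assumptions rather than the production configuration.

import time

THRESHOLD = 0.55     # within the [0.5, 0.6] range reported above (illustrative)
MAX_BUCKETS = 1000   # illustrative capacity limits
BUCKET_SIZE = 500

buckets = {}         # bucket id -> {"tweets": [...], "updated": unix timestamp}

def place(tweet_id, neighbour_bucket_id, distance):
    """Put a passing tweet into its neighbour's bucket or open a new one."""
    if distance >= THRESHOLD:
        return None                                  # unfavorable similarity: filtered out
    if neighbour_bucket_id in buckets:
        bucket = buckets[neighbour_bucket_id]        # join the existing cluster
    else:
        if len(buckets) >= MAX_BUCKETS:              # free the bucket updated least recently
            stalest = min(buckets, key=lambda k: buckets[k]["updated"])
            del buckets[stalest]
        bucket = buckets.setdefault(tweet_id, {"tweets": [], "updated": 0.0})
    if len(bucket["tweets"]) >= BUCKET_SIZE:
        half = len(bucket["tweets"]) // 2            # drop the older half of a full bucket
        bucket["tweets"] = bucket["tweets"][half:]
    bucket["tweets"].append(tweet_id)
    bucket["updated"] = time.time()
    return bucket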
An important Trident implementation detail lies in the
necessity to first simulate the filling of the buckets. This is
because Trident works on batches and, therefore, the state of
the buckets gets updated only at the end of a batch. Let us
consider the case when a new batch contains a new event as
its first tweet, and the tweets following it are related only to
that first tweet. Since that tweet is not placed in the bucket
before the next tweet is processed, the next tweet would also
be (incorrectly) identified as a new event and a new bucket
would be assigned to it. This is a clear disadvantage of using
batches in our algorithm; however, it pays off later when
communication with a database system is required.
Finally, once in a while the fastest growing bucket is
detected. Before passing the burst content down the processing
pipeline, the Burst Event Detector determines whether this
bulk of tweets could be considered a burst by checking the
time span within which the tweets were posted. The burst
content is then consumed by the Disaster Detector component.
3) Disaster Detector: The task of the Disaster Detector is
to find whether the captured event is an emergency situation
and, if it is, where it is located. First, the keywords must
be extracted and based on them, utilizing our disaster type
dictionary, we identify the topic. Finally, if the topic matches
a disaster situation, we localize it. At this stage, a database query
is necessary to access the tweets.
a) Extracting keywords: As keywords, we wish to find
five words that best summarize the content of the burst. Our
approach to finding them involves calculating TF within the
burst. In other words, we only weigh each term positively
for the number of times it occurs within the burst. The
disaster dictionary is meant to contain words representative
of specific disasters. For our testing purposes, we populated
our dictionary with representative keywords of hurricane
and snowstorm disasters. The datasets were compiled from
querying Twitter’s historical stream for the specific disaster
types during their known occurrences.
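A minimal sketch of this TF-based keyword selection follows; the tokenization and the stop-word list are illustrative simplifications.

from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "in", "of", "to", "and", "on"}  # illustrative list

def burst_keywords(tweet_texts, k=5):
    """Pick the k terms with the highest term frequency within the burst."""
    counts = Counter(word for text in tweet_texts
                     for word in text.lower().split()
                     if word not in STOPWORDS)
    return [word for word, _ in counts.most_common(k)]

# burst_keywords(["snow storm hits the coast", "heavy snow on the coast"])
# -> ["snow", "coast", "storm", "hits", "heavy"]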
b) Identifying disaster type: The topic is identified by
matching any of the keywords to the part of the dictionary
describing a given (predefined) topic. As an additional check,
needed to prevent overlapping terms causing us to select the
wrong topic, we weigh the keywords negatively relative to
the number of times they occur in the dictionary entries for other
topics. The entries for topic, keywords, IDs of the contributing
tweets and their text, location (if given) and users’ information
are saved to Cassandra. The database can be queried by
other topologies (possibly spawned later to do retrospective
analysis on the tweets) without affecting the performance of
our system.
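The matching step can be sketched as follows, under the assumption that each dictionary entry is a set of terms; the unit reward and penalty are illustrative choices, not the exact weighting of our implementation.

def match_topic(keywords, dictionary):
    """Match keywords to a topic, penalizing terms shared with other topics."""
    scores = {}
    for topic, terms in dictionary.items():
        hits = [kw for kw in keywords if kw in terms]
        # a keyword counts against the topic once per other topic that also lists it
        penalty = sum(1 for kw in hits
                      for other, other_terms in dictionary.items()
                      if other != topic and kw in other_terms)
        scores[topic] = len(hits) - penalty
    best = max(scores, key=scores.get) if scores else None
    return best if best is not None and scores[best] > 0 else None

# match_topic(["blizzard", "snow"],
#             {"snowstorm": {"snow", "blizzard"}, "hurricane": {"wind", "snow"}})
# -> "snowstorm" (score 1 after the penalty for the shared term "snow")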
c) Geo-localizing the event: In order to inform only
those crisis response agencies that are directly concerned with
the occurring crisis situation, we need to be able to determine
where the event is happening. Spatial grounding of the tweets
in the burst requires considering not only the coordinates,
if given, but also the user’s location as well as the places
mentioned directly in the text of the tweet. This non-trivial task
is carried out by the Geo-Tagger subcomponent we describe
in our previous work [23].
4) Alert Component: Once a disaster event is detected and
its location, type and representative keywords are determined,
we pass the information to the Alert component. The keywords
are used to start up new data collection sessions from Twitter
both from recent history as well as from a real-time stream.
Finally, the responsible operators can be notified; for the
duration of the disaster, they are assumed to also administer
searches from other social media channels using our CrowdSA
application presented in [23]–[25].
IV. RESULTS AND EVALUATION
Due to the difficulty of determining the “ground truth” and
the appropriate metrics, a fully functional evaluation of our
system (the correct and near real-time detection of a crisis
situation) is beyond the scope of the present work. However,
we devised three types of tests to show that our proposal of
implementing the detection of crisis situations from Twitter
content in Trident is reasonable. All tests carried out are run
in an unsupervised mode, where we do not assume to know
a priori the type of event being detected. The data collected
for this purpose are from hurricane Iselle reaching Hawaii in
August 2014, a snowstorm affecting the US East coast in January
2016, and hurricane Patricia hitting Mexico in autumn 2015.
In the first test case, we published our collected historical
data to the Kafka queue and measured the processing speed
in terms of the number of tweets processed in a second.
We were able to reach an average (for the three different
datasets) processing speed of about 4000 messages per second.
This is below what the full Twitter firehose could serve (on
average, there are about 6000 messages posted on Twitter in
one second8 ). However, after increasing the parallelization and
utilizing a larger cluster, the performance can be boosted to
surpass the firehose requirements.
In the second testing scenario, we evaluated the effectiveness of the burst detection capability and of the keyword extraction by consuming the historical datasets from hurricane
Iselle. Our system detected a number of bursts within
the datasets, occurring relative to the time of day. Most
importantly, the largest burst was detected on the 10th of
August, which corresponds to our graphical inspection of the
dataset.
For the third test, we fed the system with real-time Twitter
streaming data, which accounts for about 1% of the Twitter
Firehose. Since it was not expected to encounter any hurricane
or snowstorm disaster during the run of the test, we let
the system report on all bursts which it found, essentially
skipping the Disaster Detector component. We observed that
even with only six nodes, the system was performant in that
it processed tweets fast enough without getting congested and
falling behind.
All of the above tests were run on a six-node virtual cluster,
each node with 8 cores, 16 GB of RAM, and 64-bit CentOS. The
cluster was provisioned using Ambari9 and runs Storm, a Kafka
broker, and a Cassandra server. The first node runs Nimbus
(Storm's master daemon) while the rest are the worker nodes
(running Storm’s Supervisors). Greater parallelization within
Storm was selectively applied for intensive tasks such as
the dot product calculation in similarity estimations. As a
future evaluation strategy, we plan to experiment with different
parallelization setups within Storm while processing the same
data stream of tweets to better understand the capabilities of
the distributed framework.
V. CONCLUSION AND FUTURE WORK
In this paper, we reported on our work that focused on
the problem of near real-time (online) Burst Topic Detection
from streaming Twitter data in application to detecting newly
occurring crisis situations. Our approach involves clustering
tweet messages based on content similarity and implementing
the algorithms in a parallel fashion using the Apache Storm
8 http://www.internetlivestats.com/one-second/
9 https://ambari.apache.org/
Trident API. Our results have shown that the use of First
Story Detection in combination with capturing temporal bursts
of similar messages streamed from a Twitter firehose, and
mapping the captured events to a predefined disaster dictionary
enables fast reporting of a crisis situation when implemented
in a distributed fashion.
As future work, we are interested in looking at the
potential improvements if the structure of the social network is
also considered in the similarity measurements. Additionally,
we plan to compare the results of the unsupervised event detection (as currently performed by our setup) with a supervised
approach in which we would use our dictionary to filter tweets
before they are passed to the clustering component. Last but
not least, we would like to experiment with incorporating a
more sophisticated model for filtering out the tweets before
bucketing. Namely, since in the case of a large burst, it might
be more effective to also include the very near tweets in the
processing, we would like to use a dynamic threshold value updated
based on the current state of the stream.
ACKNOWLEDGMENT
The authors would like to thank Matthias Steinbauer and
the Telecooperation Institute of JKU Linz for providing cloud
computing resources and support during the provisioning of our
cluster.
This work has been funded by the Austrian Federal Ministry
of Transport, Innovation and Technology (BMVIT) under
grant FFG BRIDGE 838526 and under grant AD WTZ
AR10/2015.
REFERENCES
[1] J. Kleinberg, “Bursty and hierarchical structure in streams,” in Proceedings of the Eighth ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, KDD ’02, (New York, NY,
USA), pp. 91–101, ACM, 2002.
[2] Q. He, K. Chang, and E.-P. Lim, “Analyzing feature trajectories for event
detection,” in Proceedings of the 30th Annual International ACM SIGIR
Conference on Research and Development in Information Retrieval,
SIGIR ’07, (New York, NY, USA), pp. 207–214, ACM, 2007.
[3] J. Weng and B.-S. Lee, “Event detection in twitter.,” ICWSM, vol. 11,
pp. 401–408, 2011.
[4] X. Gao, E. Ferrara, and J. Qiu, “Parallel clustering of high-dimensional
social media data streams,” in Cluster, Cloud and Grid Computing
(CCGrid), 2015 15th IEEE/ACM International Symposium on, pp. 323–
332, May 2015.
[5] G. Wu, O. Boydell, and P. Cunningham, “High-throughput, web-scale
data stream clustering,” in Proceedings of the 4th Web Search Click Data
workshop (WSCD 2014), 2014.
[6] K. Xiangsheng, “Microblog mining based on cloud computing
technologies: Mesos and Hadoop,” Journal of Theoretical & Applied
Information Technology, vol. 48, no. 3, 2013.
[7] J. Yin, A. Lampert, M. Cameron, B. Robinson, and R. Power, “Using
social media to enhance emergency situation awareness,” Intelligent
Systems, IEEE, vol. 27, pp. 52–59, Nov 2012.
[8] J. Rogstadius, M. Vukovic, C. A. Teixeira, V. Kostakos, E. Karapanos,
and J. A. Laredo, “Crisistracker: Crowdsourced social media curation
for disaster awareness,” IBM J. Res. Dev., vol. 57, pp. 1:4–1:4, Sept.
2013.
[9] M. Nagarajan, K. Gomadam, A. P. Sheth, A. Ranabahu, R. Mutharaju,
and A. Jadhav, Web Information Systems Engineering - WISE 2009:
10th International Conference, Poznań, Poland, October 5-7, 2009.
Proceedings, ch. Spatio-Temporal-Thematic Analysis of Citizen Sensor
Data: Challenges and Experiences, pp. 539–553. Berlin, Heidelberg:
Springer Berlin Heidelberg, 2009.
[10] C. C. Aggarwal and K. Subbian, “Event detection in social streams,” in
Proceedings of the Twelfth SIAM International Conference on Data
Mining, (Anaheim, California, USA), pp. 624–635, SIAM / Omnipress,
April 26-28 2012.
[11] H. Smid, P. Mast, M. Tromp, A. Winterboer, and V. Evers, “Canary in a
coal mine: Monitoring air quality and detecting environmental incidents
by harvesting twitter,” in CHI ’11 Extended Abstracts on Human Factors
in Computing Systems, CHI EA ’11, (New York, NY, USA), pp. 1855–
1860, ACM, 2011.
[12] T. Sakaki, F. Toriumi, and Y. Matsuo, “Tweet trend analysis in an emergency situation,” in Proceedings of the Special Workshop on Internet
and Disasters, SWID ’11, (New York, NY, USA), pp. 3:1–3:8, ACM,
2011.
[13] T. Sakaki, M. Okazaki, and Y. Matsuo, “Earthquake shakes twitter users:
Real-time event detection by social sensors,” in Proceedings of the 19th
International Conference on World Wide Web, WWW ’10, (New York,
NY, USA), pp. 851–860, ACM, 2010.
[14] F. Atefeh and W. Khreich, “A survey of techniques for event detection
in twitter,” Comput. Intell., vol. 31, pp. 132–164, Feb. 2015.
[15] H. Becker, M. Naaman, and L. Gravano, “Learning similarity metrics for
event identification in social media,” in Proceedings of the Third ACM
International Conference on Web Search and Data Mining, WSDM ’10,
pp. 291–300, ACM, 2010.
[16] S. Petrović, M. Osborne, and V. Lavrenko, “Streaming first story
detection with application to twitter,” in Human Language Technologies:
The 2010 Annual Conference of the North American Chapter of the
Association for Computational Linguistics, HLT ’10, pp. 181–189,
Association for Computational Linguistics, 2010.
[17] W. Xie, F. Zhu, J. Jiang, E.-P. Lim, and K. Wang, “Topicsketch: Realtime bursty topic detection from twitter,” in Data Mining (ICDM), 2013
IEEE 13th International Conference on, pp. 837–846, Dec 2013.
[18] Q. Diao, J. Jiang, F. Zhu, and E.-P. Lim, “Finding bursty topics from
microblogs,” in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1, ACL ’12,
(Stroudsburg, PA, USA), pp. 536–544, Association for Computational
Linguistics, 2012.
[19] S. Liu, D. Shaw, and C. Brewster, “Ontologies for crisis management: a
review of state of the art in ontology design and usability,” in Proceedings of the Information Systems for Crisis Response and Management
conference (ISCRAM 2013 12-15 May, 2013), 2013.
[20] D. De Wrachien, J. Garrido, S. Mambretti, and I. Requena, “Ontology
for flood management: a proposal,” Flood Recovery, Innovation and
Response III, vol. 159, p. 3, 2012.
[21] Y. Matsuo and M. Ishizuka, “Keyword extraction from a single document
using word co-occurrence statistical information,” International Journal
on Artificial Intelligence Tools, vol. 13, no. 01, pp. 157–169, 2004.
[22] W. X. Zhao, J. Jiang, J. He, Y. Song, P. Achananuparp, E.-P. Lim, and
X. Li, “Topical keyphrase extraction from twitter,” in Proceedings of the
49th Annual Meeting of the Association for Computational Linguistics:
Human Language Technologies-Volume 1, pp. 379–388, Association for
Computational Linguistics, 2011.
[23] A. Salfinger, W. Retschitzegger, W. Schwinger, and B. Pröll, “Crowd
sa – towards adaptive and situation-driven crowd-sensing for disaster
situation awareness,” in Cognitive Methods in Situation Awareness
and Decision Support (CogSIMA), 2015 IEEE International InterDisciplinary Conference on, pp. 14–20, IEEE, 2015.
[24] A. Salfinger, S. Girtelschmid, B. Pröll, W. Retschitzegger, and
W. Schwinger, “Crowd-sensing meets situation awareness: A research
roadmap for crisis management,” in System Sciences (HICSS), 2015 48th
Hawaii International Conference on, pp. 153–162, IEEE, 2015.
[25] A. Salfinger, W. Retschitzegger, W. Schwinger, and B. Pröll, “Mining
the disaster hotspots – situation-adaptive crowd knowledge extraction
for crisis management,” in Cognitive Methods in Situation Awareness
and Decision Support (CogSIMA), 2016 IEEE International InterDisciplinary Conference on, in press, 2016.
Automatic protocol based intervention plan
analysis in healthcare
Miklos Kozlovszky1,2, Levente Kovács3, Khulan Batbayar1, Zoltán Garaguly1
1 Biotech Knowledge Center/Obuda University, Budapest, Hungary
2 MTA SZTAKI/Laboratory of Parallel and Distributed Computing, Budapest, Hungary
3 Physiological Controls Group/Obuda University, Budapest, Hungary
{kozlovszky.miklos@nik, kovacs.levente@nik, khulan batbayar@biotech, garaguly.zoltan@biotech}.uni-obuda.hu
Abstract - Evidence and protocol based medicine decreases the complexity of, and at the same time standardizes, the healing process. Intervention descriptions are only moderately open to the public, and they differ more or less at every medical service provider. Normally, patients are not very familiar with the steps of the intervention process. There is a certain need expressed by patients to view the whole healing process through intervention plans, so that they can prepare themselves in advance for the coming medical interventions. Intervention plan tracking is a game changer for practitioners too: they can follow the clinical pathway of the patients and can receive objective feedback from various sources about the impact of the services. Resource planning (with time, cost and other important parameters) and resource pre-allocation have become feasible tasks in the healthcare sector. The evolution of consensus protocols developed by medical professionals and practitioners requires accurate measurement of the difference between plans and real world scenarios. To support these comparisons we have developed the Intervention Process Analyzer and Explorer software solution. It enables practitioners and healthcare managers to review in an objective way the effectiveness of interventions targeted at health care professionals and aimed at improving the process of care and patient outcomes.
Keywords - health care intervention, process analyzer
I. INTRODUCTION
A growing demand can be seen in the healthcare sector
for value-added, premium category medical services,
which involve continuous medical monitoring and care.
Another aspect within Europe is the increasing
tendency of patient tourism and the ever growing number
of medical services for foreigners at premium service
providers. Interventions at a premium medical service provider always have well defined parameters (e.g. cost, duration, etc.), and the whole intervention plan of a patient can be easily represented by a timed graph structure. The intervention plan is part of the clinical pathway and, on a larger scale, an important part of the patient's life path.
An intervention plan is basically built up from generic consensus protocol(s); additionally, it is always patient-centered and thus contains a patient-related, personalized parameter set as well. If we could compare the planned interventions (intervention plan) with the occurred interventions, we could objectively assess the differences in medical services, analyze their impact, the effectiveness of the core consensus protocol, and the real usage scenarios of the underlying protocol. With this motivation we have created the Intervention Process Analyzer and Explorer software solution, which is able to automatically assess deviations from the intervention plan in an objective way. Our paper is organized as follows: in the first section we give an overview of our term definitions; in the next section we detail the predefined requirements of the solution and the identified data sources. This is followed by an overview of the intervention plan analysis (the analysis levels and the used methods); then we provide a short description of the internal architecture of the solution and finally summarize our results.
II. STATE-OF-THE-ART
In recent years the Electronic Medical Record and the medical billing process have been merging, providing a solid foundation for more effective tracking of patients, medical services and interventions. Patient Health Records (PHR) are evolving into digital format. HIMSS Analytics developed the Electronic Medical Record Adoption Model (EMRAM) in 2005 [13] as a methodology for evaluating the progress and impact of electronic medical record systems for hospitals and hospital ambulatory facilities. In this model, eight stages (0-7) have been identified that measure a hospital's implementation and utilization of digital information technology applications. The Electronic Health Record (EHR) is basically a longitudinal electronic record of patient health information generated by one or more encounters in any care delivery setting. It contains information about patient demographics, progress notes, problems, medications, vital signs, past medical history, immunizations, laboratory data and various reports.
Electronic Health Records can be stored at medical service providers or accessed online by the public [10, 11, 12]. Workflows are invading many medical fields. For example, surgical workflows and their analysis are nowadays a hot topic, since procedure modeling, workflow optimization, and skill assessment can provide more effective and precise procedures in the operating room (OR). Other medical workflows and treatment plans are usually accessible only as consensus protocols; however, the real implementation of such protocols (orders, used materials, drugs, timelines, etc.) differs significantly from hospital to hospital. Some generic solutions are available both on the academic side (quantitative and qualitative workflow and EHR system evaluations) and on the commercial side (e.g. the HealthCloud from Salesforce [8], and cloud based Practice Management systems).
A reliable and robust coding scheme and naming convention system is inevitable to build up future proof intervention plan databases. Currently there are still problems in the healthcare sector:
• coding schemes differ at country level (here we should note that most of the codes are based on ICD-x published by WHO [14]);
• coding schemes are by default rigid structures; however, due to new medical interventions their scope (list) is always growing (WHO's most recent version is ICD-10, which was endorsed by the 43rd World Health Assembly in May 1990 and came into use in WHO Member States from 1994; ICD is currently under revision, and the planned release date for ICD-11 is 2018);
• new national level coding and naming convention scheme versions are in many cases not backward compatible;
• different codes can be mapped to the same element.
A. Used term definitions
In the following we shortly describe the most important basic terms of our paper, defined/adapted from the literature [1, 4, 5]:
• Protocol: a generic set of interventions (events and activities) and rules of a certain healthcare domain for a well defined group of patients, developed as a consensus agreement by a team of experts. It can constantly evolve over time (it can have versions and can expire after a well defined period of time); it is based on the best practices of healthcare professionals (and also on patient expectations) and the recorded patient statistics. It contains a set of events and activities with time and spatial constraints. Its internal structure can be represented as a formal process description or a graph (protocol graph), which can contain iterations and conditional alternative paths.
• Intervention plan: belongs to a single patient and contains events, activities and medical services with time, location, and resource parameters in a personalized manner. It can contain multiple protocols and also complex control patterns (such as iterations, conditions, etc.).
• Intervention plan graph: visualization of the intervention plan as a directed graph.
• Clinical pathway: contains a set of events and activities with time and spatial constraints described in at least one (personalized) intervention plan.
• Realized interventions: a multisource dataset which belongs to a single patient and contains event, activity and medical service logs, with time, location and patient data and various resource parameters. It is a clear reflection of what historically occurred with the patient during the clinical pathway.
III. AIMS AND REQUIREMENTS
Premium medical service providers are only viable if they are able to run their “business” with high efficiency. The market filters out all medical service providers that are not able to assess their potential and the quality of their services. These data can only be collected with accountable, objective measurements and continuous, high resolution service monitoring. On the patient side new requirements have appeared, such as pre-evaluation and interactive monitoring of health services and intervention modeling/virtualized service models with high accuracy. Our Intervention Process Analyzer and Explorer solution provides an effective way to do comparative analysis between the intervention plans and the occurred interventions, assess the differences and provide large scale statistics about the frequently used intervention scenarios. Among the identified requirements, the realized solution should support combination with external healthcare databases and data mining applications to reveal more aspects of healthcare services and their effectiveness on patient health. The analysis process should support an offline running mode, totally independent from any interventions and without any intervention plan status restrictions (even if the targeted intervention plan is not in a finished or closed state).
IV. DATA SOURCES
Our software solution collects information from various data sources and stores these data in a semi-structured data repository for further processing. The simplified system overview is shown in Figure 1.

Figure 1. System overview
The collected main data sets are the following:
• Patient data
  o Patient base data
  o Patient survey data (satisfaction survey, etc.)
  o Anamnesis data
• Healthcare service data
• Historical logs received from the medical service provider or any healthcare systems
• Protocol data
• Intervention plan data
• Realized intervention data (from the healthcare system)
V. BUILDING UP AND ANALYSIS OF INTERVENTION PLANS

A. Analysis process
Two types of analysis have been identified: simple analysis (individual intervention plan vs. the realized intervention set of a single patient) and population scale analysis (complex, multi-parameter level comparative analysis of the intervention plans of a set of patients), shown in Figure 3:
• Simple analysis compares the intervention plan's graph structure, with almost 150 different pre-defined performance indicators, to the occurred interventions (and the measured parameters). We try not only to compare the parameters, but also to assess the impact of the differences.
• Population scale analysis can be a handy tool for practitioners and healthcare managers to create statistical analyses about protocol usage, average statistical parameters and expected patient outcomes.
Figure 2. Intervention plan editor and visualizer GUI

On the intervention plan editor GUI (shown in Figure 2) the intervention plan can be described as an (arbitrarily complex) workflow, where small circles represent start/stop events, X represents alternative pathways with conditions, large boxes with labels and small icons represent pre-defined intervention processes, and arcs define the direction of the path. Intervention plan analysis can be done from many viewpoints. In the GUI, different user groups (healthcare managers, medical experts/practitioners) can do analysis both on individual intervention plans and on large sets of them.
Figure 3. Intervention plan vs. realized intervention comparison
B. Analysed parameters
We have identified a large set of intervention task parameters (Pi) which help during evaluation to objectively measure how well the plans match the real world:
• Sequence of the intervention steps
• Number of intervention steps
• Intervention step sub-parameters
• Conditional path decision accuracy
• Iterations
• Service resource consumption
Logically, an intervention step parameter can be arbitrarily complex and can contain an undefined number of sub-parameters in a recursive form.
C. Used algorithms
We use different types of algorithms to compare the intervention plan with the realized interventions:
• The Boyer-Moore algorithm [6], which is basically a string searching algorithm.
• The Needleman-Wunsch algorithm [7], an algorithm used mainly in bioinformatics to align protein or nucleotide sequences using dynamic programming. It provides global sequence alignment with penalties (a sketch follows below).
• The Smith-Waterman algorithm [8], which is also a bioinformatics oriented matching algorithm. It provides local sequence alignment using penalties.
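As an illustration of how such an alignment applies to intervention step sequences, the following Python sketch implements the classic Needleman-Wunsch recurrence; the scoring values and step names are illustrative, not those used in our solution.

def align_score(planned, realized, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score of two intervention step sequences."""
    n, m = len(planned), len(realized)
    # dp[i][j] = best score aligning planned[:i] with realized[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if planned[i - 1] == realized[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + s,   # align step with step
                           dp[i - 1][j] + gap,     # planned step was skipped
                           dp[i][j - 1] + gap)     # unplanned extra step occurred
    return dp[n][m]

# align_score(["admission", "xray", "surgery"], ["admission", "surgery"]) -> 1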
D. Parameter difference evaluation
After the analysis process we evaluate the parameter value differences. We denote the i-th parameter of the intervention process P as Pi. We have defined an impact score matrix and assigned an impact score (0 < KPi < 1) to each member of the parameter set. We use a simplified linear evaluation function to calculate the impact of the differences (I):
di = |Pi − P′i| ,    (1)

where Pi holds the planned and P′i the really occurred value of the i-th intervention parameter,

I = Σi KPi · di ,    (2)

simply calculates the weighted difference of the planned and occurred intervention tasks. We note, however, that the impact score value definitions are subjective. A larger I means a larger distance between the plans and the real world scenarios. We can use the calculated values to search for alternative intervention graph paths or simply to optimize between the alternatives. Multiple impact score matrices can be used to analyze the intervention graphs from different viewpoints (a medical service provider can easily have different subjective impact values than a patient).
In the current version of our solution only similar intervention process parameter types can be compared. This is caused by the fact that we are not using any substitution matrix to define mapping possibilities between the different intervention processes and their parameters. The evaluation results provide us a handy way to compare and analyze intervention plans with the occurred intervention records. Similarities help to define real consensus intervention paths. The measurable differences are interesting investigation points where a premium medical service provider can gain vital information about its service quality and patients' requirements.
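In Python, the evaluation of (1)-(2) reduces to a few lines; the parameter names and score values below are illustrative:

def impact(planned, realized, scores):
    """I = sum_i K_Pi * |P_i - P'_i|, cf. (1)-(2); dictionary keys are parameter names."""
    return sum(scores[name] * abs(planned[name] - realized[name]) for name in planned)

# impact({"duration_h": 2.0, "visits": 3}, {"duration_h": 3.5, "visits": 4},
#        {"duration_h": 0.8, "visits": 0.5})  ->  0.8*1.5 + 0.5*1 = 1.7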
VI. SUMMARY
The aim of our research was to define a medical intervention plan analyzer software solution equipped with accurate intervention evaluation algorithms and combined with a user friendly graphical interface. With the designed and developed software solution we are able to compare arbitrarily complex intervention graph structures with occurred interventions received as patient health records or intervention logs. Planned intervention task parameters are mapped to the occurred intervention parameters in an automatic way. We have defined the weight of each intervention parameter and calculate the weighted absolute distance of the two parameter values as the impact of the occurred differences.
From the parameter evaluation a lot of “hidden” information can be extracted, such as information about the planning accuracy of the medical professionals, the difference between the implemented intervention plans and the official, so called consensus intervention protocols, the correlation factors between the (really) occurred interventions and the patient outcomes, or between the user satisfaction level and the occurred interventions. The developed framework solution is a generic intervention plan analyzer; however, it can be used or adapted easily in a large set of medical domains. We have successfully validated the system within the premium diabetes and dental care medical service domains in Hungary.
VII. ACKNOWLEDGMENTS
The projects have been supported by the European Union. The authors would like to thank the GOP-1.1.1-11-2012-0055 „DIALOGIC – Mathematical model based decision support system for diabetes monitoring” project for its financial support. The authors would also like to thank PPT Ltd. for their research support, advice and useful contributions.
REFERENCES
[1] Surján György, Borbás Ilona, Gődény Sándor, Juhász Judit, Mihalicza Péter, Pékli Márta, Kincses Gyula, Varga Eszter, Juhász Judit, Nagy Attila, Szabó Dóra, Vargáné Lőrincz Ildikó; Egészségtudományi fogalomtár [Glossary of health sciences], http://fogalomtar.eski.hu/index.php/Kezd%C5%91lap
[2] Carry M. Renders, Edward H. Wagner, Gerlof D. Valk, Jacques Th.M. van Eijk, Simon J. Griffin, Willem J.J. Assendelft; Interventions to Improve the Management of Diabetes in Primary Care, Outpatient, and Community Settings: A systematic review; Diabetes Care, Volume 24, Number 10, pp. 1821-1833, October 2001
[3] Dr. Vanessa Diaz, Marco Viceconti, Veli Stroetmann, Dipak Kalra et al.; Roadmap for the Digital Patient; DISCIPULUS project report, March 2013
[4] A betegút szervezés módszertana és európai példái [The methodology of clinical pathway organization and its European examples]; HealthOnLine Hírlevél; 2012/7 – special issue; 27 August 2012
[5] De Bleser L, Depreitere R, Waele K, Vanhaecht K, Vlayen J, Sermeus W (2006): Defining pathways. Journal of Nursing Management, October 2006, vol. 14, issue 7, pp. 553-563.
[6] Boyer, Robert S.; Moore, J Strother (October 1977). “A Fast String Searching Algorithm”, Comm. ACM (New York, NY, USA: Association for Computing Machinery) 20 (10): 762-772. doi:10.1145/359842.359859. ISSN 0001-0782.
[7] Needleman, Saul B.; Wunsch, Christian D. (1970). “A general method applicable to the search for similarities in the amino acid sequence of two proteins”. Journal of Molecular Biology 48 (3): 443-453. doi:10.1016/0022-2836(70)90057-4. PMID 5420325
[8] HealthCloud from Salesforce: http://www.salesforce.com/industries/healthcare/healthcloud/?d=70130000002DuoI, accessed on 20 March 2016
[9] Mala Ramaiah, Eswaran Subrahmanian, Ram D Sriram, Bettijoyce B Lide; Workflow and Electronic Health Records in Small Medical Practices; Perspect Health Inf Manag. 2012 Spring; 9(Spring): 1d. Published online 2012 Apr 1. PMCID: PMC3329208
[10] EHR Portal, http://www.practicefusion.com/phr/, accessed on 20 March 2016
[11] ChARM PHR Portal, https://charmphr.com/login.sas, accessed on 20 March 2016
[12] IntelliChart Portal, http://www.intelichart.com/, accessed on 20 March 2016
[13] EMRAM, http://www.himssanalytics.org/provider-solutions#block-himss-general-himss-prov-sol-emram, accessed on 20 March 2016
[14] ICD-10, Classification of Diseases and Related Health Problems, 10th Revision, Volume 2, Instruction manual, World Health Organization, 2010 Edition, ISBN 978 92 4 154834 2 (NLM classification: WB 15)
Using Fourier and Hartley Transform for Fast,
Approximate Solution of Dense Linear Systems
Željko Jeričević* and Ivica Kožar**
* Department of Computer Engineering/Engineering Faculty, Rijeka, Croatia
** Department of Computer Modeling/Civil Engineering Faculty, Rijeka, Croatia
zeljko.jericevic@riteh.hr
Abstract - The solution of a linear system of equations is one of the most common tasks in scientific computing. For large dense systems this requires a prohibitive number of operations, of the order of magnitude n³, where n is the number of equations and also of unknowns. We developed a novel numerical approach for finding an approximate solution of this problem based on the Fourier or Hartley transform, although any unitary, orthogonal transform which concentrates power in a small number of coefficients can be used. This is a strategy borrowed from digital signal processing, where pruning redundant information from spectra or filtering selected information in the frequency domain is the usual practice. The procedure is to transform the linear system along the columns and rows to the frequency domain, generating a transformed system. The least significant portions of the transformed system are deleted as whole columns and rows, yielding a smaller, pruned system. The pruned system is solved in the transform domain, yielding the approximate solution. The quality of the approximate solution is compared against the full system solution. Theoretical evaluation of the method relates the quality of the approximation to the perturbation of the eigenvalues of the residual matrix. Numerical experiments illustrating the feasibility of the method and the quality of the approximation, together with an operation count, are presented.
I. INTRODUCTION
The solution of a system of linear equations is one of the basic methods in scientific computing. In the case of dense systems, on the order of N³ multiplications are required (N is the dimension of the square matrix).
This paper extends the previous work [1],[2] in which we developed a framework for the efficient solution of linear least squares problems. We also supplement the previous analysis
with classical forward and backward error analysis [3].
Some equations developed previously [1] are briefly
repeated here as final results without the details. The
basic idea of constructing the approximate solutions for
large, dense systems using the Fourier or Hartley space
representation remains the same.
II. THEORY
The Fourier transform is formally done by multiplication with the Fourier matrix, but in actual computations the FFT is used whenever possible.

b = Ac    (1)
Applying the Fourier transform to the system of linear equations (1) is done along the columns. Premultiplying (1) with the Fourier matrix F and inserting F⁻¹F = I after the matrix A performs the transform along the right hand side vector b and along the columns of matrix A, as well as the inverse Fourier transform along the rows of A and the transform of the solution vector c (2). It is a good idea that, before the transformation, the elements of vector b and the corresponding row and column vectors in matrix A are sorted in such a way that the transform has close to the most compact representation. That concentrates the energy of the Fourier transform in as small a number of frequencies as possible.
Fb = (FAF⁻¹)(Fc)    (2)
To avoid computations with complex numbers in the case of real systems, the Hartley transform [4] can be used. This transformation of the original system, with the terms grouped as shown by the parentheses, will yield the Fourier or Hartley transform of vector c as the solution of system (2).
III. METHOD
After the transformation, the system (2) is of the same size as the original system, but it can be pruned down by deleting rows and columns containing “insignificant” information, yielding a smaller system. The information is termed insignificant in the signal processing sense: the frequencies whose magnitudes constitute the smallest percentage of the total magnitude are discarded. The selection of significant frequencies is accomplished by computing the magnitudes of the frequencies in vector b and sorting them in decreasing order. For the columns, the sum of the frequencies of each column is used for that purpose. Using that information, the transforms of vector b and matrix A can be shortened into significant parts to be retained and insignificant parts to be discarded. The solution of the pruned system will yield the Fourier interpolation of the solution vector c. This approach decreases the size of the system and represents the filtering out of non-significant frequencies in order to build a smaller model system with fewer equations and fewer unknowns than the original system [5]. In this respect our approach differs from Beylkin's [6], whose idea was to increase the sparsity of the matrix by using the wavelet transform. Using the residual matrix R and a Taylor series expansion [1], it was shown that the approximate inverse converges faster to the exact inverse the smaller the eigenvalues (λ) of the residual matrix R are.
The pruned system and the convergence of the approximate inverse can be summarized as follows (3):

( y = Fb,  y = Bx,  B = FAF⁻¹,  x = Fc )
Bp ≈ B
R ≡ I − Bp⁻¹B  ⇒  Bp⁻¹B = I − R
B = Bp(I − R)  ⇒  B⁻¹ = (I − R)⁻¹Bp⁻¹
δx = (I − R)⁻¹B0⁻¹δy
(I − R)⁻¹ = (QQ⁻¹ − QΛQ⁻¹)⁻¹ = [Q(I − Λ)Q⁻¹]⁻¹ = Q(I − Λ)⁻¹Q⁻¹
lim(λi→0) Q diag(1/(1 − λi)) i=1,…,n Q⁻¹ → I
(I − R)⁻¹ = I + R + R² + R³ + …
Bn⁻¹ ≡ (I + R + … + Rⁿ)Bp⁻¹
B∞⁻¹ → B⁻¹    (3)

where index p denotes the pruned system and Bp is the pruned approximation of B based on the most significant frequencies. Solving the model system will yield the approximate solution of the original system (1). The approximate solution (x + δx) differs from the true solution x by the difference vector δx. It is important to show how the difference vector behaves depending on the size of matrix Bp. Equations (3) show that R should be a contraction mapping: its eigenvalues should all be non-negative and smaller than one. The smaller the eigenvalues are, the closer (I − R)⁻¹ is to the identity matrix, as can be seen from the limit in (3), and consequently the closer Bp⁻¹ is to B⁻¹.

The forward (4) and backward (5) error analyses are defined here:

Δf = ||c − cp||₂ / ||c||₂    (4)

Δb = ||b − bp||₂ / ||b||₂    (5)

The system of linear equations based on the Hilbert matrix (an ill conditioned but non-singular matrix) is used here to illustrate the change in condition number for matrices of decreasing size. The numerical experiments with forward and backward error analysis are summarized in Table I. Table I contains the results for the rearranged Hilbert matrix. The rearrangement was done to reduce the Gibbs ringing due to the difference between the first and the last elements in each column and row.

TABLE I
N	% freq	Con.#	||c||₂	Δf	||b||₂	Δb
5	100	4.7x10^5	0.64	–	0.63	–
4	83	1.2x10^5	0.62	0.29	0.63	6.9x10^-6
3	68	874	0.70	0.22	0.63	5.3x10^-5
2	43	24	1.09	1.20	0.62	4.3x10^-3
1	25	1	0.43	1.03	0.60	1.9x10^-1

where N is the dimension of the square system, the second column defines how much of the total frequency magnitude is used in constructing the solution, and Con.# is the condition number of the matrix computed as the ratio of the largest and smallest singular values. Forward and backward errors are denoted Δf and Δb, respectively. Solutions for the Hilbert system (vector c) are shown in Figure 1 and the restored right hand side (vector b) in Figure 2.

Figure 1. Solutions of the linear system: exact solution (circle), N=4 (square), N=3 (diamond), N=2 (triangle up), N=1 (triangle down).
Figure 2. Restored right hand side of Ac=b: exact vector b (circle), N=4 (square), N=3 (diamond), N=2 (triangle up), N=1 (triangle down).

Some interesting conclusions can be drawn from the figures. The restoration of b is fairly good even for the worst of the computed solutions. Consequently, the backward error analysis is too optimistic. The results of the forward error analysis are more realistic, but they are not easy to compute, requiring the correct answer to evaluate an approximate one. The approach with eigenvalues (3) seems overly expensive, but to evaluate the convergence only the largest eigenvalue is required. If it is less than 1, the iterative improvement will converge.

IV. CONCLUSION

An approximate method for the fast solution of linear problems has been proposed. The quality of the approximation can be tested and controlled. We developed the equation which shows how the eigenvalues of the residual matrix determine the quality of the solution and the extent of the corrections to be applied in order to approach the true solution. The proposed method offers a more detailed assessment of the quality of the solution than classical forward and backward error analysis.

REFERENCES
[1] Jeričević Ž., Kožar I., “Faster Solution of Large, Over-Determined, Dense Linear Systems”, DC VIS MIPRO (2013) 228-231.
[2] Jeričević Ž., Kožar I., “Theoretical and statistical evaluation for approximate solution of large, over-determined, dense linear systems”, DC VIS MIPRO (2015) 227-229.
[3] O’Leary, D.P., “Scientific Computing With Case Studies”, SIAM, Philadelphia, 2009, pp. 383.
[4] Bracewell, R.N., “The Hartley Transform”, Oxford University Press, New York, 1986, p. 160.
[5] Jeričević, Ž., “Approximate Solution of Linear Systems”, Croatica Chemica Acta, Vol. 78 (2005) 601-615.
[6] Beylkin, G., Coifman, R., and Rokhlin, V., “Fast Wavelet Transforms and Numerical Algorithms”, Comm. Pure App. Math., Vol. 44 (1991) 141-183.
Procedural Generation of Mediterranean
Environments
N. Mikuličić* and Ž. Mihajlović*
* University of Zagreb, Faculty of Electrical Engineering and Computing, Zagreb, Croatia
niko.mikulicic@gmail.com, zeljka.mihajlovic@fer.hr
Abstract - This paper describes the overall process of procedural generation of natural environments through terrain generation, texturing and scattering of terrain cover. Although the described process can be used to create various types of environments, the focus of this paper is on the Mediterranean, which is somewhat specific and has not yet received attention in scientific papers. We present a novel technique for procedural texturing and scattering of terrain cover based on cascading input parameters. Input parameters can be used to scatter vegetation simply by slope and height of the terrain, but they can also be easily extended and combined to use more advanced parameters such as wind maps, moisture maps, per plant distribution maps etc. Additionally, we present a method for using a satellite image as an input parameter. Comparing the results with real-life images shows that our approach can create plausible, visually appealing landscapes.

Keywords: procedural modeling, landscape generation, virtual environments, natural environments
I. INTRODUCTION
Reproducing realistic environments has always been a challenge in computer graphics, mainly due to the large amount of data that needs to be simultaneously processed and displayed on screen. With many optimization methods and ever advancing graphics hardware, realistic landscapes have become achievable in real time and are widely used in computer games, military simulators and the film industry. Recent trends show that virtual landscapes have also found use in touristic promotion, and since environmental research has become an important topic in recent years, virtual environments can be expected to play a major role in simulating how changes in different environmental parameters influence the surrounding environment.
Manually creating a whole environment would be a long-lasting, tedious task; therefore, many procedural techniques have been developed to aid the process.
Although procedural, these methods usually have many
parameters and finding the right values is often a
time-consuming trial and error process. In recent years,
various methods for inverse procedural modeling have
been introduced [7][22]. Rather than setting parameter
values manually, these methods provide ways to learn
them. Nevertheless, procedural modeling remains the
most common approach to generation of environments.
Authors discussing the generation of natural virtual environments usually reproduce mountain meadows and dense forests. Although visually appealing, tall and dense trees efficiently hide everything behind them, so many objects can be safely culled and popping effects can be easily hidden. In this paper we focus on natural Mediterranean environments, which have not yet been discussed. Mediterranean environments often lack tall trees and instead have very dense shrubbery. Since shrubs often cannot hide the landscape behind them, various optimization techniques have to be implemented with extra care for popping effects to be properly hidden and for the environment to be efficiently rendered in real time.
Our generation process is done in four steps: preparation, terrain mesh generation, texture mapping and scattering of terrain cover. In the preparatory step, a user defines a set of textures and models to be used and assigns them input parameter values. The second step creates a terrain triangle mesh from a heightmap, which is followed by procedural generation of a splat map and texturing of the terrain in the third step. The last step places the terrain cover using scattering algorithms to achieve natural randomness. Although the process is divided into four steps, the user often returns to the preparatory step to adjust parameter values and then repeats the process until satisfied with the results. The whole process needs to be repeated only when the user changes the terrain geometry. Otherwise, the user can focus on just one step.
Although the focus of this paper is on Mediterranean,
the described process can also be used to reproduce any
other natural environment simply by using different
textures and models.
After presenting related work in chapter two, chapter three continues by explaining the techniques used to represent terrain, textures and terrain cover, as well as the optimizations required for real time performance. The fourth chapter focuses on the algorithms used for procedural texturing and scattering of terrain cover. We present our results in chapter five and give a conclusion and some guidelines for future work in chapter six.
II. RELATED WORK
This work belongs to procedural modeling of natural
environments, and as such, it connects various fields of
computer graphics.
Procedural modeling automates the process of
content creation by using a set of rules and algorithms rather
than creating content manually. It is often applied where
manual creation of content would be too cumbersome a
task. Procedural modeling has been successfully used to
create various types of content some of which include
textures [6], plants [10], buildings [12], cities [18], and
terrains [8].
Terrain generation is often being done in two steps:
generation of heightmap and creation of 3D mesh.
Heightmaps are usually generated using fractal algorithms
[1][11], noises [1][6] and erosions [1][13][17]. After
creating a heightmap, terrain mesh is generated and
optimized using level of detail algorithms. Static LOD
algorithms reduce the computational load at the cost of higher memory
demands and often need a nontrivial and rigid preprocessing
stage [23]. Continuous LODs are generated dynamically
and are therefore more flexible and scalable but also
computationally more demanding [15][21].
Forest representation includes representations of
individual trees as well as the process of achieving their
natural distribution in ecosystem. Trees have been
represented with level of detail [3], billboards [20],
parallel billboards [9], volumetric textures [16] and
billboard clouds [5]. To create an ecosystem, trees are
scattered using global-to-local approaches [4][14] that
define some global rule by which the scattering occurs.
Another approach is local-to-global [2][4][14] that models
interaction between individual plants from which the
global distribution arises.
III. RENDERING
A natural environment consists of several somewhat
independent layers: terrain, textures and terrain cover. In
order to achieve efficient rendering, optimization should
be done in each layer.
Terrain is usually represented with a heightmap
which can be generated procedurally or downloaded from
the Internet in case real-world data is needed. Fig. 1 shows
the heightmap of the Croatian island Ist (left) and a heightmap
generated using Perlin noise and Fractional Brownian
motion (right). To generate the terrain triangle mesh from
the heightmap we use Lindstrom-Koller simplification [15] on
a single terrain region. Creating multiple regions with
levels of detail is left for future work.
Textures are applied to terrain by simple planar
texture mapping. Splat mapping technique is used to
provide more surface details close to observer while
simple color mapping is applied to more distant areas
where details are not visible anyway.
Terrain cover is split into two subcategories: details
and high cover.
Details are small objects, like grass, visible only from
short distances. Due to their size, many objects are needed
to cover any portion of a terrain so they tend to use up
most of the available computational resources. In order to
achieve real-time rendering of large fields covered with
grass, various optimization methods have to be
implemented. Modeling each blade of grass independently
would quickly reach the maximum number of vertices and polygons a graphics card can process in real time and, for that reason, more efficient approaches have been developed.

Figure 1. Heightmap of the Mediterranean island Ist, Croatia (left) and a procedurally generated heightmap (right)
We represent multiple blades of grass with one axial
billboard. Many billboards are needed to cover an area
with grass and sending each billboard to GPU
independently would mean too many draw calls and
therefore, slower rendering. Instead, multiple billboards
are grouped together and sent to GPU as point clouds in
which every vertex represents the position of a single
billboard. Billboard geometries are then created at runtime
in geometry shader at positions defined by vertices in the
given point cloud. Additionally, since details are visible
only from short distance, all billboards with distance to
observer greater than some user defined value are culled.
Splitting the terrain into regions would speed up the culling process,
allowing large groups of billboards to be discarded with a
single distance check. To avoid sudden popping effects, a
transitional area is used in which a billboard goes from
completely invisible to fully visible using alpha cutout
technique.
High cover includes all objects that can be viewed
from greater distances like trees or larger rocks. To
optimize the number of polygons in a scene, levels of
detail have been used. Objects close to the observer are
rendered using high quality models, while lower quality
models are used on objects that are further away. To avoid
popping effects while transitioning between different
models of the same object, a cross-fade technique is used.
Both models are rendered for some small amount of time,
while one is slowly fading in and the other is fading out.
For the lowest quality model of a tree we use a simple
axial billboard.
IV. PROCEDURAL GENERATION
In this chapter we focus on procedurally transforming
input parameters into texture weights and terrain object
placement probabilities. Texture weights are used to
create splat maps and color maps for terrain texturing
while terrain object placement probabilities are used to
generate terrain cover using scattering algorithms.
A. Weights Calculation
Input parameters are defined in the preparatory step. For each input parameter, every texture and terrain object defines minimum and maximum values and a weight function to describe the parametric area it resides in. For example, a texture can be given a slope input parameter. To define the parametric area it resides in, the texture has to specify a minimum and maximum slope. The weight function is used to describe the texture's weight or preference in the area between the minimum and maximum. Fig. 2 represents a common weight function, although any curve can be used. Weight functions are defined on a normalized interval and scaled to fit the specified range.
Once we have defined the parametric areas for all textures and terrain objects, their weights can be easily calculated. For a texture or terrain object with parameters p1, …, pn and weight functions w1, …, wn, the weight w at terrain position x is defined as:

w(x) = ∏ i=1..n wi(fi(x)) ,    (1)

where n is the number of parameters and fi is a parameter dependent function which maps the terrain position x to a parameter value. It can represent calculating surface slope or elevation, sampling a value from a texture, etc. Put simply, the final weight is calculated by multiplying the weights of the individual parameter values at the given position.
B. Texturing
Texture weights are calculated to give us information
how much each texture belongs to specified area. This
information can be used to procedurally texture a terrain.
For terrain texturing we used splat mapping and color
mapping techniques.
The color mapping technique textures the whole terrain with just one texture. One pixel of the color map usually covers a rather large portion of terrain, so this technique usually results in dull and blurry terrain when observed from up close. It would require extremely large textures to create a detailed surface using this technique, which would be quite inefficient.
To add more surface detail, the splat mapping technique is used. Splat mapping uses multiple highly detailed surface textures that are mapped to a small part of the terrain and then tiled to fit the rest. One additional texture, called a splat map, is mapped to the whole terrain and provides information on how much each surface texture contributes to the final color of the surface. The final color is determined dynamically in the fragment shader by sampling all surface textures and multiplying the sampled colors with their contributions sampled from the splat map.
The splat map can be generated directly from the weights defined in the previous section. One pixel in the splat map corresponds to a terrain area at position $\mathbf{x}$. Texture weights are then calculated using (1), scaled to sum to 1 and stored in a single splat map pixel, one channel per surface texture. This means that one splat map can store weights for four surface textures. We can always use a second splat map for four new surface textures, but that can become inefficient. Usually, four surface textures are more than enough.
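A minimal sketch of splat map generation under the same assumptions, reusing the hypothetical texture_weight helper above (NumPy is used only for the pixel array):

import numpy as np

def build_splat_map(size, texture_params):
    # texture_params: per-texture parameter lists as in eq. (1); at most four
    # textures, since one RGBA channel stores one texture's weight.
    assert len(texture_params) <= 4
    splat = np.zeros((size, size, 4), dtype=np.float32)
    for y in range(size):
        for x in range(size):
            w = [texture_weight((x, y), p) for p in texture_params]
            total = sum(w) or 1.0            # guard against all-zero weights
            for c, wc in enumerate(w):
                splat[y, x, c] = wc / total  # weights scaled to sum to 1
    return splat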
The color map is generated similarly. Instead of storing texture weights and calculating the surface color dynamically in the fragment shader, colors are sampled, mixed and baked into a color map. To avoid high frequencies in the resulting texture, colors are not sampled from the original surface textures but rather from their last mipmap.
Figure 2. Weight function
C. Scattering of Terrain Cover
After procedurally texturing a terrain, the next step is to provide it with grass and high cover.
To achieve natural randomness and a uniform distribution of grass, we use a regular grid with a random displacement algorithm. The terrain is split into a regular grid with one grass billboard assigned to the center of each cell. Each billboard is then displaced in a random direction by an amount small enough to remain inside the given cell. For each billboard position $\mathbf{x}$, grass weights are calculated using (1). The grass type is then chosen using the roulette wheel method, where the probability of selecting a grass type is proportional to its weight.
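A sketch of the grid-with-random-displacement placement and roulette wheel selection, again reusing the hypothetical texture_weight helper; grass_types is an assumed list of (name, params) pairs:

import random

def roulette_wheel(weights):
    # Pick an index with probability proportional to its weight; None if all zero.
    total = sum(weights)
    if total == 0:
        return None
    r = random.uniform(0, total)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(weights) - 1

def scatter_grass(terrain_size, cell_size, grass_types):
    placements = []
    for cy in range(0, terrain_size, cell_size):
        for cx in range(0, terrain_size, cell_size):
            # Cell center, displaced randomly but kept inside the cell.
            px = cx + cell_size / 2 + random.uniform(-0.45, 0.45) * cell_size
            py = cy + cell_size / 2 + random.uniform(-0.45, 0.45) * cell_size
            weights = [texture_weight((px, py), p) for _, p in grass_types]
            chosen = roulette_wheel(weights)
            if chosen is not None:
                placements.append((px, py, grass_types[chosen][0]))
    return placements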
Potential collisions of grass at cell borders can be ignored as they make little visual difference. Collisions between trees, however, should be avoided. Trees can have different sizes, and placing them in a regular grid would not be as easy as it was with grass. Therefore, for scattering trees we use simple random scattering with collision detection. We randomly choose a position, calculate the tree weights and choose a tree species using the roulette wheel method. If the chosen species cannot fit at that position without colliding with its surroundings, we remove it from consideration and try another. The algorithm ends when all trees have been placed or the maximum number of iterations has been reached.
In reality, a forest is not just randomly scattered trees. Trees grow from seeds, fighting for space and resources. Stronger trees prevail, giving birth to new ones, and the process repeats. New seeds fall in the vicinity of the mother tree, which results in trees that are not just randomly scattered but rather grouped. To simulate this process we have implemented a survival algorithm with environmental feedback, similar to the algorithm described in [2]. At the beginning of each iteration, all plants generate seeds at random positions inside their seeding radius. A seed that falls outside of the terrain, or on ground that is not inside the species' parametric area, is removed. The next step is detecting collisions between plants. Plants in collision fight for survival, and the weaker plant, the one with smaller viability [2], is eliminated. At the end of the iteration, all plants increase in age, and plants that have reached their maximum age are removed from the ecosystem. The algorithm ends after a defined number of iterations.
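A condensed sketch of one iteration of such a survival algorithm, assuming a hypothetical Plant class with x, y, radius, seeding_radius, viability, age, max_age and species fields and a spawn_seedling method (collision is a plain circle-overlap test):

import math, random

def survival_step(plants, terrain_size, in_parametric_area):
    # 1. Seeding: each plant drops one seed inside its seeding radius.
    seedlings = []
    for p in plants:
        angle = random.uniform(0, 2 * math.pi)
        dist = random.uniform(0, p.seeding_radius)
        sx, sy = p.x + dist * math.cos(angle), p.y + dist * math.sin(angle)
        # Seeds outside the terrain or the species' parametric area are removed.
        if (0 <= sx < terrain_size and 0 <= sy < terrain_size
                and in_parametric_area(p.species, (sx, sy))):
            seedlings.append(p.spawn_seedling(sx, sy))
    plants = plants + seedlings
    # 2. Collisions: of two overlapping plants, the one with smaller viability [2] loses.
    survivors = [p for p in plants
                 if not any(q is not p
                            and math.hypot(p.x - q.x, p.y - q.y) < p.radius + q.radius
                            and q.viability > p.viability
                            for q in plants)]
    # 3. Aging: everyone gets older; plants past their maximum age are removed.
    for p in survivors:
        p.age += 1
    return [p for p in survivors if p.age <= p.max_age]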
Fig. 4 shows a comparison of these three scattering algorithms, where different tree species are represented by circles of different colors and radii. The grid displacement algorithm (left) gives a good uniform distribution of objects of the same size. Random scattering with collision detection (middle) can successfully scatter objects of different sizes; however, it does not produce clusters of the same species like the survival algorithm (right), which gives the forest a more natural look.
Figure 4. Scattering algorithms. Scattering on regular grid with random
displacement (left), random scattering with collision detection (middle)
and survival algorithm (right)
D. Classifying Satellite Image
For procedural texturing and scattering of terrain cover, we can also use a satellite image as an input parameter. To translate the image into weights we use a simple, color-based classification. To calculate the weight at a certain terrain position, the satellite image is sampled and the given color mapped to HSV space, where boundaries between colors are more obvious.
Fig. 3 shows a satellite image with its classification. Due to atmospheric scattering, images taken from great distances have their colors shifted towards blue. This makes classification harder, and a more advanced terrain cover classifier would be needed to fully identify the coastline or tell the difference between sea and vegetation. More advanced methods for classifying surface cover have been developed [19]; however, this approach is simple and gives good enough results in natural, uninhabited areas.
After the color sampled from the satellite image is classified to a surface cover type, we can treat that information as a weight equal to 1 if the terrain object is of the specified type, or 0 if it is not.
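A minimal sketch of such a color-based classification, with hue ranges chosen purely for illustration (real boundaries would be tuned to the source imagery); colorsys is part of the Python standard library:

import colorsys

def classify_pixel(r, g, b):
    # r, g, b in [0, 1]; boundaries between cover types are easier to draw in HSV.
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    if 0.5 < h < 0.72 and s > 0.2:
        return "sea"          # blue hues
    if 0.17 < h < 0.45:
        return "vegetation"   # green hues
    return "bare_rock"

def cover_weight(object_type, r, g, b):
    # The classification as an eq.-(1)-style weight: 1 on a match, 0 otherwise.
    return 1.0 if classify_pixel(r, g, b) == object_type else 0.0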
V. RESULTS
To evaluate our approach we make comparisons at different scales between real and procedurally generated Mediterranean landscapes.
Fig. 5 shows the Croatian island of Ist textured using different input parameters. Fig. 5 (left) represents the terrain textured using slope and height. This method is generic and usually gives good results on both real and procedural terrains. Fig. 5 (middle) represents the same landscape textured using terrain cover classification from a satellite image. The terrain is textured more precisely using this method; however, it can only be used on real-world landscapes. Additionally, a procedurally generated terrain is never a perfect representation of the real one due to inaccuracies in the heightmap, interpolations and simplifications of terrain geometry. For that reason, the coastline is often misaligned, as can be seen in Fig. 5 (middle). Fig. 5 (right) tries to solve the coastline problem using a combination of the previous two approaches. The coastline height is defined, and everything above that level is textured using terrain cover classification, while everything below uses only slope and height as input parameters. This makes the coastline more monotonous but consistent.
Figure 3. Satellite image of Ist (left) and classification of terrain cover (right)
Fig. 6 shows a satellite image of a smaller, uninhabited part of Ist and its procedural representation generated using the coastline reconstruction approach described above. The technique proves to be more reliable in uninhabited areas with images taken from a closer distance. In addition to textures, Fig. 6 (right) also contains trees scattered according to the terrain cover classification.
The following image (Fig. 7) shows the same terrain from the ground. To represent Mediterranean grass we used textures similar to the following Mediterranean plants: Lactuca serriola, Trisetum flavescens, Conyza sumatrensis, Eupatorium cannabinum, Urtica dioica and Melilotus sp. A model of Arbutus unedo has been used as a bush. To have more control over the distribution of each species we included an additional spread parameter for every grass and tree type.
Finally, Fig. 8 displays a comparison between a real and a procedurally generated landscape. Every object in the image is procedurally scattered except for the two bushes in front, which were placed manually for easier comparison.
Figure 5. Comparison of procedural texturing by slope and height (left), by classifying a satellite image (middle) and using the coastline reconstruction approach (right)
Figure 6. Satellite image of Dumboka cove at Ist (left), terrain cover classification (middle) and procedurally generated landscape (right)
Figure 7. Procedurally generated Dumboka cove from ground
Figure 8. Comparison between real (left) and procedurally generated landscape (right)
VI. CONCLUSION AND FUTURE WORK
We presented a method for procedural generation of environments through terrain generation, texturing and scattering of terrain cover using cascaded input parameters. By calculating and multiplying the weights of individual input parameters we obtain information about how much each texture or species wants to reside at a given terrain position. That information can be used as a texture's contribution to the terrain color when texturing the terrain, or as a probability when scattering the terrain cover. Height and slope of the terrain have proven to be reliable input parameters for generic use. For real-world terrains, a simple terrain cover classification from a satellite image can provide basic information for texturing and scattering of terrain cover. To fix coastline issues caused by geometry misalignments between the real and generated terrain, we introduced a coastline reconstruction approach. Additionally, we used a spread parameter to gain more control over the distribution of each species. Comparing real-life images with those generated procedurally, we believe that our method can create plausible and visually appealing Mediterranean landscapes.
Many suggestions for future work can be made. The terrain should be split into regions for faster view frustum culling of terrain geometry, grass and trees. Level of detail algorithms should be used to dynamically reduce the polygon count of faraway terrain regions. Those regions could also use a cheaper texturing technique, for example color mapping instead of splat mapping. Closer, high-detail regions could use normal mapping or even fractal displacement of geometry to achieve the 3D look of the Mediterranean karst. Scattering terrain cover based on a satellite image would benefit from a more advanced terrain cover classifier. Although height and slope as input parameters give good generic results, they are not the primary factors shaping the Mediterranean environment, and it would be interesting to see how additional parameters, like a wind map and resistance to wind and salt, would affect the final look of the environment.
REFERENCES
[1] T. Archer, "Procedurally generating terrain", 44th Annual Midwest Instruction and Computing Symposium, Duluth, 2011, pp. 378–393.
[2] B. Beneš, "A stable modeling of large plant ecosystems", in Proceedings of the International Conference on Computer Vision and Graphics, Association for Image Processing, 2002, pp. 94–101.
[3] C. Colditz, L. Coconu, O. Deussen, C. Hege, "Real-time rendering of complex photorealistic landscapes using hybrid level-of-detail approaches", 6th International Conference for Information Technologies in Landscape Architecture, 2005.
[4] O. Deussen, P. Hanrahan, B. Lintermann, R. Měch, M. Pharr, and P. Prusinkiewicz, "Realistic modeling and rendering of plant ecosystems", in Computer Graphics (Proceedings of ACM SIGGRAPH), 1998, pp. 275–286.
[5] X. Décoret, F. Durand, F. X. Sillion, J. Dorsey, "Billboard clouds for extreme model simplification", in ACM Transactions on Graphics (Proceedings of ACM SIGGRAPH), 2003, pp. 689–696.
[6] D. S. Ebert, S. Worley, F. K. Musgrave, D. Peachey, and K. Perlin, Texturing & Modeling: A Procedural Approach, Elsevier, 3rd edition, 2003.
[7] A. Emilien, U. Vimont, M.-P. Cani, P. Poulin, and B. Beneš, "WorldBrush: Interactive example-based synthesis of procedural virtual worlds", ACM Transactions on Graphics (SIGGRAPH 2015), vol. 34, issue 4, pp. 11.
[8] J.-D. Génevaux, É. Galin, É. Guérin, A. Peytavie, and B. Beneš, "Terrain generation using procedural models based on hydrology", ACM Transactions on Graphics (SIGGRAPH 2013), vol. 32, issue 4, pp. 143:1–13.
[9] A. Jakulin, "Interactive vegetation rendering with slicing and blending", in Eurographics 2000 (Short Presentations), 2000.
[10] R. Měch and P. Prusinkiewicz, "Visual models of plants interacting with their environment", in Computer Graphics (Proceedings of ACM SIGGRAPH), 1996, pp. 397–410.
[11] G. S. P. Miller, "The definition and rendering of terrain maps", in Computer Graphics (Proceedings of ACM SIGGRAPH), 1986, vol. 20, pp. 39–48.
[12] P. Müller, P. Wonka, S. Haegler, A. Ulmer, and L. Van Gool, "Procedural modeling of buildings", in ACM Transactions on Graphics (SIGGRAPH 2006), vol. 25, issue 3, pp. 614–623.
[13] F. K. Musgrave, C. E. Kolb, and R. S. Mace, "The synthesis and rendering of eroded fractal terrains", in Computer Graphics (Proceedings of ACM SIGGRAPH), 1989, pp. 41–50.
[14] B. Lane and P. Prusinkiewicz, "Generating spatial distribution for multilevel models of plant communities", in Proceedings of Graphics Interface '02, vol. 1, pp. 69–80.
[15] P. Lindstrom, D. Koller, W. Ribarsky, L. F. Hodges, N. Faust, and G. A. Turner, "Real-time, continuous level of detail rendering of height fields", in Computer Graphics (Proceedings of ACM SIGGRAPH), 1996, pp. 109–118.
[16] F. Neyret, "Modeling, animating and rendering complex scenes using volumetric textures", IEEE Transactions on Visualization and Computer Graphics, vol. 4, issue 1, 1998, pp. 55–70.
[17] J. Olsen, "Realtime procedural terrain generation", Technical Report, University of Southern Denmark, 2004.
[18] Y. I. H. Parish and P. Müller, "Procedural modeling of cities", in Computer Graphics (Proceedings of ACM SIGGRAPH), 2001, pp. 301–308.
[19] S. Premoze, W. B. Thompson, and P. Shirley, "Geospecific rendering of alpine terrain", in Rendering Techniques, 1999, pp. 107–118.
[20] J. Rohlf, J. Helman, "IRIS Performer: A high performance multiprocessing toolkit for real-time 3D graphics", in Computer Graphics (Proceedings of ACM SIGGRAPH), 1994, pp. 381–394.
[21] F. Strugar, "Continuous distance-dependent level of detail for rendering heightmaps (CDLOD)", Journal of Graphics, GPU, and Game Tools, 2009, vol. 14, issue 4, pp. 57–74.
[22] O. Šťava, S. Pirk, J. Kratt, B. Chen, R. Měch, O. Deussen, and B. Beneš, "Inverse procedural modelling of trees", Computer Graphics Forum, 2014, vol. 33, issue 6, pp. 118–131.
[23] T. Ulrich, "Rendering massive terrains using chunked level of detail control", SIGGRAPH Super-Size It! Scaling Up to Massive Virtual Worlds Course Notes, 2002.
Energy-Aware Power Management of Virtualized
Multi-core Servers through DVFS and CPU Consolidation
Hamed Rostamzadeh Hajilari *, Mohammad Mehdi Talebi * and Mohsen Sharifi *
*Iran University of Science and Technology, School of Computer Engineering, Tehran, Iran
hamedrostamzade@comp.iust.ac.ir, mehdi_talebi@comp.iust.ac.ir, msharifi@iust.ac.ir
Abstract—The considerable energy consumption of datacenters results in high service costs as well as environmental pollution. Therefore, energy saving in operating data centers has received a lot of attention in recent years. Although modern multi-core architectures offer both power management techniques, such as dynamic voltage and frequency scaling (DVFS) and per-core power gating (PCPG), and CPU consolidation techniques for energy saving, the joint deployment of these two features has been less exercised. With chip multiprocessors (CMPs), power management that considers the multi-core chip together with core count management techniques can offer more efficient energy consumption in environments operating large datacenters. In this paper, we focus on dynamic power management in virtualized multi-core server systems used in cloud-based systems. We propose an algorithm, equipped with power management techniques, that selects an efficient number of cores and frequency level in multi-core systems while maintaining an acceptable level of performance. The paper also reports an extensive set of experimental results obtained on a realistic multi-core server system setup with the RUBiS benchmark, demonstrating energy savings of up to 67% compared to the baseline. Additionally, the algorithm outperforms two existing consolidation algorithms for virtualized servers by 15% and 21%.
Index Terms—CPU Consolidation, Energy Saving, DVFS, PCPG, Multi-core Servers.
I. INTRODUCTION
Due to the large amount of energy consumed by today's servers, there is a growing need for energy-aware resource management. High energy consumption drives up electricity costs, with many datacenters reporting millions of dollars in annual usage; $30 billion is reported for enterprise power and cooling worldwide annually. Additionally, a more recent report from the Natural Resources Defense Council (NRDC) claims that, due to waste and inefficiency, U.S. data centers, which consumed a massive 91 billion kWh of electricity in 2013, will increase consumption to 140 billion kWh by 2020, the equivalent of 50 large (500 megawatt) power plants. Experts say the amount of energy consumed by the world's data centers will triple in the next decade, putting an enormous strain on energy supplies and dealing a hefty blow to efforts to contain global warming. It is probably safe to say that most data center managers are dealing with the challenge of increasing data growth and limited IT resources and budgets, with "save everything forever" strategies becoming more prevalent in many organizations.
Energy-efficiency efforts have traditionally targeted two levels: the datacenter and the individual active server. The former approaches aim at reducing the number of active servers in a datacenter by consolidating all the incoming workloads onto fewer server machines, whereas the latter attempt to keep the performance of each active server in accordance with the assigned workload so that energy can be saved at the level of each server.
A data center is designed to provide the required performance based on its service level agreements (SLAs) with clients even during peak workload hours, and hence its resources are vastly under-utilized at other times. As an example, the minimum and maximum utilization of the statically provisioned capacity of Facebook's data center are 40% and 90%, respectively [1]. Therefore, a great amount of energy cost can be saved by consolidating workloads onto as few server machines as possible and turning off the unused machines. Server consolidation has been considered in many studies, and virtual machine migration (VMM) has been used as a means of server consolidation in much research [2], [3], [4], [5] and [6].
Due to the overheads and limitations associated with server consolidation, such as high network latency and long system boot times, other approaches remain relevant for energy saving. Because of these limitations, consolidation decisions are made over long periods of time. The longer the period, the more server machines remain under-utilized, which implies that there is still room to use other techniques to improve energy efficiency.
Focusing on each active server is another state-of-the-art technique. When focusing on each active server, two widely accepted and employed techniques are used: sleep states and Dynamic Voltage and Frequency Scaling (DVFS). These approaches can be used as a complement to the previous techniques.
DVFS was introduced decades ago and is one of the well-known and common energy-aware CPU management techniques; it has been studied in much research [2] [7] [8].
DVFS has its own limitations. First, supply voltages have already become quite low, and therefore only a small amount of further supply voltage reduction is possible. In addition, datacenter servers typically contain two or more processor chips with multiple CPUs sharing a single on-chip power distribution network, which makes it impossible for the CPUs to operate at different supply voltage levels and hence at various clock frequencies. This limitation results in under-utilized CPUs where the available performance level is higher than what is actually needed, so energy is wasted.
Another well-known CPU energy management technique is dynamic power state switching. In many modern processors, it is possible to turn cores or CPUs on or off. The workload of each CPU can be measured by the OS, and CPUs can be turned on or off accordingly. In other words, each CPU is placed in its own power domain, and the power to each such domain can be independently gated. Note that the suggestion from the OS may not be a good one, which is why it may be cancelled by the Power Control Unit (PCU) residing in the processor chip. PCPG is the deepest sleep state in modern multi-core processors and is capable of shutting down a core completely; since a gated core has nearly zero energy consumption, it has received a lot of attention.
CPU consolidation is a system-level technique that consolidates load onto fewer active cores so that the remaining cores can be turned off. Traditionally, clock gating was used for the unloaded cores, which also leads to energy saving. But since modern processors support PCPG, CPU consolidation leads to significant energy saving, and combined with DVFS this approach can yield even better results.
In this paper, our algorithm is equipped with both of the mentioned approaches. We assume virtualized multi-core servers, which makes our approach appropriate for cloud computing environments. We present a method to find the best choice of the number of active cores and the appropriate frequency according to the current workload, and then power gate the remaining cores. All of the results are obtained from actual hardware measurements, not simulations. Experiments with an implementation of our algorithm on a realistic multi-core server system setup have shown a 67% improvement in energy consumption.
The rest of the paper is organized as follows. Section II introduces related work on using PCPG and DVFS for energy-efficient power management in datacenters. Section III describes the problem definition for multi-core server processors. In Section IV our energy consumption model is presented, and in Section V the proposed energy-aware mechanism based on this model is described. We evaluate the proposed mechanism against existing works in Section VI. Finally, in Section VII conclusions and future work are presented.
II. RELATED WORKS
Despite various studies on processor power management that exploit DVFS [2], [7], [9] and [10] rather than sleep states, the recent introduction of PCPG as the deepest sleep state has spread the use of sleep states in datacenters [11], [12], [13], [14] and [15]. The research by Jacob Leverich et al. was the first in this area to evaluate the use of PCPG in datacenters and demonstrated the importance of PCPG in power management systems [12]. Their results show that PCPG alone can save up to 40% energy, which is 30% more than using DVFS as a knob. They also show that the joint deployment of these two knobs could save up to 60% energy in datacenters.
Despite the considerable energy saving achievable with PCPG, gating the power of cores has energy and performance overheads [16], [14] and [17]. Therefore, frequent mispredictions may result in inefficient power management and power dissipation. Niti Madan et al. demonstrate that it is vital to accompany PCPG with guard mechanisms to hinder such negative consequences at the intra-core level [17]. Reference [16] demonstrates that, in addition to intra-core algorithms, inter-core algorithms also suffer from such negative outcomes and may lead to power dissipation, contrary to expectations. Therefore, it is crucial to exploit guard mechanisms to prevent such negative outcomes. In [16] the authors use a guarded mechanism to decide when to gate the cores in both intra-core and inter-core gating algorithms. We also considered the power and performance overheads of PCPG in our mechanism. The power overhead is considered in the power model, and due to the interval between the high threshold and the maximum capacity of active cores, there is always room to mitigate the performance overhead.
In datacenters, system failures may occur due to power capacity overload and overheating caused by high server densities, so robust and definite power management is needed to avoid them. Since a gated core does not consume energy, PCPG has high potential for wide use in power capping problems. Kai Ma et al. propose an integrated design using PCPG and DVFS/overclocking for such problems, to not only satisfy power constraints but also optimize performance [13]. They were trying to optimize the performance of a chip multiprocessor (CMP) within a given power constraint (i.e., power capping), whereas we are minimizing the energy consumption of CMPs in virtualized servers within an acceptable level of performance.
In datacenters experiencing low utilization, the CPU consolidation technique equipped with PCPG presents significant energy saving [18], [11] and [19]. With commercial hardware support for core-level power gating, consolidation has become a promising power management technique in datacenters [20]. In [1] the authors presented a technique called Core Count Management (a variant of the CPU consolidation technique) and reported 35% energy saving. However, they reported both power and performance results based on simulations performed using simple power and performance models. Their work only uses PCPG, whereas we propose a joint deployment of PCPG and DVFS.
The works most related to ours are [11] and [18], which use DVFS and CPU consolidation jointly in virtualized environments. Unlike the aforementioned research, they implemented a consolidation algorithm on a realistic multi-core server system setup. The authors first investigated the effect of CPU consolidation on the power dissipation and performance (latency) of such systems, and concluded by presenting two new CPU consolidation algorithms for virtualized multi-core servers. Their algorithms blindly consolidate CPUs based on the predicted utilization requirement, whereas in our work we consolidate and choose the frequency level of active cores based on their energy consumption.
III. PROBLEM DEFINITION
In most of today's power management research, for the sake of design simplicity and due to the different performance overheads of DVFS and PCPG, a decoupled approach is used in which the power management knobs operate independently. Considering these power management knobs independently has a substantial drawback that leads to inefficient simultaneous operation.
In [19] it is discussed how DVFS hinders PCPG for the workloads of multi-threaded applications such as Fluidanimate from the PARSEC benchmark. DVFS and PCPG are applied independently over different periods of time: due to its lower overhead, DVFS is used over shorter periods, while PCPG, in contrast, is used over longer periods. When the workload is low, DVFS, which runs over shorter periods, decides to reduce the frequency to keep the cores at higher utilization. But this high utilization makes PCPG keep the cores working, so none of them is turned off. The authors in [19] indicate that using DVFS beside PCPG not only fails to improve energy saving but can even result in higher energy consumption than PCPG alone. Therefore there is a need for an approach that uses both techniques without letting them hinder each other.
Fig. 1. Effect of CPU consolidation on workload: (a) low number of active cores (three cores at 60% consolidated to two cores at 90%, the rest power gated); (b) high number of active cores (twelve cores at 60% consolidated to eleven cores at 65%, one power gated).
Fig. 2. Energy consumption as an approximately linear function of processor utilization (0-100%) for each P-state, P0: 2.667 GHz through P8: 1.6 GHz.
CPU consolidation tries to resolve the aforementioned issues based on the utilization and frequency of the cores. This raises the question: "Does consolidation always maximize energy saving?" Our experimental results show that the answer is no. With a low number of active cores, turning off one core increases the workload of the others dramatically and forces them to work at a higher frequency, which results in more energy consumption (Fig. 1a). Conversely, with a high number of active cores, removing one of the active cores does not affect the workload of the others much, so there is no tangible change in the frequencies (Fig. 1b). In this case turning off active cores is an effective approach. Our approach brings dynamicity to consolidation, choosing the best number of active cores and frequency to minimize energy consumption.
IV. ENERGY CONSUMPTION MODEL
In this section we present the energy consumption model that our approach is based on. We first model the relation between core frequencies, the number of active cores and energy consumption, and then use this model to set the appropriate number of active cores and frequency in order to save energy.
The frequency, the performance state and the energy consumption are tightly coupled elements of our model, and in this part their relationships are derived. Fig. 2 shows the different performance states (P-states); each P-state corresponds to a frequency supported by the processor. Our processor supports 9 P-states, where P0 is the maximum performance state, corresponding to the maximum frequency, and P8 is the lowest frequency the processor supports. Note that a lower P-state number results in a higher frequency and more computation, and consequently in higher energy consumption. The relations described in this section are derived from data collected in benchmarking experiments in which CPU-intensive workloads with different levels of CPU demand were applied to the servers under test. The power consumption of the servers was collected for all operating frequencies. The data suggest that, for a fixed operating frequency, the power consumption of the server is approximately a linear function of processor utilization:

$P_{processor} = a_i U_i + b_i$, for each frequency $f_i$   (1)

where $U_i$ is the utilization of the $i$-th core and $a_i$ and $b_i$ are constants calculated from the collected data.
In our approach the constant parameters of this linear relation are calculated beforehand for every frequency. We use them in our energy consumption model to select the number of cores and the performance state which minimize energy consumption.
To control the number of active cores, PCPG is used; it is a circuit-level technique that cuts off the power supply to a core. There are, of course, overheads for switching a gated core on and off. References [16] and [17] discuss the overhead costs of per-core power gating and demonstrate the importance of guard mechanisms when using PCPG to ensure power savings. We also consider the overhead of PCPG when deciding to turn any core off.
In this paper we use DVFS and PCPG simultaneously, so our approach must consider both. In each iteration we decide on the number of active cores and the related frequency; therefore, the energy consumption for the target number of active cores and target frequency must be calculated, taking the core gating overhead into account. Energy consumption is calculated by (2), where $E_p$ is the energy consumed in the specified performance state and $E_g$ is the overhead for turning cores on or off:

$E = E_p + E_g$   (2)
Assume a server with N CPUs in which all VMs have been consolidated to run on n active CPUs while the remaining CPUs are power gated. The average energy consumption of the i-th CPU ($E_i$) is intrinsically related to its average utilization ($U_i$) and the time interval, and can be modeled as:

$E_p = \sum_{i=1}^{n} E_i = \sum_{i=1}^{n} (a_i U_i + b_i)\,T$   (3)

where $T$ is the control period. For the calculation of $E_g$: if $core_{target}$ is bigger than $core_{current}$ we have (4); otherwise $E_g$ equals zero:

$E_g = (core_{target} - core_{current})\,e_g$   (4)

where $e_g$ is the energy overhead of ungating a single core.
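A sketch of this energy model in code, assuming per-frequency (a, b) coefficients profiled beforehand as described above; per_core_gate_overhead stands in for the per-core ungating cost e_g in (4) and is a hypothetical constant:

def energy(freq_index, utilization, n_target, n_current,
           coeffs, period_s, per_core_gate_overhead):
    # coeffs[freq_index] = (a, b) of the linear power model (1) for that frequency.
    a, b = coeffs[freq_index]
    # Eq. (3): energy of n_target active cores over one control period.
    e_p = n_target * (a * utilization + b) * period_s
    # Eq. (4): overhead applies only when cores are being ungated (turned on).
    e_g = max(0, n_target - n_current) * per_core_gate_overhead
    return e_p + e_g  # Eq. (2)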
V. ENERGY AWARE MECHANISM
In this section we present an energy-aware algorithm that decides on the number of active cores and the appropriate frequency simultaneously. To assess the required resources at different frequencies, the relationship between utilization and frequency is needed. For a single core, increasing the frequency results in a decrease in utilization; in fact, there is a linear relationship between frequency and utilization, which can be expressed as:

$U_i F_i = U_j F_j$   (5)

where $U_i$ is the utilization of the core at frequency $F_i$. Web servers are multi-threaded applications whose load can be distributed over multiple CPUs, so for web servers we can extend (5) to (6):

$\sum_{k=1}^{C_i} U_k F_i = \sum_{k=1}^{C_j} U_k F_j$   (6)

where $C_i$ and $C_j$ are the numbers of cores in states $i$ and $j$, respectively. For simplicity we assume that all cores work at the same frequency and have the same utilization, which reduces (6) to:

$U_j = \frac{U_i F_i C_i}{F_j C_j}$   (7)
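As a quick illustrative example of (7), with made-up numbers: if four cores at 2.0 GHz run at 60% utilization, consolidating to three cores at 2.4 GHz is predicted to raise per-core utilization to 0.6 x 2.0 x 4 / (2.4 x 3) ≈ 0.67, i.e. about 67%.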
Additionally, as we demonstrated in previous work [21] and [22], in virtualized servers there is contention between VMs for physical resources. Our research showed that when the ratio of vCPUs to physical CPUs exceeds a threshold, VM contention increases significantly. Therefore, in this algorithm cminPossible is the minimum number of cores that prevents significant VM performance degradation due to resource contention in an overbooked server.
Input: N (total number of CPUs), fn (frequency), cn (the number of active CPUs), Th (high threshold), and un (average utilization)
Output: fn+1 (frequency), cn+1 (the number of active CPUs)
1:  energy = max;
2:  for core in [cminPossible, cminPossible+1, ..., N] do
3:    if Un+1(fmax) < Th then                  // otherwise more cores are needed
4:      for frequency f in all frequencies (ascending order) do
5:        if un+1 > Th then                    // this frequency needs more capacity
6:          continue;
7:        energyi = calculateEnergy(f, un+1, core)
8:        if energyi < energy then
9:          energy = energyi
10:         fn+1 = f
11:         cn+1 = core
Fig. 3. Energy-aware algorithm
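A runnable sketch of the algorithm in Fig. 3, reusing the hypothetical energy() helper above; it predicts utilization with (7) and returns the lowest-energy (frequency, core count) pair. The frequencies, coefficients and thresholds would come from offline profiling:

def choose_config(n_total, c_min, c_cur, f_cur, u_cur, freqs, coeffs,
                  th_high, period_s, gate_overhead):
    # Predict per-core utilization via (7) for a candidate configuration.
    def predicted_u(c_new, f_new):
        return (u_cur * f_cur * c_cur) / (f_new * c_new)

    best_energy, best_f, best_c = float("inf"), f_cur, c_cur
    for c in range(c_min, n_total + 1):
        if predicted_u(c, max(freqs)) > th_high:
            continue  # even at fmax this core count cannot satisfy the workload
        for fi, f in enumerate(sorted(freqs)):  # ascending frequencies
            u = predicted_u(c, f)
            if u > th_high:
                continue  # this frequency leaves the cores overloaded
            e = energy(fi, u, c, c_cur, coeffs, period_s, gate_overhead)
            if e < best_energy:
                best_energy, best_f, best_c = e, f, c
    return best_f, best_c  # (fn+1, cn+1)

Here coeffs is assumed to be sorted in the same ascending-frequency order as freqs.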
In each iteration we assume that the amount of needed resources is the same as in the previous cycle. In each cycle we choose the best number of active cores and the corresponding frequency based on the workload in the current iteration (predicting the amount of consumed energy is out of the scope of this paper) in order to minimize E in (2). All of the possible states (frequency and number of active cores) are evaluated and the best one is chosen. In each iteration, based on the predicted workload, we calculate the utilization of each core using (7). The calculation is done with the energy model presented in the previous section, which accounts for the overhead of changing the number of active cores in addition to the energy consumption of the active cores. The algorithm is shown in Fig. 3.
Un+1(fmax) represents the utilization at the maximum frequency of the current cores; if it is bigger than the high threshold, the current cores cannot satisfy the workload, and hence we have to increase the number of cores. We evaluate all of the possible states based on the number of cores and the possible P-states to minimize energy consumption. The method calculateEnergy computes the energy consumption for the selected frequency, load and number of active cores, and if the calculated energy is less than that of the previous states, the parameters are updated.
VI. EXPERIMENTAL RESULTS
In this section we first present our experimental testbed and the benchmark that generates the workload used to evaluate the algorithm; then three existing consolidation approaches are introduced, and the experimental results are presented. Our algorithm is compared with these approaches, which, for equal comparison conditions, we also implemented in the Xen kernel.
Our server is equipped with two Intel Xeon X5650 processors and 8 GB RAM. Each of the CPUs has 6 physical cores with 9 frequency levels: 2.667 GHz, 2.533 GHz, 2.4 GHz, 2.267 GHz, 2.133 GHz, 2 GHz, 1.867 GHz, 1.733 GHz and 1.6 GHz. Our processors support PCPG as the deepest sleep state.
Xen [23] is a widely used virtual machine manager employed by big companies like Amazon. We used Xen 4.4, the latest available version, as the hypervisor to host VMs. Our algorithm is implemented in the Xen kernel, which has direct access to resources for measurement and for manipulating the power management knobs. To avoid network limitations in the experiments, we placed the workload clients on the server as well. The server hosts 4 client virtual machines and 4 web server virtual machines alongside dom0. All the virtual machines run Ubuntu 14.04 with 20 GB of disk space. Each web server virtual machine has 2 vCPUs and 800 MB RAM, while each client machine has 1 vCPU and 600 MB RAM.
We chose RUBiS (Rice University Bidding System) [24], an auction website benchmark modeled after eBay, to simulate real web servers. In RUBiS, multiple clients, whose number is defined beforehand, connect to the web server and simultaneously perform selling, browsing and bidding operations. Each client starts a session and performs these operations randomly after getting the response to each request from the server. The benchmark starts with an up-ramp phase which progressively increases the number of sessions to reach peak load, avoiding a sudden massive load that the server could not tolerate. In the steady state, the second phase, the peak load is maintained, and finally in the down-ramp the load decreases until the end of the benchmark.

Table I. Energy efficiency and throughput results.

  Approach         Throughput    Energy improvement
  Base             1.00          n/a
  Consolidation1   0.97          58%
  Consolidation2   0.94          55%
  Our algorithm    0.94          67%
Our approach is compared with three existing approaches. The first is the baseline, which does not use any power management knobs: both the frequency and the number of powered-on cores are at their maximum. The other two approaches, which use PCPG and DVFS simultaneously, are threshold-based algorithms. In the first consolidation approach, if the required resources exceed the high threshold, it first checks whether there is any gated core; if there is, the algorithm brings the gated core up, and otherwise it increases the frequency of the cores. When the required resources fall below the low threshold, it first checks whether a lower frequency can satisfy the workload, and only if there is no lower frequency does it turn cores off [11]. All in all, this algorithm tries to keep the cores at the lowest possible frequency.
The second consolidation algorithm, conversely, tries to keep the number of cores at the lowest possible value. When more resources are needed, it first tries to increase the frequency, and only if the frequency is already at the highest possible level does it increase the number of cores. When the required resources fall below the low threshold, it tries to decrease the number of cores first, and only if that is not possible does it lower the frequency.
Table I shows the energy efficiency and throughput of each test relative to the baseline result. The throughput of each test is defined as the ratio of the number of requests to the duration of the benchmark in the experiment. The first row shows the results of the baseline approach defined above; the second and third rows are the experimental results of the consolidation algorithms described, and the last one is ours. Even though blind consolidation algorithms achieve significant energy savings, there is still room for minimizing energy consumption. Since the baseline approach uses maximum resources to serve the workload, we normalized all values in Table I to its results. Our algorithm reduces energy consumption by up to 67%, 15% more than the consolidation1 algorithm, which is the best existing consolidation algorithm. Note that although our algorithm achieves the maximum energy saving, its throughput is less than that of consolidation1, because consolidation1 brings cores up and thus provides more resources when the workload grows. We could mitigate this throughput degradation by exploiting an accurate prediction mechanism, but that is outside the scope of this paper. The reason the throughputs of our algorithm and consolidation2 are equal is that, based on the energy consumption model, both algorithms behave similarly when the workload increases. However, they behave differently when the workload decreases, and our algorithm achieves 21% more energy saving than consolidation2.
VII. CONCLUSION AND FUTURE WORKS
Due to the limitations of server consolidation approaches, servers still suffer from low resource utilization; hence, there is a need for efficient management of processor power. Modern chip multiprocessors provide two options for reducing energy consumption, DVFS and PCPG, which make CPU consolidation an effective knob in server power management.
Even though CPU consolidation avoids the pitfalls of the decoupled use of DVFS and PCPG, our experiments on modeling CPU power consumption at different frequencies and numbers of cores demonstrate that, for a low number of active cores, scaling out and operating at low frequencies yields more energy savings. Therefore, although blind consolidation approaches significantly improve energy consumption, they do not maximize energy saving.
Our proposed algorithm dynamically consolidates CPUs to minimize energy consumption. We also propose a novel energy model which takes the energy overhead of PCPG into account.
Finally, up to 67% energy saving without significant performance degradation is reported in our experimental results. Additionally, the proposed algorithm outperforms two existing CPU consolidation algorithms by 15% and 21% in energy saving.
In this paper we did not consider any specific ordering of the powered-on cores in our consolidation mechanism. Choosing the right order of active and inactive cores is planned as future work: since a higher density of powered-on cores results in higher temperature, more energy may be needed for cooling, and hence the ordering of active cores is of great importance.
VIII. REFERENCES
[1] O. Bilgir, M. Martonosi and Q. Wu, "Exploring the Potential of CMP Core Count Management on Data Center Energy Savings," in Workshop on Energy Efficient Design, 2011.
[2] G. Dhiman, G. Marchetti and T. Rosing, "vGreen: a System for
Energy Efficient Computing in Virtualized Environments," in
Proceedings of the 2009 ACM/IEEE international symposium on Low
power electronics and design, 2009.
[3] R. Nathuji and K. Schwan, "VirtualPower: Coordinated Power
Management in Virtualized Enterprise Systems," in ACM SIGOPS
Operating Systems Review, 2007.
[4] N. Bobroff, A. Kochut and K. Beaty, "Dynamic placement of Virtual
Machines for Managing SLA Violations," in Integrated Network
Management, 2007. IM'07. 10th IFIP/IEEE International Symposium
on,, Munich, 2007.
[5] N. Van, F. D. T. Hien and J.-M. Menaud, "Autonomic Virtual
Resource Management for Service Hosting Platforms," in Proceedings
of the 2009 ICSE Workshop on Software Engineering Challenges of
Cloud Computing, 2009.
[6] C. Clark, K. Fraser, S. Hand, J. G. Hansen, E. Jul, C. Limpach, I. Pratt and A. Warfield, "Live Migration of Virtual Machines," in Proceedings of the 2nd Conference on Symposium on Networked Systems Design & Implementation - Volume 2, 2005.
[7] G. Von Laszewski, L. Wang, A. J. Younge and X. He, "Power-aware Scheduling of Virtual Machines in DVFS-enabled Clusters," in Cluster Computing and Workshops, 2009. CLUSTER'09. IEEE International Conference on, 2009.
[8] P. Pillai and K. G. Shin, "Real-time Dynamic Voltage Scaling for
Low-power Embedded Operating Systems," in ACM SIGOPS
Operating Systems Review, 2001.
[9] J. Chen, T. Wei and J. Liang., "State-Aware Dynamic Frequency
Selection Scheme for Energy-Harvesting Real-Time Systems," Very
Large Scale Integration (VLSI) Systems, IEEE Transactions on , vol.
22, no. 8, pp. 1679 - 1692, 2014.
[10] Y. Wang, X. Wang, M. Chen and X. Zhu, "Power-Efficient Response
Time Guarantees for Virtualized Enterprise Servers," in Real-Time
Systems Symposium, 2008, Barcelona, 2008.
[11] I. Hwang, T. Kam and M. Pedram, "A study of the Effectiveness of
CPU Consolidation in a Virtualized Multi-Core Server System," in
Proceedings of the 2012 ACM/IEEE international symposium on Low
power electronics and design, 2012.
[12] J. Leverich, M. Monchiero, V. Talwar, P. Ranganathan and C. Kozyrakis, "Power Management of Datacenter Workloads Using Per-Core Power Gating," in Computer Architecture Letters, 2009.
[13] K. Ma and X. Wang, "PGCapping: Exploiting Power Gating for Power Capping and Core Lifetime Balancing in CMPs," in Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques, 2012.
[14] H. Jiang, M. Marek-Sadowska and S. R. Nassif., "Benefits and Costs
of Power-Gating Technique," in Computer Design: VLSI in
Computers and Processors, 2005. ICCD 2005. Proceedings. 2005
IEEE International Conference on, San Jose, CA, USA, 2005.
[15] J. Lee and N. S. Kim, "Optimizing Throughput of Power- and
Thermal-Constrained Multicore Processors Using DVFS and Per-Core
Power-Gating," in Design Automation Conference, 2009. DAC '09.
46th ACM/IEEE, San Francisco, CA, 2009.
[16] M. Annavaram, "A Case for Guarded Power Gating for Multi-Core
Processors," in High Performance Computer Architecture (HPCA),
2011 IEEE 17th International Symposium on, San Antonio, TX, 2011.
[17] N. Madan, A. Buyuktosunoglu, P. Bose and M. Annavaram, "Guarded
Power Gating in a Multi-Core Setting," in Computer Architecture,
2010.
[18] I. Hwang and M. Pedram, "CPU Consolidation versus Dynamic
Voltage and Frequency Scaling in a Virtualized Multi-Core Server," in
Very Large Scale Integration (VLSI) Systems, IEEE Transactions on,
2013.
[19] A. Vega, A. Buyuktosunoglu, H. Hanson, P. Bose and S. Ramani.,
"Crank It Up or Dial It Down: Coordinated Multiprocessor Frequency
and Folding Control," in Proceedings of the 46th Annual IEEE/ACM
International Symposium on Microarchitecture, 2013.
[20] J. B. Leverich, "Future Scaling of Datacenter Power-efficiency,"
Stanford University, 2014.
[21] M. Sharifi, H. Salimi and M. Najafzadeh., "Power-efficient distributed
scheduling of virtual machines using workload-aware consolidation
techniques," The Journal of Supercomputing, vol. 61, no. 1, pp. 46-66,
2012.
[22] H. Salimi and M. Sharifi, "Batch scheduling of consolidated virtual
machines based on their workload interference model," Future
Generation Computer Systems, vol. 29, no. 8, pp. 2057-2066, 2013.
[23] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R.
Neugebauer, I. Pratt and A. Warfield, "Xen and the Art of
Virtualization.," in ACM SIGOPS Operating Systems Review, 2003.
[24] "http://rubis.ow2.org/," RUBiS: Rice University Bidding System.
[Online].
Human Posture Detection Based on Human Body Communication with Multi-carrier Modulation
Wenshu Ni1, Yueming Gao2, Zeljka Lucev3, Sio Hang Pun4, Mario Cifrek5, Mang I Vai6, Min Du7
1,2,4,6,7 Fujian Key Laboratory of Medical Instrument and Pharmacy Technology, Fuzhou University, Fuzhou, China
3,5 Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia
1 595317255@qq.com, 2 fzugym@163.com, 3 zeljka.lucev@fer.hr
Abstract – Multi-node sensors for human posture detection, which acquire kinematic parameters of the human body, help to further study the laws of human motion. They can serve as a quantitative analysis tool for specific applications such as healthcare, sports training and military affairs. Compared with the traditional optical method, posture detection based on inertial sensors has fewer space limitations, lower cost, and easier implementation. In this paper, a human posture detection system is introduced. Using the parameter data obtained by the inertial sensors, the three-dimensional angles of human hand movement can be calculated via a quaternion algorithm for data fusion. Angle data transmission among the sensor nodes was successfully realized by human body communication (HBC) transceivers based on capacitive coupling with multi-carrier OOK modulation at a data rate of 57.6 kbps. The bit error rate (BER) was less than 10^-5. The human posture could be reconstructed on the PC host. Ultimately, the overall implementation demonstrated the feasibility of the system.
Keywords – Inertial sensors, Human body communication, Multi-carrier OOK modulation, Posture reconstruction
I. INTRODUCTION
In recent years, human action recognition combined with specific applications such as virtual reality, health care and sports training has caught much attention. Human posture is closely related to some disease characteristics, and its detection can serve as a quantitative tool for specific conditions, for example physical therapy with quantitative approaches for geriatric training or Parkinson's patients [1]. Thanks to the improvements of Micro-Electro-Mechanical Systems (MEMS) [2][3], the size, accuracy, robustness and dynamic response of inertial sensors have greatly improved. Compared with the traditional optical method, inertial sensors for posture detection have lower cost, smaller space requirements, and easier implementation.
However, traditional means of communication among inertial sensors are often based on Bluetooth, RF, ZigBee or WiFi [4], which have high power consumption and are likely to cause radiation and wireless signal aliasing. To reduce these impacts, human body communication (HBC) has been introduced. HBC utilizes the human body as the signal channel to transmit data among the nodes. The capacitive coupling method of HBC transmits signals by generating an electric field so that sensors attached on or implanted in the human body can share information. HBC systems are divided into two types: the first is the direct transmission of a digital square wave signal without modulation [5][6], and the second is a modulation scheme with a carrier frequency [7]; the latter is more conducive to achieving high data rates [8] and multichannel transmission [9].
In this paper we propose a novel approach to the design of a human posture detection system. In order to save channel resources and reduce the radiation impact, while also taking direct measurement of the human body posture into account, the capacitive coupling method of HBC with multi-carrier on-off keying (OOK) modulation was chosen to achieve two-way transmission of digital signals. A circuit for the OOK modem is proposed, capable of transmitting the digital signal at a rate of 57.6 kbps with a BER of less than 10^-5. In addition, the multi-carrier modulation method was adopted to realize parallel real-time transmission over two channels.
II. SYSTEM DESIGN
A body posture detection system can detect precise angle information of human actions and play an important role as a quantitative tool for specific applications. In this paper, a human gesture detection system design is proposed. Three inertial sensors are used to collect and process posture angle information. HBC is responsible for the two-way sensor data transmission among the sensors at a data rate of 57.6 kbps. Eventually, the reconstruction is performed on the PC.
Fig. 1. Diagram of the proposed human posture detection system
The structure of the system, shown in Fig. 1, comprises: (1) inertial sensor nodes, (2) the HBC device, and (3) reconstruction of the body posture.
To begin with, the inertial sensor node, which consists of an accelerometer, a gyroscope and a magnetometer, detects the kinematic data of the body. In this paper, three sensors were placed on the upper arm, the lower arm and the hand, respectively, to obtain the kinetic information of the three parts. A data fusion algorithm then computes the three-dimensional angles of the three parts in free space.
Capacitive coupling HBC was used for wireless transmission. Only the signal electrodes of the TX and RX are attached to the body, while the ground (GND) electrodes are left floating. The TX establishes a quasi-static electric field, while the RX detects the electric potential at the remote end [10]; therefore, information can be shared between TX and RX, with less attenuation than galvanic coupling, as shown in Fig. 2. In this mechanism, node 1 acquires the upper arm posture data and transmits it with carrier 1 modulation. Node 2 receives the data via HBC receiver 1 and transmits the data of nodes 1 and 2 with carrier 2 modulation. Finally, node 3 receives the former two channel signals via HBC receiver 2, and the received signals together with the data of node 3 are used to reconstruct the posture on the PC. A tree structure is thus formed to detect the arm and hand posture. The transceiver positions on the human body are shown in Fig. 5.
Fig. 2. The HBC mechanism
III. METHOD
A. Calculation of Angles
In this system, the inertial sensor module MPU6050 was adopted because of its convenience for wearable use. With the sensor attached, the arm movement can be regarded as the coordinate change of the sensor module, so the three-dimensional angles of the arm are equivalent to the module's, which can be expressed by a quaternion as shown in (1). Equation (2) is the carrier's transformation matrix. With the data obtained by the accelerometer, gyroscope and magnetometer, the transformation matrix can be corrected using error vector products. The quaternion is then refreshed by numerical integration; the Runge-Kutta method is described by (3). Finally, since the quaternion transformation is equivalent to Euler's, the roll, pitch and yaw angles can be calculated following (4)-(6).
$Q = q_0 + q_1 i_0 + q_2 j_0 + q_3 k_0 = \cos(\theta/2) + (l i_0 + m j_0 + n k_0)\sin(\theta/2)$   (1)

C_b^R(q) =
[ 1 - 2(q_2^2 + q_3^2)    2(q_1 q_2 - q_0 q_3)    2(q_1 q_3 + q_0 q_2) ]
[ 2(q_1 q_2 + q_0 q_3)    1 - 2(q_1^2 + q_3^2)    2(q_2 q_3 - q_0 q_1) ]   (2)
[ 2(q_1 q_3 - q_0 q_2)    2(q_2 q_3 + q_0 q_1)    1 - 2(q_1^2 + q_2^2) ]

[ \dot{q}_0 ]         [ 0     -g_x   -g_y   -g_z ] [ q_0 ]
[ \dot{q}_1 ]  = 1/2  [ g_x    0      g_z   -g_y ] [ q_1 ]   (3)
[ \dot{q}_2 ]         [ g_y   -g_z    0      g_x ] [ q_2 ]
[ \dot{q}_3 ]         [ g_z    g_y   -g_x    0   ] [ q_3 ]

$\mathrm{Yaw} = \arctan\frac{\sin\psi}{\cos\psi} = \arctan\frac{2 q_1 q_2 + 2 q_0 q_3}{1 - 2(q_2^2 + q_3^2)}$   (4)

$\mathrm{Pitch} = \arcsin(2 q_1 q_3 - 2 q_0 q_2)$   (5)

$\mathrm{Roll} = -\arctan\frac{2 q_2 q_3 + 2 q_0 q_1}{1 - 2(q_1^2 + q_2^2)}$   (6)
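As an illustration of (4)-(6), a small sketch converting a quaternion to the three Euler angles (the signs follow the reconstructed equations above; axis conventions vary between IMU setups):

import math

def quaternion_to_euler(q0, q1, q2, q3):
    # Eq. (4): yaw.
    yaw = math.atan2(2 * (q1 * q2 + q0 * q3), 1 - 2 * (q2 * q2 + q3 * q3))
    # Eq. (5): pitch; clamp the argument against numerical noise.
    s = max(-1.0, min(1.0, 2 * (q1 * q3 - q0 * q2)))
    pitch = math.asin(s)
    # Eq. (6): roll.
    roll = -math.atan2(2 * (q2 * q3 + q0 * q1), 1 - 2 * (q1 * q1 + q2 * q2))
    return roll, pitch, yaw

print(quaternion_to_euler(1.0, 0.0, 0.0, 0.0))  # identity -> (0.0, 0.0, 0.0)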
B. Human body communication with OOK multi-carrier modulation
In order to transmit multiple signals, we use a multi-carrier OOK modulation method based on the frequency division multiplexing principle. As shown in Fig. 3, the total bandwidth of the human body channel is divided into many sub-bands (or subchannels) to transmit multiple signals modulated with different carrier frequencies, while the receiver end separates the channels, so that multi-channel parallel transmission of signals without interference is possible. In the capacitive coupling of HBC, the energy of the electromagnetic wave is mainly in the form of an electric field when its frequency is below 10 MHz, which yields less radiation and better transmission characteristics [11]. In this system, we use 6 MHz and 9 MHz as carrier frequencies. Narrow band-pass filters centered at the carrier frequencies separate the two channel signals and remove unwanted noise.
Fig. 3. Multi-carrier modulation of HBC
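A toy baseband simulation of this scheme with NumPy, assuming ideal channel conditions and the 6 MHz / 9 MHz carriers from the paper; the per-bit correlation below is a stand-in for the hardware band-pass filter and envelope detector:

import numpy as np

FS = 100e6                       # simulation sample rate (assumed)
BIT_RATE = 57.6e3
SPB = int(FS / BIT_RATE)         # samples per bit

def ook_modulate(bits, carrier_hz):
    # OOK: carrier present for '1', absent for '0'.
    t = np.arange(len(bits) * SPB) / FS
    gate = np.repeat(np.asarray(bits, dtype=float), SPB)
    return gate * np.sin(2 * np.pi * carrier_hz * t)

def ook_demodulate(signal, carrier_hz, n_bits):
    # Correlate each bit interval with its carrier; cross-carrier terms average out.
    ref = np.sin(2 * np.pi * carrier_hz * np.arange(len(signal)) / FS)
    energies = (signal * ref)[:n_bits * SPB].reshape(n_bits, SPB).sum(axis=1)
    return (energies > energies.max() / 2).astype(int)

bits1, bits2 = [1, 0, 1, 1, 0, 1], [0, 1, 1, 0, 1, 0]
channel = ook_modulate(bits1, 6e6) + ook_modulate(bits2, 9e6)  # shared body channel
print(ook_demodulate(channel, 6e6, 6), ook_demodulate(channel, 9e6, 6))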
The HBC on-off keying modulation scheme is shown in Fig. 4. The HBC transceiver realizes OOK modulation with a sinusoidal oscillator circuit and an analog switch producing a 3 V amplitude sine wave at 9 MHz or 6 MHz. The digital signal is input at TXIN: when TXIN is '1', the carrier sine wave is output, and when TXIN is '0', no signal is output; thus OOK modulation is realized. At the receiver, signals suffer attenuation and distortion. A passive Chebyshev band-pass filter, centered at the carrier frequency, is employed because of the fast attenuation of its transition band, to remove unwanted signals. The envelope detector together with the comparator then demodulates the signal and keeps the electrical level within the TTL range, ensuring the MCU of the receiver can read the digital signal directly via the serial port.
Fig. 4. The OOK modulation scheme
IV. RESULTS AND DISCUSSION
A. Experiment of HBC
In the measurement experiment, all of the circuit boards were powered by batteries in order to keep the transmitter node and the receiver node separate; the Tektronix 2024 oscilloscope used in the experiment, which is powered by its own internal battery, could likewise keep the two channels isolated. A 24-year-old male's left arm was used as the experimental subject. The medical electrodes measured 4 cm x 4 cm.
Fig.5 HBC devices experiment

Fig. 5 shows the measurement setup and the waveform results. The electrodes of the transceivers were attached to the human arm and the back of the hand as described in Section II. The data was modulated and transmitted through the human body to the receiver board. The test signal was of NRZ type, generated by an STM32 serial port at a rate of 57.6 kbps. The measured output waveform of the recovered data could be seen clearly on the oscilloscope screen, showing that the receivers recovered the modulated signals and separated the two channel signals successfully. Fig. 6 shows the output of the modulation signal and the received signal of transceiver 1, whose carrier frequency is 6 MHz. After analysis, the bit error rate was less than 10^-5 when the data rate was 57.6 kbps, indicating that the HBC system works successfully.

Fig.6 Modulation and demodulation waveform compared with the original signal
B. Reconstruction of the posture of the human arm and hand

As shown in Fig. 7, three inertial sensors are placed on the upper arm, the lower arm and the back of the hand to obtain the attitude angles of the three parts. As a result, the three-dimensional angles could be acquired by the computer. The result of the reconstruction can be seen in Fig. 7: three cylinders represent the human arm and hand, where the red part is the upper arm, the yellow one is the lower arm, and the purple one is the hand. The result showed that the system based on inertial sensors was able to measure and reconstruct the human posture.
Fig. 7 The reconstruction of human posture
V. CONCLUSION
In this paper, a human posture detection system based on inertial sensors and human body communication was proposed and implemented. Using a quaternion algorithm to calculate the three-dimensional angles of the human arm and hand, the posture information could be obtained. The human body was selected as the propagation medium for two-channel data transmission. The capacitive coupling method of HBC was adopted, and frequencies of 6 MHz and 9 MHz were chosen as the multi-carrier frequencies of the OOK-modulated HBC devices. The transceivers were implemented, and the BER was less than 10^-5 at the designed data rate of 57.6 kbps. The arm and hand posture could be reconstructed on a computer, so the results are visible and quantitative for specific applications.
ACKNOWLEDGMENT

The authors would like to thank the Ministry of Science Foundation of China (2013DFG32530), the National Natural Science Foundation of China (61201397) and the Funds of the Department of Education of Fujian Province, China (JA13027).

REFERENCES

[1] F. B. Horak and M. Mancini, "Objective biomarkers of balance and gait for Parkinson's disease using body-worn sensors," Balance and Gait Biomarkers, vol. 38, no. 11, pp. 1544-1551, 2013.
[2] N. Barbour and G. Schmidt, "Inertial sensor technology trends," IEEE Sensors J., vol. 1, no. 4, pp. 332-339, Dec. 2001.
[3] J. Barton, A. Lynch, S. Bellis, B. O'Flynn, F. Murphy, K. Delancy, and S. C. O'Mathuna, "Miniaturized inertial measurement units (IMU) for wireless sensor networks and novel display interfaces," in Proc. Electronic Components and Technol. Conf., 2005, pp. 1402-1406.
[4] Y.-C. Kan and C.-K. Chen, "A wearable inertial sensor node for body motion analysis," IEEE Sensors Journal, vol. 12, no. 3, March 2012.
[5] C. H. Hyoung, J. B. Sung, J. H. Hwang, J. K. Kim, D. G. Park, and S. W. Kang, "A novel system for intrabody communication: Touch-And-Play," in Proc. ISCAS, pp. 1343-1346, May 21-23, 2006.
[6] L. Zhang and Y. Gao, "Design of human motion detection based on the human body communication," in Proc. IEEE TENCON 2014, Macau SAR, China, Nov. 2015, pp. 1-4.
[7] K. Hachisuka, A. Nakata, T. Takeda, Y. Terauchi, K. Shiba, K. Sasaki, H. Hosaka, and K. Itao, "Development and performance analysis of an intra-body communication device," in Proc. 12th IEEE Int. Conf. Solid-State Sens., Actuators, Microsyst., pp. 1722-1725, Jun. 2003.
[8] T. Leng, Z. Nie, W. Wang, F. Guan, and L. Wang, "A human body communication transceiver based on on-off keying modulation," in Proc. IEEE ISBB 2011, China, Nov. 2011, pp. 61-64.
[9] Ž. Lučev, I. Krois, and M. Cifrek, "A multichannel wireless EMG measurement system based on intrabody communication," in XIX IMEKO World Congress, Fundamental and Applied Metrology, Lisbon, Portugal, pp. 1711-1715, September 2009.
[10] R. Xu and H. Zhu, "Electric-field intrabody communication channel modeling with finite-element method," IEEE Transactions on Biomedical Engineering, vol. 58, no. 3, March 2011.
[11] J. Bae, H. Cho, K. Song, H. Lee, and H.-J. Yoo, "The signal transmission mechanism on the surface of human body for body channel communication," IEEE Transactions on Microwave Theory and Techniques, vol. 60, no. 3, pp. 582-593, 2012.
SAT-based Search for Systems of Diagonal Latin
Squares in Volunteer Computing Project
SAT@home
Oleg Zaikin, Stepan Kochemazov, Alexander Semenov
Matrosov Institute for System Dynamics and Control Theory SB RAS, Irkutsk, Russia
Email: zaikin.icc@gmail.com, veinamond@gmail.com, biclop.rambler@yandex.ru
Abstract—In this paper we consider the problem of finding pairs of mutually orthogonal diagonal Latin squares of order 10. First we reduce it to the Boolean satisfiability problem (SAT). The obtained instance is very hard, therefore we decompose it into a family of subproblems. To solve the latter we use the volunteer computing project SAT@home. In the course of a 10-month computational experiment we found 29 pairs of the described kind that differ from the already known pairs. We also consider the problem of searching for triples of diagonal Latin squares of order 10 that satisfy a weakened orthogonality condition. Using diagonal Latin squares from the known pairs (most of them found in SAT@home) we constructed new triples of the proposed kind. For this computational experiment we used a computing cluster.
I. INTRODUCTION
Combinatorial problems related to Latin squares have been of interest to mathematicians since Leonhard Euler. A lot of general information about these problems can be found in [1]. A Latin square of order n is a square n × n table filled with elements from some set M, |M| = n, in such a way that each element from M appears in each row and each column exactly once. Initially Leonhard Euler used the set of Latin letters as M, therefore the corresponding combinatorial designs were named Latin squares. In this paper, for convenience, we will use as M the set {0, . . . , n − 1}. A Latin square is called diagonal if both its primary and secondary diagonals contain all numbers from 0 to n − 1. In other words, the uniqueness constraint is extended from rows and columns to the two diagonals.
A pair of Latin squares of the same order is called orthogonal if all ordered pairs of the kind (a, b) are different, where a is the number in some cell of the first Latin square and b is the number in the same cell of the second Latin square. If there are m different Latin squares of which every pair is orthogonal, they are called a system of m mutually orthogonal Latin squares (MOLS). One of the most well-known unsolved problems in this area is whether there exists a triple of MOLS of order 10.
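These definitions translate directly into executable checks. The following minimal Python sketch (ours, not from the paper) verifies the diagonal Latin square and orthogonality properties for squares given as lists of rows:

from itertools import product

def is_diagonal_latin(sq):
    # A diagonal Latin square over {0, ..., n-1}: every row, column,
    # and both diagonals contain each symbol exactly once.
    n = len(sq)
    symbols = set(range(n))
    rows_ok = all(set(row) == symbols for row in sq)
    cols_ok = all({sq[i][j] for i in range(n)} == symbols for j in range(n))
    main_ok = {sq[i][i] for i in range(n)} == symbols
    anti_ok = {sq[i][n - 1 - i] for i in range(n)} == symbols
    return rows_ok and cols_ok and main_ok and anti_ok

def are_orthogonal(a, b):
    # Orthogonal iff all n^2 ordered pairs (a[i][j], b[i][j]) are distinct.
    n = len(a)
    pairs = {(a[i][j], b[i][j]) for i, j in product(range(n), repeat=2)}
    return len(pairs) == n * n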
There are many different approaches to solving problems
regarding combinatorial designs. In the present paper we use
the so-called SAT approach [2]. Basically, it means that we reduce the original problem to the Boolean satisfiability problem (SAT) and then apply state-of-the-art SAT solvers to
an obtained instance. The attractiveness of this approach is
justified by the fact that a lot of problems from different
areas (for example, software verification, cryptography or
bioinformatics) can be effectively reduced to SAT. Despite
the fact that SAT is NP-hard and all known algorithms for solving SAT are exponential in the worst-case scenario, state-of-the-art heuristic algorithms manage to solve SAT instances
encoding practical problems from various areas in reasonable
time. The majority of such fast SAT solvers are based on the
Conflict-Driven Clause Learning paradigm (CDCL) [3].
Evidently, among SAT instances there are very hard variants
and to solve them in reasonable time it is necessary to involve
significant amounts of computational resources. That is why
the improvement of the effectiveness of SAT solving algorithms, including the development of algorithms that are able
to work in parallel and distributed computing environments, is
a very important direction of research. In 2011 we launched
the volunteer computing project SAT@home aimed at solving
hard SAT instances [4]. One of the aims of the project is to find
new combinatorial designs based on the systems of orthogonal
Latin squares.
Let us briefly outline the paper. In the second section we
describe the volunteer computing project SAT@home. In the
third section we detail how we apply SAT approach to finding
pairs of mutually orthogonal diagonal Latin squares (MODLS)
of order 10 in SAT@home. In the fourth section we show
how the pairs of MODLS found in SAT@home can be used
to search for triples of diagonal Latin squares of order 10
that satisfy weakened orthogonality condition and discuss the
results of our computational experiment, performed using a
computing cluster.
II. VOLUNTEER COMPUTING PROJECT SAT@HOME
Volunteer computing [5] is one of the types of distributed computing. Its defining characteristic is that it uses the computational resources of volunteers' PCs. The majority of volunteers are ordinary people, not affiliated with a single organization or group of companies. Usually, one volunteer computing project is designed to solve one or several closely related hard problems. It is important to note that when a volunteer's PC is connected to the project, all the calculations on it are performed automatically and do not inconvenience the user, since only idle PC resources are employed for this purpose.
Another distinctive feature of volunteer computing projects is that they can use only embarrassing parallelism [6], i.e. the original problem should be decomposed into a family of subproblems that can be solved independently from each other. Volunteer computing is very cheap: to maintain a project one needs only a dedicated server and several client applications fitting into one infrastructure. Empirically, the main difficulty lies in software development and database administration. It is also crucial to provide feedback to project users via the web site and special forums. An attractive consequence of such a structure is that a volunteer project can spend several months or even years solving one hard instance.
The majority of currently functioning volunteer computing projects are based on the Berkeley Open Infrastructure for Network Computing (BOINC) [5], developed in Berkeley in 2002. Overall, at the present moment there are about 70 active volunteer projects, and their total performance exceeds 11 PFLOPS. Structurally, a volunteer computing project can be considered as a sum of the following parts: server daemons, database, web site and client applications. Here the daemons include the work generator (it generates tasks to be processed by volunteers' PCs), the validator (it checks the correctness of the results received from volunteers' PCs) and the assimilator (it processes correct results). The client applications should have versions for the widespread computing platforms.
In 2011 the authors of the paper, in collaboration with colleagues from IITP RAS, developed the volunteer computing project SAT@home [4]. On February 7, 2012 this project was added to the official list of BOINC projects1 with alpha status. In 2015 the status was upgraded to beta. The main goal of the project is to solve hard SAT instances from various subject areas. SAT@home is based on the BOINC platform. Currently (as of January 22, 2016) the project has about 3784 active PCs from users all over the world, and has an average performance of 8.4 TFLOPS.

Let us consider the basic features of the SAT@home project in more detail. The work generator daemon is located on the server. It decomposes the original SAT instance into a family of subproblems. For this purpose it uses decomposition parameters found via a special Monte Carlo method [7] on a computing cluster. At this stage it is necessary to use a computing cluster because the corresponding computational scheme involves a lot of interprocessor exchange (essentially, it employs fine-grained parallelism). According to the concept of redundant calculations used in BOINC, the work generator creates 2 copies of each task, to be processed by 2 users from different teams (this decreases the possibility of cheating). The validator daemon checks whether both copies of a single task yielded the same result and then passes verified results to the assimilator daemon. If a satisfying assignment was found, the assimilator checks it, and if it is correct, it marks the original SAT instance as solved. The client application in SAT@home is based on a slightly modified CDCL solver MINISAT [8].
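As an illustration of the assimilator's final check, verifying a reported satisfying assignment against a CNF is straightforward. Here is a minimal Python sketch under our own conventions (clauses as lists of DIMACS-style signed integers); it is not SAT@home code:

def check_assignment(cnf, assignment):
    # cnf: iterable of clauses, each a list of non-zero ints, where a
    # positive literal v means "variable v is true" and -v its negation;
    # assignment: dict mapping variable number -> bool.
    def lit_true(lit):
        return assignment[abs(lit)] == (lit > 0)
    return all(any(lit_true(lit) for lit in clause) for clause in cnf)

# Example: (x1 or not x2) and (x2 or x3)
print(check_assignment([[1, -2], [2, 3]], {1: True, 2: False, 3: True}))  # True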
1 http://boinc.berkeley.edu/projects.php
In May 2012 the six-month computational experiment aimed at solving 10 instances of A5/1 keystream generator cryptanalysis successfully ended in SAT@home [7]. Note that in this experiment only instances that could not be solved via known rainbow tables were considered. For each instance the goal was to find the unknown initial values of the generator registers. In 2014, five weakened cryptanalysis instances for the Bivium keystream generator were successfully solved in SAT@home [9]. In each instance the values of 9 out of 177 bits corresponding to the initial values of the generator registers (at the end of the initialization phase) were known, to make the instances solvable in reasonable time.
In general, the SAT@home project is a powerful tool for
solving hard SAT instances. In the next sections we will
describe how we apply it to the search for combinatorial
designs.
III. FINDING PAIRS OF MUTUALLY ORTHOGONAL DIAGONAL LATIN SQUARES OF ORDER 10 IN SAT@HOME
The existence of a pair of MODLS of order 10 was proved in 1992: in the paper [10] three such pairs were presented. In 2012 we started in SAT@home a computational experiment aimed at finding new pairs of MODLS of order 10. The experiment finished in 2013 and its results were published in [11]. First we constructed a propositional encoding for this problem. The obtained conjunctive normal form (CNF) had 2000 Boolean variables and 434440 clauses; its size in the DIMACS format was 10 MB. We used the so-called "naive" encoding (for example, see [12]).
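For reference, one common form of such a "naive" encoding introduces a Boolean variable for every (square, row, column, value) combination, i.e. 2 x 10 x 10 x 10 = 2000 variables for a pair of order-10 squares, matching the count above. Below is a minimal Python sketch of the per-cell clauses (our illustration; the full encoding also constrains rows, columns, diagonals and orthogonality, and the authors' exact clause set may differ):

def var(s, i, j, k, n=10):
    # 1-based DIMACS index of "square s (0 or 1), cell (i, j) holds value k".
    return s * n**3 + i * n**2 + j * n + k + 1

def cell_clauses(s, n=10):
    clauses = []
    for i in range(n):
        for j in range(n):
            # At least one value per cell.
            clauses.append([var(s, i, j, k, n) for k in range(n)])
            # At most one value per cell (pairwise encoding).
            for k1 in range(n):
                for k2 in range(k1 + 1, n):
                    clauses.append([-var(s, i, j, k1, n), -var(s, i, j, k2, n)])
    return clauses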
We decomposed the obtained SAT instance as follows. The first row of the first diagonal Latin square was fixed to be equal to "0 1 2 3 4 5 6 7 8 9". This does not lead to a loss of generality because of the properties of Latin squares. After this we processed all possible values of the first 8 cells of the second and the third rows. As a result we obtained a family of subproblems, in which for each subproblem the values of the first diagonal Latin square were fixed for 26 out of 100 cells (10 from the first row and 8 from each of the second and third rows). In terms of the CNF this translated into assigning values to 260 out of 2000 variables in each SAT instance. In the SAT@home experiment each job batch contained 20 such SAT instances. For each instance MINISAT had a limit of 2600 restarts, which corresponds to roughly 4 minutes on one core of a state-of-the-art CPU. Processing the 20 million subproblems generated for the experiment took about 9 months of work of the SAT@home project (from September 2012 to May 2013). As a result we found 17 new pairs of MODLS of order 10 (in addition to the three previously known pairs from [10]).
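A simplified Python sketch of generating such a family follows. It is our own illustration: it fixes the first row to 0..9 and enumerates value prefixes for rows 2 and 3, filtering only the obvious column conflicts, whereas the real family also respects the remaining Latin square and diagonal constraints (the experiment processed about 20 million subproblems).

from itertools import permutations

def row_prefixes(n=10, cells=8):
    # Prefixes of `cells` distinct values with no column clash against
    # the fixed first row 0, 1, ..., n-1 (column j already holds j).
    for p in permutations(range(n), cells):
        if all(p[j] != j for j in range(cells)):
            yield p

def subproblems(n=10, cells=8):
    # Pairs of prefixes for rows 2 and 3 that do not clash column-wise.
    # Each result fixes 2*cells cells of the first square, i.e. assigns
    # 2*cells*n of the CNF's Boolean variables.
    for r2 in row_prefixes(n, cells):
        for r3 in row_prefixes(n, cells):
            if all(r2[j] != r3[j] for j in range(cells)):
                yield r2, r3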
In April 2015 we started in SAT@home another experiment
on the search for new pairs of MODLS of order 10. Note
that in the experiment held in 2012 (see [11]) we used the
decomposition chosen according to some rational considerations. In the new experiment we decided to use a more
systematic approach to choosing the decomposition. In Table
1 we show the data for 7 variants of decomposition (including
one suggested in [11]) and the results obtained for them.
TABLE I. THE RESULTS OF EXPERIMENTS FOR DIFFERENT DECOMPOSITIONS.

Decomposition          | Solutions found | Time
1 row, 9 cells         | 1               | 1 month in 2015
2 rows, 2 cells each   | -               | 1 day in 2015
2 rows, 3 cells each   | -               | 3 days in 2015
2 rows, 4 cells each   | -               | 2 weeks in 2015
2 rows, 5 cells each   | 23              | 4 months in 2015-2016
2 rows, 6 cells each   | 5               | 3 months in 2015-2016
2 rows, 8 cells each   | 17              | 9 months in 2012-2013
In total, from April 17, 2015 to February 7, 2016 we managed to find 29 new, previously unknown pairs of the considered kind (compared to 3 pairs from [10] and 17 pairs from [11]). All found solutions are available online at the web site of the SAT@home project. In this experiment we used the same client application with the same limit on the number of restarts as in 2012. In all decompositions we considered the first cells from the left (since the first row is always fixed, we vary the rows starting from the second).
Let us comment on Table 1. The decomposition "2 rows, 8 cells each" corresponds to the decomposition from [11]. The other 6 decompositions were tested in 2015-2016. The decompositions "2 rows, 2 cells each", "2 rows, 3 cells each" and "2 rows, 4 cells each" did not yield any results under the experiment rules. The decomposition "2 rows, 5 cells each" turned out to be more effective than the "2 rows, 8 cells each" used in [11], because it made it possible to find more new pairs per time unit (even taking into account that the performance of the SAT@home project in 2015 was twice that of 2012). It was also the most effective of all the decompositions we tested in 2015-2016. Using "2 rows, 6 cells each" we managed to find several new pairs, but its effectiveness compared to "2 rows, 5 cells each" turned out to be low, which is why the corresponding experiment was suspended.
IV. FINDING TRIPLES OF MUTUALLY PARTIALLY ORTHOGONAL DIAGONAL LATIN SQUARES OF ORDER 10 USING A COMPUTING CLUSTER
Let us consider a triple of Latin squares of the same order. For each of the three pairs of squares in the triple we can form the set of distinct ordered pairs of elements, in the same manner as when checking the orthogonality condition. We refer to the smallest power among these sets as the characteristic of the considered triple of Latin squares. Currently the record triple of mutually partially orthogonal Latin squares of order 10 in this notation is the one published in [13]. In this triple, square A is fully orthogonal to squares B and C, but squares B and C are orthogonal only over 91 pairs of elements out of 100. This means that in our notation the characteristic of this triple is 91. We are focused on finding triples of diagonal Latin squares of order 10 that have a high value of this characteristic.
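Under this reading of the definition (the minimum, over the three pairs of squares, of the number of distinct ordered pairs, which is consistent with the values 91, 73 and 60 reported in this section), the characteristic can be computed as follows; this is a minimal sketch of ours, not the authors' code:

def distinct_pairs(x, y):
    # Number of distinct ordered pairs formed cell by cell; a count of
    # n*n means the two squares are fully orthogonal.
    n = len(x)
    return len({(x[i][j], y[i][j]) for i in range(n) for j in range(n)})

def characteristic(a, b, c):
    # Minimum over the three pairs of squares in the triple.
    return min(distinct_pairs(a, b), distinct_pairs(a, c), distinct_pairs(b, c))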
In [11] we suggested considering the problem of constructing triples of MODLS of order 10 in the form of a SAT problem. We proposed a propositional encoding in which, by editing a special clause, it was possible to set up the desired value of the characteristic. When we add to this CNF the known values of two Latin squares that form an orthogonal pair, the problem reduces to finding values of the remaining unknown Boolean variables in such a way that the resulting third diagonal Latin square forms, together with the considered known pair, a partially orthogonal triple with the desired value of the characteristic. For each known pair (at that moment there were only 20) we constructed a separate CNF by assigning values to the Boolean variables corresponding to the elements of the known pair. We employed the multi-threaded CDCL solver TREENGELING [14], which was launched on every obtained SAT instance with a time limit of 1 hour on one computing node of the "Academician V.M. Matrosov" computing cluster of ISC SB RAS2. One node of this cluster contains 32 CPU cores and 64 GB of RAM. Using this approach it was possible to find a triple of the proposed kind of squares of order 10 with characteristic equal to 73 (i.e. TREENGELING managed to find a satisfying solution for only 1 of the considered SAT instances; calculations on the other SAT instances were interrupted). It should be noted that the triple of diagonal Latin squares of order 10 from [10] has characteristic 60, but in that triple 2 pairs of squares out of 3 are orthogonal (as opposed to 1 out of 3 in the triple from [11]).
Since in 2015-2016 we found 29 more new pairs of MODLS of order 10, we constructed 29 more SAT instances in addition to the 20 considered in [11]. Using TREENGELING with the same time limit (1 hour) we found two more partially orthogonal triples with characteristic equal to 73 (see (1) and (2)).
0123456789
1902634875
4276398510
9018573642
3795182064
5647021398
6381245907
7569810423
8450967231
2834709156
0123456789
2438910567
6981074235
5602137894
8516703942
3079865421
7264589310
4850392176
9347621058
1795248603
3825079164
1743862059
6587431290
8204697531
4690583712
7462918305
2976150843
5031246978
9158304627
0319725486
(1)
0123456789
1290378564
4869201357
2971865043
3607589421
9584137206
5732614890
6348720915
8015943672
7456092138
0123456789
9567241803
5390784162
1258037694
8719325046
7941560328
4876102935
2085693471
6432879510
3604918257
9310475826
3245809671
6508347219
5081623794
7963581042
1497062538
8629754103
2754198360
0172936485
4836210957
(2)
2 http://hpc.icc.ru
These triples are based on the 5th and 16th pairs of MODLS of order 10 found in the experiment, respectively (in each triple the first two diagonal Latin squares are from the corresponding pair of MODLS).
The propositional encoding proposed in [11] made it possible to obtain one more class of results. In this encoding we can set the constraint that the characteristic of the triple is 100, thus formulating the problem of searching for a triple of MODLS of order 10. In this case, when we augment the SAT instance with the known values of the cells of two squares, the problem transforms into the following: for a given pair of MODLS of order 10, find a third diagonal Latin square that forms with it a triple of MODLS of order 10, or prove that there is no such square. Thus we constructed 49 SAT instances with assignments of values corresponding to the 49 known pairs. To each instance the constraint specifying that the characteristic of the triple equals 100 was added. On average, the PLINGELING CDCL solver [14] took about one second to solve one such SAT instance, using 32 processor cores (one computing node of the Academician Matrosov cluster of ISC SB RAS). In all cases it was proven that there is no square satisfying the specified limitations (this is a consequence of the fact that all the CNFs were unsatisfiable). We even managed to significantly strengthen this result. As we reduced the specified value of the characteristic, the SAT instances became progressively harder. At the present moment we have proved for the 49 known pairs that they cannot be used to construct sets of three partially orthogonal diagonal Latin squares of order 10 with the characteristic equal to 87. Solving the corresponding SAT instances took the PLINGELING SAT solver on average about 8 hours per instance on 1 cluster node.
V. RELATED WORKS

There are several examples of the application of high-performance computing to the search for combinatorial designs based on Latin squares. In [15] it was proven that there is no finite projective plane of order 10, via special algorithms based on constructions and results from the theory of error-correcting codes. The corresponding experiment took several years; at its final stage quite a powerful (at that moment) computing cluster was used. A more recent example is the proof of the hypothesis about the minimal number of clues in Sudoku [16], where special algorithms were used to enumerate and check all possible Sudoku variants. To solve this problem a modern computing cluster worked for almost a year. The volunteer computing project Sudoku@vtaiwan [17] was used to confirm the solution of this problem.
In [18] the application of SAT solvers to finding systems of orthogonal Latin squares was described. Among other things, the author of [18] used a specially constructed small desktop grid for more than 10 years in an attempt to find a triple of MODLS of order 10. Unfortunately, the experiment did not yield any success.

Apparently, [19] was the first paper about the use of a desktop grid based on the BOINC platform for solving SAT. It did not evolve into a publicly available volunteer computing project (as SAT@home did).
VI. CONCLUSION
In this paper we described the results obtained by applying high-performance computing to SAT-based search for systems of diagonal Latin squares of order 10. Using SAT@home we found 29 pairs of MODLS of order 10 (in addition to the 20 previously known pairs). Using a computing cluster we found two triples of partially orthogonal diagonal Latin squares of order 10 with characteristic equal to 73 (in addition to one such triple found earlier). We also proved that, based on all 49 known pairs of MODLS of order 10, it is impossible not only to construct a triple of MODLS of order 10, but even to construct a triple of partially orthogonal diagonal Latin squares of order 10 with characteristic equal to 87.
ACKNOWLEDGMENT
The research was funded by Russian Science Foundation
(project No. 16-11-10046).
REFERENCES
[1] C. J. Colbourn and J. H. Dinitz, The CRC handbook of combinatorial
designs. CRC Press, 1996.
[2] A. Biere, M. J. H. Heule, H. van Maaren, and T. Walsh, Eds., Handbook
of Satisfiability, ser. Frontiers in Artificial Intelligence and Applications.
IOS Press, February 2009, vol. 185.
[3] J. Marques-Silva, I. Lynce, and S. Malik, Conflict-Driven Clause Learning SAT Solvers, ser. Frontiers in Artificial Intelligence and Applications.
IOS Press, February 2009, vol. 185, ch. 4, pp. 131–153.
[4] M. Posypkin, A. Semenov, and O. Zaikin, “Using BOINC desktop grid
to solve large scale SAT problems,” Computer Science (AGH), vol. 13,
no. 1, pp. 25–34, 2012.
[5] D. P. Anderson and G. Fedak, “The computational and storage potential
of volunteer computing,” in Sixth IEEE International Symposium on
Cluster Computing and the Grid (CCGrid 2006), 16-19 May 2006,
Singapore. IEEE Computer Society, 2006, pp. 73–80.
[6] I. Foster, Designing and Building Parallel Programs: Concepts and Tools
for Parallel Software Engineering. Boston, MA, USA: Addison-Wesley
Longman Publishing Co., Inc., 1995.
[7] A. Semenov and O. Zaikin, “Using monte carlo method for searching
partitionings of hard variants of boolean satisfiability problem,” in Parallel Computing Technologies - 13th International Conference, PaCT 2015,
Petrozavodsk, Russia, August 31 - September 4, 2015, Proceedings,
ser. Lecture Notes in Computer Science, V. Malyshkin, Ed., vol. 9251.
Springer, 2015, pp. 222–230.
[8] N. Eén and N. Sörensson, “An extensible sat-solver,” in Theory and
Applications of Satisfiability Testing, 6th International Conference, SAT
2003. Santa Margherita Ligure, Italy, May 5-8, 2003 Selected Revised
Papers, ser. Lecture Notes in Computer Science, E. Giunchiglia and
A. Tacchella, Eds., vol. 2919. Springer, 2003, pp. 502–518.
[9] O. Zaikin, A. Semenov, and I. Otpuschennikov, “Solving weakened
cryptanalysis problems for the Bivium cipher in the volunteer computing
project SAT@home,” in Second International Conference BOINC-based
High Performance Computing: Fundamental Research and Development
(BOINC:FAST 2015), Petrozavodsk, Russia, September 14-18, 2015, ser.
CEUR-WS, vol. 1502, 2015, pp. 22–30.
[10] J. Brown, F. Cherry, L. Most, E. Parker, and W. Wallis, “Completion
of the spectrum of orthogonal diagonal latin squares,” Lecture notes in
pure and applied mathematics, vol. 139, pp. 43–49, 1992.
[11] O. Zaikin and S. Kochemazov, “The search for systems of diagonal
Latin squares using the SAT@home project,” in Second International
Conference BOINC-based High Performance Computing: Fundamental
Research and Development (BOINC:FAST 2015), Petrozavodsk, Russia,
September 14-18, 2015, ser. CEUR-WS, vol. 1502, 2015, pp. 52–63.
[12] I. Lynce and J. Ouaknine, “Sudoku as a SAT problem,” in International
Symposium on Artificial Intelligence and Mathematics (ISAIM 2006),
Fort Lauderdale, Florida, USA, January 4-6, 2006, 2006.
[13] J. Egan and I. M. Wanless, “Enumeration of MOLS of small order,”
Math. Comput., vol. 85, no. 298, pp. 799–824, 2016.
[14] A. Biere, "Lingeling essentials, a tutorial on design and implementation aspects of the SAT solver lingeling," in POS-14. Fifth Pragmatics of
SAT workshop, a workshop of the SAT 2014 conference, part of FLoC
2014 during the Vienna Summer of Logic, July 13, 2014, Vienna, Austria,
ser. EPiC Series, D. L. Berre, Ed., vol. 27. EasyChair, 2014, p. 88.
[15] C. Lam, L. Thiel, and S. Swierz, “The nonexistence of finite projective
planes of order 10,” Canad. J. Math., vol. 41, pp. 1117–1123, 1989.
[16] G. McGuire, B. Tugemann, and G. Civario, “There is no 16-clue sudoku:
Solving the sudoku minimum number of clues problem via hitting set
enumeration,” Experimental Mathematics, vol. 23, no. 2, pp. 190–217,
2014.
[17] H.-H. Lin and I.-C. Wu, “Solving the minimum sudoku problem,” in
The 2010 International Conference on Technologies and Applications
of Artificial Intelligence, ser. TAAI ’10. Washington, DC, USA: IEEE
Computer Society, 2010, pp. 456–461.
[18] H. Zhang, Combinatorial Designs by SAT Solvers, ser. Frontiers in
Artificial Intelligence and Applications. IOS Press, February 2009,
vol. 185, pp. 533–568.
[19] M. Black and G. Bard, “SAT over BOINC: an application-independent
volunteer grid project,” in 12th IEEE/ACM International Conference on
Grid Computing, GRID 2011, Lyon, France, September 21-23, 2011,
S. Jha, N. gentschen Felde, R. Buyya, and G. Fedak, Eds. IEEE Computer Society, 2011, pp. 226–227.
Architectural Models for Deploying and Running
Virtual Laboratories in the Cloud
E. Afgan1,2, A. Lonie3, J. Taylor1, K. Skala2, N. Goonasekera3,*
1 Johns Hopkins University, Biology Department, Baltimore, MD, USA
2 Ruder Boskovic Institute (RBI), Centre for Informatics and Computing, Zagreb, Croatia
3 University of Melbourne, Victorian Life Sciences Computation Initiative, Melbourne, Australia
enis.afgan@jhu.edu, alonie@unimelb.edu.au, jxtx@jhu.edu, skala@irb.hr, ngoonasekera@unimelb.edu.au
* Corresponding author
Abstract - Running virtual laboratories as software services
in cloud computing environments requires numerous
technical challenges to be addressed. Domain scientists using
those virtual laboratories desire powerful, effective and
simple-to-use systems. To meet those requirements, these
systems are deployed as sophisticated services that require a
high level of autonomy and resilience. In this paper we
describe a number of deployment models based on technical
solutions and experiences that enabled our users to deploy
and use thousands of virtual laboratory instances.
I. INTRODUCTION
The past decade has seen cloud computing go from
conception to the de facto standard platform for application
deployment. Cloud infrastructures are delivering resources
for deploying today’s applications that are more scalable
[1], more cost-effective [2], more robust [3], more easily
managed [4], and more economical [5]. Researchers and
research groups are no different from the rest of the
industry, expecting robust, powerful cloud platforms
capable of handling their data analysis needs. However,
deploying such platforms still requires a significant amount
of effort and technical expertise. In this paper, we build on
our experiences from 5+ years of building and managing
virtual laboratories that were deployed thousands of times
on clouds around the world. We present viable architectural
deployment models and extract best practices for others
developing or deploying their own versions of robust
research platforms.
The theme of this paper revolves around deploying the
concept of a virtual laboratory as a platform for performing
data analysis [6]. Virtual labs offer access to a gamut of data
analysis tools and workflow platforms that are closely
linked to commonly used datasets; they offer access to
scalable infrastructure that has been appropriately configured beforehand as well as dynamically at runtime.
Once built, the virtual labs are often deployed on demand
by the researchers themselves. However, in order to make
these platforms available to domain researchers, there is a
requirement to build, configure, and provision the
necessary components. Depending on the complexity of a
virtual lab, this is often a complex task spanning expertise
in system administration, platform development, and
domain-specific application setup.
In addition to deploying variations of said virtual labs
on public clouds, institutions are increasingly setting up
academic clouds (e.g., NeCTAR, Chameleon, JetStream).
Locality of the infrastructure, restrictions on off-shoring
data [7], avoiding vendor lock-in and the no-cost or merit-based allocation of resources are attractive reasons for
utilizing those clouds. From a platform deployment
standpoint, this brings up additional challenges because the
platforms need to be deployed, managed, maintained, and
supported on these additional clouds while coping with any
differences among the cloud providers. It is hence
imperative to design scalable, robust and cloud agnostic
models for deploying these systems. Figure 1 captures the
core concepts enabling development of such models: (a) a
cross-cloud API layer; (b) automation; (c) a configurable
and ‘composable’ set of resources. These concepts, detailed
in the remainder of the paper, embody the notion that for
successfully building a global virtual lab a common
platform rooted in automation is needed.
Figure 1. Virtual lab deployment stack unified for multiple clouds.
II. FUNCTIONAL REQUIREMENTS
The choice of a Virtual Lab architecture is driven by a
variety of aspects and cross-cutting concerns [8]. While
some of these decisions are general architectural decisions
applicable to software in general, and some are highly
specific to the domain in question, there are some concerns
which are applicable to virtual labs in general. In this
section, we provide a treatment of such concerns, and list
some of the various architectural concerns that must be
addressed in designing and developing a virtual lab
environment.
For example, a virtual lab would need to determine the
level of customisation that is required by a user. If a
significant customisation is required, it is often the case that
it will impact other users, and therefore, isolated or
MIPRO 2016/DC VIS
individualised access to resources is preferable over access to a common pool of shared resources - for example, a user-owned container or virtual machine, as opposed to a pre-deployed web service.
Similarly, small job sizes can typically be catered to by
a single, individualised VM, whereas large job sizes may
require an architecture that can dynamically scale to
accommodate more diverse needs.
The choice of an appropriate strategy depends on several additional factors, including the purpose of the virtual lab, the capabilities of the target cloud(s), the available personnel effort, and similar. Table 1 supplies a core list of design questions to answer when weighing the available options. Note that there is no single answer to the supplied questions; the acceptable answers largely depend on the aims of the virtual lab. Deciding what the acceptable answers are for a particular lab will help guide the myriad of technical choices related to implementation. In the sections that follow, we discuss various compute and data provisioning strategies that can accommodate these decisions.
TABLE I. FUNCTIONAL DESIGN QUESTIONS TO CONSIDER WHEN DESIGNING A VIRTUAL LAB.

Infrastructure
• Infrastructure maturity: Which cloud to use? How stable/mature is the infrastructure? Will the deployed lab be robust for users?
• Infrastructure agnosticism: How easy is it to support multiple infrastructure providers? Is this desirable/necessary to increase accessibility/robustness?
• Support: What type of support does the provider offer, for the virtual lab and individual users?

User Management
• Per-user customisation: Can each user customise the virtual lab according to their needs, and have a safe environment in which to learn through failure?
• Data management: How is data put into/taken out of the virtual lab?
• Quota management: What resource quotas should be enforced for the user? Does the infrastructure provider support that?

Users' Management of VL
• Instance lock-in: Is an upgrade path available so that the user can always use the latest version of the virtual lab?
• Replicability: Can the user replicate their experiments - with a guarantee that all software versions remain unchanged?
• Reliability: How reliable should the service be? Can losses be tolerated?

Service Management
• Software management: Can a user manage the software on the virtual lab on their own, or do they need system administration skills?
• Licensing constraints: Are there specific licensing constraints that limit the use of the software?

Security
• Authentication: Should the virtual lab allow for single sign-on with institutional credentials?
• Credentials: How are institutional credentials translated into cloud provider credentials?
• Authorization: What actions are users allowed to perform within a virtual lab?
III. VIRTUAL LAB INFRASTRUCTURE COMPONENTS
Answering the above questions and provisioning a virtual lab requires matching various complex software components to their required storage and processing resources. Depending on the intended usage of the virtual lab, there are a number of choices regarding the use of appropriate cloud resources. Table 2 provides a snapshot of the available approaches for supplying compute capacity, along with the pros and cons of each option.
299
TABLE II. COMPUTE PROVISIONING STRATEGIES.

Machine Image: A pre-built machine image with all required software already installed.
• Pros: Quick startup. Excellent reproducibility.
• Cons: Difficult to upgrade due to monolithic nature. Software packages not self-contained, causing potential version conflicts. Software potentially tied to OS version. Limitations on size. Breached applications may affect the entire machine.

Container: A pre-built container (such as Docker or LXC), which is deployed on top of a running machine image or a cloud container service.
• Pros: Extremely quick startup. Excellent reproducibility. Containers mostly independent of the underlying machine's operating system and version. Easier updates to individual components. Breaches contained to the container.
• Cons: Must be pre-built. Still very new, so quickly changing technology.

Runtime: Required software installed at runtime, using automation software such as Ansible, Chef, Puppet, etc.
• Pros: Push or pull update models. Updates easier to make.
• Cons: Slow deployment/startup times. Less reproducible (transient network errors, software version changes).

Hybrid: A pre-built machine image/container for quick startup, brought up-to-date through runtime deployment/extensibility.
• Pros: Can combine the advantages of all of the above models.
• Cons: More complex to implement.
Compute resources need to be matched with suitable storage capacity. Table 3 differentiates among the currently available cloud storage resource types and captures the pros and cons of each. The supplied information examines ways that data can be brought to the compute infrastructure, since this is still the dominant way to work with existing scientific software. We do not consider the reverse model, as many virtual labs still struggle to shed the weight of their accumulated legacy software that requires a shared, UNIX-based file system to run, and are not yet in a position to take advantage of such models despite their benefits.
TABLE III. DATA PROVISIONING STRATEGIES FOR GETTING REQUIRED STATIC DATA TO THE COMPUTE.

Volumes / Snapshots: A volume or snapshot containing the required data, which is attached to an instance at runtime.
• Pros: Quick to create/attach. Suitable for large amounts of data.
• Cons: Not shareable between clouds (in OpenStack, not shareable between projects). Not guaranteed to be available (e.g., infrastructure/quota restrictions). Limited sharing ability between nodes (e.g., volumes only attachable to 1 instance at a time).

Shared POSIX filesystem: A shared filesystem containing the required data (e.g., NFS, Gluster), which is mounted on the target node at runtime.
• Pros: One-time setup. Very fast to attach. Updates visible at runtime to all virtual lab instances. Suitable for very large amounts of data.
• Cons: Must be set up on each supported cloud. Centralised management/single point of failure. Not geographically scalable.

Remotely fetched data archive: An http/ftp link to the required data, which is downloaded and extracted onto local/transient storage.
• Pros: Cloud agnostic. Scalable.
• Cons: Slow - takes a long time to fetch and extract. Not suited to very large amounts of data, to reduce download times/costs.

Object store: Object-storage service provided by the cloud provider (e.g., S3, Swift).
• Pros: High scalability.
• Cons: Not suitable for random access. Not supported by legacy tools.
IV. DEPLOYMENT OPTIONS
The resources available and required to compose a
virtual lab can be assembled in a variety of configurations.
Configurations support different use cases and require
varying levels of technical complexity to deploy. Hence,
depending on the intended purpose of the virtual lab, it is
important to choose an appropriate deployment model. We
define the following deployment models and supply a
flowchart in Figure 2 to navigate among the models:
• Centrally managed resource is a virtual lab which is
presented as a public service to the community.
Typically available as a web portal, this virtual lab
requires little or no setup from the user’s side and
permits the user to readily utilize resources offered by
the virtual lab. Because it is a public resource, the user
is likely to experience limited functionality, such as
usage quotas, no ssh access, no possibility for
customisation and other similar constraints typical of
public services. While the users do not require any
setup for this type of virtual lab, the lab maintainers
need to manage and update underlying infrastructure
supporting the supplied services. Resource
management needs to account for the scaling of the
supplied services, upgrades, and reproducibility of
user’s results. In addition to accessibility, other main
drivers for choosing this model for lab deployment are
data management restrictions (in case data is too large
for feasible sharing) and software licensing
constraints. Examples of this type of virtual lab include
XSEDE science gateways (https://portal.xsede.org/
web/guest/gateways-listing), Characterisation Virtual
Lab (https://www.massive.org.au/cvl/), usegalaxy.org
portal [9];
• Standalone image represents a feature-full version of the virtual lab in a small package. A user is required to have appropriate access to the cloud provider where the image is available and must personally launch an instance of the virtual lab; various launcher applications can make this a straightforward process. Once launched, the user will have full control over the virtual lab services but will also need to manage the services, particularly when upgrades or fixes are necessary. In addition to managing the services, the user is in charge of data management, ensuring that the data is not lost when an instance is terminated. Virtual lab providers need to bundle the virtual lab into an instantiatable image, such as a virtual machine image or a container, and provide periodic upgrades to the image. Examples include CloudBioLinux [10];
• Persistent short-lived scalable cluster is a
dynamically scalable version of the virtual lab image
with additional services to handle infrastructure
scaling. These services (i.e., cluster management
services) are used to provision a virtual cluster at
runtime (e.g., Slurm, SGE, Hadoop) or utilize cloud
provider services for scaling (e.g., container engine).
The cluster manager software will also supply
additional cluster management services, such as cluster
persistence, allowing a user to shut down the cluster
when not in use while ensuring the data is preserved.
This deployment model requires coordination of
several resource types from Section 3 and use of
cluster management software, hence implying a
significant deployment effort from the virtual lab
deployers. Examples include the Genomics Virtual
Lab (GVL) [11];
• Long-lived scalable cluster has the same
characteristics of a short-lived cluster as well as the
ability to upgrade running services. The upgrades are
typically handled by the cluster management software.
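The decision logic of the Figure 2 flowchart can also be summarised programmatically. The following is a minimal sketch of ours; the function and argument names are illustrative, not part of any published API:

def choose_deployment_model(shared_service, predictable_load,
                            small_workloads, periodic_needs):
    # Encodes the Figure 2 flowchart for picking a deployment model.
    if shared_service:        # shared, non-customisable community service?
        if predictable_load:  # number of users and workload sizes predictable?
            return "statically sized centrally managed resource"
        return "dynamically scalable centrally managed resource"
    if small_workloads:       # anticipated workloads small?
        return "standalone VM/container from an image"
    if periodic_needs:        # data analysis needs periodic?
        return "persistent short-lived scalable virtual cluster"
    return "long-lived scalable virtual cluster"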
Figure 2. Flowchart for choosing among the virtual lab deployment models.

V. DISCUSSION
In addition to the hardware and functional requirements
for establishing a virtual lab, there are other important
technical and management decisions that affect its deployment. One of the key attractions of using virtual labs is the high-level, software-as-a-service experience delivered to the user. An implication is that the functions offered by the deployed services need to work well,
which drives a need for good testing strategies and quality
assurance. However, complex services and frequent
releases make this a challenge for the deployers. It is hence
advisable to automate the testing procedure and develop a
quality control process before each release. Ideally, the
testing process is decentralized, testing the individual
services for proper functionality and focusing the testing of
the virtual lab on the configuration setup. Such testing can
be achieved by adopting a user-centric view of the virtual
lab and using tools such as Selenium to automate the typical
user actions [12].
Further, virtual labs should be complemented with a set of training materials describing all the steps required to access the services supplied by the virtual lab. Additional training materials for using the services are also beneficial, particularly if they are accompanied by webinars or hands-on workshops.
Besides the technical implementation of the virtual lab, arguably the most challenging piece of a virtual lab is long-term support. Domain researchers will rely on the virtual lab to perform data analyses and publish new knowledge based on the obtained results. For reproducibility purposes, it is hence important to maintain their access to the resources required to use a virtual lab. When using a commercial cloud provider and supplying shared resources, attention should be paid to what happens when the project funding runs out or the software being used becomes obsolete. Such questions imply that the virtual lab should make appropriate provisions to allow all the data and accompanying methods to be downloaded or transferred off the infrastructure initially used.
Related to long-term support is the notion of upgradeability. For example, a user working with a particular version of a virtual lab may wish to upgrade to the latest available version. It is generally undesirable to foist an upgrade on users, as this can adversely affect reproducibility when software versions change. Therefore, a controlled migration or exit path is often necessary so that users can switch to newer versions of a virtual lab when appropriate for their circumstances.
VI. SUMMARY
With the increased proliferation of cloud computing infrastructures, we believe the concept of a virtual lab - a composite platform capable of performing open-ended data analyses - will become a prevalent way for researchers to utilize cloud resources. In this paper we have described the components required to compose a virtual lab. Technical and managerial aspects of the decision-making process have been presented, shedding light on the tradeoffs among viable options.
Looking to the future, it is expected that the concept of
a virtual lab will continue to evolve towards a more
integrated, quickly deployable system that is instantly
accessible by users. Containers, automation solutions, and
serverless runtime platforms are likely key technologies
that will be adopted to realize this evolution.
ACKNOWLEDGMENTS
This project was supported in part through grant
VLS402 from National eCollaboration Tools and Resources, grant eRIC07 from Australian National Data
Service, grant number HG006620 from the National
Human Genome Research Institute, and grant number
CA184826 from the National Cancer Institute, National
Institutes of Health.
REFERENCES
[1] Q. Zhang, L. Cheng, and R. Boutaba, “Cloud
computing: state-of-the-art and research
challenges,” J. Internet Serv. Appl., vol. 1, no. 1, pp.
7–18, Apr. 2010.
[2] M. Armbrust, A. Fox, R. Griffith, A. Joseph, and R. H. Katz, “Above the clouds: A Berkeley view of cloud computing,” Univ. California, Berkeley, Tech. Rep. UCB, pp. 07–013, 2009.
[3] G. Garrison, S. Kim, and R. L. Wakefield, “Success
factors for deploying cloud computing,” Commun.
ACM, vol. 55, no. 9, p. 62, Sep. 2012.
[4] G. Garrison, R. L. Wakefield, and S. Kim, “The
effects of IT capabilities and delivery model on
cloud computing success and firm performance for
cloud supported processes and operations,” Int. J.
Inf. Manage., vol. 35, no. 4, pp. 377–393, Aug.
2015.
[5] S. P. Ahuja, S. Mani, and J. Zambrano, “A Survey
of the State of Cloud Computing in Healthcare,”
Netw. Commun. Technol., vol. 1, no. 2, p. 12, Sep.
2012.
[6] S. D. Burd, X. Luo, and A. F. Seazzu, “Cloud-Based Virtual Computing Laboratories,” in 2013
46th Hawaii International Conference on System
Sciences, 2013, pp. 5079–5088.
[7] J. J. M. Seddon and W. L. Currie, “Cloud
computing and trans-border health data: Unpacking
U.S. and EU healthcare regulation and compliance,”
Heal. Policy Technol., vol. 2, no. 4, pp. 229–241,
Dec. 2013.
[8] A. Garcia, T. Batista, A. Rashid, and C. Sant’Anna,
“Driving and managing architectural decisions with
aspects,” ACM SIGSOFT Softw. Eng. Notes, vol.
31, no. 5, p. 6, Sep. 2006.
[9] E. Afgan, J. Goecks, D. Baker, N. Coraor, A.
Nekrutenko, and J. Taylor, “Galaxy - a Gateway to
Tools in e-Science,” in Guide to e-Science, X.
Yang, L. Wang, and W. Jie, Eds. Springer, 2011,
pp. 145–177.
[10] K. Krampis, T. Booth, B. Chapman, B. Tiwari, M.
Bicak, D. Field, and K. Nelson, “Cloud BioLinux:
pre-configured and on-demand bioinformatics
computing for the genomics community,” BMC
Bioinformatics, vol. 13, p. 42, 2012.
[11] E. Afgan, C. Sloggett, N. Goonasekera, I. Makunin,
D. Benson, M. Crowe, S. Gladman, Y. Kowsar, M.
Pheasant, R. Horst, and A. Lonie, “Genomics
Virtual Laboratory: A Practical Bioinformatics
Workbench for the Cloud.,” PLoS One, vol. 10, no.
10, p. e0140829, Jan. 2015.
[12] E. Afgan, D. Benson, and N. Goonasekera, “Test-driven Evaluation of Galaxy Scalability on the
Cloud,” in Galaxy Community Conference, 2014.
A CAD Service for Fusion Physics Codes
Marijo Telenta, Leon Kos and EUROFusion MST1 Team∗
University of Ljubljana, Mech. Eng., LECAD, Ljubljana, Slovenia
marijo.telenta@lecad.fs.uni-lj.si, leon.kos@lecad.fs.uni-lj.si
Abstract—There is an increased need for coupling machine descriptions to various fusion physics codes. We present a computer aided design (CAD) service library that serves geometrical data requested by fusion physics codes in a completely programmatic way, for use in scientific workflow engines. Fusion codes can request CAD geometrical data at different levels of detail (LOD) and control major assembly parameters. This service can be part of a scientific workflow that delivers meshing of the CAD model and/or variation of the parameters. In this paper we present the re-engineering of the ITER tokamak using an open source CAD kernel that provides a standalone library of services. Modelling of the machine is done with several LODs, starting from a rough model and building/replacing it with more detailed models by adding details and features. Such CAD modelling of the machine with LODs delivers flexibility and data provenance records for the complete CAD-to-physics-codes workflow chain.
I. INTRODUCTION
Several commercial “workbench”-style workflows can integrate CAD data for further analysis and parametric optimisation. Usually, the CAD data for use in analysis is prepared or extracted and simplified in such a way that it is then easily processed further by meshing tools. Control of the CAD data output relies on proprietary plug-ins that allow extraction only to the extent of the interaction foreseen by the “integrators”. Depending on the workbench suite, building a coupled simulation with external codes/solvers is in general only partially supported and does not offer the flexibility required by the “custom” physics codes used by scientists. The “open” approach to data and results is often a prerequisite for (re)use in the scientific community, and that rules out direct coupling to CAD packages within “scientific” workflow engines such as Kepler [1].
Instead of “extracting” machine descriptions from CAD models directly, most codes use manually prepared machine data for their input and have a “hard time” verifying its correctness against the CAD model. For each new machine to be introduced into a code, one needs to repeat the procedure of machine description creation while maintaining the “internal” input format used by the code. Most of these custom input file formats are relatively simple and meant to be read by Fortran and other programming languages. Little attention is given to consistency checking of the input format. Most fusion-physics codes use “discretised” geometry for input, thereby losing some “precision” in the machine descriptions.
∗ See http://www.euro-fusionscipub.org/mst1

Fig. 1. ITER tokamak complex re-modelled with the Open CASCADE kernel, which provides diverse programmable geometric services from CAD data to fusion codes at several levels of detail (LOD). Incremental LODs add features to assemblies while maintaining physical correctness; the LOD is one of the parameters of the CAD service.

Notable exceptions are Monte Carlo codes [2], which do
not use grids for determining the particle position but rather
simple primitives (spheres, bounding blocks, etc.) and to some
extent higher level primitives bounded by NURBS surfaces
commonly used in CAD modelling. Such codes are even faster
if the space is not discretised with dense “grids” which can
easily increase computational complexity of positional checking. In principle, input grid density controls the computational
complexity of the code and a proper balance of the grid and
available time is always sought by the scientists to achieve
acceptable accuracy. It should be noted that the computational grid usually differs from the input grid, although both should be in
“consensus” when introducing important geometrical details
that are neglected with “rough” grids. This rationale means that scientific modelling requires several input grids, all corresponding to the “latest” available CAD model, which serves as the only “true” input source from which grids or other geometrical primitives should be generated for use as code input. The CAD modelling process of the tokamak machine in Fig. 1 was divided into five assemblies: blanket, divertor, magnets, cryostat, and vessel. For every assembly, different levels of detail (LOD), leading geometrical dimensions, and
other parameters such as number of sections are defined. The
resulting CAD model is quite complex, with many parts,
details and features. Physics codes need only specific parts of the CAD model, prepared in a code-compatible format often called a “machine description”. Including the LOD parameter, which provides an increased number of parts and geometrical features, in analyses of physics code accuracy and the corresponding computational complexity is usually not done, due to tedious input preparation.
Manual reduction of features, done by removing and simplifying unnecessary details in the CAD model, is often called defeaturing. In this paper, however, the opposite approach was taken. The CAD model was reverse engineered, i.e., the CAD model is re-modelled from the bottom up, starting from a simpler shape with basic features and modelling the geometry by adding features in a parametric way. In this process different LODs are defined, where each LOD has a pre-determined specific complexity of the model. The complexity of the model is defined by the number of features and repetitive pieces used to create the CAD model.
II. LEVEL OF DETAIL (LOD)
The CAD model of the tokamak machine is reverse engineered using the Open CASCADE kernel. Open CASCADE [3] is an open-source CAD kernel that offers programmatic flexibility in modelling and meshing of CAD models, whose meshes can be used as input for fusion codes. The different levels of detail are created in the process of programming the geometry, each with a different CAD model complexity. Basic shapes are modelled with little geometrical detail and with different numbers of segments, as shown without the cryostat assembly in Fig. 2. The first LOD, LOD-0, includes the basic shape and features of the model. This LOD can be used for analyses where details are not important and would make the grid unnecessarily complex.
Several LODs for different sections are shown with different numbers of segments. This means that the physics codes can be supplied with varying LODs and segments in a programmatic way. Furthermore, the shapes may differ between LODs to follow the physics requirements of the simulation. For example, in Fig. 2(a) the divertor is modelled in a “fused” way, resembling basic shapes without any void space inside; the blanket and vessel in Fig. 2(b, c) have no port apertures. If holes are important for a simulation, one can increase the LOD for that part only. If there is a specific requirement on the shape or number of holes, one can “inherit” the code and add or remove features. The numbering of LODs does not need to follow levels of detail agreed among assemblies, provided they are properly developed and documented. Users can consider the code that generates the CAD models as a starting point for specific modifications that can later become part of the CAD service. When the shape of a part needs modification, for instance to obey a code's requirements on precision or complexity, the same principles of LOD modification can be followed.
Fig. 2. LOD-0 of the divertor segment (a), half revolution of the blanket (b), quarter revolution of the vessel (c), and magnets with 3 segments (d). The geometry of all parts uses the basic shape, without the features that are added or replaced at a higher LOD.
A drawback of the programmatic approach is clearly the non-graphical representation of the models. For easier modification and comprehension we are using the Python “wrapping” PythonOCC [4] of the CAD kernel. The Python code is interpreted, which removes the need for recompilation when the source code is modified. The CAD kernel performance is not significantly affected by using Python, as all kernel operations are written in C++. Repeating the lengthy process of model creation with increasing LOD is even more comfortable with programming in Python than doing it interactively within a CAD package.
Listing 1. Snippet where the separate sections of the tokamak are created in a programmatic way in Python. In addition, the CAD cross-section service is called for the divertor.

divertor(display, level_of_detail, number_of_bodies,
    divertor_outer_radius, divertor_base_radius,
    divertor_outer_dimension, divertor_inner_dimension)

magnets(display, level_of_detail,
    Magnet_Outer_Radius, Magnet_Inner_Radius,
    Magnet_System_Height, Number_Of_Elements)

vessel(display, level_of_detail, NumberOfSlices,
    VacuumVesselInnerRadious, VacuumVesselOuterRadious,
    VacuumVesselUpperHeight, VacuumVesselBottomHeight)

blanket(display, level_of_detail, revolve_for,
    height_above_ks, height_below_ks, inner_radius,
    outer_radius, top_outer_radius, top_inner_radius,
    shell_width)

cryostat(display, level_of_detail,
    cryostat_base_inner_radius, sections)

Section(display, shape, angle)
As one can see from the Python example in Listing 1, each assembly that outputs to the display has several input parameters, such as the LOD, the number of segments, and parameters which define the geometry. In this way the desired level of detail is set, and the CAD model geometry can be altered by changing the input parameters; for example, the vessel's inner and outer radius can be changed, and all other dimensions of the vessel follow accordingly. Instead of outputting to the display, one can choose to output shapes/assemblies and combine them into a desired assembly for further operations on the geometry. Operations such as Section() in the last line of Listing 1 are part of the geometry services provided by the library.
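To make the module pattern concrete, the following minimal sketch (our illustration, not the authors' actual module code) shows how one assembly function with an LOD parameter might be structured with pythonOCC. The function name, parameters and feature choices are assumptions; import paths follow the pythonOCC releases of that period (newer versions use OCC.Core.*).

from OCC.gp import gp_Ax2, gp_Pnt, gp_Dir
from OCC.BRepPrimAPI import BRepPrimAPI_MakeTorus, BRepPrimAPI_MakeCylinder
from OCC.BRepAlgoAPI import BRepAlgoAPI_Cut

def vessel(display, level_of_detail, inner_radius, outer_radius):
    """Hypothetical vessel module: build the shape at the requested LOD."""
    axis = gp_Ax2(gp_Pnt(0, 0, 0), gp_Dir(0, 0, 1))   # torus around z-axis
    major = (inner_radius + outer_radius) / 2.0        # centreline radius
    minor = (outer_radius - inner_radius) / 2.0        # tube radius
    # LOD-0: plain torus with no features.
    shape = BRepPrimAPI_MakeTorus(axis, major, minor).Shape()
    if level_of_detail >= 1:
        # LOD-1: cut a single vertical port aperture through the tube;
        # a real module would add apertures, fillets, etc. per LOD.
        port = BRepPrimAPI_MakeCylinder(
            gp_Ax2(gp_Pnt(major, 0, -2.0 * minor), gp_Dir(0, 0, 1)),
            0.3 * minor, 4.0 * minor).Shape()
        shape = BRepAlgoAPI_Cut(shape, port).Shape()
    if display is not None:
        display.DisplayShape(shape)
    return shape

Returning the shape instead of only displaying it is what allows assemblies to be combined and further geometry services to be applied, as described above.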
The creation of each assembly with PythonOCC undergoes several steps in programming the CAD kernel. The Python code for each assembly is also self-consistent: it acts as a module that can be imported independently into other Python code. Modules have additional geometric parameters, such as the dimensions and positions of features, that are not exposed but remain available for modification by the users.
The second level of detail, shown in Fig. 3, adds details and new shapes that were left out at LOD-0. The decision on what is included at each level is guided by the physics code requirements. As one can see, at LOD-1 apertures are modelled in the blanket and the vessel. The finest LOD-2, shown in Fig. 4, defines the most detailed CAD geometry, where further geometric details such as fillets, fasteners, etc. are added to the CAD model relative to LOD-1.
Fig. 3. LOD-1 of the divertor segment (a), quarter revolution of the blanket (b), quarter revolution of the vessel (c), and one segment of the magnets (d). The divertor is now split into two bodies with a better description of the shapes. Apertures for diagnostic ports, antennas and piping are added to the blanket and the vessel. The toroidal magnets are more precisely described and cooling piping is added.
III. A CAD SERVICE
The aim of the presented CAD service is to deliver a common interface for the transfer of CAD data into machine descriptions for use by physics codes of several facilities (ITER, AUG, JET, MAST-U, TCV and W7-X). The ultimate goal is to control, in a programmatic way, the defeaturing process of the CAD model down to the LOD. The product of the defeaturing process can then be used as input for different meshing codes and eventually meshed correctly.
[Fig. 5 block diagram: the CAD SERVICE LIBRARY takes the level of detail (LOD), attributes (material), parameters, and assembly sections, and delivers OUTPUT RESULTS (meshing, closed surfaces, cross-sections, ...) to PHYSICS CODES and VISUALISATION.]
Fig. 4. LOD-2 of the divertor segment (a), half revolution of the blanket (b), quarter revolution of the vessel (c), and 3 segments of the magnets (d). The blankets that were previously modelled as a single shape with apertures are now separated and have gaps in between. Fillets are added to the vessel. The divertor was further detailed with holes. The magnets received details in piping and edge features. With the finest LOD-2 one can study the influence of tiny details on the simulations. The choice of what is included is still left to the user through modification of the code.
Fig. 5. CAD service library coupled with physics codes that can control the generated input.
When used within a scientific workflow, as shown in Fig. 5, the CAD service delivers an output in a format suitable for reading by general meshing tools. The meshing operation is treated as a black box that outputs the grid, which is then used as input for physics codes. The output mesh, as well as the results of the physics codes calculated using these grids, is stored in a scientific database format. The mesh from the meshing tools is preferably stored in a common fusion modelling grid description. Ideally, it should be possible to store all results in the general grid description (GGD) format, as this enhances further processing and analysis of the results using the visualisation tools available within the EUROfusion Integrated Modelling community. The IGES and STEP standards are commonly used for CAD data exchange and can serve as a common data format for input to meshing codes. Otherwise, if a “mesher” is unable to read the standard CAD format, custom input usually needs to be prepared by the CAD service.
For example, the physics code PFCFLUX [5] calculates the heat flux on the plasma-facing components (PFC). Plasma-facing component is a generic term for any part of the tokamak machine where a large amount of power deposition is found. PFCFLUX needs as input a triangular mesh of the 3D PFC surfaces. The usual way of producing the required mesh is to manually separate the PFC surfaces from the CAD geometry, fill any holes and gaps not needed on those surfaces, assign different materials to different surfaces, and heal surfaces if needed. The CAD service library can supply the PFCFLUX code with the needed input data in a programmatic way, which could save time and effort in preparing the mesh. Since PFCFLUX needs only surface triangles, the Open CASCADE kernel can be
used to build the triangular mesh [6]. A similar approach was taken in the SMARDDA software library, which offers the possibility to perform design-relevant calculations for PFCs [7].

Fig. 6. CAD service for the cross-section of the divertor at (a) LOD-0, (b) LOD-1, and (c) LOD-2.
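As an illustration of that step, the following sketch (ours, under the assumption of the pythonOCC API of that period; the deflection value is arbitrary) extracts a surface triangle list from a shape via the Open CASCADE incremental mesher:

from OCC.BRep import BRep_Tool
from OCC.BRepMesh import BRepMesh_IncrementalMesh
from OCC.TopAbs import TopAbs_FACE
from OCC.TopExp import TopExp_Explorer
from OCC.TopLoc import TopLoc_Location
from OCC.TopoDS import topods_Face

def surface_triangles(shape, deflection=1.0):
    """Tessellate `shape` and return its surface triangles as point triples."""
    BRepMesh_IncrementalMesh(shape, deflection)      # mesh the shape in place
    triangles = []
    exp = TopExp_Explorer(shape, TopAbs_FACE)
    while exp.More():
        face = topods_Face(exp.Current())
        loc = TopLoc_Location()
        handle = BRep_Tool().Triangulation(face, loc)
        if not handle.IsNull():
            poly = handle.GetObject()
            nodes, tris = poly.Nodes(), poly.Triangles()
            trsf = loc.Transformation()
            for i in range(1, poly.NbTriangles() + 1):
                n1, n2, n3 = tris.Value(i).Get()     # 1-based node indices
                triangles.append(tuple(
                    nodes.Value(n).Transformed(trsf).Coord()
                    for n in (n1, n2, n3)))
        exp.Next()
    return triangles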
The CAD service is envisioned as a library or a module which would then be installed and used as a service for a specific physics code. A different CAD service library/module would be created for each tokamak machine. It can provide the closed surfaces needed for analysis, for example of the plasma-facing components. In addition, the service could offer, as mentioned above, different LODs to satisfy the specific needs of the physics code considered. It can give specific attributes of the model, such as the material, and can also provide a basic mesh or different cross-sections of the model. Finally, depending on the need, different assembly sections and parameters can be provided to the mesher and/or the physics codes. If the data provided to a physics code is insufficient or invalid, the CAD service library can re-provide the requested data in an appropriate form. Fig. 6 shows the product of the CAD cross-section service for the divertor at different levels of detail. This service is used by physics codes that calculate on 2D geometry, such as SOLPS-ITER [8].
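For the cross-section service, a minimal sketch of the underlying operation could look as follows (our illustration, not the authors' implementation; the choice of a vertical poloidal plane at a given toroidal angle is an assumption):

from math import cos, sin
from OCC.gp import gp_Pln, gp_Pnt, gp_Dir
from OCC.BRepAlgoAPI import BRepAlgoAPI_Section

def section(shape, angle):
    """Cut `shape` with a vertical plane rotated by `angle` (radians)."""
    normal = gp_Dir(-sin(angle), cos(angle), 0.0)  # plane contains the z-axis
    plane = gp_Pln(gp_Pnt(0.0, 0.0, 0.0), normal)
    return BRepAlgoAPI_Section(shape, plane).Shape()  # edges of the cut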
IV. CONCLUSION
The programmable CAD kernel Open CASCADE is used to model the CAD geometry of a tokamak machine from a simple shape to more complex ones. In the process of creating the geometry, different levels of detail are defined and programmed. The product is a CAD service library/module that can, on request, serve physics codes with appropriate data. A variety of services could be offered depending on the physics code used, such as closed surfaces, cross-sections, basic meshes, and material attributes. Numerous physics codes could potentially benefit from using this CAD service. The time spent on preparing the CAD model for meshing will be significantly shorter when using this library. Also, provenance is recorded so that others can repeat the procedure and reproduce the particular mesh in question for the physics code. The reverse-engineering approach to CAD geometry modelling is the opposite of CAD model defeaturing. “Manual” preparation of CAD models is a tedious task that can take several months. With this service, many of the preparation steps for simplification are avoided and can be fine-grained and uniformly controlled. The usual simplification steps include geometric simplification, suppression/hiding of irrelevant details, and decomposition of complex parts into LODs. Future work will include additional tokamaks, services and actors for Kepler.
ACKNOWLEDGEMENT
This work has been carried out within the framework of the
EUROfusion Consortium and has received funding from the
Euratom research and training programme 2014-2018 under
grant agreement No. 633053, task agreement AWP15-EEGJSI/Telenta. The views and opinions expressed herein do not
necessarily reflect those of the European Commission. The
authors would like to acknowledge R. Pitts, S. Pinches and
X. Bonnin from ITER for helpful discussions. Thanks go to the many students from the Faculty of Mechanical Engineering at the University of Ljubljana who have contributed to this work.
REFERENCES

[1] I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludascher, and S. Mock, “Kepler: An extensible system for design and execution of scientific workflows,” in SSDBM '04: Proceedings of the 16th International Conference on Scientific and Statistical Database Management. Washington, DC, USA: IEEE Computer Society, 2004, p. 423.
[2] L. Lu, U. Fischer, Y. Qiu, and P. Pereslavtsev, “The CAD to MC geometry conversion tool McCad: Recent advancements and applications,” in ANS MC2015 – Joint International Conference on Mathematics and Computation (M&C), Supercomputing in Nuclear Applications (SNA) and the Monte Carlo (MC) Method, 2015.
[3] “OpenCASCADE – Open CASCADE technology,” http://opencascade.org, 2016.
[4] “PythonOCC website – Open CASCADE for Python,” http://pythonocc.org, 2016.
[5] M. Firdaouss, V. Riccardo, V. Martin, G. Arnoux, C. Reux, and JET-EFDA Contributors, “Modelling of power deposition on the JET ITER-like wall using the code PFCFLUX,” Journal of Nuclear Materials, vol. 438, Supplement, pp. S536–S539, 2013, Proceedings of the 20th International Conference on Plasma-Surface Interactions in Controlled Fusion Devices. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0022311513001190
[6] L. Kos, M. Telenta, R. Akers, and E.-I. Team, “Interfacing of CAD models to a common fusion modelling grid description,” in Proceedings of the International Conference Nuclear Energy for New Europe, Portorož, Slovenia, 2015, pp. 707.1–707.8.
[7] W. Arter, V. Riccardo, and G. Fishpool, “A CAD-based tool for calculating power deposition on tokamak plasma-facing components,” Plasma Science, IEEE Transactions on, vol. 42, no. 7, pp. 1932–1942, Sept. 2014.
[8] S. Wiesen, D. Reiter, V. Kotov, M. Baelmans, W. Dekeyser, A. Kukushkin, S. Lisgo, R. Pitts, V. Rozhansky, G. Saibene, I. Veselova, and S. Voskoboynikov, “The new SOLPS-ITER code package,” Journal of Nuclear Materials, vol. 463, pp. 480–484, 2015. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0022311514006965
Correlation between attenuation of 20 GHz
satellite communication link and Liquid Water
Content in the atmosphere
Maks Kolman* and Gregor Kosec**
*student at Jožef Stefan Institute, Department of Communication Systems, Ljubljana, Slovenia
**Jožef Stefan Institute, Department of Communication Systems, Ljubljana, Slovenia
maks.kolman@student.fmf.uni-lj.si, gregor.kosec@ijs.si
Abstract – The effect of the Liquid Water Content (LWC), i.e. the mass of water per unit volume of the atmosphere, on the attenuation of a 20 GHz communication link between a ground antenna and a communication satellite is tackled in this paper. The wavelength of 20 GHz electromagnetic radiation is comparable to the droplet size; consequently, scattering plays an important role in the attenuation. To better understand this phenomenon, a correlation between the measured LWC and attenuation is analysed. The LWC is usually estimated from pluviograph rain rate measurements, which capture only spatially localized, ground-level information about the LWC. In this paper the LWC is also extracted from the reflectivity measurements provided by a 5.6 GHz weather radar situated at Lisca, Slovenia. The radar measures reflectivity in 3D, and therefore a precise spatial dependency of the LWC along the communication link is considered. The attenuation is measured with the in-house receiver Ljubljana Station SatProSi 1, which communicates with the geostationary communication satellite ASTRA 3B in the 20 GHz band.
I. INTRODUCTION
The increasing demand for higher communication capabilities between terrestrial and/or earth-satellite repeaters requires the employment of frequency bands above 10 GHz [1]. Moving to such frequencies, the wavelength of the electromagnetic radiation (EMR) becomes comparable to the size of water droplets in the atmosphere. Consequently, EMR attenuation due to scattering on the droplets becomes a significant and ultimately dominant factor in communication quality [2]. During propagation, the EMR waves encounter different water structures, where they can be absorbed or scattered, causing attenuation [1]. In general, water in all three states is present in the atmosphere, i.e. liquid in the form of rain, clouds and fog, solid in the form of snow and ice crystals, and water vapour, which makes the air humid [3]. Regardless of the state, it causes considerable attenuation that has to be considered when designing the communication strategy [2]. Therefore, in order to effectively introduce high-frequency communications into operative regimes, adequate knowledge of the atmospheric effects on the attenuation has to be developed.
In this paper we deal with the attenuation due to the scattering of EMR on a myriad of droplets in the atmosphere, which is characterised by the LWC and the drop size distribution (DSD). A discussion of the physical background of the DSD can be found in [4], where the authors describe the basic mechanisms behind the distribution of droplets. Despite the efforts to understand the complex interplay between droplets, ultimately empirical relations [5] are used. The LWC and DSD can be related to the only involved quantity that we can reliably measure, the rain rate [6]. Recently it has been demonstrated that for high rain rates the site location also plays a role in the DSD, due to local climate conditions [7].
In general, raindrops can be considered as dielectric blobs of water that polarize in the presence of an electric field. When introduced to an oscillating electric field, such as electromagnetic waves, a droplet of water acts as an antenna and re-radiates the received energy in arbitrary directions, causing a net loss of energy flux towards the receiver. Some part of the energy can also be absorbed by the raindrop, which results in heating. Absorption is the main cause of energy loss for raindrops large compared to the wavelength, whereas scattering is predominant for raindrops smaller than the wavelength [2]. The very first model for atmospheric scattering was introduced by Lord Rayleigh [8], who assumed constant spatial polarization within the droplet. Such a simplification limits the validity of the model to droplets relatively small in comparison to the wavelength of the incident field, i.e. up to approximately 5 GHz when EMR scattering on rain droplets is considered. A more general model was developed by Mie in 1908 [2], where a spatially dependent polarization within the droplet is considered, extending the validity of the model to higher droplet-size/EMR-wavelength ratios. Later, a popular empirical model was presented in [9], where the attenuation is related only to the rain rate. This model, also referred to as the Marshall-Palmer model, is widely used in the evaluation of LWC from the reflectivity measured by weather radars [10]. The Marshall-Palmer model simply states the relation between the attenuation and the rain rate in terms of a power function. In this paper we seek a correlation between the LWC and the attenuation measurements. The LWC is extracted from reflectivity measurements provided by a weather radar situated at Lisca and operated by the Slovenian Environment Agency [11]. The attenuation is measured by in-house hardware that monitors the signal strength between the Ljubljana Station SatProSi 1 and the communication satellite ASTRA 3B [12, 13, 14, 15]. The main purpose of this paper is therefore to investigate the correlation between precipitation, measured in 3D with the meteorological radar, and the measured attenuation.
II. GOVERNING MODELS

Before we proceed to the measurements, some basic relations are discussed.

Attenuation (A) is a quantity measured in dB that describes the loss of electromagnetic radiation propagating through a medium. It is defined with the starting intensity I_s and the intensity received after propagation I_r as

A = 10 \log_{10} \frac{I_s}{I_r} . \quad (1)

The specific attenuation (α = A/L), measured in dB/km, as a function of the rain rate (R), measured in mm/h, is commonly modelled as [5]

\alpha(R) \sim a R^b . \quad (2)

The coefficients a and b are determined empirically by fitting the model to the experimental data. In general, the coefficients depend on the incident wave frequency and polarization, and on the ambient temperature. Some example values for different frequencies are presented in Table 1.

Drop size distribution (DSD) is a quantity that, unsurprisingly, describes the distribution of droplet sizes. The simplest characterization of rain is through the rain rate R, measured in mm/h. However, the rain rate does not give any information about the type of rain; for example, a storm and a shower might have the same rain rate but different drop size distributions. A simple DSD model is presented in [9],

N(D) = U \exp(-V R^{\delta} D) , \quad (3)

where D stands for the drop diameter measured in mm, N(D) describes the number of droplets of size D to D + dD per unit volume, measured in mm^-1 m^-3, and R is the rain rate measured in mm/h. The values of the equation parameters were set to U = 8.3 · 10^3, V = 4.1 and δ = −0.21. The DSD was also determined experimentally for different rain rates [5]. The experimental data is presented in Figure 1, where we can see that the typical diameter of drops is in the range of mm. There is a discrepancy between the theoretical and the experimental data for very small droplets. This can be fixed with a modified DSD; however, scattering on such small droplets is negligible, so the difference is not relevant.

III. MEASUREMENTS

A. Measurements of signal attenuation

The Jožef Stefan Institute (JSI) and the European Space Agency (ESA) cooperate in the SatProSi-Alpha project, which includes measuring the attenuation of the communication link between ground antennas and a satellite, more precisely between the ASTRA 3B satellite and the SatProSi 1 station. ASTRA 3B is a geostationary communication satellite located at 23.5° E longitude over the equator. It broadcasts a signal at 20 GHz, which is received at SatProSi 1 with an in-house receiver, namely a 1.2 m parabolic antenna positioned on top of the JSI main building with a gain of about 47 dB. SatProSi has measured attenuation every 0.15 seconds since 1 October 2011, resulting in over 500 000 daily records.
B. Measurements of rainfall rate

Two sources of rain measurements are used in this paper. The first one is a pluviograph installed locally in the proximity of the antenna. The rain rate is measured every five minutes.

Other, much more sophisticated, measurements of rain characteristics are provided by meteorological radars. The basic idea behind such radars is to measure the EMR that reflects from water droplets. The measured reflectivity is then related to the rain rate through the Marshall-Palmer relation. The radar reflectivity factor Z is formally defined as the sum of the sixth powers of the drop diameters over all droplets per unit volume, which can be converted into the integral

Z = \int_0^{\infty} N(D) D^6 \, dD . \quad (4)

Note that the form of the relation follows the Rayleigh scattering model [16]. Z is usually measured in units of mm^6 m^-3. When conducting measurements, a so-called equivalent reflectivity factor

Z_e = \frac{\eta \lambda^4}{0.93 \pi^5} \quad (5)

is used, where η denotes the reflectivity, λ is the radar wavelength, and 0.93 stands for the dielectric factor of water. As the name suggests, the two are equivalent for wavelengths large compared to the drop sizes, as in the Rayleigh model [16].

The reflectivity factor and the rainfall rate are related through the Marshall-Palmer relation as

Z[\mathrm{mm^6\,m^{-3}}] = \tilde{a} \, R[\mathrm{mm/h}]^{\tilde{b}} , \quad (6)

where Z is the reflectivity factor measured in mm^6 m^-3 and R is the rainfall rate measured in mm/h. In general, the empirical coefficients ã and b̃ vary with location and/or season; however, they are independent of the rainfall R. The most widely used values are ã = 200 and b̃ = 1.6 [9, 10]. Meteorologists rather use a dimensionless logarithmic scale and define

\mathrm{dBZ} = 10 \log_{10} \frac{Z}{Z_0} = 10 \log_{10} Z[\mathrm{mm^6\,m^{-3}}] , \quad (7)

where Z_0 is the reflectivity factor equivalent to one droplet of diameter 1 mm per cubic meter.

Figure 1. DSD measured in the Czech Republic (one-year measurement; the rain rate R is the parameter of the particular sets of points) [5]. Lines represent the theoretical values as determined by (3).
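As a worked example of relations (6) and (7), the following sketch (ours; the input value is illustrative) converts a dBZ measurement to a rain rate using the widely used coefficients ã = 200 and b̃ = 1.6:

import numpy as np

def dbz_to_rain_rate(dbz, a=200.0, b=1.6):
    """Invert Z = a * R**b, where Z is recovered from dBZ = 10*log10(Z)."""
    z = 10.0 ** (np.asarray(dbz) / 10.0)   # reflectivity factor, mm^6 m^-3
    return (z / a) ** (1.0 / b)            # rain rate, mm/h

print(dbz_to_rain_rate(30.0))  # 30 dBZ -> Z = 1000 -> about 2.7 mm/h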
TABLE 1. VALUES OF THE COEFFICIENTS FOR THE MARSHALL-PALMER RELATION α(R) AT DIFFERENT FREQUENCIES [5].

f [GHz]   a        b
10        0.0094   1.273
12        0.0177   1.211
15        0.0350   1.143
20        0.0722   1.083
25        0.1191   1.044
30        0.1789   1.007
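For instance, with the 20 GHz row of Table 1, relation (2) gives the following specific attenuation for a 10 mm/h rain rate (a worked example of ours):

a, b = 0.0722, 1.083      # Table 1 coefficients for f = 20 GHz
R = 10.0                  # rain rate, mm/h
alpha = a * R ** b        # specific attenuation via relation (2)
print(round(alpha, 3))    # about 0.874 dB/km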
The meteorological radars at Lisca emit short (1 µs) electromagnetic pulses at a frequency of 5.62 GHz and measure the strength of the reflection from different points along their path. A radar collects roughly 650 000 spatial data points per atmosphere scan, which it performs every 10 minutes. The exact location of each measurement is determined from the beam direction and the time it takes for the signal to reflect back to the radar.

In addition to reflectivity, the radars also measure the radial velocity of the reflecting particles by measuring the Doppler shift of the received EMR, but this is a feature we will not be using.
IV. DATA ANALYSIS

The analysis begins with handling approximately 20 GB of radar data for the academic year 2014/15, accompanied by 3 GB of signal attenuation data for the same time period and approximately 5 GB of attenuation and local rain gauge data for the years 2012 and 2013.
A. Preprocessing the radar spatial data
The radar data was first reduced by eliminating spatial points far away from our point of interest, namely the JSI main building where the antenna is located. The geostationary orbit is 35 786 km above sea level, therefore the link between the antenna and the satellite has a steep elevation angle of 36.3°. In fact, just 20 km south of the antenna the ray rises above 15 km, which is the upper boundary for all weather activity [3]. Knowing this, a smaller area of the map can be safely cropped out, reducing the number of data points from around 650 000 to approximately 6500 for each radar scan, covering a 40 km × 40 km area.

Although we have already greatly reduced the original data size, we must still reduce thousands of points into something tangible. The positions of both the antenna and the satellite are known at all times, a lovely consequence of them being stationary, so the link between them can easily be traced. Roughly 150 points on the ray path are used as a discrete representation of the link, referred to as link points in future discussions. For each link point, a median of the n closest radar measurements is computed as a representative value. The other way of extracting the reflectivity factor is simply to take the n points closest to the antenna and select their median value. A visualisation of both methods is presented in Figure 2.
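A minimal sketch of the two extraction methods (our illustration, not the authors' code; the NumPy arrays and SciPy usage are assumptions) could look as follows:

import numpy as np
from scipy.spatial import cKDTree

def local_median(radar_xyz, radar_dbz, antenna_xyz, n=64):
    """Median reflectivity of the n radar points closest to the antenna."""
    _, idx = cKDTree(radar_xyz).query(antenna_xyz, k=n)
    return np.median(radar_dbz[idx])

def integral_medians(radar_xyz, radar_dbz, link_points, n=4):
    """Median reflectivity of the n closest points at each link point."""
    _, idx = cKDTree(radar_xyz).query(link_points, k=n)
    return np.median(radar_dbz[idx], axis=1)   # one value per link point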
Figure 2. Positions of radar measurements. The blue rectangle is the
location of the antenna and the rain gauge. The 64 points closest to the
antenna are enclosed in a red sphere and marked as red circles. Red dots
mark the remainder of 512 closest points. The green line is the ray path
between antenna and satellite with green circles representing
corresponding support nodes for support size n = 4.
Figure 3. Measured antenna attenuation and rain rate extracted from 64
radar measurements closest to the antenna. Both datasets have been
sorted into 30 minute bins.
B. Correlation between rain and attenuation

In order to find a relation between the rain rate and the electromagnetic attenuation, measurements of both quantities must be paired. There is no obvious way of doing this, since they are measured at vastly different time scales. We ended up dividing time into bins of duration t0 and pairing the measurements that fall within the same bin. The maximum value of each quantity was selected as the representative for the given time period.

We are now left with multiple scalar quantities as functions of time: the antenna attenuation every 0.15 s, the local rain gauge every 5 min, and various extractions of the reflectivity factor every 10 min. Note that the radar values are not averaged over 10 minutes; the radar simply needs 10 minutes to complete a single scan. Figure 3 presents an example of the rainfall rate measured with the weather radar and the measured attenuation for a three-day period. A correlation between the quantities is clearly seen in the figure, but a closer inspection is needed to reveal more details about the correlation.

The correlation coefficient between two variables X and Y can be calculated using

\mathrm{corr}(X, Y) = \frac{\mathrm{mean}\big( (X - \mathrm{mean}(X)) \cdot (Y - \mathrm{mean}(Y)) \big)}{\mathrm{std}(X)\,\mathrm{std}(Y)} \quad (8)

and is a good quantity for determining the linear dependence between X and Y.
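The pairing and the correlation of the logarithms, as used in (8) and (10), can be sketched as follows (our illustration; the series names, the pandas-based binning and the 8-hour bin are assumptions):

import numpy as np
import pandas as pd

def binned_log_correlation(atten, rain, t0="8H"):
    """atten, rain: pandas Series with datetime index; t0: bin duration."""
    a = atten.resample(t0).max()               # bin representative = maximum
    r = rain.resample(t0).max()
    df = pd.concat({"A": a, "R": r}, axis=1).dropna()
    df = df[(df["A"] > 0) & (df["R"] > 0)]     # logarithms need positives
    return np.corrcoef(np.log10(df["A"]), np.log10(df["R"]))[0, 1]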
According to the Marshall-Palmer power law, a linear relation exists between the logarithms of the rain rate and the specific attenuation. Our measurements are of the total attenuation A and not of the specific attenuation, so we must adjust the equation. We assume a typical distance L as the connecting factor between the two, which gives us

\log_{10} A = \log_{10}(La) + b \log_{10} R . \quad (9)

The exact value of L is not relevant, as only the parameter b interests us. Therefore the slope on a log-log graph, such as in Figure 4, is equal to the model parameter b. We used a least-squares linear fit on each set of data to obtain the corresponding values of b.

In addition, the correlation between the logarithmic values of the rain rate and the attenuation,

\mathrm{corr}\big( \log_{10} A[\mathrm{dB}] , \log_{10} R[\mathrm{mm/h}] \big) , \quad (10)

is used as a quality measure of their relation.

V. RESULTS AND DISCUSSION

Once we have paired the attenuation and rainfall data, we can scatter the points on a graph. Figure 4 presents the attenuation against the rain rate at an 8 h bin size. For the local radar representation a support size of n = 2^6 is used, and for the integral representation n = 2^2. The correlation can be clearly seen, although it is not as unified as one would expect if the measurements and the rain rate-reflectivity model were perfect. Since we introduced two free parameters, namely the time bin t0 and the spatial support size n (for both the integral and the local radar representation), a sensitivity analysis regarding those parameters is needed.

Figure 4. Attenuation dependency on the rain rate measured in three different ways: local rain gauge (blue), path integration selecting the closest 4 points at each step (green), and the 64 points closest to the antenna (red). All measurements have been put into 8 h bins.

In Figure 5 the correlation with respect to the local support size and the time bin size is presented. The best correlation is obtained with 8 h time bins and a local support size of n = 2^6.

Figure 5. Correlation between rain rate and attenuation with respect to the local support size n and the time bin size t0.

Similarly, the correlation with respect to the integral support size and the time bin size is presented in Figure 6. Again, the best correlation is obtained with 8 h time bins; however, with the integral model a small integral support, i.e. n = 2^2, already suffices to obtain a fair correlation. Such behaviour is expected: in the integral mode we follow the ray and the support moves along it, so there is no need to capture vast regions for each link point. In the local approach, on the other hand, only one support is used, and that support has to be much bigger to capture enough details about the rain conditions.

Figure 6. Correlation between rain rate and attenuation with respect to the integral support size n and the time bin size t0.

To compare the measurements acquired with the radar and the ones acquired with the local rain gauge, a simpler presentation of the correlation is shown in Figure 7. One set of data has the rain rate extracted from the radar using the integral method with support size 4, and two sets use either the closest 64 or the closest 512 nodes.

Figure 7. Correlation between rain rate and attenuation as a function of the time bin size t0 for different ways of extracting the rain rate.
In the next step we compare our measurements with the Marshall-Palmer model, specifically with the exponent b. According to [9], at 20 GHz b0 = 1.083 should hold. In Figure 8 the differences between our measurements and b0 with respect to the time bin size are presented for the same sets of data as were used in the correlation analysis of Figure 7. An order of magnitude improvement is visible between the local rain gauge and the data extracted with the radar.

Figure 8. Exponent b in the attenuation-to-rainfall relation compared to the value b0 from Table 1 for 20 GHz, as a function of the bin duration t0, for a few ways of extracting the rainfall.

VI. CONCLUSION

This paper deals with the correlation between the attenuation of the EMR signal, due to scattering on the ASTRA 3B - SatProSi 1 link, and the measured rain rate. The main objective of the paper is to analyse the related measurements and to compare the results with the Marshall-Palmer model. The attenuation is measured directly with in-house equipment at a relatively high time resolution (0.15 s). The rain characteristics are measured with a rain gauge positioned next to the antenna and with a national meteorological radar. The rain gauge measures the average rain rate every five minutes at a single position, while the radar provides a full 3D scan of the reflectivity every 10 minutes. Although the attenuation depends mainly on the DSD, the rain rate is used as a reference quantity, since it is much more descriptive as well as easier to measure. The reflectivity measured with the radar is therefore transformed to the rain rate with the Marshall-Palmer relation. A more direct approach would be to relate the attenuation to the measured reflectivity directly; however, that would not change any of the conclusions, since, on a logarithmic scale, the simple power relation between reflectivity and rain rate reflects only as a linear transformation. The analysis of the support size and the time bin size showed a quite strong influence on the correlation. It is demonstrated that a time bin of 8 h and support sizes of n = 2^6 and n = 2^2 for the local and the integral approach, respectively, provide a decent correlation (0.6–0.7) between the logarithms of the measured attenuation and the rain rate. Furthermore, the power model has been fitted over the measured data and the value of the exponent has been compared to the values reported in the literature. The model shows the best agreement with the Marshall-Palmer model when the rain rate is gathered from the integral along the communication link; somewhat worse agreement is achieved with a local determination of the rain rate. The results obtained with the rain gauge are the furthest from the expected, despite the fact that the correlation with the measured attenuation is the highest with the rain gauge measurements. The localized information from the rain gauge simply cannot provide enough information to fully characterize the rain conditions along the link. There are still some open questions to resolve, e.g. what is the reason behind the 8 h time bin giving the best result, how we could improve the correlation, etc. All these topics will be addressed in future work.

ACKNOWLEDGMENT

The authors acknowledge the financial support from the state budget by the Slovenian Research Agency under Grant P2-0095. The attenuation data was collected in the framework of the ESA-PECS project SatProSi-Alpha. The Slovenian Environment Agency provided us with the data collected by their weather radars.

REFERENCES

[1] J. Goldhirsh and F.L. Robison. Attenuation and space diversity statistics calculated from radar reflectivity data of rain. Antennas and Propagation, IEEE Transactions on, 23(2):221–227, 1975.
[2] M. Tamošiunaite, S. Tamošiunas, M. Žilinskas, and M. Tamošiuniene. Atmospheric attenuation due to humidity. Electromagnetic Waves, pages 157–172, June 2011.
[3] J. Rakovec and T. Vrhovec. Osnove Meteorologije. Društvo Matematikov, Fizikov in Astronomov Slovenije, 2000.
[4] E. Villermaux and B. Bossa. Single-drop fragmentation determines size distribution of raindrops. Nature Physics, 5(9):697–702, September 2009.
[5] O. Fiser. The role of DSD and radio wave scattering in rain attenuation. Geoscience and Remote Sensing New Achievements, 2010.
[6] P. Pytlak, P. Musilek, E. Lozowski, and J. Toth. Modelling precipitation cooling of overhead conductors. Electric Power Systems Research, 81(12):2147–2154, 2011.
[7] S. Das, A. Maitra, and A.K. Shukla. Rain attenuation modeling in the 10–100 GHz frequency range using drop size distributions for different climatic zones in tropical India. Progress In Electromagnetics Research B, 25:211–224, 2010.
[8] C.R. Nave. Blue sky and Rayleigh scattering. http://hyperphysics.phy-astr.gsu.edu/hbase/atmos/blusky.html. Accessed: 2015-12-5.
[9] J.S. Marshall and W.McK. Palmer. The distribution of raindrops with size. Journal of Meteorology, pages 165–166, August 1948.
[10] R. Uijlenhoet. Raindrop size distributions and radar reflectivity–rain rate relationships for radar hydrology. Hydrology and Earth System Sciences, pages 615–627, August 2001.
[11] Slovenian Environment Agency (Agencija Republike Slovenije za Okolje).
[12] A. Vilhar, G. Kandus, A. Kelmendi, U. Kuhar, A. Hrovat, and M. Schönhuber. Three-site Ka-band diversity experiment performed in Slovenia and Austria. In Antennas and Propagation (EuCAP), 2015 9th European Conference on, pages 1–5. IEEE, 2015.
[13] U. Kuhar, A. Hrovat, G. Kandus, and A. Vilhar. Statistical analysis of 19.7 GHz satellite beacon measurements in Ljubljana, Slovenia. In The 8th European Conference on Antennas and Propagation, held at the World Forum in The Hague, The Netherlands, on 6-11 April 2014, EuCAP 2014, pages 944–948, 2014.
[14] A. Vilhar, G. Kandus, A. Kelmendi, U. Kuhar, A. Hrovat, and M. Schönhuber. Three-site Ka-band diversity experiment performed in Slovenia and Austria. In The 9th European Conference on Antennas and Propagation, Lisbon, Portugal, on 12-17 April 2015, EuCAP 2015, page 5, 2015.
[15] C. Kourogiorgas, A. Kelmendi, A. Panagopoulos, S.N. Livieratos, A. Vilhar, and G.E. Chatzarakis. Rain attenuation time series synthesizer based on copula functions. In The 9th European Conference on Antennas and Propagation, Lisbon, Portugal, on 12-17 April 2015, EuCAP 2015, page 4, 2015.
[16] Lord Rayleigh, F.R.S. XXXIV. On the transmission of light through an atmosphere containing small particles in suspension, and on the origin of the blue of the sky. Philosophical Magazine Series 5, 47(287):375–384, 1899.
Practical Implementation of Private Cloud with
traffic optimization
D. G. Grozev, M. P. Shopov, N. R. Kakanakov
Technical University of Sofia / Department of Computer systems and Technologies, Plovdiv, Bulgaria
dgrozev@vmware.com, mshopov@tu-plovdiv.bg, kakanak@tu-plovdiv.bg
Abstract - This paper presents a practical implementation of a private cloud, based on VMware technology, optimized to support CoS and QoS (even when an overlay technology like VXLAN is used) in the field of smart metering in electrical power systems and IoT. The use of cloud computing technologies increases the reliability and availability of the system. All routing, firewall rules and NATs are configured using NSX. The implementation of CoS and QoS in the virtual and physical networks guarantees the necessary bandwidth for normal operation among the other virtualized services.
I. INTRODUCTION
Cloud services and the Internet of Things (IoT) are among the most discussed terms in recent research topics. They represent the two extremes of computer science – from small distributed devices to large computational infrastructure. Both topics are used together, as IoT is one of the important sources of data that should be processed in large volumes [1, 2, 3]. But the use of cloud technologies in IoT is not limited to data processing and storage. The infrastructure can be used to deploy virtual devices and to ease their management and the dissemination of configurations. Deployment of virtual devices in the cloud will lead to new services: sensing-as-a-service or even sensor-as-a-service [4, 5].
A sustainable energy future depends on an efficient,
reliable and intelligent electricity distribution and
transmission system, i.e., power grid. The smart grid has
been defined as an automated electric power system that
monitors and controls grid activities, ensuring the two-way flow of electricity and information between power plants and consumers – and all points in between [6].
The smart power grid is a native application of IoT and cloud technologies, as the intelligent meters are highly distributed and generate huge amounts of measurement data. Texas Instruments proposes a smart grid architecture with intelligent meters and data concentrators. A data concentrator is the core of the energy management framework. It provides the technology to measure and collect energy usage data. The concentrator can also be programmed to analyse and communicate this information to the central utility database. Not only can utility providers use this information for billing services, they can also improve customer relationships through enhanced consumer services such as real-time energy analysis and communication of usage information. Additional benefits of fault detection and initial diagnosis can also be achieved, further optimizing the operational cost [7].
All data processing and manipulation services, such as data concentrators, could benefit from the elasticity of cloud platforms and should be built to be very scalable. Nevertheless, exporting them to online processing clusters could reduce privacy and increase the cost of communication. The best alternative in such cases is building a private cloud.

Nowadays most companies have their own cloud platform and virtual infrastructure, where they run business-critical applications and store data. Advantages of the private cloud include:

• Reliability and scalability. All resources are virtualized, and in case of demand we can add more storage or computing without downtime or impact.
• Fast provisioning. Using techniques like templates, we can deploy thousands of machines with a few clicks.
• Automation. Pretty much everything can be automated using specific command-line tools or a REST API.
• A common user interface. Decoupling the computation infrastructure from the input system enables multiple user interfaces to exist side by side, allowing user-centric customization.
In this paper, an architecture of a private cloud for energy measurements and data processing for the smart power grid is presented. The architecture consists of a private cloud that hosts data storage, management and presentation services, together with a network of virtualized sensors. The paper discusses techniques to build and/or tune a private cloud to adopt device management software along with other virtualized services.
II. BACKGROUND
A. Traffic optimization using the NSX distributed logical router

There are two types of traffic in the cloud: North-South and East-West. North-South traffic is usually ingress/egress; it includes traffic from/to the Internet and intranet, as well as traffic between VMs from different port groups and networks that needs to be routed by layer 3 physical devices. East-West traffic is VM or management traffic that bounces between hypervisors (ESXi) [8].
With a Distributed Logical Router (DLR) deployment we move the routing functionality to the hypervisor (kernel level) and effectively remove sub-optimal traffic paths. Each ESXi host can route between subnets at line rate or nearly line-rate speed. This means that if we have two VMs from different networks on the same host and use DLR, communication between them won't leave the host.
The idea behind DLR is not a new one, but it is unique in the virtualization world. All data centre switches have a separation between the control and data planes. This allows network engineers to restart or update the control plane while the data plane keeps working, with no downtime. DLR has two components: the first one is the DLR Control VM, which is a virtual machine, and the second one is the DLR kernel module that runs in every ESXi hypervisor. The DLR kernel module is called a “route-instance” and holds the same copy of the information in each ESXi host [10].
DLR has three types of interfaces: Uplink, Logical Interface (LIF) and Management. The Uplink is used by the DLR Control VM to connect to upstream routers and is called the “transit” interface between the physical and logical space. DLR supports OSPF and BGP on its Uplinks, but you cannot run both at the same time. LIFs are interfaces that each ESXi host has in its kernel. They are layer 3 and act as the default gateway for all VM traffic connected to logical switches. The Management interface is the second interface on the DLR Control VM, and again the idea comes from data centre switches, where the management interface is a separate physical port. This allows all management traffic to be separated into an out-of-band (OOB) network. Services that should run on the management interface are SSH, Syslog and SNMP.
Propagation of routing information in DLR is complex. Initially, all routing information is obtained and stored in the Control VM, where the routing daemons actually run. It is then sent to the NSX controller, which pushes it into each ESXi host kernel via a permanent SSL-encrypted channel. The DLR LIFs have the same IP addresses and vMACs in each ESXi host. The vMACs are not visible in the physical network. Fig. 1 shows all DLR components and the relations between them.
If we move routing into the cloud using NSX ESRs, the physical routers and layer 3 switches are offloaded, but the traffic between hypervisors is not reduced. A virtual router (ESR) lives on a single host, which means that VM traffic will reach that hypervisor via the layer 2 network, be routed, and go back to the host where the VM resides. With DLR, the netcpa process on each host stores the routing information, and when the communicating virtual instances are on the same host, no traffic is sent to the physical network.

It is always easy to put all VMs in a single port group and network and get rid of routing altogether. This design is possible only if there are no security requirements. Separating VMs into different networks gives us full control over the exchanged traffic.
B. Traffic optimization using vDS features: shaping, NIOC, reservations and limits

The backbone of each cloud is the virtual switch to which all VMs are connected. With a distributed virtual switch (vDS) we can optimize traffic using the following features: shaping – a shaper can be applied on each port group – and Network I/O Control (NIOC).

The NIOC concept revolves around resource pools that are similar in many ways to the ones already existing for CPU and memory. NIOC classifies traffic into six predefined resource pools: vMotion, iSCSI, fault tolerance (FT) logging, management, NFS, and virtual machine traffic. With this feature, each traffic type is configured with shares. In case of congestion on a host's physical NIC, each traffic type receives bandwidth equal to the available physical bandwidth multiplied by the ratio of its shares to the sum of the shares of all traffic participating in the congestion. There are 3 layers in NIOC (Fig. 2): the teaming policy, the shaper and the scheduler [9].
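As an illustration of that share arithmetic (our example; the link speed and share values are assumed), the bandwidth each active traffic type receives during congestion can be computed as follows:

def nioc_bandwidth(link_gbps, shares, active):
    """Split physical NIC bandwidth among active traffic types by shares."""
    total = sum(shares[t] for t in active)
    return {t: link_gbps * shares[t] / total for t in active}

shares = {"vMotion": 50, "Management": 50, "VM": 100, "VXLAN": 100}
print(nioc_bandwidth(10.0, shares, ["vMotion", "VM", "VXLAN"]))
# {'vMotion': 2.0, 'VM': 4.0, 'VXLAN': 4.0}  on a congested 10 Gbit/s NIC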
There is a fairly new teaming method called LBT (load-based teaming). It detects how busy the physical network interfaces are and moves flows to different cards accordingly. LBT will only move a flow when the mean send or receive utilization on an uplink exceeds 75 percent of capacity over a 30-second period, and it will not move flows more often than every 30 seconds.
There are two attributes (shares and limits) through which one can control traffic via Resource Allocation. Resource Allocation is configured per vDS and applies only to that vDS; it applies at the vDS level, not at the port group or dvUplink level. The shaper is where limits apply: it limits traffic by traffic class. Each vDS has its own resource pools, and resource pools are not shared between vDSs.

Shares apply at the dvUplink level, and the share rates are calculated based on the traffic of each dvUplink. This controls the share value of the traffic going through a particular dvUplink and makes sure the share percentages are correct. The concept allows flexible partitioning of networking capacity to help users deal with overcommitment when flows compete aggressively for the same resources.
Fig. 1. DLR configuration distribution [9].
NIOC ensures traffic isolation so that a given flow will never be allowed to dominate over others, thus preventing drops and undesired jitter. For each traffic type we can configure reservations and limits.
Fig. 2. Network I/O Control [10].
C. Classification and marking, which prepare traffic for further processing in the physical network

Usually CoS and QoS are not configured in the cloud. If at all, they are applied on physical network devices, where classification, marking and policing happen. Nowadays VMware network virtualization supports CoS and QoS; these are relatively new features that came with vSphere 5.5. Enabling CoS and QoS allows us to mark the traffic at the vDS level and offload the physical network. The physical switches then have to be configured to trust the CoS and QoS values.
III. USE-CASE SCENARIO: ENERGY MANAGEMENT SYSTEM IN THE CLOUD
A. Architecture of the proposed energy management system in the cloud

The proposed architecture consists of: a message broker supporting communication between the sensing elements and the application and storage elements; a NoSQL data storage (Data Logger) that stores the raw measurements from the sensing devices; an application server that provides the platform for running the data manipulation and knowledge extraction scripts; a relational DB server for storing processed data; a presentation server that runs the software for preparing the data visualisation – tables, graphics, maps; and multiple virtual sensor nodes, each one representing a real measurement unit. A virtual sensor node should be a replica of the physical sensor, keeping the data extracted from the real sensor and providing an interface for its configuration. The servers in the infrastructure communicate with the virtual sensors for configuration or data extraction, and the synchronization between a physical and a virtual sensor is done in isolation from the others, with protocols specific to the sensor. The logical topology used is shown in Fig. 3.
Fig. 3. Logical Topology
B. Private cloud at the Technical University of Sofia, Plovdiv branch

For the implementation of the energy management system, a private cloud was built at the Technical University of Sofia, Plovdiv branch. It utilizes five hosts with a total of 83 GHz CPU, 182 GB RAM and more than 6 TB storage (4 Dell PowerEdge 1950 hosts and one HP ProLiant D80 G7). The hosts are interconnected via two Cisco 3750 Gigabit access switches and one top-of-rack datacenter switch AS4600-54T-C (with Cumulus Linux), and they use a Fujitsu Eternus DX90 S2 disk storage system connected through two Brocade Fibre Channel DS-5000B switches. All hosts run the ESXi 5.5 operating system, are organised into one cluster, and are registered in one vCenter. All management services run on a separate cluster. The physical topology of our cloud architecture is shown in Fig. 4.
The presented cloud infrastructure can provide different cloud services to the energy measurement system. Apart from the obvious IaaS, where users can deploy their own virtual machines, or PaaS, where users can use their own preconfigured instances of a message broker, data storage and virtual devices, the following specific cloud services are used:

• Data-as-a-Service – users can extract data from the Data Logger or the RDB to apply their own analysis algorithms;
• Sensing-as-a-Service – users can extract raw data from sensor measurements;
• Sensor-as-a-Service – users can use the existing virtual sensor infrastructure to apply monitoring and control functions.
IV. TRAFFIC OPTIMIZATION IN THE PRIVATE CLOUD
A. Traffic types and classes in the use-case scenario

In our private cloud we recognize the following types of traffic:

1. Management – this includes all traffic generated by the virtual infrastructure: virtual machine migration (vMotion), connection to the network storage (iSCSI, NFS), fault tolerance, user interfaces for management (WebGUI, PowerCLI), etc.
2. Virtual machine own traffic – the traffic generated by all VMs connected to VLAN-based port groups in the distributed virtual switches.
3. VXLAN traffic – the traffic generated by all virtual machines connected to virtual logical switches.

Within the VM traffic we have the following classes:

• reading sensors;
• configuring sensors;
• storing measurements;
• extracting data series for manipulation;
• storing processed data in the RDB;
• extracting data for presentation;
• exporting data.

The first two classes are between the message broker node and the virtual devices. The third class is between the message broker and the Data Logger. The fourth class is between the Data Logger and the Application Server. The fifth class is between the Application Server and the RDB server. The sixth class is between the RDB and the presentation server. The seventh class is between the data centre and external Internet services.
B. Preliminary results and discussion

The optimizations applied in the private cloud at the Technical University include the following:
Fig. 4. Physical topology
• NIOC is enabled on the vDS that spans the hosts running the virtual services of the energy management system. We have added a custom-defined network resource pool for VXLAN and configured it with 100 shares and no limit. This guarantees that in case of congestion the VXLAN traffic will have enough bandwidth. Here we omit any QoS settings, because we want the outer VXLAN QoS to match the original traffic. All management traffic is placed in system resource pools with default shares – 50 shares per pool.
• DLR is used to route traffic between different DMZ zones. If two heavily communicating VMs are on the same physical host, the traffic does not leave the host and the physical network is offloaded.
• DRS affinity rules are added in vCenter to keep VMs that have to exchange large amounts of data with each other on the same host.
• CoS and QoS rules are added on the vDS VXLAN port group to prepare the traffic for prioritization in the physical network. We create exact rules that select traffic between VMs by source/destination IP and protocol ports. Traffic shaping applies to an entire port group; with this in mind, we group VMs into appropriate port groups.
Some initial tests were done to prove the workability of the system. On the physical cluster, several virtual machines with Ubuntu 14.04 Server were started. They represent the different services specific to smart metering: the NoSQL data store (Data Logger); the application server for data manipulation and knowledge extraction; the relational DB store for storing extracted data series and/or data patterns; and the presentation server for data visualization. Virtual devices are represented by virtual machines running Ubuntu Snappy. Additional virtual devices can be deployed on emulated ARM processors using QEMU.
These VMs are configured in different subnets according to the logical topology (Fig. 3). The configuration of the DLR is the same on all VMs. The routing table of one VM is shown in Fig. 5. On each VM several LIFs are configured and a multicast group is assigned to each (Table I):
The preliminary results show that using DLR for
routing between VXLANs reduces the overall traffic in
the cluster network, and that applying NIOC lowers the
ratio of dropped and delayed packets under high network
loads. The results are preliminary, and more detailed
experiments should be made to check the influence of the
prioritization of one traffic class on the others and to
obtain detailed numerical values for the reduction of
traffic and delay. These results can be verified in different
scenarios with different distributions and loads of the
traffic classes.
TABLE I. DESCRIPTION OF A LIF IN VMS

Mode                 Routing, Distributed, Internal
ID                   Vxlan:100005
IP                   172.17.7.254
Connected Dvs        dglabs
VXLAN Multicast IP   239.0.0.5
DHCP Relay           Server List: 172.17.0.17
For testing the optimizations, traffic generators are
started on different VMs and some preliminary results are
obtained; a sketch of the measurement principle follows
this paragraph. The tests gather information about the
delay and jitter of VXLAN traffic when the source and
sink run on the same ESXi host or on different ones.
These values are compared to check whether DLR works
as expected.
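The paper does not detail the traffic generators used. Purely as an illustration of the measurement principle — the sink address, port, probe count and interval below are assumptions of this sketch, not values from the experiments — a minimal UDP probe sender could embed send timestamps that the sink compares against its own clock to estimate delay, with the variation of consecutive delays giving the jitter:

#include <arpa/inet.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in dst = {0};
    dst.sin_family = AF_INET;
    dst.sin_port = htons(9000);                        /* hypothetical sink port */
    inet_pton(AF_INET, "172.17.7.10", &dst.sin_addr);  /* hypothetical sink VM */

    for (int i = 0; i < 1000; i++) {
        struct timespec ts;
        clock_gettime(CLOCK_REALTIME, &ts);            /* embed the send time */
        sendto(s, &ts, sizeof ts, 0,
               (struct sockaddr *)&dst, sizeof dst);
        usleep(1000);                                  /* 1 ms probe interval */
    }
    close(s);
    return 0;
}

The sink side would timestamp each arrival and subtract the embedded value; comparing runs with the two VMs on one host versus on separate hosts reproduces the kind of comparison summarized in Table II.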
TABLE II. DELAY FOR ONE FLOW BETWEEN TWO VMS*

                 min     max     avg
One host         274.2   283.0   274.3
Separate hosts   836.8   842.5   837.0

* Results are in microseconds (μs)

V. CONCLUSION

The paper discusses the traffic optimization techniques
that can be used in a cloud infrastructure with virtual
networking (VXLAN). These techniques are applied in a
use case scenario – an architecture for an energy
management system in a cloud. The presented architecture
is implemented in the private cloud infrastructure of the
Technical University of Sofia, Plovdiv branch, in the
virtual laboratory for Distributed Systems and Networking
(http://dsnet-tu-plovdiv.bg). The proposed implementation
allows building all layers of an energy management
system using cloud technologies, which eases
administration, adds scalability and reliability, and is
general enough to be applied in a wide range of scenarios.

ACKNOWLEDGMENT

The presented work is supported by the National
Science Fund of Bulgaria project “Investigation of
methods and tools for application of cloud technologies in
the measurement and control in the power system” under
contract Е02/12 (http://dsnet.tu-plovdiv.bg/energy/).

REFERENCES
[1] A. Alamri, W. S. Ansari, M. M. Hassan, M. S. Hossain, A. Alelaiwi, M. A. Hossain, "A survey on sensor-cloud: architecture, applications, and approaches," International Journal of Distributed Sensor Networks, Article ID 917923, Feb. 2013, doi:10.1155/2013/917923.
[2] Beng, L., “Sensor cloud: Towards sensor-enabled cloud services,”
Intelligent Systems Center Nanyang Technological University,
2009.
[3] Botta, A.; de Donato, W.; Persico, V.; Pescape, A., "On the
Integration of Cloud Computing and Internet of Things," in Future
Internet of Things and Cloud (FiCloud), 2014 International
Conference on , pp.23-30, 27-29 Aug. 2014.
[4] Zaslavsky, A., C. Perera, D. Georgakopoulos, “Sensing as a
service and big data”, Proceedings of the International Conference
on Advances in Cloud Computing (ACC), Bangalore, India, July,
2012, Pages 21-29 (8), arXiv preprint (1301.0159).
[5] Rao, B.B.P.; Saluia, P.; Sharma, N.; Mittal, A.; Sharma, S.V.,
"Cloud computing for Internet of Things & sensing based
applications," in Sensing Technology (ICST), 2012 Sixth
International Conference on , pp.374-380, 18-21 Dec. 2012.
[6] L. Wu, G. Kaiser, C. Rudin, R. Anderson, "Data Quality Assurance and Performance Measurement of Data Mining for Preventive Maintenance of Power Grid", Columbia University Academic Commons, 2011, http://hdl.handle.net/10022/AC:P:12174.
[7] P. Prakash, “Data concentrators: The core of energy and data
management”, Texas Instruments white paper, Copyright © 2013,
Texas Instruments Incorporated.
[8] VMware NSX Technical Product Management Team, "VMware® NSX for vSphere Network Virtualization Design Guide ver 3.0", Aug 21, 2014, online [04.02.2016]: www.vmware.com/files/pdf/products/nsx/vmw-nsx-network-virtualization-design-guide.pdf
[9] B. Hedlund, "Distributed virtual and physical routing in VMware NSX for vSphere", The Network Virtualization Blog, November 25, 2013, online [04.02.2016]: https://blogs.vmware.com/networkvirtualization/2013/11/distributed-virtual-and-physical-routing-in-vmware-nsx-for-vsphere.html
[10] V. Deshpande, "Network I/O Control (NIOC) Architecture – Old
and New", VMware vSphere Blog, Posted on January 25, 2013,
online [01.02.2016]: https://blogs.vmware.com/vsphere/tag/nioc
GLOSSARY
VXLAN (Virtual Extensible LAN) – a virtual network that emulates an Ethernet broadcast domain.
NSX – VMware NSX is the network virtualization platform for the Software-Defined Data Center (SDDC).
DRS (Distributed Resource Scheduler) – technology for balancing computing capacity across a cluster to deliver optimized performance for hosts and virtual machines.
VDS – VMware vSphere Distributed Switch (VDS) provides a centralized interface from which you can configure, monitor and administer virtual machine access switching for the entire data center.
DLR – Distributed Logical Router. It separates the control and data planes: the control plane runs as a VM, while the data plane is part of the hypervisor kernel.
vMotion – VMware vSphere live migration allows you to move an entire running virtual machine from one physical server to another, without downtime.
Fig. 5. DLR routing table on a host
Improving Data Locality for NUMA-Agnostic
Numerical Libraries
P. Zinterhof
University Salzburg/Dept. of Computer Sciences, Salzburg, Austria
peter.zinterhof3@sbg.ac.at
Abstract – Modern multi-socket servers are frequently
based on the NUMA paradigm, which offers scalability but
also introduces potential challenges. Many numerical
libraries have not been designed for such architectures
specifically, so their node-level performance relies heavily on
the inherent quality of their parallelization (e.g. OpenMP)
and the use of highly specialized tools and techniques such
as thread-pinning and memory page placement.
In this paper we propose a simple and portable framework
for improving performance of NUMA-agnostic numerical
libraries by controlling not only the location of threads but
also the location of allocated memory. Going beyond many
related approaches, we also apply fine-grained control over
memory placement, implemented by means of the
operating system’s first-touch principle.
I. INTRODUCTION
We see a steady trend towards ever increasing
performance on all levels of computational hardware. It is
a well-established fact that with current silicon-based
technology the performance of typical microprocessors
cannot be increased any further just by raising their
clock rates. This is mainly owed to the fact that doubling
the clock rate roughly quadruples the power
consumption. In such a setting, proper cooling quickly
becomes either technologically unviable, as the resulting
energy density on the surface of the chips would require
very advanced and expensive cryo-techniques, or is
ruled out by simple economic rationality. The
latter is especially the case in the area of
supercomputing, where the power consumption of the
installation is already one of the main limiting factors.
Instead, performance increases are to be achieved by
several design factors that are being selectively combined
by hardware designers. On the core-level one sees larger
cache memories and more powerful instructions that rely
on advanced techniques (e.g. speculative execution,
instruction re-ordering, etc.) and wide execution units for
vector processing. On the chip level we observe ever
increasing numbers of parallel cores, located within a
single die and connected to each other by caches or
special bus systems. This level also marks the beginning
of the non-uniform memory architecture (NUMA). On the
system-level multiple CPUs are integrated into single
systems, or, as in the case of large NUMA-machines,
multiple systems seamlessly integrate into a single,
potentially very large system that is operated under a
single system-wide operating system kernel. Each level
offers performance advantages but also potential pitfalls.
In this paper we focus on performance optimization on
cache coherent NUMA-systems, as they constitute the
main building blocks of many current high-performance
computers and compute clusters.
Certainly, there is a wide range of computational
workloads on cluster installations, expressed by an even
greater range of software applications. It is our conjecture
that many of these applications are based on external
numerical libraries, which in some cases might still be
amenable to improvements. Despite much work in the
field and a growing awareness of the problems in
achieving high performance, software development and
performance tuning can still be tricky, especially when
knowledge, expertise, or proper tools are scarcely
available. It is the goal of this paper to demonstrate an
easy path to performance improvements in the use of
numerical libraries. That is, we take a zero-knowledge
approach with regard to the concepts and implementation
details of these libraries, but concentrate on the
orchestration of working threads and memory allocations,
both of which can be controlled to some extent at the level
below the actual library functions.
We will base our work on the well-known basic linear
algebra subroutines-library (BLAS) that is a tremendously
successful numerical library, important to many fields of
research as well as application design. The BLAS-library
already supports symmetric multi-processors by means of
OpenMP, but other than that is agnostic to the special
characteristics of NUMA-based systems. Nevertheless,
there do exist tools for thread-pinning that allow certain
threads of the parallel execution to be pinned to certain
cores of the CPUs. However, the allocation of working
memory for BLAS operations, and its distribution
within a system comprising several NUMA-nodes, is
either left to the application or to some allocation policy
that is defined at the start of the application.
Therefore the proposed framework aims at the proper
allocation of memory without requiring intricate
knowledge of the inner structure and workings of the
BLAS-library and at some more detailed level than
offered by most available tools. It seems important to note
that our framework uses BLAS as a test case but is not
being limited to this library.
II. RELATED WORK
A. Research
Improving performance for applications on parallel
computers is an important area both in research and
development. The importance is certainly augmented by
the increasing number of high-performance machines
worldwide and the fact that almost every new system
today is based on multi-core CPUs.
One approach to optimize performance is to find
optimal placements for multiple threads onto a multi-core
CPU. In [1] this placement is guided by hardware
performance counters that have been introduced in
modern x86-based CPUs. The proposed autopin tool [1]
probes several different thread-scheduling settings
until an optimum is found, which is then used for
scheduling the application for the remaining runtime.
A similar “on-the-fly”-optimization technique will be
shown in our framework, too. It has also been shown [2]
that there often is no globally optimal thread placement in
real-life applications, as for different parallel execution
blocks in the code different – locally adjusted –
scheduling will be superior. This local adjustment is also
the main path we pursue in our proposed framework,
although we concentrate on the placement of memory
pages in main memory in conjunction with the standard
thread placement that is governed by OpenMP. Another
interesting feature of the approach described in [2] is the
operation on binary executable files instead of the source
code of the application. This certainly allows for an easier
deployment and possibly wider adoption among users, as
no source code has to be available and no recompilation
takes place. As pointed out in [3], improving data-locality
is only one – albeit important – factor with respect to
performance. There are situations in which an imbalance
in the layout of pages will not hurt performance. In special
cases, one can even rely on an aggregation of L3 caches
in multi-processor environments to gain super-linear
speedups [4]. On the contrary, even in well-balanced
layouts performance might be sub-optimal due to
congestion of memory links that introduce additional
access latencies. There is no single optimum solution for
these kinds of problems, since the answer also involves
other aspects such as the tradeoff between time-to-solution
vs. speedup and the decision between manual tuning and
auto-tuning. Depending on the setting different paths
might be regarded to be superior to others. There is also an
interesting relation of energy efficiency of a computation
and its performance as pointed out in [5], where high
cache utilization is enforced by binding threads to cores
either by the pthread-API but also by means of the likwid
toolset as in our own setting.
In general, improvements on data-locality not only
support shorter times to solution but also increase energy
efficiency and as a direct consequence of the latter even
higher performing computer systems become feasible
within the same power envelope.
B. Frameworks and Tools
Amongst solutions for thread-pinning and control of
memory allocation at the level of libraries, there is a range
of tools which offer support for these tasks at the system
level. This latter approach removes the need for the
software developer to make decisions, but enables the user
or system administrator to exert control over certain
aspects of the run-time behavior of an application.
Likwid
Likwid [6] is a set of tools for performance optimization
on NUMA-systems. It can display all kinds of hardware
(topology) information, read out hardware performance
counters and energy meters. It can also be used to pin
threads to cores for the runtime of an application. We will
base our experiments (section IV, Results) on Likwid.
Taskset
Taskset is a command usually included in Linux
distributions. It enables the user to retrieve or set
processor affinities for a parallel program. An application
will only be run on a set of cores that can be specified by
some mask. For instance, the command 'taskset -c
1,2,8,16 application.exe' will ensure that the 4 parallel
threads of application.exe will not be executed on any
cores other than 1, 2, 8, and 16. It is not entirely clear
whether affinities are fixed during the run time of the job
or whether the kernel is still allowed to change affinities
of threads within the requested bounds of cores 1, 2, 8, and
16. If an application is intended to be confined to a
single CPU socket or NUMA-node (there is no strict 1:1
mapping between the two terms, as more than one
NUMA-node can frequently be found on a single CPU
socket), the above question seems irrelevant. In the case
of high performance computing, however, we most likely
want to employ all available cores on the machine, which
would render an affinity set of <1,2,3, .., MaxCore>
rather redundant under the assumption that threads would
still be allowed to move between cores.
Numactl
Numactl is comparable to the taskset command in Linux,
but adds functionality by allowing memory to be bound
to specific NUMA-nodes. Also, it offers a more
relaxed way of specifying the location of threads by using
the coarser term of “NUMA-node” instead of core
numbers; for instance, 'numactl --cpunodebind=0
--membind=0 application.exe' would confine both the
threads and the memory of application.exe to NUMA-node
0. With the intention of locking threads to closely
connected memory in order to increase data locality, it is
sufficient to tell the system that a specific set of threads
shall be confined to a certain NUMA-node, as all cores
within a NUMA-node enjoy the same memory access
characteristics.
AutoNUMA
AutoNUMA [7] is both a memory and thread placement
tool that has been included in the Linux kernel. By
keeping track of an application's memory accesses, it
determines an online strategy for page migration which is
then executed by a separate kernel thread. Unfortunately,
some distributions need to be patched before using
AutoNUMA, thus rendering the software less portable.
III. IMPROVING DATA LOCALITY
Our framework aims at performance improvements by
locally adjusting memory distribution over different
NUMA-nodes by exploiting the well-known first-touch
policy in Linux. Under the first-touch policy, the memory
pages of a newly allocated block of memory will first be
mapped to a zero-page, such that reading from any
position of the block will result in the bit pattern 0. Only
upon the first write operation will the operating system
decide on which NUMA-node the actual page of memory
will be physically placed. Linux places the page on the
NUMA-node closest to the thread which performs the
write. Hence Linux already provides high data locality
for the given thread and block of memory, under the
rationale that the first-touching thread will continue to
make frequent use of the memory page afterwards.
Furthermore, memory distribution is controlled in a
fine-grained fashion by selectively choosing NUMA-nodes
for all arrays that will be used by BLAS within a
given application program.
In general we find two ways of providing and initializing
arrays for use with BLAS; the two variants are contrasted
in the sketch after this list.

Allocation with explicit initialization.
Explicit initialization of arrays can be the
consequence of the application's 'natural'
dataflow (e.g. some input data have to be stored
in memory before BLAS operations can take
place on them) or of the personal programming
style of the developer.
An important factor will be whether this
initialization is accomplished by a sequential
thread or by a pool of parallel threads, especially
on a system employing the first-touch policy. A
sequential thread will typically force all pages to
be physically located at its local NUMA-node,
until this node is exhausted and follow-up
pages 'spill over' onto neighboring NUMA-nodes.
Initialization within a 'parallel' block in
OpenMP will lead to a balanced distribution of
pages over a certain number of NUMA-nodes,
as OpenMP always aims to provide a balanced
distribution of tasks.

Allocation without initialization.
When no explicit initialization precedes the use
of some memory block or array by BLAS, the
operating system will choose some NUMA-node
on the basis of the system-wide placement
policy. This may be either the above-mentioned
first-touch or the round-robin policy, depending
on Linux kernel settings.
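To make the distinction concrete, the following sketch — our illustration, not code from the framework — contrasts the two initialization variants under the first-touch policy:

#include <stdlib.h>
#include <omp.h>

#define N (1L << 26)   /* number of doubles, roughly 512 MB */

int main(void)
{
    double *a = malloc(N * sizeof *a);

    /* Variant 1: sequential first touch. All pages are written by the
       master thread first, so they end up on (or spill over from) its
       local NUMA-node. */
    for (long i = 0; i < N; i++)
        a[i] = 0.0;
    free(a);

    /* Variant 2: parallel first touch. Static scheduling gives each
       thread a contiguous chunk, so pages are spread over the
       NUMA-nodes on which the threads run. */
    a = malloc(N * sizeof *a);
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++)
        a[i] = 0.0;
    free(a);

    return 0;
}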
In a typical application we do not see a single call to
some library function but a series of calls, often
embedded in a dynamic control flow. Here one can
encounter a series of different library functions repeatedly
operating on the same block of memory, with each
function potentially favoring a different memory
placement for its optimal performance. As the
placement of memory is done only once, during the
first-touch phase, this placement might prove inadequate for
some of the subsequently invoked library functions. This
can affect performance negatively as a high degree of
data-locality cannot always be ensured in such a setting.
In order to deal with that problem, our framework not
only enables control of memory placement at the level of
single pages in a multi-threaded (OpenMP) environment
but also allows changes to that placement dynamically,
albeit not in an automatic fashion.
Control of placement is based on the definition of the
topology of the placement, followed by a function which
employs a set of parallel threads to force the physical
mapping of pages onto the NUMA-node targets (Fig. 1).
This mapping has to be accomplished by way of parallel
threads in order to exploit the first-touch policy, which
states that a page will be physically stored on the
NUMA-node of which the calling thread is a part during
the first write operation to that page.
The placement is defined as a vector with each element
denoting the identification of the thread which is to
execute the first touch on the corresponding memory
page. We base our experiments on a Linux kernel with a
page size of 4096 bytes.
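As an illustration — the helper below is ours, matching the vector definition above — a placement vector that distributes consecutive pages round-robin over all OpenMP threads could be built like this:

#include <stdlib.h>
#include <omp.h>

/* Build a placement vector assigning page i to thread (i mod nthreads),
   i.e. a round-robin distribution of pages over all OpenMP threads. */
int *buildRoundRobinPlacement(size_t number_of_pages)
{
    int nthreads = omp_get_max_threads();
    int *placementVector = malloc(number_of_pages * sizeof *placementVector);

    for (size_t i = 0; i < number_of_pages; i++)
        placementVector[i] = (int)(i % (size_t)nthreads);

    return placementVector;
}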
Function 'touch' (Fig. 1) applies some write operation to
array ‘memblock’ at memory position i*4096, thus
forcing ownership of page i to the NUMA-node that
corresponds to the calling thread ‘TID’.
void placePages(const int *placementVector, char *memblock,
                size_t number_of_pages)
{
    size_t i;
    int TID;
#pragma omp parallel private(i, TID)
    {
        TID = omp_get_thread_num();
        for (i = 0; i < number_of_pages; i++)
        {
            /* Only the thread named in the placement vector writes to
               page i, so first-touch places it on that thread's node. */
            if (TID == placementVector[i])
                touch(memblock, i * 4096);
        }
    }
}

Figure 1: OpenMP based page placement

By freeing and reallocating some memory block, the
placement can be changed at will during runtime. This
operation could be substituted with the page migration
function of the NUMA library, but with maximum
portability in mind the above approach has been favored.
The operation imposes very little overhead as all memory
use in a NUMA-system will have to go through some
initialization anyway. The only overhead is given by the
parallel for loop, which seems rather negligible, never
exceeding 8 ms of wall time in our experiments.
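The body of 'touch' is not listed in the paper; a minimal version consistent with the description above — a single write at the given offset, with volatile preventing the compiler from optimizing the store away — might look as follows:

/* Write one byte at the given offset; under the first-touch policy this
   forces the containing page onto the calling thread's NUMA-node. */
static void touch(char *memblock, size_t offset)
{
    volatile char *p = (volatile char *)(memblock + offset);
    *p = 0;
}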
Three methods of deriving proper placement vectors have
been employed:
• random placement
• genetic optimization (1)
• Monte-Carlo based random search
The “random placement” method places pages on
randomly chosen NUMA-nodes. It works surprisingly
well (see section IV. Results). By subjecting the placement
of pages to optimization by means of a genetic algorithm
we also aim to increase data-locality and overall system
performance. The genetic algorithm is based on a
simulated population of genetic individuals that encode
different entries of the placement vectors. By selectively
recombining and permuting individuals under
evolutionary pressure, the quality of the placement will be
improved from generation to generation. This algorithm is
based on the assumption that for each two individuals of
the population a clear distinction of their respective
performance can be made. As it turns out, pinning threads
to specific cores still allows for some variance in
execution times that effectively counteracts this necessary
distinction and thus hampers the applicability of the
genetic algorithm.
As a result the genetic algorithm shows similarities to
another method which can be called Monte Carlo-based
random search.
Our framework therefore offers support for performance
sampling (based on the Monte Carlo method) for an arbitrary
number of repetitions of calls to a BLAS function. Over
time this random search typically yields placement
vectors of increasing quality. The user then has to provide
some stopping criterion that concludes the search operation,
with the best placement vector found being employed for all
subsequent calls to that BLAS function. It has to be noted
that the search operation can be integrated within the
normal operation of the application (‘on-the-fly’
optimization), so that no serious overhead is incurred.
Placement vectors can also be stored and retrieved for
later use.
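A minimal sketch of this on-the-fly search — with placePages as in Figure 1, blas_call standing in for a user-supplied wrapper around the BLAS invocation, and the re-initialization of input data that a real application would perform inside that wrapper omitted — could look as follows; all names here are ours:

#include <stdlib.h>
#include <omp.h>

void placePages(const int *placementVector, char *memblock,
                size_t number_of_pages);

/* Monte Carlo random search over placement vectors: free and reallocate
   the block (so first-touch applies again), map pages with a random
   placement, time the BLAS call, and keep the fastest vector found. */
void monteCarloSearch(int *best, size_t pages, int samples,
                      char **memblock, void (*blas_call)(char *))
{
    int nthreads = omp_get_max_threads();
    int *trial = malloc(pages * sizeof *trial);
    double best_time = 1e30;

    for (int s = 0; s < samples; s++) {
        for (size_t i = 0; i < pages; i++)
            trial[i] = rand() % nthreads;        /* random candidate */

        free(*memblock);                         /* reset placement ... */
        *memblock = malloc(pages * 4096);        /* ... via reallocation */
        placePages(trial, *memblock, pages);     /* first-touch mapping */

        double t0 = omp_get_wtime();
        blas_call(*memblock);                    /* e.g. a dgemm wrapper */
        double t1 = omp_get_wtime();

        if (t1 - t0 < best_time) {               /* keep the best vector */
            best_time = t1 - t0;
            for (size_t i = 0; i < pages; i++)
                best[i] = trial[i];
        }
    }
    free(trial);
}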
IV. RESULTS
For assessment we applied the framework to three
operations (dgemm, dgesv, dsytrd+dorgtr+dsteqr) of the
BLAS library in double precision (64 bit). BLAS
functions have been applied to randomized input data and
linked against the Intel MKL v11.2 library on a quad-socket
SuperMicro system consisting of 4 AMD Opteron
6386 SE CPUs (64 cores total) and 512 GB RAM, which
is arranged in the form of 8 NUMA nodes. The likwid-pin
[6] utility has been built for the employed CentOS 6.3
Linux system and subsequently used to pin 64
OpenMP threads to the 64 physical cores of the test
system. Each experiment has been repeated 10 times and
the reported numbers denote the average performance or
run-time, respectively.
The first experimental run involves the solution of a
linear system of equations, where the ‘naïve’ setting of
extending an otherwise sequential code with the
parallelized BLAS routine ‘dgesv’ is compared to the
code parallelized with OpenMP and the code parallelized
and optimized by the proposed framework.
Also, likwid [6] has been used for pinning threads during
the latter two trials, leaving the naïve code under the
standard regime of Linux both for scheduling threads and
placement of memory pages.
Table 1 displays various execution times and Fig. 2
displays run times for different problem sizes, ranging
from 1024 up to 16384 in steps of 512.
Our framework consistently performs better than the other
two approaches.
Example two is based on a triple call to BLAS which
computes eigenvalues and eigenvectors of a dense matrix.
The placement strategies 'OpenMP' and 'framework'
perform up to some 30 % better than naïve (single-core)
memory allocation and placement without thread pinning.
Nevertheless, the execution times of 'OpenMP' and
'framework' (Fig. 3) are very close and probably within
the margins of measurement error, so we cannot declare a
clear winning scheme in this realm.
Figure 3: BLAS eigenvector/values computation
Table 1: performance for BLAS dgesv function,
in seconds of wall time

Placement method /
problem size     4096    8192    12800    16384
naive            1.04    7.39    13.69    37.79
OpenMP           0.58    4.09     8.66    24.50
framework        0.57    4.05     8.58    23.64
(1) Experiments have been conducted by means of the
Genetic Algorithm Utility Library, http://gaul.sourceforge.net,
with an initial population of 128 individuals under a
Darwinian scheme with crossover, mutation and migration
probabilities of 0.03/0.09/0.07.
Figure 2: solve system of equations, BLAS dgesv
Figure 4: BLAS dgemm performance
Example three is based on matrix/matrix-multiplication
by means of the dgemm-function of the BLAS. The
implementation is based on a blocking algorithm, which
aims at high utilization of the three levels of processor
caches in modern CPUs. Thus, dgemm already provides
near to optimum performance in its standard incarnation
and it should be less amenable to optimizing page
placements. Fig. 4 displays the performance of dgemm with
input matrices A and B being initialized in parallel by the
64 OpenMP threads, with placements done by the
proposed framework and also by using first-touch based
placements resulting from the output of a preceding
dgemm operation. So in the latter case we effectively ran
two (dummy) dgemm matrix multiplications first.
By just employing the first-touch policy within these
operations we obtained page placements for the resulting
output matrices C1 and C2 that reflect typical placements
of output matrices for dgemm operation. Only in the last
step we measured performances of the dgemm operation,
now being based on C1 and C2 as input matrices. The
described setup aims to model the standard data flow for
some abstract numerical application which most often
bases the input data of a function on the output data of
another preceding function.
Our proposed framework leads to an average increase
in performance of 0.5 % over the OpenMP-style
initialization and placement, while employing the first-touch
policy yields an average speedup of 0.35 %.
Applying Monte Carlo random search on dgemm for 150
steps yields an additional speedup of 1 % (Fig. 5).
This may sound very modest, but with this optimization
mode being easily integrated into existing algorithms it
still can be regarded as a meaningful improvement. Also,
offline optimization can be used with the resulting
placement vector being retained for later use.
V. CONCLUSION AND OUTLOOK
Based on the widely known BLAS library we have
demonstrated a simple but nevertheless effective
framework for optimizing data-locality within numerical
libraries that have not specifically been designed for use
on Non Uniform Memory Architecture-based parallel
systems. Our approach operates at the level of memory
allocation and only requires an operating system that
is capable of a first-touch page placement policy and thread
pinning. Thus we regard the proposed framework to be
highly portable.
Also, no prior knowledge of the implementation details of
the numerical library is required by the user in order to
enjoy the benefits of the framework.
Figure 5: Monte Carlo optimization over first 150 samples. Validation for iterations [151,300]
Future work will explore some fused approach, which
integrates optimization of both thread placement and page
placement as a single objective. This multi-modal
optimization can be expected to deliver further
improvements of performance of numerical libraries on
modern NUMA-systems.
REFERENCES
[1] T. Klug, M. Ott, J. Weidendorfer, C. Trinitis, „autopin – automated optimization of thread-to-core pinning on multicore systems“, in: P. Stenström (ed.), Transactions on HiPEAC III, LNCS, vol. 6590, pp. 219–235, Springer, Heidelberg, 2011.
[2] A. Mazouz, S. Touati, D. Barthou, “Dynamic Thread Pinning for Phase-Based OpenMP Programs”, Euro-Par 2013 Parallel Processing, pp. 53–64, Lecture Notes in Computer Science, vol. 8097, Springer Berlin Heidelberg, 2014.
[3] F. Gaud et al., “Challenges of Memory Management on Modern NUMA Systems”, Communications of the ACM, pp. 59–66, Vol. 58, No. 12, 2015.
[4] G. Kosec, M. Depolli, A. Rashkovska, R. Trobec, “Superlinear speedup in a local parallel solution of thermo-fluid problems: OpenMP parallelization of a local PDE solver”, J. Computers and Structures, pp. 30–38, Vol. 133, Pergamon Press, Elmsford, NY, USA, 2014.
[5] D. Davidovic, M. Depolli, T. Lipic, K. Skala, R. Trobec, “Energy Efficiency of Parallel Multicore Programs”, Scalable Computing: Practice and Experience, Volume 16, Number 4, pp. 437–448, http://www.scpe.org, 2015.
[6] J. Treibig, G. Hager, G. Wellein, “LIKWID: A lightweight performance-oriented tool suite for x86 multicore environments,” http://arxiv.org/abs/1004.4431
[7] J. Corbet, “AutoNUMA: The other approach to NUMA scheduling”, LWN.net, http://lwn.net/Articles/488709/, 2012.
Use Case Diagram Based Scenarios Design for a
Biomedical Time-Series Analysis Web Platform
Alan Jovic*, Davor Kukolja*, Kresimir Jozic** and Mario Cifrek*
* University of Zagreb Faculty of Electrical Engineering and Computing, Zagreb, Croatia
** INA - industrija nafte, d.d., Zagreb, Croatia
Corresponding author: alan.jovic@fer.hr
Abstract - Biomedical time-series analysis deals with
detection, classification and prediction of subjects' states
and disorders. In this paper, we present requirements and
scenarios of use for a novel web platform designed to
analyze multivariate heterogeneous biomedical time-series.
The scenarios of use are described with the corresponding
UML Use Case Diagrams. We also discuss some
architectural and technological issues, including
parallelization, visualization and feature extraction from
biomedical time-series. The goal of this paper is to present
what we currently consider as the best approach for design
of such a system, which may also be beneficial for similar
biomedical software systems. The paper is focused on design
and architectural considerations only, as implementation of
the complex system has only just begun.
I. INTRODUCTION
Web and mobile applications development for
biomedical services is continuously growing in the
healthcare community [1,2]. While the developed
software aims either at general population users [2] or at
medical professionals for continuous monitoring of
patients [3], there has been very little interest in
developing an integrative web platform that could benefit
medical professionals, researchers and experienced
general users in modeling subjects' states and disorders.
The goal of the currently running Croatian Science
Foundation research project HRZZ-MULTISAB1 is to
develop a high quality system for an all-encompassing
analysis of multivariate heterogeneous biomedical time-series. It is conceived that this system would be used by
all interested users, including medical doctors, biomedical
engineers, computer scientists, and others, depending on
their goal. As the system will be developed as a web
platform, the users will be able to access it from afar,
which differentiates it from local or hospital-specific
software solutions.
The problem that the project tackles at its core is the
hard problem of efficient biomedical time-series features
identification. There have been many efforts put forth to
discover and describe both domain-specific and general
time-series relevant features [4-6]. The platform that we
are currently in the process of developing will offer
significant advancements in this respect, as we will
consider the use of time-series features that were proven
1 This work has been fully supported by the Croatian Science Foundation under the project number UIP-2014-09-6889.
to be effective in a wide range of sciences [6], as well as
in specific biomedical signal domains [7].
Once the platform's implementation is completed, we
plan to investigate and compare the best feature
combinations advocated by domain experts with the best
general time-series features and with a no-feature-engineering
approach using deep learning [8].
The goal is to achieve as accurate as possible models of
subject's states, including various medical disorders.
Presently, in this paper, the goal is to describe system
requirements for such a web platform and to elucidate the
way in which the platform will be built. We focus on the
requirements, with a brief overview of the system's
architecture and the technologies involved.
II. SYSTEM REQUIREMENTS
System requirements, which were agreed upon by the
entire project working group, were the following:
• An integrative software solution for the analysis of
multivariate heterogeneous biomedical time-series.
• Software solution implemented in the form of a web
platform, as thin client - fat server.
• Software logic layer on the server written in Java
programming language.
• Interface towards the user implemented with a set of
contemporary web development technologies
(HTML5, CSS3, TypeScript...).
• Multiple input file formats: European data format
(EDF) and EDF+, textual formats for signals and
annotations, image formats, and files that contain
metadata (anamnesis, disorder annotations, etc.).
• Visualization of signals in 2D (records inspection)
and specific body disorders in 3D using graphical
hardware.
• Biomedical time-series preprocessing, such as signal
filtering, R peak detection in ECG series, data
transformations in the time, frequency, and time-frequency domains.
• Biomedical time-series feature extraction - features
would be chosen by: 1) a medical expert system
implemented in the platform, which would be based
on current medical knowledge from guidelines and
relevant scientific papers, 2) an expert user,
manually; a large number of features need to be
supported by the platform, both general and domain-specific
biomedical time-series features.
• Feature selection methods, classification, regression,
and prediction machine learning algorithms should
be used to construct accurate subject state models.
• Results reporting in contemporary formats.
The platform will necessarily support the analysis of the
following biomedical time-series:
• ECG – electrocardiogram – cardiovascular activity
• HRV – heart rate variability – a series of RR-intervals whose variability is analyzed
• EEG – electroencephalogram – brain activity
• EMG – electromyogram – muscle activity
Additionally, optionally, the platform will support:
• electrooculogram (EOG), electroglottogram (EGG), cardiotocogram (CTG), electrocorticogram (ECoG), photoplethysmogram (PPG), pressure (arterial blood pressure - ABP, pulmonary artery pressure - PAP, central venous pressure - CVP), respiration (impedance), oxygen saturation (SpO2), CO2, gait rhythm, galvanic skin resistance, etc.
The platform will support multivariate heterogeneous
analysis (e.g. ECG + HRV + SpO2).
In the implementation phase, the focus will be
primarily to demonstrate the working capabilities of the
platform on a smaller set of signals and medical disorders.
III. UML USE CASE DIAGRAMS FOR SYSTEM REQUIREMENTS VISUALIZATION
System requirements have been detailed in the form of
UML Use Case Diagrams, v.1.4+ [9], drawn in Astah
Community edition tool [10], which present how a user
would interact with the system. These diagrams provide a
behavioral description of a system's use, without insight
into a detailed temporal sequence in which the interaction
with the system is performed. Regardless of the absence
of a temporal view on the interaction process, all the
scenarios of use can be easily identified. This facilitates
web platform development.
There were eight UML Use Case Diagrams that we
identified, which describe the phases of the analysis:
1. Analysis type selection
2. Scenario selection
3. Input data selection
4. Records inspection
5. Records preprocessing
6. Feature extraction
7. Model construction
8. Reporting
Some of these phases may be skipped entirely,
depending on the user.
Additionally, the ninth diagram, User account, depicts
registration, login and profile management. Finally, the
tenth diagram, Platform administration, shows the
administrative access to the web platform.
In the subsequent subsections, we present the diagrams
and their detailed explanation.
A. Analysis Type Selection
Analysis type selection, Fig. 1, is conducted in a way
that a user chooses the goal for an analysis that he wants
to perform: detection, classification, prediction, or
inspection and visualization. He also chooses the data type
on which the analysis will be based. The data type may
include exclusively biomedical time-series or biomedical
time-series with additional domain data (e.g. subject
anamnesis, metadata which describes record type, etc.).
Biomedical time-series may be univariate by type,
which means that there is only a single measured data
array available in the record, e.g. the times of heart beats,
or multivariate, which means that there are several
measured data arrays present in the record. Additionally,
multivariate data may be homogeneous, which means that
they come from the same source measurement device, e.g.
EEG data, or heterogeneous, which means that there are
more than one measurement device involved at the same
time, e.g. ECG + HRV + systolic ABP. An example of
analysis type selection would be classification based on
EEG data.
Figure 1. Analysis type selection.
B. Scenario Selection
Scenario selection, Fig. 2, enables the user to select a
predefined scenario, based on previously selected analysis
type. It also allows the construction of a completely new
scenario through full customization. A list of predefined
analysis scenarios with all of the corresponding aspects
(used components) will be provided to the user. The user
will be able to confirm a scenario or change some of its
aspects, such as preprocessing methods, feature extraction
methods or model construction methods. When defining a
new custom scenario, or changing an existing scenario,
the user will be able to select any of the methods for
accomplishing his goals. The predefined scenarios will
contain features that are to be extracted from signals,
which will be selected by the medical expert system
specifically designed for this purpose. The medical expert
system will be implemented based on current relevant
medical guidelines, standards, and relevant medical and
biomedical engineering literature. An example of a
scenario would be to detect epilepsy in EEG, using
wavelet transform modulus maxima and correlation
dimension features, and a C4.5 decision tree.
Figure 2. Scenario selection.
C. Input Data Selection
Input data selection, Fig. 3, includes the choice of file
extensions (or file formats) that will be uploaded for the
analysis. These formats include EDF, EDF+ (with or
without annotations), separate textual annotation files (e.g.
heart beat annotations), textual raw signal files (without
annotations) and image files (e.g. JPEG, TIFF). The user
will have to select the appropriate file format in order for
file upload to work without errors. Aside from selecting
the format, the user also specifies the files that will be
uploaded or selects the folder in which all the required
files can be found. If there are some specific details
related to records loading into platform (e.g. only partial
file loading, limitations for some image formats, etc.),
these can also be specified during this phase.
D. Records Inspection
After the records are uploaded into the platform, it will
be possible to inspect and visualize (Fig. 4) each loaded
record through the platform's graphical user interface.
Here, the user will be able to: select specific record
segment, select temporal and amplitude scaling of the
signals, display some or all of the signal trails (arrays),
inspect record headings and domain data (if such
information exists). 3D visualization will be enabled for
certain specific states for some biomedical signals (e.g.
visualization of myocardial infarction location based on
ECG). Additionally, the user will be able to annotate some
of the available signals or change the existing annotation
files and save these changes.
E. Records Preprocessing
Records preprocessing, Fig. 5, assumes various
procedures for signal filtering and data transformations.
Signal filtering includes noise filtering (e.g. notch filters
for 50 / 60 Hz), baseline wandering corrections (e.g. for
ECG because of breathing, body movements or electrodes
impedance changes), records segmentation into windows
of certain width. Window width may be chosen by the
user or it may be implicit, determined by the very features
that are to be extracted (e.g. a single RR interval for
detailed ECG analysis). Other filtering methods include
various linear and nonlinear filtering procedures with the
goal to obtain higher quality signals.
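As a small illustration of such a filtering procedure — a generic sketch, not code from the platform — slow baseline wander can be removed by subtracting a centered moving average from the signal:

#include <stddef.h>

/* Remove slow baseline wander by subtracting a centered moving average.
   The half-window w (in samples) would be chosen relative to the sampling
   rate, e.g. roughly one second of ECG. Illustrative only. */
void remove_baseline(const double *x, double *y, size_t n, size_t w)
{
    for (size_t i = 0; i < n; i++) {
        size_t lo = i > w ? i - w : 0;
        size_t hi = i + w < n ? i + w : n - 1;
        double mean = 0.0;
        for (size_t j = lo; j <= hi; j++)
            mean += x[j];
        mean /= (double)(hi - lo + 1);
        y[i] = x[i] - mean;   /* keep only the fast component */
    }
}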
Figure 3. Input data selection.
Figure 4. Records inspection.
Data transformations will be performed after the
filtering procedures and will include various time (e.g.
principal component analysis - PCA), frequency (e.g. fast
Fourier transform, Hilbert-Huang transform), time-frequency
(e.g. wavelet transform) and other types of data
transformations. The goal of data transformations is to
obtain the data in a form from which it is easier to
calculate precise domain or general features for describing
specific subjects' states.
F. Feature Extraction
Feature extraction, Fig. 6, is the central step in the
analysis of biomedical time-series. The user will already
be provided with a list of features in the scenario selection
phase. Regardless of the scenario, in the feature extraction
phase, it will be possible to select additional features and
specify parameters for calculation of a particular feature,
for those features that are parametric. When starting
feature extraction, the user will be able to inspect the
information about the estimation of the analysis duration
and will be able to modify the type of calculation
parallelization that will be performed. It will be possible
to cancel feature extraction execution at any time. The
user will be allowed to save the extracted feature vectors
in a file for future analysis, if he wants to have the
intermediate results recorded.
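As a small illustration of what such features look like computationally — a generic sketch in C, not code from the platform, which targets Java — two standard time-domain HRV features, SDNN and RMSSD, can be computed from a series of RR intervals (in milliseconds) as follows:

#include <math.h>
#include <stddef.h>

/* SDNN: standard deviation of the RR intervals. Assumes n > 0. */
double sdnn(const double *rr, size_t n)
{
    double mean = 0.0, var = 0.0;
    for (size_t i = 0; i < n; i++) mean += rr[i];
    mean /= (double)n;
    for (size_t i = 0; i < n; i++) var += (rr[i] - mean) * (rr[i] - mean);
    return sqrt(var / (double)n);
}

/* RMSSD: root mean square of successive RR differences. Assumes n > 1. */
double rmssd(const double *rr, size_t n)
{
    double sum = 0.0;
    for (size_t i = 1; i < n; i++) {
        double d = rr[i] - rr[i - 1];
        sum += d * d;
    }
    return sqrt(sum / (double)(n - 1));
}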
G. Model Construction
Model construction, Fig. 7, is a complex step in the
analysis of biomedical time-series that takes place after
feature extraction and includes various machine learning
algorithms for the analysis of the extracted feature vectors.
In this step, the user will be able to load the file with
feature vectors that were calculated and saved earlier and
resume the analysis or continue the analysis from the
recently obtained and unrecorded extracted feature
vectors. The model construction phase starts with the option to
select feature vectors' preprocessing methods. Herein, one
can use feature vectors' dimensionality reduction methods
such as various transformations (e.g. PCA) or feature
selection methods (filters, wrappers, embedded, and other
methods [11]).
Figure 5. Records preprocessing.
Dimensionality reduction is necessary in the case where a
large number of features were extracted (either by the
choice of the expert system or by the user's selection). The
goal of dimensionality reduction is to keep only relevant
and non-redundant features in order to improve speed and
accuracy of model construction. Aside from feature space
dimensionality reduction, it will be possible to use some
methods to alter the feature vectors themselves (e.g. resampling, missing values replacement, etc.).
Figure 6. Feature extraction.
After the feature vectors preprocessing step, the user will
be able to select a model construction method. During this
step, first a method is selected and then, the method
parameters are defined. Method parameters selection will
be made possible if the modeling method is parametric
(e.g. C4.5 decision tree, random forest).
Figure 7. Model construction.
Before starting model construction, it is necessary to
specify the learning method that will be used: a holdout
procedure, where a part of the dataset is used for training and the
other part for testing, or cross-validation, where testing is
performed on dataset segments so that all the dataset is
eventually used for testing. It will also be possible to
perform only model testing on a set of new samples, if
there is a model present in the system that was trained
earlier on the same feature set. When starting model
construction, the user will be able to inspect the
information regarding the estimation of the construction
duration and will be able to modify the type of computing
parallelization that will be performed. The construction
procedure could be cancelled at any time. After model
construction, it will be made possible to save the trained
model into a file for later use.
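To make the two learning options concrete — holdout corresponds to a single train/test partition, while k-fold cross-validation rotates the test segment so that every sample is eventually used for testing — a generic index-splitting helper (our sketch, not platform code) could look like this:

#include <stddef.h>

/* In k-fold cross-validation, sample i belongs to the test set in
   exactly one fold, so every sample is eventually used for testing.
   With k folds, each fold holds roughly n_samples / k test samples. */
int is_test_sample(size_t i, size_t n_samples, int fold, int k)
{
    size_t fold_size = n_samples / (size_t)k;
    size_t start = (size_t)fold * fold_size;
    size_t end = (fold == k - 1) ? n_samples : start + fold_size;
    return i >= start && i < end;
}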
Figure 8. Reporting.
H. Reporting
Reporting, depicted in Fig. 8, is the step in which a
user chooses the way in which the results of the analysis
are displayed. Thereafter, the report is displayed in the
selected form (on a web page only, in PDF, MS Excel,
Word or ODF form). It will be possible to save the copy
of the report onto the client's computer, whereas its copy
will be stored on the server. Additionally, the user will be
able to grant access to the report to other registered users,
at his convenience.
I. User Account
An account for a new user will be opened by
registering the user onto the platform, which will include
entering the data about the user (necessary data: user
name, password and e-mail address; additional data: first
and last name, title, affiliation, affiliation's address,
telephone), Fig. 9. Logging into the system will be
allowed after confirmation that the registration was
accepted. Profile management will be enabled after a
successful login. It will include editing personal
information, accessing generated (finished) reports that
are linked to the user or resuming the last analysis
conducted by the user.
J. Platform Administration
Platform administrator will have to login into the
platform in the same way as the user, by entering his user
name and password, Fig. 10. The administrator will have
at his disposal a number of options. First, he will be able
to edit his personal profile. He will also be able to manage
user accounts for all the platform's users. This will include
opening a new user account (as if a user has registered),
confirming the registration of a new user, and removing
an existing user account. For each account, administrator
will be able to inspect the user's personal information
(except password), without the possibility of modification.
Figure 9. User account.
The administrator will be able to inspect and search all
of the reports generated in the platform. Also, he will be
allowed access to all of the other files stored in the system,
including all the saved input files, all the feature vectors
files, and all the constructed models file. He will be able
to delete them at his convenience. Moreover, the
administrator will have access to all platform's logs, which
will be sorted by date. The logs will contain relevant
information about the users' actions during the use of the
platform. Additional information about the platform as
well as its usage statistics will also be at his disposal.
Figure 10. Platform administration.
IV. PLATFORM ARCHITECTURE
A. Web Technologies
System architecture was envisioned as a web portal,
which will be accessed by the users through their
browsers. This type of architecture was selected, as it is
necessary to enable access to the platform from any
remote location. With this, the range of potential users is
widened, which is important for platform recognition,
acceptance, and breadth of applications.
MIPRO 2016/DC VIS
On the client side, during platform development, the
HTML5, CSS3, and TypeScript languages will be used for
designing web pages. As aids in developing web pages,
Angular 2 framework will be used for programming and
Bootstrap framework will be used for better visual
experience and platform responsiveness. Also, WebGL
technology will be used for 3D visualization of human
body sections that are of interest to the user.
On the server side, the Java programming language
will be used. This language was chosen because of the large
number of existing libraries for signal processing, data
parsing and machine learning, as well as the ease of web
development and efficient parallelization support.
The server side will communicate with the client side
through a RESTful interface. In order to lower the requirements
on the server's resources, we will test the use of 1) a
standard Java application server with a minimum set of
options, and 2) a stand-alone framework that supports the
required functionality (e.g. Spring Boot). A minimum set
of options will include a user authentication library and an
object relational mapping library (e.g. Hibernate).
For reporting, JasperReports® Library, an open-source
Java library that supports all the required formats
(including HTML) will be used.
B. Architectural Details
The developed architecture should be both modular
and scalable. Its modularity should enable the
development of various add-ons, while its scalability
should support the concurrent work of many users. The
platform should also support two important aspects of
contemporary software: 1) parallelization and 2) dynamic
module loading.
Parallelization will be used to accelerate calculations,
which is especially efficient for the algorithms that can be
divided into independent execution sections, without the
need for synchronization. Parallelization will be enabled
in two phases of the analysis: for feature extraction and for
model construction, because these phases are the most
computationally demanding. We may also consider
employing it for records preprocessing, if deemed
necessary. In any case, the goal will be to maximize the
resource capacity on the server with respect to the number
of available CPU cores, as well as employing general
purpose GPU parallelization. GPU parallelization will be
achieved using JCuda or jocl frameworks, depending on
the available hardware. The platform will evaluate the
available hardware and the number of running processes
in order to enable maximum support for a user's demands.
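As a conceptual sketch of why these phases parallelize so well — the per-record computations are independent and need no synchronization — consider the following, written in C/OpenMP for brevity even though the platform itself will rely on Java's parallel facilities; extract_features is a hypothetical per-record routine:

#include <omp.h>
#include <stddef.h>

/* Hypothetical per-record feature computation; records are independent,
   so the loop below needs no synchronization between iterations. */
extern void extract_features(const double *record, size_t len,
                             double *features);

void extract_all(const double **records, const size_t *lens,
                 double **features, size_t n_records)
{
    /* Dynamic scheduling balances records of varying length. */
    #pragma omp parallel for schedule(dynamic)
    for (long r = 0; r < (long)n_records; r++)
        extract_features(records[r], lens[r], features[r]);
}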
Dynamic module loading will be a useful feature of
the platform, which is significant from the perspective of
server resources' optimization. If, for example, a user
wants to classify ECGs using C4.5 decision trees, without
any additional algorithms, then there will be no need to
load all the other platform modules into memory. The
modules will be implemented so that they will have
maximum cohesion of services within the module, while
retaining minimum coupling between other similar
modules [12]. This will be enabled through an early
definition of analysis scenario (see section III. B). The
exact sequence of module execution to achieve the
required goal could be determined. Auxiliary packages
would contain only signatures (name, parameters) of the
available components for a specific analysis phase.
The platform will contain a large number of
biomedical time-series features, both general and
domain-specific, i.e. related to a specific type of time-series
(e.g. EEG). In the platform, we plan to use the features from
the HRVFrame [5], EEGFrame [13] and Comp-Engine [6]
frameworks, with appropriate modifications, language
translations and testing. Additionally, other domain-specific
frameworks, such as the one for ECG analysis,
will have to be implemented mostly from scratch.
V. CONCLUSION
We have shown the requirements and architecture for
an integrative biomedical time-series analysis web
platform. Future work will involve implementing the
requirements and reporting on the achieved modeling
results of the subjects' states.
REFERENCES
[1] J. Oster, J. Behar, R. Colloca, Q. Li, Q. Li, and G. D. Clifford, “Open source Java-based ECG analysis software and Android app for atrial fibrillation screening,” in: Computing in Cardiology Conference (CinC) 2013, IEEE, pp. 731–734, 2013.
[2] A. Szczepanski and K. Saeed, “A Mobile Device System for Early Warning of ECG Anomalies,” Sensors, vol. 14, pp. 11031–11044, 2014.
[3] G. Paliwal, A. W. Kiwelekar, “A Product Line Architecture for Mobile Patient Monitoring System,” in: Mobile Health, Vol. 5 of Springer Series in Bio-/Neuroinformatics, pp. 489–511, 2015, doi: 10.1007/978-3-319-12817-7_22.
[4] A. Bravi, A. Longtin, and A. J. E. Seely, “Review and classification of variability analysis techniques with clinical applications,” BioMed. Eng. OnLine, vol. 10, p. 90, Oct. 2011.
[5] A. Jovic, N. Bogunovic, and M. Cupic, “Extension and Detailed Overview of the HRVFrame Framework for Heart Rate Variability Analysis,” in: Proceedings of the Eurocon 2013 Conference, I. Kuzle, T. Capuder, and H. Pandzic, Eds. Zagreb: IEEE Press, pp. 1757–1763, 2013.
[6] B. D. Fulcher, M. A. Little, and N. S. Jones, “Highly comparative time-series analysis: the empirical structure of time series and their methods,” J. Roy. Soc. Interface, vol. 10, p. 20130048, April 2013.
[7] A. Jovic and N. Bogunovic, “Evaluating and Comparing Performance of Feature Combinations of Heart Rate Variability Measures for Cardiac Rhythm Classification,” Biomed. Signal Process. Control, vol. 7, no. 3, pp. 245–255, May 2012.
[8] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, Book in preparation for MIT Press, 2016, url: http://goodfeli.github.io/dlbook/.
[9] Object Management Group, “Unified Modeling Language® (UML®) Resource Page”, http://www.uml.org/, last accessed on: 2016-02-21.
[10] Change Vision Inc., “Astah, Community Edition,” http://astah.net/editions/community, last accessed on: 2016-02-21.
[11] A. Jovic, K. Brkic, N. Bogunovic, “A review of feature selection methods with applications,” in: Proceedings of the MIPRO 2015 Conference, P. Biljanovic, Ed., Rijeka: MIPRO Croatian Society, pp. 1447–1452, 2015.
[12] K. Praditwong, M. Harman, X. Yao, “Software Module Clustering as a Multi-Objective Search Problem,” IEEE Trans. Software Eng., vol. 37, no. 2, pp. 264–282, April 2011.
[13] A. Jovic, L. Suc, and N. Bogunovic, “Feature extraction from electroencephalographic records using EEGFrame framework,” in: Proceedings of the MIPRO 2013 Conference, P. Biljanović, Ed., Rijeka: MIPRO Croatian Society, pp. 1237–1242, 2013.
Augmented Reality for Substation Automation by
Utilizing IEC 61850 Communication
M. Antonijević*, S. Sučić* and H. Keserica*
* Končar-Power Plant and Electric Traction Engineering Inc, Zagreb, Croatia
{miro.antonijevic; stjepan.sucic; hrvoje.keserica}@koncar-ket.hr
Abstract - IEC 61850 standard represents the most commonly
used communication technology for new substation automation
projects. Despite the fact that IEC 61850 provides a semantic
data model and a standardized configuration description, these
facts are underutilized in substation automation management
today. This is specifically illustrated in the data visualization
domain, where new technologies such as virtual and augmented
reality have reached significant maturity levels but have not been
used for IEC 61850 system visualization so far.
In this paper IEC 61850 features have been combined with
augmented reality technologies for providing added value
visualization capabilities in substation automation domain. The
developed prototype demonstrates proof-of-concept solution for
regular substation automation checks and maintenance activities.
I. INTRODUCTION
Smart Grid automation has introduced significant novelties in
different Smart Grid subsystems, including Distributed Energy
Resources (DERs), distribution automation (DA) and
substation automation systems (SAS). One of the elementary
prerequisites for successful automation are unified
communication mechanisms that facilitate subsystem remote
monitoring and control. The international standard IEC 61850
represents one of the Smart Grid automation pillars by
introducing standardized communication principles and a
semantic description of controlled systems. The existence of
standards-compliant semantics that describe power system
process data allows new possibilities for developing previously
unforeseen applications in the SAS domain. An example of
such application categories are Augmented Reality (AR)
applications, which can utilize the meta-data provided by the
IEC 61850 standard and directly access process-related
information in order to provide added-value features for SAS
management. This paper analyses the possibility of using the
IEC 61850 standard for developing AR applications for SAS
maintenance activities.
The paper is organized as follows. The following chapter provides an overview of AR technology usage in industrial automation systems. The third chapter describes the main features of the IEC 61850 standard and AR technologies in order to give an overview of their possible use in the SAS environment. The fourth chapter describes the main functionalities of the proposed AR application, its architecture and the implementation approach. The fifth chapter provides analysis results for the currently developed prototype, while the last chapter gives a conclusion.
II. AUGMENTED REALITY IN INDUSTRIAL SYSTEMS
Industrial machines have become advanced tools where
automation and advanced feedback is made possible through
dedicated control computers. The computers allow us to fully
or partially automate complex procedures and can help make
manual control more secure and precise. Real-time data from
the process is available and many parameters can be
interactively controlled through the computer interface.
Automation can also allow an operator to monitor multiple
machines simultaneously, reducing the number of required
personnel for a machine pool. Many critical procedures exist,
however, that cannot be completely automated. In such cases,
the operator might need to be able to visually follow and
interactively control parts of the current operation, while
simultaneously monitoring numerous rapidly changing
parameters. The maturity of AR technologies has allowed their usage in different industrial automation environments.
AR technologies have been successfully applied in manufacturing [1], smart building management [2], and the automotive and aerospace industries [3]. AR applications in the SAS environment have already been used in [4] and [5]. However, in both articles AR is used for simulation purposes and operator training. Neither of the aforementioned articles deals with process-related SCADA data used for SAS maintenance purposes, as proposed in this paper.
III. IEC 61850 AND AUGMENTED REALITY
IEC 61850 [6] is often regarded as just another remote control protocol for electric utilities, but this international standard is more than a set of rules and encodings for data retrieval from field devices. IEC 61850 defines automation architecture requirements for utility subsystems in order to enable communication and semantic interoperability among multi-vendor equipment. Despite being primarily developed for substations [7], IEC 61850 has now been extended to the wind power plant domain [8], Distributed Energy Resources (DERs) [9] and hydropower plants [10].
Figure 1. IEC 61850 data class model (Server, Logical Device, Logical Node, Data Set, Data Object, Data Attribute, Functional Constraint)
A. IEC 61850 basics

1) Data model
Data semantics provided by IEC 61850 are closely related to the functionalities of devices in utility subsystems such as DERs [9]. IEC 61850 data models [11] are based on object-oriented modelling of data relevant to process automation.
Figure 1 shows the relationships between the IEC 61850 data model classes. The top parent class is the Server, which represents a physical device, i.e., a device controller. The Server consists of one or more Logical Devices (LDs), i.e., virtual representations of devices intended for the supervision, protection or control of the automated system. LDs are created by combining several Logical Nodes (LNs), which represent various device functionalities. LNs are a crucial part of the IEC 61850 data semantics; a sketch of the resulting hierarchy is given below.
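As an illustration only (the voltage value is made up; the names follow Figure 1 and the object reference quoted later in the paper), the hierarchy can be pictured as nested structures addressed by an IEC 61850 object reference:

# Illustration only: the Figure 1 hierarchy as nested structures, down
# to the data attribute used later in the paper; 230.0 is a made-up value.
server = {                               # Server: the physical device
    "myLD": {                            # Logical Device (LD)
        "MMXU1": {                       # Logical Node (LN): measurements
            "PhV": {                     # Data Object (DO): phase voltages
                "phsA": {                # phase A
                    "cVal": {"mag": 230.0},  # Data Attribute (DA)
                }
            }
        }
    }
}

def read(reference):
    """Resolve an IEC 61850 object reference such as
    'myLD/MMXU1.PhV.phsA.cVal.mag' against the nested model."""
    ld, rest = reference.split("/")
    node = server[ld]
    for part in rest.split("."):
        node = node[part]
    return node

print(read("myLD/MMXU1.PhV.phsA.cVal.mag"))  # 230.0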
2) Data exchange
The Abstract Communication Service Interface (ACSI) is a novel paradigm, introduced by IEC 61850, for describing data exchange procedures in utility subsystems such as substations [11]. ACSI model classes define abstract information services used for vertical and horizontal communication among IEC 61850 devices. ACSI is not a protocol but a method to tie IEC 61850 abstract services to application layer protocols such as MMS [12]. These ACSI model classes can be used as standardized information interfaces for devices which are realized as IEC 61850 servers. Thus, any IEC 61850-enabled client software can take full advantage of their remote control.
3) Managing and engineering IEC 61850 systems
The engineering process for IEC 61850 systems is based on the exchange of XML documents which are formatted according to the System Configuration description Language (SCL) [13]. There are several SCL document types, depending on whether they describe a device or the integrated system itself. The engineering process based on SCL document exchange is relatively static and is most commonly used for substation automation systems, where the communication system and the electric network topology are predefined.
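Because this engineering data is static XML, an SCL file can be inspected with ordinary XML tooling. A minimal sketch, assuming a hypothetical substation.scd file and the standard SCL namespace of IEC 61850-6:

import xml.etree.ElementTree as ET

# List the IEDs declared in an SCL document. "substation.scd" is a
# hypothetical file name; the namespace is the SCL namespace of IEC 61850-6.
NS = {"scl": "http://www.iec.ch/61850/2003/SCL"}
root = ET.parse("substation.scd").getroot()
for ied in root.findall("scl:IED", NS):
    print(ied.get("name"), ied.get("manufacturer"))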
B. Augmented reality – main features
AR is a computer technology that augments the real environment through visually represented information. Thus, to find opportunities for AR applications in industrial systems, it is necessary to review the main AR features. These are as follows:
- AR can follow the user's viewpoint by means of a tracking system;
- AR can superimpose virtual objects onto the user's view of a real-world scene;
- AR can render the combined image of virtual objects and a real-world scene in real time;
- AR can locate virtual objects in a real-world scene at the correct scale, location and orientation.
The control computer often has access to a large amount of process information, and we can expect this to increase as the systems become more sophisticated and complex. It is thus important that the data is clearly presented and easily accessible, in order to avoid unnecessarily attention-demanding interfaces and information overload for the user.
Today’s control computers typically present their data on a
traditional computer display and often use a keyboard/mouse
or a touch-screen as input devices. The operator observes the
process through the machine’s safety glass, while using a
computer to the side for control and feedback. This setup
results in divided attention if the operator needs to both
follow the procedure visually and simultaneously monitor
important values on the computer display. While this problem
can in part be addressed through display placement, it
inherently separates the process data from the process itself.
The goal of this paper is to provide a solution that integrates the process data with the workspace behind the safety glass using AR technology. AR allows combining interactive computer
graphics with real objects in a physical environment, such as
the workspace of an industrial machine. It is particularly
suitable for today’s increasingly complex industrial machine
processes as it enables intuitive representation and real-time
visualization of relevant information in the right place. There
are several situations where it might be advantageous to have
the capability of annotating the real process with virtual
information. An operator might for example want to indicate
or emphasize locations inside the workspace behind the safety
glass to a co-worker or a student. The type, dimensions and
state of the tool currently in operation may be indicated by a
virtual label. Process simulation in the real machine using
virtual tools and virtual materials could increase safety
through virtual previews of the procedure, and provide
implicit visual warnings from unintentional geometrical
inconsistencies.
1) Augmented reality interfaces
The ideal augmented reality interface integrates
computer graphics with a real environment seamlessly and
without encumbering technology. Mobile AR systems have traditionally required head-mounted displays (HMDs), such as the Oculus Rift, involving complex tracking and equipment worn by the user. The popularity
of video see-through systems can be attributed to the relative
ease of implementation and rapid prototyping possibilities
provided through various software libraries, such as
ARToolkit [14]. Even spatial AR systems like the ASTOR system [15] have been researched and have provided very good results. The majority of AR applications today are marker-based, requiring specific markers for 3D tracking and positioning. In this paper QR (Quick Response) codes [16] have been used as markers.
2) Augmented reality markers
There are many different types of markers used in AR, and although all are applicable in certain situations, one of them, the QR code, is used far more than any of its competitors. Its name is an abbreviation of Quick Response Code, and it is a 2D matrix barcode developed by the Denso Wave Corporation in 1994 [16], [17].
A QR Code is capable of handling many types of data, such as numeric and alphabetic characters. One pattern can encode up to 7,089 numeric characters or 4,296 alphanumeric characters. Despite the large amount of data that can be stored in one QR code, it can be decoded by a very lightweight smartphone application or a lightweight PC application with camera access, unlike other 2D barcodes, which usually require a specific scanner to decode them.
ARToolkit, the library that is used in this project, has its own markers, but due to the QR code's superiority it was decided to use QR codes instead of ARToolkit's traditional markers. A QR Code is quite similar to an ARToolkit marker in appearance (Figures 2 and 3), but can encode more information than ARToolkit markers.
The goal is to use QR codes as superior markers and combine them with AR techniques by using QR codes as traditional ARToolkit markers. The registration process of traditional markers can thus be omitted, simplifying the procedure. Also, the ability to encode large amounts of data enables developing more complex and interesting applications.
TABLE 1. ERROR CORRECTION LEVELS OF A QR CODE [18]

Level | Percentage of codewords that can be restored (approximation)
L     | 7%
M     | 15%
Q     | 25%
H     | 30%

TABLE II. COMPARISON OF QR CODES WITH FIDUCIAL MARKERS [18]

Feature                   | QR Code           | Fiducial marker
Need to pre-register      | No                | Yes
Model storing             | Internet          | Local
Limited number of markers | Larger            | Smaller
Universality              | Universal barcode | Standalone

Figure 2. Fiducial markers

IV. IEC 61850 AR APPLICATION
The development of the IEC 61850 AR application is based on utilizing IEC 61850 vertical communication, data engineering through SCL, and QR codes as data markers identifying parts of the SAS equipment according to the IEC 61850 data semantics. The SCL files provide information about the semantically annotated process data. This information is correlated with QR markers assigned to the SAS interior (transformers, circuit breakers, switches, feeders, etc.). The QR image recognition detects the markers, and the AR projector draws additional information on the maintenance engineer's smartphone screen. The process information overlays the SAS equipment, directly connecting the marked equipment with process information available in real time through the SCADA system. By integrating AR, the SAS maintenance activities can be significantly simplified, allowing unambiguous equipment identification, providing real-time measurements and easing SAS monitoring.
Figure 3. IEC 61850 QR code (showing the position detection patterns, the alignment pattern and the data area)
Figure 3 shows an example of a QR code and highlights its
parts. A QR code’s most important sections are three large
square patterns (each contains a small black square with a
white border) which are positioned in three corners of the
code. They are used to determine the position of the code.
Version 2 (and above) expands on this by adding an alignment square used to align the code after detection. The rest of the
code is used to draw a large number of small blocks that
encode the data in the code. An additional benefit of this type
of encoding is that the code can be reconstructed even if a part
of it was damaged or is missing. This is important when
scanning the code with a camera because the camera is never
going to be positioned perfectly and parts of the code will
often be omitted.
There are several correction levels of a QR code, which can be seen in Table 1. Higher correction levels allow more of the code to be missing, with the drawback of increasing the code size. The superiority of a QR code over other fiducial markers can be seen in Table 2 [18]. These are the arguments for using QR codes in SAS AR applications.
A. Application architecture

The application architecture includes several components. The smartphone camera captures the QR code and extracts the IEC 61850-based process data identifier. The recognized path represents semantically annotated process data according to the IEC 61850 communication standard. This path is correlated with the SCADA process data, which is retrieved from the internal SCADA memory database. The SCADA system gathers data from Intelligent Electronic Devices (IEDs) via vertical IEC 61850 communication. The application architecture is shown in Figure 5.
B. Implementation approach
All substation primary equipment elements have been marked with a QR code which describes the path to the uniquely identified process data according to the IEC 61850 standards. Figure 3 shows an example of the QR code for process data representing the transformer voltage measurement in phase A (myLD/MMXU1.PhV.phsA.cVal.mag).
The QR code library used for generating the QR codes is qrcode 5.2.2 [19], a simple Python library that can generate a QR code containing arbitrary data. The encoded data identifies the entity that the QR code is attached to (for example, the transformer mentioned above); a minimal generation sketch is shown below.
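A minimal sketch using the qrcode library named above; the error correction level and the output file name are our illustrative choices, not values given in the paper:

import qrcode

# IEC 61850 object reference for the phase-A voltage magnitude of the
# transformer (the path quoted in the paper).
reference = "myLD/MMXU1.PhV.phsA.cVal.mag"

# Level M error correction (about 15% of codewords restorable, Table 1)
# keeps the marker decodable when parts of it are occluded or damaged.
qr = qrcode.QRCode(error_correction=qrcode.constants.ERROR_CORRECT_M,
                   box_size=10,  # pixels per module
                   border=4)     # quiet zone width, in modules
qr.add_data(reference)
qr.make(fit=True)  # choose the smallest QR version that fits the data
qr.make_image().save("transformer_phsA_marker.png")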
Using a device with a camera, such as a smartphone, the QR code is first detected with the ARToolkit libraries and the code's contents are processed. Once it is known what IEC 61850 process data needs to be displayed, the application connects to a SCADA data exchange interface and asks for the data. The SCADA system is then responsible for fetching the data and sending it back to the AR application. Once the data has arrived, the information is overlaid on the camera feed; the type of overlay usually depends on the type of data shown. This overlaying of information is also done with the ARToolkit libraries. In most cases a simple value and name are overlaid over the marker. After the initial overlay, the SCADA system is asked to provide real-time updates of the data in question for as long as the marker is in the camera's view. The real-time data is provided by the commercial SCADA system PROZA NET. The updates are used to refresh the overlaid image, so that real-time data is shown on the camera feed as long as the marker stays in its line of sight.
Figure 4. Pattern detection

ARToolkit can calculate the percentage of confidence with which a pattern was recognized. If it is above 50%, the pattern is treated as a good candidate for detection. If three candidates are detected in a single frame, their spatial relationship is calculated, and those that do not follow the right-triangle rule set by QR codes are discarded. Figure 4 shows the spatial relationships that need to be satisfied. After determining the three position detection patterns, ARToolkit's capabilities can be used to find the fourth corner.
Locating the code is a big step in understanding it; after it has been located in the 3D world, it is necessary to decode the information contained in it. Many programs used for decoding markers require the user to place the marker exactly in front of the camera to decode the information correctly. Since QR codes are used as ARToolkit markers, the user cannot be expected to position the camera in such a way, paying attention to the position and orientation of the camera with respect to the code. Because of that, the QR code must be aligned to facilitate the extraction of its information. The code is always projected from one plane onto the camera image plane, so its orientation can be restored using a perspective transformation, which can be written as:

$$\begin{bmatrix} x' \\ y' \\ w' \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

where $(x, y)$ are the original coordinates of the QR code and $(x', y', w')$ are the aligned homogeneous coordinates of the same code. The remaining variables, the eight coefficients $a_{ij}$, can easily be estimated using the coordinates of the four corners of the code.

This perspective transformation can then be used to determine the proper orientation of the code. Once the QR code is readable by our application, it is quite simple to use the information to ask the SCADA system for the data and to overlay the data over the QR code using ARToolkit.
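The paper performs this alignment with ARToolkit; purely as an illustration of the same estimation, OpenCV's getPerspectiveTransform solves for the eight coefficients from the four corner correspondences (the corner pixel values and file names below are made up):

import cv2
import numpy as np

# Corner pixels of the detected code in the camera frame, ordered
# top-left, top-right, bottom-right, bottom-left (values made up).
detected = np.float32([[212, 80], [431, 95], [418, 310], [198, 296]])
# Target square: the rectified, fronto-parallel code (300 x 300 px).
aligned = np.float32([[0, 0], [300, 0], [300, 300], [0, 300]])

# Solve for the eight coefficients of the perspective transformation
# from the four corner correspondences.
H = cv2.getPerspectiveTransform(detected, aligned)

frame = cv2.imread("camera_frame.png")      # hypothetical input image
rectified = cv2.warpPerspective(frame, H, (300, 300))
cv2.imwrite("qr_rectified.png", rectified)  # now fronto-parallel and decodable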
V. PROTOTYPE EVALUATION
The evaluation is based on the developed application prototype applied in a real-world substation. The preliminary results show that QR codes are a good choice of marker for IEC 61850 data semantics. Figure 6 shows the application in action.
So far the application is in its infancy, as can be seen from the screenshot in Figure 6, but the results are promising enough for us to be confident that it will be highly applicable in real-world situations and that the real-time data updates will be attractive to end users.
Figure 5. Application architecture (the smartphone AR application screen overlays PhV.phsA.cVal.mag = 5 kA on the transformer, with data delivered by the substation SCADA system)
Figure 6. Screenshot of the application in a real-world situation
VI. CONCLUSION
In this paper it has been shown how AR technologies can be utilized in SAS systems. For demonstration purposes, an AR application has been developed based on QR markers that correlate semantically annotated process data with real-time SCADA data.
It has been shown that combining standards-based communication and AR technology can provide value-added features for substation maintenance and regular inspection, thereby reducing the amount of time and effort required from utility maintenance engineers. This paper presents a solution applied in SAS environments. However, since IEC 61850 semantics covers several new automation domains, it can easily be applied to subsystems such as DERs, wind power plants, etc.
REFERENCES

[1] S. K. Ong and A. Y. C. Nee, Virtual and Augmented Reality Applications in Manufacturing. Springer Science & Business Media, 2013.
[2] L. Romualdo Suzuki, K. Brown, S. Pipes, and J. Ibbotson, "Smart building management through augmented reality," in Pervasive Computing and Communications Workshops (PERCOM Workshops), 2014 IEEE International Conference on, 2014, pp. 105-110.
[3] H. Regenbrecht, G. Baratoff, and W. Wilke, "Augmented reality projects in the automotive and aerospace industries," IEEE Comput. Graph. Appl., vol. 25, no. 6, pp. 48-56, 2005.
[4] T. R. Ribeiro, P. R. J. dos Reis, G. B. Júnior, A. C. de Paiva, A. C. Silva, I. M. O. Maia, and A. S. Araújo, "Agito: Virtual reality environment for power systems substations operators training," in Augmented and Virtual Reality, Springer, 2014, pp. 113-123.
[5] P. R. J. dos Reis, D. L. G. Junior, A. S. de Araújo, G. B. Júnior, A. C. Silva, and A. C. de Paiva, "Visualization of power systems based on panoramic augmented environments," in Augmented and Virtual Reality, Springer, 2014, pp. 175-184.
[6] IEC, Communication Networks and Systems in Substations - ALL PARTS, Int. Std. IEC 61850-SER ed1.0, 2011.
[7] IEC, Communication networks and systems for power utility automation - Part 7-4: Basic communication structure - Compatible logical node classes and data object classes, Int. Std. IEC 61850-7-4 ed2.0, 2010.
[8] IEC, Wind turbines - Part 25-2: Communications for monitoring and control of wind power plants - Information models, Int. Std. IEC 61400-25-2 ed1.0, 2006.
[9] IEC, Communication networks and systems for power utility automation - Part 7-420: Basic communication structure - Distributed energy resources logical nodes, Int. Std. IEC 61850-7-420 ed1.0, 2009.
[10] IEC, Communication Networks and Systems for Power Utility Automation - Part 7-410: Hydroelectric Power Plants - Communication for Monitoring and Control, Int. Std. IEC 61850-7-410 ed1.0, 2007.
[11] IEC, Communication networks and systems for power utility automation - Part 7-2: Basic information and communication structure - Abstract communication service interface (ACSI), Int. Std. IEC 61850-7-2 ed2.0, 2010.
[12] IEC, Communication networks and systems in substations - Part 8-1: Specific Communication Service Mapping (SCSM) - Mappings to MMS (ISO 9506-1 and ISO 9506-2) and to ISO/IEC 8802-3, Int. Std. IEC 61850-8-1 ed1.0, 2004.
[13] IEC, Communication networks and systems for power utility automation - Part 6: Configuration description language for communication in electrical substations related to IEDs, Int. Std. IEC 61850-6 ed2.0, 2009.
[14] Open Source Augmented Reality SDK | ARToolKit.org. [Online]. Available: http://artoolkit.org/.
[15] A. Olwal, J. Gustafsson, and C. Lindfors, "Spatial augmented reality on industrial CNC-machines," in Electronic Imaging 2008, 2008, pp. 680409-1-680409-9.
[16] "QR code," Wikipedia, the free encyclopedia, 21-Feb-2016.
[17] K. Ruan and H. Jeong, "An augmented reality system using QR code as marker in Android smartphone," in Engineering and Technology (S-CET), 2012 Spring Congress on, 2012, pp. 1-3.
[18] T.-W. Kan, C.-H. Teng, and W.-S. Chou, "Applying QR code in augmented reality applications," in Proceedings of the 8th International Conference on Virtual Reality Continuum and its Applications in Industry, 2009, pp. 253-257.
[19] qrcode 5.2.2 : Python Package Index. [Online]. Available: https://pypi.python.org/pypi/qrcode.
Innovation of the Campbell Vision Stimulator with the Use of Tablets
J. Brozek*, M. Jakes** and V. Svoboda**
* Department of Information Technologies, University of Pardubice, Pardubice, Czech Republic
** Department of Software Technologies, University of Pardubice, Pardubice, Czech Republic
mail@jobro.cz, jakes@asote.cz, svoboda@asote.cz
Abstract - The article covers three fundamental themes: a) solutions using gaming to treat multiple eye defects, in particular Amblyopia; b) an explanation of the issues and the design of software (including games) intended for therapeutic or health purposes; and c) a highlight of modern solutions and the power of software products for the needs of the health sector, in particular in the fields of diagnostics and rehabilitation.
The reader can learn basic information about eye diseases and the principles of their treatment, and become acquainted with the reasons why video games are appropriate for rehabilitation.
Particularly important and beneficial for the reader is the section which focuses on a) the differences between standard software and healthcare software, b) the high risks associated with software defects, or even the risk of side effects of so-called „perfect software”, and c) the fact that a major part of software development does not comply with all of the standards.
The article also discusses the advantages of the software solution over other methods of rehabilitation. Most of the paradigms are generally applicable. Familiarity with the principles of this application can thus be interesting even for developers in the relevant areas.
I. INTRODUCTION
We need to introduce the issue from several different perspectives, such as the historical context, the scope of the diseases, and how to cure them.
A. Context of the issue
There are 500 million people suffering from one of many eye defects at the present time. The current level of human knowledge shows that 80% of these defects can be treated. Some can be treated surgically, others using conservative methods.
Currently, however, the options for conservative treatment (rehabilitation) are changing dramatically. In the past, rehabilitation options were severely limited and had unverifiable results. Modern times, however, bring an ideal device for the rehabilitation of eye defects – the tablet. The combination of intuitive touch input directly on the visualization unit appears to be ideal; the level of effectiveness, however, depends on appropriate software.
However, the problem occurs right at the time of the design and implementation of the software. Solutions for health care cannot be arranged as standard software, because the risks are too high and the consequences could be disastrous. Therefore, the design of the software (including games) must be executed exceptionally well and according to valid standards.

B. Amblyopia

Our solution primarily targets Amblyopia, also known as the lazy eye disease. According to various sources, 3-4% of people suffer from this disease. The disease almost exclusively affects only one eye. It can occur without symptoms, or with some visible symptoms, such as strabismus, closing the affected eye, or fear of closing the healthy eye. For more information see [1], [2], [3] or [4].

Figure 1: Healthy and amblyopic vision

The disease occurs in early childhood (during the development of vision). As a result of a physical defect, the brain never learns how to properly use the signals from the eye. Over time, the brain can completely suppress signals from the affected eye. Amblyopia is thus a mixed disorder of the eye and brain. According to the current level of knowledge, treatment is possible only until about 12 years of age, the period in which the brain is still able to acquire basic sensory perception. However, it should be noted that some studies suggest that the disease can be partially corrected at a later age. Treatment principles for Amblyopia are explained in one of the following chapters.
C. Binocular cooperation
Binocular stereoscopic vision (or cooperation) is absolutely necessary for people in the modern world. By composing the picture from both eyes, the brain is able to perceive the third dimension. This means that, when using only one eye, it is not possible to judge distance (note: trying to emulate this state by briefly covering one eye does not work, due to past experience and previous knowledge of sizes and distances). Good binocular cooperation is essential not only for activities such as cycling or driving a vehicle, but also for simple hand-eye coordination.
The lack of binocular cooperation generally arises because: 1) the cooperation was never learned, or 2) one eye gradually goes blind. It is possible to say that the age limit for healing Amblyopia is set primarily by the period in which it is still possible to teach the brain the binocular cooperation of both eyes. Rehabilitation therefore focuses on practicing this cooperation, for example hand-eye coordination tasks for which stereoscopic vision is necessary (e.g. threading beads on a thread).
For more information see [5] or [6].
II. CURRENT TREATMENT METHODS
Amblyopia is a combined (eye-brain) disease, so the treatment must respect that. As a first step it is necessary to perform a check of the eye and, if appropriate, remove or correct its physical inadequacy. This correction rarely needs surgical intervention; most patients need only glasses. The second phase of treatment lies in the practical application of conservative methods (sometimes referred to as rehabilitation). All conservative methods have a common basis. Before therapy starts, the patient's healthy eye is prevented from seeing, using a special shield – an occluder. On the one hand this measure increases the discomfort of the patient; on the other hand, it is a way to force the use of the affected eye. The brain cannot ignore the signals from the affected eye for long, and moreover must learn to work with the eye. Gradually this leads to sharpened vision.
The later phases of treatment require the alternate involvement of both eyes (to avoid degeneration). In the third phase, the treatment of binocular cooperation is deployed.
The treatment itself can be organized in a variety of ways. In preparing this work we relied primarily on [7], [8], [9], [10].
A. Conservative
This is the purely conservative principle of treatment. It is based on the fact that, apart from occluding the healthy eye, no special measures are taken. This is the oldest and least effective type of treatment. It can only be effective in cases of slight damage. Its big advantage is the price. The level of discomfort and the duration of treatment are its major disadvantages.
B. Search in text

One of the old but still effective methods is the combination of occluding the healthy eye and searching in a text (e.g. in a book). The exercise takes place so that the patient simply looks for some particular character. The pace of the workout is determined by the patient. It provides substantial stimulation of vision and trains the need to select information. The slight drawbacks are: 1) quick eye fatigue, 2) it does not train vision at various distances, and 3) the exercise requires a high degree of concentration.
C. Conservative with multimedia or computer use

Modern times allow patients to be treated by a conservative method with a TV, computer or tablet, but without the personal risk of moving in the physical world. Poor vision can no longer become a cause of accidents and stumbles. Using an occluder while viewing the television or computer reduces risks. On the other hand, narrowing the rehabilitation method restricts the effect of vision stimulation and loses the overlap with motor skills and the broader context of eye use (compared with the classic conservative method). In addition, it eliminates the benefit of stimulating the eye at multiple distances.
The biggest problem, however, is the very questionable quality of the applications (and games) and some phenomena arising while watching TV. The method seems modern and efficient, but it already has a number of disadvantages, further extended by the inability to adapt to the pace of the patient. There is thus a real risk that the patient does not improve, or conversely, that the patient's condition deteriorates.
D. CAM
The Campbell vision stimulator, briefly CAM, was developed in the Soviet Union around the year 1960 and was subsequently upgraded several times. It was actually the first solution designed directly by doctors. Because the custom solution presented in this paper is based on the CAM, it will be described in more detail.
The CAM is a physical (most often electromechanical) device that consists of three plates. Farthest from the patient is a rotating plate with black and white squares or spirals. Closer to the patient is a transparent plate with the image. The closest part is the glass or foil on which the patient can draw. The principle of treatment can be likened to a smarter coloring book: the coloring pages are on a moving background. During tracing or rendering, the eye is strongly stimulated (even more than by viewing the real world). An advantage of the solution is also an implicit connection between the treatment of Amblyopia and strengthening the link between vision and motor skills. The CAM (Figure 2) is an elderly solution, and current technologies allow its significant evolution.
Figure 2: Campbell vision stimulator
III. OUR SOLUTION
Our custom solution has a historical context and a link to one of the current core developers. The history of its evolution illustrates that the solution is proven. Over 10 years there were many steps towards the current version, and this is the first non-medical paper which explains its principles to the IT community.
The development started when the brother of one of the developers began to be treated for Amblyopia. Although the CAM offered the fastest guarantee of success, the old electromechanical CAM was available only in the regional hospital, about 25 km away. The treatment was complicated by the distance and by the need to keep the patient in the hospital for several weeks (the patient was 5 years old). For this reason the first efforts to create a physical copy of a CAM device started. Unfortunately, the requirements placed on health-care devices combining electronics and mechanics could not be met. The first generation failed.
The second generation solution came in 2008. The device used a standard LCD monitor, over which a glass plate was installed. On the glass it was possible to draw and trace. It was a mere evolution of the electromechanical CAM, which in principle brought nothing new; it just ensured greater availability of treatment.
The turning point came in the year 2011, i.e. in the period when tablets began to be affordable in the Czech Republic. The third generation solution was implemented on a tablet PC. The dimensions, touch control and availability for individual households changed the face of treatment. The solution was gradually modified, and finally in 2013, under the name ANNA (a Czech female first name), the technique started to be used in the Czech Republic and neighboring countries. Pardubice Regional Hospital, which co-operated on the development, still uses this solution. This generation extended the principles with the after-action review system and "Heal & Play". This version is very popular among child patients.
The last version was released in the year 2015 under the name Anna II; in addition to the requirements of doctors and patients, this version also reflects the requirements of the market (a special chapter focuses on market needs). The solution is shown in Figure 3.
Figure 3: Software solution on tablet
The development of the custom solution has proceeded steadily, but for a long time without published output. The solution has won a few prestigious awards, illustrating that it really works. The application itself gained 1st prize in the Business Ideas show consecutively in the years 2014 and 2015. In 2013, it was awarded a special prize by the Dean (head) of the faculty. In 2014 the solution received a special prize from the Rector (head of the University) for outstanding achievement in research and innovation. In 2015, our solution was nominated for the forthcoming research prize given by the National Government of the Czech Republic.
A. The issue of programming for the needs of health care
Each developer knows or intuitively feels that different types of software are subject to different demands; not only in terms of functionality, but also, for example, in testing methodologies, in development methodologies, or even in restrictions on some of the algorithms used. A demonstration example made for teaching purposes, or another simple web application, does not require complex system design, project management, or structural control. Often it is enough that the solution works. However, there are also sets of issues on which human life or health depends. In applications for health-care purposes, very strict rules of programming apply. A defect or unexpected behavior of such a program could have disastrous consequences.
It is possible to produce relatively simple programs, which could be developed in a few weeks. However, can you afford to take such a risk? A poorly designed program could endanger the health and lives of users. Such a gamble would be on your conscience if you thought only of the financial situation and deployed incorrect software. A solution without standards cannot be insured. As a result of the major risks, there are great demands on the software. During development, a series of standards must be followed and the correct development procedures respected (for example, a very often applicable approach is similar to the methodology of the Unified Process). Everything must be properly documented, and for virtually every step additional documents need to be created. This leads to a situation where the (programming) code has thousands of rows, while the project documentation has a length of ten thousand rows. A further ten thousand rows are then required by the creation of tests and their results.
All of the forms and procedures are standardized. For the design of the software you need to use ISO 90003 [11]; for creating a health solution, ISO 13485 [12]; for testing, ISO 17025 [13] and ISO 29119 [14-16]; for proper team management and documentation, ISO 21500 [17], ISO 12207 [18] and ISO 9000 [19]; and you need to know many others.
Software development thus becomes immensely more difficult. Although two applications may look the same, the cost of a solution which respects all the standards is ten times higher compared to a solution that is not limited by the standards. In addition, it should be noted that the standards may limit the methods of programming, or even restrict the allowed algorithms.
For example, a pragmatic approach teaches programmers to choose algorithms and data structures by the character of their average (often amortized) complexity. The splay tree is a data structure with advanced heuristics; its "move to front" operation improves performance in many classes of application, unlike other forms of search trees. The Quicksort sorting algorithm is often used for its simplicity and speed.
Unfortunately, the standards do not allow a solution which has a very good average complexity; their scope focuses on worst-case complexity (which is often not so good in advanced data structures). Therefore, we select algorithms according to the ratio of their performance, taking the worst-case situation into account.
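As a sketch of this selection principle (our own example, not code from the solution): Quicksort's O(n^2) worst case would be rejected under a worst-case criterion, while heapsort keeps O(n log n) even in the worst case, at the price of a larger constant factor:

import heapq

# Quicksort: O(n log n) on average but O(n^2) in the worst case, so a
# worst-case-oriented standard would reject it. Heapsort stays
# O(n log n) even in the worst case.
def heapsort(values):
    heap = list(values)
    heapq.heapify(heap)                                     # O(n)
    return [heapq.heappop(heap) for _ in range(len(heap))]  # n pops, O(log n) each

print(heapsort([5, 1, 4, 2, 3]))  # [1, 2, 3, 4, 5]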
The methodology for software testing is described very comprehensively in the standards. All testing must be conducted against protocols and many characteristics of the device, so that errors are virtually eliminated. Each test must be carried out at several levels and must always fulfil the standard conditions of the request. At the same time, it is necessary to test substandard conditions influenced by the external environment, and of course the hazards. The software developer must perform (and document) at least the following types of tests: unit tests, integration tests, acceptance tests, factory tests, user-side tests, integrity tests, and tests of the stability of the environment. After completion of the tests, the records must be archived for at least the lifetime of the program (the standards usually require storage for 10 years after the end of official support, and support must normally be provided for at least two years after the end of app distribution).
Figure 4: Development demands
Successful completion of the effective phases of software tests on the developer's side finalizes the development part. The issue does not end there. In the next stage, the software must be tested from many other aspects. For example, our solutions have to go through a whole range of display tests: a test of maximum dynamic image changes (a high rate of dynamic image changes would risk triggering an epileptic seizure), a normal contrast test, a specific contrast test (contrast during non-standard lighting) and many others must be measured. After all of these tests, it is possible to declare the prototype ready for first testing on patients. After a further lengthy period, the product is also ready for clinical tests.
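As an illustration of what such a display test might measure (the metric and the 0.2 threshold are our assumptions, not values from the certified procedure):

import numpy as np

# Flag a frame pair whose mean absolute luminance change exceeds a
# threshold; rapid whole-screen changes raise the risk of triggering
# photosensitive epilepsy. Metric and threshold are illustrative only.
def excessive_change(frame_a, frame_b, threshold=0.2):
    a = frame_a.astype(np.float32) / 255.0
    b = frame_b.astype(np.float32) / 255.0
    return float(np.mean(np.abs(a - b))) > threshold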
B. Base principle
Like the CAM, our solution is based on an active workspace with a homogeneously changing background. Every background changes at the same rate (usually a rotation around a constant center of rotation), but with a different grain size (resolution) of the background. A background with a smaller grain size (DPI of the underlying shape) creates a simpler task for the patient. A background with a coarser grain is very challenging; patients with higher degrees of disability are not able to work with it.
The foreground of the application varies according to the selected mode. It is possible to choose the coloring-book mode (a mode which emulates the CAM and adds advanced features), or to run one of the many games.
The principle is based on an actual painting book. First, the patient traces the stroke outlines with a finger, thereby drawing with the selected color. After drawing around the outlines, the patient can select a color and, holding a finger down, fill the image with colors. The maximum therapeutic efficiency is achieved during the tracing process; the rendering is only a complementary part of the solution so that the patient's picture gets finished. After finishing, the patient can save the picture for future viewing of progress (of accuracy) or for printing.
The coloring pages are equipped with several interesting features, such as snapping to lines with scalable tolerance (which brings a higher success rate and higher motivation), the after-action review system, and techniques for changing the difficulty of painting.
C. Heal & Play
Heal & Play is a designation used in the Department of Software Technologies of the University of Pardubice for development for the health sector using games. The name is self-explanatory. It is in the process of registration, but there is no further protection. It does not serve as a standard, but as a description of program purpose.
All the games are usually very simple and use the same principle of a dynamic background as the CAM. The aim of the games is not to provide a comprehensive gaming experience, but to make the treatment a pleasant and popular activity, at least as much as possible.
The benefits of the games for the whole solution are clear: the treatment is becoming a popular way of spending time. A greater spectrum of games ensures that the player does not get bored with them. The determination of the appropriate level of difficulty and length, and the scaling of the variables that affect the effectiveness of the treatment, currently make this the most effective solution for rehabilitation equipment in this category.
IV. DISCUSSION
From successive experiments, interviews with the physicians cooperating in the development, and simple feedback from users, it is possible to characterize some of the profiling features of our solution.
A. The availability of the solution
The free interdisciplinary synthesis of the proven principles of the CAM with programming techniques for tablets delivers the benefit of fantastic availability. Tablets are currently spreading strongly into most households, where they are relatively cheap to buy. A further advantage of using tablets is that they are very intuitive and easily controlled by most people.
B. Mobility
Unlike the original CAM, a great advantage of our solution is its mobility. Treatment can take place, as usual, in a treatment facility, but also in the comfort of the patient's home, or even while traveling. Thanks to this property there has been a dramatic increase in satisfaction with the treatment. Especially for small children, it is important that the principles of comfort and safety at home are adhered to.
C. The risk of the solution
Each tablet is an electronic bench-mounted unit certified for its operation, so it is safe to use in hospitals and homes. The risks associated with the software are eliminated by creating the solution in accordance with all the standards and by very high-quality testing.
D. Snapping
The software solution enables the deployment of routines that significantly facilitate the use of the software, or allow scaling the difficulty of individual treatments. An example: when a child is asked to trace an outline, the software allows setting, for example, a 0.5 cm tolerance such that the traced line will snap perfectly onto the intended target. In a paper book, or with the CAM, such an implementation is not possible. This approach has many advantages: the patient feels a greater degree of progress, the overall appearance of the coloring book is better, and diagnostics are easier. The tolerance, however, should be used with caution and progressively reduced, with a view to gradually increasing the demands on the patient. A minimal sketch of such a snapping routine follows.
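The sketch below is illustrative only, not the product code; the tolerance conversion assumes a roughly 160 dpi screen:

import math

def snap_to_outline(touch, outline_points, tolerance_px):
    """Return the nearest outline point if the touch falls within the
    tolerance radius; otherwise leave the touch where it is."""
    nearest = min(outline_points, key=lambda p: math.dist(p, touch))
    return nearest if math.dist(nearest, touch) <= tolerance_px else touch

# A 0.5 cm tolerance on a ~160 dpi screen is roughly 31 pixels.
outline = [(100, 100), (110, 100), (120, 100)]
print(snap_to_outline((112, 104), outline, tolerance_px=31))  # (110, 100)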
E. Fun
The implementation of games into a certified therapeutic device has produced fantastic results. The children at whom the software is targeted began to take the treatment as fun. They themselves repeatedly asked for another healing cycle. This removes the problem of kids refusing treatment, because it is fun. It is another benefit of our solution that it combines high treatment efficiency with a large fun ratio.
Most of the games based on this principle are very simple, so as not to disturb the healing principle, but they are set up to increase in difficulty; the patient can very easily fail when playing them. Making the games challenging is more important than high-quality graphic design (a graphics design with full textures could very easily degrade the therapeutic effect of the treatment).
F. The secondary effects of treatment
Using the tablet as an I/O device and display unit creates certain benefits for the user. Unlike other solutions already used in the treatment of amblyopia (e.g. almost all conservative methods using information technology), the strengthening of the ties between vision and motor abilities occurs spontaneously. The binding of the eye and the hand is created in such a pleasant manner that it may allow skipping some specialized rehabilitation treatments.
G. Facilitate the evaluation of the impact of treatment
The after-action review system is very convenient for patients and doctors. By storing statistics, the development of the treatment of every individual patient can be monitored very closely. Thanks to this, appropriate therapeutic schemes can be set, taking the needs of each individual patient into account. The system also runs on separate devices used under long-term medical supervision; in this case, automatic processes recommend optimal settings according to the patient's degree of disability.
V. CONCLUSION
The article introduced the principles that are used to treat Amblyopia. The paper, however, also tried to capture enough of the software development process for health care to be most beneficial for the reader. The reader should obtain information on the issues of developing solutions for the health sector and the required steps, and also get some inspiration for their own solutions.
The entire article tries to show that, through games on a tablet, you can achieve fantastic results and improve the lives of thousands of people. However, the development of such an application, which appears to be relatively simple, is extremely challenging, because it is not easy to meet all the standards. The complexity, however, should not deter developers, since the standards are built so as to protect users and, in fact, even the developers.
ACKNOWLEDGMENT

The paper submission is partially financed from the Student grant prize at the University of Pardubice.

REFERENCES

[1] A. Fielder, Amblyopia: a multidisciplinary approach. Oxford: Butterworth-Heinemann, 2001.
[2] K. J. Ciuffreda, D. M. Levi and A. Selenow, Amblyopia: basic and clinical aspects. Boston: Butterworth-Heinemann, 1991.
[3] J. Osman, Amblyopia. Bolinas, CA: Avenue B, 1993.
[4] M. Schapero, Amblyopia, 1st ed. Philadelphia: Chilton Book Co, 1971.
[5] D. J. Getz, Strabismus and amblyopia. CA: Optometric Extension Program, 1990.
[6] P. H. Spigel, Handbook of Pediatric Strabismus and Amblyopia. New York: Springer Science Business Media, Inc, 2006.
[7] S. W. Ambler and L. L. Constantine, The Unified Process Transition and Production Phase: Best Practices in Implementing the UP. Lawrence, Kan.: CMP Books, Masters collection, 2002.
[8] R. Cimler, J. Matyska and V. Sobeslav, "Cloud based solution for mobile healthcare application," in Proceedings of the 18th International Database Engineering. New York, NY, USA: ACM Press, 2014.
[9] G. Divisova, Strabismus, 2nd ed. Praha: Avicenum, 1990.
[10] P. Kroll and P. Kruchten, The Rational Unified Process Made Easy: A Practitioner's Guide to the RUP. Boston: Addison-Wesley, 2003.
[11] ISO/IEC 90003:2014, Software engineering - Guidelines for the application of ISO 9001:2008 to computer software.
[12] ISO 13485:2003, Medical devices - Quality management systems - Requirements for regulatory purposes.
[13] ISO/IEC 17025:2005, General requirements for the competence of testing and calibration laboratories.
[14] ISO/IEC/IEEE 29119-1:2013, Software and systems engineering - Software testing - Part 1: Concepts and definitions.
[15] ISO/IEC/IEEE 29119-2:2013, Software and systems engineering - Software testing - Part 2: Test processes.
[16] ISO/IEC/IEEE 29119-3:2013, Software and systems engineering - Software testing - Part 3: Test documentation.
[17] ISO 21500:2012, Guidance on project management.
[18] ISO/IEC 12207:2008, Systems and software engineering - Software life cycle processes.
[19] ISO 9000:2005, Quality management systems - Fundamentals and vocabulary.
Classification of Scientific Workflows Based on Reproducibility Analysis
A. Bánáti1, P. Kacsuk2,3 and M. Kozlovszky1,2
1 Óbuda University, John von Neumann Faculty of Informatics, Biotech Lab, Bécsi str. 96/b., H-1034, Budapest, Hungary
2 MTA SZTAKI, LPDS, Kende str. 13-17, H-1111, Budapest, Hungary
3 University of Westminster, 115 New Cavendish Street, London W1W 6UW
{banati.anna, kozlovszky.miklos}@nik.uni-obuda.hu, kacsuk@sztaki.mta.hu
Abstract - In the scientific community one of the most vital challenges is the reproducibility of a workflow execution. The necessary parameters of the execution (we call them descriptors) can be external, depending for example on the computing infrastructure (grids, clusters and clouds) or on third-party resources; or they can be internal, belonging to the code of the workflow, such as variables. Consequently, during re-execution these parameters may change or become unavailable, and they can ultimately prevent reproducing the workflow. However, in most cases the lack of the original parameters can be compensated by replacing, evaluating or simulating the values of the descriptors, with some extra cost, in order to make the workflow reproducible. Our goal in this paper is to classify scientific workflows based on the method and cost by which they can become reproducible.
I. INTRODUCTION
In large computational challenges, scientific workflows have emerged as a widely accepted solution for performing in-silico experiments. In general these in-silico experiments consist of series of particularly data- and compute-intensive jobs, and in most cases their execution requires parallel and distributed infrastructure (supercomputers, grids, clusters, clouds).
The successive steps of an experiment are chained into a so-called workflow, which can be represented by a directed acyclic graph (DAG). The nodes are so-called jobs, which include the experimental computations based on the input data accessed through their input ports. In addition, these jobs can produce output data, which can be forwarded through their output ports to the input port of the next job. The edges of a DAG represent the dataflow between the jobs (Figure 1); a sketch of such a structure is given below.

Figure 1. Workflow example with four jobs (J1, J2, J3, J4)
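For illustration, such a DAG can be held as a simple adjacency structure. The concrete edges of Figure 1 are not spelled out in the text, so the topology below is assumed:

# A four-job workflow as an adjacency structure; each edge is a
# dataflow from an output port to the next job's input port.
workflow = {
    "J1": ["J2", "J3"],  # J1 feeds J2 and J3
    "J2": ["J4"],
    "J3": ["J4"],
    "J4": [],            # J4 produces the final result R
}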
An essential part of the scientific method is to repeat and reproduce the experiments of other scientists and to test the outcomes, even in a different execution environment. A scientific workflow is reproducible if it can be re-executed without failures and gives the same result as the first time. In this approach the failures do not mean failures of the Scientific Workflow Management System (SWfMS), but concern the correctness and availability of the inputs, libraries, variables, etc. Different users may be interested in reproducing the workflow for different purposes: for example the authors of the workflow(s) in order to prove their results, readers or other scientists in order to reuse the results, or reviewers in order to verify the correctness of the results [1]. Additionally, scientific workflow repositories are nowadays already available; in this way scientists can share their results with each other and can even reuse existing workflows to create new ones.
The two most significant obstacles to reproducing a workflow are the dependencies of the workflow execution and the rich collection of provenance data. The former can be perceived as the necessary and the latter as the sufficient requirement of reproducibility. The dependencies of the execution are those resources which require external (outside the scientific workflow management system, SWfMS) services or resources, such as third-party services, special hardware/software or random value generators [2]. Eliminating these dependencies is in most cases not possible, so they have to be handled in some other way: different methods have to be set up to make the workflows reproducible.
To achieve our goal we have defined the descriptor space and the decay-parameters of the jobs, which give us the possibility to analyze a workflow from a reproducibility perspective. The descriptor space contains all the parameters (called descriptors) which are necessary to reproduce the workflow. There are descriptors which are constant and do not change in time. Other descriptors change continuously (for example a database which continuously receives more and more data from sensor networks). There may also be descriptors based on external services (such as third-party services), which can become unavailable after a few years. Finally, there are descriptors which are unknown and whose behavior is unpredictable; in this case the workflow is non-reproducible. The decay-parameter describes the type and the measure of the change of the descriptor. With the help of the decay-parameter we have determined five categories of workflows: reproducible, reproducible with extra cost, approximately reproducible, reproducible with probability P, and non-reproducible.
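These five categories can be written down directly, for example as an enumeration (the member names are our own shorthand for the categories listed above):

from enum import Enum

# The five workflow categories defined in the paper.
class Reproducibility(Enum):
    REPRODUCIBLE = "reproducible"
    WITH_EXTRA_COST = "reproducible with extra cost"
    APPROXIMATE = "approximately reproducible"
    WITH_PROBABILITY_P = "reproducible with probability P"
    NON_REPRODUCIBLE = "non-reproducible"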
The goal of our investigation is to find different methods to make workflows in the different categories reproducible, even if this requires extra costs or compromises. In certain cases this goal is implementable, but often the result of the workflow can only be evaluated with the help of simulations. If there is no method to make a workflow reproducible, our goal is to provide the scientist with useful information about the conditions and probability of the reproducibility of his workflow.
The rest of the paper is organized as follows: in the next section we provide a short background and overview of work related to our research. Section 3 presents the mathematical model of our reproducibility analysis. In section 4 we give the classification of scientific workflows based on our analysis. In section 5, based on our model, we define the general measures of the reproducibility analysis. Finally we summarize our conclusions and reveal potential future research directions.
II. STATE OF THE ART
Currently the reproducibility of scientific workflows is a burning question which the scientific community has to face and solve. Accordingly, in the last one or two years many researchers have investigated this issue. One part of the literature analyzes the requirements of reproducibility, and the other part deals with the implementation of tools or frameworks.
The first group agrees on the importance of careful design [3], [4], [5], [6], [7], which on the one hand means increased robustness of the scientific code, for example with a modular design, a detailed description of the workflow and of the input and output data examples, and consistent annotations [8]. On the other hand, careful design includes the careful usage of volatile third-party or special local services. In these cases two solutions exist, but reproducibility remains uninsurable: 1. taking a digital copy of the entire environment using a system virtual machine/hardware virtualization approach; 2. capturing and storing metadata about the code and environment that allows it to be recreated later [8].
Zhao et al. [9] investigate the cause of so-called workflow decay, which means that year by year the ability and success of re-executing a workflow significantly decreases. They examined 92 Taverna workflows submitted in the period from 2007 to 2012 and found four major causes: 1. missing volatile third-party resources; 2. missing example data; 3. missing execution environment (requirement of special local services); and 4. insufficient descriptions of workflows. Hettne et al. [10] list ten best practices to prevent workflow decay. Grothe et al. [11] analyze the characteristics of applications used by workflows and list the requirements for enabling the reproducibility of results and the determination of provenance. To the aforementioned requirements they added the assumption of deterministic behavior of applications, in order to perform appropriate provenance collection.
There are available tools, such as VisTrails, ReproZip or PROB [12], [13], [14], which allow researchers and scientists to create reproducible workflows. With the help of VisTrails [12], [15] a reproducible paper can be created, which includes not only the description of the scientific experiment, but also all the links for the input data, applications and visualized output, which always stays consistent with the actually applied input data, filters or other parameters. ReproZip [13] is another tool, which stitches together the detailed provenance information and the environmental parameters into a self-contained reproducible package.
The Research Object (RO) approach [16], [17] is a new direction in this research field. RO defines an extendable model which aggregates a number of resources in a core unit: namely, a workflow template; workflow runs obtained by enacting the workflow template; other artifacts which can be of different kinds; and annotations describing the aforementioned elements and their relationships. In accordance with the RO approach, the authors in [18] also investigate the requirements of reproducibility and the information necessary to achieve it. They created ontologies which help to make these data uniform. These ontologies can help our work and give us a basis to perform our reproducibility analysis and make the workflows reproducible despite their dependencies.
Piccolo et al. [19] collected the tools and techniques and proposed six strategies which can help scientists to create reproducible scientific workflows.
Santana-Perez et al. [20] proposed an alternative approach to reproducing scientific workflows which focuses on the equipment of a computational experiment. They have developed an infrastructure-aware approach for conserving and reproducing computational execution environments, based on documenting the components of the infrastructure.
To sum up the results mentioned above, we can conclude that the general approach is that the scientist has to create reproducible workflows with careful design and appropriate tools and strategies. However, none of these works intended to solve the problems related to dependencies; rather, they suggested bypassing them. Moreover, they did not deal with the following question: how can an existing workflow be made reproducible?
III. THE MODEL
In our approach a scientific workflow consisting of N jobs can be written as a function of its jobs:

$$SWf(J_1, J_2, \ldots, J_N) = \mathbf{R} \qquad (1)$$

where $\mathbf{R}$ is the vector of results.
In our investigation we assume that a given workflow has been executed at least once and that the provenance database of the workflow execution is available. In this case we can assign a so-called descriptor space to every job of the given workflow:
$$D_{J_i} = \{d_{i1}, d_{i2}, \ldots, d_{iK_i}\} \qquad (2)$$
The elements of this descriptor space are called descriptors, and they give all the necessary parameters to reproduce the job. These parameters can be, for example, variables of the infrastructure, variables of the code, parameters of system calls, inputs, outputs and partial data, or access paths of external resources, etc. [21]. Every descriptor has a name and a value. In addition, we also assign to them a so-called decay-parameter, which describes the type and the measure of the change of the given value. The decay-parameter can be zero, which means that the value of this descriptor does not change in time; in other words, the availability of this descriptor (and its value) can be ensured one, two, ten or any number of years from now. In this case this descriptor does not cause a dependency, and the reproducibility of the job does not depend on this descriptor. The decay-parameter can be infinite if the descriptor's value is unknown, for example in the case of randomly generated values. The value of the decay-parameter can be a distribution function F(t) if the availability of the given resource varies in time according to this F(t). The fourth option is that the value of the decay-parameter is a function – vary(t, v) – depending on time, which determines the variation of the descriptor's value.
Formally:

$$decay(v_i) = \begin{cases} 0, & \text{if the value of the descriptor is not changing in time} \\ \infty, & \text{if the value of the descriptor is unknown} \\ F_i(t), & \text{the distribution function of the availability of the given value} \\ Vary_i(t, v_i), & \text{if the value of the descriptor is changing in time} \end{cases} \qquad (3)$$

TABLE 1. THE DESCRIPTOR SPACE OF A JOB AND ITS MEASURES

Descriptor's name | Descriptor's value | Decay-parameter | Cost
d1                | v1(d1)             | decay(v1)       | c1
d2                | v2(d2)             | decay(v2)       | c2
…                 | …                  | …               | …
dK                | vK(dK)             | decay(vK)       | cK

The descriptors and their decay-parameters can originate from three different sources: from the users, from the provenance database, or they can be generated automatically by the SWfMS [21].

With the help of these expressions we can define reproducibility in the following way.

Definition: The job $J_i$ is reproducible if

$$JOB_i(t_0, v_{i1}(d_{i1}), v_{i2}(d_{i2}), \ldots, v_{iK_i}(d_{iK_i})) = JOB_i(t_0 + \Delta t, v_{i1}(d_{i1}), v_{i2}(d_{i2}), \ldots, v_{iK_i}(d_{iK_i})) = \mathbf{R}_i \qquad (4)$$

for every $\Delta t$.

In addition, if a scientific workflow contains N jobs and all the jobs are reproducible, the scientific workflow is also reproducible:

$$SWF(t_0, J_1, J_2, \ldots, J_N) = SWF(t_0 + \Delta t, J_1, J_2, \ldots, J_N) = \mathbf{R} \qquad (5)$$

for every $\Delta t$.

We can also assign a cost to each descriptor. This gives the measure of the "work" or cost which is necessary to make the job reproducible. For example, when the value of the descriptor is a large amount of data which cannot be stored even on extra storage, we can assign a cost to this extra storage. Another example is when the descriptor changes in time and its decay-parameter is a so-called "vary function"; in this case, to reproduce the workflow we can apply simulation tools based on the sample set, which also results in an extra cost (see section IV.A).

IV. THE CLASSIFICATION

By analyzing the decay-parameters of the descriptors we can classify the scientific workflows. First, we can separate the workflows whose decay-parameters for all the jobs are zero. These workflows are reproducible at any time and under any circumstances, since they do not have dependencies. Then we can determine the descriptors which can influence the reproducibility of the workflow, in other words those which have non-zero decay-parameter(s). Four groups have been created (see Table 2):

1. At least one decay-parameter of a descriptor is infinite, but with the help of additional resources or tools this dependency of the execution can be eliminated. In this case the finite cost of this descriptor indicates that there is a possibility to reproduce the job with some extra cost.

2. At least one decay-parameter of a descriptor is infinite and the cost of this descriptor is also infinite. In this case the dependency of the workflow cannot be eliminated and the workflow is non-reproducible.

3. At least one decay-parameter of a descriptor is a probability distribution function and the other ones are zero.

4. At least one decay-parameter of a descriptor is a vary function and the other ones are zero.
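To make these categories concrete, here is a minimal Python sketch (ours; the names Descriptor and classify_workflow, the example values, and the precedence applied when decay kinds are mixed are illustrative assumptions, not part of the model above):

import math

# Illustrative decay-parameter kinds from equation (3); the constants are ours.
ZERO = "zero"          # the value never changes in time
INFINITE = "infinite"  # the value is unknown
DIST = "distribution"  # availability follows a distribution function F(t)
VARY = "vary"          # the value changes as vary(t, v)

class Descriptor:
    def __init__(self, name, value, decay, cost=0.0):
        self.name = name    # descriptor's name (Table 1, column 1)
        self.value = value  # descriptor's value (column 2)
        self.decay = decay  # decay-parameter kind (column 3)
        self.cost = cost    # cost c_i of restoring the value (column 4)

def classify_workflow(jobs):
    """Map a workflow (a list of jobs, each a list of Descriptors)
    to one of the five categories of Table 2."""
    descriptors = [d for job in jobs for d in job]
    if all(d.decay == ZERO for d in descriptors):
        return "reproducible"
    if any(d.decay == INFINITE and math.isinf(d.cost) for d in descriptors):
        return "non-reproducible"
    if any(d.decay == INFINITE for d in descriptors):
        return "reproducible with extra cost"
    if any(d.decay == DIST for d in descriptors):
        return "reproducible with probability P"
    return "approximately reproducible"  # only VARY (and ZERO) decays remain

workflow = [[Descriptor("service_url", "http://example.org", DIST)],
            [Descriptor("code_version", "v1.2", ZERO)]]
print(classify_workflow(workflow))  # -> reproducible with probability P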
A. Reproducible workflows
The first group represents the reproducible workflows. In this case all the decay-parameters of all the jobs belonging to a workflow are zero. These workflows are reproducible, and they can be executed and re-executed at any time and under any circumstances, since they are not influenced by dependencies.
B. Reproducible workflows with extra cost
There are workflows which have dependencies and infinite decay-parameters, but where the corresponding cost is not infinite. In this case, with the help of additional resources or tools, these dependencies can be eliminated. For example, if a computation is based on a randomly generated value, this descriptor's value is unknown (infinite). In this case, with the help of an extra operating-system-level tool, we can capture the return value of the system call and save it in the provenance database [22]. Another example is when a virtualization tool, such as a virtual machine, has to be applied to reproduce the workflow.
C. Approximately reproducible workflows
In certain cases the workflow execution may depend on some continuously changing resource. For example, there are continuously growing databases which receive data from sensor networks without intermission. If the computation of a workflow uses some statistical parameters of such a database, the statistical values will never be the same. In this case the corresponding descriptor's value of the given job may change on the occasion of every re-execution; consequently, the reproduction of this workflow could fail.
If the workflow was executed S times and the provenance database is available, we can create a sample set which contains the S different values of the changing descriptors and the S results of the workflow. In this case we can analyze the change of the descriptor's value, describe it as a function, and even determine a general evaluating method for the result. On the occasion of a later re-execution, if reproduction is not possible, this evaluating method can be applied and an estimated result can be obtained with a given probability [22].
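As an illustration of such an evaluating method, the following sketch (ours; the sample values and the choice of a logarithmic trend are purely hypothetical) fits a simple trend to the sample set of S earlier executions and estimates the result of a later re-execution:

import numpy as np

# Hypothetical sample set recorded in the provenance database over S = 5 runs:
# the changing descriptor's value (e.g. database size) and the workflow result.
descriptor_values = np.array([10_000, 25_000, 61_000, 140_000, 320_000])
results = np.array([0.52, 0.55, 0.61, 0.64, 0.69])

# One possible "general evaluating method": fit a simple trend to the samples.
coefficients = np.polyfit(np.log(descriptor_values), results, deg=1)

def estimate_result(new_descriptor_value):
    # Estimated result for a re-execution when exact reproduction is impossible.
    return np.polyval(coefficients, np.log(new_descriptor_value))

print(estimate_result(1_000_000))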
D. Reproducible workflows with a given probability
Many investigations revealed the problem caused by volatile third-party resources […], where the reproducibility of workflows becomes uncertain. Third-party services or any external resources can become unavailable over the years. If we know this decay of the resources, and if we can determine its probability distribution function, we can predict the behavior of the workflow on the occasion of a re-execution at a later time. Sometimes the users may want to know the chance of the reproducibility of their workflow. Assuming that the probability distribution of the third-party service is known or can be assumed, we can inform the users about the expected probability of reproducibility.
To formalize the problem, first we separate the $M_i$ descriptors of a given job $J_i$ which depend on external or third-party resources and whose decay-parameters are probability distribution functions, given as $F_{i1}(t), F_{i2}(t), \ldots, F_{iM_i}(t)$. The rest of the descriptors have zero decay-parameters. In this case, at time $t_0$, a given descriptor's value $v_{ij}(d_{ij})$ is available with a given probability (for the sake of easier comprehensibility, hereafter we omit the index i referring to the i-th job of a given scientific workflow):

$$F_1(t_0) = p_1^{(t_0)}, \; F_2(t_0) = p_2^{(t_0)}, \; \ldots, \; F_M(t_0) = p_M^{(t_0)} \qquad (6)$$
Let us assign to the job $J_i$ a state vector $\mathbf{y}_i = (y_{i1}, y_{i2}, \ldots, y_{iM_i}) \in \{0,1\}^{M_i}$, in which $y_{ij} = 1$ if the j-th descriptor of the job $J_i$ is unavailable. In this way the probability of a given state vector $\mathbf{y}_i$ can be computed as follows:

$$p(\mathbf{y}) = \prod_{j=1}^{M} p_j^{y_j} (1 - p_j)^{1 - y_j} \qquad (7)$$
TABLE 2. CLASSIFICATION OF SCIENTIFIC WORKFLOWS

decay-parameter      | cost      | category
decay(v) = 0         | cost = 0  | reproducible
decay(v) = ∞         | cost = ∞  | non-reproducible
decay(v) = ∞         | cost = C1 | reproducible with extra cost
decay(v) = F(t)      | cost = C2 | reproducible with probability P
decay(v) = vary(t,v) | cost = C3 | approximately reproducible
In addition, a time interval can be given during which the descriptor is available with a given probability P. Since we assume the independence of the descriptors, the cumulative distribution function of the job $J_i$ can be written as follows:

$$\mathbf{F}_i(t) = \prod_{j=1}^{M} F_{ij}(t) \qquad (8)$$
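The following short Python sketch (ours, with illustrative probabilities) enumerates all state vectors and evaluates equations (7) and (8) under the independence assumption:

from itertools import product

# Availability probabilities p_j = F_j(t0) of M = 3 external-resource
# descriptors at time t0 (illustrative values, as in equation (6)).
p = [0.9, 0.75, 0.6]

def state_vector_probability(y, p):
    """Probability of a state vector y according to equation (7);
    y_j = 1 marks the j-th descriptor as unavailable."""
    prob = 1.0
    for y_j, p_j in zip(y, p):
        prob *= p_j ** y_j * (1 - p_j) ** (1 - y_j)
    return prob

# The probabilities of all 2^M state vectors sum to 1.
print(sum(state_vector_probability(y, p)
          for y in product((0, 1), repeat=len(p))))  # -> 1.0

def job_cdf(F_list, t):
    # Equation (8): the job's cumulative distribution function as the
    # product of the per-descriptor distribution functions F_ij(t).
    result = 1.0
    for F in F_list:
        result *= F(t)
    return result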
E. Non-reproducible workflows
There is no method to make these workflows reproducible. In this case the scientific workflow probably contains a non-deterministic job or jobs.
V. REPRODUCIBILITY ANALYSIS
It may be important to inform users about the reproducibility of their workflows, and even about the cost of reproducibility. Based on our mathematical model we can determine two measures related to the expected cost: the average cost and the reproducibility probability.
1. Average Cost (AC), expressed as

$$E(g(\mathbf{y})) = \sum_{\mathbf{y} \in Y} g(\mathbf{y}) \, p(\mathbf{y}) \qquad (9)$$

where $g(\mathbf{y}) = \sum_{i=1}^{K} c_i$.
2. Reproducibility Probability (RP):

$$P(g(\mathbf{y}) > C) = \sum_{Y : \, g(\mathbf{y}) > C} p(\mathbf{y}) \qquad (10)$$

where C is a given level of the reproducibility cost.
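Both measures can be evaluated by enumerating the state vectors, as in the following sketch (ours; the costs are illustrative, and summing the costs of the unavailable descriptors is our reading of g(y)):

from itertools import product

p = [0.9, 0.75, 0.6]  # illustrative availability probabilities p_j
c = [2.0, 5.0, 1.0]   # illustrative costs c_j of the descriptors

def state_vector_probability(y, p):
    prob = 1.0
    for y_j, p_j in zip(y, p):
        prob *= p_j ** y_j * (1 - p_j) ** (1 - y_j)
    return prob

def g(y, c):
    # Cost of a state vector: the costs of the unavailable descriptors.
    return sum(c_j for y_j, c_j in zip(y, c) if y_j == 1)

states = list(product((0, 1), repeat=len(p)))

# (9) Average Cost: the expected value of g(y) over all state vectors.
AC = sum(g(y, c) * state_vector_probability(y, p) for y in states)

# (10) Reproducibility Probability for a given cost level C.
C = 4.0
RP = sum(state_vector_probability(y, p) for y in states if g(y, c) > C)
print(AC, RP)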
VI. CONCLUSION
In this paper we investigated the possible types of scientific workflows from a reproducibility perspective. The basis of our analysis is the decay-parameter, which describes the type and the measure of the change of the descriptors' values. Based on this parameter we determined a cost function which represents the "work" required to reproduce the given job or workflow. In this way we could classify the scientific workflows according to how they can be reproduced at a later time. For the different categories we set up methods to make the workflows reproducible, or we gave the probability and the extra cost of the reproduction. Finally, we gave two general measures to evaluate the expected cost of reproducibility.

The goal of our research is to support scientists with methods to make their experiments reproducible and to provide information about the possibility of reproducing their workflows.
REFERENCES

[1] D. Koop, E. Santos, P. Mates, H. T. Vo, P. Bonnet, B. Bauer, M. Troyer, D. N. Williams, J. E. Tohline, J. Freire, and C. T. Silva, "A Provenance-Based Infrastructure to Support the Life Cycle of Executable Papers", International Conference on Computational Science, ICCS 2011. [Online]. Available: http://www.sciencedirect.com.
[2] A. Banati, P. Kacsuk, and M. Kozlovszky, "Four level provenance support to achieve portable reproducibility of scientific workflows", in Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2015 38th International Convention, pp. 241–244. IEEE.
[3] P. Missier, S. Woodman, H. Hiden, and P. Watson, "Provenance and data differencing for workflow reproducibility analysis", Concurrency and Computation: Practice and Experience, 2013.
[4] R. D. Peng, "Reproducible Research in Computational Science", Science, vol. 334, no. 6060, pp. 1226–1227, Dec. 2011.
[5] J. P. Mesirov, "Accessible Reproducible Research", Science, vol. 327, no. 5964, pp. 415–416, Jan. 2010.
[6] D. De Roure, K. Belhajjame, P. Missier, J. M. Gómez-Pérez, R. Palma, J. E. Ruiz, K. Hettne, M. Roos, G. Klyne, C. Goble, and others, "Towards the preservation of scientific workflows", in Procs. of the 8th International Conference on Preservation of Digital Objects (iPRES 2011). ACM, 2011.
[7] S. Woodman, H. Hiden, P. Watson, and P. Missier, "Achieving reproducibility by combining provenance with service and workflow versioning", in Proceedings of the 6th Workshop on Workflows in Support of Large-Scale Science, 2011, pp. 127–136.
[8] A. Davison, "Automated Capture of Experiment Context for Easier Reproducibility in Computational Research", Computing in Science & Engineering, vol. 14, no. 4, pp. 48–56, July 2012.
[9] J. Zhao, J. M. Gomez-Perez, K. Belhajjame, G. Klyne, E. Garcia-Cuesta, A. Garrido, K. Hettne, M. Roos, D. De Roure, and C. Goble, "Why workflows break—Understanding and combating decay in Taverna workflows", in E-Science (e-Science), 2012 IEEE 8th International Conference on, 2012, pp. 1–9.
[10] K. M. Hettne, K. Wolstencroft, K. Belhajjame, C. A. Goble, E. Mina, H. Dharuri, D. De Roure, L. Verdes-Montenegro, J. Garrido, and M. Roos, "Best Practices for Workflow Design: How to Prevent Workflow Decay", in SWAT4LS, 2012.
[11] P. Groth, E. Deelman, G. Juve, G. Mehta, and B. Berriman, "Pipeline-centric provenance model", in Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science, 2009, p. 4.
[12] J. Freire, D. Koop, F. S. Chirigati, and C. T. Silva, "Reproducibility Using VisTrails", 2014. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/download doi:10.1.1.369.9566
[13] F. S. Chirigati, D. Shasha, and J. Freire, "ReproZip: Using Provenance to Support Computational Reproducibility", in TaPP, 2013.
[14] V. Korolev, A. Joshi, M. A. Grasso, D. Dalvi, S. Das, Y. Yesha, and others, "PROB: A tool for Tracking Provenance and Reproducibility of Big Data Experiments", Reproduce'14, HPCA 2014, vol. 11, pp. 264–286, 2014.
[15] D. Koop, J. Freire, and C. T. Silva, "Enabling Reproducible Science with VisTrails", arXiv preprint arXiv:1309.1784, 2013.
[16] K. Belhajjame, O. Corcho, D. Garijo, J. Zhao, P. Missier, D. R. Newman, R. Palma, S. Bechhofer, G. C. Esteban, J. M. Gomez-Perez, G. Klyne, K. Page, M. Roos, J. E. Ruiz, S. Soiland-Reyes, L. Verdes-Montenegro, D. De Roure, and C. Goble, "Workflow-centric research objects: First class citizens in scholarly discourse", in Proceedings of the ESWC2012 Workshop on the Future of Scholarly Communication in the Semantic Web, 2012.
[17] S. Bechhofer, D. De Roure, M. Gamble, C. Goble, and I. Buchan, "Research objects: Towards exchange and reuse of digital knowledge", in The Future of the Web for Collaborative Science, 2010.
[18] K. Belhajjame, J. Zhao, D. Garijo, M. Gamble, K. Hettne, R. Palma, and C. Goble, "Using a suite of ontologies for preserving workflow-centric research objects", Web Semantics: Science, Services and Agents on the World Wide Web, 2015.
[19] S. R. Piccolo, A. B. Lee, and M. B. Frampton, "Tools and techniques for computational reproducibility", bioRxiv, 022707, 2015.
[20] I. Santana-Perez and M. S. Pérez-Hernández, "Towards Reproducibility in Scientific Workflows: An Infrastructure-Based Approach", Scientific Programming, vol. 2015, p. 11, 2015.
[21] A. Banati, P. Kacsuk, and M. Kozlovszky, "Minimal sufficient information about the scientific workflows to create reproducible experiment", in IEEE 19th International Conference on Intelligent Engineering Systems (INES), Slovakia, 2015.
[22] A. Banati, P. Kacsuk, and M. Kozlovszky, "Reproducibility analysis of scientific workflows", Acta Polytechnica Hungarica, unpublished.
Dynamic Execution of Scientific Workflows in
Cloud
E. Kail1, J. Kovács2, M. Kozlovszky1,2 and P. Kacsuk2,3
1 Óbuda University, John von Neumann Faculty of Informatics, Biotech Lab, Bécsi str. 96/b., H-1034, Budapest, Hungary
2 MTA SZTAKI, LPDS, Kende str. 13-17, H-1111, Budapest, Hungary
3 University of Westminster, 115 New Cavendish Street, London W1W 6UW
{kail.eszter, kozlovszky.miklos}@nik.uni-obuda.hu, {jozsef.kovacs, kacsuk}@sztaki.mta.hu
Abstract - Scientific workflows have emerged in the past decade as a new solution for representing complex scientific experiments. Generally, they are data- and compute-intensive applications and may need high performance computing infrastructures (clusters, grids and clouds) to be executed. Recently, cloud services have gained widespread availability and popularity due to their rapid elasticity and resource pooling, which is well suited to the nature of scientific applications that may experience variable demand and occasional spikes in resource usage. In this paper we investigate dynamic execution capabilities, focusing on fault tolerance behavior, in the Occopus framework, which was developed by SZTAKI to provide automatic features for configuring and orchestrating distributed applications (so-called virtual infrastructures) on single or multi-cloud systems.
I. INTRODUCTION
Over the last few years, cloud computing has emerged as
a new model of distributed computing by offering
hardware and software resources as virtualization-enabled
services. Cloud providers give application owners the
option to deploy their application over a network with a
virtually infinite resource pool with modest operating and
practically no investment costs. Today, cloud computing
systems follow a service-driven, layered software
architecture model, with Software as a Service (SaaS),
Platform as a Service (PaaS), and Infrastructure as a
Service (IaaS). In this paper we are primarily focusing on
IaaS cloud services. In an IaaS environment the CPU,
storage, and network resources are supplied by a
collection of data centers installed with hundreds to
thousands of physical resources such as cloud servers,
storage repositories, and network backbone. It is the task
of the cloud orchestrator to select the appropriate
resource for an initiated application or service executed in
the cloud.
Due to their rapid elasticity and almost infinite
resource pooling capabilities cloud services have also
gained widespread popularity for enacting scientific
experiments. Scientific experiments are widely used in
most scientific domains such as bioinformatics,
earthquake science, astronomy, etc. In general they consist
of multiple computing tasks that can be executed on
distributed and parallel infrastructures.
Scientific workflows are used to model these scientific experiments at a high level of abstraction. They are graphically represented by Directed Acyclic Graphs (DAGs), where the nodes are the computing tasks and the edges between them represent the data or control flow. Since these experiments mostly consist of compute- and data-intensive tasks, the execution of scientific workflows may last for weeks or even months and may manipulate terabytes of data. Thus scientific workflows should be executed in a dynamic manner in order to save energy and time.
Dynamic execution has three main aspects: fault tolerance, intervention, and optimization techniques. Fault tolerance means continuing the execution with the required SLA even in the presence of failures, or adapting to new situations and actual needs during runtime. Since scientific workflows are mainly explorative by nature, scientists often need to monitor the execution, to get feedback about the status of the execution, and to interfere with it. Intervention by the scientist, workflow developer or administrator may also be needed in a planned or in an ad-hoc manner. The third aspect of dynamic execution concerns optimization mechanisms, such as performance, budget, time or power optimization techniques.
In this paper we investigate the possibilities of executing scientific workflows dynamically in Occopus. We examine the required extensions to provide a reliable service for workflow orchestration and propose a fault-tolerant mechanism which is based on the workflow structure and a replication technique.

Occopus [7] is a newly introduced framework developed by SZTAKI, targeted at providing automatic features for configuring and orchestrating distributed applications on single or multi-cloud systems.
Our paper is structured as follows: in the next section we give a brief overview of the related work on communication middleware used in distributed systems and on fault tolerance in the cloud. In section III we introduce the Occopus framework in more detail. In section IV we analyze the possibilities of executing workflows in Occopus, and section V introduces our solution in detail. Finally, in the last sections we give a brief insight into our fault tolerance proposal, and the conclusion closes our work.
II. RELATED WORK
A. Communication middleware in the cloud
As distributed applications transcend geographical and organizational boundaries, the demands placed upon their communication infrastructures increase exponentially. Modern systems operate in complex environments with multiple programming languages, hardware platforms and operating systems, and with requirements for dynamic deployment and reliability, while maintaining a high Quality of Service (QoS).
Cloud orchestration in general means building up and managing interconnections and interactions between distributed services on single or multi-cloud systems. First the orchestrator allocates the most appropriate resource for a job from a resource pool, then it monitors the functioning of the resource with a so-called heartbeat mechanism. However, since this mechanism only gives feedback about the physical status of the resources (CPU, memory usage, etc.), it cannot provide reliability in communication and data sharing.
Concerning scientific workflows, the main challenge is to provide high availability, reliable communication, fault tolerance and SLA-based service. In most scientific workflow management systems a special middleware is responsible for maintaining the connections and data movement between the distributed services and for scheduling the tasks according to the available resources and predefined constraints (the dataflow model).
To provide reliable, flexible and scalable communication between the services, a suitable communication middleware is needed. Communication middlewares can be categorized as Remote Procedure Call (RPC) oriented middleware, Transaction-Oriented Middleware (TOM), Object-Oriented/Component Middleware (OOCM) and Message-Oriented Middleware (MOM) [6].
RPC-oriented middleware is based on a client-server architecture and provides remote procedure calls through APIs. This kind of communication is synchronous from the user's point of view, since the caller waits until the server returns a response; thus it does not enable a scalable and fault-tolerant solution for workflows [5].
Transaction-Oriented Middleware (TOM) is used to ensure the correctness of transaction operations in a distributed environment. It is primarily used in architectures built around database applications [14]. TOM supports synchronous and asynchronous communication among heterogeneous hosts but, due to the redundancies and control information attached to the pure data to ensure high reliability, it results in low scalability both in the data volume that can be handled and in the number of interacting actors.
An Object-Oriented/Component Middleware (OOCM) is based on object-oriented programming models and supports distributed object requests. OOCM is an extension of Remote Procedure Calls (RPC), and it adds several features that emerge from object-oriented programming languages, such as object references, inheritance and exceptions. These added features make OOCM flexible; however, this solution still offers only limited scalability.
A Message-Oriented Middleware (MOM) allows message passing across applications on distributed systems. A MOM provides several features, such as:
• asynchronous and synchronous communication mechanisms;
• data format transformation (i.e. a MOM can change the format of the data contained in the messages to fit the receiving application [16]);
• loose coupling among applications;
• parallel processing of messages;
• support for several levels of priority.
Message passing is the ancestor of distributed interactions and one of the realizations of MOM. The producer and the consumer communicate by sending messages. The producer and the consumer are coupled both in time and space: they must both be active at the same time. The consumer receives messages by listening synchronously on a channel, and the recipient of a message is known to the sender.
Message queues are a newer realization of MOM, in which messages are concurrently pulled by consumers, as well as a subscription-based exchange solution allowing groups of consumers to subscribe to groups of publishers, resulting in a communication network or platform, or a message bus. Message queues provide an asynchronous communication protocol. Their widespread popularity lies not only in this asynchronous feature but also in the fact that they provide persistence, reliability and scalability, enabling both time and space decoupling of the so-called publishers and consumers.
Advanced Message Queuing Protocol (AMQP) [1] is an open standard application layer protocol for message-oriented middleware. RabbitMQ [3] is an open source message broker (sometimes called message-oriented middleware) that implements AMQP and can easily be used on almost all major operating systems.
B. Fault tolerance in cloud
Although cloud computing has been widely adopted by the industry, there are still many research issues to be fully addressed, such as fault tolerance, workflow scheduling, workflow management and security [8]. Fault tolerance is one of the key issues among them. It is a complex challenge to deliver the quality, robustness and reliability in the cloud that is needed for widespread acceptance as a tool for the scientific community.
To deal with this problem, much research has already been done on fault tolerance. Fault tolerance policies can be proactive or reactive. While the aim of proactive
techniques is to avoid situations caused by failures by predicting them and taking the necessary actions, reactive fault tolerance policies reduce the effect of failures on application execution when the failure effectively occurs. Different fault tolerance challenges and techniques (resubmission, checkpointing, self-healing, job migration, preemptive migration) have been implemented using various tools (HAProxy, Hadoop, SGuard) in the cloud. There are also many methods for providing fault-tolerant execution of scientific workflows in the cloud. Mostly they rely heavily on sophisticated and complex models of the failure behavior specific to the targeted computing environment. In our investigations we are targeting a solution that is mostly based on the workflow structure and on data about the actual execution timings retrieved from the provenance database.

III. OCCOPUS ARCHITECTURE

Occopus [7] (Fig. 1) has five main components: the enactor, which issues virtual machine management requests towards the infrastructure processor; the infrastructure processor, which maintains the internal representation of a virtual infrastructure (enabling the grouping of VMs serving a common aim); the cloud handler, which enables federated and interoperable cloud use by abstracting basic IaaS functionalities like VM creation; the service composer, which ensures that VMs meet their expected functionalities by utilizing configuration management tools; and finally the information broker, which decouples the information producer and consumer roles with a unified interface throughout the architecture.

Figure 1. Occopus architecture

After receiving the required infrastructure description, the enactor immediately compiles it into an internal representation. It is the role of the enactor to forward and upgrade the node requests to the infrastructure processor and to monitor the state of the infrastructure continuously during the setup and the lifetime of the infrastructure. This monitoring function is achieved with the help of the info broker. Among others, this component is responsible for tracking the information flow between the nodes. If it notices the failure of a node or a connection, it notifies the enactor. The enactor then upgrades the infrastructure description and forwards it to the infrastructure processor.

The infrastructure processor receives node creation and node destruction requests from the enactor. During creation, the infrastructure processor sends contextualized VM requests to the cloud handler. Within the contextualization information the processor places references to some of the previously created attributes of VMs. Node destruction requests are directly forwarded to the cloud handler component.

The cloud handler, as its basic functionality, provides an abstraction over IaaS functionalities and allows the creation, monitoring and destruction of virtual machines. For these functionalities it offers a plugin architecture that can be implemented with several IaaS interfaces (currently Occopus supports EC2, nova, cloudbroker, docker and OCCI interfaces).

The main functionality of the Service Composer is the management of deployed software and its configuration at the node level. This functionality is well developed, and even commercial tools are available to the public. The Occopus Service Composer component therefore offers interfaces to these widely available tools (e.g., Chef, Puppet, Docker, Ansible).
IV. SCIENTIFIC WORKFLOWS IN OCCOPUS
A. Scientific workflows
In the Occopus framework a virtual infrastructure can be built upon a directed acyclic graph representing some complex scientific experiment which consists of numerous computational steps. The connections between these computational steps represent the data dependencies, in other words the dataflow, during the experiment. With Occopus the infrastructure descriptor would contain the needed resource requirements for each task and also the SLA for the tasks or for the whole workflow. An SLA requirement could be a time or budget constraint, or a need for green execution. In such a scenario the execution could be seen as data flowing across the VMs, starting from the entry task executed on the first VM and terminating with the exit task executed on the last VM. In a scenario like this, every computational step is mapped to an individual VM. After submitting the virtual infrastructure descriptor based on the workflow model, Occopus would support the creation of an infrastructure like this; a simplified sketch of such a descriptor is given below. This type of workflow creation and execution is called service choreography in related works.
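A purely illustrative Python sketch of such a descriptor follows; this is not actual Occopus descriptor syntax, and every field name here is our assumption:

# Illustrative only -- not actual Occopus descriptor syntax: a three-task
# workflow (A1 -> A2 -> A3) with per-task resource requirements and an SLA.
virtual_infrastructure = {
    "nodes": [
        {"name": "A1", "vcpus": 2, "ram_gb": 4,  "image": "worker-a1"},
        {"name": "A2", "vcpus": 8, "ram_gb": 16, "image": "worker-a2"},
        {"name": "A3", "vcpus": 2, "ram_gb": 4,  "image": "worker-a3"},
    ],
    # Edges of the DAG: data flows from the entry task to the exit task.
    "dataflow": [("A1", "A2"), ("A2", "A3")],
    # SLA for the whole workflow: a soft deadline to be met with probability p.
    "sla": {"soft_deadline_hours": 24, "probability_p": 0.95},
}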
1) Advantages
Executing scientific workflows in Occopus has several advantages. The resources are continuously available, which means that task executions are not forced to wait for free resources. The infrastructure is built up easily, without expert knowledge of the individual cloud providers. Monitoring is also provided by the Occopus framework. There is no need for scheduling, and resource allocation is done by Occopus based on the virtual infrastructure descriptors.
2) Problems with scientific workflows executed in Occopus
Concerning scientific workflows, a number of issues may arise. Scientific experiments are data and compute intensive, may last for weeks or even months, and may use or produce terabytes of data. Due to the long execution time, many failures can arise. The types of faults that can occur during execution, and need to be handled in order to provide fault-tolerant execution, can mainly be grouped into the following categories: VM faults, programming faults, network issues, authentication problems, file staging errors, and data movement issues. In order not to lose the already computed work, fault tolerance must be provided.
• When a node fails, the computation which was done by this node is lost. As described in the previous section, Occopus monitors the nodes of the virtual infrastructure, and when the enactor notices a failed node, it is deleted from the virtual infrastructure list and a new one is created. The execution can be restarted on it. But what should happen to the data that was consumed by this failed node?
• Let us focus on only one aspect of SLAs (Service Level Agreements), namely the time constraints. Scientific workflows are often constrained by soft or hard deadlines. While a soft deadline means that the proposed deadline should be met with a probability p, a hard deadline means that the results are useless after the deadline. When there is a failure, upon recovery the makespan of the whole workflow is increased, and deadlines may not be met. How can it be ensured that SLAs are met?
• Fault tolerance techniques should also be considered when executing scientific workflows in the cloud. The most frequent fault tolerance techniques in the cloud use resubmission and replicas. How can it be ensured that more than one successor of the same type (replicas) is able to receive the results of the predecessor(s), and how can it be ensured that the number of replicas can change dynamically in time?
In the next section we look for the best solution that addresses the issues described above.
V. SOLUTIONS
There are two widespread alternatives that can give solutions for the above-mentioned problems and are supported by open source software. One of them is based on a service discovery feature, while the other uses a message queuing system.
A. Service Registry
Service discovery is a key component of most distributed systems and service-oriented architectures deploying many services. Service locations can change quite frequently due to host failure or replacement, similarly to a scientific workflow execution. A node must somehow discover the IP address and the port number of the peer application. One solution is to use a dedicated, centralized service registry node.
A service registry is a database of services, their instances and their locations. Its main task is to register hosts, ports, authentication credentials, etc. Service instances are registered with the service registry on startup and deregistered on shutdown. Clients of the service query the service registry to find the available instances of a service. If a node fails, the service registry database is updated with the new node.
Concerning workflow execution, if a node fails then the service registry updates its database with the new client, but the computation that was already done is lost. The consumed data is also lost with the failed node. If a computation has successfully terminated on a VM, then the results of this computation task can (and should) be stored in the provenance database; but because of the nature and size of the provenance database, it must be located on permanent storage. Retrieving these data from this storage may incur high latency, because its geographic location can be far from the cloud provider.
The flexibility of this solution is also limited. In this case the nodes know where to send data, so they use either synchronous remote procedure calls or asynchronous message passing middleware. As mentioned in the related work, the RPC model supports neither large volumes of data nor reliable data transport. With message passing middleware, the communication abstraction is the channel: a connection must be set up between producer and consumer, and the consumer listens on the channel synchronously.
Using this solution, a special agent would be needed to orchestrate the execution of the workflow itself. Without this agent, this solution would work only for small workflows that do not move high volumes of data, do not need a long time to execute, and for which the reliability of the resources is high.
B. Message Queuing
Using message queues would simplify almost all of the above-mentioned problems. The Advanced Message Queuing Protocol (AMQP) is an open standard message-oriented middleware. In this approach the message producer does not send the message directly to a specific consumer; instead, it characterizes messages into classes without knowledge of which consumers there may be. Similarly, consumers only receive messages that are of interest, without knowledge of the existing producers. AMQP operates over an underlying reliable transport layer protocol such as the Transmission Control Protocol (TCP).

The basic idea is that consumers and producers use a special node to accomplish message passing, which serves as a rendezvous point between senders and receivers of messages. These are the queues, which are buffers that temporarily or permanently store the messages. The middleware server has two main functionalities: one of them is buffering the messages in memory or on disk when the recipient cannot accept them fast enough, and the other is routing the messages to the appropriate queue. When a message arrives in a queue, the server attempts to deliver it to the consumer immediately. If this is not possible, the message is stored until the consumer is ready. If it is possible, the message is deleted from the buffer immediately or after the consumer has acknowledged it. The reliability lies in this feature: acknowledgments can be sent only when the node has successfully processed the data. Scalability and fault tolerance can be realized by clustering the same type of nodes. In this solution the consumers and producers are not known to each other, and there can be more than one consumer belonging to a single queue. The power of AMQP comes from the ability to create queues, to route messages to queues, and even to create routing rules dynamically at runtime based on the actual environmental conditions. This feature would make it possible to realize an SLA-based, fault-tolerant execution for workflows.
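A minimal sketch of this acknowledge-after-processing behavior, using the open source pika client for RabbitMQ (the queue name and the process function are our illustrative assumptions):

import pika

def process(body):
    pass  # placeholder for the hypothetical task computation

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="task_A2", durable=True)  # persist across restarts

def on_message(ch, method, properties, body):
    process(body)
    # Acknowledge only after successful processing: if this consumer dies
    # first, the broker re-delivers the message to another replica.
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="task_A2", on_message_callback=on_message)
channel.start_consuming()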
Figure 2. Possible architecture with Message Queueing
In Fig. 2 a possible architecture for executing scientific workflows with Occopus can be seen. The abstract model of the scientific workflow consists of 3 tasks (A1, A2 and A3), for each of which an individual VM is started in the cloud. These are provided by the Occopus framework. Occopus also does the monitoring of the resources, as well as the infrastructure upgrading. All of the tasks communicate through the MQ (message queue), so they are not aware of each other. The MQ can be positioned in the cloud on a VM or on external storage; this depends on the amount of data that must be shared between the tasks and on the geographic location of the VMs. The Agent is only responsible for monitoring the workflow execution according to the predefined constraints (the input and output format of the data, time constraints, etc.) and, according to the SLA, for requesting a virtual infrastructure change from the Occopus framework (for example, to start more or fewer replicas of the tasks).
VI. FAULT TOLERANCE METHOD BASED ON WORKFLOW STRUCTURE

Concerning time-critical applications, a reliable fault tolerance method should be provided. In this section we lay down the bases of our fault tolerance framework, which uses replicas in order to ensure that time-critical workflows can be successfully terminated before the soft or hard deadline with a probability of p. In our solution every task in a workflow is assigned a certain number of replicas. The number of replicas is determined by the estimated execution time of a task, the structure of the workflow, the failure zone of a task (the set of tasks affected in the case of a failure of the given task), and the estimated failure detection and resubmission times. Before Occopus starts to build the infrastructure, this algorithm is executed to determine the number of replicas of each task, and then the infrastructure is built. When an unexpected situation occurs during execution (for example, too many failures, or even no errors at all), the number of replicas can be changed accordingly, since Occopus is able to upgrade the infrastructure. Determining the exact details of our algorithm is our future work.
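Since the exact algorithm is left to future work, the following sketch (ours) shows only one simple possible ingredient of it: assuming independent replica failures and a probability p_single that a single attempt of a task finishes in time, the smallest replica count meeting the SLA probability p is the smallest r with 1 - (1 - p_single)^r >= p.

def min_replicas(p_single, p_sla, r_max=16):
    # Smallest r such that at least one of r independent replicas
    # finishes in time with probability at least p_sla.
    for r in range(1, r_max + 1):
        if 1 - (1 - p_single) ** r >= p_sla:
            return r
    return r_max  # cap: the SLA is not reachable by replication alone

print(min_replicas(0.8, 0.99))  # -> 3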
VII. CONCLUSION

In this paper we have introduced Occopus, a one click cloud orchestrator framework which supports the execution of distributed applications in single or multi-cloud systems. We have investigated the advantages and problems of executing scientific workflows with Occopus and gave a proposal for a communication middleware which would provide a reliable, fault-tolerant workflow execution environment. We also gave a first insight into a fault tolerance method that can be used with Occopus, whose detailed elaboration determines our future research direction.

REFERENCES

[1] AMQP Advanced Message Queuing Protocol, Protocol Specification, Version 0-9-1, 13 November 2008.
[2] E. Curry, "Message-Oriented Middleware", in Q. H. Mahmoud (Ed.), Middleware for Communications, John Wiley and Sons, Chichester, England, 2004, pp. 1–28.
[3] A. Videla and J. W. Williams, RabbitMQ in Action: Distributed Messaging for Everyone, MEAP Edition, Manning Early Access Program, 2011.
[4] P. T. Eugster, P. Felber, R. Guerraoui, and A.-M. Kermarrec, "The many faces of publish/subscribe", ACM Comput. Surv. 35 (2) (2003) 114–131.
[5] K. Geihs, "Middleware challenges ahead", IEEE Comput. 34 (6) (June 2001) 24–31.
[6] M. Albano et al., "Message-oriented middleware for smart grids", Computer Standards & Interfaces 38 (2015) 133–143.
[7] G. Kecskeméti, M. Gergely, A. Visegrádi, Zs. Németh, J. Kovács, and P. Kacsuk, "One Click Cloud Orchestrator: bringing Complex Applications Effortlessly to the Clouds", WORKS 2014.
[8] A. Bala and I. Chana, "Fault Tolerance – Challenges, Techniques and Implementation in Cloud Computing", IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 1, No 1, January 2012.
FPGA Kernels for Classification Rule Induction
P. Škoda* and B. Medved Rogina*
* Ruđer Bošković Institute, Zagreb, Croatia
pskoda@irb.hr
Abstract - Classification is one of the core tasks in machine learning and data mining. One of several models of classification is classification rules, which use a set of if-then rules to describe a classification model. In this paper we present a set of FPGA-based compute kernels for accelerating classification rule induction. The kernels can be combined to perform specific procedures in the rule induction process, such as evaluating rule coverage or estimating out-of-bag error. Since classification problems are getting increasingly larger, there is a need for faster implementations of classification rule induction. One of the platforms that offer great potential for accelerating data mining tasks is the FPGA (field programmable gate array), which provides the means for implementing application-specific accelerators.
Key words - FPGA, dataflow, machine learning, classification rules
I. INTRODUCTION
Classification is one of the fundamental tasks in machine learning, and is used in a wide variety of applications. One of the fundamental models of classification is classification rules [1]. Classification rules express the classification model as a set of IF-THEN rules, and enable the implementation of fast classifiers with high throughput. Classification rule induction is usually performed in one of two ways: (a) by constructing a decision tree and then extracting rules from it; or (b) by using a covering algorithm. Applications of classification, as well as the field of data mining in general, are faced with a continuing increase in dataset sizes. This drives efforts to develop new, faster and more efficient algorithms, as well as to develop new hardware platforms that provide high computational power [2].
One hardware platform that is gaining more ground in computing is the field programmable gate array (FPGA). FPGAs are digital integrated circuits designed to be user-configured after manufacturing [3], [4]. They enable fast development and deployment of custom digital hardware, and thus allow the implementation of custom computational architectures that can be reconfigured on demand.
The most suitable computational model for an FPGA is the dataflow model. Most computer systems implement a control-flow model, in which the computation is described by a sequence of operations that are executed in a specified order. In the dataflow model, computation is described by a chain of transformations applied to a stream of data [5]. The dataflow is described in the form of a graph. Nodes of the graph represent operations, and edges represent data links between the nodes. The dataflow graph maps naturally to a hardware implementation. Nodes/operations are translated to functional hardware blocks, while edges are translated to data buses and signals. By dividing the graph into stages separated by registers, it is transformed into a pipelined structure suitable for implementation on an FPGA.

In this paper we present a novel set of kernels for accelerating classification rule induction that uses a variant of the covering algorithm. The kernels are targeted for a dataflow architecture realized on an FPGA.

This work was supported in part by the Maxeler University Programme, and by the Croatian Science Foundation under project number I-1701-2014.
II. RELATED WORK
While there has been a lot of activity in implementing classification rules on FPGAs, mostly for network filtering applications, there is no published work known to the authors related to rule learning. There is, however, some work on the closely related problem of decision tree induction [6]–[8].
Narayanan et al. [7] used an FPGA to implement a decision tree induction system for binary classification. They implemented the Gini impurity computation, as the most compute-intensive part of the process. The rest of the algorithm is executed on a PowerPC CPU embedded in the FPGA device.
Chrysos et al. [8] implement frequency table computation on an FPGA as part of the HC-CART system for learning decision trees. Frequency table computation is implemented in the Frequency Counting module, which receives attribute-class label pairs. The pairs are used to generate addresses of frequency table locations, whose contents are then incremented by one. In their implementation, all attribute-class label pairs received by the kernel have to be prepared by a program running on the CPU and then preloaded to the memory attached to the FPGA. Input data is transferred from the CPU to the FPGA memory many times in the course of program execution.
In our previous work [9], we implemented a 2D frequency table computation using FPGAs. The kernel receives two input streams – one for addresses and one for class labels – and counts occurrences of pairs in the FPGA's internal memory. The dataset is held in FPGA-attached SDRAM. We built upon this work by expanding it to multiple attribute streams [10], which allows the computation of several tables simultaneously, and by implementing efficient data transfer for the computed tables. In this paper we present our work on these kernels for classification rule induction.
TABLE I. BASIC INFORMATION ON THE MAXELER VECTIS-LITE FPGA BOARD

FPGA                  | Xilinx Virtex-6 XCVSX475T
Off-chip RAM (LMem)   | 6 GiB (6×1 GiB) DDR3-800 SDRAM
On-chip RAM (FMem)    | ~4 MiB Block RAM
CPU ↔ FPGA bandwidth  | 2 GB/s
LMem ↔ FPGA bandwidth | 38.4 GB/s max.

III. DATAFLOW ENGINE ARCHITECTURE
The accelerator is implemented in the form of a dataflow engine (DFE), which consists of at least one kernel and a manager. The computation is implemented by the kernels, while the role of the manager is to organize data movement within the DFE. Kernels are instantiated within the manager, and data links between the kernels, external RAM, and CPU are defined.
The frequency table computation was implemented using the Maxeler platform [11]. The platform includes an FPGA board, drivers, and an API. The DFE is coded in Java, using the API, which provides objects and methods that are translated to hardware units by the compiler. The compiler translates the Java code into VHDL and generates a software interface for the CPU code [12]. The DFE was implemented on the Maxeler Vectis-Lite board, which contains a single FPGA and on-board SDRAM. The board is connected to the host PC workstation via the PCIe bus. Basic information on the board is shown in Table I.
A. Count conditional kernel
The architecture of the ComputeFreq-MS kernel is shown in Fig. 1. The kernel has two scalar inputs for parameters: one for the number of items to process (items), and one for the stream length in items (strmLen). The stream length sets the number of items to read from the on-board SDRAM, which requires all accesses to be made in 96-byte blocks, i.e. 24-item blocks (4 bytes per item). The kernel processes the first items elements from the stream, while the rest are read from memory and ignored.
The kernel consists of six identical frequency counter structures. Each frequency counter receives two input streams – one for attribute and one for class values. Both streams carry 32-bit unsigned integers. The input streams are sliced so that the low NA bits are retained from the attribute values and the low NC bits are retained from the class values. The retained bits are concatenated to form the frequency table address. In the table address, the attribute value is the high part and the class value is the low part of the address word.
The kernel has a total of seven stream inputs: six for attribute (att0 – att5) and one for class (class) value streams. Each attribute value stream is connected to a single frequency counter. The class value stream fans out to all frequency counters, so that they all receive identical class value streams.
B. Increment conditional kernel
The frequency tables are stored in on-chip memory – block RAM (BRAM). Each frequency counter has one BRAM, which is addressed by the address formed from the low bits of the attribute and class streams. The addressed location is read, its value incremented by one, and written back to the same location. The BRAM is configured as a single-port RAM with read-first synchronization. Due to the access latencies of the BRAM itself and the registers inserted by the compiler, there are latencies inside this loop. To ensure correct counting, the kernel's input streams are throttled to allow the input of one element every m clock cycles, where m is the internal latency of the loop.
After all elements of interest are streamed in, the contents of the BRAMs are streamed out to the host main memory through stream output s. The output stream s is made by joining the output streams from all frequency counters (s0 – s5) into one vector stream. The output vector is padded to 8 elements, i.e. two additional dummy elements are added to it. The padding enables efficient conversion of the output stream to the 128-bit word width used in communication over the PCIe bus.
As the BRAMs are read, they are at the same time reset to zero. During the read phase, the BRAMs are addressed by a counter, and their write ports are switched from increment-by-one values to constant zero. In this case there are no loop dependencies, and the output stream is not throttled – it outputs one element every clock cycle. At the same time, any remaining elements in the input stream are read un-throttled (one per clock) and ignored.
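The following behavioral reference model (ours, in Python; not the actual MaxJ kernel code) summarizes what one frequency counter does: address formation from the low NA and NC bits, counting, and the read-out-and-reset phase:

NA, NC = 6, 6  # bits kept from the attribute and class values (64x64 table)

def count_frequencies(attributes, classes):
    bram = [0] * (1 << (NA + NC))  # models one frequency counter's BRAM
    for att, cls in zip(attributes, classes):
        # The attribute value is the high part of the address, the class
        # value the low part; on the FPGA this read-modify-write loop
        # accepts one element every m clock cycles.
        addr = ((att & ((1 << NA) - 1)) << NC) | (cls & ((1 << NC) - 1))
        bram[addr] += 1
    # Read-out phase: stream the table out while resetting it to zero.
    out = bram[:]
    for addr in range(len(bram)):
        bram[addr] = 0
    return out

table = count_frequencies([3, 3, 17], [1, 1, 0])
print(table[(3 << NC) | 1])  # the pair (3, 1) occurred twice -> 2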
Figure 1. ComputeFreq-MS kernel architecture for up to 64 unique attribute and class values (NA = 6, NC = 6).

TABLE II. FPGA RESOURCE USAGE BY THE DFE

Resource   | Used   | Total available | Utilization
LUTs       | 34,724 | 297,600         | 11.67 %
Flip-flops | 50,064 | 297,600         | 16.82 %
DSP blocks | 0      | 2,016           | 0.00 %
Block RAM  | 226    | 2,128           | 10.62 %

C. Logic simple kernel
A single ComputeFreq-MS kernel is instantiated in the manager. Input streams attClass and att0 – att5 are linked to on-board SDRAM. All input streams use a linear
memory access pattern. Output stream s is linked to the computer's main memory. The source addresses for input streams att0 – att5 and attClass, the values for scalar inputs items and strmLen, and the destination address for output stream s are defined through the generated DFE interface.
The kernel parameters were set for up to 64 unique attribute and class values (NA = 6, NC = 6). Each frequency table holds 4096 32-bit words. Due to the frequency counter's loop latency, it can process one item (attribute–class pair) every five clock cycles. The kernel clock frequency is set to 300 MHz. With six attribute streams, the DFE can process up to 360×10^6 elements per second. FPGA resource usage for the DFE is given in Table II.
IV. EXPERIMENTAL RESULTS
A. Test environment
The kernel was benchmarked using code from the C4.5 Release 8 decision tree learning program [13]. The parts needed to load the datasets and compute the frequency tables were extracted from the program and used for benchmarking. The frequency table computation was parallelized using OpenMP [14] by distributing the attributes of the dataset over the threads. This included replicating the frequency table data structure to accommodate multi-threaded execution. The original ComputeFrequencies function was modified to use the replicated data structure. Since the attributes were distributed over threads, multiple invocations of the ComputeFrequencies function were necessary if the number of attributes exceeded the number of threads.
Functions for transforming the dataset to the appropriate format and loading it to the DFE were added to the benchmark program. For execution on the DFE, a new function was created by modifying the original ComputeFrequencies function. The modifications involve removing the computation loop and replacing it with function calls to the DFE. Received results are transformed into the frequency table data structure used by the parallelized software implementation. Since the DFE processes six attributes in parallel, the ComputeFrequencies function is invoked once for every group of six attributes.

Figure 2. Execution time on CPU as a function of the number of items, measured for datasets with 6 to 1,536 attributes
Time measurement was added to the ComputeFrequencies function. The entire run time of the function is measured. The function's throughput is calculated from the measured times using the formula:

$$T_{a,i} = \frac{a \cdot i}{t_{a,i}} \qquad (1)$$
where a is the number of attributes, i is the number of items, $t_{a,i}$ is the function execution time, and $T_{a,i}$ is the calculated function throughput for the dataset with a attributes and i items.
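A worked instance of formula (1), with illustrative numbers rather than values from the measurements below:

a, i, t_ai = 96, 2 ** 20, 0.42  # 96 attributes, 2^20 items, 0.42 s run time
T_ai = a * i / t_ai             # throughput in elements per second
print(f"{T_ai:.3e}")            # ~2.40e+08 elements/s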
A set of 102 synthetic datasets was used in the benchmark. The number of attributes ranged from 6 to 1,536, in a geometric sequence 6×2^i, with i = 0, 1, … 8. The number of items ranged from 2,048 to 4×2^20, in a geometric sequence 2^i, with i = 11, 12, … 22. The dataset size was limited to 3 GiB, i.e. to 768×2^20 elements. Datasets were generated randomly, with uniformly distributed values in the range 1–63 for attribute values, and 0–64 for class values.
The benchmark program was compiled using the gcc compiler, version 4.4.7. Benchmarks were run on an Intel Xeon E5-1600 workstation, with 16 GiB of DDR3-1600 RAM, under the CentOS 6.5 Linux OS.
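For concreteness, a minimal sketch of such a dataset generator under the stated distributions; the paper does not specify the random number generator, so rand() is a placeholder:

    #include <stdlib.h>
    #include <stdint.h>

    /* Attribute-major synthetic dataset: attribute values uniform in 1-63,
       class values uniform in 0-64, as stated above. */
    void generate_dataset(int n_attrs, int n_items,
                          uint8_t *attr_vals, uint8_t *class_vals)
    {
        for (int a = 0; a < n_attrs; a++)
            for (int i = 0; i < n_items; i++)
                attr_vals[(size_t)a * n_items + i] = (uint8_t)(1 + rand() % 63);
        for (int i = 0; i < n_items; i++)
            class_vals[i] = (uint8_t)(rand() % 65);
    }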
B. CPU benchmark results
For the baseline CPU performance, the measurements were conducted on the six-thread parallelized implementation of the ComputeFrequencies function. Measured execution times are given in Table III and shown in Fig. 2. For datasets with over 128×2^10 items, execution time scales approximately linearly with the number of items. There is a larger increase in execution time in the region of 16×2^10 – 32×2^10 items for datasets with 96 or more attributes, and of 32×2^10 – 256×2^10 items for datasets with fewer than 96 attributes.

[Figure 2. Execution time on CPU as a function of the number of items, measured for datasets with 6 to 1,536 attributes.]

This increase is more clearly visible in the function throughput, shown in Fig. 3, as a sharp drop in throughput. The figure shows that for datasets with fewer items, throughput increases with an increasing number of items. Once the dataset exceeds a certain number of items (dependent on the number of attributes), the throughput drops and quickly stabilizes at an approximately constant value. As the number of attributes increases, the function throughput decreases, assuming an equal number of items. The number of items at which the peak throughput is recorded also decreases with an increasing number of attributes, as does the peak throughput value itself. For datasets with 384 or more attributes there is no increase in throughput; it simply drops from one approximately constant value to another.

[Figure 3. Throughput on CPU as a function of the number of items, measured for datasets with 6 to 1,536 attributes.]

The maximum measured constant (invariant to the number of items) throughput is 617×10^6 elements/s, achieved on the dataset with 12 attributes. The minimum measured constant throughput is 52.2×10^6 elements/s, measured on the dataset with 1,536 attributes.

The drop in function throughput is most likely a consequence of the interaction between the CPU's cache system and main SDRAM. Smaller datasets have a smaller likelihood of cache misses, which translates into higher throughput. Another important factor is the data structure used to store the dataset. The dataset is stored in attribute-major order, which leads to strided memory accesses. With more attributes the stride is larger and, consequently, so is the likelihood of a cache miss.

C. DFE benchmark results

The DFE was benchmarked under the same conditions as the six-threaded software implementation. Execution times measured on the DFE are given in Table IV and shown in Fig. 4. Execution time on the DFE scales approximately linearly with the number of items in the dataset for datasets with 512×2^10 or more items. On datasets with fewer items, the execution times asymptotically approach a certain minimum value that depends on the number of attributes. For a fixed number of items, the execution times scale linearly with the number of attributes.

[Figure 4. Execution time on DFE as a function of the number of items, measured for datasets with 6 to 1,536 attributes.]

The kernel throughput graph, shown in Fig. 5, shows that the curves for all numbers of attributes form a tight group; the curves for different numbers of attributes cannot be distinguished from one another. On datasets with 512×2^10 or more items, the throughput is approximately constant at 340×10^6 elements/s. The measured throughput is close to the theoretical maximum of 360×10^6 elements/s.

[Figure 5. Throughput on DFE as a function of the number of items, measured for datasets with 6 to 1,536 attributes.]

The DFE performs better on datasets with a larger number of items, coming close to the theoretical maximum when the number of items is 512×2^10 or more. This is a consequence of the communication and control overheads between the DFE and the host computer. The overhead is independent of the number of items, and its influence diminishes as the number of items increases. As can be seen from Table IV, execution times are approximately constant for datasets with 8,192 or fewer items. The minimum DFE execution time was calculated from these measurements, amounting to 801.1 μs.
D. Comparison of the results
The DFE and CPU results were compared by computing the speedup, i.e. the ratio of execution time on the CPU to execution time on the DFE. The speedup is shown in Fig. 6. For most of the tested dataset sizes, execution on the DFE is slower than on the CPU. For datasets with fewer than 48 attributes, the DFE is slower regardless of the dataset size. For 48 attributes, the speedup exceeds unity for 128×2^10 items or more. The maximum speedup is achieved on datasets with 384 and more attributes. The speedups on these datasets are close in value, suggesting that further increasing the number of attributes will yield a negligible increase in speedup. These values can therefore be used as an approximation of the upper bound on the speedup. The maximum measured speedup is 6.26×, achieved on the dataset with 1,536 attributes and 512×2^10 items.

[Figure 6. Speedup, measured for datasets with 6 to 1,536 attributes.]
TABLE III. MEASURED EXECUTION TIMES ON CPU
(Number of attributes / Execution time; – : dataset exceeds the 3 GiB size limit)

Number of items |     6    |    12    |    24    |    48    |    96    |   192    |   384    |   768    |  1,536
2,048           | 50.59 μs | 105.3 μs | 215.0 μs | 434.9 μs | 916.9 μs | 1.769 ms | 6.049 ms | 8.887 ms | 21.32 ms
4,096           | 63.31 μs | 133.5 μs | 280.4 μs | 571.0 μs | 1.237 ms | 3.096 ms | 10.59 ms | 14.37 ms | 42.51 ms
8,192           | 88.21 μs | 190.1 μs | 403.6 μs | 847.8 μs | 2.228 ms | 5.027 ms | 19.56 ms | 37.46 ms | 89.34 ms
16,384          | 139.2 μs | 302.8 μs | 668.8 μs | 1.557 ms | 3.321 ms | 13.08 ms | 41.85 ms | 91.15 ms | 195.6 ms
32,768          | 245.6 μs | 538.0 μs | 1.219 ms | 2.867 ms | 13.99 ms | 46.18 ms | 156.3 ms | 360.6 ms | 803.1 ms
65,536          | 462.1 μs | 1.014 ms | 2.296 ms | 10.92 ms | 35.06 ms | 97.45 ms | 373.2 ms | 808.1 ms | 1.860 s
131,072         | 899.0 μs | 2.066 ms | 6.949 ms | 25.08 ms | 62.94 ms | 191.5 ms | 706.2 ms | 1.702 s  | 3.795 s
262,144         | 2.056 ms | 5.184 ms | 14.48 ms | 49.73 ms | 132.1 ms | 382.5 ms | 1.518 s  | 3.467 s  | 7.658 s
524,288         | 4.215 ms | 10.35 ms | 28.61 ms | 98.80 ms | 267.1 ms | 766.7 ms | 3.079 s  | 6.987 s  | 15.44 s
1,048,576       | 8.365 ms | 20.48 ms | 56.80 ms | 197.3 ms | 537.8 ms | 1.553 s  | 6.359 s  | 14.01 s  | –
2,097,152       | 16.70 ms | 40.80 ms | 113.3 ms | 394.1 ms | 1.071 s  | 3.176 s  | 13.21 s  | –        | –
4,194,304       | 33.49 ms | 81.27 ms | 226.3 ms | 787.0 ms | 2.139 s  | 6.303 s  | –        | –        | –

The total performance of the DFE was estimated by calculating the weighted average of the speedups, weighted by the number of elements in the dataset:

    S = \frac{\sum_a \sum_n a \, n \, s_{a,n}}{\sum_a \sum_n a \, n}    (2)

where S is the average speedup and s_{a,n} is the speedup on the dataset with a attributes and n items. The average speedup is 4.1.
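As a quick consistency check, the maximum per-dataset speedup reported above can be reproduced from the corresponding entries of Tables III and IV (1,536 attributes, 524,288 items):

    s_{1536,\,524288} = \frac{15.44\ \mathrm{s}}{2.467\ \mathrm{s}} \approx 6.26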
For further comparison, the execution efficiency was computed for the CPU and the DFE. The efficiency is defined as:

    E_{a,n} = \frac{m \, f}{F_{a,n}}    (3)

where E_{a,n} is the efficiency and F_{a,n} is the throughput for the dataset with a attributes and n items, f is the clock frequency, and m is the number of threads or the number of streams on the CPU and DFE respectively. Peak throughput values were used to calculate the efficiency.

The software implementation (CPU baseline) was executed on six threads, on a CPU clocked at 3.2 GHz. Peak throughput was 875×10^6 elements/s. The DFE uses six streams, and was clocked at 300 MHz. Peak throughput on the DFE was 355×10^6 elements/s. The execution efficiency on the CPU is 21.9 clock cycles per element, and on the DFE it is 5.08 clock cycles per element. Of the 5.08 clocks, 4 are a consequence of the frequency counter's internal loop latency.
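Both figures follow from (3) with the quoted clock rates and peak throughputs; as a check:

    E_{\mathrm{CPU}} = \frac{6 \times 3.2 \times 10^{9}\ \mathrm{Hz}}{875 \times 10^{6}\ \mathrm{elements/s}} \approx 21.9\ \mathrm{cycles/element}

    E_{\mathrm{DFE}} = \frac{6 \times 300 \times 10^{6}\ \mathrm{Hz}}{355 \times 10^{6}\ \mathrm{elements/s}} \approx 5.07\ \mathrm{cycles/element}

(the small difference from the quoted 5.08 reflects rounding of the peak throughput).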
TABLE IV. MEASURED EXECUTION TIMES ON DFE
(Number of attributes / Execution time; – : dataset exceeds the 3 GiB size limit)

Number of items |     6    |    12    |    24    |    48    |    96    |   192    |   384    |   768    |  1,536
2,048           | 800.6 μs | 1.597 ms | 3.220 ms | 6.385 ms | 12.86 ms | 25.69 ms | 51.29 ms | 102.5 ms | 203.4 ms
4,096           | 817.1 μs | 1.634 ms | 3.248 ms | 6.559 ms | 13.00 ms | 25.90 ms | 51.89 ms | 104.2 ms | 209.2 ms
8,192           | 898.9 μs | 1.780 ms | 3.544 ms | 7.133 ms | 14.39 ms | 28.33 ms | 58.38 ms | 113.6 ms | 227.3 ms
16,384          | 1.031 ms | 2.014 ms | 4.043 ms | 8.187 ms | 16.47 ms | 33.04 ms | 65.85 ms | 131.9 ms | 260.4 ms
32,768          | 1.322 ms | 2.626 ms | 5.338 ms | 10.55 ms | 21.03 ms | 42.13 ms | 83.68 ms | 167.0 ms | 337.5 ms
65,536          | 1.929 ms | 3.758 ms | 7.576 ms | 15.54 ms | 30.35 ms | 60.91 ms | 123.2 ms | 243.9 ms | 483.4 ms
131,072         | 3.009 ms | 6.153 ms | 11.98 ms | 24.43 ms | 48.17 ms | 97.42 ms | 192.3 ms | 381.3 ms | 779.6 ms
262,144         | 5.277 ms | 10.41 ms | 20.70 ms | 42.10 ms | 83.98 ms | 169.3 ms | 338.4 ms | 675.5 ms | 1.356 s
524,288         | 9.664 ms | 19.35 ms | 38.62 ms | 77.18 ms | 154.0 ms | 308.5 ms | 616.5 ms | 1.236 s  | 2.467 s
1,048,576       | 18.55 ms | 37.08 ms | 74.19 ms | 147.9 ms | 296.4 ms | 594.5 ms | 1.184 s  | 2.374 s  | –
2,097,152       | 36.03 ms | 72.09 ms | 144.2 ms | 288.3 ms | 576.1 ms | 1.154 s  | 2.307 s  | –        | –
4,194,304       | 70.96 ms | 142.0 ms | 284.1 ms | 567.9 ms | 1.136 s  | 2.272 s  | –        | –        | –
V. CONCLUSION

In this paper we presented a multi-streamed compute architecture for 2D frequency matrix computation, implemented on an FPGA platform. This multi-streamed architecture is an advancement on our previous work [9].

Benchmark results reveal that the CPU outperforms the DFE for smaller datasets. This is a consequence of the control and communication latencies between the FPGA board and the host computer. The minimal time required to execute any action on the DFE is approximately 800 µs, regardless of dataset size. Small datasets are easily processed by the CPU in less time. The DFE outperforms the CPU for larger datasets that have at least 48 attributes, 32×2^10 items, and 6×2^20 elements. The best speedup achieved by the DFE is 6.26×. The DFE is more efficient in processing data, requiring only 5.08 clock cycles per dataset element while the CPU requires 21.9, when performing at peak efficiency. Of the DFE's 5.08 clock cycles, 4 cycles are a consequence of the frequency counter's internal loop latency.
The kernel can be further improved by vectorizing the input streams in the same manner as the output stream was vectorized. This would allow efficient processing of more elements in parallel, and better utilization of the available memory bandwidth. Another improvement would be reducing or eliminating the internal loop latency, which can potentially quadruple the kernel's performance, assuming that its operating frequency does not drop due to the higher design complexity.

In future work, in addition to the stated improvements, this kernel will be integrated into the C4.5 program as an accelerator unit. However, to achieve significant performance gains, the communication latency between the host computer and the FPGA will have to be compensated for. This will most likely involve adding support for batch processing of a series of small datasets, and will require some modifications of the original algorithm. Overall, with these additional improvements the kernel is expected to outperform the CPU for a wider range of dataset sizes.
REFERENCES
[1] J. Han and M. Kamber, Data Mining: Concepts and Techniques, 2nd ed. Morgan Kaufmann, 2006.
[2] A. N. Choudhary, D. Honbo, P. Kumar, B. Ozisikyilmaz, S. Misra, and G. Memik, “Accelerating data mining workloads: current approaches and future challenges in system architecture design,” Wiley Interdiscip. Rev. Data Min. Knowl. Discov., vol. 1, no. 1, pp. 41–54, Jan. 2011.
[3] I. Kuon, R. Tessier, and J. Rose, “FPGA Architecture,” Found. Trends Electron. Des. Autom., vol. 2, no. 2, pp. 153–253, 2008.
[4] M. L. Chang, “Device Architecture,” in Reconfigurable Computing: The Theory and Practice of FPGA-based Computation, S. Hauck and A. DeHon, Eds. Morgan Kaufmann, 2008, pp. 3–27.
[5] J. B. Dennis, “Data Flow Supercomputers,” Computer, vol. 13, no. 11, pp. 48–56, Nov. 1980.
[6] P. Škoda, B. Medved Rogina, and V. Sruk, “FPGA implementations of data mining algorithms,” in MIPRO, 2012 Proceedings of the 35th International Convention, 2012, pp. 362–367.
[7] R. Narayanan, D. Honbo, G. Memik, A. Choudhary, and J. Zambreno, “An FPGA Implementation of Decision Tree Classification,” in 2007 Design, Automation & Test in Europe Conference & Exhibition, 2007, pp. 1–6.
[8] G. Chrysos, P. Dagritzikos, I. Papaefstathiou, and A. Dollas, “HC-CART: A parallel system implementation of data mining classification and regression tree (CART) algorithm on a multi-FPGA system,” ACM Trans. Archit. Code Optim., vol. 9, no. 4, pp. 1–25, Jan. 2013.
[9] P. Škoda, V. Sruk, and B. Medved Rogina, “Frequency Table Computation on Dataflow Architecture,” in MIPRO, 2014 Proceedings of the 37th International Convention, 2014, pp. 357–361.
[10] P. Škoda, V. Sruk, and B. Medved Rogina, “Multi-stream 2D frequency table computation on dataflow architecture,” in MIPRO, 2015 Proceedings of the 38th International Convention, 2015, pp. 288–293.
[11] “Maxeler Technologies,” 2015. [Online]. Available: http://www.maxeler.com/. [Accessed: 23-Dec-2015].
[12] O. Pell and V. Averbukh, “Maximum Performance Computing with Dataflow Engines,” Comput. Sci. Eng., vol. 14, no. 4, pp. 98–103, Jul. 2012.
[13] J. R. Quinlan, “C4.5 Release 8,” 1993. [Online]. Available: http://rulequest.com/Personal/c4.5r8.tar.gz. [Accessed: 23-Jun-2013].
[14] OpenMP Architecture Review Board, “OpenMP Application Program Interface,” 2011. [Online]. Available: http://www.openmp.org/mp-documents/OpenMP3.1.pdf. [Accessed: 05-Feb-2014].
VISUALIZATION SYSTEMS
Prototyping of visualization designs of 3D vector fields using POVRay rendering engine

J. Opiła*

* AGH University of Science and Technology, Department of Applied Computer Science, Faculty of Management, Cracow, Poland
jmo@agh.edu.pl
Abstract – There is a persistent quest for novel methods of visualization in order to gain insight into complex phenomena in a variety of scientific domains. Researchers, e.g. the VTK team, have achieved excellent results; however, some problems connected with the implementation of new techniques and the quality of the final images still persist.
The results of an inspection of a number of visualization styles for 3D vector fields employing the POVRay ray-tracing engine are discussed, i.e. hedgehogs, oriented glyphs, streamlines, the isosurface component approach, and texturing design. All presented styles have been tested using a water molecule model and compared with respect to computing time, informativeness, and general appearance. It is shown in this work that the Scene Description Language (SDL), a domain-specific language implemented in POVRay, is flexible enough to be used as a tool for fast prototyping of novel and exploratory visualization techniques. The visualizations discussed in the paper were computed using selected components of the ScPovPlot3D API, i.e. templates written in the SDL language. The results are compared to designs already implemented in VTK.
Keywords – POVRay, vector field visualization, ScPovPlot3D, visual data analysis, VTK.

I. INTRODUCTION
In recent years, both computational and experimental methods have been delivering large amounts of data. The most productive sciences include astronomical sky surveys, engineering, econometrics, and the medical sciences including medical imaging, and this condensed list is hardly complete. Often the collected or computed data take the form of a 3D vector field, static or dynamic, on all spatial scales, for example ocean currents, wind speeds, or electrostatic fields at the molecular level. Vector fields can be described by differential equations, which sometimes can be reduced to a simple formula, e.g. for the gravitational or electrostatic field. In numerous cases, however, they can be obtained only by measurements, for example wind distributions [1], [2]. Reliable analysis of such data requires intensive use of computer data processing and a subsequent visualization step.
The work was supported by AGH University of Science and Technology.

After many years of development ([3], [4]), the visualization of 3D vector fields may still be improved according to the characteristics of a specific case, e.g. by the implementation of novel hybrid designs. Thus visualization lib